Logic and set theory form the foundational bedrock of modern mathematics. This course develops both the formal languages we use to express mathematical statements and the axiomatic systems that govern the objects we study. Rather than treating logic and set theory as separate subjects, we weave them together, revealing a deep parallel between semantic notions (what statements mean, what objects exist) and syntactic notions (what we can prove, what rules of inference are valid). This interplay culminates in the completeness theorems—first for propositional logic, later for predicate logic—which show that semantic truth and syntactic provability coincide.
The course begins with propositional calculus, where we learn to manipulate truth values and build formal derivations. We then introduce order structures—well-orderings, ordinals, and posets—which organize mathematical objects hierarchically and lead us to Zorn's lemma, a principle so powerful it underlies the existence of bases in vector spaces, maximal ideals in rings, and choice functions on infinite collections. Alongside these set-theoretic ideas, we develop predicate logic, the machinery for quantifying over objects and deriving consequences about them. This prepares us for axiomatic set theory itself: the Zermelo–Fraenkel axioms, the von Neumann cumulative hierarchy that constructs all sets from the empty set, and the theory of cardinals that measures infinite sizes. We close with incompleteness—Gödel's profound result that no consistent formal system rich enough to capture arithmetic can prove all truths about itself—a humbling boundary on what axioms can achieve.
By the end of this course, you will understand how modern mathematics rests on explicit, verifiable principles; how the concept of infinity is formalized and studied rigorously; and why certain mathematical questions cannot be resolved within standard axioms. You will have internalized the language of logic and developed facility with the set-theoretic constructions that underpin nearly every mathematical discipline.
# Introduction
This course is, at its core, about the foundations of mathematics itself. Most working mathematicians treat sets as self-evident objects — things can belong to other things, collections can be formed freely, and familiar operations like union, intersection, and subset are taken as given. This naive picture feels secure until it collapses. The first chapter explains why that collapse is inevitable, why it led to the formal apparatus studied in this course, and why that apparatus is genuinely difficult to set up without getting into trouble.
## The Problem with Naive Set Theory
The naive idea is simple: given any property $\phi$, we can form the set of all objects satisfying it. This principle is called **unrestricted comprehension**, and it is what allows us to write down familiar sets like $\{x \in \mathbb{Z} : x > 0\}$, the set of even integers, or the empty set $\varnothing = \{x : x \neq x\}$.
[definition: Unrestricted Comprehension]
The **unrestricted comprehension principle** (also called naive comprehension) asserts: for any property $\phi$, the collection
\begin{align*}
\{x : \phi(x)\}
\end{align*}
is a set. That is, for any formula $\phi$, there exists a set whose members are precisely those objects $x$ for which $\phi(x)$ holds.
[/definition]
The difficulty — discovered by Bertrand Russell in 1901 — is that unrestricted comprehension leads immediately to a contradiction. Consider the property $\phi(x) := x \notin x$, the property that a set does not contain itself as a member. By unrestricted comprehension, the collection
\begin{align*}
X := \{x : x \notin x\}
\end{align*}
is a set. But now ask: does $X$ belong to itself?
[quotetheorem:620]
The paradox is not an incidental flaw that might be patched by minor adjustment — it exposes a structural defect in unrestricted comprehension itself. The single axiom the paradox exploits is the claim that *any* formula $\phi$ determines a set; the self-referential nature of $\phi(x) := x \notin x$ is what turns this freedom into a contradiction. One might hope that simply forbidding the specific property "$x \notin x$" would suffice, but this response is inadequate for two reasons. First, Russell's paradox is only the most transparent of a family of such contradictions: **Cantor's paradox** (the set of all sets has a power set strictly larger than itself, contradicting the assumption that the set of all sets is maximal) and the **Burali-Forti paradox** (the collection of all ordinals is itself an ordinal strictly larger than itself) arise from the same permissive stance toward set formation, without any overt mention of self-membership. Second, banning self-referential definitions piecemeal does not touch the underlying pathology, which is that *any* comprehension scheme liberal enough to permit "the set of all sets" will generate paradoxes. The correct response is not to prohibit particular formulas but to restrict comprehension itself: sets may only be carved out of already-existing sets, never conjured from a universe of all objects.
[remark: The Barber Paradox]
Russell popularised an informal version: a barber shaves exactly those people who do not shave themselves. Does the barber shave himself? The logical structure is identical to the set-theoretic paradox. The informal version makes the contradiction vivid, but the mathematical version is what matters: naive set theory is inconsistent, not merely paradoxical.
[/remark]
This is not a curiosity about a strange, pathological set. It is a proof that the unrestricted comprehension principle is logically untenable. Any mathematical system built on it is inconsistent — every statement is provable, which means nothing is provable in any meaningful sense. The entire edifice of mathematics, if resting on naive set theory, is built on sand.
## The Response: Axiomatic Set Theory
The response to Russell's paradox was to replace the naive picture with a carefully chosen list of axioms — specific, explicit principles about what sets exist and how they can be formed, chosen precisely to be strong enough to derive all of mathematics while avoiding the contradictions that unrestricted comprehension produces. This is the programme of **axiomatic set theory**, most fully developed by Zermelo and Fraenkel in the early twentieth century.
The analogy with Euclid's geometry is apt. Euclid did not define lines and points in terms of anything more primitive; he stated axioms about how they behave. Similarly, axiomatic set theory does not define what sets *are* — it states axioms about membership $\in$, and derives everything from those axioms alone. The axioms of ZF set theory (Zermelo–Fraenkel) will be developed in detail later in the course. They include, for instance, a restricted comprehension principle: given an *existing* set $A$, one may form $\{x \in A : \phi(x)\}$, but one may not form $\{x : \phi(x)\}$ without first specifying the ambient set $A$. This single restriction blocks Russell's paradox: the set $X = \{x : x \notin x\}$ would require an ambient set of *all* sets, and the axioms do not permit that set to exist.
The contrast between the two comprehension principles is important enough to illustrate concretely.
[example: Unrestricted vs Restricted Comprehension]
Consider the property $\phi(x) :=$ "$x$ is an even integer."
**Under unrestricted comprehension**, we may form $E := \{x : x \text{ is even}\}$ outright, with no ambient set required. This particular set causes no trouble — the even integers form a perfectly coherent collection. But unrestricted comprehension makes no distinction between benign properties and dangerous ones. The same principle that produces $E$ also produces $X := \{x : x \notin x\}$, which we have already seen leads to contradiction.
**Under restricted comprehension** (as in ZF), we must first specify an ambient set. Given the set $\mathbb{Z}$ of integers (whose existence is guaranteed by the axioms), we may form
\begin{align*}
E := \{x \in \mathbb{Z} : x \text{ is even}\},
\end{align*}
and this is a legitimate set. But we cannot form $\{x : x \notin x\}$ without an ambient set — and no axiom of ZF provides a set of all sets to serve as that ambient. The dangerous collection is blocked at the source.
The key point is that restricted comprehension gives us all the sets we need for ordinary mathematics (we can always specify a suitable ambient set), while ruling out exactly the self-referential constructions that cause paradoxes.
[/example]
Blocking the paradoxes at the level of set formation is a substantial achievement, but it leaves a deeper question untouched: can we be *certain* that the axioms we have chosen are free of hidden contradictions? Russell's paradox was visible once someone thought to look for it; the history of the subject gives no guarantee that a subtler contradiction does not lurk somewhere in ZF. One might hope that the axiomatic system could be marshalled to prove its own reliability — that ZF itself could establish $\text{ZF}$ is consistent. The striking discovery of the 1930s was that this hope is in principle unattainable, and the reason is a general feature of formal systems rather than any specific defect of ZF.
[remark: Gödel's Shadow]
Axiomatic set theory appears to resolve the paradoxes, but Gödel's incompleteness theorems (1931) cast a long shadow. Informally, Gödel's second incompleteness theorem asserts that no sufficiently powerful consistent formal system can prove its own consistency — in particular, ZF cannot prove from within itself that ZF is free of contradictions. Mathematicians generally accept ZF as a working foundation not because its consistency is provable, but because decades of use have produced no contradiction, and because the axioms match our refined intuition about sets. A precise formulation and proof sketch of both incompleteness theorems appears in Chapter 8.
[/remark]
## Formal Logic and Its Relationship to Set Theory
Closely related to axiomatic set theory is the programme of **formal logic**. Here the goal is to put reasoning itself on a rigorous footing: to define precisely what a *proposition* is, what it means for a proposition to be *true*, and what it means to *prove* something. The informal notion of proof — a convincing argument — is replaced by a precise syntactic notion: a proof is a finite sequence of formulas, each of which is either an axiom or follows from earlier formulas by one of a fixed list of inference rules.
The central result connecting truth and proof is the Completeness Theorem. For propositional logic, it asserts that semantic truth (being a tautology) and syntactic provability coincide exactly. This is not a philosophical consolation — it is a precise mathematical theorem with a constructive proof.
[quotetheorem:1450]
The scope of the Completeness Theorem deserves careful qualification, because its reach is narrower than a first reading suggests. The theorem is a statement about **first-order** logic, and it genuinely fails once one moves beyond that setting: in **second-order logic** — where one may quantify over sets, relations, or functions, not merely over individuals — there is no deductive system that is simultaneously sound and complete with respect to the standard semantics. Any attempt to extend the first-order proof apparatus to the second-order case loses either soundness or completeness; the mismatch between what is semantically valid and what is syntactically derivable is unavoidable. A second limitation concerns the proof itself. The argument for countable languages (finitely or countably many symbols) can be carried out in reasonably weak meta-theory, but for uncountable languages the construction of the required model uses the Axiom of Choice in an essential way — typically via Zorn's Lemma or an equivalent maximality principle. The theorem for uncountable languages is therefore not simply a mechanical extension of the countable case; it requires genuine set-theoretic machinery, which is one reason the course develops Zorn's Lemma before turning to predicate logic.
These theorems justify the entire activity of proof-writing: seeking a proof is a reliable method for establishing truth, and every semantic truth is in principle reachable by syntactic means. The course will develop propositional logic first, proving its completeness theorem and deriving the compactness theorem as a consequence. It will then treat predicate logic, which is richer and more subtle: here the completeness theorem is due to Gödel (1930), and consequences like the Löwenheim–Skolem theorems reveal genuine limitations of first-order reasoning.
## The Unavoidable Circularity
There is a philosophical point that deserves direct acknowledgement: this course cannot be entirely free of circular reasoning.
To study formal logic rigorously, we need to reason about formal systems — about the set of all proofs, about models and their properties, about cardinalities of formula sets. All of that meta-theoretic reasoning takes place in ordinary mathematical language, which is itself grounded in sets. Conversely, to give set theory a rigorous foundation, we develop it within the framework of first-order logic — but first-order logic is itself a formal system whose properties we understand via informal mathematical reasoning.
There is no way to escape this circle entirely. We cannot simultaneously bootstrap logic from nothing and use logic to verify what we are doing. What we can do — and what this course does — is be honest about the layering. Informal reasoning is used to study formal systems; formal systems are used to codify mathematical reasoning; and the resulting picture is internally coherent even if it cannot be grounded in something more primitive. One is not, in the end, permitted to complain that this involves circular reasoning. It does, and it must, and mathematics has thrived under these conditions for a century.
## Structure of the Course
The course interleaves topics from formal logic and set theory in a way that reflects their mutual dependence, rather than treating them as entirely separate subjects. The main threads are:
**Ordinals and cardinals.** Before diving into formal logic, the course develops the theory of well-orderings and ordinal numbers. Ordinals provide a canonical way to compare sizes of well-ordered sets; cardinals measure the sizes of sets in absolute terms. The arithmetic of infinite cardinals — and especially the hierarchy of alephs $\aleph_0, \aleph_1, \aleph_2, \ldots$ — is one of the most striking features of set theory.
**Posets and Zorn's Lemma.** Partially ordered sets appear throughout mathematics, and the Axiom of Choice — in its equivalent form as Zorn's Lemma — is one of the most used and most contested principles in the foundations of mathematics. The course examines its statement, its equivalents, and its applications.
**Propositional logic.** The propositional calculus is the simplest formal logic: propositions have no internal structure, and everything reduces to truth-value assignments. Its completeness and compactness theorems are proved in full.
**Predicate logic.** First-order predicate logic adds quantifiers ($\forall$ and $\exists$) and a richer language. Its completeness theorem is stated and its proof sketched. The compactness theorem and Löwenheim–Skolem theorems are proved and their implications explored.
**Axiomatic set theory.** The axioms of ZF are stated within the framework of first-order logic, and the course develops the consequences: transitive closures, well-founded relations, $\epsilon$-induction, Mostowski's collapsing theorem, and the von Neumann hierarchy of sets.
**Consistency.** The course concludes with a discussion of the limits of formal systems — the problems of consistency and independence that Gödel's theorems make unavoidable.
[motivation]
### Why Study Foundations?
A natural reaction is: mathematics works. Analysts prove theorems about $\mathbb{R}$, algebraists prove theorems about groups, and none of them worry about whether sets are "really" well-defined. Why spend a course on foundations?
The answer has several layers. The first is historical necessity: the paradoxes were not hypothetical worries but actual refutations of systems that mathematicians were using. Russell's paradox appeared in Frege's system, which Frege had spent years building. The crisis was real, and the response — axiomatic set theory — shaped the entire twentieth century of mathematics.
The second is that foundational questions arise naturally at the edges of mathematics. When analysts ask whether every subset of $\mathbb{R}$ is measurable, or when algebraists ask whether every vector space has a basis, the answers depend on which axioms of set theory one accepts. These are not exotic questions; they are places where ordinary mathematics runs into the axiom of choice, the continuum hypothesis, and the hierarchy of consistency strengths.
The third is precision. Formal logic is the discipline of being exact about what "follows from" means. The completeness theorem is not a philosophical consolation — it is a precise mathematical theorem, proved by an explicit construction, with a concrete statement. Learning to read and write such theorems is a core mathematical skill.
[/motivation]
Having established the motivations and historical context for formal mathematics, we now turn to the first concrete system: propositional calculus, where logical statements are built from primitive symbols using a fixed set of connectives. This is where the formal study of logic begins — not with the paradoxes, but with the simplest language in which to reason precisely.
# 1. Propositional Calculus
Propositional calculus is the study of logical statements — expressions like $p \Rightarrow p$ and $p \Rightarrow (q \Rightarrow p)$ — built from primitive symbols using a fixed set of logical connectives. Unlike predicate calculus, which will occupy Chapter 5, propositional logic contains no quantifiers ($\forall$, $\exists$); all statements are formed purely from propositions and implication.
The central question of this chapter is: what does it mean for a statement to be "correct"? There are two distinct answers. We could mean that the statement is *semantically* correct — true under every possible assignment of truth values — or we could mean that the statement is *syntactically* correct — derivable by a formal proof from a fixed set of axioms. One of the main achievements of this chapter is the **completeness theorem**, which shows these two notions coincide. This equivalence is not merely reassuring: it has striking consequences for decidability and compactness.
It is important to appreciate from the outset that there is no single universal logical system. The axioms and deduction rules presented below are one of many possible choices. Fortunately, the major theorems — completeness, compactness, decidability — do not depend sensitively on the exact implementation.
## The Propositional Language
To do logic rigorously, we must first be precise about what counts as a meaningful statement. This requires specifying both the raw symbols and the grammatical rules for forming expressions from them.
[definition: Primitive Propositions and the Language $L(P)$]
Let $P$ be a set of **primitive propositions** — meaningless symbols (such as $p$, $q$, $r$, $p_1$, $p_2$, ...) that serve as the basic building blocks of logical statements.
The set of **propositions** $L$, also written $L(P)$, is defined inductively:
1. If $p \in P$, then $p \in L$.
2. $\bot \in L$, where $\bot$ is a distinguished symbol read as "false".
3. If $p, q \in L$, then $(p \Rightarrow q) \in L$.
To make this inductive definition precise, set
\begin{align*}
L_0 &= \{\bot\} \cup P, \\
L_{n+1} &= L_n \cup \{(p \Rightarrow q) : p, q \in L_n\},
\end{align*}
and define $L = \bigcup_{n=0}^{\infty} L_n$.
[/definition]
In formal language terms, $L$ is the set of finite strings over the alphabet $\{\bot, \Rightarrow, (, ), p_1, p_2, \ldots\}$ satisfying the grammar above. Notice that the only primitive logical connective is $\Rightarrow$; there is no $\neg$, $\wedge$, or $\vee$ built in. This minimalism is a deliberate choice — when proving results about the system, we only need to handle a single connective rather than four.
The familiar symbols are introduced as abbreviations:
[definition: Logical Abbreviations]
The following are **abbreviations** for certain expressions in $L$:
\begin{align*}
\neg p &\quad \text{abbreviates} \quad (p \Rightarrow \bot), \quad \text{"not } p\text{",} \\
p \wedge q &\quad \text{abbreviates} \quad \neg(p \Rightarrow \neg q), \quad \text{"$p$ and $q$",} \\
p \vee q &\quad \text{abbreviates} \quad (\neg p) \Rightarrow q, \quad \text{"$p$ or $q$".}
\end{align*}
[/definition]
These are purely notational conveniences; internally, every proposition is formed from $\bot$ and $\Rightarrow$ alone.
The following example illustrates how the grammar generates increasingly complex propositions and helps to build familiarity with the notation before we assign any meaning to these expressions.
[example: Propositions]
With primitive propositions $P = \{p, q, r\}$, the following are propositions in $L$:
\begin{align*}
&p \Rightarrow q, \\
&p \Rightarrow \bot, \\
&(p \Rightarrow q) \Rightarrow (p \Rightarrow r), \\
&((p \Rightarrow q) \Rightarrow (p \Rightarrow r)) \Rightarrow (p \Rightarrow p).
\end{align*}
The last two illustrate how the inductive grammar allows expressions of arbitrary complexity.
[/example]
## Semantic Entailment
Having defined the language, we now ask: when is a proposition *true*? The semantic approach assigns truth values to propositions and defines correctness in terms of these assignments.
[definition: Valuation]
A **valuation** on $L$ is a function $v: L \to \{0, 1\}$ satisfying:
- $v(\bot) = 0$,
- $v(p \Rightarrow q) = \begin{cases} 0 & \text{if } v(p) = 1 \text{ and } v(q) = 0, \\ 1 & \text{otherwise.} \end{cases}$
We interpret $v(p) = 1$ as "$p$ is true" and $v(p) = 0$ as "$p$ is false". No condition is imposed on $v(p)$ for $p \in P$; primitive propositions may take either truth value.
[/definition]
One may think of a valuation as a homomorphism: equip $\{0, 1\}$ with the binary operation $a \Rightarrow b$ defined by the table above, and a constant $\bot = 0$; then valuations are exactly the structure-preserving maps $L \to \{0, 1\}$.
The key fact about valuations is that they are determined entirely by their values on the primitive propositions. No additional freedom exists once the truth values of primitive propositions are fixed.
[quotetheorem:1451]
[citeproof:1451]
Part (1) makes explicit that truth tables are a complete method for evaluating propositions: to determine whether a proposition is a tautology, it suffices to check all $2^{|P_0|}$ assignments of truth values to the primitive propositions actually appearing in it, where $P_0 \subset P$ is the finite set of those propositions. The theorem also places a hard limit on what extra information could distinguish two valuations — there is none beyond the values on $P$.
The following example puts the inductive evaluation rule into practice and will be used to verify the axioms (A1)–(A3) in the next section.
[example: Computing a Valuation]
Let $v$ be the valuation with $v(p) = 1$, $v(q) = 1$, $v(r) = 0$. Then:
\begin{align*}
v(p \Rightarrow q) &= 1 \quad \text{(since $v(q) = 1$)}, \\
v(q \Rightarrow r) &= 0 \quad \text{(since $v(q) = 1$, $v(r) = 0$)}, \\
v((p \Rightarrow q) \Rightarrow r) &= 0 \quad \text{(since $v(p \Rightarrow q) = 1$, $v(r) = 0$)}.
\end{align*}
[/example]
### Tautologies and Semantic Entailment
Some propositions are true under *every* possible assignment of truth values to primitive propositions. These are the logical laws — the statements that hold independently of any particular interpretation.
[definition: Tautology]
A proposition $t \in L$ is a **tautology**, written $\models t$, if $v(t) = 1$ for every valuation $v$.
[/definition]
Tautologies can be verified using **truth tables**, which enumerate all possible valuations by systematically assigning 0 and 1 to each primitive proposition.
[example: Three Key Tautologies]
The following are tautologies that will serve as our axioms in the deduction system.
**(i)** $\models p \Rightarrow (q \Rightarrow p)$: "A true statement is implied by anything."
| $v(p)$ | $v(q)$ | $v(q \Rightarrow p)$ | $v(p \Rightarrow (q \Rightarrow p))$ |
|--------|--------|----------------------|--------------------------------------|
| 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 |
| 0 | 0 | 1 | 1 |
**(ii)** $\models (\neg\neg p) \Rightarrow p$: "Double negation elimination." Recall $\neg\neg p = (p \Rightarrow \bot) \Rightarrow \bot$.
| $v(p)$ | $v(p \Rightarrow \bot)$ | $v((p \Rightarrow \bot) \Rightarrow \bot)$ | $v((\neg\neg p) \Rightarrow p)$ |
|--------|-------------------------|--------------------------------------------|----------------------------------|
| 1 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 |
**(iii)** $\models [p \Rightarrow (q \Rightarrow r)] \Rightarrow [(p \Rightarrow q) \Rightarrow (p \Rightarrow r)]$: The "distributivity" of implication.
Rather than constructing the full $2^3 = 8$-row truth table, we argue by contradiction. Suppose this is not a tautology, so there exists $v$ with the whole expression false. Then $v(p \Rightarrow (q \Rightarrow r)) = 1$ and $v((p \Rightarrow q) \Rightarrow (p \Rightarrow r)) = 0$. The second forces $v(p \Rightarrow q) = 1$ and $v(p \Rightarrow r) = 0$, hence $v(p) = 1$ and $v(r) = 0$. From $v(p \Rightarrow q) = 1$ and $v(p) = 1$, we get $v(q) = 1$. But then $v(q \Rightarrow r) = 0$, which combined with $v(p) = 1$ gives $v(p \Rightarrow (q \Rightarrow r)) = 0$, a contradiction.
[/example]
The notion of tautology is a special case of a more general concept: the set $S$ of hypotheses may not be empty.
[definition: Semantic Entailment]
For $S \subset L$ and $t \in L$, we say $S$ **semantically entails** $t$, written $S \models t$, if every valuation $v$ that satisfies all $s \in S$ (i.e., $v(s) = 1$ for all $s \in S$) also satisfies $v(t) = 1$.
A valuation $v$ that satisfies all members of $S$ is called a **model** of $S$.
[/definition]
Note that $\models t$ and $\varnothing \models t$ are the same statement: $t$ is a tautology iff it is true in every model of the empty set, i.e., every valuation.
[example: Transitivity of Implication]
$\{p \Rightarrow q,\ q \Rightarrow r\} \models (p \Rightarrow r)$.
Let $v$ be any valuation with $v(p \Rightarrow q) = 1$ and $v(q \Rightarrow r) = 1$; we must show $v(p \Rightarrow r) = 1$. Suppose for contradiction that $v(p \Rightarrow r) = 0$; then $v(p) = 1$ and $v(r) = 0$. If $v(q) = 0$, then $v(p \Rightarrow q) = 0$, contradicting our hypothesis. If $v(q) = 1$, then $v(q \Rightarrow r) = 0$, again a contradiction.
[/example]
It is essential to distinguish the proposition $p \Rightarrow q$ from the semantic entailment relation $\{p\} \models q$. The expression $p \Rightarrow q$ is a syntactic object *within* the language $L$ — it is a string of symbols that a valuation may evaluate to 0 or 1. The statement $\{p\} \models q$ is a claim *about* the language, made in the meta-theory: it asserts that every valuation making $p$ true also makes $q$ true. One lives inside the system; the other lives outside it. The same care will be needed when distinguishing $\Rightarrow$ from the syntactic entailment symbol $\vdash$ introduced in the next section.
## Syntactic Entailment and Formal Proofs
Semantic entailment tells us what *is* true; but we also want a notion of what is *provable*. A formal proof should be a finite, mechanically checkable derivation. To make this precise, we specify axioms and deduction rules.
### The Axioms and Modus Ponens
Our deduction system is built from three axiom schemes — one for each primitive proposition and one for implication — together with a single rule of inference.
The three **axiom schemes** are, for any $p, q, r \in L$:
\begin{align*}
\text{(A1)} &\quad p \Rightarrow (q \Rightarrow p), \\
\text{(A2)} &\quad [p \Rightarrow (q \Rightarrow r)] \Rightarrow [(p \Rightarrow q) \Rightarrow (p \Rightarrow r)], \\
\text{(A3)} &\quad (\neg\neg p) \Rightarrow p.
\end{align*}
The single rule of inference is **modus ponens** (MP): from $p$ and $p \Rightarrow q$, deduce $q$.
Axiom (A1) says a true statement follows from anything. Axiom (A3) is double-negation elimination; it is what forces our logic to be classical rather than intuitionistic. The role of (A2) may seem mysterious at first — its purpose becomes clear in the proof of the Deduction Theorem below, where it is exactly the tool needed to internalize the meta-level entailment relation.
[definition: Proof and Syntactic Entailment]
For $S \subset L$ and $t \in L$, a **proof of $t$ from $S$** is a finite sequence $t_1, t_2, \ldots, t_n$ of propositions with $t_n = t$, where each $t_i$ is either:
1. an instance of one of the axiom schemes (A1)–(A3),
2. a member of $S$ (a **hypothesis**), or
3. obtained by modus ponens from two earlier lines: there exist $j, k < i$ with $t_j = (t_k \Rightarrow t_i)$.
If a proof of $t$ from $S$ exists, we write $S \vdash t$ and say $S$ **syntactically entails** $t$. If $\varnothing \vdash t$, we write $\vdash t$ and call $t$ a **theorem**.
[/definition]
To see that this definition produces a non-trivial system, it helps to exhibit an explicit proof of even the simplest possible theorem. The derivation of $p \Rightarrow p$ from no hypotheses is more involved than one might expect and illustrates why (A2) is indispensable.
[example: Proof of $p \Rightarrow p$]
$\vdash p \Rightarrow p$. The following is a formal proof:
\begin{align*}
&\text{1.} \quad [p \Rightarrow ((p \Rightarrow p) \Rightarrow p)] \Rightarrow [(p \Rightarrow (p \Rightarrow p)) \Rightarrow (p \Rightarrow p)] && \text{(A2)} \\
&\text{2.} \quad p \Rightarrow ((p \Rightarrow p) \Rightarrow p) && \text{(A1)} \\
&\text{3.} \quad (p \Rightarrow (p \Rightarrow p)) \Rightarrow (p \Rightarrow p) && \text{MP on 1, 2} \\
&\text{4.} \quad p \Rightarrow (p \Rightarrow p) && \text{(A1)} \\
&\text{5.} \quad p \Rightarrow p && \text{MP on 3, 4}
\end{align*}
[/example]
A different kind of syntactic derivation arises when hypotheses from $S$ are available. The next example shows how to derive transitivity of implication using the system, a result we previously verified only semantically.
[example: Proof of Transitivity from Hypotheses]
$\{p \Rightarrow q,\ q \Rightarrow r\} \vdash p \Rightarrow r$. The key idea is to produce $(p \Rightarrow q) \Rightarrow (p \Rightarrow r)$ via (A2):
\begin{align*}
&\text{1.} \quad [p \Rightarrow (q \Rightarrow r)] \Rightarrow [(p \Rightarrow q) \Rightarrow (p \Rightarrow r)] && \text{(A2)} \\
&\text{2.} \quad q \Rightarrow r && \text{Hypothesis} \\
&\text{3.} \quad (q \Rightarrow r) \Rightarrow [p \Rightarrow (q \Rightarrow r)] && \text{(A1)} \\
&\text{4.} \quad p \Rightarrow (q \Rightarrow r) && \text{MP on 2, 3} \\
&\text{5.} \quad (p \Rightarrow q) \Rightarrow (p \Rightarrow r) && \text{MP on 1, 4} \\
&\text{6.} \quad p \Rightarrow q && \text{Hypothesis} \\
&\text{7.} \quad p \Rightarrow r && \text{MP on 5, 6}
\end{align*}
[/example]
Writing out formal proofs in this way is extremely laborious. The Deduction Theorem, proved next, provides the main tool for constructing proofs without spelling out every axiom application.
### The Deduction Theorem
The Deduction Theorem says that the syntactic entailment relation $\vdash$ mirrors the behavior of the connective $\Rightarrow$ inside the language: to prove $p \Rightarrow q$ from $S$, it is equivalent to prove $q$ from $S \cup \{p\}$.
[quotetheorem:1452]
[citeproof:1452]
This proof reveals the precise purpose of axiom (A2): it is exactly what is needed to handle the modus ponens case in the backward direction of the Deduction Theorem. The converse also holds in a precise sense: any deduction system admitting modus ponens for which soundness and the Deduction Theorem both hold must have (A1) and (A2) as theorems. Axiom (A3), however, is independent — it is not forced by modus ponens and the Deduction Theorem alone.
The Deduction Theorem is most useful as a labour-saving device: instead of constructing an explicit formal proof of $p \Rightarrow q$ from $S$, one can work informally in $S \cup \{p\}$ and appeal to the theorem at the end.
[example: Using the Deduction Theorem]
To show $\{p \Rightarrow q,\ q \Rightarrow r\} \vdash (p \Rightarrow r)$, the Deduction Theorem reduces this to showing $\{p \Rightarrow q,\ q \Rightarrow r,\ p\} \vdash r$. From the hypotheses, MP gives $q$ (from $p$ and $p \Rightarrow q$), and then $r$ (from $q$ and $q \Rightarrow r$). The Deduction Theorem converts this into the desired $S \vdash p \Rightarrow r$.
[/example]
## Soundness and Completeness
We now have two notions — $S \models t$ (semantic) and $S \vdash t$ (syntactic) — and the central goal is to show they coincide. The result that $\vdash$ and $\models$ are equivalent is the **completeness theorem**. It splits into two halves:
- **Soundness**: If $S \vdash t$, then $S \models t$. (Provable things are true.)
- **Adequacy** (also called completeness in one direction): If $S \models t$, then $S \vdash t$. (True things are provable.)
Soundness is the easy half; adequacy is the deep result.
[definition: Consistency]
$S \subset L$ is **inconsistent** if $S \vdash \bot$. $S$ is **consistent** if it is not inconsistent.
[/definition]
### Soundness
[quotetheorem:1453]
[citeproof:1453]
Soundness is the easier half of completeness, but it is not vacuous: it rules out inconsistent systems. Note carefully what soundness does *not* say: it does not assert that every tautology is provable — that is adequacy, which requires a much deeper argument. Soundness also does not tell us that the axiom schemes (A1)–(A3) are the *only* possible tautologies; there are infinitely many tautologies, and we have chosen only three schemes to generate them all syntactically.
[remark: Soundness Requires Only That Axioms Are Tautologies]
The proof shows that any deduction system whose axioms are all tautologies and whose only rule of inference is modus ponens will be sound. In particular, even a very weak system — one with only the axiom $\vdash p \Rightarrow p$ for each $p$ — would be sound. Soundness alone is a weak requirement; adequacy is where the real content lies.
[/remark]
### Completeness via Model Existence
Adequacy is not proved directly. Instead, we reduce it to a special case: the **Model Existence Theorem**, which asserts that every consistent set of propositions has a model.
To prove the Model Existence Theorem, we face two obstacles. First, the set $S$ may not be *deductively closed* — there may be propositions $p$ with $S \vdash p$ but $p \notin S$. Second, there may be propositions $p$ for which neither $p \in S$ nor $\neg p \in S$; in that case, defining $v(p) = \mathbb{1}[p \in S]$ produces a function that is not guaranteed to be a valuation. The strategy is to enlarge $S$ so that it decides every proposition, without introducing inconsistency. The key auxiliary result that makes this enlargement possible is the following lemma.
[quotetheorem:1454]
[citeproof:1454]
The lemma guarantees that we can always extend a consistent set by deciding any single undecided proposition. It does not, however, specify which side to choose — and indeed both choices may be consistent. What it rules out is the possibility that both extensions are simultaneously inconsistent. This is precisely the handle we need to build, by induction, a maximal consistent extension that decides every proposition.
With the lemma in hand, we can now state and prove the Model Existence Theorem.
[quotetheorem:1455]
Before proving this, let us see why it implies adequacy. Suppose $S \models t$. Then no valuation satisfies all of $S$ and also satisfies $\neg t$, meaning $S \cup \{\neg t\}$ has no model. By the Model Existence Theorem (contrapositive), $S \cup \{\neg t\}$ is inconsistent: $S \cup \{\neg t\} \vdash \bot$. By the Deduction Theorem, $S \vdash \neg t \Rightarrow \bot$, i.e., $S \vdash \neg\neg t$. Since $\vdash (\neg\neg t) \Rightarrow t$ by axiom (A3), modus ponens gives $S \vdash t$. This is exactly adequacy.
[citeproof:1455]
The argument in Step 5 is where axiom (A3) plays its essential role: without double-negation elimination, one cannot derive $q$ from $\bot$, and the valuation would fail to be well-defined.
[remark: The Role of Axiom (A3)]
The final case of the proof — when $v(p) = 0$ — is where (A3) is used essentially for the first time. Without double-negation elimination, one cannot conclude that $\bot$ entails every proposition; this is exactly the classical step that distinguishes classical logic from intuitionistic logic. Dropping (A3) would make the completeness theorem fail for the resulting system.
[/remark]
Combining the Model Existence Theorem with the reduction argument above yields:
[quotetheorem:1456]
Together with Soundness, this gives the main theorem of the chapter:
[quotetheorem:1457]
The Completeness Theorem says that semantics and syntax are perfectly aligned: every semantically valid inference is syntactically derivable, and every syntactic derivation is semantically valid. This is a non-trivial fact — it required a substantial construction (the maximal consistent extension) to prove. It would be entirely possible in principle for the two notions to diverge, with semantic truths that admit no proof.
## Compactness and Decidability
The Completeness Theorem has two important corollaries. Both are facts about $\models$, but they are proved by passing through $\vdash$ where the corresponding statement is transparent.
### Compactness
[quotetheorem:1458]
[citeproof:1458]
The Compactness Theorem can be equivalently stated as: if every finite subset of $S$ has a model, then $S$ itself has a model. (Take the contrapositive with $t = \bot$.) This formulation resembles compactness in topology — a set is "compact" if covers by open sets have finite subcovers — which is where the name comes from.
[remark: Applications of Compactness]
The Compactness Theorem has striking applications outside logic, particularly in combinatorics and graph theory. For instance, it can be used to show that if every finite subgraph of an infinite graph is $k$-colourable, then the whole graph is $k$-colourable. The argument encodes graph colouring as a propositional theory and applies Compactness. These applications show that the theorem is not merely a technical consequence of the proof system but has genuine mathematical content.
[/remark]
### Decidability
Compactness addresses the relationship between global and local semantic truth. A separate question is whether syntactic derivability is mechanically checkable: given a finite set $S$ and a proposition $t$, can a computer determine in finite time whether $S \vdash t$?
[quotetheorem:1459]
[citeproof:1459]
The Decidability Theorem is remarkably useful, but it must be interpreted carefully. It tells us that an *algorithm exists* to decide $S \vdash t$; it does not tell us that finding the proof itself is easy. Indeed, the truth-table algorithm has exponential running time in the number of primitive propositions, and the question of whether a substantially better algorithm exists is the celebrated P vs NP problem (satisfiability of propositional formulas is NP-complete). The theorem also applies only to propositional logic; predicate logic, studied in Chapter 5, is in general undecidable.
Propositional logic, though decidable and complete, operates on atomic propositions without structure; to count transfinite sets and reason about infinite hierarchies, we need a richer framework. The well-orderings and ordinals provide precisely this — a way to extend the natural numbers beyond infinity and establish a fundamental ordering principle for all sets.
# 2. Well-orderings and Ordinals
This chapter develops the theory of well-orderings and ordinals, which together form the backbone of how mathematicians "count beyond infinity." The central idea is simple to state: instead of studying specific well-ordered sets, we want to study their *lengths* — the abstract measurement of how long a well-ordering is, independent of what its elements happen to be. These lengths are the ordinals. Once we have ordinals, we can define arithmetic on them, carry out induction and recursion in settings far more general than the natural numbers, and establish the existence of uncountable well-orderings via Hartogs' lemma. This chapter is the starting point for everything that follows in the theory of ordinals and cardinals.
## Well-orderings
The entire theory rests on a particular kind of ordering that generalises the familiar structure of $\mathbb{N}$. Before arriving at the central definition, we recall the broader setting.
A *total order* (or *linear order*) is the basic context: a set equipped with a relation that allows any two elements to be compared.
[definition: Strict Total Order]
A **strict total order** is a pair $(X, <)$ where $X$ is a set and $<$ is a relation on $X$ satisfying:
1. $x \not< x$ for all $x \in X$ (irreflexivity),
2. if $x < y$ and $y < z$ then $x < z$ (transitivity),
3. for all $x, y \in X$, exactly one of $x < y$, $x = y$, $y < x$ holds (trichotomy).
One writes $x > y$ to mean $y < x$, and $x \leq y$ to mean ($x < y$ or $x = y$).
[/definition]
An equivalent formulation replaces $<$ by a non-strict relation $\leq$, requiring reflexivity, transitivity, antisymmetry, and totality. The two formulations are interchangeable in practice.
[example: Total Orders and Non-Examples]
The sets $\mathbb{N}, \mathbb{Z}, \mathbb{Q}, \mathbb{R}$ with their usual orderings are all total orders. Two illuminating non-examples show that neither divisibility nor inclusion is total: on $\mathbb{N}^+$, the relation "$x < y$ if $x \mid y$ and $x \neq y$" fails trichotomy (consider $2$ and $3$); on $\mathcal{P}(S)$ for $|S| > 1$, inclusion fails trichotomy (consider two distinct singletons).
[/example]
The crucial additional condition that gives us well-orderings is that every non-empty subset has a least element — a global minimum, not merely a local one.
[definition: Well-ordering]
A total order $(X, <)$ is a **well-ordering** if every non-empty subset $S \subseteq X$ has a least element: for all $S \subseteq X$ with $S \neq \varnothing$, there exists $x \in S$ such that $y \geq x$ for all $y \in S$.
[/definition]
This single condition has profound consequences. The intuition is that a well-ordered set has no "downward infinite descent" — you cannot keep finding smaller and smaller elements indefinitely.
[example: Well-orderings and Failures]
The archetypal well-ordering is $(\mathbb{N}, <)$. By contrast, $\mathbb{Z}$, $\mathbb{Q}$, and $\mathbb{R}$ fail to be well-ordered: the whole set has no least element. Even $\{x \in \mathbb{Q} : x \geq 0\}$ fails, since $\{x \in \mathbb{Q} : x > 0\}$ has no minimum. A subtler example: in $\mathbb{R}$, the set $\{1 - 1/n : n = 2, 3, 4, \ldots\}$ is well-ordered (it is order-isomorphic to $\mathbb{N}$). Adding the point $1$ to get $\{1 - 1/n : n = 2, 3, 4, \ldots\} \cup \{1\}$ preserves the well-ordering: for any non-empty subset, if it contains $1$ alongside other elements, the minimum of the remaining part serves; if it contains only $1$, then $1$ itself is the minimum. Similarly, $\{1 - 1/n\} \cup \{2 - 1/n\}$ — two "copies" of $\mathbb{N}$ placed end to end — is well-ordered, and will later serve as the model for the ordinal $\omega \cdot 2$.
[/example]
There is an equivalent characterisation of well-orderings in terms of sequences, which is often easier to check in practice.
[quotetheorem:1460]
[citeproof:1460]
### Order Isomorphisms
Since we want to study well-orderings up to their abstract structure — ignoring what the elements are and caring only about their order — we need the correct notion of equivalence.
[definition: Order Isomorphism]
Two total orders $(X, <_X)$ and $(Y, <_Y)$ are **isomorphic** if there exists a bijection $f: X \to Y$ that is order-preserving: $x <_X y \implies f(x) <_Y f(y)$.
Such an $f$ is called an **order isomorphism**. Since $f$ is a bijection and order-preserving, its inverse $f^{-1}$ is also order-preserving, so being isomorphic is a symmetric relation.
[/definition]
The definition captures structural identity: two isomorphic orders look the same, even if their elements differ. For well-orderings this identification is especially clean, because isomorphisms between them are unique — but to appreciate why this is remarkable, one should first see that isomorphic well-orderings can arise from very different-looking sets.
[example: Isomorphic Well-orderings]
The set $\mathbb{N}$ is isomorphic to $\{1 - 1/n : n = 2, 3, 4, \ldots\}$ via the map $n \mapsto 1 - 1/(n+2)$. On the other hand, $\{1 - 1/n : n = 2, 3, \ldots\} \cup \{1\}$ is not isomorphic to $\{1 - 1/n : n = 2, 3, \ldots\}$: the former has a greatest element (namely $1$) while the latter does not, and any isomorphism must preserve this property.
[/example]
For general total orders like $\mathbb{Z}$, there can be many isomorphisms from the structure to itself: $x \mapsto x$ and $x \mapsto x - 13$ are both automorphisms of $(\mathbb{Z}, <)$. For well-orderings, this multiplicity disappears entirely — a remarkable rigidity that we now establish.
[quotetheorem:1461]
[citeproof:1461]
This uniqueness is a special feature of well-orderings. It does not hold for arbitrary total orders (as the automorphisms of $\mathbb{Z}$ show), nor for algebraic structures like groups (where $\mathbb{Z}_3$ has two automorphisms).
### Initial Segments and Induction
A central tool in the theory is the notion of an initial segment: a "downward-closed" part of an ordered set.
[definition: Initial Segment]
A subset $Y \subseteq X$ of a totally ordered set $(X, <)$ is an **initial segment** if whenever $x \in Y$ and $y < x$, we have $y \in Y$. Equivalently, $Y$ is a "downward-closed" subset.
[/definition]
For any element $x \in X$, the set $I_x := \{y \in X : y < x\}$ is an initial segment, called the **initial segment determined by $x$**. In a general total order, not every initial segment need be of this form: in $\mathbb{R}$, the set $(-\infty, 3] = \{x \in \mathbb{R} : x \leq 3\}$ is an initial segment, but there is no element $a \in \mathbb{R}$ with $I_a = (-\infty, 3]$ (that would require $a$ to be the least element above all of $(-\infty, 3]$, i.e., $a = 3$ — but $I_3 = (-\infty, 3)$, not $(-\infty, 3]$). Similarly, $\{x \in \mathbb{Q} : x^2 < 2\}$ is an initial segment of $\mathbb{Q}$ not of the form $I_q$ for any $q \in \mathbb{Q}$. For well-orderings, however, this kind of "gap at the top" cannot occur: every proper initial segment is determined by a single element.
[quotetheorem:1462]
[citeproof:1462]
The contrast with general total orders is worth emphasising. In $\mathbb{R}$, any half-line $(-\infty, b]$ is an initial segment, and these are parameterised not just by elements of $\mathbb{R}$ but by points of $\mathbb{R} \cup \{+\infty\}$; there are initial segments with no "determining element" inside the set. In a well-ordering, the structure is tight: every proper downward-closed subset is determined by the unique least element it excludes. This rigidity is what makes well-orderings suitable as the common representation for ordinal lengths.
Well-orderings support a powerful induction principle, the direct generalisation of strong induction on $\mathbb{N}$ to any well-ordered set.
[quotetheorem:1463]
[citeproof:1463]
Note that for $x = \min X$, the hypothesis "$\{y : y < x\} \subseteq S$" is vacuously true (there are no elements below the minimum), so the base case is automatic. The induction principle generalises strong induction: ordinary strong induction on $\mathbb{N}$ is the special case $X = \mathbb{N}$.
## New Well-orderings from Old
Given well-orderings, there are systematic ways to build new, typically longer ones. We describe three constructions: taking subsets, extending by one element, and gluing a nested family together.
### Subset Collapse
A first, somewhat surprising, fact is that every subset of a well-ordering is itself isomorphic to an initial segment of the original — not merely a subset, but an initial segment. This is false for general total orders: no initial segment of $\mathbb{Z}$ is isomorphic to $\{1, 2, 3\}$, since every initial segment of $\mathbb{Z}$ is empty or infinite.
To prove the subset collapse lemma, we need the machinery of definition by recursion — the formal justification for defining a function on a well-ordered set by specifying each value in terms of all earlier values.
[quotetheorem:1464]
[citeproof:1464]
The recursion theorem gives a clean way to define functions on well-ordered sets inductively — it is the rigorous foundation for all "define by induction" arguments that follow.
With the recursion theorem in hand, we can now prove the subset collapse lemma.
[quotetheorem:1465]
[citeproof:1465]
A key consequence: a well-ordering $X$ can never be isomorphic to a proper initial segment of itself, since the identity is the unique isomorphism from $X$ to itself (by uniqueness of isomorphisms), and the image of any isomorphism from $X$ to an initial segment would have to be all of $X$.
### The Successor Construction
The simplest way to extend a well-ordering is to append a single new element at the top.
[definition: Successor of a Well-ordering]
Given a well-ordering $(X, <)$, choose any element $x^* \notin X$ and define a well-ordering on $X \cup \{x^*\}$ by declaring $y < x^*$ for all $y \in X$ and retaining the order on $X$. This is the **successor** of $X$, written $X^+$. The isomorphism class of $X^+$ does not depend on the choice of $x^*$.
[/definition]
We have $X < X^+$ in the ordering of well-orderings: $X$ embeds as a proper initial segment of $X^+$ (namely the initial segment determined by $x^*$). The successor construction is the simplest way to produce a strictly longer well-ordering, and it corresponds, at the level of ordinals, to adding $1$.
### Gluing a Nested Family
To go further — constructing well-orderings that are genuinely "longer" than any individual member of some collection — we need a way to stitch well-orderings together.
[definition: Extension and Nested Family]
A well-ordering $(Y, <_Y)$ **extends** $(X, <_X)$ if $X$ is a proper initial segment of $Y$ and the two orders agree on $X$. A family $\{X_i : i \in I\}$ of well-orderings is **nested** if for any $i, j \in I$, either $X_i$ extends $X_j$ or $X_j$ extends $X_i$.
[/definition]
The condition that $X$ be an initial segment of $Y$ (not just a subset) is essential: without it, taking the union of $\{x \in \mathbb{Z} : x \geq -n\}$ over $n \in \mathbb{N}$ gives $\mathbb{Z}$, which is not well-ordered.
[quotetheorem:1466]
[citeproof:1466]
The "nested" (initial-segment) condition is doing essential work here. A family of well-orderings that are merely *compatible* — meaning they agree wherever both are defined, but neither need be an initial segment of the other — need not have a well-ordered union. For instance, take any well-ordering $X$ and two distinct final segments $X \setminus I_{a}$ and $X \setminus I_{b}$ with $a \neq b$; they agree on their overlap but neither extends the other, and their union is all of $X$ (fine in this case), yet the construction can fail in more exotic configurations. The point is that without the initial-segment condition, a minimal element of $S$ chosen inside $X_i$ might be "beaten" by elements of $X_j$ that are below it in $X_j$'s order but were not in $X_i$'s domain. The initial-segment condition rules this out by ensuring every element of $X_i$ is above everything outside $X_i$.
### Comparing Well-orderings
The subset collapse lemma gives us a natural total ordering on well-orderings themselves.
Write $X \leq Y$ if $X$ is isomorphic to an initial segment of $Y$ (including all of $Y$ itself), and $X < Y$ if $X$ is isomorphic to a proper initial segment of $Y$. We identify two well-orderings when they are isomorphic.
This comparison is indeed a total order:
[quotetheorem:1467]
[citeproof:1467]
This totality theorem is the structural backbone of the ordinal hierarchy: it says that no two ordinals are "incomparable." It is worth noting that an analogous statement for cardinals — for any two sets, one injects into the other — requires the Axiom of Choice and is in fact equivalent to it. For well-orderings, totality is a theorem of ZF alone, without Choice; it follows from the ability to compare well-orderings by the explicit recursion in the proof above. This distinction between ordinal comparability (provable in ZF) and cardinal comparability (requires Choice) is a subtle but important point in foundations.
[quotetheorem:1468]
[citeproof:1468]
Together with reflexivity and transitivity (which are straightforward), these results show that the comparison of well-orderings is a total order.
## Ordinals
The key insight of ordinal theory is to treat the comparison order on well-orderings not as a relation between specific sets, but as a hierarchy of abstract "lengths." Each isomorphism class of well-orderings is an *ordinal*.
[definition: Ordinal and Order Type]
An **ordinal** is an isomorphism class of well-orderings. We write ordinals as Greek letters $\alpha, \beta, \gamma, \ldots$. If a well-ordering $X$ belongs to the class $\alpha$, we say $X$ has **order type** $\alpha$ and write $\operatorname{otp}(X) = \alpha$.
[/definition]
A remark is in order: the definition above is informal, because the collection of all well-orderings is too large to form a set — hence "isomorphism class" cannot mean a set in the ordinary sense. A formal set-theoretic definition of ordinals (as transitive sets well-ordered by $\in$) is given in Chapter 6. For now, we treat ordinals as the canonical representatives of well-ordering lengths.
[definition: Standard Ordinal Notation]
For each $k \in \mathbb{N}$, write $k$ for the order type of the unique well-ordering of size $k$ (a $k$-element set with any total order, all of which are isomorphic). Write $\omega$ for the order type of $(\mathbb{N}, <)$.
[/definition]
[example: Computing Order Types]
In $\mathbb{R}$, the set $\{2, 3, 5, 6\}$ with its inherited order has order type $4$. The set $\{1 - 1/n : n = 2, 3, 4, \ldots\}$ has order type $\omega$, via the isomorphism $n \mapsto 1 - 1/(n+2)$.
[/example]
### Comparing Ordinals
For ordinals $\alpha, \beta$, write $\alpha \leq \beta$ if some (equivalently, every) well-ordering of type $\alpha$ embeds as an initial segment of some (equivalently, every) well-ordering of type $\beta$. Write $\alpha < \beta$ for strict inequality.
A beautiful self-referential property of ordinals is that each ordinal "is" the set of all smaller ordinals, ordered by the comparison $<$.
[quotetheorem:1469]
[citeproof:1469]
This means, in set-theoretic language, that the ordinals "are" their own elements ordered by $\in$. When we later define ordinals set-theoretically (as transitive sets well-ordered by $\in$), this proposition becomes literal.
### The Ordinals are Well-ordered
[quotetheorem:1470]
[citeproof:1470]
However, we cannot say the ordinals form a well-ordered *set* — they form a proper class, as the following argument shows.
[quotetheorem:1471]
[citeproof:1471]
The Burali-Forti paradox is not a contradiction within our theory — it simply means the ordinals form a *proper class*, a collection too large to be a set. This will be formalised in the chapter on ZF set theory.
### Hartogs' Lemma and the First Uncountable Ordinal
So far, starting from $\mathbb{N}$, we can produce ordinals by repeatedly taking successors and countable suprema. All such ordinals are countable. A fundamental question is whether there is an uncountable ordinal. Hartogs' lemma answers this in remarkable generality.
[quotetheorem:1472]
[citeproof:1472]
The proof produces the Hartogs number as a supremum of a set of ordinals, making it the smallest ordinal failing to inject into $X$. This construction warrants a name.
[definition: Hartogs Number]
For a set $X$, write $\gamma(X)$ for the least ordinal that does not inject into $X$. This is called the **Hartogs number** of $X$.
[/definition]
Applying Hartogs' lemma to $X = \omega = \mathbb{N}$ gives the first uncountable ordinal:
[example: The First Uncountable Ordinal $\omega_1$]
Let $B$ be the set of all countable ordinals (those injecting into $\mathbb{N}$), and set $\omega_1 = \sup B = \gamma(\omega)$. Then $\omega_1$ is the **least uncountable ordinal**. Indeed, if $\omega_1$ were countable, it would belong to $B$, so $\omega_1 < \sup B = \omega_1$, a contradiction. Every proper initial segment $\{\beta : \beta < \xi\}$ for $\xi < \omega_1$ is countable.
Two notable properties of $\omega_1$:
- Although $\omega_1$ itself is uncountable, every element $\xi \in \omega_1$ has a countable initial segment $\{\beta : \beta < \xi\}$.
- Every sequence $(\xi_n)_{n \in \mathbb{N}}$ of elements of $\omega_1$ is bounded: $\sup_n \xi_n$ is a countable union of countable sets, hence countable, so $\sup_n \xi_n < \omega_1$.
[/example]
## Successors and Limits
Every ordinal is built from two elementary operations: taking a successor (adding one to the top) or taking a supremum. This leads to a fundamental dichotomy.
Given an ordinal $\alpha$, consider the set $I_\alpha = \{\beta : \beta < \alpha\}$. Either this set has a greatest element, or it does not. In the first case, $\alpha$ is a successor ordinal; in the second, it is a limit ordinal.
[definition: Successor Ordinal]
An ordinal $\alpha$ is a **successor ordinal** if $I_\alpha = \{\beta : \beta < \alpha\}$ has a greatest element $\beta$. In this case $\alpha = \beta^+$, the successor of $\beta$. One writes $\alpha = \beta + 1$ informally.
[/definition]
The next definition captures the complementary case: ordinals that are not successors, but rather accumulation points of the ordinal hierarchy.
[definition: Limit Ordinal]
An ordinal $\alpha$ is a **limit ordinal** if $I_\alpha$ has no greatest element. Equivalently, $\alpha = \sup\{\beta : \beta < \alpha\}$. We conventionally denote limit ordinals by $\lambda$. The ordinal $0$ is a limit ordinal by convention: it has no element below it, hence no greatest element below it.
[/definition]
To consolidate these definitions, it helps to work through the first few ordinals concretely, since the distinction between successors and limits drives every recursive construction that follows.
[example: Classifying Small Ordinals]
The ordinals $1, 2, 3, 4, 5$ and $\omega + 1$ are successors ($1 = 0^+$, $2 = 1^+$, $\omega + 1 = \omega^+$, etc.). The ordinals $0$ and $\omega$ are limits. For $\omega$: every $n < \omega$ satisfies $n + 1 < \omega$, so $n$ is not the greatest element below $\omega$; thus $\omega = \sup\{n : n < \omega\}$, confirming it is a limit. The distinction is important in recursive definitions, which treat the two cases differently.
[/example]
The successor/limit dichotomy gives a structured picture of how ordinals grow. From small to large:
\begin{align*}
0,\; 1,\; 2,\; \ldots,\; \omega,\; \omega + 1,\; \omega + 2,\; \ldots,\; \omega \cdot 2,\; \omega \cdot 2 + 1,\; \ldots,\; \omega^2,\; \ldots,\; \omega^\omega,\; \ldots,\; \varepsilon_0,\; \ldots
\end{align*}
Here $\omega \cdot 2$, $\omega^2$, $\omega^\omega$, and $\varepsilon_0$ are all limit ordinals, while $\omega + 1$, $\omega \cdot 2 + 1$ are successors. The formal meaning of each expression will be established in the next section.
## Ordinal Arithmetic
We want to define addition, multiplication, and exponentiation of ordinals so that notations like $\omega + \omega$ and $\omega^2$ acquire precise meaning. Each operation is defined by transfinite recursion on the right-hand argument. The result is arithmetic that extends the familiar operations on $\mathbb{N}$ but loses commutativity.
### Ordinal Addition
Two natural candidate definitions of "addition" compete for the name. One can try to add $\alpha$ and $\beta$ by appending $\beta$ after $\alpha$ (placing $\beta$ to the right), or by appending $\alpha$ before $\beta$ (placing $\alpha$ to the left). These are genuinely different because the resulting order types depend on which argument "comes last" — and it is the last block that determines whether the result has a greatest element or accumulates at a limit. The convention of recursing on the second argument, $\alpha + \beta$ meaning "$\alpha$ then $\beta$," is the one that makes the recursion at limit ordinals produce a true supremum of previously computed values. Recursing on the first argument would not have this property, since extending a block on the left does not accumulate toward a limit in the same way. The consequence of this choice is that addition is non-commutative: $1 + \omega \neq \omega + 1$.
[definition: Ordinal Addition (Inductive)]
For ordinals $\alpha$, define $\alpha + \beta$ by transfinite recursion on $\beta$:
\begin{align*}
\alpha + 0 &= \alpha, \\
\alpha + \beta^+ &= (\alpha + \beta)^+, \\
\alpha + \lambda &= \sup\{\alpha + \gamma : \gamma < \lambda\} \quad \text{for non-zero limit } \lambda.
\end{align*}
[/definition]
An equivalent *synthetic* definition makes the geometric picture transparent: $\alpha + \beta$ is the order type of the disjoint union $\alpha \sqcup \beta$ with all elements of $\alpha$ placed before all elements of $\beta$ in their respective orders.
[definition: Ordinal Addition (Synthetic)]
$\alpha + \beta$ is the order type of $\alpha \times \{0\} \cup \beta \times \{1\}$, where $(x, 0) < (y, 1)$ always, $(x, 0) < (x', 0)$ iff $x <_\alpha x'$, and $(y, 1) < (y', 1)$ iff $y <_\beta y'$.
[/definition]
Geometrically: $\alpha + \beta$ means "write out all of $\alpha$, then write out all of $\beta$."
[example: Non-commutativity of Addition]
The key computation is $1 + \omega \neq \omega + 1$. By the synthetic definition, $\omega + 1 = \omega^+$ is a successor ordinal with a greatest element. But $1 + \omega$ places a single point before $\omega$; any such point has $\omega$ many elements above it, so the resulting order type is still $\omega$ (the map sending the extra point to $0$ and $n \in \omega$ to $n + 1$ is an isomorphism). Thus $1 + \omega = \omega \neq \omega + 1$.
More generally, for any finite $n$, we have $n + \omega = \omega$: finitely many points before an $\omega$-chain do not change the order type.
[/example]
This asymmetry arises directly from the choice to recurse on $\beta$ rather than $\alpha$. Addition is, however, associative:
[quotetheorem:1473]
[citeproof:1473]
It is worth recording that ordinal addition is left-cancellative but not right-cancellative: $\alpha + \beta < \alpha + \gamma$ whenever $\beta < \gamma$, but $\beta + \alpha = \gamma + \alpha$ does not imply $\beta = \gamma$ (e.g., $1 + \omega = 2 + \omega = \omega$).
### Ordinal Multiplication
The name "multiplication" could reasonably refer to several operations on ordinals. A natural first guess — the pointwise product, taking two ordinals and somehow "multiplying their sizes" — fails to produce a well-defined order type. The correct definition comes from the lexicographic picture: $\alpha \cdot \beta$ is the order type of $\beta$ many disjoint copies of $\alpha$ arranged in sequence, which is the order type of $\alpha \times \beta$ under the lexicographic order with $\beta$ as the more significant digit. This choice recurses on the second argument: adding one more copy of $\alpha$ at the successor step, and taking the supremum at the limit step. As with addition, recursing on the second argument (not the first) is what makes the operation continuous at limits and consistent with the inductive definition.
[definition: Ordinal Multiplication (Inductive)]
For ordinals $\alpha$, define $\alpha \cdot \beta$ by transfinite recursion on $\beta$:
\begin{align*}
\alpha \cdot 0 &= 0, \\
\alpha \cdot \beta^+ &= \alpha \cdot \beta + \alpha, \\
\alpha \cdot \lambda &= \sup\{\alpha \cdot \gamma : \gamma < \lambda\} \quad \text{for non-zero limit } \lambda.
\end{align*}
[/definition]
The synthetic definition makes the structure visible: $\alpha \cdot \beta$ is the order type of $\alpha \times \beta$, ordered *lexicographically with $\beta$ as the more significant digit* — that is, $(x, y) < (x', y')$ if $y <_\beta y'$, or $y = y'$ and $x <_\alpha x'$. Geometrically, this is $\beta$ many copies of $\alpha$ placed end to end.
[example: Multiplication is Non-commutative]
Compute $2 \cdot \omega$ and $\omega \cdot 2$. By the synthetic definition, $\omega \cdot 2 = \omega + \omega$ (two copies of $\omega$ in sequence). For $2 \cdot \omega$: this is $\omega$ many copies of $\{0, 1\}$, i.e., the order type of $\{0,1\} \times \omega$ with the above lexicographic order. The pairs $(0, 0) < (1, 0) < (0, 1) < (1, 1) < (0, 2) < \cdots$ form an $\omega$-sequence, so $2 \cdot \omega = \omega$. Thus $2 \cdot \omega = \omega \neq \omega + \omega = \omega \cdot 2$.
Similarly, $\omega^2 = \omega \cdot \omega = \sup\{\omega \cdot n : n < \omega\} = \sup\{\omega, \omega + \omega, \omega + \omega + \omega, \ldots\}$, the order type of $\omega \times \omega$ with the lexicographic order described.
[/example]
Multiplication distributes over addition on the left: $\alpha \cdot (\beta + \gamma) = \alpha \cdot \beta + \alpha \cdot \gamma$, proved by induction on $\gamma$. Right distributivity fails: $(1 + 1) \cdot \omega = 2 \cdot \omega = \omega$ but $1 \cdot \omega + 1 \cdot \omega = \omega + \omega$. Multiplication is associative, as one verifies by transfinite induction on the third argument: the base and successor cases follow directly from the definition and left distributivity, while the limit case uses the fact that $\beta \cdot \lambda$ is a non-zero limit whenever $\lambda$ is, by an argument parallel to the one for addition.
### Ordinal Exponentiation
The sequence $\omega, \omega \cdot \omega, \omega \cdot \omega \cdot \omega, \ldots$ — obtained by multiplying $\omega$ by itself repeatedly — grows without bound, and each term is a limit ordinal strictly larger than the previous. The question naturally arises: is there an ordinal that equals "all of $\omega, \omega^2, \omega^3, \ldots$ combined"? And if we form $\omega^{\omega}$, then $\omega^{\omega^{\omega}}$, and so on, does the resulting tower eventually stabilise? These questions motivate ordinal exponentiation: we need a single recursive definition that captures $\omega^n$ for each $n$ and whose limit at $\omega$ gives a meaningful supremum.
[definition: Ordinal Exponentiation (Inductive)]
For ordinals $\alpha$, define $\alpha^\beta$ by transfinite recursion on $\beta$:
\begin{align*}
\alpha^0 &= 1, \\
\alpha^{\beta^+} &= \alpha^\beta \cdot \alpha, \\
\alpha^\lambda &= \sup\{\alpha^\gamma : \gamma < \lambda\} \quad \text{for non-zero limit } \lambda.
\end{align*}
[/definition]
The limit clause says that $\alpha^\lambda$ is the supremum of all previously computed powers — it is the "accumulation point" of the tower of powers. As with multiplication, this definition recurses on the exponent (right argument), ensuring continuity at limits.
[example: Exponentiation Computations]
By direct computation: $\omega^1 = \omega^0 \cdot \omega = 1 \cdot \omega = \omega$; $\omega^2 = \omega^1 \cdot \omega = \omega \cdot \omega$; $\omega^3 = \omega^2 \cdot \omega$; and in general $\omega^n$ for $n < \omega$ is a finite tower of $\omega$-multiplications. Then $\omega^\omega = \sup\{\omega^n : n < \omega\}$, the supremum of all finite tower lengths.
Note the asymmetry: $2^\omega = \sup\{2^n : n < \omega\} = \sup\{1, 2, 4, 8, \ldots\} = \omega$, a countable ordinal, while $\omega^2 = \omega \cdot \omega$ is much larger (in cardinality terms, both remain countable, but as ordinals $\omega < \omega^2$).
[/example]
### The Ordinal $\varepsilon_0$
The definition of ordinal exponentiation raises a natural question: can the tower $\omega, \omega^\omega, \omega^{\omega^\omega}, \ldots$ ever stabilise at a fixed point? That is, is there an ordinal $\alpha$ with $\omega^\alpha = \alpha$, so that the map $\alpha \mapsto \omega^\alpha$ can go no further? Such a fixed point would represent a kind of "closure" under all the operations $+$, $\cdot$, and $\beta \mapsto \omega^\beta$ that we have defined so far. The answer is yes, and the smallest such fixed point is $\varepsilon_0$.
[definition: The Ordinal $\varepsilon_0$]
Define the sequence $\omega_0 = \omega$ and $\omega_{n+1} = \omega^{\omega_n}$ for each $n < \omega$. Set
\begin{align*}
\varepsilon_0 = \sup\{\omega_n : n < \omega\} = \sup\{\omega,\, \omega^\omega,\, \omega^{\omega^\omega},\, \ldots\}.
\end{align*}
This is the **smallest ordinal $\alpha$ satisfying $\omega^\alpha = \alpha$** — the least fixed point of the map $\alpha \mapsto \omega^\alpha$.
[/definition]
That $\varepsilon_0$ indeed satisfies $\omega^{\varepsilon_0} = \varepsilon_0$ follows from normality of $\alpha \mapsto \omega^\alpha$ (proved in the next section): $\omega^{\varepsilon_0} = \sup\{\omega^{\omega_n} : n < \omega\} = \sup\{\omega_{n+1} : n < \omega\} = \varepsilon_0$. The sequence $0, 1, 2, \ldots, \omega, \omega + 1, \ldots, \omega \cdot 2, \ldots, \omega^2, \ldots, \omega^\omega, \ldots, \varepsilon_0$ represents the hierarchy of ordinals obtainable by starting from $0$ and applying successor, addition, multiplication, and exponentiation with base $\omega$ finitely many times.
## Normal Functions
*The material in this section was not lectured; it is included as a starred extension.*
When studying functions from ordinals to ordinals, the most tractable and useful class are those that respect the ordinal structure: they are increasing and preserve suprema of bounded sets. These are the *normal functions*, and they appear naturally whenever one constructs "enumerating functions" for closed unbounded classes of ordinals.
[definition: Normal Function]
A function $f: \mathrm{On} \to \mathrm{On}$ (where $\mathrm{On}$ denotes the class of all ordinals) is **normal** if:
1. $f(\alpha) < f(\alpha^+)$ for every ordinal $\alpha$ (strictly increasing at successors), and
2. $f(\lambda) = \sup\{f(\gamma) : \gamma < \lambda\}$ for every non-zero limit ordinal $\lambda$ (continuity at limits).
[/definition]
The two conditions together capture the idea that $f$ "commutes with the ordinal structure." One may equivalently define normal functions as those that are strictly increasing and preserve all suprema. The equivalence follows from the lemmas below.
[example: Natural Examples of Normal Functions]
For any $\beta > 1$, the map $\alpha \mapsto \beta^\alpha$ is normal. Condition (1): $\beta^{\alpha^+} = \beta^\alpha \cdot \beta > \beta^\alpha$ since $\beta > 1$. Condition (2): $\beta^\lambda = \sup\{\beta^\gamma : \gamma < \lambda\}$ by definition of ordinal exponentiation at limits. Similarly, $\alpha \mapsto \alpha + \beta$ (for fixed $\beta > 0$) is normal.
[/example]
### Basic Properties of Normal Functions
A normal function $f$ is defined to be strictly increasing at successors and continuous at limits — but why are these the "right" hypotheses? Strict increase at successors is clearly necessary if $f$ is to be injective (and thus track ordinals faithfully). Continuity at limits is the condition that makes $f$ compatible with the ordinal construction of limits as suprema: if $\lambda = \sup_{\gamma < \lambda} \gamma$, then one wants $f(\lambda) = \sup_{\gamma < \lambda} f(\gamma)$ so that $f$ does not "jump over" limit ordinals. Without continuity, a function could be strictly increasing yet fail to be dominated by the identity (e.g., define $f(0) = 1$, $f(\alpha) = \alpha$ for $\alpha \geq 1$: this is not continuous at $\omega$ since $f(\omega) = \omega$ but the limit clause would give $\omega$ anyway — but one can construct discontinuous examples that "compress" large cardinals). The theorem below shows that the two conditions together prevent any compression of the ordinal hierarchy.
[quotetheorem:1474]
[citeproof:1474]
The fact that $f(\alpha) \geq \alpha$ means normal functions cannot "compress" the ordinals. The following lemma extends the continuity property from bounded sequences to arbitrary sets.
[quotetheorem:1475]
[citeproof:1475]
The "non-empty" hypothesis in this theorem is necessary. If the collection were empty, the supremum of the empty set of ordinals is $0$ (the least ordinal, serving as the infimum of $\mathrm{On}$), and $f(0)$ need not equal $0$. For instance, $f(\alpha) = \alpha + 1$ is normal, and $f(0) = 1 \neq 0$. More generally, the statement "$f(\sup \varnothing) = \sup_{i \in \varnothing} f(\alpha_i)$" would read "$f(0) = 0$," which fails for any normal function with $f(0) > 0$.
### Fixed Points of Normal Functions
Every normal function has fixed points — ordinals $\beta$ where $f(\beta) = \beta$. This is a powerful and non-obvious fact.
[quotetheorem:1476]
[citeproof:1476]
The construction of $\beta$ as a supremum of an iterated sequence is a standard ordinal-theoretic technique: iterate the function $\omega$ many times, then take the supremum. The continuity of $f$ at the resulting limit ordinal collapses the iteration.
[remark: The Veblen Hierarchy]
Since the set of fixed points of a normal function $f$ is itself closed and unbounded in the ordinals (the fixed points form a proper class), the function enumerating them is again normal. This observation leads to the **Veblen hierarchy**: $\varphi_0(\alpha) = \omega^\alpha$, and $\varphi_{\beta+1}$ enumerates the common fixed points of $\varphi_\gamma$ for $\gamma \leq \beta$. The ordinal $\varepsilon_0$ is precisely $\varphi_1(0)$, the first fixed point of $\alpha \mapsto \omega^\alpha$. The ordinal $\varepsilon_1 = \varphi_1(1)$ is the second fixed point, and $\varepsilon_\omega = \varphi_1(\omega)$. The Veblen hierarchy provides a systematic notation for a vast class of countable ordinals, though all of them remain below the first uncountable ordinal $\omega_1$.
[/remark]
### The Division Algorithm for Normal Functions
In ordinary arithmetic, the Euclidean division algorithm says that for any integer $a$ and positive integer $b$, one can write $a = bq + r$ with $0 \leq r < b$. An analogous statement holds for ordinals when the "denominator" is a normal function: given any ordinal $\alpha$, one can find the largest $\gamma$ such that $f(\gamma) \leq \alpha$, playing the role of the "quotient." This is not a mere analogy — applied to $f(\alpha) = \omega^\alpha$, it yields the Cantor normal form theorem, the ordinal version of positional notation. The theorem rests on the fact that a normal function, being strictly increasing and dominating the identity, cannot "skip over" $\alpha$: every ordinal is "reachable from below" by $f$.
[quotetheorem:1477]
[citeproof:1477]
This result is the ordinal analogue of the Euclidean division algorithm. Applied to $f(\alpha) = \omega^\alpha$, it yields the Cantor normal form theorem: every ordinal $\alpha$ can be written uniquely as $\omega^{\gamma_1} \cdot k_1 + \omega^{\gamma_2} \cdot k_2 + \cdots + \omega^{\gamma_m} \cdot k_m$ with $\gamma_1 > \gamma_2 > \cdots > \gamma_m$ and $k_i \in \mathbb{N}^+$. This is the ordinal version of base-$\omega$ positional notation.
Ordinals give us a total order on the universe of sets, but most mathematical structures admit only partial orderings. The theory of posets weakens the totality requirement and reveals a more flexible principle: Zorn's lemma, which guarantees the existence of maximal elements even when we cannot compare all pairs. This is our first deep theorem of set theory.
# 3. Posets and Zorn's Lemma
This chapter studies partial orders — a weakening of total order that arises naturally whenever one collection can be contained in or built from another. The leading example throughout is the power set $\mathcal{P}(X)$ ordered by inclusion, together with its subposets. The two principal theorems are the Knaster–Tarski fixed-point theorem, which provides a powerful tool for constructing fixed points in complete posets, and Zorn's lemma, which guarantees maximal elements whenever every chain is bounded. We use both to derive results across algebra and logic. We also investigate the deep equivalence between Zorn's lemma, the Axiom of Choice, and the Well-Ordering Principle, and close with the starred Bourbaki–Witt theorem, which recovers a fixed-point theorem in the choice-free setting.
## Partial Orders
The concept of an order arises in many different settings: numbers ordered by size, sets ordered by inclusion, logical propositions ordered by provability. These structures share common features — transitivity, antisymmetry, reflexivity — but differ in whether every pair of elements can be compared. A total order forces a linear ranking; a partial order does not. The study of partial orders thus captures the common structural features of these diverse examples while retaining enough generality to be broadly applicable.
[definition: Partial Order]
A **partially ordered set** (or **poset**) is a pair $(X, \leq)$ where $X$ is a set and $\leq$ is a binary relation on $X$ satisfying:
1. $x \leq x$ for all $x \in X$ (reflexivity),
2. if $x \leq y$ and $y \leq z$, then $x \leq z$ (transitivity),
3. if $x \leq y$ and $y \leq x$, then $x = y$ (antisymmetry).
We write $x < y$ to mean $x \leq y$ and $x \neq y$. Equivalently, one may define a poset via a strict relation $<$ satisfying irreflexivity ($x \not< x$) and transitivity.
[/definition]
The strict relation $<$ and the non-strict relation $\leq$ are interdefinable: given $\leq$, set $x < y \iff x \leq y$ and $x \neq y$; given $<$, set $x \leq y \iff x < y$ or $x = y$. Both formulations appear in the literature, and it is worth being comfortable with each.
[example: Poset Examples]
The following are all posets.
1. Any totally ordered set (such as $\mathbb{R}$ or $\mathbb{Z}$ under $\leq$) is a poset. Total orders are the special case in which every pair of elements is comparable.
2. The natural numbers $\mathbb{N}$ under divisibility: declare $x \leq y$ if $x \mid y$. Reflexivity holds since $x \mid x$; transitivity holds since $x \mid y$ and $y \mid z$ imply $x \mid z$; antisymmetry holds since $x \mid y$ and $y \mid x$ imply $x = y$ in $\mathbb{N}$.
3. For any set $S$, the power set $\mathcal{P}(S)$ ordered by $\subseteq$ is a poset. This is the central example of the chapter — most arguments concern subsets of $\mathcal{P}(S)$.
4. Any subset of $\mathcal{P}(S)$ ordered by inclusion is again a poset. Subsets of power sets play a central role whenever one studies collections of sets with a given property.
[/example]
One useful way to visualise a finite poset is through a **Hasse diagram**. Rather than drawing all the order relations explicitly (which would be cluttered due to transitivity), one draws only the "immediate successors." The notion of an immediate successor requires a precise definition, since in a dense order such as $\mathbb{Q}$ there is no such thing — between any two elements lies another.
[definition: Cover and Hasse Diagram]
In a poset $(X, \leq)$, an element $y$ **covers** $x$ if $x < y$ and there is no $z \in X$ with $x < z < y$. The **Hasse diagram** of $X$ is the directed graph with vertex set $X$ where an upward edge is drawn from $x$ to $y$ precisely when $y$ covers $x$.
[/definition]
The convention is that "higher" in the diagram means "greater" in the order. Hasse diagrams are extremely useful for finite posets such as $\mathcal{P}(\{1,2,3\})$ or lattices of divisors of a fixed integer, but are of no use for dense orders such as $\mathbb{Q}$, where no element covers any other.
[example: Hasse Diagrams]
Consider the poset $(\{1, 2, 3, 6\}, \mid)$. The element $2$ covers $1$ (since $1 \mid 2$ and there is nothing strictly between), $3$ covers $1$, and $6$ covers both $2$ and $3$. The Hasse diagram has $1$ at the bottom, $2$ and $3$ at the next level, and $6$ at the top. Note that $1 \mid 6$ but we do not draw this edge directly, since it is already implied by the path $1 \to 2 \to 6$.
One must be careful: in a general poset, there need not be any notion of "height" or "rank" consistent with the order. A poset may have two incomparable maximal elements at different vertical levels in any drawing, and the Hasse diagram is merely a convenient depiction, not a strict height assignment.
[/example]
### Chains, Antichains, and Bounds
Not every pair of elements in a poset is comparable, but subsets in which all pairs are comparable — or in which no pair is — arise constantly in the theory.
[definition: Chain and Antichain]
In a poset $(X, \leq)$, a subset $S \subseteq X$ is a **chain** if it is totally ordered: for all $x, y \in S$, either $x \leq y$ or $y \leq x$. A subset $S$ is an **antichain** if no two distinct elements of $S$ are comparable: for all $x \neq y$ in $S$, neither $x \leq y$ nor $y \leq x$.
[/definition]
Chains and antichains represent the two extreme kinds of subset in a poset: a chain is one in which order imposes a total ranking on every pair, while an antichain is one in which order gives no information at all — every pair is incomparable. Most interesting posets contain both chains and antichains as subsets, and understanding their interplay is central to the theory. In Zorn's lemma, it is precisely the behaviour of chains that controls the existence of maximal elements; antichains, by contrast, measure the "width" of a poset and play a role in Dilworth's theorem (not covered here).
[example: Chains and Antichains]
In $(\mathbb{N}, \mid)$, the set $\{1, 2, 4, 8, 16, \ldots\}$ of powers of $2$ is a chain, since $2^k \mid 2^{k+1}$ for all $k \geq 0$. The set $\{2, 3, 5, 7, 11, \ldots\}$ of primes is an antichain, since no prime divides another.
In $\mathcal{P}(\{1,2,3\})$ under inclusion, the set $\{\{1\}, \{1,2\}, \{1,2,3\}\}$ is a chain. The set $\{\{1\}, \{2\}, \{3\}\}$ is an antichain.
[/example]
Among the most fundamental questions about a subset $S$ of a poset is whether it has a smallest upper bound.
[definition: Upper Bound and Supremum]
Let $(X, \leq)$ be a poset and $S \subseteq X$. An **upper bound** for $S$ is an element $x \in X$ such that $y \leq x$ for all $y \in S$. The **supremum** (or **least upper bound**, or **join**) of $S$ is an element $x \in X$, written $x = \sup S$ or $x = \bigvee S$, that is an upper bound for $S$ and satisfies $x \leq z$ for every upper bound $z$ of $S$.
Analogously, a **lower bound** for $S$ is an element $x$ with $x \leq y$ for all $y \in S$, and the **infimum** (or **greatest lower bound**, or **meet**) is the greatest lower bound, written $\inf S$ or $\bigwedge S$.
[/definition]
The supremum, when it exists, is unique by antisymmetry: if $x$ and $x'$ are both suprema of $S$, then $x \leq x'$ and $x' \leq x$, giving $x = x'$. However, a subset may have upper bounds without having a supremum — there may be no single upper bound smaller than all others.
[example: Suprema and Their Failure]
In $\mathbb{R}$ with its usual order, the set $\{x \in \mathbb{R} : x < \sqrt{2}\}$ has upper bound $7$, and has supremum $\sqrt{2}$.
Now consider the five-element poset with elements $\{a, b, c, d, e\}$ where $a < b < c$, $a < d < c$, and $b, d$ are incomparable. The subset $\{a, b\}$ has upper bounds $b$, $c$, and (via $b \leq c$ and $a \leq c$) $c$, so its supremum is $b$ (the smallest upper bound). However, the subset $\{b, d\}$ has only one upper bound, namely $c$, so $\sup\{b,d\} = c$.
Now add another element $e$ above both $b$ and $d$ but incomparable with $c$. Then $\{b, d\}$ has two upper bounds, $c$ and $e$, which are incomparable to each other, and therefore no supremum exists.
[/example]
### Complete Posets and the Knaster–Tarski Theorem
The failure of suprema to exist is an obstacle to analysis on posets. One natural condition that eliminates this obstacle entirely is completeness.
[definition: Complete Poset]
A poset $(X, \leq)$ is **complete** if every subset $S \subseteq X$ (including the empty set and $X$ itself) has a supremum in $X$.
[/definition]
It is essential to note that completeness requires suprema for all subsets, not merely bounded or non-empty ones. In particular, a complete poset must have a greatest element (namely $\sup X$) and a least element (namely $\sup \varnothing$, which is the least upper bound of the empty set, hence the element below everything).
[example: Complete and Incomplete Posets]
The real line $\mathbb{R}$ is not complete: $\mathbb{R}$ itself has no supremum. The closed interval $[0,1]$ is complete: every subset is bounded above by $1$, so a supremum exists; the empty set has supremum $0$ (since $0$ is the least element). The open interval $(0,1)$ is not complete: $(0,1)$ as a subset has no supremum in $(0,1)$.
The power set $\mathcal{P}(S)$ for any set $S$ is always complete. Given any family $\{A_i : i \in I\} \subseteq \mathcal{P}(S)$, the union $\bigcup_{i \in I} A_i$ is the supremum (it is an upper bound since each $A_i \subseteq \bigcup A_i$, and it is the least such since any upper bound $B$ must contain each $A_i$ hence $\bigcup A_i \subseteq B$). The empty family has supremum $\varnothing$.
[/example]
Complete posets admit a beautiful fixed-point theorem. The key observation is that in a complete poset, one can form the supremum of any collection of elements, and this can be leveraged to construct fixed points of order-preserving maps.
[definition: Order-Preserving Function]
Let $(X, \leq)$ be a poset. A function $f: X \to X$ is **order-preserving** (or **monotone**) if $x \leq y$ implies $f(x) \leq f(y)$ for all $x, y \in X$.
[/definition]
Not every order-preserving map has a fixed point: the map $n \mapsto n + 1$ on $\mathbb{N}$ is order-preserving but has no fixed point, and $x \mapsto x - 1$ on $\mathbb{Z}$ likewise has no fixed point. The difficulty is that these posets are not complete. On a complete poset, the situation changes fundamentally.
[quotetheorem:1478]
[citeproof:1478]
The ordering of the two steps matters: one must establish $s \leq f(s)$ before deducing $f(s) \leq s$, not the other way around. The proof is a model of a "bootstrap" argument: use the supremum structure to pull yourself up once, then use that gain to conclude the reverse inequality.
The theorem does not merely assert the existence of one fixed point: the set $E$ of pre-fixed points has a supremum which is itself a fixed point. In fact, the set of all fixed points of an order-preserving map on a complete poset is itself a complete poset, though we do not prove this here.
A striking application of Knaster–Tarski is a short, self-contained proof of the Cantor–Schröder–Bernstein theorem.
[quotetheorem:1479]
[citeproof:1479]
The Knaster–Tarski proof of Cantor–Schröder–Bernstein is worth pausing over. The usual proof proceeds by decomposing $A$ and $B$ into orbits under the composed maps $g \circ f$ and $f \circ g$, then stitching together $f$ and $g^{-1}$ orbit by orbit — an argument that, while correct, requires careful bookkeeping. The fixed-point proof reduces the entire construction to a single sentence: the required partition is a fixed point of an order-preserving map on a complete poset. The hypothesis that $f$ and $g$ are injections is essential: without injectivity, $\Phi$ need not be order-preserving, and without order-preservation Knaster–Tarski does not apply. The conclusion — a bijection — is also sharp: two injections $f: A \to B$ and $g: B \to A$ do not force $f$ or $g$ itself to be a bijection. The theorem says only that some bijection exists, and the constructed $h$ will in general use $f$ on one piece of $A$ and $g^{-1}$ on the complementary piece.
### Maximal Elements and Zorn's Lemma
The Knaster–Tarski theorem is about fixed points of maps. A different but equally fundamental question is: when does a poset contain a maximal element — one that cannot be extended further?
[definition: Maximal Element]
In a poset $(X, \leq)$, an element $m \in X$ is **maximal** if there is no $y \in X$ with $y > m$ (i.e.\ $y \geq m$ and $y \neq m$). An element $t \in X$ is a **maximum** (or **greatest element**) if $y \leq t$ for all $y \in X$.
[/definition]
The distinction between maximal and maximum is crucial and often confused. A maximum is above everything; a maximal element is simply not below anything. In a totally ordered set the two notions coincide, but in a general poset a maximal element need not be above all other elements — there may be other elements incomparable with it. A poset may have many maximal elements and no maximum.
[example: Maximal vs. Maximum]
In the five-element poset $\{a, b, c, d, e\}$ where $a < b < c$, $a < d < e$, and $\{b,c\}$ is incomparable with $\{d,e\}$: both $c$ and $e$ are maximal (nothing is above either), but neither is a maximum, since $c$ and $e$ are incomparable.
Not every poset has a maximal element. The naturals $\mathbb{N}$ under $\leq$ have no maximal element, nor does $\mathbb{Q}$ or $\mathbb{R}$. In each case, the failure is explained by the existence of chains that are unbounded above.
[/example]
Zorn's lemma provides a sufficient condition for the existence of maximal elements.
[quotetheorem:1226]
The hypothesis "every chain has an upper bound" is exactly the condition that prevents the failure mode seen in $\mathbb{N}$: there, the chain $1 < 2 < 3 < \cdots$ has no upper bound. The conclusion asserts the existence of a maximal element, not a maximum; the distinction matters in applications.
[citeproof:1226]
The proof is a transfinite hunt: start anywhere in $X$, keep moving up (using the assumed choice function), take upper bounds at limit stages (using AC again), and observe that the process cannot continue for more than $|X|$ steps. If it never terminates with a maximal element, one injects an ordinal larger than $X$ into $X$ — absurd.
[remark: Non-emptiness Hypothesis]
The hypothesis that $X$ is non-empty is not an independent assumption but rather a consequence of the chain condition in a minor way: the empty set $\varnothing$ is a chain, and the hypothesis "every chain has an upper bound" applied to $\varnothing$ would require $X$ to contain some element. So the hypotheses together force $X$ to be non-empty.
[/remark]
### Lattices and Boolean Algebras
In a general poset, the supremum $\sup\{x, y\}$ of a two-element set need not exist: we have seen examples where two elements have multiple incomparable upper bounds, preventing a least one from existing. The question, then, is what additional structure on a poset guarantees that every pair of elements has both a supremum and an infimum. A poset with this property admits algebraic operations — join and meet — making it behave in many respects like an algebra of sets. Even then, the interaction between join, meet, and a complementation operation is not automatic: one needs distributivity to hold, and a complementation law to be satisfied, before the structure mirrors the classical algebra of propositional logic. These additional requirements lead to the notion of a Boolean algebra.
[definition: Lattice]
A **lattice** is a poset $(X, \leq)$ in which every finite non-empty subset has a supremum and an infimum. Equivalently, every pair $\{x, y\}$ has a join $x \vee y = \sup\{x, y\}$ and a meet $x \wedge y = \inf\{x, y\}$. A **complete lattice** is a poset in which every subset (including empty and infinite) has a supremum and an infimum.
[/definition]
Note that a complete lattice is in particular a complete poset in the sense of our earlier definition: every subset has a supremum (and, additionally, an infimum). The power set $\mathcal{P}(S)$ ordered by inclusion is a complete lattice, with $A \vee B = A \cup B$ and $A \wedge B = A \cap B$.
Boolean algebras are lattices with an additional complementation operation satisfying the classical laws of propositional logic. The pair $(\vee, \wedge)$ alone does not suffice to reconstruct logical negation: a lattice may be highly non-trivial while admitting no complementation at all, or admitting complementation that fails to distribute correctly. The Boolean algebra axioms pin down exactly the structure that makes the algebra of a Boolean lattice behave like $\mathcal{P}(S)$.
[definition: Boolean Algebra]
A **Boolean algebra** is a lattice $(B, \leq, \vee, \wedge)$ together with a distinguished least element $0$, greatest element $1$, and a unary operation $\neg: B \to B$ (complementation) satisfying, for all $a, b \in B$:
\begin{align*}
a \vee \neg a &= 1, \qquad a \wedge \neg a = 0, \\
a \vee b &= b \vee a, \qquad a \wedge b = b \wedge a, \\
a \vee (b \wedge c) &= (a \vee b) \wedge (a \vee c), \\
a \wedge (b \vee c) &= (a \wedge b) \vee (a \wedge c).
\end{align*}
[/definition]
Boolean algebras arise in three closely related settings, each illuminating a different facet of the structure. The first is pure set algebra: for any set $S$, the power set $\mathcal{P}(S)$ is a Boolean algebra with $0 = \varnothing$, $1 = S$, and $\neg A = S \setminus A$. The second is the two-element algebra $B = \{0, 1\}$ with $\neg 0 = 1$ and $\neg 1 = 0$, which is the algebra of classical truth values and is the simplest non-trivial Boolean algebra. The third is propositional logic: the Lindenbaum–Tarski algebra of a propositional theory — whose elements are equivalence classes of sentences under provable equivalence — is a Boolean algebra, with $\vee$ corresponding to disjunction, $\wedge$ to conjunction, and $\neg$ to negation. These three incarnations are not coincidental: Stone's representation theorem (not proved here) establishes that every Boolean algebra is isomorphic to a field of sets, placing set algebra and propositional logic on precisely the same footing.
[example: Boolean Algebras]
The simplest Boolean algebra is $\{0, 1\}$ with $\neg 0 = 1$ and $\neg 1 = 0$ — this is the algebra of classical truth values.
For any set $S$, the power set $\mathcal{P}(S)$ ordered by inclusion is a Boolean algebra with $0 = \varnothing$, $1 = S$, $A \vee B = A \cup B$, $A \wedge B = A \cap B$, and $\neg A = S \setminus A$. This is the motivating example. The connection to logic is immediate: propositions about elements of $S$ correspond to subsets of $S$, and logical connectives correspond to set operations.
[/example]
### Applications of Zorn's Lemma
Zorn's lemma is one of the most widely applied results in all of mathematics. The strategy is always the same: find a poset whose maximal elements are exactly the objects one seeks, verify that every chain has an upper bound, and conclude by Zorn.
**Every vector space has a basis.** Recall that a basis of a vector space $V$ over a field $k$ is a subset $B \subseteq V$ that is both linearly independent (no nontrivial finite linear combination of elements of $B$ is zero) and spanning (every element of $V$ is a finite linear combination of elements of $B$).
For finite-dimensional spaces, the existence of a basis is a standard result from linear algebra. For infinite-dimensional spaces — such as the space of all real sequences, or $\mathbb{R}$ viewed as a vector space over $\mathbb{Q}$ — no elementary argument suffices, and Zorn's lemma is required.
[quotetheorem:1480]
[citeproof:1480]
A basis obtained in this way is called a **Hamel basis** when $V = \mathbb{R}$ over $\mathbb{Q}$. No explicit description of a Hamel basis can be given; its existence is purely existential, a testament to the non-constructive character of Zorn's lemma. It is worth identifying exactly where the argument requires the hypotheses. The finiteness of each nontrivial linear dependency relation is essential: the key step — finding a single $A_{i_m}$ in the chain that contains all the vectors in a given dependency — works because there are only finitely many vectors in that dependency. If "linear independence" were replaced by some infinitary condition, the same argument would fail. The field structure of $k$ is also used to conclude that a vector outside a maximal independent set must be in the span: this uses that linear combinations are finite, and that the coefficient field allows division, so dependence of $B \cup \{v\}$ forces $v$ into $\operatorname{span}(B)$.
The same pattern of proof recurs in logic. The completeness theorem for propositional logic over a countable set of primitives uses a direct construction; for an uncountable set of primitives, Zorn's lemma is needed.
[quotetheorem:1481]
[citeproof:1481]
The two hypotheses driving this proof are compactness of propositional logic and the initial consistency of $S$. Compactness — the fact that provability from $T$ involves only finitely many premises — is what makes the union of a chain of consistent sets again consistent: any contradiction would involve finitely many sentences, all drawable from a single element of the chain. Without compactness, the union of consistent sets need not be consistent. The initial consistency of $S$ is also necessary: if $S$ were inconsistent, then $X$ would be empty and Zorn's lemma would not apply. The conclusion — existence of a model — cannot be strengthened to say the model is canonical or uniquely determined; for uncountable $P$, a model is constructed purely existentially.
The reader should note how the two proofs — existence of a basis and existence of a model — follow exactly the same template. This parallelism is not coincidental: Zorn's lemma is the common mechanism underlying a vast number of existence results in modern mathematics.
## Zorn's Lemma and the Axiom of Choice
In the proof of Zorn's lemma, we made two kinds of choice: we chose, for each non-maximal $x \in X$, an element $x' > x$; and we chose, for each chain $C$, an upper bound $u(C)$. Each of these is an act of arbitrary selection, and taken together they constitute infinitely many simultaneous choices. This is precisely the content of the Axiom of Choice.
Making infinitely many simultaneous choices is not automatically permitted by the other axioms of set theory. For a single set $A \neq \varnothing$, the assertion $\exists x \in A$ is simply the statement that $A$ is non-empty — no choice principle is needed. For a finite family of non-empty sets, one can select an element from each by induction on the size of the family. But for an infinite family, no such inductive process is available, and the Axiom of Choice is required.
[definition: Axiom of Choice]
Given any family $\{A_i : i \in I\}$ of non-empty sets (where $I$ is any index set), there exists a **choice function** $f: I \to \bigcup_{i \in I} A_i$ such that $f(i) \in A_i$ for all $i \in I$.
[/definition]
The Axiom of Choice differs fundamentally in character from the other set-building axioms (union, power set, separation, replacement). Those axioms are **constructive**: they specify exactly what set is produced and that set is unique. The union $A \cup B$ is the uniquely determined set of elements belonging to $A$ or $B$; there is no ambiguity. The choice function, by contrast, is highly non-unique — in general there are many possible choice functions, and the axiom asserts only that one exists, without specifying which. This non-constructive character makes AC philosophically distinctive and historically controversial.
One may ask: did we actually need AC to prove Zorn's lemma, or did we merely use it out of convenience? The answer is that AC is genuinely necessary: Zorn's lemma implies AC. The two are equivalent. There is a third equivalent formulation, the Well-Ordering Principle, completing a triangle of equivalences.
[quotetheorem:1482]
[citeproof:1482]
This equivalence is fundamental to the foundations of set theory. The three statements differ greatly in their apparent content — a local combinatorial existence principle (Zorn), a global algebraic choice principle (AC), and a structural principle about orderings (WOP) — yet they are exactly equivalent. Each implies the other two using only the remaining ZF axioms.
It is worth pausing to note a potential circularity concern: the proof of Zorn's lemma invoked Hartogs' lemma and ordinal recursion, both developed in Chapter 3. None of those arguments used AC; they relied on ZF axioms alone. The ordinal machinery is therefore available as a foundation for the proof of Zorn's and the well-ordering construction above.
[example: Extension of Partial Orders]
A further application of Zorn's lemma: every partial order on a set $X$ can be extended to a total order. Given a partial order $\leq$ on $X$, let $\mathcal{T}$ be the collection of all partial orders on $X$ that extend $\leq$, ordered by inclusion (viewed as relations, i.e.\ subsets of $X \times X$). The union of a chain of partial orders extending $\leq$ is again a partial order extending $\leq$ (one checks that reflexivity, transitivity, and antisymmetry are each preserved by directed unions). By Zorn's lemma, $\mathcal{T}$ has a maximal element $\leq^*$. If $\leq^*$ were not total, there would exist incomparable $a, b \in X$; one could then extend $\leq^*$ by adding $a \leq^* b$ and taking the transitive closure, contradicting maximality. Hence $\leq^*$ is a total order extending $\leq$.
[/example]
## Bourbaki–Witt Theorem*
The Bourbaki–Witt theorem provides a second fixed-point theorem, this time for **chain-complete** posets and **inflationary** functions. Its chief distinction from Zorn's lemma — and from Knaster–Tarski — is that it does not require the Axiom of Choice. Understanding why requires identifying precisely where AC entered our earlier arguments. In the proof of Zorn's lemma, AC was used twice: once to select, for each non-maximal element $x$, a strict upper bound $x' > x$; and once to select, for each chain $C$, an upper bound $u(C)$ from among possibly many candidates. Both uses are genuine — removing either makes the proof collapse. The Bourbaki–Witt theorem sidesteps both: if $f$ is an inflationary function, the first selection is replaced by the specific element $f(x)$, determined by $f$ itself rather than by choice; and if the poset is chain-complete, the second selection is replaced by the canonical supremum, which is uniquely determined by the chain. The result is a fixed-point theorem that makes no arbitrary choices at any point.
[definition: Chain-Complete Poset]
A poset $(X, \leq)$ is **chain-complete** if $X \neq \varnothing$ and every non-empty chain in $X$ has a supremum.
[/definition]
Chain-completeness is a strictly weaker condition than completeness: we require suprema only for non-empty chains, not for all subsets. In particular, $X$ itself need not have a supremum.
[example: Chain-Complete Posets]
Every complete poset is chain-complete, since suprema exist for all subsets and in particular for non-empty chains. Any finite non-empty poset is chain-complete: every chain is finite and has a greatest element, which serves as its supremum. The collection $\{A \subseteq V : A \text{ is linearly independent}\}$ for a vector space $V$, ordered by inclusion, is chain-complete by the argument in the proof of the basis theorem: the union of a chain of linearly independent sets is linearly independent and is the supremum of that chain.
[/example]
The preceding chain-completeness result shows that the poset of linearly independent sets is exactly the kind of structure on which Bourbaki–Witt applies. What is needed, then, is a function on this poset that is inflationary — one that moves each element upward (or keeps it fixed) — rather than one that is order-preserving in the sense of Knaster–Tarski. Inflation is a weaker condition than monotonicity and arises naturally when one defines a function by "adding one more element."
[definition: Inflationary Function]
A function $f: X \to X$ on a poset $(X, \leq)$ is **inflationary** if $f(x) \geq x$ for all $x \in X$.
[/definition]
Every inflationary function is a candidate for Bourbaki–Witt. Note that an inflationary function need not be order-preserving: the conclusion $f(x) \geq x$ is about comparing $x$ to its own image, not about comparing images of comparable elements. The map $f(x) = x + 1$ on $\mathbb{N}$ is both inflationary and order-preserving, but one can construct inflationary maps on partial orders that are not monotone.
[quotetheorem:1483]
[citeproof:1483]
The Bourbaki–Witt theorem can be deduced more swiftly from Zorn's lemma: if $X$ is chain-complete and $f$ is inflationary, then $X$ has a maximal element $m$ (by Zorn applied to $X$ with the guarantee that every chain has a supremum, hence an upper bound). Since $f(m) \geq m$ and $m$ is maximal, $f(m) = m$. However, this derivation uses Zorn's lemma and hence AC. The proof above is constructive in the sense that it explicitly identifies the ordinal stage at which the iteration stabilises, requiring no choice at any step.
[remark: Bourbaki–Witt and Zorn]
One may view the Bourbaki–Witt theorem as isolating the "AC-free" core of Zorn's lemma. The passage from Bourbaki–Witt to Zorn's lemma requires additional machinery, but the essential fixed-point content is already present without AC. In constructive and choice-free settings, Bourbaki–Witt is the tool of choice for arguments that would classically appeal to Zorn's lemma applied to posets arising from iterative constructions.
[/remark]
Zorn's lemma and fixed-point theorems resolve existence questions about posets, but they operate in the metalanguage of set theory itself. To formalize such reasoning within mathematics, we must expand propositional logic to predicate logic, where quantifiers and predicate symbols allow us to express properties and relationships with full generality.
# 4. Predicate Logic
Predicate logic is a vast expansion of the propositional calculus developed in Chapter 2. Propositional logic is expressive enough to reason about compound statements built from atomic propositions, but it cannot say anything about the internal structure of those propositions — it cannot speak of elements, functions, or relations. To reason about group theory, number theory, or any branch of mathematics where we quantify over objects, we need a richer framework. This chapter builds that framework from the ground up: first the formal language, then the semantic notion of truth in a structure, then the syntactic proof system and its relationship to semantics via the Soundness and Completeness theorems, and finally two fundamental examples — Peano Arithmetic and the model-theoretic properties of complete and categorical theories.
## The Language of Predicate Logic
Propositional logic provides no way to express a statement like "every element has an inverse" or "there exists an element satisfying a given equation." The obstacle is that propositional logic has no variables ranging over a domain of objects, no function symbols that produce new objects from old ones, and no relation symbols that compare objects. To state a single axiom of group theory — say, associativity — we need all three of these ingredients simultaneously. The language of predicate logic introduces precisely this structure.
The fundamental design decision is to separate the **signature** (the list of symbols and their arities) from the **meaning** those symbols carry. A formula is a syntactic object, a finite string of symbols with no inherent interpretation. Meaning is supplied only later, when we fix a structure. This separation is what allows us to prove theorems that apply simultaneously to every group, every field, or every poset.
[definition: Signature and Language]
Let $\Omega$ (the set of **function symbols**) and $\Pi$ (the set of **relation symbols**) be disjoint sets, and let $\alpha : \Omega \cup \Pi \to \mathbb{N}$ be a function assigning to each symbol its **arity**. A function symbol $f \in \Omega$ with $\alpha(f) = 0$ is called a **constant symbol**.
The **language** $L = L(\Omega, \Pi, \alpha)$ consists of
- an infinite supply of **variables** $x_1, x_2, x_3, \ldots$ (also written $x, y, z, \ldots$ informally);
- the logical connectives $\bot$, $\Rightarrow$, and the quantifier $\forall$;
- the equality symbol $=$.
The **terms** of $L$ are defined inductively:
1. Every variable is a term.
2. If $f \in \Omega$ with $\alpha(f) = n$ and $t_1, \ldots, t_n$ are terms, then $f(t_1, \ldots, t_n)$ is a term.
The **atomic formulas** of $L$ are:
1. $\bot$ (falsehood);
2. $(s = t)$ for any terms $s, t$;
3. $\phi(t_1, \ldots, t_n)$ for any $\phi \in \Pi$ with $\alpha(\phi) = n$ and terms $t_1, \ldots, t_n$.
The **well-formed formulas** (wffs) of $L$ are defined inductively:
1. Every atomic formula is a wff.
2. If $p$ and $q$ are wffs, then $(p \Rightarrow q)$ is a wff.
3. If $p$ is a wff and $x$ is a variable, then $(\forall x)\, p$ is a wff.
[/definition]
The other logical connectives are abbreviations: $\neg p$ stands for $p \Rightarrow \bot$; $p \wedge q$ stands for $\neg(p \Rightarrow \neg q)$; $p \vee q$ stands for $\neg p \Rightarrow q$; and $(\exists x)\, p$ stands for $\neg(\forall x)(\neg p)$. These are entirely defined in terms of the primitive symbols, so no new grammar rules are needed. Two examples illustrate the range of structures this language can describe.
[example: Language of Groups]
The language of groups has $\Omega = \{m, i, e\}$ with $\alpha(m) = 2$, $\alpha(i) = 1$, $\alpha(e) = 0$, and $\Pi = \varnothing$. Terms include: $e$ (the constant), $x_1$, $m(x_1, x_2)$, and $i(m(x_1, x_1))$. Atomic formulas include $m(e, e) = e$ and $x_1 = e$. A wff that will serve as an axiom is $(\forall x)(\forall y)(\forall z)\, m(x, m(y,z)) = m(m(x,y),z)$, expressing associativity.
[/example]
[example: Language of Posets]
The language of posets has $\Omega = \varnothing$ and $\Pi = \{\leq\}$ with $\alpha(\leq) = 2$. There are no function symbols, so the only terms are variables. Atomic formulas include $x_1 = x_2$ and $\leq(x_1, x_2)$ (written $x_1 \leq x_2$). The contrast with the group language is instructive: the absence of function symbols means one cannot form compound terms — every term is simply a variable — and consequently the richness of the language resides entirely in its relation symbol.
[/example]
### Free and Bound Variables, Sentences
A critical distinction in predicate logic concerns which variable occurrences carry information and which are merely placeholders captured by a quantifier. Without this distinction, we cannot say what a formula asserts about a particular element.
[definition: Free and Bound Variables]
An occurrence of a variable $x$ in a formula $p$ is **bound** if it falls within the scope of a $(\forall x)$ quantifier, and **free** otherwise. The **scope** of $(\forall x)$ in $(\forall x)\, q$ is the entire formula $q$.
[/definition]
The distinction matters because free variables act as parameters: a formula with a free variable $x$ makes a statement about whatever element $x$ refers to in a given context, whereas a bound variable is a silent placeholder with no referent outside its quantifier. Renaming a bound variable leaves the formula logically equivalent; renaming a free variable changes the subject of the assertion entirely.
[example: Free and Bound Occurrences]
In $(\forall x)(m(x, x) = e)$, the variable $x$ is bound throughout. In $(\forall y)(m(y,y) = x \Rightarrow m(x,y) = m(y,x))$, the variable $y$ is bound while $x$ is free. It is formally permitted for a variable to appear both free and bound in a single formula — for instance $m(x,x) = e \Rightarrow (\forall x)(\forall y)(m(x,y) = m(y,x))$ is well-formed — but this practice causes confusion and should be avoided.
[/example]
[definition: Sentence]
A **sentence** is a wff with no free variables.
[/definition]
Sentences are the objects that can be meaningfully said to be true or false in a structure: the free variables of a formula act as parameters, and only when all variables are bound (or replaced by closed terms) does the formula make a determinate assertion. For example, $m(e, e) = e$ and $(\forall x)(m(x, x) = e)$ are sentences, while $m(x, i(x)) = e$ is not — the latter makes a claim about a specific but unnamed element $x$, which requires a context to evaluate.
[definition: Closed Term and Substitution]
A term is **closed** if it contains no variables. For a formula $p$, a variable $x$, and a term $t$, the **substitution** $p[t/x]$ is the formula obtained by replacing every free occurrence of $x$ in $p$ with $t$.
[/definition]
Substitution must be performed carefully: if $t$ contains a variable $y$ that would fall within the scope of a $(\forall y)$ quantifier in $p$, the substitution would **capture** $y$, changing its status from free to bound. The axioms of predicate logic include side conditions specifically to avoid this. In the language of groups, $e$ and $m(e,e)$ are closed terms, but $m(x, i(x))$ is not closed even though $m(x,i(x)) = e$ holds in every group — the expression still contains the free variable $x$.
## Semantic Entailment
Propositional logic assigns truth values to atomic propositions via a valuation function. In predicate logic the situation is more structured: a formula like $(\forall x)(m(x,x) = e)$ makes an assertion about every element of some domain. To evaluate it, we need to know both the domain and the interpretations of the function and relation symbols. This data is packaged together in a structure.
[definition: $L$-Structure]
An **$L$-structure** is a non-empty set $A$ together with:
- for each $f \in \Omega$ with $\alpha(f) = n$, a function $f_A : A^n \to A$;
- for each $\phi \in \Pi$ with $\alpha(\phi) = n$, a relation $\phi_A \subseteq A^n$.
[/definition]
The requirement that $A$ be non-empty is deliberate. If we allowed $A = \varnothing$, then $(\forall x)\, \bot$ would hold vacuously in the empty structure. But Axiom 6 of predicate logic (stated below) gives $[(\forall x)\, \bot] \Rightarrow \bot$, so we could derive $\bot$ from no premises — making every theory inconsistent over the empty structure. Excluding empty structures avoids this anomaly with no practical cost, since any language with even one constant symbol automatically has no empty models.
[example: Structures for Groups and Posets]
In the language of groups, an $L$-structure is a set $A$ with functions $m_A : A \times A \to A$, $i_A : A \to A$, and a constant $e_A \in A$. This need not satisfy the group axioms — any set with such functions qualifies as a structure. A structure that additionally satisfies the group axioms is precisely a group.
In the language of posets, an $L$-structure is a set $A$ with a binary relation ${\leq_A} \subseteq A \times A$, again not necessarily a partial order until the axioms are imposed.
[/example]
Given a structure $A$ and a sentence $p$, we want to define what it means for $p$ to **hold** in $A$. The informal procedure is: replace each function symbol $f$ by $f_A$, each relation symbol $\phi$ by $\phi_A$, and interpret quantifiers as ranging over $A$. The formal definition makes this precise by induction on the complexity of $p$.
[definition: Interpretation]
For a sentence $p$ and an $L$-structure $A$, the **interpretation** $p_A \in \{0, 1\}$ is defined inductively:
**Closed terms:** For each closed term $t$, define $t_A \in A$ by $(f(t_1, \ldots, t_n))_A = f_A((t_1)_A, \ldots, (t_n)_A)$.
**Atomic formulas:**
\begin{align*}
\bot_A &= 0, \\
(s = t)_A &= \begin{cases} 1 & \text{if } s_A = t_A \\ 0 & \text{otherwise,} \end{cases} \\
(\phi(t_1,\ldots,t_n))_A &= \begin{cases} 1 & \text{if } ((t_1)_A, \ldots, (t_n)_A) \in \phi_A \\ 0 & \text{otherwise.} \end{cases}
\end{align*}
**Compound formulas:**
\begin{align*}
(p \Rightarrow q)_A &= \begin{cases} 0 & \text{if } p_A = 1 \text{ and } q_A = 0 \\ 1 & \text{otherwise,} \end{cases} \\
((\forall x)\, p)_A &= \begin{cases} 1 & \text{if } p[\bar{a}/x]_{\bar{A}} = 1 \text{ for all } a \in A \\ 0 & \text{otherwise,} \end{cases}
\end{align*}
where $\bar{A}$ is $A$ regarded as a structure for the extended language $L' = L(\Omega \cup \{\bar{a}\}, \Pi, \alpha')$ with $\bar{a}$ a new constant interpreted as $a$.
[/definition]
In practice, one uses the informal procedure ("add subscript $A$ everywhere and read it aloud") and invokes the formal definition only when precision demands it. What matters is that the definition is entirely compositional: the truth value of a compound sentence is determined recursively from the truth values of its immediate subformulas.
[definition: Theory, Model, and Semantic Entailment]
A **theory** is a set of sentences. A sentence $p$ **holds** in an $L$-structure $A$, written $A \models p$, if $p_A = 1$. A **model** of a theory $S$ is a structure in which every sentence of $S$ holds.
For a theory $S$ and a sentence $t$, we write $S \models t$ (**$S$ semantically entails $t$**) if every model of $S$ is also a model of $t$. A sentence $t$ is a **tautology**, written $\models t$, if $\varnothing \models t$, i.e., $t$ holds in every $L$-structure.
[/definition]
The definition of semantic entailment immediately shows that $(\forall x)(x = x)$ is a tautology: in any structure $A$ and for any $a \in A$, the element $a$ equals itself.
[example: Theories for Groups, Fields, and Graphs]
**Groups.** The theory $T_{\mathrm{grp}}$ in the language $L = (\{m, i, e\}, \varnothing, \alpha)$ consists of three sentences:
\begin{align*}
&(\forall x)(\forall y)(\forall z)\, m(x, m(y,z)) = m(m(x,y),z), \\
&(\forall x)(m(x,e) = x \wedge m(e,x) = x), \\
&(\forall x)(m(x,i(x)) = e \wedge m(i(x),x) = e).
\end{align*}
An $L$-structure is a model of $T_{\mathrm{grp}}$ if and only if it is a group. It follows that $T_{\mathrm{grp}} \models (\forall x)(\forall y)(m(x,y) = m(y,x))$ is false (non-abelian groups exist), while $T_{\mathrm{grp}} \models (\forall x)(i(i(x)) = x)$ is true.
**Fields.** The language is $\Omega = \{+, \times, -, 0, 1\}$ with arities $2, 2, 1, 0, 0$ and $\Pi = \varnothing$. The theory $T_{\mathrm{fld}}$ expresses that the structure is an abelian group under $+$, that $\times$ is commutative, associative, and distributes over $+$, that $\neg(0 = 1)$, and that $(\forall x)(\neg(x = 0) \Rightarrow (\exists y)(y \times x = 1))$. Then $T_{\mathrm{fld}} \models (\forall x)(\forall y)(\forall z)((xy = 1 \wedge xz = 1) \Rightarrow y = z)$, expressing the uniqueness of multiplicative inverses.
**Graphs.** The language has $\Omega = \varnothing$ and $\Pi = \{a\}$ with $\alpha(a) = 2$, where $a(x,y)$ means "$x$ and $y$ are adjacent." The theory consists of $(\forall x)\neg a(x,x)$ (no self-loops) and $(\forall x)(\forall y)(a(x,y) \Rightarrow a(y,x))$ (symmetry).
[/example]
Predicate logic is also called **first-order logic**. The quantifiers range over elements of the structure, not over subsets. This is the key limitation: a property like "every bounded subset has a supremum" cannot be expressed in first-order logic, because it quantifies over subsets. As a consequence, the complete ordered field axioms do not characterize $\mathbb{R}$ up to isomorphism in first-order logic — any theory that $\mathbb{R}$ satisfies also has non-isomorphic models.
## Syntactic Implication
Semantic entailment tells us when $S \models p$ holds across all models, but checking every structure is impractical. We need a finitary proof system: a set of axiom schemes and deduction rules that allow us to derive $p$ from $S$ in finitely many steps. The central question of this section is whether the proof system is adequate — whether $S \vdash p$ and $S \models p$ coincide.
The axioms of predicate logic extend the three axioms of propositional logic with two axioms governing equality and two governing universal quantification. The equality axioms express that $=$ is a congruence relation: reflexivity is built in, and the axiom for substitutivity says that equal elements are interchangeable in any formula.
[definition: Axioms of Predicate Logic]
The **axioms of predicate logic** are the following axiom schemes (each holds for all formulas and variables in the given positions):
**Propositional axioms:**
1. $p \Rightarrow (q \Rightarrow p)$
2. $[p \Rightarrow (q \Rightarrow r)] \Rightarrow [(p \Rightarrow q) \Rightarrow (p \Rightarrow r)]$
3. $\neg\neg p \Rightarrow p$
**Equality axioms:**
4. $(\forall x)(x = x)$
5. $(\forall x)(\forall y)\big((x = y) \Rightarrow (p \Rightarrow p[y/x])\big)$ for any formula $p$ in which $y$ does not occur bound.
**Quantifier axioms:**
6. $[(\forall x)\, p] \Rightarrow p[t/x]$ for any formula $p$ and term $t$ in which no free variable of $t$ becomes bound after substitution.
7. $[(\forall x)(p \Rightarrow q)] \Rightarrow [p \Rightarrow (\forall x)\, q]$ for any formulas $p, q$ with $x$ not free in $p$.
The **deduction rules** are:
1. **Modus Ponens:** from $p$ and $p \Rightarrow q$, deduce $q$.
2. **Generalization:** from $r$, deduce $(\forall x)\, r$, provided no premise used in deriving $r$ has $x$ as a free variable.
[/definition]
Axiom 6 is the **instantiation** axiom: a universal statement may be specialized to any term. The side condition prevents variable capture — if we substituted $t = y$ into $(\forall x)(\forall y)(x \neq y)$, we would incorrectly obtain $(\forall y)(y \neq y)$ if the $y$ in $t$ were captured. Axiom 7 allows us to "pull in" a universal quantifier over the conclusion of an implication, provided the hypothesis does not mention the quantified variable. The Generalization rule carries its own side condition: we may not generalize over a variable that appeared free in a premise, since a premise about a specific $x$ cannot be promoted to a universal statement.
[definition: Proof and Syntactic Implication]
A **proof** of a formula $p$ from a set of formulas $S$ is a finite sequence of formulas $q_1, \ldots, q_n = p$ in which each $q_i$ is either an axiom, a member of $S$, or follows from earlier formulas by Modus Ponens or Generalization. We write $S \vdash p$ and say **$S$ syntactically implies $p$** (or **proves $p$**). A formula $p$ with $S \vdash p$ is called a **theorem of $S$**.
[/definition]
The definition makes the proof system entirely syntactic: one need not think about structures or truth to verify a proof, only about which formulas appear in the sequence and whether each step is licensed. The following derivation shows this in action.
[example: Deriving Symmetry of Equality]
We show $\{x = y, x = z\} \vdash y = z$. Using Axiom 5 with the formula $p$ being $x = z$:
\begin{align*}
&(\forall x)(\forall y)\big((x = y) \Rightarrow (x = z \Rightarrow y = z)\big) &&\text{Axiom 5} \\
&\big[(\forall x)(\forall y)(\cdots)\big] \Rightarrow (\forall y)(x = y \Rightarrow x = z \Rightarrow y = z) &&\text{Axiom 6} \\
&(\forall y)(x = y \Rightarrow x = z \Rightarrow y = z) &&\text{MP} \\
&\big[(\forall y)(\cdots)\big] \Rightarrow \big[(x = y) \Rightarrow (x = z \Rightarrow y = z)\big] &&\text{Axiom 6} \\
&(x = y) \Rightarrow \big[(x = z) \Rightarrow (y = z)\big] &&\text{MP} \\
&x = y &&\text{Premise} \\
&(x = z) \Rightarrow (y = z) &&\text{MP} \\
&x = z &&\text{Premise} \\
&y = z &&\text{MP}
\end{align*}
The first five lines are a standard routine for eliminating universal quantifiers; the real content begins at line 6.
[/example]
### The Deduction Theorem
In propositional logic, the Deduction Theorem states that $S \cup \{p\} \vdash q$ if and only if $S \vdash p \Rightarrow q$. The same result holds for predicate logic, but the proof requires additional care because of the Generalization rule: when we convert a proof of $q$ from $S \cup \{p\}$ into a proof of $p \Rightarrow q$ from $S$, we must ensure that any application of Generalization in the original proof remains valid.
[quotetheorem:1484]
[citeproof:1484]
The Deduction Theorem is indispensable in practice: instead of building up a proof of $p \Rightarrow q$ explicitly from $S$, we may work with the richer assumption set $S \cup \{p\}$ and derive $q$ more naturally. It is worth pausing on the side condition in the Generalization rule, because omitting it would make the theorem false. Suppose we dropped the requirement that $x$ not be free in any premise used to derive $r$. Then from the single premise $\phi(x)$ (with $x$ free) we could apply Generalization to obtain $(\forall x)\,\phi(x)$. By the (hypothetically unrestricted) Deduction Theorem, this would give $\vdash \phi(x) \Rightarrow (\forall x)\,\phi(x)$ — asserting that any property holding of one specific element automatically holds universally. This would allow us to prove $(\forall x)\,\phi(x)$ from $\phi(x)$ as a free hypothesis, regardless of what $\phi$ says, which is plainly unsound: $\phi(x)$ might be true of only one element of a large structure. The side condition is exactly what prevents this collapse.
### Soundness
The first bridge between syntax and semantics is Soundness: every provable statement is semantically valid. This direction is the easy one — it amounts to checking that all axioms are tautologies and that the deduction rules preserve truth.
[quotetheorem:1485]
[citeproof:1485]
Soundness guarantees that predicate logic does not prove falsehoods: if something is provable from $S$, it is true in every model of $S$. The axiom system is crucial here — in particular, Axiom 3 ($\neg\neg p \Rightarrow p$, the law of double negation elimination) is what makes the system classical rather than intuitionistic. Without Axiom 3, the remaining axioms describe intuitionistic predicate logic. In that setting, Soundness still holds (with respect to Kripke semantics rather than classical truth), but the statement of Completeness changes: one can no longer derive every classically valid sentence. For example, $p \vee \neg p$ is a classical tautology (it holds in every $L$-structure under the two-valued semantics) but is not provable in intuitionistic logic — its proof requires Axiom 3 or an equivalent. The system we have presented is sound for classical semantics precisely because Axiom 3 is present. The converse of Soundness — Completeness — is much deeper.
### Completeness: Henkin's Construction
The Completeness Theorem asserts that $S \models p$ implies $S \vdash p$. By the Deduction Theorem, this is equivalent to showing: if $S \cup \{\neg p\}$ is consistent (does not prove $\bot$), then $S \cup \{\neg p\}$ has a model. Thus the theorem reduces to a single key lemma.
[quotetheorem:1486]
[citeproof:1486]
Several features of this argument deserve attention. The use of Zorn's Lemma in Phase 1 is necessary when $L$ is uncountable: there is no effective procedure for deciding which sentences to add, and Zorn replaces that missing decision procedure. When $L$ is countable, however, the complete extension can be built by direct enumeration — list all sentences $p_1, p_2, p_3, \ldots$ of $L$ and at stage $n$ add $p_n$ if doing so preserves consistency, and $\neg p_n$ otherwise. This entirely avoids the axiom of choice (or more precisely, requires only the Boolean Prime Ideal Theorem, which is strictly weaker than the full axiom of choice and suffices for the uncountable case via a more refined argument). The term model itself is built entirely from the language — there is no "external" domain — and the witness constants are the device that forces existential sentences to be satisfied: without them, the term model might contain a sentence $(\exists x)\, q$ provable in $\bar{S}$ without any closed term $t$ providing $q[t/x]$.
[quotetheorem:1487]
[citeproof:1487]
The Completeness Theorem is a profound alignment: every truth that holds universally across all models is accessible by a finite proof. It does not say that every sentence is decidable — the question of which sentences are provable from $S$ may still be computationally intractable or even undecidable, as Gödel's Incompleteness Theorem shows for Peano Arithmetic (see Chapter 8). Nor does Completeness say anything about which sentences are true in a specific model: it speaks only about semantic consequences that hold in every model simultaneously. A sentence true in $\mathbb{N}$ but false in some non-standard model of PA is not a semantic consequence of PA, and the Completeness Theorem correctly predicts it is not provable in PA either.
## Peano Arithmetic
How do we pin down the natural numbers in first-order logic? A naive attempt might say "every element is reachable from $0$ by finitely many applications of the successor function." But this formulation quantifies over the natural numbers themselves in the meta-language — "finitely many" is not a first-order concept unless we already have the natural numbers available to count with. Any attempt to express reachability directly collapses into circularity. Peano's axioms break this impasse by replacing the single non-first-order reachability condition with an induction scheme: instead of asserting that every element can be reached, we assert that any definable property which holds at $0$ and is closed under the successor must hold everywhere. Each instance of the scheme is a genuine first-order sentence, and together they capture a great deal of the inductive structure of $\mathbb{N}$ without appealing to any external notion of finiteness.
The signature for Peano Arithmetic is $\Omega = \{0, s, +, \times\}$ with arities $0, 1, 2, 2$ respectively, and $\Pi = \varnothing$. The symbol $s$ is the **successor** function, to be thought of as "$+1$": the natural number $n$ is represented by the closed term $s(\cdots s(0)\cdots)$ with $n$ applications of $s$.
[definition: Peano Arithmetic]
The **Peano Arithmetic** (PA) is the theory in the language $(\{0, s, +, \times\}, \varnothing, \alpha)$ with the following axioms:
1. $(\forall x)\, \neg(s(x) = 0)$ — $0$ is not a successor;
2. $(\forall x)(\forall y)\, (s(x) = s(y) \Rightarrow x = y)$ — $s$ is injective;
3. For each formula $p(x, y_1, \ldots, y_n)$ with free variables among $x, y_1, \ldots, y_n$:
\begin{align*}
(\forall y_1) \cdots (\forall y_n)\big[p[0/x] \wedge (\forall x)(p \Rightarrow p[s(x)/x])\big] \Rightarrow (\forall x)\, p
\end{align*}
(the **induction scheme** — one axiom for each such formula $p$);
4. $(\forall x)(x + 0 = x)$;
5. $(\forall x)(\forall y)(x + s(y) = s(x + y))$;
6. $(\forall x)(x \times 0 = 0)$;
7. $(\forall x)(\forall y)(x \times s(y) = (x \times y) + x)$.
[/definition]
The standard model $\mathbb{N}$ (with $0, s(n) = n+1, +, \times$ their usual meanings) satisfies all these axioms. Axioms 4 and 5 together define addition by recursion on the second argument, and Axioms 6 and 7 define multiplication similarly.
The induction scheme in Axiom 3 deserves careful attention. The quantifiers $(\forall y_1) \cdots (\forall y_n)$ in front are necessary: to prove commutativity of addition $x + y = y + x$ by induction on $x$, we need to carry the parameter $y$ along, proving the statement for all $y$ simultaneously. Without the universal quantifiers over the parameters, we could only fix a specific $y = 0, 1, 2, \ldots$ and run a separate induction for each, which requires naming each individual natural number — impossible in an uncountable model.
### Non-Standard Models of PA
By the Upward Löwenheim–Skolem theorem (proved below), PA has models of every infinite cardinality. Since $\mathbb{N}$ is countable, any uncountable model of PA is not isomorphic to $\mathbb{N}$. These **non-standard models** contain elements that behave like natural numbers under the operations $s, +, \times$, but that are "larger than all standard natural numbers" — there exist non-standard elements $h$ for which $0 < h, s(0) < h, s(s(0)) < h, \ldots$ holds (using the order definable from PA), yet $h \neq s^n(0)$ for any $n \in \mathbb{N}$.
The existence of non-standard models is not a contradiction: it shows that no first-order theory can fully characterize $\mathbb{N}$ up to isomorphism. The true second-order induction axiom — which quantifies over all subsets of $\mathbb{N}$ — does characterize $\mathbb{N}$, but it lies outside first-order logic. The first-order induction scheme of PA covers only those subsets of $\mathbb{N}$ that are definable by a first-order formula, and there are only countably many such formulas.
Gödel's Incompleteness Theorem states that PA is not complete: there exists a sentence $p$ such that neither $\mathrm{PA} \vdash p$ nor $\mathrm{PA} \vdash \neg p$. Such a $p$ is true in $\mathbb{N}$ but not provable in PA. This does not contradict Gödel's Completeness Theorem: the Completeness Theorem says that if $p$ holds in every model of PA, then PA proves $p$. Since PA has non-standard models that do not satisfy every sentence true in $\mathbb{N}$, the sentence $p$ (which is true in $\mathbb{N}$ but false in some non-standard model) falls outside the scope of what PA can prove.
## Completeness and Categoricity*
The Completeness Theorem raises a further question: which theories are **complete** in the sense that they settle every sentence one way or the other? And which theories have a unique model of each infinite cardinality? This section develops these ideas and draws out their consequences via the Compactness Theorem and the Löwenheim–Skolem theorems.
Throughout this section we assume the language $L$ is countable (i.e., $\Omega$ and $\Pi$ are both countable sets).
[definition: Complete Theory]
A theory $T$ is **complete** if for every sentence $p$ in the language, either $T \vdash p$ or $T \vdash \neg p$.
[/definition]
Complete theories are relatively rare. The theory of groups is not complete, since the commutativity sentence $(\forall x)(\forall y)(m(x,y) = m(y,x))$ is true in abelian groups and false in non-abelian groups, so neither the sentence nor its negation follows from the group axioms alone. Finding complete theories requires either very strong axiom sets or a structural argument showing all models look alike.
### The Compactness Theorem
The Compactness Theorem is an immediate corollary of the Completeness Theorem, but its implications reach far beyond mere syntax. It says that an infinite theory can be inconsistent only if some finite fragment is already inconsistent — or dually, that satisfying every finite subset is sufficient to guarantee the existence of a model for the whole theory.
[quotetheorem:1488]
[citeproof:1488]
The Compactness Theorem is a powerful tool for constructing non-standard models and for showing that certain properties cannot be axiomatized in first-order logic. Two immediate applications illustrate both uses.
[example: Finite Groups Cannot Be Axiomatized]
The class of finite groups has no first-order axiomatization. Suppose for contradiction that a theory $T$ in the language of groups has exactly the finite groups as its models. Let $T'$ be $T$ together with the sentences $\sigma_n := (\exists x_1) \cdots (\exists x_n)\bigwedge_{i \neq j}(x_i \neq x_j)$ for each $n \geq 2$, which assert that the model has at least $n$ elements. Every finite subset of $T'$ involves only finitely many of the $\sigma_n$, say $\sigma_2, \ldots, \sigma_N$, and $\mathbb{Z}_M$ for $M > N$ satisfies all of $T \cup \{\sigma_2, \ldots, \sigma_N\}$. By Compactness, $T'$ has a model, which is simultaneously a finite group (by $T$) and arbitrarily large (by all $\sigma_n$) — a contradiction.
More generally, **finiteness is not a first-order property**: if a theory $S$ has arbitrarily large finite models, then by Compactness it also has an infinite model.
[/example]
### The Löwenheim–Skolem Theorems
The Löwenheim–Skolem theorems characterize the possible cardinalities of models of a first-order theory. Together they show that first-order logic cannot control the size of its models: any theory with an infinite model has models of every infinite cardinality.
[quotetheorem:1489]
[citeproof:1489]
The Upward theorem shows that no first-order theory with an infinite model is categorical across all cardinalities in the upward direction: there are always larger models. As a consequence, PA has uncountable models that are not isomorphic to $\mathbb{N}$, and the theory of real-closed fields has models of every uncountable cardinality. It is worth noting that the argument relies critically on Compactness — and Compactness in turn applies only to infinite (i.e., satisfiable) theories. The technique of adding $\kappa$ many pairwise-distinct constants and invoking Compactness to build a model of size $\geq \kappa$ does not apply to finite structures: if $S$ has only finite models, there is no infinite model to "start from," and indeed the Compactness argument breaks down because the finite-model condition on $S$ may not survive the addition of infinitely many distinct-constants sentences.
[quotetheorem:1490]
[citeproof:1490]
The downward theorem has a striking corollary: the theory of the reals (in its first-order language) has a countable model. This does not mean the reals are countable — it means there is a countable structure satisfying the same first-order sentences as $\mathbb{R}$. The "reals" in this countable model look identical to $\mathbb{R}$ from the perspective of first-order logic, but they omit "most" real numbers. This is **Skolem's paradox**: set theory (a first-order theory) has a countable model by the downward theorem, even though set theory proves the existence of uncountable sets. The resolution is that "uncountable" is itself a first-order concept relative to the model — the bijection witnessing countability lies outside the model.
### Categoricity and Complete Theories
A theory is $\kappa$-categorical if it has exactly one model of cardinality $\kappa$ up to isomorphism. Categorical theories are the most rigid possible: there is essentially no choice in what a model looks like at that cardinality. The connection to completeness is clean.
[definition: $\kappa$-Categorical Theory]
Let $\kappa$ be an infinite cardinal. A theory $T$ is **$\kappa$-categorical** if all models of $T$ of cardinality $\kappa$ are isomorphic to one another.
[/definition]
Categoricity is a strong structural condition. To verify it, one must show that any two models of the right cardinality are isomorphic — a task that typically requires a back-and-forth or transfinite construction. The payoff, however, is that categoricity implies completeness under a mild hypothesis.
[quotetheorem:1491]
[citeproof:1491]
The requirement that $T$ have no finite models is essential: the pure identity theory (empty language, no axioms) is $\kappa$-categorical for every $\kappa$ — any bijection between two sets of the same cardinality is an isomorphism since there is nothing to preserve — but it is not complete, because $(\exists x)(\exists y)(\neg(x = y))$ is true in all structures of cardinality $\geq 2$ but not provable from the empty theory (a one-element structure is also a model). The hypothesis excludes exactly this kind of degenerate situation: once $T$ has only infinite models, the Löwenheim–Skolem machinery can produce models of the target cardinality from any consistent extension, and the categoricity assumption then forces those extensions to agree on every sentence.
[example: Dense Linear Orders Without Endpoints]
The theory $\mathrm{DLO}$ of **dense linear orders without endpoints** has the language $(\varnothing, \{\leq\}, \alpha)$ with $\alpha(\leq) = 2$, and its axioms express: $\leq$ is a total order; for all $x < y$ there exists $z$ with $x < z < y$ (density); and there is no minimum or maximum element. The rational line $(\mathbb{Q}, \leq)$ is a countable model of $\mathrm{DLO}$. By Cantor's back-and-forth argument, any two countable dense linear orders without endpoints are isomorphic, so $\mathrm{DLO}$ is $\aleph_0$-categorical. Since $\mathrm{DLO}$ has no finite models (density forces the order to be infinite), Categoricity implies Completeness: $\mathrm{DLO}$ is complete. This means every first-order sentence in the language of orders is either true in every dense linear order without endpoints, or false in all of them — and the truth value is determined by $\mathbb{Q}$.
[/example]
[example: Algebraically Closed Fields]
The theory $\mathrm{ACF}_p$ of algebraically closed fields of characteristic $p$ (for $p$ a prime or $p = 0$) is $\kappa$-categorical for every uncountable cardinal $\kappa$. The key field-theoretic fact is that an algebraically closed field is determined up to isomorphism by its characteristic and its transcendence degree over the prime field ($\mathbb{Q}$ or $\mathbb{F}_p$); two uncountable algebraically closed fields of the same characteristic and the same uncountable cardinality have equal transcendence degree and are therefore isomorphic. Since $\mathrm{ACF}_p$ has no finite models, it follows that $\mathrm{ACF}_p$ is complete for each $p$.
This completeness has a striking corollary — the **Ax–Grothendieck theorem**: any injective polynomial map $f : \mathbb{C}^n \to \mathbb{C}^n$ is surjective. The statement is a first-order sentence in the language of rings. Since $\mathrm{ACF}_0$ is complete, it suffices to verify the sentence in some algebraically closed field of characteristic $0$. Now, the injectivity–surjectivity statement for polynomial maps of degree $d$ can be expressed as a first-order sentence, and any proof of its negation in $\mathrm{ACF}_0$ uses only finitely many of the characteristic-$0$ axioms (the sentences $\underbrace{1 + \cdots + 1}_{n} \neq 0$). Hence if the statement failed in $\mathrm{ACF}_0$, its negation would be provable in $\mathrm{ACF}_p$ for all sufficiently large primes $p$ (since only finitely many characteristic-$0$ axioms are used). But for characteristic $p > 0$, we can work in $\overline{\mathbb{F}}_p$, the algebraic closure of $\mathbb{F}_p$: any polynomial $f$ has coefficients in some finite subfield $\mathbb{F}_{p^k}$, and any element of the target lies in a finite extension. Restricting $f$ to the finite field generated by the coefficients and the target element, an injective function on a finite set is automatically surjective. Hence the statement is true in every $\mathrm{ACF}_p$ for $p > 0$, and by the transfer argument, true in $\mathrm{ACF}_0$, and thus true in $\mathbb{C}$.
[/example]
The Löwenheim–Skolem theorems show that first-order categoricity is impossible for uncountable structures in the downward direction: any theory with an uncountable model also has a countable model, so no theory categorical at an uncountable cardinal $\kappa$ is categorical at $\aleph_0$ (the models look quite different). The full picture of which cardinalities a given theory can be categorical at is given by a deep theorem of Morley.
[quotetheorem:1492]
The proof of Morley's theorem lies beyond this course, but the statement reveals a remarkable rigidity: uncountable categoricity is not a property that depends on the particular cardinality chosen. A theory is either categorical at all uncountable cardinals or at none. The term **uncountably categorical** is therefore unambiguous. Both $\mathrm{DLO}$ (which is $\aleph_0$-categorical but not uncountably categorical — $\mathbb{R}$ and $\mathbb{Q}$ have the same theory but are not isomorphic) and $\mathrm{ACF}_p$ (which is uncountably categorical but not $\aleph_0$-categorical — countable algebraically closed fields of characteristic $0$ can have different transcendence degrees over $\mathbb{Q}$) illustrate the distinction.
Predicate logic is where we can write the axioms of set theory itself, and certain fragments — like the theory of dense linear orders or algebraically closed fields — achieve a remarkable completeness: all models satisfying the axioms are structurally identical. This raises the fundamental question: is set theory, too, categorical?
# 5. Set Theory
Set theory occupies a peculiar position in the mathematical landscape: it is simultaneously the foundation on which all other mathematics is built, and just another first-order theory in the sense of Chapter 5. This chapter treats it as both. We axiomatise set theory using the language of predicate logic — the same framework we used for groups and fields — and then ask what the resulting universe of sets actually looks like. The answer involves transfinite induction, collapsing theorems, and an infinite tower of cumulative hierarchies that together paint a coherent picture of the mathematical universe.
## Axioms of Set Theory
The difficulty in founding mathematics is that we want to talk about arbitrary collections — of numbers, of functions, of structures — without running into contradictions like Russell's paradox. Naive set theory, which permits forming $\{x : p(x)\}$ for any property $p$, collapses immediately: take $p(x)$ to be $x \notin x$. The resolution is to replace unrestricted comprehension with a carefully chosen list of axioms that permit enough set-formation to do mathematics, but not so much that contradictions arise. This is the programme of Zermelo–Fraenkel set theory.
[definition: Zermelo-Fraenkel Set Theory]
**Zermelo–Fraenkel set theory** (ZF) is the first-order theory with empty function-symbol signature $\Omega = \varnothing$ and relation signature $\Pi = \{\in\}$, where $\in$ has arity 2. A **universe of sets** is a model $(V, \in)$ satisfying the ZF axioms listed below.
[/definition]
Officially we should write $\in_V$ to emphasise that the binary relation symbol is interpreted in $V$, but this notation is so unwieldy that we always suppress it. Each model $(V, \in)$ is intended to contain an internal copy of all of mathematics, so it will be an enormously complicated structure. The axioms are chosen so that this is plausible.
We can group the axioms into three families: two axioms that get the theory started, four axioms that let us build new sets from old ones, and three axioms that are needed but less immediately intuitive.
### The Starting Axioms
The most basic question is: what makes two sets the same? In ordinary mathematics, a set is determined by its elements. This becomes an axiom.
[definition: Axiom of Extensionality]
**Extensionality**: Two sets with the same members are equal:
\begin{align*}
(\forall x)(\forall y)\bigl((\forall z)(z \in x \Leftrightarrow z \in y) \Rightarrow x = y\bigr).
\end{align*}
[/definition]
The converse direction — that equal sets have the same members — is an instance of a logical axiom and does not need to be stated separately. Extensionality is what makes $\in$ behave like genuine set-membership rather than an arbitrary binary relation.
The next question is whether there are any sets at all. We cannot derive the existence of a set from logic alone, so we posit one explicitly.
[definition: Axiom of Empty Set]
**Empty Set**: There exists a set with no members:
\begin{align*}
(\exists x)(\forall y)(y \notin x).
\end{align*}
By Extensionality, this set is unique; we denote it $\varnothing$. The expression $\varnothing$ is an abbreviation: $p(\varnothing)$ means $(\exists x)(x\text{ has no members} \wedge p(x))$.
[/definition]
### The Building Axioms
With $\varnothing$ in hand, we need ways to construct more sets. The four building axioms provide the basic operations.
[definition: Axiom of Pair Set]
**Pair**: For any two sets $x$ and $y$, the set $\{x, y\}$ exists:
\begin{align*}
(\forall x)(\forall y)(\exists z)(\forall t)(t \in z \Leftrightarrow (t = x \vee t = y)).
\end{align*}
We write $\{x\}$ for $\{x, x\}$.
[/definition]
The Pair axiom alone is not enough to produce ordered pairs, since $\{x, y\} = \{y, x\}$ — it carries no ordering information. The Kuratowski encoding resolves this.
[definition: Ordered Pair]
The **ordered pair** $(x, y)$ is the set $\{\{x\}, \{x, y\}\}$.
We say that $x$ is an ordered pair if $(\exists y)(\exists z)(x = (y, z))$.
[/definition]
One verifies from Extensionality that $(a, b) = (c, d)$ if and only if $a = c$ and $b = d$. With ordered pairs available, we can define functions set-theoretically.
[definition: Function (Set-Theoretic)]
A set $f$ is a **function** if every member of $f$ is an ordered pair and $f$ is single-valued:
\begin{align*}
&(\forall x)(x \in f \Rightarrow x \text{ is an ordered pair}) \wedge {}\\
&(\forall x)(\forall y)(\forall z)\bigl[(x, y) \in f \wedge (x, z) \in f \Rightarrow y = z\bigr].
\end{align*}
The **domain** of $f$ is $\operatorname{dom} f := \{y : (\exists z)((y, z) \in f)\}$. We write $f \colon x \to y$ to mean $f$ is a function with $\operatorname{dom} f = x$ whose range is contained in $y$.
[/definition]
Once we have verified that this formal definition reproduces the properties of functions we expect, we set it aside and reason about functions in the usual informal way. The next building axiom allows us to take unions.
[definition: Axiom of Union]
**Union**: For any set $x$, the union $\bigcup x$ of all members of $x$ is a set:
\begin{align*}
(\forall x)(\exists y)(\forall z)\bigl(z \in y \Leftrightarrow (\exists t)(t \in x \wedge z \in t)\bigr).
\end{align*}
We write $x \cup y$ for $\bigcup\{x, y\}$.
[/definition]
Note that intersection does not require a separate axiom: given any $y \in x$, we can define $\bigcap x := \{z \in y : (\forall t \in x)(z \in t)\}$ using Separation (which we will derive from Replacement below). The Power Set axiom gives us the next fundamental construction.
[definition: Axiom of Power Set]
**Power Set**: For any set $x$, the collection of all subsets of $x$ is a set:
\begin{align*}
(\forall x)(\exists y)(\forall z)(z \in y \Leftrightarrow z \subseteq x),
\end{align*}
where $z \subseteq x$ abbreviates $(\forall t)(t \in z \Rightarrow t \in x)$. We write $\mathcal{P}(x)$ for this set.
[/definition]
With Union and Power Set together, we can form Cartesian products: for $t \in x$ and $s \in y$, the ordered pair $(t, s) = \{\{t\}, \{t, s\}\}$ belongs to $\mathcal{P}(\mathcal{P}(x \cup y))$, so $x \times y$ exists as a subset of $\mathcal{P}(\mathcal{P}(x \cup y))$ by Separation. The set of all functions from $x$ to $y$ then exists as a subset of $\mathcal{P}(x \times y)$.
### The Three Surprising Axioms
The building axioms are not enough. From $\varnothing$ alone, all the operations above produce only finite sets. To do mathematics we need infinite sets, and to prevent pathological self-referential behaviour we need structural constraints on the universe.
**Infinity.** To assert the existence of an infinite set in finitely many words, observe that if we write $x^+$ for $x \cup \{x\}$, then $\varnothing, \varnothing^+, \varnothing^{++}, \ldots$ are all distinct sets. Writing $0 = \varnothing$, $1 = \{0\}$, $2 = \{0, 1\}$, and so on, these sets represent the natural numbers. The problem is that their collection need not be a set without an explicit axiom.
[definition: Axiom of Infinity]
**Infinity**: There exists a **successor set** — a set that contains $\varnothing$ and is closed under the successor operation $x \mapsto x^+ := x \cup \{x\}$:
\begin{align*}
(\exists x)\bigl(\varnothing \in x \wedge (\forall y)(y \in x \Rightarrow y^+ \in x)\bigr).
\end{align*}
[/definition]
A successor set may contain many elements beyond the natural numbers. To isolate $\mathbb{N}$, we intersect all successor sets. The intersection of successor sets is again a successor set, so there is a least one. We write $\omega$ for this least successor set; it satisfies
\begin{align*}
x \in \omega \iff (\forall y)(y \text{ is a successor set} \Rightarrow x \in y).
\end{align*}
From this characterisation one derives — by $\omega$-induction, i.e. induction on $\omega$ — that $\omega$ satisfies the Peano axioms. Unlike the weak first-order induction scheme in Peano arithmetic, this is genuine second-order induction: if $x \subseteq \omega$ is a successor set, then $x = \omega$. We can also define finiteness and countability internally: $x$ is **finite** if there exists $n \in \omega$ and a bijection between $x$ and $n$; $x$ is **countable** if it is finite or bijects with $\omega$.
**Foundation.** We want to exclude pathological sets like $x \in x$, the two-cycle $x \in y \in x$, or infinite descending chains $\cdots \in x_2 \in x_1 \in x_0$. All these pathologies share a common feature: the set $\{x\}$, or $\{x, y\}$, or $\{x_0, x_1, x_2, \ldots\}$ has no $\in$-minimal element — no member that is disjoint from the set.
[definition: Axiom of Foundation]
**Foundation** (also called Regularity): Every nonempty set has an $\in$-minimal member — a member disjoint from the set:
\begin{align*}
(\forall x)\bigl(x \neq \varnothing \Rightarrow (\exists y)(y \in x \wedge (\forall z)(z \in x \Rightarrow z \notin y))\bigr).
\end{align*}
[/definition]
Foundation is used in only a few key places (particularly when setting up the von Neumann hierarchy), and most results in set theory do not depend on it. Its role is to ensure that the universe "looks nice" — built from the empty set upward, with no self-referential sets lurking.
**Replacement.** In ordinary mathematics, whenever we have an indexed family $\{A_i : i \in I\}$, we treat $\{A_i : i \in I\}$ as a set without comment. But why should the image of a set under a "definable assignment" be a set? None of the axioms so far guarantee this. The Axiom of Replacement fills this gap, but to state it precisely we first need the notion of a class.
### Classes and the Axiom of Replacement
The assignment $x \mapsto \{x\}$ for all sets $x$ looks like a function, but is not one in the formal sense: every set-theoretic function has a domain that is a set, while here the "domain" is all of $V$. Such assignments are called classes.
[definition: Class]
Let $(V, \in)$ be a model of ZF. A **class** is a collection $C$ of elements of $V$ defined by some first-order formula $p(x, t_1, \ldots, t_n)$ with parameters $t_1, \ldots, t_n \in V$:
\begin{align*}
x \in C \iff p(x, t_1, \ldots, t_n) \text{ holds in } (V, \in).
\end{align*}
A class that is not a set (i.e. is not an element of $V$) is called a **proper class**.
[/definition]
[example: Examples of Classes]
The following are classes in any model of ZF.
- $V$ itself, defined by $p(x)$ being $x = x$.
- For any fixed $t \in V$, the collection $\{x \in V : t \in x\}$, with parameter $t$.
- Any set $y \in V$, viewed as the class defined by $p(x)$ being $x \in y$.
The class $V$ itself is a proper class: if $V$ were a set, Separation would yield Russell's paradox.
[/example]
A **function-class** is a class of ordered pairs that is single-valued — a class $F$ such that $(x, y) \in F$ and $(x, z) \in F$ implies $y = z$. The assignment $x \mapsto \{x\}$ is a function-class, defined by $p(x, y)$ being $y = \{x\}$. We cannot refer to function-classes inside $V$ (they are not sets), so we replace them by the formulas defining them.
[definition: Axiom of Replacement]
**Replacement**: For each first-order formula $p(x, y, t_1, \ldots, t_n)$, if $p$ defines a function-class (for each $x$ there is at most one $y$ with $p(x, y)$), then the image of any set under this function-class is a set:
\begin{align*}
(\forall t_1)\cdots(\forall t_n)\Bigl(&\bigl[(\forall x)(\forall y)(\forall z)(p(x,y) \wedge p(x,z) \Rightarrow y = z)\bigr]\\
&\Rightarrow \bigl[(\forall x)(\exists y)(\forall z)(z \in y \Leftrightarrow (\exists t)(t \in x \wedge p(t, z)))\bigr]\Bigr).
\end{align*}
This is an axiom scheme, with one instance for each formula $p$.
[/definition]
Replacement has a companion: the **Axiom of Separation** (also called Comprehension), which asserts that for any set $x$ and formula $p$, the collection $\{z \in x : p(z)\}$ is a set. Separation is in fact a consequence of Replacement: apply Replacement with the function-class $z \mapsto z$ restricted to those $z \in x$ satisfying $p(z)$. So we do not need Separation as a separate axiom, though it is often listed separately for clarity.
### The Axiom of Choice
The eight axioms above constitute ZF. A ninth axiom — the Axiom of Choice — is kept separate because it has a different logical character: it is independent of ZF (neither provable nor disprovable from ZF alone, assuming ZF is consistent).
[definition: Axiom of Choice and ZFC]
**AC** (Axiom of Choice): Every family of nonempty sets has a choice function. More precisely, if $f$ is a function with $f(x) \neq \varnothing$ for every $x \in \operatorname{dom} f$, then there exists a function $g$ with $\operatorname{dom} g = \operatorname{dom} f$ and $g(x) \in f(x)$ for every $x \in \operatorname{dom} g$.
**ZFC** is the theory ZF $+$ AC.
[/definition]
The Axiom of Choice is equivalent (over ZF) to many statements throughout mathematics: Zorn's Lemma, the Well-Ordering Principle, Tychonoff's theorem for compact spaces, the existence of bases for all vector spaces, and many more. Its independence from ZF, established by Gödel and Cohen, means that questions like whether every set of real numbers is measurable genuinely cannot be settled without it (or without its negation).
The independence of AC has concrete consequences that are easy to state but impossible to prove in ZF alone. For instance, ZF does not prove that every vector space has a basis: there are models of ZF in which $\mathbb{R}$, viewed as a vector space over $\mathbb{Q}$, has no Hamel basis — no maximal linearly independent subset from which every real number can be expressed as a finite rational linear combination. Similarly, ZF alone does not prove that a countable union of countable sets is countable: without a choice function selecting a bijection $A_n \to \omega$ for each $n$, there is no way to interleave the enumerations, and models exist in which $\bigcup_{n \in \omega} A_n$ is uncountable even though each $A_n$ is countable. These failures are not pathologies of exotic models; they reflect a genuine combinatorial gap in what ZF can prove.
## Properties of ZF
### Transitive Sets and the Transitive Closure
To understand the structure of a model $(V, \in)$, we need a way to talk about sets that are "self-contained" with respect to membership — sets where following the membership relation stays inside the set. Without such a notion, we cannot make sense of the idea that sets are built from simpler sets, nor can we apply Foundation productively.
[definition: Transitive Set]
A set $x$ is **transitive** if every member of a member of $x$ is again a member of $x$:
\begin{align*}
(\forall y)\bigl((\exists z)(y \in z \wedge z \in x) \Rightarrow y \in x\bigr).
\end{align*}
Equivalently, $\bigcup x \subseteq x$.
[/definition]
[example: Transitive Sets]
The set $\{\varnothing, \{\varnothing\}\}$ is transitive: its only member with a member is $\{\varnothing\}$, and $\varnothing \in \{\varnothing, \{\varnothing\}\}$. More generally, for any $n \in \omega$, the set $n = \{0, 1, \ldots, n-1\}$ is transitive, since $k < j < n$ implies $k \in j$ and $k \in n$. The set $\omega$ itself is transitive: if $k \in j \in \omega$, then $k \in \omega$.
The set $\{\{\varnothing\}\}$ is not transitive: it has $\{\varnothing\}$ as a member, and $\varnothing \in \{\varnothing\}$, but $\varnothing \notin \{\{\varnothing\}\}$.
[/example]
Transitivity matters because it makes a set into a "self-contained" universe: within a transitive set, membership witnesses are available locally. This is why transitive sets arise naturally when working with models of ZF or fragments thereof.
[quotetheorem:1493]
[citeproof:1493]
The use of Replacement here is essential and genuine, not merely a technicality. We need to collect infinitely many iterated unions into a single set, which no combination of the other axioms achieves alone. Note also the technique of **attempts**: rather than defining a recursive function by a circular formula, we quantify over all finite approximations.
### $\in$-Induction and $\in$-Recursion
Foundation states that every nonempty set has an $\in$-minimal element. This looks like a well-foundedness condition, and should therefore support an induction principle. But the usual induction argument — "take the minimal counterexample" — requires forming the set $\{x : \neg p(x)\}$, which may not exist (if $p(x)$ is $x \neq x$, this "set" is all of $V$, which is a proper class). The transitive closure provides the workaround.
[quotetheorem:1494]
[citeproof:1494]
The key technique is to localise the argument to $TC(\{x_0\})$, a genuine set, so that Foundation can be applied. This trick — reducing a statement about all of $V$ to a statement about an actual set — recurs throughout set theory.
The converse direction is also true: $\in$-induction implies Foundation. To see this, consider the property $p(x)$ defined as "for every $y$ with $x \in y$, the set $y$ has an $\in$-minimal member." One shows by $\in$-induction that every set has this property, which is equivalent to Foundation. The two principles are therefore equivalent.
With induction established, we turn to recursion. In ordinary mathematics, we define functions by recursion: $f(x)$ is defined using the values $f(y)$ for $y$ simpler than $x$. The same mechanism works for $\in$, since Foundation makes $\in$ well-founded.
[quotetheorem:1495]
[citeproof:1495]
The theorem produces a function-class $F$, not a set-theoretic function, and this distinction is essential. To see why, consider the attempt to define $F$ by $F(x) = \{F(y) : y \in x\}$ for all sets $x$. Evaluated at every set $x$ simultaneously, the "range" of $F$ must contain a copy of every set — indeed, for each set $x$, the value $F(x)$ is determined by the values at its members, and by induction $F$ is the identity on all sets (one checks $F(x) = x$ for all $x$). The full "range" $\{F(x) : x \in V\} = V$ is then the class of all sets, which is a proper class. There is no set-valued function $f : V \to V$ with this property, because the codomain would need to be a set. The $\in$-recursion theorem sidesteps this by accepting a function-class as output: the universe $V$ is wide enough to accommodate $F$, even though no set is.
This result is used almost immediately in the next section to define the von Neumann hierarchy $V_\alpha$: the assignment $\alpha \mapsto V_\alpha$ is a function-class defined by ordinal recursion, which is a special case of $\in$-recursion. Without the theorem in its full class-valued generality, the definition of $V_\alpha$ would have no formal justification.
### Well-Founded Relations
The proofs of $\in$-induction and $\in$-recursion used only two properties of $\in$: that it is well-founded (every nonempty set has an $\in$-minimal element), and that it is local (the collection $\{x : x \in y\}$ is a set for each $y$, which is just $y$ itself). Any other relation sharing these properties supports the same principles. Before stating the general theorem, it is worth isolating each property as a definition, since they play genuinely different roles.
[definition: Well-Founded Relation]
A relation-class $R$ is **well-founded** if every nonempty set has an $R$-minimal element: for every nonempty set $a$, there exists $y \in a$ such that no $z \in a$ satisfies $z \mathrel{R} y$.
[/definition]
Well-foundedness alone is not enough for recursion. The recursion over $\in$ is possible not just because $\in$ descends to a base, but because the set of $\in$-predecessors of any $x$ — namely $x$ itself — is a set. For a general relation $R$, the class $\{x : x \mathrel{R} y\}$ of $R$-predecessors of $y$ could be a proper class, making it impossible to treat the recursion as a function defined on a set-sized argument. Locality is the condition that rules this out: it requires that for each $y$, the recursion $F(y) = G(F|_{\{x : x \mathrel{R} y\}})$ is evaluated on a genuine set, not a proper class.
[definition: Local Relation]
A relation-class $R$ is **local** if $\{x : x \mathrel{R} y\}$ is a set for every $y$.
[/definition]
Together, well-foundedness and locality reproduce exactly the environment in which $\in$ operates: well-foundedness gives the descent-base that stops any induction from running forever, and locality guarantees that the collection of predecessors at each step is a set, so that a recursive definition can meaningfully appeal to all earlier values at once.
[quotetheorem:1496]
[citeproof:1496]
This general theorem applies to many relations beyond $\in$. A particularly useful example is the **subterm relation** on a term algebra. Given a signature $\Sigma$ of function symbols, define the terms $T(\Sigma)$ inductively (variables are terms, and if $f$ has arity $n$ and $t_1, \ldots, t_n$ are terms then $f(t_1, \ldots, t_n)$ is a term). Declare $s \mathrel{R} t$ if $s$ is a proper subterm of $t$ — i.e. $s$ occurs strictly inside $t$ at some position. This relation is well-founded (the subterm tree of any term is finite, so any nonempty set of terms has a $\mathrel{R}$-minimal element, namely one with no proper subterms in the set), and it is local (for each term $t$, the set $\{s : s \mathrel{R} t\}$ is exactly the finite set of proper subterms of $t$). The general theorem therefore guarantees that we can define functions on $T(\Sigma)$ by structural recursion — a fact used throughout logic and the theory of programming languages.
To see why locality cannot be dropped, consider the following example. Let $V$ be any model of ZF and define a relation-class $R$ on $V$ by $x \mathrel{R} y$ iff $y \in TC(\{x\})$ — in other words, $x$ can reach $y$ by iterating $\in$. This relation is well-founded (if $a$ is nonempty and $y_0 \in a$ has minimal rank, then no $z \in a$ satisfies $z \mathrel{R} y_0$, since that would require $y_0 \in TC(\{z\})$, forcing $\operatorname{rank}(y_0) > \operatorname{rank}(z)$). However, $R$ is not local: $\{x : x \mathrel{R} \omega\}$ contains all sets that can reach $\omega$ via iterated membership, which includes $0, 1, 2, \ldots$ and all sets reachable from them — a proper class. Attempting recursion over $R$ would require evaluating $F(y) = G(\{F(x) : x \mathrel{R} y\})$ at $y = \omega$, where $\{x : x \mathrel{R} \omega\}$ is a proper class; the expression $\{F(x) : x \mathrel{R} \omega\}$ is then a class-indexed collection with no guarantee of being a set, and $G$ (a function-class) cannot be applied to it in any standard sense. Locality is therefore not a technical nicety but a genuine necessity for the recursion to be well-defined.
If $r$ is a relation on a set $a$ (not a proper class), then $r$ is automatically local, so we only need to check well-foundedness. In particular, induction and recursion on well-orderings are special cases of this general theorem.
### Mostowski's Collapsing Theorem
A natural question: given a set $a$ equipped with a relation $r$ that "looks like $\in$," can we always find a transitive set $b$ with a bijection $f \colon a \to b$ under which $r$ corresponds exactly to $\in$? If $r$ is to be modelled by $\in$, it must be well-founded (since $\in$ is well-founded by Foundation) and extensional (since $\in$ satisfies Extensionality).
[definition: Extensional Relation]
A relation $r$ on a set $a$ is **extensional** if it satisfies the analogue of Extensionality:
\begin{align*}
(\forall x \in a)(\forall y \in a)\bigl((\forall z \in a)(z \mathrel{r} x \Leftrightarrow z \mathrel{r} y) \Rightarrow x = y\bigr).
\end{align*}
[/definition]
Extensionality is the additional hypothesis that makes the collapse unique, not merely existent. Without it, distinct elements $x$ and $y$ in $a$ could have identical sets of $r$-predecessors within $a$, so the recursion $f(x) = \{f(z) : z \mathrel{r} x\}$ would assign $f(x) = f(y)$, and $f$ would fail to be injective. Well-foundedness ensures the recursion terminates, but it cannot distinguish $x$ from $y$ if they are $r$-indistinguishable. Extensionality is precisely the condition that $r$-indistinguishability coincides with equality — so that the collapse can be injective, and hence a bijection.
[quotetheorem:1497]
[citeproof:1497]
Both hypotheses are necessary. If $r$ is not well-founded — say $a = \{x\}$ with $x \mathrel{r} x$ — then we would need $f(x) \in f(x)$, which Foundation prohibits. If $r$ is not extensional — say $a = \{x, y\}$ with $x \neq y$ but no $z$ satisfying $z \mathrel{r} x$ or $z \mathrel{r} y$ — then $f$ would have to send both $x$ and $y$ to $\varnothing$, contradicting bijectivity. Mostowski's theorem is the precise sense in which every abstract well-founded extensional structure is isomorphic to a concrete transitive set ordered by $\in$.
The theorem has an important consequence for ordinals. In Chapter 3, ordinals were defined as equivalence classes of well-orderings, which creates foundational difficulties (the equivalence classes are proper classes). Mostowski gives a canonical representative for each equivalence class.
[definition: Ordinal]
An **ordinal** is a transitive set that is totally ordered by $\in$.
[/definition]
By Foundation, $\in$ on a transitive set is automatically well-founded, so every ordinal is well-ordered by $\in$. The finite ordinals are $0 = \varnothing$, $1 = \{\varnothing\}$, $2 = \{\varnothing, \{\varnothing\}\}$, and so on; $\omega$ is the first infinite ordinal. Mostowski's theorem guarantees that every well-ordering is order-isomorphic to a unique ordinal: given a well-ordering $(a, r)$, the relation $r$ is extensional (since $r$ is a total order, distinct elements have different sets of $r$-predecessors), so collapse yields a unique transitive set $b$ with $r$ mirrored by $\in$, and one checks that $b$ is totally ordered by $\in$, hence an ordinal. This ordinal is the **order type** of $(a, r)$.
For any ordinal $\alpha$, the set of ordinals below $\alpha$ is $\alpha$ itself — i.e. $\beta < \alpha$ iff $\beta \in \alpha$. Successor ordinals are $\alpha^+ = \alpha \cup \{\alpha\}$, and the supremum of a family of ordinals is their union: $\sup\{\alpha_i : i \in I\} = \bigcup\{\alpha_i : i \in I\}$.
## Picture of the Universe
### The von Neumann Hierarchy
Foundation says that every set is "built from simpler sets," and $\in$-induction formalises this principle. But what does the resulting structure look like globally? We want to describe the universe $V$ as a whole — to show that every set appears at some stage of a transfinite cumulative construction starting from $\varnothing$.
The obstacle is that $V$ is a proper class, not a set, so we cannot speak of it as a single mathematical object inside the theory. What we can do is define, for each ordinal $\alpha$, the set $V_\alpha$ of all sets that have been built by stage $\alpha$, and then observe that every set belongs to some $V_\alpha$.
[definition: Von Neumann Hierarchy]
Define sets $V_\alpha$ for each ordinal $\alpha$ by $\in$-recursion on $\alpha$:
\begin{align*}
V_0 &= \varnothing,\\
V_{\alpha+1} &= \mathcal{P}(V_\alpha),\\
V_\lambda &= \bigcup\{V_\gamma : \gamma < \lambda\}
\end{align*}
for $\lambda$ a nonzero limit ordinal. We write $\mathrm{On}$ for the proper class of all ordinals, and define
\begin{align*}
V = \bigcup_{\alpha \in \mathrm{On}} V_\alpha.
\end{align*}
[/definition]
The recursive definition is legitimate by the $\in$-recursion theorem, applied to the ordinal $\in$-structure. Note the key equivalence: $x \subseteq V_\alpha \iff x \in V_{\alpha+1}$. This means membership in the next level is exactly subsethood of the current level.
[example: Computing Early Levels]
\begin{align*}
V_0 &= \varnothing,\\
V_1 &= \mathcal{P}(V_0) = \{\varnothing\},\\
V_2 &= \mathcal{P}(V_1) = \{\varnothing, \{\varnothing\}\},\\
V_3 &= \mathcal{P}(V_2) = \bigl\{\varnothing,\, \{\varnothing\},\, \{\{\varnothing\}\},\, \{\varnothing, \{\varnothing\}\}\bigr\}.
\end{align*}
For each $n \in \omega$, $|V_n| = 2^{|V_{n-1}|}$, so the finite levels grow as iterated exponentials. The ordinal $n$ itself belongs to $V_{n+1} \setminus V_n$.
[/example]
To show that every set belongs to some $V_\alpha$ we establish two preliminary lemmas.
[quotetheorem:1498]
[citeproof:1498]
[quotetheorem:1499]
[citeproof:1499]
With these lemmas we can define the rank function and prove the main theorem.
### The Rank Function and the Universe
The rank of a set $x$ measures at which stage of the von Neumann hierarchy $x$ first appears. Defining it naively — as "the least $\alpha$ with $x \in V_\alpha$" — risks being off by one. The correct recursive definition avoids this.
[definition: Rank]
The **rank** of a set $x$ is the ordinal defined by $\in$-recursion:
\begin{align*}
\operatorname{rank}(x) = \sup\bigl\{(\operatorname{rank}(y))^+ : y \in x\bigr\}.
\end{align*}
[/definition]
This definition ensures that $\operatorname{rank}(x)$ is the least ordinal $\alpha$ such that $x \subseteq V_\alpha$. For example: $\operatorname{rank}(\varnothing) = \sup \varnothing = 0$; $\operatorname{rank}(\{\varnothing\}) = 0^+ = 1$; $\operatorname{rank}(\omega) = \omega$. For any ordinal $\alpha$, $\operatorname{rank}(\alpha) = \alpha$.
[quotetheorem:1500]
[citeproof:1500]
This theorem is the culmination of the chapter: it shows that the von Neumann hierarchy $V_0 \subseteq V_1 \subseteq V_2 \subseteq \cdots \subseteq V_\omega \subseteq V_{\omega+1} \subseteq \cdots$ exhausts the entire universe. Every mathematical object — every natural number, every real number, every function, every structure — can be encoded as a set and therefore appears at some finite or transfinite stage of the hierarchy. The Axiom of Foundation is what makes this possible: without it, sets like $x = \{x\}$ would exist and have no rank.
[example: Ranks of Familiar Sets]
- $\operatorname{rank}(n) = n$ for each $n \in \omega$, since $n = \{0, 1, \ldots, n-1\}$ and each $k < n$ has rank $k$.
- $\operatorname{rank}(\omega) = \omega$, since $\omega = \{0, 1, 2, \ldots\}$ and $\operatorname{rank}(n) = n$ for each $n \in \omega$.
- $\operatorname{rank}(\mathcal{P}(\omega)) = \omega + 1$, since $\mathcal{P}(\omega) \subseteq V_{\omega+1}$ and $\omega \in \mathcal{P}(\omega)$ has rank $\omega$.
- In general, $\operatorname{rank}(\mathcal{P}(x)) = \operatorname{rank}(x) + 1$.
[/example]
The structure of the von Neumann hierarchy illuminates the relationship between ordinals and the rest of the universe. An ordinal $\alpha$ is a transitive set well-ordered by $\in$; its rank equals itself, so ordinals serve as their own bookmarks in the hierarchy. The universe $V$ is stratified into layers $V_\alpha$ indexed by the ordinals, with each layer consisting of all sets whose members were already present one stage earlier. This picture — a cone widening as one ascends through the ordinals — is the standard mental image of the set-theoretic universe and provides the foundation for all subsequent developments, including the study of consistency and independence questions.
Set theory, the universal foundation, is itself merely a first-order theory with an underlying partial order given by the membership relation. The cumulative hierarchy V_α shows how all sets are built iteratively from the empty set, layer by layer, forming an ever-widening cone. But what about the sizes of these sets — how do we compare their cardinalities?
# 6. Cardinals
This chapter addresses a question that has been lurking throughout the course: what does it mean for two sets to have the "same size"? For finite sets the answer is elementary — two sets have the same size when they have the same number of elements. But once we enter the realm of infinite sets, counting elements no longer makes sense in the ordinary way. The ordinals developed in Chapter 3 measure the *length* of a well-ordering, and we have seen that $\omega$ and $\omega + 1$ are genuinely different ordinals even though they differ by only one element. Cardinals, by contrast, measure something coarser: the raw *size* of a set, abstracted away from any particular ordering. The machinery of ZFC — in particular, the Well-ordering Theorem — gives us a beautifully economical way to define cardinals as special ordinals, and from that definition flows an entire arithmetic of infinite sizes. A surprising punchline of the theory is that addition and multiplication of infinite cardinals are as simple as possible, while exponentiation is as complicated and independent of ZFC as anything in mathematics.
## Cardinality and the Definition of Cardinals
The central problem is to assign to each set $X$ a "size" $\operatorname{card}(X)$ so that $\operatorname{card}(X) = \operatorname{card}(Y)$ if and only if there is a bijection from $X$ to $Y$. Write $X \leftrightarrow Y$ to mean that such a bijection exists. The most naive attempt — define $\operatorname{card}(X)$ to be the collection $\{Y : Y \leftrightarrow X\}$ of all sets equinumerous with $X$ — fails immediately: this collection is not a set (it is a proper class), since one can take any $Y \leftrightarrow X$ and replace each element with an arbitrary set, producing a bijection from a new set, and there is no bound on what elements such sets can contain. We need, instead, to select a canonical *representative* of the equivalence class of sets equinumerous with $X$.
The choice of representative is the same idea that underlies the definition of ordinals: pick the smallest ordinal equinumerous with $X$. By the Axiom of Choice (which gives us the Well-ordering Theorem), every set $X$ can be well-ordered, so there exists at least one ordinal $\alpha$ with $X \leftrightarrow \alpha$. Among all such ordinals, there is a least one.
[definition: Cardinality]
The **cardinality** of a set $X$, written $\operatorname{card}(X)$, is the least ordinal $\alpha$ such that $X \leftrightarrow \alpha$.
[/definition]
With this definition in place, the fundamental equivalence $\operatorname{card}(X) = \operatorname{card}(Y) \iff X \leftrightarrow Y$ follows at once: if $\operatorname{card}(X) = \operatorname{card}(Y) = \alpha$, then both $X$ and $Y$ biject with $\alpha$, and hence with each other; conversely, if $X \leftrightarrow Y$, then any ordinal equinumerous with $X$ is also equinumerous with $Y$, so the least such ordinal is the same.
It is worth pausing to ask: what if we lack the Axiom of Choice? In ZF alone, not every set can be well-ordered, so the above definition may yield no ordinal for a given $X$. The remedy — known as the **Scott trick** — is more subtle. Every set $X$ has a *rank* in the cumulative hierarchy: the least $\alpha$ such that $X \in V_{\alpha+1}$. For any set $Y \leftrightarrow X$, $Y$ has some rank too. Define the *essential rank* of $X$ as the minimum rank among all sets equinumerous with $X$. Then set $\operatorname{card}(X) := \{Y \in V_{\operatorname{essrank}(X)+1} : Y \leftrightarrow X\}$, the collection of sets equinumerous with $X$ and living at that minimum rank. This collection is a set (it lives inside a single $V_\alpha$), and it serves as the canonical representative. The Scott trick requires neither Choice nor ordinals — but it is considerably less convenient for arithmetic. For the rest of this chapter we assume ZFC.
### Initial Ordinals and the Aleph Hierarchy
The cardinalities of infinite sets are not all ordinals: many ordinals biject with each other. For instance, $\omega$ and $\omega + 1$ both have cardinality $\aleph_0$, since the bijection $n \mapsto n + 1$ (with $\omega$ mapped to $0$) witnesses $\omega + 1 \leftrightarrow \omega$. More dramatically, $\omega^\omega$ — a countable ordinal — also bijects with $\omega$. The question, then, is: which ordinals *are* cardinalities, i.e., are the least ordinal of their size?
[definition: Initial Ordinal]
An ordinal $\alpha$ is **initial** if no ordinal $\beta < \alpha$ satisfies $\beta \leftrightarrow \alpha$; equivalently, $\alpha$ is the smallest ordinal of its cardinality.
[/definition]
The finite ordinals $0, 1, 2, 3, \ldots$ are all initial (no two finite ordinals biject with each other), and $\omega$ is initial (no finite set bijects with $\omega$). By contrast, $\omega + 1, \omega + 2, \ldots, \omega \cdot 2, \ldots, \omega^\omega$ are all non-initial: each bijects with $\omega$ via a re-enumeration argument. The first uncountable ordinal $\omega_1 = \gamma(\omega)$ is initial, because by definition it is the least ordinal not injecting into $\omega$ — so certainly no $\beta < \omega_1$ bijects with $\omega_1$.
The initial ordinals $\geq \omega$ are indexed by all ordinals via a recursive definition.
[definition: Omega Numbers]
Define $\omega_\alpha$ for each ordinal $\alpha$ by transfinite recursion:
\begin{align*}
\omega_0 &= \omega, \\
\omega_{\alpha+1} &= \gamma(\omega_\alpha), \\
\omega_\lambda &= \sup\{\omega_\alpha : \alpha < \lambda\} \quad \text{for non-zero limit } \lambda,
\end{align*}
where $\gamma(X)$ denotes the Hartogs number of $X$, i.e., the least ordinal not injecting into $X$.
[/definition]
The claim is that each $\omega_\alpha$ is initial. For $\omega_0 = \omega$ this is clear. For $\omega_{\alpha+1} = \gamma(\omega_\alpha)$: by definition of the Hartogs number, $\omega_{\alpha+1}$ does not inject into $\omega_\alpha$, so it cannot biject with any ordinal $\leq \omega_\alpha$; and every $\beta$ with $\omega_\alpha < \beta < \omega_{\alpha+1}$ must inject into $\omega_\alpha$ (since $\omega_{\alpha+1}$ is the *least* ordinal not injecting into $\omega_\alpha$), which means $\beta < \omega_{\alpha+1}$ in cardinality — so no such $\beta$ bijects with $\omega_{\alpha+1}$. For limit $\lambda$: if $\beta < \omega_\lambda = \sup_{\alpha < \lambda} \omega_\alpha$, then $\beta < \omega_\alpha$ for some $\alpha < \lambda$, so $\operatorname{card}(\beta) \leq \operatorname{card}(\omega_\alpha) < \operatorname{card}(\omega_\lambda)$ (since the $\omega_\alpha$ are strictly increasing), and hence $\beta \not\leftrightarrow \omega_\lambda$.
Moreover, every infinite initial ordinal $\delta$ is some $\omega_\alpha$. To see this: since $\omega_\alpha \geq \alpha$ for all $\alpha$ (proved by induction: $\omega_0 = \omega \geq 0$; $\omega_{\alpha+1} > \omega_\alpha \geq \alpha$; at limits, $\omega_\lambda = \sup \omega_\alpha \geq \sup \alpha = \lambda$), the sequence $(\omega_\alpha)$ is cofinal in the ordinals, so there is a least $\alpha$ with $\omega_\alpha \geq \delta$. If $\alpha$ is a successor $\beta^+$, then $\omega_\beta < \delta \leq \omega_{\beta+1} = \gamma(\omega_\beta)$, but $\gamma(\omega_\beta)$ is by definition the *least* ordinal not injecting into $\omega_\beta$, so there is no initial ordinal strictly between $\omega_\beta$ and $\gamma(\omega_\beta)$; hence $\delta = \omega_{\beta+1}$. If $\alpha$ is a limit, then $\omega_\alpha = \sup_{\beta < \alpha} \omega_\beta$, and $\delta \leq \omega_\alpha$ forces $\delta = \omega_\alpha$ (if $\delta < \omega_\alpha$ there would be some $\omega_\beta > \delta$ with $\beta < \alpha$, contradicting minimality of $\alpha$).
The cardinalities of the $\omega_\alpha$ deserve their own notation.
[definition: Aleph Numbers]
Write $\aleph_\alpha$ (read "aleph-alpha") for $\operatorname{card}(\omega_\alpha)$.
[/definition]
The analysis above establishes the following classification.
[quotetheorem:1501]
This theorem encapsulates a remarkable structural fact: the infinite cardinals form a hierarchy indexed by all ordinals, with no "gaps" and no "limit cardinals" that are not of the form $\aleph_\lambda$ for a limit $\lambda$. The alephs are themselves well-ordered (as they are cardinals, hence ordinals) and every infinite cardinal appears exactly once. Note that $\operatorname{card}(\omega) = \aleph_0$, $\operatorname{card}(\omega_1) = \aleph_1$, and so on.
### Cardinal Inequalities
Having defined cardinalities as ordinals, we need the correct notion of inequality. Two sets may be compared not by well-ordering but by injection.
[definition: Cardinal Inequality]
For cardinals $m$ and $n$ (with $M$ and $N$ being sets of those respective cardinalities), write $m \leq n$ if $M$ injects into $N$, and $m < n$ if $m \leq n$ and $m \neq n$. This does not depend on the choice of $M$ and $N$.
[/definition]
The relation $\leq$ is a partial order on cardinals: reflexivity is the identity injection; transitivity follows from composing injections. The Cantor–Schröder–Bernstein theorem (proved in Chapter 4 via the Knaster–Tarski fixed-point theorem, and re-proved here by orbit decomposition), which we state and sketch below, establishes antisymmetry: $m \leq n$ and $n \leq m$ together imply $m = n$. Moreover, assuming the Axiom of Choice, $\leq$ is a *total* order on cardinals: given any two sets, the Well-ordering Theorem allows us to well-order both, and the totality of ordinal comparison then applies. Without Choice, cardinal comparison is genuinely not total — there exist models of ZF in which two sets are incomparable in this sense.
[quotetheorem:1479]
[citeproof:1479]
The Cantor–Schröder–Bernstein theorem is notable precisely because it requires *no* form of the Axiom of Choice: the explicit construction of $A$ gives a definite bijection. The proof is the canonical example of using an orbit decomposition to patch two injections into a single bijection.
[example: Cardinality of $\mathcal{P}(\omega)$]
As an immediate corollary of the cardinal definitions and Cantor's theorem (proved in the next section), $\operatorname{card}(\mathcal{P}(\omega)) > \operatorname{card}(\omega) = \aleph_0$. We will see that $\operatorname{card}(\mathcal{P}(\omega)) = 2^{\aleph_0}$, and that $\mathbb{R} \leftrightarrow \mathcal{P}(\omega) \leftrightarrow 2^\omega$, so $\operatorname{card}(\mathbb{R}) = 2^{\aleph_0}$. Whether $2^{\aleph_0} = \aleph_1$ — the Continuum Hypothesis — is independent of ZFC.
[/example]
## Cantor's Theorem and the Diagonal Argument
The central result of cardinal theory — one that launched Cantor's entire program — is that no set can be put in bijection with its own power set. This might seem obvious for finite sets (a set with $n$ elements has $2^n$ subsets, and $n < 2^n$ for all $n \geq 0$), but the infinite case is far from obvious: after all, $\omega \leftrightarrow \omega \times \omega$, so infinite sets can biject with things that *look* much bigger.
The proof uses a *diagonalisation* argument: given any function $f: X \to \mathcal{P}(X)$, we exhibit an element of $\mathcal{P}(X)$ not in the image of $f$, showing $f$ cannot be a surjection.
[quotetheorem:621]
The diagonalisation argument is one of the most reused techniques in all of logic and set theory. The same idea appears in Russell's paradox (the set of all sets not containing themselves), in Gödel's incompleteness theorems (a statement that says of itself "I am not provable"), and in Turing's proof that the halting problem is undecidable. In each case, one constructs an object that is designed to differ from every object in a given enumeration, thereby escaping that enumeration.
Cantor's theorem has an immediate and striking consequence: there is no "largest" cardinal. Starting from $\aleph_0$, we get an inexhaustible ascending chain
\begin{align*}
\aleph_0 < \operatorname{card}(\mathcal{P}(\omega)) < \operatorname{card}(\mathcal{P}(\mathcal{P}(\omega))) < \cdots,
\end{align*}
and this chain continues through all ordinals via the beth numbers $\beth_\alpha$. The alephs $\aleph_\alpha$ and beth numbers $\beth_\alpha$ coincide only when the Generalised Continuum Hypothesis holds.
[example: $\operatorname{card}(\mathbb{R}) = 2^{\aleph_0}$]
We verify the chain of bijections $\mathbb{R} \leftrightarrow \mathcal{P}(\omega) \leftrightarrow 2^\omega$. Every real number has a binary expansion (with the usual convention for dyadic rationals to choose a canonical expansion), which is a function $\omega \to \{0, 1\}$; this gives a bijection $\mathbb{R} \to 2^\omega$ up to countably many exceptions that can be handled with a Cantor–Bernstein argument. The bijection $\mathcal{P}(\omega) \to 2^\omega$ sends $S \subseteq \omega$ to its indicator function $\mathbf{1}_S: \omega \to \{0, 1\}$. Thus $\operatorname{card}(\mathbb{R}) = \operatorname{card}(\mathcal{P}(\omega)) = 2^{\aleph_0}$.
[/example]
## Cardinal Arithmetic
Having established what cardinals are, we want to compute with them. The natural operations — addition, multiplication, exponentiation — are defined set-theoretically, and their behaviour at infinite cardinals is the subject of this section.
[definition: Cardinal Arithmetic Operations]
For cardinals $m = \operatorname{card}(M)$ and $n = \operatorname{card}(N)$ (where $M$ and $N$ are disjoint sets of the respective cardinalities), define:
\begin{align*}
m + n &= \operatorname{card}(M \sqcup N), \\
m \cdot n &= \operatorname{card}(M \times N), \\
m^n &= \operatorname{card}(M^N),
\end{align*}
where $M^N = \{f : f \text{ is a function } N \to M\}$ and $M \sqcup N$ denotes the disjoint union. These definitions do not depend on the choice of $M$ and $N$ within their respective cardinality classes.
[/definition]
It is important to note that cardinal exponentiation and ordinal exponentiation are genuinely different operations. The ordinal $\omega^\omega$ (ordinal exponentiation) is countable — it is the order type of a certain well-ordering of $\mathbb{N}$ — while $\aleph_0^{\aleph_0}$ (cardinal exponentiation) equals $2^{\aleph_0} > \aleph_0$, an uncountable cardinality. Whenever exponential notation appears in a context mixing cardinals and ordinals, it is essential to specify which operation is intended.
The basic algebraic laws carry over from set-theoretic bijections.
[quotetheorem:1502]
Each law follows from exhibiting an explicit bijection between the relevant sets: $M \sqcup N \leftrightarrow N \sqcup M$ via swapping labels; $M \times N \leftrightarrow N \times M$ via $(m, n) \mapsto (n, m)$; $(M^N)^P \leftrightarrow M^{N \times P}$ via the currying bijection $g \mapsto (n, p) \mapsto g(p)(n)$. These are bijections of sets, so they hold for all cardinals — finite and infinite alike.
One can also define infinite sums and products:
\begin{align*}
\sum_{i \in I} m_i = \operatorname{card}\!\left(\bigsqcup_{i \in I} M_i\right), \qquad \prod_{i \in I} m_i = \operatorname{card}\!\left(\prod_{i \in I} M_i\right).
\end{align*}
These generalise the binary operations and are used, for instance, in computing the cardinality of function spaces.
[example: Cardinality of Sequences of Reals]
How many sequences of real numbers are there? A real sequence is a function $\omega \to \mathbb{R}$, so its set is $\mathbb{R}^\omega$. We compute:
\begin{align*}
\operatorname{card}(\mathbb{R}^\omega) = (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0 \cdot \aleph_0} = 2^{\aleph_0} = \operatorname{card}(\mathbb{R}).
\end{align*}
Here we used the tower law $(m^n)^p = m^{n \cdot p}$ and the fact (proved below) that $\aleph_0 \cdot \aleph_0 = \aleph_0$. The surprising conclusion is that there are no more sequences of real numbers than there are real numbers.
[/example]
### The Key Theorem: $\kappa \cdot \kappa = \kappa$ for Infinite $\kappa$
Infinite cardinal addition and multiplication turn out to be trivial in the strongest possible sense. The following theorem, due to König (in this form), is the foundational result.
[quotetheorem:1503]
[citeproof:1503]
The square-enumeration technique is genuinely important: it is the reason we could not simply use the diagonal enumeration from the $\aleph_0$ case. For $\aleph_0$, the diagonal works because all initial segments are finite. For larger $\aleph_\alpha$, the "squares" $\beta \times \beta$ with $\beta < \omega_\alpha$ have cardinality $< \aleph_\alpha$ by the induction hypothesis, and the square enumeration is the device that converts this inductive knowledge into a well-ordering of the right length.
From this theorem, the coarser facts about cardinal sums and products follow immediately.
[quotetheorem:1504]
[citeproof:1504]
The phrase "cardinal arithmetic is boring" refers precisely to this: for infinite cardinals, neither addition nor multiplication can escape the larger of the two operands. There is no analogue of "going up" by addition or multiplication — only exponentiation can genuinely increase the size.
[example: $X \sqcup X \leftrightarrow X$ for Infinite Sets]
A concrete special case: any infinite set $X$ bijects with $X \sqcup X$ (two disjoint copies of $X$). This is immediate from the theorem: $\operatorname{card}(X \sqcup X) = \operatorname{card}(X) + \operatorname{card}(X) = 2 \cdot \operatorname{card}(X) = \operatorname{card}(X)$, since $2 \leq \aleph_0 \leq \operatorname{card}(X)$. Without Choice (in ZF), this statement is actually not provable in general — it is equivalent to the Axiom of Choice for the particular set $X$.
[/example]
### Cardinal Exponentiation and the Continuum Hypothesis
The striking uniformity of cardinal addition and multiplication makes the behaviour of cardinal exponentiation all the more surprising: it is deeply difficult and largely independent of ZFC. The simplest open question is whether $2^{\aleph_0} = \aleph_1$. This is the **Continuum Hypothesis (CH)**. Gödel showed in 1940 that CH is consistent with ZFC (it holds in the constructible universe $L$), and Cohen showed in 1963 that its negation is also consistent with ZFC (using the technique of forcing). Thus CH can neither be proved nor refuted in ZFC — it is independent.
The difficulty is not merely about $2^{\aleph_0}$. Even the question of which $\aleph_\beta$ equals $\aleph_\alpha^{\aleph_\gamma}$ for general $\alpha, \gamma, \beta$ is not fully resolved. Silver's theorem (1975) gives a partial result: if $\aleph_\delta$ is a singular cardinal of uncountable cofinality and $2^{\aleph_\alpha} = \aleph_{\alpha+1}$ for all $\alpha < \delta$, then $2^{\aleph_\delta} = \aleph_{\delta+1}$. But many specific values remain consistent with ZFC to take a wide range of values. Cardinal exponentiation is, in a precise sense, as hard as mathematics gets.
[remark: The Generalised Continuum Hypothesis]
The **Generalised Continuum Hypothesis (GCH)** asserts $2^{\aleph_\alpha} = \aleph_{\alpha+1}$ for every ordinal $\alpha$. Under GCH, cardinal arithmetic is completely determined: $\aleph_\alpha^{\aleph_\beta} = \aleph_{\alpha+1}$ when $\alpha \geq \beta$ and $= \aleph_\beta^+ = \aleph_{\beta+1}$ otherwise. Like CH, GCH is independent of ZFC. In Gödel's constructible universe $L$, GCH holds; Cohen's forcing methods show it can fail, and it can fail in a variety of ways.
[/remark]
The contrast between the two halves of cardinal arithmetic — the complete triviality of addition and multiplication versus the irreducible complexity of exponentiation — reflects something deep about the structure of set theory itself.
Cardinal arithmetic answers the question of set size, revealing a stark split: addition and multiplication of infinite cardinals are trivial, but exponentiation is profoundly difficult, with the continuum hypothesis independent of all standard axioms. This independence points to a final frontier: the incompleteness theorems, which show that arithmetic itself cannot prove every true statement about itself.
# 7. Incompleteness
This concluding chapter addresses the deepest foundational question raised by the course: can a formal theory of arithmetic prove every true statement about the natural numbers? The answer is emphatically negative. Gödel's incompleteness theorems, proved in 1931, show that any sufficiently expressive and consistent theory — including Peano Arithmetic (PA) and Zermelo–Fraenkel set theory (ZF) — must leave some true statements unprovable. These results are marked with a star in the course schedule, meaning they are non-examinable, but they form the natural endpoint of everything developed in the preceding chapters: propositional logic (Chapter 2), ordinals and cardinals (Chapters 3 and 7), Zorn's lemma (Chapter 4), predicate logic (Chapter 5), and axiomatic set theory (Chapter 6).
## The Problem of Self-Reference in Arithmetic
The ambition behind Gödel's First Incompleteness Theorem can be stated simply: we want a sentence $p$ of the language of arithmetic such that $p$ is true in $\mathbb{N}$ but $\mathrm{PA} \not\vdash p$. The strategy is to construct a sentence that, in some sense, asserts its own unprovability. If such a sentence $p$ can be arranged so that $p$ is true if and only if $p$ is not provable, then the argument runs as follows: if $p$ were false, then $p$ would be provable (since $p$ asserts its own unprovability, being false means being provable), so PA would prove $p$, and by the soundness of PA, $p$ would hold in $\mathbb{N}$. But we assumed $p$ is false in $\mathbb{N}$ — a contradiction. Therefore $p$ must be true in $\mathbb{N}$, and consequently $p$ is not provable by PA.
The immediate difficulty is one of self-reference. A sentence of PA is a finite string of symbols from the language of arithmetic, and it seems impossible for such a sentence to refer to itself — any attempt to write "$p$ is unprovable" would require $p$ to already contain a description of itself, making $p$ longer than any fixed name for it. The resolution of this apparent paradox, due to Gödel, involves encoding formulae and proofs as natural numbers, and then showing that the resulting numerical encoding can itself be discussed inside PA.
## Definability and the Coding of Syntax
The key technical device is the concept of definability, which asks which subsets of $\mathbb{N}$ and which functions on $\mathbb{N}$ can be expressed using the language of first-order arithmetic.
[definition: Definability in Arithmetic]
A subset $S \subset \mathbb{N}$ is **definable** if there exists a formula $\varphi(x)$ with one free variable such that for all $m \in \mathbb{N}$,
\begin{align*}
m \in S \iff \varphi(m) \text{ holds in } \mathbb{N}.
\end{align*}
A function $f: \mathbb{N} \to \mathbb{N}$ is **definable** if there exists a formula $\psi(x, y)$ with two free variables such that for all $m, n \in \mathbb{N}$,
\begin{align*}
f(m) = n \iff \psi(m, n) \text{ holds in } \mathbb{N}.
\end{align*}
[/definition]
The language of PA has only addition, multiplication, successor, and the constant $0$, so it is not immediately obvious which sets and functions are definable. Two instructive examples show the range of what is possible.
[example: Definable Sets in Arithmetic]
The set of prime numbers is definable. A natural number $x$ is prime if and only if $x \neq 1$ and every factorisation $yz = x$ forces $y = 1$ or $z = 1$. In the language of PA, the formula $\varphi(x)$ is
\begin{align*}
x \neq 1 \;\wedge\; (\forall y)(\forall z)\bigl(yz = x \Rightarrow (y = 1 \vee z = 1)\bigr).
\end{align*}
The set of powers of $2$ is also definable, even though PA has no exponentiation symbol. The key observation is that $x$ is a power of $2$ if and only if every prime divisor of $x$ equals $2$. Writing $2$ as the term $s(s(0))$ and divisibility as $y \mid x \iff (\exists z)(yz = x)$, the defining formula is
\begin{align*}
(\forall y)\bigl((y \text{ is prime} \;\wedge\; y \mid x) \Rightarrow y = 2\bigr).
\end{align*}
The function $m \mapsto m^2$ is definable via $\psi(x, y) \equiv (yy = x)$.
These examples illustrate that definability is far more expressive than the syntax of PA might suggest.
[/example]
The more powerful fact underlying the whole construction is the following.
[quotetheorem:1505]
The proof of this theorem is substantial and lies outside the scope of the course. It proceeds by showing that all primitive recursive functions are definable, and then extending to the class of all computable functions using a technique due to Gödel involving the Chinese Remainder Theorem to encode sequences of natural numbers. A careful treatment appears in Johnstone's book on the foundations of mathematics. The theorem implies, in particular, that $m \mapsto 2^m$ is definable, since there is a straightforward algorithm to compute it.
### Gödel Numbering
The language of PA has twelve symbols: $s, 0, \times, +, \bot, \Rightarrow, (, ), =, x, ', \forall$, where variables are the infinitely many expressions $x, x', x'', \ldots$ built from $x$ and the prime symbol. Assign to each base symbol a positive integer value, say $v(s) = 1, v(0) = 2, \ldots, v(\forall) = 12$. The **Gödel number** (or code) $c(p)$ of a formula $p = \sigma_1 \sigma_2 \cdots \sigma_k$ (where $\sigma_i$ are the successive symbols) is
\begin{align*}
c(p) \;=\; 2^{v(\sigma_1)} \cdot 3^{v(\sigma_2)} \cdots p_k^{v(\sigma_k)},
\end{align*}
where $p_k$ denotes the $k$-th prime number. By the fundamental theorem of arithmetic, this assignment is injective: distinct formulae receive distinct codes. Not every natural number codes a formula — for instance, $2^7 \cdot 3^{12} \cdot 5^{10}$ decodes to the unfinished string $(\forall x$, which is not a well-formed formula — but the property "$m$ codes a well-formed formula" is decidable by a finite algorithm, and therefore definable.
Write $S_m$ for the formula coded by $m$ (setting $S_m$ to be $\bot$ if $m$ does not code a valid formula). A finite sequence of formulae $p_1, p_2, \ldots, p_n$ can itself be coded as $2^{c(p_1)} 3^{c(p_2)} \cdots p_n^{c(p_n)}$, enabling proofs (which are finite sequences of formulae) to be represented as natural numbers.
Since the axioms of PA are given by finite, algorithmic descriptions, the property "$m$ codes an axiom of PA (or a logical axiom)" is definable. The property of being a valid application of Modus Ponens — "$S_n$ is obtained from $S_\ell$ and $S_m$ via MP" — corresponds to a specific syntactic check that can be carried out algorithmically, so it too is definable. Building upward, the relation
\begin{align*}
\Theta(m, n) \;\equiv\; \text{"$n$ codes a proof of $S_m$"}
\end{align*}
is definable. This means: $n$ codes a finite sequence of formulae, the last of which is $S_m$, and each formula in the sequence is either an axiom or follows from earlier formulae by Modus Ponens or Generalisation. From this, provability itself becomes definable:
\begin{align*}
\Psi(m) \;\equiv\; \text{"$S_m$ is provable"} \;\iff\; (\exists n)\,\Theta(m, n).
\end{align*}
At this stage nothing surprising has occurred. That strings of symbols can be encoded as numbers is the basic insight behind computing. The clever step comes next.
## The Gödel Sentence
The construction of the self-referential sentence rests on the idea of substitution. Given a formula $\varphi(x)$ with one free variable $x$ and a natural number $k$, write $\varphi(k)$ for the formula obtained by substituting the numeral for $k$ (that is, the term $s^k(0) = s(s(\cdots s(0)\cdots))$ with $k$ applications of $s$) in place of $x$. The map $(m, k) \mapsto c(\varphi(k))$, where $m = c(\varphi)$, is definable since it amounts to a syntactic substitution operation that a finite algorithm can carry out.
Now define the property $\chi(m)$ asserting:
\begin{align*}
\chi(m) \;\equiv\; \text{"$m$ codes a formula $S_m$ with one free variable, and $S_m(m)$ is not provable."}&
\end{align*}
Since "$m$ codes a formula with exactly one free variable" is decidable, and "$S_m(m)$ is provable" is the statement $\Psi(c(S_m(m)))$ which is definable (recall $\Psi$ was constructed above), the entire property $\chi(m)$ is definable. Let $\varphi(x)$ be the formula of PA that defines $\chi$, so that for all $m \in \mathbb{N}$,
\begin{align*}
\chi(m) \iff \varphi(m) \text{ holds in } \mathbb{N}.
\end{align*}
The formula $\varphi$ is itself an expression of PA, and therefore has a Gödel number. Let $N = c(\varphi)$. Unfolding what $\chi(N)$ says: $N$ codes a formula (namely $\varphi$) with one free variable, and $S_N(N) = \varphi(N)$ is not provable. In other words,
\begin{align*}
\chi(N) \;\equiv\; \text{"$\varphi(N)$ is not provable."}&
\end{align*}
Since $\chi$ is defined by $\varphi$, the sentence $\varphi(N)$ holds in $\mathbb{N}$ if and only if $\chi(N)$ holds in $\mathbb{N}$, which is precisely the assertion that $\varphi(N)$ is not provable. Thus $\varphi(N)$ is a sentence that is true in $\mathbb{N}$ if and only if it is not provable from PA. This is the **Gödel sentence** for PA, commonly denoted $G$.
[quotetheorem:1506]
[citeproof:1506]
The incompleteness of PA is not an artefact of PA being a weak theory. One might hope to repair the situation by adjoining the true sentence $G$ as a new axiom. But the same argument applies to the extended theory $\mathrm{PA} \cup \{G\}$: run the construction again with $\mathrm{PA} \cup \{G\}$ in place of PA to produce a new Gödel sentence $G'$ that is true but not provable from $\mathrm{PA} \cup \{G\}$. Incompleteness is an intrinsic feature of any consistent, recursively axiomatised theory that is sufficiently strong to carry out the Gödel coding construction.
### Why Truth Cannot Be Defined
There is a complementary perspective on why the incompleteness argument cannot be absorbed into PA. If one takes $T = \{p : p \text{ holds in } \mathbb{N}\}$ — the set of all arithmetical truths — then $T$ is a complete theory of arithmetic. The Gödel construction applied to $T$ must break down somewhere, and the breakdown occurs at the step where we claimed $\Psi$ is definable. Indeed:
[quotetheorem:1507]
This result, due to Tarski, is proved by a diagonal argument analogous to the Gödel construction: if truth in $\mathbb{N}$ were definable by some formula $\tau(x)$, one could construct a sentence that asserts its own falsity, leading to a contradiction in $\mathbb{N}$ itself (not merely in PA). The undefinability of truth is the reason why the Gödel sentence $G$, although informally describable as "I am not provable," cannot be turned into a sentence asserting its own falsity — truth and provability genuinely differ.
## Gödel's Second Incompleteness Theorem
The First Incompleteness Theorem shows that PA cannot prove some true sentence $G$. The sentence $G$, though somewhat artificial, is at least a statement purely about natural numbers. The Second Incompleteness Theorem identifies a far more mathematically natural statement that PA cannot prove.
The statement that PA is consistent — that PA does not prove $\bot$ — can itself be expressed as a sentence of arithmetic. Since "$n$ codes a proof of $\bot$" is a decidable condition and hence definable by a formula $\Theta(m, n)$ (with $m$ set to the Gödel number of $\bot$), the consistency statement is
\begin{align*}
\mathrm{con}(\mathrm{PA}) \;\equiv\; (\forall n)\,\neg\,\Theta(c(\bot), n),
\end{align*}
asserting that no natural number codes a proof of $\bot$ from PA.
[quotetheorem:1508]
The argument for the Second Theorem is more subtle than for the First. The proof of the First Theorem, carefully analysed, amounts to the provability (inside PA) of the implication $\mathrm{con}(\mathrm{PA}) \Rightarrow G$. If PA could also prove $\mathrm{con}(\mathrm{PA})$, then by Modus Ponens PA would prove $G$. But the First Theorem establishes $\mathrm{PA} \not\vdash G$. Therefore $\mathrm{PA} \not\vdash \mathrm{con}(\mathrm{PA})$.
The crucial step — that the proof of "if PA is consistent then $G$ is unprovable" can itself be formalised within PA — requires careful work to verify that the provability predicate $\Psi$ satisfies certain derivability conditions (the Löb conditions). This is the technically demanding part of the proof, and the course does not carry it out in full. The conclusion, however, is striking: no consistent formal system strong enough to express basic arithmetic can give a formal proof of its own consistency. Any consistency proof for a system $S$ must use methods transcending $S$.
[remark: The Scope of the Second Theorem]
The same argument applies verbatim to ZF. Since ZF can carry out the Gödel coding construction (ZF is far stronger than PA and can certainly express arithmetic), ZF is incomplete and $\mathrm{ZF} \not\vdash \mathrm{con}(\mathrm{ZF})$ (assuming ZF is consistent). This does not mean one cannot prove ZF is consistent — it means any such proof must use axioms beyond ZF itself. For instance, large cardinal axioms imply $\mathrm{con}(\mathrm{ZF})$, but they are not provable from ZF.
[/remark]
## Independence in Set Theory
The incompleteness results show that any sufficiently strong consistent theory must have undecidable statements. The natural question for a working mathematician is: which specific, natural mathematical statements are independent? Two central examples arose in the development of set theory.
### The Continuum Hypothesis
The Continuum Hypothesis (CH) asserts that there is no set whose cardinality lies strictly between $\aleph_0$ (the cardinality of $\mathbb{N}$) and $2^{\aleph_0}$ (the cardinality of $\mathbb{R}$). In the notation of cardinals, this is the statement $2^{\aleph_0} = \aleph_1$. Cantor spent years attempting to prove CH and conjectured it was true; Hilbert listed it as the first problem on his celebrated 1900 list.
The independence of CH from ZFC (ZF together with the Axiom of Choice) is established by two results, each requiring entirely different methods.
[quotetheorem:1509]
Gödel proved this in 1938 by constructing the **constructible universe** $L$, an inner model of ZF consisting of all sets that can be built by a definable transfinite process. A careful argument shows that $L$ satisfies all axioms of ZFC, and moreover that in $L$, every set of reals is constructible, which forces $2^{\aleph_0} = \aleph_1$ — exactly CH. Since $L$ is a model of $\mathrm{ZFC} + \mathrm{CH}$ built within ZF, if ZF is consistent then $\mathrm{ZFC} + \mathrm{CH}$ is consistent.
[quotetheorem:1510]
Cohen established this in 1963 using the revolutionary technique of **forcing**. Starting from a countable transitive model $M$ of ZFC (a ground model), forcing adjoins a generic object $G$ — in the case of CH, one adjoins $\aleph_2^M$ many new real numbers — to produce an extended model $M[G]$ that still satisfies all axioms of ZFC but in which $2^{\aleph_0} \geq \aleph_2$, violating CH. The forcing relation, which allows one to determine which statements are "forced" to hold in $M[G]$ using only information about the conditions in $M$, is the key tool. Forcing earned Cohen the Fields Medal in 1966 and has since become a central method in set-theoretic research, allowing the independence of hundreds of natural mathematical questions to be established.
Together, Gödel's and Cohen's results say that CH is independent of ZFC: neither CH nor its negation is provable from the standard axioms of set theory. The question "how large is the continuum?" does not have an answer within ZFC alone.
### The Axiom of Choice
A structurally similar story applies to the Axiom of Choice (AC). Recall that AC asserts: for every family of non-empty sets, there exists a function selecting one element from each member of the family. Within ZF (without AC), it is neither provable nor refutable.
[quotetheorem:1511]
The first half (consistency of AC with ZF) follows from Gödel's constructible universe: the axiom of choice holds in $L$, so $L$ is a model of ZFC. For the second half, one constructs a model where AC fails. The classical approach uses **permutation models** (Fraenkel–Mostowski models): starting from a set theory with urelements (atoms that are not sets but can be elements of sets), one takes a group of permutations of the atoms and considers the class of sets hereditarily fixed by all permutations in a normal subgroup. In such models, there can exist families of non-empty sets with no choice function — for instance, a countably infinite collection of 2-element sets might lack a selector if the atoms are arranged so that no consistent selection is invariant under the symmetries. To transfer this to pure set theory (without urelements), Cohen's forcing technique can be adapted: one forces over a model of ZFC by adding a sequence of Cohen reals without adding a well-ordering of them, producing a model of ZF in which a certain set of reals has no well-ordering, contradicting the well-ordering principle that is equivalent to AC.
[remark: What Independence Means]
Independence does not mean that CH or AC are "meaningless" or that mathematics is arbitrary. It means that the axioms of ZFC, powerful as they are, do not determine all mathematical truths about infinite sets. Some set theorists pursue additional axioms — large cardinal axioms, or the Proper Forcing Axiom — that do settle CH (the latter forces $2^{\aleph_0} = \aleph_2$). Others argue that CH is a genuinely open question about mathematical reality. The philosophical debate continues.
[/remark]
## What Consistency Proofs Can Do
The Second Incompleteness Theorem might seem to make all consistency proofs pointless: if a system cannot prove its own consistency, what good is a consistency proof? But this reading is too hasty, and the closing observation of the course is important.
Relative consistency proofs — showing that if $S$ is consistent then $S + \varphi$ is consistent — are mathematically productive even though they cannot be absolute. Gödel showed $\mathrm{con}(\mathrm{ZF}) \Rightarrow \mathrm{con}(\mathrm{ZFC} + \mathrm{CH})$, and Cohen showed $\mathrm{con}(\mathrm{ZFC}) \Rightarrow \mathrm{con}(\mathrm{ZFC} + \neg\mathrm{CH})$. These are genuine theorems provable in a meta-theory (for instance, in ZF itself, or even in Peano Arithmetic with sufficient coding). They tell us something real: CH cannot be used to derive a contradiction from ZFC, and neither can $\neg$CH. Any theorem proved with the help of CH is "safe" in the sense that its proof can be replaced by a ZFC proof — or it is genuinely new information that could only be established by working in a model where CH holds.
More broadly, consistency proofs organise our knowledge of which mathematical structures can coexist. They have become indispensable tools for working set theorists, combinatorialists, and even analysts, who sometimes encounter statements whose independence from ZFC is the most precise thing that can be said about them. Incompleteness is not a defect of mathematics but a feature of its extraordinary richness.
Contents
- Introduction
- The Problem with Naive Set Theory
- The Response: Axiomatic Set Theory
- Formal Logic and Its Relationship to Set Theory
- The Unavoidable Circularity
- Structure of the Course
- Why Study Foundations?
- 1. Propositional Calculus
- The Propositional Language
- Semantic Entailment
- Tautologies and Semantic Entailment
- Syntactic Entailment and Formal Proofs
- The Axioms and Modus Ponens
- The Deduction Theorem
- Soundness and Completeness
- Soundness
- Completeness via Model Existence
- Compactness and Decidability
- Compactness
- Decidability
- 2. Well-orderings and Ordinals
- Well-orderings
- Order Isomorphisms
- Initial Segments and Induction
- New Well-orderings from Old
- Subset Collapse
- The Successor Construction
- Gluing a Nested Family
- Comparing Well-orderings
- Ordinals
- Comparing Ordinals
- The Ordinals are Well-ordered
- Hartogs' Lemma and the First Uncountable Ordinal
- Successors and Limits
- Ordinal Arithmetic
- Ordinal Addition
- Ordinal Multiplication
- Ordinal Exponentiation
- The Ordinal $\varepsilon_0$
- Normal Functions
- Basic Properties of Normal Functions
- Fixed Points of Normal Functions
- The Division Algorithm for Normal Functions
- 3. Posets and Zorn's Lemma
- Partial Orders
- Chains, Antichains, and Bounds
- Complete Posets and the Knaster–Tarski Theorem
- Maximal Elements and Zorn's Lemma
- Lattices and Boolean Algebras
- Applications of Zorn's Lemma
- Zorn's Lemma and the Axiom of Choice
- Bourbaki–Witt Theorem*
- 4. Predicate Logic
- The Language of Predicate Logic
- Free and Bound Variables, Sentences
- Semantic Entailment
- Syntactic Implication
- The Deduction Theorem
- Soundness
- Completeness: Henkin's Construction
- Peano Arithmetic
- Non-Standard Models of PA
- Completeness and Categoricity*
- The Compactness Theorem
- The Löwenheim–Skolem Theorems
- Categoricity and Complete Theories
- 5. Set Theory
- Axioms of Set Theory
- The Starting Axioms
- The Building Axioms
- The Three Surprising Axioms
- Classes and the Axiom of Replacement
- The Axiom of Choice
- Properties of ZF
- Transitive Sets and the Transitive Closure
- $\in$-Induction and $\in$-Recursion
- Well-Founded Relations
- Mostowski's Collapsing Theorem
- Picture of the Universe
- The von Neumann Hierarchy
- The Rank Function and the Universe
- 6. Cardinals
- Cardinality and the Definition of Cardinals
- Initial Ordinals and the Aleph Hierarchy
- Cardinal Inequalities
- Cantor's Theorem and the Diagonal Argument
- Cardinal Arithmetic
- The Key Theorem: $\kappa \cdot \kappa = \kappa$ for Infinite $\kappa$
- Cardinal Exponentiation and the Continuum Hypothesis
- 7. Incompleteness
- The Problem of Self-Reference in Arithmetic
- Definability and the Coding of Syntax
- Gödel Numbering
- The Gödel Sentence
- Why Truth Cannot Be Defined
- Gödel's Second Incompleteness Theorem
- Independence in Set Theory
- The Continuum Hypothesis
- The Axiom of Choice
- What Consistency Proofs Can Do
Cambridge II Logic and Set Theory
Content
Problems
History
Created by admin on 4/24/2026 | Last updated on 4/24/2026
Prerequisites
No prerequisites required for this page.
Rate this page
★
★
★
★
★
Poor
Excellent