Set Theory I (see Axiomatic Set Theory entry above).

Edit 0 Issues 0 Pull Requests Roadmap Admin

Content

Problems

History

Issues Verification Attributions

Set Theory I develops the core foundations of modern set theory, beginning with the ZFC axioms and the cumulative hierarchy and then building a working theory of infinite mathematical objects. The course treats sets as the basic language for mathematics and studies how they organize relations, functions, orders, ordinals, and cardinals. Its focus is not just on definitions, but on the structural principles that govern infinite collections and the methods used to prove results about them. The main themes are transfinite methods, size and comparison of infinite sets, and the interaction between combinatorial, logical, and structural ideas. After introducing relations and well-orders, the course develops transfinite induction and recursion, ordinal arithmetic, and the theory of cardinals and equinumerosity. It then moves to the [Axiom of Choice](/page/Axiom%20of%20Choice) and its equivalents, cardinal arithmetic, cofinality, regularity, stationary sets, Fodor's lemma, and the basic machinery of transitive sets, collapse, absoluteness, reflection, and elementary submodels. Each chapter builds on the preceding ones by extending the same foundational toolkit to a deeper level of generality. The early chapters establish the universe of sets and the language for measuring and ordering infinite objects; the middle chapters use that language to analyze ordinals, cardinals, and choice principles; and the later chapters move toward finer structural and logical properties of the set-theoretic universe. The final chapter introduces constructibility as a first inner model, giving a first systematic example of how set theory studies alternative universes of sets and the principles that hold within them. # Introduction This introductory chapter fixes the viewpoint of the course before the axioms begin. Set theory will be treated both as a formal first-order theory and as the common language in which ordinary mathematical objects are coded. The guiding tension is that we want sets to be flexible enough to support ordinary construction, while the axioms must block unrestricted comprehension. ## Why Axiomatise Sets? The first problem is not how to list familiar sets, but how to say which set-forming operations are legitimate. Naive set theory suggests that any property should determine a collection, yet the Russell construction shows that unrestricted comprehension is inconsistent. ZFC replaces this by axioms that build sets from already-given sets and by schemas that carve definable subsets out of existing sets. [explanation: The Foundational Role of ZFC] ZFC is a first-order theory in the language with equality and one non-logical binary relation symbol $\in$. Its intended objects are sets, and all mathematical structures are represented by sets equipped with additional relations and functions. A group, a topological space, or a model of a first-order theory is coded by ordered tuples, functions, relations, and underlying sets. This coding convention does not erase mathematical practice. Instead, it supplies a common background theory strong enough to formalise constructions and compare their strength. Later courses in forcing and large cardinals study extensions or refinements of this background, so this course concentrates on the basic machinery that those subjects assume. [/explanation] The contrast with naive comprehension is the first lesson of the course: collections may be discussed informally, but only some collections are sets. This distinction leads to the working use of classes. [definition: Class] A class is a collection described by a first-order formula $\varphi(x, p_1, \dots, p_n)$ with set parameters $p_1, \dots, p_n$. [/definition] A class need not be a set. In these notes, classes are metalinguistic devices for talking about definable collections; they are not additional objects of the formal ZFC universe. [example: The Universal Class] The formula $x = x$ defines the class $V$ because every set is equal to itself, so for each set $a$ we have $a \in V$ exactly when $a=a$. Suppose, for contradiction, that there were a set $U$ containing every set. Apply the Separation schema to $U$ with the formula $x \notin x$, obtaining the set \begin{align*} R=\{x\in U : x\notin x\}. \end{align*} Since $U$ contains every set and $R$ is a set, we have $R\in U$. By the defining condition for membership in $R$, \begin{align*} R\in R &\iff R\in U \text{ and } R\notin R\\ &\iff R\notin R, \end{align*} because $R\in U$ is true. Thus $R\in R$ holds exactly when $R\notin R$ holds, a contradiction. Hence the definable class $V$ is available in the metatheory, but it cannot be a set in ZFC. [/example] The example also indicates how the early part of the course will proceed. We will often name large definable collections, such as the class of ordinals or the cumulative hierarchy, while proving separately that particular initial segments are sets. ## Formal Language and Informal Interpretation The next problem is how to keep formal syntax and mathematical intuition aligned. The formal language of set theory is minimal, but the intended interpretation is rich: membership is the only primitive non-logical relation, and all other notions must be defined from it. [definition: Language of Set Theory] The language of set theory is the first-order language $\mathcal L_{\in}$ with equality and one binary relation symbol $\in$. [/definition] Because the language is so small, definitions carry much of the mathematical load. Ordered pairs, functions, relations, natural numbers, ordinals, and cardinals will all be built as sets satisfying formulas in $\mathcal L_{\in}$. [example: Coding an Ordered Pair] For sets $a$ and $b$, the Kuratowski ordered pair is \begin{align*} (a,b) := \{\{a\}, \{a,b\}\}. \end{align*} [claim]For all sets $a,b,c,d$, one has $(a,b)=(c,d)$ if and only if $a=c$ and $b=d$.[/claim] [proof]If $a=c$ and $b=d$, then \begin{align*} (a,b) &=\{\{a\},\{a,b\}\}\\ &=\{\{c\},\{c,d\}\}\\ &=(c,d). \end{align*} Conversely, suppose \begin{align*} \{\{a\},\{a,b\}\}=\{\{c\},\{c,d\}\}. \end{align*} Since $\{a\}\in\{\{a\},\{a,b\}\}$, the equality gives \begin{align*} \{a\}\in\{\{c\},\{c,d\}\}. \end{align*} Hence either $\{a\}=\{c\}$ or $\{a\}=\{c,d\}$. In the first case, $a=c$. In the second case, $c\in\{c,d\}=\{a\}$, so $c=a$. Thus in all cases, \begin{align*} a=c. \end{align*} Using $a=c$, the assumed equality becomes \begin{align*} \{\{a\},\{a,b\}\}=\{\{a\},\{a,d\}\}. \end{align*} Since $\{a,b\}\in\{\{a\},\{a,b\}\}$, we have \begin{align*} \{a,b\}\in\{\{a\},\{a,d\}\}. \end{align*} Thus either $\{a,b\}=\{a\}$ or $\{a,b\}=\{a,d\}$. If $\{a,b\}=\{a,d\}$, then $b\in\{a,d\}$, so $b=a$ or $b=d$. If $b=d$, we are done. If $b=a$, then \begin{align*} \{a,d\}=\{a,b\}=\{a\}, \end{align*} so $d\in\{a\}$ and therefore $d=a=b$. If $\{a,b\}=\{a\}$, then $b\in\{a\}$, so $b=a$. Also $\{a,d\}\in\{\{a\},\{a,d\}\}=\{\{a\},\{a,b\}\}$, so either $\{a,d\}=\{a\}$ or $\{a,d\}=\{a,b\}$. In the first case, $d=a=b$. In the second case, \begin{align*} \{a,d\}=\{a,b\}=\{a\}, \end{align*} so again $d=a=b$. Therefore $b=d$ in every case. We have shown that $(a,b)=(c,d)$ implies $a=c$ and $b=d$, and the reverse implication was proved above.[/proof] Thus the set $\{\{a\},\{a,b\}\}$ really remembers the first and second coordinates, which is why it can serve as a set-theoretic code for the ordered pair. [/example] This coding raises a general issue that will recur throughout the course. A definition may choose one convenient representation, but the mathematical content usually lies in the structural properties proved about that representation. [remark: Coding Choices] Different set-theoretic codings of ordered pairs are possible. Once a coding satisfies the characteristic equality property, later constructions of products, relations, and functions are insensitive to the particular implementation. [/remark] The formal theory is first-order, so axiom schemas are allowed when a single mathematical principle must be stated uniformly for every formula. Separation and Replacement are the central examples; they cannot be replaced by one first-order sentence of $\mathcal L_{\in}$. This creates a bookkeeping problem: a single informal axiom may actually stand for infinitely many formal sentences, one for each formula to which the principle applies. To state later axioms accurately, we need terminology that separates an individual axiom from the rule that generates all of its instances. [definition: Axiom Schema] An axiom schema is a rule assigning an axiom to each formula, or to each formula of a specified syntactic form, in the language under discussion. [/definition] The use of schemas is one reason that metatheory remains visible. When we say that ZFC contains Separation, we mean that it contains every instance of the Separation schema. ## The Shape of the Course The course is organised around the question of how ordinary mathematics emerges from the axioms. The first lectures establish the axioms and the cumulative hierarchy; the middle lectures build ordinals, recursion, and cardinals; the final lectures develop choice, cofinality, stationary sets, and reflection principles at a basic level. [explanation: Main Structural Themes] The axioms of ZFC are not a flat list of unrelated assumptions. Extensionality gives equality of sets its membership-based meaning. Pairing, Union, Power Set, Infinity, Separation, and Replacement provide construction principles. Foundation organises the universe into ranks and supports induction on membership. Choice supplies well-orderings and product selection principles that are independent of the other axioms in a precise metamathematical sense. Ordinals then become the canonical well-ordered sets. They serve both as objects of study and as indices for recursion, induction, rank, and hierarchy. Cardinals measure size by comparison through bijections and injections, while cofinality records how a large order can be approached by smaller unbounded pieces. [/explanation] This structure explains why the course does not begin with cardinals. Counting infinite sets requires well-ordering arguments and ordinal technology, and those depend on the earlier development of relations and recursion. [example: A First Course Dependency] The assertion "every set can be well-ordered" is not meaningful until well-orders have been defined: for a set $A$, it says that there is a relation $<_A$ on $A$ such that every nonempty subset $B \subseteq A$ has a $<_A$-least element. It also belongs after the Axiom of Choice, because the [well-ordering principle](/theorems/721) is one of the standard forms of choice, not a consequence of the earlier construction axioms alone. Once $A$ is well-ordered by $<_A$, ordinals enter as the canonical representatives of well-order types: one assigns to $(A,<_A)$ the unique ordinal $\alpha$ order-isomorphic to it. Cardinal comparison then uses these ordinal representatives. If $A$ and $B$ can be well-ordered with order types $\alpha$ and $\beta$, then comparing their sizes is reduced to comparing the corresponding initial ordinals: $|A|\leq |B|$ means that there is an injection from $A$ into $B$, and after passing through the chosen well-orders this becomes a comparison between the ordinal codes of their sizes. Thus the dependency is structural rather than cosmetic: the course first states the axioms, then develops relations and well-orders, then builds ordinals as order types, and only after that treats cardinals as sizes represented by initial ordinals. [/example] The course also separates internal mathematical development from metamathematical comments. We will prove theorems inside ZFC or from specified fragments of ZFC, while occasionally noting when a result later becomes important for independence proofs. ## The Iterative Picture The informal picture behind the axioms is that sets are formed in stages. At each stage, previously available sets may be collected into new sets, and limit stages gather everything produced earlier. This picture motivates the cumulative hierarchy without being itself a formal axiom. [definition: Cumulative Hierarchy] The cumulative hierarchy is the metalinguistic class function from the class $\mathrm{Ord}$ of ordinals to the class $V$ of all sets, written $\alpha \mapsto V_\alpha$, given by \begin{align*} V_0 &:= \varnothing,\\ V_{\alpha+1} &:= \mathcal P(V_\alpha),\\ V_\lambda &:= \bigcup_{\alpha \in \lambda} V_\alpha \quad \text{for limit ordinals } \lambda. \end{align*} [/definition] The notation uses ordinals before they have been fully developed, so at this introductory stage it should be read as a roadmap. Later chapters define ordinals rigorously and prove the recursion theorem that legitimises this definition. A roadmap is useful only if it connects back to the actual universe of sets. The guiding question is whether every set is reached by some stage of the hierarchy, rather than merely whether the hierarchy can be described from the outside. [quotetheorem:4796] [citeproof:4796] The [rank principle](/theorems/4796) is a conceptual bridge between the axioms and the hierarchy. It says that the iterative picture is part of the internal organisation of ZFC: under the usual axioms, every set appears at some stage. Foundation is essential to this reading: in a non-well-founded theory allowing a set $x$ with $x \in x$, a rank assignment would force the rank of $x$ to be strictly below itself. Replacement is also not decorative, because collecting the ranks of all elements of a given set is a set-sized operation needed before taking their supremum. The theorem does not say that the whole hierarchy is a set, nor does it by itself define the least rank of $x$; those refinements require the later formal development of ordinals, recursion, and suprema. [example: First Ranks] Using the defining clauses for the cumulative hierarchy, \begin{align*} V_0&=\varnothing,\\ V_1&=\mathcal P(V_0)=\mathcal P(\varnothing). \end{align*} Since every element of $\varnothing$ is an element of $\varnothing$, we have $\varnothing\subseteq\varnothing$, so $\varnothing\in\mathcal P(\varnothing)=V_1$. Next, \begin{align*} V_1=\mathcal P(\varnothing)=\{\varnothing\}, \end{align*} because the only subset of $\varnothing$ is $\varnothing$. Hence \begin{align*} \{\varnothing\}\subseteq V_1, \end{align*} since its only element is $\varnothing$, and $\varnothing\in V_1$. Therefore \begin{align*} \{\varnothing\}\in\mathcal P(V_1)=V_2. \end{align*} Similarly, \begin{align*} V_2 &=\mathcal P(V_1)\\ &=\mathcal P(\{\varnothing\})\\ &=\{\varnothing,\{\varnothing\}\}, \end{align*} because the subsets of $\{\varnothing\}$ are exactly $\varnothing$ and $\{\varnothing\}$. Thus the two elements of $\{\varnothing,\{\varnothing\}\}$ are both elements of $V_2$, so \begin{align*} \{\varnothing,\{\varnothing\}\}\subseteq V_2. \end{align*} By the successor-stage definition, \begin{align*} V_3=\mathcal P(V_2), \end{align*} and therefore \begin{align*} \{\varnothing,\{\varnothing\}\}\in V_3. \end{align*} These first cases show the pattern behind rank: a set enters the hierarchy one stage after all of its elements have already entered. [/example] The hierarchy also explains why Power Set is a strong axiom. Passing from $V_\alpha$ to $V_{\alpha+1}$ forms all subsets of the previous stage, and this operation drives much of the growth in cardinal size. ## Proof Methods Used Throughout The final introductory problem is methodological: ordinary induction on $\mathbb N$ is not enough for set theory. Many arguments range over well-orders, ranks, or membership itself, so the course develops induction and recursion principles adapted to those domains. [definition: Well-Founded Relation] A relation $R$ on a set $A$ is well-founded if every nonempty subset $B \subset A$ has an $R$-minimal element. [/definition] Well-foundedness is the abstract condition behind minimal counterexample arguments. Once a relation is known to be well-founded, a proof can assume a counterexample of minimal complexity and derive a contradiction from smaller cases. The key question is how to turn that informal minimal-counterexample strategy into a precise proof rule. The principle needed below says that if every element follows from all smaller elements, then no counterexample can exist anywhere in the well-founded domain. [quotetheorem:4797] [citeproof:4797] This theorem is the template for transfinite induction on ordinals and for induction on rank. The hypothesis of well-foundedness is what makes the minimal-counterexample argument legitimate. For example, the usual order $<$ on $\mathbb Z$ is not well-founded, since $\mathbb Z$ has no least element; the implication "if $P(m)$ holds for all $m < n$, then $P(n)$ holds" can be vacuously true for $P(n)$ meaning $n < 0$, while $P(0)$ still fails because there is no first counterexample. The theorem also does not define objects by recursion and does not apply to arbitrary relations; it proves universal validity only after a well-founded relation has already supplied a genuine notion of smaller object. [example: Induction by Membership] Let $A$ be a set such that whenever $x \in A$ and $y \in x$, one also has $y \in A$. Foundation says that every nonempty set has an $\in$-minimal element, so if $B \subseteq A$ is nonempty, there is some $b \in B$ such that no element of $b$ lies in $B$. Thus the membership relation is well-founded on $A$. To prove a property $P(x)$ for every $x \in A$, it is enough to prove the hereditary step \begin{align*} \bigl(\forall y \in x,\ P(y)\bigr) \Longrightarrow P(x) \end{align*} for each $x \in A$. Indeed, suppose some element of $A$ failed $P$, and form the counterexample set \begin{align*} B=\{x \in A : \neg P(x)\}. \end{align*} If $B$ is nonempty, Foundation gives $b \in B$ such that no element of $b$ lies in $B$. Therefore, for every $y \in b$, the closure condition gives $y \in A$, and the minimality of $b$ gives $y \notin B$. Hence $P(y)$ holds for every $y \in b$. Applying the hereditary step to $b$ gives $P(b)$, contradicting $b \in B$. Therefore $B$ is empty, so $P(x)$ holds for every $x \in A$. This is the set-theoretic analogue of strong induction on natural numbers: instead of assuming the statement for all $m<n$, one assumes it for all members $y \in x$. [/example] Recursion is the constructive partner of induction. Induction proves uniqueness or universal validity; recursion defines objects by specifying their values in terms of earlier values. To use this method safely, the earlier values must form an already determined initial segment. The recursion principle below is the formal guarantee that definitions made stage by stage along the ordinals are not circular and determine a unique object at each stage. [quotetheorem:4798] [citeproof:4798] The cumulative hierarchy, ordinal arithmetic, and many rank constructions are applications of this principle. The ordinal initial-segment condition is essential: it prevents circular definitions such as trying to define $F(\alpha)$ in terms of $F(\alpha)$ itself or in terms of values at later ordinals. The principle gives uniqueness only for rules whose previous data are already determined, and the set-sized versions require Replacement-like hypotheses to ensure that the relevant initial segments and images remain sets. It also does not assert that the whole class function $F$ is itself a set; it is a class-level object used to organise set-sized stages. For that reason, recursion is not a technical aside; it is one of the main engines of the subject. ## What This Course Does Not Yet Do A final orientation point is the boundary of the course. We will build the internal toolkit of ZFC, but we will not yet develop forcing, constructibility in depth, or large cardinal axioms. Those topics depend on the definitions and lemmas established here. [remark: Later Uses] Forcing arguments require precise control over partial orders, dense sets, names, and transitive models. Large cardinal theory requires fluency with elementary embeddings, hierarchies, and reflection. The present course prepares the ordinal, cardinal, and structural language needed before those methods can be studied responsibly. [/remark] The intended outcome is therefore practical as well as foundational. By the end of the course, a reader should be able to manipulate ZFC axioms, reason with ordinals and cardinals, use choice in its standard equivalent forms, and recognise the basic combinatorial principles that later set theory refines. The introduction has set the course’s goals: to move from informal set talk to a disciplined theory in which proofs are carried out inside ZFC. The next chapter makes that shift precise by fixing the axioms, explaining the cumulative hierarchy, and showing how the universe of sets is built stage by stage. # 1. The ZFC Axioms and the Cumulative Hierarchy The first task in axiomatic set theory is to separate the informal practice of talking about collections from the formal theory whose objects are sets. ZFC gives a single binary relation, membership, and asks how much mathematics can be rebuilt from axioms governing that relation. This chapter fixes the language, states the axioms used throughout the course, and then explains why the cumulative hierarchy is the right mental model for those axioms. ## The Formal Language and Definable Classes How can a theory with only sets talk about familiar mathematical collections such as groups, functions, ordinals, and the class of all sets? The answer is that the formal language is intentionally small, while definable predicates do the work that informal mathematics often assigns to classes. [definition: Language of Set Theory] The first-order language of set theory has equality $=$ and one non-logical binary relation symbol $\in$. A formula is built from atomic formulas $x=y$ and $x\in y$ using logical connectives, quantifiers, and variables ranging over sets. [/definition] The restriction that variables range only over sets is a central discipline of ZFC. We may write as if there is a class of all ordinals or a class of all groups, but such a class is not automatically a set and is not an extra object of the theory. The danger is that informal class notation can look as though it introduces new things for quantifiers to range over. To keep the language first-order while still allowing useful large collections, class notation must be interpreted as shorthand for a formula, possibly with fixed set parameters. We therefore need a precise convention for when an expression like a class of all objects satisfying a condition is legitimate. The point of the definition is to identify such a class with the formula that defines membership in it, rather than with a new set-sized object. [definition: Definable Class] A definable class is an expression of the form \begin{align*} \{x : \varphi(x,p_1,\dots,p_n)\}, \end{align*} where $\varphi$ is a formula in the language of set theory and $p_1,\dots,p_n$ are fixed set parameters. [/definition] A definable class is therefore a convenient abbreviation for a predicate. When we say that $x$ belongs to such a class, we mean that the formula $\varphi(x,p_1,\dots,p_n)$ holds. [example: The Class of Singleton Sets] For a set $y$, Pairing applied with both inputs equal to $y$ gives a set $p$ such that \begin{align*} \forall z\,(z\in p \iff (z=y\lor z=y)). \end{align*} Since $(z=y\lor z=y)$ is equivalent to $z=y$, this is \begin{align*} \forall z\,(z\in p \iff z=y). \end{align*} Thus $p$ is the singleton whose only element is $y$, and Extensionality justifies writing this unique set as $\{y\}$. The displayed expression \begin{align*} \{x : \exists y\,(x=\{y\})\} \end{align*} therefore abbreviates the predicate “$x$ is equal to the singleton of some set $y$.” It is a definable class, not automatically a set: the formula defines which sets $x$ satisfy the condition, while Pairing supplies the individual singleton sets $\{y\}$ that occur in the formula. [/example] The next axiom says that a set has no hidden structure beyond its members. It is the first point where equality and membership interact. [quotetheorem:4799] [citeproof:4799] Extensionality does not create any sets. Its role is to make later constructions unambiguous, so that phrases like "the empty set", "the pair", and "the union" refer to unique objects once their existence has been established. It is therefore a uniqueness principle rather than an existence principle: it tells us when two descriptions name the same set, but it does not supply any objects satisfying those descriptions. This separation between uniqueness and existence is crucial in ZFC. The next group of axioms provides the raw constructions whose outputs Extensionality then lets us name without ambiguity. ## The Basic Existence Axioms of ZFC What sets are guaranteed to exist before any mathematical structure has been built? ZFC answers by giving local construction principles: from old sets, certain new sets may be formed. [definition: Empty Set Axiom] There exists a set with no elements: \begin{align*} \exists e\,\forall x\,(x\notin e). \end{align*} [/definition] By Extensionality, there is only one set with no elements. We denote it by $\varnothing$ and use it as the starting point for finite constructions. A single starting object is not enough to build compound mathematical data. The next existence axiom supplies the basic operation of gathering two already given sets into one set, which is the first step toward finite sets, ordered pairs, and relations. [definition: Pairing Axiom] For all sets $a$ and $b$, there exists a set $p$ whose elements are exactly $a$ and $b$: \begin{align*} \forall a\,\forall b\,\exists p\,\forall x\,(x\in p \iff (x=a \lor x=b)). \end{align*} [/definition] Extensionality lets us write the unique set supplied by Pairing as $\{a,b\}$. Taking $a=b$ gives the singleton $\{a\}$, and iterating this with the empty set produces the first finite sets. [example: First Finite Sets] Starting with the set $\varnothing$, apply Pairing with $a=\varnothing$ and $b=\varnothing$. We obtain a set $p$ such that \begin{align*} \forall x\,(x\in p \iff (x=\varnothing\lor x=\varnothing)). \end{align*} For every $x$, the formula $(x=\varnothing\lor x=\varnothing)$ is equivalent to $x=\varnothing$, so \begin{align*} \forall x\,(x\in p \iff x=\varnothing). \end{align*} Thus $p$ is the unique set whose only element is $\varnothing$, and we write \begin{align*} p=\{\varnothing\}. \end{align*} Apply Pairing again, now with $a=\varnothing$ and $b=\{\varnothing\}$. We obtain a set $q$ such that \begin{align*} \forall x\,(x\in q \iff (x=\varnothing\lor x=\{\varnothing\})). \end{align*} Hence the elements of $q$ are exactly $\varnothing$ and $\{\varnothing\}$, so \begin{align*} q=\{\varnothing,\{\varnothing\}\}. \end{align*} These are the von Neumann representatives \begin{align*} 1&=\{\varnothing\},\\ 2&=\{\varnothing,\{\varnothing\}\}, \end{align*} and they display the successor pattern $n+1=n\cup\{n\}$ that will later define the finite ordinals. [/example] Pairing collects two specified sets. To flatten an already existing family of sets into the set of all elements appearing anywhere in the family, ZFC uses the Union axiom. The example above produced small nested sets, but Pairing by itself cannot remove a layer of braces. The next axiom supplies exactly that missing operation: given a set whose elements are themselves sets, it forms one set containing all members found inside them. [definition: Union Axiom] For every set $A$, there exists a set $u$ such that \begin{align*} \forall x\,(x\in u \iff \exists y\,(x\in y \land y\in A)). \end{align*} [/definition] The unique set supplied by Union is denoted $\bigcup A$. In particular, if $A=\{a,b\}$ then $\bigcup A=a\cup b$, so binary unions are derived operations rather than primitive symbols. Union extracts elements from sets already present, but many constructions require all possible subcollections of a given set. The Power Set axiom supplies that collection of subsets as a single set, making it the key axiom for forming the next stage of the cumulative hierarchy. [definition: Power Set Axiom] For every set $A$, there exists a set $p$ such that \begin{align*} \forall x\,(x\in p \iff x\subset A). \end{align*} [/definition] The set supplied by Power Set is denoted $\mathcal P(A)$. It is the first axiom that systematically increases size: even if $A$ is finite, $\mathcal P(A)$ contains every subset of $A$ as an element. [example: The Power Set of the Empty Set] By the Power Set Axiom, membership in $\mathcal P(\varnothing)$ is determined by being a subset of $\varnothing$: \begin{align*} x\in\mathcal P(\varnothing) &\iff x\subset\varnothing\\ &\iff \forall t\,(t\in x\implies t\in\varnothing). \end{align*} Since $\varnothing$ has no elements, if $x\subset\varnothing$ then for every $t$, \begin{align*} t\in x &\implies t\in\varnothing,\\ t\in\varnothing &\implies t\in x, \end{align*} where the second implication holds because there is no $t\in\varnothing$. Hence \begin{align*} \forall t\,(t\in x\iff t\in\varnothing), \end{align*} so $x=\varnothing$ by *Extensionality Uniqueness*. Conversely, \begin{align*} \forall t\,(t\in\varnothing\implies t\in\varnothing), \end{align*} so $\varnothing\subset\varnothing$, and therefore $\varnothing\in\mathcal P(\varnothing)$. Thus the members of $\mathcal P(\varnothing)$ are exactly the sets equal to $\varnothing$: \begin{align*} \forall x\,(x\in\mathcal P(\varnothing)\iff x=\varnothing). \end{align*} The singleton $\{\varnothing\}$ also satisfies \begin{align*} \forall x\,(x\in\{\varnothing\}\iff x=\varnothing), \end{align*} so another application of *Extensionality Uniqueness* gives \begin{align*} \mathcal P(\varnothing)=\{\varnothing\}. \end{align*} This calculation separates the two roles of the axioms: Power Set supplies the set of all subsets, while Extensionality identifies it by comparing its members. [/example] Finite mathematics can be developed from Empty Set, Pairing, Union, and Power Set, but set theory also needs an infinite set. The next axiom gives an inductive set, from which the natural numbers are extracted. [definition: Inductive Set] A set $I$ is inductive if \begin{align*} \varnothing\in I \end{align*} and for every $x\in I$, \begin{align*} x\cup\{x\}\in I. \end{align*} [/definition] The operation $x\mapsto x\cup\{x\}$ is the successor operation. It is designed so that each natural number is the set of all smaller natural numbers. The preceding existence axioms build finite patterns, but they do not assert that all finite stages are contained together in one set. To develop arithmetic inside ZFC, we need an axiom guaranteeing a set closed under this successor operation. [definition: Infinity Axiom] There exists an inductive set. [/definition] Infinity gives at least one set closed under successor, but not yet the canonical set of natural numbers. Separation, introduced next, trims any inductive set down to the elements forced to be present in every inductive set. [quotetheorem:4800] [citeproof:4800] The theorem packages the natural numbers as a set rather than as a proper class. Later chapters will prove its well-ordering properties; here the key point is that Infinity supplies enough material and Separation extracts the least inductive part. This also illustrates the division of labour among the axioms. Infinity gives an ambient inductive set, while Separation removes the extra elements not forced by closure under successor. The construction is a model for later arguments: first find a set large enough to contain the desired objects, then use a defining condition to carve out the exact subset needed. ## Separation and Replacement How does ZFC form subsets defined by mathematical conditions without allowing the Russell paradox? The solution is that comprehension is allowed only inside a previously given set, and images are allowed only for definable functional relations. [definition: Separation Schema] For every formula $\varphi(x,p_1,\dots,p_n)$ and every set $A$, there exists a set $B$ such that \begin{align*} \forall x\,(x\in B \iff (x\in A \land \varphi(x,p_1,\dots,p_n))). \end{align*} [/definition] This is a schema: each formula gives an axiom. The bounding set $A$ is what prevents arbitrary definable classes from becoming sets. [example: Even Natural Numbers] Once addition has been defined on $\omega$, the condition “$n$ is even” can be written as the formula \begin{align*} \exists k\,(k\in\omega \land n=k+k). \end{align*} Apply Separation to the already existing set $\omega$ with this formula. It gives a set $E$ such that, for every set $n$, \begin{align*} n\in E &\iff n\in\omega \land \exists k\,(k\in\omega \land n=k+k)\\ &\iff n\in\omega \land \exists k\in\omega\,(n=k+k). \end{align*} Thus \begin{align*} E=\{n\in\omega:\exists k\in\omega\,(n=k+k)\}. \end{align*} For example, since $0=0+0$, we have \begin{align*} 0\in\omega \land \exists k\in\omega\,(0=k+k), \end{align*} using $k=0$, so $0\in E$. Similarly, since $2=1+1$, we have $2\in E$ using $k=1$. On the other hand, $1\in E$ would mean that some $k\in\omega$ satisfies $1=k+k$; this is false for the usual addition on $\omega$, so $1\notin E$. The important point is not the particular arithmetic facts, but the bounded construction: Separation does not form the class of all objects satisfying an arbitrary predicate, but it does form the subset of $\omega$ consisting exactly of those natural numbers satisfying the evenness formula. [/example] Separation only takes subsets of a known set. Replacement is the axiom that lets us collect outputs of a definable operation applied to all elements of a set. This becomes necessary when the desired objects are not already known to lie inside one pre-existing container. The next schema says that a definable rule with a unique output for each input has a set-sized image whenever its domain is a set. [definition: Replacement Schema] For every formula $\varphi(x,y,p_1,\dots,p_n)$, if for each $x\in A$ there is a unique $y$ such that $\varphi(x,y,p_1,\dots,p_n)$, then there exists a set $B$ such that \begin{align*} \forall y\,(y\in B \iff \exists x\in A\,\varphi(x,y,p_1,\dots,p_n)). \end{align*} [/definition] Replacement is essential for transfinite constructions. It says that the image of a set under a definable class function is a set, even when the outputs are not known in advance to lie inside a fixed ambient set. [example: Successor Image of a Set] For a fixed set $x$, the successor $x\cup\{x\}$ is the set whose elements are exactly the elements of $x$ together with $x$ itself: \begin{align*} t\in x\cup\{x\} &\iff t\in x \lor t\in\{x\}\\ &\iff t\in x \lor t=x. \end{align*} Thus the relation \begin{align*} \varphi(x,y) \quad\text{meaning}\quad \forall t\,(t\in y\iff (t\in x\lor t=x)) \end{align*} is functional: if $y$ and $y'$ both satisfy $\varphi(x,\cdot)$, then \begin{align*} \forall t\,(t\in y\iff (t\in x\lor t=x)) \end{align*} and \begin{align*} \forall t\,(t\in y'\iff (t\in x\lor t=x)), \end{align*} so for every $t$, \begin{align*} t\in y \iff t\in y', \end{align*} and extensionality gives $y=y'$. Now apply Replacement to this functional formula on the set $A$. It gives a set $B$ such that \begin{align*} y\in B &\iff \exists x\in A\,\varphi(x,y)\\ &\iff \exists x\in A\,\forall t\,(t\in y\iff (t\in x\lor t=x)). \end{align*} Equivalently, \begin{align*} B=\{x\cup\{x\}:x\in A\}. \end{align*} When $A=\omega$, each element of $B$ is the successor $n\cup\{n\}$ of some $n\in\omega$, so \begin{align*} B=\{n+1:n\in\omega\}. \end{align*} For instance, \begin{align*} 0\cup\{0\}&=\varnothing\cup\{\varnothing\}=\{\varnothing\}=1,\\ 1\cup\{1\}&=\{\varnothing\}\cup\{\{\varnothing\}\}=\{\varnothing,\{\varnothing\}\}=2. \end{align*} Thus Replacement turns the definable successor operation on $\omega$ into the set of positive finite ordinals. [/example] The example shows how Replacement turns one definable operation into a whole image set, but ordinary mathematics also needs a simpler guarantee: any finite list of already available objects can be gathered into a set. Without such a principle, ordered pairs and finite tuples would not be stable constructions inside ZFC, because their usual codings require repeatedly collecting finitely many specified sets. [quotetheorem:4801] [citeproof:4801] This theorem is often used silently when coding ordered pairs and functions. The axioms do not include finite tuples as primitive objects; they are constructed from sets. Its role is not to add tuples as new primitive entities, but to justify the finite set-building steps that make standard encodings available whenever their entries are sets. ## Foundation, Choice, and Structural Consequences What prevents infinitely descending membership chains, and what permits coordinated choices from many nonempty sets? Foundation and Choice answer these different structural questions. [definition: Foundation Axiom] Every nonempty set $A$ has an $\in$-minimal element: \begin{align*} A\ne\varnothing \implies \exists a\in A\,(a\cap A=\varnothing). \end{align*} [/definition] Foundation rules out membership cycles such as $x\in x$ and descending chains arranged into a set. Its deeper role in this chapter is to support the rank analysis of sets. To use Foundation in ordinary arguments, we need more than the slogan that membership descends. The practical consequence is an induction principle for sets themselves: if a property holds once it holds for all members, then Foundation prevents any first failure from existing. [quotetheorem:4802] [citeproof:4802] This principle is the membership-based analogue of [well-founded induction](/theorems/4797). It lets later proofs reason from elements to the set containing them, which is exactly the direction needed for rank computations and recursive definitions along the membership relation. The hypothesis matters because without Foundation there could be membership loops or descending membership patterns with no minimal counterexample. Many arguments also need a different kind of structural tool: selecting elements from many nonempty sets at once. Choice supplies this selection principle. [definition: Choice Axiom] For every set $A$ whose elements are nonempty and pairwise disjoint, there exists a set $C$ such that for every $a\in A$, the intersection $C\cap a$ has exactly one element. [/definition] This form of Choice gives a selector for a family of disjoint nonempty sets. Equivalent forms, including well-ordering and choice functions on arbitrary indexed families, will be studied later. [example: Choosing Representatives from Disjoint Blocks] Let $A=\{a_i:i\in I\}$ be a set whose elements are nonempty and pairwise disjoint. Applying the Choice Axiom to $A$, we obtain a set $C$ such that for every $a\in A$, the intersection $C\cap a$ has exactly one element. Since each $a_i$ is an element of $A$, this means that for every $i\in I$ there is a set $r_i$ such that \begin{align*} r_i\in C\cap a_i \end{align*} and, for every set $t$, \begin{align*} t\in C\cap a_i \implies t=r_i. \end{align*} Unpacking membership in an intersection gives \begin{align*} r_i\in C\cap a_i &\iff r_i\in C \land r_i\in a_i, \end{align*} so each selected element $r_i$ lies both in the representative set $C$ and in the block $a_i$. If $i,j\in I$ with $i\ne j$, then pairwise disjointness gives \begin{align*} a_i\cap a_j=\varnothing. \end{align*} Thus no set can lie in both $a_i$ and $a_j$. Indeed, if $t\in a_i$ and $t\in a_j$, then \begin{align*} t\in a_i\cap a_j, \end{align*} contradicting $a_i\cap a_j=\varnothing$. Therefore the element of $C\cap a_i$ is a representative of the block $a_i$ alone, not an element shared with another block. The set $C$ therefore contains exactly one chosen element from each block $a_i$, and the Choice Axiom supplies this set of representatives without requiring a separately defined rule that names the element of each block. [/example] Choice is independent in spirit from the cumulative picture: it does not create a new stage operation like Power Set, but it asserts that certain global selections exist inside the universe of sets. ## The Cumulative Hierarchy If sets are built in stages, what appears at each stage and why should every set appear somewhere? The cumulative hierarchy answers by iterating the power-set operation along the ordinals. [definition: Cumulative Hierarchy] The classes $V_\alpha$ are defined by transfinite recursion on ordinals: \begin{align*} V_0&=\varnothing,\\ V_{\alpha+1}&=\mathcal P(V_\alpha),\\ V_\lambda&=\bigcup_{\beta\in\lambda}V_\beta \quad\text{for limit ordinals }\lambda. \end{align*} [/definition] The hierarchy is cumulative because earlier stages persist into later stages. At a successor stage, every subset of the previous stage is admitted as a new set. Before ranks can classify individual sets, the hierarchy itself must have the expected monotonic behaviour. There are two possible failures to rule out: a later stage might lose objects already constructed, or a limit stage might fail to contain all earlier stages. Either failure would make the phrase "the stage at which a set appears" unstable. The next point to establish is therefore structural: successor and limit clauses must cooperate so that the stages form an increasing system rather than a disconnected collection of approximations. Without this coherence, later arguments about first appearance and rank would have no stable foundation. [quotetheorem:4803] [citeproof:4803] These properties are the formal version of the iterative conception of set. A set can only be formed from objects available earlier, and limit stages collect everything produced so far. They also make stage comparisons meaningful: once an object appears, it remains available at every later stage, so it makes sense to ask for the earliest stage sufficient to contain the elements of a given set. To turn the hierarchy from a global construction into a tool for studying a particular set, we need a numerical measure of when that set has been built from earlier material. This measure is assigned by an ordinal and records the first stage by which all elements of the set are already present. [definition: Rank] The rank of a set $x$, written $\operatorname{rank}(x)$, is the least ordinal $\alpha$ such that $x\subset V_\alpha$. [/definition] Rank measures the stage before which all elements of $x$ have appeared. The set $x$ itself then appears by stage $\operatorname{rank}(x)+1$. The remaining issue is existence: the definition only helps if every set has some ordinal stage that contains all of its elements. The rank theorem supplies that universe-wide guarantee, connecting Foundation with the cumulative hierarchy. [quotetheorem:4804] [citeproof:4804] The rank theorem is the bridge between the axioms and the hierarchy: every set is assigned a stage, so the hierarchy is not merely a useful construction but a universe-wide classification. [example: Small Rank Computations] We compute ranks using the definition: $\operatorname{rank}(x)$ is the least ordinal $\alpha$ such that $x\subset V_\alpha$. First, $V_0=\varnothing$. Since $\varnothing$ has no elements, \begin{align*} \forall t\,(t\in\varnothing\implies t\in V_0), \end{align*} so $\varnothing\subset V_0$. Because $0$ is the least ordinal, there is no smaller candidate, and therefore \begin{align*} \operatorname{rank}(\varnothing)=0. \end{align*} Next, \begin{align*} V_1=\mathcal P(V_0)=\mathcal P(\varnothing)=\{\varnothing\}. \end{align*} Thus the only element of $\{\varnothing\}$ is $\varnothing$, and $\varnothing\in V_1$, so \begin{align*} \{\varnothing\}\subset V_1. \end{align*} But \begin{align*} \varnothing\in\{\varnothing\} \end{align*} and \begin{align*} \varnothing\notin V_0=\varnothing, \end{align*} so $\{\varnothing\}\not\subset V_0$. Hence \begin{align*} \operatorname{rank}(\{\varnothing\})=1. \end{align*} For finite ordinals, the same pattern gives \begin{align*} \operatorname{rank}(n)=n\qquad(n\in\omega). \end{align*} Indeed, the base case is $n=0$, already computed. If $\operatorname{rank}(n)=n$, then $n\subset V_n$, so \begin{align*} n\in\mathcal P(V_n)=V_{n+1}. \end{align*} Also every $t\in n$ lies in $V_n$, hence in $V_{n+1}$ by monotonicity of the hierarchy from *Basic Properties of the Cumulative Hierarchy*. Since \begin{align*} n+1=n\cup\{n\}, \end{align*} every element of $n+1$ lies in $V_{n+1}$, so \begin{align*} n+1\subset V_{n+1}. \end{align*} The element $n\in n+1$ does not lie in any $V_m$ with $m\le n$, since membership in $V_m$ would force $\operatorname{rank}(n)<m\le n$, contradicting $\operatorname{rank}(n)=n$. Therefore the least stage containing all elements of $n+1$ is $V_{n+1}$. Now consider $\omega$. For each $n\in\omega$, the finite computation gives $\operatorname{rank}(n)=n$, so \begin{align*} n\subset V_n \end{align*} and hence \begin{align*} n\in V_{n+1}\subset V_\omega. \end{align*} Thus \begin{align*} \omega\subset V_\omega. \end{align*} No finite stage $V_m$ contains all elements of $\omega$: the element $m\in\omega$ is not in $V_m$, again because $\operatorname{rank}(m)=m$. Hence \begin{align*} \operatorname{rank}(\omega)=\omega. \end{align*} Finally, \begin{align*} \omega+1=\omega\cup\{\omega\}. \end{align*} Every finite ordinal $n\in\omega$ lies in $V_\omega$, and the previous computation gives $\omega\subset V_\omega$, so \begin{align*} \omega\in\mathcal P(V_\omega)=V_{\omega+1}. \end{align*} Therefore every element of $\omega+1$ lies in $V_{\omega+1}$: \begin{align*} \omega+1\subset V_{\omega+1}. \end{align*} But $\omega\in\omega+1$, and $\omega\notin V_\omega$ because $\operatorname{rank}(\omega)=\omega$ would otherwise be $<\omega$. Thus $\omega+1\not\subset V_\omega$, and no smaller stage can work. Hence \begin{align*} \operatorname{rank}(\omega+1)=\omega+1. \end{align*} These computations show that, for the first ordinals, rank agrees with ordinal height: each ordinal appears only after all of its members have appeared. [/example] These computations show how rank converts membership structure into ordinal height. They also point to a final distinction: the hierarchy classifies every set, but the hierarchy itself is not a set inside ZFC. [remark: Sets and Proper Classes] The hierarchy $V=\bigcup_\alpha V_\alpha$ is often written as if it were a class of all sets. In ZFC this is a definable proper class, not a set. The rank theorem says that every set belongs to some $V_\alpha$, while the collection of all stages is too large to be gathered into one set. [/remark] This chapter has established the formal ground rules for the rest of the course. The next chapter studies relations and well-orders, which provide the induction and recursion principles needed to use ordinals as precise stage indices. With the axioms in place, the central problem becomes how to organize arguments by rank and stage. Relations and well-orders provide the structure needed for transfinite induction and recursion, so the next chapter develops that language before ordinals are introduced as canonical stage indices. # 2. Relations, Well-Orders, and Transfinite Induction This chapter develops the order-theoretic language needed for induction and recursion beyond the finite. The first chapter introduced ZFC and the cumulative hierarchy; now we isolate the relation-theoretic structure that lets a construction proceed by earlier stages. The main question is: when does every failed proof or failed definition have a first point of failure? ## Organising Sets By Relations A comparison between objects is represented by a relation. The first task is to separate arbitrary relations from the relations that carry enough structure to support ordering arguments. [definition: Binary Relation] Let $A$ be a set. A binary relation on $A$ is a subset $R\subseteq A\times A$. [/definition] Writing $xRy$ for $(x,y)\in R$ gives a compact language for comparison. To use a relation as an order, however, comparison must behave consistently: every object should compare with itself, two objects should not be mutually below each other unless they are the same, and comparisons should pass through intermediate objects. These are exactly the conditions isolated by partial orders. [definition: Partial Order] Let $A$ be a set and let $\leq$ be a binary relation on $A$. The relation $\leq$ is a partial order on $A$ if, for all $x,y,z\in A$: 1. $x\leq x$. 2. If $x\leq y$ and $y\leq x$, then $x=y$. 3. If $x\leq y$ and $y\leq z$, then $x\leq z$. [/definition] A partial order allows incomparability. This is important for inclusion, divisibility, and forcing-style orders, where two objects may fail to lie on a single chain. That freedom becomes an obstruction when an argument needs to choose the earlier of two elements or search through a set in a single direction. If two elements can be incomparable, then phrases such as "the first counterexample" or "the smaller of the two" may have no meaning. To support those arguments, the order must require every pair of elements to be comparable. [definition: Linear Order] Let $A$ be a set and let $\leq$ be a partial order on $A$. The relation $\leq$ is a linear order on $A$ if, for all $x,y\in A$, either $x\leq y$ or $y\leq x$. [/definition] The strict relation associated to an order is the relation used in descent arguments: $x$ is below $y$ when $x\leq y$ and $x\ne y$. [example: Inclusion And Divisibility] For any set $A$, inclusion is a partial order on $\mathcal P(A)$. If $X \subseteq A$, then every element of $X$ is an element of $X$, so $X \subseteq X$. If $X \subseteq Y$ and $Y \subseteq X$, then an element belongs to $X$ exactly when it belongs to $Y$, so $X=Y$. If $X \subseteq Y$ and $Y \subseteq Z$, then every element of $X$ lies in $Y$ and hence lies in $Z$, so $X \subseteq Z$. If $A$ has two distinct elements $a$ and $b$, then $\{a\}$ and $\{b\}$ are incomparable under inclusion: since $a \in \{a\}$ but $a \notin \{b\}$, we have $\{a\} \nsubseteq \{b\}$; since $b \in \{b\}$ but $b \notin \{a\}$, we have $\{b\} \nsubseteq \{a\}$. Divisibility on the positive natural numbers is another partial order. For every positive natural number $n$, $n=1\cdot n$, so $n$ divides $n$. If $m$ divides $n$ and $n$ divides $m$, write $n=km$ and $m=\ell n$ with positive natural numbers $k,\ell$. Then \begin{align*} n = km = k(\ell n) = (k\ell)n. \end{align*} Since $n>0$, this gives $k\ell=1$, so $k=\ell=1$ and hence $m=n$. If $m$ divides $n$ and $n$ divides $p$, write $n=km$ and $p=\ell n$; then \begin{align*} p=\ell n=\ell(km)=(\ell k)m, \end{align*} so $m$ divides $p$. Thus divisibility is a partial order. For instance, $2$ divides $6$ because $6=3\cdot 2$, but $2$ and $3$ are incomparable: $3$ is not an integer multiple of $2$, and $2$ is not an integer multiple of $3$. [/example] The examples show that order alone is not enough for induction. A relation may organize objects coherently while still allowing an endless descent, and then a supposed counterexample need not have a first place where the argument fails. To support minimal-counterexample reasoning, the relation must force every nonempty collection to contain an element with nothing smaller inside that same collection. [definition: Well-Founded Relation] Let $A$ be a set and let $R$ be a binary relation on $A$. The relation $R$ is well-founded on $A$ if every nonempty subset $B\subseteq A$ has an $R$-minimal element: an element $b\in B$ such that no $c\in B$ satisfies $cRb$. [/definition] Well-foundedness asks for minimal elements, not necessarily least elements. In a partial order several unrelated elements may be minimal at the same time. [example: Strict Inclusion On Finite Sets] Let $A$ be finite, and define $XRY$ on $\mathcal P(A)$ to mean $X \subsetneq Y$. We show that $R$ is well-founded on $\mathcal P(A)$. Let $\mathcal F$ be a nonempty family of subsets of $A$. Since $A$ is finite, every $X \in \mathcal F$ is finite and satisfies $0 \leq |X| \leq |A|$. Thus the set \begin{align*} \{|X| : X \in \mathcal F\} \end{align*} is a nonempty finite set of natural numbers, so it has a least element. Choose $M \in \mathcal F$ with $|M|$ least among all members of $\mathcal F$. Suppose, toward a contradiction, that $M$ is not $R$-minimal in $\mathcal F$. Then there is some $C \in \mathcal F$ such that $CRM$, meaning $C \subsetneq M$. Choose $m \in M \setminus C$. Then $C \cup \{m\} \subseteq M$, and $m \notin C$, so \begin{align*} |C \cup \{m\}| = |C|+1. \end{align*} Since $C \cup \{m\} \subseteq M$ and both sets are finite, \begin{align*} |C|+1 = |C \cup \{m\}| \leq |M|. \end{align*} Hence $|C|<|M|$, contradicting the choice of $M$ as a member of $\mathcal F$ of least cardinality. Therefore no member of $\mathcal F$ is a proper subset of $M$, so $M$ is $R$-minimal in $\mathcal F$. Thus strict inclusion is well-founded on $\mathcal P(A)$ when $A$ is finite. [/example] For linear orders, a minimal element of a subset is automatically its least element. Combining linear comparability with well-foundedness gives an order in which every nonempty subproblem has a first case to handle. This is the structure needed for transfinite induction, where proofs proceed by eliminating the least possible counterexample. [definition: Well-Order] Let $A$ be a set and let $\leq$ be a linear order on $A$. The relation $\leq$ is a well-order on $A$ if every nonempty subset $B\subseteq A$ has a least element: an element $b\in B$ such that $b\leq c$ for all $c\in B$. [/definition] A well-order is a linear order with no hidden infinite descent. The associated strict order is well-founded, and the least element of a nonempty subset is the unique strict-minimal element of that subset. [example: The Natural Numbers] Let $B$ be a nonempty set of natural numbers, and choose $n \in B$. Set \begin{align*} C=\{m \in B : m \leq n\}. \end{align*} Then $n \in C$, so $C$ is nonempty, and $C$ is finite because $C \subseteq \{m \in \mathbb N : m \leq n\}$. List the elements of $C$ as $c_1,\ldots,c_k$. Define $a_1=c_1$, and for $1 \leq i<k$ define \begin{align*} a_{i+1}= \begin{cases} a_i, & \text{if } a_i \leq c_{i+1},\\ c_{i+1}, & \text{if } c_{i+1}<a_i. \end{cases} \end{align*} By induction on $i$, $a_i$ belongs to $\{c_1,\ldots,c_i\}$ and satisfies $a_i \leq c_j$ for every $1 \leq j \leq i$: the step follows because $a_{i+1}$ is chosen to be the smaller of $a_i$ and $c_{i+1}$. Thus $a_k$ is a least element of $C$. We now show that $a_k$ is least in all of $B$. Let $b \in B$. If $b<a_k$, then since $a_k \in C$ we have $a_k \leq n$, so \begin{align*} b<a_k \leq n, \end{align*} and hence $b \leq n$. Therefore $b \in C$, contradicting that $a_k$ is least in $C$. Thus no $b \in B$ satisfies $b<a_k$, so $a_k \leq b$ for every $b \in B$. Hence every nonempty set of natural numbers has a least element, which is exactly the well-ordering of the usual order on $\mathbb N$. [/example] Finite products illustrate how new well-orders are assembled from old ones. When a construction depends on two parameters, we need a single order that decides which pair comes first without losing the priority of the first coordinate. Lexicographic order does this by resolving the outer coordinate before looking inside the fibre over that coordinate. [definition: Lexicographic Order] Let $A$ and $B$ be linearly ordered sets. The lexicographic order on $A\times B$ compares first coordinates first, and compares second coordinates only when the first coordinates are equal. [/definition] This is the order used in dictionaries and in many recursive constructions with several parameters. [example: Lexicographic Order On A Finite Product] Let $A=\{1,2\}$ with $1<2$, and let $B=\{1,2,3\}$ with $1<2<3$. In the lexicographic order on $A\times B$, we compare first coordinates first; if the first coordinates agree, we compare second coordinates. Thus the pairs with first coordinate $1$ come before every pair with first coordinate $2$, and within each fixed first coordinate the second coordinates appear in the order $1<2<3$: \begin{align*} (1,1) < (1,2) < (1,3) < (2,1) < (2,2) < (2,3). \end{align*} We show that this finite lexicographic order is a well-order. Let $S$ be a nonempty subset of $A\times B$. The set of first coordinates appearing in $S$ is a nonempty subset of $\{1,2\}$, so it is either $\{1\}$, $\{2\}$, or $\{1,2\}$; in these three cases its least element is respectively $1$, $2$, and $1$. Call this least first coordinate $a$. Now consider \begin{align*} T=\{b\in B:(a,b)\in S\}. \end{align*} Since $a$ appears as a first coordinate in $S$, the set $T$ is nonempty. As a nonempty subset of $\{1,2,3\}$, it has a least element: if $1\in T$ the least element is $1$; if $1\notin T$ but $2\in T$ the least element is $2$; and if neither $1$ nor $2$ lies in $T$, then nonemptiness forces $T=\{3\}$, whose least element is $3$. Call this least second coordinate $b$. The pair $(a,b)$ lies in $S$. For any $(a',b')\in S$, the choice of $a$ gives $a\leq a'$. If $a<a'$, then $(a,b)<(a',b')$ by the first-coordinate clause of lexicographic order. If $a=a'$, then $b'\in T$, so the choice of $b$ gives $b\leq b'$, and hence $(a,b)\leq(a',b')$ by the second-coordinate clause. Therefore $(a,b)$ is the least element of $S$, so every nonempty subset of $A\times B$ has a first pair. [/example] The same two-stage choice suggests the general finite-product case, but there is a point to check beyond the example. A nonempty subset of a product can spread across many fibres, so the first coordinate must be chosen from the set of fibres that actually occur, and only then can one choose the least second coordinate inside that fibre. The theorem below packages this fibre-by-fibre argument for arbitrary well-ordered factors. [quotetheorem:4805] [citeproof:4805] This argument is the model for later constructions: first reduce to the earliest possible outer stage, then solve the problem inside that stage. The hypotheses on both factors are essential. If $A$ is not well-ordered, then a nonempty subset of $A$ with no least element gives a nonempty subset of $A \times B$ with no least element by fixing any element of $B$; similarly, if $B$ is not well-ordered, the fibre over the least chosen first coordinate can already fail. The result also depends on the chosen convention for lexicographic order: comparing first coordinates first gives an order type that can differ from the order obtained by comparing second coordinates first. ## Induction And Minimal Counterexamples Induction is a method for proving that counterexamples do not exist. In set-theoretic form, it asks why a property must hold everywhere if each element is forced by all of its predecessors. [quotetheorem:4797] [citeproof:4797] This is exactly the minimal-counterexample method. A first counterexample would have no earlier counterexample below it, so the local induction step would remove it. Well-foundedness is not a cosmetic hypothesis: on the integers with the usual strict order, the property $P(n)$ meaning $n < 0$ satisfies the backward-looking implication from smaller integers, but it does not hold for all integers, because the set of counterexamples has no least element. The theorem is a principle for proving already meaningful predicates; it does not by itself construct the objects whose existence a later recursive definition may require. [quotetheorem:1463] [citeproof:1463/set-theory-i-see-axiomatic-set-theory-entry-above] The word transfinite matters because an element need not have an immediate predecessor. At a limit-like stage, the induction step uses all earlier stages at once. The well-ordering hypothesis is what prevents a counterexample set from drifting downward forever without a first member: for example, the usual order on $\mathbb Z$ is linear but not well-ordered, and the predicate $P(n)$ meaning $n < 0$ again satisfies the predecessor implication while failing at $0$. Transfinite induction is therefore a proof principle for well-ordered time, not a licence to induct over an arbitrary linear order. [example: Least Natural Number Principle From Induction] Assume ordinary induction on the natural numbers, and let $B$ be a nonempty set of natural numbers. We prove that $B$ has a least element. Suppose instead that $B$ has no least element, and define $P(n)$ to mean: \begin{align*} \text{there is no } b\in B \text{ such that } b\leq n. \end{align*} First consider $n=1$. If $1\in B$, then $1$ would be a least element of $B$, because every positive natural number $b$ satisfies $1\leq b$. This contradicts the assumption that $B$ has no least element. Hence $1\notin B$, so no element of $B$ is at most $1$, and therefore $P(1)$ holds. Now assume $P(n)$ holds. We show $P(n+1)$. Suppose, toward a contradiction, that $P(n+1)$ fails. Then there is some $b\in B$ such that \begin{align*} b\leq n+1. \end{align*} Since $P(n)$ holds, no element of $B$ is at most $n$, so this same $b$ cannot satisfy $b\leq n$. For natural numbers with $b\leq n+1$, the alternatives are $b\leq n$ or $b=n+1$; the first alternative is impossible, so \begin{align*} b=n+1. \end{align*} Thus $n+1\in B$. If $c\in B$, then $c\leq n$ is impossible by $P(n)$, so $n<c$ or $c=n+1$. In either case, \begin{align*} n+1\leq c. \end{align*} Therefore $n+1$ is a least element of $B$, contradicting the assumption that $B$ has no least element. Hence $P(n+1)$ holds. By ordinary induction, $P(n)$ holds for every natural number $n$. Since $B$ is nonempty, choose $b\in B$. Applying $P(b)$ says that no element of $B$ is at most $b$, but $b\in B$ and \begin{align*} b\leq b. \end{align*} This contradiction shows that the assumption was false, so every nonempty set of natural numbers has a least element. [/example] This example also explains why induction and well-ordering are two forms of the same idea. Induction rules out a first failure, while well-ordering supplies a first failure when any failure exists. [remark: Base Cases And Limit Stages] In transfinite induction, base cases are included in the general induction step. If an element has no predecessors, the hypothesis that P holds at all earlier elements is vacuous. At a limit-like element, there is no single previous stage, so the hypothesis genuinely uses every earlier stage. [/remark] Minimal-counterexample arguments often hide the predicate P, but their formal content is the same. [example: Divisibility Minimal Counterexample] Let $S$ be a nonempty set of positive natural numbers with the following closure property: whenever $n\in S$, $d$ is a positive divisor of $n$, and $d<n$, then $d\in S$. By the *well-ordering principle for the natural numbers*, the nonempty set $S$ has a least member; call it $m$. We show that $m$ cannot be composite. Suppose, toward a contradiction, that $m$ is composite. Then there are positive natural numbers $a$ and $b$ such that \begin{align*} m=ab,\qquad 1<a,\qquad 1<b. \end{align*} Set $d=a$. Since $m=ab$, the number $d$ divides $m$. Also $b>1$, so $b\geq 2$, and hence \begin{align*} m=ab\geq a\cdot 2=2a>a=d. \end{align*} Thus $d$ is a positive divisor of $m$ with $d<m$. Since $m\in S$, the closure property gives $d\in S$. But $m$ is least in $S$, so every element of $S$ is at least $m$; in particular, \begin{align*} m\leq d. \end{align*} This contradicts $d<m$. Therefore $m$ is not composite. Since $m$ is a positive natural number, the remaining possibilities are $m=1$ or $m>1$. If $m>1$, then being non-composite means exactly that $m$ has no factorization $m=ab$ with $1<a<m$ and $1<b<m$, so $m$ is prime. Hence the least member of $S$ is either $1$ or prime. [/example] ## Recursion Along Well-Founded Relations Induction proves uniqueness once earlier stages determine later stages. Recursion is the corresponding existence principle: if a rule tells us how to define a value from all earlier values, can the whole function be built? For class-length recursion, we must also know that each element has only set many predecessors. Otherwise the value at a stage might require a proper-class-sized input, which is not a set function in ZFC. [definition: Set-Like Relation] Let A be a class and let R be a class relation on A. The relation R is set-like on A if, for every a in A, the class of all b in A such that bRa is a set. [/definition] Relations on sets are automatically set-like. The distinction becomes important for recursion over proper classes, especially over the ordinals. The obstruction is that a recursive rule at a stage is supposed to receive the already constructed predecessor values as a set-sized input. If a single element has a proper class of predecessors, that input is too large to be an ordinary function object in ZFC. A usable class-recursion theorem therefore needs both non-circularity from well-foundedness and set-sized predecessor histories from set-likeness. [quotetheorem:4806] [citeproof:4806] The theorem separates two requirements. Well-foundedness supplies non-circularity: without it, a rule such as $F(a)=F(a)+1$ along a self-loop would demand a value before that same value had been defined. Set-likeness supplies set-sized input data: without it, a single stage could require a function on a proper class of predecessors, which is not the sort of set function accepted by the rule $G$ in ZFC. The theorem does not say that arbitrary circular equations have solutions, and it does not cover recursion rules whose value at one stage depends on a proper-class-sized history. [example: Defining Rank By Recursion] For a set $x$, define the predecessor relation by \begin{align*} yRx \quad \text{if and only if} \quad y\in x. \end{align*} Foundation says that this membership relation is well-founded, and it is set-like because the $R$-predecessors of $x$ are exactly the elements of the set $x$. We may therefore apply *Well-Founded Recursion For Set-Like Relations* with the rule \begin{align*} G(x,h)=\bigcup\{h(y)+1:y\in x\}, \end{align*} where $h$ is the already-defined rank function on the members of $x$. The resulting function $\operatorname{rank}$ satisfies \begin{align*} \operatorname{rank}(x) = \bigcup\{\operatorname{rank}(y)+1:y\in x\}. \end{align*} This ordinal is the least ordinal strictly greater than the ranks of all members of $x$. Indeed, if $y\in x$, then $\operatorname{rank}(y)\in \operatorname{rank}(y)+1$, and since $\operatorname{rank}(y)+1$ is one of the ordinals being unioned, we have \begin{align*} \operatorname{rank}(y)\in \bigcup\{\operatorname{rank}(z)+1:z\in x\} = \operatorname{rank}(x). \end{align*} Thus $\operatorname{rank}(y)<\operatorname{rank}(x)$ for every $y\in x$. Conversely, if $\beta$ is an ordinal such that $\operatorname{rank}(y)<\beta$ for every $y\in x$, then for each $y\in x$, \begin{align*} \operatorname{rank}(y)+1\subseteq \beta, \end{align*} because $\beta$ is transitive and contains $\operatorname{rank}(y)$. Taking the union over all $y\in x$ gives \begin{align*} \operatorname{rank}(x) =\bigcup\{\operatorname{rank}(y)+1:y\in x\} \subseteq \beta. \end{align*} So $\operatorname{rank}(x)\leq \beta$. Hence the recursion defines exactly the intended rank: the first ordinal lying above the ranks of all elements of $x$. [/example] ## Order Isomorphism And Order Type A well-order has both a carrier set and an abstract pattern of earlier and later positions. The next question is how to recognise when two ordered sets have the same pattern. [definition: Order Isomorphism] Let A and B be linearly ordered sets. An order isomorphism from A to B is a bijection from A to B that preserves and reflects the order. [/definition] Order isomorphism forgets the names of elements and remembers only relative position. [example: Renaming A Finite Order] Let $A=\{a,b,c\}$ with $a<b<c$, and let $B=\{10,20,30\}$ with $10<20<30$. Define $f:A\to B$ by \begin{align*} f(a)=10,\qquad f(b)=20,\qquad f(c)=30. \end{align*} The values $10,20,30$ are distinct, so $f$ is injective. Every element of $B$ is one of these three values, so $f$ is surjective. Hence $f$ is a bijection. We check that $f$ preserves and reflects the order. The strict comparisons in $A$ are \begin{align*} a<b,\qquad b<c,\qquad a<c. \end{align*} Their images satisfy \begin{align*} f(a)=10<20=f(b),\\ f(b)=20<30=f(c),\\ f(a)=10<30=f(c). \end{align*} Thus whenever $x<y$ in $A$, we have $f(x)<f(y)$ in $B$. Conversely, the strict comparisons in $B$ are \begin{align*} 10<20,\qquad 20<30,\qquad 10<30, \end{align*} and these are exactly \begin{align*} f(a)<f(b),\qquad f(b)<f(c),\qquad f(a)<f(c). \end{align*} So whenever $f(x)<f(y)$ in $B$, the corresponding elements satisfy $x<y$ in $A$. Therefore $f$ is an order isomorphism from $A$ to $B$. The labels $a,b,c$ and $10,20,30$ differ, but the ordered pattern is the same: first element, then second element, then third element. [/example] An isomorphism of a well-order to itself cannot freely slide the order along, because any nontrivial movement would have a first moved point. At that point, all earlier elements have already been fixed, leaving no room to preserve the initial segment below it while changing the point itself. This is the rigidity phenomenon that distinguishes well-orders from many familiar infinite linear orders. [quotetheorem:4807] [citeproof:4807] This rigidity is a special feature of well-orders, not of linear orders in general. On $\mathbb Z$ with its usual order, the map $n \mapsto n+1$ is a nontrivial order automorphism, and it exists because there is no first point at which the map starts moving elements. Dense countable orders give even more flexibility. Rigidity also has a narrow scope: it rules out automorphisms of a fixed well-order, but it does not say that two different presentations of a well-order are literally equal, only that an isomorphism between them, when it exists, is forced by the order. The next issue is how two different well-orders can be compared. A direct comparison may not give an isomorphism between the whole orders, but well-foundedness should still force one order to match an initial part of the other, or else force the reverse comparison. This comparability is what lets well-orders behave like lengths rather than arbitrary arrangements. [quotetheorem:4808] [citeproof:4808] The theorem says that well-orders are comparable by length. This relies on well-ordering in two ways: the recursive comparison must always have a next unused element when one exists, and any disagreement between two attempted comparisons must have a first point. Arbitrary linear orders are not comparable in this sense; for instance, $\mathbb Z$ with its usual order is not order-isomorphic to $\mathbb Q$ with its usual order or to an initial segment of it, and $\mathbb Q$ is not order-isomorphic to an initial segment of $\mathbb Z$. The theorem compares order types, not cardinalities or underlying sets, and its forward role is to make ordinal comparison possible once each well-order is replaced by a canonical ordinal representative. [quotetheorem:4809] [citeproof:4809] Uniqueness justifies speaking of the order isomorphism between two well-orders when one exists. The proof depends directly on rigidity: if a well-order admitted a nontrivial automorphism, then composing any isomorphism with that automorphism would produce a second isomorphism with the same target. Outside well-orders this failure occurs, for example on $\mathbb Z$, where the shifts $n \mapsto n+k$ are distinct order automorphisms. The result still does not choose a canonical representative for the order type; it only says that once two well-ordered presentations are fixed, the order-preserving bijection between them has no freedom. Since the labels of the underlying set are irrelevant to this comparison, we need a term for the ordered pattern itself. This term should identify well-orders that differ only by a relabelling preserving the order, while keeping apart well-orders whose positions are arranged differently. [definition: Order Type] The order type of a well-order is its equivalence class under order isomorphism. [/definition] Later, ordinals will serve as canonical representatives of these order types. For now, the point is that two well-orders have the same order type exactly when they are order-isomorphic. [example: Two Presentations Of The Same Order Type] Let \begin{align*} A=\{1,2,3\}\quad\text{with}\quad 1<2<3, \end{align*} and let \begin{align*} B=\{a,b,c\}\quad\text{with}\quad a<b<c. \end{align*} Define $f:A\to B$ by \begin{align*} f(1)=a,\qquad f(2)=b,\qquad f(3)=c. \end{align*} The values $a,b,c$ are distinct, so $f$ is injective. Every element of $B$ is one of $a,b,c$, and each is hit by $f$, so $f$ is surjective. Hence $f$ is a bijection. We check that $f$ preserves and reflects the order. The strict comparisons in $A$ are \begin{align*} 1<2,\qquad 2<3,\qquad 1<3. \end{align*} Their images satisfy \begin{align*} f(1)=a<b=f(2),\\ f(2)=b<c=f(3),\\ f(1)=a<c=f(3). \end{align*} Thus $x<y$ in $A$ implies $f(x)<f(y)$ in $B$. Conversely, the strict comparisons in $B$ are \begin{align*} a<b,\qquad b<c,\qquad a<c, \end{align*} and these are exactly \begin{align*} f(1)<f(2),\qquad f(2)<f(3),\qquad f(1)<f(3). \end{align*} Thus $f(x)<f(y)$ in $B$ implies $x<y$ in $A$. Therefore $f$ is an order isomorphism, so the two ordered sets have the same order type. The isomorphism is forced. If $g:A\to B$ is any order isomorphism, then $1<2$ and $1<3$, so \begin{align*} g(1)<g(2),\qquad g(1)<g(3). \end{align*} Thus $g(1)$ is below the two other values in $B$, so $g(1)=a$. Similarly, $3$ is above both $1$ and $2$, so \begin{align*} g(1)<g(3),\qquad g(2)<g(3), \end{align*} and hence $g(3)=c$. Since $g$ is a bijection and $a$ and $c$ are already used, the remaining value gives \begin{align*} g(2)=b. \end{align*} So $g=f$, and the unique order isomorphism sends $1$ to $a$, $2$ to $b$, and $3$ to $c$. [/example] This chapter has built the bridge from finite induction to transfinite methods. Well-foundedness rules out infinite descent, well-orders provide first counterexamples, recursion turns predecessor-dependent rules into functions, and order isomorphism isolates the abstract shape of a well-ordered sequence. The next chapter turns these order types into ordinals, making transfinite induction and recursion part of the internal language of ZFC. Well-orders now give us a way to compare ordered structures up to isomorphism, but the abstract order types still need concrete representatives. The next chapter answers that need by defining the von Neumann ordinals, where membership itself encodes the order relation and transfinite reasoning becomes internal to ZFC. # 3. Ordinals This chapter turns well-orders into canonical objects. In the previous chapter, well-orders were compared up to order isomorphism; here we choose a particular representative for each well-order type. The representatives are the von Neumann ordinals, whose order relation is membership itself. The central gain is that order types become sets that can be compared by $\alpha \in \beta$, extended by taking successors, and collected by unions. This makes transfinite induction, recursion, and later cardinal arithmetic possible without continually passing through arbitrary well-ordered sets. ## The Membership Model of Well-Order Type How should a set represent all earlier stages of a well-order without adding extra coding data? If we merely choose an arbitrary set with the right cardinality, membership may point outside the intended collection: for instance, $\{\{\varnothing\}\}$ contains $\{\varnothing\}$ but omits its predecessor $\varnothing$. The von Neumann answer is to make each ordinal equal to the set of all its predecessors, so that the membership relation carries the entire order structure. [definition: Transitive Set] A set $x$ is transitive if for every $y \in x$ and every $z \in y$, we have $z \in x$. [/definition] Transitivity says that elements of elements have already been admitted into the set. This is the structural feature that lets membership behave like a predecessor relation rather than an arbitrary containment relation. [example: First Transitive Sets] The empty set $\varnothing$ is transitive because the condition \begin{align*} y\in\varnothing \text{ and } z\in y \quad \Longrightarrow \quad z\in\varnothing \end{align*} has no possible instance: there is no set $y$ with $y\in\varnothing$. The set $\{\varnothing\}$ is transitive. If $y\in\{\varnothing\}$, then $y=\varnothing$. There is no $z$ with $z\in\varnothing$, so every implication of the form \begin{align*} y\in\{\varnothing\} \text{ and } z\in y \quad \Longrightarrow \quad z\in\{\varnothing\} \end{align*} is satisfied. The set $\{\{\varnothing\}\}$ is not transitive. Its only element is $\{\varnothing\}$, and $\varnothing\in\{\varnothing\}$. Thus \begin{align*} \{\varnothing\}\in\{\{\varnothing\}\} \quad\text{and}\quad \varnothing\in\{\varnothing\}. \end{align*} Transitivity would require $\varnothing\in\{\{\varnothing\}\}$, but the only element of $\{\{\varnothing\}\}$ is $\{\varnothing\}$, and $\varnothing\ne\{\varnothing\}$. Therefore $\{\{\varnothing\}\}$ fails the transitivity condition. [/example] The examples show why transitivity is not a size condition. It is a closure condition under moving downward through membership. Transitivity alone is still not enough: a set can be closed under predecessors without membership comparing every pair in a single well-ordered line. To turn such sets into order types, we also require that membership give a well-order. [definition: Von Neumann Ordinal] A set $\alpha$ is an ordinal if $\alpha$ is transitive and the relation $\in$ well-orders $\alpha$. [/definition] For an ordinal $\alpha$, the ordering on $\alpha$ is written $\beta < \gamma$ iff $\beta \in \gamma$. Thus the elements of $\alpha$ are its earlier stages. This convention is the source of the compact formulas $0=\varnothing$, $1=\{0\}$, and $2=\{0,1\}$. [example: The Finite Ordinals] Set $0=\varnothing$ and define $n+1=n\cup\{n\}$. The first stages are obtained by substituting this definition: \begin{align*} 1&=0\cup\{0\} =\varnothing\cup\{0\} =\{0\},\\ 2&=1\cup\{1\} =\{0\}\cup\{1\} =\{0,1\},\\ 3&=2\cup\{2\} =\{0,1\}\cup\{2\} =\{0,1,2\}. \end{align*} Thus each successor adds the previous whole stage as one new element. The transitivity can be checked at the same level. For $2=\{0,1\}$, the only possible elements are $0=\varnothing$ and $1=\{0\}$. The element $0$ has no elements, while the only element of $1$ is $0$, and $0\in 2$. Hence every element of an element of $2$ is again in $2$. For $3=\{0,1,2\}$, the elements of $0$ are none, the only element of $1$ is $0$, and the elements of $2$ are $0$ and $1$; all of these lie in $3$. The same induction shows that if $n=\{0,1,\dots,n-1\}$ is transitive, then \begin{align*} n+1=n\cup\{n\}=\{0,1,\dots,n-1,n\} \end{align*} is transitive: elements of old elements already lie in $n$, and elements of the new element $n$ also lie in $n$. Membership gives the usual finite order because \begin{align*} 0\in 1,\qquad 0\in 2,\qquad 1\in 2,\qquad 0\in 3,\qquad 1\in 3,\qquad 2\in 3, \end{align*} and, in general, $i\in j$ exactly when $i<j$ for finite ordinals $i,j<n$. So the set $n=\{0,1,\dots,n-1\}$ represents the usual ordered list of its earlier stages. [/example] The definition is recursive in spirit, but it is still a plain first-order property of a set. A possible defect remains: a set might satisfy the external description of being a stage while having internal elements that are not themselves legitimate stages. If that happened, induction over an ordinal would break as soon as it looked inside the ordinal. The closure of ordinals under membership rules out exactly this obstruction. [quotetheorem:4810] [citeproof:4810] This theorem explains why ordinals behave like perfectly nested time stages. Both hypotheses in the definition are doing work: transitivity prevents missing predecessors, while well-ordering prevents membership cycles or incomparable branches from being mistaken for an order type. The theorem does not say that every transitive set is an ordinal; it says that once a genuine ordinal has been built, all of its internal stages are genuine ordinals too. This closure property is what makes later proofs by induction over $\alpha$ local: when proving something at stage $\alpha$, every earlier object encountered inside $\alpha$ is again a legitimate ordinal stage. [remark: Ordinals as Initial Segments] If $\alpha$ is an ordinal and $\beta \in \alpha$, then $\beta$ is exactly the initial segment $\{\gamma \in \alpha : \gamma \in \beta\}$ below $\beta$. Thus an element of an ordinal is not merely labelled by an earlier position; it is the earlier part of the order. [/remark] ## Successor and Limit Stages Once ordinals are stages, the next question is how to pass from a stage to the next one, and how to recognise stages that have no immediate predecessor. Simply adjoining an arbitrary new marker would destroy the membership representation, because the new marker would not itself be the set of all previous stages. The successor construction avoids this by adding the old ordinal $\alpha$ as the new element. These two operations separate ordinal theory into successor behaviour and limit behaviour. [definition: Successor Ordinal] For an ordinal $\alpha$, its successor is \begin{align*} \alpha+1 := \alpha \cup \{\alpha\}. \end{align*} An ordinal $\beta$ is a successor ordinal if there exists an ordinal $\alpha$ such that $\beta=\alpha+1$. [/definition] The successor operation adds the previous whole stage as a new last element. Because the elements of $\alpha$ are already the predecessors of $\alpha$, adjoining $\alpha$ gives exactly one new top point. There is still a preservation question: adding a new element to a well-ordered transitive set could in principle disturb transitivity or the membership order. The successor construction is useful only if this operation produces another ordinal, so the next result checks that the formal definition of ordinal survives the passage from $\alpha$ to $\alpha+1$. [quotetheorem:4811] [citeproof:4811] Successor ordinals model the act of adding one more position, and the theorem verifies that this operation stays inside the class of ordinals. The hypothesis that $\alpha$ is already an ordinal cannot be dropped: for a non-transitive set such as $x=\{\{\varnothing\}\}$, the set $x\cup\{x\}$ still has membership defects inherited from $x$. The theorem also does not say that every nonzero ordinal is a successor; it only identifies one reliable way to build a next stage. That distinction is why the course next separates successors from limit ordinals, which are reached by collecting earlier stages rather than appending a final one. [definition: Limit Ordinal] An ordinal $\lambda$ is a limit ordinal if $\lambda \ne \varnothing$ and there is no ordinal $\alpha$ such that $\lambda=\alpha+1$. [/definition] The exclusion of $\varnothing$ is conventional in these notes: $0$ has no predecessor, but it is treated separately from genuine limit stages. A limit ordinal is a stage reached after all earlier stages have been collected, with no final earlier stage. [example: The Ordinal Omega] Let $\omega$ be the set whose elements are exactly the finite ordinals \begin{align*} 0,\ 1,\ 2,\ 3,\ \ldots . \end{align*} We first check transitivity. Suppose $x\in y$ and $y\in\omega$. Since $y\in\omega$, the ordinal $y$ is finite, so $y=\{0,1,\ldots,y-1\}$ for some finite stage. The relation $x\in y$ means that $x$ is one of the earlier finite ordinals: \begin{align*} x\in y=\{0,1,\ldots,y-1\}. \end{align*} Hence $x$ is finite, so $x\in\omega$. Therefore every element of an element of $\omega$ is again an element of $\omega$. Membership orders $\omega$ in the usual natural-number order. For finite ordinals $m$ and $n$, \begin{align*} m\in n \quad\Longleftrightarrow\quad m\in\{0,1,\ldots,n-1\} \quad\Longleftrightarrow\quad m<n. \end{align*} Thus every nonempty subset of $\omega$ has a membership-least element, namely its least finite ordinal in the usual order. Finally, $\omega$ is not a successor ordinal. If $\omega=\alpha+1=\alpha\cup\{\alpha\}$, then $\alpha\in\omega$, so $\alpha$ is finite. But the successor $\alpha+1$ is also finite, hence $\alpha+1\in\omega$. Since $\omega=\alpha+1$ has $\alpha$ as its new largest element, every element of $\omega$ would have to be either in $\alpha$ or equal to $\alpha$; the element $\alpha+1\in\omega$ is neither, because $\alpha\in\alpha+1$ and $\alpha+1\ne\alpha$. This contradiction shows that $\omega$ has no immediate predecessor: it is the stage obtained by collecting all finite stages at once. [/example] The successor-limit distinction is the first place where ordinal arithmetic diverges from finite intuition. The notation $\omega+1$ does not mean a larger cardinality; it means the order obtained by placing one new point after all natural-number positions. [example: Omega Plus One] By the successor definition, \begin{align*} \omega+1 &=\omega\cup\{\omega\}. \end{align*} Thus if $x\in\omega+1$, then either $x\in\omega$ or $x=\omega$. The elements of $\omega$ are exactly the finite ordinals, so the membership order on $\omega+1$ is \begin{align*} 0\in 1\in 2\in 3\in\cdots\in\omega. \end{align*} Moreover, $\omega$ is the largest element of $\omega+1$: for every $x\in\omega+1$, either $x\in\omega$ or $x=\omega$. As a set, $\omega+1$ is countably infinite. One explicit bijection $f:\omega\to\omega+1$ is \begin{align*} f(0)&=\omega,\\ f(n+1)&=n \qquad \text{for each finite ordinal } n. \end{align*} Every element of $\omega+1$ is hit: $\omega=f(0)$, and each finite ordinal $n\in\omega$ is $f(n+1)$. The map is injective because $f(0)=\omega$ is not finite, while $f(n+1)=n$ and $f(m+1)=m$ are equal exactly when $n=m$. However, $\omega+1$ is not order-isomorphic to $\omega$. The ordinal $\omega+1$ has largest element $\omega$, while $\omega$ has no largest element: if $n\in\omega$, then $n+1\in\omega$ and $n\in n+1$. Hence $\omega+1$ and $\omega$ have the same cardinality but different order type. [/example] ## Comparing Ordinals For arbitrary well-orders, comparison requires constructing order embeddings and analysing initial segments. Without canonical representatives, two isomorphic well-orders can be different sets, so equality of sets cannot express equality of order type. For ordinals, the canonical representation should make comparison internal: two ordinals should be equal, or one should literally be an element of the other. [quotetheorem:4812] [citeproof:4812] Trichotomy is stronger than saying ordinals form a linear order up to isomorphism. It says the isomorphism classes have been replaced by canonical sets, so comparison is set membership. The ordinal hypotheses are essential: arbitrary transitive sets need not be linearly ordered by membership, and arbitrary well-ordered sets need not be comparable by membership at all. The theorem does not turn the ordinals into a set-sized linear order; it describes a proper-class ordering, which is why Burali-Forti appears immediately below. In practice, trichotomy is the standard proof tool for ordinal comparisons: to prove $\alpha\subseteq\beta$, it is enough to rule out $\beta\in\alpha$, and to prove equality, it is enough to rule out both strict membership alternatives. [remark: The Class of Ordinals] The ordinals do not form a set. If there were a set $O$ of all ordinals, then the union argument below would produce an ordinal containing every ordinal, and in particular it would have to contain itself. This is the Burali-Forti obstruction in its simplest form. [/remark] The comparison theorem lets us write $\alpha<\beta$ for $\alpha\in\beta$ without ambiguity. It also lets us reason about subsets of ordinals as genuinely linearly ordered collections. For induction and minimal-counterexample arguments, linear comparability is not enough by itself. A nonempty set of candidate ordinals must actually have a first member; otherwise one could compare any two candidates but still fail to start the argument. The required well-ordering principle for sets of ordinals is the next structural fact. [quotetheorem:4813] [citeproof:4813] This theorem is the technical reason that sets of ordinals can be treated like ordered lists, even when no enumeration has been specified. The word "set" is essential: the class of all ordinals is not itself a set, so the theorem is not licensing a global minimum argument over every ordinal at once. What it does license is the familiar move used throughout transfinite arguments: from any nonempty set of candidate ordinals, choose the least counterexample. It prepares the main construction of the section: taking the least ordinal above all members of a given set. ## Suprema and Ordinal-Indexed Sequences Given a set of earlier stages, what is the stage reached after all of them have occurred? Taking an ordinary maximum is not enough, because a set such as $\{0,1,2,\dots\}$ has no largest finite stage. For ordinals the answer is not a new coding construction: it is just the union of the set. [definition: Supremum of Ordinals] Let $A$ be a set of ordinals. The supremum of $A$ is \begin{align*} \sup A := \bigcup A. \end{align*} [/definition] This definition is meaningful because each ordinal is the set of its predecessors. Taking the union gathers all stages that occur below some member of $A$. [quotetheorem:4814] [citeproof:4814] This result is the reason limit stages are usually written as suprema. The hypothesis that $A$ is a set of ordinals is again essential: unions of arbitrary sets need not be well-ordered by membership, and the union of a proper class is not a set supplied by the union axiom. The theorem does not say that $\bigcup A$ is a member of $A$; for $A=\omega$, the supremum is $\omega$, which is above every finite member but not itself finite. In computations, the method is to identify all predecessors appearing somewhere in the family and then collect exactly those predecessors. For instance, $\omega=\sup\{0,1,2,\dots\}$, while $\omega+\omega$ is the supremum of $\omega,\omega+1,\omega+2,\dots$. [example: Omega Plus Omega] For each finite ordinal $n$, define $\omega+0=\omega$ and \begin{align*} \omega+(n+1)=(\omega+n)+1=(\omega+n)\cup\{\omega+n\}. \end{align*} The first few stages are \begin{align*} \omega+1&=\omega\cup\{\omega\},\\ \omega+2&=(\omega+1)\cup\{\omega+1\} =\omega\cup\{\omega\}\cup\{\omega+1\},\\ \omega+3&=(\omega+2)\cup\{\omega+2\} =\omega\cup\{\omega\}\cup\{\omega+1\}\cup\{\omega+2\}. \end{align*} Thus, by finite induction, \begin{align*} \omega+n=\omega\cup\{\omega+k:k<n\}. \end{align*} Indeed, if this holds for $n$, then \begin{align*} \omega+(n+1) &=(\omega+n)\cup\{\omega+n\}\\ &=\bigl(\omega\cup\{\omega+k:k<n\}\bigr)\cup\{\omega+n\}\\ &=\omega\cup\{\omega+k:k<n+1\}. \end{align*} Now compute the supremum: \begin{align*} \omega+\omega &=\sup\{\omega+n:n<\omega\}\\ &=\bigcup_{n<\omega}(\omega+n)\\ &=\bigcup_{n<\omega}\bigl(\omega\cup\{\omega+k:k<n\}\bigr)\\ &=\omega\cup\{\omega+k:k<\omega\}. \end{align*} The last equality follows because every finite ordinal belongs to $\omega\subseteq\omega+n$ for every $n<\omega$, and $\omega+k$ belongs to $\omega+(k+1)$ since \begin{align*} \omega+(k+1)=(\omega+k)\cup\{\omega+k\}. \end{align*} Conversely, if $x\in\omega+n$, then either $x\in\omega$ or $x=\omega+k$ for some $k<n$, so no other elements occur in the union. The membership order therefore begins \begin{align*} 0\in 1\in 2\in 3\in\cdots\in\omega\in\omega+1\in\omega+2\in\cdots . \end{align*} This is a copy of $\omega$ followed by a second copy of $\omega$. As a set, $\omega+\omega$ is countably infinite. One explicit bijection $f:\omega\to\omega+\omega$ is \begin{align*} f(2n)&=n,\\ f(2n+1)&=\omega+n \end{align*} for finite ordinals $n$. Every finite ordinal $n$ is hit by $f(2n)$, and every element of the second block $\omega+n$ is hit by $f(2n+1)$. The two clauses cannot collide because $f(2n)\in\omega$, while $f(2m+1)=\omega+m\notin\omega$. However, $\omega+\omega$ is not order-isomorphic to $\omega$. The element $\omega\in\omega+\omega$ is nonzero and is not a successor ordinal, since it is the supremum of all finite ordinals and has no largest finite predecessor. Every nonzero element of $\omega$ is a finite successor ordinal: if $n\in\omega$ and $n\ne0$, then $n=m+1$ for the preceding finite ordinal $m$. Being a nonzero limit element is preserved by order isomorphism, so $\omega+\omega$ and $\omega$ have the same cardinality but different order type. [/example] Suprema also provide the language for transfinite sequences. Such a sequence is not indexed by natural numbers but by all stages below a chosen ordinal. [definition: Ordinal-Indexed Sequence] Let $X$ be a set and let $\lambda$ be an ordinal. A $\lambda$-indexed sequence in $X$ is a function \begin{align*} a:\lambda\to X. \end{align*} Its value at $\alpha\in\lambda$ is written $a_\alpha$. [/definition] The point of using an ordinal as the domain is that the indices themselves are well-ordered. This permits recursive definitions with successor clauses and limit clauses. [example: A Sequence Approaching Omega Squared] Define $\omega\cdot n$ recursively by \begin{align*} \omega\cdot 0&=0,\\ \omega\cdot(n+1)&=\omega\cdot n+\omega \end{align*} for finite $n$. The first stages are \begin{align*} \omega\cdot 1&=\omega\cdot 0+\omega=0+\omega=\omega,\\ \omega\cdot 2&=\omega\cdot 1+\omega=\omega+\omega,\\ \omega\cdot 3&=\omega\cdot 2+\omega=(\omega+\omega)+\omega. \end{align*} Thus each successor step appends one new copy of $\omega$ after all earlier blocks. For each finite $n$, the block from $\omega\cdot n$ up to but not including $\omega\cdot(n+1)$ is \begin{align*} \{\omega\cdot n+m:m<\omega\}. \end{align*} Indeed, \begin{align*} \omega\cdot(n+1) &=\omega\cdot n+\omega\\ &=\sup\{\omega\cdot n+m:m<\omega\}, \end{align*} so the elements added after $\omega\cdot n$ are exactly the finite successors \begin{align*} \omega\cdot n,\quad \omega\cdot n+1,\quad \omega\cdot n+2,\quad \ldots . \end{align*} Taking the supremum over all finite $n$ gives \begin{align*} \omega^2 &=\sup\{\omega\cdot n:n<\omega\}\\ &=\bigcup_{n<\omega}(\omega\cdot n)\\ &=\{\omega\cdot n+m:n<\omega,\ m<\omega\}. \end{align*} The inclusion from left to right holds because every element below $\omega\cdot n$ lies in one of the earlier finite blocks, hence has the form $\omega\cdot k+m$ with $k<n$ and $m<\omega$. The inclusion from right to left holds because \begin{align*} \omega\cdot n+m\in \omega\cdot n+\omega=\omega\cdot(n+1)\subseteq \bigcup_{r<\omega}(\omega\cdot r). \end{align*} The membership order therefore has the form \begin{align*} 0\in 1\in 2\in\cdots \in \omega \in \omega+1 \in \omega+2 \in\cdots \in \omega\cdot 2 \in \omega\cdot 2+1 \in\cdots . \end{align*} So $\omega^2$ is a countable sequence of consecutive $\omega$-blocks: the pair $(n,m)$ records “block $n$, position $m$ inside that block.” Since the set of finite pairs $(n,m)$ is countable and every element of $\omega^2$ has the displayed form $\omega\cdot n+m$, the ordinal $\omega^2$ is countable as a set, while its order structure records a two-level process: first move through each $\omega$-block, then move through the countable list of blocks. [/example] The examples $\omega$, $\omega+1$, $\omega+\omega$, and $\omega^2$ show that ordinals measure order type rather than size alone. The next chapter builds on this by proving recursion theorems over arbitrary ordinals, allowing constructions that specify an initial value, a successor step, and a limit-stage rule. Ordinals measure the shape of well-ordered processes, not just their length, and the basic examples already show how quickly finite intuition breaks down. The next chapter turns these order types into a tool for construction by proving recursion theorems over arbitrary ordinals and then using them to define ordinal arithmetic. # 4. Transfinite Recursion and Ordinal Arithmetic This chapter turns induction on ordinals into a method for constructing objects indexed by all ordinals. The main question is how to define a function when its value at a successor stage depends on the previous value, while its value at a limit stage depends on the entire earlier history. Once this mechanism is available, ordinal addition, multiplication, and exponentiation become canonical examples of recursive definitions, and their behaviour reveals the main difference between ordinal arithmetic and ordinary arithmetic on natural numbers. ## Recursion by Successor and Limit Stages Suppose we want to build a sequence $(a_\alpha)_{\alpha \in \operatorname{Ord}}$ where $a_{\alpha+1}$ is obtained from $a_\alpha$, and where $a_\lambda$ for limit $\lambda$ is obtained from the initial segment $(a_\beta)_{\beta<\lambda}$. The difficulty is that there is no last stage before a limit ordinal, so a recursive definition must specify how the whole earlier function is used. [definition: Initial Segment Function] Let $F$ be a function with domain an ordinal $\alpha$. For $\beta \leq \alpha$, the initial segment of $F$ up to $\beta$ is the restriction $F|_\beta$, a function with domain $\beta$. [/definition] The point of using restrictions is that the instruction at stage $\beta$ should not see any future values. This is the set-theoretic analogue of defining an ordinary sequence $a_{n+1}$ from $a_n$, except that at limit stages the earlier data is a whole function rather than a single predecessor value. To state such definitions uniformly, we need a rule whose input is exactly the history already constructed and nothing beyond it. Separate clauses for zero, successor, and limit stages are convenient in examples, but the existence theorem needs one precise object to check. For a fixed well-ordered index set $X$ and a fixed set of possible values $Y$, that object is a function that takes the already-built graph on an initial segment and returns the next value in $Y$. [definition: Transfinite Recursive Rule] Let $(X,<)$ be a well-ordered set and let $Y$ be a set. A transfinite recursive rule with values in $Y$ is a function \begin{align*} G:\mathcal P(X\times Y)\to Y. \end{align*} For $x\in X$, write \begin{align*} I_x=\{y\in X:y<x\}. \end{align*} A function $f:X\to Y$ follows $G$ when \begin{align*} f(x)=G(f|_{I_x}) \end{align*} for every $x\in X$. [/definition] This single formulation includes initial, successor, and limit stages. The input $f|_{I_x}$ is the whole earlier graph, so the same rule can inspect the immediate predecessor at successor stages and the entire previous segment at limit stages. The remaining issue is existence and uniqueness. A recursive rule describes what a value should be if the earlier values have already been constructed, but it does not by itself guarantee that a coherent function exists across the whole well-order. It also leaves open whether two different functions could both satisfy the same local instructions. The theorem below is the set-indexed form of transfinite recursion: it applies to any well-ordered set $X$, not only to an ordinal written as the domain, and then ordinal recursion is recovered by taking $X=\theta$ with the membership order. [quotetheorem:1464] [citeproof:1464] The theorem is the formal licence for recursive definitions along any well-order. In practice, we often present a rule separately at the first element, at successor-like positions, and at limit-like positions, but the theorem packages those clauses into the single instruction $f(x)=G(f|_{I_x})$. [example: Initial Segments of $\omega+1$] Let $X=\omega+1$ with its usual well-order, and let $Y=\mathcal P(\omega)$. For any set $s\subseteq X\times Y$, define \begin{align*} G(s)=\operatorname{dom}(s)\cap\omega. \end{align*} This value lies in $Y$ because it is a subset of $\omega$. On the initial segment graphs used by the recursion, the domain is already a subset of $\omega$, so $G(s)$ is just that earlier index set. By transfinite recursion, there is a unique function \begin{align*} f:\omega+1\to\mathcal P(\omega) \end{align*} such that \begin{align*} f(\alpha)=G(f|_\alpha)=\operatorname{dom}(f|_\alpha)=\alpha \end{align*} for every $\alpha\leq\omega$. Thus \begin{align*} f(0)&=\varnothing,\\ f(1)&=\{0\},\\ f(2)&=\{0,1\},\\ &\ \vdots\\ f(\omega)&=\omega. \end{align*} The limit value is not chosen from a last predecessor; it is determined from the entire earlier graph, whose domain is all of $\omega$. [/example] This example illustrates why limit clauses are not optional. The value at $\omega$ is determined by the whole initial segment of finite stages, not by a final finite predecessor. ## Ordinal Addition How should we concatenate two well-orders when their lengths may be infinite? The first arithmetic operation answers this by placing a copy of one ordinal after a copy of another, preserving the internal order of each block. [definition: Ordinal Addition] For ordinals $\alpha$ and $\beta$, the ordinal sum $\alpha+\beta$ is defined recursively in $\beta$ by \begin{align*} \alpha+0&=\alpha,\\ \alpha+(\beta+1)&=(\alpha+\beta)+1,\\ \alpha+\lambda&=\sup_{\beta<\lambda}(\alpha+\beta) \quad \text{for limit } \lambda. \end{align*} [/definition] The recursion is on the right argument. This asymmetry is responsible for the first striking phenomenon: ordinal addition is associative, but it is not commutative. [example: One Plus Omega and Omega Plus One] Using the limit clause in the recursive definition of ordinal addition, \begin{align*} 1+\omega &=\sup_{n<\omega}(1+n). \end{align*} For finite $n$, the successor clause gives $1+0=1$ and \begin{align*} 1+(n+1)=(1+n)+1, \end{align*} so by ordinary induction $1+n=n+1$. Hence \begin{align*} 1+\omega &=\sup_{n<\omega}(n+1). \end{align*} The set $\{n+1:n<\omega\}$ is cofinal in $\omega$: if $m<\omega$, then $m< m+1$ and $m+1\in\{n+1:n<\omega\}$. Therefore \begin{align*} \sup_{n<\omega}(n+1)=\omega, \end{align*} and so $1+\omega=\omega$. By contrast, $\omega+1$ is the successor ordinal $\omega\cup\{\omega\}$. Its new element $\omega$ is greater than every $n<\omega$, so $\omega+1$ has a greatest element. The ordinal $\omega$ has no greatest element, since for every $n<\omega$ we have $n+1<\omega$ and $n<n+1$. Because having a greatest element is preserved by order isomorphism, $\omega+1$ is not order-isomorphic to $\omega$. [/example] This example shows that the right-hand input controls the recursion. Adding a finite block before $\omega$ can be swallowed by the limit, while adding a point after $\omega$ creates a new endpoint. Even though addition is not commutative, it must still be stable enough to support unparenthesized iterated sums. The obstruction to check is whether performing two additions in stages changes the order type compared with adding the combined right-hand block at once. Associativity is the property that prevents ordinal addition from depending on arbitrary parenthesization choices. [quotetheorem:1473] [citeproof:1473] Associativity allows expressions such as $\alpha+\beta+\gamma$ without parentheses. It does not restore commutativity, so the order of terms remains part of the data. [remark: Failure of Commutativity for Addition] The computation $1+\omega=\omega$ while $\omega+1>\omega$ gives \begin{align*} 1+\omega\neq \omega+1. \end{align*} The obstruction is order-theoretic: $\omega+1$ has a greatest element, whereas $1+\omega$ does not. [/remark] The next operation repeats addition. Since repetition again occurs along an ordinal, the same successor-and-limit pattern appears. ## Ordinal Multiplication If addition concatenates two blocks, multiplication repeats one block many times. The notation $\alpha\cdot\beta$ means $\beta$ many consecutive copies of $\alpha$, ordered lexicographically by the copy index. [definition: Ordinal Multiplication] For ordinals $\alpha$ and $\beta$, the ordinal product $\alpha\cdot\beta$ is defined recursively in $\beta$ by \begin{align*} \alpha\cdot 0&=0,\\ \alpha\cdot(\beta+1)&=(\alpha\cdot\beta)+\alpha,\\ \alpha\cdot\lambda&=\sup_{\beta<\lambda}(\alpha\cdot\beta) \quad \text{for limit } \lambda. \end{align*} [/definition] Again, the recursion is on the right argument. The ordinal $\alpha\cdot\beta$ should be read as $\beta$ blocks, each of type $\alpha$, not as a symmetric product. [example: Two Times Omega and Omega Times Two] For each finite $n$, we compute $2\cdot n$ by induction on $n$. The zero case is \begin{align*} 2\cdot 0=0. \end{align*} If $2\cdot n$ is the finite ordinal with $2n$ elements, then the successor clause for ordinal multiplication gives \begin{align*} 2\cdot(n+1) &=(2\cdot n)+2\\ &=2n+2. \end{align*} Thus $2\cdot n=2n$ for every $n<\omega$. Since $\omega$ is a limit ordinal, the limit clause gives \begin{align*} 2\cdot\omega &=\sup_{n<\omega}(2\cdot n)\\ &=\sup_{n<\omega}2n. \end{align*} The set $\{2n:n<\omega\}$ is cofinal in $\omega$: given $m<\omega$, choose $n=m+1$, and then \begin{align*} m<m+1\leq 2(m+1)=2n. \end{align*} Therefore \begin{align*} \sup_{n<\omega}2n=\omega, \end{align*} so $2\cdot\omega=\omega$. By contrast, $2=1+1$, so applying the successor clause twice gives \begin{align*} \omega\cdot 1 &=\omega\cdot(0+1)\\ &=(\omega\cdot 0)+\omega\\ &=0+\omega\\ &=\omega, \end{align*} and then \begin{align*} \omega\cdot 2 &=\omega\cdot(1+1)\\ &=(\omega\cdot 1)+\omega\\ &=\omega+\omega. \end{align*} Thus $2\cdot\omega$ is the limit of longer and longer finite blocks and has order type $\omega$, while $\omega\cdot2$ is two consecutive blocks, each of order type $\omega$. [/example] This is the multiplicative analogue of $1+\omega$ versus $\omega+1$. Finite blocks repeated through a limit can collapse to $\omega$, while infinite blocks repeated finitely remain visible. Since multiplication is defined by iterating addition, it inherits the danger that different groupings of repeated blocks might produce different results. Before multiplication can be used reliably in longer ordinal expressions, we need to know that multiplying in two stages agrees with multiplying by the corresponding ordinal product of the indices. [quotetheorem:4815] [citeproof:4815] This result depends on addition because multiplication is defined by iterated addition. In a full development, left distributivity of multiplication over addition is usually established in the same induction package. [remark: Noncommutativity of Multiplication] The computation $2\cdot\omega=\omega$ while $\omega\cdot2=\omega+\omega$ gives \begin{align*} 2\cdot\omega\neq \omega\cdot2. \end{align*} The order type records whether the repeated blocks are finite or infinite. [/remark] Multiplication prepares the final arithmetic operation in this chapter. Exponentiation repeats multiplication through successor stages and takes suprema at limits. ## Ordinal Exponentiation What is the right analogue of $\alpha^n$ when $n$ is replaced by an arbitrary ordinal? The recursive answer is to multiply by $\alpha$ at each successor stage and to take the supremum of all previous powers at a limit. [definition: Ordinal Exponentiation] For ordinals $\alpha$ and $\beta$, with $\alpha>0$, the ordinal power $\alpha^\beta$ is defined recursively in $\beta$ by \begin{align*} \alpha^0&=1,\\ \alpha^{\beta+1}&=\alpha^\beta\cdot\alpha,\\ \alpha^\lambda&=\sup_{\beta<\lambda}\alpha^\beta \quad \text{for limit } \lambda. \end{align*} [/definition] The restriction $\alpha>0$ avoids special conventions at base $0$. For the infinite base $\omega$, exponentiation produces the standard scale $1,\omega,\omega^2,\omega^3,\dots$. [example: The Ordinal Omega Squared] We compute $\omega^2$ from the recursive definition of ordinal exponentiation: \begin{align*} \omega^2 &=\omega^{1+1}\\ &=\omega^1\cdot\omega\\ &=(\omega^{0+1})\cdot\omega\\ &=(\omega^0\cdot\omega)\cdot\omega\\ &=(1\cdot\omega)\cdot\omega\\ &=\omega\cdot\omega. \end{align*} Thus $\omega^2$ is $\omega$ many consecutive blocks, each block having order type $\omega$. A concrete representative is $\omega\times\omega$ with \begin{align*} (i,j)<(i',j') \quad\text{when}\quad i<i' \quad\text{or}\quad i=i' \text{ and } j<j'. \end{align*} For fixed $i$, the set $\{(i,j):j<\omega\}$ is one block of order type $\omega$, and if $i<i'$, then every element of the $i$-th block comes before every element of the $i'$-th block. The first $n$ blocks therefore have order type $\omega\cdot n$. Since $\omega$ is a limit ordinal, the limit clause in ordinal multiplication gives \begin{align*} \omega\cdot\omega &=\sup_{n<\omega}(\omega\cdot n). \end{align*} Also, for each $n<\omega$, the ordinal $\omega\cdot n$ is a proper initial segment of $\omega\cdot\omega$, because all pairs with first coordinate $<n$ come before $(n,0)$. Hence \begin{align*} \omega\cdot n<\omega\cdot\omega=\omega^2. \end{align*} So $\omega^2$ is not a final extra block after some largest $\omega\cdot n$; it is the supremum of all finite strings of $\omega$-blocks. [/example] The example explains why limit stages are described by suprema rather than by adding a final block. At exponent $\omega$, there is no largest finite power $\omega^n$ to multiply once more. ## Continuity at Limits and Monotonicity The recursive clauses for addition, multiplication, and exponentiation all use suprema at limit stages. This raises two related questions: when are ordinal operations continuous in the right argument, and how much monotonicity survives despite noncommutativity? [definition: Continuity in the Right Argument] Let $H$ be a class function on ordinals. The function $H$ is continuous at limit ordinals when for every limit ordinal $\lambda$, \begin{align*} H(\lambda)=\sup_{\beta<\lambda}H(\beta). \end{align*} [/definition] For the arithmetic operations in this chapter, right-continuity is not an extra theorem but part of the recursive definition. The more delicate point is that continuity and monotonicity behave differently in the left argument. The abstract definition of continuity now has to be connected back to the three concrete operations just introduced. Without that bridge, a limit-stage calculation such as replacing a value by the supremum of earlier values would only be a formal pattern, not a justified rule for ordinal addition, multiplication, or exponentiation. The following result records exactly that bridge for the right argument. [quotetheorem:4816] [citeproof:4816] Right-continuity explains the computations $1+\omega=\omega$ and $2\cdot\omega=\omega$. It is also the source of many strict increases, since taking a supremum can produce a genuinely new limit ordinal. The result is one-sided: it concerns the variable that controls the recursion, so it does not erase the asymmetry seen in $1+\omega$ versus $\omega+1$. Later normal-form arguments use this theorem to approximate a limit-stage value from below by simpler values. Continuity describes behaviour at limits, but ordinary comparison also needs monotonicity. If the right-hand ordinal is enlarged, the recursive process should have at least as much time to add, multiply, or exponentiate. The next result records this order-preservation in the right argument, while still leaving room for noncommutativity. [quotetheorem:4817] [citeproof:4817] Monotonicity in the right argument is compatible with noncommutativity. It says that extending the indexing ordinal gives a longer iteration, not that the two arguments play interchangeable roles. [remark: Monotonicity in the Left Argument] For fixed $\beta$, the functions $\alpha\mapsto\alpha+\beta$, $\alpha\mapsto\alpha\cdot\beta$, and $\alpha\mapsto\alpha^\beta$ are monotone under the usual hypotheses, but they need not be continuous at limits in the left argument. For instance, \begin{align*} \sup_{n<\omega}(n+1)=\omega \quad \text{while} \quad \omega+1>\omega. \end{align*} Thus right-recursive definitions give right-continuity, not two-sided continuity. [/remark] These continuity facts are also the computational engine behind normal forms. They let us approximate complicated ordinals by simpler earlier ones while preserving the order of operations. ## Cantor Normal Forms Below Epsilon Zero How can an ordinal below the first fixed point of $\alpha\mapsto\omega^\alpha$ be written in a finite notation? Cantor normal form answers by expressing such ordinals as finite sums of decreasing powers of $\omega$ with positive natural-number coefficients. [definition: Epsilon Zero] The ordinal $\varepsilon_0$ is \begin{align*} \varepsilon_0=\sup\{\omega,\omega^\omega,\omega^{\omega^\omega},\dots\}. \end{align*} [/definition] The ordinal $\varepsilon_0$ is the first fixed point reached by iterating $\alpha\mapsto\omega^\alpha$ from $1$. Below it, finite expressions using $0$, natural numbers, addition, multiplication by natural coefficients, and powers of $\omega$ suffice. [definition: Cantor Normal Form Below Epsilon Zero] A Cantor normal form below $\varepsilon_0$ is an expression \begin{align*} \omega^{\alpha_1}n_1+\omega^{\alpha_2}n_2+\cdots+\omega^{\alpha_k}n_k, \end{align*} where $k\in\mathbb N$, $n_i\in\mathbb N$, $n_i>0$, and \begin{align*} \varepsilon_0>\alpha_1>\alpha_2>\cdots>\alpha_k\geq 0, \end{align*} with each exponent $\alpha_i$ itself written in Cantor normal form below $\varepsilon_0$. [/definition] The decreasing exponents prevent ambiguity. Coefficients record how many consecutive blocks of each leading size occur before passing to a smaller scale. A definition of a notation system is useful only if it actually names every ordinal in the intended range and does so without collisions. Here the obstruction is recursive: the exponents must themselves already have valid normal forms, and the leading exponent has to be chosen so that no larger term is missing. The existence and uniqueness result below turns Cantor normal form from a suggestive syntax into a reliable coordinate system below $\varepsilon_0$. [quotetheorem:4818] [citeproof:4818] The theorem gives a finite syntax for the ordinals most often used in [proof theory](/page/Proof%20Theory) at the first stage. The restriction below $\varepsilon_0$ matters because the exponents appearing in the notation must themselves be finitely describable in the same language. [example: Normal Forms for Basic Ordinals] For $\omega+1$, the leading term is $\omega$ and the final term is $1$. Since $\omega^0=1$ and \begin{align*} \omega^1 &=\omega^{0+1}\\ &=\omega^0\cdot\omega\\ &=1\cdot\omega\\ &=\omega, \end{align*} we can rewrite \begin{align*} \omega+1 &=\omega^1+\omega^0\\ &=\omega^1\cdot 1+\omega^0\cdot 1. \end{align*} The exponents satisfy $1>0$, and both coefficients are positive natural numbers, so this is Cantor normal form. For $\omega\cdot2+5$, the coefficient $2$ records two consecutive $\omega$-blocks, and the finite tail $5$ is five copies of $1=\omega^0$: \begin{align*} \omega\cdot2+5 &=\omega^1\cdot2+5\\ &=\omega^1\cdot2+1\cdot5\\ &=\omega^1\cdot2+\omega^0\cdot5. \end{align*} Again the exponents are strictly decreasing, since $1>0$, and the coefficients $2$ and $5$ are positive. For $\omega^2+\omega\cdot3+7$, first rewrite each visible block as a power of $\omega$ with a natural coefficient: \begin{align*} \omega^2+\omega\cdot3+7 &=\omega^2\cdot1+\omega\cdot3+7\\ &=\omega^2\cdot1+\omega^1\cdot3+7\\ &=\omega^2\cdot1+\omega^1\cdot3+\omega^0\cdot7. \end{align*} Here the exponents satisfy $2>1>0$, so the terms occur in the required strictly decreasing order. These examples show that Cantor normal form separates an ordinal into successive blocks of decreasing size, with each coefficient counting how many blocks of that size occur. [/example] Cantor normal form closes the chapter by turning the recursive arithmetic into a usable notation system. The next topics in set theory use these ordinal calculations to organise cardinals, cofinalities, and transfinite constructions whose length is no longer merely countable. Ordinal arithmetic gives a rigorous calculus for transfinite order types, but it still does not address the question of size. The next chapter shifts from order to cardinality, using the ordinal machinery to compare sets by equinumerosity and to define the first systematic theory of infinite magnitudes. # 5. Cardinals and Equinumerosity Cardinality is the part of set theory that isolates the question "how many elements does this set have?" from the particular nature of those elements. The preceding chapters developed ordinals as canonical order types of well-orders; this chapter uses that machinery to turn size comparisons into precise set-theoretic objects. We begin with maps between arbitrary sets, prove the main comparison theorems, and then use well-ordering to represent cardinals by initial ordinals. ## Comparing Sizes by Functions How should two sets be compared when there is no preferred listing of their elements? The basic method is to look for functions that match, embed, or cover one set by another. This gives a notion of size comparison that works for finite and infinite sets at the same time. [definition: Injection] Let $A$ and $B$ be sets. A function $f: A \to B$ is an injection if for all $a_0,a_1 \in A$, \begin{align*} f(a_0)=f(a_1) \implies a_0=a_1. \end{align*} [/definition] An injection from $A$ to $B$ says that $A$ can be placed inside $B$ without identifying distinct elements. Embedding is only one way to compare sizes. To say that the domain has enough elements to account for the whole codomain, the relevant condition is not that different inputs stay different, but that no target element is missed. This gives the covering notion used for enumerations and quotient-like comparisons of size. [definition: Surjection] Let $A$ and $B$ be sets. A function $f: A \to B$ is a surjection if for every $b \in B$ there exists $a \in A$ such that $f(a)=b$. [/definition] Surjections express that $A$ has enough elements to name every element of $B$, possibly with repetition. The next comparison needs a way to record a perfect pairing rather than only an embedding or a covering. For equality of size, neither condition alone is sufficient: injections can miss elements, and surjections can identify several inputs with the same output. The exact-match notion is the one that simultaneously forbids omissions and repetitions. [definition: Bijection] Let $A$ and $B$ be sets. A function $f: A \to B$ is a bijection if it is both an injection and a surjection. [/definition] A bijection is a function-level matching. Cardinality should compare sets, not particular chosen functions between them. Once a bijection exists, the important information is the existence of some perfect matching, because different bijections can witness the same sameness of size. The next definition packages that existential comparison as a relation between sets. [definition: Equinumerosity] Sets $A$ and $B$ are equinumerous, written $A \approx B$, if there exists a bijection $f: A \to B$. [/definition] Equinumerosity is the set-theoretic version of having the same cardinality. It behaves like equality of sizes rather than equality of sets: the elements may be different, but the sets admit a perfect matching. To use equinumerosity as the equality relation for cardinal arithmetic, we need more than the definition itself. The relation must support the basic equality operations: a set should match itself, a matching should be reversible, and two matchings should compose. These closure properties are what allow later arguments to replace one representative of a size by another without changing the cardinal comparison. [quotetheorem:4819] [citeproof:4819] The hypotheses here are exactly the data needed for matching: without bijections, arbitrary functions do not preserve size, since a constant map $A\to B$ may exist even when $A$ and $B$ are very different. The theorem does not say that equinumerous sets are equal as sets, nor does it choose a preferred bijection between them. Its role is to make $\approx$ a legitimate sameness relation, so the next comparison relation keeps only the embedding information rather than a full matching. [definition: Cardinal Domination] For sets $A$ and $B$, write $A \preccurlyeq B$ if there exists an injection $f: A \to B$. [/definition] The notation $A \preccurlyeq B$ should be read as "$A$ has cardinality at most that of $B$". For ordinary numbers, two inequalities in opposite directions imply equality. For sets this is not immediate, because injections $A\to B$ and $B\to A$ may have unrelated images and need not be inverse functions. The following theorem is the key cancellation principle saying that mutual embeddability is still enough to recover equality of cardinal size. [illustration:schroeder-bernstein-alternating-chain] [quotetheorem:760] [citeproof:760] Both injection hypotheses are needed: an injection $A\to B$ alone only says that $A$ embeds into $B$, as with $\{0\}\to\{0,1\}$, and it gives no bijection. The theorem also does not produce a canonical matching; it builds one from the chosen injections, and different choices can give different bijections. Its value is practical: it lets us prove equality of sizes by proving two easier inequalities, which is especially useful for infinite sets where explicit enumerations can be awkward. [example: Even Natural Numbers] Let $E=\{2n:n\in\omega\}$, and define $f:\omega\to E$ by $f(n)=2n$. The codomain condition holds because for every $n\in\omega$, the value $f(n)=2n$ is one of the elements used in the defining set-builder expression for $E$. To see that $f$ is injective, suppose $f(m)=f(n)$ for $m,n\in\omega$. Then \begin{align*} 2m&=2n,\\ m&=n, \end{align*} where the second line follows by cancellation in natural-number arithmetic. To see that $f$ is surjective, let $e\in E$. By the definition of $E$, there is some $n\in\omega$ such that \begin{align*} e=2n. \end{align*} For this $n$, we have \begin{align*} f(n)=2n=e, \end{align*} so every element of $E$ is hit by $f$. Thus $f$ is a bijection from $\omega$ to $E$, and hence $\omega\approx E$. The subset is proper because $1\in\omega$ but $1\notin E$: if $1=2n$ for some $n\in\omega$, then $n=0$ gives $2n=0$ and every $n\ge 1$ gives $2n\ge 2$, so no such $n$ exists. Therefore an infinite set can be equinumerous with one of its proper subsets, which is the first major departure from finite intuition. [/example] ## Countable Sets and Standard Enumerations Which sets can still be listed in a sequence? The answer is not limited to $\omega$ itself: many sets built from natural numbers remain countable after finite products, finite unions, and finite sequences. [definition: Finite Countable And Uncountable Sets] A set $A$ is finite if $A \approx n$ for some $n \in \omega$. A set $A$ is countably infinite if $A \approx \omega$. A set $A$ is countable if $A$ is finite or countably infinite. A set $A$ is uncountable if it is not countable. [/definition] This definition separates two ideas that often get conflated: being small enough to list, and being listed without repetition by all of $\omega$. The next examples give the standard enumerations used throughout the course. [example: Integers Are Countable] Define $f:\omega\to\mathbb Z$ by \begin{align*} f(0)&=0,\\ f(2k+1)&=k+1 \quad \text{for } k\in\omega,\\ f(2k+2)&=-(k+1) \quad \text{for } k\in\omega. \end{align*} Thus the values of $f$ begin \begin{align*} 0,1,-1,2,-2,3,-3,\dots. \end{align*} To prove that this is a bijection, define $g:\mathbb Z\to\omega$ by \begin{align*} g(z)= \begin{cases} 2z-1, & z>0,\\ 0, & z=0,\\ -2z, & z<0. \end{cases} \end{align*} If $z>0$, then $z=k+1$ for $k=z-1\in\omega$, so \begin{align*} f(g(z))=f(2z-1)=f(2(z-1)+1)=(z-1)+1=z. \end{align*} If $z=0$, then \begin{align*} f(g(0))=f(0)=0. \end{align*} If $z<0$, then $-z=k+1$ for $k=-z-1\in\omega$, so \begin{align*} f(g(z))=f(-2z)=f(2(-z-1)+2)=-((-z-1)+1)=-(-z)=z. \end{align*} Conversely, every $n\in\omega$ is exactly one of $0$, $2k+1$, or $2k+2$ for some $k\in\omega$. For these three cases, \begin{align*} g(f(0))&=g(0)=0,\\ g(f(2k+1))&=g(k+1)=2(k+1)-1=2k+1,\\ g(f(2k+2))&=g(-(k+1))=-2(-(k+1))=2k+2. \end{align*} So $g$ is a two-sided inverse for $f$, hence $f$ is a bijection $\omega\to\mathbb Z$. Therefore $\mathbb Z\approx\omega$, so $\mathbb Z$ is countably infinite. [/example] The integers are countable because they can be threaded into one sequence. Rational numbers require a two-dimensional enumeration, since they are represented by numerator-denominator pairs. Here $\mathbb N=\omega\setminus\{0\}$ denotes the positive natural numbers. [example: Rational Numbers Are Countable] For each $s\in\mathbb N$, form the finite block \begin{align*} B_s=\{(m,n)\in\mathbb Z\times\mathbb N: |m|+n=s \text{ and } \gcd(|m|,n)=1\}. \end{align*} Within $B_s$, order the pairs by increasing $m$, and then concatenate the blocks \begin{align*} B_1,B_2,B_3,\dots. \end{align*} For example, \begin{align*} B_1&=\{(0,1)\},\\ B_2&=\{(-1,1),(1,1)\},\\ B_3&=\{(-2,1),(-1,2),(1,2),(2,1)\}. \end{align*} Whenever a pair $(m,n)$ appears, write down the rational number $m/n$. Every pair listed has $n\in\mathbb N$, so $m/n\in\mathbb Q$. Conversely, let $q\in\mathbb Q$. Choose integers $a,b$ with $b\ne 0$ and $q=a/b$. If $b<0$, replace $(a,b)$ by $(-a,-b)$, so we may assume $b>0$. Let \begin{align*} d=\gcd(|a|,b),\qquad m=\frac{a}{d},\qquad n=\frac{b}{d}. \end{align*} Then $n\in\mathbb N$, $\gcd(|m|,n)=1$, and \begin{align*} \frac{m}{n} =\frac{a/d}{b/d} =\frac{a}{b} =q. \end{align*} The pair $(m,n)$ appears in the block $B_s$ with \begin{align*} s=|m|+n, \end{align*} so $q$ appears somewhere in the list. No rational number appears twice in this reduced list. Suppose two listed pairs satisfy \begin{align*} \frac{m}{n}=\frac{m'}{n'}, \end{align*} where $n,n'\in\mathbb N$, $\gcd(|m|,n)=1$, and $\gcd(|m'|,n')=1$. Cross-multiplying gives \begin{align*} mn'=m'n. \end{align*} Since $\gcd(|m|,n)=1$, the divisibility relation $n\mid mn'$ forces $n\mid n'$. Similarly, $\gcd(|m'|,n')=1$ and $n'\mid m'n$ force $n'\mid n$. Hence $n=n'$, and then \begin{align*} mn'=m'n \end{align*} becomes \begin{align*} mn= m'n, \end{align*} so cancellation by the positive integer $n$ gives $m=m'$. Thus each rational has exactly one reduced representative in the list, and the concatenated block ordering is a bijective enumeration of $\mathbb Q$. Therefore $\mathbb Q$ is countable. [/example] The rational enumeration is the prototype for coding finite pieces of natural-number data by one natural number. The next example packages this coding in the form used later for syntax, finite proofs, and finite strings. [example: Finite Sequences Of Naturals Are Countable] Write a finite sequence as $s=(s_0,\dots,s_{k-1})$, where $k\in\omega$ is its length. We code such a sequence by a natural number using primes. Let $p_0=2,p_1=3,p_2=5,\dots$ be the increasing enumeration of the prime numbers, and define \begin{align*} c(s)=p_0^k\prod_{i<k}p_{i+1}^{s_i+1}. \end{align*} For example, \begin{align*} c((4,0,2))=2^3\cdot 3^{5}\cdot 5^{1}\cdot 7^{3}. \end{align*} We show that $c:\omega^{<\omega}\to\omega$ is injective. Suppose \begin{align*} c((s_0,\dots,s_{k-1}))=c((t_0,\dots,t_{\ell-1})). \end{align*} Then \begin{align*} 2^k\prod_{i<k}p_{i+1}^{s_i+1} = 2^\ell\prod_{j<\ell}p_{j+1}^{t_j+1}. \end{align*} By unique prime factorization, the exponent of the prime $2$ on the left equals the exponent of $2$ on the right, so \begin{align*} k=\ell. \end{align*} Now fix any $i<k$. The exponent of the prime $p_{i+1}$ on the left is $s_i+1$, and the exponent of $p_{i+1}$ on the right is $t_i+1$, so unique prime factorization gives \begin{align*} s_i+1&=t_i+1,\\ s_i&=t_i. \end{align*} Thus the two sequences have the same length and the same entry in every position, so they are equal. Therefore $\omega^{<\omega}$ injects into $\omega$. Its image is a subset of $\omega$; listing that image in increasing order gives either a finite list or a sequence indexed by $\omega$. Decoding each listed code by its prime exponents gives an enumeration of all finite sequences of natural numbers, so $\omega^{<\omega}$ is countable. [/example] The previous examples rely on the fact that countable unions of countable sets can be enumerated. In ZFC this is justified using choice for a countable family of enumerations, or by giving explicit enumerations in the concrete cases above. This closure property is needed whenever a set is built in countably many countable layers rather than by one direct list. The obstruction is that each layer may come with its own enumeration, so the proof must coordinate those enumerations into a single sequence without losing any elements. The theorem below records the general form of that enumeration principle. [quotetheorem:755] [citeproof:755] Countable union closure explains why many sets constructed from finite strings, finite tuples, or countably many stages remain countable. It is not a general license to treat every larger construction as countable: the argument still depends on having only countably many countable pieces. The next operation shows the boundary of the principle, because the power set of a [countable set](/page/Countable%20Set) cannot be listed by any countable enumeration. ## Cantor Diagonalization and Power Sets Can a set ever have the same size as its power set? The answer is no, and the proof is the diagonal argument that reappears throughout logic: assume a proposed list, then build an object that differs from each listed object at its own index. [quotetheorem:621] [citeproof:621] The hypothesis that $F$ has domain $A$ and codomain $\mathcal P(A)$ is essential because the diagonal set is built by asking whether each $a\in A$ belongs to its own assigned subset $F(a)$. The theorem does not say that $\mathcal P(A)$ cannot be injected into some larger set, nor does it identify the exact next size above $A$; it only rules out a surjective listing of all subsets by elements of $A$. This single obstruction creates a strict hierarchy of larger and larger sets, with the first central instance being the uncountability of the power set of the natural numbers. [example: Uncountability Of The Power Set Of Omega] Suppose $F:\omega\to\mathcal P(\omega)$ is a proposed enumeration of all subsets of $\omega$. Define \begin{align*} D=\{n\in\omega:n\notin F(n)\}. \end{align*} By its definition, every element of $D$ is an element of $\omega$, so $D\subseteq\omega$ and hence $D\in\mathcal P(\omega)$. We show that $D$ is not equal to any set appearing in the list. Fix $n\in\omega$. There are two cases. If $n\in D$, then the defining condition for membership in $D$ gives \begin{align*} n\notin F(n), \end{align*} so $D$ and $F(n)$ differ at the element $n$. If $n\notin D$, then, since $n\in\omega$, failure of the defining condition gives \begin{align*} n\in F(n), \end{align*} so again $D$ and $F(n)$ differ at the element $n$. Therefore, for every $n\in\omega$, \begin{align*} D\ne F(n). \end{align*} Thus no function $F:\omega\to\mathcal P(\omega)$ can list all subsets of $\omega$. If $\mathcal P(\omega)$ were countably infinite, there would be a bijection $\omega\to\mathcal P(\omega)$, hence a surjective enumeration, contradicting the argument above. If $\mathcal P(\omega)$ were finite and nonempty, a finite list of all its elements could be repeated periodically to give a surjection $\omega\to\mathcal P(\omega)$, again impossible. Since $\varnothing\in\mathcal P(\omega)$, the power set is nonempty. Therefore $\mathcal P(\omega)$ is uncountable. [/example] It is useful to state the size comparison consequence explicitly. The map $a\mapsto\{a\}$ always injects $A$ into $\mathcal P(A)$, while [Cantor theorem](/theorems/759) rules out a bijection in the reverse direction. [quotetheorem:759] [citeproof:759] The injection part needs only singletons; without Extensionality, the argument that $\{a_0\}=\{a_1\}$ forces $a_0=a_1$ would have no foundation. The theorem does not name the next cardinal after $A$, since $\mathcal P(A)$ may skip over intermediate cardinal sizes. This gap between producing a larger set and naming canonical sizes leads to the use of initial ordinals. ## Cardinals as Initial Ordinals Equinumerosity gives a relation on sets, but ZFC does not package "the class of all sets equinumerous with $A$" as a set. A more concrete failure occurs already at the first infinite ordinal: $\omega$, $\omega+1$, and $\omega\cdot 2$ are different ordinals but have the same size, so ordinary ordinal equality is too fine for cardinality. The standard solution is to choose a canonical representative: the least ordinal, in the well-ordering of ordinals, that has the required size. [definition: Initial Ordinal] An ordinal $\kappa$ is an initial ordinal if there is no ordinal $\alpha<\kappa$ such that $\alpha\approx\kappa$. [/definition] Initial ordinals are the ordinal representatives of cardinal sizes. Without restricting to them, the phrase "the ordinal size of $\omega$" would be ambiguous, since many larger ordinals are still equinumerous with $\omega$. The size representatives now need their own name because they will be the objects compared and manipulated in cardinal arithmetic. The point is not to introduce a new kind of object, but to mark exactly which ordinals are allowed to stand for sizes. This terminology lets later statements talk about cardinal arithmetic and cardinal comparison without repeatedly saying "initial ordinal." [definition: Cardinal] A cardinal is an initial ordinal. [/definition] Calling cardinals initial ordinals turns a size into a canonical ordinal representative. For a particular set $A$, the remaining problem is to choose which cardinal represents its size. If $A$ can be well-ordered, then $A$ is bijective with some ordinal, but there may be many such ordinals and most of them are not initial. Taking the least ordinal among the representatives removes this nonuniqueness and produces the canonical cardinal assigned to $A$. [definition: Cardinality Of A Well Orderable Set] Let $A$ be a well-orderable set. The cardinality of $A$, written $|A|$, is the least ordinal $\kappa$ such that $A\approx\kappa$. [/definition] The definition depends on the existence of a well-ordering of $A$. Under the [well-ordering theorem](/theorems/1227), equivalent to the axiom of choice over ZF, every set has such a cardinal. Even when an ordinal representative exists, we still have to justify that the least representative is initial in the precise sense just defined. Otherwise the notation $|A|$ would pick an ordinal without proving that it is a genuine cardinal. The next result verifies that the least ordinal equinumerous with a well-orderable set is exactly the canonical initial ordinal for its size. [quotetheorem:4820] [citeproof:4820] The well-orderability hypothesis is essential: without a well-order on $A$, there need not be an ordinal known to be equinumerous with $A$ in ZF. The theorem does not prove that every set has a cardinal; that requires the well-ordering theorem, or equivalently the axiom of choice over ZF. What it does prove is that once an ordinal representative exists, the least representative is an honest set of ZFC and can serve as the canonical cardinal. [example: Finite Cardinals And Omega] For a finite ordinal $n\in\omega$, the identity map $\operatorname{id}_n:n\to n$ is a bijection, so $n\approx n$. To see that $n$ is initial, let $\alpha<n$. Since $\alpha$ and $n$ are finite ordinals, write $\alpha=m$ with $m,n\in\omega$ and $m<n$. No function $f:m\to n$ can be surjective: if $m=0$, then $0\in n$ is not hit; if $m>0$, the domain has exactly the elements $0,1,\dots,m-1$, so the range of $f$ has at most one value for each of these $m$ inputs and therefore cannot contain all $n$ elements of $n$ when $m<n$. Hence there is no bijection $m\to n$, so no $\alpha<n$ is equinumerous with $n$. Thus each finite ordinal $n$ is initial, and since $n$ itself is the least ordinal equinumerous with $n$, we have $|n|=n$. The ordinal $\omega$ is also initial. Let $\alpha<\omega$. Then $\alpha=n$ for some finite ordinal $n\in\omega$. Suppose, toward a contradiction, that $f:n\to\omega$ is a bijection. If $n=0$, then $0\in\omega$ is not in the range of $f$, contradicting surjectivity. If $n>0$, the finite set \begin{align*} \{f(0),f(1),\dots,f(n-1)\} \end{align*} has a largest element $M\in\omega$. Then $M+1\in\omega$, and for every $i<n$ we have \begin{align*} f(i)\le M<M+1, \end{align*} so $M+1$ is not in the range of $f$, again contradicting surjectivity. Therefore no finite ordinal is equinumerous with $\omega$, so $\omega$ is initial. Thus the finite cardinals are exactly the finite ordinals, and the first infinite cardinal is $\omega$, often denoted $\aleph_0$. [/example] Finite cardinals and $\omega$ show that cardinal comparison should agree with ordinal comparison on initial ordinals. We now define the order on cardinals using injections, then verify that initiality makes it antisymmetric. [definition: Cardinal Order] For cardinals $\kappa$ and $\lambda$, write $\kappa\le\lambda$ if $\kappa\preccurlyeq\lambda$. [/definition] For this injection-based order to be an actual order, mutual domination must collapse to equality among cardinals. Schroeder-Bernstein provides the equinumerosity step, and initiality must then force the two representatives to be the same ordinal. This is the first point where the definition of cardinal as initial ordinal does real work: it turns a relation defined by arbitrary injections into equality of canonical representatives. The formal result below packages that check, so later cardinal inequalities can be manipulated as inequalities in a genuine ordered class. [quotetheorem:4821] [citeproof:4821] Antisymmetry depends on working with initial ordinals: arbitrary ordinals can be mutually injectable without being equal, as $\omega$ and $\omega+1$ are equinumerous but distinct. The theorem does not say that arbitrary sets are comparable by injections; comparability of all sets is another form of choice. It prepares the ground for constructing new cardinals by showing that the cardinals themselves form a genuine ordered class. ## Alephs Hartogs Numbers and Successor Cardinals Once cardinals are initial ordinals, the next question is whether every cardinal has a next larger cardinal. Cantor theorem says larger sizes exist, but it produces $\mathcal P(A)$ rather than the least ordinal beyond $A$. Hartogs theorem gives a more ordinal-theoretic construction: for any set, there is an ordinal that does not inject into it, so no set can contain copies of all ordinals. [definition: Hartogs Number] Let $A$ be a set. The Hartogs number of $A$, denoted $\aleph(A)$, is the least ordinal $\alpha$ such that there is no injection $f:\alpha\to A$. [/definition] This definition needs a theorem: the collection of order types of well-orderings of subsets of $A$ is bounded by a set-sized ordinal. Hartogs theorem gives exactly this choice-free ceiling, and then chooses the least initial ordinal beyond all ordinals that inject into $A$. [quotetheorem:4822] [citeproof:4822] The construction needs the well-orderings to live on subsets of $A$, because then each ordering is coded as a subset of $A\times A$ and the collection is a set. It does not well-order $A$ itself, and therefore does not prove the axiom of choice. Instead it gives a choice-free ceiling: every set is too small to contain copies of all ordinals, and the next step is to choose the least initial ordinal with this failure. Minimality is needed here: Hartogs theorem alone gives some ordinal too large to inject into $A$, but that ordinal need not be the first such obstruction. The theorem does not assert that $A$ itself is well-orderable or has a cardinal in ZF; it only finds an initial ordinal lying beyond all ordinals that inject into $A$. Applying this bound to a cardinal is the key step in defining the next cardinal size. [definition: Successor Cardinal] Let $\kappa$ be a cardinal. The successor cardinal $\kappa^+$ is the least cardinal $\lambda$ such that $\kappa<\lambda$. [/definition] The existence of $\kappa^+$ follows from Hartogs theorem applied to $\kappa$. This definition is necessary because ordinal successor and cardinal successor differ: $\kappa+1$ is equinumerous with $\kappa$ when $\kappa$ is infinite, while $\kappa^+$ is the next initial ordinal. Thus the notation $+$ in $\kappa^+$ records a jump in cardinal size, not the operation of ordinal addition. [quotetheorem:4823] [citeproof:4823] The hypothesis that $\kappa$ is already a cardinal matters, since the conclusion names the next initial ordinal above a canonical size rather than above an arbitrary ordinal representative such as $\omega+1$. The theorem does not identify $\kappa^+$ with $\mathcal P(\kappa)$; Cantor gives an injection obstruction, but there may be cardinals strictly between $\kappa$ and $|\mathcal P(\kappa)|$. Successor cardinals let us climb from one cardinal to the next, and to name all infinite cardinals in order we iterate this operation through the ordinals and take suprema at limit stages. [definition: Aleph Sequence] The aleph sequence is the transfinite sequence of cardinals defined by \begin{align*} \aleph_0 &= \omega,\\ \aleph_{\alpha+1} &= \aleph_\alpha^+,\\ \aleph_\lambda &= \sup_{\alpha<\lambda}\aleph_\alpha \quad \text{for limit ordinals }\lambda. \end{align*} [/definition] The alephs enumerate the infinite initial ordinals. Later chapters use this notation to discuss cofinality, cardinal arithmetic, and the interaction between choice and comparability of arbitrary sets. Cardinals and alephs organize size, but many of the deepest questions about size comparisons depend on whether one can choose elements coherently from many sets. The next chapter isolates that issue through the Axiom of Choice and its equivalent principles, showing how well-orderability, maximality, and dependent selection fit into the ZFC framework. # 6. The Axiom of Choice and Equivalent Principles This chapter isolates the role of the Axiom of Choice in the ZFC toolkit. Earlier chapters used well-orders and recursion once a well-order was already present; here we ask when arbitrary sets can be well-ordered and when compatible choices can be made across many nonempty sets at once. The central point is that several principles with different mathematical appearances have the same strength over the other ZF axioms: choice functions, well-orderings, and maximal elements in partially ordered sets. ## Choice Functions and Products of Nonempty Sets Suppose a proof asks us to choose one element from each member of a family of nonempty sets. For a finite family this is handled by repeated existential instantiation, but for an arbitrary family the proof needs a single set-sized object recording all the choices. The Axiom of Choice is the assertion that such simultaneous selections exist. [definition: Choice Function] Let $A$ be a set whose elements are nonempty sets. A choice function for $A$ is a function $f: A \to \bigcup A$ such that $f(x) \in x$ for every $x \in A$. [/definition] This definition packages many existential statements into one function. The codomain $\bigcup A$ is not the essential feature; what matters is that each value lands in the corresponding member of the family. [example: Choosing From a Finite Family] Let $A=\{X_1,\dots,X_n\}$, and assume each $X_i$ is nonempty. If some of the displayed sets are repeated, list the distinct members of $A$ as $Y_1,\dots,Y_m$, where $m\le n$. Since each $Y_j$ is one of the $X_i$, it is nonempty, so there exists $y_j\in Y_j$ for each $1\le j\le m$. Using these finitely many witnesses, define \begin{align*} f=\{(Y_j,y_j):1\le j\le m\}. \end{align*} For every $Y\in A$, there is a unique $j$ with $Y=Y_j$, because $Y_1,\dots,Y_m$ lists the distinct members of $A$. Hence $f(Y)=y_j$ is well-defined. Also $y_j\in Y_j\subseteq \bigcup A$, so $f:A\to \bigcup A$, and for every $Y_j\in A$ we have \begin{align*} f(Y_j)=y_j\in Y_j. \end{align*} Thus $f$ is a choice function for $A$. Only finitely many existential choices were made, one for each distinct member of $A$, so this construction uses finite existential instantiation rather than the Axiom of Choice. [/example] The finite example explains why the axiom is invisible in many elementary arguments. To handle arbitrary families, the informal instruction to choose one element from each set has to be promoted to a single set-theoretic assertion. The difficulty begins when there is no previously given rule, order, or definable selection procedure for an infinite family. [definition: Axiom of Choice] For every set $A$ whose elements are nonempty sets, there exists a choice function for $A$. [/definition] A common equivalent formulation uses Cartesian products. To state it precisely for an arbitrary index set, we need notation for the product of a whole family of sets, not just a finite tuple of factors. If $I$ indexes a family $(X_i)_{i \in I}$, an element of the product is a function whose $i$th value lies in $X_i$. [definition: Indexed Product] Let $I$ be a set and let $(X_i)_{i \in I}$ be an indexed family of sets. The product of the family is \begin{align*} \prod_{i \in I} X_i = \{f: I \to \bigcup_{i \in I} X_i : f(i) \in X_i \text{ for all } i \in I\}. \end{align*} [/definition] With this notation, the Axiom of Choice says that a product of nonempty sets is nonempty. This form is often the most convenient one in algebra and analysis. [quotetheorem:4824] [citeproof:4824] The hypothesis that every factor is nonempty is essential: if $j \in I$ and $X_j=\varnothing$, then no function can have a value in $X_j$, so the product is empty. The theorem also does not say that the selected element is definable from the family; it asserts the existence of a single function making all choices at once. The tagged-family argument will reappear in the next formulation, where the fibres of a surjection play the role of the nonempty sets from which choices must be made. [example: Noncanonical Choices of Representatives] Let $E$ be an [equivalence relation](/page/Equivalence%20Relation) on a set $X$, and let \begin{align*} X/E=\{[x]_E:x\in X\} \end{align*} be the set of equivalence classes. Each class $C\in X/E$ is nonempty, since $C=[x]_E$ for some $x\in X$ and $x\in[x]_E$ by reflexivity of $E$. A choice function on $X/E$ is therefore a function \begin{align*} f:X/E\to \bigcup (X/E) \end{align*} such that $f(C)\in C$ for every $C\in X/E$. Since every class is a subset of $X$, we have $\bigcup(X/E)\subseteq X$, so the selected values lie in $X$. Define the set of chosen representatives by \begin{align*} R=\{f(C):C\in X/E\}. \end{align*} If $C\in X/E$, then $f(C)\in C$, so $R\cap C$ is nonempty. To see it has only one element, suppose $r\in R\cap C$. Since $r\in R$, there is some $D\in X/E$ with $r=f(D)$. Because $f(D)\in D$, we have $r\in D$, and because $r\in C$, the two equivalence classes $C$ and $D$ intersect. Equivalence classes are either disjoint or equal, so $D=C$. Hence \begin{align*} r=f(D)=f(C). \end{align*} Thus $R\cap C=\{f(C)\}$ for every class $C\in X/E$. For congruence modulo $n$, the class of an integer $a$ has the unique representative in $\{0,1,\dots,n-1\}$, so a representative can be selected by a displayed rule. For an arbitrary equivalence relation, the same set $R$ exists only when some choice principle or additional structure supplies the selections. [/example] Choice functions also control when maps can be split. Surjections identify many possible preimages of each point, so the natural question is whether those local nonempty fibres can be selected from all at once. A right inverse is exactly such a simultaneous choice of one preimage for every point in the codomain. [quotetheorem:4825] [citeproof:4825] Surjectivity is the necessary hypothesis here: if $p: X \to Y$ misses some $y \in Y$, then no function $s: Y \to X$ can satisfy $p(s(y))=y$. Even for a surjection, the theorem does not identify a canonical section; it only supplies one possible selection of preimages. The exact strength of this statement is captured by the converse. [quotetheorem:4826] [citeproof:4826] Thus splitting surjections is not merely an application of Choice; it is another equivalent way to package simultaneous selection. The content is that every family of nonempty fibres can be treated as the fibres of some surjection, so choosing a section and choosing representatives are the same operation in different notation. This formulation is useful because it replaces many separate choices by a single structural condition on a map. It also prepares the contrast with the next equivalent principle, where the same strength appears as the ability to impose well-orders on arbitrary sets. ## Well-Ordering and Global Selection The next question is whether arbitrary sets can be put into the kind of order that supports transfinite induction and recursion. A well-order on a set supplies a canonical way to choose the least element of any nonempty subset, so it turns selection problems into order-theoretic problems. [quotetheorem:1227] [citeproof:1227] The Well-Ordering Theorem is striking because many familiar sets have no natural displayed well-order. In particular, the usual order on $\mathbb R$ is not a well-order, since intervals such as $(0,1)$ have no least element. [example: A Well-Order Is Not the Usual Order] Define $e:\mathbb N\to\mathbb Z$ by \begin{align*} e(0)&=0,\\ e(2m+1)&=m+1,\\ e(2m+2)&=-(m+1) \end{align*} for $m\in\mathbb N$. Thus the order displayed as \begin{align*} 0,1,-1,2,-2,3,-3,\dots \end{align*} is the order $\prec$ on $\mathbb Z$ given by $a\prec b$ exactly when $e^{-1}(a)<e^{-1}(b)$ in $\mathbb N$. To see that $\prec$ is a well-order, let $S\subseteq\mathbb Z$ be nonempty. Then \begin{align*} T=\{n\in\mathbb N:e(n)\in S\} \end{align*} is nonempty because $S$ contains some integer $s$, and $s=e(n)$ for some $n\in\mathbb N$. Since $\mathbb N$ is well-ordered by its usual order, $T$ has a least element $n_0$. Then $e(n_0)\in S$, and if $s\in S$, say $s=e(n)$, then $n\in T$, so $n_0\le n$. Hence $e(n_0)\preceq s$, proving that $e(n_0)$ is the first element of $S$ in the displayed order. By contrast, the usual order $<$ does not well-order $\mathbb Z$. The subset $\mathbb Z\subseteq\mathbb Z$ is nonempty, but it has no least element: if $z\in\mathbb Z$, then $z-1\in\mathbb Z$ and \begin{align*} z-1<z. \end{align*} So the same set can admit a well-order while its usual algebraically familiar order fails to be one. [/example] The Well-Ordering Theorem implies choice because least elements provide a uniform rule. Conversely, choice well-orders a set by repeatedly choosing the next element not yet chosen and using Hartogs theorem to stop the recursion before it runs too long. [quotetheorem:4827] [citeproof:4827] This theorem is stronger than the familiar fact that many concrete sets can be ordered conveniently. It says every set can be well-ordered, but it does not provide a simple formula for such an order; for instance, the theorem gives a well-order of $\mathbb R$ in ZFC without making it resemble the usual order or any standard analytic construction. Hartogs theorem supplies the stopping argument in the proof: if the recursive choice process continued too long, it would produce an injection from an ordinal too large to inject into $X$. This prepares the passage to Zorn Lemma, where the same transfinite obstruction is hidden inside a maximality principle. [remark: Choice and Definability] A well-order of a set need not be definable from simple parameters, and the Axiom of Choice does not provide a formula naming a preferred well-order. It asserts existence of a set carrying such an order. This distinction matters when comparing ZFC with theories such as ZF plus determinacy, where definability and regularity properties interact strongly with choice principles. [/remark] ## Zorn Lemma and Maximal Principles Many constructions do not try to list all elements of a desired object. Instead, they enlarge partial approximations until no further enlargement is possible. Zorn Lemma is the set-theoretic principle that justifies this maximal-object method. [definition: Chain] Let $(P,\le)$ be a partially ordered set. A chain in $P$ is a subset $C \subset P$ such that for all $x,y \in C$, either $x \le y$ or $y \le x$. [/definition] Chains are the compatible collections of approximations. If every chain has an upper bound, then no obstruction appears when passing from finite or linearly ordered stages to their union. [definition: Upper Bound] Let $(P,\le)$ be a partially ordered set and let $C \subset P$. An upper bound for $C$ in $P$ is an element $u \in P$ such that $c \le u$ for every $c \in C$. [/definition] The target of a maximality argument is not usually a largest element of $P$. We therefore need a weaker endpoint notion that records when an admissible partial object cannot be extended within the chosen order, even if unrelated objects remain elsewhere in the poset. [definition: Maximal Element] Let $(P,\le)$ be a partially ordered set. An element $m \in P$ is maximal if there is no $p \in P$ such that $m < p$. [/definition] These definitions set up the central question of the maximal-object method: when does the ability to bound every linearly ordered family force the existence of an object that admits no further extension? Zorn Lemma answers this question and is the form of Choice most adapted to algebraic and analytic construction arguments. [quotetheorem:1226] [citeproof:1226] Zorn Lemma is usually applied by choosing $P$ to be a set of partial structures ordered by extension. The chain condition says that compatible approximations can be united into another admissible approximation. Since this lemma is often used as a substitute for the Axiom of Choice, we also need to know that it has exactly the same set-theoretic strength. The equivalence result explains why arguments phrased in maximal-element language are not avoiding Choice; they are using an equivalent form better suited to extension problems. [quotetheorem:4828] [citeproof:4828] The hypotheses in Zorn Lemma are genuine. If $P=\varnothing$, there is no maximal element, so nonemptiness cannot be omitted. If $P=\mathbb N$ with the usual order, then $P$ has no maximal element because every $n$ is below $n+1$, and the chain $\mathbb N$ has no upper bound in $P$; this shows why the chain upper-bound condition is the point of the lemma. The equivalence explains why Zorn Lemma appears throughout algebra, topology, and analysis: it is not weaker than Choice, but it is adapted to constructions where the desired object is specified by closure, independence, or extension rather than by an explicit enumeration. [example: A Maximal Extension Principle] Let $X$ be a set, and let $P$ be the collection of all pairs $(A,\preceq)$ where $A\subseteq X$ and $\preceq$ is a well-order of $A$. Order $P$ by end-extension: $(A,\preceq_A)\le (B,\preceq_B)$ when $A\subseteq B$, the order $\preceq_B$ restricts to $\preceq_A$ on $A$, and every element of $B\setminus A$ is placed after every element of $A$. If $\mathcal C$ is a chain in this order, take \begin{align*} A_*=\bigcup_{(A,\preceq)\in\mathcal C} A. \end{align*} Define $x\preceq_* y$ exactly when some member of the chain contains both $x$ and $y$ and orders $x$ before $y$. Because the chain is linearly ordered by end-extension, this definition is independent of which member is chosen. Every nonempty subset of $A_*$ has a first element: choose one of its elements, find a chain member containing it, and use the well-order on that member; if a smaller candidate appears later in the chain, end-extension forces it to have already appeared in the earlier initial segment. Thus $(A_*,\preceq_*)$ is an upper bound for $\mathcal C$. Zorn Lemma therefore gives a maximal partial well-order $(M,\preceq_M)$. If $M\ne X$, choose $x\in X\setminus M$ and put $x$ after every element of $M$; this is a proper end-extension, contradicting maximality. Hence $M=X$, so $X$ is well-ordered. [/example] This example shows why Zorn Lemma is a maximal-extension principle rather than a computational recipe. The chain condition guarantees that compatible partial objects can be merged, and maximality then forces the desired total object. More specialised uses in algebra and topology follow the same set-theoretic pattern, but their subject-specific definitions belong to those later courses. ## Weaker Choice Principles Full Choice is stronger than many ordinary arguments require. It is useful to identify smaller fragments that capture common countable or sequential selection tasks without committing to arbitrary choices over every family. [definition: Countable Choice] The Axiom of Countable Choice states that for every sequence $(X_n)_{n \in \mathbb N}$ of nonempty sets, the product \begin{align*} \prod_{n \in \mathbb N} X_n \end{align*} is nonempty. [/definition] Countable Choice is enough when a proof makes one choice at each natural-number stage and no branching dependence is required beyond the previous finite history. [example: Countable Choice in a Sequential Argument] Suppose $(A_n)_{n \in \mathbb N}$ is a sequence of nonempty subsets of a set $X$. Since each $A_n$ is nonempty, Countable Choice applies to the indexed family $(A_n)_{n\in\mathbb N}$ and gives an element of the product \begin{align*} \prod_{n\in\mathbb N}A_n. \end{align*} By the definition of indexed product, such an element is a function \begin{align*} a:\mathbb N\to \bigcup_{n\in\mathbb N}A_n \end{align*} satisfying \begin{align*} a(n)\in A_n \end{align*} for every $n\in\mathbb N$. Writing $a_n=a(n)$, we obtain a sequence $(a_n)_{n\in\mathbb N}$ with \begin{align*} a_n\in A_n \end{align*} for all $n\in\mathbb N$. Thus Countable Choice supplies exactly one simultaneous sequence of witnesses, one witness for each inhabited set $A_n$. This is the selection needed in diagonal or approximation arguments indexed by $\mathbb N$: the proof does not merely know separately that each $A_n$ has some element; it has a single function $n\mapsto a_n$ recording all those choices at once. [/example] Some recursive constructions need the next choice to depend on the previous value. Dependent Choice captures this situation and is strong enough for many arguments in analysis and descriptive set theory. [definition: Dependent Choice] The Axiom of Dependent Choice states that if $X$ is a nonempty set and $R \subset X \times X$ is a relation such that for every $x \in X$ there exists $y \in X$ with $(x,y) \in R$, then for every $x_0 \in X$ there is a sequence $(x_n)_{n \in \mathbb N}$ in $X$ such that $(x_n,x_{n+1}) \in R$ for all $n \in \mathbb N$. [/definition] This principle is tailored to building infinite paths through serial relations. The first element is specified, and each later element is chosen from the set of permitted successors of the preceding one. The choice principles introduced here are not isolated axioms; they form a hierarchy of strength. To use them responsibly, we need to know which implications are automatic and which require additional set-theoretic assumptions. [quotetheorem:4829] [citeproof:4829] The implications are not reversible over ZF. In particular, Countable Choice and Dependent Choice are designed as weaker principles, and there are models of ZF where some of them hold while full Choice fails. [example: Countable Choice Versus Full Choice] Suppose $(A_n)_{n\in\mathbb N}$ is a sequence of countable sets. For each $n$, let \begin{align*} E_n=\{e:\mathbb N\to A_n:e\text{ is surjective}\}. \end{align*} If every $A_n$ is countable and nonempty, then each $E_n$ is nonempty. Countable Choice applies to the sequence $(E_n)_{n\in\mathbb N}$ and gives a function $F$ such that \begin{align*} F(n)\in E_n \end{align*} for every $n\in\mathbb N$. Thus, for each $n$, the function $F(n):\mathbb N\to A_n$ is a chosen enumeration of $A_n$. Now define a single enumeration of $\bigcup_{n\in\mathbb N}A_n$. For $t\in\mathbb N$, write $t+1=2^r(2s+1)$ with $r,s\in\mathbb N$, and put \begin{align*} g(t)=F(r)(s). \end{align*} Then $g(t)\in A_r\subseteq\bigcup_{n\in\mathbb N}A_n$, so $g:\mathbb N\to\bigcup_{n\in\mathbb N}A_n$. If $x\in\bigcup_{n\in\mathbb N}A_n$, choose $n$ with $x\in A_n$. Since $F(n)$ is surjective onto $A_n$, there is $m\in\mathbb N$ such that \begin{align*} F(n)(m)=x. \end{align*} Set \begin{align*} t=2^n(2m+1)-1. \end{align*} Then $t+1=2^n(2m+1)$, so the definition of $g$ gives \begin{align*} g(t)=F(n)(m)=x. \end{align*} Hence $g$ is surjective, and $\bigcup_{n\in\mathbb N}A_n$ is countable. The choice made here is only the countable selection $n\mapsto F(n)$ of one enumeration for each $A_n$. Choosing one representative from every equivalence class of an arbitrary equivalence relation is different: the family $X/E$ of classes need not be countable, so selecting a function $f:X/E\to X$ with $f(C)\in C$ for all $C\in X/E$ is not supplied by Countable Choice alone unless additional structure gives a canonical representative. [/example] The weaker principles also clarify which parts of ordinary arguments depend on which selection mechanism. A sequential construction may need Dependent Choice, while a maximal-extension argument usually needs a principle equivalent to full Choice. [remark: Why Track Fragments of Choice] Set theorists distinguish fragments of Choice because they interact differently with regularity, determinacy, and the structure of the real line. For this course, the main practical lesson is diagnostic: identify whether a proof chooses along a countable sequence, follows a dependent recursion, or constructs a maximal object from arbitrary chains. The answer usually indicates whether Countable Choice, Dependent Choice, or full Choice is being used. [/remark] Choice makes cardinal arithmetic workable, but it does not eliminate the need to understand the combinatorics of infinite sizes. The next chapter builds arithmetic for cardinals themselves, defining sums, products, and powers and explaining the baseline laws that govern them before deeper structural notions appear. # 7. Cardinal Arithmetic This chapter turns the comparison theory of cardinals into an arithmetic. Once cardinals are represented by initial ordinals and the Well-Ordering Theorem is available, sums, products, and powers of cardinals can be defined by choosing sets of those sizes and counting standard constructions from them. The guiding questions are: which arithmetic laws survive from finite arithmetic, and how much of infinite cardinal arithmetic is decided by Choice alone? ## Cardinal Operations For finite sets, addition counts disjoint unions, multiplication counts Cartesian products, and exponentiation counts function sets. The same constructions make sense for arbitrary sets, but to turn them into operations on cardinals we must check that the answer depends only on the sizes of the inputs. [definition: Cardinal Sum] The cardinal sum is the operation \begin{align*} +:{\operatorname{Card}}\times {\operatorname{Card}}\to {\operatorname{Card}} \end{align*} defined as follows. For cardinals $\kappa$ and $\lambda$, choose disjoint sets $A$ and $B$ with $|A|=\kappa$ and $|B|=\lambda$, and set \begin{align*} \kappa+\lambda=|A\cup B|. \end{align*} [/definition] The disjointness condition records the idea that addition counts two labelled copies. If the original representatives are not disjoint, replace them by $A\times\{0\}$ and $B\times\{1\}$. After counting alternatives from two labelled sources, the next basic operation counts simultaneous choices. This requires ordered pairs rather than disjoint unions, because one element is chosen from each representative at the same time. [definition: Cardinal Product] The cardinal product is the operation \begin{align*} \cdot:{\operatorname{Card}}\times {\operatorname{Card}}\to {\operatorname{Card}} \end{align*} defined as follows. For cardinals $\kappa$ and $\lambda$, choose sets $A$ and $B$ with $|A|=\kappa$ and $|B|=\lambda$, and set \begin{align*} \kappa\cdot\lambda=|A\times B|. \end{align*} [/definition] Products measure the number of pairs of choices. They are the set-theoretic source of familiar finite multiplication, but infinite products collapse in ways that have no finite analogue. Many constructions require not just one or two choices, but a whole family of choices indexed by a set. Cardinal exponentiation records the size of such function spaces, making it the arithmetic operation behind power sets and sequences. [definition: Cardinal Exponentiation] Cardinal exponentiation is the operation \begin{align*} (\kappa,\lambda)\mapsto \kappa^\lambda:{\operatorname{Card}}\times {\operatorname{Card}}\to {\operatorname{Card}} \end{align*} defined as follows. For cardinals $\kappa$ and $\lambda$, choose sets $A$ and $B$ with $|A|=\kappa$ and $|B|=\lambda$, and set \begin{align*} \kappa^\lambda=|A^B|, \end{align*} where $A^B$ denotes the set of functions $f:B\to A$. [/definition] Exponentiation counts independent choices indexed by the exponent. The notation agrees with the finite case: if $|A|=m$ and $|B|=n$, then a function $B\to A$ is an ordered list of $n$ entries from $A$. These operations were defined using chosen representative sets, so they still need a well-definedness check. The issue is whether replacing representatives by bijective ones changes the resulting disjoint union, product, or function set. [quotetheorem:4830] [citeproof:4830] This theorem lets us calculate with cardinals rather than with particular sets. The hypotheses are exactly about preserving size: if $A=\varnothing$ and $A'=\{0\}$, then $|A\sqcup B|$ and $|A'\sqcup B|$ need not agree, so equality of representative cardinals cannot be weakened. The theorem also does not say that the chosen representatives are canonically the same set; it says that any two choices give bijective results, which is the correct invariant statement for cardinal arithmetic. The same bijective viewpoint will be used repeatedly below: commutativity, associativity, and distributivity come from explicit bijections between disjoint unions, products, and function sets, while the genuinely new phenomena begin when these constructions are applied to infinite cardinals. [example: The Square Of The Continuum] Let $2^\omega$ denote the set of binary sequences $\omega\to\{0,1\}$. Its cardinality is the continuum $2^{\aleph_0}$, because $2^\omega$ is the set of all functions from $\omega$ to a two-element set. Define \begin{align*} I:2^\omega\times 2^\omega&\to 2^\omega,\\ I(f,g)(2n)&=f(n),\\ I(f,g)(2n+1)&=g(n). \end{align*} For $h\in 2^\omega$, define two binary sequences \begin{align*} h_0(n)&=h(2n),\\ h_1(n)&=h(2n+1). \end{align*} Then \begin{align*} I(h_0,h_1)(2n)&=h_0(n)=h(2n),\\ I(h_0,h_1)(2n+1)&=h_1(n)=h(2n+1), \end{align*} so $I(h_0,h_1)=h$. Also, for $(f,g)\in 2^\omega\times 2^\omega$, \begin{align*} (I(f,g))_0(n)&=I(f,g)(2n)=f(n),\\ (I(f,g))_1(n)&=I(f,g)(2n+1)=g(n), \end{align*} so the inverse construction recovers $(f,g)$. Hence $I$ is a bijection, and therefore \begin{align*} |2^\omega\times 2^\omega| &=|2^\omega|\\ &=2^{\aleph_0}. \end{align*} Thus the square of the continuum still has continuum size: interleaving two binary sequences produces exactly one binary sequence, and splitting even and odd coordinates recovers the pair. [/example] ## Infinite Sums And Products Under Choice The first major simplification is that finite-looking arithmetic is no longer the right guide once both cardinals are infinite. The central problem is to show that an infinite set can be paired with two copies of itself, and even with its own square, without changing its cardinality. [quotetheorem:4831] [citeproof:4831] This is the basic arithmetic theorem behind many informal size computations. The nonzero condition in the product statement is necessary: for every cardinal $\kappa$, one has $\kappa\cdot 0=0$, so an infinite $\kappa$ cannot satisfy $\kappa\cdot 0=\max\{\kappa,0\}=\kappa$. The use of Choice is also not cosmetic; the proof compares arbitrary cardinals by replacing them with well-ordered representatives, and in ZF some cardinal comparisons and product simplifications can fail. Finally, the theorem is only about finite sums and finite products. It gives $\kappa^n=\kappa$ for each fixed $0<n<\omega$, but it gives no collapse for infinite products or exponentials, where $\kappa^{\aleph_0}$ and $2^\kappa$ may be strictly larger than $\kappa$. [example: Finite Subsets Of An Infinite Cardinal] Let $\kappa$ be an infinite cardinal, viewed as its initial ordinal, and let $[\kappa]^{<\omega}$ be the set of finite subsets of $\kappa$. For $n=0$, the set $[\kappa]^0=\{\varnothing\}$ has cardinal $1\le \kappa$. For $n>0$, send each $F\in[\kappa]^n$ to its increasing enumeration \begin{align*} F=\{\alpha_0<\alpha_1<\cdots<\alpha_{n-1}\} \end{align*} as the tuple $(\alpha_0,\ldots,\alpha_{n-1})\in\kappa^n$. Different $n$-element subsets have different increasing enumerations, so \begin{align*} |[\kappa]^n|\le |\kappa^n|. \end{align*} Also $\kappa^1=\kappa$, and if $\kappa^n=\kappa$ for some $n>0$, then by *Infinite Cardinal Absorption*, \begin{align*} \kappa^{n+1} &=\kappa^n\cdot\kappa\\ &=\kappa\cdot\kappa\\ &=\kappa. \end{align*} Thus $|[\kappa]^n|\le\kappa$ for every $n<\omega$. Now every finite subset has exactly one finite size, so \begin{align*} [\kappa]^{<\omega} &=\bigcup_{n<\omega}[\kappa]^n. \end{align*} Choose, for each $n<\omega$, an injection $e_n:[\kappa]^n\to\kappa$; this is possible because $|[\kappa]^n|\le\kappa$. Define \begin{align*} E:[\kappa]^{<\omega}&\to \omega\times\kappa,\\ E(F)&=(|F|,e_{|F|}(F)). \end{align*} If $E(F)=E(G)$, then $|F|=|G|=n$ and $e_n(F)=e_n(G)$, so $F=G$ because $e_n$ is injective. Hence \begin{align*} |[\kappa]^{<\omega}| &\le |\omega\times\kappa|\\ &=\aleph_0\cdot\kappa\\ &=\kappa \end{align*} by *Infinite Cardinal Absorption*. Conversely, the singleton map \begin{align*} \kappa&\to[\kappa]^{<\omega},\\ \alpha&\mapsto\{\alpha\} \end{align*} is injective, so $\kappa\le |[\kappa]^{<\omega}|$. Therefore \begin{align*} |[\kappa]^{<\omega}|=\kappa. \end{align*} The set of all finite subsets is no larger than $\kappa$ itself: finitely many choices from an infinite cardinal can be coded back into one element of size $\kappa$. [/example] The next example shows how exponentiation can still increase size. A countable sequence of natural numbers carries enough information to code a real number, and no more information than a subset of $\omega\times\omega$. [example: Baire Space Has Size Continuum] The set $\omega^\omega$ consists of all functions $f:\omega\to\omega$. For each $f\in\omega^\omega$, let \begin{align*} \Gamma(f)=\{(n,f(n)):n\in\omega\}\subseteq\omega\times\omega. \end{align*} If $\Gamma(f)=\Gamma(g)$ and $n\in\omega$, then $(n,f(n))\in\Gamma(g)$, so $(n,f(n))=(n,g(n))$ and therefore $f(n)=g(n)$. Hence $f=g$, so $f\mapsto\Gamma(f)$ is injective and \begin{align*} |\omega^\omega|\le |\mathcal P(\omega\times\omega)|. \end{align*} Now define \begin{align*} p:\omega\times\omega&\to\omega,\\ p(m,n)&=2^m(2n+1)-1. \end{align*} For every $k\in\omega$, the positive integer $k+1$ has a unique form $2^m q$ with $q$ odd, and then $q=2n+1$ for a unique $n\in\omega$. Thus $p$ is a bijection. It induces a bijection \begin{align*} \mathcal P(\omega\times\omega)&\to\mathcal P(\omega),\\ S&\mapsto p[S], \end{align*} with inverse $T\mapsto p^{-1}[T]$. Therefore \begin{align*} |\mathcal P(\omega\times\omega)| &=|\mathcal P(\omega)|\\ &=2^{\aleph_0}, \end{align*} and hence $|\omega^\omega|\le 2^{\aleph_0}$. Conversely, since $\{0,1\}\subseteq\omega$, every binary sequence is also a function $\omega\to\omega$. The inclusion \begin{align*} 2^\omega&\to\omega^\omega,\\ s&\mapsto s \end{align*} is injective, so \begin{align*} 2^{\aleph_0}=|2^\omega|\le|\omega^\omega|. \end{align*} Combining the two inequalities gives \begin{align*} |\omega^\omega|=2^{\aleph_0}. \end{align*} Thus a sequence of natural numbers has exactly continuum-many possibilities: its graph is no more data than a subset of $\omega$, while binary sequences already supply continuum-many examples. [/example] ## Cantor And Konig Inequalities Absorption theorems describe sums and products under Choice, but exponentiation has a built-in source of growth. The power set operation always escapes the original set, and Konig's theorem gives a broader version for sums of smaller cardinals. [quotetheorem:621] [citeproof:621] Cantor's argument is the prototype for diagonal methods throughout logic. Its strength is universal: no choice principle is needed, and no structural information about $A$ is required. Its limitation is just as important: it gives only the lower bound $\kappa<2^\kappa$, not the identity of the next cardinal. Thus it proves that the continuum is uncountable and that $2^{\aleph_0}$ is followed by still larger cardinals such as $2^{2^{\aleph_0}}$, but it does not decide whether $2^{\aleph_0}=\aleph_1$, $\aleph_2$, or something much larger. [quotetheorem:4832] [citeproof:4832] Konig's theorem is stronger than [Cantor's theorem](/theorems/621) in a useful direction. Taking $I=A$, $\kappa_i=1$, and $\lambda_i=2$ recovers the idea that no diagonal enumeration can list all binary choices indexed by $A$. The strict pointwise hypothesis is essential: if $I=\{0\}$ and $\kappa_0=\lambda_0$, then the conclusion would read $\kappa_0<\lambda_0$, which is false. The theorem also compares a sum with a full product; it is not a statement that every product is large in isolation, since empty factors make products collapse and finite products of infinite cardinals are governed instead by absorption. Its role is to identify the diagonal obstruction that remains after the easy sum and product calculations have been settled. [remark: Cofinality Consequence] Konig's theorem implies that the cofinality of $2^\kappa$ is greater than $\kappa$ in many standard arguments involving increasing unions of length at most $\kappa$. The general cofinality theory belongs to the next chapter, but this theorem is the first signal that exponentiation is constrained by more than monotonicity. [/remark] ## The Continuum Function, CH, And GCH The final question in this chapter is how much ZFC decides about the function $\kappa\mapsto 2^\kappa$. Cantor gives the lower bound $\kappa^+\le 2^\kappa$, while monotonicity gives $2^\kappa\le 2^\lambda$ whenever $\kappa\le\lambda$. The Continuum Hypothesis and the Generalized Continuum Hypothesis assert that the lower bound is always attained at specific cardinals. [definition: Continuum Function] The continuum function is the class function on cardinals defined by \begin{align*} \kappa\longmapsto 2^\kappa. \end{align*} [/definition] This function records the sizes of power sets. Its first value at an infinite cardinal is $2^{\aleph_0}$, the cardinality of $\mathcal P(\omega)$ and of $\mathbb R$. Cantor theorem only tells us that this first infinite value is larger than $\aleph_0$. The sharp question is whether it is the immediate next cardinal, or whether an intermediate size can sit between countable sets and the continuum. [definition: Continuum Hypothesis] The Continuum Hypothesis is the statement \begin{align*} 2^{\aleph_0}=\aleph_1. \end{align*} [/definition] Thus CH says that there is no cardinal strictly between the countable cardinal and the continuum. In terms of well-orderings of the reals, it says that the reals can be well-ordered in type $\omega_1$. The same question can be asked above every infinite cardinal: does the power set operation always jump exactly one cardinal step? The generalized form states that this minimal-growth pattern holds uniformly across the whole infinite cardinal hierarchy. [definition: Generalized Continuum Hypothesis] The Generalized Continuum Hypothesis is the statement that for every infinite cardinal $\kappa$, \begin{align*} 2^\kappa=\kappa^+. \end{align*} [/definition] GCH extends CH from $\aleph_0$ to all infinite cardinals. It is a formal statement in the language of set theory because cardinals, successor cardinals, and power sets are all definable in ZFC. [remark: Independence Not Proved Here] This course treats CH and GCH as formal statements and does not prove their independence from ZFC. The later forcing course explains why ZFC cannot decide CH, assuming ZFC is consistent, and why the continuum function is subject to both deep freedom and structural restrictions. [/remark] The arithmetic developed in this chapter gives the baseline against which those later independence results are measured. Choice settles finite sums and products of infinite cardinals, Cantor forces strict growth under powers, and Konig's theorem supplies a diagonal obstruction that any proposed continuum function must respect. Cardinal arithmetic reveals the basic constraints on size, yet many arguments require a finer measure of how sets are approached from below. The next chapter introduces cofinality and regularity to capture that approximation behavior, separating cardinals that are internally robust from those built as limits of shorter stages. # 8. Cofinality and Regular Cardinals This chapter studies how large ordered sets are approached from below. Earlier chapters developed ordinals and cardinals as canonical representatives of well-order types and sizes; cofinality measures the least length of an unbounded increasing approximation to an ordinal. The regular-singular distinction is then a structural distinction among cardinals, and it becomes one of the main organising ideas in later combinatorial set theory. ## Approaching an Ordinal from Below When an ordinal is too large to list outright, the next question is whether it can be reached by a smaller sequence of earlier ordinals. For finite ordinals this question is uninteresting, but for limit ordinals it captures a genuine difference between order type and size. [definition: Cofinal Subset of an Ordinal] Let $\alpha$ be an ordinal. A subset $C \subset \alpha$ is cofinal in $\alpha$ if for every $\beta < \alpha$ there exists $\gamma \in C$ such that $\beta \le \gamma$. [/definition] The word cofinal means that $C$ is not trapped below any particular point of $\alpha$. In terms of suprema, a subset $C \subset \alpha$ is cofinal in $\alpha$ precisely when $\sup C = \alpha$, where the supremum is taken in the class of ordinals. [example: Natural Numbers Are Cofinal in Omega] Take $\alpha=\omega$ and $C=\omega$. To check that $C$ is cofinal in $\omega$, let $n<\omega$ be arbitrary. Since $n\in\omega=C$, we may choose $\gamma=n$, and then \begin{align*} n \le \gamma \quad\text{because}\quad \gamma=n. \end{align*} Thus every $n<\omega$ is below some member of $C$, so $C$ is cofinal in $\omega$. Now let $F\subset\omega$ be finite. If $F=\varnothing$, then $0<\omega$ has no witness in $F$, so $F$ is not cofinal. If $F\neq\varnothing$, let $m$ be the largest element of $F$. Then $m+1<\omega$, and for every $\gamma\in F$ we have $\gamma\le m<m+1$. Hence no $\gamma\in F$ satisfies $m+1\le\gamma$, so $F$ is not cofinal in $\omega$. Therefore $\omega$ itself is cofinal in $\omega$, but no finite subset of $\omega$ reaches arbitrarily far below $\omega$. [/example] This example suggests that cofinality should record the least possible order type of a cofinal approximation, not merely whether cofinal subsets exist. The definition uses functions because sequences indexed by ordinals are the standard ZFC way of formalising long approximations. To make this invariant comparable across different ordinals, we measure the shortest ordinal length of a cofinal map. That definition captures how many stages are genuinely needed to approach the ordinal from below. [definition: Cofinality of an Ordinal] Let $\alpha$ be an ordinal. The cofinality of $\alpha$, denoted $\operatorname{cf}(\alpha)$, is the least ordinal $\delta$ for which there exists a function $f: \delta \to \alpha$ whose range is cofinal in $\alpha$. [/definition] The least such $\delta$ is always a cardinal, since replacing $\delta$ by its initial ordinal gives a cofinal map of no larger order type. Thus $\operatorname{cf}(\alpha)$ is usually treated as a cardinal invariant of the order structure of $\alpha$. The first task after defining cofinality is to show that the least possible length is a cardinal, not merely some arbitrary ordinal representative. If a shorter ordinal had the same size as the least cofinal index, a bijection would transport the cofinal map to that shorter index and contradict minimality. The theorem below records this invariance. [quotetheorem:4833] [citeproof:4833] The point of the theorem is that $\operatorname{cf}(\alpha)$ is a genuine cardinal invariant of $\alpha$. Different cofinal maps may have different domains, but the minimal possible domain is the initial ordinal of its cardinality. This is why cofinality can be compared with other cardinals in later cardinal-arithmetic arguments. Before applying this flexibility, it is useful to fix the two extreme cases. Successor ordinals have a final element, so approaching them from below is unnecessary, while limit ordinals have no last point and therefore require a genuinely cofinal set of earlier ordinals. These elementary cases provide the baseline against which singular and regular behaviour will be measured. [quotetheorem:4834] [citeproof:4834] These facts separate two roles played by ordinals: a successor ordinal has a last point, while a limit ordinal must be approached by a genuinely infinite collection of earlier stages. [example: Cofinality of Omega Plus Omega] Let $\alpha=\omega+\omega$, and let \begin{align*} C=\{\omega+n:n<\omega\}. \end{align*} We first show that $C$ is cofinal in $\omega+\omega$. Let $\beta<\omega+\omega$ be arbitrary. Since $\omega+\omega=\sup_{n<\omega}(\omega+n)$, there is some $n<\omega$ such that \begin{align*} \beta<\omega+n. \end{align*} Then $\gamma=\omega+n$ belongs to $C$, and \begin{align*} \beta\le \gamma \quad\text{because}\quad \beta<\omega+n=\gamma. \end{align*} Thus $C$ is cofinal in $\omega+\omega$. Now define $f:\omega\to\omega+\omega$ by \begin{align*} f(n)=\omega+n. \end{align*} Its range is exactly $C$, so the definition of cofinality gives \begin{align*} \operatorname{cf}(\omega+\omega)\le \omega. \end{align*} The ordinal $\omega+\omega$ is a nonzero limit ordinal: if $n<\omega$, then \begin{align*} (\omega+n)+1=\omega+(n+1)<\omega+\omega, \end{align*} so no point $\omega+n$ is last. Hence, by *[Basic Cofinality Properties](/theorems/4834)*, $\operatorname{cf}(\omega+\omega)$ is an infinite cardinal. Since the only infinite cardinal less than or equal to $\omega$ is $\omega$ itself, we get \begin{align*} \operatorname{cf}(\omega+\omega)=\omega. \end{align*} So $\omega+\omega$ has larger order type than $\omega$, but it is still approached cofinally by an $\omega$-sequence. [/example] The order type $\omega+\omega$ is larger than $\omega$, but it is still reached by an $\omega$-sequence. This is the first warning that cofinality is not the same thing as cardinality. ## Regular and Singular Cardinals Cardinals can be viewed as ordinals, so they have cofinalities. The guiding question is whether a cardinal can be assembled from fewer smaller pieces, or whether every unbounded approximation already has full length. [definition: Regular Cardinal] An infinite cardinal $\kappa$ is regular if $\operatorname{cf}(\kappa)=\kappa$. [/definition] A regular cardinal cannot be reached by a sequence of length smaller than itself. This turns a statement about the order structure of the initial ordinal $\kappa$ into a useful size principle. The complementary case is just as important: some infinite cardinals are limits of smaller stages in fewer than their own number of steps. Naming that failure of regularity lets us distinguish cardinals by how they are built from below. [definition: Singular Cardinal] An infinite cardinal $\kappa$ is singular if $\operatorname{cf}(\kappa)<\kappa$. [/definition] Singular cardinals are limits of smaller cardinals in a strong sense: they admit cofinal sequences whose length is below the cardinal itself. The most basic example is obtained by taking the first limit of the aleph sequence. [example: Cofinality of Aleph Omega] Let $\aleph_\omega=\sup_{n<\omega}\aleph_n$, and define $f:\omega\to\aleph_\omega$ by \begin{align*} f(n)=\aleph_n. \end{align*} We first check that the range of $f$ is cofinal in $\aleph_\omega$. Let $\beta<\aleph_\omega$ be arbitrary. If there were no $n<\omega$ with $\beta<\aleph_n$, then $\aleph_n\le \beta$ for every $n<\omega$, so $\beta$ would be an upper bound for $\{\aleph_n:n<\omega\}$. This contradicts \begin{align*} \aleph_\omega=\sup_{n<\omega}\aleph_n \quad\text{and}\quad \beta<\aleph_\omega. \end{align*} Hence there is some $n<\omega$ such that $\beta<\aleph_n=f(n)$, and therefore $\beta\le f(n)$. Thus $f$ has cofinal range in $\aleph_\omega$, so by the definition of cofinality, \begin{align*} \operatorname{cf}(\aleph_\omega)\le \omega. \end{align*} The ordinal $\aleph_\omega$ is a nonzero limit ordinal, since it is the supremum of the strictly increasing sequence \begin{align*} \aleph_0<\aleph_1<\aleph_2<\cdots. \end{align*} By *Basic Cofinality Properties*, $\operatorname{cf}(\aleph_\omega)$ is an infinite cardinal. Since \begin{align*} \operatorname{cf}(\aleph_\omega)\le\omega \end{align*} and $\omega$ is the least infinite cardinal, it follows that \begin{align*} \operatorname{cf}(\aleph_\omega)=\omega. \end{align*} Finally, \begin{align*} \omega=\aleph_0<\aleph_1\le \aleph_\omega, \end{align*} so $\operatorname{cf}(\aleph_\omega)=\omega<\aleph_\omega$. Therefore $\aleph_\omega$ is singular. [/example] The preceding example shows that singularity is not exceptional at limit stages of the cardinal hierarchy. The obstruction is that a limit cardinal may be assembled from many smaller cardinals, so its size and its cofinality can pull in different directions. To control this tension, one needs an inequality saying that products of larger coordinates cannot be swallowed by sums of smaller coordinates when the index set is measured by cofinality. [quotetheorem:4835] [citeproof:4835] Konig's theorem is the basic anti-collapse principle for cardinal arithmetic: coordinatewise strict inequalities cannot be erased by taking a large product on one side and a sum on the other. Its force comes from combining many local size gaps into one global contradiction, which is why the cofinality of the index set matters. In practice, choosing the $\lambda_i$ to be successors turns the theorem into a tool for ruling out impossible exponentiation and summation identities, and it also gives a clean route to the [regularity of successor cardinals](/theorems/4836) under Choice. [quotetheorem:4836] [citeproof:4836] This theorem is one of the main practical uses of Choice in elementary cardinal arithmetic. It says that singular behaviour first appears at limit cardinals, never immediately after a cardinal. [example: Omega One Is Regular] Taking $\kappa=\omega$ in *Successor Cardinals Are Regular* gives \begin{align*} \operatorname{cf}(\omega^+)=\omega^+. \end{align*} Since $\omega^+=\omega_1=\aleph_1$, this is \begin{align*} \operatorname{cf}(\omega_1)=\omega_1, \end{align*} so $\omega_1$ is regular. We now spell out the boundedness consequence. Let $A\subset\omega_1$ be countable. If $A$ were unbounded in $\omega_1$, then some enumeration of $A$ by a countable ordinal would give a function \begin{align*} f:\delta\to\omega_1 \end{align*} with $\delta\le\omega$ and cofinal range. By the definition of cofinality, this would imply \begin{align*} \operatorname{cf}(\omega_1)\le |\delta|\le\omega. \end{align*} But regularity gives \begin{align*} \operatorname{cf}(\omega_1)=\omega_1>\omega, \end{align*} a contradiction. Hence $A$ is not unbounded, so there is some $\beta<\omega_1$ such that \begin{align*} \alpha\le\beta \end{align*} for every $\alpha\in A$. Thus every countable subset of $\omega_1$ is bounded below a countable ordinal, which is the form of regularity used later in club and stationary set arguments. [/example] The contrast between $\omega_1$ and $\aleph_\omega$ is central: both are limit ordinals, but only $\omega_1$ is protected by being a successor cardinal. ## Closed Unbounded Sets in Regular Cardinals Cofinality becomes especially powerful when combined with the order topology on a regular uncountable cardinal. The motivating problem is to identify large subsets of $\kappa$ that are stable under limits and still reach arbitrarily high below $\kappa$. [definition: Unbounded Subset of a Cardinal] Let $\kappa$ be an infinite cardinal. A set $C \subset \kappa$ is unbounded in $\kappa$ if for every $\alpha < \kappa$ there exists $\beta \in C$ such that $\alpha < \beta$. [/definition] For subsets of a limit ordinal, unboundedness is the same cofinality condition as before, with a strict inequality convention that is more natural for cardinals. Reach alone is not enough for the later arguments, because an unbounded set may climb toward a limit while omitting the limit point that its own approximations determine. The missing stability condition is closure under such limits: whenever the set is already cofinal below a limit stage, the limit stage itself should be included. [definition: Closed Subset of a Regular Cardinal] Let $\kappa$ be a regular uncountable cardinal. A set $C \subset \kappa$ is closed in $\kappa$ if whenever $\lambda < \kappa$ is a limit ordinal and $C \cap \lambda$ is unbounded in $\lambda$, then $\lambda \in C$. [/definition] This is closure in the order topology on $\kappa$. It says that if points of $C$ approach a limit stage below $\kappa$, the limit stage is also included. [definition: Club Subset] Let $\kappa$ be a regular uncountable cardinal. A set $C \subset \kappa$ is club in $\kappa$ if $C$ is closed in $\kappa$ and unbounded in $\kappa$. [/definition] Club sets are large in two different senses: unboundedness gives reach, while closure gives stability under limits. Later stationary set theory studies subsets of $\kappa$ that meet every club. [example: Limit Ordinals Form a Club in Omega One] Let \begin{align*} C=\{\lambda<\omega_1:\lambda\text{ is a nonzero limit ordinal}\}. \end{align*} We show first that $C$ is unbounded in $\omega_1$. Let $\alpha<\omega_1$. Then $\alpha$ is a countable ordinal, so \begin{align*} \alpha+\omega=\sup_{n<\omega}(\alpha+n) \end{align*} is still countable, because it is the union of the countable set $\alpha$ with the countable sequence of new finite successors. Hence \begin{align*} \alpha+\omega<\omega_1. \end{align*} Also $\alpha+\omega$ is a nonzero limit ordinal: for each $n<\omega$, \begin{align*} \alpha+n<\alpha+(n+1)<\alpha+\omega, \end{align*} and \begin{align*} \sup_{n<\omega}(\alpha+n)=\alpha+\omega. \end{align*} Thus $\alpha+\omega\in C$, and since \begin{align*} \alpha<\alpha+\omega, \end{align*} there is a member of $C$ strictly above the arbitrary $\alpha<\omega_1$. Now let $\lambda<\omega_1$ be a limit ordinal such that $C\cap\lambda$ is unbounded in $\lambda$. Since $\lambda<\omega_1$, the ordinal $\lambda$ is countable. Since $\lambda$ is already a nonzero limit ordinal in the closure test, it satisfies \begin{align*} \lambda<\omega_1 \quad\text{and}\quad \lambda\text{ is a nonzero limit ordinal}, \end{align*} so $\lambda\in C$. Therefore $C$ is closed in $\omega_1$. The set $C$ is both closed and unbounded in $\omega_1$, so $C$ is club in $\omega_1$. [/example] This example is the prototype: clubs often arise as collections of stages closed under a construction. The useful feature of such sets is not just that one construction has many closure points, but that finitely many club requirements can be imposed at once without losing largeness. The possible obstruction is that intersections can destroy unboundedness, so regularity and closure must work together to keep the common part cofinal. [quotetheorem:4837] [citeproof:4837] The regularity assumption is doing real work here: it keeps the diagonal construction from escaping $\kappa$. This closure-under-intersection fact is the first combinatorial reason that regular uncountable cardinals behave like well-controlled ambient spaces. [example: A Club of Closure Points of a Function] Let $f:\omega_1\to\omega_1$ be any function, and define \begin{align*} C_f=\{\alpha<\omega_1:f[\alpha]\subset\alpha\}. \end{align*} We show that $C_f$ is club in $\omega_1$. First we prove unboundedness. Fix $\beta<\omega_1$, and set \begin{align*} \alpha_0=\beta+1. \end{align*} Having chosen a countable ordinal $\alpha_n<\omega_1$, the set $\alpha_n$ is countable, so $f[\alpha_n]$ is countable. Hence \begin{align*} s_n=\sup f[\alpha_n] \end{align*} is a countable ordinal below $\omega_1$. Choose \begin{align*} \alpha_{n+1}=\max\{\alpha_n+1,s_n+1\}. \end{align*} Then \begin{align*} \alpha_n<\alpha_{n+1}<\omega_1 \quad\text{and}\quad s_n<\alpha_{n+1}. \end{align*} Let \begin{align*} \alpha=\sup_{n<\omega}\alpha_n. \end{align*} Since $\{\alpha_n:n<\omega\}$ is countable and each $\alpha_n$ is countable, its supremum is still a countable ordinal; therefore \begin{align*} \alpha<\omega_1. \end{align*} Also $\beta<\alpha_0\le\alpha$. We now check that $\alpha\in C_f$. Let $\xi<\alpha$. Since $\alpha=\sup_{n<\omega}\alpha_n$, there is some $n<\omega$ such that \begin{align*} \xi<\alpha_n. \end{align*} Thus $\xi\in\alpha_n$, so \begin{align*} f(\xi)\in f[\alpha_n]. \end{align*} By the definition of $s_n$, \begin{align*} f(\xi)\le s_n<\alpha_{n+1}. \end{align*} Because the sequence $(\alpha_n)_{n<\omega}$ is strictly increasing, every $\alpha_{n+1}$ is below its supremum $\alpha$, so \begin{align*} f(\xi)<\alpha. \end{align*} Since this holds for every $\xi<\alpha$, we have $f[\alpha]\subset\alpha$, and hence $\alpha\in C_f$. Thus above every $\beta<\omega_1$ there is some member of $C_f$, so $C_f$ is unbounded in $\omega_1$. Now we prove closure. Let $\lambda<\omega_1$ be a limit ordinal, and suppose that $C_f\cap\lambda$ is unbounded in $\lambda$. To show $\lambda\in C_f$, take any $\xi<\lambda$. By unboundedness of $C_f\cap\lambda$ in $\lambda$, there is some $\alpha\in C_f\cap\lambda$ such that \begin{align*} \xi<\alpha<\lambda. \end{align*} Since $\alpha\in C_f$, we have $f[\alpha]\subset\alpha$. In particular, \begin{align*} f(\xi)\in f[\alpha]\subset\alpha<\lambda, \end{align*} so $f(\xi)<\lambda$. Therefore $f[\lambda]\subset\lambda$, and hence $\lambda\in C_f$. The set $C_f$ is closed and unbounded in $\omega_1$, so it is a club subset of $\omega_1$. [/example] This construction explains why clubs are the right preparation for later combinatorics: they encode closure under operations while remaining unavoidable inside a regular uncountable cardinal. Cofinality shows how large ordinals and cardinals are reached, but the next step is to detect sets that meet every sufficiently rich approximation scheme. The next chapter does this with club and stationary sets, and Fodor's lemma turns that interaction into a powerful combinatorial principle. # 9. Stationary Sets and Fodor's Lemma This chapter introduces the club and stationary subsets of regular uncountable cardinals. The guiding problem is to make precise the idea that a subset of a large cardinal is too large to avoid by any closed unbounded approximation process. The preceding chapter developed cofinality and regularity; here those notions become tools for measuring largeness inside a regular cardinal, especially through the club filter and [Fodor's pressing down lemma](/theorems/4841). ## Closed Unbounded Sets and Stationarity Many arguments below a regular cardinal $\kappa$ build objects by stages indexed by smaller ordinals and then take a limit stage. We want a class of subsets of $\kappa$ that is closed under such limit-stage constructions and still reaches cofinally far through $\kappa$. [definition: Club Subset] Let $\kappa$ be a regular uncountable cardinal. A set $C \subseteq \kappa$ is club in $\kappa$ if: 1. $C$ is unbounded in $\kappa$: for every $\alpha < \kappa$ there is $\beta \in C$ with $\alpha < \beta$; 2. $C$ is closed in $\kappa$: whenever $\lambda < \kappa$ is a limit ordinal and $C \cap \lambda$ is unbounded in $\lambda$, then $\lambda \in C$. [/definition] The closure condition says that increasing sequences from $C$ of length below $\kappa$ have their limit back in $C$, provided the limit is still below $\kappa$. Regularity of $\kappa$ is what makes this closure compatible with the length of the constructions used in the course. [example: Countable Limit Ordinals Are Club] Let $C=\{\alpha<\omega_1:\alpha\text{ is a countable limit ordinal}\}$. We show that $C$ is unbounded and closed in $\omega_1$. If $\alpha<\omega_1$, then $\alpha$ is countable, so \begin{align*} \alpha+\omega=\sup_{n<\omega}(\alpha+n) \end{align*} is a supremum of countably many countable ordinals, hence is still countable and therefore lies below $\omega_1$. Also $\alpha<\alpha+1\le \alpha+\omega$, and $\alpha+\omega$ is a limit ordinal because the sequence \begin{align*} \alpha<\alpha+1<\alpha+2<\cdots \end{align*} is cofinal in it. Thus $\alpha+\omega\in C$ and is above $\alpha$, so $C$ is unbounded in $\omega_1$. For closure, let $\lambda<\omega_1$ be a limit ordinal such that $C\cap\lambda$ is unbounded in $\lambda$. Since $\lambda<\omega_1$, the ordinal $\lambda$ is countable. The unboundedness of $C\cap\lambda$ means that for every $\gamma<\lambda$ there is some $\beta\in C$ with \begin{align*} \gamma<\beta<\lambda. \end{align*} So $\lambda$ cannot be $0$ or a successor ordinal; it is a limit ordinal. Hence $\lambda$ is a countable limit ordinal, so $\lambda\in C$. Therefore $C$ is closed and unbounded in $\omega_1$, i.e. $C$ is club in $\omega_1$. [/example] Club sets are the sets that contain all sufficiently coherent limit stages of some construction. The next notion turns clubs into tests for largeness. A subset of $\kappa$ may be large in cardinality and still miss an important closed unbounded set; the stronger requirement is that it cannot avoid any club at all. This captures sets that remain unavoidable for every coherent approximation process on the cardinal. [definition: Stationary Subset] Let $\kappa$ be a regular uncountable cardinal. A set $S \subseteq \kappa$ is stationary in $\kappa$ if $S \cap C \ne \varnothing$ for every club set $C \subseteq \kappa$. [/definition] A stationary set need not contain a tail segment of $\kappa$, and it can be split into many disjoint stationary pieces in later arguments. The definition says instead that every closed unbounded approximation must meet it. [example: Cofinality Omega Below Omega One] Let $S=\{\alpha<\omega_1:\operatorname{cf}(\alpha)=\omega\}$, and let $C\subseteq\omega_1$ be club. We show that $S\cap C\ne\varnothing$. Since $C$ is unbounded in $\omega_1$, choose $\alpha_0\in C$ with $0<\alpha_0$. Having chosen $\alpha_n\in C$, unboundedness of $C$ gives $\alpha_{n+1}\in C$ with \begin{align*} \alpha_n<\alpha_{n+1}<\omega_1. \end{align*} Thus $(\alpha_n)_{n\in\mathbb N}$ is a strictly increasing sequence of points of $C$. Put \begin{align*} \lambda=\sup_{n\in\mathbb N}\alpha_n. \end{align*} Each $\alpha_n$ is a countable ordinal, and $\lambda=\bigcup_{n\in\mathbb N}\alpha_n$ is a [countable union of countable sets](/theorems/755), so $\lambda$ is countable and hence $\lambda<\omega_1$. Also $\lambda$ is not equal to any $\alpha_m$, because $\alpha_m<\alpha_{m+1}\le\lambda$ for every $m$. Therefore $\lambda$ is a limit ordinal. For every $\gamma<\lambda$, the definition of supremum gives some $n$ with $\gamma<\alpha_n$, and then $\alpha_n\in C\cap\lambda$. Hence $C\cap\lambda$ is unbounded in $\lambda$. Since $C$ is closed in $\omega_1$ and $\lambda<\omega_1$ is a limit ordinal, it follows that $\lambda\in C$. The sequence $(\alpha_n)_{n\in\mathbb N}$ is cofinal in $\lambda$, so $\operatorname{cf}(\lambda)\le\omega$. Since $\lambda$ is a limit ordinal, no finite set is cofinal in $\lambda$: a finite subset has a maximum below $\lambda$. Hence $\operatorname{cf}(\lambda)=\omega$. Thus $\lambda\in S\cap C$, and since $C$ was an arbitrary club subset of $\omega_1$, the set $S$ is stationary in $\omega_1$. [/example] The last example also shows why closed unbounded sets are the right tests for stationarity: closure turns a countable approximation into a point of the club, while unboundedness lets the approximation keep moving upward. To use stationarity as a stable largeness notion, the class of club tests must itself behave well under the basic operations one applies in arguments. The first possible failure is finite intersection: two unbounded sets can be disjoint, so it is not automatic that satisfying two club requirements still leaves a club. The closure condition is designed to overcome exactly this problem for regular uncountable cardinals. [quotetheorem:4838] [citeproof:4838] This theorem starts the filter viewpoint. A club is a large set, and finite intersections of large sets should remain large. The hypotheses are doing real work: unboundedness alone is not preserved by intersection, since the even and odd countable ordinals are both unbounded in $\omega_1$ but disjoint. Closure alone is also too weak, since a bounded [closed set](/page/Closed%20Set) can be avoided by a tail club. Regularity enters when the alternating construction takes a supremum of fewer than $\kappa$ many ordinals and must remain below $\kappa$; at singular cardinals the same style of argument can run into the cardinal itself. The theorem does not say that arbitrary or length-$\kappa$ intersections of clubs are club, but it gives exactly the finite-intersection closure needed to generate a filter. [definition: Club Filter] Let $\kappa$ be a regular uncountable cardinal. The club filter on $\kappa$ is \begin{align*} \mathcal C_\kappa = \{X \subseteq \kappa : \text{there is a club } C \subseteq \kappa \text{ with } C \subseteq X\}. \end{align*} [/definition] Thus members of the club filter are the sets that contain a club, not necessarily club sets themselves. Stationary sets are precisely the sets that meet every member generated by a club. [remark: Stationarity and the Club Filter] For $S \subseteq \kappa$, the set $S$ is stationary in $\kappa$ iff $\kappa \setminus S \notin \mathcal C_\kappa$. Thus stationarity is the positive notion dual to the club filter. [/remark] ## Diagonal Intersections and Normality Finite intersections of clubs are not enough for transfinite recursion. A construction may produce a sequence $(C_\alpha)_{\alpha < \kappa}$ of club requirements, where the requirement at stage $\alpha$ only needs to apply above $\alpha$. Diagonal intersection packages this varying family into one club. [definition: Diagonal Intersection] Let $\kappa$ be a regular uncountable cardinal and let $(C_\alpha)_{\alpha < \kappa}$ be a sequence of subsets of $\kappa$. The diagonal intersection is \begin{align*} \Delta_{\alpha < \kappa} C_\alpha = \{\beta < \kappa : \beta \in C_\alpha \text{ for every } \alpha < \beta\}. \end{align*} [/definition] The condition ignores $C_\alpha$ for $\beta \le \alpha$, so it asks only that $\beta$ satisfy all earlier requirements. This asymmetry is exactly what makes the construction compatible with length $\kappa$. [quotetheorem:4839] [citeproof:4839] This theorem is the point where diagonalisation replaces ordinary intersection. Taking $\bigcap_{\alpha < \kappa} C_\alpha$ would be too strong: for example, in $\omega_1$ the tail sets $C_\alpha = \{\beta < \omega_1 : \alpha < \beta\}$ are club, but their full intersection is empty. The diagonal intersection avoids this obstruction by requiring a point $\beta$ to meet only the earlier requirements $C_\alpha$ for $\alpha < \beta$. Regularity is again essential, because the construction repeatedly takes suprema of fewer than $\kappa$ ordinals and needs them to stay below $\kappa$. The theorem does not make every point satisfy every club requirement; it produces a club of points satisfying all requirements that are visible below that point, which is exactly the shape needed for pressing-down arguments. Diagonal intersections are the characteristic closure property of a normal filter. Finite intersection closure is not enough for many stationary-set arguments, where one often has a separate large requirement for each $\alpha<\kappa$. Ordinary intersection of all those requirements is too strong, so the filter notion must record the weaker diagonal pattern that only asks a point to satisfy earlier requirements. Normality is the abstract name for that precise closure property. [definition: Normal Filter] Let $\kappa$ be a regular uncountable cardinal. A filter $\mathcal F$ on $\kappa$ is normal if for every sequence $(X_\alpha)_{\alpha < \kappa}$ of members of $\mathcal F$, the diagonal intersection $\Delta_{\alpha < \kappa} X_\alpha$ belongs to $\mathcal F$. [/definition] Normality says that a length-$\kappa$ list of large requirements can be met on a large set, provided each point is only asked to satisfy earlier requirements. After defining normality, the key question is whether the club filter really has this stronger closure property, rather than merely finite intersection closure. The diagonal-intersection theorem supplies the hard part, but the filter statement packages it in the language that will be used with stationary sets and regressive functions. This packaging is important because later arguments will refer to normality without reopening the construction of diagonal clubs each time. [quotetheorem:4840] [citeproof:4840] Normality is stronger than finite closure and is different from asking for arbitrary intersections of length $\kappa$ to remain large. The club filter is $<\kappa$-complete on a regular $\kappa$, but full $\kappa$-completeness fails in the basic tail example $C_\alpha = \{\beta < \kappa : \alpha < \beta\}$, whose intersection over all $\alpha < \kappa$ is empty. Normality keeps the useful part of a length-$\kappa$ intersection by only imposing earlier requirements at each point. Without regularity, the proof can fail because the suprema used to meet fewer than $\kappa$ many club requirements may have cofinality reaching the ambient singular cardinal. Thus normality is precisely calibrated to transfinite recursion and stationarity arguments, rather than being a blanket closure principle for all long intersections. ## Regressive Functions and Pressing Down Stationary sets are large enough that a function cannot move every point of such a set downward in an unconstrained way. The pressing-down lemma says that any regressive map on a stationary set is constant on a stationary subset. [definition: Regressive Function] Let $\kappa$ be a regular uncountable cardinal and let $S \subseteq \kappa$. A function $f:S \to \kappa$ is regressive if $f(\alpha) < \alpha$ for every nonzero $\alpha \in S$. [/definition] The condition is meaningful only on ordinals above $0$, and in applications $S$ usually consists of limit ordinals. Regressive functions record earlier witnesses, earlier stages, or earlier choices attached to each point. [example: A Regressive Map from First Entries] Let $S=\{\alpha<\omega_1:\operatorname{cf}(\alpha)=\omega\}$. For each $\alpha\in S$, choose a strictly increasing cofinal sequence $(a_n^\alpha)_{n\in\mathbb N}$ in $\alpha$, meaning \begin{align*} a_0^\alpha<a_1^\alpha<a_2^\alpha<\cdots<\alpha \end{align*} and for every $\gamma<\alpha$ there is some $n\in\mathbb N$ with $\gamma<a_n^\alpha$. Define $f:S\to\omega_1$ by \begin{align*} f(\alpha)=a_1^\alpha. \end{align*} Since $a_1^\alpha<\alpha<\omega_1$, the value $f(\alpha)$ is an ordinal below $\omega_1$. Also $S$ contains only limit ordinals of cofinality $\omega$, so every $\alpha\in S$ is nonzero. Therefore \begin{align*} f(\alpha)=a_1^\alpha<\alpha \end{align*} for every $\alpha\in S$, and $f$ is regressive. By *Fodors Pressing Down Lemma*, there is some $\xi<\omega_1$ such that \begin{align*} f^{-1}(\{\xi\}) = \{\alpha\in S:f(\alpha)=\xi\} = \{\alpha\in S:a_1^\alpha=\xi\} \end{align*} is stationary in $\omega_1$. Since $\xi<\omega_1$, this fixed value $\xi$ is a countable ordinal. Thus even though the cofinal sequences may have been chosen differently for different $\alpha$, their first nonzero entries agree on a stationary subset of $S$. [/example] The example hints at the force of the lemma: even if the choices of cofinal sequences vary widely, a regressive coordinate must repeat stationarily often. The general phenomenon is that stationarity prevents a regressive function from assigning genuinely different smaller values everywhere. If every fibre were nonstationary, one could try to choose clubs avoiding each fibre and then combine those club requirements diagonally; [normality of the club filter](/theorems/4840) is exactly what blocks that attempted dispersion. The formal result turns this obstruction into a homogeneous stationary subset on which the function is constant. [quotetheorem:4841] [citeproof:4841] Fodor's lemma is often called the pressing-down lemma because a function that presses each ordinal below itself must press a stationary collection to the same value. Both hypotheses are necessary. If $S$ is merely unbounded and not stationary, take a club $C$ disjoint from $S$; there is then no reason for a regressive map on $S$ to have a stationary fibre, since every fibre lies inside the nonstationary set $S$. Regressiveness is also essential: the identity map $f(\alpha)=\alpha$ on a stationary set has singleton fibres, and no singleton is stationary in an uncountable regular cardinal. The theorem also does not say that the constant fibre contains a club or that all large subsets of $S$ have the same value; it produces stationarily many points with one fixed pressed-down value. This is one of the standard ways to convert local bounded information into a stationary homogeneous set. [example: Pressing Down a Choice of Bounds] Let $S\subseteq\omega_1$ be stationary. Suppose that for each $\alpha\in S\setminus\{0\}$ we are given a set $A_\alpha\subseteq\alpha$ such that $\sup A_\alpha<\alpha$. Let $T=S\setminus\{0\}$. This set is still stationary in $\omega_1$: if $C\subseteq\omega_1$ is club, then \begin{align*} C^{+}=\{\beta\in C:0<\beta\} \end{align*} is unbounded in $\omega_1$, and if $\lambda<\omega_1$ is a limit ordinal with $C^{+}\cap\lambda$ unbounded in $\lambda$, then $C\cap\lambda$ is unbounded in $\lambda$, so $\lambda\in C$ by closure of $C$; also $\lambda>0$, hence $\lambda\in C^{+}$. Thus $C^{+}$ is club. Since $S$ is stationary, $S\cap C^{+}\ne\varnothing$, so $T\cap C\ne\varnothing$. For each $\alpha\in T$, define \begin{align*} f(\alpha)=\sup A_\alpha. \end{align*} Because $A_\alpha\subseteq\alpha$ and the hypothesis says $\sup A_\alpha<\alpha$, we have \begin{align*} f(\alpha)=\sup A_\alpha<\alpha. \end{align*} Also $\alpha<\omega_1$, so $f(\alpha)<\omega_1$. Therefore $f:T\to\omega_1$ is regressive. By *Fodor's Pressing Down Lemma*, there is some $\xi<\omega_1$ such that \begin{align*} f^{-1}(\{\xi\}) = \{\alpha\in T:f(\alpha)=\xi\} \end{align*} is stationary in $\omega_1$. For every $\alpha$ in this stationary set, \begin{align*} \sup A_\alpha=f(\alpha)=\xi. \end{align*} Thus the assigned countable sets need not be the same, but their suprema are equal to one fixed countable ordinal $\xi$ on a stationary subset of $S$. [/example] The conclusion is not that all the sets $A_\alpha$ are equal, only that their chosen bounds can be made constant on a stationary set. This is the typical use of pressing down: reduce many small pieces of information to one fixed parameter. [remark: Regularity Cannot Be Dropped] The regularity of $\kappa$ is used at the points where suprema of fewer than $\kappa$ ordinals must remain below $\kappa$. For singular cardinals the club filter does not have the same diagonal closure, and Fodor's lemma in this form is no longer the right statement. [/remark] These results form a compact toolkit. Clubs provide closed unbounded stage sets, stationary sets are those that every club must meet, diagonal intersections express normality, and Fodor's lemma turns regressiveness into stationary constancy. The language is set-theoretic, but the intuition is also topological: club closure is closure under limit points in the order topology on $\kappa$, while stationarity says that no closed unbounded test set can miss the set. Later uses of elementary submodels, reflection, and partition arguments repeatedly return to this pattern: build a club of well-behaved stages, intersect it with a stationary set, and press down any bounded choices that remain. Stationary-set arguments depend on working inside well-behaved ambient structures where membership behaves transparently. The next chapter introduces transitive sets, Mostowski collapse, and basic absoluteness so that later reflection and submodel arguments can be carried out with precise control over internal and external viewpoints. # 10. Transitive Sets, Collapse, and Absoluteness Basics This chapter introduces the structural viewpoint that underlies much of modern set theory: many arguments become cleaner when the ambient objects are transitive, so that membership relations inside the object agree with membership in the universe. It assumes the basic axioms of set theory, especially Extensionality, Separation, Replacement, Union, and Foundation, together with the von Neumann treatment of ordinals. We first isolate the operation of closing a set under elements, then show that well-founded extensional relations are exactly disguised membership structures. The chapter ends with the first absoluteness results, explaining why bounded formulas can be trusted inside transitive classes. ## Why Transitivity Is the Right Closure Condition A set-theoretic structure often contains elements whose own members have not been included. The problem is that formulas involving membership can then disagree between the structure and the surrounding universe. Transitivity is the condition that prevents this mismatch at the most basic level. [definition: Transitive Set] A set $M$ is transitive if for all $x$ and $y$, if $x \in y$ and $y \in M$, then $x \in M$. [/definition] Thus a transitive set contains every element of each of its elements. This is stronger than being closed under subsets: if $y \in M$ and $x \subset y$, transitivity alone does not imply $x \in M$, since $x$ need not be an element of $y$. [example: Basic Transitive Sets] The empty set is transitive because there is no $y\in\varnothing$, so the implication $x\in y$ and $y\in\varnothing$ implies $x\in\varnothing$ has no counterexample. Every ordinal is transitive by the von Neumann definition of ordinals. Now let $M=\{\varnothing,\{\varnothing\}\}$. To check transitivity, take $x\in y\in M$. There are two possible values of $y$. If $y=\varnothing$, then there is no $x\in y$. If $y=\{\varnothing\}$, then $x=\varnothing$, and $\varnothing\in M$ by the definition of $M$. Hence every element of every element of $M$ is again in $M$, so $M$ is transitive. By contrast, $N=\{\{\varnothing\}\}$ is not transitive. We have $\{\varnothing\}\in N$ and $\varnothing\in\{\varnothing\}$, but the only element of $N$ is $\{\varnothing\}$, so $\varnothing\notin N$. Thus $N$ misses an element of one of its elements, which is exactly the failure of transitivity. [/example] The preceding example shows that transitivity is a closure property rather than a size property. Given an arbitrary set, the natural question is how much must be added to make it transitive. [definition: Transitive Closure] Let $a$ be a set. A [transitive closure](/theorems/1493) of $a$ is a transitive set $T$ such that $a \subset T$ and, whenever $S$ is transitive with $a \subset S$, we have $T \subset S$. [/definition] The leastness clause makes the transitive closure unique when it exists. We write it as $\operatorname{TC}(a)$. The definition describes what the desired object must do, but it does not by itself show that such a least transitive set is available in ZFC. The natural construction repeatedly adds all members of objects already seen, and the point is to verify that these finite membership stages can be gathered into one set. The existence theorem below provides that construction and justifies using $\operatorname{TC}(a)$ as ordinary notation rather than as a conditional abbreviation. [quotetheorem:1493] [citeproof:1493] The construction is useful because it measures the finite membership depth below $a$. The theorem is not a powerset construction: it does not collect all subsets of elements of $a$, only objects reachable by repeatedly moving downward along $\in$. The proof also records exactly where set existence is used, since Replacement turns the recursively defined finite stages into a single sequence and Union then gathers those stages into one set. This matters later when ranks and cumulative hierarchy arguments require a genuine set around the parameters rather than a proper class. The statement follows the convention $a\subset\operatorname{TC}(a)$; some authors instead close $a\cup\{a\}$, which is the same as working with $\operatorname{TC}(\{a\})$ under the present convention. [example: Computing a Transitive Closure] Let $a=\{\{\varnothing\},\{\{\varnothing\}\}\}$, and compute the stages $T_0=a$ and $T_{n+1}=T_n\cup\bigcup T_n$. Since the two elements of $T_0$ are $\{\varnothing\}$ and $\{\{\varnothing\}\}$, we have \begin{align*} \bigcup T_0 &=\{x:x\in\{\varnothing\}\text{ or }x\in\{\{\varnothing\}\}\}\\ &=\{\varnothing,\{\varnothing\}\}. \end{align*} Therefore \begin{align*} T_1 &=T_0\cup\bigcup T_0\\ &=\{\{\varnothing\},\{\{\varnothing\}\}\}\cup\{\varnothing,\{\varnothing\}\}\\ &=\{\varnothing,\{\varnothing\},\{\{\varnothing\}\}\}. \end{align*} Now the elements of $T_1$ are $\varnothing$, $\{\varnothing\}$, and $\{\{\varnothing\}\}$. Thus \begin{align*} \bigcup T_1 &=\{x:x\in\varnothing\text{ or }x\in\{\varnothing\}\text{ or }x\in\{\{\varnothing\}\}\}\\ &=\{\varnothing,\{\varnothing\}\}\\ &\subset T_1. \end{align*} Hence \begin{align*} T_2=T_1\cup\bigcup T_1=T_1, \end{align*} and the same equality gives $T_n=T_1$ for every $n\geq 1$. Consequently \begin{align*} \operatorname{TC}(a) =\bigcup_{n\in\omega}T_n =T_1 =\{\varnothing,\{\varnothing\},\{\{\varnothing\}\}\}. \end{align*} This uses the convention that $a\subset\operatorname{TC}(a)$ is required; under the alternative convention of closing $a\cup\{a\}$, the set $a$ itself is added as an element as well. [/example] That example also warns that authors sometimes shift between $\operatorname{TC}(a)$ and $\operatorname{TC}(\{a\})$. When ranks are discussed later, the latter convention is often more convenient because it places $a$ itself inside the transitive set. ## How to Recognise Membership in Disguise Transitive sets give honest membership structures, but many naturally occurring structures are presented by an abstract relation $R$ instead of $\in$. The collapse problem asks when such a relation is merely a relabelled version of membership on a transitive set. [definition: Extensional Relation] Let $A$ be a set and let $R\subset A\times A$ be a binary relation. The relation $R$ is extensional on $A$ if for all $a,b\in A$, whenever \begin{align*} \{x\in A:xRa\}=\{x\in A:xRb\}, \end{align*} then $a=b$. [/definition] Extensionality says that an element is determined by its predecessors. This mirrors the Axiom of Extensionality, where a set is determined by its members. Extensionality alone does not prevent circular or endlessly descending predecessor chains, and such chains cannot be represented by ordinary membership in a well-founded universe. To make a relation behave like membership, we also need a condition saying that every nonempty part of the structure has a first point when viewed from below. That condition is isolated in the next definition. [definition: Well-Founded Relation] Let $A$ be a set and let $R\subset A\times A$ be a binary relation. The relation $R$ is well-founded on $A$ if every nonempty $B\subset A$ has an $R$-minimal element, meaning an element $b\in B$ such that there is no $c\in B$ with $cRb$. [/definition] Well-foundedness rules out infinite descent. Together with extensionality, it gives enough structure to assign to every abstract point the set of collapses of its predecessors. With both hypotheses in place, the remaining question is whether this recursive assignment actually produces a transitive membership structure and whether it is unique. The collapse theorem answers that question by converting the abstract predecessor relation into genuine membership while preserving the relation it was meant to encode. This is the bridge from relational presentations to transitive sets. [quotetheorem:4842] [citeproof:4842] The lemma turns relational data into a transitive membership model, but it does not say that every relation can be viewed as membership. Well-foundedness is needed to make the recursive definition terminate downward; without it, a cycle or infinite descending chain would demand sets satisfying circular membership equations, which Foundation rules out in the ambient universe. Extensionality is needed for injectivity, since two distinct points with the same predecessors would collapse to the same set. The conclusion is also limited to the structure carried by $R$: the collapse preserves the relation, not any extra predicates, functions, or external properties unless they are transported separately. Later, this is the mechanism behind replacing well-founded models by transitive ones in model comparison, constructibility, forcing arguments, and admissible set theory. [example: Collapse of a Finite Relation] Let $A=\{p,q,r\}$ and let $R=\{(p,q),(p,r),(q,r)\}$. The predecessor set of an element $a\in A$ is $\{b\in A:bRa\}$, so \begin{align*} \{b\in A:bRp\}&=\varnothing,\\ \{b\in A:bRq\}&=\{p\},\\ \{b\in A:bRr\}&=\{p,q\}. \end{align*} These three predecessor sets are pairwise different, so $R$ is extensional on $A$. It is also well-founded: if $B\subset A$ is nonempty and $p\in B$, then $p$ is $R$-minimal; if $p\notin B$ but $q\in B$, then $q$ is $R$-minimal; and if neither $p$ nor $q$ lies in $B$, then $B=\{r\}$ and $r$ is $R$-minimal. The collapse map satisfies $\pi(a)=\{\pi(b):bRa\}$. Since no element is $R$-below $p$, \begin{align*} \pi(p)=\{\pi(b):bRp\}=\varnothing. \end{align*} Since the only predecessor of $q$ is $p$, \begin{align*} \pi(q)=\{\pi(b):bRq\}=\{\pi(p)\}=\{\varnothing\}. \end{align*} Since the predecessors of $r$ are $p$ and $q$, \begin{align*} \pi(r) &=\{\pi(b):bRr\}\\ &=\{\pi(p),\pi(q)\}\\ &=\{\varnothing,\{\varnothing\}\}. \end{align*} Thus \begin{align*} M=\pi[A]=\{\varnothing,\{\varnothing\},\{\varnothing,\{\varnothing\}\}\}. \end{align*} The membership relations among these three images are exactly \begin{align*} \pi(p)&\in\pi(q),& \pi(p)&\in\pi(r),& \pi(q)&\in\pi(r), \end{align*} because \begin{align*} \varnothing&\in\{\varnothing\},\\ \varnothing&\in\{\varnothing,\{\varnothing\}\},\\ \{\varnothing\}&\in\{\varnothing,\{\varnothing\}\}. \end{align*} These correspond precisely to $pRq$, $pRr$, and $qRr$, so the abstract relation $R$ has become ordinary membership on the transitive set $M$. [/example] The finite computation is the whole idea of the general theorem, with recursion replacing hand calculation. Extensionality ensures that two nodes with the same collapsed predecessor set do not remain as duplicate copies of the same set. [remark: Collapse Without Extensionality] If $A=\{p,q\}$ and $R=\varnothing$, then $R$ is well-founded but not extensional because $p$ and $q$ have the same predecessor set. A collapse map satisfying $\pi(a)=\{\pi(b):bRa\}$ would send both $p$ and $q$ to $\varnothing$, so it cannot be injective. This example isolates the role of extensionality in the lemma. [/remark] After collapse, statements about the relation $R$ can be transported to statements about membership in the transitive set $M$. The next section identifies a syntactic class of formulas for which truth is preserved between transitive universes. ## Which Formulas Are Absolute for Transitive Classes A class model may omit many subsets, functions, or witnesses that exist in the universe. The basic absoluteness question asks which formulas are immune to this omission. The first robust answer is that bounded quantifiers are safe over transitive classes. [definition: Bounded Quantifier] A quantifier is bounded if it has the form $\forall x\in y$ or $\exists x\in y$, where $y$ is a variable already in the formula. [/definition] Bounded quantifiers search only through elements of a specified set. If that set lies in a transitive class, all of its elements lie there as well, so the internal and external search domains agree. To turn this observation into a reusable absoluteness criterion, we need a syntactic name for formulas whose quantifiers are all of this safe bounded kind. The next definition marks exactly the formulas whose truth can be checked without ranging over the whole ambient universe. This separates local membership computations from genuinely unbounded existence assertions. [definition: Delta Zero Formula] A formula of the language of set theory is $\Delta_0$ if every quantifier appearing in it is bounded. [/definition] Atomic formulas are $\Delta_0$, and the class of $\Delta_0$ formulas is closed under Boolean connectives and bounded quantification. Unbounded assertions such as $\exists x\,(x\notin y)$ are not $\Delta_0$. The reason for singling out this class is not merely terminological: bounded formulas should have the same truth value inside any transitive class containing their parameters. The next theorem makes that expectation precise and identifies the first robust absoluteness principle available without elementarity or closure under powersets. It is the basic tool for moving simple set-theoretic assertions between $V$ and transitive subuniverses. [quotetheorem:4843] [citeproof:4843] This theorem is the first place where transitivity pays off syntactically. Transitivity is essential because a bounded quantifier $\exists x\in y$ is only safe when every element of $y$ that exists in $V$ is also present in $M$. Boundedness is essential because an unbounded quantifier may ask for a witness somewhere in the whole universe, and a transitive class can miss subsets, functions, well-orders, or other witnesses that exist externally. Thus the theorem gives a precise syntactic boundary: finite membership checks and bounded constructions are absolute, while existence assertions over the universe require stronger hypotheses. Later chapters use this boundary when comparing internal and external cardinality, analysing constructible levels, and separating what forcing extensions preserve from what they add. [example: Absoluteness of Subset] Let $M$ be transitive and let $x,y\in M$. The assertion $x\subset y$ means \begin{align*} \forall z\,(z\in x\Rightarrow z\in y), \end{align*} which is equivalently the bounded formula \begin{align*} \forall z\in x\,(z\in y). \end{align*} The only quantifier is bounded by the parameter $x$, and the remaining statement $z\in y$ is atomic, so this is a $\Delta_0$ formula. By *Delta Zero Absoluteness*, $M$ and $V$ agree on this formula with parameters $x,y\in M$: \begin{align*} M\models \forall z\in x\,(z\in y) \iff V\models \forall z\in x\,(z\in y). \end{align*} Unpacking the abbreviation on each side gives \begin{align*} M\models x\subset y \iff V\models x\subset y. \end{align*} Thus subsethood between two elements of a transitive class is absolute, because checking $x\subset y$ only ranges over elements of $x$, and transitivity ensures those elements are already in $M$. [/example] Subsethood is safe because checking it only requires ranging over elements of $x$. The same method handles common finite codings, provided the coding is built from pairing and unions in a bounded way. [definition: Kuratowski Ordered Pair] The Kuratowski ordered-pair operation is the class function \begin{align*} (\cdot,\cdot):V\times V&\longrightarrow V,\\ (a,b)&\longmapsto \{\{a\},\{a,b\}\}. \end{align*} [/definition] The definition itself is external notation, but the relation saying that a set $p$ is the ordered pair of $a$ and $b$ can be expressed with bounded quantifiers once the relevant finite membership pattern is expanded. [example: Absoluteness of Ordered Pair Coding] Let $M$ be transitive and let $a,b,p\in M$. We express the assertion $p=(a,b)$, where $(a,b)=\{\{a\},\{a,b\}\}$, by spelling out the finite membership pattern. Say that $u$ is the singleton $\{a\}$ by the bounded formula \begin{align*} a\in u\ \wedge\ \forall z\in u\,(z=a), \end{align*} and say that $v$ is the unordered pair $\{a,b\}$ by the bounded formula \begin{align*} a\in v\ \wedge\ b\in v\ \wedge\ \forall z\in v\,(z=a\vee z=b). \end{align*} Then $p$ codes the Kuratowski ordered pair of $a$ and $b$ exactly when \begin{align*} &\exists u\in p\,\bigl(a\in u\wedge \forall z\in u\,(z=a)\bigr)\\ &\wedge\ \exists v\in p\,\bigl(a\in v\wedge b\in v\wedge \forall z\in v\,(z=a\vee z=b)\bigr)\\ &\wedge\ \forall w\in p\,\Bigl[ \bigl(a\in w\wedge \forall z\in w\,(z=a)\bigr) \vee \bigl(a\in w\wedge b\in w\wedge \forall z\in w\,(z=a\vee z=b)\bigr) \Bigr]. \end{align*} Every quantifier displayed here is bounded by one of $p,u,v,w$, so this is a $\Delta_0$ formula with parameters $a,b,p$. To see that the formula is equivalent to $p=\{\{a\},\{a,b\}\}$, first suppose $p=\{\{a\},\{a,b\}\}$. Taking $u=\{a\}$ gives \begin{align*} a\in \{a\} \quad\text{and}\quad \forall z\in \{a\}\,(z=a), \end{align*} and taking $v=\{a,b\}$ gives \begin{align*} a\in \{a,b\},\qquad b\in \{a,b\},\qquad \forall z\in \{a,b\}\,(z=a\vee z=b). \end{align*} Finally, if $w\in p$, then $w=\{a\}$ or $w=\{a,b\}$, so the final bounded universal clause holds. Conversely, suppose the displayed formula holds. Choose $u\in p$ satisfying \begin{align*} a\in u\wedge \forall z\in u\,(z=a). \end{align*} Then every element of $u$ is $a$, and $a$ is an element of $u$, so $u=\{a\}$ by extensionality. Choose $v\in p$ satisfying \begin{align*} a\in v\wedge b\in v\wedge \forall z\in v\,(z=a\vee z=b). \end{align*} Then $a$ and $b$ are elements of $v$, and every element of $v$ is one of $a,b$, so $v=\{a,b\}$ by extensionality. Thus $\{a\}\in p$ and $\{a,b\}\in p$. The last clause says that every $w\in p$ is either $\{a\}$ or $\{a,b\}$, so \begin{align*} p=\{\{a\},\{a,b\}\}=(a,b). \end{align*} By *Delta Zero Absoluteness*, $M$ and $V$ agree on this bounded formula with parameters $a,b,p\in M$. Hence a transitive class cannot disagree with the universe about whether one of its elements is the Kuratowski code for the ordered pair of two of its elements. [/example] Ordered pairs allow us to code relations and functions, but not every assertion about functions is automatically bounded. For example, saying that a relation has domain all of $x$ is bounded over $x$ and the relation, while saying that some function with a property exists usually has an unbounded leading existential quantifier. [remark: Absoluteness Has Syntactic Limits] The formula $\exists z\,(z\subset x\wedge z\notin M)$ is not a formula of the language with $M$ as a set parameter unless $M$ itself is named, and in any case it contains an unbounded existential quantifier. More mathematically, a transitive model can be correct about bounded facts involving its elements while missing subsets, functions, or well-orders that exist in $V$. Later chapters use this distinction when separating internal cardinality from external cardinality. [/remark] The three ideas of the chapter now fit together. Transitive closures produce small transitive environments around given parameters, Mostowski collapse converts well-founded extensional relations into transitive membership structures, and $\Delta_0$ absoluteness explains why bounded reasoning inside those structures agrees with the surrounding universe. Once transitive structure and absoluteness are available, one can ask how much of the universe is really needed for a given argument. The next chapter answers this with reflection and elementary submodels, showing how large-scale truths about V can be captured inside smaller, well-chosen structures. # 11. Reflection and Elementary Submodels This chapter changes the scale at which the axioms of set theory are used. Earlier chapters treated $V$ as the ambient universe and proved theorems inside it; now we ask how much of that universe is needed to witness a given finite piece of reasoning. The answers are reflection, which finds rank-initial approximations to $V$, and elementary submodels, which let us replace large set-theoretic structures by smaller ones while preserving first-order truths with parameters. The point is not that the universe of sets is itself a set. Rather, first-order formulas only mention finitely much syntax at a time, so ZFC can often be used through finite fragments and through set-sized structures such as $H_\theta$. This chapter introduces the model-theoretic tools that make that reduction precise. ## Finite Reflection in the Cumulative Hierarchy Which parts of $V$ are needed to check a formula about specified parameters? A formula may quantify over all sets, so it seems to require the whole universe. The reflection principle says that for any finite list of formulas, there are arbitrarily high stages $V_\alpha$ where those formulas have the same truth value as they have in $V$, as long as the parameters lie in $V_\alpha$. Before stating the theorem, we need a precise way to compare truth in $V$ with truth inside a rank-initial segment. Since $V$ is a proper class, this is not a definition of satisfaction for a structure named $V$ inside ZFC; it is a metatheoretic abbreviation for the usual interpretation of formulas in the universe. [definition: Relativisation To A Class] Let $\varphi$ be a formula in the language $\{\in\}$ and let $C$ be a class. The relativisation $\varphi^C$ is obtained from $\varphi$ by replacing each quantifier $\exists x$ by $\exists x \in C$ and each quantifier $\forall x$ by $\forall x \in C$. [/definition] Relativisation turns an assertion about all sets into an assertion about a chosen class or set. For $C=V_\alpha$, the formula $\varphi^{V_\alpha}$ is a genuine set-theoretic statement because $V_\alpha$ is a set. [example: Relativising A Formula] Let $\varphi(x)$ be the formula saying that $x$ is transitive. Written using only the membership relation, this is \begin{align*} \forall y\,\bigl(y\in x \implies \forall z\,(z\in y \implies z\in x)\bigr). \end{align*} Relativising each quantifier to $V_\alpha$ gives \begin{align*} \varphi^{V_\alpha}(x) \quad\text{is}\quad \forall y\in V_\alpha\,\bigl(y\in x \implies \forall z\in V_\alpha\,(z\in y \implies z\in x)\bigr). \end{align*} Thus $V_\alpha$ checks only those potential counterexamples whose quantified variables lie in $V_\alpha$: if $y\in x$ but $y\notin V_\alpha$, then this $y$ is not tested, and if $z\in y$ but $z\notin V_\alpha$, then this $z$ is not tested. When $x\in V_\alpha$ and $\alpha$ is a limit ordinal, this relativised formula agrees with ordinary transitivity for $x$. Indeed, if $y\in x$, then $\operatorname{rank}(y)<\operatorname{rank}(x)<\alpha$, so $y\in V_\alpha$; and if $z\in y$, then $\operatorname{rank}(z)<\operatorname{rank}(y)<\alpha$, so $z\in V_\alpha$. Therefore the restricted statement \begin{align*} \forall y\in V_\alpha\,\bigl(y\in x \implies \forall z\in V_\alpha\,(z\in y \implies z\in x)\bigr) \end{align*} has exactly the same witnesses and counterexamples as \begin{align*} \forall y\,\bigl(y\in x \implies \forall z\,(z\in y \implies z\in x)\bigr). \end{align*} The point is that this agreement comes from the ranks of the specific quantified objects in this formula, not from an informal rule that every assertion behaves well after replacing $V$ by $V_\alpha$. [/example] The example shows that agreement between $V$ and a rank-initial segment must be requested formula by formula, with parameters restricted to the segment being tested. To state reflection cleanly, we need a single name for ordinals at which a chosen finite fragment of the language has the same truth values in $V_\alpha$ as in the universe. The next definition packages that comparison so the reflection theorem can assert that such stages occur cofinally often. [definition: Reflection Point] Let $\Phi$ be a finite set of formulas in the language $\{\in\}$. An ordinal $\alpha$ is a reflection point for $\Phi$ if for every $\varphi(x_1,\dots,x_n) \in \Phi$ and every $a_1,\dots,a_n \in V_\alpha$, \begin{align*} V \models \varphi(a_1,\dots,a_n) \quad \text{iff} \quad V_\alpha \models \varphi(a_1,\dots,a_n). \end{align*} [/definition] The definition is finite on purpose. Tarski theorem on undefinability of truth prevents a single first-order formula from expressing full truth for $V$, so reflection is stated for a finite fragment fixed in advance. [quotetheorem:4844] [citeproof:4844] Reflection is often the reason that a proof using the whole universe can be converted into a proof using a sufficiently large set. The finiteness hypothesis is essential: if all formulas were reflected at once, a set-sized $V_\alpha$ would carry a truth predicate for the universe-level theory, contradicting the same limitations behind Tarski's theorem. The parameter condition is also essential, since a formula with a parameter not in $V_\alpha$ cannot even be interpreted inside $V_\alpha$ in the intended way. The theorem does not say that $V_\alpha$ satisfies all of ZFC, but it does say that any chosen finite portion of the theory can be made correct at many ranks, which is exactly what is needed before passing to smaller elementary submodels. [example: Reflecting A Finite Fragment Of ZFC] Fix a finite set $\Phi$ of ZFC axioms, each written as a first-order sentence in the language $\{\in\}$. Apply *Levy reflection theorem* to this finite set. For arbitrarily large ordinals $\alpha$, every sentence $\sigma\in\Phi$ satisfies \begin{align*} V\models \sigma \quad\text{iff}\quad V_\alpha\models \sigma. \end{align*} Since each $\sigma\in\Phi$ is an axiom of ZFC and is true in $V$, the left side holds for every $\sigma\in\Phi$. Hence the equivalence gives \begin{align*} V_\alpha\models \sigma \end{align*} for every $\sigma\in\Phi$, so $V_\alpha$ satisfies the whole finite fragment $\Phi$. For example, suppose the finite fragment contains Extensionality, Pairing, Union, and the particular Replacement instance used in some argument, and suppose all parameters of that argument lie in $V_\alpha$. Then each step of the argument that invokes one of those axioms can be interpreted inside $V_\alpha$, because $V_\alpha$ satisfies exactly those selected sentences. Thus "take a sufficiently large $V_\alpha$" means: choose $\alpha$ reflecting the finite list of formulas actually used, not that some set $V_\alpha$ is being treated as the whole universe. [/example] Finite fragments matter because any formal proof uses only finitely many axioms. Thus reflection gives a bridge from proof-theoretic finiteness to set-sized approximations of the universe. [remark: Reflection And Inaccessibility] If $\kappa$ is inaccessible, then $V_\kappa$ satisfies ZFC. Levy reflection gives many $\alpha$ reflecting any fixed finite fragment, but it does not produce inaccessible cardinals and does not imply that some $V_\alpha$ satisfies all of ZFC. The distinction is a recurring theme: finite first-order demands are much weaker than a single set model of the whole theory. [/remark] ## Downward Lowenheim-Skolem For Set-Theoretic Structures Reflection gives large set-sized approximations. The next question goes in the opposite direction: once we have a structure, can we find a smaller substructure that preserves the first-order facts relevant to its elements? The downward Lowenheim-Skolem theorem says that in a countable language, every infinite structure has elementary substructures of any prescribed infinite size at least the size of the parameter set. We first recall the model-theoretic comparison used here. The language may be just $\{\in\}$, or it may include additional relation and function symbols used to code a structure such as $(H_\theta,\in,\triangleleft)$. [definition: Elementary Substructure] Let $\mathcal M$ and $\mathcal N$ be structures for the same first-order language, with $M \subset N$ and all functions and relations on $\mathcal M$ inherited from $\mathcal N$. We write $\mathcal M \preccurlyeq \mathcal N$ if for every formula $\varphi(x_1,\dots,x_n)$ and all $a_1,\dots,a_n \in M$, \begin{align*} \mathcal M \models \varphi(a_1,\dots,a_n) \quad \text{iff} \quad \mathcal N \models \varphi(a_1,\dots,a_n). \end{align*} [/definition] Elementarity says that the smaller structure has the same first-order theory with parameters from the smaller domain. To use elementary substructures in constructions, however, we need a criterion that can be checked by looking only at existential witnesses. The [Tarski-Vaught test](/theorems/4280) supplies exactly this local replacement for testing every formula directly. [quotetheorem:4298] [citeproof:4298] The closure-under-functions assumption prevents the smaller domain from failing even to be a substructure; for example, if the language has a unary function $s$ and $a\in M$ but $s(a)\notin M$, atomic formulas involving $s(a)$ cannot be interpreted internally as inherited structure. The witness condition is the real content: a subset can be closed under the named functions and still fail elementarity if $\mathcal N$ sees an existential witness outside $M$ for parameters in $M$. The test turns elementarity into a closure problem: build a set that contains the desired parameters and is closed under chosen witnesses for existential formulas. The object that carries out this closure is the Skolem hull. Naming it lets us describe the downward Lowenheim-Skolem construction without repeatedly listing all of the witness functions. [definition: Skolem Hull] Let $\mathcal N$ be a structure in a language $\mathcal L$, and suppose a Skolem function has been chosen for each formula $\exists x\,\varphi(x,y_1,\dots,y_n)$ whenever witnesses exist in $\mathcal N$. For $A \subset N$, the Skolem hull of $A$ in $\mathcal N$ is the smallest subset of $N$ containing $A$ and closed under all chosen Skolem functions. [/definition] The Skolem hull is designed so that every existential statement over its parameters has a witness in the hull. This is the construction behind downward Lowenheim-Skolem. The remaining issue is size control: closing under all the chosen Skolem functions should not force the hull to become as large as the whole structure unless the language or parameter set already demands it. Downward Lowenheim-Skolem makes this precise by producing elementary substructures of prescribed intermediate cardinality. This is the engine that later supplies small elementary models containing selected parameters. [quotetheorem:4299] [citeproof:4299] The lower bound $|\mathcal L|+|A|\le\kappa$ is forced: the submodel must contain $A$, and in a large language the Skolem construction may need at least enough room to handle the named symbols. The upper bound $|N|\le\kappa$ is the size-control conclusion: the Skolem hull is built by closing under only $|\mathcal L|$ many witness functions starting from $A$, so it can be kept within the prescribed cardinal $\kappa$. The theorem does not make a submodel transitive, nor does it preserve second-order properties such as external countability; it preserves first-order truth with parameters from $N$. In set theory this is especially powerful because the language $\{\in\}$ is countable, so large set-theoretic structures usually have countable elementary submodels containing any prescribed countable collection of parameters. [example: A Countable Elementary Submodel With Parameters] Let $\theta$ be an uncountable regular cardinal and let $A\subset H_\theta$ be countable. For the structure $(H_\theta,\in)$, the language is $\{\in\}$, so \begin{align*} |\{\in\}|+|A| \le \aleph_0. \end{align*} Also $H_\theta$ is infinite, since every finite ordinal belongs to $H_\theta$, and therefore \begin{align*} \aleph_0 \le |H_\theta|. \end{align*} Taking $\kappa=\aleph_0$ in *Downward Lowenheim-Skolem Theorem* gives a subset $M\subset H_\theta$ such that \begin{align*} A\subset M,\qquad |M|=\aleph_0,\qquad (M,\in)\preccurlyeq (H_\theta,\in), \end{align*} where the membership relation on $M$ is the restriction of the membership relation on $H_\theta$. If the argument also needs a fixed well-order of $H_\theta$, choose such a relation $\triangleleft$ and use the expanded structure $(H_\theta,\in,\triangleleft)$. Its language has two relation symbols, so it is still countable: \begin{align*} |\{\in,\triangleleft\}|+|A|\le \aleph_0. \end{align*} Applying *Downward Lowenheim-Skolem Theorem* again with $\kappa=\aleph_0$ gives a countable $M\subset H_\theta$ with \begin{align*} A\subset M \quad\text{and}\quad (M,\in,\triangleleft\!\upharpoonright M)\preccurlyeq (H_\theta,\in,\triangleleft). \end{align*} Thus adding the well-order changes which first-order statements are preserved, but it does not change the countability conclusion. [/example] The extra well-order is often included because it gives definable choices inside $H_\theta$. In practice, this makes Skolem hulls canonical relative to $\triangleleft$ and improves closure properties of the resulting submodel. ## Elementary Submodels Of $H_\theta$ Why do set theorists work with $H_\theta$ rather than with arbitrary $V_\alpha$? The class $H_\theta$ consists of sets whose transitive closures have size below $\theta$, and for regular uncountable $\theta$ it behaves well under the constructions used in ordinary mathematics. It is large enough to contain the objects of a given argument while still being a set-sized structure suitable for Lowenheim-Skolem. [definition: Hereditarily Small Sets] Let $\theta$ be an infinite cardinal. Define \begin{align*} H_\theta = \{x : |\operatorname{tc}(x)| < \theta\}, \end{align*} where $\operatorname{tc}(x)$ is the transitive closure of $x$. [/definition] The hereditary condition is stronger than merely asking $|x|<\theta$. It ensures that the elements of elements, and so on down the membership relation, remain small. [example: Basic Members Of $H_\theta$] Let $\theta$ be uncountable. If $\alpha<\theta$ is an ordinal, then every element of $\alpha$ is an ordinal below $\alpha$, and every element of an element of $\alpha$ is again an ordinal below $\alpha$. Thus $\alpha$ is transitive, so \begin{align*} \operatorname{tc}(\alpha)=\alpha. \end{align*} Since $\alpha<\theta$, we have $|\operatorname{tc}(\alpha)|=|\alpha|<\theta$, and therefore $\alpha\in H_\theta$. Now let $A$ be a countable set of countable ordinals. For each $\alpha\in A$, \begin{align*} \operatorname{tc}(\alpha)=\alpha \quad\text{and}\quad |\alpha|\le \aleph_0. \end{align*} The transitive closure of $A$ is contained in $A\cup\bigcup A$: the elements of $A$ are ordinals, and the elements appearing below those ordinals lie in $\bigcup A$. Since $A$ is countable and each $\alpha\in A$ is countable, the union $\bigcup A$ is countable, so \begin{align*} |\operatorname{tc}(A)| &\le |A\cup\bigcup A| \\ &\le |A|+|\bigcup A| \\ &\le \aleph_0+\aleph_0 \\ &= \aleph_0 \\ &< \omega_1. \end{align*} Hence $A\in H_{\omega_1}$. By contrast, $\omega_1$ is itself a transitive ordinal, so \begin{align*} \operatorname{tc}(\omega_1)=\omega_1 \quad\text{and}\quad |\operatorname{tc}(\omega_1)|=|\omega_1|=\omega_1. \end{align*} The definition of $H_{\omega_1}$ requires transitive closure of size strictly below $\omega_1$, so $\omega_1\notin H_{\omega_1}$. [/example] The most basic closure properties of $H_\theta$ require only finite control over transitive closures. Regularity becomes important for stronger closure facts, such as closure under sequences of length below $\theta$, but the following elementary facts are already enough for many first-order arguments. [quotetheorem:4845] [citeproof:4845] The infinitude of $\theta$ is used to keep the finite unions in the proof below $\theta$; for finite cardinals the statement is not the useful closure principle set theorists need. Regularity is deliberately absent from the theorem, because these particular facts do not require it, although later applications often assume regular uncountable $\theta$ to control longer constructions. These closure facts justify treating $H_\theta$ as a stable background for many arguments, after which the central technique is to pick $\theta$ large enough and pass to a smaller elementary submodel containing the parameters of interest. [definition: Elementary Submodel Of H Theta] Let $\theta$ be an uncountable cardinal and let $\mathcal H_\theta$ denote a chosen structure with domain $H_\theta$, usually $(H_\theta,\in)$ or $(H_\theta,\in,\triangleleft)$. A set $M\subset H_\theta$ is an elementary submodel of $H_\theta$ for this language if the induced structure on $M$ satisfies $M\preccurlyeq \mathcal H_\theta$. [/definition] This terminology hides the language in use, so it should be read with the surrounding structure in mind. Adding predicates or functions changes what elementarity means. For applications, the useful case is when the language is countable and the parameter set to be captured is countable. Then the general Lowenheim-Skolem theorem should yield a countable elementary submodel of a sufficiently large $H_\theta$ that still contains all designated objects. The next result records this standard form for set-theoretic arguments. [quotetheorem:4846] [citeproof:4846] The countability of the language and of $A$ is essential for the countable conclusion: if the language names uncountably many independent constants, or if $A$ itself is uncountable, no countable $M$ can contain all required parameters. The theorem also does not say that $M$ is transitive or that it contains every element of each set it contains. This is the standard entry point into elementary submodel arguments, and the warning at the end of the proof is important: membership inside $M$ is inherited from $V$, but quantification inside $M$ only ranges over elements of $M$. [example: Capturing A Countable Construction] Suppose $X\in H_\theta$ is a topological space, $\mathcal B\in H_\theta$ is a base for $X$, and $A\subset H_\theta$ is countable. Work in the expanded language with membership and constant symbols for $X$ and $\mathcal B$. By *Countable Elementary Submodels Of H Theta*, choose a countable \begin{align*} M\preccurlyeq (H_\theta,\in,X,\mathcal B) \end{align*} such that \begin{align*} A\subset M. \end{align*} Because $X$ and $\mathcal B$ are named constants in the language, the induced structure on $M$ contains their interpretations, so \begin{align*} X\in M \quad\text{and}\quad \mathcal B\in M. \end{align*} Now suppose a definable subset is given by a formula $\delta(p,\bar a)$ with parameters $\bar a\in M$, and suppose the desired neighbourhood property is expressed by a formula $\rho(B,p,\bar a)$. The assertion that every point of the definable subset has a basic neighbourhood with that property is the first-order sentence \begin{align*} \forall p\in X\, \bigl( \delta(p,\bar a) \implies \exists B\in \mathcal B\, (p\in B \land \rho(B,p,\bar a)) \bigr). \end{align*} If this sentence holds in $H_\theta$ and $p\in M\cap X$ satisfies $\delta(p,\bar a)$ in $M$, then elementarity gives \begin{align*} M\models \exists B\in \mathcal B\, (p\in B \land \rho(B,p,\bar a)). \end{align*} Thus there is some $B\in M$ such that \begin{align*} M\models B\in\mathcal B,\qquad M\models p\in B,\qquad M\models \rho(B,p,\bar a). \end{align*} Applying elementarity again to these formulas with parameters from $M$ gives the corresponding statements in $H_\theta$. Hence the neighbourhood found inside $M$ is a genuine basic neighbourhood in the ambient structure. The point is that the construction does not make $M$ contain every relevant object automatically; it works because the objects and parameters needed by the argument are put into the language or into the parameter set before applying elementary submodel existence. [/example] ## The Skolem Paradox How can there be a countable elementary model of a structure that contains uncountable sets? This is the Skolem paradox. The resolution is that countability from the outside is not the same as countability as witnessed inside the model. The issue is already visible for $H_\theta$ when $\omega_1\in H_\theta$. A countable $M\preccurlyeq H_\theta$ may contain the set $\omega_1$, even though $M$ itself has only countably many elements from the external viewpoint. [definition: Internal Countability] Let $M\preccurlyeq H_\theta$ and let $x\in M$. The set $x$ is internally countable in $M$ if there is $f\in M$ such that $M$ satisfies that $f:\omega\to x$ is a surjection. [/definition] This definition depends on witnesses that belong to $M$. An external enumeration of $M\cap x$ does not make $x$ internally countable unless that enumeration is itself an element of $M$ with the right domain and range. The sharp test case is $\omega_1$: a countable elementary submodel may contain the ordinal $\omega_1$, but it need not contain a surjection from $\omega$ onto that ordinal. Elementarity transfers the statement that $\omega_1$ is uncountable, while the external countability of $M$ only enumerates the part of $\omega_1$ lying in $M$. The following theorem states this resolution of the Skolem paradox in the language of internal countability. [quotetheorem:4847] [citeproof:4847] The assumption $\omega_1\in M$ is what lets elementarity talk about that particular ordinal; without it, the displayed internal statement would not be a statement with the required parameter in the smaller structure. The hypothesis that $M$ is countable is external, coming from the Lowenheim-Skolem construction, while elementarity controls only first-order assertions made inside the structure. The apparent contradiction disappears once we track where the proposed enumeration lives. From outside, there is a bijection $e:\omega\to M$, and hence an enumeration of $M\cap\omega_1$. But $e$ is not an element of $M$ in general, and it does not enumerate all of $\omega_1$; it enumerates only the ordinals of $\omega_1$ that happen to lie in $M$. [example: A Countable Model Thinking A Set Is Uncountable] Let $M\preccurlyeq H_\theta$ be countable, and suppose $\omega_1\in M$. Since $M\cap\omega_1\subseteq M$ and $M$ is countable in the external universe, there is an external surjection \begin{align*} e:\omega\to M. \end{align*} Then the set \begin{align*} I=\{n\in\omega:e(n)\in\omega_1\} \end{align*} is a subset of $\omega$, and the restricted enumeration $e\upharpoonright I$ lists every element of $M\cap\omega_1$. Hence $M\cap\omega_1$ is externally countable. This does not give $M$ an internal enumeration of $\omega_1$. In $H_\theta$, the statement "$\omega_1$ is uncountable" is expressed as \begin{align*} \neg\exists f\, \bigl(f:\omega\to\omega_1 \text{ is a surjection}\bigr). \end{align*} Because $\omega_1\in M$ and $M\preccurlyeq H_\theta$, elementarity transfers this formula with parameter $\omega_1$ from $H_\theta$ to $M$, so \begin{align*} M\models \neg\exists f\, \bigl(f:\omega\to\omega_1 \text{ is a surjection}\bigr). \end{align*} Equivalently, for every $f\in M$, \begin{align*} M\models \neg\bigl(f:\omega\to\omega_1 \text{ is a surjection}\bigr). \end{align*} Thus $M$ can contain the object $\omega_1$ and still satisfy that it is uncountable, while the visible initial segment $M\cap\omega_1$ is countable from the external viewpoint. [/example] The distinction between $\omega_1$ and $M\cap\omega_1$ is one of the first lessons of elementary submodels. A non-transitive model can contain a set without containing all of its elements. [remark: Transitive Collapse] When $M$ is well-founded and extensional, the Mostowski collapse turns it into a transitive structure isomorphic to $M$. For countable elementary submodels of $H_\theta$, this collapse is often used to separate the external countability of the coded structure from the internal membership relation it carries. The collapse does not remove the Skolem phenomenon; it reorganises it into a countable transitive image of the part of the universe seen by $M$. [/remark] Elementary submodels are therefore bookkeeping devices with genuine mathematical force. Reflection tells us that finite fragments of the universe appear at high ranks; Lowenheim-Skolem tells us that first-order structure can be preserved in small submodels; and the Skolem paradox teaches that internal and external viewpoints must be kept separate. Reflection and elementary submodels reveal that much of set-theoretic reasoning can be localized, but they also sharpen the question of which sets are canonical. The next chapter begins the study of constructibility by building Gödel's inner model L, where sets are generated in a controlled way and the universe is approximated through definability. # 12. Constructibility as a First Inner Model The previous chapters developed the ZFC universe through ordinals, cardinals, choice, cofinality, clubs, stationary sets, and collapse arguments. We now build the first canonical inner model: Gödel's constructible universe $L$. The guiding idea is to replace the unrestricted power-set operation by a definability operation, producing at each stage only those subsets that can be described over the structure already built. The chapter has three aims. First, it makes precise what it means for a subset of a set to be definable over a structure. Second, it uses this notion to define the hierarchy $(L_\alpha)_{\alpha \in \operatorname{Ord}}$ and the class $L$. Third, it records the first major structural facts: $L$ is transitive, satisfies ZFC, and satisfies the axiom $V=L$ internally, with the Generalised Continuum Hypothesis holding in $L$. ## Definability Over a Structure The problem is that the full power set $\mathcal P(x)$ is often too large to be controlled from inside a model. If we want a universe built from explicit information, we need a replacement for $\mathcal P(x)$ that only keeps subsets described by first-order formulas using parameters from $x$. [definition: Structure for Set Membership] A structure for set membership is a pair \begin{align*} \mathcal M = (M, \in_M), \end{align*} where $M$ is a set and $\in_M$ is a binary relation on $M$. [/definition] When $M$ is transitive and $\in_M$ is actual membership restricted to $M$, we usually write $(M, \in)$ instead of $(M, \in_M)$. The definition allows arbitrary membership-like structures because collapse arguments and elementary substructures naturally produce relations that must later be identified with membership. With the ambient structure fixed, the next issue is which subsets of its domain can be recovered from the structure's own first-order information. This is the notion that will replace arbitrary subsets in the construction of $L$. [definition: Definable Subset] Let $\mathcal M = (M, \in_M)$ be a structure for set membership. A subset $A \subset M$ is definable over $\mathcal M$ if there are a first-order formula $\varphi(v,w_1,\dots,w_n)$ in the language $\{\in\}$ and parameters $a_1,\dots,a_n \in M$ such that \begin{align*} A = \{x \in M : \mathcal M \models \varphi(x,a_1,\dots,a_n)\}. \end{align*} [/definition] The parameters are part of the definition, so definability here means definability with parameters from the structure. To build a hierarchy from definability, we need an operation that takes a whole membership structure and returns exactly the subsets that the structure can define. This packages one-formula-at-a-time definability into the stage-building operation used in the constructible hierarchy. [definition: Definable Power Set] The definable power-set operator is the class operation \begin{align*} \operatorname{Def}: \mathcal M=(M,\in_M) \longmapsto \operatorname{Def}(\mathcal M) \subset \mathcal P(M), \end{align*} where \begin{align*} \operatorname{Def}(\mathcal M) = \{A \subset M : A \text{ is definable over } \mathcal M\}. \end{align*} When $\mathcal M=(M,\in)$, write $\operatorname{Def}(M)$. [/definition] This operation is a restrained analogue of $\mathcal P(M)$. It always gives subsets of $M$, but it only includes those subsets separated by a formula and finitely many parameters. [example: Definable Subsets of a Finite Structure] Let $M=\{a,b,c\}$ and let $\in_M$ be any binary relation on $M$. Parameters from $M$ are allowed in definitions over $\mathcal M=(M,\in_M)$, so the membership relation itself is irrelevant for naming individual elements: the singleton $\{a\}$ is defined by $x=a$, the singleton $\{b\}$ by $x=b$, and the singleton $\{c\}$ by $x=c$. For an arbitrary subset $S\subseteq M$, write its elements as $S=\{s_1,\dots,s_k\}$, where each $s_i\in M$. If $S=\varnothing$, then \begin{align*} S=\{x\in M:x\ne x\}. \end{align*} If $S\ne\varnothing$, then \begin{align*} S=\{x\in M:x=s_1\vee \cdots \vee x=s_k\}. \end{align*} Thus every subset of $M$ is definable over $\mathcal M$ using parameters from $M$, so $\mathcal P(M)\subseteq \operatorname{Def}(\mathcal M)$. By the definition of $\operatorname{Def}(\mathcal M)$, every definable subset is already a subset of $M$, so $\operatorname{Def}(\mathcal M)\subseteq \mathcal P(M)$. Therefore \begin{align*} \operatorname{Def}(\mathcal M)=\mathcal P(M). \end{align*} This shows that constructibility does not differ from the cumulative hierarchy at very small finite stages, because parameter-definability can name each available element. [/example] Finite structures therefore hide the real restriction. The difference between definability and full power set becomes meaningful when a structure has many subsets but only set-many formulas and parameters available from within it. Before using $\operatorname{Def}$ as a successor operation, we need a size estimate showing that this restriction is genuine and controlled at infinite stages. [quotetheorem:4848] [citeproof:4848] This estimate is the first sign that $L$ will be thin compared with $V$ whenever $V$ contains many undefinable subsets. The hypothesis that $M$ is infinite is used only to absorb the countably many formulas and all finite powers $M^n$ into $|M|$; in finite structures the preceding example shows that $\operatorname{Def}(\mathcal M)$ can equal the whole power set. The countability of the language is also essential to the stated bound: if the language had too many relation symbols or constants, the number of formulas could dominate $|M|$. The theorem does not say that most subsets of every infinite $M$ are undefinable, only that the definable ones are controlled by the size of the structure; this is exactly the kind of control needed at successor stages of the constructible hierarchy. ## The Constructible Hierarchy The cumulative hierarchy was built by the recursion $V_{\alpha+1}=\mathcal P(V_\alpha)$. The constructible hierarchy asks what remains if, at each successor stage, we keep only the subsets of the previous stage that are first-order definable over it. [definition: Constructible Hierarchy] The constructible hierarchy is the class sequence $(L_\alpha)_{\alpha \in \operatorname{Ord}}$ defined by transfinite recursion as follows: \begin{align*} L_0 &= \varnothing,\\ L_{\alpha+1} &= \operatorname{Def}(L_\alpha),\\ L_\lambda &= \bigcup_{\alpha<\lambda} L_\alpha \quad \text{for limit ordinals } \lambda. \end{align*} [/definition] The relation used in $\operatorname{Def}(L_\alpha)$ is actual membership restricted to $L_\alpha$. Thus every member of $L_{\alpha+1}$ is a subset of $L_\alpha$ definable over $(L_\alpha,\in)$ using parameters from $L_\alpha$. The hierarchy gives only stage-by-stage approximations. To state constructibility as a property of a set, and to compare the definable universe with the ambient universe $V$, we collect all stages into one class. [definition: Constructible Universe] The constructible universe is the class \begin{align*} L = \bigcup_{\alpha \in \operatorname{Ord}} L_\alpha. \end{align*} A set $x$ is constructible if $x \in L$. [/definition] The notation parallels $V=\bigcup_{\alpha \in \operatorname{Ord}}V_\alpha$, but the construction is more selective. Membership in $L$ tells us that a set appears somewhere in the hierarchy, but later arguments need the first stage at which it appears. The constructible rank records that entry point and turns the stage of first definability into usable notation. [definition: Constructible Rank] The constructible rank is the map \begin{align*} \rho_L: L &\longrightarrow \operatorname{Ord},\\ x &\longmapsto \min\{\alpha \in \operatorname{Ord}: x \in L_{\alpha+1}\}. \end{align*} [/definition] The shift by one is convenient because elements of $L_{\alpha+1}$ are subsets of $L_\alpha$. The constructible rank records when a set is first definable from earlier constructible material, rather than when it first appears in the ordinary rank hierarchy. [example: The First Stages of the Constructible Hierarchy] We have $L_0=\varnothing$, and by definition \begin{align*} L_1=\operatorname{Def}(L_0). \end{align*} Every member of $\operatorname{Def}(L_0)$ is a subset of $L_0$, and the only subset of $\varnothing$ is $\varnothing$. The set $\varnothing$ is definable over $(L_0,\in)$ by the formula $x\ne x$, since \begin{align*} \{x\in L_0:x\ne x\}=\varnothing. \end{align*} Hence \begin{align*} L_1=\{\varnothing\}. \end{align*} Now $L_1=\{\varnothing\}$ has exactly two subsets: \begin{align*} \mathcal P(L_1)=\{\varnothing,\{\varnothing\}\}. \end{align*} The empty subset is again defined by \begin{align*} \{x\in L_1:x\ne x\}=\varnothing, \end{align*} and the whole subset is defined by \begin{align*} \{x\in L_1:x=x\}=L_1=\{\varnothing\}. \end{align*} Since every definable subset of $L_1$ is a subset of $L_1$, these two definitions exhaust $\operatorname{Def}(L_1)$, so \begin{align*} L_2=\operatorname{Def}(L_1)=\{\varnothing,\{\varnothing\}\}. \end{align*} Thus the first two successor stages agree with the corresponding early stages of the cumulative hierarchy; the agreement comes from the fact that finite structures have all of their subsets definable from parameters, and it should not be mistaken for equality at later infinite stages. [/example] These computations also show why transitivity is plausible: each new object is a subset of the previous stage. The obstruction is that plausibility at the first finite levels is not enough; later definability stages could only support an inner-model argument if elements of newly constructed sets have already appeared earlier. We therefore need a general closure result showing that the hierarchy remains transitive at every ordinal stage. [quotetheorem:4849] [citeproof:4849] Transitivity lets formulas with bounded quantifiers behave as expected, and it ensures that membership inside $L_\alpha$ is the true membership relation restricted to that stage. The hypothesis that the successor stage consists of subsets of $L_\alpha$ is doing the work; without it, a hierarchy could introduce objects whose elements had not appeared earlier, destroying the inner-model picture. The theorem does not imply that each $L_\alpha$ satisfies much set theory, since early stages are far too small and even large stages may fail Replacement internally. For the hierarchy to function cumulatively, transitivity must be accompanied by monotonicity: once a set appears, later stages must not lose it. This matters because definitions at later stages use old objects as parameters, so the next closure property records persistence through the hierarchy. [quotetheorem:4850] [citeproof:4850] This theorem is the technical reason parameters can be reused at later stages. Transitivity is essential in the proof: to define $x$ over $L_\alpha$ by the formula $u \in x$, the parameter $x$ must already be an element of the structure and the set being defined must be a subset of that structure. The result is only inclusion, not equality; successor stages normally add new definable subsets of $L_\alpha$. It also shows that $L_\lambda$ at a limit stage is a genuine cumulative union, not merely a disjoint collection of earlier approximations. ## Basic Closure Properties of $L$ To be an inner model, $L$ must not merely be a transitive class: it must contain all ordinals and satisfy the ZFC axioms. The next results isolate the closure facts that make this possible. [quotetheorem:4851] [citeproof:4851] This result uses the fact that earlier stages have exactly the earlier ordinals available. This hypothesis is not cosmetic: if a hierarchy skipped ordinals or added a false ordinal too early, later stages would not have the same height as $V$ and could not serve as an inner model. The theorem does not say that $L_\alpha$ contains every set of ordinary rank below $\alpha$; it says only that the ordinal spine of the universe is preserved. This places $L$ in the same ordinal height as $V$, which is a central part of the phrase inner model. [definition: Inner Model] An inner model is a transitive proper class $M$ such that every ordinal belongs to $M$ and $(M,\in)$ satisfies ZF. [/definition] Some authors include Choice in the definition when working over ZFC; here we separate ZF from Choice so that the role of definability in proving well-orderability remains visible. The remaining question is whether the carefully built class $L$ actually satisfies the axioms required of an inner model, rather than merely looking like a transitive cumulative hierarchy. This is where definability, absoluteness for set-sized satisfaction, and the preservation of all ordinals are combined to turn the construction into a model of ZFC. [quotetheorem:4852] [citeproof:4852] This theorem is the foundational reason $L$ is useful. The hypotheses that $L$ is transitive and contains all ordinals are both indispensable: transitivity gives the correct membership relation, while the full ordinal height lets the model interpret Replacement and cardinal structure in the intended way. The theorem is internal to the ambient ZFC universe and does not imply that $L=V$, nor that $L$ contains all subsets present in $V$. It gives a canonical ZFC universe inside any ambient universe satisfying ZFC, allowing statements to be tested in a highly organised subuniverse before forcing or large cardinals enter the course. [remark: Absoluteness Needed in the Proof] The proof that $L$ satisfies ZFC depends on controlled absoluteness for satisfaction over set-sized structures. For each fixed structure $L_\alpha$, the satisfaction relation for first-order formulas over $(L_\alpha,\in)$ can be coded in $V$ by recursion on formula complexity. This does not contradict Tarski's theorem, because the satisfaction relation is for a set-sized structure rather than for the entire universe. [/remark] This distinction between set-sized satisfaction and truth in the universe will recur throughout forcing. Constructibility is the first place where it becomes an organising tool rather than a technical aside. ## Condensation and the Shape of $L$ The construction of $L$ has a strong self-similarity property: elementary substructures of levels of $L$ collapse to earlier levels of $L$. This condensation principle is the first serious structural fact about $L$, and it is the engine behind many cardinal arithmetic consequences. The key point is that small elementary views of $L_\alpha$ are not arbitrary transitive structures: after collapse, they are initial segments of the same hierarchy. Elementarity is essential here, since an arbitrary subset of $L_\alpha$ may collapse to a transitive set with no reason to be a constructible level. Well-foundedness and extensionality are also hidden in the collapse hypothesis, because without them the Mostowski collapse would not produce a genuine transitive membership structure. [example: A Finite Collapse as a Model for Condensation] Let $X=\{a,b,c\}$ and define the collapse map recursively by \begin{align*} \pi(x)=\{\pi(y):yEx\}. \end{align*} First suppose $aEb$ and $aEc$ are the only non-reflexive $E$-arrows. Then $a$ has no $E$-predecessors, while $b$ and $c$ have exactly the same $E$-predecessor, namely $a$. Hence \begin{align*} \pi(a)&=\{\pi(y):yEa\}=\varnothing,\\ \pi(b)&=\{\pi(y):yEb\}=\{\pi(a)\}=\{\varnothing\},\\ \pi(c)&=\{\pi(y):yEc\}=\{\pi(a)\}=\{\varnothing\}. \end{align*} Thus $b$ and $c$ have the same predecessors, so $E$ is not extensional: the distinct elements $b$ and $c$ are not separated by their membership profiles. Now change the relation so that $aEb$ and $bEc$ are the only non-reflexive $E$-arrows. Then the recursive collapse gives \begin{align*} \pi(a)&=\varnothing,\\ \pi(b)&=\{\pi(a)\}=\{\varnothing\},\\ \pi(c)&=\{\pi(b)\}=\{\{\varnothing\}\}. \end{align*} The collapsed set is therefore \begin{align*} \pi[X]=\{\varnothing,\{\varnothing\},\{\{\varnothing\}\}\}. \end{align*} Membership in the collapse matches the original arrows: $\pi(a)\in \pi(b)$ because $\varnothing\in\{\varnothing\}$, and $\pi(b)\in \pi(c)$ because $\{\varnothing\}\in\{\{\varnothing\}\}$. This finite computation shows the mechanism behind condensation: once a relation is well-founded and extensional, its collapse replaces abstract membership arrows by actual membership between sets. [/example] The finite example does not contain elementarity, but it isolates the collapse mechanism used in condensation. Condensation adds the constructibility conclusion: after collapse, the structure is not merely transitive, but exactly some $L_\beta$. ## The Axiom $V=L$ and Cardinal Arithmetic Inside $L$ Having built $L$ as an inner model, we can ask what $L$ believes about its own universe. The answer is that $L$ satisfies $V=L$: internally, every set is constructible. [definition: Axiom of Constructibility] The axiom $V=L$ is the first-order assertion that every set belongs to the constructible universe: \begin{align*} \forall x\, \exists \alpha\, (\alpha \text{ is an ordinal and } x \in L_\alpha). \end{align*} [/definition] Strictly speaking, this is expressed in the language of set theory by expanding the recursive definition of $L_\alpha$ through a formula defining membership in the constructible hierarchy. The displayed form is the mathematical abbreviation used throughout the course. [quotetheorem:4853] [citeproof:4853] This result does not say that $V=L$ holds in the ambient universe. The key hypothesis is that the structure under discussion is $(L,\in)$ itself; if the ambient universe has non-constructible sets, those sets are simply not elements of the structure whose truth is being evaluated. The theorem also relies on the absoluteness of the construction of each $L_\alpha$ between $L$ and $V$, so that $L$ computes its own hierarchy correctly. Once $L$ is known to be an inner model satisfying $V=L$, it can be used for relative consistency. The next step is to make explicit what this says about the theory ZFC together with the axiom of constructibility. [quotetheorem:4854] [citeproof:4854] The relative consistency argument is conditional: it starts from a model of ZFC and builds an inner model, so it proves no absolute consistency from nothing. It also does not decide whether the actual universe satisfies $V=L$; forcing will later show how to build extensions in which additional subsets appear. The axiom $V=L$ has consequences beyond consistency because every set then appears at a least constructible stage with a definable recipe. Ordering sets by their first stage, formula code, and parameter tuple produces a canonical global well-order, which is the next formal consequence to isolate. [quotetheorem:4855] [citeproof:4855] The well-order gives Choice in $L$ and allows subsets of a cardinal to be counted by the stage at which they first appear. The construction depends on choosing least formula codes and least parameter tuples, so it uses the recursive ordering of earlier levels rather than an external arbitrary well-order. The theorem gives a class well-order of $L$, not a set belonging to $L$ that lists all of $L$, since $L$ is a proper class. A global well-order alone does not yet give sharp bounds on how many subsets of a cardinal appear in $L$; it could in principle place many first definitions very high in the hierarchy. The remaining question is whether constructible subsets of a cardinal can be reflected down to a bounded initial segment of the hierarchy. Condensation supplies that reflection mechanism, and the resulting counting argument yields the Generalised Continuum Hypothesis inside $L$. [quotetheorem:4856] [citeproof:4856] The cardinal hypothesis matters: the argument is carried out inside $L$ for infinite $L$-cardinals, and the conclusion concerns only subsets that belong to $L$. Condensation is the step that prevents a subset of $\kappa$ from needing an arbitrarily high first definition; without it, the definable well-order alone would not give the sharp $\kappa^+$ bound. The conceptual point is that constructible subsets of $\kappa$ cannot first appear arbitrarily high: condensation reflects each such subset down below $\kappa^+$. [remark: What Constructibility Does Not Prove About V] Theorems proved inside $L$ need not hold in the ambient universe $V$. The statement $L\models \mathrm{GCH}$ says that the inner model's subsets of its cardinals obey GCH. It does not say that $V$ has no additional subsets of those cardinals. [/remark] This is the bridge to independence. Constructibility gives one highly controlled model of ZFC; forcing, which comes later, builds other models with additional subsets and different cardinal arithmetic.

Created by admin on 6/1/2026 | Last updated on 6/1/2026

What brings you to Androma?

Start with a route through the knowledge graph.

Set Theory I (see Axiomatic Set Theory entry above).

Sign in to Androma

Check your inbox

One last step

Set Theory I (see Axiomatic Set Theory entry above).

Prerequisites (0/11 completed)

Prerequisites Graph

Rate this page