Every mathematical structure rests on the same primitive foundation: the **set**. [Groups](/page/Group) are sets with a binary operation; [topological](/page/Topology) spaces are sets equipped with a collection of designated subsets; measure spaces are sets with a $\sigma$-algebra and a measure. When a mathematician writes "$f: X \to Y$ is [continuous](/page/Continuity)," the objects $X$, $Y$, and $f$ are all defined in terms of sets. Yet the concept of a set is not as straightforward as it first appears. The naive intuition — "a set is any collection of objects" — leads to contradictions, and building a rigorous foundation requires care. This page develops the notion of a set from its naive origins through the paradoxes that forced an axiomatic treatment, defines the fundamental operations on sets, and establishes the first non-trivial results about their structure.
[motivation]
## Motivation
### The Naive Conception
Georg Cantor, in his 1895 *Beiträge*, described a set as "a collection into a whole of definite, distinct objects of our perception or our thought." Under this conception, every property $P$ determines a set $\{x : P(x)\}$ — the collection of all objects satisfying $P$. This principle, called **unrestricted comprehension**, is powerful: it produces the natural numbers $\{1, 2, 3, \ldots\}$, the real line $\mathbb{R}$, the set of all continuous [functions](/page/Function) on $[0,1]$, and indeed the set of all sets. For most everyday mathematical purposes, this approach works perfectly well, and mathematicians routinely write set-builder notation $\{x : P(x)\}$ without incident.
### The Catastrophe: Self-Reference
In 1901, Bertrand Russell observed that unrestricted comprehension allows a devastating self-reference. Consider the property $P(x) :\Leftrightarrow x \notin x$ — the property of not being a member of oneself. Under unrestricted comprehension, $R = \{x : x \notin x\}$ is a set. But is $R$ a member of itself? If $R \in R$, then by definition $R \notin R$; if $R \notin R$, then $R$ satisfies the defining property and so $R \in R$. Both cases lead to a contradiction. The naive conception of set, which seemed so natural, is inconsistent.
This is not an isolated curiosity. The same self-referential mechanism produces other paradoxes: the Burali-Forti paradox (the "set of all ordinals" cannot be an ordinal), and Cantor's own paradox (the "set of all sets" leads to a contradiction with his theorem on power sets). The lesson is that collections which are "too large" or "too self-referential" cannot be sets.
### The Axiomatic Resolution
The response to these paradoxes was to abandon the idea that every property defines a set and replace it with a carefully chosen list of axioms specifying exactly which set-forming operations are permitted. The most widely adopted system is **Zermelo-Fraenkel set theory with the Axiom of Choice** (ZFC). In ZFC, "set" is a primitive undefined term (just as "point" is in Euclidean geometry), and the axioms govern what one may do with sets: form pairs, take unions, separate elements by a property *within an existing set*, take power sets, and so on. Crucially, the Axiom Schema of Separation replaces unrestricted comprehension: given an existing set $A$ and a property $P$, one may form $\{x \in A : P(x)\}$, but not $\{x : P(x)\}$ in general. This seemingly small restriction blocks Russell's paradox while preserving all the set constructions that mathematics actually uses.
[/motivation]
## The Axioms of ZFC
Working within ZFC, we do not define what a set *is* — the term is primitive. Instead, the axioms specify what sets exist and what operations produce new sets from old ones. The single primitive relation is **membership**: $x \in A$ means "$x$ is an element of $A$." All other notions (subset, union, function, cardinality) are defined from membership alone.
We state the axioms informally here; each can be made precise in the language of first-order logic with a single binary relation symbol $\in$.
The first axiom ensures that sets are determined entirely by their elements — two sets with the same members are the same set, regardless of how they were described or constructed.
[definition:Axiom Of Extensionality]
**Axiom of Extensionality.** Two sets are equal if and only if they have exactly the same elements:
\begin{align*}
\forall A\, \forall B\, \bigl(A = B \iff \forall x\, (x \in A \iff x \in B)\bigr).
\end{align*}
[/definition]
Extensionality means that a set has no internal structure beyond its elements: $\{1, 2, 3\}$, $\{3, 1, 2\}$, and $\{1, 2, 2, 3\}$ are all the same set, because they have the same members. There is no notion of order or multiplicity.
The next axiom guarantees that the set-theoretic universe is not vacuous — at least one set exists.
[definition:Axiom Of Empty Set]
**Axiom of the Empty Set.** There exists a set with no elements, denoted $\varnothing$:
\begin{align*}
\exists\, \varnothing\; \forall x\; (x \notin \varnothing).
\end{align*}
By extensionality, this set is unique.
[/definition]
The existence of $\varnothing$ is the starting point from which all of ZFC's constructions proceed. From the empty set alone, the pairing and union axioms can build the finite ordinals $0 = \varnothing$, $1 = \{\varnothing\}$, $2 = \{\varnothing, \{\varnothing\}\}$, and so on.
The axiom that directly resolves Russell's paradox is separation. Unrestricted comprehension said: for any property $P$, the collection $\{x : P(x)\}$ is a set. Separation weakens this to: for any property $P$ and any *existing* set $A$, the collection $\{x \in A : P(x)\}$ is a set. The difference is that one can only carve out subsets of sets that already exist — one cannot conjure a set from a property alone.
[definition:Axiom Schema Of Separation]
**Axiom Schema of Separation (Aussonderung).** For every set $A$ and every property $P(x)$ expressible in the language of set theory, there exists a set
\begin{align*}
B = \{x \in A : P(x)\}.
\end{align*}
[/definition]
With separation, the attempt to form $R = \{x : x \notin x\}$ fails: one can only form $\{x \in A : x \notin x\}$ for some pre-existing set $A$, which is a harmless subset of $A$ (and the contradiction shows only that $R_A \notin A$, not that $R_A$ doesn't exist).
### The Remaining Axioms
The other axioms of ZFC provide the tools for building new sets:
- **Pairing:** for any sets $a$ and $b$, the set $\{a, b\}$ exists.
- **Union:** for any set $\mathcal{F}$ of sets, the union $\bigcup \mathcal{F} = \{x : \exists A \in \mathcal{F},\; x \in A\}$ exists.
- **Power Set:** for any set $A$, the power set $\mathcal{P}(A) = \{B : B \subseteq A\}$ exists.
- **Infinity:** there exists an infinite set (specifically, an inductive set containing $\varnothing$ and closed under the successor operation $x \mapsto x \cup \{x\}$).
- **Replacement (Schema):** the image of a set under any definable function is again a set.
- **Regularity (Foundation):** every nonempty set $A$ contains an element disjoint from $A$; equivalently, there is no infinite descending chain $\cdots \in a_2 \in a_1 \in a_0$. This rules out pathologies like $a \in a$.
- **Choice:** for every family of nonempty sets, there exists a function selecting one element from each set.
Together, these axioms are strong enough to develop all of standard mathematics — number systems, functions, topological spaces, measure theory, algebra — while avoiding the paradoxes of naive set theory.
## Russell's Paradox
The following result, already discussed in the motivation, makes precise exactly why unrestricted comprehension is untenable. It is the simplest and most famous of the set-theoretic paradoxes, and its proof is a direct diagonalisation argument — the same technique that reappears in [Cantor's theorem](/theorems/621).
[quotetheorem:620]
The paradox does far more than eliminate a single pathological set. It demonstrates that any foundation for mathematics must restrict the principle by which properties determine collections. In ZFC, the restriction is separation: one may only separate elements from a set that already exists. In NBG (von Neumann–Bernstein–Gödel) set theory, the restriction is different: "collections that are too large" (like the collection of all sets) are called *proper classes* and are not themselves sets — they cannot be elements of other collections. Both approaches block the paradox, but in different ways.
The diagonal technique at the heart of Russell's paradox — defining an object that, by construction, differs from every candidate in a given list — is one of the most powerful ideas in mathematics. It reappears in the proof that $\mathbb{R}$ is uncountable, in Gödel's incompleteness theorems, in the undecidability of the halting problem, and in the proof of [Cantor's theorem](/theorems/621) below.
## Subsets and the Inclusion Ordering
Once sets exist, the most basic relation between them (beyond membership) is containment. A set $A$ is a subset of $B$ if every element of $A$ also belongs to $B$. This relation turns the collection of all subsets of a fixed set into a partially ordered set — in fact, a complete lattice — with union as join and intersection as meet.
[definition:Subset]
Let $A$ and $B$ be sets. $A$ is a **subset** of $B$, written $A \subseteq B$, if every element of $A$ is an element of $B$:
\begin{align*}
A \subseteq B \iff \forall x\, (x \in A \implies x \in B).
\end{align*}
$A$ is a **proper subset** of $B$, written $A \subset B$, if $A \subseteq B$ and $A \neq B$.
[/definition]
Two immediate consequences of extensionality and the definition of subset are worth recording. First, to prove $A = B$ it suffices to show $A \subseteq B$ and $B \subseteq A$ (this is the **double-inclusion** method, used in virtually every set-equality proof). Second, the empty set is a subset of every set: the statement $\forall x\, (x \in \varnothing \implies x \in B)$ is vacuously true for any $B$, since the hypothesis $x \in \varnothing$ is never satisfied.
[example:Subsets Of A Three Element Set]
Let $S = \{1, 2, 3\}$. The subsets of $S$ are:
\begin{align*}
\varnothing,\; \{1\},\; \{2\},\; \{3\},\; \{1,2\},\; \{1,3\},\; \{2,3\},\; \{1,2,3\}.
\end{align*}
There are $2^3 = 8$ subsets in total. This count generalises: a finite set with $n$ elements has exactly $2^n$ subsets, because each element independently either belongs or does not belong to a given subset — each subset corresponds to a binary string of length $n$, and there are $2^n$ such strings.
[/example]
## Set Operations
The basic constructions on sets — union, intersection, difference, and complement — mirror the logical connectives $\lor$, $\land$, and $\lnot$. This correspondence is not a coincidence: set membership $x \in A$ is a proposition, and the set operations are defined by combining such propositions with logical connectives.
### Union and Intersection
The union of two sets collects all elements that belong to at least one of them; the intersection collects those that belong to both. These operations generalise to arbitrary indexed families.
[definition:Union]
Let $A$ and $B$ be sets. The **union** of $A$ and $B$ is
\begin{align*}
A \cup B := \{x : x \in A \lor x \in B\}.
\end{align*}
More generally, for an indexed family $\{A_i\}_{i \in I}$ of sets:
\begin{align*}
\bigcup_{i \in I} A_i := \{x : \exists\, i \in I,\; x \in A_i\}.
\end{align*}
[/definition]
[definition:Intersection]
Let $A$ and $B$ be sets. The **intersection** of $A$ and $B$ is
\begin{align*}
A \cap B := \{x : x \in A \land x \in B\}.
\end{align*}
For an indexed family $\{A_i\}_{i \in I}$ with $I \neq \varnothing$:
\begin{align*}
\bigcap_{i \in I} A_i := \{x : \forall\, i \in I,\; x \in A_i\}.
\end{align*}
[/definition]
The requirement $I \neq \varnothing$ for the arbitrary intersection deserves comment. If $I = \varnothing$, then $\forall\, i \in I,\; x \in A_i$ is vacuously true for every $x$, so $\bigcap_{i \in \varnothing} A_i$ would be the "set of all objects" — which, as Russell's paradox shows, is not a set. In practice, one either requires $I \neq \varnothing$ or works within a fixed ambient set $X$ and defines $\bigcap_{i \in \varnothing} A_i := X$.
### Set Difference and Complement
Removing elements of one set from another gives the set difference. When the ambient set is fixed, this becomes the complement.
[definition:Set Difference]
Let $A$ and $B$ be sets. The **set difference** (or **relative complement**) of $B$ in $A$ is
\begin{align*}
A \setminus B := \{x \in A : x \notin B\}.
\end{align*}
[/definition]
[definition:Complement]
Let $X$ be a set and $A \subseteq X$. The **complement** of $A$ in $X$ is
\begin{align*}
A^c := X \setminus A = \{x \in X : x \notin A\}.
\end{align*}
When the ambient set $X$ is understood from context, one writes simply $A^c$.
[/definition]
The fundamental identities governing how complementation interacts with union and intersection are De Morgan's laws. They convert unions into intersections and vice versa — a duality that underlies many arguments in analysis (for instance, the equivalence between open-cover and closed-set formulations of compactness).
[quotetheorem:622]
De Morgan's laws are the set-theoretic incarnation of the logical identities $\lnot(P \lor Q) \iff \lnot P \land \lnot Q$ and $\lnot(P \land Q) \iff \lnot P \lor \lnot Q$. Their importance extends well beyond elementary set theory: they appear in measure theory (complementing a countable union of measurable sets), in topology (the complement of an arbitrary union of [open sets](/page/Open%20Set) is an intersection of [closed sets](/page/Closed%20Set), and vice versa), and in functional analysis (characterising the complement of a union of null sets). The generalised versions to arbitrary indexed families follow by the same argument, replacing finite conjunctions and disjunctions with universal and existential quantifiers.
[example:De Morgan In Three Element Set]
Let $X = \{1, 2, 3, 4, 5\}$, $A = \{1, 2, 3\}$, and $B = \{2, 4, 5\}$. Then $A \cup B = \{1, 2, 3, 4, 5\} = X$ and $A \cap B = \{2\}$. The complements are $A^c = \{4, 5\}$ and $B^c = \{1, 3\}$. De Morgan's laws give:
\begin{align*}
(A \cup B)^c &= X^c = \varnothing = \{4, 5\} \cap \{1, 3\} = A^c \cap B^c, \\
(A \cap B)^c &= \{2\}^c = \{1, 3, 4, 5\} = \{4, 5\} \cup \{1, 3\} = A^c \cup B^c.
\end{align*}
Both identities hold, as the theorem guarantees.
[/example]
### Cartesian Product
The operations above combine sets by manipulating membership conditions on individual elements. The Cartesian product does something fundamentally different: it builds a new set whose elements are *ordered pairs* — objects that record both a choice from one set and a choice from another, with the order of selection retained.
To define ordered pairs rigorously, one needs a construction in which $(a, b) = (c, d)$ if and only if $a = c$ and $b = d$. The standard construction, due to Kuratowski, represents the ordered pair as a set of sets: $(a, b) := \{\{a\}, \{a, b\}\}$. One verifies (using only extensionality and pairing) that this encoding satisfies the ordered-pair property.
[definition:Cartesian Product]
Let $A$ and $B$ be sets. The **Cartesian product** of $A$ and $B$ is
\begin{align*}
A \times B := \{(a, b) : a \in A \land b \in B\},
\end{align*}
where $(a, b) := \{\{a\}, \{a, b\}\}$ is the Kuratowski ordered pair.
[/definition]
The Cartesian product is the foundation for defining functions ($f: A \to B$ is a subset of $A \times B$ satisfying a uniqueness condition), relations (subsets of $A \times B$), and product topologies and product measures. The construction extends to finite products $A_1 \times \cdots \times A_n$ by iteration, and to arbitrary products $\prod_{i \in I} A_i$ via functions from the index set — whose nonemptiness, when the factors are nonempty, is precisely the content of the Axiom of Choice.
## The Power Set and Cantor's Theorem
The power set of $A$ — the set of all subsets of $A$ — is guaranteed to exist by the Power Set Axiom. For finite sets, the power set is simply larger (a set with $n$ elements has $2^n$ subsets). The remarkable fact is that this strict inequality persists for *infinite* sets: the power set of $\mathbb{N}$ is strictly larger than $\mathbb{N}$ itself, and no set can ever be put into bijection with its own power set.
[definition:Power Set]
Let $A$ be a set. The **power set** of $A$ is
\begin{align*}
\mathcal{P}(A) := \{B : B \subseteq A\}.
\end{align*}
[/definition]
[example:Power Set Of A Pair]
Let $A = \{a, b\}$ where $a \neq b$. The power set is
\begin{align*}
\mathcal{P}(\{a, b\}) = \bigl\{\varnothing,\; \{a\},\; \{b\},\; \{a, b\}\bigr\},
\end{align*}
which has $2^2 = 4$ elements. As a check: the inclusion $\iota: A \to \mathcal{P}(A)$ given by $\iota(x) = \{x\}$ maps $a \mapsto \{a\}$ and $b \mapsto \{b\}$. This map is injective but not surjective — neither $\varnothing$ nor $\{a,b\}$ is in its range. Already for this smallest non-trivial example, $|A| = 2 < 4 = |\mathcal{P}(A)|$.
[/example]
The question is whether this strict inequality is an artifact of finiteness or a universal phenomenon. Cantor's theorem shows it is universal: even for infinite sets, the power set is always strictly larger. The proof uses exactly the diagonal technique that powered Russell's paradox — but now in a constructive direction, producing for any candidate function $f: A \to \mathcal{P}(A)$ a specific subset that $f$ fails to hit.
[quotetheorem:621]
Cantor's theorem has profound consequences. Applied to $A = \mathbb{N}$, it shows $|\mathbb{N}| < |\mathcal{P}(\mathbb{N})|$, so the set of all subsets of natural numbers is uncountable. Since $\mathcal{P}(\mathbb{N})$ can be identified with $\{0, 1\}^{\mathbb{N}}$ (binary [sequences](/page/Sequence)) and hence with $\mathbb{R}$ (via binary expansions), this gives another proof that $\mathbb{R}$ is uncountable. More dramatically, applying the theorem iteratively yields an infinite strictly increasing chain of cardinalities:
\begin{align*}
|\mathbb{N}| < |\mathcal{P}(\mathbb{N})| < |\mathcal{P}(\mathcal{P}(\mathbb{N}))| < \cdots
\end{align*}
There is no largest infinity — the hierarchy of cardinal numbers is unbounded. Whether there exists a cardinality strictly between $|\mathbb{N}|$ and $|\mathcal{P}(\mathbb{N})|$ is the **Continuum Hypothesis**, which Gödel and Cohen showed is independent of ZFC: it can neither be proved nor disproved from the standard axioms.
### Why Cantor's Diagonal Argument Is Not Russell's Paradox
The diagonal set $D = \{a \in A : a \notin f(a)\}$ in Cantor's proof looks structurally identical to the Russell set $R = \{x : x \notin x\}$. The crucial difference is that Cantor's $D$ is defined by separating elements of $A$ — a pre-existing set — using the property $a \notin f(a)$. This is a valid application of the Axiom Schema of Separation, so $D$ is a well-defined set. Russell's $R$, by contrast, attempts to collect *all* objects satisfying $x \notin x$ without restricting to a pre-existing set. The ZFC axioms permit the former and block the latter.
## Problems
[problem]
Let $A$, $B$, and $C$ be subsets of a set $X$. Prove the **distributive law**:
\begin{align*}
A \cap (B \cup C) = (A \cap B) \cup (A \cap C).
\end{align*}
[/problem]
[solution]
**Step 1 (Forward inclusion: $A \cap (B \cup C) \subseteq (A \cap B) \cup (A \cap C)$).** Let $x \in A \cap (B \cup C)$. By the definition of intersection, $x \in A$ and $x \in B \cup C$. By the definition of union, $x \in B$ or $x \in C$.
- **Case 1:** $x \in B$. Since $x \in A$ and $x \in B$, the definition of intersection gives $x \in A \cap B$. By the definition of union, $x \in (A \cap B) \cup (A \cap C)$.
- **Case 2:** $x \in C$. Since $x \in A$ and $x \in C$, the definition of intersection gives $x \in A \cap C$. By the definition of union, $x \in (A \cap B) \cup (A \cap C)$.
In both cases, $x \in (A \cap B) \cup (A \cap C)$, so the forward inclusion holds.
**Step 2 (Reverse inclusion: $(A \cap B) \cup (A \cap C) \subseteq A \cap (B \cup C)$).** Let $x \in (A \cap B) \cup (A \cap C)$. By the definition of union, $x \in A \cap B$ or $x \in A \cap C$.
- **Case 1:** $x \in A \cap B$. Then $x \in A$ and $x \in B$. Since $x \in B$, the definition of union gives $x \in B \cup C$. Hence $x \in A \cap (B \cup C)$.
- **Case 2:** $x \in A \cap C$. Then $x \in A$ and $x \in C$. Since $x \in C$, the definition of union gives $x \in B \cup C$. Hence $x \in A \cap (B \cup C)$.
In both cases, $x \in A \cap (B \cup C)$, so the reverse inclusion holds.
**Step 3 (Conclusion).** By double inclusion (Steps 1 and 2), $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$. $\blacksquare$
[/solution]
[problem]
Let $A$ be a finite set with $|A| = n$ elements. Prove that $|\mathcal{P}(A)| = 2^n$.
[/problem]
[solution]
**Step 1 (Establish a bijection with binary strings).** Define a function
\begin{align*}
\Phi: \mathcal{P}(A) &\to \{0, 1\}^A \\
B &\mapsto \mathbb{1}_B,
\end{align*}
where $\mathbb{1}_B: A \to \{0,1\}$ is the indicator function of $B$, defined by $\mathbb{1}_B(a) = 1$ if $a \in B$ and $\mathbb{1}_B(a) = 0$ if $a \notin B$.
**Step 2 (Verify $\Phi$ is a bijection).**
*Injectivity:* If $\Phi(B_1) = \Phi(B_2)$, then $\mathbb{1}_{B_1}(a) = \mathbb{1}_{B_2}(a)$ for all $a \in A$. For any $a \in A$: $a \in B_1 \iff \mathbb{1}_{B_1}(a) = 1 \iff \mathbb{1}_{B_2}(a) = 1 \iff a \in B_2$. By extensionality, $B_1 = B_2$.
*Surjectivity:* Given any function $g: A \to \{0,1\}$, define $B = \{a \in A : g(a) = 1\}$. Then $B \subseteq A$, so $B \in \mathcal{P}(A)$, and $\Phi(B) = \mathbb{1}_B = g$.
**Step 3 (Count).** Fix an enumeration $A = \{a_1, \ldots, a_n\}$. The bijection $\Phi$ identifies $\mathcal{P}(A)$ with $\{0,1\}^A$, which under the enumeration is $\{0,1\}^n$. Each of the $n$ coordinates takes one of $2$ values independently, so $|\{0,1\}^n| = 2^n$ by the multiplication principle.
**Step 4 (Conclusion).** Since $\Phi$ is a bijection, $|\mathcal{P}(A)| = |\{0,1\}^A| = 2^n$. $\blacksquare$
[/solution]
## References
- Halmos, P. R., *Naive Set Theory* (1960).
- Jech, T., *Set Theory: The Third Millennium Edition* (2003).
- Kunen, K., *Set Theory: An Introduction to Independence Proofs* (1980).
- Enderton, H. B., *Elements of Set Theory* (1977).