How do we assign "size" to a set? For finite collections of points, counting works. For intervals on the real line, length works. For rectangles in the plane, area works. But modern analysis demands far more: we must assign sizes to highly irregular sets — the preimage of an interval under a measurable function, the zero set of a Sobolev function, the support of a probability distribution — and we must do so in a way that is compatible with countable operations (unions, intersections, limits). The theory of measure spaces provides the rigorous framework for this assignment.
The need for such a framework is not merely aesthetic. The [Lebesgue integral](/page/Lebesgue%20Integral) partitions the *range* of a function and measures the size of each preimage set. Without a consistent way to measure those preimage sets, the integral cannot be defined. The [$L^p$ spaces](/page/L%5Ep%20Spaces) that form the backbone of functional analysis and PDE theory are quotient spaces modulo sets of "measure zero" — a concept that requires a measure. [Probability theory](/page/Probability%20Space) models random phenomena through probability measures on spaces of outcomes. In each case, the underlying structure is a measure space.
But building this framework forces us to confront a fundamental obstruction: **not every collection of sets can be consistently measured.** The Vitali construction (using the Axiom of Choice to select one representative from each coset of $\mathbb{Q}$ in $\mathbb{R}$, that is, from each element of $\mathbb{R}/\mathbb{Q}$) produces a set $V \subset [0,1]$ that cannot be assigned a Lebesgue measure without contradicting the translation invariance and countable additivity that make the theory useful. This means we cannot simply define a measure on *all* subsets of $\mathbb{R}$: we must first restrict attention to a well-behaved collection of "measurable" sets.
[example: The Vitali Obstruction]
Suppose, seeking a contradiction, that every subset of $[0,1)$ could be assigned a measure $\mu$ satisfying:
- $\mu([0,1)) = 1$,
- $\mu$ is countably additive: $\mu\bigl(\bigsqcup_{k=1}^\infty A_k\bigr) = \sum_{k=1}^\infty \mu(A_k)$ for pairwise disjoint sets,
- $\mu$ is translation invariant: $\mu\bigl((A + t) \bmod 1\bigr) = \mu(A)$ for every $A \subset [0,1)$ and every $t \in \mathbb{R}$ (translation is taken modulo $1$, so that translates stay inside $[0,1)$).
Construct the Vitali set $V \subset [0,1)$ by choosing one representative from each equivalence class of $[0,1)$ under the relation $x \sim y \iff x - y \in \mathbb{Q}$. Let $\{q_k\}_{k=1}^\infty$ be an enumeration of $\mathbb{Q} \cap [0,1)$, and define $V_k := (V + q_k) \bmod 1$. The sets $V_k$ are pairwise disjoint: if $v_1 + q_j \equiv v_2 + q_k \pmod{1}$ with $v_1, v_2 \in V$, then $v_1 - v_2 \in \mathbb{Q}$, forcing $v_1 = v_2$ (since $V$ contains exactly one point of each class), hence $q_j \equiv q_k \pmod{1}$ and therefore $q_j = q_k$, because both lie in $[0,1)$. Moreover, $[0,1) = \bigsqcup_{k=1}^\infty V_k$: every $x \in [0,1)$ is equivalent to some $v \in V$, so $x \in V_k$ for the index $k$ with $q_k = (x - v) \bmod 1$.
By countable additivity and translation invariance:
\begin{align*}
1 = \mu([0,1)) = \sum_{k=1}^\infty \mu(V_k) = \sum_{k=1}^\infty \mu(V).
\end{align*}
If $\mu(V) = 0$, the sum is $0 \neq 1$. If $\mu(V) > 0$, the sum diverges to $+\infty \neq 1$. Both cases produce contradictions.
The conclusion is inescapable: no measure on *all* subsets of $[0,1)$ can simultaneously be normalised, countably additive, and translation invariant. We must restrict the domain of the measure to a proper subcollection of subsets, and that subcollection is the $\sigma$-algebra.
[/example]
The Vitali construction reveals that the question is not "what is a measure?" but rather "on which sets can a measure be defined?" The answer — a $\sigma$-algebra — is a collection of sets closed under the operations that arise naturally in analysis: complementation, countable unions, and countable intersections. A **measure space** is then a set $X$ equipped with a $\sigma$-algebra $\mathcal{F}$ of "measurable" subsets and a measure $\mu$ that assigns a non-negative size to each set in $\mathcal{F}$.
## Definition
The definition of a measure space has three components, each addressing a specific need. The $\sigma$-algebra specifies which sets can be measured. The measure assigns sizes to those sets. The axiom of countable additivity ensures compatibility with the limit operations that pervade analysis.
[definition: Measure Space]
A **measure space** is a triple $(X, \mathcal{F}, \mu)$ consisting of:
1. A non-empty set $X$ (the **underlying set** or **sample space**).
2. A **$\sigma$-algebra** $\mathcal{F}$ on $X$: a collection $\mathcal{F} \subset \mathcal{P}(X)$ of subsets of $X$ satisfying:
- (i) $\varnothing \in \mathcal{F}$,
- (ii) if $A \in \mathcal{F}$, then $A^c := X \setminus A \in \mathcal{F}$ (closure under complementation),
- (iii) if $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{k=1}^\infty A_k \in \mathcal{F}$ (closure under countable unions).
The pair $(X, \mathcal{F})$ is called a **measurable space**, and the elements of $\mathcal{F}$ are called **$\mathcal{F}$-measurable sets** (or simply **measurable sets** when $\mathcal{F}$ is clear from context).
3. A **measure** $\mu$ on $(X, \mathcal{F})$: a function $\mu: \mathcal{F} \to [0, \infty]$ satisfying:
- (i) $\mu(\varnothing) = 0$,
- (ii) **countable additivity** ($\sigma$-additivity): if $A_1, A_2, \ldots \in \mathcal{F}$ are pairwise disjoint, then
\begin{align*}
\mu\!\left(\bigsqcup_{k=1}^\infty A_k\right) = \sum_{k=1}^\infty \mu(A_k).
\end{align*}
The measure space is called **finite** if $\mu(X) < \infty$, and a **probability space** if $\mu(X) = 1$ (in which case we write $\mathbb{P}$ for $\mu$ and call $\mathcal{F}$ the **event $\sigma$-algebra**).
[/definition]
Several aspects of this definition deserve attention.
First, the $\sigma$-algebra axioms are designed to produce a collection of sets that is stable under the operations used in analysis. Axiom (ii) ensures that if a set $A$ is measurable, so is its complement — we can speak of the "set where $f > 0$" and the "set where $f \le 0$" simultaneously. Axiom (iii), closure under *countable* unions, is the crucial upgrade from an algebra (which requires only *finite* unions). This upgrade is what makes $\sigma$-algebras compatible with limits: if each set $A_k = \{f_k > 0\}$ is measurable, then $\limsup_{k \to \infty} A_k = \bigcap_{m=1}^\infty \bigcup_{k=m}^\infty A_k$ is also measurable, which is essential for the [Borel-Cantelli lemma](/page/Probability%20Space) and the definition of convergence almost everywhere.
Note that closure under countable intersections follows from the axioms: $\bigcap_{k=1}^\infty A_k = \bigl(\bigcup_{k=1}^\infty A_k^c\bigr)^c$, and both complementation and countable union are available. Similarly, $X = \varnothing^c \in \mathcal{F}$.
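On a finite set the three axioms can be checked mechanically, which makes a convenient sanity test for small examples. The following is a minimal sketch, not part of the formal development: the function name `is_sigma_algebra` is hypothetical, and it assumes that $X$ is finite, so that every countable union reduces to a finite one and closure under pairwise unions suffices for axiom (iii).

```python
from itertools import combinations


def is_sigma_algebra(X, F):
    """Check the sigma-algebra axioms for a family F of subsets of a finite set X.

    On a finite set every countable union is a finite union, so closure under
    pairwise unions is all that axiom (iii) requires.
    """
    X = frozenset(X)
    F = {frozenset(A) for A in F}
    if frozenset() not in F:                                  # axiom (i)
        return False
    if any(X - A not in F for A in F):                        # axiom (ii)
        return False
    return all(A | B in F for A, B in combinations(F, 2))     # axiom (iii)


X = {1, 2, 3, 4}
print(is_sigma_algebra(X, [set(), {1, 2}, {3, 4}, X]))   # True
print(is_sigma_algebra(X, [set(), {1}, X]))              # False: {2, 3, 4} is missing
```

The second family fails axiom (ii) because the complement of $\{1\}$ is absent. Closure under intersections is never tested separately, in line with the observation above that it follows from the other axioms.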
Second, countable additivity is the heart of the measure axioms. Finite additivity alone — the weaker requirement that $\mu(A \sqcup B) = \mu(A) + \mu(B)$ — is insufficient for modern analysis, because it does not imply the continuity properties (from below and from above) that underpin the Monotone Convergence Theorem and the Dominated Convergence Theorem. Without countable additivity, the passage from "each approximation $f_k$ is integrable" to "the limit $f$ is integrable" breaks down.
Third, we allow $\mu$ to take the value $+\infty$. This is essential for Lebesgue measure on $\mathbb{R}^n$ (where $\mathcal{L}^n(\mathbb{R}^n) = +\infty$) and for counting measure (where any infinite set has measure $+\infty$). The extended non-negative reals $[0, \infty]$ form a natural codomain for measures.
## Generating $\sigma$-Algebras
A basic difficulty in working with $\sigma$-algebras is that they are typically enormous, too large to describe by listing their elements. The Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$ has cardinality $2^{\aleph_0}$ (the cardinality of the continuum), and in fact every infinite $\sigma$-algebra is uncountable: a $\sigma$-algebra is either finite or has at least $2^{\aleph_0}$ elements. Directly verifying that a collection of sets satisfies the $\sigma$-algebra axioms, or that a particular set belongs to a given $\sigma$-algebra, is usually impractical.
The solution is to specify a $\sigma$-algebra through a **generating family**: an (often small) collection of sets whose $\sigma$-algebraic closure is the desired $\sigma$-algebra.
[definition: Generated Sigma-Algebra]
Let $X$ be a non-empty set and let $\mathcal{C} \subset \mathcal{P}(X)$ be an arbitrary collection of subsets. The **$\sigma$-algebra generated by $\mathcal{C}$**, denoted $\sigma(\mathcal{C})$, is the smallest $\sigma$-algebra on $X$ containing $\mathcal{C}$:
\begin{align*}
\sigma(\mathcal{C}) := \bigcap \bigl\{ \mathcal{G} : \mathcal{G} \text{ is a } \sigma\text{-algebra on } X \text{ and } \mathcal{C} \subset \mathcal{G} \bigr\}.
\end{align*}
This intersection is well-defined and is itself a $\sigma$-algebra: the family $\mathcal{P}(X)$ is always a $\sigma$-algebra containing $\mathcal{C}$, so the intersection is taken over a non-empty collection of $\sigma$-algebras, and an arbitrary intersection of $\sigma$-algebras is a $\sigma$-algebra (verified directly from the axioms).
[/definition]
The generated $\sigma$-algebra $\sigma(\mathcal{C})$ is characterised by a universal property: it is the unique $\sigma$-algebra $\mathcal{F}$ on $X$ satisfying (i) $\mathcal{C} \subset \mathcal{F}$, and (ii) if $\mathcal{G}$ is any $\sigma$-algebra with $\mathcal{C} \subset \mathcal{G}$, then $\mathcal{F} \subset \mathcal{G}$. This minimality is the key to the **good sets principle**: to show that every set in $\sigma(\mathcal{C})$ has a property $P$, it suffices to check that (a) every set in $\mathcal{C}$ has property $P$, and (b) the collection of sets with property $P$ forms a $\sigma$-algebra. Since this collection is a $\sigma$-algebra containing $\mathcal{C}$, it must contain $\sigma(\mathcal{C})$.
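For a finite set the abstract intersection can be replaced by a concrete closure procedure. The sketch below is illustrative only: the name `generate_sigma_algebra` is hypothetical, and the approach (iterating complements and pairwise unions to a fixed point) relies on $X$ being finite. Every set it adds is forced to lie in any $\sigma$-algebra containing $\mathcal{C}$, which is the minimality property in action.

```python
def generate_sigma_algebra(X, C):
    """Return sigma(C) on a finite set X as a set of frozensets.

    Starts from C together with the empty set and repeatedly adds complements
    and pairwise unions until nothing new appears. On a finite set this
    fixed point is the smallest sigma-algebra containing C.
    """
    X = frozenset(X)
    family = {frozenset()} | {frozenset(A) for A in C}
    while True:
        new = {X - A for A in family}
        new |= {A | B for A in family for B in family}
        if new <= family:          # closed under both operations: done
            return family
        family |= new


X = {1, 2, 3, 4}
sigma = generate_sigma_algebra(X, [{1}, {2}])
print(len(sigma))                      # 8: all unions of the atoms {1}, {2}, {3, 4}
print(frozenset({3, 4}) in sigma)      # True
print(frozenset({3}) in sigma)         # False
```

The generators $\{1\}$ and $\{2\}$ cannot separate the points $3$ and $4$, so $\{3,4\}$ appears as an atom while $\{3\}$ does not belong to the generated $\sigma$-algebra.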
The most important generated $\sigma$-algebra in analysis is the Borel $\sigma$-algebra.
[definition: Borel Sigma-Algebra]
Let $(X, \tau)$ be a topological space. The **Borel $\sigma$-algebra** on $X$, denoted $\mathcal{B}(X)$, is the $\sigma$-algebra generated by the open sets:
\begin{align*}
\mathcal{B}(X) := \sigma(\tau).
\end{align*}
The elements of $\mathcal{B}(X)$ are called **Borel sets**. Since $\sigma(\tau) = \sigma(\{\text{closed subsets of } X\})$ (every open set is the complement of a closed set and vice versa), the Borel $\sigma$-algebra is equivalently generated by the closed sets.
[/definition]
On $\mathbb{R}^n$, the Borel $\sigma$-algebra admits several equivalent generating families, each useful in different contexts. The following result makes the connection precise and provides practical tools for verifying that a set is Borel.
[quotetheorem:1080]
The equivalence of these generators has an important consequence: to check that a function $f: X \to \mathbb{R}$ is $(\mathcal{F}, \mathcal{B}(\mathbb{R}))$-measurable, it suffices to verify that $f^{-1}((a, \infty)) \in \mathcal{F}$ for every $a \in \mathbb{R}$ (or equivalently, for every $a \in \mathbb{Q}$). This is the basis of the [measurability criterion](/page/Measurable%20Functions) for real-valued functions. The reduction to rational thresholds is possible because $\mathbb{Q}$ is dense in $\mathbb{R}$: the set $\{f > a\}$ can be recovered as $\{f > a\} = \bigcup_{k=1}^\infty \{f > q_k\}$ for any sequence of rationals $q_k \searrow a$.
The theorem as stated addresses $\mathcal{B}(\mathbb{R})$ only. For $\mathcal{B}(\mathbb{R}^n)$ with $n \ge 2$, the Borel $\sigma$-algebra is generated by the open rectangles $(a_1, b_1) \times \cdots \times (a_n, b_n)$, or equivalently by the half-spaces $\{x \in \mathbb{R}^n : x_i > a\}$ for $i = 1, \ldots, n$ and $a \in \mathbb{R}$. This follows from the fact that every open set in $\mathbb{R}^n$ is a countable union of open rectangles (second countability of $\mathbb{R}^n$). The product structure of these generators connects directly to the construction of product $\sigma$-algebras: $\mathcal{B}(\mathbb{R}^n) = \mathcal{B}(\mathbb{R}) \otimes \cdots \otimes \mathcal{B}(\mathbb{R})$, which is the natural domain for product measures and [Fubini's theorem](/page/Lebesgue%20Integral).
A fundamental structural limitation must be noted: the Borel $\sigma$-algebra on $\mathbb{R}$ is strictly smaller than the Lebesgue $\sigma$-algebra $\mathcal{L}(\mathbb{R})$. The Lebesgue $\sigma$-algebra is the completion of $\mathcal{B}(\mathbb{R})$ with respect to Lebesgue measure (see the section on completeness below). There exist Lebesgue measurable sets that are not Borel — for instance, the image of a Borel set under a continuous function need not be Borel, though it is always Lebesgue measurable (being an analytic set).
[example: The Trivial and Discrete Sigma-Algebras]
The two extreme $\sigma$-algebras on a set $X$ illustrate how much information a $\sigma$-algebra encodes.
The **trivial $\sigma$-algebra** $\mathcal{F} = \{\varnothing, X\}$ is the smallest $\sigma$-algebra on $X$. It contains only the empty set and the full space. A measure on $(X, \{\varnothing, X\})$ is determined by a single number $\mu(X) \in [0, \infty]$ (since $\mu(\varnothing) = 0$ is forced). This $\sigma$-algebra is too coarse to distinguish any points or subsets of $X$ — a [measurable function](/page/Measurable%20Functions) $f: (X, \{\varnothing, X\}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ must satisfy $f^{-1}((a, \infty)) \in \{\varnothing, X\}$ for every $a$, which forces $f$ to be constant.
The **discrete $\sigma$-algebra** $\mathcal{F} = \mathcal{P}(X)$ is the largest $\sigma$-algebra on $X$. Every subset is measurable, and every function $f: X \to \mathbb{R}$ is measurable. This $\sigma$-algebra is too fine for most purposes: on uncountable sets, it admits pathological measures, and on $\mathbb{R}$, no translation-invariant countably additive measure on $\mathcal{P}(\mathbb{R})$ assigns each interval its length (as the Vitali construction shows).
The $\sigma$-algebras used in practice — the Borel and Lebesgue $\sigma$-algebras on $\mathbb{R}^n$, the product $\sigma$-algebras on $\mathbb{R}^n \times \mathbb{R}^m$ — lie between these extremes, large enough to capture the sets that arise in analysis yet small enough to support useful measures.
[/example]
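The threshold criterion for measurability and the behaviour of the two extreme $\sigma$-algebras can be tested on a small finite set. The following is a sketch under stated assumptions: the name `is_measurable` is hypothetical, $X$ is finite, and `F` is assumed to already be a $\sigma$-algebra on $X$. Because $f$ takes only finitely many values, it is enough to test thresholds $a \in f(X)$; any other threshold yields either $X$ or a superlevel set that has already been tested.

```python
def is_measurable(f, X, F):
    """Check F-measurability of f: X -> R on a finite set X.

    Uses the threshold criterion: every superlevel set {f > a} must lie in F.
    Only thresholds a in f(X) need to be tested (F is assumed to contain X).
    """
    F = {frozenset(A) for A in F}
    thresholds = {f(x) for x in X}
    return all(frozenset(x for x in X if f(x) > a) in F for a in thresholds)


X = {1, 2, 3}
trivial = [set(), X]
discrete = [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, X]

print(is_measurable(lambda x: 5.0, X, trivial))    # True: constants are measurable
print(is_measurable(lambda x: x, X, trivial))      # False: {f > 1} = {2, 3} is missing
print(is_measurable(lambda x: x, X, discrete))     # True: every function is measurable
```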
## Fundamental Properties of Measures
Once a measure $\mu$ is defined on a $\sigma$-algebra $\mathcal{F}$, countable additivity has powerful consequences. Many of these follow from manipulating disjoint decompositions and applying the additivity axiom to carefully chosen sequences. The key insight is that countable additivity encodes a *continuity* property: it forces the measure to respect limits of increasing or decreasing sequences of sets.
[quotetheorem:1081]
Monotonicity is perhaps the most frequently invoked property of measures. The finiteness hypothesis in excision is essential: if $A \subset B$ and $\mu(A) = \infty$, then $\mu(B) = \infty$ as well (by monotonicity), and the expression $\mu(B) - \mu(A) = \infty - \infty$ is undefined. For Lebesgue measure, take $A = \mathbb{R} \setminus [0,1]$ and $B = \mathbb{R}$: both have infinite measure, while $B \setminus A = [0,1]$ has measure $1$, so the would-be identity $\mathcal{L}^1(B \setminus A) = \mathcal{L}^1(B) - \mathcal{L}^1(A)$ reads $1 = \infty - \infty$, which is meaningless. By contrast, removing the finite-measure set $[0,1]$ from $\mathbb{R}$ causes no difficulty: $\mathcal{L}^1(\mathbb{R} \setminus [0,1]) = \infty = \infty - 1$.
Countable subadditivity extends from disjoint to general unions at the cost of replacing equality with an inequality. The inequality can be strict when the sets overlap: if $A = B = [0,1]$ in $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mathcal{L}^1)$, then $\mathcal{L}^1(A \cup B) = 1 < 2 = \mathcal{L}^1(A) + \mathcal{L}^1(B)$. The gap between subadditivity and additivity is precisely the measure of the overlaps, quantified by the inclusion-exclusion principle for finitely many sets. For countably many sets, inclusion-exclusion becomes unwieldy, and subadditivity provides a one-sided bound that suffices for most applications — particularly for covering arguments, where one bounds $\mu(\bigcup_k A_k)$ by summing the measures of a countable cover.
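For two sets, the gap between subadditivity and additivity can be verified by direct arithmetic. The sketch below uses a discrete measure given by hypothetical point masses on a four-point set (the weights and the helper name `measure` are illustrative assumptions) and exact fractions, so the comparison involves no rounding.

```python
from fractions import Fraction


def measure(weights, A):
    """Measure of A for the discrete measure with point masses `weights`."""
    return sum((weights[x] for x in A), Fraction(0))


# Hypothetical point masses on X = {1, 2, 3, 4}; any non-negative weights work.
weights = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}
A, B = {1, 2}, {2, 3}

union = measure(weights, A | B)
total = measure(weights, A) + measure(weights, B)
overlap = measure(weights, A & B)

print(union, total, overlap)        # 7/8 9/8 1/4
print(union <= total)               # True: subadditivity
print(total - union == overlap)     # True: the gap is the measure of the overlap
```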
The deepest consequences of countable additivity are the continuity properties, which connect the measure of a limit of sets to the limit of the measures. These results are the measure-theoretic analogues of the Monotone Convergence Theorem for integrals.
[quotetheorem:1082]
The asymmetry between the two statements is significant: continuity from below holds unconditionally, while continuity from above requires $\mu(A_1) < \infty$. The following example shows that this finiteness hypothesis cannot be dropped.
[example: Failure of Continuity from Above Without Finite Measure]
Consider $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mathcal{L}^1)$ and define the decreasing sequence $A_k := [k, \infty)$ for $k \in \mathbb{N}$. Then $A_1 \supset A_2 \supset \cdots$ and $\mathcal{L}^1(A_k) = \infty$ for every $k$.
The intersection is $\bigcap_{k=1}^\infty A_k = \varnothing$ (no real number belongs to $[k, \infty)$ for every $k$), so
\begin{align*}
\mathcal{L}^1\!\left(\bigcap_{k=1}^\infty A_k\right) = \mathcal{L}^1(\varnothing) = 0.
\end{align*}
However, $\lim_{k \to \infty} \mathcal{L}^1(A_k) = \lim_{k \to \infty} \infty = \infty \neq 0$.
The conclusion of continuity from above fails because $\mathcal{L}^1(A_1) = \infty$. The excision argument that drives the proof requires subtracting finite quantities, and $\infty - \infty$ is undefined. This is the same obstruction that prevents excision from working when $\mu(A) = \infty$.
[/example]
The continuity properties have immediate applications. Continuity from below shows that the measure of a set can be computed as the limit of an increasing approximation — a strategy used in the construction of Lebesgue measure itself (approximating arbitrary measurable sets by unions of intervals). Continuity from above is the key ingredient in the proof of the [Borel-Cantelli lemma](/page/Probability%20Space): if $\sum_k \mu(A_k) < \infty$, then the tail sets $B_m := \bigcup_{k=m}^\infty A_k$ decrease, $\mu(B_m) \le \sum_{k=m}^\infty \mu(A_k) \to 0$, and continuity from above gives $\mu(\limsup_k A_k) = \mu(\bigcap_m B_m) = \lim_m \mu(B_m) = 0$, since $\mu(B_1) \le \sum_k \mu(A_k) < \infty$.
## Null Sets and Almost Everywhere
In analysis, many statements hold not for every point but for every point outside a "negligible" set. The concept of a null set makes this precise, and the phrase "almost everywhere" becomes one of the most frequently used qualifiers in measure theory.
The question that motivates this section is: **when can we ignore a set?** In the Riemann theory, the answer involves sets of "measure zero" in an informal sense — but which sets have measure zero depends on the measure. A set that is negligible for Lebesgue measure may be significant for counting measure, and vice versa. The definition is therefore relative to a specific measure.
[definition: Null Set]
Let $(X, \mathcal{F}, \mu)$ be a measure space. A set $N \subset X$ is a **$\mu$-null set** (or simply a **null set**) if there exists a set $E \in \mathcal{F}$ with $N \subset E$ and $\mu(E) = 0$. If the measure space is complete (see below), then every null set is itself measurable and satisfies $\mu(N) = 0$.
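A property $P(x)$ is said to hold **$\mu$-almost everywhere** (abbreviated **$\mu$-a.e.** or simply **a.e.** when $\mu$ is understood) if the exceptional set $\{x \in X : P(x) \text{ fails}\}$ is a $\mu$-null set. When the exceptional set is itself measurable (for instance, whenever the measure space is complete), this condition reads: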
\begin{align*}
\mu\bigl(\{x \in X : P(x) \text{ fails}\}\bigr) = 0.
\end{align*}
[/definition]
The distinction between "a set of measure zero" and "a subset of a set of measure zero" is subtle but important. If $E \in \mathcal{F}$ satisfies $\mu(E) = 0$, then every subset $N \subset E$ has "zero outer measure" in the sense that $N$ is contained in a measurable set of measure zero. But $N$ itself need not belong to $\mathcal{F}$ — the $\sigma$-algebra may not contain all subsets of null sets. This deficiency is precisely what completeness remedies.
[example: Null Sets Depend on the Measure]
The same set can be null for one measure and non-null for another, illustrating that "negligible" is a measure-dependent concept.
Consider $X = \mathbb{R}$ with $\mathcal{F} = \mathcal{B}(\mathbb{R})$.
**Under Lebesgue measure** $\mathcal{L}^1$: The set $\mathbb{Q}$ of rationals is an $\mathcal{L}^1$-null set. Indeed, $\mathbb{Q} = \bigcup_{k=1}^\infty \{q_k\}$ for some enumeration $\{q_k\}$ of $\mathbb{Q}$, and $\mathcal{L}^1(\{q_k\}) = 0$ for each $k$. By countable subadditivity:
\begin{align*}
\mathcal{L}^1(\mathbb{Q}) \le \sum_{k=1}^\infty \mathcal{L}^1(\{q_k\}) = 0.
\end{align*}
The rationals are "invisible" to Lebesgue measure. This is why the [Dirichlet function](/page/Measurable%20Functions) $\mathbb{1}_{\mathbb{Q}}$ equals $0$ a.e. with respect to $\mathcal{L}^1$, and $\int_0^1 \mathbb{1}_{\mathbb{Q}} \, d\mathcal{L}^1 = 0$.
**Under counting measure** $\#$: The counting measure, defined on $(\mathbb{R}, \mathcal{P}(\mathbb{R}))$ and restricted here to $\mathcal{B}(\mathbb{R})$, assigns $\#(A) = |A|$ if $A$ is finite and $\#(A) = \infty$ if $A$ is infinite. Since $\mathbb{Q}$ is infinite, $\#(\mathbb{Q}) = \infty$. Under counting measure, the only null set is $\varnothing$. Every nonempty set carries positive (in fact, at least $1$) counting measure.
This example underscores that "almost everywhere" is always relative to a specified measure. The function $\mathbb{1}_{\mathbb{Q}}$ equals $0$ a.e. with respect to $\mathcal{L}^1$, but not with respect to counting measure, for which "almost everywhere" means "everywhere" because the only null set is $\varnothing$.
[/example]
## Complete Measures
The previous discussion identified a gap in the general theory: a $\sigma$-algebra $\mathcal{F}$ may not contain all subsets of its null sets. This means that a subset of a set of measure zero might fail to be measurable — an aesthetically unpleasant situation that also creates technical problems. For instance, if $f = g$ $\mu$-a.e. and $g$ is measurable, we would like $f$ to be measurable as well. But if the set $\{f \neq g\}$ is a $\mu$-null set whose subsets are not all in $\mathcal{F}$, the measurability of $f$ does not follow from the measurability of $g$.
[definition: Complete Measure]
A measure space $(X, \mathcal{F}, \mu)$ is **complete** if every subset of every $\mu$-null set is $\mathcal{F}$-measurable:
\begin{align*}
\text{if } N \subset E \in \mathcal{F} \text{ and } \mu(E) = 0, \quad \text{then } N \in \mathcal{F}.
\end{align*}
Equivalently, $(X, \mathcal{F}, \mu)$ is complete if every $\mu$-null set belongs to $\mathcal{F}$ (and hence has measure zero).
[/definition]
The Lebesgue measure space $(\mathbb{R}^n, \mathcal{L}(\mathbb{R}^n), \mathcal{L}^n)$ is complete. The Borel measure space $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), \mathcal{L}^n|_{\mathcal{B}})$ is **not** complete: the standard Cantor set $C \subset [0,1]$ satisfies $\mathcal{L}^1(C) = 0$ and $|C| = 2^{\aleph_0}$ (it is uncountable), so $C$ has $2^{2^{\aleph_0}}$ subsets. Since $|\mathcal{B}(\mathbb{R})| = 2^{\aleph_0}$, most subsets of $C$ are not Borel sets. These subsets are Lebesgue null sets (being subsets of a set of Lebesgue measure zero) but not Borel measurable.
Every incomplete measure space can be completed by enlarging the $\sigma$-algebra to include all subsets of null sets.
[quotetheorem:1083]
The well-definedness of $\overline{\mu}$ requires verification: if $A_1 \cup N_1 = A_2 \cup N_2$ with $N_i \subset E_i$ and $\mu(E_i) = 0$, then $A_1 \subset A_2 \cup N_2 \subset A_2 \cup E_2$, so $\mu(A_1) \le \mu(A_2) + \mu(E_2) = \mu(A_2)$ by monotonicity and subadditivity. By symmetry, $\mu(A_2) \le \mu(A_1)$, so $\mu(A_1) = \mu(A_2)$ and the definition is independent of the representation.
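The completion can be computed explicitly on a finite example. The sketch below assumes the description given above, namely that the completed $\sigma$-algebra consists of the sets $A \cup N$ with $A \in \mathcal{F}$ and $N$ contained in a measure-zero member of $\mathcal{F}$, and that such a set receives the value $\mu(A)$. The function names and the three-point measure space are hypothetical, chosen only for illustration.

```python
from itertools import chain, combinations


def subsets(S):
    """All subsets of a finite set S, as frozensets."""
    items = list(S)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def complete(F, mu):
    """Completion of a measure on a finite sigma-algebra.

    F  : iterable of frozensets (the sigma-algebra)
    mu : dict mapping each member of F to its measure
    Returns a dict mapping each set A | N (A in F, N inside a measure-zero
    member of F) to mu[A].
    """
    null_parts = {N for E in F if mu[E] == 0 for N in subsets(E)}
    completed = {}
    for A in F:
        for N in null_parts:
            S = A | N
            # Well-definedness: two representations must give the same value.
            assert completed.setdefault(S, mu[A]) == mu[A]
    return completed


f = frozenset
F = [f(), f({1}), f({2, 3}), f({1, 2, 3})]
mu = {f(): 0, f({1}): 1, f({2, 3}): 0, f({1, 2, 3}): 1}

mu_bar = complete(F, mu)
print(len(mu_bar))          # 8: every subset of {1, 2, 3} is now measurable
print(mu_bar[f({2})])       # 0
print(mu_bar[f({1, 3})])    # 1
```

Here $\{2,3\}$ is a measure-zero set whose proper subsets $\{2\}$ and $\{3\}$ are not in $\mathcal{F}$. The completion adds them (and their unions with members of $\mathcal{F}$), and the internal assertion checks the independence of representation on every set it builds.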
The Lebesgue $\sigma$-algebra on $\mathbb{R}^n$ is precisely the completion of the Borel $\sigma$-algebra with respect to Lebesgue measure: $\mathcal{L}(\mathbb{R}^n) = \overline{\mathcal{B}(\mathbb{R}^n)}$. In practice, the distinction between Borel and Lebesgue measurability is rarely important — most sets encountered in analysis are Borel — but the completeness of $\mathcal{L}(\mathbb{R}^n)$ ensures that technical issues with subsets of null sets do not arise in the Lebesgue framework.
## $\sigma$-Finiteness
Many of the central theorems of measure theory — the [Radon-Nikodym theorem](/page/Absolutely%20Continuous%20Measures), [Fubini's theorem](/page/Lebesgue%20Integral), the duality $(L^1)^* \cong L^\infty$ — require a hypothesis beyond the basic measure space axioms. The hypothesis is **$\sigma$-finiteness**, and its role is to ensure that the measure, while possibly infinite, can be decomposed into countably many finite pieces.
Why is this necessary? Consider the task of comparing two measures: given $\nu \ll \mu$ (absolute continuity), the Radon-Nikodym theorem produces a "density" $f = d\nu/d\mu$ with $\nu(A) = \int_A f \, d\mu$. The proof constructs $f$ on sets of finite $\mu$-measure (using the Hilbert space structure of $L^2$) and then pieces together the global density from the local ones. This patching argument requires that $X$ can be covered by countably many finite-measure sets — exactly the $\sigma$-finiteness condition. Without it, the density may not exist or may not be unique.
[definition: Sigma-Finite Measure]
A measure space $(X, \mathcal{F}, \mu)$ is **$\sigma$-finite** if there exists a countable collection $\{E_k\}_{k=1}^\infty \subset \mathcal{F}$ with $X = \bigcup_{k=1}^\infty E_k$ and $\mu(E_k) < \infty$ for each $k$.
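The covering sets $E_k$ may be chosen to be pairwise disjoint (replace $E_k$ by $E_k \setminus \bigcup_{j=1}^{k-1} E_j$) or, alternatively, increasing (replace $E_k$ by $\bigcup_{j=1}^k E_j$). Both modifications preserve finiteness of the measures: the first by monotonicity, the second by finite subadditivity.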
[/definition]
Every finite measure is $\sigma$-finite (take $E_1 = X$). Every probability measure is $\sigma$-finite. Lebesgue measure $\mathcal{L}^n$ on $\mathbb{R}^n$ is $\sigma$-finite: $\mathbb{R}^n = \bigcup_{k=1}^\infty B(0, k)$, and $\mathcal{L}^n(B(0,k)) = \omega_n k^n < \infty$, where $\omega_n := \mathcal{L}^n(B(0,1))$ denotes the volume of the unit ball. Counting measure on an uncountable set is **not** $\sigma$-finite: if $X = \mathbb{R}$ with $\#$, then every set $E$ with $\#(E) < \infty$ is finite, and a countable union of finite sets is countable, which cannot cover $\mathbb{R}$.
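The two standard modifications of a countable cover, making it pairwise disjoint or making it increasing, are simple set manipulations. The sketch below applies them to a finite list of finite sets; the function names `disjointify` and `increasing` are hypothetical, and a finite list stands in for the countable sequence $\{E_k\}$.

```python
def disjointify(sets):
    """Replace E_1, E_2, ... by E_k minus (E_1 | ... | E_{k-1})."""
    seen, out = set(), []
    for E in sets:
        out.append(set(E) - seen)
        seen |= set(E)
    return out


def increasing(sets):
    """Replace E_1, E_2, ... by the partial unions E_1 | ... | E_k."""
    acc, out = set(), []
    for E in sets:
        acc = acc | set(E)      # build a new set so earlier entries stay unchanged
        out.append(acc)
    return out


cover = [{0, 1, 2}, {2, 3}, {1, 3, 4}]
print([sorted(s) for s in disjointify(cover)])   # [[0, 1, 2], [3], [4]]
print([sorted(s) for s in increasing(cover)])    # [[0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4]]
```

In both cases the union of the new family equals the union of the original one, so the modified sets still cover $X$.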
The following example shows that $\sigma$-finiteness genuinely restricts the class of measures and that its failure can break important theorems.
[example: Failure of Fubini's Theorem Without $\sigma$-Finiteness]
The most dramatic consequence of dropping $\sigma$-finiteness is the failure of [Fubini's theorem](/page/Lebesgue%20Integral). We exhibit a function on a product space where the two iterated integrals exist but are unequal.
Let $X = Y = [0,1]$, let $\mathcal{F} = \mathcal{B}([0,1])$, let $\mu = \mathcal{L}^1|_{[0,1]}$ (Lebesgue measure), and let $\nu = \#$ (counting measure on $([0,1], \mathcal{B}([0,1]))$). Note that $\mu$ is $\sigma$-finite but $\nu$ is **not** $\sigma$-finite (since $[0,1]$ is uncountable and every set of finite counting measure is finite).
Define $D := \{(x, y) \in [0,1] \times [0,1] : x = y\}$, the diagonal. The function $f := \mathbb{1}_D$ satisfies:
**Integrating first in $y$ (against counting measure), then in $x$:** For each fixed $x \in [0,1]$, the section $f(x, \cdot) = \mathbb{1}_{\{x\}}$ has $\int_{[0,1]} \mathbb{1}_{\{x\}}(y) \, d\#(y) = \#(\{x\}) = 1$. Therefore:
\begin{align*}
\int_{[0,1]} \left( \int_{[0,1]} f(x,y) \, d\#(y) \right) d\mathcal{L}^1(x) = \int_{[0,1]} 1 \, d\mathcal{L}^1(x) = 1.
\end{align*}
**Integrating first in $x$ (against Lebesgue measure), then in $y$:** For each fixed $y \in [0,1]$, the section $f(\cdot, y) = \mathbb{1}_{\{y\}}$ has $\int_{[0,1]} \mathbb{1}_{\{y\}}(x) \, d\mathcal{L}^1(x) = \mathcal{L}^1(\{y\}) = 0$. Therefore:
\begin{align*}
\int_{[0,1]} \left( \int_{[0,1]} f(x,y) \, d\mathcal{L}^1(x) \right) d\#(y) = \int_{[0,1]} 0 \, d\#(y) = 0.
\end{align*}
The two iterated integrals give $1 \neq 0$. Fubini's theorem, which guarantees equality of iterated integrals, requires **both** measures to be $\sigma$-finite. Since $\nu = \#$ on $[0,1]$ is not $\sigma$-finite, the hypothesis is violated and the conclusion fails.
[/example]
## The Canonical Examples
The abstract definition of a measure space admits an enormous variety of instances. Four examples form the backbone of the theory, appearing in virtually every application of measure theory to analysis, probability, and geometry.
### Lebesgue Measure
The most important measure in analysis assigns to each "reasonable" subset of $\mathbb{R}^n$ the value that our geometric intuition prescribes: intervals get their length, rectangles get their area, boxes get their volume. The construction of Lebesgue measure proceeds in three stages: define a "pre-measure" on intervals (or rectangles), extend to an outer measure on all subsets using countable covers, and restrict to the $\sigma$-algebra of measurable sets (those satisfying the Carathéodory criterion).
[remark: Lebesgue Measure Summary]
The **Lebesgue measure space** $(\mathbb{R}^n, \mathcal{L}(\mathbb{R}^n), \mathcal{L}^n)$ has the following properties:
- Every open set, every closed set, and every Borel set in $\mathbb{R}^n$ is Lebesgue measurable.
- The measure is translation invariant: $\mathcal{L}^n(A + x) = \mathcal{L}^n(A)$ for every $A \in \mathcal{L}(\mathbb{R}^n)$ and $x \in \mathbb{R}^n$.
- The measure is $\sigma$-finite and complete.
- For a box $B = [a_1, b_1] \times \cdots \times [a_n, b_n]$, $\mathcal{L}^n(B) = \prod_{i=1}^n (b_i - a_i)$.
- The measure is outer regular: $\mathcal{L}^n(A) = \inf\{\mathcal{L}^n(G) : G \supset A, \, G \text{ open}\}$.
- The measure is inner regular on measurable sets: $\mathcal{L}^n(A) = \sup\{\mathcal{L}^n(K) : K \subset A, \, K \text{ compact}\}$ for $A \in \mathcal{L}(\mathbb{R}^n)$.
The Lebesgue $\sigma$-algebra $\mathcal{L}(\mathbb{R}^n)$ is strictly larger than $\mathcal{B}(\mathbb{R}^n)$, and the extension $\mathcal{L}^n$ from $\mathcal{B}(\mathbb{R}^n)$ to $\mathcal{L}(\mathbb{R}^n)$ is the completion of the Borel restriction $\mathcal{L}^n|_{\mathcal{B}}$.
[/remark]
The following example illustrates Lebesgue measure on a set that is far from "simple" — an uncountable set with zero length.
[example: Lebesgue Measure of the Cantor Set]
The **middle-thirds Cantor set** $C$ is constructed by iteratively removing open middle thirds from $[0,1]$. Define $C_0 := [0,1]$, $C_1 := [0, 1/3] \cup [2/3, 1]$, and in general $C_{k+1}$ is obtained from $C_k$ by removing the open middle third of each component interval. The Cantor set is $C := \bigcap_{k=0}^\infty C_k$.
At step $k$, the set $C_k$ consists of $2^k$ disjoint closed intervals, each of length $3^{-k}$, so
\begin{align*}
\mathcal{L}^1(C_k) = 2^k \cdot 3^{-k} = \left(\frac{2}{3}\right)^k.
\end{align*}
Since $C_0 \supset C_1 \supset C_2 \supset \cdots$ is a decreasing sequence with $\mathcal{L}^1(C_0) = 1 < \infty$, continuity from above gives
\begin{align*}
\mathcal{L}^1(C) = \mathcal{L}^1\!\left(\bigcap_{k=0}^\infty C_k\right) = \lim_{k \to \infty} \mathcal{L}^1(C_k) = \lim_{k \to \infty} \left(\frac{2}{3}\right)^k = 0.
\end{align*}
Yet $C$ is uncountable: the Cantor set is in bijection with $\{0,1\}^{\mathbb{N}}$ (via ternary expansions using only digits $0$ and $2$), so $|C| = 2^{\aleph_0}$. This provides a concrete instance of a set that is "large" in the sense of cardinality ($|C| = |\mathbb{R}|$) yet "invisible" to Lebesgue measure ($\mathcal{L}^1(C) = 0$). The computation also illustrates the power of continuity from above: computing $\mathcal{L}^1(C)$ directly from the definition would require working with the Carathéodory outer measure, but the continuity property reduces the computation to a geometric series.
[/example]
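The stages $C_k$ of the Cantor construction can be generated explicitly, which gives a direct check of the interval count $2^k$ and the total length $(2/3)^k$. The following is a minimal sketch using exact rational arithmetic; the function name `cantor_stage` is hypothetical.

```python
from fractions import Fraction


def cantor_stage(k):
    """Return the closed intervals making up C_k as (left, right) pairs."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))        # keep the left third
            next_intervals.append((b - third, b))        # keep the right third
        intervals = next_intervals
    return intervals


for k in range(5):
    stage = cantor_stage(k)
    length = sum(b - a for a, b in stage)
    # Number of intervals is 2^k and total length is (2/3)^k.
    print(k, len(stage), length)
```

The printed lengths decrease geometrically, which is the sequence whose limit continuity from above identifies with $\mathcal{L}^1(C)$.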
### Counting Measure
At the opposite extreme from Lebesgue measure, counting measure treats every point as equally significant and assigns to each set its cardinality. It arises naturally in combinatorics, number theory, and the study of discrete probability spaces.
[example: Counting Measure and $\ell^p$ Spaces]
Let $X$ be any non-empty set, $\mathcal{F} = \mathcal{P}(X)$ (the discrete $\sigma$-algebra), and define the **counting measure**
\begin{align*}
\#: \mathcal{P}(X) &\to [0, \infty] \\
A &\mapsto \begin{cases} |A| & \text{if } A \text{ is finite}, \\ \infty & \text{if } A \text{ is infinite}. \end{cases}
\end{align*}
This is a measure: $\#(\varnothing) = 0$, and countable additivity holds for disjoint sets $A_1, A_2, \ldots$ by cases. If every $A_k$ is finite and all but finitely many are empty, the union is finite and its cardinality is the sum of the cardinalities. Otherwise the union is infinite and the sum $\sum_k \#(A_k)$ diverges, so both sides equal $\infty$.
Counting measure is $\sigma$-finite if and only if $X$ is countable. When $X = \mathbb{N}$, the [Lebesgue integral](/page/Lebesgue%20Integral) with respect to counting measure reduces to summation:
\begin{align*}
\int_{\mathbb{N}} f \, d\# = \sum_{k=1}^\infty f(k)
\end{align*}
for any measurable function $f: \mathbb{N} \to [0, \infty]$ (every function on $\mathbb{N}$ is $\mathcal{P}(\mathbb{N})$-measurable). For $1 \le p < \infty$, the [$L^p$ spaces](/page/L%5Ep%20Spaces) with respect to counting measure on $\mathbb{N}$ are the **sequence spaces** $\ell^p$:
\begin{align*}
\ell^p := L^p(\mathbb{N}, \mathcal{P}(\mathbb{N}), \#) = \left\{ (a_k)_{k=1}^\infty \subset \mathbb{R} : \sum_{k=1}^\infty |a_k|^p < \infty \right\}, \quad \|(a_k)\|_{\ell^p} = \left(\sum_{k=1}^\infty |a_k|^p\right)^{1/p}.
\end{align*}
Similarly, $\ell^\infty = L^\infty(\mathbb{N}, \mathcal{P}(\mathbb{N}), \#) = \{(a_k) : \sup_k |a_k| < \infty\}$, with $\|(a_k)\|_{\ell^\infty} = \sup_k |a_k|$. The essential supremum coincides with the actual supremum because the only $\#$-null set on $\mathbb{N}$ is $\varnothing$.
This identification reveals that results about $L^p$ spaces specialize to classical results about series when the underlying measure is counting measure. For instance, Hölder's inequality becomes the discrete Hölder inequality: $\sum_k |a_k b_k| \le \|(a_k)\|_{\ell^p} \|(b_k)\|_{\ell^q}$ for $p, q \in [1, \infty]$ with $1/p + 1/q = 1$.
[/example]
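The reduction of integration to summation can be illustrated on finitely supported sequences. This is a sketch under stated assumptions: the function names `integral_counting` and `lp_norm` are hypothetical, sequences are represented as finite Python lists (so infinite sets and genuinely infinite series are out of scope), and floating-point arithmetic replaces exact real numbers.

```python
import math

def integral_counting(f, support):
    """Integral of a nonnegative f w.r.t. counting measure on a finite set: a plain sum."""
    return sum(f(k) for k in support)

def lp_norm(a, p):
    """l^p norm of a finitely supported sequence given as a list; p may be math.inf."""
    if p == math.inf:
        # With counting measure the essential supremum is the actual supremum
        return max(abs(x) for x in a)
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

a = [1.0, -2.0, 0.5, 3.0]
b = [0.5, 1.0, -4.0, 0.25]
p, q = 3.0, 1.5  # conjugate exponents: 1/3 + 1/1.5 = 1

# Discrete Hoelder inequality: sum |a_k b_k| <= ||a||_p * ||b||_q
lhs = sum(abs(x * y) for x, y in zip(a, b))
rhs = lp_norm(a, p) * lp_norm(b, q)
print(lhs <= rhs)
```

The sample sequences and exponents are arbitrary illustrative choices; any pair of conjugate exponents should satisfy the same inequality.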
### The Dirac Measure
In applications, one frequently needs a measure that concentrates all its mass at a single point. The Dirac measure formalises this idea and provides a bridge between measure theory and the theory of [distributions](/page/Schwartz%20Space).
[example: Dirac Measure]
Let $X$ be any non-empty set, let $\mathcal{F}$ be a $\sigma$-algebra on $X$, and fix a point $x_0 \in X$. The **Dirac measure** (or **point mass**) at $x_0$ is
\begin{align*}
\delta_{x_0}: \mathcal{F} &\to [0, \infty] \\
A &\mapsto \begin{cases} 1 & \text{if } x_0 \in A, \\ 0 & \text{if } x_0 \notin A. \end{cases}
\end{align*}
This is a probability measure on $(X, \mathcal{F})$: $\delta_{x_0}(\varnothing) = 0$, $\delta_{x_0}(X) = 1$, and countable additivity holds because at most one set in a disjoint collection can contain $x_0$, so the sum has at most one nonzero term.
Integration with respect to $\delta_{x_0}$ evaluates functions at $x_0$:
\begin{align*}
\int_X f \, d\delta_{x_0} = f(x_0)
\end{align*}
for every measurable function $f: X \to [0, \infty]$ (and more generally for $f \in L^1(X, \delta_{x_0})$). This identity is verified by linearity and approximation: for a simple function $s = \sum_j c_j \mathbb{1}_{A_j}$, we have $\int s \, d\delta_{x_0} = \sum_j c_j \delta_{x_0}(A_j) = \sum_j c_j \mathbb{1}_{A_j}(x_0) = s(x_0)$. The general case follows by the [Monotone Convergence Theorem](/theorems/509).
In $L^p$ terms: $L^p(X, \mathcal{F}, \delta_{x_0}) \cong \mathbb{R}$ for every $p \in [1, \infty]$, since $\|f\|_{L^p(\delta_{x_0})} = |f(x_0)|$ and two functions are equal $\delta_{x_0}$-a.e. if and only if they agree at $x_0$. The Dirac measure is the simplest nontrivial measure but plays a fundamental role as a building block: every measure $\mu$ on the power set of a countable set $X$ is a (possibly infinite) weighted sum of Dirac measures, namely $\mu = \sum_{x \in X} \mu(\{x\}) \, \delta_x$.
[/example]
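The evaluation identity for simple functions can be mirrored directly in code. The sketch below is illustrative only: `dirac`, `integrate_simple`, and `evaluate_simple` are hypothetical helper names, sets are modelled as finite Python sets, and a simple function $s = \sum_j c_j \mathbb{1}_{A_j}$ is represented as a list of `(c_j, A_j)` pairs.

```python
def dirac(x0):
    """Return the set function A -> delta_{x0}(A) for Python sets A."""
    return lambda A: 1 if x0 in A else 0

def integrate_simple(measure, terms):
    """Integrate s = sum_j c_j * 1_{A_j}, given as (c_j, A_j) pairs."""
    return sum(c * measure(A) for c, A in terms)

def evaluate_simple(terms, x):
    """Pointwise value s(x) of the same simple function."""
    return sum(c for c, A in terms if x in A)

delta = dirac(2)
s = [(5.0, {1, 2}), (1.5, {2, 3}), (7.0, {4})]

# Integration against delta_2 agrees with evaluation at the point 2
print(integrate_simple(delta, s), evaluate_simple(s, 2))

# Additivity on a disjoint family: at most one piece can contain the point
print(delta({1, 2, 3}) == delta({1}) + delta({2}) + delta({3}))
```

Representing the measure as a closure over `x0` keeps the code close to the definition: the measure is nothing more than a membership test.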
### Probability Spaces
Probability theory is measure theory with total mass normalised to $1$. A probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is a measure space with $\mathbb{P}(\Omega) = 1$. The elements of $\Omega$ are called **outcomes**, the elements of $\mathcal{F}$ are called **events**, and $\mathbb{P}(A)$ is the **probability** of event $A$. The notational conventions are different — we write $\mathbb{E}[X] = \int_\Omega X \, d\mathbb{P}$ for the expectation of a random variable $X: \Omega \to \mathbb{R}$ — but the underlying mathematics is identical.
[example: A Continuous Probability Space]
Consider the probability space $(\Omega, \mathcal{F}, \mathbb{P}) = ([0,1], \mathcal{B}([0,1]), \mathcal{L}^1|_{[0,1]})$, where $\mathcal{L}^1|_{[0,1]}$ denotes the restriction of Lebesgue measure to $[0,1]$. This models a "uniform" random experiment on $[0,1]$: every subinterval $[a,b] \subset [0,1]$ has probability $b - a$, equal to its length.
A random variable $X: [0,1] \to \mathbb{R}$ on this space is simply a Borel measurable function, and its expectation is
\begin{align*}
\mathbb{E}[X] = \int_0^1 X(\omega) \, d\mathcal{L}^1(\omega).
\end{align*}
For integrable $X$, the variance is $\operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \in [0, \infty]$, and $\operatorname{Var}(X) < \infty$ if and only if $X \in L^2([0,1])$. This space is rich enough to realise every probability distribution on the real line: every Borel probability measure on $\mathbb{R}$ arises as the law of some measurable function $X: [0,1] \to \mathbb{R}$, for instance the quantile function of that measure.
[/example]
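The quantile construction can be sketched numerically. The example below makes several assumptions that are not in the text: the target law is taken to be the exponential distribution with rate $1$ (chosen only because its quantile function $-\log(1 - \omega)$ is explicit), integrals over $[0,1]$ are approximated by a midpoint Riemann sum rather than computed exactly, and the names `quantile_exp` and `midpoint_integral` are hypothetical.

```python
import math

def quantile_exp(omega):
    """Quantile function of the Exp(1) law: inverse of F(t) = 1 - exp(-t)."""
    return -math.log(1.0 - omega)

def midpoint_integral(g, n):
    """Midpoint Riemann sum approximating the integral of g over [0, 1]."""
    return sum(g((i + 0.5) / n) for i in range(n)) / n

n = 100_000

# E[X] and Var(X) for the random variable X = quantile_exp on ([0,1], Lebesgue)
mean = midpoint_integral(quantile_exp, n)
second_moment = midpoint_integral(lambda w: quantile_exp(w) ** 2, n)
variance = second_moment - mean ** 2

# Law of X: P(X <= t) is the Lebesgue measure of {omega : X(omega) <= t}
t = 1.0
cdf_at_t = midpoint_integral(lambda w: 1.0 if quantile_exp(w) <= t else 0.0, n)

print(mean, variance, cdf_at_t)
```

Up to discretisation error, the mean and variance should both be close to $1$ and the last value close to $1 - e^{-1}$, the values expected for an Exp(1) random variable.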
## Standard Techniques in Measure Theory
Working with measure spaces requires a repertoire of standard arguments that appear across analysis, probability, and geometry. The following techniques are the basic tools for manipulating measures, verifying measurability, and establishing measure-theoretic properties.
### The Good Sets Principle
The most fundamental technique for proving statements about all sets in a $\sigma$-algebra is the **good sets principle** (also called the $\pi$-$\lambda$ argument or the monotone class argument). The idea is to avoid working with the $\sigma$-algebra directly — which is enormous — and instead to verify the desired property on a small generating class, then extend.
The strategy has three steps:
1. Identify a generating class $\mathcal{C}$ for the $\sigma$-algebra $\mathcal{F} = \sigma(\mathcal{C})$.
2. Define the "good sets" $\mathcal{G} := \{A \in \mathcal{F} : A \text{ has property } P\}$.
3. Show that $\mathcal{G}$ contains $\mathcal{C}$ and that $\mathcal{G}$ is a $\sigma$-algebra (or a $\lambda$-system if $\mathcal{C}$ is a $\pi$-system). Conclude $\mathcal{G} = \mathcal{F}$.
The Dynkin $\pi$-$\lambda$ theorem makes this precise: if $\mathcal{C}$ is a $\pi$-system (closed under finite intersections) and $\mathcal{G}$ is a $\lambda$-system (contains $X$, closed under proper differences and increasing countable unions) with $\mathcal{C} \subset \mathcal{G}$, then $\sigma(\mathcal{C}) \subset \mathcal{G}$.
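On a finite set, the statement of the $\pi$-$\lambda$ theorem can be checked by brute force. The following sketch is purely illustrative: the function names are hypothetical, the ground set $\{1,2,3,4\}$ and the generating classes are arbitrary choices, and the finite setting makes closure under increasing countable unions automatic (an increasing sequence of subsets of a finite set stabilises).

```python
from itertools import combinations

def sigma_algebra(X, C):
    """Smallest sigma-algebra on the finite set X containing the class C."""
    X = frozenset(X)
    sets = {frozenset(), X} | {frozenset(A) for A in C}
    while True:
        # Close under complements and pairwise unions until nothing new appears
        new = {X - A for A in sets} | {A | B for A, B in combinations(sets, 2)}
        if new <= sets:
            return sets
        sets |= new

def lambda_system(X, C):
    """Smallest lambda-system on the finite set X containing the class C.

    Only X and proper differences (B minus A, for A a subset of B) are needed,
    because increasing unions add nothing new on a finite set.
    """
    X = frozenset(X)
    sets = {X} | {frozenset(A) for A in C}
    while True:
        new = {B - A for A in sets for B in sets if A <= B}
        if new <= sets:
            return sets
        sets |= new

def is_pi_system(C):
    """Check closure under pairwise (hence finite) intersections."""
    C = {frozenset(A) for A in C}
    return all((A & B) in C for A in C for B in C)

X = {1, 2, 3, 4}
C_bad = [{1, 2}, {2, 3}]         # not a pi-system: the intersection {2} is missing
C_good = [{1, 2}, {2, 3}, {2}]   # closed under intersections

for C in (C_bad, C_good):
    print(is_pi_system(C), len(lambda_system(X, C)), len(sigma_algebra(X, C)))
```

For the class that is not a $\pi$-system, the generated $\lambda$-system has only $6$ sets while the generated $\sigma$-algebra has all $16$; once $\{2\}$ is added, the two coincide, as the theorem predicts.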
[example: Uniqueness of Measures via the Good Sets Principle]
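We prove: **if two measures $\mu$ and $\nu$ on $(X, \sigma(\mathcal{C}))$ agree on a $\pi$-system $\mathcal{C}$, and $X = \bigcup_k E_k$ for some sets $E_k \in \mathcal{C}$ with $\mu(E_k) < \infty$, then $\mu = \nu$ on $\sigma(\mathcal{C})$.** The covering sets must come from $\mathcal{C}$ itself; $\sigma$-finiteness of $\mu$ and $\nu$ alone does not suffice.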
First, assume $\mu(X) = \nu(X) < \infty$ (the finite case). Define the good sets:
\begin{align*}
\mathcal{G} := \{A \in \sigma(\mathcal{C}) : \mu(A) = \nu(A)\}.
\end{align*}
We verify that $\mathcal{G}$ is a $\lambda$-system:
- $X \in \mathcal{G}$ because $\mu(X) = \nu(X)$.
- If $A, B \in \mathcal{G}$ with $A \subset B$, then $\mu(B \setminus A) = \mu(B) - \mu(A) = \nu(B) - \nu(A) = \nu(B \setminus A)$, so $B \setminus A \in \mathcal{G}$ (excision is valid because $\mu(A) = \nu(A) < \infty$).
- If $A_1 \subset A_2 \subset \cdots$ with $A_k \in \mathcal{G}$, then by continuity from below, $\mu(\bigcup_k A_k) = \lim_k \mu(A_k) = \lim_k \nu(A_k) = \nu(\bigcup_k A_k)$, so $\bigcup_k A_k \in \mathcal{G}$.
Since $\mathcal{C} \subset \mathcal{G}$ (by hypothesis) and $\mathcal{C}$ is a $\pi$-system, the $\pi$-$\lambda$ theorem gives $\sigma(\mathcal{C}) \subset \mathcal{G}$, so $\mu = \nu$ on $\sigma(\mathcal{C})$.
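For the $\sigma$-finite case, the exhausting sets must be taken from the $\pi$-system: assume $X = \bigcup_k E_k$ with $E_k \in \mathcal{C}$ and $\mu(E_k) = \nu(E_k) < \infty$. For each $k$, the finite measures $\mu_k(A) := \mu(A \cap E_k)$ and $\nu_k(A) := \nu(A \cap E_k)$ agree on $\mathcal{C}$ (because $A \cap E_k \in \mathcal{C}$ whenever $A \in \mathcal{C}$) and have equal total mass $\mu(E_k) = \nu(E_k)$, so by the finite case they agree on $\sigma(\mathcal{C})$. Now set $F_1 := E_1$ and $F_k := E_k \setminus (E_1 \cup \cdots \cup E_{k-1})$ for $k \ge 2$, so that the sets $F_k \in \sigma(\mathcal{C})$ are disjoint with union $X$. For any $A \in \sigma(\mathcal{C})$ we have $A \cap F_k \in \sigma(\mathcal{C})$ and $A \cap F_k \subset E_k$, hence $\mu(A \cap F_k) = \mu_k(A \cap F_k) = \nu_k(A \cap F_k) = \nu(A \cap F_k)$, and countable additivity gives $\mu(A) = \sum_k \mu(A \cap F_k) = \sum_k \nu(A \cap F_k) = \nu(A)$.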
This argument is the standard proof of the **uniqueness lemma for measures**, which underlies the construction of product measures and the uniqueness of Lebesgue measure.
[/example]
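The role of the $\pi$-system hypothesis in the uniqueness result can be seen in a four-point example. This is an illustrative sketch, not part of the proof: the helper `measure` is a hypothetical name, and the two measures and the generating class are chosen by hand so that they agree on the generators and on $X$ but not on the intersection of the generators.

```python
from fractions import Fraction

def measure(weights):
    """Measure on a finite set determined by point weights."""
    return lambda A: sum((weights[x] for x in A), Fraction(0))

X = {1, 2, 3, 4}
mu = measure({x: Fraction(1, 4) for x in X})
nu = measure({1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1, 2), 4: Fraction(0)})

# C generates the full power set of X, but is not closed under intersections
C = [{1, 2}, {2, 3}]

agree_on_C = all(mu(A) == nu(A) for A in C) and mu(X) == nu(X)
agree_on_intersection = mu({2}) == nu({2})

print(agree_on_C, agree_on_intersection)
```

The two probability measures agree on every set in `C` and on `X`, yet differ on $\{2\} = \{1,2\} \cap \{2,3\}$. Adding $\{2\}$ to the generating class would turn it into a $\pi$-system, and then agreement on the generators would no longer hold for this pair.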
### Approximation by Simpler Sets
In practice, verifying a property for all measurable sets is often difficult. A powerful strategy is to approximate a general measurable set by structurally simpler sets (open sets, closed sets, compact sets, or finite unions of intervals) and pass to the limit using the continuity properties of the measure.
For Lebesgue measure on $\mathbb{R}^n$, the regularity properties provide the approximation:
- **Outer regularity:** For every $A \in \mathcal{L}(\mathbb{R}^n)$ and $\varepsilon > 0$, there exists an open set $G \supset A$ with $\mathcal{L}^n(G \setminus A) < \varepsilon$.
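- **Inner regularity:** For every $A \in \mathcal{L}(\mathbb{R}^n)$ and $\varepsilon > 0$, there exists a closed set $F \subset A$ with $\mathcal{L}^n(A \setminus F) < \varepsilon$. If $\mathcal{L}^n(A) < \infty$, then $F$ can be chosen compact; for sets of infinite measure a compact set cannot work, since $\mathcal{L}^n(A \setminus K) = \infty$ for every compact $K \subset A$.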
These approximation properties are the measure-theoretic foundation of the density results for [$L^p$ spaces](/page/L%5Ep%20Spaces) with $1 \le p < \infty$. Simple functions are dense in $L^p$, and each finite-measure set appearing in a simple function can be approximated in measure by a finite union of intervals (boxes, in $\mathbb{R}^n$), so step functions are dense as well. The density of continuous functions then follows by approximating indicator functions of intervals by continuous functions.
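As a worked instance of outer regularity (an illustration, with the specific interval radii chosen here for convenience), take a countable set $A = \{q_1, q_2, \ldots\} \subset \mathbb{R}$, for example an enumeration of $\mathbb{Q}$. Given $\varepsilon > 0$, the open set $G := \bigcup_{k=1}^\infty (q_k - \varepsilon 2^{-k-2},\, q_k + \varepsilon 2^{-k-2})$ contains $A$, and countable subadditivity gives
\begin{align*}
\mathcal{L}^1(G \setminus A) \le \mathcal{L}^1(G) \le \sum_{k=1}^\infty 2 \cdot \varepsilon 2^{-k-2} = \sum_{k=1}^\infty \varepsilon 2^{-k-1} = \frac{\varepsilon}{2} < \varepsilon.
\end{align*}
In particular, when $A = \mathbb{Q}$ the set $G$ is open and dense in $\mathbb{R}$, yet its measure is at most $\varepsilon/2$.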
### Exhaustion Arguments
When the underlying set has infinite measure, many arguments break down because the "whole space" is too large to control at once. The standard fix is an **exhaustion argument**: choose an increasing sequence of finite-measure sets $E_1 \subset E_2 \subset \cdots$ with $X = \bigcup_k E_k$, prove the desired result on each $E_k$ (where finiteness holds), and pass to the limit using continuity from below.
This technique appears in the proof of the Radon-Nikodym theorem ($\sigma$-finiteness allows exhaustion by finite-measure sets), in the construction of product measures (Fubini-Tonelli), and in the proof of uniqueness of measures (as in the example above).
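A small worked instance of the limit step (an illustration, with the set $A$ chosen here for convenience): exhaust $\mathbb{R}$ by $E_k := [-k, k]$ and let $A := \bigcup_{j=1}^\infty [j, j + 2^{-j}]$, a disjoint union of intervals. The intervals with $j \le k - 1$ lie inside $E_k$, while $[k, k + 2^{-k}]$ meets $E_k$ only in the point $k$, so
\begin{align*}
\mathcal{L}^1(A \cap E_k) = \sum_{j=1}^{k-1} 2^{-j} = 1 - 2^{1-k}, \qquad \mathcal{L}^1(A) = \lim_{k \to \infty} \mathcal{L}^1(A \cap E_k) = 1,
\end{align*}
where the second identity is continuity from below applied to the increasing sets $A \cap E_k$.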
### Monotone Class Arguments
The $\pi$-$\lambda$ argument described above is one instance of a broader technique: proving statements about all sets in a $\sigma$-algebra by showing that the "good sets" form a monotone class (a collection closed under increasing unions and decreasing intersections). The **monotone class theorem** states: if $\mathcal{A}$ is an algebra of sets and $\mathcal{M}$ is a monotone class containing $\mathcal{A}$, then $\mathcal{M} \supset \sigma(\mathcal{A})$. This is often used interchangeably with the $\pi$-$\lambda$ theorem; the choice depends on whether the generating class is an algebra or a $\pi$-system.
There is also a **functional monotone class theorem** (for functions rather than sets): if a vector space $\mathcal{H}$ of bounded real-valued functions contains the constant function $1$ and all indicator functions $\mathbb{1}_A$ for $A$ in a $\pi$-system $\mathcal{C}$, and is closed under limits of uniformly bounded increasing sequences, then it contains all bounded $\sigma(\mathcal{C})$-measurable functions. This version is the primary tool for extending statements from indicator functions of generating sets to general bounded measurable functions.
## References
1. Folland, G. B., *Real Analysis: Modern Techniques and Their Applications*, 2nd ed. (1999).
2. Halmos, P. R., *Measure Theory* (1950).
3. Rudin, W., *Real and Complex Analysis*, 3rd ed. (1987).
4. Cohn, D. L., *Measure Theory*, 2nd ed. (2013).
5. Billingsley, P., *Probability and Measure*, 3rd ed. (1995).
6. Evans, L. C. and Gariepy, R. F., *Measure Theory and Fine Properties of Functions*, revised ed. (2015).
7. Stein, E. M. and Shakarchi, R., *Real Analysis: Measure Theory, Integration, and Hilbert Spaces* (2005).