Geometric Measure Theory I establishes the foundations of measure theory in Euclidean space, extending classical Lebesgue integration to irregular sets and functions. Rather than assuming measures on all subsets, we build measurable sets and measures from outer measures, allowing us to quantify the "size" of nowhere differentiable curves, fractal boundaries, and other pathological objects that Lebesgue measure alone cannot adequately describe. The course begins with the abstract machinery — measurable spaces, $\sigma$-algebras, and Radon measures — then immediately applies it to concrete geometric questions: How do we measure a Cantor set? What is the dimension of a fractal? Why do most continuous functions fail to be differentiable anywhere?
The first half develops tools for handling irregular functions. Lusin's theorem tells us that measurable functions are "almost continuous," Egoroff's theorem upgrades pointwise convergence to nearly uniform convergence, and Fubini's theorem lets us reduce multidimensional problems to one-dimensional slices. Covering theorems (the Vitali and Besicovitch theorems) provide the combinatorial backbone for differentiation theory, allowing us to recover measures from their local densities. These results culminate in the Lebesgue differentiation theorem: almost every point is a "Lebesgue point" where the local average of an integrable function approaches its value.
The second half introduces Hausdorff measure and dimension, the central objects of geometric measure theory. Unlike Lebesgue measure, which is tied to Euclidean dimension, Hausdorff measure adapts to any real dimension $s \geq 0$, revealing the "true" size of fractals. We see that Lipschitz maps cannot increase Hausdorff dimension (bi-Lipschitz maps preserve it, though not always the measure itself), and how densities (local ratios of measure to volume) encode geometric information. The course concludes with concrete examples: self-similar sets, curves, and rectifiable sets, showing how abstract theory illuminates classical fractals and why dimension matters for both pure mathematics and applications.
# 1. Measures and Measurable Functions
Measure theory begins with a question that looks deceptively simple: what does it mean for a set to have "size"? For intervals on a line, the answer is obvious — the length of $[a, b]$ is $b - a$. For rectangles in the plane, it is area. But what is the size of the Cantor set? What is the "area" of a fractal curve? What size should we assign to a set that is too irregular to be approximated by rectangles?
Classical Riemann integration evades this question by restricting to well-behaved functions on well-behaved domains. Geometric Measure Theory is built precisely around refusing this evasion. To study surfaces, fractals, and variational problems in full generality, we need a theory of "size" that assigns a meaningful number to every subset of $\mathbb{R}^n$ — not just the rectangles and intervals. The price of this generality is that we must be more careful about what we mean by "measurable." The reward is a framework powerful enough to define the area of a Lipschitz surface, the perimeter of a set with a fractal boundary, and the dimension of the Cantor set.
This chapter builds the measure-theoretic foundation for everything that follows. We begin with outer measures — set functions defined on all subsets with no sigma-algebra in sight — and then introduce Carathéodory's criterion, which selects the "good" subsets from the collection of all subsets. The result is a sigma-algebra and a genuine measure. We then study the finer structure needed for geometric applications: Borel regularity and Radon measures. The chapter closes with measurable functions and the layer-cake formula, which encodes the integration theory we will use throughout the course.
[example: Length Cannot Be Extended Without Contradiction]
To see why we cannot simply assign a size to every subset of $\mathbb{R}$, consider the following. Suppose a set function $\lambda$ on all subsets of $\mathbb{R}$ satisfies three natural properties: $\lambda([a, b]) = b - a$ for intervals, $\lambda$ is countably additive on disjoint sets, and $\lambda$ is translation-invariant. Define an equivalence relation on $[0, 1)$ by $x \sim y$ if $x - y \in \mathbb{Q}$. By the axiom of choice, pick one representative from each equivalence class; call this set $V$. Let $q_1, q_2, \ldots$ be an enumeration of $\mathbb{Q} \cap [0, 1)$, and let $V_k = V + q_k$ (reduced modulo 1). The sets $V_k$ are pairwise disjoint and their union is $[0, 1)$. Each $V_k$ is obtained from $V$ by cutting it into two pieces and translating each piece, so additivity and translation invariance give $\lambda(V_k) = \lambda(V)$ for every $k$. But then $\lambda([0,1)) = 1$ (note that $\lambda(\{1\}) = \lambda([1,1]) = 0$) must equal $\sum_k \lambda(V_k)$, which is either $0$ (if $\lambda(V) = 0$) or $+\infty$ (if $\lambda(V) > 0$). Neither is $1$. This contradiction, known as the Vitali argument, shows that no countably additive, translation-invariant extension of length can be defined on all subsets of $\mathbb{R}$. The notion of "measurable set" is not a luxury; it is a necessity.
[/example]
The resolution is not to assign a size to every set, but to identify which sets behave well enough to deserve one. The framework that makes this precise is the outer measure and Carathéodory's criterion.
## Outer Measures
The first step is to define a set function on all subsets — we do not yet demand sigma-additivity, only monotonicity and countable subadditivity. This is the outer measure. The point is that it is easy to construct set functions on all subsets (for instance, by taking infima over covers), and then Carathéodory's criterion extracts the sigma-algebra on which the function is genuinely additive.
[definition: Outer Measure]
Let $X$ be a set. An **outer measure** on $X$ is a function $\mu^* : \mathcal{P}(X) \to [0, \infty]$ satisfying:
1. $\mu^*(\varnothing) = 0$.
2. **Monotonicity:** If $A \subset B \subset X$, then $\mu^*(A) \leq \mu^*(B)$.
3. **Countable subadditivity:** For any sequence of sets $A_1, A_2, \ldots \subset X$,
\begin{align*}
\mu^*\!\left(\bigcup_{k=1}^\infty A_k\right) \leq \sum_{k=1}^\infty \mu^*(A_k).
\end{align*}
[/definition]
Note that $\mu^*$ is defined on every subset of $X$ — there is no sigma-algebra involved. The definition is weak: we only demand subadditivity, not additivity. This makes outer measures easy to construct. The standard construction starts from a "pre-measure" $\rho$ defined on a collection of elementary sets (balls, cubes, intervals), and defines
\begin{align*}
\mu^*(E) = \inf\left\{ \sum_{k=1}^\infty \rho(C_k) : E \subset \bigcup_{k=1}^\infty C_k, \; C_k \in \mathcal{C} \right\}
\end{align*}
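for some collection $\mathcal{C}$ of elementary sets, with the convention $\inf \varnothing = +\infty$. Provided $\rho \geq 0$ and the empty set can be covered at arbitrarily small cost (for instance, $\varnothing \in \mathcal{C}$ with $\rho(\varnothing) = 0$), this infimum-over-covers construction automatically satisfies monotonicity and countable subadditivity.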
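As a sketch of why countable subadditivity holds for this construction, assume $\rho \geq 0$ and $\sum_k \mu^*(A_k) < \infty$ (otherwise there is nothing to prove). The covers $C_{k,j}$ and the tolerance $\varepsilon$ below are auxiliary choices introduced only for this argument. Fix $\varepsilon > 0$ and, for each $k$, choose sets $C_{k,1}, C_{k,2}, \ldots \in \mathcal{C}$ covering $A_k$ with $\sum_j \rho(C_{k,j}) \leq \mu^*(A_k) + \varepsilon 2^{-k}$. The doubly indexed family $\{C_{k,j}\}$ is a countable cover of $\bigcup_k A_k$, so
\begin{align*}
\mu^*\!\left(\bigcup_{k=1}^\infty A_k\right) \leq \sum_{k=1}^\infty \sum_{j=1}^\infty \rho(C_{k,j}) \leq \sum_{k=1}^\infty \left( \mu^*(A_k) + \frac{\varepsilon}{2^k} \right) = \sum_{k=1}^\infty \mu^*(A_k) + \varepsilon.
\end{align*}
Letting $\varepsilon \to 0$ gives the claim. Monotonicity is simpler: every cover of $B$ is also a cover of any $A \subset B$, so the infimum defining $\mu^*(A)$ is taken over a larger family of covers.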
[example: Lebesgue Outer Measure on the Line]
Take $X = \mathbb{R}$, $\mathcal{C}$ = the collection of bounded open intervals $(a, b)$, and $\rho((a, b)) = b - a$. The resulting outer measure is
\begin{align*}
\mathcal{L}^{1*}(E) = \inf\left\{ \sum_{k=1}^\infty (b_k - a_k) : E \subset \bigcup_{k=1}^\infty (a_k, b_k) \right\}.
\end{align*}
This assigns to every subset of $\mathbb{R}$ a value in $[0, \infty]$. For an interval $[c, d]$, one checks that $\mathcal{L}^{1*}([c, d]) = d - c$: the cover $\{(c - \varepsilon, d + \varepsilon)\}$ gives an upper bound of $d - c + 2\varepsilon$, and any countable cover of $[c, d]$ has total length at least $d - c$ by the Heine-Borel theorem (extract a finite subcover and use that the finite subcover's total length is at least $d - c$). For the rationals $\mathbb{Q} \cap [0,1]$: enumerate them as $r_1, r_2, \ldots$ and cover by $(r_k - \varepsilon/2^{k+1}, r_k + \varepsilon/2^{k+1})$; the total length is at most $\varepsilon$. Since $\varepsilon > 0$ was arbitrary, $\mathcal{L}^{1*}(\mathbb{Q} \cap [0,1]) = 0$. Countable sets have outer measure zero.
[/example]
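The same covering idea answers the question about the Cantor set raised at the start of the chapter. The following is a sketch, assuming the standard middle-thirds Cantor set $C \subset [0,1]$, which has not been formally defined at this point in the notes. At stage $n$ of the construction, $C$ is contained in $2^n$ closed intervals of length $3^{-n}$. Enlarging each to an open interval of length $3^{-n} + \varepsilon 2^{-n}$ gives an admissible cover, so for every $n$ and every $\varepsilon > 0$,
\begin{align*}
\mathcal{L}^{1*}(C) \leq 2^n \left( 3^{-n} + \frac{\varepsilon}{2^n} \right) = \left(\frac{2}{3}\right)^n + \varepsilon.
\end{align*}
Letting $n \to \infty$ and then $\varepsilon \to 0$ yields $\mathcal{L}^{1*}(C) = 0$. Since $C$ is uncountable, outer measure zero is not limited to countable sets.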
The outer measure $\mathcal{L}^{1*}$ assigns a value to every subset of $\mathbb{R}$. But it is not additive on all sets: the Vitali set $V$ from the opening example shows that the union of disjoint sets can have a different outer measure from the sum of their individual outer measures. The question becomes: which subsets of $\mathbb{R}$ behave additively? The answer requires a criterion that can be applied to every possible test set — a criterion that selects the well-behaved subsets without assuming any prior sigma-algebra structure. The remarkable fact, discovered by Carathéodory, is that such a criterion exists and automatically produces a sigma-algebra.
## Carathéodory's Criterion
The brilliant insight of Carathéodory is a criterion for identifying the "good" subsets — those on which $\mu^*$ acts like a genuine measure.
The idea is this: if $A$ is a set that truly separates the space into "inside" and "outside" without leaking, then for any test set $E$, the portions $E \cap A$ and $E \setminus A$ should together account for all of $\mu^*(E)$. Countable subadditivity always gives $\mu^*(E) \leq \mu^*(E \cap A) + \mu^*(E \setminus A)$; the criterion demands the reverse inequality too.
[definition: Carathéodory Measurability]
Let $\mu^*$ be an outer measure on $X$. A set $A \subset X$ is **$\mu^*$-measurable** if for every set $E \subset X$,
\begin{align*}
\mu^*(E) = \mu^*(E \cap A) + \mu^*(E \setminus A).
\end{align*}
[/definition]
[explanation: Why the Splitting Condition]
The splitting condition $\mu^*(E) = \mu^*(E \cap A) + \mu^*(E \setminus A)$ is not as arbitrary as it first appears. By countable subadditivity, we always have $\mu^*(E) \leq \mu^*(E \cap A) + \mu^*(E \setminus A)$. The condition asks that equality holds — that $A$ does not "lose" any mass when it splits the test set $E$. Intuitively, $A$ is measurable if it acts as a clean partition of any test set: the portion inside $A$ and the portion outside $A$ together account for all of $E$.
The use of an arbitrary test set $E$ is crucial. If we only required the splitting condition for $E = X$, we would get a very weak notion. The condition must hold for every $E \subset X$ — even wildly irregular subsets — to guarantee that $A$ behaves well in all integration computations. This universality is what makes the criterion powerful.
One might wonder why the condition is stated with an arbitrary $E$ rather than just for open sets or Borel sets. The answer is that we want the resulting sigma-algebra to be **complete**: if $N$ is a null set ($\mu^*(N) = 0$) and $M \subset N$, then $M$ should be measurable. With the Carathéodory criterion, this completeness comes for free.
[/explanation]
The fundamental theorem of the Carathéodory construction is the following.
[quotetheorem:2954]
[citeproof:2954]
Let us unpack what this gives us. Starting from an outer measure $\mu^*$ defined on all subsets of $X$ — with no sigma-algebra in hand — Carathéodory's criterion produces a sigma-algebra $\mathcal{M}$ and a genuine measure $\mu$ on it. Moreover, $\mu$ is complete: every subset of a null set is measurable with measure zero. This completeness is automatic and will be used constantly.
[remark: Null Sets Are Always Measurable]
If $\mu^*(N) = 0$, then $N$ is $\mu^*$-measurable. For any test set $E$, we have $\mu^*(E \cap N) \leq \mu^*(N) = 0$, so $\mu^*(E \cap N) = 0$. Then $\mu^*(E \setminus N) \leq \mu^*(E) \leq \mu^*(E \cap N) + \mu^*(E \setminus N) = \mu^*(E \setminus N)$, giving equality throughout. Every null set is in $\mathcal{M}$, and every subset of a null set is in $\mathcal{M}$.
[/remark]
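As an illustration of the criterion, here is a sketch of why the half-line $A = (a, \infty)$ is measurable for the Lebesgue outer measure $\mathcal{L}^{1*}$ defined earlier. The intervals $I_k$, $I_k'$, $I_k''$ and the tolerance $\varepsilon$ are auxiliary notation for this argument, $|I|$ denotes the length of an interval, and empty intervals are counted with length zero. Let $E \subset \mathbb{R}$ with $\mathcal{L}^{1*}(E) < \infty$, fix $\varepsilon > 0$, and choose open intervals $I_k$ covering $E$ with $\sum_k |I_k| \leq \mathcal{L}^{1*}(E) + \varepsilon$. Set $I_k' = I_k \cap (a, \infty)$ and $I_k'' = I_k \cap (-\infty, a + \varepsilon 2^{-k})$. These are open intervals, the $I_k'$ cover $E \cap A$, the $I_k''$ cover $E \setminus A$, and they overlap in a set of length at most $\varepsilon 2^{-k}$, so $|I_k'| + |I_k''| \leq |I_k| + \varepsilon 2^{-k}$. Summing over $k$,
\begin{align*}
\mathcal{L}^{1*}(E \cap A) + \mathcal{L}^{1*}(E \setminus A) \leq \sum_{k=1}^\infty \left( |I_k| + \frac{\varepsilon}{2^k} \right) \leq \mathcal{L}^{1*}(E) + 2\varepsilon.
\end{align*}
Letting $\varepsilon \to 0$ gives the nontrivial inequality. The reverse inequality is subadditivity, and the case $\mathcal{L}^{1*}(E) = \infty$ is immediate.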
## Borel Measures and Borel Regularity
When working on $\mathbb{R}^n$, the natural sigma-algebra to consider is the Borel sigma-algebra, generated by the open sets. In the GMT setting, we need to understand how outer measures on $\mathbb{R}^n$ relate to this Borel structure. A measure that plays well with the topology of $\mathbb{R}^n$ — meaning that open sets are measurable — is called a Borel measure.
[definition: Borel Measure]
A measure $\mu$ on $\mathbb{R}^n$ is a **Borel measure** if every Borel subset of $\mathbb{R}^n$ is $\mu$-measurable, i.e., the Borel sigma-algebra $\mathcal{B}(\mathbb{R}^n)$ is contained in the domain of $\mu$.
[/definition]
Being a Borel measure is the minimum compatibility requirement between a measure and the topology of $\mathbb{R}^n$. But in geometric measure theory, we often need more: we want measures that are not just defined on Borel sets, but actually determined by their values on Borel sets. For a general Borel measure, a non-Borel set $A$ could have $\mu(A)$ strictly smaller than $\mu(B)$ for every Borel set $B \supset A$, so the Borel sets would not capture the value on $A$; Borel regularity rules this out.
The motivation for the next condition comes from approximation. When we want to compute $\mu(A)$ for some set $A$, it is useful to know that we can approximate $A$ from outside by Borel sets without losing any mass. This is not automatic even for Borel measures.
[definition: Borel Regular Measure]
A Borel measure $\mu$ on $\mathbb{R}^n$ is **Borel regular** if for every set $A \subset \mathbb{R}^n$, there exists a Borel set $B \supset A$ such that $\mu(B) = \mu(A)$.
[/definition]
[explanation: Why Borel Regularity Matters]
Borel regularity says that the measure is completely determined by its values on Borel sets. For any set $A$, no matter how irregular, we can find a Borel set $B$ that contains $A$ and has the same measure. This means the measure cannot "see" the difference between $A$ and $B$ — the non-Borel parts of $A$ carry no extra information.
In practice, Borel regularity means that to compute $\mu(A)$, it suffices to compute infima over Borel supersets. This is valuable in proofs where we need to pass to well-behaved sets. Hausdorff measures, which we will study in Chapter 9, are Borel regular, and this is one of the key structural properties that makes them tractable.
Without Borel regularity, a measure might behave well on Borel sets but assign inconsistent values to more exotic sets constructed using the axiom of choice. Borel regularity rules out this pathology.
[/explanation]
## Radon Measures
Even Borel regular measures can be badly behaved — for instance, they might assign infinite mass to every open set, making integration impossible. In $\mathbb{R}^n$, the correct class of "tame" measures for geometric analysis is the Radon measures. These are the measures that can be approximated from inside by compact sets and from outside by open sets.
The motivation for finiteness on compact sets is clear: if $\mu(K) = \infty$ for some compact set $K$, then integrating any positive continuous function against $\mu$ on $K$ gives $+\infty$ (such a function is bounded below by a positive constant on $K$), and the integration theory becomes degenerate. We need a class of measures where integration is possible on bounded regions.
[definition: Radon Measure]
A Borel regular measure $\mu$ on $\mathbb{R}^n$ is a **Radon measure** if $\mu(K) < \infty$ for every compact set $K \subset \mathbb{R}^n$.
[/definition]
Radon measures admit two approximation properties that are central to their theory.
[quotetheorem:2955]
[citeproof:2955]
Inner regularity says that the measure of any Borel set can be detected by compact subsets — even if $A$ is wildly irregular, its measure is the supremum over the compact sets it contains. Outer regularity says the same from the outside: the measure of $A$ is the infimum over open supersets. Together, these make Radon measures the most geometrically accessible measures on $\mathbb{R}^n$.
[example: Lebesgue Measure Is Radon, Counting Measure Is Not]
Lebesgue measure $\mathcal{L}^n$ is a Radon measure on $\mathbb{R}^n$: it is Borel regular (every set has a Borel superset of the same measure, by the outer regularity of Lebesgue measure), and $\mathcal{L}^n(K) < \infty$ for every compact $K$ (since compact subsets of $\mathbb{R}^n$ are bounded, hence contained in a cube of finite volume).
Counting measure $\#$ on $\mathbb{R}$ — defined by $\#(A) = $ the number of points in $A$ — is a Borel measure, but it is **not** Radon. The compact set $K = [0, 1]$ has $\#(K) = \infty$ (since $[0,1]$ contains uncountably many points), violating the finiteness condition. Counting measure is well-suited to discrete sets (where compact sets are finite), but it fails on $\mathbb{R}$ precisely because the real line is not discrete.
Counting measure is in fact Borel regular, so regularity is not where it fails. Every subset of $\mathbb{R}$ is measurable for counting measure, since $\#$ is additive on arbitrary disjoint sets. If $A$ is finite, then $A$ is a finite union of singletons, hence itself a Borel set, and we may take $B = A$. If $A$ is infinite, then $B = \mathbb{R}$ is a Borel set containing $A$ with $\#(B) = \infty = \#(A)$. The real failure of counting measure is the finiteness condition: every infinite compact set, and in particular every nondegenerate compact interval in $\mathbb{R}$, has infinite counting measure.
[/example]
[remark: Restriction of Hausdorff Measure]
The $k$-dimensional Hausdorff measure $\mathcal{H}^k$ on $\mathbb{R}^n$ (for $k < n$), which we study in Chapter 9, is Borel regular but is not Radon on all of $\mathbb{R}^n$: it assigns infinite mass to every compact set with nonempty interior. For instance, in $\mathbb{R}^2$ the segment $[-1,1] \times \{0\}$ has $\mathcal{H}^1$-measure $2$, but the closed unit square has $\mathcal{H}^1([0,1]^2) = \infty$. More precisely, the restriction of $\mathcal{H}^k$ to a measurable set $E$ is a Radon measure whenever $\mathcal{H}^k(E \cap K) < \infty$ for every compact $K$. This distinction will matter when we study rectifiable sets.
[/remark]
## Measurable Functions
With the measure structure in hand, we turn to the functions we will integrate. In classical analysis, integration is typically developed for continuous or piecewise continuous functions. In GMT, the correct condition is far more general: measurability. The link to integration is direct: measurability of $f$ is what makes the super-level sets of $f$ measurable, and these are exactly the sets that the layer-cake formula integrates over.
Why do we care about extending integration beyond continuous functions? Because in geometric problems, the characteristic functions of sets (the functions $\mathbb{1}_E$ that equal $1$ on $E$ and $0$ off $E$) are almost never continuous, yet we need to integrate them. The measure of a set $E$ should equal the integral of $\mathbb{1}_E$. For this to make sense, $\mathbb{1}_E$ must be integrable, which requires measurability of $E$. Measurable functions are the broadest class for which integration is well-defined and behaves sensibly.
[definition: Measurable Function]
Let $(X, \mathcal{M}, \mu)$ be a measure space. A function $f : X \to [-\infty, \infty]$ is **$\mu$-measurable** if for every $t \in \mathbb{R}$, the set
\begin{align*}
\{ x \in X : f(x) > t \}
\end{align*}
belongs to $\mathcal{M}$.
[/definition]
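The condition $\{f > t\} \in \mathcal{M}$ for all $t$ is equivalent (given that $\mathcal{M}$ is a sigma-algebra) to requiring that $\{f \geq t\}$, $\{f < t\}$, or $\{f \leq t\}$ lie in $\mathcal{M}$ for all $t$; any one of these conditions implies the others. The choice of $\{f > t\}$ is a matter of convention. Measurability also implies $\{f = t\} \in \mathcal{M}$ for every $t$, but the converse fails: on $\mathbb{R}$ with Lebesgue measure, an injective function has level sets with at most one point, yet need not be measurable.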
The class of measurable functions is closed under all the operations one could want: sums, products, compositions (with Borel functions), pointwise limits, $\limsup$, $\liminf$. If $f_k$ is a sequence of measurable functions and $f_k \to f$ pointwise $\mu$-almost everywhere, then $f$ is measurable (after possibly redefining on a null set). This closure makes measurable functions the natural function class for analysis.
[example: Borel Functions Are Measurable]
If $\mu$ is a Borel measure on $\mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$ is a Borel-measurable function (meaning $\{f > t\}$ is a Borel set for every $t$), then $f$ is $\mu$-measurable. In particular, every continuous function is Borel-measurable: $\{f > t\}$ is open, hence Borel. Every lower semi-continuous function is Borel-measurable. Every function that is the pointwise limit of a sequence of continuous functions is Borel-measurable (since the limit of Borel functions is Borel).
The Dirichlet function $f = \mathbb{1}_\mathbb{Q}$ on $[0,1]$ is a Borel-measurable function: $\{f > t\} = \mathbb{Q} \cap [0,1]$ for $t \in [0, 1)$, which is a countable (hence Borel) set. Yet $f$ is nowhere continuous on $[0,1]$. Measurability is a much weaker condition than continuity, which is exactly why the theory is so general.
[/example]
The connection between measurability and integration is made precise by the layer-cake formula. This formula decomposes the integral of a non-negative function into integrals over level sets, providing one of the most useful computational tools in GMT.
[quotetheorem:2956]
The layer-cake formula says: instead of integrating $f$ over $X$, we can integrate the "size" of the super-level sets $\{f > t\}$ over the parameter $t$. The measurability of $f$ is exactly the condition that ensures each $\mu(\{f > t\})$ is defined. The function $t \mapsto \mu(\{f > t\})$ is then non-increasing, hence Borel measurable, so the right-hand side makes sense.
[explanation: Why the Layer-Cake Formula Matters for GMT]
The layer-cake formula is not just a computational trick — it is the right way to think about integrals when the measure $\mu$ is geometric rather than volumetric. In GMT, we often work with Hausdorff measures $\mathcal{H}^k$ on lower-dimensional sets. When we compute $\int_E f \, d\mathcal{H}^k$, the level sets $\{f > t\} \cap E$ are subsets of $E$, and their $\mathcal{H}^k$-measure encodes geometric information about how $f$ varies along $E$.
More concretely: the formula $\int_X f \, d\mu = \int_0^\infty \mu(\{f > t\}) \, dt$ reduces the computation of a "horizontal" integral (summing $f(x)$ over all $x$) to a "vertical" integral (summing the measure of level sets over the height $t$). This is valuable when the level sets have a simpler geometric structure than the function itself.
For instance, rearrangement proofs of isoperimetric-type inequalities write a non-negative function as a superposition of the indicator functions of its super-level sets and then estimate each level set separately. The formula also appears alongside the co-area formula (GMT II), which is the correct change-of-variables formula for Lipschitz maps from a higher-dimensional space to a lower-dimensional one.
[/explanation]
The layer-cake formula follows from a Fubini-type argument: write $f(x) = \int_0^{f(x)} dt = \int_0^\infty \mathbb{1}_{\{f > t\}}(x) \, dt$ and integrate over $X$:
\begin{align*}
\int_X f \, d\mu &= \int_X \int_0^\infty \mathbb{1}_{\{f > t\}}(x) \, d\mathcal{L}^1(t) \, d\mu(x) \\
&= \int_0^\infty \int_X \mathbb{1}_{\{f > t\}}(x) \, d\mu(x) \, d\mathcal{L}^1(t) \\
&= \int_0^\infty \mu(\{f > t\}) \, d\mathcal{L}^1(t).
\end{align*}
The exchange of integrals is justified by Tonelli's theorem, which applies because the integrand is non-negative and jointly measurable, assuming $\mu$ is $\sigma$-finite. Tonelli's theorem is the version of Fubini's theorem for non-negative measurable functions: on a product of $\sigma$-finite measure spaces, the double integral equals either iterated integral.
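As a quick sanity check of the formula, here is a worked instance under the assumption $X = [0,1]$, $\mu = \mathcal{L}^1$, and $f(x) = x$ (choices made purely for illustration). The super-level set $\{f > t\}$ is $(t, 1]$ for $0 \leq t < 1$ and empty for $t \geq 1$, so
\begin{align*}
\int_0^\infty \mu(\{f > t\}) \, dt = \int_0^1 (1 - t) \, dt = \frac{1}{2} = \int_0^1 x \, dx.
\end{align*}
Both sides compute the area of the same triangle, once by horizontal slices and once by vertical slices.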
## Simple Functions and Approximation
How do we define the integral of a general measurable function? The function may be wildly discontinuous, and its range may be infinite or uncountable — classical Riemann sums do not apply. The resolution is to approximate from below by functions that take only finitely many values: the simple functions. Once we know how to integrate simple functions (by linearity and the definition of measure), we can pass to the limit and define the integral of any non-negative measurable function. The question is whether such approximations exist and converge in a useful way.
To motivate the approximation, note that a non-negative measurable function $f$ can be discretized: for each $k \in \mathbb{N}$ and $j = 0, 1, \ldots, k2^k - 1$, let $E_{k,j} = \{j/2^k \leq f < (j+1)/2^k\}$ and $F_k = \{f \geq k\}$. Define
\begin{align*}
f_k = \sum_{j=0}^{k2^k - 1} \frac{j}{2^k} \mathbb{1}_{E_{k,j}} + k \mathbb{1}_{F_k}.
\end{align*}
Each $f_k$ is a simple function (measurable, taking finitely many values). The sequence $f_k$ increases monotonically to $f$ pointwise on $X$.
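To get a concrete feel for the construction, note that the definition above can be rewritten pointwise as follows (a direct restatement, with $\lfloor \cdot \rfloor$ denoting the integer part):
\begin{align*}
f_k(x) = \begin{cases} 2^{-k} \lfloor 2^k f(x) \rfloor & \text{if } f(x) < k, \\ k & \text{if } f(x) \geq k. \end{cases}
\end{align*}
For example, at a point $x$ where $f(x) = 0.7$ (an illustrative value), this gives $f_1(x) = 1/2$, $f_2(x) = 1/2$, $f_3(x) = 5/8$, and $f_4(x) = 11/16$. In general $0 \leq f(x) - f_k(x) < 2^{-k}$ as soon as $k > f(x)$, which is why the convergence is pointwise everywhere and uniform on any set where $f$ is bounded.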
[quotetheorem:1020]
This approximation theorem, combined with the monotone convergence theorem, is the foundation of Lebesgue integration: define the integral of a simple function by the obvious formula $\int \sum_j a_j \mathbb{1}_{E_j} \, d\mu = \sum_j a_j \mu(E_j)$, and extend to non-negative measurable functions by taking the limit of integrals of the approximating simple functions. For general measurable functions, write $f = f^+ - f^-$ where $f^+ = \max\{f, 0\}$ and $f^- = \max\{-f, 0\}$, and integrate separately.
[remark: $L^p$ Spaces and Integrability]
For a measure space $(X, \mathcal{M}, \mu)$ and $1 \leq p < \infty$, the space $L^p(X, \mathcal{M}, \mu)$ consists of all measurable functions $f : X \to [-\infty, \infty]$ with $\|f\|_{L^p} = \left(\int_X |f|^p \, d\mu\right)^{1/p} < \infty$. Two functions are identified if they agree $\mu$-almost everywhere. The space $L^\infty(X, \mathcal{M}, \mu)$ consists of essentially bounded measurable functions, with norm $\|f\|_{L^\infty} = \operatorname{ess\,sup} |f|$. These spaces appear throughout GMT, typically with $\mu$ being a Radon measure or a Hausdorff measure restricted to a set.
[/remark]
## Restriction of Measures
Suppose we want to integrate a function $f$ over a surface $\Sigma \subset \mathbb{R}^n$ — say, the unit sphere $\mathbb{S}^{n-1}$. The ambient Lebesgue measure $\mathcal{L}^n$ assigns zero mass to any $(n-1)$-dimensional surface, so integrating against $\mathcal{L}^n$ gives nothing. We need a measure that is supported on $\Sigma$ and assigns to each subset of $\Sigma$ its surface area. Without a way to concentrate a given measure onto a lower-dimensional set, we cannot define integrals over surfaces, curves, or fractals using the measure-theoretic framework we have built. The correct tool is restriction.
[definition: Restriction of a Measure]
Let $\mu$ be a measure on $(X, \mathcal{M})$ and let $E \in \mathcal{M}$. The **restriction** of $\mu$ to $E$ is the measure $\mu \lfloor E$ defined by
\begin{align*}
(\mu \lfloor E)(A) = \mu(E \cap A)
\end{align*}
for all $A \in \mathcal{M}$.
[/definition]
The restriction $\mu \lfloor E$ assigns to each measurable set $A$ the $\mu$-mass of the portion of $A$ that lies inside $E$. If $\mu$ is Borel regular and $E$ is a $\mu$-measurable set with $\mu(E \cap K) < \infty$ for every compact $K$, then $\mu \lfloor E$ is a Radon measure. In particular, the restriction of a Radon measure to any measurable set is again Radon.
The notation $\mathcal{H}^k \lfloor E$ appears constantly in GMT: it denotes the $k$-dimensional Hausdorff measure restricted to the set $E$. When $E$ is a smooth $k$-dimensional surface, $\mathcal{H}^k \lfloor E$ is the surface area measure on $E$, and integration against it recovers the classical surface integral. The power of the restriction operation is that it works just as well for irregular sets: the Cantor set, fractals, and rectifiable sets all become natural domains for integration via restriction of Hausdorff measure.
[example: Integration Over a Circle]
Let $E = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}$ be the unit circle, and let $f : \mathbb{R}^2 \to \mathbb{R}$ be a continuous function. The measure $\mathcal{H}^1 \lfloor E$ is supported on the circle and assigns to any arc its length. The integral
\begin{align*}
\int_{\mathbb{R}^2} f \, d(\mathcal{H}^1 \lfloor E) = \int_E f \, d\mathcal{H}^1
\end{align*}
recovers the line integral of $f$ over the unit circle. In parametric form, with $\gamma(\theta) = (\cos\theta, \sin\theta)$ for $\theta \in [0, 2\pi)$:
\begin{align*}
\int_E f \, d\mathcal{H}^1 = \int_0^{2\pi} f(\cos\theta, \sin\theta) \, d\mathcal{L}^1(\theta) = \int_0^{2\pi} f(\cos\theta, \sin\theta) \, d\theta.
\end{align*}
The first equality uses the fact that $|\gamma'(\theta)| = 1$, so the arc-length element is $d\mathcal{H}^1 = |\gamma'| \, d\theta = d\theta$; the second is only a change of notation. For a Lipschitz curve (not necessarily of constant speed), the correct change-of-variables formula involves the Jacobian $|\gamma'|$ of the parametrization. This is the area formula, which we develop later in the course.
[/example]
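As a worked instance of the circle example, take the illustrative integrands $f \equiv 1$ and $f(x_1, x_2) = x_1^2$ (these choices are assumptions made here, not part of the example above). With the unit-speed parametrization $\gamma(\theta) = (\cos\theta, \sin\theta)$,
\begin{align*}
\int_E 1 \, d\mathcal{H}^1 = \int_0^{2\pi} d\theta = 2\pi, \qquad \int_E x_1^2 \, d\mathcal{H}^1 = \int_0^{2\pi} \cos^2\theta \, d\theta = \pi.
\end{align*}
The first value is the length of the circle, $(\mathcal{H}^1 \lfloor E)(\mathbb{R}^2) = \mathcal{H}^1(E)$. The second agrees with what symmetry predicts: $x_1^2$ and $x_2^2$ have the same integral over $E$ and sum to $1$ on $E$.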
## What Comes Next
This chapter has established the basic measure-theoretic infrastructure: outer measures, the Carathéodory construction, Borel and Radon measures, measurable functions, and restriction. With this foundation in place, the subsequent chapters develop the tools needed for geometric analysis.
Chapter 2 studies Lusin's and Egoroff's theorems, which show that measurable functions and convergence are "nearly" as well-behaved as their continuous counterparts. Chapter 3 constructs product measures and proves Fubini's theorem, which justifies the layer-cake formula and prepares for integration on product spaces. Chapter 4 develops covering theorems — the Vitali and Besicovitch theorems — which are the core technical engine for differentiating measures and estimating densities. Chapters 5 and 6 apply covering theorems to differentiation of Radon measures and Lebesgue points. Chapter 7 proves the Riesz representation theorem, which identifies Radon measures with positive linear functionals — the tool that constructs measures from geometric objects. Chapters 8 through 12 turn to the main subjects of the course: weak convergence, Hausdorff measures, densities, and Lipschitz mappings.
The arc of the course runs from the general (outer measures on abstract spaces) to the specific (Hausdorff measures on $\mathbb{R}^n$, Lipschitz images, rectifiable sets). The measure theory developed in this chapter is not background to be filed away — it will be used in every proof that follows.
<!-- illustration-needed: the layer-cake formula — show a graph of a non-negative function f(x) over the x-axis, with horizontal slices at heights t illustrating the super-level sets {f > t}, and arrows showing how the integral over X corresponds to integrating the width (measure) of level sets over the height t -->
<!-- illustration-needed: Carathéodory measurability — show a set A in the plane splitting an arbitrary test set E into E ∩ A (inside) and E \ A (outside), illustrating that no mass is lost at the boundary of A for a measurable set, in contrast to a non-measurable set where the split is inconsistent -->
---
With the foundations of measure theory and measurable functions in place, we now ask a deeper question: how regular are measurable functions in practice? Lusin's theorem shows that measurable functions are 'almost' continuous, while Egoroff's theorem reveals that pointwise convergence is 'almost' uniform on sets of large measure.
# 2. Lusin's and Egoroff's Theorems
Measurability is often introduced as the minimal hypothesis that allows integration to be defined. But what does measurability actually mean for the *behavior* of a function? A measurable function can be discontinuous everywhere — the indicator function $\mathbb{1}_\mathbb{Q}$ of the rationals provides a stark example, failing to be continuous at every single point of $[0,1]$. And yet, something remarkable is true: every measurable function is "almost" continuous, in the precise sense that it restricts to a continuous function on a set whose complement has arbitrarily small measure. This is the content of Lusin's theorem.
A companion phenomenon concerns sequences of functions. Pointwise convergence is a local, uncoordinated condition: each point eventually gets close to the limit, but the rates can be wildly different across the domain. Uniform convergence, by contrast, is a global condition that is far more powerful analytically. In general, pointwise convergence does not imply uniform convergence. But again something remarkable holds: on a finite measure space, pointwise convergence implies *nearly* uniform convergence — uniform convergence on a set whose complement has arbitrarily small measure. This is Egoroff's theorem.
Together, these two results carry the same message: measurable objects are well-behaved except on sets of small measure. They are the first deep theorems that justify the intuition that "measurability is the right generalization of continuity," and they are indispensable tools throughout geometric measure theory.
[example: Discontinuity Is Not an Obstacle]
Consider the Dirichlet function $f: [0,1] \to \mathbb{R}$ defined by
\begin{align*}
f(x) = \mathbb{1}_\mathbb{Q}(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q}, \\ 0 & \text{if } x \notin \mathbb{Q}. \end{cases}
\end{align*}
This function is measurable (both $\mathbb{Q}$ and $\mathbb{R} \setminus \mathbb{Q}$ are Borel sets) but continuous at no point of $[0,1]$: at every rational, nearby irrationals force the function to be $0$, and at every irrational, nearby rationals force it to be $1$.
Nevertheless, Lusin's theorem applies. Since $\mathcal{L}^1(\mathbb{Q} \cap [0,1]) = 0$, for any $\varepsilon > 0$ we can find a compact set $K \subset [0,1] \setminus \mathbb{Q}$ with $\mathcal{L}^1([0,1] \setminus K) < \varepsilon$. On $K$, every point is irrational, so $f \equiv 0$ on $K$, and the constant function $0$ is continuous. The restriction $f|_K$ is therefore continuous, even though $f$ itself is nowhere continuous on $[0,1]$.
This example shows that Lusin's theorem is not a statement about the global structure of $f$ — it says nothing about what $f$ does outside $K$. But by making $\varepsilon$ small, we can force $K$ to capture nearly all of $[0,1]$'s measure. That is precisely the power of the result.
[/example]
## Nearly Uniform Convergence
The question of when pointwise convergence can be upgraded to uniform convergence is one of the oldest in analysis. Uniform convergence is what allows you to pass limits inside integrals, exchange the order of limiting operations, and transfer continuity from approximating functions to the limit. Pointwise convergence, on its own, does none of these things reliably.
A key obstruction is that pointwise convergence allows different points to converge at arbitrarily different rates. The sequence $f_k(x) = x^k$ on $[0,1]$ converges pointwise to $\mathbb{1}_{\{1\}}$, but does so arbitrarily slowly near $x = 1$: the convergence is decidedly non-uniform. Egoroff's theorem says that on a finite measure space, we can always find a large set on which the convergence is *uniform*, and we can make this set as large as we wish (in measure) by paying an arbitrarily small measure cost.
The finite measure hypothesis is not a technical artifact — it is genuinely necessary. Before stating the theorem, we see exactly why.
[example: Why Finite Measure Is Necessary]
Let $(X, \mathcal{A}, \mu) = (\mathbb{R}, \mathcal{B}(\mathbb{R}), \mathcal{L}^1)$, which has infinite total measure. Define
\begin{align*}
f_k = \mathbb{1}_{[k, k+1]}, \quad k \in \mathbb{N}.
\end{align*}
For each fixed $x \in \mathbb{R}$, eventually $x < k$, so $f_k(x) = 0$ for all sufficiently large $k$. Thus $f_k \to 0$ pointwise everywhere on $\mathbb{R}$.
We claim there is no measurable set $E \subset \mathbb{R}$ with $\mathcal{L}^1(\mathbb{R} \setminus E) < 1/2$ on which $f_k \to 0$ uniformly. Suppose for contradiction that such an $E$ existed. Then there exists $N$ such that $|f_k(x)| < 1/2$ for all $x \in E$ and all $k \geq N$. But $f_k = 1$ on $[k, k+1]$, so this forces $E \cap [k, k+1] = \varnothing$ for all $k \geq N$. Hence $[N, \infty) \subset \mathbb{R} \setminus E$, and
\begin{align*}
\mathcal{L}^1(\mathbb{R} \setminus E) \geq \mathcal{L}^1([N, \infty)) = \infty,
\end{align*}
which contradicts $\mathcal{L}^1(\mathbb{R} \setminus E) < 1/2$. In fact the single interval $[N, N+1]$ already has length $1 > 1/2$, so any uniform convergence set must leave out at least a unit of measure. The failure of Egoroff's theorem here stems directly from the fact that the mass of the functions $f_k$ "escapes to infinity": each $f_k$ places its support at location $k$, beyond any fixed bounded region.
[/example]
With the necessity of the finite measure hypothesis established, we can state the theorem.
[quotetheorem:14]
[explanation: Proof Strategy for Egoroff's Theorem]
The proof constructs $E$ by controlling one tolerance level at a time. The core idea is to find, for each tolerance $1/n$, an index $m_n$ beyond which the sequence stays within $1/n$ of the limit outside a set of small measure, and then to discard the union of these small sets over all $n$.
For $n, m \in \mathbb{N}$, define the "bad set at level $n$ starting from index $m$":
\begin{align*}
A_{n,m} = \bigcup_{k \geq m} \{x \in X : |f_k(x) - f(x)| > 1/n\}.
\end{align*}
This is the set of points where the sequence is more than $1/n$ away from $f$ at some index $k \geq m$. For each fixed $n$, the sets $A_{n,m}$ decrease as $m$ increases: $A_{n,m+1} \subset A_{n,m}$ because the union over $k \geq m+1$ is a subset of the union over $k \geq m$.
Since $f_k \to f$ pointwise $\mu$-a.e., for $\mu$-almost every $x$ and for each fixed $n$, eventually all $|f_k(x) - f(x)| \leq 1/n$. This means $x \notin A_{n,m}$ for large enough $m$. Hence $\bigcap_{m=1}^\infty A_{n,m} = \varnothing$ (up to a null set). Because $\mu(X) < \infty$ and $A_{n,1} \subset X$ has finite measure, continuity of measure from above gives
\begin{align*}
\mu(A_{n,m}) \to 0 \quad \text{as } m \to \infty.
\end{align*}
This is where the finite measure hypothesis is used: without $\mu(X) < \infty$, we cannot apply continuity from above (which requires finite measure of the starting set).
For each $n \in \mathbb{N}$, choose $m_n$ large enough that $\mu(A_{n,m_n}) < \varepsilon / 2^n$. Now define
\begin{align*}
E = X \setminus \bigcup_{n=1}^\infty A_{n,m_n}.
\end{align*}
Then
\begin{align*}
\mu(X \setminus E) = \mu\!\left(\bigcup_{n=1}^\infty A_{n,m_n}\right) \leq \sum_{n=1}^\infty \mu(A_{n,m_n}) < \sum_{n=1}^\infty \frac{\varepsilon}{2^n} = \varepsilon.
\end{align*}
We verify uniform convergence on $E$. Fix $n \in \mathbb{N}$. For any $x \in E$, we have $x \notin A_{n,m_n}$, meaning $|f_k(x) - f(x)| \leq 1/n$ for all $k \geq m_n$. This holds for every $x \in E$ simultaneously, so $f_k \to f$ uniformly on $E$ at rate $1/n$ once $k \geq m_n$. Since $n$ was arbitrary, convergence is uniform on $E$.
[/explanation]
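The construction can be traced numerically for the sequence $f_k(x) = x^k$ on $[0,1]$, where the bad sets are explicit intervals. The following is a minimal sketch under stated assumptions: the helper names `bad_measure` and `choose_index` are hypothetical, and the countable family of tolerance levels is truncated to $n = 1, \dots, 12$ for illustration.

```python
import math

def bad_measure(n: int, m: int) -> float:
    """Lebesgue measure of A_{n,m} for f_k(x) = x**k on [0, 1].

    For x < 1 the values x**k decrease in k, so x lies in A_{n,m} exactly
    when x**m > 1/n, i.e. x > n**(-1/m); at x = 1 the error is zero.
    Hence A_{n,m} = (n**(-1/m), 1), with measure 1 - n**(-1/m).
    """
    return -math.expm1(-math.log(n) / m)

def choose_index(n: int, eps: float) -> int:
    """Return an index m_n with measure(A_{n,m_n}) < eps / 2**n."""
    budget = eps / 2 ** n
    # Closed-form guess from 1 - n**(-1/m) < budget, then a safety loop
    # that guards against floating-point rounding.
    m = int(math.log(n) / -math.log1p(-budget)) + 1
    while bad_measure(n, m) >= budget:
        m += 1
    return m

eps = 0.1
levels = range(1, 13)  # truncation of n = 1, 2, 3, ... (illustrative assumption)
indices = {n: choose_index(n, eps) for n in levels}
total_cost = sum(bad_measure(n, indices[n]) for n in levels)

# The union of the sets A_{n,m_n} is an interval (c, 1), so E = [0, c] together
# with the point 1, where c is the smallest left endpoint n**(-1/m_n).
left_endpoint = min(n ** (-1.0 / indices[n]) for n in levels if n > 1)

print("total measure discarded (upper bound):", total_cost)
print("E contains [0, c] with c =", left_endpoint)
```

On $[0, c]$ the supremum of $x^k$ is $c^k$, so the bound $c^{m_n} \leq 1/n$ is exactly the uniform estimate from the last step of the proof.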
[remark: The Exceptional Set Cannot Be Made Empty]
The word "nearly" in Egoroff's theorem cannot be removed in general. Even on a finite measure space, there need not exist a set of measure zero outside of which convergence is uniform. Consider $f_k(x) = x^k$ on $[0,1]$ with Lebesgue measure. The pointwise limit is $f(x) = \mathbb{1}_{\{1\}}(x)$. The convergence is not uniform on any set that contains points $x < 1$ arbitrarily close to $1$, because the supremum of $x^k$ over such a set equals $1$ for every $k$. Egoroff's theorem guarantees uniform convergence on some $E$ with $\mathcal{L}^1([0,1] \setminus E) < \varepsilon$, for instance $E = [0, 1 - \varepsilon/2]$, where $\sup_E |f_k - f| = (1 - \varepsilon/2)^k \to 0$. But any such $E$ must omit an interval $(1 - \delta, 1)$ for some $\delta > 0$, so the exceptional set always has positive measure.
[/remark]
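A quick numerical look at $f_k(x) = x^k$ makes the contrast visible. The sketch below assumes the particular choice $E = [0, 1 - \varepsilon/2]$ and uses the hypothetical helper name `sup_error`; the probe points $x_k = 1 - 1/k^2$ are likewise an illustrative choice.

```python
def sup_error(k: int, right_end: float) -> float:
    """Supremum of |x**k - 0| over [0, right_end] with right_end < 1.

    x**k is increasing in x, so the supremum is attained at right_end.
    """
    return right_end ** k

eps = 0.01
right_end = 1 - eps / 2          # E = [0, 1 - eps/2]; excluded set has measure eps/2
ks = (10, 100, 1000, 10000)

# On E the worst-case error tends to zero: convergence is uniform there.
errors_on_E = [sup_error(k, right_end) for k in ks]

# On [0, 1) the worst-case error stays near 1: the points x_k = 1 - 1/k**2
# satisfy x_k**k >= 1 - 1/k by Bernoulli's inequality.
values_near_one = [(1 - 1 / k ** 2) ** k for k in ks]

for k, e, v in zip(ks, errors_on_E, values_near_one):
    print(k, e, v)
```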
The significance of Egoroff's theorem in geometric measure theory lies in its use as a tool inside other proofs. When Lusin's theorem is proved, Egoroff's theorem provides the crucial passage from pointwise convergence of simple function approximations to uniform convergence on a compact set — and it is this uniform convergence that yields continuity.
## Continuity Almost Everywhere
The question underlying Lusin's theorem is deceptively simple: if $f$ is measurable, is it "close to continuous" in any meaningful sense? The answer cannot involve the individual values of $f$, because we can change $f$ on a null set without affecting its measurability or its integral. So we must look for a topological statement that is measure-theoretically robust.
Lusin's insight is that the right notion is *restriction to a large compact set*. Instead of asking whether $f$ is continuous at most points (which is a pointwise condition that can fail everywhere, as the Dirichlet function shows), we ask whether there is a compact set $K$ of nearly full measure on which the restriction $f|_K$ is a continuous function with the subspace topology from $\mathbb{R}^n$.
The theorem is most naturally stated for Radon measures, because the proof uses inner regularity — the ability to approximate measurable sets from inside by compact sets.
[definition: Radon Measure]
A Borel measure $\mu$ on $\mathbb{R}^n$ is a **Radon measure** if it is locally finite (finite on compact sets) and inner regular: for every Borel set $A \subset \mathbb{R}^n$,
\begin{align*}
\mu(A) = \sup\{\mu(K) : K \subset A, \, K \text{ compact}\}.
\end{align*}
Equivalently, a Radon measure is a Borel regular outer measure that is finite on compact sets.
[/definition]
The canonical example to keep in mind is the Lebesgue measure, which satisfies both conditions on $\mathbb{R}^n$ and therefore serves as the model Radon measure throughout this chapter.
[remark: Lebesgue Measure Is Radon]
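The Lebesgue measure $\mathcal{L}^n$ is the canonical example of a Radon measure on $\mathbb{R}^n$. It is finite on bounded sets, hence on compact sets. Inner regularity follows from outer approximation by open sets: if $A$ is a bounded Borel set contained in a closed ball $B$, and $U \supset B \setminus A$ is open with $\mathcal{L}^n(U \setminus (B \setminus A)) < \delta$, then $B \setminus U$ is a compact subset of $A$ with $\mathcal{L}^n(A \setminus (B \setminus U)) < \delta$. Unbounded sets are handled by exhausting $\mathbb{R}^n$ with balls. The inner regularity property is the key structural feature that makes Lusin's proof work.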
[/remark]
It is precisely the inner regularity of Radon measures — the ability to approximate any Borel set from inside by compact subsets — that makes Lusin's theorem possible. Without inner regularity, we could not extract a compact set $K$ of nearly full measure from the level sets of $f$, and the whole argument collapses. With it, every measurable function behaves like a continuous one if we are willing to discard a set of small measure.
[quotetheorem:2959]
[explanation: Proof Strategy for Lusin's Theorem]
The proof proceeds in four steps: approximate $f$ by simple functions, find a large compact set on which every approximant is continuous, use Egoroff's theorem to pass to a compact set where the approximation is uniform, and conclude that the uniform limit is continuous.
**Step 1: Approximation by simple functions.** A standard result in measure theory guarantees that any measurable function $f$ is the pointwise limit of a sequence of simple functions $s_k$. We may take the $s_k$ to be measurable simple functions, meaning each is a finite linear combination of indicator functions of measurable sets:
\begin{align*}
s_k = \sum_{j=1}^{N_k} c_{k,j} \, \mathbb{1}_{A_{k,j}},
\end{align*}
where $A_{k,j}$ are measurable and $c_{k,j} \in \mathbb{R}$. We may arrange so that $|s_k| \leq |f|$ and $s_k \to f$ pointwise everywhere on the set where $f$ is finite.
**Step 2: Approximating compact sets for simple functions.** For each simple function $s_k$ and for $\varepsilon > 0$, each indicator $\mathbb{1}_{A_{k,j}}$ is continuous when restricted to a suitable compact set built from inner approximations of $A_{k,j}$ and its complement. More precisely, by inner regularity of $\mu$, for each measurable set $A$ and each $\delta > 0$ there exists a compact set $L \subset A$ with $\mu(A \setminus L) < \delta$, and likewise a compact set $L' \subset A^c$ with $\mu(A^c \setminus L') < \delta$. On $L \cup L'$, the indicator $\mathbb{1}_A$ equals $1$ on $L$ and $0$ on $L'$. Since $L$ and $L'$ are disjoint compact sets, they lie at positive distance from each other, so $\mathbb{1}_A$ is locally constant on the compact set $L \cup L'$ and hence continuous there.
Applying this to all indicator functions appearing in $s_k$, with a total measure cost of at most $\varepsilon/2^{k+2}$ for the $k$-th simple function, gives a compact set $K^{(k)}$ on which $s_k$ is continuous. The intersection $K_0 = \bigcap_k K^{(k)}$ is compact, every $s_k$ is continuous on it, and the costs sum to $\sum_{k \geq 1} \varepsilon/2^{k+2} = \varepsilon/4$, so $\mu(\mathbb{R}^n \setminus K_0) < \varepsilon/2$.
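**Step 3: Applying Egoroff's theorem.** On $K_0$ we have $s_k \to f$ pointwise and $\mu(K_0) < \infty$. By Egoroff's theorem applied to the finite measure space $(K_0, \mu|_{K_0})$, there exists a measurable set $E \subset K_0$ with $\mu(K_0 \setminus E) < \varepsilon/4$ such that $s_k \to f$ uniformly on $E$. Egoroff's theorem only produces a measurable set, so we use inner regularity once more to choose a compact set $K \subset E$ with $\mu(E \setminus K) < \varepsilon/4$. Then $\mu(K_0 \setminus K) < \varepsilon/2$ and $s_k \to f$ uniformly on $K$.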
**Step 4: Continuity of the limit.** Each $s_k$ is continuous on $K$. Since $s_k \to f$ uniformly on $K$, the uniform limit of continuous functions is continuous. Therefore $f|_K$ is continuous.
The total measure cost is $\mu(\mathbb{R}^n \setminus K) \leq \mu(\mathbb{R}^n \setminus K_0) + \mu(K_0 \setminus K) < \varepsilon/2 + \varepsilon/2 = \varepsilon$.
[/explanation]
[example: Lusin's Theorem for the Dirichlet Function]
Return to the Dirichlet function $f = \mathbb{1}_\mathbb{Q}$ on $[0,1]$. We apply Lusin's theorem with the measure $\mu = \mathcal{L}^1$ and tolerance $\varepsilon > 0$.
Since $\mathcal{L}^1(\mathbb{Q} \cap [0,1]) = 0$, the rationals form a Lebesgue null set in $[0,1]$. By inner regularity of $\mathcal{L}^1$ applied to $[0,1] \setminus \mathbb{Q}$, there exists a compact set $K \subset [0,1] \setminus \mathbb{Q}$ with
\begin{align*}
\mathcal{L}^1([0,1] \setminus K) < \varepsilon.
\end{align*}
On the compact set $K \subset [0,1] \setminus \mathbb{Q}$, every point of $K$ is irrational, so $f(x) = 0$ for all $x \in K$. The restriction $f|_K \equiv 0$ is the constant zero function, which is continuous.
To verify that $K$ can be chosen with the right measure bound: let $V$ be an open set containing $\mathbb{Q} \cap [0,1]$ with $\mathcal{L}^1(V) < \varepsilon$. Such a $V$ exists because the rationals are countable: enumerate them as $q_1, q_2, \ldots$ and take $V = \bigcup_j (q_j - \varepsilon/2^{j+2}, q_j + \varepsilon/2^{j+2})$, so that the $j$-th interval has length $\varepsilon/2^{j+1}$ and $\mathcal{L}^1(V) \leq \sum_j \varepsilon/2^{j+1} = \varepsilon/2 < \varepsilon$. Then $K = [0,1] \setminus V$ is compact (closed and bounded), $K \subset [0,1] \setminus \mathbb{Q}$, and $\mathcal{L}^1([0,1] \setminus K) = \mathcal{L}^1(V \cap [0,1]) < \varepsilon$.
This example is instructive because the mechanism is so transparent: $f$ is discontinuous because it takes the value $1$ on a dense set ($\mathbb{Q}$) and $0$ on another dense set ($\mathbb{R} \setminus \mathbb{Q}$). But $\mathbb{Q}$ has measure zero, so we can simply exclude it, and on the remainder $f$ is identically $0$, hence continuous: a constant function satisfies the $\varepsilon$-$\delta$ criterion with any $\delta > 0$, since $|f(x) - f(y)| = 0$ for all $x, y \in K$.
[/example]
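The covering of the rationals by geometrically shrinking intervals can be checked with exact arithmetic on a finite piece of the enumeration. This is a sketch only: the function names are hypothetical, the enumeration is cut off at denominators up to 12, and the radius $\varepsilon/2^{j+2}$ is one convenient choice among many summable ones.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_denominator):
    """Distinct rationals p/q in [0, 1] with q <= max_denominator, in a fixed order."""
    seen, out = set(), []
    for q in range(1, max_denominator + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
    return out

def cover(eps, points):
    """The j-th interval has radius eps / 2**(j+2), hence length eps / 2**(j+1)."""
    return [(q - eps / 2 ** (j + 2), q + eps / 2 ** (j + 2))
            for j, q in enumerate(points, start=1)]

def union_length(intervals):
    """Exact length of a finite union of open intervals (sort, then merge overlaps)."""
    total = Fraction(0)
    cur_lo = cur_hi = None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total

eps = Fraction(1, 10)
points = rationals_in_unit_interval(12)   # finite truncation of q_1, q_2, ...
V = cover(eps, points)

print("number of rationals covered:", len(points))
print("length of the union:", float(union_length(V)), "< eps =", float(eps))
```

However many rationals are enumerated, the total length stays below $\varepsilon/2$, which is why the complement $K = [0,1] \setminus V$ keeps nearly full measure.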
The power of Lusin's theorem becomes apparent when $f$ is a more complicated measurable function, not merely a simple indicator. The theorem guarantees the existence of $K$ without requiring us to know anything about the structure of $f$ beyond its measurability.
## The Near-Continuity Philosophy
Both Egoroff's and Lusin's theorems are manifestations of a general principle in measure theory:
> *Measurable objects behave like well-behaved objects, except on sets of arbitrarily small measure.*
This principle, sometimes called the **near-continuity** or **almost-everywhere good behavior** philosophy, pervades geometric measure theory. It is what allows one to work with measurable functions as if they were continuous, and with pointwise-convergent sequences as if convergence were uniform, by always allowing a small "error set."
[explanation: Two Perspectives on Near-Continuity]
Lusin's theorem and approximate continuity are related but different manifestations of this philosophy.
**Lusin's perspective** is global and topological: $f$ is continuous on a *single large compact set* $K$. The set $K$ depends on $f$ and on $\varepsilon$. Outside $K$, we make no claims about $f$.
**Approximate continuity** is a pointwise notion: $f$ is approximately continuous at $x$ if
\begin{align*}
\lim_{r \to 0} \frac{\mathcal{L}^n(\{y \in B(x,r) : |f(y) - f(x)| \geq \varepsilon\})}{\mathcal{L}^n(B(x,r))} = 0 \quad \text{for all } \varepsilon > 0.
\end{align*}
This says that the proportion of the ball $B(x,r)$ where $f$ differs from $f(x)$ by $\varepsilon$ or more goes to zero as $r \to 0$: $f(x)$ is the "density limit" of $f$ at $x$. The Lebesgue differentiation theorem (developed in Chapter 6) implies that every $f \in L^1_{\rm loc}(\mathbb{R}^n)$ is approximately continuous at $\mathcal{L}^n$-almost every point.
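The connection between the two perspectives runs through Lebesgue density points. If $K$ is a compact set on which $f|_K$ is continuous, then $f$ is approximately continuous at every density point $x$ of $K$: near $x$, the points of $K$ have $f$-values close to $f(x)$ by continuity of $f|_K$, while the points outside $K$ occupy a vanishing proportion of $B(x,r)$. The converse direction is more delicate, since approximate continuity at every point of a closed set $K$ does not by itself make $f|_K$ continuous. Lusin's theorem constructs the compact set $K$ without explicitly using approximate continuity, but the two results reinforce each other conceptually.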
[/explanation]
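The density ratio in the definition of approximate continuity can be computed exactly for a step function in one dimension. The sketch below takes $f = \mathbb{1}_{[0,\infty)}$ as an illustrative test function (not from the text) and uses the hypothetical helper name `bad_fraction`. Since $f$ only takes the values $0$ and $1$, the set $\{|f(y) - f(x)| \geq \varepsilon\}$ is the same for every $\varepsilon \in (0, 1]$.

```python
def heaviside(x: float) -> float:
    """The step function 1_[0, infinity)."""
    return 1.0 if x >= 0 else 0.0

def bad_fraction(x: float, r: float) -> float:
    """Fraction of (x - r, x + r) on which the step function differs from its value at x."""
    lo, hi = x - r, x + r
    if x >= 0:
        bad = max(0.0, min(hi, 0.0) - lo)   # part of the interval to the left of 0
    else:
        bad = max(0.0, hi - max(lo, 0.0))   # part of the interval to the right of 0
    return bad / (2 * r)

radii = (1.0, 0.1, 0.001)

# At the jump the ratio stays at 1/2: f is not approximately continuous at 0.
at_jump = [bad_fraction(0.0, r) for r in radii]

# Away from the jump the ratio is eventually 0: f is approximately continuous at 0.3.
away = [bad_fraction(0.3, r) for r in radii]

print(at_jump, away)
```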
[remark: Measurability as the Right Generalization of Continuity]
Lusin's theorem is one of the main pieces of evidence that measurability is the correct generalization of continuity for integration theory. A continuous function on a compact set can be integrated (by the Riemann theory). A measurable function, by Lusin's theorem, restricts to a continuous function on a set of measure as close to full measure as desired, and the integral can therefore be built up from these continuous restrictions. This is not merely philosophical: the proof of Lusin's theorem uses the measurability hypothesis at every step, first to approximate $f$ by simple functions and then to approximate the underlying measurable sets by compact sets.
[/remark]
## Uniform Integrability and the Limits of the Theory
Having established that pointwise convergence implies nearly uniform convergence (Egoroff) and that measurable functions are nearly continuous (Lusin), it is natural to ask what these theorems can and cannot give us in terms of integration.
[explanation: What Egoroff's Theorem Does Not Give You]
Egoroff's theorem converts pointwise convergence to nearly uniform convergence. A natural hope would be that this implies convergence of the integrals:
\begin{align*}
f_k \to f \text{ pointwise a.e.} \implies \int_X f_k \, d\mu \to \int_X f \, d\mu.
\end{align*}
This is *false* in general, even on a finite measure space. The standard counterexample is the "sliding hump": on $[0,1]$ with Lebesgue measure, define $f_k = k \cdot \mathbb{1}_{[0, 1/k]}$. Then:
- $f_k(x) \to 0$ for all $x \in (0,1]$, so $f_k \to 0$ pointwise $\mathcal{L}^1$-a.e.
- $\int_0^1 f_k \, d\mathcal{L}^1 = k \cdot (1/k) = 1 \not\to 0$.
Here the functions concentrate their mass at $0$, making each integral equal to $1$ despite pointwise convergence to $0$. Egoroff's theorem does apply — convergence is nearly uniform — but the mass concentrated near $0$ escapes through the exceptional set.
What is needed to pass the limit inside the integral is *uniform integrability*: the condition that the tails of the functions $\{f_k\}$ are controlled uniformly in $k$, preventing mass from escaping to infinity or concentrating. This leads to the Vitali convergence theorem, which asserts that on a finite measure space, pointwise a.e. convergence together with uniform integrability implies $L^1$ convergence. In the context of Radon measures and geometric measure theory, the analogous tool is weak-* convergence of measures (Chapter 8).
[/explanation]
[example: Egoroff and the Sliding Hump]
Let $f_k = k \cdot \mathbb{1}_{[0, 1/k]}$ on $[0,1]$ with $\mu = \mathcal{L}^1$. We verify that Egoroff's theorem applies and find the exceptional sets explicitly.
Since $f_k(x) \to 0$ for each $x \in (0,1]$, the pointwise limit is $f \equiv 0$ (a.e.). For any $\varepsilon > 0$, Egoroff's theorem gives a measurable set $E$ with $\mathcal{L}^1([0,1] \setminus E) < \varepsilon$ on which $f_k \to 0$ uniformly. To see what $E$ must look like, note that uniform convergence of $f_k$ to $0$ on $E$ requires: for each tolerance $1/n$, there exists $m_n$ such that $k \geq m_n$ implies $|f_k(x)| \leq 1/n$ for all $x \in E$. But $f_k(x) = k$ for $x \in [0, 1/k]$, so $E$ must be disjoint from $[0, 1/k]$ for all sufficiently large $k$. This means $0 \notin E$ — the point $0$ must lie in the exceptional set.
In fact, $E$ must miss a whole neighborhood of $0$. Taking the tolerance $n = 1$ and $k_0 = \max(m_1, 2)$, the bound $|f_{k_0}(x)| \leq 1$ on $E$ together with $f_{k_0} = k_0 > 1$ on $[0, 1/k_0]$ gives $E \cap [0, 1/k_0] = \varnothing$. So the exceptional set $[0,1] \setminus E$ always contains an interval $[0, 1/k_0]$, and for every $k \geq k_0$ the integral mass $\int f_k \, d\mathcal{L}^1 = 1$ lives entirely inside this interval. Egoroff's theorem is not contradicted, since it makes no claim about integrals, but the example shows why Egoroff alone cannot give integral convergence.
[/example]
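The bookkeeping in this example can be verified with exact rational arithmetic. The sketch below assumes the concrete choice $E = [\delta, 1]$ with $\delta = 1/100$; the helper name `hump_integral` is hypothetical.

```python
from fractions import Fraction

def hump_integral(k: int, a: Fraction, b: Fraction) -> Fraction:
    """Exact integral of f_k = k * 1_[0, 1/k] over [a, b], where 0 <= a <= b <= 1."""
    overlap = min(b, Fraction(1, k)) - max(a, Fraction(0))
    return k * max(overlap, Fraction(0))

delta = Fraction(1, 100)       # E = [delta, 1]; the exceptional set is [0, delta)
ks = [1, 10, 100, 101, 1000]

# Over all of [0, 1] every integral equals 1, so the integrals do not tend to 0.
full = [hump_integral(k, Fraction(0), Fraction(1)) for k in ks]

# Over E the integral vanishes as soon as 1/k <= delta: all the mass sits in [0, delta).
on_E = [hump_integral(k, delta, Fraction(1)) for k in ks]

print(full)
print(on_E)
```

For $k \geq 1/\delta$ the functions vanish identically on $E$, so convergence there is trivially uniform, while the full integral never moves from $1$.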
The interplay between Egoroff's theorem and integration convergence is a recurring theme in geometric measure theory. Many density and differentiation results proceed by: (1) establish pointwise convergence $\mu$-a.e., (2) apply Egoroff to find a large set with uniform convergence, (3) integrate over that set and estimate the remainder separately. The remainder can often be bounded using the smallness of the exceptional set combined with growth estimates on the functions.
## Applications in Geometric Measure Theory
These two theorems are not merely abstract tools in integration theory — they have concrete roles in the proofs of geometric results.
In the proof of the Lebesgue differentiation theorem (Chapter 6), which asserts that
\begin{align*}
\lim_{r \to 0} \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} f(y) \, d\mathcal{L}^n(y) = f(x) \quad \text{for } \mathcal{L}^n\text{-a.e. } x,
\end{align*}
one uses an approximation of $f$ by continuous functions together with estimates showing the error is small. The measurability of $f$ guarantees the approximation by simple functions, and Lusin's theorem can be used to find the continuous approximants.
In differentiation of measures (Chapter 5), Egoroff's theorem appears when one needs to upgrade pointwise convergence of the difference quotients $\nu(B(x,r))/\mu(B(x,r))$ to uniform convergence on a large set, in order to integrate and recover the total mass.
Lusin's theorem also plays a role in the theory of Radon–Nikodym derivatives and in the construction of measures from geometric functionals (Chapter 7). The Riesz representation theorem identifies Radon measures with positive linear functionals on $C_c(\mathbb{R}^n)$; Lusin's theorem is what allows one to identify a measurable function with such a functional, by integrating $f$ against test functions using the fact that $f$ is continuous on large compact sets.
[remark: Tietze Extension and Lusin]
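A closely related result is Tietze's extension theorem: a continuous function on a closed subset of a normal topological space extends to a continuous function on the whole space, and a bounded function extends with the same bounds. Combining Lusin's theorem with Tietze's theorem, every $f \in L^p(\mathbb{R}^n, \mu)$ with $1 \leq p < \infty$ can be approximated in $L^p$ norm by continuous functions. By truncation it suffices to treat a bounded $f$ that vanishes outside a closed ball $B$. For such $f$, first find a compact $K \subset B$ with $\mu(B \setminus K) < \varepsilon$ and $f|_K$ continuous, then extend $f|_K$ to a continuous function $g$ on $\mathbb{R}^n$ with $\sup |g| \leq \sup |f|$, supported in a slightly larger ball $B'$ chosen so that $\mu(B' \setminus B) < \varepsilon$. Since $f$ and $g$ agree on $K$ and are both bounded by $\sup |f|$, the norm $\|f - g\|_{L^p}$ is at most a constant multiple of $\sup |f| \cdot \varepsilon^{1/p}$. This gives the density of $C_c(\mathbb{R}^n)$ in $L^p(\mathbb{R}^n, \mu)$ for Radon measures $\mu$ and $1 \leq p < \infty$.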
[/remark]
---
Having characterized the regularity of measurable functions, we extend our measure-theoretic machinery to higher dimensions. Product measures and Fubini's theorem enable us to compute integrals over multi-dimensional spaces by reducing them to iterated integrals, a cornerstone technique for geometric applications.
# 3. Product Measures and Fubini's Theorem
Suppose you want to compute the integral of a function $f : \mathbb{R}^2 \to \mathbb{R}$ over the plane. The cleanest strategy is to slice: fix $x$, integrate $f(x, \cdot)$ over the $y$-axis, and then integrate the resulting function over the $x$-axis. This iterated procedure — the way every calculus student learns to compute double integrals — is so intuitive that its hidden assumptions are easy to miss. When does slicing work? When do the two iterated integrals (first $x$ then $y$, or first $y$ then $x$) agree? And what, exactly, is the measure on $\mathbb{R}^2$ that these iterated integrals are computing?
These questions are answered by the product measure construction and the Fubini–Tonelli theorem. The answers are not merely technical — they reveal something genuinely surprising. Iterated integration can fail: two iterated integrals can both exist, be finite, and yet disagree. The resolution is not to be more careful about computation but to impose the right integrability hypothesis. And the payoff reaches far beyond double integrals: understanding how measures multiply is what lets us identify $n$-dimensional Lebesgue measure $\mathcal{L}^n$ as the $n$-fold product of one-dimensional Lebesgue measure, justifying the fundamental principle that volume computations can always be reduced to iterated one-dimensional integration.
[example: Iterated Integrals Can Disagree]
Consider the function $f : (0, 1) \times (0, 1) \to \mathbb{R}$ defined by
\begin{align*}
f(x, y) = \frac{x^2 - y^2}{(x^2 + y^2)^2}.
\end{align*}
We compute both iterated integrals over the unit square $(0,1)^2$ with respect to $\mathcal{L}^1 \otimes \mathcal{L}^1$. First, fix $x > 0$ and integrate in $y$. Notice that
\begin{align*}
\frac{\partial}{\partial y}\left(\frac{y}{x^2 + y^2}\right) = \frac{x^2 + y^2 - 2y^2}{(x^2 + y^2)^2} = \frac{x^2 - y^2}{(x^2 + y^2)^2} = f(x, y).
\end{align*}
Therefore
\begin{align*}
\int_0^1 f(x, y)\, d\mathcal{L}^1(y) = \left[\frac{y}{x^2 + y^2}\right]_{y=0}^{y=1} = \frac{1}{x^2 + 1}.
\end{align*}
Integrating in $x$:
\begin{align*}
\int_0^1 \left(\int_0^1 f(x, y)\, d\mathcal{L}^1(y)\right) d\mathcal{L}^1(x) = \int_0^1 \frac{1}{x^2 + 1}\, d\mathcal{L}^1(x) = \arctan(1) - \arctan(0) = \frac{\pi}{4}.
\end{align*}
Now fix $y > 0$ and integrate in $x$. By symmetry, $f(x, y) = -f(y, x)$, so by the same antiderivative argument,
\begin{align*}
\int_0^1 f(x, y)\, d\mathcal{L}^1(x) = -\frac{1}{y^2 + 1}.
\end{align*}
Integrating in $y$:
\begin{align*}
\int_0^1 \left(\int_0^1 f(x, y)\, d\mathcal{L}^1(x)\right) d\mathcal{L}^1(y) = -\int_0^1 \frac{1}{y^2 + 1}\, d\mathcal{L}^1(y) = -\frac{\pi}{4}.
\end{align*}
Both iterated integrals exist and are finite, but $\pi/4 \neq -\pi/4$. The reason is that $f \notin L^1((0,1)^2, \mathcal{L}^2)$. In polar coordinates $f = \cos(2\theta)/r^2$, so integrating $|f|$ over the quarter disc $\{x, y > 0,\ x^2 + y^2 < 1\} \subset (0,1)^2$ gives $\int_0^{\pi/2} |\cos 2\theta|\, d\theta \cdot \int_0^1 r^{-1}\, dr = \infty$. Integrability is exactly the hypothesis that Fubini's theorem requires, and it fails here.
[/example]
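The two iterated integrals can be sanity-checked numerically with nothing beyond the standard library. The sketch below uses a hand-written composite Simpson rule (the helper `simpson` and the sample point $x_0 = 0.5$ are assumptions for illustration); the outer integrals use the closed-form inner integrals derived above, since $f$ itself is singular at the origin.

```python
import math

def f(x: float, y: float) -> float:
    return (x * x - y * y) / (x * x + y * y) ** 2

def simpson(g, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

# Inner integral in y at a fixed x0 > 0, compared with the antiderivative formula.
x0 = 0.5
inner_numeric = simpson(lambda y: f(x0, y), 0.0, 1.0)
inner_closed = 1.0 / (x0 * x0 + 1.0)

# Outer integrals of the closed-form inner integrals, in both orders.
dy_then_dx = simpson(lambda x: 1.0 / (x * x + 1.0), 0.0, 1.0)
dx_then_dy = simpson(lambda y: -1.0 / (y * y + 1.0), 0.0, 1.0)

def abs_mass_on_annulus(rho: float) -> float:
    """Integral of |f| over the quarter annulus rho < r < 1, from polar coordinates:
    (integral of |cos 2t| over [0, pi/2]) * (integral of dr/r over [rho, 1]) = log(1/rho)."""
    return math.log(1.0 / rho)

print(inner_numeric, inner_closed)
print(dy_then_dx, dx_then_dy, math.pi / 4)
print([abs_mass_on_annulus(10.0 ** -j) for j in (2, 4, 8)])
```

The last line grows without bound as the inner radius shrinks, which is the numerical face of $f \notin L^1$.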
The lesson from this example shapes the entire chapter. The Fubini–Tonelli theorem is not merely a formula for switching the order of integration — it is a theorem whose hypotheses matter, and whose failure points to a real phenomenon.
## Product $\sigma$-Algebras and Measurable Rectangles
Before we can integrate over a product space $X \times Y$, we need to know which subsets of $X \times Y$ are measurable. The guiding principle is that any reasonable $\sigma$-algebra on a product space should, at a minimum, contain all sets of the form $A \times B$ where $A$ and $B$ are measurable in the respective factors. These are the building blocks.
[definition: Measurable Rectangle]
Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be measurable spaces. A **measurable rectangle** is any set of the form $A \times B$ where $A \in \mathcal{A}$ and $B \in \mathcal{B}$.
[/definition]
Measurable rectangles alone do not form a $\sigma$-algebra — the union of two rectangles need not be a rectangle. But they generate one.
[definition: Product $\sigma$-Algebra]
Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be measurable spaces. The **product $\sigma$-algebra** $\mathcal{A} \otimes \mathcal{B}$ is the $\sigma$-algebra on $X \times Y$ generated by all measurable rectangles:
\begin{align*}
\mathcal{A} \otimes \mathcal{B} = \sigma\bigl(\{A \times B : A \in \mathcal{A},\ B \in \mathcal{B}\}\bigr).
\end{align*}
[/definition]
[remark: Borel Product Sigma-Algebra]
When $X = \mathbb{R}^m$ and $Y = \mathbb{R}^n$ with their respective Borel $\sigma$-algebras, the product $\sigma$-algebra $\mathcal{B}(\mathbb{R}^m) \otimes \mathcal{B}(\mathbb{R}^n)$ coincides with the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^{m+n})$. One inclusion holds because every open set in $\mathbb{R}^{m+n}$ is a countable union of open boxes, and open boxes are measurable rectangles with open factors. Conversely, each rectangle $A \times B$ with Borel factors is the intersection of the preimages of $A$ and $B$ under the continuous coordinate projections, hence is Borel.
[/remark]
The product $\sigma$-algebra interacts well with sections. Given a set $E \subset X \times Y$ and a point $x \in X$, define the $x$-section
\begin{align*}
E_x = \{y \in Y : (x, y) \in E\}.
\end{align*}
Similarly, for $y \in Y$, define $E^y = \{x \in X : (x, y) \in E\}$. Every set in the product $\sigma$-algebra has measurable sections:
[quotetheorem:2960]
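The proof is a generating-class argument: the collection of sets $E \subset X \times Y$ for which $E_x \in \mathcal{B}$ for all $x$ contains all measurable rectangles and is closed under complements and countable unions, because taking sections commutes with both operations. It is therefore a $\sigma$-algebra containing the rectangles, hence contains $\mathcal{A} \otimes \mathcal{B}$. The argument for $E^y$ is symmetric.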
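Sections are easy to experiment with on finite sets. The following sketch uses hypothetical helper names and a small made-up set $E$; it also checks the base case of the proof, namely that the sections of a rectangle $A \times B$ are either $B$ or empty.

```python
def x_section(E, x):
    """E_x = {y : (x, y) in E}."""
    return {y for (a, y) in E if a == x}

def y_section(E, y):
    """E^y = {x : (x, y) in E}."""
    return {x for (x, b) in E if b == y}

X = {0, 1, 2}
Y = {"a", "b"}
E = {(0, "a"), (0, "b"), (2, "a")}

sections_x = {x: x_section(E, x) for x in X}
sections_y = {y: y_section(E, y) for y in Y}

# Base case: for a rectangle A x B, the x-section is B when x is in A and empty otherwise.
A, B = {0, 2}, {"b"}
rectangle = {(x, y) for x in A for y in B}
rectangle_sections = {x: x_section(rectangle, x) for x in X}

print(sections_x)
print(sections_y)
print(rectangle_sections)
```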
## Construction of the Product Measure
We now want to assign a measure to each set in $\mathcal{A} \otimes \mathcal{B}$. On measurable rectangles, the formula is dictated by the requirement that area (or volume, in higher dimensions) should be multiplicative:
\begin{align*}
(\mu \otimes \nu)(A \times B) = \mu(A) \cdot \nu(B).
\end{align*}
The question is whether this formula on rectangles determines a genuine measure on all of $\mathcal{A} \otimes \mathcal{B}$. The answer is yes, and the construction uses Carathéodory's extension theorem from Chapter 1.
[definition: Product Measure]
Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces. The **product measure** $\mu \otimes \nu$ is the unique measure on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ satisfying
\begin{align*}
(\mu \otimes \nu)(A \times B) = \mu(A) \cdot \nu(B)
\end{align*}
for all $A \in \mathcal{A}$ and $B \in \mathcal{B}$.
[/definition]
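On finite spaces the product measure can be written down directly, which makes the rectangle formula easy to test. This is a toy sketch, assuming point weights chosen purely for illustration; the names `measure` and `product_measure` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Point masses mu({x}) and nu({y}) on two finite sets (illustrative values).
mu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
nu = {"a": Fraction(2), "b": Fraction(5)}

def measure(weights, subset):
    """Measure of a subset as the sum of its point masses."""
    return sum((weights[p] for p in subset), Fraction(0))

def product_measure(E):
    """(mu x nu)(E) as the sum of mu({x}) * nu({y}) over the points (x, y) of E."""
    return sum((mu[x] * nu[y] for (x, y) in E), Fraction(0))

A, B = {0, 2}, {"b"}
rectangle = set(product(A, B))

lhs = product_measure(rectangle)
rhs = measure(mu, A) * measure(nu, B)
print(lhs, rhs)
```

In this finite setting every subset of $X \times Y$ is measurable and the measure is determined by its values on single points, so uniqueness is immediate; the content of the general definition is that $\sigma$-finiteness recovers uniqueness without such a shortcut.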
[explanation: Why $\sigma$-Finiteness is Needed]
The $\sigma$-finiteness hypothesis in the definition is essential for uniqueness. Without it, multiple measures on $\mathcal{A} \otimes \mathcal{B}$ can agree on all rectangles while differing elsewhere.
Here is a concrete instance. Let $X = Y = [0, 1]$, let $\mu = \mathcal{L}^1$ (Lebesgue measure), and let $\nu$ be counting measure on $[0, 1]$ — so $\nu(B) = |B|$ if $B$ is finite and $\nu(B) = \infty$ if $B$ is infinite. Note that $\mu$ is $\sigma$-finite but $\nu$ is not ($\nu([0,1]) = \infty$ and $[0,1]$ cannot be written as a countable union of sets of finite counting measure, since any subset of $[0,1]$ with finite counting measure is finite).
Consider the diagonal $D = \{(x, x) : x \in [0, 1]\}$. Using $x$-sections: $D_x = \{x\}$, so $\nu(D_x) = 1$ for every $x$, giving
\begin{align*}
\int_X \nu(D_x)\, d\mu(x) = \int_0^1 1\, d\mathcal{L}^1(x) = 1.
\end{align*}
Using $y$-sections: $D^y = \{y\}$, so $\mu(D^y) = \mathcal{L}^1(\{y\}) = 0$, giving
\begin{align*}
\int_Y \mu(D^y)\, d\nu(y) = \int_{[0,1]} 0\, d\nu(y) = 0.
\end{align*}
The two iterated integrals of the indicator function $\mathbb{1}_D$ disagree: one equals $1$ and the other $0$. This is not a violation of Fubini's theorem — the hypothesis that $\nu$ be $\sigma$-finite fails. It shows that without $\sigma$-finiteness, the product measure construction does not behave as expected.
[/explanation]
The construction of $\mu \otimes \nu$ proceeds in two steps. First, one verifies that the set function defined on finite disjoint unions of rectangles is well defined and countably additive, that is, a premeasure on the algebra generated by the rectangles. Second, Carathéodory's extension theorem (Chapter 1) extends this premeasure to a measure on the product $\sigma$-algebra, and $\sigma$-finiteness ensures uniqueness.
## Tonelli's Theorem for Non-Negative Functions
With the product measure in hand, we face the main question: how does integration with respect to $\mu \otimes \nu$ relate to iterated integration? The cleanest version of the answer applies to non-negative functions, where there is no integrability hypothesis and no possibility of cancellation obscuring the computation.
[motivation]
The intuition behind Tonelli's theorem is elementary. If $f \geq 0$, then both iterated integrals count the same "mass" — there is nothing that can cancel across sections. Any sum of non-negative terms can be rearranged without changing the total. The theorem makes this precise for measurable non-negative functions, even when the integrals are infinite.
This version of the theorem is the technical workhorse: in practice, one often checks Fubini's hypothesis (integrability of $|f|$) by first applying Tonelli to $|f|$ and computing the iterated integral to see if it is finite.
[/motivation]
[quotetheorem:3017]
The proof strategy is the standard monotone class method. First verify the theorem for indicator functions of measurable rectangles, where it reduces to $(\mu \otimes \nu)(A \times B) = \mu(A)\nu(B)$. A monotone class argument, which is where $\sigma$-finiteness enters, extends it to indicators of all sets in $\mathcal{A} \otimes \mathcal{B}$. Extend by linearity to simple functions, then pass to the general case by writing $f$ as a monotone limit of simple functions and applying the monotone convergence theorem at each stage.
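The first step of this strategy can be written out in full. As a sketch, take $f = \mathbb{1}_{A \times B}$ with $A \in \mathcal{A}$ and $B \in \mathcal{B}$ (writing $\mathcal{A}$, $\mathcal{B}$ for the two $\sigma$-algebras), and use the convention $0 \cdot \infty = 0$:
\begin{align*}
\int_X \left(\int_Y \mathbb{1}_A(x)\,\mathbb{1}_B(y)\, d\nu(y)\right) d\mu(x) &= \int_X \mathbb{1}_A(x)\,\nu(B)\, d\mu(x) = \mu(A)\,\nu(B), \\
\int_Y \left(\int_X \mathbb{1}_A(x)\,\mathbb{1}_B(y)\, d\mu(x)\right) d\nu(y) &= \int_Y \mu(A)\,\mathbb{1}_B(y)\, d\nu(y) = \mu(A)\,\nu(B).
\end{align*}
Both values agree with $(\mu \otimes \nu)(A \times B)$, which is the base case of the argument.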
Why is $\sigma$-finiteness essential? The theorem fails without it, in both the uniqueness of the product measure and the equality of iterated integrals. The diagonal example from the construction section above, where $\mu = \mathcal{L}^1$ and $\nu$ is counting measure on $[0,1]$, gives a concrete witness: with $f = \mathbb{1}_D$ non-negative and measurable, the two iterated integrals give $1$ and $0$. The culprit is not integrability but the failure of $\nu$ to be $\sigma$-finite.
What about non-measurable functions? If $f : X \times Y \to [0, \infty]$ is not $\mathcal{A} \otimes \mathcal{B}$-measurable, the statement breaks down entirely. The map $x \mapsto \int_Y f(x, y)\, d\nu(y)$ need not be measurable, so the outer integral may not even be defined. Measurability of the integrand is not a minor regularity assumption — it is what makes the sections $f_x$ and $f^y$ measurable in the first place, as guaranteed by the Measurability of Sections theorem above.
Tonelli's theorem feeds directly into Fubini's: in practice, one uses Tonelli to verify that $f \in L^1(X \times Y, \mu \otimes \nu)$ by checking finiteness of an iterated integral of $|f|$, and then Fubini applies to $f$ itself.
[remark: No Integrability Needed]
Tonelli's theorem places no integrability restriction on $f$. It applies even when $\int f\, d(\mu \otimes \nu) = \infty$, in which case both iterated integrals also equal $\infty$. This makes it ideal for verifying integrability: to check that $f \in L^1(X \times Y, \mu \otimes \nu)$, apply Tonelli to $|f|$ and verify that some iterated integral of $|f|$ is finite.
[/remark]
## Fubini's Theorem for Integrable Functions
For functions that change sign, the situation is more delicate. Cancellation between positive and negative parts makes it possible for iterated integrals to exist while the product integral does not, or for the iterated integrals to disagree with each other. Fubini's theorem specifies the correct hypothesis that rules out this pathology.
[quotetheorem:2961]
The proof reduces to the non-negative case by writing $f = f^+ - f^-$ where $f^+ = \max\{f, 0\}$ and $f^- = \max\{-f, 0\}$ are the positive and negative parts of $f$. Since $f \in L^1$, both $f^+$ and $f^-$ are non-negative integrable functions. Tonelli's theorem applies to each, giving
\begin{align*}
\int_{X \times Y} f^+\, d(\mu \otimes \nu) = \int_X \int_Y f^+(x, y)\, d\nu(y)\, d\mu(x) < \infty
\end{align*}
and the analogous equality for $f^-$. Since both iterated integrals are finite, the integrals of the sections $\int_Y f^+(x, y)\, d\nu(y)$ and $\int_Y f^-(x, y)\, d\nu(y)$ are finite for $\mu$-a.e. $x$. At such $x$, the section $f_x = f_x^+ - f_x^-$ is integrable. Subtracting the two Tonelli equalities gives Fubini's theorem.
[explanation: Why "A.E." Appears in the Hypotheses]
Fubini's theorem asserts that the sections $f_x$ are integrable, and that the section integral $x \mapsto \int_Y f_x\, d\nu$ defines an $L^1$ function, but only almost everywhere. This "a.e." is not just formal hedging: it cannot be removed, although the simplest examples do not show why. Consider $f(x, y) = \mathbb{1}_{\{x = y\}}$ on $[0,1]^2$ with $\mu = \nu = \mathcal{L}^1$. The product integral is $(\mathcal{L}^1 \otimes \mathcal{L}^1)(\{(x,x) : x \in [0,1]\}) = 0$ because the diagonal has $\mathcal{L}^2$-measure zero. The section $f_x = \mathbb{1}_{\{x\}}$ satisfies $\int_Y f_x\, d\mathcal{L}^1 = \mathcal{L}^1(\{x\}) = 0$ for every $x$, so in this case the sections happen to be integrable everywhere, not just a.e.
The generic issue is different: starting from $g \in L^1(X \times Y)$, one may modify $g$ on a null set so that individual sections become non-integrable. The theorem only guarantees integrability of sections at $\mu$-a.e. $x$, not at every $x$.
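A concrete instance of such a modification, offered as an illustrative sketch (the function $g$ below is our own choice, not part of the theorem statement): on $[0,1]^2$ with $\mu = \nu = \mathcal{L}^1$, set
\begin{align*}
g(x, y) = \begin{cases} 1/y & \text{if } x = 0 \text{ and } 0 < y \leq 1, \\ 0 & \text{otherwise.} \end{cases}
\end{align*}
Then $g$ is Borel measurable and vanishes outside the null set $\{0\} \times [0,1]$, so $g \in L^1$ with $\int g\, d(\mathcal{L}^1 \otimes \mathcal{L}^1) = 0$. Yet the section at $x = 0$ satisfies
\begin{align*}
\int_{[0,1]} g(0, y)\, d\mathcal{L}^1(y) = \int_0^1 \frac{1}{y}\, d\mathcal{L}^1(y) = \infty,
\end{align*}
so $g_0$ is not integrable. Every other section is identically zero, so integrability of sections fails only on the $\mu$-null set $\{0\}$, exactly as the theorem permits.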
[/explanation]
## Lebesgue Measure on $\mathbb{R}^n$ as an Iterated Product
The product measure construction provides more than an abstract tool for iterated integration. How do we know that $\mathcal{L}^n$ — the measure we use to integrate functions of $n$ variables — is actually the $n$-fold product of the one-dimensional construction? And why does this matter: could there be some other "reasonable" $n$-dimensional measure that disagreed with iterated integration along coordinate axes?
The one-dimensional Lebesgue measure $\mathcal{L}^1$ is defined, via Carathéodory's extension theorem, as the unique Borel measure on $\mathbb{R}$ satisfying $\mathcal{L}^1([a, b]) = b - a$ for all $a \leq b$. Starting from $\mathcal{L}^1$, we construct $\mathcal{L}^n$ by iteration:
[definition: Lebesgue Measure on $\mathbb{R}^n$ as a Product]
The $n$-dimensional Lebesgue measure $\mathcal{L}^n$ on $\mathbb{R}^n = \mathbb{R} \times \cdots \times \mathbb{R}$ ($n$ factors) is defined as the $n$-fold product measure
\begin{align*}
\mathcal{L}^n = \underbrace{\mathcal{L}^1 \otimes \mathcal{L}^1 \otimes \cdots \otimes \mathcal{L}^1}_{n \text{ times}}.
\end{align*}
[/definition]
This definition immediately gives the right normalizations. The unit cube $[0,1]^n$ satisfies
\begin{align*}
\mathcal{L}^n([0,1]^n) = \mathcal{L}^1([0,1]) \cdots \mathcal{L}^1([0,1]) = 1^n = 1.
\end{align*}
More generally, any box $[a_1, b_1] \times \cdots \times [a_n, b_n]$ satisfies
\begin{align*}
\mathcal{L}^n([a_1, b_1] \times \cdots \times [a_n, b_n]) = (b_1 - a_1)(b_2 - a_2) \cdots (b_n - a_n),
\end{align*}
which matches the expected geometric volume. Fubini's theorem then justifies the reduction of $n$-dimensional integrals to iterated one-dimensional integrals:
\begin{align*}
\int_{\mathbb{R}^n} f(x)\, d\mathcal{L}^n(x) = \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} f(x_1, \ldots, x_n)\, d\mathcal{L}^1(x_1) \cdots d\mathcal{L}^1(x_n)
\end{align*}
for any $f \in L^1(\mathbb{R}^n, \mathcal{L}^n)$ (or for any non-negative measurable $f$ by Tonelli).
[remark: Uniqueness via Translation Invariance]
The measure $\mathcal{L}^n$ defined as a product is the unique Borel measure on $\mathbb{R}^n$ that is translation-invariant and assigns measure $1$ to the unit cube. This follows from Carathéodory's uniqueness theorem and the observation that the product measure inherits translation invariance from $\mathcal{L}^1$.
[/remark]
## A Worked Example: Computing a Gaussian Integral
The integrand $e^{-x^2}$ has no elementary antiderivative — no closed-form $F(x)$ with $F'(x) = e^{-x^2}$. Direct one-variable integration is therefore blocked. How can we possibly evaluate $\int_{-\infty}^{\infty} e^{-x^2}\, d\mathcal{L}^1(x)$? The answer is to lift the problem to two dimensions, where the rotational symmetry of $e^{-(x^2+y^2)} = e^{-x^2} e^{-y^2}$ lets Fubini and a change to polar coordinates do the work.
[example: The Gaussian Integral via Product Measures]
We compute $\int_{-\infty}^{\infty} e^{-x^2}\, d\mathcal{L}^1(x)$. Let $I = \int_{-\infty}^{\infty} e^{-x^2}\, d\mathcal{L}^1(x)$. We do not know a priori that $I$ is finite, so we first apply Tonelli's theorem to the non-negative function $f(x, y) = e^{-(x^2 + y^2)}$ on $\mathbb{R}^2$.
Since $f(x, y) = e^{-x^2} e^{-y^2}$ and both factors are non-negative measurable functions, Tonelli's theorem gives
\begin{align*}
\int_{\mathbb{R}^2} e^{-(x^2 + y^2)}\, d\mathcal{L}^2(x, y) &= \int_{\mathbb{R}} \left(\int_{\mathbb{R}} e^{-x^2} e^{-y^2}\, d\mathcal{L}^1(x)\right) d\mathcal{L}^1(y) \\
&= \int_{\mathbb{R}} e^{-y^2} \left(\int_{\mathbb{R}} e^{-x^2}\, d\mathcal{L}^1(x)\right) d\mathcal{L}^1(y) \\
&= I \cdot I = I^2.
\end{align*}
To compute the left side, we convert to polar coordinates. The map $\Phi : (0, \infty) \times (0, 2\pi) \to \mathbb{R}^2 \setminus \{(x, 0) : x \geq 0\}$ defined by $\Phi(r, \theta) = (r\cos\theta, r\sin\theta)$ is a diffeomorphism with Jacobian determinant $r > 0$, and the excluded ray $\{(x, 0) : x \geq 0\}$ is $\mathcal{L}^2$-null, so the change of variables formula gives
\begin{align*}
\int_{\mathbb{R}^2} e^{-(x^2 + y^2)}\, d\mathcal{L}^2(x, y) = \int_0^\infty \int_0^{2\pi} e^{-r^2} r\, d\mathcal{L}^1(\theta)\, d\mathcal{L}^1(r) = 2\pi \int_0^\infty r e^{-r^2}\, d\mathcal{L}^1(r).
\end{align*}
The remaining radial integral is evaluated by the substitution $u = r^2$, $du = 2r\, dr$:
\begin{align*}
\int_0^\infty r e^{-r^2}\, d\mathcal{L}^1(r) = \frac{1}{2} \int_0^\infty e^{-u}\, d\mathcal{L}^1(u) = \frac{1}{2}\left[-e^{-u}\right]_0^\infty = \frac{1}{2}.
\end{align*}
Therefore $I^2 = 2\pi \cdot \frac{1}{2} = \pi$, and since $I > 0$ (as $e^{-x^2} > 0$), we conclude $I = \sqrt{\pi}$.
This computation used Tonelli's theorem in two places: first to factor the double integral into the product $I^2$, and second to evaluate the polar integral as an iterated integral in $\theta$ and $r$. Because the integrand is non-negative, the finiteness of $I$ did not have to be assumed in advance.
[/example]
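The same factorization works in every dimension. As a sketch, applying Tonelli's theorem $n - 1$ times to the non-negative function $e^{-|x|^2} = e^{-x_1^2} \cdots e^{-x_n^2}$ on $\mathbb{R}^n$, and using the value $I = \sqrt{\pi}$ just computed, gives
\begin{align*}
\int_{\mathbb{R}^n} e^{-|x|^2}\, d\mathcal{L}^n(x) = \prod_{i=1}^{n} \int_{\mathbb{R}} e^{-x_i^2}\, d\mathcal{L}^1(x_i) = I^n = \pi^{n/2}.
\end{align*}
For $n = 2$ this recovers $I^2 = \pi$.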
The polar coordinate step in this calculation is a prototype of a more general principle: integrating over $\mathbb{R}^n$ can be organized radially, first over each sphere and then over the radius. This is not specific to $e^{-x^2}$ but is a structural feature of how Lebesgue measure on $\mathbb{R}^n$ decomposes.
[remark: The Polar Coordinate Formula as a Fubini Application]
The polar coordinate decomposition used in the example is a prototype of the coarea formula, which will be developed in detail in GMT II. In its simplest form: for $f \in L^1(\mathbb{R}^n, \mathcal{L}^n)$,
\begin{align*}
\int_{\mathbb{R}^n} f(x)\, d\mathcal{L}^n(x) = \int_0^\infty \left(\int_{\partial B(0, r)} f(x)\, d\mathcal{H}^{n-1}(x)\right) d\mathcal{L}^1(r).
\end{align*}
Here $\partial B(0, r) = \{x \in \mathbb{R}^n : |x| = r\}$ is the sphere of radius $r$, and the inner integral uses the $(n-1)$-dimensional Hausdorff measure $\mathcal{H}^{n-1}$. This formula expresses Lebesgue measure as an "iterated product" of spherical measure and radial Lebesgue measure.
[/remark]
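A quick sanity check of this formula, under the assumption (standard, but not proved here) that the $\mathcal{H}^1$-measure of a circle of radius $r$ in $\mathbb{R}^2$ equals its length $2\pi r$: take $n = 2$ and $f = \mathbb{1}_{B(0, R)}$ for a fixed $R > 0$. The circle $\partial B(0, r)$ lies inside $B(0, R)$ exactly when $r < R$, so
\begin{align*}
\mathcal{L}^2(B(0, R)) = \int_0^R \mathcal{H}^1(\partial B(0, r))\, d\mathcal{L}^1(r) = \int_0^R 2\pi r\, d\mathcal{L}^1(r) = \pi R^2,
\end{align*}
which is the familiar area of a disc.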
## Completeness and the Product of Complete Measures
There is a subtlety that the product measure construction sweeps under the rug. Even if $\mu$ and $\nu$ are complete measures (every subset of a null set is measurable), the product measure $\mu \otimes \nu$ need not be complete on $\mathcal{A} \otimes \mathcal{B}$.
[explanation: Why the Product of Complete Measures Need Not Be Complete]
Consider $\mu = \nu = \mathcal{L}^1$ on $[0,1]$, both complete. Let $N \subset [0,1]$ be a non-Lebesgue-measurable set, and let $Z = \{0\}$ — a Lebesgue null set. Then
\begin{align*}
E = Z \times N = \{(0, y) : y \in N\} \subset [0,1]^2.
\end{align*}
Since $Z$ has measure zero, $E$ is a subset of the null set $Z \times [0,1]$. The set $Z \times [0,1] = \{0\} \times [0,1]$ is a measurable rectangle with $(\mathcal{L}^1 \otimes \mathcal{L}^1)(Z \times [0,1]) = \mathcal{L}^1(\{0\}) \cdot \mathcal{L}^1([0,1]) = 0$. So $E$ is a subset of a set of $(\mathcal{L}^1 \otimes \mathcal{L}^1)$-measure zero.
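If the product measure were complete on the product $\sigma$-algebra, $E$ would belong to it. But by the Measurability of Sections theorem, every section of a set in the product $\sigma$-algebra is measurable, and the $x$-section of $E$ at $x = 0$ is $E_0 = \{y : (0, y) \in E\} = N$. (The $y$-sections $E^y$, equal to $\{0\}$ for $y \in N$ and to $\varnothing$ otherwise, are all measurable and detect nothing.) Since $N$ was chosen to be non-measurable, $E$ is not in the product $\sigma$-algebra.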
To obtain completeness, one passes to the **completion** $\overline{\mathcal{L}^1 \otimes \mathcal{L}^1}$, which adds all subsets of null sets. The completed product of $\mathcal{L}^1$ with itself is exactly $\mathcal{L}^2$.
[/explanation]
For most applications in GMT, this distinction is harmless: we work with Borel sets and Radon measures, and Fubini's theorem applies cleanly. But for more delicate questions — such as proving that projections of measurable sets are measurable — the completion matters.
## The Cavalieri Principle
When can we compute the volume of a solid by integrating the areas of its cross-sections? And how much do we need to know about those cross-sections — must they match in shape, or is equality of area at each height enough? Fubini's theorem answers both questions cleanly: all that matters is the $\mathcal{L}^{n-1}$-measure of each slice, not its geometric form. This is the content of the principle Cavalieri stated geometrically in the seventeenth century, now a direct corollary of integration theory.
[quotetheorem:2962]
The proof is immediate: by Fubini (or Tonelli applied to indicator functions),
\begin{align*}
\mathcal{L}^n(E) = \int_{\mathbb{R}^n} \mathbb{1}_E\, d\mathcal{L}^n = \int_{\mathbb{R}} \mathcal{L}^{n-1}(E_t)\, d\mathcal{L}^1(t) = \int_{\mathbb{R}} \mathcal{L}^{n-1}(F_t)\, d\mathcal{L}^1(t) = \mathcal{L}^n(F).
\end{align*}
Why does measurability of $E$ and $F$ matter? The Cavalieri principle is about $\mathcal{L}^n$-measurable sets, not arbitrary subsets of $\mathbb{R}^n$. Without measurability, the function $t \mapsto \mathcal{L}^{n-1}(E_t)$ need not be measurable, so the outer integral is undefined. If we attempted to apply the argument to a non-measurable set $E$, the cross-sections $E_t$ might individually be measurable (having well-defined $\mathcal{L}^{n-1}$-measure) while $E$ itself fails to have a well-defined $\mathcal{L}^n$-measure. The measurability hypothesis is not a technicality — it is what permits the passage from cross-sectional data to a global conclusion.
What happens if we try to compare sets with equal cross-sections but different geometric structure? The Cavalieri principle does not care: it is purely a statement about measure, not shape. A right cylinder and an oblique cylinder over the same base and of the same height have equal cross-sectional areas at every height, and therefore the same volume, even though one is a sheared copy of the other. This is the power of the reduction to one-dimensional integration: geometry is washed away, and only the measure of each fiber matters.
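A concrete instance of equal slices with different shapes is a sheared cylinder. As a sketch, with notation chosen for illustration ($v \in \mathbb{R}^{n-1}$ a fixed shear vector, $h > 0$ a height, and $\omega_{n-1}$ the $\mathcal{L}^{n-1}$-measure of the unit ball in $\mathbb{R}^{n-1}$), let $E = \{(x, t) \in \mathbb{R}^{n-1} \times \mathbb{R} : 0 < t < h,\ |x - tv| < 1\}$. Each slice $E_t$ is the translate $B(tv, 1)$ of the unit ball, so by translation invariance
\begin{align*}
\mathcal{L}^n(E) = \int_0^h \mathcal{L}^{n-1}(B(tv, 1))\, d\mathcal{L}^1(t) = \int_0^h \omega_{n-1}\, d\mathcal{L}^1(t) = h\,\omega_{n-1},
\end{align*}
independently of $v$. The choice $v = 0$ gives the right cylinder, so every sheared cylinder has the same volume.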
This principle is closely related to the coarea formula in GMT II. The coarea formula generalizes Cavalieri from the coordinate slicing $\mathbb{R}^n = \mathbb{R}^{n-1} \times \mathbb{R}$ to level sets of Lipschitz functions: for a Lipschitz function $u : \mathbb{R}^n \to \mathbb{R}$, the integral of $|\nabla u|$ over a measurable set $E$ can be recovered by integrating the $\mathcal{H}^{n-1}$-measures of the slices $E \cap \{u = t\}$ over $t$. Cavalieri's principle is the flat, coordinate version of this much more flexible slicing machinery, corresponding to $u(x) = x_n$ with $|\nabla u| = 1$.
[example: Volume of a Ball via Cavalieri]
We compute $\mathcal{L}^n(B(0, r))$ for $B(0, r) \subset \mathbb{R}^n$. Write $\mathbb{R}^n = \mathbb{R}^{n-1} \times \mathbb{R}$. The cross-section at height $t \in [-r, r]$ is
\begin{align*}
B(0, r)_t = \{x \in \mathbb{R}^{n-1} : |x|^2 + t^2 < r^2\} = B(0, \sqrt{r^2 - t^2}) \subset \mathbb{R}^{n-1}.
\end{align*}
This is a ball in $\mathbb{R}^{n-1}$ of radius $\sqrt{r^2 - t^2}$. If we write $\omega_{n-1} = \mathcal{L}^{n-1}(B(0,1) \subset \mathbb{R}^{n-1})$, then
\begin{align*}
\mathcal{L}^{n-1}(B(0, r)_t) = \omega_{n-1}(r^2 - t^2)^{(n-1)/2}.
\end{align*}
By Fubini,
\begin{align*}
\mathcal{L}^n(B(0, r)) = \int_{-r}^{r} \omega_{n-1}(r^2 - t^2)^{(n-1)/2}\, d\mathcal{L}^1(t).
\end{align*}
Substituting $t = r\sin\theta$ with $\theta \in [-\pi/2, \pi/2]$, so $dt = r\cos\theta\, d\theta$ and $r^2 - t^2 = r^2\cos^2\theta$:
\begin{align*}
\mathcal{L}^n(B(0, r)) &= \omega_{n-1} \int_{-\pi/2}^{\pi/2} r^{n-1}\cos^{n-1}\theta \cdot r\cos\theta\, d\theta \\
&= \omega_{n-1} r^n \int_{-\pi/2}^{\pi/2} \cos^n\theta\, d\theta.
\end{align*}
The integral $\int_{-\pi/2}^{\pi/2} \cos^n\theta\, d\theta$ can be evaluated recursively using the Wallis formula, yielding the well-known recurrence $\omega_n = \omega_{n-1} \cdot \frac{\sqrt{\pi}\,\Gamma((n-1)/2 + 1)}{\Gamma(n/2 + 1)}$. The upshot is that each slice contributes its $(n-1)$-dimensional volume, and Fubini assembles these slices into the $n$-dimensional volume — a concrete realization of the Cavalieri principle.
[/example]
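As a check on the recurrence in low dimensions, here is a sketch using only the elementary values $\omega_1 = \mathcal{L}^1((-1, 1)) = 2$, $\int_{-\pi/2}^{\pi/2} \cos^2\theta\, d\theta = \pi/2$ and $\int_{-\pi/2}^{\pi/2} \cos^3\theta\, d\theta = 4/3$:
\begin{align*}
\omega_2 &= \omega_1 \int_{-\pi/2}^{\pi/2} \cos^2\theta\, d\theta = 2 \cdot \frac{\pi}{2} = \pi, \\
\omega_3 &= \omega_2 \int_{-\pi/2}^{\pi/2} \cos^3\theta\, d\theta = \pi \cdot \frac{4}{3} = \frac{4\pi}{3}.
\end{align*}
These are the area of the unit disc and the volume of the unit ball in $\mathbb{R}^3$; with the $r^n$ scaling they give $\pi r^2$ and $\tfrac{4}{3}\pi r^3$.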
## Why Integrability Cannot Be Dropped in Fubini
The opening example of this chapter showed that iterated integrals can disagree when $f \notin L^1$: the function $(x^2 - y^2)/(x^2 + y^2)^2$ on $(0,1)^2$ yields $\pi/4$ in one order and $-\pi/4$ in the other, precisely because $f \notin L^1((0,1)^2, \mathcal{L}^2)$. The $\sigma$-finiteness failure is equally striking, and we already saw it in the diagonal computation from the construction section: with $\mu = \mathcal{L}^1$ and $\nu$ = counting measure on $[0,1]$, the indicator $\mathbb{1}_D$ of the diagonal gives iterated integrals $1$ and $0$.
These two failures are not pathological edge cases that arise from perverse choices — they identify the exact boundaries of Fubini's theorem. The failure of integrability (the opening example) shows why $f \in L^1$ is needed. The failure of $\sigma$-finiteness (the diagonal example) shows why both measure spaces must be $\sigma$-finite. Remove either hypothesis and counterexamples exist. Both failures are structural: the function or the measure violates a condition that Fubini genuinely requires, not just a condition imposed for aesthetic symmetry.
[remark: The Fubini–Tonelli Strategy in Practice]
In applications, the standard strategy is as follows. Given $f : X \times Y \to \mathbb{R}$, first apply Tonelli's theorem to $|f|$: compute one iterated integral of $|f|$ and check whether it is finite. If it is, then $f \in L^1(X \times Y, \mu \otimes \nu)$, and Fubini's theorem applies to $f$ itself, justifying any order of integration. If the iterated integral of $|f|$ is infinite, then Fubini's theorem does not apply and the iterated integrals of $f$ may disagree.
[/remark]
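As an illustration of the second branch, here is a sketch of the strategy applied to $f(x, y) = (x^2 - y^2)/(x^2 + y^2)^2$ on $(0,1)^2$. The quarter disc $Q = \{(r\cos\theta, r\sin\theta) : 0 < r < 1,\ 0 < \theta < \pi/2\}$ is contained in $(0,1)^2$, and in polar coordinates $|f| = |\cos 2\theta| / r^2$. Tonelli's theorem and the polar change of variables (with Jacobian $r$) give
\begin{align*}
\int_{(0,1)^2} |f|\, d\mathcal{L}^2 \geq \int_Q |f|\, d\mathcal{L}^2 = \int_0^1 \left(\int_0^{\pi/2} \frac{|\cos 2\theta|}{r^2}\, r\, d\theta\right) dr = \int_0^1 \frac{1}{r}\, dr = \infty,
\end{align*}
using $\int_0^{\pi/2} |\cos 2\theta|\, d\theta = 1$. So $f \notin L^1((0,1)^2, \mathcal{L}^2)$, and Fubini's theorem gives no guarantee that the two iterated integrals agree.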
## Connection to Hausdorff Measure
We now have two measures on $\mathbb{R}^n$: the $n$-dimensional Lebesgue measure $\mathcal{L}^n$, built from the product construction, and the $n$-dimensional Hausdorff measure $\mathcal{H}^n$, built from covering by small sets. Do they agree? If not, Fubini's theorem — which we have stated for $\mathcal{L}^n$ — would not automatically apply to integrals against $\mathcal{H}^n$, and the slicing arguments needed throughout GMT would require separate justification.
For integer $s = n$, the Hausdorff measure $\mathcal{H}^n$ coincides with the Lebesgue measure $\mathcal{L}^n$ on $\mathbb{R}^n$ (Chapter 9). This identification relies on the isodiametric inequality and uses the specific normalization constant $\alpha(n) = \mathcal{L}^n(B(0,1))$ (written $\omega_n$ in the ball example above) in the definition of $\mathcal{H}^n$.
In particular, for $f \in L^1(\mathbb{R}^n, \mathcal{L}^n)$, the identity $\mathcal{H}^n = \mathcal{L}^n$ means that Fubini's theorem applies equally to integration against $\mathcal{H}^n$. This will be used repeatedly when we compute Hausdorff measures via slicing: the $\mathcal{H}^n$-measure of a set can be computed as the integral of the $(n-1)$-dimensional Hausdorff measure of its cross-sections. More generally, the coarea formula (GMT II) expresses this principle for arbitrary Lipschitz functions:
\begin{align*}
\int_{\mathbb{R}^n} g(x) |Jf(x)|\, d\mathcal{L}^n(x) = \int_{\mathbb{R}^k} \left(\int_{f^{-1}(t)} g(x)\, d\mathcal{H}^{n-k}(x)\right) d\mathcal{L}^k(t)
\end{align*}
for a Lipschitz map $f : \mathbb{R}^n \to \mathbb{R}^k$ with $k \leq n$ and an integrable $g$, where $|Jf| = \sqrt{\det(Df\, Df^{T})}$ is the $k$-dimensional Jacobian. Tonelli's theorem in the product setting is the seed of this more general principle: integration over a higher-dimensional space can be reduced to integration over fibers.
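Two special cases connect this formula to what we have already done; both are sketches that take the formula above on trust. For the coordinate projection $f(x) = x_n$ (so $k = 1$ and $|Jf| = 1$), the fibers are the hyperplanes $\{x_n = t\}$ and the formula reduces to Fubini's theorem, using $\mathcal{H}^{n-1} = \mathcal{L}^{n-1}$ on each hyperplane. For $f(x) = |x|$ (again $k = 1$, with $|Jf| = |\nabla f| = 1$ away from the origin), the fibers are the spheres $\partial B(0, t)$ for $t > 0$ and are empty for $t < 0$, so
\begin{align*}
\int_{\mathbb{R}^n} g(x)\, d\mathcal{L}^n(x) = \int_0^\infty \left(\int_{\partial B(0, t)} g(x)\, d\mathcal{H}^{n-1}(x)\right) d\mathcal{L}^1(t),
\end{align*}
which is the polar coordinate formula stated after the Gaussian example.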
---
To measure the size of sets in geometric spaces, we need tools beyond integration: covering theorems provide the combinatorial structure that connects abstract measures to concrete geometric configurations. These theorems show how to cover irregular sets efficiently and control their measure via simple geometric shapes.
# 4. Covering Theorems
The problem of extracting useful information from a collection of balls sounds mundane — given infinitely many overlapping balls, can you find a manageable subcollection that still covers everything important? Yet this question is the technical heart of geometric measure theory. The moment you want to differentiate one measure against another, compute densities, or prove that Hausdorff measure has the right local behavior, you need to control how balls overlap. Covering theorems are the engine that makes this control possible, and they appear, often invisibly, in nearly every proof in the chapters that follow.
The fundamental difficulty is that a covering can be hopelessly redundant. If you cover the unit interval with all closed balls of radius $1/10$ centered at points of the interval, you have uncountably many balls, they overlap in a complicated way, and there is no hope of disjointifying them without losing coverage. A covering theorem extracts from this chaos a subcollection that is disjoint, or nearly so, while still accounting for everything the original collection covered. Two strategies exist for achieving this, and they trade different costs: Vitali's theorem inflates the chosen balls but keeps them exactly disjoint; Besicovitch's theorem avoids any inflation but allows bounded overlap.
[example: Overlapping Balls in Dimension One]
Consider the collection $\mathcal{F}$ of all closed balls $\overline{B}(x, 1/10)$ for $x \in [0, 1]$. These balls cover $[-1/10, 11/10]$ and overlap enormously: every point in $(1/10, 9/10)$ is covered by infinitely many balls in $\mathcal{F}$. The Vitali theorem selects a disjoint subcollection. Since all radii are equal, any maximal disjoint subfamily will do; for instance, the balls $\overline{B}(k/4, 1/10)$ for $k = 0, 1, 2, 3, 4$, that is, the intervals $[-0.1, 0.1]$, $[0.15, 0.35]$, $[0.4, 0.6]$, $[0.65, 0.85]$, $[0.9, 1.1]$. Consecutive intervals are separated by gaps of length $0.05$, so the family is disjoint, and it is maximal because every $x \in [0,1]$ lies within $1/8 < 1/5$ of some center $k/4$, so $\overline{B}(x, 1/10)$ meets a chosen ball. (Centers spaced exactly $1/5$ apart would not work: closed balls such as $[0, 1/5]$ and $[1/5, 2/5]$ share an endpoint.) The inflated balls $5 \cdot \overline{B}(k/4, 1/10)$, centered at the same points but with radius $5/10 = 1/2$, together cover $[-1/2, 3/2]$, which contains the whole union $[-1/10, 11/10]$. The union of uncountably many balls is thus covered by the five-fold dilates of just five disjoint balls.
[/example]
This example already shows the structure: greedily choosing the biggest available balls at each step, never selecting two that touch, and then compensating for what was missed by expanding each chosen ball by a factor of five.
## Vitali's Covering Theorem
How do you find a disjoint subcollection of balls that accounts for all the mass in the original collection, when you have no control over how densely the balls are centered or how their radii vary?
The answer is a greedy algorithm. At each step, pick a ball that does not meet your current selection and whose radius is more than half the supremum of the radii still available (a largest ball need not exist, so "nearly largest" is the best one can ask for). The radius of any ball you missed will then be at most twice the radius of the ball that blocked it, because if the missed ball had been much larger, you would have picked it first. Inflating the chosen balls by a factor of five is then enough to reach any missed ball from the chosen ball that blocked it.
[quotetheorem:2963]
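The hypothesis that the diameters are uniformly bounded is not optional. Without it, the greedy algorithm might always find arbitrarily large balls available, and the selection by radius size breaks down: in the family $\{\overline{B}(0, k) : k = 1, 2, 3, \ldots\}$ any two balls intersect, so a disjoint subfamily consists of a single ball, whose five-fold dilate cannot cover the union $\mathbb{R}^n$. The factor of five, on the other hand, is a convenient choice rather than a sharp constant: it comes from the factor $\tfrac{1}{2}$ in the selection rule, and a stricter rule yields any factor greater than three.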
To see why five works: if a ball $B \in \mathcal{F}$ was not selected, it must intersect some selected ball; let $B_j$ be the first selected ball that $B$ meets. At step $j$ the ball $B$ was still available, but $B_j$ was chosen instead, which means $r(B_j) > \frac{1}{2}\sup\{r(C) : C \text{ still available}\} \geq \frac{1}{2} r(B)$, so $r(B) \leq 2 r(B_j)$. Fix a point $y \in B \cap B_j$. For any $x \in B$, the triangle inequality gives
\begin{align*}
|x - c_j| \leq |x - y| + |y - c_j| \leq 2r(B) + r(B_j) \leq 4r(B_j) + r(B_j) = 5r(B_j),
\end{align*}
where $c_j$ denotes the center of $B_j$, and we used that $x, y \in B$ are at distance at most $2r(B)$ from each other. Hence $x \in 5B_j$.
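The same computation shows how the constant depends on the selection rule. As a sketch, suppose (hypothetically; this variant is not the one in the theorem above) that each selected ball satisfies $r(B_j) > \frac{1}{1+\varepsilon}\sup\{r(C) : C \text{ still available}\}$ for a fixed $\varepsilon > 0$. Then a blocked ball $B$ has $r(B) \leq (1+\varepsilon)\, r(B_j)$, and for $x \in B$ and $y \in B \cap B_j$,
\begin{align*}
|x - c_j| \leq |x - y| + |y - c_j| \leq 2r(B) + r(B_j) \leq \bigl(2(1+\varepsilon) + 1\bigr)\, r(B_j) = (3 + 2\varepsilon)\, r(B_j).
\end{align*}
The choice $\varepsilon = 1$ recovers the factor five.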
The Vitali theorem is powerful and broadly applicable: it works in any metric space satisfying the bounded-diameter assumption, and the proof requires no special structure of $\mathbb{R}^n$. But this generality is also its limitation. When we apply it to differentiate a measure $\mu$, we need to compare $\mu(B)$ with $\mu(5B)$, and for non-doubling measures the ratio $\mu(5B)/\mu(B)$ can be unbounded. This is precisely where Besicovitch's theorem becomes indispensable.
[example: Greedy Selection in Dimension Two]
Consider balls of various radii centered along the $x$-axis in $\mathbb{R}^2$: let $B_1 = \overline{B}((0,0), 2)$, $B_2 = \overline{B}((5,0), 1)$, and $B_3 = \overline{B}((3,0), 3/2)$. The greedy algorithm first selects $B_1$ (radius $2$, the largest). At step two, $B_3$ intersects $B_1$ (the distance between centers is $3 < 2 + 3/2$), so $B_3$ is blocked. The algorithm then selects $B_2$ (radius $1$, the largest remaining disjoint ball). The result is the disjoint family $\{B_1, B_2\}$.
Now $B_3$ was not selected, and $B_3 \cap B_1 \neq \varnothing$. Check that $B_3 \subset 5 B_1$: the center of $B_3$ is $(3, 0)$, the center of $B_1$ is $(0,0)$, so the farthest point of $B_3$ from the center of $B_1$ is at distance $3 + 3/2 = 9/2 < 10 = 5 \cdot 2$. Hence $B_3 \subset 5B_1$, confirming the theorem.
[/example]
## Besicovitch's Covering Theorem
Vitali's theorem extracts a disjoint subcollection but at the cost of inflating every ball by a factor of five. For Lebesgue measure this inflation is harmless — doubling conditions control how much measure can be added — but for singular measures such as the Cantor measure or Hausdorff measure on a lower-dimensional set, the ratio $\mu(5B)/\mu(B)$ can be arbitrarily large, making any inflation catastrophic. Is there a covering theorem that avoids dilation entirely?
The answer is Besicovitch's theorem, which relies on a fundamentally different idea. Instead of selecting a disjoint subcollection and inflating, it selects the original balls themselves — centered exactly at the points of interest — and allows bounded overlap. The bounded overlap comes from a purely geometric fact about Euclidean space: only finitely many pairwise-intersecting balls centered in a fixed ball can coexist when each center is outside all the others. This finite overlap constant $N(n)$ depends on the dimension and has no analogue in general metric spaces.
[definition: Besicovitch Number]
Let $n \geq 1$. The Besicovitch number $N(n)$ is the largest integer such that the following configuration is possible in $\mathbb{R}^n$: there exist $N(n)$ points $x_1, \ldots, x_{N(n)} \in \overline{B}(0, 1)$ such that for each $j$, the point $0$ lies in the ball $\overline{B}(x_j, 1)$, and no point $x_i$ lies in $\overline{B}(x_j, 1)$ for $i \neq j$.
[/definition]
The significance of $N(n)$ is that it bounds the maximum overlap any Besicovitch cover can achieve. In dimension one, $N(1) = 2$. In higher dimensions the optimal value depends on the exact formulation of the theorem (explicit bounds such as $N(2) \leq 19$ are often used in proofs), and $N(n)$ grows at most exponentially in $n$.
[quotetheorem:2964]
The contrast with the Vitali theorem is worth isolating. Vitali gives you a single family of disjoint balls, but the price is dilation by a factor of five: the balls in the conclusion are not the original balls but their inflated versions. Besicovitch gives you $N(n)$ families $\mathcal{G}_1, \ldots, \mathcal{G}_{N(n)}$ of balls from the original collection, each family disjoint, with no dilation whatsoever. The price is that balls from different families may overlap, so the union of the families is not disjoint: a given point may be covered by up to $N(n)$ balls, at most one from each family. But for measure-theoretic purposes, this bounded-overlap property is exactly what is needed: if $f$ is any non-negative measurable function, then
\begin{align*}
\sum_{j=1}^{N} \sum_{B \in \mathcal{G}_j} \int_B f \, d\mu \leq N \int_{\mathbb{R}^n} f \, d\mu,
\end{align*}
because each point is integrated at most $N$ times. This inequality is the mechanism by which Besicovitch controls measure, and it works regardless of whether $\mu$ is doubling.
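The inequality is one line once the overlap is written pointwise. As a sketch, with $N = N(n)$ and each family $\mathcal{G}_j$ assumed countable and disjoint, every $x$ lies in at most one ball of each family, so $\sum_{B \in \mathcal{G}_j} \mathbb{1}_B(x) \leq 1$ for each $j$. Summing over $j$, multiplying by $f \geq 0$ and integrating (monotone convergence lets the sum pass through the integral) gives
\begin{align*}
\sum_{j=1}^{N} \sum_{B \in \mathcal{G}_j} \int_B f\, d\mu = \int_{\mathbb{R}^n} f(x) \left(\sum_{j=1}^{N} \sum_{B \in \mathcal{G}_j} \mathbb{1}_B(x)\right) d\mu(x) \leq N \int_{\mathbb{R}^n} f\, d\mu.
\end{align*}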
The proof of the Besicovitch theorem proceeds through a careful ordering argument. One orders the balls $\{\overline{B}(x, r_x) : x \in A\}$ by decreasing radius and then assigns each ball to the first family $\mathcal{G}_j$ that it does not intersect. The geometric content of the theorem — that $N(n)$ families always suffice — is equivalent to the fact that the geometry of $\mathbb{R}^n$ limits how many non-overlapping balls of comparable radius can simultaneously intersect a fixed ball. This is an intrinsically Euclidean phenomenon: in infinite-dimensional Hilbert space or in exotic metric spaces, no finite $N$ works.
[explanation: Why Besicovitch Fails Outside Euclidean Space]
The Besicovitch theorem depends on the finite-dimensionality of $\mathbb{R}^n$ in a deep way. The key geometric fact is: if $B(x_1, r_1), \ldots, B(x_k, r_k)$ are balls that all contain a common point $x_0$, but no center $x_i$ lies in any other ball $B(x_j, r_j)$ for $i \neq j$, then $k$ is bounded by a constant depending only on $n$.
The reason is that the condition "no $x_i \in B(x_j, r_j)$" means $|x_i - x_j| \geq r_j$ for all $i \neq j$. If all radii are comparable, this is a packing condition: the points $x_i$ are separated from each other relative to their radii, and the finite dimension of $\mathbb{R}^n$ bounds how many such separated points can fit in a ball of bounded size. In infinite-dimensional spaces, you can pack infinitely many mutually distant unit vectors in the unit sphere, so the analogous bound fails. In a general metric space, even in dimension two with a non-Euclidean metric, the geometry may not support the same packing bound.
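In the simplest situation the packing bound can be made explicit. As a sketch, assume the balls $B(x_i, r_i)$ are open, share a common point which we place at the origin, and that no center lies in another ball; these normalizations are ours. Fix $i \neq j$ and write $a = |x_i|$, $b = |x_j|$, $c = |x_i - x_j|$, with $a \leq b$ after relabeling (both are positive, since the origin lies in every ball and so cannot be a center). Then $a < r_i$, $b < r_j$ and $c \geq \max\{r_i, r_j\} > b$, so the angle $\theta$ between $x_i$ and $x_j$ satisfies
\begin{align*}
\cos\theta = \frac{a^2 + b^2 - c^2}{2ab} < \frac{a^2 + b^2 - b^2}{2ab} = \frac{a}{2b} \leq \frac{1}{2}.
\end{align*}
Hence any two centers subtend an angle of more than $60^\circ$ at the origin, and the number of directions in $\mathbb{R}^n$ with pairwise angles exceeding $60^\circ$ is finite and depends only on $n$. In an infinite-dimensional Hilbert space an orthonormal sequence has pairwise angles of $90^\circ$, so no such bound exists.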
This is why the Besicovitch theorem is genuinely an $\mathbb{R}^n$ theorem — it reflects the specific geometry of Euclidean space and has no straightforward generalization to metric measure spaces.
[/explanation]
## Doubling Measures and the Role of Dilation
When is the simpler Vitali theorem sufficient, and when is Besicovitch genuinely necessary? The answer lies in whether the measure satisfies a doubling condition.
[definition: Doubling Measure]
A Borel measure $\mu$ on $\mathbb{R}^n$ is called doubling if there exists a constant $C > 0$ such that for every $x \in \mathbb{R}^n$ and every $r > 0$,
\begin{align*}
\mu(B(x, 2r)) \leq C \cdot \mu(B(x, r)).
\end{align*}
The constant $C$ is called the doubling constant of $\mu$.
[/definition]
Lebesgue measure $\mathcal{L}^n$ is doubling with constant $C = 2^n$, since $\mathcal{L}^n(B(x, 2r)) = 2^n \mathcal{L}^n(B(x, r))$. More generally, any Radon measure satisfying two-sided growth bounds $c\, r^s \leq \mu(B(x, r)) \leq C r^s$ for all $x \in \mathbb{R}^n$ and $r > 0$, with constants $0 < c \leq C$ and some $s > 0$, is doubling; an upper bound alone is not enough. For doubling measures, applying the Vitali theorem and inflating by five incurs at most a fixed multiplicative cost: $\mu(5B) \leq C_0 \mu(B)$ for some constant $C_0$ depending only on the doubling constant.
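The constant $C_0$ can be read off by iterating the doubling inequality. As a sketch, writing $tB$ for the ball concentric with $B$ and $t$ times its radius, and using $5B \subset 8B$:
\begin{align*}
\mu(5B) \leq \mu(8B) \leq C\, \mu(4B) \leq C^2 \mu(2B) \leq C^3 \mu(B).
\end{align*}
So $C_0 = C^3$ works. For Lebesgue measure the exact value is $\mathcal{L}^n(5B) = 5^n \mathcal{L}^n(B)$, which is smaller than $C^3 = 8^n$.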
The Cantor measure $\mu$ on $\mathbb{R}$ provides the canonical example where doubling fails. Recall that $\mu$ is supported on the Cantor set $C \subset [0,1]$ and satisfies $\mu(I_{k,j}) = 2^{-k}$ for each $k$-th level interval $I_{k,j}$ of length $3^{-k}$. For balls centered on $C$ the dilation is harmless. If $x \in C$, $k \geq 2$ and $B = \overline{B}(x, 3^{-k})$, then $B$ contains the level-$k$ interval through $x$, so $\mu(B) \geq 2^{-k}$. The ball $5B = \overline{B}(x, 5 \cdot 3^{-k})$ has diameter $10 \cdot 3^{-k}$, and since distinct level-$(k-2)$ intervals are separated by gaps of length at least $3^{-(k-2)} = 9 \cdot 3^{-k}$, it meets at most two of them, giving $\mu(5B) \leq 2 \cdot 2^{-(k-2)} = 8 \cdot 2^{-k}$. The ratio $\mu(5B)/\mu(B)$ is at most $8$, which seems fine. The trouble comes from balls centered in the gaps of $C$: such a ball can carry very little mass while its dilate captures a large part of $C$, and the ratio can be made arbitrarily large. The essential point is that $\mu(5B)$ cannot be controlled by $\mu(B)$ uniformly across all balls.
[example: Failure of Vitali for the Cantor Measure]
Let $\mu$ be the Cantor measure on $\mathbb{R}$. We show that a naive application of Vitali's theorem, with the five-fold dilation, does not give useful estimates for $\mu$.
For balls centered at points of $C$ the five-fold dilation costs only a bounded factor, so the failure has to come from balls centered off $C$. Take $x = 1/2$, the midpoint of the first removed interval $(1/3, 2/3)$, and for $k \geq 1$ let $B_k = \overline{B}(1/2,\, 1/6 + 3^{-k}) = [1/3 - 3^{-k},\, 2/3 + 3^{-k}]$.
The open interval $(1/3, 2/3)$ carries no mass, while $[1/3 - 3^{-k}, 1/3]$ and $[2/3, 2/3 + 3^{-k}]$ are level-$k$ intervals of the Cantor construction. Hence $\mu(B_k) = 2 \cdot 2^{-k} = 2^{1-k}$. On the other hand, $5B_k$ has radius $5/6 + 5 \cdot 3^{-k} > 1/2$, so $5B_k \supset [0, 1]$ and $\mu(5B_k) = 1$.
The ratio $\mu(5B_k)/\mu(B_k) = 2^{k-1}$ grows without bound as $k \to \infty$. Already the doubled ball $2B_k \supset [2/9, 1/3] \cup [2/3, 7/9]$ has $\mu(2B_k) \geq 1/2$, so no constant $C$ can satisfy the doubling inequality: the Cantor measure is not doubling on $\mathbb{R}$.
The Besicovitch theorem sidesteps this failure completely: it never dilates the balls, only organizing the original centered balls into a bounded-overlap cover.
[/example]
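The two regimes can be checked numerically. The following is a minimal sketch, not part of the formal development: it evaluates the Cantor measure of an interval through the Cantor function $c$, so that $\mu([a,b]) = c(b) - c(a)$, using exact rational arithmetic. The helper names `cantor_cdf`, `ball_measure` and `dilation_ratio` are illustrative choices, and the test centers $x = 2/3$ (on $C$) and $x = 1/2$ (in a gap) are picked for convenience.

```python
from fractions import Fraction

def cantor_cdf(x, max_digits=200):
    """Cantor function c(x) = mu((-inf, x]), read off from the ternary digits of x."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    total, weight = Fraction(0), Fraction(1, 2)
    for _ in range(max_digits):
        x *= 3
        digit = int(x)           # next ternary digit: 0, 1 or 2
        if digit == 1:           # x lies in the closure of a removed gap; c is constant there
            return total + weight
        if digit == 2:
            total += weight
            x -= 2
        weight /= 2
        if x == 0:               # the expansion has terminated
            break
    return total

def ball_measure(center, radius):
    """mu of the closed ball [center - radius, center + radius]; mu has no atoms."""
    return cantor_cdf(center + radius) - cantor_cdf(center - radius)

def dilation_ratio(center, radius):
    """mu(5B) / mu(B) for B = closed ball(center, radius); requires mu(B) > 0."""
    return ball_measure(center, 5 * radius) / ball_measure(center, radius)

# Centers on C: x = 2/3 is a left endpoint of a level-k interval for every k >= 1.
on_support = [dilation_ratio(Fraction(2, 3), Fraction(1, 3**k)) for k in range(1, 13)]

# Center off C: x = 1/2 sits in the middle of the gap (1/3, 2/3).
off_support = [dilation_ratio(Fraction(1, 2), Fraction(1, 6) + Fraction(1, 3**m))
               for m in (5, 10, 20)]

print([float(q) for q in on_support])   # stays bounded by 4
print([float(q) for q in off_support])  # grows like 2**(m - 1)
```

The first list stays below the bound $4$ derived above, while the second grows without bound as the radius approaches $1/6$ from above, which is the sense in which doubling fails for balls centered off the support.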
## Differentiation of Measures via Covering Theorems
The reason covering theorems matter in practice is that they are the mechanism for proving differentiation results: statements of the form "the ratio $\nu(B(x,r))/\mu(B(x,r))$ converges to a well-defined limit as $r \to 0$, for $\mu$-almost every $x$." This is the GMT version of the fundamental theorem of calculus.
The difficulty is that without a covering theorem, you have no way to show the set where the ratio behaves badly has small measure. The standard strategy is: suppose the upper derivative exceeds the lower derivative on a set $E$ of positive $\mu$-measure. For each $x \in E$, choose small balls witnessing the discrepancy. Apply a covering theorem to these balls to obtain a controlled family, then use the family to estimate $\mu(E)$ and obtain a contradiction.
For the Lebesgue differentiation theorem — where $\mu = \mathcal{L}^n$ — Vitali suffices, because Lebesgue measure is doubling and the five-fold dilation costs only a dimensional constant. But for the general Radon measure differentiation theorem, where $\mu$ can be any Radon measure, the Besicovitch theorem is required: the balls must remain unenlarged so that the ratio $\nu(B)/\mu(B)$ can be controlled directly without any auxiliary estimate on $\mu(5B)/\mu(B)$.
[quotetheorem:2965]
The hypothesis that both $\mu$ and $\nu$ are Radon — locally finite, Borel regular — is essential at two points. Local finiteness ensures that $\mu(\overline{B}(x, r)) < \infty$ for small $r$, making the ratio meaningful. Borel regularity ensures the upper and lower derivatives are measurable functions, so the sets where they differ are measurable and can be assigned $\mu$-measure. Without Radon regularity, the covering arguments break down because the measure of small balls may not be controllable.
The connection to the Lebesgue decomposition is the deepest part of this theorem. The derivative $D_\mu \nu(x)$ is exactly the Radon-Nikodym derivative of the absolutely continuous part $\nu_{ac}$ with respect to $\mu$. On the singular support of $\nu_s$, the derivative $D_\mu \nu(x) = +\infty$ for $\nu_s$-almost every $x$ — the singular measure concentrates mass infinitely relative to $\mu$. This dichotomy between finite derivative and infinite derivative is a clean separation at the level of individual points, not just at the level of sets.
## The Lebesgue Density Theorem
The most concrete and classical application of the Vitali covering theorem is the Lebesgue density theorem: almost every point of a measurable set $E \subset \mathbb{R}^n$ is a density-one point, meaning the fraction of $B(x, r)$ occupied by $E$ approaches $1$ as $r \to 0$.
What could go wrong without a covering theorem? One might try to argue directly: take the set $A$ of points in $E$ where the density is not $1$, and try to show it has measure zero. But to translate the pointwise condition "density is not $1$" into a bound on $\mathcal{L}^n(A)$, you need to move from the individual balls at each $x \in A$ to a global measure estimate. This is precisely what a covering theorem provides: it selects a countable disjoint subcollection from the family of witnessing balls, whose union accounts for all of $A$ up to a bounded factor.
[quotetheorem:894]
The theorem says almost every point "knows" whether it is in $E$ or not, in the sense that the local density of $E$ is either $1$ or $0$ at almost every point. Points where the density is strictly between $0$ and $1$ — neither fully inside nor fully outside — form a set of measure zero.
The theorem carries no finite-measure hypothesis on $E$, even though the Vitali argument is run on sets of finite measure. The reason is that density is a local notion: to test the points of $E$ inside a ball $B(0, R)$ it suffices to apply the argument to the bounded set $E \cap B(0, R+1)$, and then let $R \to \infty$ through the integers. For example, if $E = \mathbb{R}^n$, the density is $1$ everywhere, since every ball is entirely contained in $E$. For a half-space, which also has infinite measure, the density equals $1/2$ only on the bounding hyperplane, a null set. In general, for any measurable $E$, the density is almost everywhere exactly $1$ on $E$ and exactly $0$ on $E^c$, regardless of the total measure.
The proof runs as follows. Fix $\varepsilon > 0$ and $\lambda \in (0, 1)$. Let
\begin{align*}
A_\lambda = \left\{ x \in E : \limsup_{r \to 0^+} \frac{\mathcal{L}^n(E^c \cap B(x,r))}{\mathcal{L}^n(B(x,r))} > \lambda \right\}.
\end{align*}
We claim $\mathcal{L}^n(A_\lambda) = 0$ for each $\lambda \in (0,1)$. For each $x \in A_\lambda$, there exist arbitrarily small radii $r$ such that $\mathcal{L}^n(E^c \cap B(x,r)) > \lambda \mathcal{L}^n(B(x,r))$. Fix an open set $U \supset A_\lambda$ with $\mathcal{L}^n(U) < \mathcal{L}^n(A_\lambda) + \varepsilon$. The collection of all such balls $B(x,r)$ with $r \leq 1$ and $B(x,r) \subset U$ forms a Vitali cover of $A_\lambda$. Apply the Vitali theorem to extract a countable disjoint subcollection $\{B_j\}$ with $A_\lambda \subset \bigcup_j 5B_j$. Then:
\begin{align*}
\lambda \sum_j \mathcal{L}^n(B_j) < \sum_j \mathcal{L}^n(E^c \cap B_j) \leq \mathcal{L}^n(E^c \cap U).
\end{align*}
Since the dilated balls $5B_j$ cover $A_\lambda$ and $\mathcal{L}^n(5B_j) = 5^n \mathcal{L}^n(B_j)$:
\begin{align*}
\mathcal{L}^n(A_\lambda) \leq 5^n \sum_j \mathcal{L}^n(B_j) \leq \frac{5^n}{\lambda} \mathcal{L}^n(E^c \cap U).
\end{align*}
Since $A_\lambda \subset E$, we have $E^c \cap U = U \setminus E \subset U \setminus A_\lambda$, so $\mathcal{L}^n(E^c \cap U) \leq \mathcal{L}^n(U) - \mathcal{L}^n(A_\lambda) < \varepsilon$; here we use that $A_\lambda$ is measurable with finite measure, which we may assume after intersecting with a large ball. Hence $\mathcal{L}^n(A_\lambda) \leq 5^n \varepsilon / \lambda$, and letting $\varepsilon \to 0$ gives $\mathcal{L}^n(A_\lambda) = 0$. Taking the union over $\lambda = 1/m$ shows that $E$ has density $1$ at almost every point of $E$. A symmetric argument handles the density-zero part on $E^c$.
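The selection step behind the Vitali theorem can be illustrated for a finite family of intervals in $\mathbb{R}$. The sketch below is a simplified model under stated assumptions: with finitely many balls one can always pick a largest remaining ball, and then a dilation factor of $3$ already suffices; the factor $5$ in the theorem accounts for infinite families, where one can only pick a ball whose radius exceeds half the supremum. The function names and the random test family are illustrative.

```python
import random

def greedy_disjoint(balls):
    """Greedy Vitali selection for closed intervals given as (center, radius).

    Process the balls from largest to smallest and keep each one that is
    disjoint from everything kept so far.
    """
    chosen = []
    for c, r in sorted(balls, key=lambda b: b[1], reverse=True):
        if all(abs(c - c2) > r + r2 for c2, r2 in chosen):  # closed intervals are disjoint
            chosen.append((c, r))
    return chosen

def contained_in_dilate(ball, big, factor):
    """True if `ball` lies inside the concentric dilate of `big` by `factor`."""
    (c, r), (c2, r2) = ball, big
    return abs(c - c2) + r <= factor * r2

random.seed(0)
# Integer centers and radii keep every comparison exact.
family = [(random.randint(0, 1000), random.randint(1, 40)) for _ in range(300)]
selected = greedy_disjoint(family)

# Every ball of the family lies in the 3-fold (hence 5-fold) dilate of a selected ball.
covered = all(any(contained_in_dilate(b, s, 3) for s in selected) for b in family)
print(len(selected), covered)
```

The covering check succeeds for the same reason as in the proof: a rejected ball meets a previously selected ball of at least its own radius, and the triangle inequality then places it inside the dilate.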
<!-- illustration-needed: the Vitali argument for the Lebesgue density theorem — show a set E in R^2, a point x on its boundary where the density is 1/2 (center of a square grid region), and concentric balls B(x,r) showing the fraction of E inside each ball; contrast with a density-one point interior to E where the balls are almost entirely filled by E -->
[example: Density at a Boundary Point]
Let $E = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 > 0\}$, the open right half-plane. For any $x = (0, a)$ on the vertical axis, the ball $B(x, r)$ is divided exactly in half by the line $x_1 = 0$, so
\begin{align*}
\frac{\mathcal{L}^2(E \cap B(x, r))}{\mathcal{L}^2(B(x, r))} = \frac{1}{2} \quad \text{for all } r > 0.
\end{align*}
The density is $1/2$ at every boundary point, never approaching $1$ or $0$. The Lebesgue density theorem says the set of such points must have measure zero — and indeed, the boundary $\{x_1 = 0\}$ has $\mathcal{L}^2$-measure zero. Every interior point $(a, b)$ with $a > 0$ has density $1$: for $r < a$, the ball $B((a,b), r)$ is entirely contained in $E$, so the density is $1$ exactly, not just in the limit.
For a more interesting case, let $E = \bigcup_{k=1}^\infty [2^{-2k}, 2^{-(2k-1)}] \subset \mathbb{R}$. This set alternates intervals of $E$ and intervals of $E^c$ with geometrically shrinking lengths; the $k$-th interval has length $2^{-(2k-1)} - 2^{-2k} = 2^{-2k}$. At $x = 0$: for $r = 2^{-2k}$, the ball $[-r, r]$ contains exactly the intervals of $E$ with index $j > k$, of total length $\sum_{j > k} 2^{-2j} = \tfrac{1}{3} \cdot 2^{-2k}$, while $\mathcal{L}^1([-r, r]) = 2 \cdot 2^{-2k}$. The density ratio is $1/6$. But for $r = 2^{-(2k-1)}$, the ball $[-r, r]$ also contains the full interval $[2^{-2k}, 2^{-(2k-1)}]$, so $\mathcal{L}^1(E \cap [-r, r]) = \tfrac{4}{3} \cdot 2^{-2k}$ while $\mathcal{L}^1([-r, r]) = 4 \cdot 2^{-2k}$, giving density ratio $1/3$. The density at $0$ does not exist: the $\limsup$ is $1/3$ and the $\liminf$ is $1/6$. The Lebesgue density theorem guarantees that such failure points form a set of measure zero, and indeed the exceptional points here ($0$ together with the countably many interval endpoints, where the density is $1/2$) form a countable set.
[/example]
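The oscillation of the density ratio at $0$ for the set $E = \bigcup_k [2^{-2k}, 2^{-(2k-1)}]$ can be checked directly. The following sketch computes $\mathcal{L}^1(E \cap [-r, r]) / (2r)$ with rational arithmetic; the function names are illustrative, and the infinite union is truncated after a fixed number of intervals, which is an approximation with negligible error at the radii used.

```python
from fractions import Fraction

def measure_E(r, terms=60):
    """Length of E inside [-r, r], where E is the union of [4^-k, 2 * 4^-k], k >= 1.

    The union is truncated after `terms` intervals (an approximation).
    """
    r = Fraction(r)
    total = Fraction(0)
    for k in range(1, terms + 1):
        a, b = Fraction(1, 4**k), Fraction(2, 4**k)
        total += max(Fraction(0), min(b, r) - a)
    return total

def density_ratio(r):
    """Fraction of the ball [-r, r] occupied by E."""
    return measure_E(r) / (2 * Fraction(r))

for k in range(1, 6):
    lo = density_ratio(Fraction(1, 4**k))   # r = 2^(-2k)
    hi = density_ratio(Fraction(2, 4**k))   # r = 2^(-(2k-1))
    print(k, float(lo), float(hi))
```

Along the radii $r = 2^{-2k}$ the ratio is close to $1/6$, and along $r = 2^{-(2k-1)}$ it is close to $1/3$, so no limit exists at $0$.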
## Vitali Covers and the Fine Structure of Differentiation
The Lebesgue density theorem is a special case of a more general principle: for a locally integrable function $f \in L^1_\text{loc}(\mathbb{R}^n)$, the average of $f$ over shrinking balls recovers $f(x)$ at almost every point. This stronger statement requires a more refined notion of covering.
[definition: Vitali Cover]
Let $A \subset \mathbb{R}^n$ and let $\mathcal{V}$ be a collection of closed balls. We say $\mathcal{V}$ is a Vitali cover of $A$ if for every $x \in A$ and every $\varepsilon > 0$, there exists $B \in \mathcal{V}$ with $x \in B$ and $\operatorname{diam}(B) < \varepsilon$. In other words, every point of $A$ lies in balls of $\mathcal{V}$ of arbitrarily small diameter; the balls need not be centered at $x$.
[/definition]
The key property of a Vitali cover is that it is fine: every point is accessible by arbitrarily small balls. This is exactly what is needed to pass a covering theorem argument to the limit $r \to 0$. The Vitali covering theorem applied to a Vitali cover yields, for any $\varepsilon > 0$, a disjoint subcollection of balls from $\mathcal{V}$ that covers $A$ up to a set of measure smaller than $\varepsilon$. This is sometimes stated as a separate result: from a Vitali cover, one can extract a countable disjoint subcollection that covers almost all of $A$.
[quotetheorem:2967]
The requirement $\mathcal{L}^n(A) < \infty$ is used to ensure that the greedy algorithm selects balls of radius bounded away from zero only finitely many times before the remaining uncovered set has small measure. For sets of infinite measure, the statement holds locally.
This form of the Vitali theorem is the one that appears in proofs of the Lebesgue differentiation theorem. The function version says: for $f \in L^1_\text{loc}(\mathbb{R}^n)$ and almost every $x$,
\begin{align*}
\lim_{r \to 0^+} \frac{1}{\mathcal{L}^n(B(x, r))} \int_{B(x, r)} f(y) \, d\mathcal{L}^n(y) = f(x).
\end{align*}
This is not just density of a set but recovery of a function value from local averages. The proof reduces to applying the Vitali covering lemma to the set where the average deviates from $f(x)$ by more than $\lambda$, for each rational $\lambda$.
[remark: Points of Lebesgue Regularity]
A point $x$ at which the stronger condition $\lim_{r \to 0^+} \frac{1}{\mathcal{L}^n(B(x, r))} \int_{B(x, r)} |f(y) - f(x)| \, d\mathcal{L}^n(y) = 0$ holds is called a Lebesgue point of $f$. The Lebesgue differentiation theorem guarantees that $\mathcal{L}^n$-almost every point is a Lebesgue point. At a Lebesgue point, the function $f$ is "approximately continuous" in a measure-theoretic sense: the average of $|f - f(x)|$ over $B(x, r)$ goes to zero, which is strictly stronger than the average of $f$ itself converging to $f(x)$. Every continuity point is a Lebesgue point, but Lebesgue points are far more general: the indicator function of $\mathbb{Q}^n$ is discontinuous everywhere, yet every point outside $\mathbb{Q}^n$ is a Lebesgue point, since the function vanishes almost everywhere.
[/remark]
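The gap between "the averages of $f$ converge to $f(x)$" and "the averages of $|f - f(x)|$ converge to $0$" can be seen on a step function. In the sketch below, which is an illustration with hypothetical helper names, $f$ is the indicator of $(0, \infty)$ with the value $1/2$ assigned at the jump. At $x = 0$ the averages of $f$ equal $f(0) = 1/2$ for every radius, yet the averages of $|f - f(0)|$ also equal $1/2$, so $0$ is not a Lebesgue point.

```python
from fractions import Fraction

def f(y):
    """Step function: 0 on the left, 1 on the right, midpoint value at the jump."""
    if y > 0:
        return Fraction(1)
    if y < 0:
        return Fraction(0)
    return Fraction(1, 2)

def ball_average(g, x, r, n=1000):
    """Midpoint-rule average of g over (x - r, x + r).

    For x = 0 and even n the jump at 0 falls on a grid point, so g is constant
    on each subinterval and the rule is exact for the step functions used here.
    """
    h = 2 * Fraction(r) / n
    return sum(g(x - r + (i + Fraction(1, 2)) * h) for i in range(n)) / n

x = Fraction(0)
for r in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    mean_f = ball_average(f, x, r)
    mean_dev = ball_average(lambda y: abs(f(y) - f(x)), x, r)
    print(r, mean_f, mean_dev)
```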
## Maximal Functions and Covering Theorem Estimates
Behind the Lebesgue differentiation theorem and the density results lies a universal tool: the Hardy-Littlewood maximal function. It measures the worst-case local average of $|f|$ and controls, via the Vitali theorem, the set where this worst case is large.
[definition: Hardy-Littlewood Maximal Function]
Let $f \in L^1_\text{loc}(\mathbb{R}^n)$. The Hardy-Littlewood maximal function of $f$ is the function $\mathcal{M}f : \mathbb{R}^n \to [0, +\infty]$ defined by
\begin{align*}
\mathcal{M}f(x) = \sup_{r > 0} \frac{1}{\mathcal{L}^n(B(x, r))} \int_{B(x, r)} |f(y)| \, d\mathcal{L}^n(y).
\end{align*}
[/definition]
The maximal function is always lower semicontinuous (hence measurable) and satisfies $\mathcal{M}f(x) \geq |f(x)|$ at every Lebesgue point of $f$. Its usefulness comes from the maximal inequality, which is proved directly using the Vitali covering theorem.
[quotetheorem:2968]
The weak-type $(1,1)$ inequality — the first conclusion — is the direct output of the Vitali theorem. For each $x$ in the set $E_\lambda = \{\mathcal{M}f > \lambda\}$, there exists a ball $B(x, r_x)$ such that
\begin{align*}
\frac{1}{\mathcal{L}^n(B(x, r_x))} \int_{B(x, r_x)} |f| \, d\mathcal{L}^n > \lambda,
\end{align*}
which gives $\int_{B(x, r_x)} |f| \, d\mathcal{L}^n > \lambda \mathcal{L}^n(B(x, r_x))$. The radii $r_x$ are uniformly bounded, since $\lambda \mathcal{L}^n(B(x, r_x)) < \|f\|_{L^1}$, so Vitali provides a countable disjoint subcollection $\{B_j\}$ with $E_\lambda \subset \bigcup_j 5B_j$. Then:
\begin{align*}
\mathcal{L}^n(E_\lambda) \leq \sum_j \mathcal{L}^n(5B_j) = 5^n \sum_j \mathcal{L}^n(B_j) \leq \frac{5^n}{\lambda} \sum_j \int_{B_j} |f| \, d\mathcal{L}^n \leq \frac{5^n}{\lambda} \|f\|_{L^1}.
\end{align*}
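This gives $C(n) = 5^n$. The strong-type $(p, p)$ inequality for $1 < p < \infty$ follows by interpolating between the weak-type $(1,1)$ estimate and the trivial bound $\|\mathcal{M}f\|_{L^\infty} \leq \|f\|_{L^\infty}$, via the Marcinkiewicz interpolation theorem.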
The hypothesis $f \in L^1$ is the minimal integrability for the weak-type bound to make sense. If $f$ is only in $L^1_\text{loc}$ but not globally in $L^1$, the estimate still holds locally: for any compact $K$, restricting to balls contained in a neighborhood of $K$ gives a local version. The strong-type bound fails at $p = 1$: the maximal function of an $L^1$ function need not be in $L^1$ (consider $f = \mathbf{1}_{[0,1]}$ and note that $\mathcal{M}f(x) \sim 1/|x|$ for large $|x|$, which is not integrable). This is not a deficiency of the estimate but a genuine feature of the maximal operator.
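For $f = \mathbf{1}_{[0,1]}$ on $\mathbb{R}$ the maximal function can be computed exactly, which makes both the $1/|x|$ decay and the weak-type bound concrete. The following is a sketch with illustrative function names; it relies on the observation that the average over $[x - r, x + r]$ is monotone in $r$ between the breakpoints $r = |x|$ and $r = |x - 1|$, so only those radii and the limit $r \to 0$ need to be examined.

```python
from fractions import Fraction

def overlap(x, r):
    """Length of [x - r, x + r] intersected with [0, 1]."""
    return max(Fraction(0), min(Fraction(1), x + r) - max(Fraction(0), x - r))

def maximal_indicator(x):
    """Centered Hardy-Littlewood maximal function of the indicator of [0, 1].

    The average overlap / (2r) is monotone in r on each range where the overlap
    is affine, so the supremum is reached at r = |x|, r = |x - 1|, or as r -> 0.
    """
    x = Fraction(x)
    if 0 < x < 1:
        small_r_limit = Fraction(1)
    elif x == 0 or x == 1:
        small_r_limit = Fraction(1, 2)
    else:
        small_r_limit = Fraction(0)
    candidates = [small_r_limit]
    for r in (abs(x), abs(x - 1)):
        if r > 0:
            candidates.append(overlap(x, r) / (2 * r))
    return max(candidates)

def superlevel_length(lam):
    """Length of {Mf > lam} for 0 < lam < 1/2: the interval (1 - 1/(2 lam), 1/(2 lam))."""
    return 1 / Fraction(lam) - 1

for x in (10, 100, 1000):
    print(x, maximal_indicator(x), maximal_indicator(x) * x)   # last column is 1/2

lam = Fraction(1, 10)
print(superlevel_length(lam), 5 / lam)   # measure of {Mf > lam} versus the bound 5^n ||f||_1 / lam
```

Since $\mathcal{M}f(x) = 1/(2x)$ for $x \geq 1$, the integral of $\mathcal{M}f$ over $[1, R]$ equals $\tfrac{1}{2} \log R$, which diverges; the superlevel sets nonetheless satisfy the weak-type bound with room to spare.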
## Summary and Forward Connections
Covering theorems are not isolated technical results — they are the structural backbone of measure differentiation theory. The Vitali theorem, with its greedy algorithm and five-fold inflation, is sufficient for all differentiation questions where the reference measure $\mu$ is doubling, including the Lebesgue differentiation theorem and the Hardy-Littlewood maximal inequality. The Besicovitch theorem, with its bounded-overlap cover and no dilation, handles the general case of arbitrary Radon measures, including the differentiation of singular measures and the derivation of the Lebesgue decomposition.
In Chapter 5, both theorems will be deployed in the proof that every Radon measure has a derivative with respect to every other Radon measure at $\mu$-almost every point. In Chapter 10, the Besicovitch theorem underpins the density estimates for Hausdorff measure: the upper density of $\mathcal{H}^s \lfloor E$ is at most $1$ at $\mathcal{H}^s$-almost every point of $E$, a result that requires Besicovitch because $\mathcal{H}^s$ is typically not doubling. In Chapter 9, the Lebesgue density theorem applied to Hausdorff measure provides the key step in identifying which sets are "rectifiable" at the local level.
The factor of five in the Vitali theorem and the dimensional constant $N(n)$ in the Besicovitch theorem are not artifacts to be ignored — they carry quantitative information about how measures can be controlled. For Lebesgue measure, the factor $5^n$ in the maximal inequality is the concrete bound on the Hardy-Littlewood constant. For Hausdorff measure, the Besicovitch constant $N(n)$ appears in the explicit constants of density estimates. Understanding which theorem to apply, and what the quantitative costs are, is a fundamental skill in geometric measure theory.
---
Covering theorems provide the geometric structure needed to study local properties of measures. The Radon-Nikodym derivative and Lebesgue differentiation theorem quantify how densely a measure is distributed at each point, transforming global measure-theoretic information into pointwise behavior.
# 5. Differentiation of Radon Measures
Suppose you know two Radon measures $\mu$ and $\nu$ on $\mathbb{R}^n$ and you want to understand how $\nu$ distributes its mass relative to $\mu$. You might ask: at a given point $x$, how intensely does $\nu$ sit relative to $\mu$ near $x$? The abstract answer from the Radon-Nikodym theorem is that, when $\nu \ll \mu$, there is a measurable function $f$ with $\nu(A) = \int_A f \, d\mu$ for every Borel set $A$. But the Radon-Nikodym theorem offers no geometric construction of $f$ — it exists by an abstract functional-analytic argument, and there is no formula telling you what $f(x)$ is at a specific point. The differentiation theory of Radon measures fills exactly this gap. It produces $f(x)$ as a local limit of ratios $\nu(B(x,r)) / \mu(B(x,r))$ as $r \to 0$, and it does so at $\mu$-almost every point.
This is a profound improvement. The Radon-Nikodym density becomes geometrically transparent: it is the limiting ratio of how much $\nu$-mass and $\mu$-mass accumulate near $x$ as we shrink balls around $x$. When $\mu = \mathcal{L}^n$, this recovers the classical Lebesgue differentiation theorem. But the framework is far more general: $\mu$ can be a singular measure, a surface measure, or any Radon measure on $\mathbb{R}^n$, and the theory still works.
The key tool that makes everything possible is the Besicovitch covering theorem from Chapter 4. Unlike Vitali's theorem, Besicovitch does not dilate balls by a factor of 5; it only uses balls already centered at points of the set under examination. This means that control over $\nu(B(x,r)) / \mu(B(x,r))$ translates directly into measure-theoretic estimates on $\mu$, without any doubling hypothesis on the measures involved. The theory thus works equally well for absolutely continuous and singular measures, and recovers the Lebesgue decomposition $\nu = \nu_{ac} + \nu_s$ as a consequence of pointwise differentiation.
[example: The Cantor Measure Has No Lebesgue Density]
Let $C$ be the middle-thirds Cantor set in $[0,1]$ and let $\mu_C$ be the Cantor measure, defined by $\mu_C(I_{k,j}) = 2^{-k}$ for each of the $2^k$ closed intervals $I_{k,j}$ appearing at stage $k$ of the Cantor construction. We ask whether $\mu_C \ll \mathcal{L}^1$, and if so, what the Radon-Nikodym density looks like.
At any point $x$ not in $C$, once $r$ is small enough that $B(x, r)$ is contained entirely within a removed open interval, we have $\mu_C(B(x, r)) = 0$. So the ratio $\mu_C(B(x,r)) / (2r)$ is zero for all small $r$, and the derivative $D_{\mathcal{L}^1} \mu_C(x) = 0$.
At any point $x$ in $C$, the situation is dramatically different. Choose $k$ so that $3^{-(k+1)} \leq r < 3^{-k}$. Then $B(x, r)$ intersects at most two intervals of level $k$, each carrying $\mu_C$-measure $2^{-k}$, so $\mu_C(B(x, r)) \leq 2 \cdot 2^{-k}$. In the other direction, the level-$(k+1)$ interval containing $x$ has length $3^{-(k+1)} \leq r$, so it lies inside $B(x, r)$ up to at most an endpoint, and $\mu_C(B(x, r)) \geq 2^{-(k+1)}$. Thus:
\begin{align*}
\frac{\mu_C(B(x,r))}{2r} \geq \frac{2^{-(k+1)}}{2 \cdot 3^{-k}} = \frac{1}{4} \left(\frac{3}{2}\right)^k \to \infty
\end{align*}
as $k \to \infty$. The ratio diverges at every point of $C$.
This shows that $\mu_C$ is singular with respect to $\mathcal{L}^1$: the derivative $D_{\mathcal{L}^1} \mu_C$ equals zero at $\mathcal{L}^1$-almost every point (those not in $C$, since $\mathcal{L}^1(C) = 0$), yet $\mu_C(C) = 1$. The Cantor measure lives entirely on a Lebesgue-null set, and the differentiation theory will confirm: for a singular measure $\nu_s \perp \mu$, the derivative $D_\mu \nu_s$ vanishes $\mu$-almost everywhere.
[/example]
## The Derivative of a Radon Measure
How should we define the local density of $\nu$ relative to $\mu$ at a point? The straightforward answer is to shrink balls and take limits of ratios — but we need to be careful about when the limit exists and when it is finite. The definition must also handle the possibility that $\mu$ assigns zero mass to small balls, which would make the ratio undefined.
[definition: Upper and Lower Derivatives]
Let $\mu$ and $\nu$ be Radon measures on $\mathbb{R}^n$. For $x \in \mathbb{R}^n$, the **upper derivative** of $\nu$ with respect to $\mu$ at $x$ is
\begin{align*}
\overline{D}_\mu \nu(x) &= \limsup_{r \to 0^+} \frac{\nu(B(x,r))}{\mu(B(x,r))}
\end{align*}
and the **lower derivative** is
\begin{align*}
\underline{D}_\mu \nu(x) &= \liminf_{r \to 0^+} \frac{\nu(B(x,r))}{\mu(B(x,r))}
\end{align*}
where we set $\nu(B(x,r))/\mu(B(x,r)) = +\infty$ if $\mu(B(x,r)) = 0$ and $\nu(B(x,r)) > 0$, and $0/0 = 0$ by convention.
[/definition]
The upper and lower derivatives are always defined (possibly $+\infty$), but they need not agree. When they do, we obtain the derivative.
[definition: Derivative of a Measure]
Let $\mu$ and $\nu$ be Radon measures on $\mathbb{R}^n$. The **derivative of $\nu$ with respect to $\mu$** at $x$, written $D_\mu \nu(x)$, is defined when $\overline{D}_\mu \nu(x) = \underline{D}_\mu \nu(x)$, and in that case
\begin{align*}
D_\mu \nu(x) &= \lim_{r \to 0^+} \frac{\nu(B(x,r))}{\mu(B(x,r))}.
\end{align*}
[/definition]
The notation deliberately mirrors the classical derivative: just as $f'(x) = \lim_{h \to 0} (f(x+h) - f(x))/h$ measures the rate of change of $f$ per unit length, $D_\mu \nu(x)$ measures the density of $\nu$-mass per unit $\mu$-mass near $x$. When $\mu = \mathcal{L}^n$ and $\nu(A) = \int_A f \, d\mathcal{L}^n$ for some $f \in L^1_{\mathrm{loc}}$, the Lebesgue differentiation theorem (Chapter 6) will show that $D_{\mathcal{L}^n} \nu(x) = f(x)$ for $\mathcal{L}^n$-almost every $x$. The present chapter establishes this for general Radon measures.
[remark: Convention at Points Where $\mu(B(x,r)) = 0$]
If $\mu(B(x,r)) = 0$ for some $r > 0$, then by monotonicity $\mu(B(x,\rho)) = 0$ for all $\rho \leq r$. The set of such points is the complement of the support of $\mu$, an open $\mu$-null set, and the value of $D_\mu \nu$ there is irrelevant for integration against $\mu$. For this reason, the definition is stated in terms of $\mu$-a.e. existence, and the convention $0/0 = 0$ at such points does not affect any of the main results.
[/remark]
Before proving that the derivative exists almost everywhere, it is worth isolating the key technical estimate that powers the argument. The Besicovitch covering theorem says: given balls centered at every point of a set $A$, one can extract a subcollection that covers $A$ with bounded overlap — at most $N(n)$ layers, where $N(n)$ depends only on the dimension. This bounded-overlap property is exactly what allows us to pass from local information (the ratio at each ball) to global measure estimates.
[quotetheorem:2969]
The proof uses Besicovitch as follows. Fix $\varepsilon > 0$ and an open set $U \supset A_t$ with $\nu(U) \leq \nu(A_t) + \varepsilon$. For each $x \in A_t$, there exist arbitrarily small $r > 0$ with $B(x,r) \subset U$ and $\nu(B(x,r)) > (t - \varepsilon) \cdot \mu(B(x,r))$. Applying Besicovitch, in the standard corollary form that extracts a countable disjoint subfamily $B(x_j, r_j)$ covering $\mu$-almost all of $A_t$, one sums the inequalities $\nu(B(x_j, r_j)) > (t - \varepsilon) \cdot \mu(B(x_j, r_j))$. The covering property gives $\mu(A_t) \leq \sum_j \mu(B(x_j, r_j))$, and disjointness inside $U$ gives $\sum_j \nu(B(x_j, r_j)) \leq \nu(U) \leq \nu(A_t) + \varepsilon$. The estimate $\nu(A_t) \geq t \cdot \mu(A_t)$ follows on letting $\varepsilon \to 0$. (Using only the bounded-overlap cover would give the same estimate weakened by a factor $N(n)$.) This estimate is the reason why Besicovitch is indispensable here: the Vitali theorem with its 5-fold dilation would produce $\mu(5 B_j)$ rather than $\mu(B_j)$, and for a non-doubling measure $\mu$ the ratio $\mu(5B_j)/\mu(B_j)$ is uncontrolled.
## The Differentiation Theorem
What exactly does the derivative $D_\mu \nu$ tell us, and when does it exist? The core result of this chapter answers both questions simultaneously: the derivative exists $\mu$-almost everywhere, and when $\nu \ll \mu$, it recovers the Radon-Nikodym density precisely.
[quotetheorem:2970]
The hypothesis $\nu \ll \mu$ (absolute continuity) is essential in two ways. First, it guarantees that the derivative is finite $\nu$-almost everywhere and not merely $\mu$-almost everywhere: if $\nu$ concentrated positive mass on a $\mu$-null set, the ratio $\nu(B(x,r)) / \mu(B(x,r))$ would blow up at $\nu$-almost every point of that set. Second, the integration formula (iii) would fail if $\nu$ had a singular component: the integral $\int_A D_\mu \nu \, d\mu$ can only see mass that $\nu$ places on sets of positive $\mu$-measure. The absolute continuity hypothesis is the condition that guarantees no mass is hidden from $\mu$.
Part (iv) is a significant conceptual statement. The Radon-Nikodym derivative $d\nu/d\mu$ is typically defined abstractly as the function satisfying (iii), unique up to $\mu$-null sets. This theorem says you can compute it explicitly as a geometric limit of ball ratios. There is nothing circular here: the theorem proves that the limit (a geometric object) and the abstract functional-analytic derivative are the same object.
The proof strategy reveals how covering theory enters. To show the derivative exists, one must show the set $\{x : \overline{D}_\mu \nu(x) > \underline{D}_\mu \nu(x)\}$ has $\mu$-measure zero. This set is the countable union, over rationals $s > t$, of the sets $E_{s,t} = \{x : \overline{D}_\mu \nu(x) > s > t > \underline{D}_\mu \nu(x)\}$. After intersecting with a bounded set so that all measures involved are finite, the Besicovitch estimate applied twice yields both $\nu(E_{s,t}) \geq s \cdot \mu(E_{s,t})$ and $\nu(E_{s,t}) \leq t \cdot \mu(E_{s,t})$, forcing $\mu(E_{s,t}) = 0$ since $s > t$. A countable union over rationals shows that the upper and lower derivatives agree $\mu$-almost everywhere. The integration formula then follows from Borel regularity and the approximation of Borel sets by compact sets, using inner regularity of Radon measures.
[example: Differentiation of Lebesgue Measure Against Itself]
Take $\mu = \nu = \mathcal{L}^1$ on $\mathbb{R}$. Then for any $x \in \mathbb{R}$:
\begin{align*}
\frac{\nu(B(x,r))}{\mu(B(x,r))} &= \frac{\mathcal{L}^1(B(x,r))}{\mathcal{L}^1(B(x,r))} = 1
\end{align*}
for every $r > 0$. So $D_{\mathcal{L}^1} \mathcal{L}^1(x) = 1$ for every $x$, and the integration formula gives
\begin{align*}
\mathcal{L}^1(A) &= \int_A 1 \, d\mathcal{L}^1
\end{align*}
which is a tautology. The Radon-Nikodym density of $\mathcal{L}^1$ with respect to itself is the constant function $1$.
More interesting: take $\mu = \mathcal{L}^1$ and $\nu(A) = \int_A f \, d\mathcal{L}^1$ for a nonnegative $f \in L^1_{\mathrm{loc}}(\mathbb{R})$, so that $\nu$ is a Radon measure. Then:
\begin{align*}
\frac{\nu(B(x,r))}{\mu(B(x,r))} &= \frac{1}{2r} \int_{x-r}^{x+r} f(y) \, d\mathcal{L}^1(y).
\end{align*}
The differentiation theorem asserts this converges to $f(x)$ for $\mathcal{L}^1$-almost every $x$ — this is precisely the Lebesgue differentiation theorem. The general theory thus contains the classical theorem as a special case. Note that $\nu \ll \mu = \mathcal{L}^1$ by construction, so all hypotheses of the theorem are satisfied.
[/example]
[example: A Weighted Measure on a Curve]
Let $\gamma: [0, 1] \to \mathbb{R}^2$ be an injective $C^1$ curve with $|\dot{\gamma}(t)| > 0$ everywhere, and let
\begin{align*}
\nu(A) &= \mathcal{H}^1(A \cap \gamma([0,1]))
\end{align*}
be the one-dimensional Hausdorff measure restricted to the image of $\gamma$. Let $\mu = \mathcal{L}^2$ be the standard Lebesgue measure on $\mathbb{R}^2$.
Since $\mathcal{L}^2(\gamma([0,1])) = 0$ while $\nu(\gamma([0,1]))$ equals the length of the curve, which is positive, $\nu$ is not absolutely continuous with respect to $\mathcal{L}^2$. In fact $\nu \perp \mathcal{L}^2$: set $N = \gamma([0,1])$, then $\mathcal{L}^2(N) = 0$ and $\nu(\mathbb{R}^2 \setminus N) = 0$. So $\nu$ and $\mathcal{L}^2$ are mutually singular.
What does differentiation reveal? For $x = \gamma(t)$ with $0 < t < 1$:
\begin{align*}
\frac{\nu(B(x,r))}{\mathcal{L}^2(B(x,r))} &= \frac{\mathcal{H}^1(B(x,r) \cap \gamma([0,1]))}{\pi r^2} \sim \frac{2r}{\pi r^2} = \frac{2}{\pi r} \to \infty
\end{align*}
as $r \to 0^+$. Since $\gamma$ is $C^1$ with nonvanishing derivative, the curve is nearly straight at small scales, so $\mathcal{H}^1(B(x,r) \cap \gamma([0,1]))$ is asymptotic to $2r$ at interior points of the curve (and to $r$ at the two endpoints). The ratio $\nu(B(x,r)) / \mathcal{L}^2(B(x,r))$ blows up at points of the curve. This is consistent with the general theory: the singular measure $\nu_s = \nu$ has infinite derivative at $\nu$-almost every point, and zero derivative at $\mathcal{L}^2$-almost every point of $\mathbb{R}^2$.
[/example]
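The blow-up rate $2/(\pi r)$ can be checked on a concrete curve. The sketch below assumes $\gamma$ is the upper unit half-circle $\gamma(t) = (\cos \pi t, \sin \pi t)$ and $x = \gamma(1/2) = (0, 1)$; this choice, and the function names, are illustrative and not taken from the text. For $0 < r \leq \sqrt{2}$ the part of the arc inside $B(x, r)$ has length exactly $4 \arcsin(r/2)$.

```python
import math

def arc_length_in_ball(r):
    """H^1 of the points of the unit half-circle within distance r of x = (0, 1).

    A chord of length r subtends the angle 2 * asin(r / 2), so the arc has
    length 4 * asin(r / 2). Valid for 0 < r <= sqrt(2).
    """
    return 4 * math.asin(r / 2)

def ratio(r):
    """nu(B(x, r)) / L^2(B(x, r)) for nu = H^1 restricted to the arc."""
    return arc_length_in_ball(r) / (math.pi * r**2)

for r in (1e-1, 1e-2, 1e-3, 1e-4):
    print(r, ratio(r), ratio(r) * r)   # the ratio blows up; ratio * r tends to 2 / pi
```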
## Recovery of the Lebesgue Decomposition
The differentiation theorem as stated above assumes $\nu \ll \mu$. What happens when $\nu$ has a singular component — when it places positive mass on a $\mu$-null set? The answer is that differentiation simultaneously reads off both parts of the Lebesgue decomposition $\nu = \nu_{ac} + \nu_s$.
Recall from measure theory that for any two $\sigma$-finite measures $\mu$ and $\nu$, the Lebesgue decomposition theorem guarantees a unique splitting $\nu = \nu_{ac} + \nu_s$ where $\nu_{ac} \ll \mu$ (the absolutely continuous part) and $\nu_s \perp \mu$ (the singular part). There exist disjoint Borel sets $S$ and $T = \mathbb{R}^n \setminus S$ with $\mu(S) = 0$, $\nu_s(T) = 0$. In other words, $\nu_s$ is concentrated on the $\mu$-null set $S$, while $\nu_{ac}$ assigns zero mass to every $\mu$-null set.
[quotetheorem:2971]
Parts (iii) and (iv) describe a striking dichotomy. On the one hand, the singular measure $\nu_s$ leaves no trace in the derivative at $\mu$-typical points: the ball ratios $\nu_s(B(x,r)) / \mu(B(x,r))$ converge to zero for $\mu$-almost every $x$. On the other hand, at $\nu_s$-typical points, the balls centered there carry an enormous amount of $\nu_s$-mass relative to $\mu$-mass: the ratio diverges to $+\infty$. This captures precisely the intuition that singular mass is concentrated on a $\mu$-null set — at those special points, $\nu_s$ is infinitely dense relative to $\mu$, while everywhere else it is invisible.
Part (iii) uses the mutual singularity $\nu_s \perp \mu$ in a precise way. Let $S$ be the $\mu$-null set on which $\nu_s$ is concentrated, and for $t > 0$ let $A_t = \{x \notin S : \overline{D}_\mu \nu_s(x) \geq t\}$. The Besicovitch estimate gives $t \cdot \mu(A_t) \leq \nu_s(A_t) \leq \nu_s(\mathbb{R}^n \setminus S) = 0$, so $\mu(A_t) = 0$ for every $t > 0$. Since $\mu(S) = 0$ as well, $\overline{D}_\mu \nu_s = 0$ at $\mu$-almost every point, which forces $\nu_s(B(x,r)) / \mu(B(x,r)) \to 0$.
The integration formula (v) is the Lebesgue decomposition written as a statement about local behavior. The integral $\int_A D_\mu \nu \, d\mu$ captures the absolutely continuous part, while $\nu_s(A)$ is the mass that differentiation cannot see through the ratio $\nu(B(x,r)) / \mu(B(x,r))$. Differentiation thus acts as a filter: it extracts the $\mu$-continuous component and discards the singular one.
<!-- illustration-needed: Two-panel diagram. Left: the graph of r ↦ ν(B(x,r))/μ(B(x,r)) for an absolutely continuous ν converging to a finite limit as r→0. Right: the same ratio for a singular measure ν_s — showing the ratio converging to 0 at μ-typical points and diverging at ν_s-typical points. -->
[example: Lebesgue Decomposition via Differentiation]
Let $\mu = \mathcal{L}^1$ on $[0,1]$ and let $\nu = \frac{1}{2} \mathcal{L}^1 + \mu_C$, where $\mu_C$ is the Cantor measure from the opening example. We claim $\nu_{ac} = \frac{1}{2} \mathcal{L}^1$ and $\nu_s = \mu_C$.
To verify: $\mathcal{L}^1(C) = 0$ and $\mu_C([0,1] \setminus C) = 0$, so $\mu_C \perp \mathcal{L}^1$. Since $\frac{1}{2} \mathcal{L}^1 \ll \mathcal{L}^1$ with density $1/2$, the decomposition is $\nu_{ac} = \frac{1}{2} \mathcal{L}^1$ and $\nu_s = \mu_C$.
What does the differentiation theorem predict? For $\mathcal{L}^1$-almost every $x$:
\begin{align*}
D_{\mathcal{L}^1} \nu(x) &= \lim_{r \to 0^+} \frac{\nu(B(x,r))}{2r} = \lim_{r \to 0^+} \frac{\frac{1}{2} \cdot 2r + \mu_C(B(x,r))}{2r}.
\end{align*}
For $\mathcal{L}^1$-almost every $x$ (i.e., for $x \notin C$), once $r$ is small enough that $B(x,r)$ lies in a removed interval, $\mu_C(B(x,r)) = 0$. So:
\begin{align*}
D_{\mathcal{L}^1} \nu(x) &= \lim_{r \to 0^+} \frac{\frac{1}{2} \cdot 2r}{2r} = \frac{1}{2}.
\end{align*}
The derivative equals $1/2$, which is the Radon-Nikodym density $d\nu_{ac}/d\mathcal{L}^1 = 1/2$. The Cantor measure contributes nothing to the derivative at $\mathcal{L}^1$-almost every point.
The integration formula then reads:
\begin{align*}
\nu(A) &= \int_A \frac{1}{2} \, d\mathcal{L}^1 + \mu_C(A)
\end{align*}
for every Borel set $A \subset [0,1]$, which is exactly the definition of $\nu$.
[/example]
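The ball ratios in this example can be checked numerically. The following is a minimal sketch, assuming $\mu_C$ is the standard middle-thirds Cantor measure, so that $\mu_C((a,b)) = F(b) - F(a)$ for the Cantor function $F$. The helper names `cantor_cdf` and `ball_ratio` are illustrative and not part of the text; exact rational arithmetic is used so that rounding does not blur the comparison.

```python
from fractions import Fraction

def cantor_cdf(x, depth=40):
    """Cantor function F(x) = mu_C([0, x]), read off from ternary digits of x."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    total, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:          # x lies in a removed interval, where F is constant
            return total + weight
        if digit == 2:
            total += weight
        weight /= 2
    return total                # truncation error is at most 2**-depth

def ball_ratio(x, r):
    """nu(B(x,r)) / L^1(B(x,r)) for nu = (1/2) L^1 + mu_C, assuming B(x,r) lies in [0,1]."""
    x, r = Fraction(x), Fraction(r)
    cantor_mass = cantor_cdf(x + r) - cantor_cdf(x - r)
    return (r + cantor_mass) / (2 * r)   # (1/2) * 2r = r is the absolutely continuous part

# x = 1/2 lies in the removed middle third; x = 1/4 is a point of the Cantor set.
typical = [ball_ratio(Fraction(1, 2), Fraction(1, 10**k)) for k in range(1, 5)]
singular = [float(ball_ratio(Fraction(1, 4), Fraction(1, 3**k))) for k in (2, 4, 6, 8, 10)]
print(typical)
print(singular)
```

At the $\mathcal{L}^1$-typical point the ratio sits at the density $1/2$, while at the point of $C$ it grows without bound as $r$ shrinks, which is the behavior the differentiation theorem predicts for the singular part.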
## The Maximal Function and Quantitative Estimates
The differentiation theorem guarantees almost everywhere existence, but practical applications often require quantitative control over how quickly the ball ratios converge, or what happens on the exceptional null set. This is the role of the maximal function, which serves as a uniform envelope for all the ball averages simultaneously.
How should one bound the oscillation of the averages $\nu(B(x,r)) / \mu(B(x,r))$ uniformly over all $r$? The maximal function records the worst-case ratio.
[definition: Maximal Function of a Measure]
Let $\mu$ and $\nu$ be Radon measures on $\mathbb{R}^n$. The **maximal function** of $\nu$ with respect to $\mu$ is the function $M_\mu \nu : \mathbb{R}^n \to [0, +\infty]$ defined by
\begin{align*}
M_\mu \nu(x) &= \sup_{r > 0} \frac{\nu(B(x,r))}{\mu(B(x,r))},
\end{align*}
with the convention $0/0 = 0$.
[/definition]
When $\mu = \mathcal{L}^n$ and $\nu(A) = \int_A f \, d\mathcal{L}^n$ for $f \geq 0$, the maximal function becomes the Hardy-Littlewood maximal function $Mf(x) = \sup_{r > 0} \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} f \, d\mathcal{L}^n$, which plays a central role in harmonic analysis. The GMT framework places this in a more general context.
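The critical property of the maximal function is the weak-type estimate: the set $\{x : M_\mu \nu(x) > t\}$ has $\mu$-measure at most $N(n) \cdot \nu(\mathbb{R}^n) / t$, where $N(n)$ is the Besicovitch constant. Note that $M_\mu \nu(x) > t$ does not force $\overline{D}_\mu \nu(x) \geq t$, since the supremum may be approached at a large radius, so the estimate for upper derivatives cannot be quoted directly. Instead, one covers: for each $x$ in $A_t = \{M_\mu \nu > t\}$ choose a ball $B(x, r_x)$ with $\nu(B(x,r_x)) > t \cdot \mu(B(x,r_x))$. After restricting to a bounded piece of $A_t$ and to bounded radii, the Besicovitch theorem extracts at most $N(n)$ disjoint subfamilies that together cover that piece. Summing $\mu(B) < \nu(B)/t$ over each disjoint family and then passing to the limit gives $\mu(A_t) \leq N(n) \cdot \nu(\mathbb{R}^n) / t$.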
[quotetheorem:2972]
The estimate $\mu(\{M_\mu \nu > t\}) \leq N(n) \cdot \nu(\mathbb{R}^n) / t$ is a weak-type $(1,1)$ bound. It says that the maximal function is controlled in distribution: even though $M_\mu \nu$ can blow up pointwise, its super-level sets cannot be too large. This is precisely what permits the passage to almost-everywhere convergence: the set where the upper and lower derivatives disagree must have $\mu$-measure zero, because a disagreement set of positive measure would violate bounds of this type.
The constant deserves a comment. For super-level sets of the upper derivative $\overline{D}_\mu \nu$, the clean form $\mu(A) \leq \nu(A)/t$ holds with constant $1$, because every point comes with arbitrarily small admissible balls and a fine-cover argument yields an essentially disjoint cover. For the maximal function each point supplies only one ball, possibly large, so the overlap of the Besicovitch cover must be paid for and the constant is $N(n)$. For our purposes, the important feature is that the bound depends on $\nu$ only through its total mass, not on any geometric structure of the support.
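The weak-type bound can be explored on a one-dimensional grid. The sketch below is only an illustration under stated assumptions: counting measure on the grid stands in for $\mu$, the sequence `f` stands in for the density of $\nu$, and the constant $2$ in the comparison is the one-dimensional covering constant for intervals, assumed here rather than taken from the text. The names `maximal_function` and `superlevel_count` are hypothetical.

```python
def maximal_function(f):
    """Centered discrete maximal function of a non-negative sequence f (zero outside the list)."""
    n = len(f)
    prefix = [0.0]
    for value in f:
        prefix.append(prefix[-1] + value)

    def window_sum(lo, hi):
        # Sum of f over indices lo..hi inclusive; entries outside the list count as 0.
        lo, hi = max(lo, 0), min(hi, n - 1)
        return prefix[hi + 1] - prefix[lo] if lo <= hi else 0.0

    result = []
    for i in range(n):
        # The window of radius k around i contains 2k + 1 grid points.
        result.append(max(window_sum(i - k, i + k) / (2 * k + 1) for k in range(n)))
    return result

def superlevel_count(values, t):
    """Counting measure of the set where the maximal function exceeds t."""
    return sum(1 for v in values if v > t)

f = [0.0] * 50 + [1.0] + [0.0] * 50      # unit point mass at index 50
Mf = maximal_function(f)
for t in (0.5, 0.1, 0.02):
    # Compare the size of the super-level set with 2 * (total mass) / t.
    print(t, superlevel_count(Mf, t), 2 * sum(f) / t)
```

For a point mass the maximal function decays like the reciprocal of the distance, so the super-level sets grow like $1/t$: the weak-type bound is of the right order, and no strong $L^1$ bound could hold.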
## Differentiation and Absolute Continuity
The differentiation theorem has a powerful converse: if the derivative $D_\mu \nu$ is finite $\mu$-almost everywhere and the integration formula holds, then $\nu \ll \mu$. This gives a purely analytic characterization of absolute continuity in terms of ball ratios, which connects differentiation theory back to the fundamental structure of the Radon-Nikodym theorem.
What conditions on the ball ratios force $\nu \ll \mu$? The answer involves requiring the derivative to be a well-defined, finite function.
[quotetheorem:2973]
The implication (i) $\Rightarrow$ (ii) is the main differentiation theorem already proved. The implication (ii) $\Rightarrow$ (i) follows immediately: if $\mu(A) = 0$, then the integral $\int_A D_\mu \nu \, d\mu = 0$, so $\nu(A) = 0$. The key insight is that once the integration formula holds, absolute continuity is automatic.
What happens when $\nu$ is not absolutely continuous is less immediate but equally instructive. If $\nu$ has a non-zero singular component $\nu_s$, then by part (iv) of the previous theorem, $D_\mu \nu_s(x) = +\infty$ for $\nu_s$-almost every $x$. Since $\nu_s \neq 0$, the set where $D_\mu \nu_s = +\infty$ is non-empty. But crucially, this set is $\mu$-null: by part (iii) the derivative of $\nu_s$ vanishes $\mu$-almost everywhere, reflecting the fact that $\nu_s \perp \mu$ means $\nu_s$ is concentrated on a set of $\mu$-measure zero. (The topological support of $\nu_s$ need not be $\mu$-null: a sum of point masses at the rationals with summable weights is singular with respect to $\mathcal{L}^1$, yet its support is all of $\mathbb{R}$.) So "almost everywhere" here depends on which measure you use: the derivative is $+\infty$ at $\nu_s$-almost every point, yet the set where this happens is invisible from $\mu$'s perspective. This is the precise sense in which the derivative "sees" only the absolutely continuous part.
[explanation: Why Besicovitch Is Essential and Vitali Is Insufficient]
The Vitali covering theorem produces a disjoint subcollection $\{B_j\}$ from a family of balls such that the 5-fold enlargements $5B_j$ cover the original union. The price of this simplicity is the factor of 5: to bound the measure of the covered set one must estimate $\mu(\bigcup 5B_j)$ in terms of $\mu(\bigcup B_j)$, which requires knowing how much $\mu$-mass a ball of radius $5r$ can carry relative to the ball of radius $r$ with the same center. A uniform bound on the ratio $\mu(5B) / \mu(B)$ is a doubling-type condition on $\mu$.
For Lebesgue measure, $\mathcal{L}^n(5B) = 5^n \mathcal{L}^n(B)$, so the ratio is exactly $5^n$. This is large but fixed, and Vitali suffices for the Lebesgue differentiation theorem. But for a general Radon measure $\mu$ the ratio $\mu(5B) / \mu(B)$ can be arbitrarily large or even infinite. For the Cantor measure on $[0,1]$, for instance, a ball $B$ inside a removed interval has $\mu(B) = 0$, while $5B$ may reach the Cantor set and carry positive mass. No universal bound exists.
Besicovitch circumvents this entirely. It guarantees that the original balls (without any dilation) cover the set of centers, with overlap bounded by $N(n)$. The counting estimate
\begin{align*}
\sum_j \mathbf{1}_{B_j}(x) &\leq N(n) \quad \text{for every } x \in \mathbb{R}^n
\end{align*}
is a purely geometric statement about the selected balls, so it holds no matter which Radon measure $\mu$ is being integrated, doubling or not. The key quantity $N(n)$ depends only on the dimension of the ambient space, through purely geometric estimates on how many balls of comparable radius can meet a given ball when none of them contains the center of another.
This is why differentiation of measures in $\mathbb{R}^n$ — including singular measures — works uniformly: Besicovitch replaces the doubling condition with a purely metric/dimension condition on the ambient space.
[/explanation]
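The failure of any uniform bound on $\mu(5B)/\mu(B)$ can be seen in a two-line computation. The sketch below assumes the standard middle-thirds Cantor measure, computed through the Cantor function; `cantor_cdf`, `cantor_mass`, and `lebesgue_mass` are illustrative names, and the particular ball is just one convenient choice.

```python
from fractions import Fraction

def cantor_cdf(x, depth=40):
    """Cantor function F(x) = mu_C([0, x]), read off from ternary digits of x."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    total, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)
        x -= digit
        if digit == 1:          # inside a removed interval, F is constant
            return total + weight
        if digit == 2:
            total += weight
        weight /= 2
    return total

def cantor_mass(center, radius):
    """mu_C of the interval (center - radius, center + radius); mu_C has no atoms."""
    return cantor_cdf(center + radius) - cantor_cdf(center - radius)

def lebesgue_mass(center, radius):
    """L^1 of the interval (center - radius, center + radius)."""
    return 2 * Fraction(radius)

# B = (0.4, 0.6) sits inside the removed interval (1/3, 2/3); 5B = (0, 1).
center, radius = Fraction(1, 2), Fraction(1, 10)
small = cantor_mass(center, radius)
large = cantor_mass(center, 5 * radius)
lebesgue_ratio = lebesgue_mass(center, 5 * radius) / lebesgue_mass(center, radius)
print("Cantor: mu(B) =", small, "mu(5B) =", large)
print("Lebesgue: L(5B)/L(B) =", lebesgue_ratio)
```

The Cantor measure assigns zero mass to $B$ and full mass to $5B$, so the Vitali enlargement cannot be controlled by the mass of the original ball, whereas for Lebesgue measure the ratio is the fixed number $5^1$.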
## The Lebesgue Differentiation Theorem as a Special Case
The classical Lebesgue differentiation theorem states that for $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, the average of $f$ over balls shrinking to $x$ converges to $f(x)$ at $\mathcal{L}^n$-almost every $x$. This is not a separate theorem: it is an immediate corollary of the differentiation theorem for Radon measures.
What is the connection? Any locally integrable function $f \geq 0$ defines a Radon measure $\nu$ via $\nu(A) = \int_A f \, d\mathcal{L}^n$. Since $\mathcal{L}^n(A) = 0 \Rightarrow \nu(A) = 0$, we have $\nu \ll \mathcal{L}^n$. The derivative $D_{\mathcal{L}^n} \nu(x) = f(x)$ $\mathcal{L}^n$-almost everywhere is exactly the Lebesgue differentiation theorem.
[quotetheorem:74]
The proof is immediate for $f \geq 0$: define $\nu(A) = \int_A f \, d\mathcal{L}^n$ for $A$ Borel. Then $\nu$ is a Radon measure with $\nu \ll \mathcal{L}^n$ and density $f$, and the differentiation theorem gives $D_{\mathcal{L}^n} \nu(x) = f(x)$ for $\mathcal{L}^n$-almost every $x$. Since $\nu(B(x,r)) / \mathcal{L}^n(B(x,r)) = \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} f \, d\mathcal{L}^n$, the conclusion follows. For general $f$, apply this to the positive and negative parts $f^+$ and $f^-$ separately and subtract: both are non-negative and locally integrable, and the union of the two exceptional null sets is again null.
The Lebesgue differentiation theorem holds in the full generality of $L^1_{\mathrm{loc}}$ functions: there is no continuity requirement. The price is the almost-everywhere qualification: convergence may fail on an exceptional set, about which the theorem says only that it is $\mathcal{L}^n$-null. Convergence at every point is too much to ask. The function $f = \mathbf{1}_{\mathbb{Q}}$ (the indicator of the rationals) vanishes almost everywhere, so its averages $\frac{1}{2r} \int_{x-r}^{x+r} f \, d\mathcal{L}^1$ are $0$ for all $x$ and all $r > 0$, yet $f(x) = 1$ at every rational. The failure set (the rationals) has $\mathcal{L}^1$-measure zero, exactly as the theorem requires.
[example: Failure at a Dense Set of Points]
Let $f = \mathbf{1}_{\mathbb{Q}} : \mathbb{R} \to \{0, 1\}$ be the Dirichlet-type indicator function. Since $\mathcal{L}^1(\mathbb{Q}) = 0$, the Lebesgue differentiation theorem predicts that for $\mathcal{L}^1$-almost every $x$:
\begin{align*}
\lim_{r \to 0^+} \frac{1}{2r} \int_{x-r}^{x+r} \mathbf{1}_{\mathbb{Q}}(y) \, d\mathcal{L}^1(y) &= \mathbf{1}_{\mathbb{Q}}(x) = 0.
\end{align*}
The average on the left is zero for every $r > 0$ since $\mathcal{L}^1(\mathbb{Q} \cap [x-r, x+r]) = 0$. So the convergence holds — and the limit is $0$ — at every irrational $x$. At rational $x$, the left-hand side is still $0$ (the rationals have no Lebesgue measure), while $f(x) = 1$. So convergence fails precisely at the rationals, a countable (hence null) set. The theorem's almost-everywhere conclusion is tight: the exceptional set here is non-empty and dense, though negligible.
This example demonstrates that the almost-everywhere qualifier in the Lebesgue differentiation theorem is genuinely necessary and cannot be upgraded to everywhere, even for bounded functions. The failure set can be topologically large (dense) while remaining measure-theoretically negligible.
[/example]
## Connection to the Radon-Nikodym Theorem and Future Directions
The differentiation theory developed in this chapter completes a circle. The abstract Radon-Nikodym theorem guarantees the existence of a density function $d\nu/d\mu$ for absolutely continuous measures but offers no construction. The differentiation theorem closes the loop by showing that the ball-ratio limit — a concrete geometric construction — produces exactly this density almost everywhere. The two approaches are mathematically equivalent, but differentiation provides the spatially localized perspective that is essential for geometric applications.
Looking forward, the differentiation theory interacts with the rest of geometric measure theory in several essential ways. First, it powers the theory of Lebesgue points (Chapter 6): a point $x$ is a Lebesgue point of $f$ if not only do the averages of $f$ converge to $f(x)$, but the averages of $|f - f(x)|$ also converge to zero. This stronger property holds at $\mathcal{L}^n$-almost every point and gives a precise sense in which a measurable function is "approximately continuous." Second, the $s$-dimensional density of Hausdorff measure, which will appear in Chapter 10, is a ball ratio of the same kind: the limit as $r \to 0$ of $\mathcal{H}^s(E \cap B(x,r)) / r^s$, up to a normalizing constant, with the scale factor $r^s$ playing the role of $\mu(B(x,r))$. It is not literally a derivative with respect to $\mathcal{H}^s$, since for $s < n$ every ball has infinite $\mathcal{H}^s$-measure, but the covering arguments of this chapter carry over. The density theory for rectifiable sets in GMT II and III relies on these ratios in an essential way.
The Besicovitch covering theorem, which powered the differentiation theory here, will continue to appear throughout. Whenever we need to convert local information (properties of a measure at small scales) into global conclusions about the measure, Besicovitch provides the key: it extracts efficient covers whose mass can be summed with bounded overlap, without any doubling assumption on the ambient measure. This robustness under non-doubling conditions is what makes the theory applicable to the singular and fractal measures that arise naturally in geometric measure theory.
---
Having defined differentiation for measures, we now use it to study the pointwise behavior of functions. Lebesgue points show that a locally integrable function exhibits 'typical' averaging behavior at almost every point, while approximate continuity extends classical continuity to the measure-theoretic setting.
# 6. Lebesgue Points and Approximate Continuity
Suppose you are handed a locally integrable function $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ and asked a seemingly simple question: at a given point $x$, what is $f(x)$? The difficulty is that $f$ is only defined up to sets of measure zero, so there is no canonical value at any individual point. Two functions that agree $\mathcal{L}^n$-almost everywhere are indistinguishable as elements of $L^1_{\mathrm{loc}}$, yet we frequently need to speak about pointwise values — in PDEs, in geometric problems, in regularity theory. The naive answer, "choose any representative," is unsatisfying because it offers no canonical choice and no geometric meaning.
What we want instead is a procedure that reads $f(x)$ off from the surrounding values of $f$ in a way that is intrinsic and works at almost every point simultaneously. The local average $\frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} f \, d\mathcal{L}^n$ is the natural candidate: it aggregates information from a neighborhood and collapses it to a single number. The Lebesgue differentiation theorem says this procedure succeeds almost everywhere — the local averages converge to $f(x)$ at $\mathcal{L}^n$-almost every $x$.
But the Lebesgue differentiation theorem is only the beginning. Knowing that averages of $f$ converge to $f(x)$ is weaker than knowing that $f(y)$ is close to $f(x)$ for typical $y$ near $x$. The stronger statement — that the average of $|f(y) - f(x)|$ tends to zero — defines what it means for $x$ to be a **Lebesgue point**. Almost every point turns out to be a Lebesgue point, and this finer fact is what underpins regularity theory, the theory of sets of finite perimeter, and much of the GMT machinery to come.
Beyond Lebesgue points sits a deeper question about continuity. An arbitrary measurable function can be discontinuous everywhere: the Dirichlet function $\mathbb{1}_{\mathbb{Q}}$ has no point of continuity on $\mathbb{R}$. Ordinary continuity is simply too rigid a notion for the class of measurable functions. The right substitute — one that is satisfied almost everywhere by every locally integrable function — is **approximate continuity**, which requires that $f(y) \approx f(x)$ not for all $y$ near $x$ but for the overwhelming majority of them, in the sense of density.
[example: The Dirichlet Function and the Failure of Pointwise Continuity]
Consider $f = \mathbb{1}_{\mathbb{Q}} : \mathbb{R} \to \{0, 1\}$, the indicator function of the rationals. This function is nowhere continuous: at every irrational $x_0$, every interval $(x_0 - r, x_0 + r)$ contains rationals where $f = 1$, so $f$ does not converge to $f(x_0) = 0$ along rational sequences. At every rational $x_0$, every interval contains irrationals where $f = 0$, so $f$ does not converge to $f(x_0) = 1$ along irrational sequences.
Despite this, the local averages of $f$ behave perfectly. For any $x \in \mathbb{R}$ and any interval $(x - r, x + r)$, the rationals in this interval form a set of Lebesgue measure zero, so
\begin{align*}
\frac{1}{2r} \int_{x-r}^{x+r} f(y) \, d\mathcal{L}^1(y) = \frac{\mathcal{L}^1(\mathbb{Q} \cap (x-r, x+r))}{2r} = 0.
\end{align*}
Thus the averages converge to $0$ at every point. Every irrational $x$, where $f(x) = 0$, is a Lebesgue point, since $|f(y) - 0| = f(y)$ has zero average over every interval. At rationals, where $f(x) = 1$, the averages converge to $0 \neq 1$: no rational is a Lebesgue point. The set of non-Lebesgue points is exactly $\mathbb{Q}$, which has $\mathcal{L}^1$-measure zero. This is consistent with the theorem that almost every point is a Lebesgue point.
The same calculation shows that $f$ is approximately continuous at every irrational: the set $\{y \in (x-r, x+r) : |f(y) - 0| \geq \varepsilon\}$ is contained in $\mathbb{Q}$ and so has measure zero for any $\varepsilon > 0$, and measure-zero sets have density zero. At rationals, $f$ is not approximately continuous: for $\varepsilon \leq 1$ the deviation set $\{y : |f(y) - 1| \geq \varepsilon\}$ contains every irrational and so has density one.
[/example]
## The Lebesgue Differentiation Theorem
How do we know that local averages of a locally integrable function converge almost everywhere to the function itself? The question is nontrivial: the function could oscillate wildly, concentrating mass in complicated patterns. What prevents the averages from oscillating as well?
The answer comes from the differentiation theory for Radon measures developed in Chapter 5. When $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, we can associate to $f$ the signed Radon measure $\nu$ defined by $\nu(A) = \int_A f \, d\mathcal{L}^n$ for every Borel set $A$. This measure is absolutely continuous with respect to Lebesgue measure: $\nu \ll \mathcal{L}^n$. The local average of $f$ over a ball $B(x, r)$ is then nothing but the ratio $\nu(B(x,r)) / \mathcal{L}^n(B(x,r))$, which is precisely the quantity whose limit defines the derivative $D_{\mathcal{L}^n} \nu(x)$. The differentiation theorem for Radon measures therefore directly yields the Lebesgue differentiation theorem.
[quotetheorem:74]
The hypothesis $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ is essentially optimal: local integrability is exactly what is needed for the average over every ball to be defined and finite. If $f \in L^p_{\mathrm{loc}}(\mathbb{R}^n)$ for $p > 1$, the same conclusion holds with the same proof, since $L^p_{\mathrm{loc}} \subset L^1_{\mathrm{loc}}$ by Hölder's inequality. The almost-everywhere qualifier cannot be removed: the Dirichlet function example above shows that the averages can fail to converge to $f(x)$ on a dense set of points (the rationals), even though this set has measure zero.
The statement is phrased for balls $B(x, r)$, but the result holds for any "regular" family of sets shrinking to $x$, provided they do not become too thin. More precisely, one can replace $B(x, r)$ with any measurable set $E_r \subset B(x, r)$ satisfying $\mathcal{L}^n(E_r) \geq c \, \mathcal{L}^n(B(x, r))$ for some constant $c > 0$ independent of $r$. Sets satisfying this are said to shrink to $x$ *regularly*. Cubes shrink regularly, as do rectangles whose eccentricity stays bounded. Thin rectangles with rapidly increasing eccentricity do not, and one can construct functions in $L^1_{\mathrm{loc}}(\mathbb{R}^2)$ for which the averages over such thin rectangles fail to converge almost everywhere.
The Lebesgue differentiation theorem connects back to the differentiation theory for Radon measures through the Radon–Nikodym theorem. If $\nu = f \, d\mathcal{L}^n$, then $f$ is the Radon–Nikodym derivative $d\nu/d\mathcal{L}^n$. The differentiation theorem says this abstract derivative can be computed as a limit of concrete geometric averages.
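To see the theorem at work on an unbounded function, consider $f(y) = |y|^{-1/2}$ on $\mathbb{R}$, which is locally integrable but blows up at the origin. This function is a hypothetical test case chosen for the sketch, not an example from the text; its antiderivative is explicit, so the ball averages can be evaluated without numerical integration. The names `antiderivative` and `ball_average` are illustrative.

```python
import math

def antiderivative(y):
    """An antiderivative of |y|**-0.5, namely 2 * sign(y) * sqrt(|y|)."""
    return math.copysign(2.0 * math.sqrt(abs(y)), y)

def ball_average(x, r):
    """Average of f(y) = |y|**-0.5 over (x - r, x + r), from the antiderivative."""
    return (antiderivative(x + r) - antiderivative(x - r)) / (2.0 * r)

radii = [10.0 ** -k for k in range(1, 6)]
at_one = [ball_average(1.0, r) for r in radii]    # f is continuous near x = 1, f(1) = 1
at_zero = [ball_average(0.0, r) for r in radii]   # f is unbounded near x = 0
print(at_one)
print(at_zero)
```

The averages at $x = 1$ approach $f(1) = 1$, while the averages at $x = 0$ equal $2/\sqrt{r}$ and diverge. The origin therefore belongs to the exceptional null set no matter which finite value is assigned to $f(0)$.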
## Lebesgue Points
The Lebesgue differentiation theorem tells us that the average of $f$ over $B(x, r)$ converges to $f(x)$. But this is a statement about integrals of $f$, not about $f$ itself. A much stronger statement — that $f$ is well-approximated by the constant $f(x)$ in average norm, not just in value — defines the notion of a Lebesgue point.
Why does this distinction matter? In applications to PDEs and geometric measure theory, we often need to know not just that $f(x)$ is recovered by averaging, but that the function is well-concentrated near $f(x)$ in a neighborhood of $x$. For instance, the theory of functions of bounded variation and sets of finite perimeter requires identifying the precise representative of a function by a local averaging procedure. The Lebesgue point condition provides exactly this canonical representative.
[definition: Lebesgue Point]
Let $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$. A point $x \in \mathbb{R}^n$ is called a **Lebesgue point** of $f$ if
\begin{align*}
\lim_{r \to 0} \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |f(y) - f(x)| \, d\mathcal{L}^n(y) = 0.
\end{align*}
[/definition]
The quantity $\frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |f(y) - f(x)| \, d\mathcal{L}^n(y)$ measures how much $f$ varies from its value at $x$ on average over the ball $B(x, r)$. For a continuous function, this tends to zero at every point, because $\sup_{y \in B(x,r)} |f(y) - f(x)| \to 0$ as $r \to 0$. For a general locally integrable function, continuity is too much to ask, but the Lebesgue point condition says the average deviation tends to zero even without pointwise convergence.
[remark: Lebesgue Points Are Stronger Than Differentiation]
Note that the Lebesgue point condition implies the Lebesgue differentiation property. If $x$ is a Lebesgue point of $f$, then
\begin{align*}
\left| \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} f(y) \, d\mathcal{L}^n(y) - f(x) \right| \leq \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |f(y) - f(x)| \, d\mathcal{L}^n(y) \to 0.
\end{align*}
The Lebesgue point condition requires the averages of $|f(\cdot) - f(x)|$ to vanish, while the differentiation theorem only requires the averages of $f(\cdot) - f(x)$ (without absolute value) to vanish. For an oscillating function that is positive on half of each small ball and negative on the other half, the signed averages could cancel while the absolute averages do not.
[/remark]
The fundamental theorem on Lebesgue points is that almost every point has this stronger property.
[quotetheorem:2975]
The proof of this theorem reduces to the Lebesgue differentiation theorem by a clever rational approximation argument. Fix any rational number $q \in \mathbb{Q}$. The function $g_q(y) = |f(y) - q|$ is also locally integrable, so the differentiation theorem applies to $g_q$: for $\mathcal{L}^n$-almost every $x$,
\begin{align*}
\lim_{r \to 0} \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |f(y) - q| \, d\mathcal{L}^n(y) = |f(x) - q|.
\end{align*}
Let $N_q$ denote the null set of $x$ where this fails. The union $N = \bigcup_{q \in \mathbb{Q}} N_q$ is a countable union of null sets, hence itself a null set. For any $x \notin N$ and any $\varepsilon > 0$, choose a rational $q$ with $|f(x) - q| < \varepsilon/2$. The triangle inequality $|f(y) - f(x)| \leq |f(y) - q| + |q - f(x)|$ then gives
\begin{align*}
\limsup_{r \to 0} \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |f(y) - f(x)| \, d\mathcal{L}^n(y) &\leq \lim_{r \to 0} \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |f(y) - q| \, d\mathcal{L}^n(y) + |q - f(x)| \\
&= |f(x) - q| + |q - f(x)| < \varepsilon.
\end{align*}
Since $\varepsilon > 0$ was arbitrary, $x$ is a Lebesgue point of $f$.
[explanation: The Role of Rational Approximation]
The rational approximation step in the proof is doing something subtle and important. We cannot directly apply the differentiation theorem to the function $y \mapsto |f(y) - f(x)|$ and conclude that its averages at $x$ converge to $|f(x) - f(x)| = 0$, because the function being averaged itself depends on $x$. For each fixed $x$, the theorem yields convergence of the averages of $g_x = |f(\cdot) - f(x)|$ around every point outside a null set $N_x$, but nothing guarantees that $x \notin N_x$, and the uncountable union of the sets $N_x$ need not be null. Instead, we need a statement that works simultaneously for all $x$ outside a single null set.
The trick is to reduce to the countably many rational approximations $q \in \mathbb{Q}$. For each fixed $q$, the differentiation theorem gives a null set $N_q$ outside which averages of $|f(\cdot) - q|$ converge to $|f(\cdot) - q|$. Taking the countable union $N = \bigcup_{q \in \mathbb{Q}} N_q$ handles all rationals simultaneously. The fact that $\mathbb{Q}$ is dense in $\mathbb{R}$ then lets us transfer from rational approximation to the actual value $f(x)$ by a triangle inequality argument.
This pattern, reducing a continuum of conditions to countably many by density, recurs throughout measure theory. The same idea underlies the measurability of maximal functions, where a supremum over all radii is replaced by a supremum over rational radii, and it appears again in the proof that approximately continuous functions are measurable.
[/explanation]
The set of Lebesgue points of $f$ is called the **Lebesgue set** of $f$. Its complement has $\mathcal{L}^n$-measure zero. At every point of the Lebesgue set, the value $f(x)$ is uniquely determined by the local averaging procedure: if we define $\tilde{f}(x) = \lim_{r \to 0} \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} f \, d\mathcal{L}^n$ wherever this limit exists, then $\tilde{f}(x) = f(x)$ at every Lebesgue point. This gives a canonical **precise representative** of $f$ that is defined pointwise, not just almost everywhere.
[example: Lebesgue Points of a Jump Function]
Let $n = 1$ and consider $f = \mathbb{1}_{[0, \infty)} : \mathbb{R} \to \{0, 1\}$, the Heaviside function. For $x > 0$ and $r < x$, the entire ball $(x - r, x + r)$ lies in $(0, \infty)$, so $f \equiv 1$ there and
\begin{align*}
\frac{1}{2r} \int_{x-r}^{x+r} |f(y) - 1| \, d\mathcal{L}^1(y) = 0.
\end{align*}
So every $x > 0$ is a Lebesgue point with value $1$.
By the same argument, every $x < 0$ is a Lebesgue point with value $0$.
At $x = 0$, the ball $(-r, r)$ is split evenly: $f = 0$ on $(-r, 0)$ and $f = 1$ on $(0, r)$. The local average is
\begin{align*}
\frac{1}{2r} \int_{-r}^{r} f(y) \, d\mathcal{L}^1(y) = \frac{1}{2r} \cdot r = \frac{1}{2}.
\end{align*}
The averages converge to $1/2$, not to $f(0) = 1$ (nor to $0$). Computing the Lebesgue point condition at $x = 0$ with $f(0) = 1$:
\begin{align*}
\frac{1}{2r} \int_{-r}^{r} |f(y) - 1| \, d\mathcal{L}^1(y) = \frac{1}{2r} \int_{-r}^{0} 1 \, d\mathcal{L}^1(y) = \frac{1}{2} \not\to 0.
\end{align*}
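So $x = 0$ is not a Lebesgue point of $f$ with the value $f(0) = 1$. Redefining $f(0)$ does not help: for any value $c$, the average of $|f(y) - c|$ over $(-r, r)$ equals $(|c| + |1 - c|)/2 \geq 1/2$. The set of non-Lebesgue points is $\{0\}$, a single point, which indeed has $\mathcal{L}^1$-measure zero.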
The canonical representative $\tilde{f}$ satisfies $\tilde{f}(0) = 1/2$, the average of the left and right limits. This is the standard "regularization" of jump discontinuities in one variable: the precise representative assigns the average value at jump points.
[/example]
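The computations in this example are easy to reproduce numerically. The sketch below estimates the average of $|f(y) - c|$ by a midpoint rule; the function name `mean_deviation`, the sample count, and the choice of test radii are assumptions made for illustration.

```python
def heaviside(y):
    """Indicator of [0, infinity)."""
    return 1.0 if y >= 0 else 0.0

def mean_deviation(f, x, r, value, samples=10_000):
    """Midpoint-rule estimate of the average of |f(y) - value| over (x - r, x + r)."""
    step = 2.0 * r / samples
    total = sum(abs(f(x - r + (i + 0.5) * step) - value) for i in range(samples))
    return total / samples

radii = (1e-1, 1e-2, 1e-3)
# At the jump x = 0, try the candidate values c = 0, 1/2, 1.
at_jump = {r: [mean_deviation(heaviside, 0.0, r, c) for c in (0.0, 0.5, 1.0)] for r in radii}
# At x = 0.3 the function is locally constant equal to 1.
away = {r: mean_deviation(heaviside, 0.3, r, 1.0) for r in radii}
print(at_jump)
print(away)
```

At the jump the mean deviation stays at $1/2$ for each candidate value and every radius, while away from the jump it vanishes as soon as the ball avoids the origin.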
## Approximate Limits and Approximate Continuity
A measurable function can fail to be continuous at every point: the Dirichlet function is the standard example. But we have seen from Lusin's theorem (Chapter 2) that measurable functions are "nearly continuous" in the sense of being continuous on compact sets that capture most of the measure. This chapter's subject offers a complementary, and in some ways deeper, version of the same idea: approximate continuity, which holds at almost every point, not just on a large set.
The motivation for the definition comes from thinking about what it means for $f$ to be continuous at $x$ in terms of the distribution of values of $f$ near $x$. Ordinary continuity says: for every $\varepsilon > 0$, the set $\{y : |f(y) - f(x)| \geq \varepsilon\}$ is bounded away from $x$, meaning it does not cluster at $x$. This is an extremely rigid condition. Approximate continuity relaxes it: instead of requiring that the "bad set" stays away from $x$, we only require that it has density zero at $x$.
[definition: Density of a Set at a Point]
Let $E \subset \mathbb{R}^n$ be a measurable set and $x \in \mathbb{R}^n$. The **density of $E$ at $x$** is defined as
\begin{align*}
d(E, x) = \lim_{r \to 0} \frac{\mathcal{L}^n(E \cap B(x,r))}{\mathcal{L}^n(B(x,r))},
\end{align*}
whenever this limit exists. If $d(E, x) = 1$, we say $x$ is a **density point** of $E$.
[/definition]
By the Lebesgue density theorem (the special case of the Lebesgue differentiation theorem applied to $f = \mathbb{1}_E$), almost every point of a measurable set $E$ is a density point of $E$, and almost every point of $E^c$ is a density point of $E^c$. The density captures the "typical neighborhood" of a point: near a density point of $E$, the ball $B(x, r)$ is essentially filled with points of $E$.
<!-- illustration-needed: A measurable set E in the plane with a density point x at the center — show concentric balls shrinking to x, each filled almost completely with E (shaded region), so that the fraction of unshaded area tends to zero. Contrast with a non-density point where a fixed fraction remains unshaded. -->
[definition: Approximate Limit]
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a measurable function and $x \in \mathbb{R}^n$. We say $f$ has **approximate limit** $L$ at $x$, written $\operatorname{ap-lim}_{y \to x} f(y) = L$, if for every $\varepsilon > 0$,
\begin{align*}
\lim_{r \to 0} \frac{\mathcal{L}^n\!\left(\{y \in B(x,r) : |f(y) - L| \geq \varepsilon\}\right)}{\mathcal{L}^n(B(x,r))} = 0.
\end{align*}
Equivalently, the set $\{y : |f(y) - L| \geq \varepsilon\}$ has density zero at $x$, meaning $x$ is a density point of its complement $\{y : |f(y) - L| < \varepsilon\}$.
[/definition]
The approximate limit, when it exists, is unique. If $L_1$ and $L_2$ are both approximate limits of $f$ at $x$, then for $\varepsilon = |L_1 - L_2|/2 > 0$, the sets $\{|f - L_1| < \varepsilon\}$ and $\{|f - L_2| < \varepsilon\}$ are disjoint (since no value can simultaneously be within $\varepsilon$ of both $L_1$ and $L_2$). Both have density $1$ at $x$, so their intersection also has density $1$ at $x$ — but their intersection is empty, a contradiction. Therefore $L_1 = L_2$.
[definition: Approximate Continuity]
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a measurable function. We say $f$ is **approximately continuous** at $x$ if the approximate limit of $f$ at $x$ exists and equals $f(x)$:
\begin{align*}
\operatorname{ap-lim}_{y \to x} f(y) = f(x).
\end{align*}
[/definition]
The connection between Lebesgue points and approximate continuity is immediate. If $x$ is a Lebesgue point of $f$, then for every $\varepsilon > 0$,
\begin{align*}
\frac{\mathcal{L}^n(\{y \in B(x,r) : |f(y) - f(x)| \geq \varepsilon\})}{\mathcal{L}^n(B(x,r))} &\leq \frac{1}{\varepsilon} \cdot \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |f(y) - f(x)| \, d\mathcal{L}^n(y) \to 0,
\end{align*}
where we used the Markov inequality to bound the measure of the set $\{|f(\cdot) - f(x)| \geq \varepsilon\}$ by the integral of $|f(\cdot) - f(x)|/\varepsilon$. This shows:
[quotetheorem:2976]
This theorem is remarkable: every locally integrable function, no matter how wild its pointwise behavior, is approximately continuous almost everywhere. The Dirichlet function is approximately continuous at every irrational (where it equals $0$ and the rationals, which form its deviation set, have density zero). The Heaviside function is approximately continuous at every point except $0$, where the deviation set has density $1/2$ in every neighborhood.
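The hypothesis of local integrability can be weakened to measurability: any measurable function $f : \mathbb{R}^n \to \mathbb{R}$ is approximately continuous at $\mathcal{L}^n$-almost every point. Integrability is needed for Lebesgue points, which involve integrals, but the definition of approximate continuity only involves measures of deviation sets. The proof proceeds by truncation: each truncated function $f_k = \max(-k, \min(f, k))$ is bounded, hence locally integrable and approximately continuous almost everywhere, and at a point $x$ with $|f(x)| < k$ the deviation sets of $f$ and $f_k$ coincide for all $\varepsilon < k - |f(x)|$.
A partial converse holds: if $f$ is locally integrable and bounded in a neighborhood of $x$, and $f$ is approximately continuous at $x$, then $x$ is a Lebesgue point of $f$. Local boundedness cannot be dropped, since tall spikes supported on sets of density zero at $x$ can keep the averages of $|f(y) - f(x)|$ away from zero without affecting approximate continuity. For locally bounded functions, Lebesgue points and points of approximate continuity therefore coincide; for general $L^1_{\mathrm{loc}}$ functions the two sets agree up to an $\mathcal{L}^n$-null set, since both have null complement.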
[example: Approximate Continuity of the Characteristic Function of a Smooth Set]
Let $E = \{x \in \mathbb{R}^n : x_1 > 0\}$ be the open half-space, and let $f = \mathbb{1}_E$. Consider the behavior at a boundary point $x = 0$.
For any $r > 0$, the ball $B(0, r)$ is divided by the hyperplane $\{x_1 = 0\}$ into equal halves: the half-space intersection $E \cap B(0, r)$ has $\mathcal{L}^n$-measure exactly $\mathcal{L}^n(B(0, r))/2$ (by symmetry, since the hyperplane divides the ball in half). Therefore the deviation set at $x = 0$ with value $L = 1$ is
\begin{align*}
\{y \in B(0, r) : |f(y) - 1| \geq 1/2\} = B(0,r) \setminus E,
\end{align*}
which has measure $\mathcal{L}^n(B(0,r))/2$. The density ratio is therefore $1/2$ for every $r$ and does not tend to zero, so $1$ is not an approximate limit of $f$ at $0$.
With value $L = 0$: the deviation set is $E \cap B(0, r)$, again of measure $\mathcal{L}^n(B(0,r))/2$, so $0$ is not an approximate limit either. Since $f(0) = 0$, $f$ is not approximately continuous at $0$, and redefining $f(0)$ as $1$ would not help.
For interior points $x \in E$ with $x_1 > 0$: choose $r < x_1$, so $B(x, r) \subset E$ and $f \equiv 1$ on $B(x,r)$. The deviation set for any $\varepsilon < 1$ is empty, so $f$ is approximately continuous at $x$ with value $1$. By the same argument, $f$ is approximately continuous at all interior points of $E^c$.
The boundary $\{x_1 = 0\}$ is a null set, consistent with the theorem that approximate continuity holds almost everywhere.
[/example]
## Comparison with Lusin's Theorem
Two theorems assert that measurable functions are "nearly continuous": Lusin's theorem from Chapter 2 and the approximate continuity theorem proved here. They are logically distinct and measure different aspects of the same phenomenon.
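Lusin's theorem says: for every measurable set $A$ with $\mathcal{L}^n(A) < \infty$ and every $\varepsilon > 0$, there exists a compact set $K \subset A$ with $\mathcal{L}^n(A \setminus K) < \varepsilon$ such that $f|_K$ is continuous. (On all of $\mathbb{R}^n$ the set $K$ can only be taken closed, since the complement of a compact set has infinite measure.) The function is genuinely continuous (in the usual sense, relative to the subspace topology on $K$) on a set that is large in measure.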
Approximate continuity says: at $\mathcal{L}^n$-almost every $x$, the function $f$ satisfies the approximate continuity condition at $x$. This is a pointwise statement, valid at almost every individual point.
[explanation: How the Two Theorems Relate]
Neither theorem implies the other directly. Lusin's theorem gives genuine continuity on a large set but says nothing about individual points outside that set. The approximate continuity theorem gives a weaker form of continuity (density-zero deviations rather than pointwise convergence) at almost every point, including points that lie outside Lusin's compact set.
There is a precise connection, however. A classical theorem of Stepanov states that $f$ is approximately continuous at $x$ if and only if there is a measurable set $A \ni x$ with density $1$ at $x$ such that $f|_A$ is continuous at $x$. In particular, if $K$ is the Lusin compact set and $x \in K$ is a density point of $K$ (which holds for $\mathcal{L}^n$-almost every $x \in K$ by the Lebesgue density theorem), then $f$ is approximately continuous at $x$, because $f|_K$ is continuous.
The complementary roles of the two theorems become clear in GMT. Lusin's theorem is used to approximate BV functions by smooth ones and to pass from measurable functions to continuous ones in compactness arguments. Approximate continuity is used to define the precise representative of a BV function (or a Sobolev function) at individual points, which is essential when discussing jump sets and the structure of singularities.
[/explanation]
[quotetheorem:2977]
The forward direction is the approximate continuity theorem proved above. For the backward direction: if $f$ is approximately continuous almost everywhere, then $f$ is the pointwise limit of a sequence of continuous functions almost everywhere, which implies measurability. (The argument uses the fact that the set of points of approximate continuity can be expressed as a measurable set, and that $f$ restricted to this set is measurable by the converse of Lusin's theorem.)
The hypothesis of measurability is not merely technical. A non-measurable function constructed via the axiom of choice can fail to be approximately continuous at every point, since the density condition involves measure-theoretic data. The equivalence theorem shows that approximate continuity captures exactly the measure-theoretic regularity of functions, just as ordinary continuity captures topological regularity.
## Precise Representatives and the Lebesgue Set
The discussion of Lebesgue points leads naturally to the problem of choosing canonical representatives. Two locally integrable functions $f$ and $g$ that agree $\mathcal{L}^n$-almost everywhere are the same element of $L^1_{\mathrm{loc}}(\mathbb{R}^n)$, but they can differ on arbitrary null sets. In many applications — particularly in the theory of Sobolev spaces and BV functions — we need a canonical pointwise representative that is not just an equivalence class.
The Lebesgue point machinery provides this. Define the **precise representative** $\hat{f}$ of $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ by
\begin{align*}
\hat{f}(x) = \lim_{r \to 0} \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} f(y) \, d\mathcal{L}^n(y)
\end{align*}
wherever this limit exists, and $\hat{f}(x) = 0$ (or any fixed default value) where it does not. By the Lebesgue differentiation theorem, $\hat{f} = f$ $\mathcal{L}^n$-almost everywhere, so $\hat{f}$ and $f$ represent the same $L^1_{\mathrm{loc}}$ element. But $\hat{f}$ is defined pointwise in a canonical way.
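The defining limit can be explored numerically in one dimension. The following is a minimal sketch, not part of the theory: the helper names `ball_average` and `heaviside` are hypothetical, and the integral is approximated by a midpoint rule. It evaluates the averages of the Heaviside function over shrinking intervals $(x - r, x + r)$.

```python
def ball_average(f, x: float, r: float, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the average of f over (x - r, x + r)."""
    h = 2.0 * r / n
    return sum(f(x - r + (i + 0.5) * h) for i in range(n)) * h / (2.0 * r)


def heaviside(y: float) -> float:
    """Indicator of the open half-line (0, infinity)."""
    return 1.0 if y > 0 else 0.0


radii = [10.0 ** (-k) for k in range(1, 5)]
# Averages centred at the jump: half of each interval lies in {y > 0}.
at_jump = [ball_average(heaviside, 0.0, r) for r in radii]
# Averages centred at an interior point: every interval lies in {y > 0}.
at_interior = [ball_average(heaviside, 0.3, r) for r in radii]
```

The averages at the jump stay near $1/2$ for every radius, while those at $x = 0.3$ stay near $1$, so under this sketch the precise representative takes the value $1/2$ at the jump and the usual value elsewhere.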
[remark: Precise Representatives for Sobolev Functions]
For Sobolev functions $u \in W^{1,p}(\Omega)$ with $p > n$, the Morrey embedding theorem implies that $u$ has a Hölder continuous representative, so the precise representative issue is resolved by continuity. For $p \leq n$, Sobolev functions need not be continuous, and the precise representative $\hat{u}$ is the natural choice: it is defined at every Lebesgue point (a set of full measure), it represents the same $W^{1,p}$ function as $u$, and it is the "correct" value in the sense that $\hat{u}(x)$ is what the function is doing near $x$ on average. The trace theorem, which defines boundary values of Sobolev functions, uses this precise representative implicitly.
[/remark]
The connection between Lebesgue points and precise representatives becomes more subtle for functions of bounded variation. A BV function $f \in BV(\Omega)$ has distributional derivatives that are Radon measures rather than $L^1$ functions, and the function itself can have jump discontinuities along hypersurfaces. The precise representative $\hat{f}$ coincides with the ordinary values of $f$ at Lebesgue points (which form a set of full measure). At jump points, which form a set of $\mathcal{L}^n$-measure zero that can nevertheless have positive $\mathcal{H}^{n-1}$-measure, the local averages converge to the average $\frac{1}{2}(f^+ + f^-)$ of the one-sided limits. This is exactly what one sees in the Heaviside function example computed earlier.
[quotetheorem:3035]
The $L^p$ version of the Lebesgue point condition follows from applying the theorem for $L^1_{\mathrm{loc}}$ to the function $|f(\cdot) - q|^p$ for each rational $q$, and repeating the rational approximation argument. The key step uses the inequality $|a - b|^p \leq 2^{p-1}(|a - q|^p + |b - q|^p)$ to reduce averages of $|f(\cdot) - f(x)|^p$ to averages of $|f(\cdot) - q|^p$.
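Spelled out for a point $x$ and a rational $q$, the reduction reads as follows (a sketch, assuming $p \geq 1$ so that the convexity inequality applies). Taking $a = f(y)$ and $b = f(x)$ and averaging over $B(x,r)$,
\begin{align*}
\frac{1}{\mathcal{L}^n(B(x,r))}\int_{B(x,r)} |f(y) - f(x)|^p \, d\mathcal{L}^n(y) \leq 2^{p-1}\left(\frac{1}{\mathcal{L}^n(B(x,r))}\int_{B(x,r)} |f(y) - q|^p \, d\mathcal{L}^n(y) + |f(x) - q|^p\right).
\end{align*}
At a point $x$ where the differentiation theorem holds for every function $|f(\cdot) - q|^p$ with $q$ rational, the average on the right tends to $|f(x) - q|^p$ as $r \to 0$. The limsup of the left side is therefore at most $2^p |f(x) - q|^p$, which can be made arbitrarily small by choosing $q$ close to $f(x)$.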
The hypothesis that $f \in L^p_{\mathrm{loc}}$ rather than just $L^1_{\mathrm{loc}}$ ensures that $|f(\cdot) - q|^p$ is locally integrable, which is necessary for the differentiation theorem to apply. For $f \in L^1_{\mathrm{loc}}$ but $f \notin L^p_{\mathrm{loc}}$, the $L^p$ Lebesgue point condition fails at every point $x$ such that $|f|^p$ is not integrable on any ball around $x$, because the averages there are infinite for all $r$. This set of points is nonempty, and it need not be $\mathcal{L}^n$-null: if the singularities of $f$ are dense, the condition fails everywhere.
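A one-dimensional illustration (an example chosen here, not taken from the chapter): for $p > 1$ let $f(y) = |y|^{-1/p}$ for $y \neq 0$ and $f(0) = 0$. Then $f \in L^1_{\mathrm{loc}}(\mathbb{R})$ but $f \notin L^p_{\mathrm{loc}}(\mathbb{R})$, since for every $r > 0$
\begin{align*}
\int_{-r}^{r} |y|^{-1/p} \, dy = \frac{2p}{p-1} \, r^{1 - 1/p} < \infty, \qquad \int_{-r}^{r} |f(y) - f(0)|^p \, dy = \int_{-r}^{r} \frac{dy}{|y|} = \infty.
\end{align*}
The $L^p$ averages at $x = 0$ are infinite for every radius, so the $L^p$ Lebesgue point condition fails there, even though the $L^1$ theory applies to $f$ without difficulty.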
The $L^p$ version is relevant because it underpins the definition of Lebesgue points for Sobolev functions: if $u \in W^{1,p}(\Omega)$, then the $L^p$ Lebesgue point condition holds almost everywhere for $u$ and for all its first-order weak derivatives $D^\alpha u$ with $|\alpha| = 1$. This provides canonical pointwise values for Sobolev functions and their derivatives, which is essential in the theory of capacities and fine topology associated to Sobolev spaces.
## Connection to Differentiation Theory
The Lebesgue differentiation theorem and the theory of Lebesgue points are particular applications of the general differentiation theory for Radon measures developed in Chapter 5. It is instructive to see exactly how they fit into that framework, since the general picture illuminates both results and points toward applications in GMT.
Given $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, define $\nu$ by $\nu(A) = \int_A f \, d\mathcal{L}^n$ for bounded Borel sets $A$. When $f \geq 0$ this is a Radon measure, and in general it is the difference of the Radon measures associated with $f^+$ and $f^-$. Then $\nu \ll \mathcal{L}^n$: absolute continuity is immediate, since $\mathcal{L}^n(A) = 0$ implies $\int_A |f| \, d\mathcal{L}^n = 0$. The Radon–Nikodym derivative satisfies $d\nu/d\mathcal{L}^n = f$.
The derivative $D_{\mathcal{L}^n} \nu(x)$ is defined as $\lim_{r \to 0} \nu(B(x,r)) / \mathcal{L}^n(B(x,r))$, which equals $\lim_{r \to 0} \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} f \, d\mathcal{L}^n$. By the differentiation theorem for Radon measures (proved using the Besicovitch covering theorem), this derivative exists and equals $f(x)$ for $\mathcal{L}^n$-almost every $x$. This is exactly the Lebesgue differentiation theorem.
[remark: Why Besicovitch and Not Vitali?]
The general differentiation theorem for Radon measures uses the Besicovitch covering theorem rather than Vitali's covering theorem (which involves a 5-fold dilation of balls). The reason is that Vitali's theorem covers with the enlarged balls $5B$ rather than the selected balls $B$ themselves, so one must compare the measure of $5B$ with that of $B$ for the measure one differentiates against. For Lebesgue measure this is harmless, because $\mathcal{L}^n(5B) = 5^n \mathcal{L}^n(B)$, but a general Radon measure $\mu$ need not satisfy any bound of the form $\mu(5B) \leq C \mu(B)$. The Besicovitch theorem, which extracts a cover with bounded overlap without any dilation, needs no such doubling condition. For the Lebesgue differentiation theorem applied to $d\nu = f \, d\mathcal{L}^n$, both approaches work, but the Besicovitch approach generalizes to the setting of differentiating one Radon measure against another.
[/remark]
Now consider the Lebesgue decomposition: for a general Radon measure $\mu$ and the Lebesgue measure $\mathcal{L}^n$, write $\mu = \mu_{ac} + \mu_s$ where $\mu_{ac} \ll \mathcal{L}^n$ and $\mu_s \perp \mathcal{L}^n$. By Chapter 5, $D_{\mathcal{L}^n} \mu(x)$ exists and equals $d\mu_{ac}/d\mathcal{L}^n(x)$ for $\mathcal{L}^n$-almost every $x$. On the singular part: $D_{\mathcal{L}^n} \mu_s(x) = 0$ for $\mathcal{L}^n$-almost every $x$, reflecting the fact that $\mu_s$ is concentrated on a set of $\mathcal{L}^n$-measure zero and therefore invisible to $\mathcal{L}^n$-typical points.
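A one-dimensional illustration (an example chosen here, not taken from Chapter 5): let $\mu = \mathcal{L}^1 + \delta_0$ on $\mathbb{R}$, so that $\mu_{ac} = \mathcal{L}^1$ and $\mu_s = \delta_0$. With $B(x,r) = (x - r, x + r)$,
\begin{align*}
\frac{\mu(B(x,r))}{\mathcal{L}^1(B(x,r))} = 1 + \frac{\delta_0(B(x,r))}{2r} =
\begin{cases}
1, & |x| \geq r, \\
1 + \dfrac{1}{2r}, & |x| < r.
\end{cases}
\end{align*}
For $x \neq 0$ the ratio equals $1 = d\mu_{ac}/d\mathcal{L}^1(x)$ as soon as $r \leq |x|$, while at the single point $x = 0$, which carries all of $\mu_s$, it diverges to $+\infty$.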
The Lebesgue density theorem is the special case $\mu = \mathbb{1}_E \, d\mathcal{L}^n = \mathcal{L}^n|_E$: the density of $E$ at $x$ equals $1$ for $\mathcal{L}^n$-almost every $x \in E$ and $0$ for $\mathcal{L}^n$-almost every $x \notin E$. In terms of the precise representative: $\hat{\mathbb{1}}_E(x) = 1$ at density points of $E$ and $\hat{\mathbb{1}}_E(x) = 0$ at density points of $E^c$. The points where the density of $E$ is neither $0$ nor $1$ (in particular those where $\hat{\mathbb{1}}_E(x) \in (0, 1)$) make up the so-called **measure-theoretic boundary** of $E$. This set has $\mathcal{L}^n$-measure zero but can carry positive $\mathcal{H}^{n-1}$-measure. It is the fundamental object in the theory of sets of finite perimeter, where it plays the role that the topological boundary plays for smooth sets.
[quotetheorem:894]
The exceptional null set in the Lebesgue density theorem cannot be described topologically. Consider the "fat Cantor set" $C \subset \mathbb{R}$, obtained by removing from $[0,1]$ open intervals of total length $1/2$ while leaving a perfect nowhere-dense set of measure $1/2$. Every point of $C$ is a topological boundary point of $C$, yet by the density theorem almost every point of $C$ has density $1$. The topological boundary can therefore have positive $\mathcal{L}^n$-measure, whereas the set of points where the density is neither $0$ nor $1$ is always $\mathcal{L}^n$-null. That set can still have positive $\mathcal{H}^{n-1}$-measure, as the boundary hyperplane of a half-space (density $1/2$ at every point) shows. This connects directly to the study of sets of finite perimeter, where the surface measure is carried by the set of points with density strictly between $0$ and $1$.
The Lebesgue density theorem also has a fundamental consequence for the relationship between topological and measure-theoretic regularity of sets. A measurable set is called **density open** (or **approximately open**) if every point of the set is a density point. The collection of density-open sets forms a topology on $\mathbb{R}^n$, the **density topology**, which is strictly finer than the Euclidean topology: every Euclidean open set is density-open (interior points of open sets are density points), but not conversely. For example, the set of irrationals $E = \mathbb{R} \setminus \mathbb{Q}$ has full measure, so every one of its points is a density point, yet it has empty Euclidean interior. In the density topology, the continuous functions are exactly the approximately continuous functions, a precise formulation of the idea that approximate continuity is the "right" continuity for measure theory.
---
The Riesz representation theorem reveals the deep duality between measures and continuous linear functionals on spaces of continuous functions. This perspective shifts our focus from sets and their measures to the dual space of measures as a geometric object in its own right.
# 7. Riesz Representation Theorem
Every sophisticated construction in geometric measure theory faces the same foundational challenge: how do you know you are actually building a measure? When a geometer says "the surface area of a smooth hypersurface defines a measure," they mean something precise — that integration against the hypersurface is a bona fide Radon measure, not just a formal symbol. When a PDE analyst says "the perimeter of a set of finite perimeter is a measure," they are asserting that a distributional derivative is representable by a vector-valued Radon measure. These claims are not obvious. They require a theorem that connects the functional-analytic notion of a positive linear functional to the measure-theoretic notion of a Radon measure. That theorem is the Riesz Representation Theorem.
The theorem is elegant in its directionality. If you already have a Radon measure $\mu$, then integration against $\mu$ gives a positive linear functional on $C_c(\mathbb{R}^n)$: the map $f \mapsto \int_{\mathbb{R}^n} f \, d\mu$ is linear (by linearity of the integral) and positive (since $f \geq 0$ implies $\int f \, d\mu \geq 0$). The question the Riesz theorem answers is the converse: given any positive linear functional on $C_c(\mathbb{R}^n)$, is it necessarily integration against some Radon measure? The answer is yes, and the measure is unique. This identification between positive linear functionals and Radon measures is so complete that one can take it as an alternative definition of Radon measure — depending on whether one prefers to start from function spaces or from set functions.
[example: Integration Against a Discrete Measure]
Consider the linear functional $\Lambda: C_c(\mathbb{R}) \to \mathbb{R}$ defined by
\begin{align*}
\Lambda(f) = \sum_{k=1}^\infty \frac{1}{k^2} f\!\left(\frac{1}{k}\right).
\end{align*}
This is well defined and linear: since $f$ is bounded and $\sum_k 1/k^2 < \infty$, the series converges absolutely, and $\Lambda(\alpha f + \beta g) = \alpha \Lambda(f) + \beta \Lambda(g)$ follows by linearity of convergent series. It is positive: if $f \geq 0$, then every term $f(1/k) \geq 0$, so $\Lambda(f) \geq 0$. But what measure does it represent? The Riesz theorem guarantees a unique Radon measure $\mu$ such that $\Lambda(f) = \int_{\mathbb{R}} f \, d\mu$. We can identify $\mu$ directly: it must satisfy $\mu(\{1/k\}) = 1/k^2$ for each $k \geq 1$ and $\mu(A) = 0$ for any Borel set $A$ disjoint from $\{1/k : k \geq 1\}$. In other words,
\begin{align*}
\mu = \sum_{k=1}^\infty \frac{1}{k^2} \delta_{1/k},
\end{align*}
where $\delta_x$ denotes the Dirac mass at $x$. To verify this is Radon, note that $\mu$ is a finite Borel measure: $\mu(\mathbb{R}) = \sum_{k=1}^\infty 1/k^2 = \pi^2/6 < \infty$. In particular $\mu(K) < \infty$ for every compact $K \subset \mathbb{R}$, even when $K$ contains infinitely many of the points $1/k$ (as $[0,1]$ does), and a finite Borel measure on $\mathbb{R}$ is automatically inner and outer regular. The functional $\Lambda$ is, in terms of abstract functional analysis, a genuine element of the dual of $C_c(\mathbb{R})$, and the measure $\mu$ is its concrete realization.
[/example]
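The functional in this example can be evaluated numerically by truncating the series. The following is a minimal sketch: the names `tent`, `narrow_hat`, and `Lambda` are hypothetical, and the truncation after `terms` summands is an approximation whose error is of the order of the omitted tail.

```python
import math


def tent(x: float) -> float:
    """Test function in C_c(R): equals 1 on [0, 1], linear down to 0 at -1 and at 2."""
    if 0.0 <= x <= 1.0:
        return 1.0
    if -1.0 < x < 0.0:
        return 1.0 + x
    if 1.0 < x < 2.0:
        return 2.0 - x
    return 0.0


def narrow_hat(x: float) -> float:
    """Hat of height 1 centred at 1/2 with support [0.4, 0.6]; it sees only the atom at 1/2."""
    return max(0.0, 1.0 - 10.0 * abs(x - 0.5))


def Lambda(f, terms: int = 100_000) -> float:
    """Partial sum of sum_{k >= 1} f(1/k) / k^2."""
    return sum(f(1.0 / k) / (k * k) for k in range(1, terms + 1))


total_mass = Lambda(tent)        # tent = 1 at every 1/k, so this approximates mu(R)
mass_at_half = Lambda(narrow_hat)  # only k = 2 contributes: mu({1/2}) = 1/4
```

Testing against `tent` recovers the total mass $\pi^2/6$ up to the truncation error, and testing against a hat function that isolates the point $1/2$ recovers the weight $1/4$ of the atom there.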
This example illustrates the theorem's purpose: it translates between two languages — the language of linear functionals, which is natural for analysis, and the language of measures, which is natural for integration theory and geometry. In geometric measure theory, geometric objects speak the first language natively, and the Riesz theorem teaches us to translate.
## Positive Linear Functionals and the Representation Theorem
What does it mean for a linear functional to encode geometric information, and when can we extract a measure from it?
Before stating the theorem, we need a careful definition of the domain. The space $C_c(\mathbb{R}^n)$ consists of continuous functions with compact support, and it is the natural test function space for Radon measures: a Radon measure $\mu$ is finite on compact sets, so $\int_{\mathbb{R}^n} f \, d\mu$ is a well-defined finite number for every $f \in C_c(\mathbb{R}^n)$. The functional $f \mapsto \int f \, d\mu$ is therefore defined on exactly this space.
[definition: Positive Linear Functional on $C_c(\mathbb{R}^n)$]
A map $\Lambda: C_c(\mathbb{R}^n) \to \mathbb{R}$ is a **positive linear functional** if it is linear — meaning $\Lambda(\alpha f + \beta g) = \alpha \Lambda(f) + \beta \Lambda(g)$ for all $f, g \in C_c(\mathbb{R}^n)$ and $\alpha, \beta \in \mathbb{R}$ — and positive, meaning $\Lambda(f) \geq 0$ whenever $f \geq 0$.
[/definition]
Positivity alone implies a strong automatic continuity: if $f_k \to f$ uniformly and all $f_k$ (and hence $f$) have support in a fixed compact set $K$, then $\Lambda(f_k) \to \Lambda(f)$. To see this, fix $\varphi \in C_c(\mathbb{R}^n)$ with $0 \leq \varphi \leq 1$ and $\varphi = 1$ on $K$ (the indicator $\mathbf{1}_K$ itself is not continuous, so it is replaced by this continuous majorant). If $|f_k - f| \leq \varepsilon$, then $-\varepsilon \varphi \leq f_k - f \leq \varepsilon \varphi$, and positivity gives $|\Lambda(f_k) - \Lambda(f)| \leq \varepsilon \Lambda(\varphi)$. Positivity thus yields sequential continuity without any explicit boundedness assumption.
With this definition in place, the Riesz Representation Theorem takes its most fundamental form.
[quotetheorem:2979]
The proof constructs $\mu$ from $\Lambda$ in a way that mirrors the outer regularity condition for Radon measures. For an open set $U \subset \mathbb{R}^n$, define
\begin{align*}
\mu(U) = \sup\left\{\Lambda(f) : f \in C_c(\mathbb{R}^n),\ 0 \leq f \leq 1,\ \operatorname{supp}(f) \subset U\right\}.
\end{align*}
This is the natural definition: we test $\Lambda$ against all compactly supported functions that "fit inside" $U$. For a general set $A \subset \mathbb{R}^n$, we then define the outer measure
\begin{align*}
\mu(A) = \inf\left\{\mu(U) : U \supseteq A,\ U\ \text{open}\right\}.
\end{align*}
One then verifies: (1) $\mu$ is an outer measure; (2) $\mu$ satisfies Carathéodory's criterion, making Borel sets $\mu$-measurable; (3) $\mu$ is finite on compact sets, because for compact $K$ one can choose $\varphi \in C_c(\mathbb{R}^n)$ with $0 \leq \varphi \leq 1$ and $\varphi = 1$ on an open set $V \supseteq K$, and then every admissible $f$ for $V$ satisfies $f \leq \varphi$, so $\mu(K) \leq \mu(V) \leq \Lambda(\varphi) < \infty$; (4) inner regularity holds because compact supports appear explicitly in the construction; and (5) integration against $\mu$ recovers $\Lambda$ by approximating $f \in C_c(\mathbb{R}^n)$ by simple functions and using the sup-definition of $\mu$ on open sets. Uniqueness follows from the fact that a Radon measure is determined by its values on open sets, and those values are forced by $\Lambda$.
The hypothesis that $\Lambda$ be positive cannot be weakened to mere boundedness on $C_c(\mathbb{R}^n)$. If $\Lambda$ is positive, the construction above gives a positive measure. If $\Lambda$ takes negative values, no positive measure can represent it, and a different version of the theorem (treating signed measures, discussed in the next section) is needed. The positivity hypothesis is also what makes the construction canonical: there is no choice involved in how to define $\mu(U)$ for open $U$; the sup formula is forced by the requirement that $\Lambda(f) = \int f \, d\mu$.
The uniqueness clause is equally important. Without it, the theorem would merely assert existence, which is far weaker. Uniqueness means that two Radon measures that give the same integrals against all $C_c(\mathbb{R}^n)$ functions are identical. Since Radon measures are outer regular, they are determined by their values on open sets, and those values are determined by $\Lambda$ through the sup formula. This chain of reasoning shows that the duality between positive linear functionals and Radon measures is bijective: every positive linear functional arises from a unique Radon measure, and every Radon measure gives a positive linear functional.
For geometric measure theory, the forward direction of this correspondence is the key: given a geometric construction that produces a positive linear functional, the theorem guarantees a Radon measure behind it. This is how one proves, for example, that surface area is a measure — one shows that integration against a surface defines a positive linear functional, and the Riesz theorem does the rest.
## Signed and Vector-Valued Measures
What happens when the functional $\Lambda$ is not positive? Can we still extract a measure?
Many geometric quantities naturally give rise to signed functionals. The first variation of area, which measures how much the area of a surface changes under a deformation, is a signed quantity: it can be positive (when the deformation increases area) or negative (when it decreases area). Similarly, the distributional divergence of a vector field is a signed distribution. To handle these, we need to extend the Riesz theorem to signed measures.
The right setting here is that of bounded linear functionals on $C_0(\mathbb{R}^n)$. Recall that $C_0(\mathbb{R}^n)$ is the space of continuous functions vanishing at infinity, equipped with the supremum norm $\|f\|_\infty = \sup_{x \in \mathbb{R}^n} |f(x)|$. This is a Banach space, and its dual captures exactly the signed Radon measures.
[definition: Signed Radon Measure]
A **signed Radon measure** on $\mathbb{R}^n$ is a set function $\mu: \mathcal{B}(\mathbb{R}^n) \to \mathbb{R}$ that can be written as $\mu = \mu^+ - \mu^-$, where $\mu^+$ and $\mu^-$ are Radon measures with $\mu^+(\mathbb{R}^n) + \mu^-(\mathbb{R}^n) < \infty$. The **total variation** of $\mu$ is the positive Radon measure $|\mu| = \mu^+ + \mu^-$, and the **total variation norm** is $\|\mu\|_{TV} = |\mu|(\mathbb{R}^n) = \mu^+(\mathbb{R}^n) + \mu^-(\mathbb{R}^n)$.
[/definition]
The decomposition $\mu = \mu^+ - \mu^-$ is the Jordan decomposition. It is unique when we require $\mu^+$ and $\mu^-$ to be mutually singular (that is, supported on disjoint Borel sets). The total variation norm makes the space of signed Radon measures with finite total variation into a Banach space.
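A small worked instance (chosen here for illustration): take $\mu = \delta_0 - 2\delta_1$ on $\mathbb{R}$. Then
\begin{align*}
\mu^+ = \delta_0, \qquad \mu^- = 2\delta_1, \qquad |\mu| = \delta_0 + 2\delta_1, \qquad \|\mu\|_{TV} = 1 + 2 = 3.
\end{align*}
The parts $\mu^+$ and $\mu^-$ are concentrated on the disjoint sets $\{0\}$ and $\{1\}$. Without mutual singularity the decomposition is not unique: $\mu = (\delta_0 + \delta_2) - (2\delta_1 + \delta_2)$ is another way of writing $\mu$ as a difference of Radon measures, but its two parts share mass at the point $2$, and their total masses sum to $5$ rather than $3$.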
[quotetheorem:2980]
The equality between the operator norm and the total variation is more than a pleasant identity — it is the isometric isomorphism $(C_0(\mathbb{R}^n))^* \cong M(\mathbb{R}^n)$, where $M(\mathbb{R}^n)$ denotes the space of finite signed Radon measures. This says the dual of $C_0(\mathbb{R}^n)$ is, as a Banach space, exactly the signed Radon measures with finite total variation.
The proof reduces to the positive case by decomposing $\Lambda$. For any bounded linear functional $\Lambda$ on $C_0(\mathbb{R}^n)$, define $\Lambda^+(f) = \sup\{\Lambda(g) : 0 \leq g \leq f\}$ for $f \geq 0$. This defines a positive linear functional on non-negative functions, which extends to all of $C_0(\mathbb{R}^n)$ and gives the positive part $\mu^+$ by the Riesz theorem for positive functionals. The negative part is $\Lambda^- = \Lambda^+ - \Lambda$, which is also positive, giving $\mu^-$.
The norm equality $\|\Lambda\| = |\mu|(\mathbb{R}^n)$ follows because on one hand $|\Lambda(f)| \leq \int |f| \, d|\mu| \leq \|f\|_\infty |\mu|(\mathbb{R}^n)$, giving $\|\Lambda\| \leq |\mu|(\mathbb{R}^n)$; and on the other hand, for any $\varepsilon > 0$ one can find $f \in C_0(\mathbb{R}^n)$ with $\|f\|_\infty \leq 1$ and $\int f \, d\mu \geq |\mu|(\mathbb{R}^n) - \varepsilon$ by approximating the density $d\mu/d|\mu|$, which takes the values $\pm 1$ on the two sets of the Hahn decomposition, by continuous functions via Lusin's theorem.
The vector-valued extension of the theorem is crucial for geometric measure theory, where boundaries and normals are inherently directional objects.
[definition: Vector-Valued Radon Measure]
An $\mathbb{R}^m$-valued Radon measure on $\mathbb{R}^n$ is a map $\mu = (\mu_1, \ldots, \mu_m)$ where each component $\mu_i$ is a signed Radon measure on $\mathbb{R}^n$ with finite total variation. The total variation measure of $\mu$ is the positive Radon measure $|\mu|$ defined by
\begin{align*}
|\mu|(A) = \sup\left\{\sum_{j} |\mu(A_j)| : A = \bigsqcup_j A_j\ \text{pairwise disjoint Borel sets}\right\},
\end{align*}
where $|\mu(A_j)| = \left(\sum_{i=1}^m \mu_i(A_j)^2\right)^{1/2}$ is the Euclidean norm of $\mu(A_j) \in \mathbb{R}^m$.
[/definition]
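For instance (an illustrative choice, not from the text), let $\mu = (\delta_0, \delta_0)$ be the $\mathbb{R}^2$-valued measure on $\mathbb{R}$ whose components are both the Dirac mass at $0$. Any partition of a Borel set $A$ has at most one piece containing $0$, and $\mu$ vanishes on all other pieces, so
\begin{align*}
|\mu|(A) = \left(1^2 + 1^2\right)^{1/2} = \sqrt{2} \quad \text{if } 0 \in A, \qquad |\mu|(A) = 0 \quad \text{if } 0 \notin A.
\end{align*}
Thus $|\mu| = \sqrt{2}\,\delta_0$, which is strictly smaller than the sum $|\mu_1| + |\mu_2| = 2\delta_0$ of the component variations: the Euclidean norm in the definition measures the vector $\mu(A_j)$ as a whole rather than coordinate by coordinate.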
[quotetheorem:2981]
The distinction between the positive functional theorem (on $C_c(\mathbb{R}^n)$, possibly infinite total mass) and the signed/vector-valued theorem (on $C_0(\mathbb{R}^n)$, necessarily finite total variation) is worth emphasizing. In the positive case, $\Lambda(f)$ is finite for each individual $f \in C_c(\mathbb{R}^n)$, but $\Lambda$ need not be bounded with respect to the supremum norm: functions with $0 \leq f \leq 1$ and ever larger supports can have arbitrarily large values $\Lambda(f)$. This corresponds to a Radon measure with $\mu(\mathbb{R}^n) = \infty$ (e.g., Lebesgue measure). The signed case requires boundedness, which forces finite total variation. The vector-valued theorem is the version that appears in the definition of sets of finite perimeter, where the distributional gradient of an indicator function is an $\mathbb{R}^n$-valued Radon measure.
## Constructing Geometric Measures
How do geometric objects — surfaces, curves, boundaries — give rise to positive linear functionals, and therefore to Radon measures?
The power of the Riesz theorem in geometric measure theory lies in its constructive character. To show that a geometric quantity is a Radon measure, it suffices to:
1. Define a linear functional $\Lambda$ by integrating test functions against the geometric object.
2. Verify that $\Lambda$ is positive (or, for signed quantities, bounded).
3. Apply the Riesz theorem to obtain the measure.
This strategy transforms potentially difficult measure-theoretic verification into an analytic check. Let us see it in action for the fundamental example: surface area.
[example: Surface Area as a Radon Measure]
Let $S \subset \mathbb{R}^n$ be a smooth $(n-1)$-dimensional submanifold — for concreteness, take $S$ to be the graph of a smooth function $g: U \to \mathbb{R}$ where $U \subset \mathbb{R}^{n-1}$ is open and bounded. Define the linear functional $\Lambda: C_c(\mathbb{R}^n) \to \mathbb{R}$ by
\begin{align*}
\Lambda(f) = \int_S f \, d\mathcal{H}^{n-1},
\end{align*}
where $\mathcal{H}^{n-1}$ is $(n-1)$-dimensional Hausdorff measure.
We verify positivity: if $f \geq 0$, then $f$ restricted to $S$ is also non-negative, so the integral $\int_S f \, d\mathcal{H}^{n-1} \geq 0$. Thus $\Lambda$ is a positive linear functional on $C_c(\mathbb{R}^n)$.
The Riesz Representation Theorem guarantees a unique Radon measure $\mu$ on $\mathbb{R}^n$ with $\Lambda(f) = \int_{\mathbb{R}^n} f \, d\mu$. We can identify $\mu$ explicitly: it must be the restriction of $\mathcal{H}^{n-1}$ to $S$, denoted $\mathcal{H}^{n-1} \lfloor S$. To see why, note that for any Borel set $A \subset \mathbb{R}^n$, we need $\mu(A) = \mathcal{H}^{n-1}(A \cap S)$. This follows by approximating $\mathbf{1}_{A \cap S}$ by continuous functions with compact support, applying $\Lambda$, and using the Riesz measure.
For the graph case $S = \{(x, g(x)) : x \in U\}$, the surface area formula gives
\begin{align*}
\Lambda(f) = \int_U f(x, g(x)) \sqrt{1 + |\nabla g(x)|^2} \, d\mathcal{L}^{n-1}(x),
\end{align*}
where $|\nabla g|^2 = \sum_{i=1}^{n-1} (\partial_{x_i} g)^2$. This shows $\Lambda$ is well-defined and finite for $f \in C_c(\mathbb{R}^n)$ provided $\nabla g$ is bounded on $U$ (for instance when $g$ extends smoothly to a neighborhood of $\overline{U}$): the integrand $f(x, g(x))\sqrt{1 + |\nabla g|^2}$ is then bounded, and $U$ is bounded. The Riesz theorem then gives $\mu = \mathcal{H}^{n-1} \lfloor S$ as a Radon measure on $\mathbb{R}^n$.
[/example]
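The graph formula lends itself to a quick numerical check in the lowest-dimensional case $n = 2$, where $S$ is a curve and $\Lambda(f)$ with $f = 1$ on $S$ is its length. The sketch below is illustrative only: the function name `graph_functional`, the parabola $g(x) = x^2$ on $U = (0,1)$, and the midpoint rule are all choices made here. Any $f \in C_c(\mathbb{R}^2)$ equal to $1$ on a neighborhood of $S$ gives the same value as the constant integrand used in the code.

```python
import math


def graph_functional(f, g, dg, a: float, b: float, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral over (a, b) of f(x, g(x)) * sqrt(1 + g'(x)^2)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += f(x, g(x)) * math.sqrt(1.0 + dg(x) ** 2)
    return total * h


# Parabola y = x^2 over U = (0, 1), tested against a function equal to 1 on the curve.
length = graph_functional(lambda x, y: 1.0, lambda x: x * x, lambda x: 2.0 * x, 0.0, 1.0)
# Closed form of the arc length: sqrt(5)/2 + asinh(2)/4.
exact = math.sqrt(5.0) / 2.0 + math.asinh(2.0) / 4.0
```

The numerical value agrees with the closed-form arc length, which is the total mass $\mathcal{H}^1(S)$ of the measure $\mathcal{H}^1 \lfloor S$ in this case.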
The example looks almost circular — we define $\Lambda$ using $\mathcal{H}^{n-1}$ and conclude that $\mu = \mathcal{H}^{n-1} \lfloor S$. But the Riesz theorem's value is not in this smooth case; it is in the rough case, where no formula like the surface area integral is available. The boundary of a set of finite perimeter, for instance, is not a smooth surface. Yet its "perimeter measure" can be constructed through the same route: define a bounded linear functional using distributional derivatives, and apply the vector-valued Riesz theorem.
[example: Surface Area of the Unit Sphere]
Define $\Lambda: C_c(\mathbb{R}^n) \to \mathbb{R}$ by
\begin{align*}
\Lambda(f) = \int_{\partial B(0,1)} f(x) \, d\mathcal{H}^{n-1}(x),
\end{align*}
where $\partial B(0,1) = \{x \in \mathbb{R}^n : |x| = 1\}$. This is a positive linear functional: linearity is immediate, and positivity follows since $f \geq 0$ implies $f|_{\partial B(0,1)} \geq 0$.
The Riesz theorem gives a Radon measure $\mu$ on $\mathbb{R}^n$ with $\Lambda(f) = \int_{\mathbb{R}^n} f \, d\mu$. This measure is $\mu = \mathcal{H}^{n-1} \lfloor \partial B(0,1)$, the restriction of Hausdorff measure to the sphere. We compute its total mass. The $(n-1)$-dimensional Hausdorff measure of the unit sphere satisfies
\begin{align*}
\mathcal{H}^{n-1}(\partial B(0,1)) = n \omega_n,
\end{align*}
where $\omega_n = \mathcal{L}^n(B(0,1)) = \frac{\pi^{n/2}}{\Gamma(n/2 + 1)}$ is the volume of the unit ball in $\mathbb{R}^n$. To verify this, use the polar coordinate decomposition of Lebesgue measure:
\begin{align*}
\int_{\mathbb{R}^n} f(x) \, d\mathcal{L}^n(x) = \int_0^\infty \left(\int_{\partial B(0,1)} f(r\theta) \, d\mathcal{H}^{n-1}(\theta)\right) r^{n-1} \, dr.
\end{align*}
Taking $f = \mathbf{1}_{B(0,1)}$ gives $\omega_n = \int_0^1 \mathcal{H}^{n-1}(\partial B(0,1)) r^{n-1} \, dr = \frac{\mathcal{H}^{n-1}(\partial B(0,1))}{n}$, so $\mathcal{H}^{n-1}(\partial B(0,1)) = n\omega_n$.
For specific dimensions:
- $n = 2$: $\omega_2 = \pi$, so $\mathcal{H}^1(S^1) = 2\pi$, the circumference of the unit circle.
- $n = 3$: $\omega_3 = \frac{4\pi}{3}$, so $\mathcal{H}^2(S^2) = 4\pi$, the surface area of the unit sphere in $\mathbb{R}^3$.
The measure $\mu = \mathcal{H}^{n-1} \lfloor \partial B(0,1)$ is a Radon measure because $\mathcal{H}^{n-1}(\partial B(0,1)) = n\omega_n < \infty$: a finite measure on a compact set is automatically Radon.
[/example]
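The identity $\mathcal{H}^{n-1}(\partial B(0,1)) = n\omega_n$ is easy to tabulate. A minimal sketch, using only the standard-library `math.gamma`; the function names `unit_ball_volume` and `unit_sphere_area` are hypothetical:

```python
import math


def unit_ball_volume(n: int) -> float:
    """omega_n = pi^(n/2) / Gamma(n/2 + 1), the Lebesgue measure of the unit ball in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def unit_sphere_area(n: int) -> float:
    """Total mass n * omega_n of the (n-1)-dimensional Hausdorff measure on the unit sphere."""
    return n * unit_ball_volume(n)


areas = {n: unit_sphere_area(n) for n in range(1, 6)}
```

For $n = 2$ and $n = 3$ this reproduces $2\pi$ and $4\pi$, and for $n = 1$ it gives $2$, the counting measure of the two-point "sphere" $\{-1, 1\}$.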
The polar coordinate formula used in this example is itself a corollary of Lipschitz mapping theory and the coarea formula (developed in Chapter 11). The argument is not circular: the map $x \mapsto (|x|, x/|x|)$ from $\mathbb{R}^n \setminus \{0\}$ to $(0,\infty) \times \partial B(0,1)$ is a locally bi-Lipschitz bijection with a computable Jacobian, and the coarea formula, applied to the Lipschitz function $x \mapsto |x|$, computes the resulting change of measure.
## Why the Riesz Theorem is Central to GMT
Why is identifying positive linear functionals with Radon measures more than a technical convenience?
The deeper reason the Riesz theorem is foundational for geometric measure theory is that it shifts the burden of proof. Instead of constructing measures from scratch — specifying their values on open sets, verifying countable additivity, checking inner and outer regularity — one only needs to produce a positive linear functional. In many geometric problems, this is far more natural.
[motivation]
Consider the problem of defining curvature measures for non-smooth sets. For a smooth convex body $K \subset \mathbb{R}^n$ with boundary $\partial K$ a smooth hypersurface, the mean curvature $H$ is a well-defined function on $\partial K$, and the mean curvature measure is simply $\mu_H(A) = \int_{A \cap \partial K} H \, d\mathcal{H}^{n-1}$. But for a convex body $K$ with a non-smooth boundary (corners, edges, flat faces), this formula makes no sense: there is no well-defined function $H$ on the boundary.
Geometric measure theory resolves this by working with the first variation of area. For any vector field $X \in C_c(\mathbb{R}^n; \mathbb{R}^n)$, the first variation of the perimeter of $K$ under the flow of $X$ is a bounded linear functional in $X$. The Riesz theorem for vector-valued functionals gives an $\mathbb{R}^n$-valued Radon measure — the mean curvature measure — even when there is no pointwise curvature. This is not a formal trick: the measure correctly localizes where curvature is concentrated. For a polyhedral convex body, the mean curvature measure is supported on the edges and vertices, with weight proportional to the angles.
The same philosophy applies to perimeter. For a set $E \subset \mathbb{R}^n$ with smooth boundary, the perimeter in an open set $U$ is $P(E; U) = \mathcal{H}^{n-1}(\partial E \cap U)$. For a rough set, the perimeter is defined via the distributional derivative: the functional $X \mapsto -\int_E \operatorname{div}(X) \, d\mathcal{L}^n$, defined for $X \in C_c^1(U; \mathbb{R}^n)$, is bounded with respect to the supremum norm when $E$ has finite perimeter in $U$. It therefore extends to a bounded linear functional on $C_c(U; \mathbb{R}^n)$, and the Riesz theorem gives the vector-valued Radon measure $D\mathbf{1}_E$, the distributional gradient of the indicator function of $E$.
[/motivation]
The GMT machine that handles perimeter, curvature, and more general varifolds all relies on this same structural fact: geometric quantities that can be expressed as continuous linear functionals on spaces of test functions are automatically Radon measures. The Riesz theorem is the key that unlocks the door from functional analysis to measure theory.
[remark: Weak-* Topology and the Riesz Theorem]
The isomorphism $(C_0(\mathbb{R}^n))^* \cong M(\mathbb{R}^n)$ given by the Riesz theorem gives a canonical way to topologize signed Radon measures: the weak-* topology on $M(\mathbb{R}^n)$ is the topology of pointwise convergence on $C_0(\mathbb{R}^n)$. This is exactly the weak convergence of measures studied in Chapter 8. The Riesz theorem is thus the bridge between the abstract functional-analytic notion of weak-* convergence and the concrete measure-theoretic notion of convergence tested against continuous functions.
[/remark]
[explanation: From Functionals to Measures — The Construction in Detail]
The proof of the Riesz theorem for positive functionals is worth understanding in detail, because the construction of $\mu$ from $\Lambda$ is a model for how measures are built throughout GMT.
The key step is defining $\mu$ on open sets via the formula
\begin{align*}
\mu(U) = \sup\left\{\Lambda(f) : f \in C_c(\mathbb{R}^n),\ 0 \leq f \leq 1,\ \operatorname{supp}(f) \subset U\right\}.
\end{align*}
This is the "approximation from inside" strategy: we approximate the indicator $\mathbf{1}_U$ by continuous functions with compact support in $U$, and $\Lambda$ applied to these approximations gives us $\mu(U)$. The sup is necessary because no single continuous compactly supported function equals $\mathbf{1}_U$ unless $U$ is itself compact (which open sets are not, except the empty set).
For a general set $A$, the outer measure is $\mu(A) = \inf\{\mu(U) : U \supseteq A,\ U \text{ open}\}$. Verifying that this is an outer measure is a matter of monotonicity and countable subadditivity, both of which follow from the analogous properties of $\Lambda$ and the sup/inf definitions.
The harder step is verifying Caratheodory's criterion: that every open set $U$ satisfies
\begin{align*}
\mu(E) = \mu(E \cap U) + \mu(E \setminus U)
\end{align*}
for all $E \subset \mathbb{R}^n$. Since $\mu(E) \leq \mu(E \cap U) + \mu(E \setminus U)$ always holds (by subadditivity), the content is the reverse inequality. For open $U$, one shows this using the fact that compactly supported functions in $U$ can be combined with functions supported away from $U$ via a partition of unity argument, so the contributions to $\mu(E)$ from the two sides add up exactly.
Inner regularity — that $\mu(U) = \sup\{\mu(K) : K \subset U,\ K \text{ compact}\}$ for open $U$ — follows from the fact that every $f$ in the sup defining $\mu(U)$ has compact support, so the compactly supported approximations already detect $\mu(U)$.
Finally, the representation formula $\Lambda(f) = \int f \, d\mu$ is established by approximating $f$ by a sum of characteristic functions of level sets, computing $\Lambda$ on each piece using the definition of $\mu$, and recognizing the resulting sum as a Riemann sum for the Lebesgue integral $\int f \, d\mu$.
The entire construction uses only the linearity and positivity of $\Lambda$. Positivity makes $\Lambda$ monotone ($f \le g$ implies $\Lambda(f) \le \Lambda(g)$), which is what makes the sup formula define a monotone set function that is finite on bounded open sets, and what ensures Caratheodory's criterion holds (the cancellations that would occur for signed functionals are absent).
[/explanation]
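As a worked instance of the sup formula, consider the evaluation functional $\Lambda(f) = f(0)$ on $C_c(\mathbb{R}^n)$. This choice of $\Lambda$ is an illustration introduced here, not an example taken from the construction above. It is linear and positive. If $0 \in U$, there is an admissible bump function with $f(0) = 1$, and no admissible $f$ exceeds $1$; if $0 \notin U$, every admissible $f$ has $\operatorname{supp}(f) \subset U$ and hence $f(0) = 0$. Therefore
\begin{align*}
\mu(U) = \sup\left\{f(0) : f \in C_c(\mathbb{R}^n),\ 0 \leq f \leq 1,\ \operatorname{supp}(f) \subset U\right\} =
\begin{cases}
1 & \text{if } 0 \in U, \\
0 & \text{if } 0 \notin U.
\end{cases}
\end{align*}
Taking the infimum over open supersets gives $\mu(A) = 1$ if $0 \in A$ and $\mu(A) = 0$ otherwise (use $U = \mathbb{R}^n \setminus \{0\}$ in the second case). So $\mu = \delta_0$, and $\int f \, d\mu = f(0) = \Lambda(f)$, as the representation formula requires.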
## Connection to Differentiation and Absolute Continuity
How does the Riesz theorem interact with the differentiation theory developed in Chapter 5?
The Riesz theorem and differentiation theory are two sides of the same coin. The Riesz theorem tells us that every positive linear functional is integration against a Radon measure. Differentiation theory (Chapter 5) tells us how to decompose one Radon measure relative to another. Together, they give a complete picture of the structure of Radon measures.
The Radon-Nikodym theorem, in the measure-theoretic form proved in Chapter 5 via Besicovitch covering, states that if $\nu \ll \mu$ (meaning $\mu(A) = 0 \implies \nu(A) = 0$), then there exists a measurable function $f \geq 0$ such that $\nu = f \cdot \mu$, meaning $\nu(A) = \int_A f \, d\mu$ for all Borel sets $A$. The function $f = D_\mu \nu$ is the derivative of $\nu$ with respect to $\mu$, computed as a pointwise limit of ratios $\nu(B(x,r))/\mu(B(x,r))$ as $r \to 0$.
The connection to Riesz is this: the linear functional $\Lambda(g) = \int g \, d\nu = \int g f \, d\mu$ is integration against $\mu$ with weight $f$. The Radon-Nikodym derivative $f = D_\mu \nu$ is what the Riesz theorem gives when we apply it to the functional $g \mapsto \int g \, d\nu$ relative to the base measure $\mu$.
[quotetheorem:2982]
The singular part $\nu_s$ is what is missed by the derivative: at $\mu$-typical points, $D_\mu \nu_s(x) = 0$, but at $\nu_s$-typical points, $D_\mu \nu_s(x) = +\infty$. This dichotomy is not an artifact of the proof method; it is the correct geometric picture. For the Cantor measure $\mu_C$ relative to Lebesgue measure $\mathcal{L}^1$, the derivative $D_{\mathcal{L}^1} \mu_C(x)$ is $+\infty$ at $\mathcal{H}^{\log 2/\log 3}$-almost every point of the Cantor set and $0$ at Lebesgue-almost every point of $\mathbb{R} \setminus C$.
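The dichotomy can be made concrete with two closed-form density ratios. The sketch below is illustrative only: the density $t^2$, the base point $x = 0.5$, and the function names are choices made here, not part of the text. For the Cantor measure it assumes the standard self-similarity fact $\mu_C([0, 3^{-n}]) = 2^{-n}$ and that $\mu_C$ has no atoms.

```python
def ac_ratio(x, r):
    """nu(B(x,r)) / L^1(B(x,r)) for the measure nu with density t^2 on R."""
    nu_ball = ((x + r) ** 3 - (x - r) ** 3) / 3.0  # integral of t^2 over (x-r, x+r)
    return nu_ball / (2.0 * r)


def cantor_ratio_at_zero(n):
    """mu_C(B(0,r)) / L^1(B(0,r)) at the Cantor point 0, with r = 3^-n."""
    mu_ball = 2.0 ** (-n)         # mu_C([0, 3^-n]) = 2^-n; no mass lies left of 0
    leb_ball = 2.0 * 3.0 ** (-n)  # length of the interval (-3^-n, 3^-n)
    return mu_ball / leb_ball


# Absolutely continuous case: the ratios approach the density value 0.5^2 = 0.25.
ac_ratios = [ac_ratio(0.5, 10.0 ** (-j)) for j in range(1, 6)]

# Singular case: the ratios equal (3/2)^n / 2 and grow without bound.
cantor_ratios = [cantor_ratio_at_zero(n) for n in (1, 5, 10, 20)]
```

The first list stabilizes at the density, while the second grows geometrically, which is the behavior $D_{\mathcal{L}^1} \mu_C = +\infty$ at a point of the Cantor set.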
The Riesz theorem and the differentiation theorem together establish that the space of Radon measures is both algebraically rich (it is a Banach space under total variation norm) and analytically transparent (measures are determined by their integrals against continuous functions, and relative to any base measure, they have a density). This dual character — measures as set functions and as continuous functionals — is what makes GMT work.
[remark: Locality of the Riesz Construction]
The Riesz theorem as stated is global: it represents $\Lambda$ as integration against a single measure on all of $\mathbb{R}^n$. But the construction of $\mu(U) = \sup\{\Lambda(f): \operatorname{supp}(f) \subset U\}$ shows that $\mu$ is built locally. If $\Lambda$ restricts to a positive linear functional on functions supported in a fixed open set $V$, the resulting measure is supported in $\overline{V}$. This locality is essential for geometric applications, where we often work on open subsets of $\mathbb{R}^n$ or on manifolds covered by coordinate charts.
[/remark]
## Measures from Geometric Functionals
The Riesz theorem's most striking applications in GMT arise when the linear functional comes from a non-smooth geometric object. We have already seen the smooth surface case; the rough case shows the theorem's true power.
[example: The Perimeter Measure of a Set of Finite Perimeter]
Let $E \subset \mathbb{R}^n$ be a Lebesgue-measurable set, and let $U \subset \mathbb{R}^n$ be open. The perimeter of $E$ in $U$ is defined via the distributional derivative of $\mathbf{1}_E$: we say $E$ has finite perimeter in $U$ if there exists a constant $C < \infty$ such that
\begin{align*}
\left|\int_E \operatorname{div}(X) \, d\mathcal{L}^n\right| \leq C \|X\|_{L^\infty(U;\mathbb{R}^n)}
\end{align*}
for all $X \in C_c^1(U; \mathbb{R}^n)$. This says the linear functional
\begin{align*}
\Lambda(X) = -\int_E \operatorname{div}(X) \, d\mathcal{L}^n
\end{align*}
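is bounded on $C_c^1(U; \mathbb{R}^n)$ with respect to the supremum norm, and so extends uniquely by density to a bounded linear functional on $C_c(U; \mathbb{R}^n)$.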
The vector-valued Riesz Representation Theorem immediately gives an $\mathbb{R}^n$-valued Radon measure $D\mathbf{1}_E$ (the distributional gradient of $\mathbf{1}_E$) such that
\begin{align*}
\Lambda(X) = \int_U X \cdot dD\mathbf{1}_E
\end{align*}
for all $X \in C_c(U; \mathbb{R}^n)$. The perimeter of $E$ in $U$ is then $P(E; U) = |D\mathbf{1}_E|(U)$, the total variation of $D\mathbf{1}_E$ in $U$.
For a smooth set $E$ with smooth boundary $\partial E$, integration by parts gives $\int_E \operatorname{div}(X) \, d\mathcal{L}^n = \int_{\partial E} X \cdot \nu \, d\mathcal{H}^{n-1}$, where $\nu$ is the outward unit normal. So $D\mathbf{1}_E = -\nu \cdot \mathcal{H}^{n-1} \lfloor \partial E$, and $P(E; U) = \mathcal{H}^{n-1}(\partial E \cap U)$ recovers the classical perimeter. The Riesz theorem extends this to rough sets where $\partial E$ may not be a smooth submanifold — the measure $D\mathbf{1}_E$ "sees" the boundary even when no classical boundary exists.
[/example]
This example is a preview of the theory of BV functions (functions of bounded variation) and sets of finite perimeter, which constitutes a major topic in GMT. The key point is that the Riesz theorem does all the measure-theoretic work: once we verify the boundedness of the distributional divergence functional, the existence of the perimeter measure is automatic.
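For a smooth set, the identity behind the formula $D\mathbf{1}_E = -\nu \cdot \mathcal{H}^{n-1} \lfloor \partial E$ can be checked numerically. The sketch below is a minimal illustration under assumptions made here: $E$ is the unit disk in $\mathbb{R}^2$ and $X(x, y) = (x^3, y)$, a hypothetical vector field (it is not compactly supported, but multiplying by a cutoff equal to $1$ near the disk changes neither integral). Both sides should equal $-7\pi/4$.

```python
import math


def interior_integral(n_r=200, n_theta=400):
    """Midpoint rule in polar coordinates for the integral of div X over the unit disk."""
    h_r, h_t = 1.0 / n_r, 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * h_r
        for j in range(n_theta):
            t = (j + 0.5) * h_t
            x = r * math.cos(t)
            div_x = 3.0 * x * x + 1.0       # divergence of X(x, y) = (x^3, y)
            total += div_x * r * h_r * h_t  # r dr dtheta is the area element
    return total


def boundary_integral(n_theta=400):
    """Midpoint rule for the integral of X . nu over the unit circle."""
    h_t = 2.0 * math.pi / n_theta
    total = 0.0
    for j in range(n_theta):
        t = (j + 0.5) * h_t
        x, y = math.cos(t), math.sin(t)     # on the unit circle, nu = (x, y)
        total += (x ** 3 * x + y * y) * h_t
    return total


lam_volume = -interior_integral()    # Lambda(X) = -(integral over E of div X)
lam_boundary = -boundary_integral()  # integral of X against -nu H^1 on the circle
```

The two numbers agree up to quadrature error, which is the smooth-case content of the representation $\Lambda(X) = \int_U X \cdot dD\mathbf{1}_E$.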
[explanation: Why the Distributional Gradient Is Vector-Valued]
The distributional gradient $D\mathbf{1}_E$ is an $\mathbb{R}^n$-valued measure rather than a scalar measure because it carries directional information: it records not just where the boundary of $E$ is, but also what direction is "outward." For a smooth set, this is the outward unit normal $\nu$. For a rough set, $D\mathbf{1}_E = -\nu_E \cdot |D\mathbf{1}_E|$, where $\nu_E$ is a measurable $\mathbb{R}^n$-valued function with $|\nu_E| = 1$ $|D\mathbf{1}_E|$-almost everywhere. This is the measure-theoretic outward normal, defined even without any classical notion of boundary.
The total variation $|D\mathbf{1}_E|$ is the scalar perimeter measure, concentrated on what GMT calls the reduced boundary $\partial^* E$: the set of points where the normal $\nu_E$ is well-defined in the approximate sense. The De Giorgi structure theorem states that for sets of finite perimeter, the reduced boundary is an $(n-1)$-rectifiable set, meaning it is covered by countably many $C^1$ hypersurfaces up to an $\mathcal{H}^{n-1}$-null set. This deep structural result, that the rough boundary is almost a smooth boundary at the measure-theoretic level, is one of the cornerstones of GMT.
[/explanation]
The Riesz Representation Theorem thus sits at the junction of functional analysis, measure theory, and geometry. It provides the bridge that converts geometric constructions (integrals over surfaces, distributional derivatives of rough sets) into Radon measures, which can then be analyzed using all the tools developed in the preceding chapters: the Besicovitch covering theorem for differentiation, the Lebesgue-Radon-Nikodym decomposition for structure, and the weak-* compactness theorem for sequential limits. In the chapters that follow, this construction will be used repeatedly — silently, almost automatically — whenever a geometric object is claimed to define a measure.
---
With measures understood both as objects in their own right and as functionals, we study their large-scale behavior. Weak convergence captures how sequences of measures interact with continuous functions, providing a natural topology on the space of all measures.
# 8. Weak Convergence of Measures
Suppose you have a sequence of measures $\mu_k$ on $\mathbb{R}^n$ — perhaps surface area measures on a family of surfaces converging to some limit, or the distribution of mass in a sequence of minimizers for a geometric variational problem. You want to extract some kind of limiting measure. The naive hope is that you can pass to a subsequence and get convergence in a strong sense: perhaps $\mu_k(A) \to \mu(A)$ for every Borel set $A$, or $\|\mu_k - \mu\|_{TV} \to 0$ in total variation. These hopes are almost always too optimistic. In geometric problems, measures concentrate, spread, and migrate in ways that no strong topology can capture with compactness. The remedy is to work with a topology that is weak enough to guarantee compactness, yet strong enough to preserve the information that actually matters.
This chapter develops the theory of weak-$*$ convergence for Radon measures. The core ideas are: what the correct notion of convergence is, why it is the right one, and which sequences it can handle. Along the way we encounter two fundamental obstructions to strong convergence in $L^1$ — concentration and oscillation — and develop the machinery to distinguish between them. The chapter concludes with a brief account of Young measures, which capture the distribution of values in oscillating sequences.
[example: Delta Measures Approaching a Point]
Consider the sequence $\mu_k = \delta_{1/k}$ on $\mathbb{R}$, where $\delta_x$ denotes the Dirac mass at $x$. For any fixed continuous function $f \in C_c(\mathbb{R})$,
\begin{align*}
\int_{\mathbb{R}} f \, d\mu_k = f(1/k) \to f(0) = \int_{\mathbb{R}} f \, d\delta_0.
\end{align*}
So $\mu_k$ converges in some sense to $\delta_0$. Now ask: does this convergence hold in total variation? The total variation distance between two Dirac masses at distinct points is
\begin{align*}
\|\delta_{1/k} - \delta_0\|_{TV} = 2,
\end{align*}
since $\delta_{1/k}(\{1/k\}) = 1$ while $\delta_0(\{1/k\}) = 0$, and symmetrically at the point $0$. The total variation therefore picks up both the mass at $1/k$ (missing from $\delta_0$) and the mass at $0$ (missing from $\delta_{1/k}$), for every $k$. The sequence fails to converge in total variation, yet the convergence captured by integration against continuous functions is perfectly well-behaved and geometrically meaningful: the point mass at $1/k$ really is "moving toward" the point mass at $0$.
[/example]
This example shows that total variation convergence is too rigid for sequences whose mass migrates. Weak-$*$ convergence is the correct framework.
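The contrast between the two notions of distance can be tabulated directly. The following is a minimal sketch; the tent function and the helper names are hypothetical choices made here for illustration.

```python
def integrate_against_dirac(f, x):
    """The integral of f against the Dirac mass at x is the point value f(x)."""
    return f(x)


def tv_norm_of_dirac_difference(x, y):
    """Total variation norm of delta_x - delta_y: 0 if x == y, otherwise 2."""
    return 0.0 if x == y else 2.0


def tent(x):
    """A continuous test function supported in [-1, 1] (a hypothetical choice)."""
    return max(0.0, 1.0 - abs(x))


ks = (1, 10, 100, 1000)

# Gap between the integrals of the test function: equals 1/k here, so it shrinks.
weak_gaps = [
    abs(integrate_against_dirac(tent, 1.0 / k) - integrate_against_dirac(tent, 0.0))
    for k in ks
]

# Total variation distance: stuck at 2 for every k.
tv_gaps = [tv_norm_of_dirac_difference(1.0 / k, 0.0) for k in ks]
```

The first list tends to zero while the second never moves, which is the whole point of testing against continuous functions rather than against sets.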
## Weak-$*$ Convergence of Radon Measures
How should one define convergence for Radon measures when total variation is too strong? The key insight from functional analysis is that measures are dual objects: a Radon measure $\mu$ on $\mathbb{R}^n$ acts on continuous functions with compact support by integration. The natural notion of convergence for dual objects is weak-$*$ convergence — requiring convergence on each element of the predual, rather than requiring uniform convergence over all test objects simultaneously.
[definition: Weak-$*$ Convergence of Radon Measures]
Let $\mu, \mu_1, \mu_2, \ldots$ be Radon measures on $\mathbb{R}^n$. The sequence $(\mu_k)$ **converges weak-$*$ to $\mu$**, written $\mu_k \overset{*}{\rightharpoonup} \mu$, if
\begin{align*}
\int_{\mathbb{R}^n} f \, d\mu_k \to \int_{\mathbb{R}^n} f \, d\mu \quad \text{for every } f \in C_c(\mathbb{R}^n).
\end{align*}
[/definition]
The notation is consistent with the functional-analytic weak-$*$ convergence studied in Chapter 7 via the Riesz Representation Theorem. That theorem identifies the space of Radon measures with the dual of $C_c(\mathbb{R}^n)$ equipped with an appropriate topology, and weak-$*$ convergence for the dual is precisely the definition above. This connection is not just formal — it is what makes the compactness theorem work.
The test class $C_c(\mathbb{R}^n)$ is the right choice because it is precisely the predual identified by the Riesz theorem. One might wonder whether a larger or smaller test class would work. Using all bounded continuous functions $C_b(\mathbb{R}^n)$ would give a stronger notion of convergence (called narrow convergence, and known as weak convergence in probability theory), which is more demanding and fails for sequences whose mass escapes to infinity. Using only Borel indicator functions $\mathbf{1}_A$ would ask for setwise convergence, which is even stronger and fails for the Dirac example. The compactly supported continuous functions strike the right balance.
[example: Computing Weak-$*$ Limits]
Let $\mu_k = k \cdot \mathcal{L}^1 \lfloor [0, 1/k]$ on $\mathbb{R}$, where $\mathcal{L}^1$ denotes Lebesgue measure and $\lfloor$ denotes restriction. Each $\mu_k$ is absolutely continuous with respect to $\mathcal{L}^1$, with density $k \cdot \mathbf{1}_{[0,1/k]}$. For $f \in C_c(\mathbb{R})$,
\begin{align*}
\int_{\mathbb{R}} f \, d\mu_k = k \int_0^{1/k} f(x) \, d\mathcal{L}^1(x).
\end{align*}
Since $f$ is continuous, the mean value theorem for integrals gives
\begin{align*}
k \int_0^{1/k} f(x) \, d\mathcal{L}^1(x) = k \cdot f(\xi_k) \cdot \frac{1}{k} = f(\xi_k)
\end{align*}
for some $\xi_k \in [0, 1/k]$. As $k \to \infty$, $\xi_k \to 0$, so $f(\xi_k) \to f(0)$ by continuity. Therefore
\begin{align*}
\int_{\mathbb{R}} f \, d\mu_k \to f(0) = \int_{\mathbb{R}} f \, d\delta_0.
\end{align*}
The absolutely continuous measures $\mu_k$ converge weak-$*$ to the singular measure $\delta_0$. This is concentration: mass accumulates at a point. The limit measure has a completely different character from every term in the sequence — it is singular with respect to Lebesgue measure while each $\mu_k$ is absolutely continuous. Weak-$*$ convergence does not preserve absolute continuity.
[/example]
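The convergence in this example is easy to observe numerically. The sketch below assumes a specific test function, $f(x) = \max(0, 1 - x^2)$, chosen here for illustration; for it the exact value is $\int f \, d\mu_k = 1 - 1/(3k^2)$.

```python
def integrate_against_mu_k(f, k, n=2000):
    """Midpoint-rule approximation of k * (integral of f over [0, 1/k])."""
    h = (1.0 / k) / n
    return k * h * sum(f((i + 0.5) * h) for i in range(n))


def bump(x):
    """A continuous test function supported in [-1, 1] (a hypothetical choice)."""
    return max(0.0, 1.0 - x * x)


# The values approach bump(0) = 1, the integral of bump against delta_0.
values = {k: integrate_against_mu_k(bump, k) for k in (1, 10, 100, 1000)}
```

Each $\mu_k$ has total mass $1$ spread over $[0, 1/k]$, and the computed integrals approach the point value at $0$.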
The concentration example above reveals both the strength and the limitation of weak-$*$ convergence. It correctly identifies $\delta_0$ as the limiting object, but it loses information: the sequence has $\mu_k((0,1/k]) = 1$ for all $k$, whereas the limit has $\delta_0((0, 1/k]) = 0$ for every fixed $k$. This is not a failure of the definition; it is a fundamental feature. Weak-$*$ convergence retains the large-scale behavior of the sequence but can lose information about concentration at scales that shrink to zero.
[remark: Weak-$*$ Convergence and Setwise Convergence]
Weak-$*$ convergence does not imply setwise convergence $\mu_k(A) \to \mu(A)$ for arbitrary Borel sets $A$. In the concentration example above, $\mu_k([0, 1]) = 1$ for all $k$ and $\delta_0([0,1]) = 1$, so the set $[0,1]$ happens to cause no trouble. But $\mu_k((0, 1]) = 1$ for all $k$ while $\delta_0((0, 1]) = 0$, since the half-open interval $(0,1]$ does not contain $0$. What does hold is convergence for bounded sets $A$ with $\mu(\partial A) = 0$: if $A$ is a bounded Borel set whose boundary has $\mu$-measure zero, then $\mu_k(A) \to \mu(A)$. This is part of the portmanteau theorem for Radon measures.
[/remark]
## The Compactness Theorem
The defining virtue of weak-$*$ convergence for Radon measures is that it comes with a powerful compactness result. In analysis, having a good compactness theorem means you can always extract convergent subsequences from bounded sequences — the key tool for existence proofs and for passing to limits in geometric variational problems. The question is: what is the right notion of boundedness for a sequence of Radon measures?
For a single compact set $K$, boundedness means $\sup_k \mu_k(K) < \infty$. The correct global condition is that this holds for every compact set.
[definition: Local Boundedness for Radon Measures]
A sequence of Radon measures $(\mu_k)$ on $\mathbb{R}^n$ is **locally bounded** if
\begin{align*}
\sup_{k \ge 1} \mu_k(K) < \infty
\end{align*}
for every compact set $K \subset \mathbb{R}^n$.
[/definition]
Local boundedness is the natural condition. It says that on any fixed bounded region, the total mass of the sequence stays under control. It does not prevent mass from escaping to infinity, it does not require the total masses to stay bounded (a sequence could have $\mu_k(\mathbb{R}^n) \to \infty$ while being locally bounded), and it does not prevent concentration at a point (the concentration example $k \cdot \mathcal{L}^1 \lfloor [0, 1/k]$ is locally bounded since for any compact $K \subset \mathbb{R}$, $\mu_k(K) \le \mu_k(\mathbb{R}) = k \cdot (1/k) = 1$ for all $k$).
[quotetheorem:2983]
This theorem is the analogue of the Banach–Alaoglu theorem for the space of Radon measures. The proof proceeds by a diagonal argument: fix a countable dense subset of $C_c(\mathbb{R}^n)$, extract convergent subsequences for each element of this dense set using the Bolzano–Weierstrass theorem (applied to the bounded numerical sequences $\int f \, d\mu_k$), then diagonalize to get a single subsequence that works for the entire dense set. The local boundedness hypothesis ensures that these numerical sequences are bounded and that the limit functional $f \mapsto \lim_j \int f \, d\mu_{k_j}$ extends from the dense set to all of $C_c(\mathbb{R}^n)$; positivity of the limit functional is inherited from the measures $\mu_k$. The Riesz Representation Theorem (Chapter 7) then produces the limiting Radon measure $\mu$.
Local boundedness is weaker than global boundedness ($\sup_k \mu_k(\mathbb{R}^n) < \infty$), so the theorem applies to every sequence of uniformly bounded total mass. What it does not provide is conservation of mass in the limit. Consider $\mu_k = \delta_k$ on $\mathbb{R}$: each has total mass $1$, and the sequence is locally bounded since $\mu_k(K) \le 1$ for every compact $K$. For any $f \in C_c(\mathbb{R})$, the support of $f$ is compact, so $\int f \, d\delta_k = f(k) = 0$ for all sufficiently large $k$. Hence $\delta_k \overset{*}{\rightharpoonup} 0$: the sequence "escapes to infinity," and its weak-$*$ limit is the zero measure even though $\mu_k(\mathbb{R}) = 1$ for every $k$. To keep the total mass in the limit one needs additional control, such as tightness or supports contained in a fixed compact set.
The geometric significance of the compactness theorem cannot be overstated. In the calculus of variations and GMT, one frequently has a minimizing sequence for a geometric functional — arc lengths, surface areas, perimeters — and wants to extract a subsequence converging to a competitor. The compactness theorem guarantees that this extraction is always possible, provided the measures stay locally bounded. This is why weak-$*$ convergence is the right topology for GMT: it is precisely weak enough to make compactness work.
<!-- illustration-needed: Depict three stages of mass concentration — a broad smooth measure, an intermediate narrower distribution, and a Dirac mass — arranged left to right to visualize the weak-* limit of the concentration sequence k·L^1 restricted to [0,1/k] -->
## Weak Convergence in $L^1$ and Uniform Integrability
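When the measures $\mu_k$ are absolutely continuous with respect to Lebesgue measure, say $d\mu_k = f_k \, d\mathcal{L}^n$, weak-$*$ convergence of the measures $\mu_k$ means convergence of $\int f_k g \, d\mathcal{L}^n$ for test functions $g \in C_c(\mathbb{R}^n)$, whereas weak convergence of the functions $f_k$ in $L^1(\mathbb{R}^n)$ demands this for every $g \in L^\infty(\mathbb{R}^n)$. Weak convergence in $L^1$ therefore implies weak-$*$ convergence of the measures, but not conversely. In particular, a sequence bounded in $L^1$-norm always has a subsequence converging weak-$*$ as measures, yet it need not have any subsequence converging weakly in $L^1$. Understanding when weak sequential compactness holds for $L^1$ requires identifying precisely what obstructs it.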
There are two fundamentally different ways a bounded sequence in $L^1$ can misbehave in the limit: concentration, which destroys weak compactness in $L^1$, and oscillation, which is compatible with weak convergence but prevents strong convergence.
**Concentration:** Mass piles up at a point. The sequence $f_k = k \cdot \mathbf{1}_{[0, 1/k]}$ on $[0, 1]$ satisfies $\|f_k\|_{L^1} = 1$ for all $k$, but no subsequence converges weakly in $L^1([0,1])$. Indeed, a weak limit $f$ would satisfy $\int_a^1 f \, d\mathcal{L}^1 = \lim_k \int_a^1 f_k \, d\mathcal{L}^1 = 0$ for every $a > 0$, forcing $f = 0$ almost everywhere, while testing against $g \equiv 1 \in L^\infty$ gives $\int_0^1 f \, d\mathcal{L}^1 = 1$. The corresponding measures $k \cdot \mathcal{L}^1 \lfloor [0, 1/k]$ do converge weak-$*$, to $\delta_0$, which is not absolutely continuous. The functions $f_k$ "converge to something that lives outside $L^1$."
**Oscillation:** Values alternate faster and faster. The sequence $f_k(x) = \sin(2\pi k x)$ on $[0, 1]$ satisfies $\|f_k\|_{L^2} = 1/\sqrt{2}$ for all $k$, yet $f_k \rightharpoonup 0$ weakly in $L^2([0,1])$ by the Riemann–Lebesgue lemma. The functions are bounded, they converge weakly, but they do so by oscillating — not by concentrating. The information about the range of values is entirely lost in the weak limit.
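The Dunford–Pettis theorem characterizes precisely when a bounded family in $L^1$ is weakly sequentially compact: the obstruction is concentration, and uniform integrability is the condition that rules it out.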
[definition: Uniform Integrability]
A family $\mathcal{F} \subset L^1(\mathbb{R}^n, \mu)$ is **uniformly integrable** if:
\begin{align*}
\lim_{M \to \infty} \sup_{f \in \mathcal{F}} \int_{\{|f| > M\}} |f| \, d\mu &= 0 \quad \text{(no escape to } \pm \infty\text{)}, \\
\lim_{\mu(A) \to 0} \sup_{f \in \mathcal{F}} \int_A |f| \, d\mu &= 0 \quad \text{(no concentration)}.
\end{align*}
[/definition]
The two conditions in uniform integrability describe the same failure, concentration, from two angles. The first controls the contribution of large values: it rules out families whose $L^1$ norm is carried by regions where $|f|$ is arbitrarily large. The second controls the contribution of small sets. The first condition implies the second, and the second together with boundedness in $L^1$ implies the first. The concentrating sequence $f_k = k \cdot \mathbf{1}_{[0, 1/k]}$ violates both: $\int_{\{|f_k| > M\}} |f_k| \, d\mathcal{L}^1 = 1$ whenever $k > M$, and $\int_{[0, 1/k]} |f_k| \, d\mathcal{L}^1 = 1$ while $\mathcal{L}^1([0, 1/k]) = 1/k \to 0$.
[quotetheorem:3041]
The Dunford–Pettis theorem is the $L^1$ analogue of the Compactness Theorem for Radon measures. Uniform integrability rules out concentration, whether it is viewed as mass carried by ever larger values or as mass carried by ever smaller sets, and concentration is exactly what would produce a singular limit. Even when the measure $\mu$ is finite, boundedness in $L^1$ does not suffice for weak compactness: the uniform integrability condition is genuinely necessary.
The hypothesis that $\mu$ be finite is essential. On $(\mathbb{R}, \mathcal{L}^1)$, the sequence $f_k = \mathbf{1}_{[k, k+1]}$ satisfies $\|f_k\|_{L^1} = 1$ and meets both conditions of uniform integrability: $\int_{\{|f_k| > M\}} |f_k| \, d\mathcal{L}^1 = 0$ for $M \ge 1$, and $\int_A |f_k| \, d\mathcal{L}^1 \le \mathcal{L}^1(A)$ for every measurable $A$. Yet no subsequence converges weakly in $L^1(\mathbb{R})$: a weak limit $f$ would satisfy $\int_K f \, d\mathcal{L}^1 = 0$ for every bounded measurable set $K$, hence $f = 0$ almost everywhere, while testing against $g \equiv 1$ gives $\int_{\mathbb{R}} f \, d\mathcal{L}^1 = 1$. The mass escapes to infinity, and on an infinite measure space this must be excluded by a separate tightness condition.
[example: Verifying Uniform Integrability]
Consider the sequence $f_k(x) = k^{1/2} \cdot \mathbf{1}_{[0, 1/k]}$ on $[0, 1]$ with Lebesgue measure. We have $\|f_k\|_{L^1} = k^{1/2} \cdot (1/k) = k^{-1/2} \to 0$, so the sequence is bounded in $L^1$. For the concentration condition: let $A \subset [0,1]$ be measurable with $\mathcal{L}^1(A) < \varepsilon$. Then
\begin{align*}
\int_A |f_k| \, d\mathcal{L}^1 = k^{1/2} \cdot \mathcal{L}^1(A \cap [0, 1/k]) \le k^{1/2} \cdot \min\{\mathcal{L}^1(A), 1/k\}.
\end{align*}
Split according to which term achieves the minimum. If $k \le 1/\mathcal{L}^1(A)$, the bound is $k^{1/2} \cdot \mathcal{L}^1(A) \le \mathcal{L}^1(A)^{1/2}$. If $k > 1/\mathcal{L}^1(A)$, the bound is $k^{1/2} \cdot (1/k) = k^{-1/2} < \mathcal{L}^1(A)^{1/2}$. Either way, $\sup_k \int_A |f_k| \, d\mathcal{L}^1 \le \mathcal{L}^1(A)^{1/2} < \varepsilon^{1/2}$, which tends to zero with $\varepsilon$, uniformly in $k$. For the first condition: the set $\{|f_k| > M\}$ is empty unless $k > M^2$, in which case $\int_{\{|f_k| > M\}} |f_k| \, d\mathcal{L}^1 = k^{-1/2} < 1/M$, so the supremum over $k$ is at most $1/M \to 0$. Both conditions hold, so the sequence is uniformly integrable, consistent with the fact that $f_k \to 0$ strongly in $L^1$. Compare with $g_k = k \cdot \mathbf{1}_{[0, 1/k]}$: $\|g_k\|_{L^1} = 1$ and $\int_{[0, 1/k]} |g_k| \, d\mathcal{L}^1 = 1$ while $\mathcal{L}^1([0, 1/k]) = 1/k \to 0$, violating uniform integrability.
[/example]
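The difference between the two sequences can be tabulated. The sketch below is illustrative and rests on one observation made here: because each function is constant on $[0, 1/k]$ and zero elsewhere, the set $A = [0, \varepsilon]$ maximizes $\int_A |f_k| \, d\mathcal{L}^1$ among sets of measure $\varepsilon$. The function names and the cutoff `k_max` are hypothetical choices, and the supremum is taken only over $k \le$ `k_max`.

```python
import math


def worst_case_mass(height, k, eps):
    """Largest integral of height * 1_[0,1/k] over a set of measure eps.

    The function is constant on [0, 1/k] and zero elsewhere, so A = [0, eps] is optimal.
    """
    return height * min(eps, 1.0 / k)


def sup_over_k(height_of_k, eps, k_max=100_000):
    """Supremum over 1 <= k <= k_max of the worst-case mass on sets of measure eps."""
    return max(worst_case_mass(height_of_k(k), k, eps) for k in range(1, k_max + 1))


eps_values = (1e-2, 1e-3, 1e-4)

# f_k = k^(1/2) 1_[0,1/k]: the supremum is at most eps^(1/2), so it shrinks with eps.
sup_f = [sup_over_k(math.sqrt, e) for e in eps_values]

# g_k = k 1_[0,1/k]: the supremum stays at 1 however small eps is.
sup_g = [sup_over_k(lambda k: float(k), e) for e in eps_values]
```

The first list decays like $\varepsilon^{1/2}$, matching the bound derived in the example, while the second stays at $1$, which is the failure of the small-set condition.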
## Oscillation and Young Measures
Uniform integrability rules out concentration. But even when a sequence $(f_k)$ is uniformly integrable and converges weakly in $L^1$, the weak limit can fail to capture all the information in the sequence. The second failure mode — oscillation — means that the values of $f_k$ are cycling rapidly through a range of values, and the weak limit averages these values out. The fundamental question is: what information does a weakly convergent oscillating sequence actually retain, and how can we recover the rest?
[example: Weakly Convergent Oscillating Sequence]
Let $f_k : [0, 1] \to \mathbb{R}$ be defined by $f_k(x) = \sin(2\pi k x)$. We have $\|f_k\|_{L^2} = 1/\sqrt{2}$ for all $k$. For any $g \in L^2([0,1])$, the Riemann–Lebesgue lemma gives
\begin{align*}
\int_0^1 f_k(x) g(x) \, d\mathcal{L}^1(x) = \int_0^1 \sin(2\pi k x) g(x) \, d\mathcal{L}^1(x) \to 0
\end{align*}
as $k \to \infty$, since $\hat{g}(k) \to 0$ for $g \in L^2$. So $f_k \rightharpoonup 0$ weakly in $L^2([0,1])$.
The weak limit is identically zero. Yet the values of $f_k$ are not approaching zero: for every $k$, $f_k$ takes every value in $[-1, 1]$ on each subinterval $[j/k, (j+1)/k]$. The function keeps cycling through the whole range $[-1, 1]$, spending more time near the extreme values $\pm 1$ than near $0$. What has happened is that on any fixed set of positive measure, the positive and negative parts of $f_k$ cancel in the limit, producing the zero average captured by the weak limit. The weak limit $0$ is telling us the average value, not the distribution of values.
[/example]
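Both features of this example, a norm that does not decay and pairings that do, can be seen numerically. The sketch below assumes the specific test function $g(x) = x$, chosen here for illustration; for it the pairing has the closed form $-1/(2\pi k)$.

```python
import math


def midpoint_integral(func, n=100_000):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    h = 1.0 / n
    return h * sum(func((i + 0.5) * h) for i in range(n))


def pairing_with_identity(k):
    """Approximates the integral of sin(2 pi k x) * g(x) over [0, 1] for g(x) = x."""
    return midpoint_integral(lambda x: math.sin(2.0 * math.pi * k * x) * x)


def l2_norm_squared(k):
    """Approximates the integral of sin(2 pi k x)^2 over [0, 1]."""
    return midpoint_integral(lambda x: math.sin(2.0 * math.pi * k * x) ** 2)


# Pairings against g shrink like 1/k, while the squared norm stays at 1/2.
pairings = {k: pairing_with_identity(k) for k in (1, 5, 25)}
norms = {k: l2_norm_squared(k) for k in (1, 5, 25)}
```

The squared norms stay at $1/2$, so there is no strong convergence to $0$, while the pairings decay, as weak convergence requires.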
To capture the distribution of values in an oscillating sequence, one needs a richer object than the weak limit. This is the role of Young measures.
[definition: Young Measure]
Let $U \subset \mathbb{R}^n$ be open and $(f_k)$ a sequence of measurable functions $f_k : U \to \mathbb{R}^m$. A **Young measure** generated by $(f_k)$ is a measurable family of probability measures $(\nu_x)_{x \in U}$, with each $\nu_x$ a Borel probability measure on $\mathbb{R}^m$, satisfying the following: for every continuous function $\Phi : \mathbb{R}^m \to \mathbb{R}$ for which the sequence $(\Phi(f_k))$ is weakly relatively compact in $L^1(U)$ (for $U$ of finite measure, equivalently uniformly integrable; boundedness in $L^1(U)$ alone is not enough, since $\Phi(f_k)$ could concentrate), and every bounded measurable $g : U \to \mathbb{R}$,
\begin{align*}
\int_U g(x) \Phi(f_k(x)) \, d\mathcal{L}^n(x) \to \int_U g(x) \left(\int_{\mathbb{R}^m} \Phi(\lambda) \, d\nu_x(\lambda)\right) d\mathcal{L}^n(x)
\end{align*}
along a subsequence. The measure $\nu_x$ encodes the distribution of limiting values of $f_k$ near $x$.
[/definition]
The Young measure theorem (due to Young, 1942, with the modern form due to Ball) guarantees that every bounded sequence in $L^\infty$ generates a Young measure along a subsequence. For an oscillating sequence, $\nu_x$ is nontrivial — it is a probability measure spread over the range of oscillation.
[example: Young Measure of Sinusoidal Oscillation]
Return to $f_k(x) = \sin(2\pi k x)$ on $[0, 1]$. The Young measure generated by this sequence is $\nu_x = \nu$ for all $x \in [0,1]$, independent of $x$, where $\nu$ is the pushforward of the uniform measure on $[0, 1]$ under the map $t \mapsto \sin(2\pi t)$.
To identify $\nu$ explicitly, take a continuous function $\Phi : \mathbb{R} \to \mathbb{R}$ and $g \equiv 1$, and consider
\begin{align*}
\int_0^1 \Phi(\sin(2\pi k x)) \, d\mathcal{L}^1(x).
\end{align*}
By the substitution $u = kx$, this equals $\frac{1}{k} \int_0^k \Phi(\sin(2\pi u)) \, d\mathcal{L}^1(u)$. Since $\sin(2\pi u)$ is periodic with period $1$, this equals $\int_0^1 \Phi(\sin(2\pi u)) \, d\mathcal{L}^1(u)$, independently of $k$. Therefore the Young measure limit is
\begin{align*}
\int_{-1}^{1} \Phi(\lambda) \, d\nu(\lambda) = \int_0^1 \Phi(\sin(2\pi u)) \, d\mathcal{L}^1(u),
\end{align*}
which is exactly the integral of $\Phi$ against the pushforward measure $\nu = (\sin(2\pi \cdot))_\# (\mathcal{L}^1 \lfloor [0,1])$. Explicitly, $\nu$ is the probability measure on $[-1, 1]$ with density $\frac{1}{\pi \sqrt{1 - \lambda^2}}$ with respect to $\mathcal{L}^1 \lfloor (-1,1)$. This is the arcsine distribution, which puts most of its mass near $\pm 1$ because $\sin$ spends more time near its extreme values.
Note that $\int_{\mathbb{R}} \lambda \, d\nu(\lambda) = \int_0^1 \sin(2\pi u) \, d\mathcal{L}^1(u) = 0$, which matches the weak limit $f_k \rightharpoonup 0$: the weak limit is always the mean of the Young measure, $f(x) = \int_{\mathbb{R}^m} \lambda \, d\nu_x(\lambda)$.
[/example]
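The identification of $\nu$ with the arcsine distribution can be checked by comparing distribution functions. The sketch below is an illustration under assumptions made here: it samples $f_k$ on a uniform grid of midpoints and compares the fraction of samples below a level $t$ with the arcsine distribution function $\tfrac{1}{2} + \arcsin(t)/\pi$, obtained by integrating the density $\frac{1}{\pi\sqrt{1-\lambda^2}}$. The levels and grid size are arbitrary choices.

```python
import math


def empirical_cdf(k, t, n=100_000):
    """Fraction of grid midpoints x in [0, 1] with sin(2 pi k x) <= t."""
    h = 1.0 / n
    hits = sum(
        1 for i in range(n) if math.sin(2.0 * math.pi * k * (i + 0.5) * h) <= t
    )
    return hits / n


def arcsine_cdf(t):
    """Distribution function of the arcsine law on [-1, 1]: 1/2 + arcsin(t) / pi."""
    return 0.5 + math.asin(t) / math.pi


levels = (-0.9, -0.5, 0.0, 0.5, 0.9)

# Largest discrepancy over the chosen levels, for several frequencies k.
gaps = {
    k: max(abs(empirical_cdf(k, t) - arcsine_cdf(t)) for t in levels)
    for k in (1, 7, 50)
}
```

The discrepancy is small for every $k$, not only for large $k$, reflecting the fact that the distribution of values of $\sin(2\pi k x)$ on $[0,1]$ is the same for each integer $k \ge 1$; what changes with $k$ is only how finely those values are interleaved in $x$.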
The Young measure carries strictly more information than the weak limit. The weak limit $f = 0$ tells us only the mean value; the Young measure $\nu$, here the arcsine distribution, tells us the full distribution of values that $f_k$ cycles through near every point. The Young measure is identically the Dirac mass $\delta_{f(x)}$ if and only if $f_k \to f$ in measure (i.e., if there is no genuine oscillation).
[explanation: Why GMT Needs Young Measures]
In geometric measure theory, minimizing sequences for variational problems frequently exhibit oscillation. Consider minimizing
\begin{align*}
I[u] = \int_U W(\nabla u) \, d\mathcal{L}^n
\end{align*}
where $W : \mathbb{R}^{n \times m} \to \mathbb{R}$ is a stored energy density. If $W$ is not quasiconvex (Morrey's weakening of convexity, which is the natural condition for weak lower semicontinuity of $I$), then the infimum of $I$ may not be attained. Minimizing sequences $u_k$ exist with $I[u_k] \to \inf I$, but the gradients $\nabla u_k$ can develop finer and finer oscillations: the sequence tries to achieve a microstructure that the limiting function cannot have. The Young measure $(\nu_x)_{x \in U}$ generated by $(\nabla u_k)$ captures this microstructure: $\nu_x$ encodes the distribution of gradient values that the minimizing sequence is cycling through near $x$. This perspective, developed by Tartar and DiPerna in the 1980s, is the basis for the modern theory of relaxation in the calculus of variations and is directly relevant to models in materials science where laminate microstructures appear.
For GMT in particular, understanding what information is lost when passing to a weak-$*$ limit of area measures — whether the loss is due to concentration (mass moving to lower-dimensional sets) or oscillation (rapid surface corrugations) — is essential for understanding which geometric properties are preserved by weak limits.
[/explanation]
## Portmanteau Theorem and Consequences
Weak-$*$ convergence is defined through integration against continuous test functions. But in practice, one often needs to know whether $\mu_k(A) \to \mu(A)$ for specific Borel sets $A$. The portmanteau theorem characterizes exactly which sets $A$ have this property in terms of the topology relative to $\mu$.
[quotetheorem:2985]
The asymmetry in the first two statements reflects the structure of weak-$*$ limits. Open sets see their measure bounded from below in the limit: any open set has test functions compactly supported inside it, and these eventually detect the mass that $\mu$ places there, although mass of $\mu_k$ inside $U$ may drift to $\partial U$ or to infinity and be lost. Compact sets see their measure bounded from above, $\limsup_k \mu_k(K) \le \mu(K)$: a compact set contains its limit points, so mass cannot leak out of $K$ in the limit, while mass arriving from outside can only increase $\mu(K)$. The condition $\mu(\partial A) = 0$ is a regularity condition that ensures no mass of the limit measure sits on the boundary of $A$, preventing the limiting behavior from "oscillating" across $\partial A$.
The first inequality in the portmanteau theorem is the lower semicontinuity of $\mu \mapsto \mu(U)$ with respect to weak-$*$ convergence. This lower semicontinuity is a recurring theme in GMT: many geometric quantities (length, area, perimeter) are lower semicontinuous under weak-$*$ convergence of the associated measures, which is what makes existence proofs via direct methods possible. A minimizing sequence has a convergent subsequence (by compactness), and the limit is a minimizer (not just a formal limit) because the functional is lower semicontinuous: its value at the limit is at most the infimum.
[example: Portmanteau at Work]
Let $\mu_k = \delta_{1/k}$ on $\mathbb{R}$, converging weak-$*$ to $\mu = \delta_0$. Consider the interval $A = (0, 1)$, which is open. We have $\mu_k((0,1)) = 1$ for all $k \ge 2$ (since $1/k \in (0,1)$ for $k \ge 2$), and $\mu((0,1)) = \delta_0((0,1)) = 0$. The portmanteau theorem says $\mu(U) \le \liminf \mu_k(U)$, and indeed $0 \le 1$. But $\mu_k((0,1)) \not\to \mu((0,1))$: the limit is $1$ while the target is $0$. The reason is that $\partial(0,1) = \{0, 1\}$, and $\mu(\{0\}) = \delta_0(\{0\}) = 1 > 0$: the boundary of $(0,1)$ carries positive $\mu$-measure, so the portmanteau theorem does not guarantee setwise convergence for this particular set. Now consider $A = (1/2, 1)$: $\mu_k((1/2, 1)) = 0$ for every $k$ (since $1/k \le 1/2$ for $k \ge 2$, and $1 \notin (1/2, 1)$ for $k = 1$), and $\mu((1/2, 1)) = 0$. Here $\partial(1/2, 1) = \{1/2, 1\}$ and $\mu(\{1/2\}) = \mu(\{1\}) = 0$, so the portmanteau theorem does apply and confirms $\mu_k((1/2,1)) \to 0 = \mu((1/2,1))$.
[/example]
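The example can be replayed in a few lines. The sketch below is only an illustration under a simplifying assumption: a Dirac mass is represented by the location of its point, and the helper names are hypothetical. It contrasts the behavior on the two intervals with the convergence of integrals against a continuous, compactly supported test function.

```python
def dirac_mass(point, a, b):
    """Mass that the Dirac measure at `point` assigns to the open interval (a, b)."""
    return 1.0 if a < point < b else 0.0

def tent(x):
    """A continuous, compactly supported test function: max(0, 1 - |x|)."""
    return max(0.0, 1.0 - abs(x))

def integrate_against_dirac(f, point):
    """The integral of f against the Dirac measure at `point` is f(point)."""
    return f(point)

if __name__ == "__main__":
    for k in (2, 10, 100, 1000):
        print(
            k,
            dirac_mass(1 / k, 0.0, 1.0),            # stays at 1: boundary carries limit mass
            dirac_mass(1 / k, 0.5, 1.0),            # stays at 0: boundary is mu-null
            integrate_against_dirac(tent, 1 / k),   # approaches tent(0) = 1
        )
```

The set function on $(0,1)$ never approaches $\delta_0((0,1)) = 0$, while the integrals of the test function do approach its value at $0$, which is all that weak-$*$ convergence requires.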
## Tight Sequences and the Role of Mass at Infinity
A locally bounded sequence can have mass escaping to infinity. In the compactness theorem, the extracted subsequence still has a weak-$*$ limit, but that limit may have strictly less total mass than any term in the sequence: $\mu(\mathbb{R}^n) < \liminf_k \mu_k(\mathbb{R}^n)$. This loss of mass at infinity is not pathological — it is captured precisely by the portmanteau inequality for open sets applied to $U = \mathbb{R}^n$.
[definition: Tightness]
A sequence of Radon measures $(\mu_k)$ on $\mathbb{R}^n$ is **tight** if for every $\varepsilon > 0$ there exists a compact set $K \subset \mathbb{R}^n$ such that
\begin{align*}
\sup_{k \ge 1} \mu_k(\mathbb{R}^n \setminus K) < \varepsilon.
\end{align*}
[/definition]
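Tightness is the condition that prevents mass from escaping to infinity. It is a genuinely additional requirement on top of local boundedness: the sequence $\delta_k$ is locally bounded but not tight. For sequences with uniformly bounded total mass, such as probability measures, tightness says that a single compact set $K$ captures all but $\varepsilon$ of the mass of every $\mu_k$. Tightness alone does not bound the mass, however: $\mu_k = k\,\delta_0$ is tight but not locally bounded, so in compactness arguments the two conditions are imposed together.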
[quotetheorem:3043]
The proof is a direct application of the portmanteau theorem. The compact sets $K$ in the definition of tightness satisfy $\mu_k(K) \to \mu(K)$ (assuming $\mu(\partial K) = 0$, which can be arranged by slightly enlarging $K$), and the mass outside $K$ is less than $\varepsilon$ for all $k$ by tightness, and at most $\varepsilon$ for the limit $\mu$ by the portmanteau inequality for open sets, applied to the open set $\mathbb{R}^n \setminus K$.
Tightness rules out escape to infinity, but it does not rule out concentration: mass can stay in bounded regions while focusing on smaller and smaller sets. This is what happens when surface area measures of approximating surfaces concentrate onto a lower-dimensional object: the weak-$*$ limit keeps the total mass, but it need not be the Hausdorff measure of the limiting set, whose measure can be strictly smaller than $\lim_k \mu_k(\mathbb{R}^n)$, and indeed it will be if the surfaces are collapsing. This discrepancy between the limit of the measures and the measure of the limit set, and the question of how to account for it, is central to the compactness theory for currents in GMT II.
[remark: Narrow Convergence in Probability]
In probability theory, one frequently works with probability measures — measures with total mass exactly $1$. The natural notion of convergence for probability measures is **narrow convergence** (also called convergence in distribution or weak convergence in probability theory): $\mu_k$ converges narrowly to $\mu$ if $\int f \, d\mu_k \to \int f \, d\mu$ for all $f \in C_b(\mathbb{R}^n)$ (bounded continuous functions, not just compactly supported ones). Narrow convergence is stronger than weak-$*$ convergence: it forces $\mu_k(\mathbb{R}^n) \to \mu(\mathbb{R}^n)$ and prevents mass from escaping. The Prokhorov theorem is the narrow-convergence analogue of the compactness theorem: a sequence of probability measures is relatively narrowly compact if and only if it is tight. For GMT, weak-$*$ convergence is more natural than narrow convergence because the measures arising geometrically (Hausdorff measures on subsets, perimeter measures) are not normalized and can have mass escape.
[/remark]
## Summary of the Hierarchy
The theory of this chapter can be organized around a hierarchy of convergences, from weakest to strongest:
Weak-$*$ convergence ($\int f \, d\mu_k \to \int f \, d\mu$ for $f \in C_c(\mathbb{R}^n)$) is the weakest and most flexible. It comes with the compactness theorem and allows mass to concentrate and to escape to infinity. Narrow convergence adds control over mass at infinity. Setwise convergence ($\mu_k(A) \to \mu(A)$ for all Borel $A$) is stronger and rarely available. Total variation convergence ($\|\mu_k - \mu\|_{TV} \to 0$) is the strongest and most rigid, failing for any sequence where mass migrates between distinct points.
For GMT, weak-$*$ convergence is the fundamental notion because it is the only one that simultaneously offers compactness and the flexibility to handle the geometric phenomena — concentration, escape, and oscillation — that appear naturally in the subject. The machinery developed in this chapter — the compactness theorem, the portmanteau inequalities, uniform integrability and the Dunford–Pettis theorem, and Young measures — provides the complete toolkit for analyzing sequences of measures in geometric problems. These tools will reappear throughout the remainder of the course, particularly in the study of densities (Chapter 10) and the connection between weak-$*$ limits and rectifiability in GMT II.
---
We now turn from general Borel measures to the geometric measures central to this course: Hausdorff measures. These measures are tailored to sets of non-integer dimension, and we will use weak convergence and the machinery developed so far to study their properties and applications.
# 9. Hausdorff Measures
How should one measure the "size" of a fractal? The question sounds strange at first — we have perfectly good notions of length, area, and volume. But consider the middle-thirds Cantor set $C$. Its Lebesgue measure $\mathcal{L}^1(C) = 0$, so in one dimension it is negligible. Yet $C$ is uncountable, nowhere dense, and has an intricate self-similar structure that feels genuinely geometric. Calling it zero-dimensional misses something real. What we need is a measure that can distinguish sets by their *fractional* complexity — one that assigns finite, positive size to $C$ despite the fact that $C$ is simultaneously too small for length and too large for counting.
This chapter constructs the Hausdorff measures $\mathcal{H}^s$ for all $s \geq 0$, defines Hausdorff dimension, and proves the foundational result that $\mathcal{H}^n = \mathcal{L}^n$ on $\mathbb{R}^n$. The theory rests on a simple observation: the diameter of a set governs how much $s$-dimensional volume it can contain. By optimizing over increasingly fine covers and normalizing correctly, we extract a measure that is sensitive to fractional geometry in a way that Lebesgue measure cannot be.
[example: The Failure of Lebesgue Measure for Fractals]
Consider three sets in $\mathbb{R}^2$: a smooth curve $\Gamma$, the unit disk $D$, and the Koch snowflake boundary $K$. All three are subsets of $\mathbb{R}^2$, but they have fundamentally different geometric character.
The smooth curve $\Gamma$ satisfies $\mathcal{L}^2(\Gamma) = 0$ — it has zero area, as expected. The disk has $\mathcal{L}^2(D) = \pi > 0$. The snowflake boundary $K$ also has $\mathcal{L}^2(K) = 0$, just like the curve. But $K$ has *infinite length*: at the $k$-th stage of the construction, the perimeter is $(4/3)^k \cdot 3$, which grows without bound. So $\mathcal{H}^1(K) = +\infty$.
Lebesgue measure sees $\Gamma$ and $K$ as identical (both zero-area subsets of $\mathbb{R}^2$), but they are geometrically very different. Hausdorff measure $\mathcal{H}^s$ with $s = \log 4 / \log 3 \approx 1.26$ assigns $K$ a finite, positive size — capturing the fact that $K$ lives in a fractional dimension strictly between 1 and 2.
[/example]
## Construction of $\mathcal{H}^s$
What invariant of a covering set controls how much $s$-dimensional volume it can contribute? The answer is its diameter. A set of diameter $d$ is contained in a closed ball of radius $d$ centered at any of its points (though not always in a ball of radius $d/2$), so its $s$-dimensional volume scales at most like $d^s$. The construction of $\mathcal{H}^s$ turns this observation into a precise infimum over covers.
To build $\mathcal{H}^s$, we need to decide what "diameter at most $\delta$" means and how to normalize the contribution of each covering set. The normalization is not arbitrary: we want $\mathcal{H}^n = \mathcal{L}^n$ when $s = n$ is a positive integer. This forces the normalizing constant to be the volume of the $s$-dimensional unit ball.
[definition: Normalization Constant $\alpha(s)$]
For $s \geq 0$, define
\begin{align*}
\alpha(s) = \frac{\pi^{s/2}}{\Gamma(s/2 + 1)},
\end{align*}
where $\Gamma$ denotes the Euler gamma function. For positive integers, $\alpha(n) = \mathcal{L}^n(B(0,1))$ is the Lebesgue measure of the unit ball in $\mathbb{R}^n$. For example, $\alpha(1) = 2$, $\alpha(2) = \pi$, and $\alpha(3) = 4\pi/3$.
[/definition]
The role of $\alpha(s)$ is to ensure dimensional consistency. When $s = n \in \mathbb{N}$, a ball of diameter $d$ in $\mathbb{R}^n$ has Lebesgue measure $\alpha(n)(d/2)^n$, so the covering sum with this normalization directly approximates Lebesgue measure. For non-integer $s$, the formula extrapolates via the Gamma function, giving a continuous family of normalizations.
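The values quoted in the definition are easy to confirm numerically. The following is a small sketch using the standard library gamma function; the function name `alpha` simply mirrors the notation of the text.

```python
import math

def alpha(s):
    """Normalization constant pi^(s/2) / Gamma(s/2 + 1) for s >= 0."""
    if s < 0:
        raise ValueError("alpha(s) is only considered here for s >= 0")
    return math.pi ** (s / 2) / math.gamma(s / 2 + 1)

if __name__ == "__main__":
    # Integer values reproduce the volumes of unit balls; the last value
    # is a non-integer dimension, where alpha is defined by the same formula.
    for s in (0, 1, 2, 3, math.log(2) / math.log(3)):
        print(s, alpha(s))
```

For $s = 1, 2, 3$ the output should agree with $2$, $\pi$, and $4\pi/3$ up to rounding, and $\alpha(0) = 1$ is consistent with $\mathcal{H}^0$ being counting measure.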
[definition: $\delta$-Approximating Hausdorff Measure $\mathcal{H}^s_\delta$]
Let $s \geq 0$ and $\delta > 0$. For any set $E \subset \mathbb{R}^n$, define
\begin{align*}
\mathcal{H}^s_\delta(E) = \inf\left\{ \sum_{j=1}^\infty \alpha(s) \left(\frac{\operatorname{diam}(C_j)}{2}\right)^s : E \subset \bigcup_{j=1}^\infty C_j,\ \operatorname{diam}(C_j) \leq \delta \text{ for all } j \right\},
\end{align*}
where the infimum is over all countable covers of $E$ by sets $C_j$ of diameter at most $\delta$. By convention, $\operatorname{diam}(\varnothing) = 0$.
[/definition]
The restriction to covers of diameter at most $\delta$ is what distinguishes $\mathcal{H}^s_\delta$ from a cruder approximation. Restricting to fine covers forces us to actually resolve the local geometry of $E$ rather than capturing it in one large set. As $\delta$ decreases, fewer covers are permitted, so the infimum can only increase: $\mathcal{H}^s_\delta$ is a non-decreasing function of $1/\delta$.
[definition: Hausdorff Measure $\mathcal{H}^s$]
For $s \geq 0$ and $E \subset \mathbb{R}^n$, define the $s$-dimensional Hausdorff (outer) measure by
\begin{align*}
\mathcal{H}^s(E) = \lim_{\delta \to 0^+} \mathcal{H}^s_\delta(E) = \sup_{\delta > 0} \mathcal{H}^s_\delta(E) \in [0, +\infty].
\end{align*}
The equality of the limit and the supremum holds because $\mathcal{H}^s_\delta(E)$ is non-decreasing as $\delta \to 0^+$.
[/definition]
[remark: Why the Limit Exists]
Since $\delta \mapsto \mathcal{H}^s_\delta(E)$ is non-decreasing as $\delta$ decreases (smaller $\delta$ means fewer permissible covers, so the infimum can only rise), the limit $\lim_{\delta \to 0^+} \mathcal{H}^s_\delta(E)$ always exists in $[0, +\infty]$. No monotone convergence argument is needed — the limit is simply the supremum over all $\delta > 0$.
[/remark]
Why pass to the limit $\delta \to 0$? With a fixed $\delta$, a single large set of diameter $\delta$ could cover a complicated set $E$ cheaply, giving an underestimate of its complexity. As $\delta \to 0$, covers must become increasingly fine, and the infimum forces us to account for the local geometry at all scales. This is the key geometric idea: $\mathcal{H}^s(E)$ is large precisely when $E$ cannot be efficiently covered by small sets.
[quotetheorem:2987]
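The proof of countable subadditivity is a straightforward consequence of the definition: if each $A_j$ is covered by sets with the covering sum within $\varepsilon/2^j$ of the infimum, the union is covered by the combined collection. Borel regularity requires more work. Since closing a set does not change its diameter, the covering sets may be taken closed. For any $E$ with $\mathcal{H}^s(E) < \infty$ one then constructs a Borel set $B \supset E$ with $\mathcal{H}^s(B) = \mathcal{H}^s(E)$ as $B = \bigcap_k \bigcup_j C_j^k$, where $\{C_j^k\}_j$ is a cover of $E$ by closed sets of diameter at most $1/k$ whose covering sum is within $1/k$ of $\mathcal{H}^s_{1/k}(E)$.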
The Borel regularity of $\mathcal{H}^s$ is a crucial structural property. It means $\mathcal{H}^s$ is completely determined by its values on Borel sets, so we can apply all the tools of abstract measure theory — differentiation, Radon-Nikodym, Fubini — to computations involving Hausdorff measure.
[remark: $\mathcal{H}^s$ and Radon Measures]
Unlike Lebesgue measure, $\mathcal{H}^s$ is generally not a Radon measure on all of $\mathbb{R}^n$ for $s < n$: it is not finite on compact sets (for example, $\mathcal{H}^1([0,1]^2) = +\infty$, since the square contains infinitely many disjoint segments of length $1$). However, if we restrict $\mathcal{H}^s$ to an $\mathcal{H}^s$-measurable set $E$ with $\mathcal{H}^s(E \cap K) < \infty$ for every compact $K$ (in particular, if $\mathcal{H}^s(E) < \infty$), the restriction $\mathcal{H}^s \lfloor E$ is a Radon measure on $\mathbb{R}^n$. $\sigma$-finiteness alone is not enough, since it does not guarantee local finiteness. This is the setting that arises in practice when studying $s$-dimensional surfaces and rectifiable sets.
[/remark]
[example: Computing $\mathcal{H}^1$ of a Line Segment]
Let $E = [0, L] \times \{0\} \subset \mathbb{R}^2$, a line segment of length $L$. We compute $\mathcal{H}^1(E)$.
For the upper bound, cover $E$ by $\lceil L/\delta \rceil$ intervals of length $\delta$ (with possibly one shorter interval at the end). Each interval has diameter $\delta$, so the covering sum is
\begin{align*}
\mathcal{H}^1_\delta(E) \leq \lceil L/\delta \rceil \cdot \alpha(1) \cdot \frac{\delta}{2} = \lceil L/\delta \rceil \cdot \delta.
\end{align*}
Since $\alpha(1) = 2$, each term contributes $2 \cdot (\delta/2) = \delta$. Thus $\mathcal{H}^1_\delta(E) \leq L + \delta$, and passing to the limit gives $\mathcal{H}^1(E) \leq L$.
For the lower bound, note that any cover $\{C_j\}$ of $E$ satisfies $\sum_j \operatorname{diam}(C_j) \geq L$, because the projections of the $C_j$ onto the $x$-axis must cover $[0, L]$, and the length of a projection does not exceed the diameter. Therefore
\begin{align*}
\sum_j \alpha(1) \frac{\operatorname{diam}(C_j)}{2} = \sum_j \operatorname{diam}(C_j) \geq L.
\end{align*}
Thus $\mathcal{H}^1_\delta(E) \geq L$ for all $\delta > 0$, giving $\mathcal{H}^1(E) \geq L$.
Together, $\mathcal{H}^1(E) = L$: one-dimensional Hausdorff measure of a line segment is its length.
[/example]
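The upper bound in the example can be tabulated directly. The sketch below (with a hypothetical helper name) evaluates the covering sum $\lceil L/\delta \rceil \cdot \alpha(1) \cdot \delta/2$ for a cover by consecutive intervals of diameter $\delta$ and shows it squeezing down to $L$ as $\delta \to 0$. It illustrates only the upper bound; the lower bound comes from the projection argument above.

```python
import math

def segment_cover_sum(length, delta):
    """Covering sum alpha(1) * (diam / 2) for a cover of [0, length] by
    consecutive intervals of diameter delta (the last one may overshoot)."""
    pieces = math.ceil(length / delta)
    alpha_1 = 2.0  # alpha(1) = 2, the length of the interval (-1, 1)
    return pieces * alpha_1 * (delta / 2)

if __name__ == "__main__":
    for delta in (0.7, 0.1, 0.01, 0.001):
        print(delta, segment_cover_sum(3.0, delta))
```

Each value lies between $L$ and $L + \delta$, so the sums converge to $L = 3$ from above.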
This example, simple as it is, confirms that the normalization $\alpha(1) = 2$ is correct: $\mathcal{H}^1$ on line segments recovers the familiar Euclidean length. The agreement with classical notions in integer dimensions is not accidental — it is exactly what the normalization $\alpha(s)$ was chosen to ensure.
## Hausdorff Dimension
Once we have a family of measures $\mathcal{H}^s$ indexed by $s \geq 0$, a natural question arises: for a given set $E$, which value of $s$ gives the "right" measure? As $s$ increases, the weight $(\operatorname{diam}/2)^s$ of a small covering set shrinks, so covering sums over fine covers get smaller. So $\mathcal{H}^s(E)$ is a non-increasing function of $s$. The critical question is: does it jump from $\infty$ to $0$ at a single value, or transition gradually?
The answer is striking: the transition is instantaneous. For any set $E$, there is a single critical value $s_0$ such that $\mathcal{H}^s(E) = +\infty$ for all $s < s_0$ and $\mathcal{H}^s(E) = 0$ for all $s > s_0$. At the critical value $s_0$ itself, $\mathcal{H}^{s_0}(E)$ can be $0$, $+\infty$, or any value in $(0, +\infty)$. This threshold is the Hausdorff dimension.
[quotetheorem:2988]
The proof of this jump is elementary but instructive. If $\{C_j\}$ is a $\delta$-cover of $E$ with $\sum_j (\operatorname{diam}(C_j)/2)^s \leq M$, and if $t > s$, then for all $j$ with $\operatorname{diam}(C_j) \leq \delta$,
\begin{align*}
\left(\frac{\operatorname{diam}(C_j)}{2}\right)^t = \left(\frac{\operatorname{diam}(C_j)}{2}\right)^{t-s} \cdot \left(\frac{\operatorname{diam}(C_j)}{2}\right)^s \leq \left(\frac{\delta}{2}\right)^{t-s} \left(\frac{\operatorname{diam}(C_j)}{2}\right)^s.
\end{align*}
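Summing over $j$ and multiplying by $\alpha(t)$, every such $\delta$-cover satisfies $\sum_j \alpha(t) (\operatorname{diam}(C_j)/2)^t \leq \alpha(t) (\delta/2)^{t-s} M$. If $\mathcal{H}^s(E) < \infty$, then for any fixed $M > \mathcal{H}^s(E)/\alpha(s)$ such covers exist for every $\delta > 0$, because $\mathcal{H}^s_\delta(E) \leq \mathcal{H}^s(E) < \alpha(s) M$. Hence $\mathcal{H}^t_\delta(E) \leq \alpha(t) (\delta/2)^{t-s} M$ with $M$ independent of $\delta$. As $\delta \to 0$ with $t > s$, the right side tends to $0$, so $\mathcal{H}^t(E) = 0$.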
This theorem tells us that the function $s \mapsto \mathcal{H}^s(E)$ exhibits a single jump from $+\infty$ to $0$. We define:
[definition: Hausdorff Dimension]
For a set $E \subset \mathbb{R}^n$, the Hausdorff dimension of $E$ is
\begin{align*}
\dim_{\mathcal{H}}(E) = \inf\{s \geq 0 : \mathcal{H}^s(E) = 0\} = \sup\{s \geq 0 : \mathcal{H}^s(E) = +\infty\},
\end{align*}
with the conventions $\inf \varnothing = +\infty$ and $\sup \varnothing = 0$.
[/definition]
The equality of the two expressions is a consequence of the jump theorem: once $\mathcal{H}^s(E) = 0$ for some $s$, it is zero for all larger values; once $\mathcal{H}^s(E) = +\infty$, it is infinite for all smaller values. The infimum and supremum therefore coincide.
The hypothesis that $E \subset \mathbb{R}^n$ ensures $\dim_{\mathcal{H}}(E) \leq n$: any set in $\mathbb{R}^n$ can be covered by cubes of side length $\delta$, giving $\mathcal{H}^s_\delta(E) \to 0$ for $s > n$ whenever $\mathcal{L}^n(E) < \infty$. For sets of infinite Lebesgue measure, one covers them in pieces. The bound $\dim_{\mathcal{H}}(E) \leq n$ always holds for $E \subset \mathbb{R}^n$.
[quotetheorem:2989]
Property 2 — countable stability — is one of the key advantages of Hausdorff dimension over box-counting dimension. A countable union of sets of dimension at most $s$ has dimension at most $s$: this follows because $\mathcal{H}^t(E_j) = 0$ for each $j$ when $t > \dim_{\mathcal{H}}(E_j)$, and $\mathcal{H}^t\!\left(\bigcup_j E_j\right) \leq \sum_j \mathcal{H}^t(E_j) = 0$.
Property 3 explains why the rational numbers, the endpoints of the intervals removed in the Cantor construction, and other countable sets have dimension zero: for a single point $\{x\}$, the cover $\{\overline{B}(x, \delta/2)\}$ gives $\mathcal{H}^s_\delta(\{x\}) \leq \alpha(s)(\delta/2)^s \to 0$ as $\delta \to 0$ for any $s > 0$, so $\mathcal{H}^s(\{x\}) = 0$. By countable subadditivity, countably many such points have total $\mathcal{H}^s$-measure zero.
Property 5 is the crucial calibration theorem: Hausdorff dimension correctly identifies the dimension of smooth geometric objects. We will not prove this in full generality here, but the case of $\mathcal{H}^n = \mathcal{L}^n$ in Section 3 provides the $k = n$ case, and the case of smooth curves (where $\mathcal{H}^1$ equals arc length) follows from the Lipschitz behavior of parametrizations.
[explanation: Dimension at the Critical Value]
At the critical dimension $s_0 = \dim_{\mathcal{H}}(E)$, the value $\mathcal{H}^{s_0}(E)$ can take any value in $[0, +\infty]$. All three cases actually occur:
- If $E = [0,1]^n$, then $\dim_{\mathcal{H}}(E) = n$ and $\mathcal{H}^n(E) = \mathcal{L}^n([0,1]^n) = 1 \in (0, \infty)$.
- If $E$ is the rational numbers in $[0,1]$, then $\dim_{\mathcal{H}}(E) = 0$ and $\mathcal{H}^0(E) = +\infty$ (since $\mathcal{H}^0$ is counting measure and $\mathbb{Q} \cap [0,1]$ is infinite).
- There exist sets with $\dim_{\mathcal{H}}(E) = s_0$ and $\mathcal{H}^{s_0}(E) = 0$, or $\mathcal{H}^{s_0}(E) = +\infty$, or any prescribed finite positive value. Constructing a set with $\mathcal{H}^s(E) = c$ for a given $c > 0$ and $s \notin \mathbb{N}$ requires explicit fractal constructions.
This ambiguity at the critical value is a feature, not a bug: it means Hausdorff dimension is a genuinely refined invariant that requires more data (the actual value $\mathcal{H}^{s_0}(E)$) to be completely informative.
[/explanation]
<!-- illustration-needed: the graph of s ↦ H^s(E) for a specific fractal set E — show the jump from +∞ to 0 at the critical value s₀ = dim_H(E), with the critical value s₀ marked on the horizontal axis -->
## The Isodiametric Inequality and $\mathcal{H}^n = \mathcal{L}^n$
The construction of $\mathcal{H}^n$ produced a Borel regular outer measure that, by design, assigns finite weight proportional to $\alpha(n)(\operatorname{diam}/2)^n$ to each covering set. But this is an outer measure defined through covering procedures — a priori it has no obvious relationship to Lebesgue measure, which is defined through products of intervals. Why should $\mathcal{H}^n$ agree with $\mathcal{L}^n$?
The key geometric fact is the isodiametric inequality: among all sets of fixed diameter, the ball maximizes Lebesgue measure. This means that an arbitrary covering set $C$, however irregular, has Lebesgue measure at most $\alpha(n)(\operatorname{diam}(C)/2)^n$, which is exactly the Lebesgue measure of a ball of the same diameter. The normalization $\alpha(n)$ was chosen to make this the correct scale.
[quotetheorem:2990]
The proof proceeds via Steiner symmetrization. Given $E \subset \mathbb{R}^n$ and a hyperplane $H = \{x_n = 0\}$, define the Steiner symmetral $E^H$ as follows: for each $x' = (x_1, \ldots, x_{n-1})$, let $I_{x'} = \{t \in \mathbb{R} : (x', t) \in E\}$ and replace $I_{x'}$ by the symmetric interval $(-\mathcal{L}^1(I_{x'})/2, \mathcal{L}^1(I_{x'})/2)$. The set $E^H$ consists of all $(x', t)$ with $|t| < \mathcal{L}^1(I_{x'})/2$.
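Steiner symmetrization preserves $\mathcal{L}^n$-measure (by Fubini's theorem, since each fiber is replaced by one of equal length) and does not increase the diameter. To see that the diameter does not increase, replace $E$ by its closure, which has the same diameter, so that $E$ may be assumed compact (the case $\operatorname{diam}(E) = \infty$ is trivial). Let $(x', t)$ and $(y', s)$ be in $E^H$, so that $|t| < \mathcal{L}^1(I_{x'})/2$ and $|s| < \mathcal{L}^1(I_{y'})/2$. Then $|t - s| \leq |t| + |s| < (\mathcal{L}^1(I_{x'}) + \mathcal{L}^1(I_{y'}))/2$. Writing $a = \min I_{x'}$, $b = \max I_{x'}$, $c = \min I_{y'}$, $d = \max I_{y'}$, we have $\mathcal{L}^1(I_{x'}) + \mathcal{L}^1(I_{y'}) \leq (b - a) + (d - c) = (b - c) + (d - a) \leq 2\max\{b - c,\, d - a\}$. So $E$ contains a point over $x'$ and a point over $y'$ whose vertical separation is at least $|t - s|$; since the horizontal separation $|x' - y'|$ is the same, $|(x', t) - (y', s)| \leq \operatorname{diam}(E)$.
By symmetrizing successively in all $n$ coordinate hyperplanes, we obtain a set $E^*$ with the same $\mathcal{L}^n$-measure as $E$, diameter at most $\operatorname{diam}(E)$, and symmetric about the origin: $x \in E^*$ implies $-x \in E^*$ (one checks that each symmetrization preserves the reflection symmetries already obtained). For such a set, $2|x| = |x - (-x)| \leq \operatorname{diam}(E^*)$ for every $x \in E^*$, so $E^*$ lies in the closed ball of radius $\operatorname{diam}(E^*)/2$ centered at the origin. Since a ball of diameter $d$ has Lebesgue measure $\alpha(n)(d/2)^n$, we get $\mathcal{L}^n(E) = \mathcal{L}^n(E^*) \leq \alpha(n)(\operatorname{diam}(E^*)/2)^n \leq \alpha(n)(\operatorname{diam}(E)/2)^n$.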
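A concrete planar case shows why the inequality is not a triviality about sets sitting inside balls. The sketch below (an illustration with hypothetical function names) takes the equilateral triangle of side length $1$. Its diameter is $1$, it does not fit in any disk of radius $1/2$ because its circumradius is $1/\sqrt{3}$, and yet its area stays below $\alpha(2)(1/2)^2 = \pi/4$, as the isodiametric inequality predicts.

```python
import math
from itertools import combinations

def diameter(points):
    """Largest pairwise distance; for a convex polygon it is attained at vertices."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))

def polygon_area(vertices):
    """Shoelace formula for a simple polygon with vertices listed in order."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2

def isodiametric_bound(diam, n=2):
    """Right-hand side alpha(n) * (diam / 2)**n of the isodiametric inequality."""
    alpha_n = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    return alpha_n * (diam / 2) ** n

# Equilateral triangle with side length 1 (an illustrative choice).
TRIANGLE = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
# Radius of the smallest disk containing the triangle (its circumradius).
CIRCUMRADIUS = 1 / math.sqrt(3)

if __name__ == "__main__":
    d = diameter(TRIANGLE)
    print("diameter:", d)
    print("area:", polygon_area(TRIANGLE))
    print("isodiametric bound:", isodiametric_bound(d))
    print("fits in a disk of radius d/2:", CIRCUMRADIUS <= d / 2)
```

The triangle is therefore a set that no ball of radius $\operatorname{diam}/2$ contains, but whose measure is still controlled by the measure of such a ball.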
The isodiametric inequality is the bridge between $\mathcal{H}^n$ and $\mathcal{L}^n$.
[quotetheorem:2991]
The proof has two directions.
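For $\mathcal{H}^n(E) \leq \mathcal{L}^n(E)$: Given any Borel set $E$ with $\mathcal{L}^n(E) < \infty$ and $\varepsilon > 0$, choose an open set $U \supset E$ with $\mathcal{L}^n(U) \leq \mathcal{L}^n(E) + \varepsilon$ (outer regularity of $\mathcal{L}^n$), and use the Vitali covering theorem to fill $U$, up to an $\mathcal{L}^n$-null set $N$, with disjoint closed balls $B_j = \overline{B}(x_j, r_j) \subset U$ of diameter $2r_j \leq \delta$, so that $\sum_j \mathcal{L}^n(B_j) \leq \mathcal{L}^n(E) + \varepsilon$. The null set is harmless: covering by small cubes shows $\mathcal{H}^n \leq c_n \mathcal{L}^n$ for a dimensional constant $c_n$, so $\mathcal{H}^n_\delta(N) = 0$. Each ball $B_j$ has Lebesgue measure $\alpha(n) r_j^n$, so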
\begin{align*}
\mathcal{H}^n_\delta(E) \leq \sum_j \alpha(n) \left(\frac{\operatorname{diam}(B_j)}{2}\right)^n = \sum_j \alpha(n) r_j^n = \sum_j \mathcal{L}^n(B_j) \leq \mathcal{L}^n(E) + \varepsilon.
\end{align*}
As $\varepsilon \to 0$ and $\delta \to 0$, we get $\mathcal{H}^n(E) \leq \mathcal{L}^n(E)$.
For $\mathcal{H}^n(E) \geq \mathcal{L}^n(E)$: This is where the isodiametric inequality enters. For any $\delta$-cover $\{C_j\}$ of $E$,
\begin{align*}
\mathcal{L}^n(E) \leq \mathcal{L}^n\!\left(\bigcup_j C_j\right) \leq \sum_j \mathcal{L}^n(C_j) \leq \sum_j \alpha(n) \left(\frac{\operatorname{diam}(C_j)}{2}\right)^n,
\end{align*}
where the last inequality uses the isodiametric inequality applied to each $C_j$. Taking the infimum over all $\delta$-covers gives $\mathcal{L}^n(E) \leq \mathcal{H}^n_\delta(E)$ for all $\delta > 0$, and therefore $\mathcal{L}^n(E) \leq \mathcal{H}^n(E)$.
This theorem validates the entire construction: the normalization constant $\alpha(s)$ was chosen precisely so that $\mathcal{H}^n = \mathcal{L}^n$. It also shows that Hausdorff measures are the correct generalization: they reduce to Lebesgue measure in integer dimension $n$ and extend meaningfully to all $s \geq 0$.
The necessity of the isodiametric inequality should not be underestimated. For the lower bound $\mathcal{H}^n \geq \mathcal{L}^n$, we need to know that covers by arbitrary sets — not just balls — cannot be more efficient than covers by balls. The isodiametric inequality guarantees exactly this: balls are the extremal covering shapes.
For $s < n$, an identity of the form $\mathcal{H}^s = \mathcal{L}^s$ does not make sense (there is no $\mathcal{L}^s$ for non-integer $s$), but the connection to classical geometry holds in a different way: $\mathcal{H}^k$ on a smooth $k$-dimensional surface in $\mathbb{R}^n$ coincides with the classical surface area measure induced by the Riemannian metric, a fact proved using the area formula for Lipschitz maps.
## Hausdorff Dimension and the Cantor Set
How do we actually compute the Hausdorff dimension of a set that is not a smooth manifold? The middle-thirds Cantor set $C$ is the canonical example: it is uncountable, has Lebesgue measure zero, and lives somewhere between dimension 0 and dimension 1. Computing $\dim_{\mathcal{H}}(C)$ requires matching upper and lower bounds, with different tools for each direction.
[definition: Middle-Thirds Cantor Set]
Define a nested sequence of closed sets in $[0,1]$ as follows. Set $C_0 = [0,1]$. Given $C_k$ (a union of $2^k$ closed intervals of length $3^{-k}$), obtain $C_{k+1}$ by removing the open middle third of each component interval. The middle-thirds Cantor set is
\begin{align*}
C = \bigcap_{k=0}^\infty C_k.
\end{align*}
At stage $k$, $C_k$ consists of $2^k$ intervals each of length $3^{-k}$.
[/definition]
[example: Hausdorff Dimension of the Cantor Set]
We show $\dim_{\mathcal{H}}(C) = \log 2 / \log 3$.
**Upper bound.** At stage $k$, the $2^k$ intervals of $C_k$ each have diameter $3^{-k}$, and their union covers $C$. Taking $\delta = 3^{-k}$, this gives
\begin{align*}
\mathcal{H}^s_{3^{-k}}(C) \leq 2^k \cdot \alpha(s) \cdot \left(\frac{3^{-k}}{2}\right)^s = \frac{\alpha(s)}{2^s} \cdot (2 \cdot 3^{-s})^k.
\end{align*}
If $s = \log 2 / \log 3$, then $3^{-s} = 2^{-1}$, so $2 \cdot 3^{-s} = 1$. The bound becomes $\mathcal{H}^s_{3^{-k}}(C) \leq \alpha(s)/2^s$ for all $k$. Passing to the limit gives $\mathcal{H}^s(C) \leq \alpha(s)/2^s < \infty$. Therefore $\dim_{\mathcal{H}}(C) \leq \log 2 / \log 3$.
**Lower bound (mass distribution principle).** Define the Cantor measure $\mu_C$ by distributing mass uniformly: set $\mu_C(I_{k,j}) = 2^{-k}$ for each of the $2^k$ intervals $I_{k,j}$ of $C_k$. This is consistent (each interval splits into two children of half the weight) and defines a Borel probability measure supported on $C$.
We claim: for every ball $B(x, r)$ with $x \in C$, we have $\mu_C(B(x, r)) \leq 2 \cdot 3^s \cdot r^s$ where $s = \log 2/\log 3$.
To see this, note first that for $r \geq 1$ the bound is trivial because $\mu_C$ has total mass $1$. For $r < 1$, choose an integer $k \geq 0$ such that $3^{-(k+1)} \leq r < 3^{-k}$. The ball $B(x, r)$ intersects at most 2 intervals of level $k$ (since each $I_{k,j}$ has length $3^{-k}$ and consecutive intervals are separated by gaps of length at least $3^{-k}$). Therefore
\begin{align*}
\mu_C(B(x, r)) \leq 2 \cdot 2^{-k}.
\end{align*}
Since $r \geq 3^{-(k+1)} = 3^{-1} \cdot 3^{-k}$, we have $3^{-k} \leq 3r$, so $2^{-k} = 3^{-ks} = (3^{-k})^s \leq (3r)^s = 3^s r^s$. Therefore
\begin{align*}
\mu_C(B(x, r)) \leq 2 \cdot 3^s \cdot r^s.
\end{align*}
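The mass distribution principle now applies: if $\mu_C$ is a Borel measure supported on $C$ with $\mu_C(B(x,r)) \leq C_0 r^s$ for all $x \in C$ and $r > 0$, then $\mathcal{H}^s(C) \geq \alpha(s) 2^{-s} \mu_C(C)/C_0$. To see why: for any $\delta$-cover $\{C_j\}$ of $C$, discard the sets that miss $C$ and pick a point $x_j \in C_j \cap C$ in each remaining one. Then
\begin{align*}
\mu_C(C) \leq \mu_C\!\left(\bigcup_j C_j\right) \leq \sum_j \mu_C(C_j) \leq \sum_j \mu_C\!\left(\overline{B}(x_j, \operatorname{diam}(C_j))\right) \leq C_0 \sum_j (\operatorname{diam}(C_j))^s,
\end{align*}
where we enclosed each $C_j$ in the closed ball of radius $\operatorname{diam}(C_j)$ centered at $x_j$ (the bound passes from open to closed balls by letting the radius decrease to $\operatorname{diam}(C_j)$). The covering sum in $\mathcal{H}^s_\delta$ is $\alpha(s) 2^{-s} \sum_j (\operatorname{diam}(C_j))^s$, so multiplying by $\alpha(s) 2^{-s}/C_0$ and taking the infimum gives $\mathcal{H}^s(C) \geq \alpha(s) 2^{-s} \mu_C(C)/C_0 = \alpha(s)/(2^{s+1} \cdot 3^s) > 0$.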
Since $\mathcal{H}^s(C) > 0$ for $s = \log 2/\log 3$, we get $\dim_{\mathcal{H}}(C) \geq \log 2/\log 3$.
Combining both bounds: $\dim_{\mathcal{H}}(C) = \log 2 / \log 3 \approx 0.631$.
[/example]
The mass distribution principle used in the lower bound is one of the most versatile tools in fractal geometry. It converts the existence of a well-distributed measure on $E$ into a lower bound for $\mathcal{H}^s(E)$. The upper bound always comes from explicit covers; the lower bound requires constructing a measure that does not concentrate.
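The ball estimate behind the lower bound can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not a proof: it evaluates the Cantor measure through its distribution function $F(x) = \mu_C([0, x])$, read off from the ternary digits of $x$ and truncated at a fixed depth, and compares $\mu_C(B(x,r)) = F(x+r) - F(x-r)$ with $2 \cdot 3^s r^s$ at left endpoints of stage-$k$ intervals. All function names and the sample radii are hypothetical choices.

```python
import math
from fractions import Fraction

S = math.log(2) / math.log(3)  # the critical exponent log 2 / log 3

def cantor_cdf(x, depth=40):
    """Approximate F(x) = mu_C([0, x]) from the ternary digits of x.

    Truncating after `depth` digits changes the value by at most 2**-depth.
    """
    x = Fraction(x)
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    total, weight = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)  # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            # x lies in the closure of a removed middle third, where F is constant
            return total + weight
        if digit == 2:
            total += weight
        weight /= 2
    return total

def ball_mass(x, r):
    """mu_C(B(x, r)); open and closed balls agree because mu_C has no atoms."""
    return cantor_cdf(x + r) - cantor_cdf(x - r)

def left_endpoints(k):
    """Left endpoints of the 2**k intervals of C_k, as exact fractions (all lie in C)."""
    points = [Fraction(0)]
    for level in range(1, k + 1):
        shift = Fraction(2, 3 ** level)
        points = points + [p + shift for p in points]
    return points

def worst_ratio(k, radii):
    """Largest observed value of mu_C(B(x, r)) / r**S over the sample points and radii."""
    return max(
        ball_mass(x, r) / float(r) ** S
        for x in left_endpoints(k)
        for r in radii
    )

if __name__ == "__main__":
    sample_radii = [Fraction(1, 3 ** j) for j in range(1, 6)] + [Fraction(1, 5)]
    print("worst observed ratio:", worst_ratio(4, sample_radii))
    print("constant from the proof:", 2 * 3 ** S)
```

The observed ratios stay below the constant $2 \cdot 3^s$ used in the proof, which is not claimed to be sharp.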
Notice what the computation reveals about the Cantor set's structure: the dimension $\log 2 / \log 3$ reflects the ratio $\log(\text{number of pieces}) / \log(1/\text{contraction ratio})$. At each stage we have $2^k$ pieces each scaled by $3^{-k}$. The dimension is the unique $s$ for which these two rates balance: $(2 \cdot 3^{-s})^k = 1$. This self-similar structure is characteristic of iterated function systems with equal contraction ratios, and the Cantor set dimension computation generalizes to such systems when the pieces do not overlap too much (for instance, under the open set condition).
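The balance between the number of pieces and their size can be seen by tabulating the stage-$k$ covering sums from the upper bound. The following sketch uses hypothetical helper names. It evaluates $\frac{\alpha(s)}{2^s}(2 \cdot 3^{-s})^k$ for exponents below, at, and above $\log 2/\log 3$, and solves the balance equation $N r^s = 1$ for $N$ pieces of ratio $r$. The covering sums are upper bounds for $\mathcal{H}^s_{3^{-k}}(C)$, not values of the measure itself.

```python
import math

def alpha(s):
    """Normalization constant pi^(s/2) / Gamma(s/2 + 1)."""
    return math.pi ** (s / 2) / math.gamma(s / 2 + 1)

def stage_cover_sum(s, k):
    """Covering sum 2^k * alpha(s) * (3^-k / 2)^s from the stage-k intervals of C."""
    return (alpha(s) / 2 ** s) * (2 * 3 ** (-s)) ** k

def similarity_dimension(pieces, ratio):
    """Solve pieces * ratio**s = 1 for s, i.e. s = log(pieces) / log(1 / ratio)."""
    return math.log(pieces) / math.log(1 / ratio)

if __name__ == "__main__":
    s0 = similarity_dimension(2, 1 / 3)
    for s in (s0 - 0.1, s0, s0 + 0.1):
        print(round(s, 4), [stage_cover_sum(s, k) for k in (5, 20, 40)])
```

Below the critical exponent the sums grow geometrically in $k$, above it they decay to $0$, and exactly at $s_0$ they stay at the constant $\alpha(s_0)/2^{s_0}$.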
<!-- illustration-needed: the first four stages of the middle-thirds Cantor set construction, showing C₀, C₁, C₂, C₃ with intervals labeled by their lengths 1, 1/3, 1/9, 1/27 -->
## Comparing Hausdorff Dimension to Other Notions
The Hausdorff dimension is not the only way to quantify the "size" of a set, and understanding how it relates to other dimension concepts clarifies what it does and does not capture. Two comparisons matter most: with topological dimension (which captures connectivity and local structure) and with Minkowski (box-counting) dimension (which captures scaling behavior more coarsely).
Topological dimension is defined inductively: the empty set has dimension $-1$, a space has dimension $\leq n$ if every point has a neighborhood base of open sets whose boundaries have dimension $\leq n-1$. For manifolds, smooth curves, and familiar geometric objects, topological and Hausdorff dimension agree. But for fractal sets, they diverge.
[example: Topological vs. Hausdorff Dimension]
The middle-thirds Cantor set $C$ has topological dimension $0$ (it is totally disconnected — every connected component is a single point). Yet $\dim_{\mathcal{H}}(C) = \log 2/\log 3 \approx 0.631$.
Similarly, the boundary of the Koch snowflake has topological dimension $1$ (it is a Jordan curve, homeomorphic to a circle) but $\dim_{\mathcal{H}}(\partial K) = \log 4/\log 3 \approx 1.26$.
The von Koch boundary is constructed by replacing each segment of length $\ell$ by four segments of length $\ell/3$ at each stage. Starting from a single side of length $1$, stage $k$ consists of $4^k$ segments, each of length $3^{-k}$. A computation analogous to the one for the Cantor set gives $\dim_{\mathcal{H}}(\partial K) = \log 4/\log 3$ (upper bound from the stage-$k$ cover; lower bound from the self-similar measure).
In both cases, Hausdorff dimension detects fractal complexity that topological dimension — by design — ignores: topological dimension only sees connected components and local connectivity, not how efficiently the set fills space at small scales.
[/example]
[definition: Lower and Upper Minkowski Dimension]
For a bounded set $E \subset \mathbb{R}^n$ and $r > 0$, let $N(E, r)$ denote the minimum number of balls of radius $r$ needed to cover $E$. The lower and upper Minkowski (box-counting) dimensions are
\begin{align*}
\underline{\dim}_M(E) &= \liminf_{r \to 0^+} \frac{\log N(E, r)}{-\log r}, \\
\overline{\dim}_M(E) &= \limsup_{r \to 0^+} \frac{\log N(E, r)}{-\log r}.
\end{align*}
When these coincide, the common value is the Minkowski dimension $\dim_M(E)$.
[/definition]
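The Minkowski dimension measures the rate at which $N(E, r)$ grows as $r \to 0$. For the Cantor set, $2^{k-1} \leq N(C, 3^{-k}) \leq 2^k$: each of the $2^k$ level-$k$ intervals fits inside a ball of radius $3^{-k}$, while the left endpoints of the $2^{k-1}$ level-$(k-1)$ intervals are more than $2 \cdot 3^{-k}$ apart, so no ball contains two of them. Hence $\log N(C, 3^{-k}) / \log 3^k \to \log 2 / \log 3$, and since $N(C, r)$ is monotone in $r$ the same limit holds along all $r \to 0$, recovering the Hausdorff dimension. For smooth curves and surfaces, Minkowski and Hausdorff dimensions also agree.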
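The box count for the Cantor set can be checked with exact integer arithmetic. The sketch below makes two simplifying assumptions that are not part of the definition above: it counts grid boxes of side $3^{-k}$ rather than balls of radius $r$ (the two counts differ by at most a constant factor, which does not change the dimension), and it samples $C$ by the left endpoints of the level-$m$ intervals. The helper names are hypothetical.

```python
import math
from itertools import product

def cantor_left_endpoints(m):
    """Left endpoints of the level-m intervals, as integers in units of 3**-m.

    These are exactly the m-digit base-3 numbers whose digits are 0 or 2.
    """
    points = []
    for digits in product((0, 2), repeat=m):
        value = 0
        for d in digits:
            value = 3 * value + d
        points.append(value)
    return points

def box_count(points, m, k):
    """Number of grid boxes of side 3**-k that contain at least one sample point."""
    scale = 3 ** (m - k)
    return len({p // scale for p in points})

M = 10
sample = cantor_left_endpoints(M)
for k in (2, 4, 6, 8):
    n = box_count(sample, M, k)
    print(k, n, round(math.log(n) / (k * math.log(3)), 4))
```

Each row shows $2^k$ occupied boxes and the same ratio $\log 2 / \log 3$, as the estimate in the text predicts.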
However, the two dimensions can differ, and the relationship is always an inequality.
[quotetheorem:2992]
The inequality $\dim_{\mathcal{H}}(E) \leq \underline{\dim}_M(E)$ follows from the fact that a cover by $N(E, r)$ balls of radius $r$ is a valid $2r$-cover for the Hausdorff measure computation. Fix $s > \underline{\dim}_M(E)$ and choose $\varepsilon > 0$ with $s - \varepsilon > \underline{\dim}_M(E)$. By the definition of the $\liminf$, there is a sequence $r_j \to 0$ with $N(E, r_j) \leq r_j^{-(s - \varepsilon)}$, and then
\begin{align*}
\mathcal{H}^s_{2r_j}(E) \leq N(E, r_j) \cdot \alpha(s) r_j^s \leq \alpha(s) r_j^{\varepsilon} \to 0 \text{ as } j \to \infty,
\end{align*}
giving $\mathcal{H}^s(E) = 0$ and hence $\dim_{\mathcal{H}}(E) \leq s$. Since $s > \underline{\dim}_M(E)$ was arbitrary, $\dim_{\mathcal{H}}(E) \leq \underline{\dim}_M(E)$.
[example: A Set Where $\dim_{\mathcal{H}} < \dim_M$]
The key difference between Hausdorff and Minkowski dimensions is *countable stability*. Hausdorff dimension satisfies $\dim_{\mathcal{H}}\!\left(\bigcup_j E_j\right) = \sup_j \dim_{\mathcal{H}}(E_j)$, but Minkowski dimension does not.
Consider the set $E = \{0\} \cup \{1/k : k \in \mathbb{N}\}$. Each point has $\dim_{\mathcal{H}}(\{1/k\}) = 0$, and $E$ is countable, so $\dim_{\mathcal{H}}(E) = 0$.
For the Minkowski dimension, we need to estimate $N(E, r)$. Consecutive points $1/k$ and $1/(k+1)$ are separated by $1/k - 1/(k+1) = 1/(k(k+1)) \approx 1/k^2$, and these gaps shrink as $k$ grows. Given $r > 0$, let $k$ be the largest integer with $1/(k(k+1)) > 2r$, so that $k \approx 1/\sqrt{2r}$. The points $1, 1/2, \ldots, 1/k$ are then pairwise more than $2r$ apart, so no ball of radius $r$ contains two of them, and $N(E, r) \geq k \gtrsim r^{-1/2}$. Since this holds for every small $r$, it gives $\underline{\dim}_M(E) \geq 1/2$.
For the matching upper bound, cover each of the points $1, 1/2, \ldots, 1/k$ by its own ball and cover the interval $[0, 1/k]$, which contains the rest of $E$, by about $1/(2kr) \approx k$ further balls. This gives $N(E, r) \lesssim r^{-1/2}$ and therefore $\dim_M(E) = 1/2$. Thus $\dim_{\mathcal{H}}(E) = 0 < 1/2 = \dim_M(E)$: the accumulation of $1/k \to 0$ raises the Minkowski dimension even though no individual piece has positive dimension.
[/example]
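The covering number in this example can be computed exactly for a given $r$, which makes the slow approach to $1/2$ visible. The sketch below assumes closed balls in $\mathbb{R}$ (intervals of length $2r$) and uses the left-to-right greedy cover, which is optimal for points on a line. The function name `cover_number` is hypothetical.

```python
import math

def cover_number(r):
    """Minimum number of closed intervals of length 2r covering {0} U {1/k : k >= 1}."""
    # The interval [0, 2r] covers 0 and every point 1/k with 1/k <= 2r.
    count = 1
    reach = 2.0 * r
    k_max = int(1.0 / (2.0 * r))  # every 1/k with k > k_max lies in [0, 2r]
    # Walk through the remaining points in increasing order: 1/k_max, ..., 1/2, 1.
    for k in range(k_max, 0, -1):
        p = 1.0 / k
        if p > reach:
            # p is not covered yet: start a new interval at p.
            count += 1
            reach = p + 2.0 * r
    return count

for r in (1e-2, 1e-4, 1e-6):
    n = cover_number(r)
    print(r, n, round(math.log(n) / -math.log(r), 3))
```

The printed ratios decrease slowly toward $1/2$ as $r$ shrinks, while the Hausdorff dimension of the same set is $0$.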
The failure of countable stability for Minkowski dimension is a genuine defect: it means $\dim_M$ can be increased by adding countably many "negligible" points, making it sensitive to the specific geometry of accumulation. Hausdorff dimension's countable stability is a deeper measure-theoretic property rooted in the countable subadditivity of $\mathcal{H}^s$.
For practical computation, Minkowski dimension is often easier to estimate because it only requires counting boxes, while Hausdorff dimension requires constructing measures or covers at all scales simultaneously. But for theoretical work in GMT — where we need to take countable unions, apply Fubini-type arguments, and work with densities — Hausdorff dimension is the correct invariant.
[remark: Non-Integer Hausdorff Dimension and Geometry]
The occurrence of non-integer Hausdorff dimension is not pathological: it is characteristic of self-similar sets and limit sets in dynamics. In GMT, the deeper theory connects Hausdorff dimension to rectifiability: for an integer $s$, an $\mathcal{H}^s$-measurable set $E$ with $0 < \mathcal{H}^s(E) < \infty$ is called *$s$-rectifiable* if it can be covered (up to a set of $\mathcal{H}^s$-measure zero) by countably many Lipschitz images of $\mathbb{R}^s$. Rectifiable sets behave like smooth surfaces from the perspective of Hausdorff measure, while purely unrectifiable sets, such as the four-corner Cantor set in the plane (which has dimension $1$), resist any such decomposition. This distinction, which drives the subject of GMT II, is visible in the dimension theory: integer Hausdorff dimension does not guarantee rectifiability, and for non-integer dimension the notion does not apply at all.
[/remark]
---
Hausdorff measures quantify the size of singular and fractal sets, but a single global number says nothing about how that size is distributed. Densities provide a refined way to understand Hausdorff measure: they measure the 'thickness' of a set relative to its expected geometric size at each scale and location.
# 10. Densities
Hausdorff measure tells you the $s$-dimensional size of a set, but it says nothing about what that set looks like *near a given point*. A curve might have finite $\mathcal{H}^1$-measure globally while being wildly irregular at some points and smooth at others. A fractal set might have the same total $\mathcal{H}^s$-measure as a smooth surface, yet behave completely differently at every point. To distinguish these local behaviors, we need a local diagnostic tool — one that asks, at each point $x$, how does the measure of the ball $B(x, r)$ grow as $r \to 0$?
The answer is the density. If $\mu$ concentrates along an $s$-dimensional surface near $x$, then $\mu(B(x, r))$ should grow like $r^s$ — the same scaling as a flat $s$-dimensional disk. Densities make this precise: they are ratios of $\mu(B(x, r))$ to the reference scaling $r^s$, taken in the limit $r \to 0$. When the limit exists and equals a nonzero finite value, the measure behaves "uniformly $s$-dimensional" near $x$. When the limit is zero, the measure is thin relative to dimension $s$. When the limit is infinite, the measure is concentrated more than a flat $s$-dimensional object would predict.
Densities play a central role in Geometric Measure Theory because they bridge local measure-theoretic data and global geometric structure. The deep theorem of Preiss, previewed at the end of this chapter, says that for integer $s$, the existence of a positive, finite density $\Theta^s(\mu, x)$ at $\mu$-almost every point is not just a consequence of rectifiability: it is equivalent to it. This makes densities the fundamental invariant for detecting whether a measure "looks like an $s$-dimensional surface."
[example: Density of Lebesgue Measure on a Disk]
Before the formal definitions, consider the simplest case. Let $\mu = \mathcal{L}^2 \lfloor D$ where $D = \overline{B}(0, 1) \subset \mathbb{R}^2$ is the closed unit disk. Fix a point $x \in D^\circ$ (the interior). For small $r > 0$, the ball $B(x, r)$ lies entirely inside $D$, so
\begin{align*}
\mu(B(x, r)) = \mathcal{L}^2(B(x, r)) = \pi r^2.
\end{align*}
The $2$-dimensional scaling reference is $\alpha(2) r^2 = \pi r^2$ (since $\alpha(2) = \pi$ is the area of the unit disk). Therefore
\begin{align*}
\frac{\mu(B(x, r))}{\alpha(2) r^2} = \frac{\pi r^2}{\pi r^2} = 1
\end{align*}
for all small $r$, and the limit as $r \to 0$ equals $1$. The density is $1$ at interior points, meaning the measure looks exactly like a flat two-dimensional piece of mass near $x$.
Now consider a point $x$ on the boundary $\partial D$. For small $r$, the part of $B(x, r)$ inside $D$ is a half-disk up to an error of order $r^3$, so $\mu(B(x, r)) = \frac{1}{2} \pi r^2 + O(r^3)$ and the density is exactly $\frac{1}{2}$. This illustrates a general phenomenon: densities near boundary points detect the geometry of the boundary.
[/example]
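The boundary case can be checked numerically with the classical formula for the area of the intersection of two disks. This is a minimal sketch: `lens_area` and `boundary_density_ratio` are hypothetical helper names, and the formula is valid only when the two circles actually cross, that is, when $|R - r| < d < R + r$.

```python
import math

def lens_area(R, r, d):
    """Area of the intersection of two disks of radii R and r whose centers are d apart.

    Assumes |R - r| < d < R + r, so the boundary circles intersect in two points.
    """
    a = r * r * math.acos((d * d + r * r - R * R) / (2 * d * r))
    b = R * R * math.acos((d * d + R * R - r * r) / (2 * d * R))
    c = 0.5 * math.sqrt((-d + r + R) * (d + r - R) * (d - r + R) * (d + r + R))
    return a + b - c

def boundary_density_ratio(r):
    """mu(B(x, r)) / (alpha(2) r^2) for mu = L^2 restricted to the unit disk, |x| = 1."""
    return lens_area(1.0, r, 1.0) / (math.pi * r * r)

for r in (0.5, 0.1, 0.01, 0.001):
    print(r, round(boundary_density_ratio(r), 5))
```

The ratios increase toward $1/2$ as $r$ decreases, with a gap of order $r$, consistent with the half-disk picture.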
## Definition of $s$-Dimensional Densities
How do we normalize the ball growth rate correctly? The choice of normalization determines what value a "flat" $s$-dimensional set produces. If we want the density of a flat $s$-dimensional disk to equal $1$, we must normalize by $\alpha(s) r^s$, where $\alpha(s)$ is the volume of the unit ball in $\mathbb{R}^s$ — the same constant used in the construction of Hausdorff measure. This ensures perfect agreement between the density-$1$ condition and the "locally looks like $\mathbb{R}^s$" condition.
For a general Radon measure $\mu$ on $\mathbb{R}^n$, the ratio $\mu(B(x, r)) / (\alpha(s) r^s)$ may oscillate as $r \to 0$, so we cannot always take a limit. Instead, we define upper and lower densities using the $\limsup$ and $\liminf$, which always exist (in $[0, +\infty]$).
[definition: Upper and Lower $s$-Dimensional Densities]
Let $\mu$ be a Radon measure on $\mathbb{R}^n$ and let $s \geq 0$. For $x \in \mathbb{R}^n$, define the **upper $s$-dimensional density** of $\mu$ at $x$ as
\begin{align*}
\Theta^{*s}(\mu, x) = \limsup_{r \to 0^+} \frac{\mu(B(x, r))}{\alpha(s)\, r^s}
\end{align*}
and the **lower $s$-dimensional density** of $\mu$ at $x$ as
\begin{align*}
\Theta_*^s(\mu, x) = \liminf_{r \to 0^+} \frac{\mu(B(x, r))}{\alpha(s)\, r^s}.
\end{align*}
Here $\alpha(s) = \pi^{s/2} / \Gamma(s/2 + 1)$ is the normalization constant (the volume of the unit ball in $\mathbb{R}^s$). When $\Theta^{*s}(\mu, x) = \Theta_*^s(\mu, x)$, the common value is called the **$s$-dimensional density** of $\mu$ at $x$, written $\Theta^s(\mu, x)$.
[/definition]
The normalization by $\alpha(s)$ is not cosmetic. For $s = n$ and $\mu = \mathcal{L}^n$, we have $\mu(B(x, r)) = \alpha(n) r^n$ exactly, so $\Theta^n(\mathcal{L}^n, x) = 1$ at every point. Without the $\alpha(s)$ factor, the density of Euclidean space in its own dimension would be $\alpha(n)$ rather than $1$, a value that changes with $n$. With it, the density-$1$ condition becomes the canonical marker for "flat $s$-dimensional behavior."
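The constant $\alpha(s)$ and the density ratio are easy to evaluate for any real $s \geq 0$. The sketch below uses the Gamma-function formula from the definition; the helper names `alpha` and `density_ratio` are hypothetical, and the mass of the ball must be supplied by the caller.

```python
import math

def alpha(s):
    """Normalization constant pi^(s/2) / Gamma(s/2 + 1); the unit-ball volume for integer s."""
    return math.pi ** (s / 2) / math.gamma(s / 2 + 1)

def density_ratio(ball_mass, r, s):
    """The ratio mu(B(x, r)) / (alpha(s) r^s) whose limsup and liminf define the densities."""
    return ball_mass / (alpha(s) * r ** s)

print(alpha(1), alpha(2), alpha(3))          # length of [-1, 1], area of unit disk, volume of unit ball
print(alpha(math.log(2) / math.log(3)))      # a non-integer dimension works as well

# Lebesgue measure in R^3: a ball of radius r has mass (4/3) pi r^3, so the ratio is 1.
r = 0.25
print(density_ratio(4 / 3 * math.pi * r ** 3, r, 3))
```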
[definition: Restriction of a Measure]
For a Radon measure $\mu$ on $\mathbb{R}^n$ and a Borel set $E \subset \mathbb{R}^n$, the **restriction** of $\mu$ to $E$ is the Radon measure $\mu \lfloor E$ defined by
\begin{align*}
(\mu \lfloor E)(A) = \mu(E \cap A)
\end{align*}
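for all Borel sets $A \subset \mathbb{R}^n$. In particular, the Hausdorff measure restricted to a set is $\mathcal{H}^s \lfloor E$. Although $\mathcal{H}^s$ itself is not Radon when $s < n$, the restriction $\mathcal{H}^s \lfloor E$ is Radon whenever $E$ is $\mathcal{H}^s$-measurable with $\mathcal{H}^s(E) < \infty$.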
[/definition]
The restriction $\mathcal{H}^s \lfloor E$ is the most natural measure to study when $E$ is a candidate $s$-dimensional object. The densities $\Theta^{*s}(\mathcal{H}^s \lfloor E, x)$ and $\Theta_*^s(\mathcal{H}^s \lfloor E, x)$ then measure how uniformly $s$-dimensional $E$ is near $x$: whether the $s$-dimensional mass in small balls around $x$ grows at exactly the rate one would expect from a flat $s$-dimensional piece.
[remark: Densities at Points Outside the Support]
If $x \notin \operatorname{supp}(\mu)$, then $\mu(B(x, r)) = 0$ for all sufficiently small $r$, so $\Theta^{*s}(\mu, x) = \Theta_*^s(\mu, x) = 0$. Densities are therefore only interesting at points of the support. When we speak of densities of $\mathcal{H}^s \lfloor E$, we always mean at points $x \in E$ (or $x \in \overline{E}$, though the interesting case is $x \in E$ where mass is present).
[/remark]
[example: Density of $\mathcal{H}^1$ on a Line Segment]
Let $E = \{(t, 0) : 0 \leq t \leq 1\} \subset \mathbb{R}^2$ be the unit segment on the $x$-axis, and let $\mu = \mathcal{H}^1 \lfloor E$. For $x = (t, 0)$ with $0 < t < 1$ and $r < \min(t, 1-t)$, the ball $B(x, r)$ intersects $E$ in the segment $\{(u, 0) : t - r < u < t + r\}$, which has $\mathcal{H}^1$-measure $2r$. The normalization constant is $\alpha(1) = 2$ (the length of $[-1, 1]$, the unit ball of $\mathbb{R}^1$), so
\begin{align*}
\frac{\mu(B(x, r))}{\alpha(1)\, r^1} = \frac{2r}{2r} = 1.
\end{align*}
Therefore $\Theta^1(\mathcal{H}^1 \lfloor E, x) = 1$ at every interior point of $E$. At the endpoint $x = (0, 0)$, the ball $B(x, r)$ intersects $E$ in the segment of length $r$ (only the right half), giving ratio $r / (2r) = 1/2$, so the density is $1/2$.
This matches the intuition from the disk example: interior points of a smooth object have density $1$, while boundary points see only half the expected mass.
[/example]
[explanation: What Densities Detect]
The density $\Theta^s(\mu, x)$ is a local invariant that encodes how $\mu$ distributes mass near $x$ relative to the $s$-dimensional scaling. Several cases are worth distinguishing geometrically.
If $\Theta^s(\mu, x) = 1$, the measure near $x$ grows exactly like a flat $s$-dimensional disk. This is the "rectifiable" case, and it is the signature that $x$ is a regular point of a smooth $s$-dimensional surface.
If $\Theta^s(\mu, x) > 1$, the measure near $x$ grows faster than a single flat disk would predict. This happens at multiple-sheet points (where several layers of an $s$-dimensional surface pass through $x$) or at interior cone points (where a surface meets itself).
If $\Theta^s(\mu, x) = 0$, the measure near $x$ grows more slowly than $r^s$. This means $\mu$ is "thin" at $x$ relative to dimension $s$: the mass is spread out in a higher-dimensional way. For example, $\mathcal{L}^2$ on $\mathbb{R}^2$ has $\Theta^1(\mathcal{L}^2, x) = 0$ at every point, since $\pi r^2 / (2r) \to 0$.
If $\Theta^{*s}(\mu, x) = +\infty$, the mass of small balls is not bounded by any constant multiple of $r^s$. This happens at atoms of $\mu$ and, more generally, at points where $\mu$ concentrates on a set of dimension smaller than $s$.
The key fact, proven in the next section, is that for $\mu = \mathcal{H}^s \lfloor E$ with $\mathcal{H}^s(E) < \infty$, the upper density lies between $2^{-s}$ and $1$ at $\mathcal{H}^s$-almost every point of $E$. These bounds are not obvious, and the upper bound can fail without the finite measure hypothesis. No comparable bound from below holds for the lower density, which can vanish even where the upper density is positive.
[/explanation]
## Density Estimates for Hausdorff Measure
Why should we expect any uniform bounds on densities? The problem is that without constraints, the ratio $\mu(B(x, r)) / (\alpha(s) r^s)$ could oscillate wildly or converge to any value. The fundamental insight is that Hausdorff measure carries its own covering structure: the very way $\mathcal{H}^s$ is defined (as a limit of infimums over covers by small sets) forces the densities to be controlled at almost every point.
The upper bound, $\Theta^{*s} \leq 1$ a.e., comes from the definition of $\mathcal{H}^s$: if a ball $B(x, r)$ contained too much $\mathcal{H}^s$-mass relative to $r^s$, then the ball itself would be an inefficient cover of its own content, contradicting the infimum in the definition. Making this precise requires the Besicovitch covering theorem (established in Chapter 4), which handles the case of a general Radon measure without doubling assumptions.
The lower bound, $\Theta^{*s} \geq 2^{-s}$ a.e., is more subtle. It says that at almost every point of $E$ there are arbitrarily small balls that $E$ fills at the rate $r^s$, up to the constant $2^{-s}$. It is a bound on the upper density only: the lower density $\Theta_*^s$ admits no such bound in general. The factor $2^{-s}$ comes from the diameter-versus-radius geometry in the definition of $\mathcal{H}^s$: a set $C$ of diameter $d$ containing $x$ is only guaranteed to lie in the ball $\overline{B}(x, d)$ of radius $d$, whose reference size $\alpha(s) d^s$ is $2^s$ times the weight $\alpha(s) (d/2)^s$ that $C$ carries in a cover.
[quotetheorem:2993]
The finite measure hypothesis $\mathcal{H}^s(E) < \infty$ cannot simply be dropped. If $E = \mathbb{R}^2$ and $s = 1$, every ball has infinite $\mathcal{H}^1$-measure, so the upper density is $+\infty$ at every point. What the argument really uses is that $\mathcal{H}^s \lfloor E$ is locally finite: for a full $s$-dimensional hyperplane $E = \mathbb{R}^s \times \{0\}^{n-s}$, which has $\mathcal{H}^s(E) = \infty$, the density is $1$ at every point of $E$ and the bound survives. The key step in the finite-mass case is that the Besicovitch covering theorem allows us to pass from a local condition (the density ratio exceeds $1 + \varepsilon$ at arbitrarily small scales on some set $A$) to a global mass estimate which, combined with $\mathcal{H}^s(E) < \infty$, forces $\mathcal{H}^s(A) = 0$.
The upper bound $\Theta^{*s} \leq 1$ does not mean densities are uniformly close to $1$. The upper density can be strictly less than $1$ on a set of positive measure, and the lower density can even be $0$. The theorem only rules out the possibility that $\Theta^{*s} > 1$ on a set of positive $\mathcal{H}^s$-measure. This connects forward to rectifiability: sets where $\Theta^s = 1$ almost everywhere are the "regular" ones; sets where the density fails to exist or differs from $1$ almost everywhere are the "irregular" ones.
[quotetheorem:2994]
The constant $2^{-s}$ is the one that the covering argument naturally produces. To see where it comes from, recall that $\mathcal{H}^s$ is defined using covers by sets $C$ of diameter at most $\delta$, each weighted by $\alpha(s) (\operatorname{diam}(C)/2)^s$. A ball of radius $r$ has diameter $2r$, so covering $B(x, r)$ by itself gives $\mathcal{H}^s_{2r}(B(x, r)) \leq \alpha(s) r^s$, which is exactly the reference size in the density ratio. A general covering set of diameter $d$, however, only fits in a ball of radius $d$ about any of its points, and passing from the weight $\alpha(s) (d/2)^s$ to the reference size $\alpha(s) d^s$ costs the factor $2^{-s}$.
Together, the two bounds say: at $\mathcal{H}^s$-almost every point of $E$,
\begin{align*}
2^{-s} \leq \Theta^{*s}(\mathcal{H}^s \lfloor E, x) \leq 1.
\end{align*}
The upper and lower densities need not agree: the density $\Theta^s(\mathcal{H}^s \lfloor E, x)$ may fail to exist at some points (where the ratio oscillates). When the density does exist and equals $1$, the point $x$ is called a "regular point" of $E$, and this is closely related to differentiability and rectifiability.
[remark: The Lebesgue Density Theorem as a Special Case]
For $s = n$, where $\mathcal{H}^n = \mathcal{L}^n$, the two bounds give $2^{-n} \leq \Theta^{*n}(\mathcal{L}^n \lfloor E, x) \leq 1$ at $\mathcal{L}^n$-almost every point of $E$. The classical Lebesgue density theorem is sharper in this top-dimensional case: for a measurable set $E \subset \mathbb{R}^n$ with $\mathcal{L}^n(E) < \infty$,
\begin{align*}
\lim_{r \to 0} \frac{\mathcal{L}^n(E \cap B(x, r))}{\mathcal{L}^n(B(x, r))} = 1 \quad \text{for } \mathcal{L}^n\text{-a.e. } x \in E,
\end{align*}
so the density exists and equals $1$. The Hausdorff density bounds are thus the counterpart of a classical result in every dimension $s \leq n$, at the price of controlling only the upper density.
[/remark]
[example: Cantor Set Densities]
The middle-thirds Cantor set $C \subset [0, 1]$ has Hausdorff dimension $s = \log 2 / \log 3$ and $0 < \mathcal{H}^s(C) < \infty$ (computed in Chapter 9 using the mass distribution principle, and revisited in Chapter 12). Let $\mu = \mathcal{H}^s \lfloor C$.
Fix $x \in C$ and $r \approx 3^{-k}$ for a large integer $k$. The Cantor construction ensures that $C$ is covered by $2^k$ intervals of length $3^{-k}$ at level $k$, and $B(x, r)$ intersects at most two such intervals. Therefore $\mu(B(x, r)) \lesssim 2 \cdot \mathcal{H}^s(C \cap I_k)$, where $I_k$ is a level-$k$ interval. By self-similarity, $\mathcal{H}^s(C \cap I_k) = 2^{-k} \cdot \mathcal{H}^s(C)$. The scaling $3^{-ks} = 2^{-k}$ (since $s \log 3 = \log 2$) gives
\begin{align*}
\frac{\mu(B(x, r))}{\alpha(s) r^s} \lesssim \frac{2 \cdot 2^{-k} \cdot \mathcal{H}^s(C)}{\alpha(s) \cdot 3^{-ks}} = \frac{2 \mathcal{H}^s(C)}{\alpha(s)},
\end{align*}
which is a finite constant independent of $r$. So the upper density is finite. From below, a similar argument shows that the lower density is bounded away from $0$: every ball $B(x, r)$ with $x \in C$ and $r \approx 3^{-k}$ contains the part of $C$ lying in a level-$(k+1)$ interval around $x$, giving $\mu(B(x,r)) \gtrsim 2^{-k}$. Both densities are therefore positive and finite at every point of $C$, consistent with the general bounds $2^{-s} \leq \Theta^{*s} \leq 1$ that hold almost everywhere.
Whether the actual density $\Theta^s(\mu, x)$ exists (as a limit rather than just a $\limsup$ and $\liminf$) is a more delicate question. For the Cantor set it fails to exist at almost every point: the ratio oscillates, taking different values along subsequences such as $r = 3^{-k}$ and $r = 3^{-k-1/2}$, and Marstrand's theorem at the end of this chapter shows that this is unavoidable for non-integer $s$.
[/example]
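The oscillation can be seen explicitly at the point $x = 0$. The sketch below works with the Cantor measure $\mu_C$ normalized to total mass $1$ (a constant multiple of $\mathcal{H}^s \lfloor C$) and drops the factor $\alpha(s)$, so only the shape of the oscillation is meaningful, not the absolute values. It evaluates the Cantor function exactly on rational inputs. The names `cantor_cdf`, `ball_mass`, and `ratio` are hypothetical, and the second family of radii, $r = 2 \cdot 3^{-(k+1)}$, is chosen for convenience rather than taken from the text.

```python
import math
from fractions import Fraction

S = math.log(2) / math.log(3)

def cantor_cdf(x, depth=64):
    """Cantor function F(x) = mu_C([0, x]) for rational x, read off the ternary digits."""
    x = Fraction(x)
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    total, weight = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            # x is in the closure of a removed middle third, where F is constant.
            return total + weight
        if digit == 2:
            total += weight
        weight /= 2
    return total

def ball_mass(x, r):
    """mu_C(B(x, r)); the measure has no atoms, so open and closed balls agree."""
    return cantor_cdf(x + r) - cantor_cdf(x - r)

def ratio(x, r):
    """mu_C(B(x, r)) / r^s, the density ratio without the alpha(s) normalization."""
    return ball_mass(x, r) / float(r) ** S

for k in range(1, 6):
    r1 = Fraction(1, 3 ** k)
    r2 = Fraction(2, 3 ** (k + 1))
    print(k, round(ratio(0, r1), 6), round(ratio(0, r2), 6))
```

Along $r = 3^{-k}$ the ratio stays at $1$, while along $r = 2 \cdot 3^{-(k+1)}$ it stays at $2^{-s} \approx 0.646$, so the limit as $r \to 0$ does not exist at $x = 0$.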
## Density and Hausdorff Dimension
A natural question is whether density information can be used to detect Hausdorff dimension. The density estimates proven above constrain how $\mathcal{H}^s \lfloor E$ behaves for a single fixed $s$. But if we want to determine the Hausdorff dimension of a set, we need to ask: for which values of $s$ does a given measure $\mu$ have nonzero upper $s$-density?
The answer links density to dimension through a two-sided comparison. If the upper density is finite on a set of positive measure, mass cannot pile up faster than $r^s$, and the set must have Hausdorff dimension at least $s$. If the upper density is positive at every point of a set, the measure is at least $s$-dimensionally concentrated there, and the set has Hausdorff dimension at most $s$.
[quotetheorem:2995]
The intuition is that positive upper density means $\mu$ is at least $s$-dimensionally concentrated near points of $A$, so $A$ cannot be too large. Quantitatively, if $\Theta^{*s}(\mu, x) \geq c > 0$ for every $x \in A$, then around each point of $A$ there are arbitrarily small balls with $\mu(B(x, r)) \geq c' \alpha(s) r^s$ for any $c' < c$. A covering argument converts this into the global estimate $c\, \mathcal{H}^s(A) \leq \mu(A)$. For bounded $A$ the right side is finite, so $\mathcal{H}^s(A) < \infty$ and $\dim_{\mathcal{H}}(A) \leq s$; an unbounded set is handled by splitting it into countably many bounded pieces.
The hypothesis that $\Theta^{*s}(\mu, x) > 0$ holds everywhere on $A$ (not just $\mu$-almost everywhere) is important. A $\mu$-null exceptional set is invisible to $\mu$, yet it can have Hausdorff dimension as large as $n$, so a density condition that holds only almost everywhere gives no control on $\dim_{\mathcal{H}}(A)$. In practice, one often applies this theorem to a set $A$ where the density condition is known to hold at every point by a geometric argument.
[quotetheorem:2996]
The finiteness of the upper density says that $\mu$ does not concentrate at a rate faster than $r^s$ near points of $A$. If $\Theta^{*s}(\mu, x) \leq c < \infty$ on $A$, then a small set $C_j$ meeting $A$ carries mass at most about $c\, \alpha(s) (\operatorname{diam}(C_j))^s$, and summing over any cover gives $\mu(A) \leq 2^s c\, \mathcal{H}^s(A)$. When $\mu(A) > 0$ this forces $\mathcal{H}^s(A) > 0$, hence $\dim_{\mathcal{H}}(A) \geq s$: a set of dimension less than $s$ is too small to carry the mass.
Together, these two theorems give a density-based characterization of Hausdorff dimension: **dimension is localized by density**. If you can find a Radon measure $\mu$ with $\mu(A) > 0$ such that $0 < \Theta^{*s}(\mu, x) < \infty$ at every $x \in A$, then $\dim_{\mathcal{H}}(A) = s$. The lower-bound half is essentially the mass distribution principle, the standard tool for computing Hausdorff dimensions from below; Frostman's lemma provides a converse for Borel sets, producing such a measure whenever $\mathcal{H}^s(A) > 0$.
[example: Applying Density to Confirm Dimension of the Cantor Set]
The Cantor measure $\mu$ from Chapter 9 is supported on $C$ and satisfies $\mu(C) = 1 > 0$. By self-similarity it is a constant multiple of $\mathcal{H}^s \lfloor C$, so the previous example shows that the upper density $\Theta^{*s}(\mu, x)$ is finite and positive at every $x \in C$, where $s = \log 2 / \log 3$. Therefore both theorems apply:
From the finite-density bound: since $\Theta^{*s}(\mu, x) < \infty$ on $C$ and $\mu(C) > 0$, we get $\mathcal{H}^s(C) > 0$ and hence $\dim_{\mathcal{H}}(C) \geq s$.
From the positive-density bound: since $\Theta^{*s}(\mu, x) > 0$ at every point of $C$ and $\mu(C) < \infty$, we get $\mathcal{H}^s(C) < \infty$ and hence $\dim_{\mathcal{H}}(C) \leq s$.
Combining: $\dim_{\mathcal{H}}(C) = s = \log 2 / \log 3$. This is the same calculation carried out more explicitly in Chapter 9 using the mass distribution principle, but here the density perspective makes transparent *why* the Cantor set has this dimension: because its natural self-similar measure has upper $s$-density that is finite and positive at every point of $C$.
[/example]
<!-- illustration-needed: a schematic showing the three density regimes on a set E — points where Θ*^s = 1 (regular interior points), points where Θ_*^s = 0 (thin boundary points), and points where Θ*^s is between 0 and 1 (irregular points) — indicated by different shadings on a Cantor-like fractal -->
## Densities and the Lebesgue Decomposition
The density machinery developed in Chapters 5 and 6 — differentiating one Radon measure against another — is directly connected to $s$-dimensional densities. The connection illuminates why densities take the specific values they do at singular versus absolutely continuous parts of a measure.
Recall from Chapter 5 that for Radon measures $\mu$ and $\nu$ on $\mathbb{R}^n$, the Lebesgue decomposition writes $\nu = \nu_{ac} + \nu_s$ where $\nu_{ac} \ll \mu$ and $\nu_s \perp \mu$. The derivative $D_\mu \nu(x) = d\nu_{ac}/d\mu(x)$ exists $\mu$-almost everywhere and is finite. At $\nu_s$-typical points, $D_\mu \nu_s(x) = +\infty$.
For $s$-dimensional densities, the analogous structure appears when we compare $\mathcal{H}^s \lfloor E$ to the ambient Lebesgue measure $\mathcal{L}^n$. If $s < n$, then $\mathcal{H}^s \lfloor E$ is purely singular with respect to $\mathcal{L}^n$ (since $\mathcal{H}^s$-finite sets have $\mathcal{L}^n$-measure zero). The derivative of $\mathcal{H}^s \lfloor E$ with respect to $\mathcal{L}^n$ is $+\infty$ at $\mathcal{H}^s$-almost every point of $E$, consistent with the fact that the measure concentrates on a lower-dimensional set.
[example: Derivative of the Cantor Measure Against Lebesgue Measure]
Let $\mu$ be the Cantor measure on $C$ and let $\mathcal{L}^1$ be Lebesgue measure on $[0, 1]$. The derivative $D_{\mathcal{L}^1} \mu(x)$ is computed using the ratio $\mu(B(x, r)) / \mathcal{L}^1(B(x, r)) = \mu(B(x, r)) / (2r)$.
For $x \in C$ and $r \approx 3^{-k}$, we established that $\mu(B(x, r)) \approx 2^{-k}$. Therefore
\begin{align*}
\frac{\mu(B(x, r))}{2r} \approx \frac{2^{-k}}{2 \cdot 3^{-k}} = \frac{1}{2} \left(\frac{3}{2}\right)^k \to +\infty \quad \text{as } k \to \infty.
\end{align*}
So $D_{\mathcal{L}^1} \mu(x) = +\infty$ at every point of $C$, confirming that $\mu \perp \mathcal{L}^1$.
For $x \notin C$ (that is, $x$ in one of the removed open intervals), choosing $r$ small enough that $B(x, r)$ lies entirely within the removed interval gives $\mu(B(x, r)) = 0$, so $D_{\mathcal{L}^1} \mu(x) = 0$.
This is the Lebesgue decomposition in action: $\mu$ is purely singular with respect to $\mathcal{L}^1$, the derivative is $+\infty$ on the singular support $C$, and $0$ on the complement.
[/example]
The contrast between the derivative of $\mu$ against $\mathcal{L}^1$ (which is $+\infty$ on $C$) and the $s$-dimensional density of $\mathcal{H}^s \lfloor C$ (which is finite and positive on $C$) illuminates what the correct scaling is. The Lebesgue derivative asks: "relative to the $1$-dimensional reference scale $r$, how much mass is here?" The $s$-dimensional density asks: "relative to the $s$-dimensional reference scale $r^s$, how much mass is here?" Only the latter is the right scale for the Cantor set, and only with that scaling does the upper density land in the range $[2^{-s}, 1]$ guaranteed by the theorems above.
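The two scalings can be compared side by side at $x = 0$, where the Cantor measure of $B(0, 3^{-k})$ is exactly $2^{-k}$ (the mass of the leftmost level-$k$ interval). The sketch below uses the Cantor measure of total mass $1$ and omits $\alpha(s)$, so the second column should be read up to a constant factor. The function name `scaling_ratios` is hypothetical.

```python
import math

S = math.log(2) / math.log(3)

def scaling_ratios(k):
    """Return (mass / (2r), mass / r^s) for the ball B(0, 3^-k) under the Cantor measure."""
    mass = 2.0 ** -k      # Cantor measure of the leftmost level-k interval
    r = 3.0 ** -k
    return mass / (2 * r), mass / r ** S

for k in (1, 5, 10, 20):
    lebesgue_ratio, s_ratio = scaling_ratios(k)
    print(k, round(lebesgue_ratio, 3), round(s_ratio, 6))
```

The first column grows like $\frac{1}{2}(3/2)^k$, while the second stays at $1$: the exponent $s = \log 2 / \log 3$ is exactly the one that keeps the ratio bounded away from both $0$ and $\infty$.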
## Preview: Connection to Rectifiability
The deepest application of density theory lies ahead. In GMT II, the central objects are *rectifiable sets* — sets that can be decomposed, up to a null set, into countably many Lipschitz images of subsets of $\mathbb{R}^s$. Rectifiable sets are the GMT generalization of smooth surfaces, and they are the natural arena for the area formula, the co-area formula, and integration by parts on singular objects.
What makes densities so powerful is that they characterize rectifiability in terms of purely local, measure-theoretic data.
[definition: $s$-Rectifiable Set]
Let $s$ be an integer with $1 \leq s \leq n$. A Borel set $E \subset \mathbb{R}^n$ is called **$\mathcal{H}^s$-rectifiable** (or simply **$s$-rectifiable**) if there exist countably many Lipschitz maps $f_j : A_j \to \mathbb{R}^n$ with $A_j \subset \mathbb{R}^s$ such that
\begin{align*}
\mathcal{H}^s\!\left(E \setminus \bigcup_{j=1}^\infty f_j(A_j)\right) = 0.
\end{align*}
That is, $E$ is covered by countably many Lipschitz images of subsets of $\mathbb{R}^s$, up to a set of $\mathcal{H}^s$-measure zero.
[/definition]
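For smooth objects, rectifiability is automatic: any smooth $s$-dimensional submanifold of $\mathbb{R}^n$ is $s$-rectifiable (the local parametrizations are smooth and in particular locally Lipschitz). For fractal objects, rectifiability fails. The four-corner Cantor set in the plane has positive and finite $\mathcal{H}^1$-measure but meets every Lipschitz curve in a set of $\mathcal{H}^1$-measure zero, so it is purely $1$-unrectifiable. For the middle-thirds Cantor set the question does not even arise at its own dimension, since $s = \log 2 / \log 3$ is not an integer.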
The connection to densities is captured by one of the most profound theorems in GMT:
[quotetheorem:2997]
This theorem, proved by David Preiss in 1987, is the culmination of a long line of work by Besicovitch, Marstrand, and others. Its content is striking: the mere existence of a positive, finite density limit $\Theta^s(\mu, x)$, a single local condition at almost every point, forces the measure to be carried by a rectifiable set. Put the other way around, on any unrectifiable piece the ratio $\mu(B(x,r))/(\alpha(s) r^s)$ cannot converge to a positive finite limit as $r \to 0$.
The hypothesis that $s$ is an integer is not a technicality — it is essential. For non-integer $s$, densities do not characterize geometric structure. A theorem of Marstrand shows that if $\Theta^s(\mu, x)$ exists and is positive and finite at $\mu$-almost every point, and if $s$ is not an integer, then $\mu = 0$. In other words: no Radon measure (other than the zero measure) can have a positive finite $s$-density at almost every point when $s$ is not an integer. This is one of the deepest reasons why fractional-dimensional geometry is fundamentally harder: the density invariant, which cleanly characterizes rectifiability for integer dimensions, provides no comparable structure for fractional dimensions.
[quotetheorem:2998]
This theorem provides a compelling explanation for why fractal sets with non-integer dimension are so much harder to classify geometrically than smooth surfaces or their Lipschitz images. The density invariant, which is the most natural local descriptor of a measure's dimensional behavior, simply cannot take a consistent finite positive value at almost every point of a non-integer-dimensional object without causing a contradiction. This forces any geometric characterization of non-integer-dimensional sets to use coarser or different local invariants.
[explanation: Why the Integer Hypothesis is Necessary]
The proof of Marstrand's theorem uses the following idea. Suppose $\Theta^s(\mu, x)$ exists and is positive and finite for $\mu$-almost every $x$. By a blow-up argument (zooming in at a typical point at scales $r \to 0$), the density condition produces tangent measures $\nu$ that are *$s$-uniform*: $\nu(B(y, r)) = c\, r^s$ for every $y \in \operatorname{supp}(\nu)$ and every $r > 0$. Flat measures, that is, multiples of $\mathcal{H}^s$ on an $s$-dimensional subspace, are the model examples, and they exist only when $s$ is an integer. Marstrand's theorem amounts to showing that for non-integer $s$ there are no nonzero $s$-uniform measures at all.
Roughly speaking, the obstruction is homogeneity: an $s$-uniform measure looks the same at every point of its support and at every scale. Blowing up repeatedly, one finds tangent measures supported in affine subspaces of lower and lower dimension, and the process can only stop when the measure fills a subspace, whose dimension is an integer.
This is why the study of fractal geometry (non-integer $s$) requires different tools — multifractal analysis, Assouad dimension, and self-similar structure — rather than the density-and-rectifiability framework that works so cleanly for integer-dimensional objects.
[/explanation]
The Preiss density theorem will be the starting point for GMT II: Rectifiability and the Area Formula. The program there will be to establish that rectifiable sets carry a well-defined notion of approximate tangent plane at $\mathcal{H}^s$-almost every point, that Lipschitz maps have well-defined Jacobians on rectifiable sets, and that the area formula
\begin{align*}
\int_{f(E)} N(f, E, y) \, d\mathcal{H}^s(y) = \int_E |Jf| \, d\mathcal{H}^s
\end{align*}
holds for Lipschitz maps $f : E \to \mathbb{R}^n$ on $s$-rectifiable sets $E$. The density theory developed in this chapter is the foundation on which that program rests: by ensuring that $\mathcal{H}^s$-a.e. point of a rectifiable set is a "regular point" with density $1$, it guarantees that the local geometric structure (tangent planes, Jacobians) is well-defined at almost every point.
---
With densities in hand, we can now apply geometric measure theory to maps between spaces. Lipschitz mappings preserve dimension information in a controlled way, and Hausdorff measure transforms predictably under such maps, enabling us to transfer geometric properties across spaces.
# 11. Hausdorff Measure and Lipschitz Mappings
The story of Hausdorff measure would be incomplete without understanding which maps preserve, distort, or annihilate its information. Continuous maps are not the right class: a continuous surjection from $[0,1]$ onto $[0,1]^2$ — a space-filling curve — can send a set of Hausdorff dimension $1$ to one of dimension $2$, catastrophically increasing the dimension. If we want a class of maps under which Hausdorff measure behaves predictably, we need a condition that controls how diameters transform. That condition is the Lipschitz property.
A Lipschitz map stretches distances by at most a fixed multiplicative factor $L$. Since $\mathcal{H}^s$ is built from coverings by sets of small diameter, controlling how diameters grow under $f$ directly controls how $\mathcal{H}^s$ transforms. This chapter makes that relationship precise. We prove that Lipschitz maps cannot increase Hausdorff dimension, estimate the Hausdorff measure of graphs of Lipschitz functions, and use these ideas to derive the polar coordinate formula — the prototype for the coarea formula developed in Geometric Measure Theory II.
[example: Space-Filling Curves Destroy Dimensional Information]
To see concretely why continuity alone is insufficient, consider the Peano curve $\gamma: [0,1] \to [0,1]^2$, which is a continuous surjection. The domain has $\dim_{\mathcal{H}} = 1$, while the image has $\dim_{\mathcal{H}} = 2$. So continuity alone does not prevent Hausdorff dimension from increasing.
The issue is that a continuous map gives no uniform control over how much distances are stretched. Since $[0,1]$ is compact, $\gamma$ is uniformly continuous, so $\gamma(t)$ and $\gamma(t + \varepsilon)$ are close when $\varepsilon$ is small; but the ratio $|\gamma(t + \varepsilon) - \gamma(t)|/\varepsilon$ is unbounded, so the image of a small set can have a diameter far larger than any fixed multiple of the original diameter. A Lipschitz map $f$ with constant $L$ satisfies $|f(x) - f(y)| \leq L|x - y|$, so the image of a set of diameter $d$ has diameter at most $Ld$. This is the key: the Lipschitz constant acts as a uniform bound on diameter expansion, which translates directly into a bound on how $\mathcal{H}^s$ changes.
[/example]
## Definition of Lipschitz Maps
What is the minimal regularity we can impose on a map to prevent Hausdorff dimension from increasing? The calculation in the theorem below reveals the answer: we need a uniform bound on the ratio $|f(x) - f(y)|/|x - y|$. This is precisely the Lipschitz condition.
[definition: Lipschitz Map]
Let $A \subset \mathbb{R}^m$ and let $f: A \to \mathbb{R}^n$ be a map. We say $f$ is **Lipschitz** if there exists a constant $L \geq 0$ such that
\begin{align*}
|f(x) - f(y)| \leq L|x - y|
\end{align*}
for all $x, y \in A$. The infimum of all such constants $L$ is called the **Lipschitz constant** of $f$, denoted $\operatorname{Lip}(f)$.
[/definition]
The infimum $\operatorname{Lip}(f)$ is itself attained as a Lipschitz constant: since $|f(x) - f(y)| \leq L|x - y|$ holds for every $L > \operatorname{Lip}(f)$, letting $L \downarrow \operatorname{Lip}(f)$ shows that it holds for $L = \operatorname{Lip}(f)$ as well. When $\operatorname{Lip}(f) \leq 1$, we say $f$ is a **nonexpansive** or **short** map; when $\operatorname{Lip}(f) < 1$, it is a **contraction**.
Every $C^1$ function $f: U \to \mathbb{R}^n$ with bounded derivative on a convex open set $U \subset \mathbb{R}^m$ is Lipschitz, with $\operatorname{Lip}(f) \leq \sup_{x \in U} \|Df_x\|$, where $\|Df_x\|$ denotes the operator norm of the total derivative at $x$. The Lipschitz condition is weaker than $C^1$: the absolute value function $f(x) = |x|$ satisfies $\operatorname{Lip}(f) = 1$ but is not differentiable at the origin.
[remark: Lipschitz Implies Hölder]
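A Lipschitz function $f: A \to \mathbb{R}^n$ with constant $L$ is locally $\gamma$-Hölder for every $\gamma \in (0,1]$: if $|x - y| \leq 1$ then $|f(x) - f(y)| \leq L|x - y| \leq L|x - y|^\gamma$. When $A$ is bounded, writing $|x - y| = |x - y|^\gamma |x - y|^{1-\gamma}$ gives $|f(x) - f(y)| \leq L \operatorname{diam}(A)^{1-\gamma} |x - y|^\gamma$ for all $x, y \in A$, so $f \in C^{0,\gamma}(A)$. However, a Hölder function with exponent $\gamma < 1$ need not be Lipschitz. The function $f(x) = |x|^{1/2}$ on $\mathbb{R}$ is $\frac{1}{2}$-Hölder but not Lipschitz near the origin.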
[/remark]
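The contrast between $|x|$ and $|x|^{1/2}$ can be checked numerically. The following Python sketch is only an illustration: the helper name `max_difference_quotient` and the sample grids are assumptions made for this demonstration. It evaluates the difference quotient $|f(x) - f(y)|/|x - y|$ on finitely many pairs, which can only give a lower bound for $\operatorname{Lip}(f)$.

```python
import math
from itertools import combinations

def max_difference_quotient(f, points):
    """Largest value of |f(x) - f(y)| / |x - y| over distinct sample points.

    This is only a lower bound for Lip(f): a finite sample cannot certify
    a Lipschitz constant, but it can show that quotients blow up.
    """
    return max(abs(f(x) - f(y)) / abs(x - y)
               for x, y in combinations(points, 2))

def sqrt_abs(x):
    return math.sqrt(abs(x))

# Symmetric grid on [-1, 1]: quotients of |x| never exceed 1.
grid = [k / 100 for k in range(-100, 101)]
lip_abs_estimate = max_difference_quotient(abs, grid)

# Quotients of sqrt(|x|) against the origin, at h = 10^-1, ..., 10^-8.
steps = [10.0 ** (-k) for k in range(1, 9)]
sqrt_quotients = [sqrt_abs(h) / h for h in steps]

print(lip_abs_estimate)
print(sqrt_quotients)
```

The quotients for $|x|^{1/2}$ equal $h^{-1/2}$ and grow without bound as $h \to 0$, while those for $|x|$ stay bounded by $1$.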
## Lipschitz Maps and Hausdorff Measure
How does Hausdorff measure transform under a Lipschitz map? The answer is elegant: the $s$-dimensional measure of the image is controlled by the $s$-th power of the Lipschitz constant. This follows directly from the diameter control that the Lipschitz condition provides, with no additional structure required.
[quotetheorem:2999]
The proof rests on a single observation: the Lipschitz condition translates diameter control on $E$ into diameter control on $f(E)$. Fix $\delta > 0$ and let $\{C_j\}$ be a countable covering of $E$ with $\operatorname{diam}(C_j) \leq \delta$ for each $j$. The sets $\{f(C_j \cap A)\}$ cover $f(E)$: each $y \in f(E)$ has the form $y = f(x)$ for some $x \in E \subset \bigcup_j C_j$, so $x \in C_j \cap A$ for some $j$, giving $y \in f(C_j \cap A)$. The Lipschitz condition gives
\begin{align*}
\operatorname{diam}(f(C_j \cap A)) \leq L \cdot \operatorname{diam}(C_j) \leq L\delta.
\end{align*}
Therefore $f(E)$ is covered by sets of diameter at most $L\delta$, and
\begin{align*}
\mathcal{H}^s_{L\delta}(f(E)) \leq \sum_j \alpha(s) \left(\frac{\operatorname{diam}(f(C_j \cap A))}{2}\right)^s \leq L^s \sum_j \alpha(s) \left(\frac{\operatorname{diam}(C_j)}{2}\right)^s.
\end{align*}
Taking the infimum over all such coverings $\{C_j\}$ of $E$ gives $\mathcal{H}^s_{L\delta}(f(E)) \leq L^s \mathcal{H}^s_\delta(E)$. Sending $\delta \to 0$, we obtain $\mathcal{H}^s(f(E)) \leq L^s \mathcal{H}^s(E)$.
This theorem has several important consequences. First, notice what happens when $\mathcal{H}^s(E) = 0$: the right side is zero regardless of $L$, so $\mathcal{H}^s(f(E)) = 0$ as well. In other words, Lipschitz maps preserve $\mathcal{H}^s$-null sets. Second, the bound $L^s$ is sharp: the scaling map $f(x) = Lx$ satisfies $\operatorname{Lip}(f) = L$ and $\mathcal{H}^s(f(E)) = L^s \mathcal{H}^s(E)$ exactly, because scaling by $L$ multiplies all diameters by $L$.
The hypothesis that $f$ is Lipschitz cannot be weakened to continuity, as the space-filling curve example already shows. The theorem also does not require $f$ to be injective, nor does it prevent $f$ from shrinking distances: images of different parts of $E$ can overlap or collapse, so measure can be lost, which is why we have an inequality rather than an equality. When $f$ is bi-Lipschitz (both $f$ and its inverse are Lipschitz), we obtain two-sided bounds and the theory becomes much richer; this is the starting point for the theory of rectifiable sets.
[quotetheorem:3000]
To see why this follows from the previous theorem, suppose $s > \dim_{\mathcal{H}}(E)$. By definition of Hausdorff dimension, this means $\mathcal{H}^s(E) = 0$. Applying the Lipschitz bound gives $\mathcal{H}^s(f(E)) \leq L^s \mathcal{H}^s(E) = 0$. Since $s$ was any value exceeding $\dim_{\mathcal{H}}(E)$, we conclude that $\mathcal{H}^s(f(E)) = 0$ for all $s > \dim_{\mathcal{H}}(E)$, which means $\dim_{\mathcal{H}}(f(E)) \leq \dim_{\mathcal{H}}(E)$.
The inequality can be strict. Projection from $\mathbb{R}^2$ onto $\mathbb{R}^1$ is Lipschitz with constant $1$ (it is even nonexpansive), yet it sends a two-dimensional disk — which has $\dim_{\mathcal{H}} = 2$ — to an interval with $\dim_{\mathcal{H}} = 1$. Dimension can decrease under Lipschitz maps, but it can never increase.
[example: Projection Strictly Decreases Dimension]
Let $\pi: \mathbb{R}^2 \to \mathbb{R}$ be the projection $\pi(x_1, x_2) = x_1$, and let $E = [0,1] \times \{0\}$ be the unit interval on the $x_1$-axis embedded in $\mathbb{R}^2$.
Here $\dim_{\mathcal{H}}(E) = 1$ (since $E$ is a line segment of length $1$, so $\mathcal{H}^1(E) = 1$ while $\mathcal{H}^s(E) = 0$ for $s > 1$) and $\pi(E) = [0,1]$, so $\dim_{\mathcal{H}}(\pi(E)) = 1$ as well — the dimension is preserved in this case. Now consider the disk $D = \{(x_1, x_2) : x_1^2 + x_2^2 \leq 1\}$. Its Hausdorff dimension is $2$ (it has positive $\mathcal{L}^2$-measure, and $\mathcal{H}^2 = \mathcal{L}^2$). The projection $\pi(D) = [-1, 1]$ has $\dim_{\mathcal{H}} = 1$. So $\pi$ is a $1$-Lipschitz map with $|\pi(x) - \pi(y)| = |x_1 - y_1| \leq |(x_1,x_2) - (y_1,y_2)|$, and it strictly decreases the Hausdorff dimension of $D$ from $2$ to $1$.
[/example]
## Hausdorff Measure of Lipschitz Graphs
When a Lipschitz function $f: \mathbb{R}^m \to \mathbb{R}$ defines a surface in $\mathbb{R}^{m+1}$, how much $m$-dimensional area does that surface have? This question is the first step toward the area formula, one of the central results of geometric measure theory. For now, we derive an upper bound and the exact formula in the smooth case.
[definition: Graph of a Map]
Let $A \subset \mathbb{R}^m$ and $f: A \to \mathbb{R}^n$. The **graph** of $f$ over $A$ is the set
\begin{align*}
\Gamma_f(A) = \{(x, f(x)) \in \mathbb{R}^m \times \mathbb{R}^n : x \in A\}.
\end{align*}
When $A = \mathbb{R}^m$, we write $\Gamma_f$.
[/definition]
The graph $\Gamma_f(A)$ lives in $\mathbb{R}^{m+n}$ and is parametrized by the map
\begin{align*}
\Phi: A &\to \mathbb{R}^{m+n} \\
x &\mapsto (x, f(x)).
\end{align*}
The key observation is that $\Phi$ is itself Lipschitz when $f$ is. If $f: A \to \mathbb{R}^n$ has Lipschitz constant $L$, then for any $x, y \in A$,
\begin{align*}
|\Phi(x) - \Phi(y)|^2 = |x - y|^2 + |f(x) - f(y)|^2 \leq |x - y|^2 + L^2|x - y|^2 = (1 + L^2)|x - y|^2,
\end{align*}
so $\operatorname{Lip}(\Phi) \leq \sqrt{1 + L^2}$.
[quotetheorem:3001]
Since $\Phi: A \to \Gamma_f(A)$ is a bijection with $\operatorname{Lip}(\Phi) \leq \sqrt{1+L^2}$, the Lipschitz bound on Hausdorff measure gives
\begin{align*}
\mathcal{H}^m(\Gamma_f(A)) = \mathcal{H}^m(\Phi(A)) \leq (\sqrt{1+L^2})^m \mathcal{H}^m(A) = (1+L^2)^{m/2} \mathcal{H}^m(A).
\end{align*}
Since $A \subset \mathbb{R}^m$ and $\mathcal{H}^m$ agrees with $\mathcal{L}^m$ on $\mathbb{R}^m$ (this is the content of the isodiametric inequality, established in Chapter 9), we have $\mathcal{H}^m(A) = \mathcal{L}^m(A)$, giving the result.
The bound $(1 + L^2)^{m/2}$ is not sharp in general: it charges the worst-case stretching factor $\sqrt{1+L^2}$ in all $m$ directions, whereas the graph map $\Phi$ stretches only in the direction of the gradient, and only by the local slope rather than the global constant $L$. When $f$ is $C^1$, the area formula provides the exact answer.
[quotetheorem:3002]
The integrand $\sqrt{1 + |\nabla f|^2}$ is the **area element** of the graph: it measures how much the surface tilts away from the horizontal at each point. When $f$ is constant, $\nabla f = 0$ and the area element is $1$, so $\mathcal{H}^m(\Gamma_f) = \mathcal{L}^m(U)$ — the graph is flat and has exactly the same $m$-dimensional measure as its base. When $|\nabla f|$ is large, the surface tilts steeply and the area element exceeds $1$.
This formula is the prototype for the general area formula: if $g: U \subset \mathbb{R}^m \to \mathbb{R}^n$ is a $C^1$ injective map (a parametrization of a surface), then
\begin{align*}
\mathcal{H}^m(g(U)) = \int_U J_m(Dg_x) \, d\mathcal{L}^m(x),
\end{align*}
where $J_m(Dg_x)$ is the $m$-dimensional Jacobian of $Dg_x$ — the square root of the sum of squares of all $m \times m$ minors of the $n \times m$ Jacobian matrix. For the graph parametrization $\Phi(x) = (x, f(x))$, the Jacobian matrix is
\begin{align*}
J\Phi_x = \begin{pmatrix} I_m \\ \nabla f(x)^\top \end{pmatrix} \in \mathbb{R}^{(m+1) \times m},
\end{align*}
and $J_m(D\Phi_x) = \sqrt{\det(J\Phi_x^\top J\Phi_x)} = \sqrt{1 + |\nabla f(x)|^2}$.
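The identity between the two descriptions of the Jacobian can be sanity-checked in the case $m = 2$. The Python sketch below is a numerical illustration under stated assumptions: the function names are hypothetical, and $(a, b)$ stands for the gradient $\nabla f(x)$ at a fixed point. It compares $\sqrt{\det(J\Phi_x^\top J\Phi_x)}$, the square root of the sum of squared $2 \times 2$ minors, and $\sqrt{1 + |\nabla f|^2}$.

```python
import math

def gram_jacobian(a, b):
    """sqrt(det(J^T J)) for J = [[1, 0], [0, 1], [a, b]]."""
    g11 = 1 + a * a          # first column dotted with itself
    g12 = a * b              # first column dotted with second column
    g22 = 1 + b * b          # second column dotted with itself
    return math.sqrt(g11 * g22 - g12 * g12)

def minor_jacobian(a, b):
    """Square root of the sum of squares of the three 2x2 minors of J."""
    rows = [(1.0, 0.0), (0.0, 1.0), (a, b)]
    minors = []
    for i in range(3):
        for j in range(i + 1, 3):
            (p, q), (r, s) = rows[i], rows[j]
            minors.append(p * s - q * r)
    return math.sqrt(sum(m * m for m in minors))

def area_element(a, b):
    """sqrt(1 + |grad f|^2) with grad f = (a, b)."""
    return math.sqrt(1 + a * a + b * b)

for a, b in [(0.0, 0.0), (1.0, 0.0), (3.0, -4.0), (0.6, 0.8)]:
    print((a, b), gram_jacobian(a, b), minor_jacobian(a, b), area_element(a, b))
```

The three minors are $1$, $b$, and $-a$, so all three expressions reduce to $\sqrt{1 + a^2 + b^2}$.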
[example: Area of the Graph of a Cone]
Let $m = 2$ and $f: \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(x_1, x_2) = \sqrt{x_1^2 + x_2^2}$. This defines the cone surface over the disk $D_R = \{(x_1,x_2) : x_1^2 + x_2^2 \leq R^2\}$.
Away from the origin, $f$ is $C^1$ with gradient
\begin{align*}
\nabla f(x) = \left(\frac{x_1}{\sqrt{x_1^2+x_2^2}}, \frac{x_2}{\sqrt{x_1^2+x_2^2}}\right),
\end{align*}
so $|\nabla f(x)|^2 = (x_1^2 + x_2^2)/(x_1^2 + x_2^2) = 1$ for all $x \neq 0$. The area formula gives
\begin{align*}
\mathcal{H}^2(\Gamma_f(D_R \setminus \{0\})) = \int_{D_R \setminus \{0\}} \sqrt{1 + |\nabla f|^2} \, d\mathcal{L}^2 = \int_{D_R} \sqrt{2} \, d\mathcal{L}^2 = \sqrt{2} \cdot \pi R^2.
\end{align*}
(We removed the origin but it is a single point, hence $\mathcal{H}^2$-null, so it does not affect the value.)
Note also that $f$ is globally Lipschitz with $\operatorname{Lip}(f) = 1$, since $|f(x) - f(y)| = \big|\ |x| - |y|\ \big| \leq |x - y|$ by the reverse triangle inequality. The upper bound from the theorem gives $\mathcal{H}^2(\Gamma_f(D_R)) \leq (1 + 1^2)^{2/2} \pi R^2 = 2\pi R^2$, which is weaker than the exact value $\sqrt{2}\pi R^2 \approx 1.414 \pi R^2$.
[/example]
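The value $\sqrt{2}\,\pi R^2$ can be compared against a direct geometric approximation. The Python sketch below is an illustration only, and the function name `inscribed_cone_area` is an assumption: it sums the areas of flat triangles joining the apex to adjacent points on the rim of the cone. Inscribed polyhedra do not approximate surface area correctly in general, but for this fan of triangles the sum does converge to the lateral area.

```python
import math

def inscribed_cone_area(R, n_triangles):
    """Total area of triangles with one vertex at the apex (0, 0, 0) and two
    adjacent vertices on the rim of the cone z = sqrt(x^2 + y^2) at height R."""
    def rim_point(i):
        theta = 2 * math.pi * i / n_triangles
        return (R * math.cos(theta), R * math.sin(theta), R)

    total = 0.0
    for i in range(n_triangles):
        ux, uy, uz = rim_point(i)
        vx, vy, vz = rim_point(i + 1)
        # Half the norm of the cross product u x v is the triangle's area.
        cx = uy * vz - uz * vy
        cy = uz * vx - ux * vz
        cz = ux * vy - uy * vx
        total += 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)
    return total

R = 1.0
approx_area = inscribed_cone_area(R, 2000)
exact_area = math.sqrt(2) * math.pi * R ** 2   # value from the area formula
lipschitz_bound = 2 * math.pi * R ** 2         # (1 + L^2)^(m/2) * pi R^2 with L = 1, m = 2

print(approx_area, exact_area, lipschitz_bound)
```

The triangulated area approaches the area-formula value from below and stays well under the Lipschitz bound $2\pi R^2$.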
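The gap between the upper bound $(1 + L^2)^{m/2} \mathcal{L}^m(A)$ and the area formula illustrates a general principle: the Lipschitz constant $L$ records the worst-case stretching, while the area formula integrates the actual local stretching. The area formula is exact, whereas the Lipschitz bound is only an upper estimate. It overestimates when $|\nabla f|$ falls below $L$ on part of the domain, and for $m \geq 2$ it overestimates even when $|\nabla f| \equiv L$, as in the cone example, because $\Phi$ stretches by $\sqrt{1+L^2}$ in the gradient direction only, so the exact area element is $\sqrt{1+L^2}$ rather than $(1+L^2)^{m/2}$.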
## Integrals Over Spheres and Polar Coordinates
The polar coordinate decomposition of $\mathbb{R}^n$ splits integration into a radial part and an angular part. The angular part integrates over spheres $\partial B(0,r)$ of radius $r$, and understanding the Hausdorff measure of these spheres is essential. As we will see, the scaling law for Hausdorff measure under dilations — itself a consequence of the Lipschitz theorem — is precisely what makes polar coordinates work.
Consider the dilation $\delta_r: \mathbb{R}^n \to \mathbb{R}^n$ defined by $\delta_r(x) = rx$ for $r > 0$. This map is Lipschitz with constant $r$ (and its inverse has Lipschitz constant $1/r$), so by the Lipschitz bound:
\begin{align*}
\mathcal{H}^s(\delta_r(E)) \leq r^s \, \mathcal{H}^s(E).
\end{align*}
Applying this to the inverse $\delta_{1/r}$, we also get $\mathcal{H}^s(E) \leq r^{-s} \mathcal{H}^s(\delta_r(E))$, so in fact $\mathcal{H}^s(\delta_r(E)) = r^s \mathcal{H}^s(E)$ exactly (the dilation is bi-Lipschitz, so the inequality is tight in both directions). Applying this with $s = n-1$ and $E = \partial B(0,1)$, and noting that $\delta_r(\partial B(0,1)) = \partial B(0,r)$:
\begin{align*}
\mathcal{H}^{n-1}(\partial B(0,r)) = r^{n-1} \mathcal{H}^{n-1}(\partial B(0,1)).
\end{align*}
This is the $(n-1)$-dimensional volume scaling law for spheres.
[definition: Surface Measure of the Unit Sphere]
Define $\omega_{n-1} = \mathcal{H}^{n-1}(\partial B(0,1))$, the $(n-1)$-dimensional Hausdorff measure of the unit sphere $\partial B(0,1) \subset \mathbb{R}^n$. By the scaling law, the sphere of radius $r$ satisfies
\begin{align*}
\mathcal{H}^{n-1}(\partial B(0,r)) = r^{n-1} \omega_{n-1}.
\end{align*}
[/definition]
The value $\omega_{n-1}$ is related to the volume $\alpha(n)$ of the unit ball by $\omega_{n-1} = n \alpha(n)$. For example, $\omega_1 = 2\pi$ (circumference of the unit circle), $\omega_2 = 4\pi$ (surface area of the unit sphere in $\mathbb{R}^3$), and $\omega_0 = 2$ (the two-point "sphere" $\{-1, +1\}$).
[quotetheorem:3003]
The proof passes through the coarea formula for the map $\rho: \mathbb{R}^n \setminus \{0\} \to (0, \infty)$ defined by $\rho(x) = |x|$. This map is Lipschitz with constant $1$ (it is nonexpansive: $|\ |x| - |y|\ | \leq |x - y|$), and its level sets are exactly the spheres $\rho^{-1}(r) = \partial B(0,r)$. The coarea formula, whose full statement is central to GMT II, says that integration over $\mathbb{R}^n$ can be decomposed into integration over the fibers $\rho^{-1}(r)$, weighted by the Jacobian of $\rho$.
For the radial map $\rho(x) = |x|$, the Jacobian is identically $1$ (the gradient $\nabla \rho(x) = x/|x|$ has norm $1$ for $x \neq 0$). Since $\mathcal{L}^n(\{0\}) = 0$, the origin can be ignored. The coarea formula then gives:
\begin{align*}
\int_{\mathbb{R}^n} g \, d\mathcal{L}^n = \int_0^\infty \int_{\rho^{-1}(r)} g \, d\mathcal{H}^{n-1} \, dr = \int_0^\infty \int_{\partial B(0,r)} g \, d\mathcal{H}^{n-1} \, dr.
\end{align*}
To convert the integral over $\partial B(0,r)$ to one over the unit sphere, use the change of variables $\theta = x/r$ (so $x = r\theta$, $d\mathcal{H}^{n-1}(x) = r^{n-1} d\mathcal{H}^{n-1}(\theta)$ by the scaling law):
\begin{align*}
\int_{\partial B(0,r)} g(x) \, d\mathcal{H}^{n-1}(x) = \int_{\partial B(0,1)} g(r\theta) \, r^{n-1} \, d\mathcal{H}^{n-1}(\theta).
\end{align*}
Substituting back yields the polar coordinate formula.
[example: The Gaussian Integral via Polar Coordinates]
We compute $\int_{\mathbb{R}^n} e^{-|x|^2} \, d\mathcal{L}^n(x)$ using the polar coordinate formula. With $g(x) = e^{-|x|^2}$:
\begin{align*}
\int_{\mathbb{R}^n} e^{-|x|^2} \, d\mathcal{L}^n &= \int_0^\infty \left(\int_{\partial B(0,1)} e^{-r^2} \, d\mathcal{H}^{n-1}\right) r^{n-1} \, dr \\
&= \omega_{n-1} \int_0^\infty e^{-r^2} r^{n-1} \, dr.
\end{align*}
The inner integral over $\partial B(0,1)$ evaluates to $\omega_{n-1} e^{-r^2}$ since $g(r\theta) = e^{-r^2}$ is constant on the sphere of radius $r$.
To evaluate $\int_0^\infty e^{-r^2} r^{n-1} \, dr$, substitute $u = r^2$, $du = 2r \, dr$, giving
\begin{align*}
\int_0^\infty e^{-r^2} r^{n-1} \, dr = \frac{1}{2} \int_0^\infty e^{-u} u^{n/2 - 1} \, du = \frac{1}{2} \Gamma\!\left(\frac{n}{2}\right).
\end{align*}
On the other hand, $\int_{\mathbb{R}^n} e^{-|x|^2} \, d\mathcal{L}^n = \left(\int_{\mathbb{R}} e^{-t^2} \, d\mathcal{L}^1(t)\right)^n = \pi^{n/2}$ by Fubini's theorem and the standard one-dimensional Gaussian integral $\int_\mathbb{R} e^{-t^2} dt = \sqrt{\pi}$.
Equating the two expressions:
\begin{align*}
\pi^{n/2} = \omega_{n-1} \cdot \frac{1}{2} \Gamma\!\left(\frac{n}{2}\right).
\end{align*}
Therefore
\begin{align*}
\omega_{n-1} = \frac{2\pi^{n/2}}{\Gamma(n/2)}.
\end{align*}
For $n = 2$: $\omega_1 = 2\pi^1/\Gamma(1) = 2\pi/1 = 2\pi$. For $n = 3$: $\omega_2 = 2\pi^{3/2}/\Gamma(3/2) = 2\pi^{3/2}/(\frac{1}{2}\sqrt{\pi}) = 4\pi$. Both match the classical values for the circumference of the unit circle and the surface area of the unit sphere.
[/example]
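The closed form for $\omega_{n-1}$ is easy to tabulate. The Python sketch below is a numerical check, not part of the derivation; the function names are hypothetical, and the formula $\alpha(n) = \pi^{n/2}/\Gamma(n/2 + 1)$ for the volume of the unit ball is the standard one, assumed here to match the normalization used in this course. It also tests the relation $\omega_{n-1} = n\,\alpha(n)$.

```python
import math

def sphere_measure(n):
    """omega_{n-1} = 2 * pi^(n/2) / Gamma(n/2): H^{n-1}-measure of the unit sphere in R^n."""
    return 2 * math.pi ** (n / 2) / math.gamma(n / 2)

def ball_volume(n):
    """alpha(n) = pi^(n/2) / Gamma(n/2 + 1): Lebesgue measure of the unit ball in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

for n in range(1, 7):
    print(n, sphere_measure(n), n * ball_volume(n))
```

The two printed columns agree because $\Gamma(n/2 + 1) = (n/2)\,\Gamma(n/2)$.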
The polar coordinate formula is more than a computational tool. It is the simplest instance of the coarea formula, which expresses integrals over $\mathbb{R}^n$ as integrals over the level sets of a function, weighted by the $(n-1)$-dimensional Hausdorff measure of those level sets. The radial map $\rho(x) = |x|$ has particularly symmetric level sets (spheres), but the coarea formula works for any Lipschitz function $u: \mathbb{R}^n \to \mathbb{R}$:
\begin{align*}
\int_{\mathbb{R}^n} g(x) |\nabla u(x)| \, d\mathcal{L}^n(x) = \int_{-\infty}^\infty \int_{u^{-1}(t)} g(x) \, d\mathcal{H}^{n-1}(x) \, dt.
\end{align*}
This is the subject of Chapter 1 in Geometric Measure Theory II.
## Bi-Lipschitz Maps and Dimensional Stability
We have seen that Lipschitz maps cannot increase Hausdorff dimension. When can we guarantee that dimension is preserved exactly? The answer lies in requiring both $f$ and its inverse to be Lipschitz.
[definition: Bi-Lipschitz Map]
A map $f: A \subset \mathbb{R}^m \to \mathbb{R}^n$ is **bi-Lipschitz** if $f$ is injective and there exist constants $0 < c \leq C < \infty$ such that
\begin{align*}
c|x - y| \leq |f(x) - f(y)| \leq C|x - y|
\end{align*}
for all $x, y \in A$. Equivalently, $f$ is Lipschitz with constant $C$ and $f^{-1}: f(A) \to A$ is Lipschitz with constant $1/c$.
[/definition]
Bi-Lipschitz maps are the isomorphisms of metric geometry in the same way that homeomorphisms are the isomorphisms of topology. The lower bound $c|x - y| \leq |f(x) - f(y)|$ prevents the map from collapsing distances: points that are far apart must be mapped to points that are far apart. This rules out the dimension-collapsing behavior that projection exhibits.
[quotetheorem:3004]
The upper bound follows directly from the Lipschitz bound applied to $f$ with constant $C$. For the lower bound, apply the same theorem to $f^{-1}: f(E) \to E$, which has Lipschitz constant $1/c$:
\begin{align*}
\mathcal{H}^s(E) = \mathcal{H}^s(f^{-1}(f(E))) \leq (1/c)^s \mathcal{H}^s(f(E)),
\end{align*}
which rearranges to $c^s \mathcal{H}^s(E) \leq \mathcal{H}^s(f(E))$. Combining the two inequalities shows that $\mathcal{H}^s(E) = 0$ if and only if $\mathcal{H}^s(f(E)) = 0$, which means the critical index where $\mathcal{H}^s$ transitions from $\infty$ to $0$ is the same for $E$ and $f(E)$.
The hypothesis that both $f$ and $f^{-1}$ are Lipschitz is necessary for dimension preservation. The projection example shows that Lipschitz alone is not enough. A more subtle counterexample shows that even a homeomorphism can change Hausdorff dimension: there exist homeomorphisms of $[0,1]$ that map a set of Hausdorff dimension $1/2$ to a set of Hausdorff dimension $1$. The bi-Lipschitz condition is the sharp dividing line.
Looking forward, bi-Lipschitz maps play a central role in the theory of rectifiable sets. A set $E \subset \mathbb{R}^n$ is $m$-rectifiable if it can be written as $E = E_0 \cup \bigcup_{k=1}^\infty f_k(A_k)$ where $\mathcal{H}^m(E_0) = 0$ and each $f_k: A_k \subset \mathbb{R}^m \to \mathbb{R}^n$ is Lipschitz. The theory of rectifiable sets, developed in Geometric Measure Theory II and III, shows that rectifiability is the correct generalization of smooth surfaces to the GMT setting: rectifiable sets have approximate tangent planes at $\mathcal{H}^m$-almost every point, and the area formula holds for Lipschitz parametrizations.
[remark: Lipschitz Extension]
A Lipschitz function $f: A \to \mathbb{R}^n$ defined on a subset $A \subset \mathbb{R}^m$ with Lipschitz constant $L$ can always be extended to a Lipschitz function $\tilde{f}: \mathbb{R}^m \to \mathbb{R}^n$. For $n = 1$, this is McShane's theorem, and the extension has the same Lipschitz constant: set $\tilde{f}(x) = \inf_{a \in A} \{f(a) + L|x - a|\}$. For $n > 1$, applying McShane's theorem component by component gives an extension with Lipschitz constant at most $\sqrt{n}\,L$; that an extension with the same constant $L$ exists is Kirszbraun's theorem, which is considerably harder. This extension theorem will be used extensively when constructing Lipschitz parametrizations of rectifiable sets from local data.
[/remark]
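McShane's formula can be tried out directly when $A$ is a finite set of real numbers, in which case the infimum is a minimum. The Python sketch below is a minimal illustration under that assumption; the name `mcshane_extension` and the sample data are hypothetical. It builds $\tilde{f}$, then estimates its largest difference quotient on a grid.

```python
def mcshane_extension(points, values, L):
    """Return x -> min over a of (f(a) + L * |x - a|) for a finite set A.

    `points` lists the elements a of A and `values` the numbers f(a);
    f is assumed to be L-Lipschitz on A.
    """
    samples = list(zip(points, values))

    def extended(x):
        return min(v + L * abs(x - a) for a, v in samples)

    return extended

# Illustrative data: f(a) = |a| sampled at four points, with L = 1.
A = [-1.0, 0.0, 0.5, 2.0]
f_on_A = [abs(a) for a in A]
f_tilde = mcshane_extension(A, f_on_A, L=1.0)

# Largest difference quotient of the extension over a grid on [-3, 4].
grid = [k / 10 for k in range(-30, 41)]
max_quotient = max(
    abs(f_tilde(x) - f_tilde(y)) / abs(x - y)
    for x in grid for y in grid if x != y
)

print([f_tilde(a) for a in A], max_quotient)
```

The extension agrees with $f$ on $A$ because $f(a') + L|a - a'| \geq f(a)$ for every $a' \in A$, and it is $L$-Lipschitz as a minimum of $L$-Lipschitz functions.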
---
Having developed the full toolkit of Hausdorff measures, densities, and their behavior under mappings, we consolidate our understanding through concrete examples and worked problems. These applications illustrate how the abstract theory connects to classical fractals, rectifiable sets, and other objects of geometric interest.
# 12. Examples and Worked Problems
The theory built in the preceding eleven chapters — outer measures, Carathéodory's criterion, Radon measures, covering theorems, differentiation of measures, Hausdorff measure, densities, and Lipschitz mappings — is formidable machinery. But machinery only proves its worth when it meets concrete problems. This chapter is that meeting. Each section takes a specific set, measure, or function and subjects it to explicit computation, using the tools of the course not as abstract results to be cited but as engines that produce numerical answers and geometric insight. The questions driving each section are deliberately elementary: what is the dimension of this well-known set? Can a huge set have dimension zero? What does a singular measure look like when you try to differentiate it against Lebesgue measure? How does the Vitali lemma actually get used in an estimate? These are the kinds of computations that reveal what the theory really does.
[example: First Look at the Cantor Set]
Before any formal computation, consider what is strange about the middle-thirds Cantor set $C$. At each stage $k$ of its construction, we retain $2^k$ closed intervals, each of length $3^{-k}$. So the total length retained is $(2/3)^k$, which goes to zero. Since $C = \bigcap_{k=0}^\infty C_k$ is contained in every $C_k$, its Lebesgue measure satisfies $\mathcal{L}^1(C) = 0$. Yet $C$ is uncountable: its points are exactly the numbers in $[0,1]$ that admit a base-$3$ expansion using only the digits $0$ and $2$, so $C$ is in bijection with the set of sequences $\{0, 2\}^{\mathbb{N}}$. It also contains no interval. It is a set that is simultaneously too small for Lebesgue measure to detect and too large to be countable. The question is: what is its correct geometric size?
[/example]
## Hausdorff Dimension of the Cantor Set
How do we measure the size of a set that has Lebesgue measure zero but is uncountably infinite? Lebesgue measure assigns it size zero — it does not distinguish between the Cantor set and a single point. Cardinality alone shows it is uncountable — but that does not distinguish the Cantor set from $[0,1]$. The right tool is Hausdorff dimension: it assigns $C$ a value $s \in (0,1)$ reflecting the fact that $C$ is genuinely larger than a countable set but genuinely smaller than an interval.
[definition: Middle-Thirds Cantor Set]
Define $C_0 = [0, 1]$. For each $k \geq 1$, let $C_k$ be obtained from $C_{k-1}$ by removing the open middle third of each remaining closed interval. After $k$ steps, $C_k$ consists of $2^k$ closed intervals, each of length $3^{-k}$. The **middle-thirds Cantor set** is
\begin{align*}
C = \bigcap_{k=0}^\infty C_k.
\end{align*}
Denote the $j$-th interval at level $k$ by $I_{k,j}$ for $j = 1, \ldots, 2^k$, ordered from left to right.
[/definition]
The dimension of $C$ is encoded in the self-similarity relation: $C$ consists of two scaled copies of itself at scale $1/3$, and this scale-count pair $(2, 1/3)$ uniquely determines the critical exponent $s = \log 2 / \log 3$.
[quotetheorem:1206]
This theorem requires both an upper bound — showing $\mathcal{H}^s(C) < \infty$ — and a lower bound — showing $\mathcal{H}^s(C) > 0$. The upper bound is an explicit covering argument. The lower bound uses the mass distribution principle, which we state here for convenience.
[quotetheorem:3006]
The principle says: if a measure $\mu$ is spread out enough that no ball of radius $r$ can carry more than $c r^s$ of $\mu$-mass, then the set must be at least $s$-dimensional. A heavy measure forces a heavy set.
Now we carry out the explicit computation.
[example: Upper Bound for $\mathcal{H}^s(C)$]
Set $s = \log 2 / \log 3$. At stage $k$, the $2^k$ intervals $\{I_{k,j}\}$ form a cover of $C$, and each has diameter $3^{-k}$.
The $\delta$-approximation to Hausdorff measure satisfies, for $\delta = 3^{-k}$:
\begin{align*}
\mathcal{H}^s_{3^{-k}}(C) \leq \sum_{j=1}^{2^k} \alpha(s) \left(\frac{3^{-k}}{2}\right)^s = 2^k \cdot \alpha(s) \cdot \frac{3^{-ks}}{2^s}.
\end{align*}
We compute the exponent. Since $s = \log 2 / \log 3$, we have $3^s = 2$. Therefore $3^{-ks} = 2^{-k}$, and so
\begin{align*}
2^k \cdot 3^{-ks} = 2^k \cdot 2^{-k} = 1.
\end{align*}
Thus
\begin{align*}
\mathcal{H}^s_{3^{-k}}(C) \leq \frac{\alpha(s)}{2^s}.
\end{align*}
This bound is independent of $k$. Taking $k \to \infty$ (so $3^{-k} \to 0$) and using the definition $\mathcal{H}^s(C) = \lim_{\delta \to 0} \mathcal{H}^s_\delta(C)$, we obtain
\begin{align*}
\mathcal{H}^s(C) \leq \frac{\alpha(s)}{2^s} < \infty.
\end{align*}
[/example]
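The role of the exponent $s = \log 2/\log 3$ in this covering argument can be seen by tabulating the stage-$k$ cost for several exponents. The Python sketch below is an illustration; the name `stage_cost` is hypothetical, and the normalizing factor $\alpha(t)/2^t$ is omitted because it does not depend on $k$.

```python
import math

S_CRITICAL = math.log(2) / math.log(3)

def stage_cost(k, t):
    """Sum of diam^t over the 2^k intervals of length 3^(-k) covering C at stage k."""
    return (2.0 ** k) * (3.0 ** (-k * t))

for t in (0.5, S_CRITICAL, 0.8):
    print(t, [stage_cost(k, t) for k in (1, 5, 10, 20)])
```

At $t = s$ the cost stays at $1$ for every $k$; for $t > s$ it decays to $0$, which already shows $\mathcal{H}^t(C) = 0$. For $t < s$ the cost of these particular covers grows, but that alone does not prove $\mathcal{H}^t(C) = \infty$, since other covers might be cheaper; a lower bound needs a separate argument.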
[example: Lower Bound via the Cantor Measure]
To apply the mass distribution principle, we construct a natural measure on $C$. Define the **Cantor measure** $\mu$ by specifying its values on the generating intervals: set $\mu(I_{k,j}) = 2^{-k}$ for each interval at level $k$. This is consistent — the two children of $I_{k,j}$ at level $k+1$ each receive mass $2^{-(k+1)}$, and their masses sum to $2^{-k}$. The Cantor measure extends to a Borel measure on $\mathbb{R}$ with $\mu(C) = 1$.
We verify the mass distribution condition. Take any ball $B(x, r)$ with $x \in C$ and $0 < r < 1$; for $r \geq 1$ the estimate is trivial, since $\mu(B(x,r)) \leq \mu(C) = 1 \leq r^s$. Choose the integer $k \geq 0$ so that
\begin{align*}
3^{-(k+1)} \leq r < 3^{-k}.
\end{align*}
The ball $B(x, r)$ has diameter $2r < 2 \cdot 3^{-k}$. At level $k$, the intervals have length $3^{-k}$ and are separated by gaps of length at least $3^{-k}$. An interval meeting three level-$k$ intervals must contain the middle one together with the gaps on either side of it, so its diameter is at least $3 \cdot 3^{-k}$. Hence $B(x, r)$ meets at most two level-$k$ intervals. Therefore
\begin{align*}
\mu(B(x, r)) \leq 2 \cdot 2^{-k}.
\end{align*}
Now we relate $2^{-k}$ to $r^s$. Since $r \geq 3^{-(k+1)}$, we have
\begin{align*}
r^s \geq 3^{-s(k+1)} = 3^{-s} \cdot 3^{-sk}.
\end{align*}
Using $3^s = 2$, so $3^{-sk} = 2^{-k}$ and $3^{-s} = 2^{-1}$, this gives
\begin{align*}
r^s \geq \frac{1}{2} \cdot 2^{-k}.
\end{align*}
Combining with the mass estimate:
\begin{align*}
\mu(B(x, r)) \leq 2 \cdot 2^{-k} \leq 4 r^s.
\end{align*}
By the mass distribution principle applied with $c = 4$:
\begin{align*}
\mathcal{H}^s(C) \geq \frac{\mu(C)}{4} = \frac{1}{4} > 0.
\end{align*}
[/example]
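The inequality $\mu(B(x,r)) \leq 4r^s$ can be spot-checked numerically. The Python sketch below is a hedged illustration with hypothetical helper names: it approximates the distribution function $F(x) = \mu((-\infty, x])$ of the Cantor measure from ternary digits, so that $\mu(B(x,r)) = F(x+r) - F(x-r)$, and samples random centers in $C$ (truncated base-$3$ expansions with digits $0$ and $2$) and random radii. Floating-point rounding limits the accuracy to roughly $10^{-10}$, which is harmless at the radii used.

```python
import math
import random

S = math.log(2) / math.log(3)

def cantor_function(x, depth=40):
    """Approximate F(x) = mu((-inf, x]) for the Cantor measure mu, via ternary digits."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    value, weight = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = min(int(x), 2)
        x -= digit
        if digit == 1:          # x lies in a removed middle third, where F is constant
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2
    return value

def ball_mass(x, r):
    """mu(B(x, r)); mu has no atoms, so open and closed balls have equal mass."""
    return cantor_function(x + r) - cantor_function(x - r)

def random_cantor_point(rng, digits=30):
    """A point of C built from random ternary digits in {0, 2} (truncated expansion)."""
    return sum(rng.choice((0, 2)) * 3.0 ** (-i) for i in range(1, digits + 1))

def worst_ratio(samples=2000, seed=0):
    """Largest observed mu(B(x, r)) / r^s over random centers in C and radii in [1e-6, 1]."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x = random_cantor_point(rng)
        r = 10.0 ** rng.uniform(-6, 0)
        worst = max(worst, ball_mass(x, r) / r ** S)
    return worst

print(worst_ratio())
```

A sampled maximum below $4$ is consistent with the estimate but is of course not a proof; it only confirms that the constant $c = 4$ is not violated on the sample.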
Together, the two examples show $0 < \mathcal{H}^s(C) < \infty$ for $s = \log 2 / \log 3$. This means $\mathcal{H}^t(C) = \infty$ for $t < s$ and $\mathcal{H}^t(C) = 0$ for $t > s$, so $\dim_{\mathcal{H}}(C) = s = \log 2 / \log 3 \approx 0.631$.
The role of each hypothesis in the mass distribution principle deserves emphasis. The condition $\mu(B(x,r)) \leq c r^s$ controls how mass is locally concentrated: if mass were allowed to pile up in a ball of radius $r$ by more than $c r^s$, then a single tiny ball could carry disproportionate mass, and the covering argument would fail to give a positive lower bound. The conclusion $\mathcal{H}^s(E) \geq \mu(E)/c$ says that any efficient cover of $E$ must use many small sets, each with bounded $\mu$-mass, and their collective diameter cost is at least $\mu(E)/c$. This principle will reappear whenever we need a lower bound on Hausdorff measure, for instance in proving that rectifiable sets have positive density.
<!-- illustration-needed: the Cantor set construction — show four stages C_0 through C_3 as horizontal interval diagrams, with the removed middle thirds marked in gray and the retained intervals in black, together with a label showing the interval count and length at each stage -->
## An Uncountable Set of Hausdorff Dimension Zero
The computation above shows that Hausdorff dimension can be strictly between 0 and 1. But can it equal exactly zero for a set that is still large in cardinality? Intuition might suggest that any uncountable set must have positive dimension: every countable set has Hausdorff dimension zero, and the Cantor set, our one example so far of an uncountable Lebesgue-null set, has positive dimension. The following example shatters this intuition.
[definition: Liouville Numbers]
A real number $x$ is a **Liouville number** if for every $m \in \mathbb{N}$, there exist integers $p$ and $q$ with $q \geq 2$ and
\begin{align*}
0 < \left| x - \frac{p}{q} \right| < q^{-m}.
\end{align*}
The set of all Liouville numbers is denoted $\mathcal{L}$.
[/definition]
The definition asks that $x$ be approximable by rationals to within $q^{-m}$ for every exponent $m$, that is, faster than any fixed power of the denominator. Liouville used this class to construct the first explicit transcendental numbers: any Liouville number is transcendental, because an irrational algebraic number $x$ of degree $d$ satisfies a quantitative lower bound $|x - p/q| \geq C q^{-d}$ by Liouville's theorem on Diophantine approximation, and no Liouville number satisfies such a bound for any fixed $d$.
Although every Liouville number is transcendental and is extremely well approximable by rationals, the set $\mathcal{L}$ is enormously large topologically: it is a dense $G_\delta$ set in $\mathbb{R}$, hence residual in the sense of Baire category. A set can be topologically large and still be negligible for measure, and here the measure-theoretic story is completely different.
[quotetheorem:3007]
The fact that $\mathcal{L}^1(\mathcal{L}) = 0$ follows from the inclusion $\mathcal{L} \subset \bigcap_{m=1}^\infty \bigcup_{q \geq 2} \bigcup_{p \in \mathbb{Z}} B(p/q, q^{-m})$. Since $\mathcal{L}$ is invariant under integer translations, it suffices to treat $\mathcal{L} \cap [-1, 1]$. For $x \in [-1,1]$ only numerators with $|p| \leq q$ can occur, so for each fixed $m > 2$ the measure of the relevant union is at most $\sum_{q \geq 2} (2q+1) \cdot 2q^{-m}$. This series converges, and it tends to $0$ as $m \to \infty$, so $\mathcal{L}^1(\mathcal{L} \cap [-1,1]) = 0$. The Hausdorff dimension claim requires an explicit estimate for $\mathcal{H}^s(\mathcal{L})$.
[example: Proving $\dim_{\mathcal{H}}(\mathcal{L}) = 0$]
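Fix any $s > 0$. We will show $\mathcal{H}^s(\mathcal{L}) = 0$ by constructing, for each $\delta > 0$, coverings by sets of diameter less than $\delta$ whose total cost (the sum of the $s$-th powers of the diameters) is arbitrarily small. We work inside a bounded interval $[-R, R]$, which suffices because $\mathcal{L}$ is the countable union of the sets $\mathcal{L} \cap [-R, R]$ with $R \in \mathbb{N}$.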
Choose $m \in \mathbb{N}$ large enough so that $m > 2/s$, meaning $ms > 2$. For each Liouville number $x \in \mathcal{L}$, there exists by definition a rational approximation $p/q$ with $q \geq 2$ and $|x - p/q| < q^{-m}$, so $x \in B(p/q, q^{-m})$. Collecting over all valid pairs $(p, q)$, we obtain:
\begin{align*}
\mathcal{L} \subset \bigcup_{q=2}^{\infty} \bigcup_{p \in \mathbb{Z}} B\!\left(\frac{p}{q},\, q^{-m}\right).
\end{align*}
Each such ball has diameter $2q^{-m}$. We estimate the $s$-dimensional cost of this cover. For Liouville numbers in $[0,1]$, the relevant denominators satisfy $q \geq 2$ and the numerators satisfy $0 \leq p \leq q$ (at most $q+1$ choices per $q$). More generally, for $x$ in any fixed bounded interval $[-R, R]$, we need $|p| \leq qR + 1$, giving at most $2qR + 3$ choices of $p$ per $q$. Thus the total cost over $q \geq Q$ is at most
\begin{align*}
\sum_{q=Q}^{\infty} (2qR + 3) \cdot (2q^{-m})^s = 2^s \sum_{q=Q}^{\infty} (2qR + 3) \cdot q^{-ms}.
\end{align*}
Since $ms > 2$ (our choice of $m$), we have $q^{-ms} = o(q^{-2})$, and the series $\sum_{q=Q}^\infty q^{1-ms}$ converges (the exponent $1 - ms < -1$). Therefore
\begin{align*}
\sum_{q=Q}^{\infty} (2qR + 3) \cdot q^{-ms} \leq C \sum_{q=Q}^{\infty} q^{1-ms} < \infty,
\end{align*}
and this sum tends to $0$ as $Q \to \infty$.
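It remains to check that the balls with $q \geq Q$ still cover, and that the cover is $\delta$-fine. Fix $x \in \mathcal{L} \cap [-R, R]$. Since $x$ is irrational (Liouville numbers are transcendental), it has positive distance $d$ to the finitely many rationals in $[-R-1, R+1]$ with denominator less than $Q$. Choose $m' \geq m$ with $2^{-m'} < d$. The definition provides $p/q$ with $|x - p/q| < q^{-m'} \leq 2^{-m'} < d$, which forces $q \geq Q$, and $q^{-m'} \leq q^{-m}$ gives $x \in B(p/q, q^{-m})$. Now take $Q$ so large that $2Q^{-m} < \delta$. The balls with $q \geq Q$ then all have diameter less than $\delta$, and
\begin{align*}
\mathcal{H}^s_\delta(\mathcal{L} \cap [-R, R]) \leq C \sum_{q=Q}^{\infty} q^{1-ms},
\end{align*}
where $C$ absorbs the factor $2^s$ and the normalizing constant of $\mathcal{H}^s$. The left side does not depend on $Q$, while the right side tends to $0$ as $Q \to \infty$. Hence $\mathcal{H}^s_\delta(\mathcal{L} \cap [-R, R]) = 0$ for every $\delta > 0$, so $\mathcal{H}^s(\mathcal{L} \cap [-R, R]) = 0$. Countable subadditivity over $R \in \mathbb{N}$ gives $\mathcal{H}^s(\mathcal{L}) = 0$.
Since this holds for every $s > 0$, the definition of Hausdorff dimension gives $\dim_{\mathcal{H}}(\mathcal{L}) = 0$.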
[/example]
The key to the argument is the choice $m > 2/s$. The condition $ms > 2$ ensures the counting cost — the number of rational approximations $(p/q)$ with denominator $q$, which grows like $q$ — is dominated by the diameter savings $q^{-ms}$. When $ms \leq 2$, the sum diverges and the argument fails, which is exactly right: the threshold $ms = 2$ corresponds to $s = 2/m$, and as $m \to \infty$, this approaches $0$ from above, confirming that dimension is genuinely $0$.
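The trade-off between the counting factor and the diameter savings can be seen numerically. The sketch below evaluates an upper bound for the cover cost $2^s \sum_{q \geq Q} (2qR + 3) q^{-ms}$ from the example; the function name, the cutoff `q_max`, and the sample parameters are assumptions made for illustration. Terms beyond the cutoff are bounded by an integral, using $(2qR + 3) \leq (2R + 3) q$.

```python
def cover_cost(s, m, big_r, q_min, q_max=200_000):
    """Upper bound for 2^s * sum_{q >= q_min} (2 q R + 3) q^(-m s).

    Requires m * s > 2. Terms up to q_max are summed directly; the
    remainder is bounded by (2R + 3) * integral of x^(1 - m s) from q_max.
    """
    exponent = m * s
    if exponent <= 2:
        raise ValueError("need m * s > 2 for convergence")
    head = sum((2 * q * big_r + 3) * q ** (-exponent) for q in range(q_min, q_max + 1))
    tail = (2 * big_r + 3) * q_max ** (2 - exponent) / (exponent - 2)
    return 2 ** s * (head + tail)


# s = 1/2 and m = 8 give m * s = 4 > 2; the cost shrinks as Q grows.
for q_min in (2, 10, 100, 1000):
    print(q_min, cover_cost(0.5, 8, 1, q_min))
```

Choosing a smaller $s$ only requires a larger $m$; the tail then decays in the same way, which is the numerical shadow of $\mathcal{H}^s(\mathcal{L}) = 0$ for every $s > 0$.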
[remark: Baire Category vs. Hausdorff Dimension]
This example illustrates a recurring tension in analysis: Baire category and Hausdorff dimension can disagree completely. The set $\mathcal{L}$ is "generic" in the Baire sense (it is a dense $G_\delta$, so its complement is meager) but "negligible" in the Hausdorff sense. Neither notion subsumes the other. Measure and Baire category capture different kinds of smallness.
[/remark]
## Differentiation of the Cantor Measure
Having built the Cantor measure $\mu$ in the computation of $\dim_{\mathcal{H}}(C)$, we can now use it to illustrate the differentiation theory of Chapter 5. The central question: what is the derivative $D_{\mathcal{L}^1} \mu(x)$ — the density of the Cantor measure relative to Lebesgue measure — at each point $x \in \mathbb{R}$?
Recall the setup. The Cantor measure $\mu$ satisfies $\mu(C) = 1$, $\mu(\mathbb{R} \setminus C) = 0$, and $\mu(I_{k,j}) = 2^{-k}$ for each of the $2^k$ level-$k$ intervals $I_{k,j}$. The Lebesgue measure $\mathcal{L}^1$ assigns $\mathcal{L}^1(C) = 0$ and $\mathcal{L}^1([0,1] \setminus C) = 1$. Intuitively, $\mu$ and $\mathcal{L}^1$ live on complementary sets. But the differentiation theorem tells us something more precise.
[definition: Derivative of One Measure Against Another]
Let $\mu$ and $\nu$ be Radon measures on $\mathbb{R}^n$. The **derivative of $\nu$ with respect to $\mu$** at a point $x$ is
\begin{align*}
D_\mu \nu(x) = \lim_{r \to 0} \frac{\nu(B(x, r))}{\mu(B(x, r))},
\end{align*}
whenever the limit exists and $\mu(B(x, r)) > 0$ for all small $r > 0$.
[/definition]
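The differentiation theorem (Chapter 5) guarantees that this limit exists and is finite at $\mu$-almost every point, where $\mu$ is the reference measure in the denominator. For the Cantor measure differentiated against $\mathcal{L}^1$, we can compute it explicitly.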
[example: Computing $D_{\mathcal{L}^1} \mu$ at Points of $C$]
Let $x \in C$ and consider the ratio
\begin{align*}
\frac{\mu(B(x, r))}{\mathcal{L}^1(B(x, r))} = \frac{\mu(B(x, r))}{2r}.
\end{align*}
Choose $r = 3^{-k}$ for each $k$. The ball $B(x, 3^{-k})$ has diameter $2 \cdot 3^{-k}$, twice the length $3^{-k}$ of a level-$k$ interval. Since $x \in C$, every point of the unique level-$k$ interval $I_{k, j(x)}$ containing $x$ lies within distance $3^{-k}$ of $x$, so $B(x, 3^{-k})$ contains $I_{k, j(x)}$ except possibly for one endpoint. Single points carry no $\mu$-mass (a point lies in a level-$k$ interval of mass $2^{-k}$ for every $k$), so $\mu(B(x, 3^{-k})) \geq \mu(I_{k, j(x)}) = 2^{-k}$. (The ball may also catch parts of adjacent intervals, but this only increases its mass.) We have at minimum:
\begin{align*}
\frac{\mu(B(x, 3^{-k}))}{2 \cdot 3^{-k}} \geq \frac{2^{-k}}{2 \cdot 3^{-k}} = \frac{1}{2} \left(\frac{3}{2}\right)^k.
\end{align*}
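Since $(3/2)^k \to \infty$ as $k \to \infty$, the ratio diverges along the sequence $r_k = 3^{-k}$. Intermediate radii are handled by monotonicity: if $3^{-(k+1)} \leq r < 3^{-k}$, then $\mu(B(x, r)) \geq \mu(B(x, 3^{-(k+1)})) \geq 2^{-(k+1)}$ while $2r < 2 \cdot 3^{-k}$, so the ratio is at least $\frac{1}{4}(3/2)^k$. Hence the full limit as $r \to 0$ is $+\infty$, and the derivative satisfies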
\begin{align*}
D_{\mathcal{L}^1} \mu(x) = +\infty \qquad \text{for every } x \in C.
\end{align*}
[/example]
[example: Computing $D_{\mathcal{L}^1} \mu$ Off the Cantor Set]
Now let $x \notin C$. Then either $x \notin [0,1]$ or $x$ lies in one of the removed open intervals $(a, b)$ excised at some stage of the Cantor construction. In both cases the key fact is that the complement $\mathbb{R} \setminus C$ is an open set (as $C$ is closed), so there exists $\varepsilon > 0$ such that $B(x, \varepsilon) \subset \mathbb{R} \setminus C$. For all $r < \varepsilon$:
\begin{align*}
B(x, r) \subset \mathbb{R} \setminus C,
\end{align*}
and since $\mu$ is supported on $C$, we have $\mu(B(x, r)) = 0$. Therefore
\begin{align*}
\frac{\mu(B(x, r))}{\mathcal{L}^1(B(x, r))} = \frac{0}{2r} = 0 \qquad \text{for all } r < \varepsilon,
\end{align*}
which gives
\begin{align*}
D_{\mathcal{L}^1} \mu(x) = 0 \qquad \text{for every } x \notin C.
\end{align*}
[/example]
These two computations together reveal the full picture: the derivative $D_{\mathcal{L}^1} \mu$ is $+\infty$ on $C$ and $0$ off $C$. This is exactly the signature of a measure that is **singular** with respect to $\mathcal{L}^1$.
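Both computations can be reproduced numerically. The sketch below is an illustration under stated assumptions: it uses the standard identity $\mu((-\infty, x]) = F(x)$, where $F$ is the Cantor function, evaluated from the ternary digits of $x$ in exact rational arithmetic. The function names and the sample points $x = 1/4 \in C$ and $x = 1/2 \notin C$ are choices made here, not taken from the notes.

```python
from fractions import Fraction


def cantor_function(x, depth=60):
    """Cantor function F(x) = mu((-inf, x]), using `depth` ternary digits.

    The truncation error is at most 2^(-depth); the value is exact when the
    expansion terminates or reaches a digit 1 within `depth` steps.
    """
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    x = Fraction(x)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)  # floor, since x >= 0
        x -= digit
        if digit == 1:
            return value + weight  # x is in a removed interval or at its left end
        if digit == 2:
            value += weight
        weight /= 2
    return value


def density_ratio(x, r):
    """mu(B(x, r)) / L^1(B(x, r)); mu has no atoms, so open vs closed is irrelevant."""
    mass = cantor_function(x + r) - cantor_function(x - r)
    return mass / (2 * r)


for k in range(1, 7):
    r = Fraction(1, 3 ** k)
    on_c = density_ratio(Fraction(1, 4), r)   # 1/4 = 0.0202..._3 lies in C
    off_c = density_ratio(Fraction(1, 2), r)  # 1/2 lies in the first removed interval
    print(k, float(on_c), float(off_c))
```

At $x = 1/4$ the ratio grows at least like $\frac{1}{2}(3/2)^k$, while at $x = 1/2$ it vanishes as soon as $3^{-k} < 1/6$, the distance from $1/2$ to $C$.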
[explanation: Singular Measures and the Lebesgue Decomposition]
Recall that two measures $\mu$ and $\nu$ are **mutually singular** (written $\mu \perp \nu$) if there exists a Borel set $A$ such that $\mu(A) = 0$ and $\nu(\mathbb{R}^n \setminus A) = 0$. The Cantor measure and Lebesgue measure are mutually singular: taking $A = C$, we have $\mathcal{L}^1(C) = 0$ and the Cantor measure $\mu$ is supported entirely on $C$, so $\mu(\mathbb{R} \setminus C) = 0$.
The Lebesgue decomposition theorem says that every Radon measure $\nu$ decomposes uniquely as $\nu = \nu_{ac} + \nu_s$ where $\nu_{ac} \ll \mathcal{L}^1$ (absolutely continuous part) and $\nu_s \perp \mathcal{L}^1$ (singular part). For the Cantor measure, the decomposition is trivial: $\mu_{ac} = 0$ and $\mu_s = \mu$.
The differentiation theorem says that $D_{\mathcal{L}^1} \nu(x)$ gives the Radon-Nikodym derivative of $\nu_{ac}$, and that $D_{\mathcal{L}^1} \nu_s(x) = +\infty$ at $\nu_s$-almost every point. For the Cantor measure, $\mu_s = \mu$ concentrates on $C$, and indeed $D_{\mathcal{L}^1} \mu(x) = +\infty$ at every point of $C$. The derivative blows up precisely where the singular mass lives.
This dichotomy, with a finite derivative at $\mathcal{L}^1$-typical points (where the absolutely continuous part dominates) and an infinite derivative at $\nu_s$-typical points, is a precise measure-theoretic description of what "singular" means locally. The Cantor measure is singular because at every point where it has mass, the Lebesgue density is infinite.
[/explanation]
The hypothesis that $\mu$ is Radon is necessary for the differentiation theorem to apply. If we worked with a non-Radon measure (say, a measure that is not locally finite), the covering theorem arguments that drive the differentiation theorem break down.
## Vitali Covering and the Lebesgue Density Theorem
The Vitali covering theorem is the engine that powers density computations. To see exactly how it operates, we work through a proof of the Lebesgue density theorem using Vitali's lemma. The question driving this section is: if $E \subset \mathbb{R}^n$ is a measurable set with positive Lebesgue measure, how often does a ball centered at a point $x \in E$ actually look mostly like $E$ rather than its complement?
The answer is: at almost every point of $E$, the proportion of the ball lying in $E$ approaches $1$. Intuitively, almost every point of $E$ is a "density point" — the set looks locally like the full ambient space near almost every one of its points. The precise statement and proof illustrate how covering theorems convert local geometric data into global almost-everywhere conclusions.
[quotetheorem:894]
This follows from the general differentiation theorem (Chapter 5) applied to the measure $\nu = \mathcal{L}^n \lfloor E$ (Lebesgue measure restricted to $E$) against the reference measure $\mu = \mathcal{L}^n$. The derivative $D_{\mathcal{L}^n}(\mathcal{L}^n \lfloor E)(x)$ equals $1$ at $\mathcal{L}^n$-almost every $x \in E$ and $0$ at $\mathcal{L}^n$-almost every $x \notin E$, since $\mathcal{L}^n \lfloor E \ll \mathcal{L}^n$ with Radon-Nikodym derivative $\mathbf{1}_E$. But seeing how Vitali works directly in this context is instructive.
[example: Vitali Argument for the Density Theorem]
Fix $\varepsilon \in (0, 1)$ and consider the set of "bad" points in $E$:
\begin{align*}
A_\varepsilon = \left\{ x \in E : \limsup_{r \to 0} \frac{\mathcal{L}^n(E^c \cap B(x, r))}{\mathcal{L}^n(B(x, r))} > \varepsilon \right\}.
\end{align*}
Here $E^c = \mathbb{R}^n \setminus E$. We claim $\mathcal{L}^n(A_\varepsilon) = 0$.
Two reductions come first. Densities are local, so replacing $E$ by $E \cap B(0, R)$ changes nothing at points of $E$ inside the open ball $B(0, R)$; it therefore suffices to treat the case $\mathcal{L}^n(E) < \infty$ and then let $R \to \infty$ through the integers. Next, given $\eta > 0$, outer regularity of Lebesgue measure provides an open set $U \supset E$ with $\mathcal{L}^n(U \setminus E) < \eta$. The point of $U$ is that it contains very little of $E^c$.
Each point $x \in A_\varepsilon$ satisfies the limsup condition, so there exist arbitrarily small radii $r > 0$ with:
\begin{align*}
\mathcal{L}^n(E^c \cap B(x, r)) > \varepsilon \cdot \mathcal{L}^n(B(x, r)).
\end{align*}
Moreover, since $U$ is open and $x \in A_\varepsilon \subset E \subset U$, every sufficiently small ball $B(x, r)$ lies inside $U$. Let $\mathcal{F}$ be the collection of balls $B(x, r)$ with $x \in A_\varepsilon$, $r \leq 1$, $B(x, r) \subset U$, and satisfying the inequality above. Every point of $A_\varepsilon$ is the center of arbitrarily small balls in $\mathcal{F}$, so $\mathcal{F}$ is a Vitali cover of $A_\varepsilon$ with bounded radii.
Apply the Vitali covering theorem: extract a countable disjoint subcollection $\{B_j\}_{j=1}^\infty$ such that
\begin{align*}
A_\varepsilon \subset \bigcup_{j=1}^\infty 5B_j,
\end{align*}
where $5B_j$ denotes the ball with the same center as $B_j$ but radius multiplied by $5$.
Now we estimate. Each $B_j$ belongs to $\mathcal{F}$, so $\varepsilon \cdot \mathcal{L}^n(B_j) < \mathcal{L}^n(E^c \cap B_j)$. Summing over the disjoint balls, all of which lie in $U$:
\begin{align*}
\varepsilon \sum_{j=1}^\infty \mathcal{L}^n(B_j) \leq \sum_{j=1}^\infty \mathcal{L}^n(E^c \cap B_j) = \mathcal{L}^n\!\left(E^c \cap \bigcup_{j=1}^\infty B_j\right) \leq \mathcal{L}^n(U \setminus E) < \eta.
\end{align*}
(Disjointness allows summing measures exactly.) This gives:
\begin{align*}
\sum_{j=1}^\infty \mathcal{L}^n(B_j) < \frac{\eta}{\varepsilon}.
\end{align*}
Now estimate $\mathcal{L}^n(A_\varepsilon)$ using the $5\times$ dilation. Since $A_\varepsilon \subset \bigcup_j 5B_j$ and $\mathcal{L}^n(5B_j) = 5^n \mathcal{L}^n(B_j)$:
\begin{align*}
\mathcal{L}^n(A_\varepsilon) \leq \sum_{j=1}^\infty \mathcal{L}^n(5B_j) = 5^n \sum_{j=1}^\infty \mathcal{L}^n(B_j) < \frac{5^n \eta}{\varepsilon}.
\end{align*}
Neither $A_\varepsilon$ nor the constant $5^n/\varepsilon$ depends on $\eta$, and $\eta > 0$ was arbitrary, so $\mathcal{L}^n(A_\varepsilon) = 0$ when $\mathcal{L}^n(E) < \infty$. For general $E$, apply this to $E \cap B(0, R)$ for each $R \in \mathbb{N}$: the bad points of $E$ inside $B(0, R)$ are bad points of $E \cap B(0, R)$, so they form a null set, and the countable union over $R$ is again null.
The set of non-density points is $\bigcup_{k=1}^\infty A_{1/k}$, which is a countable union of null sets, hence also null. So for $\mathcal{L}^n$-almost every $x \in E$, the density equals $1$.
[/example]
The $5\times$ dilation factor in Vitali's theorem is what connects the local estimates (each ball satisfies the density condition) to a global bound. Without dilation, we would only have disjoint balls; the dilation allows us to cover $A_\varepsilon$ by something that has controlled measure. The factor $5^n$ appears in the dimension-dependence: in $\mathbb{R}^n$, a ball of radius $5r$ has Lebesgue measure $5^n$ times that of the ball of radius $r$.
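The selection step behind the dilation can be made concrete for a finite family of intervals in $\mathbb{R}$. The sketch below is a simplified illustration, not the proof from Chapter 4: for finitely many balls a greedy largest-first selection already works, and in that case the factor $3$ suffices, so the factor $5$ holds a fortiori. The function names and the random test family are assumptions made here; integer centers and radii keep the arithmetic exact.

```python
import random


def greedy_disjoint_subfamily(balls):
    """Select pairwise disjoint closed intervals, largest radius first.

    `balls` is a list of (center, radius) pairs describing [c - r, c + r].
    """
    selected = []
    for center, radius in sorted(balls, key=lambda b: b[1], reverse=True):
        if all(abs(center - c) > radius + r for c, r in selected):
            selected.append((center, radius))
    return selected


def inside_dilation(ball, selected, factor):
    """True if `ball` lies in the `factor`-fold dilation of some selected ball."""
    center, radius = ball
    return any(abs(center - c) + radius <= factor * r for c, r in selected)


random.seed(0)
family = [(random.randint(0, 1000), random.randint(1, 40)) for _ in range(300)]
chosen = greedy_disjoint_subfamily(family)
print(len(chosen), all(inside_dilation(b, chosen, 5) for b in family))
```

A rejected ball meets an earlier selected ball of radius at least its own, so it sits inside the threefold dilation of that ball. The infinite case needs the factor $5$ because radii can only be compared up to a factor of $2$ when no largest ball exists.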
[remark: Vitali versus Besicovitch]
The Vitali argument bounds $\mathcal{L}^n(A_\varepsilon)$ by $5^n/\varepsilon$ times the measure of the part of $E^c$ captured by the selected balls. The Besicovitch covering theorem avoids dilation by using bounded overlap instead, and would replace $5^n$ with the Besicovitch constant $N(n)$. Both are dimensional constants, so for Lebesgue measure little is gained. The real advantage of Besicovitch is that it needs no comparison between the measure of $5B$ and the measure of $B$, and therefore applies to arbitrary Radon measures rather than only to doubling ones. For the Lebesgue density theorem, the qualitative conclusion (the set $A_\varepsilon$ has measure zero) does not depend on which covering theorem is used; both yield the result.
[/remark]
## Hausdorff Measure on Lipschitz Graphs
How does Hausdorff measure interact with the geometric objects that arise most naturally in analysis — graphs of functions? A smooth surface in $\mathbb{R}^3$ should have $2$-dimensional Hausdorff measure equal to its classical area. The Lipschitz case is both more general and more tractable: it requires only the Lipschitz bound on diameters, not smoothness.
[example: The Graph of a Lipschitz Function]
Let $f: [0,1] \to \mathbb{R}$ be Lipschitz with constant $L$, meaning $|f(x) - f(y)| \leq L|x-y|$ for all $x, y \in [0,1]$. Define the graph:
\begin{align*}
\Gamma_f = \{(x, f(x)) : x \in [0,1]\} \subset \mathbb{R}^2.
\end{align*}
We compute the $1$-dimensional Hausdorff measure $\mathcal{H}^1(\Gamma_f)$ and bound it in terms of $L$ and the Lebesgue measure of the domain.
**Upper bound.** Partition $[0,1]$ into $N$ equal subintervals $[x_{j-1}, x_j]$ of length $h = 1/N$ for $j = 1, \ldots, N$. On each subinterval, the corresponding arc of $\Gamma_f$ lies inside the rectangle $[x_{j-1}, x_j] \times [f(x_{j-1}) - Lh, f(x_{j-1}) + Lh]$, which has diameter
\begin{align*}
\operatorname{diam}\bigl([x_{j-1}, x_j] \times [f(x_{j-1}) - Lh, f(x_{j-1}) + Lh]\bigr) = \sqrt{h^2 + (2Lh)^2} = h\sqrt{1 + 4L^2}.
\end{align*}
The rectangle is wasteful, and a sharper bound comes from estimating the diameter of the arc itself. For any $x, y \in [x_{j-1}, x_j]$,
\begin{align*}
\sqrt{(x - y)^2 + (f(x) - f(y))^2} \leq \sqrt{h^2 + L^2 h^2} = h\sqrt{1 + L^2},
\end{align*}
so the arc of $\Gamma_f$ over $[x_{j-1}, x_j]$ has diameter at most $h\sqrt{1+L^2}$.
The $N$ arcs cover $\Gamma_f$. Recall the normalization: the definition uses $\alpha(s)(\operatorname{diam}(C_j)/2)^s$ in the infimum, with $\alpha(1) = 2$, so a set of diameter $d$ contributes $\alpha(1)(d/2)^1 = 2 \cdot d/2 = d$. Thus each arc contributes at most $h\sqrt{1+L^2}$, and:
\begin{align*}
\mathcal{H}^1_{h\sqrt{1+L^2}}(\Gamma_f) \leq N \cdot h\sqrt{1+L^2} = \sqrt{1+L^2}.
\end{align*}
Since $h\sqrt{1+L^2} \to 0$ as $N \to \infty$ (and $h = 1/N$), taking $N \to \infty$ gives:
\begin{align*}
\mathcal{H}^1(\Gamma_f) \leq \sqrt{1 + L^2}.
\end{align*}
**Lower bound.** The projection $\pi: \mathbb{R}^2 \to \mathbb{R}$ defined by $\pi(x, y) = x$ is a $1$-Lipschitz map (it does not stretch distances). The Lipschitz estimate from Chapter 11 gives:
\begin{align*}
\mathcal{H}^1(\pi(\Gamma_f)) \leq \operatorname{Lip}(\pi)^1 \cdot \mathcal{H}^1(\Gamma_f) = \mathcal{H}^1(\Gamma_f).
\end{align*}
But $\pi(\Gamma_f) = [0,1]$ and $\mathcal{H}^1([0,1]) = \mathcal{L}^1([0,1]) = 1$ (since $\mathcal{H}^1 = \mathcal{L}^1$ on $\mathbb{R}$). Therefore:
\begin{align*}
\mathcal{H}^1(\Gamma_f) \geq 1.
\end{align*}
Together: $1 \leq \mathcal{H}^1(\Gamma_f) \leq \sqrt{1 + L^2}$.
The lower bound $1$ is sharp (achieved when $f$ is constant, $L = 0$). The upper bound $\sqrt{1+L^2}$ is also essentially sharp: for a linear function $f(x) = Lx$, the graph is a straight line of slope $L$, and its length (hence $\mathcal{H}^1$-measure) is exactly $\sqrt{1+L^2}$.
[/example]
[remark: The Area Formula Preview]
For smooth $f : [0,1] \to \mathbb{R}$ with $f \in C^1$, the exact formula is
\begin{align*}
\mathcal{H}^1(\Gamma_f) = \int_0^1 \sqrt{1 + |f'(x)|^2} \, d\mathcal{L}^1(x).
\end{align*}
This is the classical arc-length formula, and it is a special case of the area formula for Lipschitz maps (GMT II). The integrand $\sqrt{1 + |f'(x)|^2}$ is the Jacobian of the parametrization $x \mapsto (x, f(x))$. For Lipschitz $f$, the derivative $f'$ exists $\mathcal{L}^1$-almost everywhere by Rademacher's theorem, and the formula still holds.
[/remark]
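A quick numerical sanity check of the two-sided bound: the sketch below computes the length of an inscribed polygon, which for $C^1$ functions converges to the arc length in the formula above. The helper name and the test function $f(x) = \sin(Lx)$ with $L = 3$ are assumptions chosen for illustration.

```python
import math


def polygonal_length(f, n):
    """Length of the polygon inscribed in the graph of f over [0, 1], n equal steps."""
    xs = [j / n for j in range(n + 1)]
    return sum(
        math.hypot(xs[j] - xs[j - 1], f(xs[j]) - f(xs[j - 1]))
        for j in range(1, n + 1)
    )


L = 3.0  # Lipschitz constant of x -> sin(L x)
length = polygonal_length(lambda x: math.sin(L * x), 10_000)
print(1.0 <= length <= math.sqrt(1 + L ** 2), length)

# The linear function f(x) = L x attains the upper bound sqrt(1 + L^2).
print(polygonal_length(lambda x: L * x, 1000), math.sqrt(1 + L ** 2))
```

Each polygon edge has length between $h$ and $h\sqrt{1+L^2}$, which is the discrete counterpart of the projection lower bound and the diameter upper bound.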
## Density Bounds for Hausdorff Measure
The density theory of Chapter 10 gives sharp estimates on $s$-dimensional densities of $\mathcal{H}^s \lfloor E$ for $\mathcal{H}^s$-finite sets $E$. Here we carry out explicit density computations for several geometric examples, illustrating how the density values encode geometric regularity.
The key theorems established in Chapter 10 are:
[quotetheorem:3008]
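The gap between $2^{-s}$ and $1$ is genuine: for non-integer $s$ the density need not exist as a limit. The examples below exhibit the upper extreme $1$ (a line segment) and a fractal case where the density ratio stays bounded but is not identically $1$ (the Cantor set).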
[example: Density of a Line Segment in $\mathbb{R}^2$]
Let $E = [0,1] \times \{0\} \subset \mathbb{R}^2$, a horizontal line segment of length $1$. For $s = 1$, compute $\Theta^1(\mathcal{H}^1 \lfloor E, x)$ at an interior point $x = (a, 0)$ with $0 < a < 1$.
The ball $B(x, r) = \{(y_1, y_2) : (y_1 - a)^2 + y_2^2 < r^2\}$ intersects $E$ in the interval $(a-r, a+r) \times \{0\}$, provided $r < \min(a, 1-a)$. This intersection has $\mathcal{H}^1$-measure equal to $2r$ (its $1$-dimensional length). The normalizing factor for $s = 1$ is $\alpha(1) r^1 = 2r$, since $\alpha(1) = 2$, so the upper density is:
\begin{align*}
\Theta^{*1}(\mathcal{H}^1 \lfloor E, x) = \limsup_{r \to 0} \frac{\mathcal{H}^1(E \cap B(x, r))}{\alpha(1) r} = \limsup_{r \to 0} \frac{2r}{2r} = 1.
\end{align*}
The same computation with $\liminf$ shows that the lower density equals $1$. So the density exists and equals $1$ at interior points: $\Theta^1(\mathcal{H}^1 \lfloor E, x) = 1$.
At an endpoint $x = (0,0)$: the ball $B(x, r)$ intersects $E$ in $[0, r) \times \{0\}$, which has $\mathcal{H}^1$-measure $r$. The density is:
\begin{align*}
\Theta^{*1}(\mathcal{H}^1 \lfloor E, x) = \limsup_{r \to 0} \frac{r}{2r} = \frac{1}{2}.
\end{align*}
The density is $1/2$ at the endpoint, reflecting the fact that the line segment fills only half of each small ball near the boundary.
[/example]
This example shows why the upper bound of $1$ is tight and why endpoint effects produce densities of $1/2$. For a smooth $1$-dimensional submanifold in $\mathbb{R}^n$, the density is identically $1$ at all interior points, and this characterizes rectifiability in the key structure theorem (Preiss's theorem, Chapter 10).
[example: Density of the Cantor Set at Its Own Points]
Return to the Cantor set $C$ with $s = \log 2 / \log 3$. At $x \in C$, choose $r = 3^{-k}$ and let $I_{k,j}$ be the level-$k$ interval containing $x$. The ball $B(x, 3^{-k})$ contains $I_{k,j}$ except possibly for one endpoint, and it meets the rest of $C_k$ (the stage-$k$ approximation) in at most a single point: the sibling of $I_{k,j}$ is separated from it by a gap of length exactly $3^{-k}$, and every other level-$k$ interval by a gap of length at least $3^{-(k-1)}$. The $\mathcal{H}^s$-measure of one level-$k$ piece is:
\begin{align*}
\mathcal{H}^s(C \cap I_{k,j}) = \mathcal{H}^s(C) \cdot 2^{-k},
\end{align*}
since the self-similar structure of $C$ distributes $\mathcal{H}^s(C)$ equally among the $2^k$ intervals at level $k$. (This follows from the scaling property $\mathcal{H}^s(\lambda E) = \lambda^s \mathcal{H}^s(E)$: the piece $C \cap I_{k,j}$ is a copy of $C$ scaled by $3^{-k}$, and $3^{-ks} = 2^{-k}$.) Single points are $\mathcal{H}^s$-null, so:
\begin{align*}
\mathcal{H}^s(C \cap B(x, 3^{-k})) = \mathcal{H}^s(C) \cdot 2^{-k}.
\end{align*}
The density ratio along this sequence of radii is:
\begin{align*}
\frac{\mathcal{H}^s(C \cap B(x, 3^{-k}))}{\alpha(s) (3^{-k})^s} = \frac{\mathcal{H}^s(C) \cdot 2^{-k}}{\alpha(s) \cdot 3^{-ks}} = \frac{\mathcal{H}^s(C)}{\alpha(s)} \cdot \frac{2^{-k}}{2^{-k}} = \frac{\mathcal{H}^s(C)}{\alpha(s)},
\end{align*}
using $3^{ks} = 2^k$ (since $3^s = 2$). For $3^{-(k+1)} \leq r < 3^{-k}$, monotonicity in $r$ shows that the ratio lies between one half and twice this value. The ratio is therefore bounded and bounded away from zero, confirming $0 < \Theta^{*s}(\mathcal{H}^s \lfloor C, x) < \infty$, consistent with the density bounds.
The density does not exist as a limit at $\mathcal{H}^s$-almost every point of $C$: by Marstrand's theorem, the $s$-density of a set of positive finite $\mathcal{H}^s$-measure can exist on a set of positive measure only when $s$ is an integer. Rectifiability is an integer-dimensional notion, and this failure of the density to exist is its fractal counterpart.
[/example]
The contrast between the line segment (density $= 1$) and the Cantor set (density ratio bounded but without a limit) is not a coincidence. For integer $s$ and sets of finite $\mathcal{H}^s$-measure, density $\Theta^s = 1$ at $\mathcal{H}^s$-almost every point is equivalent to $s$-rectifiability (the structure theory culminating in Preiss's theorem). For non-integer $s$, such as $s = \log 2/\log 3$, the density cannot exist on a set of positive measure, which corresponds to the non-rectifiable, fractal character of $C$.
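The oscillation of the density ratio can be observed exactly at the point $x = 0 \in C$. The sketch below works with the Cantor measure $\mu$ rather than $\mathcal{H}^s \lfloor C$, assuming (as in the self-similarity argument above) that the two differ by a constant factor, and it uses the ratio $\mu(B(0,r))/(2r)^s$. The function names are illustrative, and $x = 0$ is a single endpoint of $C$, so this shows the mechanism rather than the almost-everywhere behaviour.

```python
from fractions import Fraction
import math

S = math.log(2) / math.log(3)  # Hausdorff dimension of the Cantor set


def cantor_mass_from_zero(r, depth=60):
    """mu(B(0, r)) = F(r) for r > 0, with F the Cantor function (ternary digits)."""
    if r >= 1:
        return Fraction(1)
    x, value, weight = Fraction(r), Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)
        x -= digit
        if digit == 1:
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2
    return value


def density_ratio_at_zero(r):
    """mu(B(0, r)) / (2 r)^s as a float."""
    return float(cantor_mass_from_zero(r)) / (2 * float(r)) ** S


for k in range(1, 6):
    a = density_ratio_at_zero(Fraction(1, 3 ** k))  # radii 3^-k
    b = density_ratio_at_zero(Fraction(2, 3 ** k))  # radii 2 * 3^-k
    print(k, a, b)
```

Along $r = 3^{-k}$ the ratio equals $2^{-s}$, while along $r = 2 \cdot 3^{-k}$ it equals $4^{-s}$, so the ratio has no limit as $r \to 0$.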
## The Hausdorff Measure of the Unit Sphere
A fundamental calibration example: the sphere $\partial B(0,1) = \{x \in \mathbb{R}^n : |x| = 1\}$ should have $(n-1)$-dimensional Hausdorff measure equal to its classical surface area. We verify this and use it to derive the polar coordinate formula, which connects $\mathcal{H}^{n-1}$ on spheres to $\mathcal{L}^n$ on $\mathbb{R}^n$.
[example: Hausdorff Measure of $\partial B(0,1)$]
Since $\partial B(0,1)$ is a smooth $(n-1)$-dimensional submanifold of $\mathbb{R}^n$, general theory (the area formula for smooth immersions, GMT II) implies that $\mathcal{H}^{n-1}(\partial B(0,1))$ agrees with the classical surface area of the sphere. Here we compute the value directly using the polar coordinate formula.
Let $\omega_n = \mathcal{L}^n(B(0,1))$ be the volume of the unit ball. The standard formula for $\omega_n$ is:
\begin{align*}
\omega_n = \frac{\pi^{n/2}}{\Gamma(n/2 + 1)}.
\end{align*}
Consider the Gaussian integral $\int_{\mathbb{R}^n} e^{-|x|^2} \, d\mathcal{L}^n(x)$. By Fubini (Chapter 3), this factors as a product of one-dimensional Gaussians:
\begin{align*}
\int_{\mathbb{R}^n} e^{-|x|^2} \, d\mathcal{L}^n(x) = \left(\int_{-\infty}^\infty e^{-t^2} \, d\mathcal{L}^1(t)\right)^n = \pi^{n/2}.
\end{align*}
On the other hand, the polar coordinate formula (derived from the coarea formula for the map $x \mapsto |x|$, or directly from the scaling $\mathcal{L}^n(\lambda E) = \lambda^n \mathcal{L}^n(E)$) gives:
\begin{align*}
\int_{\mathbb{R}^n} e^{-|x|^2} \, d\mathcal{L}^n(x) = \int_0^\infty e^{-r^2} \mathcal{H}^{n-1}(\partial B(0,r)) \, d\mathcal{L}^1(r),
\end{align*}
where $\partial B(0,r) = \{x \in \mathbb{R}^n : |x| = r\}$. Dilation by $r$ is Lipschitz with constant $r$, and its inverse is Lipschitz with constant $1/r$, so applying the Lipschitz estimate for Hausdorff measure in both directions gives $\mathcal{H}^{n-1}(r \cdot E) = r^{n-1} \mathcal{H}^{n-1}(E)$ for any $E$. In particular $\mathcal{H}^{n-1}(\partial B(0,r)) = r^{n-1} \mathcal{H}^{n-1}(\partial B(0,1))$. Setting $c_n = \mathcal{H}^{n-1}(\partial B(0,1))$:
\begin{align*}
\int_{\mathbb{R}^n} e^{-|x|^2} \, d\mathcal{L}^n(x) = c_n \int_0^\infty r^{n-1} e^{-r^2} \, d\mathcal{L}^1(r).
\end{align*}
Substituting $u = r^2$, so $r = u^{1/2}$ and $dr = \frac{1}{2} u^{-1/2} \, du$:
\begin{align*}
\int_0^\infty r^{n-1} e^{-r^2} \, d\mathcal{L}^1(r) = \int_0^\infty u^{(n-1)/2} e^{-u} \cdot \frac{1}{2} u^{-1/2} \, du = \frac{1}{2} \int_0^\infty u^{n/2 - 1} e^{-u} \, du = \frac{1}{2} \Gamma\!\left(\frac{n}{2}\right).
\end{align*}
Equating the two expressions:
\begin{align*}
\pi^{n/2} = c_n \cdot \frac{1}{2} \Gamma\!\left(\frac{n}{2}\right),
\end{align*}
so
\begin{align*}
c_n = \mathcal{H}^{n-1}(\partial B(0,1)) = \frac{2\pi^{n/2}}{\Gamma(n/2)}.
\end{align*}
For $n = 2$: $\mathcal{H}^1(S^1) = 2\pi^1/\Gamma(1) = 2\pi$. For $n = 3$: $\mathcal{H}^2(S^2) = 2\pi^{3/2}/\Gamma(3/2) = 2\pi^{3/2}/(\frac{\sqrt{\pi}}{2}) = 4\pi$. Both match classical formulas. In general $c_n = n\omega_n$, because $\Gamma(n/2 + 1) = (n/2)\,\Gamma(n/2)$.
[/example]
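The closed formulas are easy to check numerically. The sketch below is illustrative only; the function names, the integration cutoff `r_max`, and the midpoint rule are assumptions made here. It confirms the values for $n = 2, 3$, the relation $c_n = n\omega_n$, and the identity $c_n \int_0^\infty r^{n-1} e^{-r^2}\,dr = \pi^{n/2}$ used in the derivation.

```python
import math


def sphere_area(n):
    """H^{n-1} measure of the unit sphere in R^n: 2 pi^(n/2) / Gamma(n/2)."""
    return 2 * math.pi ** (n / 2) / math.gamma(n / 2)


def ball_volume(n):
    """Lebesgue measure of the unit ball in R^n: pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def radial_gaussian_integral(n, r_max=12.0, steps=100_000):
    """Midpoint rule for the integral of r^(n-1) exp(-r^2) over [0, r_max]."""
    h = r_max / steps
    total = 0.0
    for j in range(steps):
        r = (j + 0.5) * h
        total += r ** (n - 1) * math.exp(-r * r)
    return h * total


for n in range(1, 6):
    lhs = sphere_area(n) * radial_gaussian_integral(n)
    print(n, sphere_area(n), n * ball_volume(n), lhs, math.pi ** (n / 2))
```

For $n = 1$ the "sphere" is the two-point set $\{-1, 1\}$, and the formula returns $2$, the counting measure of that set.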
[remark: The Polar Coordinate Formula]
The computation above used the radial case of the polar coordinate formula. The general statement is:
\begin{align*}
\int_{\mathbb{R}^n} f(x) \, d\mathcal{L}^n(x) = \int_0^\infty \left(\int_{\partial B(0,1)} f(r\theta) \, d\mathcal{H}^{n-1}(\theta)\right) r^{n-1} \, d\mathcal{L}^1(r)
\end{align*}
for any non-negative measurable $f: \mathbb{R}^n \to [0, \infty]$. The factor $r^{n-1}$ comes from the scaling of $\mathcal{H}^{n-1}$ on spheres: $\mathcal{H}^{n-1}(\partial B(0,r)) = r^{n-1} \mathcal{H}^{n-1}(\partial B(0,1))$. This formula is the coarea formula for the Lipschitz map $x \mapsto |x|$ and will be generalized significantly in GMT II.
[/remark]
## Countable Stability and the Dimension of Countable Sets
Hausdorff dimension has a fundamental property that distinguishes it from Minkowski (box-counting) dimension: it is countably stable. This single property, combined with the fact that points have dimension $0$, immediately implies that all countable sets have dimension $0$. But the converse fails: dimension-zero sets can be uncountable, as the Liouville example shows. Here we work out the countable stability property explicitly and use it to compute dimensions of several sets by decomposition.
[quotetheorem:3009]
The inequality $\dim_{\mathcal{H}}(\bigcup_j E_j) \geq \sup_j \dim_{\mathcal{H}}(E_j)$ is immediate from monotonicity. The other direction uses: if $t > \dim_{\mathcal{H}}(E_j)$ for every $j$, then $\mathcal{H}^t(E_j) = 0$ for every $j$, and countable subadditivity gives $\mathcal{H}^t(\bigcup_j E_j) \leq \sum_j \mathcal{H}^t(E_j) = 0$. So $\dim_{\mathcal{H}}(\bigcup_j E_j) \leq t$ for all $t > \sup_j \dim_{\mathcal{H}}(E_j)$, giving $\leq$.
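This property fails for Minkowski dimension: the set $\{1/n : n \geq 1\} \cup \{0\}$ is countable (hence of Hausdorff dimension $0$), but its Minkowski dimension is $1/2$. At scale $r$, the points $1/n$ with $n \lesssim r^{-1/2}$ are separated by gaps $1/(n(n+1)) > r$, so each needs its own interval of length $r$, while the remaining points lie in $[0, r^{1/2}]$ and need at most about $r^{1/2}/r = r^{-1/2}$ further intervals. The covering number is therefore comparable to $r^{-1/2}$, which gives box-counting dimension $1/2$.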
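The covering count can be computed directly. The sketch below is a minimal illustration with assumed names: it finds the minimal number $N(r)$ of intervals of length $r$ covering $\{1/n\} \cup \{0\}$ by a greedy sweep from the right, then reports $\log N(r) / \log(1/r)$, which approaches $1/2$ slowly as $r \to 0$.

```python
import math


def covering_number(r):
    """Minimal number of intervals of length r covering {1/n : n >= 1} and 0.

    Greedy sweep from the right: each new interval has its right end at the
    largest point not yet covered, which is optimal on the real line.
    """
    count = 0
    left_edge = math.inf  # left end of the most recent interval
    n_max = math.floor(1 / r)  # points 1/n with n > n_max lie in [0, r]
    for n in range(1, n_max + 1):
        point = 1 / n
        if point < left_edge:
            count += 1
            left_edge = point - r
    if left_edge > 0:
        count += 1  # one more interval covers 0 and all remaining points
    return count


for r in (1e-2, 1e-4, 1e-6):
    n_r = covering_number(r)
    print(r, n_r, math.log(n_r) / math.log(1 / r))
```

The convergence is logarithmic because $N(r)$ lies between $r^{-1/2}$ and roughly $2r^{-1/2}$, and the constant factor only disappears in the limit.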
[example: Dimension of a Countable Dense Set]
Let $D = \mathbb{Q}^n \subset \mathbb{R}^n$ (the rational points). Since $D$ is countable, we can write $D = \{q_1, q_2, q_3, \ldots\}$. Each singleton $\{q_j\}$ satisfies $\dim_{\mathcal{H}}(\{q_j\}) = 0$ (a single point has $\mathcal{H}^s(\{q_j\}) = 0$ for all $s > 0$ since it can be covered by a ball of arbitrarily small diameter, giving diameter cost $\to 0$). By countable stability:
\begin{align*}
\dim_{\mathcal{H}}(\mathbb{Q}^n) = \sup_{j \geq 1} \dim_{\mathcal{H}}(\{q_j\}) = \sup_{j \geq 1} 0 = 0.
\end{align*}
Despite being dense in $\mathbb{R}^n$, the rationals have Hausdorff dimension $0$. Density is a topological property; Hausdorff dimension is a metric-measure property. They do not interact.
[/example]
[example: Dimension of a Union with Different Dimensions]
Let $E_1 = C \subset \mathbb{R}$ (the Cantor set, $\dim_{\mathcal{H}} = \log 2/\log 3$), $E_2 = [0,1] \times \{0\} \subset \mathbb{R}^2$ (a line segment, $\dim_{\mathcal{H}} = 1$), and $E_3 = [0,1]^2 \subset \mathbb{R}^2$ (a square, $\dim_{\mathcal{H}} = 2$). Consider $E = (E_3 \times \{0\}) \cup (E_2 \times \{2\}) \cup (E_1 \times \{(0, 4)\}) \subset \mathbb{R}^3$: three geometric objects stacked at heights $0$, $2$, and $4$. Each piece is an isometric copy of the original set, so its Hausdorff dimension is unchanged. By countable stability (with just three terms):
\begin{align*}
\dim_{\mathcal{H}}(E) = \max\!\left(\dim_{\mathcal{H}}(E_3), \dim_{\mathcal{H}}(E_2), \dim_{\mathcal{H}}(E_1)\right) = \max\!\left(2, 1, \frac{\log 2}{\log 3}\right) = 2.
\end{align*}
The square dominates. Adding lower-dimensional pieces does not change the Hausdorff dimension of the union.
[/example]
This example illustrates a general principle: the Hausdorff dimension of a finite or countable union is determined by the piece of highest dimension. This is why, in geometric measure theory, one can decompose a complicated set into pieces and compute the dimension of each piece separately — the dimension of the whole is just the maximum over pieces.
## Summary: What the Computations Reveal
Looking back across the six computational sections, several themes emerge that will be central to the sequel courses on GMT.
The first theme is the role of self-similarity. The Cantor set computation worked because $C$ has a precise self-similar structure: $2$ copies at scale $1/3$. The dimension $\log 2/\log 3$ is determined exactly by this data. General fractals with exact self-similarity have dimensions computed by analogous formulas (solutions to Moran's equation), and the mass distribution principle provides the lower bound in every case.
The second theme is the contrast between topological and measure-theoretic notions of size. The Liouville numbers are topologically large (residual, dense) but measure-theoretically negligible (dimension zero). Countable sets can have positive Minkowski dimension. These failures of intuition are not pathological exceptions — they reflect genuine distinctions between different mathematical notions of size that the theory is designed to keep separate.
The third theme is the power of covering lemmas as a unifying tool. The Vitali lemma appeared in the Lebesgue density theorem with a $5\times$ dilation. The mass distribution principle is a dual covering argument: instead of extracting a subcover from a given collection, it says that any cover must have high total cost if the measure is spread out. Both arguments convert local information (what happens in a ball of radius $r$) into global, almost-everywhere conclusions.
The fourth theme is calibration: Hausdorff measure is designed so that $\mathcal{H}^n = \mathcal{L}^n$ on $\mathbb{R}^n$. The sphere computation showed that this calibration propagates correctly to submanifolds, with $\mathcal{H}^{n-1}(\partial B(0,1)) = 2\pi^{n/2}/\Gamma(n/2)$ matching the classical surface area formula. In the sequel courses, this calibration underpins the area and coarea formulas, which are the main tools for computing Hausdorff measure of images of Lipschitz maps.
## References
- L. C. Evans and R. F. Gariepy, *Measure Theory and Fine Properties of Functions*, Revised Edition, CRC Press, 2015. Chapters 1-2.
- P. Mattila, *Geometry of Sets and Measures in Euclidean Spaces*, Cambridge University Press, 1995.
- L. Simon, *Lectures on Geometric Measure Theory*, Proceedings of the Centre for Mathematical Analysis, Australian National University, 1983.
Geometric Measure Theory I: Measures and Hausdorff Dimension
Created by admin on 4/30/2026 | Last updated on 4/30/2026