This course extends the measure-theoretic foundations of Geometric Measure Theory I and II to study functions and sets with minimal regularity assumptions. Building on the Hausdorff measures and coarea formulas developed in earlier volumes, we investigate Sobolev spaces and BV (bounded variation) functions: classes where classical derivatives need not exist, yet a meaningful notion of weak differentiability and integration by parts persists. The central achievement is De Giorgi's structure theorem for sets of finite perimeter, which reveals that the perimeter measure, though defined variationally as the total variation of the distributional gradient of the indicator function, coincides with the $(n-1)$-dimensional Hausdorff measure restricted to the reduced boundary. This confluence of analytic and geometric perspectives culminates in the Gauss-Green theorem for non-smooth domains, extending the fundamental theorem of calculus to sets that merely have finite perimeter.
The course is organized in three natural phases. Chapters 1–7 develop Sobolev space theory, introducing weak derivatives, approximation by smooth functions, trace operators, extension theorems, and Sobolev inequalities that govern the interplay between integrability and regularity. This framework provides the analytic machinery needed for the study of lower-regularity objects. Chapters 8–11 parallel this development for BV functions: we define BV as the space of $L^1$ functions whose distributional derivatives are finite Radon measures, establish the structure theorem showing that the derivative of any BV function decomposes into an absolutely continuous part, a jump part, and a Cantor part, prove approximation and compactness results, and derive the coarea formula in the BV setting. Chapters 12–14 form the geometric core, where De Giorgi's theorem appears: a set of finite perimeter has a *reduced boundary* that is rectifiable and of finite $\mathcal{H}^{n-1}$ measure, and the perimeter equals the $(n-1)$-dimensional Hausdorff measure of this reduced boundary. The Gauss-Green theorem follows, allowing divergence identities to hold for such rough domains. Finally, Chapter 15 grounds this abstract theory with concrete examples (explicit computations for fractals, corners, and cusps) and worked problems that reinforce the interplay between analytic regularity and geometric structure.
Throughout, a unifying theme emerges: regularity and measure are not fixed properties but *learnable* from the behavior of the function or set itself. The machinery of weak derivatives and variational characterizations permits rigorous analysis of objects far rougher than classical smooth manifolds, yet rich enough to support integration by parts and related geometric theorems. This volume solidifies the transition from smooth differential geometry to the rectifiable geometry of rough sets, equipping students with tools to work effectively in geometric analysis, partial differential equations, and the calculus of variations.
# 1. Sobolev Spaces: Definitions and Basic Properties
*This chapter lays the functional-analytic foundation for the entire course. Classical derivatives are too rigid for variational problems and PDEs: solutions need not be smooth, yet we require some notion of differentiation to make sense of equations and energy functionals. The weak derivative — defined by integration by parts against test functions — extends classical differentiation to $L^p$ functions, and the Sobolev spaces $W^{1,p}(\Omega)$ collect those $L^p$ functions whose weak gradient is again $L^p$. These spaces are the natural setting for existence theory via the direct method, and their analysis here will be used in every subsequent chapter, culminating in the BV spaces of Chapter 8.*
## Weak Derivatives and the Need for Generalised Differentiation
Classical analysis demands pointwise derivatives, but variational problems naturally produce functions that fail to be classically differentiable at many points. A striking example is the absolute value function $u(x) = |x|$ on $\mathbb{R}$: it is Lipschitz continuous but not differentiable at the origin. Despite this, it has a perfectly well-defined derivative almost everywhere — the sign function — and this derivative captures all the information needed for integration and variational analysis. The theory of weak derivatives formalises this idea by insisting only that integration by parts holds, rather than demanding a pointwise limit.
[definition: Weak Derivative]
Let $\Omega \subset \mathbb{R}^n$ be open, let $u \in L^1_{\mathrm{loc}}(\Omega)$, and let $\alpha = (\alpha_1, \dots, \alpha_n)$ be a multi-index with $|\alpha| = \alpha_1 + \cdots + \alpha_n$. A function $v \in L^1_{\mathrm{loc}}(\Omega)$ is the **$\alpha$-th weak derivative** of $u$, written $v = D^\alpha u$, if
\begin{align*}
\int_\Omega u \, D^\alpha \phi \, d\mathcal{L}^n = (-1)^{|\alpha|} \int_\Omega v \, \phi \, d\mathcal{L}^n
\end{align*}
for every test function $\phi \in C_c^\infty(\Omega)$.
[/definition]
The formula is simply integration by parts, with the boundary terms absent because $\phi$ has compact support. This observation immediately shows that if $u$ is classically $C^{|\alpha|}$, then its classical $\alpha$-th derivative is also its weak derivative — the two notions are consistent.
The weak derivative, when it exists, is unique up to sets of $\mathcal{L}^n$-measure zero. To see why, suppose both $v$ and $w$ satisfy the integration-by-parts identity. Then $\int_\Omega (v - w) \phi \, d\mathcal{L}^n = 0$ for all $\phi \in C_c^\infty(\Omega)$, which forces $v = w$ $\mathcal{L}^n$-a.e. by the fundamental lemma of the calculus of variations.
<!-- illustration-needed: 1D plot of |x| on [-1,1] showing the kink at the origin, alongside its weak derivative (sign function) — emphasises the contrast between classical and weak differentiability -->
[example: Weak Derivative of $|x|$]
Consider $u(x) = |x|$ on $\Omega = (-1, 1) \subset \mathbb{R}$. Define $v(x) = \operatorname{sgn}(x)$, i.e., $v(x) = 1$ for $x > 0$ and $v(x) = -1$ for $x < 0$. For any $\phi \in C_c^\infty(-1, 1)$,
\begin{align*}
\int_{-1}^1 |x| \, \phi'(x) \, d\mathcal{L}^1 &= \int_{-1}^0 (-x) \phi'(x) \, d\mathcal{L}^1 + \int_0^1 x \, \phi'(x) \, d\mathcal{L}^1.
\end{align*}
Integrating each piece by parts:
\begin{align*}
\int_{-1}^0 (-x) \phi'(x) \, d\mathcal{L}^1 &= \left[(-x)\phi(x)\right]_{-1}^0 + \int_{-1}^0 \phi(x) \, d\mathcal{L}^1 = 0 - \phi(-1) + \int_{-1}^0 \phi(x) \, d\mathcal{L}^1 = \int_{-1}^0 \phi(x) \, d\mathcal{L}^1, \\
\int_0^1 x \, \phi'(x) \, d\mathcal{L}^1 &= \left[x\phi(x)\right]_0^1 - \int_0^1 \phi(x) \, d\mathcal{L}^1 = -\int_0^1 \phi(x) \, d\mathcal{L}^1.
\end{align*}
The boundary terms at $x = \pm 1$ vanish because $\phi$ has compact support in $(-1,1)$, so $\phi(\pm 1) = 0$. The boundary terms at $x = 0$ vanish as well, because both $(-x)\phi(x)$ and $x\phi(x)$ carry the factor $x$ and are therefore zero at the origin. Adding the remaining integrals: $\int_{-1}^1 |x| \phi'(x) \, d\mathcal{L}^1 = \int_{-1}^0 \phi \, d\mathcal{L}^1 - \int_0^1 \phi \, d\mathcal{L}^1 = -\int_{-1}^1 \operatorname{sgn}(x)\, \phi(x) \, d\mathcal{L}^1$. Thus $D^1 u = \operatorname{sgn}(x)$, and this weak derivative exists despite $u$ not being classically differentiable at $0$.
[/example]
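The identity in this example can also be checked numerically. The following Python sketch is an illustration only and not part of the formal development: the test function `phi` (a shifted and rescaled standard bump), the helper names, and the midpoint quadrature are all choices made here. A non-symmetric `phi` is used so that both sides of the identity are nonzero.

```python
import math

CENTER, RADIUS = 0.3, 0.6  # phi is supported in (-0.3, 0.9), inside (-1, 1)

def bump(t):
    """Standard bump exp(-1/(1 - t^2)) on (-1, 1), extended by zero."""
    d = 1.0 - t * t
    return math.exp(-1.0 / d) if d > 0.0 else 0.0

def phi(x):
    """Smooth, compactly supported test function (an illustrative choice)."""
    return bump((x - CENTER) / RADIUS)

def phi_prime(x):
    """Classical derivative of phi, computed with the chain rule."""
    t = (x - CENTER) / RADIUS
    d = 1.0 - t * t
    if d <= 0.0:
        return 0.0
    return math.exp(-1.0 / d) * (-2.0 * t) / (d * d) / RADIUS

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def sgn(x):
    return 1.0 if x > 0 else -1.0

# Left side: integral of |x| phi'(x). Right side: minus the integral of sgn(x) phi(x).
lhs = midpoint(lambda x: abs(x) * phi_prime(x), -1.0, 1.0)
rhs = -midpoint(lambda x: sgn(x) * phi(x), -1.0, 1.0)
print(f"lhs = {lhs:.6f}, rhs = {rhs:.6f}")
```

With an even number of cells the kink at $x = 0$ falls on a cell boundary, so the quadrature never samples the point where $\operatorname{sgn}$ is undefined. Agreement for one test function is of course only a sanity check; the proof is the integration by parts above.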
The example above deals only with first-order differentiation, but the definition extends naturally to all multi-indices. Before moving on, it is worth recording what higher-order weak derivatives look like and how they relate to membership in Sobolev spaces.
[remark: Higher-Order Weak Derivatives]
The definition extends naturally to all orders. For $u \in L^1_{\mathrm{loc}}(\Omega)$, the weak derivative $D^\alpha u$ is the weak partial derivative of order $|\alpha|$. When $|\alpha| = 1$ and we consider all first-order derivatives, we write the weak gradient as $\nabla u = (D_{e_1} u, \dots, D_{e_n} u)$, where $D_{e_i} u$ is the weak partial derivative in the $i$-th coordinate direction. The weak gradient, when it exists as an $L^p$ function, is what defines membership in the Sobolev space $W^{1,p}$.
[/remark]
Not every function has a weak derivative. The key obstruction is a jump discontinuity across a set of positive $(n-1)$-dimensional measure, which prevents integration by parts from holding with an $L^1_{\mathrm{loc}}$ function on the right-hand side. For example, the [Heaviside function](/page/Heaviside%20Function) $u = \mathbb{1}_{(0,1)}$ on $(-1,1)$ does not have a weak derivative in $L^1_{\mathrm{loc}}$. If $v$ were such a derivative, we would have $\int \mathbb{1}_{(0,1)} \phi' \, d\mathcal{L}^1 = -\int v\phi \, d\mathcal{L}^1$ for all $\phi \in C_c^\infty(-1,1)$. The left side equals $\phi(1) - \phi(0) = -\phi(0)$ (since $\phi(1)=0$), so $\int v\phi \, d\mathcal{L}^1 = \phi(0)$ for every test function. Testing with $\phi$ supported away from $0$ forces $v = 0$ $\mathcal{L}^1$-a.e., which contradicts the identity for any $\phi$ with $\phi(0) \ne 0$. In the language of distributions, the derivative of $u$ is the Dirac delta at $0$, which is not representable by any $L^1_{\mathrm{loc}}$ function.
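A small numerical experiment makes the obstruction visible. The sketch below (Python, with the hypothetical helper `heaviside_pairing` and the test functions $\phi_r(x) = \texttt{bump}(x/r)$ chosen purely for illustration) evaluates $\int \mathbb{1}_{(0,1)} \phi_r' \, d\mathcal{L}^1$ for shrinking $r$. The pairing stays at $-\phi_r(0) = -e^{-1}$, whereas for any fixed $v \in L^1_{\mathrm{loc}}$ the bound $|\int v \phi_r| \le e^{-1} \int_{(-r,r)} |v|$ would force the right-hand side to tend to zero.

```python
import math

def bump(t):
    """Standard bump exp(-1/(1 - t^2)) on (-1, 1), extended by zero."""
    d = 1.0 - t * t
    return math.exp(-1.0 / d) if d > 0.0 else 0.0

def bump_prime(t):
    """Classical derivative of bump."""
    d = 1.0 - t * t
    if d <= 0.0:
        return 0.0
    return math.exp(-1.0 / d) * (-2.0 * t) / (d * d)

def heaviside_pairing(r, n=20000):
    """Midpoint approximation of the integral over (0, 1) of phi_r'(x),
    where phi_r(x) = bump(x / r) is supported in (-r, r) with r < 1."""
    h = 1.0 / n
    return h * sum(bump_prime((i + 0.5) * h / r) / r for i in range(n))

for r in (0.5, 0.1, 0.02):
    print(f"r = {r}: pairing = {heaviside_pairing(r):.6f}, "
          f"-phi_r(0) = {-bump(0.0):.6f}")
```

The printed pairing does not decay as the support of $\phi_r$ shrinks, which is exactly the behaviour of a point mass at the origin.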
## The Sobolev Space $W^{1,p}(\Omega)$
To see why $L^p$ alone is insufficient, consider minimising an energy like $\int_\Omega |\nabla u|^p \, d\mathcal{L}^n$ over $L^p(\Omega)$. A minimising sequence might converge in $L^p$ to some limit $u$, but we have no guarantee that the limit has a gradient at all — the gradients of the approximating sequence might diverge, or the limit might be a function whose gradient exists only in a distributional sense outside $L^p$. What is needed is a space that keeps $u$ and its gradient $\nabla u$ together and treats both as $L^p$ objects simultaneously. The Sobolev space $W^{1,p}(\Omega)$ does exactly this.
[definition: Sobolev Space $W^{1,p}$]
Let $\Omega \subset \mathbb{R}^n$ be open and let $1 \le p \le \infty$. The **Sobolev space** $W^{1,p}(\Omega)$ is
\begin{align*}
W^{1,p}(\Omega) := \{ u \in L^p(\Omega) : D_{e_i} u \in L^p(\Omega) \text{ for } i = 1, \dots, n \},
\end{align*}
equipped with the norm
\begin{align*}
\|u\|_{W^{1,p}(\Omega)} := \|u\|_{L^p(\Omega)} + \|\nabla u\|_{L^p(\Omega)},
\end{align*}
where $\|\nabla u\|_{L^p(\Omega)}^p := \sum_{i=1}^n \|D_{e_i} u\|_{L^p(\Omega)}^p$ for $p < \infty$, and $\|\nabla u\|_{L^\infty(\Omega)} := \max_i \|D_{e_i} u\|_{L^\infty(\Omega)}$.
[/definition]
The norm bundles the $L^p$ size of $u$ and its weak gradient into a single quantity; convergence in $W^{1,p}$ means convergence of both the function and its gradient simultaneously. **How to verify membership in practice:** to show $u \in W^{1,p}(\Omega)$, one checks (i) $u \in L^p(\Omega)$, (ii) computes the candidate weak gradient $v$ from classical formulas (wherever $u$ is smooth), then (iii) verifies $\int_\Omega u \, \partial_i \phi \, d\mathcal{L}^n = -\int_\Omega v_i \phi \, d\mathcal{L}^n$ for all $\phi \in C_c^\infty(\Omega)$. A common pitfall: the classical derivative computed pointwise a.e. is not always the weak derivative — the Cantor staircase (discussed below) has classical derivative zero a.e. yet is not in $W^{1,1}$, because the integration-by-parts identity fails.
[remark: Equivalent Norms]
The norm $\|u\|_{W^{1,p}} = \|u\|_{L^p} + \|\nabla u\|_{L^p}$ is equivalent to $(\|u\|_{L^p}^p + \|\nabla u\|_{L^p}^p)^{1/p}$ for $1 \le p < \infty$, and both are standard in the literature. We use the additive form throughout, following Evans–Gariepy. The choice does not affect completeness or the topology of the space.
[/remark]
When $p = 2$, the Sobolev space $W^{1,2}(\Omega)$ carries a natural inner product
\begin{align*}
(u, v)_{H^1} := (u, v)_{L^2} + (\nabla u, \nabla v)_{L^2} = \int_\Omega u v \, d\mathcal{L}^n + \sum_{i=1}^n \int_\Omega D_{e_i} u \, D_{e_i} v \, d\mathcal{L}^n,
\end{align*}
and we write $H^1(\Omega) := W^{1,2}(\Omega)$ to emphasise this Hilbert space structure. The special role of $H^1$ in PDE theory (Lax–Milgram, spectral theory) makes this notation standard.
## Functional-Analytic Properties
To apply compactness arguments and existence theorems in PDE theory, one needs more than just a norm: one needs the topology of $W^{1,p}$ to behave well with respect to sequences. The central difficulty is this: a minimising sequence $(u_k)$ for a coercive energy functional is bounded in $W^{1,p}$, but boundedness in $L^p$ alone does not allow one to extract a limit that inherits any gradient information. Working in $C^1(\Omega)$ with the $C^1$ norm does not help either. The functions $u_k(x) = \sin(kx)/k$ are bounded in $C^1$ and converge uniformly to zero, but their derivatives $\cos(kx)$ oscillate ever faster and have no uniformly convergent subsequence, so no subsequence converges in $C^1$. In $W^{1,p}$ on a bounded interval with $1 < p < \infty$, the same sequence is bounded and converges weakly to zero: $u_k \to 0$ in $L^p$ and $\cos(kx) \rightharpoonup 0$ weakly in $L^p$. The Sobolev norm tracks function and gradient simultaneously, and the Banach space structure allows weak compactness (for $1 < p < \infty$) to extract a subsequence whose limit inherits the gradient bound.
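The behaviour of $u_k(x) = \sin(kx)/k$ can be tabulated numerically. The following Python sketch assumes the interval $(0, 2\pi)$, $p = 2$, and the fixed function $g(x) = x^2$ as a stand-in for an arbitrary $L^2$ function to pair against; all three are illustrative choices, as are the function names. For integer $k$ one has $\|u_k\|_{L^2}^2 = \pi/k^2$, $\|u_k'\|_{L^2}^2 = \pi$ and $\int_0^{2\pi} \cos(kx)\, x^2 \, dx = 4\pi/k^2$, so the gradients stay bounded without converging strongly, while their pairing with $g$ tends to zero.

```python
import math

A, B = 0.0, 2.0 * math.pi  # illustrative interval

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def l2_norm_sq_u(k):
    """Squared L^2 norm of u_k(x) = sin(kx)/k."""
    return midpoint(lambda x: (math.sin(k * x) / k) ** 2, A, B)

def l2_norm_sq_du(k):
    """Squared L^2 norm of the derivative u_k'(x) = cos(kx)."""
    return midpoint(lambda x: math.cos(k * x) ** 2, A, B)

def pairing_du(k):
    """Integral of u_k'(x) * g(x) with the fixed function g(x) = x^2."""
    return midpoint(lambda x: math.cos(k * x) * x * x, A, B)

for k in (1, 4, 16):
    print(f"k = {k:2d}: |u_k|^2 = {l2_norm_sq_u(k):.6f}, "
          f"|u_k'|^2 = {l2_norm_sq_du(k):.6f}, <u_k', g> = {pairing_du(k):.6f}")
```

The middle column stays at $\pi$ while the outer two decay like $1/k^2$: this is weak but not strong convergence of the gradients to zero.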
[quotetheorem:3094]
[citeproof:3094]
The restrictions $1 < p < \infty$ in the reflexivity and $1 \le p < \infty$ in the separability clauses are sharp. When $p = 1$, the space $W^{1,1}(\Omega)$ is not reflexive: bounded sequences need not have weakly convergent subsequences in $W^{1,1}$. This is not merely a technicality; it is the reason why the direct method in the calculus of variations fails for $p = 1$ energies. However, bounded sequences in $W^{1,1}$ do have subsequences converging in $L^1_{\mathrm{loc}}$ to a BV function, a member of a broader space that allows measure-valued derivatives; the BV framework is introduced in Chapter 8 and the BV Compactness Theorem is established in Chapter 9. When $p = \infty$, the space $W^{1,\infty}(\Omega)$ is not separable: on an interval, for instance, the functions $u_t(x) = |x - t|$ satisfy $\|u_t' - u_s'\|_{L^\infty} = 2$ whenever $s \ne t$, which gives an uncountable family of pairwise separated elements. Reflexivity fails as well, just as it does for $L^\infty$.
[explanation: Why These Properties Matter]
Reflexivity is the key property needed to extract weakly convergent subsequences from bounded sequences. In the calculus of variations, one minimises an energy functional $I[u] = \int_\Omega L(x, u, \nabla u) \, d\mathcal{L}^n$ over some class $\mathcal{A} \subset W^{1,p}(\Omega)$: one takes a minimising sequence $(u_k)$ with $I[u_k] \to \inf_\mathcal{A} I$, which is bounded in $W^{1,p}$ provided $I$ is coercive, then extracts a weakly convergent subsequence $u_{k_j} \rightharpoonup u$ by reflexivity, and finally shows $I[u] \le \liminf I[u_{k_j}]$ by weak lower semicontinuity. Without the Banach space structure and reflexivity, this entire argument collapses. The Rellich–Kondrachov theorem (Chapter 6) then shows that, on bounded domains with sufficiently regular boundary, a sequence converging weakly in $W^{1,p}$ converges strongly in $L^p$, providing the compactness needed in practice.
[/explanation]
## The Space $W^{1,p}_0(\Omega)$ and Zero Boundary Values
In many PDE problems, one imposes boundary conditions — most commonly the Dirichlet condition that $u$ vanishes on $\partial \Omega$. For smooth functions, this simply means $u|_{\partial \Omega} = 0$. For Sobolev functions, however, the situation is subtler: $\partial\Omega$ has $\mathcal{L}^n$-measure zero, and a Sobolev function is only defined up to $\mathcal{L}^n$-null sets, so the expression "$u = 0$ on $\partial\Omega$" is not literally meaningful for an $L^p$ class. What does it mean for an $L^p$ function to "vanish at the boundary" when the boundary has zero Lebesgue measure? The answer requires a new ingredient — the trace operator — which we construct properly in Chapter 3. For now, we adopt the natural surrogate: a function has zero boundary values if it can be approximated by smooth functions compactly supported in $\Omega$.
[definition: Sobolev Space with Zero Trace $W^{1,p}_0$]
Let $\Omega \subset \mathbb{R}^n$ be open and $1 \le p \le \infty$. The space $W^{1,p}_0(\Omega)$ is the closure of $C_c^\infty(\Omega)$ in $W^{1,p}(\Omega)$:
\begin{align*}
W^{1,p}_0(\Omega) := \overline{C_c^\infty(\Omega)}^{\|\cdot\|_{W^{1,p}(\Omega)}}.
\end{align*}
For $p = 2$, write $H^1_0(\Omega) := W^{1,2}_0(\Omega)$.
[/definition]
Since $W^{1,p}(\Omega)$ is a Banach space, $W^{1,p}_0(\Omega)$ is automatically a closed subspace, hence itself a Banach space. Functions in $W^{1,p}_0(\Omega)$ are those that can be "approximated from the interior" — intuitively, they vanish near $\partial\Omega$ in an averaged sense, even if they cannot be evaluated pointwise there.
The distinction between $W^{1,p}_0$ and $W^{1,p}$ depends on the geometry of $\Omega$:
- On all of $\mathbb{R}^n$ with $1 \le p < \infty$: $W^{1,p}_0(\mathbb{R}^n) = W^{1,p}(\mathbb{R}^n)$, because $\mathbb{R}^n$ has no boundary and any $W^{1,p}$ function can be approximated by functions in $C_c^\infty(\mathbb{R}^n)$: multiply by a smooth cutoff with large support, then mollify. This uses the mollification results of Chapter 2. It is a different statement from the Meyers–Serrin theorem proved there, which concerns density of smooth functions that need not have compact support.
- On a bounded domain $\Omega$: $W^{1,p}_0(\Omega) \subsetneqq W^{1,p}(\Omega)$, since nonzero constant functions lie in $W^{1,p}(\Omega)$ but not in $W^{1,p}_0(\Omega)$. The obstruction is the gradient, not the $L^p$ distance: by the Poincaré inequality, $\|\phi\|_{L^p(\Omega)} \le C(\Omega) \|\nabla \phi\|_{L^p(\Omega)}$ for every $\phi \in C_c^\infty(\Omega)$, so if $\phi_k \to 1$ in $W^{1,p}(\Omega)$ then $\nabla \phi_k \to 0$ in $L^p$, hence $\phi_k \to 0$ in $L^p$, which is incompatible with $\phi_k \to 1$.
The precise connection between $W^{1,p}_0(\Omega)$ and functions vanishing on $\partial\Omega$ is given by the trace theorem (Chapter 3): for sufficiently regular $\Omega$, one can define a bounded linear trace operator $T: W^{1,p}(\Omega) \to L^p(\partial\Omega, \mathcal{H}^{n-1})$, and $W^{1,p}_0(\Omega) = \ker T$. The key point is that $T$ extends the restriction $u|_{\partial\Omega}$ from smooth functions to all of $W^{1,p}$, making the heuristic "zero boundary values" rigorous. Without the trace theorem, the definition via closure of $C_c^\infty$ is the only available substitute, and it already captures the correct geometry: functions in $W^{1,p}_0$ are precisely those for which the trace operator, once defined, returns zero.
## Examples and Basic Identities
A natural question is: which power singularities $|x|^\alpha$ near the origin belong to $W^{1,p}$? The answer reveals exactly how the $L^p$ integrability of both the function and its gradient constrains the strength of allowable singularities, and produces a clean threshold $\alpha > 1 - n/p$ that reappears throughout the theory.
[example: Power Functions on the Unit Ball]
Let $\Omega = B(0,1) \subset \mathbb{R}^n$ with $n \ge 1$, and consider $u(x) = |x|^\alpha$ for $\alpha \in \mathbb{R}$, $\alpha \ne 0$. We determine for which $\alpha$ and $p$ we have $u \in W^{1,p}(\Omega)$.
First, $u \in L^p(\Omega)$ requires $\int_0^1 r^{\alpha p} r^{n-1} \, dr < \infty$, i.e., $\alpha p + n > 0$, which gives $\alpha > -n/p$.
Next, the classical gradient of $u$ for $x \ne 0$ is $\nabla u(x) = \alpha |x|^{\alpha - 2} x$, so $|\nabla u(x)| = |\alpha| |x|^{\alpha - 1}$. For $\alpha > 1$, this extends to a continuous function on all of $\Omega$ (with value $0$ at the origin); for $\alpha = 1$ it is bounded but discontinuous at $0$. In both cases standard arguments show that this classical gradient is also the weak gradient. For $\alpha < 1$, the gradient is singular at $0$; it is locally integrable on $\Omega$ exactly when $\alpha > 1 - n$, and in that range one verifies that it is the weak gradient by excising a small ball $B(0,\delta)$, integrating by parts on $\Omega \setminus B(0,\delta)$, and letting $\delta \to 0$.
The condition $\nabla u \in L^p(\Omega)$ requires $\int_0^1 r^{(\alpha-1)p} r^{n-1} \, dr < \infty$, i.e., $(\alpha-1)p + n > 0$, which gives $\alpha > 1 - n/p$.
Thus, for $1 \le p < \infty$, $u \in W^{1,p}(\Omega)$ if and only if $\alpha > 1 - n/p$ (this condition implies both $\alpha > -n/p$ and $\alpha > 1 - n$). At the threshold $\alpha = 1 - n/p$ the function just barely fails to lie in $W^{1,p}$: the gradient integral diverges logarithmically. When $p < n$ the threshold is negative, so any $\alpha$ with $1 - n/p < \alpha < 0$ gives a function that blows up at the origin but still has a weak gradient in $L^p$.
[/example]
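The threshold can be explored with a few lines of Python. The sketch below evaluates in closed form the truncated radial integral $\int_\delta^1 r^{(\alpha - 1)p + n - 1} \, dr$ that controls $\|\nabla u\|_{L^p}^p$, dropping the constant factor $|\alpha|^p$ times the surface area of the unit sphere. The function names and the sample values $n = 3$, $p = 2$ are assumptions made for illustration.

```python
import math

def threshold(n, p):
    """Critical exponent 1 - n/p from the example (for finite p)."""
    return 1.0 - n / p

def truncated_gradient_integral(alpha, p, n, delta):
    """Closed form of the integral over (delta, 1) of r^((alpha-1)p + n - 1) dr.
    Constant prefactors of the full gradient norm are omitted."""
    s = (alpha - 1.0) * p + n
    if s == 0.0:
        return math.log(1.0 / delta)   # borderline case: logarithmic divergence
    return (1.0 - delta ** s) / s      # bounded as delta -> 0 iff s > 0

n, p = 3, 2  # sample values; the threshold is then 1 - 3/2 = -0.5
for alpha in (-0.25, -0.5, -1.0):
    values = [truncated_gradient_integral(alpha, p, n, d) for d in (1e-3, 1e-6, 1e-12)]
    print(f"alpha = {alpha:5.2f}: " + ", ".join(f"{v:.4g}" for v in values))
```

For $\alpha = -0.25$ the values stabilise near $2$, for $\alpha = -0.5$ they grow like $\log(1/\delta)$, and for $\alpha = -1$ they grow like $1/\delta$, matching the three regimes above, at and below the threshold.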
The threshold $\alpha > 1 - n/p$ settles when a power singularity is integrable enough — both as a function and through its gradient — to live inside $W^{1,p}$. But integrability only tells half the story: even functions that are entirely smooth and bounded can sit inside $W^{1,p}(\Omega)$ while remaining outside the smaller subspace $W^{1,p}_0(\Omega)$ of functions vanishing at the boundary. To see the gap between the two spaces in its starkest form, we turn from singular profiles to the simplest possible Sobolev function on a bounded domain: the constant function $u \equiv 1$. It is as regular as can be, yet it is barred from $W^{1,p}_0(\Omega)$ for a reason that has nothing to do with regularity and everything to do with boundary behaviour.
[example: The Constant Function and the Failure of Zero Trace]
Let $\Omega \subset \mathbb{R}^n$ be a bounded open set and consider $u \equiv 1$ on $\Omega$. Since $u$ is smooth and $\Omega$ has finite measure, $u$ lies in $W^{1,p}(\Omega)$ for all $1 \le p \le \infty$, and its weak gradient is $\nabla u = 0$.
We claim $u \notin W^{1,p}_0(\Omega)$. The cleanest argument uses the trace operator (Chapter 3), and therefore assumes that $\partial\Omega$ is regular enough (say Lipschitz) for the trace to be defined. The trace is $Tu = 1$ on $\partial\Omega$ (since $u$ is smooth and equals $1$ everywhere), while every $\phi_k \in C_c^\infty(\Omega)$ satisfies $T\phi_k = 0$ (compact support means $\phi_k$ vanishes identically near $\partial\Omega$). Since the trace operator is continuous from $W^{1,p}$ to $L^p(\partial\Omega, \mathcal{H}^{n-1})$, if $\phi_k \to u$ in $W^{1,p}$ then $T\phi_k \to Tu$ in $L^p(\partial\Omega)$, i.e., $0 \to 1$, a contradiction. Thus the constant function $1$ lies in $W^{1,p}(\Omega)$ but not in $W^{1,p}_0(\Omega)$, demonstrating that the two spaces are genuinely different on bounded domains.
[/example]
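It is instructive to see what goes wrong when one tries to approximate the constant $1$ anyway. The Python sketch below uses one particular family of cutoffs on $\Omega = (0,1)$: piecewise linear functions $\phi_k$ that vanish on $[0, 1/k]$, rise linearly to $1$ on $[1/k, 2/k]$, and mirror this at the right endpoint. These are Lipschitz with compact support and stand in for $C_c^\infty$ functions; the family, the names, and the quadrature are assumptions of the sketch. The $L^p$ distance to $1$ tends to zero, but the gradient term $\int |\phi_k'|^p = 2k^{p-1}$ does not.

```python
def cutoff(x, k):
    """phi_k on (0, 1): 0 on [0, 1/k], 1 on [2/k, 1 - 2/k], linear ramps between.
    Requires k >= 4 so that the two ramps do not overlap."""
    return max(0.0, min(1.0, k * x - 1.0, k * (1.0 - x) - 1.0))

def cutoff_slope(x, k):
    """Derivative of phi_k away from its four kinks."""
    if 1.0 / k < x < 2.0 / k:
        return float(k)
    if 1.0 - 2.0 / k < x < 1.0 - 1.0 / k:
        return -float(k)
    return 0.0

def midpoint(f, n=6400):
    """Composite midpoint rule on (0, 1); kinks at multiples of 1/k are cell edges."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

def lp_gap(k, p):
    """Integral over (0, 1) of |1 - phi_k|^p."""
    return midpoint(lambda x: abs(1.0 - cutoff(x, k)) ** p)

def gradient_energy(k, p):
    """Integral over (0, 1) of |phi_k'|^p."""
    return midpoint(lambda x: abs(cutoff_slope(x, k)) ** p)

for k in (4, 16, 64):
    print(f"k = {k:2d}: gap(p=2) = {lp_gap(k, 2):.5f}, "
          f"energy(p=1) = {gradient_energy(k, 1):.3f}, "
          f"energy(p=2) = {gradient_energy(k, 2):.3f}")
```

For $p = 1$ the gradient term stays at $2$, and for $p > 1$ it blows up, so this family never converges to $1$ in $W^{1,p}$. That a single family fails proves nothing by itself; the trace argument in the example is what rules out every approximating sequence.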
## Sobolev Spaces in One Dimension and Absolutely Continuous Functions
The one-dimensional case deserves special treatment because the topology of $\mathbb{R}$ allows a complete characterisation of Sobolev functions in terms of the classical concept of absolute continuity. This characterisation makes Sobolev spaces concrete and gives geometric intuition for the higher-dimensional theory.
[definition: Absolutely Continuous Function]
A function $u: [a,b] \to \mathbb{R}$ is **absolutely continuous** if for every $\varepsilon > 0$ there exists $\delta > 0$ such that for every finite collection of disjoint subintervals $(a_i, b_i) \subset [a,b]$ with $\sum_i (b_i - a_i) < \delta$, we have $\sum_i |u(b_i) - u(a_i)| < \varepsilon$.
[/definition]
Absolutely continuous functions are exactly those for which the fundamental theorem of calculus holds: $u$ is absolutely continuous on $[a,b]$ if and only if $u$ is differentiable $\mathcal{L}^1$-a.e., $u' \in L^1(a,b)$, and $u(x) = u(a) + \int_a^x u'(t) \, d\mathcal{L}^1(t)$ for all $x \in [a,b]$.
[quotetheorem:3128]
[citeproof:3128]
The Cantor staircase shows that the hypothesis of absolute continuity cannot be dropped. The Cantor staircase $f_C: [0,1] \to [0,1]$ is continuous, monotone increasing, and satisfies $f_C' = 0$ $\mathcal{L}^1$-a.e.: it is locally constant on the complement of the Cantor set, and the Cantor set itself is $\mathcal{L}^1$-null. Yet $f_C$ is not constant, so it cannot be recovered from its derivative via the fundamental theorem of calculus; in other words, $f_C$ is not absolutely continuous. Consequently the integration-by-parts identity with the a.e. derivative fails: $\int_0^1 f_C \phi' \, d\mathcal{L}^1 \ne -\int_0^1 0 \cdot \phi \, d\mathcal{L}^1 = 0$ for suitable $\phi$. Therefore $f_C \notin W^{1,1}(0,1)$, despite having classical derivative zero a.e.: by the characterisation above, a function in $W^{1,1}(0,1)$ has an absolutely continuous representative, and the continuous function $f_C$ is not absolutely continuous. This example reappears in Chapter 8 (BV Structure Theorem), where the Cantor staircase is shown to have $D^a f_C = 0$ (classical derivative zero a.e.) and $D^j f_C = 0$ (no jumps, since $f_C$ is continuous), so that its full distributional derivative $Df_C$ equals the Cantor part $D^c f_C$: the Cantor measure, a singular measure supported on the Cantor set.
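The failure of integration by parts for the Cantor staircase can be observed numerically. The Python sketch below evaluates $f_C$ from the ternary expansion and computes $\int_0^1 f_C \phi' \, d\mathcal{L}^1$ for $\phi(x) = x(1-x)$. This $\phi$ is an assumption of the sketch: it is not compactly supported in $(0,1)$, but it vanishes at both endpoints and is a $W^{1,1}$ limit of $C_c^\infty(0,1)$ functions, so the identity with zero right-hand side would have to hold for it as well. A short computation with the Cantor measure (mean $1/2$, variance $1/8$) suggests the value $-1/8$ rather than $0$.

```python
def cantor(x, depth=48):
    """Cantor staircase f_C(x), from the ternary expansion of x
    truncated after `depth` digits."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)
        x -= digit
        if digit == 1:          # x lies in a removed middle third: f_C is constant there
            return value + scale
        if digit == 2:          # ternary digit 2 becomes binary digit 1
            value += scale
        scale *= 0.5
    return value

def cantor_pairing(n=20000):
    """Midpoint approximation of the integral over (0, 1) of f_C(x) * phi'(x),
    with phi(x) = x(1 - x), so phi'(x) = 1 - 2x."""
    h = 1.0 / n
    return h * sum(cantor((i + 0.5) * h) * (1.0 - 2.0 * (i + 0.5) * h)
                   for i in range(n))

print(f"integral of f_C * phi' = {cantor_pairing():.5f}  (zero would be needed for f_C' = 0)")
```

The result is visibly nonzero, so the a.e. derivative $0$ is not a weak derivative of $f_C$.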
This result has a beautiful extension to higher integrability:
[quotetheorem:3095]
[citeproof:3095]
The hypothesis $p > 1$ is sharp. At $p = 1$, an absolutely continuous function need not be Hölder continuous: consider $u(t) = 1/\log(1/t)$ on $(0, 1/2)$, extended by $u(0) = 0$. Its derivative $u'(t) = 1/(t \log^2(1/t))$ is integrable on $(0, 1/2)$ (so $u \in W^{1,1}(0,1/2)$), but $u(t) \to 0$ as $t \to 0$ more slowly than any power $t^\gamma$ with $\gamma > 0$, so $u$ has no Hölder modulus of continuity at $0$. At the other extreme $p = \infty$, the Hölder exponent $1 - 1/p$ reaches $1$, and the estimate becomes $|\tilde{u}(x) - \tilde{u}(y)| \le \|Du\|_{L^\infty} |x - y|$, i.e., Lipschitz continuity, which is the expected conclusion since $W^{1,\infty}$ functions on an interval are precisely the Lipschitz functions.
This one-dimensional Hölder regularity is the prototype for the Morrey embedding theorem in higher dimensions (Chapter 5), which states that $W^{1,p}(\Omega) \hookrightarrow C^{0, 1-n/p}(\bar\Omega)$ when $p > n$ and $\Omega$ is sufficiently regular (for example, bounded with Lipschitz boundary). The one-dimensional case corresponds to $n = 1$.
[remark: The Distinction Between 1D and Higher Dimensions]
In one dimension, every Sobolev function has a continuous representative. In dimension $n \ge 2$, this fails dramatically: functions in $W^{1,p}(\Omega)$ for $p \le n$ need not be continuous, or even bounded. The function $u(x) = \log\log(1/|x|)$ on a small ball around the origin in $\mathbb{R}^n$, $n \ge 2$, lies in $W^{1,n}$ but is unbounded. The critical exponent $p = n$ is the borderline case where the Sobolev inequality just fails to yield continuity; this failure is responsible for much of the technical complexity of the theory.
[/remark]
Now that we have defined Sobolev spaces and established their fundamental properties, we turn to the question of whether functions with weak derivatives can be approximated by smooth functions. This approximation is both theoretically significant and practically useful, as it allows us to apply tools from classical analysis to the broader class of weakly differentiable functions.
# 2. Approximation of Sobolev Functions
The Sobolev space $W^{1,p}(\Omega)$ is defined abstractly: a function $u \in L^p(\Omega)$ belongs to $W^{1,p}(\Omega)$ if its distributional partial derivatives happen to be representable by $L^p$ functions. This distributional definition is the right one for proving existence theorems, but it says nothing about regularity — a $W^{1,p}$ function need not even be continuous. The central task of this chapter is to show that, despite this abstract character, every $W^{1,p}$ function can be approximated arbitrarily well in the $W^{1,p}$ norm by smooth functions. This is the Meyers-Serrin theorem, historically known as "$H = W$." We then study how approximation interacts with the boundary, and use the approximation machinery to establish calculus rules — product and chain rules — for weak derivatives. The chapter closes by identifying $W^{1,\infty}(\Omega)$ with the space of Lipschitz functions, completing a circle that began with Rademacher's theorem in GMT II.
## Mollification and Weak Derivatives
The basic tool for producing smooth approximations is convolution with a mollifier. Recall the standard mollifier from the notation conventions: $\eta_\varepsilon(x) = \varepsilon^{-n}\eta(x/\varepsilon)$, where $\eta \in C_c^\infty(B(0,1))$, $\eta \geq 0$, and $\int \eta \, d\mathcal{L}^n = 1$. For a locally integrable function $u$, the mollification is
\begin{align*}
u_\varepsilon(x) := (u * \eta_\varepsilon)(x) = \int_{\mathbb{R}^n} u(y)\, \eta_\varepsilon(x - y)\, d\mathcal{L}^n(y).
\end{align*}
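The function $u_\varepsilon$ is defined and smooth on $\Omega_\varepsilon = \{x \in \Omega : \operatorname{dist}(x, \partial\Omega) > \varepsilon\}$ for every $\varepsilon > 0$ (for such $x$ the integral only involves values of $u$ on $B(x,\varepsilon) \subset \Omega$), and $u_\varepsilon \to u$ in $L^p_{\mathrm{loc}}(\Omega)$ as $\varepsilon \to 0$ whenever $u \in L^p_{\mathrm{loc}}(\Omega)$ with $1 \le p < \infty$. More precisely, $u \mapsto u * \eta_\varepsilon$ is a well-defined linear map from $L^p_{\mathrm{loc}}(\Omega)$ to $C^\infty(\Omega_\varepsilon)$, and it satisfies the local bound $\|u_\varepsilon\|_{L^p(V)} \le \|u\|_{L^p(W)}$ whenever $V \subset\subset W \subset\subset \Omega$ and $\varepsilon < \operatorname{dist}(V, \partial W)$.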
The key property for Sobolev theory is that differentiation commutes with convolution in the following precise sense.
[quotetheorem:3096]
[citeproof:3096]
This result is foundational: it says that the weak derivative $\partial_{x_i} u$ is itself approximated by the classical derivative $\partial_{x_i} u_\varepsilon$. The mollification $u_\varepsilon$ is smooth and converges to $u$ not just in $L^p$ but with all first-order derivatives converging as well. The hypothesis $\varepsilon < \operatorname{dist}(V, \partial\Omega)$ is necessary: without it, the mollification $u_\varepsilon$ would integrate values of $u$ outside $\Omega$, where $u$ is not defined, and the identity $\partial_{x_i} u_\varepsilon = (\partial_{x_i} u) * \eta_\varepsilon$ would not make sense. The assumption $V \subset\subset \Omega$ is likewise essential: the result is purely local, and the theorem says nothing about approximation up to $\partial\Omega$. When $p = \infty$, the convergence statement fails in general: smooth functions need not be dense in $W^{1,\infty}(\Omega)$ in the strong $W^{1,\infty}$ norm, because a uniform limit of continuous functions is continuous, so the mollified gradients cannot converge in $L^\infty$ to a discontinuous gradient such as $\operatorname{sgn}$. This failure for $p = \infty$ is one reason the case $p = \infty$ requires separate treatment, as we see at the end of this chapter.
[example: Mollification of the Absolute Value]
Let $n = 1$, $\Omega = (-1, 1)$, and $u(x) = |x|$. Then $u \in W^{1,p}((-1,1))$ for all $1 \leq p < \infty$, with weak derivative $\partial_x u = \operatorname{sgn}(x)$ (defined as $+1$ for $x > 0$ and $-1$ for $x < 0$, with the value at $0$ irrelevant). The mollification $u_\varepsilon$ is smooth on $(-1+\varepsilon, 1-\varepsilon)$, and the theorem guarantees $\partial_x u_\varepsilon = (\operatorname{sgn}) * \eta_\varepsilon$ there. To check this concretely, fix $|x| < 1 - \varepsilon$ and differentiate $u_\varepsilon(x) = \int_{-1}^{1} |y|\, \eta_\varepsilon(x - y)\, d\mathcal{L}^1(y)$ under the integral sign, which gives $\partial_x u_\varepsilon(x) = \int_{-1}^{1} |y|\, \eta_\varepsilon'(x - y)\, d\mathcal{L}^1(y) = -\int_{-1}^{1} |y|\, \partial_y\!\left[\eta_\varepsilon(x - y)\right] d\mathcal{L}^1(y)$. Now split the domain at $y = 0$ and integrate by parts in $y$ on each piece. The boundary terms vanish (at $y = 0$ because $|y| = 0$, at $y = \pm 1$ because $\eta_\varepsilon(x - y) = 0$ there), and the derivative falls on $|y|$, contributing a factor $+1$ on $\{y > 0\}$ and $-1$ on $\{y < 0\}$. Hence $\partial_x u_\varepsilon(x) = \int_{\mathbb{R}} \operatorname{sgn}(y)\, \eta_\varepsilon(x - y)\, d\mathcal{L}^1(y) = (\operatorname{sgn} * \eta_\varepsilon)(x)$. Evaluating this convolution: when $x > \varepsilon$, the support $B(x, \varepsilon)$ of $y \mapsto \eta_\varepsilon(x - y)$ lies entirely in $\{y > 0\}$, so $(\operatorname{sgn} * \eta_\varepsilon)(x) = \int \eta_\varepsilon = +1$; when $x < -\varepsilon$ the support lies in $\{y < 0\}$ and the integral equals $-1$; for $|x| \le \varepsilon$ the support straddles the origin and the integral varies smoothly between $-1$ and $+1$, a smooth transition between the two values precisely where $|x|$ has its corner. As $\varepsilon \to 0$, this transition concentrates at $0$ and $\operatorname{sgn} * \eta_\varepsilon$ converges to $\operatorname{sgn}$ in $L^p$ (the $p$-th power of the error is at most $2^p \cdot 2\varepsilon$), confirming $W^{1,p}$ convergence.
[/example]
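The transition profile $\operatorname{sgn} * \eta_\varepsilon$ can be computed directly. The following Python sketch assumes the standard bump $\eta(z) \propto \exp(-1/(1 - z^2))$ as the mollifier, normalised numerically, and evaluates the convolution by midpoint quadrature in the rescaled variable $z = (x - y)/\varepsilon$. The kernel choice, grid sizes, and function names are illustrative assumptions. It then measures the $L^p$ error against $\operatorname{sgn}$, whose $p$-th power scales linearly in $\varepsilon$.

```python
import math

def kernel_weights(m=400):
    """Midpoint nodes z_j in (-1, 1) and weights w_j ~ eta(z_j) dz, normalised to sum to 1."""
    h = 2.0 / m
    zs = [-1.0 + (j + 0.5) * h for j in range(m)]
    raw = [math.exp(-1.0 / (1.0 - z * z)) for z in zs]
    total = sum(raw)
    return zs, [r / total for r in raw]

ZS, WS = kernel_weights()

def mollified_sign(x, eps):
    """(sgn * eta_eps)(x) = integral of sgn(x - eps * z) * eta(z) dz."""
    return sum(w * math.copysign(1.0, x - eps * z) for z, w in zip(ZS, WS))

def lp_error(eps, p, m=300):
    """L^p norm of (sgn * eta_eps - sgn); the integrand vanishes for |x| > eps.
    The x-grid size differs from the kernel grid so that nodes never coincide."""
    h = 2.0 * eps / m
    total = 0.0
    for i in range(m):
        x = -eps + (i + 0.5) * h
        total += abs(mollified_sign(x, eps) - math.copysign(1.0, x)) ** p * h
    return total ** (1.0 / p)

print(mollified_sign(0.3, 0.1), mollified_sign(0.0, 0.1), mollified_sign(-0.3, 0.1))
for eps in (0.2, 0.1, 0.05):
    print(f"eps = {eps}: L^1 error = {lp_error(eps, 1):.5f}, L^2 error = {lp_error(eps, 2):.5f}")
```

Halving $\varepsilon$ halves the $L^1$ error and divides the $L^2$ error by $\sqrt{2}$, in line with the convergence in $L^p$ for every finite $p$ described in the example.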
The example of $|x|$ illustrates how mollification resolves singularities: the non-smooth corner at the origin is replaced by a smooth transition zone of width $\varepsilon$ that disappears in the limit. This localised smoothing behaviour is what makes mollification the right tool for building global approximations — by choosing $\varepsilon$ small enough relative to the distance from a given compact set to the boundary, we can smooth $u$ on that compact set without disturbing distant regions.
## The Meyers-Serrin Theorem
Mollification works locally: on any compactly contained subdomain $V \subset\subset \Omega$, we can approximate $u$ in $W^{1,p}(V)$ by smooth functions. The global statement — approximation on all of $\Omega$ in the full $W^{1,p}(\Omega)$ norm — requires a partition of unity to stitch together these local approximations. The result is the Meyers-Serrin theorem, one of the central facts of Sobolev theory.
[quotetheorem:58]
[citeproof:58]
<!-- illustration-needed: the nested exhaustion $\Omega_j$ of $\Omega$ by compactly contained open sets, together with the annular strips $V_j = \Omega_{j+2} \setminus \overline{\Omega}_j$ that form the partition-of-unity cover. The picture should show how consecutive strips overlap and how each mollification $v_j$ is supported in $V_j$, making the sum $\sum_j v_j$ locally finite. -->
The proof builds the global smooth approximation by patching local mollifications together via a partition of unity adapted to a nested exhaustion of $\Omega$, with the mollification scale $\varepsilon_j$ chosen separately on each annular strip so that the errors form a geometric series. The resulting $v$ lies in $C^\infty(\Omega)$ but inherits no control near $\partial\Omega$ — by construction, $\varepsilon_j \to 0$ as the strips $V_j$ approach the boundary, so $v$ may oscillate wildly there. The next remark clarifies what this density statement does and does not promise about boundary behaviour, separating the interior approximation theorem from the trace-theoretic questions that belong to Chapter 3.
[remark: What Meyers-Serrin Does Not Say]
The approximating functions $v$ in the theorem belong to $C^\infty(\Omega)$, not to $C_c^\infty(\Omega)$. The space $W^{1,p}_0(\Omega)$ is defined as the closure of $C_c^\infty(\Omega)$ in $W^{1,p}(\Omega)$, and this is in general strictly smaller than $W^{1,p}(\Omega)$ (for instance whenever $\Omega$ is a bounded Lipschitz domain); the difference is captured by the boundary values (traces). Meyers-Serrin is the statement that smooth functions without compact support are dense in $W^{1,p}(\Omega)$; the question of boundary conditions requires the trace theory of Chapter 3.
[/remark]
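The error bookkeeping behind the theorem can be summarised in one line. The following is a sketch assuming the standard construction: $\{\zeta_j\}$ is a smooth partition of unity subordinate to the strips $V_j$, $\eta_\varepsilon$ is the standard mollifier, and $\delta > 0$ is the target accuracy (the symbols $\zeta_j$, $v_j$ and $\delta$ are notation assumed here, and the cited proof may index the strips differently). Set $v_j = \eta_{\varepsilon_j} * (\zeta_j u)$, with $\varepsilon_j$ chosen so small that $v_j$ stays supported in a slightly enlarged strip and $\|v_j - \zeta_j u\|_{W^{1,p}(\Omega)} \leq \delta\, 2^{-(j+1)}$. Since $u = \sum_j \zeta_j u$, the sum $v = \sum_j v_j$ satisfies, for every $V \subset\subset \Omega$,
\begin{align*}
\|v - u\|_{W^{1,p}(V)} = \Big\| \sum_{j} (v_j - \zeta_j u) \Big\|_{W^{1,p}(V)} \leq \sum_{j=0}^{\infty} \delta\, 2^{-(j+1)} = \delta.
\end{align*}
Only finitely many terms are nonzero on $V$, so $v$ is smooth there, and taking the supremum over $V \subset\subset \Omega$ gives $\|v - u\|_{W^{1,p}(\Omega)} \leq \delta$.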
The historical significance of this theorem deserves comment. In the 1950s there were two competing definitions of Sobolev spaces: $H^{1,p}(\Omega)$, defined as the closure of $C^\infty(\Omega) \cap W^{1,p}(\Omega)$ in the $W^{1,p}$ norm, and $W^{1,p}(\Omega)$, defined via weak derivatives. It was not immediately obvious that these coincide. Meyers and Serrin's 1964 theorem, often summarised as "$H = W$", settled the question: the two definitions are equivalent. Today we use the distributional definition $W^{1,p}$ as primary and view Meyers-Serrin as the approximation theorem it is.
## Approximation Up to the Boundary
Meyers-Serrin gives smooth approximations on the interior of $\Omega$, but the approximating functions may behave badly near $\partial\Omega$. For domains with smooth or Lipschitz boundaries, one can do better: the approximating functions extend smoothly to the closed domain $\overline\Omega$.
[quotetheorem:3097]
The proof of this theorem relies on the Stein extension operator (developed in Chapter 4): every $u \in W^{1,p}(\Omega)$ can be extended to a function $Eu \in W^{1,p}(\mathbb{R}^n)$ with compact support, and then global mollification of $Eu$ followed by restriction to $\Omega$ gives the desired approximation. The Lipschitz hypothesis on $\partial\Omega$ is the natural geometric condition under which this extension operator is constructed in Chapter 4. It is not the weakest possible hypothesis, but some boundary regularity is needed: bounded extension can fail on domains with cusps, and approximation by $C^\infty(\overline\Omega)$ can fail on domains that lie on both sides of their boundary, as the slit disk below shows. The theorem is also the starting point for boundary-value problems: the trace operator of Chapter 3, which gives meaning to a Dirichlet condition $u|_{\partial\Omega} = g$ in the Sobolev sense, is defined first on functions that are regular up to the boundary and then extended to all of $W^{1,p}(\Omega)$ by exactly this density.
[example: The Slit Disk]
Let $\Omega = B(0,1) \setminus \{(x_1, 0) : x_1 \geq 0\}$ be the unit disk with a radial slit removed. This domain is not Lipschitz: along the slit, $\Omega$ lies on both sides of its boundary. Consider the function $u(r, \theta) = \theta/(2\pi)$ in polar coordinates $(r, \theta)$ with $\theta \in (0, 2\pi)$. This function is smooth on $\Omega$ with $|\nabla u| = 1/(2\pi r)$, so $\int_\Omega |\nabla u|^p\, d\mathcal{L}^2 = (2\pi)^{1-p} \int_0^1 r^{1-p}\, dr$ is finite exactly when $p < 2$; hence $u \in W^{1,p}(\Omega)$ for $1 \leq p < 2$. However, $u$ cannot be approximated in $W^{1,p}(\Omega)$ by functions in $C^\infty(\overline\Omega)$. Any such function is continuous across the slit, so its boundary values from above and from below coincide there, and this agreement persists under $W^{1,p}(\Omega)$ convergence because each one-sided trace is controlled by the $W^{1,p}$ norm on the corresponding half-disk. The boundary values of $u$, by contrast, jump by $1$ across the slit.
[/example]
<!-- illustration-needed: the slit disk $\Omega = B(0,1) \setminus \{(x_1,0) : x_1 \ge 0\}$ with the function $u(r,\theta) = \theta/(2\pi)$ indicated schematically — show the angular level sets of $u$ and the jump discontinuity of $u$ across the slit, illustrating why no smooth function on $\overline\Omega$ can approximate $u$ in $W^{1,p}$. -->
## Product and Chain Rules for Weak Derivatives
One of the most useful features of the smooth approximation theory is that it allows us to transfer calculus rules from smooth functions to $W^{1,p}$ functions. The strategy is uniform: prove the rule for smooth functions, approximate $u \in W^{1,p}$ by smooth $u_\varepsilon$, establish uniform bounds, and pass to the limit.
### The Product Rule
[quotetheorem:3098]
[citeproof:3098]
The $L^\infty$ hypothesis is necessary, not merely convenient. Without it, the product of two $W^{1,p}$ functions need not belong to $W^{1,p}$ at all. A counterexample requires $n \geq 2$, since in dimension one every function in $W^{1,p}(0,1)$ is bounded. Let $n \geq 2$, $1 \leq p < n$, $\Omega = B(0,1)$, and $u(x) = |x|^{-\alpha}$ with $\alpha > 0$. Since $|\nabla u| = \alpha |x|^{-\alpha - 1}$, integration in polar coordinates shows that $u \in W^{1,p}(\Omega)$ exactly when $(\alpha + 1)p < n$, i.e. $\alpha < (n-p)/p$, while $u \notin L^\infty(\Omega)$. The product $u^2 = |x|^{-2\alpha}$ has gradient of size $2\alpha |x|^{-2\alpha - 1}$, which fails to belong to $L^p(\Omega)$ when $(2\alpha + 1)p \geq n$, i.e. $\alpha \geq (n-p)/(2p)$. Any $\alpha$ with $(n-p)/(2p) \leq \alpha < (n-p)/p$ therefore gives $u \in W^{1,p}(\Omega)$ with $u^2 \notin W^{1,p}(\Omega)$. The theorem also does not provide a product rule for $W^{k,p}$ with $k \geq 2$, nor for products of two unbounded $W^{1,p}$ functions; those cases require either Sobolev embeddings (Chapter 5) or more refined function-space algebra. In practice the $L^\infty$ condition is often verified by noting that $W^{1,p}(\Omega) \hookrightarrow L^\infty(\Omega)$ when $p > n$ (by the Morrey embedding from Chapter 5), or by truncation arguments.
### The Chain Rule
[quotetheorem:3099]
[citeproof:3099]
The Lipschitz hypothesis on $g$ cannot be relaxed to mere continuity, or even to Hölder continuity. For such $g$ the derivative $g'$ may be unbounded, which destroys the dominated convergence argument used to pass to the limit in $g'(u_k)\, \partial_{x_i} u_k$. A concrete failure: take $g(t) = |t|^{1/2}$ (which is $1/2$-Hölder but not Lipschitz near $0$), $n = 1$, and $u(x) = x$ on $(0,1)$, which lies in $W^{1,p}(0,1)$ for every $p$. Then $g \circ u = x^{1/2}$, whose derivative $\frac{1}{2} x^{-1/2}$ belongs to $L^p(0,1)$ only for $p < 2$. For $p \geq 2$, therefore, $g \circ u \notin W^{1,p}(0,1)$ even though $u$ is smooth. The blow-up occurs where $u$ approaches $0$, the point at which $g'$ is unbounded; the Lipschitz condition prevents it by bounding $|g'|$ uniformly.
The chain rule has a family of important special cases that are used constantly in PDE:
[remark: Special Cases of the Chain Rule]
Taking $g(t) = |t|$, which is Lipschitz with $g'(t) = \operatorname{sgn}(t)$ a.e., gives
\begin{align*}
\partial_{x_i}|u| = \operatorname{sgn}(u)\, \partial_{x_i} u.
\end{align*}
Taking $g(t) = \max\{t, 0\} = t^+$, which has $g'(t) = \mathbb{1}_{t > 0}$ a.e., gives
\begin{align*}
\partial_{x_i} u^+ = \mathbb{1}_{u > 0}\, \partial_{x_i} u.
\end{align*}
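An immediate consequence is that $\partial_{x_i} u = 0$ a.e. on $\{u = 0\}$. Write $u = u^+ - u^-$ with $u^- = (-u)^+$. The second formula gives $\partial_{x_i} u^+ = \mathbb{1}_{u > 0}\, \partial_{x_i} u$, and the same formula applied to $-u$ gives $\partial_{x_i} u^- = -\mathbb{1}_{u < 0}\, \partial_{x_i} u$. Subtracting, $\partial_{x_i} u = (\mathbb{1}_{u > 0} + \mathbb{1}_{u < 0})\, \partial_{x_i} u = \mathbb{1}_{u \neq 0}\, \partial_{x_i} u$ a.e., which forces $\partial_{x_i} u = 0$ a.e. on $\{u = 0\}$. Applying the same argument to $u - c$, which is locally in $W^{1,p}$, shows that the gradient vanishes a.e. on every level set $\{u = c\}$. This vanishing of the gradient on level sets is a key structural fact about $W^{1,p}$ functions.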
[/remark]
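As a sanity check of the first formula in the simplest case (an illustrative choice: $n = 1$, $\Omega = (-1,1)$, $u(x) = x$), one can integrate by parts on each half-interval against a test function $\phi \in C_c^\infty(-1,1)$:
\begin{align*}
\int_{-1}^{1} |x|\, \phi'(x)\, dx = -\int_{-1}^{0} x\, \phi'(x)\, dx + \int_{0}^{1} x\, \phi'(x)\, dx = \int_{-1}^{0} \phi\, dx - \int_{0}^{1} \phi\, dx = -\int_{-1}^{1} \operatorname{sgn}(x)\, \phi(x)\, dx.
\end{align*}
The boundary terms at $x = 0$ vanish because $|x|$ is continuous there, so the weak derivative of $|u|$ is $\operatorname{sgn}(x) = \operatorname{sgn}(u)\, u'$, in agreement with the remark.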
These special cases are not merely curiosities — they are workhorse tools for PDE. The formula for $\partial_{x_i} u^+$ allows one to test an equation against truncations of $u$, which is the foundation of Stampacchia's method for proving $L^\infty$ bounds for solutions of elliptic equations. The next example shows how truncations interact with the chain rule.
[example: Truncation Stays in $W^{1,p}$]
Let $u \in W^{1,p}(\Omega)$ and define the truncation $T_k(u) = \max\{-k, \min\{u, k\}\}$ for $k > 0$. The function $t \mapsto T_k(t)$ is Lipschitz (with constant $1$), so the chain rule applies: $T_k(u) \in W^{1,p}(\Omega)$ and
\begin{align*}
\partial_{x_i} T_k(u) = \mathbb{1}_{|u| < k}\, \partial_{x_i} u.
\end{align*}
This says the gradient of the truncation is the gradient of $u$ where $|u| < k$, and zero where $|u| \geq k$. As $k \to \infty$, $T_k(u) \to u$ in $W^{1,p}(\Omega)$ by dominated convergence (since $|\partial_{x_i} T_k(u)| \leq |\partial_{x_i} u| \in L^p$). This truncation technique is ubiquitous in the regularity theory of elliptic PDE.
[/example]
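The convergence claim in the truncation example can be read off from two identities, valid for $1 \leq p < \infty$ (a sketch using only the gradient formula for $T_k(u)$):
\begin{align*}
\|\nabla T_k(u) - \nabla u\|_{L^p(\Omega)}^p &= \int_{\{|u| \geq k\}} |\nabla u|^p\, d\mathcal{L}^n, \\
\|T_k(u) - u\|_{L^p(\Omega)}^p &= \int_{\{|u| > k\}} (|u| - k)^p\, d\mathcal{L}^n \leq \int_{\{|u| > k\}} |u|^p\, d\mathcal{L}^n.
\end{align*}
Both right-hand sides tend to $0$ as $k \to \infty$: the sets $\{|u| \geq k\}$ decrease to a set of measure zero, and the integrands are dominated by the integrable functions $|\nabla u|^p$ and $|u|^p$.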
## $W^{1,\infty}$ and Lipschitz Functions
The Sobolev space $W^{1,\infty}(\Omega)$ sits outside the family $W^{1,p}$ for $p < \infty$ in a distinguished way: its members are classically continuous and in fact Lipschitz. The following theorem makes this precise and provides the key bridge between Sobolev theory and the Lipschitz analysis developed in GMT II.
[definition: Lipschitz Functions on a Domain]
Let $\Omega \subset \mathbb{R}^n$ be open. A function $u : \Omega \to \mathbb{R}$ is **Lipschitz** if there exists a constant $L \geq 0$ such that
\begin{align*}
|u(x) - u(y)| \leq L|x - y| \quad \text{for all } x, y \in \Omega.
\end{align*}
The space of Lipschitz functions on $\Omega$ is denoted $\operatorname{Lip}(\Omega)$, equipped with the norm $\|u\|_{\operatorname{Lip}} = \|u\|_{L^\infty} + \operatorname{Lip}(u)$, where $\operatorname{Lip}(u)$ denotes the smallest such constant $L$.
[/definition]
The definition makes a global statement: the same constant $L$ must work for every pair of points $x, y \in \Omega$. This is a purely metric condition on $u$, making no reference to derivatives, and it can hold even for functions that are not differentiable everywhere. The Sobolev counterpart $W^{1,\infty}(\Omega)$ is defined via weak derivatives and an $L^\infty$ bound on the gradient; the following theorem shows these two conditions capture exactly the same class of functions, at least when $\Omega$ is regular enough.
[quotetheorem:3129]
[citeproof:3129]
The Lipschitz boundary condition on $\Omega$ is needed only for one direction. The inclusion $\operatorname{Lip}(\Omega) \subset W^{1,\infty}(\Omega)$ holds for bounded Lipschitz functions on any open set, by Rademacher's theorem. For the direction $W^{1,\infty} \subset \operatorname{Lip}$, the mean value theorem argument requires that any two points $x, y \in \Omega$ can be connected by a rectifiable path in $\Omega$ whose length is bounded by a constant multiple of $|x - y|$; this property, usually called quasiconvexity, is satisfied by bounded Lipschitz domains but not by domains with slits or inward cusps. A domain with an inward cusp can carry a $W^{1,\infty}$ function whose gradient is bounded everywhere, yet whose values at two points on opposite sides of the cusp differ by far more than $\|\nabla u\|_{L^\infty}$ times their Euclidean distance, because any path in $\Omega$ connecting them must travel around the tip of the cusp. For unbounded domains, the theorem requires additional uniformity assumptions that guarantee the same path-length condition.
[explanation: Why the Lipschitz Condition on $\Omega$ Matters]
The inclusion $W^{1,\infty}(\Omega) \subset \operatorname{Lip}(\Omega)$ can fail for non-Lipschitz domains. A domain can have cusps or slits that prevent two nearby points from being joined by a short path within $\Omega$; in such domains, $W^{1,\infty}$ functions need not satisfy a global Lipschitz condition even though their gradients are bounded. The Lipschitz boundary condition on $\Omega$ ensures that any two points can be connected by a rectifiable curve within $\Omega$ whose length is bounded by a constant times the Euclidean distance (a property known as quasiconvexity), and it is this geometric property that allows the mean value theorem argument to go through.
The theorem completes an important conceptual loop. In GMT II, we proved Rademacher's theorem: Lipschitz functions are differentiable almost everywhere, and we could then ask whether the classical a.e. gradient is the same as the weak gradient. The answer is yes, and this gives one direction of the $W^{1,\infty} = \operatorname{Lip}$ equivalence. The other direction — that $W^{1,\infty}$ functions must be Lipschitz — uses the approximation machinery of the present chapter. Together, these results say that from the perspective of Sobolev spaces, Lipschitz continuity is exactly the right condition for $p = \infty$.
[/explanation]
The identification $W^{1,\infty}(\Omega) = \operatorname{Lip}(\Omega)$ is used repeatedly in the later chapters: when we define BV functions in Chapter 8, Lipschitz test functions appear in the definition of total variation (the test fields $\phi \in C_c^1(\Omega; \mathbb{R}^n)$ are Lipschitz), and the chain rule for Lipschitz compositions in weak derivatives is justified precisely by this theorem together with Rademacher's theorem (GMT II).
The approximation theorems have shown us how to work with smooth functions as surrogates for Sobolev functions, but they raise a new question: what does a function in a Sobolev space look like when restricted to the boundary of a domain? This boundary behavior, encoded in the concept of a trace, must be understood to fully characterize these function spaces.
# 3. Traces
The Sobolev space $W^{1,p}(\Omega)$ is defined by integrability, not pointwise values. A function $u \in W^{1,p}(\Omega)$ is really an equivalence class: two representatives that agree $\mathcal{L}^n$-almost everywhere are the same element. This causes an immediate problem at the boundary. The set $\partial \Omega$ has $\mathcal{L}^n$-measure zero, so redefining $u$ on $\partial \Omega$ changes nothing in $L^p(\Omega)$. The boundary values of a Sobolev function are therefore not a priori defined, and expressions like "$u = 0$ on $\partial \Omega$" require careful interpretation. The trace operator resolves this: for bounded Lipschitz domains, there is a canonical bounded linear map $T: W^{1,p}(\Omega) \to L^p(\partial \Omega; \mathcal{H}^{n-1})$ that extends classical restriction. This chapter develops the trace theorem, proves the characterization $W^{1,p}_0(\Omega) = \ker(T)$, and mentions the sharper fractional Sobolev refinement.
## Why Classical Restriction Fails for Sobolev Functions
To see precisely why restriction to $\partial \Omega$ is ill-defined, consider any $u \in W^{1,p}(\Omega)$ and any measurable function $v: \partial \Omega \to \mathbb{R}$. Define $\tilde{u}$ on $\overline{\Omega}$ by
\begin{align*}
\tilde{u}(x) = \begin{cases} u(x) & x \in \Omega, \\ v(x) & x \in \partial \Omega. \end{cases}
\end{align*}
Since $\mathcal{L}^n(\partial \Omega) = 0$, one has $\tilde{u} = u$ in $L^p(\Omega)$, so $\tilde{u} \in W^{1,p}(\Omega)$ and $\tilde{u}$ and $u$ are the same Sobolev function. Yet $\tilde{u}$ may take arbitrary values on $\partial \Omega$. There is no intrinsic notion of "$u$ restricted to $\partial \Omega$" coming from the $L^p$ equivalence class.
The resolution is to define the trace not by restriction but by a limiting process, and to show this limit is independent of the representative chosen. The key ingredient is that $W^{1,p}$ functions have a controlled rate of oscillation — encoded in their gradient — that forces the values near $\partial \Omega$ to stabilise in an $L^p$ sense as one approaches the boundary.
[remark: Trace Versus Boundary Values]
The trace operator $T$ and the classical restriction $u|_{\partial \Omega}$ agree when $u \in C(\overline{\Omega}) \cap W^{1,p}(\Omega)$. This is a consistency condition, not the definition of $T$: the definition works for all $u \in W^{1,p}(\Omega)$, including functions that are discontinuous up to the boundary.
[/remark]
## The Trace Theorem
The core task is now clear: we want to define the trace of a Sobolev function as its "boundary value" in a way that is consistent with classical restriction for continuous functions, is well-defined independent of the representative, and is controlled in norm by the Sobolev norm of the function. The right framework is to characterize the trace as a bounded linear operator into an $L^p$ space on the boundary, where the boundary is equipped with $(n-1)$-dimensional Hausdorff measure. The precise formulation starts with a definition of what such an operator must satisfy, and then the theorem asserts its existence and uniqueness.
[definition: Trace Operator]
Let $\Omega \subset \mathbb{R}^n$ be a bounded Lipschitz domain and $1 \leq p < \infty$. A bounded linear operator
\begin{align*}
T: W^{1,p}(\Omega) \to L^p(\partial \Omega; \mathcal{H}^{n-1})
\end{align*}
is called a **trace operator** if $Tu = u|_{\partial \Omega}$ for every $u \in C(\overline{\Omega}) \cap W^{1,p}(\Omega)$.
[/definition]
The definition identifies what the trace operator must be on the dense subclass of continuous Sobolev functions, but leaves open whether such an operator can be extended to all of $W^{1,p}(\Omega)$ in a bounded way. The trace theorem answers this affirmatively: by working first on the half-space via a fundamental theorem of calculus argument, and then on general Lipschitz domains by flattening the boundary locally, one obtains a bounded extension of the classical restriction map. The Lipschitz assumption on $\Omega$ is what allows the boundary-flattening construction to be carried out without losing $W^{1,p}$ control.
[quotetheorem:60]
[citeproof:60]
The bound $\|Tu\|_{L^p(\partial \Omega)} \leq C\|u\|_{W^{1,p}(\Omega)}$ is sharp in $p$: as $p \to \infty$ the trace controls the boundary supremum, while for $p = 1$ only $L^1$ integrability on $\partial \Omega$ is guaranteed. The constant $C$ depends on both $p$ and the Lipschitz character of $\Omega$ — in particular, domains with very sharp corners or very oscillatory boundaries require a large $C$.
It is important to understand what the trace theorem does not say. First, for $1 < p < \infty$ the trace operator is not surjective onto $L^p(\partial \Omega; \mathcal{H}^{n-1})$: a generic $L^p$ function on $\partial \Omega$ need not be the trace of any $W^{1,p}$ function, because the image of $T$ is the strictly smaller fractional Sobolev space $W^{1-1/p,p}(\partial \Omega)$ (see below). The case $p = 1$ is exceptional: by a theorem of Gagliardo, $T$ maps $W^{1,1}(\Omega)$ onto $L^1(\partial \Omega; \mathcal{H}^{n-1})$. Second, the Lipschitz assumption cannot be dropped. For a domain with a cusp, for instance $\Omega = \{(x_1, x_2) \in \mathbb{R}^2 : 0 < x_1 < 1,\ 0 < x_2 < x_1^2\}$, the boundary is not Lipschitz at the cusp tip, and one can construct, for suitable $p$, functions in $W^{1,p}(\Omega)$ whose boundary values fail to lie in $L^p(\partial \Omega; \mathcal{H}^{n-1})$. The proof breaks down precisely because no bi-Lipschitz boundary-flattening map exists at the cusp.
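One such construction can be sketched explicitly under the assumption $1 \leq p < 2$ (the exponent $\beta$ and the particular function are illustrative choices, not taken from the cited proof). On the cusp domain $\Omega = \{0 < x_1 < 1,\ 0 < x_2 < x_1^2\}$ take $u(x_1, x_2) = x_1^{-\beta}$ with $\beta > 0$. Since the cross-section of $\Omega$ at $x_1$ has length $x_1^2$,
\begin{align*}
\int_\Omega |u|^p\, d\mathcal{L}^2 &= \int_0^1 x_1^{2 - \beta p}\, dx_1 < \infty \iff \beta p < 3, \\
\int_\Omega |\nabla u|^p\, d\mathcal{L}^2 &= \beta^p \int_0^1 x_1^{2 - (\beta + 1) p}\, dx_1 < \infty \iff (\beta + 1) p < 3, \\
\int_{\partial \Omega \cap \{x_2 = 0\}} |u|^p\, d\mathcal{H}^1 &= \int_0^1 x_1^{-\beta p}\, dx_1 < \infty \iff \beta p < 1.
\end{align*}
Choosing $1/p \leq \beta < 3/p - 1$, which is possible exactly when $p < 2$, gives a function $u \in W^{1,p}(\Omega)$ that is smooth on $\overline{\Omega}$ away from the cusp tip but whose boundary values are not in $L^p(\partial \Omega; \mathcal{H}^1)$. The narrowing of the domain makes the interior integrals converge more easily than the boundary integral, which is the mechanism behind the failure.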
[example: Trace of a Radial Function]
Let $\Omega = B(0, 1) \subset \mathbb{R}^n$ and $u(x) = |x|^\alpha$ for some $\alpha > 0$. Since $u$ is smooth on $\overline{\Omega} \setminus \{0\}$ and $\nabla u(x) = \alpha |x|^{\alpha - 1} \frac{x}{|x|}$, we compute
\begin{align*}
\|u\|_{W^{1,p}(\Omega)}^p = \int_\Omega |x|^{\alpha p}\, d\mathcal{L}^n + \alpha^p \int_\Omega |x|^{(\alpha-1)p}\, d\mathcal{L}^n.
\end{align*}
In polar coordinates, writing $\sigma_{n-1} = \mathcal{H}^{n-1}(\mathbb{S}^{n-1})$ for the surface area of the unit sphere, the first integral is $\sigma_{n-1} \int_0^1 r^{\alpha p + n - 1}\, dr = \sigma_{n-1} / (\alpha p + n)$, finite for all $\alpha > 0$. The second integral is $\sigma_{n-1} \int_0^1 r^{(\alpha - 1)p + n - 1}\, dr$, which is finite precisely when $(\alpha - 1)p + n > 0$, i.e. $\alpha > 1 - n/p$. Thus $u \in W^{1,p}(\Omega)$ when $\alpha > 1 - n/p$.
Since $u$ is continuous on $\overline{\Omega}$, the trace is simply $Tu = u|_{\partial \Omega}$. On $\partial \Omega = \{|x| = 1\}$, $u = 1$ identically, so $Tu \equiv 1$ and $\|Tu\|_{L^p(\partial \Omega)} = \mathcal{H}^{n-1}(\mathbb{S}^{n-1})^{1/p} = \sigma_{n-1}^{1/p}$. This illustrates the trace theorem: even though $u$ is only weakly differentiable near the origin (when $\alpha < 1$), its trace on the smooth sphere $\partial \Omega$ is well-defined and bounded.
[/example]
[example: Trace via the Density Argument on the Half-Space]
This example shows the trace being defined by the limiting process at the heart of the proof, for a function with no continuous representative up to the boundary. Let $\Omega = \mathbb{R}^2_+ \cap B(0, 1/2)$ be the part of the upper half-plane $\mathbb{R}^2_+ = \{(x_1, x_2) : x_2 > 0\}$ inside the disk of radius $1/2$, and define
\begin{align*}
u(x) = \sin\!\left(\log\log\frac{1}{|x|}\right).
\end{align*}
The function $u$ is bounded, so $u \in L^2(\Omega)$, but $\log\log(1/|x|) \to \infty$ as $x \to 0$, so $u$ oscillates between $-1$ and $1$ near the origin and has no continuous representative on $\overline{\Omega}$. Nevertheless, $u \in W^{1,2}(\Omega)$: one computes $|\nabla u(x)| \leq \frac{1}{|x| \log(1/|x|)}$, and in polar coordinates $\int_\Omega |\nabla u|^2\, d\mathcal{L}^2 \leq \pi \int_0^{1/2} \frac{dr}{r \log^2(1/r)} = \frac{\pi}{\log 2} < \infty$. The key point is structural: the trace theorem defines $Tu$ via the density argument, not by taking a pointwise limit. One approximates $u$ in the $W^{1,2}$ norm by functions $u_k$ that are smooth up to the boundary, the sequence $Tu_k = u_k|_{\partial \Omega}$ is Cauchy in $L^2(\partial \Omega)$ (by the trace estimate $\|Tu_k - Tu_m\|_{L^2} \leq C\|u_k - u_m\|_{W^{1,2}} \to 0$), and $Tu$ is defined as the $L^2$ limit. The oscillation of $u$ near the origin does not obstruct the existence of the trace: on the flat part of the boundary one finds $Tu(x_1, 0) = \sin(\log\log(1/|x_1|))$ for a.e. $x_1$, since $u$ is continuous on $\overline{\Omega} \setminus \{0\}$ and the single point $0$ is $\mathcal{H}^1$-null.
[/example]
Both examples above rely on the same key estimate: the $L^p$ norm of the boundary values is controlled by the $W^{1,p}$ norm of the interior function. In the case of the unit ball, the boundary is smooth so no flattening is needed. For a general Lipschitz domain, the boundary-flattening construction reduces the problem to the half-space case locally on each chart. It is worth unpacking this argument in detail, because the structure of the estimate — derived from the fundamental theorem of calculus applied normal to the boundary — is the reason the Sobolev norm (and not merely the $L^p$ norm) appears on the right-hand side.
## Construction via Flattening the Boundary
The half-space argument at the heart of the proof deserves a closer look, since it is the key computational estimate. Suppose $\Omega = \mathbb{R}^n_+ = \{x \in \mathbb{R}^n : x_n > 0\}$ and $u \in C^1(\overline{\mathbb{R}^n_+})$ with compact support. Write $x = (x', x_n)$ with $x' \in \mathbb{R}^{n-1}$. For any $x' \in \mathbb{R}^{n-1}$, the fundamental theorem of calculus gives
\begin{align*}
|u(x', 0)|^p = -\int_0^\infty \frac{d}{dt}\big[|u(x', t)|^p\big]\, dt = -p \int_0^\infty |u(x', t)|^{p-1} \operatorname{sgn}(u(x', t))\, \partial_{x_n} u(x', t)\, dt.
\end{align*}
Taking absolute values and applying Young's inequality $ab \leq \frac{1}{p}a^p + \frac{1}{p'}b^{p'}$ (with $1/p + 1/p' = 1$) to $a = |\partial_{x_n} u|$ and $b = |u|^{p-1}$, so that $b^{p'} = |u|^p$ (for $p = 1$ this step is not needed):
\begin{align*}
|u(x', 0)|^p \leq p \int_0^\infty |u(x', t)|^{p-1}|\partial_{x_n} u(x', t)|\, dt \leq (p-1)\int_0^\infty |u|^p\, dt + \int_0^\infty |\partial_{x_n} u|^p\, dt.
\end{align*}
Integrating over $x' \in \mathbb{R}^{n-1}$ yields
\begin{align*}
\int_{\mathbb{R}^{n-1}} |u(x', 0)|^p\, d\mathcal{L}^{n-1}(x') \leq C(p)\, \|u\|_{W^{1,p}(\mathbb{R}^n_+)}^p.
\end{align*}
This is the trace estimate for the half-space. The left side is exactly $\|Tu\|_{L^p(\partial \mathbb{R}^n_+)}^p$ when $\partial \mathbb{R}^n_+$ is identified with $\mathbb{R}^{n-1}$ equipped with $\mathcal{L}^{n-1} = \mathcal{H}^{n-1}$.
For a general bounded Lipschitz domain, one covers $\partial \Omega$ by finitely many chart neighbourhoods, in each of which the boundary is the graph of a Lipschitz function $\gamma: \mathbb{R}^{n-1} \to \mathbb{R}$. The map $(x', x_n) \mapsto (x', x_n - \gamma(x'))$ flattens this graph to $\{x_n = 0\}$; it is bi-Lipschitz rather than a diffeomorphism, since $\gamma$ is only Lipschitz. Under a bi-Lipschitz change of variables, $W^{1,p}$ regularity is preserved (the Jacobian determinant is bounded and bounded away from zero; here it is identically $1$), so the half-space estimate applies on each chart. Summing over charts and using a partition of unity subordinate to the cover completes the argument.
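The effect of the flattening map on a single chart can be made explicit. The following is a sketch assuming that, in the chart, $\Omega$ lies above the graph of $\gamma$, that $\operatorname{Lip}(\gamma) \leq L$, and that $u \in C^1(\overline{\Omega})$; the name $v$ for the flattened function is introduced here for illustration. Setting $v(y', y_n) = u(y', y_n + \gamma(y'))$, the chain rule gives, for a.e. $y$,
\begin{align*}
\partial_{y_i} v = \partial_{x_i} u + \partial_{x_n} u\, \partial_{y_i} \gamma \quad (i < n), \qquad \partial_{y_n} v = \partial_{x_n} u,
\end{align*}
so that $|\nabla v| \leq (1 + L)\, |\nabla u|$, and since the change of variables has Jacobian determinant $1$, $\|v\|_{W^{1,p}} \leq (1 + L)\, \|u\|_{W^{1,p}}$ on the chart. On the boundary, the area formula for graphs gives
\begin{align*}
\int_{\operatorname{graph}(\gamma)} |u|^p\, d\mathcal{H}^{n-1} = \int_{\mathbb{R}^{n-1}} |v(y', 0)|^p \sqrt{1 + |\nabla \gamma(y')|^2}\, dy' \leq \sqrt{1 + L^2} \int_{\mathbb{R}^{n-1}} |v(y', 0)|^p\, dy'.
\end{align*}
Combining these two bounds with the half-space estimate for $v$ shows that the trace constant on each chart depends only on $p$ and $L$.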
<!-- illustration-needed: the boundary-flattening construction — show a Lipschitz domain with a chart neighbourhood U_k, the Lipschitz graph γ representing ∂Ω ∩ U_k, and the bi-Lipschitz map that sends this graph to a flat boundary in R^n_+ -->
## Characterization of $W^{1,p}_0(\Omega)$
The space $W^{1,p}_0(\Omega)$ is defined abstractly as the closure of $C^\infty_c(\Omega)$ in the $W^{1,p}(\Omega)$ norm. Functions in $C^\infty_c(\Omega)$ have compact support strictly inside $\Omega$, so they vanish near $\partial \Omega$. The heuristic is that $W^{1,p}_0(\Omega)$ consists of Sobolev functions that "vanish on the boundary." The trace theorem makes this precise.
[quotetheorem:3100]
[citeproof:3100]
This theorem is fundamental for PDEs. When we wish to solve $-\Delta u = f$ in $\Omega$ with homogeneous Dirichlet boundary conditions $u|_{\partial \Omega} = 0$, the variational formulation seeks $u \in W^{1,2}_0(\Omega)$. The trace kernel characterization tells us this is exactly the space of $W^{1,2}$ functions whose trace vanishes — consistent with the classical interpretation. Without the trace theorem, the condition "$u = 0$ on $\partial \Omega$" for a Sobolev function would be meaningless.
[example: A Function in $\ker(T)$ That Is Not Smooth]
Let $\Omega = (0, 1) \subset \mathbb{R}$ and define $u(x) = (x(1-x))^{3/4}$. This function is continuous on $[0, 1]$, vanishes at both endpoints, and $u \in W^{1,2}(0,1)$ since $u' = \frac{3(1-2x)}{4(x(1-x))^{1/4}}$ satisfies
\begin{align*}
\int_0^1 |u'|^2\, d\mathcal{L}^1 = \int_0^1 \frac{9(1-2x)^2}{16\,(x(1-x))^{1/2}}\, dx < \infty,
\end{align*}
where the integrand behaves like $\frac{9}{16} x^{-1/2}$ near $x = 0$ and like $\frac{9}{16} (1-x)^{-1/2}$ near $x = 1$, both integrable. Since $u$ is continuous on $\overline{\Omega}$ and $u(0) = u(1) = 0$, the trace theorem gives $Tu = u|_{\partial \Omega} = 0$, so $u \in W^{1,2}_0(0,1)$ by the trace kernel characterization. Note that $u \notin C^1([0,1])$ (the derivative blows up at both endpoints), so $u$ is not itself smooth up to the boundary; it nevertheless lies in $W^{1,2}_0(\Omega)$ because it is the $W^{1,2}$-limit of $C^\infty_c(0,1)$ functions.
[/example]
The trace kernel characterization completes the picture for homogeneous boundary conditions. But so far the trace theorem has only been stated as a map into $L^p(\partial \Omega; \mathcal{H}^{n-1})$. This is already a useful result, but it understates what is actually true. A Sobolev function $u \in W^{1,p}(\Omega)$ does not merely have an $L^p$ boundary value — the gradient constraint forces the boundary value to have an additional degree of differentiability, in the sense of a fractional Sobolev norm. Understanding the precise image of $T$ is important for the correct formulation of non-homogeneous boundary value problems, and it leads to the scale of fractional Sobolev spaces on the boundary.
## Higher Regularity and the Fractional Sobolev Refinement
The trace theorem as stated lands in $L^p(\partial \Omega; \mathcal{H}^{n-1})$, but for $1 < p < \infty$ the image of $T$ is actually a strict subspace. A Sobolev function has controlled oscillation in all directions, and this extra regularity in the normal direction to $\partial \Omega$ propagates to the trace. The sharper statement is:
\begin{align*}
T: W^{1,p}(\Omega) \to W^{1-1/p, p}(\partial \Omega),
\end{align*}
where $W^{s, p}(\partial \Omega)$ for $s \in (0,1)$ is the fractional Sobolev (Sobolev-Slobodeckij) space on the boundary, with norm
\begin{align*}
\|g\|_{W^{s,p}(\partial \Omega)}^p = \|g\|_{L^p(\partial \Omega)}^p + \int_{\partial \Omega} \int_{\partial \Omega} \frac{|g(x) - g(y)|^p}{|x - y|^{(n-1) + sp}}\, d\mathcal{H}^{n-1}(x)\, d\mathcal{H}^{n-1}(y).
\end{align*}
For $p = 2$ this becomes $W^{1/2, 2}(\partial \Omega) = H^{1/2}(\partial \Omega)$, which is the natural trace space for $H^1(\Omega) = W^{1,2}(\Omega)$. Moreover, the trace map $T: H^1(\Omega) \to H^{1/2}(\partial \Omega)$ is surjective — every $H^{1/2}$ function on the boundary is the trace of some $H^1$ function in the interior.
[remark: Relevance to GMT III]
For the purposes of this course, the $L^p(\partial \Omega)$ statement of the trace theorem suffices. The fractional Sobolev refinement becomes essential when studying the regularity of PDE solutions near the boundary, but the BV and perimeter theory in Chapters 8–13 requires only the $L^1$ trace for Sobolev functions (used in Chapter 10's BV extension argument) and the analogous $L^1(\partial\Omega; \mathcal{H}^{n-1})$ trace theory for BV functions developed in Chapter 10.
[/remark]
The trace theorem also clarifies the meaning of non-homogeneous boundary conditions. The Dirichlet problem
\begin{align*}
-\Delta u = f \text{ in } \Omega, \quad u = g \text{ on } \partial \Omega
\end{align*}
is well-posed in the variational sense when $f \in H^{-1}(\Omega) = (H^1_0(\Omega))^*$ and $g \in H^{1/2}(\partial \Omega)$. The condition $u = g$ on $\partial \Omega$ means $Tu = g$ in $L^2(\partial \Omega; \mathcal{H}^{n-1})$. One lifts $g$ to a function $G \in H^1(\Omega)$ with $TG = g$ (using the surjectivity of $T$ onto $H^{1/2}(\partial \Omega)$), then solves for $v = u - G \in H^1_0(\Omega)$ satisfying $-\Delta v = f + \Delta G$ — a homogeneous boundary problem solvable by the Lax–Milgram theorem.
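Written out in weak form, the reduction reads as follows. This is a sketch in which $\langle \cdot, \cdot \rangle$ denotes the pairing between $H^{-1}(\Omega)$ and $H^1_0(\Omega)$, a notation assumed here. One seeks $v \in H^1_0(\Omega)$ with
\begin{align*}
\int_\Omega \nabla v \cdot \nabla \phi\, d\mathcal{L}^n = \langle f, \phi \rangle - \int_\Omega \nabla G \cdot \nabla \phi\, d\mathcal{L}^n \quad \text{for all } \phi \in H^1_0(\Omega).
\end{align*}
The right-hand side is a bounded linear functional of $\phi$, since $\left| \int_\Omega \nabla G \cdot \nabla \phi\, d\mathcal{L}^n \right| \leq \|\nabla G\|_{L^2} \|\nabla \phi\|_{L^2}$, and the left-hand side is coercive on $H^1_0(\Omega)$ by the Poincaré inequality on the bounded domain $\Omega$. The function $u = v + G$ then satisfies $Tu = Tv + TG = g$. The solution does not depend on the choice of lifting: two solutions obtained from different liftings differ by a weakly harmonic element of $H^1_0(\Omega)$, which must vanish.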
Having established what traces are and how they relate to interior Sobolev regularity, we now address a complementary problem: given a Sobolev function on $\Omega$, when can we extend it to all of $\mathbb{R}^n$ in a way that preserves Sobolev regularity? Extension theorems provide constructive answers and are essential for transferring results proved on $\mathbb{R}^n$ to bounded domains.
# 4. Extensions
Many fundamental theorems about Sobolev spaces — Sobolev inequalities, Fourier-analytic estimates, approximation results — are most naturally proved for functions defined on all of $\mathbb{R}^n$, where one has the full machinery of convolution and scaling. In practice, however, functions arise on bounded domains $\Omega \subset \mathbb{R}^n$. The extension problem asks: given $u \in W^{1,p}(\Omega)$, can we find $Eu \in W^{1,p}(\mathbb{R}^n)$ that agrees with $u$ on $\Omega$ and whose global Sobolev norm is controlled by the local one? This chapter constructs such an extension operator, starting with the elementary half-space case and building up to the Stein extension theorem for Lipschitz domains.
## Reflection across Half-Spaces
Before tackling general domains, we consider the simplest non-trivial case: the upper half-space $\mathbb{R}^n_+ = \{x = (x', x_n) \in \mathbb{R}^n : x_n > 0\}$. A function $u \in W^{1,p}(\mathbb{R}^n_+)$ lives only on the upper half, and we want to extend it to all of $\mathbb{R}^n$ without losing the Sobolev regularity. The obstacle is the boundary $\{x_n = 0\}$: a naive extension by zero is discontinuous there and introduces a distributional jump in the normal derivative, destroying the $W^{1,p}$ class. The question is whether we can glue $u$ across the boundary in a way that eliminates this jump — and the answer, for the half-space, is yes: an even reflection does exactly this.
The idea is to extend by an even reflection: define $Eu$ on the lower half by reflecting the argument across $\{x_n = 0\}$.
[definition: Extension by Reflection]
Let $u \in W^{1,p}(\mathbb{R}^n_+)$ for $1 \leq p \leq \infty$. The **even reflection** of $u$ is the function $Eu: \mathbb{R}^n \to \mathbb{R}$ defined by
\begin{align*}
Eu(x', x_n) := \begin{cases} u(x', x_n) & \text{if } x_n \geq 0, \\ u(x', -x_n) & \text{if } x_n < 0. \end{cases}
\end{align*}
[/definition]
The key point is that this extension preserves the $W^{1,p}$ class. To see why, observe that $Eu \in L^p(\mathbb{R}^n)$ because the transformation $(x', x_n) \mapsto (x', -x_n)$ is measure-preserving, so
\begin{align*}
\|Eu\|_{L^p(\mathbb{R}^n)}^p = \|u\|_{L^p(\mathbb{R}^n_+)}^p + \|u(x', -\cdot)\|_{L^p(\mathbb{R}^n_-)}^p = 2\|u\|_{L^p(\mathbb{R}^n_+)}^p.
\end{align*}
For the weak derivatives, the tangential components $\partial_{x_i} Eu$ for $i < n$ are obtained by the same reflection of $\partial_{x_i} u$, and the same $L^p$ bound applies. The normal derivative requires more care: the reflection flips the sign in the $x_n$ direction, so
\begin{align*}
\partial_{x_n} Eu(x', x_n) = \begin{cases} \partial_{x_n} u(x', x_n) & \text{if } x_n > 0, \\ -\partial_{x_n} u(x', -x_n) & \text{if } x_n < 0. \end{cases}
\end{align*}
One must verify that the formula on the two halves patches correctly at $\{x_n = 0\}$ in the sense of distributions; this requires checking that no boundary term appears when integrating by parts. For $\phi \in C_c^\infty(\mathbb{R}^n)$, integration by parts over $\mathbb{R}^n_+$ yields an outward-normal boundary term $-\int_{\{x_n=0\}} u(x',0)\phi(x',0)\,d\mathcal{H}^{n-1}$, while integration over $\mathbb{R}^n_-$ yields $+\int_{\{x_n=0\}} u(x',0)\phi(x',0)\,d\mathcal{H}^{n-1}$ (outward normals on the two halves point in opposite directions). Because the even reflection gives the same trace $u(x',0)$ on both sides, these boundary terms cancel exactly, and no distributional jump term arises. By contrast, an odd reflection, $Eu(x', x_n) = -u(x', -x_n)$ for $x_n < 0$, has trace $-u(x',0)$ from below, so the two boundary contributions have the same sign and add up: the distributional derivative $\partial_{x_n} Eu$ acquires the singular part $2u(x',0)\,\delta_{x_n=0}$, which destroys the $W^{1,p}$ class unless the trace of $u$ vanishes.
[quotetheorem:3101]
[citeproof:3101]
<!-- illustration-needed: the even reflection across {x_n = 0} — show a graph of u on the upper half, its mirror image on the lower half, the resulting continuous function Eu at the boundary, and contrast with the odd reflection which produces a jump discontinuity there -->
Several aspects of the theorem deserve comment. First, the constant $C = 2^{1/p}$ is sharp for the $L^p$ component (the reflection literally doubles the mass), and it is this reflection bound that enters all later estimates in the Stein theorem. Second, the hypothesis $u \in W^{1,p}$, not merely $u \in L^p$, is essential: the even reflection of an $L^p$ function is again in $L^p$, but without the original derivative bound there is no reason for the reflected piece to be weakly differentiable in the normal direction with the correct gluing at the boundary. Third, this operator does not preserve $W^{2,p}$ regularity in general: the normal derivative of $Eu$ has a sign flip across $\{x_n = 0\}$, so $\partial^2_{x_n}(Eu)$ acquires a distributional term on the boundary whenever the trace of $\partial_{x_n}u$ is nonzero. For higher-order Sobolev regularity, more sophisticated higher-order reflection formulas are required.
The half-space is the model for the local behaviour near any point of the boundary of a Lipschitz domain: the domain looks like a half-space in a suitable coordinate system. The general extension theorem works by reducing to this model via a partition of unity and local coordinate changes.
## The Stein Extension Theorem
With the half-space construction in hand, we can state and prove the extension theorem for bounded Lipschitz domains. Recall that $\Omega \subset \mathbb{R}^n$ is a **bounded Lipschitz domain** if $\partial\Omega$ can be covered by finitely many open balls $B_1, \ldots, B_k$ such that, in each $B_j$, $\partial\Omega$ is the graph of a Lipschitz function in some rotation of coordinates.
[quotetheorem:3102]
[citeproof:3102]
The Stein theorem is the central result of this chapter, and its hypotheses reward scrutiny. The Lipschitz condition on $\partial\Omega$ cannot simply be dropped: the next section gives a concrete cusp counterexample showing the theorem fails without it. Boundedness of $\Omega$ is used to obtain a finite cover. For an unbounded domain the partition of unity has infinitely many pieces, and the constant $C(\Omega, p)$ in condition (iii) stays finite only if the local Lipschitz constants and the overlap of the cover are controlled uniformly. The half-space is the simplest unbounded example where this uniformity holds, since the single reflection above already gives a bounded extension. The operator $E$ is not canonical: different choices of cutoffs and local coordinates produce different extension operators, each satisfying a bound of the same form with a constant depending on those choices.
The operator $E$ is universal in $p$: the same geometric construction works for all $p \in [1, \infty]$, and this is a structural surprise. One might expect that an extension operator tuned for $L^p$ functions would need to be redesigned for different $p$, but the purely geometric nature of the even reflection means the same map works simultaneously across all exponents. In particular, if $u \in W^{1,p}(\Omega) \cap W^{1,q}(\Omega)$ then $Eu \in W^{1,p}(\mathbb{R}^n) \cap W^{1,q}(\mathbb{R}^n)$ using the same extension.
[remark: Extension and Approximation]
The Stein extension theorem interacts well with approximation by smooth functions (Meyers-Serrin, Chapter 2): for $1 \leq p < \infty$, mollification shows that $C^\infty(\mathbb{R}^n) \cap W^{1,p}(\mathbb{R}^n)$ is dense in $W^{1,p}(\mathbb{R}^n)$, so applying $E$ and then mollifying in $\mathbb{R}^n$ gives a sequence of smooth functions approximating $u$ in $W^{1,p}(\Omega)$. This is one standard proof that $C^\infty(\bar\Omega)$ is dense in $W^{1,p}(\Omega)$ for bounded Lipschitz domains when $p < \infty$.
[/remark]
## Construction in Detail: Partition of Unity and Local Flattening
The proof above rests on two steps whose interaction is genuinely subtle. The straightening map $\Phi_j$ is only Lipschitz — not smooth — so when we compose with it, the chain rule produces derivatives of $\Phi_j$ itself, and these are only in $L^\infty$, not continuous. Why does this not destroy the $W^{1,p}$ regularity? And how does the partition-of-unity multiplication interact with weak derivatives, producing Leibniz-rule terms that must be uniformly bounded? These are the key points the proof sketch glosses over, and here we work through them carefully.
<!-- illustration-needed: the local flattening map Phi_j — show a curved Lipschitz boundary segment, the Lipschitz function gamma_j whose graph defines the boundary, and the shear map Phi_j(x',x_n) = (x', x_n - gamma_j(x')) that maps the domain-above-the-graph onto the upper half-ball, with the boundary mapping to {x_n = 0} -->
**The cover and partition of unity.** Since $\partial\Omega$ is compact, we can find a finite cover $\{B_j\}_{j=1}^k$ of $\partial\Omega$ by open balls in which $\partial\Omega$ is Lipschitz-flat, together with an open set $B_0 \subset\subset \Omega$. We choose a partition of unity $\{\zeta_j\}_{j=0}^k$ subordinate to this cover: each $\zeta_j \in C_c^\infty(B_j)$, $0 \leq \zeta_j \leq 1$, and $\sum_{j=0}^k \zeta_j = 1$ on $\bar\Omega$. The interior piece $\zeta_0 u$ is already in $W^{1,p}(\mathbb{R}^n)$ after extending by zero, so it needs no modification. Each boundary piece $u_j = \zeta_j u$ is supported in $B_j \cap \Omega$ and needs to be extended across $B_j \cap \partial\Omega$.
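The Leibniz-rule terms produced by the cutoffs can be bounded explicitly. As a sketch for any $1 \leq p \leq \infty$, using only $0 \leq \zeta_j \leq 1$ and the smoothness of $\zeta_j$:

\begin{align*}
\nabla(\zeta_j u) &= \zeta_j \nabla u + u\, \nabla\zeta_j, \\
\|\zeta_j u\|_{L^p(\Omega)} &\leq \|u\|_{L^p(\Omega)}, \qquad \|\nabla(\zeta_j u)\|_{L^p(\Omega)} \leq \|\nabla u\|_{L^p(\Omega)} + \|\nabla\zeta_j\|_{L^\infty}\,\|u\|_{L^p(\Omega)}.
\end{align*}

Since the cover is finite, $\max_j \|\nabla\zeta_j\|_{L^\infty}$ is a finite number depending only on the chosen cover, which is why these terms stay uniformly bounded.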
**Local flattening.** On each boundary ball $B_j$, the hypothesis that $\Omega$ is Lipschitz means there is a rotation of coordinates and a Lipschitz function $\gamma_j: \mathbb{R}^{n-1} \to \mathbb{R}$ such that
\begin{align*}
B_j \cap \Omega &= B_j \cap \{x : x_n > \gamma_j(x')\}, \\
B_j \cap \partial\Omega &= B_j \cap \{x : x_n = \gamma_j(x')\}.
\end{align*}
The change of variables $\Phi_j(x', x_n) = (x', x_n - \gamma_j(x'))$ flattens the boundary: it maps $B_j \cap \Omega$ onto a region above $\{x_n = 0\}$. The map $\Phi_j$ is bi-Lipschitz with Lipschitz constant depending on $\|\nabla\gamma_j\|_{L^\infty}$.
**Sobolev regularity under Lipschitz changes of variables.** If $\Phi: U \to V$ is bi-Lipschitz and $v \in W^{1,p}(V)$, then $v \circ \Phi \in W^{1,p}(U)$ with
\begin{align*}
\|v \circ \Phi\|_{W^{1,p}(U)} \leq C\|v\|_{W^{1,p}(V)},
\end{align*}
where $C$ depends on the Lipschitz constants of $\Phi$ and $\Phi^{-1}$. This follows from the chain rule for Sobolev functions: $\nabla(v \circ \Phi)(x) = (D\Phi(x))^\top \nabla v(\Phi(x))$, valid a.e. since $\Phi$ is differentiable a.e. by Rademacher's theorem. The key point is that $D\Phi_j(x) = I + \text{(lower-triangular correction involving }\nabla\gamma_j\text{)}$, whose entries are bounded in $L^\infty$ by $\|\nabla\gamma_j\|_{L^\infty}$. Thus the chain rule gives an $L^p$ bound on $\nabla(v \circ \Phi_j)$, uniformly in the choice of boundary ball, with constant controlled by the Lipschitz constant of $\partial\Omega$.
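Written out for the shear $\Phi_j(x', x_n) = (x', x_n - \gamma_j(x'))$, the chain-rule estimate takes the following form. This is a sketch in block-matrix notation (the block layout is a notational choice made here), valid at the almost every point where $\gamma_j$ is differentiable, with $V = \Phi_j(U)$.

\begin{align*}
D\Phi_j(x) &= \begin{pmatrix} I_{n-1} & 0 \\ -\nabla\gamma_j(x')^\top & 1 \end{pmatrix}, \qquad \det D\Phi_j(x) = 1, \\
\partial_{x_i}(v \circ \Phi_j) &= (\partial_{x_i} v)\circ\Phi_j - \partial_{x_i}\gamma_j\,(\partial_{x_n} v)\circ\Phi_j \quad (i < n), \qquad \partial_{x_n}(v\circ\Phi_j) = (\partial_{x_n} v)\circ\Phi_j, \\
\|\nabla(v\circ\Phi_j)\|_{L^p(U)} &\leq \big(1 + \|\nabla\gamma_j\|_{L^\infty}\big)\,\|\nabla v\|_{L^p(V)}.
\end{align*}

The unit Jacobian means the change of variables contributes no factor to the $L^p$ norms, so the Lipschitz constant of $\gamma_j$ enters only through the gradient.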
**The reflection step.** After flattening, each piece $u_j \circ \Phi_j^{-1}$ is a $W^{1,p}$ function on a half-ball. We apply the even reflection from the previous section to extend it to the full ball, then pull back via $\Phi_j$. The key point is that the reflected function is supported strictly inside $B_j$ (away from $\partial B_j$) because $u_j$ was already compactly supported inside $B_j$ before reflection.
[example: Sharp Lipschitz Constant Dependence]
Let $\Omega_L = \{(x_1, x_2) \in \mathbb{R}^2 : 0 < x_1 < 1,\, 0 < x_2 < Lx_1\}$ for a parameter $L > 0$. This is a triangular wedge domain with Lipschitz boundary; the lower edge $\{x_2 = 0\}$ is flat and the upper edge $\{x_2 = Lx_1\}$ is the graph of $\gamma(x_1) = Lx_1$ with $\|\nabla\gamma\|_{L^\infty} = L$, so the Lipschitz constant of $\partial\Omega_L$ grows with $L$. Consider $u(x_1, x_2) = x_2$ restricted to $\Omega_L$, for which $|\nabla u| = 1$. Flattening the upper edge by the shear $\Phi(x_1, x_2) = (x_1, x_2 - Lx_1)$, reflecting evenly, and pulling back gives $Eu(x_1, x_2) = 2Lx_1 - x_2$ on the reflected side $\{x_2 > Lx_1\}$ (before any cutoff is applied), so $|\nabla(Eu)| = \sqrt{1 + 4L^2}$ there. The gradient is therefore amplified pointwise by a factor that grows linearly in $L$, and the bound produced by this construction degenerates as $L \to \infty$. This is a structural fact about the construction: the theorem guarantees a bounded extension operator for each fixed domain, but the constant it delivers deteriorates as the boundary becomes more oblique, a direct consequence of the chain-rule factor $D\Phi_j$ growing with $\|\nabla\gamma_j\|_{L^\infty}$.
[/example]
## Why the Lipschitz Condition is the Threshold
The Lipschitz hypothesis on $\Omega$ cannot be weakened to merely $C^0$ (continuous boundary) without losing the extension property. The astonishing fact is that any weakening of the Lipschitz condition — even to Hölder $C^{0,\alpha}$ for $\alpha < 1$ — can destroy the extension property in general. The reason is that the construction depends critically on the bi-Lipschitz change of variables $\Phi_j$. For a continuous but non-Lipschitz boundary, this change of variables would not preserve the Sobolev class.
A classical counterexample illustrates why. Consider the **outward cusp domain** (the domain itself, rather than its complement, narrows to a point)
\begin{align*}
\Omega = \{(x_1, x_2) \in \mathbb{R}^2 : 0 < x_1 < 1, \; x_1^2 < x_2 < 2x_1^2\}.
\end{align*}
The boundary of $\Omega$ near the origin is a cusp: the domain narrows to a point, so no bi-Lipschitz change of variables can flatten the boundary there. One can construct $W^{1,p}$ functions on $\Omega$ that have no extension in $W^{1,p}(\mathbb{R}^2)$: for instance, when $2 < p < 3$, functions that blow up like $x_1^{-\beta}$ (for a suitable small $\beta > 0$) near the cusp tip, where the width of $\Omega$ is of order $x_1^2$. The thinness of the cusp makes such a singularity integrable over $\Omega$, whereas any extension in $W^{1,p}(\mathbb{R}^2)$ with $p > 2$ would be bounded by Morrey's inequality (Chapter 5). Consequently no extension operator $W^{1,p}(\Omega) \to W^{1,p}(\mathbb{R}^2)$ can exist for such $p$.
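A concrete instance can be checked by hand. The sketch below takes $u(x) = x_1^{-\beta}$ on the cusp domain above; this particular $u$ and the restriction to $2 < p < 3$ are assumptions made here for illustration, and the last step uses Morrey's inequality from Chapter 5. Since the vertical cross-section of $\Omega$ at $x_1$ has length $2x_1^2 - x_1^2 = x_1^2$,

\begin{align*}
\int_\Omega |u|^p\,d\mathcal{L}^2 &= \int_0^1 x_1^{-\beta p}\, x_1^2\,dx_1 = \int_0^1 x_1^{2-\beta p}\,dx_1 < \infty \iff \beta p < 3, \\
\int_\Omega |\nabla u|^p\,d\mathcal{L}^2 &= \beta^p \int_0^1 x_1^{2-(\beta+1)p}\,dx_1 < \infty \iff (\beta+1)p < 3.
\end{align*}

For $2 < p < 3$ and $0 < \beta < 3/p - 1$ both integrals are finite, so $u \in W^{1,p}(\Omega)$, yet $u$ is unbounded near the origin. Any function in $W^{1,p}(\mathbb{R}^2)$ with $p > 2$ has a bounded continuous representative, so no such function can agree with $u$ almost everywhere on $\Omega$.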
<!-- illustration-needed: the outward cusp domain: draw the parabolic boundary curves x_2 = x_1^2 and x_2 = 2x_1^2 meeting at the origin, shade the domain between them, and mark the cone condition failure: any cone from the cusp point immediately exits the domain, while a Lipschitz domain admits a uniform cone at every boundary point -->
[explanation: Geometric Meaning of the Lipschitz Condition]
The Lipschitz condition on $\partial\Omega$ is a quantitative regularity hypothesis: it says that the boundary is locally the graph of a function with bounded slope, i.e., there is no vertical tangent and no cusp. Geometrically, a Lipschitz domain satisfies the **uniform cone condition**: there exists a fixed finite open cone $C \subset \mathbb{R}^n$ such that every boundary point $x_0 \in \partial\Omega$ is the vertex of a cone congruent to $C$ (a rotated and translated copy) that lies in $\Omega$. This uniform geometry is what allows the local flattening maps $\Phi_j$ to be bi-Lipschitz with a constant independent of the boundary point, which is exactly what is needed for the change-of-variables estimates to be uniform across the finitely many pieces of the partition of unity.
For a cuspidal domain, the cone condition fails at the cusp: any cone from the cuspidal point immediately exits $\Omega$. This is the geometric reason why the extension theorem breaks down.
[/explanation]
[remark: Higher-Order Extensions]
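The reflection construction of this chapter extends to higher-order Sobolev spaces $W^{k,p}(\Omega)$ for $k \geq 2$ under correspondingly higher regularity hypotheses on $\Omega$: the flattening map must preserve $W^{k,p}$, which requires the boundary to be of class $C^{k-1,1}$, and the even reflection must be replaced by a higher-order reflection that matches normal derivatives up to order $k-1$. For $k = 1$ this is exactly the Lipschitz condition, the $C^{0,1}$ case of the $C^{k-1,1}$ scale. Stein's original extension operator, built from averaged reflections rather than a single one, goes further: on a Lipschitz domain it is bounded on $W^{k,p}(\Omega)$ simultaneously for all $k$ and all $1 \leq p \leq \infty$.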
[/remark]
The extension theorem is primarily a tool rather than an end in itself. Its main applications come in the following chapters:
- In Chapter 5 (Sobolev Inequalities), the Gagliardo-Nirenberg-Sobolev inequality is first proved on $\mathbb{R}^n$, and the extension theorem carries it to bounded Lipschitz domains.
- In Chapter 6 (Compactness), the Rellich-Kondrachov theorem — the fact that the embedding $W^{1,p}(\Omega) \hookrightarrow\hookrightarrow L^q(\Omega)$ is compact — is proved by extending functions to $\mathbb{R}^n$ and applying the Arzelà-Ascoli criterion after mollification.
- In Chapter 10 (BV Traces and Extensions), the BV Extension Theorem uses the same partition-of-unity and local-flattening framework as the Stein theorem. The even reflection is applied to BV functions, and because the traces from the two sides of the reflected boundary agree, the extension picks up no jump contribution to the total variation there.
With extensions at hand, we can now exploit the regularity of Sobolev functions to establish quantitative bounds relating the integral of a function to the integral of its derivatives. These Sobolev inequalities are powerful tools that control the size of functions globally based only on their gradient information.
# 5. Sobolev Inequalities
The previous chapters established the Sobolev space $W^{1,p}(U)$ as the right setting for variational problems, proved that smooth functions are dense (Meyers–Serrin), constructed traces on the boundary, and extended functions across it. What remains is to understand the quantitative relationship between a Sobolev function and its gradient: how much integrability does having $p$-integrable first derivatives actually buy? The answer depends critically on the comparison between $p$ and the dimension $n$, and the resulting three-regime picture — subcritical $p < n$, critical $p = n$, and supercritical $p > n$ — is one of the most structurally important facts in analysis.
## Dimensional Analysis and the Sobolev Exponent
Before proving any inequality, a scaling argument tells us exactly what to expect. If $u \in W^{1,p}(\mathbb{R}^n)$ and we want a bound of the form $\|u\|_{L^q} \leq C \|\nabla u\|_{L^p}$, then both sides must scale the same way under the dilation $u_\lambda(x) := u(\lambda x)$.
The $L^q$ norm transforms as:
\begin{align*}
\|u_\lambda\|_{L^q(\mathbb{R}^n)} = \lambda^{-n/q} \|u\|_{L^q(\mathbb{R}^n)},
\end{align*}
while the $L^p$ norm of the gradient transforms as:
\begin{align*}
\|\nabla u_\lambda\|_{L^p(\mathbb{R}^n)} = \lambda^{1-n/p} \|\nabla u\|_{L^p(\mathbb{R}^n)},
\end{align*}
since $\nabla u_\lambda(x) = \lambda (\nabla u)(\lambda x)$. For the inequality to be scale-invariant — i.e., for the constant $C$ to be independent of $\lambda$ — we need the exponents to match:
\begin{align*}
-\frac{n}{q} = 1 - \frac{n}{p}.
\end{align*}
This equation has a solution only when $p < n$, and in that case it gives the unique admissible exponent.
[definition: Sobolev Conjugate Exponent]
Let $1 \leq p < n$. The **Sobolev conjugate** (or **Sobolev exponent**) of $p$ is
\begin{align*}
p^* := \frac{np}{n-p}.
\end{align*}
When $p \geq n$, no such exponent exists; the scaling argument predicts that the embedding must take a qualitatively different form.
[/definition]
The Sobolev conjugate satisfies $p^* > p$ (improved integrability over $u$ itself) and $p^* \to \infty$ as $p \to n^-$. The reciprocal relation $1/p^* = 1/p - 1/n$ is also useful. When $p = 1$, $p^* = n/(n-1)$, which will appear in the proof via the isoperimetric inequality.
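Solving the scaling relation for $q$ makes the definition explicit. The numerical values in the second line are illustrative choices of $n$ and $p$, not cases singled out by the theory.

\begin{align*}
-\frac{n}{q} = 1 - \frac{n}{p} &\iff \frac{1}{q} = \frac{1}{p} - \frac{1}{n} = \frac{n-p}{np} \iff q = \frac{np}{n-p} = p^*, \\
n = 3:\quad p = 1 &\mapsto p^* = \tfrac{3}{2}, \qquad p = 2 \mapsto p^* = 6, \qquad p = \tfrac{5}{2} \mapsto p^* = 15.
\end{align*}

The quantity $1/p - 1/n$ is positive exactly when $p < n$, which is why a finite positive $q$ exists only in the subcritical range.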
## The Gagliardo–Nirenberg–Sobolev Inequality
The main result for the subcritical regime is the Gagliardo–Nirenberg–Sobolev inequality, which asserts that the $L^{p^*}$ norm of $u$ is controlled by the $L^p$ norm of its gradient alone — no lower-order term.
[quotetheorem:61]
[citeproof:61]
The inequality has a clean geometric content when $p = 1$: combined with the BV coarea formula, it is equivalent to the isoperimetric inequality for sets of finite perimeter — the ball minimizes perimeter among all sets of given volume. This connection is made precise in Chapter 11 (BV Coarea Formula) and Chapter 12 (isoperimetric inequality for sets of finite perimeter).
[remark: Boundary Conditions and the GNS Inequality]
The assumption of compact support (or zero trace on $\partial U$) is necessary: without it, constant functions have $\|\nabla u\|_{L^p} = 0$ but $\|u\|_{L^{p^*}} > 0$. The inequality fails for $W^{1,p}(U)$ without a boundary condition; the correct substitute is the Poincaré inequality, which controls $\|u - u_U\|_{L^{p^*}}$ by $\|\nabla u\|_{L^p}$.
[/remark]
## The Poincaré Inequality on Balls
On a bounded domain, the gradient controls not $\|u\|_{L^p}$ itself but the oscillation of $u$ around its mean. The Poincaré inequality on balls is the local version of this principle, and it does not require any boundary condition.
[definition: Integral Average]
For $u \in L^1(B)$ and a ball $B = B(x_0, r) \subset \mathbb{R}^n$, the **integral average** of $u$ over $B$ is
\begin{align*}
u_B := \fint_B u\, d\mathcal{L}^n := \frac{1}{\mathcal{L}^n(B)} \int_B u\, d\mathcal{L}^n.
\end{align*}
[/definition]
The average $u_B$ is the unique constant minimizing $\|u - c\|_{L^2(B)}$ over $c \in \mathbb{R}$, which is why subtracting it is the natural way to remove the "constant ambiguity" inherent in gradient estimates.
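The minimizing property is a short computation for $u \in L^2(B)$ (an assumption needed for the $L^2$ norm to be finite). Writing $u - c = (u - u_B) + (u_B - c)$ and using $\int_B (u - u_B)\,d\mathcal{L}^n = 0$ to kill the cross term:

\begin{align*}
\int_B |u - c|^2\,d\mathcal{L}^n = \int_B |u - u_B|^2\,d\mathcal{L}^n + (u_B - c)^2\,\mathcal{L}^n(B).
\end{align*}

The right-hand side is smallest exactly when $c = u_B$.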
[quotetheorem:3103]
[citeproof:3103]
The Poincaré inequality on balls plays a central role throughout PDE theory: it enters into Moser iteration, the De Giorgi–Nash–Moser regularity theory, and the proof of the Rellich–Kondrachov compactness theorem that will appear in Chapter 6.
[remark: Poincaré–Friedrichs Inequality]
When $u \in W^{1,p}_0(U)$ for a bounded open set $U$, the trace of $u$ on $\partial U$ vanishes (in the sense of Sobolev traces, Chapter 3), and no averaging is needed:
\begin{align*}
\|u\|_{L^p(U)} \leq C(n, p, U)\, \|\nabla u\|_{L^p(U)}.
\end{align*}
This **Poincaré–Friedrichs inequality** follows from the ball Poincaré inequality by covering $U$ by balls and using the boundary condition to eliminate the mean-value term. It implies that $\|\nabla u\|_{L^p}$ is an equivalent norm on $W^{1,p}_0(U)$ when $U$ is bounded.
[/remark]
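In one dimension the mechanism behind the Poincaré–Friedrichs inequality is transparent. As a sketch for $u \in C_c^\infty((a,b))$ and $1 \leq p < \infty$ (the interval $(a,b)$ is an illustrative choice of $U$), the zero boundary value lets one write $u$ as the integral of its derivative and apply Hölder's inequality:

\begin{align*}
|u(x)| = \left|\int_a^x u'(t)\,dt\right| \leq \int_a^b |u'(t)|\,dt \leq (b-a)^{1-1/p}\,\|u'\|_{L^p(a,b)}, \qquad\text{hence}\qquad \|u\|_{L^p(a,b)} \leq (b-a)\,\|u'\|_{L^p(a,b)}.
\end{align*}

The constant scales with the width of the domain, and the estimate passes to all of $W^{1,p}_0((a,b))$ by density.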
## Morrey's Inequality
When $p > n$, the Sobolev conjugate $p^*$ would be negative — no $L^q$ gain is possible in the usual sense. Instead, the gain is in regularity: functions in $W^{1,p}$ are not merely measurable, they have a Hölder continuous representative.
[quotetheorem:62]
[citeproof:62]
Morrey's inequality captures the intuition that more integrability of $\nabla u$ forces $u$ itself to be smoother. At $p = \infty$ the function $u$ is Lipschitz (exponent $\alpha = 1$), consistent with the fact that $W^{1,\infty}(U) = C^{0,1}(\overline{U})$ for Lipschitz domains (a version of Rademacher's theorem in reverse). As $p \to n^+$, the Hölder exponent $1 - n/p \to 0$, and the regularity degenerates — consistent with the failure of the embedding at the critical exponent $p = n$, discussed next.
[example: Sharp Hölder Exponent in Morrey's Inequality]
Fix $n = 2$ and $p = 4$, so the Morrey exponent is $\alpha = 1 - 2/4 = 1/2$. Consider $u(x) = |x|^{1/2}$ on $\mathbb{R}^2$, which is Hölder continuous with exponent exactly $1/2$. Then $\nabla u(x) = \frac{x}{2|x|^{3/2}}$ for $x \neq 0$, so $|\nabla u(x)| = \frac{1}{2|x|^{1/2}}$. To test whether $\nabla u \in L^4_{\mathrm{loc}}(\mathbb{R}^2)$, compute on $B(0,1)$:
\begin{align*}
\int_{B(0,1)} |\nabla u|^4\, d\mathcal{L}^2 = \frac{1}{16} \int_{B(0,1)} |x|^{-2}\, d\mathcal{L}^2(x) = \frac{1}{16} \int_0^1 r^{-2} \cdot 2\pi r\, dr = \frac{\pi}{8} \int_0^1 r^{-1}\, dr = \infty.
\end{align*}
Thus $u \notin W^{1,4}(B(0,1))$, but only barely: the divergence is logarithmic. For any $\beta > 1/2$ the function $u_\beta(x) = |x|^\beta$ satisfies $|\nabla u_\beta|^4 = \beta^4 |x|^{4\beta - 4}$, whose integral over $B(0,1)$ is $2\pi\beta^4 \int_0^1 r^{4\beta - 3}\,dr < \infty$, so $u_\beta \in W^{1,4}(B(0,1))$; yet $u_\beta$ is Hölder continuous with exponent $\beta$ and no better at the origin. Letting $\beta \downarrow 1/2$ shows that no exponent larger than $1/2$ can hold for all of $W^{1,4}$, so the Morrey exponent is sharp.
[/example]
## The Critical Case $p = n$ and BMO
The case $p = n$ is a genuine borderline: the Sobolev exponent $p^*$ would be $+\infty$, suggesting that $W^{1,n}$ functions might be bounded. This turns out to be false.
[example: Failure of $L^\infty$ Embedding at $p = n$]
Take $n = 2$ and consider $u(x) = \log \log(1/|x|)$ on the ball $B(0, 1/2)$. Near the origin, $u(x) \to +\infty$, so $u \notin L^\infty(B(0, 1/2))$. A direct computation shows $\nabla u(x) = \frac{-1}{|x|^2 \log(1/|x|)} x$ for $x \neq 0$, giving
\begin{align*}
|\nabla u(x)| = \frac{1}{|x|\, \log(1/|x|)}.
\end{align*}
Computing the $L^2$ norm in polar coordinates on $B(0, 1/2)$:
\begin{align*}
\int_{B(0,1/2)} |\nabla u|^2\, d\mathcal{L}^2 = 2\pi \int_0^{1/2} \frac{1}{r^2 \log^2(1/r)} \cdot r\, dr = 2\pi \int_0^{1/2} \frac{dr}{r \log^2(1/r)}.
\end{align*}
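The substitution $s = \log(1/r)$ gives $ds = -dr/r$, transforming the integral to $2\pi \int_{\log 2}^\infty s^{-2}\, ds = 2\pi / \log 2 < \infty$. Since the singularity of $u$ at the origin is far milder than any power of $|x|$, $u$ itself lies in $L^2(B(0,1/2))$, and in dimension $n \geq 2$ a single point does not affect the weak gradient. Thus $u \in W^{1,2}(B(0,1/2))$ but $u \notin L^\infty$, demonstrating that $W^{1,n}$ does not embed into $L^\infty$.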
[/example]
The correct endpoint statement for $p = n$ is an embedding into **BMO** (bounded mean oscillation), the space of functions whose local oscillation around their averages is uniformly controlled. The precise result, and the companion exponential integrability estimate of Trudinger (sharpened by Moser, with refinements due to Brezis and Wainger), belongs to the theory of Orlicz spaces and is treated separately in the literature on BMO and exponential-integrability spaces. What matters for the present course is that the three-regime picture is complete: for $p < n$ one gains $L^{p^*}$-integrability, for $p > n$ one gains Hölder continuity, and at $p = n$ the embedding is into a function space that locally sits strictly between $L^\infty$ and every $L^q$ with $q < \infty$.
[remark: Summary of the Three Regimes]
The Sobolev inequalities divide into three qualitatively distinct cases, each arising from the same scaling analysis:
- **Subcritical ($1 \leq p < n$):** The embedding $W^{1,p}(\mathbb{R}^n) \hookrightarrow L^{p^*}(\mathbb{R}^n)$ holds with $p^* = np/(n-p)$. This is the Gagliardo–Nirenberg–Sobolev inequality.
- **Critical ($p = n$):** The embedding $W^{1,n}(\mathbb{R}^n) \hookrightarrow L^\infty(\mathbb{R}^n)$ fails. The correct target is $\mathrm{BMO}(\mathbb{R}^n)$.
- **Supercritical ($p > n$):** Functions in $W^{1,p}(\mathbb{R}^n)$ have Hölder continuous representatives with exponent $\alpha = 1 - n/p$. This is Morrey's inequality.
[/remark]
Sobolev inequalities tell us that functions with controlled derivatives are themselves controlled, but what happens when we consider sequences of Sobolev functions that remain bounded in these norms? The Rellich-Kondrachov theorem shows that such sequences must have convergent subsequences, a compactness result that is fundamental to the calculus of variations.
# 6. Compactness
The Sobolev inequalities of Chapter 5 bound a function's $L^q$ norm by the $L^p$ norm of its gradient. But for the direct method in the calculus of variations — the primary engine for proving existence of minimizers — we need more: we need to extract a convergent subsequence from a sequence bounded in $W^{1,p}$. This is precisely what compactness theorems provide. The central result of this chapter, the Rellich-Kondrachov theorem, asserts that the Sobolev embedding $W^{1,p}(\Omega) \hookrightarrow L^q(\Omega)$ is not merely bounded but actually compact, whenever $q$ is strictly below the critical exponent $p^* = np/(n-p)$. This strict subcritical condition is essential: at the critical exponent $q = p^*$, concentration phenomena destroy compactness. We prove the theorem, examine the necessity of each hypothesis through explicit counterexamples, and then show how compactness combines with weak convergence in reflexive spaces to yield the standard abstract existence framework used throughout modern PDE.
## The Rellich-Kondrachov Compactness Theorem
The embedding $W^{1,p}(\Omega) \hookrightarrow L^q(\Omega)$ is continuous for any $q \leq p^*$ (when $p < n$) by the Gagliardo-Nirenberg-Sobolev inequality. The remarkable additional fact is that, for $q$ strictly below $p^*$, this embedding is compact: bounded sequences in $W^{1,p}$ are precompact in $L^q$. The passage from continuous to compact hinges on a mollification argument that extracts equicontinuity from the gradient bound, followed by a diagonal subsequence construction.
[quotetheorem:64]
[citeproof:64]
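The proof reveals exactly where the strict inequality $q < p^*$ enters. In the standard argument, the mollification error is first controlled in $L^1$, where $\|u - u * \eta_\varepsilon\|_{L^1} \leq \varepsilon \|\nabla u\|_{L^1}$, and then interpolated against the GNS bound in $L^{p^*}$: for $1 \leq q < p^*$ one obtains $\|u - u * \eta_\varepsilon\|_{L^q} \leq C\varepsilon^{\theta}$, with $\theta \in (0,1]$ determined by $1/q = \theta + (1-\theta)/p^*$. As $q \to p^*$ the exponent $\theta$ tends to $0$ and the estimate carries no smallness in $\varepsilon$. The compactness statement would be false at the critical exponent, as we now show explicitly.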
[explanation: Failure at the Critical Exponent]
The hypothesis $q < p^*$ in the Rellich-Kondrachov theorem is sharp. Consider $p = 2$, $n = 3$, so $p^* = 6$, and take $\Omega = B(0,1)$. Define the concentrating sequence
\begin{align*}
u_k(x) = k^{(n-p)/p} \phi(kx) = k^{1/2} \phi(kx),
\end{align*}
where $\phi \in C_c^\infty(B(0,1))$ is a fixed nonnegative bump with $\|\phi\|_{W^{1,2}} = 1$. By a change of variables $y = kx$:
\begin{align*}
\int_{B(0,1)} |\nabla u_k|^2 \, d\mathcal{L}^3 &= k^{1} \cdot k^{-n} \cdot k^2 \int |\nabla \phi(y)|^2 \, d\mathcal{L}^3(y) = \|\nabla \phi\|_{L^2}^2,
\end{align*}
so, together with $\|u_k\|_{L^2} = k^{-1}\|\phi\|_{L^2} \to 0$, the sequence $(u_k)$ is bounded in $W^{1,2}$. However, $u_k$ concentrates its $L^{p^*} = L^6$ mass near the origin: $\|u_k\|_{L^6} = \|\phi\|_{L^6} > 0$ for all $k$, yet $u_k \to 0$ pointwise a.e. Any strong $L^6$ limit of a subsequence would have to be zero (as the a.e. limit), which is impossible since $\|u_k\|_{L^6}$ is bounded away from zero. This concentration phenomenon, in which mass squeezes toward a point without dissipating, is the geometric obstruction to compactness at the critical exponent.
For $q < p^*$, this concentration cannot persist: the same change of variables gives $\|u_k\|_{L^q}^q = k^{(n-p)q/p} \cdot k^{-n}\,\|\phi\|_{L^q}^q = k^{(n-p)q/p - n}\,\|\phi\|_{L^q}^q$, and the exponent $(n-p)q/p - n$ is negative exactly when $q < np/(n-p) = p^*$. The sequence therefore collapses to zero in $L^q$. The strict subcritical condition exactly prevents concentration from being $L^q$-visible.
[/explanation]
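For the specific sequence above ($n = 3$, $p = 2$, $u_k(x) = k^{1/2}\phi(kx)$), the dependence on $q$ can be read off directly from the substitution $y = kx$. The sample exponents $q = 2$ and $q = 4$ are illustrative choices.

\begin{align*}
\|u_k\|_{L^q}^q &= k^{q/2}\int |\phi(kx)|^q\,d\mathcal{L}^3(x) = k^{q/2 - 3}\,\|\phi\|_{L^q}^q, \qquad \|u_k\|_{L^q} = k^{1/2 - 3/q}\,\|\phi\|_{L^q}, \\
q = 2:&\ \ \|u_k\|_{L^2} = k^{-1}\|\phi\|_{L^2}, \qquad q = 4:\ \ \|u_k\|_{L^4} = k^{-1/4}\|\phi\|_{L^4}, \qquad q = 6:\ \ \|u_k\|_{L^6} = \|\phi\|_{L^6}.
\end{align*}

The decay rate $k^{1/2 - 3/q}$ slows as $q$ increases and disappears entirely at $q = 6 = p^*$.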
Similarly, the hypothesis that $\Omega$ is bounded cannot be removed. For a fixed nonzero $u \in W^{1,p}(\mathbb{R}^n)$, the translates $v_k(x) = u(x - ke_1)$ form a bounded sequence in $W^{1,p}(\mathbb{R}^n)$ with $\|v_k\|_{L^q} = \|u\|_{L^q}$ for every $k$. Any strong $L^q$ limit of a subsequence would have to vanish, since the mass of $v_k$ on each fixed ball tends to zero, and this contradicts the constant norm; hence no subsequence converges in $L^q(\mathbb{R}^n)$. The failure mode here is not concentration but escape to spatial infinity, which boundedness of $\Omega$ rules out. The Lipschitz boundary hypothesis is needed for the extension theorem used in the proof; it can be weakened, but not removed entirely: a domain with a cusp can fail the extension property, and the compactness conclusion can fail as well.
## Weak Compactness in Sobolev Spaces
The Rellich-Kondrachov theorem gives strong compactness in $L^q$. A complementary and equally important compactness result operates at the level of $W^{1,p}$ itself: weak compactness, which is available for free in any reflexive Banach space.
[quotetheorem:3104]
This theorem requires no hypothesis on the boundary of $\Omega$ and no relationship between $p$ and $n$; it is a purely functional-analytic consequence of the fact that $W^{1,p}(\Omega)$ is reflexive for $1 < p < \infty$. The space $W^{1,1}(\Omega)$ is not reflexive — bounded sequences in $W^{1,1}$ need not have weakly convergent subsequences in $W^{1,1}$. (They may converge to a BV function, which is the topic of Chapter 9.)
The proof invokes only general reflexivity: $L^p$ is reflexive for $1 < p < \infty$, and $W^{1,p}$ embeds isometrically as a closed subspace of a product of $L^p$ spaces via $u \mapsto (u, \partial_1 u, \ldots, \partial_n u)$, so it inherits reflexivity. A bounded sequence in a reflexive Banach space has a weakly convergent subsequence, by the Banach–Alaoglu theorem together with the Eberlein-Šmulian theorem.
Combining weak compactness in $W^{1,p}$ with strong compactness in $L^q$ via Rellich-Kondrachov is the standard setup: from a bounded sequence in $W^{1,p}$, one can extract a subsequence that converges both weakly in $W^{1,p}$ and strongly in $L^q$ for any $q < p^*$. This simultaneous convergence is precisely what the direct method requires.
## The Direct Method in the Calculus of Variations
Compactness is the engine of every modern existence proof for variational problems. To illustrate the pattern concretely, consider the minimization problem
\begin{align*}
\inf \left\{ \int_\Omega L(x, u, \nabla u) \, d\mathcal{L}^n : u \in W^{1,p}(\Omega),\ u = g \text{ on } \partial\Omega \right\},
\end{align*}
where $L: \Omega \times \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}$ is a Lagrangian and $g$ is a prescribed boundary datum.
Writing $\mathcal{F}[u]$ for the integral functional above, the proof of existence proceeds in three steps: (1) take a minimizing sequence $(u_k)$ with $\mathcal{F}[u_k] \to \inf \mathcal{F}$; (2) extract a convergent subsequence using compactness; (3) verify that the limit actually achieves the infimum using lower semicontinuity of $\mathcal{F}$.
[quotetheorem:3105]
[citeproof:3105]
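The convexity hypothesis in $\xi$ is the critical condition for lower semicontinuity. Without it, the functional can oscillate wildly along weakly convergent sequences, and the weak limit might have strictly higher energy than the limit of the energies along the sequence. This failure mode is not merely theoretical: the Bolza functional $\mathcal{F}[u] = \int_0^1 \big((|u'|^2 - 1)^2 + u^2\big) \, dt$ with $u(0) = u(1) = 0$ has infimum zero but no minimizer. The infimum is approached by rapidly oscillating sawtooth functions $u_k$ with $|u_k'| = 1$ almost everywhere and $u_k \to 0$ uniformly, so $\mathcal{F}[u_k] = \|u_k\|_{L^2}^2 \to 0$, whereas the weak limit has $\mathcal{F}[0] = \int_0^1 1 \, dt = 1 > 0 = \inf \mathcal{F}$. A minimizer would need both $u = 0$ and $|u'| = 1$ almost everywhere, which is impossible. (Without the $u^2$ term the infimum zero is attained, for instance by a single tent function, but weak lower semicontinuity still fails along the same sequence.)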
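An explicit oscillating sequence can be written down. As a sketch, take the sawtooth $u_k(t) = k^{-1}\,\mathrm{dist}(kt, \mathbb{Z})$ for $k \in \mathbb{N}$ (this specific formula is one convenient choice, not the only one). It vanishes at $t = 0$ and $t = 1$, has slope $\pm 1$ almost everywhere, and satisfies $0 \leq u_k \leq 1/(2k)$, so

\begin{align*}
\int_0^1 \big(|u_k'|^2 - 1\big)^2\,dt = 0, \qquad \int_0^1 u_k^2\,dt = \frac{1}{k^2}\int_0^1 \mathrm{dist}(s,\mathbb{Z})^2\,ds = \frac{1}{k^2}\cdot 2\int_0^{1/2} s^2\,ds = \frac{1}{12k^2}.
\end{align*}

Thus $u_k \to 0$ uniformly and $u_k' \rightharpoonup 0$ weakly in $L^4$, while the double-well term $(|u'|^2 - 1)^2$ vanishes along the sequence but equals $1$ at the limit $u = 0$.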
[example: Dirichlet Energy]
The simplest and most important application is the Dirichlet problem: minimize the Dirichlet energy $\mathcal{F}[u] = \int_\Omega |\nabla u|^2 \, d\mathcal{L}^n$ subject to $u = g$ on $\partial\Omega$ in $W^{1,2}(\Omega)$. Here $L(x, u, \xi) = |\xi|^2$, which is coercive with $\theta = 1$, $C = 0$, and strictly convex in $\xi$. The growth condition is satisfied with equality, $L(x, u, \xi) = |\xi|^2$.
The direct method applies with $p = 2$: a minimizing sequence $(u_k)$ satisfies $\|\nabla u_k\|_{L^2} \leq C$. By Poincaré (since $u_k - g \in W^{1,2}_0(\Omega)$), the full $W^{1,2}$ norms are bounded. Pass to a subsequence $u_{k_j} \rightharpoonup u^*$ weakly in $W^{1,2}$. The functional is weakly lower semicontinuous because
\begin{align*}
\int_\Omega |\nabla u^*|^2 \, d\mathcal{L}^n &\leq \liminf_{j \to \infty} \int_\Omega |\nabla u_{k_j}|^2 \, d\mathcal{L}^n,
\end{align*}
which follows from the weak convergence $\nabla u_{k_j} \rightharpoonup \nabla u^*$ in $L^2$ and the general fact that the $L^2$ norm is weakly lower semicontinuous (being a convex continuous functional). Thus $u^*$ minimizes the Dirichlet energy. By first-variation arguments, $u^*$ is weakly harmonic: $\int_\Omega \nabla u^* \cdot \nabla \phi \, d\mathcal{L}^n = 0$ for all $\phi \in C_c^\infty(\Omega)$, giving $\Delta u^* = 0$ in the distributional sense.
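The first-variation step can be spelled out as follows, for a test function $\phi \in C_c^\infty(\Omega)$ and a real parameter $t$ (the competitor $u^* + t\phi$ has the same boundary values as $u^*$, so minimality applies to it):

\begin{align*}
\mathcal{F}[u^* + t\phi] = \mathcal{F}[u^*] + 2t\int_\Omega \nabla u^*\cdot\nabla\phi\,d\mathcal{L}^n + t^2\int_\Omega |\nabla\phi|^2\,d\mathcal{L}^n \;\geq\; \mathcal{F}[u^*] \quad\text{for all } t \in \mathbb{R}.
\end{align*}

A quadratic polynomial in $t$ that is minimized at $t = 0$ must have vanishing linear coefficient, which is exactly the weak harmonicity identity.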
[/example]
The Dirichlet example is the cleanest application of the direct method, but it is also a case where the entire $p > 1$ machinery is essential. The reflexivity of $W^{1,2}$ is what gave us the weakly convergent subsequence; without it, bounded sequences can fail to have any meaningful limit. This raises the question of what happens at the boundary case $p = 1$, where Sobolev theory breaks down and a richer framework is needed.
[remark: The $p = 1$ Barrier]
The entire framework breaks down for $p = 1$. The space $W^{1,1}(\Omega)$ is not reflexive, so bounded sequences need not have weakly convergent subsequences in $W^{1,1}$. The correct replacement is the BV space: a sequence bounded in $BV(\Omega)$ has a subsequence converging in $L^1$ to a BV function (BV Compactness Theorem, Chapter 9), and the limit inherits the BV structure by lower semicontinuity of total variation (Chapter 9). This is why functionals involving the total variation $|Du|(\Omega)$, such as the total variation regularization functional $\mathcal{F}[u] = |Du|(\Omega) + \lambda \|u - f\|_{L^2}^2$, require the BV framework. The direct method still works, but one must replace $W^{1,1}$ with $BV$ everywhere.
[/remark]
The compactness results of this chapter are not merely tools for abstract existence proofs — they encode fundamental geometric information about how Sobolev functions can concentrate or spread mass. The subcritical threshold $q < p^*$ is the quantitative statement that Sobolev regularity prevents concentration of energy at single points when measured in a strictly subcritical norm. This theme of scale-invariance and concentration recurs throughout the BV theory developed in subsequent chapters, where the perimeter functional $P(E; \Omega) = |D\mathbb{1}_E|(\Omega)$ plays the role of a scale-invariant energy whose compactness properties parallel — and generalize — the Rellich-Kondrachov theorem for the special case of characteristic functions.
Compactness results in Sobolev spaces hinge on measuring how much a function can vary, and this naturally leads us to consider a related but more subtle notion: capacity, which measures the size of sets in a way that respects the structure of Sobolev spaces. Capacity provides a refined tool for distinguishing between sets that are negligible in the Sobolev sense.
# 7. Capacity
The preceding chapters equipped us with powerful tools for controlling Sobolev functions in the aggregate: Sobolev embeddings bound functions in $L^{p^*}$, the Poincaré inequality controls the $L^p$ norm by the gradient, and Rellich-Kondrachov extracts convergent subsequences from bounded families. What all of these results share is that they speak in the language of measure — they ignore sets of Lebesgue measure zero. Yet a Sobolev function $u \in W^{1,p}(\mathbb{R}^n)$ is only an equivalence class of functions defined up to $\mathcal{L}^n$-null sets, and when we ask whether $u$ has a meaningful pointwise value at a specific point $x$, or whether we can make sense of the trace of $u$ on a hypersurface, the notion of "almost everywhere" is simply too coarse. This chapter develops $p$-capacity — a set function that measures "smallness" from the perspective of Sobolev functions rather than Lebesgue measure — and uses it to construct canonical pointwise representatives of Sobolev functions via quasicontinuity.
## Definition of $p$-Capacity
When can we change the values of a Sobolev function on a set without leaving the Sobolev class? If $E \subset \mathbb{R}^n$ has Lebesgue measure zero, then modifying $u$ on $E$ does not change any $L^p$ norm, so any such modification produces a valid representative of the same equivalence class. But there is a finer question: is there a representative of the equivalence class that has genuinely nice pointwise behavior — say, continuity — away from some "small" set? And what does "small" mean here? A single point, a curve, a Cantor set of positive Hausdorff dimension? The right notion of smallness for Sobolev theory is capacity.
The key idea is that a set $K$ is "large" with respect to $W^{1,p}$ if every function that is at least $1$ near $K$ must have large Sobolev norm, so that the cost of "covering" $K$ with a Sobolev function is high. Conversely, $K$ has zero capacity precisely when it can be covered by Sobolev functions of arbitrarily small norm, meaning it is invisible to the Sobolev theory.
[definition: $p$-Capacity of a Compact Set]
Let $1 \le p < \infty$ and let $K \subset \mathbb{R}^n$ be compact. The **$p$-capacity** of $K$ is
\begin{align*}
\operatorname{Cap}_p(K) := \inf \left\{ \|u\|_{W^{1,p}(\mathbb{R}^n)}^p : u \in W^{1,p}(\mathbb{R}^n),\ u \ge 1 \text{ on a neighbourhood of } K \right\}.
\end{align*}
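For an open set $U \subset \mathbb{R}^n$, define
\begin{align*}
\operatorname{Cap}_p(U) := \sup \left\{ \operatorname{Cap}_p(K) : K \subset U,\ K \text{ compact} \right\},
\end{align*}
and for a general set $E \subset \mathbb{R}^n$, define
\begin{align*}
\operatorname{Cap}_p(E) := \inf \left\{ \operatorname{Cap}_p(U) : U \supset E,\ U \text{ open} \right\}.
\end{align*}
For compact sets the last formula agrees with the first one, thanks to the neighbourhood condition.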
[/definition]
Several remarks on this definition deserve immediate attention. The infimum for compact $K$ is taken over functions that are at least $1$ on a neighbourhood of $K$, not just on $K$ itself; since Sobolev functions are only defined up to $\mathcal{L}^n$-null sets, the condition "$u \ge 1$ on $K$" would be meaningless for a Lebesgue-null $K$, and the neighbourhood condition avoids this. Because smooth functions are dense in $W^{1,p}(\mathbb{R}^n)$ (by the Meyers-Serrin theorem of Chapter 2), the infimum is unchanged if we restrict to $u \in C_c^\infty(\mathbb{R}^n)$, which gives an equivalent formulation. The extension to general sets via open sets mirrors the outer regularity of Lebesgue outer measure and makes capacity an outer measure.
[remark: Equivalent Gradient Formulation]
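When $1 \le p < n$ and one only considers sets $E$ contained in a fixed ball $B(0,R)$, the full Sobolev norm $\|u\|_{W^{1,p}}^p = \|u\|_{L^p}^p + \|\nabla u\|_{L^p}^p$ can be replaced by the gradient term $\|\nabla u\|_{L^p}^p$ alone, up to a constant depending on $n$, $p$ and $R$. More precisely, one may restrict to functions $u$ with $0 \le u \le 1$ (by truncation) that are supported in $B(0,2R)$ (by multiplying with a cutoff function), and for such functions Hölder's inequality and the Sobolev inequality give $\|u\|_{L^p} \lesssim_R \|u\|_{L^{p^*}} \lesssim \|\nabla u\|_{L^p}$, so the two formulations define comparable quantities with the same null sets. This gradient-only version $\operatorname{Cap}_p(E) \asymp \inf \{ \|\nabla u\|_{L^p}^p : u \in W^{1,p},\ u \ge 1 \text{ on a neighbourhood of } E \}$ is often more natural in applications. The restriction $p < n$ matters: for $p > n$ the rescaled cutoffs $\eta(|x|/R)$ have gradient energy $\asymp R^{n-p} \to 0$ as $R \to \infty$, so the gradient-only infimum degenerates.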
[/remark]
The gradient-only formulation makes clear that capacity measures the "cost" of building a Sobolev function that detects the set — a set has small capacity if a cheap gradient is sufficient to cover it.
[example: Capacity of a Ball]
Let $B(0, r)$ be the ball of radius $r > 0$ centred at the origin. Take $u_r(x) = \eta(|x|/r)$ where $\eta: [0,\infty) \to [0,1]$ is smooth with $\eta \equiv 1$ on $[0,1]$ and $\eta \equiv 0$ on $[2, \infty)$. Then $u_r \ge 1$ on $B(0,r)$, the support of $u_r$ is contained in $B(0, 2r)$, and $|\nabla u_r(x)| \lesssim r^{-1}$. Computing:
\begin{align*}
\|\nabla u_r\|_{L^p(\mathbb{R}^n)}^p \lesssim r^{-p} \cdot \mathcal{L}^n(B(0, 2r)) \asymp r^{-p} \cdot r^n = r^{n-p}.
\end{align*}
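Since moreover $\|u_r\|_{L^p(\mathbb{R}^n)}^p \le \mathcal{L}^n(B(0,2r)) \asymp r^n \le r^{n-p}$ for $0 < r \le 1$, we obtain $\operatorname{Cap}_p(B(0,r)) \lesssim r^{n-p}$ for $0 < r \le 1$. The matching lower bound follows from the Sobolev inequality applied to functions admissible in the infimum. Altogether, $\operatorname{Cap}_p(B(0,r)) \asymp r^{n-p}$ when $1 \le p < n$ and $0 < r \le 1$. In particular, $\operatorname{Cap}_p(\{0\}) = 0$ when $p < n$ (take $r \to 0$). The same conclusion holds in the borderline case $p = n \ge 2$, but there $r^{n-p} = 1$ and the rescaled cutoff no longer suffices; a logarithmic cutoff is needed instead. By contrast, $\operatorname{Cap}_p(\{0\}) > 0$ when $p > n$ (because any $W^{1,p}$ function with $p > n$ is Hölder continuous, with supremum controlled by its $W^{1,p}$ norm by Morrey's inequality, so covering a single point with a function $\ge 1$ near it has non-trivial cost).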
[/example]
This example already reveals the threshold behaviour of capacity: the critical exponent is $p = n$. When $p > n$, even a single point has positive capacity, which corresponds to the fact that $W^{1,p}$ functions for $p > n$ embed into continuous functions (Morrey's theorem from Chapter 5). When $p \le n$ and $n \ge 2$, individual points have zero capacity: they are genuinely invisible to Sobolev theory. (Dimension one is exceptional, since $W^{1,1}(\mathbb{R})$ already embeds into the continuous functions.)
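The borderline case $p = n$ cannot be reached by rescaling a fixed cutoff, because the bound $r^{n-p}$ is then constant. A standard way to see that points are still $n$-polar in dimension $n \ge 2$ is a logarithmic cutoff. The following is a sketch; the family $u_\delta$ and the notation $\omega_n := \mathcal{L}^n(B(0,1))$ are introduced here for illustration only. For $0 < \delta < 1$ set
\begin{align*}
u_\delta(x) := \begin{cases} 1, & |x| \le \delta^2, \\ \dfrac{\log(\delta/|x|)}{\log(1/\delta)}, & \delta^2 < |x| < \delta, \\ 0, & |x| \ge \delta. \end{cases}
\end{align*}
Then $u_\delta$ is Lipschitz, equals $1$ on the neighbourhood $B(0,\delta^2)$ of the origin, and $|\nabla u_\delta(x)| = (|x| \log(1/\delta))^{-1}$ on the annulus. In polar coordinates,
\begin{align*}
\int_{\mathbb{R}^n} |\nabla u_\delta|^n \, d\mathcal{L}^n = \frac{n \omega_n}{(\log(1/\delta))^n} \int_{\delta^2}^{\delta} \frac{dr}{r} = \frac{n \omega_n}{(\log(1/\delta))^{n-1}}, \qquad \int_{\mathbb{R}^n} |u_\delta|^n \, d\mathcal{L}^n \le \omega_n \delta^n.
\end{align*}
Both quantities tend to $0$ as $\delta \to 0$ when $n \ge 2$, which gives $\operatorname{Cap}_n(\{0\}) = 0$. For $n = 1$ the exponent $n - 1$ vanishes and the gradient term stays equal to $2$, in line with the exceptional behaviour of dimension one.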
## Subadditivity and Basic Properties
The analogy with Lebesgue measure runs only so far. Capacity is monotone and countably subadditive — the same axioms as an outer measure — but it fails countable additivity, so it is not a measure. This failure is not a defect of the definition but a genuine and meaningful feature: it signals that capacity is detecting a different kind of geometric information than volume.
[quotetheorem:3106]
[citeproof:3106]
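Additivity fails already for finitely many disjoint sets. Let $1 \le p < n$ and partition the half-open unit cube $Q = [0,1)^n$ into the $2^{kn}$ disjoint half-open dyadic cubes of side $2^{-k}$. Each such cube contains a ball of radius $2^{-k-1}$, so by monotonicity and the ball computation above its capacity is $\gtrsim 2^{-k(n-p)}$. Summing over all $2^{kn}$ cubes gives a total $\gtrsim 2^{kn} \cdot 2^{-k(n-p)} = 2^{kp}$, which tends to infinity with $k$, while $\operatorname{Cap}_p(Q) \le \operatorname{Cap}_p(B(0,\sqrt{n})) < \infty$. Thus capacity cannot be countably (or even finitely) additive. Note also what subadditivity does imply: a countable union of sets of zero capacity has zero capacity, so whenever points are $p$-polar the dense countable set $\mathbb{Q}^n \cap B(0,1)$ is $p$-polar as well, even though its closure has positive capacity. This is precisely the point: capacity sees the "thickness" of the set, not just its cardinality or measure.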
## Capacity vs. Hausdorff Measure
Both capacity and Hausdorff measure quantify the "size" of sets, but they do so from fundamentally different perspectives. Hausdorff measure is purely geometric — it compares the set to scaled balls. Capacity is analytic — it measures the cost of covering the set with Sobolev functions. The bridge between these two perspectives is one of the most elegant results in the theory.
Before stating the comparison theorem, it helps to understand why the critical dimension $n - p$ arises. A set $E$ has $\mathcal{H}^s$-measure zero precisely when it can be covered by balls of total $r^s$-mass approaching zero. A set has $\operatorname{Cap}_p$-zero precisely when it can be covered by Sobolev functions of arbitrarily small norm. As the ball computation above showed, $\operatorname{Cap}_p(B(0,r)) \asymp r^{n-p}$, which matches the $(n-p)$-dimensional Hausdorff content of a ball. This dimensional matching is the heuristic behind the following theorem.
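One half of this heuristic can be turned into a short argument using only monotonicity, countable subadditivity, and the upper bound from the ball computation. The following is a sketch, assuming $1 \le p < n$ and writing $C$ for the implied constant in $\operatorname{Cap}_p(B(x,r)) \le C r^{n-p}$ for $0 < r \le 1$ (translation invariance of capacity is used implicitly). If $E \subset \bigcup_i B(x_i, r_i)$ with all $r_i \le 1$, then
\begin{align*}
\operatorname{Cap}_p(E) \le \sum_i \operatorname{Cap}_p(B(x_i, r_i)) \le C \sum_i r_i^{n-p}.
\end{align*}
If $\mathcal{H}^{n-p}(E) = 0$, the right-hand side can be made arbitrarily small by a suitable choice of cover, so $\operatorname{Cap}_p(E) = 0$. The reverse direction, from zero capacity to Hausdorff measure, is subtler.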
[quotetheorem:3107]
[citeproof:3107]
The theorem shows that $p$-polar sets are also small in the Hausdorff sense: zero capacity implies zero $\mathcal{H}^s$-measure for every $s > n-p$, but not vice versa. In particular, a $p$-polar set has Hausdorff dimension at most $n - p$. Intuitively, capacity and Hausdorff measure agree on the dimension scale, with $n - p$ as the critical dimension, but they can disagree for sets of dimension exactly $n - p$.
The comparison cannot be sharpened to the critical exponent $s = n - p$: zero capacity does not imply $\mathcal{H}^{n-p}(E) = 0$. Take $1 < p < n$ an integer and let $E$ be a compact smooth $(n-p)$-dimensional submanifold of $\mathbb{R}^n$. Then $0 < \mathcal{H}^{n-p}(E) < \infty$, yet $\operatorname{Cap}_p(E) = 0$, because sets of finite $\mathcal{H}^{n-p}$ measure are $p$-polar when $1 < p < n$ (Evans-Gariepy §4.7). A familiar instance is a smooth closed curve in $\mathbb{R}^3$ with $p = 2$. For $E = \{0\}$ with $p < n$, on the other hand, $\operatorname{Cap}_p(\{0\}) = 0$ and $\mathcal{H}^{n-p}(\{0\}) = 0$, so both vanish together. The Cantor set example shows that the gap between the two notions can be exploited.
## Quasicontinuity and Precise Representatives
Sobolev functions are equivalence classes modulo $\mathcal{L}^n$-null sets. This is unavoidable from the $L^p$ perspective, but it means that asking for the pointwise value of $u \in W^{1,p}$ at a specific point $x$ is not directly meaningful — there are uncountably many representatives with different values at $x$. Can we pick a canonical, "best-behaved" representative? And can we characterise, in terms of capacity, the exceptional set where this canonical representative might fail to behave well?
The answers require the notions of polar sets and quasicontinuity. A set $N \subset \mathbb{R}^n$ is **$p$-polar** if $\operatorname{Cap}_p(N) = 0$. A property is said to hold **$p$-quasieverywhere** (written $p$-q.e.) if it holds on $\mathbb{R}^n \setminus N$ for some $p$-polar set $N$. This is a strengthening of "almost everywhere": while $\mathcal{L}^n$-a.e. ignores sets of Lebesgue measure zero, $p$-q.e. ignores only sets of zero $p$-capacity, which is a much more restrictive condition when $p > 1$.
[definition: Quasicontinuity]
Let $1 \le p < \infty$. A function $u: \mathbb{R}^n \to \mathbb{R}$ is **$p$-quasicontinuous** if for every $\varepsilon > 0$ there exists an open set $G \subset \mathbb{R}^n$ with $\operatorname{Cap}_p(G) < \varepsilon$ such that the restriction $u|_{\mathbb{R}^n \setminus G}$ is continuous.
A set $N \subset \mathbb{R}^n$ is **$p$-polar** if $\operatorname{Cap}_p(N) = 0$. A property holds **$p$-quasieverywhere** if the set where it fails is $p$-polar.
[/definition]
Quasicontinuity is strictly weaker than continuity: a quasicontinuous function may be discontinuous, but only on a set that can be "hidden" in an open set of arbitrarily small capacity. In particular, every continuous function is quasicontinuous (take $G = \varnothing$). The content of the main theorem is that every Sobolev function admits a quasicontinuous representative.
[quotetheorem:3108]
[citeproof:3108]
The proof reveals the mechanism: the quasicontinuous representative $\tilde{u}$ is constructed as the uniform limit of the smooth approximating sequence, outside an open set of capacity approaching zero. The openness of the exceptional set $H_m$ is what allows one to use it in the definition of quasicontinuity (which requires an open set $G$).
Having established the existence of a quasicontinuous representative, we can go further and construct a canonical representative via Lebesgue points. Recall from the Lebesgue differentiation theorem that for $u \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, the Lebesgue point condition
\begin{align*}
\lim_{r \to 0} \fint_{B(x,r)} |u(y) - u(x)|\, d\mathcal{L}^n(y) = 0
\end{align*}
holds for $\mathcal{L}^n$-a.e. $x$. The following definition singles out the set where the Lebesgue average converges, without assuming any particular value.
[definition: Precise Representative]
For $u \in W^{1,p}(\mathbb{R}^n)$, the **precise representative** of $u$ is
\begin{align*}
u^*(x) := \lim_{r \to 0} \fint_{B(x,r)} u(y)\, d\mathcal{L}^n(y)
\end{align*}
at every $x$ where this limit exists, and $u^*(x) := 0$ otherwise.
[/definition]
The limit defining the precise representative exists at $\mathcal{L}^n$-a.e. $x$ by the Lebesgue differentiation theorem, so $u^* = u$ a.e. The theorem below says that $u^*$ is a quasicontinuous representative, and that the set where the limit fails to exist is $p$-polar.
[quotetheorem:3109]
The proof of this result rests on a careful application of the Hardy–Littlewood maximal function estimates from GMT I, combined with the capacity-Hausdorff comparison from the preceding section. The key estimate is a capacitary weak-type inequality of the form $\operatorname{Cap}_p(\{x : Mu(x) > \lambda\}) \lesssim \lambda^{-p} \|u\|_{W^{1,p}}^p$, where $M$ denotes the Hardy-Littlewood maximal function; it shows that the set where the averages of $u$ are large has small capacity, which upgrades the usual weak-type bound for Lebesgue measure. The proof uses techniques from GMT I and the Sobolev theory of the preceding chapters; see Evans-Gariepy §4.8 for the complete argument.
[example: The Set of Non-Lebesgue Points for a Sobolev Function]
Let $n = 2$ and consider $u(x) = \log \log(1/|x|)$ for $|x|$ small, extended smoothly to all of $\mathbb{R}^2$. This function belongs to $W^{1,2}(\mathbb{R}^2)$ near the origin: $|\nabla u(x)| = (|x| \log(1/|x|))^{-1}$, and
\begin{align*}
\int_{B(0, 1/2)} |\nabla u|^2\, d\mathcal{L}^2 = \int_0^{1/2} \frac{1}{r^2 (\log(1/r))^2} \cdot 2\pi r\, dr = 2\pi \int_0^{1/2} \frac{dr}{r(\log(1/r))^2} < \infty,
\end{align*}
where the last integral converges since $1/(r (\log(1/r))^2)$ is integrable near $r = 0$ (set $s = \log(1/r)$: the integral becomes $\int_{\log 2}^\infty s^{-2}\, ds < \infty$). However, $u(x) \to +\infty$ as $x \to 0$, so the origin is not a Lebesgue point for $u$: the averages $\fint_{B(0,r)} u\, d\mathcal{L}^2$ diverge as $r \to 0$. The exceptional set $\{0\}$ has $\operatorname{Cap}_2(\{0\}) = 0$ in $\mathbb{R}^2$ (since $p = n = 2$), consistent with the theorem. The point $0$ is genuinely a non-Lebesgue point, but it is a single point of zero capacity: the limit defining $u^*$ does not exist there, so $u^*(0) = 0$ by convention, and this does not affect any Sobolev estimate.
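Both claims can be checked by hand. As a short verification, valid for $r$ small enough that $u$ is given by the formula on $B(0,r)$: since $u$ is radially decreasing there, $u(y) \ge \log\log(1/r)$ for $|y| < r$, and hence
\begin{align*}
\fint_{B(0,r)} u \, d\mathcal{L}^2 \ge \log\log(1/r) \longrightarrow +\infty \quad \text{as } r \to 0.
\end{align*}
The energy integral has the closed form
\begin{align*}
2\pi \int_0^{1/2} \frac{dr}{r (\log(1/r))^2} = 2\pi \int_{\log 2}^{\infty} \frac{ds}{s^2} = \frac{2\pi}{\log 2}.
\end{align*}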
[/example]
This example illustrates why the precise representative is the "correct" canonical choice: the function $u = \log \log(1/|x|)$ blows up at the origin, but the origin has zero $2$-capacity, so from the Sobolev perspective the singularity is invisible. The value $u^*(0) = 0$ assigned by convention carries no information, and all Sobolev theory proceeds as if $0$ were not there.
The concept of precise representatives is not merely a technical convenience — it is the foundation for making sense of the trace operator on lower-dimensional sets, for defining the precise value of a Sobolev function along a curve, and for the finer structure theory of BV functions in the chapters that follow. In Chapter 14, we will see that BV functions also admit precise representatives, defined via one-sided approximate limits $u^\pm$, and that the jump set $J_u$ — where $u^-(x) < u^+(x)$ — is a countably $(n-1)$-rectifiable set (Chapter 14, Jump Set Rectifiability theorem), whose structure mirrors the reduced boundary of Chapter 12.
Having developed the capacity framework, we now shift focus from Sobolev spaces to the broader class of BV functions, which are functions whose distributional derivatives are measures. This generalization retains many structural properties of Sobolev functions while accommodating jump discontinuities and other singular behavior.
# 8. BV Functions: Definition and the Structure Theorem
The passage from Sobolev spaces to BV functions is motivated by a single observation: many natural problems in analysis and geometry produce functions whose gradients are not $L^p$ functions but measures. The characteristic function of a ball has no classical gradient near the boundary, and no weak gradient in $L^1$ either — yet the boundary carries perfectly good geometric information, namely the surface area. BV functions are designed precisely to capture this: they are $L^1$ functions whose distributional gradient is a finite Radon measure, rather than a function. This relaxation is exactly what is needed to treat sets of finite perimeter and to run compactness arguments that fail in $W^{1,1}$.
This chapter introduces the BV class, establishes the basic structure of the distributional gradient $Du$ as a vector-valued Radon measure, and proves the central result of the theory: the decomposition $Du = D^a u + D^j u + D^c u$. This three-piece decomposition separates the absolutely continuous part from the jump discontinuities and from a singular diffuse residue, and it is the structural backbone for everything that follows — approximation, traces, the coarea formula, and De Giorgi's theorem on sets of finite perimeter.
## The BV Space: Motivation and Definition
The failure of $W^{1,1}$ as a compactness space forces a new function class. A bounded sequence in $W^{1,1}(\Omega)$ can converge in $L^1$ to a limit that does not belong to $W^{1,1}(\Omega)$: the mollified characteristic functions $\eta_\varepsilon * \mathbb{1}_{[0,1/2]}$ converge to $\mathbb{1}_{[0,1/2]}$ in $L^1$ while their gradients concentrate on a shrinking strip, so the sequence is bounded in $W^{1,1}$ yet has no subsequence converging weakly in $W^{1,1}$. The right compactness result requires passing to the larger space where the gradient is allowed to be a measure.
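A one-dimensional instance makes this concrete. As an illustration (the ramp functions $u_k$ are a hypothetical choice made here, simpler to compute with than a mollified indicator), take $\Omega = (0,1)$ and for $k \ge 2$ set $u_k(x) := \min\{1, \max\{0, k(x - 1/2)\}\}$. Then
\begin{align*}
\|u_k'\|_{L^1(0,1)} = \int_{1/2}^{1/2 + 1/k} k \, dx = 1, \qquad \|u_k - \mathbb{1}_{(1/2,1)}\|_{L^1(0,1)} = \int_{1/2}^{1/2+1/k} \bigl(1 - k(x - 1/2)\bigr) \, dx = \frac{1}{2k} \to 0.
\end{align*}
So $(u_k)$ is bounded in $W^{1,1}(0,1)$ and converges in $L^1$ to $\mathbb{1}_{(1/2,1)}$, whose distributional derivative is the Dirac mass $\delta_{1/2}$ and therefore not an $L^1$ function.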
[definition: BV Function]
Let $\Omega \subset \mathbb{R}^n$ be open. A function $u \in L^1(\Omega)$ belongs to the space of functions of bounded variation, written $u \in BV(\Omega)$, if its distributional gradient $Du$ is a finite $\mathbb{R}^n$-valued Radon measure on $\Omega$. Precisely, $u \in BV(\Omega)$ if and only if
\begin{align*}
|Du|(\Omega) := \sup \left\{ \int_\Omega u \operatorname{div} \phi \, d\mathcal{L}^n : \phi \in C_c^1(\Omega; \mathbb{R}^n),\, |\phi| \leq 1 \right\} < \infty.
\end{align*}
The quantity $|Du|(\Omega)$ is called the total variation of $u$ in $\Omega$. The BV norm is
\begin{align*}
\|u\|_{BV(\Omega)} := \|u\|_{L^1(\Omega)} + |Du|(\Omega).
\end{align*}
[/definition]
The distributional gradient identity $\int_\Omega u \operatorname{div} \phi \, d\mathcal{L}^n = -\int_\Omega \phi \cdot Du$ (when $Du$ is a measure) explains why the supremum over divergences appears: it is precisely the dual characterisation of the total variation norm of a vector measure. When $u \in C^1(\Omega)$ with $\nabla u \in L^1(\Omega)$, integration by parts gives $|Du|(\Omega) = \int_\Omega |\nabla u| \, d\mathcal{L}^n$, so the total variation generalises the $L^1$ norm of the gradient.
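In one dimension the total variation can be probed numerically. The following is a minimal sketch, not part of the theory above: it assumes $\Omega = (0,1)$, samples $u$ on a uniform grid, and sums the absolute increments. The helper name `discrete_variation` and the two sample functions are illustrative choices. For continuous or piecewise constant representatives such as these two, the sum is a lower bound for $|Du|(\Omega)$ that becomes exact once the grid contains the turning points and separates the jumps.

```python
import math


def discrete_variation(u, a, b, n):
    """Sum of |u(x_{i+1}) - u(x_i)| over a uniform grid of n cells on [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(abs(u(xs[i + 1]) - u(xs[i])) for i in range(n))


def smooth(x):
    # C^1 example: the integral of |u'| over (0, 1) is 4
    return math.sin(2 * math.pi * x)


def step(x):
    # Jump example: Du is a unit Dirac mass at 1/2, so |Du|((0, 1)) = 1
    return 1.0 if x >= 0.5 else 0.0


v_smooth = discrete_variation(smooth, 0.0, 1.0, 1000)
v_step = discrete_variation(step, 0.0, 1.0, 1000)
print(v_smooth, v_step)
```

With $1000$ cells the first value agrees with $\int_0^1 |u'| \, dx = 4$ up to rounding, and the second equals the jump height $1$, even though the step function has no weak derivative in $L^1$.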
[quotetheorem:592]
[citeproof:592]
The lower semicontinuity of total variation — the fact that $|Du|(\Omega) \leq \liminf_k |Du_k|(\Omega)$ whenever $u_k \to u$ in $L^1_{\mathrm{loc}}(\Omega)$ — is a fundamental property that will be used repeatedly. It follows immediately from the supremum characterisation: for any fixed test field $\phi$, the map $u \mapsto \int u \operatorname{div} \phi \, d\mathcal{L}^n$ is continuous in $L^1$, so the supremum over all such $\phi$ is lower semicontinuous.
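The argument can be written as a single chain of inequalities. Fix $\phi \in C_c^1(\Omega; \mathbb{R}^n)$ with $|\phi| \leq 1$. Since $\operatorname{div} \phi$ is bounded with compact support in $\Omega$, convergence in $L^1_{\mathrm{loc}}(\Omega)$ gives
\begin{align*}
\int_\Omega u \operatorname{div} \phi \, d\mathcal{L}^n = \lim_{k \to \infty} \int_\Omega u_k \operatorname{div} \phi \, d\mathcal{L}^n \leq \liminf_{k \to \infty} |Du_k|(\Omega).
\end{align*}
The right-hand side does not depend on $\phi$, so taking the supremum over all admissible $\phi$ on the left yields $|Du|(\Omega) \leq \liminf_k |Du_k|(\Omega)$.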
## Sobolev Functions and Characteristic Functions as BV Examples
Before developing the theory, it is worth seeing the full spectrum of BV functions: those with absolutely continuous distributional gradient, those with jump discontinuities, and those with a purely singular but diffuse gradient.
[example: Sobolev Embedding into BV]
Every $u \in W^{1,1}(\Omega)$ belongs to $BV(\Omega)$, with $Du = \nabla u \cdot \mathcal{L}^n$ — the distributional gradient measure is absolutely continuous with respect to Lebesgue measure, with Radon-Nikodym density $\nabla u \in L^1(\Omega; \mathbb{R}^n)$. The total variation is
\begin{align*}
|Du|(\Omega) = \int_\Omega |\nabla u| \, d\mathcal{L}^n.
\end{align*}
To verify this, apply the weak formulation: for $\phi \in C_c^1(\Omega; \mathbb{R}^n)$,
\begin{align*}
\int_\Omega u \operatorname{div} \phi \, d\mathcal{L}^n = -\int_\Omega \nabla u \cdot \phi \, d\mathcal{L}^n,
\end{align*}
so the supremum over $|\phi| \leq 1$ is exactly $\int_\Omega |\nabla u| \, d\mathcal{L}^n$ by the dual characterisation of the $L^1$ norm of a vector-valued function. The inclusion $W^{1,1}(\Omega) \subset BV(\Omega)$ is strict, as the characteristic function example below shows: the Sobolev space requires the distributional gradient to be an $L^1$ function, while BV only requires it to be a finite Radon measure.
[/example]
The Sobolev example anchors one end of the BV spectrum: the gradient measure is as tame as it can be, namely a function times Lebesgue measure. The next example pushes in the opposite direction by exhibiting a BV function whose distributional gradient is concentrated on a set of zero $\mathcal{L}^n$-measure — a measure that is purely singular, supported on the boundary of a smooth domain, and weighted by the surface measure. Seeing both extremes side by side makes the structure theorem feel inevitable: a general BV function should sit somewhere between these poles, mixing absolutely continuous bulk with singular boundary-like contributions, and any classification result must accommodate both kinds of behaviour at once. Without surveying this full spectrum first, the three-part decomposition would look like an arbitrary partition; with the spectrum in view, it reads as the minimal taxonomy needed to keep these qualitatively different gradient measures apart.
[example: Characteristic Function of a Smooth Domain]
Let $E \subset \mathbb{R}^n$ be a bounded open set with $C^1$ boundary. The indicator function $\mathbb{1}_E \in BV(\mathbb{R}^n)$, with total variation
\begin{align*}
|D\mathbb{1}_E|(\mathbb{R}^n) = \mathcal{H}^{n-1}(\partial E).
\end{align*}
To see this, apply the divergence theorem: for $\phi \in C_c^1(\mathbb{R}^n; \mathbb{R}^n)$,
\begin{align*}
\int_{\mathbb{R}^n} \mathbb{1}_E \operatorname{div} \phi \, d\mathcal{L}^n = \int_E \operatorname{div} \phi \, d\mathcal{L}^n = -\int_{\partial E} \phi \cdot \nu_E \, d\mathcal{H}^{n-1},
\end{align*}
where $\nu_E$ is the inward unit normal on $\partial E$. Comparing with the identity $\int u \operatorname{div} \phi \, d\mathcal{L}^n = -\int \phi \cdot Du$, the distributional gradient is $D\mathbb{1}_E = \nu_E \cdot \mathcal{H}^{n-1}\lfloor_{\partial E}$: it is a measure concentrated on $\partial E$, with density $\nu_E$ with respect to $\mathcal{H}^{n-1}$, pointing into $E$, the direction in which the indicator increases. Taking the supremum over $|\phi| \leq 1$ yields $|D\mathbb{1}_E|(\mathbb{R}^n) = \mathcal{H}^{n-1}(\partial E)$. This example shows that the total variation of $D\mathbb{1}_E$ is the surface area of $\partial E$, the first hint of the deep connection between BV and perimeter that De Giorgi's theorem will make precise.
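The one-dimensional case can be computed by hand and serves as a sanity check. Take $n = 1$ and $E = (a,b)$ with $a < b$ (an illustrative special case). For $\phi \in C_c^1(\mathbb{R})$, the fundamental theorem of calculus gives
\begin{align*}
\int_{\mathbb{R}} \mathbb{1}_E \, \phi' \, d\mathcal{L}^1 = \int_a^b \phi'(x) \, dx = \phi(b) - \phi(a) = -\int_{\mathbb{R}} \phi \, d(\delta_a - \delta_b),
\end{align*}
so $D\mathbb{1}_E = \delta_a - \delta_b$ and $|D\mathbb{1}_E|(\mathbb{R}) = 2 = \mathcal{H}^0(\{a,b\}) = \mathcal{H}^0(\partial E)$.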
[/example]
These two examples occupy opposite ends of a spectrum. For $W^{1,1}$ functions the measure $Du$ is absolutely continuous; for $\mathbb{1}_E$ it is purely singular and in fact concentrated on a lower-dimensional set. The Cantor staircase, treated in detail below, will provide a third extreme: a measure that is singular without being concentrated on any set of finite $\mathcal{H}^{n-1}$ measure.
## The Lebesgue Decomposition of $Du$
Given $u \in BV(\Omega)$, the distributional gradient $Du$ is a finite $\mathbb{R}^n$-valued Radon measure on $\Omega$. By the Lebesgue decomposition theorem applied to the total variation measure $|Du|$ with respect to $\mathcal{L}^n$, one can write
\begin{align*}
Du = D^a u + D^s u,
\end{align*}
where $D^a u \ll \mathcal{L}^n$ (absolutely continuous part) and $D^s u \perp \mathcal{L}^n$ (singular part). This decomposition is not yet fine enough, however: the singular part $D^s u$ can behave in qualitatively different ways, and a central goal of BV theory is to distinguish these.
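A simple one-dimensional function already exhibits both parts. As an illustration (the function below is chosen here for convenience), let $\Omega = (0,1)$ and $u(x) = x + \mathbb{1}_{(1/2,1)}(x)$. For $\phi \in C_c^1(0,1)$,
\begin{align*}
\int_0^1 u \, \phi' \, d\mathcal{L}^1 = \int_0^1 x \, \phi'(x) \, dx + \int_{1/2}^1 \phi'(x) \, dx = -\int_0^1 \phi \, d\mathcal{L}^1 - \phi(1/2),
\end{align*}
so $Du = \mathcal{L}^1\lfloor_{(0,1)} + \delta_{1/2}$. Here $D^a u = \mathcal{L}^1\lfloor_{(0,1)}$ with density $1$, $D^s u = \delta_{1/2}$, and $|Du|(\Omega) = 1 + 1 = 2$.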
[definition: Approximate Limits and Jump Set]
Let $u \in L^1_{\mathrm{loc}}(\Omega)$. A value $\ell \in \mathbb{R}$ is the approximate limit of $u$ at $x_0$ if
\begin{align*}
\lim_{r \to 0} \frac{1}{\mathcal{L}^n(B(x_0, r))} \int_{B(x_0, r)} |u(x) - \ell| \, d\mathcal{L}^n(x) = 0.
\end{align*}
If the approximate limit exists, it is written $\widetilde{u}(x_0)$. The set $S_u$ of points where the approximate limit does not exist is called the approximate discontinuity set of $u$.
For a point $x_0 \in S_u$, one says $u$ has an approximate jump discontinuity at $x_0$ if there exist $u^+(x_0), u^-(x_0) \in \mathbb{R}$ with $u^+(x_0) \neq u^-(x_0)$ and a unit vector $\nu_u(x_0) \in \mathbb{S}^{n-1}$ such that
\begin{align*}
\lim_{r \to 0} \frac{1}{\mathcal{L}^n(B^+(x_0, r, \nu_u))} \int_{B^+(x_0, r, \nu_u)} |u(x) - u^+(x_0)| \, d\mathcal{L}^n &= 0, \\
\lim_{r \to 0} \frac{1}{\mathcal{L}^n(B^-(x_0, r, \nu_u))} \int_{B^-(x_0, r, \nu_u)} |u(x) - u^-(x_0)| \, d\mathcal{L}^n &= 0,
\end{align*}
where $B^+(x_0, r, \nu) = B(x_0, r) \cap \{x : (x - x_0) \cdot \nu > 0\}$ and $B^-(x_0, r, \nu) = B(x_0, r) \cap \{x : (x - x_0) \cdot \nu < 0\}$. The jump set $J_u$ is the set of approximate jump discontinuities.
[/definition]
The point of this definition is that $u^+$ and $u^-$ are the approximate limits from the two half-spaces separated by the hyperplane $\{x : (x - x_0) \cdot \nu_u = 0\}$. The triple $(u^+, u^-, \nu_u)$ is determined only up to the swap $(u^+, u^-, \nu_u) \mapsto (u^-, u^+, -\nu_u)$; one fixes it by requiring $u^+ > u^-$ (or by some other orientation convention). The key fact is that $J_u$ is countably $(n-1)$-rectifiable: up to a set of $\mathcal{H}^{n-1}$ measure zero, it is contained in a countable union of $C^1$ hypersurfaces. This rectifiability is a consequence of the BV structure theorem, which we now state and prove.
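The simplest instance of the definition is the indicator of a half-space. As an illustration, let $u = \mathbb{1}_{\{x_n > 0\}} \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ and let $x_0$ lie on the hyperplane $\{x_n = 0\}$. With $\nu_u(x_0) = e_n$, $u^+(x_0) = 1$ and $u^-(x_0) = 0$, both half-ball integrals in the definition vanish identically, because $u \equiv 1$ on $B^+(x_0, r, e_n)$ and $u \equiv 0$ on $B^-(x_0, r, e_n)$. On the other hand, no approximate limit exists at $x_0$: for every $\ell \in \mathbb{R}$ and every $r > 0$,
\begin{align*}
\frac{1}{\mathcal{L}^n(B(x_0, r))} \int_{B(x_0, r)} |u(x) - \ell| \, d\mathcal{L}^n(x) = \frac{|1 - \ell| + |\ell|}{2} \geq \frac{1}{2}.
\end{align*}
Hence $S_u = J_u = \{x_n = 0\}$ for this function.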
## The Structure Theorem
The structure theorem is the central result of this chapter. It asserts that the singular part $D^s u$ of the gradient measure further decomposes into a piece concentrated on the jump set $J_u$, where $u$ has genuine one-sided approximate limits, and a diffuse singular remainder that vanishes on every set of finite $\mathcal{H}^{n-1}$ measure.
[quotetheorem:595]
[citeproof:595]
The structure theorem is remarkable: it says that every BV function can be decomposed into a "classical-gradient piece," a "surface-jump piece," and a "Cantor piece," and these three pieces are mutually singular as measures. The decomposition is canonical — it is determined entirely by $u$ and depends on no choices.
[remark: Necessity of the BV Hypothesis]
The decomposition $Du = D^a u + D^j u + D^c u$ uses the full force of $u \in BV(\Omega)$. The key hypotheses are:
- **Finiteness of $|Du|(\Omega)$:** Without this, the distributional gradient need not be a Radon measure at all. For example, $u(x) = \sin(1/x) \in L^1((0,1))$ has pointwise derivative $u'(x) = -\cos(1/x)/x^2$, which is not integrable near $0$, and its distributional derivative is not a finite measure on $(0,1)$: $u$ is monotone between the consecutive points $x_k = 1/(\pi/2 + k\pi)$, where it takes the values $(-1)^k$, so each of these infinitely many intervals contributes $2$ to the total variation. No decomposition of the Lebesgue type is possible.
- **Integrability $u \in L^1(\Omega)$:** Local integrability is what makes $Du$ a well-defined distribution in the first place, since the pairing $\int_\Omega u \operatorname{div} \phi \, d\mathcal{L}^n$ must converge for every test field. The function $u(x) = |x|^{-n}$ on $\mathbb{R}^n$ is not integrable on any neighbourhood of the origin, so this pairing need not converge for test fields supported near $0$, and the question of whether $Du$ is a measure does not even arise.
[/remark]
## The Cantor Staircase: A Structural Example
The Cantor staircase is the canonical example of a BV function with a purely "Cantor" distributional gradient — no absolutely continuous part, no jump part. It illustrates precisely why the three-piece decomposition is necessary and why the Cantor part cannot be identified with either of the other two.
[example: Cantor Staircase]
Define the Cantor staircase $u_C : [0,1] \to [0,1]$ by the recursive construction on the Cantor set $C \subset [0,1]$. Recall that $C$ is obtained by removing, at stage $k$, the $2^{k-1}$ open middle thirds; its Lebesgue measure is $\mathcal{L}^1(C) = 0$ and its Hausdorff dimension is $\log 2 / \log 3 < 1$.
The staircase is defined as follows: on the interval $[1/3, 2/3]$ (removed at stage 1), set $u_C = 1/2$. On $[1/9, 2/9]$ (removed at stage 2, left piece), set $u_C = 1/4$; on $[7/9, 8/9]$ (stage 2, right piece), set $u_C = 3/4$. At stage $k$ there are $2^{k-1}$ removed intervals, and on the $j$-th such interval (ordered left to right) one sets $u_C = (2j-1)/2^k$. Extending continuously to all of $[0,1]$ (setting $u_C(0) = 0$, $u_C(1) = 1$) and using monotonicity to fill in the Cantor set itself, one obtains $u_C \in C([0,1])$.
We now verify the following properties:
**1. $u_C$ is continuous.** The staircase is constant on each removed interval and the values at successive stages are consistent: the intervals removed at stage $k$ carry the $2^{k-1}$ values $(2j-1)/2^k$, so after stage $k$ the function takes every value $j/2^k$ with $1 \le j \le 2^k - 1$, and consecutive values differ by $1/2^k$. On the Cantor set itself, $u_C(x) = \sup\{u_C(y) : y \in [0,x] \setminus C\}$, which gives a monotone extension. A monotone function can only have jump discontinuities, and a jump would leave a gap in its range; since the range contains all dyadic rationals in $(0,1)$, which are dense, there are no jumps and $u_C$ is continuous.
**2. $u_C'(x) = 0$ for $\mathcal{L}^1$-a.e. $x \in [0,1]$.** On each removed interval, that is, on each connected component of $[0,1] \setminus C$, the function $u_C$ is constant, so its classical derivative is zero there. Since $\mathcal{L}^1(C) = 0$, we have $u_C' = 0$ almost everywhere.
**3. $u_C \in BV([0,1])$ and $|Du_C|([0,1]) = 1$.** Since $u_C$ is monotone non-decreasing and continuous with $u_C(0) = 0$ and $u_C(1) = 1$, its distributional derivative $Du_C$ is a non-negative Radon measure with $Du_C([0,1]) = u_C(1) - u_C(0) = 1$. (This follows from the fundamental theorem of calculus for monotone functions: for monotone $u$, $Du$ is the Stieltjes measure.)
**4. The distributional gradient $Du_C$ is the Cantor measure.** The Cantor measure $\mu_C$ is the unique Borel probability measure on $[0,1]$ that assigns mass $2^{-k}$ to each of the $2^k$ closed intervals remaining after stage $k$ of the construction; for instance $\mu_C([0,1/3]) = \mu_C([2/3,1]) = 1/2$ and $\mu_C([0,1/9]) = 1/4$. In particular it assigns zero mass to every removed interval and is supported on $C$. One verifies that $Du_C = \mu_C$ by checking that $\mu_C([0,x]) = u_C(x)$ for all $x$: the Lebesgue-Stieltjes measure of the non-decreasing continuous function $u_C$ satisfies the same identity, and a finite Borel measure on $[0,1]$ is determined by its distribution function, so this Stieltjes measure, which is $Du_C$, coincides with $\mu_C$.
**5. The decomposition $Du_C = D^a u_C + D^j u_C + D^c u_C$.** Since $u_C' = 0$ a.e., the absolutely continuous part is $D^a u_C = 0$. Since $u_C$ is continuous, it has no approximate jump discontinuities anywhere: for any $x_0$, the approximate limits from both sides coincide (they equal $u_C(x_0)$ by continuity), so $J_{u_C} = \varnothing$ and $D^j u_C = 0$. Therefore $Du_C = D^c u_C = \mu_C$: the entire gradient measure is the Cantor part. This is consistent with the defining property of the Cantor part in dimension $n = 1$, where the relevant measure is $\mathcal{H}^{n-1} = \mathcal{H}^0$, the counting measure: sets of finite $\mathcal{H}^0$ measure are finite sets, and $\mu_C$ charges no finite set because its distribution function $u_C$ is continuous, so $\mu_C$ has no atoms. At the same time $\mu_C$ is singular with respect to $\mathcal{L}^1$, since it is supported on $C$ and $\mathcal{L}^1(C) = 0$.
This example shows that the Cantor part is genuinely different from both the absolutely continuous and jump parts. The function $u_C$ has no visible jumps, yet its gradient is a non-trivial measure. The gradient "lives on" the Cantor set, which has Lebesgue measure zero but positive Hausdorff dimension less than one.
[/example]
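The construction can be checked numerically. The sketch below is illustrative only: it evaluates $u_C$ through the ternary digits of $x$ (a standard equivalent description of the staircase, not the recursive definition used above) with exact rational arithmetic. The function names `cantor_staircase` and `variation_profile` are hypothetical helpers introduced here.

```python
from fractions import Fraction


def cantor_staircase(x: Fraction, depth: int = 40) -> Fraction:
    """Evaluate u_C(x) for 0 <= x <= 1 from the first `depth` ternary digits.

    The result is exact if x lies in the closure of an interval removed at
    stage <= depth; otherwise the error is at most 2**(-depth).
    """
    if x >= 1:
        return Fraction(1)
    value = Fraction(0)
    weight = Fraction(1, 2)  # size of the current binary digit
    for _ in range(depth):
        x *= 3
        digit = int(x)  # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            # x sits in a removed middle third, where u_C is constant
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2
    return value


def variation_profile(k: int):
    """Increments of u_C over the 3**k grid cells of width 3**(-k).

    Returns (variation on the grid, number of cells where u_C rises,
    total length of those cells).
    """
    n = 3 ** k
    values = [cantor_staircase(Fraction(i, n), depth=k + 1) for i in range(n + 1)]
    increments = [values[i + 1] - values[i] for i in range(n)]
    rising = sum(1 for d in increments if d > 0)
    # u_C is non-decreasing, so the plain sum equals the sum of absolute values
    return sum(increments), rising, Fraction(rising, n)


print(cantor_staircase(Fraction(1, 2)), cantor_staircase(Fraction(7, 9)))
print(variation_profile(4))
```

For $k = 4$ the profile reports variation $1$ carried by $16$ of the $81$ grid cells, of combined length $(2/3)^4$, in line with properties 2 and 3: the whole increase concentrates on an ever smaller portion of the interval as $k$ grows.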
<!-- illustration-needed: the Cantor staircase — show the function value on [0,1], with the flat portions on the removed intervals of the Cantor set visible, and indicate that all the increase occurs on the Cantor set itself (which has measure zero) -->
The Cantor staircase illustrates a general principle: a BV function can have all of its "variation" concentrated on a fractal set of positive Hausdorff dimension but zero Lebesgue measure, and this variation is invisible to both the absolutely continuous gradient (which sees only $\mathcal{L}^n$-density) and the jump part (which sees only the $(n-1)$-rectifiable jump set). The three-piece structure of the theorem is therefore not a notational convenience but a genuine decomposition into mutually singular pieces.
[remark: The Cantor Part Is Genuinely Singular]
The property $D^c u(B) = 0$ for every Borel $B$ with $\mathcal{H}^{n-1}(B) < \infty$ means the Cantor part is "more diffuse" than the jump part, although it is still singular with respect to $\mathcal{L}^n$. While $D^j u$ is concentrated on a set of $\sigma$-finite $\mathcal{H}^{n-1}$ measure (namely $J_u$ itself), the Cantor part charges no set of finite $\mathcal{H}^{n-1}$ measure. In dimension $n = 1$, where $\mathcal{H}^0$ is counting measure, this means $D^c u$ charges no finite set and hence, by countable additivity, no countable set: it is a purely non-atomic singular measure. In higher dimensions, $D^c u$ charges no $(n-1)$-dimensional surface of finite area. The Cantor measure $\mu_C$ exemplifies this: it has no atoms, so it charges no countable set, and yet it is concentrated on the $\mathcal{L}^1$-null set $C$.
[/remark]
## The Absolutely Continuous Part and the Approximate Gradient
The absolutely continuous component $D^a u$ deserves its own discussion, because its Radon-Nikodym density $\nabla u$ has a precise pointwise meaning that goes beyond being a density in $L^1$.
[definition: Approximate Gradient]
Let $u \in L^1_{\mathrm{loc}}(\Omega)$. The approximate gradient of $u$ at $x_0 \in \Omega \setminus S_u$ is the vector $\nabla u(x_0) \in \mathbb{R}^n$ (if it exists) such that
\begin{align*}
\lim_{r \to 0} \frac{1}{\mathcal{L}^n(B(x_0, r))} \int_{B(x_0, r)} \frac{|u(x) - \widetilde{u}(x_0) - \nabla u(x_0) \cdot (x - x_0)|}{r} \, d\mathcal{L}^n(x) = 0.
\end{align*}
[/definition]
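As a sanity check on the definition, suppose $u$ is $C^1$ near $x_0$ (an assumption made only for this illustration), so that $\widetilde{u}(x_0) = u(x_0)$. Writing $\omega(r) = \sup_{B(x_0,r)} |\nabla u - \nabla u(x_0)|$ for the modulus of continuity of the classical gradient at $x_0$ (a symbol introduced here), the mean value theorem gives, for $x \in B(x_0, r)$,
\begin{align*}
|u(x) - u(x_0) - \nabla u(x_0)\cdot(x - x_0)| &\leq \omega(r)\,|x - x_0| \leq \omega(r)\, r, \\
\frac{1}{\mathcal{L}^n(B(x_0, r))} \int_{B(x_0, r)} \frac{|u(x) - u(x_0) - \nabla u(x_0) \cdot (x - x_0)|}{r} \, d\mathcal{L}^n(x) &\leq \omega(r) \longrightarrow 0 \quad (r \to 0).
\end{align*}
So a continuous classical gradient is always an approximate gradient; the definition only relaxes the pointwise remainder estimate to an averaged one.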
For BV functions, the approximate gradient exists $\mathcal{L}^n$-a.e. and coincides with the Radon-Nikodym density of $D^a u$ with respect to $\mathcal{L}^n$. This is a consequence of the Lebesgue-Besicovitch differentiation theorem: for $\mathcal{L}^n$-a.e. $x_0$, the ball averages of $Du$ converge to the density $\nabla u(x_0)$, and this density is precisely the approximate gradient.
Approximate differentiability is the natural pointwise notion of derivative for BV functions because the classical gradient can fail to exist on large sets: at every point of the jump set $J_u$, at points where the Cantor part concentrates, and more generally wherever $u$ has no continuous representative. Demanding a classical gradient pointwise would discard all of the geometric information that BV is designed to retain. Yet the absolutely continuous part of $Du$ is, by the Radon-Nikodym theorem, a function times Lebesgue measure, so it must have a density, and the question is what that density is at a given point. The approximate gradient answers this question by asking only that the affine Taylor remainder be small in mean, not pointwise: a condition weak enough to hold $\mathcal{L}^n$-a.e. for every BV function. This is exactly the role $\nabla u$ will play in the structure theorem, where $D^a u = \nabla u \cdot \mathcal{L}^n$ identifies the absolutely continuous component with the pointwise approximate derivative defined a.e., turning the abstract Radon-Nikodym density into a concrete object that behaves like a classical gradient on the regular set of $u$.
[quotetheorem:3130]
The proof uses the Lebesgue-Besicovitch differentiation theorem applied to the vector measure $Du$; see Evans-Gariepy §5.1 for the full argument. The conclusion tells us that the absolutely continuous part of $Du$ carries exactly the information a classical analyst would expect: it is the approximate gradient, defined pointwise a.e. as a first-order Taylor coefficient in an $L^1$-mean sense.
## Decomposition Consequences and Counterexamples
The structure theorem has immediate consequences for how BV functions can and cannot behave. Understanding which hypotheses are necessary requires careful examples.
[explanation: Why All Three Parts Can Be Non-Zero Simultaneously]
There is no constraint forcing any of the three parts of $Du$ to be zero. One can construct $u \in BV(\Omega)$ with all three parts non-trivial simultaneously. For instance, in dimension $n = 1$ on $\Omega = (0, 2)$, define
\begin{align*}
u(x) = \min\{x, 1/3\} + u_C(\min\{x, 1\}) + \mathbb{1}_{[3/2,2)}(x),
\end{align*}
where $u_C$ is the Cantor staircase on $[0,1]$, so that the middle term equals $u_C(x)$ on $(0,1]$ and the constant $u_C(1) = 1$ on $[1, 2)$. Extending the staircase by a constant, rather than cutting it off with $\mathbb{1}_{[0,1]}$, matters: a cutoff would create a downward jump of height $1$ at $x = 1$. This function has: an absolutely continuous part from $\min\{x, 1/3\} = \int_0^x \mathbb{1}_{[0,1/3]}(t) \, d\mathcal{L}^1(t)$ (gradient $\mathbb{1}_{(0,1/3)}$); a Cantor part $\mu_C$ from the staircase; and a jump at $x = 3/2$ from the indicator $\mathbb{1}_{[3/2,2)}$ (with $u^- = 4/3$ and $u^+ = 7/3$ at $x = 3/2$). The three parts of $Du$ are concentrated on the pairwise disjoint sets $(0, 1/3) \setminus C$, the Cantor set $C$, and $\{3/2\}$ respectively, and are therefore mutually singular.
The structure theorem asserts that the three parts are always mutually singular. This is not a consequence of any specific choice of $u$ but a universal feature of the BV structure.
[/explanation]
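For a concrete check of the decomposition, consider the function $w(x) = \min\{x, 1/3\} + u_C(\min\{x, 1\}) + \mathbb{1}_{[3/2,2)}(x)$ on $(0,2)$, chosen here purely as an illustration. Because the three parts of $Dw$ are mutually singular, their total variations add:
\begin{align*}
Dw &= \mathbb{1}_{(0,1/3)} \, \mathcal{L}^1 + \mu_C + \delta_{3/2}, \\
|Dw|((0,2)) &= |D^a w|((0,2)) + |D^c w|((0,2)) + |D^j w|((0,2)) = \tfrac13 + 1 + 1 = \tfrac73.
\end{align*}
This agrees with $w(2^-) - w(0^+) = 7/3 - 0$, as it must for a non-decreasing function.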
The three-part decomposition is sharp precisely because BV is a strong enough hypothesis to force the gradient measure to organise itself into these pieces, and one only appreciates this by contrasting BV with the larger ambient space $L^1$. For an arbitrary $u \in L^1(\Omega)$ there is no decomposition of $Du$ at all: the distributional gradient may fail to be a Radon measure, may have infinite total variation on every neighbourhood of a point, and may not admit any meaningful pointwise notion of jump direction or absolutely continuous density. The sequence $\sin(kx)$ in dimension one already shows that a bound on the $L^1$ norm gives no control on the gradient, and in higher dimensions the pathologies are far worse. What BV adds is exactly the right amount of regularity: finite total variation forces $Du$ to be a Radon measure, and this single requirement is enough to trigger the Lebesgue decomposition, the rectifiability of the jump set, and the diffuse-singular character of the Cantor part. BV is the right setting because it is exactly the class of $L^1$ functions whose distributional gradient is a finite Radon measure: it is the largest class to which the structure theorem applies, and its gradient still retains the geometric information (surface area, jump heights, fractal variation) that one needs in geometric measure theory.
[remark: Failure Without BV Regularity]
To appreciate what finite total variation buys, consider what $L^1$ control alone fails to see. The functions $u_k(x) = \sin(k x)$ on $[0, 2\pi]$, $k \in \mathbb{N}$, satisfy $\|u_k\|_{L^1} = 4$ for every $k$, but $|Du_k|([0,2\pi]) = 4k \to \infty$. Each $u_k$ is smooth, with $Du_k = k\cos(kx) \cdot \mathcal{L}^1$, so each belongs to BV; but the family is bounded in $L^1$ and unbounded in BV. Rescaling to $v_k(x) = k^{-1}\sin(kx)$ gives $v_k \to 0$ in $L^1$ while $|Dv_k|([0,2\pi]) = 4$ for all $k$: the total variation does not go to zero despite $L^1$ convergence. The structure theorem applies to the limit, but only trivially, since $D^a (0) = D^j (0) = D^c(0) = 0$; the limit carries no record of the variation of the sequence, which converges in $L^1$ but not in BV. The BV norm remembers variation that $L^1$ forgets.
[/remark]
The three-piece decomposition sets the stage for the next two chapters. Approximation theory for BV (Chapter 9) relies on understanding how mollification interacts with each piece: mollified functions are in $W^{1,1}$ and thus have only an absolutely continuous part, so approximation in BV requires careful control of how the jump and Cantor parts are approximated. Trace theory (Chapter 10) will show that the jump set $J_u$ determines the boundary behaviour of $u$, with the values $u^+$ and $u^-$ playing the role of the two-sided trace. The structure theorem is therefore not just a classification result — it is the foundation on which the entire analysis of BV functions rests.
The structure theorem for BV functions decomposes the total variation into absolutely continuous and singular parts, revealing the internal geometry of these functions. With this decomposition in hand, we investigate how BV functions behave under approximation and what compactness properties they inherit from their total variation bounds.
# 9. Approximation and Compactness for BV
The previous chapter established the basic structure of BV functions: the distributional derivative $Du$ is a vector-valued Radon measure, decomposable into absolutely continuous, jump, and Cantor parts. That structural description is powerful, but it raises an immediate analytical question: how close is a BV function to the smooth functions we know how to manipulate? And when we have a bounded family of BV functions — as we will in every variational problem — can we extract a convergent subsequence? This chapter answers both questions, and the answers are subtler than their Sobolev analogues: smooth approximation exists, but the correct notion of convergence requires controlling not just $L^1$ closeness but also the total variation, giving rise to the notion of strict convergence.
## Lower Semicontinuity of Total Variation
Any approach to approximation requires knowing how total variation behaves under limits. The naive hope — that $|Du|(\Omega)$ converges when $u_k \to u$ in $L^1$ — turns out to be false. What does hold, and what the variational characterization makes transparent, is a lower semicontinuity statement.
[quotetheorem:597]
[citeproof:597]
The inequality can be strict, as a concrete example shows. Consider $u_k(x) = \frac{1}{k} \sin(2\pi kx)$ on $\Omega = (0,1) \subset \mathbb{R}$. We have $u_k \to 0$ in $L^1$, yet each $u_k$ is smooth with $|Du_k|(\Omega) = \int_0^1 2\pi|\cos(2\pi kx)| \, d\mathcal{L}^1 = 4$ for all $k$ (the function runs through $k$ full periods, each contributing variation $4/k$). The limit $u \equiv 0$ has $|Du|(\Omega) = 0$. So the inequality $|Du|(\Omega) \leq \liminf_k |Du_k|(\Omega)$ is strict here: $0 < 4$. The highly oscillatory sequence loses all of its total variation in the limit, because the oscillations have amplitude $1/k$ and are invisible to the $L^1$ norm. This rules out any hope that $L^1$ convergence alone forces $|Du_k|(\Omega) \to |Du|(\Omega)$.
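The total variation of a rescaled sinusoid can be checked by a direct computation. The sketch below uses the normalisation $v_k(x) = \frac1k \sin(2\pi k x)$ on $(0,1)$ with $k \in \mathbb{N}$, which is a choice made for this illustration, and the substitution $s = 2\pi k x$:
\begin{align*}
|Dv_k|((0,1)) &= \int_0^1 2\pi\,|\cos(2\pi k x)| \, dx = \frac{1}{k} \int_0^{2\pi k} |\cos s| \, ds = \frac{1}{k} \cdot k \int_0^{2\pi} |\cos s| \, ds = 4, \\
\|v_k\|_{L^1((0,1))} &= \frac{1}{k} \int_0^1 |\sin(2\pi k x)| \, dx = \frac{1}{k} \cdot \frac{2}{\pi} \to 0.
\end{align*}
The variation is independent of $k$ because the factor $k$ gained from the frequency is exactly cancelled by the amplitude $1/k$.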
## Strict Convergence
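The sinusoid example shows that $L^1$ convergence alone is too weak a notion for approximation purposes, since it loses track of the total variation. At the other extreme lies the norm topology: the BV norm is $\|u\|_{BV(\Omega)} = \|u\|_{L^1(\Omega)} + |Du|(\Omega)$, and norm convergence requires both $u_k \to u$ in $L^1$ and $|D(u_k - u)|(\Omega) \to 0$, which turns out to be too strong for smooth approximation. The intermediate requirement is suggested by the lower semicontinuity theorem, which says we always get the inequality $|Du|(\Omega) \leq \liminf_k |Du_k|(\Omega)$; requiring equality in the limit is an additional condition that captures the absence of hidden oscillation or concentration.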
[definition: Strict Convergence in BV]
Let $\Omega \subset \mathbb{R}^n$ be open. A sequence $(u_k)$ in $BV(\Omega)$ converges **strictly** to $u \in BV(\Omega)$ if
\begin{align*}
u_k \to u \text{ in } L^1(\Omega) \quad \text{and} \quad |Du_k|(\Omega) \to |Du|(\Omega).
\end{align*}
[/definition]
Strict convergence sits between weak* convergence of measures (where $Du_k \overset{*}{\rightharpoonup} Du$ in the sense of Radon measures) and norm convergence in BV. Weak* convergence of $Du_k$ to $Du$ follows from $L^1$ convergence of $u_k$ together with a uniform bound $\sup_k |Du_k|(\Omega) < \infty$, via the definition of distributional derivatives; but it only gives $|Du|(\Omega) \leq \liminf_k |Du_k|(\Omega)$. Strict convergence imposes the equality $|Du_k|(\Omega) \to |Du|(\Omega)$ as the additional datum needed to prevent total variation from being lost in the limit. This is the natural substitute for strong convergence in $W^{1,1}$, where norm convergence would require $\|\nabla u_k - \nabla u\|_{L^1} \to 0$, adapted to the measure-theoretic setting where $Du$ need not be absolutely continuous.
## Smooth Approximation in BV
The Meyers–Serrin theorem for Sobolev spaces ($W^{1,p} = H^{1,p}$) provides smooth functions that approximate any Sobolev function in norm. In BV, norm convergence is too strong to achieve with smooth functions: a limit of smooth functions in the BV norm has an absolutely continuous derivative, so the norm closure of $C^\infty(\Omega) \cap BV(\Omega)$ is $W^{1,1}(\Omega)$, a proper subspace of $BV(\Omega)$. But strict convergence is achievable.
[quotetheorem:3131]
[citeproof:3131]
The strict convergence in this theorem is the best one can do: it is impossible in general to approximate a BV function by smooth functions with $|Du_k - Du|(\Omega) \to 0$, because $Du$ may have a singular part (a Cantor or jump component) that no sequence of absolutely continuous measures $\nabla u_k \, \mathcal{L}^n$ can approximate in total variation. What strict convergence achieves is the convergence of the total mass $|Du_k|(\Omega)$ to $|Du|(\Omega)$, even though the individual measures $Du_k = \nabla u_k \, \mathcal{L}^n$ converge to $Du$ only weakly*, not in total variation.
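A one-dimensional example makes the gap between strict and norm convergence explicit. The functions below are chosen for illustration and are Lipschitz rather than smooth (smoothing the corners changes nothing essential): on $\Omega = (-1,1)$ let $u = \mathbb{1}_{[0,1)}$, so that $Du = \delta_0$, and let $u_k(x) = \min\{1, \max\{0, kx\}\}$ be the ramp of width $1/k$.
\begin{align*}
\|u_k - u\|_{L^1(\Omega)} &= \int_0^{1/k} (1 - kx) \, dx = \frac{1}{2k} \to 0, \\
|Du_k|(\Omega) &= \int_0^{1/k} k \, dx = 1 = |Du|(\Omega), \\
|D(u_k - u)|(\Omega) &= \left| k\, \mathbb{1}_{(0,1/k)} \, \mathcal{L}^1 - \delta_0 \right|(\Omega) = 1 + 1 = 2.
\end{align*}
So $u_k \to u$ strictly, while the BV distance stays equal to $2 + \frac{1}{2k}$: the absolutely continuous measures $Du_k$ and the Dirac mass $\delta_0$ are mutually singular, so their variations add instead of cancelling.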
[remark: One-Sided Approximation From Below]
Mollification alone, without the partition-of-unity balancing, gives only $|D(\eta_\varepsilon * u)|(\Omega') \leq |Du|(\Omega)$ for $\Omega' \subset\subset \Omega$ and $\varepsilon < \operatorname{dist}(\Omega', \partial\Omega)$: the variation of the mollified function on the smaller set is bounded by that of $u$, so the approximation is from below. The strict approximation theorem says that this bound is asymptotically sharp: combined with lower semicontinuity, it allows the global total variation $|Du_k|(\Omega)$ to be arranged to converge exactly to $|Du|(\Omega)$.
[/remark]
## Compactness in BV
The approximation theorem would be of limited use without a corresponding compactness result. In the direct method of the calculus of variations, one constructs a minimizing sequence, extracts a convergent subsequence, and identifies the limit as a minimizer. For this to work in BV, one needs: given a sequence bounded in BV norm, extract a subsequence converging in $L^1$. The following theorem provides exactly this, and the argument runs through the BV Sobolev inequality $\|u\|_{L^{n/(n-1)}(\Omega)} \lesssim \|u\|_{BV(\Omega)}$ (established in Chapter 12 via the isoperimetric inequality, and anticipated here by the Gagliardo–Nirenberg–Sobolev inequality of Chapter 5 applied by mollification); the full BV norm is needed on the right, since constants have $|Du|(\Omega) = 0$.
[quotetheorem:596]
[citeproof:596]
Some regularity of $\partial\Omega$, such as the Lipschitz hypothesis, is needed for the boundary behavior of the Sobolev embedding. Without it, one can construct domains where the embedding $BV \hookrightarrow L^{n/(n-1)}$ fails: for instance, a domain with a sufficiently sharp outward cusp forces a loss of integrability exponent. The boundedness of $\Omega$ is also needed: on all of $\mathbb{R}^n$, sequences bounded in BV can escape to infinity and have no $L^1$-convergent subsequence (consider translates $u_k(x) = u(x - ke_1)$ for a fixed nonzero $u \in BV(\mathbb{R}^n)$).
[example: Oscillating Sequence and Compactness]
Let $\Omega = (0,1)$ and define $u_k(x) = \sin(2\pi k x)$. Then $\|u_k\|_{L^1} = 2/\pi$ and $|Du_k|(\Omega) = \int_0^1 |2\pi k \cos(2\pi k x)| \, d\mathcal{L}^1 = 4k$, so $\|u_k\|_{BV} = 2/\pi + 4k \to \infty$. The BV norms are unbounded, so the compactness theorem does not apply, and indeed no subsequence of $(u_k)$ converges in $L^1$. To see this, note that the $u_k$ are mutually orthogonal in $L^2(\Omega)$ with $\|u_k\|_{L^2}^2 = 1/2$, so $\|u_j - u_k\|_{L^2}^2 = 1$ for $j \neq k$; since $|u_j - u_k| \leq 2$, this gives $\|u_j - u_k\|_{L^1} \geq \tfrac12 \|u_j - u_k\|_{L^2}^2 = \tfrac12$, and no subsequence is Cauchy in $L^1$.
Now modify: let $v_k(x) = \frac{1}{k} \sin(2\pi k x)$. Then $\|v_k\|_{L^1} \leq 1/k \to 0$ and $|Dv_k|(\Omega) = 4$ for all $k$, so $\sup_k \|v_k\|_{BV} < \infty$. The compactness theorem guarantees a subsequence converging in $L^1$; in this case $v_k \to 0$ in $L^1$ directly, and $|D(0)|(\Omega) = 0 < 4 = \lim_k |Dv_k|(\Omega)$. An $L^1$ limit exists, but the total variations do not converge to the total variation of the limit: this is the failure of strict convergence despite $L^1$ convergence.
[/example]
The example shows that the compactness theorem delivers $L^1$ convergence but not strict convergence. Strict convergence is a stronger conclusion that requires additional structure — it does not follow from boundedness alone.
## The Topology of BV and the Role of Strict Convergence
Having established both approximation and compactness, it is worth stepping back to see how the three topologies on BV relate to one another.
The weakest is weak* convergence: $u_k \overset{*}{\rightharpoonup} u$ in $BV(\Omega)$ means $u_k \to u$ in $L^1(\Omega)$ and $Du_k \overset{*}{\rightharpoonup} Du$ in the sense of Radon measures. The compactness theorem says that BV-bounded sequences have weak*-convergent subsequences. Strict convergence adds the requirement $|Du_k|(\Omega) \to |Du|(\Omega)$, that is, convergence of the total masses of the derivative measures; it does not assert that $Du_k \to Du$ in the total variation norm on Radon measures. Norm convergence in BV is the strongest: it requires $\|u_k - u\|_{BV} \to 0$, which forces $|Du_k - Du|(\Omega) \to 0$ and hence $Du_k \to Du$ in total variation norm, not merely weakly*.
The smooth approximation theorem lives at the level of strict convergence. This is optimal: smooth functions cannot approximate a BV function with a non-trivial singular derivative in norm, because the total variation of the difference $|D(u_k - u)|(\Omega)$ includes the singular part of $Du$, which no sequence of smooth gradients can cancel.
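The obstruction can be written out in one line. Assume $u_k \in C^\infty(\Omega) \cap BV(\Omega)$ and write $Du = \nabla u \, \mathcal{L}^n + D^s u$, with $D^s u = D^j u + D^c u$ the singular part. Since the absolutely continuous and singular parts of $D(u_k - u)$ are mutually singular, their total variations add:
\begin{align*}
D(u_k - u) &= (\nabla u_k - \nabla u) \, \mathcal{L}^n - D^s u, \\
|D(u_k - u)|(\Omega) &= \|\nabla u_k - \nabla u\|_{L^1(\Omega)} + |D^s u|(\Omega) \geq |D^s u|(\Omega).
\end{align*}
In particular $\|u_k - u\|_{BV(\Omega)}$ cannot tend to zero unless $D^s u = 0$, that is, unless $u \in W^{1,1}(\Omega)$.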
[explanation: Why Strict Convergence is the Right Notion for Variational Problems]
In the calculus of variations, one typically minimizes a functional of the form
\begin{align*}
\mathcal{F}[u] = \int_\Omega f(x, u, \nabla u) \, d\mathcal{L}^n
\end{align*}
over a class of functions in $W^{1,1}(\Omega)$ or $BV(\Omega)$. When the infimum is not attained in $W^{1,1}$ — because a minimizing sequence develops discontinuities or jumps — one passes to the BV relaxation. The relaxed functional is typically
\begin{align*}
\overline{\mathcal{F}}[u] = \int_\Omega f(x, u, \nabla u) \, d\mathcal{L}^n + \int_\Omega f^\infty\!\left(x, u, \frac{dD^s u}{d|D^s u|}\right) d|D^s u|
\end{align*}
where $f^\infty$ is the recession function of $f$ and $D^s u$ is the singular part of $Du$. Under suitable hypotheses on $f$ (typically convexity in the gradient variable and linear growth), this relaxed functional is lower semicontinuous with respect to $L^1$ convergence, and it is the largest such functional lying below $\mathcal{F}$ on $W^{1,1}(\Omega)$. Strict convergence is a natural topology in which to study this relaxation because: (i) smooth functions are dense in BV under strict convergence (the approximation theorem), and (ii) for positively $1$-homogeneous integrands, functionals of this type are continuous along strictly convergent sequences (Reshetnyak's continuity theorem), so that no energy is lost or created in the limit.
Without the total variation convergence condition, one cannot distinguish a sequence that truly converges to a BV function from one that creates hidden oscillations or concentrations that are invisible to $L^1$ but carry energy. The condition $|Du_k|(\Omega) \to |Du|(\Omega)$ rules out this phenomenon.
[/explanation]
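As an illustration of the recession function, take the area integrand $f(p) = \sqrt{1 + |p|^2}$, a standard example chosen here for concreteness (it is convex with linear growth). Its recession function is computed directly from the usual definition $f^\infty(p) = \lim_{t \to \infty} f(tp)/t$:
\begin{align*}
f^\infty(p) &= \lim_{t \to \infty} \frac{\sqrt{1 + t^2 |p|^2}}{t} = |p|, \\
\overline{\mathcal{F}}[u] &= \int_\Omega \sqrt{1 + |\nabla u|^2} \, d\mathcal{L}^n + |D^s u|(\Omega).
\end{align*}
Since the polar density $dD^s u / d|D^s u|$ has unit length $|D^s u|$-a.e., the singular term is simply the total mass of the singular part: jumps and Cantor parts are charged at the linear rate at which $f$ grows at infinity.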
## Approximation of Sets of Finite Perimeter
The smooth approximation theorem extends naturally to characteristic functions, providing the bridge between the abstract BV theory and the geometry of sets.
[definition: Smooth Approximation of a Set]
Let $E \subset \mathbb{R}^n$ be a set of finite perimeter in $\Omega$. A sequence of smooth sets $(E_k)$ approximates $E$ strictly in $BV(\Omega)$ if $\mathbb{1}_{E_k} \to \mathbb{1}_E$ in $L^1(\Omega)$ and $P(E_k; \Omega) \to P(E; \Omega)$.
[/definition]
Here $P(E; \Omega) = |D\mathbb{1}_E|(\Omega)$ is the perimeter of $E$ in $\Omega$. Applying the smooth approximation theorem to $u = \mathbb{1}_E \in BV(\Omega)$ yields smooth functions $u_k$ converging strictly to $\mathbb{1}_E$. These $u_k$ are not indicator functions of sets, but by taking suitable level sets one can obtain smooth sets. Specifically, since $u_k$ is smooth, the co-area formula relates the perimeters of the level sets $\{u_k > t\}$ to the total variation of $u_k$:
\begin{align*}
|Du_k|(\Omega) = \int_{-\infty}^{\infty} P(\{u_k > t\}; \Omega) \, dt \geq \int_0^1 P(\{u_k > t\}; \Omega) \, dt.
\end{align*}
Since $|Du_k|(\Omega) \to |D\mathbb{1}_E|(\Omega) = P(E;\Omega)$ and $u_k \to \mathbb{1}_E$ in $L^1$, an averaging argument shows that, after passing to a subsequence, for $\mathcal{L}^1$-almost every $t \in (0,1)$ the sets $E_k^t = \{u_k > t\}$ satisfy $\mathbb{1}_{E_k^t} \to \mathbb{1}_E$ in $L^1(\Omega)$ and $P(E_k^t; \Omega) \to P(E;\Omega)$. Moreover, for almost every such $t$ the boundaries $\partial E_k^t \cap \Omega$ are smooth hypersurfaces for every $k$ (by Sard's theorem, since each $u_k$ is smooth and a countable union of null sets of critical values is null).
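The averaging argument can be sketched as follows, writing $f_k(t) = P(\{u_k > t\}; \Omega)$ (a shorthand introduced here). For fixed $t \in (0,1)$, Chebyshev's inequality gives $\mathcal{L}^n(\{u_k > t\} \,\triangle\, E) \leq \|u_k - \mathbb{1}_E\|_{L^1(\Omega)} / \min\{t, 1-t\} \to 0$, so lower semicontinuity of the perimeter yields $\liminf_k f_k(t) \geq P(E;\Omega)$. Fatou's lemma and the co-area formula then give
\begin{align*}
P(E;\Omega) \leq \int_0^1 \liminf_{k} f_k(t) \, dt \leq \liminf_{k} \int_0^1 f_k(t) \, dt \leq \lim_{k} |Du_k|(\Omega) = P(E;\Omega).
\end{align*}
All inequalities are therefore equalities, so $\liminf_k f_k(t) = P(E;\Omega)$ for almost every $t \in (0,1)$ and $\int_0^1 f_k \, dt \to P(E;\Omega)$. Together these force $f_k \to P(E;\Omega)$ in $L^1((0,1))$, hence almost everywhere along a subsequence.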
This geometric approximation result is the foundation for the next chapter's study of traces and extensions for BV: it allows one to define the trace of a BV function on $\partial\Omega$ by first working with smooth approximants where traces are classical.
<!-- illustration-needed: diagram showing a set E of finite perimeter (with rough boundary), the smooth approximating functions u_k as level-set functions converging to the indicator 1_E, and the smooth sets E_k^t as superlevel sets whose boundaries converge to the reduced boundary of E -->
Just as we could approximate Sobolev functions by smooth functions and derive compactness from bounded norms, BV functions admit both approximations and compactness results that mirror the Sobolev theory. These parallel results suggest a deeper unity between the two classes and motivate the study of their traces and extensions.
# 10. Traces and Extensions for BV
The preceding chapter established that BV functions are the natural domain for the total variation functional: a sequence bounded in $BV(\Omega)$ has a subsequence converging weakly-$*$ in $BV(\Omega)$, and the limiting function retains the full structure of its distributional derivative as a finite Radon measure. What the compactness theorem says nothing about is the behavior of a BV function near the boundary $\partial\Omega$. For Sobolev functions $u \in W^{1,p}(\Omega)$ with $1 \leq p < \infty$, the Sobolev trace theorem (Chapter 3) produces a bounded linear operator $T : W^{1,p}(\Omega) \to L^p(\partial\Omega; \mathcal{H}^{n-1})$ that extends pointwise restriction to the boundary. The BV analogue is both more subtle and richer: BV functions can jump as one approaches $\partial\Omega$, and those jumps contribute a genuine boundary term to the total variation of any zero-extension across the boundary. Understanding this boundary behavior is essential for the Gauss-Green theorem in Chapter 13, where the boundary integral over $\partial^* E$ involves the trace of $\mathbb{1}_E$, and for the BV Coarea Formula in Chapter 11, where the level sets of $u$ must be controlled near $\partial\Omega$ via the zero-extension formula proved in this chapter.
## The Trace Theorem for BV
A [Sobolev function](/page/Sobolev%20Space) $u \in W^{1,p}(\Omega)$ with $p \geq 1$ cannot be evaluated pointwise on $\partial\Omega$: the boundary has $\mathcal{L}^n$-measure zero, so changing $u$ on $\partial\Omega$ leaves the $L^p$ class unaffected. The trace theorem resolves this by showing that if we approximate $u$ by smooth functions and restrict those smooth functions to $\partial\Omega$, the resulting sequence of restrictions converges in $L^1(\partial\Omega; \mathcal{H}^{n-1})$, and the limit does not depend on the approximating sequence. The same strategy works for BV, though the argument must be adapted to strict convergence, since smooth functions are not dense in BV in norm.
The domain geometry matters here. Throughout this chapter, $\Omega \subset \mathbb{R}^n$ denotes a bounded open set with Lipschitz boundary, meaning $\partial\Omega$ is locally the graph of a Lipschitz function. This ensures that $\mathcal{H}^{n-1}(\partial\Omega) < \infty$ and that we can use outward unit normals $\nu_\Omega$ defined $\mathcal{H}^{n-1}$-almost everywhere on $\partial\Omega$.
[definition: BV Trace]
Let $\Omega \subset \mathbb{R}^n$ be a bounded open set with Lipschitz boundary. A bounded linear operator
\begin{align*}
T : BV(\Omega) \to L^1(\partial\Omega; \mathcal{H}^{n-1})
\end{align*}
is called a **trace operator** for $BV(\Omega)$ if $Tu = u|_{\partial\Omega}$ for every $u \in C(\overline{\Omega}) \cap BV(\Omega)$, where $u|_{\partial\Omega}$ denotes the restriction to $\partial\Omega$ of the continuous extension of $u$ to $\overline{\Omega}$.
[/definition]
The definition asks for consistency with classical restriction. For smooth or continuous functions, the boundary values are already determined; the trace operator is required to agree with that classical notion on the subclass where both make sense.
[quotetheorem:3110]
[citeproof:3110]
The trace $Tu$ is a genuine $L^1$ function on $\partial\Omega$ with respect to $\mathcal{H}^{n-1}$. It is not, in general, the pointwise limit $\lim_{x \to y} u(x)$ for $y \in \partial\Omega$ — BV functions can have jumps not only in the interior (along their jump set $J_u$) but also as they approach the boundary. What the trace theorem guarantees is that some well-defined $L^1$ boundary value exists; the pointwise character of this boundary value is a finer question addressed by the one-sided approximate limits discussed in Chapter 14.
### Necessity of the Lipschitz Hypothesis
The Lipschitz boundary assumption cannot be dropped from the trace theorem. If $\Omega$ is a bounded domain with a cusp, say $\Omega = \{(x_1, x_2) \in \mathbb{R}^2 : 0 < x_1 < 1, \, 0 < x_2 < x_1^2\}$, then a function can belong to $BV(\Omega)$ while its boundary values fail to be integrable. More concretely, the function $u(x_1, x_2) = 1/x_1$ belongs to $BV(\Omega)$ for the cuspidal domain above, because the domain is thin near $x_1 = 0$: $\|u\|_{L^1(\Omega)} = \int_0^1 x_1^{-1} \cdot x_1^2 \, dx_1 = 1/2$ and $|Du|(\Omega) = \int_0^1 x_1^{-2} \cdot x_1^2 \, dx_1 = 1$. Yet $u$ is smooth up to the bottom edge $\{x_2 = 0, \, 0 < x_1 < 1\} \subset \partial\Omega$, where its boundary values $1/x_1$ satisfy $\int_0^1 x_1^{-1} \, dx_1 = \infty$, so there is no boundary value in $L^1(\partial\Omega; \mathcal{H}^{1})$ and no bounded trace operator can exist.
## Jump Contribution to Total Variation
When a BV function $u \in BV(\Omega)$ is extended by zero outside $\Omega$, the resulting function $\tilde{u} = u \cdot \mathbb{1}_\Omega$ belongs to $BV(\mathbb{R}^n)$, but its total variation is in general strictly larger than $|Du|(\Omega)$: the jump of $u$ across $\partial\Omega$ contributes additional variation that must be accounted for. The following formula makes this precise and shows that the trace $Tu$ is exactly the ingredient needed to compute the total variation of the zero-extension.
[quotetheorem:3111]
[citeproof:3111]
The zero-extension formula has a clear geometric meaning. The distributional derivative of $\tilde{u}$ on the interior of $\Omega$ is just $Du$, the same measure as before. At the boundary, the function drops abruptly from the value $Tu(y)$ to zero as one crosses $\partial\Omega$ from inside to outside. This jump of magnitude $|Tu(y)|$ at $y \in \partial\Omega$ contributes exactly $\int_{\partial\Omega} |Tu| \, d\mathcal{H}^{n-1}$ to the total variation. When $Tu = 0$ almost everywhere on $\partial\Omega$, the function $\tilde{u}$ has no jump across $\partial\Omega$ and the two total variations agree: $|D\tilde{u}|(\mathbb{R}^n) = |Du|(\Omega)$. This is the BV analogue of the Sobolev condition $u \in W^{1,p}_0(\Omega)$, which is characterized precisely by $Tu = 0$.
[remark: BV Analogue of $W^{1,p}_0$]
The space $BV_0(\Omega)$ is usually defined through the trace, as $\{u \in BV(\Omega) : Tu = 0 \text{ $\mathcal{H}^{n-1}$-a.e. on } \partial\Omega\}$. This is strictly larger than the norm closure of $C_c^\infty(\Omega)$ in $BV(\Omega)$, which is $W^{1,1}_0(\Omega)$ and contains no function with a singular derivative. For functions with zero trace, the zero-extension $\tilde{u}$ satisfies $|D\tilde{u}|(\mathbb{R}^n) = |Du|(\Omega)$, so no total variation is added by the extension. This class is important in variational problems with Dirichlet boundary conditions in the BV setting, such as the least gradient problem and the total variation minimization problem.
[/remark]
The formula $|D\tilde{u}|(\mathbb{R}^n) = |Du|(\Omega) + \int_{\partial\Omega} |Tu| \, d\mathcal{H}^{n-1}$ will reappear in Chapter 13, where the Gauss-Green theorem for BV functions is proved, and in the perimeter computations of Chapter 12. In both settings, the boundary integral $\int_{\partial\Omega} |Tu| \, d\mathcal{H}^{n-1}$ represents the "cost" of the boundary values, and the formula decomposes the total variation into its interior and boundary contributions.
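A one-dimensional check of the formula, with a function chosen for illustration: take $\Omega = (0,1)$ and $u(x) = x$, so that $\partial\Omega = \{0, 1\}$, $\mathcal{H}^0$ is counting measure, and the trace is given by the one-sided limits $Tu(0) = 0$ and $Tu(1) = 1$.
\begin{align*}
D\tilde{u} &= \mathbb{1}_{(0,1)} \, \mathcal{L}^1 - \delta_1, \\
|D\tilde{u}|(\mathbb{R}) &= 1 + 1 = 2, \\
|Du|(\Omega) + \int_{\partial\Omega} |Tu| \, d\mathcal{H}^{0} &= 1 + \big(|Tu(0)| + |Tu(1)|\big) = 1 + (0 + 1) = 2.
\end{align*}
No variation is created at $x = 0$, where the trace vanishes; the whole boundary contribution comes from the drop of height $1$ at $x = 1$.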
### What the Formula Says for Sets of Finite Perimeter
A particularly clean instance of the zero-extension formula arises for the indicator function of a set. Let $E \subset \mathbb{R}^n$ be a set of finite perimeter in the sense of Chapter 12, and suppose $E$ is compactly contained in $\Omega$. Then $\mathbb{1}_E \in BV(\Omega)$, and the trace $T(\mathbb{1}_E)$ on $\partial\Omega$ is $0$ (since $\mathbb{1}_E$ vanishes in a neighbourhood of $\partial\Omega$). Consequently, $|D(\widetilde{\mathbb{1}_E})|(\mathbb{R}^n) = |D\mathbb{1}_E|(\Omega) = P(E; \Omega)$, the perimeter of $E$ in $\Omega$. For a set $E$ that does reach $\partial\Omega$, the zero-extension of $\mathbb{1}_E|_\Omega$ is $\mathbb{1}_{E \cap \Omega}$, and the trace of $\mathbb{1}_E$ takes only the values $0$ and $1$ $\mathcal{H}^{n-1}$-a.e. on $\partial\Omega$, so it is the indicator of the part of $\partial\Omega$ that $E$ reaches from inside. The zero-extension formula then gives
\begin{align*}
P(E \cap \Omega; \mathbb{R}^n) = P(E; \Omega) + \mathcal{H}^{n-1}(\{T\mathbb{1}_E = 1\}).
\end{align*}
This is the natural additivity: the perimeter of $E \cap \Omega$ in all of $\mathbb{R}^n$ equals the perimeter of $E$ in the interior of $\Omega$ plus the portion of $\partial\Omega$ covered by $E$.
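For a concrete instance (an example chosen here for illustration), let $\Omega = B(0,1) \subset \mathbb{R}^2$ be the unit disk and $E = \{x_2 > 0\}$ the upper half-plane. Then $E \cap \Omega$ is the upper half-disk, the part of $\partial E$ inside $\Omega$ is the diameter, and $E$ reaches $\partial\Omega$ along the upper semicircle:
\begin{align*}
P(E; \Omega) &= \mathcal{H}^1(\{x_2 = 0\} \cap \Omega) = 2, \\
\mathcal{H}^1(\{T\mathbb{1}_E = 1\}) &= \mathcal{H}^1(\partial\Omega \cap \{x_2 > 0\}) = \pi, \\
P(E \cap \Omega; \mathbb{R}^2) &= 2 + \pi.
\end{align*}
The total $2 + \pi$ is the length of the boundary of the half-disk, as expected.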
## The BV Extension Theorem
Having understood the trace, we now turn to extensions: given $u \in BV(\Omega)$, can we find $Eu \in BV(\mathbb{R}^n)$ with $Eu|_\Omega = u$ and with $\|Eu\|_{BV(\mathbb{R}^n)} \leq C\|u\|_{BV(\Omega)}$? The zero-extension $\tilde{u}$ from the previous section already satisfies such a bound, because the trace theorem controls the boundary term $\int_{\partial\Omega} |Tu| \, d\mathcal{H}^{n-1}$ by $C\|u\|_{BV(\Omega)}$ (though not by $|Du|(\Omega)$ alone, as constant functions show). Its drawback is that it creates a jump along $\partial\Omega$: the measure $D\tilde{u}$ charges the boundary whenever $Tu \neq 0$. An extension that avoids this must neutralize the jump at $\partial\Omega$.
For Sobolev spaces, Stein's extension theorem produces a universal bounded linear extension operator $E : W^{k,p}(\Omega) \to W^{k,p}(\mathbb{R}^n)$ for all $k$ and $p$ simultaneously, using a Whitney-type decomposition combined with a reflection construction. For BV, the analogous construction uses the same geometric ideas — reflection across the boundary, followed by a partition of unity — but the total variation control is more delicate because jumps propagate differently from distributional derivatives.
[quotetheorem:3112]
[citeproof:3112]
The reflection construction has a subtle point worth emphasizing: the reflection makes $\tilde{u}_k$ continuous across $\{x_n = 0\}$ (the values match from both sides), so the reflected function has no additional jump at the boundary. In contrast, the zero-extension is discontinuous at $\partial\Omega$ whenever $Tu \neq 0$, which is exactly why its derivative carries the extra boundary term. The reflection "fills in" the boundary values on both sides of $\partial\Omega$ by matching them, so no new singularity is introduced.
[example: Zero-Extension Versus Reflection]
Consider $\Omega = (0,1) \subset \mathbb{R}$ and $u(x) = 1$ for all $x \in (0,1)$, so $u \in BV(\Omega)$ with $|Du|(\Omega) = 0$ and $\|u\|_{BV(\Omega)} = 1$. The zero-extension is $\tilde{u} = \mathbb{1}_{(0,1)}$, which has $|D\tilde{u}|(\mathbb{R}) = 2$: the jumps at $0$ and $1$ contribute $|Tu(0)| + |Tu(1)| = 1 + 1 = 2$, and all of this variation sits on $\partial\Omega = \{0, 1\}$. In contrast, reflecting across both endpoints gives the function equal to $1$ on $(-1, 2)$, and multiplying by a smooth cutoff $\eta$ supported in $(-1, 2)$ with $0 \leq \eta \leq 1$ and $\eta = 1$ on $[0,1]$ gives $Eu = \eta$. This extension satisfies $\|Eu\|_{BV(\mathbb{R})} \leq C\|u\|_{BV(\Omega)}$ with $C$ depending only on $\eta$, and its total variation $\int_{\mathbb{R}} |\eta'| \, d\mathcal{L}^1 \geq 2$ is no smaller than that of the zero-extension, since any integrable extension of the constant $1$ must fall to zero on both sides. The difference is where the variation lives: $|D(Eu)|(\partial\Omega) = 0$, whereas $D\tilde{u}$ is concentrated entirely on $\partial\Omega$.
[/example]
### Compact Containment and the Role of the Extension
The BV extension theorem has the same consequence for compactness as Stein's theorem does for Sobolev spaces: it allows us to work with BV functions on $\mathbb{R}^n$ rather than on $\Omega$, which is technically convenient because $\mathbb{R}^n$ has no boundary to worry about. Specifically, if $\{u_k\}$ is a bounded sequence in $BV(\Omega)$, then $\{Eu_k\}$ is bounded in $BV(\mathbb{R}^n)$, and the BV compactness theorem on $\mathbb{R}^n$ (proved via the Rellich-Kondrachov theorem for $BV$) yields a subsequence $Eu_{k_j} \to v$ in $L^1_{\text{loc}}(\mathbb{R}^n)$. Restricting to $\Omega$ gives $u_{k_j} \to v|_\Omega$ in $L^1(\Omega)$, recovering the compactness statement for $BV(\Omega)$ from Chapter 9 via a different route.
The extension theorem also justifies working with traces as if they were boundary values in the classical sense: for $u \in BV(\Omega)$, the trace $Tu$ on $\partial\Omega$ is the unique $L^1(\partial\Omega; \mathcal{H}^{n-1})$ function such that $u$ can be extended to a BV function on a slightly larger domain $\Omega'$ agreeing with $Tu$ along $\partial\Omega$ in the appropriate sense. This interpretation makes the Gauss-Green theorem in Chapter 13 conceptually natural: the boundary integral $\int_{\partial\Omega} Tu \, (g \cdot \nu_\Omega) \, d\mathcal{H}^{n-1}$ that appears there, for a smooth vector field $g$, uses $Tu$ as the boundary values of $u$, and the proof goes through because the extension theorem guarantees the BV structure is well-behaved up to the boundary.
## Traces and the Coarea Formula
The trace and extension results interact with the coarea formula for BV functions in a way that clarifies the role of boundary values in the level-set decomposition. For $u \in BV(\Omega)$, the coarea formula (proved in Chapter 11) states
\begin{align*}
|Du|(\Omega) = \int_{-\infty}^{\infty} P(\{u > t\}; \Omega) \, dt,
\end{align*}
where $P(E; \Omega)$ denotes the perimeter of a set $E$ in $\Omega$, applied here to the superlevel set $E = \{u > t\}$. This formula makes BV functions look like integrals of sets of finite perimeter over the level parameter $t$. The trace enters through the boundary behavior of these superlevel sets: for almost every $t$, the set $\{u > t\}$ has finite perimeter in $\Omega$, and the trace of its indicator function on $\partial\Omega$ is the indicator of the boundary superlevel set $\{Tu > t\}$.
This observation gives a second formula, complementary to the coarea formula for $|Du|(\Omega)$. For $u \geq 0$ (so that $Tu \geq 0$),
\begin{align*}
\int_{\partial\Omega} |Tu| \, d\mathcal{H}^{n-1} = \int_{0}^{\infty} \mathcal{H}^{n-1}(\{Tu > t\} \cap \partial\Omega) \, dt.
\end{align*}
This is simply the layer-cake representation of $\int |Tu| \, d\mathcal{H}^{n-1}$, but together with the coarea formula it expresses the zero-extension formula $|D\tilde{u}|(\mathbb{R}^n) = |Du|(\Omega) + \int_{\partial\Omega} |Tu| \, d\mathcal{H}^{n-1}$ as an identity of level-set perimeters:
\begin{align*}
P(\{\tilde{u} > t\}; \mathbb{R}^n) = P(\{u > t\}; \Omega) + \mathcal{H}^{n-1}(\{Tu > t\} \cap \partial\Omega) \quad \text{for a.e. } t > 0.
\end{align*}
For signed $u$, the same identities hold at levels $t < 0$ with $\{Tu > t\}$ replaced by $\{Tu < t\}$, because for negative $t$ the set $\{\tilde{u} > t\}$ contains the complement of $\Omega$. This slicing identity (the perimeter of each superlevel set of $\tilde{u}$ in $\mathbb{R}^n$ decomposes into the interior perimeter and the boundary area) will be used in Chapter 11 to derive the coarea formula for $\tilde{u}$ directly from the coarea formula for $u$.
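The decomposition can be checked by hand in one dimension. As an illustrative case (our choice, not taken from the text), let $u(x) = x$ on $\Omega = (0,1)$, so that $Tu(0) = 0$, $Tu(1) = 1$, and $\mathcal{H}^{0}$ is counting measure on $\partial\Omega = \{0,1\}$. For $0 < t < 1$ the superlevel set of the zero-extension is $\{\tilde{u} > t\} = (t, 1)$, and
\begin{align*}
P(\{\tilde{u} > t\}; \mathbb{R}) &= 2 && \text{(endpoints } t \text{ and } 1\text{)}, \\
P(\{u > t\}; \Omega) &= 1 && \text{(only the endpoint } t \text{ lies in } \Omega\text{)}, \\
\mathcal{H}^{0}(\{Tu > t\} \cap \partial\Omega) &= \mathcal{H}^{0}(\{1\}) = 1.
\end{align*}
Integrating over $t \in (0,1)$ gives $2 = 1 + 1$, in agreement with $|D\tilde{u}|(\mathbb{R}) = 2$, $|Du|(\Omega) = 1$, and $\int_{\partial\Omega} |Tu| \, d\mathcal{H}^0 = 1$. For $t \geq 1$ all three terms vanish.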
[explanation: Why the Trace Cannot Be Improved to $L^p$ for $p > 1$]
For Sobolev functions $u \in W^{1,p}(\Omega)$ with $1 < p < n$, the Sobolev trace theorem gives $Tu \in L^{p(n-1)/(n-p)}(\partial\Omega; \mathcal{H}^{n-1})$, a strictly better integrability than $L^1$. The BV trace, by contrast, only lands in $L^1(\partial\Omega; \mathcal{H}^{n-1})$, and this cannot be improved in general. To see why, take a sequence of bump functions $u_k \in C^\infty(\overline{\Omega})$ concentrating at a boundary point $y_0 \in \partial\Omega$, normalized so that $\|u_k\|_{BV(\Omega)}$ stays bounded while $u_k|_{\partial\Omega}$ approximates a multiple of a delta mass at $y_0$. (The bumps must not be compactly supported in $\Omega$, since otherwise their traces would vanish.) The BV norm controls $\int_{\partial\Omega} |u_k| \, d\mathcal{H}^{n-1}$ uniformly (by the trace estimate), so no blow-up occurs in $L^1$, but the sequence has no $L^p$ bound for $p > 1$ because the mass concentrates. More precisely, take $n = 2$, $\Omega = B(0,1)$, $y_0 \in \partial\Omega$, and $u_k(x) = k\,\phi(k(x - y_0))$ for a smooth compactly supported bump $\phi \geq 0$ on $\mathbb{R}^2$ with $\phi(0) > 0$ and $\int \phi \, d\mathcal{L}^2 = 1$. Then $\|u_k\|_{L^1(\Omega)} \leq k \cdot k^{-2} = k^{-1} \to 0$ and $|Du_k|(\Omega) \leq k^2 \cdot k^{-2} \cdot \|\nabla\phi\|_{L^1} = \|\nabla\phi\|_{L^1}$, so the BV norms are bounded. On the boundary, $Tu_k$ has height of order $k$ on an arc of length of order $k^{-1}$ around $y_0$, so $\int_{\partial\Omega} |Tu_k| \, d\mathcal{H}^1$ remains bounded while $\int_{\partial\Omega} |Tu_k|^2 \, d\mathcal{H}^1$ is of order $k^2 \cdot k^{-1} = k \to \infty$. Hence no estimate of the form $\|Tu\|_{L^p(\partial\Omega)} \leq C\|u\|_{BV(\Omega)}$ can hold for $p > 1$: the trace operator genuinely maps into $L^1$ and no better, consistent with the measure-valued character of BV.
[/explanation]
The $L^1$ landing space for the BV trace is therefore sharp: BV functions have enough control to define boundary values in $L^1(\partial\Omega; \mathcal{H}^{n-1})$, but the possibility of mass concentration near the boundary prevents any higher integrability. In this respect BV does not differ from $W^{1,1}$: although $W^{1,1}(\Omega) \subset BV(\Omega)$ with $|Du|(\Omega) = \|\nabla u\|_{L^1(\Omega)}$, the trace of a $W^{1,1}$ function also lands only in $L^1(\partial\Omega; \mathcal{H}^{n-1})$, so BV and $W^{1,1}$ have the same trace quality despite $W^{1,1}$ being strictly smaller.
The results of this chapter (the BV trace theorem, the zero-extension formula, and the BV extension theorem) set up the infrastructure needed for the following chapters. In Chapter 11, the coarea formula will be established using these boundary controls on the level sets of BV functions. In Chapter 12, the perimeter of a set of finite perimeter will be computed in terms of the trace of its indicator function, and the decomposition $P(E; \mathbb{R}^n) = P(E; \Omega) + \int_{\partial\Omega} T\mathbb{1}_E \, d\mathcal{H}^{n-1}$ for $E \subset \Omega$ will serve as the starting point for De Giorgi's rectifiability theorem. The Gauss-Green theorem in Chapter 13 will then use the trace as the correct notion of boundary values, completing the circle between the abstract BV theory and the classical integral formulas of vector calculus.
# 11. The Coarea Formula for BV Functions
The preceding chapters built the BV theory from the ground up: we defined $BV(\Omega)$, established approximation by smooth functions, proved compactness under bounded total variation, and extended the trace and extension machinery from Sobolev functions to BV. Having assembled these tools, a natural and profound question emerges: is there a way to relate the total variation measure $|Du|$ — a single measure on $\Omega$ — to some collection of simpler geometric data attached to $u$? The coarea formula answers this question decisively, expressing $|Du|(\Omega)$ as an integral, over $t \in \mathbb{R}$, of the perimeters of the level sets $\{u > t\}$.
This is not merely a representation theorem. It is a bridge between two regimes: the analytic regime of BV functions and the geometric regime of sets of finite perimeter. Via the coarea formula, every BV statement can be sliced into a family of perimeter statements, one for each level $t$. Compactness for BV reduces to compactness for sets of finite perimeter. Isoperimetric inequalities for BV functions reduce to the classical isoperimetric inequality for sets. The formula is, in this sense, the structural backbone of all that follows in Chapters 12 through 14.
## From the Lipschitz Coarea to the BV Coarea
To understand what the BV coarea formula says and why it is the right statement, it helps to start from the version we already know. In GMT II, the Lipschitz coarea formula (the companion to the area formula for Lipschitz maps) asserts that for any Lipschitz function $f: \mathbb{R}^n \to \mathbb{R}$,
\begin{align*}
\int_{\mathbb{R}^n} |\nabla f| \, d\mathcal{L}^n = \int_{-\infty}^{\infty} \mathcal{H}^{n-1}(f^{-1}(t)) \, dt.
\end{align*}
The left side is the $L^1$ integral of the pointwise Jacobian $J_1 f = |\nabla f|$ (which exists a.e. by Rademacher's theorem). The right side sums up the $(n-1)$-dimensional measure of the level sets $f^{-1}(t)$ as $t$ ranges over all of $\mathbb{R}$.
The challenge in moving to BV is that neither side of this identity makes sense verbatim. A general $u \in BV(\Omega)$ need not have a pointwise gradient — the distributional derivative $Du$ is a vector-valued Radon measure, not an $L^1$ function. Consequently, $|\nabla u|$ is not defined as a function, and the left side has no direct meaning. On the right side, the level surfaces $u^{-1}(t)$ may be wildly irregular, and $\mathcal{H}^{n-1}(u^{-1}(t))$ is not the right quantity. The correct replacement for each level $t$ is the perimeter $P(\{u > t\}, \Omega)$, the total variation of the indicator $\mathbb{1}_{\{u > t\}}$.
[definition: Level Set of a BV Function]
Let $\Omega \subset \mathbb{R}^n$ be open and $u \in BV(\Omega)$. For each $t \in \mathbb{R}$, the superlevel set at height $t$ is
\begin{align*}
E_t := \{x \in \Omega : u(x) > t\}.
\end{align*}
The perimeter of $E_t$ in $\Omega$ is
\begin{align*}
P(E_t, \Omega) := |D\mathbb{1}_{E_t}|(\Omega),
\end{align*}
where $\mathbb{1}_{E_t}$ is the indicator function of $E_t$ and $|D\mathbb{1}_{E_t}|$ denotes the total variation measure of its distributional derivative.
[/definition]
The definition $P(E_t, \Omega) = |D\mathbb{1}_{E_t}|(\Omega)$ is the intrinsic BV notion of perimeter introduced in Chapter 8 and developed further in Chapter 12. When $E_t$ has a smooth boundary, $P(E_t, \Omega) = \mathcal{H}^{n-1}(\partial E_t \cap \Omega)$ recovers the classical surface area — but the definition is meaningful far beyond the smooth setting, precisely because it is phrased in terms of total variation rather than Hausdorff measure.
## Statement and Proof of the Coarea Formula
The goal is now to identify $|Du|(\Omega)$ with $\int_{-\infty}^{\infty} P(E_t, \Omega) \, dt$. Before stating the theorem, note why such a result is plausible. The level set $E_t = \{u > t\}$ depends monotonically on $t$: as $t$ increases, $E_t$ shrinks. The function $t \mapsto \mathbb{1}_{E_t}(x)$ equals $\mathbb{1}_{(-\infty, u(x))}(t)$, so integrating $\mathbb{1}_{E_t}$ over $t > 0$ recovers $u$ via the layer-cake formula $u(x) = \int_{0}^{\infty} \mathbb{1}_{\{u > t\}}(x) \, dt$ (valid for non-negative $u$; the general case follows by writing $u = u^+ - u^-$). This layer-cake representation is the heuristic heart of the coarea formula: summing up the boundaries of all level sets should reconstitute the full variation of $u$.
[quotetheorem:598]
[citeproof:598]
The proof reveals why both directions of the inequality are needed and why the approximation argument is delicate. Lower semicontinuity of perimeter gives the upper bound essentially for free once we have smooth approximations, but the matching lower bound requires going back to the definition of total variation via test functions and applying the layer-cake formula. The two halves of the proof also illuminate the formula's structure: the upper bound says "integrating perimeters cannot exceed total variation," while the lower bound says "total variation cannot escape into the level sets."
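The two inequalities can be summarized as follows. This is only a sketch of the standard argument, written under the assumption that the cited proof follows the usual route; the test field $\varphi$ and the approximating sequence $u_k$ are notation introduced here. For $\varphi \in C_c^1(\Omega; \mathbb{R}^n)$ with $|\varphi| \leq 1$, the layer-cake decomposition $u = \int_0^\infty \mathbb{1}_{E_t} \, dt - \int_{-\infty}^0 (1 - \mathbb{1}_{E_t}) \, dt$ together with $\int_\Omega \operatorname{div}\varphi \, dx = 0$ gives
\begin{align*}
\int_\Omega u \operatorname{div}\varphi \, dx = \int_{-\infty}^{\infty} \left( \int_{E_t} \operatorname{div}\varphi \, dx \right) dt \leq \int_{-\infty}^{\infty} P(E_t, \Omega) \, dt,
\end{align*}
and taking the supremum over $\varphi$ yields $|Du|(\Omega) \leq \int P(E_t, \Omega) \, dt$. Conversely, if $u_k$ are smooth with $u_k \to u$ in $L^1(\Omega)$ and $|Du_k|(\Omega) \to |Du|(\Omega)$, then (after passing to a subsequence) $\mathbb{1}_{\{u_k > t\}} \to \mathbb{1}_{E_t}$ in $L^1(\Omega)$ for a.e. $t$, and lower semicontinuity, Fatou's lemma, and the coarea formula for smooth functions give
\begin{align*}
\int_{-\infty}^{\infty} P(E_t, \Omega) \, dt
\leq \int_{-\infty}^{\infty} \liminf_{k \to \infty} P(\{u_k > t\}, \Omega) \, dt
\leq \liminf_{k \to \infty} \int_\Omega |\nabla u_k| \, dx = |Du|(\Omega).
\end{align*}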
The corollary on finite perimeter for a.e. $t$ is worth highlighting separately. If $\int P(E_t, \Omega) \, dt < \infty$, then the integrand $P(E_t, \Omega)$ must be finite for $\mathcal{L}^1$-a.e. $t$. Since $u \in BV(\Omega)$ guarantees $|Du|(\Omega) < \infty$, the coarea formula gives exactly this: almost every level set of a BV function has finite perimeter.
[remark: Failure When $u \notin BV$]
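The hypothesis $u \in BV(\Omega)$ cannot be dropped, and finiteness of each individual perimeter is not enough: what fails for $u \notin BV(\Omega)$ is the integrability of $t \mapsto P(E_t, \Omega)$. Take $\Omega = (0,1)^2$ and define $u(x_1, x_2) = \sum_{k=1}^\infty k^{-1} \mathbb{1}_{[0, 1/k) \times [0,1)}(x_1, x_2)$. This function lies in $L^1(\Omega)$, since $\int_\Omega u \, d\mathcal{L}^2 = \sum_k k^{-2} < \infty$, but not in $BV(\Omega)$: across each vertical segment $\{x_1 = 1/k\}$ with $k \geq 2$, which has length $1$, the function jumps by $k^{-1}$, so its total variation is $\sum_{k \geq 2} k^{-1} \cdot 1 = \infty$. On the level-set side, write $H_k = \sum_{j=1}^k j^{-1}$, so that $u = H_k$ on $[1/(k+1), 1/k) \times [0,1)$. For $t \in (H_{k-1}, H_k)$ with $k \geq 2$, the superlevel set $\{u > t\}$ is the part of the strip $[0, 1/k) \times [0,1)$ lying in $\Omega$, whose perimeter in $\Omega$ is $1$ (only the segment $\{x_1 = 1/k\}$ lies inside $\Omega$). Integrating over $t$ gives $\sum_{k \geq 2} (H_k - H_{k-1}) \cdot 1 = \sum_{k \geq 2} k^{-1} = \infty$, consistent with $u \notin BV$. Every level set here has finite perimeter, yet the perimeters are not integrable in $t$; the assumption $u \in BV(\Omega)$ is exactly what makes $\int P(E_t, \Omega) \, dt$ finite.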
[/remark]
## Comparison with the Lipschitz Coarea Formula
The BV coarea formula is properly understood as the "intrinsic" or "measure-theoretic" version of the Lipschitz coarea formula from GMT II. Setting the two side by side clarifies what is genuinely new.
In the Lipschitz setting, $f: \mathbb{R}^n \to \mathbb{R}$ Lipschitz, the formula reads:
\begin{align*}
\int_{\mathbb{R}^n} |\nabla f| \, d\mathcal{L}^n = \int_{-\infty}^{\infty} \mathcal{H}^{n-1}(f^{-1}(t)) \, dt.
\end{align*}
On the left side, $|\nabla f|$ is the pointwise Jacobian of $f$ (a scalar $L^1$ function, available because Lipschitz maps are a.e. differentiable by Rademacher). On the right side, $\mathcal{H}^{n-1}(f^{-1}(t))$ is the $(n-1)$-dimensional Hausdorff measure of the level set.
In the BV setting, $u \in BV(\Omega)$, the formula reads:
\begin{align*}
|Du|(\Omega) = \int_{-\infty}^{\infty} P(\{u > t\}, \Omega) \, dt.
\end{align*}
The left side $|Du|(\Omega)$ replaces $\int |\nabla u| \, d\mathcal{L}^n$: when $u$ has no pointwise gradient, the total variation is the only intrinsic substitute. The right side $P(\{u > t\}, \Omega) = |D\mathbb{1}_{E_t}|(\Omega)$ replaces $\mathcal{H}^{n-1}(u^{-1}(t))$: when $\{u > t\}$ has rough boundary, the BV perimeter is the only intrinsically well-defined notion. The two sides of the BV formula are therefore obtained from the Lipschitz formula by replacing every classical (smooth, pointwise) quantity with its BV (measure-theoretic, total-variation) counterpart.
[example: The Lipschitz Case Revisited]
Suppose $\Omega$ is bounded and $u: \Omega \to \mathbb{R}$ is Lipschitz. Then $u \in W^{1,1}(\Omega) \subset BV(\Omega)$, and the distributional derivative $Du$ is the $\mathcal{L}^n$-absolutely continuous measure $Du = \nabla u \, \mathcal{L}^n$. The total variation satisfies $|Du|(\Omega) = \int_\Omega |\nabla u| \, d\mathcal{L}^n$. For a.e. $t$, the perimeter of the superlevel set agrees with the Hausdorff measure of the level set: $P(\{u > t\}, \Omega) = \mathcal{H}^{n-1}(\{u = t\} \cap \Omega)$. When $u$ is smooth this follows from Sard's theorem, since a.e. $t$ is a regular value and $\{u > t\}$ then has smooth boundary $\{u = t\}$ in $\Omega$, for which the BV perimeter equals the classical surface area. For a merely Lipschitz $u$ the level sets need not be Lipschitz hypersurfaces, but the identity still holds for a.e. $t$, a standard fact that we do not prove here. Substituting, the BV coarea formula $|Du|(\Omega) = \int P(\{u > t\}, \Omega) \, dt$ becomes $\int_\Omega |\nabla u| \, d\mathcal{L}^n = \int \mathcal{H}^{n-1}(\{u = t\} \cap \Omega) \, dt$, which is exactly the Lipschitz coarea formula. Thus the BV formula genuinely extends the Lipschitz one.
[/example]
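A concrete instance makes the agreement of the two formulas visible. As an illustration (the choice of function is ours), take $u(x) = |x|$ on $\Omega = B(0,1) \subset \mathbb{R}^n$ and write $\omega_n = \mathcal{L}^n(B(0,1))$. Then $|\nabla u| = 1$ a.e., and for $0 < t < 1$ the superlevel set $\{u > t\} = B(0,1) \setminus \overline{B}(0,t)$ has boundary $\partial B(0,t)$ inside $\Omega$, so
\begin{align*}
|Du|(\Omega) &= \int_{B(0,1)} |\nabla u| \, d\mathcal{L}^n = \omega_n, \\
\int_{-\infty}^{\infty} P(\{u > t\}, \Omega) \, dt &= \int_0^1 \mathcal{H}^{n-1}(\partial B(0,t)) \, dt = \int_0^1 n\omega_n t^{n-1} \, dt = \omega_n.
\end{align*}
For $t < 0$ the superlevel set is all of $\Omega$ and for $t \geq 1$ it is empty, so those levels contribute nothing.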
The example also reveals when the two formulas diverge. For a BV function that is not Sobolev — such as a function with jump discontinuities along a hypersurface — the measure $Du$ has a singular part supported on the jump set, and the total variation picks up contributions from jumps that have no analogue in the Lipschitz world. On the right side, the level sets $\{u > t\}$ for $t$ in the range of the jump will have extra boundary along the jump set, contributing to their perimeter. The equality of left and right sides in the BV coarea formula accounts for these singular contributions automatically, which is precisely why the perimeter $P(E_t, \Omega)$ — not $\mathcal{H}^{n-1}(\partial E_t)$ — is the correct quantity on the right.
## Applications: Slicing BV Functions
The coarea formula's most powerful use is as a reduction principle. Problems about BV functions frequently reduce, via the coarea formula, to the corresponding problem for characteristic functions of sets of finite perimeter. Since characteristic functions are simpler objects (binary-valued, with their entire complexity encoded in the geometry of a single set), this reduction often makes otherwise intractable problems accessible. The obstacle in applying this reduction is making the slicing argument precise, which requires controlling the perimeters of the level sets in an integrated sense (for a.e. $t$, with an integrable bound), since control that is uniform in $t$ is usually unavailable.
### Compactness via the Coarea Formula
The compactness theorem for BV (Chapter 9) states that if $(u_k)$ is a bounded sequence in $BV(\Omega)$ with $\|u_k\|_{BV} \le C$, then a subsequence converges in $L^1_{\mathrm{loc}}(\Omega)$. One way to see why this is true is through the coarea formula.
[explanation: Slicing Argument for BV Compactness]
Given a bounded sequence $(u_k)$ in $BV(\Omega)$, the coarea formula gives
\begin{align*}
\int_{-\infty}^{\infty} P(\{u_k > t\}, \Omega) \, dt = |Du_k|(\Omega) \le C
\end{align*}
for all $k$. By Markov's inequality (or simply by the integrability of the left side), for each $\varepsilon > 0$ and each $k$, the set of $t$ for which $P(\{u_k > t\}, \Omega) > C/\varepsilon$ has $\mathcal{L}^1$-measure at most $\varepsilon$. More precisely, Fatou's lemma shows that for $\mathcal{L}^1$-a.e. $t$,
\begin{align*}
\liminf_{k \to \infty} P(\{u_k > t\}, \Omega) < \infty.
\end{align*}
This means that, for a.e. $t$, the indicators $\mathbb{1}_{\{u_k > t\}}$ have bounded perimeter along some subsequence, which a priori depends on $t$. The compactness theorem for sets of finite perimeter (a sequence of sets in $\Omega$ with uniformly bounded perimeters has a subsequence converging in $L^1_{\mathrm{loc}}(\Omega)$) then applies level by level, and a diagonal extraction is needed to produce a single subsequence $k_j$ such that $\mathbb{1}_{\{u_{k_j} > t\}}$ converges in $L^1_{\mathrm{loc}}$ for a.e. $t$. Integrating over $t$ via the layer-cake representation and dominated convergence yields $L^1_{\mathrm{loc}}$ convergence of $u_{k_j}$. While the full proof in Chapter 9 does not follow this exact path (the diagonal extraction is handled differently), the argument shows that BV compactness is, at its core, the compactness for sets of finite perimeter, integrated over all levels.
[/explanation]
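The Fatou step in the slicing argument can be written out in one line. With $C$ the uniform bound on $|Du_k|(\Omega)$, Fatou's lemma applied to the non-negative functions $t \mapsto P(\{u_k > t\}, \Omega)$ and the coarea formula give
\begin{align*}
\int_{-\infty}^{\infty} \liminf_{k \to \infty} P(\{u_k > t\}, \Omega) \, dt
\leq \liminf_{k \to \infty} \int_{-\infty}^{\infty} P(\{u_k > t\}, \Omega) \, dt
= \liminf_{k \to \infty} |Du_k|(\Omega) \leq C.
\end{align*}
A non-negative function with finite integral is finite almost everywhere, which is the a.e. finiteness of the lower limit used in the argument.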
The slicing perspective is not just conceptually clarifying — it is quantitatively useful. When one needs to track how much of the total variation of $u$ comes from a specific region or a specific range of levels, the coarea formula isolates exactly the relevant level sets and their perimeters.
### The Isoperimetric Inequality for BV Functions
The classical isoperimetric inequality says that among all sets of a given volume, the ball minimizes perimeter. In the BV setting, the isoperimetric inequality takes the following form.
[quotetheorem:3113]
The theorem is quoted here without proof. It follows from combining the isoperimetric inequality for sets of finite perimeter with the coarea formula — indeed, the proof applies the set-isoperimetric inequality to $\{u > t\}$, integrates over $t$ using the coarea formula, and applies the Minkowski integral inequality. Its proof is carried out in detail in Evans-Gariepy §5.6.
The role of the coarea formula in the proof deserves emphasis, because it exemplifies the reduction principle. The BV Sobolev inequality (that $\|u\|_{L^{n/(n-1)}} \lesssim |Du|(\mathbb{R}^n)$) is equivalent, via the coarea formula, to the isoperimetric inequality for sets of finite perimeter: knowing that perimeter controls volume for indicator functions is the same as knowing that total variation controls the $L^{n/(n-1)}$ norm for all BV functions. This equivalence, mediated by the coarea formula, is a template for many arguments throughout the theory.
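The reduction can be sketched in a few lines for non-negative $u \in BV(\mathbb{R}^n)$. Here we assume the set-isoperimetric inequality in the form $\mathcal{L}^n(E)^{(n-1)/n} \leq C_{\mathrm{iso}} \, P(E)$ for sets of finite measure; the symbol $C_{\mathrm{iso}}$ is notation introduced for this sketch and may differ from the constant in the quoted theorem. Writing $u = \int_0^\infty \mathbb{1}_{E_t} \, dt$ with $E_t = \{u > t\}$ and applying the Minkowski integral inequality,
\begin{align*}
\|u\|_{L^{n/(n-1)}(\mathbb{R}^n)}
&\leq \int_0^\infty \|\mathbb{1}_{E_t}\|_{L^{n/(n-1)}(\mathbb{R}^n)} \, dt
= \int_0^\infty \mathcal{L}^n(E_t)^{(n-1)/n} \, dt \\
&\leq C_{\mathrm{iso}} \int_0^\infty P(E_t) \, dt
= C_{\mathrm{iso}} \, |Du|(\mathbb{R}^n).
\end{align*}
Conversely, inserting $u = \mathbb{1}_E$ into the functional inequality returns $\mathcal{L}^n(E)^{(n-1)/n} \leq C_{\mathrm{iso}} \, P(E)$, so the two statements carry the same constant.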
### The Coarea Formula for Bounded Borel Functions
A useful variant of the main formula handles the case of a BV function weighted by a bounded Borel function.
[quotetheorem:3114]
[citeproof:3114]
This weighted formula is exactly what is needed when one wants to integrate a coefficient (for instance, a weight arising from a change of variables or a cutoff function) against the total variation measure. It encodes the full coarea formula not just for the total mass $|Du|(\Omega)$ but for the measure $|Du|$ itself.
## Measurability and the Structure of Level Set Families
The coarea formula treats $t \mapsto P(\{u > t\}, \Omega)$ as an $\mathcal{L}^1$-integrable function, so a word on measurability is warranted. A key point is that the map $t \mapsto \mathbb{1}_{\{u > t\}}$ is not merely measurable in a formal sense — it is jointly measurable in $(x, t)$.
[explanation: Joint Measurability of the Level Set Map]
Define $F: \Omega \times \mathbb{R} \to \{0,1\}$ by $F(x,t) = \mathbb{1}_{\{u > t\}}(x) = \mathbb{1}_{\{(x,t): u(x) > t\}}(x,t)$. The set $\{(x,t) \in \Omega \times \mathbb{R} : u(x) > t\}$ is the subgraph of $u$. Since $u$ is $\mathcal{L}^n$-measurable and BV functions are only defined a.e., we may fix a Borel representative of $u$; the subgraph is then measurable with respect to the product $\sigma$-algebra $\mathcal{B}(\Omega) \otimes \mathcal{B}(\mathbb{R})$. Hence $F$ is jointly measurable, and for each fixed $x$, the section $t \mapsto F(x,t)$ is the decreasing indicator of the interval $(-\infty, u(x))$, while for each fixed $t$, the section $x \mapsto F(x,t) = \mathbb{1}_{E_t}(x)$ is $\mathcal{L}^n$-measurable. Fubini's theorem then justifies the interchange of integration order in the proof of the coarea formula. Finally, $t \mapsto P(E_t, \Omega)$ is $\mathcal{L}^1$-measurable because the perimeter is the supremum of the functions $t \mapsto \int_{E_t} \operatorname{div}\varphi \, dx$ over a countable dense family of test fields $\varphi \in C_c^1(\Omega; \mathbb{R}^n)$ with $|\varphi| \leq 1$, and each of these functions is measurable in $t$ by Fubini's theorem.
[/explanation]
An important structural consequence of the coarea formula concerns the decomposition of the measure $|Du|$. Recall from Chapter 8 that for $u \in BV(\Omega)$, the measure $Du$ decomposes as
\begin{align*}
Du = D^a u + D^j u + D^c u,
\end{align*}
where $D^a u = \nabla u \, \mathcal{L}^n$ is the absolutely continuous part, $D^j u$ is the jump part concentrated on the jump set $J_u$, and $D^c u$ is the Cantor part. The coarea formula is sensitive to all three parts simultaneously. When $u \in W^{1,1}(\Omega)$, there is no jump or Cantor part, and the coarea formula reduces to the Lipschitz coarea formula (as in the example above). But for a general BV function, the perimeter of the level set $\{u > t\}$ picks up contributions from all three parts: from the absolutely continuous part via the classical surface area of smooth pieces of $\partial\{u > t\}$, from the jump part via the jump discontinuities of $u$ along $J_u$, and from the Cantor part via the Cantor-type singular behavior of $u$.
[example: Coarea Formula for the Sign Function]
Let $\Omega = (-1,1)$ and $u: \Omega \to \mathbb{R}$ be defined by $u(x) = \operatorname{sgn}(x)$, the sign function (with $u(0) = 0$ for definiteness, though the value at a single point does not affect the BV structure). Then $u$ is a BV function with $Du = 2\delta_0$ (a Dirac mass of weight $2$ at the origin), so $|Du|(\Omega) = 2$.
For $t \in (-1, 0)$: $u(x) > t$ holds when $x > 0$ (where $u = 1$) and at $x = 0$ (where $u = 0 > t$), but not when $x < 0$ (where $u = -1 < t$). So $\{u > t\} = [0, 1)$, which agrees with $(0,1)$ up to a null set and therefore has perimeter $P(\{u > t\}, (-1,1)) = |D\mathbb{1}_{(0,1)}|((-1,1)) = 1$ (the indicator of $(0,1)$ has a single jump of size $1$ at $x = 0$).
For $t \in (0, 1)$: $u(x) > t > 0$ only when $u(x) = 1$, i.e., for $x > 0$. So $\{u > t\} = (0,1)$ for $t \in (0,1)$, with $P = 1$.
For $t \ge 1$ or $t < -1$: $\{u > t\}$ is either $\varnothing$ or all of $(-1,1)$, each with perimeter $0$. (The single level $t = -1$ is an $\mathcal{L}^1$-null set and does not affect the integral.)
The coarea formula gives:
\begin{align*}
\int_{-\infty}^{\infty} P(\{u > t\}, (-1,1)) \, dt = \int_{-1}^{0} 1 \, dt + \int_{0}^{1} 1 \, dt = 1 + 1 = 2 = |Du|(\Omega).
\end{align*}
The formula holds. The perimeter of $\{u > t\}$ equals $1$ for all $t \in (-1,1)$ because the jump set of $u$ at $0$ contributes to the boundary of $\{u > t\}$ for every $t$ in the range of the jump — and integrating over those $t$ values (a length-$2$ interval, matching the jump size $2$) recovers the full total variation $|Du|(\Omega) = 2$.
[/example]
The example makes explicit the mechanism by which a jump discontinuity "spreads" its total variation mass over the range of the jump. The jump of $u$ at $0$ has size $2$ (from $-1$ to $1$), and this single point contributes a perimeter of $1$ for each $t \in (-1,1)$, an interval of length $2$. The product $1 \times 2 = 2$ is exactly the variation contributed by the jump. This is the BV coarea formula's way of encoding jump discontinuities — a geometric fact invisible in the Sobolev world.
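The same bookkeeping works when an absolutely continuous part and a jump are present together. As a further illustration (our own choice of function), let $u(x) = x + \mathbb{1}_{(0,1)}(x)$ on $\Omega = (-1,1)$, so that $Du = \mathcal{L}^1|_{(-1,1)} + \delta_0$ and $|Du|(\Omega) = 2 + 1 = 3$. The values of $u$ fill $(-1, 0]$ on $(-1,0]$ and $(1,2)$ on $(0,1)$, and the superlevel sets are
\begin{align*}
\{u > t\} =
\begin{cases}
(t, 1), & -1 < t < 0, \\
(0, 1), & 0 \leq t \leq 1, \\
(t - 1, 1), & 1 < t < 2,
\end{cases}
\qquad P(\{u > t\}, \Omega) = 1 \text{ in each case,}
\end{align*}
while $\{u > t\}$ is all of $\Omega$ for $t \leq -1$ and empty for $t \geq 2$, with perimeter $0$. Hence
\begin{align*}
\int_{-\infty}^{\infty} P(\{u > t\}, \Omega) \, dt = \underbrace{1}_{t \in (-1,0)} + \underbrace{1}_{t \in (0,1)} + \underbrace{1}_{t \in (1,2)} = 3 = |Du|(\Omega).
\end{align*}
The levels in $(-1,0)$ and $(1,2)$ account for the absolutely continuous part (total length $2$), and the levels in $(0,1)$, where the boundary point of the superlevel set stays fixed at the jump, account for the Dirac mass.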
<!-- illustration-needed: the coarea formula for the sign function — show the graph of u = sgn(x) on (-1,1), the level sets {u > t} for several values of t in (-1,0), (0,1), and the contribution to perimeter at x=0 for each such t; annotate the integral as the product of perimeter (=1) times the length of the t-interval (=2) -->
## The Coarea Formula as a Global Reduction Principle
Having established the formula and worked through its structure, it is worth surveying how the coarea formula functions in the rest of the course. Its role is not confined to being a single theorem — it is a methodology.
The basic pattern is as follows. Given a property $P(u)$ to be proved for all $u \in BV(\Omega)$, write $u$ as an integral over its level sets via the layer-cake formula, reduce $P(u)$ to a corresponding property $Q(E)$ for sets of finite perimeter $E$, prove $Q(E)$ geometrically (often using De Giorgi's structure theorem and the Gauss-Green theorem for sets of finite perimeter), and then integrate back. The coarea formula is the link that converts $Q(E_t)$, integrated over $t$, into $P(u)$.
This reduction is not always straightforward. The main subtlety is that the implication from $Q(E_t)$ for a.e. $t$ to $P(u)$ requires careful Fubini-type arguments and the right form of the reduction. But the pattern recurs throughout the subsequent chapters. In Chapter 12, De Giorgi's structure theorem for sets of finite perimeter is proved, and through the coarea formula, it implies corresponding structure results for the jump set of a BV function. In Chapter 13, the Gauss-Green theorem for sets of finite perimeter — proved using the reduced boundary and the structure theorem — implies the Gauss-Green theorem for BV functions via slicing. In Chapter 14, the pointwise theory of BV functions (approximate differentiability, precise representatives) is obtained by studying, for each $t$, the density of the level set $\{u > t\}$ at each point, and then integrating over $t$.
The coarea formula thus occupies a structural position in the BV theory analogous to the role of the Fundamental Theorem of Calculus in one variable: it is the identity that makes the passage between global analytic quantities and local geometric quantities precise and quantitative.
# 12. Sets of Finite Perimeter and the Reduced Boundary
Among all the results in this course, Chapter 12 occupies the central place. We have spent eleven chapters building the analytic and measure-theoretic scaffolding — BV functions, their approximation, their traces, and the coarea formula — precisely so that we can state and prove De Giorgi's Structure Theorem with the precision it demands. The theorem says that if $E \subset \mathbb{R}^n$ has finite perimeter, then its "reduced boundary" $\partial^* E$ is an $(n-1)$-rectifiable set, the measure-theoretic outer normal $\nu_E$ serves as the approximate normal in the rectifiability sense, and the perimeter measure $|D\mathbb{1}_E|$ is exactly the $(n-1)$-dimensional Hausdorff measure restricted to $\partial^* E$. This last identity is the deepest: it says that the abstract BV notion of perimeter — the total variation of a distributional derivative — agrees precisely with the classical geometric notion of surface area, as soon as we measure on the right part of the boundary.
The chapter proceeds in four stages. First, we pin down the definitions: finite perimeter, the perimeter measure, and the reduced boundary $\partial^* E$. Second, we prove the isoperimetric inequality, which is both a fundamental estimate and the key tool in the blow-up analysis. Third, we carry out the blow-up: at every reduced boundary point, the rescaled set $E_r = r^{-1}(E - x)$ converges in $L^1_{\mathrm{loc}}$ to a half-space whose boundary hyperplane is perpendicular to $\nu_E(x)$. The blow-up analysis is the technical heart — it is what forces rectifiability. Finally, we assemble these pieces into De Giorgi's theorem. Throughout, $E$ denotes a Borel-measurable subset of $\mathbb{R}^n$ with $n \geq 2$, and $\Omega$ denotes an open set in $\mathbb{R}^n$.
## Finite Perimeter, the Perimeter Measure, and the Reduced Boundary
The correct definition of perimeter in the BV framework follows directly from Chapter 8: a set has finite perimeter if and only if its indicator function is a BV function. This is not just a formal analogy — it is the definition that makes the theory work, because it gives a perimeter measure via the distributional derivative, and that measure turns out to carry all the geometric information about the boundary.
[definition: Set of Finite Perimeter]
Let $\Omega \subset \mathbb{R}^n$ be open. A Borel-measurable set $E \subset \mathbb{R}^n$ has **finite perimeter in $\Omega$** if $\mathbb{1}_E \in BV(\Omega)$, i.e., if the distributional derivative $D\mathbb{1}_E$ is a finite $\mathbb{R}^n$-valued Radon measure on $\Omega$. The **perimeter of $E$ in $\Omega$** is
\begin{align*}
P(E; \Omega) := |D\mathbb{1}_E|(\Omega),
\end{align*}
the total variation of $D\mathbb{1}_E$ on $\Omega$. When $\Omega = \mathbb{R}^n$, we write $P(E) := P(E; \mathbb{R}^n)$.
[/definition]
The distributional derivative $D\mathbb{1}_E$ is defined by
\begin{align*}
D\mathbb{1}_E(\phi) = -\int_E \operatorname{div} \phi \, d\mathcal{L}^n, \quad \phi \in C_c^\infty(\Omega; \mathbb{R}^n).
\end{align*}
When $E$ has $C^1$ boundary in $\Omega$, integration by parts gives $D\mathbb{1}_E = -\nu_E \mathcal{H}^{n-1}|_{\partial E \cap \Omega}$, where $\nu_E$ is the outward unit normal. In that case $P(E; \Omega) = \mathcal{H}^{n-1}(\partial E \cap \Omega)$, the classical surface area. The whole point of the theory is to recover this identity without any smoothness assumption.
[example: Perimeter of a Ball]
Let $E = B(0, r)$, the open ball of radius $r > 0$ in $\mathbb{R}^n$. Since $\partial B(0,r)$ is a smooth $(n-1)$-manifold with outward unit normal $\nu_E(x) = x/|x|$, we have $D\mathbb{1}_E = -\nu_E \mathcal{H}^{n-1}|_{\partial B(0,r)}$. Therefore
\begin{align*}
P(B(0,r)) = |D\mathbb{1}_{B(0,r)}|(\mathbb{R}^n) = \mathcal{H}^{n-1}(\partial B(0,r)) = n\omega_n r^{n-1},
\end{align*}
where $\omega_n = \mathcal{L}^n(B(0,1))$ is the volume of the unit ball. This is exactly the classical surface area formula. Notice that $P(B(0,r)) = n\omega_n r^{n-1}$ grows like $r^{n-1}$, consistent with the isoperimetric inequality $P(E) \geq C(n) \mathcal{L}^n(E)^{(n-1)/n}$ since $\mathcal{L}^n(B(0,r)) = \omega_n r^n$.
[/example]
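The scaling in this example can be made quantitative. Dividing the perimeter of the ball by the appropriate power of its volume gives a ratio that does not depend on $r$:
\begin{align*}
\frac{P(B(0,r))}{\mathcal{L}^n(B(0,r))^{(n-1)/n}} = \frac{n\omega_n r^{n-1}}{(\omega_n r^n)^{(n-1)/n}} = n\omega_n^{1/n}.
\end{align*}
This is the value of the sharp constant in the isoperimetric inequality $P(E) \geq n\omega_n^{1/n} \, \mathcal{L}^n(E)^{(n-1)/n}$, for which balls are the equality case. For $n = 2$ it equals $2\sqrt{\pi}$, which is the classical planar statement $L^2 \geq 4\pi A$ for a region of area $A$ and boundary length $L$.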
The perimeter measure $|D\mathbb{1}_E|$ is a non-negative Radon measure supported on the topological boundary $\partial E$ (since $\mathbb{1}_E$ is locally constant away from $\partial E$, its derivative vanishes there). However, $\partial E$ can be much larger than what the perimeter "sees" — consider, for instance, a set $E$ whose boundary contains a Cantor-like set of positive $\mathcal{H}^{n-1}$-measure that contributes nothing to $P(E)$ because $E$ looks like either full measure or zero measure at every such point. The right notion of boundary is finer: it should consist precisely of those points where the set has a genuine "half-space" asymptotic behavior at all scales.
[definition: Reduced Boundary]
Let $E \subset \mathbb{R}^n$ have finite perimeter. The **reduced boundary** $\partial^* E$ is the set of all points $x \in \mathbb{R}^n$ such that:
\begin{align*}
|D\mathbb{1}_E|(B(x, r)) > 0 \quad \text{for all } r > 0,
\end{align*}
and the limit
\begin{align*}
\nu_E(x) := -\lim_{r \to 0} \frac{D\mathbb{1}_E(B(x,r))}{|D\mathbb{1}_E|(B(x,r))}
\end{align*}
exists in $\mathbb{R}^n$ and satisfies $|\nu_E(x)| = 1$. The vector $\nu_E(x) \in S^{n-1}$ is the **measure-theoretic outer unit normal** to $E$ at $x$.
[/definition]
The ratio $D\mathbb{1}_E(B(x,r)) / |D\mathbb{1}_E|(B(x,r))$ is the average value of the "direction" of $D\mathbb{1}_E$ over the ball $B(x,r)$, normalized by the total mass. Since $D\mathbb{1}_E$ is an $\mathbb{R}^n$-valued measure with $|D\mathbb{1}_E|$ as its total variation, the ratio lies in the closed unit ball $\overline{B}(0,1)$ for every $r > 0$ by the triangle inequality for measures. Requiring the limit to have modulus exactly 1 is the condition that $D\mathbb{1}_E$ points in a single consistent direction near $x$ at the finest scale — it is the measure-theoretic replacement for having a well-defined unit normal.
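To see what the modulus condition excludes, here is a sketch of the computation at a corner. It assumes the standard fact that for the open unit square $E = (0,1)^2 \subset \mathbb{R}^2$ the measure $D\mathbb{1}_E$ near the origin equals $e_1 \mathcal{H}^1$ on the left edge plus $e_2 \mathcal{H}^1$ on the bottom edge. For $0 < r < 1$, each of these edges meets $B(0,r)$ in a segment of length $r$, so
\begin{align*}
D\mathbb{1}_E(B(0,r)) = r\,(e_1 + e_2), \qquad |D\mathbb{1}_E|(B(0,r)) = 2r, \qquad \frac{D\mathbb{1}_E(B(0,r))}{|D\mathbb{1}_E|(B(0,r))} = \frac{e_1 + e_2}{2}.
\end{align*}
The ratio is constant in $r$, so the limit exists, but its modulus is $1/\sqrt{2} < 1$. The two edges pull the average in different directions, and the corner is therefore not a reduced boundary point.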
[remark: Reduced Boundary vs Topological Boundary]
The reduced boundary always satisfies $\partial^* E \subset \partial E$, and for sets with $C^1$ boundary $\partial^* E = \partial E$. In general, $\partial^* E$ can be strictly smaller than $\partial E$, and the difference $\partial E \setminus \partial^* E$ can even have positive $\mathcal{H}^{n-1}$-measure; what always holds is that the difference is invisible to the perimeter, $|D\mathbb{1}_E|(\partial E \setminus \partial^* E) = 0$. The perimeter measure is concentrated on $\partial^* E$: more precisely, $|D\mathbb{1}_E| = \mathcal{H}^{n-1}|_{\partial^* E}$, which is one of the conclusions of De Giorgi's theorem.
[/remark]
The Lebesgue differentiation theorem for vector-valued Radon measures guarantees that
\begin{align*}
\lim_{r \to 0} \frac{D\mathbb{1}_E(B(x,r))}{|D\mathbb{1}_E|(B(x,r))}
\end{align*}
exists for $|D\mathbb{1}_E|$-almost every $x$ and coincides there with the density of $D\mathbb{1}_E$ with respect to $|D\mathbb{1}_E|$, which has modulus exactly 1 by the polar decomposition of vector-valued measures. So $\partial^* E$ has full $|D\mathbb{1}_E|$-measure in the sense that $|D\mathbb{1}_E|(\mathbb{R}^n \setminus \partial^* E) = 0$. The real content of De Giorgi's theorem is not that $\partial^* E$ captures the full perimeter (that is already contained in the differentiation theory) but rather that $\partial^* E$ is rectifiable, which is a structural claim about the geometry of the set.
## The Isoperimetric Inequality
The isoperimetric inequality asserts that among all sets of a given volume, the ball has the smallest perimeter. In the BV framework, it takes the form of a precise quantitative bound relating $P(E)$ to $\mathcal{L}^n(E)$.
[quotetheorem:600]
[citeproof:600]
The isoperimetric inequality has an important local counterpart, the relative isoperimetric inequality on balls, which is the version used in the blow-up analysis. When $P(E; B(x,r))$ is small relative to $r^{n-1}$, the inequality forces $E$ to have either very small or very large measure in $B(x,r)$: the set cannot fill a substantial fraction of the ball while also leaving a substantial fraction of it empty.
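For orientation, the relative isoperimetric inequality is usually stated in the following form. The constant $C(n)$ is left unspecified here, and the statement used elsewhere in these notes may be normalized differently:
\begin{align*}
\min\left\{ \mathcal{L}^n(B(x,r) \cap E),\ \mathcal{L}^n(B(x,r) \setminus E) \right\}^{(n-1)/n} \leq C(n)\, P(E; B(x,r)).
\end{align*}
In particular, if $P(E; B(x,r)) \leq \varepsilon\, r^{n-1}$, then raising both sides to the power $n/(n-1)$ gives
\begin{align*}
\min\left\{ \mathcal{L}^n(B(x,r) \cap E),\ \mathcal{L}^n(B(x,r) \setminus E) \right\} \leq \left(C(n)\,\varepsilon\right)^{n/(n-1)} r^n,
\end{align*}
so one of the two pieces occupies at most a fraction of order $\varepsilon^{n/(n-1)}$ of the ball.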
[explanation: Role of the Isoperimetric Inequality in the Theory]
The isoperimetric inequality plays two roles in this chapter. The first is as a standalone geometric result: it quantifies the intuition that perimeter controls volume, and it gives the sharp constant achieved by balls. The second role is more structural: the blow-up argument for De Giorgi's theorem relies on the relative isoperimetric inequality to establish that the rescaled sets $E_r$ converge to a half-space and not to some degenerate limit. Specifically, if $x \in \partial^* E$, so that the normal $\nu_E(x)$ exists in the measure-theoretic sense, one shows that both $E$ and its complement occupy a fraction of $B(x,r)$ that is bounded below uniformly for small $r$. These lower density bounds exclude the empty set and the whole space as blow-up limits, and the convergence of the averaged normals to $\nu_E(x)$ then forces the limit to be a half-space.
For BV functions that are not indicator functions, the isoperimetric inequality yields the Sobolev inequality and the Poincaré inequality by integrating the level-set version: for $u \in BV(\mathbb{R}^n)$ with compact support,
\begin{align*}
\|u\|_{L^{n/(n-1)}(\mathbb{R}^n)} \leq C(n) \, |Du|(\mathbb{R}^n).
\end{align*}
This follows from the co-area formula (Chapter 11): $|Du|(\mathbb{R}^n) = \int_{-\infty}^{\infty} P(\{u > t\}; \mathbb{R}^n) \, dt$. Applying the isoperimetric inequality to each level set $\{u > t\}$ and integrating recovers the Sobolev inequality. The Poincaré inequality on balls follows in the same way, with the relative isoperimetric inequality on balls in place of the global one.
[/explanation]
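The level-set argument in the explanation above can be written out as a short chain of inequalities. This is a sketch for nonnegative $u$, assuming the layer-cake representation $u(x) = \int_0^\infty \mathbb{1}_{\{u > t\}}(x)\,dt$ and Minkowski's integral inequality; here $C(n)$ denotes a constant for which $\mathcal{L}^n(F)^{(n-1)/n} \leq C(n)\, P(F)$ holds for every set $F$ of finite measure and finite perimeter.
\begin{align*}
\|u\|_{L^{n/(n-1)}(\mathbb{R}^n)} &\leq \int_0^\infty \|\mathbb{1}_{\{u > t\}}\|_{L^{n/(n-1)}(\mathbb{R}^n)} \, dt = \int_0^\infty \mathcal{L}^n(\{u > t\})^{(n-1)/n} \, dt \\
&\leq C(n) \int_0^\infty P(\{u > t\}; \mathbb{R}^n) \, dt = C(n)\, |Du|(\mathbb{R}^n).
\end{align*}
The final equality is the co-area formula, using that $\{u > t\} = \mathbb{R}^n$ has zero perimeter for $t < 0$ when $u \geq 0$. For sign-changing $u$ one can apply the same chain to the positive and negative parts separately.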
## Sobolev and Poincaré Inequalities for BV
The isoperimetric inequality propagates to BV functions via the co-area formula, yielding Sobolev and Poincaré-type estimates that are the functional-analytic backbone of the whole BV theory.
[quotetheorem:3115]
[citeproof:3115]
The Poincaré inequality for BV complements the Sobolev inequality and is used in compactness arguments.
[quotetheorem:3116]
[citeproof:3116]
These two inequalities together give uniform bounds in $L^{n/(n-1)}$ for bounded BV sequences, complementing the $L^1_{\mathrm{loc}}$ compactness established in Chapter 9. They will not be used directly in the blow-up analysis, but they are important applications of the isoperimetric inequality that complete the picture of what finite perimeter means analytically.
## Blow-up at Reduced Boundary Points
The blow-up analysis is the technical heart of De Giorgi's theorem. The idea is to "zoom in" on a reduced boundary point by rescaling and to show that the limit is a half-space. Once this is established, rectifiability follows by a covering argument using the theory of rectifiable sets from GMT II.
To set up notation: for a point $x \in \partial^* E$ and $r > 0$, the **rescaled set** is
\begin{align*}
E_r := \frac{E - x}{r} = \left\{ \frac{y - x}{r} : y \in E \right\}.
\end{align*}
The rescaled set $E_r$ is a translate-and-dilate of $E$; it lives in $\mathbb{R}^n$ and has indicator function $\mathbb{1}_{E_r}(z) = \mathbb{1}_E(x + rz)$. The perimeter of $E_r$ in the unit ball $B(0,1)$ satisfies $P(E_r; B(0,1)) = r^{-(n-1)} P(E; B(x,r))$, so uniform control of the rescaled perimeters on a fixed ball amounts to a bound of the form $P(E; B(x,r)) \leq C r^{n-1}$ for small $r$. Finiteness of $P(E)$ alone does not give this; the bound is one of the density estimates available at reduced boundary points.
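The scaling identity can be checked from the variational definition of perimeter. A sketch, assuming $P(F; U) = \sup\{\int_F \operatorname{div}\psi \, d\mathcal{L}^n : \psi \in C^1_c(U; \mathbb{R}^n),\ |\psi| \leq 1\}$: given $\psi \in C^1_c(B(0,1); \mathbb{R}^n)$ with $|\psi| \leq 1$, set $\phi(y) := \psi((y - x)/r)$. Then $\phi \in C^1_c(B(x,r); \mathbb{R}^n)$, $|\phi| \leq 1$, and $\operatorname{div}\phi(y) = r^{-1} (\operatorname{div}\psi)((y-x)/r)$. The change of variables $y = x + rz$ gives
\begin{align*}
\int_{E_r} \operatorname{div}\psi(z) \, dz = r^{-n} \int_E (\operatorname{div}\psi)\!\left(\frac{y - x}{r}\right) dy = r^{-(n-1)} \int_E \operatorname{div}\phi(y) \, dy.
\end{align*}
Since $\psi \mapsto \phi$ is a bijection between the two classes of admissible fields, taking the supremum on both sides yields $P(E_r; B(0,1)) = r^{-(n-1)} P(E; B(x,r))$.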
<!-- illustration-needed: blow-up of a set E at a reduced boundary point x — show the set E with a marked point x on its boundary, then the rescaled set E_r for decreasing r values, converging to the half-space H = {y : y · ν_E(x) ≤ 0} with the boundary hyperplane perpendicular to ν_E(x) -->
[quotetheorem:3117]
[citeproof:3117]
The blow-up theorem is the reason the reduced boundary is the right object to study. It says that every point of $\partial^* E$ looks, at small scales, like the flat boundary of a half-space. This is the quantitative geometric content of the condition $|\nu_E(x)| = 1$. The theorem does not assume smoothness of $E$ — it is a consequence of the measure-theoretic definition of $\partial^* E$ and the compactness of the BV topology. The fact that the limit is the specific half-space $H_{\nu_E(x)}$ also shows that the normal $\nu_E(x)$ is genuinely "outward": the set $E$ asymptotically fills the half-space opposite to $\nu_E(x)$.
[remark: Density at Reduced Boundary Points]
A direct consequence of the blow-up theorem is the density estimate: for every $x \in \partial^* E$,
\begin{align*}
\lim_{r \to 0} \frac{\mathcal{L}^n(E \cap B(x,r))}{\mathcal{L}^n(B(x,r))} = \frac{1}{2}.
\end{align*}
This follows because $\mathbb{1}_{E_r} \to \mathbb{1}_H$ in $L^1_{\mathrm{loc}}$ implies $\mathcal{L}^n(E \cap B(x,r)) / r^n \to \mathcal{L}^n(H \cap B(0,1)) = \omega_n / 2$. In particular, every reduced boundary point is a point of density $1/2$ for $E$, meaning the boundary is not "degenerate" from either side.
[/remark]
## De Giorgi's Structure Theorem
With the blow-up analysis in hand, we can now state and prove the structure theorem. The key notion from GMT II is that of an $(n-1)$-rectifiable set: a set $\Gamma \subset \mathbb{R}^n$ is $(n-1)$-rectifiable if
\begin{align*}
\Gamma \subset \Gamma_0 \cup \bigcup_{k=1}^\infty f_k(\mathbb{R}^{n-1}),
\end{align*}
where $\mathcal{H}^{n-1}(\Gamma_0) = 0$ and each $f_k : \mathbb{R}^{n-1} \to \mathbb{R}^n$ is Lipschitz. For sets of locally finite $\mathcal{H}^{n-1}$-measure, rectifiability is equivalent (by Rademacher's theorem and Whitney's extension theorem) to the existence of an approximate tangent $(n-1)$-plane at $\mathcal{H}^{n-1}$-almost every point of $\Gamma$, in the sense that the blow-up of $\mathcal{H}^{n-1}|_{\Gamma}$ at that point converges to $\mathcal{H}^{n-1}$ restricted to the tangent plane.
[quotetheorem:599]
[citeproof:599]
The proof sketch above highlights the deep interplay between measure theory and geometry. Rectifiability is a global structural claim, but De Giorgi's theorem proves it from purely local information — the blow-up at each point — by a covering argument. The covering argument requires the Vitali covering lemma and careful $\mathcal{H}^{n-1}$-measure estimates; the details are carried out in Evans-Gariepy §5.7 using Federer's original approach.
[explanation: What De Giorgi's Theorem Means Geometrically]
Part (a) says that $\partial^* E$ is rectifiable: it is essentially a countable union of Lipschitz hypersurface patches, up to a set of $\mathcal{H}^{n-1}$-measure zero. This is the GMT analog of having a smooth boundary, but it is vastly more general — no differentiability is assumed anywhere. The rectifiability is not just a curiosity: it is the minimal regularity needed to integrate over $\partial^* E$ in a meaningful way, which is exactly what the Gauss-Green theorem (Chapter 13) requires.
Part (b) says that the approximate tangent plane exists $\mathcal{H}^{n-1}$-almost everywhere on $\partial^* E$. For smooth sets, the tangent plane exists everywhere and is the classical tangent hyperplane. For rough sets, the existence of the approximate tangent plane $\mathcal{H}^{n-1}$-almost everywhere is optimal — it is guaranteed by rectifiability alone (via Rademacher's theorem applied to the Lipschitz parametrizations) and cannot be improved to "everywhere" in general.
Part (c) is the quantitative climax: $P(E; \Omega) = \mathcal{H}^{n-1}(\partial^* E \cap \Omega)$. The left side is defined analytically via the total variation of a distributional derivative. The right side is the $(n-1)$-dimensional Hausdorff measure of a geometric object, the reduced boundary. Their equality says that the abstract BV perimeter agrees with the classical surface area, computed on the right subset of $\partial E$. For smooth sets this is the Gauss-Green theorem; for rough sets it is a far-reaching generalization.
Part (d) identifies $\nu_E$ as the geometric normal to $\partial^* E$ in the rectifiability sense, not merely as an analytic quantity defined via the Radon-Nikodym derivative. This identification is what makes the Gauss-Green theorem work for sets of finite perimeter: the formula $\int_E \operatorname{div} \phi \, d\mathcal{L}^n = \int_{\partial^* E} \phi \cdot \nu_E \, d\mathcal{H}^{n-1}$ makes sense because $\nu_E$ is the geometric normal to the rectifiable surface $\partial^* E$.
[/explanation]
[quotetheorem:3118]
This corollary is the headline result. It converts the variational definition of perimeter, a total variation of a distributional derivative, into a purely geometric quantity: the $(n-1)$-dimensional Hausdorff measure of the reduced boundary. For smooth domains it reduces to the classical surface area formula. For domains with corners, cusps, or fractal-type boundaries, it shows that the perimeter "sees" only the reduced boundary and ignores the rest of the topological boundary, however large that remainder may be.
[example: The Reduced Boundary of a Cube]
Let $E = (0,1)^n$, the open unit cube. The topological boundary $\partial E$ consists of $2n$ faces of dimension $n-1$, $4\binom{n}{2}$ faces of dimension $n-2$ (the edges when $n = 3$), and lower-dimensional faces down to the $2^n$ vertices. All faces of dimension at most $n-2$ have $\mathcal{H}^{n-1}$-measure zero, so they do not contribute to the perimeter. At a point $x$ in the relative interior of a face, say the face $\{x_1 = 0\} \cap [0,1]^n$, the cube asymptotically looks like the half-space $\{x_1 > 0\}$ when we zoom in. Therefore $x \in \partial^* E$ with $\nu_E(x) = -e_1$ (the outward normal to that face). The reduced boundary $\partial^* E$ is the union of the $2n$ relatively open faces, which is an $(n-1)$-rectifiable set of $\mathcal{H}^{n-1}$-measure $2n$ (each face has area 1). By De Giorgi's theorem, $P(E) = \mathcal{H}^{n-1}(\partial^* E) = 2n$, which matches the direct calculation: $P((0,1)^n) = 2n$ by integration by parts against $\phi \in C_c^\infty(\mathbb{R}^n; \mathbb{R}^n)$ and the Gauss-Green theorem on a smooth approximation.
[/example]
## Consequences and the Gauss-Green Theorem
De Giorgi's structure theorem is not just a classification result — it unlocks the Gauss-Green theorem for non-smooth sets, which is the main application driving the whole theory. This application is the subject of Chapter 13, but we record the basic form here to show how the pieces fit together.
[quotetheorem:3119]
[citeproof:3119]
The Gauss-Green theorem holds for any set of finite perimeter — no $C^1$ boundary, no Lipschitz boundary, no convexity. The regularity of the set is encoded entirely in the rectifiability of $\partial^* E$ and the measurability of $\nu_E$, both guaranteed by De Giorgi's theorem. This is the sense in which sets of finite perimeter are the right class for geometric measure theory: they are exactly the sets for which the Gauss-Green theorem makes sense in the general BV framework.
[remark: Comparison with the Classical Divergence Theorem]
For a bounded domain $\Omega$ with $C^1$ boundary, the classical divergence theorem says $\int_\Omega \operatorname{div} \phi \, d\mathcal{L}^n = \int_{\partial \Omega} \phi \cdot \nu \, d\mathcal{H}^{n-1}$, where $\nu$ is the classical outward unit normal and the integral is over the full topological boundary. De Giorgi's theorem reconciles this with the BV formula: when $\Omega$ has $C^1$ boundary, the reduced boundary $\partial^* \Omega$ coincides with the topological boundary $\partial \Omega$ (up to a set of $\mathcal{H}^{n-1}$-measure zero), and $\nu_{\Omega}$ coincides with the classical outward unit normal. So the BV Gauss-Green theorem recovers the classical divergence theorem in the smooth case, and extends it to all sets of finite perimeter in the rough case.
[/remark]
The structure theorem also has implications for the relationship between $\partial^* E$ and the measure-theoretic boundary. Define the **measure-theoretic boundary** $\partial_m E$ as the set of points where $E$ has neither density 0 nor density 1:
\begin{align*}
\partial_m E := \left\{ x \in \mathbb{R}^n : \limsup_{r \to 0} \frac{\mathcal{L}^n(E \cap B(x,r))}{r^n} > 0 \text{ and } \limsup_{r \to 0} \frac{\mathcal{L}^n(B(x,r) \setminus E)}{r^n} > 0 \right\}.
\end{align*}
By the density result in the remark following the blow-up theorem, $\partial^* E \subset \partial_m E$. De Giorgi's theorem implies $\mathcal{H}^{n-1}(\partial_m E \setminus \partial^* E) = 0$: the measure-theoretic boundary and the reduced boundary differ by at most a set of $\mathcal{H}^{n-1}$-measure zero. This is the sharpest possible comparison: for sets with fractal boundaries, $\partial_m E \setminus \partial^* E$ can be non-empty, but it carries no mass from the perimeter measure.
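A concrete point of $\partial_m E \setminus \partial^* E$ already occurs without any fractal structure. As a sketch, take the open unit square $E = (0,1)^2$ and the vertex $x = 0$: for $0 < r < 1$ the set $E \cap B(0,r)$ is a quarter disc, so
\begin{align*}
\frac{\mathcal{L}^2(E \cap B(0,r))}{\mathcal{L}^2(B(0,r))} = \frac{\pi r^2 / 4}{\pi r^2} = \frac{1}{4}.
\end{align*}
The density is neither 0 nor 1, so $0 \in \partial_m E$. It is also not $1/2$, so by the density remark following the blow-up theorem $0 \notin \partial^* E$.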
[explanation: The Structure Theorem in the Context of the Course]
De Giorgi's theorem is the culmination of the analytical work of Chapters 8 through 11. Chapter 8 (BV Functions: Definition and Structure Theorem) provided the definition of $D\mathbb{1}_E$, the perimeter measure, and the Lebesgue decomposition of $Du$. Chapter 9 (Approximation and Compactness for BV) gave the BV Compactness Theorem — specifically, the $L^1$ convergence of a bounded BV sequence — that drives the blow-up argument. Chapter 10 (Traces and Extensions for BV) clarified boundary behavior via the BV Trace Theorem and the zero-extension formula. Chapter 11 (The Coarea Formula for BV Functions) linked the perimeter of level sets to the total variation of BV functions and derived the isoperimetric inequality for sets of finite perimeter.
De Giorgi's theorem then synthesizes all this into a single statement that is at once analytic (the perimeter measure is Hausdorff measure), geometric (the reduced boundary is rectifiable), and differential-geometric (the normal $\nu_E$ is the approximate normal to a hypersurface). The Gauss-Green theorem of Chapter 13 is the direct application — it uses rectifiability and the normal identification in an essential way.
Looking forward, the structure theorem also underpins the pointwise theory of BV functions (Chapter 14): the Lebesgue points of a BV function, its "jump set," and the precise representative are all defined in terms of density conditions analogous to those appearing in the definition of $\partial^* E$. The jump set of a general BV function is an $(n-1)$-rectifiable set by exactly the same argument that makes $\partial^* E$ rectifiable, applied to the level sets of $u$ via the co-area formula.
[/explanation]
## Density Estimates and the Structure of the Full Boundary
The blow-up theorem and De Giorgi's structure theorem give detailed information about $\partial^* E$. It is also important to understand what happens on $\partial E \setminus \partial^* E$ — the part of the topological boundary not captured by the perimeter measure.
[quotetheorem:3120]
This theorem says that the "singular part" of the topological boundary, the portion not captured by $\partial^* E$, is negligible for the perimeter: $|D\mathbb{1}_E|(\partial E \setminus \partial^* E) = 0$. This does not make $\partial E \setminus \partial^* E$ small in the sense of Hausdorff measure. The full boundary can have higher-dimensional complexity (it may have Hausdorff dimension strictly between $n-1$ and $n$, or even positive Lebesgue measure), but all of that excess is invisible to the perimeter measure. This is the precise sense in which $\partial^* E$ is the "right" notion of boundary for sets of finite perimeter: it captures exactly the portion of the boundary that contributes to the perimeter, and the rest is negligible.
[example: A Set with Non-Rectifiable Topological Boundary]
Consider the "fractal Swiss cheese" construction: remove from $[0,1]^2$ a countable collection of open discs $D_k \subset (0,1)^2$ of radii $r_k$, with pairwise disjoint closures, such that $\sum_k r_k < \infty$ but $\sum_k r_k^\alpha = \infty$ for every $\alpha < 1$, and such that $\bigcup_k D_k$ is dense in the square while $\pi \sum_k r_k^2 < 1$. Let $E = [0,1]^2 \setminus \bigcup_k D_k$. Since $\mathbb{1}_E = \mathbb{1}_{[0,1]^2} - \sum_k \mathbb{1}_{D_k}$, the perimeter is bounded by $4 + \sum_k 2\pi r_k < \infty$, so $E$ has finite perimeter in $\mathbb{R}^2$. The reduced boundary $\partial^* E$ consists, up to an $\mathcal{H}^1$-null set, of $\partial [0,1]^2$ (where the set looks like a half-plane) and the individual circles $\partial D_k$ (where the set looks like the exterior of a disc), and $P(E) = \mathcal{H}^1(\partial^* E) = 4 + 2\pi \sum_k r_k$, exactly as De Giorgi's theorem predicts. The topological boundary is a different matter: $E$ is closed with empty interior, so $\partial E = E$, which has Lebesgue measure $1 - \pi \sum_k r_k^2 > 0$ and therefore infinite $\mathcal{H}^1$-measure. Almost all of $\partial E$ is invisible to the perimeter.
[/example]
The density estimates at reduced boundary points also give information about the Lebesgue points of $\mathbb{1}_E$. By the Lebesgue differentiation theorem, $\mathcal{L}^n$-almost every point is a Lebesgue point of $\mathbb{1}_E$, meaning the density of $E$ there is either 0 (at almost every point outside $E$) or 1 (at almost every point inside $E$). The set where the density fails to be 0 or 1 is the measure-theoretic boundary $\partial_m E$, so $\mathcal{L}^n(\partial_m E) = 0$ follows directly from the Lebesgue differentiation theorem; no information about $\partial E$ is needed, and indeed $\partial E$ itself may have positive Lebesgue measure. De Giorgi's theorem, together with $\mathcal{H}^{n-1}(\partial_m E \setminus \partial^* E) = 0$, gives the much finer statement $\mathcal{H}^{n-1}(\partial_m E) = \mathcal{H}^{n-1}(\partial^* E) = P(E)$. This reflects a clean partition: $\mathcal{H}^{n-1}$-almost every point of $\mathbb{R}^n$ is either in the measure-theoretic interior of $E$, or in the measure-theoretic exterior of $E$, or on the reduced boundary $\partial^* E$, and the three cases are distinguished precisely by the density of $E$ being 1, 0, or $1/2$ respectively.
In summary, De Giorgi's structure theorem shows that the perimeter measure of a set of finite perimeter is concentrated on the reduced boundary, where it coincides with $\mathcal{H}^{n-1}$. This structure is the foundation for the remaining theory and connects measure-geometric ideas to Gauss-Green integration.
# 13. The Gauss-Green Theorem
The entire edifice of Chapters 8 through 12 was built toward a single goal: making sense of a divergence theorem for sets that are far too rough to have a classical smooth boundary. Classical Gauss-Green requires $\partial E$ to be a $C^1$ hypersurface and uses the classical surface measure $d\mathcal{H}^{n-1}$ along a pointwise-defined outward normal. For a general set of finite perimeter neither object is available in any classical sense. What De Giorgi's Structure Theorem from Chapter 12 gives us is a complete substitute: the reduced boundary $\partial^* E$, a countably $(n-1)$-rectifiable set carrying the perimeter measure $\|D\mathbb{1}_E\| = \mathcal{H}^{n-1}\lfloor \partial^* E$, and the measure-theoretic unit normal $\nu_E: \partial^* E \to S^{n-1}$. This chapter shows that these objects fulfill every role played by $\partial E$ and the outward normal in the smooth setting, producing an exact divergence theorem that extends the classical one and underlies all weak formulations of boundary value problems in non-smooth domains.
## The Divergence Theorem for Finite-Perimeter Sets
The obstacle in formulating the divergence theorem for a set $E$ of finite perimeter is that $\partial E$ can be fractal, nowhere rectifiable, and of infinite $\mathcal{H}^{n-1}$-measure, while $\partial^* E$ is a well-behaved rectifiable hypersurface. The resolution is to replace $\partial E$ by $\partial^* E$ throughout — and De Giorgi's theorem guarantees that no integration is lost in doing so.
Recall the setup from Chapter 12. A Borel set $E \subset \mathbb{R}^n$ has **finite perimeter** in an open set $U$ if $\mathbb{1}_E \in BV(U)$ (Definition: Set of Finite Perimeter, Chapter 12), that is, the distributional derivative $D\mathbb{1}_E$ is a finite $\mathbb{R}^n$-valued Radon measure on $U$. The total variation measure $|D\mathbb{1}_E|$ equals $\mathcal{H}^{n-1}\lfloor \partial^* E$ by De Giorgi's Structure Theorem (Chapter 12, parts (a)–(c)), and the Radon-Nikodym decomposition gives
\begin{align*}
D\mathbb{1}_E = -\nu_E \, \mathcal{H}^{n-1}\lfloor \partial^* E,
\end{align*}
where $\nu_E(x) \in S^{n-1}$ is the measure-theoretic outward normal at $\mathcal{H}^{n-1}$-a.e. $x \in \partial^* E$. The sign convention is that of Chapter 12 and of Evans-Gariepy: $\nu_E$ points outward, and the minus sign here cancels against the one produced by integration by parts, so the divergence theorem carries the outward normal with a plus sign.
[quotetheorem:3121]
[citeproof:3121]
The proof is almost embarrassingly short once De Giorgi's theorem is in place — the real work was done in Chapter 12. What makes this theorem remarkable is the quality of the objects on the right-hand side: $\partial^* E$ is a countably $(n-1)$-rectifiable set (so in particular it is $\sigma$-finite for $\mathcal{H}^{n-1}$), and $\nu_E$ is an $\mathcal{H}^{n-1}$-measurable, $S^{n-1}$-valued function defined $\mathcal{H}^{n-1}$-a.e. on $\partial^* E$. The integral on the right is a genuine surface integral in the sense of Hausdorff measure, not some abstract Radon measure we cannot compute with.
[remark: Comparison with the Classical Theorem]
When $E$ is a bounded open set with $C^1$ boundary, then $\partial^* E = \partial E$ holds $\mathcal{H}^{n-1}$-a.e. and $\nu_E$ coincides with the classical outward unit normal $\mathcal{H}^{n-1}$-a.e. on $\partial E$. The Gauss-Green theorem for finite-perimeter sets therefore genuinely extends the classical statement: for smooth sets the two are identical, and for non-smooth sets the GMT version continues to hold while the classical version breaks down.
[/remark]
The necessity of the finite-perimeter hypothesis deserves careful attention. The theorem requires $\mathbb{1}_E \in BV(U)$: this is the condition that the distributional derivative $D\mathbb{1}_E$ be representable as a finite $\mathbb{R}^n$-valued Radon measure, which in turn is what makes the integration-by-parts step valid. Without it, $D\mathbb{1}_E$ is still a distribution of order at most one, but it is not given by a measure, and the substitution breaks down. The condition $\phi \in C^1_c(U; \mathbb{R}^n)$ is equally essential: $C^1$ ensures $\operatorname{div}\phi$ is continuous and bounded on the compact support of $\phi$, while compact support allows us to ignore the behavior of $E$ near $\partial U$ and at infinity. Relaxing either condition (for instance to $\phi \in W^{1,1}$) requires additional argument.
[example: Cube in $\mathbb{R}^n$]
Let $E = (0,1)^n$ be the open unit cube and $\phi \in C^1_c(\mathbb{R}^n; \mathbb{R}^n)$ arbitrary. The cube has finite perimeter: $\mathbb{1}_E \in BV(\mathbb{R}^n)$ because the distributional derivative of $\mathbb{1}_E$ in the $k$-th coordinate direction is the signed measure $\mathcal{H}^{n-1}\lfloor (\{x_k = 0\} \cap [0,1]^n) - \mathcal{H}^{n-1}\lfloor (\{x_k = 1\} \cap [0,1]^n)$, positive on the face where $\mathbb{1}_E$ jumps up and negative on the face where it jumps down. The total variation is $|D\mathbb{1}_E| = \mathcal{H}^{n-1}\lfloor \partial^* E$, where $\partial^* E$ consists of the $2n$ open faces of the cube. On each face, the measure-theoretic outward normal $\nu_E$ is the coordinate vector $\pm e_k$ pointing away from $E$. The Gauss-Green theorem gives
\begin{align*}
\int_{(0,1)^n} \operatorname{div} \phi \, d\mathcal{L}^n = \sum_{k=1}^n \left(\int_{F_k^+} \phi_k \, d\mathcal{H}^{n-1} - \int_{F_k^-} \phi_k \, d\mathcal{H}^{n-1}\right),
\end{align*}
where $F_k^+ = \{x_k = 1\} \cap [0,1]^n$ and $F_k^- = \{x_k = 0\} \cap [0,1]^n$ are the top and bottom faces in direction $k$. This agrees with the Fundamental Theorem of Calculus applied coordinate by coordinate: $\int_0^1 \partial_{x_k}\phi_k \, dx_k = \phi_k|_{x_k=1} - \phi_k|_{x_k=0}$. The corners and edges of the cube form an $(n-2)$-dimensional set of $\mathcal{H}^{n-1}$-measure zero and contribute nothing to the surface integral, confirming that the reduced boundary correctly ignores these lower-dimensional features.
[/example]
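A sanity check of the face formula in the plane, as a sketch: take $n = 2$ and a field that agrees with $\phi(x_1, x_2) = (x_1^2 x_2,\ x_2)$ on a neighbourhood of $[0,1]^2$. The particular field is an illustrative choice; multiplying it by a smooth cutoff equal to 1 near the square makes it compactly supported without changing either side.
\begin{align*}
\int_{(0,1)^2} \operatorname{div}\phi \, d\mathcal{L}^2 &= \int_0^1\!\!\int_0^1 (2 x_1 x_2 + 1) \, dx_1 \, dx_2 = \frac{1}{2} + 1 = \frac{3}{2}, \\
\sum_{k=1}^2 \left( \int_{F_k^+} \phi_k \, d\mathcal{H}^1 - \int_{F_k^-} \phi_k \, d\mathcal{H}^1 \right) &= \left( \int_0^1 x_2 \, dx_2 - 0 \right) + \left( \int_0^1 1 \, dx_1 - 0 \right) = \frac{1}{2} + 1 = \frac{3}{2}.
\end{align*}
Both sides agree, and the four corners of the square never enter the computation.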
## Application to Weak Boundary Value Problems
The significance of the Gauss-Green theorem extends well beyond the formula itself. Classical PDE theory formulates boundary value problems using integration by parts, which requires a divergence theorem. When the domain $\Omega$ is merely known to have finite perimeter — for instance, when $\Omega$ is a superlevel set of a BV function, or arises as the support of a measure, or is defined implicitly through a free boundary — the Gauss-Green theorem for finite-perimeter sets is the only available tool for passing from the strong PDE formulation to a weak one.
The core issue is this: given $u: E \to \mathbb{R}$ solving $-\Delta u = f$ inside $E$ with some boundary condition on $\partial^* E$, how does one define the weak formulation? The answer requires integrating by parts against test functions $\phi \in C^1_c$, and the boundary term that emerges is precisely the integral $\int_{\partial^* E} \phi \cdot \nu_E \, d\mathcal{H}^{n-1}$ appearing in the Gauss-Green theorem.
[explanation: Weak Formulation via Gauss-Green]
Consider the problem $-\Delta u = f$ in $E$ with $u = 0$ on $\partial^* E$, where $E \subset \mathbb{R}^n$ has finite perimeter. To derive the weak formulation, multiply by a test function $\psi \in C^1_c(E)$ and integrate:
\begin{align*}
-\int_E \Delta u \, \psi \, d\mathcal{L}^n = \int_E f \psi \, d\mathcal{L}^n.
\end{align*}
The left side is $-\int_E \operatorname{div}(\nabla u)\, \psi \, d\mathcal{L}^n$. By the product rule $\operatorname{div}(\psi \, \nabla u) = \psi \, \Delta u + \nabla u \cdot \nabla \psi$, the equation becomes
\begin{align*}
-\int_E \operatorname{div}(\nabla u \, \psi) \, d\mathcal{L}^n + \int_E \nabla u \cdot \nabla\psi \, d\mathcal{L}^n = \int_E f \psi \, d\mathcal{L}^n.
\end{align*}
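The divergence theorem applied to $\phi = \psi \, \nabla u$ (assuming $u$ is regular enough, say $C^2$, for $\phi$ to be an admissible $C^1_c$ field) converts the first term to a boundary integral over $\partial^* E$ with integrand $\psi \, \nabla u \cdot \nu_E$. Since $\psi \in C^1_c(E)$ vanishes on $\partial^* E$, the boundary term drops; the condition $u = 0$ on $\partial^* E$ plays no role at this step and is instead encoded in the choice of function space below. We are left with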
\begin{align*}
\int_E \nabla u \cdot \nabla \psi \, d\mathcal{L}^n = \int_E f \psi \, d\mathcal{L}^n.
\end{align*}
This is the weak formulation of the Dirichlet problem in $E$. The key point is that the whole derivation uses only the Gauss-Green theorem for finite-perimeter sets, not any smoothness of $\partial E$. In practice, $u$ would be sought in a Sobolev space $H^1_0(E)$ (defined as the closure of $C_c^\infty(E)$ in the $H^1$ norm), and existence follows from the Lax-Milgram theorem or Riesz representation in $H^1_0(E)$.
[/explanation]
This mechanism — reducing a PDE to a weak formulation via integration by parts over a non-smooth domain — pervades modern analysis. Free boundary problems, obstacle problems, minimal surface equations, and variational problems for functionals defined on sets of finite perimeter all exploit exactly this structure. The Gauss-Green theorem is not the endpoint of the theory but its practical engine.
## Relation to Currents and the Boundary Operator
The Gauss-Green theorem can be read in the language of geometric integration theory, where it takes a particularly elegant form. This connection provides both a conceptual unification and a gateway to the theory of currents.
An open set $E$ of finite perimeter defines an $n$-dimensional current $[E]$ by integration: $[E](\omega) = \int_E \omega$ for any smooth compactly supported $n$-form $\omega$. The boundary of this current, $\partial[E]$, is defined dually by $\partial[E](\omega) = [E](d\omega)$ for $(n-1)$-forms $\omega$. In the language of differential forms, the Gauss-Green theorem states that
\begin{align*}
[E](d\omega) = \int_{\partial^* E} \omega,
\end{align*}
which identifies $\partial[E]$ with integration over $\partial^* E$ weighted by the measure-theoretic orientation $\nu_E$. In the language of vector calculus (identifying $\phi$ with an $(n-1)$-form $\omega$ by contraction with the volume form, so that $d\omega$ is $\operatorname{div}\phi$ times the volume form), this is precisely
\begin{align*}
\int_E \operatorname{div}\phi \, d\mathcal{L}^n = \int_{\partial^* E} \phi \cdot \nu_E \, d\mathcal{H}^{n-1}.
\end{align*}
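The dictionary between the two formulations can be made explicit. A sketch, assuming the standard orientation $dx^1 \wedge \cdots \wedge dx^n$ on $\mathbb{R}^n$; the symbol $\omega_\phi$ is introduced here only for illustration, and a hat denotes an omitted factor:
\begin{align*}
\omega_\phi &:= \sum_{k=1}^n (-1)^{k-1} \phi_k \, dx^1 \wedge \cdots \wedge \widehat{dx^k} \wedge \cdots \wedge dx^n, \\
d\omega_\phi &= \sum_{k=1}^n (-1)^{k-1} \partial_{x_k}\phi_k \, dx^k \wedge dx^1 \wedge \cdots \wedge \widehat{dx^k} \wedge \cdots \wedge dx^n = (\operatorname{div}\phi) \, dx^1 \wedge \cdots \wedge dx^n.
\end{align*}
Only the $\partial_{x_k}$ derivative of $\phi_k$ survives in the $k$-th term, and moving $dx^k$ past $k-1$ factors produces a second sign $(-1)^{k-1}$ that cancels the first. Hence $[E](d\omega_\phi) = \int_E \operatorname{div}\phi \, d\mathcal{L}^n$.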
The current-theoretic formulation has a striking consequence: the boundary operator $\partial$ satisfies $\partial \circ \partial = 0$. Applied to $[E]$, this yields $\partial(\partial[E]) = 0$, which at the measure-theoretic level says that $\partial^*E$ has no boundary in an appropriate sense — a fact consistent with its rectifiability and its role as an $(n-1)$-dimensional surface.
[remark: Forward Reference to Currents]
The identification $\partial[E] = \nu_E \, \mathcal{H}^{n-1}\lfloor \partial^* E$ is the starting point for the theory of integral currents and Plateau's problem, in which one seeks to minimize area among all currents with a prescribed boundary. The compactness and closure theorems for integral currents — the Federer-Fleming theory — generalize the structure theorem for finite-perimeter sets from this chapter and De Giorgi's theorem from Chapter 12 to objects of all dimensions. The theory of currents lies outside these notes; what matters here is that the Gauss-Green theorem provides the precise measure-theoretic meaning of the boundary $\partial[E]$, which is the conceptual foundation on which the current theory rests.
[/remark]
The measure-theoretic normal $\nu_E$ that appears in the Gauss-Green theorem is not just an analytic object — it encodes the geometry of $\partial^* E$ as a rectifiable hypersurface. At $\mathcal{H}^{n-1}$-a.e. point $x \in \partial^* E$, the approximate tangent hyperplane to $\partial^* E$ is $\{y : (y - x) \cdot \nu_E(x) = 0\}$, and in a small ball around $x$ the set $E$ looks, in density, like a half-space. This is the content of De Giorgi's theorem rephrased: the Gauss-Green theorem is valid because $\partial^* E$ behaves locally like a smooth surface at almost every one of its points.
<!-- illustration-needed: A set E of finite perimeter with a fractal-looking topological boundary: show the full topological boundary \partial E (which may be large and irregular) alongside the reduced boundary \partial^* E (the rectifiable part), with measure-theoretic normals \nu_E(x) drawn as unit vectors at representative points of \partial^* E. The picture should convey that \partial^* E \subset \partial E and that the perimeter measure is carried by \partial^* E, that is, |D\mathbb{1}_E|(\partial E \setminus \partial^* E) = 0, even though \partial E \setminus \partial^* E may itself be large. -->
The De Giorgi structure of finite-perimeter sets is what made the proof of the Gauss-Green theorem possible, extending the classical divergence theorem to sets with irregular boundaries. This fundamental result shows that the reduced boundary carries all the measure-theoretic information needed for integration by parts formulas.
# 14. Pointwise Properties of BV Functions
A BV function is defined as an equivalence class in $L^1$, so its values at individual points carry no intrinsic meaning — two representatives of the same class may differ on a dense set. Yet the BV structure imposes much more regularity than bare $L^1$ membership would suggest. This chapter extracts the canonical pointwise information encoded in a BV function: at every point, one can define a pair of upper and lower approximate one-sided limits $u^+$ and $u^-$, the set where they disagree is the jump set $J_u$, and the full distributional derivative decomposes cleanly into an absolutely continuous part, a jump part concentrated on $J_u$, and a Cantor part. The source for this chapter is Evans–Gariepy §5.9–5.11.
## Approximate One-Sided Limits
The tool that makes pointwise analysis of $L^1$ functions possible is the notion of an approximate limit, already encountered in the Lebesgue differentiation theorem. For BV functions one needs a refinement: approximate one-sided limits, which detect the distinct values approached from two sides of a jump discontinuity.
[definition: Approximate Upper and Lower Limits]
Let $u \in L^1_{\mathrm{loc}}(\Omega)$. The **approximate upper limit** $u^+: \Omega \to [-\infty, +\infty]$ is the map
\begin{align*}
u^+(x) := \inf\bigl\{ t \in \mathbb{R} : \lim_{r \to 0} \frac{\mathcal{L}^n(\{u > t\} \cap B(x,r))}{\mathcal{L}^n(B(x,r))} = 0 \bigr\},
\end{align*}
and the **approximate lower limit** $u^-: \Omega \to [-\infty, +\infty]$ is the map
\begin{align*}
u^-(x) := \sup\bigl\{ t \in \mathbb{R} : \lim_{r \to 0} \frac{\mathcal{L}^n(\{u < t\} \cap B(x,r))}{\mathcal{L}^n(B(x,r))} = 0 \bigr\}.
\end{align*}
The inequality $u^-(x) \le u^+(x)$ holds at every point $x \in \Omega$.
[/definition]
Unpacking the definition: $u^+(x)$ is the smallest value $t$ such that the level set $\{u > t\}$ has density zero at $x$, meaning the fraction of $B(x,r)$ where $u$ exceeds $t$ vanishes as $r \to 0$. Thus $u^+(x)$ is the smallest threshold above which $u$ is negligible near $x$. Dually, $u^-(x)$ is the largest threshold below which $u$ is negligible near $x$. When $u^-(x) = u^+(x)$, this common value is the **approximate limit** of $u$ at $x$ — the "correct" pointwise value that the Lebesgue differentiation theorem recovers for $L^1$ functions.
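To see the definition at work, consider a sketch in one dimension, with the Heaviside function $u = \mathbb{1}_{(0,\infty)}$ on $\Omega = \mathbb{R}$ and the point $x = 0$ (an illustrative choice, not an example from the text). Here $B(0,r) = (-r,r)$ has measure $2r$, and for every $r > 0$ the density ratios are
\begin{align*}
\frac{\mathcal{L}^1(\{u > t\} \cap B(0,r))}{\mathcal{L}^1(B(0,r))} &= \begin{cases} 1, & t < 0, \\ \tfrac{1}{2}, & 0 \le t < 1, \\ 0, & t \ge 1, \end{cases} \\
\frac{\mathcal{L}^1(\{u < t\} \cap B(0,r))}{\mathcal{L}^1(B(0,r))} &= \begin{cases} 0, & t \le 0, \\ \tfrac{1}{2}, & 0 < t \le 1, \\ 1, & t > 1. \end{cases}
\end{align*}
The admissible thresholds are therefore $t \ge 1$ in the infimum and $t \le 0$ in the supremum, so $u^+(0) = 1$ and $u^-(0) = 0$. At any $x \ne 0$ the same computation gives $u^+(x) = u^-(x) = u(x)$.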
[remark: Approximate Limits are Well-Defined Up to Null Sets]
The functions $u^+$ and $u^-$ depend only on the $\mathcal{L}^n$-equivalence class of $u$: changing $u$ on a set of measure zero does not alter $u^+(x)$ or $u^-(x)$ at any $x$. Moreover, both functions are Borel measurable. At every Lebesgue point of $u$ the approximate limit exists and coincides with the Lebesgue value $\tilde{u}(x) = \lim_{r \to 0} \frac{1}{\mathcal{L}^n(B(x,r))}\int_{B(x,r)} u \, d\mathcal{L}^n$, so this framework generalizes rather than contradicts what we already know.
[/remark]
A natural question is whether $u^+$ and $u^-$ can be infinite on large sets. For a general $L^1$ function the set $\{u^+ = +\infty\}$ is always $\mathcal{L}^n$-null, by the Lebesgue differentiation theorem, but beyond that there is no effective control on its size: it can be dense, and it can carry positive $\mathcal{H}^{n-1}$-measure. For instance, consider $u(x) = |x|^{-n/2}$ on $B(0,1) \subset \mathbb{R}^n$, which lies in $L^1(B(0,1))$ since $n/2 < n$, yet $u^+(0) = +\infty$. More dramatically, one can construct $L^1$ functions for which $u^+(x) = +\infty$ for every rational $x$ in $[0,1]^n$ while $u^- = u^+ = u$ is finite a.e.: such functions satisfy no finiteness condition on their approximate limits beyond $\mathcal{L}^n$-almost everywhere. The BV condition forces finiteness in a much stronger, codimension-one sense.
[quotetheorem:3122]
[citeproof:3122]
The measure $\mathcal{H}^{n-1}$ is the right one here, not $\mathcal{L}^n$. The Lebesgue differentiation theorem already gives $u^+(x) < \infty$ for $\mathcal{L}^n$-almost every $x$ for any $L^1$ function. What the theorem says is that the BV condition controls the approximate limits at the finer scale of $(n-1)$-dimensional sets, precisely because the derivative $Du$ is a finite measure that, via the co-area formula, controls the perimeters of level sets in a way that $L^1$-membership alone cannot. Without the BV hypothesis, $u^+ = +\infty$ can hold on a set of positive $\mathcal{H}^{n-1}$-measure: for instance, $u(x) = |x_1|^{-1/2}$ belongs to $L^1(B(0,1))$, but $u^+(x) = +\infty$ at every point of the hyperplane section $\{x_1 = 0\} \cap B(0,1)$, and $u \notin BV(B(0,1))$ because $\partial_1 u = -\tfrac{1}{2}\operatorname{sgn}(x_1)\,|x_1|^{-3/2}$ is not integrable near $\{x_1 = 0\}$.
## The Jump Set
How geometrically tame are the discontinuities of a BV function? A general $L^1$ function can be discontinuous everywhere in the classical sense, and the set where its approximate limits disagree, although always $\mathcal{L}^n$-null, need have no geometric structure: it can be dense in $\Omega$ or a fractal of non-integer dimension. For BV functions, the situation is dramatically different: the discontinuities organize into a rectifiable hypersurface. Making this precise requires first classifying the types of points in $\Omega$ according to the relationship between $u^+$ and $u^-$.
[definition: Jump Set]
Let $u \in BV(\Omega)$. The **jump set** of $u$ is
\begin{align*}
J_u := \{ x \in \Omega : u^-(x) < u^+(x) \}.
\end{align*}
At $\mathcal{H}^{n-1}$-almost every point $x \in J_u$, there exists a unit vector $\nu_u(x) \in S^{n-1}$ (the **jump direction** or **approximate unit normal** at $x$) such that, for every $\varepsilon > 0$,
\begin{align*}
\lim_{r \to 0} \frac{\mathcal{L}^n\bigl(\{|u - u^+(x)| > \varepsilon\} \cap B^+(x, r, \nu_u(x))\bigr)}{\mathcal{L}^n(B(x,r))} &= 0, \\
\lim_{r \to 0} \frac{\mathcal{L}^n\bigl(\{|u - u^-(x)| > \varepsilon\} \cap B^-(x, r, \nu_u(x))\bigr)}{\mathcal{L}^n(B(x,r))} &= 0,
\end{align*}
where $B^+(x, r, \nu) := \{ y \in B(x,r) : (y - x) \cdot \nu > 0 \}$ and $B^-(x, r, \nu) := \{ y \in B(x,r) : (y-x) \cdot \nu < 0 \}$ are the two half-balls. The jump direction field $\nu_u: J_u \to S^{n-1}$ is thus defined $\mathcal{H}^{n-1}$-almost everywhere on $J_u$ and is $\mathcal{H}^{n-1}$-measurable.
[/definition]
<!-- illustration-needed: A cross-section through a point $x \in J_u$, showing the half-balls $B^+(x,r,\nu_u)$ and $B^-(x,r,\nu_u)$ separated by the tangent hyperplane to $J_u$ at $x$, with the approximate values $u^+(x)$ labelled on the positive side and $u^-(x)$ on the negative side. -->
The jump direction $\nu_u(x)$ is the approximate normal to the "jump surface" at $x$: when one approaches $x$ from the $\nu_u(x)$ side, $u$ is approximately $u^+(x)$, while from the $-\nu_u(x)$ side, $u$ is approximately $u^-(x)$. The jump direction is unique up to sign, and the convention $u^+(x) > u^-(x)$ fixes the sign by requiring $u^+(x)$ to be the larger of the two limit values.
The complement $\Omega \setminus J_u$ is the set of **approximate continuity points**: at these points $u^+(x) = u^-(x)$, and $u$ has a well-defined approximate limit that serves as its precise representative. The Lebesgue differentiation theorem guarantees that $\mathcal{L}^n(J_u) = 0$, but the BV theory says far more — the jump set is a rectifiable set of codimension one.
[quotetheorem:3123]
[citeproof:3123]
This rectifiability is a substantial structural result. It says that even though a BV function need not be continuous anywhere, its discontinuities are not wild: up to an $\mathcal{H}^{n-1}$-null set, they are covered by countably many $C^1$ hypersurfaces, just as the boundary of a set of finite perimeter is organized by De Giorgi's theorem.
To appreciate what rectifiability rules out, consider what can happen without the BV hypothesis. An $L^1$ function can have a jump set that is $\mathcal{L}^n$-null yet has no rectifiable structure, for instance a Cantor-type set of dimension $\alpha \in (n-1, n)$. A standard example of failure is the function $u = \mathbb{1}_{C \times [0,1]^{n-1}}$ in $\mathbb{R}^n$, where $C \subset [0,1]$ is a fat Cantor set (a closed, nowhere dense set of positive measure): here $u \in L^1$ but $u \notin BV(\mathbb{R}^n)$, because the complement of $C$ contains infinitely many disjoint open intervals, so every slice of $u$ in the first coordinate direction switches between $0$ and $1$ infinitely often, and the discontinuities of $u$ accumulate on all of $C \times [0,1]^{n-1}$, a set of positive $\mathcal{L}^n$-measure rather than a hypersurface of finite area. The rectifiability in the theorem also rules out fractal discontinuity sets: a function with jump set equal to a Koch snowflake boundary (Hausdorff dimension $> 1$, hence non-rectifiable as a $1$-dimensional object in $\mathbb{R}^2$) cannot be BV. The theorem connects forward to special BV functions (SBV), where the Cantor part is absent and $|Du| = |\nabla u|\mathcal{L}^n + (u^+ - u^-)\mathcal{H}^{n-1}\lfloor J_u$; SBV is the natural space for variational fracture models and the Mumford–Shah functional in image segmentation, where one seeks a function with a clean rectifiable discontinuity set representing object boundaries.
[example: Jump Set of the Indicator Function]
Let $E \subset \mathbb{R}^n$ be a set of finite perimeter in $\Omega$, and let $u = \mathbb{1}_E$. At each point $x \in \partial^* E$ (the reduced boundary), we verify the jump condition directly from the definition. By De Giorgi's blow-up theorem, at $x \in \partial^* E$ the measure-theoretic outward normal $\nu_E(x) \in S^{n-1}$ satisfies
\begin{align*}
\lim_{r \to 0} \frac{\mathcal{L}^n(E \cap B^-(x, r, \nu_E(x)))}{\mathcal{L}^n(B(x,r))} &= \frac{1}{2}, \\
\lim_{r \to 0} \frac{\mathcal{L}^n(E \cap B^+(x, r, \nu_E(x)))}{\mathcal{L}^n(B(x,r))} &= 0,
\end{align*}
where $B^+$ is the half-ball on the $\nu_E(x)$ side and $B^-$ is on the $-\nu_E(x)$ side: $E$ fills the inner half-ball and is negligible in the outer one, so $E$ has density $1/2$ at $x$. Since $u = \mathbb{1}_E$, the level set $\{u > t\}$ equals $E$ for $t \in [0,1)$ and is empty for $t \ge 1$. For $t \in [0,1)$ the density of $\{u > t\}$ in $B(x,r)$ tends to $1/2 \ne 0$, so no such $t$ is admissible in the infimum defining $u^+(x)$, while every $t \ge 1$ is admissible; hence $u^+(x) = 1$. Symmetrically, $\{u < t\}$ equals $E^c$ for $t \in (0,1]$, with density tending to $1/2$, and is empty for $t \le 0$, giving $u^-(x) = 0$. Therefore $u^+(x) = 1 > 0 = u^-(x)$, confirming $x \in J_u$. The larger value $1$ is approached from the $-\nu_E(x)$ side, so the jump direction is $\nu_u(x) = -\nu_E(x)$.
At every point where $E$ has density $0$ or density $1$, which includes $\mathcal{L}^n$-a.e. $x \notin E$ and $\mathcal{L}^n$-a.e. $x \in E$ respectively, we get $u^+ = u^- \in \{0,1\}$, so these are approximate continuity points. The remaining points, where $E$ has neither density $0$ nor density $1$ but which do not belong to $\partial^* E$, form an $\mathcal{H}^{n-1}$-null set.
Therefore $J_u = \partial^* E$ up to an $\mathcal{H}^{n-1}$-null set, and the jump direction $\nu_u = -\nu_E$ is the measure-theoretic inward normal. The rectifiability of $J_u$ in this case is exactly De Giorgi's Structure Theorem.
[/example]
## Decomposition of the Derivative at the Jump Set
The jump set is more than a geometric curiosity — it is where the jump part of the distributional derivative is concentrated. Recall from the BV Structure Theorem (Chapter 8) the three-part decomposition of $Du$:
\begin{align*}
Du = D^a u + D^s u = D^a u + D^j u + D^c u,
\end{align*}
where $D^a u = \nabla u \, \mathcal{L}^n$ is the absolutely continuous part, $D^j u$ is the jump part, and $D^c u$ is the Cantor part. The following theorem makes the jump part explicit.
[quotetheorem:3124]
[citeproof:3124]
The formula $(u^+ - u^-)\nu_u \cdot \mathcal{H}^{n-1}\lfloor J_u$ has a vivid geometric interpretation: the jump part of the derivative is a vector-valued measure concentrated on $J_u$, with density $(u^+(x) - u^-(x))\nu_u(x)$ against $(n-1)$-dimensional Hausdorff measure. The magnitude $u^+(x) - u^-(x)$ is the jump height, and $\nu_u(x)$ is the jump direction. For the indicator function $\mathbb{1}_E$, the jump height is identically $1$ and the jump direction is $\nu_u = -\nu_E$ (it points into $E$, where $\mathbb{1}_E$ takes the larger value, whereas $\nu_E$ is the outward normal), so $D^j(\mathbb{1}_E) = -\nu_E \, \mathcal{H}^{n-1}\lfloor \partial^* E = D(\mathbb{1}_E)$, which is consistent with $D^c(\mathbb{1}_E) = 0$ (since $E$ has finite perimeter, its derivative has no Cantor part).
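A minimal worked check of this formula, under the illustrative assumption that $u$ takes two constant values $a > b$ on the two sides of a hyperplane: let $\Omega = B(0,1)$, with $u = a$ on $\Omega \cap \{x_n > 0\}$ and $u = b$ on $\Omega \cap \{x_n < 0\}$. Then $J_u = \Omega \cap \{x_n = 0\}$, $u^+ = a$, $u^- = b$, and $\nu_u = e_n$ points toward the side of the larger value. For any $\phi \in C_c^1(\Omega; \mathbb{R}^n)$, the divergence theorem on each half-ball (whose flat faces have outward normals $-e_n$ and $+e_n$ respectively) gives
\begin{align*}
\int_\Omega u \operatorname{div}\phi \, d\mathcal{L}^n &= a \int_{\Omega \cap \{x_n > 0\}} \operatorname{div}\phi \, d\mathcal{L}^n + b \int_{\Omega \cap \{x_n < 0\}} \operatorname{div}\phi \, d\mathcal{L}^n \\
&= -a \int_{J_u} \phi \cdot e_n \, d\mathcal{H}^{n-1} + b \int_{J_u} \phi \cdot e_n \, d\mathcal{H}^{n-1} \\
&= -\int_{J_u} \phi \cdot (u^+ - u^-)\,\nu_u \, d\mathcal{H}^{n-1}.
\end{align*}
Comparing with the usual defining identity $\int_\Omega u \operatorname{div}\phi \, d\mathcal{L}^n = -\int_\Omega \phi \cdot dDu$ (the sign convention assumed here) yields $Du = D^j u = (a-b)\,e_n\,\mathcal{H}^{n-1}\lfloor J_u$, with $D^a u = 0$ and $D^c u = 0$.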
[example: The Cantor–Vitali Function and the Cantor Part]
Let $c: [0,1] \to [0,1]$ be the Cantor–Vitali function (devil's staircase): $c$ is continuous, non-decreasing, $c(0) = 0$, $c(1) = 1$, and $c'(x) = 0$ for $\mathcal{L}^1$-almost every $x \in [0,1]$ (since $c$ is constant on each interval removed in the Cantor set construction). Extend $c$ to all of $\mathbb{R}$ by $c = 0$ on $(-\infty, 0)$ and $c = 1$ on $(1, \infty)$.
Since $c$ is non-decreasing and bounded, $c \in BV(\mathbb{R})$ and $Dc$ is a positive Borel measure. Since $c$ is continuous, the jump set $J_c = \varnothing$ (there are no jumps: $c^+(x) = c^-(x) = c(x)$ at every point). Since $c' = 0$ $\mathcal{L}^1$-a.e., the absolutely continuous part $D^a c = c' \mathcal{L}^1 = 0$. Therefore $Dc = D^c c$: the entire distributional derivative is the Cantor part.
Explicitly, $Dc$ is the standard Cantor measure on $[0,1]$: a probability measure supported on the Cantor set $\mathcal{C} \subset [0,1]$ (which has $\mathcal{L}^1(\mathcal{C}) = 0$), with $Dc(A) = c(\sup A) - c(\inf A)$ for any interval $A$. The measure $Dc$ is singular with respect to $\mathcal{L}^1$ (since $\mathcal{L}^1(\mathcal{C}) = 0$) yet gives zero mass to every single point (since $c$ is continuous). This is the canonical example of a Cantor part: it is mutually singular with $\mathcal{L}^1$ but assigns no mass to any countable set (that is, any $\mathcal{H}^0$-$\sigma$-finite set), which in particular means it assigns zero mass to the jump set $J_c = \varnothing$.
The function $u(x) = x + c(x)$ on $[0,1]$ exhibits two of the three parts: $Du = \mathcal{L}^1\lfloor[0,1] + Dc$, so $D^a u = \mathcal{L}^1\lfloor[0,1]$ (the identity part), $D^c u = Dc$ (the Cantor part), and $D^j u = 0$ (no jumps). Adding a step function $v(x) = \mathbb{1}_{[1/2, 1]}(x)$ gives $w = u + v$ with $D^j w = \delta_{1/2}$ (a jump of height $1$ at $x = 1/2$), $D^a w = \mathcal{L}^1\lfloor[0,1]$, and $D^c w = Dc$, exhibiting all three parts of the decomposition simultaneously.
[/example]
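As a quick consistency check on the decomposition of $w$ (a sketch, using only the facts stated in the example and the observation that $w$ is non-decreasing), the three parts of $Dw$ are mutually singular positive measures, so their masses on $(0,1)$ add:
\begin{align*}
|Dw|((0,1)) &= |D^a w|((0,1)) + |D^j w|((0,1)) + |D^c w|((0,1)) \\
&= \mathcal{L}^1((0,1)) + \delta_{1/2}((0,1)) + Dc((0,1)) \\
&= 1 + 1 + 1 = 3.
\end{align*}
This matches the total rise of the monotone function $w$, namely $w(1) - w(0) = (1 + c(1) + 1) - 0 = 3$.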
## The Fubini-Type Characterization via Lines
The global definition of $BV(\Omega)$ via distributional derivatives is often difficult to verify directly: given a specific function, how does one check that $|Du|(\Omega)$ is finite? In one dimension, bounded variation has a classical characterization in terms of Jordan's condition: $\sup \sum |u(x_{i+1}) - u(x_i)| < \infty$ over all partitions. In higher dimensions there are no partitions to sum over, and testing the one-dimensional condition along a single coordinate direction is too weak: a function can have finite one-dimensional total variation along lines parallel to $e_1$ yet fail to be BV, because the cross-directional oscillation can be uncontrolled. What saves the situation is an $L^1$-integrability condition on the slice variations that must hold for every coordinate direction simultaneously.
[definition: Line Restrictions]
For $u: \Omega \to \mathbb{R}$ and a direction $e_i$ (the $i$-th standard basis vector), let $\Omega_{x'} := \{ t \in \mathbb{R} : (x', t) \in \Omega \}$ denote the slice of $\Omega$ at $x' \in \mathbb{R}^{n-1}$ (suppressing the $i$-th coordinate). Define $u_{x'}: \Omega_{x'} \to \mathbb{R}$ by $u_{x'}(t) := u(x', t)$.
[/definition]
[quotetheorem:3125]
[citeproof:3125]
For the converse: if the slice integrals are finite, one constructs $Du$ directly as the weak limit of the difference quotients. Specifically, the distributional partial derivative $\partial_i u$ is identified with the measure whose action on $\varphi$ is computed by Fubini and the one-dimensional integration-by-parts formula on each slice. The finiteness of the slice integral gives the required bound $|Du|(\Omega) < \infty$.
This result is more than a characterization — it provides a concrete method for verifying membership in $BV$ by checking one-dimensional sections, and it reveals the slicing structure of $Du$: the distributional derivative along direction $e_i$ is built by assembling the one-dimensional derivatives of the slices.
A critical point is that the condition must hold for all $n$ coordinate directions, not just one. To see why one direction is insufficient, consider in $\mathbb{R}^2$ the function $u = \mathbb{1}_E$ on $\Omega = (0,1)^2$, where $E = \{(x_1, x_2) : 0 < x_1 < 1,\ 0 < x_2 < f(x_1)\}$ is the subgraph of a measurable $f: (0,1) \to [0,1]$. Each vertical slice $u_{x_1}$ (fixing $x_1$, varying $x_2$) is the indicator of the interval $(0, f(x_1))$, which has at most one jump inside $(0,1)$, so $V(u_{x_1}; (0,1)) \le 1$ and $\int_0^1 V(u_{x_1}; (0,1)) \, dx_1 \le 1 < \infty$, however irregular $f$ may be. The horizontal slices $u_{x_2}$ (fixing $x_2$, varying $x_1$) are the indicators of the superlevel sets $\{f > x_2\}$, and these carry all the information about the oscillation of $f$. For $f(x_1) = \tfrac{1}{2}(1 + \sin(1/x_1))$, every horizontal line at height $x_2 \in (0,1)$ crosses the graph infinitely often, so $V(u_{x_2}; (0,1)) = +\infty$ for every such $x_2$, and $E$ does not have finite perimeter in $\Omega$ even though the condition in the $e_2$ direction holds. When both slice integrals are finite, for instance when $f$ is Lipschitz with values in $(0,1)$, the set $E$ has finite perimeter, with $P(E; (0,1)^2) = \int_0^1 \sqrt{1 + f'(x_1)^2} \, dx_1$; the perimeter (and hence $|Du|$) requires control in all directions simultaneously. The slicing theorem is used repeatedly in the proofs of the co-area formula and the structure theorem for SBV functions.
[example: Slicing Applied to Sets of Finite Perimeter]
Let $E \subset \mathbb{R}^n$ be a measurable set with $\mathcal{L}^n(E \cap K) < \infty$ for every compact $K$. By the Fubini characterization, $E$ has finite perimeter in $\Omega$ (i.e., $\mathbb{1}_E \in BV(\Omega)$) if and only if, for each coordinate direction $e_i$ and for $\mathcal{L}^{n-1}$-almost every $x' \in \mathbb{R}^{n-1}$, the one-dimensional slice $E_{x'} := \{ t : (x', t) \in E \}$ is a set of finite perimeter in $\Omega_{x'}$, with $\int V(\mathbb{1}_{E_{x'}}; \Omega_{x'}) \, d\mathcal{L}^{n-1}(x') < \infty$.
For a set in $\mathbb{R}^1$, finite perimeter in $\Omega_{x'}$ means that $\mathbb{1}_{E_{x'}}$ agrees almost everywhere with an indicator that has finitely many jumps in $\Omega_{x'}$, and $V(\mathbb{1}_{E_{x'}}; \Omega_{x'})$ is the number of those jumps. So $E$ has finite perimeter iff, in every coordinate direction, the one-dimensional slices $E_{x'}$ "change finitely often" for almost every $x'$ and the number of changes, integrated over $x'$, is finite. This gives an intuitive picture: almost every line parallel to a coordinate axis crosses the boundary of a set of finite perimeter only finitely many times.
[/example]
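For a concrete instance of slicing, here is a sketch for the unit disc $E = B(0,1) \subset \mathbb{R}^2$ with $\Omega = \mathbb{R}^2$. It assumes the slicing identity in the form $|D_i u|(\Omega) = \int V(u_{x'}; \Omega_{x'}) \, dx'$ for each direction $e_i$, together with the elementary bounds $|D_i u| \le |Du| \le \sum_i |D_i u|$; these are standard, but the precise form of the theorem quoted above may differ. For $|x'| < 1$ the slice $E_{x'}$ is an interval, so $\mathbb{1}_{E_{x'}}$ has two unit jumps and $V = 2$; for $|x'| > 1$ the slice is empty. Hence, in each of the two coordinate directions,
\begin{align*}
\int_{\mathbb{R}} V(\mathbb{1}_{E_{x'}}; \mathbb{R}) \, dx' &= \int_{-1}^{1} 2 \, dx' = 4,
\end{align*}
and the resulting bounds $4 \le P(E; \mathbb{R}^2) \le 8$ are consistent with the true value $P(E; \mathbb{R}^2) = 2\pi$.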
## Determination by Approximate Limits and Jumps
How much pointwise data does one need to recover a BV function completely? Two representatives of the same BV function agree $\mathcal{L}^n$-almost everywhere but can differ on a set of positive $\mathcal{H}^{n-1}$-measure, so the $L^1$ class alone does not see the values on the jump set. The question is whether three pieces of data, namely the precise representative on the approximate continuity set, the jump values $(u^-, u^+)$ on $J_u$, and the Cantor measure $D^c u$, are sufficient to pin down the function uniquely up to $\mathcal{H}^{n-1}$-almost everywhere equality.
[quotetheorem:3126]
[citeproof:3126]
The determination theorem has a natural interpretation: a BV function is fully encoded by three pieces of data — the precise representative on the set of approximate continuity (where $u^+ = u^- = \tilde{u}$), the pair $(u^-, u^+)$ specifying the jump structure on $J_u$, and the Cantor measure $D^c u$. This is analogous to the situation in one dimension, where a function of bounded variation is determined by its values at continuity points and the specification of its left and right limits at each jump.
Note that all three hypotheses are necessary. If condition (3) fails, so that $D^c u \ne D^c v$, then $u$ and $v$ can share the same jump set and jump heights yet differ by a Cantor-type function: take $v = u + c$ where $c$ is the Cantor–Vitali function; then $c$ is continuous so $J_{u+c} = J_u$, the jump heights $v^+ - v^- = u^+ - u^-$ are unchanged, but $D^c(u+c) = D^c u + Dc \ne D^c u$. If condition (1) fails with (2) and (3) holding, one can have $w = u - v$ with zero jump and zero Cantor part but nonzero absolutely continuous part: take $v = u + \psi$ with $\psi \in W^{1,1}(\Omega)$ non-constant, $J_\psi = \varnothing$, $D^c \psi = 0$. Conditions (1) and (3) without (2) would allow two functions to disagree in jump heights on $J_u$, directly giving a nonzero jump part of $D(u-v)$.
[explanation: The Three-Part Decomposition and Its Geometric Meaning]
The decomposition $Du = D^a u + D^j u + D^c u$ has a clear hierarchy in terms of the pointwise data of $u$.
The absolutely continuous part $D^a u = \nabla u \, \mathcal{L}^n$ is the classical gradient, and it is responsible for the variation of $u$ at points where $u$ behaves like a Sobolev function. At $\mathcal{L}^n$-almost every point of approximate continuity, $u$ can be approximated by a linear function with slope $\nabla u(x)$.
The jump part $D^j u = (u^+ - u^-)\nu_u \, \mathcal{H}^{n-1}\lfloor J_u$ captures the codimension-one discontinuities. The jump set $J_u$ is $(n-1)$-rectifiable and carries $\mathcal{H}^{n-1}$-measure. The contribution to the total variation is
\begin{align*}
|D^j u|(A) = \int_{A \cap J_u} (u^+(x) - u^-(x)) \, d\mathcal{H}^{n-1}(x),
\end{align*}
which is the integral of the jump height over the jump surface.
The Cantor part $D^c u$ is mutually singular with both: it is singular with respect to $\mathcal{L}^n$ (hence differs from $D^a u$), yet it assigns zero mass to every $\mathcal{H}^{n-1}$-$\sigma$-finite set (hence differs from $D^j u$). In particular, $|D^c u|(J_u) = 0$. The Cantor part is the most mysterious: it is the BV analogue of the derivative of the Cantor–Vitali function (devil's staircase), a function that is almost everywhere differentiable with derivative zero, yet is not constant. A BV function with nontrivial Cantor part can be everywhere approximately continuous ($J_u = \varnothing$) yet fail to be in $W^{1,1}$.
The set $S_u := J_u \cup \{x : u^+ = u^- = \pm \infty\}$ (which is $\sigma$-finite with respect to $\mathcal{H}^{n-1}$: the jump set $J_u$ is countably rectifiable, and the set of infinite approximate limits is $\mathcal{H}^{n-1}$-null by the finiteness theorem) is the singular set of $u$ in the pointwise sense. The complement $\Omega \setminus S_u$ consists precisely of the approximate continuity points, where $u$ has a well-defined precise representative $\tilde{u}(x) = u^+(x) = u^-(x)$.
[/explanation]
## The Precise Representative Outside the Jump Set
Knowing that $u^+ = u^-$ outside $J_u$ tells us the jump structure, but does the common value $\tilde{u}(x) = u^+(x) = u^-(x)$ actually control $u$ in balls around $x$ in a quantitative $L^1$ sense? This matters, for example, when defining traces of BV functions on hypersurfaces other than the jump set, or when formulating the boundary condition for variational problems in BV. If $\tilde{u}$ were merely an approximate limit in a qualitative sense, it could not be used as a reliable local representative for integration. The following theorem confirms the quantitative approximate continuity needed for such applications.
[quotetheorem:3127]
The proof uses the Lebesgue differentiation theorem applied to the function $t \mapsto \mathcal{L}^n(\{u > t\} \cap B(x,r)) / \mathcal{L}^n(B(x,r))$ and the identification of $u^+(x)$ and $u^-(x)$ as the upper and lower limits of this monotone function of $t$.
The relationship $\tilde{u} = u$ $\mathcal{L}^n$-almost everywhere shows that $\tilde{u}$ is a genuine representative of the $L^1$ equivalence class. What distinguishes it from an arbitrary representative is its definition in terms of the measure-theoretic structure of $u$: $\tilde{u}(x)$ is the unique value that is "seen" with positive density at $x$, in the sense that no other value competes with it in small balls.
[remark: Comparison with Sobolev Quasicontinuity]
For Sobolev functions $u \in W^{1,p}(\Omega)$ with $1 \le p < \infty$, the precise representative is $p$-quasicontinuous: it is continuous outside an open set of $p$-capacity arbitrarily small (Chapter 7, Sobolev Quasicontinuous Representative theorem). For BV functions, the analogous statement involves $\mathcal{H}^{n-1}$-measure rather than capacity. The jump set $J_u$ has $\sigma$-finite $\mathcal{H}^{n-1}$-measure, and outside this set the precise representative is approximately continuous. This is a weaker regularity than quasicontinuity, reflecting the fact that BV is a larger space than $W^{1,1}$ and contains functions (such as the indicator of a ball) that are genuinely discontinuous along hypersurfaces.
[/remark]
The pointwise theory developed in this chapter completes the structural analysis of BV functions begun in Chapter 8. The picture that emerges is the following: a BV function $u \in BV(\Omega)$ looks like a Sobolev function ($D^a u$ part) glued together along a rectifiable hypersurface $J_u$ (the jump part $D^j u$), with a Cantor-type residual ($D^c u$). The precise representative $\tilde{u}$ is well-defined $\mathcal{H}^{n-1}$-almost everywhere, the jump set $J_u$ is rectifiable, and the derivative is completely determined by the pointwise data $(u^-, u^+, \nu_u, D^c u)$. In practice, to compute the decomposition of $Du$ for a given function: (i) find $J_u$ as the set where $u$ has genuine one-sided limits that disagree, which for piecewise-smooth $u$ is the union of the smooth discontinuity surfaces; (ii) compute $D^j u = (u^+ - u^-)\nu_u \, \mathcal{H}^{n-1}\lfloor J_u$ by integrating the jump height over those surfaces; (iii) compute $D^a u = \nabla u \, \mathcal{L}^n$ from the classical gradient where $u$ is differentiable; and (iv) identify $D^c u = Du - D^a u - D^j u$ as the residual, which vanishes for SBV functions and equals the Cantor measure for functions like the Cantor–Vitali staircase. Chapter 15 applies this procedure to explicit examples and worked problems.
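The four steps can be sketched on a simple piecewise-smooth function. The example below is an illustrative choice, not one taken from the text: $u(x) = |x|^2$ for $|x| < 1$ and $u(x) = 0$ for $|x| \ge 1$, in $\mathbb{R}^2$.
\begin{align*}
\text{(i)}\quad & J_u = \partial B(0,1), \qquad u^+ = 1, \quad u^- = 0, \quad \nu_u(x) = -x, \\
\text{(ii)}\quad & D^j u = -x \, \mathcal{H}^1\lfloor \partial B(0,1), \qquad |D^j u|(\mathbb{R}^2) = 2\pi, \\
\text{(iii)}\quad & D^a u = 2x \, \mathbb{1}_{B(0,1)} \, \mathcal{L}^2, \qquad |D^a u|(\mathbb{R}^2) = \int_{B(0,1)} 2|x| \, d\mathcal{L}^2 = 4\pi \int_0^1 r^2 \, dr = \frac{4\pi}{3}, \\
\text{(iv)}\quad & D^c u = 0, \qquad |Du|(\mathbb{R}^2) = \frac{4\pi}{3} + 2\pi = \frac{10\pi}{3}.
\end{align*}
In step (i) the jump direction points into the disc, toward the side where $u$ takes the larger value; step (iv) uses that a piecewise-$C^1$ function with a smooth discontinuity surface has no Cantor part.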
With the Gauss-Green theorem established, this chapter has examined the fine properties of BV functions: what they look like at individual points and on sets of full measure. Pointwise differentiability and density arguments reveal how the singular and absolutely continuous parts of the variation manifest at the microscopic level.
# 15. Examples and Worked Problems
The preceding fourteen chapters of this course have built an extensive theoretical apparatus — Sobolev spaces, BV functions, the structure theorem for sets of finite perimeter, the Gauss-Green theorem, and the capacity theory of Sobolev spaces. Each major theorem (De Giorgi's structure theorem, the isoperimetric inequality, the BV decomposition) was stated in considerable generality, and the proofs were necessarily abstract: we relied on blow-up arguments, compactness, and measure-theoretic density results rather than explicit computation. This final chapter brings the theory back to earth through six worked problems, each of which anchors one of those abstract results in a specific, fully computed example. Reading these examples as a pair with the relevant chapter will give you the calibration needed to apply the theory to new situations: you will see exactly which hypothesis is doing which work, and you will see how the abstract objects (reduced boundary, distributional derivative, perimeter measure) behave in cases where they can be computed directly.
## Perimeter of a Convex Polygon
The perimeter of a polygon is one of the first things one learns in a geometry course — it is simply the sum of the side lengths. The point of this example is to show that the BV definition of perimeter, which involves taking the supremum over smooth vector fields, yields exactly this elementary quantity for a convex polygon. The verification is not immediate: one must identify the reduced boundary, compute the outward unit normal at each point, and then invoke De Giorgi's structure theorem to conclude that the perimeter measure is $\mathcal{H}^1$ on the reduced boundary. The key step — showing that vertices are excluded from the reduced boundary — requires a direct blow-up argument, and it is important to carry it out correctly via the blow-up of the characteristic function rather than via the density ratio alone.
<!-- illustration-needed: a convex polygon with vertices $v_1, \ldots, v_k$ labeled counterclockwise, with outward unit normals $\nu_i$ drawn on each edge $S_i$; a separate inset showing the wedge-shaped blow-up at a vertex $v_i$ with interior angle $\theta_i \in (0,\pi)$ to contrast with the half-plane blow-up at an interior edge point -->
[example: Perimeter of a Convex Polygon]
Let $E \subset \mathbb{R}^2$ be a bounded convex polygon with vertices $v_1, \ldots, v_k \in \mathbb{R}^2$ (ordered counterclockwise), and let $S_i$ denote the open edge from $v_i$ to $v_{i+1}$ (indices mod $k$). We claim that the reduced boundary is $\partial^* E = \bigcup_{i=1}^k S_i$, the outward unit normal $\nu_E$ on each $S_i$ is the constant unit vector perpendicular to $S_i$ and pointing outward, and the perimeter satisfies
\begin{align*}
P(E; \mathbb{R}^2) &= \mathcal{H}^1(\partial E) = \sum_{i=1}^k |v_{i+1} - v_i|.
\end{align*}
**Step 1: The reduced boundary excludes the vertices.** Fix a vertex $v_i$ and a radius $r > 0$ small enough that $B(v_i, r)$ meets only the two edges adjacent to $v_i$. Let $\theta_i \in (0, \pi)$ be the interior angle at $v_i$. In $B(v_i, r)$, the set $E$ is a wedge of angle $\theta_i$. Membership in the reduced boundary requires that the rescaled characteristic functions $\mathbb{1}_{E_\rho}(y) = \mathbb{1}_E(v_i + \rho y)$ converge in $L^1_{\mathrm{loc}}(\mathbb{R}^2)$ to the indicator of a half-space as $\rho \to 0$.
Let $W$ be the infinite wedge of opening angle $\theta_i$ with apex at the origin whose sides point along the two edges meeting at $v_i$. For every $R > 0$ and every $\rho < r/R$, the rescaled set $E_\rho = (E - v_i)/\rho$ coincides with $W$ inside $B(0,R)$ (the polygon is straight-edged, so scaling does not change the angle). Therefore $\mathbb{1}_{E_\rho} \to \mathbb{1}_W$ in $L^1_{\mathrm{loc}}(\mathbb{R}^2)$. Since $\theta_i \in (0, \pi)$, the set $W$ is not a half-space (a half-space has opening angle $\pi$). Accordingly, the blow-up limit at $v_i$ is not the indicator of a half-plane, and $v_i \notin \partial^* E$.
One can also see this from the density ratio: for $\rho < r$ the vector measure $D\mathbb{1}_E$ assigns to $B(v_i, \rho)$ the value $-\int_{\partial E \cap B(v_i,\rho)} \nu_E \, d\mathcal{H}^1 = -(\nu_i^{(1)} + \nu_i^{(2)})\rho$, where $\nu_i^{(1)}, \nu_i^{(2)}$ are the outward unit normals on the two adjacent edges, while $|D\mathbb{1}_E|(B(v_i,\rho)) = 2\rho$. The ratio $D\mathbb{1}_E(B(v_i,\rho))/|D\mathbb{1}_E|(B(v_i,\rho))$ therefore equals $-(\nu_i^{(1)} + \nu_i^{(2)})/2$, a vector of norm $\cos((\pi - \theta_i)/2) = \sin(\theta_i/2)$, because the two outward normals enclose the angle $\pi - \theta_i$. Since $\theta_i \in (0,\pi)$, this norm lies in $(0,1)$, strictly less than $1$, confirming that the density ratio does not converge to a unit vector, which is a second (equivalent) way to see that $v_i \notin \partial^* E$.
**Step 2: Interior edge points lie in $\partial^* E$.** Fix a point $x_0 \in S_i$ with outward unit normal $\nu_i$. In a small ball $B(x_0, r)$, the set $E$ coincides with the half-plane $\{x : (x - x_0) \cdot \nu_i < 0\}$ (since the edge is a straight line segment and $r$ is small enough that the ball stays away from both vertices and from all other edges). The blow-up $E_\rho = (E - x_0)/\rho$ therefore coincides with the half-space $H_{\nu_i}^- = \{y : y \cdot \nu_i < 0\}$ inside $B(0, r/\rho)$, so $\mathbb{1}_{E_\rho} \to \mathbb{1}_{H_{\nu_i}^-}$ in $L^1_{\mathrm{loc}}(\mathbb{R}^2)$, and the ratio $D\mathbb{1}_E(B(x_0,\rho))/|D\mathbb{1}_E|(B(x_0,\rho))$ equals $-\nu_i$ for every $\rho < r$. Thus $x_0 \in \partial^* E$ with $\nu_E(x_0) = \nu_i$.
**Step 3: Perimeter computation.** By De Giorgi's structure theorem, $|D\mathbb{1}_E| = \mathcal{H}^1 \lfloor \partial^* E$. Since $\partial^* E = \bigcup_{i=1}^k S_i$ and the vertices form a finite (hence $\mathcal{H}^1$-null) set, we obtain
\begin{align*}
P(E; \mathbb{R}^2) &= |D\mathbb{1}_E|(\mathbb{R}^2) = \mathcal{H}^1(\partial^* E) = \sum_{i=1}^k \mathcal{H}^1(S_i) = \sum_{i=1}^k |v_{i+1} - v_i|.
\end{align*}
This is exactly the elementary perimeter formula. The same value emerges directly from the BV definition as a supremum of divergence integrals: for any $\phi \in C_c^1(\mathbb{R}^2; \mathbb{R}^2)$ with $|\phi| \leq 1$, the classical divergence theorem on the polygon gives
\begin{align*}
\int_E \operatorname{div} \phi \, d\mathcal{L}^2 &= \sum_{i=1}^k \int_{S_i} \phi \cdot \nu_i \, d\mathcal{H}^1,
\end{align*}
and the supremum over all such $\phi$ equals $\sum_{i=1}^k \mathcal{H}^1(S_i)$. It is approached, though not attained, by taking $\phi \approx \nu_i$ on $S_i$ away from small neighborhoods of the vertices, where a continuous $\phi$ cannot match both adjacent normals.
[/example]
The example reveals why the vertices do not contribute to the perimeter despite being on $\partial E$: the blow-up at a corner is a wedge rather than a half-plane, so the rescaled characteristic functions do not converge to a half-space indicator. This is the geometric content of De Giorgi's theorem — the perimeter is supported on the "smooth" part of the boundary, meaning the part where a unique approximate tangent hyperplane exists. The finitely many exceptional points (corners, cusps, edges of higher codimension) have $\mathcal{H}^{n-1}$-measure zero and do not affect the perimeter, so the BV definition automatically agrees with the elementary side-length sum on every convex polygon.
There is a useful general technique here. To determine whether a boundary point $x_0$ lies in the reduced boundary, start by identifying the blow-up limit $\lim_{\rho \to 0} \mathbb{1}_{(E-x_0)/\rho}$ in $L^1_{\mathrm{loc}}$. If $x_0 \in \partial^* E$, this limit must be the indicator of a half-space, whose outward normal is then $\nu_E(x_0)$. So if the limit is anything else (a wedge, a cone, a sector), then $x_0 \notin \partial^* E$. Conversely, for the piecewise flat or smooth boundaries treated in this chapter, a half-space limit does place $x_0$ in $\partial^* E$; for general sets of finite perimeter one must also check that the rescaled perimeter measures converge, which is what the density ratio criterion (whether $D\mathbb{1}_E(B(x_0,\rho))/|D\mathbb{1}_E|(B(x_0,\rho))$ converges to a unit vector) encodes. That criterion is the defining condition for $\partial^* E$ but can obscure the geometry; working directly with the blow-up limit is usually more transparent.
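The density ratio at a vertex can be checked numerically. The following is a minimal sketch, assuming the model case of an infinite wedge with its vertex at the origin and edges along the rays at angles $0$ and $\theta$; the function name `wedge_density_ratio_norm` is hypothetical and introduced only for illustration. It computes the norm of the averaged inward normals and compares it with $\cos((\pi - \theta)/2)$.

```python
import math


def wedge_density_ratio_norm(theta: float) -> float:
    """Norm of D1_E(B(v, rho)) / |D1_E|(B(v, rho)) at the vertex of a wedge.

    Assumption: the wedge is {0 < angle < theta} with vertex v at the origin.
    Each edge meets B(v, rho) in a segment of length rho, so the ratio is the
    average of -nu_1 and -nu_2 and does not depend on rho.
    """
    # Outward unit normals of the two edges.
    nu1 = (0.0, -1.0)                          # edge along the ray at angle 0
    nu2 = (-math.sin(theta), math.cos(theta))  # edge along the ray at angle theta
    avg = (-(nu1[0] + nu2[0]) / 2.0, -(nu1[1] + nu2[1]) / 2.0)
    return math.hypot(avg[0], avg[1])


for theta in (math.pi / 3, math.pi / 2, 2 * math.pi / 3, math.pi):
    predicted = math.cos((math.pi - theta) / 2)
    print(f"theta = {theta:.4f}: norm = {wedge_density_ratio_norm(theta):.6f}, "
          f"cos((pi - theta)/2) = {predicted:.6f}")
```

Only the degenerate angle $\theta = \pi$, where the wedge is a half-plane, gives a ratio of norm $1$; every genuine corner gives a strictly smaller value.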
Having pinned down the geometric content of the perimeter measure on a piecewise-flat boundary, the natural next question concerns BV functions rather than BV sets. The polygon example showed that the gradient measure $D\mathbb{1}_E$ behaves like a one-dimensional Hausdorff measure on a clean rectifiable set, with no singular Cantor-type contribution: every bit of the perimeter mass is accounted for by the jump of $\mathbb{1}_E$ across straight edges, a measure absolutely continuous with respect to $\mathcal{H}^1$. The next example reverses this situation entirely: it exhibits a scalar BV function whose distributional derivative has no absolutely continuous part, no jump part, and yet a nonzero total mass. This pivot is necessary because the structure theorem for BV functions splits $Du$ into three components, and the polygon example exercised only the jump part; the Cantor staircase is what forces the Cantor part into the picture.
## The Cantor Staircase: A Pure Cantor BV Function
The Lebesgue decomposition theorem splits every finite signed measure $\mu$ on the line uniquely into a part that is absolutely continuous with respect to Lebesgue measure and a singular part; the singular part splits further into a purely atomic (jump) part and a Cantor-type part (singular with respect to Lebesgue measure but without atoms). The BV structure theorem for functions mirrors this decomposition: every $u \in BV(\Omega)$ has a distributional derivative $Du = D^a u + D^j u + D^c u$, where $D^a u$ is absolutely continuous, $D^j u$ is a jump measure concentrated on the jump set $J_u$, and $D^c u$ is the Cantor part. The standard example showing that $D^c u$ cannot simply be dropped is the Cantor staircase function, also called the devil's staircase. It is a BV function with $D^a u = 0$ and $D^j u = 0$, so that the entire distributional derivative is the Cantor part.
[example: Cantor Staircase BV Decomposition]
Let $C \subset [0,1]$ be the standard (middle-thirds) Cantor set, and let $u: [0,1] \to [0,1]$ be the Cantor staircase function defined by $u(0) = 0$, $u(1) = 1$, $u$ is constant on each connected component (open interval) of $[0,1] \setminus C$, and $u$ is continuous. We compute all three components of the BV decomposition $Du = D^a u + D^j u + D^c u$.
**Component 1: $D^a u = 0$.** The Cantor set $C$ has Lebesgue measure $\mathcal{L}^1(C) = 0$, since its complement $[0,1] \setminus C$ consists of the removed open intervals whose total length is $\sum_{k=1}^\infty 2^{k-1} \cdot 3^{-k} = 1$. On each component $(a, b)$ of $[0,1] \setminus C$, the function $u$ is constant, so $u'(x) = 0$ for all $x \in (a,b)$. Since $[0,1] \setminus C$ has full measure, we have $u'(x) = 0$ for $\mathcal{L}^1$-a.e. $x \in [0,1]$. The absolutely continuous part of $Du$ is $D^a u = u' \cdot \mathcal{L}^1 = 0$.
**Component 2: $D^j u = 0$.** The function $u$ is continuous on $[0,1]$ (this is part of its construction — one defines $u$ on each removed interval by a limiting argument that is consistent at the endpoints). Since $u$ has no jump discontinuities, the jump set $J_u = \varnothing$ and $D^j u = 0$.
**Component 3: $Du = D^c u$ is the Cantor measure.** Since $Du = D^a u + D^j u + D^c u = 0 + 0 + D^c u$, we need to identify $Du$ directly. For any $\phi \in C_c^\infty((0,1))$, the distributional derivative satisfies
\begin{align*}
Du(\phi) &= -\int_0^1 u \, \phi' \, d\mathcal{L}^1.
\end{align*}
For $\phi$ supported in a single interval $(a_j, b_j)$ of $[0,1] \setminus C$, where $u = c_j$ is constant, this gives $Du(\phi) = -c_j \int_{a_j}^{b_j} \phi' \, d\mathcal{L}^1 = 0$. Hence $Du(V) = 0$ for any open set $V \subset [0,1] \setminus C$, so $Du$ is concentrated on $C$. Since $\mathcal{L}^1(C) = 0$, the measure $Du$ is singular with respect to $\mathcal{L}^1$. Since also $D^j u = 0$, the measure $Du$ is the Cantor part: $Du = D^c u$.
**Verification that $u \in BV([0,1])$.** Since $u$ is monotone increasing from $0$ to $1$ on $[0,1]$, the total variation equals
\begin{align*}
|Du|([0,1]) = \sup \left\{ \sum_{i=0}^{N-1} |u(t_{i+1}) - u(t_i)| : 0 = t_0 < t_1 < \cdots < t_N = 1 \right\} = 1,
\end{align*}
where the last equality follows from monotonicity: the telescoping sum collapses to $u(1) - u(0) = 1$ regardless of the partition. Thus $u \in BV([0,1])$ with $\|u\|_{BV} = \|u\|_{L^1} + |Du|([0,1]) < \infty$. The Cantor measure $Du = \mu_C$ is the unique Borel probability measure on $C$ that assigns measure $2^{-k}$ to each of the $2^k$ surviving intervals at level $k$ of the Cantor construction; it satisfies $\mu_C(C) = 1$, $\mu_C(\mathbb{R} \setminus C) = 0$, and is continuous (no point masses), confirming $D^j u = 0$.
[/example]
The Cantor staircase is the prototypical example demonstrating that the three-part decomposition $Du = D^a u + D^j u + D^c u$ in the BV structure theorem is genuinely necessary: the first two terms together do not always capture the full distributional derivative. It also shows that a function can have zero classical derivative almost everywhere and still have a nonzero distributional derivative, a subtlety that arises because the Cantor set, though null for Lebesgue measure, can carry a full unit of mass for a singular measure. The general principle illustrated is that BV is strictly larger than $W^{1,1}$ because $Du$ may carry singular components, either jumps or Cantor-type parts, and any inequality or compactness statement formulated for BV must be compatible with the presence of both.
What would fail if we tried to ignore the Cantor part? If we only kept $D^a u + D^j u$, we would obtain the zero measure, but $u(1) - u(0) = 1$ cannot be zero — the fundamental theorem of calculus for BV functions gives $u(1) - u(0) = (D^a u + D^j u + D^c u)([0,1])$, and dropping $D^c u$ would give a contradiction. The BV structure theorem gives the correct framework to account for this.
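The claims in this example can be checked with exact rational arithmetic. The sketch below assumes the usual ternary-digit description of the staircase (read off ternary digits, stop at the first digit $1$, and halve the digits $2$); the helper names `cantor_function`, `increments`, and `removed_length` are hypothetical. It evaluates $u$ on the grid $i/3^k$, sums the increments, and lists which grid cells carry the mass of $Du$.

```python
from fractions import Fraction


def cantor_function(x: Fraction, depth: int = 40) -> Fraction:
    """Cantor staircase u(x) on [0, 1], computed from ternary digits of x."""
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = x.numerator // x.denominator  # next ternary digit
        x -= digit
        if digit == 1:
            # x sits in the closure of a removed middle third: u is constant there.
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2
    return value


def increments(level: int) -> list:
    """Increments of u over the cells [i/3^level, (i+1)/3^level]."""
    n = 3 ** level
    values = [cantor_function(Fraction(i, n)) for i in range(n + 1)]
    return [b - a for a, b in zip(values, values[1:])]


def removed_length(levels: int) -> float:
    """Total length of the middle thirds removed in the first `levels` steps."""
    return sum(2 ** (k - 1) * 3.0 ** (-k) for k in range(1, levels + 1))


incs = increments(5)
positive = [d for d in incs if d > 0]
print("sum of |increments|:", sum(abs(d) for d in incs))
print("cells carrying mass:", len(positive), "out of", len(incs))
print("mass per such cell:", set(positive))
print("removed length after 60 steps:", removed_length(60))
```

At level $k$ the increments vanish on every cell inside a removed interval, and each of the $2^k$ surviving cells carries mass $2^{-k}$, which is the description of $\mu_C$ given above.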
With the BV decomposition firmly anchored in a concrete singular example, the focus now shifts from the structure of derivatives to the optimality of perimeter. The previous two examples computed perimeters and gradient measures on specific sets and functions, but neither addressed the question that motivates much of geometric measure theory: among all sets of a given volume, which one minimizes the perimeter? The next example confronts this minimization problem directly by computing both sides of the isoperimetric inequality on the ball and extracting the sharp dimensional constant. This pivot is essential because it transforms the perimeter from a passive functional, whose value we merely compute, into the objective of an extremal problem whose solution is itself geometric.
## The Ball as the Isoperimetric Optimizer
The isoperimetric inequality in $\mathbb{R}^n$ asserts that among all measurable sets of given volume, the ball has the smallest perimeter. But this raises an immediate computational question: what is the sharp constant? The inequality takes the form $P(E; \mathbb{R}^n) \geq C(n)\,\mathcal{L}^n(E)^{(n-1)/n}$, and the constant $C(n)$ can only be determined by computing the perimeter and volume of the optimizer explicitly. Without this computation, the inequality exists as an abstract existence statement with an unspecified constant — useful for qualitative purposes, but not directly applicable to geometric estimates that require knowing the exact dimensional factor. This example pins down $C(n)$ by carrying out the perimeter computation for the ball and verifying equality.
<!-- illustration-needed: the ball $B(0,r)$ in $\mathbb{R}^2$ or $\mathbb{R}^3$ with the outward unit normal $\nu(x) = x/r$ drawn at several boundary points, and an annotation showing $P(B) = n\omega_n r^{n-1}$ and $\mathcal{L}^n(B) = \omega_n r^n$ -->
[example: Isoperimetric Constant via the Ball]
Let $B = B(0, r) \subset \mathbb{R}^n$ be the open ball of radius $r > 0$. We compute the perimeter $P(B; \mathbb{R}^n)$ and verify that the ball achieves equality in the isoperimetric inequality with sharp constant $C(n) = n \omega_n^{1/n}$, where $\omega_n = \mathcal{L}^n(B(0,1))$ is the volume of the unit ball.
**Volume of the ball.** By scaling, $\mathcal{L}^n(B(0,r)) = r^n \mathcal{L}^n(B(0,1)) = \omega_n r^n$. The volume $\omega_n$ satisfies $\omega_n = \pi^{n/2}/\Gamma(n/2 + 1)$ by standard integration in polar coordinates (see the computation via the Gaussian integral), but we need only the abstract constant in what follows.
**Perimeter of the ball.** The ball $B(0,r)$ has smooth boundary $\partial B(0,r) = \{x : |x| = r\}$, which is a smooth $(n-1)$-dimensional submanifold. For smooth bounded sets, the BV perimeter equals the classical $(n-1)$-dimensional surface area:
\begin{align*}
P(B(0,r); \mathbb{R}^n) &= \mathcal{H}^{n-1}(\partial B(0,r)).
\end{align*}
By the coarea formula applied to $f(x) = |x|$,
\begin{align*}
\mathcal{H}^{n-1}(\partial B(0,r)) &= \frac{d}{dr} \mathcal{L}^n(B(0,r)) = \frac{d}{dr}(\omega_n r^n) = n \omega_n r^{n-1}.
\end{align*}
The outward unit normal on $\partial B(0,r)$ is $\nu(x) = x/r$, and one verifies Gauss-Green directly: for $\phi \in C_c^1(\mathbb{R}^n; \mathbb{R}^n)$,
\begin{align*}
\int_{B(0,r)} \operatorname{div} \phi \, d\mathcal{L}^n &= \int_{\partial B(0,r)} \phi \cdot \frac{x}{r} \, d\mathcal{H}^{n-1},
\end{align*}
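and taking $\phi(x) = (1-\varepsilon)\,x/r$ on $\overline{B(0,r)}$, extended to a compactly supported $C^1$ field with $|\phi| \leq 1$, gives $\int_{B(0,r)} \operatorname{div} \phi \, d\mathcal{L}^n = (1-\varepsilon)\, n\omega_n r^{n-1}$. Letting $\varepsilon \to 0$ shows that the total variation is at least $n\omega_n r^{n-1}$, and the identity above bounds it by the same value from above.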
**Verification of equality in the isoperimetric inequality.** The isoperimetric inequality reads $P(E)^n \geq C(n)^n \mathcal{L}^n(E)^{n-1}$ for some dimensional constant. Inserting $E = B(0,r)$:
\begin{align*}
(n \omega_n r^{n-1})^n &= C(n)^n (\omega_n r^n)^{n-1}.
\end{align*}
Dividing both sides by $r^{n(n-1)}$:
\begin{align*}
n^n \omega_n^n &= C(n)^n \omega_n^{n-1},
\end{align*}
so $C(n)^n = n^n \omega_n$, giving $C(n) = n \omega_n^{1/n}$. One writes the isoperimetric inequality in the standard form
\begin{align*}
P(E; \mathbb{R}^n) &\geq n \omega_n^{1/n} \mathcal{L}^n(E)^{(n-1)/n},
\end{align*}
where equality holds for $E = B(0,r)$ for any $r > 0$. The ball is in fact the unique minimizer up to translations and $\mathcal{L}^n$-null modifications, a fact whose proof requires the equality case analysis of the Sobolev inequality from which the isoperimetric inequality is derived.
[/example]
The computation makes plain why the isoperimetric inequality is sharp: the ratio $P(E)/\mathcal{L}^n(E)^{(n-1)/n}$ is scale-invariant (as can be seen by replacing $E$ with $\lambda E$ for any $\lambda > 0$, which multiplies the numerator by $\lambda^{n-1}$ and the denominator by the same factor), so the minimization over all sets reduces to a minimization over shapes at fixed scale, and the ball achieves the minimum. The dimensional constant $n\omega_n^{1/n}$ also encodes the geometry of $\mathbb{R}^n$ in a transparent way: $n\omega_n = \mathcal{H}^{n-1}(\partial B(0,1))$ is the surface area of the unit sphere.
One indication that the computation is correct: in $\mathbb{R}^2$, $\omega_2 = \pi$, so the sharp constant is $C(2) = 2\sqrt{\pi}$, and the inequality $P(E)^2 \geq 4\pi \mathcal{L}^2(E)$ is the classical isoperimetric inequality in the plane, verified for a disk of radius $r$ as $P = 2\pi r$ and $\mathcal{L}^2(E) = \pi r^2$: $(2\pi r)^2 = 4\pi \cdot \pi r^2$. The BV framework recovers the classical inequality with its classical constant.
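The same sanity check can be run in any dimension. The sketch below is illustrative only: the function names are hypothetical, and the comparison with a cube of side $a$ (perimeter $2n a^{n-1}$, volume $a^n$) is an added example not taken from the text above. It evaluates $\omega_n$ from the Gamma-function formula, confirms that the ratio $P/\mathcal{L}^n(\cdot)^{(n-1)/n}$ for a ball does not depend on $r$, and shows that the cube has a strictly larger ratio.

```python
import math


def unit_ball_volume(n: int) -> float:
    """omega_n = pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def isoperimetric_constant(n: int) -> float:
    """Sharp constant C(n) = n * omega_n^(1/n)."""
    return n * unit_ball_volume(n) ** (1 / n)


def ball_ratio(n: int, r: float) -> float:
    """P(B(0,r)) / L^n(B(0,r))^((n-1)/n) from the formulas in the example."""
    omega = unit_ball_volume(n)
    perimeter = n * omega * r ** (n - 1)
    volume = omega * r ** n
    return perimeter / volume ** ((n - 1) / n)


def cube_ratio(n: int, a: float) -> float:
    """Same ratio for a cube of side a (comparison shape, an added assumption)."""
    perimeter = 2 * n * a ** (n - 1)
    volume = a ** n
    return perimeter / volume ** ((n - 1) / n)


for n in range(2, 6):
    print(f"n = {n}: C(n) = {isoperimetric_constant(n):.6f}, "
          f"ball ratio (r = 0.7) = {ball_ratio(n, 0.7):.6f}, "
          f"cube ratio = {cube_ratio(n, 2.0):.6f}")
```

The cube ratio equals $2n$ for every side length, and $2n > n\omega_n^{1/n}$ because the unit ball fits strictly inside the cube of side $2$, so $\omega_n < 2^n$.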
The ball example settled the optimization question on the simplest topological domain: a connected, simply connected set whose boundary has a single component. What it left unaddressed is how the BV apparatus handles boundaries with several connected components, where each component must be oriented separately to make Gauss-Green come out correctly. The general principle implicit in the ball computation was that the outward normal $\nu_E$ is determined by the side of the boundary on which $E$ has lower density, and the next example exploits that principle on a domain with a hole. The pivot to the annulus is forced by topology rather than analysis: only on a domain whose boundary has more than one component does the role of orientation in the Gauss-Green formula become explicit, and only then can the rule "$\nu_E$ points toward decreasing density" be tested on a sign that actually flips.
## Gauss-Green on an Annulus
The Gauss-Green theorem for sets of finite perimeter extends the classical divergence theorem to sets with non-smooth boundaries. However, the topology of the domain can create subtleties: when the set has a hole, its reduced boundary can have several connected components, and the outward unit normal on each component must be computed separately. The annulus is the simplest example of this phenomenon.
<!-- illustration-needed: the annulus $E = B(0,1) \setminus \overline{B(0,1/2)}$ in $\mathbb{R}^2$ with outward unit normals drawn on both boundary circles: pointing radially outward on $\partial B(0,1)$ and pointing radially inward (toward the origin) on $\partial B(0,1/2)$, illustrating the orientation reversal on the inner boundary -->
[example: Gauss-Green on an Annulus]
Let $E = B(0,1) \setminus \overline{B(0, 1/2)} \subset \mathbb{R}^n$ be the open annulus with inner radius $1/2$ and outer radius $1$. We compute the reduced boundary $\partial^* E$, the outward unit normal $\nu_E$, and verify the Gauss-Green theorem directly for the vector field $\phi(x) = x$.
**Identifying the reduced boundary.** The boundary $\partial E = \partial B(0,1) \cup \partial B(0, 1/2)$ consists of two smooth spheres. Since both spheres are smooth $(n-1)$-dimensional submanifolds, the same argument as for the ball (blow-up at each point is a half-space) shows $\partial^* E = \partial E = \partial B(0,1) \cup \partial B(0,1/2)$.
**Computing the outward unit normal.** The term "outward" means outward with respect to $E$, i.e., in the direction in which $\mathcal{L}^n$-density of $E$ is less than $1/2$. On the outer sphere $\partial B(0,1)$, the region outside $E$ is the complement of the closed unit ball, so the density of $E$ drops below $1/2$ in the outward radial direction: $\nu_E(x) = x/|x| = x$ for $x \in \partial B(0,1)$ (where $|x| = 1$).
On the inner sphere $\partial B(0, 1/2)$, the region inside $\overline{B(0,1/2)}$ has zero density for $E$ (since $E$ excludes $\overline{B(0,1/2)}$), so the density of $E$ drops below $1/2$ in the inward radial direction. Thus the outward unit normal on $\partial B(0,1/2)$ points toward the origin:
\begin{align*}
\nu_E(x) &= -\frac{x}{|x|} \quad \text{for } x \in \partial B(0, 1/2),
\end{align*}
where $|x| = 1/2$ on this sphere, so $\nu_E(x) = -2x$ on $\partial B(0, 1/2)$.
**Verification for $\phi(x) = x$.** The divergence of $\phi$ is $\operatorname{div}(x) = n$, so the left side of Gauss-Green is
\begin{align*}
\int_E \operatorname{div}(x) \, d\mathcal{L}^n &= n \mathcal{L}^n(E) = n(\omega_n \cdot 1^n - \omega_n \cdot (1/2)^n) = n\omega_n(1 - 2^{-n}).
\end{align*}
For the right side, we integrate $\phi \cdot \nu_E = x \cdot \nu_E$ over each component of $\partial^* E$. On the outer sphere $\partial B(0,1)$: $x \cdot \nu_E(x) = x \cdot x = |x|^2 = 1$, so
\begin{align*}
\int_{\partial B(0,1)} x \cdot \nu_E \, d\mathcal{H}^{n-1} &= \mathcal{H}^{n-1}(\partial B(0,1)) = n\omega_n.
\end{align*}
On the inner sphere $\partial B(0,1/2)$: $x \cdot \nu_E(x) = x \cdot (-x/|x|) = -|x| = -1/2$, so
\begin{align*}
\int_{\partial B(0,1/2)} x \cdot \nu_E \, d\mathcal{H}^{n-1} &= -\frac{1}{2} \cdot \mathcal{H}^{n-1}(\partial B(0,1/2)) = -\frac{1}{2} \cdot n\omega_n \cdot (1/2)^{n-1} = -n\omega_n \cdot 2^{-n}.
\end{align*}
Summing the two boundary contributions:
\begin{align*}
\int_{\partial^* E} \phi \cdot \nu_E \, d\mathcal{H}^{n-1} &= n\omega_n - n\omega_n \cdot 2^{-n} = n\omega_n(1 - 2^{-n}),
\end{align*}
which agrees exactly with $\int_E \operatorname{div}(x) \, d\mathcal{L}^n$. The Gauss-Green identity holds.
[/example]
The sign change of the normal on the inner sphere is the key point: it reflects the fact that removing the inner ball reverses the orientation of that boundary component relative to $E$. This orientation reversal is a manifestation of the boundary operator in the theory of currents — the current $\llbracket E \rrbracket$ associated to a set with a hole acquires contributions of opposite sign from the outer and inner boundary components, and the BV framework handles this automatically through the density-based definition of $\nu_E$.
The annulus also illustrates a technique: when computing $\nu_E$ on each boundary component, ask which side of the boundary has lower $\mathcal{L}^n$-density for $E$. On the outer boundary, the exterior of the unit ball has density zero for $E$, so $\nu_E$ points outward radially. On the inner boundary, the hole has density zero for $E$, so $\nu_E$ points inward. This check is more reliable than trying to remember a sign convention.
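The orientation rule can be tested numerically on a field other than $\phi(x) = x$. The sketch below is an illustration under stated assumptions: it works in $\mathbb{R}^2$, uses the hypothetical test field $\phi(x, y) = (x^3, y^3)$ (not taken from the example above), and approximates both sides of Gauss-Green with a midpoint rule in polar coordinates. The parameter `inner_sign` selects the direction of the normal on the inner circle, so the effect of a wrong orientation is visible.

```python
import math


def phi(x: float, y: float) -> tuple:
    """Hypothetical test field phi(x, y) = (x^3, y^3)."""
    return (x ** 3, y ** 3)


def div_phi(x: float, y: float) -> float:
    return 3 * x * x + 3 * y * y


def volume_integral(m: int = 300) -> float:
    """Midpoint rule for the integral of div(phi) over 1/2 < |x| < 1."""
    dr, dt = 0.5 / m, 2 * math.pi / m
    total = 0.0
    for i in range(m):
        r = 0.5 + (i + 0.5) * dr
        for j in range(m):
            t = (j + 0.5) * dt
            total += div_phi(r * math.cos(t), r * math.sin(t)) * r * dr * dt
    return total


def boundary_integral(inner_sign: float = -1.0, m: int = 2000) -> float:
    """Integral of phi . nu over both circles.

    nu is radial on the outer circle; on the inner circle it is
    inner_sign * x/|x|, and inner_sign = -1 is the outward normal of E.
    """
    dt = 2 * math.pi / m
    total = 0.0
    for radius, sign in ((1.0, 1.0), (0.5, inner_sign)):
        for j in range(m):
            t = (j + 0.5) * dt
            px, py = phi(radius * math.cos(t), radius * math.sin(t))
            nx, ny = sign * math.cos(t), sign * math.sin(t)
            total += (px * nx + py * ny) * radius * dt
    return total


print("volume side:               ", volume_integral())
print("boundary side (nu inward): ", boundary_integral(-1.0))
print("boundary side (wrong sign):", boundary_integral(+1.0))
print("closed form 45*pi/32:      ", 45 * math.pi / 32)
```

With the normal on the inner circle pointing toward the origin, the two sides agree with the closed-form value $45\pi/32$; flipping that one sign changes the boundary side to $51\pi/32$ and the identity fails.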
Across the polygon, the ball, and the annulus, every boundary point that contributed to the perimeter sat on a smooth $(n-1)$-dimensional submanifold, and every excluded point belonged to a finite set of vertices whose treatment was settled in passing. The general principle to extract is that the reduced boundary $\partial^* E$ retains exactly the part of $\partial E$ that looks flat at small scales and discards lower-dimensional singularities, but the polygon proof of vertex-exclusion was offered briefly without a fully explicit blow-up calculation in the simplest model case. The next example pivots back to the corner question and confronts it head-on on the unit square, where the geometry is as transparent as possible. This pivot is needed because De Giorgi's theorem places the perimeter measure on the rectifiable set $\partial^* E$, which is what licenses the discarding of corners in the first place, and seeing a complete blow-up calculation that produces a quarter-plane (rather than a half-plane) is the clearest possible witness to why such corners cannot belong to $\partial^* E$.
## Blow-Up at a Corner: Why the Square's Corners Are Not in the Reduced Boundary
De Giorgi's structure theorem states that the reduced boundary $\partial^* E$ of a set of finite perimeter is $(n-1)$-rectifiable, and that at each point of $\partial^* E$, the blow-up of $\mathbb{1}_E$ converges to the indicator of a half-space. A natural question is: what happens at points where this blow-up fails — specifically, at corners? The unit square $E = [0,1]^2$ in $\mathbb{R}^2$ is the simplest set with corners, and this example shows exactly why the corners are excluded from $\partial^* E$ and why that exclusion is consistent with the structure theorem (since the corners form an $\mathcal{H}^1$-null set).
[example: Blow-Up at a Corner of the Unit Square]
Let $E = (0,1)^2 \subset \mathbb{R}^2$ (taking the open square is equivalent to $[0,1]^2$ for the purposes of perimeter, since they differ by a set of $\mathcal{L}^2$-measure zero). We analyze the blow-up at interior edge points and at the four corners, and identify $\partial^* E$.
**Blow-up at an interior edge point.** Let $x_0 = (1/2, 0)$ lie on the bottom edge $\{x_2 = 0\} \cap [0,1]^2$. In the ball $B(x_0, r)$ for $r < 1/2$, the set $E$ coincides exactly with $\{x_2 > 0\} \cap B(x_0, r)$, which is a half-disk. Consequently the rescaled set $E_{r} = (E - x_0)/r$ coincides with the upper half-plane $H^+ = \{y_2 > 0\}$ inside the ball $B(0, 1/(2r))$, whose radius tends to infinity as $r \to 0$, and so $\mathbb{1}_{E_r} \to \mathbb{1}_{H^+}$ in $L^1_{\mathrm{loc}}(\mathbb{R}^2)$. The outward unit normal at $x_0$ is $\nu_E(x_0) = (0, -1)$ (pointing downward, out of $E$), so $x_0 \in \partial^* E$.
**Blow-up at the corner $(0,0)$.** At the corner $p = (0,0)$, the rescaled set is $E_r = E/r = (0, 1/r)^2$. As $r \to 0$, this converges in $L^1_{\mathrm{loc}}(\mathbb{R}^2)$ to the first quadrant $Q = \{y_1 > 0, y_2 > 0\}$. The indicator $\mathbb{1}_Q$ is not the indicator of a half-space: a half-space is determined by a single linear inequality, while $Q$ requires two. Thus the blow-up of $E$ at $(0,0)$ is not a half-plane, and $(0,0) \notin \partial^* E$.
To confirm this via the density calculation: $|D\mathbb{1}_E|(B(p, r)) = \mathcal{H}^1(\partial E \cap B(p,r))$, which consists of two segments of length $r$ each (one along each edge), giving $|D\mathbb{1}_E|(B(p,r)) \approx 2r$. The vector measure $D\mathbb{1}_E(B(p,r))$ equals $-\int_{\partial E \cap B(p,r)} \nu_E \, d\mathcal{H}^1 \approx -(0,-1)r - (-1,0)r = r(1,1)$. The ratio is
\begin{align*}
\frac{D\mathbb{1}_E(B(p,r))}{|D\mathbb{1}_E|(B(p,r))} &\approx \frac{r(1,1)}{2r} = \frac{1}{2}(1,1),
\end{align*}
which has norm $1/\sqrt{2} \neq 1$. Since the ratio does not converge to a unit vector, $p = (0,0) \notin \partial^* E$.
**Identifying $\partial^* E$.** The boundary $\partial E$ consists of the four edges and four corners. The above analysis shows that every interior edge point lies in $\partial^* E$ (by the same half-plane argument), while every corner is excluded. Since there are only $4$ corners, $\partial E \setminus \partial^* E$ has $\mathcal{H}^1$-measure zero, and
\begin{align*}
P(E; \mathbb{R}^2) &= \mathcal{H}^1(\partial^* E) = \mathcal{H}^1(\partial E) = 4.
\end{align*}
This is the perimeter of the unit square in the elementary sense, confirming consistency with De Giorgi's theorem.
[/example]
The corner computation makes the rectifiability conclusion of the structure theorem tangible: rectifiable sets have approximate tangent planes at $\mathcal{H}^{n-1}$-almost every point, and a corner is precisely a point where no unique approximate tangent line exists. The reduced boundary captures exactly the rectifiable part of $\partial E$, discarding the lower-dimensional exceptional set. For polyhedra in $\mathbb{R}^n$, this means discarding faces of codimension $\geq 2$ (edges and vertices in $3$D).
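A quick numerical companion to the blow-up picture is the Lebesgue density $\mathcal{L}^2(E \cap B(p,r))/\mathcal{L}^2(B(p,r))$, which must tend to $1/2$ if the blow-up is a half-plane and tends to $1/4$ if it is a quarter-plane. The sketch below estimates this density on a midpoint grid; the function names and the grid resolution `m` are illustrative assumptions, and the grid count is only an approximation of the area ratio.

```python
def in_square(x: float, y: float) -> bool:
    """Indicator of the open unit square E = (0, 1)^2."""
    return 0.0 < x < 1.0 and 0.0 < y < 1.0


def density(p: tuple, r: float, m: int = 200) -> float:
    """Approximate L^2(E ∩ B(p, r)) / L^2(B(p, r)) on an m x m midpoint grid."""
    h = 2.0 * r / m
    in_ball = in_both = 0
    for i in range(m):
        x = p[0] - r + (i + 0.5) * h
        for j in range(m):
            y = p[1] - r + (j + 0.5) * h
            if (x - p[0]) ** 2 + (y - p[1]) ** 2 < r * r:
                in_ball += 1
                if in_square(x, y):
                    in_both += 1
    return in_both / in_ball


for radius in (0.2, 0.1, 0.01):
    print(f"r = {radius}: corner density = {density((0.0, 0.0), radius):.4f}, "
          f"edge density = {density((0.5, 0.0), radius):.4f}")
```

Because the square is exactly a quarter-disk near the corner and a half-disk near the edge point once $r < 1/2$, the estimates sit at $1/4$ and $1/2$ for every such radius, up to grid error.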
<!-- illustration-needed: side-by-side blow-up comparison — show the blow-up at an interior edge point converging to a half-plane (upper half-plane limit with clear tangent line), versus the blow-up at a corner converging to a quarter-plane (first quadrant, no tangent line) -->
The five examples up to this point have all studied perimeter and BV decomposition through the lens of $\mathcal{H}^{n-1}$: the reduced boundary, the perimeter measure, and the BV gradient measure are all defined or normalized via Hausdorff $(n-1)$-measure. The general principle running through them is that $\mathcal{H}^{n-1}$ is the natural measure for codimension-one geometric objects in $\mathbb{R}^n$, and that finitely many lower-dimensional exceptional points are negligible at this scale. The next example abandons this framework deliberately to ask a finer question: when two subsets of the line are both negligible for the measures used so far (both have $\mathcal{H}^1$-measure zero in $\mathbb{R}$), is there a finer invariant that can still tell them apart? The pivot to Sobolev capacity is necessary because such an invariant exists, it controls when sets can be removed without affecting Sobolev functions, and it is sensitive to fractal Hausdorff dimensions that the integer-dimensional Hausdorff measures used so far cannot detect.
## Capacity of the Cantor Set
Sobolev capacity is a fine invariant that determines which sets can be "seen" by Sobolev functions: a set $A$ has $W^{1,p}$-capacity zero if and only if every Sobolev function $u \in W^{1,p}$ has a quasicontinuous representative that can be extended across $A$ without affecting its Sobolev norm. Hausdorff dimension classifies sets by their metric size, but two sets of the same Hausdorff dimension may have very different capacities — one may be removable for Sobolev functions while the other is not. The standard Cantor set $C \subset [0,1]$ has Hausdorff dimension $\log 2 / \log 3 < 1$, and the question of for which $p$ the $W^{1,p}$-capacity of $C$ is positive is a natural test of this fine structure.
[example: Capacity of the Standard Cantor Set]
Let $C \subset [0,1]$ be the standard middle-thirds Cantor set, which has Hausdorff dimension $s = \log 2 / \log 3$. We determine for which values of $p \in (0,1)$ the Riesz $p$-capacity $\operatorname{Cap}_p(C) > 0$, where capacity is measured in $\mathbb{R}^1$ ($n = 1$).
**Setup: capacity and Hausdorff dimension.** For a compact set $K \subset \mathbb{R}^n$ and $0 < p < n$, the Riesz $p$-capacity of $K$ is defined by
\begin{align*}
\operatorname{Cap}_p(K) &= \sup \left\{ \mu(K)^2 : \mu \text{ Borel on } K,\ I_p(\mu) \leq 1 \right\},
\end{align*}
where $I_p(\mu) = \int \int |x - y|^{-(n-p)} \, d\mu(x) \, d\mu(y)$ is the Riesz energy of order $n - p$. The fundamental relationship between capacity and Hausdorff dimension in $\mathbb{R}^n$ consists of two one-sided implications:
\begin{align*}
\dim_{\mathcal{H}}(K) > n - p &\implies \operatorname{Cap}_p(K) > 0, \\
\mathcal{H}^{n-p}(K) < \infty &\implies \operatorname{Cap}_p(K) = 0.
\end{align*}
In particular, $\operatorname{Cap}_p(K) = 0$ whenever $\dim_{\mathcal{H}}(K) < n - p$. The borderline case $\dim_{\mathcal{H}}(K) = n - p$ is not decided by the dimension alone; it is settled by the second implication whenever $\mathcal{H}^{n-p}(K)$ is finite.
**Application to the Cantor set in $\mathbb{R}^1$ ($n = 1$).** In dimension $n = 1$, the relevant range is $0 < p < 1$ (for $p \geq 1 = n$ the kernel $|x-y|^{-(1-p)} = |x-y|^{p-1}$ is bounded on compact sets, so every nonzero finite measure on a compact set has finite energy and every nonempty compact set has positive capacity). The Cantor set $C \subset \mathbb{R}$ has Hausdorff dimension $s = \log 2 / \log 3 \approx 0.6309$, and $0 < \mathcal{H}^s(C) < \infty$. The first implication gives $\operatorname{Cap}_p(C) > 0$ when $s > 1 - p$, i.e., $p > 1 - s$. The second gives $\operatorname{Cap}_p(C) = 0$ when $1 - p \geq s$, i.e., $p \leq 1 - s$, because then $\mathcal{H}^{1-p}(C) < \infty$. Since $1 - s = (\log 3 - \log 2)/\log 3 = \log(3/2)/\log 3 \approx 0.369$, we obtain:
\begin{align*}
\operatorname{Cap}_p(C) > 0 &\iff p > 1 - \frac{\log 2}{\log 3} = \frac{\log(3/2)}{\log 3},\quad 0 < p < 1.
\end{align*}
**Verification via the Frostman lemma.** To confirm that $\operatorname{Cap}_p(C) > 0$ for $p > 1 - s$ (with $0 < p < 1$), it suffices to exhibit a nonzero Borel measure supported on $C$ with finite energy $I_p$. The Cantor measure $\mu_C$ constructed in the Cantor staircase example is a Frostman measure: there exists $M > 0$ such that $\mu_C(B(x, r)) \leq M r^s$ for all $x \in \mathbb{R}$ and all $r > 0$. By a standard energy estimate, this growth condition implies $\iint |x-y|^{-\alpha} \, d\mu_C(x) \, d\mu_C(y) < \infty$ for all $\alpha < s$. Taking $\alpha = 1 - p$, the energy $I_p(\mu_C)$ is finite as soon as $1 - p < s$, i.e., $p > 1 - s$. For such $p$, a suitable multiple of $\mu_C$ is admissible in the definition and witnesses $\operatorname{Cap}_p(C) > 0$.
**Conclusion.** The threshold exponent in the range $0 < p < 1$ is $p_0 = 1 - \log 2/\log 3 = \log(3/2)/\log 3$. For $p_0 < p < 1$, the Cantor set $C$ has positive $p$-capacity and is not removable in the relevant function-space sense: the fractal structure of $C$ is thick enough to support a measure of finite energy. For $0 < p \leq p_0$, the Cantor set has zero $p$-capacity: its Hausdorff dimension is too small relative to the energy exponent, and $C$ is negligible. For $p \geq 1$ (the range that includes the Sobolev $W^{1,p}$ setting in $\mathbb{R}^1$, where Sobolev functions are continuous), every nonempty compact set has positive capacity, so the Cantor set is never removable in the $W^{1,p}$ sense.
[/example]
The capacity threshold $p_0 = 1 - \dim_{\mathcal{H}}(C)$ has a geometric interpretation: as $p$ increases toward $1$, the energy kernel $|x-y|^{-(1-p)}$ becomes less singular, allowing more measures to have finite energy, and so more sets become capacity-positive. The crossover at $p = p_0$ is the point where the Cantor set's fractal dimension is exactly compensated by the energy exponent.
This example also illustrates why capacity is a strictly finer invariant than Hausdorff measure. Both the Cantor set $C$ (with $\dim_{\mathcal{H}} = s \approx 0.63$) and, say, a countable dense subset $D \subset [0,1]$ (with $\dim_{\mathcal{H}} = 0$) have $\mathcal{H}^1$-measure zero in $\mathbb{R}^1$. Yet their capacities behave very differently: $C$ has positive $p$-capacity for $p > p_0$, while $D$ has zero $p$-capacity for all $p < 1$ (since $\dim_{\mathcal{H}}(D) = 0 < 1 - p$ for all $p < 1$). Capacity detects the fractal dimension, not just the measure.
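The threshold can be made visible with a crude discretization. The sketch below is a heuristic illustration, not a proof: it replaces $\mu_C$ by equal point masses $2^{-k}$ at the left endpoints of the $2^k$ level-$k$ Cantor intervals and sums the kernel $|x-y|^{-\alpha}$ over distinct pairs. The names `cantor_left_endpoints` and `discrete_energy` are hypothetical. Self-similarity suggests that passing from level $k$ to level $k+1$ multiplies the dominant part of this sum by $3^\alpha/2$, which is below $1$ exactly when $\alpha < s$.

```python
import math

S = math.log(2) / math.log(3)   # Hausdorff dimension of the Cantor set
P0 = 1 - S                      # threshold exponent p_0


def cantor_left_endpoints(level: int) -> list:
    """Left endpoints of the 2^level intervals at this stage of the construction."""
    pts = [0.0]
    for _ in range(level):
        pts = [x / 3 for x in pts] + [x / 3 + 2 / 3 for x in pts]
    return pts


def discrete_energy(alpha: float, level: int) -> float:
    """Off-diagonal Riesz-type energy of the level-`level` point-mass approximation."""
    pts = cantor_left_endpoints(level)
    mass = 0.5 ** level
    total = 0.0
    for i, x in enumerate(pts):
        for y in pts[i + 1:]:
            # Each unordered pair {x, y} appears twice in the double integral.
            total += 2 * mass * mass * abs(x - y) ** (-alpha)
    return total


print(f"s = {S:.6f}, p0 = {P0:.6f}")
for alpha in (0.3, 0.9):        # alpha = 1 - p, i.e. p = 0.7 and p = 0.1
    energies = [discrete_energy(alpha, k) for k in range(4, 9)]
    ratios = [b / a for a, b in zip(energies, energies[1:])]
    print(f"alpha = {alpha}: level-to-level ratios =",
          [round(q, 3) for q in ratios])
```

For $\alpha = 0.3 < s$ (that is, $p = 0.7 > p_0$) the ratios settle near $1$, consistent with finite energy; for $\alpha = 0.9 > s$ (that is, $p = 0.1 < p_0$) they stay above $3^{0.9}/2 \approx 1.34$, so the discrete energies grow without bound.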
The six examples in this chapter trace a path through the core ideas of the course. The polygon and square examples ground the abstract perimeter and reduced boundary in explicit geometric computation; the unifying technique in both is the direct blow-up of $\mathbb{1}_E$, which should always be the first tool when deciding whether a boundary point lies in $\partial^* E$. The Cantor staircase reveals the necessity of the Cantor component in the BV decomposition, a phenomenon invisible to classical calculus, and shows concretely how a function can have zero a.e. derivative yet nonzero distributional derivative. The ball computation pins down the sharp constant in the isoperimetric inequality and confirms that the general theory recovers the expected answer on the simplest example. The annulus shows how the Gauss-Green theorem handles domains with holes through orientation conventions on each boundary component, with the rule: $\nu_E$ points in the direction of decreasing $\mathcal{L}^n$-density of $E$. Finally, the capacity example demonstrates that Sobolev capacity is sensitive to Hausdorff dimension in a precise quantitative way, going well beyond the crude Hausdorff measure. Together, these problems provide the computational calibration for the entire course.
## References
Evans, L.C. & Gariepy, R.F. *Measure Theory and Fine Properties of Functions* (Revised Edition). CRC Press, 2015.
Contents
- 1. Sobolev Spaces: Definitions and Basic Properties
- Weak Derivatives and the Need for Generalised Differentiation
- The Sobolev Space $W^{1,p}(\Omega)$
- Functional-Analytic Properties
- The Space $W^{1,p}_0(\Omega)$ and Zero Boundary Values
- Examples and Basic Identities
- Sobolev Spaces in One Dimension and Absolutely Continuous Functions
- 2. Approximation of Sobolev Functions
- Mollification and Weak Derivatives
- The Meyers-Serrin Theorem
- Approximation Up to the Boundary
- Product and Chain Rules for Weak Derivatives
- The Product Rule
- The Chain Rule
- $W^{1,\infty}$ and Lipschitz Functions
- 3. Traces
- Why Classical Restriction Fails for Sobolev Functions
- The Trace Theorem
- Construction via Flattening the Boundary
- Characterization of $W^{1,p}_0(\Omega)$
- Higher Regularity and the Fractional Sobolev Refinement
- 4. Extensions
- Reflection across Half-Spaces
- The Stein Extension Theorem
- Construction in Detail: Partition of Unity and Local Flattening
- Why the Lipschitz Condition is the Threshold
- 5. Sobolev Inequalities
- Dimensional Analysis and the Sobolev Exponent
- The Gagliardo–Nirenberg–Sobolev Inequality
- The Poincaré Inequality on Balls
- Morrey's Inequality
- The Critical Case $p = n$ and BMO
- 6. Compactness
- The Rellich-Kondrachov Compactness Theorem
- Weak Compactness in Sobolev Spaces
- The Direct Method in the Calculus of Variations
- 7. Capacity
- Definition of $p$-Capacity
- Subadditivity and Basic Properties
- Capacity vs. Hausdorff Measure
- Quasicontinuity and Precise Representatives
- 8. BV Functions: Definition and the Structure Theorem
- The BV Space: Motivation and Definition
- Sobolev Functions and Characteristic Functions as BV Examples
- The Lebesgue Decomposition of $Du$
- The Structure Theorem
- The Cantor Staircase: A Structural Example
- The Absolutely Continuous Part and the Approximate Gradient
- Decomposition Consequences and Counterexamples
- 9. Approximation and Compactness for BV
- Lower Semicontinuity of Total Variation
- Strict Convergence
- Smooth Approximation in BV
- Compactness in BV
- The Topology of BV and the Role of Strict Convergence
- Approximation of Sets of Finite Perimeter
- 10. Traces and Extensions for BV
- The Trace Theorem for BV
- Necessity of the Lipschitz Hypothesis
- Jump Contribution to Total Variation
- What the Formula Says for Sets of Finite Perimeter
- The BV Extension Theorem
- Compact Containment and the Role of the Extension
- Traces and the Coarea Formula
- 11. The Coarea Formula for BV Functions
- From the Lipschitz Coarea to the BV Coarea
- Statement and Proof of the Coarea Formula
- Comparison with the Lipschitz Coarea Formula
- Applications: Slicing BV Functions
- Compactness via the Coarea Formula
- The Isoperimetric Inequality for BV Functions
- The Coarea Formula for Bounded Borel Functions
- Measurability and the Structure of Level Set Families
- The Coarea Formula as a Global Reduction Principle
- 12. Sets of Finite Perimeter and the Reduced Boundary
- Finite Perimeter, the Perimeter Measure, and the Reduced Boundary
- The Isoperimetric Inequality
- Sobolev and Poincaré Inequalities for BV
- Blow-up at Reduced Boundary Points
- De Giorgi's Structure Theorem
- Consequences and the Gauss-Green Theorem
- Density Estimates and the Structure of the Full Boundary
- 13. The Gauss-Green Theorem
- The Divergence Theorem for Finite-Perimeter Sets
- Application to Weak Boundary Value Problems
- Relation to Currents and the Boundary Operator
- 14. Pointwise Properties of BV Functions
- Approximate One-Sided Limits
- The Jump Set
- Decomposition of the Derivative at the Jump Set
- The Fubini-Type Characterization via Lines
- Determination by Approximate Limits and Jumps
- The Precise Representative Outside the Jump Set
- 15. Examples and Worked Problems
- Perimeter of a Convex Polygon
- The Cantor Staircase: A Pure Cantor BV Function
- The Ball as the Isoperimetric Optimizer
- Gauss-Green on an Annulus
- Blow-Up at a Corner: Why the Square has Reduced Boundary
- Capacity of the Cantor Set
- References
Geometric Measure Theory III: BV Functions and Sets of Finite Perimeter
Created by admin on 5/3/2026 | Last updated on 5/3/2026