Approximate Gradient Identifies $D^a u$ (Theorem # 3130)
Theorem
Let $u \in BV(\Omega)$. Then the approximate gradient $\nabla u(x_0)$ exists for $\mathcal{L}^n$-a.e. $x_0 \in \Omega$, and
\begin{align*}
D^a u = \nabla u \cdot \mathcal{L}^n.
\end{align*}
Moreover, $\widetilde{u}(x_0) = u(x_0)$ for $\mathcal{L}^n$-a.e. $x_0$ (the approximate limit $\widetilde{u}$ agrees with $u$ itself almost everywhere).
Proof
[proofplan]
The argument proceeds by Lebesgue–Besicovitch differentiation of the vector measure $Du$. The Radon–Nikodym decomposition writes $Du = D^a u + D^s u$ with $D^a u = g \cdot \mathcal{L}^n$, where the density $g \in L^1_{\mathrm{loc}}(\Omega; \mathbb{R}^n)$ is recovered as a symmetric derivative of $Du$ at $\mathcal{L}^n$-a.e. point and the singular part has zero $\mathcal{L}^n$-density there. Translating these density statements via the rescaled BV functions $v_r(y) := (u(x_0 + ry) - c_r)/r$ on the unit ball, where $c_r$ is the average of $u$ on $B(x_0, r)$, the BV scaling yields $|Dv_r|(B(0, 1)) \to \omega_n |g(x_0)|$ together with weak* convergence of $Dv_r$ to $g(x_0)\, \mathcal{L}^n|_{B(0, 1)}$. The BV–compact embedding then gives $L^1$-convergence of $v_r$ to the affine function $y \mapsto g(x_0)\cdot y$, which unwinds into the approximate-differentiability statement at $x_0$ with approximate gradient $g(x_0)$. The agreement of $\widetilde{u}(x_0)$ with $u(x_0)$ at $\mathcal{L}^n$-a.e. $x_0$ is the Lebesgue differentiation theorem.
[/proofplan]
[step:Decompose $Du$ into absolutely continuous and singular parts and identify the density via differentiation]
By the [Radon–Nikodym theorem](/theorems/radon-nikodym) applied to the Radon vector measure $Du$ on $\Omega$ relative to $\mathcal{L}^n|_\Omega$, there is a unique decomposition
\begin{align*}
Du &= D^a u + D^s u, & D^a u &\ll \mathcal{L}^n, & D^s u &\perp \mathcal{L}^n.
\end{align*}
The Radon–Nikodym density $g \in L^1_{\mathrm{loc}}(\Omega; \mathbb{R}^n)$ is defined by $D^a u = g \cdot \mathcal{L}^n$.
By the [Lebesgue–Besicovitch differentiation theorem](/theorems/lebesgue-besicovitch) applied to the vector measure $Du$ with the doubling reference measure $\mathcal{L}^n$, the symmetric derivative
\begin{align*}
\lim_{r \to 0} \frac{Du(B(x_0, r))}{\mathcal{L}^n(B(x_0, r))} = g(x_0) \in \mathbb{R}^n
\end{align*}
exists at $\mathcal{L}^n$-a.e. $x_0 \in \Omega$. The singular part has vanishing $\mathcal{L}^n$-density:
\begin{align*}
\lim_{r \to 0} \frac{|D^s u|(B(x_0, r))}{\mathcal{L}^n(B(x_0, r))} = 0 \quad \text{for $\mathcal{L}^n$-a.e. } x_0 \in \Omega.
\end{align*}
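As an illustration (an example chosen here for concreteness, not taken from the proof), take $n = 1$, $\Omega = (-1, 1)$ and $u(x) := 2x + \mathbf{1}_{(0, 1)}(x)$, a linear function with a unit jump at the origin. Under these assumptions a direct computation suggests how the decomposition and the two density statements look:
\begin{align*}
Du &= 2\, \mathcal{L}^1 + \delta_0, & D^a u &= 2\, \mathcal{L}^1, & D^s u &= \delta_0, & g &\equiv 2,
\end{align*}
and, for $x_0 \ne 0$ and $0 < r < \min\{|x_0|, 1 - |x_0|\}$,
\begin{align*}
\frac{Du(B(x_0, r))}{\mathcal{L}^1(B(x_0, r))} = \frac{4r}{2r} = 2 = g(x_0), \qquad \frac{|D^s u|(B(x_0, r))}{\mathcal{L}^1(B(x_0, r))} = 0.
\end{align*}
At the single point $x_0 = 0$ the first quotient equals $(4r + 1)/(2r) \to \infty$, so the origin belongs to the exceptional $\mathcal{L}^1$-null set on which the density limits may fail.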
[guided]
We are given $u \in BV(\Omega)$, meaning the distributional derivative $Du$ is a finite Radon vector measure on $\Omega$ with values in $\mathbb{R}^n$. The total variation $|Du|$ is a non-negative Radon measure.
The hypotheses of the [Radon–Nikodym theorem](/theorems/radon-nikodym) for vector-valued measures: $Du$ is a Radon vector measure with finite total variation on $\Omega$, and $\mathcal{L}^n$ is a $\sigma$-finite Borel measure on $\Omega$. Both are met. The conclusion provides the unique decomposition
\begin{align*}
Du &= D^a u + D^s u, & D^a u &\ll \mathcal{L}^n, & D^s u &\perp \mathcal{L}^n,
\end{align*}
together with a density $g \in L^1_{\mathrm{loc}}(\Omega; \mathbb{R}^n)$ such that $D^a u = g \cdot \mathcal{L}^n$.
The hypotheses of the [Lebesgue–Besicovitch differentiation theorem](/theorems/lebesgue-besicovitch): the reference measure $\mathcal{L}^n$ is doubling on $\mathbb{R}^n$ (a stronger property than required), and $Du$ is a Radon vector measure. The conclusion gives, at $\mathcal{L}^n$-a.e. $x_0$,
\begin{align*}
\lim_{r \to 0} \frac{Du(B(x_0, r))}{\mathcal{L}^n(B(x_0, r))} = g(x_0).
\end{align*}
This is the symmetric-derivative recovery of the Radon–Nikodym density. Applied to the singular part separately, the symmetric derivative of $|D^s u|$ relative to $\mathcal{L}^n$ vanishes $\mathcal{L}^n$-a.e.: since $|D^s u| \perp \mathcal{L}^n$, the measure $|D^s u|$ is concentrated on an $\mathcal{L}^n$-null set, and the differentiation theorem gives zero density at $\mathcal{L}^n$-a.e. point.
The set of $x_0$ where both density limits hold has full $\mathcal{L}^n$-measure in $\Omega$; we restrict our attention to such $x_0$ throughout the rest of the proof. Our task is to show that the approximate gradient $\nabla u(x_0)$ — defined intrinsically by an approximate differentiability condition — coincides with the Radon–Nikodym density $g(x_0)$.
[/guided]
[/step]
[step:Define rescaled BV functions and compute the BV scaling identity]
Fix $x_0 \in \Omega$ at which the density limits in Step 1 exist; the set of such $x_0$ has full $\mathcal{L}^n$-measure. Choose $\rho > 0$ with $B(x_0, \rho) \subseteq \Omega$. For $r \in (0, \rho)$, set
\begin{align*}
c_r := \fint_{B(x_0, r)} u(x)\, d\mathcal{L}^n(x),
\end{align*}
and define
\begin{align*}
v_r: B(0, 1) &\to \mathbb{R} \\
y &\mapsto \frac{u(x_0 + r y) - c_r}{r}.
\end{align*}
Note $\fint_{B(0, 1)} v_r\, d\mathcal{L}^n = 0$ by the definition of $c_r$ and the change of variables.
For any test field $\varphi \in C_c^1(B(0, 1); \mathbb{R}^n)$, the change of variables $x = x_0 + ry$, $d\mathcal{L}^n(y) = r^{-n}\, d\mathcal{L}^n(x)$, with $\Phi(x) := \varphi((x - x_0)/r) \in C_c^1(B(x_0, r); \mathbb{R}^n)$ and the chain-rule identity $\operatorname{div}_y \varphi(y) = r\, \operatorname{div}_x \Phi(x)$, yields
\begin{align*}
\langle Dv_r, \varphi\rangle &= -\int_{B(0, 1)} v_r(y)\, \operatorname{div}_y \varphi(y)\, d\mathcal{L}^n(y) \\
&= -\frac{1}{r}\int_{B(x_0, r)} u(x)\, r\, \operatorname{div}_x \Phi(x)\, r^{-n}\, d\mathcal{L}^n(x) \\
&= r^{-n}\int_{B(x_0, r)} u(x)\, (-\operatorname{div}_x \Phi(x))\, d\mathcal{L}^n(x) \\
&= r^{-n}\, \langle Du, \Phi\rangle,
\end{align*}
where the integral of the constant $c_r/r$ against $\operatorname{div}_y \varphi$ vanishes by the divergence theorem on the unit ball with $\varphi$ compactly supported. As $\varphi$ ranges over $C_c^1(B(0, 1); \mathbb{R}^n)$ with $\|\varphi\|_\infty \le 1$, $\Phi$ ranges over $C_c^1(B(x_0, r); \mathbb{R}^n)$ with $\|\Phi\|_\infty \le 1$. Taking suprema:
\begin{align*}
|Dv_r|(B(0, 1)) = r^{-n}\, |Du|(B(x_0, r)).
\end{align*}
Since $\mathcal{L}^n(B(x_0, r)) = \omega_n r^n$, this rewrites as
\begin{align*}
|Dv_r|(B(0, 1)) = \omega_n \cdot \frac{|Du|(B(x_0, r))}{\mathcal{L}^n(B(x_0, r))}.
\end{align*}
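The right-hand side tends to $\omega_n |g(x_0)|$ as $r \to 0$ by Step 1: since $|Du|(B(x_0, r)) = \int_{B(x_0, r)} |g|\, d\mathcal{L}^n + |D^s u|(B(x_0, r))$, the singular term has vanishing density, and $\mathcal{L}^n$-a.e. $x_0$ is a Lebesgue point of $|g|$.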
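For a concrete check, consider the hypothetical one-dimensional example $n = 1$, $\Omega = (-1, 1)$, $u(x) := 2x + \mathbf{1}_{(0, 1)}(x)$ (an assumption of this illustration, with $g \equiv 2$ and $\omega_1 = 2$). For $x_0 \ne 0$ and $0 < r < \min\{|x_0|, 1 - |x_0|\}$ the function $u$ is affine on $B(x_0, r)$, so $c_r = u(x_0)$ and
\begin{align*}
v_r(y) = \frac{u(x_0 + ry) - c_r}{r} = 2y, \qquad |Dv_r|((-1, 1)) = 4 = r^{-1}\, |Du|(B(x_0, r)) = \omega_1 |g(x_0)|.
\end{align*}
At the jump point $x_0 = 0$ one finds instead $c_r = \tfrac{1}{2}$ and
\begin{align*}
v_r(y) = 2y + \frac{\mathbf{1}_{(0, 1)}(y) - \tfrac{1}{2}}{r}, \qquad |Dv_r|((-1, 1)) = 4 + \frac{1}{r} = r^{-1}\, |Du|(B(0, r)) \to \infty,
\end{align*}
which indicates why the rescaling is only controlled at points where the singular part has zero density.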
[guided]
The natural rescaling for $BV$ functions is dictated by the fact that the gradient should remain $O(1)$ as $r \to 0$. For a smooth test case $u(x) = g \cdot (x - x_0)$ with constant gradient $g$, the rescaled function $u(x_0 + ry) = g \cdot ry$ has gradient $rg$ in the $y$-coordinates; dividing by $r$ recovers a finite gradient. This motivates the centred-and-divided form $v_r(y) = (u(x_0 + ry) - c_r)/r$.
We choose $c_r$ as the average of $u$ on $B(x_0, r)$ so that $v_r$ has zero average on $B(0, 1)$ — this is the Poincaré-friendly normalisation that makes $L^1$-bounds available without an additive ambiguity.
Computing $|Dv_r|$. The distributional derivative pairs against test fields $\varphi \in C_c^1(B(0, 1); \mathbb{R}^n)$ via integration by parts. After change of variables $x = x_0 + ry$ (so $d\mathcal{L}^n(y) = r^{-n} d\mathcal{L}^n(x)$) and the chain-rule identity for the divergence ($\operatorname{div}_y \varphi(y) = r\, \operatorname{div}_x \Phi(x)$, where the factor of $r$ comes from differentiating the inverse change of variables), the additive constant $c_r/r$ contributes zero (integral of a divergence on the unit ball with $\varphi$ compactly supported), and the remaining integral becomes the pairing of $Du$ on $B(x_0, r)$ with the test field $\Phi(x) = \varphi((x - x_0)/r)$, scaled by $r^{-n}$:
\begin{align*}
\langle Dv_r, \varphi\rangle = r^{-n} \langle Du, \Phi\rangle.
\end{align*}
The supremum identity for total variation,
\begin{align*}
|Dv_r|(B(0, 1)) = \sup_{\varphi: \|\varphi\|_\infty \le 1} \langle Dv_r, \varphi\rangle,
\end{align*}
together with the bijection $\varphi \leftrightarrow \Phi$ preserving sup-norms (since $\Phi(x) = \varphi(y)$ takes the same values, just at rescaled points), gives
\begin{align*}
|Dv_r|(B(0, 1)) = r^{-n}\, |Du|(B(x_0, r)).
\end{align*}
Sanity check via the linear test case: for $u(x) = g \cdot (x - x_0)$, $|Du|$ has density $|g|$, so $|Du|(B(x_0, r)) = |g| \omega_n r^n$, giving $|Dv_r|(B(0, 1)) = |g|\omega_n$ — independent of $r$, matching the constant-gradient unit-ball total variation of $v(y) = g \cdot y$.
Multiplying and dividing by $\mathcal{L}^n(B(x_0, r)) = \omega_n r^n$:
\begin{align*}
|Dv_r|(B(0, 1)) = \omega_n \cdot \frac{|Du|(B(x_0, r))}{\mathcal{L}^n(B(x_0, r))}.
\end{align*}
By Step 1 (specifically the symmetric derivative of $|Du|$ relative to $\mathcal{L}^n$, which equals $|g(x_0)|$ at points where the singular density vanishes — i.e., at $\mathcal{L}^n$-a.e. $x_0$),
\begin{align*}
|Dv_r|(B(0, 1)) \to \omega_n |g(x_0)| \quad \text{as } r \to 0.
\end{align*}
The total variations remain uniformly bounded as $r \to 0$.
[/guided]
[/step]
[step:Extract a limit of $v_r$ via BV-compactness and identify it via weak* convergence of $Dv_r$]
By Poincaré's inequality on $B(0, 1)$ for zero-mean BV functions (a consequence of [Poincaré in Balls](/theorems/3103) extended to BV via approximation),
\begin{align*}
\|v_r\|_{L^1(B(0, 1))} \le C_n\, |Dv_r|(B(0, 1)) \le C_n \omega_n |g(x_0)| + o(1),
\end{align*}
so $\{v_r\}_{r \in (0, \rho)}$ is bounded in $BV(B(0, 1))$.
By the BV variant of [Rellich–Kondrachov](/theorems/64) (the unit ball $B(0, 1)$ is a bounded Lipschitz domain), bounded sequences in $BV(B(0, 1))$ are precompact in $L^1(B(0, 1))$. Hence every sequence $r_k \to 0$ has a subsequence along which $v_{r_k}$ converges in $L^1(B(0, 1))$ to a limit $v \in L^1(B(0, 1))$. The limit $v$ has finite total variation by lower-semicontinuity:
\begin{align*}
|Dv|(B(0, 1)) \le \liminf_{k \to \infty} |Dv_{r_k}|(B(0, 1)) = \omega_n |g(x_0)|.
\end{align*}
We identify $v$. For any open ball $B(z, \delta) \subseteq B(0, 1)$, the rescaled vector measure satisfies
\begin{align*}
Dv_r(B(z, \delta)) = r^{-n} Du(x_0 + r\, B(z, \delta)) = r^{-n} Du(B(x_0 + rz, r\delta)).
\end{align*}
Dividing by $\mathcal{L}^n(B(z, \delta)) = \omega_n \delta^n$:
\begin{align*}
\frac{Dv_r(B(z, \delta))}{\mathcal{L}^n(B(z, \delta))} = \frac{Du(B(x_0 + rz, r\delta))}{\mathcal{L}^n(B(x_0 + rz, r\delta))}.
\end{align*}
For $|z| < 1$, $x_0 + rz \to x_0$ as $r \to 0$ and $r \delta \to 0$. We must check that the family of balls $\{B(x_0 + rz, r\delta)\}_{r \to 0}$ shrinks nicely to $x_0$ in the sense of the [Lebesgue–Besicovitch differentiation theorem](/theorems/lebesgue-besicovitch): each ball $B(x_0 + rz, r\delta)$ is contained in the centred ball $B(x_0, r(|z| + \delta))$, and the ratio of measures is
\begin{align*}
\frac{\mathcal{L}^n(B(x_0, r(|z| + \delta)))}{\mathcal{L}^n(B(x_0 + rz, r\delta))} = \frac{(|z| + \delta)^n}{\delta^n} \le \left(\frac{1 + \delta}{\delta}\right)^n,
\end{align*}
a uniform bound (independent of $r$) since $|z| < 1$. Hence the family shrinks nicely to $x_0$, and the strong form of the Lebesgue–Besicovitch theorem applies: the right-hand side converges to $g(x_0)$ as $r \to 0$. Therefore
\begin{align*}
Dv_r(B(z, \delta)) \to g(x_0) \cdot \mathcal{L}^n(B(z, \delta)) \quad \text{as } r \to 0,
\end{align*}
for every open ball $B(z, \delta) \subseteq B(0, 1)$.
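To make the shrinks-nicely constant concrete, here is a small numerical instance (the values $n = 2$, $z = (\tfrac{1}{2}, 0)$, $\delta = \tfrac{1}{4}$ are chosen purely for illustration). The ball $B(z, \delta)$ lies in $B(0, 1)$ because $|z| + \delta = \tfrac{3}{4} < 1$, and for every $r > 0$
\begin{align*}
\frac{\mathcal{L}^2(B(x_0, \tfrac{3}{4} r))}{\mathcal{L}^2(B(x_0 + rz, \tfrac{1}{4} r))} = \left(\frac{3/4}{1/4}\right)^2 = 9 \le \left(\frac{1 + 1/4}{1/4}\right)^2 = 25.
\end{align*}
The ratio does not depend on $r$, which is the property the differentiation theorem needs for off-centre balls.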
We now extract a weak* limit and identify it. The space $\mathcal{M}(B(0, 1); \mathbb{R}^n)$ of finite $\mathbb{R}^n$-valued Radon measures on $B(0, 1)$ is the topological dual of $C_0(B(0, 1); \mathbb{R}^n)$. The uniform bound $|Dv_r|(B(0, 1)) \le \omega_n |g(x_0)| + 1$ for small $r$ places the family $\{Dv_r\}$ in a bounded subset of this dual space. By the Banach–Alaoglu theorem, the family is sequentially weak* precompact: for any $r_k \to 0$, there exist a subsequence $r_{k_j}$ and a finite vector measure $\mu$ on $B(0, 1)$ such that $Dv_{r_{k_j}} \overset{*}{\rightharpoonup} \mu$, i.e.,
\begin{align*}
\int_{B(0, 1)} \psi \, dDv_{r_{k_j}} \to \int_{B(0, 1)} \psi \, d\mu \quad \text{for every } \psi \in C_0(B(0, 1); \mathbb{R}^n).
\end{align*}
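To identify $\mu$, we show $\mu = g(x_0) \mathcal{L}^n|_{B(0, 1)}$. Fix any open ball $B(z, \delta) \subseteq B(0, 1)$, choose radii $\delta_m \uparrow \delta$, and approximate the indicator $\mathbf{1}_{B(z, \delta)}$ from inside by a non-decreasing sequence of compactly supported continuous functions with $\mathbf{1}_{B(z, \delta_m)} \le \eta_m \le \mathbf{1}_{B(z, \delta)}$, so that $\eta_m \uparrow \mathbf{1}_{B(z, \delta)}$ pointwise on $B(0, 1)$. Weak* convergence applied to $\eta_m e_i$ for each coordinate vector $e_i$ gives $\int \eta_m\, d\mu = \lim_j \int \eta_m\, dDv_{r_{k_j}}$, and since $\eta_m = 1$ on $B(z, \delta_m)$,
\begin{align*}
\left| \int \eta_m \, dDv_{r} - Dv_{r}(B(z, \delta)) \right| \le |Dv_{r}|\big(B(z, \delta) \setminus B(z, \delta_m)\big).
\end{align*}
The differentiation argument above applies verbatim to the non-negative measure $|Du|$, whose density at $\mathcal{L}^n$-a.e. $x_0$ is $|g(x_0)|$, so the right-hand side converges to $|g(x_0)|\, \mathcal{L}^n(B(z, \delta) \setminus B(z, \delta_m))$ as $r \to 0$, and this tends to $0$ as $m \to \infty$. Passing $m \to \infty$, using dominated convergence for $\int \eta_m\, d\mu$ (the measure $|\mu|$ is finite on $B(0, 1)$) and the ball-convergence shown above, yields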
\begin{align*}
\mu(B(z, \delta)) = \lim_j Dv_{r_{k_j}}(B(z, \delta)) = g(x_0) \mathcal{L}^n(B(z, \delta)).
\end{align*}
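Hence $\mu$ and $g(x_0) \mathcal{L}^n|_{B(0, 1)}$ agree on every open ball $B(z, \delta) \subseteq B(0, 1)$, and therefore, by continuity from above, on every closed ball contained in $B(0, 1)$. Let $\lambda$ be any coordinate of the difference $\mu - g(x_0) \mathcal{L}^n|_{B(0, 1)}$; it is a finite signed Radon measure vanishing on all such balls. By the [Lebesgue–Besicovitch differentiation theorem](/theorems/lebesgue-besicovitch) applied to $\lambda$ relative to its total variation $|\lambda|$, the density $d\lambda / d|\lambda|$, which has modulus $1$ at $|\lambda|$-a.e. point, equals $\lim_{r \to 0} \lambda(B(y, r)) / |\lambda|(B(y, r)) = 0$ at $|\lambda|$-a.e. $y$. This forces $|\lambda| = 0$, and hence $\mu = g(x_0) \mathcal{L}^n|_{B(0, 1)}$.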
Since the limit is the same for every weak* convergent subsequence, the full family converges:
\begin{align*}
Dv_r \overset{*}{\rightharpoonup} g(x_0)\, \mathcal{L}^n|_{B(0, 1)} \quad \text{as } r \to 0.
\end{align*}
The $L^1$-limit $v$ then has gradient measure $Dv = g(x_0)\, \mathcal{L}^n|_{B(0, 1)}$: for any $\varphi \in C_c^1(B(0, 1); \mathbb{R}^n)$,
\begin{align*}
\langle Dv, \varphi\rangle = -\int_{B(0, 1)} v\, \operatorname{div}\varphi\, d\mathcal{L}^n = -\lim_k \int_{B(0, 1)} v_{r_k} \operatorname{div}\varphi\, d\mathcal{L}^n = \lim_k \langle Dv_{r_k}, \varphi\rangle = \int_{B(0, 1)} g(x_0) \cdot \varphi\, d\mathcal{L}^n.
\end{align*}
Hence $v \in W^{1, \infty}(B(0, 1))$ with $\nabla v = g(x_0)$ $\mathcal{L}^n$-a.e., and the zero-mean normalisation forces $v(y) = g(x_0) \cdot y$ on $B(0, 1)$.
By uniqueness of the limit, the full family converges:
\begin{align*}
v_r \to v_*\quad \text{in } L^1(B(0, 1)) \text{ as } r \to 0, \quad v_*(y) := g(x_0) \cdot y.
\end{align*}
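As a consistency check (a short verification added for illustration, not part of the argument), the candidate limit $v_*(y) = g(x_0) \cdot y$ has the properties the limit must have:
\begin{align*}
\fint_{B(0, 1)} g(x_0) \cdot y\, d\mathcal{L}^n(y) = 0, \qquad Dv_* = g(x_0)\, \mathcal{L}^n|_{B(0, 1)}, \qquad |Dv_*|(B(0, 1)) = \omega_n |g(x_0)|.
\end{align*}
The first identity holds by symmetry of the ball, and the last shows that the lower-semicontinuity inequality for the total variation is attained with equality. In the hypothetical one-dimensional example $u(x) = 2x + \mathbf{1}_{(0, 1)}(x)$ on $(-1, 1)$, one has $v_r(y) = 2y$ for every $x_0 \ne 0$ and all sufficiently small $r$, so the convergence $v_r \to v_*$ with $v_*(y) = 2y$ can be seen directly.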
[guided]
We extract a limit of the rescaled functions and identify it as an affine function with gradient $g(x_0)$.
*BV-compactness.* The hypotheses of the BV Rellich–Kondrachov theorem on $B(0, 1)$: bounded Lipschitz domain (the unit ball is smooth, hence Lipschitz) and a bounded family in $BV$. The total variations $|Dv_r|(B(0, 1))$ tend to $\omega_n|g(x_0)|$, so they are uniformly bounded for small $r$. The $L^1$ norms are bounded by Poincaré–Wirtinger on the unit ball (with zero mean): $\|v_r\|_{L^1} \le C_n |Dv_r|(B(0, 1)) \le C_n \omega_n |g(x_0)| + o(1)$. So $\{v_r\}$ is uniformly $BV$-bounded for small $r$.
The BV-Rellich theorem on $B(0, 1)$ states: bounded sequences in $BV(B(0, 1))$ are relatively compact in $L^1(B(0, 1))$. Applied to our family, every sequence $r_k \to 0$ has a subsequence $r_{k_j}$ with $v_{r_{k_j}} \to v$ in $L^1(B(0, 1))$ for some $v \in L^1(B(0, 1))$.
*Identifying the limit via weak* convergence of gradients.* The vector measure $Dv_r$ should converge weakly* to a constant-density measure $g(x_0)\, \mathcal{L}^n$. To verify, evaluate $Dv_r$ on a test ball $B(z, \delta) \subseteq B(0, 1)$:
\begin{align*}
Dv_r(B(z, \delta)) = r^{-n} (Du)(x_0 + r B(z, \delta)) = r^{-n}\, Du(B(x_0 + rz, r\delta)),
\end{align*}
using the rescaling identity from Step 2 promoted from compactly supported test fields to Borel sets via Radon-measure regularity (open-set agreement plus inner regularity). The right-hand side is $r^{-n}$ times the $Du$-measure of a small ball centred at $x_0 + rz$, whose centre moves toward $x_0$ as $r \to 0$. Dividing by $\mathcal{L}^n(B(z, \delta)) = \omega_n \delta^n$ on both sides:
\begin{align*}
\frac{Dv_r(B(z, \delta))}{\mathcal{L}^n(B(z, \delta))} = \frac{Du(B(x_0 + rz, r\delta))}{\mathcal{L}^n(B(x_0 + rz, r\delta))}.
\end{align*}
The right-hand side is the symmetric-derivative quotient for $Du$ taken along the family $\{B(x_0 + rz, r\delta)\}_{r \to 0}$. To apply the [Lebesgue–Besicovitch differentiation theorem](/theorems/lebesgue-besicovitch) to this off-centre family, we verify the *shrinks-nicely* hypothesis: each ball $B(x_0 + rz, r\delta)$ sits inside the centred ball $B(x_0, r(|z| + \delta))$, with measure ratio
\begin{align*}
\frac{\mathcal{L}^n(B(x_0, r(|z| + \delta)))}{\mathcal{L}^n(B(x_0 + rz, r\delta))} = \left(\frac{|z| + \delta}{\delta}\right)^n \le \left(\frac{1 + \delta}{\delta}\right)^n,
\end{align*}
which is a constant independent of $r$. Hence the family shrinks nicely to $x_0$, the radii $r(|z| + \delta) \to 0$, and the strong form of the differentiation theorem yields
\begin{align*}
\frac{Du(B(x_0 + rz, r\delta))}{\mathcal{L}^n(B(x_0 + rz, r\delta))} \to g(x_0) \quad \text{as } r \to 0.
\end{align*}
This gives $Dv_r(B(z, \delta)) \to g(x_0) \mathcal{L}^n(B(z, \delta))$ for each fixed open ball $B(z, \delta) \subseteq B(0, 1)$. Now we promote ball-convergence to weak* convergence.
By the Banach–Alaoglu theorem, the bounded family $\{Dv_r\}$ in the dual space $C_0(B(0, 1); \mathbb{R}^n)^* = \mathcal{M}(B(0, 1); \mathbb{R}^n)$ is sequentially weak* precompact: every sequence $r_k \to 0$ admits a subsequence $r_{k_j}$ along which $Dv_{r_{k_j}} \overset{*}{\rightharpoonup} \mu$ for some finite vector measure $\mu$.
To identify $\mu$ with $g(x_0) \mathcal{L}^n|_{B(0, 1)}$, approximate the indicator of any open ball $B(z, \delta) \subseteq B(0, 1)$ from below by $C_c$-functions $\eta_m \uparrow \mathbf{1}_{B(z, \delta)}$ and pass to the limit using weak* convergence on each $\eta_m$. The error $\big|\int \eta_m\, dDv_r - Dv_r(B(z, \delta))\big|$ is controlled by the $|Dv_r|$-mass of the thin shell where $\eta_m < 1$, which is small for small $r$ and large $m$ by the same density argument applied to $|Du|$. This recovers $\mu(B(z, \delta)) = g(x_0) \mathcal{L}^n(B(z, \delta))$ for every such ball. A finite signed Radon measure on $B(0, 1)$ that vanishes on every ball is zero: differentiating it with respect to its own total variation via the Lebesgue–Besicovitch theorem, the density has modulus $1$ yet equals a limit of vanishing ball quotients. Applying this to each coordinate of $\mu - g(x_0) \mathcal{L}^n|_{B(0, 1)}$ gives $\mu = g(x_0) \mathcal{L}^n|_{B(0, 1)}$. As every weak* convergent subsequence has the same limit, the full family converges: $Dv_r \overset{*}{\rightharpoonup} g(x_0)\, \mathcal{L}^n|_{B(0, 1)}$.
*The $L^1$-limit has gradient $g(x_0)$.* The $L^1$-limit $v$ satisfies, for any $\varphi \in C_c^1(B(0, 1); \mathbb{R}^n)$,
\begin{align*}
\langle Dv, \varphi\rangle = -\int v\, \operatorname{div}\varphi\, d\mathcal{L}^n = -\lim_{j \to \infty} \int v_{r_{k_j}} \operatorname{div}\varphi\, d\mathcal{L}^n = \lim_{j \to \infty} \langle Dv_{r_{k_j}}, \varphi\rangle = \int g(x_0) \cdot \varphi\, d\mathcal{L}^n,
\end{align*}
where the first limit uses the $L^1$-convergence of $v_{r_{k_j}}$ to $v$ and the boundedness of $\operatorname{div}\varphi$, and the last uses weak* convergence of the gradient measures applied to $\varphi$.
This identifies $Dv = g(x_0)\, \mathcal{L}^n|_{B(0, 1)}$ in the distributional sense, i.e., $\nabla v = g(x_0)$ a.e. So $v$ is affine: $v(y) = g(x_0) \cdot y + b$ for a constant $b$. Since $\int_{B(0, 1)} y\, d\mathcal{L}^n(y) = 0$ by symmetry, the zero-mean condition $\int_{B(0, 1)} v\, d\mathcal{L}^n = 0$ (inherited from the $v_r$ under $L^1$-convergence) reduces to $b\, \omega_n = 0$, so $b = 0$ and $v(y) = g(x_0) \cdot y$.
*Full-family convergence.* Since the limit $v_*(y) = g(x_0) \cdot y$ is unique and every subsequence has a sub-subsequence converging to $v_*$, the full family $\{v_r\}_{r > 0}$ converges to $v_*$ in $L^1(B(0, 1))$ as $r \to 0$.
[/guided]
[/step]
[step:Translate $L^1$-convergence to approximate differentiability of $u$ at $x_0$]
The convergence $v_r \to v_*$ in $L^1(B(0, 1))$ unwraps via the change of variables $x = x_0 + ry$:
\begin{align*}
\int_{B(0, 1)} |v_r(y) - g(x_0) \cdot y|\, d\mathcal{L}^n(y) = r^{-n} \int_{B(x_0, r)} \frac{|u(x) - c_r - g(x_0) \cdot (x - x_0)|}{r}\, d\mathcal{L}^n(x).
\end{align*}
Dividing both sides by $\omega_n$ and noting $\omega_n r^n = \mathcal{L}^n(B(x_0, r))$:
\begin{align*}
\frac{1}{r}\fint_{B(x_0, r)} |u(x) - c_r - g(x_0) \cdot (x - x_0)|\, d\mathcal{L}^n(x) = \omega_n^{-1} \int_{B(0, 1)} |v_r(y) - g(x_0) \cdot y|\, d\mathcal{L}^n(y) \to 0
\end{align*}
as $r \to 0$.
By the triangle inequality,
\begin{align*}
&\frac{1}{r}\fint_{B(x_0, r)} |u(x) - u(x_0) - g(x_0)\cdot(x - x_0)|\, d\mathcal{L}^n(x) \\
&\quad \le \frac{1}{r}\fint_{B(x_0, r)} |u(x) - c_r - g(x_0)\cdot(x - x_0)|\, d\mathcal{L}^n(x) + \frac{|c_r - u(x_0)|}{r}.
\end{align*}
The first term tends to $0$ as just shown. The crux is to show that the second term vanishes:
\begin{align*}
\frac{|c_r - u(x_0)|}{r} \to 0 \quad \text{as } r \to 0.
\end{align*}
We establish this rate-$r$ statement at $\mathcal{L}^n$-a.e. $x_0$ by combining two facts:
**(a) The Lebesgue-point property at $x_0$**, supplied by the [Lebesgue–Besicovitch differentiation theorem](/theorems/lebesgue-besicovitch) applied to $u \in L^1_{\mathrm{loc}}(\Omega)$: at $\mathcal{L}^n$-a.e. $x_0 \in \Omega$,
\begin{align*}
\fint_{B(x_0, r)} |u(x) - u(x_0)|\, d\mathcal{L}^n(x) \to 0 \quad \text{as } r \to 0.
\end{align*}
By Jensen's inequality, $|c_r - u(x_0)| \le \fint_{B(x_0, r)} |u(x) - u(x_0)|\, d\mathcal{L}^n(x)$, so $|c_r - u(x_0)| = o(1)$ at every Lebesgue point. On its own this gives no rate.
**(b) The BV Poincaré inequality**, supplied by [Poincaré in Balls](/theorems/3103) extended to BV: for every $w \in BV(B(x_0, \rho))$ and every $s \in (0, \rho)$,
\begin{align*}
\fint_{B(x_0, s)} |w(x) - w_s|\, d\mathcal{L}^n(x) \le C_n \cdot s \cdot \frac{|Dw|(B(x_0, s))}{\mathcal{L}^n(B(x_0, s))}, \qquad w_s := \fint_{B(x_0, s)} w\, d\mathcal{L}^n.
\end{align*}
Applied to $u$ itself, (b) only yields a bound of order $r$, because $|Du|(B(x_0, r))/\mathcal{L}^n(B(x_0, r))$ tends to $|g(x_0)|$, which need not vanish. The gain comes from subtracting the linear part. Set $w(x) := u(x) - g(x_0) \cdot (x - x_0)$ on $B(x_0, \rho)$. Then $Dw = (g - g(x_0))\, \mathcal{L}^n + D^s u$, so
\begin{align*}
\frac{|Dw|(B(x_0, s))}{\mathcal{L}^n(B(x_0, s))} \le \fint_{B(x_0, s)} |g - g(x_0)|\, d\mathcal{L}^n + \frac{|D^s u|(B(x_0, s))}{\mathcal{L}^n(B(x_0, s))}.
\end{align*}
Discarding a further $\mathcal{L}^n$-null set, we may assume that $x_0$ is a Lebesgue point of $g$; together with the vanishing singular density from Step 1, this gives $\varepsilon(r) := \sup_{0 < s \le r} |Dw|(B(x_0, s))/\mathcal{L}^n(B(x_0, s)) \to 0$ as $r \to 0$. Moreover $w_s = c_s$, since $\fint_{B(x_0, s)} (x - x_0)\, d\mathcal{L}^n(x) = 0$ by symmetry.
Combining (a) and (b) on dyadic scales. For $0 < s \le r$, (b) applied to $w$ gives
\begin{align*}
|c_{s/2} - c_s| \le \fint_{B(x_0, s/2)} |w - c_s|\, d\mathcal{L}^n \le 2^n \fint_{B(x_0, s)} |w - c_s|\, d\mathcal{L}^n \le 2^n C_n\, s\, \varepsilon(r).
\end{align*}
Since $c_{2^{-k} r} \to u(x_0)$ as $k \to \infty$ by (a), summing over the scales $s = 2^{-k} r$, $k \ge 0$, yields
\begin{align*}
\frac{|c_r - u(x_0)|}{r} \le \frac{1}{r}\sum_{k = 0}^{\infty} |c_{2^{-k} r} - c_{2^{-k-1} r}| \le 2^n C_n\, \varepsilon(r) \sum_{k = 0}^{\infty} 2^{-k} = 2^{n+1} C_n\, \varepsilon(r) \to 0 \quad \text{as } r \to 0.
\end{align*}
This is the absorption needed. Returning to the triangle-inequality decomposition,
\begin{align*}
\frac{1}{r}\fint_{B(x_0, r)} |u(x) - u(x_0) - g(x_0) \cdot (x - x_0)|\, d\mathcal{L}^n(x) \to 0 \quad \text{as } r \to 0.
\end{align*}
This is the defining property of approximate differentiability of $u$ at $x_0$ with approximate gradient $\nabla u(x_0) = g(x_0)$.
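To see the definition at work, consider the hypothetical one-dimensional example $\Omega = (-1, 1)$, $u(x) := 2x + \mathbf{1}_{(0, 1)}(x)$ with $g \equiv 2$ (an assumption made only for this illustration). For $x_0 \ne 0$ and $0 < r < \min\{|x_0|, 1 - |x_0|\}$ the integrand vanishes identically on $B(x_0, r)$:
\begin{align*}
u(x) - u(x_0) - 2(x - x_0) = 0 \quad \text{for } x \in B(x_0, r), \qquad \text{so} \qquad \frac{1}{r}\fint_{B(x_0, r)} |u(x) - u(x_0) - 2(x - x_0)|\, d\mathcal{L}^1(x) = 0.
\end{align*}
At the jump point $x_0 = 0$, no choice of value $\alpha \in \mathbb{R}$ in place of $u(0)$ rescues the condition, since for $0 < r < 1$
\begin{align*}
\frac{1}{r}\fint_{B(0, r)} |u(x) - \alpha - 2x|\, d\mathcal{L}^1(x) = \frac{|\alpha| + |1 - \alpha|}{2r} \ge \frac{1}{2r} \to \infty.
\end{align*}
This is consistent with the theorem, because the origin is an $\mathcal{L}^1$-null set.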
[guided]
We unwrap the $L^1$-convergence of the rescaled functions into a statement about $u$ near $x_0$.
*Change of variables in the integral.* The substitution $x = x_0 + ry$, $d\mathcal{L}^n(y) = r^{-n}\, d\mathcal{L}^n(x)$, transforms the unit-ball $L^1$-norm of $v_r - v_*$ into an $L^1$-norm on $B(x_0, r)$:
\begin{align*}
\int_{B(0, 1)} \big|v_r(y) - g(x_0) \cdot y\big|\, d\mathcal{L}^n(y) &= \int_{B(0, 1)}\Big|\frac{u(x_0 + ry) - c_r}{r} - g(x_0) \cdot y\Big|\, d\mathcal{L}^n(y) \\
&= r^{-n}\int_{B(x_0, r)} \Big|\frac{u(x) - c_r}{r} - g(x_0) \cdot \frac{x - x_0}{r}\Big|\, d\mathcal{L}^n(x) \\
&= r^{-n - 1}\int_{B(x_0, r)} |u(x) - c_r - g(x_0) \cdot (x - x_0)|\, d\mathcal{L}^n(x).
\end{align*}
Dividing by $\omega_n$ and recognising $\omega_n r^n = \mathcal{L}^n(B(x_0, r))$:
\begin{align*}
\frac{1}{r}\fint_{B(x_0, r)} |u(x) - c_r - g(x_0) \cdot (x - x_0)|\, d\mathcal{L}^n(x) = \omega_n^{-1}\int_{B(0, 1)} |v_r(y) - g(x_0) \cdot y|\, d\mathcal{L}^n(y) \to 0
\end{align*}
as $r \to 0$, by Step 3.
*Replacing $c_r$ with $u(x_0)$.* The triangle inequality gives
\begin{align*}
\frac{1}{r}\fint |u - u(x_0) - g(x_0)\cdot(\cdot - x_0)|\, d\mathcal{L}^n \le \frac{1}{r}\fint |u - c_r - g(x_0)\cdot(\cdot - x_0)|\, d\mathcal{L}^n + \frac{|c_r - u(x_0)|}{r}.
\end{align*}
The first term tends to $0$ as just shown. The crux is to verify that the second term vanishes:
\begin{align*}
\frac{|c_r - u(x_0)|}{r} \to 0 \quad \text{as } r \to 0.
\end{align*}
This rate-$r$ absorption is established by combining the Lebesgue-point property at $x_0$ with the BV Poincaré inequality on dyadic scales, applied after subtracting the linear part of $u$.
*The Lebesgue-point step.* The [Lebesgue–Besicovitch differentiation theorem](/theorems/lebesgue-besicovitch), applied to the $L^1_{\mathrm{loc}}$ function $u$ (since $u \in BV(\Omega) \subseteq L^1(\Omega) \subseteq L^1_{\mathrm{loc}}(\Omega)$), yields at $\mathcal{L}^n$-a.e. $x_0$,
\begin{align*}
\fint_{B(x_0, r)} |u(x) - u(x_0)|\, d\mathcal{L}^n(x) \to 0,
\end{align*}
and Jensen's inequality bounds $|c_r - u(x_0)| \le \fint_{B(x_0, r)}|u - u(x_0)|\, d\mathcal{L}^n$. This gives only $|c_r - u(x_0)| = o(1)$, *not* $o(r)$.
*The BV Poincaré step.* The rate-$r$ improvement comes from the [Poincaré in Balls](/theorems/3103) inequality, extended to BV. Applied directly to $u$ at the scale $r$ it reads
\begin{align*}
\fint_{B(x_0, r)} |u(x) - c_r|\, d\mathcal{L}^n(x) \le C_n \cdot r \cdot \frac{|Du|(B(x_0, r))}{\mathcal{L}^n(B(x_0, r))},
\end{align*}
which is only $O(r)$, because the quotient on the right tends to $|g(x_0)|$. To gain the missing factor, apply the same inequality to $w(x) := u(x) - g(x_0) \cdot (x - x_0)$, whose average over $B(x_0, s)$ is still $c_s$ (the linear part has zero mean on centred balls) and whose derivative is $Dw = (g - g(x_0))\, \mathcal{L}^n + D^s u$. At $\mathcal{L}^n$-a.e. $x_0$ (Lebesgue points of $g$ at which the singular density vanishes), $\varepsilon(r) := \sup_{0 < s \le r} |Dw|(B(x_0, s))/\mathcal{L}^n(B(x_0, s)) \to 0$, and for $0 < s \le r$,
\begin{align*}
|c_{s/2} - c_s| \le \fint_{B(x_0, s/2)} |w - c_s|\, d\mathcal{L}^n \le 2^n \fint_{B(x_0, s)} |w - c_s|\, d\mathcal{L}^n \le 2^n C_n\, s\, \varepsilon(r).
\end{align*}
*Dyadic summation.* Since $c_{2^{-k} r} \to u(x_0)$ as $k \to \infty$ by the Lebesgue-point step, summing over the scales $s = 2^{-k} r$, $k \ge 0$, gives
\begin{align*}
\frac{|c_r - u(x_0)|}{r} \le \frac{1}{r}\sum_{k = 0}^{\infty} |c_{2^{-k} r} - c_{2^{-k-1} r}| \le 2^{n+1} C_n\, \varepsilon(r) \to 0,
\end{align*}
which is the rate-$r$ absorption.
*Conclusion.* At $\mathcal{L}^n$-a.e. $x_0$, the integral
\begin{align*}
\fint_{B(x_0, r)}\frac{|u(x) - u(x_0) - g(x_0)\cdot(x - x_0)|}{r}\, d\mathcal{L}^n(x) \to 0 \quad \text{as } r \to 0.
\end{align*}
This is the definition of $u$ being approximately differentiable at $x_0$ with approximate gradient $\nabla u(x_0) = g(x_0)$. Since $g$ is the Radon–Nikodym density of $D^a u$ relative to $\mathcal{L}^n$, the identity $D^a u = \nabla u \cdot \mathcal{L}^n$ follows.
[/guided]
[/step]
[step:Identify $\widetilde{u}(x_0) = u(x_0)$ at $\mathcal{L}^n$-a.e. $x_0$]
The approximate limit $\widetilde{u}(x_0)$ is, by definition, the unique value $\alpha \in \mathbb{R}$ (when it exists) such that
\begin{align*}
\lim_{r \to 0}\fint_{B(x_0, r)} |u(x) - \alpha|\, d\mathcal{L}^n(x) = 0.
\end{align*}
Apply the [Lebesgue–Besicovitch differentiation theorem](/theorems/lebesgue-besicovitch) to $u \in L^1_{\mathrm{loc}}(\Omega)$ (which holds since $u \in BV(\Omega) \subseteq L^1(\Omega)$). At $\mathcal{L}^n$-a.e. $x_0 \in \Omega$ (the Lebesgue points of $u$),
\begin{align*}
\lim_{r \to 0}\fint_{B(x_0, r)} |u(x) - u(x_0)|\, d\mathcal{L}^n(x) = 0.
\end{align*}
At such $x_0$, the value $\alpha = u(x_0)$ realises the approximate-limit condition, so $\widetilde{u}(x_0) = u(x_0)$. Since the set of Lebesgue points has full $\mathcal{L}^n$-measure in $\Omega$, $\widetilde{u}(x_0) = u(x_0)$ for $\mathcal{L}^n$-a.e. $x_0 \in \Omega$.
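As a concrete instance (using the hypothetical example $\Omega = (-1, 1)$, $u(x) := 2x + \mathbf{1}_{(0, 1)}(x)$, which is an assumption of this illustration), every $x_0 \ne 0$ is a Lebesgue point: for $0 < r < \min\{|x_0|, 1 - |x_0|\}$,
\begin{align*}
\fint_{B(x_0, r)} |u(x) - u(x_0)|\, d\mathcal{L}^1(x) = \frac{1}{2r}\int_{-r}^{r} 2|t|\, dt = r \to 0.
\end{align*}
At $x_0 = 0$, by contrast, no value $\alpha$ can serve as an approximate limit in this sense, because
\begin{align*}
\lim_{r \to 0}\fint_{B(0, r)} |u(x) - \alpha|\, d\mathcal{L}^1(x) = \frac{|\alpha| + |1 - \alpha|}{2} \ge \frac{1}{2}.
\end{align*}
The exceptional set is the single point $\{0\}$, which is $\mathcal{L}^1$-null, in line with the almost-everywhere statement.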
This completes the proof of all three claims: the existence of $\nabla u$ a.e., the identity $D^a u = \nabla u \cdot \mathcal{L}^n$, and $\widetilde{u} = u$ a.e.
[guided]
The final assertion is the $\mathcal{L}^n$-a.e. identification of the approximate limit $\widetilde{u}$ with $u$ itself.
The defining property of the approximate limit at $x_0$: $\widetilde{u}(x_0) = \alpha$ when
\begin{align*}
\lim_{r \to 0}\fint_{B(x_0, r)}|u(x) - \alpha|\, d\mathcal{L}^n(x) = 0,
\end{align*}
and $\alpha$ is the unique such value when it exists.
The hypotheses of the [Lebesgue–Besicovitch differentiation theorem](/theorems/lebesgue-besicovitch): $u \in L^1_{\mathrm{loc}}(\Omega)$. Verification: $u \in BV(\Omega)$ implies $u \in L^1(\Omega) \subseteq L^1_{\mathrm{loc}}(\Omega)$ (the $L^1$-component of the $BV$-norm guarantees integrability on $\Omega$, and locally so on any compact subset). The conclusion of the differentiation theorem is the strong form
\begin{align*}
\lim_{r \to 0}\fint_{B(x_0, r)} |u(x) - u(x_0)|\, d\mathcal{L}^n(x) = 0 \quad \text{at $\mathcal{L}^n$-a.e. } x_0.
\end{align*}
At such $x_0$ — the *Lebesgue points* of $u$ — the value $\alpha = u(x_0)$ realises the approximate-limit condition. By uniqueness of the approximate limit (when it exists), $\widetilde{u}(x_0) = u(x_0)$.
Since the Lebesgue points form a set of full $\mathcal{L}^n$-measure (a standard consequence of the [Lebesgue–Besicovitch differentiation theorem](/theorems/lebesgue-besicovitch), holding for any $L^1_{\mathrm{loc}}$ function), the identity $\widetilde{u}(x_0) = u(x_0)$ holds at $\mathcal{L}^n$-a.e. $x_0 \in \Omega$.
This completes the proof of all three claims of the theorem: (i) the approximate gradient $\nabla u(x_0)$ exists for $\mathcal{L}^n$-a.e. $x_0$ (Step 4), (ii) the identity $D^a u = \nabla u \cdot \mathcal{L}^n$ holds (Step 4 via the Radon–Nikodym density identification), and (iii) $\widetilde{u}(x_0) = u(x_0)$ at $\mathcal{L}^n$-a.e. $x_0$ (this step).
[/guided]
[/step]