Rademacher's Theorem — Statement & Proof

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] The strategy is to first reduce to scalar globally Lipschitz functions, then build the linear approximant from directional derivatives. Differentiability is component-wise, so $m = 1$ suffices, and using [McShane's Extension Theorem](/theorems/3067) we may assume $f$ is globally Lipschitz on $\mathbb{R}^n$. Three layers: (i) for each fixed direction $v \in S^{n-1}$, the directional derivative $D_v f$ exists $\mathcal{L}^n$-a.e. on $\mathbb{R}^n$ via Tonelli and the absolute continuity of one-variable Lipschitz functions; (ii) on a full-measure set, the directional derivative in *every* direction in a countable dense family equals $\nabla f \cdot v$, where $\nabla f \in L^\infty(\mathbb{R}^n; \mathbb{R}^n)$ is the weak gradient (existing because $f$ is Lipschitz); (iii) at Lebesgue points of $\nabla f$, the average $\fint_{B(x_0, r)} \nabla f\, d\mathcal{L}^n$ converges to $\nabla f(x_0)$, which both pins down the genuine pointwise representative of $\nabla f$ and yields the bound $|\nabla f(x_0)| \le L$ pointwise (rather than essentially). Combined with directional consistency at $x_0$ and a finite $\varepsilon/(8L)$-net of $S^{n-1}$, the global Lipschitz constant $L$ then upgrades the directional limits to a uniform Fréchet limit. [/proofplan] [step:Reduce to globally Lipschitz scalar functions] If $f = (f^1, \dots, f^m): \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $x$, then so is each component $f^i: \mathbb{R}^n \to \mathbb{R}$, and conversely: if every $f^i$ is differentiable at $x$ with derivative $\ell_i \in (\mathbb{R}^n)^*$, then $f$ is differentiable at $x$ with $Df_x = (\ell_1, \dots, \ell_m)$. Hence it suffices to prove the theorem for each scalar component, i.e., we may assume $m = 1$. Now $f: \mathbb{R}^n \to \mathbb{R}$ is locally Lipschitz, but differentiability is a local property: $f$ is differentiable at $x$ iff $f|_U$ is differentiable at $x$ for any open neighbourhood $U$ of $x$. Cover $\mathbb{R}^n$ by countably many bounded *open* balls $\{B_k\}_{k=1}^\infty$ on each of which $f$ is Lipschitz with constant $L_k$. Apply [McShane's Extension Theorem](/theorems/3067) to $f|_{B_k}$ to obtain a globally $L_k$-Lipschitz extension $f_k: \mathbb{R}^n \to \mathbb{R}$ of $f|_{B_k}$. For any $x \in B_k$, the open ball $B_k$ is a neighbourhood of $x$ on which $f = f_k$, so the differentiability of $f$ at $x$ is equivalent to the differentiability of $f_k$ at $x$. Hence \begin{align*} \{x \in B_k : f \text{ is not differentiable at } x\} \subseteq \{x \in \mathbb{R}^n : f_k \text{ is not differentiable at } x\}. \end{align*} A countable union of $\mathcal{L}^n$-null sets is null, so it suffices to prove the theorem for globally $L$-Lipschitz $f: \mathbb{R}^n \to \mathbb{R}$. [/step] [step:Establish $\mathcal{L}^n$-a.e. existence of the directional derivative in a fixed direction $v$] Fix $v \in S^{n-1}$. Define \begin{align*} D_v f: \mathbb{R}^n &\to \mathbb{R} \\ x &\mapsto \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}, \end{align*} whenever the (two-sided) limit exists, and $D_v f(x) := 0$ otherwise. Choose any orthonormal basis $\{v, u_2, \dots, u_n\}$ of $\mathbb{R}^n$. Each $x \in \mathbb{R}^n$ decomposes uniquely as $x = sv + y$ with $s \in \mathbb{R}$ and $y \in v^\perp \cong \mathbb{R}^{n-1}$. For each fixed $y \in v^\perp$, define \begin{align*} g_y: \mathbb{R} &\to \mathbb{R} \\ s &\mapsto f(sv + y). \end{align*} Then $g_y$ is $L$-Lipschitz on $\mathbb{R}$ since $|g_y(s) - g_y(s')| = |f(sv + y) - f(s'v + y)| \le L|s - s'|$. Every Lipschitz function on $\mathbb{R}$ is absolutely continuous on bounded intervals, so by the Differentiability of Absolutely Continuous Functions, $g_y'(s)$ exists for $\mathcal{L}^1$-a.e. $s \in \mathbb{R}$. Equivalently, the set \begin{align*} N_y := \{s \in \mathbb{R}: g_y'(s) \text{ does not exist}\} \end{align*} satisfies $\mathcal{L}^1(N_y) = 0$ for every $y \in v^\perp$. Define $N_v := \{x \in \mathbb{R}^n: D_v f(x) \text{ does not exist}\}$. Under the decomposition $x = sv + y$, we have $x \in N_v \iff s \in N_y$. To verify Borel measurability of $N_v$, write \begin{align*} \overline{D}_v f(x) := \limsup_{\substack{t \to 0 \\ t \in \mathbb{Q} \setminus \{0\}}} \frac{f(x + tv) - f(x)}{t}, \qquad \underline{D}_v f(x) := \liminf_{\substack{t \to 0 \\ t \in \mathbb{Q} \setminus \{0\}}} \frac{f(x + tv) - f(x)}{t}, \end{align*} which are countable suprema/infima of continuous functions of $x$ (since $f$ is Lipschitz, hence continuous), hence Borel measurable; by continuity of $f$, the rational and real limits agree, so $N_v = \{\overline{D}_v f \ne \underline{D}_v f\}$ is Borel. The integrand $\mathbf{1}_{N_v}: \mathbb{R}^n \to \{0,1\}$ is non-negative and Borel measurable, and $(\mathbb{R} \times v^\perp, \mathcal{L}^1 \otimes \mathcal{L}^{n-1})$ is $\sigma$-finite. We therefore apply **Tonelli's theorem** (the non-negative form of Fubini, which requires no integrability hypothesis): \begin{align*} \mathcal{L}^n(N_v) = \int_{\mathbb{R}^n} \mathbf{1}_{N_v}\, d\mathcal{L}^n = \int_{v^\perp} \int_{\mathbb{R}} \mathbf{1}_{N_v}(sv + y)\, d\mathcal{L}^1(s)\, d\mathcal{L}^{n-1}(y) = \int_{v^\perp} \mathcal{L}^1(N_y)\, d\mathcal{L}^{n-1}(y) = 0. \end{align*} [/step] [step:Identify the weak gradient $\nabla f$ as the directional derivative for almost every direction] Since $f$ is globally $L$-Lipschitz on $\mathbb{R}^n$, $f \in W^{1,\infty}(\mathbb{R}^n)$ with $\|\nabla f\|_{L^\infty} \le L$ — this is the standard characterisation Lipschitz Functions Coincide with $W^{1,\infty}$. Let $\nabla f \in L^\infty(\mathbb{R}^n; \mathbb{R}^n)$ denote the weak gradient. Fix $v \in S^{n-1}$. We claim \begin{align*} D_v f(x) = \nabla f(x) \cdot v \quad \text{for $\mathcal{L}^n$-a.e. } x \in \mathbb{R}^n. \end{align*} Take any test function $\varphi \in C_c^\infty(\mathbb{R}^n)$. The difference quotient $(f(x + tv) - f(x))/t$ is bounded by $L$ everywhere (Lipschitz bound) and converges $\mathcal{L}^n$-a.e. to $D_v f(x)$ as $t \to 0$ by Step 2. The integrable majorant $L \cdot |\varphi(x)|$ allows the dominated convergence theorem: \begin{align*} \int_{\mathbb{R}^n} \frac{f(x + tv) - f(x)}{t}\, \varphi(x)\, d\mathcal{L}^n(x) \to \int_{\mathbb{R}^n} D_v f(x)\, \varphi(x)\, d\mathcal{L}^n(x) \quad \text{as } t \to 0. \end{align*} On the other hand, by translation-invariance of Lebesgue measure (substitute $x \mapsto x + tv$ in one of the two pieces), \begin{align*} \int_{\mathbb{R}^n} \frac{f(x + tv) - f(x)}{t}\, \varphi(x)\, d\mathcal{L}^n(x) = -\int_{\mathbb{R}^n} f(x)\, \frac{\varphi(x) - \varphi(x - tv)}{t}\, d\mathcal{L}^n(x). \end{align*} The difference quotient $(\varphi(x) - \varphi(x - tv))/t$ converges uniformly to $\nabla \varphi(x) \cdot v$ as $t \to 0$ (since $\varphi$ is smooth and compactly supported), and is bounded by $\|\nabla\varphi\|_{L^\infty}$ uniformly with support in a fixed compact $K$ for $|t| \le 1$. Hence by dominated convergence with majorant $\|\nabla\varphi\|_{L^\infty} \cdot \mathbf{1}_K(x) \cdot |f(x)| \in L^1(\mathbb{R}^n)$ (since $f$ is continuous and $K$ is compact), \begin{align*} \int_{\mathbb{R}^n} \frac{f(x + tv) - f(x)}{t}\, \varphi(x)\, d\mathcal{L}^n(x) \to -\int_{\mathbb{R}^n} f(x)\, \nabla \varphi(x) \cdot v\, d\mathcal{L}^n(x). \end{align*} By the definition of the weak gradient applied componentwise, \begin{align*} -\int_{\mathbb{R}^n} f\, \nabla\varphi \cdot v\, d\mathcal{L}^n = \int_{\mathbb{R}^n} (\nabla f \cdot v)\, \varphi\, d\mathcal{L}^n. \end{align*} Comparing the two limits, \begin{align*} \int_{\mathbb{R}^n} D_v f(x)\, \varphi(x)\, d\mathcal{L}^n(x) = \int_{\mathbb{R}^n} (\nabla f(x) \cdot v)\, \varphi(x)\, d\mathcal{L}^n(x) \qquad \forall \varphi \in C_c^\infty(\mathbb{R}^n). \end{align*} Both $D_v f$ and $\nabla f \cdot v$ lie in $L^\infty(\mathbb{R}^n) \subseteq L^1_{\mathrm{loc}}(\mathbb{R}^n)$, so the Fundamental Lemma of the Calculus of Variations yields \begin{align*} D_v f(x) = \nabla f(x) \cdot v \quad \text{for $\mathcal{L}^n$-a.e. } x \in \mathbb{R}^n. \end{align*} [/step] [step:Pass to a countable dense set of directions and isolate the full-measure set] Choose a countable dense subset $\{v_k\}_{k=1}^\infty \subseteq S^{n-1}$. For each $k$, the set \begin{align*} Z_k := \{x \in \mathbb{R}^n: D_{v_k} f(x) \text{ does not exist or } D_{v_k} f(x) \ne \nabla f(x) \cdot v_k\} \end{align*} satisfies $\mathcal{L}^n(Z_k) = 0$ by Steps 2 and 3. Let \begin{align*} Z := \left( \bigcup_{k=1}^\infty Z_k \right) \cup \{x \in \mathbb{R}^n : x \text{ is not a Lebesgue point of } \nabla f\}. \end{align*} The first piece is null as a countable union of null sets; the second is null by the Lebesgue differentiation theorem applied to the locally integrable vector field $\nabla f \in L^\infty \subset L^1_{\mathrm{loc}}$ (componentwise). Hence $\mathcal{L}^n(Z) = 0$. For $x \in \mathbb{R}^n \setminus Z$, the following hold simultaneously: (i) For every $k \ge 1$, the two-sided limit $D_{v_k} f(x)$ exists and equals $\nabla f(x) \cdot v_k$. (ii) $x$ is a Lebesgue point of $\nabla f$, i.e., \begin{align*} \lim_{r \to 0} \frac{1}{\mathcal{L}^n(B(x, r))} \int_{B(x, r)} |\nabla f(y) - \nabla f(x)|\, d\mathcal{L}^n(y) = 0, \end{align*} and in particular \begin{align*} \nabla f(x) = \lim_{r \to 0} \fint_{B(x, r)} \nabla f(y)\, d\mathcal{L}^n(y). \end{align*} [/step] [step:Use the Lebesgue point representative to obtain the pointwise gradient bound] Fix $x_0 \in \mathbb{R}^n \setminus Z$. The Lebesgue point property (ii) is what justifies treating $\nabla f(x_0)$ as a genuine pointwise value rather than merely an equivalence class. We extract two consequences for the rest of the proof. First, condition (ii) identifies $\nabla f(x_0)$ as the strong-pointwise representative through averages: \begin{align*} \nabla f(x_0) = \lim_{r \to 0} \fint_{B(x_0, r)} \nabla f(y)\, d\mathcal{L}^n(y). \end{align*} Second, this identification yields the **pointwise** (not merely essential) gradient bound $|\nabla f(x_0)| \le L$. Indeed, for every $r > 0$, the average $\fint_{B(x_0, r)} \nabla f\, d\mathcal{L}^n \in \mathbb{R}^n$ is the integral of an $L^\infty$-vector field of norm at most $L$, so by the triangle inequality for vector-valued integrals, \begin{align*} \left| \fint_{B(x_0, r)} \nabla f(y)\, d\mathcal{L}^n(y) \right| \le \fint_{B(x_0, r)} |\nabla f(y)|\, d\mathcal{L}^n(y) \le \|\nabla f\|_{L^\infty} \le L. \end{align*} Passing to the limit $r \to 0$ using the Lebesgue point identity above, \begin{align*} |\nabla f(x_0)| = \left| \lim_{r \to 0} \fint_{B(x_0, r)} \nabla f\, d\mathcal{L}^n \right| \le L. \end{align*} Without condition (ii), this bound would only hold for $\mathcal{L}^n$-a.e. $x$; the Lebesgue point hypothesis upgrades it to *the* particular point $x_0$ chosen here. [/step] [step:Upgrade pointwise directional consistency to Fréchet differentiability] We continue with $x_0 \in \mathbb{R}^n \setminus Z$ and prove $f$ is differentiable at $x_0$ with derivative $\nabla f(x_0)$, i.e., \begin{align*} \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0) - \nabla f(x_0) \cdot h}{|h|} = 0. \end{align*} Fix $\varepsilon > 0$. We seek $\delta > 0$ such that $0 < |h| < \delta$ implies $|f(x_0 + h) - f(x_0) - \nabla f(x_0) \cdot h| < \varepsilon|h|$. [claim:For each $v_k$ in the countable dense set, the two-sided limit $\lim_{t \to 0} (f(x_0 + tv_k) - f(x_0))/t$ exists and equals $\nabla f(x_0) \cdot v_k$] [proof] By the construction of $Z$ in Step 4, $x_0 \notin Z_k$, which means the two-sided limit $D_{v_k} f(x_0) = \lim_{t \to 0} (f(x_0 + tv_k) - f(x_0))/t$ exists and equals $\nabla f(x_0) \cdot v_k$. This is exactly condition (i). [/proof] [/claim] The directional consistency at the *single* point $x_0$ for the countably many directions $v_k$ does not by itself give Fréchet differentiability — we still need uniform control as $h$ ranges over a small ball around the origin. The two ingredients that supply this uniformity are the global Lipschitz constant $L$ (which bounds finite differences in *any* direction uniformly) and the pointwise bound $|\nabla f(x_0)| \le L$ established in Step 5 from the Lebesgue point condition (ii). By compactness of $S^{n-1}$ and density of $\{v_k\}$, choose finitely many $v_{k_1}, \dots, v_{k_N}$ such that \begin{align*} S^{n-1} \subseteq \bigcup_{j=1}^N B\!\left(v_{k_j}, \frac{\varepsilon}{8L}\right). \end{align*} For each $j \in \{1, \dots, N\}$, by condition (i) at $x_0$ (the two-sided directional limit), there exists $\delta_j > 0$ such that \begin{align*} \left| \frac{f(x_0 + tv_{k_j}) - f(x_0)}{t} - \nabla f(x_0) \cdot v_{k_j} \right| < \frac{\varepsilon}{4} \qquad \text{for all } t \in \mathbb{R} \text{ with } 0 < |t| < \delta_j. \end{align*} Set $\delta := \min(\delta_1, \dots, \delta_N) > 0$. Now fix $h \in \mathbb{R}^n$ with $0 < |h| < \delta$. Let $u := h/|h| \in S^{n-1}$ and choose $j$ such that $|u - v_{k_j}| < \varepsilon/(8L)$. Decompose: \begin{align*} &\bigl| f(x_0 + h) - f(x_0) - \nabla f(x_0) \cdot h \bigr| \\ &\quad \le \underbrace{|f(x_0 + |h|u) - f(x_0 + |h|v_{k_j})|}_{\text{(I)}} + \underbrace{|f(x_0 + |h|v_{k_j}) - f(x_0) - \nabla f(x_0) \cdot (|h| v_{k_j})|}_{\text{(II)}} + \underbrace{|\nabla f(x_0) \cdot |h|(v_{k_j} - u)|}_{\text{(III)}}. \end{align*} **Term (I).** By the global $L$-Lipschitz bound on $f$, \begin{align*} |f(x_0 + |h|u) - f(x_0 + |h|v_{k_j})| \le L\bigl| |h|u - |h|v_{k_j} \bigr| = L|h|\,|u - v_{k_j}| < L|h| \cdot \frac{\varepsilon}{8L} = \frac{\varepsilon|h|}{8}. \end{align*} **Term (II).** Since $0 < |h| < \delta \le \delta_j$, the directional derivative bound applied with $t := |h|$ gives \begin{align*} \left| \frac{f(x_0 + |h|v_{k_j}) - f(x_0)}{|h|} - \nabla f(x_0) \cdot v_{k_j} \right| < \frac{\varepsilon}{4}, \end{align*} and multiplying by $|h|$, \begin{align*} |f(x_0 + |h|v_{k_j}) - f(x_0) - \nabla f(x_0) \cdot (|h|v_{k_j})| < \frac{\varepsilon|h|}{4}. \end{align*} **Term (III).** By the pointwise gradient bound $|\nabla f(x_0)| \le L$ established in Step 5 (which used the Lebesgue point condition (ii)), \begin{align*} |\nabla f(x_0) \cdot |h|(v_{k_j} - u)| \le |\nabla f(x_0)|\, |h|\, |v_{k_j} - u| \le L|h| \cdot \frac{\varepsilon}{8L} = \frac{\varepsilon|h|}{8}. \end{align*} Combining, \begin{align*} |f(x_0 + h) - f(x_0) - \nabla f(x_0) \cdot h| < \frac{\varepsilon|h|}{8} + \frac{\varepsilon|h|}{4} + \frac{\varepsilon|h|}{8} = \frac{\varepsilon|h|}{2} < \varepsilon|h|. \end{align*} Since $\varepsilon > 0$ was arbitrary, \begin{align*} \lim_{h \to 0} \frac{|f(x_0 + h) - f(x_0) - \nabla f(x_0) \cdot h|}{|h|} = 0, \end{align*} i.e., $f$ is differentiable at $x_0$ with derivative $Df_{x_0}(h) = \nabla f(x_0) \cdot h$. [/step] [step:Conclude $f$ is differentiable on a full-measure set] By Step 6, $f$ is differentiable at every $x_0 \in \mathbb{R}^n \setminus Z$, where $\mathcal{L}^n(Z) = 0$. Combined with the reduction in Step 1 — which shows it suffices to handle scalar globally Lipschitz functions because differentiability is a local, componentwise property — this completes the proof: every locally Lipschitz $f: \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $\mathcal{L}^n$-a.e. point of $\mathbb{R}^n$. [/step]

What brings you to Androma?

Start with a route through the knowledge graph.

Rademacher's Theorem (Theorem # 3069)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Rademacher's Theorem (Theorem # 3069)

Discussion

Proof

Explore Further