Area Formula (Theorem # 3075)
Theorem
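Let $f: \mathbb{R}^m \to \mathbb{R}^n$ be a Lipschitz map with $m \le n$. Write $J_m f(x) := \sqrt{\det(Df_x^\top Df_x)}$ for its $m$-dimensional Jacobian (defined at $\mathcal{L}^m$-a.e. $x$) and $N(f, A, y) := \#(A \cap f^{-1}(y))$ for the multiplicity function. For every $\mathcal{L}^m$-measurable set $A \subseteq \mathbb{R}^m$,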
\begin{align*}
\int_A J_m f(x)\, d\mathcal{L}^m(x) = \int_{\mathbb{R}^n} N(f, A, y)\, d\mathcal{H}^m(y).
\end{align*}
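As a quick numerical sanity check (not part of the proof), the formula can be tested on a simple injective curve. The sketch below uses the illustrative example $m = 1$, $n = 2$, $f(x) = (x, x^2)$ on $A = [0, 1]$ (extended to a Lipschitz map on $\mathbb{R}$ in any way; only its values on $A$ matter). Then $J_1 f(x) = \sqrt{1 + 4x^2}$, $N(f, A, \cdot) = 1$ on $f(A)$, and both sides should equal the arc length $\tfrac{\sqrt{5}}{2} + \tfrac{1}{4}\operatorname{arsinh} 2$. It assumes NumPy is available.

```python
import numpy as np

def lhs_jacobian_integral(num: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of J_1 f = sqrt(1 + 4x^2) over [0, 1]."""
    x = (np.arange(num) + 0.5) / num
    return float(np.mean(np.sqrt(1.0 + 4.0 * x**2)))

def rhs_image_length(num: int = 200_000) -> float:
    """Polyline approximation of H^1(f([0, 1])); f is injective, so N = 1 on the image."""
    x = np.linspace(0.0, 1.0, num + 1)
    points = np.column_stack([x, x**2])
    return float(np.linalg.norm(np.diff(points, axis=0), axis=1).sum())

exact = np.sqrt(5.0) / 2.0 + np.arcsinh(2.0) / 4.0
lhs, rhs = lhs_jacobian_integral(), rhs_image_length()
print(f"LHS ~ {lhs:.8f}, RHS ~ {rhs:.8f}, closed form = {exact:.8f}")
```

An injective example keeps the multiplicity trivial; a non-injective parametrisation would make $N(f, A, y) > 1$ on part of the image.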
Analysis
Measure Theory
Proof
[proofplan]
The strategy reduces the area formula to its linearisation and then transfers the linearised identity to $f$ via a bi-Lipschitz approximation. Step 1 uses [Rademacher's Theorem](/theorems/3069) to discard the null set where $f$ is not differentiable; the [Lipschitz Bound on Hausdorff Measure](/theorems/2999) ensures the discarded set contributes nothing on either side. Step 2 establishes the linearised area formula for an injective linear map via the [Polar Decomposition](/theorems/3074), reducing it to the Euclidean change of variables for invertible linear maps on $\mathbb{R}^m$. Step 3 builds, for each $\varepsilon \in (0, 1/2)$, a decomposition $A = A_0 \cup \bigsqcup_{k \ge 1} A_k$ (up to an $\mathcal{L}^m$-null set) into the singular set $A_0$ and Borel sets $A_k$ on which $Df_x$ is uniformly close to a fixed injective linear map $L_k$, with relative error at most $\varepsilon$; the construction uses Lusin's theorem and the separability of the open subset $\mathrm{Inj}(\mathbb{R}^m, \mathbb{R}^n) \subseteq \mathbb{R}^{n \times m}$. Step 4 promotes this infinitesimal closeness to a local bi-Lipschitz comparison between $f$ and $L_k$ on small pieces of each $A_k$. Step 5 combines this with the [Vitali Covering Theorem](/theorems/2963) and the linearised formula to obtain the per-piece identity up to factors $(1 \pm 2\varepsilon)^m$. Step 6 shows that the singular set $A_0 = \{J_m f = 0\}$ contributes zero on both sides, using Vitali covers by balls of arbitrarily small radius. Step 7 sums the per-piece identities and lets $\varepsilon \to 0$ to recover the area formula on $A$.
[/proofplan]
[step:Reduce to the case where $f$ is differentiable everywhere on $A$]
The map $f: \mathbb{R}^m \to \mathbb{R}^n$ is Lipschitz by hypothesis with Lipschitz constant $L := \operatorname{Lip}(f) < \infty$. By [Rademacher's Theorem](/theorems/3069), $f$ is differentiable at $\mathcal{L}^m$-a.e. $x \in \mathbb{R}^m$. Set
\begin{align*}
N &:= \{x \in A : f \text{ is not differentiable at } x\}, & A' &:= A \setminus N,
\end{align*}
so $\mathcal{L}^m(N) = 0$. Since $J_m f$ is only defined where $f$ is differentiable, adopt the convention $J_m f(x) := 0$ for $x \in N$; any other choice changes $J_m f$ only on an $\mathcal{L}^m$-null set and does not affect the left-hand side.
We show that both sides of the area formula are unchanged by replacing $A$ with $A'$.
Left-hand side: by the [Jacobian as Product of Singular Values](/theorems/3072), $J_m f(x) = \prod_{i=1}^m \sigma_i(Df_x)$ where $\sigma_i(Df_x)$ are the singular values of $Df_x$. Each $\sigma_i(Df_x)$ is bounded by the operator norm of $Df_x$, which is in turn bounded by $L$ since $f$ is $L$-Lipschitz (the operator norm of the derivative of a Lipschitz map is bounded by its Lipschitz constant). Hence $0 \le J_m f(x) \le L^m$ on $A'$, and $J_m f \equiv 0$ on $N$ by convention. Therefore
\begin{align*}
\int_A J_m f \, d\mathcal{L}^m = \int_{A'} J_m f \, d\mathcal{L}^m + \int_N J_m f \, d\mathcal{L}^m = \int_{A'} J_m f \, d\mathcal{L}^m,
\end{align*}
since $\mathcal{L}^m(N) = 0$.
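The bound $0 \le J_m f \le L^m$ only uses $J_m f(x) = \prod_i \sigma_i(Df_x) \le \|Df_x\|_{\mathrm{op}}^m$. A small NumPy sketch, with a random $5 \times 3$ matrix standing in for $Df_x$ (an illustrative assumption), checks this inequality and the equivalent Gram-determinant form $\sqrt{\det(Df_x^\top Df_x)}$.

```python
import numpy as np

def jacobian_svd(D: np.ndarray) -> float:
    """J_m for an n x m matrix (m <= n): the product of its m singular values."""
    return float(np.prod(np.linalg.svd(D, compute_uv=False)))

def jacobian_gram(D: np.ndarray) -> float:
    """The same quantity written as sqrt(det(D^T D))."""
    return float(np.sqrt(np.linalg.det(D.T @ D)))

rng = np.random.default_rng(0)
D = rng.standard_normal((5, 3))        # stand-in for Df_x with n = 5, m = 3
op_norm = float(np.linalg.norm(D, 2))  # spectral norm = largest singular value
J_svd, J_gram = jacobian_svd(D), jacobian_gram(D)
print(J_svd, J_gram, op_norm**3)
```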
Right-hand side: the multiplicity function is $N(f, A, y) := \#(f^{-1}(y) \cap A)$. We normalise Hausdorff measure as $\mathcal{H}^m := \lim_{\delta \downarrow 0} \mathcal{H}^m_\delta$ with $\mathcal{H}^m_\delta(X) := \inf\{\sum_i \omega_m (\operatorname{diam} U_i / 2)^m : X \subseteq \bigcup_i U_i, \operatorname{diam} U_i \le \delta\}$ (the normalisation recalled in Step 2); with this normalisation, $\mathcal{H}^m = \mathcal{L}^m$ on $\mathbb{R}^m$, a standard fact of geometric measure theory. Hence $\mathcal{H}^m(N) = 0$. By the [Lipschitz Bound on Hausdorff Measure](/theorems/2999), $\mathcal{H}^m(f(N)) \le L^m \mathcal{H}^m(N) = 0$. Every $y \notin f(N)$ satisfies $f^{-1}(y) \cap N = \varnothing$, so $N(f, A, y) = N(f, A', y)$ for all $y$ outside the $\mathcal{H}^m$-null set $f(N)$. Therefore
\begin{align*}
\int_{\mathbb{R}^n} N(f, A, y) \, d\mathcal{H}^m(y) = \int_{\mathbb{R}^n} N(f, A', y) \, d\mathcal{H}^m(y).
\end{align*}
We may therefore assume from now on that $f$ is differentiable at every $x \in A$. We retain the notation $A$ for what was previously $A'$.
[/step]
[step:Establish the linearised area formula for injective linear maps]
[claim:Linearised area formula]
Let $L_0: \mathbb{R}^m \to \mathbb{R}^n$ be an injective linear map (so $m \le n$), and let $E \subseteq \mathbb{R}^m$ be $\mathcal{L}^m$-measurable. Then
\begin{align*}
\mathcal{H}^m(L_0(E)) = J_m L_0 \cdot \mathcal{L}^m(E).
\end{align*}
[proof]
By the [Polar Decomposition](/theorems/3074), the injective linear map $L_0$ admits a factorisation
\begin{align*}
L_0 &= P_0 \circ S_0,
\end{align*}
where $P_0: \mathbb{R}^m \to \mathbb{R}^n$ is an isometric embedding ($P_0^\top P_0 = I_m$) and $S_0: \mathbb{R}^m \to \mathbb{R}^m$ is symmetric positive semidefinite. Since $L_0$ is injective, so is $S_0$, hence $S_0$ is symmetric positive definite, hence invertible. By the [Singular Value Decomposition](/theorems/3071) and the formula for $J_m L_0$ as the product of its singular values from the [Jacobian as Product of Singular Values](/theorems/3072), $J_m L_0 = \det S_0 > 0$.
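For concreteness, one standard way to compute $P_0$ and $S_0$ (assumed here for illustration) is from a thin singular value decomposition $L_0 = U \Sigma V^\top$, taking $P_0 := U V^\top$ and $S_0 := V \Sigma V^\top$. The sketch checks $P_0^\top P_0 = I_m$, $L_0 = P_0 S_0$, that $P_0$ preserves distances (as used in step (a) below), and that $\det S_0$ equals the product of the singular values of $L_0$.

```python
import numpy as np

def polar_decomposition(L0: np.ndarray):
    """Return (P0, S0) with L0 = P0 @ S0, P0 an isometric embedding, S0 symmetric PSD."""
    U, s, Vt = np.linalg.svd(L0, full_matrices=False)  # U: n x m, s: (m,), Vt: m x m
    P0 = U @ Vt                                        # P0.T @ P0 = Vt.T @ Vt = I_m
    S0 = Vt.T @ np.diag(s) @ Vt                        # eigenvalues of S0 are the s_i
    return P0, S0

rng = np.random.default_rng(1)
L0 = rng.standard_normal((4, 2))      # a generic (hence injective) 4 x 2 matrix
P0, S0 = polar_decomposition(L0)
u, v = rng.standard_normal(2), rng.standard_normal(2)
print(np.linalg.det(S0), np.prod(np.linalg.svd(L0, compute_uv=False)))
```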
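Step (a): $\mathcal{H}^m(P_0(F)) = \mathcal{H}^m(F)$ for every $F \subseteq \mathbb{R}^m$. Recall that $\mathcal{H}^m := \lim_{\delta \downarrow 0} \mathcal{H}^m_\delta$, where $\mathcal{H}^m_\delta(X) := \inf\{\sum_i \omega_m (\operatorname{diam} U_i / 2)^m : X \subseteq \bigcup_i U_i, \operatorname{diam} U_i \le \delta\}$ is the $\delta$-Hausdorff premeasure. Since $|P_0 u - P_0 v| = |u - v|$ for all $u, v \in \mathbb{R}^m$, every cover $\{U_i\}$ of $F$ yields the cover $\{P_0(U_i)\}$ of $P_0(F)$ with the same diameters, so $\mathcal{H}^m_\delta(P_0(F)) \le \mathcal{H}^m_\delta(F)$. Conversely, every cover $\{V_i\}$ of $P_0(F)$ by subsets of $\mathbb{R}^n$ yields the cover $\{P_0^{-1}(V_i)\}$ of $F$, and $\operatorname{diam} P_0^{-1}(V_i) \le \operatorname{diam} V_i$ because $P_0$ preserves distances; hence $\mathcal{H}^m_\delta(F) \le \mathcal{H}^m_\delta(P_0(F))$. Thus $\mathcal{H}^m_\delta(P_0(F)) = \mathcal{H}^m_\delta(F)$ for every $\delta > 0$, and letting $\delta \downarrow 0$ gives $\mathcal{H}^m(P_0(F)) = \mathcal{H}^m(F)$.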
Step (b): for the linear bijection $S_0: \mathbb{R}^m \to \mathbb{R}^m$, the change-of-variables formula for invertible linear maps on Euclidean space (substitution $y = S_0 x$ with constant Jacobian determinant $\det S_0 > 0$) gives
\begin{align*}
\mathcal{L}^m(S_0(E)) = \det S_0 \cdot \mathcal{L}^m(E).
\end{align*}
On $\mathbb{R}^m$, with the standard Federer normalisation, $\mathcal{H}^m = \mathcal{L}^m$, so $\mathcal{H}^m(S_0(E)) = \det S_0 \cdot \mathcal{L}^m(E)$.
Combining steps (a) and (b) with $F = S_0(E)$,
\begin{align*}
\mathcal{H}^m(L_0(E)) = \mathcal{H}^m(P_0(S_0(E))) = \mathcal{H}^m(S_0(E)) = \det S_0 \cdot \mathcal{L}^m(E) = J_m L_0 \cdot \mathcal{L}^m(E).
\end{align*}
[/proof]
[/claim]
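In the smallest non-trivial case $m = 2$, $n = 3$, the claim says that the linear image of the unit square $E = [0, 1]^2$ is a parallelogram of area $J_2 L_0 = \sigma_1(L_0)\sigma_2(L_0)$. Independently of the proof, that area is also $|c_1 \times c_2|$ for the columns $c_1, c_2$ of $L_0$ (Lagrange's identity), and the sketch below compares the two; the matrix is an arbitrary example.

```python
import numpy as np

L0 = np.array([[1.0, 2.0],
               [0.5, -1.0],
               [3.0, 0.25]])        # arbitrary injective 3 x 2 example

# Area of the parallelogram L0([0,1]^2) spanned by the two columns.
area_cross = float(np.linalg.norm(np.cross(L0[:, 0], L0[:, 1])))

# J_2 L0 as the product of singular values, times L^2([0,1]^2) = 1.
J2 = float(np.prod(np.linalg.svd(L0, compute_uv=False)))

print(area_cross, J2)
```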
[/step]
[step:Construct the $\varepsilon$-decomposition $A = A_0 \cup \bigsqcup_{k \ge 1} A_k$]
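Fix $\varepsilon \in (0, 1/2)$ throughout the next several steps; the restriction $\varepsilon < 1/2$ keeps the factor $1 - 2\varepsilon$ appearing below positive. Define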
\begin{align*}
A_0 &:= \{x \in A : J_m f(x) = 0\}, & A_+ &:= A \setminus A_0 = \{x \in A : J_m f(x) > 0\}.
\end{align*}
By the [Jacobian as Product of Singular Values](/theorems/3072), $J_m f(x) = 0$ if and only if $Df_x$ is not injective, equivalently if the smallest singular value of $Df_x$ is zero. Thus on $A_+$, $Df_x$ is injective.
The map $x \mapsto Df_x$ is an $\mathcal{L}^m$-measurable function from $A$ into the space $\mathbb{R}^{n \times m}$ of $n \times m$ matrices (equipped with the operator norm $\|\cdot\|_{\mathrm{op}}$). Indeed, since $f$ is differentiable at every point of $A$, each entry satisfies $\partial_j f_i(x) = \lim_{l \to \infty} l\bigl(f_i(x + e_j/l) - f_i(x)\bigr)$ for $x \in A$, a pointwise limit of continuous functions. Thus $x \mapsto Df_x$ agrees on $A$ with a Borel function (for instance, the entrywise $\limsup$ of these difference quotients) and is $\mathcal{L}^m$-measurable on $A$. In particular $J_m f$ is $\mathcal{L}^m$-measurable, so $A_0$ and $A_+$ are $\mathcal{L}^m$-measurable, and $\mathcal{L}^m$ restricted to $A_+$ is $\sigma$-finite.
[claim:Construction of the partition]
There exists a sequence of pairwise disjoint Borel sets $\{A_k\}_{k \ge 1}$ in $A_+$ and a sequence of injective linear maps $\{L_k\}_{k \ge 1}$, $L_k: \mathbb{R}^m \to \mathbb{R}^n$, such that
1. $\mathcal{L}^m\Bigl(A_+ \setminus \bigsqcup_{k \ge 1} A_k\Bigr) = 0$;
2. for every $k \ge 1$ and every $x \in A_k$ and every $v \in \mathbb{R}^m$,
\begin{align*}
|Df_x v - L_k v| &\le \varepsilon |L_k v|, \tag{$*$}
\end{align*}
and moreover the restriction $Df|_{A_k}$ is continuous as a map into $\mathbb{R}^{n \times m}$.
[proof]
We work modulo a Lebesgue null set throughout.
Step (i) (Lusin's theorem for $Df$). The map $Df: A_+ \to \mathbb{R}^{n \times m}$ is $\mathcal{L}^m$-measurable as established above, and $\mathcal{L}^m$ restricted to $A_+$ is $\sigma$-finite, since $\mathbb{R}^m = \bigcup_{R \in \mathbb{N}} B(0, R)$ with each ball of finite measure. The target $\mathbb{R}^{n \times m}$ is a separable metric space, so the hypotheses of Lusin's theorem are satisfied: for every $\eta > 0$ there exists a closed set $F_\eta \subseteq A_+$ with $\mathcal{L}^m(A_+ \setminus F_\eta) < \eta$ such that the restriction $Df|_{F_\eta}$ is continuous. Choosing $\eta_j = 2^{-j}$ and $F_j := F_{\eta_j}$, the union $\bigcup_j F_j$ has full measure in $A_+$. Replacing $F_j$ with $F_1 \cup \cdots \cup F_j$, we may assume $F_j$ is increasing in $j$; the restriction of $Df$ to this finite union of closed sets is still continuous by the pasting lemma. It therefore suffices to construct the partition on a single set $F$ on which $Df|_F$ is continuous: applying the construction to each $F = F_j \setminus F_{j-1}$ (with $F_0 := \varnothing$) and collecting the pieces gives the partition on all of $A_+$ modulo a null set.
Step (ii) (Approximation by countably many fixed linear maps). The space $\mathrm{Inj}(\mathbb{R}^m, \mathbb{R}^n) := \{L \in \mathbb{R}^{n \times m} : L \text{ is injective}\}$ is open in $\mathbb{R}^{n \times m}$: its complement is the closed set $\{L : \det(L^\top L) = 0\}$. As an open subset of the second-countable space $\mathbb{R}^{n \times m}$, the set $\mathrm{Inj}(\mathbb{R}^m, \mathbb{R}^n)$ is itself second-countable, hence separable. Choose a countable dense subset
\begin{align*}
\mathcal{D} := \{L_k\}_{k \ge 1} &\subseteq \mathrm{Inj}(\mathbb{R}^m, \mathbb{R}^n)
\end{align*}
For instance, $\mathcal{D}$ may be taken to be the set of injective matrices with rational entries: the rational matrices are countable and dense in $\mathbb{R}^{n \times m}$, so their intersection with the open set $\mathrm{Inj}(\mathbb{R}^m, \mathbb{R}^n)$ is a countable dense subset of $\mathrm{Inj}(\mathbb{R}^m, \mathbb{R}^n)$.
For each injective linear map $L \in \mathrm{Inj}(\mathbb{R}^m, \mathbb{R}^n)$, the smallest singular value $\sigma_m(L) > 0$ satisfies $|Lv| \ge \sigma_m(L) |v|$ for all $v \in \mathbb{R}^m$. Define, for each $k \ge 1$,
\begin{align*}
U_k &:= \Bigl\{ M \in \mathrm{Inj}(\mathbb{R}^m, \mathbb{R}^n) : \|M - L_k\|_{\mathrm{op}} < \varepsilon \cdot \sigma_m(L_k) \Bigr\}.
\end{align*}
This is an open subset of $\mathrm{Inj}(\mathbb{R}^m, \mathbb{R}^n)$. For any $M \in U_k$ and any $v \in \mathbb{R}^m$,
\begin{align*}
|Mv - L_k v| &\le \|M - L_k\|_{\mathrm{op}} \cdot |v| < \varepsilon \cdot \sigma_m(L_k) \cdot |v| \le \varepsilon |L_k v|,
\end{align*}
where the last inequality uses $\sigma_m(L_k)|v| \le |L_k v|$.
We verify $\bigcup_{k \ge 1} U_k = \mathrm{Inj}(\mathbb{R}^m, \mathbb{R}^n)$. Recall that the smallest singular value $\sigma_m$ is $1$-Lipschitz with respect to the operator norm by the min-max characterisation, that is, $|\sigma_m(M) - \sigma_m(L)| \le \|M - L\|_{\mathrm{op}}$. Given $M \in \mathrm{Inj}(\mathbb{R}^m, \mathbb{R}^n)$, so that $\sigma_m(M) > 0$, use the density of $\mathcal{D}$ to choose $L_k \in \mathcal{D}$ with
\begin{align*}
\|M - L_k\|_{\mathrm{op}} &< \tfrac{1}{2}\varepsilon \cdot \sigma_m(M).
\end{align*}
By the $1$-Lipschitz property of $\sigma_m$,
\begin{align*}
\sigma_m(L_k) &\ge \sigma_m(M) - \|M - L_k\|_{\mathrm{op}} \ge \sigma_m(M)\bigl(1 - \tfrac{\varepsilon}{2}\bigr) \ge \tfrac{1}{2} \sigma_m(M).
\end{align*}
Therefore
\begin{align*}
\|M - L_k\|_{\mathrm{op}} &< \tfrac{1}{2}\varepsilon \cdot \sigma_m(M) \le \tfrac{1}{2}\varepsilon \cdot \frac{\sigma_m(L_k)}{1 - \varepsilon/2} < \varepsilon \cdot \sigma_m(L_k),
\end{align*}
since $\frac{1/2}{1 - \varepsilon/2} < 1$ for $\varepsilon \in (0, 1)$. Thus $M \in U_k$.
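The two facts used here, that $\sigma_m$ is $1$-Lipschitz in the operator norm and that $\|M - L_k\|_{\mathrm{op}} < \tfrac12 \varepsilon \sigma_m(M)$ forces $M \in U_k$, can be spot-checked numerically. The sketch below uses random $4 \times 2$ matrices and a perturbation rescaled to a prescribed operator norm; the helper names and $\varepsilon = 0.3$ are illustrative choices.

```python
import numpy as np

def sigma_min(A: np.ndarray) -> float:
    """Smallest singular value sigma_m of an n x m matrix (m <= n)."""
    return float(np.linalg.svd(A, compute_uv=False)[-1])

def op_norm(A: np.ndarray) -> float:
    return float(np.linalg.norm(A, 2))

rng = np.random.default_rng(2)
eps = 0.3
trials = []
for _ in range(1000):
    M = rng.standard_normal((4, 2))
    E = rng.standard_normal((4, 2))
    # Rescale so that ||M - L_k||_op = 0.49 * eps * sigma_m(M) < (eps / 2) * sigma_m(M).
    E *= 0.49 * eps * sigma_min(M) / op_norm(E)
    Lk = M - E
    lipschitz_ok = abs(sigma_min(M) - sigma_min(Lk)) <= op_norm(M - Lk) + 1e-12
    in_Uk = op_norm(M - Lk) < eps * sigma_min(Lk)   # membership test for U_k
    trials.append(lipschitz_ok and in_Uk)
print(all(trials))
```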
Step (iii) (Pulling back to Borel pieces of $F$). Define, for each $k \ge 1$,
\begin{align*}
B_k &:= (Df|_F)^{-1}(U_k) \subseteq F.
\end{align*}
Since $Df|_F$ is continuous and $U_k$ is open in $\mathbb{R}^{n \times m}$, $B_k$ is relatively open in $F$, hence Borel. From step (ii), $\bigcup_{k \ge 1} B_k = F$. Disjointify by setting
\begin{align*}
A_k &:= B_k \setminus \bigcup_{j < k} B_j,
\end{align*}
so that $\{A_k\}_{k \ge 1}$ is a Borel partition of $F$. By construction, for every $x \in A_k$, $Df_x \in U_k$, hence ($*$) holds for $L = L_k$. Continuity of $Df|_F$ implies continuity of $Df|_{A_k}$ since $A_k \subseteq F$.
Iterating over $j$ (the Lusin index from Step (i)) and reindexing the resulting countable family of Borel sets and linear maps, we obtain the partition of $A_+$ modulo a null set, as claimed.
[/proof]
[/claim]
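A finite toy version of Steps (ii) and (iii) may help fix ideas. The sketch assumes $n = 2$, $m = 1$ (so a matrix is injective exactly when it is nonzero, and $\sigma_1$ and $\|\cdot\|_{\mathrm{op}}$ are the Euclidean norm), uses a hypothetical enumeration of rational matrices as $\mathcal{D}$, and assigns a sampled value $Df_x$ to the first $k$ with $Df_x \in U_k$. Taking the first such index is exactly the disjointification $A_k = B_k \setminus \bigcup_{j < k} B_j$. The sample matrices are made-up stand-ins, not values computed from any particular $f$.

```python
import itertools
import numpy as np

def enumerate_D(max_q: int = 64):
    """Hypothetical enumeration of a countable dense set of injective 2 x 1 matrices:
    entries a/q, b/q with |a|, |b| <= 4q. Repeats are harmless: a repeated L_k
    gives an empty piece after disjointification."""
    for q in range(1, max_q + 1):
        for a, b in itertools.product(range(-4 * q, 4 * q + 1), repeat=2):
            if (a, b) != (0, 0):  # a 2 x 1 matrix is injective iff it is nonzero
                yield np.array([[a / q], [b / q]])

def piece_of(M: np.ndarray, eps: float, max_q: int = 64):
    """Return (k, L_k) for the first k with ||M - L_k||_op < eps * sigma_1(L_k)."""
    for k, Lk in enumerate(enumerate_D(max_q), start=1):
        if np.linalg.norm(M - Lk, 2) < eps * np.linalg.norm(Lk, 2):
            return k, Lk
    raise ValueError("no L_k found; increase max_q")

# Illustrative stand-ins for values Df_x at a few sample points x.
samples = [np.array([[0.3], [-1.7]]), np.array([[2.0], [0.1]]), np.array([[-0.05], [0.9]])]
pieces = [piece_of(M, eps=0.1) for M in samples]
for M, (k, Lk) in zip(samples, pieces):
    print(M.ravel(), "-> piece", k, "with L_k =", Lk.ravel())
```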
We henceforth fix the partition $\{A_k\}_{k \ge 1}$ and the associated linear maps $\{L_k\}_{k \ge 1}$ from the claim. By construction, $Df|_{A_k}$ is continuous for each $k$.
[/step]
[step:Local bi-Lipschitz comparison on each $A_k$]
Fix $k \ge 1$. From the bound ($*$), for every $x \in A_k$ and $v \in \mathbb{R}^m$,
\begin{align*}
(1 - \varepsilon)|L_k v| &\le |Df_x v| \le (1 + \varepsilon)|L_k v|, \tag{$**$}
\end{align*}
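and, since $Df_x \in U_k$ for every $x \in A_k$ by the construction in Step 3, also $\|Df_x - L_k\|_{\mathrm{op}} < \varepsilon \sigma_m(L_k)$ pointwise on $A_k$. (This stronger bound does not follow from ($**$) alone.)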
[claim:Local bi-Lipschitz approximation]
For every $x_0 \in A_k$, there exists $r(x_0) > 0$ such that for all $x, y \in A_k \cap B(x_0, r(x_0))$,
\begin{align*}
(1 - 2\varepsilon)|L_k(y - x)| \le |f(y) - f(x)| \le (1 + 2\varepsilon)|L_k(y - x)|. \tag{$\dagger$}
\end{align*}
[proof]
The estimate ($*$) controls $Df_x$ only pointwise. To compare the increment $f(y) - f(x)$ with $L_k(y - x)$, we also need the differentiability of $f$ at points of $A_k$ to hold on balls whose radius does not shrink to zero near $x_0$. Continuity of $Df|_{A_k}$ alone does not provide such a radius, so we first refine the partition.
For $i \ge 1$ set
\begin{align*}
E_{k,i} &:= \Bigl\{x \in A_k : |f(z) - f(x) - Df_x(z - x)| \le \tfrac{\varepsilon}{2} \sigma_m(L_k) |z - x| \text{ for all } z \in B(x, 1/i)\Bigr\}.
\end{align*}
Since $f$ is differentiable at every point of $A_k$ and $\sigma_m(L_k) > 0$, the sets $E_{k,i}$ increase in $i$ and $\bigcup_{i \ge 1} E_{k,i} = A_k$. Each $E_{k,i}$ is relatively closed in $A_k$, hence Borel: if $x_l \in E_{k,i}$ and $x_l \to x \in A_k$, then every $z \in B(x, 1/i)$ lies in $B(x_l, 1/i)$ for all large $l$, and the defining inequality passes to the limit because $f$ and $Df|_{A_k}$ are continuous. Replace each $A_k$ by the Borel pieces $E_{k,i} \setminus E_{k,i-1}$, $i \ge 1$ (with $E_{k,0} := \varnothing$), each paired with the same linear map $L_k$, and reindex. The refined family is still a partition of $A_+$ modulo a null set, and it retains ($*$), the inclusion $Df_x \in U_k$ for $x \in A_k$, and continuity of $Df|_{A_k}$. In addition, each piece $A_k$ now comes with a radius $\rho_k > 0$ (namely $1/i$) such that
\begin{align*}
|f(z) - f(x) - Df_x(z - x)| &\le \tfrac{\varepsilon}{2} \sigma_m(L_k) |z - x| \quad \text{for all } x \in A_k \text{ and } z \in B(x, \rho_k). \tag{$\Delta$}
\end{align*}
We use this refined partition, with the same notation $\{A_k\}$ and $\{L_k\}$, for the rest of the argument.
Now set $r(x_0) := \rho_k / 2$ and fix $x, y \in A_k \cap B(x_0, r(x_0))$. Then $|y - x| < \rho_k$, so ($\Delta$) with $z = y$ gives
\begin{align*}
|f(y) - f(x) - Df_x(y - x)| &\le \tfrac{\varepsilon}{2} \sigma_m(L_k) |y - x|. \tag{$\Box$}
\end{align*}
Combining ($\Box$) with the inclusion $Df_x \in U_k$ (which gives $|Df_x(y - x) - L_k(y - x)| \le \|Df_x - L_k\|_{\mathrm{op}} |y - x| \le \varepsilon \sigma_m(L_k) |y - x| \le \varepsilon |L_k(y - x)|$), we obtain
\begin{align*}
|f(y) - f(x) - L_k(y - x)| &\le |f(y) - f(x) - Df_x(y - x)| + |Df_x(y - x) - L_k(y - x)| \\
&\le \tfrac{\varepsilon}{2} \sigma_m(L_k) |y - x| + \varepsilon \sigma_m(L_k) |y - x| \\
&\le 2 \varepsilon \sigma_m(L_k) |y - x| \le 2\varepsilon |L_k(y - x)|.
\end{align*}
Hence
\begin{align*}
|f(y) - f(x)| &\le |L_k(y - x)| + 2\varepsilon |L_k(y - x)| = (1 + 2\varepsilon) |L_k(y - x)|, \\
|f(y) - f(x)| &\ge |L_k(y - x)| - 2\varepsilon |L_k(y - x)| = (1 - 2\varepsilon) |L_k(y - x)|,
\end{align*}
which is ($\dagger$).
[/proof]
[/claim]
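A concrete instance of ($\dagger$), with choices made only for illustration: $m = 1$, $n = 2$, $f(t) = (t, t^2)$, $x_0 = 1/2$, $L_k := Df_{x_0} = (1, 1)^\top$, $\varepsilon = 0.05$ and radius $r = 0.05$. For $x \ne y$ the ratio $|f(y) - f(x)| / |L_k(y - x)|$ equals $\sqrt{1 + (x + y)^2}/\sqrt{2}$, and the grid check below confirms that it stays in $[1 - 2\varepsilon, 1 + 2\varepsilon]$ for $x, y \in B(x_0, r)$.

```python
import numpy as np

eps, x0, r = 0.05, 0.5, 0.05
Lk = np.array([1.0, 1.0])                       # Df at x0 for f(t) = (t, t^2)

def f(t: np.ndarray) -> np.ndarray:
    return np.stack([t, t**2], axis=-1)

t = np.linspace(x0 - r, x0 + r, 401)[1:-1]      # grid points strictly inside B(x0, r)
X, Y = np.meshgrid(t, t, indexing="ij")
mask = X != Y                                   # exclude the diagonal x = y
num = np.linalg.norm(f(Y) - f(X), axis=-1)[mask]
den = np.abs(Y - X)[mask] * np.linalg.norm(Lk)  # |L_k (y - x)| = |y - x| * |(1, 1)|
ratios = num / den
print(ratios.min(), ratios.max(), (1 - 2 * eps, 1 + 2 * eps))
```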
[/step]
[step:Per-piece area identity on each $A_k$]
[claim:Per-piece area identity]
For each $k \ge 1$,
\begin{align*}
(1 - 2\varepsilon)^m J_m L_k \cdot \mathcal{L}^m(A_k) &\le \int_{\mathbb{R}^n} \#(f^{-1}(y) \cap A_k) \, d\mathcal{H}^m(y) \le (1 + 2\varepsilon)^m J_m L_k \cdot \mathcal{L}^m(A_k). \tag{$\ddagger$}
\end{align*}
Moreover,
\begin{align*}
(1 - \varepsilon)^m J_m L_k \cdot \mathcal{L}^m(A_k) &\le \int_{A_k} J_m f \, d\mathcal{L}^m \le (1 + \varepsilon)^m J_m L_k \cdot \mathcal{L}^m(A_k). \tag{$\diamond\diamond$}
\end{align*}
[proof]
We first prove ($\ddagger$) using the [Vitali Covering Theorem](/theorems/2963).
By the local bi-Lipschitz approximation claim, the family
\begin{align*}
\mathcal{V} &:= \Bigl\{ \overline{B}(x_0, r) : x_0 \in A_k, \, 0 < r < r(x_0) \Bigr\}
\end{align*}
is a Vitali cover of $A_k$. Apply the [Vitali Covering Theorem](/theorems/2963): there exists a countable disjoint subfamily $\{\overline{B}(x_i, r_i)\}_{i \ge 1} \subseteq \mathcal{V}$ such that $\mathcal{L}^m(A_k \setminus \bigcup_i \overline{B}(x_i, r_i)) = 0$. Set $A_k^i := A_k \cap \overline{B}(x_i, r_i)$.
On each $A_k^i$, the bi-Lipschitz inequality ($\dagger$) holds. In particular, $f|_{A_k^i}$ is injective: the lower bound $|f(y) - f(x)| \ge (1 - 2\varepsilon)|L_k(y - x)| \ge (1 - 2\varepsilon)\sigma_m(L_k)|y - x|$ forces $|f(y) - f(x)| > 0$ when $y \ne x$ since $\sigma_m(L_k) > 0$ and $1 - 2\varepsilon > 0$ (assuming $\varepsilon < 1/2$, which we may assume by shrinking $\varepsilon$ if needed).
Define the bi-Lipschitz homeomorphism $g_i: L_k(A_k^i) \to f(A_k^i)$ by $g_i(w) := f(L_k|_{A_k^i}^{-1}(w))$ (well-defined since $L_k$ is injective). By ($\dagger$), for all $w_1, w_2 \in L_k(A_k^i)$,
\begin{align*}
(1 - 2\varepsilon)|w_1 - w_2| &\le |g_i(w_1) - g_i(w_2)| \le (1 + 2\varepsilon)|w_1 - w_2|.
\end{align*}
A bi-Lipschitz map with constants $1 \pm 2\varepsilon$ scales $\mathcal{H}^m$-measure by a factor between $(1 - 2\varepsilon)^m$ and $(1 + 2\varepsilon)^m$: for any $\delta$-cover $\{U_\alpha\}$ of $L_k(A_k^i)$, the family $\{g_i(U_\alpha \cap L_k(A_k^i))\}$ is a $(1 + 2\varepsilon)\delta$-cover of $f(A_k^i)$ with diameters at most $(1 + 2\varepsilon)\operatorname{diam}(U_\alpha)$, by the upper bound. Conversely, any $\delta'$-cover of $f(A_k^i)$ pulls back through $g_i^{-1}$ to a cover of $L_k(A_k^i)$ with diameters at most $\delta'/(1 - 2\varepsilon)$, by the lower bound. Taking infima in the definition of $\mathcal{H}^m_\delta$ and letting $\delta \downarrow 0$,
\begin{align*}
(1 - 2\varepsilon)^m \mathcal{H}^m(L_k(A_k^i)) \le \mathcal{H}^m(f(A_k^i)) \le (1 + 2\varepsilon)^m \mathcal{H}^m(L_k(A_k^i)).
\end{align*}
By the linearised area formula (Step 2), $\mathcal{H}^m(L_k(A_k^i)) = J_m L_k \cdot \mathcal{L}^m(A_k^i)$. Thus
\begin{align*}
(1 - 2\varepsilon)^m J_m L_k \cdot \mathcal{L}^m(A_k^i) \le \mathcal{H}^m(f(A_k^i)) \le (1 + 2\varepsilon)^m J_m L_k \cdot \mathcal{L}^m(A_k^i).
\end{align*}
Since $f|_{A_k^i}$ is injective,
\begin{align*}
\mathcal{H}^m(f(A_k^i)) &= \int_{\mathbb{R}^n} \mathbf{1}_{f(A_k^i)}(y) \, d\mathcal{H}^m(y) = \int_{\mathbb{R}^n} \#(f^{-1}(y) \cap A_k^i) \, d\mathcal{H}^m(y).
\end{align*}
Sum over $i$. The $A_k^i$ are pairwise disjoint and their union covers $A_k$ up to an $\mathcal{L}^m$-null set $Z_k$; since $\mathcal{H}^m = \mathcal{L}^m$ on $\mathbb{R}^m$, the [Lipschitz Bound on Hausdorff Measure](/theorems/2999) gives $\mathcal{H}^m(f(Z_k)) = 0$, so $Z_k$ contributes nothing on either side. The integrands $\#(f^{-1}(y) \cap A_k^i)$ are nonnegative, and by disjointness their sum over $i$ equals $\#(f^{-1}(y) \cap A_k)$ for every $y \notin f(Z_k)$, hence for $\mathcal{H}^m$-a.e. $y$. By the monotone convergence theorem applied to the increasing partial sums,
\begin{align*}
\sum_{i \ge 1} \int_{\mathbb{R}^n} \#(f^{-1}(y) \cap A_k^i) \, d\mathcal{H}^m(y) &= \int_{\mathbb{R}^n} \#(f^{-1}(y) \cap A_k) \, d\mathcal{H}^m(y),
\end{align*}
and by countable additivity of $\mathcal{L}^m$, $\sum_i \mathcal{L}^m(A_k^i) = \mathcal{L}^m(A_k)$. Combining gives ($\ddagger$).
For ($\diamond\diamond$): with $L = L_k$ and $M = Df_x$ both injective and $M \in U_k$, the bound ($**$) gives, for each unit vector $v$, $(1 - \varepsilon)|L_k v| \le |Df_x v| \le (1 + \varepsilon)|L_k v|$. By the min-max characterisation of singular values, $(1 - \varepsilon)\sigma_i(L_k) \le \sigma_i(Df_x) \le (1 + \varepsilon)\sigma_i(L_k)$ for each $i = 1, \ldots, m$. Taking the product over $i$,
\begin{align*}
(1 - \varepsilon)^m J_m L_k = (1 - \varepsilon)^m \prod_{i=1}^m \sigma_i(L_k) \le \prod_{i=1}^m \sigma_i(Df_x) = J_m f(x) \le (1 + \varepsilon)^m J_m L_k.
\end{align*}
Integrating over $A_k$ gives ($\diamond\diamond$).
[/proof]
[/claim]
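Weyl's inequality $|\sigma_i(M) - \sigma_i(L)| \le \|M - L\|_{\mathrm{op}}$ gives an alternative route to the singular-value bounds above whenever $\|M - L_k\|_{\mathrm{op}} < \varepsilon \sigma_m(L_k)$, which holds for $M = Df_x \in U_k$. The sketch below checks the per-index bounds and the resulting Jacobian sandwich for random matrices; the sizes $n = 5$, $m = 3$ and $\varepsilon = 0.2$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
eps, n, m = 0.2, 5, 3
checks = []
for _ in range(1000):
    Lk = rng.standard_normal((n, m))
    s_L = np.linalg.svd(Lk, compute_uv=False)          # singular values, descending
    E = rng.standard_normal((n, m))
    E *= 0.999 * eps * s_L[-1] / np.linalg.norm(E, 2)  # ||M - Lk||_op < eps * sigma_m(Lk)
    s_M = np.linalg.svd(Lk + E, compute_uv=False)
    per_index = np.all((1 - eps) * s_L <= s_M) and np.all(s_M <= (1 + eps) * s_L)
    J_L, J_M = np.prod(s_L), np.prod(s_M)              # J_m L_k and J_m M
    jacobian = (1 - eps) ** m * J_L <= J_M <= (1 + eps) ** m * J_L
    checks.append(bool(per_index and jacobian))
print(all(checks))
```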
[/step]
[step:Show that the singular set $A_0$ contributes zero]
[claim:Singular set contributes zero]
\begin{align*}
\int_{A_0} J_m f \, d\mathcal{L}^m = 0 \quad \text{and} \quad \int_{\mathbb{R}^n} \#(f^{-1}(y) \cap A_0) \, d\mathcal{H}^m(y) = 0.
\end{align*}
[proof]
Left-hand side: $J_m f(x) = 0$ for $x \in A_0$ by definition, so the integral vanishes.
Right-hand side: it suffices to show $\mathcal{H}^m(f(A_0)) = 0$, which forces $\#(f^{-1}(y) \cap A_0) = 0$ for $\mathcal{H}^m$-a.e. $y$. Since $f(A_0) = \bigcup_{R \in \mathbb{N}} f(A_0 \cap B(0, R))$ and $\mathcal{H}^m$ is countably subadditive, it suffices to treat each bounded set $A_0 \cap B(0, R)$. We may therefore assume that $A_0$ is bounded, and we set $\widetilde{A_0} := \{x \in \mathbb{R}^m : \operatorname{dist}(x, A_0) < 1\}$, a bounded open neighbourhood of $A_0$ with $\mathcal{L}^m(\widetilde{A_0}) < \infty$.
Fix $\eta \in (0, 1)$ and a scale $\delta_0 \in (0, 1]$. For each $x_0 \in A_0$, $Df_{x_0}: \mathbb{R}^m \to \mathbb{R}^n$ has $\operatorname{rank}(Df_{x_0}) < m$ (since $J_m f(x_0) = 0$). Thus $V_{x_0} := Df_{x_0}(\mathbb{R}^m) \subseteq \mathbb{R}^n$ is a linear subspace of dimension at most $m - 1$. By differentiability of $f$ at $x_0$, there exists $r(x_0) \in (0, \delta_0]$ such that for every $z \in B(x_0, r(x_0))$,
\begin{align*}
|f(z) - f(x_0) - Df_{x_0}(z - x_0)| &\le \eta |z - x_0|.
\end{align*}
The upper bound $r(x_0) \le \delta_0$ is the key refinement: every ball in the Vitali cover below has radius less than $\delta_0$.
The family $\mathcal{V}_0 := \{\overline{B}(x_0, r) : x_0 \in A_0, \, 0 < r < r(x_0)\}$ is a Vitali cover of $A_0$ by closed balls of radius $< \delta_0$. By the [Vitali Covering Theorem](/theorems/2963), there exists a countable disjoint subfamily $\{\overline{B}(y_j, s_j)\}_{j \ge 1} \subseteq \mathcal{V}_0$ such that $\mathcal{L}^m(A_0 \setminus \bigcup_j \overline{B}(y_j, s_j)) = 0$. Since each centre $y_j$ lies in $A_0$ and $s_j < \delta_0 \le 1$, every ball $\overline{B}(y_j, s_j)$ is contained in $\widetilde{A_0}$.
For each $j$, $f$ maps $\overline{B}(y_j, s_j)$ into a thin neighbourhood of the affine subspace $f(y_j) + V_{y_j}$, which has dimension at most $m - 1$. Write $L = \operatorname{Lip}(f)$, which bounds the operator norm of $Df_{y_j}$; enlarging $L$ if necessary (an $L$-Lipschitz map is also $\max(L, 1)$-Lipschitz), we assume $L \ge 1 > \eta$. Since $s_j < r(y_j)$, for $z \in \overline{B}(y_j, s_j)$,
\begin{align*}
|f(z) - f(y_j) - Df_{y_j}(z - y_j)| &\le \eta s_j, & |Df_{y_j}(z - y_j)| &\le L s_j.
\end{align*}
Splitting the error $f(z) - f(y_j) - Df_{y_j}(z - y_j)$ into its components in $V_{y_j}$ and $V_{y_j}^\perp$, and using $(L + \eta) s_j \le 2 L s_j$, we get $f(\overline{B}(y_j, s_j)) \subseteq f(y_j) + S_j$, where
\begin{align*}
S_j &:= \{w + u : w \in V_{y_j}, |w| \le 2 L s_j; \; u \in V_{y_j}^\perp, |u| \le \eta s_j\}.
\end{align*}
To cover $S_j$, choose an $(m - 1)$-dimensional subspace $W_j \supseteq V_{y_j}$ with orthonormal coordinates. The ball $\{w \in W_j : |w| \le 2 L s_j\}$ lies in a cube of side $4 L s_j$, which is the union of $\lceil 4L/\eta \rceil^{m-1} \le (5L/\eta)^{m-1}$ subcubes of side at most $\eta s_j$ (using $L/\eta \ge 1$). Each subcube has diameter at most $\sqrt{m - 1}\, \eta s_j$, so the part of $S_j$ whose $w$-component lies in a given subcube has diameter at most $(\sqrt{m - 1} + 2)\eta s_j = 2 C_2(m, n) \eta s_j$, where $C_2(m, n) := \tfrac{1}{2}(\sqrt{m - 1} + 2)$. Hence $f(\overline{B}(y_j, s_j))$ is covered by at most $(5L/\eta)^{m-1}$ sets of diameter at most $2 C_2(m, n) \eta s_j$, each contributing at most $\omega_m (C_2(m, n) \eta s_j)^m$ to $\mathcal{H}^m_\delta$ whenever $\delta \ge 2 C_2(m, n) \eta s_j$. Since $\eta s_j \le L s_j \le L \delta_0$, all these diameters are at most $2 C_2(m, n) L \delta_0$, so
\begin{align*}
\mathcal{H}^m_{2 C_2 L \delta_0}(f(\overline{B}(y_j, s_j))) &\le (5L/\eta)^{m-1} \omega_m (C_2(m, n) \eta s_j)^m = C'(m, n) L^{m-1} \eta s_j^m,
\end{align*}
where $C'(m, n) := 5^{m-1} \omega_m C_2(m, n)^m$ is a dimensional constant.
Sum over $j$. Since the balls $\overline{B}(y_j, s_j)$ are pairwise disjoint and $\sum_j \mathcal{L}^m(\overline{B}(y_j, s_j)) = \omega_m \sum_j s_j^m \le \mathcal{L}^m(\widetilde{A_0})$,
\begin{align*}
\sum_j s_j^m &\le \omega_m^{-1} \mathcal{L}^m(\widetilde{A_0}).
\end{align*}
By countable subadditivity of $\mathcal{H}^m_\delta$, applied with $\delta := 2 C_2(m, n) L \delta_0$,
\begin{align*}
\mathcal{H}^m_{2 C_2 L \delta_0}(f(A_0)) &\le \mathcal{H}^m_{2 C_2 L \delta_0}\Bigl(\bigcup_j f(\overline{B}(y_j, s_j))\Bigr) + \mathcal{H}^m_{2 C_2 L \delta_0}\Bigl(f\Bigl(A_0 \setminus \bigcup_j \overline{B}(y_j, s_j)\Bigr)\Bigr) \\
&\le \sum_j \mathcal{H}^m_{2 C_2 L \delta_0}(f(\overline{B}(y_j, s_j))) + 0 \\
&\le \sum_j C'(m, n) L^{m-1} \eta s_j^m \le C'(m, n) L^{m-1} \omega_m^{-1} \eta \cdot \mathcal{L}^m(\widetilde{A_0}),
\end{align*}
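where the residual set $A_0 \setminus \bigcup_j \overline{B}(y_j, s_j)$ has $\mathcal{L}^m$-measure zero, hence $\mathcal{H}^m$-measure zero; by the [Lipschitz Bound on Hausdorff Measure](/theorems/2999) its image has $\mathcal{H}^m$-measure zero, and therefore also $\mathcal{H}^m_{2 C_2 L \delta_0}$-measure zero, since $\mathcal{H}^m_\delta \le \mathcal{H}^m$.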
The bound is uniform in $\delta_0$. Letting $\delta_0 \downarrow 0$, the parameter $2 C_2(m, n) L \delta_0 \downarrow 0$, and $\mathcal{H}^m_{\delta} \uparrow \mathcal{H}^m$ as $\delta \downarrow 0$. Therefore
\begin{align*}
\mathcal{H}^m(f(A_0)) &= \lim_{\delta_0 \downarrow 0} \mathcal{H}^m_{2 C_2 L \delta_0}(f(A_0)) \le C'(m, n) L^{m-1} \omega_m^{-1} \eta \cdot \mathcal{L}^m(\widetilde{A_0}).
\end{align*}
The right-hand side is finite (since $\widetilde{A_0}$ has finite measure) and depends on $\eta$ but not on $\delta_0$. Letting $\eta \downarrow 0$ gives $\mathcal{H}^m(f(A_0)) = 0$, as required.
[/proof]
[/claim]
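To see the mechanism of this step in the simplest case, consider the illustrative example $m = n = 2$ and $f(x_1, x_2) = (x_1, x_1^2)$, for which $Df_x$ has rank $1$ and $J_2 f \equiv 0$. The image $f([0, 1]^2)$ is the parabola arc $\{(t, t^2) : t \in [0, 1]\}$. The sketch builds an explicit cover by grid squares of side $h = 1/N$, column by column (valid because the arc is monotone). The count telescopes to exactly $2N$, so $\mathcal{H}^2_{\sqrt{2} h}$ of the image is at most $2N \cdot \omega_2 (\sqrt{2} h / 2)^2 = \pi / N \to 0$.

```python
from math import pi

def cover_count(N: int) -> int:
    """Number of squares [i/N, (i+1)/N] x [j/N, (j+1)/N] used to cover {(t, t^2) : 0 <= t <= 1}.

    In column i the arc has heights in [i^2/N^2, (i+1)^2/N^2], so rows j from
    i^2 // N to (i+1)^2 // N suffice (integer arithmetic avoids rounding issues)."""
    return sum((i + 1) ** 2 // N - i ** 2 // N + 1 for i in range(N))

def hausdorff_bound(N: int) -> float:
    """Upper bound for H^2_delta(image), delta = sqrt(2)/N: count * omega_2 * (diam / 2)^2."""
    h = 1.0 / N
    return cover_count(N) * pi * (h * 2 ** 0.5 / 2) ** 2

for N in (10, 100, 1000, 10000):
    print(N, cover_count(N), hausdorff_bound(N))
```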
[/step]
[step:Sum over $k$ and pass to the limit $\varepsilon \to 0$]
By the per-piece estimates ($\diamond\diamond$) and ($\ddagger$), summing over $k \ge 1$ and using countable additivity of the integral over the disjoint partition $A_+ = \bigsqcup_k A_k$ (modulo a null set),
\begin{align*}
(1 - \varepsilon)^m \sum_{k \ge 1} J_m L_k \cdot \mathcal{L}^m(A_k) &\le \int_{A_+} J_m f \, d\mathcal{L}^m \le (1 + \varepsilon)^m \sum_{k \ge 1} J_m L_k \cdot \mathcal{L}^m(A_k), \\
(1 - 2\varepsilon)^m \sum_{k \ge 1} J_m L_k \cdot \mathcal{L}^m(A_k) &\le \int_{\mathbb{R}^n} \sum_{k \ge 1} \#(f^{-1}(y) \cap A_k) \, d\mathcal{H}^m(y) \le (1 + 2\varepsilon)^m \sum_{k \ge 1} J_m L_k \cdot \mathcal{L}^m(A_k),
\end{align*}
where the middle term of the second display is obtained from $\sum_{k \ge 1} \int_{\mathbb{R}^n} \#(f^{-1}(y) \cap A_k) \, d\mathcal{H}^m(y)$ by the monotone convergence theorem applied to the increasing partial sums $\sum_{k \le K} \#(f^{-1}(y) \cap A_k)$.
By the disjoint-partition identity,
\begin{align*}
\sum_{k \ge 1} \#(f^{-1}(y) \cap A_k) &= \#\Bigl(f^{-1}(y) \cap \bigsqcup_{k \ge 1} A_k\Bigr) = \#(f^{-1}(y) \cap A_+),
\end{align*}
for $\mathcal{H}^m$-a.e. $y \in \mathbb{R}^n$, since the $A_k$ are pairwise disjoint and their union is $A_+$ modulo a null set (which contributes nothing for $\mathcal{H}^m$-a.e. $y$ by the [Lipschitz Bound on Hausdorff Measure](/theorems/2999)).
By the singular-set claim of Step 6, $\int_{A_0} J_m f \, d\mathcal{L}^m = 0$ and $\int_{\mathbb{R}^n} \#(f^{-1}(y) \cap A_0) \, d\mathcal{H}^m(y) = 0$, so
\begin{align*}
\int_A J_m f \, d\mathcal{L}^m &= \int_{A_+} J_m f \, d\mathcal{L}^m, \\
\int_{\mathbb{R}^n} N(f, A, y) \, d\mathcal{H}^m(y) &= \int_{\mathbb{R}^n} \#(f^{-1}(y) \cap A_+) \, d\mathcal{H}^m(y).
\end{align*}
Combining the displayed inequalities, the common quantity $\sum_{k \ge 1} J_m L_k \cdot \mathcal{L}^m(A_k)$ controls both $\int_A J_m f \, d\mathcal{L}^m$ and $\int_{\mathbb{R}^n} N(f, A, y) \, d\mathcal{H}^m(y)$ within multiplicative factors $(1 \pm \varepsilon)^m$ and $(1 \pm 2\varepsilon)^m$ respectively. Comparing,
\begin{align*}
\Bigl(\frac{1 - 2\varepsilon}{1 + \varepsilon}\Bigr)^m \int_A J_m f \, d\mathcal{L}^m \le \int_{\mathbb{R}^n} N(f, A, y) \, d\mathcal{H}^m(y) \le \Bigl(\frac{1 + 2\varepsilon}{1 - \varepsilon}\Bigr)^m \int_A J_m f \, d\mathcal{L}^m.
\end{align*}
Neither integral depends on $\varepsilon$; only the partition does. If $\int_A J_m f \, d\mathcal{L}^m = \infty$, the display shows that both sides are infinite. Otherwise, since $\big(\frac{1 - 2\varepsilon}{1 + \varepsilon}\big)^m \to 1$ and $\big(\frac{1 + 2\varepsilon}{1 - \varepsilon}\big)^m \to 1$ as $\varepsilon \downarrow 0$, sending $\varepsilon \downarrow 0$ on both sides yields
\begin{align*}
\int_A J_m f(x) \, d\mathcal{L}^m(x) &= \int_{\mathbb{R}^n} N(f, A, y) \, d\mathcal{H}^m(y).
\end{align*}
This is the area formula, completing the proof.
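The comparison factors approach $1$ linearly in $\varepsilon$: to first order, $\bigl(\frac{1 + 2\varepsilon}{1 - \varepsilon}\bigr)^m \approx 1 + 3m\varepsilon$ and $\bigl(\frac{1 - 2\varepsilon}{1 + \varepsilon}\bigr)^m \approx 1 - 3m\varepsilon$. The short sketch below tabulates both for $m = 2$, an arbitrary choice.

```python
m = 2
rows = []
for eps in (0.1, 0.01, 0.001):
    lower = ((1 - 2 * eps) / (1 + eps)) ** m
    upper = ((1 + 2 * eps) / (1 - eps)) ** m
    rows.append((eps, lower, upper))
    print(f"eps={eps:<6} lower={lower:.6f} upper={upper:.6f} 1+3m*eps={1 + 3 * m * eps:.6f}")
```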
[/step]