[proofplan]
We split into two cases. If $A$ is singular, then $A(\mathbb{R}^n)$ is contained in a proper subspace of $\mathbb{R}^n$, which has $\mathcal{L}^n$-measure zero, so both sides vanish. If $A$ is invertible, we factor $A$ as a product of elementary matrices using row reduction, prove the formula for each type of elementary matrix by direct computation, and compose the results using the multiplicativity of the determinant.
[/proofplan]
[step:Handle the singular case: $\det A = 0$]
Suppose $\det A = 0$. By the [Determinant Invertibility Criterion](/theorems/396), $A$ is not invertible, so $\ker A \neq \{0\}$ and $\operatorname{rank}(A) \leq n - 1$. The image $A(\mathbb{R}^n)$ is therefore a linear subspace of $\mathbb{R}^n$ of dimension at most $n - 1$. Since $A(S) \subset A(\mathbb{R}^n)$, the set $A(S)$ is contained in a proper linear subspace of $\mathbb{R}^n$.
A linear subspace $V \subset \mathbb{R}^n$ of dimension $k < n$ has $\mathcal{L}^n$-measure zero: $V$ is contained in a hyperplane $\{x : a \cdot x = 0\}$ with $a \neq 0$, and if $a_l \neq 0$ then every line parallel to the $x_l$-axis meets this hyperplane in exactly one point, so the hyperplane is $\mathcal{L}^n$-null by Fubini's theorem. Hence $A(S)$ is a subset of a null set, so it is Lebesgue measurable and $\mathcal{L}^n(A(S)) = 0 = |\det A| \cdot \mathcal{L}^n(S)$ (with the convention $0 \cdot \infty = 0$ if $\mathcal{L}^n(S) = \infty$).
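For a concrete instance (the matrix is chosen here purely for illustration and is not part of the argument above), take $n = 2$ and
\begin{align*}
A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}, \qquad \det A = 1 \cdot 4 - 2 \cdot 2 = 0, \qquad A\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = (x_1 + 2x_2)\begin{pmatrix} 1 \\ 2 \end{pmatrix}.
\end{align*}
Every image point lies on the line $\{(t, 2t) : t \in \mathbb{R}\}$, which has $\mathcal{L}^2$-measure zero, so $\mathcal{L}^2(A(S)) = 0$ for every $S \subset \mathbb{R}^2$, as the formula predicts.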
[/step]
[step:Reduce the invertible case to elementary matrices]
Suppose $\det A \neq 0$, so $A$ is invertible. By Gaussian elimination, $A$ can be written as a product of elementary matrices:
\begin{align*}
A = E_1 E_2 \cdots E_m,
\end{align*}
where each $E_l$ corresponds to one of the three elementary row operations (row swap, row scaling, row addition). If we establish the formula $\mathcal{L}^n(E(S)) = |\det E| \cdot \mathcal{L}^n(S)$ for each elementary matrix $E$ and every measurable set $S$, then the general result follows by induction on the number of factors:
\begin{align*}
\mathcal{L}^n(A(S)) &= \mathcal{L}^n(E_1(E_2(\cdots E_m(S) \cdots))) \\
&= |\det E_1| \cdot \mathcal{L}^n(E_2(\cdots E_m(S) \cdots)) \\
&= |\det E_1| \cdot |\det E_2| \cdots |\det E_m| \cdot \mathcal{L}^n(S) \\
&= |\det E_1 \cdot \det E_2 \cdots \det E_m| \cdot \mathcal{L}^n(S) \\
&= |\det(E_1 E_2 \cdots E_m)| \cdot \mathcal{L}^n(S) = |\det A| \cdot \mathcal{L}^n(S),
\end{align*}
where the penultimate equality uses [Determinant Multiplicativity](/theorems/395) applied iteratively.
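As a small illustration of such a factorization (the matrix is an assumed example, not taken from the text), one can check by direct multiplication that
\begin{align*}
\begin{pmatrix} 2 & 1 \\ 4 & 5 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}
\begin{pmatrix} 1 & \tfrac{1}{2} \\ 0 & 1 \end{pmatrix}.
\end{align*}
The factors are a row addition, two row scalings, and a row addition, with determinants $1$, $2$, $3$, $1$. Their product is $6 = 2 \cdot 5 - 1 \cdot 4$, the determinant of the left-hand side, so for this matrix the chain of equalities reads $\mathcal{L}^2(A(S)) = 1 \cdot 2 \cdot 3 \cdot 1 \cdot \mathcal{L}^2(S) = 6 \, \mathcal{L}^2(S)$.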
[guided]
When $A$ is invertible, the strategy is to avoid working with the linear map $A$ directly and instead decompose it into elementary operations whose effect on volume is easy to compute.
Every invertible matrix can be row-reduced to the identity: there exist elementary matrices $E_1, \dots, E_m$ such that $E_m \cdots E_2 E_1 A = I$. Equivalently, $A = E_1^{-1} E_2^{-1} \cdots E_m^{-1}$. Since the inverse of an elementary matrix is again elementary (the inverse of a row swap is the same swap; the inverse of scaling row $i$ by $\lambda$ is scaling by $\lambda^{-1}$; the inverse of adding $\lambda$ times row $j$ to row $i$ is adding $-\lambda$ times row $j$ to row $i$), we can write $A = E_1' E_2' \cdots E_m'$ as a product of elementary matrices.
If we prove $\mathcal{L}^n(E(S)) = |\det E| \cdot \mathcal{L}^n(S)$ for each elementary type, then composing gives
\begin{align*}
\mathcal{L}^n(A(S)) &= |\det E_1'| \cdots |\det E_m'| \cdot \mathcal{L}^n(S).
\end{align*}
By [Determinant Multiplicativity](/theorems/395), $\det A = \det E_1' \cdots \det E_m'$, so $|\det E_1'| \cdots |\det E_m'| = |\det A|$, completing the argument.
[/guided]
[/step]
[step:Verify the formula for each type of elementary matrix]
We verify $\mathcal{L}^n(E(S)) = |\det E| \cdot \mathcal{L}^n(S)$ for each of the three types of elementary row operations. Denote the standard coordinates on $\mathbb{R}^n$ by $(x_1, \dots, x_n)$.
**Type 1: Row swap.** Let $E$ swap coordinates $i$ and $j$. This map permutes the coordinate axes without stretching any of them. By the [Effect of Row Operations on the Determinant](/theorems/3298), $\det E = -1$, so $|\det E| = 1$. The map $E: \mathbb{R}^n \to \mathbb{R}^n$ is the permutation $(x_1, \dots, x_i, \dots, x_j, \dots, x_n) \mapsto (x_1, \dots, x_j, \dots, x_i, \dots, x_n)$. Since $\mathcal{L}^n$ is invariant under permutations of coordinates (the Lebesgue measure on $\mathbb{R}^n$ is the $n$-fold product $\mathcal{L}^1 \otimes \cdots \otimes \mathcal{L}^1$, and permuting the factors does not change the product measure), we have $\mathcal{L}^n(E(S)) = \mathcal{L}^n(S) = 1 \cdot \mathcal{L}^n(S) = |\det E| \cdot \mathcal{L}^n(S)$.
**Type 2: Row scaling.** Let $E$ multiply coordinate $i$ by $\lambda \in \mathbb{R} \setminus \{0\}$. By the [Effect of Row Operations on the Determinant](/theorems/3298), $\det E = \lambda$, so $|\det E| = |\lambda|$. The map acts as
\begin{align*}
E(x_1, \dots, x_n) = (x_1, \dots, x_{i-1}, \lambda x_i, x_{i+1}, \dots, x_n).
\end{align*}
By Fubini's theorem (in Tonelli's form, applied to the indicator function of $E(S)$), $\mathcal{L}^n(E(S))$ is the integral, over the remaining $n - 1$ coordinates, of the $\mathcal{L}^1$-measure of the slices of $E(S)$ in the $x_i$-direction. Each such slice is $\lambda$ times the corresponding slice of $S$, and the scaling $x_i \mapsto \lambda x_i$ multiplies $\mathcal{L}^1$-measure by $|\lambda|$, while all other coordinates are unchanged. Therefore $\mathcal{L}^n(E(S)) = |\lambda| \cdot \mathcal{L}^n(S) = |\det E| \cdot \mathcal{L}^n(S)$.
**Type 3: Row addition (shear).** Let $E$ add $\lambda$ times coordinate $j$ to coordinate $i$ (with $i \neq j$). By the [Effect of Row Operations on the Determinant](/theorems/3298), $\det E = 1$, so $|\det E| = 1$. The map acts as
\begin{align*}
E(x_1, \dots, x_n) = (x_1, \dots, x_{i-1}, x_i + \lambda x_j, x_{i+1}, \dots, x_n).
\end{align*}
For each fixed value of the coordinates $(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$, the map $x_i \mapsto x_i + \lambda x_j$ is a translation in the $x_i$-variable (since $x_j$ is fixed). Lebesgue measure $\mathcal{L}^1$ is translation-invariant, so this map preserves $\mathcal{L}^1$-measure in the $i$-th coordinate for each fixed value of the remaining coordinates. By Fubini's theorem, $\mathcal{L}^n(E(S)) = \mathcal{L}^n(S) = |\det E| \cdot \mathcal{L}^n(S)$.
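To see the Fubini argument in a concrete case (an assumed example with $n = 2$, $i = 1$, $j = 2$, and $S = [0,1]^2$), note that $E(S)$ is the parallelogram whose horizontal slice at height $x_2 \in [0,1]$ is the interval $[\lambda x_2, 1 + \lambda x_2]$:
\begin{align*}
\mathcal{L}^2(E(S)) = \int_0^1 \mathcal{L}^1\big([\lambda x_2, 1 + \lambda x_2]\big) \, dx_2 = \int_0^1 1 \, dx_2 = 1 = \mathcal{L}^2(S).
\end{align*}
Each slice is shifted by $\lambda x_2$ but keeps length $1$, so the area is unchanged no matter how large $|\lambda|$ is.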
[guided]
We need to verify $\mathcal{L}^n(E(S)) = |\det E| \cdot \mathcal{L}^n(S)$ for each of the three types of elementary matrices. The strategy is to exploit the product structure of Lebesgue measure: $\mathcal{L}^n = \mathcal{L}^1 \otimes \cdots \otimes \mathcal{L}^1$. A scaling or a shear changes only one coordinate, so Fubini's theorem reduces these cases to a one-dimensional problem; a swap merely permutes the factors of the product. Denote the standard coordinates on $\mathbb{R}^n$ by $(x_1, \dots, x_n)$.
**Type 1 (Row swap):** Let $E$ swap coordinates $i$ and $j$. By the [Effect of Row Operations on the Determinant](/theorems/3298), $\det E = -1$, so $|\det E| = 1$. The map $E: \mathbb{R}^n \to \mathbb{R}^n$ acts as
\begin{align*}
E(x_1, \dots, x_i, \dots, x_j, \dots, x_n) = (x_1, \dots, x_j, \dots, x_i, \dots, x_n).
\end{align*}
Why does this map preserve $\mathcal{L}^n$-measure? The $n$-dimensional Lebesgue measure is the product $\mathcal{L}^n = \mathcal{L}^1 \otimes \cdots \otimes \mathcal{L}^1$, and permuting the factors of a product measure does not change the measure of any measurable set. Formally, for any measurable rectangle $R = R_1 \times \cdots \times R_n$, we have $\mathcal{L}^n(E(R)) = \prod_{k=1}^{n} \mathcal{L}^1(R_{\tau(k)}) = \prod_{k=1}^{n} \mathcal{L}^1(R_k) = \mathcal{L}^n(R)$, where $\tau = (i\;j)$. The result extends from rectangles to all Borel sets by the uniqueness theorem for $\sigma$-finite measures (both $S \mapsto \mathcal{L}^n(E(S))$ and $S \mapsto \mathcal{L}^n(S)$ are measures that agree on measurable rectangles, which form a $\pi$-system generating the Borel $\sigma$-algebra), and then to all Lebesgue measurable sets by completion. Therefore $\mathcal{L}^n(E(S)) = \mathcal{L}^n(S) = 1 \cdot \mathcal{L}^n(S) = |\det E| \cdot \mathcal{L}^n(S)$.
**Type 2 (Row scaling by $\lambda \neq 0$):** Let $E$ multiply coordinate $i$ by $\lambda \in \mathbb{R} \setminus \{0\}$. By the [Effect of Row Operations on the Determinant](/theorems/3298), $\det E = \lambda$, so $|\det E| = |\lambda|$. The map acts as
\begin{align*}
E(x_1, \dots, x_n) = (x_1, \dots, x_{i-1}, \lambda x_i, x_{i+1}, \dots, x_n).
\end{align*}
By Fubini's theorem (in Tonelli's form, applied to the indicator function of $E(S)$), we can evaluate $\mathcal{L}^n(E(S))$ by integrating over the product structure: first in the $i$-th coordinate, then in the remaining $n-1$ coordinates. The one-dimensional scaling $t \mapsto \lambda t$ satisfies $\mathcal{L}^1(\lambda B) = |\lambda| \cdot \mathcal{L}^1(B)$ for any measurable $B \subset \mathbb{R}$. This is the standard homogeneity property of Lebesgue measure: if $B$ is an interval of length $\ell$, then $\lambda B$ is an interval of length $|\lambda| \ell$, and the result extends to all measurable sets because $\mathcal{L}^1$ is defined through coverings by countably many intervals. Since all other coordinates are unchanged, integrating over the remaining coordinates gives $\mathcal{L}^n(E(S)) = |\lambda| \cdot \mathcal{L}^n(S) = |\det E| \cdot \mathcal{L}^n(S)$.
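The integration can be written out as follows. The notation is introduced here for illustration only: $x' \in \mathbb{R}^{n-1}$ denotes the coordinates other than $x_i$, and $S_{x'} = \{t \in \mathbb{R} : (x_1, \dots, x_{i-1}, t, x_{i+1}, \dots, x_n) \in S\}$ is the slice of $S$ over $x'$, which is measurable for almost every $x'$. Since $E$ changes only the $i$-th coordinate, $(E(S))_{x'} = \lambda S_{x'}$, and so
\begin{align*}
\mathcal{L}^n(E(S)) &= \int_{\mathbb{R}^{n-1}} \mathcal{L}^1\big((E(S))_{x'}\big) \, dx' \\
&= \int_{\mathbb{R}^{n-1}} \mathcal{L}^1(\lambda S_{x'}) \, dx' \\
&= |\lambda| \int_{\mathbb{R}^{n-1}} \mathcal{L}^1(S_{x'}) \, dx' = |\lambda| \cdot \mathcal{L}^n(S).
\end{align*}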
**Type 3 (Shear: add $\lambda$ times coordinate $j$ to coordinate $i$):** Let $E$ add $\lambda$ times coordinate $j$ to coordinate $i$ (with $i \neq j$). By the [Effect of Row Operations on the Determinant](/theorems/3298), $\det E = 1$, so $|\det E| = 1$. This is the most subtle case geometrically --- a shear distorts shapes but does not change volume. The map acts as
\begin{align*}
E(x_1, \dots, x_n) = (x_1, \dots, x_{i-1}, x_i + \lambda x_j, x_{i+1}, \dots, x_n).
\end{align*}
For each fixed value of the $n-1$ coordinates $(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$, the transformation in the $x_i$-variable is a translation $x_i \mapsto x_i + c$ where $c = \lambda x_j$ is a constant (because $x_j$ is among the fixed coordinates, not the variable being transformed). Since Lebesgue measure $\mathcal{L}^1$ is translation-invariant, this map preserves $\mathcal{L}^1$-measure in the $i$-th coordinate for each fixed value of the remaining coordinates. By Fubini's theorem, integrating over the remaining coordinates gives $\mathcal{L}^n(E(S)) = \mathcal{L}^n(S) = |\det E| \cdot \mathcal{L}^n(S)$.
The key insight across all three cases is that Lebesgue measure has three fundamental structural properties (permutation invariance, scaling homogeneity, and translation invariance), and each elementary row operation exploits exactly one of them.
[/guided]
[/step]
[step:Combine the elementary factors to conclude]
By the decomposition in the previous steps, for any invertible $A = E_1 \cdots E_m$:
\begin{align*}
\mathcal{L}^n(A(S)) = |\det E_1| \cdots |\det E_m| \cdot \mathcal{L}^n(S) = |\det A| \cdot \mathcal{L}^n(S).
\end{align*}
Together with the singular case ($\det A = 0$), this establishes $\mathcal{L}^n(A(S)) = |\det A| \cdot \mathcal{L}^n(S)$ for all $A \in M_{n \times n}(\mathbb{R})$ and all measurable $S \subset \mathbb{R}^n$.
Setting $S = [0,1]^n$ gives $\mathcal{L}^n(A([0,1]^n)) = |\det A| \cdot 1 = |\det A|$, recovering the geometric interpretation: the absolute value of the determinant is the volume of the image of the unit cube.
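A quick sanity check in the plane (the matrix is an assumed example, chosen so that the area can be read off by elementary geometry):
\begin{align*}
A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}, \qquad A([0,1]^2) = \text{parallelogram with vertices } (0,0),\ (2,0),\ (3,3),\ (1,3).
\end{align*}
This parallelogram has base $2$ along the $x_1$-axis and height $3$, hence area $6$, which agrees with $|\det A| = |2 \cdot 3 - 1 \cdot 0| = 6$.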
[/step]