[proofplan]
The strategy is a two-step approximation, both legs of which are standard. First, we mollify $u$ to obtain a smooth $W^{1,1}$ function $w := u_{\delta_0}$ that approximates $u$ in $L^1$ and whose gradient $L^1$-norm approximates the total variation $|Du|(\mathbb{R}^n)$; this is the strict-convergence theorem for the mollification of a $BV$ function. Second, we apply the $C^1$-approximation theorem for $W^{1,1}$ functions (theorem 3092) to $w$ with tolerance $\varepsilon/3$, producing $g \in C^1(\mathbb{R}^n)$ with $\|w - g\|_{W^{1,1}} < \varepsilon/3$. The triangle inequality in $L^1$ and the reverse triangle inequality $\bigl| \|a\|_{L^1} - \|b\|_{L^1} \bigr| \le \|a - b\|_{L^1}$ then assemble the two legs into the conclusion. No singular-part cancellation is required: the conclusion is purely about the $L^1$-distance of the functions and the closeness of the scalars $|Du|(\mathbb{R}^n)$ and $\|\nabla g\|_{L^1}$, which is the natural notion of strict convergence in $BV$.
[/proofplan]
[step:Mollify $u$ in $C^\infty \cap W^{1,1}$]
Let $u \in BV(\mathbb{R}^n)$ and $\varepsilon > 0$ be as in the statement. Recall that $u \in BV(\mathbb{R}^n)$ means $u \in L^1(\mathbb{R}^n)$ and the distributional gradient $Du$ is a finite $\mathbb{R}^n$-valued Radon measure on $\mathbb{R}^n$ with total variation $|Du|(\mathbb{R}^n) < \infty$.
Fix a [standard mollifier](/page/Standard%20Mollifier) $\eta \in C_c^\infty(\mathbb{R}^n)$ with $\eta \ge 0$, $\operatorname{supp} \eta \subseteq B(0, 1)$, and $\int_{\mathbb{R}^n} \eta \, d\mathcal{L}^n = 1$. For $\delta > 0$ define the rescaled mollifier $\eta_\delta(x) := \delta^{-n} \eta(x/\delta)$, so $\operatorname{supp} \eta_\delta \subseteq B(0, \delta)$ and $\int_{\mathbb{R}^n} \eta_\delta \, d\mathcal{L}^n = 1$. Define the mollification
\begin{align*}
u_\delta: \mathbb{R}^n &\to \mathbb{R} \\
x &\mapsto (\eta_\delta * u)(x) = \int_{\mathbb{R}^n} \eta_\delta(x - y) \, u(y) \, d\mathcal{L}^n(y).
\end{align*}
Standard properties of mollification (differentiation under the integral) give $u_\delta \in C^\infty(\mathbb{R}^n)$ with classical gradient
\begin{align*}
\nabla u_\delta(x) = \int_{\mathbb{R}^n} \nabla_x \eta_\delta(x - y) \, u(y) \, d\mathcal{L}^n(y).
\end{align*}
[claim:$\nabla u_\delta = \eta_\delta * Du$ as $\mathbb{R}^n$-valued Radon measures]
[proof]
For each fixed $x \in \mathbb{R}^n$, the map $y \mapsto \eta_\delta(x - y)$ lies in $C_c^\infty(\mathbb{R}^n)$ (compactly supported in the ball $\overline{B}(x, \delta)$). Hence we may pair it against the distributional gradient $Du$. Using $\nabla_x \eta_\delta(x - y) = -\nabla_y \eta_\delta(x - y)$ and the definition of the distributional derivative $D_i u$ as a Radon measure paired against test functions in $C_c^\infty(\mathbb{R}^n)$,
\begin{align*}
\partial_{x_i} u_\delta(x) &= \int_{\mathbb{R}^n} \partial_{x_i} \eta_\delta(x - y) \, u(y) \, d\mathcal{L}^n(y) \\
&= -\int_{\mathbb{R}^n} \partial_{y_i} \bigl(\eta_\delta(x - y)\bigr) \, u(y) \, d\mathcal{L}^n(y) \\
&= \int_{\mathbb{R}^n} \eta_\delta(x - y) \, dD_i u(y).
\end{align*}
Hence $\nabla u_\delta(x) = \int_{\mathbb{R}^n} \eta_\delta(x - y) \, dDu(y)$, i.e. $\nabla u_\delta$ is the convolution of the scalar function $\eta_\delta$ with the vector-valued Radon measure $Du$.
[/proof]
[/claim]
We now bound $\|\nabla u_\delta\|_{L^1}$ in terms of $|Du|(\mathbb{R}^n)$ via Jensen's inequality and Fubini-Tonelli:
\begin{align*}
\int_{\mathbb{R}^n} |\nabla u_\delta(x)| \, d\mathcal{L}^n(x) &= \int_{\mathbb{R}^n} \Bigl| \int_{\mathbb{R}^n} \eta_\delta(x - y) \, dDu(y) \Bigr| \, d\mathcal{L}^n(x) \\
&\le \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} \eta_\delta(x - y) \, d|Du|(y) \, d\mathcal{L}^n(x) \\
&= \int_{\mathbb{R}^n} \Bigl( \int_{\mathbb{R}^n} \eta_\delta(x - y) \, d\mathcal{L}^n(x) \Bigr) \, d|Du|(y) \\
&= \int_{\mathbb{R}^n} 1 \, d|Du|(y) = |Du|(\mathbb{R}^n).
\end{align*}
The first inequality is the triangle inequality for vector-valued integrals against the non-negative kernel $\eta_\delta$ (a Jensen-type bound). The swap of integrals is justified by Fubini-Tonelli: the integrand is non-negative and both measures $\mathcal{L}^n$ and $|Du|$ are $\sigma$-finite. The inner $\mathcal{L}^n$-integral evaluates to $1$ by the unit-mass property of $\eta_\delta$ (translation-invariance of $\mathcal{L}^n$). Hence
\begin{align*}
\|\nabla u_\delta\|_{L^1(\mathbb{R}^n)} \le |Du|(\mathbb{R}^n) < \infty,
\end{align*}
so $u_\delta \in W^{1,1}(\mathbb{R}^n)$ for every $\delta > 0$.
We now invoke two black-box convergence results as $\delta \to 0^+$:
(a) [$L^1$ continuity of translation and mollification](/theorems/l1-continuity-of-mollification) (Evans-Gariepy *Measure Theory and Fine Properties of Functions*, revised edition, Section 4.2 Theorem 1; Ambrosio-Fusco-Pallara *Functions of Bounded Variation*, Proposition 3.7): for $u \in L^1(\mathbb{R}^n)$,
\begin{align*}
\|u_\delta - u\|_{L^1(\mathbb{R}^n)} \to 0 \quad \text{as } \delta \to 0^+.
\end{align*}
The hypothesis $u \in L^1(\mathbb{R}^n)$ holds because $BV(\mathbb{R}^n) \subseteq L^1(\mathbb{R}^n)$ by definition.
(b) [Strict convergence of mollification in $BV$](/theorems/strict-convergence-of-mollification) (Evans-Gariepy Section 5.3 Theorem 2; Ambrosio-Fusco-Pallara Theorem 3.9): for $u \in BV(\mathbb{R}^n)$,
\begin{align*}
\|\nabla u_\delta\|_{L^1(\mathbb{R}^n)} \to |Du|(\mathbb{R}^n) \quad \text{as } \delta \to 0^+.
\end{align*}
The hypothesis $u \in BV(\mathbb{R}^n)$ is the standing assumption.
Combining (a) and (b), choose $\delta_0 > 0$ small enough that
\begin{align*}
\|u_{\delta_0} - u\|_{L^1(\mathbb{R}^n)} < \varepsilon/3 \qquad \text{and} \qquad \Bigl| \|\nabla u_{\delta_0}\|_{L^1(\mathbb{R}^n)} - |Du|(\mathbb{R}^n) \Bigr| < \varepsilon/3.
\end{align*}
Define $w := u_{\delta_0}$. Then $w \in C^\infty(\mathbb{R}^n) \cap W^{1,1}(\mathbb{R}^n)$.
[/step]
[step:Apply $C^1$ approximation of Sobolev functions]
We apply the [$C^1$ Approximation of Sobolev Functions](/theorems/c1-approximation-of-sobolev-functions) (theorem 3092) to $w$ with exponent $p = 1$ and tolerance $\varepsilon/3$. We verify the hypotheses of theorem 3092: it requires $p \in [1, \infty)$ and an input function in $W^{1,p}(\mathbb{R}^n)$. Here $p = 1 \in [1, \infty)$, and $w \in W^{1,1}(\mathbb{R}^n)$ by Step 1.
Theorem 3092 produces $g \in C^1(\mathbb{R}^n)$ such that
\begin{align*}
\mathcal{L}^n(\{w \ne g\}) < \varepsilon/3 \qquad \text{and} \qquad \|w - g\|_{W^{1,1}(\mathbb{R}^n)} < \varepsilon/3.
\end{align*}
We will not use the bound on $\mathcal{L}^n(\{w \ne g\})$. From the $W^{1,1}$ bound, recalling $\|h\|_{W^{1,1}} = \|h\|_{L^1} + \|\nabla h\|_{L^1}$, we extract the two component bounds
\begin{align*}
\|w - g\|_{L^1(\mathbb{R}^n)} < \varepsilon/3 \qquad \text{and} \qquad \|\nabla w - \nabla g\|_{L^1(\mathbb{R}^n)} < \varepsilon/3.
\end{align*}
[/step]
[step:Aggregate via triangle inequality]
We now verify the two conclusions of the theorem for the function $g \in C^1(\mathbb{R}^n)$ produced in Step 2.
For the $L^1$-distance, the triangle inequality gives
\begin{align*}
\|u - g\|_{L^1(\mathbb{R}^n)} &\le \|u - w\|_{L^1(\mathbb{R}^n)} + \|w - g\|_{L^1(\mathbb{R}^n)} \\
&< \varepsilon/3 + \varepsilon/3 = 2\varepsilon/3 < \varepsilon,
\end{align*}
using the bound on $\|u - w\|_{L^1}$ from Step 1 and the bound on $\|w - g\|_{L^1}$ from Step 2.
For the gradient-mass discrepancy, we combine the strict-convergence bound from Step 1 with the reverse triangle inequality
\begin{align*}
\bigl| \|\nabla w\|_{L^1(\mathbb{R}^n)} - \|\nabla g\|_{L^1(\mathbb{R}^n)} \bigr| \le \|\nabla w - \nabla g\|_{L^1(\mathbb{R}^n)},
\end{align*}
which follows from the triangle inequality $\|\nabla w\|_{L^1} \le \|\nabla w - \nabla g\|_{L^1} + \|\nabla g\|_{L^1}$ and the symmetric estimate. Then
\begin{align*}
\Bigl| |Du|(\mathbb{R}^n) - \int_{\mathbb{R}^n} |\nabla g| \, d\mathcal{L}^n \Bigr| &= \bigl| |Du|(\mathbb{R}^n) - \|\nabla g\|_{L^1(\mathbb{R}^n)} \bigr| \\
&\le \bigl| |Du|(\mathbb{R}^n) - \|\nabla w\|_{L^1(\mathbb{R}^n)} \bigr| + \bigl| \|\nabla w\|_{L^1(\mathbb{R}^n)} - \|\nabla g\|_{L^1(\mathbb{R}^n)} \bigr| \\
&\le \bigl| |Du|(\mathbb{R}^n) - \|\nabla w\|_{L^1(\mathbb{R}^n)} \bigr| + \|\nabla w - \nabla g\|_{L^1(\mathbb{R}^n)} \\
&< \varepsilon/3 + \varepsilon/3 = 2\varepsilon/3 < \varepsilon,
\end{align*}
using the bound on $\bigl| \|\nabla w\|_{L^1} - |Du|(\mathbb{R}^n) \bigr|$ from Step 1 and the bound on $\|\nabla w - \nabla g\|_{L^1}$ from Step 2.
Thus $g \in C^1(\mathbb{R}^n)$ satisfies both
\begin{align*}
\|u - g\|_{L^1(\mathbb{R}^n)} < \varepsilon \qquad \text{and} \qquad \Bigl| |Du|(\mathbb{R}^n) - \int_{\mathbb{R}^n} |\nabla g| \, d\mathcal{L}^n \Bigr| < \varepsilon,
\end{align*}
which is the conclusion of the theorem.
[/step]