[proofplan]
We prove the identity by comparing the optimality conditions for the two proximal minimization problems. First, the primal proximal point satisfies a subgradient inclusion. Next, the defining inequality for the convex conjugate shows that this inclusion is equivalent to a dual subgradient inclusion. Finally, that dual inclusion is exactly the optimality condition for the dual proximal point, and the decomposition follows from the definition of the auxiliary vector.
[/proofplan]
[step:Translate the primal proximal minimizer into a subgradient inclusion]
Fix $v \in \mathbb{R}^n$. Let $f^*:\mathbb{R}^n \to (-\infty,\infty]$ denote the convex conjugate of $f$, meaning
\begin{align*}
f^*(u) := \sup_{y \in \mathbb{R}^n}\{\langle u,y\rangle-f(y)\}
\end{align*}
for $u \in \mathbb{R}^n$. For any proper convex function $g:\mathbb{R}^n \to (-\infty,\infty]$ and any point $a \in \mathbb{R}^n$ with $g(a)<\infty$, let $\partial g(a) \subset \mathbb{R}^n$ denote the subdifferential, namely the set of all $p \in \mathbb{R}^n$ such that $g(y) \geq g(a)+\langle p,y-a\rangle$ for every $y \in \mathbb{R}^n$. Define
\begin{align*}
x := \operatorname{prox}_{\lambda f}(v) \in \mathbb{R}^n.
\end{align*}
By the definition of the proximal map, $x$ is the unique minimizer of the function $\Phi:\mathbb{R}^n \to (-\infty,\infty]$ defined by
\begin{align*}
\Phi(y) := f(y) + \frac{1}{2\lambda}|y-v|^2.
\end{align*}
Existence follows from the following finite-dimensional argument. The function $\Phi$ is lower semicontinuous because $f$ is lower semicontinuous and the quadratic term is continuous. To justify coercivity of $\Phi$, use the affine minorant property for proper closed convex functions in finite-dimensional Euclidean space, obtained by applying the supporting-hyperplane separation theorem to the closed convex epigraph of $f$: there exist $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that $f(y) \geq \langle a,y\rangle+b$ for every $y \in \mathbb{R}^n$. Hence the quadratic term dominates the affine lower bound as $|y| \to \infty$, so $\Phi(y) \to \infty$ as $|y| \to \infty$. Choose $R>0$ so that $\Phi(y)>\Phi(y_0)$ whenever $|y|>R$, where $y_0 \in \mathbb{R}^n$ is any point with $f(y_0)<\infty$. Let $\overline{B}(0,R) := \{y \in \mathbb{R}^n : |y| \leq R\}$ denote the closed Euclidean ball of radius $R$ centered at $0$. The restriction of $\Phi$ to the compact ball $\overline{B}(0,R)$ attains its minimum because its sublevel sets are closed, so $\Phi$ has a minimizer on $\mathbb{R}^n$. Uniqueness follows from strict convexity: $\Phi$ is strictly convex because $f$ is convex while $y \mapsto |y-v|^2/(2\lambda)$ is $1/\lambda$-strongly convex.
We claim that
\begin{align*}
\frac{v-x}{\lambda} \in \partial f(x).
\end{align*}
Fix $y \in \mathbb{R}^n$. If $f(y)=\infty$, then the desired subgradient inequality at $y$ is immediate. Hence assume $f(y)<\infty$. For $t \in (0,1)$, define $z_t := x+t(y-x) \in \mathbb{R}^n$. Since $x$ minimizes $\Phi$,
\begin{align*}
f(x) + \frac{1}{2\lambda}|x-v|^2 \leq f(z_t) + \frac{1}{2\lambda}|z_t-v|^2.
\end{align*}
By convexity of $f$,
\begin{align*}
f(z_t) \leq (1-t)f(x)+t f(y).
\end{align*}
Combining these two inequalities and subtracting $f(x)+\frac{1}{2\lambda}|x-v|^2$ gives
\begin{align*}
0 \leq t\bigl(f(y)-f(x)\bigr)+\frac{1}{2\lambda}\bigl(|x-v+t(y-x)|^2-|x-v|^2\bigr).
\end{align*}
Using the Euclidean square expansion,
\begin{align*}
|x-v+t(y-x)|^2-|x-v|^2 = 2t\langle x-v,y-x\rangle+t^2|y-x|^2.
\end{align*}
Therefore
\begin{align*}
0 \leq t\left(f(y)-f(x)+\left\langle \frac{x-v}{\lambda},y-x\right\rangle\right)+\frac{t^2}{2\lambda}|y-x|^2.
\end{align*}
Dividing by $t>0$ and letting $t \downarrow 0$ yields
\begin{align*}
f(y) \geq f(x)+\left\langle \frac{v-x}{\lambda},y-x\right\rangle.
\end{align*}
Since this holds for every $y \in \mathbb{R}^n$, this is precisely the definition of
\begin{align*}
\frac{v-x}{\lambda} \in \partial f(x).
\end{align*}
[guided]
The proximal point $x$ is defined by minimizing $f$ plus a quadratic penalty that keeps the minimizer near $v$. In the language of the proximal map, this means that $x$ is the minimizer of the function $\Phi:\mathbb{R}^n \to (-\infty,\infty]$ defined by
\begin{align*}
\Phi(y) := f(y) + \frac{1}{2\lambda}|y-v|^2.
\end{align*}
The minimizer exists by a direct finite-dimensional compactness argument, and it is unique because $\Phi$ is strictly convex. We verify the ingredients. The function $\Phi$ is lower semicontinuous because $f$ is lower semicontinuous and the quadratic penalty is continuous. The remaining point for existence is coercivity. We use the affine minorant property for proper closed convex functions in finite-dimensional Euclidean space, obtained by applying the supporting-hyperplane separation theorem to the closed convex epigraph of $f$: there are $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that $f(y) \geq \langle a,y\rangle+b$ for every $y \in \mathbb{R}^n$. Therefore the quadratic term in $\Phi$ dominates this affine lower bound as $|y| \to \infty$, so $\Phi(y) \to \infty$ as $|y| \to \infty$. Choose $R>0$ so that $\Phi(y)>\Phi(y_0)$ whenever $|y|>R$, where $y_0 \in \mathbb{R}^n$ is any point with $f(y_0)<\infty$. Let $\overline{B}(0,R) := \{y \in \mathbb{R}^n : |y| \leq R\}$ denote the closed Euclidean ball of radius $R$ centered at $0$. On the compact ball $\overline{B}(0,R)$, lower semicontinuity implies that the sublevel sets of $\Phi$ are closed, so $\Phi$ attains its minimum there; the choice of $R$ makes this a global minimum on $\mathbb{R}^n$. Finally, $\Phi$ is strictly convex because $f$ is convex and $y \mapsto |y-v|^2/(2\lambda)$ is $1/\lambda$-strongly convex, so this global minimizer is unique. The goal is to turn the minimality of $x$ for $\Phi$ into the subgradient inequality for $f$ at $x$.
Fix a comparison point $y \in \mathbb{R}^n$. If $f(y)=\infty$, then the inequality
\begin{align*}
f(y) \geq f(x)+\left\langle \frac{v-x}{\lambda},y-x\right\rangle
\end{align*}
holds immediately in the extended-real sense. Thus assume $f(y)<\infty$. Instead of comparing $x$ directly to $y$, compare $x$ to points on the line segment from $x$ to $y$. For each $t \in (0,1)$, define $z_t := x+t(y-x) \in \mathbb{R}^n$. Since $x$ minimizes $\Phi$,
\begin{align*}
f(x) + \frac{1}{2\lambda}|x-v|^2 \leq f(z_t) + \frac{1}{2\lambda}|z_t-v|^2.
\end{align*}
Convexity of $f$ gives
\begin{align*}
f(z_t) \leq (1-t)f(x)+t f(y).
\end{align*}
Substituting this convexity bound into the minimality inequality gives
\begin{align*}
f(x) + \frac{1}{2\lambda}|x-v|^2 \leq (1-t)f(x)+t f(y)+\frac{1}{2\lambda}|x-v+t(y-x)|^2.
\end{align*}
After subtracting $f(x)+\frac{1}{2\lambda}|x-v|^2$, we obtain
\begin{align*}
0 \leq t\bigl(f(y)-f(x)\bigr)+\frac{1}{2\lambda}\bigl(|x-v+t(y-x)|^2-|x-v|^2\bigr).
\end{align*}
Now expand the square in the direction $y-x$:
\begin{align*}
|x-v+t(y-x)|^2-|x-v|^2 = 2t\langle x-v,y-x\rangle+t^2|y-x|^2.
\end{align*}
Substitution yields
\begin{align*}
0 \leq t\left(f(y)-f(x)+\left\langle \frac{x-v}{\lambda},y-x\right\rangle\right)+\frac{t^2}{2\lambda}|y-x|^2.
\end{align*}
Because $t>0$, we may divide by $t$:
\begin{align*}
0 \leq f(y)-f(x)+\left\langle \frac{x-v}{\lambda},y-x\right\rangle+\frac{t}{2\lambda}|y-x|^2.
\end{align*}
Letting $t \downarrow 0$ removes the last term and gives
\begin{align*}
f(y) \geq f(x)+\left\langle \frac{v-x}{\lambda},y-x\right\rangle.
\end{align*}
This holds for every $y \in \mathbb{R}^n$, which is exactly the definition of the subdifferential inclusion
\begin{align*}
\frac{v-x}{\lambda} \in \partial f(x).
\end{align*}
[/guided]
[/step]
[step:Convert the primal subgradient inclusion into a dual subgradient inclusion]
Define
\begin{align*}
u := \frac{v-x}{\lambda} \in \mathbb{R}^n.
\end{align*}
From the previous step, $u \in \partial f(x)$, meaning that for every $y \in \mathbb{R}^n$,
\begin{align*}
f(y) \geq f(x) + \langle u, y-x\rangle.
\end{align*}
Equivalently,
\begin{align*}
\langle u,y\rangle - f(y) \leq \langle u,x\rangle - f(x)
\end{align*}
for every $y \in \mathbb{R}^n$. Therefore, by the definition of the convex conjugate,
\begin{align*}
f^*(u) = \langle u,x\rangle - f(x).
\end{align*}
Now let $w \in \mathbb{R}^n$. By the definition of $f^*(w)$,
\begin{align*}
f^*(w) \geq \langle w,x\rangle - f(x).
\end{align*}
Using $f^*(u)=\langle u,x\rangle-f(x)$, this becomes
\begin{align*}
f^*(w) \geq f^*(u) + \langle x, w-u\rangle.
\end{align*}
Since this holds for every $w \in \mathbb{R}^n$, we have
\begin{align*}
x \in \partial f^*(u).
\end{align*}
[guided]
The point of this step is to translate the primal optimality information into the dual language of the convex conjugate. Define
\begin{align*}
u := \frac{v-x}{\lambda} \in \mathbb{R}^n.
\end{align*}
The previous step proved $u \in \partial f(x)$, which means that for every $y \in \mathbb{R}^n$,
\begin{align*}
f(y) \geq f(x) + \langle u, y-x\rangle.
\end{align*}
Rearranging this inequality gives
\begin{align*}
\langle u,y\rangle - f(y) \leq \langle u,x\rangle - f(x).
\end{align*}
Thus the expression whose supremum defines $f^*(u)$ is maximized at $y=x$, so
\begin{align*}
f^*(u) = \langle u,x\rangle - f(x).
\end{align*}
Now fix $w \in \mathbb{R}^n$. The definition of the convex conjugate gives the lower bound
\begin{align*}
f^*(w) \geq \langle w,x\rangle - f(x).
\end{align*}
Using the identity for $f^*(u)$, we rewrite the right-hand side as
\begin{align*}
\langle w,x\rangle - f(x) = f^*(u) + \langle x,w-u\rangle.
\end{align*}
Therefore
\begin{align*}
f^*(w) \geq f^*(u) + \langle x,w-u\rangle
\end{align*}
for every $w \in \mathbb{R}^n$. This is exactly the subdifferential inequality for $f^*$ at $u$, with subgradient $x$, and hence
\begin{align*}
x \in \partial f^*(u).
\end{align*}
[/guided]
[/step]
[step:Identify the dual proximal minimizer]
We show that $u$ is the proximal point of the scaled conjugate at the scaled vector. Since $f^*:\mathbb{R}^n \to (-\infty,\infty]$ has already been defined, introduce $h:\mathbb{R}^n \to (-\infty,\infty]$ by
\begin{align*}
h(w) := \frac{1}{\lambda}f^*(w).
\end{align*}
Thus $h$ is the scaled function customarily denoted $f^*/\lambda$. Define $\Psi:\mathbb{R}^n \to (-\infty,\infty]$ by
\begin{align*}
\Psi(w) := \frac{1}{\lambda}f^*(w) + \frac{1}{2}\left|w-\frac{v}{\lambda}\right|^2.
\end{align*}
The subgradient inclusion $x \in \partial f^*(u)$ says that for every $w \in \mathbb{R}^n$,
\begin{align*}
f^*(w) \geq f^*(u) + \langle x, w-u\rangle.
\end{align*}
Multiplying by $1/\lambda$ gives
\begin{align*}
\frac{1}{\lambda}f^*(w) \geq \frac{1}{\lambda}f^*(u) + \left\langle \frac{x}{\lambda}, w-u\right\rangle.
\end{align*}
Since the definition of $u$ gives
\begin{align*}
\frac{x}{\lambda} = \frac{v}{\lambda} - u.
\end{align*}
Thus
\begin{align*}
\frac{1}{\lambda}f^*(w) \geq \frac{1}{\lambda}f^*(u) + \left\langle \frac{v}{\lambda}-u, w-u\right\rangle.
\end{align*}
Equivalently,
\begin{align*}
\frac{1}{\lambda}f^*(w) + \frac{1}{2}\left|w-\frac{v}{\lambda}\right|^2 \geq \frac{1}{\lambda}f^*(u) + \frac{1}{2}\left|u-\frac{v}{\lambda}\right|^2 + \frac{1}{2}|w-u|^2.
\end{align*}
Therefore $\Psi(w)\geq \Psi(u)$ for every $w \in \mathbb{R}^n$, so by the definition of the proximal map,
\begin{align*}
u = \operatorname{prox}_{f^*/\lambda}(v/\lambda).
\end{align*}
[guided]
We now show that the vector $u$ is not merely related to the dual problem, but is exactly its proximal minimizer. For the scaled conjugate
\begin{align*}
h(w) := \frac{1}{\lambda}f^*(w),
\end{align*}
the proximal objective at the point $v/\lambda$ is the function $\Psi:\mathbb{R}^n \to (-\infty,\infty]$ defined by
\begin{align*}
\Psi(w) := \frac{1}{\lambda}f^*(w) + \frac{1}{2}\left|w-\frac{v}{\lambda}\right|^2.
\end{align*}
From the previous step, $x \in \partial f^*(u)$, so for every $w \in \mathbb{R}^n$,
\begin{align*}
f^*(w) \geq f^*(u) + \langle x, w-u\rangle.
\end{align*}
Multiplying by $1/\lambda$ gives
\begin{align*}
\frac{1}{\lambda}f^*(w) \geq \frac{1}{\lambda}f^*(u) + \left\langle \frac{x}{\lambda}, w-u\right\rangle.
\end{align*}
Because of the defining relation between $u$, $x$, and $v$, we have
\begin{align*}
\frac{x}{\lambda} = \frac{v}{\lambda}-u.
\end{align*}
Substituting this identity yields
\begin{align*}
\frac{1}{\lambda}f^*(w) \geq \frac{1}{\lambda}f^*(u) + \left\langle \frac{v}{\lambda}-u, w-u\right\rangle.
\end{align*}
Adding the quadratic term and completing the square gives
\begin{align*}
\frac{1}{\lambda}f^*(w) + \frac{1}{2}\left|w-\frac{v}{\lambda}\right|^2 \geq \frac{1}{\lambda}f^*(u) + \frac{1}{2}\left|u-\frac{v}{\lambda}\right|^2 + \frac{1}{2}|w-u|^2.
\end{align*}
The final nonnegative term shows $\Psi(w) \geq \Psi(u)$ for every $w \in \mathbb{R}^n$. Hence $u$ minimizes the dual proximal objective, and by the definition of the proximal map,
\begin{align*}
u = \operatorname{prox}_{f^*/\lambda}(v/\lambda).
\end{align*}
[/guided]
[/step]
[step:Substitute the dual proximal point into the decomposition]
By the definition of $u$,
\begin{align*}
v = x + \lambda u.
\end{align*}
The first step defined
\begin{align*}
x=\operatorname{prox}_{\lambda f}(v),
\end{align*}
and the previous step proved
\begin{align*}
u = \operatorname{prox}_{f^*/\lambda}(v/\lambda).
\end{align*}
Substituting these two identities into
\begin{align*}
v=x+\lambda u
\end{align*}
gives
\begin{align*}
v = \operatorname{prox}_{\lambda f}(v) + \lambda\operatorname{prox}_{f^*/\lambda}(v/\lambda).
\end{align*}
Since $v \in \mathbb{R}^n$ was arbitrary, the Moreau decomposition holds for every $v \in \mathbb{R}^n$.
[guided]
The last step is only substitution, but it is where the two proximal identities are assembled. By definition,
\begin{align*}
u := \frac{v-x}{\lambda}
\end{align*}
so multiplying by $\lambda$ and adding $x$ gives
\begin{align*}
v = x + \lambda u.
\end{align*}
The first proximal problem defined
\begin{align*}
x = \operatorname{prox}_{\lambda f}(v),
\end{align*}
and the dual proximal step proved
\begin{align*}
u = \operatorname{prox}_{f^*/\lambda}(v/\lambda).
\end{align*}
Substituting these two identities into
\begin{align*}
v=x+\lambda u
\end{align*}
gives
\begin{align*}
v = \operatorname{prox}_{\lambda f}(v) + \lambda\operatorname{prox}_{f^*/\lambda}(v/\lambda).
\end{align*}
Because the argument began with an arbitrary vector $v \in \mathbb{R}^n$, this proves the Moreau decomposition for every $v \in \mathbb{R}^n$.
[/guided]
[/step]