[proofplan]
We prove the equivalence directly from one-sided variations and the defining inequalities for convex differentiable functions and subgradients. If $\hat{x}$ is optimal, we compare $F(\hat{x})$ with $F((1-t)\hat{x}+tx)$, divide by $t>0$, use convexity of $R$, and let $t \downarrow 0$ using differentiability of $\ell$; this gives the subgradient inequality for $R$ at $\hat{x}$ with subgradient
\begin{align*}
-\frac{1}{\lambda}\nabla \ell(\hat{x}).
\end{align*}
Conversely, if such a subgradient exists, the subgradient inequality for $R$ and the first-order convexity inequality for $\ell$ combine to show that every $x \in \mathbb{R}^n$ has objective value at least that of $\hat{x}$.
[/proofplan]
[step:Record the two convex inequalities used in both directions]
Define the composite objective $F: \mathbb{R}^n \to (-\infty,\infty]$ by
\begin{align*}
F(x) = \ell(x) + \lambda R(x).
\end{align*}
Since $\hat{x} \in \operatorname{dom} R$ and $\ell$ is real-valued, $F(\hat{x}) < \infty$.
Because $\ell: \mathbb{R}^n \to \mathbb{R}$ is convex and differentiable, for every $x \in \mathbb{R}^n$ the first-order convexity inequality gives
\begin{align*}
\ell(x) \geq \ell(\hat{x}) + \nabla \ell(\hat{x}) \cdot (x - \hat{x}).
\end{align*}
For a vector $z \in \mathbb{R}^n$, the subgradient condition $z \in \partial R(\hat{x})$ means precisely that for every $x \in \mathbb{R}^n$,
\begin{align*}
R(x) \geq R(\hat{x}) + z \cdot (x - \hat{x}).
\end{align*}
[/step]
[step:Derive the subgradient certificate from global optimality]
Assume that $\hat{x}$ minimizes $F$ over $\mathbb{R}^n$. Define the vector $z \in \mathbb{R}^n$ by
\begin{align*}
z = -\frac{1}{\lambda}\nabla \ell(\hat{x}).
\end{align*}
We prove that $z \in \partial R(\hat{x})$.
Fix $x \in \mathbb{R}^n$. If $R(x) = \infty$, then the subgradient inequality for $R$ at $\hat{x}$ holds because $R(\hat{x}) < \infty$ and the left-hand side is $\infty$. Suppose instead that $R(x) < \infty$. For each $t \in (0,1]$, define the point $x_t \in \mathbb{R}^n$ by
\begin{align*}
x_t = (1-t)\hat{x} + tx.
\end{align*}
Convexity of $R$ and finiteness of $R(\hat{x})$ and $R(x)$ give $R(x_t) < \infty$. Since $\hat{x}$ minimizes $F$,
\begin{align*}
0 \leq \ell(x_t) - \ell(\hat{x}) + \lambda\bigl(R(x_t) - R(\hat{x})\bigr).
\end{align*}
Dividing by $t>0$ gives
\begin{align*}
0 \leq \frac{\ell(x_t)-\ell(\hat{x})}{t} + \lambda\frac{R(x_t)-R(\hat{x})}{t}.
\end{align*}
Convexity of $R$ also gives
\begin{align*}
R(x_t) \leq (1-t)R(\hat{x}) + tR(x),
\end{align*}
and hence
\begin{align*}
\frac{R(x_t)-R(\hat{x})}{t} \leq R(x)-R(\hat{x}).
\end{align*}
Therefore
\begin{align*}
0 \leq \frac{\ell(x_t)-\ell(\hat{x})}{t} + \lambda\bigl(R(x)-R(\hat{x})\bigr).
\end{align*}
Because $\ell$ is differentiable at $\hat{x}$ and $x_t-\hat{x}=t(x-\hat{x})$, letting $t \downarrow 0$ yields
\begin{align*}
0 \leq \nabla \ell(\hat{x}) \cdot (x - \hat{x}) + \lambda\bigl(R(x)-R(\hat{x})\bigr).
\end{align*}
Dividing by $\lambda>0$ and rearranging gives
\begin{align*}
R(x) \geq R(\hat{x}) + z \cdot (x - \hat{x}).
\end{align*}
Thus $z \in \partial R(\hat{x})$, and by definition of $z$,
\begin{align*}
\nabla \ell(\hat{x}) + \lambda z = 0.
\end{align*}
[guided]
Assume that $\hat{x}$ is a global minimizer of the composite objective $F: \mathbb{R}^n \to (-\infty,\infty]$, where
\begin{align*}
F(x) = \ell(x) + \lambda R(x).
\end{align*}
We want to turn the global comparison $F(\hat{x}) \leq F(y)$ for every $y \in \mathbb{R}^n$ into the subgradient inequality for $R$ at $\hat{x}$.
The desired optimality equation forces the candidate subgradient. Define $z \in \mathbb{R}^n$ by
\begin{align*}
z = -\frac{1}{\lambda}\nabla \ell(\hat{x}).
\end{align*}
To prove $z \in \partial R(\hat{x})$, we must prove
\begin{align*}
R(x) \geq R(\hat{x}) + z \cdot (x - \hat{x})
\end{align*}
for every $x \in \mathbb{R}^n$.
Fix $x \in \mathbb{R}^n$. If $R(x)=\infty$, the inequality holds because $R(\hat{x})<\infty$ and the left-hand side is $\infty$. Hence suppose $R(x)<\infty$. The direct comparison $F(\hat{x})\leq F(x)$ is not enough by itself, because a lower bound for $\ell(x)$ cannot be substituted into the upper side of that inequality. Instead, we approach $x$ from $\hat{x}$ along the line segment and then take a one-sided derivative.
For each $t \in (0,1]$, define $x_t \in \mathbb{R}^n$ by
\begin{align*}
x_t = (1-t)\hat{x} + tx.
\end{align*}
Since $R$ is convex and both $R(\hat{x})$ and $R(x)$ are finite, convexity gives
\begin{align*}
R(x_t) \leq (1-t)R(\hat{x}) + tR(x) < \infty.
\end{align*}
Thus $F(x_t)$ is finite, and optimality of $\hat{x}$ gives
\begin{align*}
0 \leq F(x_t)-F(\hat{x}) = \ell(x_t)-\ell(\hat{x}) + \lambda\bigl(R(x_t)-R(\hat{x})\bigr).
\end{align*}
Dividing by $t>0$ preserves the inequality:
\begin{align*}
0 \leq \frac{\ell(x_t)-\ell(\hat{x})}{t} + \lambda\frac{R(x_t)-R(\hat{x})}{t}.
\end{align*}
The convexity estimate for $R(x_t)$ also gives
\begin{align*}
R(x_t)-R(\hat{x}) \leq t\bigl(R(x)-R(\hat{x})\bigr),
\end{align*}
so
\begin{align*}
\frac{R(x_t)-R(\hat{x})}{t} \leq R(x)-R(\hat{x}).
\end{align*}
Substituting this upper bound is valid because it makes the right-hand side larger, and the previous non-negative quantity remains bounded above by that larger expression. We obtain
\begin{align*}
0 \leq \frac{\ell(x_t)-\ell(\hat{x})}{t} + \lambda\bigl(R(x)-R(\hat{x})\bigr).
\end{align*}
Now use differentiability of $\ell$ at $\hat{x}$. Since $x_t-\hat{x}=t(x-\hat{x})$, the directional difference quotient satisfies
\begin{align*}
\lim_{t\downarrow 0}\frac{\ell(x_t)-\ell(\hat{x})}{t} = \nabla \ell(\hat{x}) \cdot (x-\hat{x}).
\end{align*}
Letting $t \downarrow 0$ gives
\begin{align*}
0 \leq \nabla \ell(\hat{x}) \cdot (x - \hat{x}) + \lambda\bigl(R(x)-R(\hat{x})\bigr).
\end{align*}
After dividing by the positive scalar $\lambda$ and rearranging, this becomes
\begin{align*}
R(x) \geq R(\hat{x}) - \frac{1}{\lambda}\nabla \ell(\hat{x}) \cdot (x - \hat{x}).
\end{align*}
Using the definition of $z$, we get
\begin{align*}
R(x) \geq R(\hat{x}) + z \cdot (x - \hat{x}).
\end{align*}
This is exactly the definition of $z \in \partial R(\hat{x})$. Finally, the definition of $z$ gives
\begin{align*}
\nabla \ell(\hat{x}) + \lambda z = 0.
\end{align*}
[/guided]
[/step]
[step:Use the certificate to prove global optimality]
Conversely, assume that there exists $z \in \partial R(\hat{x})$ such that
\begin{align*}
\nabla \ell(\hat{x}) + \lambda z = 0.
\end{align*}
Fix $x \in \mathbb{R}^n$. The first-order convexity inequality for $\ell$ gives
\begin{align*}
\ell(x) \geq \ell(\hat{x}) + \nabla \ell(\hat{x}) \cdot (x - \hat{x}).
\end{align*}
The subgradient inequality for $z \in \partial R(\hat{x})$ gives
\begin{align*}
R(x) \geq R(\hat{x}) + z \cdot (x - \hat{x}).
\end{align*}
Multiplying the [second inequality](/theorems/2136) by $\lambda > 0$ and adding it to the first yields
\begin{align*}
\ell(x) + \lambda R(x) \geq \ell(\hat{x}) + \lambda R(\hat{x}) + \bigl(\nabla \ell(\hat{x}) + \lambda z\bigr) \cdot (x - \hat{x}).
\end{align*}
The certificate makes the final [inner product](/page/Inner%20Product) equal to $0$, hence
\begin{align*}
F(x) \geq F(\hat{x}).
\end{align*}
Since $x \in \mathbb{R}^n$ was arbitrary, $\hat{x}$ minimizes $F$ over $\mathbb{R}^n$.
[/step]
[step:Rewrite the certificate as a subdifferential inclusion]
The equation
\begin{align*}
\nabla \ell(\hat{x}) + \lambda z = 0
\end{align*}
is equivalent, because $\lambda > 0$, to
\begin{align*}
z = -\frac{1}{\lambda}\nabla \ell(\hat{x}).
\end{align*}
Therefore the existence of $z \in \partial R(\hat{x})$ satisfying the optimality equation is equivalent to the single inclusion
\begin{align*}
-\frac{1}{\lambda}\nabla \ell(\hat{x}) \in \partial R(\hat{x}).
\end{align*}
This proves both stated formulations of the optimality certificate.
[/step]