[proofplan]
We prove the equivalence by reducing optimality to equality in the two Fenchel inequalities used to derive [weak duality](/theorems/2549). If the subgradient inclusions hold, the equality case in Fenchel's inequality gives two identities whose sum says that the primal objective at $x^*$ equals the dual objective at $y^*$. Since weak duality always gives every dual value below every primal value, this equality forces primal optimality, dual optimality, and zero duality gap. Conversely, if an optimal pair has zero gap, the two nonnegative Fenchel slacks appearing in the weak-duality estimate must both vanish, and equality in Fenchel's inequality is exactly the stated pair of subgradient inclusions.
[/proofplan]
[step:Record the Fenchel inequality and its equality condition]
For a proper convex function $h: \mathbb{R}^k \to (-\infty,\infty]$, define its convex conjugate
\begin{align*}
h^*: \mathbb{R}^k \to (-\infty,\infty]
\end{align*}
by
\begin{align*}
h^*(u) := \sup_{z \in \mathbb{R}^k} \{ u \cdot z - h(z) \}.
\end{align*}
For every $z,u \in \mathbb{R}^k$, the definition of the supremum gives
\begin{align*}
u \cdot z - h(z) \leq h^*(u).
\end{align*}
Equivalently,
\begin{align*}
h(z) + h^*(u) \geq u \cdot z.
\end{align*}
Moreover, equality holds if and only if $u \in \partial h(z)$, where equality is equality of finite [real numbers](/page/Real%20Numbers). Indeed, if equality holds, then the right-hand side $u \cdot z$ is finite and $h$ is proper, so neither $h(z)$ nor $h^*(u)$ can be $+\infty$ in the equality
\begin{align*}
h(z) + h^*(u) = u \cdot z.
\end{align*}
Thus $h(z)$ and $h^*(u)$ are finite real numbers, and the equality is equivalent to
\begin{align*}
h^*(u) = u \cdot z - h(z).
\end{align*}
By the definition of $h^*(u)$, this is equivalent to saying that for every $w \in \mathbb{R}^k$,
\begin{align*}
u \cdot w - h(w) \leq u \cdot z - h(z),
\end{align*}
which rearranges to
\begin{align*}
h(w) \geq h(z) + u \cdot (w-z) \qquad \text{for all } w \in \mathbb{R}^k.
\end{align*}
This is precisely the definition of $u \in \partial h(z)$. Conversely, if $u \in \partial h(z)$, then $h(z)$ is finite by the definition of the subdifferential, and for every $w \in \mathbb{R}^k$,
\begin{align*}
u \cdot w - h(w) \leq u \cdot z - h(z).
\end{align*}
Taking the supremum over $w \in \mathbb{R}^k$ gives
\begin{align*}
h^*(u) \leq u \cdot z - h(z).
\end{align*}
The reverse inequality follows by evaluating the supremum defining $h^*(u)$ at $w=z$, so
\begin{align*}
h^*(u) = u \cdot z - h(z),
\end{align*}
and hence equality holds in Fenchel's inequality.
[/step]
[step:Derive weak duality from the two Fenchel inequalities]
Fix arbitrary $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. Apply the Fenchel inequality to $f$ with $z=x$ and $u=-A^\top y$. This gives
\begin{align*}
f(x) + f^*(-A^\top y) \geq (-A^\top y) \cdot x.
\end{align*}
Since $A^\top$ is the transpose of $A$, the Euclidean adjoint identity gives
\begin{align*}
(-A^\top y) \cdot x = - y \cdot Ax.
\end{align*}
Thus
\begin{align*}
f(x) + f^*(-A^\top y) \geq -y \cdot Ax.
\end{align*}
Apply the Fenchel inequality to $g$ with $z=Ax$ and $u=y$. This gives
\begin{align*}
g(Ax) + g^*(y) \geq y \cdot Ax.
\end{align*}
Adding the two inequalities cancels the two pairing terms and yields
\begin{align*}
f(x) + g(Ax) + f^*(-A^\top y) + g^*(y) \geq 0.
\end{align*}
Equivalently,
\begin{align*}
-f^*(-A^\top y) - g^*(y) \leq f(x) + g(Ax).
\end{align*}
Since $x$ and $y$ were arbitrary, every dual feasible value is bounded above by every primal feasible value.
All displayed calculations in this derivation are written as separate single-line `align*` displays, so no raw TeX row separators are used.
[guided]
The purpose of this step is to isolate the entire duality argument in one inequality. Take any primal candidate $x \in \mathbb{R}^n$ and any dual candidate $y \in \mathbb{R}^m$. We want to compare the primal value $f(x)+g(Ax)$ with the dual value $-f^*(-A^\top y)-g^*(y)$.
First apply Fenchel's inequality to the function $f: \mathbb{R}^n \to (-\infty,\infty]$ at the point $x \in \mathbb{R}^n$ and the dual vector $-A^\top y \in \mathbb{R}^n$. This gives
\begin{align*}
f(x) + f^*(-A^\top y) \geq (-A^\top y) \cdot x.
\end{align*}
The right-hand side is converted into a pairing in $\mathbb{R}^m$ by the defining property of the transpose. Since $A^\top: \mathbb{R}^m \to \mathbb{R}^n$ is the transpose of $A: \mathbb{R}^n \to \mathbb{R}^m$, we have
\begin{align*}
(A^\top y) \cdot x = y \cdot Ax.
\end{align*}
Multiplying by $-1$ gives
\begin{align*}
(-A^\top y) \cdot x = -y \cdot Ax.
\end{align*}
Therefore
\begin{align*}
f(x) + f^*(-A^\top y) \geq -y \cdot Ax.
\end{align*}
Now apply Fenchel's inequality to the function $g: \mathbb{R}^m \to (-\infty,\infty]$ at the point $Ax \in \mathbb{R}^m$ and the dual vector $y \in \mathbb{R}^m$. This gives
\begin{align*}
g(Ax) + g^*(y) \geq y \cdot Ax.
\end{align*}
The reason these two choices are paired is that the terms $-y \cdot Ax$ and $y \cdot Ax$ cancel when the inequalities are added. Adding them gives
\begin{align*}
f(x) + g(Ax) + f^*(-A^\top y) + g^*(y) \geq 0.
\end{align*}
Rearranging gives the weak-duality inequality
\begin{align*}
-f^*(-A^\top y) - g^*(y) \leq f(x) + g(Ax).
\end{align*}
This inequality holds for every $x \in \mathbb{R}^n$ and every $y \in \mathbb{R}^m$, so no feasibility qualification is being hidden: the extended-real values automatically encode inadmissible choices.
[/guided]
[/step]
[step:Use the subgradient inclusions to force equality of primal and dual values]
Assume
\begin{align*}
-A^\top y^* \in \partial f(x^*), \qquad y^* \in \partial g(Ax^*).
\end{align*}
By the equality condition in Fenchel's inequality applied to $f$ with $z=x^*$ and $u=-A^\top y^*$,
\begin{align*}
f(x^*) + f^*(-A^\top y^*) = (-A^\top y^*) \cdot x^*.
\end{align*}
Using the transpose identity, this becomes
\begin{align*}
f(x^*) + f^*(-A^\top y^*) = -y^* \cdot Ax^*.
\end{align*}
By the equality condition in Fenchel's inequality applied to $g$ with $z=Ax^*$ and $u=y^*$,
\begin{align*}
g(Ax^*) + g^*(y^*) = y^* \cdot Ax^*.
\end{align*}
Adding the two identities gives
\begin{align*}
f(x^*) + g(Ax^*) + f^*(-A^\top y^*) + g^*(y^*) = 0.
\end{align*}
Equivalently,
\begin{align*}
f(x^*) + g(Ax^*) = -f^*(-A^\top y^*) - g^*(y^*).
\end{align*}
Weak duality gives, for every $x \in \mathbb{R}^n$ and every $y \in \mathbb{R}^m$,
\begin{align*}
-f^*(-A^\top y) - g^*(y) \leq f(x^*) + g(Ax^*) = -f^*(-A^\top y^*) - g^*(y^*) \leq f(x) + g(Ax).
\end{align*}
Thus $x^*$ attains the primal infimum, $y^*$ attains the dual supremum, and the two optimal values are equal.
[/step]
[step:Use zero duality gap to force equality in both Fenchel inequalities]
Assume now that $x^*$ is primal optimal, $y^*$ is dual optimal, and
\begin{align*}
f(x^*) + g(Ax^*) = -f^*(-A^\top y^*) - g^*(y^*).
\end{align*}
Because $x^*$ is an attained primal optimum and $y^*$ is an attained dual optimum with zero duality gap, the common optimal value is finite. Hence $f(x^*) + g(Ax^*)$ and $-f^*(-A^\top y^*) - g^*(y^*)$ are finite real numbers. Since $f$ and $g$ are proper convex functions, they never take the value $-\infty$; by definition as suprema of affine functions minus proper extended-real values, $f^*$ and $g^*$ also never take the value $-\infty$. This implies that $f(x^*)$, $g(Ax^*)$, $f^*(-A^\top y^*)$, and $g^*(y^*)$ are all finite real numbers. Define the two real-valued Fenchel slack quantities
\begin{align*}
S_f := f(x^*) + f^*(-A^\top y^*) - (-A^\top y^*) \cdot x^*
\end{align*}
and
\begin{align*}
S_g := g(Ax^*) + g^*(y^*) - y^* \cdot Ax^*.
\end{align*}
By Fenchel's inequality applied to $f$ and to $g$, both quantities satisfy
\begin{align*}
S_f \geq 0, \qquad S_g \geq 0.
\end{align*}
Using $(-A^\top y^*) \cdot x^* = -y^* \cdot Ax^*$, their sum is
\begin{align*}
S_f + S_g = f(x^*) + g(Ax^*) + f^*(-A^\top y^*) + g^*(y^*).
\end{align*}
The assumed equality of the primal value at $x^*$ and the dual value at $y^*$ says exactly that
\begin{align*}
f(x^*) + g(Ax^*) + f^*(-A^\top y^*) + g^*(y^*) = 0.
\end{align*}
Hence
\begin{align*}
S_f + S_g = 0.
\end{align*}
Since $S_f$ and $S_g$ are nonnegative real numbers whose sum is $0$, both are equal to $0$. Therefore equality holds in Fenchel's inequality for $f$ at $(x^*,-A^\top y^*)$ and for $g$ at $(Ax^*,y^*)$. By the equality condition proved above,
\begin{align*}
-A^\top y^* \in \partial f(x^*), \qquad y^* \in \partial g(Ax^*).
\end{align*}
This proves the converse implication and completes the proof.
[guided]
The converse is the equality-case argument behind weak duality. Assume $x^*$ is primal optimal, $y^*$ is dual optimal, and the duality gap is zero, so
\begin{align*}
f(x^*) + g(Ax^*) = -f^*(-A^\top y^*) - g^*(y^*).
\end{align*}
Because both optima are attained and the gap is zero, this common value is finite. Therefore the displayed primal and dual values are finite real numbers. Since $f$ and $g$ never take the value $-\infty$, and since their conjugates $f^*$ and $g^*$ are defined as suprema and hence also never take the value $-\infty$, each of the four terms $f(x^*)$, $g(Ax^*)$, $f^*(-A^\top y^*)$, and $g^*(y^*)$ is finite.
Define the Fenchel slack for $f$ at the pair $(x^*,-A^\top y^*)$ by
\begin{align*}
S_f := f(x^*) + f^*(-A^\top y^*) - (-A^\top y^*) \cdot x^*.
\end{align*}
Define the Fenchel slack for $g$ at the pair $(Ax^*,y^*)$ by
\begin{align*}
S_g := g(Ax^*) + g^*(y^*) - y^* \cdot Ax^*.
\end{align*}
Fenchel's inequality gives $S_f \geq 0$ and $S_g \geq 0$. The transpose identity gives $(-A^\top y^*) \cdot x^* = -y^* \cdot Ax^*$, so adding the two slack definitions cancels the pairing terms and gives
\begin{align*}
S_f + S_g = f(x^*) + g(Ax^*) + f^*(-A^\top y^*) + g^*(y^*).
\end{align*}
The zero-gap identity is exactly the statement that the right-hand side is $0$. Hence
\begin{align*}
S_f + S_g = 0.
\end{align*}
Now $S_f$ and $S_g$ are nonnegative real numbers, so a zero sum forces $S_f=0$ and $S_g=0$. Thus equality holds in Fenchel's inequality for $f$ at $(x^*,-A^\top y^*)$ and for $g$ at $(Ax^*,y^*)$. By the equality condition proved at the start of the proof, this is equivalent to
\begin{align*}
-A^\top y^* \in \partial f(x^*), \qquad y^* \in \partial g(Ax^*).
\end{align*}
This proves the converse implication and completes the proof.
[/guided]
[/step]