[proofplan]
The forward implication is exactly the concentration theorem for optimal finite-cost plans: an optimal plan is concentrated on a Borel $c$-cyclically monotone set. For the converse, we start from a nonempty Borel $c$-cyclically monotone carrying set $\Gamma$ for $\pi$, reconstruct a $c$-convex potential $\phi$, and use $\Gamma \subset \partial^c\phi$ to obtain equality $\phi(x)+\phi^c(y)=c(x,y)$ on a $\pi$-full set. Integrating this equality gives equality between the cost of $\pi$ and the value of a dual-admissible pair. [Weak duality](/theorems/2549) then forces every competing transport plan to have cost at least the cost of $\pi$, so $\pi$ is optimal.
[/proofplan]
[step:Use the concentration theorem for the forward implication]
Assume that $\pi \in \Pi(\mu,\nu)$ is optimal and has finite cost. The hypotheses of the concentration theorem for optimal plans with lower semicontinuous costs are satisfied: $X$ and $Y$ are Polish spaces, $c: X \times Y \to [0,\infty]$ is lower semicontinuous, and $\pi$ is an optimal plan with
\begin{align*}
\int_{X \times Y} c(x,y)\, d\pi(x,y)<\infty.
\end{align*}
By that theorem, every optimal finite-cost plan is concentrated on a Borel $c$-cyclically monotone set. Hence there exists a Borel $c$-cyclically monotone set $\Gamma \subset X \times Y$ such that $\pi(\Gamma)=1$.
Since $\pi$ is a probability measure, $\pi(\Gamma)=1$ implies $\Gamma \neq \varnothing$. Thus $\pi$ is concentrated on a nonempty Borel $c$-cyclically monotone set.
[/step]
[step:Reconstruct dual potentials from the carrying set]
Assume conversely that $\pi \in \Pi(\mu,\nu)$ has finite cost and that there exists a nonempty Borel $c$-cyclically monotone set $\Gamma \subset X \times Y$ such that $\pi(\Gamma)=1$.
By the reconstruction hypothesis applied to this set $\Gamma$ and the finite-cost plan $\pi$, there exists a Borel $c$-convex function $\phi: X \to (-\infty,\infty]$ whose $c$-transform
\begin{align*}
\phi^c: Y \to [-\infty,\infty]
\end{align*}
is Borel, satisfies $\Gamma \subset \partial^c\phi$, and has well-defined marginal integrals against $\mu$ and $\nu$.
By the definition of the $c$-transform, for every $(x,y) \in X \times Y$,
\begin{align*}
\phi^c(y) \le c(x,y)-\phi(x).
\end{align*}
Equivalently,
\begin{align*}
\phi(x)+\phi^c(y) \le c(x,y).
\end{align*}
Thus $(\phi,\phi^c)$ is a Kantorovich dual-admissible pair.
Because $\Gamma \subset \partial^c\phi$, the defining equality for the $c$-subdifferential gives, for every $(x,y) \in \Gamma$,
\begin{align*}
\phi(x)+\phi^c(y)=c(x,y).
\end{align*}
[guided]
We now turn the geometric assumption on $\Gamma$ into a dual certificate. The reconstruction hypothesis is designed exactly for this purpose: from the nonempty Borel $c$-cyclically monotone set $\Gamma$ carrying the finite-cost plan $\pi$, it gives a Borel $c$-convex function $\phi: X \to (-\infty,\infty]$ and its Borel $c$-transform
\begin{align*}
\phi^c: Y \to [-\infty,\infty].
\end{align*}
The $c$-transform is defined by
\begin{align*}
\phi^c(y)=\inf_{x' \in X}\bigl(c(x',y)-\phi(x')\bigr).
\end{align*}
Fix any pair $(x,y) \in X \times Y$. Since an infimum is bounded above by each value in the family being infimized, we may take the particular point $x' = x$ and obtain
\begin{align*}
\phi^c(y) \le c(x,y)-\phi(x).
\end{align*}
Rearranging gives the dual admissibility inequality
\begin{align*}
\phi(x)+\phi^c(y) \le c(x,y).
\end{align*}
This inequality is the weak side of the dual certificate: it says that the potential pair never overestimates the transport cost. The stronger information comes from the $c$-subdifferential condition. Since $\Gamma \subset \partial^c\phi$, every pair $(x,y) \in \Gamma$ satisfies the equality condition in the definition of $\partial^c\phi$, namely
\begin{align*}
\phi(x)+\phi^c(y)=c(x,y).
\end{align*}
Thus the same admissible pair is tight precisely on the set where $\pi$ is concentrated.
[/guided]
[/step]
[step:Integrate the tight equality against the plan]
Since $\pi(\Gamma)=1$ and $\phi(x)+\phi^c(y)=c(x,y)$ for every $(x,y)\in\Gamma$, the equality holds $\pi$-almost everywhere. Therefore
\begin{align*}
\int_{X \times Y} c(x,y)\, d\pi(x,y)=\int_{X \times Y}\bigl(\phi(x)+\phi^c(y)\bigr)\, d\pi(x,y).
\end{align*}
The marginal conditions $\pi \in \Pi(\mu,\nu)$ mean that the first marginal of $\pi$ is $\mu$ and the second marginal of $\pi$ is $\nu$. Hence, using the definition of pushforward integration for the coordinate projections,
\begin{align*}
\int_{X \times Y} \phi(x)\, d\pi(x,y)=\int_X \phi(x)\, d\mu(x)
\end{align*}
and
\begin{align*}
\int_{X \times Y} \phi^c(y)\, d\pi(x,y)=\int_Y \phi^c(y)\, d\nu(y).
\end{align*}
The reconstruction hypothesis ensures that the sum of these extended integrals is well-defined. Consequently,
\begin{align*}
\int_{X \times Y} c(x,y)\, d\pi(x,y)=\int_X \phi(x)\, d\mu(x)+\int_Y \phi^c(y)\, d\nu(y).
\end{align*}
[/step]
[step:Apply extended weak duality to every competitor]
Let $\gamma \in \Pi(\mu,\nu)$ be arbitrary. Since $(\phi,\phi^c)$ is dual-admissible, the inequality
\begin{align*}
\phi(x)+\phi^c(y) \le c(x,y)
\end{align*}
holds for every pair $(x,y) \in X \times Y$ for which the extended sum is defined, in the admissibility sense coming from the $c$-transform. The functions $\phi$ and $\phi^c$ are Borel, and the reconstruction hypothesis gives the well-defined extended marginal quantity
\begin{align*}
\int_X \phi(x)\, d\mu(x)+\int_Y \phi^c(y)\, d\nu(y).
\end{align*}
Therefore the hypotheses of the extended-valued Kantorovich weak duality theorem are satisfied: for every transport plan $\gamma \in \Pi(\mu,\nu)$,
\begin{align*}
\int_X \phi(x)\, d\mu(x)+\int_Y \phi^c(y)\, d\nu(y) \le \int_{X \times Y} c(x,y)\, d\gamma(x,y),
\end{align*}
where the right-hand side is allowed to be $\infty$. This invocation avoids forming an undefined integral of the pointwise extended-valued sum against $\gamma$; weak duality is applied directly to the measurable admissible pair and its well-defined marginal integrals.
Combining this inequality with the equality obtained for $\pi$ yields
\begin{align*}
\int_{X \times Y} c(x,y)\, d\pi(x,y) \le \int_{X \times Y} c(x,y)\, d\gamma(x,y).
\end{align*}
If $\gamma$ has infinite cost, the inequality is immediate from the finiteness of the cost of $\pi$; if $\gamma$ has finite cost, it is exactly the preceding weak-duality inequality. Since $\gamma \in \Pi(\mu,\nu)$ was arbitrary, $\pi$ attains the minimum Kantorovich cost. Therefore $\pi$ is optimal.
[/step]