[proofplan]
The proof is a corollary of Kellerer's [compactness theorem](/theorems/2748) for the Kantorovich dual problem, in a form strictly stronger than the present statement: it asserts compactness and attainment for maximizing sequences after domination by integrable marginal functions. We first define the admissible dual value and prove [weak duality](/theorems/2549), so the supremum is finite. We then invoke the stronger Kellerer compactness theorem, verify each of its hypotheses from the stated domination $0\leq c\leq a+b$, and obtain finite real-valued Borel integrable potentials satisfying the admissibility inequality pointwise everywhere and attaining the dual supremum.
[/proofplan]
[step:Define the admissible dual value]
Let $\mathcal{A}_c(\mu,\nu)$ denote the set of all pairs $(f,g)$ such that $f: X \to \mathbb{R}$ and $g: Y \to \mathbb{R}$ are finite real-valued Borel functions, $f \in L^1(X,\mu)$, $g \in L^1(Y,\nu)$, and
\begin{align*}
f(x)+g(y) \leq c(x,y)
\end{align*}
for every $(x,y) \in X \times Y$. Define the Kantorovich dual value
\begin{align*}
D_c(\mu,\nu)
:=
\sup_{(f,g)\in \mathcal{A}_c(\mu,\nu)}
\left(
\int_X f(x)\, d\mu(x)+\int_Y g(y)\, d\nu(y)
\right).
\end{align*}
The pair $(0,0)$ belongs to $\mathcal{A}_c(\mu,\nu)$ because $c \geq 0$, so $D_c(\mu,\nu)$ is not $-\infty$. Also, for every $(f,g)\in \mathcal{A}_c(\mu,\nu)$ and every $\pi \in \Pi(\mu,\nu)$,
\begin{align*}
\int_X f(x)\, d\mu(x)+\int_Y g(y)\, d\nu(y)
=
\int_{X\times Y} \bigl(f(x)+g(y)\bigr)\, d\pi(x,y)
\leq
\int_{X\times Y} c(x,y)\, d\pi(x,y).
\end{align*}
Taking the infimum over $\pi \in \Pi(\mu,\nu)$ gives
\begin{align*}
D_c(\mu,\nu) \leq C_c(\mu,\nu)<\infty.
\end{align*}
[guided]
The dual problem is the maximization problem over all admissible potentials. We therefore introduce the precise class
$\mathcal{A}_c(\mu,\nu)$: a pair $(f,g)$ is admissible when both functions are finite real-valued Borel maps, marginally integrable, and satisfy the pointwise constraint
\begin{align*}
f(x)+g(y) \leq c(x,y)
\end{align*}
for every $(x,y)\in X\times Y$.
The zero pair is admissible because $c$ takes values in $[0,\infty)$, so the dual value is a genuine extended real number bounded below by $0$. Next we prove weak duality, since this identifies the scale on which the maximizing sequence lives. Let $(f,g)\in \mathcal{A}_c(\mu,\nu)$ and let $\pi\in \Pi(\mu,\nu)$. Because the first marginal of $\pi$ is $\mu$ and the second marginal of $\pi$ is $\nu$, the marginal identities give
\begin{align*}
\int_X f(x)\, d\mu(x)+\int_Y g(y)\, d\nu(y)
=
\int_{X\times Y} f(x)\, d\pi(x,y)
+
\int_{X\times Y} g(y)\, d\pi(x,y).
\end{align*}
Combining the two integrals yields
\begin{align*}
\int_X f(x)\, d\mu(x)+\int_Y g(y)\, d\nu(y)
=
\int_{X\times Y} \bigl(f(x)+g(y)\bigr)\, d\pi(x,y).
\end{align*}
The admissibility inequality then gives
\begin{align*}
\int_{X\times Y} \bigl(f(x)+g(y)\bigr)\, d\pi(x,y)
\leq
\int_{X\times Y} c(x,y)\, d\pi(x,y).
\end{align*}
Since this holds for every transport plan $\pi\in\Pi(\mu,\nu)$, taking the infimum over all such $\pi$ gives
\begin{align*}
\int_X f(x)\, d\mu(x)+\int_Y g(y)\, d\nu(y)
\leq
C_c(\mu,\nu).
\end{align*}
Finally, taking the supremum over all admissible pairs $(f,g)$ gives
\begin{align*}
D_c(\mu,\nu) \leq C_c(\mu,\nu)<\infty.
\end{align*}
[/guided]
[/step]
[step:Apply Kellerer's compactness theorem to a maximizing sequence]
We use the following previously established external result, which is stronger than the present theorem because it gives compactness of maximizing sequences in addition to existence of an optimizer. Kellerer's compactness theorem for the Kantorovich dual problem states: if $X$ and $Y$ are Polish spaces, $\mu\in\mathcal{P}(X)$, $\nu\in\mathcal{P}(Y)$, and $c:X\times Y\to[0,\infty)$ is finite lower semicontinuous with Borel functions $a:X\to\mathbb{R}$ and $b:Y\to\mathbb{R}$ satisfying $a\in L^1(X,\mu)$, $b\in L^1(Y,\nu)$, and $c(x,y)\leq a(x)+b(y)$ for every $(x,y)\in X\times Y$, then every maximizing sequence in $\mathcal{A}_c(\mu,\nu)$ has a subsequence whose normalized potentials converge in the Kellerer compactness topology to Borel integrable potentials $(\varphi,\psi)\in\mathcal{A}_c(\mu,\nu)$ attaining the supremum $D_c(\mu,\nu)$.
We verify the hypotheses of this theorem. The spaces $X$ and $Y$ are Polish by assumption, and $\mu$ and $\nu$ are Borel probability measures because $\mu\in\mathcal{P}(X)$ and $\nu\in\mathcal{P}(Y)$. The cost map $c:X\times Y\to[0,\infty)$ is finite and lower semicontinuous by assumption. The domination hypothesis is also exactly available: the given Borel maps $a:X\to\mathbb{R}$ and $b:Y\to\mathbb{R}$ satisfy $a\in L^1(X,\mu)$, $b\in L^1(Y,\nu)$, and
\begin{align*}
0 \leq c(x,y) \leq a(x)+b(y)
\end{align*}
for every $(x,y)\in X\times Y$.
Since $D_c(\mu,\nu)>-\infty$ and $D_c(\mu,\nu)<\infty$ by the preceding step, choose a maximizing sequence $((f_n,g_n))_{n=1}^{\infty}$ in $\mathcal{A}_c(\mu,\nu)$, meaning that each $(f_n,g_n)\in\mathcal{A}_c(\mu,\nu)$ and
\begin{align*}
\lim_{n\to\infty}
\left(
\int_X f_n(x)\, d\mu(x)+\int_Y g_n(y)\, d\nu(y)
\right)
=
D_c(\mu,\nu).
\end{align*}
Kellerer's compactness theorem applied to this sequence gives finite real-valued Borel functions $\varphi:X\to\mathbb{R}$ and $\psi:Y\to\mathbb{R}$ such that $\varphi\in L^1(X,\mu)$, $\psi\in L^1(Y,\nu)$,
\begin{align*}
\varphi(x)+\psi(y)\leq c(x,y)
\end{align*}
for every $(x,y)\in X\times Y$, and
\begin{align*}
\int_X \varphi(x)\, d\mu(x)+\int_Y \psi(y)\, d\nu(y)
=
D_c(\mu,\nu).
\end{align*}
[guided]
The central input is not the theorem currently being proved, but Kellerer's compactness theorem for maximizing sequences in the Kantorovich dual problem. That theorem has a stronger conclusion than mere existence: after the usual normalization of dual potentials, a maximizing sequence has a compactness limit, and the limit is an admissible maximizing pair. Its hypotheses are precisely the structural assumptions in the present theorem: Polish source and target spaces, probability marginals, a finite lower semicontinuous non-negative cost, and an upper bound by an integrable marginal sum.
We now check those hypotheses one by one. The spaces $X$ and $Y$ are Polish by assumption. Since $\mu\in\mathcal{P}(X)$ and $\nu\in\mathcal{P}(Y)$, $\mu$ and $\nu$ are Borel probability measures on $X$ and $Y$. The function
\begin{align*}
c:X\times Y\to[0,\infty)
\end{align*}
is finite and lower semicontinuous by the theorem statement. Finally, the domination needed for Kellerer's compactness theorem is exactly the given inequality: there are Borel maps $a:X\to\mathbb{R}$ and $b:Y\to\mathbb{R}$ with $a\in L^1(X,\mu)$ and $b\in L^1(Y,\nu)$ such that
\begin{align*}
0 \leq c(x,y) \leq a(x)+b(y)
\end{align*}
for every $(x,y)\in X\times Y$.
Why is this domination hypothesis important? It prevents maximizing sequences from losing mass by sending one potential to $+\infty$ on a small $\mu$-set and the other to $-\infty$ on a compensating $\nu$-set. Kellerer's theorem packages this tightness and normalization argument into a compactness statement for the dual potentials.
Because the previous step proved $0\leq D_c(\mu,\nu)<\infty$, we may choose a maximizing sequence $((f_n,g_n))_{n=1}^{\infty}$ in $\mathcal{A}_c(\mu,\nu)$, so each pair is admissible and
\begin{align*}
\lim_{n\to\infty}
\left(
\int_X f_n(x)\, d\mu(x)+\int_Y g_n(y)\, d\nu(y)
\right)
=
D_c(\mu,\nu).
\end{align*}
Applying Kellerer's compactness theorem to this sequence yields finite real-valued Borel maps $\varphi:X\to\mathbb{R}$ and $\psi:Y\to\mathbb{R}$ with $\varphi\in L^1(X,\mu)$ and $\psi\in L^1(Y,\nu)$. The same theorem gives the pointwise admissibility condition
\begin{align*}
\varphi(x)+\psi(y)\leq c(x,y)
\end{align*}
for every $(x,y)\in X\times Y$, not merely outside a null set. It also gives attainment of the dual value:
\begin{align*}
\int_X \varphi(x)\, d\mu(x)+\int_Y \psi(y)\, d\nu(y)
=
D_c(\mu,\nu).
\end{align*}
Thus the pair obtained from the compactness theorem is exactly an admissible Borel integrable dual maximizer.
[/guided]
[/step]
[step:Conclude that the compactness limit is the required dual maximizer]
The preceding step gives finite real-valued Borel functions $\varphi:X\to\mathbb{R}$ and $\psi:Y\to\mathbb{R}$ with $\varphi\in L^1(X,\mu)$ and $\psi\in L^1(Y,\nu)$, satisfying
\begin{align*}
\varphi(x)+\psi(y)\leq c(x,y)
\end{align*}
for every $(x,y)\in X\times Y$. Hence $(\varphi,\psi)\in\mathcal{A}_c(\mu,\nu)$. The same application of Kellerer's compactness theorem gives
\begin{align*}
\int_X \varphi(x)\, d\mu(x)+\int_Y \psi(y)\, d\nu(y)
=
D_c(\mu,\nu).
\end{align*}
Therefore $(\varphi,\psi)$ belongs to the explicitly defined finite real-valued Borel integrable admissible class $\mathcal{A}_c(\mu,\nu)$ and attains the dual supremum $D_c(\mu,\nu)$, as required.
[guided]
The preceding step already produced the only object needed for the theorem: a pair of finite real-valued Borel maps
$\varphi:X\to\mathbb{R}$ and $\psi:Y\to\mathbb{R}$ with $\varphi\in L^1(X,\mu)$ and $\psi\in L^1(Y,\nu)$. Kellerer's compactness theorem also gives the pointwise admissibility inequality
\begin{align*}
\varphi(x)+\psi(y)\leq c(x,y)
\end{align*}
for every $(x,y)\in X\times Y$. By the definition of $\mathcal{A}_c(\mu,\nu)$, these two facts imply
$(\varphi,\psi)\in\mathcal{A}_c(\mu,\nu)$.
It remains only to record the maximizing property. The compactness theorem was applied to a maximizing sequence for $D_c(\mu,\nu)$, and its conclusion gives
\begin{align*}
\int_X \varphi(x)\, d\mu(x)+\int_Y \psi(y)\, d\nu(y)
=
D_c(\mu,\nu).
\end{align*}
Thus the supremum defining the Kantorovich dual value over finite real-valued Borel integrable admissible potentials is attained by $(\varphi,\psi)$. This is exactly the desired conclusion.
[/guided]
[/step]