[proofplan]
We first check that every $1$-Lipschitz [test function](/page/Test%20Function) is integrable against both measures, so the dual expression is well-defined. The easy inequality, that the Lipschitz supremum is at most $W_1(\mu,\nu)$, follows directly from the defining Lipschitz estimate $f(y)-f(x)\le d(x,y)$, integrated against an arbitrary coupling. For the reverse inequality, we invoke the general [Kantorovich duality](/theorems/6799) theorem for non-negative lower semicontinuous costs on Polish spaces, applied to the cost $c(x,y)=d(x,y)$. The two-potential dual problem is then reduced to a single $1$-Lipschitz potential by replacing each admissible pair with the $d$-Lipschitz envelope of one of its potentials.
[/proofplan]
[step:Verify that the Lipschitz test integrals are finite]
Fix a point $x_0 \in X$. Let $f:X\to\mathbb R$ be $1$-Lipschitz. Since $f$ is Lipschitz, it is continuous and therefore $\mathcal{B}(X)$-measurable. For every $x\in X$, the Lipschitz condition gives
\begin{align*}
|f(x)| \le |f(x_0)| + d(x,x_0).
\end{align*}
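Since $\mu$ and $\nu$ are probability measures with finite first moments, integrating this estimate bounds $\int_X |f|\,d\mu$ by $|f(x_0)|+\int_X d(x,x_0)\,d\mu(x)$, and similarly for $\nu$. Hence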
\begin{align*}
\int_X |f|\,d\mu < \infty
\end{align*}
and
\begin{align*}
\int_X |f|\,d\nu < \infty.
\end{align*}
Thus both integrals in the displayed supremum are finite [real numbers](/page/Real%20Numbers).
The same first-moment assumption also implies that $W_1(\mu,\nu)<\infty$. Indeed, the product probability measure $\mu\otimes\nu$ belongs to $\Pi(\mu,\nu)$, and the triangle inequality gives
\begin{align*}
d(x,y) \le d(x,x_0)+d(y,x_0)
\end{align*}
for all $(x,y)\in X\times X$. Therefore Tonelli's theorem applies to the non-negative functions involved and yields
\begin{align*}
\int_{X\times X} d(x,y)\,d(\mu\otimes\nu)(x,y) \le \int_X d(x,x_0)\,d\mu(x)+\int_X d(y,x_0)\,d\nu(y) < \infty.
\end{align*}
[/step]
[step:Integrate the pointwise Lipschitz bound against an arbitrary coupling]
Let $\pi\in\Pi(\mu,\nu)$ be arbitrary, and let $f:X\to\mathbb R$ be $1$-Lipschitz. Since $\pi$ has first marginal $\mu$ and second marginal $\nu$, the marginal identities give
\begin{align*}
\int_{X\times X} f(y)\,d\pi(x,y)=\int_X f\,d\nu
\end{align*}
and
\begin{align*}
\int_{X\times X} f(x)\,d\pi(x,y)=\int_X f\,d\mu.
\end{align*}
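The Lipschitz condition gives $f(y)-f(x)\le d(x,y)$ for every $(x,y)\in X\times X$. By the first step, both $(x,y)\mapsto f(x)$ and $(x,y)\mapsto f(y)$ are $\pi$-integrable, so integrating this pointwise inequality with respect to $\pi$ and using linearity of the integral gives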
\begin{align*}
\int_X f\,d\nu-\int_X f\,d\mu \le \int_{X\times X} d(x,y)\,d\pi(x,y).
\end{align*}
Taking the infimum over all $\pi\in\Pi(\mu,\nu)$ yields
\begin{align*}
\int_X f\,d\nu-\int_X f\,d\mu \le W_1(\mu,\nu).
\end{align*}
Finally, taking the supremum over all $1$-Lipschitz functions $f:X\to\mathbb R$ gives
\begin{align*}
\sup\left\{\int_X f\,d\nu-\int_X f\,d\mu : f:X\to\mathbb R \text{ is }1\text{-Lipschitz}\right\} \le W_1(\mu,\nu).
\end{align*}
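As an illustrative sanity check of this bound (the specific space and measures below are assumptions chosen for the example, not part of the statement), take $X=\mathbb R$ with $d(x,y)=|x-y|$, $\mu=\delta_0$, and $\nu=\tfrac12\delta_{-1}+\tfrac12\delta_{1}$. Because $\mu$ is a Dirac mass, the only coupling is $\mu\otimes\nu$, so
\begin{align*}
W_1(\mu,\nu)=\tfrac12\,|0-(-1)|+\tfrac12\,|0-1|=1.
\end{align*}
The $1$-Lipschitz functions $f(x)=x$ and $f(x)=|x|$ give, respectively,
\begin{align*}
\int_X f\,d\nu-\int_X f\,d\mu = 0-0 = 0 \le 1
\qquad\text{and}\qquad
\int_X f\,d\nu-\int_X f\,d\mu = 1-0 = 1 \le 1,
\end{align*}
so both values respect the bound, and the second one shows that the bound can be attained in this example.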
[/step]
[step:Apply Kantorovich duality to the metric cost]
Define the cost function
\begin{align*}
c:X\times X\to[0,\infty),\qquad c(x,y):=d(x,y).
\end{align*}
Because $d$ is continuous on $X\times X$, the function $c$ is lower semicontinuous. Because $\mu$ and $\nu$ have finite first moments, the cost has finite value for at least one coupling, as shown using $\mu\otimes\nu$ in the first step.
By the general Kantorovich duality theorem for non-negative lower semicontinuous costs on Polish spaces, applied to $c=d$ and to the Borel probability measures $\mu,\nu$ (citing a result not yet in the wiki: Kantorovich duality for lower semicontinuous costs on Polish spaces), we have
\begin{align*}
W_1(\mu,\nu)=\sup\left\{\int_X \varphi\,d\nu-\int_X \psi\,d\mu : \varphi(y)-\psi(x)\le d(x,y)\text{ for all }x,y\in X\right\},
\end{align*}
where the supremum is taken over admissible real-valued Borel functions $\varphi,\psi:X\to\mathbb R$ for which the displayed integrals are well-defined.
[guided]
We now pass from the transport problem to its two-potential dual problem. The cost function used in optimal transport is
\begin{align*}
c:X\times X\to[0,\infty),\qquad c(x,y):=d(x,y).
\end{align*}
The general Kantorovich duality theorem for Polish spaces requires the cost to be non-negative and lower semicontinuous. Non-negativity follows from the definition of a metric, and lower semicontinuity follows because $d:X\times X\to\mathbb R$ is continuous. Finiteness of the primal value also ensures that no indeterminate expression $+\infty-\infty$ arises when the two sides are compared; here the primal value is finite because the product coupling $\mu\otimes\nu$ has finite cost:
\begin{align*}
\int_{X\times X} d(x,y)\,d(\mu\otimes\nu)(x,y) \le \int_X d(x,x_0)\,d\mu(x)+\int_X d(y,x_0)\,d\nu(y) < \infty.
\end{align*}
Applying the general Kantorovich duality theorem to this cost gives the two-potential formula
\begin{align*}
W_1(\mu,\nu)=\sup\left\{\int_X \varphi\,d\nu-\int_X \psi\,d\mu : \varphi(y)-\psi(x)\le d(x,y)\text{ for all }x,y\in X\right\}.
\end{align*}
Here $\varphi:X\to\mathbb R$ and $\psi:X\to\mathbb R$ are Borel potentials satisfying the pointwise admissibility constraint. The rest of the proof no longer involves transport plans; it shows that every value of this two-potential problem is dominated by the value of a single $1$-Lipschitz function.
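One direction of the link between the two dual problems can be checked immediately, as a short illustration. If $f:X\to\mathbb R$ is $1$-Lipschitz, then the pair $\varphi=\psi=f$ is admissible, because
\begin{align*}
\varphi(y)-\psi(x)=f(y)-f(x)\le d(x,y)
\end{align*}
for all $x,y\in X$. For this pair the two-potential objective reduces to
\begin{align*}
\int_X \varphi\,d\nu-\int_X \psi\,d\mu=\int_X f\,d\nu-\int_X f\,d\mu,
\end{align*}
so the two-potential supremum is at least the supremum over $1$-Lipschitz functions.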
[/guided]
[/step]
[step:Compress every admissible pair into one Lipschitz potential]
Let $\varphi,\psi:X\to\mathbb R$ be an admissible pair in the Kantorovich dual problem, so that
\begin{align*}
\varphi(y)-\psi(x)\le d(x,y)
\end{align*}
for all $x,y\in X$. Define the envelope function
\begin{align*}
f:X\to\mathbb R,\qquad f(y):=\inf_{x\in X}\{\psi(x)+d(x,y)\}.
\end{align*}
For every $y\in X$, the admissibility inequality gives $\varphi(y)\le \psi(x)+d(x,y)$ for every $x\in X$, hence
\begin{align*}
\varphi(y)\le f(y).
\end{align*}
Taking $x=y$ in the definition of $f$ and using $d(y,y)=0$ gives
\begin{align*}
f(y)\le \psi(y).
\end{align*}
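In particular, $\varphi(y)\le f(y)\le\psi(y)$ for every $y\in X$, so the infimum defining $f(y)$ is a finite real number and $f$ is indeed real-valued. We claim that $f$ is $1$-Lipschitz. Let $y,z\in X$. For every $x\in X$, the triangle inequality gives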
\begin{align*}
\psi(x)+d(x,y)\le \psi(x)+d(x,z)+d(z,y).
\end{align*}
Taking the infimum over $x\in X$ gives
\begin{align*}
f(y)\le f(z)+d(y,z).
\end{align*}
Interchanging $y$ and $z$ gives
\begin{align*}
f(z)\le f(y)+d(y,z).
\end{align*}
Therefore $|f(y)-f(z)|\le d(y,z)$, so $f$ is $1$-Lipschitz.
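Since $f$ is $1$-Lipschitz, it is integrable against both $\mu$ and $\nu$ by the first step. Integrating $\varphi\le f$ against $\nu$ and $f\le\psi$ against $\mu$, we obtain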
\begin{align*}
\int_X \varphi\,d\nu-\int_X \psi\,d\mu \le \int_X f\,d\nu-\int_X f\,d\mu.
\end{align*}
Thus every admissible two-potential value is bounded above by a value achieved by a single $1$-Lipschitz potential. Consequently,
\begin{align*}
W_1(\mu,\nu)\le \sup\left\{\int_X f\,d\nu-\int_X f\,d\mu : f:X\to\mathbb R \text{ is }1\text{-Lipschitz}\right\}.
\end{align*}
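To see the envelope construction at work, consider the following worked example (the space $X=\mathbb R$ with $d(x,y)=|x-y|$ and the potential $\psi(x)=2|x|$ are assumptions chosen purely for illustration). For every $x,y\in\mathbb R$,
\begin{align*}
\psi(x)+d(x,y)=2|x|+|x-y|\ge |x|+|y-x|\ge |y|,
\end{align*}
with equality at $x=0$. Hence
\begin{align*}
f(y)=\inf_{x\in\mathbb R}\{2|x|+|x-y|\}=|y|.
\end{align*}
Here $\psi$ is only $2$-Lipschitz, while its envelope $f(y)=|y|$ is $1$-Lipschitz and satisfies $f\le\psi$. Any $\varphi$ forming an admissible pair with this $\psi$ must satisfy $\varphi(y)\le|y|=f(y)$, in agreement with the sandwich $\varphi\le f\le\psi$.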
[/step]
[step:Conclude equality of the primal and Lipschitz dual values]
The second step proved
\begin{align*}
\sup\left\{\int_X f\,d\nu-\int_X f\,d\mu : f:X\to\mathbb R \text{ is }1\text{-Lipschitz}\right\} \le W_1(\mu,\nu).
\end{align*}
The preceding step proved the reverse inequality
\begin{align*}
W_1(\mu,\nu)\le \sup\left\{\int_X f\,d\nu-\int_X f\,d\mu : f:X\to\mathbb R \text{ is }1\text{-Lipschitz}\right\}.
\end{align*}
Combining the two inequalities gives
\begin{align*}
W_1(\mu,\nu)=\sup\left\{\int_X f\,d\nu-\int_X f\,d\mu : f:X\to\mathbb R \text{ is }1\text{-Lipschitz}\right\}.
\end{align*}
This is the desired Kantorovich-Rubinstein duality formula.
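As a quick consistency check of the formula (the points $a,b\in X$ are introduced here only for illustration), take $\mu=\delta_a$ and $\nu=\delta_b$. The only coupling is $\delta_{(a,b)}$, so
\begin{align*}
W_1(\delta_a,\delta_b)=d(a,b).
\end{align*}
On the dual side, the function $f(x):=d(x,a)$ is $1$-Lipschitz by the reverse triangle inequality, and
\begin{align*}
\int_X f\,d\nu-\int_X f\,d\mu=d(b,a)-d(a,a)=d(a,b).
\end{align*}
Every other $1$-Lipschitz function satisfies $f(b)-f(a)\le d(a,b)$, so the supremum equals $d(a,b)$ and both sides of the duality agree in this case.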
[/step]