[proofplan]
We prove the equivalence by combining [Kantorovich duality](/theorems/6799) for the oriented cost from $\nu$ to $\mu$ with the Gibbs variational formula for relative entropy. In the forward direction, the admissible Kantorovich pair associated to a bounded [continuous function](/page/Continuous%20Function) $f$ is $\varphi=Q_1f$ and $\psi=-f$, which converts the transport-[entropy inequality](/theorems/6729) into a linear entropy bound. Taking the supremum over absolutely continuous probability measures gives the exponential inequality. In the reverse direction, the exponential inequality gives the same linear bound by the Gibbs formula, and Kantorovich duality recovers the full transport inequality by taking the supremum over all bounded continuous admissible pairs.
[/proofplan]
[step:Define the oriented Kantorovich dual problem]
Since $C>0$ by hypothesis, division by $C$ and multiplication by $C$ preserve the direction of inequalities throughout the proof.
Let $c:X\times X\to[0,\infty)$ denote the lower semicontinuous cost $c(x,y)=d(x,y)^2/2$. For $\nu,\mu\in\mathcal P(X)$, let $\Pi(\nu,\mu)$ denote the set of probability measures $\pi\in\mathcal P(X\times X)$ whose first marginal is $\nu$ and whose second marginal is $\mu$. The transport functional is oriented with first marginal $\nu$ and second marginal $\mu$:
\begin{align*}
\mathcal T_{d^2/2}(\nu,\mu)=\inf_{\pi\in\Pi(\nu,\mu)}\int_{X\times X}c(x,y)\,d\pi(x,y).
\end{align*}
The integral is understood as an extended non-negative integral, so the value may be $+\infty$.
Let $\mathcal B(X)$ denote the Borel $\sigma$-algebra generated by the topology of $X$. Let $C_b(X)$ denote the [vector space](/page/Vector%20Space) of bounded continuous maps $X\to\mathbb R$. We use the bounded-continuous Polish-space form of the Kantorovich duality theorem for non-negative lower semicontinuous costs. This is the exact duality input used below: both potentials are required to lie in $C_b(X)$, the constraint is pointwise on $X\times X$, and the value is allowed to be $+\infty$. The theorem applies here because $X$ is Polish and $c$ is non-negative and lower semicontinuous, and it gives, for every $\nu,\mu\in\mathcal P(X)$,
\begin{align*}
\mathcal T_{d^2/2}(\nu,\mu)=\sup\left\{\int_X \varphi(x)\,d\nu(x)+\int_X \psi(y)\,d\mu(y):\varphi,\psi\in C_b(X),\ \varphi(x)+\psi(y)\le c(x,y)\text{ for all }x,y\in X\right\}.
\end{align*}
Both sides of this equality take values in $[0,+\infty]$. Note the orientation: $\varphi$ is integrated against the first marginal $\nu$, while $\psi$ is integrated against the second marginal $\mu$.
[/step]
[step:Turn each bounded continuous function into an admissible Kantorovich pair]
Fix a bounded continuous function $f:X\to\mathbb R$. Define $Q_1f:X\to\mathbb R$ by
\begin{align*}
(Q_1 f)(x)=\inf_{y\in X}\left\{f(y)+c(x,y)\right\}.
\end{align*}
Since $f$ is bounded and $c\ge 0$, the function $Q_1f$ satisfies $\inf_X f\le Q_1f\le f\le\sup_X f$; the upper bound follows by choosing $y=x$, which gives $(Q_1f)(x)\le f(x)$. Also, $Q_1f$ is upper semicontinuous, being the pointwise infimum over $y\in X$ of the continuous functions $x\mapsto f(y)+c(x,y)$, and is therefore Borel measurable.
For every $x,y\in X$, the defining infimum gives
\begin{align*}
(Q_1f)(x)\le f(y)+c(x,y).
\end{align*}
Equivalently,
\begin{align*}
(Q_1f)(x)-f(y)\le c(x,y).
\end{align*}
Let $\nu\in\mathcal P(X)$, and let $\pi\in\Pi(\nu,\mu)$ be arbitrary. The pointwise inequality may be integrated against $\pi$ because $Q_1f$ and $f$ are bounded Borel functions, giving
\begin{align*}
\int_{X\times X}\left((Q_1f)(x)-f(y)\right)\,d\pi(x,y)\le\int_{X\times X}c(x,y)\,d\pi(x,y).
\end{align*}
Using the marginal identities for $\pi$ gives
\begin{align*}
\int_X (Q_1f)(x)\,d\nu(x)-\int_X f(y)\,d\mu(y)\le\int_{X\times X}c(x,y)\,d\pi(x,y).
\end{align*}
Taking the infimum over all $\pi\in\Pi(\nu,\mu)$ yields, for every $\nu\in\mathcal P(X)$,
\begin{align*}
\int_X (Q_1f)(x)\,d\nu(x)-\int_X f(y)\,d\mu(y)\le\mathcal T_{d^2/2}(\nu,\mu).
\end{align*}
[guided]
Fix a bounded continuous function $f:X\to\mathbb R$. The reason for introducing $Q_1f$ is that its definition builds the Kantorovich constraint into the notation. Define the map $Q_1f:X\to\mathbb R$ by
\begin{align*}
(Q_1 f)(x)=\inf_{y\in X}\left\{f(y)+\frac{d(x,y)^2}{2}\right\}.
\end{align*}
This function is bounded below because $f$ is bounded and $d(x,y)^2/2\ge 0$:
\begin{align*}
(Q_1f)(x)\ge \inf_{z\in X} f(z).
\end{align*}
It is bounded above because the point $y=x$ is allowed in the infimum:
\begin{align*}
(Q_1f)(x)\le f(x)+\frac{d(x,x)^2}{2}=f(x).
\end{align*}
Hence $Q_1f$ is bounded. It is also Borel measurable: for each fixed $y\in X$, the map $x\mapsto f(y)+d(x,y)^2/2$ is continuous, and the pointwise infimum of continuous functions is upper semicontinuous, hence Borel measurable.
Now take arbitrary points $x,y\in X$. Since $(Q_1f)(x)$ is the infimum over all choices of the second variable, the particular choice $y$ gives
\begin{align*}
(Q_1f)(x)\le f(y)+\frac{d(x,y)^2}{2}.
\end{align*}
Subtracting $f(y)$ from both sides gives
\begin{align*}
(Q_1f)(x)-f(y)\le \frac{d(x,y)^2}{2}.
\end{align*}
This is exactly the Kantorovich admissibility condition for the pair $\varphi=Q_1f$ and $\psi=-f$.
Now let $\nu\in\mathcal P(X)$ and choose an arbitrary coupling $\pi\in\Pi(\nu,\mu)$. Because $Q_1f$ and $f$ are bounded Borel functions, we may integrate the pointwise inequality against $\pi$:
\begin{align*}
\int_{X\times X}\left((Q_1f)(x)-f(y)\right)\,d\pi(x,y)\le\int_{X\times X}\frac{d(x,y)^2}{2}\,d\pi(x,y).
\end{align*}
The left-hand side separates by the marginal conditions defining $\Pi(\nu,\mu)$:
\begin{align*}
\int_{X\times X}(Q_1f)(x)\,d\pi(x,y)=\int_X (Q_1f)(x)\,d\nu(x)
\end{align*}
and
\begin{align*}
\int_{X\times X}f(y)\,d\pi(x,y)=\int_X f(y)\,d\mu(y).
\end{align*}
Therefore
\begin{align*}
\int_X (Q_1f)(x)\,d\nu(x)-\int_X f(y)\,d\mu(y)\le\int_{X\times X}\frac{d(x,y)^2}{2}\,d\pi(x,y).
\end{align*}
Since $\pi$ was arbitrary, taking the infimum over $\Pi(\nu,\mu)$ gives
\begin{align*}
\int_X (Q_1f)(x)\,d\nu(x)-\int_X f(y)\,d\mu(y)\le\mathcal T_{d^2/2}(\nu,\mu).
\end{align*}
The orientation matters: $Q_1f$ is integrated against the first marginal $\nu$, while $f$ is integrated against the second marginal $\mu$.
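As an illustration only, and not as part of the proof, assume $X=\mathbb R$ with $d(x,y)=|x-y|$ and take the hypothetical test function $f(y)=\min\{|y|,1\}\in C_b(\mathbb R)$. The infimum defining $Q_1$ commutes with pointwise minima, so $Q_1f=\min\{Q_1|\cdot|,\,Q_11\}$, where the same formula is applied to the unbounded function $|\cdot|$ and where $Q_11=1$. Minimizing $|y|+(x-y)^2/2$ in $y$ (the minimizer is $y=0$ for $|x|\le 1$ and $y=x-\operatorname{sgn}(x)$ otherwise) gives
\begin{align*}
(Q_1|\cdot|)(x)&=\begin{cases}x^2/2,&|x|\le 1,\\ |x|-\tfrac12,&|x|>1,\end{cases}\\
(Q_1f)(x)&=\begin{cases}x^2/2,&|x|\le 1,\\ |x|-\tfrac12,&1<|x|\le\tfrac32,\\ 1,&|x|>\tfrac32.\end{cases}
\end{align*}
In this example $0=\inf_{\mathbb R}f\le Q_1f\le f$, in agreement with the two bounds above. Here $Q_1f$ happens to be continuous, although the general argument only guarantees upper semicontinuity.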
[/guided]
[/step]
[step:Derive the exponential inequality from the transport-entropy inequality]
Assume that
\begin{align*}
\mathcal T_{d^2/2}(\nu,\mu)\le C\,H(\nu\mid\mu)
\end{align*}
for every $\nu\in\mathcal P(X)$. Fix a bounded continuous function $f:X\to\mathbb R$. Combining the previous step with the transport-entropy inequality gives, for every $\nu\in\mathcal P(X)$,
\begin{align*}
\int_X (Q_1f)(x)\,d\nu(x)-\int_X f(y)\,d\mu(y)\le C\,H(\nu\mid\mu).
\end{align*}
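If $H(\nu\mid\mu)=+\infty$, this inequality is automatic. For $\nu\ll\mu$, dividing by $C>0$ and subtracting $H(\nu\mid\mu)$ gives the following bound, whose left-hand side equals $-\infty$ when $H(\nu\mid\mu)=+\infty$: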
\begin{align*}
\frac{1}{C}\int_X (Q_1f)(x)\,d\nu(x)-H(\nu\mid\mu)
\le
\frac{1}{C}\int_X f(y)\,d\mu(y).
\end{align*}
We now apply the standard Gibbs variational formula for relative entropy. The form needed is the following: if $(X,\mathcal B(X),\mu)$ is a probability space and $g:X\to\mathbb R$ is bounded and Borel measurable, then
\begin{align*}
\log\int_X e^{g(x)}\,d\mu(x)
=
\sup_{\nu\in\mathcal P(X),\,\nu\ll\mu}
\left\{
\int_X g(x)\,d\nu(x)-H(\nu\mid\mu)
\right\}.
\end{align*}
The formula requires only a probability space and a bounded Borel measurable integrand. Here $(X,\mathcal B(X),\mu)$ is a probability space, and $Q_1f$ is bounded and Borel measurable by the previous step, so, since $C>0$, the same holds for the map $g:X\to\mathbb R$ defined by
\begin{align*}
g(x)=\frac{1}{C}(Q_1f)(x).
\end{align*}
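Substituting this $g$ into the Gibbs variational formula gives
\begin{align*}
\log\int_X \exp\left(\frac{1}{C}(Q_1f)(x)\right)\,d\mu(x)
=
\sup_{\nu\in\mathcal P(X),\,\nu\ll\mu}
\left\{
\frac{1}{C}\int_X (Q_1f)(x)\,d\nu(x)-H(\nu\mid\mu)
\right\}.
\end{align*}
By the rearranged inequality above, every term inside the supremum is at most $\frac{1}{C}\int_X f\,d\mu$, so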
\begin{align*}
\log\int_X \exp\left(\frac{1}{C}(Q_1f)(x)\right)\,d\mu(x)
\le
\frac{1}{C}\int_X f(y)\,d\mu(y).
\end{align*}
Exponentiating both sides gives
\begin{align*}
\int_X \exp\left(\frac{1}{C}(Q_1f)(x)\right)\,d\mu(x)
\le
\exp\left(\frac{1}{C}\int_X f(y)\,d\mu(y)\right).
\end{align*}
This is the desired dual exponential inequality.
[guided]
The only new ingredient in this step is the Gibbs variational formula, so we verify its hypotheses carefully. We apply it on the probability space $(X,\mathcal B(X),\mu)$ to the bounded Borel map $g:X\to\mathbb R$ defined by
\begin{align*}
g(x)=\frac{1}{C}(Q_1f)(x).
\end{align*}
The boundedness and Borel measurability of $g$ follow from the previous step and from $C>0$. The formula gives
\begin{align*}
\log\int_X e^{g(x)}\,d\mu(x)
=
\sup_{\nu\in\mathcal P(X),\,\nu\ll\mu}
\left\{
\int_X g(x)\,d\nu(x)-H(\nu\mid\mu)
\right\}.
\end{align*}
For every $\nu\ll\mu$, the transport-entropy inequality and the admissible-pair estimate give
\begin{align*}
\frac{1}{C}\int_X (Q_1f)(x)\,d\nu(x)-H(\nu\mid\mu)
\le
\frac{1}{C}\int_X f(y)\,d\mu(y).
\end{align*}
Taking the supremum over such $\nu$ and using the Gibbs formula therefore yields
\begin{align*}
\log\int_X \exp\left(\frac{1}{C}(Q_1f)(x)\right)\,d\mu(x)
\le
\frac{1}{C}\int_X f(y)\,d\mu(y).
\end{align*}
Since the exponential function is increasing, exponentiating preserves the inequality and gives
\begin{align*}
\int_X \exp\left(\frac{1}{C}(Q_1f)(x)\right)\,d\mu(x)
\le
\exp\left(\frac{1}{C}\int_X f(y)\,d\mu(y)\right).
\end{align*}
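A quick consistency check, offered as a sketch rather than as part of the argument: the exponential inequality is invariant under adding a constant to $f$. For a hypothetical constant $k\in\mathbb R$, the definition of $Q_1$ gives $Q_1(f+k)=Q_1f+k$, and since $\mu$ is a probability measure both sides pick up the same factor $e^{k/C}$:
\begin{align*}
\int_X \exp\left(\frac{1}{C}(Q_1(f+k))(x)\right)\,d\mu(x)
&=e^{k/C}\int_X \exp\left(\frac{1}{C}(Q_1f)(x)\right)\,d\mu(x),\\
\exp\left(\frac{1}{C}\int_X (f(y)+k)\,d\mu(y)\right)
&=e^{k/C}\exp\left(\frac{1}{C}\int_X f(y)\,d\mu(y)\right).
\end{align*}
In particular, for a constant function $f\equiv k$ one has $Q_1f\equiv k$, and the inequality holds with equality for every $C>0$ and every $\mu\in\mathcal P(X)$.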
[/guided]
[/step]
[step:Recover the linear entropy bound from the exponential inequality]
Assume conversely that for every bounded continuous function $f:X\to\mathbb R$,
\begin{align*}
\int_X \exp\left(\frac{1}{C}(Q_1f)(x)\right)\,d\mu(x)
\le
\exp\left(\frac{1}{C}\int_X f(y)\,d\mu(y)\right).
\end{align*}
Taking logarithms gives
\begin{align*}
\log\int_X \exp\left(\frac{1}{C}(Q_1f)(x)\right)\,d\mu(x)
\le
\frac{1}{C}\int_X f(y)\,d\mu(y).
\end{align*}
Let $\nu\in\mathcal P(X)$ with $\nu\ll\mu$. Applying the Gibbs variational formula stated in the previous step to the bounded Borel map $g:X\to\mathbb R$ defined by $g(x)=(Q_1f)(x)/C$ gives
\begin{align*}
\frac{1}{C}\int_X (Q_1f)(x)\,d\nu(x)-H(\nu\mid\mu)
\le
\log\int_X \exp\left(\frac{1}{C}(Q_1f)(x)\right)\,d\mu(x).
\end{align*}
Combining the last two inequalities and multiplying by $C$ gives
\begin{align*}
\int_X (Q_1f)(x)\,d\nu(x)-\int_X f(y)\,d\mu(y)
\le
C\,H(\nu\mid\mu).
\end{align*}
If $\nu\not\ll\mu$, then $H(\nu\mid\mu)=+\infty$, so this same inequality is automatic.
[guided]
Assume the exponential inequality and fix $f\in C_b(X)$. The logarithm is increasing on $(0,\infty)$, and multiplication by $C$ is order-preserving because $C>0$. Both sides of the assumed inequality are strictly positive, so its logarithmic form is
\begin{align*}
\log\int_X \exp\left(\frac{1}{C}(Q_1f)(x)\right)\,d\mu(x)
\le
\frac{1}{C}\int_X f(y)\,d\mu(y).
\end{align*}
Now fix $\nu\in\mathcal P(X)$ with $\nu\ll\mu$. Apply the standard Gibbs variational formula for relative entropy to the bounded Borel map $g:X\to\mathbb R$ given by $g(x)=(Q_1f)(x)/C$. The hypotheses are satisfied because $(X,\mathcal B(X),\mu)$ is a probability space and $g$ is bounded and Borel measurable. Since $\log\int_X e^{g}\,d\mu$ is the supremum of $\int_X g\,d\nu'-H(\nu'\mid\mu)$ over all probability measures $\nu'\ll\mu$, the particular measure $\nu$ gives
\begin{align*}
\frac{1}{C}\int_X (Q_1f)(x)\,d\nu(x)-H(\nu\mid\mu)
\le
\log\int_X \exp\left(\frac{1}{C}(Q_1f)(x)\right)\,d\mu(x).
\end{align*}
Combining the two displayed inequalities gives
\begin{align*}
\frac{1}{C}\int_X (Q_1f)(x)\,d\nu(x)-H(\nu\mid\mu)
\le
\frac{1}{C}\int_X f(y)\,d\mu(y).
\end{align*}
Multiplying by $C>0$ and rearranging yields
\begin{align*}
\int_X (Q_1f)(x)\,d\nu(x)-\int_X f(y)\,d\mu(y)
\le
C\,H(\nu\mid\mu).
\end{align*}
If $\nu\not\ll\mu$, the entropy convention gives $H(\nu\mid\mu)=+\infty$, so the same inequality holds without further work.
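The Gibbs inequality used in this step cannot be improved in general. As a sketch, assuming the standard definition $H(\nu\mid\mu)=\int_X\log\frac{d\nu}{d\mu}\,d\nu$ for $\nu\ll\mu$: for a bounded Borel map $g$, write $Z=\int_X e^{g}\,d\mu$ and let $\nu_g$ denote the probability measure with density $e^{g}/Z$ with respect to $\mu$ (the symbols $Z$ and $\nu_g$ are introduced here only for illustration). Then
\begin{align*}
H(\nu_g\mid\mu)&=\int_X\left(g(x)-\log Z\right)\,d\nu_g(x)=\int_X g(x)\,d\nu_g(x)-\log Z,\\
\int_X g(x)\,d\nu_g(x)-H(\nu_g\mid\mu)&=\log Z=\log\int_X e^{g(x)}\,d\mu(x).
\end{align*}
Hence the supremum in the Gibbs formula is attained at $\nu_g$, and the bound for a general $\nu\ll\mu$ is an equality exactly when that supremum is attained at $\nu$.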
[/guided]
[/step]
[step:Use admissible pairs to recover the transport inequality]
Fix $\nu\in\mathcal P(X)$. Let $\varphi,\psi\in C_b(X)$ be bounded continuous functions satisfying the Kantorovich admissibility condition
\begin{align*}
\varphi(x)+\psi(y)\le \frac{d(x,y)^2}{2}
\end{align*}
for every $x,y\in X$. Define the bounded continuous function $f:X\to\mathbb R$ by
\begin{align*}
f(y)=-\psi(y).
\end{align*}
The admissibility condition becomes
\begin{align*}
\varphi(x)\le f(y)+\frac{d(x,y)^2}{2}
\end{align*}
for every $x,y\in X$. Taking the infimum over $y\in X$ gives
\begin{align*}
\varphi(x)\le (Q_1f)(x)
\end{align*}
for every $x\in X$.
Using this pointwise inequality and the linear entropy bound from the previous step,
\begin{align*}
\int_X \varphi(x)\,d\nu(x)+\int_X \psi(y)\,d\mu(y)
\le
\int_X (Q_1f)(x)\,d\nu(x)-\int_X f(y)\,d\mu(y)
\le
C\,H(\nu\mid\mu).
\end{align*}
Taking the supremum over all bounded continuous admissible pairs $\varphi,\psi$ in the Kantorovich duality formula yields
\begin{align*}
\mathcal T_{d^2/2}(\nu,\mu)\le C\,H(\nu\mid\mu).
\end{align*}
Since $\nu\in\mathcal P(X)$ was arbitrary, the transport-entropy inequality holds for every probability measure $\nu$ on $X$.
[guided]
Fix $\nu\in\mathcal P(X)$. To recover the transport inequality, we must bound every admissible Kantorovich dual value by $C H(\nu\mid\mu)$. Let $\varphi,\psi\in C_b(X)$ satisfy
\begin{align*}
\varphi(x)+\psi(y)\le \frac{d(x,y)^2}{2}
\end{align*}
for all $x,y\in X$, and define the bounded continuous map $f:X\to\mathbb R$ by $f(y)=-\psi(y)$. Then the admissibility condition says
\begin{align*}
\varphi(x)\le f(y)+\frac{d(x,y)^2}{2}
\end{align*}
for all $x,y\in X$. For each fixed $x$, taking the infimum over $y$ gives the pointwise comparison $\varphi(x)\le (Q_1f)(x)$. Integrating this inequality against $\nu$, which is legitimate because both sides are bounded Borel functions, and using $\psi=-f$ gives
\begin{align*}
\int_X \varphi(x)\,d\nu(x)+\int_X \psi(y)\,d\mu(y)
\le
\int_X (Q_1f)(x)\,d\nu(x)-\int_X f(y)\,d\mu(y).
\end{align*}
The linear entropy bound from the previous step bounds the right-hand side by $C H(\nu\mid\mu)$. Hence every bounded continuous admissible dual pair has value at most $C H(\nu\mid\mu)$. Taking the supremum over such pairs in Kantorovich duality gives
\begin{align*}
\mathcal T_{d^2/2}(\nu,\mu)\le C\,H(\nu\mid\mu).
\end{align*}
[/guided]
[/step]
[step:Identify the normalization of the Hopf-Lax operator]
The operator used throughout the proof is
\begin{align*}
(Q_1f)(x)=\inf_{y\in X}\left\{f(y)+\frac{d(x,y)^2}{2}\right\}.
\end{align*}
Let $B_b(X)$ denote the vector space of bounded Borel measurable maps $X\to\mathbb R$. More generally, for each $t>0$, the time-$t$ Hopf-Lax operator $Q_t:C_b(X)\to B_b(X)$ is defined by
\begin{align*}
(Q_tf)(x)=\inf_{y\in X}\left\{f(y)+\frac{d(x,y)^2}{2t}\right\}.
\end{align*}
Thus the operator $Q_1$ appearing in the dual inequality above is exactly the time-$1$ case. The constant $C$ enters only through the entropy coefficient and through the exponential normalization by $1/C$; in this formulation it is not absorbed into the transport cost. This completes the equivalence.
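As a remark on the normalization, and as a sketch that is not used in the proof: the factor $1/C$ can equivalently be moved into the time parameter. For $f\in C_b(X)$ set $h=f/C\in C_b(X)$ (the name $h$ is introduced here only for illustration). Since $C>0$,
\begin{align*}
\frac{1}{C}(Q_1f)(x)
=\inf_{y\in X}\left\{\frac{f(y)}{C}+\frac{d(x,y)^2}{2C}\right\}
=(Q_Ch)(x).
\end{align*}
Because $f\mapsto f/C$ is a bijection of $C_b(X)$, the dual exponential inequality is therefore equivalent to
\begin{align*}
\int_X \exp\left((Q_Ch)(x)\right)\,d\mu(x)\le\exp\left(\int_X h(y)\,d\mu(y)\right)
\end{align*}
for every $h\in C_b(X)$.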
[/step]