[proofplan]
We prove the equivalence by passing between transport estimates and exponential moment estimates through the Gibbs variational formula for relative entropy. To show that $T_1(C)$ implies the Laplace bound, we use the dual formula for $W_1$ to bound every tilted expectation by the square root of the entropy, then maximize a quadratic in $\sqrt{H(\nu\mid\mu)}$. Conversely, the Laplace bound and the Gibbs formula give a one-parameter estimate for each [test function](/page/Test%20Function) and each probability measure; optimizing in the parameter and taking the supremum over bounded $1$-Lipschitz functions recovers $T_1(C)$.
[/proofplan]
[step:Use $T_1(C)$ to bound tilted expectations by entropy]
Assume that $\mu$ satisfies $T_1(C)$ in the bounded dual sense specified in the theorem statement. Here, for probability measures $\rho$ and $\sigma$ on $(X,\mathcal{B}(X))$, the symbol $W_1(\rho,\sigma)\in[0,\infty]$ denotes the bounded Kantorovich--Rubinstein dual quantity
\begin{align*}
W_1(\rho,\sigma)=\sup\left\{\int_X h(x)\,d\rho(x)-\int_X h(x)\,d\sigma(x):h:X\to\mathbb{R}\text{ is bounded and }1\text{-Lipschitz}\right\}.
\end{align*}
This definition is the bounded dual version of the extended $1$-Wasserstein distance used in the theorem, and it allows the value $+\infty$, so no finite first-moment assumption is needed. For probability measures $\nu$ and $\mu$ on $(X,\mathcal{B}(X))$, let $H(\nu\mid\mu)\in[0,\infty]$ denote the relative entropy
\begin{align*}
H(\nu\mid\mu)=\int_X \log\left(\frac{d\nu}{d\mu}(x)\right)\,d\nu(x)
\end{align*}
when $\nu\ll\mu$, and set $H(\nu\mid\mu)=+\infty$ otherwise. Let $f:X\to\mathbb{R}$ be a bounded $1$-Lipschitz map, and define the centered bounded measurable map $g_f:X\to\mathbb{R}$ by
\begin{align*}
g_f(x)=f(x)-\int_X f(y)\,d\mu(y).
\end{align*}
Let $\nu$ be a probability measure on $(X,\mathcal{B}(X))$ with $\nu\ll\mu$. By the bounded Kantorovich--Rubinstein representation of $W_1$, the function $f$ is an admissible test function, so
\begin{align*}
\int_X f\,d\nu(x)-\int_X f\,d\mu(x)\leq W_1(\nu,\mu).
\end{align*}
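Because $\nu(X)=1$, the left-hand side equals $\int_X g_f\,d\nu(x)$. Applying $T_1(C)$ to bound $W_1(\nu,\mu)$ therefore gives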
\begin{align*}
\int_X g_f\,d\nu(x)\leq \sqrt{2C\,H(\nu\mid\mu)}.
\end{align*}
[guided]
We start with the transport inequality and extract exactly the estimate needed for the exponential moment. The statement defines $T_1(C)$ using the bounded dual transport quantity: for probability measures $\rho$ and $\sigma$ on $(X,\mathcal{B}(X))$, $W_1(\rho,\sigma)$ denotes
\begin{align*}
W_1(\rho,\sigma)=\sup\left\{\int_X h(x)\,d\rho(x)-\int_X h(x)\,d\sigma(x):h:X\to\mathbb{R}\text{ is bounded and }1\text{-Lipschitz}\right\}.
\end{align*}
This is the bounded Kantorovich--Rubinstein dual form of the extended $1$-Wasserstein distance. The value may be $+\infty$; this is important because the measures in the theorem are arbitrary probability measures on a Polish [metric space](/page/Metric%20Space) and need not have finite first moment. For probability measures $\nu$ and $\mu$, $H(\nu\mid\mu)$ denotes relative entropy, meaning
\begin{align*}
H(\nu\mid\mu)=\int_X \log\left(\frac{d\nu}{d\mu}(x)\right)\,d\nu(x)
\end{align*}
when $\nu\ll\mu$, and $H(\nu\mid\mu)=+\infty$ otherwise.
Fix a bounded $1$-Lipschitz map $f:X\to\mathbb{R}$, and center it by defining $g_f:X\to\mathbb{R}$ as
\begin{align*}
g_f(x)=f(x)-\int_X f(y)\,d\mu(y).
\end{align*}
Every Lipschitz map is continuous and hence Borel measurable. Since $g_f$ differs from the bounded map $f$ by a finite constant, $g_f$ is bounded and Borel measurable.
Now let $\nu$ be a probability measure on $(X,\mathcal{B}(X))$ with $\nu\ll\mu$. The bounded Kantorovich--Rubinstein representation of $W_1$ says that $W_1(\nu,\mu)$ is the supremum of
\begin{align*}
\int_X h(x)\,d\nu(x)-\int_X h(x)\,d\mu(x)
\end{align*}
over bounded $1$-Lipschitz maps $h:X\to\mathbb{R}$. Since our chosen $f$ is one of these admissible maps, we get
\begin{align*}
\int_X f\,d\nu(x)-\int_X f\,d\mu(x)\leq W_1(\nu,\mu).
\end{align*}
Because $\nu(X)=1$, integrating the constant $\int_X f\,d\mu$ against $\nu$ returns the same constant, so the left-hand side is exactly $\int_X g_f\,d\nu(x)$:
\begin{align*}
\int_X g_f\,d\nu(x)=\int_X f\,d\nu(x)-\int_X\left(\int_X f\,d\mu(y)\right)\,d\nu(x)=\int_X f\,d\nu(x)-\int_X f\,d\mu(x).
\end{align*}
The hypothesis $T_1(C)$ applies to this probability measure $\nu$, giving
\begin{align*}
W_1(\nu,\mu)\leq \sqrt{2C\,H(\nu\mid\mu)}.
\end{align*}
Combining the two inequalities yields
\begin{align*}
\int_X g_f\,d\nu(x)\leq \sqrt{2C\,H(\nu\mid\mu)}.
\end{align*}
This is the bridge from transport to entropy: every possible tilted expectation of the centered Lipschitz function is controlled by the square root of relative entropy.
[/guided]
[/step]
[claim:Establish the bounded entropy variational formula]
For every bounded Borel measurable map $\psi:X\to\mathbb{R}$,
\begin{align*}
\log\int_X e^{\psi(x)}\,d\mu(x)=\sup_{\nu\ll\mu}\left\{\int_X \psi(x)\,d\nu(x)-H(\nu\mid\mu)\right\},
\end{align*}
where the supremum ranges over probability measures $\nu$ on $(X,\mathcal{B}(X))$ that are absolutely continuous with respect to $\mu$.
[/claim]
[proof]
Define $Z\in(0,\infty)$ by
\begin{align*}
Z=\int_X e^{\psi(x)}\,d\mu(x).
\end{align*}
The number $Z$ is finite and positive because $\psi$ is bounded and $\mu$ is a probability measure. Define the probability measure $\eta$ on $(X,\mathcal{B}(X))$ by its density $m:X\to(0,\infty)$ with respect to $\mu$:
\begin{align*}
m(x)=\frac{e^{\psi(x)}}{Z}.
\end{align*}
Let $\nu\ll\mu$ be a probability measure, and let $q:X\to[0,\infty)$ denote a Radon--Nikodym density $q=d\nu/d\mu$. Since $m>0$, also $\nu\ll\eta$, with density $q/m$ relative to $\eta$. Using the definition of relative entropy and then the [Jensen Inequality](/theorems/9) for the probability measure $\nu$ and the convex function $t\mapsto -\log t$ gives
\begin{align*}
H(\nu\mid\eta)=\int_X \log\left(\frac{q(x)}{m(x)}\right)\,d\nu(x)\geq 0.
\end{align*}
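In more detail, and as a sketch of how the Jensen step can be carried out: the ratio $m/q$ is positive and finite $\nu$-almost everywhere because $q>0$ holds $\nu$-almost everywhere, and $d\nu=q\,d\mu$, so
\begin{align*}
H(\nu\mid\eta)=\int_X -\log\left(\frac{m(x)}{q(x)}\right)\,d\nu(x)\geq -\log\int_{\{q>0\}}\frac{m(x)}{q(x)}\,d\nu(x)=-\log\int_{\{q>0\}} m(x)\,d\mu(x)\geq -\log 1=0.
\end{align*}
The last inequality uses $\int_{\{q>0\}} m\,d\mu=\eta(\{q>0\})\leq 1$.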
Because $\log m(x)=\psi(x)-\log Z$, we have
\begin{align*}
\int_X \psi(x)\,d\nu(x)-H(\nu\mid\mu)=\log Z-H(\nu\mid\eta)\leq \log Z.
\end{align*}
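The identity in the last display can be checked term by term. Since $\psi$ is bounded, $\int_X\psi\,d\nu$ is finite, and
\begin{align*}
H(\nu\mid\eta)&=\int_X\log q(x)\,d\nu(x)-\int_X\log m(x)\,d\nu(x)\\
&=H(\nu\mid\mu)-\int_X\psi(x)\,d\nu(x)+\log Z,
\end{align*}
where both sides equal $+\infty$ when $H(\nu\mid\mu)=+\infty$. Rearranging gives the identity.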
Taking the supremum over all $\nu\ll\mu$ gives the upper bound. For the reverse inequality, take $\nu=\eta$, which is admissible because $\eta\ll\mu$. Since $H(\eta\mid\eta)=0$, the preceding identity gives
\begin{align*}
\int_X \psi(x)\,d\eta(x)-H(\eta\mid\mu)=\log Z.
\end{align*}
Thus the supremum equals $\log Z$, proving the formula.
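For a concrete sanity check, consider the hypothetical two-point space $X=\{0,1\}$ with $\mu(\{0\})=\mu(\{1\})=1/2$ and $\psi(0)=a$, $\psi(1)=b$; these choices are illustrative assumptions and play no role in the proof. Writing $\nu(\{1\})=p\in[0,1]$, the formula reads
\begin{align*}
\log\frac{e^{a}+e^{b}}{2}=\sup_{p\in[0,1]}\left\{(1-p)a+pb-(1-p)\log\bigl(2(1-p)\bigr)-p\log(2p)\right\},
\end{align*}
with the convention $0\log 0=0$. The bracket is concave in $p$, and its derivative $b-a+\log((1-p)/p)$ vanishes at $p=e^{b}/(e^{a}+e^{b})=\eta(\{1\})$, in agreement with the maximizer $\nu=\eta$ found above.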
[/proof]
[step:Apply the entropy variational formula and optimize the entropy parameter]
Let $\lambda\geq 0$. Since $g_f$ is bounded and Borel measurable, the bounded entropy variational formula just proved applies to the bounded Borel measurable map $\lambda g_f:X\to\mathbb{R}$ and gives
\begin{align*}
\log\int_X e^{\lambda g_f(x)}\,d\mu(x)=\sup_{\nu\ll\mu}\left\{\lambda\int_X g_f(x)\,d\nu(x)-H(\nu\mid\mu)\right\}.
\end{align*}
For every admissible $\nu$, define $r_\nu\in[0,\infty]$ by
\begin{align*}
r_\nu=\sqrt{H(\nu\mid\mu)}.
\end{align*}
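If $r_\nu<\infty$, multiplying the estimate $\int_X g_f\,d\nu(x)\leq\sqrt{2C}\,r_\nu$ from the first step by $\lambda\geq 0$ and subtracting $H(\nu\mid\mu)=r_\nu^2$ gives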
\begin{align*}
\lambda\int_X g_f\,d\nu(x)-H(\nu\mid\mu)\leq \lambda\sqrt{2C}\,r_\nu-r_\nu^2.
\end{align*}
The quadratic function $r\mapsto \lambda\sqrt{2C}\,r-r^2$ on $[0,\infty)$ has maximum $C\lambda^2/2$, attained at $r=\lambda\sqrt{2C}/2$. If $r_\nu=\infty$, then the variational term equals $-\infty$, because $\int_X g_f\,d\nu(x)$ is finite, and so it does not affect the supremum. Therefore
\begin{align*}
\log\int_X e^{\lambda g_f}\,d\mu(x)\leq \frac{C\lambda^2}{2}.
\end{align*}
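The maximum of the quadratic can be verified by completing the square, a routine check recorded here for convenience: for every $r\in[0,\infty)$,
\begin{align*}
\lambda\sqrt{2C}\,r-r^2=\frac{C\lambda^2}{2}-\left(r-\frac{\lambda\sqrt{2C}}{2}\right)^2\leq\frac{C\lambda^2}{2},
\end{align*}
with equality exactly when $r=\lambda\sqrt{2C}/2$.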
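For $\lambda<0$, apply the already proved estimate with $-\lambda>0$ to the bounded $1$-Lipschitz map $-f:X\to\mathbb{R}$. Since $g_{-f}=-g_f$, we have $(-\lambda)g_{-f}=\lambda g_f$, so the estimate reads $\log\int_X e^{\lambda g_f}\,d\mu(x)\leq C(-\lambda)^2/2=C\lambda^2/2$. This gives the asserted Laplace bound for every $\lambda\in\mathbb{R}$.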
[/step]
[step:Use the Laplace bound to estimate one Lipschitz test function]
Assume now that the Laplace bound holds for every bounded $1$-Lipschitz map. Let $\nu$ be a probability measure on $(X,\mathcal{B}(X))$, and let $f:X\to\mathbb{R}$ be bounded and $1$-Lipschitz. Define $g_f:X\to\mathbb{R}$ by
\begin{align*}
g_f(x)=f(x)-\int_X f(y)\,d\mu(y).
\end{align*}
If $H(\nu\mid\mu)=+\infty$, the desired estimate
\begin{align*}
\int_X g_f\,d\nu(x)\leq \sqrt{2C\,H(\nu\mid\mu)}
\end{align*}
is immediate. Suppose $H(\nu\mid\mu)<\infty$. Then $\nu\ll\mu$. For each $\lambda>0$, the bounded entropy variational formula proved above applies to the bounded measurable function $\lambda g_f:X\to\mathbb{R}$ and gives
\begin{align*}
\lambda\int_X g_f\,d\nu(x)-H(\nu\mid\mu)\leq \log\int_X e^{\lambda g_f}\,d\mu(x).
\end{align*}
Using the assumed Laplace bound,
\begin{align*}
\lambda\int_X g_f\,d\nu(x)\leq H(\nu\mid\mu)+\frac{C\lambda^2}{2}.
\end{align*}
Dividing by $\lambda>0$ gives
\begin{align*}
\int_X g_f\,d\nu(x)\leq \frac{H(\nu\mid\mu)}{\lambda}+\frac{C\lambda}{2}.
\end{align*}
[guided]
Now we reverse the argument. Fix a probability measure $\nu$ and a bounded $1$-Lipschitz test function $f:X\to\mathbb{R}$. As before, define the centered map $g_f:X\to\mathbb{R}$ by
\begin{align*}
g_f(x)=f(x)-\int_X f(y)\,d\mu(y).
\end{align*}
The goal is to control $\int_X g_f\,d\nu(x)$ by the square root of the entropy of $\nu$ relative to $\mu$.
If $H(\nu\mid\mu)=+\infty$, then the inequality
\begin{align*}
\int_X g_f\,d\nu(x)\leq \sqrt{2C\,H(\nu\mid\mu)}
\end{align*}
has right-hand side $+\infty$, so there is nothing further to prove. We therefore assume $H(\nu\mid\mu)<\infty$. By the definition of relative entropy used in the statement, this implies $\nu\ll\mu$.
Fix $\lambda>0$. The bounded entropy variational formula proved above says that, for the bounded Borel measurable function $\lambda g_f:X\to\mathbb{R}$,
\begin{align*}
\log\int_X e^{\lambda g_f(x)}\,d\mu(x)=\sup_{\rho\ll\mu}\left\{\lambda\int_X g_f(x)\,d\rho(x)-H(\rho\mid\mu)\right\},
\end{align*}
where the supremum ranges over probability measures $\rho$ absolutely continuous with respect to $\mu$. Our measure $\nu$ is one admissible competitor in this supremum, so
\begin{align*}
\lambda\int_X g_f\,d\nu(x)-H(\nu\mid\mu)\leq \log\int_X e^{\lambda g_f}\,d\mu(x).
\end{align*}
The assumed Laplace bound applies because $f$ is bounded and $1$-Lipschitz, and $g_f$ is exactly the centered version of $f$. Hence
\begin{align*}
\log\int_X e^{\lambda g_f}\,d\mu(x)\leq \frac{C\lambda^2}{2}.
\end{align*}
Combining these two inequalities gives
\begin{align*}
\lambda\int_X g_f\,d\nu(x)\leq H(\nu\mid\mu)+\frac{C\lambda^2}{2}.
\end{align*}
Finally, divide by the positive number $\lambda$:
\begin{align*}
\int_X g_f\,d\nu(x)\leq \frac{H(\nu\mid\mu)}{\lambda}+\frac{C\lambda}{2}.
\end{align*}
This is the key one-parameter estimate. The free parameter $\lambda$ will now be chosen to minimize the right-hand side.
[/guided]
[/step]
[step:Optimize over the Laplace parameter]
In the remaining case $H(\nu\mid\mu)<\infty$ of the preceding step, define $h\in[0,\infty)$ by
\begin{align*}
h=H(\nu\mid\mu).
\end{align*}
If $h=0$, then the estimate from the preceding step gives, for every $\lambda>0$,
\begin{align*}
\int_X g_f\,d\nu(x)\leq \frac{C\lambda}{2}.
\end{align*}
Letting $\lambda\downarrow 0$ yields
\begin{align*}
\int_X g_f\,d\nu(x)\leq 0=\sqrt{2Ch}.
\end{align*}
If $0<h<\infty$, choose
\begin{align*}
\lambda=\sqrt{\frac{2h}{C}}.
\end{align*}
Substitution into the preceding estimate gives
\begin{align*}
\int_X g_f\,d\nu(x)\leq \frac{h}{\sqrt{2h/C}}+\frac{C}{2}\sqrt{\frac{2h}{C}}=\sqrt{2Ch}.
\end{align*}
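Since $\int_X g_f\,d\nu(x)=\int_X f\,d\nu(x)-\int_X f\,d\mu(x)$, and since the case $H(\nu\mid\mu)=+\infty$ was settled in the preceding step, it follows that for every probability measure $\nu$ and every bounded $1$-Lipschitz map $f:X\to\mathbb{R}$,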
\begin{align*}
\int_X f\,d\nu(x)-\int_X f\,d\mu(x)\leq \sqrt{2C\,H(\nu\mid\mu)}.
\end{align*}
[guided]
The preceding step proved that, for every $\lambda>0$,
\begin{align*}
\int_X g_f\,d\nu(x)\leq \frac{H(\nu\mid\mu)}{\lambda}+\frac{C\lambda}{2}.
\end{align*}
We now minimize the right-hand side in the only remaining free parameter. Recall that we are in the case $H(\nu\mid\mu)<\infty$, and define $h\in[0,\infty)$ by
\begin{align*}
h=H(\nu\mid\mu).
\end{align*}
The case $h=0$ must be separated because the formal minimizing value of $\lambda$ is then $0$, while the estimate is available only for $\lambda>0$. If $h=0$, then for every $\lambda>0$,
\begin{align*}
\int_X g_f\,d\nu(x)\leq \frac{C\lambda}{2}.
\end{align*}
Letting $\lambda\downarrow 0$ gives
\begin{align*}
\int_X g_f\,d\nu(x)\leq 0=\sqrt{2Ch}.
\end{align*}
Now suppose $0<h<\infty$. The expression
\begin{align*}
\frac{h}{\lambda}+\frac{C\lambda}{2}
\end{align*}
is minimized when the two terms balance. Choose
\begin{align*}
\lambda=\sqrt{\frac{2h}{C}}.
\end{align*}
This number is positive because $h>0$ and $C>0$. Substituting this value into the one-parameter estimate yields
\begin{align*}
\int_X g_f\,d\nu(x)\leq \frac{h}{\sqrt{2h/C}}+\frac{C}{2}\sqrt{\frac{2h}{C}}=\sqrt{2Ch}.
\end{align*}
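That this choice of $\lambda$ is optimal can be checked with the arithmetic-geometric mean inequality; the following is a short verification, not an additional hypothesis. For every $\lambda>0$,
\begin{align*}
\frac{h}{\lambda}+\frac{C\lambda}{2}\geq 2\sqrt{\frac{h}{\lambda}\cdot\frac{C\lambda}{2}}=\sqrt{2Ch},
\end{align*}
with equality exactly when $h/\lambda=C\lambda/2$, that is, $\lambda=\sqrt{2h/C}$. At this value both terms equal $\sqrt{Ch/2}$, and their sum is $2\sqrt{Ch/2}=\sqrt{2Ch}$.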
Since $g_f$ was defined by centering $f$ around its $\mu$-mean, this is exactly
\begin{align*}
\int_X f\,d\nu(x)-\int_X f\,d\mu(x)\leq \sqrt{2C\,H(\nu\mid\mu)}.
\end{align*}
Thus every bounded $1$-Lipschitz test function satisfies the entropy-controlled expectation bound; when $H(\nu\mid\mu)=+\infty$ the bound holds trivially, as noted in the preceding step.
[/guided]
[/step]
[step:Take the supremum over bounded Lipschitz test functions]
By the bounded dual definition of $W_1$ fixed in the theorem statement,
\begin{align*}
W_1(\nu,\mu)=\sup\left\{\int_X f(x)\,d\nu(x)-\int_X f(x)\,d\mu(x): f:X\to\mathbb{R}\text{ is bounded and }1\text{-Lipschitz}\right\}.
\end{align*}
The estimate from the preceding step holds for every function in this admissible class. Taking the supremum over all such $f$ gives
\begin{align*}
W_1(\nu,\mu)\leq \sqrt{2C\,H(\nu\mid\mu)}.
\end{align*}
Since $\nu$ was arbitrary, $\mu$ satisfies $T_1(C)$. This proves the converse implication and completes the equivalence.
[guided]
The final move is to pass from one test function to the transport distance. The bounded dual definition of $W_1$ fixed in the theorem statement defines
\begin{align*}
W_1(\nu,\mu)=\sup\left\{\int_X f(x)\,d\nu(x)-\int_X f(x)\,d\mu(x): f:X\to\mathbb{R}\text{ is bounded and }1\text{-Lipschitz}\right\}.
\end{align*}
The preceding step proved that each individual function in this admissible class satisfies
\begin{align*}
\int_X f(x)\,d\nu(x)-\int_X f(x)\,d\mu(x)\leq \sqrt{2C\,H(\nu\mid\mu)}.
\end{align*}
A supremum of quantities all bounded above by the same number is bounded above by that number. Therefore
\begin{align*}
W_1(\nu,\mu)\leq \sqrt{2C\,H(\nu\mid\mu)}.
\end{align*}
Because the probability measure $\nu$ on $(X,\mathcal{B}(X))$ was arbitrary, this is precisely the transportation inequality $T_1(C)$ for $\mu$. This proves the converse implication and completes the equivalence.
[/guided]
[/step]