[guided]We start with the transport inequality and extract exactly the estimate needed for the exponential moment. The statement defines $T_1(C)$ using the bounded dual transport quantity: for probability measures $\rho$ and $\sigma$ on $(X,\mathcal{B}(X))$, $W_1(\rho,\sigma)$ denotes
\begin{align*}
W_1(\rho,\sigma)=\sup\left\{\int_X h(x)\,d\rho(x)-\int_X h(x)\,d\sigma(x):h:X\to\mathbb{R}\text{ is bounded and }1\text{-Lipschitz}\right\}.
\end{align*}
This is the bounded Kantorovich--Rubinstein dual form of the extended $1$-Wasserstein distance. The value may be $+\infty$; this is important because the measures in the theorem are arbitrary probability measures on a Polish [metric space](/page/Metric%20Space) and need not have finite first moment. For probability measures $\nu$ and $\mu$, $H(\nu\mid\mu)$ denotes relative entropy, meaning
\begin{align*}
H(\nu\mid\mu)=\int_X \log\left(\frac{d\nu}{d\mu}(x)\right)\,d\nu(x)
\end{align*}
when $\nu\ll\mu$, and $H(\nu\mid\mu)=+\infty$ otherwise.
Fix a bounded $1$-Lipschitz map $f:X\to\mathbb{R}$, and center it by defining $g_f:X\to\mathbb{R}$ as
\begin{align*}
g_f(x)=f(x)-\int_X f(y)\,d\mu(y).
\end{align*}
The function $g_f$ is bounded and Borel measurable: $f$ is bounded by assumption and continuous because every Lipschitz map is continuous, and subtracting the constant $\int_X f(y)\,d\mu(y)$ preserves both properties.
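To make the centering explicit, the following routine check records a bound on $g_f$ and the fact that it has $\mu$-mean zero. The notation $\|f\|_\infty=\sup_{x\in X}|f(x)|$ is introduced here only for this computation and is an assumption of this sketch, not part of the statement.
\begin{align*}
|g_f(x)|&\leq |f(x)|+\int_X |f(y)|\,d\mu(y)\leq 2\|f\|_\infty,\\
\int_X g_f(x)\,d\mu(x)&=\int_X f(x)\,d\mu(x)-\mu(X)\int_X f(y)\,d\mu(y)=0.
\end{align*}
Both lines use only that $f$ is bounded and that $\mu$ is a probability measure.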
Now let $\nu$ be a probability measure on $(X,\mathcal{B}(X))$ with $\nu\ll\mu$. The bounded Kantorovich--Rubinstein representation of $W_1$ says that $W_1(\nu,\mu)$ is the supremum of
\begin{align*}
\int_X h(x)\,d\nu(x)-\int_X h(x)\,d\mu(x)
\end{align*}
over bounded $1$-Lipschitz maps $h:X\to\mathbb{R}$. Since our chosen $f$ is one of these admissible maps, we get
\begin{align*}
\int_X f(x)\,d\nu(x)-\int_X f(x)\,d\mu(x)\leq W_1(\nu,\mu).
\end{align*}
The left-hand side is exactly $\int_X g_f(x)\,d\nu(x)$, because
\begin{align*}
\int_X g_f(x)\,d\nu(x)=\int_X f(x)\,d\nu(x)-\int_X\left(\int_X f(y)\,d\mu(y)\right)\,d\nu(x)=\int_X f(x)\,d\nu(x)-\int_X f(y)\,d\mu(y),
\end{align*}
where the last equality uses $\nu(X)=1$. The hypothesis $T_1(C)$ applies to this probability measure $\nu$, giving
\begin{align*}
W_1(\nu,\mu)\leq \sqrt{2C\,H(\nu\mid\mu)}.
\end{align*}
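As a sanity check on the shape of this inequality, consider an illustrative case that is not part of the theorem's setting: take $X=\mathbb{R}$ with the usual distance, $\mu=N(0,1)$, and $\nu=N(m,1)$ for some $m\in\mathbb{R}$. The standard Gaussian satisfies $T_1(1)$ by Talagrand's transport inequality, and here $\log\frac{d\nu}{d\mu}(x)=mx-\frac{m^2}{2}$, so
\begin{align*}
H(\nu\mid\mu)&=\int_{\mathbb{R}}\left(mx-\frac{m^2}{2}\right)\,d\nu(x)=\frac{m^2}{2},\\
W_1(\nu,\mu)&=|m|=\sqrt{2\cdot 1\cdot\frac{m^2}{2}}=\sqrt{2C\,H(\nu\mid\mu)}\quad\text{with }C=1.
\end{align*}
The value $W_1(\nu,\mu)=|m|$ can be read off from the dual form: every $1$-Lipschitz $h$ satisfies $h(x+m)-h(x)\leq |m|$, which gives the upper bound, and the bounded $1$-Lipschitz truncations $h_R(x)=\operatorname{sign}(m)\max(-R,\min(R,x))$ give the matching lower bound as $R\to\infty$. In this example the transport inequality holds with equality.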
Combining the two inequalities yields
\begin{align*}
\int_X g_f(x)\,d\nu(x)\leq \sqrt{2C\,H(\nu\mid\mu)}.
\end{align*}
This is the bridge from transport to entropy: for every probability measure $\nu\ll\mu$, the $\nu$-expectation of the centered Lipschitz function $g_f$ is at most $\sqrt{2C\,H(\nu\mid\mu)}$.[/guided]