[proofplan]
We first compare the two Wasserstein distances on $\mathcal P_2(E)$. For any coupling of $\nu$ and $\mu$, the first moment of the transport distance is bounded by the square root of its second moment, because the variance of the transport distance under a probability measure is non-negative. Taking the infimum over couplings gives $W_1(\nu,\mu)\le W_2(\nu,\mu)$, and the assumed $T_2(C)$ inequality then gives the desired $T_1(C)$ bound on $\mathcal P_2(E)$. Finally, the stated approximation property extends the estimate from $\mathcal P_2(E)$ to finite-entropy measures in $\mathcal P_1(E)$ by passing to the limit in $W_1$ and using the upper entropy bound.
[/proofplan]
[step:Compare $W_1$ and $W_2$ on $\mathcal P_2(E)$]
Let $\nu \in \mathcal P_2(E)$, and let $\Pi(\nu,\mu)$ denote the set of Borel probability measures $\pi$ on $E \times E$ whose first marginal is $\nu$ and whose second marginal is $\mu$. For $\pi \in \Pi(\nu,\mu)$, define the measurable transport-distance function $D_\pi: E \times E \to [0,\infty)$ by $D_\pi(x,y)=d(x,y)$.
Assume first that
\begin{align*}
\int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y) <+\infty.
\end{align*}
Define the number
\begin{align*}
m_\pi := \int_{E \times E} D_\pi(x,y) \, d\pi(x,y).
\end{align*}
Since $\pi$ is a probability measure and $(D_\pi-m_\pi)^2 \ge 0$, we have
\begin{align*}
0 \le \int_{E \times E} (D_\pi(x,y)-m_\pi)^2 \, d\pi(x,y).
\end{align*}
Expanding the square and using $\pi(E\times E)=1$ gives
\begin{align*}
0 \le \int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y) - m_\pi^2.
\end{align*}
Therefore
\begin{align*}
\int_{E \times E} D_\pi(x,y) \, d\pi(x,y) \le \left(\int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y)\right)^{1/2}.
\end{align*}
If the quadratic cost is infinite, the same inequality holds trivially, since its right-hand side is $+\infty$.
Taking the infimum over all $\pi \in \Pi(\nu,\mu)$ and using that the square-root function is increasing and continuous on $[0,\infty]$, we obtain
\begin{align*}
W_1(\nu,\mu) \le W_2(\nu,\mu).
\end{align*}
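As an aside, the same coupling-level bound can also be obtained in one line, by a route not used in the argument above: apply the Cauchy-Schwarz inequality in $L^2(\pi)$ to $D_\pi$ and the constant function $1$. This gives
\begin{align*}
\int_{E \times E} D_\pi \cdot 1 \, d\pi \le \left(\int_{E \times E} D_\pi^2 \, d\pi\right)^{1/2} \left(\int_{E \times E} 1^2 \, d\pi\right)^{1/2} = \left(\int_{E \times E} D_\pi^2 \, d\pi\right)^{1/2},
\end{align*}
where the last equality uses $\pi(E\times E)=1$.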
[guided]
The purpose of this step is to prove that a quadratic transport bound is stronger than a linear transport bound. We do this at the level of each coupling, before taking infima.
Let $\Pi(\nu,\mu)$ be the set of Borel probability measures $\pi$ on $E \times E$ with first marginal $\nu$ and second marginal $\mu$. For a fixed coupling $\pi \in \Pi(\nu,\mu)$, define the map $D_\pi: E \times E \to [0,\infty)$ by $D_\pi(x,y)=d(x,y)$. This function is Borel measurable because the metric $d$ is continuous on $E \times E$, where $E$ is the underlying [metric space](/page/Metric%20Space).
Suppose first that the quadratic transportation cost under $\pi$ is finite:
\begin{align*}
\int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y) <+\infty.
\end{align*}
We want to compare the first moment of $D_\pi$ to the second moment of $D_\pi$. Define
\begin{align*}
m_\pi := \int_{E \times E} D_\pi(x,y) \, d\pi(x,y).
\end{align*}
The key elementary estimate is the non-negativity of variance. Since $\pi$ is a probability measure and $(D_\pi-m_\pi)^2$ is non-negative, we have
\begin{align*}
0 \le \int_{E \times E} (D_\pi(x,y)-m_\pi)^2 \, d\pi(x,y).
\end{align*}
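Expanding the square under the integral is justified by the finite second-moment assumption: $D_\pi^2$ is $\pi$-integrable, and so is $D_\pi$ because $D_\pi \le 1 + D_\pi^2$, so $m_\pi$ is finite. Because $\pi(E\times E)=1$ and $m_\pi$ is constant, this gives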
\begin{align*}
0 \le \int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y) - 2m_\pi \int_{E \times E} D_\pi(x,y) \, d\pi(x,y) + m_\pi^2.
\end{align*}
By the definition of $m_\pi$, the middle integral equals $m_\pi$, so the inequality becomes
\begin{align*}
0 \le \int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y) - m_\pi^2.
\end{align*}
Thus
\begin{align*}
\left(\int_{E \times E} D_\pi(x,y) \, d\pi(x,y)\right)^2 \le \int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y).
\end{align*}
Taking square roots gives
\begin{align*}
\int_{E \times E} D_\pi(x,y) \, d\pi(x,y) \le \left(\int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y)\right)^{1/2}.
\end{align*}
If the quadratic cost is infinite, the same displayed inequality holds trivially in the extended real sense, since its right-hand side is $+\infty$.
Now take the infimum over all couplings $\pi \in \Pi(\nu,\mu)$. The infimum of the left-hand side is $W_1(\nu,\mu)$ by definition, while the infimum of the square roots of the quadratic costs equals the square root of their infimum, namely $W_2(\nu,\mu)$, because the square-root map is increasing and continuous on $[0,\infty]$. Therefore
\begin{align*}
W_1(\nu,\mu) \le W_2(\nu,\mu).
\end{align*}
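As a sanity check, consider a hypothetical example that is not part of the statement: let $E=\mathbb{R}$ with $d(x,y)=|x-y|$, $\nu=\delta_0$, and $\mu=\tfrac12\delta_1+\tfrac12\delta_3$. Because $\nu$ is a Dirac mass, the only coupling is the product $\pi=\nu\otimes\mu$, so both distances can be computed directly:
\begin{align*}
W_1(\nu,\mu) &= \tfrac12\cdot 1+\tfrac12\cdot 3 = 2, \\
W_2(\nu,\mu) &= \left(\tfrac12\cdot 1^2+\tfrac12\cdot 3^2\right)^{1/2} = \sqrt{5}.
\end{align*}
Here $2 \le \sqrt{5}$, and the gap $W_2^2(\nu,\mu)-W_1^2(\nu,\mu)=1$ is exactly the variance of $D_\pi$ under $\pi$.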
[/guided]
[/step]
[step:Apply the assumed $T_2(C)$ inequality to obtain $T_1(C)$ on $\mathcal P_2(E)$]
Let $\nu \in \mathcal P_2(E)$. If $H(\nu\mid\mu)=+\infty$, then
\begin{align*}
W_1^2(\nu,\mu) \le +\infty = 2C H(\nu\mid\mu)
\end{align*}
in the extended real sense, so the desired assertion is automatic.
Assume now that $H(\nu\mid\mu)<+\infty$. Since $\mu$ satisfies $T_2(C)$ on $\mathcal P_2(E)$, applying the hypothesis to $\rho=\nu$ gives
\begin{align*}
W_2^2(\nu,\mu) \le 2C H(\nu\mid\mu).
\end{align*}
From the previous step,
\begin{align*}
W_1(\nu,\mu) \le W_2(\nu,\mu).
\end{align*}
Squaring this inequality and combining it with the $T_2(C)$ estimate yields
\begin{align*}
W_1^2(\nu,\mu) \le W_2^2(\nu,\mu) \le 2C H(\nu\mid\mu).
\end{align*}
Thus the claimed $T_1(C)$ inequality holds for every $\nu \in \mathcal P_2(E)$.
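For illustration only, here is a concrete case outside the abstract setting of the statement. Assume $E=\mathbb{R}^n$ with the Euclidean distance and let $\mu=\gamma_n$ be the standard Gaussian measure, which satisfies $T_2(1)$ by Talagrand's inequality. For the translate $\nu=\mathcal N(m,I_n)$ with $m\in\mathbb{R}^n$, one has
\begin{align*}
H(\nu\mid\mu)=\tfrac12 |m|^2, \qquad W_1(\nu,\mu)=W_2(\nu,\mu)=|m|,
\end{align*}
so that
\begin{align*}
W_1^2(\nu,\mu)=|m|^2=2\cdot 1\cdot H(\nu\mid\mu).
\end{align*}
In this example the chain of inequalities above holds with equality, which suggests that the constant $C$ cannot be improved in general when passing from $T_2$ to $T_1$.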
[/step]
[step:Pass from $\mathcal P_2(E)$ to $\mathcal P_1(E)$ using the approximation hypothesis]
Assume the stated approximation property, and let $\nu \in \mathcal P_1(E)$. If $H(\nu\mid\mu)=+\infty$, the inequality
\begin{align*}
W_1^2(\nu,\mu) \le 2C H(\nu\mid\mu)
\end{align*}
is automatic in the extended real sense.
Assume now that $H(\nu\mid\mu)<+\infty$. By the approximation hypothesis, there exists a sequence $(\nu_k)_{k=1}^{\infty}$ in $\mathcal P_2(E)$ such that $\nu_k \to \nu$ in $W_1$ and
\begin{align*}
\limsup_{k \to \infty} H(\nu_k\mid\mu) \le H(\nu\mid\mu).
\end{align*}
For each positive integer $k$, the already proved $\mathcal P_2(E)$ case gives
\begin{align*}
W_1^2(\nu_k,\mu) \le 2C H(\nu_k\mid\mu).
\end{align*}
Since $\nu_k \to \nu$ in $W_1$, we have
\begin{align*}
\lim_{k \to \infty} W_1(\nu_k,\nu)=0.
\end{align*}
The triangle inequality for the metric $W_1$ gives
\begin{align*}
|W_1(\nu_k,\mu)-W_1(\nu,\mu)| \le W_1(\nu_k,\nu),
\end{align*}
and hence
\begin{align*}
\lim_{k \to \infty} W_1(\nu_k,\mu)=W_1(\nu,\mu).
\end{align*}
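The reverse triangle inequality used here can be spelled out as follows, assuming all three distances are finite (for instance when $\mu \in \mathcal P_1(E)$). Two applications of the triangle inequality give
\begin{align*}
W_1(\nu_k,\mu) &\le W_1(\nu_k,\nu)+W_1(\nu,\mu), \\
W_1(\nu,\mu) &\le W_1(\nu,\nu_k)+W_1(\nu_k,\mu),
\end{align*}
and, by the symmetry of $W_1$, rearranging these two bounds yields $-W_1(\nu_k,\nu)\le W_1(\nu_k,\mu)-W_1(\nu,\mu)\le W_1(\nu_k,\nu)$.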
Taking the limit superior in the inequalities for $\nu_k$ yields
\begin{align*}
W_1^2(\nu,\mu) = \lim_{k \to \infty} W_1^2(\nu_k,\mu) \le 2C \limsup_{k \to \infty} H(\nu_k\mid\mu).
\end{align*}
Using the entropy approximation bound, we conclude that
\begin{align*}
W_1^2(\nu,\mu) \le 2C H(\nu\mid\mu).
\end{align*}
Therefore $\mu$ satisfies $T_1(C)$ on all of $\mathcal P_1(E)$.
[/step]