[proofplan]
We prove the result through the Laplace-transform characterization of the $T_1$ inequality. A $1$-Lipschitz [test function](/page/Test%20Function) on $Y$ pulls back along $T$ to an $L$-Lipschitz function on $X$, so after dividing by $L$ the $T_1(C)$ hypothesis gives the required sub-Gaussian estimate on $X$. The pushforward identity then rewrites this estimate as the Bobkov-Götze criterion for $T_\#\mu$ on $Y$. The case $L=0$ is handled separately because the normalization by $L$ is unavailable.
[/proofplan]
[step:Handle the degenerate case in which the map has Lipschitz constant zero]
Assume first that $L=0$. Since $\mu$ is a probability measure on $X$, the space $X$ is nonempty. Choose $x_0 \in X$ and set $y_0 := T(x_0) \in Y$. For every $x \in X$, the Lipschitz bound gives $d_Y(T(x),y_0) = d_Y(T(x),T(x_0)) \leq 0 \cdot d_X(x,x_0) = 0$. Hence $T(x)=y_0$ for all $x \in X$, so $T_\#\mu=\delta_{y_0}$.
Let $\sigma$ be a Borel probability measure on $Y$. If $H(\sigma\mid \delta_{y_0})<\infty$, then $\sigma \ll \delta_{y_0}$, hence $\sigma=\delta_{y_0}$, and therefore
\begin{align*}
W_{1,d_Y}(\sigma,\delta_{y_0})^2 = 0 \leq 0 = 2CL^2\,H(\sigma\mid \delta_{y_0}).
\end{align*}
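As a brief unpacking of the finite-entropy case (a sketch that uses only the definitions of absolute continuity and relative entropy): since $\delta_{y_0}(Y\setminus\{y_0\})=0$, absolute continuity forces
\begin{align*}
\sigma(Y\setminus\{y_0\}) = 0, \qquad \sigma(\{y_0\}) = 1, \qquad H(\delta_{y_0}\mid\delta_{y_0}) = \int_Y \log(1)\,d\delta_{y_0} = 0,
\end{align*}
so both sides of the displayed inequality vanish.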
If $H(\sigma\mid \delta_{y_0})=\infty$, we use the standard extended-value convention for the entropy form of $T_1(A)$: $A\,H(\sigma\mid\rho)=+\infty$ whenever $H(\sigma\mid\rho)=+\infty$, even when $A=0$. With this convention the right-hand side is $+\infty$ in the infinite-entropy case, so the inequality holds trivially. Thus $T_\#\mu$ satisfies $T_1(0)$, which is $T_1(CL^2)$ because $L=0$.
[/step]
[step:Pull a Lipschitz test function on $Y$ back to an $L$-Lipschitz function on $X$]
Assume now that $L>0$, and define the Borel probability measure
\begin{align*}
\nu := T_\#\mu.
\end{align*}
Let
\begin{align*}
f:Y \to \mathbb{R}
\end{align*}
be a $1$-Lipschitz Borel function. Define
\begin{align*}
g:X \to \mathbb{R}
\end{align*}
by $g(x):=f(T(x))$ for $x \in X$. Since $T$ is Borel measurable and $f$ is continuous, hence Borel measurable, the map $g$ is Borel measurable. For all $x,x' \in X$,
\begin{align*}
|g(x)-g(x')| = |f(T(x))-f(T(x'))| \leq d_Y(T(x),T(x')) \leq L\,d_X(x,x').
\end{align*}
Thus $g$ is $L$-Lipschitz. Define
\begin{align*}
h:X \to \mathbb{R}
\end{align*}
by $h(x):=L^{-1}g(x)$. Then $h$ is $1$-Lipschitz on $(X,d_X)$.
[guided]
The reason for passing from $f$ to $h=L^{-1}f\circ T$ is that the Laplace form of the $T_1(C)$ inequality is stated for $1$-Lipschitz test functions on the underlying [metric space](/page/Metric%20Space). We start with a $1$-Lipschitz function
\begin{align*}
f:Y \to \mathbb{R}.
\end{align*}
The composition with the transport map gives
\begin{align*}
g:X \to \mathbb{R}
\end{align*}
defined by $g(x):=f(T(x))$. This function is measurable because $T$ is Borel measurable and $f$ is continuous; the continuity of $f$ follows from its Lipschitz property. For any two points $x,x' \in X$, the Lipschitz property of $f$ on $Y$ gives
\begin{align*}
|f(T(x))-f(T(x'))| \leq d_Y(T(x),T(x')).
\end{align*}
The Lipschitz property of $T$ then gives
\begin{align*}
d_Y(T(x),T(x')) \leq L\,d_X(x,x').
\end{align*}
Combining the two inequalities,
\begin{align*}
|g(x)-g(x')| \leq L\,d_X(x,x').
\end{align*}
Thus $g$ is $L$-Lipschitz. Since we are in the case $L>0$, the rescaled map
\begin{align*}
h:X \to \mathbb{R}
\end{align*}
defined by $h(x):=L^{-1}g(x)$ is well-defined, and for all $x,x' \in X$,
\begin{align*}
|h(x)-h(x')| \leq d_X(x,x').
\end{align*}
Therefore $h$ is $1$-Lipschitz on $X$, so it is an admissible test function for the Laplace characterization of the $T_1(C)$ inequality under $\mu$.
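The final rescaling can be written as a single chain; this is only a restatement of the estimates above, under the standing assumption $L>0$:
\begin{align*}
|h(x)-h(x')| = L^{-1}\,|g(x)-g(x')| \leq L^{-1}\cdot L\,d_X(x,x') = d_X(x,x').
\end{align*}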
[/guided]
[/step]
[step:Apply the Bobkov-Götze Laplace criterion on $X$]
We use the following external Bobkov-Götze Laplace characterization of the $T_1$ transportation inequality; once this result is available as an Androma theorem, this citation should be replaced by the corresponding theorem link. On a Polish metric space $(M,d)$, a Borel probability measure $\rho$ satisfies $T_1(A)$ if and only if every real-valued $1$-Lipschitz Borel function $u:M\to\mathbb{R}$ satisfies the centered exponential estimate
\begin{align*}
\int_M \exp\left(\lambda\left(u(m)-\int_M u(q)\,d\rho(q)\right)\right)\,d\rho(m) \leq \exp\left(\frac{A\lambda^2}{2}\right)
\end{align*}
for every $\lambda \in \mathbb{R}$, with the convention that the displayed inequality includes the finiteness of the centered exponential moment. Since $\mu$ satisfies $T_1(C)$ on $(X,d_X)$ and $h:X\to\mathbb{R}$ is $1$-Lipschitz, for every $\lambda \in \mathbb{R}$,
\begin{align*}
\int_X \exp\left(\lambda\left(h(x)-\int_X h(z)\,d\mu(z)\right)\right)\,d\mu(x) \leq \exp\left(\frac{C\lambda^2}{2}\right).
\end{align*}
Substituting $h=L^{-1}g$ and then replacing $\lambda$ by $L\lambda$, we obtain, for every $\lambda \in \mathbb{R}$,
\begin{align*}
\int_X \exp\left(\lambda\left(g(x)-\int_X g(z)\,d\mu(z)\right)\right)\,d\mu(x) \leq \exp\left(\frac{CL^2\lambda^2}{2}\right).
\end{align*}
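The change of parameter can be made explicit. As a sketch, write $\lambda'$ (an auxiliary symbol introduced here only for illustration) for the parameter in the estimate for $h$, and set $m:=\int_X g(z)\,d\mu(z)$, so that $\int_X h(z)\,d\mu(z)=L^{-1}m$. The estimate for $h$ then reads
\begin{align*}
\int_X \exp\left(\frac{\lambda'}{L}\bigl(g(x)-m\bigr)\right)\,d\mu(x) &\leq \exp\left(\frac{C(\lambda')^2}{2}\right), \\
\lambda' = L\lambda \quad &\Longrightarrow \quad \frac{\lambda'}{L}=\lambda, \qquad \frac{C(\lambda')^2}{2}=\frac{CL^2\lambda^2}{2}.
\end{align*}
Since $L>0$, the map $\lambda\mapsto L\lambda$ is a bijection of $\mathbb{R}$, so the resulting estimate for $g$ holds for every $\lambda\in\mathbb{R}$.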
[/step]
[step:Rewrite the Laplace estimate under the pushforward measure]
By the defining integration identity for the pushforward measure $\nu=T_\#\mu$, every Borel measurable function $\varphi:Y\to[0,\infty]$ satisfies
\begin{align*}
\int_Y \varphi(y)\,d\nu(y)=\int_X \varphi(T(x))\,d\mu(x).
\end{align*}
The same identity also applies to any real-valued Borel function $\psi$ on $Y$ for which $\psi\circ T$ is $\mu$-integrable, by applying it separately to the positive and negative parts of $\psi$. Applying the Laplace estimate from the preceding step with $\lambda=L$ and $\lambda=-L$ shows that $g$ has finite exponential moments in both signs under $\mu$; since $|s|\leq e^{s}+e^{-s}$ for every $s\in\mathbb{R}$, it follows that $g$ is $\mu$-integrable. Since $g=f\circ T$, the pushforward identity for the positive and negative parts of $f$ gives that $f$ is $\nu$-integrable and
\begin{align*}
\int_X g(z)\,d\mu(z)=\int_Y f(w)\,d\nu(w).
\end{align*}
Fix $\lambda\in\mathbb{R}$ and define the nonnegative Borel function
\begin{align*}
\varphi:Y \to [0,\infty]
\end{align*}
by
\begin{align*}
\varphi(y):=\exp\left(\lambda\left(f(y)-\int_Y f(w)\,d\nu(w)\right)\right).
\end{align*}
Applying the pushforward identity for nonnegative Borel functions to $\varphi$, we get
\begin{align*}
\int_Y \exp\left(\lambda\left(f(y)-\int_Y f(w)\,d\nu(w)\right)\right)\,d\nu(y)=\int_X \exp\left(\lambda\left(g(x)-\int_X g(z)\,d\mu(z)\right)\right)\,d\mu(x).
\end{align*}
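The equality above rests on a pointwise identity for the integrands. As a short check, using only $g=f\circ T$ and the equality of means established earlier in this step:
\begin{align*}
\varphi(T(x)) = \exp\left(\lambda\left(f(T(x))-\int_Y f(w)\,d\nu(w)\right)\right) = \exp\left(\lambda\left(g(x)-\int_X g(z)\,d\mu(z)\right)\right) \qquad \text{for every } x\in X.
\end{align*}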
Therefore the estimate from the preceding step becomes
\begin{align*}
\int_Y \exp\left(\lambda\left(f(y)-\int_Y f(w)\,d\nu(w)\right)\right)\,d\nu(y) \leq \exp\left(\frac{CL^2\lambda^2}{2}\right)
\end{align*}
for every $\lambda \in \mathbb{R}$ and every $1$-Lipschitz Borel function $f:Y\to\mathbb{R}$.
[/step]
[step:Conclude the $T_1(CL^2)$ inequality on $Y$]
The preceding step verifies the Laplace condition in the Bobkov-Götze characterization of $T_1$ on the Polish metric space $(Y,d_Y)$ with constant $CL^2$. Hence, again by the external Bobkov-Götze characterization of $T_1$ stated above, the Borel probability measure $\nu=T_\#\mu$ satisfies
\begin{align*}
W_{1,d_Y}(\sigma,\nu)^2 \leq 2CL^2\,H(\sigma\mid \nu)
\end{align*}
for every Borel probability measure $\sigma$ on $Y$. Since $\nu=T_\#\mu$, this is exactly the assertion that $T_\#\mu$ satisfies $T_1(CL^2)$ on $(Y,d_Y)$.
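As a sanity check on the constant, consider an illustrative example that is not part of the proof. Take $X=Y=\mathbb{R}$ with the Euclidean distance, let $\mu=\gamma$ be the standard Gaussian measure, which satisfies $T_1(1)$ as a consequence of Talagrand's inequality, and let $T(x)=ax+b$ with $a\neq 0$, so that $L=|a|$ and $T_\#\gamma=N(b,a^2)$. For the $1$-Lipschitz function $f(y)=y$, whose mean under $N(b,a^2)$ is $b$, the Laplace functional is
\begin{align*}
\int_{\mathbb{R}} \exp\bigl(\lambda(y-b)\bigr)\,dN(b,a^2)(y) = \exp\left(\frac{a^2\lambda^2}{2}\right) = \exp\left(\frac{CL^2\lambda^2}{2}\right) \qquad \text{with } C=1,\ L=|a|.
\end{align*}
In this example the Laplace bound with constant $CL^2$ holds with equality, so the constant cannot be lowered in general.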
[/step]