[proofplan]
First we show that every $1$-Lipschitz function is $\mu$-integrable, using the finite first moment assumption. We then prove the key exponential moment estimate for bounded $1$-Lipschitz functions by applying $T_1(C)$ to the exponentially tilted probability measure and using the elementary coupling bound behind the Kantorovich-Rubinstein dual inequality. A truncation argument extends the exponential estimate to arbitrary $1$-Lipschitz functions. Finally, Markov's inequality and optimization in the Laplace parameter give the Gaussian concentration bound.
[/proofplan]
[step:Use the finite first moment to prove integrability of Lipschitz functions]
Fix a point $x_0 \in X$ and define the distance function $\rho_{x_0}:X \to [0,\infty)$ by
\begin{align*}
\rho_{x_0}(x) := d(x,x_0).
\end{align*}
Since $\mu \in \mathcal{P}_1(X)$, the function $\rho_{x_0}$ belongs to $L^1(X,\mu)$. Let
\begin{align*}
f:X &\to \mathbb{R}
\end{align*}
be $1$-Lipschitz. For every $x \in X$, the Lipschitz condition gives
\begin{align*}
|f(x)| \leq |f(x_0)| + |f(x)-f(x_0)| \leq |f(x_0)| + d(x,x_0).
\end{align*}
Since $f$ is continuous, it is Borel measurable. The right-hand side of the bound above is $\mu$-integrable, because $\mu(X)=1$ and $\rho_{x_0} \in L^1(X,\mu)$. Hence $f \in L^1(X,\mu)$.
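Integrating the pointwise bound makes the estimate explicit. This is a supplementary computation that uses only the inequality above and $\mu(X)=1$:
\begin{align*}
\int_X |f|\,d\mu(x) \leq |f(x_0)| + \int_X d(x,x_0)\,d\mu(x) < \infty.
\end{align*}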
[/step]
[step:Derive the exponential moment estimate for bounded Lipschitz functions]
Let
\begin{align*}
g:X &\to \mathbb{R}
\end{align*}
be a bounded $1$-Lipschitz Borel function, and define its $\mu$-mean by
\begin{align*}
m_g := \int_X g\,d\mu(x).
\end{align*}
Fix $\lambda \geq 0$. Define the normalizing constant
\begin{align*}
Z_\lambda := \int_X e^{\lambda g}\,d\mu(x),
\end{align*}
and define the exponentially tilted probability measure $\nu_\lambda$ on the Borel $\sigma$-algebra of $X$ by
\begin{align*}
\nu_\lambda(A) := \frac{1}{Z_\lambda}\int_A e^{\lambda g}\,d\mu(x)
\end{align*}
for every Borel set $A \subset X$. Since $g$ is bounded, $0 < Z_\lambda < \infty$, and $\nu_\lambda \ll \mu$. Moreover, if $M_\lambda := \exp(\lambda\|g\|_\infty)/Z_\lambda$, then $d\nu_\lambda/d\mu \leq M_\lambda$, so
\begin{align*}
\int_X d(x,x_0)\,d\nu_\lambda(x) \leq M_\lambda \int_X d(x,x_0)\,d\mu(x) < \infty.
\end{align*}
Thus $\nu_\lambda \in \mathcal{P}_1(X)$, and the hypothesis $T_1(C)$ applies to $\nu_\lambda$. Its relative entropy is
\begin{align*}
H(\nu_\lambda \mid \mu)
= \int_X \log\left(\frac{d\nu_\lambda}{d\mu}\right)\,d\nu_\lambda(x)
= \lambda \int_X g\,d\nu_\lambda(x) - \log Z_\lambda.
\end{align*}
We first record the coupling estimate used for $W_1$. Let $\pi$ be any coupling of $\nu_\lambda$ and $\mu$. Since $g$ is $1$-Lipschitz,
\begin{align*}
g(x)-g(y) \leq d(x,y)
\end{align*}
for every $(x,y) \in X \times X$. Integrating with respect to $\pi$ yields
\begin{align*}
\int_X g\,d\nu_\lambda(x)-\int_X g\,d\mu(y)
= \int_{X \times X} (g(x)-g(y))\,d\pi(x,y)
\leq \int_{X \times X} d(x,y)\,d\pi(x,y).
\end{align*}
Applying the same argument to the $1$-Lipschitz function $-g:X\to\mathbb{R}$ gives
\begin{align*}
\int_X g\,d\mu(y)-\int_X g\,d\nu_\lambda(x) \leq \int_{X \times X} d(x,y)\,d\pi(x,y).
\end{align*}
Taking the infimum over all couplings $\pi$ in both inequalities gives
\begin{align*}
\left|\int_X g\,d\nu_\lambda(x)-m_g\right| \leq W_1(\nu_\lambda,\mu).
\end{align*}
Set
\begin{align*}
a_\lambda := \int_X g\,d\nu_\lambda(x)-m_g.
\end{align*}
By the previous inequality and $T_1(C)$ applied to $\nu_\lambda$,
\begin{align*}
|a_\lambda| \leq W_1(\nu_\lambda,\mu) \leq \sqrt{2C H(\nu_\lambda \mid \mu)}.
\end{align*}
Since $H(\nu_\lambda \mid \mu)=\lambda a_\lambda+\lambda m_g-\log Z_\lambda$, this gives
\begin{align*}
a_\lambda^2 \leq 2C\left(\lambda a_\lambda+\lambda m_g-\log Z_\lambda\right).
\end{align*}
Dividing by $2C>0$ and rearranging,
\begin{align*}
\log Z_\lambda-\lambda m_g \leq \lambda a_\lambda-\frac{a_\lambda^2}{2C}.
\end{align*}
For every real number $a$, completing the square gives
\begin{align*}
\lambda a-\frac{a^2}{2C} \leq \frac{C\lambda^2}{2}.
\end{align*}
Applying this with $a=a_\lambda$ gives
\begin{align*}
\log \int_X e^{\lambda(g-m_g)}\,d\mu(x)
= \log Z_\lambda-\lambda m_g
\leq \frac{C\lambda^2}{2}.
\end{align*}
Exponentiating,
\begin{align*}
\int_X e^{\lambda(g-m_g)}\,d\mu(x) \leq \exp\left(\frac{C\lambda^2}{2}\right).
\end{align*}
[guided]
We want to turn the transport-[entropy inequality](/theorems/6729) into an exponential integrability statement. The standard way to do this is to test $T_1(C)$ on a probability measure that favours large values of the function. Let
\begin{align*}
g:X &\to \mathbb{R}
\end{align*}
be bounded, Borel, and $1$-Lipschitz, and let
\begin{align*}
m_g := \int_X g\,d\mu(x).
\end{align*}
For a fixed $\lambda \geq 0$, define
\begin{align*}
Z_\lambda := \int_X e^{\lambda g}\,d\mu(x).
\end{align*}
Because $g$ is bounded, the integrand $e^{\lambda g}$ is bounded above and below by positive constants, so $0<Z_\lambda<\infty$. We may therefore define a probability measure $\nu_\lambda$ by
\begin{align*}
\nu_\lambda(A) := \frac{1}{Z_\lambda}\int_A e^{\lambda g}\,d\mu(x)
\end{align*}
for each Borel set $A \subset X$. This measure satisfies $\nu_\lambda \ll \mu$. To check the other hypothesis in $T_1(C)$, define $M_\lambda := \exp(\lambda\|g\|_\infty)/Z_\lambda$. Since $d\nu_\lambda/d\mu \leq M_\lambda$ and $\mu\in\mathcal P_1(X)$,
\begin{align*}
\int_X d(x,x_0)\,d\nu_\lambda(x) \leq M_\lambda\int_X d(x,x_0)\,d\mu(x) < \infty.
\end{align*}
Thus $\nu_\lambda\in\mathcal P_1(X)$, so the hypothesis $T_1(C)$ applies to $\nu_\lambda$.
The density of $\nu_\lambda$ with respect to $\mu$ is
\begin{align*}
\frac{d\nu_\lambda}{d\mu}(x)=\frac{e^{\lambda g(x)}}{Z_\lambda}.
\end{align*}
Thus
\begin{align*}
\log\left(\frac{d\nu_\lambda}{d\mu}(x)\right)=\lambda g(x)-\log Z_\lambda.
\end{align*}
Integrating this identity with respect to $\nu_\lambda$ gives
\begin{align*}
H(\nu_\lambda \mid \mu)
= \lambda \int_X g\,d\nu_\lambda(x)-\log Z_\lambda.
\end{align*}
Next we connect the mean shift of $g$ under $\nu_\lambda$ to $W_1(\nu_\lambda,\mu)$. Let $\pi$ be any coupling of $\nu_\lambda$ and $\mu$, meaning that $\pi$ is a probability measure on $X \times X$ whose first marginal is $\nu_\lambda$ and whose second marginal is $\mu$. Since $g$ is $1$-Lipschitz,
\begin{align*}
g(x)-g(y) \leq |g(x)-g(y)| \leq d(x,y)
\end{align*}
for all $x,y \in X$. Therefore
\begin{align*}
\int_X g\,d\nu_\lambda(x)-\int_X g\,d\mu(y)
= \int_{X \times X} (g(x)-g(y))\,d\pi(x,y)
\leq \int_{X \times X} d(x,y)\,d\pi(x,y).
\end{align*}
Applying the same estimate to the $1$-Lipschitz function $-g:X\to\mathbb{R}$ gives the reverse mean difference bound, and taking the infimum over all couplings $\pi$ in both inequalities gives
\begin{align*}
\left|\int_X g\,d\nu_\lambda(x)-m_g\right| \leq W_1(\nu_\lambda,\mu).
\end{align*}
Define the mean displacement
\begin{align*}
a_\lambda := \int_X g\,d\nu_\lambda(x)-m_g.
\end{align*}
The transport-entropy hypothesis now gives
\begin{align*}
|a_\lambda| \leq W_1(\nu_\lambda,\mu) \leq \sqrt{2C H(\nu_\lambda \mid \mu)}.
\end{align*}
Using the entropy identity and rewriting $\int_X g\,d\nu_\lambda(x)=a_\lambda+m_g$, we obtain
\begin{align*}
H(\nu_\lambda \mid \mu)
= \lambda(a_\lambda+m_g)-\log Z_\lambda.
\end{align*}
Thus
\begin{align*}
a_\lambda^2 \leq 2C\left(\lambda a_\lambda+\lambda m_g-\log Z_\lambda\right).
\end{align*}
Solving this inequality for $\log Z_\lambda-\lambda m_g$ gives
\begin{align*}
\log Z_\lambda-\lambda m_g \leq \lambda a_\lambda-\frac{a_\lambda^2}{2C}.
\end{align*}
The right-hand side is a concave quadratic in $a_\lambda$. Completing the square shows that, for every $a \in \mathbb{R}$, it is at most $C\lambda^2/2$, with the maximum attained at $a=C\lambda$:
\begin{align*}
\lambda a-\frac{a^2}{2C}
= \frac{C\lambda^2}{2}-\frac{(a-C\lambda)^2}{2C}
\leq \frac{C\lambda^2}{2}.
\end{align*}
Therefore
\begin{align*}
\log Z_\lambda-\lambda m_g \leq \frac{C\lambda^2}{2}.
\end{align*}
Since
\begin{align*}
\int_X e^{\lambda(g-m_g)}\,d\mu(x)=e^{-\lambda m_g}Z_\lambda,
\end{align*}
we conclude that
\begin{align*}
\int_X e^{\lambda(g-m_g)}\,d\mu(x) \leq \exp\left(\frac{C\lambda^2}{2}\right).
\end{align*}
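As an illustration of the mechanism, and not as part of the proof, consider the following assumed example: $X=\mathbb{R}$ with the usual distance, $\mu=\gamma$ the standard Gaussian measure (which satisfies $T_1(1)$ by Talagrand's inequality), and $g(x)=x$, so that $m_g=0$. This $g$ is not bounded, so it lies outside the hypotheses of this step, but every quantity above can be computed explicitly:
\begin{align*}
Z_\lambda &= \int_{\mathbb{R}} e^{\lambda x}\,d\gamma(x) = e^{\lambda^2/2}, \\
\frac{d\nu_\lambda}{d\gamma}(x) &= e^{\lambda x-\lambda^2/2}, \qquad \text{so } \nu_\lambda = N(\lambda,1), \\
a_\lambda &= \int_{\mathbb{R}} x\,d\nu_\lambda(x) - 0 = \lambda, \\
H(\nu_\lambda \mid \gamma) &= \lambda\cdot\lambda-\frac{\lambda^2}{2} = \frac{\lambda^2}{2}, \\
W_1(\nu_\lambda,\gamma) &= \lambda = \sqrt{2H(\nu_\lambda \mid \gamma)}.
\end{align*}
In this example $a_\lambda=C\lambda$ with $C=1$, which is the maximizer in the completing-the-square step, and $\log Z_\lambda-\lambda m_g=\lambda^2/2$, so the bound $C\lambda^2/2$ is attained.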
[/guided]
[/step]
[step:Pass from bounded truncations to an arbitrary Lipschitz function]
Let
\begin{align*}
f:X &\to \mathbb{R}
\end{align*}
be $1$-Lipschitz. For each integer $k \geq 1$, define the truncation map
\begin{align*}
\tau_k:\mathbb{R} &\to [-k,k]
\end{align*}
by
\begin{align*}
\tau_k(t) := \max\{-k,\min\{t,k\}\},
\end{align*}
and define
\begin{align*}
f_k:X &\to [-k,k]
\end{align*}
by
\begin{align*}
f_k(x) := \tau_k(f(x)).
\end{align*}
The map $\tau_k$ is $1$-Lipschitz on $\mathbb{R}$, so $f_k$ is a bounded $1$-Lipschitz Borel function on $X$. Let
\begin{align*}
m_k := \int_X f_k\,d\mu(x),
\qquad
m := \int_X f\,d\mu(x).
\end{align*}
Since $f \in L^1(X,\mu)$ and $f_k \to f$ pointwise with $|f_k| \leq |f|$, the [dominated convergence theorem](/theorems/4) gives $m_k \to m$.
Applying the exponential moment estimate to $f_k$ gives, for every $\lambda \geq 0$,
\begin{align*}
\int_X e^{\lambda(f_k-m_k)}\,d\mu(x)
\leq \exp\left(\frac{C\lambda^2}{2}\right).
\end{align*}
Since $f_k-m_k \to f-m$ pointwise, the integrands $e^{\lambda(f_k-m_k)}$ converge pointwise to $e^{\lambda(f-m)}$, and Fatou's lemma applied to these non-negative functions gives
\begin{align*}
\int_X e^{\lambda(f-m)}\,d\mu(x)
\leq \liminf_{k \to \infty}\int_X e^{\lambda(f_k-m_k)}\,d\mu(x)
\leq \exp\left(\frac{C\lambda^2}{2}\right).
\end{align*}
Thus every $1$-Lipschitz $f$ satisfies the exponential moment bound
\begin{align*}
\int_X e^{\lambda\left(f-\int_X f\,d\mu\right)}\,d\mu(x)
\leq \exp\left(\frac{C\lambda^2}{2}\right)
\end{align*}
for every $\lambda \geq 0$.
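The properties of the truncation used in this step can be checked directly from the definition of $\tau_k$. The following is a supplementary verification, with $s,t \in \mathbb{R}$ and $x,y \in X$ arbitrary:
\begin{align*}
|\tau_k(s)-\tau_k(t)| &\leq |s-t|, \\
|\tau_k(t)| &\leq \min\{|t|,k\}, \\
\tau_k(t) &= t \quad \text{whenever } k \geq |t|.
\end{align*}
Composing with $f$, these give respectively
\begin{align*}
|f_k(x)-f_k(y)| &\leq |f(x)-f(y)| \leq d(x,y), \\
|f_k(x)| &\leq |f(x)|, \\
f_k(x) &= f(x) \quad \text{whenever } k \geq |f(x)|,
\end{align*}
which are the $1$-Lipschitz property, the domination, and the pointwise convergence used above.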
[/step]
[step:Apply Markov's inequality and optimize the Laplace parameter]
Let $f:X \to \mathbb{R}$ be $1$-Lipschitz, and set
\begin{align*}
m := \int_X f\,d\mu(x).
\end{align*}
Fix $r \geq 0$ and $\lambda > 0$. Define the non-negative measurable function
\begin{align*}
Y_\lambda:X &\to [0,\infty)
\end{align*}
by
\begin{align*}
Y_\lambda(x) := e^{\lambda(f(x)-m)}.
\end{align*}
Because $\lambda>0$, the event $\{x \in X : f(x)-m \geq r\}$ is contained in $\{x \in X : Y_\lambda(x) \geq e^{\lambda r}\}$. Markov's inequality applied to $Y_\lambda$ gives
\begin{align*}
\mu\left(\left\{x \in X : f(x)-m \geq r\right\}\right)
\leq e^{-\lambda r}\int_X e^{\lambda(f-m)}\,d\mu(x).
\end{align*}
Using the exponential moment estimate,
\begin{align*}
\mu\left(\left\{x \in X : f(x)-m \geq r\right\}\right)
\leq \exp\left(-\lambda r+\frac{C\lambda^2}{2}\right).
\end{align*}
For $r>0$, choose $\lambda=r/C$, which is positive because $C>0$. Then
\begin{align*}
-\lambda r+\frac{C\lambda^2}{2}
= -\frac{r^2}{C}+\frac{r^2}{2C}
= -\frac{r^2}{2C}.
\end{align*}
Hence
\begin{align*}
\mu\left(\left\{x \in X : f(x)-m \geq r\right\}\right)
\leq \exp\left(-\frac{r^2}{2C}\right)
\end{align*}
for every $r>0$. When $r=0$, the desired inequality reads
\begin{align*}
\mu\left(\left\{x \in X : f(x)-m \geq 0\right\}\right) \leq 1,
\end{align*}
which holds because $\mu$ is a probability measure. Combining the cases $r>0$ and $r=0$ proves the stated concentration inequality for all $r \geq 0$.
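The choice $\lambda=r/C$ is the minimizer of the exponent, as the following supplementary computation shows (it uses only $C>0$):
\begin{align*}
-\lambda r+\frac{C\lambda^2}{2}
= \frac{C}{2}\left(\lambda-\frac{r}{C}\right)^2-\frac{r^2}{2C}
\geq -\frac{r^2}{2C},
\end{align*}
with equality exactly when $\lambda=r/C$. As an assumed illustration, take the standard Gaussian measure $\gamma$ on $\mathbb{R}$, which satisfies $T_1(1)$, and $f(x)=x$ with $m=0$. The concentration inequality then reads
\begin{align*}
\gamma\left(\left\{x \in \mathbb{R} : x \geq r\right\}\right) \leq \exp\left(-\frac{r^2}{2}\right),
\end{align*}
which is the classical Chernoff bound for the Gaussian tail.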
[/step]