[proofplan]
We first prove the exponential moment estimate for bounded centered Lipschitz functions. For such a function $G$, the $1$-Lipschitz property and the centering bound $\int_E G\,d\nu$ by $W_1(\nu,\mu)$, and hence by $\sqrt{2C\,\operatorname{Ent}(\nu\mid\mu)}$ through the $T_1(C)$ inequality. We apply this to the tilted probability measure, whose relative entropy is expressed through the logarithmic moment generating function, and complete the square to obtain a sub-Gaussian bound. Chernoff's bound then gives the tail estimate for bounded functions, and truncation removes the boundedness assumption.
[/proofplan]
[step:Derive the moment bound for bounded centered Lipschitz functions]
Let $\mathcal{B}(E)$ denote the Borel $\sigma$-algebra generated by the metric topology on $E$. By definition of the hypothesis $T_1(C)$, $\mu$ is a probability measure on $(E,\mathcal{B}(E))$ with finite first moment and satisfies
\begin{align*}
W_1(\nu,\mu) \leq \sqrt{2C\,\operatorname{Ent}(\nu\mid\mu)}
\end{align*}
for every probability measure $\nu$ on $(E,\mathcal{B}(E))$ such that $\nu\ll\mu$ and $\operatorname{Ent}(\nu\mid\mu)<\infty$. Here $\operatorname{Ent}(\nu\mid\mu)$ denotes the relative entropy
\begin{align*}
\operatorname{Ent}(\nu\mid\mu):=\int_E \log\left(\frac{d\nu}{d\mu}(x)\right)\,d\nu(x),
\end{align*}
$\Pi(\nu,\mu)$ denotes the set of probability measures on $(E\times E,\mathcal{B}(E)\otimes\mathcal{B}(E))$ whose first marginal is $\nu$ and whose second marginal is $\mu$, and $W_1(\nu,\mu)$ denotes the first Wasserstein distance
\begin{align*}
W_1(\nu,\mu):=\inf_{\pi\in\Pi(\nu,\mu)}\int_{E\times E} d(x,y)\,d\pi(x,y).
\end{align*}
Let $G:E \to \mathbb{R}$ be a bounded Borel measurable $1$-Lipschitz function satisfying
\begin{align*}
\int_E G(x)\,d\mu(x) = 0.
\end{align*}
For $\lambda \geq 0$, define the logarithmic moment generating function $\Lambda:[0,\infty) \to \mathbb{R}$ by
\begin{align*}
\Lambda(\lambda) := \log\left(\int_E e^{\lambda G(x)}\,d\mu(x)\right).
\end{align*}
Since $G$ is bounded, $\Lambda$ is continuously differentiable and
\begin{align*}
\Lambda'(\lambda) = \frac{\int_E G(x)e^{\lambda G(x)}\,d\mu(x)}{\int_E e^{\lambda G(x)}\,d\mu(x)}.
\end{align*}
Let $\nu$ be a probability measure on $(E,\mathcal{B}(E))$ such that $\nu \ll \mu$, $\operatorname{Ent}(\nu\mid\mu)<\infty$, and $\nu$ has finite first moment. Because $G$ is bounded, the integrals $\int_E G(x)\,d\nu(x)$ and $\int_E G(y)\,d\mu(y)$ are finite. Because $\nu$ and $\mu$ have finite first moments, $W_1(\nu,\mu)$ is finite and the coupling costs below are meaningful. Since $G$ is centered and $1$-Lipschitz, every coupling $\pi \in \Pi(\nu,\mu)$ satisfies
\begin{align*}
\int_E G(x)\,d\nu(x) - \int_E G(y)\,d\mu(y)
= \int_{E \times E} \bigl(G(x)-G(y)\bigr)\,d\pi(x,y)
\leq \int_{E \times E} d(x,y)\,d\pi(x,y).
\end{align*}
Taking the infimum over $\pi \in \Pi(\nu,\mu)$ gives
\begin{align*}
\int_E G(x)\,d\nu(x) \leq W_1(\nu,\mu).
\end{align*}
The $T_1(C)$ hypothesis applies to such $\nu$, so
\begin{align*}
\int_E G(x)\,d\nu(x) \leq \sqrt{2C\,\operatorname{Ent}(\nu\mid\mu)}.
\end{align*}
We now prove the entropy variational step needed for the logarithmic moment bound. For $\lambda \geq 0$, define the tilted probability measure $\nu_\lambda$ on $(E,\mathcal{B}(E))$ by
\begin{align*}
\frac{d\nu_\lambda}{d\mu}(x) := \frac{e^{\lambda G(x)}}{\int_E e^{\lambda G(y)}\,d\mu(y)}.
\end{align*}
Since $G$ is bounded, $\nu_\lambda \ll \mu$, the density is bounded above and below by positive constants, and $\operatorname{Ent}(\nu_\lambda\mid\mu)<\infty$. To verify the finite-first-moment condition, fix a point $x_0\in E$ and let $M_\lambda:=\sup_{x\in E} d\nu_\lambda/d\mu(x)<\infty$. The finite-first-moment assumption on $\mu$ gives
\begin{align*}
\int_E d(x,x_0)\,d\nu_\lambda(x) \leq M_\lambda\int_E d(x,x_0)\,d\mu(x) < \infty.
\end{align*}
Thus $\nu_\lambda$ is admissible for the $T_1(C)$ hypothesis. Its entropy is
\begin{align*}
\operatorname{Ent}(\nu_\lambda\mid\mu)
= \int_E \log\left(\frac{d\nu_\lambda}{d\mu}(x)\right)\,d\nu_\lambda(x)
= \lambda\int_E G(x)\,d\nu_\lambda(x)-\Lambda(\lambda).
\end{align*}
Hence
\begin{align*}
\Lambda(\lambda)=\lambda\int_E G(x)\,d\nu_\lambda(x)-\operatorname{Ent}(\nu_\lambda\mid\mu).
\end{align*}
Applying the preceding transport-entropy estimate to $\nu_\lambda$ gives
\begin{align*}
\Lambda(\lambda) \leq \lambda\sqrt{2C\,\operatorname{Ent}(\nu_\lambda\mid\mu)}-\operatorname{Ent}(\nu_\lambda\mid\mu).
\end{align*}
For the scalar $r_\lambda:=\operatorname{Ent}(\nu_\lambda\mid\mu)\geq0$, completing the square gives
\begin{align*}
\lambda\sqrt{2Cr_\lambda}-r_\lambda
= \frac{C\lambda^2}{2}-\left(\sqrt{r_\lambda}-\lambda\sqrt{\frac{C}{2}}\right)^2
\leq \frac{C\lambda^2}{2}.
\end{align*}
Therefore
\begin{align*}
\Lambda(\lambda) \leq \frac{C\lambda^2}{2}.
\end{align*}
Thus, for every $\lambda \geq 0$,
\begin{align*}
\int_E e^{\lambda G(x)}\,d\mu(x) \leq \exp\left(\frac{C\lambda^2}{2}\right).
\end{align*}
[guided]
The goal of this step is to prove a Gaussian-type bound for the exponential moment of a bounded centered Lipschitz function. Let $G:E \to \mathbb{R}$ be bounded, Borel measurable, $1$-Lipschitz, and centered:
\begin{align*}
\int_E G(x)\,d\mu(x) = 0.
\end{align*}
For $\lambda \geq 0$, define
\begin{align*}
\Lambda(\lambda) := \log\left(\int_E e^{\lambda G(x)}\,d\mu(x)\right).
\end{align*}
Because $G$ is bounded, differentiation under the integral sign is justified, and
\begin{align*}
\Lambda'(\lambda)
= \frac{\int_E G(x)e^{\lambda G(x)}\,d\mu(x)}{\int_E e^{\lambda G(x)}\,d\mu(x)}.
\end{align*}
We next turn the transport-entropy assumption into an exponential moment bound. Let $\nu$ be a probability measure on $(E,\mathcal{B}(E))$ such that $\nu \ll \mu$, $\operatorname{Ent}(\nu\mid\mu)<\infty$, and $\nu$ has finite first moment. Since $G$ is bounded, the integrals against $\nu$ and $\mu$ are finite. Since both measures have finite first moment, the Wasserstein distance $W_1(\nu,\mu)$ is finite. Because $G$ is centered,
\begin{align*}
\int_E G(x)\,d\nu(x)=\int_E G(x)\,d\nu(x)-\int_E G(y)\,d\mu(y).
\end{align*}
Let $\pi \in \Pi(\nu,\mu)$ be any coupling. The $1$-Lipschitz property gives $G(x)-G(y)\leq d(x,y)$ for all $x,y\in E$, so
\begin{align*}
\int_E G(x)\,d\nu(x)-\int_E G(y)\,d\mu(y)
= \int_{E\times E}\bigl(G(x)-G(y)\bigr)\,d\pi(x,y)
\leq \int_{E\times E} d(x,y)\,d\pi(x,y).
\end{align*}
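As a brief sketch of why the first equality holds: it uses only the marginal conditions defining $\Pi(\nu,\mu)$, together with the boundedness of $G$, which makes each integral finite. Written out,
\begin{align*}
\int_{E\times E} G(x)\,d\pi(x,y)=\int_E G(x)\,d\nu(x),
\qquad
\int_{E\times E} G(y)\,d\pi(x,y)=\int_E G(y)\,d\mu(y),
\end{align*}
and subtracting the second identity from the first gives the equality above. In particular, the left-hand side does not depend on the choice of $\pi$.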
Taking the infimum over all couplings yields
\begin{align*}
\int_E G(x)\,d\nu(x) \leq W_1(\nu,\mu).
\end{align*}
Because $\nu$ is one of the measures to which the $T_1(C)$ hypothesis applies, we obtain
\begin{align*}
\int_E G(x)\,d\nu(x) \leq \sqrt{2C\,\operatorname{Ent}(\nu\mid\mu)}.
\end{align*}
Now choose the particular measure that is adapted to the exponential moment. For $\lambda\geq0$, define the tilted probability measure $\nu_\lambda$ by
\begin{align*}
\frac{d\nu_\lambda}{d\mu}(x)
:= \frac{e^{\lambda G(x)}}{\int_E e^{\lambda G(y)}\,d\mu(y)}.
\end{align*}
The boundedness of $G$ makes this density bounded above and below by positive constants. Hence $\nu_\lambda\ll\mu$ and its entropy is finite. We also must verify the finite-first-moment hypothesis before applying $T_1(C)$. Fix $x_0\in E$ and set $M_\lambda:=\sup_{x\in E} d\nu_\lambda/d\mu(x)$. Since $M_\lambda<\infty$ and $\mu$ has finite first moment,
\begin{align*}
\int_E d(x,x_0)\,d\nu_\lambda(x) \leq M_\lambda\int_E d(x,x_0)\,d\mu(x) < \infty.
\end{align*}
Thus $\nu_\lambda$ is admissible for the $T_1(C)$ assumption. Computing the entropy from the density gives
\begin{align*}
\operatorname{Ent}(\nu_\lambda\mid\mu)
= \int_E \log\left(\frac{d\nu_\lambda}{d\mu}(x)\right)\,d\nu_\lambda(x)
= \lambda\int_E G(x)\,d\nu_\lambda(x)-\Lambda(\lambda).
\end{align*}
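The second equality can be checked by writing out the logarithm of the density. A short sketch, using only the definitions of $\nu_\lambda$ and $\Lambda$ given above:
\begin{align*}
\log\left(\frac{d\nu_\lambda}{d\mu}(x)\right)
= \lambda G(x)-\log\left(\int_E e^{\lambda G(y)}\,d\mu(y)\right)
= \lambda G(x)-\Lambda(\lambda).
\end{align*}
Integrating against the probability measure $\nu_\lambda$ leaves the constant $\Lambda(\lambda)$ unchanged, which gives the entropy identity above.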
Rearranging this identity gives the entropy variational identity for this tilt:
\begin{align*}
\Lambda(\lambda)
= \lambda\int_E G(x)\,d\nu_\lambda(x)-\operatorname{Ent}(\nu_\lambda\mid\mu).
\end{align*}
Applying the transport-entropy estimate to $\nu_\lambda$ therefore gives
\begin{align*}
\Lambda(\lambda)
\leq \lambda\sqrt{2C\,\operatorname{Ent}(\nu_\lambda\mid\mu)}-\operatorname{Ent}(\nu_\lambda\mid\mu).
\end{align*}
Set
\begin{align*}
r_\lambda:=\operatorname{Ent}(\nu_\lambda\mid\mu).
\end{align*}
Since $r_\lambda\geq0$, completing the square in this scalar quantity gives
\begin{align*}
\lambda\sqrt{2Cr_\lambda}-r_\lambda
= \frac{C\lambda^2}{2}-\left(\sqrt{r_\lambda}-\lambda\sqrt{\frac{C}{2}}\right)^2
\leq \frac{C\lambda^2}{2}.
\end{align*}
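As a check on the algebra, the square can be expanded using $2\sqrt{r_\lambda}\cdot\lambda\sqrt{C/2}=\lambda\sqrt{2Cr_\lambda}$:
\begin{align*}
\left(\sqrt{r_\lambda}-\lambda\sqrt{\frac{C}{2}}\right)^2
= r_\lambda-\lambda\sqrt{2Cr_\lambda}+\frac{C\lambda^2}{2}.
\end{align*}
Subtracting both sides from $C\lambda^2/2$ recovers the displayed equality, and the inequality follows because the square is nonnegative.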
Thus
\begin{align*}
\Lambda(\lambda) \leq \frac{C\lambda^2}{2}.
\end{align*}
Equivalently,
\begin{align*}
\int_E e^{\lambda G(x)}\,d\mu(x) \leq \exp\left(\frac{C\lambda^2}{2}\right).
\end{align*}
[/guided]
[/step]
[step:Apply Chernoff optimization to obtain the bounded tail bound]
Let $G:E\to\mathbb{R}$ be bounded, Borel measurable, $1$-Lipschitz, and centered. For $t\geq0$ and $\lambda>0$, Markov's inequality applied to the nonnegative random variable $e^{\lambda G}$ gives
\begin{align*}
\mu(\{x\in E:G(x)\geq t\})
= \mu(\{x\in E:e^{\lambda G(x)}\geq e^{\lambda t}\})
\leq e^{-\lambda t}\int_E e^{\lambda G(x)}\,d\mu(x).
\end{align*}
Using the moment bound from the previous step,
\begin{align*}
\mu(\{x\in E:G(x)\geq t\})
\leq \exp\left(-\lambda t+\frac{C\lambda^2}{2}\right).
\end{align*}
The quadratic function $\lambda \mapsto -\lambda t+C\lambda^2/2$ on $(0,\infty)$ is minimized at $\lambda=t/C$ when $t>0$. Substituting this value gives
\begin{align*}
\mu(\{x\in E:G(x)\geq t\})
\leq \exp\left(-\frac{t^2}{2C}\right).
\end{align*}
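The substitution can be written out as follows; this sketch assumes $C>0$, which is implicit in the choice $\lambda=t/C$. The derivative of $\lambda\mapsto-\lambda t+C\lambda^2/2$ is $-t+C\lambda$, which vanishes at $\lambda=t/C$, and at this point
\begin{align*}
-\lambda t+\frac{C\lambda^2}{2}
= -\frac{t^2}{C}+\frac{C}{2}\cdot\frac{t^2}{C^2}
= -\frac{t^2}{2C}.
\end{align*}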
For $t=0$, the same bound is $\mu(\{G\geq0\})\leq1$, which is immediate.
[/step]
[step:Remove the boundedness assumption by Lipschitz truncation]
Let $F:E\to\mathbb{R}$ be Borel measurable, $1$-Lipschitz, and integrable. Define
\begin{align*}
m := \int_E F\,d\mu
\end{align*}
and define the centered function $G:E\to\mathbb{R}$ by
\begin{align*}
G(x):=F(x)-m.
\end{align*}
Then $G$ is $1$-Lipschitz and belongs to $L^1(E,\mathcal{B}(E),\mu)$, with
\begin{align*}
\int_E G\,d\mu=0.
\end{align*}
For each $a>0$, define the truncation map $\tau_a:\mathbb{R}\to[-a,a]$ by
\begin{align*}
\tau_a(r):=\max\{-a,\min\{r,a\}\}.
\end{align*}
The map $\tau_a$ is $1$-Lipschitz. Hence the map $G_a:E\to\mathbb{R}$ defined by $G_a(x):=\tau_a(G(x))$ satisfies $|G_a(x)-G_a(y)|\leq|G(x)-G(y)|\leq d(x,y)$ for all $x,y\in E$, so it is $1$-Lipschitz; it is also Borel measurable and bounded by $a$. Let
\begin{align*}
m_a:=\int_E G_a\,d\mu.
\end{align*}
Then $G_a-m_a$ is bounded, centered, and $1$-Lipschitz. Applying the bounded moment estimate to $G_a-m_a$ gives, for every $\lambda\geq0$,
\begin{align*}
\int_E e^{\lambda(G_a(x)-m_a)}\,d\mu(x)
\leq \exp\left(\frac{C\lambda^2}{2}\right).
\end{align*}
Because $|G_a|\leq |G|$ with $G\in L^1(E,\mathcal{B}(E),\mu)$, and $G_a(x)\to G(x)$ as $a\to\infty$ for every $x\in E$, dominated convergence gives
\begin{align*}
m_a\to \int_E G\,d\mu=0.
\end{align*}
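For fixed $\lambda\geq0$, the nonnegative functions $e^{\lambda(G_a-m_a)}$ converge pointwise to $e^{\lambda G}$ as $a\to\infty$, because $G_a\to G$ pointwise and $m_a\to0$. Fatou's lemma, followed by the bounded moment estimate, therefore gives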
\begin{align*}
\int_E e^{\lambda G(x)}\,d\mu(x)
\leq \liminf_{a\to\infty}\int_E e^{\lambda(G_a(x)-m_a)}\,d\mu(x)
\leq \exp\left(\frac{C\lambda^2}{2}\right).
\end{align*}
Thus the same moment bound holds for the original centered function $G$.
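As an illustration that the constant in this moment bound cannot be improved in general, consider the following example, which is not part of the argument. It assumes the classical fact (Talagrand's transport inequality combined with $W_1\leq W_2$) that the standard Gaussian measure $\gamma$ on $\mathbb{R}$ with the usual distance satisfies $T_1(1)$. The function $G(x)=x$ is $1$-Lipschitz and centered under $\gamma$, and
\begin{align*}
\int_{\mathbb{R}} e^{\lambda x}\,d\gamma(x)
= \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{\lambda x-x^2/2}\,dx
= \exp\left(\frac{\lambda^2}{2}\right),
\end{align*}
so in this example the bound holds with equality for $C=1$.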
[/step]
[step:Conclude the stated concentration inequality]
For $t\geq0$ and $\lambda>0$, Markov's inequality and the moment bound for $G=F-m$ give
\begin{align*}
\mu(\{x\in E:F(x)-m\geq t\})
\leq \exp\left(-\lambda t+\frac{C\lambda^2}{2}\right).
\end{align*}
Optimizing at $\lambda=t/C$ for $t>0$ yields
\begin{align*}
\mu\left(\left\{x\in E:F(x)-\int_E F\,d\mu\geq t\right\}\right)
\leq \exp\left(-\frac{t^2}{2C}\right).
\end{align*}
For $t=0$, the estimate reduces to the bound by $1$, so it also holds. This proves the claimed one-sided concentration inequality.
[/step]