[proofplan]
We first prove a sub-Gaussian [Laplace transform](/page/Laplace%20Transform) estimate for bounded $1$-Lipschitz functions by applying the $T_1(C)$ inequality to the exponentially tilted probability measures. The key identity is the entropy formula for the tilt, which converts the transportation bound into a differential inequality for the logarithmic moment generating function. We then pass from bounded functions to general integrable Lipschitz functions by clipping and Fatou's lemma. Finally, optimizing Chernoff's bound over $\lambda$ turns the Laplace estimate into the upper-tail inequality, and the lower tail follows by applying the upper-tail result to $-f$.
[/proofplan]
[step:Derive the logarithmic moment bound for bounded Lipschitz functions]
Assume first that $f:X\to\mathbb{R}$ is bounded and $1$-Lipschitz. Define the mean $m\in\mathbb{R}$ by
\begin{align*}
m:=\int_X f(x)\,d\mu(x).
\end{align*}
For $\lambda\geq 0$, define the logarithmic moment generating function $\Lambda:[0,\infty)\to\mathbb{R}$ by
\begin{align*}
\Lambda(\lambda):=\log\left(\int_X e^{\lambda(f(x)-m)}\,d\mu(x)\right).
\end{align*}
Since $f$ is bounded, the integrand and its $\lambda$-derivative are bounded on compact $\lambda$-intervals, so differentiation under the integral sign gives
\begin{align*}
\Lambda'(\lambda)=\frac{\int_X (f(x)-m)e^{\lambda(f(x)-m)}\,d\mu(x)}{\int_X e^{\lambda(f(x)-m)}\,d\mu(x)}.
\end{align*}
For $\lambda\geq 0$, define the tilted Borel probability measure $\nu_\lambda$ on $X$ by its Radon-Nikodym derivative
\begin{align*}
\frac{d\nu_\lambda}{d\mu}(x):=\exp\left(\lambda(f(x)-m)-\Lambda(\lambda)\right).
\end{align*}
Then
\begin{align*}
\Lambda'(\lambda)=\int_X f(x)\,d\nu_\lambda(x)-\int_X f(x)\,d\mu(x).
\end{align*}
Because $f$ is $1$-Lipschitz, the Kantorovich-Rubinstein dual formula gives
\begin{align*}
\int_X f(x)\,d\nu_\lambda(x)-\int_X f(x)\,d\mu(x)\leq W_1(\nu_\lambda,\mu).
\end{align*}
(citing a result not yet in the wiki: Kantorovich-Rubinstein duality)
Since $\nu_\lambda\ll\mu$, the $T_1(C)$ hypothesis bounds $W_1(\nu_\lambda,\mu)$, and combining it with the previous two displays yields
\begin{align*}
\Lambda'(\lambda)\leq W_1(\nu_\lambda,\mu)\leq \sqrt{2C\,H(\nu_\lambda\mid\mu)}.
\end{align*}
The entropy of the tilted measure is
\begin{align*}
H(\nu_\lambda\mid\mu)=\int_X \log\left(\frac{d\nu_\lambda}{d\mu}(x)\right)\,d\nu_\lambda(x).
\end{align*}
Using the definition of $d\nu_\lambda/d\mu$, we obtain
\begin{align*}
H(\nu_\lambda\mid\mu)=\lambda\int_X (f(x)-m)\,d\nu_\lambda(x)-\Lambda(\lambda)=\lambda\Lambda'(\lambda)-\Lambda(\lambda).
\end{align*}
Thus
\begin{align*}
\Lambda'(\lambda)\leq \sqrt{2C\left(\lambda\Lambda'(\lambda)-\Lambda(\lambda)\right)}.
\end{align*}
Since $\Lambda$ is convex and $\Lambda'(0)=\int_X (f-m)\,d\mu=0$, we have $\Lambda'(\lambda)\geq \Lambda'(0)=0$ for every $\lambda\geq 0$. Both sides of the preceding inequality are therefore non-negative, so squaring is valid and gives
\begin{align*}
\left(\Lambda'(\lambda)\right)^2\leq 2C\left(\lambda\Lambda'(\lambda)-\Lambda(\lambda)\right).
\end{align*}
Rearranging,
\begin{align*}
\left(\Lambda'(\lambda)-C\lambda\right)^2+2C\Lambda(\lambda)-C^2\lambda^2\leq 0.
\end{align*}
The square term is non-negative and $C>0$, so
\begin{align*}
\Lambda(\lambda)\leq \frac{C\lambda^2}{2}
\end{align*}
for every $\lambda\geq 0$. Equivalently,
\begin{align*}
\int_X e^{\lambda(f(x)-m)}\,d\mu(x)\leq e^{C\lambda^2/2}.
\end{align*}
[guided]
We begin with the bounded case because exponential tilting is then completely harmless: all exponential moments are finite, and the logarithmic moment generating function is differentiable by differentiation under the integral sign.
Let
\begin{align*}
m:=\int_X f(x)\,d\mu(x)
\end{align*}
be the mean of $f$ under $\mu$. For $\lambda\geq 0$, define
\begin{align*}
\Lambda(\lambda):=\log\left(\int_X e^{\lambda(f(x)-m)}\,d\mu(x)\right).
\end{align*}
Since $f$ is bounded, the quantity
\begin{align*}
M:=\sup\{|f(x)-m|:x\in X\}
\end{align*}
is finite. Hence, for $\lambda$ in any compact interval $[0,L]$, the functions $e^{\lambda(f-m)}$ and $(f-m)e^{\lambda(f-m)}$ are bounded in absolute value by $e^{LM}$ and $Me^{LM}$ respectively, so differentiation under the integral sign gives
\begin{align*}
\Lambda'(\lambda)=\frac{\int_X (f(x)-m)e^{\lambda(f(x)-m)}\,d\mu(x)}{\int_X e^{\lambda(f(x)-m)}\,d\mu(x)}.
\end{align*}
The purpose of the exponential tilt is to turn the derivative $\Lambda'(\lambda)$ into a difference of expectations. Define the tilted probability measure $\nu_\lambda$ by
\begin{align*}
\frac{d\nu_\lambda}{d\mu}(x):=\exp\left(\lambda(f(x)-m)-\Lambda(\lambda)\right).
\end{align*}
This is a Borel probability measure because the normalizing factor $e^{-\Lambda(\lambda)}$ is the reciprocal of $\int_X e^{\lambda(f-m)}\,d\mu$. It is also absolutely continuous with respect to $\mu$ by construction. Substituting this density into the formula for $\Lambda'(\lambda)$ gives
\begin{align*}
\Lambda'(\lambda)=\int_X (f(x)-m)\,d\nu_\lambda(x)=\int_X f(x)\,d\nu_\lambda(x)-\int_X f(x)\,d\mu(x).
\end{align*}
Now the Lipschitz assumption enters. Since $f$ is $1$-Lipschitz, the Kantorovich-Rubinstein dual formula implies
\begin{align*}
\int_X f(x)\,d\nu_\lambda(x)-\int_X f(x)\,d\mu(x)\leq W_1(\nu_\lambda,\mu).
\end{align*}
(citing a result not yet in the wiki: Kantorovich-Rubinstein duality)
The measure $\nu_\lambda$ satisfies $\nu_\lambda\ll\mu$, so the assumed $T_1(C)$ inequality applies to this particular choice of $\nu_\lambda$. Therefore
\begin{align*}
\Lambda'(\lambda)\leq W_1(\nu_\lambda,\mu)\leq \sqrt{2C\,H(\nu_\lambda\mid\mu)}.
\end{align*}
We next compute the entropy of the tilted measure. By definition,
\begin{align*}
H(\nu_\lambda\mid\mu)=\int_X \log\left(\frac{d\nu_\lambda}{d\mu}(x)\right)\,d\nu_\lambda(x).
\end{align*}
Since
\begin{align*}
\log\left(\frac{d\nu_\lambda}{d\mu}(x)\right)=\lambda(f(x)-m)-\Lambda(\lambda),
\end{align*}
we obtain
\begin{align*}
H(\nu_\lambda\mid\mu)=\lambda\int_X (f(x)-m)\,d\nu_\lambda(x)-\Lambda(\lambda)=\lambda\Lambda'(\lambda)-\Lambda(\lambda).
\end{align*}
Combining this entropy identity with the transportation bound gives the differential inequality
\begin{align*}
\Lambda'(\lambda)\leq \sqrt{2C\left(\lambda\Lambda'(\lambda)-\Lambda(\lambda)\right)}.
\end{align*}
It remains to extract the Gaussian bound from this inequality. Since $\Lambda$ is the logarithm of an exponential moment, it is convex. Also $\Lambda(0)=0$ and $\Lambda'(0)=\int_X(f-m)\,d\mu=0$. Convexity therefore implies $\Lambda'(\lambda)\geq 0$ for $\lambda\geq 0$, so we may square the preceding inequality:
\begin{align*}
\left(\Lambda'(\lambda)\right)^2\leq 2C\left(\lambda\Lambda'(\lambda)-\Lambda(\lambda)\right).
\end{align*}
Rearranging this as a completed square yields
\begin{align*}
\left(\Lambda'(\lambda)-C\lambda\right)^2+2C\Lambda(\lambda)-C^2\lambda^2\leq 0.
\end{align*}
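As a check on the algebra, using only the symbols already introduced, expanding the square shows that the left-hand side above is a rewriting of the squared inequality:
\begin{align*}
\left(\Lambda'(\lambda)-C\lambda\right)^2+2C\Lambda(\lambda)-C^2\lambda^2
&=\left(\Lambda'(\lambda)\right)^2-2C\lambda\Lambda'(\lambda)+C^2\lambda^2+2C\Lambda(\lambda)-C^2\lambda^2\\
&=\left(\Lambda'(\lambda)\right)^2-2C\left(\lambda\Lambda'(\lambda)-\Lambda(\lambda)\right).
\end{align*}
So the completed-square expression is non-positive exactly when the squared inequality holds.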
The square term is non-negative, so it can only make the left-hand side larger. Hence
\begin{align*}
2C\Lambda(\lambda)-C^2\lambda^2\leq 0.
\end{align*}
Because $C>0$, division by $2C$ gives
\begin{align*}
\Lambda(\lambda)\leq \frac{C\lambda^2}{2}.
\end{align*}
Exponentiating this bound and recalling the definition of $\Lambda$ gives the Laplace estimate for bounded functions
\begin{align*}
\int_X e^{\lambda(f(x)-m)}\,d\mu(x)\leq e^{C\lambda^2/2}
\end{align*}
for every $\lambda\geq 0$.
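The convexity of $\Lambda$ used above can also be verified by a direct computation. The following is a sketch under the same boundedness assumption, with the abbreviation $g:=f-m$ introduced here only for brevity. Differentiating under the integral sign a second time gives
\begin{align*}
\Lambda''(\lambda)=\int_X g(x)^2\,d\nu_\lambda(x)-\left(\int_X g(x)\,d\nu_\lambda(x)\right)^2\geq 0.
\end{align*}
In other words, $\Lambda''(\lambda)$ is the variance of $f$ under the tilted measure $\nu_\lambda$, which is non-negative by the Cauchy-Schwarz inequality.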
[/guided]
[/step]
[step:Pass the Laplace bound to integrable Lipschitz functions by clipping]
Now let $f:X\to\mathbb{R}$ be $1$-Lipschitz and in $L^1(X,\mu)$. For each $k\in\mathbb{N}$, define the clipping map $\psi_k:\mathbb{R}\to\mathbb{R}$ by
\begin{align*}
\psi_k(r):=\max\{-k,\min\{r,k\}\}.
\end{align*}
Define the bounded Borel function $f_k:X\to\mathbb{R}$ by
\begin{align*}
f_k(x):=\psi_k(f(x)).
\end{align*}
The map $\psi_k$ is $1$-Lipschitz on $\mathbb{R}$, so $f_k$ is $1$-Lipschitz on $X$. Let
\begin{align*}
m_k:=\int_X f_k(x)\,d\mu(x).
\end{align*}
Since $|f_k|\leq |f|$ with $f\in L^1(X,\mu)$, and $f_k(x)\to f(x)$ for every $x\in X$, dominated convergence gives
\begin{align*}
m_k\to m:=\int_X f(x)\,d\mu(x).
\end{align*}
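The properties of the clipped functions used above can be checked directly. In this sketch $d$ denotes the metric on $X$, a symbol assumed here because this section does not name it. For $r,s\in\mathbb{R}$ and $x,y\in X$,
\begin{align*}
|\psi_k(r)-\psi_k(s)|&\leq |r-s|, & |f_k(x)-f_k(y)|&\leq |f(x)-f(y)|\leq d(x,y),\\
|\psi_k(r)|&=\min\{|r|,k\}\leq |r|, & f_k(x)&=f(x)\ \text{whenever } k\geq |f(x)|.
\end{align*}
The last identity shows that the sequence $f_k(x)$ is eventually constant and equal to $f(x)$, which gives the pointwise convergence.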
By the bounded case, for every $k\in\mathbb{N}$ and every $\lambda\geq 0$,
\begin{align*}
\int_X e^{\lambda(f_k(x)-m_k)}\,d\mu(x)\leq e^{C\lambda^2/2}.
\end{align*}
For fixed $\lambda\geq 0$, since $f_k\to f$ pointwise and $m_k\to m$, the non-negative functions
\begin{align*}
x\mapsto e^{\lambda(f_k(x)-m_k)}
\end{align*}
converge pointwise to
\begin{align*}
x\mapsto e^{\lambda(f(x)-m)}.
\end{align*}
By Fatou's lemma,
\begin{align*}
\int_X e^{\lambda(f(x)-m)}\,d\mu(x)\leq \liminf_{k\to\infty}\int_X e^{\lambda(f_k(x)-m_k)}\,d\mu(x)\leq e^{C\lambda^2/2}.
\end{align*}
(citing a result not yet in the wiki: Fatou's lemma)
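The constant in this Laplace bound cannot be improved in general, as the following worked example indicates. It rests on a fact not established in this section: by Talagrand's transportation inequality, the standard Gaussian measure $\gamma$ on $X=\mathbb{R}$ satisfies $T_1(C)$ with $C=1$ in the normalization used here. Taking $f(x)=x$, which is $1$-Lipschitz and integrable with $m=0$, a direct Gaussian computation gives
\begin{align*}
\int_{\mathbb{R}} e^{\lambda x}\,d\gamma(x)=\frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{\lambda x-x^2/2}\,dx=e^{\lambda^2/2}\cdot\frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{-(x-\lambda)^2/2}\,dx=e^{\lambda^2/2}.
\end{align*}
Under that assumption, the estimate holds with equality for every $\lambda\geq 0$ when $C=1$.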
[/step]
[step:Apply Chernoff optimization to obtain the upper tail]
Let
\begin{align*}
m:=\int_X f(x)\,d\mu(x).
\end{align*}
For $t=0$, the desired estimate is immediate because $\mu$ is a probability measure and $e^0=1$. Fix $t>0$ and $\lambda>0$. Define the Borel set
\begin{align*}
A_t:=\{x\in X:f(x)-m\geq t\}.
\end{align*}
On $A_t$,
\begin{align*}
e^{\lambda(f(x)-m)}\geq e^{\lambda t}.
\end{align*}
Therefore
\begin{align*}
e^{\lambda t}\mu(A_t)\leq \int_{A_t} e^{\lambda(f(x)-m)}\,d\mu(x)\leq \int_X e^{\lambda(f(x)-m)}\,d\mu(x)\leq e^{C\lambda^2/2}.
\end{align*}
Hence
\begin{align*}
\mu(A_t)\leq \exp\left(\frac{C\lambda^2}{2}-\lambda t\right).
\end{align*}
Choosing $\lambda=t/C$ gives
\begin{align*}
\mu(A_t)\leq \exp\left(-\frac{t^2}{2C}\right).
\end{align*}
Thus
\begin{align*}
\mu\left(\left\{x\in X:f(x)-\int_X f\,d\mu\geq t\right\}\right)\leq \exp\left(-\frac{t^2}{2C}\right)
\end{align*}
for every $t\geq 0$.
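The choice $\lambda=t/C$ made above is the one that minimizes the exponent. Completing the square in $\lambda$ makes this explicit:
\begin{align*}
\frac{C\lambda^2}{2}-\lambda t=\frac{C}{2}\left(\lambda-\frac{t}{C}\right)^2-\frac{t^2}{2C}\geq -\frac{t^2}{2C},
\end{align*}
with equality exactly when $\lambda=t/C$. This value is positive because $t>0$ and $C>0$, so it is an admissible choice of $\lambda$.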
[/step]
[step:Apply the same argument to the negative function for the lower tail]
The function $-f:X\to\mathbb{R}$ is also $1$-Lipschitz and belongs to $L^1(X,\mu)$. Applying the upper-tail estimate to $-f$ gives, for every $t\geq 0$,
\begin{align*}
\mu\left(\left\{x\in X:-f(x)-\int_X (-f)\,d\mu\geq t\right\}\right)\leq \exp\left(-\frac{t^2}{2C}\right).
\end{align*}
Since
\begin{align*}
-f(x)-\int_X (-f)\,d\mu=\int_X f\,d\mu-f(x),
\end{align*}
we obtain
\begin{align*}
\mu\left(\left\{x\in X:\int_X f\,d\mu-f(x)\geq t\right\}\right)\leq \exp\left(-\frac{t^2}{2C}\right).
\end{align*}
This proves both claimed concentration inequalities.
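Although it is not part of the statement proved here, the two tail bounds combine by a union bound into a two-sided estimate. A short derivation, valid for every $t\geq 0$:
\begin{align*}
\mu\left(\left\{x\in X:\left|f(x)-\int_X f\,d\mu\right|\geq t\right\}\right)
&\leq \mu\left(\left\{x\in X:f(x)-\int_X f\,d\mu\geq t\right\}\right)+\mu\left(\left\{x\in X:\int_X f\,d\mu-f(x)\geq t\right\}\right)\\
&\leq 2\exp\left(-\frac{t^2}{2C}\right).
\end{align*}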
[/step]