Bobkov–Götze Concentration Theorem for $T_1$ Transport-Entropy Inequalities

Theorem

Proof

[proofplan] We first derive the exponential moment estimate implied by the $T_1(C)$ inequality. For a bounded $1$-Lipschitz function, we tilt the measure by the exponential density, compute its entropy exactly, and combine $T_1(C)$ with the coupling definition of $W_1$. A truncation argument extends the exponential moment bound to arbitrary $1$-Lipschitz functions, and the pointwise exponential tail estimate then gives the probability bound after optimizing the exponential parameter. The product statement is the same argument applied to the metric probability space $(E^n,d_{2,n},\nu^{\otimes n})$. [/proofplan] [step:Reduce the concentration estimate to an exponential moment bound] Let $\mathcal{B}(E)$ denote the Borel $\sigma$-algebra of the [metric space](/page/Metric%20Space) $(E,d)$. Let $\mathcal{P}_1(E)$ denote the set of Borel probability measures $\rho$ on $(E,\mathcal{B}(E))$ for which there exists $x_0\in E$ with \begin{align*} \int_E d(x,x_0)\,d\rho(x)<\infty. \end{align*} By the hypothesis $\nu \in \mathcal{P}_1(E)$ of the theorem, fix $x_0 \in E$ with \begin{align*} \int_E d(x,x_0)\,d\nu(x)<\infty. \end{align*} If $f:E \to \mathbb{R}$ is $1$-Lipschitz, then \begin{align*} |f(x)| \leq |f(x_0)|+d(x,x_0) \end{align*} for every $x \in E$, so $f \in L^1(E,\mathcal{B}(E),\nu)$. It suffices to prove that, for every $\theta \geq 0$, \begin{align*} \int_E \exp\left(\theta\left(f-\int_E f\,d\nu\right)\right)\,d\nu \leq \exp\left(\frac{C\theta^2}{2}\right). \end{align*} Once this is known, the upper tail estimate follows from a pointwise exponential bound. For $r\geq 0$, $\theta>0$ and \begin{align*} A_r:=\left\{x \in E : f(x)-\int_E f\,d\nu \geq r\right\}, \end{align*} we have, for every $x\in E$, the pointwise inequality \begin{align*} \mathbb{1}_{A_r}(x) \leq \exp\left(-\theta r\right)\exp\left(\theta\left(f(x)-\int_E f\,d\nu\right)\right). 
\end{align*} Integrating with respect to $\nu$ and applying the exponential moment bound gives \begin{align*} \nu(A_r) \leq \exp\left(-\theta r+\frac{C\theta^2}{2}\right). \end{align*} Since $C>0$ by hypothesis, for $r>0$ choose $\theta=r/C$ to obtain \begin{align*} \nu(A_r) \leq \exp\left(-\frac{r^2}{2C}\right). \end{align*} For $r=0$, the bound reads $\nu(A_0)\leq 1$, which is trivial. [guided] The concentration bound is obtained from a [Laplace transform](/page/Laplace%20Transform) estimate. First note that $f$ is integrable. Here $\mathcal{B}(E)$ is the Borel $\sigma$-algebra of $(E,d)$, and $\mathcal{P}_1(E)$ denotes the Borel probability measures with finite first moment. Since the theorem assumes $\nu\in\mathcal{P}_1(E)$, choose $x_0 \in E$ such that \begin{align*} \int_E d(x,x_0)\,d\nu(x)<\infty. \end{align*} The $1$-Lipschitz property gives \begin{align*} |f(x)-f(x_0)| \leq d(x,x_0), \end{align*} hence \begin{align*} |f(x)| \leq |f(x_0)|+d(x,x_0). \end{align*} The right-hand side is $\nu$-integrable, so $f \in L^1(E,\mathcal{B}(E),\nu)$. The key analytic target is the exponential moment estimate \begin{align*} \int_E \exp\left(\theta\left(f-\int_E f\,d\nu\right)\right)\,d\nu \leq \exp\left(\frac{C\theta^2}{2}\right) \end{align*} for every $\theta \geq 0$. Why is this the right target? Because the probability of the event $f-\int_E f\,d\nu \geq r$ can be bounded after exponentiating the defining inequality. Define \begin{align*} A_r:=\left\{x \in E : f(x)-\int_E f\,d\nu \geq r\right\}. \end{align*} For $x \in A_r$ and $\theta>0$, \begin{align*} \exp\left(\theta\left(f(x)-\int_E f\,d\nu\right)\right) \geq \exp(\theta r). \end{align*} Since the right-hand side of the next display is non-negative for every $x\in E$, it follows that, for all $x\in E$, \begin{align*} \mathbb{1}_{A_r}(x) \leq \exp(-\theta r)\exp\left(\theta\left(f(x)-\int_E f\,d\nu\right)\right). 
\end{align*} Integrating this pointwise inequality with respect to $\nu$ and then using the exponential moment estimate gives \begin{align*} \nu(A_r) \leq \exp(-\theta r)\int_E \exp\left(\theta\left(f-\int_E f\,d\nu\right)\right)\,d\nu \leq \exp\left(-\theta r+\frac{C\theta^2}{2}\right). \end{align*} Since $C>0$, the right-hand side is minimized over $\theta>0$ at $\theta=r/C$ when $r>0$, giving \begin{align*} \nu(A_r) \leq \exp\left(-\frac{r^2}{2C}\right). \end{align*} When $r=0$, the desired bound is just $\nu(A_0)\leq 1$. [/guided] [/step] [step:Derive the Laplace bound for bounded Lipschitz functions] Let $g:E \to \mathbb{R}$ be a bounded $1$-Lipschitz Borel function. Define its $\nu$-mean $m_g \in \mathbb{R}$ by \begin{align*} m_g:=\int_E g\,d\nu(x). \end{align*} For $\theta \geq 0$, define the normalizing factor $Z_g:[0,\infty)\to(0,\infty)$ and the logarithmic moment generating function $\psi_g:[0,\infty)\to\mathbb{R}$ by \begin{align*} Z_g(\theta):=\int_E \exp(\theta(g-m_g))\,d\nu(x) \end{align*} and \begin{align*} \psi_g(\theta):=\log Z_g(\theta). \end{align*} Since $g-m_g$ is bounded, differentiation under the integral sign gives that $\psi_g$ is differentiable on $(0,\infty)$ and has right derivative at $0$ given by \begin{align*} \psi_{g,+}'(0)=\int_E(g-m_g)\,d\nu(x)=0. \end{align*} Moreover $\psi_g$ is convex: for $0<\lambda<1$ and $\theta_1,\theta_2\geq0$, the integral form of Hölder's inequality with conjugate exponents $1/\lambda$ and $1/(1-\lambda)$ gives \begin{align*} Z_g(\lambda\theta_1+(1-\lambda)\theta_2)\leq Z_g(\theta_1)^\lambda Z_g(\theta_2)^{1-\lambda}, \end{align*} and the endpoint cases $\lambda=0$ and $\lambda=1$ hold with equality. Taking logarithms gives the convexity of $\psi_g$. Since $g-m_g$ is bounded, $Z_g$ is finite and positive on $[0,\infty)$, so $\psi_g$ is a finite convex function on the interval $[0,\infty)$. Fix $\theta>0$. 
Define the probability measure $\mu_\theta$ on $(E,\mathcal{B}(E))$ by \begin{align*} \frac{d\mu_\theta}{d\nu}(x):=\frac{\exp(\theta(g(x)-m_g))}{Z_g(\theta)}. \end{align*} Then $\mu_\theta \ll \nu$ and $\mu_\theta \in \mathcal{P}_1(E)$, since $\nu \in \mathcal{P}_1(E)$ and $d\mu_\theta/d\nu$ is bounded. The entropy is \begin{align*} H(\mu_\theta\mid\nu)=\theta\int_E(g-m_g)\,d\mu_\theta(x)-\psi_g(\theta). \end{align*} Also, \begin{align*} \psi_g'(\theta)=\int_E(g-m_g)\,d\mu_\theta(x). \end{align*} Set \begin{align*} a_\theta:=\psi_g'(\theta). \end{align*} Since $\psi_g$ is a finite convex function on $[0,\infty)$ and is differentiable at $\theta>0$, the monotonicity of secant slopes for convex functions implies that its right derivative at $0$ is at most its derivative at $\theta$. Thus \begin{align*} 0=\psi_{g,+}'(0)\leq \psi_g'(\theta)=a_\theta. \end{align*} We prove the needed $W_1$ bound directly. Let $\Pi(\mu_\theta,\nu)$ denote the set of couplings of $\mu_\theta$ and $\nu$. This set is non-empty because $\mu_\theta\otimes\nu \in \Pi(\mu_\theta,\nu)$. For any $\pi \in \Pi(\mu_\theta,\nu)$, boundedness of $g$ makes $g(x)-g(y)$ $\pi$-integrable, and the marginal identities give \begin{align*} \int_E g\,d\mu_\theta(x)-\int_E g\,d\nu(y)=\int_{E\times E}(g(x)-g(y))\,d\pi(x,y). \end{align*} Because $g$ is $1$-Lipschitz, \begin{align*} \int_{E\times E}(g(x)-g(y))\,d\pi(x,y)\leq \int_{E\times E}d(x,y)\,d\pi(x,y). \end{align*} Taking the infimum over $\pi \in \Pi(\mu_\theta,\nu)$ gives \begin{align*} a_\theta=\int_E g\,d\mu_\theta(x)-\int_E g\,d\nu(x) \leq W_1(\mu_\theta,\nu), \end{align*} where $W_1$ denotes the Wasserstein distance induced by the metric $d$ on $E$, as in the theorem statement. Applying the $T_1(C)$ inequality to $\mu_\theta$ is valid because $\mu_\theta \ll \nu$. Thus \begin{align*} a_\theta^2 \leq W_1(\mu_\theta,\nu)^2 \leq 2C H(\mu_\theta\mid\nu)=2C(\theta a_\theta-\psi_g(\theta)). 
\end{align*} Therefore \begin{align*} \psi_g(\theta) \leq \theta a_\theta-\frac{a_\theta^2}{2C}. \end{align*} For every real number $a$, \begin{align*} \theta a-\frac{a^2}{2C}=\frac{C\theta^2}{2}-\frac{(a-C\theta)^2}{2C}\leq \frac{C\theta^2}{2}. \end{align*} Using $a=a_\theta$, we obtain \begin{align*} \psi_g(\theta)\leq \frac{C\theta^2}{2}. \end{align*} Exponentiating gives \begin{align*} \int_E \exp\left(\theta(g-m_g)\right)\,d\nu(x) \leq \exp\left(\frac{C\theta^2}{2}\right) \end{align*} for every $\theta>0$, and the case $\theta=0$ is equality. [guided] We prove the Laplace estimate first for a bounded Lipschitz function because the exponential tilt is then automatically well behaved. Let $g:E\to\mathbb{R}$ be a bounded $1$-Lipschitz Borel function and define \begin{align*} m_g:=\int_E g\,d\nu(x). \end{align*} For $\theta\geq0$, set \begin{align*} Z_g(\theta):=\int_E \exp(\theta(g-m_g))\,d\nu(x) \end{align*} and \begin{align*} \psi_g(\theta):=\log Z_g(\theta). \end{align*} The boundedness of $g-m_g$ permits differentiation under the integral sign. Hence $\psi_g$ is differentiable for $\theta>0$, and its right derivative at $0$ is \begin{align*} \psi_{g,+}'(0)=\frac{\int_E(g-m_g)\,d\nu(x)}{Z_g(0)}=0, \end{align*} because $Z_g(0)=1$ and $m_g$ is the $\nu$-mean of $g$. We also need convexity of $\psi_g$. For $0<\lambda<1$ and $\theta_1,\theta_2\geq0$, write the integrand defining $Z_g(\lambda\theta_1+(1-\lambda)\theta_2)$ as the product of $\exp(\theta_1(g-m_g))^\lambda$ and $\exp(\theta_2(g-m_g))^{1-\lambda}$. The integral form of Hölder's inequality with conjugate exponents $1/\lambda$ and $1/(1-\lambda)$ gives \begin{align*} Z_g(\lambda\theta_1+(1-\lambda)\theta_2)\leq Z_g(\theta_1)^\lambda Z_g(\theta_2)^{1-\lambda}, \end{align*} with the endpoint cases $\lambda=0$ and $\lambda=1$ being identities. Taking logarithms proves that $\psi_g$ is convex. 
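The Hölder step above can be written out explicitly. The following is a sketch for $0<\lambda<1$; the shorthand $u$ and $v$ is introduced only for this illustration and is not part of the original proof.

```latex
% Illustrative shorthand (not from the proof):
%   u := \exp(\theta_1 (g - m_g)),   v := \exp(\theta_2 (g - m_g)).
% Then \exp((\lambda\theta_1 + (1-\lambda)\theta_2)(g - m_g)) = u^{\lambda} v^{1-\lambda}.
\begin{align*}
Z_g(\lambda\theta_1+(1-\lambda)\theta_2)
&=\int_E u^{\lambda}\,v^{1-\lambda}\,d\nu \\
&\leq \left(\int_E \left(u^{\lambda}\right)^{1/\lambda}\,d\nu\right)^{\lambda}
      \left(\int_E \left(v^{1-\lambda}\right)^{1/(1-\lambda)}\,d\nu\right)^{1-\lambda} \\
&= Z_g(\theta_1)^{\lambda}\,Z_g(\theta_2)^{1-\lambda}.
\end{align*}
% Taking logarithms:
%   \psi_g(\lambda\theta_1 + (1-\lambda)\theta_2)
%     \leq \lambda\psi_g(\theta_1) + (1-\lambda)\psi_g(\theta_2).
```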
Because $g-m_g$ is bounded, $Z_g(\theta)$ is finite and positive for every $\theta\geq0$, so $\psi_g$ is a finite convex function on $[0,\infty)$. For a fixed $\theta>0$, define the tilted probability measure $\mu_\theta$ by \begin{align*} \frac{d\mu_\theta}{d\nu}(x):=\frac{\exp(\theta(g(x)-m_g))}{Z_g(\theta)}. \end{align*} This is a probability measure because the denominator is exactly the integral of the numerator. It is absolutely continuous with respect to $\nu$, and its density is bounded because $g$ is bounded. Since $\nu\in\mathcal{P}_1(E)$, the bounded density also gives $\mu_\theta\in\mathcal{P}_1(E)$. The entropy can now be computed from the definition of relative entropy: \begin{align*} H(\mu_\theta\mid\nu)=\int_E \log\left(\frac{d\mu_\theta}{d\nu}\right)\,d\mu_\theta(x)=\theta\int_E(g-m_g)\,d\mu_\theta(x)-\psi_g(\theta). \end{align*} Differentiating $\psi_g$ at $\theta>0$ gives \begin{align*} \psi_g'(\theta)=\int_E(g-m_g)\,d\mu_\theta(x). \end{align*} Write this quantity as \begin{align*} a_\theta:=\psi_g'(\theta). \end{align*} The monotonicity of secant slopes for finite convex functions applies because $\psi_g$ is finite and convex on $[0,\infty)$ and differentiable at the chosen point $\theta>0$. It implies that the right derivative at $0$ is at most the derivative at $\theta$. Therefore \begin{align*} 0=\psi_{g,+}'(0)\leq\psi_g'(\theta)=a_\theta. \end{align*} The next step is the only place where transport enters. We do not need the full [Kantorovich-Rubinstein duality theorem](/theorems/6779); the one-sided bound follows from the definition of $W_1$. Let $\Pi(\mu_\theta,\nu)$ be the set of couplings of $\mu_\theta$ and $\nu$. It is non-empty, for instance because $\mu_\theta\otimes\nu$ is a coupling. If $\pi\in\Pi(\mu_\theta,\nu)$, then the marginal identities give \begin{align*} \int_E g\,d\mu_\theta(x)-\int_E g\,d\nu(y)=\int_{E\times E}(g(x)-g(y))\,d\pi(x,y). \end{align*} Since $g$ is $1$-Lipschitz, $g(x)-g(y)\leq d(x,y)$ for all $x,y\in E$. 
Therefore \begin{align*} \int_E g\,d\mu_\theta(x)-\int_E g\,d\nu(y)\leq \int_{E\times E}d(x,y)\,d\pi(x,y). \end{align*} Taking the infimum over all couplings gives \begin{align*} a_\theta\leq W_1(\mu_\theta,\nu), \end{align*} where $W_1$ is the Wasserstein distance induced by the metric $d$ on $E$. This notation matches the $W_1$ appearing in the theorem statement. The $T_1(C)$ inequality applies to $\mu_\theta$ because $\mu_\theta\ll\nu$. Combining the previous estimate with $T_1(C)$ gives \begin{align*} a_\theta^2\leq W_1(\mu_\theta,\nu)^2\leq2C H(\mu_\theta\mid\nu)=2C(\theta a_\theta-\psi_g(\theta)). \end{align*} Rearranging yields \begin{align*} \psi_g(\theta)\leq \theta a_\theta-\frac{a_\theta^2}{2C}. \end{align*} Finally, complete the square: \begin{align*} \theta a_\theta-\frac{a_\theta^2}{2C}=\frac{C\theta^2}{2}-\frac{(a_\theta-C\theta)^2}{2C}\leq\frac{C\theta^2}{2}. \end{align*} Thus \begin{align*} \psi_g(\theta)\leq\frac{C\theta^2}{2}. \end{align*} Exponentiating this inequality gives \begin{align*} \int_E\exp\left(\theta(g-m_g)\right)\,d\nu(x)\leq\exp\left(\frac{C\theta^2}{2}\right) \end{align*} for every $\theta>0$, and for $\theta=0$ both sides equal $1$. [/guided] [/step] [step:Pass from bounded Lipschitz functions to arbitrary Lipschitz functions] Let $f:E\to\mathbb{R}$ be $1$-Lipschitz. For each $k \in \mathbb{N}$, define the truncation map $\tau_k:\mathbb{R}\to[-k,k]$ by \begin{align*} \tau_k(t):=\min\{k,\max\{-k,t\}\}. \end{align*} The map $\tau_k$ is $1$-Lipschitz, so the function $f_k:E\to\mathbb{R}$ defined by \begin{align*} f_k:=\tau_k\circ f \end{align*} is bounded and $1$-Lipschitz. By the preceding step, \begin{align*} \int_E \exp\left(\theta\left(f_k-\int_E f_k\,d\nu\right)\right)\,d\nu \leq \exp\left(\frac{C\theta^2}{2}\right) \end{align*} for every $\theta\geq 0$. Since $f_k(x)\to f(x)$ for every $x\in E$, $|f_k|\leq |f|$, and $f\in L^1(E,\mathcal{B}(E),\nu)$, the dominated convergence theorem gives \begin{align*} \int_E f_k\,d\nu \to \int_E f\,d\nu. 
\end{align*} Hence \begin{align*} \exp\left(\theta\left(f_k-\int_E f_k\,d\nu\right)\right)\to \exp\left(\theta\left(f-\int_E f\,d\nu\right)\right) \end{align*} pointwise on $E$. Fatou's lemma applies to the non-negative [measurable functions](/page/Measurable%20Functions) \begin{align*} x\mapsto \exp\left(\theta\left(f_k(x)-\int_E f_k\,d\nu\right)\right). \end{align*} It gives \begin{align*} \int_E \exp\left(\theta\left(f-\int_E f\,d\nu\right)\right)\,d\nu \leq \liminf_{k\to\infty}\int_E \exp\left(\theta\left(f_k-\int_E f_k\,d\nu\right)\right)\,d\nu \leq \exp\left(\frac{C\theta^2}{2}\right). \end{align*} This proves the exponential moment bound for $f$. [/step] [step:Apply the same argument on the product metric space] Fix $n\in\mathbb{N}$. Define the product metric $d_{2,n}:E^n\times E^n\to[0,\infty)$ by \begin{align*} d_{2,n}(x,y):=\left(\sum_{i=1}^n d(x_i,y_i)^2\right)^{1/2} \end{align*} for $x=(x_1,\dots,x_n)\in E^n$ and $y=(y_1,\dots,y_n)\in E^n$. Assume that $\nu^{\otimes n}$ satisfies the $T_1(C)$ inequality on $(E^n,d_{2,n})$. Let $F:E^n\to\mathbb{R}$ be $1$-Lipschitz with respect to $d_{2,n}$. Choose $x_0\in E$ such that \begin{align*} \int_E d(x,x_0)\,d\nu(x)<\infty. \end{align*} Let $x_{0,n}:=(x_0,\dots,x_0)\in E^n$. Since \begin{align*} d_{2,n}(x,x_{0,n})\leq\sum_{i=1}^n d(x_i,x_0) \end{align*} for $x=(x_1,\dots,x_n)\in E^n$, Tonelli's theorem applied to the non-negative coordinate functions $x\mapsto d(x_i,x_0)$ gives \begin{align*} \int_{E^n}d_{2,n}(x,x_{0,n})\,d\nu^{\otimes n}(x)\leq\sum_{i=1}^n\int_E d(y,x_0)\,d\nu(y)<\infty. \end{align*} Let $\mathcal{P}_1(E^n,d_{2,n})$ denote the set of Borel probability measures on $E^n$ with finite first moment with respect to the metric $d_{2,n}$. The preceding estimate proves $\nu^{\otimes n}\in\mathcal{P}_1(E^n,d_{2,n})$. 
Applying the result already proved to the metric probability space \begin{align*} (E^n,d_{2,n},\nu^{\otimes n}) \end{align*} gives \begin{align*} \nu^{\otimes n}\bigl(\{x \in E^n : F(x)-\int_{E^n}F\,d\nu^{\otimes n}\geq r\}\bigr)\leq \exp\left(-\frac{r^2}{2C}\right) \end{align*} for every $r\geq 0$. This is exactly the asserted tensorized concentration estimate. [/step]
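As a sanity check on the constants, the argument can be traced in a case where every quantity is explicit. The sketch below assumes, as a standard fact that is not taken from this page, that the standard Gaussian measure $\gamma$ on $(\mathbb{R},|\cdot|)$ satisfies $T_1(1)$ (a consequence of Talagrand's transport inequality together with $W_1\leq W_2$). It uses the $1$-Lipschitz function $f(x)=x$, which is unbounded and therefore falls under the truncation step rather than the bounded step. The tilt is still well defined, and each inequality used in the Laplace bound turns out to be an equality.

```latex
% Illustrative assumptions (not from the proof above):
%   E = \mathbb{R},  d(x,y) = |x-y|,  \nu = \gamma = N(0,1),  C = 1,  f(x) = x,  \theta > 0.
\begin{align*}
m_f &= \int_{\mathbb{R}} x\,d\gamma(x) = 0, \\
Z_f(\theta) &= \int_{\mathbb{R}} e^{\theta x}\,d\gamma(x) = e^{\theta^2/2},
\qquad \psi_f(\theta) = \frac{\theta^2}{2} = \frac{C\theta^2}{2}, \\
\mu_\theta &= N(\theta,1),
\qquad a_\theta = \psi_f'(\theta) = \theta, \\
H(\mu_\theta\mid\gamma) &= \theta a_\theta - \psi_f(\theta) = \frac{\theta^2}{2}, \\
W_1(\mu_\theta,\gamma) &= \theta
\quad\text{(translation coupling for $\leq$, difference of means for $\geq$)}, \\
a_\theta^2 &= W_1(\mu_\theta,\gamma)^2 = 2C\,H(\mu_\theta\mid\gamma), \\
\gamma\bigl(\{x\in\mathbb{R} : x \geq r\}\bigr) &\leq \exp\left(-\frac{r^2}{2}\right)
\quad\text{for every } r \geq 0.
\end{align*}
```

In this example $\psi_f(\theta)$ equals $C\theta^2/2$ exactly, so the exponential moment bound cannot be improved in general, and the resulting tail estimate is the familiar Gaussian one.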
