Transport-Entropy Concentration Principle — Statement & Proof

Transport-Entropy Concentration Principle (Theorem # 6808)

Theorem

Proof

[proofplan] First we show that every $1$-Lipschitz function is $\mu$-integrable, using the finite first moment assumption. We then prove the key exponential moment estimate for bounded $1$-Lipschitz functions by applying $T_1(C)$ to the exponentially tilted probability measure and using the elementary coupling bound behind the Kantorovich-Rubinstein dual inequality. A truncation argument extends the exponential estimate to arbitrary $1$-Lipschitz functions. Finally, Markov's inequality and optimization in the Laplace parameter give the Gaussian concentration bound. [/proofplan]

[step:Use the finite first moment to prove integrability of Lipschitz functions] Fix a point $x_0 \in X$ and define the distance function $\rho_{x_0}:X \to [0,\infty)$ by \begin{align*} \rho_{x_0}(x) := d(x,x_0). \end{align*} Since $\mu \in \mathcal{P}_1(X)$, the function $\rho_{x_0}$ belongs to $L^1(X,\mu)$. Let \begin{align*} f:X &\to \mathbb{R} \end{align*} be $1$-Lipschitz. For every $x \in X$, the Lipschitz condition gives \begin{align*} |f(x)| \leq |f(x_0)| + |f(x)-f(x_0)| \leq |f(x_0)| + d(x,x_0). \end{align*} The right-hand side is $\mu$-integrable, because $\mu(X)=1$ and $\rho_{x_0} \in L^1(X,\mu)$. Hence $f$, which is continuous and therefore Borel measurable, belongs to $L^1(X,\mu)$. [/step]

[step:Derive the exponential moment estimate for bounded Lipschitz functions] Let \begin{align*} g:X &\to \mathbb{R} \end{align*} be a bounded $1$-Lipschitz Borel function, and define its $\mu$-mean by \begin{align*} m_g := \int_X g\,d\mu(x). \end{align*} Fix $\lambda \geq 0$. Define the normalizing constant \begin{align*} Z_\lambda := \int_X e^{\lambda g}\,d\mu(x), \end{align*} and define the exponentially tilted probability measure $\nu_\lambda$ on the Borel $\sigma$-algebra of $X$ by \begin{align*} \nu_\lambda(A) := \frac{1}{Z_\lambda}\int_A e^{\lambda g}\,d\mu(x) \end{align*} for every Borel set $A \subset X$. Since $g$ is bounded, $0 < Z_\lambda < \infty$, and $\nu_\lambda \ll \mu$.
Moreover, if $M_\lambda := \exp(\lambda\|g\|_\infty)/Z_\lambda$, then $d\nu_\lambda/d\mu \leq M_\lambda$, so \begin{align*} \int_X d(x,x_0)\,d\nu_\lambda(x) \leq M_\lambda \int_X d(x,x_0)\,d\mu(x) < \infty. \end{align*} Thus $\nu_\lambda \in \mathcal{P}_1(X)$, and the hypothesis $T_1(C)$ applies to $\nu_\lambda$. Its relative entropy is \begin{align*} H(\nu_\lambda \mid \mu) = \int_X \log\left(\frac{d\nu_\lambda}{d\mu}\right)\,d\nu_\lambda(x) = \lambda \int_X g\,d\nu_\lambda(x) - \log Z_\lambda. \end{align*}

We first record the coupling estimate used for $W_1$. For every coupling $\pi$ of $\nu_\lambda$ and $\mu$, the function $g$ being $1$-Lipschitz gives \begin{align*} g(x)-g(y) \leq d(x,y) \end{align*} for every $(x,y) \in X \times X$. Integrating with respect to $\pi$ yields \begin{align*} \int_X g\,d\nu_\lambda(x)-\int_X g\,d\mu(y) = \int_{X \times X} (g(x)-g(y))\,d\pi(x,y) \leq \int_{X \times X} d(x,y)\,d\pi(x,y). \end{align*} Applying the same argument to the $1$-Lipschitz function $-g:X\to\mathbb{R}$ gives \begin{align*} \int_X g\,d\mu(y)-\int_X g\,d\nu_\lambda(x) \leq \int_{X \times X} d(x,y)\,d\pi(x,y). \end{align*} Taking the infimum over all couplings $\pi$ in both inequalities gives \begin{align*} \left|\int_X g\,d\nu_\lambda(x)-m_g\right| \leq W_1(\nu_\lambda,\mu). \end{align*}

Set \begin{align*} a_\lambda := \int_X g\,d\nu_\lambda(x)-m_g. \end{align*} By the previous inequality and $T_1(C)$, \begin{align*} |a_\lambda| \leq \sqrt{2C H(\nu_\lambda \mid \mu)}. \end{align*} Since $H(\nu_\lambda \mid \mu)=\lambda a_\lambda+\lambda m_g-\log Z_\lambda$, this gives \begin{align*} a_\lambda^2 \leq 2C\left(\lambda a_\lambda+\lambda m_g-\log Z_\lambda\right). \end{align*} Therefore \begin{align*} \log Z_\lambda-\lambda m_g \leq \lambda a_\lambda-\frac{a_\lambda^2}{2C}. \end{align*} For every real number $a$, completing the square gives \begin{align*} \lambda a-\frac{a^2}{2C} \leq \frac{C\lambda^2}{2}. \end{align*} Applying this with $a=a_\lambda$ gives \begin{align*} \log \int_X e^{\lambda(g-m_g)}\,d\mu(x) = \log Z_\lambda-\lambda m_g \leq \frac{C\lambda^2}{2}. \end{align*} Exponentiating, \begin{align*} \int_X e^{\lambda(g-m_g)}\,d\mu(x) \leq \exp\left(\frac{C\lambda^2}{2}\right). \end{align*}

[guided] We want to turn the transport-[entropy inequality](/theorems/6729) into an exponential integrability statement. The standard way to do this is to test $T_1(C)$ on a probability measure that favours large values of the function.

Let \begin{align*} g:X &\to \mathbb{R} \end{align*} be bounded, Borel, and $1$-Lipschitz, and let \begin{align*} m_g := \int_X g\,d\mu(x). \end{align*} For a fixed $\lambda \geq 0$, define \begin{align*} Z_\lambda := \int_X e^{\lambda g}\,d\mu(x). \end{align*} Because $g$ is bounded, the integrand $e^{\lambda g}$ is bounded above and below by positive constants, so $0<Z_\lambda<\infty$. We may therefore define a probability measure $\nu_\lambda$ by \begin{align*} \nu_\lambda(A) := \frac{1}{Z_\lambda}\int_A e^{\lambda g}\,d\mu(x) \end{align*} for each Borel set $A \subset X$. This measure satisfies $\nu_\lambda \ll \mu$.

To check the other hypothesis in $T_1(C)$, define $M_\lambda := \exp(\lambda\|g\|_\infty)/Z_\lambda$. Since $d\nu_\lambda/d\mu \leq M_\lambda$ and $\mu\in\mathcal P_1(X)$, \begin{align*} \int_X d(x,x_0)\,d\nu_\lambda(x) \leq M_\lambda\int_X d(x,x_0)\,d\mu(x) < \infty. \end{align*} Thus $\nu_\lambda\in\mathcal P_1(X)$, so the hypothesis $T_1(C)$ applies to $\nu_\lambda$.

The density of $\nu_\lambda$ with respect to $\mu$ is \begin{align*} \frac{d\nu_\lambda}{d\mu}(x)=\frac{e^{\lambda g(x)}}{Z_\lambda}. \end{align*} Thus \begin{align*} \log\left(\frac{d\nu_\lambda}{d\mu}(x)\right)=\lambda g(x)-\log Z_\lambda. \end{align*} Integrating this identity with respect to $\nu_\lambda$ gives \begin{align*} H(\nu_\lambda \mid \mu) = \lambda \int_X g\,d\nu_\lambda(x)-\log Z_\lambda. \end{align*}

Next we connect the mean shift of $g$ under $\nu_\lambda$ to $W_1(\nu_\lambda,\mu)$. Let $\pi$ be any coupling of $\nu_\lambda$ and $\mu$, meaning that $\pi$ is a probability measure on $X \times X$ whose first marginal is $\nu_\lambda$ and whose second marginal is $\mu$. Since $g$ is $1$-Lipschitz, \begin{align*} g(x)-g(y) \leq |g(x)-g(y)| \leq d(x,y) \end{align*} for all $x,y \in X$. Therefore \begin{align*} \int_X g\,d\nu_\lambda(x)-\int_X g\,d\mu(y) = \int_{X \times X} (g(x)-g(y))\,d\pi(x,y) \leq \int_{X \times X} d(x,y)\,d\pi(x,y). \end{align*} Applying the same estimate to the $1$-Lipschitz function $-g:X\to\mathbb{R}$ gives the reverse mean difference bound, and taking the infimum over all couplings $\pi$ in both inequalities gives \begin{align*} \left|\int_X g\,d\nu_\lambda(x)-m_g\right| \leq W_1(\nu_\lambda,\mu). \end{align*}

Define the mean displacement \begin{align*} a_\lambda := \int_X g\,d\nu_\lambda(x)-m_g. \end{align*} The transport-entropy hypothesis now gives \begin{align*} |a_\lambda| \leq W_1(\nu_\lambda,\mu) \leq \sqrt{2C H(\nu_\lambda \mid \mu)}. \end{align*} Using the entropy identity and rewriting $\int_X g\,d\nu_\lambda(x)=a_\lambda+m_g$, we obtain \begin{align*} H(\nu_\lambda \mid \mu) = \lambda(a_\lambda+m_g)-\log Z_\lambda. \end{align*} Thus \begin{align*} a_\lambda^2 \leq 2C\left(\lambda a_\lambda+\lambda m_g-\log Z_\lambda\right). \end{align*} Solving this inequality for $\log Z_\lambda-\lambda m_g$ gives \begin{align*} \log Z_\lambda-\lambda m_g \leq \lambda a_\lambda-\frac{a_\lambda^2}{2C}. \end{align*}

The right-hand side is a quadratic function of $a_\lambda$. Its maximum over $a \in \mathbb{R}$ is found by completing the square: \begin{align*} \lambda a-\frac{a^2}{2C} = \frac{C\lambda^2}{2}-\frac{(a-C\lambda)^2}{2C} \leq \frac{C\lambda^2}{2}. \end{align*} Therefore \begin{align*} \log Z_\lambda-\lambda m_g \leq \frac{C\lambda^2}{2}. \end{align*} Since \begin{align*} \int_X e^{\lambda(g-m_g)}\,d\mu(x)=e^{-\lambda m_g}Z_\lambda, \end{align*} we conclude that \begin{align*} \int_X e^{\lambda(g-m_g)}\,d\mu(x) \leq \exp\left(\frac{C\lambda^2}{2}\right). \end{align*} [/guided] [/step]

[step:Pass from bounded truncations to an arbitrary Lipschitz function] Let \begin{align*} f:X &\to \mathbb{R} \end{align*} be $1$-Lipschitz. For each integer $k \geq 1$, define the truncation map \begin{align*} \tau_k:\mathbb{R} &\to [-k,k] \end{align*} by \begin{align*} \tau_k(t) := \max\{-k,\min\{t,k\}\}, \end{align*} and define \begin{align*} f_k:X &\to [-k,k] \end{align*} by \begin{align*} f_k(x) := \tau_k(f(x)). \end{align*} The map $\tau_k$ is $1$-Lipschitz on $\mathbb{R}$, so $f_k$ is a bounded $1$-Lipschitz Borel function on $X$.

Let \begin{align*} m_k := \int_X f_k\,d\mu(x), \qquad m := \int_X f\,d\mu(x). \end{align*} Since $f \in L^1(X,\mu)$ and $f_k \to f$ pointwise with $|f_k| \leq |f|$, the [dominated convergence theorem](/theorems/4) gives $m_k \to m$. Applying the exponential moment estimate to $f_k$ gives, for every $\lambda \geq 0$, \begin{align*} \int_X e^{\lambda(f_k-m_k)}\,d\mu(x) \leq \exp\left(\frac{C\lambda^2}{2}\right). \end{align*} Since $f_k-m_k \to f-m$ pointwise, Fatou's lemma applied to the non-negative functions $e^{\lambda(f_k-m_k)}$ gives \begin{align*} \int_X e^{\lambda(f-m)}\,d\mu(x) \leq \liminf_{k \to \infty}\int_X e^{\lambda(f_k-m_k)}\,d\mu(x) \leq \exp\left(\frac{C\lambda^2}{2}\right). \end{align*} Thus every $1$-Lipschitz $f$ satisfies the exponential moment bound \begin{align*} \int_X e^{\lambda\left(f-\int_X f\,d\mu\right)}\,d\mu(x) \leq \exp\left(\frac{C\lambda^2}{2}\right) \end{align*} for every $\lambda \geq 0$. [/step]

[step:Apply Markov's inequality and optimize the Laplace parameter] Let $f:X \to \mathbb{R}$ be $1$-Lipschitz, and set \begin{align*} m := \int_X f\,d\mu(x). \end{align*} Fix $r \geq 0$ and $\lambda > 0$.
Define the non-negative measurable function \begin{align*} Y_\lambda:X &\to [0,\infty) \end{align*} by \begin{align*} Y_\lambda(x) := e^{\lambda(f(x)-m)}. \end{align*} The event $\{x \in X : f(x)-m \geq r\}$ is contained in $\{x \in X : Y_\lambda(x) \geq e^{\lambda r}\}$. Markov's inequality gives \begin{align*} \mu\left(\left\{x \in X : f(x)-m \geq r\right\}\right) \leq e^{-\lambda r}\int_X e^{\lambda(f-m)}\,d\mu(x). \end{align*} Using the exponential moment estimate, \begin{align*} \mu\left(\left\{x \in X : f(x)-m \geq r\right\}\right) \leq \exp\left(-\lambda r+\frac{C\lambda^2}{2}\right). \end{align*}

For $r>0$, choose $\lambda=r/C$, which is positive because $C>0$. Then \begin{align*} -\lambda r+\frac{C\lambda^2}{2} = -\frac{r^2}{C}+\frac{r^2}{2C} = -\frac{r^2}{2C}. \end{align*} Hence \begin{align*} \mu\left(\left\{x \in X : f(x)-m \geq r\right\}\right) \leq \exp\left(-\frac{r^2}{2C}\right) \end{align*} for every $r>0$.

When $r=0$, the desired inequality reads \begin{align*} \mu\left(\left\{x \in X : f(x)-m \geq 0\right\}\right) \leq 1, \end{align*} which holds because $\mu$ is a probability measure. Combining the cases $r>0$ and $r=0$ proves the stated concentration inequality for all $r \geq 0$. [/step]
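
As a numerical sanity check of the two bounds proved above, the following sketch evaluates them in closed form for one concrete case. It is an illustration under stated assumptions, not part of the proof. The assumptions are: $X=\mathbb{R}$ with the usual distance, $\mu$ the standard Gaussian, and $C=1$ in the normalization $W_1(\nu,\mu) \leq \sqrt{2C\,H(\nu \mid \mu)}$ read off from the proof (this case follows from Talagrand's inequality together with $W_1 \leq W_2$). The test function is $f(x)=|x|$, which is $1$-Lipschitz with mean $\sqrt{2/\pi}$. All function names are hypothetical.

```python
import math

# Assumption: the standard Gaussian on R satisfies T_1(C) with C = 1 in the
# normalization W_1 <= sqrt(2 * C * H) used in the proof above.
C = 1.0

# Mean of the 1-Lipschitz function f(x) = |x| under N(0, 1).
MEAN_ABS = math.sqrt(2.0 / math.pi)


def std_normal_cdf(t: float) -> float:
    """Distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def laplace_transform_abs(lam: float) -> float:
    """E exp(lam * (|X| - m)), using E exp(lam * |X|) = 2 * exp(lam^2 / 2) * Phi(lam)."""
    return 2.0 * std_normal_cdf(lam) * math.exp(0.5 * lam * lam - lam * MEAN_ABS)


def upper_tail_abs(r: float) -> float:
    """P(|X| - m >= r) for r >= 0, which equals 2 * (1 - Phi(m + r))."""
    return math.erfc((MEAN_ABS + r) / math.sqrt(2.0))


def bounds_hold(grid, rel_tol: float = 1e-12) -> bool:
    """Check the exponential moment bound and the tail bound at each grid point."""
    for t in grid:
        # Exponential moment estimate: E exp(lam * (f - m)) <= exp(C * lam^2 / 2).
        if laplace_transform_abs(t) > math.exp(C * t * t / 2.0) * (1.0 + rel_tol):
            return False
        # Concentration bound: mu(f - m >= r) <= exp(-r^2 / (2 * C)).
        if upper_tail_abs(t) > math.exp(-t * t / (2.0 * C)):
            return False
    return True


if __name__ == "__main__":
    print(bounds_hold([0.0, 0.1, 0.5, 1.0, 2.0, 4.0]))
```

The same grid is used for the Laplace parameter $\lambda$ and the deviation $r$ only for brevity. The small relative tolerance guards against rounding at $\lambda = 0$, where the exponential moment bound holds with equality.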
