Concentration from the $T_2$ Transportation Inequality (Theorem #9597)

Theorem



Proof

[proofplan]
We first observe that the assumed $T_2(C)$ inequality implies the corresponding $T_1(C)$ inequality, because $W_1\le W_2$ on probability measures with finite second transport cost. The Bobkov-Götze characterization of $T_1(C)$ then gives a sub-Gaussian Laplace transform estimate for every Lipschitz function centered by its $\rho$-mean. Applying Markov's inequality to the exponential of the centered function gives the upper-tail estimate after optimizing the exponential parameter. The lower-tail estimate follows by applying the same argument to $-f$.
[/proofplan]

[step:Derive the $T_1(C)$ transportation inequality from the assumed $T_2(C)$ inequality]
Let $\nu\in\mathcal P(X)$. The set $\Pi(\nu,\rho)$ and the extended distances $W_1(\nu,\rho)$ and $W_2(\nu,\rho)$ are defined in the theorem statement. If every $\pi\in\Pi(\nu,\rho)$ satisfies
\begin{align*}
\int_{X\times X} d(x,y)^2\,d\pi(x,y)=\infty,
\end{align*}
then $W_2(\nu,\rho)=\infty$, and the inequality $W_1(\nu,\rho)\le W_2(\nu,\rho)$ holds in the extended sense. Otherwise, let $\pi\in\Pi(\nu,\rho)$ be any coupling such that
\begin{align*}
\int_{X\times X} d(x,y)^2\,d\pi(x,y)<\infty.
\end{align*}
Let $u:X\times X\to[0,\infty)$ be the Borel function $u(x,y):=d(x,y)$. Then $u\in L^2(X\times X,\mathcal B(X\times X),\pi)$, and since $\pi$ is a probability measure, the Cauchy-Schwarz inequality in this $L^2$ space, applied to $u$ and the constant function $1$, gives
\begin{align*}
\int_{X\times X} d(x,y)\,d\pi(x,y) \le \left(\int_{X\times X} d(x,y)^2\,d\pi(x,y)\right)^{1/2}.
\end{align*}
The left-hand side is bounded below by $W_1(\nu,\rho)$. Taking the infimum over all $\pi\in\Pi(\nu,\rho)$ with finite quadratic cost, which are the only couplings that affect the infimum defining $W_2(\nu,\rho)$ in this case, gives
\begin{align*}
W_1(\nu,\rho)\le W_2(\nu,\rho).
\end{align*}
Therefore the hypothesis implies, for every $\nu\in\mathcal P(X)$,
\begin{align*}
W_1(\nu,\rho)^2\le W_2(\nu,\rho)^2\le 2C\,H(\nu\mid\rho).
\end{align*}
Thus $\rho$ satisfies $T_1(C)$ with the same normalization.
[/step]

[step:Apply the Bobkov-Götze criterion to the centered Lipschitz function]
Define the real number
\begin{align*}
m:=\int_X f\,d\rho,
\end{align*}
which is finite because $f\in L^1(X,\mathcal B(X),\rho)$. Define the centered function $g:X\to\mathbb R$ by
\begin{align*}
g(x):=f(x)-m.
\end{align*}
Since subtracting a constant does not change Lipschitz constants, $g$ is $L$-Lipschitz. Since $f\in L^1(X,\mathcal B(X),\rho)$ and $\rho(X)=1$, we have
\begin{align*}
\int_X g\,d\rho=0.
\end{align*}
We use the Bobkov-Götze characterization of the $T_1(C)$ transportation inequality in the normalization
\begin{align*}
W_1(\nu,\rho)^2\le 2C\,H(\nu\mid\rho).
\end{align*}
This characterization states that, for every real-valued integrable $L$-Lipschitz function $h:X\to\mathbb R$ with $\int_X h\,d\rho=0$, and every $\lambda\ge 0$,
\begin{align*}
\int_X \exp(\lambda h)\,d\rho \le \exp\left(\frac{C\lambda^2L^2}{2}\right).
\end{align*}
We invoke it in this stated normalization, including its standard integrable Lipschitz formulation, which is obtained from the bounded Lipschitz formulation by applying the estimate to the truncations $((-M)\vee h)\wedge M$ and passing to the limit using monotone convergence for the positive and negative exponential tails. Applying it to $h=g$ gives, for every $\lambda\ge 0$,
\begin{align*}
\int_X \exp(\lambda g)\,d\rho \le \exp\left(\frac{C\lambda^2L^2}{2}\right).
\end{align*}
[guided]
The role of this step is to convert the transportation inequality into a bound on exponential moments. We have already shown that $\rho$ satisfies $T_1(C)$ in the normalization
\begin{align*}
W_1(\nu,\rho)^2\le 2C\,H(\nu\mid\rho).
\end{align*}
The Bobkov-Götze characterization says that this is equivalent to the following Laplace transform estimate: if $h:X\to\mathbb R$ is an integrable $L$-Lipschitz function and satisfies
\begin{align*}
\int_X h\,d\rho=0,
\end{align*}
then, for every $\lambda\ge 0$,
\begin{align*}
\int_X \exp(\lambda h)\,d\rho \le \exp\left(\frac{C\lambda^2L^2}{2}\right).
\end{align*}
We use the Bobkov-Götze characterization as an external theorem in exactly this constant convention, in its standard extension from bounded Lipschitz functions to integrable Lipschitz functions by truncation and monotone convergence. Now define
\begin{align*}
m:=\int_X f\,d\rho.
\end{align*}
This number is finite because $f\in L^1(X,\mathcal B(X),\rho)$. Define $g:X\to\mathbb R$ by
\begin{align*}
g(x):=f(x)-m.
\end{align*}
The function $g$ is still $L$-Lipschitz, since for all $x,y\in X$,
\begin{align*}
|g(x)-g(y)|=|f(x)-f(y)|\le L\,d(x,y).
\end{align*}
It is also centered:
\begin{align*}
\int_X g\,d\rho = \int_X f\,d\rho-\int_X m\,d\rho = m-m\rho(X) = 0,
\end{align*}
because $\rho(X)=1$. Therefore the Bobkov-Götze criterion applies to $g$ and gives, for every $\lambda\ge 0$,
\begin{align*}
\int_X \exp(\lambda g)\,d\rho \le \exp\left(\frac{C\lambda^2L^2}{2}\right).
\end{align*}
[/guided]
[/step]

[step:Use Markov's inequality to convert the Laplace bound into an upper-tail bound]
Fix $r\ge 0$ and $\lambda>0$. Define the nonnegative Borel function $Y_\lambda:X\to[0,\infty)$ by
\begin{align*}
Y_\lambda(x):=\exp(\lambda g(x)).
\end{align*}
On the set $\{x\in X:g(x)\ge r\}$, one has $Y_\lambda(x)\ge \exp(\lambda r)$. Markov's inequality for the nonnegative function $Y_\lambda$ gives
\begin{align*}
\rho(\{x\in X:g(x)\ge r\}) \le \exp(-\lambda r)\int_X \exp(\lambda g)\,d\rho.
\end{align*}
Using the Laplace estimate from the previous step,
\begin{align*}
\rho(\{x\in X:g(x)\ge r\}) \le \exp\left(-\lambda r+\frac{C\lambda^2L^2}{2}\right).
\end{align*}
[/step]

[step:Optimize the exponential parameter]
If $r=0$, then
\begin{align*}
\rho(\{x\in X:g(x)\ge 0\})\le 1=\exp(0),
\end{align*}
which is the desired estimate for $r=0$. Assume now that $r>0$. Define
\begin{align*}
\lambda_*:=\frac{r}{CL^2}.
\end{align*}
Since $C>0$, $L>0$, and $r>0$, we have $\lambda_*>0$, so the preceding bound applies with $\lambda=\lambda_*$. Substituting this value gives
\begin{align*}
-\lambda_* r+\frac{C\lambda_*^2L^2}{2} = -\frac{r^2}{CL^2}+\frac{r^2}{2CL^2} = -\frac{r^2}{2CL^2}.
\end{align*}
Therefore
\begin{align*}
\rho(\{x\in X:g(x)\ge r\}) \le \exp\left(-\frac{r^2}{2CL^2}\right).
\end{align*}
Since $g(x)=f(x)-m$ and $m=\int_X f\,d\rho$, this is precisely
\begin{align*}
\rho\left(\left\{x\in X:f(x)-\int_X f\,d\rho\ge r\right\}\right) \le \exp\left(-\frac{r^2}{2CL^2}\right).
\end{align*}
[/step]

[step:Apply the upper-tail estimate to $-f$ to obtain the lower-tail estimate]
Define the function $\tilde f:X\to\mathbb R$ by
\begin{align*}
\tilde f(x):=-f(x).
\end{align*}
Then $\tilde f$ is $L$-Lipschitz because, for all $x,y\in X$,
\begin{align*}
|\tilde f(x)-\tilde f(y)|=|f(x)-f(y)|\le L\,d(x,y),
\end{align*}
and $\tilde f\in L^1(X,\mathcal B(X),\rho)$. Applying the upper-tail estimate already proved to $\tilde f$ gives, for every $r\ge 0$,
\begin{align*}
\rho\left(\left\{x\in X:\tilde f(x)-\int_X \tilde f\,d\rho\ge r\right\}\right) \le \exp\left(-\frac{r^2}{2CL^2}\right).
\end{align*}
Since
\begin{align*}
\int_X \tilde f\,d\rho=-\int_X f\,d\rho,
\end{align*}
the event in the last display is exactly
\begin{align*}
\left\{x\in X:\int_X f\,d\rho-f(x)\ge r\right\}.
\end{align*}
Thus, for every $r\ge 0$,
\begin{align*}
\rho\left(\left\{x\in X:\int_X f\,d\rho-f(x)\ge r\right\}\right) \le \exp\left(-\frac{r^2}{2CL^2}\right).
\end{align*}
This proves both the upper-tail and lower-tail estimates.
[/step]
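
As a numerical sanity check of the optimization over $\lambda$ and of the final tail bound, the following Python sketch may be helpful. It is an illustration only and not part of the proof. It assumes a concrete example that does not appear above: the standard Gaussian measure on $\mathbb R$, which satisfies $T_2(1)$ in the normalization $W_2^2\le 2C\,H$ by Talagrand's inequality, together with the $1$-Lipschitz function $f(x)=x$, whose mean is $0$. The function names and the sample values of $C$, $L$, and $r$ are hypothetical.

```python
import math


def chernoff_exponent(lam, r, C, L):
    """Exponent -lam*r + C*lam^2*L^2/2 produced by the Markov step."""
    return -lam * r + 0.5 * C * lam**2 * L**2


def tail_bound(r, C, L):
    """Concentration bound exp(-r^2 / (2*C*L^2)) from the proof."""
    return math.exp(-(r**2) / (2.0 * C * L**2))


def gaussian_upper_tail(r):
    """Exact P(Z >= r) for a standard normal Z, computed with erfc."""
    return 0.5 * math.erfc(r / math.sqrt(2.0))


# Illustrative (assumed) parameter values.
C, L, r = 1.0, 2.0, 3.0
lam_star = r / (C * L**2)

# lam_star should minimize the exponent; compare against a grid of lambdas.
grid = [0.01 * k for k in range(1, 501)]
best_on_grid = min(chernoff_exponent(lam, r, C, L) for lam in grid)
assert chernoff_exponent(lam_star, r, C, L) <= best_on_grid + 1e-12

# The minimal exponent equals -r^2 / (2*C*L^2), as in the proof.
assert math.isclose(chernoff_exponent(lam_star, r, C, L),
                    -(r**2) / (2.0 * C * L**2))

# Assumed Gaussian example: C = 1, L = 1, f(x) = x with mean 0.
# The exact tail should sit below the bound for every tested r >= 0.
for k in range(0, 13):
    radius = 0.5 * k
    exact = gaussian_upper_tail(radius)
    bound = tail_bound(radius, 1.0, 1.0)
    assert exact <= bound
    print(f"r={radius:4.1f}  exact tail={exact:.3e}  bound={bound:.3e}")
```

In this assumed Gaussian example the bound is not tight, which is expected: the theorem gives a uniform estimate over all $L$-Lipschitz functions and all measures satisfying $T_2(C)$.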
