Mean Form of the Convex-Lipschitz Concentration Inequality (Theorem #6775)




Proof

[proofplan] We compare the mean of $f(X)$ to a median $m_f$ of $f(X)$ by integrating the median tail estimate. This shows that $|\mathbb{E}[f(X)]-m_f|$ is bounded by a universal multiple of the Lipschitz constant $L$. For small deviations from the mean, the desired estimate holds trivially once the universal constant is chosen large enough; for large deviations, the event above the mean is contained in an event above the median, where the assumed concentration inequality applies. The concave lower-tail estimate follows by applying the convex upper-tail estimate to $-f$. [/proofplan]

[step:Bound the distance between the mean and a median by integrating the tail]
Let $L>0$, let $f:\mathbb{R}^n \to \mathbb{R}$ be convex and $L$-Lipschitz, and define the real-valued [random variable](/page/Random%20Variable)
\begin{align*} Y:\Omega \to \mathbb{R} \end{align*}
by $Y(\omega)=f(X(\omega))$. Let $m \in \mathbb{R}$ be a median of $Y$. Let $\mathcal{L}^1$ denote one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on $\mathbb{R}$, restricted to the Borel subsets of the integration intervals appearing below.
The random variable $|Y-m|:\Omega \to [0,\infty)$ is non-negative and measurable. Therefore the [Layer-Cake Representation](/theorems/2956), applied to $|Y-m|$, gives
\begin{align*} \mathbb{E}[|Y-m|]=\int_0^\infty \mathbb{P}(|Y-m|>s)\,d\mathcal{L}^1(s). \end{align*}
The assumed median concentration estimate bounds the integrand by $A\exp\left(-\frac{s^2}{BL^2}\right)$ for every $s\ge 0$, so
\begin{align*} \mathbb{E}[|Y-m|]\le A\int_0^\infty \exp\left(-\frac{s^2}{B L^2}\right)\,d\mathcal{L}^1(s). \end{align*}
Use the change of variables
\begin{align*} u=\frac{s}{\sqrt{B}L}. \end{align*}
Then $d\mathcal{L}^1(s)=\sqrt{B}L\,d\mathcal{L}^1(u)$, and $s \in (0,\infty)$ corresponds to $u \in (0,\infty)$. Hence
\begin{align*} \mathbb{E}[|Y-m|]\le A\sqrt{B}L\int_0^\infty e^{-u^2}\,d\mathcal{L}^1(u). \end{align*}
The Gaussian integral on the right is finite (it equals $\sqrt{\pi}/2$), so define the universal constant
\begin{align*} C_0:=A\sqrt{B}\int_0^\infty e^{-u^2}\,d\mathcal{L}^1(u). \end{align*}
Since $\mathbb{E}[|Y-m|]<\infty$, the random variable $Y$ is integrable and $\mathbb{E}[Y]$ is well defined. Then
\begin{align*} |\mathbb{E}[Y]-m|\le \mathbb{E}[|Y-m|]\le C_0L. \end{align*}
[guided]
The first task is to move from a median estimate to a mean estimate. Define the real-valued random variable
\begin{align*} Y:\Omega \to \mathbb{R} \end{align*}
by $Y(\omega)=f(X(\omega))$, and let $m \in \mathbb{R}$ be a median of $Y$. Let $\mathcal{L}^1$ denote one-dimensional Lebesgue measure on $\mathbb{R}$, restricted to the Borel subsets of the integration intervals appearing below.
Since the concentration hypothesis controls deviations of $Y$ from $m$, we estimate the absolute first moment of $Y-m$. For a non-negative random variable $Z:\Omega \to [0,\infty]$, the [Layer-Cake Representation](/theorems/2956) states that
\begin{align*} \mathbb{E}[Z]=\int_0^\infty \mathbb{P}(Z>s)\,d\mathcal{L}^1(s). \end{align*}
Here $Z=|Y-m|$ is non-negative and measurable, because $Y$ is a real-valued random variable and $m$ is a real constant. Applying the theorem to $Z=|Y-m|$ gives
\begin{align*} \mathbb{E}[|Y-m|]=\int_0^\infty \mathbb{P}(|Y-m|>s)\,d\mathcal{L}^1(s). \end{align*}
The median concentration hypothesis applies because $f$ is convex and $L$-Lipschitz. Hence, for every $s \ge 0$,
\begin{align*} \mathbb{P}(|Y-m|>s)\le \mathbb{P}(|Y-m|\ge s)\le A\exp\left(-\frac{s^2}{B L^2}\right). \end{align*}
Substitution into the tail integral yields
\begin{align*} \mathbb{E}[|Y-m|]\le A\int_0^\infty \exp\left(-\frac{s^2}{B L^2}\right)\,d\mathcal{L}^1(s). \end{align*}
Now perform the one-dimensional change of variables
\begin{align*} u=\frac{s}{\sqrt{B}L}. \end{align*}
Since $L>0$ and $B>0$, this is an increasing bijection from $(0,\infty)$ to $(0,\infty)$, and the measure transforms as $d\mathcal{L}^1(s)=\sqrt{B}L\,d\mathcal{L}^1(u)$. Therefore
\begin{align*} \mathbb{E}[|Y-m|]\le A\sqrt{B}L\int_0^\infty e^{-u^2}\,d\mathcal{L}^1(u). \end{align*}
The integral $\int_0^\infty e^{-u^2}\,d\mathcal{L}^1(u)$ is finite; it equals $\sqrt{\pi}/2$. Define
\begin{align*} C_0:=A\sqrt{B}\int_0^\infty e^{-u^2}\,d\mathcal{L}^1(u). \end{align*}
This number is finite and universal because it depends only on the universal constants $A$ and $B$. In particular $\mathbb{E}[|Y-m|]\le C_0L<\infty$, so $Y$ is integrable and $\mathbb{E}[Y]$ is well defined. Finally, the triangle inequality for the expectation, $|\mathbb{E}[Z]|\le \mathbb{E}[|Z|]$, applied to $Z=Y-m$ gives $|\mathbb{E}[Y]-m|\le \mathbb{E}[|Y-m|]$, so
\begin{align*} |\mathbb{E}[Y]-m|\le C_0L. \end{align*}
[/guided]
[/step]

[step:Handle deviations no larger than the mean-median error scale]
Choose a universal constant $C$ so large that
\begin{align*} C\ge e^{1/2} \end{align*}
and
\begin{align*} C\ge 8C_0^2. \end{align*}
If $0\le t\le 2C_0L$, then
\begin{align*} \frac{t^2}{CL^2}\le \frac{4C_0^2}{C}\le \frac{1}{2}. \end{align*}
Therefore
\begin{align*} C\exp\left(-\frac{t^2}{CL^2}\right)\ge Ce^{-1/2}\ge 1. \end{align*}
Since every probability is at most $1$,
\begin{align*} \mathbb{P}(Y\ge \mathbb{E}[Y]+t)\le C\exp\left(-\frac{t^2}{CL^2}\right). \end{align*}
[/step]

[step:Convert large deviations from the mean into deviations from the median]
Increase $C$, if necessary, so that also $C\ge A$ and $C\ge 4B$. Suppose $t>2C_0L$. Since $|\mathbb{E}[Y]-m|\le C_0L<\frac{t}{2}$, we have
\begin{align*} \mathbb{E}[Y]+t\ge m+t-|\mathbb{E}[Y]-m|>m+\frac{t}{2}. \end{align*}
Hence
\begin{align*} \{Y\ge \mathbb{E}[Y]+t\}\subseteq \left\{Y\ge m+\frac{t}{2}\right\}\subseteq \left\{|Y-m|\ge \frac{t}{2}\right\}. \end{align*}
Applying the median concentration estimate with $s=t/2$ gives
\begin{align*} \mathbb{P}(Y\ge \mathbb{E}[Y]+t)\le A\exp\left(-\frac{t^2}{4BL^2}\right). \end{align*}
Because $C\ge A$ and $C\ge 4B$, the prefactor satisfies $A\le C$ and the exponent satisfies $\frac{t^2}{4BL^2}\ge \frac{t^2}{CL^2}$, so
\begin{align*} A\exp\left(-\frac{t^2}{4BL^2}\right)\le C\exp\left(-\frac{t^2}{CL^2}\right). \end{align*}
Together with the previous step, the desired convex upper-tail estimate holds for all $t\ge 0$.
[/step]

[step:Apply the convex upper-tail estimate to the negative function]
Let $f:\mathbb{R}^n\to\mathbb{R}$ be concave and $L$-Lipschitz. Define
\begin{align*} g:\mathbb{R}^n \to \mathbb{R} \end{align*}
by $g(x)=-f(x)$. Then $g$ is convex and $L$-Lipschitz. Applying the convex upper-tail estimate to $g$ gives, for every $t\ge 0$,
\begin{align*} \mathbb{P}(g(X)\ge \mathbb{E}[g(X)]+t)\le C\exp\left(-\frac{t^2}{CL^2}\right). \end{align*}
Since $g(X)=-f(X)$ and $\mathbb{E}[g(X)]=-\mathbb{E}[f(X)]$, the event on the left is exactly
\begin{align*} \{f(X)\le \mathbb{E}[f(X)]-t\}. \end{align*}
Therefore
\begin{align*} \mathbb{P}(f(X)\le \mathbb{E}[f(X)]-t)\le C\exp\left(-\frac{t^2}{CL^2}\right). \end{align*}
This proves the corresponding lower-tail bound for concave $f$ and completes the proof.
[/step]
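As a numerical sanity check on the constants chosen in the proof, the following minimal sketch computes $C_0$ from the closed form $\int_0^\infty e^{-u^2}\,du=\sqrt{\pi}/2$ and takes $C=\max\{e^{1/2},\,8C_0^2,\,A,\,4B\}$, which is one admissible choice satisfying all four requirements imposed above. The function names and the sample values of $A$, $B$, $L$ are hypothetical and serve only as illustration; the grid check does not replace the proof.

```python
import math
from typing import Tuple


def mean_form_constants(A: float, B: float) -> Tuple[float, float]:
    """Return (C0, C) for hypothetical median-form constants A, B > 0."""
    # C0 = A * sqrt(B) * integral_0^inf exp(-u^2) du, with the integral = sqrt(pi)/2.
    C0 = A * math.sqrt(B) * math.sqrt(math.pi) / 2.0
    # Smallest C meeting the four requirements used in the proof:
    # C >= e^{1/2}, C >= 8*C0^2, C >= A, C >= 4*B.
    C = max(math.exp(0.5), 8.0 * C0 ** 2, A, 4.0 * B)
    return C0, C


def check_on_grid(A: float, B: float, L: float, num: int = 2000) -> bool:
    """Check both regimes of the proof on a grid of t in [0, 10*C0*L]."""
    C0, C = mean_form_constants(A, B)
    threshold = 2.0 * C0 * L  # boundary between the small and large regimes
    tol = 1e-12               # slack for floating-point rounding
    for k in range(num + 1):
        t = 5.0 * threshold * k / num
        target = C * math.exp(-t ** 2 / (C * L ** 2))
        if t <= threshold:
            # Small deviations: the claimed bound is at least 1, hence trivial.
            ok = target >= 1.0 - tol
        else:
            # Large deviations: the median bound at s = t/2 is dominated.
            median_bound = A * math.exp(-t ** 2 / (4.0 * B * L ** 2))
            ok = median_bound <= target * (1.0 + tol)
        if not ok:
            return False
    return True


if __name__ == "__main__":
    C0, C = mean_form_constants(2.0, 2.0)
    print(C0, C, check_on_grid(2.0, 2.0, 1.5))
```

With this choice, $8C_0^2=2\pi A^2B$, so for constants $A,B\ge 1$ the term $2\pi A^2B$ dominates the maximum.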

Prerequisites

Theorems

Triangle Inequality For Inner Product Spaces (Theorem #433)

Definitions & Concepts

Lebesgue Measure
Expectation
Event
Random Variable
