Entropy Method for Self-Bounding Functions — Statement & Proof

Entropy Method for Self-Bounding Functions (Theorem # 6757)

Theorem

Proof

[proofplan] We first apply entropy tensorization to the product structure generated by the independent coordinates $X_1,\dots,X_n$. Each conditional entropy is bounded by comparing $F$ with the auxiliary coordinate-deleted value $F_i$ and using the one-sided increment bound $0\le F-F_i\le 1$. Summing these local estimates and using the self-bounding inequality gives the entropy estimate. We then convert the entropy estimate into a differential inequality for the centered log-[Laplace transform](/page/Laplace%20Transform) and integrate it by the Herbst argument. Finally, Markov's inequality and an explicit optimization in the Laplace parameter give the stated upper tail bound. [/proofplan] [step:Define the coordinate increments and conditional entropy operators] For $i\in\{1,\dots,n\}$, define $X_{-i}:=(X_1,\dots,X_{i-1},X_{i+1},\dots,X_n)$ and write $E_{-i}:=E_1\times\cdots\times E_{i-1}\times E_{i+1}\times\cdots\times E_n$. By the self-bounding hypothesis, there is a measurable coordinate-deleted map \begin{align*} F_i:E_{-i}\to[0,\infty). \end{align*} Thus $F_i(X_{-i})$ is a [random variable](/page/Random%20Variable) depending on all coordinates except $X_i$. Define the increment random variable \begin{align*} \Delta_i:\Omega\to[0,1],\qquad \Delta_i:=F(X)-F_i(X_{-i}). \end{align*} The self-bounding hypotheses say that \begin{align*} 0\le \Delta_i\le 1 \end{align*} for each $i$, and \begin{align*} \sum_{i=1}^n\Delta_i\le F(X). \end{align*} Let $\mathcal L^1$ denote one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure). Let $\mu_i$ denote the law of $X_i$ on $(E_i,\mathcal E_i)$. For a coordinate $i$ and a non-negative random variable $Y=Y(X_1,\dots,X_n)$ for which the expectations below are finite, define $\mathbb E_i[Y]$ to be the function of $X_{-i}$ obtained by integrating only the $i$-th coordinate against $\mu_i$: \begin{align*} \mathbb E_i[Y](X_{-i}) := \int_{E_i} Y(X_1,\dots,X_{i-1},x_i,X_{i+1},\dots,X_n)\,d\mu_i(x_i).
\end{align*} This definition uses the product structure coming from independence and does not require a regular [conditional probability](/page/Conditional%20Probability) kernel. Define the corresponding conditional entropy by \begin{align*} \operatorname{Ent}_i(Y):=\mathbb E_i[Y\log Y]-\mathbb E_i[Y]\log \mathbb E_i[Y]. \end{align*} We use the entropy tensorization inequality for product measures: \begin{align*} \operatorname{Ent}(Y)\le \mathbb E\left[\sum_{i=1}^n \operatorname{Ent}_i(Y)\right]. \end{align*} Here the structural hypothesis needed for tensorization is exactly the independence of $X_1,\dots,X_n$; the displayed inequality is applied only when the global and conditional entropy terms are finite. [/step] [step:Bound each conditional entropy by the coordinate increment] Fix $\lambda>0$ and define \begin{align*} Y:\Omega\to[0,\infty),\qquad Y:=e^{\lambda F(X)}. \end{align*} For each $i$, condition on $X_{-i}$. Under this conditioning, $F_i(X_{-i})$ is constant as a function of $X_i$. The variational entropy bound gives, for every positive constant $a$, \begin{align*} \operatorname{Ent}_i(Y)\le \mathbb E_i\left[Y\log\frac{Y}{a}-Y+a\right]. \end{align*} Choose \begin{align*} a:=e^{\lambda F_i(X_{-i})}. \end{align*} Since $Y=e^{\lambda F(X)}$ and $\Delta_i=F(X)-F_i(X_{-i})$, this gives \begin{align*} \operatorname{Ent}_i(e^{\lambda F(X)}) \le \mathbb E_i\left[e^{\lambda F(X)}\left(\lambda\Delta_i-1+e^{-\lambda\Delta_i}\right)\right]. \end{align*} For $u\ge 0$, \begin{align*} e^{-u}+u-1=\int_0^u (u-s)e^{-s}\,d\mathcal L^1(s)\le \frac{u^2}{2}. \end{align*} Applying this with $u=\lambda\Delta_i$, and using $\Delta_i^2\le\Delta_i$ (which follows from $0\le\Delta_i\le1$) together with $\tfrac12\le e^\lambda$, we obtain \begin{align*} e^{-\lambda\Delta_i}+\lambda\Delta_i-1\le \frac{\lambda^2\Delta_i^2}{2}\le \lambda^2 e^\lambda \Delta_i. \end{align*} Therefore \begin{align*} \operatorname{Ent}_i(e^{\lambda F(X)}) \le \lambda^2 e^\lambda\,\mathbb E_i[\Delta_i e^{\lambda F(X)}]. \end{align*} [guided] Fix one coordinate $i$.
The goal is to estimate the entropy coming only from the randomness of $X_i$, while all other coordinates are frozen. Under this conditioning, the quantity $F_i(X_{-i})$ is constant because it does not depend on $X_i$. We apply the variational entropy bound \begin{align*} \operatorname{Ent}_i(Y)\le \mathbb E_i\left[Y\log\frac{Y}{a}-Y+a\right], \end{align*} valid for every positive constant $a$ with respect to integration against the marginal law $\mu_i$ of $X_i$. We use it with \begin{align*} Y:=e^{\lambda F(X)} \end{align*} and choose the comparison constant \begin{align*} a:=e^{\lambda F_i(X_{-i})}. \end{align*} This is the natural comparison because the difference between $F(X)$ and $F_i(X_{-i})$ is precisely the controlled coordinate increment $\Delta_i$. Since \begin{align*} \log\frac{e^{\lambda F(X)}}{e^{\lambda F_i(X_{-i})}}=\lambda(F(X)-F_i(X_{-i}))=\lambda\Delta_i \end{align*} and \begin{align*} e^{\lambda F_i(X_{-i})}=e^{\lambda F(X)}e^{-\lambda\Delta_i}, \end{align*} the variational bound becomes \begin{align*} \operatorname{Ent}_i(e^{\lambda F(X)}) \le \mathbb E_i\left[e^{\lambda F(X)}\left(\lambda\Delta_i-1+e^{-\lambda\Delta_i}\right)\right]. \end{align*} It remains to estimate the scalar factor in parentheses. For $u\ge0$, Taylor's formula with integral remainder gives \begin{align*} e^{-u}+u-1=\int_0^u (u-s)e^{-s}\,d\mathcal L^1(s). \end{align*} Since $e^{-s}\le1$ for $s\ge0$, \begin{align*} e^{-u}+u-1\le \int_0^u (u-s)\,d\mathcal L^1(s)=\frac{u^2}{2}. \end{align*} Now set $u=\lambda\Delta_i$. The self-bounding hypothesis gives $0\le\Delta_i\le1$, hence $\Delta_i^2\le\Delta_i$. Therefore \begin{align*} e^{-\lambda\Delta_i}+\lambda\Delta_i-1\le \frac{\lambda^2\Delta_i^2}{2}\le \lambda^2 e^\lambda\Delta_i. \end{align*} Substituting this into the conditional entropy estimate yields \begin{align*} \operatorname{Ent}_i(e^{\lambda F(X)}) \le \lambda^2 e^\lambda\,\mathbb E_i[\Delta_i e^{\lambda F(X)}]. 
\end{align*} [/guided] [/step] [step:Sum the local estimates to prove the entropy inequality] Apply entropy tensorization to $Y=e^{\lambda F(X)}$ and then use the conditional estimate from the previous step: \begin{align*} \operatorname{Ent}(e^{\lambda F(X)}) \le \mathbb E\left[\sum_{i=1}^n \operatorname{Ent}_i(e^{\lambda F(X)})\right]. \end{align*} Inserting the conditional estimate and using $\mathbb E[\mathbb E_i[Z]]=\mathbb E[Z]$ (Fubini's theorem for the product law of $X$), we get \begin{align*} \operatorname{Ent}(e^{\lambda F(X)}) \le \lambda^2 e^\lambda\,\mathbb E\left[e^{\lambda F(X)}\sum_{i=1}^n\Delta_i\right]. \end{align*} Using the self-bounding inequality $\sum_{i=1}^n\Delta_i\le F(X)$, we obtain \begin{align*} \operatorname{Ent}(e^{\lambda F(X)}) \le \lambda^2 e^\lambda\,\mathbb E[F(X)e^{\lambda F(X)}]. \end{align*} This proves the asserted entropy estimate. [/step] [step:Convert the entropy estimate into a differential inequality for the log-Laplace transform] Let \begin{align*} m:=\mathbb E[F(X)]. \end{align*} If $m=0$, then $F(X)=0$ $\mathbb P$-a.s. because $F(X)\ge0$, and both the entropy estimate and the tail bound are immediate. Hence assume $m>0$ for the Laplace-transform argument. For the concentration conclusion, assume \begin{align*} \mathbb E[e^{aF(X)}]<\infty \end{align*} for every $0<a<1/e$. For every $0<\lambda<1/e$, choose $\varepsilon>0$ such that $\lambda+\varepsilon<1/e$. Then $F(X)e^{\lambda F(X)}\le \varepsilon^{-1}e^{(\lambda+\varepsilon)F(X)}$, so all differentiations below are justified by dominated convergence. Define the centered log-Laplace transform on $(0,1/e)$ by \begin{align*} \psi:(0,1/e)\to\mathbb R,\qquad \lambda\mapsto\log\mathbb E[e^{\lambda(F(X)-m)}]. \end{align*} For such $\lambda$, \begin{align*} \mathbb E[e^{\lambda F(X)}]=e^{\lambda m+\psi(\lambda)}. \end{align*} Taking logarithms and differentiating under the expectation therefore gives \begin{align*} \frac{\mathbb E[F(X)e^{\lambda F(X)}]}{\mathbb E[e^{\lambda F(X)}]}=m+\psi'(\lambda).
\end{align*} The entropy identity for $e^{\lambda F(X)}$ is \begin{align*} \operatorname{Ent}(e^{\lambda F(X)}) =\mathbb E[e^{\lambda F(X)}]\left(\lambda\psi'(\lambda)-\psi(\lambda)\right). \end{align*} Dividing the entropy estimate by the positive number $\mathbb E[e^{\lambda F(X)}]$ gives \begin{align*} \lambda\psi'(\lambda)-\psi(\lambda) \le \lambda^2 e^\lambda\left(m+\psi'(\lambda)\right). \end{align*} For $0<\lambda<1/e$, we have $e^\lambda\le e$, so \begin{align*} \lambda\psi'(\lambda)-\psi(\lambda) \le e\lambda^2\left(m+\psi'(\lambda)\right). \end{align*} [/step] [step:Integrate the differential inequality by the Herbst argument] We now carry out the Herbst argument. For $0<\lambda<1/e$, define \begin{align*} h:(0,1/e)\to\mathbb R,\qquad \lambda\mapsto\frac{\psi(\lambda)}{\lambda}. \end{align*} By dominated convergence, $\psi(\lambda)\to0$ and $\psi'(\lambda)\to\mathbb E[F(X)-m]=0$ as $\lambda\downarrow0$, so the mean value theorem gives $\lim_{\lambda\downarrow0}h(\lambda)=0$. Also, \begin{align*} h'(\lambda)=\frac{\lambda\psi'(\lambda)-\psi(\lambda)}{\lambda^2}. \end{align*} The differential inequality from the previous step gives \begin{align*} h'(\lambda)\le e(m+\psi'(\lambda)). \end{align*} Because $\psi(\lambda)=\lambda h(\lambda)$, we have \begin{align*} \psi'(\lambda)=h(\lambda)+\lambda h'(\lambda). \end{align*} Therefore \begin{align*} h'(\lambda)\le e(m+h(\lambda)+\lambda h'(\lambda)). \end{align*} Jensen's inequality gives $\psi(\lambda)\ge\lambda\,\mathbb E[F(X)-m]=0$, so $h(\lambda)\ge0$ and $m+h(\lambda)\ge m>0$. Rearranging to $(1-e\lambda)h'(\lambda)\le e(m+h(\lambda))$ and dividing by the positive quantities $1-e\lambda$ and $m+h(\lambda)$ yields \begin{align*} \frac{d}{d\lambda}\log(m+h(\lambda))\le \frac{e}{1-e\lambda}. \end{align*} Integrating over $(0,\lambda)$ with respect to $\mathcal L^1$, and using $\lim_{s\downarrow0}h(s)=0$, gives \begin{align*} \log\frac{m+h(\lambda)}{m}\le -\log(1-e\lambda). \end{align*} Exponentiating gives $m+h(\lambda)\le m/(1-e\lambda)$. Hence \begin{align*} h(\lambda)\le \frac{e\lambda m}{1-e\lambda}. \end{align*} Multiplying by $\lambda$ gives \begin{align*} \psi(\lambda)\le \frac{e\lambda^2 m}{1-e\lambda} \end{align*} for every $0<\lambda<1/e$. [/step] [step:Apply Markov's inequality and optimize the Laplace parameter] Fix $t>0$ and $0<\lambda<1/e$.
Markov's inequality applied to the non-negative random variable $e^{\lambda(F(X)-m)}$ gives \begin{align*} \mathbb P(F(X)-m\ge t) \le \exp(-\lambda t+\psi(\lambda)). \end{align*} Using the Laplace estimate from the previous step, \begin{align*} \mathbb P(F(X)-m\ge t) \le \exp\left(-\lambda t+\frac{e\lambda^2 m}{1-e\lambda}\right). \end{align*} Choose \begin{align*} \lambda:=\frac{t}{e(2m+t)}. \end{align*} Then $0<\lambda<1/e$ and \begin{align*} 1-e\lambda=1-\frac{t}{2m+t}=\frac{2m}{2m+t}. \end{align*} Substituting this value into the exponent gives \begin{align*} -\lambda t+\frac{e\lambda^2 m}{1-e\lambda} = -\frac{t^2}{e(2m+t)}+\frac{t^2}{2e(2m+t)} = -\frac{t^2}{2e(2m+t)}. \end{align*} Since $m=\mathbb E[F(X)]$, this is \begin{align*} \mathbb P(F(X)-\mathbb E[F(X)]\ge t) \le \exp\left(-\frac{t^2}{4e\,\mathbb E[F(X)]+2e\,t}\right). \end{align*} This proves the displayed concentration inequality, with the denominator $4e\,\mathbb E[F(X)]+2e\,t$ obtained from the entropy estimate proved above. [/step]
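
The final bound can be sanity-checked numerically. The following is a minimal sketch, not part of the proof, under an illustrative assumption that does not come from the theorem page: take $F(X)=X_1+\cdots+X_n$ with independent Bernoulli($p$) coordinates and $F_i(X_{-i})=\sum_{j\ne i}X_j$, so that $\Delta_i=X_i\in\{0,1\}$ and $\sum_i\Delta_i=F(X)$. The function names (`binomial_upper_tail`, `self_bounding_tail_bound`, `chernoff_exponent`) and the parameters $n=40$, $p=0.25$ are hypothetical choices made for this check. The script compares the exact binomial tail with $\exp(-t^2/(4e\,m+2e\,t))$ and also confirms that the exponent at $\lambda=t/(e(2m+t))$ equals $-t^2/(2e(2m+t))$.

```python
import math


def binomial_upper_tail(n: int, p: float, k: int) -> float:
    """Exact P(S >= k) for S ~ Binomial(n, p)."""
    lo = max(k, 0)
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(lo, n + 1))


def self_bounding_tail_bound(m: float, t: float) -> float:
    """Right-hand side exp(-t^2 / (4 e m + 2 e t)) of the tail bound."""
    return math.exp(-t * t / (4 * math.e * m + 2 * math.e * t))


def chernoff_exponent(m: float, t: float, lam: float) -> float:
    """Exponent -lam*t + e*lam^2*m / (1 - e*lam), for 0 < lam < 1/e."""
    return -lam * t + math.e * lam**2 * m / (1 - math.e * lam)


# Illustrative (assumed) example: F = sum of n Bernoulli(p) variables.
n, p = 40, 0.25
m = n * p  # E[F] = 10.0

results = []
for t in (2, 5, 10, 20):
    # F is integer-valued, so {F - m >= t} equals {F >= ceil(m + t)}.
    k = math.ceil(m + t)
    exact = binomial_upper_tail(n, p, k)
    bound = self_bounding_tail_bound(m, t)

    # Exponent at the Laplace parameter chosen in the proof vs. its closed form.
    lam = t / (math.e * (2 * m + t))
    at_lam = chernoff_exponent(m, t, lam)
    closed_form = -t * t / (2 * math.e * (2 * m + t))

    results.append((t, exact, bound, at_lam, closed_form))

for t, exact, bound, at_lam, closed_form in results:
    print(f"t={t:>2}  exact tail={exact:.3e}  bound={bound:.3e}  "
          f"exponent={at_lam:.6f}  closed form={closed_form:.6f}")
```

In this example the bound is far from tight, which is expected: the constants $4e$ and $2e$ come from the crude estimates $\tfrac12\le e^\lambda\le e$ used in the proof.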
