Binomial Chernoff Bound — Statement & Proof

Binomial Chernoff Bound (Theorem # 6046)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We prove both tail estimates by applying the exponential moment method to the binomial [random variable](/page/Random%20Variable) $S_n$. First we compute the moment generating function of $S_n$ directly from its binomial probability mass function, using the [Binomial Theorem](/theorems/750). Then we use the exponential form of [Markov's Inequality](/theorems/514) for $e^{\lambda S_n}$, choosing $\lambda>0$ for the upper tail and $\lambda<0$ for the lower tail. The remaining optimization is one-dimensional and gives exactly the Bernoulli relative entropy $D(a \mid p)$. [/proofplan] [step:Compute the exponential moment of the binomial random variable] For each $\lambda \in \mathbb{R}$, define the exponential moment function $M: \mathbb{R} \to (0,\infty)$ by $M(\lambda) = \mathbb{E}[e^{\lambda S_n}]$. Since $S_n \sim \operatorname{Bin}(n,p)$, for each $k \in \{0,1,\dots,n\}$ the binomial probability mass function gives \begin{align*} \mathbb{P}(S_n=k)=\binom{n}{k}p^k(1-p)^{n-k}. \end{align*} Therefore, by the definition of expectation for a finite-valued random variable and the [Binomial Theorem](/theorems/750), \begin{align*} M(\lambda)=\sum_{k=0}^{n} e^{\lambda k}\mathbb{P}(S_n=k)=\sum_{k=0}^{n} \binom{n}{k}(pe^\lambda)^k(1-p)^{n-k}=(1-p+pe^\lambda)^n. \end{align*} Define also the logarithmic Bernoulli moment function $\Lambda: \mathbb{R} \to \mathbb{R}$ by $\Lambda(\lambda)=\log(1-p+pe^\lambda)$. Then \begin{align*} \log M(\lambda)=n\Lambda(\lambda). \end{align*} [/step] [step:Bound the upper tail by an exponential moment] Assume $a>p$. Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space on which the random variable $S_n: \Omega \to \{0,1,\dots,n\}$ is defined. Let $\lambda \in (0,\infty)$. Define the non-negative random variable $Y_\lambda: \Omega \to (0,\infty)$ by $Y_\lambda(\omega)=e^{\lambda S_n(\omega)}$. The following estimate is the exponential form of [Markov's Inequality](/theorems/514), with the verification written out. On the event $\{S_n\ge na\}$, since $\lambda>0$, we have $Y_\lambda\ge e^{\lambda na}$. Hence \begin{align*} \mathbb{E}[Y_\lambda] \ge \int_{\{S_n\ge na\}} Y_\lambda\,d\mathbb{P}(\omega) \ge e^{\lambda na}\mathbb{P}(S_n\ge na). \end{align*} Using the exponential moment computed above, \begin{align*} \mathbb{P}(S_n\ge na) \le e^{-\lambda na}\mathbb{E}[e^{\lambda S_n}] = \exp\{n(\Lambda(\lambda)-\lambda a)\}. \end{align*} Thus \begin{align*} \mathbb{P}(S_n\ge na) \le \exp\left\{-n\sup_{\lambda>0}(\lambda a-\Lambda(\lambda))\right\}. \end{align*} [guided] Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space on which the random variable $S_n: \Omega \to \{0,1,\dots,n\}$ is defined. The upper-tail event $\{S_n\ge na\}$ is controlled by exponentiating $S_n$ with a positive parameter; this is the standard exponential form of [Markov's Inequality](/theorems/514). The reason for taking $\lambda>0$ is monotonicity: if $S_n(\omega)\ge na$, then \begin{align*} e^{\lambda S_n(\omega)}\ge e^{\lambda na}. \end{align*} Define $Y_\lambda: \Omega \to (0,\infty)$ by $Y_\lambda(\omega)=e^{\lambda S_n(\omega)}$. This is a non-negative random variable. Integrating $Y_\lambda$ over all of $\Omega$ and then restricting the domain of integration to the event $\{S_n\ge na\}$ gives \begin{align*} \mathbb{E}[Y_\lambda]=\int_\Omega Y_\lambda\,d\mathbb{P}(\omega)\ge \int_{\{S_n\ge na\}}Y_\lambda\,d\mathbb{P}(\omega). \end{align*} On this restricted event, $Y_\lambda\ge e^{\lambda na}$, so \begin{align*} \int_{\{S_n\ge na\}}Y_\lambda\,d\mathbb{P}(\omega) \ge \int_{\{S_n\ge na\}}e^{\lambda na}\,d\mathbb{P}(\omega) = e^{\lambda na}\mathbb{P}(S_n\ge na). \end{align*} Combining the two inequalities gives \begin{align*} \mathbb{P}(S_n\ge na)\le e^{-\lambda na}\mathbb{E}[e^{\lambda S_n}]. \end{align*} We now compute the exponential moment inside this guided argument so the step is self-contained. Since $S_n \sim \operatorname{Bin}(n,p)$, for each $k \in \{0,1,\dots,n\}$, \begin{align*} \mathbb{P}(S_n=k)=\binom{n}{k}p^k(1-p)^{n-k}. \end{align*} Using the definition of expectation for a finite-valued random variable and the [Binomial Theorem](/theorems/750), \begin{align*} \mathbb{E}[e^{\lambda S_n}] =\sum_{k=0}^{n}e^{\lambda k}\mathbb{P}(S_n=k) =\sum_{k=0}^{n}\binom{n}{k}(pe^\lambda)^k(1-p)^{n-k} =(1-p+pe^\lambda)^n =\exp\{n\Lambda(\lambda)\}. \end{align*} Therefore \begin{align*} \mathbb{P}(S_n\ge na) \le \exp\{n(\Lambda(\lambda)-\lambda a)\}. \end{align*} Since this estimate is valid for every $\lambda>0$, we may optimize over all such $\lambda$: \begin{align*} \mathbb{P}(S_n\ge na) \le \exp\left\{-n\sup_{\lambda>0}(\lambda a-\Lambda(\lambda))\right\}. \end{align*} [/guided] [/step] [step:Optimize the exponential rate for the upper tail] Define $G_a: \mathbb{R} \to \mathbb{R}$ by $G_a(\lambda)=\lambda a-\Lambda(\lambda)$. The derivative is \begin{align*} G_a'(\lambda) =a-\frac{pe^\lambda}{1-p+pe^\lambda}. \end{align*} The unique critical point is the real number $\lambda_a$ defined by \begin{align*} \lambda_a=\log\frac{a(1-p)}{p(1-a)}. \end{align*} Since $a>p$, we have $\lambda_a>0$. Also, \begin{align*} G_a''(\lambda) =-\frac{pe^\lambda(1-p)}{(1-p+pe^\lambda)^2}<0, \end{align*} so $G_a$ is strictly concave and $\lambda_a$ is the global maximizer. Substituting $e^{\lambda_a}=\frac{a(1-p)}{p(1-a)}$ gives \begin{align*} 1-p+pe^{\lambda_a}=1-p+\frac{a(1-p)}{1-a}=\frac{1-p}{1-a}. \end{align*} Therefore \begin{align*} G_a(\lambda_a)=a\log\frac{a(1-p)}{p(1-a)}-\log\frac{1-p}{1-a}=a\log\frac{a}{p}+(1-a)\log\frac{1-a}{1-p}=D(a \mid p). \end{align*} Hence \begin{align*} \mathbb{P}(S_n\ge na)\le \exp\{-nD(a \mid p)\}. \end{align*} [/step] [step:Bound and optimize the lower tail with a negative exponential parameter] Assume $a<p$. Let $\lambda \in (-\infty,0)$. Since $\lambda<0$, on the event $\{S_n\le na\}$ we have $e^{\lambda S_n}\ge e^{\lambda na}$. Applying the same exponential form of [Markov's Inequality](/theorems/514) to the non-negative random variable $Y_\lambda: \Omega \to (0,\infty)$, defined by $Y_\lambda(\omega)=e^{\lambda S_n(\omega)}$, gives \begin{align*} \mathbb{P}(S_n\le na) \le e^{-\lambda na}\mathbb{E}[e^{\lambda S_n}] = \exp\{n(\Lambda(\lambda)-\lambda a)\}. \end{align*} Thus \begin{align*} \mathbb{P}(S_n\le na) \le \exp\left\{-n\sup_{\lambda<0}G_a(\lambda)\right\}. \end{align*} The same derivative computation gives the unique maximizer \begin{align*} \lambda_a=\log\frac{a(1-p)}{p(1-a)}. \end{align*} Since $a<p$, we have $\lambda_a<0$, so this maximizer lies in the allowed range $\lambda<0$. The same substitution as above yields \begin{align*} G_a(\lambda_a)=D(a \mid p). \end{align*} Consequently \begin{align*} \mathbb{P}(S_n\le na)\le \exp\{-nD(a \mid p)\}. \end{align*} This proves both asserted binomial Chernoff bounds. [/step]

Prerequisites (0/3 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Explore Further

Random Variable Definition Expectation Definition Event Definition Almost Sure Martingale Convergence Theorem Martingale Theory Chernoff Bound via Moment Generating Functions Probability & Statistics Effective-Rank Sample Covariance Concentration Theorem Probability & Statistics Hitting Times of Closed Sets are Stopping Times Stochastic Processes Davis–Kahan Rank-One PCA Bound Probability & Statistics Weak Law of Large Numbers Probability Theory Sub-Gaussian Linear Combination Norm Estimate Probability & Statistics Matching Minimax Lower and Upper Bounds Principle Probability & Statistics Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.

Binomial Chernoff Bound (Theorem # 6046)

Discussion

Proof

Prerequisites (0/3 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Binomial Chernoff Bound (Theorem # 6046)

Discussion

Proof

Prerequisites (0/3 completed)

Prerequisites Graph

Explore Further