Sub-Exponential Confidence Radius for the Sample Mean (Theorem # 6066)

Theorem

Proof

[proofplan] We first prove a one-sided Bernstein-type tail estimate directly from the moment-generating-function assumption, using Chernoff's method and independence. The same calculation with a negative exponential parameter gives the lower tail, because the hypothesis holds for both positive and negative $\lambda$. We then choose the stated radius so that each one-sided tail probability is at most $\beta/2$, and combine the two estimates by the union bound. [/proofplan] [step:Derive the upper tail estimate by optimizing the Chernoff bound] Define the centered sum $S_n: \Omega \to \mathbb R$ by \begin{align*} S_n(\omega) := \sum_{i=1}^n (X_i(\omega)-\mu). \end{align*} For $r > 0$ and $\lambda \in (0,1/b)$, the event $\{\bar X_n-\mu \ge r\}$ equals $\{S_n \ge nr\}$, which equals $\{e^{\lambda S_n} \ge e^{\lambda nr}\}$ because $\lambda>0$ and the exponential map is increasing. Hence \begin{align*} \mathbb P(\bar X_n-\mu \ge r) = \mathbb P(e^{\lambda S_n} \ge e^{\lambda nr}). \end{align*} The [Markov inequality](/theorems/514) applied to the non-negative [random variable](/page/Random%20Variable) $e^{\lambda S_n}$ then gives \begin{align*} \mathbb P(\bar X_n-\mu \ge r) \le e^{-\lambda nr}\mathbb E[e^{\lambda S_n}]. \end{align*} Since $X_1,\dots,X_n$ are independent, the centered variables $X_i-\mu$ are independent, and therefore \begin{align*} \mathbb E[e^{\lambda S_n}] = \prod_{i=1}^n \mathbb E[e^{\lambda(X_i-\mu)}]. \end{align*} The moment-generating-function hypothesis applies because $\lambda \in (0,1/b)$, so \begin{align*} \mathbb E[e^{\lambda S_n}] \le \prod_{i=1}^n \exp\left(\frac{\nu^2\lambda^2}{2}\right) = \exp\left(\frac{n\nu^2\lambda^2}{2}\right). \end{align*} Hence \begin{align*} \mathbb P(\bar X_n-\mu \ge r) \le \exp\left(-n\lambda r+\frac{n\nu^2\lambda^2}{2}\right). \end{align*} If $0<r<\nu^2/b$, choose $\lambda := r/\nu^2 \in (0,1/b)$. Then \begin{align*} \mathbb P(\bar X_n-\mu \ge r) \le \exp\left(-\frac{nr^2}{2\nu^2}\right). 
\end{align*} If $r \ge \nu^2/b$, then for every $\lambda \in (0,1/b)$ the same bound holds; taking $\lambda \uparrow 1/b$ yields \begin{align*} \mathbb P(\bar X_n-\mu \ge r) \le \exp\left(-\frac{nr}{b}+\frac{n\nu^2}{2b^2}\right). \end{align*} Because $r \ge \nu^2/b$, the exponent satisfies \begin{align*} -\frac{nr}{b}+\frac{n\nu^2}{2b^2} \le -\frac{nr}{2b}. \end{align*} Hence \begin{align*} \mathbb P(\bar X_n-\mu \ge r) \le \exp\left(-\frac{nr}{2b}\right). \end{align*} Combining the two cases gives, for every $r>0$, \begin{align*} \mathbb P(\bar X_n-\mu \ge r) \le \exp\left(-\frac{n}{2}\min\left\{\frac{r^2}{\nu^2},\frac{r}{b}\right\}\right). \end{align*} [guided] The goal is to convert the exponential-moment hypothesis into a tail bound for the average. The standard method is Chernoff's bound: exponentiate the event, use Markov's inequality, and then optimize the exponential parameter. Define the centered sum $S_n: \Omega \to \mathbb R$ by \begin{align*} S_n(\omega) := \sum_{i=1}^n (X_i(\omega)-\mu). \end{align*} The event $\{\bar X_n-\mu \ge r\}$ is exactly the event $\{S_n \ge nr\}$. Since $\lambda>0$, the exponential map is increasing, so this event is also $\{e^{\lambda S_n} \ge e^{\lambda nr}\}$. For a parameter $\lambda \in (0,1/b)$, the map $\omega \mapsto e^{\lambda S_n(\omega)}$ is a non-negative random variable, so the [Markov inequality](/theorems/514) applies. We obtain \begin{align*} \mathbb P(\bar X_n-\mu \ge r) = \mathbb P(e^{\lambda S_n} \ge e^{\lambda nr}). \end{align*} Then Markov's inequality gives \begin{align*} \mathbb P(\bar X_n-\mu \ge r) \le e^{-\lambda nr}\mathbb E[e^{\lambda S_n}]. \end{align*} Now we estimate the expectation. Since $X_1,\dots,X_n$ are independent, the centered variables $X_1-\mu,\dots,X_n-\mu$ are also independent. Therefore the exponential of the sum factors. First, \begin{align*} \mathbb E[e^{\lambda S_n}] = \mathbb E\left[\prod_{i=1}^n e^{\lambda(X_i-\mu)}\right]. 
\end{align*} By independence of the centered variables, the expectation of the product is the product of the expectations: \begin{align*} \mathbb E[e^{\lambda S_n}] = \prod_{i=1}^n \mathbb E[e^{\lambda(X_i-\mu)}]. \end{align*} The hypothesis applies to this positive $\lambda$ because $\lambda \in (0,1/b)$. Hence \begin{align*} \mathbb E[e^{\lambda S_n}] \le \prod_{i=1}^n \exp\left(\frac{\nu^2\lambda^2}{2}\right) = \exp\left(\frac{n\nu^2\lambda^2}{2}\right). \end{align*} Substituting this into the Markov estimate gives \begin{align*} \mathbb P(\bar X_n-\mu \ge r) \le \exp\left(-n\lambda r+\frac{n\nu^2\lambda^2}{2}\right). \end{align*} We now choose $\lambda$ to make the exponent as negative as possible while respecting $\lambda<1/b$. If $0<r<\nu^2/b$, the unconstrained minimizer $\lambda=r/\nu^2$ lies in $(0,1/b)$. Substituting this value of $\lambda$ gives \begin{align*} -n\lambda r+\frac{n\nu^2\lambda^2}{2} = -\frac{nr^2}{2\nu^2}. \end{align*} Thus \begin{align*} \mathbb P(\bar X_n-\mu \ge r) \le \exp\left(-\frac{nr^2}{2\nu^2}\right). \end{align*} If $r \ge \nu^2/b$, the unconstrained minimizer $r/\nu^2$ is at least $1/b$ and therefore lies outside the admissible interval, so we push $\lambda$ up to the boundary instead. The hypothesis is stated for $\lambda<1/b$, so we use the bound for $\lambda \in (0,1/b)$ and then take the limit $\lambda \uparrow 1/b$. The left-hand side does not depend on $\lambda$ and the right-hand side is continuous in $\lambda$, so the inequality survives the limit. This gives \begin{align*} \mathbb P(\bar X_n-\mu \ge r) \le \exp\left(-\frac{nr}{b}+\frac{n\nu^2}{2b^2}\right). \end{align*} Because $r \ge \nu^2/b$, we have $\nu^2/b^2 \le r/b$, and hence \begin{align*} -\frac{nr}{b}+\frac{n\nu^2}{2b^2} \le -\frac{nr}{b}+\frac{nr}{2b} = -\frac{nr}{2b}. \end{align*} Therefore \begin{align*} \mathbb P(\bar X_n-\mu \ge r) \le \exp\left(-\frac{nr}{2b}\right). \end{align*} Combining the small-deviation and large-deviation cases gives, for every $r>0$, the single bound \begin{align*} \mathbb P(\bar X_n-\mu \ge r) \le \exp\left(-\frac{n}{2}\min\left\{\frac{r^2}{\nu^2},\frac{r}{b}\right\}\right). 
\end{align*} [/guided] [/step] [step:Apply the same Chernoff argument to the lower tail] For $r>0$ and $\lambda \in (0,1/b)$, the event $\{\bar X_n-\mu \le -r\}$ equals $\{-S_n \ge nr\}$, which equals $\{e^{-\lambda S_n} \ge e^{\lambda nr}\}$ because the exponential map is increasing. Applying the [Markov inequality](/theorems/514) to the non-negative random variable $e^{-\lambda S_n}$ gives \begin{align*} \mathbb P(\bar X_n-\mu \le -r) \le e^{-\lambda nr}\mathbb E[e^{-\lambda S_n}]. \end{align*} Independence gives \begin{align*} \mathbb E[e^{-\lambda S_n}] = \prod_{i=1}^n \mathbb E[e^{-\lambda(X_i-\mu)}]. \end{align*} The moment-generating-function hypothesis applies with parameter $-\lambda \in (-1/b,0)$, and therefore \begin{align*} \mathbb E[e^{-\lambda S_n}] \le \exp\left(\frac{n\nu^2\lambda^2}{2}\right). \end{align*} The same optimization in $\lambda$ as in the upper-tail estimate yields \begin{align*} \mathbb P(\bar X_n-\mu \le -r) \le \exp\left(-\frac{n}{2}\min\left\{\frac{r^2}{\nu^2},\frac{r}{b}\right\}\right). \end{align*} [/step] [step:Choose the radius so each one-sided tail has probability at most $\beta/2$] Fix $\beta \in (0,1)$ and define \begin{align*} \rho := \max\left\{\nu\sqrt{\frac{2\log(2/\beta)}{n}},\,\frac{2b\log(2/\beta)}{n}\right\}. \end{align*} Since $\rho$ is at least the first term in the maximum, \begin{align*} \frac{n\rho^2}{2\nu^2} \ge \log(2/\beta). \end{align*} Since $\rho$ is at least the second term in the maximum, \begin{align*} \frac{n\rho}{2b} \ge \log(2/\beta). \end{align*} Therefore \begin{align*} \frac{n}{2}\min\left\{\frac{\rho^2}{\nu^2},\frac{\rho}{b}\right\} \ge \log(2/\beta). \end{align*} Applying the upper-tail and lower-tail estimates with $r=\rho$ gives \begin{align*} \mathbb P(\bar X_n-\mu \ge \rho) \le \frac{\beta}{2}, \qquad \mathbb P(\bar X_n-\mu \le -\rho) \le \frac{\beta}{2}. 
\end{align*} [/step] [step:Combine the two one-sided estimates by the union bound] The failure event of the desired confidence statement is \begin{align*} \{|\bar X_n-\mu|>\rho\} = \{\bar X_n-\mu>\rho\}\cup \{\bar X_n-\mu<-\rho\}. \end{align*} Each strict event is contained in the corresponding non-strict tail event, so monotonicity of probability together with the [union bound](/theorems/6078) gives \begin{align*} \mathbb P(|\bar X_n-\mu|>\rho) \le \mathbb P(\bar X_n-\mu\ge \rho)+\mathbb P(\bar X_n-\mu\le -\rho). \end{align*} Using the two one-sided estimates gives \begin{align*} \mathbb P(|\bar X_n-\mu|>\rho) \le \frac{\beta}{2}+\frac{\beta}{2} = \beta. \end{align*} Taking complements gives \begin{align*} \mathbb P(|\bar X_n-\mu|\le \rho) \ge 1-\beta. \end{align*} After substituting the definition of $\rho$, this is exactly \begin{align*} \mathbb P\left(|\bar X_n-\mu| \le \max\left\{\nu\sqrt{\frac{2\log(2/\beta)}{n}},\,\frac{2b\log(2/\beta)}{n}\right\}\right) \ge 1-\beta, \end{align*} which proves the theorem. [/step]
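
As a numerical sanity check on the choice of radius, the following minimal Python sketch evaluates $\rho$ and the one-sided tail bound from the proof. The function names `confidence_radius` and `one_sided_tail_bound` are hypothetical and chosen only for illustration. The parameters `nu`, `b`, `n`, `beta` stand for $\nu$, $b$, $n$, $\beta$, assuming $\nu>0$, $b>0$, $n\ge 1$ and $\beta\in(0,1)$ as used in the proof.

```python
import math


def confidence_radius(nu: float, b: float, n: int, beta: float) -> float:
    """Return rho = max{nu*sqrt(2 log(2/beta)/n), 2 b log(2/beta)/n}.

    Hypothetical helper: it only evaluates the formula for rho from the proof.
    """
    if not (nu > 0 and b > 0 and n >= 1 and 0 < beta < 1):
        raise ValueError("need nu > 0, b > 0, n >= 1 and 0 < beta < 1")
    log_term = math.log(2.0 / beta)
    gaussian_part = nu * math.sqrt(2.0 * log_term / n)  # small-deviation regime
    linear_part = 2.0 * b * log_term / n                # large-deviation regime
    return max(gaussian_part, linear_part)


def one_sided_tail_bound(r: float, nu: float, b: float, n: int) -> float:
    """Return exp(-(n/2) * min{r^2/nu^2, r/b}), the one-sided bound from the proof."""
    return math.exp(-0.5 * n * min(r * r / (nu * nu), r / b))


if __name__ == "__main__":
    # Illustrative parameter values, not taken from the theorem.
    nu, b, n, beta = 1.0, 2.0, 100, 0.05
    rho = confidence_radius(nu, b, n, beta)
    # Twice the one-sided bound at r = rho should not exceed beta.
    print(rho, 2.0 * one_sided_tail_bound(rho, nu, b, n))
```

The sketch checks only the deterministic inequality $\frac{n}{2}\min\{\rho^2/\nu^2,\rho/b\}\ge\log(2/\beta)$, not the probabilistic statement. A simulation would need a concrete distribution satisfying the moment-generating-function hypothesis.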
