Hoeffding Confidence Radius for Bounded Independent Means (Theorem # 6064)

Theorem
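
Reading the hypotheses and conclusion off the proof below, the statement being proved can be summarized as follows. This is a reconstruction from the proof rather than a verbatim quotation; the notation $r_n(\beta)$ is taken from the last step of the proof.

Let $X_1,\dots,X_n$ be independent real random variables with common mean $\mathbb E[X_i]=\mu$ and $\mathbb P(a_i \le X_i \le b_i)=1$ for real numbers $a_i \le b_i$, and write $\bar X_n=\frac{1}{n}\sum_{i=1}^n X_i$. Then for every $\beta \in (0,1)$, \begin{align*} \mathbb P\left(|\bar X_n-\mu| \le \sqrt{\frac{\sum_{i=1}^n (b_i-a_i)^2}{2n^2}\log\frac{2}{\beta}}\right) \ge 1-\beta. \end{align*} In particular, if $a_i=a$ and $b_i=b$ for every $i$, the radius is \begin{align*} r_n(\beta)=(b-a)\sqrt{\frac{\log(2/\beta)}{2n}}. \end{align*}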

Proof

[proofplan] We center the variables and apply [Hoeffding's inequality](/theorems/1962) to the sum of the centered bounded independent variables. The upper and lower tails are controlled separately, with the lower tail obtained by applying the same inequality to the negated centered variables. A union bound gives the two-sided deviation bound, and then we choose the radius so that the resulting exponential upper bound is exactly $\beta$. [/proofplan]

[step:Center the variables and record their bounds] For each $i \in \{1,\dots,n\}$, define the centered [random variable](/page/Random%20Variable) $Y_i: (\Omega,\mathcal F) \to (\mathbb R,\mathcal B(\mathbb R))$ by \begin{align*} Y_i(\omega)=X_i(\omega)-\mu \quad \text{for every } \omega \in \Omega. \end{align*} Since $X_1,\dots,X_n$ are independent and each $Y_i$ is obtained from $X_i$ by the measurable map $x \mapsto x-\mu$, the random variables $Y_1,\dots,Y_n$ are independent. Moreover, \begin{align*} \mathbb E[Y_i]=\mathbb E[X_i]-\mu=0. \end{align*}

Since $\mathbb P(a_i \le X_i \le b_i)=1$, we also have \begin{align*} \mathbb P(a_i-\mu \le Y_i \le b_i-\mu)=1. \end{align*} Thus $Y_i$ is almost surely contained in an interval of length \begin{align*} (b_i-\mu)-(a_i-\mu)=b_i-a_i. \end{align*} Finally, define the deterministic sum of squared interval lengths \begin{align*} D:=\sum_{i=1}^n (b_i-a_i)^2. \end{align*} [/step]

[step:Handle the degenerate zero-radius case] Assume first that $D=0$. Since each summand $(b_i-a_i)^2$ is non-negative, $D=0$ implies $b_i=a_i$ for every $i \in \{1,\dots,n\}$. Therefore $\mathbb P(X_i=a_i)=1$ for every $i$. An almost surely constant random variable has expectation equal to that constant, so $\mathbb E[X_i]=a_i$. Because $\mathbb E[X_i]=\mu$, it follows that $a_i=\mu$ for every $i$, and hence \begin{align*} \mathbb P(\bar X_n=\mu)=1. \end{align*} The radius in the statement is then $0$, and the asserted bound holds because $\mathbb P(|\bar X_n-\mu|\le 0)=1 \ge 1-\beta$.
[/step]

[step:Apply Hoeffding's inequality to both one-sided deviations] Assume from now on that $D>0$. For $t>0$, Hoeffding's inequality for independent bounded zero-mean random variables gives \begin{align*} \mathbb P\left(\sum_{i=1}^n Y_i \ge t\right) \le \exp\left(-\frac{2t^2}{D}\right). \end{align*} Here the hypotheses are satisfied because the $Y_i$ are independent, $\mathbb E[Y_i]=0$, and $Y_i$ is almost surely contained in an interval of length $b_i-a_i$. Applying the same inequality to the independent zero-mean variables $-Y_i$, which are almost surely contained in intervals of the same lengths $b_i-a_i$, gives \begin{align*} \mathbb P\left(\sum_{i=1}^n Y_i \le -t\right) = \mathbb P\left(\sum_{i=1}^n (-Y_i) \ge t\right) \le \exp\left(-\frac{2t^2}{D}\right). \end{align*} Hoeffding's inequality for independent bounded zero-mean random variables is the only external result the proof uses.

[guided] We want to control the deviation of $\bar X_n$ from $\mu$, and this is the same as controlling the centered sum $\sum_{i=1}^n (X_i-\mu)$. For this reason we introduced $Y_i=X_i-\mu$. The random variables $Y_i$ are independent because subtracting the deterministic constant $\mu$ from each $X_i$ does not introduce dependence. They have mean zero since \begin{align*} \mathbb E[Y_i]=\mathbb E[X_i-\mu]=\mathbb E[X_i]-\mu=0. \end{align*} They are also bounded: from $a_i \le X_i \le b_i$ almost surely, we get \begin{align*} a_i-\mu \le Y_i \le b_i-\mu \end{align*} almost surely. The length of this interval is $b_i-a_i$, so the quantity that appears in the denominator of the exponent in Hoeffding's inequality is \begin{align*} D=\sum_{i=1}^n (b_i-a_i)^2. \end{align*}

Hoeffding's inequality for independent bounded zero-mean random variables applies to the family $Y_1,\dots,Y_n$ and yields, for every $t>0$, \begin{align*} \mathbb P\left(\sum_{i=1}^n Y_i \ge t\right) \le \exp\left(-\frac{2t^2}{D}\right). \end{align*} This controls only the upper tail.
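
As a purely illustrative instance of the upper-tail bound (the numbers are assumptions chosen for the example, not part of the theorem), take $n=100$ variables with $a_i=0$ and $b_i=1$, so that $D=100$, and let $t=10$, which corresponds to the sample mean exceeding $\mu$ by at least $0.1$. Then \begin{align*} \mathbb P\left(\sum_{i=1}^{100} Y_i \ge 10\right) \le \exp\left(-\frac{2\cdot 10^2}{100}\right)=e^{-2}\approx 0.135. \end{align*}
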
To control the lower tail, we apply exactly the same result to the random variables $-Y_1,\dots,-Y_n$. They are still independent, still have mean zero, and if $Y_i$ lies in an interval of length $b_i-a_i$, then $-Y_i$ lies in an interval of the same length. Therefore \begin{align*} \mathbb P\left(\sum_{i=1}^n Y_i \le -t\right) = \mathbb P\left(\sum_{i=1}^n (-Y_i) \ge t\right) \le \exp\left(-\frac{2t^2}{D}\right). \end{align*} Hoeffding's inequality for independent bounded zero-mean random variables is the only external result the proof uses. [/guided] [/step]

[step:Combine the one-sided bounds into a two-sided bound] For $t>0$, define the event $A_+(t)$ by \begin{align*} A_+(t):=\left\{\omega \in \Omega:\sum_{i=1}^n Y_i(\omega)\ge t\right\}. \end{align*} Define the event $A_-(t)$ by \begin{align*} A_-(t):=\left\{\omega \in \Omega:\sum_{i=1}^n Y_i(\omega)\le -t\right\}. \end{align*} Then \begin{align*} \left\{\omega \in \Omega:\left|\sum_{i=1}^n Y_i(\omega)\right|\ge t\right\} = A_+(t)\cup A_-(t). \end{align*} By finite subadditivity of probability and the one-sided bounds above, \begin{align*} \mathbb P\left(\left|\sum_{i=1}^n Y_i\right|\ge t\right) \le \mathbb P(A_+(t))+\mathbb P(A_-(t)) \le 2\exp\left(-\frac{2t^2}{D}\right). \end{align*} [/step]

[step:Choose the radius that makes the failure probability equal to $\beta$] Define the deterministic radius \begin{align*} r:=\sqrt{\frac{D}{2n^2}\log\frac{2}{\beta}}. \end{align*} Since $\beta \in (0,1)$ gives $\log(2/\beta)>\log 2>0$, and since $D>0$, we have $r>0$. Set $t:=nr$. Because $\bar X_n-\mu=\frac{1}{n}\sum_{i=1}^n (X_i-\mu)$, we have \begin{align*} \left\{|\bar X_n-\mu|\ge r\right\}=\left\{\left|\frac{1}{n}\sum_{i=1}^n (X_i-\mu)\right|\ge r\right\}. \end{align*} Using the definition of $Y_i$, this event is also \begin{align*} \left\{\left|\sum_{i=1}^n Y_i\right|\ge nr\right\}. \end{align*} Using the two-sided estimate with $t=nr$ gives \begin{align*} \mathbb P(|\bar X_n-\mu|\ge r) \le 2\exp\left(-\frac{2n^2r^2}{D}\right). 
\end{align*} By the definition of $r$, \begin{align*} 2\exp\left(-\frac{2n^2r^2}{D}\right)=2\exp\left(-\log\frac{2}{\beta}\right)=\beta. \end{align*} Therefore \begin{align*} \mathbb P(|\bar X_n-\mu|\ge r)\le \beta. \end{align*}

Taking complements gives \begin{align*} \mathbb P(|\bar X_n-\mu|< r)\ge 1-\beta. \end{align*} Since \begin{align*} \{|\bar X_n-\mu|<r\}\subseteq \{|\bar X_n-\mu|\le r\}, \end{align*} we obtain \begin{align*} \mathbb P(|\bar X_n-\mu|\le r)\ge 1-\beta. \end{align*} Substituting the definition of $D$ gives the claimed bound. [/step]

[step:Specialize the radius to a common interval] If $\mathbb P(a \le X_i \le b)=1$ for every $i \in \{1,\dots,n\}$, then we may take $a_i=a$ and $b_i=b$ for every $i$. Hence \begin{align*} D=\sum_{i=1}^n (b_i-a_i)^2=\sum_{i=1}^n (b-a)^2=n(b-a)^2. \end{align*} The general radius becomes \begin{align*} \sqrt{\frac{n(b-a)^2}{2n^2}\log\frac{2}{\beta}}=(b-a)\sqrt{\frac{\log(2/\beta)}{2n}}. \end{align*} Thus $r_n(\beta)=(b-a)\sqrt{\log(2/\beta)/(2n)}$ is a valid $(1-\beta)$ confidence radius. [/step]
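
The radius derived in the proof can be checked numerically. The following is a minimal sketch, not part of the original page: the function names `hoeffding_radius` and `empirical_coverage` are hypothetical, and the choice of Uniform(0, 1) samples (so $a_i=0$, $b_i=1$, $\mu=1/2$) is an assumption made only for illustration.

```python
import math
import random


def hoeffding_radius(ranges, beta):
    """Two-sided Hoeffding radius for the mean of independent bounded variables.

    ranges: list of (a_i, b_i) almost-sure bounds, one pair per variable.
    beta:   failure probability in (0, 1).
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    n = len(ranges)
    if n == 0:
        raise ValueError("need at least one variable")
    # D = sum of squared interval lengths, as in the proof.
    d = sum((b - a) ** 2 for a, b in ranges)
    # r = sqrt(D / (2 n^2) * log(2 / beta)); this is 0 in the degenerate case D = 0.
    return math.sqrt(d / (2 * n * n) * math.log(2.0 / beta))


def empirical_coverage(n, beta, trials, seed=0):
    """Fraction of trials with |mean - mu| <= r for Uniform(0, 1) samples.

    Illustrative assumption: every X_i is Uniform(0, 1), so mu = 0.5.
    """
    rng = random.Random(seed)
    r = hoeffding_radius([(0.0, 1.0)] * n, beta)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - 0.5) <= r:
            hits += 1
    return hits / trials
```

For example, `hoeffding_radius([(0.0, 1.0)] * 100, 0.05)` evaluates $\sqrt{\log(40)/200}\approx 0.136$, which agrees with the common-interval formula $(b-a)\sqrt{\log(2/\beta)/(2n)}$. The empirical coverage returned by `empirical_coverage` should be at least $1-\beta$ and is typically much closer to $1$, since the Hoeffding bound uses only the range of the variables and not their actual variance.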
