Nonexistence of Fully Adaptive Honest Supremum-Norm Confidence Bands

Nonexistence of Fully Adaptive Honest Supremum-Norm Confidence Bands (Theorem # 6359)

Theorem



Proof

[proofplan] We argue by contradiction. If an honest band is also narrow at the smoother rate around $f_0$, then it cannot contain both $f_0$ and the rough perturbation $f_n$, because their supremum-norm separation is much larger than $r_n(s_2)$. This converts the band into a test between $P_{f_0}^{(n)}$ and $P_{f_n}^{(n)}$ with asymptotic sum of errors at most $2\alpha$. The total-variation lower bound for testing gives every such test sum of errors at least $\eta$, contradicting $\eta>2\alpha$; the theorem has been stated directly in terms of the membership, separation, and testing-distance assumptions needed for this argument. [/proofplan] [step:Assume an adaptive honest band exists] Suppose, toward a contradiction, that there exist a sequence of random confidence bands $C_n$ and a constant $M<\infty$ satisfying the two asserted properties. We use the standard measurability convention for confidence bands: for each deterministic signal $g$ in the model class, the event $\{g\in C_n\}$ is measurable, and the event $\{\operatorname{diam}_\infty(C_n)>u\}$ is measurable for each real number $u>0$. Since $f_0\in\mathcal{F}_{s_2}\subset\mathcal{F}_{s_1}$, honesty over $\mathcal{F}_{s_1}$ implies that the noncoverage probabilities at $f_0$ have limit superior at most $\alpha$. Since $f_n\in\mathcal{F}_{s_1}$ for all sufficiently large $n$, the same honesty condition implies that the noncoverage probabilities at $f_n$ have limit superior at most $\alpha$. Finally, the diameter condition for the smooth class $\mathcal{F}_{s_2}$, applied at $f_0$, implies that the $P_{f_0,n}$-probability of the event $\operatorname{diam}_\infty(C_n)>M r_n(s_2)$ tends to $0$. [/step] [step:Use the separation of $f_0$ and $f_n$ to exclude simultaneous containment in a narrow band] Define the separation \begin{align*} \Delta_n:=\|f_n-f_0\|_\infty. \end{align*} By hypothesis, \begin{align*} \frac{\Delta_n}{r_n(s_2)}\to\infty. 
\end{align*} Hence, for all sufficiently large $n$, \begin{align*} \Delta_n>M r_n(s_2). \end{align*} On the event \begin{align*} A_n:=\{f_0\in C_n\}\cap\{\operatorname{diam}_\infty(C_n)\leq M r_n(s_2)\}, \end{align*} the band cannot also contain $f_n$. Indeed, if both $f_0$ and $f_n$ belonged to $C_n$, then the definition of $\operatorname{diam}_\infty(C_n)$ would imply \begin{align*} \Delta_n=\|f_n-f_0\|_\infty\leq \operatorname{diam}_\infty(C_n)\leq M r_n(s_2), \end{align*} contradicting $\Delta_n>M r_n(s_2)$. [guided] The role of the bump construction is to produce two signals that are statistically hard to distinguish but geometrically far apart in supremum norm at the smoother confidence-band scale. We formalize the geometric part first. Define \begin{align*} \Delta_n:=\|f_n-f_0\|_\infty. \end{align*} The hypothesis gives \begin{align*} \frac{\Delta_n}{r_n(s_2)}\to\infty. \end{align*} Since $M<\infty$ is fixed, this implies that, for all sufficiently large $n$, \begin{align*} \Delta_n>M r_n(s_2). \end{align*} Now consider the event \begin{align*} A_n:=\{f_0\in C_n\}\cap\{\operatorname{diam}_\infty(C_n)\leq M r_n(s_2)\}. \end{align*} On this event, the band contains $f_0$ and has supremum-norm diameter at most $M r_n(s_2)$. If $f_n$ also belonged to $C_n$, then the pair $f_0,f_n$ would be among the functions over which the diameter is computed, so \begin{align*} \|f_n-f_0\|_\infty\leq \operatorname{diam}_\infty(C_n)\leq M r_n(s_2). \end{align*} This contradicts $\|f_n-f_0\|_\infty=\Delta_n>M r_n(s_2)$. Hence, on $A_n$, the event $\{f_n\in C_n\}$ cannot occur. [/guided] [/step] [step:Convert the band into a test between $f_0$ and $f_n$] For all sufficiently large $n$, let $\mathcal{Y}_n$ denote the observation space of the Gaussian white noise experiment and define the measurable test $\varphi_n:\mathcal{Y}_n\to\{0,1\}$ by \begin{align*} \varphi_n(Y)=\mathbb{1}_{\{f_n\in C_n(Y)\}}. 
\end{align*} We interpret $\varphi_n=1$ as rejection of $H_0:f=f_0$ in favor of $H_1:f=f_n$. Under $P_{f_n,n}$, the type II error is \begin{align*} P_{f_n,n}(\varphi_n=0) = P_{f_n,n}(f_n\notin C_n), \end{align*} so honesty at $f_n$, established in the first step, gives \begin{align*} \limsup_{n\to\infty}P_{f_n,n}(\varphi_n=0)\leq \alpha. \end{align*} For the type I error, the preceding step shows that, for all sufficiently large $n$, the events $A_n$ and $\{f_n\in C_n\}$ are disjoint. Taking the complement of $A_n$ gives the event inclusion \begin{align*} \{\varphi_n=1\} =\{f_n\in C_n\} \subset \{f_0\notin C_n\}\cup\{\operatorname{diam}_\infty(C_n)>M r_n(s_2)\}. \end{align*} Therefore \begin{align*} P_{f_0,n}(\varphi_n=1) \leq P_{f_0,n}(f_0\notin C_n) + P_{f_0,n}\left(\operatorname{diam}_\infty(C_n)>M r_n(s_2)\right). \end{align*} By the first step, the first term on the right has limit superior at most $\alpha$ and the second tends to $0$, hence \begin{align*} \limsup_{n\to\infty}P_{f_0,n}(\varphi_n=1)\leq \alpha. \end{align*} Since the limit superior of a sum is at most the sum of the limits superior, combining the two error bounds gives \begin{align*} \limsup_{n\to\infty}\left(P_{f_0,n}(\varphi_n=1)+P_{f_n,n}(\varphi_n=0)\right)\leq 2\alpha. \end{align*} [/step] [step:Apply the total-variation testing lower bound] For any measurable space $\mathcal{Y}$, any measurable test $\varphi:\mathcal{Y}\to\{0,1\}$, and any two probability measures $P,Q$ on $\mathcal{Y}$, the defining variational bound for total variation distance, applied to the event $\{y\in\mathcal{Y}:\varphi(y)=1\}$, gives \begin{align*} |P(\varphi=1)-Q(\varphi=1)|\leq \|P-Q\|_{\mathrm{TV}}. \end{align*} In particular, $P(\varphi=1)-Q(\varphi=1)\geq -\|P-Q\|_{\mathrm{TV}}$. Since $Q(\varphi=0)=1-Q(\varphi=1)$, we have the identity \begin{align*} P(\varphi=1)+Q(\varphi=0)=1+P(\varphi=1)-Q(\varphi=1). \end{align*} The preceding lower bound for $P(\varphi=1)-Q(\varphi=1)$ therefore gives \begin{align*} P(\varphi=1)+Q(\varphi=0)\geq 1-\|P-Q\|_{\mathrm{TV}}. \end{align*} Applying this with $P=P_{f_0,n}$, $Q=P_{f_n,n}$, and $\varphi=\varphi_n$, and using the symmetry of total variation distance together with $\|P_{f_n,n}-P_{f_0,n}\|_{\mathrm{TV}}\leq 1-\eta$, yields \begin{align*} P_{f_0,n}(\varphi_n=1)+P_{f_n,n}(\varphi_n=0)\geq \eta \end{align*} for all sufficiently large $n$. 
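A small numerical check of the inequality $P(\varphi=1)+Q(\varphi=0)\geq 1-\|P-Q\|_{\mathrm{TV}}$ may help. The Python sketch below is illustrative only and is not part of the proof: it replaces the Gaussian white noise experiment by two hypothetical distributions on a three-point space, enumerates every deterministic test, and compares the smallest error sum with $1-\|P-Q\|_{\mathrm{TV}}$. The helper names `tv_distance` and `min_error_sum` are hypothetical. On a finite space the minimum is attained by the test that rejects exactly where $Q$ puts more mass than $P$, so the bound is sharp there.

```python
from itertools import product


def tv_distance(p, q):
    """Total variation distance sup_A |P(A) - Q(A)| = (1/2) * sum_i |p_i - q_i|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def min_error_sum(p, q):
    """Smallest P(phi = 1) + Q(phi = 0) over all deterministic tests phi on a finite space."""
    best = float("inf")
    # Each 0/1 vector phi encodes one test, i.e. one rejection region {phi = 1}.
    for phi in product((0, 1), repeat=len(p)):
        type_one = sum(pi for pi, t in zip(p, phi) if t == 1)  # reject H_0 although P is true
        type_two = sum(qi for qi, t in zip(q, phi) if t == 0)  # accept H_0 although Q is true
        best = min(best, type_one + type_two)
    return best


# Hypothetical stand-ins for P_{f_0,n} and P_{f_n,n} on a three-point space.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(min_error_sum(p, q), 1 - tv_distance(p, q))
```

Here $\|P-Q\|_{\mathrm{TV}}=0.3$, so no test achieves an error sum below $0.7$, and the two printed numbers agree up to floating-point rounding.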
[guided] We now use the statistical indistinguishability assumption. The relevant elementary fact is that total variation distance controls the smallest achievable sum of testing errors. Let $\mathcal{Y}$ be the common observation space, let $P$ and $Q$ be probability measures on $\mathcal{Y}$, and let $\varphi:\mathcal{Y}\to\{0,1\}$ be any measurable test. By the definition of total variation distance as the supremum discrepancy over measurable events, applied to the event $\{\varphi=1\}$, we have \begin{align*} |P(\varphi=1)-Q(\varphi=1)|\leq \|P-Q\|_{\mathrm{TV}}. \end{align*} In particular, \begin{align*} P(\varphi=1)-Q(\varphi=1)\geq -\|P-Q\|_{\mathrm{TV}}. \end{align*} Substituting $Q(\varphi=0)=1-Q(\varphi=1)$ gives the identity \begin{align*} P(\varphi=1)+Q(\varphi=0)=1+P(\varphi=1)-Q(\varphi=1). \end{align*} Using the lower bound for $P(\varphi=1)-Q(\varphi=1)$ now yields \begin{align*} P(\varphi=1)+Q(\varphi=0)\geq 1-\|P-Q\|_{\mathrm{TV}}. \end{align*} This inequality says that if two probability measures have total variation distance bounded away from $1$, then no test can make both the type I and type II errors arbitrarily small. Applying it with $P=P_{f_0,n}$, $Q=P_{f_n,n}$, and $\varphi=\varphi_n$ gives \begin{align*} P_{f_0,n}(\varphi_n=1)+P_{f_n,n}(\varphi_n=0) \geq 1-\|P_{f_0,n}-P_{f_n,n}\|_{\mathrm{TV}}. \end{align*} For all sufficiently large $n$, the hypothesis gives $\|P_{f_n,n}-P_{f_0,n}\|_{\mathrm{TV}}\leq 1-\eta$, and since total variation distance is symmetric in its two arguments, the right-hand side is at least $1-(1-\eta)=\eta$. Hence \begin{align*} P_{f_0,n}(\varphi_n=1)+P_{f_n,n}(\varphi_n=0) \geq \eta.
\end{align*} [/guided] [/step] [step:Derive the contradiction] The constructed tests satisfy \begin{align*} \limsup_{n\to\infty}\left(P_{f_0,n}(\varphi_n=1)+P_{f_n,n}(\varphi_n=0)\right)\leq 2\alpha, \end{align*} while the total-variation lower bound gives \begin{align*} P_{f_0,n}(\varphi_n=1)+P_{f_n,n}(\varphi_n=0)\geq \eta \end{align*} for all sufficiently large $n$. Since the second inequality holds for all sufficiently large $n$, the limit superior of its left-hand side is at least $\eta$; combining this with the first inequality yields \begin{align*} \eta\leq \limsup_{n\to\infty}\left(P_{f_0,n}(\varphi_n=1)+P_{f_n,n}(\varphi_n=0)\right)\leq 2\alpha, \end{align*} contradicting the hypothesis $\eta>2\alpha$. Therefore no such sequence of confidence bands $C_n$ and constant $M<\infty$ can exist. The argument used only honesty over $\mathcal{F}_{s_1}$ and the diameter condition for the smooth class ($i=2$); it never invoked the diameter condition for $i=1$. Hence even this weaker requirement cannot be met, and a fortiori the full adaptivity requirement, which adds the $i=1$ condition, cannot be satisfied either. [/step]
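The proof takes the separation $\Delta_n/r_n(s_2)\to\infty$ and the bound $\|P_{f_n,n}-P_{f_0,n}\|_{\mathrm{TV}}\leq 1-\eta$ as hypotheses. The Python sketch below illustrates how a single bump perturbation can satisfy both at once. It rests on assumptions that are not taken from the theorem statement: observations $dY(t)=f(t)\,dt+n^{-1/2}\,dW(t)$ on $[0,1]$, the benchmark rate $r_n(s)=n^{-s/(2s+1)}$ with logarithmic factors ignored, and a perturbation $f_n-f_0=c\,h^{s_1}K((t-1/2)/h)$ with the triangular kernel $K(u)=\max(0,1-|u|)$ and bandwidth $h=n^{-1/(2s_1+1)}$. In this Gaussian experiment the total variation distance between the two signals equals $2\Phi(\sqrt{n}\,\|f_n-f_0\|_2/2)-1$, exactly as in a one-dimensional Gaussian location model. The names `rate`, `bump_experiment`, and the constant `c` are hypothetical.

```python
import math


def rate(n, s):
    """Benchmark rate r_n(s) = n^{-s/(2s+1)} (assumption: log factors ignored)."""
    return n ** (-s / (2 * s + 1))


def bump_experiment(n, s1, s2, c=1.0):
    """Return (Delta_n / r_n(s2), TV distance) for a hypothetical triangular bump.

    Assumes dY = f dt + n^{-1/2} dW on [0, 1] and
    f_n - f_0 = c * h^{s1} * K((t - 1/2) / h) with K(u) = max(0, 1 - |u|),
    so that ||K||_inf = 1 and ||K||_2^2 = 2/3 (valid while h <= 1/2).
    """
    h = n ** (-1 / (2 * s1 + 1))                       # bandwidth tuned to smoothness s1
    delta_sup = c * h ** s1                            # ||f_n - f_0||_inf (bump height)
    delta_l2 = c * h ** (s1 + 0.5) * math.sqrt(2 / 3)  # ||f_n - f_0||_2
    snr = math.sqrt(n) * delta_l2                      # signal-to-noise ratio of the shift
    tv = math.erf(snr / (2 * math.sqrt(2)))            # equals 2 * Phi(snr / 2) - 1
    return delta_sup / rate(n, s2), tv


for n in (10**2, 10**4, 10**6, 10**8):
    ratio, tv = bump_experiment(n, s1=0.5, s2=1.0)
    print(f"n={n:>9}  Delta_n/r_n(s2)={ratio:7.3f}  TV={tv:.4f}  eta=1-TV={1 - tv:.4f}")
```

With $s_1=1/2$ and $s_2=1$ the ratio grows like $n^{1/12}$, slowly but without bound, while $\sqrt{n}\,\|f_n-f_0\|_2$, and hence the total variation distance, stays fixed; any $\alpha<\eta/2$ then produces the contradiction of the last step. The choice $h=n^{-1/(2s_1+1)}$ is what balances the sup-norm height $h^{s_1}$ against the $L^2$ size $h^{s_1+1/2}$ seen by the likelihood ratio. The triangular kernel is only Lipschitz, which is why the example keeps $s_1\leq 1$.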
