Local Packing Principle — Statement & Proof

Local Packing Principle (Theorem # 5901)

Theorem



Proof

[proofplan] We reduce estimation under the metric $d$ to testing which point of the finite packing $\{\theta_1,\dots,\theta_M\}$ generated the observation. The separation condition $d(\theta_j,\theta_k)>2s$ for $j\ne k$ ensures that any estimate at distance less than $s$ from the true packing point identifies the correct packing index. A Fano-type testing bound with auxiliary measure $P_{\theta_0}$ then lower bounds the average testing error in terms of the average Kullback--Leibler divergence to $P_{\theta_0}$, and the local $\rho$-ball condition bounds this average divergence by $nr^2\le \alpha\log M$. [/proofplan] [step:Reduce the minimax risk to the finite packing] Let $\hat\theta:\mathcal X\to\Theta$ be an arbitrary measurable estimator. For each $j\in\{1,\dots,M\}$, define the loss [random variable](/page/Random%20Variable) $L_j:\mathcal X\to[0,\infty)$ by \begin{align*} L_j(x):=d(\hat\theta(x),\theta_j). \end{align*} Since $\{\theta_1,\dots,\theta_M\}\subset\Theta$, \begin{align*} \sup_{\theta\in\Theta}\mathbb E_\theta[d(\hat\theta,\theta)] \ge \max_{1\le j\le M}\mathbb E_{\theta_j}[L_j] \ge \frac{1}{M}\sum_{j=1}^M \mathbb E_{\theta_j}[L_j]. \end{align*} For every nonnegative random variable $Y$ on a probability space, the pointwise inequality $Y\ge s\,\mathbb 1_{\{Y\ge s\}}$ gives $\mathbb E[Y]\ge s\,\mathbb P(Y\ge s)$. Applying this under $P_{\theta_j}$ to $Y=L_j$ yields \begin{align*} \sup_{\theta\in\Theta}\mathbb E_\theta[d(\hat\theta,\theta)] \ge \frac{s}{M}\sum_{j=1}^M P_{\theta_j}\bigl(L_j\ge s\bigr). \end{align*} [/step] [step:Decode an index from the estimator] Define the measurable nearest-packing decoder $\hat J:\mathcal X\to\{1,\dots,M\}$ by choosing the smallest index among minimizers: \begin{align*} \hat J(x) := \min\operatorname*{arg\,min}_{1\le k\le M} d(\hat\theta(x),\theta_k). \end{align*} If $\hat J(x)\ne j$, then $d(\hat\theta(x),\theta_{\hat J(x)})\le d(\hat\theta(x),\theta_j)$ by definition of $\hat J$. 
If also $d(\hat\theta(x),\theta_j)<s$, then the triangle inequality gives \begin{align*} d(\theta_j,\theta_{\hat J(x)}) \le d(\theta_j,\hat\theta(x))+d(\hat\theta(x),\theta_{\hat J(x)}) < s+s = 2s, \end{align*} contradicting $d(\theta_j,\theta_k)>2s$ for $j\ne k$. Therefore \begin{align*} \{\hat J\ne j\}\subseteq \{L_j\ge s\} \end{align*} as events in $\mathcal X$, so $P_{\theta_j}(\hat J\ne j)\le P_{\theta_j}(L_j\ge s)$ for every $j$, and hence, by the previous step, \begin{align*} \sup_{\theta\in\Theta}\mathbb E_\theta[d(\hat\theta,\theta)] \ge s\,\frac{1}{M}\sum_{j=1}^M P_{\theta_j}(\hat J\ne j). \end{align*} [guided] The estimator $\hat\theta$ takes values in the whole parameter space $\Theta$, not necessarily in the finite set $\{\theta_1,\dots,\theta_M\}$. To compare estimation with testing, we convert $\hat\theta$ into a finite-valued decision rule. Define the map $\hat J:\mathcal X\to\{1,\dots,M\}$ by \begin{align*} \hat J(x):=\min\operatorname*{arg\,min}_{1\le k\le M} d(\hat\theta(x),\theta_k). \end{align*} Breaking ties by taking the smallest index makes $\hat J$ well-defined. Under the standard convention that estimators are measurable as maps into the Borel space associated with the [metric space](/page/Metric%20Space) $(\Theta,d)$, each function $x\mapsto d(\hat\theta(x),\theta_k)$ is measurable, and therefore the finite-valued nearest-packing rule $\hat J$ is measurable. Now fix $j\in\{1,\dots,M\}$ and suppose that $\hat J(x)\ne j$. The defining property of $\hat J$ gives \begin{align*} d(\hat\theta(x),\theta_{\hat J(x)}) \le d(\hat\theta(x),\theta_j). \end{align*} If, in addition, the estimate $\hat\theta(x)$ were at distance strictly less than $s$ from the true packing point $\theta_j$, then the triangle inequality would imply \begin{align*} d(\theta_j,\theta_{\hat J(x)}) \le d(\theta_j,\hat\theta(x))+d(\hat\theta(x),\theta_{\hat J(x)}) < s+s = 2s. \end{align*} This is impossible because distinct packing points are separated by more than $2s$. Hence an incorrect decoded index forces loss at least $s$: \begin{align*} \{\hat J\ne j\}\subseteq \{d(\hat\theta,\theta_j)\ge s\}. 
\end{align*} Combining this inclusion with the elementary bound $Y\ge s\,\mathbb 1_{\{Y\ge s\}}$ for the nonnegative loss variable $Y=d(\hat\theta,\theta_j)$ gives \begin{align*} \mathbb E_{\theta_j}[d(\hat\theta,\theta_j)] \ge s\,P_{\theta_j}(\hat J\ne j). \end{align*} Averaging over $j=1,\dots,M$ and using that the supremum risk over $\Theta$ dominates the average risk over the finite subset $\{\theta_1,\dots,\theta_M\}$ yields \begin{align*} \sup_{\theta\in\Theta}\mathbb E_\theta[d(\hat\theta,\theta)] \ge s\,\frac{1}{M}\sum_{j=1}^M P_{\theta_j}(\hat J\ne j). \end{align*} [/guided] [/step] [step:Apply the finite testing bound with centre $P_{\theta_0}$] We use the following finite testing inequality: for any $M\ge 2$, any probability measures $P_1,\dots,P_M,Q$ on $(\mathcal X,\mathcal A)$, and any measurable decision rule $\psi:\mathcal X\to\{1,\dots,M\}$, \begin{align*} \frac{1}{M}\sum_{j=1}^M P_j(\psi\ne j)\ge 1-\frac{\frac{1}{M}\sum_{j=1}^M D(P_j\|Q)+\log 2}{\log M}. \end{align*} This is the standard Fano inequality with an auxiliary reference measure $Q$, where $D(P\|Q)$ denotes the Kullback--Leibler divergence of $P$ from $Q$. Applying it with $P_j=P_{\theta_j}$, $Q=P_{\theta_0}$, and $\psi=\hat J$, we obtain \begin{align*} \frac{1}{M}\sum_{j=1}^M P_{\theta_j}(\hat J\ne j)\ge 1-\frac{\frac{1}{M}\sum_{j=1}^M D(P_{\theta_j}\|P_{\theta_0})+\log 2}{\log M}. \end{align*} [/step] [step:Use the local information radius to bound the average divergence] For each $j\in\{1,\dots,M\}$, the information domination hypothesis applied to the pair $(\theta_j,\theta_0)$ gives \begin{align*} D(P_{\theta_j}\|P_{\theta_0}) \le n\,\rho(\theta_j,\theta_0)^2. \end{align*} Since $\rho(\theta_j,\theta_0)\le r$, each summand satisfies \begin{align*} D(P_{\theta_j}\|P_{\theta_0}) \le nr^2. \end{align*} Therefore, averaging over $j$ and using $nr^2\le\alpha\log M$, \begin{align*} \frac{1}{M}\sum_{j=1}^M D(P_{\theta_j}\|P_{\theta_0}) \le nr^2 \le \alpha\log M. 
\end{align*} Substituting this estimate into the testing bound gives \begin{align*} \frac{1}{M}\sum_{j=1}^M P_{\theta_j}(\hat J\ne j) \ge 1-\alpha-\frac{\log 2}{\log M}. \end{align*} [/step] [step:Conclude the minimax lower bound] Combining the estimation-to-testing reduction with the preceding lower bound on the average testing error gives, for every measurable estimator $\hat\theta:\mathcal X\to\Theta$, \begin{align*} \sup_{\theta\in\Theta}\mathbb E_\theta[d(\hat\theta,\theta)] \ge s\left(1-\alpha-\frac{\log 2}{\log M}\right). \end{align*} Taking the infimum over all measurable estimators $\hat\theta:\mathcal X\to\Theta$ preserves the inequality and yields \begin{align*} \inf_{\hat\theta:\mathcal X\to\Theta}\sup_{\theta\in\Theta}\mathbb E_\theta[d(\hat\theta,\theta)] \ge s\left(1-\alpha-\frac{\log 2}{\log M}\right). \end{align*} This is the desired local packing lower bound. [/step]
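
As an informal illustration of the proof above, the following Python sketch implements the nearest-packing decoder $\hat J$ with smallest-index tie-breaking, checks the inclusion $\{\hat J\ne j\}\subseteq\{d(\hat\theta,\theta_j)\ge s\}$ on a toy example, and evaluates the final bound $s\left(1-\alpha-\frac{\log 2}{\log M}\right)$. Everything concrete here is an assumption made for illustration and is not part of the theorem: the function names `nearest_packing_index` and `local_packing_lower_bound`, the real-line packing $\{0,3,6\}$ with $s=1$, the absolute-value metric, and the 0-based indexing (the proof uses indices $1,\dots,M$).

```python
import math
from typing import Callable, Sequence


def nearest_packing_index(
    estimate: float,
    packing: Sequence[float],
    dist: Callable[[float, float], float],
) -> int:
    """Decoder J-hat: smallest 0-based index minimising dist(estimate, packing[k])."""
    distances = [dist(estimate, theta) for theta in packing]
    # list.index returns the first occurrence, i.e. the smallest-index minimiser.
    return distances.index(min(distances))


def local_packing_lower_bound(s: float, alpha: float, M: int) -> float:
    """Evaluate s * (1 - alpha - log 2 / log M); needs M >= 2 so that log M > 0."""
    if M < 2:
        raise ValueError("M must be at least 2")
    return s * (1.0 - alpha - math.log(2) / math.log(M))


def abs_dist(a: float, b: float) -> float:
    """Hypothetical metric d on the real line."""
    return abs(a - b)


# Hypothetical toy packing: pairwise distances are at least 3 > 2s with s = 1.
packing = [0.0, 3.0, 6.0]
s = 1.0

# Check on a grid of candidate estimates that a wrong decoded index
# forces loss at least s with respect to the point that was not decoded.
grid = [k / 10 for k in range(-20, 81)]
inclusion_holds = all(
    abs_dist(t, packing[j]) >= s
    for t in grid
    for j in range(len(packing))
    if nearest_packing_index(t, packing, abs_dist) != j
)

print(inclusion_holds)
print(local_packing_lower_bound(s=1.0, alpha=0.25, M=16))
```

With $\alpha=1/4$ and $M=16$ one has $\log 2/\log 16 = 1/4$, so the bound evaluates to $s/2$ up to floating-point rounding. The bound is informative only when $\alpha+\log 2/\log M<1$.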


