Testing Reduction for Estimation — Statement & Proof

Testing Reduction for Estimation (Theorem # 5889)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We convert any estimator $\hat T$ into a nearest-neighbour test among the separated target values $T(\theta_1),\dots,T(\theta_M)$. The separation condition implies that, under $\theta_j$, the test can make an error only when $\hat T(X)$ is at metric distance at least $a$ from $T(\theta_j)$. Since the loss dominates the metric, this testing error probability is bounded by the estimation risk divided by $a$. Maximizing over the finite parameter subset and then infimizing gives the claimed testing lower bound. [/proofplan] [step:Show that small estimation error forces the nearest-neighbour decoder to choose the true index] Let $\mathcal A$ denote the $\sigma$-algebra on $\mathcal X$, and let $\mathcal B(\mathcal T)$ denote the Borel $\sigma$-algebra induced by the metric $\rho$ on $\mathcal T$. Fix a measurable estimator \begin{align*} \hat T:(\mathcal X,\mathcal A) &\to (\mathcal T,\mathcal B(\mathcal T)). \end{align*} Let \begin{align*} \hat J_{\hat T}:\mathcal X &\to \{1,\dots,M\} \end{align*} be its nearest-neighbour decoder with the deterministic tie-breaking rule from the statement. For each $i \in \{1,\dots,M\}$, define the distance-to-target map $d_i:\mathcal X \to [0,\infty)$ by \begin{align*} d_i(x) := \rho(\hat T(x),T(\theta_i)). \end{align*} Since $\rho(\cdot,T(\theta_i)):(\mathcal T,\mathcal B(\mathcal T)) \to ([0,\infty),\mathcal B([0,\infty)))$ is continuous and hence Borel-measurable, and since $\hat T$ is measurable, each $d_i$ is $\mathcal A$-measurable. Because $\{1,\dots,M\}$ is finite and the tie-breaking rule is deterministic, every set $\{x \in \mathcal X : \hat J_{\hat T}(x)=i\}$ is obtained from finitely many measurable comparison sets of the form $\{d_i \le d_k\}$ and $\{d_i < d_k\}$; hence $\hat J_{\hat T}$ is measurable. Fix $j \in \{1,\dots,M\}$ and $x \in \mathcal X$ such that \begin{align*} \rho(\hat T(x),T(\theta_j)) < a. \end{align*} For every $i \in \{1,\dots,M\}$ with $i \ne j$, the triangle inequality for the metric $\rho$ gives \begin{align*} \rho(T(\theta_i),T(\theta_j)) \le \rho(T(\theta_i),\hat T(x))+\rho(\hat T(x),T(\theta_j)). \end{align*} Using the separation hypothesis and rearranging, we obtain \begin{align*} \rho(\hat T(x),T(\theta_i)) \ge \rho(T(\theta_i),T(\theta_j))-\rho(\hat T(x),T(\theta_j)) > 2a-a = a. \end{align*} Since $\rho(\hat T(x),T(\theta_j))<a$, the index $j$ gives a strictly smaller distance than every $i \ne j$. Therefore $\hat J_{\hat T}(x)=j$. [/step] [step:Contain the testing error event inside a large estimation error event] For each $j \in \{1,\dots,M\}$, define the testing error event \begin{align*} E_j := \{x \in \mathcal X : \hat J_{\hat T}(x) \ne j\} \end{align*} and the metric estimation error event \begin{align*} A_j := \{x \in \mathcal X : \rho(\hat T(x),T(\theta_j)) \ge a\}. \end{align*} The previous step proves the contrapositive implication \begin{align*} x \notin A_j \implies x \notin E_j. \end{align*} Hence \begin{align*} E_j \subset A_j. \end{align*} Therefore, for every $j \in \{1,\dots,M\}$, \begin{align*} \mathbb P_j(\hat J_{\hat T}(X) \ne j) \le \mathbb P_j(\rho(\hat T(X),T(\theta_j)) \ge a). \end{align*} [guided] Fix $j \in \{1,\dots,M\}$. We want to compare two events under the law $\nu_{\theta_j}$: the testing error event and the event that the estimator is far from the true target value. Define \begin{align*} E_j := \{x \in \mathcal X : \hat J_{\hat T}(x) \ne j\} \end{align*} and \begin{align*} A_j := \{x \in \mathcal X : \rho(\hat T(x),T(\theta_j)) \ge a\}. \end{align*} Suppose $x \in \mathcal X$ satisfies \begin{align*} \rho(\hat T(x),T(\theta_j)) < a. \end{align*} For any $i \in \{1,\dots,M\}$ with $i \ne j$, the triangle inequality for the metric $\rho$ gives \begin{align*} \rho(T(\theta_i),T(\theta_j)) \le \rho(T(\theta_i),\hat T(x))+ \rho(\hat T(x),T(\theta_j)). \end{align*} Using the separation hypothesis $\rho(T(\theta_i),T(\theta_j)) \ge 2a$ and subtracting the strict bound $\rho(\hat T(x),T(\theta_j)) < a$, we obtain \begin{align*} \rho(\hat T(x),T(\theta_i)) \ge \rho(T(\theta_i),T(\theta_j))-\rho(\hat T(x),T(\theta_j)) > 2a-a = a. \end{align*} Thus $j$ is strictly closer to $\hat T(x)$ than every index $i \ne j$, so the deterministic tie-breaking rule is not reached and $\hat J_{\hat T}(x)=j$. In set notation, this proves \begin{align*} x \notin A_j \implies x \notin E_j. \end{align*} Taking contrapositives gives \begin{align*} x \in E_j \implies x \in A_j, \end{align*} so \begin{align*} E_j \subset A_j. \end{align*} Probability is monotone with respect to inclusion of measurable events, and therefore \begin{align*} \mathbb P_j(\hat J_{\hat T}(X) \ne j) = \mathbb P_j(E_j) \le \mathbb P_j(A_j) = \mathbb P_j(\rho(\hat T(X),T(\theta_j)) \ge a). \end{align*} This is the reduction: a testing mistake is possible only when the estimator has already made an error of size at least $a$ in the target metric. [/guided] [/step] [step:Bound the testing error probability by the estimation risk] For $j \in \{1,\dots,M\}$, let $\mathbb P_j:=\mathbb P_{\theta_j}$ be the probability measure induced by $X\sim\nu_{\theta_j}$ on $(\mathcal X,\mathcal A)$, and write $\mathbb E_j:=\mathbb E_{\theta_j}$ for expectation with respect to $\mathbb P_j$. For each $j \in \{1,\dots,M\}$, define the map $Y_j:\mathcal X \to [0,\infty]$ by \begin{align*} Y_j(x) := L(\hat T(x),T(\theta_j)). \end{align*} The estimator $\hat T$ is $\mathcal A/\mathcal B(\mathcal T)$-measurable, the map $x \mapsto T(\theta_j)$ from $(\mathcal X,\mathcal A)$ to $(\mathcal T,\mathcal B(\mathcal T))$ is constant and hence measurable, and $L$ is $\mathcal B(\mathcal T)\otimes\mathcal B(\mathcal T)$-measurable by hypothesis; therefore $Y_j$ is a non-negative [random variable](/page/Random%20Variable). Since $L(t,t') \ge \rho(t,t')$ for all $t,t' \in \mathcal T$, we have the event inclusion \begin{align*} \{x \in \mathcal X : \rho(\hat T(x),T(\theta_j)) \ge a\} \subset \{x \in \mathcal X : Y_j(x) \ge a\}. \end{align*} Because $Y_j \ge 0$, the pointwise inequality \begin{align*} Y_j(x) \ge a\,\mathbb 1_{\{Y_j \ge a\}}(x) \end{align*} holds for every $x \in \mathcal X$. Taking expectation under $\mathbb P_j$ gives \begin{align*} \mathbb E_j[L(\hat T(X),T(\theta_j))] = \mathbb E_j[Y_j(X)] \ge a\,\mathbb P_j(Y_j(X) \ge a). \end{align*} Combining this with the event inclusions from this step and the previous step yields \begin{align*} \mathbb E_j[L(\hat T(X),T(\theta_j))] \ge a\,\mathbb P_j(\hat J_{\hat T}(X) \ne j). \end{align*} [/step] [step:Maximize over the finite testing family and infimize over estimators] Since $\{\theta_1,\dots,\theta_M\} \subset \Theta$, for the fixed estimator $\hat T$ we have \begin{align*} \sup_{\theta \in \Theta} \mathbb E_\theta[L(\hat T(X),T(\theta))] \ge \max_{1 \le j \le M} \mathbb E_j[L(\hat T(X),T(\theta_j))]. \end{align*} Using the estimate from the previous step for each $j$ gives \begin{align*} \max_{1 \le j \le M} \mathbb E_j[L(\hat T(X),T(\theta_j))] \ge a \max_{1 \le j \le M} \mathbb P_j(\hat J_{\hat T}(X) \ne j). \end{align*} The decoder $\hat J_{\hat T}$ is one measurable test from $\mathcal X$ to $\{1,\dots,M\}$, so \begin{align*} \max_{1 \le j \le M} \mathbb P_j(\hat J_{\hat T}(X) \ne j) \ge \inf_{\hat J} \max_{1 \le j \le M} \mathbb P_j(\hat J(X) \ne j), \end{align*} where the infimum is over all measurable tests $\hat J:\mathcal X \to \{1,\dots,M\}$. Therefore every measurable estimator $\hat T$ satisfies \begin{align*} \sup_{\theta \in \Theta} \mathbb E_\theta[L(\hat T(X),T(\theta))] \ge a \inf_{\hat J} \max_{1 \le j \le M} \mathbb P_j(\hat J(X) \ne j). \end{align*} Taking the infimum over all measurable estimators $\hat T:\mathcal X \to \mathcal T$ gives \begin{align*} \inf_{\hat T} \sup_{\theta \in \Theta} \mathbb E_\theta[L(\hat T(X),T(\theta))] \ge a \inf_{\hat J} \max_{1 \le j \le M} \mathbb P_j(\hat J(X) \ne j). \end{align*} This is the desired inequality. [/step]

Prerequisites (0/6 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Definitions & Concepts

Explore Further

What brings you to Androma?

Start with a route through the knowledge graph.

Testing Reduction for Estimation (Theorem # 5889)

Discussion

Proof

Prerequisites (0/6 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Testing Reduction for Estimation (Theorem # 5889)

Discussion

Proof

Prerequisites (0/6 completed)

Prerequisites Graph

Explore Further