Le Cam Two-Point Lower Bound — Statement & Proof

Le Cam Two-Point Lower Bound (Theorem # 6297)

Theorem


Discussion

Proof

[proofplan] We convert the estimator into a binary test between the two product laws $Q_0=P_0^{\otimes n}$ and $Q_1=P_1^{\otimes n}$ by selecting the closer of the two target values. The separation assumption forces the estimation error to be at least $s$ whenever this induced test chooses the wrong alternative. We then prove the elementary two-point testing inequality that the sum of the two error probabilities is at least $1-\operatorname{TV}(Q_0,Q_1)$, and finally pass from the average of the two endpoint risks to the supremum risk over $\mathcal P$. [/proofplan]

[step:Introduce endpoint notation and reduce the supremum risk to the two endpoint risks] Define the two target points $\theta_0,\theta_1\in\Psi$ by \begin{align*} \theta_i:=\psi(P_i), \qquad i\in\{0,1\}. \end{align*} Define the endpoint risks $R_i\in[0,\infty]$ of the estimator $\hat\psi_n$ by \begin{align*} R_i:=\int_{\mathcal X^n} d(\hat\psi_n(x),\theta_i)\,dQ_i(x), \qquad i\in\{0,1\}. \end{align*} Since $P_0,P_1\in\mathcal P$, the supremum risk dominates the larger endpoint risk: \begin{align*} \sup_{P\in\mathcal P}\mathbb E_{P^{\otimes n}}\!\left[d(\hat\psi_n,\psi(P))\right] \ge \max\{R_0,R_1\}. \end{align*} The maximum dominates the arithmetic mean, so \begin{align*} \sup_{P\in\mathcal P}\mathbb E_{P^{\otimes n}}\!\left[d(\hat\psi_n,\psi(P))\right] \ge \frac{R_0+R_1}{2}. \end{align*} [/step]

[step:Build the closest-target test induced by the estimator] Since $\hat\psi_n:\mathcal X^n\to\Psi$ is an estimator, it is $\mathcal A^{\otimes n}$-to-Borel measurable. The maps $y\mapsto d(y,\theta_0)$ and $y\mapsto d(y,\theta_1)$ from $\Psi$ to $\mathbb R$ are continuous (indeed $1$-Lipschitz, by the triangle inequality), hence Borel measurable. Each composition $x\mapsto d(\hat\psi_n(x),\theta_i)$ is therefore $\mathcal A^{\otimes n}$-measurable, so the set \begin{align*} A_0:=\{x\in\mathcal X^n: d(\hat\psi_n(x),\theta_0)\le d(\hat\psi_n(x),\theta_1)\} \end{align*} belongs to $\mathcal A^{\otimes n}$. Define \begin{align*} A_1:=\mathcal X^n\setminus A_0. \end{align*} Then $A_1\in\mathcal A^{\otimes n}$ as well. 
Define the induced test $\varphi:\mathcal X^n\to\{0,1\}$ by setting $\varphi(x)=0$ for $x\in A_0$ and $\varphi(x)=1$ for $x\in A_1$. The test $\varphi$ selects the target value closer to $\hat\psi_n(x)$, with ties assigned to $0$. On $A_1$, we have $d(\hat\psi_n(x),\theta_1)<d(\hat\psi_n(x),\theta_0)$. By the separation assumption, \begin{align*} 2s\le d(\theta_0,\theta_1). \end{align*} By the triangle inequality and the defining inequality of $A_1$, \begin{align*} d(\theta_0,\theta_1)\le d(\theta_0,\hat\psi_n(x))+d(\hat\psi_n(x),\theta_1)<2d(\theta_0,\hat\psi_n(x)). \end{align*} Thus $d(\hat\psi_n(x),\theta_0)>s$ for every $x\in A_1$. Since $d\ge 0$, restricting the integral to $A_1$ gives \begin{align*} R_0 =\int_{\mathcal X^n} d(\hat\psi_n(x),\theta_0)\,dQ_0(x) \ge \int_{A_1} d(\hat\psi_n(x),\theta_0)\,dQ_0(x) \ge s\,Q_0(A_1). \end{align*} On $A_0$, the definition of $A_0$ gives $d(\hat\psi_n(x),\theta_0)\le d(\hat\psi_n(x),\theta_1)$. The triangle inequality and this defining inequality give \begin{align*} d(\theta_0,\theta_1)\le d(\theta_0,\hat\psi_n(x))+d(\hat\psi_n(x),\theta_1)\le 2d(\hat\psi_n(x),\theta_1). \end{align*} Together with the separation assumption $2s\le d(\theta_0,\theta_1)$, this shows $d(\hat\psi_n(x),\theta_1)\ge s$ for every $x\in A_0$. Hence, again because $d\ge 0$, \begin{align*} R_1 =\int_{\mathcal X^n} d(\hat\psi_n(x),\theta_1)\,dQ_1(x) \ge \int_{A_0} d(\hat\psi_n(x),\theta_1)\,dQ_1(x) \ge s\,Q_1(A_0). \end{align*} Combining the two endpoint bounds gives \begin{align*} R_0+R_1\ge s\bigl(Q_0(A_1)+Q_1(A_0)\bigr). \end{align*}

[guided] The estimator $\hat\psi_n$ takes values in the target space $\Psi$, not in the label set $\{0,1\}$. Recall that the two target points are defined by $\theta_i:=\psi(P_i)$ for $i\in\{0,1\}$. To compare estimation with testing, we turn the estimator into a test by asking which target point it is closer to. 
Since $\hat\psi_n:\mathcal X^n\to\Psi$ is $\mathcal A^{\otimes n}$-to-Borel measurable and the maps $y\mapsto d(y,\theta_0)$ and $y\mapsto d(y,\theta_1)$ are continuous on the [metric space](/page/Metric%20Space) $\Psi$, the set \begin{align*} A_0:=\{x\in\mathcal X^n: d(\hat\psi_n(x),\theta_0)\le d(\hat\psi_n(x),\theta_1)\} \end{align*} belongs to $\mathcal A^{\otimes n}$. Define \begin{align*} A_1:=\mathcal X^n\setminus A_0. \end{align*} Then $A_1\in\mathcal A^{\otimes n}$. Define $\varphi:\mathcal X^n\to\{0,1\}$ by $\varphi(x)=0$ on $A_0$ and $\varphi(x)=1$ on $A_1$. Thus $\varphi$ chooses model $0$ when $\hat\psi_n(x)$ is at least as close to $\theta_0$ as to $\theta_1$, and chooses model $1$ otherwise. Why does a wrong testing decision force a large estimation error? Suppose first that the true model is $0$, but the induced test chooses $1$, so $x\in A_1$. Then \begin{align*} d(\hat\psi_n(x),\theta_1)<d(\hat\psi_n(x),\theta_0). \end{align*} The separation assumption gives \begin{align*} 2s\le d(\theta_0,\theta_1). \end{align*} Using the triangle inequality between $\theta_0$, $\hat\psi_n(x)$, and $\theta_1$, and then the strict inequality that defines $A_1$, we get \begin{align*} d(\theta_0,\theta_1)\le d(\theta_0,\hat\psi_n(x))+d(\hat\psi_n(x),\theta_1)<2d(\theta_0,\hat\psi_n(x)). \end{align*} Therefore $d(\hat\psi_n(x),\theta_0)>s$ on $A_1$. Because $d\ge 0$, we may discard the part of the integral over $A_0$ and integrate this pointwise lower bound over $A_1$ with respect to the true law $Q_0$: \begin{align*} R_0 =\int_{\mathcal X^n} d(\hat\psi_n(x),\theta_0)\,dQ_0(x) \ge \int_{A_1} d(\hat\psi_n(x),\theta_0)\,dQ_0(x) \ge s\,Q_0(A_1). \end{align*} Now suppose the true model is $1$, but the induced test chooses $0$, so $x\in A_0$. The definition of $A_0$ gives \begin{align*} d(\hat\psi_n(x),\theta_0)\le d(\hat\psi_n(x),\theta_1). \end{align*} The separation assumption gives \begin{align*} 2s\le d(\theta_0,\theta_1). \end{align*} The same triangle inequality calculation gives \begin{align*} d(\theta_0,\theta_1)\le d(\theta_0,\hat\psi_n(x))+d(\hat\psi_n(x),\theta_1)\le 2d(\hat\psi_n(x),\theta_1). \end{align*} Therefore $d(\hat\psi_n(x),\theta_1)\ge s$ on $A_0$, and integrating over $A_0$ with respect to $Q_1$ yields \begin{align*} R_1 =\int_{\mathcal X^n} d(\hat\psi_n(x),\theta_1)\,dQ_1(x) \ge \int_{A_0} d(\hat\psi_n(x),\theta_1)\,dQ_1(x) \ge s\,Q_1(A_0). \end{align*} Adding the two inequalities gives the reduction from estimation risk to testing error: \begin{align*} R_0+R_1\ge s\bigl(Q_0(A_1)+Q_1(A_0)\bigr). \end{align*} [/guided] [/step]

[step:Lower-bound the two testing errors by total variation] For any measurable set $A\in\mathcal A^{\otimes n}$, the definition of total variation, $\operatorname{TV}(Q_0,Q_1)=\sup_{A\in\mathcal A^{\otimes n}}|Q_0(A)-Q_1(A)|$, gives \begin{align*} Q_0(A)-Q_1(A)\le \operatorname{TV}(Q_0,Q_1). \end{align*} Apply this with $A=A_0$. Since $A_1=\mathcal X^n\setminus A_0$ and $Q_0(\mathcal X^n)=1$, finite additivity gives \begin{align*} Q_0(A_1)+Q_1(A_0)=1-Q_0(A_0)+Q_1(A_0). \end{align*} Rearranging the right-hand side gives \begin{align*} 1-Q_0(A_0)+Q_1(A_0)=1-\bigl(Q_0(A_0)-Q_1(A_0)\bigr). \end{align*} Using $Q_0(A_0)-Q_1(A_0)\le \operatorname{TV}(Q_0,Q_1)$, we obtain \begin{align*} Q_0(A_1)+Q_1(A_0)\ge 1-\operatorname{TV}(Q_0,Q_1). \end{align*} Consequently, since $s\ge 0$, \begin{align*} R_0+R_1\ge s\bigl(Q_0(A_1)+Q_1(A_0)\bigr)\ge s\left(1-\operatorname{TV}(Q_0,Q_1)\right). \end{align*} [/step]

[step:Combine the endpoint average with the testing lower bound] Recall from the endpoint reduction that \begin{align*} \sup_{P\in\mathcal P}\mathbb E_{P^{\otimes n}}\!\left[d(\hat\psi_n,\psi(P))\right]\ge \frac{R_0+R_1}{2}. \end{align*} Combining this with the testing bound $R_0+R_1\ge s\left(1-\operatorname{TV}(Q_0,Q_1)\right)$ gives \begin{align*} \sup_{P\in\mathcal P}\mathbb E_{P^{\otimes n}}\!\left[d(\hat\psi_n,\psi(P))\right]\ge \frac{s}{2}\left(1-\operatorname{TV}(Q_0,Q_1)\right). \end{align*} Substituting $Q_i=P_i^{\otimes n}$ for $i\in\{0,1\}$ gives \begin{align*} \sup_{P\in\mathcal P}\mathbb E_{P^{\otimes n}}\!\left[d(\hat\psi_n,\psi(P))\right] \ge \frac{s}{2}\left(1-\operatorname{TV}(P_0^{\otimes n},P_1^{\otimes n})\right). \end{align*} Since the estimator $\hat\psi_n$ was arbitrary, this is the desired lower bound. [/step]
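
The chain of inequalities in the proof can be checked numerically on a small example. The sketch below is illustrative only and is not part of the theorem: it assumes a Bernoulli model with $P_0=\mathrm{Bernoulli}(p_0)$ and $P_1=\mathrm{Bernoulli}(p_1)$, takes $\psi(P)$ to be the mean, $d(a,b)=|a-b|$, $s=|p_1-p_0|/2$, and uses the sample mean as the estimator. The function names `binom_pmf` and `two_point_check` are hypothetical. Because both product likelihoods depend on the data only through the number of ones, the total variation between the product laws equals the total variation between the two binomial count distributions, so every quantity is an exact finite sum.

```python
from math import comb


def binom_pmf(n, p):
    """Probability mass function of Binomial(n, p) as a list indexed by k."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def two_point_check(n, p0, p1):
    """Compute the quantities of the two-point argument for a Bernoulli model.

    Assumed setup (illustrative, not from the source): psi(P) is the mean,
    d is the absolute difference, and the estimator is the sample mean k / n.
    """
    q0, q1 = binom_pmf(n, p0), binom_pmf(n, p1)
    s = abs(p1 - p0) / 2  # separation: d(theta_0, theta_1) = 2s

    # Total variation between the product laws, computed on the count k.
    tv = 0.5 * sum(abs(a - b) for a, b in zip(q0, q1))

    est = [k / n for k in range(n + 1)]  # sample mean as a function of k

    # Endpoint risks R_0 and R_1.
    r0 = sum(abs(e - p0) * a for e, a in zip(est, q0))
    r1 = sum(abs(e - p1) * b for e, b in zip(est, q1))

    # Induced test: A_0 holds the outcomes at least as close to theta_0.
    in_a0 = [abs(e - p0) <= abs(e - p1) for e in est]
    q0_a1 = sum(a for a, z in zip(q0, in_a0) if not z)  # Q_0(A_1)
    q1_a0 = sum(b for b, z in zip(q1, in_a0) if z)      # Q_1(A_0)

    return {
        "s": s,
        "tv": tv,
        "r0": r0,
        "r1": r1,
        "test_error": q0_a1 + q1_a0,
        "lower_bound": 0.5 * s * (1 - tv),
    }


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        res = two_point_check(n, 0.4, 0.6)
        print(n, max(res["r0"], res["r1"]), res["lower_bound"])
```

For each `n`, the printed maximum endpoint risk should be at least the printed lower bound, mirroring the final display of the proof. The lower bound shrinks as `n` grows because the two product laws become easier to distinguish.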

