Androma — The Home of Mathematics on the Internet


[step:Prove a scalar two-point Gaussian Bayes risk lower bound]Let $\sigma>0$ and $a \in [0,\sigma]$. Let $\varepsilon$ be a Rademacher [random variable](/page/Random%20Variable), meaning $\mathbb{P}(\varepsilon=1)=\mathbb{P}(\varepsilon=-1)=1/2$, and let $\xi \sim \mathcal{N}(0,\sigma^2)$ be independent of $\varepsilon$. Define \begin{align*} Y := a\varepsilon+\xi. \end{align*} For every measurable function $T:\mathbb{R}\to\mathbb{R}$, \begin{align*} \mathbb{E}[(T(Y)-a\varepsilon)^2] \geq a^2 \Phi(-1), \end{align*} where $\Phi: \mathbb{R} \to [0,1]$ is the standard normal distribution function defined by \begin{align*} \Phi(t):=\frac{1}{\sqrt{2\pi}}\int_{(-\infty,t]} e^{-u^2/2}\,d\mathcal{L}^1(u) \end{align*} for $t \in \mathbb{R}$, and $\mathcal{L}^1$ denotes one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure). Indeed, define the sign estimator $\psi_T:\mathbb{R}\to\{-1,1\}$ by declaring, for each $y\in\mathbb{R}$, that $\psi_T(y)=1$ if $T(y)\geq 0$ and $\psi_T(y)=-1$ if $T(y)<0$. If $\psi_T(Y)\neq \varepsilon$, then either $\varepsilon=1$ and $T(Y)<0$, or $\varepsilon=-1$ and $T(Y)\geq 0$; in both cases $|T(Y)-a\varepsilon|\geq a$. Therefore \begin{align*} \mathbb{E}[(T(Y)-a\varepsilon)^2] \geq a^2\mathbb{P}(\psi_T(Y)\neq \varepsilon). \end{align*} The two conditional densities of $Y$ with respect to $\mathcal{L}^1$ are $p_{+}:\mathbb{R}\to[0,\infty)$ and $p_{-}:\mathbb{R}\to[0,\infty)$, corresponding respectively to $\varepsilon=1$ and $\varepsilon=-1$, where \begin{align*} p_{+}(y)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y-a)^2}{2\sigma^2}\right) \end{align*} and \begin{align*} p_{-}(y)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y+a)^2}{2\sigma^2}\right) \end{align*} for $y \in \mathbb{R}$. 
Since the two prior probabilities are equal, every measurable sign rule $\psi:\mathbb{R}\to\{-1,1\}$ has error probability \begin{align*} \mathbb{P}(\psi(Y)\neq \varepsilon) = \frac{1}{2}\int_{\{y:\psi(y)=-1\}} p_{+}(y)\,d\mathcal{L}^1(y) + \frac{1}{2}\int_{\{y:\psi(y)=1\}} p_{-}(y)\,d\mathcal{L}^1(y). \end{align*} Pointwise minimization of the integrand gives \begin{align*} \inf_{\psi}\mathbb{P}(\psi(Y)\neq \varepsilon) = \frac{1}{2}\int_{\mathbb{R}}\min\{p_{+}(y),p_{-}(y)\}\,d\mathcal{L}^1(y). \end{align*} For $a>0$, the inequality $p_{+}(y)\geq p_{-}(y)$ is equivalent to $y\geq 0$, and for $a=0$ the two densities coincide, so in both cases the rule that decides $\varepsilon=1$ exactly when $Y\geq 0$ attains the infimum. Its error probability is \begin{align*} \frac{1}{2}\mathbb{P}(Y<0\mid \varepsilon=1) + \frac{1}{2}\mathbb{P}(Y\geq 0\mid \varepsilon=-1) = \Phi(-a/\sigma) \geq \Phi(-1), \end{align*} because both conditional probabilities equal $\Phi(-a/\sigma)$, $0\leq a/\sigma\leq 1$, and $\Phi$ is non-decreasing. Applying this to the measurable rule $\psi=\psi_T$ gives $\mathbb{P}(\psi_T(Y)\neq\varepsilon)\geq\Phi(-1)$, and combining with the first inequality shows that every measurable $T$ satisfies the claimed bound.[/step]
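As a numerical sanity check of the testing step, the following minimal Python sketch approximates $\frac{1}{2}\int\min\{p_{+},p_{-}\}\,d\mathcal{L}^1$ by a trapezoid rule and compares it with $\Phi(-a/\sigma)$. The helper names (`Phi`, `gaussian_density`, `bayes_test_error`), the integration window, and the grid size are illustrative assumptions and are not part of the proof.

```python
import math

def Phi(t):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def gaussian_density(y, mean, sigma):
    """Density of N(mean, sigma^2) at y."""
    return math.exp(-((y - mean) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def bayes_test_error(a, sigma, half_width=12.0, n=20_000):
    """Trapezoid approximation of (1/2) * integral of min(p_plus, p_minus).

    The window [-(half_width*sigma + a), half_width*sigma + a] is an assumption
    chosen so that the truncated Gaussian tails are negligible.
    """
    hi = half_width * sigma + a
    lo = -hi
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        y = lo + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * min(gaussian_density(y, a, sigma), gaussian_density(y, -a, sigma))
    return 0.5 * h * total

if __name__ == "__main__":
    sigma = 1.0
    for a in (0.0, 0.25, 0.5, 1.0):
        print(a, bayes_test_error(a, sigma), Phi(-a / sigma), Phi(-1.0))
```

For each amplitude $a\in[0,\sigma]$ the two middle columns should agree to several decimal places, and both should be at least $\Phi(-1)\approx 0.1587$.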


[guided]We reduce scalar estimation to scalar testing. Let $T:\mathbb{R}\to\mathbb{R}$ be any measurable estimator of $a\varepsilon$ from the observation $Y=a\varepsilon+\xi$. Define the induced sign rule $\psi_T:\mathbb{R}\to\{-1,1\}$ by $\psi_T(y)=1$ when $T(y)\geq 0$ and $\psi_T(y)=-1$ when $T(y)<0$. If $\psi_T(Y)\neq\varepsilon$, then either $\varepsilon=1$ and $T(Y)<0$, or $\varepsilon=-1$ and $T(Y)\geq 0$, so $T(Y)$ lies on the wrong side of $0$ relative to $a\varepsilon$. Hence $|T(Y)-a\varepsilon|\geq a$, and therefore \begin{align*} \mathbb{E}[(T(Y)-a\varepsilon)^2] \geq a^2\mathbb{P}(\psi_T(Y)\neq\varepsilon). \end{align*} It remains to lower-bound the best possible testing error. Conditional on $\varepsilon=1$, the observation $Y$ has density $p_{+}:\mathbb{R}\to[0,\infty)$ with respect to $\mathcal{L}^1$ given by \begin{align*} p_{+}(y)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y-a)^2}{2\sigma^2}\right), \end{align*} and conditional on $\varepsilon=-1$, it has density $p_{-}:\mathbb{R}\to[0,\infty)$ given by \begin{align*} p_{-}(y)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y+a)^2}{2\sigma^2}\right). \end{align*} For any measurable sign rule $\psi:\mathbb{R}\to\{-1,1\}$, equal prior probabilities give \begin{align*} \mathbb{P}(\psi(Y)\neq \varepsilon) = \frac{1}{2}\int_{\{y:\psi(y)=-1\}} p_{+}(y)\,d\mathcal{L}^1(y) + \frac{1}{2}\int_{\{y:\psi(y)=1\}} p_{-}(y)\,d\mathcal{L}^1(y). \end{align*} At each observed value $y$, deciding in favor of the hypothesis with the larger density leaves the smaller of $p_{+}(y)$ and $p_{-}(y)$ as the contribution to the error integral, and no rule can do better at that point. Thus \begin{align*} \inf_{\psi}\mathbb{P}(\psi(Y)\neq \varepsilon) = \frac{1}{2}\int_{\mathbb{R}}\min\{p_{+}(y),p_{-}(y)\}\,d\mathcal{L}^1(y). \end{align*} For $a>0$, the comparison $p_{+}(y)\geq p_{-}(y)$ is equivalent, after taking logarithms and cancelling common constants, to $(y-a)^2\leq (y+a)^2$, that is, to $4ay\geq 0$, which is equivalent to $y\geq 0$. For $a=0$ the two densities coincide and every rule is optimal. Therefore an optimal test decides $\varepsilon=1$ when $Y\geq 0$ and $\varepsilon=-1$ when $Y<0$. 
Its error probability is \begin{align*} \frac{1}{2}\mathbb{P}(Y<0\mid \varepsilon=1) + \frac{1}{2}\mathbb{P}(Y\geq 0\mid \varepsilon=-1) = \Phi(-a/\sigma). \end{align*} Since $0\leq a\leq\sigma$, we have $0\leq a/\sigma\leq 1$, and monotonicity of the standard normal distribution function gives $\Phi(-a/\sigma)\geq\Phi(-1)$. Combining the testing lower bound with the reduction from estimation to testing yields \begin{align*} \mathbb{E}[(T(Y)-a\varepsilon)^2] \geq a^2\Phi(-1). \end{align*}[/guided]
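To see the scalar bound on concrete estimators, the sketch below numerically integrates the risk $\mathbb{E}[(T(Y)-a\varepsilon)^2]$ for a few choices of $T$ and compares each with $a^2\Phi(-1)$. The four estimators, the function names, and the trapezoid grid are illustrative assumptions. The posterior mean $a\tanh(ay/\sigma^2)$ follows from the likelihood ratio $p_{+}(y)/p_{-}(y)=e^{2ay/\sigma^2}$ and is not taken from the proof itself.

```python
import math

def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def density(y, mean, sigma):
    """Density of N(mean, sigma^2) at y."""
    return math.exp(-((y - mean) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def risk(T, a, sigma, half_width=12.0, n=20_000):
    """Trapezoid approximation of E[(T(Y) - a*eps)^2] with eps uniform on {-1, +1}."""
    hi = half_width * sigma + a
    lo = -hi
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        y = lo + k * h
        weight = 0.5 if k in (0, n) else 1.0
        loss_plus = (T(y) - a) ** 2 * density(y, a, sigma)    # case eps = +1
        loss_minus = (T(y) + a) ** 2 * density(y, -a, sigma)  # case eps = -1
        total += weight * 0.5 * (loss_plus + loss_minus)
    return h * total

def example_risks(a, sigma):
    """Risks of four illustrative estimators of a*eps from Y."""
    estimators = {
        "identity": lambda y: y,
        "zero": lambda y: 0.0,
        "hard_sign": lambda y: a if y >= 0 else -a,
        "posterior_mean": lambda y: a * math.tanh(a * y / sigma ** 2),
    }
    return {name: risk(T, a, sigma) for name, T in estimators.items()}

if __name__ == "__main__":
    a, sigma = 1.0, 1.0
    bound = a ** 2 * Phi(-1.0)
    for name, value in example_risks(a, sigma).items():
        print(f"{name:15s} risk={value:.4f}  bound={bound:.4f}")
```

Every printed risk should sit above the bound. The posterior mean has the smallest risk of the four, which indicates how much slack the constant $\Phi(-1)$ leaves.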


[guided]The point of the hypercube prior is that it creates many independent scalar estimation problems inside the ball. Recall the construction: $m \in \{1,\dots,d\}$ and $a \in [0,\sigma]$ were chosen so that $ma^2\leq R^2$ and $ma^2\geq \frac{1}{2}\min\{R^2,d\sigma^2\}$, and the random parameter is \begin{align*} \Theta=(a\varepsilon_1,\dots,a\varepsilon_m,0,\dots,0), \end{align*} where $\varepsilon_1,\dots,\varepsilon_m$ are independent Rademacher random variables. Let $\hat{\theta}:\mathbb{R}^d\to\mathbb{R}^d$ be an arbitrary measurable estimator, and write its coordinate functions as $\hat{\theta}_1,\dots,\hat{\theta}_d:\mathbb{R}^d\to\mathbb{R}$, so that $\hat{\theta}(x)=(\hat{\theta}_1(x),\dots,\hat{\theta}_d(x))$ for $x\in\mathbb{R}^d$. Under this prior, the observation has the form \begin{align*} X=\Theta+\sigma Z, \end{align*} where $Z=(Z_1,\dots,Z_d)\sim \mathcal{N}(0,I_d)$ is independent of $\Theta$. Expanding the squared Euclidean norm coordinate by coordinate gives \begin{align*} \mathbb{E}[|\hat{\theta}(X)-\Theta|^2] = \sum_{i=1}^d \mathbb{E}[(\hat{\theta}_i(X)-\Theta_i)^2]. \end{align*} The inactive coordinates only add non-negative terms, so \begin{align*} \mathbb{E}[|\hat{\theta}(X)-\Theta|^2] \geq \sum_{i=1}^m \mathbb{E}[(\hat{\theta}_i(X)-a\varepsilon_i)^2]. \end{align*} Now fix an active coordinate $i\in\{1,\dots,m\}$. The estimator $\hat{\theta}_i(X)$ is allowed to depend on all coordinates of $X$, not just $X_i$, so we must justify why the scalar lower bound still applies. Define $X_{-i}$ to be the vector obtained from $X$ by deleting its $i$th coordinate, and let $\mathbb{P}_{X_{-i}}$ denote the law of $X_{-i}$ on $\mathbb{R}^{d-1}$. Because the prior signs $\varepsilon_1,\dots,\varepsilon_m$ are independent, the Gaussian noises $Z_1,\dots,Z_d$ are independent, and $Z$ is independent of $\Theta$, the random vector $X_{-i}$ is independent of the pair $(\varepsilon_i,Z_i)$, and in particular of $(\varepsilon_i,X_i)$. Hence, after conditioning on $X_{-i}=z$, the only remaining information about $\varepsilon_i$ is contained in \begin{align*} X_i=a\varepsilon_i+\sigma Z_i. 
\end{align*} Because $X_{-i}$ takes values in a Euclidean space, regular conditional distributions exist. For $\mathbb{P}_{X_{-i}}$-almost every fixed value $z$, define the map $T_z:\mathbb{R}\to\mathbb{R}$ by $T_z(x_i)=\hat{\theta}_i(x_i,z)$ for $x_i\in\mathbb{R}$. This is a scalar estimator of $a\varepsilon_i$ from the one-dimensional Gaussian observation $X_i$. The scalar two-point bound from the previous step applies because its hypotheses are satisfied: $\varepsilon_i$ is Rademacher, $\sigma Z_i\sim\mathcal{N}(0,\sigma^2)$ is independent of $\varepsilon_i$, and the amplitude satisfies $0\leq a\leq\sigma$. Hence \begin{align*} \mathbb{E}[(T_z(X_i)-a\varepsilon_i)^2\mid X_{-i}=z] \geq a^2\Phi(-1) \end{align*} for $\mathbb{P}_{X_{-i}}$-almost every $z$. Equivalently, \begin{align*} \mathbb{E}[(\hat{\theta}_i(X)-a\varepsilon_i)^2\mid X_{-i}=z] \geq a^2\Phi(-1). \end{align*} Integrating this conditional inequality over the distribution of $X_{-i}$ gives \begin{align*} \mathbb{E}[(\hat{\theta}_i(X)-a\varepsilon_i)^2] \geq a^2\Phi(-1). \end{align*} Because this holds for every active coordinate $i=1,\dots,m$, summation gives \begin{align*} \mathbb{E}[|\hat{\theta}(X)-\Theta|^2] \geq m a^2\Phi(-1). \end{align*} The construction ensured $ma^2\geq \frac{1}{2}\min\{R^2,d\sigma^2\}$, so \begin{align*} \mathbb{E}[|\hat{\theta}(X)-\Theta|^2] \geq \frac{\Phi(-1)}{2}\min\{R^2,d\sigma^2\}. \end{align*}[/guided]
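The final bound depends only on $R$, $\sigma$, and $d$, so it is easy to evaluate. The sketch below computes $\frac{\Phi(-1)}{2}\min\{R^2,d\sigma^2\}$ and also shows one possible choice of $(m,a)$ meeting the recalled constraints $ma^2\leq R^2$ and $ma^2\geq\frac{1}{2}\min\{R^2,d\sigma^2\}$. The selection rule in `choose_hypercube` is a hypothetical illustration, since the construction itself is only recalled above and may differ from this one. It assumes $R\geq 0$ and $\sigma>0$.

```python
import math

def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def choose_hypercube(R, sigma, d):
    """One hypothetical choice of (m, a) with m in {1..d} and a in [0, sigma].

    Assumed rule (not taken from the source):
      - R^2 >= d*sigma^2: use all d coordinates at amplitude sigma;
      - R < sigma: use one coordinate at amplitude R;
      - otherwise: use floor(R^2 / sigma^2) coordinates at amplitude sigma.
    """
    if R * R >= d * sigma * sigma:
        return d, sigma
    if R < sigma:
        return 1, R
    m = min(d, int((R * R) // (sigma * sigma)))
    return m, sigma

def bayes_risk_lower_bound(R, sigma, d):
    """The bound (Phi(-1)/2) * min(R^2, d*sigma^2) from the last display."""
    return 0.5 * Phi(-1.0) * min(R * R, d * sigma * sigma)

if __name__ == "__main__":
    for R, sigma, d in [(0.5, 1.0, 10), (2.0, 1.0, 10), (5.0, 1.0, 10)]:
        m, a = choose_hypercube(R, sigma, d)
        # m * a^2 * Phi(-1) is the intermediate bound; it dominates the final one.
        print(R, d, (m, a), m * a * a * Phi(-1.0), bayes_risk_lower_bound(R, sigma, d))
```

In the middle regime $\sigma\leq R<\sqrt{d}\,\sigma$, the factor $\frac{1}{2}$ absorbs the rounding in $\lfloor R^2/\sigma^2\rfloor$, because $\lfloor x\rfloor\geq x/2$ for $x\geq 1$.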
