Ingster Detection Boundary for Sparse Gaussian Mean Mixtures (Theorem #5953)

Proof

[proofplan] We formulate the minimax risk and prove the two directions separately. The lower bound is obtained from a least favourable prior: in the moderately sparse regime its likelihood ratio has second moment tending to one, while in the very sparse regime the same conclusion holds after truncating the likelihood on the rare large-coordinate events. The upper bound uses threshold exceedance counts; optimizing the threshold exponent gives exactly the two pieces of $\rho^*(\beta)$. The signed case inherits the lower bound from the one-sided subclass and uses two-sided exceedance counts for the upper bound. [/proofplan] [step:Define the minimax risk and the sparse priors] Let $P_0$ denote the probability law on $(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d))$ under which the coordinate maps $X_i:\mathbb{R}^d\to\mathbb{R}$ are independent standard normal random variables. For a vector $\mu\in\mathbb{R}^d$, let $P_\mu$ denote the probability law on $(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d))$ under which the coordinate maps satisfy $X_i\sim \mathcal{N}(\mu_i,1)$ independently for $1\le i\le d$. Write $\mathbb{E}_0$ for expectation under $P_0$, $\mathbb{E}_\mu$ and $\operatorname{Var}_\mu$ for expectation and variance under $P_\mu$, and write $\mathbb{E}$ without a subscript for expectation over the auxiliary random supports and signs used in the priors. For probability measures $P$ and $Q$ on the same measurable space, write \begin{align*} \|P-Q\|_{\mathrm{TV}}=\sup_A |P(A)-Q(A)| \end{align*} for their total variation distance. Throughout the proof, $o(1)$ denotes a deterministic quantity tending to $0$ as $d\to\infty$, with the fixed constants $\beta$, $r$, and any chosen threshold exponent $q$ held fixed. Thus $d^{o(1)}$ denotes $\exp(o(1)\log d)$. Let $\mathcal{A}_{d,+}(r,\beta)$ be the one-sided alternative class consisting of vectors with exactly $s_d$ nonzero coordinates, each equal to $a_d=\sqrt{2r\log d}$. 
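As a rough illustration of the objects just defined, the sketch below draws one observation from $P_0$ and one from $P_\mu$ for a vector $\mu$ in the one-sided class. The function name `sample_one_sided`, the uniformly random choice of support, and the concrete value $s_d=\lceil d^{1-\beta}\rceil$ are assumptions made for the example; the proof itself only uses $s_d=d^{1-\beta+o(1)}$.

```python
import math
import random

def sample_one_sided(d, beta, r, rng):
    """Return (x0, x1, mu): a null draw, an alternative draw, and the mean vector.

    Assumption: s_d = ceil(d**(1 - beta)); the proof only needs
    s_d = d^{1 - beta + o(1)}.
    """
    s_d = math.ceil(d ** (1.0 - beta))
    a_d = math.sqrt(2.0 * r * math.log(d))
    # Uniformly random subset of {0, ..., d-1} with exactly s_d elements.
    support = set(rng.sample(range(d), s_d))
    mu = [a_d if i in support else 0.0 for i in range(d)]
    x0 = [rng.gauss(0.0, 1.0) for _ in range(d)]   # X ~ P_0
    x1 = [m + rng.gauss(0.0, 1.0) for m in mu]     # X ~ P_mu
    return x0, x1, mu

rng = random.Random(0)
x0, x1, mu = sample_one_sided(d=10_000, beta=0.6, r=0.3, rng=rng)
```

With $d=10^4$ and $\beta=0.6$ this places the amplitude $a_d=\sqrt{2r\log d}$ on $40$ coordinates and leaves the remaining coordinates at mean zero.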
For a test $\psi_d:\mathbb{R}^d\to[0,1]$, define its minimax risk by \begin{align*} R_d(\psi_d)=P_0\psi_d+\sup_{\mu\in\mathcal{A}_{d,+}(r,\beta)} P_\mu(1-\psi_d). \end{align*} Detection is impossible when $\inf_{\psi_d}R_d(\psi_d)\to1$, and possible when there exists $\psi_d$ with $R_d(\psi_d)\to0$. For the lower bound, let $S_d$ be uniformly distributed over all subsets of $\{1,\dots,d\}$ of cardinality $s_d$. This is well-defined because $s_d$ is integer-valued and $s_d=d^{1-\beta+o(1)}$. Define the prior $\Pi_{d,+}$ on $\mathcal{A}_{d,+}(r,\beta)$ by setting $\mu_i=a_d\mathbb{1}_{\{i\in S_d\}}$. Let $P_{\Pi_{d,+}}$ be the mixture law. If $P_{\Pi_{d,+}}$ is contiguous to $P_0$, then no test has vanishing sum of type I error and worst-case type II error, because the Bayes type II error under $\Pi_{d,+}$ is bounded above by the worst-case type II error over $\mathcal{A}_{d,+}(r,\beta)$. [/step] [step:Prove impossibility below the boundary by second moments and truncation] Let $L_d=\frac{dP_{\Pi_{d,+}}}{dP_0}$ be the Radon-Nikodym likelihood ratio. Conditional on a support $S\subset\{1,\dots,d\}$ with $|S|=s_d$, define the support-conditional likelihood ratio $L_S:\mathbb{R}^d\to(0,\infty)$ as the map sending $x\in\mathbb{R}^d$ to \begin{align*} L_S(x)=\exp\left(a_d\sum_{i\in S}x_i-\frac{s_da_d^2}{2}\right). \end{align*} Then $L_d=\mathbb{E}_{S_d}[L_{S_d}(X)]$ under $P_0$. [claim:] If $r<\rho^*(\beta)$, then $\|P_{\Pi_{d,+}}-P_0\|_{\mathrm{TV}}\to0$. In particular, $P_{\Pi_{d,+}}$ is contiguous to $P_0$. [/claim] [proof] First suppose $1/2<\beta\le3/4$ and $r<\beta-1/2$. Let $S_d$ and $T_d$ be independent uniform subsets of cardinality $s_d$, and define the overlap [random variable](/page/Random%20Variable) $K_d=|S_d\cap T_d|$. The Gaussian moment-generating function gives \begin{align*} \mathbb{E}_{0}[L_d^2]=\mathbb{E}\left[\exp(a_d^2K_d)\right]. 
\end{align*} For every integer $k\ge1$, the hypergeometric overlap satisfies \begin{align*} \mathbb{P}(K_d=k)\le \frac{1}{k!}\left(\frac{s_d^2}{d-s_d}\right)^k. \end{align*} Since $s_d/d\to0$, summing this bound gives \begin{align*} \mathbb{E}_{0}[L_d^2]-1\le \exp\left(\frac{s_d^2}{d-s_d}(e^{a_d^2}-1)\right)-1. \end{align*} Here $s_d^2d^{-1}=d^{1-2\beta+o(1)}$ and $e^{a_d^2}=d^{2r}$, so the exponent is $d^{1-2\beta+2r+o(1)}$, which tends to $0$ because $r<\beta-1/2$. Therefore $\mathbb{E}_{0}[L_d^2]\to1$, and the [Cauchy-Schwarz inequality](/page/Cauchy-Schwarz%20Inequality) gives $2\|P_{\Pi_{d,+}}-P_0\|_{\mathrm{TV}}=\mathbb{E}_0|L_d-1|\le(\mathbb{E}_0[L_d^2]-1)^{1/2}\to0$. Now suppose $3/4<\beta<1$ and $r<(1-\sqrt{1-\beta})^2$. Define the common-coordinate exponent $\kappa:(0,\infty)\times(0,\infty)\to[0,\infty)$ by \begin{align*} \kappa(q,r)=\left[2r-(2\sqrt r-\sqrt q)_+^2\right]_+. \end{align*} Choose $q\in(r,1)$ such that \begin{align*} (\sqrt q-\sqrt r)^2>1-\beta \end{align*} and \begin{align*} \kappa(q,r)<2\beta-1. \end{align*} We verify that such a $q$ exists. Since $r<(1-\sqrt{1-\beta})^2$, we have $\sqrt r<1-\sqrt{1-\beta}$, hence \begin{align*} (1-\sqrt r)^2>1-\beta. \end{align*} By continuity, the first inequality holds for all $q<1$ sufficiently close to $1$. For the second inequality, distinguish the two cases in the definition of $\kappa$. If $2\sqrt r\le1$, then for $q$ close enough to $1$ we have $2\sqrt r-\sqrt q\le0$, so $\kappa(q,r)=2r$. In this subcase $r\le 1/4$, while $\beta>3/4$ gives $\beta-1/2>1/4$, so $2r<2\beta-1$. If $2\sqrt r>1$, then for $q$ close to $1$, \begin{align*} \kappa(q,r)=2r-(2\sqrt r-\sqrt q)^2 \to 2r-(2\sqrt r-1)^2. \end{align*} The inequality $r<(1-\sqrt{1-\beta})^2$ is equivalent, after writing $u=\sqrt r$, to \begin{align*} 2u^2-(2u-1)^2<2\beta-1, \end{align*} so continuity again gives $\kappa(q,r)<2\beta-1$ for all $q<1$ sufficiently close to $1$. 
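The existence of an admissible truncation exponent can also be checked numerically. The following is a minimal sketch, assuming a plain grid search from $q$ near $1$ downwards; the helper names `kappa` and `find_truncation_exponent` and the grid size are hypothetical and not part of the proof.

```python
import math

def kappa(q, r):
    """Common-coordinate exponent [2r - (2*sqrt(r) - sqrt(q))_+^2]_+."""
    inner = max(2.0 * math.sqrt(r) - math.sqrt(q), 0.0)
    return max(2.0 * r - inner ** 2, 0.0)

def find_truncation_exponent(beta, r, grid=10_000):
    """Return the largest grid point q in (r, 1) with
    (sqrt(q) - sqrt(r))^2 > 1 - beta and kappa(q, r) < 2*beta - 1,
    or None if no grid point satisfies both conditions."""
    for j in range(grid - 1, 0, -1):
        q = j / grid
        if q <= r:
            break
        gap_ok = (math.sqrt(q) - math.sqrt(r)) ** 2 > 1.0 - beta
        kappa_ok = kappa(q, r) < 2.0 * beta - 1.0
        if gap_ok and kappa_ok:
            return q
    return None

q_below = find_truncation_exponent(beta=0.9, r=0.4)   # r below the boundary
q_above = find_truncation_exponent(beta=0.9, r=0.5)   # r above the boundary
```

For $\beta=0.9$ the boundary value is $(1-\sqrt{0.1})^2\approx0.4675$, so the search finds a $q$ for $r=0.4$ and finds none for $r=0.5$, in line with the verification above.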
Set $\tau_d=\sqrt{2q\log d}$ and define the truncated likelihood ratio $\widetilde L_d:\mathbb{R}^d\to[0,\infty)$ as the map sending $x\in\mathbb{R}^d$ to \begin{align*} \widetilde L_d(x)=\mathbb{E}_{S_d}\left[L_{S_d}(x)\mathbb{1}_{\{\max_{i\in S_d}x_i\le \tau_d\}}\right]. \end{align*} Let $(\Omega_Z,\mathcal{F}_Z,\mathbb{P}_Z)$ be an auxiliary probability space supporting a real-valued random variable $Z:(\Omega_Z,\mathcal{F}_Z)\to(\mathbb{R},\mathcal{B}(\mathbb{R}))$ with law $\mathcal{N}(0,1)$. We use the Gaussian tail estimate in the following fixed-exponent forms: for every fixed $u>0$, \begin{align*} \mathbb{P}_Z(Z>\sqrt{2u\log d})=d^{-u+o(1)} \end{align*} and, by symmetry, \begin{align*} \mathbb{P}_Z(Z<-\sqrt{2u\log d})=d^{-u+o(1)}. \end{align*} The same estimate applies after replacing $u$ by any fixed positive expression in the chosen constants $q$ and $r$. Under the mixture prior, each active coordinate has distribution $\mathcal{N}(a_d,1)$. Here $q>r$, so $\tau_d-a_d=\sqrt{2\log d}(\sqrt q-\sqrt r)$ is a positive threshold. By the union bound over the $s_d$ active coordinates, \begin{align*} P_{\Pi_{d,+}}\left(\max_{i\in S_d}X_i>\tau_d\right)\le s_d\,\mathbb{P}_Z\left(Z>\tau_d-a_d\right). \end{align*} The Gaussian tail estimate then gives \begin{align*} s_d\,\mathbb{P}_Z\left(Z>\tau_d-a_d\right)=d^{1-\beta-(\sqrt q-\sqrt r)^2+o(1)}\to0. \end{align*} Since $\mathbb{E}_0\widetilde L_d$ is exactly the mixture probability of the truncation event, this proves $\mathbb{E}_0\widetilde L_d\to1$. It remains to prove $\mathbb{E}_0\widetilde L_d^2\to1$. Fix supports $S,T\subset\{1,\dots,d\}$ with $|S|=|T|=s_d$ and write $K=|S\cap T|$. Coordinates in $S\triangle T$ contribute factors at most $1$, because \begin{align*} \mathbb{E}_0\left[e^{a_dX_i-a_d^2/2}\mathbb{1}_{\{X_i\le\tau_d\}}\right]\le1. 
\end{align*} For a common coordinate $i\in S\cap T$, completing the square in the one-dimensional standard normal density gives \begin{align*} \mathbb{E}_0\left[e^{2a_dX_i-a_d^2}\mathbb{1}_{\{X_i\le\tau_d\}}\right]=e^{a_d^2}\mathbb{P}_Z\left(Z\le\tau_d-2a_d\right). \end{align*} If $2\sqrt r\le\sqrt q$, then $\tau_d-2a_d\ge0$ and the probability is at most $1$, giving the exponent $2r$. If $2\sqrt r>\sqrt q$, then $\tau_d-2a_d=-\sqrt{2\log d}(2\sqrt r-\sqrt q)$ and the left-tail form of the same Gaussian estimate gives exponent $2r-(2\sqrt r-\sqrt q)^2$. Combining the two cases with the convention $[u]_+=\max\{u,0\}$ gives \begin{align*} e^{a_d^2}\mathbb{P}_Z\left(Z\le\tau_d-2a_d\right)\le d^{\kappa(q,r)+o(1)}, \end{align*} where the $o(1)$ is uniform for the fixed chosen value of $q$. Hence \begin{align*} \mathbb{E}_0\left[L_S(X)L_T(X)\mathbb{1}_{\{\max_{i\in S}X_i\le\tau_d\}}\mathbb{1}_{\{\max_{i\in T}X_i\le\tau_d\}}\right] \le d^{K\kappa(q,r)+o(K)}. \end{align*} Averaging over independent uniform supports and using the same hypergeometric bound as above yields \begin{align*} \mathbb{E}_0\widetilde L_d^2\le 1+\sum_{k=1}^{s_d}\frac{1}{k!}\left(\frac{s_d^2}{d-s_d}d^{\kappa(q,r)+o(1)}\right)^k. \end{align*} The exponential series bound gives \begin{align*} \mathbb{E}_0\widetilde L_d^2\le \exp\left(d^{1-2\beta+\kappa(q,r)+o(1)}\right). \end{align*} Because $\kappa(q,r)<2\beta-1$, the exponent tends to $0$, so $\mathbb{E}_0\widetilde L_d^2\to1$. The truncated second-moment criterion used here is the following elementary implication: if $\widetilde L_d\le L_d$, $\mathbb{E}_0\widetilde L_d\to1$, and $\mathbb{E}_0\widetilde L_d^2\to1$, then $\|P_{\Pi_{d,+}}-P_0\|_{\mathrm{TV}}\to0$. Indeed, by the likelihood-ratio formula and the definition of total variation, \begin{align*} 2\|P_{\Pi_{d,+}}-P_0\|_{\mathrm{TV}}=\mathbb{E}_0|L_d-1|. 
\end{align*} Since $\widetilde L_d\le L_d$ and $\mathbb{E}_0 L_d=1$, the triangle inequality gives \begin{align*} \mathbb{E}_0|L_d-1|\le \mathbb{E}_0|\widetilde L_d-1|+\mathbb{E}_0[L_d-\widetilde L_d]=\mathbb{E}_0|\widetilde L_d-1|+1-\mathbb{E}_0\widetilde L_d. \end{align*} Applying the [Cauchy-Schwarz inequality](/page/Cauchy-Schwarz%20Inequality) to $\widetilde L_d-1$ gives \begin{align*} \mathbb{E}_0|\widetilde L_d-1|\le \left(\mathbb{E}_0(\widetilde L_d-1)^2\right)^{1/2}\to0, \end{align*} because $\mathbb{E}_0(\widetilde L_d-1)^2=\mathbb{E}_0\widetilde L_d^2-2\mathbb{E}_0\widetilde L_d+1$, while $\mathbb{E}_0\widetilde L_d\to1$ and $\mathbb{E}_0\widetilde L_d^2\to1$. Hence $\|P_{\Pi_{d,+}}-P_0\|_{\mathrm{TV}}\to0$, and contiguity follows. [/proof] [/step] [step:Convert total variation convergence into minimax impossibility] The claim implies the stronger minimax conclusion $\inf_{\psi_d}R_d(\psi_d)\to1$ over the one-sided alternative class. For any test $\psi_d:\mathbb{R}^d\to[0,1]$, the Bayes risk against the prior $\Pi_{d,+}$ satisfies \begin{align*} P_0\psi_d+P_{\Pi_{d,+}}(1-\psi_d)=1+P_0\psi_d-P_{\Pi_{d,+}}\psi_d\ge 1-\|P_{\Pi_{d,+}}-P_0\|_{\mathrm{TV}}. \end{align*} Since the Bayes type II error is bounded above by $\sup_{\mu\in\mathcal{A}_{d,+}(r,\beta)}P_\mu(1-\psi_d)$, we have \begin{align*} R_d(\psi_d)\ge P_0\psi_d+P_{\Pi_{d,+}}(1-\psi_d)\ge 1-\|P_{\Pi_{d,+}}-P_0\|_{\mathrm{TV}}. \end{align*} Taking the infimum over $\psi_d$ and using $\|P_{\Pi_{d,+}}-P_0\|_{\mathrm{TV}}\to0$ gives $\inf_{\psi_d}R_d(\psi_d)\to1$. [/step] [step:Construct threshold tests above the boundary] Fix $r>\rho^*(\beta)$. Define $\gamma:(0,1)\to\mathbb{R}$ by \begin{align*} \gamma(q)=1-\beta-(\sqrt q-\sqrt r)_+^2-\frac{1-q}{2}, \end{align*} where $(u)_+=\max\{u,0\}$. We first verify that some $q\in(0,1)$ has $\gamma(q)>0$. For $q\le r$, the function is $\gamma(q)=1/2-\beta+q/2$. For $q>r$, it is \begin{align*} \gamma(q)=\frac12-\beta+\frac q2-(\sqrt q-\sqrt r)^2. \end{align*} Its derivative on $(r,1)$ is \begin{align*} \gamma'(q)=-\frac12+\sqrt{\frac rq}. 
\end{align*} Thus the interior critical point is $q=4r$ when $4r<1$, and the endpoint value is approached as $q\uparrow1$. These two alternatives give exactly the thresholds $\beta-1/2$ and $(1-\sqrt{1-\beta})^2$, so $r>\rho^*(\beta)$ allows a choice of $q$ with $\gamma(q)>0$. Let $\tau_d=\sqrt{2q\log d}$ and define the exceedance count $N_d:\mathbb{R}^d\to\{0,1,\dots,d\}$ by \begin{align*} N_d(x)=\sum_{i=1}^d\mathbb{1}_{\{x_i>\tau_d\}}. \end{align*} Under $P_0$, let $p_{0,d}=P_0(X_1>\tau_d)$, $m_{0,d}=dp_{0,d}$, and $v_{0,d}=dp_{0,d}(1-p_{0,d})$. The [Gaussian tail estimate](/page/Gaussian%20Tail%20Estimate), used for the fixed threshold exponent $q$, gives \begin{align*} p_{0,d}=d^{-q+o(1)},\qquad v_{0,d}=d^{1-q+o(1)}. \end{align*} Define the test $\psi_d:\mathbb{R}^d\to\{0,1\}$ by \begin{align*} \psi_d(x)=\mathbb{1}_{\{N_d(x)-m_{0,d}>d^{\gamma(q)/2}v_{0,d}^{1/2}\}}. \end{align*} By [Chebyshev's inequality](/page/Chebyshev%20Inequality) applied to the binomial exceedance count $N_d$ under $P_0$, we get \begin{align*} P_0(\psi_d=1)\le d^{-\gamma(q)}\to0. \end{align*} For any $\mu\in\mathcal{A}_{d,+}(r,\beta)$, each signal coordinate contributes exceedance probability \begin{align*} p_{1,d}=\mathbb{P}_Z(Z+a_d>\tau_d)=d^{-(\sqrt q-\sqrt r)_+^2+o(1)}, \end{align*} where $Z$ has standard normal law under $\mathbb{P}_Z$, again by the [Gaussian tail estimate](/page/Gaussian%20Tail%20Estimate) for the fixed exponent $q$. Therefore \begin{align*} \mathbb{E}_\mu[N_d]-m_{0,d}=s_d(p_{1,d}-p_{0,d})=d^{1-\beta-(\sqrt q-\sqrt r)_+^2+o(1)}. \end{align*} Dividing by $v_{0,d}^{1/2}=d^{(1-q)/2+o(1)}$ gives \begin{align*} \frac{\mathbb{E}_\mu[N_d]-m_{0,d}}{v_{0,d}^{1/2}}=d^{\gamma(q)+o(1)}. \end{align*} The variance under $P_\mu$ is the sum of Bernoulli variances. Since non-signal coordinates have exceedance probability $p_{0,d}$ and signal coordinates have exceedance probability $p_{1,d}$, we have \begin{align*} \operatorname{Var}_\mu(N_d)\le (d-s_d)p_{0,d}+s_dp_{1,d}. 
\end{align*} Using the tail estimates for $p_{0,d}$ and $p_{1,d}$ gives \begin{align*} \operatorname{Var}_\mu(N_d)\le d^{1-q+o(1)}+d^{1-\beta-(\sqrt q-\sqrt r)_+^2+o(1)}. \end{align*} The squared mean separation satisfies \begin{align*} \left(\mathbb{E}_\mu[N_d]-m_{0,d}\right)^2 =d^{2\{1-\beta-(\sqrt q-\sqrt r)_+^2\}+o(1)}. \end{align*} Set \begin{align*} A(q,r,\beta)=1-\beta-(\sqrt q-\sqrt r)_+^2. \end{align*} The choice $\gamma(q)>0$ says $A(q,r,\beta)>(1-q)/2$, so $A(q,r,\beta)>0$. Therefore the squared mean separation $d^{2A(q,r,\beta)+o(1)}$ dominates the null variance term $d^{1-q+o(1)}$, because $2A(q,r,\beta)>1-q$, and it dominates the signal variance term $d^{A(q,r,\beta)+o(1)}$, because $A(q,r,\beta)>0$. The rejection threshold is $d^{\gamma(q)/2}v_{0,d}^{1/2}=d^{A(q,r,\beta)-\gamma(q)/2+o(1)}$. Hence it is smaller than the mean separation $d^{A(q,r,\beta)+o(1)}$ by the polynomial factor $d^{\gamma(q)/2+o(1)}$. Applying [Chebyshev's inequality](/page/Chebyshev%20Inequality) to $N_d$ under $P_\mu$ gives \begin{align*} P_\mu(\psi_d=0) &\le P_\mu\left(|N_d-\mathbb{E}_\mu[N_d]|\ge \mathbb{E}_\mu[N_d]-m_{0,d}-d^{\gamma(q)/2}v_{0,d}^{1/2}\right)\to0, \end{align*} uniformly over $\mathcal{A}_{d,+}(r,\beta)$, because the denominator has square exponent $2A(q,r,\beta)+o(1)$ while $\operatorname{Var}_\mu(N_d)=o(d^{2A(q,r,\beta)})$. Thus detection is possible for $r>\rho^*(\beta)$. [/step] [step:Optimize the threshold exponent] For $q\le r$, one has $(\sqrt q-\sqrt r)_+=0$, so \begin{align*} \gamma(q)=1-\beta-\frac{1-q}{2}=\frac12-\beta+\frac q2. \end{align*} This is increasing in $q$ and therefore contributes no larger value than the value approached at $q=r$. For $q>r$, the exponent is \begin{align*} \gamma(q)=\frac12-\beta+\frac q2-(\sqrt q-\sqrt r)^2. \end{align*} Differentiating on $(r,1)$ gives \begin{align*} \gamma'(q)=-\frac12+\sqrt{\frac r q}, \end{align*} so the unique interior critical point is $q=4r$ when $4r\in(r,1)$. 
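The location of the critical point can be sanity-checked numerically. The sketch below assumes a brute-force grid maximisation of $\gamma$ over $(0,1)$; the names `gamma` and `argmax_gamma` and the sample values $r=0.1$, $\beta=0.55$ are chosen for illustration only.

```python
import math

def gamma(q, r, beta):
    """Exponent gamma(q) = 1 - beta - (sqrt(q) - sqrt(r))_+^2 - (1 - q)/2."""
    gap = max(math.sqrt(q) - math.sqrt(r), 0.0)
    return 1.0 - beta - gap ** 2 - (1.0 - q) / 2.0

def argmax_gamma(r, beta, grid=100_000):
    """Grid maximiser of gamma over q in (0, 1); returns (q, gamma(q))."""
    best_q = max((j / grid for j in range(1, grid)),
                 key=lambda q: gamma(q, r, beta))
    return best_q, gamma(best_q, r, beta)

# Moderately sparse example with 4r < 1: the maximiser should sit near q = 4r
# and the maximal value should be close to r - (beta - 1/2).
q_hat, g_hat = argmax_gamma(r=0.1, beta=0.55)
```

Here $4r=0.4$ lies in $(r,1)$, so the grid maximiser is expected near $0.4$ with value near $r-(\beta-1/2)=0.05$, matching the derivative computation.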
If $1/2<\beta\le3/4$ and $r>\beta-1/2$, choose $q=4r$ when $r<1/4$, and choose $q<1$ close enough to $1$ when $r\ge1/4$. In the first subcase the maximum value is $\gamma(4r)=r-(\beta-1/2)>0$. In the second subcase the endpoint limit $1-\beta-(1-\sqrt r)_+^2$ is positive, because $(1-\sqrt r)_+^2\le1/4\le1-\beta$ and the two inequalities cannot both be equalities when $r>\beta-1/2$; hence $\gamma(q)>0$ after taking $q$ sufficiently close to $1$. Conversely, if $r\le\beta-1/2$, the preceding derivative computation and the increasing behaviour on $q\le r$ show that $\gamma(q)\le0$ for every $q\in(0,1)$. If $3/4<\beta<1$ and $r\ge1/4$, the supremum over $q\in(0,1)$ is approached at the endpoint $q=1$, and positivity is \begin{align*} 1-\beta-(1-\sqrt r)_+^2>0, \end{align*} which is equivalent to $r>(1-\sqrt{1-\beta})^2$; if instead $r<1/4$, the interior maximum $r-(\beta-1/2)$ is negative because $\beta-1/2>1/4$, consistent with $r<1/4<(1-\sqrt{1-\beta})^2$. This proves that the threshold construction succeeds precisely above the stated boundary. [/step] [step:Transfer the argument to signed fixed amplitudes] For the signed alternative, the lower bound is immediate from the one-sided result. The signed class permits arbitrary signs on the support, so it contains the all-positive class $\mathcal{A}_{d,+}(r,\beta)$ as a subclass. Therefore the supremum of the type II error over the signed class is at least the supremum over $\mathcal{A}_{d,+}(r,\beta)$, and the impossibility result for the one-sided class implies the same lower bound for the signed class. For the upper bound, replace $N_d$ by the two-sided exceedance count $M_d:\mathbb{R}^d\to\{0,1,\dots,d\}$ defined by \begin{align*} M_d(x)=\sum_{i=1}^d\mathbb{1}_{\{|x_i|>\tau_d\}}. \end{align*} Under the null, the exceedance probability is exactly twice the one-sided tail; this constant factor does not change any power of $d$, so the null variance is still $d^{1-q+o(1)}$. Under a signal coordinate with mean either $a_d$ or $-a_d$, the probability of $|X_i|>\tau_d$ is again $d^{-(\sqrt q-\sqrt r)_+^2+o(1)}$. 
Using the threshold $d^{\gamma(q)/2}$ times the null standard deviation, the same Chebyshev argument gives vanishing risk for $r>\rho^*(\beta)$, and the signed lower bound gives impossibility for $r<\rho^*(\beta)$. This completes the proof for both alternatives. [guided] Here is the full proof in guided form. First define the one-sided least-favourable prior by choosing a support $S_d$ uniformly among all subsets of size $s_d$ and putting $\mu_i=a_d$ on $S_d$. For a fixed support $S$, the likelihood ratio against the null is \begin{align*} L_S(x)=\exp\left(a_d\sum_{i\in S}x_i-\frac{s_da_d^2}{2}\right). \end{align*} The mixture likelihood is $L_d=\mathbb{E}_{S_d}[L_{S_d}(X)]$. If $T_d$ is an independent copy of $S_d$ and $K_d=|S_d\cap T_d|$, independence of the Gaussian coordinates gives \begin{align*} \mathbb{E}_0[L_d^2]=\mathbb{E}\exp(a_d^2K_d). \end{align*} For every integer $k\ge1$ the overlap obeys \begin{align*} \mathbb{P}(K_d=k)\le \frac{1}{k!}\left(\frac{s_d^2}{d-s_d}\right)^k, \end{align*} so \begin{align*} \mathbb{E}_0[L_d^2]-1\le \exp\left(\frac{s_d^2}{d-s_d}(e^{a_d^2}-1)\right)-1. \end{align*} Because $s_d^2/d=d^{1-2\beta+o(1)}$ and $e^{a_d^2}=d^{2r}$, this tends to $0$ when $r<\beta-1/2$. Hence $L_d\to1$ in $L^2(P_0)$, so the mixture and null are close in total variation, which settles the regime $1/2<\beta\le3/4$. For $3/4<\beta<1$, use truncation. Choose $q\in(r,1)$ so that \begin{align*} (\sqrt q-\sqrt r)^2>1-\beta \end{align*} and \begin{align*} \kappa(q,r)<2\beta-1, \end{align*} where \begin{align*} \kappa(q,r)=\left[2r-(2\sqrt r-\sqrt q)_+^2\right]_+. \end{align*} Such a $q$ exists below the very sparse boundary by taking $q$ close to $1$. Indeed, the first inequality follows from $(1-\sqrt r)^2>1-\beta$. For the second inequality, if $2\sqrt r\le1$, then $\kappa(q,r)=2r$ for $q$ close to $1$, and $r\le1/4<\beta-1/2$ gives $\kappa(q,r)<2\beta-1$. 
If $2\sqrt r>1$, then $\kappa(q,r)\to2r-(2\sqrt r-1)^2$, and the boundary condition is equivalent to this limit being less than $2\beta-1$. Let $\tau_d=\sqrt{2q\log d}$ and define \begin{align*} \widetilde L_d(x)=\mathbb{E}_{S_d}\left[L_{S_d}(x)\mathbb{1}_{\{\max_{i\in S_d}x_i\le \tau_d\}}\right]. \end{align*} The first inequality above makes the truncated-away event have mixture probability $o(1)$, hence $\mathbb{E}_0\widetilde L_d\to1$. For a common active coordinate in two supports, \begin{align*} \mathbb{E}_0\left[e^{2a_dX_i-a_d^2}\mathbb{1}_{\{X_i\le\tau_d\}}\right]\le d^{\kappa(q,r)+o(1)}. \end{align*} Combining this with the same overlap bound gives \begin{align*} \mathbb{E}_0\widetilde L_d^2\le \exp\left(d^{1-2\beta+\kappa(q,r)+o(1)}\right)=1+o(1). \end{align*} Since $\widetilde L_d\le L_d$, the inequality \begin{align*} \mathbb{E}_0|L_d-1|\le \mathbb{E}_0|\widetilde L_d-1|+1-\mathbb{E}_0\widetilde L_d \end{align*} and Cauchy-Schwarz imply total variation convergence. For any test $\psi_d$, the Bayes risk against the prior satisfies \begin{align*} P_0\psi_d+P_{\Pi_{d,+}}(1-\psi_d)\ge 1-\|P_{\Pi_{d,+}}-P_0\|_{\mathrm{TV}}. \end{align*} Because this Bayes type II error is bounded above by the worst-case type II error over the sparse class, total variation convergence proves that every test has risk tending to $1$ below the boundary. For the upper bound, choose $q\in(0,1)$ with \begin{align*} \gamma(q)=1-\beta-(\sqrt q-\sqrt r)_+^2-\frac{1-q}{2}>0. \end{align*} This is possible exactly when $r>\rho^*(\beta)$ by differentiating $\gamma$ on $q>r$: the critical point is $q=4r$, and the endpoint $q\uparrow1$ gives the very sparse branch. Count exceedances $N_d=\sum_i\mathbb{1}_{\{X_i>\tau_d\}}$ with $\tau_d=\sqrt{2q\log d}$. Under the null, the mean and variance are of order $d^{1-q+o(1)}$, so [Chebyshev's inequality](/theorems/1126) makes the type I error vanish for the rejection rule \begin{align*} N_d-m_{0,d}>d^{\gamma(q)/2}v_{0,d}^{1/2}. 
\end{align*} Under any positive sparse signal, the mean exceeds the null mean by \begin{align*} d^{1-\beta-(\sqrt q-\sqrt r)_+^2+o(1)}, \end{align*} which is $d^{\gamma(q)+o(1)}$ null standard deviations. If \begin{align*} A=1-\beta-(\sqrt q-\sqrt r)_+^2, \end{align*} then the mean separation has order $d^{A+o(1)}$, while the alternative variance is at most \begin{align*} d^{1-q+o(1)}+d^{A+o(1)}. \end{align*} The condition $\gamma(q)>0$ says $A>(1-q)/2$, so both variance terms are $o(d^{2A})$. Chebyshev's inequality therefore makes the type II error vanish. Finally, the signed class contains the positive class, so the lower bound transfers by inclusion. For the signed upper bound, replace $N_d$ by the two-sided exceedance count $M_d=\sum_i\mathbb{1}_{\{|X_i|>\tau_d\}}$. The null tail changes only by a constant factor and the signal tail has the same exponent, so the same Chebyshev argument proves the signed achievability statement. [/guided] [/step]
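The one-sided threshold test from the proof can be tried on simulated data. The sketch below is only an illustration under stated assumptions: the function names `rho_star`, `gamma`, and `exceedance_test` are hypothetical, the null tail $p_{0,d}$ is computed exactly with `math.erfc` rather than through the asymptotic form $d^{-q+o(1)}$, and $s_d=\lceil d^{1-\beta}\rceil$ is one concrete choice compatible with $s_d=d^{1-\beta+o(1)}$. Finite-$d$ behaviour need not match the asymptotic statement closely.

```python
import math
import random

def rho_star(beta):
    """Boundary used in the proof: beta - 1/2 on (1/2, 3/4],
    (1 - sqrt(1 - beta))^2 on (3/4, 1)."""
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie in (1/2, 1)")
    if beta <= 0.75:
        return beta - 0.5
    return (1.0 - math.sqrt(1.0 - beta)) ** 2

def gamma(q, r, beta):
    """Exponent gamma(q) = 1 - beta - (sqrt(q) - sqrt(r))_+^2 - (1 - q)/2."""
    gap = max(math.sqrt(q) - math.sqrt(r), 0.0)
    return 1.0 - beta - gap ** 2 - (1.0 - q) / 2.0

def exceedance_test(x, q, r, beta):
    """Return True (reject the null) when N_d - m_0 > d^{gamma(q)/2} * sqrt(v_0)."""
    d = len(x)
    tau = math.sqrt(2.0 * q * math.log(d))
    p0 = 0.5 * math.erfc(tau / math.sqrt(2.0))   # exact P(Z > tau), Z ~ N(0, 1)
    m0 = d * p0                                  # null mean of N_d
    v0 = d * p0 * (1.0 - p0)                     # null variance of N_d
    n_exceed = sum(1 for xi in x if xi > tau)
    return n_exceed - m0 > d ** (gamma(q, r, beta) / 2.0) * math.sqrt(v0)

# Illustration with r well above rho_star(beta) = 0.1 (parameters are arbitrary).
rng = random.Random(1)
d, beta, r, q = 20_000, 0.6, 0.9, 0.95
s_d = math.ceil(d ** (1.0 - beta))               # assumption: s_d = ceil(d^(1-beta))
a_d = math.sqrt(2.0 * r * math.log(d))
support = set(rng.sample(range(d), s_d))
x_null = [rng.gauss(0.0, 1.0) for _ in range(d)]
x_alt = [rng.gauss(a_d if i in support else 0.0, 1.0) for i in range(d)]
print(exceedance_test(x_null, q, r, beta), exceedance_test(x_alt, q, r, beta))
```

For the signed alternative, the same sketch applies after replacing the count of `xi > tau` by a count of `abs(xi) > tau` and doubling `p0`, mirroring the two-sided count $M_d$.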
