[proofplan]
We argue by contradiction through a many-way testing reduction. On a restricted-eigenvalue event, fix a common baseline support of size $k_\ell-1$ and vary one additional active coordinate among at least order $d_\ell$ possible locations. Exact support recovery on the whole sparse class would identify this varying coordinate with vanishing error. A self-contained Fano bound shows this is impossible unless the average Kullback-Leibler divergence between the induced Gaussian laws is at least a constant multiple of the logarithm of the number of alternatives; the restricted-eigenvalue upper bound controls that divergence by $n_\ell a_\ell^2/\sigma^2$. No lower restricted-eigenvalue estimate, sample-size estimate, or distributional property of the Gaussian design is used after conditioning on $X\in E_\ell$; those hypotheses serve only to place the deterministic obstruction inside the surrounding high-probability random-design setting.
[/proofplan]
[step:Build a many-alternative subfamily inside the sparse parameter class]
Fix $\ell$ and write $n:=n_\ell$, $d:=d_\ell$, $k:=k_\ell$, $a:=a_\ell$, and $X:=X_\ell$. Let $E:=E_\ell$. Work on an arbitrary realization of $X$ in $E$. For $\gamma\in\mathbb R^d$, define its support by
\begin{align*}
S(\gamma):=\{i\in\{1,\dots,d\}:\gamma_i\ne 0\},
\end{align*}
and define the sparse signal class by
\begin{align*}
\Theta_k(a):=\{\gamma\in\mathbb R^d: |S(\gamma)|=k \text{ and } |\gamma_i|\ge a \text{ for every } i\in S(\gamma)\}.
\end{align*}
If $k=1$, set $B:=\varnothing$. If $k\ge 2$, set
\begin{align*}
B:=\{1,\dots,k-1\}.
\end{align*}
Define the candidate coordinate set
\begin{align*}
J:=\{k,k+1,\dots,d\}.
\end{align*}
Then
\begin{align*}
M:=|J|=d-k+1\ge d/2,
\end{align*}
because $k\le d/2$.
For each $j\in J$, define the parameter vector $\beta_j\in\mathbb R^d$ as follows: $(\beta_j)_i=a$ when $i\in B\cup\{j\}$, and $(\beta_j)_i=0$ when $i\notin B\cup\{j\}$. Then $S(\beta_j)=B\cup\{j\}$ and $|S(\beta_j)|=k$, so $\beta_j\in\Theta_k(a)$ for every $j\in J$.
Let $P_j^X$ denote the conditional law of $y$ given $X$ under parameter $\beta_j$. Thus
\begin{align*}
P_j^X=\mathcal N(X\beta_j,\sigma^2 I_n)
\end{align*}
as a probability measure on $\mathbb R^n$ equipped with its Borel $\sigma$-algebra.
[guided]
For any vector $\gamma\in\mathbb R^d$, its support is
\begin{align*}
S(\gamma):=\{i\in\{1,\dots,d\}:\gamma_i\ne 0\}.
\end{align*}
The sparse signal class in this fixed dimension is
\begin{align*}
\Theta_k(a):=\{\gamma\in\mathbb R^d: |S(\gamma)|=k \text{ and } |\gamma_i|\ge a \text{ for every } i\in S(\gamma)\}.
\end{align*}
Thus $\Theta_k(a)$ contains only vectors with exactly $k$ active coordinates and all nonzero entries at least $a$ in absolute value. Therefore we cannot use the simpler alternatives $0$ and $a e_j$ when $k>1$, because $0\notin\Theta_k(a)$ and $a e_j$ has support size $1$ rather than $k$.
The repair is to keep $k-1$ active coordinates fixed and make only the last active coordinate vary. If $k=1$, no baseline coordinates are needed, so $B=\varnothing$. If $k\ge 2$, we take
\begin{align*}
B=\{1,\dots,k-1\}.
\end{align*}
The remaining candidate coordinates are
\begin{align*}
J=\{k,k+1,\dots,d\}.
\end{align*}
Since $k\le d/2$, the number of candidates satisfies
\begin{align*}
M=|J|=d-k+1\ge d/2.
\end{align*}
For each $j\in J$, define $\beta_j:\{1,\dots,d\}\to\mathbb R$ componentwise by setting $(\beta_j)_i=a$ when $i\in B\cup\{j\}$ and $(\beta_j)_i=0$ when $i\notin B\cup\{j\}$. Equivalently, $\beta_j$ has coefficient $a$ on every baseline coordinate in $B$ and coefficient $a$ at the one varying coordinate $j$. Hence
\begin{align*}
S(\beta_j)=B\cup\{j\},\qquad |S(\beta_j)|=k,\qquad \min_{i\in S(\beta_j)}|(\beta_j)_i|=a.
\end{align*}
Thus every $\beta_j$ lies in $\Theta_k(a)$.
For this fixed design matrix $X$, the observation model under $\beta_j$ is Gaussian:
\begin{align*}
y=X\beta_j+\varepsilon,\qquad \varepsilon\sim\mathcal N(0,\sigma^2 I_n).
\end{align*}
We denote this conditional law by
\begin{align*}
P_j^X=\mathcal N(X\beta_j,\sigma^2 I_n).
\end{align*}
The testing problem is now: from one draw $y\sim P_j^X$, identify which $j\in J$ generated the observation.
[/guided]
[/step]
[step:Convert support recovery into identification of the varying coordinate]
Let
\begin{align*}
\hat S: \mathbb R^n\times\mathbb R^{n\times d} &\to 2^{\{1,\dots,d\}}
\end{align*}
be any measurable estimator of the support. For the fixed design $X$, define the induced coordinate estimator
\begin{align*}
\hat j_X: \mathbb R^n &\to J\cup\{\ast\}
\end{align*}
as follows. If $\hat S(y,X)\setminus B$ is a singleton contained in $J$, then $\hat j_X(y)$ is the unique element of $\hat S(y,X)\setminus B$. In all other cases, set $\hat j_X(y):=\ast$. Here $\ast$ is an auxiliary failure symbol not belonging to $J$.
If $\hat S(y,X)=S(\beta_j)=B\cup\{j\}$, then $\hat j_X(y)=j$. Therefore, for every $j\in J$,
\begin{align*}
P_j^X(\hat j_X(y)\ne j)
\le
P_j^X(\hat S(y,X)\ne S(\beta_j)).
\end{align*}
Consequently, with
\begin{align*}
R_X:=\sup_{\beta\in\Theta_k(a)}
\mathbb P_\beta(\hat S(y,X)\ne S(\beta)\mid X),
\end{align*}
we have
\begin{align*}
\max_{j\in J}P_j^X(\hat j_X(y)\ne j)\le R_X.
\end{align*}
[/step]
[step:Apply a finite Fano bound to the induced testing problem]
[claim:Fano lower bound for finitely many Gaussian experiments]
Let $J$ be a finite set with $M:=|J|\ge 2$. For each $j\in J$, let $P_j$ be a probability measure on a common measurable space $(\mathcal Y,\mathcal A)$. Let the measurable estimator $\tilde j$ be the map
\begin{align*}
\tilde j: \mathcal Y &\to J\cup\{\ast\}.
\end{align*}
Fix $j_0\in J$. Define $D_{\mathrm{KL}}(P\,\|\,Q)$ to be the Kullback-Leibler divergence of a probability measure $P$ from a probability measure $Q$, with value $+\infty$ if $P$ is not absolutely continuous with respect to $Q$. Then
\begin{align*}
\max_{j\in J}P_j(\tilde j\ne j)
\ge
1-\frac{\frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,P_{j_0})+\log 2}{\log M}.
\end{align*}
[/claim]
[proof]
Let $V$ be a $J$-valued [random variable](/page/Random%20Variable) with the uniform law on $J$, and conditional on the event $\{V=j\}$ let $Y$ be a $\mathcal Y$-valued random variable with law $P_j$. Let $Q$ be the marginal law of $Y$, so
\begin{align*}
Q=\frac{1}{M}\sum_{m\in J}P_m.
\end{align*}
Define the mutual information $I(V;Y)$ by
\begin{align*}
I(V;Y)
:=
\frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,Q).
\end{align*}
If at least one term $D_{\mathrm{KL}}(P_j\,\|\,P_{j_0})$ is infinite, the desired bound follows after substituting $+\infty$ on the right-hand side. Hence assume all these divergences are finite. Then $P_j\ll P_{j_0}$ for every $j\in J$, and therefore $Q\ll P_{j_0}$. Let $p_j:=dP_j/dP_{j_0}$ and $q:=dQ/dP_{j_0}$. By the definition of $Q$,
\begin{align*}
q=\frac{1}{M}\sum_{m\in J}p_m
\end{align*}
$P_{j_0}$-almost everywhere. Expanding the average divergence relative to $P_{j_0}$ gives
\begin{align*}
\frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,P_{j_0})
&=
\frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log p_j\,dP_{j_0}.
\end{align*}
Since $p_j=q\,(dP_j/dQ)$ $P_{j_0}$-almost everywhere on the set where $q>0$, and $p_j=0$ where $q=0$, we obtain
First, using $p_j=q\,(dP_j/dQ)$ $P_{j_0}$-almost everywhere on $\{q>0\}$ and $p_j=0$ on $\{q=0\}$ gives
\begin{align*}
\frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log p_j\,dP_{j_0}
=
\frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log\left(\frac{dP_j}{dQ}\right)\,dP_{j_0}
+
\frac{1}{M}\sum_{j\in J}\int_{\mathcal Y}p_j\log q\,dP_{j_0}.
\end{align*}
Since $q=M^{-1}\sum_{j\in J}p_j$, this becomes
\begin{align*}
\frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log p_j\,dP_{j_0}
=
\frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,Q)
+
\int_{\mathcal Y}q\log q\,dP_{j_0}.
\end{align*}
By the definitions of $I(V;Y)$ and $D_{\mathrm{KL}}(Q\,\|\,P_{j_0})$, we conclude
\begin{align*}
\frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log p_j\,dP_{j_0}
=
I(V;Y)+D_{\mathrm{KL}}(Q\,\|\,P_{j_0}).
\end{align*}
Because $D_{\mathrm{KL}}(Q\,\|\,P_{j_0})\ge 0$, this identity yields the required information bound
\begin{align*}
I(V;Y)\le \frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,P_{j_0}).
\end{align*}
Let $\hat V:=\tilde j(Y)$. Define the error probability $p_e:=\mathbb P(\hat V\ne V)$. Since $\ast\notin J$, every outcome with $\tilde j(Y)=\ast$ is counted as an error, so $\hat V$ is a finite-valued decision rule whose error event is precisely $\{\hat V\ne V\}$. Let $H(V)$ denote the Shannon entropy of $V$, and let $H(V\mid \hat V)$ denote the conditional Shannon entropy of $V$ given $\hat V$.
We prove the finite-alphabet entropy estimate needed here. Let $E_0$ be the Bernoulli random variable defined by $E_0=1$ on $\{\hat V\ne V\}$ and $E_0=0$ on $\{\hat V=V\}$. Conditional on $\hat V$ and $E_0=0$, the value of $V$ is determined by $V=\hat V$. Conditional on $\hat V$ and $E_0=1$, the value of $V$ belongs to at most $M$ possibilities. Hence
Since adding $E_0$ cannot decrease the information available to describe the pair, we have
\begin{align*}
H(V\mid \hat V)
\le H(E_0,V\mid \hat V).
\end{align*}
The finite-alphabet [chain rule for entropy](/theorems/1635) gives
\begin{align*}
H(E_0,V\mid \hat V)
= H(E_0\mid \hat V)+H(V\mid \hat V,E_0).
\end{align*}
The Bernoulli entropy is bounded by $\log 2$, and the conditional entropy of $V$ is $0$ on $\{E_0=0\}$ and at most $\log M$ on $\{E_0=1\}$. Therefore
\begin{align*}
H(V\mid \hat V)
\le \log 2+p_e\log M.
\end{align*}
Moreover, $H(V)=\log M$. Because $\hat V$ is a measurable function of $Y$, conditioning on the full observation $Y$ leaves no more uncertainty than conditioning on $\hat V$ alone, so
\begin{align*}
H(V\mid \hat V)\ge H(V\mid Y)=H(V)-I(V;Y)=\log M-I(V;Y).
\end{align*}
Combining the preceding upper and lower bounds on $H(V\mid \hat V)$ gives
\begin{align*}
p_e
\ge
1-\frac{I(V;Y)+\log 2}{\log M}.
\end{align*}
Finally,
\begin{align*}
p_e
=
\frac{1}{M}\sum_{j\in J}P_j(\tilde j\ne j)
\le
\max_{j\in J}P_j(\tilde j\ne j).
\end{align*}
Combining this with the information bound proves the claim.
[/proof]
Choose a fixed index $j_0\in J$. Applying the claim with $P_j=P_j^X$ and $\tilde j=\hat j_X$ gives
\begin{align*}
R_X
\ge
1-\frac{\frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j^X\,\|\,P_{j_0}^X)+\log 2}{\log M}.
\end{align*}
[/step]
[step:Bound the Gaussian divergences by the restricted eigenvalue upper bound]
Let $\kappa_+>0$ denote the fixed upper restricted-eigenvalue constant on the event $E$, meaning that every $2k$-sparse vector $v\in\mathbb R^d$ satisfies
\begin{align*}
\frac{1}{n}\|Xv\|_2^2\le \kappa_+\|v\|_2^2.
\end{align*}
For $j\in J$, both $P_j^X$ and $P_{j_0}^X$ are Gaussian measures on $\mathbb R^n$ with common covariance matrix $\sigma^2 I_n$. We derive the Kullback-Leibler formula directly. If $\mu_j:=X\beta_j$ and $\mu_{j_0}:=X\beta_{j_0}$ denote the two mean vectors in $\mathbb R^n$, then the densities of $P_j^X$ and $P_{j_0}^X$ with respect to $n$-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) $\mathcal L^n$ satisfy
\begin{align*}
\log\frac{dP_j^X}{dP_{j_0}^X}(y)
=
-\frac{1}{2\sigma^2}|y-\mu_j|^2+\frac{1}{2\sigma^2}|y-\mu_{j_0}|^2.
\end{align*}
Taking expectation with respect to $P_j^X$ and using $\mathbb E_j[y-\mu_j]=0$ and $\mathbb E_j[y-\mu_{j_0}]=\mu_j-\mu_{j_0}$ in the cross term gives
\begin{align*}
D_{\mathrm{KL}}(P_j^X\,\|\,P_{j_0}^X)
=
\frac{1}{2\sigma^2}|\mu_j-\mu_{j_0}|^2
=
\frac{1}{2\sigma^2}\|X(\beta_j-\beta_{j_0})\|_2^2.
\end{align*}
The vector $\beta_j-\beta_{j_0}$ has support contained in $\{j,j_0\}$, so it is $2$-sparse and hence $2k$-sparse because $k\ge 1$. Since $X\in E$, the upper restricted-eigenvalue bound gives
\begin{align*}
\frac{1}{n}\|X(\beta_j-\beta_{j_0})\|_2^2
\le
\kappa_+\|\beta_j-\beta_{j_0}\|_2^2.
\end{align*}
If $j=j_0$, the right-hand side is $0$. If $j\ne j_0$, then $\beta_j-\beta_{j_0}$ has entries $a$ at $j$ and $-a$ at $j_0$, so
\begin{align*}
\|\beta_j-\beta_{j_0}\|_2^2=2a^2.
\end{align*}
Thus, for every $j\in J$,
\begin{align*}
D_{\mathrm{KL}}(P_j^X\,\|\,P_{j_0}^X)
\le
\frac{\kappa_+ n a^2}{\sigma^2}.
\end{align*}
Substituting this bound into the Fano inequality yields
\begin{align*}
R_X
\ge
1-\frac{\kappa_+ n a^2/\sigma^2+\log 2}{\log M}.
\end{align*}
[guided]
The only quantitative input from the design matrix is the upper restricted-eigenvalue bound. The constant used for that bound is $\kappa_+>0$, defined by the condition that every $2k$-sparse vector $v\in\mathbb R^d$ satisfies
\begin{align*}
\frac{1}{n}\|Xv\|_2^2\le \kappa_+\|v\|_2^2.
\end{align*}
For two different alternatives $\beta_j$ and $\beta_{j_0}$, the Gaussian laws have the same covariance $\sigma^2 I_n$ and different means $X\beta_j$ and $X\beta_{j_0}$. Let
\begin{align*}
\mu_j:=X\beta_j,\qquad \mu_{j_0}:=X\beta_{j_0}
\end{align*}
denote these two mean vectors in $\mathbb R^n$. We compute the divergence from the Gaussian densities with respect to $\mathcal L^n$. For $y\in\mathbb R^n$,
\begin{align*}
\log\frac{dP_j^X}{dP_{j_0}^X}(y)
=
-\frac{1}{2\sigma^2}|y-\mu_j|^2+\frac{1}{2\sigma^2}|y-\mu_{j_0}|^2.
\end{align*}
Under $P_j^X$, write $y=\mu_j+\varepsilon$ with $\varepsilon\sim\mathcal N(0,\sigma^2 I_n)$. Then $y-\mu_{j_0}=\varepsilon+(\mu_j-\mu_{j_0})$, and the expectation of the cross term is zero because $\mathbb E_j[\varepsilon]=0$. Hence
\begin{align*}
D_{\mathrm{KL}}(P_j^X\,\|\,P_{j_0}^X)
=
\frac{1}{2\sigma^2}|\mu_j-\mu_{j_0}|^2
=
\frac{1}{2\sigma^2}\|X(\beta_j-\beta_{j_0})\|_2^2.
\end{align*}
Now we check that the restricted-eigenvalue hypothesis applies to $\beta_j-\beta_{j_0}$. The baseline coordinates in $B$ cancel, because every $\beta_j$ has the same value $a$ on $B$. Therefore $\beta_j-\beta_{j_0}$ is supported only on the two varying coordinates $j$ and $j_0$. Since $k\ge 1$, every $2$-sparse vector is $2k$-sparse. Hence, on $E$,
\begin{align*}
\frac{1}{n}\|X(\beta_j-\beta_{j_0})\|_2^2
\le
\kappa_+\|\beta_j-\beta_{j_0}\|_2^2.
\end{align*}
If $j\ne j_0$, then the difference vector has one coordinate equal to $a$ and one coordinate equal to $-a$, so
\begin{align*}
\|\beta_j-\beta_{j_0}\|_2^2=a^2+a^2=2a^2.
\end{align*}
If $j=j_0$, the difference is zero, and the same upper bound remains valid. Therefore
\begin{align*}
D_{\mathrm{KL}}(P_j^X\,\|\,P_{j_0}^X)
\le
\frac{1}{2\sigma^2}\,n\kappa_+(2a^2)
=
\frac{\kappa_+ n a^2}{\sigma^2}.
\end{align*}
[Fano's inequality](/theorems/1654) then forces
\begin{align*}
R_X
\ge
1-\frac{\kappa_+ n a^2/\sigma^2+\log 2}{\log M}.
\end{align*}
This inequality expresses the core obstruction: if $n a^2/\sigma^2$ is much smaller than $\log M$, then the observations do not carry enough information to identify the active coordinate among $M$ possibilities.
[/guided]
[/step]
[step:Use uniform vanishing recovery error on the restricted-eigenvalue event to force the signal lower bound]
For each realization $X\in E_\ell$, define
\begin{align*}
R_\ell(X):=\sup_{\beta\in\Theta_{k_\ell}(a_\ell)}\mathbb P_\beta\!\left(\hat S_\ell(y,X)\ne S(\beta)\mid X_\ell=X\right).
\end{align*}
By the uniform conditional recovery hypothesis in the theorem statement, where "on $E_\ell$" is interpreted as uniform over every design realization belonging to $E_\ell$,
\begin{align*}
\sup_{X\in E_\ell}R_\ell(X)\to 0.
\end{align*}
The assumption $\mathbb P(E_\ell)\to 1$ ensures that this uniform conditional statement is imposed on a typical set of Gaussian designs; the deterministic lower bound below only uses that the chosen realization $X$ belongs to $E_\ell$. The preceding steps show that, for every $X\in E_\ell$ and all sufficiently large $\ell$ with $M_\ell:=d_\ell-k_\ell+1\ge 2$,
\begin{align*}
R_\ell(X)\ge 1-\frac{\kappa_+ n_\ell a_\ell^2/\sigma^2+\log 2}{\log M_\ell}.
\end{align*}
Since $M_\ell\ge d_\ell/2$ and $d_\ell\to\infty$,
\begin{align*}
\log M_\ell\ge \frac{1}{2}\log d_\ell
\end{align*}
for all sufficiently large $\ell$. The [uniform convergence](/page/Uniform%20Convergence) gives $\sup_{X\in E_\ell}R_\ell(X)\le 1/2$ for all sufficiently large $\ell$, and hence, for each $X\in E_\ell$,
\begin{align*}
\frac{\kappa_+ n_\ell a_\ell^2}{\sigma^2}+\log 2\ge \frac{1}{2}\log M_\ell.
\end{align*}
For all sufficiently large $\ell$, $\log M_\ell\ge 4\log 2$, so
\begin{align*}
\frac{\kappa_+ n_\ell a_\ell^2}{\sigma^2}\ge \frac{1}{4}\log M_\ell\ge \frac{1}{8}\log d_\ell.
\end{align*}
Thus
\begin{align*}
a_\ell\ge \frac{1}{\sqrt{8\kappa_+}}\,\sigma\sqrt{\frac{\log d_\ell}{n_\ell}}
\end{align*}
for all sufficiently large $\ell$. This is the claimed necessary signal-strength lower bound, with a constant depending only on the restricted-eigenvalue upper bound.
[guided]
The previous step gave a deterministic lower bound for the worst-case conditional support recovery error on every design realization $X\in E_\ell$:
\begin{align*}
R_\ell(X)\ge 1-\frac{\kappa_+ n_\ell a_\ell^2/\sigma^2+\log 2}{\log M_\ell},
\end{align*}
where
\begin{align*}
M_\ell:=d_\ell-k_\ell+1.
\end{align*}
The recovery hypothesis says that this same quantity must vanish uniformly over $X\in E_\ell$:
\begin{align*}
\sup_{X\in E_\ell}R_\ell(X)\to 0.
\end{align*}
Thus, for all sufficiently large $\ell$, every $X\in E_\ell$ satisfies $R_\ell(X)\le 1/2$. Combining this upper bound with the deterministic Fano lower bound gives
\begin{align*}
\frac{\kappa_+ n_\ell a_\ell^2}{\sigma^2}+\log 2\ge \frac{1}{2}\log M_\ell.
\end{align*}
Now use the size of the testing family. Since $k_\ell\le d_\ell/2$, we have
\begin{align*}
M_\ell=d_\ell-k_\ell+1\ge d_\ell/2.
\end{align*}
Because $d_\ell\to\infty$, for all sufficiently large $\ell$,
\begin{align*}
\log M_\ell\ge \frac{1}{2}\log d_\ell
\end{align*}
and also
\begin{align*}
\log M_\ell\ge 4\log 2.
\end{align*}
The latter inequality lets us absorb the additive $\log 2$ term:
\begin{align*}
\frac{\kappa_+ n_\ell a_\ell^2}{\sigma^2}
\ge \frac{1}{2}\log M_\ell-\log 2
\ge \frac{1}{4}\log M_\ell
\ge \frac{1}{8}\log d_\ell.
\end{align*}
Solving this inequality for $a_\ell$ yields
\begin{align*}
a_\ell\ge \frac{1}{\sqrt{8\kappa_+}}\,\sigma\sqrt{\frac{\log d_\ell}{n_\ell}}
\end{align*}
for all sufficiently large $\ell$. This proves the necessary signal-strength lower bound, and the displayed constant depends only on the restricted-eigenvalue upper bound $\kappa_+$.
[/guided]
[/step]