[proofplan]
We work on the high-probability event on which the Gaussian design has restricted sparse eigenvalues of order $n$ up to sparsity $2k$. On that event, the empirical prediction norm $v \mapsto n^{-1/2}|Xv|$ and the Euclidean norm $v \mapsto |v|$ are equivalent for every $2k$-sparse vector. The lower bound follows by applying a sparse Fano testing argument on a $k$-sparse packing and using the upper sparse eigenvalue control to keep Kullback-Leibler divergences below a fixed fraction of the packing entropy. The upper bound uses the constrained least-squares estimator over the $k$-sparse parameter set, applies the prediction oracle bound at scale $\sigma^2 k\log(ed/k)$ for the unaveraged prediction loss $|X(\hat\beta-\beta)|^2$, and converts prediction error into Euclidean error using the lower sparse eigenvalue control.
[/proofplan]
[step:Condition on the restricted eigenvalue event for the Gaussian design]
For an integer $s \in \{1,\dots,d\}$, define the lower restricted eigenvalue of $X$ at sparsity $s$ by
\begin{align*}
\kappa_-(s;X)
&:=
\inf\left\{\frac{|Xv|^2}{|v|^2}: v \in \mathbb R^d,\ 0<|\{j:v_j\neq 0\}|\leq s\right\}.
\end{align*}
Define the upper restricted eigenvalue of $X$ at sparsity $s$ by
\begin{align*}
\kappa_+(s;X)
&:=
\sup\left\{\frac{|Xv|^2}{|v|^2}: v \in \mathbb R^d,\ 0<|\{j:v_j\neq 0\}|\leq s\right\}.
\end{align*}
Because the theorem statement takes $X$ to have independent $\mathcal N(0,1)$ entries, the [Gaussian restricted eigenvalue bound for sparse vectors](/page/Gaussian%20Restricted%20Eigenvalue%20Bound) applies in its unnormalized form. The distributional hypothesis is verified directly: the entries of $X$ are independent centered Gaussian random variables with variance $1$, and $X$ is independent of the noise vector $\varepsilon$. The bound gives universal constants $a,b,c \in (0,\infty)$ and $C_0 \in (0,\infty)$ such that, whenever
\begin{align*}
n \geq C_0 k \log\left(\frac{ed}{k}\right),
\end{align*}
the event
\begin{align*}
\mathcal E_X
:=
\{an \leq \kappa_-(2k;X) \text{ and } \kappa_+(2k;X) \leq bn\}
\end{align*}
satisfies
\begin{align*}
\mathbb P(\mathcal E_X)
\geq
1 - 2\exp\left(-c k \log\left(\frac{ed}{k}\right)\right).
\end{align*}
For the rest of the proof, fix a deterministic matrix $X \in \mathcal E_X$. Define the $k$-sparse parameter set
\begin{align*}
\Theta_k
:=
\{\theta\in\mathbb R^d:|\{j:\theta_j\neq0\}|\leq k\}.
\end{align*}
If $\beta,\gamma \in \Theta_k$, then $\beta-\gamma$ is $2k$-sparse. Hence the definition of $\mathcal E_X$ gives
\begin{align*}
an|\beta-\gamma|^2
\leq
|X(\beta-\gamma)|^2
\leq
bn|\beta-\gamma|^2.
\end{align*}
[/step]
[step:Prove the Euclidean minimax lower bound by sparse testing]
Let $\mathcal A$ denote the class of measurable estimators $\hat\beta:\mathbb R^n\times\mathbb R^{n\times d}\to\mathbb R^d$. Define
\begin{align*}
R_X
:=
\inf_{\hat\beta\in\mathcal A}\sup_{\beta\in\Theta_k}
\mathbb E_\beta\left[|\hat\beta(Y,X)-\beta|^2\mid X\right].
\end{align*}
Use the following standard [Varshamov-Gilbert packing](/page/Varshamov-Gilbert%20Bound) and reference-measure form of [Fano inequality](/page/Fano%27s%20Inequality) for sparse Gaussian regression: if a finite parameter set has pairwise squared Euclidean separation at least $r^2$ and every corresponding law has Kullback-Leibler divergence at most $(1/16)\log M$ from a common reference law, where $M$ is the cardinality of the finite set, then the minimax squared Euclidean risk over that finite set is at least a universal constant times $r^2$. There are universal constants $c_2,c_3,c_4 \in (0,\infty)$ such that, for each $1\leq k\leq d/2$, there is a set $\mathcal U\subset\{0,1\}^d$ with $|u|_0=k$ for every $u\in\mathcal U$,
\begin{align*}
\log |\mathcal U|
\geq
c_2 k\log\left(\frac{ed}{k}\right),
\end{align*}
and $|u-v|^2\geq c_3 k$ whenever $u,v\in\mathcal U$ and $u\neq v$. Define
\begin{align*}
\delta^2 := \alpha\sigma^2\frac{\log(ed/k)}{n},
\qquad
\mathcal V := \{\delta u : u\in\mathcal U\}\subset\Theta_k,
\end{align*}
where $\alpha>0$ is chosen so that $\alpha b/2\leq c_2/16$. For each $\theta=\delta u\in\mathcal V$,
\begin{align*}
|\theta|^2=k\delta^2,
\end{align*}
so the upper restricted eigenvalue bound gives
\begin{align*}
D_{\mathrm{KL}}\!\left(\mathcal N(X\theta,\sigma^2 I_n)\,\middle\|\,\mathcal N(0,\sigma^2 I_n)\right)
=
\frac{|X\theta|^2}{2\sigma^2}
\leq
\frac{bnk\delta^2}{2\sigma^2}
=
\frac{\alpha b}{2}k\log\left(\frac{ed}{k}\right)
\leq
\frac{1}{16}\log|\mathcal V|.
\end{align*}
Moreover distinct $\theta,\eta\in\mathcal V$ satisfy
\begin{align*}
|\theta-\eta|^2
=
\delta^2|u-v|^2
\geq
c_3 k\delta^2.
\end{align*}
[Fano's inequality](/theorems/1654) therefore gives
\begin{align*}
R_X
\geq
c_4 c_3 k\delta^2
=
c_4c_3\alpha\sigma^2\frac{k\log(ed/k)}{n}.
\end{align*}
Setting $c_1:=c_4c_3\alpha$ proves the lower bound.
[guided]
This guided block rewrites the lower-bound step; the upper-bound estimator argument is proved in the following steps. We now give the finite testing construction explicitly. The goal is to find many $k$-sparse vectors that are well separated in Euclidean distance, while the corresponding Gaussian laws remain close enough in Kullback-Leibler divergence for Fano's inequality to apply.
By the [Varshamov-Gilbert packing](/page/Varshamov-Gilbert%20Bound), there are universal constants $c_2,c_3\in(0,\infty)$ and a set $\mathcal U\subset\{0,1\}^d$ such that each $u\in\mathcal U$ has exactly $k$ nonzero coordinates,
\begin{align*}
\log |\mathcal U|
\geq
c_2 k\log\left(\frac{ed}{k}\right),
\end{align*}
and, for $u\neq v$ in $\mathcal U$,
\begin{align*}
|u-v|^2\geq c_3k.
\end{align*}
Choose a scale $\delta>0$ by
\begin{align*}
\delta^2 := \alpha\sigma^2\frac{\log(ed/k)}{n},
\end{align*}
where $\alpha>0$ will be fixed below, and define the finite parameter set
\begin{align*}
\mathcal V := \{\delta u : u\in\mathcal U\}\subset\Theta_k.
\end{align*}
This declaration ensures that every element of $\mathcal V$ is $k$-sparse, hence admissible. It also gives a precise separation estimate: if $\theta=\delta u$ and $\eta=\delta v$ are distinct elements of $\mathcal V$, then
\begin{align*}
|\theta-\eta|^2
=
\delta^2|u-v|^2
\geq
c_3k\delta^2.
\end{align*}
Next we control the information distance. Under parameter $\theta\in\mathcal V$, the conditional law of $Y$ given $X$ is the Gaussian measure $\mathcal N(X\theta,\sigma^2I_n)$. The Kullback-Leibler divergence from the law at $0$ is
\begin{align*}
D_{\mathrm{KL}}\!\left(\mathcal N(X\theta,\sigma^2 I_n)\,\middle\|\,\mathcal N(0,\sigma^2 I_n)\right)
=
\frac{|X\theta|^2}{2\sigma^2}.
\end{align*}
Because $\theta$ is $k$-sparse and $X\in\mathcal E_X$, the upper restricted eigenvalue bound gives
\begin{align*}
\frac{|X\theta|^2}{2\sigma^2}
\leq
\frac{bn|\theta|^2}{2\sigma^2}.
\end{align*}
Since $\theta=\delta u$ and $|u|^2=k$, this becomes
\begin{align*}
\frac{bn|\theta|^2}{2\sigma^2}
=
\frac{bnk\delta^2}{2\sigma^2}
=
\frac{\alpha b}{2}k\log\left(\frac{ed}{k}\right).
\end{align*}
Choose $\alpha$ so that $\alpha b/2\leq c_2/16$. Then
\begin{align*}
D_{\mathrm{KL}}\!\left(\mathcal N(X\theta,\sigma^2 I_n)\,\middle\|\,\mathcal N(0,\sigma^2 I_n)\right)
\leq
\frac{1}{16}\log|\mathcal V|.
\end{align*}
The reference-measure form of Fano's inequality applies because all conditional Gaussian laws indexed by $\mathcal V$ have Kullback-Leibler divergence at most $(1/16)\log |\mathcal V|$ from the common reference law $\mathcal N(0,\sigma^2I_n)$. It gives a universal constant $c_4>0$ such that every estimator has worst-case squared Euclidean risk at least $c_4$ times the squared separation scale. Hence
\begin{align*}
R_X
\geq
c_4 c_3 k\delta^2
=
c_4c_3\alpha\sigma^2\frac{k\log(ed/k)}{n}.
\end{align*}
With $c_1:=c_4c_3\alpha$, this is the desired lower bound.
[/guided]
[/step]
[step:Choose the constrained least-squares estimator for the upper bound]
Define the constrained least-squares estimator as a measurable map
\begin{align*}
\hat\beta_{\mathrm{LS}}: \mathbb R^n\times\mathbb R^{n\times d} &\to \Theta_k.
\end{align*}
For each $(y,X)\in\mathbb R^n\times\mathbb R^{n\times d}$, choose
\begin{align*}
\hat\beta_{\mathrm{LS}}(y,X) \in \operatorname*{argmin}_{\theta\in\Theta_k}|y-X\theta|^2,
\end{align*}
with an arbitrary measurable minimizer selected if there is more than one minimizer. We justify that such a measurable choice exists on the fixed event $\mathcal E_X$. For each support set $S\subset\{1,\dots,d\}$ with $|S|\leq k$, the restriction of the objective to the coordinate subspace
\begin{align*}
L_S:=\{\theta\in\mathbb R^d:\theta_j=0\text{ for every }j\notin S\}
\end{align*}
is a continuous quadratic function of the coordinates in $S$. Since $X\in\mathcal E_X$ and $|S|\leq k\leq2k$, the lower restricted eigenvalue bound gives coercivity on $L_S$; hence the minimum on $L_S$ is attained. There are finitely many such support sets, so taking the support with the smallest attained residual gives a global minimizer over $\Theta_k$. The finite-support construction and a fixed deterministic tie-breaking rule give a Borel measurable minimizer as a function of $(y,X)$. Since $\hat\beta_{\mathrm{LS}}(Y,X)\in\Theta_k$ and $\beta\in\Theta_k$, the difference
\begin{align*}
\hat\Delta := \hat\beta_{\mathrm{LS}}(Y,X)-\beta
\end{align*}
is $2k$-sparse. On $\mathcal E_X$, the lower restricted eigenvalue bound gives
\begin{align*}
|\hat\Delta|^2
\leq
\frac{1}{an}|X\hat\Delta|^2.
\end{align*}
Taking [conditional expectation](/page/Conditional%20Expectation) under parameter $\beta$ gives
\begin{align*}
\mathbb E_\beta\left[|\hat\beta_{\mathrm{LS}}(Y,X)-\beta|^2\mid X\right]
\leq
\frac{1}{an}
\mathbb E_\beta\left[|X(\hat\beta_{\mathrm{LS}}(Y,X)-\beta)|^2\mid X\right].
\end{align*}
[/step]
[step:Apply the sparse prediction oracle bound and convert to Euclidean risk]
The [sparse prediction oracle bound for constrained least squares](/page/Sparse%20Prediction%20Oracle%20Bound) applies to the conditional Gaussian model $Y=X\beta+\varepsilon$ with fixed deterministic design $X$ and noise $\varepsilon\sim\mathcal N(0,\sigma^2I_n)$. Its hypotheses match the present setup: after conditioning on the deterministic matrix $X$, the noise vector is centered Gaussian with covariance matrix $\sigma^2I_n$; the estimator is the least-squares minimizer constrained to the $k$-sparse set $\Theta_k$; the true parameter $\beta$ belongs to the same set $\Theta_k$; and the cited fixed-design result is uniform over deterministic matrices $X$. It gives a universal constant $c_5>0$ such that, for every $\beta\in\Theta_k$,
\begin{align*}
\mathbb E_\beta\left[|X(\hat\beta_{\mathrm{LS}}(Y,X)-\beta)|^2\mid X\right]
\leq
c_5\sigma^2 k\log\left(\frac{ed}{k}\right).
\end{align*}
Combining this estimate with the lower restricted eigenvalue conversion from the previous step yields
\begin{align*}
\mathbb E_\beta\left[|\hat\beta_{\mathrm{LS}}(Y,X)-\beta|^2\mid X\right]
\leq
\frac{c_5}{a}\sigma^2\frac{k\log(ed/k)}{n}.
\end{align*}
Taking the supremum over $\beta\in\Theta_k$ and then using that the minimax risk is no larger than the risk of the particular estimator $\hat\beta_{\mathrm{LS}}$, we obtain
\begin{align*}
R_X
\leq
\frac{c_5}{a}\sigma^2\frac{k\log(ed/k)}{n}.
\end{align*}
[/step]
[step:Combine the two bounds on the high-probability event]
For every fixed $X\in\mathcal E_X$, the lower and upper bounds proved above give
\begin{align*}
c_1\sigma^2\frac{k\log(ed/k)}{n}
\leq
\inf_{\hat\beta\in\mathcal A}\sup_{\beta\in\Theta_k}
\mathbb E_\beta\left[|\hat\beta(Y,X)-\beta|^2\mid X\right]
\leq
\frac{c_5}{a}\sigma^2\frac{k\log(ed/k)}{n}.
\end{align*}
Set
\begin{align*}
A_0 := c_1,
\qquad
A_1 := \frac{c_5}{a},
\qquad
c_0 := c.
\end{align*}
Choose the universal sample-size constant $C$ in the theorem statement so that $C\geq C_0$, where $C_0$ is the constant from the restricted-eigenvalue event. Then the hypothesis $n\geq Ck\log(ed/k)$ implies $n\geq C_0k\log(ed/k)$, so $\mathcal E_X$ has the probability bound proved in the first step. Since $\mathbb P(\mathcal E_X)\geq 1-2\exp(-c_0 k\log(ed/k))$, the asserted high-probability minimax Euclidean rate follows.
[/step]