[step:Bound the maximum projected noise energy]For a fixed set $S\subset\{1,\dots,d\}$ with $|S|\le2k$, the subspace $V_S$ has dimension at most $2k$, and
\begin{align*}
|\Pi_S\varepsilon|^2/\sigma^2
\end{align*}
is stochastically dominated by a chi-square [random variable](/page/Random%20Variable) with $2k$ degrees of freedom. This follows because $\Pi_S\varepsilon$ is a centered Gaussian vector in the subspace $V_S$ with covariance operator bounded by $\sigma^2\Pi_S$. The number of sets $S$ with $|S|\le2k$ is at most $N$, where
\begin{align*}
N:=\sum_{m=0}^{2k}\binom{d}{m}.
\end{align*}
Using $2k\le d$ and the elementary binomial bound $\binom{d}{m}\le (ed/m)^m$ for $1\le m\le 2k$, there is a universal constant $c_4>0$ such that
\begin{align*}
N\le \exp\{c_4k\log(ed/k)\}.
\end{align*}
We now derive the needed maximal bound from a chi-square tail estimate. For a fixed $S$, the [Laurent-Massart chi-square tail bound](/page/Laurent-Massart%20Chi-Square%20Tail%20Bound) gives, for every $u>0$,
\begin{align*}
\mathbb P_\beta\left(|\Pi_S\varepsilon|^2>\sigma^2(2k+2\sqrt{2ku}+2u)\right)\le e^{-u}.
\end{align*}
Taking a union bound over the $N$ supports gives, for every $u>0$,
\begin{align*}
\mathbb P_\beta\left(\max_{S\subset\{1,\dots,d\}:\ |S|\le2k}|\Pi_S\varepsilon|^2>\sigma^2(2k+2\sqrt{2k(u+\log N)}+2(u+\log N))\right)\le e^{-u}.
\end{align*}
Using $2\sqrt{ab}\le a+b$ with $a=2k$ and $b=u+\log N$, there is a universal constant $c_6>0$ such that, for every $u>0$,
\begin{align*}
\mathbb P_\beta\left(\max_{S\subset\{1,\dots,d\}:\ |S|\le2k}|\Pi_S\varepsilon|^2>c_6\sigma^2(k+\log N+u)\right)\le e^{-u}.
\end{align*}
For the nonnegative random variable
\begin{align*}
Z:=\max_{S\subset\{1,\dots,d\}:\ |S|\le2k}|\Pi_S\varepsilon|^2,
\end{align*}
set $t_0:=c_6\sigma^2(k+\log N)$. The tail identity for nonnegative random variables gives
\begin{align*}
\mathbb E_\beta[Z]
\le
t_0+\int_{t_0}^\infty \mathbb P_\beta(Z>t)\,d\mathcal{L}^1(t),
\end{align*}
where $\mathcal{L}^1$ denotes one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure). With the change of variables $t=t_0+c_6\sigma^2u$, this becomes
\begin{align*}
\mathbb E_\beta[Z]
\le
c_6\sigma^2(k+\log N)+c_6\sigma^2\int_0^\infty e^{-u}\,d\mathcal{L}^1(u).
\end{align*}
Since $\int_0^\infty e^{-u}\,d\mathcal{L}^1(u)=1$ and $\log N\le c_4k\log(ed/k)$, after enlarging constants there is a universal constant $c_5>0$ such that
\begin{align*}
\mathbb E_\beta\left[\max_{S\subset\{1,\dots,d\}:\ |S|\le2k}|\Pi_S\varepsilon|^2\right]
\le
c_5\sigma^2 k\log(ed/k).
\end{align*}
Combining this with the deterministic inequality from the previous step yields
\begin{align*}
\mathbb E_\beta\left[\frac{1}{n}|X(\hat\beta_{\mathrm{cls}}-\beta)|^2\right]\le4c_5\sigma^2\frac{k\log(ed/k)}{n}.
\end{align*}
Taking the supremum over $\beta\in\Theta_k$ and then using that the minimax risk is at most the risk of $\hat\beta_{\mathrm{cls}}$ gives the upper bound with $C:=4c_5$.[/step]