[proofplan]
The proof separates the deterministic pivotal identity from the probabilistic calibration. First we substitute $\varepsilon=\sigma z$ and cancel the positive scale factor $\sigma$, showing that the normalized score contains no unknown noise scale. Then we bound the numerator by Gaussian one-dimensional tail estimates and the denominator by a lower-tail estimate for $|z|$. Combining these two events by a union bound gives the advertised order for the square-root Lasso score.
[/proofplan]
custom_env
admin
[step:Cancel the unknown noise scale from the normalized score]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space on which the Gaussian random vector
\begin{align*}
z: \Omega &\to \mathbb{R}^n
\end{align*}
is defined as a measurable map. For every vector $v\in\mathbb{R}^n$, write $|v|:=\|v\|_2$ for the Euclidean norm. Since $z$ has a non-degenerate Gaussian distribution on $\mathbb{R}^n$, the event $\{|z|=0\}$ has probability zero. On $\{|z|>0\}$ we also have $|\varepsilon|=\sigma |z|>0$. Therefore, because $\sigma>0$,
\begin{align*}
\frac{X^\top \varepsilon}{\sqrt n\,|\varepsilon|}=\frac{X^\top(\sigma z)}{\sqrt n\,|\sigma z|}=\frac{\sigma X^\top z}{\sqrt n\,\sigma |z|}=\frac{X^\top z}{\sqrt n\,|z|}.
\end{align*} Taking the $\ell^\infty$ norm on $\mathbb{R}^p$ gives
\begin{align*}
\left\|\frac{X^\top \varepsilon}{\sqrt n\,|\varepsilon|}\right\|_\infty
=
\left\|\frac{X^\top z}{\sqrt n\,|z|}\right\|_\infty.
\end{align*}
The right-hand side depends on $X$ and $z$, but not on $\sigma$.
[/step]
custom_env
admin
[step:Control every coordinate of the Gaussian numerator]For each $j \in \{1,\dots,p\}$, define the real-valued Gaussian [random variable](/page/Random%20Variable) $G_j: \Omega \to \mathbb{R}$ by
\begin{align*}
G_j(\omega)=\frac{X_j^\top z(\omega)}{\sqrt n}.
\end{align*}
The updated theorem statement assumes that $X$ is deterministic, so each column $X_j\in\mathbb{R}^n$ is a fixed vector. Since $z \sim \mathcal{N}(0,I_n)$, the fixed linear functional $w\mapsto X_j^\top w$ sends $z$ to a centered Gaussian random variable with variance $|X_j|^2$. Hence $G_j$ is centered Gaussian with variance
\begin{align*}
\tau_j^2:=\operatorname{Var}(G_j)=\frac{|X_j|^2}{n}\leq 1.
\end{align*}
We prove the needed one-dimensional tail estimate from the Gaussian moment-generating formula. For every $t>0$,
\begin{align*}
\mathbb{E}[e^{tG_j}]=e^{t^2\tau_j^2/2}\leq e^{t^2/2}.
\end{align*}
Markov's inequality applied to the nonnegative random variable $e^{tG_j}$ gives
\begin{align*}
\mathbb{P}(G_j>u)
=
\mathbb{P}(e^{tG_j}>e^{tu})
\leq
e^{-tu}\mathbb{E}[e^{tG_j}]
\leq
e^{-tu+t^2/2}.
\end{align*}
Choosing $t=u$ yields $\mathbb{P}(G_j>u)\leq e^{-u^2/2}$. Applying the same argument to $-G_j$ and using subadditivity of probability,
\begin{align*}
\mathbb{P}(|G_j|>u)\leq 2e^{-u^2/2}.
\end{align*}
Taking the union bound, that is, subadditivity applied to the union of the events $\{|G_j|>u\}$, gives
\begin{align*}
\mathbb{P}\left(\max_{1\leq j\leq p}|G_j|>u\right)
\leq
\sum_{j=1}^p \mathbb{P}(|G_j|>u)
\leq
2p e^{-u^2/2}.
\end{align*}[/step]
custom_env
admin
[guided]The numerator of the score is a maximum over the $p$ coordinates $X_j^\top z$. We normalize each coordinate by $\sqrt n$ and define $G_j: \Omega \to \mathbb{R}$ by
\begin{align*}
G_j(\omega)=\frac{X_j^\top z(\omega)}{\sqrt n}.
\end{align*}
Why does this produce a Gaussian random variable with a controlled variance? The updated theorem statement assumes that $X$ is deterministic. Thus, for each fixed $j$, the column $X_j\in\mathbb{R}^n$ is a fixed vector, and the map $w\mapsto X_j^\top w$ is a fixed linear functional on $\mathbb{R}^n$. Since $z\sim\mathcal N(0,I_n)$, this fixed linear functional sends $z$ to a centered Gaussian random variable with variance $|X_j|^2$. Therefore
\begin{align*}
\tau_j^2:=\operatorname{Var}(G_j)=\operatorname{Var}\left(\frac{X_j^\top z}{\sqrt n}\right)=\frac{|X_j|^2}{n}\leq 1.
\end{align*}
This is exactly where the column normalization hypothesis is used.
Now we derive the tail bound rather than citing it. For every $t>0$, the moment-generating formula for a centered Gaussian with variance $\tau_j^2$ gives
\begin{align*}
\mathbb{E}[e^{tG_j}]=e^{t^2\tau_j^2/2}\leq e^{t^2/2}.
\end{align*}
Markov's inequality applied to the nonnegative random variable $e^{tG_j}$ yields
\begin{align*}
\mathbb{P}(G_j>u)
=
\mathbb{P}(e^{tG_j}>e^{tu})
\leq
e^{-tu}\mathbb{E}[e^{tG_j}]
\leq
e^{-tu+t^2/2}.
\end{align*}
The best choice in this bound is $t=u$, which gives
\begin{align*}
\mathbb{P}(G_j>u)\leq e^{-u^2/2}.
\end{align*}
Applying the same computation to $-G_j$ gives the lower-tail estimate, and subadditivity of probability gives
\begin{align*}
\mathbb{P}(|G_j|>u)\leq \mathbb{P}(G_j>u)+\mathbb{P}(-G_j>u)\leq 2e^{-u^2/2}.
\end{align*}
The score takes the maximum over all columns, so we apply subadditivity to the union of the bad coordinate events:
\begin{align*}
\mathbb{P}\left(\max_{1\leq j\leq p}|G_j|>u\right)
=
\mathbb{P}\left(\bigcup_{j=1}^p \{|G_j|>u\}\right)
\leq
\sum_{j=1}^p \mathbb{P}(|G_j|>u)
\leq
2p e^{-u^2/2}.
\end{align*}
This is the source of the $\sqrt{\log p}$ numerator scale.[/guided]
custom_env
admin
[step:Lower bound the Gaussian denominator]
Define the nonnegative random variable $Q: \Omega \to [0,\infty)$ by
\begin{align*}
Q(\omega)=|z(\omega)|^2.
\end{align*}
Since $z\sim \mathcal{N}(0,I_n)$, the random variable $Q$ has the $\chi_n^2$ distribution. We derive the needed lower-tail estimate by Chernoff's method. For every $t>0$, Markov's inequality applied to $e^{-tQ}$ gives
\begin{align*}
\mathbb{P}\left(Q < \frac{n}{4}\right)
=
\mathbb{P}\left(e^{-tQ}>e^{-tn/4}\right)
\leq
e^{tn/4}\mathbb{E}[e^{-tQ}].
\end{align*}
The [Laplace transform](/page/Laplace%20Transform) of a $\chi_n^2$ random variable is
\begin{align*}
\mathbb{E}[e^{-tQ}]=(1+2t)^{-n/2}.
\end{align*}
Choosing $t=3/2$ gives
\begin{align*}
\mathbb{P}\left(Q < \frac{n}{4}\right)
\leq
e^{3n/8}4^{-n/2}
=
\exp\left(-n\left(\frac{\log 4}{2}-\frac{3}{8}\right)\right)
\leq
e^{-n/16},
\end{align*}
because $\frac{\log 4}{2}-\frac{3}{8}\geq \frac{1}{16}$. Equivalently,
\begin{align*}
\mathbb{P}\left(|z| \geq \frac{\sqrt n}{2}\right) \geq 1-e^{-n/16}.
\end{align*}
[/step]
custom_env
admin
[step:Combine the numerator and denominator events]
Define the event $A_u$ by
\begin{align*}
A_u:=\left\{\max_{1\leq j\leq p}\left|\frac{X_j^\top z}{\sqrt n}\right| \leq u\right\}.
\end{align*}
Define the event $B$ by
\begin{align*}
B:=\left\{|z| \geq \frac{\sqrt n}{2}\right\}.
\end{align*}
From the preceding two steps,
\begin{align*}
\mathbb{P}(A_u \cap B)
\geq
1-2p e^{-u^2/2}-e^{-n/16}.
\end{align*}
On $A_u \cap B$, the definition of the $\ell^\infty$ norm and the two event bounds give
\begin{align*}
\left\|\frac{X^\top z}{\sqrt n\,|z|}\right\|_\infty=\max_{1\leq j\leq p}\frac{|X_j^\top z|}{\sqrt n\,|z|}=\max_{1\leq j\leq p}\frac{|X_j^\top z|/\sqrt n}{|z|}\leq\frac{u}{\sqrt n/2}=\frac{2u}{\sqrt n}.
\end{align*}
Thus the normalized score is bounded by a quantity of order $\sqrt{\log p/n}$ after choosing $u$ of order $\sqrt{\log p}$.
[/step]
custom_env
admin
[step:Translate the bound into pivotal square-root Lasso tuning]
Choose $u=\sqrt{2\log(2p/\delta)}$ for $\delta\in(0,1)$. Then
\begin{align*}
2p e^{-u^2/2}
=
2p e^{-\log(2p/\delta)}
=
\delta.
\end{align*}
The previous step gives
\begin{align*}
\mathbb{P}\left(
\left\|\frac{X^\top z}{\sqrt n\,|z|}\right\|_\infty
\leq
2\sqrt{\frac{2\log(2p/\delta)}{n}}
\right)
\geq
1-\delta-e^{-n/16}.
\end{align*}
Together with the identity
\begin{align*}
\left\|\frac{X^\top \varepsilon}{\sqrt n\,|\varepsilon|}\right\|_\infty
=
\left\|\frac{X^\top z}{\sqrt n\,|z|}\right\|_\infty,
\end{align*}
this proves that the score level used by the square-root Lasso is pivotal: its distribution and high-probability calibration do not involve the unknown $\sigma$. Therefore penalty choices at the corresponding normalization have order $\sqrt{\log p/n}$ and can be selected without estimating $\sigma$.
[/step]