[proofplan]
We decompose the debiased Lasso coordinate into a leading Gaussian score term plus a remainder driven by the approximation error $I_{p_n}-\hat\Theta_n\hat\Sigma_n$. The leading term is exactly standard normal after normalisation because $X_n$ is fixed and $\hat\Theta_n$ is a deterministic function of $X_n$. The [nodewise approximate inverse bound](/theorems/5581) and the assumed Lasso $\ell^1$ oracle rate show that the coordinate remainder is $o_{\mathbb{P}}(n^{-1/2})$ under the stated sparsity scaling. The restricted eigenvalue, column-normalisation, tuning, nodewise-construction, and nodewise-variance hypotheses are not used again directly; in this theorem they serve as deterministic sufficient conditions behind the separately assumed oracle and approximate-inverse bounds. The known-variance result follows by Slutsky's theorem, and the feasible result follows by a second application of Slutsky's theorem, this time to the product with the factor $\sigma/\hat\sigma$.
[/proofplan]
[step:Expand the debiased estimator into a Gaussian score and a coordinate remainder]
For each sufficiently large $n$, define the debiased Lasso estimator $\hat b_n\in\mathbb{R}^{p_n}$ by
\begin{align*}
\hat b_n:=\hat\beta_n+\hat\Theta_n\frac{X_n^\top(Y_n-X_n\hat\beta_n)}{n}.
\end{align*}
Define the score vector
\begin{align*}
W_n := \hat\Theta_n \frac{X_n^\top \varepsilon_n}{n} \in \mathbb{R}^{p_n}
\end{align*}
and the remainder vector
\begin{align*}
R_n := (I_{p_n}-\hat\Theta_n\hat\Sigma_n)(\hat\beta_n-\beta_n^*) \in \mathbb{R}^{p_n}.
\end{align*}
Using $Y_n-X_n\hat\beta_n=X_n(\beta_n^*-\hat\beta_n)+\varepsilon_n$, we first compute
\begin{align*}
\hat b_n-\beta_n^* = \hat\beta_n-\beta_n^* + \hat\Theta_n \frac{X_n^\top X_n(\beta_n^*-\hat\beta_n)}{n} + \hat\Theta_n \frac{X_n^\top\varepsilon_n}{n}.
\end{align*}
Since $\hat\Sigma_n=X_n^\top X_n/n$, this becomes
\begin{align*}
\hat b_n-\beta_n^* = \hat\beta_n-\beta_n^* - \hat\Theta_n\hat\Sigma_n(\hat\beta_n-\beta_n^*) + \hat\Theta_n \frac{X_n^\top\varepsilon_n}{n}.
\end{align*}
By the definitions of $R_n$ and $W_n$, we obtain
\begin{align*}
\hat b_n-\beta_n^* = R_n + W_n.
\end{align*}
Taking the $j$th coordinate, with $\hat\theta_{n,j}^\top=e_j^\top\hat\Theta_n$ the $j$th row of $\hat\Theta_n$, gives
\begin{align*}
\hat b_{n,j}-\beta_{n,j}^*
=
\hat\theta_{n,j}^\top\frac{X_n^\top\varepsilon_n}{n}
+
e_j^\top(I_{p_n}-\hat\Theta_n\hat\Sigma_n)(\hat\beta_n-\beta_n^*).
\end{align*}
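As a sanity check on this expansion, consider the classical low-dimensional case. This is an illustrative assumption only and not part of the theorem's setting: suppose $p_n<n$, $\hat\Sigma_n$ is invertible, and one takes $\hat\Theta_n=\hat\Sigma_n^{-1}$. Then the remainder vanishes and the debiased estimator reduces to ordinary least squares:
\begin{align*}
R_n &= (I_{p_n}-\hat\Sigma_n^{-1}\hat\Sigma_n)(\hat\beta_n-\beta_n^*) = 0,\\
\hat b_n &= \hat\beta_n+\hat\Sigma_n^{-1}\frac{X_n^\top Y_n}{n}-\hat\Sigma_n^{-1}\hat\Sigma_n\hat\beta_n = (X_n^\top X_n)^{-1}X_n^\top Y_n.
\end{align*}
When $p_n>n$ the matrix $\hat\Sigma_n$ is singular, so no exact inverse is available and $R_n$ has to be controlled separately.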
[/step]
[step:Identify the exact normal law of the leading score term]
Define the coordinate variance factor
\begin{align*}
v_{n,j} := \hat\theta_{n,j}^\top\hat\Sigma_n\hat\theta_{n,j}.
\end{align*}
By hypothesis, $c \le v_{n,j} \le C$, so $v_{n,j} > 0$. Define the scalar [random variable](/page/Random%20Variable)
\begin{align*}
G_{n,j} :=
\hat\theta_{n,j}^\top\frac{X_n^\top\varepsilon_n}{n}.
\end{align*}
Since $X_n$ and $\hat\theta_{n,j}$ are deterministic and $\varepsilon_n \sim \mathcal{N}(0,\sigma^2 I_n)$, the scalar $G_{n,j}$ is Gaussian with mean $0$. Its variance is computed by the covariance formula for a deterministic linear functional of a Gaussian vector:
\begin{align*}
\operatorname{Var}(G_{n,j}) = \operatorname{Var}\left(\frac{1}{n}\hat\theta_{n,j}^\top X_n^\top\varepsilon_n\right).
\end{align*}
Using $\operatorname{Var}(\varepsilon_n)=\sigma^2 I_n$, we get
\begin{align*}
\operatorname{Var}(G_{n,j}) = \frac{1}{n^2}\hat\theta_{n,j}^\top X_n^\top \operatorname{Var}(\varepsilon_n) X_n\hat\theta_{n,j}.
\end{align*}
Therefore
\begin{align*}
\operatorname{Var}(G_{n,j}) = \frac{\sigma^2}{n^2}\hat\theta_{n,j}^\top X_n^\top X_n\hat\theta_{n,j}.
\end{align*}
Since $\hat\Sigma_n=X_n^\top X_n/n$, this equals
\begin{align*}
\operatorname{Var}(G_{n,j}) = \frac{\sigma^2}{n}\hat\theta_{n,j}^\top\hat\Sigma_n\hat\theta_{n,j}.
\end{align*}
By the definition of $v_{n,j}$, we conclude
\begin{align*}
\operatorname{Var}(G_{n,j}) = \frac{\sigma^2 v_{n,j}}{n}.
\end{align*}
Therefore
\begin{align*}
Z_{n,j}
:=
\frac{G_{n,j}}{\sigma(v_{n,j}/n)^{1/2}}
\sim \mathcal{N}(0,1)
\end{align*}
for every sufficiently large $n$.
[guided]
The purpose of this step is to isolate the term whose distribution we can compute exactly. Define
\begin{align*}
v_{n,j} := \hat\theta_{n,j}^\top\hat\Sigma_n\hat\theta_{n,j}.
\end{align*}
The theorem assumes $c \le v_{n,j} \le C$, so the normalising denominator
\begin{align*}
\sigma(v_{n,j}/n)^{1/2}
\end{align*}
is positive and finite.
Now define the leading coordinate score
\begin{align*}
G_{n,j} :=
\hat\theta_{n,j}^\top\frac{X_n^\top\varepsilon_n}{n}.
\end{align*}
This is a linear functional of the Gaussian vector $\varepsilon_n$. Because $X_n$ is fixed and $\hat\theta_{n,j}$ is constructed only from $X_n$, the vector $X_n\hat\theta_{n,j}/n \in \mathbb{R}^n$ is deterministic. Hence $G_{n,j}$ is a centered Gaussian scalar. Its variance is computed directly from $\operatorname{Var}(\varepsilon_n)=\sigma^2 I_n$. First,
\begin{align*}
\operatorname{Var}(G_{n,j}) = \operatorname{Var}\left(\frac{1}{n}\hat\theta_{n,j}^\top X_n^\top\varepsilon_n\right).
\end{align*}
The covariance formula for a deterministic linear functional gives
\begin{align*}
\operatorname{Var}(G_{n,j}) = \frac{1}{n^2}\hat\theta_{n,j}^\top X_n^\top \operatorname{Var}(\varepsilon_n) X_n\hat\theta_{n,j}.
\end{align*}
Substituting $\operatorname{Var}(\varepsilon_n)=\sigma^2 I_n$, we obtain
\begin{align*}
\operatorname{Var}(G_{n,j}) = \frac{\sigma^2}{n^2}\hat\theta_{n,j}^\top X_n^\top X_n\hat\theta_{n,j}.
\end{align*}
Because $\hat\Sigma_n=X_n^\top X_n/n$, this is
\begin{align*}
\operatorname{Var}(G_{n,j}) = \frac{\sigma^2}{n}\hat\theta_{n,j}^\top\hat\Sigma_n\hat\theta_{n,j}.
\end{align*}
Finally, the definition $v_{n,j}=\hat\theta_{n,j}^\top\hat\Sigma_n\hat\theta_{n,j}$ gives
\begin{align*}
\operatorname{Var}(G_{n,j}) = \frac{\sigma^2 v_{n,j}}{n}.
\end{align*}
Thus the denominator in the theorem is exactly the standard deviation of $G_{n,j}$. After division by that standard deviation, we obtain
\begin{align*}
Z_{n,j}
:=
\frac{G_{n,j}}{\sigma(v_{n,j}/n)^{1/2}}
\sim \mathcal{N}(0,1).
\end{align*}
This exact normality is the reason the proof reduces to showing that the debiasing remainder is negligible.
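As a compact cross-check of the variance computation in this step, the score can be viewed as an inner product of the noise with a fixed vector. The symbol $a_{n,j}$ is introduced only for this illustration.
\begin{align*}
a_{n,j} &:= \frac{X_n\hat\theta_{n,j}}{n}\in\mathbb{R}^n, \qquad G_{n,j}=a_{n,j}^\top\varepsilon_n,\\
\operatorname{Var}(G_{n,j}) &= \sigma^2\|a_{n,j}\|_2^2 = \frac{\sigma^2}{n^2}\hat\theta_{n,j}^\top X_n^\top X_n\hat\theta_{n,j} = \frac{\sigma^2 v_{n,j}}{n}.
\end{align*}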
[/guided]
[/step]
[step:Bound the coordinate remainder at the standard-error scale]
Define the high-probability event
\begin{align*}
E_n :=
\left\{
\|\hat\beta_n-\beta_n^*\|_1
\le A s_n\sqrt{\frac{\log p_n}{n}}
\right\}.
\end{align*}
By assumption, $\mathbb{P}(E_n) \to 1$. By the definition of $R_n$, the coordinate remainder satisfies the identity
\begin{align*}
|e_j^\top R_n| = \left|e_j^\top(I_{p_n}-\hat\Theta_n\hat\Sigma_n)(\hat\beta_n-\beta_n^*)\right|.
\end{align*}
By the [duality inequality between $\ell^\infty$ and $\ell^1$ norms](/page/Holder%20Inequality),
\begin{align*}
|e_j^\top R_n| \le \|e_j^\top(I_{p_n}-\hat\Theta_n\hat\Sigma_n)\|_\infty \|\hat\beta_n-\beta_n^*\|_1.
\end{align*}
On $E_n$, the nodewise approximate inverse bound $\|e_j^\top(I_{p_n}-\hat\Theta_n\hat\Sigma_n)\|_\infty\le A\sqrt{\log p_n/n}$ and the defining inequality for $E_n$ give
\begin{align*}
|e_j^\top R_n| \le A\sqrt{\frac{\log p_n}{n}} \cdot A s_n\sqrt{\frac{\log p_n}{n}}.
\end{align*}
Hence
\begin{align*}
|e_j^\top R_n| \le A^2\frac{s_n\log p_n}{n}.
\end{align*}
Since $v_{n,j}\ge c$, we have on $E_n$
\begin{align*}
\left|\frac{e_j^\top R_n}{\sigma(v_{n,j}/n)^{1/2}}\right| \le \frac{A^2 s_n\log p_n/n}{\sigma(c/n)^{1/2}}.
\end{align*}
Equivalently,
\begin{align*}
\left|\frac{e_j^\top R_n}{\sigma(v_{n,j}/n)^{1/2}}\right| \le \frac{A^2}{\sigma c^{1/2}} \frac{s_n\log p_n}{\sqrt n}.
\end{align*}
The right-hand side tends to $0$ by the sparsity-scaling hypothesis. To pass from this eventwise bound to convergence in probability, fix $\eta>0$. For all sufficiently large $n$, the deterministic right-hand side is at most $\eta$, so the normalised remainder can exceed $\eta$ only off $E_n$, and therefore
\begin{align*}
\mathbb{P}\left(
\left|\frac{e_j^\top R_n}{\sigma(v_{n,j}/n)^{1/2}}\right|>\eta
\right)
\le \mathbb{P}(E_n^c).
\end{align*}
Since $\mathbb{P}(E_n^c)=1-\mathbb{P}(E_n)\to 0$, it follows that
\begin{align*}
\frac{e_j^\top R_n}{\sigma(v_{n,j}/n)^{1/2}}
\xrightarrow{\mathbb{P}} 0.
\end{align*}
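To get a feel for the scaling quantity in this bound, here is a purely illustrative calculation. The values $n=10^4$, $p_n=10^3$, and $s_n=5$ are hypothetical, and $\log$ is taken to be the natural logarithm.
\begin{align*}
\frac{s_n\log p_n}{\sqrt n} = \frac{5\times\log(10^3)}{\sqrt{10^4}} \approx \frac{5\times 6.908}{100} \approx 0.35.
\end{align*}
On $E_n$ the normalised remainder is then at most about $0.35\,A^2/(\sigma c^{1/2})$. Whether this is small at a given sample size depends on the unspecified constants; the argument only uses the fact that the bound vanishes as $n\to\infty$.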
[/step]
[step:Combine exact normality and negligible remainder]
From the expansion in the first step,
\begin{align*}
\frac{\hat b_{n,j}-\beta_{n,j}^*}{\sigma(v_{n,j}/n)^{1/2}} = Z_{n,j} + \frac{e_j^\top R_n}{\sigma(v_{n,j}/n)^{1/2}}.
\end{align*}
The normalized score sequence satisfies $Z_{n,j}\sim\mathcal{N}(0,1)$ for every sufficiently large $n$; hence $(Z_{n,j})$ converges in distribution to $\mathcal{N}(0,1)$. The normalized remainder satisfies
\begin{align*}
\frac{e_j^\top R_n}{\sigma(v_{n,j}/n)^{1/2}}
\xrightarrow{\mathbb{P}} 0
\end{align*}
by the previous step. Thus the two hypotheses of [Slutsky's Theorem](/page/Slutsky%27s%20Theorem) are met: one summand converges in distribution and the other converges in probability to the constant $0$. Therefore
\begin{align*}
\frac{\hat b_{n,j}-\beta_{n,j}^*}
{\sigma(v_{n,j}/n)^{1/2}}
\xrightarrow{d}
\mathcal{N}(0,1).
\end{align*}
Substituting $v_{n,j}=\hat\theta_{n,j}^\top\hat\Sigma_n\hat\theta_{n,j}$ gives the stated known-variance conclusion.
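For this particular sum, the appeal to Slutsky's theorem can also be unpacked into an explicit bound. The following is a sketch rather than part of the original argument; the abbreviation $r_{n,j}:=e_j^\top R_n/(\sigma(v_{n,j}/n)^{1/2})$ and the symbol $\Phi$ for the standard normal distribution function are introduced only here. For every $t\in\mathbb{R}$, every $\eta>0$, and every $n$ large enough that $Z_{n,j}\sim\mathcal{N}(0,1)$,
\begin{align*}
\Phi(t-\eta)-\mathbb{P}(|r_{n,j}|>\eta)
\le \mathbb{P}(Z_{n,j}+r_{n,j}\le t)
\le \Phi(t+\eta)+\mathbb{P}(|r_{n,j}|>\eta).
\end{align*}
Since the standard normal density is bounded by $(2\pi)^{-1/2}$, we have $|\Phi(t\pm\eta)-\Phi(t)|\le\eta/\sqrt{2\pi}$, and hence
\begin{align*}
\sup_{t\in\mathbb{R}}\left|\mathbb{P}(Z_{n,j}+r_{n,j}\le t)-\Phi(t)\right|
\le \frac{\eta}{\sqrt{2\pi}}+\mathbb{P}(|r_{n,j}|>\eta).
\end{align*}
Letting $n\to\infty$ and then $\eta\downarrow 0$ recovers the convergence in distribution stated above.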
[/step]
[step:Replace the noise level by a consistent estimator]
Assume now that $\hat\sigma/\sigma \xrightarrow{\mathbb{P}} 1$. Let $h:(0,\infty)\to(0,\infty)$ be the continuous map $h(t)=1/t$. Since $\sigma>0$ and the limit is the positive constant $1$, the [Continuous Mapping Theorem](/page/Continuous%20Mapping%20Theorem) applied to $\hat\sigma/\sigma$ gives
\begin{align*}
\frac{\sigma}{\hat\sigma}=h\left(\frac{\hat\sigma}{\sigma}\right) \xrightarrow{\mathbb{P}} h(1)=1.
\end{align*}
Using the known-variance statistic from the previous step, write
\begin{align*}
\frac{\hat b_{n,j}-\beta_{n,j}^*}{\hat\sigma(v_{n,j}/n)^{1/2}} = \left(\frac{\hat b_{n,j}-\beta_{n,j}^*}{\sigma(v_{n,j}/n)^{1/2}}\right)\left(\frac{\sigma}{\hat\sigma}\right).
\end{align*}
The first factor converges in distribution to $\mathcal{N}(0,1)$ by the known-variance result, and the second factor converges in probability to the constant $1$. Therefore [Slutsky's Theorem](/page/Slutsky%27s%20Theorem), now applied to the product of these two factors, yields
\begin{align*}
\frac{\hat b_{n,j}-\beta_{n,j}^*}
{\hat\sigma(v_{n,j}/n)^{1/2}}
\xrightarrow{d}
\mathcal{N}(0,1).
\end{align*}
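The product form of Slutsky's theorem used here can be rewritten as an additive perturbation, which makes the mechanism explicit. In this sketch the abbreviations $T_{n,j}$ and $\hat T_{n,j}$, for the known-variance and feasible statistics respectively, are introduced only for illustration.
\begin{align*}
\hat T_{n,j}-T_{n,j}
= T_{n,j}\left(\frac{\sigma}{\hat\sigma}-1\right)
= O_{\mathbb{P}}(1)\cdot o_{\mathbb{P}}(1)
= o_{\mathbb{P}}(1).
\end{align*}
Here $T_{n,j}=O_{\mathbb{P}}(1)$ because a sequence that converges in distribution is tight, and $\sigma/\hat\sigma-1=o_{\mathbb{P}}(1)$ by the continuous-mapping step above.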
Again substituting $v_{n,j}=\hat\theta_{n,j}^\top\hat\Sigma_n\hat\theta_{n,j}$ gives the feasible conclusion. This completes the proof.
[/step]