[proofplan]
The proof is an algebraic expansion of the definition of the debiased estimator. We substitute the fixed-design model $Y=X\beta^*+\varepsilon$ into the residual $Y-X\hat\beta$, isolate the noise term involving $X^\top\varepsilon/n$, and collect the remaining terms, all of which are multiples of the estimation error $\hat\beta-\beta^*$. The coordinate identity follows by multiplying the vector identity on the left by the standard basis row vector $e_j^\top$.
[/proofplan]
[step:Decompose the residual using the fixed-design model]From the model equation $Y=X\beta^*+\varepsilon$, the residual vector $Y-X\hat\beta \in \mathbb{R}^n$ satisfies
\begin{align*}
Y-X\hat\beta = X\beta^*+\varepsilon-X\hat\beta.
\end{align*}
Combining the two terms involving $X$ gives
\begin{align*}
Y-X\hat\beta = \varepsilon-X(\hat\beta-\beta^*).
\end{align*}[/step]
[guided]The residual in the debiased estimator is $Y-X\hat\beta$, and the goal is to express it in terms of the true noise vector $\varepsilon$ and the estimation error $\hat\beta-\beta^*$. Substituting the model identity $Y=X\beta^*+\varepsilon$ gives
\begin{align*}
Y-X\hat\beta = X\beta^*+\varepsilon-X\hat\beta.
\end{align*}
The two terms involving $X$ can be grouped by linearity: $X \in \mathbb{R}^{n \times p}$ represents a [linear map](/page/Linear%20Map) from $\mathbb{R}^p$ to $\mathbb{R}^n$, so it distributes over the difference $\beta^*-\hat\beta$. Thus
\begin{align*}
X\beta^*-X\hat\beta = -X(\hat\beta-\beta^*),
\end{align*}
and therefore
\begin{align*}
Y-X\hat\beta = \varepsilon-X(\hat\beta-\beta^*).
\end{align*}
This is the only place where the model equation is used.[/guided]
[step:Expand the debiased estimator and collect the estimation error]
By the definition of $\hat b$, subtracting $\beta^*$ from both sides gives
\begin{align*}
\hat b-\beta^* = \hat\beta-\beta^*+\hat\Theta\frac{X^\top(Y-X\hat\beta)}{n}.
\end{align*}
Substituting $Y-X\hat\beta=\varepsilon-X(\hat\beta-\beta^*)$ into this expression gives
\begin{align*}
\hat b-\beta^* = \hat\beta-\beta^*+\hat\Theta\frac{X^\top\varepsilon}{n}
-\hat\Theta\frac{X^\top X(\hat\beta-\beta^*)}{n}.
\end{align*}
Since $\hat\Sigma=X^\top X/n$, this becomes
\begin{align*}
\hat b-\beta^* = \hat\beta-\beta^*+\hat\Theta\frac{X^\top\varepsilon}{n}
-\hat\Theta\hat\Sigma(\hat\beta-\beta^*).
\end{align*}
Factoring the common vector $\hat\beta-\beta^*$ from the first and third terms gives
\begin{align*}
\hat b-\beta^* = \hat\Theta\frac{X^\top\varepsilon}{n}
+\bigl(I_p-\hat\Theta\hat\Sigma\bigr)(\hat\beta-\beta^*).
\end{align*}
This proves the vector identity.
[/step]
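As a sanity check on the vector identity, and not as part of the proof, consider a special case that is assumed here purely for illustration: $\hat\Sigma$ is invertible (which requires $p \le n$ and $X$ of full column rank) and $\hat\Theta=\hat\Sigma^{-1}$. Then $I_p-\hat\Theta\hat\Sigma=0$, so the estimation-error term vanishes regardless of the initial estimator $\hat\beta$, and the identity reduces to
\begin{align*}
\hat b-\beta^* = \hat\Sigma^{-1}\frac{X^\top\varepsilon}{n} = (X^\top X)^{-1}X^\top\varepsilon,
\end{align*}
which is the familiar error representation of the ordinary least squares estimator. In the general case $\hat\Theta$ is only an approximate inverse of $\hat\Sigma$, and the term $(I_p-\hat\Theta\hat\Sigma)(\hat\beta-\beta^*)$ records what is left over.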
[step:Project the vector identity onto the $j$-th coordinate]
Fix $j \in \{1,\dots,p\}$. Multiplying the vector identity on the left by $e_j^\top$ gives
\begin{align*}
e_j^\top(\hat b-\beta^*) = e_j^\top\hat\Theta\frac{X^\top\varepsilon}{n}+e_j^\top(I_p-\hat\Theta\hat\Sigma)(\hat\beta-\beta^*).
\end{align*}
Since $e_j^\top v=v_j$ for every $v\in\mathbb{R}^p$, we have $e_j^\top(\hat b-\beta^*)=\hat b_j-\beta_j^*$. By the definition of $\hat\theta_j$, the $j$-th row of $\hat\Theta$ is $\hat\theta_j^\top$, so $e_j^\top\hat\Theta=\hat\theta_j^\top$. Hence
\begin{align*}
\hat b_j-\beta_j^* = \hat\theta_j^\top\frac{X^\top\varepsilon}{n}+e_j^\top(I_p-\hat\Theta\hat\Sigma)(\hat\beta-\beta^*).
\end{align*}
This is the claimed coordinate decomposition.
[/step]