[proofplan]
We write the least squares problem through its normal equations, split into an $X$-block and a $Z$-block. The $Z$-block identifies the fitted contribution $Z\hat{\gamma}$ as the [orthogonal projection](/theorems/437) of $Y-X\hat{\beta}_X$ onto $\operatorname{Range}(Z)$, so the full residual equals $M_Z(Y-X\hat{\beta}_X)$. This reduces the $X$-block normal equation to $X^\top M_Z(Y-X\hat{\beta}_X)=0$. The full column rank of $[X\ Z]$ then gives invertibility of $X^\top M_ZX$, which lets us solve for $\hat{\beta}_X$.
[/proofplan]
[step:Derive the block normal equations for the least squares minimizer]
Define the least squares objective function
\begin{align*}
Q: \mathbb{R}^k \times \mathbb{R}^m &\to \mathbb{R} \\
(\beta,\gamma) &\mapsto |Y - X\beta - Z\gamma|^2.
\end{align*}
Since $A=[X\ Z]$ has full column rank, the quadratic function $Q$ has positive definite Hessian $2A^\top A$ and is therefore strictly convex. Hence it has a unique minimizer $(\hat{\beta}_X,\hat{\gamma})$, which is characterized by the vanishing of its directional derivatives in the $\beta$ and $\gamma$ directions.
For every $h \in \mathbb{R}^k$, differentiating $t \mapsto Q(\hat{\beta}_X+th,\hat{\gamma})$ at $t=0$ gives
\begin{align*}
0
&= -2h^\top X^\top(Y-X\hat{\beta}_X-Z\hat{\gamma}).
\end{align*}
Since this holds for all $h \in \mathbb{R}^k$,
\begin{align*}
X^\top(Y-X\hat{\beta}_X-Z\hat{\gamma})=0.
\end{align*}
Similarly, for every $\ell \in \mathbb{R}^m$, differentiating $t \mapsto Q(\hat{\beta}_X,\hat{\gamma}+t\ell)$ at $t=0$ gives
\begin{align*}
Z^\top(Y-X\hat{\beta}_X-Z\hat{\gamma})=0.
\end{align*}
Thus the block normal equations are
\begin{align*}
X^\top(Y-X\hat{\beta}_X-Z\hat{\gamma}) &= 0,\\
Z^\top(Y-X\hat{\beta}_X-Z\hat{\gamma}) &= 0.
\end{align*}
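As a sanity check on the differentiation, the first directional derivative can be computed by expanding the square. Here $r := Y-X\hat{\beta}_X-Z\hat{\gamma}$ is shorthand introduced only for this illustration. For $h \in \mathbb{R}^k$ and $t \in \mathbb{R}$,
\begin{align*}
Q(\hat{\beta}_X+th,\hat{\gamma})
&= |r-tXh|^2\\
&= |r|^2 - 2t\,h^\top X^\top r + t^2|Xh|^2,
\end{align*}
so the derivative at $t=0$ is $-2h^\top X^\top r$. The computation in the $\gamma$ direction is identical, with $Z\ell$ in place of $Xh$.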
[/step]
[step:Use the $Z$-normal equation to eliminate $\hat{\gamma}$]
Because $[X\ Z]$ has full column rank, the columns of $Z$ are linearly independent. Hence $Z^\top Z$ is invertible. The second normal equation gives
\begin{align*}
Z^\top Y - Z^\top X\hat{\beta}_X - Z^\top Z\hat{\gamma}=0.
\end{align*}
Solving for $\hat{\gamma}$ yields
\begin{align*}
\hat{\gamma}
&= (Z^\top Z)^{-1}Z^\top(Y-X\hat{\beta}_X).
\end{align*}
Substituting this expression into the full residual gives
\begin{align*}
Y-X\hat{\beta}_X-Z\hat{\gamma}
&= Y-X\hat{\beta}_X
- Z(Z^\top Z)^{-1}Z^\top(Y-X\hat{\beta}_X)\\
&= \bigl(I_n - Z(Z^\top Z)^{-1}Z^\top\bigr)(Y-X\hat{\beta}_X)\\
&= M_Z(Y-X\hat{\beta}_X).
\end{align*}
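For intuition, consider the special case where $Z=\mathbf{1}_n$ is a single column of ones (an intercept), so that $m=1$; this choice is an assumption made only for illustration. Then $Z^\top Z=n$ and, for any $w\in\mathbb{R}^n$ with mean $\bar{w}=\frac{1}{n}\sum_{i=1}^n w_i$,
\begin{align*}
(Z^\top Z)^{-1}Z^\top w &= \bar{w},\\
M_Zw &= w-\bar{w}\,\mathbf{1}_n.
\end{align*}
In this case $\hat{\gamma}$ is the mean of $Y-X\hat{\beta}_X$, and the full residual is that vector with its mean subtracted.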
[/step]
[step:Reduce the $X$-normal equation to the residualized regression equation]
Substitute the identity
\begin{align*}
Y-X\hat{\beta}_X-Z\hat{\gamma}=M_Z(Y-X\hat{\beta}_X)
\end{align*}
into the first normal equation. This gives
\begin{align*}
0
&= X^\top(Y-X\hat{\beta}_X-Z\hat{\gamma})\\
&= X^\top M_Z(Y-X\hat{\beta}_X)\\
&= X^\top M_ZY - X^\top M_ZX\hat{\beta}_X.
\end{align*}
Therefore
\begin{align*}
X^\top M_ZX\hat{\beta}_X = X^\top M_ZY.
\end{align*}
[/step]
[step:Prove that $X^\top M_ZX$ is invertible]
First note that $M_Z$ is symmetric and idempotent:
\begin{align*}
M_Z^\top &= M_Z,\\
M_Z^2 &= M_Z.
\end{align*}
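Both identities can be checked directly, assuming (as in the elimination step) that $M_Z=I_n-P_Z$ with $P_Z:=Z(Z^\top Z)^{-1}Z^\top$; the symbol $P_Z$ is introduced here only for this check. Since $Z^\top Z$ is symmetric, so is its inverse, and therefore
\begin{align*}
P_Z^\top &= Z\bigl((Z^\top Z)^{-1}\bigr)^\top Z^\top = P_Z,\\
P_Z^2 &= Z(Z^\top Z)^{-1}(Z^\top Z)(Z^\top Z)^{-1}Z^\top = P_Z,\\
M_Z^2 &= I_n-2P_Z+P_Z^2 = I_n-P_Z = M_Z.
\end{align*}
Symmetry of $M_Z$ follows from that of $P_Z$.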
Thus, for every $w \in \mathbb{R}^n$,
\begin{align*}
w^\top M_Zw
&= w^\top M_Z^\top M_Zw\\
&= |M_Zw|^2.
\end{align*}
We prove that $X^\top M_ZX$ has trivial kernel. Let $v \in \mathbb{R}^k$ satisfy
\begin{align*}
X^\top M_ZXv=0.
\end{align*}
Multiplying on the left by $v^\top$ and applying the previous identity with $w=Xv$ gives
\begin{align*}
0
&= v^\top X^\top M_ZXv\\
&= |M_ZXv|^2.
\end{align*}
Hence $M_ZXv=0$. By the definition of $M_Z$,
\begin{align*}
Xv
&= Z(Z^\top Z)^{-1}Z^\top Xv.
\end{align*}
Define
\begin{align*}
a := (Z^\top Z)^{-1}Z^\top Xv \in \mathbb{R}^m.
\end{align*}
Then $Xv=Za$, so
\begin{align*}
[X\ Z]\begin{pmatrix}v\\-a\end{pmatrix}=0.
\end{align*}
Since $[X\ Z]$ has full column rank, its kernel is $\{0\}$. Therefore $v=0$ and $a=0$. Thus $X^\top M_ZX$ has trivial kernel. Since it is a $k \times k$ matrix, it is invertible.
[/step]
[step:Solve the residualized normal equation for the coefficient on $X$]
From the residualized normal equation,
\begin{align*}
X^\top M_ZX\hat{\beta}_X = X^\top M_ZY.
\end{align*}
Since $X^\top M_ZX$ is invertible, multiplying on the left by $(X^\top M_ZX)^{-1}$ gives
\begin{align*}
\hat{\beta}_X
&= (X^\top M_ZX)^{-1}X^\top M_ZY.
\end{align*}
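Since $M_Z$ is symmetric and idempotent, $(M_ZX)^\top(M_ZX)=X^\top M_ZX$ and $(M_ZX)^\top(M_ZY)=X^\top M_ZY$, so this is exactly the coefficient vector obtained by regressing the residualized outcome $M_ZY$ on the residualized regressors $M_ZX$. This completes the proof.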
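A small numerical illustration may help; the data below are hypothetical and chosen only to make the arithmetic easy. Take $n=3$, $k=m=1$, and
\begin{align*}
X=\begin{pmatrix}0\\1\\2\end{pmatrix},\qquad
Z=\begin{pmatrix}1\\1\\1\end{pmatrix},\qquad
Y=\begin{pmatrix}1\\3\\2\end{pmatrix}.
\end{align*}
Here $M_Z$ subtracts the sample mean, so
\begin{align*}
M_ZX=\begin{pmatrix}-1\\0\\1\end{pmatrix},\qquad
M_ZY=\begin{pmatrix}-1\\1\\0\end{pmatrix},\qquad
X^\top M_ZX=2,\qquad
X^\top M_ZY=1,
\end{align*}
and the formula gives $\hat{\beta}_X=\tfrac{1}{2}$. As a check against the block normal equations, $\hat{\gamma}=(Z^\top Z)^{-1}Z^\top(Y-X\hat{\beta}_X)=\tfrac{3}{2}$, and the full residual
\begin{align*}
Y-X\hat{\beta}_X-Z\hat{\gamma}
=\begin{pmatrix}-\tfrac{1}{2}\\[2pt]1\\[2pt]-\tfrac{1}{2}\end{pmatrix}
\end{align*}
is orthogonal to both $X$ and $Z$.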
[/step]