[proofplan]
The lemma has two parts: a trace-equals-rank identity for symmetric idempotent matrices, and a distributional statement that $\mathbf Z^\top A \mathbf Z / \sigma^2 \sim \chi^2_r$ when $\mathbf Z \sim N_n(\mathbf 0, \sigma^2 I_n)$. Both follow from the same structural fact: a symmetric idempotent matrix $A$ of rank $r$ is orthogonally diagonalisable with eigenvalues $1$ (with multiplicity $r$) and $0$ (with multiplicity $n-r$). For the trace identity, we read the rank off the spectrum. For the distribution, we change variables $\mathbf W := Q^\top \mathbf Z$ using the diagonalising orthogonal matrix; rotation preserves the distribution of a spherical Gaussian, and in the $\mathbf W$-basis the quadratic form becomes the sum of $r$ independent squared $N(0, \sigma^2)$ variates, which is $\sigma^2 \chi^2_r$.
[/proofplan]
[step:Diagonalise $A$ orthogonally using symmetry]
Let $A \in \mathbb R^{n \times n}$ be symmetric ($A = A^\top$) and idempotent ($A^2 = A$) with $\operatorname{rank}(A) = r$. By the spectral theorem for real symmetric matrices, there exists an orthogonal matrix $Q \in \mathbb R^{n \times n}$ (so $Q^\top Q = QQ^\top = I_n$) and a diagonal matrix $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ such that
\begin{align*}
Q^\top A Q = \Lambda.
\end{align*}
Each $\lambda_i$ is an eigenvalue of $A$, repeated according to geometric (equivalently, algebraic) multiplicity.
[/step]
[step:Show every eigenvalue of $A$ is $0$ or $1$]
Let $\lambda$ be any eigenvalue of $A$ with eigenvector $v \neq 0$, so $Av = \lambda v$. Applying $A$ again and using idempotence $A^2 = A$,
\begin{align*}
\lambda^2 v = A(\lambda v) = A(Av) = A^2 v = Av = \lambda v.
\end{align*}
Since $v \neq 0$ we conclude $\lambda^2 = \lambda$, i.e. $\lambda(\lambda - 1) = 0$, so $\lambda \in \{0, 1\}$. Therefore exactly $r$ of the $\lambda_i$ equal $1$ — since the number of non-zero eigenvalues of a diagonalisable matrix equals its rank — and the remaining $n - r$ equal $0$.
[guided]
The rank equals the number of non-zero eigenvalues **for diagonalisable matrices**. In our setup, $A = Q\Lambda Q^\top$, so
\begin{align*}
\operatorname{rank}(A) = \operatorname{rank}(Q\Lambda Q^\top) = \operatorname{rank}(\Lambda),
\end{align*}
because $Q$ and $Q^\top$ are invertible, and multiplication by invertible matrices preserves rank. The rank of the diagonal matrix $\Lambda$ is simply the number of non-zero diagonal entries. So the hypothesis $\operatorname{rank}(A) = r$ combined with the fact that every eigenvalue is $0$ or $1$ forces exactly $r$ ones on the diagonal of $\Lambda$ and $n-r$ zeros. Reorder the basis vectors of $Q$ if necessary so that
\begin{align*}
\Lambda = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}.
\end{align*}
[/guided]
[/step]
[step:Read off $\operatorname{tr}(A) = r$ from the spectrum]
The trace is invariant under orthogonal (more generally, similarity) transformation: for any square matrices $M, N$ of compatible sizes, $\operatorname{tr}(MN) = \operatorname{tr}(NM)$. Applying this with $M = Q^\top A$ and $N = Q$,
\begin{align*}
\operatorname{tr}(A) = \operatorname{tr}(AQQ^\top) = \operatorname{tr}(Q^\top A Q) = \operatorname{tr}(\Lambda) = \sum_{i=1}^n \lambda_i = r \cdot 1 + (n - r) \cdot 0 = r.
\end{align*}
Therefore $\operatorname{tr}(A) = \operatorname{rank}(A) = r$, proving the trace identity.
[/step]
[step:Change variables to $\mathbf W := Q^\top \mathbf Z$ and verify $\mathbf W \sim N_n(\mathbf 0, \sigma^2 I_n)$]
Now suppose $\mathbf Z \sim N_n(\mathbf 0, \sigma^2 I_n)$. Define the random vector
\begin{align*}
\mathbf W := Q^\top \mathbf Z \in \mathbb R^n.
\end{align*}
Since $Q^\top$ is a linear map and $\mathbf Z$ is multivariate normal, $\mathbf W$ is multivariate normal (an application of the [Orthogonal Transformations Preserve Multivariate Normality](/theorems/1434)). Its mean is $Q^\top \mathbb E[\mathbf Z] = Q^\top \mathbf 0 = \mathbf 0$, and its covariance matrix is
\begin{align*}
\operatorname{Cov}(\mathbf W) = Q^\top \operatorname{Cov}(\mathbf Z) Q = Q^\top (\sigma^2 I_n) Q = \sigma^2 Q^\top Q = \sigma^2 I_n,
\end{align*}
using orthogonality $Q^\top Q = I_n$ in the last step. Thus $\mathbf W \sim N_n(\mathbf 0, \sigma^2 I_n)$, so the components $W_1, \dots, W_n$ are independent with each $W_i \sim N(0, \sigma^2)$.
[guided]
Why is this change of variables useful? The quadratic form $\mathbf Z^\top A \mathbf Z$ is hard to analyse directly because $A$ has complicated structure. In the rotated basis determined by $Q$, the matrix $A$ becomes the diagonal matrix $\Lambda$ — as simple a matrix as possible. The trick is to push the rotation onto $\mathbf Z$ instead of $A$: we verify that the rotated Gaussian $\mathbf W = Q^\top \mathbf Z$ has the same distribution as the original, namely $N_n(\mathbf 0, \sigma^2 I_n)$. The spherical symmetry of the multivariate standard normal is precisely the reason this works.
In full detail: $\mathbf W = Q^\top \mathbf Z$ is a linear transformation of a Gaussian, hence Gaussian. Its mean is $Q^\top \mathbb E[\mathbf Z] = Q^\top \mathbf 0 = \mathbf 0$, and its covariance matrix is
\begin{align*}
\operatorname{Cov}(\mathbf W) = \operatorname{Cov}(Q^\top \mathbf Z) = Q^\top \operatorname{Cov}(\mathbf Z) Q = Q^\top (\sigma^2 I_n) Q = \sigma^2 I_n,
\end{align*}
where the last equality uses the defining property $Q^\top Q = I_n$ of an orthogonal matrix. The key point is: the coordinate-wise independence of $\mathbf Z$ (a consequence of the diagonal covariance $\sigma^2 I_n$) is preserved by rotation. This would **fail** if the covariance were not a scalar multiple of the identity.
[/guided]
[/step]
[step:Rewrite $\mathbf Z^\top A \mathbf Z$ as $\sum_{i=1}^r W_i^2$]
Using $A = Q \Lambda Q^\top$ (rearranging $Q^\top A Q = \Lambda$),
\begin{align*}
\mathbf Z^\top A \mathbf Z = \mathbf Z^\top Q \Lambda Q^\top \mathbf Z = (Q^\top \mathbf Z)^\top \Lambda (Q^\top \mathbf Z) = \mathbf W^\top \Lambda \mathbf W.
\end{align*}
Since $\Lambda = \operatorname{diag}(1, \dots, 1, 0, \dots, 0)$ with $r$ ones,
\begin{align*}
\mathbf W^\top \Lambda \mathbf W = \sum_{i=1}^n \lambda_i W_i^2 = \sum_{i=1}^r W_i^2.
\end{align*}
Therefore $\mathbf Z^\top A \mathbf Z = \sum_{i=1}^r W_i^2$.
[/step]
[step:Conclude $\mathbf Z^\top A \mathbf Z / \sigma^2 \sim \chi^2_r$]
Each $W_i \sim N(0, \sigma^2)$ and the $W_1, \dots, W_r$ are mutually independent (a subset of the independent family $\mathbf W$). Therefore $W_i/\sigma \sim N(0, 1)$ independently across $i$, and
\begin{align*}
(W_i/\sigma)^2 \sim \chi^2_1.
\end{align*}
The definition of the $\chi^2_r$ distribution is the law of a sum of $r$ independent $\chi^2_1$ random variables. Hence
\begin{align*}
\frac{\mathbf Z^\top A \mathbf Z}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^r W_i^2 = \sum_{i=1}^r \left( \frac{W_i}{\sigma} \right)^2 \sim \chi^2_r,
\end{align*}
equivalently $\mathbf Z^\top A \mathbf Z \sim \sigma^2 \chi^2_r$. This completes both claims: $\operatorname{tr}(A) = \operatorname{rank}(A) = r$, and the quadratic form $\mathbf Z^\top A \mathbf Z$ is $\sigma^2$ times a chi-squared random variable with $r$ degrees of freedom.
[/step]