Quadratic Forms and Idempotent Matrices — Statement & Proof

Quadratic Forms and Idempotent Matrices (Theorem # 1441)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] The lemma has two parts: a trace-equals-rank identity for symmetric idempotent matrices, and a distributional statement that $\mathbf Z^\top A \mathbf Z / \sigma^2 \sim \chi^2_r$ when $\mathbf Z \sim N_n(\mathbf 0, \sigma^2 I_n)$. Both follow from the same structural fact: a symmetric idempotent matrix $A$ of rank $r$ is orthogonally diagonalisable with eigenvalues $1$ (with multiplicity $r$) and $0$ (with multiplicity $n-r$). For the trace identity, we read the rank off the spectrum. For the distribution, we change variables $\mathbf W := Q^\top \mathbf Z$ using the diagonalising orthogonal matrix; rotation preserves the distribution of a spherical Gaussian, and in the $\mathbf W$-basis the quadratic form becomes the sum of $r$ independent squared $N(0, \sigma^2)$ variates, which is $\sigma^2 \chi^2_r$. [/proofplan] [step:Diagonalise $A$ orthogonally using symmetry] Let $A \in \mathbb R^{n \times n}$ be symmetric ($A = A^\top$) and idempotent ($A^2 = A$) with $\operatorname{rank}(A) = r$. By the spectral theorem for real symmetric matrices, there exists an orthogonal matrix $Q \in \mathbb R^{n \times n}$ (so $Q^\top Q = QQ^\top = I_n$) and a diagonal matrix $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ such that \begin{align*} Q^\top A Q = \Lambda. \end{align*} Each $\lambda_i$ is an eigenvalue of $A$, repeated according to geometric (equivalently, algebraic) multiplicity. [/step] [step:Show every eigenvalue of $A$ is $0$ or $1$] Let $\lambda$ be any eigenvalue of $A$ with eigenvector $v \neq 0$, so $Av = \lambda v$. Applying $A$ again and using idempotence $A^2 = A$, \begin{align*} \lambda^2 v = A(\lambda v) = A(Av) = A^2 v = Av = \lambda v. \end{align*} Since $v \neq 0$ we conclude $\lambda^2 = \lambda$, i.e. $\lambda(\lambda - 1) = 0$, so $\lambda \in \{0, 1\}$. Therefore exactly $r$ of the $\lambda_i$ equal $1$ — since the number of non-zero eigenvalues of a diagonalisable matrix equals its rank — and the remaining $n - r$ equal $0$. [guided] The rank equals the number of non-zero eigenvalues **for diagonalisable matrices**. In our setup, $A = Q\Lambda Q^\top$, so \begin{align*} \operatorname{rank}(A) = \operatorname{rank}(Q\Lambda Q^\top) = \operatorname{rank}(\Lambda), \end{align*} because $Q$ and $Q^\top$ are invertible, and multiplication by invertible matrices preserves rank. The rank of the diagonal matrix $\Lambda$ is simply the number of non-zero diagonal entries. So the hypothesis $\operatorname{rank}(A) = r$ combined with the fact that every eigenvalue is $0$ or $1$ forces exactly $r$ ones on the diagonal of $\Lambda$ and $n-r$ zeros. Reorder the basis vectors of $Q$ if necessary so that \begin{align*} \Lambda = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}. \end{align*} [/guided] [/step] [step:Read off $\operatorname{tr}(A) = r$ from the spectrum] The trace is invariant under orthogonal (more generally, similarity) transformation: for any square matrices $M, N$ of compatible sizes, $\operatorname{tr}(MN) = \operatorname{tr}(NM)$. Applying this with $M = Q^\top A$ and $N = Q$, \begin{align*} \operatorname{tr}(A) = \operatorname{tr}(AQQ^\top) = \operatorname{tr}(Q^\top A Q) = \operatorname{tr}(\Lambda) = \sum_{i=1}^n \lambda_i = r \cdot 1 + (n - r) \cdot 0 = r. \end{align*} Therefore $\operatorname{tr}(A) = \operatorname{rank}(A) = r$, proving the trace identity. [/step] [step:Change variables to $\mathbf W := Q^\top \mathbf Z$ and verify $\mathbf W \sim N_n(\mathbf 0, \sigma^2 I_n)$] Now suppose $\mathbf Z \sim N_n(\mathbf 0, \sigma^2 I_n)$. Define the random vector \begin{align*} \mathbf W := Q^\top \mathbf Z \in \mathbb R^n. \end{align*} Since $Q^\top$ is a linear map and $\mathbf Z$ is multivariate normal, $\mathbf W$ is multivariate normal (an application of the [Orthogonal Transformations Preserve Multivariate Normality](/theorems/1434)). Its mean is $Q^\top \mathbb E[\mathbf Z] = Q^\top \mathbf 0 = \mathbf 0$, and its covariance matrix is \begin{align*} \operatorname{Cov}(\mathbf W) = Q^\top \operatorname{Cov}(\mathbf Z) Q = Q^\top (\sigma^2 I_n) Q = \sigma^2 Q^\top Q = \sigma^2 I_n, \end{align*} using orthogonality $Q^\top Q = I_n$ in the last step. Thus $\mathbf W \sim N_n(\mathbf 0, \sigma^2 I_n)$, so the components $W_1, \dots, W_n$ are independent with each $W_i \sim N(0, \sigma^2)$. [guided] Why is this change of variables useful? The quadratic form $\mathbf Z^\top A \mathbf Z$ is hard to analyse directly because $A$ has complicated structure. In the rotated basis determined by $Q$, the matrix $A$ becomes the diagonal matrix $\Lambda$ — as simple a matrix as possible. The trick is to push the rotation onto $\mathbf Z$ instead of $A$: we verify that the rotated Gaussian $\mathbf W = Q^\top \mathbf Z$ has the same distribution as the original, namely $N_n(\mathbf 0, \sigma^2 I_n)$. The spherical symmetry of the multivariate standard normal is precisely the reason this works. In full detail: $\mathbf W = Q^\top \mathbf Z$ is a linear transformation of a Gaussian, hence Gaussian. Its mean is $Q^\top \mathbb E[\mathbf Z] = Q^\top \mathbf 0 = \mathbf 0$, and its covariance matrix is \begin{align*} \operatorname{Cov}(\mathbf W) = \operatorname{Cov}(Q^\top \mathbf Z) = Q^\top \operatorname{Cov}(\mathbf Z) Q = Q^\top (\sigma^2 I_n) Q = \sigma^2 I_n, \end{align*} where the last equality uses the defining property $Q^\top Q = I_n$ of an orthogonal matrix. The key point is: the coordinate-wise independence of $\mathbf Z$ (a consequence of the diagonal covariance $\sigma^2 I_n$) is preserved by rotation. This would **fail** if the covariance were not a scalar multiple of the identity. [/guided] [/step] [step:Rewrite $\mathbf Z^\top A \mathbf Z$ as $\sum_{i=1}^r W_i^2$] Using $A = Q \Lambda Q^\top$ (rearranging $Q^\top A Q = \Lambda$), \begin{align*} \mathbf Z^\top A \mathbf Z = \mathbf Z^\top Q \Lambda Q^\top \mathbf Z = (Q^\top \mathbf Z)^\top \Lambda (Q^\top \mathbf Z) = \mathbf W^\top \Lambda \mathbf W. \end{align*} Since $\Lambda = \operatorname{diag}(1, \dots, 1, 0, \dots, 0)$ with $r$ ones, \begin{align*} \mathbf W^\top \Lambda \mathbf W = \sum_{i=1}^n \lambda_i W_i^2 = \sum_{i=1}^r W_i^2. \end{align*} Therefore $\mathbf Z^\top A \mathbf Z = \sum_{i=1}^r W_i^2$. [/step] [step:Conclude $\mathbf Z^\top A \mathbf Z / \sigma^2 \sim \chi^2_r$] Each $W_i \sim N(0, \sigma^2)$ and the $W_1, \dots, W_r$ are mutually independent (a subset of the independent family $\mathbf W$). Therefore $W_i/\sigma \sim N(0, 1)$ independently across $i$, and \begin{align*} (W_i/\sigma)^2 \sim \chi^2_1. \end{align*} The definition of the $\chi^2_r$ distribution is the law of a sum of $r$ independent $\chi^2_1$ random variables. Hence \begin{align*} \frac{\mathbf Z^\top A \mathbf Z}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^r W_i^2 = \sum_{i=1}^r \left( \frac{W_i}{\sigma} \right)^2 \sim \chi^2_r, \end{align*} equivalently $\mathbf Z^\top A \mathbf Z \sim \sigma^2 \chi^2_r$. This completes both claims: $\operatorname{tr}(A) = \operatorname{rank}(A) = r$, and the quadratic form $\mathbf Z^\top A \mathbf Z$ is $\sigma^2$ times a chi-squared random variable with $r$ degrees of freedom. [/step]

Explore Further

Wald Confidence Ellipsoids Statistics Null Distribution of the Sample Correlation Statistics Scalar Asymptotic Normality Statistics Discrete Uniform Simulation Statistics Posterior Minimisation Implies Bayes Optimality Statistics Bias-Variance Decomposition Statistics Credible Sets Are Asymptotic Confidence Sets Statistics Minimal Sufficiency Criterion Statistics

What brings you to Androma?

Start with a route through the knowledge graph.

Quadratic Forms and Idempotent Matrices (Theorem # 1441)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Quadratic Forms and Idempotent Matrices (Theorem # 1441)

Discussion

Proof

Explore Further