Rayleigh–Ritz Variance Maximisation Characterisation

Rayleigh–Ritz Variance Maximisation Characterisation (Theorem # 4030)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We rewrite every unit vector $a \in \mathbb{R}^p$ in the orthonormal eigenbasis of $\Sigma$, so that the variance $\operatorname{Var}(a^\top X)$ becomes the diagonal quadratic form $\sum_{j=1}^p \lambda_j b_j^2$. The unit norm condition says that the coefficients $b_j^2$ sum to $1$, and the orthogonality constraints defining $A_k$ force the first $k-1$ coefficients to vanish. The ordered eigenvalues then give the upper bound $\lambda_k$, and equality is characterised by determining exactly when all nonzero coefficient mass lies in the eigenspace corresponding to $\lambda_k$. [/proofplan] [step:Express the variance in the eigenbasis of $\Sigma$] Let $a \in \mathbb{R}^p$ be a unit vector. Define the coordinate vector $b \in \mathbb{R}^p$ by \begin{align*} b := \Gamma^\top a. \end{align*} Since $\Gamma$ is orthogonal, $\Gamma^\top \Gamma = I_p$ and $a = \Gamma b$. Also, \begin{align*} |b|^2 = b^\top b = a^\top \Gamma \Gamma^\top a = a^\top a = |a|^2 = 1. \end{align*} Using the covariance matrix identity for a linear projection, \begin{align*} \operatorname{Var}(a^\top X) = a^\top \Sigma a. \end{align*} Substituting $a=\Gamma b$ and $\Sigma=\Gamma\Lambda\Gamma^\top$ gives \begin{align*} \operatorname{Var}(a^\top X) &= (\Gamma b)^\top \Gamma\Lambda\Gamma^\top(\Gamma b) \\ &= b^\top \Gamma^\top\Gamma\Lambda\Gamma^\top\Gamma b \\ &= b^\top \Lambda b \\ &= \sum_{j=1}^p \lambda_j b_j^2. \end{align*} [guided] The point of passing from $a$ to $b$ is to write the quadratic form associated with $\Sigma$ in diagonal coordinates. Define $b := \Gamma^\top a$. Because $\Gamma$ is orthogonal, multiplication by $\Gamma^\top$ preserves Euclidean length, so \begin{align*} |b|^2 = b^\top b = a^\top \Gamma \Gamma^\top a = a^\top a = |a|^2 = 1. \end{align*} Thus $b_1^2,\dots,b_p^2$ are nonnegative numbers whose sum is $1$. The covariance matrix $\Sigma$ represents variances of linear projections: for each $a \in \mathbb{R}^p$, \begin{align*} \operatorname{Var}(a^\top X) = a^\top \Sigma a. \end{align*} Now substitute $a=\Gamma b$ and use the spectral decomposition $\Sigma=\Gamma\Lambda\Gamma^\top$: \begin{align*} \operatorname{Var}(a^\top X) &= (\Gamma b)^\top \Gamma\Lambda\Gamma^\top(\Gamma b) \\ &= b^\top \Gamma^\top\Gamma\Lambda\Gamma^\top\Gamma b \\ &= b^\top \Lambda b \\ &= \sum_{j=1}^p \lambda_j b_j^2. \end{align*} This is the central reduction: maximising variance over vectors $a$ is equivalent to placing the unit coefficient mass $\sum_{j=1}^p b_j^2=1$ among the ordered eigenvalues. [/guided] [/step] [step:Maximise over all unit vectors by putting all coefficient mass on the largest eigenvalue] For $k=1$, the only constraint is $|a|=1$, equivalently $|b|=1$. Since $\lambda_j \leq \lambda_1$ for every $1 \leq j \leq p$, \begin{align*} \operatorname{Var}(a^\top X) = \sum_{j=1}^p \lambda_j b_j^2 \leq \sum_{j=1}^p \lambda_1 b_j^2 = \lambda_1 \sum_{j=1}^p b_j^2 = \lambda_1. \end{align*} Taking $a=\gamma_1$ gives $b=e_1$, where $e_1 \in \mathbb{R}^p$ is the first standard basis vector, and hence \begin{align*} \operatorname{Var}(\gamma_1^\top X) = \lambda_1. \end{align*} Therefore \begin{align*} \max_{|a|=1}\operatorname{Var}(a^\top X)=\lambda_1. \end{align*} [/step] [step:Translate the orthogonality constraints into vanishing eigenbasis coordinates] Fix $k$ with $1 \leq k \leq p$, and let $a \in A_k$. Define $b=\Gamma^\top a$ as above. For each $1 \leq j \leq p$, the $j$-th coordinate of $b$ is \begin{align*} b_j = e_j^\top b = e_j^\top \Gamma^\top a = \gamma_j^\top a, \end{align*} where $e_j \in \mathbb{R}^p$ is the $j$-th standard basis vector. Hence the constraints $a^\top \gamma_j=0$ for $1 \leq j<k$ are exactly \begin{align*} b_1=\cdots=b_{k-1}=0. \end{align*} Since $|b|=1$, this gives \begin{align*} \sum_{j=k}^p b_j^2 = 1. \end{align*} [/step] [step:Bound the constrained variance by $\lambda_k$] For $a \in A_k$, the preceding step gives $b_1=\cdots=b_{k-1}=0$. Therefore \begin{align*} \operatorname{Var}(a^\top X) = \sum_{j=k}^p \lambda_j b_j^2. \end{align*} Since the eigenvalues are ordered, $\lambda_j \leq \lambda_k$ for every $j \geq k$. Thus \begin{align*} \operatorname{Var}(a^\top X) = \sum_{j=k}^p \lambda_j b_j^2 \leq \sum_{j=k}^p \lambda_k b_j^2 = \lambda_k \sum_{j=k}^p b_j^2 = \lambda_k. \end{align*} Taking $a=\gamma_k$ gives $b=e_k$, so $a \in A_k$ and \begin{align*} \operatorname{Var}(\gamma_k^\top X) = \lambda_k. \end{align*} Hence \begin{align*} \max_{a \in A_k}\operatorname{Var}(a^\top X)=\lambda_k. \end{align*} [guided] The orthogonality constraints remove the first $k-1$ eigendirections. After writing $a=\Gamma b$, those constraints are precisely $b_1=\cdots=b_{k-1}=0$, so only the coordinates $b_k,\dots,b_p$ can contribute to the variance. Therefore \begin{align*} \operatorname{Var}(a^\top X) = \sum_{j=k}^p \lambda_j b_j^2. \end{align*} Now the ordering of the eigenvalues is used exactly once: for every index $j \geq k$, we have $\lambda_j \leq \lambda_k$. Since each coefficient $b_j^2$ is nonnegative, replacing each $\lambda_j$ by the larger number $\lambda_k$ can only increase the sum: \begin{align*} \operatorname{Var}(a^\top X) = \sum_{j=k}^p \lambda_j b_j^2 \leq \sum_{j=k}^p \lambda_k b_j^2 = \lambda_k \sum_{j=k}^p b_j^2 = \lambda_k. \end{align*} The final equality uses the unit norm condition after the first $k-1$ coordinates have vanished. To see that the upper bound is attained, choose $a=\gamma_k$. This vector has unit length, and it is orthogonal to $\gamma_1,\dots,\gamma_{k-1}$ because the eigenbasis is orthonormal. Its coordinate vector is $b=e_k$, so \begin{align*} \operatorname{Var}(\gamma_k^\top X) = \sum_{j=k}^p \lambda_j (e_k)_j^2 = \lambda_k. \end{align*} Thus the constrained maximum is exactly $\lambda_k$. [/guided] [/step] [step:Identify all constrained maximizers] Let $a \in A_k$ and let $b=\Gamma^\top a$. From the previous computation, \begin{align*} \lambda_k - \operatorname{Var}(a^\top X) &= \lambda_k \sum_{j=k}^p b_j^2 - \sum_{j=k}^p \lambda_j b_j^2 \\ &= \sum_{j=k}^p (\lambda_k-\lambda_j)b_j^2. \end{align*} Each term $(\lambda_k-\lambda_j)b_j^2$ is nonnegative. Therefore equality $\operatorname{Var}(a^\top X)=\lambda_k$ holds if and only if \begin{align*} (\lambda_k-\lambda_j)b_j^2=0 \end{align*} for every $j \geq k$. Equivalently, $b_j=0$ whenever $j \geq k$ and $\lambda_j<\lambda_k$. Since also $b_j=0$ for $j<k$, the maximizers are exactly the unit vectors of the form \begin{align*} a=\sum_{\substack{j\geq k\\ \lambda_j=\lambda_k}} b_j\gamma_j, \qquad \sum_{\substack{j\geq k\\ \lambda_j=\lambda_k}} b_j^2=1. \end{align*} This is precisely the unit sphere in \begin{align*} \operatorname{span}\{\gamma_j : j \geq k \text{ and } \lambda_j=\lambda_k\}. \end{align*} If $\lambda_k$ is simple, this span is $\operatorname{span}\{\gamma_k\}$, so the only unit maximizers in $A_k$ are $a=\gamma_k$ and $a=-\gamma_k$. This proves the stated characterisation. [/step]

Prerequisites (0/1 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Orthogonality

Explore Further

Orthogonality Definition Orthogonality of Innovations probability Conditional Expectation as the Mean Square Forecast probability Kalman Filter Recursion Theorem probability Linear Forecast from the Wold Representation probability Hypothesis and Error SSP Matrices for Multivariate General Linear Model Contrasts probability Moment Generating Function of a Multivariate Normal Distribution probability Rank Bound for the Sample Covariance Matrix probability Wilks' Lambda for Testing Zero Cross-Covariance probability

What brings you to Androma?

Start with a route through the knowledge graph.