Existence and Uniqueness of Population Canonical Directions (Theorem #4043)




Proof

[proofplan] We reduce the constrained optimization problem to an ordinary singular-vector problem by whitening the two covariance metrics. Under the change of variables $\alpha=\Sigma_{11}^{1/2}a$ and $\beta=\Sigma_{22}^{1/2}b$, the feasible ellipsoids become Euclidean unit spheres and the objective becomes $\alpha^\top C\beta$. Compactness gives existence of maximizers, while the [singular value decomposition](/theorems/3071) of $C$ gives the full ordered family and translates back to canonical directions. Finally, simplicity of a positive singular value forces its left and right singular vectors to span one-dimensional eigenspaces, giving uniqueness up to the simultaneous sign convention that preserves a positive canonical correlation. [/proofplan] [step:Whiten the covariance constraints into Euclidean sphere constraints] Let \begin{align*} T_1: \mathbb{R}^p &\to \mathbb{R}^p, & a &\mapsto \Sigma_{11}^{1/2}a, \\ T_2: \mathbb{R}^q &\to \mathbb{R}^q, & b &\mapsto \Sigma_{22}^{1/2}b \end{align*} be the invertible linear maps induced by the positive definite square roots. For $a \in \mathbb{R}^p$ and $b \in \mathbb{R}^q$, define \begin{align*} \alpha := T_1(a)=\Sigma_{11}^{1/2}a, \qquad \beta := T_2(b)=\Sigma_{22}^{1/2}b. \end{align*} Then $a=\Sigma_{11}^{-1/2}\alpha$ and $b=\Sigma_{22}^{-1/2}\beta$, and \begin{align*} a^\top \Sigma_{11} a &= \alpha^\top \alpha = |\alpha|^2, \\ b^\top \Sigma_{22} b &= \beta^\top \beta = |\beta|^2, \\ a^\top \Sigma_{12} b &= \alpha^\top \Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}\beta = \alpha^\top C\beta. \end{align*} Thus the feasible sets \begin{align*} \{a \in \mathbb{R}^p : a^\top\Sigma_{11}a=1\}, \qquad \{b \in \mathbb{R}^q : b^\top\Sigma_{22}b=1\} \end{align*} are carried bijectively onto the Euclidean unit spheres \begin{align*} S^{p-1}:=\{\alpha \in \mathbb{R}^p:|\alpha|=1\}, \qquad S^{q-1}:=\{\beta \in \mathbb{R}^q:|\beta|=1\}. 
\end{align*} [guided] The covariance constraints are ellipsoidal because the matrices $\Sigma_{11}$ and $\Sigma_{22}$ define inner products. Since both matrices are symmetric positive definite, each has a unique symmetric positive definite square root, and those square roots are invertible. We therefore introduce the linear maps \begin{align*} T_1: \mathbb{R}^p &\to \mathbb{R}^p, & a &\mapsto \Sigma_{11}^{1/2}a, \\ T_2: \mathbb{R}^q &\to \mathbb{R}^q, & b &\mapsto \Sigma_{22}^{1/2}b. \end{align*} For a candidate direction pair $(a,b)$, set \begin{align*} \alpha := \Sigma_{11}^{1/2}a, \qquad \beta := \Sigma_{22}^{1/2}b. \end{align*} Because $T_1$ and $T_2$ are invertible, this change of variables loses no information: \begin{align*} a=\Sigma_{11}^{-1/2}\alpha, \qquad b=\Sigma_{22}^{-1/2}\beta. \end{align*} Now compute the constraints. Using symmetry of the square roots and the identity $\Sigma_{11}^{1/2}\Sigma_{11}^{1/2}=\Sigma_{11}$, we get \begin{align*} a^\top \Sigma_{11}a = a^\top \Sigma_{11}^{1/2}\Sigma_{11}^{1/2}a = (\Sigma_{11}^{1/2}a)^\top(\Sigma_{11}^{1/2}a) = |\alpha|^2. \end{align*} The same computation gives \begin{align*} b^\top \Sigma_{22}b=|\beta|^2. \end{align*} Finally, substituting $a=\Sigma_{11}^{-1/2}\alpha$ and $b=\Sigma_{22}^{-1/2}\beta$ into the covariance objective gives \begin{align*} a^\top\Sigma_{12}b = \alpha^\top\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}\beta = \alpha^\top C\beta. \end{align*} Thus canonical correlation analysis in the original covariance metrics is exactly the problem of optimizing the [bilinear form](/page/Bilinear%20Form) $(\alpha,\beta)\mapsto \alpha^\top C\beta$ over Euclidean unit spheres. [/guided] [/step] [step:Use compactness to obtain the first canonical correlation] Define \begin{align*} F:S^{p-1}\times S^{q-1} &\to \mathbb{R}, & (\alpha,\beta)&\mapsto \alpha^\top C\beta. 
\end{align*} The spheres $S^{p-1}$ and $S^{q-1}$ are closed and bounded subsets of finite-dimensional Euclidean spaces, hence compact, and therefore $S^{p-1}\times S^{q-1}$ is compact. The map $F$ is continuous because it is the restriction of a bilinear form on finite-dimensional spaces, hence a polynomial in the coordinates of $\alpha$ and $\beta$. Hence $F$ attains a maximum and a minimum on $S^{p-1}\times S^{q-1}$. Define \begin{align*} \rho_1 := \max_{\alpha \in S^{p-1},\,\beta \in S^{q-1}} \alpha^\top C\beta. \end{align*} Since $F(-\alpha,\beta)=-F(\alpha,\beta)$ and $(-\alpha,\beta)$ is again feasible, the maximum cannot be negative, so $\rho_1\geq 0$. Let $(\alpha_1,\beta_1)$ be a maximizer, and define \begin{align*} a_1:=\Sigma_{11}^{-1/2}\alpha_1, \qquad b_1:=\Sigma_{22}^{-1/2}\beta_1. \end{align*} Then \begin{align*} a_1^\top\Sigma_{11}a_1=1, \qquad b_1^\top\Sigma_{22}b_1=1, \qquad a_1^\top\Sigma_{12}b_1=\rho_1. \end{align*} Thus the first population canonical correlation and a corresponding pair of canonical directions exist. [/step] [step:Construct the full ordered family from singular vectors of the whitened matrix] Let \begin{align*} C = UDV^\top \end{align*} be a singular value decomposition of $C$, where $U \in \mathbb{R}^{p\times p}$ and $V \in \mathbb{R}^{q\times q}$ are orthogonal matrices, and $D \in \mathbb{R}^{p\times q}$ has diagonal entries \begin{align*} \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r \geq 0, \qquad r:=\min\{p,q\}, \end{align*} with all other entries equal to $0$. Let $u_k \in \mathbb{R}^p$ denote the $k$-th column of $U$, and let $v_k \in \mathbb{R}^q$ denote the $k$-th column of $V$. Define \begin{align*} \rho_k := \sigma_k, \qquad a_k := \Sigma_{11}^{-1/2}u_k, \qquad b_k := \Sigma_{22}^{-1/2}v_k \end{align*} for $1\leq k\leq r$. Since the columns of $U$ and $V$ are Euclidean orthonormal, the whitening computation gives \begin{align*} a_i^\top \Sigma_{11} a_j &= u_i^\top u_j = \delta_{ij}, \\ b_i^\top \Sigma_{22} b_j &= v_i^\top v_j = \delta_{ij}. 
\end{align*} Also, because $Cv_j=\sigma_j u_j$, we have \begin{align*} a_i^\top\Sigma_{12}b_j = u_i^\top C v_j = u_i^\top(\sigma_j u_j) = \sigma_j\delta_{ij} = \rho_i\delta_{ij}. \end{align*} Therefore the singular vectors of $C$ produce an ordered family of population canonical direction pairs, and the canonical correlations are exactly the singular values of $C$. [guided] The first maximizer gives existence of one canonical pair, but the theorem asserts an ordered family. The natural object controlling all pairs is the singular value decomposition of the whitened matrix $C$. Choose a singular value decomposition \begin{align*} C = UDV^\top, \end{align*} where $U \in \mathbb{R}^{p\times p}$ and $V \in \mathbb{R}^{q\times q}$ are orthogonal, and $D \in \mathbb{R}^{p\times q}$ has diagonal entries \begin{align*} \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r \geq 0, \qquad r:=\min\{p,q\}. \end{align*} Let $u_k$ be the $k$-th column of $U$ and $v_k$ the $k$-th column of $V$. These vectors satisfy \begin{align*} |u_k|=1, \qquad |v_k|=1, \qquad u_i^\top u_j=\delta_{ij}, \qquad v_i^\top v_j=\delta_{ij}, \qquad Cv_k=\sigma_k u_k. \end{align*} Now translate back to the original covariance coordinates by defining \begin{align*} \rho_k := \sigma_k, \qquad a_k := \Sigma_{11}^{-1/2}u_k, \qquad b_k := \Sigma_{22}^{-1/2}v_k. \end{align*} The covariance orthonormality relations are exactly the Euclidean orthonormality relations after whitening: \begin{align*} a_i^\top \Sigma_{11}a_j = u_i^\top u_j = \delta_{ij}, \qquad b_i^\top \Sigma_{22}b_j = v_i^\top v_j = \delta_{ij}. \end{align*} For the cross-covariance relation, substitute the definition of $C$: \begin{align*} a_i^\top\Sigma_{12}b_j = u_i^\top \Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}v_j = u_i^\top C v_j. \end{align*} Since $Cv_j=\sigma_j u_j$, this becomes \begin{align*} a_i^\top\Sigma_{12}b_j = u_i^\top(\sigma_j u_j) = \sigma_j\delta_{ij}. \end{align*} When $i=j$, this is $\rho_i$; when $i\neq j$, it is $0$. 
Thus the singular value decomposition supplies the entire ordered canonical system. [/guided] [/step] [step:Identify the variational meaning of the singular values] For any $\alpha \in S^{p-1}$ and $\beta \in S^{q-1}$, write \begin{align*} \tilde{\alpha}:=U^\top\alpha \in \mathbb{R}^p, \qquad \tilde{\beta}:=V^\top\beta \in \mathbb{R}^q. \end{align*} Orthogonality of $U$ and $V$ gives $|\tilde{\alpha}|=|\alpha|=1$ and $|\tilde{\beta}|=|\beta|=1$. Therefore \begin{align*} \alpha^\top C\beta &= \alpha^\top UDV^\top\beta = \tilde{\alpha}^\top D\tilde{\beta} = \sum_{k=1}^{r}\sigma_k \tilde{\alpha}_k\tilde{\beta}_k \\ &\leq \sigma_1\sum_{k=1}^{r}|\tilde{\alpha}_k||\tilde{\beta}_k| \leq \sigma_1 \left(\sum_{k=1}^{r}\tilde{\alpha}_k^2\right)^{1/2} \left(\sum_{k=1}^{r}\tilde{\beta}_k^2\right)^{1/2} \leq \sigma_1. \end{align*} The middle inequality is the [Cauchy-Schwarz inequality](/theorems/432) in $\mathbb{R}^r$. Equality is attained at $\alpha=u_1$ and $\beta=v_1$, so the first canonical correlation is $\sigma_1$. More generally, after imposing the orthogonality constraints \begin{align*} \alpha^\top u_i=0, \qquad \beta^\top v_i=0 \end{align*} for $1\leq i<k$, the same computation restricted to the remaining coordinates gives the maximal value $\sigma_k$, attained at $(u_k,v_k)$. Hence the ordered canonical correlations are precisely the ordered singular values of $C$. [guided] We now verify that the singular values are not merely a convenient construction, but exactly the variational optima defining the canonical correlations. Take arbitrary unit vectors $\alpha \in S^{p-1}$ and $\beta \in S^{q-1}$, and rotate them into the singular-vector coordinates: \begin{align*} \tilde{\alpha}:=U^\top\alpha, \qquad \tilde{\beta}:=V^\top\beta. \end{align*} Because $U$ and $V$ are orthogonal matrices, they preserve Euclidean norms: \begin{align*} |\tilde{\alpha}|=|\alpha|=1, \qquad |\tilde{\beta}|=|\beta|=1. 
\end{align*} In these coordinates the objective diagonalizes: \begin{align*} \alpha^\top C\beta = \alpha^\top UDV^\top\beta = \tilde{\alpha}^\top D\tilde{\beta} = \sum_{k=1}^{r}\sigma_k\tilde{\alpha}_k\tilde{\beta}_k. \end{align*} Since $\sigma_k\leq\sigma_1$ for every $k$, we estimate \begin{align*} \sum_{k=1}^{r}\sigma_k\tilde{\alpha}_k\tilde{\beta}_k \leq \sigma_1\sum_{k=1}^{r}|\tilde{\alpha}_k||\tilde{\beta}_k|. \end{align*} Applying the Cauchy-Schwarz inequality in the Euclidean space $\mathbb{R}^r$ gives \begin{align*} \sum_{k=1}^{r}|\tilde{\alpha}_k||\tilde{\beta}_k| \leq \left(\sum_{k=1}^{r}\tilde{\alpha}_k^2\right)^{1/2} \left(\sum_{k=1}^{r}\tilde{\beta}_k^2\right)^{1/2} \leq 1. \end{align*} Thus $\alpha^\top C\beta\leq\sigma_1$ for all feasible pairs. Equality occurs when $\alpha=u_1$ and $\beta=v_1$, because then \begin{align*} u_1^\top C v_1 = u_1^\top(\sigma_1 u_1)=\sigma_1. \end{align*} So the first canonical correlation is $\sigma_1$. For the later canonical directions, the covariance orthogonality constraints become Euclidean orthogonality constraints after whitening: \begin{align*} a^\top\Sigma_{11}a_i=0 \iff \alpha^\top u_i=0, \qquad b^\top\Sigma_{22}b_i=0 \iff \beta^\top v_i=0. \end{align*} Imposing these constraints for $1\leq i<k$ removes the first $k-1$ singular-vector coordinates. Repeating the same Cauchy-Schwarz estimate on the remaining coordinates shows that the largest possible value is $\sigma_k$, attained by $(u_k,v_k)$. This proves that the canonical correlations are precisely the ordered singular values of $C$. [/guided] [/step] [step:Use simplicity of positive singular values to prove uniqueness up to simultaneous sign] Fix $k$ with $\rho_k=\sigma_k>0$, and assume $\sigma_k$ is simple. Let $(\alpha,\beta)\in S^{p-1}\times S^{q-1}$ be a whitened canonical pair associated with $\sigma_k$, satisfying the preceding orthogonality constraints and attaining the value $\sigma_k$. 
Write $\tilde{\alpha}:=U^\top\alpha$ and $\tilde{\beta}:=V^\top\beta$. The orthogonality constraints give $\tilde{\alpha}_j=\tilde{\beta}_j=0$ for $j<k$. Since $\sigma_k$ is simple, $\sigma_j<\sigma_k$ for every $j>k$, so equality in the variational estimate forces $\tilde{\alpha}_j\tilde{\beta}_j=0$ for $k<j\leq r$ and therefore $\tilde{\alpha}_k\tilde{\beta}_k=1$. Because $|\tilde{\alpha}_k|\leq 1$ and $|\tilde{\beta}_k|\leq 1$, this gives $\tilde{\alpha}_k=\tilde{\beta}_k=\varepsilon$ for some $\varepsilon\in\{-1,1\}$, and the unit-norm conditions then force every other coordinate of $\tilde{\alpha}$ and $\tilde{\beta}$ to vanish. Hence \begin{align*} \alpha=\varepsilon u_k, \qquad \beta=\varepsilon v_k, \end{align*} where the signs agree because the canonical correlation is required to be $+\sigma_k$. Returning to the original variables gives \begin{align*} a=\Sigma_{11}^{-1/2}\alpha = \varepsilon \Sigma_{11}^{-1/2}u_k = \varepsilon a_k, \qquad b=\Sigma_{22}^{-1/2}\beta = \varepsilon \Sigma_{22}^{-1/2}v_k = \varepsilon b_k. \end{align*} Thus the canonical direction pair is unique up to the simultaneous sign change $(a_k,b_k)\mapsto(-a_k,-b_k)$. Reversing only one sign changes $a_k^\top\Sigma_{12}b_k$ from $\rho_k$ to $-\rho_k$, so the resulting pair violates the convention $\rho_k>0$ and is not a canonical pair for $\sigma_k$. This proves the asserted uniqueness. [/step]
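
The argument can be checked numerically on a small case. The following is a minimal sketch, not part of the proof, assuming $p=q=2$ and hypothetical covariance blocks `S11`, `S22`, `S12` chosen only for illustration. It whitens with the closed-form square root of a $2\times 2$ positive definite matrix, approximates the top singular triple of $C$ by power iteration (a stand-in for a full singular value decomposition, adequate here because the two singular values of this example are distinct), and compares the result with a brute-force scan of the two unit circles. All helper names are illustrative.

```python
import math

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def matvec(M, x):
    return [dot(row, x) for row in M]

def matmul(A, B):
    cols = list(zip(*B))
    return [[dot(row, col) for col in cols] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def inv_sqrt_spd_2x2(S):
    """Inverse of the positive definite square root of a 2x2 SPD matrix.

    Uses S^{1/2} = (S + s I) / t with s = sqrt(det S), t = sqrt(tr S + 2 s),
    which follows from the Cayley-Hamilton identity for S^{1/2}.
    """
    (a, b), (_, d) = S
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2.0 * s)
    r00, r01, r11 = (a + s) / t, b / t, (d + s) / t   # entries of S^{1/2}
    det = r00 * r11 - r01 * r01
    return [[r11 / det, -r01 / det], [-r01 / det, r00 / det]]

def top_singular_triplet(C, iters=500):
    """Power iteration on C^T C; assumes a simple top singular value and a
    start vector not orthogonal to the top right singular vector."""
    Ct = transpose(C)
    v = [1.0, 1.0]
    for _ in range(iters):
        w = matvec(Ct, matvec(C, v))
        norm = math.sqrt(dot(w, w))
        v = [wi / norm for wi in w]
    Cv = matvec(C, v)
    sigma = math.sqrt(dot(Cv, Cv))
    return sigma, [c / sigma for c in Cv], v

def form(x, M, y):
    """Bilinear form x^T M y."""
    return dot(x, matvec(M, y))

# Hypothetical covariance blocks (illustrative values only).
S11 = [[2.0, 1.0], [1.0, 2.0]]
S22 = [[1.0, 0.5], [0.5, 1.0]]
S12 = [[0.5, 0.2], [0.1, 0.3]]

W1, W2 = inv_sqrt_spd_2x2(S11), inv_sqrt_spd_2x2(S22)
C = matmul(matmul(W1, S12), W2)            # whitened cross-covariance
sigma1, u1, v1 = top_singular_triplet(C)
a1, b1 = matvec(W1, u1), matvec(W2, v1)    # back to original coordinates

# Brute-force scan of S^1 x S^1 on a one-degree grid.
grid = [2.0 * math.pi * i / 360 for i in range(360)]
grid_max = max(
    form([math.cos(s), math.sin(s)], C, [math.cos(t), math.sin(t)])
    for s in grid for t in grid
)

print("a1' S11 a1      =", form(a1, S11, a1))
print("b1' S22 b1      =", form(b1, S22, b1))
print("a1' S12 b1      =", form(a1, S12, b1))
print("sigma_1         =", sigma1)
print("grid maximum    =", grid_max)
print("one sign flipped =", form([-x for x in a1], S12, b1))
```

In this sketch the two normalization values should agree with $1$ and the cross term with $\sigma_1$ up to rounding, the grid maximum should not exceed $\sigma_1$, and flipping the sign of only one direction should return $-\sigma_1$, matching the sign discussion in the uniqueness step.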


