Rotational Diagonalization of a Maximum Likelihood Factor Loading Representative (Theorem # 4038)

Proof

[proofplan]
The likelihood and observational distribution are unchanged by right multiplication of the loading matrix by an orthogonal matrix, because this operation preserves $\Lambda\Lambda^\top$. We therefore choose the orthogonal rotation to diagonalize the symmetric matrix $\hat{\Lambda}^\top\hat{\Psi}^{-1}\hat{\Lambda}$. The finite-dimensional real spectral theorem supplies such an orthogonal matrix, and the covariance calculation shows that the rotated representative remains in the same maximum likelihood equivalence class.
[/proofplan]

[step:Form the symmetric weighted Gram matrix of the loading representative]
Since $\hat{\Psi}$ is positive diagonal, every diagonal entry of $\hat{\Psi}$ is strictly positive, so $\hat{\Psi}^{-1} \in \mathbb{R}^{p \times p}$ exists and is positive diagonal. Define the matrix
\begin{align*}
A := \hat{\Lambda}^\top \hat{\Psi}^{-1}\hat{\Lambda} \in \mathbb{R}^{m \times m}.
\end{align*}
The matrix $A$ is symmetric, because $\hat{\Psi}^{-1}$ is symmetric and hence
\begin{align*}
A^\top &= \left(\hat{\Lambda}^\top \hat{\Psi}^{-1}\hat{\Lambda}\right)^\top \\
&= \hat{\Lambda}^\top \left(\hat{\Psi}^{-1}\right)^\top \hat{\Lambda} \\
&= \hat{\Lambda}^\top \hat{\Psi}^{-1}\hat{\Lambda} \\
&= A.
\end{align*}
[/step]

[step:Choose an orthogonal rotation that diagonalizes the weighted Gram matrix]
By the finite-dimensional real spectral theorem for symmetric matrices (citing a result not yet in the wiki: Spectral Theorem for Real Symmetric Matrices), since $A \in \mathbb{R}^{m \times m}$ is symmetric, there exist an orthogonal matrix $T \in \mathbb{R}^{m \times m}$ and a diagonal matrix $D \in \mathbb{R}^{m \times m}$ such that
\begin{align*}
T^\top A T = D.
\end{align*}
Substituting the definition of $A$ gives
\begin{align*}
T^\top \hat{\Lambda}^\top \hat{\Psi}^{-1}\hat{\Lambda}T = D.
\end{align*}
Equivalently,
\begin{align*}
(\hat{\Lambda}T)^\top \hat{\Psi}^{-1}(\hat{\Lambda}T) = D,
\end{align*}
so the rotated loading matrix $\hat{\Lambda}T$ has diagonal weighted Gram matrix.

[guided]
The matrix we want to make diagonal is
\begin{align*}
A := \hat{\Lambda}^\top \hat{\Psi}^{-1}\hat{\Lambda}.
\end{align*}
The previous step verified that $A$ is a real symmetric $m \times m$ matrix. This is exactly the hypothesis needed for the finite-dimensional real spectral theorem for symmetric matrices (citing a result not yet in the wiki: Spectral Theorem for Real Symmetric Matrices). That theorem gives an orthogonal matrix $T \in \mathbb{R}^{m \times m}$ whose columns are an orthonormal eigenbasis for $A$, and it gives a diagonal matrix $D \in \mathbb{R}^{m \times m}$ such that
\begin{align*}
T^\top A T = D.
\end{align*}
Now replace $A$ by its definition. We obtain
\begin{align*}
T^\top \hat{\Lambda}^\top \hat{\Psi}^{-1}\hat{\Lambda}T = D.
\end{align*}
Because matrix multiplication is associative and $T^\top\hat{\Lambda}^\top = (\hat{\Lambda}T)^\top$, the left-hand side is exactly
\begin{align*}
(\hat{\Lambda}T)^\top \hat{\Psi}^{-1}(\hat{\Lambda}T).
\end{align*}
Thus the same orthogonal matrix $T$ that diagonalizes $A$ also makes the rotated representative satisfy
\begin{align*}
(\hat{\Lambda}T)^\top \hat{\Psi}^{-1}(\hat{\Lambda}T) = D,
\end{align*}
with $D$ diagonal.
[/guided]
[/step]

[step:Verify that the rotation preserves the fitted covariance and likelihood]
Since $T$ is orthogonal, $TT^\top = I_m$, where $I_m \in \mathbb{R}^{m \times m}$ denotes the identity matrix. Therefore the fitted covariance matrix of the rotated pair $(\hat{\Lambda}T,\hat{\Psi})$ is
\begin{align*}
\Sigma(\hat{\Lambda}T,\hat{\Psi}) &= (\hat{\Lambda}T)(\hat{\Lambda}T)^\top + \hat{\Psi} \\
&= \hat{\Lambda}T T^\top \hat{\Lambda}^\top + \hat{\Psi} \\
&= \hat{\Lambda} I_m \hat{\Lambda}^\top + \hat{\Psi} \\
&= \hat{\Lambda}\hat{\Lambda}^\top + \hat{\Psi} \\
&= \Sigma(\hat{\Lambda},\hat{\Psi}).
\end{align*}
By hypothesis, the Gaussian likelihood depends on the parameters only through the covariance matrix $\Sigma(\Lambda,\Psi)$. Hence $(\hat{\Lambda}T,\hat{\Psi})$ has the same likelihood value as $(\hat{\Lambda},\hat{\Psi})$. Since $(\hat{\Lambda},\hat{\Psi})$ is a maximum likelihood representative, $(\hat{\Lambda}T,\hat{\Psi})$ is also a maximum likelihood representative. The equality of covariance matrices also shows that the two representatives are observationally equivalent.

[guided]
The only remaining point is to check that the rotation did not change the statistical model represented by the parameters. The covariance map is
\begin{align*}
\Sigma(\Lambda,\Psi) := \Lambda\Lambda^\top + \Psi.
\end{align*}
For the rotated pair $(\hat{\Lambda}T,\hat{\Psi})$, we compute directly:
\begin{align*}
\Sigma(\hat{\Lambda}T,\hat{\Psi}) &= (\hat{\Lambda}T)(\hat{\Lambda}T)^\top + \hat{\Psi} \\
&= \hat{\Lambda}T T^\top \hat{\Lambda}^\top + \hat{\Psi}.
\end{align*}
The orthogonality of $T$ means $TT^\top = I_m$, so this becomes
\begin{align*}
\Sigma(\hat{\Lambda}T,\hat{\Psi}) &= \hat{\Lambda} I_m \hat{\Lambda}^\top + \hat{\Psi} \\
&= \hat{\Lambda}\hat{\Lambda}^\top + \hat{\Psi} \\
&= \Sigma(\hat{\Lambda},\hat{\Psi}).
\end{align*}
Thus the rotated and unrotated representatives determine the same covariance matrix. Since the Gaussian likelihood is assumed to depend on $(\Lambda,\Psi)$ only through $\Sigma(\Lambda,\Psi)$, the two representatives have the same likelihood value. The original representative $(\hat{\Lambda},\hat{\Psi})$ is maximum likelihood by hypothesis, so the rotated representative $(\hat{\Lambda}T,\hat{\Psi})$ must also be maximum likelihood. The same covariance equality proves observational equivalence.
[/guided]
[/step]

[step:Conclude the existence of a diagonalizing maximum likelihood representative]
The orthogonal matrix $T$ constructed above satisfies both required properties:
\begin{align*}
\Sigma(\hat{\Lambda}T,\hat{\Psi}) = \Sigma(\hat{\Lambda},\hat{\Psi})
\end{align*}
and
\begin{align*}
(\hat{\Lambda}T)^\top \hat{\Psi}^{-1}(\hat{\Lambda}T)
\end{align*}
is diagonal. Therefore $(\hat{\Lambda}T,\hat{\Psi})$ is an observationally equivalent maximum likelihood representative whose weighted loading Gram matrix is diagonal. This proves the theorem.
[/step]
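
The construction in the proof can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original proof. It assumes hypothetical random values standing in for $\hat{\Lambda}$ and the diagonal of $\hat{\Psi}$ (they are not actual maximum likelihood estimates), and the function name `diagonalizing_rotation` is introduced here for illustration only. It uses `numpy.linalg.eigh`, which returns an orthonormal eigenbasis of a real symmetric matrix, to play the role of the spectral theorem.

```python
import numpy as np


def diagonalizing_rotation(loadings, uniquenesses):
    """Return (T, D) with T orthogonal and T.T @ A @ T = D diagonal,
    where A = loadings.T @ diag(1 / uniquenesses) @ loadings.

    Hypothetical helper: `uniquenesses` holds the strictly positive
    diagonal entries of Psi, so Psi^{-1} never has to be formed.
    """
    # Dividing row i of the loadings by psi_i applies Psi^{-1} from the left.
    A = loadings.T @ (loadings / uniquenesses[:, None])
    # A is symmetric in exact arithmetic; average away rounding asymmetry.
    A = (A + A.T) / 2.0
    # eigh returns eigenvalues and orthonormal eigenvectors (as columns).
    eigenvalues, T = np.linalg.eigh(A)
    return T, np.diag(eigenvalues)


# Illustrative data only (assumption): p = 6 variables, m = 2 factors.
rng = np.random.default_rng(0)
p, m = 6, 2
Lambda_hat = rng.normal(size=(p, m))
psi_hat = rng.uniform(0.5, 2.0, size=p)

T, D = diagonalizing_rotation(Lambda_hat, psi_hat)
rotated = Lambda_hat @ T

# Weighted Gram matrix of the rotated loadings: (Lambda T)^T Psi^{-1} (Lambda T).
gram = rotated.T @ (rotated / psi_hat[:, None])

# Fitted covariance Sigma = Lambda Lambda^T + Psi, before and after rotation.
sigma_before = Lambda_hat @ Lambda_hat.T + np.diag(psi_hat)
sigma_after = rotated @ rotated.T + np.diag(psi_hat)

gram_is_diagonal = np.allclose(gram, np.diag(np.diag(gram)))
covariance_unchanged = np.allclose(sigma_before, sigma_after)
print(gram_is_diagonal, covariance_unchanged)
```

Note that `eigh` orders the eigenvalues in ascending order and fixes the signs of the eigenvectors arbitrarily. The theorem requires neither convention: any orthogonal $T$ with $T^\top A T$ diagonal serves equally well.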
