Likelihood Ratio Test for the Number of Factors in Gaussian Factor Analysis

Likelihood Ratio Test for the Number of Factors in Gaussian Factor Analysis (Theorem # 4040)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We write the Gaussian log-likelihood as a function of the covariance matrix and evaluate it at the maximizers under the saturated and constrained models. The saturated model is minimized, in negative log-likelihood form, at $\Sigma=S$, while the constrained model is minimized at $\hat{\Sigma}_k$ by the definition of the constrained maximum likelihood estimator. Subtracting the two maximized log-likelihoods cancels the constants and gives the displayed statistic. The chi-squared reference law follows from Wilks' likelihood-ratio theorem after counting the locally identifiable covariance parameters in the factor model. [/proofplan] [step:Write the covariance log-likelihood up to constants] Let $\mathcal{M}_{\mathrm{sat}}$ denote the cone of positive definite symmetric $p\times p$ covariance matrices, and let \begin{align*} \mathcal{M}_k := \{\Lambda\Lambda^\top+\Psi : \Lambda\in\mathbb{R}^{p\times k},\ \Psi\in\mathbb{R}^{p\times p}\text{ is positive diagonal},\ \Lambda\Lambda^\top+\Psi\in\mathcal{M}_{\mathrm{sat}}\} \end{align*} denote the covariance model determined by $H_k$. Define \begin{align*} \ell_S:\mathcal{M}_{\mathrm{sat}} &\to \mathbb{R} \\ \Sigma &\mapsto -\frac{n}{2}\left\{\log\det(\Sigma)+\operatorname{tr}(S\Sigma^{-1})\right\} \end{align*} denote the Gaussian covariance log-likelihood with all additive constants independent of $\Sigma$ omitted. The positive definiteness of $\Sigma$ ensures that $\log\det(\Sigma)$ and $\Sigma^{-1}$ are well-defined. Since $S$ is positive definite by hypothesis, $S \in \mathcal{M}_{\mathrm{sat}}$. [guided] Let $\mathcal{M}_{\mathrm{sat}}$ denote the cone of positive definite symmetric $p\times p$ covariance matrices. Let \begin{align*} \mathcal{M}_k := \{\Lambda\Lambda^\top+\Psi : \Lambda\in\mathbb{R}^{p\times k},\ \Psi\in\mathbb{R}^{p\times p}\text{ is positive diagonal},\ \Lambda\Lambda^\top+\Psi\in\mathcal{M}_{\mathrm{sat}}\} \end{align*} denote the covariance model determined by $H_k$. For a centered $p$-variate Gaussian model, the part of the log-likelihood depending on the covariance matrix $\Sigma$ is \begin{align*} -\frac{n}{2}\log\det(\Sigma)-\frac{n}{2}\operatorname{tr}(S\Sigma^{-1}). \end{align*} All remaining terms depend only on $n$ and $p$, so they cancel in a likelihood ratio. Thus we define the likelihood map \begin{align*} \ell_S:\mathcal{M}_{\mathrm{sat}} &\to \mathbb{R} \\ \Sigma &\mapsto -\frac{n}{2}\left\{\log\det(\Sigma)+\operatorname{tr}(S\Sigma^{-1})\right\}. \end{align*} The domain is the cone of positive definite symmetric matrices, because both $\log\det(\Sigma)$ and $\Sigma^{-1}$ require positive definiteness. The assumption that $S$ is positive definite will also allow the saturated maximum to occur at $\Sigma=S$. [/guided] [/step] [step:Show that the saturated Gaussian covariance model is maximized at $S$] Define the negative normalized objective \begin{align*} Q_S:\mathcal{M}_{\mathrm{sat}} &\to \mathbb{R} \\ \Sigma &\mapsto \log\det(\Sigma)+\operatorname{tr}(S\Sigma^{-1}). \end{align*} We show that $Q_S$ is minimized at $\Sigma=S$. For $\Sigma \in \mathcal{M}_{\mathrm{sat}}$, define \begin{align*} A := S^{-1/2}\Sigma S^{-1/2}. \end{align*} Then $A$ is positive definite. Let $\alpha_1,\dots,\alpha_p>0$ denote the eigenvalues of $A$. Using cyclic invariance of trace and multiplicativity of determinant, \begin{align*} Q_S(\Sigma) &= \log\det(S)+\log\det(A)+\operatorname{tr}(A^{-1}) \\ &= \log\det(S)+\sum_{i=1}^{p}\left(\log \alpha_i+\frac{1}{\alpha_i}\right). \end{align*} For each $a>0$, the scalar function $h:(0,\infty)\to\mathbb{R}$ defined by \begin{align*} h(a):=\log a+\frac{1}{a} \end{align*} satisfies $h'(a)=(a-1)/a^2$, so $h$ is strictly decreasing on $(0,1)$ and strictly increasing on $(1,\infty)$. Hence $h(a)\geq h(1)=1$, with equality iff $a=1$. Therefore \begin{align*} Q_S(\Sigma)\geq \log\det(S)+p=Q_S(S), \end{align*} with equality iff $A=I_p$, equivalently $\Sigma=S$. Thus the saturated maximum likelihood covariance estimator is $S$. [guided] We need to justify that the unrestricted covariance model chooses $\Sigma=S$. Since maximizing $\ell_S$ is the same as minimizing \begin{align*} Q_S:\mathcal{M}_{\mathrm{sat}} &\to \mathbb{R} \\ \Sigma &\mapsto \log\det(\Sigma)+\operatorname{tr}(S\Sigma^{-1}), \end{align*} we prove that $Q_S$ has its unique minimum at $S$. The comparison is easiest after normalizing by $S$. Because $S$ is positive definite, its positive definite square root $S^{1/2}$ and inverse square root $S^{-1/2}$ exist. For a candidate covariance matrix $\Sigma \in \mathcal{M}_{\mathrm{sat}}$, define \begin{align*} A := S^{-1/2}\Sigma S^{-1/2}. \end{align*} Then $A$ is positive definite, so it has positive eigenvalues $\alpha_1,\dots,\alpha_p$. The determinant term transforms as \begin{align*} \log\det(\Sigma) = \log\det(S^{1/2}AS^{1/2}) = \log\det(S)+\log\det(A). \end{align*} For the trace term, cyclic invariance of trace gives \begin{align*} \operatorname{tr}(S\Sigma^{-1}) &= \operatorname{tr}\left(S(S^{-1/2}A^{-1}S^{-1/2})\right) \\ &= \operatorname{tr}\left(S^{1/2}A^{-1}S^{-1/2}\right) \\ &= \operatorname{tr}(A^{-1}). \end{align*} Therefore \begin{align*} Q_S(\Sigma) &= \log\det(S)+\log\det(A)+\operatorname{tr}(A^{-1}) \\ &= \log\det(S)+\sum_{i=1}^{p}\left(\log \alpha_i+\frac{1}{\alpha_i}\right). \end{align*} It remains to minimize the scalar expression $\log a+1/a$ over $a>0$. Define \begin{align*} h:(0,\infty)&\to\mathbb{R} \\ a&\mapsto \log a+\frac{1}{a}. \end{align*} Then \begin{align*} h'(a)=\frac{1}{a}-\frac{1}{a^2}=\frac{a-1}{a^2}. \end{align*} Thus $h$ decreases on $(0,1)$ and increases on $(1,\infty)$, so its unique minimum occurs at $a=1$, where $h(1)=1$. Applying this to each eigenvalue $\alpha_i$ gives \begin{align*} Q_S(\Sigma)\geq \log\det(S)+p. \end{align*} Equality holds precisely when every $\alpha_i=1$, which means $A=I_p$ and hence $\Sigma=S$. Thus the saturated model is maximized at $\Sigma=S$. [/guided] [/step] [step:Evaluate the constrained and saturated maximized likelihoods] By hypothesis, the constrained maximum likelihood estimator exists in $\mathcal{M}_k$ and is represented by $(\hat{\Lambda},\hat{\Psi})$, so the constrained maximizer is \begin{align*} \hat{\Sigma}_k=\hat{\Lambda}\hat{\Lambda}^\top+\hat{\Psi}. \end{align*} Hence \begin{align*} \sup_{\Sigma\in\mathcal{M}_k}\ell_S(\Sigma) &= -\frac{n}{2}\left\{\log\det(\hat{\Sigma}_k)+\operatorname{tr}(S\hat{\Sigma}_k^{-1})\right\}, \end{align*} while the previous step gives \begin{align*} \sup_{\Sigma\in\mathcal{M}_{\mathrm{sat}}}\ell_S(\Sigma) &= -\frac{n}{2}\left\{\log\det(S)+p\right\}. \end{align*} The likelihood ratio statistic is \begin{align*} G_k^2 &:= -2\left( \sup_{\Sigma\in\mathcal{M}_k}\ell_S(\Sigma) - \sup_{\Sigma\in\mathcal{M}_{\mathrm{sat}}}\ell_S(\Sigma) \right) \\ &= n\left\{ \log\det(\hat{\Sigma}_k) + \operatorname{tr}(S\hat{\Sigma}_k^{-1}) - \log\det(S) - p \right\}. \end{align*} This is the displayed formula. [/step] [step:Count the codimension under the local identifiability hypothesis] The saturated covariance model has \begin{align*} \frac{p(p+1)}{2} \end{align*} free covariance parameters, one for each entry on and above the diagonal of a symmetric $p\times p$ matrix. Define the positive diagonal cone \begin{align*} \mathcal{D}_p^+ := \{\Psi\in\mathbb{R}^{p\times p}: \Psi \text{ is diagonal and } \Psi_{ii}>0 \text{ for } 1\leq i\leq p\} \end{align*} and define the factor covariance parametrization \begin{align*} \Phi_k: \mathbb{R}^{p\times k}\times \mathcal{D}_p^+ &\to \mathcal{M}_{\mathrm{sat}} \\ (\Lambda,\Psi)&\mapsto \Lambda\Lambda^\top+\Psi. \end{align*} The loading matrix $\Lambda\in\mathbb{R}^{p\times k}$ contributes $pk$ raw parameters and $\Psi\in\mathcal{D}_p^+$ contributes $p$ raw parameters. For every orthogonal matrix $R\in O(k)$, \begin{align*} \Phi_k(\Lambda R,\Psi) = (\Lambda R)(\Lambda R)^\top+\Psi = \Lambda RR^\top\Lambda^\top+\Psi = \Lambda\Lambda^\top+\Psi, \end{align*} so the $O(k)$ rotation action gives a local non-identifiability of dimension $k(k-1)/2$. The regularity assumption in the theorem is precisely that, in a neighbourhood of the true parameter, this rotational orbit is the full local fibre of $\Phi_k$ and the induced quotient map has constant rank. Equivalently, the image of $\Phi_k$ is locally a smooth embedded submanifold of $\mathcal{M}_{\mathrm{sat}}$ of dimension \begin{align*} d_k = pk+p-\frac{k(k-1)}{2}, \end{align*} with $d_k\leq p(p+1)/2$. Under this hypothesis, the number of covariance restrictions imposed by $\mathcal{M}_k$ is its codimension in the saturated covariance cone: \begin{align*} \nu_k = \frac{p(p+1)}{2} - d_k = \frac{p(p+1)}{2} - \left(pk+p-\frac{k(k-1)}{2}\right). \end{align*} If the quotient dimension formula exceeds the saturated dimension, or if the fibre has additional local non-identifiability beyond rotations, this constant-rank hypothesis fails and the displayed degree-of-freedom formula is not asserted. [guided] The asymptotic chi-squared degrees of freedom are the codimension of the constrained covariance model inside the saturated covariance model, but this codimension can only be computed by parameter counting under a genuine local identifiability hypothesis. First, a symmetric $p\times p$ covariance matrix has $p$ diagonal entries and $p(p-1)/2$ entries above the diagonal. Thus the saturated covariance model has \begin{align*} p+\frac{p(p-1)}{2}=\frac{p(p+1)}{2} \end{align*} free covariance parameters. Now define the positive diagonal cone \begin{align*} \mathcal{D}_p^+ := \{\Psi\in\mathbb{R}^{p\times p}: \Psi \text{ is diagonal and } \Psi_{ii}>0 \text{ for } 1\leq i\leq p\} \end{align*} and define the parametrization of the $k$-factor covariance model by \begin{align*} \Phi_k: \mathbb{R}^{p\times k}\times \mathcal{D}_p^+ &\to \mathcal{M}_{\mathrm{sat}} \\ (\Lambda,\Psi)&\mapsto \Lambda\Lambda^\top+\Psi. \end{align*} The loading matrix $\Lambda\in\mathbb{R}^{p\times k}$ has $pk$ real entries, and the diagonal uniqueness matrix $\Psi\in\mathcal{D}_p^+$ has $p$ positive diagonal entries, giving $pk+p$ raw parameters. These raw parameters are not all identifiable. If $R\in O(k)$ is an orthogonal matrix, then \begin{align*} \Phi_k(\Lambda R,\Psi) &=(\Lambda R)(\Lambda R)^\top+\Psi \\ &=\Lambda RR^\top\Lambda^\top+\Psi \\ &=\Lambda\Lambda^\top+\Psi. \end{align*} Thus rotations of the latent factor coordinates do not change the covariance matrix. The orthogonal group $O(k)$ has local dimension $k(k-1)/2$, the number of independent skew-symmetric directions in a $k\times k$ matrix. The important regularity point is that this subtraction is not automatic. The theorem assumes that, near the true parameter, the rotational orbit is the entire local fibre of $\Phi_k$ and that the induced map from the quotient parameter space into $\mathcal{M}_{\mathrm{sat}}$ has constant rank. Under exactly this hypothesis, the local dimension of the factor covariance model is \begin{align*} d_k = pk+p-\frac{k(k-1)}{2}. \end{align*} This hypothesis also forces $d_k\leq p(p+1)/2$, since a smooth embedded submanifold of the saturated covariance cone cannot have dimension larger than the ambient cone. Therefore the number of independent covariance restrictions imposed by the $k$-factor structure is the codimension \begin{align*} \nu_k = \frac{p(p+1)}{2} - d_k = \frac{p(p+1)}{2} - \left(pk+p-\frac{k(k-1)}{2}\right). \end{align*} If $k$ is so large that the raw quotient dimension exceeds the saturated covariance dimension, or if additional local non-identifiability is present, then this parameter count is invalid and the regular Wilks reference law with this $\nu_k$ is not being claimed. [/guided] [/step] [step:Apply the regular likelihood-ratio asymptotic theorem] The hypotheses now give the concrete local conditions needed for the regular likelihood-ratio theorem: $\mathcal{M}_k\subseteq\mathcal{M}_{\mathrm{sat}}$ is locally a smooth embedded submanifold near the true covariance, its codimension is $\nu_k$, the true covariance is an interior point of this local submanifold, the quotient parametrization has full rank after removing the $O(k)$ rotational directions, and the uniqueness parameters remain positive. Therefore Wilks' likelihood-ratio theorem applies to this nested pair of regular covariance models. Its conclusion gives \begin{align*} G_k^2 \xrightarrow{d} \chi^2_{\nu_k}, \end{align*} where $\nu_k$ is the codimension computed in the previous step. Boundary cases such as zero uniquenesses, rotational singularities, additional non-rotational local non-identifiability, dimension saturation for large $k$, or non-unique constrained maximizers violate these regularity hypotheses, so they are excluded from the stated chi-squared approximation. This completes the proof. [/step]

Prerequisites (0/2 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Explore Further

Determinant Definition Distribution Definition Gaussian Copula Has Zero Tail Dependence probability Stationarity of the Differenced ARIMA Process probability Ljung-Box Portmanteau Test probability Bonferroni Simultaneous Confidence Intervals for Linear Contrasts of a Multivariate Normal Mean probability Distribution of the Sample Mean of a Multivariate Normal Sample probability Nelson's Strict Stationarity Theorem for the GARCH(1,1) Process probability Wilks' Lambda for Testing Zero Cross-Covariance probability Rotational Diagonalization of a Maximum Likelihood Factor Loading Representative probability

What brings you to Androma?

Start with a route through the knowledge graph.