[proofplan]
We write the Gaussian log-likelihood as a function of the covariance matrix and evaluate it at the maximizers under the saturated and constrained models. The saturated model is minimized, in negative log-likelihood form, at $\Sigma=S$, while the constrained model is minimized at $\hat{\Sigma}_k$ by the definition of the constrained maximum likelihood estimator. Subtracting the two maximized log-likelihoods cancels the constants and gives the displayed statistic. The chi-squared reference law follows from Wilks' likelihood-ratio theorem after counting the locally identifiable covariance parameters in the factor model.
[/proofplan]
[step:Write the covariance log-likelihood up to constants]
Let $\mathcal{M}_{\mathrm{sat}}$ denote the cone of positive definite symmetric $p\times p$ covariance matrices, and let
\begin{align*}
\mathcal{M}_k := \{\Lambda\Lambda^\top+\Psi : \Lambda\in\mathbb{R}^{p\times k},\ \Psi\in\mathbb{R}^{p\times p}\text{ is positive diagonal},\ \Lambda\Lambda^\top+\Psi\in\mathcal{M}_{\mathrm{sat}}\}
\end{align*}
denote the covariance model determined by $H_k$. Define
\begin{align*}
\ell_S:\mathcal{M}_{\mathrm{sat}} &\to \mathbb{R} \\
\Sigma &\mapsto -\frac{n}{2}\left\{\log\det(\Sigma)+\operatorname{tr}(S\Sigma^{-1})\right\}
\end{align*}
denote the Gaussian covariance log-likelihood with all additive constants independent of $\Sigma$ omitted. The positive definiteness of $\Sigma$ ensures that $\log\det(\Sigma)$ and $\Sigma^{-1}$ are well-defined. Since $S$ is positive definite by hypothesis, $S \in \mathcal{M}_{\mathrm{sat}}$.
[guided]
Let $\mathcal{M}_{\mathrm{sat}}$ denote the cone of positive definite symmetric $p\times p$ covariance matrices. Let
\begin{align*}
\mathcal{M}_k := \{\Lambda\Lambda^\top+\Psi : \Lambda\in\mathbb{R}^{p\times k},\ \Psi\in\mathbb{R}^{p\times p}\text{ is positive diagonal},\ \Lambda\Lambda^\top+\Psi\in\mathcal{M}_{\mathrm{sat}}\}
\end{align*}
denote the covariance model determined by $H_k$. For a centered $p$-variate Gaussian model, the part of the log-likelihood depending on the covariance matrix $\Sigma$ is
\begin{align*}
-\frac{n}{2}\log\det(\Sigma)-\frac{n}{2}\operatorname{tr}(S\Sigma^{-1}).
\end{align*}
All remaining terms depend only on $n$ and $p$, so they cancel in a likelihood ratio. Thus we define the likelihood map
\begin{align*}
\ell_S:\mathcal{M}_{\mathrm{sat}} &\to \mathbb{R} \\
\Sigma &\mapsto -\frac{n}{2}\left\{\log\det(\Sigma)+\operatorname{tr}(S\Sigma^{-1})\right\}.
\end{align*}
The domain is the cone of positive definite symmetric matrices, because both $\log\det(\Sigma)$ and $\Sigma^{-1}$ require positive definiteness. The assumption that $S$ is positive definite will also allow the saturated maximum to occur at $\Sigma=S$.
[/guided]
[/step]
[step:Show that the saturated Gaussian covariance model is maximized at $S$]
Define the negative normalized objective
\begin{align*}
Q_S:\mathcal{M}_{\mathrm{sat}} &\to \mathbb{R} \\
\Sigma &\mapsto \log\det(\Sigma)+\operatorname{tr}(S\Sigma^{-1}).
\end{align*}
We show that $Q_S$ is minimized at $\Sigma=S$. For $\Sigma \in \mathcal{M}_{\mathrm{sat}}$, define
\begin{align*}
A := S^{-1/2}\Sigma S^{-1/2}.
\end{align*}
Then $A$ is positive definite. Let $\alpha_1,\dots,\alpha_p>0$ denote the eigenvalues of $A$. Using cyclic invariance of trace and multiplicativity of determinant,
\begin{align*}
Q_S(\Sigma)
&=
\log\det(S)+\log\det(A)+\operatorname{tr}(A^{-1}) \\
&=
\log\det(S)+\sum_{i=1}^{p}\left(\log \alpha_i+\frac{1}{\alpha_i}\right).
\end{align*}
For each $a>0$, the scalar function $h:(0,\infty)\to\mathbb{R}$ defined by
\begin{align*}
h(a):=\log a+\frac{1}{a}
\end{align*}
satisfies $h'(a)=(a-1)/a^2$, so $h$ is strictly decreasing on $(0,1)$ and strictly increasing on $(1,\infty)$. Hence $h(a)\geq h(1)=1$, with equality iff $a=1$. Therefore
\begin{align*}
Q_S(\Sigma)\geq \log\det(S)+p=Q_S(S),
\end{align*}
with equality iff $A=I_p$, equivalently $\Sigma=S$. Thus the saturated maximum likelihood covariance estimator is $S$.
[guided]
We need to justify that the unrestricted covariance model chooses $\Sigma=S$. Since maximizing $\ell_S$ is the same as minimizing
\begin{align*}
Q_S:\mathcal{M}_{\mathrm{sat}} &\to \mathbb{R} \\
\Sigma &\mapsto \log\det(\Sigma)+\operatorname{tr}(S\Sigma^{-1}),
\end{align*}
we prove that $Q_S$ has its unique minimum at $S$.
The comparison is easiest after normalizing by $S$. Because $S$ is positive definite, its positive definite square root $S^{1/2}$ and inverse square root $S^{-1/2}$ exist. For a candidate covariance matrix $\Sigma \in \mathcal{M}_{\mathrm{sat}}$, define
\begin{align*}
A := S^{-1/2}\Sigma S^{-1/2}.
\end{align*}
Then $A$ is positive definite, so it has positive eigenvalues $\alpha_1,\dots,\alpha_p$. The determinant term transforms as
\begin{align*}
\log\det(\Sigma)
=
\log\det(S^{1/2}AS^{1/2})
=
\log\det(S)+\log\det(A).
\end{align*}
For the trace term, cyclic invariance of trace gives
\begin{align*}
\operatorname{tr}(S\Sigma^{-1})
&=
\operatorname{tr}\left(S(S^{-1/2}A^{-1}S^{-1/2})\right) \\
&=
\operatorname{tr}\left(S^{1/2}A^{-1}S^{-1/2}\right) \\
&=
\operatorname{tr}(A^{-1}).
\end{align*}
Therefore
\begin{align*}
Q_S(\Sigma)
&=
\log\det(S)+\log\det(A)+\operatorname{tr}(A^{-1}) \\
&=
\log\det(S)+\sum_{i=1}^{p}\left(\log \alpha_i+\frac{1}{\alpha_i}\right).
\end{align*}
It remains to minimize the scalar expression $\log a+1/a$ over $a>0$. Define
\begin{align*}
h:(0,\infty)&\to\mathbb{R} \\
a&\mapsto \log a+\frac{1}{a}.
\end{align*}
Then
\begin{align*}
h'(a)=\frac{1}{a}-\frac{1}{a^2}=\frac{a-1}{a^2}.
\end{align*}
Thus $h$ decreases on $(0,1)$ and increases on $(1,\infty)$, so its unique minimum occurs at $a=1$, where $h(1)=1$. Applying this to each eigenvalue $\alpha_i$ gives
\begin{align*}
Q_S(\Sigma)\geq \log\det(S)+p.
\end{align*}
Equality holds precisely when every $\alpha_i=1$, which means $A=I_p$ and hence $\Sigma=S$. Thus the saturated model is maximized at $\Sigma=S$.
[/guided]
[/step]
[step:Evaluate the constrained and saturated maximized likelihoods]
By hypothesis, the constrained maximum likelihood estimator exists in $\mathcal{M}_k$ and is represented by $(\hat{\Lambda},\hat{\Psi})$, so the constrained maximizer is
\begin{align*}
\hat{\Sigma}_k=\hat{\Lambda}\hat{\Lambda}^\top+\hat{\Psi}.
\end{align*}
Hence
\begin{align*}
\sup_{\Sigma\in\mathcal{M}_k}\ell_S(\Sigma)
&=
-\frac{n}{2}\left\{\log\det(\hat{\Sigma}_k)+\operatorname{tr}(S\hat{\Sigma}_k^{-1})\right\},
\end{align*}
while the previous step gives
\begin{align*}
\sup_{\Sigma\in\mathcal{M}_{\mathrm{sat}}}\ell_S(\Sigma)
&=
-\frac{n}{2}\left\{\log\det(S)+p\right\}.
\end{align*}
The likelihood ratio statistic is
\begin{align*}
G_k^2
&:=
-2\left(
\sup_{\Sigma\in\mathcal{M}_k}\ell_S(\Sigma)
-
\sup_{\Sigma\in\mathcal{M}_{\mathrm{sat}}}\ell_S(\Sigma)
\right) \\
&=
n\left\{
\log\det(\hat{\Sigma}_k)
+
\operatorname{tr}(S\hat{\Sigma}_k^{-1})
-
\log\det(S)
-
p
\right\}.
\end{align*}
This is the displayed formula.
[/step]
[step:Count the codimension under the local identifiability hypothesis]
The saturated covariance model has
\begin{align*}
\frac{p(p+1)}{2}
\end{align*}
free covariance parameters, one for each entry on and above the diagonal of a symmetric $p\times p$ matrix.
Define the positive diagonal cone
\begin{align*}
\mathcal{D}_p^+ := \{\Psi\in\mathbb{R}^{p\times p}: \Psi \text{ is diagonal and } \Psi_{ii}>0 \text{ for } 1\leq i\leq p\}
\end{align*}
and define the factor covariance parametrization
\begin{align*}
\Phi_k: \mathbb{R}^{p\times k}\times \mathcal{D}_p^+ &\to \mathcal{M}_{\mathrm{sat}} \\
(\Lambda,\Psi)&\mapsto \Lambda\Lambda^\top+\Psi.
\end{align*}
The loading matrix $\Lambda\in\mathbb{R}^{p\times k}$ contributes $pk$ raw parameters and $\Psi\in\mathcal{D}_p^+$ contributes $p$ raw parameters. For every orthogonal matrix $R\in O(k)$,
\begin{align*}
\Phi_k(\Lambda R,\Psi)
=
(\Lambda R)(\Lambda R)^\top+\Psi
=
\Lambda RR^\top\Lambda^\top+\Psi
=
\Lambda\Lambda^\top+\Psi,
\end{align*}
so the $O(k)$ rotation action gives a local non-identifiability of dimension $k(k-1)/2$.
The regularity assumption in the theorem is precisely that, in a neighbourhood of the true parameter, this rotational orbit is the full local fibre of $\Phi_k$ and the induced quotient map has constant rank. Equivalently, the image of $\Phi_k$ is locally a smooth embedded submanifold of $\mathcal{M}_{\mathrm{sat}}$ of dimension
\begin{align*}
d_k
=
pk+p-\frac{k(k-1)}{2},
\end{align*}
with $d_k\leq p(p+1)/2$. Under this hypothesis, the number of covariance restrictions imposed by $\mathcal{M}_k$ is its codimension in the saturated covariance cone:
\begin{align*}
\nu_k
=
\frac{p(p+1)}{2}
-
d_k
=
\frac{p(p+1)}{2}
-
\left(pk+p-\frac{k(k-1)}{2}\right).
\end{align*}
If the quotient dimension formula exceeds the saturated dimension, or if the fibre has additional local non-identifiability beyond rotations, this constant-rank hypothesis fails and the displayed degree-of-freedom formula is not asserted.
[guided]
The asymptotic chi-squared degrees of freedom are the codimension of the constrained covariance model inside the saturated covariance model, but this codimension can only be computed by parameter counting under a genuine local identifiability hypothesis.
First, a symmetric $p\times p$ covariance matrix has $p$ diagonal entries and $p(p-1)/2$ entries above the diagonal. Thus the saturated covariance model has
\begin{align*}
p+\frac{p(p-1)}{2}=\frac{p(p+1)}{2}
\end{align*}
free covariance parameters.
Now define the positive diagonal cone
\begin{align*}
\mathcal{D}_p^+ := \{\Psi\in\mathbb{R}^{p\times p}: \Psi \text{ is diagonal and } \Psi_{ii}>0 \text{ for } 1\leq i\leq p\}
\end{align*}
and define the parametrization of the $k$-factor covariance model by
\begin{align*}
\Phi_k: \mathbb{R}^{p\times k}\times \mathcal{D}_p^+ &\to \mathcal{M}_{\mathrm{sat}} \\
(\Lambda,\Psi)&\mapsto \Lambda\Lambda^\top+\Psi.
\end{align*}
The loading matrix $\Lambda\in\mathbb{R}^{p\times k}$ has $pk$ real entries, and the diagonal uniqueness matrix $\Psi\in\mathcal{D}_p^+$ has $p$ positive diagonal entries, giving $pk+p$ raw parameters.
These raw parameters are not all identifiable. If $R\in O(k)$ is an orthogonal matrix, then
\begin{align*}
\Phi_k(\Lambda R,\Psi)
&=(\Lambda R)(\Lambda R)^\top+\Psi \\
&=\Lambda RR^\top\Lambda^\top+\Psi \\
&=\Lambda\Lambda^\top+\Psi.
\end{align*}
Thus rotations of the latent factor coordinates do not change the covariance matrix. The orthogonal group $O(k)$ has local dimension $k(k-1)/2$, the number of independent skew-symmetric directions in a $k\times k$ matrix.
The important regularity point is that this subtraction is not automatic. The theorem assumes that, near the true parameter, the rotational orbit is the entire local fibre of $\Phi_k$ and that the induced map from the quotient parameter space into $\mathcal{M}_{\mathrm{sat}}$ has constant rank. Under exactly this hypothesis, the local dimension of the factor covariance model is
\begin{align*}
d_k
=
pk+p-\frac{k(k-1)}{2}.
\end{align*}
This hypothesis also forces $d_k\leq p(p+1)/2$, since a smooth embedded submanifold of the saturated covariance cone cannot have dimension larger than the ambient cone.
Therefore the number of independent covariance restrictions imposed by the $k$-factor structure is the codimension
\begin{align*}
\nu_k
=
\frac{p(p+1)}{2}
-
d_k
=
\frac{p(p+1)}{2}
-
\left(pk+p-\frac{k(k-1)}{2}\right).
\end{align*}
If $k$ is so large that the raw quotient dimension exceeds the saturated covariance dimension, or if additional local non-identifiability is present, then this parameter count is invalid and the regular Wilks reference law with this $\nu_k$ is not being claimed.
[/guided]
[/step]
[step:Apply the regular likelihood-ratio asymptotic theorem]
The hypotheses now give the concrete local conditions needed for the regular likelihood-ratio theorem: $\mathcal{M}_k\subseteq\mathcal{M}_{\mathrm{sat}}$ is locally a smooth embedded submanifold near the true covariance, its codimension is $\nu_k$, the true covariance is an interior point of this local submanifold, the quotient parametrization has full rank after removing the $O(k)$ rotational directions, and the uniqueness parameters remain positive. Therefore Wilks' likelihood-ratio theorem applies to this nested pair of regular covariance models. Its conclusion gives
\begin{align*}
G_k^2 \xrightarrow{d} \chi^2_{\nu_k},
\end{align*}
where $\nu_k$ is the codimension computed in the previous step. Boundary cases such as zero uniquenesses, rotational singularities, additional non-rotational local non-identifiability, dimension saturation for large $k$, or non-unique constrained maximizers violate these regularity hypotheses, so they are excluded from the stated chi-squared approximation. This completes the proof.
[/step]