Wilks' Lambda for Testing Zero Cross-Covariance — Statement & Proof

Wilks' Lambda for Testing Zero Cross-Covariance (Theorem # 4045)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] The unrestricted Gaussian likelihood is maximized at the full sample covariance matrix $S$, while the likelihood under $H_0$ is maximized at the block-diagonal covariance matrix with diagonal blocks $S_{11}$ and $S_{22}$. Thus Wilks' lambda reduces to a determinant ratio. The Schur complement rewrites this determinant ratio as the determinant of an identity minus a whitened cross-covariance product. The eigenvalues of that whitened product are precisely the squared sample canonical correlations, so the determinant factors into the desired product. [/proofplan] [step:Compute the unrestricted and constrained covariance maximizers] Let $d=p+q$, and let $(\Omega,\mathcal F,\mathbb P)$ denote the underlying probability space. Let $Y_1,\dots,Y_n: (\Omega,\mathcal F) \to (\mathbb{R}^{d},\mathcal B(\mathbb{R}^{d}))$ denote the observed independent Gaussian random vectors, and define the sample mean $\bar{Y}\in\mathbb{R}^{d}$ and sample covariance matrix $S\in\mathbb{R}^{d\times d}$ by \begin{align*} \bar{Y} &= \frac{1}{n}\sum_{i=1}^{n}Y_i,\qquad S = \frac{1}{n}\sum_{i=1}^{n}(Y_i-\bar{Y})(Y_i-\bar{Y})^\top. \end{align*} Partition $S$ conformally with $\mathbb{R}^{d}=\mathbb{R}^{p}\times\mathbb{R}^{q}$ by \begin{align*} S= \begin{pmatrix} S_{11} & S_{12}\\ S_{21} & S_{22} \end{pmatrix}, \end{align*} where $S_{11}\in\mathbb{R}^{p\times p}$, $S_{12}\in\mathbb{R}^{p\times q}$, $S_{21}\in\mathbb{R}^{q\times p}$, and $S_{22}\in\mathbb{R}^{q\times q}$. Let $\mathcal{P}_d$ denote the set of positive definite symmetric matrices in $\mathbb{R}^{d\times d}$. The centered Gaussian log-likelihood, up to terms independent of $\Sigma$, is the map \begin{align*} \ell: \mathcal{P}_d &\to \mathbb{R}\\ \Sigma &\mapsto -\frac{n}{2}\log\det \Sigma -\frac{n}{2}\operatorname{tr}(\Sigma^{-1}S). \end{align*} For $\Sigma\in\mathcal{P}_d$, define $M=\Sigma^{-1/2}S\Sigma^{-1/2}\in\mathbb{R}^{d\times d}$. Since $n>d$, $S\in\mathcal{P}_d$ almost surely. If $\lambda_1,\dots,\lambda_d>0$ are the eigenvalues of $M$, then \begin{align*} \ell(\Sigma)-\ell(S) &= -\frac{n}{2}\left(\log\det M^{-1}+\operatorname{tr}M-d\right)\\ &= -\frac{n}{2}\sum_{j=1}^{d}\left(\lambda_j-1-\log\lambda_j\right) \leq 0, \end{align*} where the last inequality uses $\log t\leq t-1$ for every $t>0$. Equality holds exactly when $\lambda_j=1$ for every $j$, equivalently $M=I_d$, so the unrestricted maximum is attained at $\hat{\Sigma}=S$. Under $H_0$, the covariance matrix has the block-diagonal form \begin{align*} \Sigma_0= \begin{pmatrix} A & 0\\ 0 & B \end{pmatrix}, \end{align*} where $A\in\mathbb{R}^{p\times p}$ and $B\in\mathbb{R}^{q\times q}$ are positive definite. Substituting this form gives \begin{align*} \ell(\Sigma_0) = -\frac{n}{2}\log\det A -\frac{n}{2}\log\det B -\frac{n}{2}\operatorname{tr}(A^{-1}S_{11}) -\frac{n}{2}\operatorname{tr}(B^{-1}S_{22}). \end{align*} The two blocks separate. Applying the preceding maximization argument in dimensions $p$ and $q$, respectively, to the positive definite blocks $S_{11}$ and $S_{22}$ shows that the constrained maximum is attained at \begin{align*} \hat{\Sigma}_0= \begin{pmatrix} S_{11} & 0\\ 0 & S_{22} \end{pmatrix}. \end{align*} Therefore the determinant form of Wilks' lambda is \begin{align*} \Lambda = \frac{\det S}{\det S_{11}\det S_{22}}. \end{align*} [guided] The Gaussian likelihood depends on $\Sigma$ through two terms: a volume term $\log\det\Sigma$ and a quadratic-fit term involving $\Sigma^{-1}S$. Let $d=p+q$, and let $(\Omega,\mathcal F,\mathbb P)$ be the underlying probability space. Let $Y_1,\dots,Y_n: (\Omega,\mathcal F) \to (\mathbb{R}^{d},\mathcal B(\mathbb{R}^{d}))$ be the observed independent Gaussian random vectors. Define \begin{align*} \bar{Y} &= \frac{1}{n}\sum_{i=1}^{n}Y_i,\qquad S = \frac{1}{n}\sum_{i=1}^{n}(Y_i-\bar{Y})(Y_i-\bar{Y})^\top. \end{align*} We partition $S$ according to the decomposition $\mathbb{R}^{d}=\mathbb{R}^{p}\times\mathbb{R}^{q}$: \begin{align*} S= \begin{pmatrix} S_{11} & S_{12}\\ S_{21} & S_{22} \end{pmatrix}, \end{align*} where $S_{11}\in\mathbb{R}^{p\times p}$, $S_{12}\in\mathbb{R}^{p\times q}$, $S_{21}\in\mathbb{R}^{q\times p}$, and $S_{22}\in\mathbb{R}^{q\times q}$. Let $\mathcal{P}_d$ be the set of positive definite symmetric matrices in $\mathbb{R}^{d\times d}$. After maximizing over the mean, the relevant log-likelihood is the map \begin{align*} \ell: \mathcal{P}_d &\to \mathbb{R}\\ \Sigma &\mapsto -\frac{n}{2}\log\det \Sigma -\frac{n}{2}\operatorname{tr}(\Sigma^{-1}S). \end{align*} Why is the maximizer $S$? For $\Sigma\in\mathcal{P}_d$, set $M=\Sigma^{-1/2}S\Sigma^{-1/2}\in\mathbb{R}^{d\times d}$. Since $n>d$, the centered sample covariance matrix $S$ is positive definite almost surely. If $\lambda_1,\dots,\lambda_d>0$ are the eigenvalues of $M$, then \begin{align*} \ell(\Sigma)-\ell(S) &= -\frac{n}{2}\left(\log\det M^{-1}+\operatorname{tr}M-d\right)\\ &= -\frac{n}{2}\sum_{j=1}^{d}\left(\lambda_j-1-\log\lambda_j\right). \end{align*} The scalar inequality $\log t\leq t-1$ for $t>0$ gives $\lambda_j-1-\log\lambda_j\geq 0$ for each $j$, so $\ell(\Sigma)\leq \ell(S)$. Equality holds precisely when every $\lambda_j=1$, which is equivalent to $M=I_d$ and hence $\Sigma=S$. Therefore the unrestricted maximizer is $\hat{\Sigma}=S$. Under the null hypothesis $\Sigma_{12}=0$, the covariance matrix must be block diagonal: \begin{align*} \Sigma_0= \begin{pmatrix} A & 0\\ 0 & B \end{pmatrix}, \end{align*} with $A$ and $B$ positive definite. Its inverse and determinant are \begin{align*} \Sigma_0^{-1} = \begin{pmatrix} A^{-1} & 0\\ 0 & B^{-1} \end{pmatrix}, \qquad \det\Sigma_0=(\det A)(\det B). \end{align*} Substituting into the log-likelihood gives \begin{align*} \ell(\Sigma_0) = -\frac{n}{2}\log\det A -\frac{n}{2}\log\det B -\frac{n}{2}\operatorname{tr}(A^{-1}S_{11}) -\frac{n}{2}\operatorname{tr}(B^{-1}S_{22}). \end{align*} The $A$-terms and $B$-terms are independent, so the constrained likelihood maximizes each block separately. The same eigenvalue argument just used applies to the $p\times p$ block with empirical covariance $S_{11}$ and to the $q\times q$ block with empirical covariance $S_{22}$. Since $S$ is positive definite almost surely, both principal blocks $S_{11}$ and $S_{22}$ are positive definite almost surely, so the argument applies without degeneracy. Hence the constrained covariance estimator is \begin{align*} \hat{\Sigma}_0= \begin{pmatrix} S_{11} & 0\\ 0 & S_{22} \end{pmatrix}. \end{align*} At both maximizers, the trace term contributes the same dimension-dependent constant. Thus the maximized likelihood ratio is \begin{align*} \left(\frac{\det S}{\det S_{11}\det S_{22}}\right)^{n/2}. \end{align*} The conventional Wilks' lambda statistic is the determinant ratio inside this power, namely \begin{align*} \Lambda = \frac{\det S}{\det S_{11}\det S_{22}}. \end{align*} [/guided] [/step] [step:Rewrite the determinant ratio using the Schur complement] Since $n>d$, the sample covariance matrix $S$ is positive definite almost surely, and therefore $S_{11}$ and $S_{22}$ are positive definite almost surely. We use the Schur complement determinant identity for the invertible block $S_{22}$. To verify it here, multiply by block triangular matrices of determinant $1$: \begin{align*} \begin{pmatrix} I_p & -S_{12}S_{22}^{-1}\\ 0 & I_q \end{pmatrix} S \begin{pmatrix} I_p & 0\\ -S_{22}^{-1}S_{21} & I_q \end{pmatrix} = \begin{pmatrix} S_{11}-S_{12}S_{22}^{-1}S_{21} & 0\\ 0 & S_{22} \end{pmatrix}. \end{align*} Taking determinants gives \begin{align*} \det S = \det S_{22}\det(S_{11}-S_{12}S_{22}^{-1}S_{21}). \end{align*} Dividing by $\det S_{11}\det S_{22}$ gives \begin{align*} \Lambda &= \frac{\det(S_{11}-S_{12}S_{22}^{-1}S_{21})}{\det S_{11}}\\ &= \det\left(I_p-S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2}\right). \end{align*} Define \begin{align*} \hat{C}: \mathbb{R}^{q} &\to \mathbb{R}^{p}\\ v &\mapsto S_{11}^{-1/2}S_{12}S_{22}^{-1/2}v. \end{align*} Then \begin{align*} \hat{C}\hat{C}^{\top} = S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2}, \end{align*} so \begin{align*} \Lambda=\det(I_p-\hat{C}\hat{C}^{\top}). \end{align*} [guided] The determinant ratio becomes useful only after isolating the cross-covariance block $S_{12}$. Because $n>d$, the centered Gaussian sample covariance matrix is positive definite almost surely, so its principal blocks $S_{11}$ and $S_{22}$ are also positive definite almost surely. This lets us invert $S_{22}$ and take the positive definite square roots $S_{11}^{1/2}$ and $S_{22}^{1/2}$. Apply the Schur complement determinant identity to the lower-right block $S_{22}$. The hypothesis needed for this identity is invertibility of $S_{22}$, which holds because $S_{22}$ is positive definite. In this concrete block matrix, the identity follows from the determinant-preserving block elimination \begin{align*} \begin{pmatrix} I_p & -S_{12}S_{22}^{-1}\\ 0 & I_q \end{pmatrix} S \begin{pmatrix} I_p & 0\\ -S_{22}^{-1}S_{21} & I_q \end{pmatrix} = \begin{pmatrix} S_{11}-S_{12}S_{22}^{-1}S_{21} & 0\\ 0 & S_{22} \end{pmatrix}. \end{align*} Both outside matrices are block triangular with identity diagonal blocks, so each has determinant $1$. Taking determinants therefore gives \begin{align*} \det S = \det S_{22}\det(S_{11}-S_{12}S_{22}^{-1}S_{21}). \end{align*} Substituting this into the determinant ratio gives \begin{align*} \Lambda = \frac{\det(S_{11}-S_{12}S_{22}^{-1}S_{21})}{\det S_{11}}. \end{align*} Now factor $S_{11}^{1/2}$ from both sides of the matrix inside the determinant: \begin{align*} S_{11}-S_{12}S_{22}^{-1}S_{21} = S_{11}^{1/2} \left(I_p-S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2}\right) S_{11}^{1/2}. \end{align*} Taking determinants yields \begin{align*} \det(S_{11}-S_{12}S_{22}^{-1}S_{21}) = \det S_{11}\, \det\left(I_p-S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2}\right). \end{align*} After cancellation of $\det S_{11}$, we obtain \begin{align*} \Lambda = \det\left(I_p-S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2}\right). \end{align*} Finally define the whitened sample cross-covariance map \begin{align*} \hat{C}: \mathbb{R}^{q} &\to \mathbb{R}^{p}\\ v &\mapsto S_{11}^{-1/2}S_{12}S_{22}^{-1/2}v. \end{align*} Then \begin{align*} \hat{C}\hat{C}^{\top} = S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2}, \end{align*} and hence \begin{align*} \Lambda=\det(I_p-\hat{C}\hat{C}^{\top}). \end{align*} [/guided] [/step] [step:Identify the eigenvalues with squared sample canonical correlations] By definition, the sample canonical correlations $\hat{\rho}_1,\dots,\hat{\rho}_s$ are the singular values of the [linear map](/page/Linear%20Map) $\hat{C}:\mathbb{R}^{q}\to\mathbb{R}^{p}$. Therefore the nonzero eigenvalues of the positive semidefinite matrix $\hat{C}\hat{C}^{\top}$ are \begin{align*} \hat{\rho}_1^2,\dots,\hat{\rho}_s^2, \end{align*} with additional zero eigenvalues if $p>s$. Hence \begin{align*} \det(I_p-\hat{C}\hat{C}^{\top}) = \prod_{k=1}^{s}(1-\hat{\rho}_k^2). \end{align*} Combining this identity with the determinant formula for $\Lambda$ gives \begin{align*} \Lambda = \prod_{k=1}^{s}(1-\hat{\rho}_k^2), \end{align*} which is the desired expression for Wilks' lambda. [/step]

Prerequisites (0/2 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

test

Definitions & Concepts

Determinant

Explore Further

Determinant Definition test Theorem #89 Nonnegative Definiteness of the Spectral Density Matrix probability Covariance Stationarity Criterion for the GARCH(1,1) Process probability Independence of the Sample Mean and Sample Covariance for a Multivariate Normal Sample probability Conditional Expectation as the Mean Square Forecast probability Wishart Additivity Theorem probability Rank Bound for the Sample Covariance Matrix probability Baik-Ben Arous-Peche Phase Transition for Spiked Covariance Matrices probability Expectation of an Indicator Random Variable probability

What brings you to Androma?

Start with a route through the knowledge graph.

Wilks' Lambda for Testing Zero Cross-Covariance (Theorem # 4045)

Discussion

Proof

Prerequisites (0/2 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Wilks' Lambda for Testing Zero Cross-Covariance (Theorem # 4045)

Discussion

Proof

Prerequisites (0/2 completed)

Prerequisites Graph

Explore Further