Bartlett Chi-Squared Approximation for Wilks' Lambda Statistic (Theorem # 4046)

Theorem

Proof

[proofplan] We identify the ordinary Gaussian likelihood ratio as $\Lambda_n^{n/2}$, where $\Lambda_n$ is Wilks' determinant lambda computed from the blocks of the sample covariance matrix defined below. The unrestricted covariance model has $pq$ more free parameters than the null model, so Wilks' likelihood-ratio theorem gives the limiting $\chi^2_{pq}$ law for $-n\log\Lambda_n$. The statistic is well defined on the event where the sample covariance blocks are positive definite, and this event has probability one once $n\ge p+q+1$. Multiplying first by $(n-1)/n$ and then by the Bartlett factor $\bigl(n-1-(p+q+1)/2\bigr)/(n-1)$ changes the statistic only by deterministic factors tending to $1$, so Slutsky's theorem preserves the same first-order limiting distribution. [/proofplan] [step:Identify Wilks' lambda as the Gaussian likelihood-ratio statistic] For observations $x_1,\dots,x_n \in \mathbb{R}^{p+q}$, define the sample mean $\bar x_n\in\mathbb{R}^{p+q}$ and the unbiased sample covariance matrix $S_n\in\mathbb{R}^{(p+q)\times(p+q)}$ by \begin{align*} \bar x_n &:= \frac{1}{n}\sum_{i=1}^n x_i, \\ S_n &:= \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar x_n)(x_i-\bar x_n)^\top. \end{align*} Write the block decomposition of $S_n$ according to the split $\mathbb{R}^{p+q}=\mathbb{R}^p\times\mathbb{R}^q$ as \begin{align*} S_n= \begin{pmatrix} S_{11,n} & S_{12,n} \\ S_{21,n} & S_{22,n} \end{pmatrix}, \end{align*} where $S_{11,n}\in\mathbb{R}^{p\times p}$ and $S_{22,n}\in\mathbb{R}^{q\times q}$ are the diagonal sample covariance blocks. Define the Gaussian log-likelihood function \begin{align*} \ell_n:\mathbb{R}^{p+q}\times \mathcal{S}_{++}^{p+q} &\to \mathbb{R} \\ (m,A) &\mapsto -\frac{n(p+q)}{2}\log(2\pi) -\frac{n}{2}\log\det A -\frac{1}{2}\sum_{i=1}^n (x_i-m)^\top A^{-1}(x_i-m), \end{align*} where $\mathcal{S}_{++}^{p+q}$ denotes the set of positive definite symmetric $(p+q)\times(p+q)$ real matrices. 
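The definitions above can be sketched in NumPy as follows. This is a minimal illustration, not part of the source: the helper names `sample_moments` and `gaussian_loglik` are hypothetical, rows of the array `x` are assumed to be the observations $x_1,\dots,x_n$, and the first `p` columns are assumed to form the first block. The usage lines check that at $(\bar x_n, \frac{n-1}{n}S_n)$ the quadratic term of $\ell_n$ collapses to $n(p+q)/2$.

```python
import numpy as np


def sample_moments(x, p):
    """Return xbar_n, the unbiased S_n, and its diagonal blocks S_11,n and S_22,n.

    Rows of x are the observations; the first p columns form the first block
    and the remaining q columns form the second block.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    xbar = x.mean(axis=0)
    centered = x - xbar
    s = centered.T @ centered / (n - 1)  # divide by n - 1: unbiased S_n
    return xbar, s, s[:p, :p], s[p:, p:]


def gaussian_loglik(x, m, a):
    """Gaussian log-likelihood ell_n(m, A) of the rows of x."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    sign, logdet = np.linalg.slogdet(a)
    # slogdet only checks the sign of det A, a necessary condition for A to be positive definite
    if sign <= 0:
        raise ValueError("det A must be positive")
    resid = x - m
    # sum_i (x_i - m)^T A^{-1} (x_i - m), using a linear solve instead of an explicit inverse
    quad = np.sum(resid * np.linalg.solve(a, resid.T).T)
    return -0.5 * n * d * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad


# Usage with simulated data (sizes and seed are arbitrary choices).
rng = np.random.default_rng(0)
p, q, n = 2, 3, 50
x = rng.standard_normal((n, p + q))
xbar, s, s11, s22 = sample_moments(x, p)
sigma_hat = (n - 1) / n * s  # the unrestricted covariance MLE
# At (xbar, sigma_hat) the quadratic term equals n(p+q)/2, so ell_n has a closed form.
closed_form = (-0.5 * n * (p + q) * (np.log(2 * np.pi) + 1)
               - 0.5 * n * np.linalg.slogdet(sigma_hat)[1])
print(np.isclose(gaussian_loglik(x, xbar, sigma_hat), closed_form))  # True
```

Solving with `np.linalg.solve` rather than forming $A^{-1}$ is the usual numerically safer choice; `np.linalg.slogdet` avoids overflow in $\det A$ for larger dimensions.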
With this notation, the unrestricted maximum likelihood estimator of the mean is $\bar x_n$, and the unrestricted maximum likelihood estimator of the covariance matrix is \begin{align*} \widehat\Sigma_n := \frac{1}{n}\sum_{i=1}^n (x_i-\bar x_n)(x_i-\bar x_n)^\top = \frac{n-1}{n}S_n. \end{align*} Under the null model $\Sigma_{12}=0$, the covariance matrix is block diagonal; the maximum likelihood estimator of the mean is again $\bar x_n$, and the restricted maximum likelihood estimator of the covariance matrix is \begin{align*} \widehat\Sigma_{0,n} := \begin{pmatrix} \frac{n-1}{n}S_{11,n} & 0 \\ 0 & \frac{n-1}{n}S_{22,n} \end{pmatrix}. \end{align*} At each maximizer the quadratic term of $\ell_n$ equals $-n(p+q)/2$, because $\sum_{i=1}^n (x_i-\bar x_n)^\top A^{-1}(x_i-\bar x_n) = n\operatorname{tr}(A^{-1}\widehat\Sigma_n)$ and $\operatorname{tr}(\widehat\Sigma_n^{-1}\widehat\Sigma_n)=\operatorname{tr}(\widehat\Sigma_{0,n}^{-1}\widehat\Sigma_n)=p+q$. Substituting these maximizers into the likelihood therefore gives \begin{align*} \frac{\sup\{\exp(\ell_n(m,A)):\ A_{12}=0\}} {\sup\{\exp(\ell_n(m,A)):\ A\in \mathcal{S}_{++}^{p+q}\}} &= \left(\frac{\det \widehat\Sigma_n}{\det \widehat\Sigma_{0,n}}\right)^{n/2} = \Lambda_n^{n/2}, \end{align*} where the factors $\bigl((n-1)/n\bigr)^{p+q}$ cancel in the determinant ratio and Wilks' lambda is \begin{align*} \Lambda_n := \frac{\det S_n}{\det S_{11,n}\det S_{22,n}}. \end{align*} Thus the statistic in the statement is Bartlett's corrected form of Wilks' likelihood-ratio statistic for the Gaussian independence hypothesis. [/step] [step:Compute the number of restrictions imposed by the null model] The unrestricted covariance parameter space $\mathcal{S}_{++}^{p+q}$ is an open subset of the real [vector space](/page/Vector%20Space) of symmetric $(p+q)\times(p+q)$ matrices, so its dimension is \begin{align*} \frac{(p+q)(p+q+1)}{2}. \end{align*} Under $H_0$, the covariance matrix is block diagonal with one positive definite $p\times p$ block and one positive definite $q\times q$ block. Hence the null covariance parameter space has dimension \begin{align*} \frac{p(p+1)}{2}+\frac{q(q+1)}{2}. \end{align*} The difference between these dimensions is \begin{align*} \frac{(p+q)(p+q+1)}{2} - \frac{p(p+1)}{2} - \frac{q(q+1)}{2} = pq. \end{align*} Therefore the null hypothesis imposes exactly $pq$ independent smooth restrictions on the unrestricted covariance model, namely the vanishing of the $pq$ entries of $\Sigma_{12}$. 
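A small numerical check of the first two steps, offered as an illustrative sketch rather than anything from the source: it computes $\Lambda_n$ from determinants of the unbiased blocks, evaluates the maximized log-likelihoods at $\widehat\Sigma_n$ and $\widehat\Sigma_{0,n}$, and confirms that their difference equals $\frac{n}{2}\log\Lambda_n$. It also checks the dimension count $pq$. The names `wilks_lambda`, `max_loglik`, and `num_restrictions` are hypothetical, and the data are simulated.

```python
import numpy as np


def wilks_lambda(x, p):
    """Wilks' lambda det(S_n) / (det(S_11,n) det(S_22,n)) for the split p + q."""
    s = np.cov(x, rowvar=False)  # unbiased S_n (divides by n - 1)
    return np.linalg.det(s) / (np.linalg.det(s[:p, :p]) * np.linalg.det(s[p:, p:]))


def max_loglik(x, sigma):
    """Gaussian log-likelihood at mean xbar_n and covariance sigma."""
    n, d = x.shape
    resid = x - x.mean(axis=0)
    quad = np.sum(resid * np.linalg.solve(sigma, resid.T).T)
    return (-0.5 * n * d * np.log(2 * np.pi)
            - 0.5 * n * np.linalg.slogdet(sigma)[1] - 0.5 * quad)


def num_restrictions(p, q):
    """Dimension of the full covariance model minus that of the block-diagonal null."""
    d = p + q
    return d * (d + 1) // 2 - p * (p + 1) // 2 - q * (q + 1) // 2


rng = np.random.default_rng(1)
p, q, n = 2, 3, 40
x = rng.standard_normal((n, p + q))
sigma_hat = np.cov(x, rowvar=False, bias=True)  # (n-1)/n * S_n, the unrestricted MLE
sigma0_hat = np.zeros_like(sigma_hat)           # block-diagonal restricted MLE
sigma0_hat[:p, :p] = sigma_hat[:p, :p]
sigma0_hat[p:, p:] = sigma_hat[p:, p:]

lam = wilks_lambda(x, p)
log_r = max_loglik(x, sigma0_hat) - max_loglik(x, sigma_hat)  # log of the likelihood ratio
print(np.isclose(log_r, 0.5 * n * np.log(lam)))  # True: ratio = Lambda_n^{n/2}
print(num_restrictions(p, q), p * q)  # 6 6
```

Note that `wilks_lambda` uses the unbiased $S_n$ while the likelihoods use the biased MLEs; the check passing reflects the cancellation of the $\bigl((n-1)/n\bigr)^{p+q}$ factors.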
[/step] [step:Apply Wilks' theorem to obtain the uncorrected chi-squared limit] Let the full parameter space be \begin{align*} \Theta := \mathbb{R}^{p+q}\times \mathcal{S}_{++}^{p+q}, \end{align*} and let the null parameter space be \begin{align*} \Theta_0 := \{(m,A)\in \Theta : A_{12}=0\}. \end{align*} The mean parameter $m\in\mathbb{R}^{p+q}$ is unrestricted in both models, so it is a nuisance parameter common to the full and null parameter spaces. The multivariate normal model is a regular finite-dimensional parametric model: the true parameter $(m_0,A_0)$ lies in the interior of $\Theta$ because $A_0\in\mathcal{S}_{++}^{p+q}$; under $H_0$ it lies in the relative interior of $\Theta_0$, because $A_{0,11}$ and $A_{0,22}$ are positive definite as principal submatrices of $A_0$; the log-likelihood is twice continuously differentiable in a neighbourhood of $(m_0,A_0)$; and the Fisher information for the Gaussian mean-covariance parameter is nonsingular. The null model is a smooth embedded submodel of codimension $pq$, as computed above. Let $E_n$ be the event that $S_n$, $S_{11,n}$, and $S_{22,n}$ are positive definite. Under the nonsingular multivariate normal law, the centered observations span $\mathbb{R}^{p+q}$ with probability $1$ whenever $n\ge p+q+1$, and their first $p$ and last $q$ coordinate projections span $\mathbb{R}^p$ and $\mathbb{R}^q$ with probability $1$ whenever $n\ge \max\{p+1,q+1\}$. Since $n\ge p+q+1$ implies $n\ge\max\{p+1,q+1\}$, we have $\mathbb{P}(E_n)=1$ for every $n\ge p+q+1$, so the determinant ratio defining $\Lambda_n$ is almost surely well defined for all such $n$. Define the ordinary likelihood-ratio statistic \begin{align*} R_n := \frac{\sup\{\exp(\ell_n(m,A)):\ (m,A)\in\Theta_0\}} {\sup\{\exp(\ell_n(m,A)):\ (m,A)\in\Theta\}}. \end{align*} By Wilks' likelihood-ratio theorem (citing a result not yet in the wiki: [Wilks' theorem](/theorems/1431)), applied to the regular model $\Theta$ and the embedded null submodel $\Theta_0$ of codimension $pq$, under $H_0$, \begin{align*} -2\log R_n \xrightarrow{d} \chi^2_{pq}. 
\end{align*} Using the maximized Gaussian likelihoods computed above, the normalizing constants cancel, and so do the quadratic terms, each equal to $-n(p+q)/2$ at its maximizer, giving \begin{align*} R_n &= \left(\frac{\det \widehat\Sigma_n}{\det \widehat\Sigma_{0,n}}\right)^{n/2} = \Lambda_n^{n/2}. \end{align*} Therefore \begin{align*} -2\log R_n = -n\log\Lambda_n, \end{align*} so the limit given by [Wilks' theorem](/theorems/1864) becomes \begin{align*} -n\log\Lambda_n \xrightarrow{d} \chi^2_{pq}. \end{align*} Since $(n-1)/n\to 1$, Slutsky's theorem yields \begin{align*} -(n-1)\log\Lambda_n = \frac{n-1}{n}\bigl(-n\log\Lambda_n\bigr) \xrightarrow{d} \chi^2_{pq}. \end{align*} [/step] [step:Insert Bartlett's correction without changing the limiting law] Define the deterministic correction factor \begin{align*} a_n := \frac{n-1-\frac{p+q+1}{2}}{n-1}. \end{align*} Since $p$ and $q$ are fixed, \begin{align*} \lim_{n\to\infty} a_n = 1. \end{align*} The Bartlett-corrected statistic can be written as \begin{align*} -\left(n-1-\frac{p+q+1}{2}\right)\log\Lambda_n = a_n\bigl(-(n-1)\log\Lambda_n\bigr). \end{align*} Since $a_n \to 1$ and $-(n-1)\log\Lambda_n \xrightarrow{d}\chi^2_{pq}$, Slutsky's theorem gives \begin{align*} -\left(n-1-\frac{p+q+1}{2}\right)\log\Lambda_n \xrightarrow{d} \chi^2_{pq}. \end{align*} This is precisely the asserted first-order large-sample chi-squared approximation with $pq$ degrees of freedom. The argument uses only that the Bartlett factor tends to $1$; it does not assert the stronger second-order Bartlett correction property. [/step]
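The proof shows only that the uncorrected statistic $-(n-1)\log\Lambda_n$ and the Bartlett-corrected one share the $\chi^2_{pq}$ limit. The sketch below is an illustrative Monte Carlo experiment, not part of the proof, that compares their finite-sample means with $pq$ under an assumed identity covariance (so $\Sigma_{12}=0$ holds). The usual motivation for Bartlett's factor is that it moves the null mean closer to $pq$; the argument above does not rely on this. The helper names and simulation sizes are hypothetical choices.

```python
import numpy as np


def log_wilks_lambda_batch(x, p):
    """log Lambda_n for a batch of samples x with shape (reps, n, p + q)."""
    n = x.shape[1]
    centered = x - x.mean(axis=1, keepdims=True)
    s = np.einsum("rni,rnj->rij", centered, centered) / (n - 1)  # unbiased S_n per replicate

    def logdet(a):
        # slogdet works on stacked matrices; [1] is log|det|
        return np.linalg.slogdet(a)[1]

    return logdet(s) - logdet(s[:, :p, :p]) - logdet(s[:, p:, p:])


def independence_statistics(log_lam, n, p, q):
    """Uncorrected -(n-1) log Lambda_n and the Bartlett-corrected statistic."""
    uncorrected = -(n - 1) * log_lam
    corrected = -(n - 1 - (p + q + 1) / 2) * log_lam
    return uncorrected, corrected


# Monte Carlo under H0 with identity covariance; sizes and seed are arbitrary.
rng = np.random.default_rng(2)
p, q, n, reps = 2, 3, 20, 4000
x = rng.standard_normal((reps, n, p + q))
uncorrected, corrected = independence_statistics(log_wilks_lambda_batch(x, p), n, p, q)
print(f"target mean pq = {p * q}")
print(f"uncorrected mean = {uncorrected.mean():.3f}")
print(f"Bartlett-corrected mean = {corrected.mean():.3f}")
```

For a single data set, an approximate p-value would come from the upper tail of $\chi^2_{pq}$ evaluated at the corrected statistic, for example with `scipy.stats.chi2.sf(stat, df=p * q)`.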
