Asymptotic Normality of the Maximum Likelihood Estimators in the Multivariate Normal Model

Asymptotic Normality of the Maximum Likelihood Estimators in the Multivariate Normal Model (Theorem # 4013)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We first compute the distribution of the sample mean directly from the characteristic function of a multivariate normal vector. For the covariance estimator, we temporarily replace the empirical centering by the true centering and apply the [multivariate central limit theorem](/theorems/1854) to the vector of products $Y_iY_i^\top-\Sigma$, where $Y_i=X_i-\mu$. The Gaussian fourth moment formula gives the limiting covariance, and the correction from replacing $\mu$ by $\widehat{\mu}_n$ is negligible after multiplication by $\sqrt n$. Finally, an [orthogonal decomposition](/theorems/436) of the Gaussian sample shows that the sample mean and the centred residual covariance are independent, so the joint limit has independent blocks. [/proofplan] [step:Compute the exact distribution of the scaled sample mean] Let $(\Omega, \mathcal{F}, \mathbb{P})$ denote the probability space on which the sequence $(X_i)_{i \in \mathbb{N}}$ is defined. Let $\mathcal{B}(\mathbb{R}^p)$ denote the Borel $\sigma$-algebra on $\mathbb{R}^p$, and for each $i \in \mathbb{N}$ regard $X_i$ as the measurable random vector \begin{align*} X_i:(\Omega, \mathcal{F}) &\to (\mathbb{R}^p, \mathcal{B}(\mathbb{R}^p)). \end{align*} For each $i \in \{1,\dots,n\}$, define the centred random vector \begin{align*} Y_i:\Omega &\to \mathbb{R}^p, \\ \omega &\mapsto X_i(\omega)-\mu . \end{align*} Then $Y_1,\dots,Y_n$ are independent and identically distributed as $\mathcal{N}_p(0,\Sigma)$. Define \begin{align*} Z_n:\Omega &\to \mathbb{R}^p, \\ \omega &\mapsto \sqrt{n}(\widehat{\mu}_n(\omega)-\mu) = \frac{1}{\sqrt n}\sum_{i=1}^{n}Y_i(\omega). \end{align*} For $t \in \mathbb{R}^p$, the characteristic function of $Y_i$ is \begin{align*} \mathbb{E}\left[e^{i t^\top Y_i}\right] = \exp\left(-\frac{1}{2}t^\top \Sigma t\right). \end{align*} Using independence, \begin{align*} \mathbb{E}\left[e^{i t^\top Z_n}\right] &= \prod_{i=1}^{n}\mathbb{E}\left[e^{i t^\top Y_i/\sqrt n}\right] \\ &= \left[ \exp\left(-\frac{1}{2n}t^\top \Sigma t\right) \right]^n \\ &= \exp\left(-\frac{1}{2}t^\top \Sigma t\right). \end{align*} This is the characteristic function of $\mathcal{N}_p(0,\Sigma)$, so $Z_n \sim \mathcal{N}_p(0,\Sigma)$ for every $n$. Hence \begin{align*} \sqrt n(\widehat{\mu}_n-\mu)\xrightarrow{d}\mathcal{N}_p(0,\Sigma). \end{align*} [/step] [step:Apply the multivariate central limit theorem to the true-centred covariance] Define the true-centred covariance fluctuation \begin{align*} A_n:\Omega &\to \mathbb{R}^{p \times p}, \\ \omega &\mapsto \frac{1}{n}\sum_{i=1}^{n}Y_i(\omega)Y_i(\omega)^\top-\Sigma . \end{align*} Also define the product vector \begin{align*} V_i:\Omega &\to \mathbb{R}^{p^2}, \\ \omega &\mapsto \operatorname{vec}\left(Y_i(\omega)Y_i(\omega)^\top-\Sigma\right). \end{align*} Then $V_1,V_2,\dots$ are independent and identically distributed, $\mathbb{E}[V_i]=0$, and every second moment is finite because Gaussian random variables have finite moments of all orders. Define the covariance matrix $\Gamma \in \mathbb{R}^{p^2 \times p^2}$ by \begin{align*} \Gamma := \operatorname{Cov}(V_1). \end{align*} The hypotheses of the multivariate [central limit theorem](/theorems/1848) are therefore satisfied: the summands are independent and identically distributed in $\mathbb{R}^{p^2}$, have mean zero, and have finite second moments. The multivariate [central limit theorem](/theorems/521) gives \begin{align*} \sqrt n\,\operatorname{vec}(A_n) = \frac{1}{\sqrt n}\sum_{i=1}^{n}V_i \xrightarrow{d} \mathcal{N}_{p^2}(0,\Gamma). \end{align*} In coordinates, this definition of $\Gamma$ means \begin{align*} \Gamma_{(j,k),(l,m)} = \operatorname{Cov}(Y_{1,j}Y_{1,k},Y_{1,l}Y_{1,m}). \end{align*} It remains to compute this covariance. Since $Y_1$ is centred Gaussian, its moment generating function \begin{align*} M:\mathbb{R}^p &\to \mathbb{R}, \\ t &\mapsto \mathbb{E}\left[e^{t^\top Y_1}\right] = \exp\left(\frac{1}{2}t^\top\Sigma t\right) \end{align*} is smooth. Differentiating four times at $t=0$ gives the Gaussian fourth moment identity \begin{align*} \mathbb{E}[Y_{1,j}Y_{1,k}Y_{1,l}Y_{1,m}] = \Sigma_{jk}\Sigma_{lm} + \Sigma_{jl}\Sigma_{km} + \Sigma_{jm}\Sigma_{kl}. \end{align*} Therefore \begin{align*} \Gamma_{(j,k),(l,m)} &= \mathbb{E}[Y_{1,j}Y_{1,k}Y_{1,l}Y_{1,m}] - \mathbb{E}[Y_{1,j}Y_{1,k}] \mathbb{E}[Y_{1,l}Y_{1,m}] \\ &= \left( \Sigma_{jk}\Sigma_{lm} + \Sigma_{jl}\Sigma_{km} + \Sigma_{jm}\Sigma_{kl} \right) - \Sigma_{jk}\Sigma_{lm} \\ &= \Sigma_{jl}\Sigma_{km}+\Sigma_{jm}\Sigma_{kl}. \end{align*} [/step] [step:Show that empirical centering changes the covariance estimator by a negligible term] Define \begin{align*} \delta_n:\Omega &\to \mathbb{R}^p, \\ \omega &\mapsto \widehat{\mu}_n(\omega)-\mu . \end{align*} For every $i \in \{1,\dots,n\}$, \begin{align*} X_i-\widehat{\mu}_n=Y_i-\delta_n. \end{align*} Expanding the outer product and using $\frac{1}{n}\sum_{i=1}^{n}Y_i=\delta_n$, \begin{align*} \widehat{\Sigma}_n &= \frac{1}{n}\sum_{i=1}^{n}(Y_i-\delta_n)(Y_i-\delta_n)^\top \\ &= \frac{1}{n}\sum_{i=1}^{n}Y_iY_i^\top - \delta_n\delta_n^\top . \end{align*} Thus \begin{align*} \widehat{\Sigma}_n-\Sigma=A_n-\delta_n\delta_n^\top. \end{align*} Using the Frobenius norm $|\cdot|_F$ on $\mathbb{R}^{p\times p}$, \begin{align*} \left|\sqrt n\,\operatorname{vec}(\delta_n\delta_n^\top)\right| = \sqrt n\,|\delta_n\delta_n^\top|_F = \sqrt n\,|\delta_n|^2 = \frac{1}{\sqrt n}\left|\sqrt n\,\delta_n\right|^2. \end{align*} The sequence $\sqrt n\,\delta_n$ is tight because it has the fixed distribution $\mathcal{N}_p(0,\Sigma)$ by the first step. Hence \begin{align*} \sqrt n\,\operatorname{vec}(\delta_n\delta_n^\top)\xrightarrow{\mathbb{P}}0. \end{align*} The preceding convergence in probability and the distributional convergence of $\sqrt n\,\operatorname{vec}(A_n)$ satisfy the hypotheses of Slutsky's theorem. Hence \begin{align*} \sqrt n\,\operatorname{vec}(\widehat{\Sigma}_n-\Sigma) = \sqrt n\,\operatorname{vec}(A_n) - \sqrt n\,\operatorname{vec}(\delta_n\delta_n^\top) \xrightarrow{d} \mathcal{N}_{p^2}(0,\Gamma). \end{align*} [/step] [step:Establish independence of the limiting mean and covariance components] Let $\Sigma^{1/2}\in \mathbb{R}^{p\times p}$ denote the symmetric positive definite square root of $\Sigma$, and define \begin{align*} R_i:\Omega &\to \mathbb{R}^p, \\ \omega &\mapsto \Sigma^{-1/2}(X_i(\omega)-\mu). \end{align*} Then $R_1,\dots,R_n$ are independent standard normal random vectors in $\mathbb{R}^p$. Choose an orthogonal matrix $Q_n\in\mathbb{R}^{n\times n}$ whose first row is \begin{align*} \left(\frac{1}{\sqrt n},\dots,\frac{1}{\sqrt n}\right). \end{align*} For $a\in\{1,\dots,n\}$, define \begin{align*} W_a:\Omega &\to \mathbb{R}^p, \\ \omega &\mapsto \sum_{i=1}^{n}(Q_n)_{ai}R_i(\omega). \end{align*} Because the joint vector $(R_1,\dots,R_n)$ is centred Gaussian and the transformation induced by $Q_n$ is orthogonal, $W_1,\dots,W_n$ are independent standard normal random vectors in $\mathbb{R}^p$. The first row of $Q_n$ gives \begin{align*} \sqrt n(\widehat{\mu}_n-\mu)=\Sigma^{1/2}W_1. \end{align*} The remaining rows span the orthogonal complement of the constant vector in $\mathbb{R}^n$, so the residual sum of squares satisfies \begin{align*} \widehat{\Sigma}_n = \Sigma^{1/2} \left( \frac{1}{n}\sum_{a=2}^{n}W_aW_a^\top \right) \Sigma^{1/2}. \end{align*} Thus $\sqrt n(\widehat{\mu}_n-\mu)$ is a measurable function of $W_1$, while $\widehat{\Sigma}_n$ is a measurable function of $(W_2,\dots,W_n)$. Since $W_1$ is independent of $(W_2,\dots,W_n)$, the random vectors $\sqrt n(\widehat{\mu}_n-\mu)$ and $\sqrt n\,\operatorname{vec}(\widehat{\Sigma}_n-\Sigma)$ are independent for each $n$. We now pass from marginal convergence to joint convergence using characteristic functions. For $u \in \mathbb{R}^p$ and $v \in \mathbb{R}^{p^2}$, define \begin{align*} \phi_n(u,v) := \mathbb{E}\left[ \exp\left( i u^\top \sqrt n(\widehat{\mu}_n-\mu) + i v^\top \sqrt n\operatorname{vec}(\widehat{\Sigma}_n-\Sigma) \right) \right]. \end{align*} The finite-sample independence just proved gives the factorisation \begin{align*} \phi_n(u,v) &= \mathbb{E}\left[\exp\left(i u^\top \sqrt n(\widehat{\mu}_n-\mu)\right)\right] \mathbb{E}\left[\exp\left(i v^\top \sqrt n\operatorname{vec}(\widehat{\Sigma}_n-\Sigma)\right)\right]. \end{align*} The first factor converges to $\exp(-u^\top\Sigma u/2)$ by the exact normal distribution of the scaled sample mean. The second factor converges to $\exp(-v^\top\Gamma v/2)$ by the covariance asymptotic normality proved above. Therefore \begin{align*} \phi_n(u,v) \to \exp\left(-\frac{1}{2}u^\top\Sigma u\right) \exp\left(-\frac{1}{2}v^\top\Gamma v\right) = \exp\left( -\frac{1}{2} \begin{pmatrix} u \\ v \end{pmatrix}^\top \begin{pmatrix} \Sigma & 0 \\ 0 & \Gamma \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} \right). \end{align*} This is the characteristic function of the centred normal vector in $\mathbb{R}^{p+p^2}$ with block diagonal covariance matrix. Hence \begin{align*} \left(\sqrt{n}(\widehat{\mu}_n-\mu),\sqrt{n}\operatorname{vec}(\widehat{\Sigma}_n-\Sigma)\right) \xrightarrow{d} \mathcal{N}_{p+p^2}\left(0, \begin{pmatrix} \Sigma & 0 \\ 0 & \Gamma \end{pmatrix} \right). \end{align*} The factorisation of the limiting characteristic function into a function of $u$ times a function of $v$ shows that the limiting mean component and limiting covariance component are independent. [/step]

Prerequisites (0/1 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Distribution

Explore Further

Distribution Definition Consistency of the Smoothed Periodogram probability Monotonicity and Triangle Inequality for Expectation probability Exact Normal-Theory Prediction Region for a New Multivariate Linear-Model Observation probability Quadratic Discriminant Analysis Bayes Rule probability Durbin-Levinson Recursion probability Linear Discriminant Analysis Generalized Eigenvalue Theorem probability Wilks' Lambda Product Formula probability Nelson's Strict Stationarity Theorem for the GARCH(1,1) Process probability

What brings you to Androma?

Start with a route through the knowledge graph.