Strong Consistency of the Multivariate Normal Maximum Likelihood Estimators

Strong Consistency of the Multivariate Normal Maximum Likelihood Estimators (Theorem # 4012)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We prove the two convergences component by component. The scalar [strong law of large numbers](/theorems/1852) gives almost sure convergence of each coordinate average of $X_i$ to the corresponding coordinate of $\mu$. Applying the same law to the integrable scalar variables $(X_i)_j(X_i)_k$ gives convergence of the empirical second moment matrix to $\Sigma+\mu\mu^\top$. Finally, we use the algebraic identity expressing the MLE covariance as empirical second moment minus $\hat{\mu}_n\hat{\mu}_n^\top$, and pass to the limit entrywise. [/proofplan] [step:Apply the scalar strong law to each coordinate of the sample mean] For $i \in \mathbb{N}$ and $1 \le j \le p$, write $(X_i)_j$ for the $j$-th coordinate of $X_i$. Since $X_i \sim \mathcal{N}_p(\mu,\Sigma)$, each coordinate $(X_i)_j$ is an integrable real-valued random variable and \begin{align*} \mathbb{E}[(X_1)_j] = \mu_j. \end{align*} For each fixed $j$, the sequence $((X_i)_j)_{i \in \mathbb{N}}$ is i.i.d. Therefore, by the scalar [Strong Law of Large Numbers](/theorems/520) (citing a result not yet in the wiki: Strong Law of Large Numbers), \begin{align*} \frac{1}{n}\sum_{i=1}^n (X_i)_j \xrightarrow{a.s.} \mu_j. \end{align*} Because the index set $\{1,\dots,p\}$ is finite, the intersection of these $p$ almost sure events still has probability $1$. On that event, every coordinate of $\hat{\mu}_n$ converges to the corresponding coordinate of $\mu$, hence \begin{align*} \hat{\mu}_n \xrightarrow{a.s.} \mu. \end{align*} [guided] We first reduce the vector convergence to finitely many scalar convergences. For $i \in \mathbb{N}$ and $1 \le j \le p$, let $(X_i)_j$ denote the $j$-th coordinate of the random vector $X_i$. Since $X_i$ has multivariate normal distribution $\mathcal{N}_p(\mu,\Sigma)$, its $j$-th coordinate is a real-valued normal random variable with finite first moment and mean $\mu_j$: \begin{align*} \mathbb{E}[(X_1)_j] = \mu_j. \end{align*} For a fixed coordinate $j$, the random variables $((X_i)_j)_{i \in \mathbb{N}}$ are independent and identically distributed, because the vectors $(X_i)_{i \in \mathbb{N}}$ are independent and identically distributed and coordinate projection preserves independence and identical distribution. The scalar Strong Law of Large Numbers applies to this integrable i.i.d. sequence, so \begin{align*} \frac{1}{n}\sum_{i=1}^n (X_i)_j \xrightarrow{a.s.} \mathbb{E}[(X_1)_j] = \mu_j. \end{align*} This argument gives one almost sure event for each coordinate. Since there are only finitely many coordinates, the intersection of these events still has probability $1$. On that intersection, all coordinate averages converge simultaneously: \begin{align*} (\hat{\mu}_n)_j = \frac{1}{n}\sum_{i=1}^n (X_i)_j \to \mu_j \end{align*} for every $1 \le j \le p$. This is precisely $\hat{\mu}_n \xrightarrow{a.s.} \mu$ in $\mathbb{R}^p$. [/guided] [/step] [step:Apply the scalar strong law to each entry of the empirical second moment matrix] For $1 \le j,k \le p$, define the real-valued random variables \begin{align*} Y_{i,jk} : \Omega &\to \mathbb{R}\\ \omega &\mapsto (X_i(\omega))_j (X_i(\omega))_k . \end{align*} The variables $(Y_{i,jk})_{i \in \mathbb{N}}$ are i.i.d. for each fixed pair $(j,k)$. They are integrable because, by Cauchy-Schwarz, \begin{align*} \mathbb{E}[|Y_{1,jk}|] \le \mathbb{E}[(X_1)_j^2]^{1/2}\mathbb{E}[(X_1)_k^2]^{1/2} < \infty, \end{align*} and all second moments of a multivariate normal vector are finite. By the scalar Strong Law of Large Numbers, \begin{align*} \frac{1}{n}\sum_{i=1}^n (X_i)_j(X_i)_k \xrightarrow{a.s.} \mathbb{E}[(X_1)_j(X_1)_k]. \end{align*} By the definition of covariance, \begin{align*} \Sigma_{jk} = \mathbb{E}\bigl[((X_1)_j-\mu_j)((X_1)_k-\mu_k)\bigr] = \mathbb{E}[(X_1)_j(X_1)_k]-\mu_j\mu_k, \end{align*} so \begin{align*} \mathbb{E}[(X_1)_j(X_1)_k] = \Sigma_{jk}+\mu_j\mu_k. \end{align*} Since there are finitely many pairs $(j,k)$, these convergences hold simultaneously almost surely for all $1 \le j,k \le p$. Therefore, if \begin{align*} M_n := \frac{1}{n}\sum_{i=1}^n X_iX_i^\top, \end{align*} then \begin{align*} M_n \xrightarrow{a.s.} \Sigma+\mu\mu^\top \end{align*} entrywise. [guided] The covariance estimator is most easily handled by first controlling the empirical second moment matrix. For $1 \le j,k \le p$, define \begin{align*} Y_{i,jk} : \Omega &\to \mathbb{R}\\ \omega &\mapsto (X_i(\omega))_j (X_i(\omega))_k . \end{align*} This is the scalar random variable that gives the $(j,k)$-entry of $X_iX_i^\top$. For each fixed pair $(j,k)$, the variables $(Y_{i,jk})_{i \in \mathbb{N}}$ are independent and identically distributed because each $Y_{i,jk}$ is obtained from $X_i$ by the same measurable function $x \mapsto x_jx_k$. To apply the scalar Strong Law of Large Numbers, we must check integrability. By Cauchy-Schwarz, \begin{align*} \mathbb{E}[|Y_{1,jk}|] = \mathbb{E}[|(X_1)_j(X_1)_k|] \le \mathbb{E}[(X_1)_j^2]^{1/2} \mathbb{E}[(X_1)_k^2]^{1/2}. \end{align*} A multivariate normal random vector has finite second moments in every coordinate, so the right-hand side is finite. Thus the strong law applies and gives \begin{align*} \frac{1}{n}\sum_{i=1}^n (X_i)_j(X_i)_k \xrightarrow{a.s.} \mathbb{E}[(X_1)_j(X_1)_k]. \end{align*} We now identify the limit. By definition, the covariance matrix $\Sigma$ satisfies \begin{align*} \Sigma_{jk} = \mathbb{E}\bigl[((X_1)_j-\mu_j)((X_1)_k-\mu_k)\bigr]. \end{align*} Expanding the product and using $\mathbb{E}[(X_1)_j]=\mu_j$ and $\mathbb{E}[(X_1)_k]=\mu_k$, we obtain \begin{align*} \Sigma_{jk} &= \mathbb{E}[(X_1)_j(X_1)_k] -\mu_j\mathbb{E}[(X_1)_k] -\mu_k\mathbb{E}[(X_1)_j] +\mu_j\mu_k\\ &= \mathbb{E}[(X_1)_j(X_1)_k]-\mu_j\mu_k. \end{align*} Hence \begin{align*} \mathbb{E}[(X_1)_j(X_1)_k] = \Sigma_{jk}+\mu_j\mu_k. \end{align*} There are only $p^2$ pairs $(j,k)$. Taking the intersection of the corresponding almost sure events, we obtain simultaneous convergence of all entries. Therefore the empirical second moment matrix \begin{align*} M_n := \frac{1}{n}\sum_{i=1}^n X_iX_i^\top \end{align*} satisfies \begin{align*} M_n \xrightarrow{a.s.} \Sigma+\mu\mu^\top \end{align*} entrywise. [/guided] [/step] [step:Rewrite the covariance estimator using empirical second moments] For each $n \in \mathbb{N}$, \begin{align*} \hat{\Sigma}_n &= \frac{1}{n}\sum_{i=1}^n (X_i-\hat{\mu}_n)(X_i-\hat{\mu}_n)^\top\\ &= \frac{1}{n}\sum_{i=1}^n X_iX_i^\top -\frac{1}{n}\sum_{i=1}^n X_i\hat{\mu}_n^\top -\frac{1}{n}\sum_{i=1}^n \hat{\mu}_nX_i^\top +\frac{1}{n}\sum_{i=1}^n \hat{\mu}_n\hat{\mu}_n^\top. \end{align*} Since $\hat{\mu}_n$ does not depend on the summation index $i$ and $\frac{1}{n}\sum_{i=1}^n X_i=\hat{\mu}_n$, the middle and final terms become \begin{align*} \frac{1}{n}\sum_{i=1}^n X_i\hat{\mu}_n^\top &= \hat{\mu}_n\hat{\mu}_n^\top,\\ \frac{1}{n}\sum_{i=1}^n \hat{\mu}_nX_i^\top &= \hat{\mu}_n\hat{\mu}_n^\top,\\ \frac{1}{n}\sum_{i=1}^n \hat{\mu}_n\hat{\mu}_n^\top &= \hat{\mu}_n\hat{\mu}_n^\top. \end{align*} Therefore \begin{align*} \hat{\Sigma}_n = M_n-\hat{\mu}_n\hat{\mu}_n^\top. \end{align*} [/step] [step:Pass to the limit in the covariance identity] Work on the probability-one event on which both \begin{align*} \hat{\mu}_n &\to \mu, & M_n &\to \Sigma+\mu\mu^\top \end{align*} hold entrywise. For each $1 \le j,k \le p$, the $(j,k)$-entry of $\hat{\mu}_n\hat{\mu}_n^\top$ is $(\hat{\mu}_n)_j(\hat{\mu}_n)_k$. Since products of convergent real sequences converge to the product of their limits, \begin{align*} (\hat{\mu}_n)_j(\hat{\mu}_n)_k \to \mu_j\mu_k. \end{align*} Thus \begin{align*} \hat{\mu}_n\hat{\mu}_n^\top \to \mu\mu^\top \end{align*} entrywise. Using the identity from the previous step, \begin{align*} \hat{\Sigma}_n = M_n-\hat{\mu}_n\hat{\mu}_n^\top \to (\Sigma+\mu\mu^\top)-\mu\mu^\top = \Sigma \end{align*} entrywise. Hence \begin{align*} \hat{\Sigma}_n \xrightarrow{a.s.} \Sigma. \end{align*} Together with $\hat{\mu}_n \xrightarrow{a.s.} \mu$, this proves the claimed strong consistency of both maximum likelihood estimators. [/step]

Prerequisites (0/3 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Definitions & Concepts

Distribution

Explore Further

Distribution Definition Strong Law Of Large Numbers Theorem #520 Strong Law of Large Numbers Theorem #1852 Asymptotic Normality of ARMA Maximum Likelihood Estimators probability Gaussian ARMA Prediction Interval Theorem probability Ljung-Box Portmanteau Test probability One-Sample Hotelling Confidence Ellipsoid probability Linear Process Autocovariance Formula probability Normal Equal-Covariance Bayes Rule probability Characteristic Function of the Multivariate Normal Distribution probability Mahalanobis Quadratic Form Distribution probability

What brings you to Androma?

Start with a route through the knowledge graph.