Asymptotic Normality of the MLE — Statement & Proof

Asymptotic Normality of the MLE (Theorem # 1857)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

The proof proceeds in two stages: first a Taylor expansion of the score to relate $\sqrt{n}(\hat{\theta}_n - \theta_0)$ to the rescaled score at $\theta_0$, then the central limit theorem applied to that score. **Stage 1: Taylor expansion of the score.** Since $\hat{\theta}_n \xrightarrow{\mathbb{P}} \theta_0$ and $\theta_0$ lies in the interior of $\Theta$, with probability tending to one the MLE $\hat{\theta}_n$ lies in the interior of $\Theta$ as well. On this event the first-order condition $\nabla_\theta \bar{\ell}_n(\hat{\theta}_n) = 0$ holds. Applying the mean value theorem to each coordinate of $\theta \mapsto \nabla_\theta \bar{\ell}_n(\theta)$ between $\theta_0$ and $\hat{\theta}_n$ yields \begin{align*} 0 = \nabla_\theta \bar{\ell}_n(\hat{\theta}_n) = \nabla_\theta \bar{\ell}_n(\theta_0) + \bar{A}_n (\hat{\theta}_n - \theta_0), \end{align*} where $\bar{A}_n$ is a $p \times p$ matrix whose $(k,j)$-th entry is $\frac{\partial^2}{\partial\theta_k\partial\theta_j}\bar{\ell}_n(\bar{\theta}^{(j)})$ for an intermediate point $\bar{\theta}^{(j)}$ on the segment $[\theta_0, \hat{\theta}_n]$. **Stage 2: Convergence of $\bar{A}_n$.** The matrix $\bar{A}_n$ converges in probability to $-I(\theta_0)$. Each entry of $\bar{A}_n$ is decomposed as \begin{align*} (\bar{A}_n)_{kj} &= \left[\frac{1}{n}\sum_{i=1}^n \frac{\partial^2}{\partial\theta_k\partial\theta_j}\log f(X_i, \bar{\theta}^{(j)}) - \mathbb{E}_{\theta_0}\!\left[\frac{\partial^2}{\partial\theta_k\partial\theta_j}\log f(X, \bar{\theta}^{(j)})\right]\right]\\ &\quad + \left[\mathbb{E}_{\theta_0}\!\left[\frac{\partial^2}{\partial\theta_k\partial\theta_j}\log f(X, \bar{\theta}^{(j)})\right] - \mathbb{E}_{\theta_0}\!\left[\frac{\partial^2}{\partial\theta_k\partial\theta_j}\log f(X, \theta_0)\right]\right] + (-I(\theta_0))_{kj}. \end{align*} Setting $q(x, \theta) = \frac{\partial^2}{\partial\theta_k\partial\theta_j}\log f(x, \theta)$, the first bracket is bounded by $\sup_{\theta \in K}|\frac{1}{n}\sum_i q(X_i,\theta) - \mathbb{E}[q(X,\theta)]|$, which converges to zero almost surely by the uniform law of large numbers. The second bracket converges to zero in probability because $\bar{\theta}^{(j)} \xrightarrow{\mathbb{P}} \theta_0$ by consistency of $\hat{\theta}_n$ and continuity of $\theta \mapsto \mathbb{E}[q(X,\theta)]$. **Stage 3: CLT for the score.** From the expansion, rearranging gives \begin{align*} \sqrt{n}(\hat{\theta}_n - \theta_0) = (-\bar{A}_n)^{-1} \cdot \sqrt{n}\,\nabla_\theta\bar{\ell}_n(\theta_0). \end{align*} Now $\sqrt{n}\,\nabla_\theta\bar{\ell}_n(\theta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \nabla_\theta\log f(X_i, \theta_0)$, which is a sum of i.i.d. mean-zero vectors with covariance matrix $I(\theta_0)$ by definition. The multivariate CLT gives $\sqrt{n}\,\nabla_\theta\bar{\ell}_n(\theta_0) \xrightarrow{d} \mathcal{N}(0, I(\theta_0))$. **Stage 4: Slutsky's lemma.** Since $(-\bar{A}_n)^{-1} \xrightarrow{\mathbb{P}} I(\theta_0)^{-1}$, Slutsky's lemma yields \begin{align*} \sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} I(\theta_0)^{-1} \cdot \mathcal{N}(0, I(\theta_0)) = \mathcal{N}(0,\, I(\theta_0)^{-1} I(\theta_0) I(\theta_0)^{-1}) = \mathcal{N}(0,\, I(\theta_0)^{-1}). \end{align*}

Explore Further

Asymptotic Distribution of the Score Statistic Statistics Asymptotic Equivalence of Wald, LRT, and Score Tests Statistics Jackknife Bias Reduction Statistics MLE Equals Least Squares Under Normality Statistics Bayes Classification Risk as Misclassification Probability Statistics Posterior Concentration at Rate $n^{-1/2}$ Statistics Bias-Variance Decomposition Statistics Bootstrap Consistency Statistics

What brings you to Androma?

Start with a route through the knowledge graph.

Asymptotic Normality of the MLE (Theorem # 1857)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Asymptotic Normality of the MLE (Theorem # 1857)

Discussion

Proof

Explore Further