Multivariate Delta Method — Statement & Proof

Multivariate Delta Method (Theorem # 1860)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We linearize $g$ around $\theta_0$ using its first-order Taylor expansion. The remainder term is $o(|\hat{\theta}_n - \theta_0|)$, and after rescaling by $\sqrt{n}$ it vanishes in probability because $\hat{\theta}_n - \theta_0 = O_{\mathbb{P}}(n^{-1/2})$. This leaves $Jg_{\theta_0} \cdot \sqrt{n}(\hat{\theta}_n - \theta_0)$ as the leading term. An application of Slutsky's lemma (or the continuous mapping theorem for linear maps) then transfers the distributional convergence through the fixed linear map $Jg_{\theta_0}$. [/proofplan] [step:Expand $g(\hat{\theta}_n)$ around $\theta_0$ via first-order Taylor approximation] Since $g : \mathbb{R}^p \to \mathbb{R}^q$ is differentiable at $\theta_0$, the first-order Taylor expansion gives \begin{align*} g(\hat{\theta}_n) - g(\theta_0) = Jg_{\theta_0}(\hat{\theta}_n - \theta_0) + r(\hat{\theta}_n - \theta_0), \end{align*} where the remainder $r : \mathbb{R}^p \to \mathbb{R}^q$ satisfies $|r(h)| / |h| \to 0$ as $|h| \to 0$. Multiplying both sides by $\sqrt{n}$: \begin{align*} \sqrt{n}(g(\hat{\theta}_n) - g(\theta_0)) = Jg_{\theta_0} \cdot \sqrt{n}(\hat{\theta}_n - \theta_0) + \sqrt{n}\, r(\hat{\theta}_n - \theta_0). \end{align*} [guided] Differentiability of $g$ at $\theta_0$ means that the total derivative $Dg_{\theta_0} : \mathbb{R}^p \to \mathbb{R}^q$ exists as a linear map, and its matrix representation with respect to the standard bases is the Jacobian $Jg_{\theta_0} \in \mathbb{R}^{q \times p}$. The defining property of differentiability is \begin{align*} g(\theta_0 + h) = g(\theta_0) + Jg_{\theta_0}\, h + r(h), \qquad \text{where } \frac{|r(h)|}{|h|} \to 0 \text{ as } h \to 0. \end{align*} Setting $h = \hat{\theta}_n - \theta_0$ and multiplying both sides by $\sqrt{n}$ gives \begin{align*} \sqrt{n}(g(\hat{\theta}_n) - g(\theta_0)) = Jg_{\theta_0} \cdot \sqrt{n}(\hat{\theta}_n - \theta_0) + \sqrt{n}\, r(\hat{\theta}_n - \theta_0). \end{align*} The proof now reduces to showing that the remainder term $\sqrt{n}\, r(\hat{\theta}_n - \theta_0)$ is negligible and the leading term converges to the claimed Gaussian. [/guided] [/step] [step:Show the remainder term vanishes in probability] Since $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} \mathcal{N}(0, \Sigma)$, the sequence $\sqrt{n}(\hat{\theta}_n - \theta_0)$ is bounded in probability (convergence in distribution implies boundedness in probability). Equivalently, $\hat{\theta}_n - \theta_0 = O_{\mathbb{P}}(n^{-1/2})$, so in particular $\hat{\theta}_n \xrightarrow{\mathbb{P}} \theta_0$. Write the remainder as \begin{align*} \sqrt{n}\, r(\hat{\theta}_n - \theta_0) = \frac{r(\hat{\theta}_n - \theta_0)}{|\hat{\theta}_n - \theta_0|} \cdot \sqrt{n}\,|\hat{\theta}_n - \theta_0|. \end{align*} The first factor converges to $0$ in probability because $\hat{\theta}_n \xrightarrow{\mathbb{P}} \theta_0$ and $|r(h)|/|h| \to 0$ as $h \to 0$. The second factor $\sqrt{n}\,|\hat{\theta}_n - \theta_0| = |\sqrt{n}(\hat{\theta}_n - \theta_0)|$ is bounded in probability. The product of a sequence converging to zero in probability and a sequence bounded in probability converges to zero in probability, so $\sqrt{n}\, r(\hat{\theta}_n - \theta_0) \xrightarrow{\mathbb{P}} 0$. [guided] This is the critical step where the $\sqrt{n}$ rescaling interacts with the remainder. We need to show that $\sqrt{n}\, r(\hat{\theta}_n - \theta_0) \xrightarrow{\mathbb{P}} 0$ even though $\sqrt{n} \to \infty$. The trick is to factor the remainder. On the event $\hat{\theta}_n \neq \theta_0$ (which is harmless since the complementary event has vanishing probability), write \begin{align*} \sqrt{n}\, r(\hat{\theta}_n - \theta_0) = \underbrace{\frac{r(\hat{\theta}_n - \theta_0)}{|\hat{\theta}_n - \theta_0|}}_{\to\, 0 \text{ in } \mathbb{P}} \cdot \underbrace{|\sqrt{n}(\hat{\theta}_n - \theta_0)|}_{\text{bounded in } \mathbb{P}}. \end{align*} Why does the first factor vanish? Since $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} \mathcal{N}(0, \Sigma)$, the [Convergence in Distribution Implies Boundedness in Probability](/theorems/1849) guarantees that $\hat{\theta}_n - \theta_0 = O_{\mathbb{P}}(n^{-1/2})$. In particular $\hat{\theta}_n \xrightarrow{\mathbb{P}} \theta_0$. By the differentiability condition, $|r(h)|/|h| \to 0$ as $h \to 0$. Since $\hat{\theta}_n - \theta_0 \xrightarrow{\mathbb{P}} 0$, the continuous mapping theorem (applied to the continuous-at-zero function $h \mapsto |r(h)|/|h|$, extended to be $0$ at $h = 0$) gives $|r(\hat{\theta}_n - \theta_0)|/|\hat{\theta}_n - \theta_0| \xrightarrow{\mathbb{P}} 0$. Why is the second factor bounded? It equals $|\sqrt{n}(\hat{\theta}_n - \theta_0)|$, which converges in distribution to $|\mathcal{N}(0, \Sigma)|$, and convergence in distribution implies boundedness in probability. The product of a term converging to zero in probability and a bounded-in-probability term converges to zero in probability. Hence $\sqrt{n}\, r(\hat{\theta}_n - \theta_0) \xrightarrow{\mathbb{P}} 0$. [/guided] [/step] [step:Apply Slutsky's lemma to conclude the Gaussian limit] Combining the expansion with the remainder estimate: \begin{align*} \sqrt{n}(g(\hat{\theta}_n) - g(\theta_0)) = Jg_{\theta_0} \cdot \sqrt{n}(\hat{\theta}_n - \theta_0) + \sqrt{n}\, r(\hat{\theta}_n - \theta_0). \end{align*} The first term converges in distribution to $Jg_{\theta_0} \cdot Z$ where $Z \sim \mathcal{N}(0, \Sigma)$, by the [Affine Transformations of Multivariate Normals](/theorems/1853): since $Jg_{\theta_0}$ is a fixed $q \times p$ matrix and $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} \mathcal{N}(0, \Sigma)$, the linear image converges in distribution to $\mathcal{N}(0, Jg_{\theta_0}\, \Sigma\, Jg_{\theta_0}^\top)$. The second term converges to $0$ in probability. By [Slutsky's Lemma](/theorems/1850), adding a term converging to zero in probability does not affect the distributional limit. Therefore \begin{align*} \sqrt{n}(g(\hat{\theta}_n) - g(\theta_0)) \xrightarrow{d} \mathcal{N}(0,\, Jg_{\theta_0}\, \Sigma\, Jg_{\theta_0}^\top). \end{align*} [guided] We now assemble the two pieces. The leading term is \begin{align*} Jg_{\theta_0} \cdot \sqrt{n}(\hat{\theta}_n - \theta_0). \end{align*} By hypothesis, $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} Z \sim \mathcal{N}(0, \Sigma)$. Since $Jg_{\theta_0}$ is a deterministic matrix, the [Affine Transformations of Multivariate Normals](/theorems/1853) (part (a) applied to the limiting distribution, combined with part (b) for the convergence statement with $A_n = Jg_{\theta_0}$ constant) gives \begin{align*} Jg_{\theta_0} \cdot \sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} Jg_{\theta_0}\, Z \sim \mathcal{N}(0,\, Jg_{\theta_0}\, \Sigma\, Jg_{\theta_0}^\top). \end{align*} The covariance of $Jg_{\theta_0}\, Z$ is computed as $\operatorname{Cov}(Jg_{\theta_0}\, Z) = Jg_{\theta_0}\, \operatorname{Cov}(Z)\, Jg_{\theta_0}^\top = Jg_{\theta_0}\, \Sigma\, Jg_{\theta_0}^\top$. Adding the remainder, which satisfies $\sqrt{n}\, r(\hat{\theta}_n - \theta_0) \xrightarrow{\mathbb{P}} 0$, does not affect the distributional limit by [Slutsky's Lemma](/theorems/1850) (part (b), with $Y_n = \sqrt{n}\, r(\hat{\theta}_n - \theta_0) \xrightarrow{d} 0$, a deterministic constant). The conclusion is \begin{align*} \sqrt{n}(g(\hat{\theta}_n) - g(\theta_0)) \xrightarrow{d} \mathcal{N}(0,\, Jg_{\theta_0}\, \Sigma\, Jg_{\theta_0}^\top). \end{align*} [/guided] [/step]

Explore Further

Approximate Confidence Interval via MLE Statistics Bayes Rule is Minimax When Bayes Risk Equals Maximal Risk Statistics Fisher Information Under Reparametrisation Statistics Bootstrap Mean Statistics Independence of $\bar{X}$ and $S^2$ for Normal Samples Statistics Factorisation Criterion Statistics Multivariate Central Limit Theorem Statistics Rao–Blackwell Theorem Statistics

What brings you to Androma?

Start with a route through the knowledge graph.

Multivariate Delta Method (Theorem # 1860)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Multivariate Delta Method (Theorem # 1860)

Discussion

Proof

Explore Further