[proofplan]
We linearize $g$ around $\theta_0$ using its first-order Taylor expansion. The remainder term is $o(|\hat{\theta}_n - \theta_0|)$, and after rescaling by $\sqrt{n}$ it vanishes in probability because $\hat{\theta}_n - \theta_0 = O_{\mathbb{P}}(n^{-1/2})$. This leaves $Jg_{\theta_0} \cdot \sqrt{n}(\hat{\theta}_n - \theta_0)$ as the leading term. An application of Slutsky's lemma (or the continuous mapping theorem for linear maps) then transfers the distributional convergence through the fixed linear map $Jg_{\theta_0}$.
[/proofplan]
[step:Expand $g(\hat{\theta}_n)$ around $\theta_0$ via first-order Taylor approximation]
Since $g : \mathbb{R}^p \to \mathbb{R}^q$ is differentiable at $\theta_0$, the first-order Taylor expansion gives
\begin{align*}
g(\hat{\theta}_n) - g(\theta_0) = Jg_{\theta_0}(\hat{\theta}_n - \theta_0) + r(\hat{\theta}_n - \theta_0),
\end{align*}
where the remainder $r : \mathbb{R}^p \to \mathbb{R}^q$ satisfies $|r(h)| / |h| \to 0$ as $|h| \to 0$. Multiplying both sides by $\sqrt{n}$:
\begin{align*}
\sqrt{n}(g(\hat{\theta}_n) - g(\theta_0)) = Jg_{\theta_0} \cdot \sqrt{n}(\hat{\theta}_n - \theta_0) + \sqrt{n}\, r(\hat{\theta}_n - \theta_0).
\end{align*}
[guided]
Differentiability of $g$ at $\theta_0$ means that the total derivative $Dg_{\theta_0} : \mathbb{R}^p \to \mathbb{R}^q$ exists as a linear map, and its matrix representation with respect to the standard bases is the Jacobian $Jg_{\theta_0} \in \mathbb{R}^{q \times p}$. The defining property of differentiability is
\begin{align*}
g(\theta_0 + h) = g(\theta_0) + Jg_{\theta_0}\, h + r(h), \qquad \text{where } \frac{|r(h)|}{|h|} \to 0 \text{ as } h \to 0.
\end{align*}
Setting $h = \hat{\theta}_n - \theta_0$ and multiplying both sides by $\sqrt{n}$ gives
\begin{align*}
\sqrt{n}(g(\hat{\theta}_n) - g(\theta_0)) = Jg_{\theta_0} \cdot \sqrt{n}(\hat{\theta}_n - \theta_0) + \sqrt{n}\, r(\hat{\theta}_n - \theta_0).
\end{align*}
The proof now reduces to showing that the remainder term $\sqrt{n}\, r(\hat{\theta}_n - \theta_0)$ is negligible and the leading term converges to the claimed Gaussian.
[/guided]
[/step]
[step:Show the remainder term vanishes in probability]
Since $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} \mathcal{N}(0, \Sigma)$, the sequence $\sqrt{n}(\hat{\theta}_n - \theta_0)$ is bounded in probability (convergence in distribution implies boundedness in probability). Equivalently, $\hat{\theta}_n - \theta_0 = O_{\mathbb{P}}(n^{-1/2})$, so in particular $\hat{\theta}_n \xrightarrow{\mathbb{P}} \theta_0$.
Write the remainder as
\begin{align*}
\sqrt{n}\, r(\hat{\theta}_n - \theta_0) = \frac{r(\hat{\theta}_n - \theta_0)}{|\hat{\theta}_n - \theta_0|} \cdot \sqrt{n}\,|\hat{\theta}_n - \theta_0|.
\end{align*}
The first factor converges to $0$ in probability because $\hat{\theta}_n \xrightarrow{\mathbb{P}} \theta_0$ and $|r(h)|/|h| \to 0$ as $h \to 0$. The second factor $\sqrt{n}\,|\hat{\theta}_n - \theta_0| = |\sqrt{n}(\hat{\theta}_n - \theta_0)|$ is bounded in probability. The product of a sequence converging to zero in probability and a sequence bounded in probability converges to zero in probability, so $\sqrt{n}\, r(\hat{\theta}_n - \theta_0) \xrightarrow{\mathbb{P}} 0$.
[guided]
This is the critical step where the $\sqrt{n}$ rescaling interacts with the remainder. We need to show that $\sqrt{n}\, r(\hat{\theta}_n - \theta_0) \xrightarrow{\mathbb{P}} 0$ even though $\sqrt{n} \to \infty$.
The trick is to factor the remainder. On the event $\hat{\theta}_n \neq \theta_0$ (which is harmless since the complementary event has vanishing probability), write
\begin{align*}
\sqrt{n}\, r(\hat{\theta}_n - \theta_0) = \underbrace{\frac{r(\hat{\theta}_n - \theta_0)}{|\hat{\theta}_n - \theta_0|}}_{\to\, 0 \text{ in } \mathbb{P}} \cdot \underbrace{|\sqrt{n}(\hat{\theta}_n - \theta_0)|}_{\text{bounded in } \mathbb{P}}.
\end{align*}
Why does the first factor vanish? Since $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} \mathcal{N}(0, \Sigma)$, the [Convergence in Distribution Implies Boundedness in Probability](/theorems/1849) guarantees that $\hat{\theta}_n - \theta_0 = O_{\mathbb{P}}(n^{-1/2})$. In particular $\hat{\theta}_n \xrightarrow{\mathbb{P}} \theta_0$. By the differentiability condition, $|r(h)|/|h| \to 0$ as $h \to 0$. Since $\hat{\theta}_n - \theta_0 \xrightarrow{\mathbb{P}} 0$, the continuous mapping theorem (applied to the continuous-at-zero function $h \mapsto |r(h)|/|h|$, extended to be $0$ at $h = 0$) gives $|r(\hat{\theta}_n - \theta_0)|/|\hat{\theta}_n - \theta_0| \xrightarrow{\mathbb{P}} 0$.
Why is the second factor bounded? It equals $|\sqrt{n}(\hat{\theta}_n - \theta_0)|$, which converges in distribution to $|\mathcal{N}(0, \Sigma)|$, and convergence in distribution implies boundedness in probability.
The product of a term converging to zero in probability and a bounded-in-probability term converges to zero in probability. Hence $\sqrt{n}\, r(\hat{\theta}_n - \theta_0) \xrightarrow{\mathbb{P}} 0$.
[/guided]
[/step]
[step:Apply Slutsky's lemma to conclude the Gaussian limit]
Combining the expansion with the remainder estimate:
\begin{align*}
\sqrt{n}(g(\hat{\theta}_n) - g(\theta_0)) = Jg_{\theta_0} \cdot \sqrt{n}(\hat{\theta}_n - \theta_0) + \sqrt{n}\, r(\hat{\theta}_n - \theta_0).
\end{align*}
The first term converges in distribution to $Jg_{\theta_0} \cdot Z$ where $Z \sim \mathcal{N}(0, \Sigma)$, by the [Affine Transformations of Multivariate Normals](/theorems/1853): since $Jg_{\theta_0}$ is a fixed $q \times p$ matrix and $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} \mathcal{N}(0, \Sigma)$, the linear image converges in distribution to $\mathcal{N}(0, Jg_{\theta_0}\, \Sigma\, Jg_{\theta_0}^\top)$.
The second term converges to $0$ in probability. By [Slutsky's Lemma](/theorems/1850), adding a term converging to zero in probability does not affect the distributional limit. Therefore
\begin{align*}
\sqrt{n}(g(\hat{\theta}_n) - g(\theta_0)) \xrightarrow{d} \mathcal{N}(0,\, Jg_{\theta_0}\, \Sigma\, Jg_{\theta_0}^\top).
\end{align*}
[guided]
We now assemble the two pieces. The leading term is
\begin{align*}
Jg_{\theta_0} \cdot \sqrt{n}(\hat{\theta}_n - \theta_0).
\end{align*}
By hypothesis, $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} Z \sim \mathcal{N}(0, \Sigma)$. Since $Jg_{\theta_0}$ is a deterministic matrix, the [Affine Transformations of Multivariate Normals](/theorems/1853) (part (a) applied to the limiting distribution, combined with part (b) for the convergence statement with $A_n = Jg_{\theta_0}$ constant) gives
\begin{align*}
Jg_{\theta_0} \cdot \sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} Jg_{\theta_0}\, Z \sim \mathcal{N}(0,\, Jg_{\theta_0}\, \Sigma\, Jg_{\theta_0}^\top).
\end{align*}
The covariance of $Jg_{\theta_0}\, Z$ is computed as $\operatorname{Cov}(Jg_{\theta_0}\, Z) = Jg_{\theta_0}\, \operatorname{Cov}(Z)\, Jg_{\theta_0}^\top = Jg_{\theta_0}\, \Sigma\, Jg_{\theta_0}^\top$.
Adding the remainder, which satisfies $\sqrt{n}\, r(\hat{\theta}_n - \theta_0) \xrightarrow{\mathbb{P}} 0$, does not affect the distributional limit by [Slutsky's Lemma](/theorems/1850) (part (b), with $Y_n = \sqrt{n}\, r(\hat{\theta}_n - \theta_0) \xrightarrow{d} 0$, a deterministic constant). The conclusion is
\begin{align*}
\sqrt{n}(g(\hat{\theta}_n) - g(\theta_0)) \xrightarrow{d} \mathcal{N}(0,\, Jg_{\theta_0}\, \Sigma\, Jg_{\theta_0}^\top).
\end{align*}
[/guided]
[/step]