[proofplan]
We express all three statistics in terms of the common building block $\Delta_n := \sqrt{n}(\hat{\theta}_n - \theta_0)$ and show that each equals $\Delta_n^\top I(\theta_0) \Delta_n + o_{\mathbb{P}}(1)$. The Wald statistic is already a quadratic form in $\Delta_n$ up to replacing $\hat{i}_n$ by $I(\theta_0)$. The LRT is related to the Wald statistic by a second-order Taylor expansion of the log-likelihood. The score statistic is linked via the first-order Taylor expansion of the score at $\theta_0$. Once all three are shown to differ from $\Delta_n^\top I(\theta_0) \Delta_n$ by $o_{\mathbb{P}}(1)$, the asymptotic equivalence follows by the triangle inequality.
[/proofplan]
[step:Express the Wald statistic as $\Delta_n^\top I(\theta_0) \Delta_n + o_{\mathbb{P}}(1)$]
Define $\Delta_n := \sqrt{n}(\hat{\theta}_n - \theta_0)$. By the [Asymptotic Normality of the MLE](/theorems/1857), $\Delta_n \xrightarrow{d} \mathcal{N}(0, I(\theta_0)^{-1})$ under $\mathbb{P}_{\theta_0}$. In particular, $\Delta_n = O_{\mathbb{P}}(1)$.
The Wald statistic is $W_n(\theta_0) = \Delta_n^\top \hat{i}_n \Delta_n$, where $\hat{i}_n$ is the observed Fisher information. By the [Consistency of Observed Fisher Information](/theorems/1862), $\hat{i}_n \xrightarrow{\mathbb{P}_{\theta_0}} I(\theta_0)$. Write
\begin{align*}
W_n(\theta_0) = \Delta_n^\top I(\theta_0) \Delta_n + \Delta_n^\top (\hat{i}_n - I(\theta_0)) \Delta_n.
\end{align*}
The error term satisfies $|\Delta_n^\top (\hat{i}_n - I(\theta_0)) \Delta_n| \leq |\Delta_n|^2 \|\hat{i}_n - I(\theta_0)\|_{\mathrm{op}}$. Since $|\Delta_n|^2 = O_{\mathbb{P}}(1)$ and $\|\hat{i}_n - I(\theta_0)\|_{\mathrm{op}} \xrightarrow{\mathbb{P}} 0$, the product converges to zero in probability. Hence
\begin{align*}
W_n(\theta_0) = \Delta_n^\top I(\theta_0) \Delta_n + o_{\mathbb{P}}(1).
\end{align*}
[/step]
[step:Express the LRT statistic as $\Delta_n^\top I(\theta_0) \Delta_n + o_{\mathbb{P}}(1)$ via Taylor expansion]
The log-likelihood ratio statistic is $\Lambda_n = 2(\ell_n(\hat{\theta}_n) - \ell_n(\theta_0)) = 2n(\bar{\ell}_n(\hat{\theta}_n) - \bar{\ell}_n(\theta_0))$. Perform a second-order Taylor expansion of $\bar{\ell}_n$ around $\hat{\theta}_n$. Since $\nabla_\theta \bar{\ell}_n(\hat{\theta}_n) = 0$ at the MLE (on the event that $\hat{\theta}_n \in \operatorname{int}(\Theta)$, which has probability tending to 1), we have
\begin{align*}
\bar{\ell}_n(\theta_0) = \bar{\ell}_n(\hat{\theta}_n) + \frac{1}{2}(\theta_0 - \hat{\theta}_n)^\top \nabla^2_\theta \bar{\ell}_n(\bar{\theta}_n)(\theta_0 - \hat{\theta}_n),
\end{align*}
where $\bar{\theta}_n$ lies on the segment $[\theta_0, \hat{\theta}_n]$. Rearranging:
\begin{align*}
\Lambda_n = -n(\theta_0 - \hat{\theta}_n)^\top \nabla^2_\theta \bar{\ell}_n(\bar{\theta}_n)(\theta_0 - \hat{\theta}_n) = \Delta_n^\top (-\nabla^2_\theta \bar{\ell}_n(\bar{\theta}_n)) \Delta_n.
\end{align*}
By the uniform law of large numbers for the Hessian and the fact that $\bar{\theta}_n \xrightarrow{\mathbb{P}} \theta_0$ (since $\hat{\theta}_n \xrightarrow{\mathbb{P}} \theta_0$ and $\bar{\theta}_n$ lies between the two), $-\nabla^2_\theta \bar{\ell}_n(\bar{\theta}_n) \xrightarrow{\mathbb{P}} I(\theta_0)$. Therefore
\begin{align*}
\Lambda_n = \Delta_n^\top I(\theta_0) \Delta_n + \Delta_n^\top (-\nabla^2_\theta \bar{\ell}_n(\bar{\theta}_n) - I(\theta_0)) \Delta_n.
\end{align*}
The error term vanishes in probability by the same argument as for the Wald statistic: $|\Delta_n|^2 = O_{\mathbb{P}}(1)$ and $\|-\nabla^2_\theta \bar{\ell}_n(\bar{\theta}_n) - I(\theta_0)\|_{\mathrm{op}} \xrightarrow{\mathbb{P}} 0$. Hence
\begin{align*}
\Lambda_n = \Delta_n^\top I(\theta_0) \Delta_n + o_{\mathbb{P}}(1).
\end{align*}
[guided]
The key idea is that the log-likelihood ratio is well-approximated by a quadratic form at the MLE. The proof of [Wilks' Theorem](/theorems/1864) uses the same Taylor expansion. The first-order term vanishes because $\hat{\theta}_n$ is a critical point of $\bar{\ell}_n$. The Hessian matrix $\nabla^2_\theta \bar{\ell}_n(\bar{\theta}_n)$ evaluated at an intermediate point converges to $-I(\theta_0)$ by the uniform law of large numbers for the Hessian (regularity assumption) and the squeeze $\bar{\theta}_n \in [\theta_0, \hat{\theta}_n]$ combined with $\hat{\theta}_n \xrightarrow{\mathbb{P}} \theta_0$.
After substitution, $\Lambda_n = \Delta_n^\top (-\nabla^2_\theta \bar{\ell}_n(\bar{\theta}_n)) \Delta_n$, which differs from $\Delta_n^\top I(\theta_0) \Delta_n$ by a term bounded by $|\Delta_n|^2 \cdot \|-\nabla^2_\theta \bar{\ell}_n(\bar{\theta}_n) - I(\theta_0)\|_{\mathrm{op}}$. Since the first factor is $O_{\mathbb{P}}(1)$ and the second vanishes in probability, the product is $o_{\mathbb{P}}(1)$.
[/guided]
[/step]
[step:Express the score statistic as $\Delta_n^\top I(\theta_0) \Delta_n + o_{\mathbb{P}}(1)$ via the score expansion]
The score statistic is $T_n(\theta_0) = \frac{1}{n} S_n(\theta_0)^\top I(\theta_0)^{-1} S_n(\theta_0)$, where $S_n(\theta_0) = \sum_{i=1}^n \nabla_\theta \log f(X_i, \theta_0) = n \nabla_\theta \bar{\ell}_n(\theta_0)$. Hence
\begin{align*}
T_n(\theta_0) = n\, \nabla_\theta \bar{\ell}_n(\theta_0)^\top I(\theta_0)^{-1} \nabla_\theta \bar{\ell}_n(\theta_0).
\end{align*}
From the proof of [Asymptotic Normality of the MLE](/theorems/1857), the mean value expansion of the score gives
\begin{align*}
\nabla_\theta \bar{\ell}_n(\theta_0) = \nabla_\theta \bar{\ell}_n(\hat{\theta}_n) - \nabla^2_\theta \bar{\ell}_n(\tilde{\theta}_n)(\theta_0 - \hat{\theta}_n) = -\nabla^2_\theta \bar{\ell}_n(\tilde{\theta}_n)(\theta_0 - \hat{\theta}_n),
\end{align*}
where $\tilde{\theta}_n$ lies on the segment $[\theta_0, \hat{\theta}_n]$ and we used $\nabla_\theta \bar{\ell}_n(\hat{\theta}_n) = 0$. Define $\bar{H}_n := -\nabla^2_\theta \bar{\ell}_n(\tilde{\theta}_n)$, which satisfies $\bar{H}_n \xrightarrow{\mathbb{P}} I(\theta_0)$. Then $\sqrt{n}\, \nabla_\theta \bar{\ell}_n(\theta_0) = \bar{H}_n \Delta_n$, and
\begin{align*}
T_n(\theta_0) = \Delta_n^\top \bar{H}_n^\top I(\theta_0)^{-1} \bar{H}_n \Delta_n.
\end{align*}
Since $\bar{H}_n \xrightarrow{\mathbb{P}} I(\theta_0)$, the matrix $\bar{H}_n^\top I(\theta_0)^{-1} \bar{H}_n \xrightarrow{\mathbb{P}} I(\theta_0)^\top I(\theta_0)^{-1} I(\theta_0) = I(\theta_0)$ (using symmetry of $I(\theta_0)$). Therefore
\begin{align*}
T_n(\theta_0) = \Delta_n^\top I(\theta_0) \Delta_n + o_{\mathbb{P}}(1),
\end{align*}
by the same operator-norm bound used in the previous steps.
[guided]
Why does the score statistic also reduce to $\Delta_n^\top I(\theta_0) \Delta_n$? The score at $\theta_0$ is linked to $\Delta_n = \sqrt{n}(\hat{\theta}_n - \theta_0)$ through the Taylor expansion of the score equation. Specifically, since the MLE zeroes the score, $\nabla_\theta \bar{\ell}_n(\hat{\theta}_n) = 0$, and expanding around $\hat{\theta}_n$:
\begin{align*}
\nabla_\theta \bar{\ell}_n(\theta_0) = \nabla^2_\theta \bar{\ell}_n(\tilde{\theta}_n)(\theta_0 - \hat{\theta}_n) = -\bar{H}_n \cdot \frac{1}{\sqrt{n}} \Delta_n \cdot \frac{1}{\sqrt{n}},
\end{align*}
so $\frac{1}{\sqrt{n}} S_n(\theta_0) = \sqrt{n}\, \nabla_\theta \bar{\ell}_n(\theta_0) = \bar{H}_n \Delta_n$. Substituting into the score statistic:
\begin{align*}
T_n(\theta_0) = (\bar{H}_n \Delta_n)^\top I(\theta_0)^{-1} (\bar{H}_n \Delta_n) = \Delta_n^\top \bar{H}_n^\top I(\theta_0)^{-1} \bar{H}_n \Delta_n.
\end{align*}
Now $\bar{H}_n^\top I(\theta_0)^{-1} \bar{H}_n - I(\theta_0) \xrightarrow{\mathbb{P}} 0$ because $\bar{H}_n \xrightarrow{\mathbb{P}} I(\theta_0)$ and matrix multiplication is continuous. The error $|\Delta_n^\top (\bar{H}_n^\top I(\theta_0)^{-1} \bar{H}_n - I(\theta_0)) \Delta_n| \leq |\Delta_n|^2 \|\bar{H}_n^\top I(\theta_0)^{-1} \bar{H}_n - I(\theta_0)\|_{\mathrm{op}} = o_{\mathbb{P}}(1)$.
[/guided]
[/step]
[step:Conclude asymptotic equivalence by combining the three representations]
From the three preceding steps:
\begin{align*}
W_n(\theta_0) &= \Delta_n^\top I(\theta_0) \Delta_n + o_{\mathbb{P}}(1), \\
\Lambda_n(\Theta, \{\theta_0\}) &= \Delta_n^\top I(\theta_0) \Delta_n + o_{\mathbb{P}}(1), \\
T_n(\theta_0) &= \Delta_n^\top I(\theta_0) \Delta_n + o_{\mathbb{P}}(1).
\end{align*}
Subtracting pairwise:
\begin{align*}
W_n(\theta_0) - \Lambda_n(\Theta, \{\theta_0\}) &= o_{\mathbb{P}}(1), \\
W_n(\theta_0) - T_n(\theta_0) &= o_{\mathbb{P}}(1).
\end{align*}
Since all three statistics share the same asymptotic distribution $\chi^2_p$ (established in the [Wald Confidence Ellipsoids](/theorems/1863), [Wilks' Theorem](/theorems/1864), and [Asymptotic Distribution of the Score Statistic](/theorems/1865)), and their pairwise differences vanish in probability, any test that rejects $H_0$ when the statistic exceeds a threshold $\xi_\alpha$ (the $(1-\alpha)$-quantile of $\chi^2_p$) has the same asymptotic size $\alpha$. Under a sequence of local alternatives $\theta_n = \theta_0 + h/\sqrt{n}$, all three statistics have the same limiting non-central $\chi^2$ distribution (since the $o_{\mathbb{P}}(1)$ remainders do not depend on whether the true parameter is $\theta_0$ or $\theta_n$), so the asymptotic power is also identical.
[/step]