[proofplan]
The proof is the standard likelihood expansion argument. Consistency places the maximiser in an interior neighbourhood of the true parameter, so the score equation holds at $\hat{\vartheta}_n$. Taylor expansion of the score around $\vartheta_0$ expresses $\sqrt n(\hat{\vartheta}_n-\vartheta_0)$ as the inverse observed information times the normalised score. The ARMA hypotheses enter through the stated standard moment and regularity conditions, which supply consistency, score asymptotic normality, and observed-information convergence. These assumed score and Hessian limits then give the claimed normal limit.
[/proofplan]
[step:Use consistency to place the estimator in the regular neighbourhood]
Let $U \subset \Theta$ be the neighbourhood of $\vartheta_0$ from the regularity hypotheses. The ARMA causality, invertibility, identifiability, moment, and no-common-zero assumptions are used here through those likelihood regularity hypotheses: they give consistency, twice continuous differentiability of the log-likelihood on $U$ with probability tending to one, validity of the first-order likelihood equation at any interior maximum, the score central limit theorem, and local uniform observed-information convergence in shrinking neighbourhoods of $\vartheta_0$. Thus the proof below is the abstract likelihood argument after the ARMA-specific regularity facts have been assumed. Since $\vartheta_0$ is an interior point of $\Theta$, choose $r>0$ such that
\begin{align*}
B(\vartheta_0,r) := \{\vartheta \in \mathbb{R}^{p+q} : |\vartheta-\vartheta_0| < r\} \subset U \subset \Theta,
\end{align*}
where $|\cdot|$ denotes the Euclidean norm on $\mathbb{R}^{p+q}$. Because $\hat{\vartheta}_n \xrightarrow{\mathbb{P}} \vartheta_0$, we have
\begin{align*}
\mathbb{P}\bigl(\hat{\vartheta}_n \in B(\vartheta_0,r)\bigr) \to 1.
\end{align*}
Let $E_n$ denote the event that $\hat{\vartheta}_n \in B(\vartheta_0,r)$ and that $\ell_n$ is twice continuously differentiable on $U$. By consistency and the differentiability regularity hypothesis, $\mathbb{P}(E_n) \to 1$. On $E_n$, $\hat{\vartheta}_n$ is an interior maximiser of $\ell_n$ on the differentiability neighbourhood $U$. Therefore the first-order likelihood equation holds:
\begin{align*}
\nabla \ell_n(\hat{\vartheta}_n)=0.
\end{align*}
[/step]
[step:Expand the score around the true parameter using the integral Taylor formula]
Define the score map
\begin{align*}
S_n: U &\to \mathbb{R}^{p+q},\\
\vartheta &\mapsto \nabla \ell_n(\vartheta),
\end{align*}
and define the observed Hessian matrix map as the Jacobian of the score,
\begin{align*}
H_n: U &\to \mathbb{R}^{(p+q)\times(p+q)},\\
\vartheta &\mapsto JS_{n,\vartheta}.
\end{align*}
Equivalently, the $(i,j)$ entry of $H_n(\vartheta)$ is $\partial_{\vartheta_j}S_{n,i}(\vartheta)$ for $1 \le i,j \le p+q$. Let $\mathcal{L}^1$ denote one-dimensional Lebesgue measure on $[0,1]$. On $E_n$, the line segment from $\vartheta_0$ to $\hat{\vartheta}_n$ is contained in $B(\vartheta_0,r) \subset U$, and $S_n$ is continuously differentiable on that segment. Define the pathwise score curve
\begin{align*}
g_n: [0,1] &\to \mathbb{R}^{p+q},\\
t &\mapsto S_n\bigl(\vartheta_0+t(\hat{\vartheta}_n-\vartheta_0)\bigr).
\end{align*}
By the [Fundamental Theorem of Calculus](/theorems/632) applied componentwise to $g_n$, whose components are continuously differentiable on $[0,1]$, we obtain
\begin{align*}
S_n(\hat{\vartheta}_n)-S_n(\vartheta_0)
=
\int_0^1 H_n\bigl(\vartheta_0+t(\hat{\vartheta}_n-\vartheta_0)\bigr)(\hat{\vartheta}_n-\vartheta_0)\,d\mathcal{L}^1(t).
\end{align*}
Define the averaged observed Hessian matrix
\begin{align*}
\bar{H}_n: E_n &\to \mathbb{R}^{(p+q)\times(p+q)},\\
\omega &\mapsto \int_0^1 H_n\bigl(\vartheta_0+t(\hat{\vartheta}_n(\omega)-\vartheta_0)\bigr)\,d\mathcal{L}^1(t).
\end{align*}
Since $S_n(\hat{\vartheta}_n)=0$ on $E_n$, we obtain
\begin{align*}
0
=
S_n(\vartheta_0)+\bar{H}_n(\hat{\vartheta}_n-\vartheta_0).
\end{align*}
Multiplying by $n^{-1/2}$ and rearranging gives
\begin{align*}
-\frac{1}{n}\bar{H}_n\sqrt n(\hat{\vartheta}_n-\vartheta_0)
=
\frac{1}{\sqrt n}S_n(\vartheta_0).
\end{align*}
[guided]
We expand the score, not the likelihood itself, because the estimator is characterised by the first-order condition. Define the score map
\begin{align*}
S_n: U &\to \mathbb{R}^{p+q},\\
\vartheta &\mapsto \nabla \ell_n(\vartheta),
\end{align*}
and define the observed Hessian matrix map as the Jacobian of the score,
\begin{align*}
H_n: U &\to \mathbb{R}^{(p+q)\times(p+q)},\\
\vartheta &\mapsto JS_{n,\vartheta}.
\end{align*}
Thus the $(i,j)$ entry of $H_n(\vartheta)$ is $\partial_{\vartheta_j}S_{n,i}(\vartheta)$ for $1 \le i,j \le p+q$. Let $\mathcal{L}^1$ denote one-dimensional Lebesgue measure on $[0,1]$. This avoids using a scalar one-dimensional mean value theorem in a vector setting. In several parameters, a componentwise Taylor theorem need not produce one common intermediate point for every component of the score, so we use the integral Taylor formula instead.
On $E_n$, the estimator belongs to $B(\vartheta_0,r)$, and this ball is convex as a Euclidean ball in $\mathbb{R}^{p+q}$. Therefore for every $t \in [0,1]$, the parameter
\begin{align*}
\vartheta_{n,t}:=\vartheta_0+t(\hat{\vartheta}_n-\vartheta_0)
\end{align*}
belongs to $B(\vartheta_0,r) \subset U$. The same event $E_n$ also gives twice continuous differentiability of $\ell_n$ on $U$, hence continuous differentiability of $S_n$ on the whole segment $\{\vartheta_{n,t}:0\le t\le 1\}$. These hypotheses allow us to reduce the vector-valued expansion to one-dimensional calculus along the line segment. Define the pathwise score curve
\begin{align*}
g_n: [0,1] &\to \mathbb{R}^{p+q},\\
t &\mapsto S_n(\vartheta_{n,t}).
\end{align*}
Because $S_n$ is continuously differentiable on the segment, every component of $g_n$ is continuously differentiable on $[0,1]$. By the [Fundamental Theorem of Calculus](/theorems/632) applied componentwise to $g_n$,
\begin{align*}
S_n(\hat{\vartheta}_n)-S_n(\vartheta_0)
=
g_n(1)-g_n(0)
=
\int_0^1 H_n(\vartheta_{n,t})(\hat{\vartheta}_n-\vartheta_0)\,d\mathcal{L}^1(t).
\end{align*}
Since the vector $\hat{\vartheta}_n-\vartheta_0$ is independent of the integration variable $t$, we may factor it to the right of the matrix integral. Define the averaged observed Hessian matrix
\begin{align*}
\bar{H}_n: E_n &\to \mathbb{R}^{(p+q)\times(p+q)},\\
\omega &\mapsto \int_0^1 H_n\bigl(\vartheta_0+t(\hat{\vartheta}_n(\omega)-\vartheta_0)\bigr)\,d\mathcal{L}^1(t).
\end{align*}
Then the Taylor identity becomes
\begin{align*}
S_n(\hat{\vartheta}_n)
=
S_n(\vartheta_0)+\bar{H}_n(\hat{\vartheta}_n-\vartheta_0).
\end{align*}
Because $\hat{\vartheta}_n$ is an interior maximiser on $E_n$, the first-order likelihood equation gives $S_n(\hat{\vartheta}_n)=0$. Hence
\begin{align*}
0
=
S_n(\vartheta_0)+\bar{H}_n(\hat{\vartheta}_n-\vartheta_0).
\end{align*}
Multiplying by $n^{-1/2}$ and moving the Hessian term to the left gives
\begin{align*}
-\frac{1}{n}\bar{H}_n\sqrt n(\hat{\vartheta}_n-\vartheta_0)
=
\frac{1}{\sqrt n}S_n(\vartheta_0).
\end{align*}
This identity is the mechanism of the proof: the left side contains the averaged observed information, while the right side is the normalised score.
[/guided]
[/step]
[step:Show the averaged observed information converges to Fisher information]
For $t \in [0,1]$, define the random parameter
\begin{align*}
\vartheta_{n,t}:=\vartheta_0+t(\hat{\vartheta}_n-\vartheta_0).
\end{align*}
For a matrix $A \in \mathbb{R}^{(p+q)\times(p+q)}$, let $\|A\|_{\mathrm{op}}$ denote the operator norm induced by the Euclidean norm on $\mathbb{R}^{p+q}$:
\begin{align*}
\|A\|_{\mathrm{op}}:=\sup\{|Av|:v\in\mathbb{R}^{p+q},\ |v|=1\}.
\end{align*}
Since
\begin{align*}
|\vartheta_{n,t}-\vartheta_0|
=
t|\hat{\vartheta}_n-\vartheta_0|
\le |\hat{\vartheta}_n-\vartheta_0|,
\end{align*}
the whole segment $\{\vartheta_{n,t}:0\le t\le 1\}$ lies in the random ball $B(\vartheta_0,|\hat{\vartheta}_n-\vartheta_0|)$. Because $\hat{\vartheta}_n \xrightarrow{\mathbb{P}} \vartheta_0$, for every fixed $\delta>0$,
\begin{align*}
\mathbb{P}\bigl(|\hat{\vartheta}_n-\vartheta_0|>\delta\bigr) \to 0.
\end{align*}
The assumed local uniform observed-Hessian convergence says that
\begin{align*}
\sup_{\vartheta \in B(\vartheta_0,\delta)}\left\| -\frac{1}{n}H_n(\vartheta)-I(\vartheta_0)\right\|_{\mathrm{op}}
\xrightarrow{\mathbb{P}}
0
\end{align*}
as first $n\to\infty$ and then $\delta\downarrow 0$. Hence, for every $\varepsilon>0$ and $\eta>0$, choose $\delta>0$ small enough for the local uniform convergence bound and then use consistency to obtain
\begin{align*}
\mathbb{P}\left(
\sup_{0\le t\le 1}\left\| -\frac{1}{n}H_n(\vartheta_{n,t})-I(\vartheta_0)\right\|_{\mathrm{op}}>\varepsilon
\right)
&\le
\mathbb{P}\bigl(|\hat{\vartheta}_n-\vartheta_0|>\delta\bigr)\\
&\quad+
\mathbb{P}\left(
\sup_{\vartheta \in B(\vartheta_0,\delta)}\left\| -\frac{1}{n}H_n(\vartheta)-I(\vartheta_0)\right\|_{\mathrm{op}}>\varepsilon
\right)\\
&<\eta
\end{align*}
for all sufficiently large $n$. Therefore
\begin{align*}
\sup_{0\le t\le 1}\left\| -\frac{1}{n}H_n(\vartheta_{n,t})-I(\vartheta_0)\right\|_{\mathrm{op}}
\xrightarrow{\mathbb{P}}
0.
\end{align*}
Using the triangle inequality for the operator norm and the definition of $\bar{H}_n$, we obtain
\begin{align*}
\left\|-\frac{1}{n}\bar{H}_n-I(\vartheta_0)\right\|_{\mathrm{op}}
&=
\left\|\int_0^1\left(-\frac{1}{n}H_n(\vartheta_{n,t})-I(\vartheta_0)\right)\,d\mathcal{L}^1(t)\right\|_{\mathrm{op}}\\
&\le
\int_0^1\left\|-\frac{1}{n}H_n(\vartheta_{n,t})-I(\vartheta_0)\right\|_{\mathrm{op}}\,d\mathcal{L}^1(t)\\
&\le
\sup_{0\le t\le 1}\left\| -\frac{1}{n}H_n(\vartheta_{n,t})-I(\vartheta_0)\right\|_{\mathrm{op}},
\end{align*}
so
\begin{align*}
-\frac{1}{n}\bar{H}_n
\xrightarrow{\mathbb{P}}
I(\vartheta_0).
\end{align*}
Because $I(\vartheta_0)$ is nonsingular, choose $\rho>0$ such that every matrix $A \in \mathbb{R}^{(p+q)\times(p+q)}$ with $\|A-I(\vartheta_0)\|_{\mathrm{op}}<\rho$ is nonsingular. The convergence
\begin{align*}
-\frac{1}{n}\bar{H}_n
\xrightarrow{\mathbb{P}}
I(\vartheta_0)
\end{align*}
therefore implies that $-n^{-1}\bar{H}_n$ is nonsingular with probability tending to one. On this event, the adjugate formula for matrix inversion and continuity of determinant and cofactors give continuity of the inversion map at $I(\vartheta_0)$, hence
\begin{align*}
\left(-\frac{1}{n}\bar{H}_n\right)^{-1}
\xrightarrow{\mathbb{P}}
I(\vartheta_0)^{-1}.
\end{align*}
[/step]
[step:Combine the score limit with the information limit]
From the Taylor identity,
\begin{align*}
\sqrt n(\hat{\vartheta}_n-\vartheta_0)
=
\left(-\frac{1}{n}\bar{H}_n\right)^{-1}
\frac{1}{\sqrt n}S_n(\vartheta_0)
\end{align*}
with probability tending to one. The score regularity assumption gives
\begin{align*}
\frac{1}{\sqrt n}S_n(\vartheta_0)
\xrightarrow{d}
\mathcal{N}(0,I(\vartheta_0)).
\end{align*}
Together with
\begin{align*}
\left(-\frac{1}{n}\bar{H}_n\right)^{-1}
\xrightarrow{\mathbb{P}}
I(\vartheta_0)^{-1},
\end{align*}
the [Slutsky Theorem](/page/Slutsky%20Theorem) yields
\begin{align*}
\sqrt n(\hat{\vartheta}_n-\vartheta_0)
\xrightarrow{d}
I(\vartheta_0)^{-1}Z,
\end{align*}
where $Z$ is a random vector with distribution $\mathcal{N}(0,I(\vartheta_0))$. Therefore
\begin{align*}
I(\vartheta_0)^{-1}Z
\sim
\mathcal{N}\left(0,I(\vartheta_0)^{-1}I(\vartheta_0)I(\vartheta_0)^{-1}\right)
=
\mathcal{N}(0,I(\vartheta_0)^{-1}).
\end{align*}
This proves
\begin{align*}
\sqrt n(\hat{\vartheta}_n-\vartheta_0)
\xrightarrow{d}
\mathcal{N}(0,I(\vartheta_0)^{-1}).
\end{align*}
[/step]