Asymptotic Normality of ARMA Maximum Likelihood Estimators

Asymptotic Normality of ARMA Maximum Likelihood Estimators (Theorem # 3633)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] The proof is the standard likelihood expansion argument. Consistency places the maximiser in an interior neighbourhood of the true parameter, so the score equation holds at $\hat{\vartheta}_n$. Taylor expansion of the score around $\vartheta_0$ expresses $\sqrt n(\hat{\vartheta}_n-\vartheta_0)$ as the inverse observed information times the normalised score. The ARMA hypotheses enter through the stated standard moment and regularity conditions, which supply consistency, score asymptotic normality, and observed-information convergence. These assumed score and Hessian limits then give the claimed normal limit. [/proofplan] [step:Use consistency to place the estimator in the regular neighbourhood] Let $U \subset \Theta$ be the neighbourhood of $\vartheta_0$ from the regularity hypotheses. The ARMA causality, invertibility, identifiability, moment, and no-common-zero assumptions are used here through those likelihood regularity hypotheses: they give consistency, twice continuous differentiability of the log-likelihood on $U$ with probability tending to one, validity of the first-order likelihood equation at any interior maximum, the score central limit theorem, and local uniform observed-information convergence in shrinking neighbourhoods of $\vartheta_0$. Thus the proof below is the abstract likelihood argument after the ARMA-specific regularity facts have been assumed. Since $\vartheta_0$ is an interior point of $\Theta$, choose $r>0$ such that \begin{align*} B(\vartheta_0,r) := \{\vartheta \in \mathbb{R}^{p+q} : |\vartheta-\vartheta_0| < r\} \subset U \subset \Theta, \end{align*} where $|\cdot|$ denotes the Euclidean norm on $\mathbb{R}^{p+q}$. Because $\hat{\vartheta}_n \xrightarrow{\mathbb{P}} \vartheta_0$, we have \begin{align*} \mathbb{P}\bigl(\hat{\vartheta}_n \in B(\vartheta_0,r)\bigr) \to 1. \end{align*} Let $E_n$ denote the event that $\hat{\vartheta}_n \in B(\vartheta_0,r)$ and that $\ell_n$ is twice continuously differentiable on $U$. By consistency and the differentiability regularity hypothesis, $\mathbb{P}(E_n) \to 1$. On $E_n$, $\hat{\vartheta}_n$ is an interior maximiser of $\ell_n$ on the differentiability neighbourhood $U$. Therefore the first-order likelihood equation holds: \begin{align*} \nabla \ell_n(\hat{\vartheta}_n)=0. \end{align*} [/step] [step:Expand the score around the true parameter using the integral Taylor formula] Define the score map \begin{align*} S_n: U &\to \mathbb{R}^{p+q},\\ \vartheta &\mapsto \nabla \ell_n(\vartheta), \end{align*} and define the observed Hessian matrix map as the Jacobian of the score, \begin{align*} H_n: U &\to \mathbb{R}^{(p+q)\times(p+q)},\\ \vartheta &\mapsto JS_{n,\vartheta}. \end{align*} Equivalently, the $(i,j)$ entry of $H_n(\vartheta)$ is $\partial_{\vartheta_j}S_{n,i}(\vartheta)$ for $1 \le i,j \le p+q$. Let $\mathcal{L}^1$ denote one-dimensional Lebesgue measure on $[0,1]$. On $E_n$, the line segment from $\vartheta_0$ to $\hat{\vartheta}_n$ is contained in $B(\vartheta_0,r) \subset U$, and $S_n$ is continuously differentiable on that segment. Define the pathwise score curve \begin{align*} g_n: [0,1] &\to \mathbb{R}^{p+q},\\ t &\mapsto S_n\bigl(\vartheta_0+t(\hat{\vartheta}_n-\vartheta_0)\bigr). \end{align*} By the [Fundamental Theorem of Calculus](/theorems/632) applied componentwise to $g_n$, whose components are continuously differentiable on $[0,1]$, we obtain \begin{align*} S_n(\hat{\vartheta}_n)-S_n(\vartheta_0) = \int_0^1 H_n\bigl(\vartheta_0+t(\hat{\vartheta}_n-\vartheta_0)\bigr)(\hat{\vartheta}_n-\vartheta_0)\,d\mathcal{L}^1(t). \end{align*} Define the averaged observed Hessian matrix \begin{align*} \bar{H}_n: E_n &\to \mathbb{R}^{(p+q)\times(p+q)},\\ \omega &\mapsto \int_0^1 H_n\bigl(\vartheta_0+t(\hat{\vartheta}_n(\omega)-\vartheta_0)\bigr)\,d\mathcal{L}^1(t). \end{align*} Since $S_n(\hat{\vartheta}_n)=0$ on $E_n$, we obtain \begin{align*} 0 = S_n(\vartheta_0)+\bar{H}_n(\hat{\vartheta}_n-\vartheta_0). \end{align*} Multiplying by $n^{-1/2}$ and rearranging gives \begin{align*} -\frac{1}{n}\bar{H}_n\sqrt n(\hat{\vartheta}_n-\vartheta_0) = \frac{1}{\sqrt n}S_n(\vartheta_0). \end{align*} [guided] We expand the score, not the likelihood itself, because the estimator is characterised by the first-order condition. Define the score map \begin{align*} S_n: U &\to \mathbb{R}^{p+q},\\ \vartheta &\mapsto \nabla \ell_n(\vartheta), \end{align*} and define the observed Hessian matrix map as the Jacobian of the score, \begin{align*} H_n: U &\to \mathbb{R}^{(p+q)\times(p+q)},\\ \vartheta &\mapsto JS_{n,\vartheta}. \end{align*} Thus the $(i,j)$ entry of $H_n(\vartheta)$ is $\partial_{\vartheta_j}S_{n,i}(\vartheta)$ for $1 \le i,j \le p+q$. Let $\mathcal{L}^1$ denote one-dimensional Lebesgue measure on $[0,1]$. This avoids using a scalar one-dimensional mean value theorem in a vector setting. In several parameters, a componentwise Taylor theorem need not produce one common intermediate point for every component of the score, so we use the integral Taylor formula instead. On $E_n$, the estimator belongs to $B(\vartheta_0,r)$, and this ball is convex as a Euclidean ball in $\mathbb{R}^{p+q}$. Therefore for every $t \in [0,1]$, the parameter \begin{align*} \vartheta_{n,t}:=\vartheta_0+t(\hat{\vartheta}_n-\vartheta_0) \end{align*} belongs to $B(\vartheta_0,r) \subset U$. The same event $E_n$ also gives twice continuous differentiability of $\ell_n$ on $U$, hence continuous differentiability of $S_n$ on the whole segment $\{\vartheta_{n,t}:0\le t\le 1\}$. These hypotheses allow us to reduce the vector-valued expansion to one-dimensional calculus along the line segment. Define the pathwise score curve \begin{align*} g_n: [0,1] &\to \mathbb{R}^{p+q},\\ t &\mapsto S_n(\vartheta_{n,t}). \end{align*} Because $S_n$ is continuously differentiable on the segment, every component of $g_n$ is continuously differentiable on $[0,1]$. By the [Fundamental Theorem of Calculus](/theorems/632) applied componentwise to $g_n$, \begin{align*} S_n(\hat{\vartheta}_n)-S_n(\vartheta_0) = g_n(1)-g_n(0) = \int_0^1 H_n(\vartheta_{n,t})(\hat{\vartheta}_n-\vartheta_0)\,d\mathcal{L}^1(t). \end{align*} Since the vector $\hat{\vartheta}_n-\vartheta_0$ is independent of the integration variable $t$, we may factor it to the right of the matrix integral. Define the averaged observed Hessian matrix \begin{align*} \bar{H}_n: E_n &\to \mathbb{R}^{(p+q)\times(p+q)},\\ \omega &\mapsto \int_0^1 H_n\bigl(\vartheta_0+t(\hat{\vartheta}_n(\omega)-\vartheta_0)\bigr)\,d\mathcal{L}^1(t). \end{align*} Then the Taylor identity becomes \begin{align*} S_n(\hat{\vartheta}_n) = S_n(\vartheta_0)+\bar{H}_n(\hat{\vartheta}_n-\vartheta_0). \end{align*} Because $\hat{\vartheta}_n$ is an interior maximiser on $E_n$, the first-order likelihood equation gives $S_n(\hat{\vartheta}_n)=0$. Hence \begin{align*} 0 = S_n(\vartheta_0)+\bar{H}_n(\hat{\vartheta}_n-\vartheta_0). \end{align*} Multiplying by $n^{-1/2}$ and moving the Hessian term to the left gives \begin{align*} -\frac{1}{n}\bar{H}_n\sqrt n(\hat{\vartheta}_n-\vartheta_0) = \frac{1}{\sqrt n}S_n(\vartheta_0). \end{align*} This identity is the mechanism of the proof: the left side contains the averaged observed information, while the right side is the normalised score. [/guided] [/step] [step:Show the averaged observed information converges to Fisher information] For $t \in [0,1]$, define the random parameter \begin{align*} \vartheta_{n,t}:=\vartheta_0+t(\hat{\vartheta}_n-\vartheta_0). \end{align*} For a matrix $A \in \mathbb{R}^{(p+q)\times(p+q)}$, let $\|A\|_{\mathrm{op}}$ denote the operator norm induced by the Euclidean norm on $\mathbb{R}^{p+q}$: \begin{align*} \|A\|_{\mathrm{op}}:=\sup\{|Av|:v\in\mathbb{R}^{p+q},\ |v|=1\}. \end{align*} Since \begin{align*} |\vartheta_{n,t}-\vartheta_0| = t|\hat{\vartheta}_n-\vartheta_0| \le |\hat{\vartheta}_n-\vartheta_0|, \end{align*} the whole segment $\{\vartheta_{n,t}:0\le t\le 1\}$ lies in the random ball $B(\vartheta_0,|\hat{\vartheta}_n-\vartheta_0|)$. Because $\hat{\vartheta}_n \xrightarrow{\mathbb{P}} \vartheta_0$, for every fixed $\delta>0$, \begin{align*} \mathbb{P}\bigl(|\hat{\vartheta}_n-\vartheta_0|>\delta\bigr) \to 0. \end{align*} The assumed local uniform observed-Hessian convergence says that \begin{align*} \sup_{\vartheta \in B(\vartheta_0,\delta)}\left\| -\frac{1}{n}H_n(\vartheta)-I(\vartheta_0)\right\|_{\mathrm{op}} \xrightarrow{\mathbb{P}} 0 \end{align*} as first $n\to\infty$ and then $\delta\downarrow 0$. Hence, for every $\varepsilon>0$ and $\eta>0$, choose $\delta>0$ small enough for the local uniform convergence bound and then use consistency to obtain \begin{align*} \mathbb{P}\left( \sup_{0\le t\le 1}\left\| -\frac{1}{n}H_n(\vartheta_{n,t})-I(\vartheta_0)\right\|_{\mathrm{op}}>\varepsilon \right) &\le \mathbb{P}\bigl(|\hat{\vartheta}_n-\vartheta_0|>\delta\bigr)\\ &\quad+ \mathbb{P}\left( \sup_{\vartheta \in B(\vartheta_0,\delta)}\left\| -\frac{1}{n}H_n(\vartheta)-I(\vartheta_0)\right\|_{\mathrm{op}}>\varepsilon \right)\\ &<\eta \end{align*} for all sufficiently large $n$. Therefore \begin{align*} \sup_{0\le t\le 1}\left\| -\frac{1}{n}H_n(\vartheta_{n,t})-I(\vartheta_0)\right\|_{\mathrm{op}} \xrightarrow{\mathbb{P}} 0. \end{align*} Using the triangle inequality for the operator norm and the definition of $\bar{H}_n$, we obtain \begin{align*} \left\|-\frac{1}{n}\bar{H}_n-I(\vartheta_0)\right\|_{\mathrm{op}} &= \left\|\int_0^1\left(-\frac{1}{n}H_n(\vartheta_{n,t})-I(\vartheta_0)\right)\,d\mathcal{L}^1(t)\right\|_{\mathrm{op}}\\ &\le \int_0^1\left\|-\frac{1}{n}H_n(\vartheta_{n,t})-I(\vartheta_0)\right\|_{\mathrm{op}}\,d\mathcal{L}^1(t)\\ &\le \sup_{0\le t\le 1}\left\| -\frac{1}{n}H_n(\vartheta_{n,t})-I(\vartheta_0)\right\|_{\mathrm{op}}, \end{align*} so \begin{align*} -\frac{1}{n}\bar{H}_n \xrightarrow{\mathbb{P}} I(\vartheta_0). \end{align*} Because $I(\vartheta_0)$ is nonsingular, choose $\rho>0$ such that every matrix $A \in \mathbb{R}^{(p+q)\times(p+q)}$ with $\|A-I(\vartheta_0)\|_{\mathrm{op}}<\rho$ is nonsingular. The convergence \begin{align*} -\frac{1}{n}\bar{H}_n \xrightarrow{\mathbb{P}} I(\vartheta_0) \end{align*} therefore implies that $-n^{-1}\bar{H}_n$ is nonsingular with probability tending to one. On this event, the adjugate formula for matrix inversion and continuity of determinant and cofactors give continuity of the inversion map at $I(\vartheta_0)$, hence \begin{align*} \left(-\frac{1}{n}\bar{H}_n\right)^{-1} \xrightarrow{\mathbb{P}} I(\vartheta_0)^{-1}. \end{align*} [/step] [step:Combine the score limit with the information limit] From the Taylor identity, \begin{align*} \sqrt n(\hat{\vartheta}_n-\vartheta_0) = \left(-\frac{1}{n}\bar{H}_n\right)^{-1} \frac{1}{\sqrt n}S_n(\vartheta_0) \end{align*} with probability tending to one. The score regularity assumption gives \begin{align*} \frac{1}{\sqrt n}S_n(\vartheta_0) \xrightarrow{d} \mathcal{N}(0,I(\vartheta_0)). \end{align*} Together with \begin{align*} \left(-\frac{1}{n}\bar{H}_n\right)^{-1} \xrightarrow{\mathbb{P}} I(\vartheta_0)^{-1}, \end{align*} the [Slutsky Theorem](/page/Slutsky%20Theorem) yields \begin{align*} \sqrt n(\hat{\vartheta}_n-\vartheta_0) \xrightarrow{d} I(\vartheta_0)^{-1}Z, \end{align*} where $Z$ is a random vector with distribution $\mathcal{N}(0,I(\vartheta_0))$. Therefore \begin{align*} I(\vartheta_0)^{-1}Z \sim \mathcal{N}\left(0,I(\vartheta_0)^{-1}I(\vartheta_0)I(\vartheta_0)^{-1}\right) = \mathcal{N}(0,I(\vartheta_0)^{-1}). \end{align*} This proves \begin{align*} \sqrt n(\hat{\vartheta}_n-\vartheta_0) \xrightarrow{d} \mathcal{N}(0,I(\vartheta_0)^{-1}). \end{align*} [/step]

Explore Further

Fisher–Neyman Sufficiency of the Sample Mean and Sample Covariance for the Multivariate Normal Family probability Scheffe Simultaneous Confidence Intervals for Linear Combinations of a Multivariate Mean probability Bartlett's Chi-Squared Approximation for Wilks' Lambda in One-Way MANOVA probability Linear Forecast from the Wold Representation probability Orthogonal Equivariance of Covariance Principal Components probability Independence of the Sample Mean and Sample Covariance for Multivariate Normal Samples probability Ledoit-Wolf Linear Shrinkage Optimality Theorem probability Existence and Uniqueness of Population Canonical Directions probability

What brings you to Androma?

Start with a route through the knowledge graph.