Asymptotic Normality of Z-Estimators — Statement & Proof

Asymptotic Normality of Z-Estimators (Theorem # 9865)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] Set $\delta_n:=\hat\theta_n-\theta_0$ and first use the approximate estimating equation together with stochastic equicontinuity to express $\Psi(\hat\theta_n)$ as $-\Psi_n(\theta_0)$ up to an $o_{\mathbb P}(n^{-1/2})$ error. This shows that $\sqrt n\,\Psi(\hat\theta_n)$ is bounded in probability. Nonsingularity of the derivative $A$ and differentiability of $\Psi$ then convert this bound into the root-$n$ tightness of $\delta_n$. With that rate available, the differentiability remainder becomes $o_{\mathbb P}(n^{-1/2})$, so the estimating equation linearizes to $\sqrt n\,\delta_n=-A^{-1}\sqrt n\,\Psi_n(\theta_0)+o_{\mathbb P}(1)$, and the conclusion follows from Slutsky's theorem and the [continuous mapping theorem](/theorems/1847). [/proofplan] [step:Introduce the estimation error and rewrite the estimating equation] Define the random vector \begin{align*} \delta_n:\Omega\to\mathbb R^p \end{align*} by \begin{align*} \delta_n(\omega):=\hat\theta_n(\omega)-\theta_0. \end{align*} Since $\hat\theta_n\xrightarrow{\mathbb P}\theta_0$, we have $\delta_n\xrightarrow{\mathbb P}0$. For each $n\in\mathbb N$, define the random vector \begin{align*} R_n:\Omega\to\mathbb R^p \end{align*} by \begin{align*} R_n:=(\Psi_n-\Psi)(\hat\theta_n)-(\Psi_n-\Psi)(\theta_0). \end{align*} The stochastic equicontinuity assumption says precisely that \begin{align*} \sqrt n\,|R_n|=o_{\mathbb P}(1). \end{align*} Using $\Psi(\theta_0)=0$, we compute \begin{align*} \Psi(\hat\theta_n)+\Psi_n(\theta_0)=\Psi_n(\hat\theta_n)-R_n. \end{align*} Multiplying by $\sqrt n$ gives \begin{align*} \sqrt n\,\left|\Psi(\hat\theta_n)+\Psi_n(\theta_0)\right|\le \sqrt n\,|\Psi_n(\hat\theta_n)|+\sqrt n\,|R_n|. \end{align*} The residual assumption gives $\sqrt n\,|\Psi_n(\hat\theta_n)|=o_{\mathbb P}(1)$, and the stochastic equicontinuity assumption gives $\sqrt n\,|R_n|=o_{\mathbb P}(1)$. Hence \begin{align*} \Psi(\hat\theta_n)=-\Psi_n(\theta_0)+o_{\mathbb P}(n^{-1/2}). \end{align*} [/step] [step:Deduce that $\Psi(\hat\theta_n)$ is of order $n^{-1/2}$] Because \begin{align*} \sqrt n\,\Psi_n(\theta_0)\xrightarrow{d}Z, \end{align*} the sequence $(\sqrt n\,\Psi_n(\theta_0))_{n\ge 1}$ is tight in $\mathbb R^p$. In the notation used below, $Y_n=O_{\mathbb P}(a_n)$ for positive deterministic numbers $a_n$ means that $Y_n/a_n$ is bounded in probability. Therefore \begin{align*} \Psi_n(\theta_0)=O_{\mathbb P}(n^{-1/2}). \end{align*} Combining this with the conclusion of the previous step yields \begin{align*} \Psi(\hat\theta_n)=O_{\mathbb P}(n^{-1/2}). \end{align*} [/step] [step:Use nonsingularity of $A$ to obtain the root-$n$ rate] Since $\Psi$ is differentiable at $\theta_0$ with [Jacobian matrix](/page/Jacobian%20Matrix) $A$, there exists a deterministic remainder map \begin{align*} r:U\to\mathbb R^p \end{align*} defined on a neighbourhood $U\subset\mathbb R^p$ of $0$ such that $\theta_0+h\in\Theta$ for every $h\in U$ and \begin{align*} \Psi(\theta_0+h)=Ah+r(h) \end{align*} with \begin{align*} \lim_{h\to 0}\frac{|r(h)|}{|h|}=0. \end{align*} Because $A$ is nonsingular, $A^{-1}$ exists. Define \begin{align*} c:=\frac{1}{2\|A^{-1}\|_{\mathrm{op}}}>0. \end{align*} For $\rho>0$, let $B(0,\rho):=\{h\in\mathbb R^p:|h|<\rho\}$ denote the open Euclidean ball in $\mathbb R^p$ with centre $0$ and radius $\rho$. Choose $\rho>0$ such that $B(0,\rho)\subset U$ and \begin{align*} |r(h)|\le c|h| \end{align*} for every $h\in B(0,\rho)$. For such $h$, the lower bound \begin{align*} |Ah|\ge \frac{1}{\|A^{-1}\|_{\mathrm{op}}}|h| \end{align*} gives \begin{align*} |\Psi(\theta_0+h)|\ge |Ah|-|r(h)|\ge c|h|. \end{align*} On the event $\{|\delta_n|<\rho\}$, this gives \begin{align*} |\delta_n|\le c^{-1}|\Psi(\hat\theta_n)|. \end{align*} Since $\delta_n\xrightarrow{\mathbb P}0$, the events $\{|\delta_n|<\rho\}$ have probability tending to $1$. Since $\Psi(\hat\theta_n)=O_{\mathbb P}(n^{-1/2})$, it follows that \begin{align*} \sqrt n\,\delta_n=O_{\mathbb P}(1). \end{align*} [guided] The subtle point is that the theorem does not assume the rate $\sqrt n(\hat\theta_n-\theta_0)=O_{\mathbb P}(1)$; we must derive it. We start from differentiability. There is a neighbourhood $U\subset\mathbb R^p$ of $0$ and a deterministic map \begin{align*} r:U\to\mathbb R^p \end{align*} such that $\theta_0+h\in\Theta$ for $h\in U$ and \begin{align*} \Psi(\theta_0+h)=Ah+r(h), \end{align*} where the differentiability remainder satisfies \begin{align*} \lim_{h\to 0}\frac{|r(h)|}{|h|}=0. \end{align*} The nonsingularity of $A$ supplies a local inverse lower bound. Since $A^{-1}$ exists, for every $h\in\mathbb R^p$ we have $h=A^{-1}Ah$, so the [operator norm](/page/Operator%20Norm) inequality gives \begin{align*} |h|\le \|A^{-1}\|_{\mathrm{op}}|Ah|. \end{align*} Equivalently, \begin{align*} |Ah|\ge \frac{1}{\|A^{-1}\|_{\mathrm{op}}}|h|. \end{align*} Define \begin{align*} c:=\frac{1}{2\|A^{-1}\|_{\mathrm{op}}}>0. \end{align*} For $\rho>0$, let $B(0,\rho):=\{h\in\mathbb R^p:|h|<\rho\}$ denote the open Euclidean ball in $\mathbb R^p$ with centre $0$ and radius $\rho$. Because $|r(h)|/|h|\to 0$, we may choose $\rho>0$ such that $B(0,\rho)\subset U$ and \begin{align*} |r(h)|\le c|h| \end{align*} for every $h\in B(0,\rho)$. Therefore, whenever $h\in B(0,\rho)$, \begin{align*} |\Psi(\theta_0+h)|=|Ah+r(h)|. \end{align*} The [reverse triangle inequality](/theorems/2300) gives \begin{align*} |\Psi(\theta_0+h)|\ge |Ah|-|r(h)|. \end{align*} Using the lower bound for $|Ah|$ and the chosen bound on $r(h)$, we obtain \begin{align*} |\Psi(\theta_0+h)|\ge \frac{1}{\|A^{-1}\|_{\mathrm{op}}}|h|-c|h|=c|h|. \end{align*} Now put $h=\delta_n=\hat\theta_n-\theta_0$. Since $\delta_n\xrightarrow{\mathbb P}0$, the event $\{|\delta_n|<\rho\}$ has probability tending to $1$. On that event, \begin{align*} |\delta_n|\le c^{-1}|\Psi(\hat\theta_n)|. \end{align*} The previous step proved \begin{align*} \Psi(\hat\theta_n)=O_{\mathbb P}(n^{-1/2}). \end{align*} Multiplying the displayed inequality by $\sqrt n$ on the high-probability event $\{|\delta_n|<\rho\}$ gives \begin{align*} \sqrt n\,|\delta_n|\le c^{-1}\sqrt n\,|\Psi(\hat\theta_n)|. \end{align*} The right-hand side is $O_{\mathbb P}(1)$, and the exceptional event has probability tending to $0$. Hence \begin{align*} \sqrt n\,\delta_n=O_{\mathbb P}(1). \end{align*} This is the rate needed before we are allowed to multiply the differentiability remainder by $\sqrt n$. [/guided] [/step] [step:Linearize $\Psi(\hat\theta_n)$ at $\theta_0$ with a negligible remainder] Using the same remainder map $r:U\to\mathbb R^p$ as above, define an extended remainder map $\tilde r:\mathbb R^p\to\mathbb R^p$ by $\tilde r(h):=r(h)$ for $h\in U$ and $\tilde r(h):=0$ for $h\notin U$. On the event $\{\delta_n\in U\}$ we have \begin{align*} \Psi(\hat\theta_n)=A\delta_n+\tilde r(\delta_n). \end{align*} Since $U$ is a neighbourhood of $0$ and $\delta_n\xrightarrow{\mathbb P}0$, the event $\{\delta_n\in U\}$ has probability tending to $1$. Since $\sqrt n\,\delta_n=O_{\mathbb P}(1)$, the differentiability condition implies \begin{align*} \sqrt n\,\tilde r(\delta_n)=o_{\mathbb P}(1). \end{align*} Indeed, define $q:\mathbb R^p\to[0,\infty)$ by $q(0):=0$, \begin{align*} q(h):=\frac{|r(h)|}{|h|} \end{align*} for $h\in U\setminus\{0\}$, and $q(h):=0$ for $h\notin U$. Then $q$ is continuous at $0$ in the sequential sense needed here, because $q(h)\to 0$ as $h\to 0$ with $h\in U$ and every sequence converging to $0$ is eventually in $U$. Hence $q(\delta_n)\xrightarrow{\mathbb P}0$, while $\sqrt n\,|\delta_n|=O_{\mathbb P}(1)$. Therefore \begin{align*} \sqrt n\,|\tilde r(\delta_n)|=q(\delta_n)\sqrt n\,|\delta_n|=o_{\mathbb P}(1). \end{align*} Therefore, since $\{\delta_n\in U\}$ has probability tending to $1$, \begin{align*} \sqrt n\,\Psi(\hat\theta_n)=A\sqrt n\,\delta_n+o_{\mathbb P}(1). \end{align*} [/step] [step:Rearrange the linearized equation and pass to the limit] From the first step, \begin{align*} \sqrt n\,\Psi(\hat\theta_n)=-\sqrt n\,\Psi_n(\theta_0)+o_{\mathbb P}(1). \end{align*} From the linearization step, \begin{align*} \sqrt n\,\Psi(\hat\theta_n)=A\sqrt n\,\delta_n+o_{\mathbb P}(1). \end{align*} Combining these two identities gives \begin{align*} A\sqrt n\,\delta_n=-\sqrt n\,\Psi_n(\theta_0)+o_{\mathbb P}(1). \end{align*} Multiplying by the fixed matrix $A^{-1}$ yields \begin{align*} \sqrt n\,\delta_n=-A^{-1}\sqrt n\,\Psi_n(\theta_0)+o_{\mathbb P}(1). \end{align*} By the continuous mapping theorem for the continuous [linear map](/page/Linear%20Map) $x\mapsto -A^{-1}x$ from $\mathbb R^p$ to $\mathbb R^p$, \begin{align*} -A^{-1}\sqrt n\,\Psi_n(\theta_0)\xrightarrow{d}-A^{-1}Z. \end{align*} By Slutsky's theorem applied to the preceding convergence and the $o_{\mathbb P}(1)$ remainder, we conclude that \begin{align*} \sqrt n(\hat\theta_n-\theta_0)=\sqrt n\,\delta_n\xrightarrow{d}-A^{-1}Z. \end{align*} This is the desired asymptotic normality statement. [/step]

Prerequisites (0/3 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Definitions & Concepts

Equicontinuity

Explore Further

What brings you to Androma?

Start with a route through the knowledge graph.

Asymptotic Normality of Z-Estimators (Theorem # 9865)

Discussion

Proof

Prerequisites (0/3 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Asymptotic Normality of Z-Estimators (Theorem # 9865)

Discussion

Proof

Prerequisites (0/3 completed)

Prerequisites Graph

Explore Further