Approximate Confidence Interval via MLE — Statement & Proof

Theorem

Proof

[proofplan] The proof exhibits an approximate $N(0,1)$ pivot for $\theta$ and inverts it. Under the regularity conditions, the MLE is asymptotically normal with asymptotic variance $1/I(\theta)$, so $\sqrt{n\,I(\theta)}(\hat\theta_n - \theta) \xrightarrow{d} N(0,1)$. Since $I(\hat\theta_n) \to I(\theta)$ in probability by continuity of $I$ and consistency of $\hat\theta_n$, Slutsky's theorem replaces $I(\theta)$ by $I(\hat\theta_n)$ without changing the limit. Inverting the inequality $|\sqrt{n\,I(\hat\theta_n)}(\hat\theta_n - \theta)| \le z_{(1-\gamma)/2}$ yields the stated interval, and its asymptotic coverage is $\gamma$ by the continuous mapping theorem together with the Portmanteau theorem. [/proofplan]

[step:Set up the asymptotic distribution of the MLE] Let $X_1, \ldots, X_n$ be i.i.d. with density $f(x;\theta)$, $\theta \in \Theta \subseteq \mathbb{R}$, and assume the standard regularity conditions: $\log f(x;\theta)$ is twice continuously differentiable in $\theta$; the Fisher information \begin{align*} I(\theta) &:= \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial \theta}\log f(X;\theta)\right)^2\right] \end{align*} exists, is finite, strictly positive, and continuous on $\Theta$; the true parameter $\theta$ lies in the interior of $\Theta$; and differentiation may be exchanged with integration against $f(x;\theta)\, d\mathcal{L}^1(x)$. Let $\hat\theta_n$ be the MLE computed from $X_1, \ldots, X_n$. Under these hypotheses the Asymptotic Normality of the MLE gives consistency, $\hat\theta_n \xrightarrow{\mathbb{P}} \theta$, and \begin{align*} \sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{d} N\!\left(0,\; \frac{1}{I(\theta)}\right). \end{align*} Equivalently, scaling by the deterministic constant $\sqrt{I(\theta)} > 0$, \begin{align*} Z_n := \sqrt{n\,I(\theta)}\,(\hat\theta_n - \theta) \xrightarrow{d} N(0,1). 
\end{align*} [/step]

[step:Replace $I(\theta)$ by $I(\hat\theta_n)$ via Slutsky's theorem] The function $I: \Theta \to (0, \infty)$ is continuous by the regularity assumptions, and $\hat\theta_n \xrightarrow{\mathbb{P}} \theta$ from the previous step. By the Continuous Mapping Theorem, \begin{align*} I(\hat\theta_n) \xrightarrow{\mathbb{P}} I(\theta). \end{align*} Since $I(\theta) > 0$, the function $u \mapsto \sqrt{u/I(\theta)}$ is continuous at $u = I(\theta)$, so a second application of the continuous mapping theorem gives \begin{align*} R_n := \sqrt{\frac{I(\hat\theta_n)}{I(\theta)}} \xrightarrow{\mathbb{P}} 1. \end{align*} Define the pivot \begin{align*} T_n := \sqrt{n\,I(\hat\theta_n)}\,(\hat\theta_n - \theta) = R_n \cdot Z_n. \end{align*} By Slutsky's Theorem, since $Z_n \xrightarrow{d} Z \sim N(0,1)$ and $R_n \xrightarrow{\mathbb{P}} 1$, the product converges in distribution: \begin{align*} T_n \xrightarrow{d} 1 \cdot Z = Z \sim N(0,1). \end{align*}

[guided] We have $Z_n = \sqrt{n\,I(\theta)}(\hat\theta_n - \theta) \xrightarrow{d} N(0,1)$ from the previous step, but the scale factor in $Z_n$ involves the unknown $I(\theta)$, so inverting $Z_n$ would give an interval whose endpoints cannot be computed from the data. The cure is to substitute the consistent estimator $I(\hat\theta_n)$; we must check that this substitution does not alter the limit. We use Slutsky's theorem, which states: if $Z_n \xrightarrow{d} Z$ and $R_n \xrightarrow{\mathbb{P}} c$ for a constant $c$, then $R_n Z_n \xrightarrow{d} c Z$. Here we take $R_n := \sqrt{I(\hat\theta_n)/I(\theta)}$ so that $T_n = R_n Z_n$. We verify the two hypotheses of Slutsky. First, $Z_n \xrightarrow{d} N(0,1)$ is the conclusion of the previous step. Second, we must show $R_n \xrightarrow{\mathbb{P}} 1$. The regularity hypothesis includes continuity of $\theta \mapsto I(\theta)$ on $\Theta$, so the continuous mapping theorem applied to the consistent estimator $\hat\theta_n \xrightarrow{\mathbb{P}} \theta$ yields $I(\hat\theta_n) \xrightarrow{\mathbb{P}} I(\theta)$. 
Since $I(\theta) > 0$ by hypothesis, the map $u \mapsto \sqrt{u/I(\theta)}$ is well defined and continuous on $[0, \infty)$, so a second application of the continuous mapping theorem gives $R_n \xrightarrow{\mathbb{P}} \sqrt{I(\theta)/I(\theta)} = 1$. This is where we consume the strict positivity of $I(\theta)$: without it, the ratio $I(\hat\theta_n)/I(\theta)$ would be undefined and $R_n$ could not be formed. With both hypotheses of Slutsky in place, \begin{align*} T_n = R_n \cdot Z_n \xrightarrow{d} 1 \cdot Z = Z \sim N(0,1). \end{align*} This is the crucial point that makes the interval observable: apart from the target $\theta$ itself, $T_n$ depends only on the data through $\hat\theta_n$ and $I(\hat\theta_n)$, so the unknown scale $I(\theta)$ no longer appears. [/guided] [/step]

[step:Invert the pivot to obtain the confidence interval] Fix $\gamma \in (0,1)$ and let $\alpha := 1 - \gamma$. By definition, $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal: if $Z \sim N(0,1)$, then $\mathbb{P}(|Z| \le z_{\alpha/2}) = 1 - \alpha = \gamma$. By the continuous mapping theorem, $|T_n| \xrightarrow{d} |Z|$. Since the distribution function of $|Z|$ is continuous at $z_{\alpha/2}$, the [Portmanteau Theorem](/theorems/1171) (or equivalently, convergence of distribution functions at continuity points of the limit CDF) gives \begin{align*} \mathbb{P}_\theta\!\left(|T_n| \le z_{\alpha/2}\right) \xrightarrow[n \to \infty]{} \mathbb{P}\!\left(|Z| \le z_{\alpha/2}\right) = \gamma. \end{align*} Now unfold the event $\{|T_n| \le z_{\alpha/2}\}$. Since $I$ is strictly positive on $\Theta$ and $\hat\theta_n \in \Theta$, we have $\sqrt{n\,I(\hat\theta_n)} > 0$, and we may divide by it: \begin{align*} |T_n| \le z_{\alpha/2} &\iff |\sqrt{n\,I(\hat\theta_n)}\,(\hat\theta_n - \theta)| \le z_{\alpha/2} \\ &\iff |\hat\theta_n - \theta| \le \frac{z_{\alpha/2}}{\sqrt{n\,I(\hat\theta_n)}} \\ &\iff \theta \in \left[\hat\theta_n - \frac{z_{\alpha/2}}{\sqrt{n\,I(\hat\theta_n)}},\; \hat\theta_n + \frac{z_{\alpha/2}}{\sqrt{n\,I(\hat\theta_n)}}\right]. 
\end{align*} Writing $z_{(1-\gamma)/2}$ for $z_{\alpha/2}$ and denoting the interval by $I_n(X)$, \begin{align*} \mathbb{P}_\theta\!\left(\theta \in I_n(X)\right) = \mathbb{P}_\theta\!\left(|T_n| \le z_{(1-\gamma)/2}\right) \xrightarrow[n \to \infty]{} \gamma. \end{align*} So $I_n(X)$ has asymptotic coverage $\gamma$, which is the defining property of a $100\gamma\%$ approximate confidence interval. This completes the proof.

[guided] We now convert the distributional limit $T_n \xrightarrow{d} N(0,1)$ into a probabilistic statement about a random interval containing $\theta$. The abstract move is: given a pivot with a known limiting distribution, the set of $\theta$ values consistent with the pivot's central mass is a confidence set. Fix the desired coverage level $\gamma \in (0,1)$ and set $\alpha := 1 - \gamma$. The value $z_{\alpha/2}$ is by definition the upper $\alpha/2$ quantile of $N(0,1)$, so for $Z \sim N(0,1)$, \begin{align*} \mathbb{P}(|Z| \le z_{\alpha/2}) = 1 - 2 \cdot \mathbb{P}(Z > z_{\alpha/2}) = 1 - \alpha = \gamma. \end{align*} The distribution function of $|Z|$ is continuous everywhere (it is built from the normal CDF), so $z_{\alpha/2}$ is a continuity point of the limit. Convergence in distribution of $|T_n|$ to $|Z|$, which follows from $T_n \xrightarrow{d} Z$ by the continuous mapping theorem, then gives convergence of probabilities at this continuity point: \begin{align*} \mathbb{P}_\theta(|T_n| \le z_{\alpha/2}) \to \mathbb{P}(|Z| \le z_{\alpha/2}) = \gamma. \end{align*} This is the only place we use the continuity-point requirement in the definition of convergence in distribution; it is harmless here because the normal distribution has no atoms. We now algebraically invert the event $\{|T_n| \le z_{\alpha/2}\}$ into a statement about $\theta$. Because $I$ is strictly positive on $\Theta$ and $\hat\theta_n \in \Theta$, we have $I(\hat\theta_n) > 0$, so $\sqrt{n\,I(\hat\theta_n)}$ is strictly positive and we may divide. 
The inequality $|T_n| = \sqrt{n\,I(\hat\theta_n)}\,|\hat\theta_n - \theta| \le z_{\alpha/2}$ becomes \begin{align*} |\hat\theta_n - \theta| \le \frac{z_{\alpha/2}}{\sqrt{n\,I(\hat\theta_n)}}, \end{align*} which is equivalent to $\theta$ lying in the symmetric interval centred at $\hat\theta_n$ of half-width $z_{\alpha/2}/\sqrt{n\,I(\hat\theta_n)}$: \begin{align*} I_n(X) := \left[\hat\theta_n - \frac{z_{(1-\gamma)/2}}{\sqrt{n\,I(\hat\theta_n)}},\; \hat\theta_n + \frac{z_{(1-\gamma)/2}}{\sqrt{n\,I(\hat\theta_n)}}\right]. \end{align*} Therefore $\{|T_n| \le z_{\alpha/2}\} = \{\theta \in I_n(X)\}$ (the division is legitimate because $I(\hat\theta_n) > 0$), and we conclude \begin{align*} \mathbb{P}_\theta(\theta \in I_n(X)) \to \gamma. \end{align*} This is what it means for $I_n(X)$ to be an approximate $100\gamma\%$ confidence interval: the procedure that produces it covers $\theta$ with probability approaching $\gamma$ as the sample size grows. Note that the coverage statement is purely asymptotic: for fixed $n$ the exact coverage may differ from $\gamma$ in either direction, with the error controlled by the quality of the normal approximation to $T_n$'s law. [/guided] [/step]
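
As an illustration of the recipe, consider a model that is not discussed in the proof and is chosen here purely as an assumption: $X_1, \ldots, X_n$ i.i.d. Exponential with rate $\lambda > 0$, so $f(x;\lambda) = \lambda e^{-\lambda x}$ for $x > 0$. The following sketch computes the three ingredients the proof needs (the MLE, the Fisher information, and the half-width) for this model. \begin{align*} \log f(x;\lambda) &= \log\lambda - \lambda x, & \frac{\partial}{\partial\lambda}\log f(x;\lambda) &= \frac{1}{\lambda} - x, \\ I(\lambda) &= \mathbb{E}_\lambda\!\left[\left(\frac{1}{\lambda} - X\right)^2\right] = \operatorname{Var}_\lambda(X) = \frac{1}{\lambda^2}, & \hat\lambda_n &= \frac{1}{\bar X_n}, \\ \frac{z_{(1-\gamma)/2}}{\sqrt{n\,I(\hat\lambda_n)}} &= \frac{z_{(1-\gamma)/2}\,\hat\lambda_n}{\sqrt{n}}, & I_n(X) &= \hat\lambda_n\left(1 \pm \frac{z_{(1-\gamma)/2}}{\sqrt{n}}\right). \end{align*} Here $I(\lambda) = 1/\lambda^2$ is continuous and strictly positive on $(0,\infty)$, so the hypotheses used in the Slutsky step hold for this example.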

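The asymptotic coverage claim can be checked empirically. The sketch below is a minimal Monte Carlo illustration, not part of the original proof: it assumes an Exponential model with rate $\lambda$ (for which the MLE is $1/\bar X_n$ and $I(\lambda) = 1/\lambda^2$), and the function names `mle_wald_interval` and `estimate_coverage` are hypothetical helpers introduced only for this example. It uses the Python standard library only.

```python
import math
import random
from statistics import NormalDist


def mle_wald_interval(sample, gamma=0.95):
    """Approximate 100*gamma% interval for the rate of an Exponential sample.

    Assumption (illustration only): f(x; lam) = lam * exp(-lam * x), so the
    MLE is 1 / mean(sample) and the Fisher information is I(lam) = 1 / lam**2.
    """
    n = len(sample)
    lam_hat = n / sum(sample)                 # MLE of the rate
    fisher_hat = 1.0 / lam_hat ** 2           # plug-in Fisher information I(lam_hat)
    # Upper (1 - gamma)/2 quantile of N(0, 1), i.e. z_{(1-gamma)/2}.
    z = NormalDist().inv_cdf(1.0 - (1.0 - gamma) / 2.0)
    half_width = z / math.sqrt(n * fisher_hat)
    return lam_hat - half_width, lam_hat + half_width


def estimate_coverage(true_rate, n, reps, gamma=0.95, seed=0):
    """Fraction of simulated samples whose interval contains true_rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(true_rate) for _ in range(n)]
        lo, hi = mle_wald_interval(sample, gamma)
        hits += lo <= true_rate <= hi
    return hits / reps


if __name__ == "__main__":
    # Empirical coverage should move toward gamma as n grows.
    for n in (10, 50, 500):
        print(n, estimate_coverage(true_rate=2.0, n=n, reps=2000))
```

For small $n$ the empirical coverage can sit noticeably away from $\gamma$, which is consistent with the remark that the guarantee is asymptotic only.
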
Approximate Confidence Interval via MLE (Theorem # 1429)
