[proofplan]
The proof exhibits an approximate $N(0,1)$ pivot for $\theta$ and inverts it. Under the regularity conditions, the MLE is asymptotically normal with asymptotic variance $1/I(\theta)$, so $\sqrt{n\,I(\theta)}(\hat\theta_n - \theta) \xrightarrow{d} N(0,1)$. Since $I(\hat\theta_n) \to I(\theta)$ in probability by continuity of $I$ and consistency of $\hat\theta_n$, Slutsky's theorem replaces $I(\theta)$ by $I(\hat\theta_n)$ without changing the limit. Inverting the inequality $|\sqrt{n\,I(\hat\theta_n)}(\hat\theta_n - \theta)| \le z_{(1-\gamma)/2}$ yields the stated interval, and its asymptotic coverage is $\gamma$ because convergence in distribution gives convergence of probabilities at continuity points of the limiting distribution function.
[/proofplan]
[step:Set up the asymptotic distribution of the MLE]
Let $X_1, \ldots, X_n$ be i.i.d. with density $f(x;\theta)$, $\theta \in \Theta \subseteq \mathbb{R}$, and assume the standard regularity conditions: $\log f(x;\theta)$ is twice continuously differentiable in $\theta$; the Fisher information
\begin{align*}
I(\theta) &:= \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial \theta}\log f(X;\theta)\right)^2\right]
\end{align*}
exists, is finite, strictly positive, and continuous in $\theta$ on $\Theta$; the true parameter $\theta$ lies in the interior of $\Theta$; and differentiation may be exchanged with integration against $f(x;\theta)\, d\mathcal{L}^1(x)$. Let $\hat\theta_n$ be the MLE computed from $X_1, \ldots, X_n$.
Under these hypotheses the Asymptotic Normality of the MLE gives consistency $\hat\theta_n \xrightarrow{\mathbb{P}} \theta$ and
\begin{align*}
\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{d} N\!\left(0,\; \frac{1}{I(\theta)}\right).
\end{align*}
Equivalently, multiplying $\sqrt{n}\,(\hat\theta_n - \theta)$ by the deterministic constant $\sqrt{I(\theta)} > 0$ scales the limiting variance by $I(\theta)$, so
\begin{align*}
Z_n := \sqrt{n\,I(\theta)}\,(\hat\theta_n - \theta) \xrightarrow{d} N(0,1).
\end{align*}
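As a concrete check of this step, the following is a minimal simulation sketch, assuming the exponential model $f(x;\theta) = \theta e^{-\theta x}$, for which $\hat\theta_n = 1/\bar X_n$ and $I(\theta) = 1/\theta^2$. The model and the helper names `exp_mle`, `exp_fisher_info`, and `simulate_z` are illustrative choices and not part of the proof. If the limit above holds, the simulated values of $Z_n$ should have empirical mean near $0$ and variance near $1$.

```python
import math
import random
import statistics


def exp_mle(sample):
    """MLE of the rate theta in the Exp(theta) model: 1 / (sample mean)."""
    return len(sample) / sum(sample)


def exp_fisher_info(theta):
    """Per-observation Fisher information of Exp(theta): I(theta) = 1 / theta^2."""
    return 1.0 / theta ** 2


def simulate_z(theta, n, reps, seed=0):
    """Draw `reps` independent copies of Z_n = sqrt(n I(theta)) (theta_hat - theta)."""
    rng = random.Random(seed)
    scale = math.sqrt(n * exp_fisher_info(theta))
    draws = []
    for _ in range(reps):
        # expovariate takes the rate parameter, matching f(x; theta) above
        sample = [rng.expovariate(theta) for _ in range(n)]
        draws.append(scale * (exp_mle(sample) - theta))
    return draws


z_draws = simulate_z(theta=2.0, n=400, reps=2000)
print("empirical mean of Z_n:    ", statistics.fmean(z_draws))
print("empirical variance of Z_n:", statistics.pvariance(z_draws))
```

The simulation uses the true $I(\theta)$, which is available here only because $\theta$ is chosen by the experimenter; with real data this normalisation is unknown.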
[/step]
[step:Replace $I(\theta)$ by $I(\hat\theta_n)$ via Slutsky's theorem]
The function $I: \Theta \to (0, \infty)$ is continuous by the regularity assumptions, and $\hat\theta_n \xrightarrow{\mathbb{P}} \theta$ from the previous step. By the Continuous Mapping Theorem,
\begin{align*}
I(\hat\theta_n) \xrightarrow{\mathbb{P}} I(\theta).
\end{align*}
Since $I(\theta) > 0$, the function $u \mapsto \sqrt{u/I(\theta)}$ is continuous at $u = I(\theta)$, so a second application of the continuous mapping theorem gives
\begin{align*}
R_n := \sqrt{\frac{I(\hat\theta_n)}{I(\theta)}} \xrightarrow{\mathbb{P}} 1.
\end{align*}
Define the pivot
\begin{align*}
T_n := \sqrt{n\,I(\hat\theta_n)}\,(\hat\theta_n - \theta) = R_n \cdot Z_n.
\end{align*}
By Slutsky's Theorem, since $Z_n \xrightarrow{d} Z \sim N(0,1)$ and $R_n \xrightarrow{\mathbb{P}} 1$, the product converges in distribution:
\begin{align*}
T_n \xrightarrow{d} 1 \cdot Z \sim N(0,1).
\end{align*}
[guided]
We have $Z_n = \sqrt{n\,I(\theta)}(\hat\theta_n - \theta) \xrightarrow{d} N(0,1)$ from the previous step, but $Z_n$ contains the unknown $I(\theta)$ and therefore cannot be used as an observable pivot. The cure is to substitute the consistent estimator $I(\hat\theta_n)$; we must check that this substitution does not alter the limit.
We use Slutsky's theorem, which states: if $Z_n \xrightarrow{d} Z$ and $R_n \xrightarrow{\mathbb{P}} c$ for a constant $c$, then $R_n Z_n \xrightarrow{d} c Z$. Here we take $R_n := \sqrt{I(\hat\theta_n)/I(\theta)}$ so that $T_n = R_n Z_n$. We verify the two hypotheses of Slutsky.
First, $Z_n \xrightarrow{d} N(0,1)$ is the conclusion of the previous step. Second, we must show $R_n \xrightarrow{\mathbb{P}} 1$. The regularity hypothesis includes continuity of $\theta \mapsto I(\theta)$ on $\Theta$, so the continuous mapping theorem applied to the consistent estimator $\hat\theta_n \xrightarrow{\mathbb{P}} \theta$ yields $I(\hat\theta_n) \xrightarrow{\mathbb{P}} I(\theta)$. Since $I(\theta) > 0$ by hypothesis, the map $u \mapsto \sqrt{u/I(\theta)}$ is well defined and continuous on $(0, \infty)$, so a second application of the continuous mapping theorem gives $R_n \xrightarrow{\mathbb{P}} \sqrt{I(\theta)/I(\theta)} = 1$. This is where we consume the strict positivity of $I(\theta)$: without it, the ratio $I(\hat\theta_n)/I(\theta)$ would be undefined, and the limiting variance $1/I(\theta)$ from the previous step would not exist either.
With both hypotheses of Slutsky in place,
\begin{align*}
T_n = R_n \cdot Z_n \xrightarrow{d} 1 \cdot Z \sim N(0,1).
\end{align*}
This is the crucial point that makes the interval computable: apart from the target parameter $\theta$ itself, $T_n$ depends only on the data through $\hat\theta_n$ and $I(\hat\theta_n)$, so the unknown normalising constant $I(\theta)$ no longer appears.
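To see the convergence $R_n \xrightarrow{\mathbb{P}} 1$ numerically, here is a small sketch under the same illustrative assumption of an exponential model with rate $\theta$, where $I(\theta) = 1/\theta^2$ and hence $R_n = \theta/\hat\theta_n$. The function names `ratio_r` and `mean_abs_deviation` are hypothetical helpers introduced only for this example. The Monte Carlo estimate of $\mathbb{E}|R_n - 1|$ should shrink as $n$ grows.

```python
import math
import random


def ratio_r(sample, theta):
    """R_n = sqrt(I(theta_hat) / I(theta)) for Exp(theta), where I(t) = 1 / t^2."""
    theta_hat = len(sample) / sum(sample)  # MLE of the rate
    return math.sqrt((1.0 / theta_hat ** 2) / (1.0 / theta ** 2))


def mean_abs_deviation(theta, n, reps, rng):
    """Monte Carlo estimate of E|R_n - 1| at sample size n."""
    total = 0.0
    for _ in range(reps):
        sample = [rng.expovariate(theta) for _ in range(n)]
        total += abs(ratio_r(sample, theta) - 1.0)
    return total / reps


rng = random.Random(1)
deviations = {n: mean_abs_deviation(2.0, n, 300, rng) for n in (10, 100, 1000)}
for n, dev in deviations.items():
    print(f"n = {n:5d}   mean |R_n - 1| = {dev:.4f}")
```

A shrinking mean absolute deviation is consistent with convergence in probability (by Markov's inequality), although a finite simulation can only illustrate the limit and does not prove it.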
[/guided]
[/step]
[step:Invert the pivot to obtain the confidence interval]
Fix $\gamma \in (0,1)$ and let $\alpha := 1 - \gamma$. By definition, $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal: if $Z \sim N(0,1)$, then $\mathbb{P}(|Z| \le z_{\alpha/2}) = 1 - \alpha = \gamma$. By the continuous mapping theorem, $T_n \xrightarrow{d} Z$ implies $|T_n| \xrightarrow{d} |Z|$. Since the distribution function of $|Z|$ is continuous at $z_{\alpha/2}$, the [Portmanteau Theorem](/theorems/1171) (or equivalently, convergence in distribution at continuity points of the limit CDF) gives
\begin{align*}
\mathbb{P}_\theta\!\left(|T_n| \le z_{\alpha/2}\right) \xrightarrow[n \to \infty]{} \mathbb{P}\!\left(|Z| \le z_{\alpha/2}\right) = \gamma.
\end{align*}
Now unfold the event $\{|T_n| \le z_{\alpha/2}\}$. On the event $\{I(\hat\theta_n) > 0\}$ — which has probability tending to $1$ by the preceding convergence — $\sqrt{n\,I(\hat\theta_n)} > 0$, and we may divide by it:
\begin{align*}
|T_n| \le z_{\alpha/2}
&\iff |\sqrt{n\,I(\hat\theta_n)}\,(\hat\theta_n - \theta)| \le z_{\alpha/2} \\
&\iff |\hat\theta_n - \theta| \le \frac{z_{\alpha/2}}{\sqrt{n\,I(\hat\theta_n)}} \\
&\iff \theta \in \left(\hat\theta_n - \frac{z_{\alpha/2}}{\sqrt{n\,I(\hat\theta_n)}},\; \hat\theta_n + \frac{z_{\alpha/2}}{\sqrt{n\,I(\hat\theta_n)}}\right).
\end{align*}
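Writing $z_{(1-\gamma)/2}$ for $z_{\alpha/2}$ and denoting the interval by $I_n(X)$ (including or excluding its endpoints changes the coverage probability by at most $\mathbb{P}_\theta(|T_n| = z_{\alpha/2})$, which tends to $0$ because the limit law has no atoms),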
\begin{align*}
\mathbb{P}_\theta\!\left(\theta \in I_n(X)\right) = \mathbb{P}_\theta\!\left(|T_n| \le z_{(1-\gamma)/2}\right) \xrightarrow[n \to \infty]{} \gamma.
\end{align*}
So $I_n(X)$ has asymptotic coverage $\gamma$, which is the defining property of a $100\gamma\%$ approximate confidence interval. This completes the proof.
[guided]
We now convert the distributional limit $T_n \xrightarrow{d} N(0,1)$ into a probabilistic statement about a random interval containing $\theta$. The abstract move is: given a pivot with a known limiting distribution, the set of $\theta$ values consistent with the pivot's central mass is a confidence set.
Fix the desired coverage level $\gamma \in (0,1)$ and set $\alpha := 1 - \gamma$. The value $z_{\alpha/2}$ is by definition the upper $\alpha/2$ quantile of $N(0,1)$, so for $Z \sim N(0,1)$,
\begin{align*}
\mathbb{P}(|Z| \le z_{\alpha/2}) = 1 - 2 \cdot \mathbb{P}(Z > z_{\alpha/2}) = 1 - \alpha = \gamma.
\end{align*}
The distribution function of $|Z|$ is continuous everywhere on $(0, \infty)$ (it is smooth, being built from the normal CDF), so $z_{\alpha/2}$ is a continuity point of the limit. Since $T_n \xrightarrow{d} Z$ implies $|T_n| \xrightarrow{d} |Z|$ by the continuous mapping theorem, we obtain convergence of probabilities at this continuity point:
\begin{align*}
\mathbb{P}_\theta(|T_n| \le z_{\alpha/2}) \to \mathbb{P}(|Z| \le z_{\alpha/2}) = \gamma.
\end{align*}
This is the only place we use the continuity-point requirement in the definition of convergence in distribution; it is harmless here because the normal distribution has no atoms, so its CDF has no jumps.
We now algebraically invert the event $\{|T_n| \le z_{\alpha/2}\}$ into a statement about $\theta$. With probability tending to $1$, $I(\hat\theta_n) > 0$, so $\sqrt{n\,I(\hat\theta_n)}$ is strictly positive and we may divide. The inequality $|T_n| = \sqrt{n\,I(\hat\theta_n)}\,|\hat\theta_n - \theta| \le z_{\alpha/2}$ becomes
\begin{align*}
|\hat\theta_n - \theta| \le \frac{z_{\alpha/2}}{\sqrt{n\,I(\hat\theta_n)}},
\end{align*}
which is equivalent to $\theta$ lying in the symmetric interval centred at $\hat\theta_n$ of half-width $z_{\alpha/2}/\sqrt{n\,I(\hat\theta_n)}$:
\begin{align*}
I_n(X) := \left(\hat\theta_n - \frac{z_{(1-\gamma)/2}}{\sqrt{n\,I(\hat\theta_n)}},\; \hat\theta_n + \frac{z_{(1-\gamma)/2}}{\sqrt{n\,I(\hat\theta_n)}}\right).
\end{align*}
Therefore $\{|T_n| \le z_{\alpha/2}\}$ and $\{\theta \in I_n(X)\}$ differ only on the boundary event $\{|T_n| = z_{\alpha/2}\}$ and on $\{I(\hat\theta_n) = 0\}$, both of which have probability tending to $0$, and we conclude
\begin{align*}
\mathbb{P}_\theta(\theta \in I_n(X)) \to \gamma.
\end{align*}
This is what it means for $I_n(X)$ to be an approximate $100\gamma\%$ confidence interval: the procedure that produces it covers $\theta$ with probability approaching $\gamma$ as the sample size grows. Note that the guarantee is purely asymptotic: for fixed $n$ the exact coverage may differ from $\gamma$ in either direction, with the error controlled by the quality of the normal approximation to $T_n$'s law.
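The inversion above can be sketched in code. The example below again assumes, purely for illustration, an exponential model with rate $\theta$ (so $\hat\theta_n = 1/\bar X_n$ and $I(\hat\theta_n) = 1/\hat\theta_n^2$); the names `wald_interval_exp` and `empirical_coverage` are hypothetical. It computes the interval $I_n(X)$ and estimates its coverage by repeated sampling, which should land near $\gamma$ for moderately large $n$.

```python
import math
import random
from statistics import NormalDist


def wald_interval_exp(sample, gamma):
    """Approximate 100*gamma% interval for the rate of an Exp(theta) sample."""
    n = len(sample)
    theta_hat = n / sum(sample)        # MLE of the rate
    info_hat = 1.0 / theta_hat ** 2    # plug-in Fisher information I(theta_hat)
    # z_{(1-gamma)/2}: upper (1-gamma)/2 quantile of the standard normal
    z = NormalDist().inv_cdf(1.0 - (1.0 - gamma) / 2.0)
    half_width = z / math.sqrt(n * info_hat)
    return theta_hat - half_width, theta_hat + half_width


def empirical_coverage(theta, n, gamma, reps, seed=0):
    """Fraction of simulated samples whose interval contains the true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(theta) for _ in range(n)]
        lo, hi = wald_interval_exp(sample, gamma)
        hits += lo < theta < hi
    return hits / reps


coverage = empirical_coverage(theta=2.0, n=200, gamma=0.95, reps=2000)
print("empirical coverage:", coverage)
```

Rerunning with a small `n` is a simple way to observe the finite-sample gap between the exact coverage and $\gamma$ that the asymptotic statement leaves open.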
[/guided]
[/step]