Wilks' Theorem (Theorem # 1431)
Theorem
Let $X_1, \ldots, X_n$ be i.i.d. with density $f(\cdot \mid \theta)$, where $\theta \in \Theta$. Suppose $\Theta_0 \subseteq \Theta$ with $|\Theta| - |\Theta_0| = p$. If $H_0: \theta \in \Theta_0$ is true, then as $n \to \infty$,
\begin{align*}
2\log \Lambda_{X}(H_0; H_1) \xrightarrow{d} \chi_p^2.
\end{align*}
If $H_0$ is false, $2\log \Lambda$ tends to be stochastically larger. The GLR test of approximate size $\alpha$ rejects $H_0$ when $2\log \Lambda > \chi_p^2(\alpha)$, where $\chi_p^2(\alpha)$ is the upper $\alpha$ critical point of the $\chi_p^2$ distribution.
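The statement can be checked numerically. The following Python sketch is an illustration under assumptions that are not part of the theorem: the data are i.i.d. exponential with unknown rate, the null is that the rate equals 1 (so $d = p = 1$), and the function names are hypothetical. It uses the closed form $2\log\Lambda = 2n(\bar X - 1 - \log \bar X)$ for this model and the standard constant $\chi^2_1(0.05) \approx 3.8415$.

```python
import math
import random

def glr_statistic_exponential(xbar: float, n: int) -> float:
    """2 log Lambda for H0: rate = 1 against an unrestricted rate, given the sample mean."""
    # l_n(rate) = n * (log(rate) - rate * xbar); the unrestricted MLE is 1 / xbar.
    return 2.0 * n * (xbar - 1.0 - math.log(xbar))

def rejection_rate(n: int, reps: int, critical_value: float, seed: int = 0) -> float:
    """Monte Carlo estimate of P(2 log Lambda > critical_value) under H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # The sum of n i.i.d. Exp(1) draws is Gamma(n, 1), so draw the sample mean directly.
        xbar = rng.gammavariate(n, 1.0) / n
        if glr_statistic_exponential(xbar, n) > critical_value:
            rejections += 1
    return rejections / reps

CHI2_1_UPPER_5PCT = 3.841458820694124  # upper 5% point of chi-squared with 1 degree of freedom
rate = rejection_rate(n=200, reps=20000, critical_value=CHI2_1_UPPER_5PCT)
print(f"empirical size at n=200: {rate:.4f}")
```

If the chi-squared approximation is good at this sample size, the printed rate should be close to the nominal 0.05.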
Proof
[proofplan]
The proof is a second-order Taylor expansion of the log-likelihood around the unrestricted MLE, followed by identification of the resulting quadratic form. Write $\Theta \subseteq \mathbb{R}^d$ with $\Theta_0 \subseteq \Theta$ having dimension $d - p$. Under $H_0$, the unrestricted MLE $\hat\theta_n$ and the constrained MLE $\hat\theta_n^{(0)}$ are both consistent for the true parameter $\theta_0 \in \Theta_0$ and satisfy their respective first-order conditions: the full score equation for $\hat\theta_n$, and its component tangent to $\Theta_0$ for $\hat\theta_n^{(0)}$. Taylor-expanding the log-likelihood $\ell_n$ around $\hat\theta_n$, the linear term vanishes (first-order condition) and the quadratic term carries the Hessian $-n\,I(\theta_0) + o_{\mathbb{P}}(n)$. Asymptotic normality of $\sqrt n(\hat\theta_n - \hat\theta_n^{(0)})$, which to first order lies in the $p$-dimensional direction orthogonal to $\Theta_0$ (relative to $I(\theta_0)$), reduces $2\log\Lambda$ to the squared norm of a $p$-dimensional standard normal, which is $\chi^2_p$.
[/proofplan]
[step:Fix the parameterisation and identify the null manifold]
Assume $\Theta \subseteq \mathbb{R}^d$ is open and the model $\{f(\cdot \mid \theta): \theta \in \Theta\}$ satisfies the standard regularity conditions of the Asymptotic Normality of the MLE: the log-density $\log f(x \mid \theta)$ is thrice continuously differentiable in $\theta$ on $\Theta$; the Fisher information
\begin{align*}
I(\theta) &:= \mathbb{E}_\theta\!\left[\nabla_\theta \log f(X \mid \theta)\, \nabla_\theta \log f(X \mid \theta)^\top\right] \in \mathbb{R}^{d \times d}
\end{align*}
is finite, continuous, and positive-definite on $\Theta$; the true parameter $\theta_0 \in \Theta_0$ lies in the interior of $\Theta$; and the model is identifiable.
Let the null set $\Theta_0 \subseteq \Theta$ be a smooth submanifold of codimension $p$: there exist a $C^2$ map $g: \Theta \to \mathbb{R}^p$ with $\nabla g(\theta_0)$ of full rank $p$, and
\begin{align*}
\Theta_0 &= \{\theta \in \Theta: g(\theta) = 0\}.
\end{align*}
(The condition $|\Theta| - |\Theta_0| = p$ in the theorem statement is the informal dimension count $\dim \Theta - \dim \Theta_0 = p$; this is the precise statement under which the proof proceeds.) Equivalently, by the [Implicit Function Theorem](/theorems/52), near $\theta_0$ there exists a $C^2$ chart $\psi: V \to \Theta$ from an open neighbourhood $V$ of $0 \in \mathbb{R}^d$ such that
\begin{align*}
\psi(V \cap (\mathbb{R}^{d-p} \times \{0\}^p)) = \Theta_0 \cap \psi(V).
\end{align*}
By reparameterisation and a linear change of coordinates realised by the inverse square root of $I(\theta_0)$, we may and do assume $\theta_0 = 0$, $I(\theta_0) = I_d$ (the identity), and $\Theta_0 = (\mathbb{R}^{d-p} \times \{0\}^p) \cap \Theta$ locally near $0$. This orthogonal decomposition — $\mathbb{R}^d = T_{\theta_0}\Theta_0 \oplus N_{\theta_0}\Theta_0$ where the tangent space is $\mathbb{R}^{d-p} \times \{0\}$ and the normal space is $\{0\} \times \mathbb{R}^p$ — is crucial.
Write a generic $\theta \in \Theta$ in this chart as $\theta = (\xi, \eta)$ with $\xi \in \mathbb{R}^{d-p}$ and $\eta \in \mathbb{R}^p$.
[guided]
Wilks' theorem is a statement about a degree-of-freedom count: the null distribution of $2\log\Lambda$ has $p$ degrees of freedom, where $p$ is the *codimension* of the null — the number of constraints $H_0$ places on $\theta$. We need a setup that makes this count geometric. The regularity conditions supply four ingredients: smoothness (for Taylor expansion to terminate at second order with a controlled remainder), invertibility of $I$ (so the quadratic form is non-degenerate), interior point (so perturbations in every direction are legitimate), and identifiability (so MLEs are well-defined).
The most conceptually clarifying simplification is to choose coordinates in which $I(\theta_0) = I_d$ and $\Theta_0$ is a coordinate hyperplane near $\theta_0$. Both can be arranged:
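- Whiten with $I(\theta_0)^{1/2}$: let $\tilde\theta = I(\theta_0)^{1/2}(\theta - \theta_0)$, equivalently $\theta = \theta_0 + I(\theta_0)^{-1/2}\tilde\theta$. In the new coordinate the Fisher information at zero is $I(\theta_0)^{-1/2}\, I(\theta_0)\, I(\theta_0)^{-1/2} = I_d$.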
- Straighten $\Theta_0$ via the implicit function theorem: since $\nabla g(\theta_0)$ has full rank $p$, we can choose a $C^2$ local chart in which $\Theta_0$ is defined by $(\theta_{d-p+1}, \ldots, \theta_d) = 0$.
Composing, we work in coordinates $\theta = (\xi, \eta)$ where $\xi$ is tangent to $\Theta_0$ (free under $H_0$) and $\eta$ is normal to $\Theta_0$ (forced to zero under $H_0$). The normalisation $I(\theta_0) = I_d$ is what reduces the final quadratic form to an unrotated sum of squares: it makes the answer visibly "$p$" rather than "a trace involving projection onto the normal space of $\Theta_0$ in the $I$-inner product".
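As a check on the whitening step, the following worked computation uses only the chain rule for an affine reparameterisation. The symbols $A$ and $\tilde I$ are introduced here for illustration and do not appear elsewhere in the proof. With $A := I(\theta_0)^{1/2}$ (symmetric) and $\tilde\theta = A(\theta - \theta_0)$,
\begin{align*}
\nabla_{\tilde\theta} \log f(x \mid \theta_0 + A^{-1}\tilde\theta) &= A^{-1}\, \nabla_\theta \log f(x \mid \theta), \\
\tilde I(0) &= A^{-1}\, \mathbb{E}_{\theta_0}\!\left[\nabla_\theta \log f(X \mid \theta_0)\, \nabla_\theta \log f(X \mid \theta_0)^\top\right] A^{-1} \\
&= I(\theta_0)^{-1/2}\, I(\theta_0)\, I(\theta_0)^{-1/2} = I_d.
\end{align*}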
[/guided]
[/step]
[step:Set up both MLEs and derive their score expansions]
Write the log-likelihood
\begin{align*}
\ell_n: \Theta &\to \mathbb{R}, \\
\theta &\mapsto \sum_{i=1}^n \log f(X_i \mid \theta),
\end{align*}
the score $s_n(\theta) := \nabla \ell_n(\theta)$, and the observed information $\mathcal{J}_n(\theta) := -\nabla^2 \ell_n(\theta)$. Let
\begin{align*}
\hat\theta_n &:= \arg\max_{\theta \in \Theta} \ell_n(\theta), & \hat\theta_n^{(0)} &:= \arg\max_{\theta \in \Theta_0} \ell_n(\theta),
\end{align*}
both of which exist and are consistent for $\theta_0 = 0$ by the regularity assumptions. The generalised likelihood ratio is
\begin{align*}
2\log\Lambda_n &= 2\left[\ell_n(\hat\theta_n) - \ell_n(\hat\theta_n^{(0)})\right].
\end{align*}
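By the [Weak Law of Large Numbers](/theorems/1127) applied componentwise, the regularity hypothesis that second derivatives of $\log f$ have integrable envelopes near $\theta_0$, and the information equality $-\mathbb{E}_{\theta_0}[\nabla^2_\theta \log f(X \mid \theta_0)] = I(\theta_0)$,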
\begin{align*}
n^{-1}\mathcal{J}_n(\theta_0) \xrightarrow{\mathbb{P}} I(\theta_0) = I_d.
\end{align*}
By the [Central Limit Theorem](/theorems/521) applied to the i.i.d. mean-zero score contributions,
\begin{align*}
n^{-1/2} s_n(\theta_0) \xrightarrow{d} Z \sim N_d(0, I_d).
\end{align*}
At the unrestricted MLE, $s_n(\hat\theta_n) = 0$ by the first-order condition (interior maximum). At the constrained MLE, by Lagrange multipliers applied to the constraint $\eta = 0$, we have $s_n^{(\xi)}(\hat\theta_n^{(0)}) = 0$ — the score has zero component in the tangent directions — while the normal component may be nonzero. Writing $\hat\theta_n^{(0)} = (\hat\xi_n^{(0)}, 0)$ and decomposing $s_n = (s_n^{(\xi)}, s_n^{(\eta)})$, the constrained first-order conditions read
\begin{align*}
s_n^{(\xi)}(\hat\xi_n^{(0)}, 0) = 0, \qquad \hat\theta_n^{(0)} \in \Theta_0.
\end{align*}
[guided]
We have two maximisers: $\hat\theta_n$ over the full $\Theta$, and $\hat\theta_n^{(0)}$ over the null manifold $\Theta_0$. Both converge in probability to the true parameter $\theta_0 = 0$ because the true parameter is assumed to lie in $\Theta_0$ under $H_0$.
The score $s_n = \nabla \ell_n$ measures the slope of the log-likelihood. At an interior maximum of a smooth function, the score vanishes. So $s_n(\hat\theta_n) = 0$ unconditionally: this is the unrestricted first-order condition.
At the constrained maximum, the first-order condition is more subtle. We are maximising $\ell_n$ over $\Theta_0 = \{\eta = 0\}$, so we can vary only the $\xi$ coordinates; the derivative with respect to $\xi$ must vanish at $\hat\theta_n^{(0)}$, but the derivative with respect to $\eta$ need not — we are not free to perturb $\eta$ away from $0$. This is exactly the Lagrange-multiplier condition: $\nabla \ell_n$ must be orthogonal to the feasible directions, which are the $\xi$ directions. Hence $s_n^{(\xi)}(\hat\theta_n^{(0)}) = 0$ and $s_n^{(\eta)}(\hat\theta_n^{(0)})$ is in general nonzero — it is the Lagrange multiplier of the constraint $\eta = 0$.
The CLT and LLN inputs, $n^{-1/2} s_n(\theta_0) \xrightarrow{d} N_d(0, I_d)$ and $n^{-1}\mathcal{J}_n(\theta_0) \xrightarrow{\mathbb{P}} I_d$, are standard under the regularity hypotheses. The score is a sum of i.i.d. vectors that have mean zero (the first Bartlett identity) and hence covariance $I(\theta_0) = I_d$ by the definition of the Fisher information. The observed information, after division by $n$, is the sample mean of i.i.d. matrices, which converges by the LLN to $-\mathbb{E}_{\theta_0}[\nabla^2_\theta \log f(X \mid \theta_0)]$; this equals $I(\theta_0)$ by the second Bartlett identity (the information equality).
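To make the two first-order conditions concrete, here is a minimal Python sketch under assumptions that are not in the text above: the model is $N(\mu, v)$ with both parameters unknown, the null is $H_0: \mu = 0$ (so $\xi = v$ plays the role of the tangent coordinate and $\eta = \mu$ the normal one), and the helper name `scores_at` is hypothetical. It evaluates the score at both maximisers using the closed-form MLEs of this model.

```python
import random

def scores_at(mu: float, var: float, xs: list[float]) -> tuple[float, float]:
    """Score of the N(mu, var) log-likelihood: (d/d mu, d/d var)."""
    n = len(xs)
    resid_sum = sum(x - mu for x in xs)
    resid_sq = sum((x - mu) ** 2 for x in xs)
    s_mu = resid_sum / var
    s_var = -n / (2.0 * var) + resid_sq / (2.0 * var ** 2)
    return s_mu, s_var

rng = random.Random(1)
xs = [rng.gauss(0.0, 2.0) for _ in range(500)]  # data generated under H0: mu = 0
n = len(xs)
xbar = sum(xs) / n

# Unrestricted MLE (sample mean, uncentred-by-n variance): both score components vanish.
var_hat = sum((x - xbar) ** 2 for x in xs) / n
free_scores = scores_at(xbar, var_hat, xs)

# Constrained MLE under mu = 0: only the tangent (variance) component vanishes.
var_hat0 = sum(x ** 2 for x in xs) / n
null_scores = scores_at(0.0, var_hat0, xs)

print("unrestricted scores (mu, var):", free_scores)
print("constrained scores  (mu, var):", null_scores)
```

In this model the surviving normal component at the constrained MLE is $n\bar X / \hat v^{(0)}$, which is the quantity the Lagrange-multiplier description refers to.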
[/guided]
[/step]
[step:Expand $\ell_n$ to second order around $\hat\theta_n$ and evaluate the difference]
Apply the multivariate [Taylor expansion with Integral Remainder](/theorems/189) to $\ell_n$ around $\hat\theta_n$, evaluated at $\hat\theta_n^{(0)}$. Setting $h := \hat\theta_n^{(0)} - \hat\theta_n \in \mathbb{R}^d$,
\begin{align*}
\ell_n(\hat\theta_n^{(0)}) &= \ell_n(\hat\theta_n) + s_n(\hat\theta_n)^\top h - \tfrac{1}{2} h^\top \mathcal{J}_n(\tilde\theta_n) h,
\end{align*}
for some $\tilde\theta_n$ on the segment between $\hat\theta_n^{(0)}$ and $\hat\theta_n$ (because $\ell_n$ is scalar-valued, the Lagrange form of the remainder supplies a single such point; the integral form instead replaces $\mathcal{J}_n(\tilde\theta_n)$ by $2\int_0^1 (1-t)\,\mathcal{J}_n(\hat\theta_n + t h)\,dt$ and gives the same asymptotic outcome). Since $s_n(\hat\theta_n) = 0$,
\begin{align*}
2\log\Lambda_n &= 2\left[\ell_n(\hat\theta_n) - \ell_n(\hat\theta_n^{(0)})\right] = h^\top \mathcal{J}_n(\tilde\theta_n) h.
\end{align*}
Both $\hat\theta_n$ and $\hat\theta_n^{(0)}$ are consistent for $\theta_0$, so $\tilde\theta_n \xrightarrow{\mathbb{P}} \theta_0 = 0$, and by continuity of $\theta \mapsto I(\theta)$ together with a uniform LLN for the observed information on a neighbourhood of $\theta_0$ (available under the regularity conditions),
\begin{align*}
n^{-1}\mathcal{J}_n(\tilde\theta_n) \xrightarrow{\mathbb{P}} I(\theta_0) = I_d.
\end{align*}
Therefore
\begin{align*}
2\log\Lambda_n &= (\sqrt n\, h)^\top \left[n^{-1}\mathcal{J}_n(\tilde\theta_n)\right] (\sqrt n\, h) = \|\sqrt n\, h\|^2 + o_{\mathbb{P}}(\|\sqrt n\, h\|^2).
\end{align*}
It remains to identify the asymptotic distribution of $\sqrt n\, h = \sqrt n(\hat\theta_n^{(0)} - \hat\theta_n)$.
[guided]
Since the score vanishes at $\hat\theta_n$ (unrestricted first-order condition) and $\ell_n$ is smooth, the value of $\ell_n$ near $\hat\theta_n$ is dominated by the quadratic term of Taylor's expansion. Taylor's theorem in the form with Lagrange remainder gives
\begin{align*}
\ell_n(\hat\theta_n^{(0)}) = \ell_n(\hat\theta_n) + \underbrace{s_n(\hat\theta_n)^\top h}_{= 0} + \tfrac{1}{2} h^\top \nabla^2\ell_n(\tilde\theta_n) h,
\end{align*}
where $\nabla^2\ell_n(\tilde\theta) = -\mathcal{J}_n(\tilde\theta)$. So
\begin{align*}
\ell_n(\hat\theta_n) - \ell_n(\hat\theta_n^{(0)}) = \tfrac{1}{2} h^\top \mathcal{J}_n(\tilde\theta_n) h,
\end{align*}
and $2\log\Lambda_n = h^\top \mathcal{J}_n(\tilde\theta_n) h$.
We want this to be asymptotically $\chi^2_p$. The natural move is to rescale: write $h^\top \mathcal{J}_n(\tilde\theta_n) h = (\sqrt n h)^\top [n^{-1}\mathcal{J}_n(\tilde\theta_n)] (\sqrt n h)$. The middle factor is a (consistent) estimator of the Fisher information, converging in probability to $I_d$. The outer factor, $\sqrt n h$, is a rescaled displacement of the constrained MLE from the unrestricted one.
If we can show $\sqrt n h$ converges in distribution to a vector $W \in \mathbb{R}^d$ that is degenerate on the $\xi$-directions (first $d - p$ coordinates) and standard normal on the $\eta$-directions (last $p$ coordinates) — i.e. $W = (0, \zeta)$ with $\zeta \sim N_p(0, I_p)$ — then the quadratic form becomes $W^\top I_d W = \|W\|^2 = \|\zeta\|^2 \sim \chi^2_p$, and we are done. This is the content of the next step.
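A one-parameter worked example may help fix ideas. It is an illustration only, under the assumption (not made in the proof) that the data are i.i.d. exponential with rate $\lambda$ and that $H_0: \lambda = 1$, so $d = p = 1$ and $I(1) = 1$. For simplicity the observed information is evaluated at $\hat\lambda_n$ rather than at the intermediate point $\tilde\theta_n$. With $\bar X$ the sample mean,
\begin{align*}
\ell_n(\lambda) &= n(\log\lambda - \lambda \bar X), \qquad \hat\lambda_n = 1/\bar X, \qquad \mathcal{J}_n(\lambda) = n/\lambda^2, \qquad h = 1 - \hat\lambda_n, \\
h^2\, \mathcal{J}_n(\hat\lambda_n) &= (1 - 1/\bar X)^2\, n \bar X^2 = n(\bar X - 1)^2, \\
2\log\Lambda_n &= 2n(\bar X - 1 - \log \bar X) = n(\bar X - 1)^2 + O_{\mathbb{P}}(n^{-1/2}),
\end{align*}
where the last equality uses $\log \bar X = (\bar X - 1) - \tfrac12(\bar X - 1)^2 + O(|\bar X - 1|^3)$ and $\bar X - 1 = O_{\mathbb{P}}(n^{-1/2})$ under $H_0$. Since $\sqrt n(\bar X - 1) \xrightarrow{d} N(0, 1)$ by the CLT, the quadratic term converges in distribution to $\chi^2_1$.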
[/guided]
[/step]
[step:Compute the asymptotic distribution of $\sqrt n(\hat\theta_n - \hat\theta_n^{(0)})$]
Recall the chart splits $\theta = (\xi, \eta)$ with $\Theta_0 = \{\eta = 0\}$ locally, and $I(\theta_0) = I_d$. By asymptotic normality of the MLE,
\begin{align*}
\sqrt n\, \hat\theta_n &= \sqrt n\,(\hat\theta_n - \theta_0) \xrightarrow{d} Z = (Z^{(\xi)}, Z^{(\eta)}) \sim N_d(0, I_d),
\end{align*}
with $Z^{(\xi)} \in \mathbb{R}^{d-p}$ and $Z^{(\eta)} \in \mathbb{R}^p$ independent standard Gaussians.
The constrained MLE is the projection of $\hat\theta_n$ onto $\Theta_0$ in the information metric. Under our choice of coordinates where $I(\theta_0) = I_d$, this is the Euclidean orthogonal projection onto $\mathbb{R}^{d-p} \times \{0\}$, up to first order: we claim
\begin{align*}
\sqrt n\, \hat\theta_n^{(0)} &= \sqrt n\, (\hat\xi_n^{(0)}, 0) = (\sqrt n\, \hat\xi_n, 0) + o_{\mathbb{P}}(1).
\end{align*}
[claim:First-order equivalence of $\hat\xi_n^{(0)}$ and $\hat\xi_n$]
$\sqrt n\,(\hat\xi_n^{(0)} - \hat\xi_n) \xrightarrow{\mathbb{P}} 0$.
[proof]
By the first-order condition at the unrestricted MLE, $s_n(\hat\theta_n) = 0$, so in particular $s_n^{(\xi)}(\hat\xi_n, \hat\eta_n) = 0$. By the constrained first-order condition, $s_n^{(\xi)}(\hat\xi_n^{(0)}, 0) = 0$. Define
\begin{align*}
\Phi_n: \mathbb{R}^{d-p} \times \mathbb{R}^p &\to \mathbb{R}^{d-p}, \\
(\xi, \eta) &\mapsto n^{-1} s_n^{(\xi)}(\xi, \eta).
\end{align*}
Then $\Phi_n(\hat\xi_n, \hat\eta_n) = \Phi_n(\hat\xi_n^{(0)}, 0) = 0$. The map $\Phi_n$ is continuously differentiable, and by the LLN $\nabla_\xi \Phi_n(\theta_0) \xrightarrow{\mathbb{P}} -I_{\xi\xi}(\theta_0) = -I_{d-p}$ (the upper-left block of $I(\theta_0) = I_d$). By the implicit function theorem applied pathwise in a neighbourhood where $\nabla_\xi \Phi_n$ is invertible (which happens with probability $\to 1$), there is a $C^1$ solution map $\xi = \chi_n(\eta)$ of $\Phi_n(\xi, \eta) = 0$, and both $\hat\xi_n = \chi_n(\hat\eta_n)$ and $\hat\xi_n^{(0)} = \chi_n(0)$. Taylor expanding $\chi_n$ around $0$,
\begin{align*}
\sqrt n\,(\hat\xi_n - \hat\xi_n^{(0)}) = \sqrt n\,[\chi_n(\hat\eta_n) - \chi_n(0)] = \nabla \chi_n(0) \cdot \sqrt n\, \hat\eta_n + o_{\mathbb{P}}(1).
\end{align*}
Implicit differentiation of $\Phi_n(\chi_n(\eta), \eta) = 0$ gives $\nabla\chi_n(0) = -[\nabla_\xi \Phi_n]^{-1}\nabla_\eta \Phi_n \xrightarrow{\mathbb{P}} -I_{d-p}^{-1} \cdot I_{\xi\eta}(\theta_0) = 0$, where $\nabla_\xi \Phi_n \xrightarrow{\mathbb{P}} -I_{d-p}$, $\nabla_\eta \Phi_n \xrightarrow{\mathbb{P}} -I_{\xi\eta}(\theta_0)$, and the cross-block $I_{\xi\eta}(\theta_0)$ vanishes because $I(\theta_0) = I_d$. Since $\sqrt n\, \hat\eta_n = O_{\mathbb{P}}(1)$ by asymptotic normality of the MLE, the product is $o_{\mathbb{P}}(1)$.
[/proof]
[/claim]
Combining the claim with the decomposition of $\sqrt n\, h = \sqrt n\, \hat\theta_n^{(0)} - \sqrt n\, \hat\theta_n$,
\begin{align*}
\sqrt n\, h = (\sqrt n\,(\hat\xi_n^{(0)} - \hat\xi_n), -\sqrt n\, \hat\eta_n) = (o_{\mathbb{P}}(1), -\sqrt n\, \hat\eta_n).
\end{align*}
By asymptotic normality, $\sqrt n\, \hat\eta_n \xrightarrow{d} Z^{(\eta)} \sim N_p(0, I_p)$. Therefore
\begin{align*}
\sqrt n\, h \xrightarrow{d} (0, -Z^{(\eta)}),
\end{align*}
and its squared Euclidean norm converges in distribution to $\|Z^{(\eta)}\|^2 \sim \chi^2_p$.
[guided]
Under $H_0$ both MLEs are close to $\theta_0$, at scale $n^{-1/2}$. The question is: how close are they to each other, along each coordinate direction?
In the $\eta$-directions, the constrained MLE is pinned to $\hat\eta_n^{(0)} = 0$ while the unrestricted MLE has $\hat\eta_n$ of order $n^{-1/2}$, tending to a standard normal $Z^{(\eta)} \sim N_p(0, I_p)$ after scaling. So $\sqrt n\,(\hat\eta_n^{(0)} - \hat\eta_n) = -\sqrt n\,\hat\eta_n \xrightarrow{d} -Z^{(\eta)}$.
In the $\xi$-directions — the directions tangent to $\Theta_0$ — the story is more delicate. Both estimators optimise in $\xi$, one subject to $\eta = 0$ and one freely. If the Fisher information has block-diagonal structure in our chart (so the $\xi$- and $\eta$-components of the score are asymptotically independent), then constraining $\eta$ does not affect the $\xi$-optimiser at first order. That is the content of the claim: $\sqrt n(\hat\xi_n - \hat\xi_n^{(0)}) \xrightarrow{\mathbb{P}} 0$.
The proof of the claim uses the implicit function theorem. Both $\hat\theta_n$ and $\hat\theta_n^{(0)}$ satisfy $s_n^{(\xi)} = 0$; by the IFT this determines $\xi$ as a function $\chi_n$ of $\eta$. The difference $\hat\xi_n - \hat\xi_n^{(0)} = \chi_n(\hat\eta_n) - \chi_n(0)$ equals the derivative $\nabla \chi_n(0)$ times $\hat\eta_n$ plus lower-order terms. The derivative $\nabla\chi_n(0)$ involves the cross-block $I_{\xi\eta}$ of the Fisher information, which vanishes in our chosen chart (where $I(\theta_0) = I_d$). Hence the first-order change in $\hat\xi_n$ due to moving from $\eta = \hat\eta_n$ to $\eta = 0$ is zero.
This is where the orthogonality $I(\theta_0) = I_d$ pays off: in a general chart, the constrained and unconstrained $\xi$-estimates would differ by an amount proportional to $I_{\xi\eta}(\theta_0) \hat\eta_n$, and unscrambling the quadratic form $h^\top I(\theta_0) h$ would require projecting onto the $\eta$-direction in the $I$-inner product. We have arranged coordinates so that the $I$-inner product is the Euclidean inner product, the $\eta$-direction is already $I$-orthogonal to the $\xi$-direction, and the projection reduces to the Euclidean projection.
Putting it together, $\sqrt n h \xrightarrow{d} (0, -Z^{(\eta)})$ with $Z^{(\eta)} \sim N_p(0, I_p)$. The length squared of this vector is $\|Z^{(\eta)}\|^2$, a sum of $p$ independent squared $N(0,1)$'s — by definition $\chi^2_p$.
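The claim can be seen exactly in a familiar model. This is a hedged illustration under assumptions not made in the proof: the data are i.i.d. $N(\mu, v)$ with both parameters unknown and $H_0: \mu = 0$, so $\xi = v$ and $\eta = \mu$. Here the Fisher information $\operatorname{diag}(1/v,\ 1/(2v^2))$ is block-diagonal but not the identity, which is already enough for the cross-block to vanish. Writing $\hat v_n$ and $\hat v_n^{(0)}$ for the unrestricted and constrained variance estimates,
\begin{align*}
\hat v_n &= \frac1n \sum_{i=1}^n (X_i - \bar X)^2, \qquad \hat v_n^{(0)} = \frac1n \sum_{i=1}^n X_i^2, \qquad \hat v_n^{(0)} - \hat v_n = \bar X^2, \\
\sqrt n\,(\hat v_n^{(0)} - \hat v_n) &= n^{-1/2}\,(\sqrt n\, \bar X)^2 = O_{\mathbb{P}}(n^{-1/2}) \xrightarrow{\mathbb{P}} 0,
\end{align*}
since $\sqrt n\, \bar X = O_{\mathbb{P}}(1)$ under $H_0$. The tangent coordinates of the two MLEs therefore agree to first order, while the normal coordinate $\sqrt n\, \bar X$ carries the whole limit.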
[/guided]
[/step]
[step:Assemble the pieces to conclude $2\log\Lambda_n \xrightarrow{d} \chi^2_p$]
From Step 3,
\begin{align*}
2\log\Lambda_n = (\sqrt n h)^\top [n^{-1}\mathcal{J}_n(\tilde\theta_n)] (\sqrt n h),
\end{align*}
with $n^{-1}\mathcal{J}_n(\tilde\theta_n) \xrightarrow{\mathbb{P}} I_d$ and $\sqrt n h \xrightarrow{d} (0, -Z^{(\eta)})$ from Step 4. Slutsky's Theorem gives joint convergence of the pair (the first factor converges in probability to a constant), and the Continuous Mapping Theorem applied to the map $(A, v) \mapsto v^\top A v$ (linear in $A$, quadratic in $v$, and jointly continuous on $\mathbb{R}^{d \times d} \times \mathbb{R}^d$) yields
\begin{align*}
2\log\Lambda_n \xrightarrow{d} (0, -Z^{(\eta)})^\top I_d (0, -Z^{(\eta)}) = \|Z^{(\eta)}\|^2.
\end{align*}
Since $Z^{(\eta)} \sim N_p(0, I_p)$, we have $\|Z^{(\eta)}\|^2 = \sum_{j=1}^p (Z^{(\eta)}_j)^2 \sim \chi^2_p$ by Definition of the Chi-Squared Distribution. Hence
\begin{align*}
2\log\Lambda_n \xrightarrow{d} \chi^2_p,
\end{align*}
which is the stated asymptotic distribution.
For the second part — that the GLR test of approximate size $\alpha$ rejects when $2\log\Lambda_n > \chi^2_p(\alpha)$ — observe that $\chi^2_p(\alpha)$ is the upper $\alpha$ quantile of $\chi^2_p$, which is a continuity point of the $\chi^2_p$ CDF (since the $\chi^2_p$ distribution has a smooth density on $(0, \infty)$ for all $p \ge 1$). By convergence in distribution at continuity points of the limit CDF,
\begin{align*}
\mathbb{P}_{\theta_0}\!\left(2\log\Lambda_n > \chi^2_p(\alpha)\right) \xrightarrow[n \to \infty]{} \mathbb{P}(\chi^2_p > \chi^2_p(\alpha)) = \alpha.
\end{align*}
So under $H_0$ the test rejects with probability approaching $\alpha$, which is the sense in which it has approximate size $\alpha$. The claim under $H_1$ (that $2\log\Lambda_n$ tends to be stochastically larger when $H_0$ is false) follows from consistency of the test: if $\theta_0 \notin \Theta_0$, the restricted MLE is bounded away from $\theta_0$ and the log-likelihood difference $\ell_n(\hat\theta_n) - \ell_n(\hat\theta_n^{(0)})$ grows linearly in $n$, so $2\log\Lambda_n \to \infty$ in probability, whereas under $H_0$ it remains $O_{\mathbb{P}}(1)$. This completes the proof.
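The linear growth under the alternative can be sketched numerically. The Python snippet below is an illustration under assumptions not made in the proof: exponential data with a hypothetical true rate of 1.5, tested against $H_0$: rate $= 1$, with hypothetical names throughout. In this model $2\log\Lambda_n / n$ should approach $2(1/\lambda - 1 + \log\lambda)$ at the true rate $\lambda$.

```python
import math
import random

def glr_statistic_exponential(xbar: float, n: int) -> float:
    """2 log Lambda for H0: rate = 1 against an unrestricted rate, given the sample mean."""
    return 2.0 * n * (xbar - 1.0 - math.log(xbar))

TRUE_RATE = 1.5  # hypothetical alternative, chosen for illustration only
limit = 2.0 * (1.0 / TRUE_RATE - 1.0 + math.log(TRUE_RATE))

rng = random.Random(2)
ratios = {}
for n in (100, 1000, 10000):
    # The sum of n i.i.d. Exp(rate) draws is Gamma(shape=n, scale=1/rate).
    xbar = rng.gammavariate(n, 1.0 / TRUE_RATE) / n
    stat = glr_statistic_exponential(xbar, n)
    ratios[n] = stat / n
    print(f"n={n:6d}  2logLambda={stat:10.2f}  per-observation={ratios[n]:.4f}")

print(f"limiting per-observation value: {limit:.4f}")
```

The statistic itself grows roughly in proportion to $n$, so for any fixed critical value the rejection probability tends to one.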
[/step]