[proofplan]
The proof has two parts. First, we expand the two deviances and observe that the saturated log-likelihood is the same for both models, so it cancels in the difference; what remains is exactly the likelihood-ratio statistic comparing the constrained model $M_0$ with the larger model $M_1$. Second, since the true parameter lies in the constrained model and the constraint has rank $q$, [Wilks' theorem](/theorems/1864) gives the limiting $\chi^2_q$ distribution.
[/proofplan]
[step:Cancel the saturated log-likelihood in the deviance difference]
For $i \in \{0,1\}$, the deviance of $M_i$ is
\begin{align*}
D_{i,n}
=
2\{\ell_{\mathrm{sat},n}-\ell_{i,n}(\hat{\beta}_{i,n})\}.
\end{align*}
Therefore
\begin{align*}
D_{0,n}-D_{1,n}
&=
2\{\ell_{\mathrm{sat},n}-\ell_{0,n}(\hat{\beta}_{0,n})\}
-
2\{\ell_{\mathrm{sat},n}-\ell_{1,n}(\hat{\beta}_{1,n})\} \\
&=
2\{\ell_{1,n}(\hat{\beta}_{1,n})-\ell_{0,n}(\hat{\beta}_{0,n})\}.
\end{align*}
This proves the asserted identity between the deviance difference and twice the maximized log-likelihood difference.
[/step]
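The cancellation can be checked numerically. The following is a minimal sketch, assuming a Poisson GLM with a log link in which $M_0$ fits one common mean and $M_1$ fits a separate mean per group; the counts and the helper `poisson_loglik` are hypothetical and chosen only for illustration. In this special case both maximum likelihood fits are sample means, so no iterative fitting is needed.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y*log(mu) - mu - log(y!)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

# Hypothetical counts for two groups (illustrative data, not from the text).
group_a = [3, 5, 4, 6]
group_b = [8, 7, 10, 9]
y = group_a + group_b

# M0: a single common mean. Its MLE is the overall sample mean.
mean_all = sum(y) / len(y)
mu0 = [mean_all] * len(y)

# M1: one mean per group. Its MLEs are the two group means.
mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
mu1 = [mean_a] * len(group_a) + [mean_b] * len(group_b)

# Saturated model: fitted mean equals the observation (all counts are positive).
ell_sat = poisson_loglik(y, y)
ell0 = poisson_loglik(y, mu0)
ell1 = poisson_loglik(y, mu1)

# Deviances as defined in the step above.
D0 = 2 * (ell_sat - ell0)
D1 = 2 * (ell_sat - ell1)

deviance_difference = D0 - D1
lr_statistic = 2 * (ell1 - ell0)

# The two numbers agree up to floating-point rounding.
print(deviance_difference, lr_statistic)
```

The saturated term `ell_sat` enters `D0` and `D1` with the same sign, so it drops out of `D0 - D1` regardless of the data.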
[step:Identify the deviance difference as the likelihood-ratio statistic]
Define the likelihood-ratio statistic
\begin{align*}
\Lambda_n
:=
2\left\{
\sup_{\beta \in \Theta_1}\ell_{1,n}(\beta)
-
\sup_{\beta \in \Theta_0}\ell_{0,n}(\beta)
\right\}.
\end{align*}
Since $\hat{\beta}_{1,n}$ is a maximum likelihood estimator over $\Theta_1$ and $\hat{\beta}_{0,n}$ is a maximum likelihood estimator over $\Theta_0$, we have
\begin{align*}
\sup_{\beta \in \Theta_1}\ell_{1,n}(\beta)
&=
\ell_{1,n}(\hat{\beta}_{1,n}), \\
\sup_{\beta \in \Theta_0}\ell_{0,n}(\beta)
&=
\ell_{0,n}(\hat{\beta}_{0,n}).
\end{align*}
Hence
\begin{align*}
\Lambda_n
=
2\{\ell_{1,n}(\hat{\beta}_{1,n})-\ell_{0,n}(\hat{\beta}_{0,n})\}
=
D_{0,n}-D_{1,n}.
\end{align*}
Thus the deviance difference is exactly the likelihood-ratio statistic for testing the constrained model $M_0$ against the larger model $M_1$.
[/step]
[step:Apply Wilks theorem with $q$ independent restrictions]
The hypotheses place the true parameter $\beta_*$ in $\Theta_0$, and the restriction map
\begin{align*}
r: \Theta_1 \to \mathbb{R}^q
\end{align*}
has Jacobian matrix $Jr_\beta$ of rank $q$ along $\Theta_0$ near $\beta_*$. Thus the null model is a regular codimension-$q$ submodel of the larger regular model. The stated regularity assumptions are exactly those required by [Wilks' theorem](/theorems/1431). Therefore, by Wilks' theorem,
\begin{align*}
\Lambda_n \xrightarrow{d} \chi^2_q.
\end{align*}
[/step]
[guided]
We now use the asymptotic distribution theorem for likelihood-ratio statistics. The statistic already identified is
\begin{align*}
\Lambda_n
=
2\left\{
\sup_{\beta \in \Theta_1}\ell_{1,n}(\beta)
-
\sup_{\beta \in \Theta_0}\ell_{0,n}(\beta)
\right\}.
\end{align*}
Wilks' theorem applies to nested regular parametric models when the smaller model is the true model and is obtained from the larger model by imposing a fixed number of independent smooth restrictions.
We verify these hypotheses. First, the models are nested because $\Theta_0 \subset \Theta_1$. Second, the statement being proved assumes that the true parameter satisfies $\beta_* \in \Theta_0$, so the constrained model $M_0$ is correctly specified. Third, the restriction map
\begin{align*}
r: \Theta_1 \to \mathbb{R}^q
\end{align*}
defines the constrained space by
\begin{align*}
\Theta_0 = \{\beta \in \Theta_1 : r(\beta)=0\},
\end{align*}
and the rank condition $\operatorname{rank} Jr_\beta=q$ means that these are $q$ independent restrictions rather than redundant equations. Finally, the statement assumes the standard maximum-likelihood regularity hypotheses required for Wilks' theorem.
Therefore Wilks' theorem gives
\begin{align*}
\Lambda_n \xrightarrow{d} \chi^2_q.
\end{align*}
This is the only asymptotic input in the proof; everything before this point was an algebraic identification of the deviance difference with the likelihood-ratio statistic.
[/guided]
[step:Transfer the limiting distribution back to the deviance difference]
From the identity proved above,
\begin{align*}
D_{0,n}-D_{1,n}=\Lambda_n.
\end{align*}
Combining this equality with
\begin{align*}
\Lambda_n \xrightarrow{d} \chi^2_q
\end{align*}
gives
\begin{align*}
D_{0,n}-D_{1,n}
=
2\{\ell_{1,n}(\hat{\beta}_{1,n})-\ell_{0,n}(\hat{\beta}_{0,n})\}
\xrightarrow{d}
\chi^2_q.
\end{align*}
This is precisely the claimed asymptotic deviance comparison for the nested regular GLMs.
[/step]
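The limiting statement can be illustrated by simulation. The following is a minimal sketch, assuming a Gaussian GLM with identity link and known unit variance, where $M_1$ has one mean per group and $M_0$ imposes the single restriction that the two means are equal, so $q = 1$. The sample sizes, seed, and function names are hypothetical. In this particular model the deviance difference is exactly $\chi^2_1$ for every sample size, so the simulation only illustrates the limit and does not test the rate of convergence.

```python
import random

def gaussian_loglik(y, mu):
    """Gaussian log-likelihood with unit variance, up to an additive constant."""
    return -0.5 * sum((yi - mi) ** 2 for yi, mi in zip(y, mu))

def deviance_difference(sample_a, sample_b):
    """Return D_0 - D_1 = 2 * (ell_1 - ell_0) for the two-group mean model."""
    y = sample_a + sample_b
    mean_all = sum(y) / len(y)
    mean_a = sum(sample_a) / len(sample_a)
    mean_b = sum(sample_b) / len(sample_b)
    ell0 = gaussian_loglik(y, [mean_all] * len(y))
    ell1 = gaussian_loglik(
        y, [mean_a] * len(sample_a) + [mean_b] * len(sample_b)
    )
    # The dropped additive constant is the same in both terms and cancels.
    return 2 * (ell1 - ell0)

rng = random.Random(0)   # fixed seed so the run is reproducible
m = 50                   # observations per group (assumed value)
reps = 2000              # number of simulated data sets (assumed value)

stats = []
for _ in range(reps):
    # Data generated under M0: both groups share the same true mean.
    a = [rng.gauss(1.0, 1.0) for _ in range(m)]
    b = [rng.gauss(1.0, 1.0) for _ in range(m)]
    stats.append(deviance_difference(a, b))

# A chi-square variable with 1 degree of freedom has mean 1, and its
# 0.95 quantile is approximately 3.841.
mean_stat = sum(stats) / reps
rejection_rate = sum(s > 3.841 for s in stats) / reps
print(mean_stat, rejection_rate)
```

With data generated under $M_0$, the printed mean should be close to $1$ and the rejection rate close to $0.05$, up to Monte Carlo error.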