Neyman-Pearson Lemma — Statement & Proof

Proof

[proofplan] Let $C$ be the likelihood-ratio critical region and $C^*$ the critical region of any competing test with size $\alpha^* \le \alpha$. Write the difference of Type II errors as $\beta - \beta^* = \int_{\bar C} f_1\, d\mathcal{L}^n - \int_{\bar C^*} f_1\, d\mathcal{L}^n$ and use $(\bar C) \setminus (\bar C^*) = \bar C \cap C^*$ and $(\bar C^*)\setminus (\bar C) = C \cap \bar C^*$ to reduce to two integrals of $f_1$ over disjoint symmetric-difference pieces. The defining inequality of $C$ — that $f_1 \le k f_0$ on $\bar C$ and $f_1 \ge k f_0$ on $C$ — then replaces $f_1$ by $k f_0$ termwise, and the two integrals collapse to $k(\alpha^* - \alpha) \le 0$. The non-degeneracy assumption (level sets $\{\Lambda = k\}$ are $\mathbb{P}_0$-null) guarantees a threshold $k$ with exact size $\alpha$ exists. [/proofplan] [step:Fix the setup and reduce power comparison to comparing Type II errors] Let $X = (X_1, \ldots, X_n)$ take values in $\mathbb{R}^n$ with joint density $f$ under the true distribution, where $f \in \{f_0, f_1\}$ and $f_0, f_1: \mathbb{R}^n \to [0, \infty)$ are continuous densities with respect to $\mathcal{L}^n$ satisfying $\{f_0 > 0\} = \{f_1 > 0\} =: S$. Assume further that the likelihood ratio $\Lambda := f_1/f_0$ on $S$ satisfies the **non-degeneracy condition**: for every $k \ge 0$, the level set $\{x \in S : \Lambda(x) = k\}$ has $\mathbb{P}_0$-measure zero (equivalently, the distribution of $\Lambda$ under $\mathbb{P}_0$ is atom-free). This is the standard hypothesis under which the Neyman-Pearson lemma holds in its non-randomised form. For a measurable set $D \subseteq \mathbb{R}^n$ and $i \in \{0, 1\}$, write \begin{align*} \mathbb{P}_i(D) &:= \int_D f_i(x)\, d\mathcal{L}^n(x). \end{align*} A **test** of $H_0: f = f_0$ against $H_1: f = f_1$ is specified by a measurable critical region $D \subseteq \mathbb{R}^n$: we reject $H_0$ iff $X \in D$. 
Its **size** is $\mathbb{P}_0(D)$, its **power** is $\mathbb{P}_1(D)$, and its **Type II error probability** is \begin{align*} \beta(D) &:= 1 - \mathbb{P}_1(D) = \mathbb{P}_1(\bar D), \end{align*} where $\bar D := \mathbb{R}^n \setminus D$. Maximising power is equivalent to minimising $\beta$. Fix a size $\alpha \in (0, 1)$. Define the likelihood-ratio critical region \begin{align*} C &:= \{x \in S : f_1(x) > k\, f_0(x)\}, \end{align*} where $k \ge 0$ is chosen so that $\mathbb{P}_0(C) = \alpha$ (existence of such a $k$ is established in the next step). Write $\beta := \beta(C)$. Let $C^* \subseteq \mathbb{R}^n$ be the critical region of any competing test with $\alpha^* := \mathbb{P}_0(C^*) \le \alpha$, and write $\beta^* := \beta(C^*)$. We must show $\beta \le \beta^*$. [guided] We are comparing two tests. The first, $C$, is the likelihood-ratio test: it rejects $H_0$ when the observed data are relatively more likely under $H_1$ than under $H_0$, with "relatively more likely" meaning the ratio $f_1/f_0$ exceeds a threshold $k$. The second, $C^*$, is an arbitrary competitor whose only stated property is that its size is at most $\alpha$. The Neyman-Pearson lemma claims $C$ has the largest power. "Power" is $\mathbb{P}_1$ of the critical region; "most powerful" means smallest Type II error probability, $\beta(D) = 1 - \mathbb{P}_1(D) = \mathbb{P}_1(\bar D)$. So we must show $\beta \le \beta^*$, or equivalently $\mathbb{P}_1(C) \ge \mathbb{P}_1(C^*)$. The common-support hypothesis $\{f_0 > 0\} = \{f_1 > 0\} = S$ lets us restrict everything to $S$ without loss of generality: outside $S$ both densities vanish, so the integrals are identically zero. This is why the problem of $f_1 > 0, f_0 = 0$ (infinite likelihood ratio) does not arise here — we have excluded it by hypothesis. Setting $\alpha^* := \mathbb{P}_0(C^*) \le \alpha$ records our sole assumption on the competitor: it is not oversized. 
We do not assume that $C^*$ has the form of a likelihood-ratio test, that its size is exactly $\alpha$, or anything about its shape. The proof must apply to all such $C^*$. [/guided] [/step] [step:Choose the threshold $k$ so that the LR test has exact size $\alpha$] Define the tail function \begin{align*} G: [0, \infty) &\to [0, 1], \\ k &\mapsto \mathbb{P}_0(\{x \in S : f_1(x) > k\, f_0(x)\}). \end{align*} Equivalently, letting $\Lambda(x) := f_1(x)/f_0(x)$ on $S$ (well-defined there since $f_0 > 0$), \begin{align*} G(k) = \mathbb{P}_0(\Lambda > k) = \mathbb{P}_0(\{x \in S : \Lambda(x) > k\}). \end{align*} Then $G$ is non-increasing and right-continuous, with $G(0) = \mathbb{P}_0(\{f_1 > 0\} \cap S) = \mathbb{P}_0(S) = 1$ (using $\{f_1 > 0\} = S$) and $\lim_{k \to \infty} G(k) = 0$ by continuity of measure from above applied to the decreasing sets $\{\Lambda > k\}$ (which have finite initial measure $\mathbb{P}_0(\Lambda > 0) \le 1$), whose intersection over all $k$ is empty because $\Lambda$ is finite on $S$. By the non-degeneracy assumption stated in the setup (the level sets $\{\Lambda = k\}$ are $\mathbb{P}_0$-null for every $k \ge 0$), the distribution of $\Lambda$ under $\mathbb{P}_0$ is atom-free, so its distribution function $F_\Lambda$ is continuous. Hence $G(k) = 1 - F_\Lambda(k)$ is continuous. Since $G(0) = 1 > \alpha > 0 = \lim_{k \to \infty} G(k)$, the intermediate value theorem yields some $k \ge 0$ with $G(k) = \alpha$. Fix such a $k$. Then \begin{align*} \mathbb{P}_0(C) = G(k) = \alpha. \end{align*} [guided] The likelihood-ratio test as stated requires a threshold $k$ giving size exactly $\alpha$. Such a $k$ need not exist in full generality: for discrete distributions the size function $G$ jumps, and it may skip over $\alpha$. Continuity of $f_0, f_1$ is a first step towards ruling this out. Concretely: the likelihood ratio $\Lambda = f_1/f_0$ is a ratio of continuous non-negative functions on the open set $S = \{f_0 > 0\}$, so $\Lambda: S \to [0, \infty)$ is continuous. 
However, continuity of $\Lambda$ alone does not guarantee that the distribution of $\Lambda$ under $\mathbb{P}_0$ is atom-free. (Consider the degenerate case $f_0 = f_1$: then $\Lambda \equiv 1$, which is continuous but has all its $\mathbb{P}_0$-mass at a single point.) This is why we imposed the non-degeneracy assumption in the setup: the level sets $\{\Lambda = k\}$ are $\mathbb{P}_0$-null for every $k \ge 0$, which is exactly the condition that the distribution of $\Lambda$ under $\mathbb{P}_0$ has no atoms. With this assumption, the distribution function $F_\Lambda$ is continuous, and $G(k) = \mathbb{P}_0(\Lambda > k) = 1 - F_\Lambda(k)$ is therefore continuous and non-increasing, with $G(0) = 1$ and $G(k) \to 0$ as $k \to \infty$. The intermediate value theorem gives some $k$ with $G(k) = \alpha$. This is the only place where the non-degeneracy assumption is used. For general (possibly discrete) distributions, one replaces the critical region $\{f_1 > k f_0\}$ by a randomised test of the form $\phi(x) \in [0,1]$, and the lemma becomes a statement about the most powerful randomised test at size $\alpha$. The argument below goes through essentially unchanged in that setting. [/guided] [/step] [step:Decompose $\beta - \beta^*$ over the symmetric difference of $\bar C$ and $\bar C^*$] By additivity of the integral, for any measurable sets $D, D' \subseteq \mathbb{R}^n$ we have $\mathbb{P}_1(D) - \mathbb{P}_1(D') = \mathbb{P}_1(D \setminus D') - \mathbb{P}_1(D' \setminus D)$ (the common term $\int_{D \cap D'} f_1\, d\mathcal{L}^n$ appears in both $\mathbb{P}_1(D)$ and $\mathbb{P}_1(D')$ and cancels). Applying this with $D = \bar C$ and $D' = \bar C^*$, and observing $\bar C \setminus \bar C^* = \bar C \cap C^*$ and $\bar C^* \setminus \bar C = \bar C^* \cap C = C \cap \bar C^*$: \begin{align*} \beta - \beta^* &= \mathbb{P}_1(\bar C) - \mathbb{P}_1(\bar C^*) \\ &= \mathbb{P}_1(\bar C \cap C^*) - \mathbb{P}_1(C \cap \bar C^*) \\ &= \int_{\bar C \cap C^*} f_1(x)\, d\mathcal{L}^n(x) - \int_{C \cap \bar C^*} f_1(x)\, d\mathcal{L}^n(x). 
\end{align*} [guided] We want to compare $\beta = \mathbb{P}_1(\bar C)$ and $\beta^* = \mathbb{P}_1(\bar C^*)$. The two sets $\bar C$ and $\bar C^*$ overlap: they share the region $\bar C \cap \bar C^*$ (where both tests accept $H_0$), and contributions from this shared region cancel in the difference. Only the symmetric-difference pieces $\bar C \triangle \bar C^*$ matter. The symmetric difference decomposes as $\bar C \triangle \bar C^* = (\bar C \setminus \bar C^*) \sqcup (\bar C^* \setminus \bar C)$ (disjoint). Using $\bar C^* = \mathbb{R}^n \setminus C^*$, we rewrite \begin{align*} \bar C \setminus \bar C^* &= \bar C \cap (\bar C^*)^c = \bar C \cap C^*, \\ \bar C^* \setminus \bar C &= \bar C^* \cap (\bar C)^c = \bar C^* \cap C = C \cap \bar C^*. \end{align*} The first set, $\bar C \cap C^*$, is where $C^*$ rejects but $C$ accepts. The second, $C \cap \bar C^*$, is where $C$ rejects but $C^*$ accepts. These are the two regions where the tests disagree, and they alone determine the difference. With $D = \bar C$ and $D' = \bar C^*$, the identity $\mathbb{P}_1(D) - \mathbb{P}_1(D') = \mathbb{P}_1(D \setminus D') - \mathbb{P}_1(D' \setminus D)$ is the measure-theoretic reformulation of subtracting a common integral over the shared region $D \cap D'$. Unfolding, we get exactly \begin{align*} \beta - \beta^* = \int_{\bar C \cap C^*} f_1\, d\mathcal{L}^n - \int_{C \cap \bar C^*} f_1\, d\mathcal{L}^n. \end{align*} This is a clean decomposition: the first term is the $\mathbb{P}_1$-mass that $C$ fails to catch (and $C^*$ does); the second is the $\mathbb{P}_1$-mass that $C$ catches (and $C^*$ misses). The likelihood-ratio structure of $C$ now lets us bound both pieces. [/guided] [/step] [step:Bound $f_1$ by $k f_0$ on each piece using the LR structure of $C$] On $C$, by definition $f_1 > k f_0$, so on any subset of $C$, in particular on $C \cap \bar C^*$, we have $f_1 > k f_0$ and hence $f_1 \ge k f_0$. On $\bar C$, by definition $f_1 \le k f_0$ (this includes points outside $S$, where $f_1 = 0 = k f_0$), so on $\bar C \cap C^*$ we have $f_1 \le k f_0$. 
Since $k \ge 0$ and $f_0 \ge 0$, these pointwise bounds integrate directly: \begin{align*} \int_{\bar C \cap C^*} f_1(x)\, d\mathcal{L}^n(x) &\le k \int_{\bar C \cap C^*} f_0(x)\, d\mathcal{L}^n(x) = k\, \mathbb{P}_0(\bar C \cap C^*), \\ \int_{C \cap \bar C^*} f_1(x)\, d\mathcal{L}^n(x) &\ge k \int_{C \cap \bar C^*} f_0(x)\, d\mathcal{L}^n(x) = k\, \mathbb{P}_0(C \cap \bar C^*). \end{align*} Substituting into the identity from the previous step, \begin{align*} \beta - \beta^* \le k\, \mathbb{P}_0(\bar C \cap C^*) - k\, \mathbb{P}_0(C \cap \bar C^*) = k\left[\mathbb{P}_0(\bar C \cap C^*) - \mathbb{P}_0(C \cap \bar C^*)\right]. \end{align*} [guided] Now we use the one piece of structure that distinguishes $C$ from an arbitrary critical region: on $C$, the alternative density dominates the null density scaled by $k$, i.e. $f_1 > k f_0$; on $\bar C$, the reverse holds, $f_1 \le k f_0$. This is a pointwise bound on $f_1$ in terms of $f_0$ that holds with opposite senses on $C$ and $\bar C$. Since integration preserves pointwise inequalities of non-negative integrands (and $k \ge 0$, $f_0 \ge 0$ everywhere), we may integrate: \begin{align*} \int_{\bar C \cap C^*} f_1\, d\mathcal{L}^n &\le \int_{\bar C \cap C^*} k f_0\, d\mathcal{L}^n = k\, \mathbb{P}_0(\bar C \cap C^*), \\ \int_{C \cap \bar C^*} f_1\, d\mathcal{L}^n &\ge \int_{C \cap \bar C^*} k f_0\, d\mathcal{L}^n = k\, \mathbb{P}_0(C \cap \bar C^*). \end{align*} The first bound is an upper bound on a term we are *adding* to $\beta - \beta^*$, and the second is a lower bound on a term we are *subtracting*. Both therefore bound $\beta - \beta^*$ from above, yielding \begin{align*} \beta - \beta^* \le k\,\mathbb{P}_0(\bar C \cap C^*) - k\,\mathbb{P}_0(C \cap \bar C^*). \end{align*} The right-hand side is $k$ times a difference of the $\mathbb{P}_0$-probabilities of the two disagreement regions. We next recognise this difference as $\alpha^* - \alpha$, so that the right-hand side equals $k(\alpha^* - \alpha)$. 
[/guided] [/step] [step:Recognise the right-hand side as $k(\alpha^* - \alpha) \le 0$] Apply the same set-symmetric-difference identity used in the decomposition step, this time under $\mathbb{P}_0$ to the pair $(C^*, C)$: $\mathbb{P}_0(C^*) - \mathbb{P}_0(C) = \mathbb{P}_0(C^* \setminus C) - \mathbb{P}_0(C \setminus C^*)$, and $C^* \setminus C = C^* \cap \bar C = \bar C \cap C^*$, $C \setminus C^* = C \cap \bar C^*$. Therefore \begin{align*} \mathbb{P}_0(\bar C \cap C^*) - \mathbb{P}_0(C \cap \bar C^*) = \mathbb{P}_0(C^*) - \mathbb{P}_0(C) = \alpha^* - \alpha \le 0, \end{align*} where the last inequality uses the hypothesis $\alpha^* \le \alpha$. Since $k \ge 0$, combining with the previous step, \begin{align*} \beta - \beta^* \le k(\alpha^* - \alpha) \le 0. \end{align*} Hence $\beta \le \beta^*$, which is equivalent to $\mathbb{P}_1(C) \ge \mathbb{P}_1(C^*)$: the likelihood-ratio test has power at least as large as any competitor of size at most $\alpha$. This completes the proof. [guided] The two probabilities $\mathbb{P}_0(\bar C \cap C^*)$ and $\mathbb{P}_0(C \cap \bar C^*)$ are themselves a symmetric-difference decomposition — this time of $\mathbb{P}_0(C^*)$ and $\mathbb{P}_0(C)$. Rerunning the identity $\mathbb{P}(D) - \mathbb{P}(D') = \mathbb{P}(D \setminus D') - \mathbb{P}(D' \setminus D)$ with $D = C^*$ and $D' = C$: \begin{align*} \mathbb{P}_0(C^*) - \mathbb{P}_0(C) &= \mathbb{P}_0(C^* \setminus C) - \mathbb{P}_0(C \setminus C^*) \\ &= \mathbb{P}_0(\bar C \cap C^*) - \mathbb{P}_0(C \cap \bar C^*). \end{align*} By construction, $\mathbb{P}_0(C) = \alpha$ (chosen in Step 2) and $\mathbb{P}_0(C^*) = \alpha^*$ (the size of the competitor). So \begin{align*} \mathbb{P}_0(\bar C \cap C^*) - \mathbb{P}_0(C \cap \bar C^*) = \alpha^* - \alpha. \end{align*} The hypothesis $\alpha^* \le \alpha$ makes this non-positive. 
Since $k \ge 0$, multiplying a non-positive quantity by a non-negative one preserves non-positivity: \begin{align*} \beta - \beta^* \le k(\alpha^* - \alpha) \le 0, \end{align*} so $\beta \le \beta^*$. The argument is maximally parsimonious: the structural property of $C$ enters only once, as the pointwise inequality $f_1 \lessgtr k f_0$ on $\bar C/C$, and everything else is set-theoretic bookkeeping. The fundamental insight is that moving probability mass into the critical region is beneficial precisely when $f_1 > k f_0$ — that is, when the Type-I-error cost (measured against $f_0$) is less than $1/k$ per unit of power gain. The likelihood-ratio test is the one that exploits every unit of null-probability budget $\alpha$ on data configurations where this trade-off is most favourable. [/guided] [/step]
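
As a numerical sanity check of the proof's steps, here is a minimal Python sketch under assumptions that are not part of the proof: $n = 1$, $f_0$ the $N(0,1)$ density, $f_1$ the $N(1,1)$ density, $\alpha = 0.05$, and a two-sided competitor $C^* = \{|x| > d\}$ of the same size. In this model $\Lambda(x) = \exp(\mu x - \mu^2/2)$ is strictly increasing, so the non-degeneracy condition holds. The names `MU`, `tail_G` and `find_threshold` are hypothetical and chosen only for illustration.

```python
from math import log
from statistics import NormalDist

# Illustrative assumptions (not from the proof): n = 1, f0 = N(0, 1), f1 = N(MU, 1).
MU = 1.0
ALPHA = 0.05
P0 = NormalDist(0.0, 1.0)
P1 = NormalDist(MU, 1.0)


def tail_G(k: float) -> float:
    """G(k) = P0(Lambda > k), with Lambda(x) = exp(MU * x - MU**2 / 2)."""
    if k <= 0.0:
        return 1.0
    # Lambda(x) > k  <=>  x > (log k + MU^2 / 2) / MU, because MU > 0.
    cutoff = (log(k) + MU ** 2 / 2) / MU
    return 1.0 - P0.cdf(cutoff)


def find_threshold(alpha: float, lo: float = 0.0, hi: float = 1e6) -> float:
    """Bisection on the continuous, non-increasing G to solve G(k) = alpha."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if tail_G(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


k = find_threshold(ALPHA)
c = (log(k) + MU ** 2 / 2) / MU    # likelihood-ratio region C = (c, inf)
d = P0.inv_cdf(1 - ALPHA / 2)      # competitor C* = (-inf, -d) U (d, inf)

alpha_lr = 1 - P0.cdf(c)           # size of C
alpha_star = 2 * (1 - P0.cdf(d))   # size of C*
beta = P1.cdf(c)                   # P1(complement of C)
beta_star = P1.cdf(d) - P1.cdf(-d) # P1(complement of C*)

# Disagreement regions (valid here because d > c):
#   (complement of C) & C*  = (-inf, -d)
#   C & (complement of C*)  = (c, d]
p1_first, p0_first = P1.cdf(-d), P0.cdf(-d)
p1_second, p0_second = P1.cdf(d) - P1.cdf(c), P0.cdf(d) - P0.cdf(c)

print(f"k = {k:.4f}, c = {c:.4f}, d = {d:.4f}")
print(f"sizes: alpha = {alpha_lr:.4f}, alpha* = {alpha_star:.4f}")
print(f"Type II errors: beta = {beta:.4f}, beta* = {beta_star:.4f}")
print(f"first piece:  {p1_first:.5f} <= k * P0 = {k * p0_first:.5f}")
print(f"second piece: {p1_second:.5f} >= k * P0 = {k * p0_second:.5f}")
print(f"beta - beta* = {beta - beta_star:.4f} <= k(alpha* - alpha) = "
      f"{k * (alpha_star - alpha_lr):.2e}")
```

The bisection mirrors the intermediate-value argument for $G$, the two `p1_*`/`p0_*` pairs correspond to the integrals over $\bar C \cap C^*$ and $C \cap \bar C^*$, and the last line checks $\beta - \beta^* \le k(\alpha^* - \alpha)$. The explicit interval descriptions of the disagreement regions rely on $d > c$, which holds for these parameter choices but would need rechecking for others.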

Neyman-Pearson Lemma (Theorem # 1430)
