Binary Hypothesis Testing Characterization of Total Variation

Binary Hypothesis Testing Characterization of Total Variation (Theorem # 5907)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] A test $\phi$ is equivalent to choosing the measurable rejection region $A=\{x \in \mathcal X:\phi(x)=1\}$. The average error can then be written exactly in terms of the signed difference $P(A)-Q(A)$. Since total variation is the largest possible absolute separation between $P$ and $Q$ over measurable sets, and complements convert $P(A)-Q(A)$ into $Q(A^c)-P(A^c)$, minimizing the error is the same as maximizing this separation. The result follows by taking the infimum over all measurable test regions. [/proofplan] [step:Rewrite every test as a measurable decision region] Let $\phi: \mathcal X \to \{0,1\}$ be a measurable test, where $\{0,1\}$ is equipped with its discrete $\sigma$-algebra. Define the measurable set \begin{align*} A_\phi := \{x \in \mathcal X : \phi(x)=1\} = \phi^{-1}(\{1\}) \in \mathcal A. \end{align*} Then \begin{align*} \{x \in \mathcal X : \phi(x)=0\} = \mathcal X \setminus A_\phi. \end{align*} Conversely, for every $A \in \mathcal A$, define the measurable map $\phi_A: \mathcal X \to \{0,1\}$ by \begin{align*} \phi_A(x) := \mathbb{1}_A(x) \quad \text{for each } x \in \mathcal X. \end{align*} Then $A_{\phi_A}=A$. Therefore taking the infimum over all measurable tests $\phi$ is the same as taking the infimum over all measurable sets $A \in \mathcal A$. [/step] [step:Express the average error through measure separation] For a measurable set $A \in \mathcal A$, the corresponding test $\phi_A=\mathbb{1}_A$ has average error \begin{align*} \frac{1}{2}\left(P(A)+Q(\mathcal X \setminus A)\right). \end{align*} Since $Q$ is a probability measure, finite additivity on the disjoint union $\mathcal X=A\cup(\mathcal X\setminus A)$ gives \begin{align*} Q(\mathcal X \setminus A)=1-Q(A). \end{align*} Hence \begin{align*} \frac{1}{2}\left(P(A)+Q(\mathcal X \setminus A)\right)=\frac{1}{2}\left(P(A)+1-Q(A)\right). \end{align*} Rearranging the terms inside the parentheses gives \begin{align*} \frac{1}{2}\left(P(A)+1-Q(A)\right)=\frac{1}{2}\left(1-\bigl(Q(A)-P(A)\bigr)\right). \end{align*} [guided] The test only matters through the set on which it outputs $1$. If $A \in \mathcal A$ is that set, then the error under $P$ is $P(A)$, because the displayed risk counts $\phi=1$ as an error when the true distribution is $P$. The error under $Q$ is $Q(\mathcal X \setminus A)$, because the risk counts $\phi=0$ as an error when the true distribution is $Q$. Since $Q$ is a probability measure, $Q(\mathcal X)=1$. The sets $A$ and $\mathcal X\setminus A$ are disjoint measurable sets whose union is $\mathcal X$, so finite additivity gives \begin{align*} Q(A)+Q(\mathcal X \setminus A)=Q(\mathcal X)=1. \end{align*} Therefore \begin{align*} Q(\mathcal X \setminus A)=1-Q(A). \end{align*} Substituting this into the average error yields \begin{align*} \frac{1}{2}\left(P(A)+Q(\mathcal X \setminus A)\right)=\frac{1}{2}\left(P(A)+1-Q(A)\right). \end{align*} Rearranging the signed difference gives \begin{align*} \frac{1}{2}\left(P(A)+1-Q(A)\right)=\frac{1}{2}\left(1-\bigl(Q(A)-P(A)\bigr)\right). \end{align*} Thus minimizing the error is equivalent to making $Q(A)-P(A)$ as large as possible. [/guided] [/step] [step:Identify the largest signed separation with total variation] Define \begin{align*} M := \sup_{A \in \mathcal A} \bigl(Q(A)-P(A)\bigr). \end{align*} We show that $M=\operatorname{TV}(P,Q)$. For every $A \in \mathcal A$, \begin{align*} Q(A)-P(A) \le |Q(A)-P(A)| = |P(A)-Q(A)| \le \operatorname{TV}(P,Q), \end{align*} so $M \le \operatorname{TV}(P,Q)$. For the reverse inequality, fix $A \in \mathcal A$. If $P(A)-Q(A) \le Q(A)-P(A)$, then \begin{align*} |P(A)-Q(A)| = Q(A)-P(A) \le M. \end{align*} If $P(A)-Q(A) > Q(A)-P(A)$, use the complement $\mathcal X \setminus A$. Since $P$ and $Q$ are probability measures, \begin{align*} Q(\mathcal X \setminus A)-P(\mathcal X \setminus A)=\bigl(1-Q(A)\bigr)-\bigl(1-P(A)\bigr). \end{align*} Simplifying the right-hand side gives \begin{align*} \bigl(1-Q(A)\bigr)-\bigl(1-P(A)\bigr)=P(A)-Q(A). \end{align*} In the present case $P(A)-Q(A)>Q(A)-P(A)$, so $P(A)-Q(A)>0$ and hence \begin{align*} P(A)-Q(A)=|P(A)-Q(A)|. \end{align*} Thus $|P(A)-Q(A)| \le M$ in both cases. Taking the supremum over $A \in \mathcal A$ gives \begin{align*} \operatorname{TV}(P,Q) \le M. \end{align*} Therefore \begin{align*} M=\operatorname{TV}(P,Q). \end{align*} [/step] [step:Take the infimum over all measurable tests] Using the preceding identification of tests with measurable sets, \begin{align*} \inf_{\phi} \frac{1}{2}\left(P(\{x \in \mathcal X : \phi(x)=1\})+Q(\{x \in \mathcal X : \phi(x)=0\})\right)=\inf_{A \in \mathcal A} \frac{1}{2}\left(P(A)+Q(\mathcal X \setminus A)\right). \end{align*} Using the expression for the average error from the second step, \begin{align*} \inf_{A \in \mathcal A} \frac{1}{2}\left(P(A)+Q(\mathcal X \setminus A)\right)=\inf_{A \in \mathcal A} \frac{1}{2}\left(1-\bigl(Q(A)-P(A)\bigr)\right). \end{align*} For every $A \in \mathcal A$, the quantity $Q(A)-P(A)$ lies in $[-1,1]$, because $P(A),Q(A) \in [0,1]$. Define the affine map $g:[-1,1]\to[0,1]$ by \begin{align*} g(t):=\frac{1-t}{2}. \end{align*} The map $g$ is decreasing, so the order-reversing property of the infimum and supremum gives \begin{align*} \inf_{A \in \mathcal A} \frac{1}{2}\left(1-\bigl(Q(A)-P(A)\bigr)\right)=\frac{1}{2}\left(1-\sup_{A \in \mathcal A}\bigl(Q(A)-P(A)\bigr)\right). \end{align*} By the preceding step, the supremum on the right-hand side is $\operatorname{TV}(P,Q)$. Therefore \begin{align*} \inf_{\phi} \frac{1}{2}\left(P(\{x \in \mathcal X : \phi(x)=1\})+Q(\{x \in \mathcal X : \phi(x)=0\})\right)=\frac{1-\operatorname{TV}(P,Q)}{2}. \end{align*} This is exactly the asserted formula. [/step]

Prerequisites (0/4 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

test

Definitions & Concepts

Explore Further

What brings you to Androma?

Start with a route through the knowledge graph.