[step:Apply a finite Fano bound to the induced testing problem]
[claim:Fano lower bound for finitely many Gaussian experiments]
Let $J$ be a finite set with $M:=|J|\ge 2$. For each $j\in J$, let $P_j$ be a probability measure on a common measurable space $(\mathcal Y,\mathcal A)$. Let $\tilde j$ be a measurable estimator, that is, a measurable map
\begin{align*}
\tilde j: \mathcal Y &\to J\cup\{\ast\}.
\end{align*}
Fix $j_0\in J$. Define $D_{\mathrm{KL}}(P\,\|\,Q)$ to be the Kullback-Leibler divergence of a probability measure $P$ from a probability measure $Q$, with value $+\infty$ if $P$ is not absolutely continuous with respect to $Q$. Then
\begin{align*}
\max_{j\in J}P_j(\tilde j\ne j)
\ge
1-\frac{\frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,P_{j_0})+\log 2}{\log M}.
\end{align*}
[/claim]
[proof]
Let $V$ be a $J$-valued [random variable](/page/Random%20Variable) with the uniform law on $J$, and conditional on the event $\{V=j\}$ let $Y$ be a $\mathcal Y$-valued random variable with law $P_j$. Let $Q$ be the marginal law of $Y$, so
\begin{align*}
Q=\frac{1}{M}\sum_{m\in J}P_m.
\end{align*}
Define the mutual information $I(V;Y)$ by
\begin{align*}
I(V;Y)
:=
\frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,Q).
\end{align*}
If at least one term $D_{\mathrm{KL}}(P_j\,\|\,P_{j_0})$ is infinite, the right-hand side of the claimed bound equals $-\infty$ and the bound holds trivially. Hence assume all these divergences are finite. Then $P_j\ll P_{j_0}$ for every $j\in J$, and therefore $Q\ll P_{j_0}$. Let $p_j:=dP_j/dP_{j_0}$ and $q:=dQ/dP_{j_0}$. By the definition of $Q$,
\begin{align*}
q=\frac{1}{M}\sum_{m\in J}p_m
\end{align*}
$P_{j_0}$-almost everywhere. Expanding the average divergence relative to $P_{j_0}$ gives
\begin{align*}
\frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,P_{j_0})
&=
\frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log p_j\,dP_{j_0}.
\end{align*}
Since $p_j=q\,(dP_j/dQ)$ $P_{j_0}$-almost everywhere on $\{q>0\}$, and $p_j=0$ on $\{q=0\}$ because $q$ is an average of the nonnegative densities $p_m$, splitting $\log p_j=\log(dP_j/dQ)+\log q$ gives, with the convention $0\log 0=0$,
\begin{align*}
\frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log p_j\,dP_{j_0}
=
\frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log\left(\frac{dP_j}{dQ}\right)\,dP_{j_0}
+
\frac{1}{M}\sum_{j\in J}\int_{\mathcal Y}p_j\log q\,dP_{j_0}.
\end{align*}
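Since $p_j\,dP_{j_0}=dP_j$, each integral in the first sum equals $D_{\mathrm{KL}}(P_j\,\|\,Q)$, and since $q=M^{-1}\sum_{m\in J}p_m$, the second sum collapses to a single integral. This becomes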
\begin{align*}
\frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log p_j\,dP_{j_0}
=
\frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,Q)
+
\int_{\mathcal Y}q\log q\,dP_{j_0}.
\end{align*}
By the definitions of $I(V;Y)$ and $D_{\mathrm{KL}}(Q\,\|\,P_{j_0})$, we conclude
\begin{align*}
\frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log p_j\,dP_{j_0}
=
I(V;Y)+D_{\mathrm{KL}}(Q\,\|\,P_{j_0}).
\end{align*}
Because $D_{\mathrm{KL}}(Q\,\|\,P_{j_0})\ge 0$, this identity yields the required information bound
\begin{align*}
I(V;Y)\le \frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,P_{j_0}).
\end{align*}
Let $\hat V:=\tilde j(Y)$. Define the error probability $p_e:=\mathbb P(\hat V\ne V)$. Since $\ast\notin J$, every outcome with $\tilde j(Y)=\ast$ is counted as an error, so $\hat V$ is a finite-valued decision rule whose error event is precisely $\{\hat V\ne V\}$. Let $H(V)$ denote the Shannon entropy of $V$, and let $H(V\mid \hat V)$ denote the conditional Shannon entropy of $V$ given $\hat V$.
We prove the finite-alphabet entropy estimate needed here. Let $E_0$ be the Bernoulli random variable defined by $E_0=1$ on $\{\hat V\ne V\}$ and $E_0=0$ on $\{\hat V=V\}$. Conditional on $\hat V$ and $E_0=0$, the value of $V$ is determined by $V=\hat V$. Conditional on $\hat V$ and $E_0=1$, the value of $V$ belongs to at most $M$ possibilities. Since adjoining $E_0$ to $V$ cannot decrease the conditional entropy, we have
\begin{align*}
H(V\mid \hat V)
\le H(E_0,V\mid \hat V).
\end{align*}
The finite-alphabet [chain rule for entropy](/theorems/1635) gives
\begin{align*}
H(E_0,V\mid \hat V)
= H(E_0\mid \hat V)+H(V\mid \hat V,E_0).
\end{align*}
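Since $E_0$ is Bernoulli, $H(E_0\mid \hat V)\le H(E_0)\le \log 2$. The conditional entropy of $V$ given $(\hat V,E_0)$ is $0$ on $\{E_0=0\}$ and at most $\log M$ on $\{E_0=1\}$, an event of probability $p_e$, so $H(V\mid \hat V,E_0)\le p_e\log M$. Therefore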
\begin{align*}
H(V\mid \hat V)
\le \log 2+p_e\log M.
\end{align*}
Moreover, $H(V)=\log M$. Because $\hat V$ is a measurable function of $Y$, conditioning on the full observation $Y$ leaves no more uncertainty than conditioning on $\hat V$ alone, so
\begin{align*}
H(V\mid \hat V)\ge H(V\mid Y)=H(V)-I(V;Y)=\log M-I(V;Y).
\end{align*}
Combining the preceding upper and lower bounds on $H(V\mid \hat V)$ gives
\begin{align*}
p_e
\ge
1-\frac{I(V;Y)+\log 2}{\log M}.
\end{align*}
Finally,
\begin{align*}
p_e
=
\frac{1}{M}\sum_{j\in J}P_j(\tilde j\ne j)
\le
\max_{j\in J}P_j(\tilde j\ne j).
\end{align*}
Combining this with the lower bound on $p_e$ and the information bound $I(V;Y)\le \frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,P_{j_0})$ proves the claim.
[/proof]
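As a numerical sanity check of the claim, the following minimal sketch evaluates both sides on a finite sample space. It is an illustration, not part of the proof. The helper names `kl` and `fano_check` and the four-point example distributions are assumptions introduced here. The estimator used is the maximum-likelihood rule, which minimizes the average error under the uniform prior, so its average error is a lower bound for the maximal error of any estimator.

```python
import math

def kl(p, q):
    """KL divergence D(p || q) in nats on a finite space; +inf if p is not << q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def fano_check(dists, j0=0):
    """dists: list of M >= 2 probability vectors on a common finite space."""
    M = len(dists)
    n = len(dists[0])
    # Mixture Q = (1/M) * sum_j P_j, the marginal law of Y.
    Q = [sum(P[y] for P in dists) / M for y in range(n)]
    # I(V;Y) defined as the average divergence to the mixture.
    mutual_info = sum(kl(P, Q) for P in dists) / M
    # Average divergence to the reference measure P_{j0}.
    avg_kl = sum(kl(P, dists[j0]) for P in dists) / M
    # D(Q || P_{j0}), the nonnegative gap in the decomposition.
    gap = kl(Q, dists[j0])
    # Average error of the maximum-likelihood rule under the uniform prior.
    p_e = 1.0 - sum(max(P[y] for P in dists) for y in range(n)) / M
    # Right-hand side of the claimed Fano bound.
    fano_rhs = 1.0 - (avg_kl + math.log(2)) / math.log(M)
    return {"mutual_info": mutual_info, "avg_kl": avg_kl, "gap": gap,
            "p_e": p_e, "fano_rhs": fano_rhs}

# Hypothetical example: M = 4 hypotheses on a four-point space,
# where P_j puts mass 0.4 on point j and 0.2 on each other point.
example = [[0.4 if y == j else 0.2 for y in range(4)] for j in range(4)]
result = fano_check(example)
print(result)
```

In this example the decomposition `avg_kl = mutual_info + gap` should hold up to floating-point error, and the average error of the maximum-likelihood rule should sit above `fano_rhs`, as the claim predicts.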
Choose a fixed index $j_0\in J$. Applying the claim with $P_j=P_j^X$ and $\tilde j=\hat j_X$ gives
\begin{align*}
R_X
\ge
1-\frac{\frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j^X\,\|\,P_{j_0}^X)+\log 2}{\log M}.
\end{align*}
[/step]