[proofplan]
The argument converts every estimator into a test by composing it with the decoder $\psi$. The assumed pointwise lower bound on the loss implies that, under each distinguished parameter $\theta_i$, the estimator's risk is at least $\delta$ times the probability that the induced test fails to output $i$. Taking the maximum over the finite subset and then the infimum over estimators bounds the minimax estimation risk below by $\delta$ times the minimax testing risk, because estimator-induced tests form only a subclass of all tests and an infimum over a subclass is at least the infimum over the whole class.
[/proofplan]
[step:Convert an arbitrary estimator into a test]
We work on the statistical experiment $(\Omega,\mathcal F,(\mathbb P_\theta)_{\theta\in\Theta})$ with action space $(\mathcal A,\mathcal G)$, so each estimator is a measurable map from $(\Omega,\mathcal F)$ to $(\mathcal A,\mathcal G)$ and each expectation $\mathbb E_\theta$ is taken with respect to $\mathbb P_\theta$. Because $\theta_1,\dots,\theta_M$ are distinct elements of $\Theta$, the indices $1,\dots,M$ label $M$ distinct hypotheses in the finite testing problem. Fix a measurable estimator
\begin{align*}
\hat a:(\Omega,\mathcal F)\to(\mathcal A,\mathcal G).
\end{align*}
Define the induced test
\begin{align*}
\hat V_{\hat a}:(\Omega,\mathcal F)\to(\{1,\dots,M\},2^{\{1,\dots,M\}})
\end{align*}
by
\begin{align*}
\hat V_{\hat a}(\omega)=\psi(\hat a(\omega)).
\end{align*}
Since $\hat a$ and $\psi$ are measurable, the composition $\hat V_{\hat a}=\psi\circ\hat a$ is measurable and is therefore an admissible test.
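As an illustration only, and not part of the hypotheses above, one common way to obtain a decoder $\psi$ satisfying the pointwise loss bound is a nearest-point rule. The following sketch assumes that $\mathcal A=\Theta$ carries a metric $d$ with Borel $\sigma$-algebra $\mathcal G$, that the loss is $L(\theta,a)=d(\theta,a)$, and that the distinguished parameters are $2\delta$-separated; all of these are assumptions made for this example. Set
\begin{align*}
d(\theta_i,\theta_j)\ge 2\delta\quad(i\ne j),
\qquad
\psi(a)=\min\Bigl\{j:\ d(a,\theta_j)=\min_{1\le k\le M}d(a,\theta_k)\Bigr\}.
\end{align*}
Each map $a\mapsto d(a,\theta_k)$ is continuous, so this $\psi$ is measurable. If $\psi(a)=j\ne i$, then $d(a,\theta_j)\le d(a,\theta_i)$, and the triangle inequality gives
\begin{align*}
2\delta\le d(\theta_i,\theta_j)\le d(\theta_i,a)+d(a,\theta_j)\le 2\,d(\theta_i,a),
\end{align*}
so $L(\theta_i,a)\ge\delta\,\mathbb{1}_{\{\psi(a)\ne i\}}$ for every $a\in\mathcal A$. In this example the induced test $\hat V_{\hat a}$ simply reports the index of the distinguished parameter closest to $\hat a$.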
[/step]
[step:Lower bound the risk at each distinguished parameter by the induced testing error]
For each $i\in\{1,\dots,M\}$, applying the assumed pointwise loss bound with $a=\hat a(\omega)$ gives, for every $\omega\in\Omega$,
\begin{align*}
L(\theta_i,\hat a(\omega))
\ge
\delta\,\mathbb{1}_{\{\psi(\hat a(\omega))\ne i\}}
=
\delta\,\mathbb{1}_{\{\hat V_{\hat a}(\omega)\ne i\}}.
\end{align*}
Taking $\mathbb P_{\theta_i}$-expectations of both non-negative random variables yields
\begin{align*}
\mathbb E_{\theta_i}[L(\theta_i,\hat a)]
\ge
\delta\,\mathbb E_{\theta_i}[\mathbb{1}_{\{\hat V_{\hat a}\ne i\}}]
=
\delta\,\mathbb P_{\theta_i}(\hat V_{\hat a}\ne i).
\end{align*}
[/step]
[guided]
Fix $i\in\{1,\dots,M\}$. The hypothesis gives a deterministic inequality for every possible action $a\in\mathcal A$:
\begin{align*}
L(\theta_i,a)\ge \delta\,\mathbb{1}_{\{\psi(a)\ne i\}}.
\end{align*}
We apply this inequality to the random action $a=\hat a(\omega)$. This is valid pointwise in $\omega$, because $\hat a(\omega)\in\mathcal A$ for every $\omega\in\Omega$. Thus
\begin{align*}
L(\theta_i,\hat a(\omega))
\ge
\delta\,\mathbb{1}_{\{\psi(\hat a(\omega))\ne i\}}.
\end{align*}
By the definition $\hat V_{\hat a}=\psi\circ\hat a$, the event $\{\psi(\hat a)\ne i\}$ is exactly the event $\{\hat V_{\hat a}\ne i\}$. Therefore
\begin{align*}
L(\theta_i,\hat a(\omega))
\ge
\delta\,\mathbb{1}_{\{\hat V_{\hat a}(\omega)\ne i\}}.
\end{align*}
Both sides are non-negative measurable random variables: measurability of the left side follows from the stated measurability of $\hat a$ and of the section $a\mapsto L(\theta_i,a)$, while measurability of the right side follows from measurability of $\hat V_{\hat a}$. Taking expectation with respect to $\mathbb P_{\theta_i}$ preserves the inequality:
\begin{align*}
\mathbb E_{\theta_i}[L(\theta_i,\hat a)]
\ge
\delta\,\mathbb E_{\theta_i}[\mathbb{1}_{\{\hat V_{\hat a}\ne i\}}].
\end{align*}
The expectation of an indicator is the probability of its event, so
\begin{align*}
\mathbb E_{\theta_i}[L(\theta_i,\hat a)]
\ge
\delta\,\mathbb P_{\theta_i}(\hat V_{\hat a}\ne i).
\end{align*}
This is the key reduction: estimating with loss at least $\delta$ on decoder mistakes is no easier than testing which $\theta_i$ generated the data.
[/guided]
[step:Pass from pointwise risk bounds to the minimax lower bound]
Since $\{\theta_1,\dots,\theta_M\}\subset\Theta$, for every estimator $\hat a$,
\begin{align*}
\sup_{\theta\in\Theta}\mathbb E_\theta[L(\theta,\hat a)]
\ge
\max_{1\le i\le M}\mathbb E_{\theta_i}[L(\theta_i,\hat a)].
\end{align*}
Applying the previous step to each $i$, and then using the hypothesis $\delta\ge 0$ to pull $\delta$ out of the maximum, gives
\begin{align*}
\max_{1\le i\le M}\mathbb E_{\theta_i}[L(\theta_i,\hat a)]
\ge
\max_{1\le i\le M}\delta\,\mathbb P_{\theta_i}(\hat V_{\hat a}\ne i)
=
\delta\max_{1\le i\le M}\mathbb P_{\theta_i}(\hat V_{\hat a}\ne i).
\end{align*}
Hence
\begin{align*}
\sup_{\theta\in\Theta}\mathbb E_\theta[L(\theta,\hat a)]
\ge
\delta\max_{1\le i\le M}\mathbb P_{\theta_i}(\hat V_{\hat a}\ne i).
\end{align*}
Taking the infimum over all measurable estimators $\hat a$ yields
\begin{align*}
\inf_{\hat a}\sup_{\theta\in\Theta}\mathbb E_\theta[L(\theta,\hat a)]
\ge
\delta\inf_{\hat a}\max_{1\le i\le M}\mathbb P_{\theta_i}(\hat V_{\hat a}\ne i).
\end{align*}
The collection of tests of the form $\hat V_{\hat a}=\psi\circ\hat a$ is a subset of the collection of all measurable tests $\hat V:\Omega\to\{1,\dots,M\}$. Taking an infimum over the larger class can only decrease the value, so
\begin{align*}
\inf_{\hat a}\max_{1\le i\le M}\mathbb P_{\theta_i}(\hat V_{\hat a}\ne i)
\ge
\inf_{\hat V}\max_{1\le i\le M}\mathbb P_{\theta_i}(\hat V\ne i).
\end{align*}
Combining the last two inequalities gives
\begin{align*}
\inf_{\hat a}\sup_{\theta\in\Theta}\mathbb E_\theta[L(\theta,\hat a)]
\ge
\delta\inf_{\hat V}\max_{1\le i\le M}\mathbb P_{\theta_i}(\hat V\ne i),
\end{align*}
which is the desired reduction.
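For orientation, the right-hand side can be made explicit in the simplest case $M=2$. The following is a sketch under that assumption, with $\mathrm{TV}(\mathbb P_{\theta_1},\mathbb P_{\theta_2})=\sup_{A\in\mathcal F}\lvert\mathbb P_{\theta_1}(A)-\mathbb P_{\theta_2}(A)\rvert$ denoting the total variation distance, a quantity introduced here for illustration and not used elsewhere in this proof. For any measurable test $\hat V:\Omega\to\{1,2\}$, bounding the maximum below by the average gives
\begin{align*}
\max_{i\in\{1,2\}}\mathbb P_{\theta_i}(\hat V\ne i)
&\ge
\tfrac12\bigl(\mathbb P_{\theta_1}(\hat V\ne 1)+\mathbb P_{\theta_2}(\hat V\ne 2)\bigr)\\
&=
\tfrac12\bigl(1-\bigl[\mathbb P_{\theta_1}(\hat V=1)-\mathbb P_{\theta_2}(\hat V=1)\bigr]\bigr)\\
&\ge
\tfrac12\bigl(1-\mathrm{TV}(\mathbb P_{\theta_1},\mathbb P_{\theta_2})\bigr).
\end{align*}
Taking the infimum over $\hat V$ and inserting the result into the reduction would give, in this two-point case,
\begin{align*}
\inf_{\hat a}\sup_{\theta\in\Theta}\mathbb E_\theta[L(\theta,\hat a)]
\ge
\frac{\delta}{2}\bigl(1-\mathrm{TV}(\mathbb P_{\theta_1},\mathbb P_{\theta_2})\bigr).
\end{align*}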
[/step]