[proofplan]
We convert any estimator $\hat T$ into a nearest-neighbour test among the separated target values $T(\theta_1),\dots,T(\theta_M)$. The separation condition implies that, under $\theta_j$, the test can make an error only when $\hat T(X)$ is at metric distance at least $a$ from $T(\theta_j)$. Since the loss dominates the metric, this testing error probability is bounded by the estimation risk divided by $a$. Maximizing over the finite parameter subset and then infimizing gives the claimed testing lower bound.
[/proofplan]
[step:Show that small estimation error forces the nearest-neighbour decoder to choose the true index]
Let $\mathcal A$ denote the $\sigma$-algebra on $\mathcal X$, and let $\mathcal B(\mathcal T)$ denote the Borel $\sigma$-algebra induced by the metric $\rho$ on $\mathcal T$. Fix a measurable estimator
\begin{align*}
\hat T:(\mathcal X,\mathcal A) &\to (\mathcal T,\mathcal B(\mathcal T)).
\end{align*}
Let
\begin{align*}
\hat J_{\hat T}:\mathcal X &\to \{1,\dots,M\}
\end{align*}
be its nearest-neighbour decoder with the deterministic tie-breaking rule from the statement. For each $i \in \{1,\dots,M\}$, define the distance-to-target map $d_i:\mathcal X \to [0,\infty)$ by
\begin{align*}
d_i(x) := \rho(\hat T(x),T(\theta_i)).
\end{align*}
Since $\rho(\cdot,T(\theta_i)):(\mathcal T,\mathcal B(\mathcal T)) \to ([0,\infty),\mathcal B([0,\infty)))$ is continuous and hence Borel-measurable, and since $\hat T$ is measurable, each $d_i$ is $\mathcal A$-measurable. Because $\{1,\dots,M\}$ is finite and the tie-breaking rule is deterministic, every set $\{x \in \mathcal X : \hat J_{\hat T}(x)=i\}$ is obtained from finitely many measurable comparison sets of the form $\{d_i \le d_k\}$ and $\{d_i < d_k\}$; hence $\hat J_{\hat T}$ is measurable.
Fix $j \in \{1,\dots,M\}$ and $x \in \mathcal X$ such that
\begin{align*}
\rho(\hat T(x),T(\theta_j)) < a.
\end{align*}
For every $i \in \{1,\dots,M\}$ with $i \ne j$, the triangle inequality for the metric $\rho$ gives
\begin{align*}
\rho(T(\theta_i),T(\theta_j))
\le
\rho(T(\theta_i),\hat T(x))+\rho(\hat T(x),T(\theta_j)).
\end{align*}
Using the separation hypothesis and rearranging, we obtain
\begin{align*}
\rho(\hat T(x),T(\theta_i))
\ge
\rho(T(\theta_i),T(\theta_j))-\rho(\hat T(x),T(\theta_j))
>
2a-a
=
a.
\end{align*}
Since $\rho(\hat T(x),T(\theta_j))<a$, the index $j$ gives a strictly smaller distance than every $i \ne j$. Therefore $\hat J_{\hat T}(x)=j$.
[/step]
[step:Contain the testing error event inside a large estimation error event]
For each $j \in \{1,\dots,M\}$, define the testing error event
\begin{align*}
E_j := \{x \in \mathcal X : \hat J_{\hat T}(x) \ne j\}
\end{align*}
and the metric estimation error event
\begin{align*}
A_j := \{x \in \mathcal X : \rho(\hat T(x),T(\theta_j)) \ge a\}.
\end{align*}
The previous step proves the contrapositive implication
\begin{align*}
x \notin A_j \implies x \notin E_j.
\end{align*}
Hence
\begin{align*}
E_j \subset A_j.
\end{align*}
Therefore, for every $j \in \{1,\dots,M\}$,
\begin{align*}
\mathbb P_j(\hat J_{\hat T}(X) \ne j)
\le
\mathbb P_j(\rho(\hat T(X),T(\theta_j)) \ge a).
\end{align*}
[guided]
Fix $j \in \{1,\dots,M\}$. We want to compare two events under the law $\nu_{\theta_j}$: the testing error event and the event that the estimator is far from the true target value. Define
\begin{align*}
E_j := \{x \in \mathcal X : \hat J_{\hat T}(x) \ne j\}
\end{align*}
and
\begin{align*}
A_j := \{x \in \mathcal X : \rho(\hat T(x),T(\theta_j)) \ge a\}.
\end{align*}
Suppose $x \in \mathcal X$ satisfies
\begin{align*}
\rho(\hat T(x),T(\theta_j)) < a.
\end{align*}
For any $i \in \{1,\dots,M\}$ with $i \ne j$, the triangle inequality for the metric $\rho$ gives
\begin{align*}
\rho(T(\theta_i),T(\theta_j))
\le
\rho(T(\theta_i),\hat T(x))+
\rho(\hat T(x),T(\theta_j)).
\end{align*}
Using the separation hypothesis $\rho(T(\theta_i),T(\theta_j)) \ge 2a$ and subtracting the strict bound $\rho(\hat T(x),T(\theta_j)) < a$, we obtain
\begin{align*}
\rho(\hat T(x),T(\theta_i))
\ge
\rho(T(\theta_i),T(\theta_j))-\rho(\hat T(x),T(\theta_j))
>
2a-a
=
a.
\end{align*}
Thus $j$ is strictly closer to $\hat T(x)$ than every index $i \ne j$, so the deterministic tie-breaking rule is not reached and $\hat J_{\hat T}(x)=j$. In set notation, this proves
\begin{align*}
x \notin A_j \implies x \notin E_j.
\end{align*}
Taking contrapositives gives
\begin{align*}
x \in E_j \implies x \in A_j,
\end{align*}
so
\begin{align*}
E_j \subset A_j.
\end{align*}
Probability is monotone with respect to inclusion of measurable events, and therefore
\begin{align*}
\mathbb P_j(\hat J_{\hat T}(X) \ne j)
=
\mathbb P_j(E_j)
\le
\mathbb P_j(A_j)
=
\mathbb P_j(\rho(\hat T(X),T(\theta_j)) \ge a).
\end{align*}
This is the reduction: a testing mistake is possible only when the estimator has already made an error of size at least $a$ in the target metric.
[/guided]
[/step]
[step:Bound the testing error probability by the estimation risk]
For $j \in \{1,\dots,M\}$, let $\mathbb P_j:=\mathbb P_{\theta_j}$ be the probability measure induced by $X\sim\nu_{\theta_j}$ on $(\mathcal X,\mathcal A)$, and write $\mathbb E_j:=\mathbb E_{\theta_j}$ for expectation with respect to $\mathbb P_j$. For each $j \in \{1,\dots,M\}$, define the map $Y_j:\mathcal X \to [0,\infty]$ by
\begin{align*}
Y_j(x) := L(\hat T(x),T(\theta_j)).
\end{align*}
The estimator $\hat T$ is $\mathcal A/\mathcal B(\mathcal T)$-measurable, the map $x \mapsto T(\theta_j)$ from $(\mathcal X,\mathcal A)$ to $(\mathcal T,\mathcal B(\mathcal T))$ is constant and hence measurable, and $L$ is $\mathcal B(\mathcal T)\otimes\mathcal B(\mathcal T)$-measurable by hypothesis; therefore $Y_j$ is a non-negative [random variable](/page/Random%20Variable). Since $L(t,t') \ge \rho(t,t')$ for all $t,t' \in \mathcal T$, we have the event inclusion
\begin{align*}
\{x \in \mathcal X : \rho(\hat T(x),T(\theta_j)) \ge a\}
\subset
\{x \in \mathcal X : Y_j(x) \ge a\}.
\end{align*}
Because $Y_j \ge 0$, the pointwise inequality
\begin{align*}
Y_j(x) \ge a\,\mathbb 1_{\{Y_j \ge a\}}(x)
\end{align*}
holds for every $x \in \mathcal X$. Taking expectation under $\mathbb P_j$ gives
\begin{align*}
\mathbb E_j[L(\hat T(X),T(\theta_j))]
=
\mathbb E_j[Y_j(X)]
\ge
a\,\mathbb P_j(Y_j(X) \ge a).
\end{align*}
Combining this with the event inclusions from this step and the previous step yields
\begin{align*}
\mathbb E_j[L(\hat T(X),T(\theta_j))]
\ge
a\,\mathbb P_j(\hat J_{\hat T}(X) \ne j).
\end{align*}
[/step]
[step:Maximize over the finite testing family and infimize over estimators]
Since $\{\theta_1,\dots,\theta_M\} \subset \Theta$, for the fixed estimator $\hat T$ we have
\begin{align*}
\sup_{\theta \in \Theta} \mathbb E_\theta[L(\hat T(X),T(\theta))]
\ge
\max_{1 \le j \le M} \mathbb E_j[L(\hat T(X),T(\theta_j))].
\end{align*}
Using the estimate from the previous step for each $j$ gives
\begin{align*}
\max_{1 \le j \le M} \mathbb E_j[L(\hat T(X),T(\theta_j))]
\ge
a \max_{1 \le j \le M} \mathbb P_j(\hat J_{\hat T}(X) \ne j).
\end{align*}
The decoder $\hat J_{\hat T}$ is one measurable test from $\mathcal X$ to $\{1,\dots,M\}$, so
\begin{align*}
\max_{1 \le j \le M} \mathbb P_j(\hat J_{\hat T}(X) \ne j)
\ge
\inf_{\hat J} \max_{1 \le j \le M} \mathbb P_j(\hat J(X) \ne j),
\end{align*}
where the infimum is over all measurable tests $\hat J:\mathcal X \to \{1,\dots,M\}$. Therefore every measurable estimator $\hat T$ satisfies
\begin{align*}
\sup_{\theta \in \Theta} \mathbb E_\theta[L(\hat T(X),T(\theta))]
\ge
a \inf_{\hat J} \max_{1 \le j \le M} \mathbb P_j(\hat J(X) \ne j).
\end{align*}
Taking the infimum over all measurable estimators $\hat T:\mathcal X \to \mathcal T$ gives
\begin{align*}
\inf_{\hat T} \sup_{\theta \in \Theta} \mathbb E_\theta[L(\hat T(X),T(\theta))]
\ge
a \inf_{\hat J} \max_{1 \le j \le M} \mathbb P_j(\hat J(X) \ne j).
\end{align*}
This is the desired inequality.
[/step]