[proofplan]
We reduce estimation under the metric $d$ to testing which point of the finite packing $\{\theta_1,\dots,\theta_M\}$ generated the observation. The separation condition $d(\theta_j,\theta_k)>2s$ for $j\ne k$ ensures that any estimate at distance less than $s$ from $\theta_j$ identifies the correct packing index $j$. A Fano-type testing bound with auxiliary measure $P_{\theta_0}$ then lower bounds the average testing error in terms of the average Kullback--Leibler divergence to $P_{\theta_0}$. The information domination hypothesis together with the local $\rho$-ball condition bounds this average divergence by $nr^2$, and the assumption $nr^2\le \alpha\log M$ finishes the argument.
[/proofplan]
[step:Reduce the minimax risk to the finite packing]
Let $\hat\theta:\mathcal X\to\Theta$ be an arbitrary measurable estimator. For each $j\in\{1,\dots,M\}$, define the loss random variable $L_j:\mathcal X\to[0,\infty)$ by
\begin{align*}
L_j(x):=d(\hat\theta(x),\theta_j).
\end{align*}
Since $\{\theta_1,\dots,\theta_M\}\subset\Theta$,
\begin{align*}
\sup_{\theta\in\Theta}\mathbb E_\theta[d(\hat\theta,\theta)]
\ge
\max_{1\le j\le M}\mathbb E_{\theta_j}[L_j]
\ge
\frac{1}{M}\sum_{j=1}^M \mathbb E_{\theta_j}[L_j].
\end{align*}
For every nonnegative random variable $Y$ on a probability space, the pointwise inequality $Y\ge s\,\mathbb 1_{\{Y\ge s\}}$ gives $\mathbb E[Y]\ge s\,\mathbb P(Y\ge s)$. Applying this under $P_{\theta_j}$ to $Y=L_j$ yields
\begin{align*}
\sup_{\theta\in\Theta}\mathbb E_\theta[d(\hat\theta,\theta)]
\ge
\frac{s}{M}\sum_{j=1}^M P_{\theta_j}\bigl(L_j\ge s\bigr).
\end{align*}
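As a quick numerical sanity check of the pointwise bound $Y\ge s\,\mathbb 1_{\{Y\ge s\}}$ and its consequence $\mathbb E[Y]\ge s\,\mathbb P(Y\ge s)$, the following Python sketch compares both sides on empirical averages. The loss samples (absolute values of standard normal draws) and the helper name `markov_check` are illustrative assumptions, not part of the argument.

```python
import random

def markov_check(samples, s):
    """Return (empirical mean of Y, s * empirical frequency of {Y >= s}).

    Assumes every sample is nonnegative, as the loss L_j is.
    """
    n = len(samples)
    mean = sum(samples) / n
    tail = sum(1 for y in samples if y >= s) / n
    return mean, s * tail

# Hypothetical nonnegative losses: |Z| for standard normal draws Z.
rng = random.Random(0)
losses = [abs(rng.gauss(0.0, 1.0)) for _ in range(10_000)]
for s in (0.25, 0.5, 1.0, 2.0):
    mean, bound = markov_check(losses, s)
    # Averaging the pointwise inequality Y >= s * 1{Y >= s} preserves it.
    assert mean >= bound
    print(f"s={s}: mean={mean:.3f} >= s*freq={bound:.3f}")
```

Because the inequality holds sample by sample, it also holds for the empirical distribution, so the check is expected to pass for any nonnegative data up to floating-point rounding.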
[/step]
[step:Decode an index from the estimator]
Define the measurable nearest-packing decoder $\hat J:\mathcal X\to\{1,\dots,M\}$ by choosing the smallest index among minimizers:
\begin{align*}
\hat J(x)
:=
\min\operatorname*{arg\,min}_{1\le k\le M} d(\hat\theta(x),\theta_k).
\end{align*}
If $\hat J(x)\ne j$, then $d(\hat\theta(x),\theta_{\hat J(x)})\le d(\hat\theta(x),\theta_j)$ by definition of $\hat J$. If also $d(\hat\theta(x),\theta_j)<s$, then the triangle inequality gives
\begin{align*}
d(\theta_j,\theta_{\hat J(x)})
\le
d(\theta_j,\hat\theta(x))+d(\hat\theta(x),\theta_{\hat J(x)})
<
s+s
=
2s,
\end{align*}
contradicting $d(\theta_j,\theta_k)>2s$ for $j\ne k$. Therefore
\begin{align*}
\{\hat J\ne j\}\subseteq \{L_j\ge s\}
\end{align*}
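as subsets of $\mathcal X$. Hence $P_{\theta_j}(L_j\ge s)\ge P_{\theta_j}(\hat J\ne j)$ for every $j$, and the bound from the previous step gives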
\begin{align*}
\sup_{\theta\in\Theta}\mathbb E_\theta[d(\hat\theta,\theta)]
\ge
s\,\frac{1}{M}\sum_{j=1}^M P_{\theta_j}(\hat J\ne j).
\end{align*}
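The decoder $\hat J$ and the inclusion $\{\hat J\ne j\}\subseteq\{L_j\ge s\}$ can be illustrated with a short Python sketch. It assumes, purely for illustration, that $\Theta=\mathbb R^2$ with Euclidean $d$, a hypothetical four-point packing with pairwise distances at least $3$, and $s=1.4$ so that $2s<3$; the estimates are random perturbations of the true packing point. The function names are hypothetical.

```python
import math
import random

def nearest_packing_decoder(theta_hat, packing):
    """Return the 1-based index of the nearest packing point, smallest index on ties."""
    dists = [math.dist(theta_hat, p) for p in packing]
    # list.index returns the first position attaining the minimum distance.
    return dists.index(min(dists)) + 1

def check_inclusion(packing, s, trials=5000, seed=1):
    """Check that a wrong decoded index always comes with loss d(theta_hat, theta_j) >= s.

    Returns the number of decoding errors observed.
    """
    # Separation hypothesis: distinct packing points are more than 2s apart.
    assert all(math.dist(p, q) > 2 * s
               for i, p in enumerate(packing) for q in packing[i + 1:])
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        j = rng.randrange(len(packing))  # 0-based index of the true packing point
        centre = packing[j]
        theta_hat = (centre[0] + rng.gauss(0.0, 1.5), centre[1] + rng.gauss(0.0, 1.5))
        if nearest_packing_decoder(theta_hat, packing) != j + 1:
            wrong += 1
            assert math.dist(theta_hat, centre) >= s  # {J_hat != j} is inside {L_j >= s}
    return wrong

packing = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0), (3.0, 3.0)]
print("decoding errors, all with loss >= s:", check_inclusion(packing, s=1.4))
```

The noise level is deliberately large so that decoding errors actually occur; every one of them is then checked against the loss threshold $s$.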
[guided]
The estimator $\hat\theta$ takes values in the whole parameter space $\Theta$, not necessarily in the finite set $\{\theta_1,\dots,\theta_M\}$. To compare estimation with testing, we convert $\hat\theta$ into a finite-valued decision rule. Define the map $\hat J:\mathcal X\to\{1,\dots,M\}$ by
\begin{align*}
\hat J(x):=\min\operatorname*{arg\,min}_{1\le k\le M} d(\hat\theta(x),\theta_k).
\end{align*}
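The minimum tie-breaking rule makes $\hat J$ well-defined. Under the standard convention that estimators are measurable as maps into $\Theta$ equipped with the Borel $\sigma$-algebra of the metric space $(\Theta,d)$, each function $x\mapsto d(\hat\theta(x),\theta_k)$ is measurable, being the composition of $\hat\theta$ with the continuous map $\theta\mapsto d(\theta,\theta_k)$. Each event $\{\hat J=k\}$ is determined by finitely many comparisons between these measurable functions, so the finite-valued nearest-packing rule $\hat J$ is measurable.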
Now fix $j\in\{1,\dots,M\}$ and suppose that $\hat J(x)\ne j$. The defining property of $\hat J$ gives
\begin{align*}
d(\hat\theta(x),\theta_{\hat J(x)})
\le
d(\hat\theta(x),\theta_j).
\end{align*}
If, in addition, the estimate were at distance strictly less than $s$ from the true packing point $\theta_j$, then also $d(\hat\theta(x),\theta_{\hat J(x)})<s$, and the triangle inequality would imply
\begin{align*}
d(\theta_j,\theta_{\hat J(x)})
\le
d(\theta_j,\hat\theta(x))+d(\hat\theta(x),\theta_{\hat J(x)})
<
s+s
=
2s.
\end{align*}
This is impossible because distinct packing points are separated by more than $2s$. Hence an incorrect decoded index forces loss at least $s$:
\begin{align*}
\{\hat J\ne j\}\subseteq \{d(\hat\theta,\theta_j)\ge s\}.
\end{align*}
Combining this inclusion with the elementary bound $Y\ge s\,\mathbb 1_{\{Y\ge s\}}$ for the nonnegative loss variable $Y=d(\hat\theta,\theta_j)$ gives
\begin{align*}
\mathbb E_{\theta_j}[d(\hat\theta,\theta_j)]
\ge
s\,P_{\theta_j}(\hat J\ne j).
\end{align*}
Averaging over $j=1,\dots,M$ and using that the supremum risk over $\Theta$ dominates the average risk over the finite subset $\{\theta_1,\dots,\theta_M\}$ yields
\begin{align*}
\sup_{\theta\in\Theta}\mathbb E_\theta[d(\hat\theta,\theta)]
\ge
s\,\frac{1}{M}\sum_{j=1}^M P_{\theta_j}(\hat J\ne j).
\end{align*}
[/guided]
[/step]
[step:Apply the finite testing bound with centre $P_{\theta_0}$]
We use the following finite testing inequality: for any probability measures $P_1,\dots,P_M,Q$ on $(\mathcal X,\mathcal A)$ and any measurable decision rule $\psi:\mathcal X\to\{1,\dots,M\}$,
\begin{align*}
\frac{1}{M}\sum_{j=1}^M P_j(\psi\ne j)\ge 1-\frac{\frac{1}{M}\sum_{j=1}^M D(P_j\|Q)+\log 2}{\log M}.
\end{align*}
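Here $M\ge 2$, so that $\log M>0$, and the divergence $D(\cdot\|\cdot)$ and the logarithms use the same base. This is the standard form of Fano's inequality with an auxiliary reference measure: Fano's inequality for the uniform mixture $\bar P=\frac1M\sum_{j=1}^M P_j$ controls the average error through $\frac1M\sum_{j=1}^M D(P_j\|\bar P)$, and the identity $\frac1M\sum_{j=1}^M D(P_j\|Q)=\frac1M\sum_{j=1}^M D(P_j\|\bar P)+D(\bar P\|Q)$ shows that replacing $\bar P$ by any reference law $Q$ can only enlarge this average.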
Apply the inequality with $P_j=P_{\theta_j}$, $Q=P_{\theta_0}$, and $\psi=\hat J$. We obtain
\begin{align*}
\frac{1}{M}\sum_{j=1}^M P_{\theta_j}(\hat J\ne j)\ge 1-\frac{\frac{1}{M}\sum_{j=1}^M D(P_{\theta_j}\|P_{\theta_0})+\log 2}{\log M}.
\end{align*}
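To see the testing inequality in action, the following Python sketch evaluates its right-hand side for finitely supported distributions and compares it with the smallest achievable average error, which for the uniform prior is attained by choosing, at each outcome, an index of largest probability. The four distributions, the two reference measures and the helper names are made-up illustrations; logarithms are natural, so divergences are in nats.

```python
import math

def kl(p, q):
    """KL divergence D(p||q) in nats for finite distributions (q > 0 where p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def fano_lower_bound(ps, q):
    """Right-hand side 1 - (avg_j D(P_j||Q) + log 2) / log M, for M >= 2."""
    m = len(ps)
    avg_kl = sum(kl(p, q) for p in ps) / m
    return 1 - (avg_kl + math.log(2)) / math.log(m)

def min_average_error(ps):
    """Smallest (1/M) sum_j P_j(psi != j) over all decision rules psi.

    Under the uniform prior the optimum picks, at each outcome x, an index
    maximising P_j(x), so the minimal error is 1 - (1/M) sum_x max_j P_j(x).
    """
    m = len(ps)
    return 1 - sum(max(col) for col in zip(*ps)) / m

ps = [
    [0.4, 0.2, 0.2, 0.2],
    [0.2, 0.4, 0.2, 0.2],
    [0.2, 0.2, 0.4, 0.2],
    [0.2, 0.2, 0.2, 0.4],
]
uniform = [0.25] * 4           # equals the mixture (1/M) sum_j P_j here
skewed = [0.1, 0.2, 0.3, 0.4]  # any other Q gives a weaker bound
err = min_average_error(ps)
print(err, fano_lower_bound(ps, uniform), fano_lower_bound(ps, skewed))
assert err >= fano_lower_bound(ps, uniform) >= fano_lower_bound(ps, skewed)
```

In this toy example the minimal average error is $0.6$, while the bound with the mixture as reference is about $0.46$; the skewed reference measure gives a smaller, still valid, lower bound, in line with the identity for $D(P_j\|Q)$ above.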
[/step]
[step:Use the local information radius to bound the average divergence]
For each $j\in\{1,\dots,M\}$, the information domination hypothesis applied to the pair $(\theta_j,\theta_0)$ gives
\begin{align*}
D(P_{\theta_j}\|P_{\theta_0})
\le
n\,\rho(\theta_j,\theta_0)^2.
\end{align*}
Since every packing point lies in the $\rho$-ball of radius $r$ around $\theta_0$, that is, $\rho(\theta_j,\theta_0)\le r$, each summand satisfies
\begin{align*}
D(P_{\theta_j}\|P_{\theta_0})
\le
nr^2.
\end{align*}
Averaging over $j$ and using the hypothesis $nr^2\le\alpha\log M$ in the last step gives
\begin{align*}
\frac{1}{M}\sum_{j=1}^M D(P_{\theta_j}\|P_{\theta_0})
\le
nr^2
\le
\alpha\log M.
\end{align*}
Substituting this estimate into the testing bound gives
\begin{align*}
\frac{1}{M}\sum_{j=1}^M P_{\theta_j}(\hat J\ne j)
\ge
1-\alpha-\frac{\log 2}{\log M}.
\end{align*}
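The substitution uses only that the right-hand side of the testing bound is a decreasing function of the average divergence. Writing $\bar D:=\frac{1}{M}\sum_{j=1}^M D(P_{\theta_j}\|P_{\theta_0})\le\alpha\log M$ and recalling $\log M>0$, the intermediate step reads
\begin{align*}
1-\frac{\bar D+\log 2}{\log M}
\ge
1-\frac{\alpha\log M+\log 2}{\log M}
=
1-\alpha-\frac{\log 2}{\log M}.
\end{align*}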
[/step]
[step:Conclude the minimax lower bound]
Combining the estimation-to-testing reduction with the preceding lower bound on the average testing error gives, for every measurable estimator $\hat\theta:\mathcal X\to\Theta$,
\begin{align*}
\sup_{\theta\in\Theta}\mathbb E_\theta[d(\hat\theta,\theta)]
\ge
s\left(1-\alpha-\frac{\log 2}{\log M}\right).
\end{align*}
Since the right-hand side does not depend on $\hat\theta$, taking the infimum over all measurable estimators $\hat\theta:\mathcal X\to\Theta$ preserves the inequality and yields
\begin{align*}
\inf_{\hat\theta:\mathcal X\to\Theta}\sup_{\theta\in\Theta}\mathbb E_\theta[d(\hat\theta,\theta)]
\ge
s\left(1-\alpha-\frac{\log 2}{\log M}\right).
\end{align*}
This is the desired local packing lower bound.
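For concrete parameter values, the final bound and the range of $M$ in which it is nontrivial can be evaluated with a small Python helper. The function names are hypothetical; the only inputs are $s$, a constant $\alpha$ with $nr^2\le\alpha\log M$, and $M\ge 2$. The bound is positive exactly when $\log M>\log 2/(1-\alpha)$, that is, when $M>2^{1/(1-\alpha)}$, which requires $\alpha<1$.

```python
import math

def local_packing_lower_bound(s, alpha, m):
    """Evaluate s * (1 - alpha - log 2 / log M).

    s: separation scale (packing points are more than 2s apart),
    alpha: any constant with n r^2 <= alpha * log M,
    m: packing size M >= 2. The bound is informative only when positive.
    """
    if m < 2:
        raise ValueError("need M >= 2 so that log M > 0")
    return s * (1 - alpha - math.log(2) / math.log(m))

def min_packing_size(alpha):
    """Smallest integer M with 1 - alpha - log 2 / log M > 0, for 0 <= alpha < 1.

    Positivity is equivalent to M > 2 ** (1 / (1 - alpha)); computed in floating
    point, so values of alpha where this threshold is an integer may be sensitive
    to rounding.
    """
    if not 0 <= alpha < 1:
        raise ValueError("need 0 <= alpha < 1")
    return math.floor(2 ** (1 / (1 - alpha))) + 1

print(local_packing_lower_bound(1.0, 0.5, 16), min_packing_size(0.5))
```

For example, with $s=1$, $\alpha=1/2$ and $M=16$ the helper returns approximately $0.25$, since $\log 2/\log 16=1/4$; for $\alpha=1/2$ the smallest packing size giving a positive bound is $M=5$.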
[/step]