[guided]The estimator $\hat\psi_n$ takes values in the target space $\Psi$, not in the label set $\{0,1\}$. Recall that the two target points are $\theta_i:=\psi(P_i)$ for $i\in\{0,1\}$. To compare estimation with testing, we turn the estimator into a test by asking which target point it is closer to. Since $\hat\psi_n:\mathcal X^n\to\Psi$ is $\mathcal A^{\otimes n}$-to-Borel measurable and the maps $y\mapsto d(y,\theta_0)$ and $y\mapsto d(y,\theta_1)$ are continuous on the [metric space](/page/Metric%20Space) $\Psi$, the compositions $x\mapsto d(\hat\psi_n(x),\theta_i)$ are $\mathcal A^{\otimes n}$-measurable real-valued functions. Hence the set
\begin{align*}
A_0:=\{x\in\mathcal X^n: d(\hat\psi_n(x),\theta_0)\le d(\hat\psi_n(x),\theta_1)\}
\end{align*}
belongs to $\mathcal A^{\otimes n}$. Define
\begin{align*}
A_1:=\mathcal X^n\setminus A_0.
\end{align*}
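Then $A_1\in\mathcal A^{\otimes n}$, because it is the complement of a measurable set. Define $\varphi:\mathcal X^n\to\{0,1\}$ by $\varphi(x)=0$ for $x\in A_0$ and $\varphi(x)=1$ for $x\in A_1$. Equivalently, $\varphi=\mathbf 1_{A_1}$, which is measurable. Thus $\varphi$ chooses model $0$ when $\hat\psi_n(x)$ is at least as close to $\theta_0$ as to $\theta_1$ (so ties go to model $0$), and chooses model $1$ otherwise.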
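In code, the induced test is a nearest-target rule. The following is a minimal sketch that assumes points of $\Psi$ are coordinate tuples and uses Python's `math.dist` (Euclidean distance) as a stand-in for a general metric $d$. The name `induced_test` is hypothetical. Ties go to model $0$, exactly as in the definition of $A_0$.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def induced_test(estimate: Point, theta0: Point, theta1: Point,
                 dist: Callable[[Point, Point], float] = math.dist) -> int:
    """Return 0 if the estimate is at least as close to theta0 as to theta1
    (ties go to model 0, as in the definition of A_0); otherwise return 1."""
    return 0 if dist(estimate, theta0) <= dist(estimate, theta1) else 1


theta0, theta1 = (0.0, 0.0), (1.0, 0.0)
print(induced_test((0.2, 0.1), theta0, theta1))   # 0: closer to theta0
print(induced_test((0.5, 3.0), theta0, theta1))   # 0: exact tie, assigned to A_0
print(induced_test((0.9, -0.4), theta0, theta1))  # 1: closer to theta1
```

Any other metric can be passed as `dist`. Only the comparison of the two distances matters, which mirrors the definition of $A_0$ and $A_1$.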
Why does a wrong testing decision force a large estimation error? Suppose first that the true model is $0$, but the induced test chooses $1$, so $x\in A_1$. Then
\begin{align*}
d(\hat\psi_n(x),\theta_1)<d(\hat\psi_n(x),\theta_0).
\end{align*}
The separation assumption gives
\begin{align*}
2s\le d(\theta_0,\theta_1).
\end{align*}
Applying the triangle inequality through the point $\hat\psi_n(x)$, and then the strict inequality defining $A_1$ displayed above, we get
\begin{align*}
d(\theta_0,\theta_1)\le d(\theta_0,\hat\psi_n(x))+d(\hat\psi_n(x),\theta_1)<2d(\theta_0,\hat\psi_n(x)).
\end{align*}
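Combined with $2s\le d(\theta_0,\theta_1)$, this gives $2s<2d(\hat\psi_n(x),\theta_0)$, so $d(\hat\psi_n(x),\theta_0)>s$ for every $x\in A_1$. Since $d$ is nonnegative, $d(\hat\psi_n(x),\theta_0)\ge s\,\mathbf 1_{A_1}(x)$ for all $x\in\mathcal X^n$. Integrating this pointwise lower bound with respect to the true law $Q_0$ gives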
\begin{align*}
R_0
=\int_{\mathcal X^n} d(\hat\psi_n(x),\theta_0)\,dQ_0(x)
\ge s\,Q_0(A_1).
\end{align*}
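The pointwise bound on $A_1$ uses only the triangle inequality, so it should hold for any metric. The following quick numerical sanity check is a sketch under illustrative assumptions: $\Psi=\mathbb R^2$, hypothetical targets $\theta_0=(0,0)$ and $\theta_1=(1,0.5)$, and $s=d(\theta_0,\theta_1)/2$. It samples candidate values $y$ of $\hat\psi_n(x)$ and keeps those that the induced test sends to model $1$. It then records the smallest $d(y,\theta_0)$ under both the Euclidean and the $\ell_1$ metric. The helper names are hypothetical.

```python
import math
import random


def check_bound_on_A1(theta0, theta1, dist, n_points=10_000, seed=0):
    """Sample points y in the box [-3, 3]^2. Whenever y falls in A_1 (strictly
    closer to theta1), record d(y, theta0). Return the smallest recorded value and s."""
    rng = random.Random(seed)
    s = dist(theta0, theta1) / 2  # largest s allowed by 2s <= d(theta0, theta1)
    smallest = math.inf
    for _ in range(n_points):
        y = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        if dist(y, theta1) < dist(y, theta0):  # y lies in A_1
            smallest = min(smallest, dist(y, theta0))
    return smallest, s


def l1(u, v):
    """The l1 (Manhattan) metric on R^2, used as a second example metric."""
    return sum(abs(a - b) for a, b in zip(u, v))


for name, metric in [("Euclidean", math.dist), ("l1", l1)]:
    smallest, s = check_bound_on_A1((0.0, 0.0), (1.0, 0.5), metric)
    print(f"{name}: min d(y, theta0) on A_1 = {smallest:.4f} > s = {s:.4f}")
```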
Now suppose the true model is $1$, but the induced test chooses $0$, so $x\in A_0$. The definition of $A_0$ gives
\begin{align*}
d(\hat\psi_n(x),\theta_0)\le d(\hat\psi_n(x),\theta_1).
\end{align*}
The separation assumption $2s\le d(\theta_0,\theta_1)$ still applies. The same triangle inequality calculation, now using the non-strict inequality just displayed, gives
\begin{align*}
d(\theta_0,\theta_1)\le d(\theta_0,\hat\psi_n(x))+d(\hat\psi_n(x),\theta_1)\le 2d(\hat\psi_n(x),\theta_1).
\end{align*}
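Therefore $d(\hat\psi_n(x),\theta_1)\ge s$ for every $x\in A_0$. Since $d$ is nonnegative, $d(\hat\psi_n(x),\theta_1)\ge s\,\mathbf 1_{A_0}(x)$ for all $x\in\mathcal X^n$, and integration with respect to $Q_1$ yields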
\begin{align*}
R_1
=\int_{\mathcal X^n} d(\hat\psi_n(x),\theta_1)\,dQ_1(x)
\ge s\,Q_1(A_0).
\end{align*}
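A one-dimensional instance, chosen here purely for illustration, shows why the bound on $A_1$ is strict while the bound on $A_0$ is not. Take $\Psi=\mathbb R$ with $d(y,z)=|y-z|$, $\theta_0=0$, $\theta_1=1$, and $s=1/2$, the largest value allowed by $2s\le d(\theta_0,\theta_1)$. Then
\begin{align*}
\hat\psi_n(x)=0.7&\ \Rightarrow\ x\in A_1,\quad d(\hat\psi_n(x),\theta_0)=0.7>s,\\
\hat\psi_n(x)=0.5&\ \Rightarrow\ x\in A_0,\quad d(\hat\psi_n(x),\theta_1)=0.5=s.
\end{align*}
The tie at the midpoint is assigned to $A_0$, and there the loss under model $1$ equals $s$ exactly. So the inequality on $A_0$ cannot, in general, be made strict.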
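Adding the two inequalities gives the reduction from estimation risk to testing error. Since $Q_0(A_1)=Q_0(\varphi=1)$ and $Q_1(A_0)=Q_1(\varphi=0)$, the right-hand side is $s$ times the sum of the two error probabilities of the induced test $\varphi$: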
\begin{align*}
R_0+R_1\ge s\bigl(Q_0(A_1)+Q_1(A_0)\bigr).
\end{align*}[/guided]
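To see the final inequality in a concrete model, the following Monte Carlo sketch assumes a hypothetical Gaussian location family, $P_i=N(\theta_i,1)$ with $Q_i=P_i^{\otimes n}$, $\Psi=\mathbb R$, and $d(y,z)=|y-z|$. The estimator is a pluggable function, defaulting to the sample mean. It estimates $R_0$ and $R_1$ by averaging losses, and $Q_0(A_1)$ and $Q_1(A_0)$ by counting wrong decisions of the induced test. The function name and all parameter values are illustrative assumptions, not part of the argument above.

```python
import random
import statistics


def two_point_check(estimator=statistics.fmean, theta0=0.0, theta1=0.5,
                    n=20, reps=5_000, seed=1):
    """Monte Carlo estimates of R_0, R_1, Q_0(A_1), Q_1(A_0) in a hypothetical
    model P_i = N(theta_i, 1), Q_i = P_i^n, with loss d(y, z) = |y - z|."""
    rng = random.Random(seed)
    s = abs(theta1 - theta0) / 2  # largest s allowed by 2s <= d(theta0, theta1)

    def induced_test(y):
        # Nearest-target test; ties go to 0, matching the definition of A_0.
        return 0 if abs(y - theta0) <= abs(y - theta1) else 1

    risks, errors = [], []
    for label, theta in enumerate((theta0, theta1)):
        total_loss, wrong = 0.0, 0
        for _ in range(reps):
            sample = [rng.gauss(theta, 1.0) for _ in range(n)]  # x drawn from Q_label
            y = estimator(sample)                                # psi_hat_n(x)
            total_loss += abs(y - theta)       # d(psi_hat_n(x), theta_label)
            wrong += induced_test(y) != label  # x lands in the wrong set
        risks.append(total_loss / reps)        # estimate of R_label
        errors.append(wrong / reps)            # estimate of Q_label(wrong set)
    return s, risks, errors


s, risks, errors = two_point_check()
print(f"R_0 + R_1 ~ {sum(risks):.4f} >= s * (Q_0(A_1) + Q_1(A_0)) ~ {s * sum(errors):.4f}")
```

The pointwise bounds hold replicate by replicate, so the Monte Carlo averages satisfy the same inequality (up to floating-point rounding) for any seed. Passing another estimator, such as `statistics.median`, leaves the inequality intact, because the argument never used any property of $\hat\psi_n$ beyond measurability.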