[guided]The purpose of the decoder is to turn an arbitrary estimator taking values in $\Theta$ into an estimator of the hidden vertex $v \in \{0,1\}^m$. For each $w \in \{0,1\}^m$, define $r_w:\mathcal X\to[0,\infty)$ by
\begin{align*}
r_w(x)=\rho(\hat{\theta}(x),\psi(w)).
\end{align*}
Let $\mathcal B(\Theta)$ denote the Borel $\sigma$-algebra generated by the metric $\rho$ on $\Theta$. This function is measurable because $\hat\theta:(\mathcal X,\mathcal A)\to(\Theta,\mathcal B(\Theta))$ is measurable, while $\theta\mapsto\rho(\theta,\psi(w))$ is continuous. Since the cube is finite, the minimum of $w\mapsto r_w(x)$ over $w\in\{0,1\}^m$ exists for each observation $x\in\mathcal X$. We choose the lexicographically first minimizer to make the choice single-valued.
We also need the decoder to be measurable, because later we will take probabilities of events involving $\hat v_j$. For a fixed $w\in\{0,1\}^m$, the event $\{x\in\mathcal X:\hat v(x)=w\}$ is described by finitely many comparisons between the [measurable functions](/page/Measurable%20Functions) $r_w$ and $r_u$: if $u$ is lexicographically after $w$, require $r_w\le r_u$, and if $u$ is lexicographically before $w$, require $r_w<r_u$. Sets of the form $\{r_w\le r_u\}$ and $\{r_w<r_u\}$ are measurable, so each fibre of $\hat v$ is measurable. Therefore $\hat v:(\mathcal X,\mathcal A)\to\{0,1\}^m$ is measurable for the discrete $\sigma$-algebra on the finite cube.
By definition of this nearest-neighbour choice, for every fixed true vertex $v \in \{0,1\}^m$,
\begin{align*}
\rho(\hat{\theta}(x),\psi(\hat{v}(x))) \leq \rho(\hat{\theta}(x),\psi(v)).
\end{align*}
Now we compare the two cube vertices $\psi(\hat{v}(x))$ and $\psi(v)$. The triangle inequality in the metric space $(\Theta,\rho)$ gives
\begin{align*}
\rho(\psi(\hat{v}(x)),\psi(v))\leq \rho(\psi(\hat{v}(x)),\hat{\theta}(x))+\rho(\hat{\theta}(x),\psi(v)).
\end{align*}
By symmetry of $\rho$, the first term equals $\rho(\hat{\theta}(x),\psi(\hat{v}(x)))$. The nearest-neighbour inequality then gives
\begin{align*}
\rho(\psi(\hat{v}(x)),\psi(v))\leq 2\rho(\hat{\theta}(x),\psi(v)).
\end{align*}
The hypercube separation hypothesis says that metric distance between embedded vertices dominates Hamming distance:
\begin{align*}
\rho(\psi(\hat{v}(x)),\psi(v)) \geq 2s\,d_H(\hat{v}(x),v).
\end{align*}
Combining the last two displays and dividing by $2$ gives
\begin{align*}
\rho(\hat{\theta}(x),\psi(v)) \geq s\,d_H(\hat{v}(x),v).
\end{align*}
Thus every unit error in the decoded Hamming distance costs at least $s$ in the original metric loss.[/guided]