[proofplan]
Put the uniform prior on the hypercube and write the Bayes Hamming risk as the sum of the coordinatewise error probabilities. For each coordinate $j$, condition on all other coordinates; the remaining decision is a binary test between two neighboring vertices of the cube. A direct total-variation testing bound gives average error at least $(1-\alpha)/2$ for each such binary problem. Summing over the $m$ coordinates and comparing Bayes risk with worst-case risk gives the Hamming lower bound, and the probability lower bound follows from the deterministic inequality $|\hat S\triangle S(v)|\le m\mathbb{1}_{\{\hat S\neq S(v)\}}$.
[/proofplan]
[step:Put the uniform prior on the hypercube and decompose the Bayes Hamming risk]
Fix a measurable estimator $\hat S:\mathcal X\to 2^{\{1,\dots,m\}}$. Define a probability measure $\mathbb Q$ on $\{0,1\}^m\times\mathcal X$ by
\begin{align*}
\mathbb Q(\{v\}\times A):=2^{-m}P_v(A)
\end{align*}
for every $v\in\{0,1\}^m$ and every $A\in\mathcal A$. Let $V:\{0,1\}^m\times\mathcal X\to\{0,1\}^m$ and $X:\{0,1\}^m\times\mathcal X\to\mathcal X$ be the coordinate maps. Thus $V$ is uniform on $\{0,1\}^m$, and conditional on $V=v$, the observation $X$ has distribution $P_v$.
Equip the finite set $2^{\{1,\dots,m\}}$ with its power-set $\sigma$-algebra. For each $j\in\{1,\dots,m\}$, define the coordinate decision rule $\delta_j:\mathcal X\to\{0,1\}$ by
\begin{align*}
\delta_j(x):=\mathbb{1}_{\{j\in\hat S(x)\}}.
\end{align*}
The map $A\mapsto \mathbb{1}_{\{j\in A\}}$ from $2^{\{1,\dots,m\}}$ to $\{0,1\}$ is measurable because both spaces are finite and carry their power-set $\sigma$-algebras, so $\delta_j$ is measurable as its composition with the measurable estimator $\hat S$. Since
\begin{align*}
|\hat S(X)\triangle S(V)|
=\sum_{j=1}^m \mathbb{1}_{\{\delta_j(X)\neq V_j\}},
\end{align*}
linearity of expectation with respect to $\mathbb Q$ gives
\begin{align*}
\mathbb E_{\mathbb Q}[|\hat S(X)\triangle S(V)|]
=\sum_{j=1}^m \mathbb Q(\delta_j(X)\neq V_j).
\end{align*}
[guided]
The uniform prior turns the minimax problem into an average-risk problem that is easier to lower bound. We define $\mathbb Q$ by first drawing $V$ uniformly from the cube and then drawing $X$ from the experiment $P_V$:
\begin{align*}
\mathbb Q(\{v\}\times A):=2^{-m}P_v(A).
\end{align*}
The estimator $\hat S$ produces a subset of $\{1,\dots,m\}$. We equip the finite set $2^{\{1,\dots,m\}}$ with its power-set $\sigma$-algebra, so the coordinate membership map $A\mapsto \mathbb{1}_{\{j\in A\}}$ is measurable. To compare $\hat S$ coordinate by coordinate with the true support $S(V)$, define $\delta_j:\mathcal X\to\{0,1\}$ by
\begin{align*}
\delta_j(x):=\mathbb{1}_{\{j\in\hat S(x)\}}.
\end{align*}
Because $\delta_j$ is the composition of $\hat S$ with a measurable map on the finite codomain, $\delta_j$ is measurable. Thus $\delta_j(X)$ is the estimator's decision about whether coordinate $j$ belongs to the support. Since $j\in S(V)$ exactly when $V_j=1$, the event of making a mistake at coordinate $j$ is $\{\delta_j(X)\neq V_j\}$. Therefore the symmetric-difference size is exactly the Hamming loss:
\begin{align*}
|\hat S(X)\triangle S(V)|
=\sum_{j=1}^m \mathbb{1}_{\{\delta_j(X)\neq V_j\}}.
\end{align*}
Taking expectation under $\mathbb Q$ and using linearity of expectation gives
\begin{align*}
\mathbb E_{\mathbb Q}[|\hat S(X)\triangle S(V)|]
=\sum_{j=1}^m \mathbb Q(\delta_j(X)\neq V_j).
\end{align*}
This decomposition is the point of using the hypercube: the total support error splits into $m$ binary testing errors.
[/guided]
[/step]
[step:Lower bound each coordinate error by a binary testing inequality]
Fix $j\in\{1,\dots,m\}$. Let $\{0,1\}^{m-1}_j$ denote the set of assignments to all coordinates except $j$. For each $w\in\{0,1\}^{m-1}_j$, define $v_0(w),v_1(w)\in\{0,1\}^m$ by requiring that both vectors agree with $w$ outside coordinate $j$, while $(v_0(w))_j=0$ and $(v_1(w))_j=1$. Then $v_1(w)=v_0(w)^{(j)}$.
For any measurable set $A\in\mathcal A$ and any probability measures $P,Q$ on $(\mathcal X,\mathcal A)$,
\begin{align*}
P(A)+Q(A^c)\ge 1-\operatorname{TV}(P,Q).
\end{align*}
Indeed, $Q(A^c)=1-Q(A)$, so
\begin{align*}
P(A)+Q(A^c)=1-(Q(A)-P(A))\ge 1-|P(A)-Q(A)|\ge 1-\operatorname{TV}(P,Q).
\end{align*}
Apply this with $P=P_{v_0(w)}$, $Q=P_{v_1(w)}$, and $A=\{x\in\mathcal X:\delta_j(x)=1\}$. Since $\operatorname{TV}(P_{v_0(w)},P_{v_1(w)})\le\alpha$, we obtain
\begin{align*}
\frac{1}{2}P_{v_0(w)}(\delta_j=1)+\frac{1}{2}P_{v_1(w)}(\delta_j=0)
\ge \frac{1-\alpha}{2}.
\end{align*}
Averaging this inequality over all $w\in\{0,1\}^{m-1}_j$ gives
\begin{align*}
\mathbb Q(\delta_j(X)\neq V_j)\ge \frac{1-\alpha}{2}.
\end{align*}
[guided]
Fix a coordinate $j\in\{1,\dots,m\}$. The strategy is to freeze every coordinate except $j$ and view the remaining question as a binary test. Let $\{0,1\}^{m-1}_j$ denote the set of assignments to the coordinates in $\{1,\dots,m\}\setminus\{j\}$. For each $w\in\{0,1\}^{m-1}_j$, define $v_0(w),v_1(w)\in\{0,1\}^m$ as follows: both vectors agree with $w$ away from coordinate $j$, while $(v_0(w))_j=0$ and $(v_1(w))_j=1$. Thus $v_1(w)=v_0(w)^{(j)}$, so the theorem hypothesis gives
\begin{align*}
\operatorname{TV}(P_{v_0(w)},P_{v_1(w)})\le \alpha.
\end{align*}
We now prove the binary testing inequality used for this frozen value of $w$. Let $P$ and $Q$ be probability measures on $(\mathcal X,\mathcal A)$, and let $A\in\mathcal A$ be measurable. Since $Q(A^c)=1-Q(A)$, subtracting and bounding by total variation gives
\begin{align*}
P(A)+Q(A^c)=1-(Q(A)-P(A))\ge 1-|P(A)-Q(A)|\ge 1-\operatorname{TV}(P,Q).
\end{align*}
This inequality says that no test can make both type-I and type-II errors too small when the two laws are close in total variation.
Apply the inequality with $P=P_{v_0(w)}$, $Q=P_{v_1(w)}$, and $A=\{x\in\mathcal X:\delta_j(x)=1\}$. The set $A$ is measurable because $\delta_j$ is measurable. Under $P_{v_0(w)}$, the true value of coordinate $j$ is $0$, so the event $\{\delta_j=1\}$ is an error. Under $P_{v_1(w)}$, the true value of coordinate $j$ is $1$, so the event $\{\delta_j=0\}=A^c$ is an error. Therefore
\begin{align*}
\frac{1}{2}P_{v_0(w)}(\delta_j=1)+\frac{1}{2}P_{v_1(w)}(\delta_j=0)
\ge \frac{1}{2}\bigl(1-\operatorname{TV}(P_{v_0(w)},P_{v_1(w)})\bigr)
\ge \frac{1-\alpha}{2}.
\end{align*}
Averaging this bound over the uniformly distributed frozen vector $w\in\{0,1\}^{m-1}_j$ is exactly the conditional contribution to $\mathbb Q(\delta_j(X)\neq V_j)$. Hence
\begin{align*}
\mathbb Q(\delta_j(X)\neq V_j)\ge \frac{1-\alpha}{2}.
\end{align*}
[/guided]
[/step]
[step:Sum the coordinate lower bounds and pass from Bayes risk to worst-case risk]
Using the coordinate decomposition and the lower bound from the previous step,
\begin{align*}
\mathbb E_{\mathbb Q}[|\hat S(X)\triangle S(V)|]
=\sum_{j=1}^m \mathbb Q(\delta_j(X)\neq V_j)
\ge \frac{m}{2}(1-\alpha).
\end{align*}
By the definition of $\mathbb Q$,
\begin{align*}
\mathbb E_{\mathbb Q}[|\hat S(X)\triangle S(V)|]
=2^{-m}\sum_{v\in\{0,1\}^m}\mathbb E_v[|\hat S\triangle S(v)|]
\le \sup_{v\in\{0,1\}^m}\mathbb E_v[|\hat S\triangle S(v)|].
\end{align*}
Therefore every measurable estimator $\hat S$ satisfies
\begin{align*}
\sup_{v\in\{0,1\}^m}\mathbb E_v[|\hat S\triangle S(v)|]
\ge \frac{m}{2}(1-\alpha).
\end{align*}
Taking the infimum over all measurable estimators $\hat S$ proves
\begin{align*}
\inf_{\hat S}\sup_{v\in\{0,1\}^m}\mathbb E_v[|\hat S\triangle S(v)|]
\ge \frac{m}{2}(1-\alpha).
\end{align*}
[/step]
[step:Convert the Hamming lower bound into a probability lower bound]
For every $v\in\{0,1\}^m$ and every observation $x\in\mathcal X$,
\begin{align*}
|\hat S(x)\triangle S(v)|
\le m\,\mathbb{1}_{\{\hat S(x)\neq S(v)\}},
\end{align*}
because the symmetric difference is a subset of $\{1,\dots,m\}$ and is nonempty exactly when $\hat S(x)\neq S(v)$. Taking expectation under $P_v$ gives
\begin{align*}
\mathbb E_v[|\hat S\triangle S(v)|]
\le m\,\mathbb P_v(\hat S\neq S(v)).
\end{align*}
Since $m\in\mathbb N$, we have $m\ge 1$. Hence
\begin{align*}
\sup_{v\in\{0,1\}^m}\mathbb P_v(\hat S\neq S(v))
\ge \frac{1}{m}\sup_{v\in\{0,1\}^m}\mathbb E_v[|\hat S\triangle S(v)|]
\ge \frac{1-\alpha}{2}.
\end{align*}
Taking the infimum over all measurable estimators $\hat S$ proves
\begin{align*}
\inf_{\hat S}\sup_{v\in\{0,1\}^m}\mathbb P_v(\hat S\neq S(v))
\ge \frac{1-\alpha}{2}.
\end{align*}
[guided]
The Bayes Hamming lower bound from the coordinatewise testing argument says that every measurable estimator $\hat S$ satisfies
\begin{align*}
\mathbb E_{\mathbb Q}[|\hat S(X)\triangle S(V)|]\ge \frac{m}{2}(1-\alpha).
\end{align*}
By the definition of $\mathbb Q$, this Bayes risk is the uniform average of the risks over the cube:
\begin{align*}
\mathbb E_{\mathbb Q}[|\hat S(X)\triangle S(V)|]
=2^{-m}\sum_{v\in\{0,1\}^m}\mathbb E_v[|\hat S\triangle S(v)|].
\end{align*}
An average is bounded above by the maximum of the same finite family, so
\begin{align*}
\sup_{v\in\{0,1\}^m}\mathbb E_v[|\hat S\triangle S(v)|]
\ge \frac{m}{2}(1-\alpha).
\end{align*}
Taking the infimum over all measurable estimators proves the minimax Hamming-risk bound.
It remains to convert Hamming loss into an exact-support error probability. For every $v\in\{0,1\}^m$ and every observation $x\in\mathcal X$, the symmetric difference $\hat S(x)\triangle S(v)$ is a subset of $\{1,\dots,m\}$. It has size $0$ exactly when $\hat S(x)=S(v)$, and otherwise it has size at most $m$. Hence
\begin{align*}
|\hat S(x)\triangle S(v)|
\le m\,\mathbb{1}_{\{\hat S(x)\neq S(v)\}}.
\end{align*}
Taking expectation under $P_v$ gives
\begin{align*}
\mathbb E_v[|\hat S\triangle S(v)|]
\le m\,\mathbb P_v(\hat S\neq S(v)).
\end{align*}
Since $m\in\mathbb N$, the number of coordinates satisfies $m\ge 1$, so division by $m$ is legitimate. Therefore
\begin{align*}
\sup_{v\in\{0,1\}^m}\mathbb P_v(\hat S\neq S(v))
\ge \frac{1}{m}\sup_{v\in\{0,1\}^m}\mathbb E_v[|\hat S\triangle S(v)|]
\ge \frac{1-\alpha}{2}.
\end{align*}
Taking the infimum over all measurable estimators $\hat S$ proves the probability lower bound.
[/guided]
This completes the proof.
[/step]