[proofplan]
We first compute the mean and variance of each block average, using independence inside a block. [Chebyshev's inequality](/theorems/1126) then shows that a single block mean lies above $\mu+r$ or below $\mu-r$ with probability at most $1/8$. The median can exceed $\mu+r$ only if at least half of the block means exceed $\mu+r$, and similarly on the lower side. We bound these two majority events by an exponential moment argument for independent Bernoulli indicators and finish by a union bound.
[/proofplan]
[step:Compute the first two moments of each block mean]
Let $(\Omega,\mathcal F,\mathbb P)$ be the underlying probability space. For each $j\in\{1,\dots,k\}$, define the consecutive block $B_j:=\{(j-1)m+1,\dots,jm\}\subset\{1,\dots,n\}$ and the block mean map
\begin{align*}
Y_j:\Omega\to\mathbb R,\qquad Y_j(\omega):=\frac{1}{m}\sum_{i\in B_j}X_i(\omega).
\end{align*}
Let $Y_{(1)},\dots,Y_{(k)}:\Omega\to\mathbb R$ denote the order-statistic maps determined by $Y_{(1)}(\omega)\le\dots\le Y_{(k)}(\omega)$ for every $\omega\in\Omega$. Define $\ell:=\lceil k/2\rceil$ and
\begin{align*}
\widehat\mu_{\mathrm{MOM}}:\Omega\to\mathbb R,\qquad \widehat\mu_{\mathrm{MOM}}(\omega):=Y_{(\ell)}(\omega).
\end{align*}
Each order statistic is measurable because
\begin{align*}
Y_{(\ell)}=\min_{I\subseteq\{1,\dots,k\},\ |I|=k-\ell+1}\max_{j\in I}Y_j,
\end{align*}
and finite maxima and finite minima of real-valued [measurable functions](/page/Measurable%20Functions) are measurable. Thus $\widehat\mu_{\mathrm{MOM}}$ is a real-valued [random variable](/page/Random%20Variable), so all deviation events involving it are elements of $\mathcal F$.
Fix $j\in\{1,\dots,k\}$. Since each $Y_j$ is a finite linear combination of the measurable maps $X_i:\Omega\to\mathbb R$, it is a real-valued random variable. Linearity of expectation gives
\begin{align*}
\mathbb E[Y_j]=\mathbb E\left[\frac{1}{m}\sum_{i\in B_j}X_i\right]=\frac{1}{m}\sum_{i\in B_j}\mathbb E[X_i]=\mu .
\end{align*}
Because the random variables $(X_i)_{i\in B_j}$ are independent and identically distributed, the covariance terms vanish in the variance of the sum, and therefore
\begin{align*}
\operatorname{Var}(Y_j)=\operatorname{Var}\left(\frac{1}{m}\sum_{i\in B_j}X_i\right)=\frac{1}{m^2}\sum_{i\in B_j}\operatorname{Var}(X_i)\le \frac{1}{m^2}\cdot m\sigma^2=\frac{\sigma^2}{m}.
\end{align*}
Thus every block mean has expectation $\mu$ and variance at most $\sigma^2/m$.
[/step]
[step:Show that each one-sided block failure has probability at most $1/8$]
For each $j\in\{1,\dots,k\}$, define the upper and lower block-failure events $A_{j,\mathrm{up}}\in\mathcal F$ and $A_{j,\mathrm{low}}\in\mathcal F$ by
\begin{align*}
A_{j,\mathrm{up}}:=\{\omega\in\Omega:Y_j(\omega)-\mu>r\},\qquad A_{j,\mathrm{low}}:=\{\omega\in\Omega:\mu-Y_j(\omega)>r\}.
\end{align*}
Since $A_{j,\mathrm{up}}\subseteq \{|Y_j-\mu|>r\}$ and $A_{j,\mathrm{low}}\subseteq \{|Y_j-\mu|>r\}$, it is enough to bound the two-sided event. On $\{|Y_j-\mu|>r\}$ we have $(Y_j-\mu)^2>r^2$, hence
\begin{align*}
r^2\,\mathbb P(|Y_j-\mu|>r)\le \int_{\{|Y_j-\mu|>r\}}(Y_j-\mu)^2\,d\mathbb P(\omega)\le \int_{\Omega}(Y_j-\mu)^2\,d\mathbb P(\omega)=\operatorname{Var}(Y_j)\le \frac{\sigma^2}{m}.
\end{align*}
Therefore
\begin{align*}
\mathbb P(A_{j,\mathrm{up}})\le \frac{\sigma^2}{mr^2},\qquad \mathbb P(A_{j,\mathrm{low}})\le \frac{\sigma^2}{mr^2}.
\end{align*}
The hypothesis $m\ge 8\sigma^2/r^2$ gives
\begin{align*}
\mathbb P(A_{j,\mathrm{up}})\le \frac18,\qquad \mathbb P(A_{j,\mathrm{low}})\le \frac18.
\end{align*}
[guided]
The purpose of this step is to turn the finite-variance assumption into a uniform success probability for every block. For a fixed block $j$, define the events $A_{j,\mathrm{up}}\in\mathcal F$ and $A_{j,\mathrm{low}}\in\mathcal F$ by
\begin{align*}
A_{j,\mathrm{up}}:=\{\omega\in\Omega:Y_j(\omega)-\mu>r\},\qquad A_{j,\mathrm{low}}:=\{\omega\in\Omega:\mu-Y_j(\omega)>r\}.
\end{align*}
These are the events that block $j$ is too high or too low. Both are contained in the two-sided deviation event $\{|Y_j-\mu|>r\}$.
We now prove the needed [Chebyshev estimate](/theorems/1126) directly. On the event $\{|Y_j-\mu|>r\}$, the non-negative random variable $(Y_j-\mu)^2$ is larger than $r^2$. Therefore
\begin{align*}
r^2\,\mathbb P(|Y_j-\mu|>r)=\int_{\{|Y_j-\mu|>r\}} r^2\,d\mathbb P(\omega)\le \int_{\{|Y_j-\mu|>r\}}(Y_j-\mu)^2\,d\mathbb P(\omega)\le \int_{\Omega}(Y_j-\mu)^2\,d\mathbb P(\omega)=\operatorname{Var}(Y_j).
\end{align*}
For this block mean, the first moment computation gives $\mathbb E[Y_j]=\mu$, and the independence inside the block gives $\operatorname{Var}(Y_j)\le \sigma^2/m$. Substituting that variance bound yields
\begin{align*}
\mathbb P(|Y_j-\mu|>r)\le \frac{\sigma^2}{mr^2}.
\end{align*}
Since $A_{j,\mathrm{up}}$ and $A_{j,\mathrm{low}}$ are subsets of this two-sided event,
\begin{align*}
\mathbb P(A_{j,\mathrm{up}})\le \frac{\sigma^2}{mr^2},\qquad \mathbb P(A_{j,\mathrm{low}})\le \frac{\sigma^2}{mr^2}.
\end{align*}
Finally, the block-size assumption $m\ge 8\sigma^2/r^2$ is exactly the condition that $\sigma^2/(mr^2)\le 1/8$. Hence
\begin{align*}
\mathbb P(A_{j,\mathrm{up}})\le \frac18,\qquad \mathbb P(A_{j,\mathrm{low}})\le \frac18.
\end{align*}
[/guided]
[/step]
[step:Relate a median deviation to a majority of bad blocks]
Define the upper and lower counting random variables by
\begin{align*}
S_{\mathrm{up}}:\Omega\to\{0,\dots,k\},\qquad S_{\mathrm{up}}(\omega):=\sum_{j=1}^k\mathbb 1_{A_{j,\mathrm{up}}}(\omega),
\end{align*}
and
\begin{align*}
S_{\mathrm{low}}:\Omega\to\{0,\dots,k\},\qquad S_{\mathrm{low}}(\omega):=\sum_{j=1}^k\mathbb 1_{A_{j,\mathrm{low}}}(\omega).
\end{align*}
If $\widehat\mu_{\mathrm{MOM}}(\omega)>\mu+r$, then at least $k/2$ of the block means satisfy $Y_j(\omega)\ge \widehat\mu_{\mathrm{MOM}}(\omega)>\mu+r$, so $S_{\mathrm{up}}(\omega)\ge k/2$. Therefore
\begin{align*}
\{\widehat\mu_{\mathrm{MOM}}-\mu>r\}\subseteq \{S_{\mathrm{up}}\ge k/2\}.
\end{align*}
Similarly, if $\widehat\mu_{\mathrm{MOM}}(\omega)<\mu-r$, then at least $k/2$ of the block means satisfy $Y_j(\omega)\le \widehat\mu_{\mathrm{MOM}}(\omega)<\mu-r$, so $S_{\mathrm{low}}(\omega)\ge k/2$. Hence
\begin{align*}
\{\mu-\widehat\mu_{\mathrm{MOM}}>r\}\subseteq \{S_{\mathrm{low}}\ge k/2\}.
\end{align*}
Consequently,
\begin{align*}
\{|\widehat\mu_{\mathrm{MOM}}-\mu|>r\}\subseteq\{S_{\mathrm{up}}\ge k/2\}\cup \{S_{\mathrm{low}}\ge k/2\}.
\end{align*}
[guided]
The median is the order statistic $Y_{(\ell)}$ with $\ell=\lceil k/2\rceil$, so the event involving $\widehat\mu_{\mathrm{MOM}}$ is measurable by the order-statistic measurability proved at the start. Define the counting random variables
\begin{align*}
S_{\mathrm{up}}:\Omega\to\{0,\dots,k\},\qquad S_{\mathrm{up}}(\omega):=\sum_{j=1}^k\mathbb 1_{A_{j,\mathrm{up}}}(\omega),
\end{align*}
and
\begin{align*}
S_{\mathrm{low}}:\Omega\to\{0,\dots,k\},\qquad S_{\mathrm{low}}(\omega):=\sum_{j=1}^k\mathbb 1_{A_{j,\mathrm{low}}}(\omega).
\end{align*}
These variables count how many block means fall respectively above $\mu+r$ and below $\mu-r$.
Suppose first that $\widehat\mu_{\mathrm{MOM}}(\omega)>\mu+r$. Since $\widehat\mu_{\mathrm{MOM}}(\omega)=Y_{(\ell)}(\omega)$, every order statistic $Y_{(q)}(\omega)$ with $q\ge \ell$ is also larger than $\mu+r$. There are $k-\ell+1$ such indices, and $k-\ell+1\ge k/2$. Hence at least $k/2$ original block means satisfy $Y_j(\omega)>\mu+r$, so $S_{\mathrm{up}}(\omega)\ge k/2$. Thus
\begin{align*}
\{\widehat\mu_{\mathrm{MOM}}-\mu>r\}\subseteq \{S_{\mathrm{up}}\ge k/2\}.
\end{align*}
Suppose next that $\widehat\mu_{\mathrm{MOM}}(\omega)<\mu-r$. Then $Y_{(q)}(\omega)<\mu-r$ for every $q\le \ell$, giving at least $\ell\ge k/2$ block means below $\mu-r$. Hence $S_{\mathrm{low}}(\omega)\ge k/2$, and therefore
\begin{align*}
\{\mu-\widehat\mu_{\mathrm{MOM}}>r\}\subseteq \{S_{\mathrm{low}}\ge k/2\}.
\end{align*}
Combining the upper and lower implications gives
\begin{align*}
\{|\widehat\mu_{\mathrm{MOM}}-\mu|>r\}\subseteq\{S_{\mathrm{up}}\ge k/2\}\cup \{S_{\mathrm{low}}\ge k/2\}.
\end{align*}
[/guided]
[/step]
[step:Bound each majority event by an exponential moment estimate]
Since the blocks $B_1,\dots,B_k$ are disjoint and the variables $X_1,\dots,X_n$ are independent, the subfamilies $(X_i)_{i\in B_1},\dots,(X_i)_{i\in B_k}$ are independent. Let $\mathcal G_j:=\sigma(X_i:i\in B_j)$ denote the sigma-algebra generated by the variables in block $B_j$. The sigma-algebras $\mathcal G_1,\dots,\mathcal G_k$ are independent, and $Y_j$ is $\mathcal G_j$-measurable for each $j$. By the closure property in the definition of [independence of random variables](/page/Independence%20%28Probability%20Theory%29), measurable functions of independent sigma-algebras are independent, so $Y_1,\dots,Y_k$ are independent. Hence the indicators $\mathbb 1_{A_{1,\mathrm{up}}},\dots,\mathbb 1_{A_{k,\mathrm{up}}}$ are independent Bernoulli random variables. Define $p_{j,\mathrm{up}}:=\mathbb P(A_{j,\mathrm{up}})$, so $0\le p_{j,\mathrm{up}}\le 1/8$.
Let $t:=\log 4$. Since $e^t=4$, independence gives
\begin{align*}
\mathbb E[e^{tS_{\mathrm{up}}}]=\prod_{j=1}^k\mathbb E[e^{t\mathbb 1_{A_{j,\mathrm{up}}}}]=\prod_{j=1}^k\left(1+p_{j,\mathrm{up}}(e^t-1)\right)\le \prod_{j=1}^k \exp\left(p_{j,\mathrm{up}}(e^t-1)\right)\le \exp\left(k\cdot \frac18\cdot 3\right)=\exp\left(\frac{3k}{8}\right).
\end{align*}
On the event $\{S_{\mathrm{up}}\ge k/2\}$, $e^{tS_{\mathrm{up}}}\ge e^{tk/2}$. Therefore [Markov's inequality](/theorems/514), applied to the non-negative random variable $e^{tS_{\mathrm{up}}}$, gives
\begin{align*}
\mathbb P(S_{\mathrm{up}}\ge k/2)\le e^{-tk/2}\mathbb E[e^{tS_{\mathrm{up}}}]\le \exp\left(-\frac{k}{2}\log 4+\frac{3k}{8}\right)=\exp\left(-k\log 2+\frac{3k}{8}\right)\le e^{-k/8},
\end{align*}
because $\log 2\ge 1/2$.
The lower indicators $\mathbb 1_{A_{1,\mathrm{low}}},\dots,\mathbb 1_{A_{k,\mathrm{low}}}$ are also independent Bernoulli random variables by the same measurable-map argument. Define $p_{j,\mathrm{low}}:=\mathbb P(A_{j,\mathrm{low}})$, so $0\le p_{j,\mathrm{low}}\le 1/8$. With the same value $t=\log 4$,
\begin{align*}
\mathbb E[e^{tS_{\mathrm{low}}}]=\prod_{j=1}^k\mathbb E[e^{t\mathbb 1_{A_{j,\mathrm{low}}}}]=\prod_{j=1}^k\left(1+p_{j,\mathrm{low}}(e^t-1)\right)\le \exp\left(\frac{3k}{8}\right).
\end{align*}
On $\{S_{\mathrm{low}}\ge k/2\}$, [Markov's inequality](/theorems/514) applied to the non-negative random variable $e^{tS_{\mathrm{low}}}$ gives
\begin{align*}
\mathbb P(S_{\mathrm{low}}\ge k/2)\le e^{-tk/2}\mathbb E[e^{tS_{\mathrm{low}}}]\le \exp\left(-k\log 2+\frac{3k}{8}\right)\le e^{-k/8}.
\end{align*}
[guided]
We now bound the probability that too many blocks fail on one side. Since the blocks $B_1,\dots,B_k$ are disjoint and the random variables $X_1,\dots,X_n$ are independent, the block families $(X_i)_{i\in B_1},\dots,(X_i)_{i\in B_k}$ are independent. Let $\mathcal G_j:=\sigma(X_i:i\in B_j)$ denote the sigma-algebra generated by the variables in block $B_j$. The sigma-algebras $\mathcal G_1,\dots,\mathcal G_k$ are independent. Since $Y_j$ is a measurable function of the variables in block $B_j$, it is $\mathcal G_j$-measurable. By the closure property in the definition of [independence of random variables](/page/Independence%20%28Probability%20Theory%29), measurable functions of independent sigma-algebras are independent. Hence $Y_1,\dots,Y_k$ are independent, and the indicators $\mathbb 1_{A_{1,\mathrm{up}}},\dots,\mathbb 1_{A_{k,\mathrm{up}}}$ are independent Bernoulli random variables.
Define $p_{j,\mathrm{up}}:=\mathbb P(A_{j,\mathrm{up}})$. From the one-block estimate, $0\le p_{j,\mathrm{up}}\le 1/8$. Choose $t:=\log 4$, so $e^t=4$. The exponential moment factorizes by independence:
\begin{align*}
\mathbb E[e^{tS_{\mathrm{up}}}]=\prod_{j=1}^k\mathbb E[e^{t\mathbb 1_{A_{j,\mathrm{up}}}}]=\prod_{j=1}^k\left(1+p_{j,\mathrm{up}}(e^t-1)\right).
\end{align*}
Using $1+a\le e^a$ for $a\ge 0$ and $e^t-1=3$, we obtain
\begin{align*}
\mathbb E[e^{tS_{\mathrm{up}}}]\le \prod_{j=1}^k\exp\left(3p_{j,\mathrm{up}}\right)\le \exp\left(\frac{3k}{8}\right).
\end{align*}
On the event $\{S_{\mathrm{up}}\ge k/2\}$, the monotonicity of the exponential function gives $e^{tS_{\mathrm{up}}}\ge e^{tk/2}$. Applying [Markov's inequality](/theorems/514) to the non-negative random variable $e^{tS_{\mathrm{up}}}$ yields
\begin{align*}
\mathbb P(S_{\mathrm{up}}\ge k/2)\le e^{-tk/2}\mathbb E[e^{tS_{\mathrm{up}}}]\le \exp\left(-k\log 2+\frac{3k}{8}\right)\le e^{-k/8},
\end{align*}
because $\log 2\ge 1/2$.
The lower-tail indicators are handled by the same argument with the separately defined probabilities $p_{j,\mathrm{low}}:=\mathbb P(A_{j,\mathrm{low}})$. The independence and the bound $p_{j,\mathrm{low}}\le 1/8$ give
\begin{align*}
\mathbb E[e^{tS_{\mathrm{low}}}]\le \exp\left(\frac{3k}{8}\right).
\end{align*}
Applying Markov's inequality to $e^{tS_{\mathrm{low}}}$ on the event $\{S_{\mathrm{low}}\ge k/2\}$ gives
\begin{align*}
\mathbb P(S_{\mathrm{low}}\ge k/2)\le e^{-tk/2}\mathbb E[e^{tS_{\mathrm{low}}}]\le \exp\left(-k\log 2+\frac{3k}{8}\right)\le e^{-k/8}.
\end{align*}
[/guided]
[/step]
[step:Combine the two tails and derive the confidence form]
Using the containment from the median step and the [union bound](/theorems/6078),
\begin{align*}
\mathbb P(|\widehat\mu_{\mathrm{MOM}}-\mu|>r)\le \mathbb P(S_{\mathrm{up}}\ge k/2)+\mathbb P(S_{\mathrm{low}}\ge k/2)\le e^{-k/8}+e^{-k/8}=2e^{-k/8}.
\end{align*}
If $\beta\in(0,1)$ and $k\ge 8\log(2/\beta)$, then
\begin{align*}
2e^{-k/8}\le 2e^{-\log(2/\beta)}=\beta.
\end{align*}
Thus
\begin{align*}
\mathbb P(|\widehat\mu_{\mathrm{MOM}}-\mu|>r)\le \beta,
\end{align*}
which is equivalent to
\begin{align*}
\mathbb P(|\widehat\mu_{\mathrm{MOM}}-\mu|\le r)\ge 1-\beta.
\end{align*}
[guided]
The preceding step reduced the median deviation to two majority events and then bounded each majority event. Using that containment and the [union bound](/theorems/6078), we get
\begin{align*}
\mathbb P(|\widehat\mu_{\mathrm{MOM}}-\mu|>r)\le \mathbb P(S_{\mathrm{up}}\ge k/2)+\mathbb P(S_{\mathrm{low}}\ge k/2)\le e^{-k/8}+e^{-k/8}=2e^{-k/8}.
\end{align*}
This is the stated deviation inequality.
For the confidence form, let $\beta\in(0,1)$ and assume $k\ge 8\log(2/\beta)$. Since the exponential function is increasing, this implies $-k/8\le -\log(2/\beta)$, and therefore
\begin{align*}
2e^{-k/8}\le 2e^{-\log(2/\beta)}=\beta.
\end{align*}
Combining this with the deviation inequality gives
\begin{align*}
\mathbb P(|\widehat\mu_{\mathrm{MOM}}-\mu|>r)\le \beta.
\end{align*}
Taking complements in the probability space $(\Omega,\mathcal F,\mathbb P)$ gives the equivalent high-probability statement
\begin{align*}
\mathbb P(|\widehat\mu_{\mathrm{MOM}}-\mu|\le r)\ge 1-\beta.
\end{align*}
[/guided]
This proves both the deviation inequality and its stated high-probability consequence.
[/step]