[guided]The strategy is a decoupling argument: condition on everything except $\varepsilon_i$, split the resulting coin flip into its two outcomes, and exploit the Lipschitz bound on the difference that appears.
Condition on $\varepsilon_{-i}$ (the Rademacher vector with the $i$-th coordinate removed). Since $\varepsilon_i$ is an independent Rademacher variable taking values $+1$ and $-1$ each with probability $1/2$, the conditional expectation of the left-hand side becomes
\begin{align*}
\frac{1}{2}\sup_{h \in \mathcal{H}}\!\left\{\frac{1}{n}\phi(y_i h(x_i)) + A(h, \varepsilon_{-i})\right\} + \frac{1}{2}\sup_{h \in \mathcal{H}}\!\left\{-\frac{1}{n}\phi(y_i h(x_i)) + A(h, \varepsilon_{-i})\right\}.
\end{align*}
The two suprema are taken independently: the $\varepsilon_i = +1$ case optimises over its own $h$, and the $\varepsilon_i = -1$ case over a potentially different function, which we rename $g$. The sum of the two suprema is therefore a single supremum over pairs $(h, g) \in \mathcal{H} \times \mathcal{H}$:
\begin{align*}
&= \frac{1}{2}\sup_{h, g \in \mathcal{H}}\!\left\{\frac{1}{n}\bigl[\phi(y_i h(x_i)) - \phi(y_i g(x_i))\bigr] + A(h, \varepsilon_{-i}) + A(g, \varepsilon_{-i})\right\}.
\end{align*}
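The merging step can be checked directly. As a sketch, with $F$ and $G$ as shorthand introduced only for this remark and assuming $\mathcal{H}$ is nonempty, write $F(h) = \frac{1}{n}\phi(y_i h(x_i)) + A(h, \varepsilon_{-i})$ and $G(g) = -\frac{1}{n}\phi(y_i g(x_i)) + A(g, \varepsilon_{-i})$. Because $h$ and $g$ range over $\mathcal{H}$ independently,
\begin{align*}
\sup_{h, g \in \mathcal{H}}\bigl\{F(h) + G(g)\bigr\} = \sup_{h \in \mathcal{H}}\Bigl\{F(h) + \sup_{g \in \mathcal{H}} G(g)\Bigr\} = \sup_{h \in \mathcal{H}} F(h) + \sup_{g \in \mathcal{H}} G(g),
\end{align*}
so passing from two suprema to one joint supremum loses nothing.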
Now we apply the Lipschitz condition. We must verify that the arguments of $\phi$ lie in $[-r, r]$: since $|y_i| = 1$ and $|h(x_i)| \leq r$ by the definition $r = \sup_{x, h} |h(x)|$, we have $|y_i h(x_i)| = |h(x_i)| \leq r$, and similarly $|y_i g(x_i)| \leq r$. The Lipschitz bound gives
\begin{align*}
\phi(y_i h(x_i)) - \phi(y_i g(x_i)) \leq |\phi(y_i h(x_i)) - \phi(y_i g(x_i))| \leq L|y_i h(x_i) - y_i g(x_i)| = L|h(x_i) - g(x_i)|,
\end{align*}
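where the final equality uses $|y_i| = 1$. To remove the absolute value, note that the resulting objective $\frac{L}{n}|h(x_i) - g(x_i)| + A(h, \varepsilon_{-i}) + A(g, \varepsilon_{-i})$ is symmetric in $(h, g)$: swapping the two functions leaves it unchanged. Hence any pair with $h(x_i) < g(x_i)$ can be replaced by the swapped pair without changing the value, and the supremum may be restricted to pairs with $h(x_i) \geq g(x_i)$, for which $|h(x_i) - g(x_i)| = h(x_i) - g(x_i)$. Therefore the conditional expectation is at most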
\begin{align*}
\frac{1}{2}\sup_{h, g \in \mathcal{H}}\!\left\{\frac{L}{n}\bigl[h(x_i) - g(x_i)\bigr] + A(h, \varepsilon_{-i}) + A(g, \varepsilon_{-i})\right\}.
\end{align*}
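The removal of the absolute value can also be written as a one-line identity. As a sketch, with $\Psi$ as shorthand introduced only for this remark, set $\Psi(h, g) = \frac{L}{n}\bigl[h(x_i) - g(x_i)\bigr] + A(h, \varepsilon_{-i}) + A(g, \varepsilon_{-i})$. Since $|a - b| = \max\{a - b,\, b - a\}$ and $A(h, \varepsilon_{-i}) + A(g, \varepsilon_{-i})$ is unchanged when $h$ and $g$ are swapped,
\begin{align*}
\sup_{h, g \in \mathcal{H}}\!\left\{\frac{L}{n}|h(x_i) - g(x_i)| + A(h, \varepsilon_{-i}) + A(g, \varepsilon_{-i})\right\} = \sup_{h, g \in \mathcal{H}} \max\bigl\{\Psi(h, g),\, \Psi(g, h)\bigr\} = \sup_{h, g \in \mathcal{H}} \Psi(h, g).
\end{align*}
The last equality holds because $\sup_{h, g} \Psi(h, g)$ and $\sup_{h, g} \Psi(g, h)$ range over the same set of pairs and so coincide.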
Finally, we re-separate the joint supremum into two independent suprema. The expression $\frac{L}{n}h(x_i) + A(h, \varepsilon_{-i})$ depends only on $h$, and $-\frac{L}{n}g(x_i) + A(g, \varepsilon_{-i})$ depends only on $g$, so
\begin{align*}
= \frac{1}{2}\sup_{h \in \mathcal{H}}\!\left\{\frac{L}{n}h(x_i) + A(h, \varepsilon_{-i})\right\} + \frac{1}{2}\sup_{g \in \mathcal{H}}\!\left\{-\frac{L}{n}g(x_i) + A(g, \varepsilon_{-i})\right\}.
\end{align*}
Reading the right-hand side as the conditional expectation over a Rademacher $\varepsilon_i = \pm 1$: the $\varepsilon_i = +1$ term gives $\sup_h\!\left\{\frac{L}{n}h(x_i) + A\right\}$ and the $\varepsilon_i = -1$ term gives $\sup_g\!\left\{-\frac{L}{n}g(x_i) + A\right\}$, which is exactly the conditional expectation of $\sup_{h \in \mathcal{H}}\!\left\{\frac{L}{n}\varepsilon_i h(x_i) + A(h, \varepsilon_{-i})\right\}$. Taking the full expectation over $\varepsilon_{-i}$ proves the claim.[/guided]