[guided]We need to know that the bad set, where the maximal function of $|\nabla u|$ is large, has small capacity. This is the Sobolev–Markov inequality at the level of capacity.
The key bound is
\begin{align*}
\operatorname{Cap}_p\big(\{M^* f > \lambda\}\big) \le C_n\, \lambda^{-p}\, \|f\|_{L^p}^p, \quad \text{for } f \in L^p, \lambda > 0.
\end{align*}
We sketch the derivation. The non-centred maximal function $M^* f$ is comparable to the *fractional maximal function* $M_1 f$ at scale $1$, but a more direct route uses the Riesz potential $I_1 f(x) = \gamma_n \int |x - y|^{1 - n} f(y)\, d\mathcal{L}^n(y)$. For $f \ge 0$ (as in the application $f = |\nabla u|$), it pointwise dominates a positive multiple of $M^* f$ minus a benign term, and $\nabla I_1 f = \nabla(-\Delta)^{-1/2} f$ is, up to normalisation, the vector of Riesz transforms of $f$, which is bounded on $L^p$ for $1 < p < \infty$.
For $1 < p < \infty$: the Riesz potential of $f \in L^p$ satisfies $\|I_1 f\|_{W^{1,p}(\mathbb{R}^n)} \le C_n \|f\|_{L^p}$ (after subtracting an additive constant if needed; this is the classical fractional integration estimate), and $I_1 f \ge c_n M^* f$ pointwise modulo lower-order terms. Setting $w_\lambda := I_1 f / (c_n \lambda)$, we have $w_\lambda \ge 1$ on the level set $\{M^* f > \lambda\}$, since $I_1 f \ge c_n M^* f > c_n \lambda$ there, and $\|w_\lambda\|_{W^{1,p}} \le C_n (c_n \lambda)^{-1} \|f\|_{L^p}$. Raising to the $p$-th power and using $w_\lambda$ as a test function in the definition of capacity:
\begin{align*}
\operatorname{Cap}_p(\{M^* f > \lambda\}) \le \|w_\lambda\|_{W^{1,p}}^p \le C_n' \lambda^{-p} \|f\|_{L^p}^p.
\end{align*}
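To make the last step explicit, here is a sketch of the constant bookkeeping. It assumes the usual Sobolev $p$-capacity with the normalisation written below (an assumption, since the text above does not fix one), and $c_n$ denotes the comparison constant in $I_1 f \ge c_n M^* f$:
\begin{align*}
\operatorname{Cap}_p(E) &:= \inf\big\{ \|w\|_{W^{1,p}(\mathbb{R}^n)}^p : w \in W^{1,p}(\mathbb{R}^n),\ w \ge 1 \text{ on a neighbourhood of } E \big\}, \\
\|w_\lambda\|_{W^{1,p}}^p &= (c_n \lambda)^{-p}\, \|I_1 f\|_{W^{1,p}}^p \le (c_n \lambda)^{-p}\, C_n^p\, \|f\|_{L^p}^p = \Big(\frac{C_n}{c_n}\Big)^{p} \lambda^{-p}\, \|f\|_{L^p}^p.
\end{align*}
Under these assumptions one admissible choice is $C_n' = (C_n / c_n)^p$, which may also depend on $p$. Since the non-centred maximal function is lower semicontinuous, the level set $\{M^* f > \lambda\}$ is open and so is a neighbourhood of itself, which is what makes $w_\lambda$ admissible in the infimum.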
For $p = 1$: the same conclusion holds with a weak-type variant, using that $M^*: L^1 \to L^{1, \infty}$ is bounded (Hardy–Littlewood) combined with the Sobolev embedding at the level of weak Lebesgue spaces.
The hypothesis $u \in W^{1,p}(\mathbb{R}^n)$ gives $f = |\nabla u| \in L^p(\mathbb{R}^n)$ with $\|f\|_{L^p} \le \|u\|_{W^{1,p}}$. Substituting this into the key bound for the bad set $E_\lambda$,
\begin{align*}
\operatorname{Cap}_p(E_\lambda) \le C_n \lambda^{-p} \|u\|_{W^{1,p}}^p.
\end{align*}
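As a hedged illustration of how this bound is typically used, suppose the bad set should have capacity at most a prescribed tolerance $\varepsilon > 0$ (the symbol $\varepsilon$ is introduced here for illustration only). Solving for $\lambda$ gives
\begin{align*}
C_n \lambda^{-p} \|u\|_{W^{1,p}}^p \le \varepsilon
\quad \Longleftrightarrow \quad
\lambda \ge \Big(\frac{C_n}{\varepsilon}\Big)^{1/p} \|u\|_{W^{1,p}},
\end{align*}
so any threshold $\lambda$ at or above this value yields $\operatorname{Cap}_p(E_\lambda) \le \varepsilon$.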
The right-hand side decays like $\lambda^{-p}$, so $\operatorname{Cap}_p(E_\lambda) \to 0$ as $\lambda \to \infty$.[/guided]