[proofplan]
Fix $\varepsilon > 0$ and a Lebesgue point $x \in \mathbb{R}$ of $f$ and $|f|^2$. The truncated transform $H_\varepsilon f(x)$ records the Hilbert integral with the singularity excised at scale $\varepsilon$. We compare $H_\varepsilon f(x)$ against the average $\frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} Hf(y)\, d\mathcal{L}^1(y)$ — which is bounded by $M(Hf)(x)$ — and show that the difference is controlled by the maximal functions $Mf(x)$ and $M_2 f(x) := (M(|f|^2)(x))^{1/2}$. The argument splits $f = f_1 + f_2$ with $f_1 := f\,\mathbb{1}_{B(x, 2\varepsilon)}$ the local part and $f_2 := f - f_1$ the far part. Steps 1-3 reduce the difference to two terms: a kernel-difference error bounded by $Mf(x)$ (Steps 1, 2) and the average of $|Hf_1|$ over $B(x, \varepsilon)$ (Step 3). Step 4 bounds the latter by $\sqrt{2}\,M_2 f(x)$ via Cauchy-Schwarz and the $L^2$-isometry of $H$. Step 5 collects the estimates.
[/proofplan]
[step:Decompose $f = f_1 + f_2$ around $x$ and rewrite $H_\varepsilon f(x)$]
Fix $\varepsilon > 0$ and a point $x \in \mathbb{R}$. Define
\begin{align*}
f_1: \mathbb{R} &\to \mathbb{C} \\
y &\mapsto f(y)\,\mathbb{1}_{B(x, 2\varepsilon)}(y),
\end{align*}
and $f_2 := f - f_1 = f \cdot \mathbb{1}_{\mathbb{R} \setminus B(x, 2\varepsilon)}$, where $B(x, 2\varepsilon) = (x - 2\varepsilon, x + 2\varepsilon)$.
Since $f_2$ vanishes on $B(x, 2\varepsilon)$, the Hilbert kernel $1/(\pi(x-y))$ is bounded on the support of $f_2$ near $x$, and no truncation is needed: for $z \in B(x, \varepsilon)$,
\begin{align*}
H f_2(z) = \frac{1}{\pi}\int_{|y-x|\ge 2\varepsilon} \frac{f(y)}{z - y}\, d\mathcal{L}^1(y).
\end{align*}
At $z = x$, this becomes
\begin{align*}
H f_2(x) = \frac{1}{\pi}\int_{|y - x|\ge 2\varepsilon} \frac{f(y)}{x - y}\, d\mathcal{L}^1(y).
\end{align*}
Comparing with the truncated Hilbert transform,
\begin{align*}
H_\varepsilon f(x) = \frac{1}{\pi}\int_{|y - x| > \varepsilon} \frac{f(y)}{x - y}\, d\mathcal{L}^1(y) = H f_2(x) + \frac{1}{\pi}\int_{\varepsilon < |y - x| < 2\varepsilon} \frac{f(y)}{x - y}\, d\mathcal{L}^1(y).
\end{align*}
The middle annular integral is estimated by $|y - x| > \varepsilon$:
\begin{align*}
\left|\frac{1}{\pi}\int_{\varepsilon < |y-x| < 2\varepsilon} \frac{f(y)}{x-y}\, d\mathcal{L}^1(y)\right| \le \frac{1}{\pi\varepsilon}\int_{|y-x|<2\varepsilon} |f(y)|\, d\mathcal{L}^1(y) = \frac{4}{\pi}\cdot\frac{1}{4\varepsilon}\int_{B(x,2\varepsilon)} |f|\, d\mathcal{L}^1 \le \frac{4}{\pi} M f(x).
\end{align*}
Hence
\begin{align*}
H_\varepsilon f(x) = H f_2(x) + R_1(x, \varepsilon), \qquad |R_1(x, \varepsilon)| \le \frac{4}{\pi} M f(x).
\end{align*}
[/step]
[step:Compare $Hf_2(x)$ with the average of $Hf_2$ over $B(x, \varepsilon)$]
For $z \in B(x, \varepsilon)$, the integrand $f(y)/(z - y)$ is continuous on $\{|y - x| \ge 2\varepsilon\}$, so $Hf_2(z)$ is well-defined and depends continuously on $z$ in $B(x, \varepsilon)$. We compute the difference $Hf_2(z) - Hf_2(x)$:
\begin{align*}
Hf_2(z) - Hf_2(x) = \frac{1}{\pi}\int_{|y - x|\ge 2\varepsilon} \left(\frac{1}{z - y} - \frac{1}{x - y}\right) f(y)\, d\mathcal{L}^1(y).
\end{align*}
The kernel difference satisfies
\begin{align*}
\left|\frac{1}{z-y} - \frac{1}{x-y}\right| = \frac{|z - x|}{|z-y|\,|x-y|} \le \frac{\varepsilon}{|z-y|\,|x-y|},
\end{align*}
since $|z - x| < \varepsilon$. For $|y - x| \ge 2\varepsilon$ and $|z - x| < \varepsilon$, we have $|z - y| \ge |y - x| - |z - x| \ge |y - x|/2$, so
\begin{align*}
\left|\frac{1}{z-y} - \frac{1}{x-y}\right| \le \frac{2\varepsilon}{|x-y|^2}.
\end{align*}
Therefore
\begin{align*}
|Hf_2(z) - Hf_2(x)| \le \frac{2\varepsilon}{\pi}\int_{|y-x|\ge 2\varepsilon} \frac{|f(y)|}{|x-y|^2}\, d\mathcal{L}^1(y).
\end{align*}
We decompose $\{|y - x| \ge 2\varepsilon\}$ into dyadic shells $A_k := \{2^k\varepsilon \le |y - x| < 2^{k+1}\varepsilon\}$ for $k \ge 1$. On $A_k$, $|x - y|^{-2} \le (2^k\varepsilon)^{-2}$, and
\begin{align*}
\int_{A_k}|f(y)|\, d\mathcal{L}^1(y) \le \int_{B(x, 2^{k+1}\varepsilon)}|f|\, d\mathcal{L}^1 \le 2 \cdot 2^{k+1}\varepsilon \cdot Mf(x) = 2^{k+2}\varepsilon\, Mf(x),
\end{align*}
using the definition $Mf(x) \ge \frac{1}{2 \cdot 2^{k+1}\varepsilon}\int_{B(x, 2^{k+1}\varepsilon)}|f|\, d\mathcal{L}^1$. Summing,
\begin{align*}
\int_{|y-x|\ge 2\varepsilon}\frac{|f(y)|}{|x-y|^2}\, d\mathcal{L}^1(y) \le \sum_{k\ge 1} \frac{1}{(2^k\varepsilon)^2}\cdot 2^{k+2}\varepsilon\,Mf(x) = \frac{4 Mf(x)}{\varepsilon}\sum_{k\ge 1}2^{-k} = \frac{4 Mf(x)}{\varepsilon}.
\end{align*}
Thus $|Hf_2(z) - Hf_2(x)| \le 8 Mf(x)/\pi$ for all $z \in B(x, \varepsilon)$. Averaging,
\begin{align*}
\left|Hf_2(x) - \frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} Hf_2(z)\, d\mathcal{L}^1(z)\right| = \left|\frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} (Hf_2(x) - Hf_2(z))\, d\mathcal{L}^1(z)\right| \le \frac{8}{\pi} Mf(x).
\end{align*}
[/step]
[step:Decompose the average of $Hf$ over $B(x, \varepsilon)$ into local and far parts]
For $z \in B(x, \varepsilon)$, $Hf(z) = Hf_1(z) + Hf_2(z)$ wherever both transforms are defined (which holds a.e. by $L^p$ boundedness of $H$ for $p > 1$). Averaging,
\begin{align*}
\frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} Hf(z)\, d\mathcal{L}^1(z) = \frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} Hf_1(z)\, d\mathcal{L}^1(z) + \frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} Hf_2(z)\, d\mathcal{L}^1(z).
\end{align*}
Hence
\begin{align*}
\frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} Hf_2(z)\, d\mathcal{L}^1(z) = \frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} Hf(z)\, d\mathcal{L}^1(z) - \frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} Hf_1(z)\, d\mathcal{L}^1(z).
\end{align*}
The first term on the RHS is bounded in absolute value by $M(Hf)(x)$ by definition of the maximal function. We must bound the second term — the average of $Hf_1$ over $B(x, \varepsilon)$.
[/step]
[step:Bound the average of $Hf_1$ via Cauchy-Schwarz and the $L^2$ isometry by $\sqrt{2}\,M_2 f(x)$]
We show that
\begin{align*}
\frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} |Hf_1(z)|\, d\mathcal{L}^1(z) \le \sqrt{2}\,M_2 f(x), \qquad M_2 f(x) := (M(|f|^2)(x))^{1/2}.
\end{align*}
Apply the [Cauchy-Schwarz inequality](/theorems/???) on $(\mathbb{R}, \mathcal{L}^1)$ to the pair $(|Hf_1|, \mathbb{1}_{B(x,\varepsilon)})$:
\begin{align*}
\int_{B(x,\varepsilon)} |Hf_1(z)|\, d\mathcal{L}^1(z) = \int_{\mathbb{R}} |Hf_1(z)|\,\mathbb{1}_{B(x,\varepsilon)}(z)\, d\mathcal{L}^1(z) \le \|Hf_1\|_{L^2}\,\mathcal{L}^1(B(x,\varepsilon))^{1/2} = \|Hf_1\|_{L^2}\,(2\varepsilon)^{1/2}.
\end{align*}
Dividing by $2\varepsilon$,
\begin{align*}
\frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} |Hf_1(z)|\, d\mathcal{L}^1(z) \le \frac{\|Hf_1\|_{L^2}}{(2\varepsilon)^{1/2}}.
\end{align*}
By the [$L^2$ Isometry of the Hilbert Transform](/theorems/???), $\|Hf_1\|_{L^2} = \|f_1\|_{L^2}$. Since $\operatorname{supp} f_1 \subseteq B(x, 2\varepsilon)$,
\begin{align*}
\|f_1\|_{L^2}^2 = \int_{B(x, 2\varepsilon)} |f|^2\, d\mathcal{L}^1,
\end{align*}
and therefore
\begin{align*}
\frac{\|f_1\|_{L^2}}{(2\varepsilon)^{1/2}} = \left(\frac{1}{2\varepsilon}\int_{B(x, 2\varepsilon)}|f|^2\, d\mathcal{L}^1\right)^{1/2} = \sqrt{2}\left(\frac{1}{4\varepsilon}\int_{B(x, 2\varepsilon)}|f|^2\, d\mathcal{L}^1\right)^{1/2} \le \sqrt{2}\,(M(|f|^2)(x))^{1/2},
\end{align*}
where the last inequality follows from the definition of the maximal function: $M(|f|^2)(x) \ge \frac{1}{\mathcal{L}^1(B(x, 2\varepsilon))}\int_{B(x, 2\varepsilon)}|f|^2\, d\mathcal{L}^1 = \frac{1}{4\varepsilon}\int_{B(x, 2\varepsilon)}|f|^2\, d\mathcal{L}^1$. Combining,
\begin{align*}
\frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} |Hf_1(z)|\, d\mathcal{L}^1(z) \le \sqrt{2}\,M_2 f(x).
\end{align*}
[/step]
[step:Combine all bounds to obtain Cotlar's identity]
Collect estimates from Steps 1-4. Note first that since $|f| \le M_2 f$ pointwise a.e. (as $M_2 f \ge (\frac{1}{|B|}\int_B |f|^2)^{1/2} \to |f(x)|$ at every Lebesgue point of $|f|^2$ as $|B| \to 0$) and $Mf \le M_2 f$ pointwise (by Jensen's inequality applied to the convex function $t \mapsto t^2$ inside the ball-average), all $Mf$ contributions from Steps 1-2 may be absorbed into a constant times $M_2 f$.
From Step 1,
\begin{align*}
H_\varepsilon f(x) = Hf_2(x) + R_1(x, \varepsilon), \qquad |R_1| \le \frac{4}{\pi} Mf(x) \le \frac{4}{\pi} M_2 f(x).
\end{align*}
From Step 2, with $A(\varepsilon) := \frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} Hf_2(z)\, d\mathcal{L}^1(z)$,
\begin{align*}
Hf_2(x) = A(\varepsilon) + R_2(x, \varepsilon), \qquad |R_2| \le \frac{8}{\pi}Mf(x) \le \frac{8}{\pi} M_2 f(x).
\end{align*}
From Step 3,
\begin{align*}
A(\varepsilon) = \frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} Hf(z)\, d\mathcal{L}^1(z) - \frac{1}{2\varepsilon}\int_{B(x,\varepsilon)} Hf_1(z)\, d\mathcal{L}^1(z).
\end{align*}
The first term is bounded in absolute value by $M(Hf)(x)$ by definition of the Hardy-Littlewood maximal function. From Step 4, the second term has absolute value at most $\sqrt{2}\,M_2 f(x)$. Combining,
\begin{align*}
|H_\varepsilon f(x)| \le |A(\varepsilon)| + |R_1| + |R_2| \le M(Hf)(x) + \sqrt{2}\,M_2 f(x) + \frac{4}{\pi}M_2 f(x) + \frac{8}{\pi}M_2 f(x).
\end{align*}
Absorbing constants,
\begin{align*}
|H_\varepsilon f(x)| \le M(Hf)(x) + C\, M_2 f(x),
\end{align*}
with $C := \sqrt{2} + 12/\pi$ an absolute constant. Taking the supremum over $\varepsilon > 0$,
\begin{align*}
H_* f(x) = \sup_{\varepsilon>0}|H_\varepsilon f(x)| \le M(Hf)(x) + C\, M_2 f(x),
\end{align*}
as claimed. The inequality holds at every Lebesgue point $x$ of both $f$ and $|f|^2$, hence almost everywhere.
[/step]