[proofplan]
**This proof is a sketch.** We outline the strategy of [Fefferman--Stein, *Acta Math.* 129 (1972), Theorem 11], which establishes the **pointwise** comparison $M_\varphi f(x)\le C(\varphi,\psi,n)\,M_\psi f(x)$ a.e., from which the $L^1$-equivalence and the kernel-independence of $H^1$ follow by integration and symmetry. Technical lemmas — specifically, (i) the Calderón reproducing identity for tempered distributions, (ii) Peetre's maximal-function bound for band-limited functions, and (iii) the pointwise comparison $|\Delta_k f|\le C_n\,M_\psi f$ a.e. for the Littlewood--Paley pieces — are established in [Stein, *Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals* (Princeton, 1993), Chapter III §1.5] and [Triebel, *Theory of Function Spaces* (Birkhäuser, 1983), §1.6.4]. The reader should consult those references for the technical core; the present sketch describes the architecture and verifies that the hypotheses of the cited results match the setting of this theorem.
[/proofplan]
[step:Reduce the $L^1$-equivalence to the pointwise comparison]
Let $\varphi, \psi \in \mathcal{S}(\mathbb{R}^n)$ with $\int_{\mathbb{R}^n}\varphi\,d\mathcal{L}^n = \int_{\mathbb{R}^n}\psi\,d\mathcal{L}^n = 1$. The dilates $\varphi_t(x) := t^{-n}\varphi(x/t)$ satisfy $\int\varphi_t\,d\mathcal{L}^n = 1$ for every $t > 0$, and analogously for $\psi$. We claim that, for every $f \in \mathcal{S}'(\mathbb{R}^n)$,
\begin{align*}
M_\varphi f(x)\le C(\varphi,\psi,n)\,M_\psi f(x) \qquad \text{for } \mathcal{L}^n\text{-a.e. } x\in\mathbb{R}^n. \tag{$\ast$}
\end{align*}
Granting $(\ast)$, integration over $\mathbb{R}^n$ gives $\|M_\varphi f\|_{L^1}\le C(\varphi,\psi,n)\,\|M_\psi f\|_{L^1}$, and the symmetric argument with the roles of $\varphi$ and $\psi$ swapped yields the reverse, hence the equivalence $\|M_\varphi f\|_{L^1}\asymp\|M_\psi f\|_{L^1}$.
A pointwise bound of the form $M_\varphi f\lesssim M(M_\psi f)$ would not suffice: the Hardy--Littlewood maximal operator $M$ is only weak-$(1,1)$, so applying $M$ to $M_\psi f \in L^1$ does not in general return an $L^1$ function. The pointwise inequality $(\ast)$ — without an intervening Hardy--Littlewood maximal operator — is therefore the right target.
[/step]
[step:Set up the Littlewood--Paley decomposition $\{\Psi_k\}$]
Choose a radial smooth function $\Psi\in C_c^\infty(\mathbb{R}^n)$ with $\Psi\equiv 1$ on $\{|\xi|\le 1\}$, $\operatorname{supp}\Psi\subseteq\{|\xi|\le 2\}$, $0\le\Psi\le 1$. Define the dyadic cutoffs
\begin{align*}
\Psi_k:\mathbb{R}^n &\to[0,1] \\
\xi &\mapsto\Psi(2^{-k}\xi) - \Psi(2^{-(k-1)}\xi),\qquad k\in\mathbb{Z}.
\end{align*}
Each $\Psi_k$ is supported in the dyadic annulus $A_k := \{2^{k-1}\le|\xi|\le 2^{k+1}\}$. The partition-of-unity property
\begin{align*}
\sum_{k\in\mathbb{Z}}\Psi_k(\xi) = 1\qquad \text{for every } \xi\in\mathbb{R}^n_0
\end{align*}
holds because the sum telescopes: $\sum_{k=-N}^M \Psi_k(\xi) = \Psi(2^{-M}\xi) - \Psi(2^{N+1}\xi)\to 1 - 0 = 1$ as $M\to+\infty,\,N\to+\infty$ for any fixed $\xi\ne 0$.
Define the Littlewood--Paley pieces of $f$ as
\begin{align*}
\Delta_k:\mathcal{S}'(\mathbb{R}^n)\to\mathcal{S}'(\mathbb{R}^n),\qquad \widehat{\Delta_k f}(\xi) := \Psi_k(\xi)\,\hat f(\xi).
\end{align*}
Each $\Delta_k f$ is band-limited to $A_k$. The synthesis identity $f = \sum_{k\in\mathbb{Z}}\Delta_k f$ holds in $\mathcal{S}'(\mathbb{R}^n)$ modulo polynomials (the contribution at $\xi = 0$ is killed when the convolution kernel $\varphi_t$ has $\hat\varphi_t(0) = 1$; see Stein, op. cit., Ch. III §1.4 for the precise distributional statement).
[/step]
[step:State Peetre's maximal-function bound for band-limited functions (cited from Triebel)]
We invoke the following result. *(Peetre's inequality / Plancherel--Pólya inequality; see Triebel, *Theory of Function Spaces*, §1.6.4, Theorem 1.6.4; equivalent forms in Grafakos, *Classical Fourier Analysis*, 3rd ed., Theorem 2.3.20.)* For any tempered distribution $g$ with $\operatorname{supp}\hat g\subseteq\{|\xi|\le R\}$ ($R > 0$) and any $0 < r\le 1$,
\begin{align*}
\sup_{y\in\mathbb{R}^n}\frac{|g(x - y)|}{(1 + R|y|)^{n/r}}\le C_{n,r}\,R^{n/r}\,M(|g|^r)(x)^{1/r}\qquad \text{for every } x\in\mathbb{R}^n,
\end{align*}
where $M$ is the Hardy--Littlewood maximal operator. Combining this with the rapid Schwartz decay $|\varphi_t(y)|\le C_N\,t^{-n}\,(1 + |y|/t)^{-N}$ and integrating against $\varphi_t$ (estimated via the substitution $y = tz$), one obtains (Triebel, op. cit., (3) on p. 22)
\begin{align*}
\sup_{t > 0}|\varphi_t * g|(x)\le C_{\varphi,n,r}\,R^{n/r}\,(1 + Rt)^{n/r}\cdot M(|g|^r)(x)^{1/r}\,\Big|_{\mathrm{operative\ regime}}.
\end{align*}
The cited statement gives the operative form for our setting with $g = \Delta_k f$ (so $R = 2^{k+1}$) and $0 < r\le 1$:
\begin{align*}
|\varphi_t * \Delta_k f|(x)\le C\cdot 2^{kn/r}\,(1 + 2^k t)^{n/r}\,M(|\Delta_k f|^r)(x)^{1/r}. \tag{P$_k$}
\end{align*}
We use $(\mathrm{P}_k)$ as the building block; its proof (a careful change-of-variables in the convolution integral together with the Schwartz decay of $\hat\varphi$) is recorded in the cited references.
[/step]
[step:Cite the pointwise comparison $|\Delta_k f|\le C_n\,M_\psi f$ a.e.]
The technical core of the theorem is the pointwise bound
\begin{align*}
|\Delta_k f(x)|\le C_n\,M_\psi f(x)\qquad \text{for } \mathcal{L}^n\text{-a.e. } x\in\mathbb{R}^n,\,k\in\mathbb{Z}, \tag{LP}
\end{align*}
where the constant depends only on $\psi$ and $n$ (and on the choice of cutoff $\Psi$, which is fixed once and for all). This is established in [Stein, *Harmonic Analysis* (Princeton, 1993), Chapter III §1.5, Lemma 1 and its proof] under the precise hypothesis $\psi\in\mathcal{S}(\mathbb{R}^n)$ with $\int\psi = 1$. The proof in op. cit. proceeds via the **grand maximal function** $\mathcal{M}^*_N f$ (Stein, Ch. III §1, eq. (4)), with the comparison $\mathcal{M}^*_N f\asymp M_\psi f$ established once via a lattice-of-test-functions argument that does *not* require dividing by $\hat\psi$. We do not reproduce the proof of (LP); we cite Stein's Lemma III.1.5.1 for it.
We verify the hypotheses of Stein's Lemma III.1.5.1 in our setting: $\psi\in\mathcal{S}(\mathbb{R}^n)$ (given), $\int\psi\,d\mathcal{L}^n = 1$ (given), $f\in\mathcal{S}'(\mathbb{R}^n)$ (given). The cutoff $\Psi$ is the one fixed in Step 2. The conclusion of the Lemma is exactly (LP).
[/step]
[step:Combine $(\mathrm{P}_k)$ and (LP) and sum the geometric tails using Schwartz decay of $\hat\varphi$]
Fix $x\in\mathbb{R}^n$ and $t > 0$. By $f = \sum_k\Delta_k f$ and the linearity of convolution with $\varphi_t$,
\begin{align*}
|\varphi_t * f|(x)\le\sum_{k\in\mathbb{Z}}|\varphi_t * \Delta_k f|(x).
\end{align*}
Using $(\mathrm{P}_k)$ with $r = 1$ for the gross bound and (LP) for the pointwise size $|\Delta_k f|\le C_n M_\psi f$ a.e., the Hardy--Littlewood maximal operator applied to a constant function gives $M(|\Delta_k f|)(x)\le C_n\,M_\psi f(x)$ pointwise (by the elementary $M(cg)\le|c|\,Mg$ for measurable bounded $g$, and the fact that $\Delta_k f\in L^\infty_{\mathrm{loc}}$ a.e. by Stein's argument).
The Schwartz decay of $\hat\varphi$ supplies the convergence factor: since $\hat\varphi$ is rapidly decreasing and $\Psi_k$ is supported in $\{|\xi|\asymp 2^k\}$, on the support of $\Psi_k$ one has
\begin{align*}
|\hat\varphi(t\xi)\,\Psi_k(\xi)|\le C_N\,(1 + 2^k t)^{-N}\qquad \text{for any } N\in\mathbb{N},
\end{align*>
provided $2^k t\ge 1$ (the high-frequency regime). For $2^k t\le 1$ (the low-frequency regime), the symmetric Schwartz decay of $\hat\psi$ in the comparison kernel — applied via the same Littlewood--Paley analysis of $\psi_t * f$ — supplies the corresponding bound $(1 + 2^{-k}/t)^{-N}$ in that regime. The combined decay $(1 + 2^k t)^{-N}$ for $2^k t\ge 1$ and $(1 + 2^{-k}/t)^{-N}$ for $2^k t\le 1$ is summable in $k$ uniformly in $t$, and the precise summation is carried out in [Stein, op. cit., Ch. III §1.5, proof of Theorem 1].
The conclusion of the cited proof is
\begin{align*}
\sup_{t > 0}|\varphi_t * f|(x)\le C(\varphi,\psi,n)\,M_\psi f(x)\qquad \text{for } \mathcal{L}^n\text{-a.e. } x\in\mathbb{R}^n,
\end{align*}
which is $(\ast)$.
[/step]
[step:Integrate and conclude the equivalence and kernel-independence of $H^1$]
Integrating $(\ast)$ over $\mathbb{R}^n$,
\begin{align*}
\|M_\varphi f\|_{L^1(\mathbb{R}^n)}\le C(\varphi,\psi,n)\,\|M_\psi f\|_{L^1(\mathbb{R}^n)},
\end{align*}
and swapping the roles of $\varphi$ and $\psi$ (the cited Lemma III.1.5.1 is symmetric in the choice of approximating kernel),
\begin{align*}
\|M_\psi f\|_{L^1(\mathbb{R}^n)}\le C(\psi,\varphi,n)\,\|M_\varphi f\|_{L^1(\mathbb{R}^n)}.
\end{align*}
Hence $\|M_\varphi f\|_{L^1}\asymp_{\varphi,\psi,n}\|M_\psi f\|_{L^1}$. The Hardy space
\begin{align*}
H^1(\mathbb{R}^n) := \{f\in\mathcal{S}'(\mathbb{R}^n):M_\varphi f\in L^1(\mathbb{R}^n)\}
\end{align*}
and its topology (defined by the norm $\|f\|_{H^1} := \|M_\varphi f\|_{L^1}$) are therefore independent of the choice of $\varphi$, with equivalent norms across different choices. This completes the sketch.
[/step]