[proofplan]
We prove the identity by comparing Fourier coefficients. The normalized Hecke operator has an explicit action on $q$-expansions, and this coefficient formula reduces the theorem to a finite arithmetic identity. We first verify the prime-power relation, then use the factorisation of the coefficient formula over distinct primes to assemble the general formula.
[/proofplan]
[step:Write the normalized Hecke action on Fourier coefficients]
Let $V$ be either $M_k(\operatorname{SL}_2(\mathbb{Z}))$ or $S_k(\operatorname{SL}_2(\mathbb{Z}))$. Let
\begin{align*}
\mathbb{H}
&:=
\{z\in\mathbb{C}:\operatorname{Im} z>0\},\\
f: \mathbb{H} &\to \mathbb{C}
\end{align*}
be an element of $V$, and write its Fourier expansion at infinity as
\begin{align*}
f(z) = \sum_{r=0}^{\infty} a_r q^r,
\qquad
q = e^{2\pi i z}.
\end{align*}
For $N \in \mathbb{N}$, we use the standard normalization of the Hecke operator $T_N: V \to V$, namely the defining Fourier-coefficient formula
\begin{align*}
(T_N f)(z)
=
\sum_{r=0}^{\infty}
\left(
\sum_{e \mid \gcd(N,r)} e^{k-1} a_{Nr/e^2}
\right) q^r.
\end{align*}
For $r=0$, the inner sum is understood as the finite sum over all divisors $e \mid N$.
Thus it is enough to prove the corresponding identity for the coefficient operators
\begin{align*}
L_N: \mathbb{C}^{\mathbb{N}_0} &\to \mathbb{C}^{\mathbb{N}_0}, \\
(a_r)_{r \geq 0} &\mapsto (b_r)_{r \geq 0},
\end{align*}
where
\begin{align*}
b_r
=
\sum_{e \mid \gcd(N,r)} e^{k-1} a_{Nr/e^2}.
\end{align*}
Indeed, if the Fourier coefficients of two modular forms agree for every $r \geq 0$, then their difference has identically zero Fourier expansion near the cusp and therefore vanishes as a [holomorphic function](/page/Holomorphic%20Function) on $\mathbb{H}$.
[/step]
[step:Prove the prime-power relation on coefficients]
Fix a prime number $p$. Let $A,B \in \mathbb{N}_0$, and set
\begin{align*}
K := k-1.
\end{align*}
We prove that, for every sequence $(a_r)_{r \geq 0}$,
\begin{align*}
L_{p^A}L_{p^B}
=
\sum_{j=0}^{\min(A,B)} p^{jK} L_{p^{A+B-2j}}.
\end{align*}
Let $r \in \mathbb{N}$. Write
\begin{align*}
r = p^R r_0,
\end{align*}
where $R \in \mathbb{N}_0$ and $r_0 \in \mathbb{N}$ satisfies $p \nmid r_0$.
For $C \in \mathbb{N}_0$, the coefficient formula gives
\begin{align*}
(L_{p^C}a)_r
=
\sum_{i=0}^{\min(C,R)} p^{iK} a_{p^{C+R-2i}r_0}.
\end{align*}
Applying this first with $C=B$ and then with $C=A$, we obtain
\begin{align*}
(L_{p^A}L_{p^B}a)_r
=
\sum_{i=0}^{\min(A,R)}
\sum_{j=0}^{\min(B,A+R-2i)}
p^{(i+j)K}
a_{p^{A+B+R-2(i+j)}r_0}.
\end{align*}
For a fixed integer $\ell \geq 0$, the coefficient of
\begin{align*}
p^{\ell K}a_{p^{A+B+R-2\ell}r_0}
\end{align*}
on the left is the number of pairs $(i,j) \in \mathbb{N}_0^2$ satisfying
\begin{align*}
i+j &= \ell,\\
0 \leq i &\leq \min(A,R),\\
0 \leq j &\leq \min(B,A+R-2i).
\end{align*}
Equivalently,
\begin{align*}
\max(0,\ell-B)
\leq i
\leq
\min(\ell,A,R,A+R-\ell).
\end{align*}
On the other hand, the coefficient of the same term on
\begin{align*}
\sum_{h=0}^{\min(A,B)} p^{hK} L_{p^{A+B-2h}}a
\end{align*}
is the number of pairs $(h,t) \in \mathbb{N}_0^2$ satisfying
\begin{align*}
h+t &= \ell,\\
0 \leq h &\leq \min(A,B),\\
0 \leq t &\leq \min(A+B-2h,R).
\end{align*}
Equivalently,
\begin{align*}
\max(0,\ell-R)
\leq h
\leq
\min(\ell,A,B,A+B-\ell).
\end{align*}
The two displayed intervals have the same number of integers. Indeed, writing the number of integers in $[u,v]\cap\mathbb{Z}$ as $\max(0,v-u+1)$, this is the elementary min-max identity
\begin{align*}
\max\bigl(0,\min(\ell,A,R,A+R-\ell)-\max(0,\ell-B)+1\bigr)
\\
=
\max\bigl(0,\min(\ell,A,B,A+B-\ell)-\max(0,\ell-R)+1\bigr).
\end{align*}
To verify the identity, split into the four cases determined by $\ell\leq B$ or $\ell>B$, and by $\ell\leq R$ or $\ell>R$. In each case subtract the positive lower endpoint, if there is one, from the corresponding upper endpoint. The remaining upper bound is the same on both sides:
\begin{align*}
\min(\ell,A,\min(B,R),A+\min(B,R)-\ell),
\end{align*}
with the convention that a negative displayed upper bound gives the empty interval. Hence the two intervals have the same cardinality.
Therefore the two expressions have the same coefficient of every sequence entry $a_s$ at every positive index $r \in \mathbb{N}$.
It remains to check the constant coefficient $r=0$. Define the finite geometric sum
\begin{align*}
\sigma_C := \sum_{i=0}^{C} p^{iK}
\end{align*}
for each $C \in \mathbb{N}_0$. Since every divisor of $p^C$ divides $\gcd(p^C,0)$, the coefficient formula gives
\begin{align*}
(L_{p^C}a)_0 = \sigma_C a_0.
\end{align*}
Thus the constant coefficient of $L_{p^A}L_{p^B}a$ is $\sigma_A\sigma_B a_0$. The constant coefficient of
\begin{align*}
\sum_{h=0}^{\min(A,B)} p^{hK} L_{p^{A+B-2h}}a
\end{align*}
is
\begin{align*}
\left(\sum_{h=0}^{\min(A,B)} p^{hK}\sigma_{A+B-2h}\right)a_0.
\end{align*}
Let $x := p^K$. If $A \leq B$, the finite geometric-series identity gives
\begin{align*}
\sum_{h=0}^{A} x^h\sum_{t=0}^{A+B-2h}x^t
&=
\frac{1}{1-x}\left(\sum_{h=0}^{A}x^h-\sum_{h=0}^{A}x^{A+B-h+1}\right)\\
&=
\frac{1}{1-x}\left(\frac{1-x^{A+1}}{1-x}-x^{B+1}\frac{1-x^{A+1}}{1-x}\right)\\
&=
\left(\sum_{i=0}^{A}x^i\right)\left(\sum_{j=0}^{B}x^j\right).
\end{align*}
The case $B \leq A$ is the same calculation with $A$ and $B$ interchanged. Therefore
\begin{align*}
\sigma_A\sigma_B
=
\sum_{h=0}^{\min(A,B)} p^{hK}\sigma_{A+B-2h},
\end{align*}
so the constant coefficients also agree. Hence
\begin{align*}
L_{p^A}L_{p^B}
=
\sum_{j=0}^{\min(A,B)} p^{jK} L_{p^{A+B-2j}}.
\end{align*}
[guided]
The point of the prime-power case for positive coefficient indices is that divisibility by $p$ is measured by one integer: the $p$-adic valuation. Fix a prime $p$, write $K=k-1$, and let $r \in \mathbb{N}$. Write $r=p^R r_0$, where $R \in \mathbb{N}_0$ and $r_0 \in \mathbb{N}$ satisfies $p \nmid r_0$. Applying $L_{p^C}$ to a coefficient indexed by $r$ gives
\begin{align*}
(L_{p^C}a)_r
=
\sum_{i=0}^{\min(C,R)} p^{iK} a_{p^{C+R-2i}r_0}.
\end{align*}
Here $i$ records the exponent in the divisor $p^i$ of both $p^C$ and $r$.
Now apply $L_{p^B}$ first and then $L_{p^A}$. The first application changes the exponent of $p$ in the index, so the second divisor parameter is constrained by the new exponent. This gives
\begin{align*}
(L_{p^A}L_{p^B}a)_r
=
\sum_{i=0}^{\min(A,R)}
\sum_{j=0}^{\min(B,A+R-2i)}
p^{(i+j)K}
a_{p^{A+B+R-2(i+j)}r_0}.
\end{align*}
Thus the contribution with total divisor exponent $\ell=i+j$ is counted by admissible pairs $(i,j)$.
The right-hand side first chooses an overlap exponent $h$, contributing the scalar $p^{hK}$, and then applies $L_{p^{A+B-2h}}$. Its contribution with total exponent $\ell=h+t$ is counted by admissible pairs $(h,t)$ satisfying
\begin{align*}
0 \leq h \leq \min(A,B),
\qquad
0 \leq t \leq \min(A+B-2h,R),
\qquad
h+t=\ell.
\end{align*}
Equivalently, $h$ ranges through
\begin{align*}
\max(0,\ell-R)
\leq h
\leq
\min(\ell,A,B,A+B-\ell),
\end{align*}
whereas the left-hand admissible values of $i$ range through
\begin{align*}
\max(0,\ell-B)
\leq i
\leq
\min(\ell,A,R,A+R-\ell).
\end{align*}
The min-max identity proved in the exact argument shows that these two intervals have the same number of integers. Therefore every term
\begin{align*}
p^{\ell K}a_{p^{A+B+R-2\ell}r_0}
\end{align*}
appears with the same multiplicity on both sides for every $r \in \mathbb{N}$.
The constant coefficient must be handled separately because $0$ has no finite $p$-adic valuation. Define
\begin{align*}
\sigma_C := \sum_{i=0}^{C} p^{iK}
\end{align*}
for $C \in \mathbb{N}_0$. Since every divisor of $p^C$ divides $\gcd(p^C,0)$, we have
\begin{align*}
(L_{p^C}a)_0 = \sigma_Ca_0.
\end{align*}
Hence the left-hand constant coefficient is $\sigma_A\sigma_Ba_0$. The right-hand constant coefficient is
\begin{align*}
\left(\sum_{h=0}^{\min(A,B)}p^{hK}\sigma_{A+B-2h}\right)a_0.
\end{align*}
Let $x:=p^K$. If $A\leq B$, then
\begin{align*}
\sum_{h=0}^{A}x^h\sum_{t=0}^{A+B-2h}x^t
&=
\frac{1}{1-x}\left(\sum_{h=0}^{A}x^h-\sum_{h=0}^{A}x^{A+B-h+1}\right)\\
&=
\left(\sum_{i=0}^{A}x^i\right)\left(\sum_{j=0}^{B}x^j\right).
\end{align*}
If $B\leq A$, the same finite geometric-series computation with $A$ and $B$ interchanged gives the same identity. Therefore the constant coefficients agree. Combining the positive-index and constant-index computations proves the prime-power relation.
[/guided]
[/step]
[step:Factor the coefficient operators over distinct primes]
Let $M,N \in \mathbb{N}$ with $\gcd(M,N)=1$. For $r \in \mathbb{N}_0$, the coefficient of $L_ML_Na$ at $r$ is
\begin{align*}
(L_ML_Na)_r
=
\sum_{e_1 \mid \gcd(M,r)} e_1^{k-1}
\sum_{e_2 \mid \gcd(N,Mr/e_1^2)} e_2^{k-1}
a_{N(Mr/e_1^2)/e_2^2}.
\end{align*}
For such $e_1$, every prime divisor of $e_2$ divides $N$ and is therefore coprime to $M$ and $e_1$. Hence
\begin{align*}
e_2 \mid \frac{Mr}{e_1^2}
\quad \Longleftrightarrow \quad
e_2 \mid r.
\end{align*}
Thus the admissible pairs are exactly the pairs
\begin{align*}
e_1 \mid \gcd(M,r),
\qquad
e_2 \mid \gcd(N,r).
\end{align*}
Since $\gcd(M,N)=1$, multiplication gives a bijection from these pairs to divisors $e=e_1e_2$ of $\gcd(MN,r)$. Under this bijection,
\begin{align*}
e^{k-1}=e_1^{k-1}e_2^{k-1},
\qquad
\frac{MNr}{e^2}
=
\frac{N(Mr/e_1^2)}{e_2^2}.
\end{align*}
Substituting these identities into the coefficient formula gives
\begin{align*}
L_ML_N = L_{MN}.
\end{align*}
The same argument gives $L_NL_M=L_{MN}$, so the coefficient operators attached to coprime indices commute and multiply.
[guided]
When $M$ and $N$ are coprime, their prime factors do not interact, but we still have to check the divisibility condition after the first operator changes the coefficient index. For $r \in \mathbb{N}_0$, applying $L_N$ first and then $L_M$ gives
\begin{align*}
(L_ML_Na)_r
=
\sum_{e_1 \mid \gcd(M,r)} e_1^{k-1}
\sum_{e_2 \mid \gcd(N,Mr/e_1^2)} e_2^{k-1}
a_{N(Mr/e_1^2)/e_2^2}.
\end{align*}
The outer divisor $e_1$ is the divisor chosen for $L_M$, and the inner divisor $e_2$ is the divisor chosen for $L_N$ after the index has become $Mr/e_1^2$.
Because $e_1 \mid M$ and $\gcd(M,N)=1$, every divisor $e_2$ of $N$ is coprime to both $M$ and $e_1$. Therefore the condition
\begin{align*}
e_2 \mid \frac{Mr}{e_1^2}
\end{align*}
is equivalent to $e_2 \mid r$: the factors of $M$ and $e_1$ cannot contribute any prime factor of $e_2$. Thus the iterated coefficient formula is indexed exactly by pairs
\begin{align*}
e_1 \mid \gcd(M,r),
\qquad
e_2 \mid \gcd(N,r).
\end{align*}
Multiplication sends such a pair to $e=e_1e_2$, and this is a bijection onto the divisors of $\gcd(MN,r)$ because $M$ and $N$ have disjoint prime divisors.
The weight separates as
\begin{align*}
(e_1e_2)^{k-1}=e_1^{k-1}e_2^{k-1},
\end{align*}
and the coefficient index is the same index appearing in $L_{MN}$:
\begin{align*}
\frac{MNr}{(e_1e_2)^2}
=
\frac{N(Mr/e_1^2)}{e_2^2}.
\end{align*}
Therefore each summand in the coefficient formula for $L_ML_N$ corresponds to exactly one summand in the coefficient formula for $L_{MN}$, with the same scalar weight and the same sequence entry. Hence
\begin{align*}
L_ML_N=L_{MN}.
\end{align*}
[/guided]
[/step]
[step:Assemble the formula from prime powers]
Write the prime factorizations
\begin{align*}
m=\prod_p p^{\alpha_p},
\qquad
n=\prod_p p^{\beta_p},
\end{align*}
where all but finitely many exponents $\alpha_p,\beta_p \in \mathbb{N}_0$ are zero. By the coprime factorisation just proved,
\begin{align*}
L_mL_n
=
\prod_p L_{p^{\alpha_p}}L_{p^{\beta_p}}.
\end{align*}
Using the prime-power relation for each prime $p$ gives
\begin{align*}
L_mL_n
=
\prod_p
\left(
\sum_{\gamma_p=0}^{\min(\alpha_p,\beta_p)}
p^{\gamma_p(k-1)}
L_{p^{\alpha_p+\beta_p-2\gamma_p}}
\right).
\end{align*}
Expanding the finite product, each choice of exponents $\gamma_p$ determines a divisor
\begin{align*}
d=\prod_p p^{\gamma_p}
\end{align*}
of $\gcd(m,n)$, and every divisor of $\gcd(m,n)$ arises uniquely in this way. Moreover,
\begin{align*}
\prod_p p^{\gamma_p(k-1)}=d^{k-1},
\qquad
\prod_p p^{\alpha_p+\beta_p-2\gamma_p}
=
\frac{mn}{d^2}.
\end{align*}
Therefore
\begin{align*}
L_mL_n
=
\sum_{d \mid \gcd(m,n)}
d^{k-1}L_{mn/d^2}.
\end{align*}
[/step]
[step:Transfer the coefficient identity back to Hecke operators]
Let $f \in V$. Applying the coefficient identity to the Fourier coefficient sequence of $f$ gives
\begin{align*}
T_mT_n f
=
\sum_{d \mid \gcd(m,n)}
d^{k-1}T_{mn/d^2}f.
\end{align*}
Since this holds for every $f \in V$, the operators are equal in $\operatorname{End}_{\mathbb{C}}(V)$:
\begin{align*}
T_mT_n
=
\sum_{d \mid \gcd(m,n)}
d^{k-1}T_{mn/d^2}.
\end{align*}
The argument used only the normalized Hecke action on Fourier expansions, which is valid on both $M_k(\operatorname{SL}_2(\mathbb{Z}))$ and its Hecke-stable subspace $S_k(\operatorname{SL}_2(\mathbb{Z}))$. This proves the theorem.
[/step]