[proofplan]
The proof is a finite-dimensional linear algebra argument applied to the Hecke operators. We use the standard Hecke facts that the operators $T_n$ commute with one another and are self-adjoint for the Petersson inner product. A commuting family of [self-adjoint operators](/page/Self-Adjoint%20Operators) admits an orthogonal simultaneous eigenspace decomposition; then the level-$1$ Fourier coefficient formula for Hecke operators shows that every nonzero simultaneous eigenform has nonzero first Fourier coefficient, so each basis vector can be normalized.
[/proofplan]
[step:Use the self-adjoint commuting Hecke operators to build simultaneous eigenspaces]
Write
\begin{align*}
V := S_k(SL_2(\mathbb{Z})).
\end{align*}
The Petersson inner product makes $V$ a finite-dimensional complex [inner product space](/page/Inner%20Product%20Space). We use the standard normalization of the level-$1$ Hecke operators, namely the operators
\begin{align*}
T_n: V &\to V
\end{align*}
whose Fourier coefficients satisfy
\begin{align*}
a_r(T_nf) = \sum_{d \mid \gcd(r,n)} d^{k-1} a_{rn/d^2}(f)
\end{align*}
for every $f \in V$ and every $r,n \in \mathbb{N}$. With this normalization, the Hecke operators are self-adjoint for the Petersson inner product and commute:
\begin{align*}
T_mT_n = T_nT_m
\end{align*}
for all $m,n \in \mathbb{N}$.
We prove the [simultaneous diagonalisation](/theorems/2403) statement for the family $\mathcal{T} := \{T_n : n \in \mathbb{N}\}$. Let $W \subset V$ be a nonzero subspace invariant under every operator in $\mathcal{T}$. If every $T_n|_W$ is scalar, then $W$ is already a common eigenspace for the whole family. Otherwise choose $n_0 \in \mathbb{N}$ such that $T_{n_0}|_W$ is not scalar. Since $T_{n_0}$ is self-adjoint on $V$ and $W$ is invariant under $T_{n_0}$, the restriction $T_{n_0}|_W: W \to W$ is self-adjoint for the restricted Petersson inner product. The finite-dimensional spectral theorem gives an [orthogonal decomposition](/theorems/436) of $W$ into proper nonzero eigenspaces of $T_{n_0}|_W$.
Each such eigenspace is invariant under every $T_n$. Indeed, if
\begin{align*}
E_\lambda := \{f \in W : T_{n_0}f = \lambda f\}
\end{align*}
for an eigenvalue $\lambda \in \mathbb{C}$ of $T_{n_0}|_W$, then for $f \in E_\lambda$ and $n \in \mathbb{N}$,
\begin{align*}
T_{n_0}(T_nf) = T_n(T_{n_0}f) = T_n(\lambda f) = \lambda T_nf.
\end{align*}
Thus $T_nf \in E_\lambda$.
Starting from $W=V$, repeat this refinement only on summands on which some $T_n$ is not scalar. Every genuine refinement replaces a nonzero summand by an orthogonal direct sum of proper nonzero subspaces, so the maximal dimension of a non-final summand decreases after finitely many refinements at that dimension. Since dimensions are positive integers bounded by $\dim_{\mathbb{C}} V$, the process terminates by induction on dimension. We obtain an orthogonal decomposition
\begin{align*}
V = \bigoplus_{\alpha \in A} V_\alpha,
\end{align*}
where $A$ is a finite index set and, for every $\alpha \in A$ and every $n \in \mathbb{N}$, there is a scalar $\lambda_{\alpha,n} \in \mathbb{C}$ such that
\begin{align*}
T_nf = \lambda_{\alpha,n}f
\end{align*}
for all $f \in V_\alpha$. Choosing an orthogonal basis of each nonzero $V_\alpha$ gives an orthogonal basis of $V$ consisting of simultaneous Hecke eigenforms.
[guided]
Let
\begin{align*}
V := S_k(SL_2(\mathbb{Z})).
\end{align*}
This is a finite-dimensional complex [vector space](/page/Vector%20Space), and the Petersson inner product makes it a finite-dimensional complex inner product space. We use the standard normalization of the level-$1$ Hecke operators: for each $n \in \mathbb{N}$, the operator
\begin{align*}
T_n: V &\to V
\end{align*}
is the Hecke operator whose Fourier coefficients satisfy
\begin{align*}
a_r(T_nf) = \sum_{d \mid \gcd(r,n)} d^{k-1} a_{rn/d^2}(f)
\end{align*}
for every $f \in V$ and every $r,n \in \mathbb{N}$. With this convention, the standard Hecke algebra facts used here are that each $T_n$ is self-adjoint for the Petersson inner product and that the Hecke operators commute:
\begin{align*}
T_mT_n = T_nT_m
\end{align*}
for all $m,n \in \mathbb{N}$.
The point that needs proof is not just simultaneous diagonalisation for two operators, but simultaneous diagonalisation for the entire infinite family $\{T_n : n \in \mathbb{N}\}$. We handle this by refining invariant subspaces only when some operator in the family is not yet scalar on that subspace.
Let $W \subset V$ be a nonzero subspace invariant under every $T_n$. If every restricted operator $T_n|_W$ is scalar, then every nonzero vector in $W$ is already an eigenvector for every Hecke operator, so no further work is needed on $W$. If not, choose $n_0 \in \mathbb{N}$ such that $T_{n_0}|_W$ is not scalar. Because $T_{n_0}$ is self-adjoint on $V$ and maps $W$ into $W$, its restriction
\begin{align*}
T_{n_0}|_W: W &\to W
\end{align*}
is self-adjoint with respect to the restricted Petersson inner product: for $f,g \in W$,
\begin{align*}
(T_{n_0}f,g) = (f,T_{n_0}g),
\end{align*}
and $T_{n_0}g \in W$. The finite-dimensional spectral theorem therefore decomposes $W$ as an orthogonal direct sum of eigenspaces of $T_{n_0}|_W$. Since $T_{n_0}|_W$ is not scalar, this decomposition is a genuine refinement into proper nonzero subspaces.
We must check that this refinement is compatible with every other Hecke operator. Let
\begin{align*}
E_\lambda := \{f \in W : T_{n_0}f = \lambda f\}
\end{align*}
be one eigenspace of $T_{n_0}|_W$. For $f \in E_\lambda$ and $n \in \mathbb{N}$, commutativity gives
\begin{align*}
T_{n_0}(T_nf) = T_n(T_{n_0}f) = T_n(\lambda f) = \lambda T_nf.
\end{align*}
Hence $T_nf \in E_\lambda$. Thus every eigenspace produced by the spectral theorem remains invariant under the entire Hecke family.
Now start with $W=V$ and repeat this procedure on any summand on which at least one $T_n$ is not scalar. The termination argument is finite-dimensional: a genuine refinement replaces a summand by proper nonzero subspaces of smaller dimension, and dimensions are positive integers at most $\dim_{\mathbb{C}}V$. Induction on the dimension of the summands shows that after finitely many refinements no summand admits further refinement. Therefore each final summand $V_\alpha$ has the property that every $T_n|_{V_\alpha}$ is scalar.
We have obtained an orthogonal decomposition
\begin{align*}
V = \bigoplus_{\alpha \in A} V_\alpha,
\end{align*}
where $A$ is a finite index set and, for every $\alpha \in A$ and every $n \in \mathbb{N}$, there exists $\lambda_{\alpha,n} \in \mathbb{C}$ such that
\begin{align*}
T_nf = \lambda_{\alpha,n}f
\end{align*}
for all $f \in V_\alpha$. Finally, choose an orthogonal basis inside each nonzero $V_\alpha$. Since every vector in $V_\alpha$ is a simultaneous eigenvector for all $T_n$, the union of these bases is an orthogonal basis of $V$ consisting of simultaneous Hecke eigenforms.
[/guided]
[/step]
[step:Show that a nonzero simultaneous eigenform has nonzero first Fourier coefficient]
Let $f \in V$ be a nonzero simultaneous Hecke eigenform. Write its Fourier expansion at infinity as
\begin{align*}
f(z) = \sum_{r=1}^{\infty} a_r(f)q^r,
\qquad q := e^{2\pi iz}.
\end{align*}
For each $n \in \mathbb{N}$, let $\lambda_n(f) \in \mathbb{C}$ be the scalar satisfying
\begin{align*}
T_nf = \lambda_n(f)f.
\end{align*}
Using the coefficient formula for the normalized level-$1$ Hecke operator stated above with $r=1$, the only divisor of $\gcd(1,n)$ is $1$, so
\begin{align*}
a_1(T_nf) = a_n(f).
\end{align*}
On the other hand, using the eigenvalue equation and linearity of the coefficient functional $a_1$, we get
\begin{align*}
a_1(T_nf) = a_1(\lambda_n(f)f) = \lambda_n(f)a_1(f).
\end{align*}
Therefore
\begin{align*}
a_n(f) = \lambda_n(f)a_1(f)
\end{align*}
for every $n \in \mathbb{N}$. If $a_1(f)=0$, then $a_n(f)=0$ for every $n \in \mathbb{N}$, so every Fourier coefficient of $f$ vanishes. Hence $f=0$, contradicting the choice of $f$. Thus
\begin{align*}
a_1(f) \neq 0.
\end{align*}
[guided]
Let $f \in V$ be a nonzero simultaneous Hecke eigenform. Since $f$ is a cusp form for $SL_2(\mathbb{Z})$, it has a Fourier expansion at the cusp infinity of the form
\begin{align*}
f(z) = \sum_{r=1}^{\infty} a_r(f)q^r,
\qquad q := e^{2\pi iz}.
\end{align*}
The coefficient $a_r(f)$ denotes the coefficient of $q^r$ in this expansion.
Because $f$ is a simultaneous eigenform, for each $n \in \mathbb{N}$ there is a scalar $\lambda_n(f) \in \mathbb{C}$ such that
\begin{align*}
T_nf = \lambda_n(f)f.
\end{align*}
We now compare the first Fourier coefficient of both sides after applying $T_n$.
The normalized level-$1$ Hecke coefficient formula stated in the previous step says
\begin{align*}
a_r(T_nf) = \sum_{d \mid \gcd(r,n)} d^{k-1} a_{rn/d^2}(f).
\end{align*}
Taking $r=1$, the set of positive divisors of $\gcd(1,n)$ is $\{1\}$, so the formula becomes
\begin{align*}
a_1(T_nf) = a_n(f).
\end{align*}
The eigenvalue equation gives another expression for the same coefficient. Since $a_1$ is a complex-linear coefficient functional,
\begin{align*}
a_1(T_nf) = a_1(\lambda_n(f)f) = \lambda_n(f)a_1(f).
\end{align*}
Combining these two identities gives
\begin{align*}
a_n(f) = \lambda_n(f)a_1(f)
\end{align*}
for every $n \in \mathbb{N}$.
This identity shows that all Fourier coefficients of a simultaneous eigenform are determined by its first Fourier coefficient and its Hecke eigenvalues. If $a_1(f)=0$, then
\begin{align*}
a_n(f)=0
\end{align*}
for every $n \in \mathbb{N}$. Therefore the entire Fourier expansion of $f$ is zero, so $f=0$. This contradicts the assumption that $f$ is nonzero. Hence
\begin{align*}
a_1(f) \neq 0.
\end{align*}
[/guided]
[/step]
[step:Normalize the orthogonal simultaneous eigenbasis]
Let
\begin{align*}
\mathcal{B} = \{f_1,\dots,f_d\}
\end{align*}
be the orthogonal simultaneous Hecke eigenbasis constructed above, where $d=\dim_{\mathbb{C}}V$. If $d=0$, the assertion is vacuous. If $d>0$, then each $f_j$ is nonzero, so the previous step gives
\begin{align*}
a_1(f_j) \neq 0
\end{align*}
for every $j \in \{1,\dots,d\}$.
Define
\begin{align*}
g_j := a_1(f_j)^{-1}f_j.
\end{align*}
Then $a_1(g_j)=1$. Since scalar multiplication by a nonzero complex number preserves eigenspaces, each $g_j$ remains a simultaneous eigenform for every $T_n$. Since scalar multiplication also preserves orthogonality between distinct vectors, the family
\begin{align*}
\{g_1,\dots,g_d\}
\end{align*}
is an orthogonal basis of $V$ consisting of normalized simultaneous Hecke eigenforms. This proves the theorem.
[/step]