[proofplan]
For a fixed set of indices $S$, orthogonality separates the approximation error into the squared errors of the chosen coefficients and the unchanged tail outside $S$. This shows that the optimal coefficients on $S$ are exactly the Fourier coefficients $a_k$. The remaining problem is therefore purely numerical: choose $n$ coefficients whose squared moduli have maximal possible sum. The decreasing rearrangement records precisely this maximal retained energy, and subtracting it from the total Hilbert-space energy gives the stated error formula.
[/proofplan]
[step:Minimize the error over a fixed coordinate subspace]
Fix a finite set $S \subset I$ with $|S|=n$. Define the finite-dimensional coordinate subspace
\begin{align*}
V_S := \operatorname{span}\{e_k : k \in S\} \subset H.
\end{align*}
For an arbitrary vector $v \in V_S$, write
\begin{align*}
v = \sum_{k \in S} c_k e_k
\end{align*}
with uniquely determined coefficients $c_k \in \mathbb{F}$. Since
\begin{align*}
f = \sum_{k \in I} a_k e_k
\end{align*}
in $H$, orthonormality gives
\begin{align*}
f-v = \sum_{k \in S} (a_k-c_k)e_k + \sum_{k \in I \setminus S} a_k e_k.
\end{align*}
The two sums are orthogonal because they are supported on disjoint subsets of an [orthonormal basis](/page/Orthonormal%20Basis). Hence
\begin{align*}
\|f-v\|_H^2
=
\sum_{k \in S} |a_k-c_k|^2 + \sum_{k \in I \setminus S} |a_k|^2.
\end{align*}
The second term is independent of the coefficients $(c_k)_{k \in S}$, while the first term is minimized uniquely by $c_k=a_k$ for every $k \in S$. Therefore the unique best approximation to $f$ from $V_S$ is
\begin{align*}
P_S f := \sum_{k \in S} a_k e_k,
\end{align*}
and its squared error is
\begin{align*}
\|f-P_S f\|_H^2
=
\sum_{k \in I \setminus S} |a_k|^2.
\end{align*}
[guided]
Fix a finite set $S \subset I$ with $|S|=n$. The coordinate subspace determined by $S$ is
\begin{align*}
V_S := \operatorname{span}\{e_k : k \in S\} \subset H.
\end{align*}
Every element $v \in V_S$ has a unique expansion
\begin{align*}
v = \sum_{k \in S} c_k e_k
\end{align*}
with coefficients $c_k \in \mathbb{F}$, because the vectors $(e_k)_{k \in S}$ are orthonormal and therefore linearly independent.
The point is to separate the error into orthogonal coordinates. Since the theorem assumes
\begin{align*}
f = \sum_{k \in I} a_k e_k
\end{align*}
with convergence in $H$, subtracting the finite sum for $v$ gives
\begin{align*}
f-v = \sum_{k \in S} (a_k-c_k)e_k + \sum_{k \in I \setminus S} a_k e_k.
\end{align*}
The first sum lies in $\operatorname{span}\{e_k:k\in S\}$, while the second lies in the closed span of the basis vectors indexed by $I \setminus S$. These two subspaces are orthogonal because $(e_k)_{k \in I}$ is an orthonormal family and the index sets are disjoint.
Therefore the Pythagorean identity for orthogonal vectors gives
\begin{align*}
\|f-v\|_H^2
=
\sum_{k \in S} |a_k-c_k|^2 + \sum_{k \in I \setminus S} |a_k|^2.
\end{align*}
Now only the first term depends on the chosen coefficients $c_k$. Each summand $|a_k-c_k|^2$ is nonnegative and is equal to $0$ exactly when $c_k=a_k$. Thus the whole expression is minimized uniquely by choosing $c_k=a_k$ for every $k \in S$.
Consequently the best approximation to $f$ from $V_S$ is the coordinate projection
\begin{align*}
P_S f := \sum_{k \in S} a_k e_k,
\end{align*}
and the corresponding squared error is
\begin{align*}
\|f-P_S f\|_H^2
=
\sum_{k \in I \setminus S} |a_k|^2.
\end{align*}
[/guided]
[/step]
[step:Rewrite the fixed-set error as total energy minus retained energy]
For every finite set $S \subset I$ with $|S|=n$, the orthonormal expansion of $f$ gives the energy identity
\begin{align*}
\|f\|_H^2 = \sum_{k \in I} |a_k|^2.
\end{align*}
Combining this identity with the error formula from the previous step yields
\begin{align*}
\|f-P_S f\|_H^2
=
\|f\|_H^2 - \sum_{k \in S} |a_k|^2.
\end{align*}
Thus minimizing the squared error over all $n$-element subsets $S \subset I$ is equivalent to maximizing the retained coefficient energy
\begin{align*}
\sum_{k \in S} |a_k|^2.
\end{align*}
[/step]
[step:Identify the maximal retained energy from the decreasing rearrangement]
By definition, $(a_j^*)_{j \ge 1}$ lists the coefficient moduli $|a_k|$ in nonincreasing order, with zeros appended after the nonzero coefficients if necessary. Therefore, for every finite set $S \subset I$ with $|S|=n$,
\begin{align*}
\sum_{k \in S} |a_k|^2 \leq \sum_{j=1}^n (a_j^*)^2.
\end{align*}
Conversely, for every $\varepsilon>0$, the definition of decreasing rearrangement gives a finite set $S_\varepsilon \subset I$ with $|S_\varepsilon|=n$ such that
\begin{align*}
\sum_{k \in S_\varepsilon} |a_k|^2
>
\sum_{j=1}^n (a_j^*)^2 - \varepsilon.
\end{align*}
Hence
\begin{align*}
\sup_{\substack{S \subset I,\ |S|=n}}
\sum_{k \in S} |a_k|^2
=
\sum_{j=1}^n (a_j^*)^2.
\end{align*}
[/step]
[step:Compute the best squared error and characterize attainment]
Using the fixed-set minimization and the retained-energy supremum, we obtain
\begin{align*}
\sigma_n(f)_{H,\mathcal D}^2
=
\inf_{\substack{S \subset I,\ |S|=n}}
\left(\|f\|_H^2-\sum_{k \in S}|a_k|^2\right).
\end{align*}
Taking the infimum is the same as subtracting the supremum of the retained energies, so
\begin{align*}
\sigma_n(f)_{H,\mathcal D}^2
=
\|f\|_H^2-\sum_{j=1}^n (a_j^*)^2.
\end{align*}
Since
\begin{align*}
\|f\|_H^2 = \sum_{j\ge 1}(a_j^*)^2,
\end{align*}
we conclude that
\begin{align*}
\sigma_n(f)_{H,\mathcal D}^2
=
\sum_{j>n}(a_j^*)^2.
\end{align*}
If a finite set $S \subset I$ with $|S|=n$ satisfies
\begin{align*}
\sum_{k \in S}|a_k|^2=\sum_{j=1}^n(a_j^*)^2,
\end{align*}
then the projection
\begin{align*}
P_S f=\sum_{k \in S}a_k e_k
\end{align*}
attains the infimum and is a best $n$-term approximant. If no such set exists, then no $n$-element coordinate projection can attain the supremal retained energy, and the preceding computation gives only the infimum of the squared errors. This proves all assertions.
[/step]