[proofplan]
We use the singular-value decomposition to write $T-T_n$ as the diagonal tail of $T$ after the first $n$ singular directions. Orthonormality of the vectors $u_j$, together with Bessel's inequality for the vectors $v_j$, gives the upper bound on the tail norm, while testing the tail on $v_{n+1}$ gives the matching lower bound when this vector exists. For the optimality statement, any rank-at-most-$n$ operator must vanish on some non-zero vector $x$ in the $(n+1)$-dimensional space spanned by $v_1,\dots,v_{n+1}$; the singular-value ordering then forces $\|Tx\|_K\ge s_{n+1}\|x\|_H$ for that vector.
[/proofplan]
[step:Express the compact operator by its singular-value expansion]
By the singular-value decomposition theorem for compact operators, [citetheorem:8402], there are an index set $J=\{1,\dots,m\}$ for some $m\in\mathbb N$ or $J=\mathbb N$, positive singular values $(s_j)_{j\in J}$ ordered non-increasingly, an orthonormal set $(v_j)_{j\in J}$ in $(\ker T)^\perp\subset H$, and an orthonormal set $(u_j)_{j\in J}$ in $\overline{\operatorname{Range}(T)}\subset K$. Here $\ker T$ denotes the kernel of $T$, and $\operatorname{Range}(T)$ denotes the range of $T$.
These objects satisfy, for every $x\in H$,
\begin{align*}
Tx=\sum_{j\in J}s_j(x,v_j)_H u_j.
\end{align*}
The convergence is in the norm of $K$. The Schmidt truncation is therefore the finite-rank operator
\begin{align*}
T_nx=\sum_{\substack{j\in J, j\le n}}s_j(x,v_j)_H u_j.
\end{align*}
Hence, for every $x\in H$,
\begin{align*}
(T-T_n)x=\sum_{\substack{j\in J, j>n}}s_j(x,v_j)_H u_j.
\end{align*}
If $J$ has at most $n$ elements, this last sum is empty, so $T-T_n=0$; in this case $s_{n+1}=0$ by convention.
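As a concrete illustration, assume (only for this example) that $H=K=\mathbb R^3$ with standard basis $e_1,e_2,e_3$ and that $T$ is the diagonal operator with entries $3,2,1$. One may then take $J=\{1,2,3\}$, $s_1=3$, $s_2=2$, $s_3=1$, and $v_j=u_j=e_j$. Writing $x=(x_1,x_2,x_3)$, the expansion, the truncation with $n=1$, and the corresponding tail read
\begin{align*}
Tx&=3x_1e_1+2x_2e_2+x_3e_3,\\
T_1x&=3x_1e_1,\\
(T-T_1)x&=2x_2e_2+x_3e_3.
\end{align*}
For $n\ge 3$ the tail sum is empty in this example, so $T-T_n=0$, in agreement with the convention $s_{n+1}=0$.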
[/step]
[step:Compute the norm of the omitted singular-value tail]
Assume first that $J$ contains the index $n+1$. For $x\in H$, the orthonormality of $(u_j)_{j\in J}$ gives
\begin{align*}
\|(T-T_n)x\|_K^2
=
\sum_{\substack{j\in J, j>n}}s_j^2 |(x,v_j)_H|^2.
\end{align*}
Since the singular values are non-increasing, $s_j\le s_{n+1}$ for every $j\in J$ with $j>n$. Therefore, using Bessel's inequality for the orthonormal set $(v_j)_{j\in J}$ in the last step,
\begin{align*}
\|(T-T_n)x\|_K^2
\le
s_{n+1}^2\sum_{\substack{j\in J, j>n}} |(x,v_j)_H|^2
\le
s_{n+1}^2\|x\|_H^2.
\end{align*}
Taking the supremum over all $x\in H$ with $\|x\|_H\le 1$ gives
\begin{align*}
\|T-T_n\|_{\mathcal{L}(H,K)}\le s_{n+1}.
\end{align*}
For the reverse inequality, evaluate the tail at the unit vector $v_{n+1}$. Since $(v_{n+1},v_j)_H=0$ for $j\ne n+1$ and $(v_{n+1},v_{n+1})_H=1$,
\begin{align*}
(T-T_n)v_{n+1}=s_{n+1}u_{n+1}.
\end{align*}
Because $\|u_{n+1}\|_K=1$, this gives
\begin{align*}
\|T-T_n\|_{\mathcal{L}(H,K)}
\ge
\|(T-T_n)v_{n+1}\|_K
=
s_{n+1}.
\end{align*}
Thus
\begin{align*}
\|T-T_n\|_{\mathcal{L}(H,K)}=s_{n+1}.
\end{align*}
If $J$ has at most $n$ elements, the previous step already showed $T-T_n=0$ and $s_{n+1}=0$, so the same equality holds.
[guided]
We now compute the operator norm of the error made by discarding all singular directions after the first $n$. Assume first that the singular value $s_{n+1}$ exists, meaning that $J$ contains $n+1$. For each $x\in H$, the singular-value expansion gives
\begin{align*}
(T-T_n)x=\sum_{\substack{j\in J, j>n}}s_j(x,v_j)_H u_j.
\end{align*}
The point of writing the error this way is that the vectors $u_j$ are orthonormal in $K$. Therefore the norm squared of the sum is the sum of the squared coefficients:
\begin{align*}
\|(T-T_n)x\|_K^2
=
\sum_{\substack{j\in J, j>n}}s_j^2 |(x,v_j)_H|^2.
\end{align*}
The ordering $s_1\ge s_2\ge\cdots$ now gives the decisive estimate. For every omitted index $j>n$, we have $s_j\le s_{n+1}$, and hence
\begin{align*}
\|(T-T_n)x\|_K^2
\le
s_{n+1}^2\sum_{\substack{j\in J, j>n}} |(x,v_j)_H|^2.
\end{align*}
Since $(v_j)_{j\in J}$ is an orthonormal set in $H$, [Bessel's inequality](/theorems/540) gives
\begin{align*}
\sum_{\substack{j\in J, j>n}} |(x,v_j)_H|^2\le \|x\|_H^2.
\end{align*}
Combining the last two displayed formulas yields
\begin{align*}
\|(T-T_n)x\|_K\le s_{n+1}\|x\|_H.
\end{align*}
Taking the supremum over all $x\in H$ with $\|x\|_H\le 1$ proves
\begin{align*}
\|T-T_n\|_{\mathcal{L}(H,K)}\le s_{n+1}.
\end{align*}
To see that this upper bound is sharp, we test the tail on the first omitted right singular vector. The vector $v_{n+1}$ has norm $1$, and all coefficients $(v_{n+1},v_j)_H$ vanish except the coefficient with $j=n+1$. Therefore
\begin{align*}
(T-T_n)v_{n+1}=s_{n+1}u_{n+1}.
\end{align*}
Since $u_{n+1}$ is also a unit vector in $K$,
\begin{align*}
\|(T-T_n)v_{n+1}\|_K=s_{n+1}.
\end{align*}
The definition of the operator norm then gives
\begin{align*}
\|T-T_n\|_{\mathcal{L}(H,K)}
\ge
\|(T-T_n)v_{n+1}\|_K
=
s_{n+1}.
\end{align*}
Together with the upper bound, this proves
\begin{align*}
\|T-T_n\|_{\mathcal{L}(H,K)}=s_{n+1}.
\end{align*}
If there are at most $n$ non-zero singular values, then no tail remains after the truncation: $T_n=T$. In that case the convention gives $s_{n+1}=0$, and the equality becomes $\|T-T_n\|_{\mathcal{L}(H,K)}=0$.
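As a quick sanity check, consider the hypothetical example $H=K=\mathbb R^3$ with standard basis $e_1,e_2,e_3$ and $T$ the diagonal operator with entries $3,2,1$, so that $s_1=3$, $s_2=2$, $s_3=1$ and $v_j=u_j=e_j$ (these choices are assumptions made only for illustration). For $n=1$ and $x=(x_1,x_2,x_3)$,
\begin{align*}
\|(T-T_1)x\|_K^2
=
4x_2^2+x_3^2
\le
4(x_2^2+x_3^2)
\le
4\|x\|_H^2,
\end{align*}
so $\|T-T_1\|_{\mathcal{L}(H,K)}\le 2=s_2$. Testing on $v_2=e_2$ gives $(T-T_1)e_2=2e_2$, which has norm $2$. Hence $\|T-T_1\|_{\mathcal{L}(H,K)}=2=s_2$ in this example, as the general argument predicts.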
[/guided]
[/step]
[step:Find a unit vector killed by any rank-at-most-$n$ competitor]
Let $R\in\mathcal{L}(H,K)$ satisfy $\operatorname{rank}R\le n$. If $s_{n+1}=0$, then the desired inequality
\begin{align*}
\|T-R\|_{\mathcal{L}(H,K)}\ge 0
\end{align*}
is immediate. We therefore assume $s_{n+1}>0$, so $J$ contains $1,\dots,n+1$.
Define the finite-dimensional subspace $V_{n+1}\subset H$ by
\begin{align*}
V_{n+1}:=\operatorname{span}\{v_1,\dots,v_{n+1}\}.
\end{align*}
Since the vectors $v_1,\dots,v_{n+1}$ are orthonormal, $\dim V_{n+1}=n+1$. Consider the restricted [linear map](/page/Linear%20Map)
\begin{align*}
R|_{V_{n+1}}:V_{n+1}\to \operatorname{Range}(R).
\end{align*}
Its range is contained in $\operatorname{Range}(R)$ and therefore has dimension at most $n$. By the rank-nullity theorem, its kernel has positive dimension:
\begin{align*}
\dim\ker(R|_{V_{n+1}})
=
\dim V_{n+1}-\dim \operatorname{Range}(R|_{V_{n+1}})
\ge
(n+1)-n
=
1.
\end{align*}
Choose a non-zero vector $y\in\ker(R|_{V_{n+1}})$, and define
\begin{align*}
x:=\frac{y}{\|y\|_H}.
\end{align*}
Then $x\in V_{n+1}$, $\|x\|_H=1$, and $Rx=0$.
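To illustrate the dimension count, suppose (purely as an example) that $H=K=\mathbb R^3$ with $v_j=e_j$ the standard basis vectors, that $n=1$, and that $R$ is the rank-one operator $Rx=(x_1+x_2)e_1$. Then $V_2=\operatorname{span}\{e_1,e_2\}$, and for real numbers $a,b$,
\begin{align*}
R(ae_1+be_2)=(a+b)e_1.
\end{align*}
Thus $\ker(R|_{V_2})=\operatorname{span}\{e_1-e_2\}$, which has dimension $1=2-1$. Normalizing $y=e_1-e_2$ gives
\begin{align*}
x=\frac{e_1-e_2}{\sqrt 2},
\end{align*}
and this vector satisfies $x\in V_2$, $\|x\|_H=1$, and $Rx=0$.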
[/step]
[step:Use the singular-value ordering on this vector to force the lower bound]
Because $x\in V_{n+1}$ and $(v_1,\dots,v_{n+1})$ is an [orthonormal basis](/page/Orthonormal%20Basis) of $V_{n+1}$, we have
\begin{align*}
x=\sum_{j=1}^{n+1}(x,v_j)_H v_j.
\end{align*}
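Since the $v_j$ are orthonormal and $x\in V_{n+1}$, we have $(x,v_j)_H=0$ for every $j\in J$ with $j>n+1$. The singular-value expansion of $Tx$ therefore reduces to the finite sum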
\begin{align*}
Tx=\sum_{j=1}^{n+1}s_j(x,v_j)_H u_j.
\end{align*}
Using the orthonormality of $(u_1,\dots,u_{n+1})$ in $K$,
\begin{align*}
\|Tx\|_K^2
=
\sum_{j=1}^{n+1}s_j^2 |(x,v_j)_H|^2.
\end{align*}
For $1\le j\le n+1$, the non-increasing order of the singular values gives $s_j\ge s_{n+1}$. Together with Parseval's identity in the finite-dimensional space $V_{n+1}$, this yields
\begin{align*}
\|Tx\|_K^2
\ge
s_{n+1}^2\sum_{j=1}^{n+1}|(x,v_j)_H|^2
=
s_{n+1}^2\|x\|_H^2
=
s_{n+1}^2.
\end{align*}
Since $Rx=0$ and $\|x\|_H=1$, the definition of the operator norm gives
\begin{align*}
\|T-R\|_{\mathcal{L}(H,K)}
\ge
\|(T-R)x\|_K
=
\|Tx\|_K
\ge
s_{n+1}.
\end{align*}
This proves that every rank-at-most-$n$ operator $R$ satisfies
\begin{align*}
\|T-R\|_{\mathcal{L}(H,K)}\ge s_{n+1}.
\end{align*}
Together with the equality for the Schmidt truncation, this completes the proof.
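The lower bound can be checked on a small hypothetical example (all choices here are assumptions made for illustration). Let $H=K=\mathbb R^3$, let $T$ be the diagonal operator with entries $3,2,1$ in the standard basis $e_1,e_2,e_3$, so that $s_2=2$, and let $n=1$ and $Rx=(x_1+x_2)e_1$. The unit vector $x=(e_1-e_2)/\sqrt 2$ lies in $\operatorname{span}\{e_1,e_2\}$ and satisfies $Rx=0$. Then
\begin{align*}
Tx&=\frac{3e_1-2e_2}{\sqrt 2},\\
\|Tx\|_K^2&=\frac{9+4}{2}=\frac{13}{2}\ge 4=s_2^2,
\end{align*}
so $\|T-R\|_{\mathcal{L}(H,K)}\ge\|(T-R)x\|_K=\sqrt{13/2}\ge 2=s_2$, consistent with the general estimate.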
[/step]