[proofplan]
We first compute $\|A - A_k\|_2$ and $\|A - A_k\|_F$ directly from the SVD, obtaining $\sigma_{k+1}$ and $(\sum_{i=k+1}^r \sigma_i^2)^{1/2}$ respectively. For the optimality in spectral norm, we use a dimension-counting argument: any rank-$k$ matrix $B$ has a kernel of dimension at least $n - k$, while the span of the first $k + 1$ right singular vectors has dimension $k + 1$. These subspaces must intersect nontrivially, yielding a unit vector on which $\|A - B\|_2 \ge \sigma_{k+1}$. For the Frobenius norm, we use the unitary invariance of $\|\cdot\|_F$ to reduce the problem to optimising over diagonal matrices, where the solution is immediate.
[/proofplan]
[step:Compute $\|A - A_k\|_2$ and $\|A - A_k\|_F$ from the SVD]
Since $A = \sum_{i=1}^{r} \sigma_i u_i v_i^\top$ and $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^\top$, the difference is
\begin{align*}
A - A_k = \sum_{i=k+1}^{r} \sigma_i u_i v_i^\top.
\end{align*}
This is itself an SVD: the vectors $u_{k+1}, \dots, u_r$ are orthonormal columns of $U$, the vectors $v_{k+1}, \dots, v_r$ are orthonormal columns of $V$, and $\sigma_{k+1} \ge \cdots \ge \sigma_r > 0$. Therefore the singular values of $A - A_k$ are $\sigma_{k+1}, \dots, \sigma_r$.
The spectral norm of a matrix equals its largest singular value, so
\begin{align*}
\|A - A_k\|_2 = \sigma_{k+1}.
\end{align*}
The Frobenius norm satisfies $\|M\|_F^2 = \sum_{i} \sigma_i(M)^2$ (the sum of squares of all singular values), so
\begin{align*}
\|A - A_k\|_F^2 = \sum_{i=k+1}^{r} \sigma_i^2.
\end{align*}
[guided]
The SVD gives us a clean decomposition of $A$ into rank-one pieces $\sigma_i u_i v_i^\top$. Since these rank-one matrices are "orthogonal" in a precise sense (the $u_i$ are orthonormal and the $v_i$ are orthonormal), truncating the sum simply drops terms without disturbing the remaining ones.
Since $A = \sum_{i=1}^{r} \sigma_i u_i v_i^\top$, the truncated approximation $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^\top$ yields:
\begin{align*}
A - A_k = \sum_{i=k+1}^{r} \sigma_i u_i v_i^\top.
\end{align*}
Why is this itself an SVD? We need to verify three things: (i) $u_{k+1}, \dots, u_r$ are orthonormal, which holds because they are a subset of the orthonormal columns of $U$; (ii) $v_{k+1}, \dots, v_r$ are orthonormal, for the same reason with $V$; (iii) the coefficients $\sigma_{k+1} \ge \cdots \ge \sigma_r > 0$ are positive and ordered, which is given. Thus $A - A_k$ has singular values $\sigma_{k+1}, \dots, \sigma_r$.
The spectral norm equals the largest singular value: $\|A - A_k\|_2 = \sigma_{k+1}$. The Frobenius norm is the root-sum-of-squares of all singular values. To see this, note that for any matrix $M$ with SVD $M = \sum_i s_i p_i q_i^\top$:
\begin{align*}
\|M\|_F^2 = \operatorname{tr}(M^\top M) = \operatorname{tr}\left(\sum_{i,j} s_i s_j (q_i q_j^\top)(p_i^\top p_j)\right) = \sum_i s_i^2,
\end{align*}
using the orthonormality $p_i^\top p_j = \delta_{ij}$ and $\operatorname{tr}(q_i q_i^\top) = q_i^\top q_i = 1$. Applying this to $A - A_k$:
\begin{align*}
\|A - A_k\|_F^2 = \sum_{i=k+1}^{r} \sigma_i^2.
\end{align*}
[/guided]
[/step]
[step:Prove optimality in the spectral norm via a dimension argument]
Let $B \in \mathbb{R}^{m \times n}$ be any matrix with $\operatorname{rank}(B) \le k$. We must show $\|A - B\|_2 \ge \sigma_{k+1}$.
Define the subspace $V_{k+1} := \operatorname{span}(v_1, \dots, v_{k+1}) \subset \mathbb{R}^n$, which has dimension $k + 1$. Since $\operatorname{rank}(B) \le k$, the kernel $\ker B = \{x \in \mathbb{R}^n : Bx = 0\}$ has dimension at least $n - k$ by the rank-nullity theorem.
We apply a dimension count: $\dim V_{k+1} + \dim \ker B \ge (k+1) + (n-k) = n + 1 > n$. Since both are subspaces of $\mathbb{R}^n$, which has dimension $n$, there exists a nonzero vector $z \in V_{k+1} \cap \ker B$.
Normalise $z$ so that $|z| = 1$. Since $z \in V_{k+1}$, we can write $z = \sum_{i=1}^{k+1} c_i v_i$ for some coefficients $c_1, \dots, c_{k+1} \in \mathbb{R}$ with $\sum_{i=1}^{k+1} c_i^2 = 1$ (using the orthonormality of $v_1, \dots, v_{k+1}$). Since $z \in \ker B$, we have $Bz = 0$, so $(A - B)z = Az$. Now compute:
\begin{align*}
Az = \sum_{i=1}^{r} \sigma_i u_i (v_i^\top z) = \sum_{i=1}^{k+1} \sigma_i c_i u_i,
\end{align*}
where we used $v_i^\top z = v_i^\top \sum_{j=1}^{k+1} c_j v_j = c_i$ for $i \le k+1$ and $v_i^\top z = 0$ for $i > k+1$. Therefore
\begin{align*}
\|A - B\|_2^2 \ge |(A-B)z|^2 = |Az|^2 = \sum_{i=1}^{k+1} \sigma_i^2 c_i^2 \ge \sigma_{k+1}^2 \sum_{i=1}^{k+1} c_i^2 = \sigma_{k+1}^2,
\end{align*}
using $\sigma_i \ge \sigma_{k+1}$ for $i \le k+1$ and the orthonormality of the $u_i$. Hence $\|A - B\|_2 \ge \sigma_{k+1}$.
Since $A_k$ is a rank-$k$ matrix achieving $\|A - A_k\|_2 = \sigma_{k+1}$, it follows that $A_k$ is a best rank-$k$ approximation in spectral norm.
[guided]
The heart of the spectral norm optimality is a dimension argument. Any rank-$k$ matrix $B$ "uses up" at most $k$ dimensions for its column space, leaving a kernel of dimension at least $n - k$. Meanwhile, the first $k + 1$ right singular vectors of $A$ span a $(k+1)$-dimensional subspace. These two subspaces are too large to be disjoint in $\mathbb{R}^n$, so there must be a direction $z$ that $B$ maps to zero but on which $A$ still has substantial energy.
Let $B \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(B) \le k$. Define $V_{k+1} := \operatorname{span}(v_1, \dots, v_{k+1}) \subset \mathbb{R}^n$. This subspace has dimension $k + 1$ because $v_1, \dots, v_{k+1}$ are orthonormal (hence linearly independent).
By the rank-nullity theorem, $\dim \ker B = n - \operatorname{rank}(B) \ge n - k$. The dimensional intersection formula gives:
\begin{align*}
\dim(V_{k+1} \cap \ker B) &= \dim V_{k+1} + \dim \ker B - \dim(V_{k+1} + \ker B) \\
&\ge (k+1) + (n-k) - n = 1.
\end{align*}
So there exists a nonzero $z \in V_{k+1} \cap \ker B$. Normalise to $|z| = 1$.
Since $z \in V_{k+1}$, write $z = \sum_{i=1}^{k+1} c_i v_i$ with $c_i = v_i^\top z$. The orthonormality of $v_1, \dots, v_{k+1}$ gives $\sum_{i=1}^{k+1} c_i^2 = |z|^2 = 1$.
Since $z \in \ker B$, we have $Bz = 0$, so $(A - B)z = Az$. Now we compute $|Az|^2$. Using $Av_i = \sigma_i u_i$ (from the SVD) and the fact that $v_i^\top z = c_i$ for $i \le k+1$ while $v_i^\top z = 0$ for $i > k+1$ (since $z \in V_{k+1}$ and the $v_i$ are orthonormal):
\begin{align*}
Az = A\sum_{i=1}^{k+1} c_i v_i = \sum_{i=1}^{k+1} c_i \sigma_i u_i.
\end{align*}
The $u_i$ are orthonormal, so:
\begin{align*}
|(A - B)z|^2 = |Az|^2 = \left|\sum_{i=1}^{k+1} c_i \sigma_i u_i\right|^2 = \sum_{i=1}^{k+1} \sigma_i^2 c_i^2.
\end{align*}
Since $\sigma_1 \ge \cdots \ge \sigma_{k+1}$, we bound each term below by replacing $\sigma_i^2$ with $\sigma_{k+1}^2$:
\begin{align*}
\sum_{i=1}^{k+1} \sigma_i^2 c_i^2 \ge \sigma_{k+1}^2 \sum_{i=1}^{k+1} c_i^2 = \sigma_{k+1}^2.
\end{align*}
Since $\|A - B\|_2 = \sup_{|x|=1} |(A - B)x| \ge |(A - B)z| \ge \sigma_{k+1}$, and $\|A - A_k\|_2 = \sigma_{k+1}$ from the previous step, $A_k$ achieves the minimum.
The dimension argument is why this is sometimes called a "min-max" characterisation: the $(k+1)$-th singular value is the minimum over all rank-$k$ approximations of the maximum distortion, and the truncated SVD is the optimizer.
[/guided]
[/step]
[step:Prove optimality in the Frobenius norm via unitary invariance]
Let $B \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(B) \le k$. We must show $\|A - B\|_F \ge \|A - A_k\|_F$.
The Frobenius norm is unitarily invariant: $\|M\|_F = \|UMV^\top\|_F$ for any orthogonal matrices $U, V$ of compatible dimensions. This holds because $\|M\|_F^2 = \operatorname{tr}(M^\top M)$ and the trace is invariant under orthogonal conjugation.
Define $C := U^\top B V \in \mathbb{R}^{m \times n}$. Since $U$ and $V$ are orthogonal, $\operatorname{rank}(C) = \operatorname{rank}(B) \le k$, and
\begin{align*}
\|A - B\|_F = \|U^\top(A - B)V\|_F = \|U^\top A V - C\|_F = \|\Sigma - C\|_F,
\end{align*}
where $\Sigma$ is the $m \times n$ diagonal matrix of singular values from the SVD $A = U \Sigma V^\top$. Thus the problem reduces to: among all $m \times n$ matrices $C$ of rank at most $k$, minimise $\|\Sigma - C\|_F$.
Let $C = (c_{ij})$. Then
\begin{align*}
\|\Sigma - C\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} (\Sigma_{ij} - c_{ij})^2 = \sum_{i=1}^{r} (\sigma_i - c_{ii})^2 + \sum_{i=1}^{r} \sum_{j \neq i} c_{ij}^2 + \sum_{\substack{i > r \text{ or} \\ j > r}} c_{ij}^2 + \sum_{i=r+1}^{m} \sum_{j=r+1}^{n} c_{ij}^2,
\end{align*}
where we used $\Sigma_{ij} = 0$ whenever $i \neq j$ or $i > r$. More simply:
\begin{align*}
\|\Sigma - C\|_F^2 = \sum_{i=1}^{r} (\sigma_i - c_{ii})^2 + \sum_{(i,j): i \neq j \text{ or } i > r} c_{ij}^2.
\end{align*}
The second sum is non-negative. To minimise $\|\Sigma - C\|_F^2$ subject to $\operatorname{rank}(C) \le k$, the off-diagonal and extra-diagonal entries of $C$ should be zero (they only increase the cost). With $C$ diagonal, i.e., $C = \operatorname{diag}(d_1, \dots, d_{\min(m,n)})$, the rank constraint becomes: at most $k$ of the $d_i$ are nonzero. The cost is
\begin{align*}
\|\Sigma - C\|_F^2 = \sum_{i=1}^{r} (\sigma_i - d_i)^2 + \sum_{i=r+1}^{\min(m,n)} d_i^2.
\end{align*}
For $i > r$, the optimal choice is $d_i = 0$. For $i \le r$, each term $(\sigma_i - d_i)^2$ is minimised by $d_i = \sigma_i$ (making the term zero). Since we may only choose $k$ nonzero values among $d_1, \dots, d_r$, we minimise the residual by setting $d_i = \sigma_i$ for $i = 1, \dots, k$ (the $k$ largest singular values) and $d_i = 0$ for $i > k$. This gives the minimum
\begin{align*}
\|\Sigma - C\|_F^2 = \sum_{i=k+1}^{r} \sigma_i^2 = \|A - A_k\|_F^2.
\end{align*}
Therefore $\|A - B\|_F \ge \|A - A_k\|_F$ for every rank-$k$ matrix $B$, establishing $A_k$ as the best rank-$k$ approximation in Frobenius norm.
[guided]
The Frobenius norm proof exploits a powerful simplification: unitary invariance. Since $\|M\|_F$ depends only on the singular values of $M$, we can rotate the problem into a coordinate system where $A$ is diagonal, reducing the optimisation to a straightforward combinatorial question about which diagonal entries to keep.
Let $B \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(B) \le k$. The key property we use is that the Frobenius norm is unitarily invariant: for any orthogonal matrices $P \in \mathbb{R}^{m \times m}$ and $Q \in \mathbb{R}^{n \times n}$,
\begin{align*}
\|PMQ\|_F^2 = \operatorname{tr}(Q^\top M^\top P^\top P M Q) = \operatorname{tr}(Q^\top M^\top M Q) = \operatorname{tr}(M^\top M) = \|M\|_F^2,
\end{align*}
where we used $P^\top P = I$ and the cyclic property of the trace ($\operatorname{tr}(Q^\top N Q) = \operatorname{tr}(N Q Q^\top) = \operatorname{tr}(N)$).
Set $C := U^\top B V$. Since $U$ and $V$ are orthogonal, multiplication by them preserves rank: $\operatorname{rank}(C) = \operatorname{rank}(B) \le k$. By unitary invariance:
\begin{align*}
\|A - B\|_F = \|U^\top(A - B)V\|_F = \|U^\top U \Sigma V^\top V - U^\top B V\|_F = \|\Sigma - C\|_F.
\end{align*}
So we have reduced the problem to: minimise $\|\Sigma - C\|_F^2$ over all $m \times n$ matrices $C$ with $\operatorname{rank}(C) \le k$, where $\Sigma$ is the diagonal matrix with entries $\sigma_1, \dots, \sigma_r$ on the diagonal and zeros elsewhere.
Expanding:
\begin{align*}
\|\Sigma - C\|_F^2 = \sum_{i,j} |\Sigma_{ij} - c_{ij}|^2 = \sum_{i=1}^{r} (\sigma_i - c_{ii})^2 + \sum_{(i,j) \neq (i,i), i \le r} c_{ij}^2 + \sum_{i > r \text{ or } j > r, i \neq j} c_{ij}^2.
\end{align*}
Every off-diagonal entry $c_{ij}$ (with $i \neq j$) appears as $c_{ij}^2 \ge 0$, contributing only cost. Setting all off-diagonal entries to zero cannot increase the rank (it can only decrease it), so the minimiser is diagonal: $C = \operatorname{diag}(d_1, \dots, d_{\min(m,n)})$.
With $C$ diagonal, $\operatorname{rank}(C)$ equals the number of nonzero $d_i$. The constraint $\operatorname{rank}(C) \le k$ means at most $k$ of the $d_i$ are nonzero. The cost becomes:
\begin{align*}
\|\Sigma - C\|_F^2 = \sum_{i=1}^{r} (\sigma_i - d_i)^2 + \sum_{i=r+1}^{\min(m,n)} d_i^2.
\end{align*}
For $i > r$: $d_i = 0$ is optimal (the term $d_i^2$ is minimised at zero, and setting $d_i = 0$ does not use up any of the $k$ nonzero slots). For $i \le r$: the term $(\sigma_i - d_i)^2$ is minimised by $d_i = \sigma_i$, which makes the cost zero. But we can only afford $k$ nonzero values. Which $k$ should we keep? We want to zero out the entries whose removal adds the least cost. Removing $d_i$ (setting it to zero) adds $\sigma_i^2$ to the total. Since $\sigma_1 \ge \cdots \ge \sigma_r$, the cheapest entries to drop are $d_{k+1}, \dots, d_r$. The optimal choice is $d_i = \sigma_i$ for $i = 1, \dots, k$ and $d_i = 0$ for $i > k$, yielding:
\begin{align*}
\min_{\operatorname{rank}(C) \le k} \|\Sigma - C\|_F^2 = \sum_{i=k+1}^{r} \sigma_i^2 = \|A - A_k\|_F^2.
\end{align*}
Since $\|A - B\|_F = \|\Sigma - C\|_F \ge \|A - A_k\|_F$ for every rank-$k$ matrix $B$, and equality holds when $B = A_k$, we conclude that $A_k$ is the best rank-$k$ approximation in Frobenius norm.
[/guided]
[/step]