[guided]The Frobenius norm proof exploits a powerful simplification: unitary invariance. Since $\|M\|_F$ depends only on the singular values of $M$, we can rotate the problem into a coordinate system where $A$ is diagonal, reducing the optimisation to a straightforward combinatorial question about which diagonal entries to keep.
Let $B \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(B) \le k$. The key property we use is that the Frobenius norm is unitarily invariant: for any orthogonal matrices $P \in \mathbb{R}^{m \times m}$ and $Q \in \mathbb{R}^{n \times n}$,
\begin{align*}
\|PMQ\|_F^2 = \operatorname{tr}(Q^\top M^\top P^\top P M Q) = \operatorname{tr}(Q^\top M^\top M Q) = \operatorname{tr}(M^\top M) = \|M\|_F^2,
\end{align*}
where we used $P^\top P = I$ and the cyclic property of the trace ($\operatorname{tr}(Q^\top N Q) = \operatorname{tr}(N Q Q^\top) = \operatorname{tr}(N)$).
Set $C := U^\top B V$. Since $U$ and $V$ are orthogonal, multiplication by them preserves rank: $\operatorname{rank}(C) = \operatorname{rank}(B) \le k$. By unitary invariance:
\begin{align*}
\|A - B\|_F = \|U^\top(A - B)V\|_F = \|U^\top U \Sigma V^\top V - U^\top B V\|_F = \|\Sigma - C\|_F.
\end{align*}
So we have reduced the problem to: minimise $\|\Sigma - C\|_F^2$ over all $m \times n$ matrices $C$ with $\operatorname{rank}(C) \le k$, where $\Sigma$ is the diagonal matrix with entries $\sigma_1, \dots, \sigma_r$ on the diagonal and zeros elsewhere.
Expanding:
\begin{align*}
\|\Sigma - C\|_F^2 = \sum_{i,j} |\Sigma_{ij} - c_{ij}|^2 = \sum_{i=1}^{r} (\sigma_i - c_{ii})^2 + \sum_{(i,j) \neq (i,i), i \le r} c_{ij}^2 + \sum_{i > r \text{ or } j > r, i \neq j} c_{ij}^2.
\end{align*}
Every off-diagonal entry $c_{ij}$ (with $i \neq j$) appears as $c_{ij}^2 \ge 0$, contributing only cost. Setting all off-diagonal entries to zero cannot increase the rank (it can only decrease it), so the minimiser is diagonal: $C = \operatorname{diag}(d_1, \dots, d_{\min(m,n)})$.
With $C$ diagonal, $\operatorname{rank}(C)$ equals the number of nonzero $d_i$. The constraint $\operatorname{rank}(C) \le k$ means at most $k$ of the $d_i$ are nonzero. The cost becomes:
\begin{align*}
\|\Sigma - C\|_F^2 = \sum_{i=1}^{r} (\sigma_i - d_i)^2 + \sum_{i=r+1}^{\min(m,n)} d_i^2.
\end{align*}
For $i > r$: $d_i = 0$ is optimal (the term $d_i^2$ is minimised at zero, and setting $d_i = 0$ does not use up any of the $k$ nonzero slots). For $i \le r$: the term $(\sigma_i - d_i)^2$ is minimised by $d_i = \sigma_i$, which makes the cost zero. But we can only afford $k$ nonzero values. Which $k$ should we keep? We want to zero out the entries whose removal adds the least cost. Removing $d_i$ (setting it to zero) adds $\sigma_i^2$ to the total. Since $\sigma_1 \ge \cdots \ge \sigma_r$, the cheapest entries to drop are $d_{k+1}, \dots, d_r$. The optimal choice is $d_i = \sigma_i$ for $i = 1, \dots, k$ and $d_i = 0$ for $i > k$, yielding:
\begin{align*}
\min_{\operatorname{rank}(C) \le k} \|\Sigma - C\|_F^2 = \sum_{i=k+1}^{r} \sigma_i^2 = \|A - A_k\|_F^2.
\end{align*}
Since $\|A - B\|_F = \|\Sigma - C\|_F \ge \|A - A_k\|_F$ for every rank-$k$ matrix $B$, and equality holds when $B = A_k$, we conclude that $A_k$ is the best rank-$k$ approximation in Frobenius norm.[/guided]