Androma — The Home of Mathematics on the Internet

custom_env Unknown

[guided]The heart of the spectral norm optimality is a dimension argument. Any rank-$k$ matrix $B$ "uses up" at most $k$ dimensions for its column space, leaving a kernel of dimension at least $n - k$. Meanwhile, the first $k + 1$ right singular vectors of $A$ span a $(k+1)$-dimensional subspace. These two subspaces are too large to be disjoint in $\mathbb{R}^n$, so there must be a direction $z$ that $B$ maps to zero but on which $A$ still has substantial energy. Let $B \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(B) \le k$. Define $V_{k+1} := \operatorname{span}(v_1, \dots, v_{k+1}) \subset \mathbb{R}^n$. This subspace has dimension $k + 1$ because $v_1, \dots, v_{k+1}$ are orthonormal (hence linearly independent). By the rank-nullity theorem, $\dim \ker B = n - \operatorname{rank}(B) \ge n - k$. The dimensional intersection formula gives: \begin{align*} \dim(V_{k+1} \cap \ker B) &= \dim V_{k+1} + \dim \ker B - \dim(V_{k+1} + \ker B) \\ &\ge (k+1) + (n-k) - n = 1. \end{align*} So there exists a nonzero $z \in V_{k+1} \cap \ker B$. Normalise to $|z| = 1$. Since $z \in V_{k+1}$, write $z = \sum_{i=1}^{k+1} c_i v_i$ with $c_i = v_i^\top z$. The orthonormality of $v_1, \dots, v_{k+1}$ gives $\sum_{i=1}^{k+1} c_i^2 = |z|^2 = 1$. Since $z \in \ker B$, we have $Bz = 0$, so $(A - B)z = Az$. Now we compute $|Az|^2$. Using $Av_i = \sigma_i u_i$ (from the SVD) and the fact that $v_i^\top z = c_i$ for $i \le k+1$ while $v_i^\top z = 0$ for $i > k+1$ (since $z \in V_{k+1}$ and the $v_i$ are orthonormal): \begin{align*} Az = A\sum_{i=1}^{k+1} c_i v_i = \sum_{i=1}^{k+1} c_i \sigma_i u_i. \end{align*} The $u_i$ are orthonormal, so: \begin{align*} |(A - B)z|^2 = |Az|^2 = \left|\sum_{i=1}^{k+1} c_i \sigma_i u_i\right|^2 = \sum_{i=1}^{k+1} \sigma_i^2 c_i^2. \end{align*} Since $\sigma_1 \ge \cdots \ge \sigma_{k+1}$, we bound each term below by replacing $\sigma_i^2$ with $\sigma_{k+1}^2$: \begin{align*} \sum_{i=1}^{k+1} \sigma_i^2 c_i^2 \ge \sigma_{k+1}^2 \sum_{i=1}^{k+1} c_i^2 = \sigma_{k+1}^2. \end{align*} Since $\|A - B\|_2 = \sup_{|x|=1} |(A - B)x| \ge |(A - B)z| \ge \sigma_{k+1}$, and $\|A - A_k\|_2 = \sigma_{k+1}$ from the previous step, $A_k$ achieves the minimum. The dimension argument is why this is sometimes called a "min-max" characterisation: the $(k+1)$-th singular value is the minimum over all rank-$k$ approximations of the maximum distortion, and the truncated SVD is the optimizer.[/guided]

custom_env Unknown

[step:Prove optimality in the Frobenius norm via unitary invariance]Let $B \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(B) \le k$. We must show $\|A - B\|_F \ge \|A - A_k\|_F$. The Frobenius norm is unitarily invariant: $\|M\|_F = \|UMV^\top\|_F$ for any orthogonal matrices $U, V$ of compatible dimensions. This holds because $\|M\|_F^2 = \operatorname{tr}(M^\top M)$ and the trace is invariant under orthogonal conjugation. Define $C := U^\top B V \in \mathbb{R}^{m \times n}$. Since $U$ and $V$ are orthogonal, $\operatorname{rank}(C) = \operatorname{rank}(B) \le k$, and \begin{align*} \|A - B\|_F = \|U^\top(A - B)V\|_F = \|U^\top A V - C\|_F = \|\Sigma - C\|_F, \end{align*} where $\Sigma$ is the $m \times n$ diagonal matrix of singular values from the SVD $A = U \Sigma V^\top$. Thus the problem reduces to: among all $m \times n$ matrices $C$ of rank at most $k$, minimise $\|\Sigma - C\|_F$. Let $C = (c_{ij})$. Then \begin{align*} \|\Sigma - C\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} (\Sigma_{ij} - c_{ij})^2 = \sum_{i=1}^{r} (\sigma_i - c_{ii})^2 + \sum_{i=1}^{r} \sum_{j \neq i} c_{ij}^2 + \sum_{\substack{i > r \text{ or} \\ j > r}} c_{ij}^2 + \sum_{i=r+1}^{m} \sum_{j=r+1}^{n} c_{ij}^2, \end{align*} where we used $\Sigma_{ij} = 0$ whenever $i \neq j$ or $i > r$. More simply: \begin{align*} \|\Sigma - C\|_F^2 = \sum_{i=1}^{r} (\sigma_i - c_{ii})^2 + \sum_{(i,j): i \neq j \text{ or } i > r} c_{ij}^2. \end{align*} The second sum is non-negative. To minimise $\|\Sigma - C\|_F^2$ subject to $\operatorname{rank}(C) \le k$, the off-diagonal and extra-diagonal entries of $C$ should be zero (they only increase the cost). With $C$ diagonal, i.e., $C = \operatorname{diag}(d_1, \dots, d_{\min(m,n)})$, the rank constraint becomes: at most $k$ of the $d_i$ are nonzero. The cost is \begin{align*} \|\Sigma - C\|_F^2 = \sum_{i=1}^{r} (\sigma_i - d_i)^2 + \sum_{i=r+1}^{\min(m,n)} d_i^2. \end{align*} For $i > r$, the optimal choice is $d_i = 0$. For $i \le r$, each term $(\sigma_i - d_i)^2$ is minimised by $d_i = \sigma_i$ (making the term zero). Since we may only choose $k$ nonzero values among $d_1, \dots, d_r$, we minimise the residual by setting $d_i = \sigma_i$ for $i = 1, \dots, k$ (the $k$ largest singular values) and $d_i = 0$ for $i > k$. This gives the minimum \begin{align*} \|\Sigma - C\|_F^2 = \sum_{i=k+1}^{r} \sigma_i^2 = \|A - A_k\|_F^2. \end{align*} Therefore $\|A - B\|_F \ge \|A - A_k\|_F$ for every rank-$k$ matrix $B$, establishing $A_k$ as the best rank-$k$ approximation in Frobenius norm.[/step]

custom_env Unknown

[guided]The Frobenius norm proof exploits a powerful simplification: unitary invariance. Since $\|M\|_F$ depends only on the singular values of $M$, we can rotate the problem into a coordinate system where $A$ is diagonal, reducing the optimisation to a straightforward combinatorial question about which diagonal entries to keep. Let $B \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(B) \le k$. The key property we use is that the Frobenius norm is unitarily invariant: for any orthogonal matrices $P \in \mathbb{R}^{m \times m}$ and $Q \in \mathbb{R}^{n \times n}$, \begin{align*} \|PMQ\|_F^2 = \operatorname{tr}(Q^\top M^\top P^\top P M Q) = \operatorname{tr}(Q^\top M^\top M Q) = \operatorname{tr}(M^\top M) = \|M\|_F^2, \end{align*} where we used $P^\top P = I$ and the cyclic property of the trace ($\operatorname{tr}(Q^\top N Q) = \operatorname{tr}(N Q Q^\top) = \operatorname{tr}(N)$). Set $C := U^\top B V$. Since $U$ and $V$ are orthogonal, multiplication by them preserves rank: $\operatorname{rank}(C) = \operatorname{rank}(B) \le k$. By unitary invariance: \begin{align*} \|A - B\|_F = \|U^\top(A - B)V\|_F = \|U^\top U \Sigma V^\top V - U^\top B V\|_F = \|\Sigma - C\|_F. \end{align*} So we have reduced the problem to: minimise $\|\Sigma - C\|_F^2$ over all $m \times n$ matrices $C$ with $\operatorname{rank}(C) \le k$, where $\Sigma$ is the diagonal matrix with entries $\sigma_1, \dots, \sigma_r$ on the diagonal and zeros elsewhere. Expanding: \begin{align*} \|\Sigma - C\|_F^2 = \sum_{i,j} |\Sigma_{ij} - c_{ij}|^2 = \sum_{i=1}^{r} (\sigma_i - c_{ii})^2 + \sum_{(i,j) \neq (i,i), i \le r} c_{ij}^2 + \sum_{i > r \text{ or } j > r, i \neq j} c_{ij}^2. \end{align*} Every off-diagonal entry $c_{ij}$ (with $i \neq j$) appears as $c_{ij}^2 \ge 0$, contributing only cost. Setting all off-diagonal entries to zero cannot increase the rank (it can only decrease it), so the minimiser is diagonal: $C = \operatorname{diag}(d_1, \dots, d_{\min(m,n)})$. With $C$ diagonal, $\operatorname{rank}(C)$ equals the number of nonzero $d_i$. The constraint $\operatorname{rank}(C) \le k$ means at most $k$ of the $d_i$ are nonzero. The cost becomes: \begin{align*} \|\Sigma - C\|_F^2 = \sum_{i=1}^{r} (\sigma_i - d_i)^2 + \sum_{i=r+1}^{\min(m,n)} d_i^2. \end{align*} For $i > r$: $d_i = 0$ is optimal (the term $d_i^2$ is minimised at zero, and setting $d_i = 0$ does not use up any of the $k$ nonzero slots). For $i \le r$: the term $(\sigma_i - d_i)^2$ is minimised by $d_i = \sigma_i$, which makes the cost zero. But we can only afford $k$ nonzero values. Which $k$ should we keep? We want to zero out the entries whose removal adds the least cost. Removing $d_i$ (setting it to zero) adds $\sigma_i^2$ to the total. Since $\sigma_1 \ge \cdots \ge \sigma_r$, the cheapest entries to drop are $d_{k+1}, \dots, d_r$. The optimal choice is $d_i = \sigma_i$ for $i = 1, \dots, k$ and $d_i = 0$ for $i > k$, yielding: \begin{align*} \min_{\operatorname{rank}(C) \le k} \|\Sigma - C\|_F^2 = \sum_{i=k+1}^{r} \sigma_i^2 = \|A - A_k\|_F^2. \end{align*} Since $\|A - B\|_F = \|\Sigma - C\|_F \ge \|A - A_k\|_F$ for every rank-$k$ matrix $B$, and equality holds when $B = A_k$, we conclude that $A_k$ is the best rank-$k$ approximation in Frobenius norm.[/guided]

custom_env Unknown

What brings you to Androma?

Start with a route through the knowledge graph.

Attributions & Verification

Proof

Verification Progress

Contributors

Who Can Verify

Quick Actions

Sign in to Androma

Check your inbox

One last step

Attributions & Verification

Proof

Verification Progress

Contributors

Who Can Verify

Quick Actions

Raw Attribution Data