Singular Value Decomposition — Statement & Proof

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] The strategy is to reduce the SVD of the rectangular matrix $A \in \mathbb{R}^{n \times m}$ to the Spectral Theorem for Real Symmetric Matrices applied to a positive semidefinite Gram matrix. Form $G := A^\top A \in \mathbb{R}^{m \times m}$, which is symmetric and positive semidefinite. Diagonalise $G = Q D Q^\top$ via the spectral theorem, with $Q$ orthogonal and $D = \operatorname{diag}(\lambda_1, \dots, \lambda_m)$ with $\lambda_i \ge 0$ ordered decreasingly. The columns of $A Q$ are then orthogonal vectors in $\mathbb{R}^n$ with norms $\sigma_i := \sqrt{\lambda_i}$. Normalising the non-zero columns yields the first $r := \operatorname{rank}(A)$ columns of $P$, and extending to an orthonormal basis of $\mathbb{R}^n$ (using the hypothesis $m \le n$) by Gram–Schmidt orthonormalisation produces an $n \times n$ orthogonal $P$. Reading off the resulting identity gives $A = P \Sigma Q^\top$ with $\Sigma$ "diagonal" in the rectangular sense. Uniqueness of the singular values follows from their identification as the square roots of the eigenvalues of $G$. [/proofplan] [step:Set up notation and reduce to the spectral theorem via the Gram matrix] The theorem concerns a linear map $L: \mathbb{R}^m \to \mathbb{R}^n$ with $m \le n$. Let $A := JL \in \mathbb{R}^{n \times m}$ denote the matrix of $L$ in the standard bases of $\mathbb{R}^m$ and $\mathbb{R}^n$ (the Jacobian matrix of $L$, which for a linear map coincides with its matrix in standard bases and is independent of the base point). Throughout the proof we work with $A$ as an $n \times m$ matrix; the action of $L$ on a vector $x \in \mathbb{R}^m$ is the matrix–vector product $A x \in \mathbb{R}^n$. Define the Gram matrix \begin{align*} G := A^\top A \in \mathbb{R}^{m \times m}. \end{align*} We verify two structural properties of $G$. (i) **$G$ is symmetric**: $G^\top = (A^\top A)^\top = A^\top (A^\top)^\top = A^\top A = G$. (ii) **$G$ is positive semidefinite**: for every $x \in \mathbb{R}^m$, \begin{align*} x^\top G x = x^\top A^\top A x = (A x)^\top (A x) = |A x|^2 \ge 0. \end{align*} By the Spectral Theorem for Real Symmetric Matrices (a real symmetric matrix is orthogonally diagonalisable), there exists an orthogonal matrix $Q \in \mathbb{R}^{m \times m}$ (so $Q^\top Q = Q Q^\top = I_m$) and a diagonal matrix $D = \operatorname{diag}(\lambda_1, \dots, \lambda_m)$ with real entries such that \begin{align*} G = Q D Q^\top. \end{align*} Write $Q = [q_1\ q_2\ \cdots\ q_m]$ with $q_i \in \mathbb{R}^m$ the $i$-th column of $Q$, so that $\{q_1, \dots, q_m\}$ is orthonormal and $G q_i = \lambda_i q_i$ (each $q_i$ is a unit eigenvector of $G$ with eigenvalue $\lambda_i$). The eigenvalues $\lambda_i$ are non-negative: applying the positive semidefiniteness of $G$ to the unit vector $q_i$, \begin{align*} \lambda_i = \lambda_i |q_i|^2 = q_i^\top (\lambda_i q_i) = q_i^\top G q_i = |A q_i|^2 \ge 0. \end{align*} Order the eigenvalues so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0$ (and reorder the columns of $Q$ accordingly), and set \begin{align*} \sigma_i := \sqrt{\lambda_i} \ge 0, \qquad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_m \ge 0. \end{align*} [/step] [step:Construct the orthogonal matrix $P$ from the orthonormal image vectors] For each $i = 1, \dots, m$, set $u_i := A q_i \in \mathbb{R}^n$. We compute the inner products: for $i, j \in \{1, \dots, m\}$, \begin{align*} u_i \cdot u_j = (A q_i)^\top (A q_j) = q_i^\top A^\top A q_j = q_i^\top G q_j = q_i^\top \lambda_j q_j = \lambda_j (q_i \cdot q_j) = \lambda_j \delta_{ij}, \end{align*} where $\delta_{ij}$ is the Kronecker delta and we used $G q_j = \lambda_j q_j$ together with the orthonormality $q_i \cdot q_j = \delta_{ij}$. In particular, \begin{align*} |u_i|^2 = \lambda_i = \sigma_i^2, \qquad u_i \cdot u_j = 0 \text{ for } i \ne j. \end{align*} Let $r := \#\{i : \sigma_i > 0\}$, the number of strictly positive singular values. Since the $\sigma_i$ are non-increasing, $\sigma_1, \dots, \sigma_r > 0$ and $\sigma_{r+1} = \cdots = \sigma_m = 0$. For $i = 1, \dots, r$, define \begin{align*} p_i := \frac{u_i}{\sigma_i} \in \mathbb{R}^n, \end{align*} so $|p_i|^2 = |u_i|^2 / \sigma_i^2 = \sigma_i^2 / \sigma_i^2 = 1$. The orthogonality computation above gives $p_i \cdot p_j = (u_i \cdot u_j)/(\sigma_i \sigma_j) = \delta_{ij}$ for $i, j \in \{1, \dots, r\}$. Hence $\{p_1, \dots, p_r\}$ is an orthonormal set in $\mathbb{R}^n$. We now invoke the hypothesis of the theorem that $m \le n$. Since $r \le m$ (as $r$ counts a subset of indices $\{1, \dots, m\}$) and $m \le n$ by hypothesis, we have $r \le n$. Therefore the orthonormal set $\{p_1, \dots, p_r\} \subset \mathbb{R}^n$ has at most $n$ elements and can be extended: choose any vectors $w_{r+1}, \dots, w_n \in \mathbb{R}^n$ such that $\{p_1, \dots, p_r, w_{r+1}, \dots, w_n\}$ is a basis of $\mathbb{R}^n$ (possible since $r \le n$), and apply Gram–Schmidt orthonormalisation to this basis, leaving the already-orthonormal vectors $p_1, \dots, p_r$ unchanged and producing orthonormal vectors $p_{r+1}, \dots, p_n \in \mathbb{R}^n$. The result is an orthonormal basis $\{p_1, \dots, p_n\}$ of $\mathbb{R}^n$. Define \begin{align*} P := [p_1\ p_2\ \cdots\ p_n] \in \mathbb{R}^{n \times n}. \end{align*} Then $P$ is orthogonal: $(P^\top P)_{ij} = p_i \cdot p_j = \delta_{ij}$, so $P^\top P = I_n$. [/step] [step:Define the rectangular diagonal matrix $\Sigma$ and verify the SVD identity] Define \begin{align*} \Sigma \in \mathbb{R}^{n \times m}, \qquad \Sigma_{ij} := \begin{cases} \sigma_i & \text{if } i = j \in \{1, \dots, m\}, \\ 0 & \text{otherwise.} \end{cases} \end{align*} That is, $\Sigma$ has $\sigma_1, \dots, \sigma_m$ on its main diagonal and zeros elsewhere; the bottom $n - m$ rows are entirely zero (this is well-defined precisely because $m \le n$). We verify $A = P \Sigma Q^\top$ by checking equality of the action on the orthonormal basis $\{q_1, \dots, q_m\}$ of $\mathbb{R}^m$. For $i \in \{1, \dots, r\}$: \begin{align*} (P \Sigma Q^\top) q_i = P \Sigma (Q^\top q_i) = P \Sigma e_i^{(m)} = P (\sigma_i e_i^{(n)}) = \sigma_i p_i = \sigma_i \cdot \frac{u_i}{\sigma_i} = u_i = A q_i. \end{align*} Here $Q^\top q_i = e_i^{(m)}$ (the $i$-th standard basis vector of $\mathbb{R}^m$) because $Q$ is orthogonal and $q_i$ is the $i$-th column of $Q$; $\Sigma e_i^{(m)} = \sigma_i e_i^{(n)}$ since the $i$-th column of $\Sigma$ has $\sigma_i$ in slot $i$ and zeros elsewhere (valid for $i \le m \le n$); and $P e_i^{(n)} = p_i$ since $p_i$ is the $i$-th column of $P$. For $i \in \{r+1, \dots, m\}$: \begin{align*} (P \Sigma Q^\top) q_i = P \Sigma e_i^{(m)} = P (\sigma_i e_i^{(n)}) = P \cdot 0 = 0, \end{align*} since $\sigma_i = 0$ for $i > r$. On the other hand, \begin{align*} A q_i = u_i, \qquad |u_i|^2 = \sigma_i^2 = 0 \implies u_i = 0, \end{align*} so $A q_i = 0$ as well. Hence $(P \Sigma Q^\top) q_i = A q_i$ for every $i \in \{1, \dots, m\}$. Since $\{q_1, \dots, q_m\}$ is a basis of $\mathbb{R}^m$, the matrices $P \Sigma Q^\top$ and $A$ agree on all of $\mathbb{R}^m$, hence as $n \times m$ matrices, \begin{align*} A = P \Sigma Q^\top. \end{align*} [/step] [step:Establish uniqueness of the singular values] Suppose $A = P_1 \Sigma_1 Q_1^\top = P_2 \Sigma_2 Q_2^\top$ are two SVDs, with $\Sigma_1, \Sigma_2$ both rectangular diagonal in $\mathbb{R}^{n \times m}$ with non-increasing non-negative diagonals $\sigma_1^{(1)} \ge \cdots \ge \sigma_m^{(1)}$ and $\sigma_1^{(2)} \ge \cdots \ge \sigma_m^{(2)}$, and $P_1, P_2 \in \mathbb{R}^{n \times n}$, $Q_1, Q_2 \in \mathbb{R}^{m \times m}$ orthogonal. Compute the Gram matrix using each decomposition: for $k \in \{1, 2\}$, \begin{align*} G = A^\top A = (P_k \Sigma_k Q_k^\top)^\top (P_k \Sigma_k Q_k^\top) = Q_k \Sigma_k^\top P_k^\top P_k \Sigma_k Q_k^\top = Q_k (\Sigma_k^\top \Sigma_k) Q_k^\top, \end{align*} using $P_k^\top P_k = I_n$. The matrix $\Sigma_k^\top \Sigma_k \in \mathbb{R}^{m \times m}$ is diagonal with entries $(\sigma_j^{(k)})^2$ in the $j$-th slot, so \begin{align*} G = Q_1 \operatorname{diag}\bigl((\sigma_j^{(1)})^2\bigr) Q_1^\top = Q_2 \operatorname{diag}\bigl((\sigma_j^{(2)})^2\bigr) Q_2^\top. \end{align*} Both right-hand sides are orthogonal diagonalisations of $G$ with eigenvalues listed in non-increasing order. The multiset of eigenvalues of a matrix is uniquely determined as the multiset of roots (with multiplicity) of the characteristic polynomial $\det(\lambda I - G)$, which depends only on $G$. Hence \begin{align*} (\sigma_j^{(1)})^2 = (\sigma_j^{(2)})^2 \quad \text{for } j = 1, \dots, m. \end{align*} Since $\sigma_j^{(k)} \ge 0$, taking square roots gives $\sigma_j^{(1)} = \sigma_j^{(2)}$ for all $j$. So the singular values are uniquely determined by $A$, namely $\sigma_j = \sqrt{\lambda_j(G)}$ where $\lambda_j(G)$ is the $j$-th largest eigenvalue (with multiplicity) of $G = A^\top A$. (The matrices $P$ and $Q$ are *not* unique when singular values repeat: in an eigenspace of $G$ of dimension $> 1$, any orthonormal basis can be chosen as the corresponding columns of $Q$, with the corresponding columns of $P$ then determined, or, in the case of zero singular values, free.) [/step]

What brings you to Androma?

Start with a route through the knowledge graph.

Singular Value Decomposition (Theorem # 3071)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Singular Value Decomposition (Theorem # 3071)

Discussion

Proof

Explore Further