Gram–Schmidt Algorithm for QR Factorisation — Statement & Proof

Gram–Schmidt Algorithm for QR Factorisation (Theorem # 1423)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We prove by induction on $k$ that the Gram-Schmidt algorithm produces vectors $q_1, \ldots, q_k$ that are orthonormal and that $\operatorname{span}(q_1, \ldots, q_k) = \operatorname{span}(a_1, \ldots, a_k)$. At each inductive step, the residual $d_k = a_k - \sum_{i=1}^{k-1}(q_i^\top a_k)q_i$ is orthogonal to $q_1, \ldots, q_{k-1}$ because it is the projection of $a_k$ onto the orthogonal complement of $\operatorname{span}(q_1, \ldots, q_{k-1})$. Normalising $d_k$ (or choosing an arbitrary orthogonal unit vector when $d_k = \mathbf{0}$) extends the orthonormal system. The factorisation $A = QR$ with $R$ upper triangular then follows from the construction of the entries $R_{ik}$. [/proofplan] [step:Establish the base case $k = 1$] At $k = 1$, the algorithm sets: - If $a_1 \neq \mathbf{0}$: $q_1 = a_1 / \|a_1\|$ and $R_{11} = \|a_1\|$. Then $\|q_1\| = 1$ and $a_1 = R_{11} q_1$, so $\operatorname{span}(q_1) = \operatorname{span}(a_1)$. - If $a_1 = \mathbf{0}$: $R_{11} = 0$ and $q_1$ is an arbitrary unit vector. Then $\|q_1\| = 1$ and $a_1 = \mathbf{0} = 0 \cdot q_1 = R_{11}q_1$. Moreover, $\operatorname{span}(a_1) = \{\mathbf{0}\} \subset \operatorname{span}(q_1)$. In both cases, the set $\{q_1\}$ is orthonormal (a single unit vector) and $a_1 = R_{11}q_1$. [/step] [step:State the inductive hypothesis] Assume that after step $k - 1$, the vectors $q_1, \ldots, q_{k-1}$ satisfy: 1. **Orthonormality:** $q_i^\top q_j = \delta_{ij}$ for all $i, j \in \{1, \ldots, k-1\}$. 2. **Span preservation:** $\operatorname{span}(q_1, \ldots, q_j) \supseteq \operatorname{span}(a_1, \ldots, a_j)$ for each $j = 1, \ldots, k-1$. 3. **Column reconstruction:** For each $j = 1, \ldots, k-1$, we have $a_j = \sum_{i=1}^{j} R_{ij} q_i$ (with $R_{ij} = q_i^\top a_j$ for $i < j$ and $R_{jj} = \|d_j\|$). [/step] [step:Verify orthogonality of $d_k$ to the existing basis vectors] The residual at step $k$ is \begin{align*} d_k = a_k - \sum_{i=1}^{k-1} (q_i^\top a_k)\, q_i. \end{align*} For any $\ell \in \{1, \ldots, k-1\}$, we compute \begin{align*} q_\ell^\top d_k = q_\ell^\top a_k - \sum_{i=1}^{k-1} (q_i^\top a_k)(q_\ell^\top q_i). \end{align*} By the inductive hypothesis (orthonormality), $q_\ell^\top q_i = \delta_{\ell i}$, so the sum collapses to a single term: \begin{align*} q_\ell^\top d_k = q_\ell^\top a_k - (q_\ell^\top a_k) \cdot 1 = 0. \end{align*} Therefore $d_k \perp q_\ell$ for all $\ell = 1, \ldots, k-1$. [guided] The residual $d_k$ is constructed by subtracting from $a_k$ its orthogonal projection onto $\operatorname{span}(q_1, \ldots, q_{k-1})$. The projection of a vector $v$ onto an orthonormal set $\{q_1, \ldots, q_{k-1}\}$ is $\sum_{i=1}^{k-1} (q_i^\top v) q_i$, so the residual $d_k = a_k - \sum_{i=1}^{k-1} (q_i^\top a_k) q_i$ is the component of $a_k$ orthogonal to this subspace. To verify orthogonality rigorously: take the inner product of $d_k$ with any $q_\ell$ ($\ell \le k-1$): \begin{align*} q_\ell^\top d_k &= q_\ell^\top a_k - \sum_{i=1}^{k-1} (q_i^\top a_k) \underbrace{q_\ell^\top q_i}_{= \delta_{\ell i}} = q_\ell^\top a_k - q_\ell^\top a_k = 0. \end{align*} The orthonormality of $\{q_1, \ldots, q_{k-1}\}$ is consumed here: without it, the Kronecker delta replacement $q_\ell^\top q_i = \delta_{\ell i}$ would fail, and the sum would not telescope. [/guided] [/step] [step:Normalise $d_k$ and verify orthonormality of the extended set] **Case 1: $d_k \neq \mathbf{0}$.** Set $q_k = d_k / \|d_k\|$ and $R_{kk} = \|d_k\|$. Then $\|q_k\| = 1$, and for each $\ell \in \{1, \ldots, k-1\}$: \begin{align*} q_\ell^\top q_k = \frac{q_\ell^\top d_k}{\|d_k\|} = \frac{0}{\|d_k\|} = 0. \end{align*} Combined with the inductive hypothesis $q_i^\top q_j = \delta_{ij}$ for $i, j \le k-1$, the extended set $\{q_1, \ldots, q_k\}$ is orthonormal. **Case 2: $d_k = \mathbf{0}$.** Set $R_{kk} = 0$ and choose $q_k$ to be any unit vector orthogonal to $q_1, \ldots, q_{k-1}$. Such a vector exists because $k - 1 < m$ (the ambient space $\mathbb{R}^m$ has dimension $m \ge n \ge k$), so the orthogonal complement of $\operatorname{span}(q_1, \ldots, q_{k-1})$ is nontrivial. By construction, $\|q_k\| = 1$ and $q_\ell^\top q_k = 0$ for $\ell = 1, \ldots, k-1$, so $\{q_1, \ldots, q_k\}$ is orthonormal. [/step] [step:Verify span preservation at step $k$] We show $\operatorname{span}(q_1, \ldots, q_k) \supseteq \operatorname{span}(a_1, \ldots, a_k)$. By the inductive hypothesis, $\operatorname{span}(q_1, \ldots, q_{k-1}) \supseteq \operatorname{span}(a_1, \ldots, a_{k-1})$, so it suffices to show $a_k \in \operatorname{span}(q_1, \ldots, q_k)$. From the definition of $d_k$: \begin{align*} a_k = \sum_{i=1}^{k-1} (q_i^\top a_k)\, q_i + d_k. \end{align*} **Case 1: $d_k \neq \mathbf{0}$.** Then $d_k = R_{kk}\, q_k$ with $R_{kk} = \|d_k\| > 0$. Substituting: \begin{align*} a_k = \sum_{i=1}^{k-1} R_{ik}\, q_i + R_{kk}\, q_k = \sum_{i=1}^{k} R_{ik}\, q_i, \end{align*} where $R_{ik} = q_i^\top a_k$ for $i < k$. This lies in $\operatorname{span}(q_1, \ldots, q_k)$. **Case 2: $d_k = \mathbf{0}$.** Then $a_k = \sum_{i=1}^{k-1} (q_i^\top a_k)\, q_i \in \operatorname{span}(q_1, \ldots, q_{k-1}) \subset \operatorname{span}(q_1, \ldots, q_k)$. In this case $R_{kk} = 0$, so $a_k = \sum_{i=1}^{k} R_{ik}\, q_i$ still holds (the $k$-th term is zero). In both cases, $a_k \in \operatorname{span}(q_1, \ldots, q_k)$ and the column reconstruction formula $a_k = \sum_{i=1}^{k} R_{ik}\, q_i$ is established. [guided] The span preservation property has two directions. The inclusion $\operatorname{span}(a_1, \ldots, a_k) \subseteq \operatorname{span}(q_1, \ldots, q_k)$ is what we just proved. Does the reverse inclusion hold? Not necessarily. When $d_k = \mathbf{0}$, the vector $a_k$ already lies in $\operatorname{span}(q_1, \ldots, q_{k-1})$, so adding $a_k$ does not enlarge the span. But the algorithm chooses $q_k$ orthogonal to $q_1, \ldots, q_{k-1}$, introducing a genuinely new direction. Thus $\operatorname{span}(q_1, \ldots, q_k)$ may be strictly larger than $\operatorname{span}(a_1, \ldots, a_k)$ when columns of $A$ are linearly dependent. This is not a defect of the algorithm: the purpose of the $d_k = \mathbf{0}$ branch is to ensure that $Q$ always has orthonormal columns, even when $A$ is rank-deficient. The column $q_k$ is a "filler" that completes the orthonormal system without affecting the factorisation $A = QR$ (because $R_{kk} = 0$ zeroes out the contribution of $q_k$ in the product $QR$). [/guided] [/step] [step:Assemble the factorisation $A = QR$ with $R$ upper triangular] After completing all $n$ steps, define the matrices: \begin{align*} Q := \begin{pmatrix} q_1 & q_2 & \cdots & q_n \end{pmatrix} \in \mathbb{R}^{m \times n}, \qquad R := (R_{ik}) \in \mathbb{R}^{n \times n}. \end{align*} The column reconstruction formula from the inductive argument gives, for each $k = 1, \ldots, n$: \begin{align*} a_k = \sum_{i=1}^{k} R_{ik}\, q_i. \end{align*} This means $R_{ik} = 0$ for $i > k$ (since the sum only runs to $i = k$), so $R$ is upper triangular. In matrix form, the equation $a_k = Q r_k$ (where $r_k$ is the $k$-th column of $R$) for each $k = 1, \ldots, n$ assembles into \begin{align*} A = QR. \end{align*} The columns of $Q$ form an orthonormal set by the induction: $Q^\top Q = I_n$. The matrix $R$ is upper triangular with diagonal entries $R_{kk} = \|d_k\| \ge 0$ and off-diagonal entries $R_{ik} = q_i^\top a_k$ for $i < k$. [guided] Let us verify the matrix equation $A = QR$ entry by entry. The $(p, k)$ entry of $QR$ is \begin{align*} (QR)_{pk} = \sum_{i=1}^{n} Q_{pi} R_{ik} = \sum_{i=1}^{n} (q_i)_p \, R_{ik}. \end{align*} Since $R$ is upper triangular, $R_{ik} = 0$ for $i > k$, so the sum truncates to $i = 1, \ldots, k$: \begin{align*} (QR)_{pk} = \sum_{i=1}^{k} (q_i)_p \, R_{ik} = \left(\sum_{i=1}^{k} R_{ik}\, q_i\right)_p = (a_k)_p = A_{pk}, \end{align*} which is precisely the column reconstruction formula. This confirms $A = QR$. The upper triangularity of $R$ is a consequence of the sequential nature of the algorithm: when processing column $a_k$, only the previously constructed vectors $q_1, \ldots, q_k$ are used, so $a_k$ is expressed as a linear combination of $q_1, \ldots, q_k$ with no contribution from $q_{k+1}, \ldots, q_n$. This forces $R_{ik} = 0$ for $i > k$. [/guided] [/step]

Explore Further

Optimal Step Size via Exact Line Search Numerical Analysis von Neumann Stability Criterion Numerical Analysis Finite Termination of Conjugate Direction Methods Numerical Analysis Spectral Radius Criterion Numerical Analysis Inverse DFT Numerical Analysis Strict Diagonal Dominance Implies Nonsingularity Numerical Analysis Error Propagation Numerical Analysis QR Iteration Preserves Tridiagonal Structure Numerical Analysis

What brings you to Androma?

Start with a route through the knowledge graph.

Gram–Schmidt Algorithm for QR Factorisation (Theorem # 1423)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Gram–Schmidt Algorithm for QR Factorisation (Theorem # 1423)

Discussion

Proof

Explore Further