[proofplan]
We prove by induction on $k$ that the Gram-Schmidt algorithm produces vectors $q_1, \ldots, q_k$ that are orthonormal and that $\operatorname{span}(q_1, \ldots, q_k) = \operatorname{span}(a_1, \ldots, a_k)$. At each inductive step, the residual $d_k = a_k - \sum_{i=1}^{k-1}(q_i^\top a_k)q_i$ is orthogonal to $q_1, \ldots, q_{k-1}$ because it is the projection of $a_k$ onto the orthogonal complement of $\operatorname{span}(q_1, \ldots, q_{k-1})$. Normalising $d_k$ (or choosing an arbitrary orthogonal unit vector when $d_k = \mathbf{0}$) extends the orthonormal system. The factorisation $A = QR$ with $R$ upper triangular then follows from the construction of the entries $R_{ik}$.
[/proofplan]
[step:Establish the base case $k = 1$]
At $k = 1$, the algorithm sets:
- If $a_1 \neq \mathbf{0}$: $q_1 = a_1 / \|a_1\|$ and $R_{11} = \|a_1\|$. Then $\|q_1\| = 1$ and $a_1 = R_{11} q_1$, so $\operatorname{span}(q_1) = \operatorname{span}(a_1)$.
- If $a_1 = \mathbf{0}$: $R_{11} = 0$ and $q_1$ is an arbitrary unit vector. Then $\|q_1\| = 1$ and $a_1 = \mathbf{0} = 0 \cdot q_1 = R_{11}q_1$. Moreover, $\operatorname{span}(a_1) = \{\mathbf{0}\} \subset \operatorname{span}(q_1)$.
In both cases, the set $\{q_1\}$ is orthonormal (a single unit vector) and $a_1 = R_{11}q_1$.
[/step]
[step:State the inductive hypothesis]
Assume that after step $k - 1$, the vectors $q_1, \ldots, q_{k-1}$ satisfy:
1. **Orthonormality:** $q_i^\top q_j = \delta_{ij}$ for all $i, j \in \{1, \ldots, k-1\}$.
2. **Span preservation:** $\operatorname{span}(q_1, \ldots, q_j) \supseteq \operatorname{span}(a_1, \ldots, a_j)$ for each $j = 1, \ldots, k-1$.
3. **Column reconstruction:** For each $j = 1, \ldots, k-1$, we have $a_j = \sum_{i=1}^{j} R_{ij} q_i$ (with $R_{ij} = q_i^\top a_j$ for $i < j$ and $R_{jj} = \|d_j\|$).
[/step]
[step:Verify orthogonality of $d_k$ to the existing basis vectors]
The residual at step $k$ is
\begin{align*}
d_k = a_k - \sum_{i=1}^{k-1} (q_i^\top a_k)\, q_i.
\end{align*}
For any $\ell \in \{1, \ldots, k-1\}$, we compute
\begin{align*}
q_\ell^\top d_k = q_\ell^\top a_k - \sum_{i=1}^{k-1} (q_i^\top a_k)(q_\ell^\top q_i).
\end{align*}
By the inductive hypothesis (orthonormality), $q_\ell^\top q_i = \delta_{\ell i}$, so the sum collapses to a single term:
\begin{align*}
q_\ell^\top d_k = q_\ell^\top a_k - (q_\ell^\top a_k) \cdot 1 = 0.
\end{align*}
Therefore $d_k \perp q_\ell$ for all $\ell = 1, \ldots, k-1$.
[guided]
The residual $d_k$ is constructed by subtracting from $a_k$ its orthogonal projection onto $\operatorname{span}(q_1, \ldots, q_{k-1})$. The projection of a vector $v$ onto an orthonormal set $\{q_1, \ldots, q_{k-1}\}$ is $\sum_{i=1}^{k-1} (q_i^\top v) q_i$, so the residual $d_k = a_k - \sum_{i=1}^{k-1} (q_i^\top a_k) q_i$ is the component of $a_k$ orthogonal to this subspace.
To verify orthogonality rigorously: take the inner product of $d_k$ with any $q_\ell$ ($\ell \le k-1$):
\begin{align*}
q_\ell^\top d_k &= q_\ell^\top a_k - \sum_{i=1}^{k-1} (q_i^\top a_k) \underbrace{q_\ell^\top q_i}_{= \delta_{\ell i}} = q_\ell^\top a_k - q_\ell^\top a_k = 0.
\end{align*}
The orthonormality of $\{q_1, \ldots, q_{k-1}\}$ is consumed here: without it, the Kronecker delta replacement $q_\ell^\top q_i = \delta_{\ell i}$ would fail, and the sum would not telescope.
[/guided]
[/step]
[step:Normalise $d_k$ and verify orthonormality of the extended set]
**Case 1: $d_k \neq \mathbf{0}$.** Set $q_k = d_k / \|d_k\|$ and $R_{kk} = \|d_k\|$. Then $\|q_k\| = 1$, and for each $\ell \in \{1, \ldots, k-1\}$:
\begin{align*}
q_\ell^\top q_k = \frac{q_\ell^\top d_k}{\|d_k\|} = \frac{0}{\|d_k\|} = 0.
\end{align*}
Combined with the inductive hypothesis $q_i^\top q_j = \delta_{ij}$ for $i, j \le k-1$, the extended set $\{q_1, \ldots, q_k\}$ is orthonormal.
**Case 2: $d_k = \mathbf{0}$.** Set $R_{kk} = 0$ and choose $q_k$ to be any unit vector orthogonal to $q_1, \ldots, q_{k-1}$. Such a vector exists because $k - 1 < m$ (the ambient space $\mathbb{R}^m$ has dimension $m \ge n \ge k$), so the orthogonal complement of $\operatorname{span}(q_1, \ldots, q_{k-1})$ is nontrivial. By construction, $\|q_k\| = 1$ and $q_\ell^\top q_k = 0$ for $\ell = 1, \ldots, k-1$, so $\{q_1, \ldots, q_k\}$ is orthonormal.
[/step]
[step:Verify span preservation at step $k$]
We show $\operatorname{span}(q_1, \ldots, q_k) \supseteq \operatorname{span}(a_1, \ldots, a_k)$.
By the inductive hypothesis, $\operatorname{span}(q_1, \ldots, q_{k-1}) \supseteq \operatorname{span}(a_1, \ldots, a_{k-1})$, so it suffices to show $a_k \in \operatorname{span}(q_1, \ldots, q_k)$.
From the definition of $d_k$:
\begin{align*}
a_k = \sum_{i=1}^{k-1} (q_i^\top a_k)\, q_i + d_k.
\end{align*}
**Case 1: $d_k \neq \mathbf{0}$.** Then $d_k = R_{kk}\, q_k$ with $R_{kk} = \|d_k\| > 0$. Substituting:
\begin{align*}
a_k = \sum_{i=1}^{k-1} R_{ik}\, q_i + R_{kk}\, q_k = \sum_{i=1}^{k} R_{ik}\, q_i,
\end{align*}
where $R_{ik} = q_i^\top a_k$ for $i < k$. This lies in $\operatorname{span}(q_1, \ldots, q_k)$.
**Case 2: $d_k = \mathbf{0}$.** Then $a_k = \sum_{i=1}^{k-1} (q_i^\top a_k)\, q_i \in \operatorname{span}(q_1, \ldots, q_{k-1}) \subset \operatorname{span}(q_1, \ldots, q_k)$. In this case $R_{kk} = 0$, so $a_k = \sum_{i=1}^{k} R_{ik}\, q_i$ still holds (the $k$-th term is zero).
In both cases, $a_k \in \operatorname{span}(q_1, \ldots, q_k)$ and the column reconstruction formula $a_k = \sum_{i=1}^{k} R_{ik}\, q_i$ is established.
[guided]
The span preservation property has two directions. The inclusion $\operatorname{span}(a_1, \ldots, a_k) \subseteq \operatorname{span}(q_1, \ldots, q_k)$ is what we just proved. Does the reverse inclusion hold?
Not necessarily. When $d_k = \mathbf{0}$, the vector $a_k$ already lies in $\operatorname{span}(q_1, \ldots, q_{k-1})$, so adding $a_k$ does not enlarge the span. But the algorithm chooses $q_k$ orthogonal to $q_1, \ldots, q_{k-1}$, introducing a genuinely new direction. Thus $\operatorname{span}(q_1, \ldots, q_k)$ may be strictly larger than $\operatorname{span}(a_1, \ldots, a_k)$ when columns of $A$ are linearly dependent.
This is not a defect of the algorithm: the purpose of the $d_k = \mathbf{0}$ branch is to ensure that $Q$ always has orthonormal columns, even when $A$ is rank-deficient. The column $q_k$ is a "filler" that completes the orthonormal system without affecting the factorisation $A = QR$ (because $R_{kk} = 0$ zeroes out the contribution of $q_k$ in the product $QR$).
[/guided]
[/step]
[step:Assemble the factorisation $A = QR$ with $R$ upper triangular]
After completing all $n$ steps, define the matrices:
\begin{align*}
Q := \begin{pmatrix} q_1 & q_2 & \cdots & q_n \end{pmatrix} \in \mathbb{R}^{m \times n}, \qquad R := (R_{ik}) \in \mathbb{R}^{n \times n}.
\end{align*}
The column reconstruction formula from the inductive argument gives, for each $k = 1, \ldots, n$:
\begin{align*}
a_k = \sum_{i=1}^{k} R_{ik}\, q_i.
\end{align*}
This means $R_{ik} = 0$ for $i > k$ (since the sum only runs to $i = k$), so $R$ is upper triangular. In matrix form, the equation $a_k = Q r_k$ (where $r_k$ is the $k$-th column of $R$) for each $k = 1, \ldots, n$ assembles into
\begin{align*}
A = QR.
\end{align*}
The columns of $Q$ form an orthonormal set by the induction: $Q^\top Q = I_n$. The matrix $R$ is upper triangular with diagonal entries $R_{kk} = \|d_k\| \ge 0$ and off-diagonal entries $R_{ik} = q_i^\top a_k$ for $i < k$.
[guided]
Let us verify the matrix equation $A = QR$ entry by entry. The $(p, k)$ entry of $QR$ is
\begin{align*}
(QR)_{pk} = \sum_{i=1}^{n} Q_{pi} R_{ik} = \sum_{i=1}^{n} (q_i)_p \, R_{ik}.
\end{align*}
Since $R$ is upper triangular, $R_{ik} = 0$ for $i > k$, so the sum truncates to $i = 1, \ldots, k$:
\begin{align*}
(QR)_{pk} = \sum_{i=1}^{k} (q_i)_p \, R_{ik} = \left(\sum_{i=1}^{k} R_{ik}\, q_i\right)_p = (a_k)_p = A_{pk},
\end{align*}
which is precisely the column reconstruction formula. This confirms $A = QR$.
The upper triangularity of $R$ is a consequence of the sequential nature of the algorithm: when processing column $a_k$, only the previously constructed vectors $q_1, \ldots, q_k$ are used, so $a_k$ is expressed as a linear combination of $q_1, \ldots, q_k$ with no contribution from $q_{k+1}, \ldots, q_n$. This forces $R_{ik} = 0$ for $i > k$.
[/guided]
[/step]