[proofplan]
We prove the equality of the two output lists by induction on the index $k$. The induction hypothesis says not only that the previously produced orthonormal vectors agree, but also that they span the same subspace as the previously processed input vectors. At the $k$th stage, the modified algorithm subtracts the same [orthogonal projection](/theorems/437) components as the classical formula, one component at a time. [Linear independence](/page/Linear%20Independence) guarantees that the common residual is nonzero, so the shared normalization rule gives the same normalized vector in both algorithms.
[/proofplan]
[step:Initialize the first residual in both algorithms]
For $k=1$, both algorithms have no previous orthonormal vectors. Thus the empty sum in the classical formula is zero, and the modified algorithm starts with $w_{1,0}=v_1$. Hence
\begin{align*}
u_{C,1}=v_1=u_{M,1}.
\end{align*}
Because $(v_1,\ldots,v_m)$ is linearly independent, $v_1\neq 0$. Therefore $u_{C,1}$ and $u_{M,1}$ are nonzero, and both algorithms normalize by the same positive scalar $|v_1|$. Consequently
\begin{align*}
e_{C,1}=\frac{v_1}{|v_1|}=e_{M,1}.
\end{align*}
The list $(e_{C,1})$ is orthonormal, and
\begin{align*}
\operatorname{span}\{e_{C,1}\}=\operatorname{span}\{v_1\}.
\end{align*}
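As a concrete check of the base case, assume purely for illustration (this choice is not part of the statement) that $V=\mathbb{R}^3$ with the standard dot product and that $v_1=(1,1,0)$. Then $|v_1|=\sqrt{2}$, and both algorithms return
\begin{align*}
e_{C,1}=e_{M,1}=\frac{1}{\sqrt{2}}(1,1,0).
\end{align*}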
[/step]
[step:Assume the previous Gram-Schmidt vectors agree and span the previous inputs]
Fix an index $k$ with $2\le k\le m$. Assume that, for every $1\le i<k$, the vectors produced by the two algorithms agree:
\begin{align*}
e_{C,i}=e_{M,i}.
\end{align*}
Let $e_i$ denote this common vector for $1\le i<k$. Assume also that $(e_1,\ldots,e_{k-1})$ is orthonormal and that
\begin{align*}
\operatorname{span}\{e_1,\ldots,e_{k-1}\}=\operatorname{span}\{v_1,\ldots,v_{k-1}\}.
\end{align*}
We prove the same conclusions at index $k$.
[/step]
[step:Compute the modified working vector after each projection]
We claim that, for every integer $j$ with $0\le j\le k-1$, the modified working vector satisfies
\begin{align*}
w_{k,j}=v_k-\sum_{i=1}^{j}(v_k,e_i)_V e_i.
\end{align*}
For $j=0$, this is the definition $w_{k,0}=v_k$, with the empty sum equal to zero.
Now let $j$ satisfy $1\le j\le k-1$, and assume the formula holds for $j-1$. Since $(e_1,\ldots,e_{k-1})$ is orthonormal, we have $(e_i,e_j)_V=0$ for $i<j$ and $(e_j,e_j)_V=1$. Using linearity of the [inner product](/page/Inner%20Product) in the first argument, the coefficient subtracted by the modified algorithm is
\begin{align*}
(w_{k,j-1},e_j)_V=(v_k,e_j)_V.
\end{align*}
Indeed, expanding $w_{k,j-1}$ by the formula for $j-1$, the terms involving $e_i$ with $i<j$ vanish by orthogonality. Substituting this coefficient into the modified update $w_{k,j}=w_{k,j-1}-(w_{k,j-1},e_j)_V e_j$ gives
\begin{align*}
w_{k,j}=w_{k,j-1}-(v_k,e_j)_V e_j.
\end{align*}
Using the induction formula for $w_{k,j-1}$ yields
\begin{align*}
w_{k,j}=v_k-\sum_{i=1}^{j}(v_k,e_i)_V e_i.
\end{align*}
Thus the formula holds for every $0\le j\le k-1$ by finite induction.
[guided]
The purpose of this step is to show that the modified algorithm has not changed the mathematical residual; it has only changed the order in which the projection components are removed. We prove the precise formula
\begin{align*}
w_{k,j}=v_k-\sum_{i=1}^{j}(v_k,e_i)_V e_i
\end{align*}
for each $0\le j\le k-1$.
When $j=0$, the statement says $w_{k,0}=v_k$, which is exactly the definition of the initial modified working vector. Now suppose the formula has been proved for some $j-1$, where $1\le j\le k-1$. The modified algorithm computes the next coefficient from the current working vector:
\begin{align*}
(w_{k,j-1},e_j)_V.
\end{align*}
By the induction formula,
\begin{align*}
w_{k,j-1}=v_k-\sum_{i=1}^{j-1}(v_k,e_i)_V e_i.
\end{align*}
Taking the inner product with $e_j$ and using linearity in the first argument gives
\begin{align*}
(w_{k,j-1},e_j)_V=(v_k,e_j)_V-\sum_{i=1}^{j-1}(v_k,e_i)_V(e_i,e_j)_V.
\end{align*}
Because $(e_1,\ldots,e_{k-1})$ is orthonormal, every factor $(e_i,e_j)_V$ with $i<j$ is zero. Therefore
\begin{align*}
(w_{k,j-1},e_j)_V=(v_k,e_j)_V.
\end{align*}
This is the key point: although modified Gram-Schmidt computes the coefficient from the updated vector $w_{k,j-1}$, exact orthogonality of the previously removed directions makes that coefficient equal to the classical coefficient.
Substituting this equality into the modified update,
\begin{align*}
w_{k,j}=w_{k,j-1}-(w_{k,j-1},e_j)_V e_j,
\end{align*}
we obtain
\begin{align*}
w_{k,j}=w_{k,j-1}-(v_k,e_j)_V e_j.
\end{align*}
Finally, replacing $w_{k,j-1}$ by its induction formula gives
\begin{align*}
w_{k,j}=v_k-\sum_{i=1}^{j-1}(v_k,e_i)_V e_i-(v_k,e_j)_V e_j.
\end{align*}
Combining the two displayed subtraction terms into a single finite sum gives
\begin{align*}
w_{k,j}=v_k-\sum_{i=1}^{j}(v_k,e_i)_V e_i.
\end{align*}
This completes the induction over $j$.
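To see the coefficient identity in a concrete case, assume for illustration that $V=\mathbb{R}^3$ with the standard dot product, that $v_3=(0,1,1)$, and that the earlier stages produced $e_1=\frac{1}{\sqrt{2}}(1,1,0)$ and $e_2=\frac{1}{\sqrt{6}}(1,-1,2)$. These are the vectors obtained from the hypothetical inputs $v_1=(1,1,0)$ and $v_2=(1,0,1)$; none of this data is part of the statement. The modified updates are
\begin{align*}
w_{3,0}&=(0,1,1),\\
w_{3,1}&=w_{3,0}-(w_{3,0},e_1)_V e_1=(0,1,1)-\tfrac{1}{2}(1,1,0)=\left(-\tfrac{1}{2},\tfrac{1}{2},1\right),\\
w_{3,2}&=w_{3,1}-(w_{3,1},e_2)_V e_2=\left(-\tfrac{1}{2},\tfrac{1}{2},1\right)-\tfrac{1}{6}(1,-1,2)=\left(-\tfrac{2}{3},\tfrac{2}{3},\tfrac{2}{3}\right).
\end{align*}
The second coefficient is computed from the updated vector $w_{3,1}$, yet
\begin{align*}
(w_{3,1},e_2)_V=\frac{1}{\sqrt{6}}\left(-\tfrac{1}{2}-\tfrac{1}{2}+2\right)=\frac{1}{\sqrt{6}}=(v_3,e_2)_V,
\end{align*}
in agreement with the induction.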
[/guided]
[/step]
[step:Identify the classical and modified residuals]
Taking $j=k-1$ in the formula from the previous step gives
\begin{align*}
u_{M,k}=w_{k,k-1}=v_k-\sum_{i=1}^{k-1}(v_k,e_i)_V e_i.
\end{align*}
Since $e_i=e_{C,i}$ for every $1\le i<k$, the classical residual is
\begin{align*}
u_{C,k}=v_k-\sum_{i=1}^{k-1}(v_k,e_i)_V e_i.
\end{align*}
Therefore
\begin{align*}
u_{C,k}=u_{M,k}.
\end{align*}
Let $u_k$ denote this common residual.
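For a small numerical instance of this identity, assume for illustration that $V=\mathbb{R}^3$ with the standard dot product, that $v_3=(0,1,1)$, and that the earlier common vectors are $e_1=\frac{1}{\sqrt{2}}(1,1,0)$ and $e_2=\frac{1}{\sqrt{6}}(1,-1,2)$ (hypothetical data, not part of the statement). Then $(v_3,e_1)_V e_1=\tfrac{1}{2}(1,1,0)$ and $(v_3,e_2)_V e_2=\tfrac{1}{6}(1,-1,2)$, and the classical formula subtracts both projections from $v_3$ at once:
\begin{align*}
u_{C,3}=(0,1,1)-\tfrac{1}{2}(1,1,0)-\tfrac{1}{6}(1,-1,2)=\left(-\tfrac{2}{3},\tfrac{2}{3},\tfrac{2}{3}\right).
\end{align*}
Removing the same two components one after the other, as the modified algorithm does in exact arithmetic, ends at the same vector, so $u_{M,3}=u_{C,3}$ in this instance.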
[/step]
[step:Use linear independence to rule out a zero residual]
Suppose, for contradiction, that $u_k=0$. From the residual formula,
\begin{align*}
v_k=\sum_{i=1}^{k-1}(v_k,e_i)_V e_i.
\end{align*}
Thus $v_k\in \operatorname{span}\{e_1,\ldots,e_{k-1}\}$. By the induction hypothesis on spans,
\begin{align*}
v_k\in \operatorname{span}\{v_1,\ldots,v_{k-1}\}.
\end{align*}
This contradicts the linear independence of $(v_1,\ldots,v_m)$. Hence $u_k\neq 0$.
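The independence hypothesis cannot be dropped. As a hypothetical counterexample, assume $V=\mathbb{R}^3$ with the standard dot product and take the dependent inputs $v_1=(1,1,0)$ and $v_2=(2,2,0)$. Then $e_1=\frac{1}{\sqrt{2}}(1,1,0)$ and $(v_2,e_1)_V=2\sqrt{2}$, so
\begin{align*}
u_2=(2,2,0)-2\sqrt{2}\cdot\frac{1}{\sqrt{2}}(1,1,0)=(0,0,0),
\end{align*}
and neither algorithm could normalize at stage $2$.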
[/step]
[step:Normalize the common residual and update the induction]
Since $u_{C,k}=u_{M,k}=u_k$ and $u_k\neq 0$, both algorithms divide by the same positive scalar $|u_k|$. Hence
\begin{align*}
e_{C,k}=\frac{u_k}{|u_k|}=e_{M,k}.
\end{align*}
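Write $e_k$ for this common vector. Moreover, $u_k$ is orthogonal to each $e_j$ with $1\le j<k$: by orthonormality of $(e_1,\ldots,e_{k-1})$, only the term $i=j$ of the sum survives, so
\begin{align*}
(u_k,e_j)_V=(v_k,e_j)_V-\sum_{i=1}^{k-1}(v_k,e_i)_V(e_i,e_j)_V=(v_k,e_j)_V-(v_k,e_j)_V=0.
\end{align*}
Since $e_k$ is a positive scalar multiple of $u_k$, it is also orthogonal to each such $e_j$, and $|e_k|=1$ by construction. Therefore $(e_1,\ldots,e_k)$ is orthonormal.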
Finally, since
\begin{align*}
u_k=v_k-\sum_{i=1}^{k-1}(v_k,e_i)_V e_i,
\end{align*}
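we have $u_k\in \operatorname{span}\{v_1,\ldots,v_k\}$, because each $e_i$ with $i<k$ lies in $\operatorname{span}\{v_1,\ldots,v_{k-1}\}$, and $v_k\in \operatorname{span}\{e_1,\ldots,e_k\}$, because $u_k=|u_k|e_k$. Combining these inclusions with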
\begin{align*}
\operatorname{span}\{e_1,\ldots,e_{k-1}\}=\operatorname{span}\{v_1,\ldots,v_{k-1}\}
\end{align*}
gives
\begin{align*}
\operatorname{span}\{e_1,\ldots,e_k\}=\operatorname{span}\{v_1,\ldots,v_k\}.
\end{align*}
Thus the induction hypotheses are established at index $k$.
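As an illustration of the normalization and span update at $k=2$, assume (hypothetically, outside the statement) that $V=\mathbb{R}^3$ with the standard dot product, $v_1=(1,1,0)$, and $v_2=(1,0,1)$, so that $e_1=\frac{1}{\sqrt{2}}(1,1,0)$. Then
\begin{align*}
u_2&=(1,0,1)-\tfrac{1}{2}(1,1,0)=\left(\tfrac{1}{2},-\tfrac{1}{2},1\right),\\
|u_2|&=\tfrac{\sqrt{6}}{2},\\
e_2&=\frac{u_2}{|u_2|}=\frac{1}{\sqrt{6}}(1,-1,2).
\end{align*}
Here $(u_2,e_1)_V=\frac{1}{\sqrt{2}}\left(\tfrac{1}{2}-\tfrac{1}{2}\right)=0$, and solving the residual formula for $v_2$ gives
\begin{align*}
v_2=\frac{1}{\sqrt{2}}\,e_1+\frac{\sqrt{6}}{2}\,e_2,
\end{align*}
which exhibits $v_2\in\operatorname{span}\{e_1,e_2\}$ explicitly.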
[/step]
[step:Conclude equality of the full orthonormal lists]
By induction on $k$, every residual used by either algorithm is nonzero, and for every $1\le k\le m$ the two algorithms produce the same normalized vector:
\begin{align*}
e_{C,k}=e_{M,k}.
\end{align*}
Therefore, in exact arithmetic and under the stated common inner product and normalization conventions, classical Gram-Schmidt and modified Gram-Schmidt produce the same orthonormal list.
[/step]