[proofplan]
We proceed by induction on $k$. The base case $k = 0$ is immediate. For the inductive step, we multiply the factorisation $A^k X^{(0)} = X^{(k)} R^{(k)}$ on the left by $A$, then use the QR step of the simultaneous iteration algorithm ($AX^{(k)} = X^{(k+1)}\hat{R}$) to express $A^{k+1}X^{(0)}$ as the product of an orthonormal matrix $X^{(k+1)}$ and an upper triangular matrix $\hat{R}\,R^{(k)}$.
[/proofplan]
[step:Verify the base case $k = 0$]
At $k = 0$:
\begin{align*}
A^0 X^{(0)} = I \cdot X^{(0)} = X^{(0)} \cdot I.
\end{align*}
Since $X^{(0)}$ has orthonormal columns (by assumption of the simultaneous iteration algorithm) and $I$ is upper triangular, this is a valid QR factorisation with $R^{(0)} = I$.
[guided]
The base case is immediate but worth stating precisely. The claim is that $A^0 X^{(0)} = X^{(0)} R^{(0)}$ with $X^{(0)}$ having orthonormal columns and $R^{(0)}$ upper triangular. Taking $R^{(0)} = I$ (the identity matrix), we have $A^0 X^{(0)} = X^{(0)} \cdot I = X^{(0)}$, which is trivially true. The identity matrix $I$ is upper triangular (every entry below the diagonal is zero), and $X^{(0)}$ has orthonormal columns by the initialisation assumption of the algorithm.
[/guided]
[/step]
[step:State the inductive hypothesis]
Assume that for some $k \geq 0$, there exists an upper triangular matrix $R^{(k)}$ such that
\begin{align*}
A^k X^{(0)} = X^{(k)} R^{(k)},
\end{align*}
where $X^{(k)}$ has orthonormal columns (as produced by the simultaneous iteration algorithm at step $k$).
[/step]
[step:Establish the inductive step by left-multiplying by $A$ and applying the QR step of the algorithm]
Left-multiplying the inductive hypothesis by $A$:
\begin{align*}
A^{k+1} X^{(0)} = A(A^k X^{(0)}) = A X^{(k)} R^{(k)}.
\end{align*}
By definition of the simultaneous iteration algorithm, the $(k+1)$-th step computes the QR factorisation $AX^{(k)} = X^{(k+1)}\hat{R}$, where $X^{(k+1)}$ has orthonormal columns and $\hat{R}$ is upper triangular. Substituting:
\begin{align*}
A^{k+1} X^{(0)} = X^{(k+1)}\hat{R}\,R^{(k)}.
\end{align*}
[guided]
The inductive step hinges on the observation that left-multiplying the factorisation $A^k X^{(0)} = X^{(k)} R^{(k)}$ by $A$ produces $A^{k+1} X^{(0)} = (AX^{(k)}) R^{(k)}$. The simultaneous iteration algorithm defines its $(k+1)$-th step by computing the QR factorisation of $AX^{(k)}$, so we can replace $AX^{(k)}$ with $X^{(k+1)}\hat{R}$.
This substitution is the heart of the proof: the algorithm's QR step at each iteration peels off one orthogonal factor and accumulates it into the running product, while the triangular factors stack up on the right.
[/guided]
[/step]
[step:Verify that $R^{(k+1)} = \hat{R}\,R^{(k)}$ is upper triangular and complete the induction]
Define $R^{(k+1)} := \hat{R}\,R^{(k)}$. Both $\hat{R}$ and $R^{(k)}$ are upper triangular. The product of two upper triangular matrices is upper triangular: for $i > j$,
\begin{align*}
(R^{(k+1)})_{ij} = \sum_{\ell=1}^{n} \hat{R}_{i\ell} \, R^{(k)}_{\ell j} = \sum_{\ell=i}^{n} \hat{R}_{i\ell} \, R^{(k)}_{\ell j} = 0,
\end{align*}
since $\ell \geq i > j$ forces $R^{(k)}_{\ell j} = 0$ (as $R^{(k)}$ is upper triangular and $\ell > j$). Therefore $R^{(k+1)}$ is upper triangular, and
\begin{align*}
A^{k+1}X^{(0)} = X^{(k+1)}R^{(k+1)}
\end{align*}
is a valid QR factorisation, completing the induction.
[guided]
The only algebraic fact beyond the algorithm's definition is that upper triangular matrices are closed under multiplication. To see this directly: if $U$ and $V$ are upper triangular, then $(UV)_{ij} = \sum_{\ell} U_{i\ell} V_{\ell j}$. For $i > j$, the term $U_{i\ell}$ is zero when $\ell < i$ (since $U$ is upper triangular), so only $\ell \geq i$ contributes. But when $\ell \geq i > j$, the term $V_{\ell j}$ is zero (since $V$ is upper triangular and $\ell > j$). Hence $(UV)_{ij} = 0$ for $i > j$.
This result has a useful interpretation: simultaneous iteration computes the QR factorisation of $A^k X^{(0)}$ incrementally, one factor at a time, without ever forming the potentially ill-conditioned matrix $A^k$ explicitly. Each step of the algorithm peels off one orthogonal factor and accumulates it into $X^{(k)}$, while the triangular factor $R^{(k)}$ grows by one multiplicative update $R^{(k+1)} = \hat{R}\,R^{(k)}$.
[/guided]
[/step]