[proofplan]
We prove the column convergence (claim 1) by induction on the column index $i$. The base case $i = 1$ applies the [Technical Convergence Lemma](/theorems/1410) with $p = 1$ to show that $X^{(k)}_{(1)}$ converges to $\pm w_1$. The inductive step applies the lemma with $p = m+1$ to show the first $m+1$ columns lie asymptotically in $\operatorname{span}(w_1, \ldots, w_{m+1})$, then uses orthonormality of the columns and the induction hypothesis to pin down the $(m+1)$-th column as $\pm w_{m+1}$. Claim 2 follows immediately from the eigenequation.
[/proofplan]
[step:Base case: the first column converges to $\pm w_1$ via the technical lemma with $p = 1$]
Apply the [Technical Convergence Lemma for Simultaneous Iteration](/theorems/1410) with $p = 1$. The matrices are $W_1 = [w_1] \in \mathbb{R}^{n \times 1}$ and $V_1 = [w_2 \mid \cdots \mid w_n] \in \mathbb{R}^{n \times (n-1)}$. By hypothesis, $W_1^\top X^{(0)}_{(1)} = w_1^\top x^{(0)}_1 \in \mathbb{R}$ is nonzero (hence invertible as a $1 \times 1$ matrix). The lemma gives
\begin{align*}
\|V_1^\top X^{(k)}_{(1)}\|_2 \leq c_1 \left|\frac{\lambda_2}{\lambda_1}\right|^k \to 0 \quad \text{as } k \to \infty.
\end{align*}
Since $V_1^\top X^{(k)}_{(1)}$ collects the components of the unit vector $X^{(k)}_{(1)}$ along $w_2, \ldots, w_n$, and these components vanish in the limit, the distance from $X^{(k)}_{(1)}$ to $\operatorname{span}(w_1)$ tends to zero. Being a unit vector, $X^{(k)}_{(1)}$ must converge to $\pm w_1$.
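The last implication can be made quantitative. As a sketch, write $x^{(k)} := X^{(k)}_{(1)}$ (shorthand introduced only for this remark) and use that $w_1, \ldots, w_n$ form an orthonormal basis, so $W_1 W_1^\top + V_1 V_1^\top = I$:
\begin{align*}
1 = \|x^{(k)}\|_2^2 &= (w_1^\top x^{(k)})^2 + \|V_1^\top x^{(k)}\|_2^2, \\
\min_{s \in \{+1, -1\}} \|x^{(k)} - s\,w_1\|_2^2 &= 2 - 2\,|w_1^\top x^{(k)}| = 2 - 2\sqrt{1 - \|V_1^\top x^{(k)}\|_2^2} \leq 2\,\|V_1^\top x^{(k)}\|_2^2.
\end{align*}
The final inequality uses $\sqrt{u} \geq u$ for $u \in [0, 1]$. Under these assumptions the distance from $x^{(k)}$ to the nearer of $\pm w_1$ is at most $\sqrt{2}\,c_1\,|\lambda_2/\lambda_1|^k$.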
[/step]
[step:Inductive step: given convergence of the first $m$ columns, establish convergence of column $m + 1$]
Assume that for $i = 1, \ldots, m$ (with $m \leq p - 1$), the $i$-th column of $X^{(k)}$ converges to $\pm w_i$. Apply the [Technical Convergence Lemma](/theorems/1410) with the parameter $p$ set to $m + 1$:
\begin{align*}
W_{m+1} = [w_1 \mid \cdots \mid w_{m+1}], \qquad V_{m+1} = [w_{m+2} \mid \cdots \mid w_n].
\end{align*}
By hypothesis, $W_{m+1}^\top X^{(0)}_{(m+1)} \in \mathbb{R}^{(m+1) \times (m+1)}$ is invertible. The lemma gives
\begin{align*}
\|V_{m+1}^\top X^{(k)}_{(m+1)}\|_2 \leq c_{m+1} \left|\frac{\lambda_{m+2}}{\lambda_{m+1}}\right|^k \to 0.
\end{align*}
This means every column of $X^{(k)}_{(m+1)}$ has vanishing component in $\operatorname{span}(w_{m+2}, \ldots, w_n)$, so the first $m+1$ columns of $X^{(k)}$ converge into $\operatorname{span}(w_1, \ldots, w_{m+1})$.
By the induction hypothesis, columns $1, \ldots, m$ already converge to $\pm w_1, \ldots, \pm w_m$. Since the columns of $X^{(k)}$ are orthonormal at every iteration, the $(m+1)$-th column must be asymptotically orthogonal to $w_1, \ldots, w_m$ and lie in $\operatorname{span}(w_1, \ldots, w_{m+1})$. The only unit vectors in this subspace orthogonal to $w_1, \ldots, w_m$ are $\pm w_{m+1}$. Therefore the $(m+1)$-th column converges to $\pm w_{m+1}$.
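The orthogonality argument can be spelled out as follows. This is a sketch in which $x_j^{(k)}$ denotes the $j$-th column of $X^{(k)}$ and $s_j \in \{+1, -1\}$ is a sign (notation introduced here, possibly depending on $k$) chosen so that $\|x_j^{(k)} - s_j w_j\|_2 \to 0$. For $j \leq m$, orthonormality gives $(x_j^{(k)})^\top x_{m+1}^{(k)} = 0$, hence
\begin{align*}
|w_j^\top x_{m+1}^{(k)}| &= \bigl|(s_j w_j - x_j^{(k)})^\top x_{m+1}^{(k)}\bigr| \leq \|s_j w_j - x_j^{(k)}\|_2 \to 0, \\
1 = \|x_{m+1}^{(k)}\|_2^2 &= \sum_{j=1}^{m} (w_j^\top x_{m+1}^{(k)})^2 + (w_{m+1}^\top x_{m+1}^{(k)})^2 + \|V_{m+1}^\top x_{m+1}^{(k)}\|_2^2.
\end{align*}
The sum vanishes by the first line, and the last term vanishes because $\|V_{m+1}^\top x_{m+1}^{(k)}\|_2 \leq \|V_{m+1}^\top X^{(k)}_{(m+1)}\|_2 \to 0$. Consequently $|w_{m+1}^\top x_{m+1}^{(k)}| \to 1$, which is the convergence of column $m+1$ to $\pm w_{m+1}$.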
[guided]
The induction works because the technical lemma controls the *joint* behaviour of the first $m+1$ columns, while the induction hypothesis pins down columns $1$ through $m$ individually. The gap between these two pieces of information is exactly one column: column $m+1$. Orthonormality of the columns of $X^{(k)}$ bridges the gap: it forces the $(m+1)$-th column, which is converging into the $(m+1)$-dimensional eigenspace, to be asymptotically perpendicular to the directions already claimed by the first $m$ columns.
The assumption that all eigenvalue magnitudes are strictly separated ($|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|$) is consumed here: each application of the lemma requires a strict spectral gap $|\lambda_{m+1}| > |\lambda_{m+2}|$ to guarantee the rate $|\lambda_{m+2}/\lambda_{m+1}|^k \to 0$. If two eigenvalues had equal magnitude, the corresponding eigenvectors could mix indefinitely without converging individually.
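A small illustration of this failure mode, using a matrix chosen here for the purpose rather than taken from the theorem: let $A = \operatorname{diag}(1, -1)$, so $w_1 = e_1$, $w_2 = e_2$ and $|\lambda_1| = |\lambda_2| = 1$. Assume that for $p = 1$ the iteration reduces to the normalised power iteration $x^{(k+1)} = A x^{(k)} / \|A x^{(k)}\|_2$, and start from a unit vector $x^{(0)} = (a, b)^\top$ with $a, b \neq 0$. Since $A$ is orthogonal, the normalisation is trivial and
\begin{align*}
x^{(k)} = A^k x^{(0)} = \bigl(a,\ (-1)^k b\bigr)^\top, \qquad |w_1^\top x^{(k)}| = |a| < 1 \quad \text{for all } k.
\end{align*}
Up to the sign convention of the normalisation, the iterate alternates between two fixed vectors and never approaches $\pm w_1$.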
[/guided]
[/step]
[step:Conclude convergence of the projected matrix $(X^{(k)})^\top A X^{(k)}$ to $\operatorname{diag}(\lambda_1, \ldots, \lambda_p)$]
By the first claim, as $k \to \infty$:
\begin{align*}
X^{(k)} \to [\pm w_1 \mid \cdots \mid \pm w_p] =: \tilde{W},
\end{align*}
where the signs $\pm$ are eventually fixed (each column converges to one of the two unit vectors). Since the $w_i$ are orthonormal eigenvectors of $A$ with $A w_i = \lambda_i w_i$:
\begin{align*}
\bigl((X^{(k)})^\top A X^{(k)}\bigr)_{ij} = (x_i^{(k)})^\top A x_j^{(k)} \to (\pm w_i)^\top A(\pm w_j) = \lambda_j\,w_i^\top w_j = \lambda_j\,\delta_{ij}.
\end{align*}
Therefore $(X^{(k)})^\top A X^{(k)} \to \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$.
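Both claims can be checked numerically. The following is a minimal sketch, not the theorem's own formulation: it assumes the iteration takes the common form "multiply by $A$, then re-orthonormalise with a reduced QR factorisation". The function name `simultaneous_iteration`, the test matrix and the iteration count are all illustrative choices.

```python
import numpy as np

def simultaneous_iteration(A, X0, num_iters):
    """Assumed form of the iteration: X <- Q factor of the reduced QR of A @ X."""
    X, _ = np.linalg.qr(X0)            # orthonormalise the starting block
    for _ in range(num_iters):
        X, _ = np.linalg.qr(A @ X)     # columns stay orthonormal at every step
    return X

rng = np.random.default_rng(0)
n, p = 6, 3

# Symmetric test matrix with known orthonormal eigenvectors (columns of W)
# and strictly separated eigenvalue magnitudes.
W, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.array([6.0, 5.0, 4.0, 3.0, 2.0, 1.0])
A = W @ np.diag(lam) @ W.T

X0 = rng.standard_normal((n, p))       # generic start: leading blocks of W^T X0 are invertible
X = simultaneous_iteration(A, X0, 300)

# Claim 1: |w_i^T x_i| should approach 1 for each of the first p columns.
align = np.abs(np.sum(W[:, :p] * X, axis=0))

# Claim 2: the projected matrix should approach diag(lambda_1, ..., lambda_p).
proj = X.T @ A @ X

print("alignment:", align)
print("projected matrix:\n", np.round(proj, 6))
```

The alignment is measured through an absolute value because the sign of each column depends on the QR routine's conventions. The diagonal of the projected matrix is unaffected by these signs, since each sign enters twice.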
[/step]