[proofplan]
We use the spectral decomposition $AW = W\Lambda$ and $AV = V\Omega$ to express $V^\top X^{(k)}$ and $W^\top X^{(k)}$ in terms of the diagonal eigenvalue matrices. From the QR factorisation $A^k X^{(0)} = X^{(k)}R^{(k)}$, we obtain $V^\top X^{(k)} = \Omega^k(V^\top X^{(0)})(R^{(k)})^{-1}$. By eliminating $(R^{(k)})^{-1}$ using the companion equation for $W^\top X^{(k)}$ and bounding norms via sub-multiplicativity, we arrive at the stated decay rate $|\lambda_{p+1}/\lambda_p|^k$.
[/proofplan]
[step:Derive the identities $V^\top A^k = \Omega^k V^\top$ and $W^\top A^k = \Lambda^k W^\top$ from the spectral decomposition]
Since $A$ is symmetric with orthonormal eigenvectors $w_1, \ldots, w_n$, we have $Aw_i = \lambda_iw_i$. In matrix form:
\begin{align*}
AW = W\Lambda, \qquad AV = V\Omega,
\end{align*}
where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$ and $\Omega = \operatorname{diag}(\lambda_{p+1}, \ldots, \lambda_n)$. Since $W$ and $V$ have orthonormal columns and $[W \mid V]$ is orthogonal, we have $W^\top W = I_p$, $V^\top V = I_{n-p}$, and $W^\top V = 0$.
Taking transposes: $W^\top A = \Lambda W^\top$ and $V^\top A = \Omega V^\top$. Iterating:
\begin{align*}
W^\top A^k = \Lambda^k W^\top, \qquad V^\top A^k = \Omega^k V^\top.
\end{align*}
[/step]
[step:Express $V^\top X^{(k)}$ and $W^\top X^{(k)}$ using the QR factorisation of $A^k X^{(0)}$]
By the [QR Factor of Iterated Power](/theorems/1409), $A^k X^{(0)} = X^{(k)} R^{(k)}$ for an upper triangular $R^{(k)}$. Left-multiplying by $V^\top$ and $W^\top$:
\begin{align*}
V^\top X^{(k)} R^{(k)} &= V^\top A^k X^{(0)} = \Omega^k (V^\top X^{(0)}), \\
W^\top X^{(k)} R^{(k)} &= W^\top A^k X^{(0)} = \Lambda^k (W^\top X^{(0)}).
\end{align*}
Therefore
\begin{align*}
V^\top X^{(k)} &= \Omega^k (V^\top X^{(0)}) (R^{(k)})^{-1}, \\
W^\top X^{(k)} &= \Lambda^k (W^\top X^{(0)}) (R^{(k)})^{-1}.
\end{align*}
[/step]
[step:Eliminate $(R^{(k)})^{-1}$ using the invertibility of $W^\top X^{(0)}$]
Since $W^\top X^{(0)} \in \mathbb{R}^{p \times p}$ is invertible by hypothesis, and $|\lambda_p| > |\lambda_{p+1}|$ implies $\Lambda$ is invertible, the second equation gives
\begin{align*}
(R^{(k)})^{-1} = (W^\top X^{(0)})^{-1} \Lambda^{-k} (W^\top X^{(k)}).
\end{align*}
Substituting into the expression for $V^\top X^{(k)}$:
\begin{align*}
V^\top X^{(k)} = \Omega^k (V^\top X^{(0)}) (W^\top X^{(0)})^{-1} \Lambda^{-k} (W^\top X^{(k)}).
\end{align*}
[guided]
The key manoeuvre is using the second equation to express the unknown $(R^{(k)})^{-1}$ in terms of known quantities. This is possible because $W^\top X^{(0)}$ is invertible — a condition that geometrically means the initial subspace $\operatorname{Range}(X^{(0)})$ is not orthogonal to any of the first $p$ eigenvectors. Without this condition, the iteration could miss an eigenspace entirely.
[/guided]
[/step]
[step:Bound $\|V^\top X^{(k)}\|_2$ by sub-multiplicativity of the operator norm]
Taking the 2-norm and applying sub-multiplicativity:
\begin{align*}
\|V^\top X^{(k)}\|_2 \leq \|\Omega^k\|_2 \cdot \|V^\top X^{(0)}\|_2 \cdot \|(W^\top X^{(0)})^{-1}\|_2 \cdot \|\Lambda^{-k}\|_2 \cdot \|W^\top X^{(k)}\|_2.
\end{align*}
We bound each factor:
- $\|\Omega^k\|_2 = \max_{i > p} |\lambda_i|^k = |\lambda_{p+1}|^k$, since $\Omega$ is diagonal with entries $\lambda_{p+1}, \ldots, \lambda_n$ and $|\lambda_{p+1}| \geq |\lambda_{p+2}| \geq \cdots$.
- $\|\Lambda^{-k}\|_2 = \max_{i \leq p} |\lambda_i|^{-k} = |\lambda_p|^{-k}$, since $\Lambda^{-1}$ is diagonal with entries $\lambda_1^{-1}, \ldots, \lambda_p^{-1}$ and $|\lambda_p|^{-1}$ is the largest.
- $\|W^\top X^{(k)}\|_2 \leq \|W^\top\|_2 \cdot \|X^{(k)}\|_2 = 1 \cdot 1 = 1$, because $W$ has orthonormal columns (so $\|W^\top\|_2 = 1$) and $X^{(k)}$ has orthonormal columns (so $\|X^{(k)}\|_2 = 1$).
Collecting:
\begin{align*}
\|V^\top X^{(k)}\|_2 \leq \|V^\top X^{(0)}\|_2 \cdot \|(W^\top X^{(0)})^{-1}\|_2 \cdot \left|\frac{\lambda_{p+1}}{\lambda_p}\right|^k = c\left|\frac{\lambda_{p+1}}{\lambda_p}\right|^k,
\end{align*}
where $c = \|V^\top X^{(0)}\|_2 \cdot \|(W^\top X^{(0)})^{-1}\|_2 > 0$ depends only on $X^{(0)}$, $W$, and $V$.
[guided]
The bound $\|W^\top X^{(k)}\|_2 \leq 1$ deserves scrutiny. The matrix $W^\top$ has orthonormal rows (since $W$ has orthonormal columns), so $W^\top$ has operator norm $\|W^\top\|_2 = \sigma_{\max}(W^\top) = 1$. Likewise, $X^{(k)}$ has orthonormal columns, so $\|X^{(k)}\|_2 = 1$. Sub-multiplicativity gives $\|W^\top X^{(k)}\|_2 \leq 1$.
The geometric meaning of the result is that $V^\top X^{(k)}$ measures the component of the columns of $X^{(k)}$ lying in the unwanted eigenspace $\operatorname{span}(w_{p+1}, \ldots, w_n)$. This component decays at rate $|\lambda_{p+1}/\lambda_p|^k$, so the columns of $X^{(k)}$ converge into the desired eigenspace $\operatorname{span}(w_1, \ldots, w_p)$.
[/guided]
[/step]