[proofplan]
We expand the initial vector $x^{(0)}$ in the eigenbasis $\{w_1, \ldots, w_n\}$ and track the effect of repeated multiplication by $A$. After $k$ applications, the component along $w_1$ is amplified by $\lambda_1^k$ while every other component is amplified by $\lambda_i^k$ with $|\lambda_i| \leq |\lambda_2| < |\lambda_1|$. Normalising, the ratio of the spurious components to the dominant component decays like $(|\lambda_2|/|\lambda_1|)^k = \rho^k$, which gives the stated convergence rate.
[/proofplan]
[step:Expand $A^k x^{(0)}$ in the eigenbasis and factor out $\lambda_1^k$]
Since $A$ is diagonalisable with eigenvectors $w_1, \ldots, w_n$, the initial vector decomposes as
\begin{align*}
x^{(0)} = \sum_{i=1}^n c_i w_i, \qquad c_1 \neq 0.
\end{align*}
Applying $A$ repeatedly and using $Aw_i = \lambda_i w_i$:
\begin{align*}
A^k x^{(0)} = \sum_{i=1}^n c_i \lambda_i^k w_i = \lambda_1^k \left( c_1 w_1 + \sum_{i=2}^n c_i \left(\frac{\lambda_i}{\lambda_1}\right)^k w_i \right).
\end{align*}
[guided]
The key idea is that multiplication by $A$ acts diagonally on the eigenbasis: each eigenvector $w_i$ is simply scaled by $\lambda_i$. After $k$ multiplications, $w_i$ is scaled by $\lambda_i^k$. The dominant eigenvalue $\lambda_1$ has the largest modulus, so its component grows fastest. To see the relative behaviour, we factor out $\lambda_1^k$:
\begin{align*}
A^k x^{(0)} = \sum_{i=1}^n c_i \lambda_i^k w_i = \lambda_1^k \left( c_1 w_1 + \sum_{i=2}^n c_i \left(\frac{\lambda_i}{\lambda_1}\right)^k w_i \right).
\end{align*}
Each ratio $|\lambda_i / \lambda_1| \leq |\lambda_2 / \lambda_1| = \rho < 1$ for $i \geq 2$, so the terms in the sum decay geometrically.
[/guided]
[/step]
[step:Bound the distance from $A^k x^{(0)}$ to $\operatorname{span}(w_1)$]
Define the normalised iterate direction as
\begin{align*}
v^{(k)} := \frac{A^k x^{(0)}}{\|A^k x^{(0)}\|}.
\end{align*}
The power method iterate $x^{(k)}$ equals $v^{(k)}$ (up to the specific normalisation convention used at each step, which does not affect the direction). Write
\begin{align*}
A^k x^{(0)} = \lambda_1^k \bigl( c_1 w_1 + r^{(k)} \bigr), \qquad r^{(k)} := \sum_{i=2}^n c_i \left(\frac{\lambda_i}{\lambda_1}\right)^k w_i.
\end{align*}
Since $\{w_1, \ldots, w_n\}$ is an orthonormal set (choosing orthonormal eigenvectors, which is possible since $A$ is diagonalisable — for non-orthogonal eigenbases one uses a biorthogonal argument, but the norm estimates carry through with modified constants), we have
\begin{align*}
\|r^{(k)}\|^2 = \sum_{i=2}^n |c_i|^2 \left|\frac{\lambda_i}{\lambda_1}\right|^{2k} \leq \rho^{2k} \sum_{i=2}^n |c_i|^2,
\end{align*}
where $\rho = |\lambda_2/\lambda_1| < 1$. Therefore $\|r^{(k)}\| \leq C \rho^k$, where $C = \bigl(\sum_{i=2}^n |c_i|^2\bigr)^{1/2}$.
[guided]
We want to measure how close the direction of $A^k x^{(0)}$ is to $\operatorname{span}(w_1)$. The vector $A^k x^{(0)}$ has a component $\lambda_1^k c_1 w_1$ along the dominant eigenspace and a remainder $\lambda_1^k r^{(k)}$ perpendicular to it (using orthonormality of the eigenvectors). The remainder satisfies
\begin{align*}
\|r^{(k)}\|^2 = \sum_{i=2}^n |c_i|^2 \left|\frac{\lambda_i}{\lambda_1}\right|^{2k} \leq \rho^{2k} \sum_{i=2}^n |c_i|^2,
\end{align*}
because $|\lambda_i/\lambda_1| \leq \rho$ for all $i \geq 2$. Setting $C = \bigl(\sum_{i=2}^n |c_i|^2\bigr)^{1/2}$, we get $\|r^{(k)}\| \leq C\rho^k$.
Note: We assumed $A$ has an orthonormal eigenbasis. When $A$ is not normal, the eigenvectors need not be orthogonal. In that case, one works with the oblique projections onto the eigenspaces and the bound acquires additional constants depending on the condition number of the eigenvector matrix, but the $O(\rho^k)$ rate is preserved.
[/guided]
[/step]
[step:Convert the remainder bound into a distance estimate for $x^{(k)}$]
The distance from a unit vector $v$ to a one-dimensional subspace $\operatorname{span}(w_1)$ is
\begin{align*}
\operatorname{dist}(v, \operatorname{span}(w_1)) = \sqrt{1 - |\langle v, w_1 \rangle|^2}.
\end{align*}
For the normalised iterate $x^{(k)} = v^{(k)}$, we have
\begin{align*}
v^{(k)} = \frac{c_1 w_1 + r^{(k)}}{\|c_1 w_1 + r^{(k)}\|}.
\end{align*}
Since $c_1 \neq 0$ and $\|r^{(k)}\| \to 0$, for $k$ sufficiently large we have $\|c_1 w_1 + r^{(k)}\| \geq |c_1|/2 > 0$. The component of $v^{(k)}$ orthogonal to $w_1$ is
\begin{align*}
v^{(k)} - \langle v^{(k)}, w_1 \rangle w_1 = \frac{r^{(k)}}{\|c_1 w_1 + r^{(k)}\|},
\end{align*}
so
\begin{align*}
\operatorname{dist}(x^{(k)}, \operatorname{span}(w_1)) = \frac{\|r^{(k)}\|}{\|c_1 w_1 + r^{(k)}\|} \leq \frac{C\rho^k}{|c_1|/2} = \frac{2C}{|c_1|}\,\rho^k = O(\rho^k).
\end{align*}
This completes the proof that the power method iterates converge to $\operatorname{span}(w_1)$ at rate $O(\rho^k)$ with $\rho = |\lambda_2/\lambda_1|$.
[guided]
The final step converts the bound on the remainder $r^{(k)}$ into a bound on the geometric distance. The normalised iterate $v^{(k)}$ is a unit vector, and its distance to $\operatorname{span}(w_1)$ equals the norm of its component perpendicular to $w_1$. Since $r^{(k)}$ is orthogonal to $w_1$ (by orthonormality of the eigenvectors), the perpendicular component of $v^{(k)}$ is exactly $r^{(k)} / \|c_1w_1 + r^{(k)}\|$.
The denominator $\|c_1w_1 + r^{(k)}\|$ is bounded below by $|c_1| - \|r^{(k)}\| \geq |c_1|/2$ for all sufficiently large $k$ (since $\|r^{(k)}\| \to 0$). Therefore
\begin{align*}
\operatorname{dist}(x^{(k)}, \operatorname{span}(w_1)) = \frac{\|r^{(k)}\|}{\|c_1w_1 + r^{(k)}\|} \leq \frac{C\rho^k}{|c_1|/2} = \frac{2C}{|c_1|}\,\rho^k.
\end{align*}
The constant $2C/|c_1|$ depends on the initial vector $x^{(0)}$ through the coefficients $c_i$ but is independent of $k$. The convergence rate $\rho = |\lambda_2/\lambda_1| < 1$ is determined entirely by the spectral gap between the dominant and second eigenvalues.
[/guided]
[/step]