[proofplan]
We use the eigenbasis expansion from the convergence of the power method to compute $r(x^{(k)})$ directly. After normalisation, the iterate $x^{(k)}$ has dominant component along $w_1$ with a perturbation of order $O(\rho^k)$. Substituting into the Rayleigh quotient formula and expanding, the leading correction to $\lambda_1$ is $O(\rho^k)$.
[/proofplan]
[step:Express the Rayleigh quotient in terms of the eigenbasis expansion]
Under the hypotheses of the [Convergence of the Power Method](/theorems/1405), we write
\begin{align*}
A^kx^{(0)} = \lambda_1^k\Bigl(c_1w_1 + r^{(k)}\Bigr), \qquad r^{(k)} = \sum_{i=2}^n c_i\Bigl(\frac{\lambda_i}{\lambda_1}\Bigr)^k w_i.
\end{align*}
The Rayleigh quotient is invariant under scalar multiplication of $x$, so
\begin{align*}
r(x^{(k)}) = r(A^kx^{(0)}) = \frac{(A^kx^{(0)})^* A (A^kx^{(0)})}{(A^kx^{(0)})^*(A^kx^{(0)})} = \frac{(c_1w_1 + r^{(k)})^* A (c_1w_1 + r^{(k)})}{(c_1w_1 + r^{(k)})^*(c_1w_1 + r^{(k)})},
\end{align*}
where the scalar factor $\lambda_1^k$ cancels in the ratio.
[/step]
[step:Compute the numerator and denominator using orthonormality]
Using $Aw_i = \lambda_iw_i$ and $w_i^*w_j = \delta_{ij}$ (choosing an orthonormal eigenbasis, which exists since $A$ is diagonalisable):
**Denominator:**
\begin{align*}
\|c_1w_1 + r^{(k)}\|^2 = |c_1|^2 + \sum_{i=2}^n |c_i|^2\Bigl|\frac{\lambda_i}{\lambda_1}\Bigr|^{2k} = |c_1|^2 + O(\rho^{2k}).
\end{align*}
**Numerator:**
\begin{align*}
(c_1w_1 + r^{(k)})^*A(c_1w_1 + r^{(k)}) = |c_1|^2\lambda_1 + \sum_{i=2}^n |c_i|^2\lambda_i\Bigl|\frac{\lambda_i}{\lambda_1}\Bigr|^{2k} = |c_1|^2\lambda_1 + O(\rho^{2k}).
\end{align*}
[guided]
Why does orthonormality of the eigenvectors eliminate all cross terms? Because $w_i^*w_j = 0$ for $i \neq j$, the inner product of the sum reduces to a sum of squared moduli. Similarly, $w_i^*Aw_j = \lambda_j\,w_i^*w_j = \lambda_j\delta_{ij}$, so the quadratic form is diagonal in the eigenbasis. Each non-dominant term carries a factor $|\lambda_i/\lambda_1|^{2k} \leq \rho^{2k}$.
Note: When $A$ is not normal and the eigenbasis is not orthogonal, the cross terms do not vanish. However, one can work with the biorthogonal system $\{w_i, \tilde{w}_i\}$ (eigenvectors of $A$ and $A^*$ respectively) and bound the cross terms using the condition number of the eigenvector matrix. The rate $O(\rho^k)$ is preserved.
[/guided]
[/step]
[step:Take the ratio and extract the convergence rate]
Dividing:
\begin{align*}
r(x^{(k)}) = \frac{|c_1|^2\lambda_1 + O(\rho^{2k})}{|c_1|^2 + O(\rho^{2k})} = \lambda_1 \cdot \frac{1 + O(\rho^{2k})}{1 + O(\rho^{2k})} = \lambda_1 + O(\rho^{2k}).
\end{align*}
More precisely, writing $\delta_k := \sum_{i=2}^n |c_i|^2|\lambda_i/\lambda_1|^{2k}$ and factoring $|c_1|^2$ from numerator and denominator:
\begin{align*}
r(x^{(k)}) - \lambda_1 = \frac{\sum_{i=2}^n |c_i|^2(\lambda_i - \lambda_1)|\lambda_i/\lambda_1|^{2k}}{|c_1|^2 + \delta_k}.
\end{align*}
Since $|\lambda_i - \lambda_1|$ is bounded, $\delta_k \leq C_1\rho^{2k}$, and $|c_1|^2 + \delta_k \geq |c_1|^2 > 0$:
\begin{align*}
|r(x^{(k)}) - \lambda_1| \leq \frac{C_2\,\rho^{2k}}{|c_1|^2} = O(\rho^{2k}).
\end{align*}
Since $\rho^{2k} = O(\rho^k)$ (as $\rho^{2k} \leq \rho^k$ for $\rho < 1$), we obtain $|r(x^{(k)}) - \lambda_1| = O(\rho^k)$, which is the stated rate. In fact, the bound above shows the Rayleigh quotient converges at the faster rate $O(\rho^{2k})$, but the theorem only claims $O(\rho^k)$.
[guided]
The computation reveals an important phenomenon: the Rayleigh quotient converges to $\lambda_1$ at rate $O(\rho^{2k})$, which is the *square* of the convergence rate $O(\rho^k)$ of the eigenvector. This is a general feature of the Rayleigh quotient — it extracts eigenvalue approximations that are quadratically more accurate than the eigenvector approximations feeding into it.
Why does this happen? Because the Rayleigh quotient is a *ratio* of two quantities that both have $O(\rho^{2k})$ perturbations from their limiting values. There is no $O(\rho^k)$ contribution: the cross terms vanish by orthogonality. The leading error in $r(x^{(k)}) - \lambda_1$ comes from terms of the form $|c_i|^2(\lambda_i - \lambda_1)|\lambda_i/\lambda_1|^{2k}$, each of which is $O(\rho^{2k})$.
The theorem states only the weaker bound $O(\rho^k)$, which is correct since $\rho^{2k} \leq \rho^k$ for $0 < \rho < 1$, but practitioners rely on the sharper $O(\rho^{2k})$ rate.
[/guided]
[/step]