[proofplan]
Inverse iteration with shift $s$ applies the power method to the matrix $(A - sI)^{-1}$. We verify that this matrix is diagonalisable with the same eigenvectors as $A$ and eigenvalues $(\lambda_i - s)^{-1}$. The dominant eigenvalue of $(A - sI)^{-1}$ corresponds to the eigenvalue of $A$ closest to $s$, and the convergence rate of the power method is determined by the ratio of the second-largest to the largest eigenvalue magnitude of $(A - sI)^{-1}$, which equals $|(\lambda - s)/(\lambda' - s)| = \rho$.
[/proofplan]
[step:Identify the eigenvalues and eigenvectors of $(A - sI)^{-1}$]
Since $A$ is diagonalisable, write $Aw_i = \lambda_iw_i$ for $i = 1, \ldots, n$. Then
\begin{align*}
(A - sI)w_i = (\lambda_i - s)w_i.
\end{align*}
The shift $s$ is not an eigenvalue of $A$ (since $\lambda$ is the closest eigenvalue to $s$ and $|\lambda - s| > 0$ by hypothesis), so $\lambda_i - s \neq 0$ for all $i$ and $(A - sI)$ is invertible. Applying $(A - sI)^{-1}$ to both sides:
\begin{align*}
w_i = (\lambda_i - s)(A - sI)^{-1}w_i, \qquad \text{hence} \qquad (A - sI)^{-1}w_i = \frac{1}{\lambda_i - s}\,w_i.
\end{align*}
Therefore $(A - sI)^{-1}$ is diagonalisable with the same eigenvectors $w_1, \ldots, w_n$ and eigenvalues $\mu_i := (\lambda_i - s)^{-1}$.
[guided]
The key algebraic fact is that shifting and inverting preserve the eigenvectors. If $Aw_i = \lambda_iw_i$, then $(A - sI)w_i = (\lambda_i - s)w_i$ — the shift moves each eigenvalue by $-s$ but leaves the eigenvectors unchanged. Inverting then flips each eigenvalue to its reciprocal: $(A - sI)^{-1}w_i = (\lambda_i - s)^{-1}w_i$.
For the inverse to exist, we need $\lambda_i - s \neq 0$ for all $i$, i.e., $s$ must not be an eigenvalue of $A$. The hypothesis $|\lambda - s| > 0$ (where $\lambda$ is the *closest* eigenvalue to $s$) guarantees this: if the closest eigenvalue is at nonzero distance from $s$, then all eigenvalues are at nonzero distance from $s$.
Since $A$ is diagonalisable with eigenvectors $\{w_1, \ldots, w_n\}$ forming a basis, and $(A - sI)^{-1}$ has the same eigenvectors, the matrix $(A - sI)^{-1}$ is also diagonalisable.
[/guided]
[/step]
[step:Identify the dominant eigenvalue of $(A - sI)^{-1}$]
The magnitude of the $i$-th eigenvalue of $(A - sI)^{-1}$ is $|\mu_i| = |\lambda_i - s|^{-1}$. The largest $|\mu_i|$ corresponds to the smallest $|\lambda_i - s|$, which is $|\lambda - s|$ by hypothesis. The second-largest corresponds to $|\lambda' - s|$, the second-smallest distance. Therefore the dominant eigenvalue of $(A - sI)^{-1}$ has modulus
\begin{align*}
|\mu_{\max}| = \frac{1}{|\lambda - s|},
\end{align*}
and the ratio of the second-largest to dominant eigenvalue modulus is
\begin{align*}
\rho = \frac{|\mu_{\text{second}}|}{|\mu_{\max}|} = \frac{|\lambda - s|}{|\lambda' - s|} < 1.
\end{align*}
[guided]
Inversion reverses the ordering of eigenvalue magnitudes relative to the shift $s$. The eigenvalue $\lambda$ of $A$ that is *closest* to $s$ produces the eigenvalue $(\lambda - s)^{-1}$ of $(A - sI)^{-1}$ that is *largest* in magnitude, because $|\lambda_i - s|$ is smallest for $i$ corresponding to $\lambda$.
The ratio $\rho = |\lambda - s| / |\lambda' - s| < 1$ because $|\lambda - s| < |\lambda' - s|$ by definition ($\lambda$ is the closest eigenvalue, $\lambda'$ is the second-closest). This ratio controls the convergence speed: the smaller $\rho$ is, the faster convergence will be. When $s$ is very close to $\lambda$, $|\lambda - s|$ is small while $|\lambda' - s|$ remains bounded away from zero, making $\rho$ close to zero.
[/guided]
[/step]
[step:Apply the power method convergence theorem to $(A - sI)^{-1}$]
Inverse iteration computes $x^{(k+1)} \propto (A - sI)^{-1}x^{(k)}$, which is exactly the power method applied to the matrix $(A - sI)^{-1}$. Provided the initial vector has a nonzero component along $w$ (the eigenvector associated with $\lambda$), the [Convergence of the Power Method](/theorems/1405) guarantees convergence to $\operatorname{span}(w)$ at rate $O(\rho^k)$.
[guided]
The [Convergence of the Power Method](/theorems/1405) states that if $B$ is diagonalisable with a dominant eigenvalue of multiplicity one, and if the initial vector has a nonzero component in the dominant eigenspace, then the power iteration $x^{(k+1)} \propto Bx^{(k)}$ converges to the dominant eigenvector at rate $O(\rho^k)$, where $\rho$ is the ratio of the second-largest to the largest eigenvalue magnitude.
We apply this with $B = (A - sI)^{-1}$. We have verified: (i) $B$ is diagonalisable (same eigenvectors as $A$); (ii) the dominant eigenvalue $\mu_{\max} = (\lambda - s)^{-1}$ has the largest modulus; (iii) the ratio of the second-largest to the dominant eigenvalue modulus is $\rho = |\lambda - s|/|\lambda' - s| < 1$. The hypothesis on the initial vector (nonzero component along $w$) is carried over from the theorem statement.
The practical advantage of inverse iteration is that the convergence rate $\rho = |\lambda - s|/|\lambda' - s|$ can be made arbitrarily small by choosing $s$ close to $\lambda$. This is in contrast to the standard power method, where the convergence rate $|\lambda_2|/|\lambda_1|$ is fixed by the matrix. In the extreme case $s \to \lambda$, $\rho \to 0$ and convergence occurs in essentially a single step (though the linear system $(A - sI)x^{(k+1)} = x^{(k)}$ becomes increasingly ill-conditioned — fortunately, the errors in the solve are biased in the direction of $w$, which is harmless for eigenvector computation).
[/guided]
[/step]