[proofplan]
We define $f(T)$ via $f(J)$ in the Jordan basis and reduce to computing $f$ on a single Jordan block $J_m(\lambda)$. Writing $J_m(\lambda) = \lambda I + N$ with $N$ nilpotent of index $m$, we substitute into the Taylor series of $f$ centred at $\lambda$. Because $N^k = 0$ for $k \geq m$, the series truncates to a finite sum. Computing the matrix entries of $N^k$ explicitly yields the upper-triangular Toeplitz matrix whose $(i,j)$-entry is $\frac{f^{(j-i)}(\lambda)}{(j-i)!}$.
[/proofplan]
[step:Reduce to computing $f$ on a single Jordan block]
Since $T = P J P^{-1}$ where $J = \bigoplus_{k=1}^r J_{m_k}(\lambda_k)$, we define $f(T) := P\, f(J)\, P^{-1}$. The block-diagonal structure of $J$ gives
\begin{align*}
f(J) &= \bigoplus_{k=1}^r f(J_{m_k}(\lambda_k)),
\end{align*}
because for any polynomial $p$, the identity $p(\bigoplus_k A_k) = \bigoplus_k p(A_k)$ holds by the block-diagonal multiplication rule, and this extends to convergent power series by taking limits block-by-block. It therefore suffices to compute $f(J_m(\lambda))$ for a single Jordan block $J_m(\lambda)$ of size $m$ with eigenvalue $\lambda$.
[guided]
We want to evaluate $f(T)$ where $T$ has Jordan Normal Form $J = P^{-1} T P$. The natural definition is $f(T) := P\, f(J)\, P^{-1}$. But what is $f(J)$?
The Jordan matrix $J = \bigoplus_{k=1}^r J_{m_k}(\lambda_k)$ is block-diagonal. For any polynomial $p(\lambda) = \sum_{j=0}^d c_j \lambda^j$, we have
\begin{align*}
p(J) &= \sum_{j=0}^d c_j J^j = \sum_{j=0}^d c_j \bigoplus_{k=1}^r J_{m_k}(\lambda_k)^j = \bigoplus_{k=1}^r \sum_{j=0}^d c_j J_{m_k}(\lambda_k)^j = \bigoplus_{k=1}^r p(J_{m_k}(\lambda_k)),
\end{align*}
where the third equality uses the fact that powers of a block-diagonal matrix are computed block-by-block: if $J = \operatorname{diag}(A_1, \ldots, A_r)$, then $J^j = \operatorname{diag}(A_1^j, \ldots, A_r^j)$.
For $f$ analytic on a neighbourhood of $\sigma(T)$, we express $f$ as a convergent power series near each eigenvalue and apply the same identity termwise. The convergence is valid because each $J_{m_k}(\lambda_k)$ has spectrum $\{\lambda_k\}$, which lies within the radius of convergence of the Taylor series of $f$ at $\lambda_k$. Therefore
\begin{align*}
f(J) &= \bigoplus_{k=1}^r f(J_{m_k}(\lambda_k)).
\end{align*}
This reduces the problem to computing $f(J_m(\lambda))$ for a single $m \times m$ Jordan block with eigenvalue $\lambda$.
[/guided]
[/step]
[step:Decompose the Jordan block as $J_m(\lambda) = \lambda I_m + N$ with $N$ nilpotent]
Write $J_m(\lambda) = \lambda I_m + N$, where $N$ is the $m \times m$ matrix with $1$'s on the superdiagonal and $0$'s elsewhere:
\begin{align*}
N_{ij} &= \begin{cases} 1 & \text{if } j = i + 1, \\ 0 & \text{otherwise.} \end{cases}
\end{align*}
The matrix $N$ is nilpotent of index $m$: $N^k \neq 0$ for $0 \leq k \leq m - 1$ and $N^k = 0$ for $k \geq m$. More precisely, the entries of $N^k$ are
\begin{align*}
(N^k)_{ij} &= \begin{cases} 1 & \text{if } j = i + k, \\ 0 & \text{otherwise,} \end{cases}
\end{align*}
for $0 \leq k \leq m - 1$. That is, $N^k$ has $1$'s on the $k$-th superdiagonal and $0$'s elsewhere.
[guided]
The Jordan block $J_m(\lambda)$ has $\lambda$ on the main diagonal and $1$'s on the superdiagonal. We isolate these two contributions by writing $J_m(\lambda) = \lambda I_m + N$, where $N$ is the $m \times m$ nilpotent shift matrix:
\begin{align*}
N_{ij} &= \begin{cases} 1 & \text{if } j = i + 1, \\ 0 & \text{otherwise.} \end{cases}
\end{align*}
Why is this decomposition useful? The scalar part $\lambda I_m$ commutes with everything, and the nilpotent part $N$ has a very predictable power structure. Specifically, $N$ shifts basis vectors up by one position: $N e_i = e_{i-1}$ for $i \geq 2$ and $N e_1 = 0$ (where $e_1, \ldots, e_m$ are the standard basis vectors). Applying $N$ repeatedly, $N^k$ shifts by $k$ positions, so its entries are
\begin{align*}
(N^k)_{ij} &= \begin{cases} 1 & \text{if } j = i + k, \\ 0 & \text{otherwise.} \end{cases}
\end{align*}
This means $N^k$ has $1$'s on the $k$-th superdiagonal and $0$'s elsewhere. In particular, $N^{m-1}$ has a single $1$ in position $(1, m)$, and $N^m = 0$ because there is no $m$-th superdiagonal in an $m \times m$ matrix. So $N$ is nilpotent of index exactly $m$: $N^k = 0$ for all $k \geq m$, and $N^{m-1} \neq 0$.
[/guided]
[/step]
[step:Expand $f(\lambda I_m + N)$ via the Taylor series and truncate at $N^{m-1}$]
Since $f$ is analytic on a neighbourhood of $\lambda$, it has a convergent Taylor expansion $f(z) = \sum_{k=0}^\infty \frac{f^{(k)}(\lambda)}{k!}(z - \lambda)^k$ for $z$ near $\lambda$. Substituting $J_m(\lambda) = \lambda I_m + N$:
\begin{align*}
f(J_m(\lambda)) &= \sum_{k=0}^\infty \frac{f^{(k)}(\lambda)}{k!} N^k.
\end{align*}
Since $N^k = 0$ for $k \geq m$, the series truncates:
\begin{align*}
f(J_m(\lambda)) &= \sum_{k=0}^{m-1} \frac{f^{(k)}(\lambda)}{k!} N^k.
\end{align*}
This is a finite sum of matrices, so no convergence issues arise in the matrix norm.
[guided]
We now evaluate $f$ at the matrix $J_m(\lambda) = \lambda I_m + N$. Since $f$ is analytic on a neighbourhood of $\lambda \in \sigma(T)$, we can write
\begin{align*}
f(z) &= \sum_{k=0}^\infty \frac{f^{(k)}(\lambda)}{k!}(z - \lambda)^k
\end{align*}
for all $z$ in some disc centred at $\lambda$. We substitute the matrix $J_m(\lambda)$ for $z$. The quantity $z - \lambda$ becomes $J_m(\lambda) - \lambda I_m = N$, so formally
\begin{align*}
f(J_m(\lambda)) &= \sum_{k=0}^\infty \frac{f^{(k)}(\lambda)}{k!} N^k.
\end{align*}
Is this substitution valid? For a scalar power series $\sum c_k w^k$ with radius of convergence $R > 0$, the matrix series $\sum c_k A^k$ converges in operator norm whenever $\|A\| < R$. Here $A = N$ and the radius of convergence $R$ is positive (since $f$ is analytic at $\lambda$). But we do not even need to check a norm bound: the nilpotency $N^m = 0$ ensures the series terminates after finitely many terms. Specifically,
\begin{align*}
f(J_m(\lambda)) &= \sum_{k=0}^{m-1} \frac{f^{(k)}(\lambda)}{k!} N^k,
\end{align*}
since every term with $k \geq m$ contributes $\frac{f^{(k)}(\lambda)}{k!} \cdot 0 = 0$. This is a polynomial expression in $N$ with scalar coefficients $\frac{f^{(k)}(\lambda)}{k!}$, requiring only that $f$ be $(m-1)$-times differentiable at $\lambda$. The analyticity assumption guarantees this and more.
[/guided]
[/step]
[step:Compute the $(i,j)$-entry and identify the upper-triangular Toeplitz matrix]
Substituting the explicit formula $(N^k)_{ij} = \mathbb{1}_{j = i+k}$ into the truncated series:
\begin{align*}
\bigl(f(J_m(\lambda))\bigr)_{ij} &= \sum_{k=0}^{m-1} \frac{f^{(k)}(\lambda)}{k!} (N^k)_{ij} = \sum_{k=0}^{m-1} \frac{f^{(k)}(\lambda)}{k!} \mathbb{1}_{j = i + k}.
\end{align*}
For a given pair $(i,j)$ with $1 \leq i, j \leq m$, the indicator $\mathbb{1}_{j = i + k}$ selects the unique value $k = j - i$. If $j < i$, then $k = j - i < 0$ and no term contributes, so the entry is $0$ (the matrix is upper triangular). If $j \geq i$, then $k = j - i \in \{0, 1, \ldots, m - 1\}$ and exactly one term survives:
\begin{align*}
\bigl(f(J_m(\lambda))\bigr)_{ij} &= \begin{cases} \dfrac{f^{(j-i)}(\lambda)}{(j-i)!} & \text{if } j \geq i, \\ 0 & \text{if } j < i. \end{cases}
\end{align*}
This is exactly the upper-triangular Toeplitz matrix stated in the theorem: the entry in row $i$, column $j$ depends only on the difference $j - i$, with $\frac{f^{(0)}(\lambda)}{0!} = f(\lambda)$ on the main diagonal, $\frac{f'(\lambda)}{1!} = f'(\lambda)$ on the first superdiagonal, $\frac{f''(\lambda)}{2!}$ on the second, and so on up to $\frac{f^{(m-1)}(\lambda)}{(m-1)!}$ in the top-right corner.
[guided]
We now read off the matrix entries of $f(J_m(\lambda)) = \sum_{k=0}^{m-1} \frac{f^{(k)}(\lambda)}{k!} N^k$. From the previous step, the $(i,j)$-entry of $N^k$ is $1$ if $j = i + k$ and $0$ otherwise. Substituting:
\begin{align*}
\bigl(f(J_m(\lambda))\bigr)_{ij} &= \sum_{k=0}^{m-1} \frac{f^{(k)}(\lambda)}{k!} (N^k)_{ij} = \sum_{k=0}^{m-1} \frac{f^{(k)}(\lambda)}{k!} \mathbb{1}_{j = i + k}.
\end{align*}
The indicator $\mathbb{1}_{j = i + k}$ is $1$ for exactly one value of $k$, namely $k = j - i$, provided $j - i \geq 0$ and $j - i \leq m - 1$. Since $1 \leq i, j \leq m$, the condition $j - i \leq m - 1$ is automatically satisfied. So we have two cases:
**Case $j < i$ (below the diagonal):** Then $j - i < 0$, so $k = j - i$ is negative and no term in the sum has $\mathbb{1}_{j = i+k} = 1$. The entry is $0$. This confirms the matrix is upper triangular.
**Case $j \geq i$ (on or above the diagonal):** Then $k = j - i \in \{0, 1, \ldots, m-1\}$, and exactly one term survives:
\begin{align*}
\bigl(f(J_m(\lambda))\bigr)_{ij} &= \frac{f^{(j-i)}(\lambda)}{(j-i)!}.
\end{align*}
The entry depends only on the difference $j - i$, not on $i$ and $j$ individually. This makes $f(J_m(\lambda))$ an **upper-triangular Toeplitz matrix**: the value along each superdiagonal is constant. Reading off the diagonals:
- Main diagonal ($j - i = 0$): entry $\frac{f^{(0)}(\lambda)}{0!} = f(\lambda)$.
- First superdiagonal ($j - i = 1$): entry $\frac{f'(\lambda)}{1!} = f'(\lambda)$.
- Second superdiagonal ($j - i = 2$): entry $\frac{f''(\lambda)}{2!}$.
- In general, $k$-th superdiagonal ($j - i = k$): entry $\frac{f^{(k)}(\lambda)}{k!}$.
- Top-right corner ($i = 1, j = m$, so $j - i = m - 1$): entry $\frac{f^{(m-1)}(\lambda)}{(m-1)!}$.
This gives exactly the matrix stated in the theorem.
[/guided]
[/step]
[step:Assemble the full result $f(T) = P\, f(J)\, P^{-1}$]
Combining the preceding steps: $f(J) = \bigoplus_k f(J_{m_k}(\lambda_k))$ where each block $f(J_{m_k}(\lambda_k))$ is the upper-triangular Toeplitz matrix with $(i,j)$-entry $\frac{f^{(j-i)}(\lambda_k)}{(j-i)!}$ for $j \geq i$ and $0$ for $j < i$. Therefore
\begin{align*}
f(T) &= P\, f(J)\, P^{-1} = P \left( \bigoplus_{k=1}^r f(J_{m_k}(\lambda_k)) \right) P^{-1},
\end{align*}
which is the formula stated in the theorem.
[/step]