[proofplan]
We decompose the $n$-step transition probability $p_{ij}(n)$ by conditioning on the first passage time $T_j$. The strong Markov property reduces $\mathbb{P}_i(X_n = j \mid T_j = m)$ to $p_{jj}(n - m)$, producing a discrete convolution $p_{ij}(n) = \sum_{m=1}^{n} f_{ij}(m)\, p_{jj}(n-m)$. Multiplying by $s^n$ and summing converts the convolution into a product of generating functions. Adding the $n = 0$ boundary term $\delta_{ij}$ yields the identity.
[/proofplan]
[step:Decompose $p_{ij}(n)$ by conditioning on the first passage time $T_j$]
For $n \geq 1$, partition the event $\{X_n = j\}$ according to the value of the first passage time $T_j = \inf\{r \geq 1 : X_r = j\}$. Since the chain can only be at $j$ at time $n$ if $T_j \leq n$, the law of total probability gives
\begin{align*}
p_{ij}(n) = \mathbb{P}_i(X_n = j) = \sum_{m=1}^{n} \mathbb{P}_i(X_n = j \mid T_j = m)\, \mathbb{P}_i(T_j = m).
\end{align*}
By definition, $\mathbb{P}_i(T_j = m) = f_{ij}(m)$. On the event $\{T_j = m\}$, the chain visits state $j$ for the first time at step $m$, so $X_m = j$. By the [Strong Markov Property](/theorems/2208), conditional on $T_j = m$ and $X_m = j$, the process $(X_{m+r})_{r \geq 0}$ is a fresh Markov chain started at $j$, independent of the past. Therefore
\begin{align*}
\mathbb{P}_i(X_n = j \mid T_j = m) = \mathbb{P}_j(X_{n-m} = j) = p_{jj}(n - m).
\end{align*}
Substituting:
\begin{align*}
p_{ij}(n) = \sum_{m=1}^{n} p_{jj}(n - m)\, f_{ij}(m).
\end{align*}
This is a discrete convolution of the sequences $(p_{jj}(k))_{k \geq 0}$ and $(f_{ij}(m))_{m \geq 1}$.
[guided]
We want to express $p_{ij}(n)$ in terms of the first-passage probabilities $f_{ij}(m)$ and the return probabilities $p_{jj}(k)$. The natural approach is to condition on when the chain first arrives at $j$.
For $n \geq 1$, if $X_n = j$, the chain must have first reached $j$ at some time $m \in \{1, \ldots, n\}$. Partitioning over this first passage time $T_j = \inf\{r \geq 1 : X_r = j\}$, the law of total probability gives
\begin{align*}
p_{ij}(n) = \mathbb{P}_i(X_n = j) = \sum_{m=1}^{n} \mathbb{P}_i(X_n = j \mid T_j = m)\, \mathbb{P}_i(T_j = m).
\end{align*}
By definition of the first-passage probability, $\mathbb{P}_i(T_j = m) = f_{ij}(m)$.
Now we evaluate $\mathbb{P}_i(X_n = j \mid T_j = m)$. On the event $\{T_j = m\}$, the chain is at state $j$ at time $m$ and has not visited $j$ at any earlier time $1, \ldots, m-1$. The [Strong Markov Property](/theorems/2208) tells us that conditional on $T_j = m$ (which is a stopping time for the natural filtration) and $X_{T_j} = j$, the post-$T_j$ process $(X_{m+r})_{r \geq 0}$ is a Markov chain with the same transition matrix $P$, started at $j$, independent of $(X_0, \ldots, X_m)$. In particular,
\begin{align*}
\mathbb{P}_i(X_n = j \mid T_j = m) = \mathbb{P}_j(X_{n-m} = j) = p_{jj}(n - m).
\end{align*}
Substituting back:
\begin{align*}
p_{ij}(n) = \sum_{m=1}^{n} p_{jj}(n - m)\, f_{ij}(m).
\end{align*}
This has the structure of a discrete convolution: $p_{ij}(n) = \sum_{m=1}^{n} f_{ij}(m) \cdot p_{jj}(n - m)$, where the two sequences being convolved are $(f_{ij}(m))_{m \geq 1}$ and $(p_{jj}(k))_{k \geq 0}$.
[/guided]
[/step]
[step:Convert the convolution into a product of generating functions by multiplying by $s^n$ and summing]
Multiply both sides of $p_{ij}(n) = \sum_{m=1}^{n} f_{ij}(m)\, p_{jj}(n-m)$ by $s^n$ and sum over $n \geq 1$. For $|s| \leq 1$, all the power series involved have non-negative coefficients and converge (since $p_{ij}(n) \leq 1$ and $f_{ij}(m) \leq 1$ for all $n, m$), so we may interchange the order of summation freely. On the left side:
\begin{align*}
\sum_{n=1}^{\infty} p_{ij}(n)\, s^n = P_{ij}(s) - p_{ij}(0) = P_{ij}(s) - \delta_{ij}.
\end{align*}
On the right side, substituting $k = n - m$ (so $n = m + k$, and as $n$ ranges from $m$ to $\infty$ with $m$ fixed, $k$ ranges from $0$ to $\infty$):
\begin{align*}
\sum_{n=1}^{\infty} \sum_{m=1}^{n} f_{ij}(m)\, p_{jj}(n-m)\, s^n &= \sum_{m=1}^{\infty} f_{ij}(m)\, s^m \sum_{k=0}^{\infty} p_{jj}(k)\, s^k = F_{ij}(s) \cdot P_{jj}(s).
\end{align*}
The interchange of summation is justified because each term $f_{ij}(m)\, p_{jj}(k)\, s^{m+k}$ is non-negative for $0 < s \leq 1$ (and for $-1 < s < 0$, absolute convergence holds since $|f_{ij}(m)\, p_{jj}(k)\, s^{m+k}| \leq f_{ij}(m)\, p_{jj}(k) \cdot 1$, and $\sum_m f_{ij}(m) \leq 1$, $\sum_k p_{jj}(k) \leq \infty$ with partial sums bounded).
[guided]
The key idea is that convolution in the "time domain" becomes multiplication in the "generating function domain." We multiply both sides of the convolution identity by $s^n$ and sum over $n \geq 1$.
The left side becomes $\sum_{n=1}^{\infty} p_{ij}(n)\, s^n$. We want to relate this to the full generating function $P_{ij}(s) = \sum_{n=0}^{\infty} p_{ij}(n)\, s^n$. The difference is the $n = 0$ term: $p_{ij}(0) = \delta_{ij}$ (the chain is at $j$ at time $0$ iff $i = j$). So the left side equals $P_{ij}(s) - \delta_{ij}$.
For the right side, we write $s^n = s^m \cdot s^{n-m}$ and substitute $k = n - m$:
\begin{align*}
\sum_{n=1}^{\infty} \sum_{m=1}^{n} f_{ij}(m)\, p_{jj}(n-m)\, s^n &= \sum_{n=1}^{\infty} \sum_{m=1}^{n} f_{ij}(m)\, s^m \cdot p_{jj}(n-m)\, s^{n-m}.
\end{align*}
Exchanging the order of summation (which is valid by Tonelli's theorem since all terms are non-negative for $0 < s \leq 1$), and letting $k = n - m$ range from $0$ to $\infty$:
\begin{align*}
\sum_{m=1}^{\infty} f_{ij}(m)\, s^m \sum_{k=0}^{\infty} p_{jj}(k)\, s^k = F_{ij}(s) \cdot P_{jj}(s).
\end{align*}
This is the standard fact that the generating function of a convolution equals the product of the generating functions.
[/guided]
[/step]
[step:Add the boundary term $\delta_{ij}$ to obtain the identity]
Equating the two sides:
\begin{align*}
P_{ij}(s) - \delta_{ij} = F_{ij}(s) \cdot P_{jj}(s).
\end{align*}
Rearranging:
\begin{align*}
P_{ij}(s) = \delta_{ij} + F_{ij}(s)\, P_{jj}(s),
\end{align*}
which is the stated identity, valid for all $i, j \in S$ and $-1 < s \leq 1$.
[/step]