[proofplan]
We condition on the state at the intermediate time $m$ using the law of total probability, then apply the [Extended Markov Property](/theorems/2206) to decouple the future evolution (from time $m$ to time $m + n$) from the past (the initial state at time $0$). This reduces the two conditional probabilities to $n$-step and $m$-step transition probabilities, yielding the convolution formula.
[/proofplan]
[step:Condition on the intermediate state at time $m$ and apply the law of total probability]
By the definition of $n$-step transition probabilities, $p_{i,j}(m+n) = \mathbb{P}(X_{m+n} = j \mid X_0 = i)$. We introduce the intermediate state $X_m$ using the law of total probability. Since the events $\{X_m = k\}$ for $k \in S$ partition the sample space (the chain must be in some state at time $m$):
\begin{align*}
p_{i,j}(m+n) &= \mathbb{P}(X_{m+n} = j \mid X_0 = i) \\
&= \sum_{k \in S} \mathbb{P}(X_{m+n} = j \mid X_m = k,\, X_0 = i)\, \mathbb{P}(X_m = k \mid X_0 = i).
\end{align*}
[guided]
The idea is to break the $(m+n)$-step journey from $i$ to $j$ into two legs: first $m$ steps from $i$ to some intermediate state $k$, then $n$ steps from $k$ to $j$. We formalise this by conditioning on the state at the halfway point.
Starting from $p_{i,j}(m+n) = \mathbb{P}(X_{m+n} = j \mid X_0 = i)$, we partition according to the value of $X_m$. The events $\{X_m = k\}$ for $k \in S$ are pairwise disjoint and exhaust $\Omega$ (since $X_m$ takes values in $S$), so the law of total probability gives:
\begin{align*}
p_{i,j}(m+n) &= \sum_{k \in S} \mathbb{P}(X_{m+n} = j \mid X_m = k,\, X_0 = i)\, \mathbb{P}(X_m = k \mid X_0 = i).
\end{align*}
This decomposition is exact and involves no approximation. The second factor $\mathbb{P}(X_m = k \mid X_0 = i) = p_{i,k}(m)$ is already an $m$-step transition probability. The challenge is to simplify the first factor, which conditions on both the initial state $X_0 = i$ and the intermediate state $X_m = k$.
[/guided]
[/step]
[step:Apply the extended Markov property to decouple past and future]
By the [Extended Markov Property](/theorems/2206), given $X_m = k$, the future evolution of the chain (from time $m$ onward) is independent of the past history, including $X_0 = i$. The event $\{X_{m+n} = j\}$ is determined by the future $\{X_{m+1}, X_{m+2}, \ldots\}$, and the event $\{X_0 = i\}$ is determined by the past $\{X_0\}$. Therefore:
\begin{align*}
\mathbb{P}(X_{m+n} = j \mid X_m = k,\, X_0 = i) = \mathbb{P}(X_{m+n} = j \mid X_m = k).
\end{align*}
By homogeneity of the chain, $\mathbb{P}(X_{m+n} = j \mid X_m = k) = \mathbb{P}(X_n = j \mid X_0 = k) = p_{k,j}(n)$.
[guided]
This is the step where the Markov property does its essential work. We need to show that knowing $X_0 = i$ provides no additional information about $X_{m+n}$ once $X_m = k$ is given.
The [Extended Markov Property](/theorems/2206) states precisely this: for any event $F$ determined by the future $\{X_{m+1}, X_{m+2}, \ldots\}$ and any event $H$ determined by the past $\{X_0, \ldots, X_{m-1}\}$,
\begin{align*}
\mathbb{P}(F \mid X_m = k,\, H) = \mathbb{P}(F \mid X_m = k).
\end{align*}
Here $F = \{X_{m+n} = j\}$ is a future event and $H = \{X_0 = i\}$ is a past event (it depends only on $X_0$, which precedes time $m$). The extended Markov property applies directly, giving:
\begin{align*}
\mathbb{P}(X_{m+n} = j \mid X_m = k,\, X_0 = i) = \mathbb{P}(X_{m+n} = j \mid X_m = k).
\end{align*}
The right-hand side is the probability of going from state $k$ to state $j$ in $n$ steps, starting at time $m$. By homogeneity (the transition probabilities do not depend on the time), this equals $\mathbb{P}(X_n = j \mid X_0 = k) = p_{k,j}(n)$.
The intuition is that the Markov chain "restarts" at time $m$: once we know the chain is in state $k$ at time $m$, we can forget how it arrived there. The remaining $n$ steps behave identically to a fresh chain started at $k$.
[/guided]
[/step]
[step:Combine to obtain the Chapman--Kolmogorov equation]
Substituting the results of the previous steps:
\begin{align*}
p_{i,j}(m+n) &= \sum_{k \in S} \mathbb{P}(X_{m+n} = j \mid X_m = k,\, X_0 = i)\, \mathbb{P}(X_m = k \mid X_0 = i) \\
&= \sum_{k \in S} p_{k,j}(n)\, p_{i,k}(m).
\end{align*}
This is the Chapman--Kolmogorov equation. In matrix notation, since the $(i,j)$ entry of the product $P^m P^n$ is $\sum_{k \in S} p_{i,k}(m)\, p_{k,j}(n)$, this says $P^{m+n} = P^m P^n$: the $n$-step transition probabilities are the entries of the $n$-th matrix power of $P$.
[/step]