[proofplan]
The proof uses the coupling technique in two stages. In Stage 1, we construct a product chain $Z_n = (X_n, Y_n)$ from two independent copies of the original chain started at different states $i$ and $j$, show the product chain is irreducible and positive recurrent (using aperiodicity to ensure irreducibility, and the product invariant distribution $\pi_i \pi_j$ for positive recurrence), define a coupling time $T$ at which both chains meet, and bound $|p_{i,k}(n) - p_{j,k}(n)| \leq \mathbb{P}_{ij}(T > n) \to 0$. In Stage 2, we use the invariance of $\pi$ to write $\pi_k$ as a weighted average of $p_{j,k}(n)$ and apply the bounded convergence theorem to conclude $p_{i,k}(n) \to \pi_k$.
[/proofplan]
[step:Construct the product chain and verify it is irreducible and positive recurrent]
Fix states $i, j \in S$. Define two independent Markov chains $X = (X_n)$ and $Y = (Y_n)$, each with transition matrix $P$, with $X_0 = i$ and $Y_0 = j$. The pair $Z_n = (X_n, Y_n)$ is itself a Markov chain on the product state space $S \times S$ with transition probabilities
\begin{align*}
\tilde{p}_{(a,b),(c,d)} = p_{a,c} \cdot p_{b,d},
\end{align*}
which follows from the independence of $X$ and $Y$.
We verify that $Z$ is irreducible. Let $(a, b)$ and $(c, d)$ be any two states in $S \times S$. Since the original chain is irreducible and aperiodic, for each state $a$ the set $D_a = \{n \geq 1 : p_{a,a}(n) > 0\}$ has $\gcd(D_a) = 1$. By a standard number-theoretic fact, there exists $N_a$ such that $n \in D_a$ for all $n \geq N_a$. In particular, $p_{a,c}(n) \geq p_{a,a}(n - m) \cdot p_{a,c}(m) > 0$ for all sufficiently large $n$ (choosing $m$ with $p_{a,c}(m) > 0$ and using $n - m \in D_a$ for $n$ large). Applying this to both coordinates: for sufficiently large $n$, both $p_{a,c}(n) > 0$ and $p_{b,d}(n) > 0$, so $\tilde{p}_{(a,b),(c,d)}(n) = p_{a,c}(n) \cdot p_{b,d}(n) > 0$. Hence $Z$ is irreducible.
The product distribution $\nuu_{(a,b)} = \pi_a \pi_b$ is invariant for $Z$:
\begin{align*}
\sum_{(a,b) \in S \times S} \nuu_{(a,b)} \, \tilde{p}_{(a,b),(c,d)} = \sum_{a,b} \pi_a \pi_b \, p_{a,c} \, p_{b,d} = \left(\sum_a \pi_a \, p_{a,c}\right)\left(\sum_b \pi_b \, p_{b,d}\right) = \pi_c \, \pi_d = \nuu_{(c,d)},
\end{align*}
where both factors equal $\pi_c$ and $\pi_d$ respectively by the invariance $\pi P = \pi$. Since $Z$ is irreducible and has an invariant distribution, the [Existence and Uniqueness of Invariant Distribution](/theorems/2223) theorem implies $Z$ is positive recurrent.
[guided]
The coupling strategy requires placing two copies of the chain on the same probability space. The simplest way to do this is to run them independently.
Let $X_0 = i$ and $Y_0 = j$ be two independent chains with transition matrix $P$. The product process $Z_n = (X_n, Y_n)$ on $S \times S$ has transition probabilities $\tilde{p}_{(a,b),(c,d)} = p_{a,c} \cdot p_{b,d}$. The Markov property of $Z$ follows from the independence of $X$ and $Y$: the next state of $Z$ depends only on the current state.
Why is $Z$ irreducible? This is where aperiodicity is essential. An irreducible aperiodic chain has the property that for each pair $(a, c)$, $p_{a,c}(n) > 0$ for all sufficiently large $n$. (This is because the set $D_a = \{n : p_{a,a}(n) > 0\}$ has $\gcd(D_a) = 1$, so all sufficiently large integers lie in $D_a$; then $p_{a,c}(n) \geq p_{a,c}(m) \cdot p_{c,c}(n - m) > 0$ for $n$ large enough.)
Without aperiodicity, we could have $p_{a,c}(n) > 0$ only for $n$ in one residue class modulo $d$, and $p_{b,d}(n) > 0$ only for $n$ in a potentially different residue class. There might be no single $n$ making both positive simultaneously. With aperiodicity ($d = 1$), both are positive for all sufficiently large $n$, so $Z$ is irreducible.
The product distribution $\nuu_{(a,b)} = \pi_a \pi_b$ is invariant because the invariance equation $\pi = \pi P$ applies independently to each coordinate. Since $Z$ is irreducible with invariant distribution $\nuu$, the [Existence and Uniqueness of Invariant Distribution](/theorems/2223) theorem guarantees $Z$ is positive recurrent (and in particular recurrent).
[/guided]
[/step]
[step:Define the coupling time and show it is finite almost surely]
Fix any state $s \in S$ and define the coupling time
\begin{align*}
T := \inf\{n \geq 0 : X_n = Y_n = s\} = \inf\{n \geq 0 : Z_n = (s, s)\}.
\end{align*}
This is the first time the product chain $Z$ visits the state $(s, s) \in S \times S$. Since $Z$ is positive recurrent (hence recurrent) and irreducible, the [Certain Visitation from Any Start](/theorems/2222) theorem applied to $Z$ guarantees that $(s, s)$ is visited with probability $1$ from any starting state. Therefore $T < \infty$ $\mathbb{P}_{ij}$-almost surely.
Moreover, $\mathbb{P}_{ij}(T > n) \to 0$ as $n \to \infty$: the events $\{T > n\}$ are decreasing in $n$ with intersection $\{T = \infty\}$, which has probability $0$.
[guided]
The coupling time $T$ is the moment at which both chains first simultaneously occupy the same state $s$. Why does this happen with probability $1$? Because the product chain $Z$ is recurrent and irreducible on $S \times S$, so every state -- including $(s,s)$ -- is visited infinitely often from any starting point by the [Recurrence and Infinitely Many Returns](/theorems/2219) theorem.
An important subtlety: after time $T$, we couple the chains by running them with the same randomness. Since $X_T = Y_T = s$ and both chains evolve according to the same transition matrix $P$, if we use the same sequence of random transitions for both after time $T$, then $X_n = Y_n$ for all $n \geq T$. This is possible because the transition law depends only on the current state, and at time $T$ both chains are in the same state.
The tail probability $\mathbb{P}_{ij}(T > n)$ decreases to $\mathbb{P}_{ij}(T = \infty) = 0$, which is the key estimate we need for the convergence argument.
[/guided]
[/step]
[step:Bound $|p_{i,k}(n) - p_{j,k}(n)|$ using the coupling inequality]
Fix any target state $k \in S$. Decompose $p_{i,k}(n) = \mathbb{P}_{ij}(X_n = k)$ according to whether coupling has occurred by time $n$:
\begin{align*}
p_{i,k}(n) = \mathbb{P}_{ij}(X_n = k,\, T \leq n) + \mathbb{P}_{ij}(X_n = k,\, T > n).
\end{align*}
On the event $\{T \leq n\}$, both chains occupy the same state from time $T$ onward: $X_n = Y_n$. Therefore $\{X_n = k, T \leq n\} = \{Y_n = k, T \leq n\}$, giving
\begin{align*}
\mathbb{P}_{ij}(X_n = k,\, T \leq n) = \mathbb{P}_{ij}(Y_n = k,\, T \leq n).
\end{align*}
Substituting:
\begin{align*}
p_{i,k}(n) = \mathbb{P}_{ij}(Y_n = k,\, T \leq n) + \mathbb{P}_{ij}(X_n = k,\, T > n) \leq \mathbb{P}_{ij}(Y_n = k) + \mathbb{P}_{ij}(T > n) = p_{j,k}(n) + \mathbb{P}_{ij}(T > n).
\end{align*}
By the symmetric argument (the roles of $X$ and $Y$ are interchangeable, since the construction is symmetric in $i$ and $j$):
\begin{align*}
p_{j,k}(n) \leq p_{i,k}(n) + \mathbb{P}_{ij}(T > n).
\end{align*}
Combining both inequalities:
\begin{align*}
|p_{i,k}(n) - p_{j,k}(n)| \leq \mathbb{P}_{ij}(T > n).
\end{align*}
Since $\mathbb{P}_{ij}(T > n) \to 0$ as $n \to \infty$ (established in the previous step):
\begin{align*}
|p_{i,k}(n) - p_{j,k}(n)| \to 0 \quad \text{as } n \to \infty,
\end{align*}
for all $i, j, k \in S$.
[guided]
This is the heart of the coupling argument. The idea is that after the coupling time $T$, both chains are in the same state and evolve identically, so they produce the same transition probabilities. Before coupling, they can differ, but the probability that coupling has not yet occurred is the only source of discrepancy.
We decompose the probability $p_{i,k}(n) = \mathbb{P}_{ij}(X_n = k)$ into two pieces based on whether $T \leq n$ or $T > n$:
\begin{align*}
p_{i,k}(n) = \mathbb{P}_{ij}(X_n = k,\, T \leq n) + \mathbb{P}_{ij}(X_n = k,\, T > n).
\end{align*}
On $\{T \leq n\}$, $X_n = Y_n$ (both chains have been running in lockstep since time $T$), so $\{X_n = k, T \leq n\} = \{Y_n = k, T \leq n\}$. Hence
\begin{align*}
p_{i,k}(n) = \mathbb{P}_{ij}(Y_n = k,\, T \leq n) + \mathbb{P}_{ij}(X_n = k,\, T > n).
\end{align*}
The first term satisfies $\mathbb{P}_{ij}(Y_n = k, T \leq n) \leq \mathbb{P}_{ij}(Y_n = k) = p_{j,k}(n)$, and the second satisfies $\mathbb{P}_{ij}(X_n = k, T > n) \leq \mathbb{P}_{ij}(T > n)$. So $p_{i,k}(n) \leq p_{j,k}(n) + \mathbb{P}_{ij}(T > n)$.
Swapping $i \leftrightarrow j$ gives $p_{j,k}(n) \leq p_{i,k}(n) + \mathbb{P}_{ij}(T > n)$ (the coupling time is symmetric in $i$ and $j$). Combining: $|p_{i,k}(n) - p_{j,k}(n)| \leq \mathbb{P}_{ij}(T > n) \to 0$.
This establishes that all starting states produce transition probabilities that converge to each other. But we have not yet identified the common limit. That is the role of Stage 2.
[/guided]
[/step]
[step:Identify the limit as $\pi_k$ using invariance and bounded convergence]
By the invariance $\pi = \pi P^n$, for every $n \geq 0$ and $k \in S$:
\begin{align*}
\pi_k = \sum_{j \in S} \pi_j \, p_{j,k}(n).
\end{align*}
Fix $i, k \in S$. Subtracting $p_{i,k}(n)$ from both sides and applying the triangle inequality:
\begin{align*}
|\pi_k - p_{i,k}(n)| = \left|\sum_{j \in S} \pi_j \bigl(p_{j,k}(n) - p_{i,k}(n)\bigr)\right| \leq \sum_{j \in S} \pi_j \, |p_{j,k}(n) - p_{i,k}(n)|.
\end{align*}
For each fixed $j$, the previous step gives $|p_{j,k}(n) - p_{i,k}(n)| \to 0$ as $n \to \infty$. The summands satisfy $|p_{j,k}(n) - p_{i,k}(n)| \leq 1$ for all $n$ (since both terms are probabilities in $[0, 1]$). Since $\sum_{j \in S} \pi_j = 1 < \infty$ and the summands are uniformly bounded by the integrable function $g(j) = 1$ (with $\sum_j \pi_j \cdot 1 = 1 < \infty$), the bounded convergence theorem (for sums with respect to the finite measure $\pi$) gives
\begin{align*}
\lim_{n \to \infty} \sum_{j \in S} \pi_j \, |p_{j,k}(n) - p_{i,k}(n)| = \sum_{j \in S} \pi_j \cdot 0 = 0.
\end{align*}
Therefore $|\pi_k - p_{i,k}(n)| \to 0$, i.e., $p_{i,k}(n) \to \pi_k$ as $n \to \infty$, for all $i, k \in S$.
[guided]
Stage 1 showed that all starting points produce transition probabilities that converge to each other. Stage 2 identifies the common limit as $\pi_k$. The idea is to use the invariance equation to express $\pi_k$ as a weighted average of the $p_{j,k}(n)$, and then pass to the limit inside the average.
Since $\pi = \pi P^n$:
\begin{align*}
\pi_k = \sum_{j \in S} \pi_j \, p_{j,k}(n).
\end{align*}
This is a weighted average of the quantities $p_{j,k}(n)$ with weights $\pi_j$. If each $p_{j,k}(n)$ is close to $p_{i,k}(n)$ (which Stage 1 guarantees for large $n$), then $\pi_k$ must also be close to $p_{i,k}(n)$, since the weights sum to $1$.
Formally:
\begin{align*}
|\pi_k - p_{i,k}(n)| = \left|\sum_j \pi_j (p_{j,k}(n) - p_{i,k}(n))\right| \leq \sum_j \pi_j |p_{j,k}(n) - p_{i,k}(n)|.
\end{align*}
We want to take $n \to \infty$ inside the sum. The bounded convergence theorem applies here: the terms $\pi_j |p_{j,k}(n) - p_{i,k}(n)|$ are bounded by $\pi_j$ (since $|p_{j,k}(n) - p_{i,k}(n)| \leq 1$), and $\sum_j \pi_j = 1 < \infty$. The pointwise limit is $0$ by Stage 1. Therefore the sum converges to $0$, giving $p_{i,k}(n) \to \pi_k$.
Why did we need positive recurrence, not just recurrence, in Stage 2? If the chain were null recurrent, Stage 1 would still go through (the coupling time is still finite a.s.), but there would be no invariant distribution $\pi$ to serve as the limit. In the null recurrent case, $p_{i,k}(n) \to 0$ for all $i, k$ -- the chain spreads out and assigns vanishing probability to each state.
[/guided]
[/step]