[proofplan]
The strategy is to reduce the stopping time $T$ to a countable family of deterministic times. Since $T$ takes values in $\{0, 1, 2, \ldots\} \cup \{\infty\}$, we decompose the event $\{T < \infty\}$ into the disjoint union $\bigcup_{m=0}^{\infty} \{T = m\}$. On each piece $\{T = m\}$, the stopping time definition guarantees that this event is determined by $(X_0, \ldots, X_m)$, so the ordinary Markov property at the deterministic time $m$ applies. Summing over all $m$ and conditioning on $X_T = i$ then yields the strong Markov property for the post-$T$ process $(X_{T+n})_{n \geq 0}$.
[/proofplan]
[step:Fix future states and decompose $\{T < \infty\}$ over the values $T = m$]
We must show that conditional on $\{T < \infty, X_T = i\}$, the post-stopping-time process $(X_{T+n})_{n \geq 0}$ is a Markov chain with transition matrix $P$ and is independent of the history $(X_0, X_1, \ldots, X_T)$.
It suffices to prove the following: for every $n \geq 1$, every sequence of states $j_1, \ldots, j_n$ in the state space $S$, every $m \geq 0$, and every sequence of past states $i_0, i_1, \ldots, i_m$ with $i_m = i$ and $\mathbb{P}(X_0 = i_0, X_1 = i_1, \ldots, X_m = i_m, T = m) > 0$, we have
\begin{align*}
&\mathbb{P}(X_{m+1} = j_1, \ldots, X_{m+n} = j_n \mid X_0 = i_0, \ldots, X_m = i_m, T = m) \\
&\quad = P_{i j_1} P_{j_1 j_2} \cdots P_{j_{n-1} j_n}.
\end{align*}
This identity simultaneously establishes the Markov property with transition matrix $P$ and the independence from the past, because the right-hand side depends only on the starting state $i_m = i$ and the future states $j_1, \ldots, j_n$, not on $i_0, \ldots, i_{m-1}$.
[guided]
What does the strong Markov property actually ask us to verify? We need two things: (1) that the process restarts as a fresh Markov chain with the same transition matrix $P$, and (2) that this restarted process is independent of everything that happened up to time $T$.
Both claims follow if we can show that the conditional distribution of the future path $(X_{T+1}, X_{T+2}, \ldots)$ given the entire past $(X_0, \ldots, X_T)$ and the event $\{T < \infty, X_T = i\}$ depends only on the current state $i$, and equals the distribution of a Markov chain started at $i$ with transition matrix $P$.
Since $T$ is a random time taking values in $\{0, 1, 2, \ldots\} \cup \{\infty\}$, we cannot apply the ordinary Markov property directly — the Markov property holds at deterministic times, not at random ones. The key idea is to **decompose** the event $\{T < \infty\}$ into the countable disjoint union $\{T < \infty\} = \bigcup_{m=0}^{\infty} \{T = m\}$ and handle each piece separately. On the piece $\{T = m\}$, the time is deterministic (it equals $m$), so the ordinary Markov property applies.
Concretely, it suffices to prove: for every $n \geq 1$, every sequence of states $j_1, \ldots, j_n$ in $S$, every $m \geq 0$, and every sequence $i_0, i_1, \ldots, i_m$ with $i_m = i$ such that $\mathbb{P}(X_0 = i_0, X_1 = i_1, \ldots, X_m = i_m, T = m) > 0$, we have
\begin{align*}
&\mathbb{P}(X_{m+1} = j_1, \ldots, X_{m+n} = j_n \mid X_0 = i_0, \ldots, X_m = i_m, T = m) \\
&\quad = P_{i j_1} P_{j_1 j_2} \cdots P_{j_{n-1} j_n}.
\end{align*}
The right-hand side depends only on $i = i_m$ and the future states $j_1, \ldots, j_n$. It does not depend on the past states $i_0, \ldots, i_{m-1}$, nor on the value of $m$. Once this identity is established for each fixed $m$, summing over $m$ yields the result for the random time $T$.
[/guided]
[/step]
[step:Use the stopping time property to absorb $\{T = m\}$ into the past $\sigma$-algebra]
Since $T$ is a stopping time for the Markov chain $(X_n)_{n \geq 0}$, the event $\{T = m\}$ is determined by the observations $(X_0, X_1, \ldots, X_m)$. That is, $\{T = m\} \in \sigma(X_0, X_1, \ldots, X_m)$.
In particular, specifying $(X_0 = i_0, X_1 = i_1, \ldots, X_m = i_m)$ completely determines whether $T = m$ or not: either the trajectory $(i_0, i_1, \ldots, i_m)$ satisfies the stopping criterion (in which case $\{T = m\}$ holds with certainty on this event) or it does not (in which case $\{T = m\}$ is impossible on this event).
Define the event
\begin{align*}
A_m := \{X_0 = i_0, X_1 = i_1, \ldots, X_m = i_m, T = m\}.
\end{align*}
Since $\{T = m\} \in \sigma(X_0, \ldots, X_m)$, the event $A_m$ is the intersection of $\{X_0 = i_0, \ldots, X_m = i_m\}$ with an event that is already determined by $(X_0, \ldots, X_m)$. Therefore $A_m \in \sigma(X_0, X_1, \ldots, X_m)$.
[guided]
This step uses the **definition of a stopping time** in the discrete setting. A random variable $T: \Omega \to \{0, 1, 2, \ldots\} \cup \{\infty\}$ is a stopping time with respect to the natural filtration $(\mathcal{F}_m)_{m \geq 0}$, where $\mathcal{F}_m := \sigma(X_0, X_1, \ldots, X_m)$, if $\{T = m\} \in \mathcal{F}_m$ for every $m \geq 0$.
Why does this matter? Because we want to condition on $A_m$ and then apply the Markov property at time $m$. The Markov property at time $m$ tells us about the conditional distribution of $(X_{m+1}, X_{m+2}, \ldots)$ given $\sigma(X_0, \ldots, X_m)$. For this to be useful, the event we condition on must belong to $\sigma(X_0, \ldots, X_m)$.
The stopping time property ensures precisely this. The event $\{T = m\}$ belongs to $\sigma(X_0, \ldots, X_m)$, and $\{X_0 = i_0, \ldots, X_m = i_m\}$ also belongs to $\sigma(X_0, \ldots, X_m)$.
Their intersection $A_m = \{X_0 = i_0, \ldots, X_m = i_m\} \cap \{T = m\}$ therefore belongs to $\sigma(X_0, \ldots, X_m)$ as well.
This is exactly where the stopping time assumption is consumed. If $T$ were an arbitrary random time — say, $T = \min\{n \geq 0 : X_{n+1} = j\}$ with the "+1" making it depend on the future — then $\{T = m\}$ would depend on $X_{m+1}$, and the argument below would fail. The event $A_m$ would not lie in $\sigma(X_0, \ldots, X_m)$, and we could not apply the Markov property at time $m$.
[/guided]
[/step]
[step:Apply the ordinary Markov property at the deterministic time $m$]
We now compute the conditional probability. Assume $\mathbb{P}(A_m) > 0$. By the definition of conditional probability,
\begin{align*}
&\mathbb{P}(X_{m+1} = j_1, \ldots, X_{m+n} = j_n \mid A_m) \\
&\quad = \frac{\mathbb{P}(X_{m+1} = j_1, \ldots, X_{m+n} = j_n, \, A_m)}{\mathbb{P}(A_m)}.
\end{align*}
Since $A_m \subset \{X_m = i_m\} = \{X_m = i\}$, we can write
\begin{align*}
&\mathbb{P}(X_{m+1} = j_1, \ldots, X_{m+n} = j_n, \, A_m) \\
&\quad = \mathbb{P}(X_{m+1} = j_1, \ldots, X_{m+n} = j_n, \, X_0 = i_0, \ldots, X_m = i, \, T = m).
\end{align*}
Since $\{T = m\} \in \sigma(X_0, \ldots, X_m)$, there exists a set $\mathcal{T}_m \subset S^{m+1}$ of admissible trajectories such that
\begin{align*}
\{T = m\} = \{(X_0, \ldots, X_m) \in \mathcal{T}_m\}.
\end{align*}
The trajectory $(i_0, \ldots, i_m)$ either belongs to $\mathcal{T}_m$ or it does not. Since $\mathbb{P}(A_m) > 0$, it must belong to $\mathcal{T}_m$, and therefore $\{T = m\}$ holds with certainty on the event $\{X_0 = i_0, \ldots, X_m = i_m\}$. This means
\begin{align*}
A_m = \{X_0 = i_0, X_1 = i_1, \ldots, X_m = i_m\}.
\end{align*}
By the [Extended Markov Property](/theorems/2206) applied at the deterministic time $m$, conditional on $(X_0 = i_0, \ldots, X_m = i)$, the future process $(X_{m+1}, X_{m+2}, \ldots)$ is a Markov chain with transition matrix $P$ started at state $i$, and it is independent of $(X_0, \ldots, X_m)$. In particular,
\begin{align*}
&\mathbb{P}(X_{m+1} = j_1, \ldots, X_{m+n} = j_n \mid X_0 = i_0, \ldots, X_m = i) \\
&\quad = P_{i j_1} P_{j_1 j_2} \cdots P_{j_{n-1} j_n}.
\end{align*}
This follows from the [Path Probability Formula](/theorems/2205), which gives the probability of a specified path in terms of the product of transition probabilities along that path.
Substituting back into the conditional probability, since $A_m = \{X_0 = i_0, \ldots, X_m = i_m\}$ on the event that $(i_0, \ldots, i_m) \in \mathcal{T}_m$, we obtain
\begin{align*}
\mathbb{P}(X_{m+1} = j_1, \ldots, X_{m+n} = j_n \mid A_m) = P_{i j_1} P_{j_1 j_2} \cdots P_{j_{n-1} j_n}.
\end{align*}
[guided]
This is the core of the proof: on the event $\{T = m\}$, the stopping time becomes a deterministic time, and we can invoke the ordinary Markov property.
Let us be precise about why $A_m = \{X_0 = i_0, \ldots, X_m = i_m\}$ under our assumptions. We defined $A_m = \{X_0 = i_0, \ldots, X_m = i_m\} \cap \{T = m\}$. The event $\{T = m\}$ is determined by $(X_0, \ldots, X_m)$, meaning there is a deterministic set $\mathcal{T}_m \subset S^{m+1}$ such that $\omega \in \{T = m\}$ iff $(X_0(\omega), \ldots, X_m(\omega)) \in \mathcal{T}_m$. Since $\mathbb{P}(A_m) > 0$, the specific trajectory $(i_0, \ldots, i_m)$ must lie in $\mathcal{T}_m$. But then on the event $\{X_0 = i_0, \ldots, X_m = i_m\}$, the condition $(X_0, \ldots, X_m) \in \mathcal{T}_m$ is automatically satisfied, so $\{T = m\}$ holds with certainty. The intersection with $\{T = m\}$ is therefore redundant:
\begin{align*}
A_m &= \{X_0 = i_0, \ldots, X_m = i_m\} \cap \{T = m\} \\
&= \{X_0 = i_0, \ldots, X_m = i_m\}.
\end{align*}
Now we apply the [Extended Markov Property](/theorems/2206) at the deterministic time $m$. This theorem states that for a Markov chain with transition matrix $P$, conditional on $(X_0, \ldots, X_m)$, the shifted process $(X_{m+k})_{k \geq 1}$ is conditionally independent of $(X_0, \ldots, X_{m-1})$ given $X_m$, and evolves as a Markov chain with the same transition matrix $P$.
The hypotheses of the [Extended Markov Property](/theorems/2206) require that we condition on a $\sigma(X_0, \ldots, X_m)$-measurable event of positive probability. The event $A_m = \{X_0 = i_0, \ldots, X_m = i_m\}$ satisfies both requirements: it belongs to $\sigma(X_0, \ldots, X_m)$ and has positive probability by assumption.
Applying the [Extended Markov Property](/theorems/2206), the conditional distribution of the future path depends only on $X_m = i$:
\begin{align*}
&\mathbb{P}(X_{m+1} = j_1, \ldots, X_{m+n} = j_n \mid X_0 = i_0, \ldots, X_m = i) \\
&\quad = \mathbb{P}(X_{m+1} = j_1, \ldots, X_{m+n} = j_n \mid X_m = i).
\end{align*}
By the [Path Probability Formula](/theorems/2205), which expresses the probability of a finite path through a Markov chain as the product of transition probabilities along that path, the right-hand side equals
\begin{align*}
\mathbb{P}(X_{m+1} = j_1, \ldots, X_{m+n} = j_n \mid X_m = i) = P_{i j_1} P_{j_1 j_2} \cdots P_{j_{n-1} j_n}.
\end{align*}
Therefore $\mathbb{P}(X_{m+1} = j_1, \ldots, X_{m+n} = j_n \mid A_m) = P_{ij_1} P_{j_1 j_2} \cdots P_{j_{n-1} j_n}$, as required.
[/guided]
[/step]
[step:Sum over $m$ to obtain the strong Markov property for the stopping time $T$]
We now assemble the result. Fix states $j_1, \ldots, j_n$ and condition on $\{T < \infty, X_T = i\}$. For any event $B$ concerning the past trajectory that is compatible with $T = m$ and $X_T = i$, the computation above shows that the conditional probability of the future path $(X_{T+1} = j_1, \ldots, X_{T+n} = j_n)$ equals $P_{ij_1} P_{j_1 j_2} \cdots P_{j_{n-1} j_n}$, regardless of $m$ and regardless of the specific past trajectory.
More precisely, let $B \in \sigma(X_0, X_1, \ldots, X_T)$ be any event determined by the past, and write $F := \{X_{T+1} = j_1, \ldots, X_{T+n} = j_n\}$ for the future event. We compute
\begin{align*}
\mathbb{P}(F \cap B \cap \{T < \infty, X_T = i\}) &= \sum_{m=0}^{\infty} \mathbb{P}(F \cap B \cap \{T = m, X_m = i\}).
\end{align*}
For each $m$, the event $B \cap \{T = m, X_m = i\}$ belongs to $\sigma(X_0, \ldots, X_m)$, and on this event $F = \{X_{m+1} = j_1, \ldots, X_{m+n} = j_n\}$. By the [Extended Markov Property](/theorems/2206) at time $m$,
\begin{align*}
\mathbb{P}(F \cap B \cap \{T = m, X_m = i\}) &= P_{ij_1} \cdots P_{j_{n-1} j_n} \cdot \mathbb{P}(B \cap \{T = m, X_m = i\}).
\end{align*}
Summing over $m \geq 0$:
\begin{align*}
\mathbb{P}(F \cap B \cap \{T < \infty, X_T = i\}) &= P_{ij_1} \cdots P_{j_{n-1} j_n} \cdot \sum_{m=0}^{\infty} \mathbb{P}(B \cap \{T = m, X_m = i\}) \\
&= P_{ij_1} \cdots P_{j_{n-1} j_n} \cdot \mathbb{P}(B \cap \{T < \infty, X_T = i\}).
\end{align*}
Dividing both sides by $\mathbb{P}(B \cap \{T < \infty, X_T = i\})$ (which is positive whenever the conditioning event is non-trivial), we obtain
\begin{align*}
\mathbb{P}(F \mid B \cap \{T < \infty, X_T = i\}) = P_{ij_1} P_{j_1 j_2} \cdots P_{j_{n-1} j_n}.
\end{align*}
Since this holds for every past event $B \in \sigma(X_0, \ldots, X_T)$ and every future path $(j_1, \ldots, j_n)$, we conclude:
1. **Independence**: The post-$T$ process $(X_{T+n})_{n \geq 0}$ is independent of the $\sigma$-algebra $\sigma(X_0, X_1, \ldots, X_T)$, conditional on $\{T < \infty, X_T = i\}$.
2. **Markov property with transition matrix $P$**: The finite-dimensional distributions of $(X_{T+n})_{n \geq 0}$ conditional on $\{T < \infty, X_T = i\}$ are given by $P_{ij_1} P_{j_1 j_2} \cdots P_{j_{n-1} j_n}$, which are precisely the finite-dimensional distributions of a Markov chain started at state $i$ with transition matrix $P$.
This completes the proof of the Strong Markov Property. $\blacksquare$
[guided]
The final step ties together the work from the previous steps by summing over $m$. The key structural observation is that the product of transition probabilities $P_{ij_1} \cdots P_{j_{n-1} j_n}$ does not depend on $m$. This is what allows the constant to factor out of the sum.
Let us verify the summation step carefully. We decompose $\{T < \infty\} = \bigcup_{m=0}^{\infty} \{T = m\}$, a countable disjoint union. For any events $F$ (future) and $B$ (past) and the event $\{X_T = i\}$:
\begin{align*}
\mathbb{P}(F \cap B \cap \{T < \infty, X_T = i\}) &= \sum_{m=0}^{\infty} \mathbb{P}(F \cap B \cap \{T = m, X_m = i\}).
\end{align*}
This uses countable additivity of $\mathbb{P}$ applied to the disjoint events $\{T = m\}$, $m = 0, 1, 2, \ldots$ Note that $X_T = i$ becomes $X_m = i$ on $\{T = m\}$.
For each $m$, we need to verify that $B \cap \{T = m, X_m = i\} \in \sigma(X_0, \ldots, X_m)$ so that the [Extended Markov Property](/theorems/2206) applies. The event $\{T = m\} \in \sigma(X_0, \ldots, X_m)$ by the stopping time definition, and $\{X_m = i\} \in \sigma(X_0, \ldots, X_m)$. The event $B$ belongs to $\sigma(X_0, \ldots, X_T)$, which by definition means $B \cap \{T = m\} \in \sigma(X_0, \ldots, X_m)$ for every $m$. Therefore $B \cap \{T = m, X_m = i\} \in \sigma(X_0, \ldots, X_m)$.
Now, on the event $\{T = m\}$, the future event $F = \{X_{T+1} = j_1, \ldots, X_{T+n} = j_n\}$ becomes $\{X_{m+1} = j_1, \ldots, X_{m+n} = j_n\}$. Applying the [Extended Markov Property](/theorems/2206) at the deterministic time $m$: for any event $C \in \sigma(X_0, \ldots, X_m)$ with $C \subset \{X_m = i\}$,
\begin{align*}
\mathbb{P}(\{X_{m+1} = j_1, \ldots, X_{m+n} = j_n\} \cap C) = P_{ij_1} \cdots P_{j_{n-1} j_n} \cdot \mathbb{P}(C).
\end{align*}
Taking $C = B \cap \{T = m, X_m = i\}$ (which satisfies $C \in \sigma(X_0, \ldots, X_m)$ and $C \subset \{X_m = i\}$), we get
\begin{align*}
\mathbb{P}(F \cap B \cap \{T = m, X_m = i\}) = P_{ij_1} \cdots P_{j_{n-1} j_n} \cdot \mathbb{P}(B \cap \{T = m, X_m = i\}).
\end{align*}
Summing over $m$ and factoring out the constant $P_{ij_1} \cdots P_{j_{n-1} j_n}$:
\begin{align*}
\mathbb{P}(F \cap B \cap \{T < \infty, X_T = i\}) &= P_{ij_1} \cdots P_{j_{n-1} j_n} \cdot \mathbb{P}(B \cap \{T < \infty, X_T = i\}).
\end{align*}
This is the factorisation identity $\mathbb{P}(F \cap G) = \mathbb{P}(F \mid G) \cdot \mathbb{P}(G)$ with $\mathbb{P}(F \mid G) = P_{ij_1} \cdots P_{j_{n-1} j_n}$, valid for every past event $B$. Since $B \in \sigma(X_0, \ldots, X_T)$ was arbitrary, this establishes both the independence from the past and the Markov property of the restarted chain. $\blacksquare$
[/guided]
[/step]