[proofplan]
The strategy is to reduce both sides of the claimed identity to explicit computations on cylinder events, where the [Path Probability Formula](/theorems/2205) provides a closed-form expression for joint path probabilities. We express the joint event $\{F,\, X_n = i,\, H\}$ for cylinder sets $F$ and $H$, factor the resulting product into a past component (depending on $H$ and $X_n = i$) and a future component (depending on $F$ and $X_n = i$), and observe that the past component cancels upon conditioning. The extension from cylinder events to general events determined by the past or future follows from the fact that cylinder events generate the relevant $\sigma$-algebras.
[/proofplan]
[step:Reduce to cylinder events for the past and future]
It suffices to prove the identity for events of the form
\begin{align*}
H &= \{X_0 = i_0,\, X_1 = i_1,\, \ldots,\, X_{n-1} = i_{n-1}\}, \\
F &= \{X_{n+1} = j_1,\, X_{n+2} = j_2,\, \ldots,\, X_{n+m} = j_m\},
\end{align*}
where $i_0, i_1, \ldots, i_{n-1} \in S$ and $j_1, j_2, \ldots, j_m \in S$ are fixed states and $m \geq 1$ is a fixed integer. The collection of all such cylinder sets for the past forms a $\pi$-system that generates $\sigma(X_0, X_1, \ldots, X_{n-1})$, and the collection of all such cylinder sets for the future forms a $\pi$-system that generates $\sigma(X_{n+1}, X_{n+2}, \ldots)$. Since the conditional probabilities $\mathbb{P}(\,\cdot \mid X_n = i,\, H)$ and $\mathbb{P}(\,\cdot \mid X_n = i)$ are measures in $F$ that agree on a $\pi$-system, Dynkin's $\pi$-$\lambda$ theorem guarantees they agree on the entire generated $\sigma$-algebra.
We assume throughout that $\mathbb{P}(X_n = i,\, H) > 0$, since the conditional probability $\mathbb{P}(F \mid X_n = i,\, H)$ is defined only when the conditioning event has positive probability.
[guided]
The identity we must prove is
\begin{align*}
\mathbb{P}(F \mid X_n = i,\, H) &= \mathbb{P}(F \mid X_n = i),
\end{align*}
and the definition of conditional probability gives
\begin{align*}
\mathbb{P}(F \mid X_n = i,\, H) &= \frac{\mathbb{P}(F \cap \{X_n = i\} \cap H)}{\mathbb{P}(\{X_n = i\} \cap H)}.
\end{align*}
To evaluate this, we need to compute joint probabilities of the form $\mathbb{P}(X_0 = i_0, \ldots, X_{n+m} = j_m)$. The [Path Probability Formula](/theorems/2205) provides exactly this, but only for complete specifications of the chain at consecutive times $0, 1, \ldots, k$. This is why we restrict to cylinder events: they specify the chain at a consecutive block of times, and together with $\{X_n = i\}$ they produce a full path specification from time $0$ to time $n + m$.
Why does this suffice for general $H$ and $F$? The cylinder events $\{X_0 = i_0, \ldots, X_{n-1} = i_{n-1}\}$ form a $\pi$-system (the intersection of two such events is either empty or another event of the same form) that generates the $\sigma$-algebra $\sigma(X_0, \ldots, X_{n-1})$. Similarly, the future cylinder events form a $\pi$-system generating $\sigma(X_{n+1}, X_{n+2}, \ldots)$. For fixed $H$, the maps $F \mapsto \mathbb{P}(F \mid X_n = i,\, H)$ and $F \mapsto \mathbb{P}(F \mid X_n = i)$ are both probability measures in $F$. If two probability measures agree on a $\pi$-system, then by Dynkin's $\pi$-$\lambda$ theorem they agree on the $\sigma$-algebra generated by that $\pi$-system. An analogous argument in $H$ (for each fixed cylinder $F$) extends the result to all events determined by the past.
We assume $\mathbb{P}(X_n = i,\, H) > 0$, since the conditional probability on the left-hand side is undefined otherwise.
[/guided]
[/step]
[step:Compute $\mathbb{P}(F,\, X_n = i,\, H)$ using the Path Probability Formula]
Let $\lambda := (\lambda_k)_{k \in S}$ denote the initial distribution of the chain, so that $\lambda_k = \mathbb{P}(X_0 = k)$ for each $k \in S$. The event $F \cap \{X_n = i\} \cap H$ specifies the chain at every time from $0$ to $n + m$:
\begin{align*}
F \cap \{X_n = i\} \cap H &= \{X_0 = i_0,\, X_1 = i_1,\, \ldots,\, X_{n-1} = i_{n-1},\, X_n = i,\, X_{n+1} = j_1,\, \ldots,\, X_{n+m} = j_m\}.
\end{align*}
Applying the [Path Probability Formula](/theorems/2205) to this path of length $n + m$:
\begin{align*}
\mathbb{P}(F,\, X_n = i,\, H) &= \lambda_{i_0}\, p_{i_0, i_1}\, p_{i_1, i_2} \cdots p_{i_{n-2}, i_{n-1}}\, p_{i_{n-1}, i}\, p_{i, j_1}\, p_{j_1, j_2} \cdots p_{j_{m-1}, j_m}.
\end{align*}
[guided]
The event $F \cap \{X_n = i\} \cap H$ pins down the entire trajectory of the chain at times $0, 1, \ldots, n + m$. Writing $\lambda_k = \mathbb{P}(X_0 = k)$ for the initial distribution, the [Path Probability Formula](/theorems/2205) states that for any sequence of states $s_0, s_1, \ldots, s_N \in S$,
\begin{align*}
\mathbb{P}(X_0 = s_0,\, X_1 = s_1,\, \ldots,\, X_N = s_N) &= \lambda_{s_0}\, \prod_{k=0}^{N-1} p_{s_k, s_{k+1}}.
\end{align*}
We apply this with $N = n + m$ and the state sequence $(i_0, i_1, \ldots, i_{n-1}, i, j_1, j_2, \ldots, j_m)$. The resulting product of transition probabilities is
\begin{align*}
\mathbb{P}(F,\, X_n = i,\, H) &= \lambda_{i_0}\, p_{i_0, i_1}\, p_{i_1, i_2} \cdots p_{i_{n-2}, i_{n-1}}\, p_{i_{n-1}, i}\, p_{i, j_1}\, p_{j_1, j_2} \cdots p_{j_{m-1}, j_m}.
\end{align*}
The key structural observation is that this is a single product of $n + m$ transition probabilities multiplied by the initial probability $\lambda_{i_0}$. In the next step, we will group these factors to reveal a factorisation into past and future.
[/guided]
[/step]
[step:Factor the product into past and future components]
We group the product from the previous step into three parts: the initial probability, the transitions through the past states up to time $n$, and the transitions through the future states from time $n$ onward:
\begin{align*}
\mathbb{P}(F,\, X_n = i,\, H) &= \underbrace{\lambda_{i_0}\, p_{i_0, i_1} \cdots p_{i_{n-1}, i}}_{\text{past and present}} \;\cdot\; \underbrace{p_{i, j_1}\, p_{j_1, j_2} \cdots p_{j_{m-1}, j_m}}_{\text{future given state } i}.
\end{align*}
The first factor is precisely $\mathbb{P}(X_n = i,\, H)$. To verify this, we apply the [Path Probability Formula](/theorems/2205) to the path $(i_0, i_1, \ldots, i_{n-1}, i)$ of length $n$:
\begin{align*}
\mathbb{P}(X_n = i,\, H) &= \mathbb{P}(X_0 = i_0,\, X_1 = i_1,\, \ldots,\, X_{n-1} = i_{n-1},\, X_n = i) \\
&= \lambda_{i_0}\, p_{i_0, i_1} \cdots p_{i_{n-1}, i}.
\end{align*}
Therefore:
\begin{align*}
\mathbb{P}(F,\, X_n = i,\, H) &= \mathbb{P}(X_n = i,\, H) \;\cdot\; p_{i, j_1}\, p_{j_1, j_2} \cdots p_{j_{m-1}, j_m}.
\end{align*}
[guided]
The product structure of the [Path Probability Formula](/theorems/2205) is what makes this factorisation possible. The formula expresses the probability of a full path as $\lambda_{i_0}$ times a product of one-step transition probabilities $p_{s_k, s_{k+1}}$. Because this is a product (not, say, a sum or a more complicated expression), we can split it at any time index.
We split at time $n$. The first $n$ transition factors $p_{i_0, i_1}, p_{i_1, i_2}, \ldots, p_{i_{n-1}, i}$, together with $\lambda_{i_0}$, depend only on the past states $i_0, \ldots, i_{n-1}$ and the present state $i$. The remaining $m$ transition factors $p_{i, j_1}, p_{j_1, j_2}, \ldots, p_{j_{m-1}, j_m}$ depend only on the present state $i$ and the future states $j_1, \ldots, j_m$.
To identify the first factor, note that $\{X_n = i\} \cap H = \{X_0 = i_0, \ldots, X_{n-1} = i_{n-1}, X_n = i\}$, which specifies the chain at consecutive times $0, 1, \ldots, n$. Applying the [Path Probability Formula](/theorems/2205) to this path of length $n$:
\begin{align*}
\mathbb{P}(X_n = i,\, H) &= \lambda_{i_0}\, p_{i_0, i_1}\, p_{i_1, i_2} \cdots p_{i_{n-1}, i}.
\end{align*}
This is exactly the "past and present" factor. Substituting:
\begin{align*}
\mathbb{P}(F,\, X_n = i,\, H) &= \mathbb{P}(X_n = i,\, H) \;\cdot\; p_{i, j_1}\, p_{j_1, j_2} \cdots p_{j_{m-1}, j_m}.
\end{align*}
The crucial point is that the second factor $p_{i, j_1} \cdots p_{j_{m-1}, j_m}$ depends on the present state $i$ and the future states $j_1, \ldots, j_m$, but does **not** depend on the past states $i_0, \ldots, i_{n-1}$. This is the product structure of the Markov chain at work: once the state at time $n$ is specified, the transition probabilities into the future are completely determined by the transition matrix $P$, regardless of how the chain arrived at state $i$.
[/guided]
[/step]
[step:Divide by $\mathbb{P}(X_n = i,\, H)$ to obtain the conditional probability]
Since $\mathbb{P}(X_n = i,\, H) > 0$, dividing both sides of the factorisation by $\mathbb{P}(X_n = i,\, H)$ yields:
\begin{align*}
\mathbb{P}(F \mid X_n = i,\, H) &= \frac{\mathbb{P}(F,\, X_n = i,\, H)}{\mathbb{P}(X_n = i,\, H)} = p_{i, j_1}\, p_{j_1, j_2} \cdots p_{j_{m-1}, j_m}.
\end{align*}
The right-hand side depends only on the present state $i$ and the future states $j_1, \ldots, j_m$. In particular, the past states $i_0, \ldots, i_{n-1}$ have been entirely eliminated.
[/step]
[step:Show that $\mathbb{P}(F \mid X_n = i)$ equals the same product]
We now compute $\mathbb{P}(F \mid X_n = i)$ independently. Using the definition of conditional probability:
\begin{align*}
\mathbb{P}(F \mid X_n = i) &= \frac{\mathbb{P}(F,\, X_n = i)}{\mathbb{P}(X_n = i)}.
\end{align*}
To evaluate the numerator, we sum over all possible past trajectories. The event $\{F,\, X_n = i\}$ can be decomposed as a disjoint union over all state sequences $(i_0', i_1', \ldots, i_{n-1}') \in S^n$:
\begin{align*}
\mathbb{P}(F,\, X_n = i) &= \sum_{i_0', \ldots, i_{n-1}' \in S} \mathbb{P}(X_0 = i_0',\, \ldots,\, X_{n-1} = i_{n-1}',\, X_n = i,\, X_{n+1} = j_1,\, \ldots,\, X_{n+m} = j_m).
\end{align*}
Applying the [Path Probability Formula](/theorems/2205) to each term and factoring as before:
\begin{align*}
\mathbb{P}(F,\, X_n = i) &= \sum_{i_0', \ldots, i_{n-1}' \in S} \lambda_{i_0'}\, p_{i_0', i_1'} \cdots p_{i_{n-1}', i} \;\cdot\; p_{i, j_1}\, p_{j_1, j_2} \cdots p_{j_{m-1}, j_m} \\
&= \Bigl(\sum_{i_0', \ldots, i_{n-1}' \in S} \lambda_{i_0'}\, p_{i_0', i_1'} \cdots p_{i_{n-1}', i}\Bigr) \cdot p_{i, j_1}\, p_{j_1, j_2} \cdots p_{j_{m-1}, j_m}.
\end{align*}
The sum in parentheses equals $\mathbb{P}(X_n = i)$, since it sums the [Path Probability Formula](/theorems/2205) over all paths arriving at state $i$ at time $n$:
\begin{align*}
\sum_{i_0', \ldots, i_{n-1}' \in S} \lambda_{i_0'}\, p_{i_0', i_1'} \cdots p_{i_{n-1}', i} &= \sum_{i_0', \ldots, i_{n-1}' \in S} \mathbb{P}(X_0 = i_0',\, \ldots,\, X_{n-1} = i_{n-1}',\, X_n = i) = \mathbb{P}(X_n = i).
\end{align*}
Dividing by $\mathbb{P}(X_n = i) > 0$:
\begin{align*}
\mathbb{P}(F \mid X_n = i) &= p_{i, j_1}\, p_{j_1, j_2} \cdots p_{j_{m-1}, j_m}.
\end{align*}
[guided]
We need to verify that $\mathbb{P}(F \mid X_n = i)$ — the conditional probability of the future given only the present, with no past conditioning — equals the same product of transition probabilities obtained in the previous step.
The definition of conditional probability gives $\mathbb{P}(F \mid X_n = i) = \mathbb{P}(F,\, X_n = i) / \mathbb{P}(X_n = i)$. Unlike the previous computation, the event $\{F,\, X_n = i\}$ does not specify the chain at times $0, 1, \ldots, n-1$. To apply the [Path Probability Formula](/theorems/2205), we must introduce all possible past trajectories. Since the events $\{X_0 = i_0', \ldots, X_{n-1} = i_{n-1}'\}$ for different sequences $(i_0', \ldots, i_{n-1}') \in S^n$ are mutually exclusive and exhaustive (they partition the sample space by the values of $X_0, \ldots, X_{n-1}$), we decompose:
\begin{align*}
\mathbb{P}(F,\, X_n = i) &= \sum_{i_0', \ldots, i_{n-1}' \in S} \mathbb{P}(X_0 = i_0',\, \ldots,\, X_{n-1} = i_{n-1}',\, X_n = i,\, X_{n+1} = j_1,\, \ldots,\, X_{n+m} = j_m).
\end{align*}
Each summand is now a full path specification from time $0$ to time $n + m$, so the [Path Probability Formula](/theorems/2205) applies:
\begin{align*}
&= \sum_{i_0', \ldots, i_{n-1}' \in S} \lambda_{i_0'}\, p_{i_0', i_1'} \cdots p_{i_{n-1}', i} \;\cdot\; p_{i, j_1}\, p_{j_1, j_2} \cdots p_{j_{m-1}, j_m}.
\end{align*}
The factor $p_{i, j_1} \cdots p_{j_{m-1}, j_m}$ does not depend on the summation indices $(i_0', \ldots, i_{n-1}')$, so we extract it:
\begin{align*}
&= \Bigl(\sum_{i_0', \ldots, i_{n-1}' \in S} \lambda_{i_0'}\, p_{i_0', i_1'} \cdots p_{i_{n-1}', i}\Bigr) \cdot p_{i, j_1}\, p_{j_1, j_2} \cdots p_{j_{m-1}, j_m}.
\end{align*}
The parenthesised sum is
\begin{align*}
\sum_{i_0', \ldots, i_{n-1}' \in S} \mathbb{P}(X_0 = i_0',\, X_1 = i_1',\, \ldots,\, X_{n-1} = i_{n-1}',\, X_n = i) &= \mathbb{P}(X_n = i),
\end{align*}
since summing the joint probability over all values of $X_0, \ldots, X_{n-1}$ marginalises those variables out. Substituting and dividing by $\mathbb{P}(X_n = i) > 0$:
\begin{align*}
\mathbb{P}(F \mid X_n = i) &= p_{i, j_1}\, p_{j_1, j_2} \cdots p_{j_{m-1}, j_m}.
\end{align*}
This is exactly the expression obtained in the previous step for $\mathbb{P}(F \mid X_n = i,\, H)$.
[/guided]
[/step]
[step:Conclude that the past and present conditional probabilities coincide]
Combining the results of the two preceding steps, we have shown that for cylinder events $H$ and $F$:
\begin{align*}
\mathbb{P}(F \mid X_n = i,\, H) &= p_{i, j_1}\, p_{j_1, j_2} \cdots p_{j_{m-1}, j_m} = \mathbb{P}(F \mid X_n = i).
\end{align*}
Both conditional probabilities reduce to the same product of transition probabilities, which depends only on the present state $i$ and the future states $j_1, \ldots, j_m$. The past trajectory encoded in $H$ does not appear.
By the $\pi$-$\lambda$ argument described in the first step, the identity extends from cylinder events to all events $H \in \sigma(X_0, \ldots, X_{n-1})$ and $F \in \sigma(X_{n+1}, X_{n+2}, \ldots)$, completing the proof.
[/step]