Finite-Dimensional Distributions of a Markov Chain from Its Initial Law and Transition Kernel (Theorem # 9960)

Proof

[proofplan] We prove a slightly stronger integral identity in which the final indicator $\mathbb 1_{A_n}(X_n)$ is replaced by an arbitrary bounded measurable terminal function $g(X_n)$. This stronger form is stable under the induction step, because conditioning on $\mathcal F_n$ converts the last indicator $\mathbb 1_{A_{n+1}}(X_{n+1})$ into the [measurable function](/page/Measurable%20Function) $x_n\mapsto K(x_n,A_{n+1})$. The desired finite-dimensional formula follows by taking $g=\mathbb 1_{A_n}$. [/proofplan]

[step:Establish the terminal-function identity by induction]

For each integer $n\geq 0$, let $P(n)$ denote the following assertion: for all sets $A_0,\dots,A_{n-1}\in\mathcal E$ when $n\geq 1$, and for every bounded $\mathcal E$-measurable function $g:E\to\mathbb R$,
\begin{align*} \mathbb E\left[\left(\prod_{j=0}^{n-1}\mathbb 1_{A_j}(X_j)\right)g(X_n)\right] = \int_{A_0}\mu(dx_0)\int_{A_1}K(x_0,dx_1)\cdots\int_{A_{n-1}}K(x_{n-2},dx_{n-1})\int_E g(x_n)\,K(x_{n-1},dx_n), \end{align*}
with the convention that, when $n=0$, the empty product on the left is $1$ and the right-hand side is $\int_E g(x_0)\,d\mu(x_0)$.

For $n=0$, since $\mathbb P\circ X_0^{-1}=\mu$, the change-of-variables formula for pushforward measures gives
\begin{align*} \mathbb E[g(X_0)] = \int_E g(x_0)\,d\mu(x_0). \end{align*}
Thus $P(0)$ holds.

Assume $P(n)$ holds for some $n\geq 0$. Let $A_0,\dots,A_n\in\mathcal E$ and let $g:E\to\mathbb R$ be bounded and $\mathcal E$-measurable. Define the bounded $\mathcal F_n$-measurable [random variable](/page/Random%20Variable)
\begin{align*} Y_n:\Omega&\to\mathbb R \end{align*}
\begin{align*} \omega&\mapsto \prod_{j=0}^{n}\mathbb 1_{A_j}(X_j(\omega)). \end{align*}
The measurability follows because the process is adapted, so $X_j$ is $\mathcal F_j$-measurable and hence $\mathcal F_n$-measurable for every $0\leq j\leq n$.
By the pull-out property and the [tower property of conditional expectation](/theorems/1150),
\begin{align*} \mathbb E[Y_n g(X_{n+1})] = \mathbb E\left[Y_n\,\mathbb E[g(X_{n+1})\mid\mathcal F_n]\right]. \end{align*}
The assumed one-step kernel identity applies to $g$, since $g$ is bounded and $\mathcal E$-measurable. Hence
\begin{align*} \mathbb E[Y_n g(X_{n+1})] = \mathbb E\left[Y_n\int_E g(x_{n+1})\,K(X_n,dx_{n+1})\right]. \end{align*}

Define the bounded $\mathcal E$-measurable function
\begin{align*} h:E&\to\mathbb R \end{align*}
\begin{align*} x&\mapsto \mathbb 1_{A_n}(x)\int_E g(x_{n+1})\,K(x,dx_{n+1}). \end{align*}
Define the finite constant $\|g\|_\infty:=\sup_{z\in E}|g(z)|$. The function $x\mapsto \int_E g(x_{n+1})\,K(x,dx_{n+1})$ is $\mathcal E$-measurable: for indicator functions this is the defining measurability property of a transition kernel, and the general bounded measurable case follows by linearity and approximation by simple functions. Its absolute value is bounded by $\|g\|_\infty$, because each $K(x,\cdot)$ is a probability measure. Therefore $h$ is bounded and $\mathcal E$-measurable. Using the definition of $h$, we rewrite the last expectation as
\begin{align*} \mathbb E[Y_n g(X_{n+1})] = \mathbb E\left[\left(\prod_{j=0}^{n-1}\mathbb 1_{A_j}(X_j)\right)h(X_n)\right]. \end{align*}

If $n=0$, applying $P(0)$ to the terminal function $h$ gives
\begin{align*} \mathbb E[Y_0 g(X_1)] = \int_E h(x_0)\,d\mu(x_0). \end{align*}
Substituting the definition of $h$ and restricting the $x_0$-integration to $A_0$ because of the factor $\mathbb 1_{A_0}(x_0)$ gives
\begin{align*} \mathbb E[Y_0 g(X_1)] = \int_{A_0}\mu(dx_0)\int_E g(x_1)\,K(x_0,dx_1). \end{align*}
This is $P(1)$.

If $n\geq 1$, applying the induction hypothesis $P(n)$ to the terminal function $h$ gives
\begin{align*} \mathbb E[Y_n g(X_{n+1})] = \int_{A_0}\mu(dx_0)\int_{A_1}K(x_0,dx_1)\cdots\int_{A_{n-1}}K(x_{n-2},dx_{n-1})\int_E h(x_n)\,K(x_{n-1},dx_n). \end{align*}
Substituting the definition of $h$ and restricting the final $x_n$-integration to $A_n$ because of the factor $\mathbb 1_{A_n}(x_n)$, we obtain
\begin{align*} \mathbb E[Y_n g(X_{n+1})] = \int_{A_0}\mu(dx_0)\int_{A_1}K(x_0,dx_1)\cdots\int_{A_n}K(x_{n-1},dx_n)\int_E g(x_{n+1})\,K(x_n,dx_{n+1}). \end{align*}
Thus $P(n+1)$ holds. By induction, $P(n)$ holds for every $n\geq 0$.

[guided]

The reason for proving the stronger statement is that the induction step naturally produces a terminal function rather than only a terminal event. Fix an integer $n\geq 0$. The assertion $P(n)$ says that if we prescribe events up to time $n-1$ and then evaluate a bounded measurable function at time $n$, the expectation is computed by starting with $\mu$ and then integrating successively against the kernel $K$.

For the base case $n=0$, no transition kernel is involved. Since $X_0$ has law $\mu$, meaning $\mathbb P\circ X_0^{-1}=\mu$, every bounded $\mathcal E$-measurable function $g:E\to\mathbb R$ satisfies
\begin{align*} \mathbb E[g(X_0)] = \int_E g(x_0)\,d\mu(x_0). \end{align*}
This is exactly $P(0)$.

Now assume $P(n)$ is known. Let $A_0,\dots,A_n\in\mathcal E$ and let $g:E\to\mathbb R$ be bounded and $\mathcal E$-measurable. Define
\begin{align*} Y_n:\Omega&\to\mathbb R \end{align*}
\begin{align*} \omega&\mapsto \prod_{j=0}^{n}\mathbb 1_{A_j}(X_j(\omega)). \end{align*}
Because the process is adapted, $X_j$ is $\mathcal F_j$-measurable. Since the filtration is increasing, $\mathcal F_j\subset\mathcal F_n$ for $j\leq n$, so each indicator $\mathbb 1_{A_j}(X_j)$ is $\mathcal F_n$-measurable. Hence $Y_n$ is bounded and $\mathcal F_n$-measurable. We condition on $\mathcal F_n$ because the hypothesis describes the conditional law of $X_{n+1}$ given $\mathcal F_n$.
The pull-out property for [conditional expectation](/page/Conditional%20Expectation) and the tower identity give
\begin{align*} \mathbb E[Y_n g(X_{n+1})] = \mathbb E\left[\mathbb E[Y_n g(X_{n+1})\mid\mathcal F_n]\right] = \mathbb E\left[Y_n\,\mathbb E[g(X_{n+1})\mid\mathcal F_n]\right]. \end{align*}
The function $g$ is bounded and $\mathcal E$-measurable, so the assumed kernel identity applies:
\begin{align*} \mathbb E[g(X_{n+1})\mid\mathcal F_n] = \int_E g(x_{n+1})\,K(X_n,dx_{n+1}) \end{align*}
almost surely. Therefore
\begin{align*} \mathbb E[Y_n g(X_{n+1})] = \mathbb E\left[Y_n\int_E g(x_{n+1})\,K(X_n,dx_{n+1})\right]. \end{align*}
At this point the future has been absorbed into a function of the present state $X_n$.

Define
\begin{align*} h:E&\to\mathbb R \end{align*}
\begin{align*} x&\mapsto \mathbb 1_{A_n}(x)\int_E g(x_{n+1})\,K(x,dx_{n+1}). \end{align*}
Define $\|g\|_\infty:=\sup_{z\in E}|g(z)|$, which is finite because $g$ is bounded. By the definition of a transition kernel, $x\mapsto K(x,A)$ is $\mathcal E$-measurable for every $A\in\mathcal E$; by linearity and approximation of $g$ by simple functions, the map $x\mapsto \int_E g(x_{n+1})\,K(x,dx_{n+1})$ is $\mathcal E$-measurable as well. Its absolute value is bounded by $\|g\|_\infty$, because $K(x,\cdot)$ is a [probability measure](/page/Probability%20Measure) for each $x\in E$. Multiplying by $\mathbb 1_{A_n}$ preserves boundedness and measurability, so $h$ is an admissible terminal function for the induction hypothesis. Using $h(X_n)$ to combine the event at time $n$ with the averaged future term, we get
\begin{align*} \mathbb E[Y_n g(X_{n+1})] = \mathbb E\left[\left(\prod_{j=0}^{n-1}\mathbb 1_{A_j}(X_j)\right)h(X_n)\right]. \end{align*}

Now apply the induction hypothesis to $h$, treating the case $n=0$ separately. If $n=0$, then $P(0)$ gives
\begin{align*} \mathbb E[Y_0 g(X_1)] = \int_E h(x_0)\,d\mu(x_0). \end{align*}
Substituting the definition of $h$ gives the factor $\mathbb 1_{A_0}(x_0)$, so the integral over $E$ restricts to $A_0$:
\begin{align*} \mathbb E[Y_0 g(X_1)] = \int_{A_0}\mu(dx_0)\int_E g(x_1)\,K(x_0,dx_1). \end{align*}
This is exactly $P(1)$.

If $n\geq 1$, then $P(n)$ gives
\begin{align*} \mathbb E[Y_n g(X_{n+1})] = \int_{A_0}\mu(dx_0)\int_{A_1}K(x_0,dx_1)\cdots\int_{A_{n-1}}K(x_{n-2},dx_{n-1})\int_E h(x_n)\,K(x_{n-1},dx_n). \end{align*}
Substituting the definition of $h$ gives the factor $\mathbb 1_{A_n}(x_n)$ inside the last integral. This restricts that integral to $A_n$, so
\begin{align*} \mathbb E[Y_n g(X_{n+1})] = \int_{A_0}\mu(dx_0)\int_{A_1}K(x_0,dx_1)\cdots\int_{A_n}K(x_{n-1},dx_n)\int_E g(x_{n+1})\,K(x_n,dx_{n+1}). \end{align*}
This is precisely $P(n+1)$. Hence induction proves $P(n)$ for every $n\geq 0$.

[/guided]

[/step]

[step:Recover the finite-dimensional distribution formula from the terminal-function identity]

Fix $n\geq 0$ and sets $A_0,\dots,A_n\in\mathcal E$. If $n=0$, the law of $X_0$ gives
\begin{align*} \mathbb P(X_0\in A_0) = \mu(A_0). \end{align*}
This is the stated formula with no kernel factors.

Assume $n\geq 1$. Apply the identity $P(n)$ from the previous step with terminal function
\begin{align*} g:E&\to\mathbb R \end{align*}
\begin{align*} x&\mapsto \mathbb 1_{A_n}(x). \end{align*}
This function is bounded and $\mathcal E$-measurable. The left-hand side becomes
\begin{align*} \mathbb E\left[\prod_{j=0}^{n}\mathbb 1_{A_j}(X_j)\right] = \mathbb P(X_0\in A_0,\dots,X_n\in A_n), \end{align*}
because the product of indicators is the indicator of the intersection of the events $\{X_j\in A_j\}$. The right-hand side becomes
\begin{align*} \int_{A_0}\mu(dx_0)\int_{A_1}K(x_0,dx_1)\cdots\int_{A_n}K(x_{n-1},dx_n). \end{align*}
Thus the asserted finite-dimensional distribution formula holds for all $n\geq 0$ and all $A_0,\dots,A_n\in\mathcal E$.

[/step]
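As an informal sanity check, and not as part of the proof, the formula can be tested numerically on a finite state space. There $\mu$ is a probability vector, $K$ is a row-stochastic matrix, and each integral becomes a finite sum. The sketch below assumes states labelled $0,\dots,m-1$. The function names `fdd_iterated` and `fdd_bruteforce` and the example values of `mu`, `K`, and `sets` are hypothetical choices made for illustration. The first function follows the induction step, repeatedly replacing the terminal function $g$ by $h(x)=\mathbb 1_{A_j}(x)\sum_y K(x,y)\,g(y)$. The second sums the probabilities of all admissible paths directly.

```python
from itertools import product


def fdd_iterated(mu, K, sets):
    """P(X_0 in A_0, ..., X_n in A_n) by backward recursion on terminal functions."""
    states = range(len(mu))
    # Start with the terminal function g = 1_{A_n}.
    g = [1.0 if x in sets[-1] else 0.0 for x in states]
    # Absorb times n-1, ..., 0: h(x) = 1_{A_j}(x) * sum_y K[x][y] * g(y).
    for A in reversed(sets[:-1]):
        g = [
            sum(K[x][y] * g[y] for y in states) if x in A else 0.0
            for x in states
        ]
    # Finally integrate against the initial law mu.
    return sum(mu[x] * g[x] for x in states)


def fdd_bruteforce(mu, K, sets):
    """Same probability, summing mu(x_0) K(x_0,x_1) ... K(x_{n-1},x_n) over paths."""
    total = 0.0
    for path in product(*sets):
        p = mu[path[0]]
        for a, b in zip(path, path[1:]):
            p *= K[a][b]
        total += p
    return total


# Hypothetical three-state example (illustrative values only).
mu = [0.5, 0.3, 0.2]
K = [
    [0.9, 0.1, 0.0],
    [0.2, 0.5, 0.3],
    [0.0, 0.4, 0.6],
]
sets = [{0, 1}, {1, 2}, {2}]

print(fdd_iterated(mu, K, sets), fdd_bruteforce(mu, K, sets))
```

The two printed values should agree up to floating-point rounding. With a single set, `sets = [A_0]`, the recursion loop is empty and the result reduces to $\mu(A_0)$, which matches the case $n=0$.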
