[proofplan]
We reduce the representation formula for the Cauchy problem to the [Martingale Problem](/theorems/2105) via a time-reversal trick. Given $u$ solving $\partial_t u = Lu$ with $u(0, \cdot) = f$, fix $t > 0$ and define $g(s, x) = u(t - s, x)$ for $s \in [0, t]$. A direct computation shows $(\partial_s + L)g = 0$, so the Martingale Problem yields that $g(s, X_s) = u(t - s, X_s)$ is a continuous local martingale on $[0, t]$. Boundedness of $u$ (which follows from $f \in C^2_b$) upgrades it to a true martingale. The martingale property $\mathbb{E}_x[g(t, X_t) \mid \mathcal{F}_s] = g(s, X_s)$ then gives $\mathbb{E}_x[f(X_t) \mid \mathcal{F}_s] = u(t - s, X_s)$, and setting $s = 0$ gives $u(t, x) = \mathbb{E}_x[f(X_t)]$.
[/proofplan]
[step:Define the time-reversed function $g(s, x) = u(t - s, x)$ and verify $(\partial_s + L)g = 0$]
Fix $t > 0$. Define the function
\begin{align*}
g : [0, t] \times \mathbb{R}^d &\to \mathbb{R} \\
(s, x) &\mapsto u(t - s, x).
\end{align*}
Since $u \in C^1(\mathbb{R}_+) \otimes C^2(\mathbb{R}^d)$, we have $g \in C^1([0, t]) \otimes C^2(\mathbb{R}^d)$.
Compute the action of $\partial_s + L$ on $g$. The time derivative is
\begin{align*}
\partial_s g(s, x) = -\partial_t u(t - s, x),
\end{align*}
by the chain rule (the inner function $s \mapsto t - s$ has derivative $-1$ with respect to $s$). The spatial operator $L$ acts on $g$ as
\begin{align*}
Lg(s, x) = Lu(t - s, x),
\end{align*}
since $L$ involves only spatial derivatives and $t - s$ acts as a parameter. Therefore
\begin{align*}
(\partial_s + L)g(s, x) = -\partial_t u(t - s, x) + Lu(t - s, x).
\end{align*}
Since $u$ solves the Cauchy problem, $\partial_t u(\tau, x) = Lu(\tau, x)$ for every $\tau \geq 0$. Substituting this at $\tau = t - s$ gives
\begin{align*}
(\partial_s + L)g(s, x) = -Lu(\tau, x) + Lu(\tau, x) = 0.
\end{align*}
[guided]
The Cauchy problem uses the equation $\partial_t u = Lu$, but the [Martingale Problem](/theorems/2105) involves the operator $\partial_s + L$. These have opposite signs on the time derivative. The time-reversal $g(s, x) = u(t - s, x)$ reconciles this: the chain rule introduces a minus sign, $\partial_s g = -\partial_t u$, which combined with $Lg = Lu$ produces $(\partial_s + L)g = -\partial_t u + Lu = 0$.
Why does the Martingale Problem use $\partial_s + L$ rather than $\partial_s - L$? Because the Itô formula for $f(s, X_s)$ produces $\partial_s f + Lf$ as the drift: the process $X$ moves forward in time, generating both $\partial_s$ (explicit time dependence) and $L$ (from the diffusion dynamics). So $\partial_s + L$ is the natural "forward" operator, while the Cauchy equation $\partial_t u = Lu$ (equivalently $\partial_t u - Lu = 0$) is the "backward" equation. The time reversal bridges the two.
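As a concrete check of the sign bookkeeping, consider the following illustrative special case, which is an assumption made only for this example and not part of the theorem: $d = 1$, $L = \frac{1}{2}\partial_{xx}$ (so that $X$ is a standard Brownian motion), and $f(x) = \sin x \in C^2_b(\mathbb{R})$. Then $u(\tau, x) = e^{-\tau/2}\sin x$ solves $\partial_\tau u = Lu$ with $u(0, \cdot) = f$, and the time-reversed function is $g(s, x) = e^{-(t - s)/2}\sin x$. Its derivatives are
\begin{align*}
\partial_s g(s, x) &= \tfrac{1}{2} e^{-(t - s)/2}\sin x, \\
Lg(s, x) &= \tfrac{1}{2}\partial_{xx} g(s, x) = -\tfrac{1}{2} e^{-(t - s)/2}\sin x,
\end{align*}
so $(\partial_s + L)g = 0$, whereas $(\partial_s - L)g(s, x) = e^{-(t - s)/2}\sin x$ does not vanish identically. Without the time reversal, the same computation applied to $u$ itself gives $(\partial_\tau + L)u(\tau, x) = -e^{-\tau/2}\sin x \neq 0$, which is why $u(s, X_s)$ is not the right process to consider.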
[/guided]
[/step]
[step:Apply the Martingale Problem to conclude $g(s, X_s)$ is a martingale on $[0, t]$]
By the [Martingale Problem](/theorems/2105), the process
\begin{align*}
M_s = g(s, X_s) - g(0, X_0) - \int_0^s (\partial_r + L)g(r, X_r) \, d\mathcal{L}^1(r)
\end{align*}
is a continuous local martingale. Since $(\partial_s + L)g = 0$ (established in the previous step), the integral vanishes. Using $X_0 = x$ under $\mathbb{P}_x$, so that $g(0, X_0) = u(t, x)$, we obtain
\begin{align*}
M_s = g(s, X_s) - g(0, X_0) = u(t - s, X_s) - u(t, x).
\end{align*}
Therefore $s \mapsto u(t - s, X_s) = M_s + u(t, x)$ is a continuous local martingale on $[0, t]$, since adding the constant $u(t, x)$ preserves the local martingale property.
We upgrade this to a true martingale. Since $f \in C^2_b(\mathbb{R}^d)$, the solution $u$ of the Cauchy problem is bounded, with $\|u\|_\infty \leq \|f\|_\infty$. This follows from the maximum principle for parabolic equations, or from the existence theory for the Cauchy problem with $f \in C^2_b$, which yields bounded solutions. (The representation $u(\tau, x) = \mathbb{E}_x[f(X_\tau)]$ gives the same bound, $|u(\tau, x)| \leq \|f\|_\infty$, but only once it has been established, so it cannot be used at this point without circularity.) Therefore $|u(t - s, X_s)| \leq \|u\|_\infty$ for all $s \in [0, t]$, so $|M_s| \leq 2\|u\|_\infty$ and $M$ is a bounded continuous local martingale. By the [Dominated Local Martingale is a Martingale](/theorems/2079) theorem, $M$ is a true martingale.
[guided]
There is a subtlety here: we need $u$ to be bounded to upgrade the local martingale to a true martingale. Where does this come from?
Since $f \in C^2_b(\mathbb{R}^d)$, the initial data is bounded. The Cauchy problem $\partial_t u = Lu$ with bounded initial data and bounded coefficients (since $\sigma$ and $b$ are bounded) has a unique bounded solution. This can be established by the parabolic maximum principle, which gives $\sup_{x} |u(t,x)| \leq \sup_x |f(x)| = \|f\|_\infty$. The probabilistic representation is consistent with this bound (once we know it holds, $|u(t,x)| = |\mathbb{E}_x[f(X_t)]| \leq \|f\|_\infty$), but it cannot serve as the justification here, since it is the statement being proved.
For the purpose of this proof, we take the boundedness of $u$ as given from the PDE existence theory. The key point is that $|M_s| = |u(t-s, X_s) - u(t,x)| \leq 2\|u\|_\infty$, so $M$ is bounded and therefore a true martingale by the [Dominated Local Martingale is a Martingale](/theorems/2079) theorem.
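A numerical sanity check of the martingale property can make this step more tangible. The sketch below is illustrative only and rests on assumptions that are not part of the theorem: $d = 1$, $L = \frac{1}{2}\partial_{xx}$ (so $X$ is a standard Brownian motion started at $x$), and $f = \sin$, for which $u(\tau, x) = e^{-\tau/2}\sin x$ is the bounded solution. The helper names `u` and `mean_of_g` are hypothetical. If $s \mapsto u(t - s, X_s)$ is a martingale, then $\mathbb{E}_x[u(t - s, X_s)]$ should not depend on $s$ and should equal $u(t, x)$.

```python
import math
import random


def u(tau, x):
    """Assumed example: solves du/dtau = (1/2) u_xx with u(0, x) = sin(x)."""
    return math.exp(-tau / 2.0) * math.sin(x)


def mean_of_g(s, t, x, n_paths, rng):
    """Monte Carlo estimate of E_x[u(t - s, X_s)] for X = x + Brownian motion."""
    total = 0.0
    for _ in range(n_paths):
        # For standard Brownian motion started at x, X_s ~ N(x, s).
        x_s = x + math.sqrt(s) * rng.gauss(0.0, 1.0)
        total += u(t - s, x_s)
    return total / n_paths


rng = random.Random(0)  # fixed seed so the run is reproducible
t, x = 1.0, 0.7
estimates = {s: mean_of_g(s, t, x, 100_000, rng) for s in (0.0, 0.25, 0.5, 1.0)}

for s, est in estimates.items():
    print(f"s = {s:4.2f}   E[u(t - s, X_s)] ~ {est:.4f}   u(t, x) = {u(t, x):.4f}")
```

Up to Monte Carlo error, every row should agree with $u(t, x)$. This only tests constancy of the expectation, which is a consequence of the martingale property rather than the full conditional statement, and it says nothing about diffusions other than the assumed Brownian example.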
[/guided]
[/step]
[step:Extract the representation formula from the martingale property]
Since $s \mapsto g(s, X_s) = u(t - s, X_s)$ is a martingale on $[0, t]$ under $\mathbb{P}_x$, we have, for $0 \leq s \leq t$,
\begin{align*}
\mathbb{E}_x[g(t, X_t) \mid \mathcal{F}_s] = g(s, X_s).
\end{align*}
Substituting the definition of $g$, and noting that $g(t, X_t) = u(t - t, X_t) = u(0, X_t)$:
\begin{align*}
\mathbb{E}_x[u(0, X_t) \mid \mathcal{F}_s] = u(t - s, X_s).
\end{align*}
Since $u(0, \cdot) = f$ (the initial condition of the Cauchy problem):
\begin{align*}
\mathbb{E}_x[f(X_t) \mid \mathcal{F}_s] = u(t - s, X_s).
\end{align*}
\end{align*}
Setting $s = 0$ and using $X_0 = x$:
\begin{align*}
u(t, x) = u(t - 0, X_0) = \mathbb{E}_x[f(X_t) \mid \mathcal{F}_0] = \mathbb{E}_x[f(X_t)].
\end{align*}
[guided]
The martingale property $\mathbb{E}_x[g(t, X_t) \mid \mathcal{F}_s] = g(s, X_s)$ encodes both statements in the theorem simultaneously:
**The conditional version** $\mathbb{E}_x[f(X_t) \mid \mathcal{F}_s] = u(t-s, X_s)$ says that the solution $u$ evaluated at $(t-s, X_s)$ gives the best prediction of $f(X_t)$ given information up to time $s$. This is the Markov property in action: the conditional expectation depends on the past only through the current state $X_s$, and the time parameter $t - s$ reflects the remaining time until the terminal observation.
**The unconditional version** $u(t, x) = \mathbb{E}_x[f(X_t)]$ (obtained by setting $s = 0$) is the Feynman-Kac formula for the Cauchy problem without a potential term. It states that the solution to the parabolic PDE $\partial_t u = Lu$ can be computed by running the diffusion $X$ starting at $x$ for time $t$ and averaging $f(X_t)$. This connects the analytic object (the PDE solution) to the probabilistic object (the diffusion process), and is the foundation of Monte Carlo methods for solving parabolic PDEs.
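The Monte Carlo recipe described above ("run the diffusion, then average $f(X_t)$") can be sketched as follows. This is a minimal illustration under stated assumptions, not a method taken from the theorem: it assumes $d = 1$ and $L = \frac{1}{2}\sigma(x)^2\partial_{xx} + b(x)\partial_x$, simulates $dX = b(X)\,dt + \sigma(X)\,dW$ with the Euler-Maruyama scheme, and uses hypothetical constant coefficients `mu` and `vol` so that a closed form, $u(t, x) = e^{-\mathrm{vol}^2 t/2}\sin(x + \mathrm{mu}\, t)$ for $f = \sin$, is available for comparison. The function names are likewise hypothetical.

```python
import math
import random


def euler_maruyama_endpoint(x0, t, b, sigma, n_steps, rng):
    """Approximate X_t for dX = b(X) dt + sigma(X) dW via Euler-Maruyama."""
    dt = t / n_steps
    x = x0
    for _ in range(n_steps):
        x += b(x) * dt + sigma(x) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


def monte_carlo_u(f, x0, t, b, sigma, n_paths=20_000, n_steps=50, seed=0):
    """Estimate u(t, x0) = E_{x0}[f(X_t)] by averaging f over simulated endpoints."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += f(euler_maruyama_endpoint(x0, t, b, sigma, n_steps, rng))
    return total / n_paths


# Hypothetical constant coefficients, chosen so that an exact answer exists.
mu, vol = 0.3, 0.8
t, x0 = 1.0, 0.5

estimate = monte_carlo_u(math.sin, x0, t, b=lambda x: mu, sigma=lambda x: vol)
exact = math.exp(-vol**2 * t / 2.0) * math.sin(x0 + mu * t)
print(f"Monte Carlo estimate: {estimate:.4f}   closed form: {exact:.4f}")
```

With constant coefficients the Euler-Maruyama endpoint has exactly the law of $X_t$, so the only error is statistical, of order $1/\sqrt{n_{\text{paths}}}$. For state-dependent $b$ and $\sigma$ the scheme also introduces a time-discretization bias, which this sketch does not attempt to control.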
[/guided]
[/step]