Feynman-Kac Representation of the Bootstrap Particle Filter

Theorem


Fix $t\in\mathbb N\cup\{0\}$. Let $(E,\mathcal E)$ be a measurable state space and let $\lambda$ be a $\sigma$-finite reference measure on $(E,\mathcal E)$. For $0\le s\le t$, let $(F_s,\mathcal F_s)$ be a measurable observation space with a $\sigma$-finite reference measure $\nu_s$, and fix an observation $y_s\in F_s$. Define the product $\sigma$-algebras and reference measures \begin{align*} \mathcal E_{0:t}:=\mathcal E^{\otimes(t+1)},\qquad \mathcal F_{0:t}:=\mathcal F_0\otimes\cdots\otimes\mathcal F_t, \end{align*} \begin{align*} \lambda_{0:t}:=\lambda^{\otimes(t+1)},\qquad \nu_{0:t}:=\nu_0\otimes\cdots\otimes\nu_t. \end{align*} Consider a dominated state space model consisting of random variables $X_s:(\Omega,\mathcal A,\mathbb P)\to(E,\mathcal E)$ and $Y_s:(\Omega,\mathcal A,\mathbb P)\to(F_s,\mathcal F_s)$ for $0\le s\le t$, an initial density $\mu_0:E\to[0,\infty)$, transition densities $f_s:E\times E\to[0,\infty)$ for $1\le s\le t$, and observation likelihood densities $g_s:F_s\times E\to[0,\infty)$ for $0\le s\le t$, all measurable, such that \begin{align*} \int_E \mu_0(x_0)\,d\lambda(x_0)=1. \end{align*} For every $1\le s\le t$ and every $x_{s-1}\in E$, assume \begin{align*} \int_E f_s(x_s\mid x_{s-1})\,d\lambda(x_s)=1. \end{align*} Assume the model factorizes in the following dominated sense: the path-valued [random variable](/page/Random%20Variable) $X_{0:t}:=(X_0,\dots,X_t):(\Omega,\mathcal A)\to(E^{t+1},\mathcal E_{0:t})$ has density \begin{align*} \mu_0(x_0)\prod_{s=1}^{t} f_s(x_s\mid x_{s-1}) \end{align*} with respect to $\lambda_{0:t}$, and, conditionally on $X_0=x_0,\dots,X_t=x_t$, the observation-valued random variable $Y_{0:t}:=(Y_0,\dots,Y_t):(\Omega,\mathcal A)\to(F_0\times\cdots\times F_t,\mathcal F_{0:t})$ has conditional density \begin{align*} \prod_{s=0}^{t} g_s(z_s\mid x_s) \end{align*} with respect to $\nu_{0:t}$ at $z_{0:t}=(z_0,\dots,z_t)$. 
Equivalently, the joint law of $(X_{0:t},Y_{0:t})$ has density \begin{align*} \mu_0(x_0)\prod_{s=1}^{t} f_s(x_s\mid x_{s-1})\prod_{s=0}^{t} g_s(z_s\mid x_s) \end{align*} with respect to $\lambda_{0:t}\otimes\nu_{0:t}$. Define the initial Feynman-Kac input measure $\eta_0^{\mathrm{init}}$ on $(E,\mathcal E)$ by \begin{align*} \eta_0^{\mathrm{init}}(A)=\int_A \mu_0(x_0)\,d\lambda(x_0) \end{align*} for every $A\in\mathcal E$. For $1\le s\le t$, define the mutation kernel $M_s:E\times\mathcal E\to[0,1]$ by \begin{align*} M_s(x_{s-1},A)=\int_A f_s(x_s\mid x_{s-1})\,d\lambda(x_s) \end{align*} for every $x_{s-1}\in E$ and $A\in\mathcal E$. For $0\le s\le t$, define the potential $G_s:E\to[0,\infty)$ by \begin{align*} G_s(x_s)=g_s(y_s\mid x_s). \end{align*} Let $(Q_s)_{1\le s\le t}$ be the proposal kernels of the particle filter, where each $Q_s:E\times\mathcal E\to[0,1]$ is a Markov kernel. The bootstrap particle filter considered here is the mutation-selection particle system whose initial law is $\eta_0^{\mathrm{init}}$, whose proposal kernels are $(Q_s)_{1\le s\le t}$, and whose incremental weight functions are $G_0$ at time $0$ and $G_s$ after mutation at time $s$ for $1\le s\le t$. For each $0\le s\le t$, let $\gamma_s$ be the unnormalized Feynman-Kac measure on $(E,\mathcal E)$ defined by \begin{align*} \gamma_s(A)=\int_{E^{s+1}}\mathbb{1}_A(x_s)\,\mu_0(x_0)\,G_0(x_0)\prod_{r=1}^{s} f_r(x_r\mid x_{r-1})G_r(x_r)\,d\lambda^{\otimes(s+1)}(x_0,\dots,x_s). \end{align*} Assume that each Feynman-Kac normalizing constant satisfies \begin{align*} 0<\gamma_s(E)<\infty \end{align*} for every $0\le s\le t$. Define the normalized terminal Feynman-Kac measure $\eta_t^{\mathrm{FK}}$ by \begin{align*} \eta_t^{\mathrm{FK}}(A)=\frac{\gamma_t(A)}{\gamma_t(E)} \end{align*} for every $A\in\mathcal E$. 
If the proposal kernel in the particle filter is the transition kernel, namely $Q_s=M_s$ for every $1\le s\le t$, then the resulting bootstrap particle filter targets the Feynman-Kac flow generated by the input measure $\eta_0^{\mathrm{init}}$, the mutation kernels $(M_s)_{1\le s\le t}$, and the potentials $(G_s)_{0\le s\le t}$. Moreover, $\eta_t^{\mathrm{FK}}$ is a version of the regular conditional distribution of $X_t$ given $Y_{0:t}=y_{0:t}$, where $y_{0:t}:=(y_0,\dots,y_t)$.
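
The Feynman-Kac flow named in the statement can be made concrete by the following recursion, given here as an illustrative sketch that follows from the definition of $\gamma_s$ and Tonelli's theorem. The notation $\eta_s^{\mathrm{FK}}:=\gamma_s/\gamma_s(E)$ for intermediate times $0\le s<t$ is an assumption introduced only for this illustration; the statement defines it only at $s=t$. For every $A\in\mathcal E$, \begin{align*} \gamma_0(A)=\int_A G_0(x_0)\,\eta_0^{\mathrm{init}}(dx_0),\qquad \gamma_s(A)=\int_E \gamma_{s-1}(dx_{s-1})\int_A M_s(x_{s-1},dx_s)\,G_s(x_s)\quad(1\le s\le t). \end{align*} Dividing numerator and denominator by $\gamma_{s-1}(E)$, which is permitted because $0<\gamma_{s-1}(E)<\infty$, gives the normalized update \begin{align*} \eta_s^{\mathrm{FK}}(A)=\frac{\int_E \eta_{s-1}^{\mathrm{FK}}(dx_{s-1})\int_A M_s(x_{s-1},dx_s)\,G_s(x_s)}{\int_E \eta_{s-1}^{\mathrm{FK}}(dx_{s-1})\int_E M_s(x_{s-1},dx_s)\,G_s(x_s)}. \end{align*} In the case $t=0$ no mutation occurs and the normalized measure reduces to Bayes' rule, \begin{align*} \eta_0^{\mathrm{FK}}(A)=\frac{\int_A \mu_0(x_0)\,g_0(y_0\mid x_0)\,d\lambda(x_0)}{\int_E \mu_0(x_0)\,g_0(y_0\mid x_0)\,d\lambda(x_0)}. \end{align*}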

Discussion

Proof

[proofplan] We identify the Feynman-Kac path weight with the joint density of the latent path and the fixed observations, using the dominated factorization and conditional independence assumptions in the model. Marginalizing the weighted path density over $x_0,\dots,x_{t-1}$ gives the joint density of $(X_t,Y_{0:t})$ evaluated at the fixed observations $y_{0:t}$. The assumption $0<\gamma_t(E)<\infty$ permits normalization, and the normalized joint density is exactly the conditional filtering density. Finally, when the proposal is the transition kernel, the bootstrap particle filter uses these mutation kernels and observation potentials, so its ideal target flow is the Feynman-Kac flow generated by the initial input measure, mutations, and potentials. [/proofplan] [step:Write the Feynman-Kac path density] Let $p_t:E^{t+1}\to[0,\infty)$ denote the measurable path density \begin{align*} p_t(x_0,\dots,x_t)=\mu_0(x_0)\,g_0(y_0\mid x_0)\prod_{s=1}^{t} f_s(x_s\mid x_{s-1})g_s(y_s\mid x_s). \end{align*} Here the product over $s=1,\dots,t$ is interpreted as $1$ when $t=0$. By the definition $G_s(x_s)=g_s(y_s\mid x_s)$ of the potentials, the integrand in the definition of $\gamma_t$ is exactly $\mathbb{1}_A(x_t)\,p_t(x_0,\dots,x_t)$, so the unnormalized Feynman-Kac measure satisfies \begin{align*} \gamma_t(A)=\int_{E^{t+1}}\mathbb{1}_A(x_t)\,p_t(x_0,\dots,x_t)\,d\lambda^{\otimes(t+1)}(x_0,\dots,x_t) \end{align*} for every $A\in\mathcal E$. [/step] [step:Identify the path density with the joint density of states and observations] For $x_0,\dots,x_t\in E$, the dominated state space model assigns the latent path density \begin{align*} \mu_0(x_0)\prod_{s=1}^{t} f_s(x_s\mid x_{s-1}) \end{align*} with respect to $\lambda^{\otimes(t+1)}$, and, conditionally on this latent path, the observation likelihood at the fixed observations $y_0,\dots,y_t$ is \begin{align*} \prod_{s=0}^{t} g_s(y_s\mid x_s). \end{align*} Multiplying these two factors gives exactly $p_t(x_0,\dots,x_t)$. 
Let $\mathcal E_{0:t}=\mathcal E^{\otimes(t+1)}$ and $\mathcal F_{0:t}=\mathcal F_0\otimes\cdots\otimes\mathcal F_t$ denote the product $\sigma$-algebras defined in the theorem statement. Define the path-valued [random variable](/page/Random%20Variable) $X_{0:t}:(\Omega,\mathcal A)\to(E^{t+1},\mathcal E_{0:t})$ by $X_{0:t}=(X_0,\dots,X_t)$, define the observation-valued random variable $Y_{0:t}:(\Omega,\mathcal A)\to(F_0\times\cdots\times F_t,\mathcal F_{0:t})$ by $Y_{0:t}=(Y_0,\dots,Y_t)$, and write $y_{0:t}$ for the fixed tuple $(y_0,\dots,y_t)\in F_0\times\cdots\times F_t$. Thus $p_t$ is the latent-path section of the full dominated joint density of $(X_{0:t},Y_{0:t})$ with respect to $\lambda^{\otimes(t+1)}\otimes\nu_0\otimes\cdots\otimes\nu_t$, evaluated at the fixed observation tuple $y_{0:t}$, in the sense specified in the theorem statement. [guided] The goal is to connect the abstract Feynman-Kac weight with the probabilistic model. The model has two pieces. First, the density of the latent path $(X_0,\dots,X_t)$ is obtained by multiplying the initial density and the transition densities: \begin{align*} \mu_0(x_0)\prod_{s=1}^{t} f_s(x_s\mid x_{s-1}). \end{align*} This is a density with respect to the product reference measure $\lambda^{\otimes(t+1)}$ on $E^{t+1}$. Second, once the latent path is fixed at $(x_0,\dots,x_t)$, the likelihood of observing the fixed data $y_0,\dots,y_t$ is the product of the observation likelihoods: \begin{align*} \prod_{s=0}^{t} g_s(y_s\mid x_s). \end{align*} The observation at time $s$ depends on the state $x_s$, so the factor at time $s$ is $g_s(y_s\mid x_s)$. Multiplying the latent path density by the conditional observation likelihood gives \begin{align*} \mu_0(x_0)\prod_{s=1}^{t} f_s(x_s\mid x_{s-1})\prod_{s=0}^{t} g_s(y_s\mid x_s). \end{align*} Reordering the factors gives \begin{align*} \mu_0(x_0)\,g_0(y_0\mid x_0)\prod_{s=1}^{t} f_s(x_s\mid x_{s-1})g_s(y_s\mid x_s)=p_t(x_0,\dots,x_t). 
\end{align*} In dominated-density language, this means that for measurable sets $B\in\mathcal E^{\otimes(t+1)}$ the joint density contribution of $X_{0:t}\in B$ and the fixed observation value $Y_{0:t}=y_{0:t}$ is obtained by integrating $p_t$ over $B$ with respect to $\lambda^{\otimes(t+1)}$. Therefore the Feynman-Kac path density is precisely the joint dominated density of the latent trajectory and the observed data. [/guided] [/step] [step:Marginalize the path density to obtain the joint density of $X_t$ and the observations] Define $h_t:E\to[0,\infty]$ by \begin{align*} h_t(x_t)=\int_{E^t}\mu_0(x_0)\,g_0(y_0\mid x_0)\prod_{s=1}^{t} f_s(x_s\mid x_{s-1})g_s(y_s\mid x_s)\,d\lambda^{\otimes t}(x_0,\dots,x_{t-1}). \end{align*} For $t=0$, this definition means $h_0(x_0)=\mu_0(x_0)g_0(y_0\mid x_0)$. Since the integrand is non-negative and measurable, Tonelli's theorem applies and permits us to integrate first over $x_0,\dots,x_{t-1}$; the resulting function $h_t$ is measurable. For every $A\in\mathcal E$, \begin{align*} \gamma_t(A)=\int_A h_t(x_t)\,d\lambda(x_t). \end{align*} By the preceding step and Tonelli's theorem, $h_t$ is the section at $y_{0:t}$ of the joint density of $(X_t,Y_{0:t})$ with respect to $\lambda\otimes\nu_0\otimes\cdots\otimes\nu_t$. [/step] [step:Normalize the joint density to get the filtering distribution] Taking $A=E$ in the previous identity, we obtain \begin{align*} \gamma_t(E)=\int_E h_t(x_t)\,d\lambda(x_t). \end{align*} By hypothesis, $0<\gamma_t(E)<\infty$, so $h_t$ is $\lambda$-integrable, hence finite $\lambda$-almost everywhere, and the function $\bar h_t:E\to[0,\infty]$ defined by \begin{align*} \bar h_t(x_t)=\frac{h_t(x_t)}{\gamma_t(E)} \end{align*} is a normalized density with respect to $\lambda$. Hence, for every $A\in\mathcal E$, \begin{align*} \eta_t^{\mathrm{FK}}(A)=\frac{\gamma_t(A)}{\gamma_t(E)}=\int_A \bar h_t(x_t)\,d\lambda(x_t). 
\end{align*} Since $h_t$ is the section at $y_{0:t}$ of the joint density of $(X_t,Y_{0:t})$ with respect to $\lambda\otimes\nu_0\otimes\cdots\otimes\nu_t$, and $\gamma_t(E)$ is the corresponding marginal likelihood at $y_{0:t}$, the density $\bar h_t$ is a version of the regular conditional density of $X_t$ given $Y_{0:t}=y_{0:t}$. Therefore $\eta_t^{\mathrm{FK}}$ is a version of the regular conditional distribution of $X_t$ given $Y_{0:t}=y_{0:t}$, that is, the filtering distribution. [/step] [step:Match the bootstrap particle filter with the Feynman-Kac mutation and potential structure] By the bootstrap particle filter specification in the theorem statement, the filter propagates particles at time $s$ with proposal kernel $Q_s$ and assigns weights using the observation likelihood. Under the hypothesis $Q_s=M_s$, for every $1\le s\le t$, every $x_{s-1}\in E$, and every $A\in\mathcal E$, \begin{align*} Q_s(x_{s-1},A)=M_s(x_{s-1},A)=\int_A f_s(x_s\mid x_{s-1})\,d\lambda(x_s). \end{align*} The corresponding weighting function at time $s$ is the observation likelihood evaluated at the proposed state, \begin{align*} G_s(x_s)=g_s(y_s\mid x_s). \end{align*} At time $0$, the initial particles are distributed according to the initial input measure $\eta_0^{\mathrm{init}}$ and weighted by \begin{align*} G_0(x_0)=g_0(y_0\mid x_0). \end{align*} The normalization condition on the transition densities gives $M_s(x_{s-1},E)=1$ for every $1\le s\le t$ and every $x_{s-1}\in E$, so each $M_s$ is a valid transition probability kernel. Thus the ideal mutation-selection recursion of the bootstrap particle filter is exactly the Feynman-Kac recursion generated by $\eta_0^{\mathrm{init}}$, the kernels $(M_s)_{1\le s\le t}$, and the potentials $(G_s)_{0\le s\le t}$. Since the normalized measure at time $t$ in this flow was shown above to be the filtering distribution, the bootstrap particle filter targets the Feynman-Kac flow and, in particular, targets the filtering distribution of $X_t$ given $Y_0=y_0,\dots,Y_t=y_t$. [/step]
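
The mutation-selection recursion matched in the last step can be illustrated numerically. The following is a minimal sketch and not part of the proof. It assumes a scalar linear Gaussian model, chosen here only because its exact filtering mean is available from the Kalman filter for comparison. The function names, the parameter values, and the use of multinomial resampling at every step are illustrative assumptions, not something fixed by the theorem.

```python
import numpy as np


def simulate_observations(t, a, sx, sy, rng):
    """Draw y_0..y_t from the assumed model X_0~N(0,1), X_s=a*X_{s-1}+sx*eps, Y_s=X_s+sy*delta."""
    x = rng.normal()
    ys = []
    for s in range(t + 1):
        if s > 0:
            x = a * x + sx * rng.normal()
        ys.append(x + sy * rng.normal())
    return np.array(ys)


def bootstrap_filter_mean(ys, n_particles, a, sx, sy, rng):
    """Particle estimate of the filtering mean E[X_t | Y_{0:t} = y_{0:t}]."""
    # Time 0: sample from the initial law (density mu_0 = N(0, 1)).
    x = rng.normal(0.0, 1.0, size=n_particles)
    w = None
    for s, y in enumerate(ys):
        if s > 0:
            # Selection: multinomial resampling with the previous normalized weights.
            idx = rng.choice(n_particles, size=n_particles, p=w)
            # Mutation with the proposal equal to the transition kernel (Q_s = M_s).
            x = a * x[idx] + sx * rng.normal(size=n_particles)
        # Potential G_s(x) = g_s(y_s | x), up to a constant that cancels on normalization.
        logw = -0.5 * ((y - x) / sy) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
    return float(np.sum(w * x))


def kalman_filter_mean(ys, a, sx, sy):
    """Exact filtering mean for the same scalar linear Gaussian model."""
    m, p = 0.0, 1.0  # mean and variance of X_0
    for s, y in enumerate(ys):
        if s > 0:
            m, p = a * m, a * a * p + sx * sx  # prediction step
        k = p / (p + sy * sy)                  # Kalman gain
        m, p = m + k * (y - m), (1.0 - k) * p  # update step
    return m


rng = np.random.default_rng(0)
ys = simulate_observations(10, 0.9, 1.0, 1.0, rng)
print("particle estimate:", bootstrap_filter_mean(ys, 20000, 0.9, 1.0, 1.0, rng))
print("Kalman mean:      ", kalman_filter_mean(ys, 0.9, 1.0, 1.0))
```

With a large number of particles the two printed values should be close, which is consistent with the weighted particle system approximating $\eta_t^{\mathrm{FK}}$. Resampling at every step is the simplest selection scheme; other schemes change the variance of the estimate but not the flow being targeted.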
