Unbiasedness of the Particle Filter Likelihood Estimator

Theorem

Edit Issues Pull Requests Attributions Admin

Let $T \in \mathbb{N}$ and $N \in \mathbb{N}$. For each $0 \leq s \leq M$, let $(E_s,\mathcal E_s,\lambda_s)$ be a reference [measure space](/page/Measure%20Space), and write $\lambda_{0:s}=\lambda_0 \otimes \cdots \otimes \lambda_s$ on $E_{0:s}=E_0 \times \cdots \times E_s$. Fix observations $y_{0:M}$. Let $\mu_0:E_0 \to [0,\infty)$ be a measurable initial density, let $f_s:E_s \times E_{s-1} \to [0,\infty)$ be measurable transition densities for $1 \leq s \leq M$, and let $g_s(y_s\mid \cdot):E_s \to [0,\infty)$ be measurable likelihood functions for $0 \leq s \leq M$. Define $\gamma_M:E_{0:M}\to[0,\infty)$ by \begin{align*} \gamma_M(x_{0:M})=\mu_0(x_0)g_0(y_0\mid x_0)\prod_{s=1}^M f_s(x_s\mid x_{s-1})g_s(y_s\mid x_s). \end{align*} Assume $\gamma_M$ is $\lambda_{0:M}$-integrable and define \begin{align*} Z_M=\int_{E_{0:M}}\gamma_M(x_{0:M})\,d\lambda_{0:M}(x_{0:M})<\infty. \end{align*} Let $q_0:E_0\to[0,\infty)$ be a proposal density with respect to $\lambda_0$ that is positive wherever $\mu_0(x_0)g_0(y_0\mid x_0)>0$. For each $1\leq s\leq M$, let $Q_s:E_s\times E_{0:s-1}\to[0,\infty)$ be a proposal transition density with respect to $\lambda_s$, written $Q_s(x_s\mid x_{0:s-1})$, that is positive wherever $f_s(x_s\mid x_{s-1})g_s(y_s\mid x_s)>0$. Define $\omega_0:E_0\to[0,\infty)$ by $\omega_0(x_0)=\mu_0(x_0)g_0(y_0\mid x_0)/q_0(x_0)$ on $\{q_0>0\}$, and for $1\leq s\leq M$ define $\omega_s:E_{0:s}\to[0,\infty)$ by \begin{align*} \omega_s(x_{0:s})=\frac{f_s(x_s\mid x_{s-1})g_s(y_s\mid x_s)}{Q_s(x_s\mid x_{0:s-1})} \end{align*} on $\{Q_s(\cdot\mid x_{0:s-1})>0\}$. Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space on which the following particle filter is defined. Draw $X_{0,1},\dots,X_{0,N}:\Omega\to E_0$ independently with density $q_0$ and set $W_{0,i}=\omega_0(X_{0,i})$. Let $\mathcal F_0=\sigma(X_{0,1},\dots,X_{0,N},W_{0,1},\dots,W_{0,N})$. For each $1\leq s\leq M$, conditional on $\mathcal F_{s-1}$, draw ancestor indices $A_{s,1},\dots,A_{s,N}:\Omega\to\{1,\dots,N\}$ whose offspring counts $C_{s,j}=\#\{i:A_{s,i}=j\}$ satisfy \begin{align*} \mathbb E[C_{s,j}\mid\mathcal F_{s-1}]=\frac{N W_{s-1,j}}{\sum_{k=1}^N W_{s-1,k}} \end{align*} for every $1\leq j\leq N$. Conditional on $A_{s,1},\dots,A_{s,N}$ and $\mathcal F_{s-1}$, draw $X_{s,i}:\Omega\to E_s$ independently with density $Q_s(\cdot\mid X_{0:s-1,A_{s,i}})$ with respect to $\lambda_s$, form $X_{0:s,i}=(X_{0:s-1,A_{s,i}},X_{s,i})$, and set $W_{s,i}=\omega_s(X_{0:s,i})$. Let $\mathcal F_s$ be the sigma-algebra generated by all particles, ancestor indices, and weights up to time $s$. Assume all displayed weights and the estimator below are integrable, and assume $\sum_{i=1}^N W_{s,i}>0$ almost surely for every $0\leq s\leq M$. Define \begin{align*} \widehat Z_{M,N}=\prod_{s=0}^M\left(\frac{1}{N}\sum_{i=1}^N W_{s,i}\right). \end{align*} Then $\mathbb E[\widehat Z_{M,N}]=Z_M$. If the model is interpreted as a hidden Markov model, define $p(y_{0:M})$ by \begin{align*} p(y_{0:M})=\int_{E_{0:M}}\gamma_M(x_{0:M})\,d\lambda_{0:M}(x_{0:M}). \end{align*} In that interpretation, $Z_M=p(y_{0:M})$.

Discussion

Proof

[proofplan] We introduce the exact unnormalized Feynman-Kac functionals $\gamma_s(h)$ and their particle analogues $\gamma_{s,N}(h)$, chosen so that $\gamma_{M,N}(1)$ is exactly the product likelihood estimator. The proof is then a one-step conditional unbiasedness argument: at time $0$ the proposal density $q_0$ and the weight $\omega_0$ recover the initial unnormalized measure, while at later times unbiased resampling preserves the previous empirical measure in [conditional expectation](/page/Conditional%20Expectation) and mutation with the weight ratio recovers the next Feynman-Kac prediction. Iterating these identities by conditional expectation gives $\mathbb{E}[\gamma_{M,N}(1)]=\gamma_M(1)=Z_M$. [/proofplan] [step:Define the exact and particle unnormalized functionals] For $0 \leq s \leq M$, define the unnormalized path density \begin{align*} \gamma_s:E_0 \times \cdots \times E_s \to [0,\infty) \end{align*} by \begin{align*} \gamma_s(x_{0:s}) = \mu_0(x_0)g_0(y_0\mid x_0) \prod_{r=1}^s f_r(x_r\mid x_{r-1})g_r(y_r\mid x_r). \end{align*} For every bounded measurable function \begin{align*} h:E_0 \times \cdots \times E_s \to \mathbb{R}, \end{align*} for which the following integral is finite, define \begin{align*} \gamma_s(h) = \int_{E_0 \times \cdots \times E_s} h(x_{0:s})\gamma_s(x_{0:s}) \,d(\lambda_0 \otimes \cdots \otimes \lambda_s)(x_{0:s}). \end{align*} Thus $\gamma_M(1)=Z_M$. For $0 \leq s \leq M$, define the empirical average weight \begin{align*} \overline W_s=\frac{1}{N}\sum_{i=1}^N W_{s,i}. \end{align*} For each realized particle system, define the time-zero particle unnormalized functional as the map \begin{align*} \gamma_{0,N}:\{h:E_0\to\mathbb{R}\text{ measurable with finite displayed sum}\}\to\mathbb{R} \end{align*} by \begin{align*} \gamma_{0,N}(h)=\frac{1}{N}\sum_{i=1}^N W_{0,i} h(X_{0,i}). \end{align*} For $1 \leq s \leq M$, define the time-$s$ particle unnormalized functional as the map \begin{align*} \gamma_{s,N}:\{h:E_0\times\cdots\times E_s\to\mathbb{R}\text{ measurable with finite displayed sum}\}\to\mathbb{R} \end{align*} by \begin{align*} \gamma_{s,N}(h)=\left(\prod_{r=0}^{s-1}\overline W_r\right)\frac{1}{N}\sum_{i=1}^N W_{s,i} h(X_{0:s,i}). \end{align*} With $h\equiv 1$, this gives \begin{align*} \gamma_{M,N}(1) = \prod_{s=0}^M \overline W_s = \widehat Z_{M,N}. \end{align*} [/step] [step:Verify the initial unbiasedness identity by integrating under $q_0$] Let \begin{align*} h:E_0 \to \mathbb{R} \end{align*} be measurable and integrable with respect to the initial unnormalized measure. Since $X_{0,1},\dots,X_{0,N}$ are independent with density $q_0$ with respect to $\lambda_0$, linearity of expectation gives \begin{align*} \mathbb{E}[\gamma_{0,N}(h)] = \frac{1}{N}\sum_{i=1}^N \mathbb{E}[\omega_0(X_{0,i})h(X_{0,i})]. \end{align*} For each $i$, the definition of $\omega_0$ and the positivity assumption on $q_0$ give \begin{align*} \mathbb{E}[\omega_0(X_{0,i})h(X_{0,i})] = \int_{E_0} h(x_0) \frac{\mu_0(x_0)g_0(y_0\mid x_0)}{q_0(x_0)} q_0(x_0) \,d\lambda_0(x_0). \end{align*} Hence \begin{align*} \mathbb{E}[\omega_0(X_{0,i})h(X_{0,i})] = \int_{E_0} h(x_0)\mu_0(x_0)g_0(y_0\mid x_0) \,d\lambda_0(x_0) = \gamma_0(h). \end{align*} Therefore \begin{align*} \mathbb{E}[\gamma_{0,N}(h)] = \gamma_0(h). \end{align*} [guided] The first step checks that importance sampling is unbiased before any resampling occurs. We take a measurable function $h:E_0 \to \mathbb{R}$ such that the displayed expectations and integrals are finite. The random variables $X_{0,1},\dots,X_{0,N}$ are independent and each has density $q_0$ with respect to $\lambda_0$, while $W_{0,i}=\omega_0(X_{0,i})$. Therefore \begin{align*} \mathbb{E}[\gamma_{0,N}(h)] = \mathbb{E}\left[ \frac{1}{N}\sum_{i=1}^N W_{0,i} h(X_{0,i}) \right]. \end{align*} By linearity of expectation and the identical distribution of the particles, \begin{align*} \mathbb{E}[\gamma_{0,N}(h)] = \frac{1}{N}\sum_{i=1}^N \mathbb{E}[\omega_0(X_{0,i})h(X_{0,i})]. \end{align*} Now compute one summand using the proposal density $q_0$: \begin{align*} \mathbb{E}[\omega_0(X_{0,i})h(X_{0,i})] = \int_{E_0} h(x_0)\omega_0(x_0)q_0(x_0) \,d\lambda_0(x_0). \end{align*} The positivity assumption on $q_0$ ensures that the ratio defining $\omega_0$ is valid wherever the numerator can contribute. Substituting \begin{align*} \omega_0(x_0)=\frac{\mu_0(x_0)g_0(y_0\mid x_0)}{q_0(x_0)} \end{align*} gives \begin{align*} \mathbb{E}[\omega_0(X_{0,i})h(X_{0,i})] = \int_{E_0} h(x_0)\mu_0(x_0)g_0(y_0\mid x_0) \,d\lambda_0(x_0). \end{align*} The last integral is exactly $\gamma_0(h)$ by the definition of the initial unnormalized functional. Averaging $N$ identical copies of the same expectation does not change the value, so \begin{align*} \mathbb{E}[\gamma_{0,N}(h)] = \gamma_0(h). \end{align*} [/guided] [/step] [step:Express one exact Feynman-Kac transition as a weighted proposal integral] Fix $1 \leq s \leq M$. For every measurable function \begin{align*} h:E_0 \times \cdots \times E_s \to \mathbb{R}, \end{align*} define \begin{align*} G_s h:E_0 \times \cdots \times E_{s-1} \to \mathbb{R} \end{align*} by \begin{align*} (G_s h)(x_{0:s-1}) = \int_{E_s} h(x_{0:s}) f_s(x_s\mid x_{s-1})g_s(y_s\mid x_s) \,d\lambda_s(x_s), \end{align*} whenever the integral is finite. Then the product definition of $\gamma_s$ and [Tonelli's theorem](/theorems/601) for nonnegative integrands, extended to integrable signed functions by applying the same argument to positive and negative parts, gives \begin{align*} \gamma_s(h)=\gamma_{s-1}(G_s h). \end{align*} Moreover, by the definition of $\omega_s$, \begin{align*} (G_s h)(x_{0:s-1}) = \int_{E_s} h(x_{0:s})\omega_s(x_{0:s})Q_s(x_s\mid x_{0:s-1}) \,d\lambda_s(x_s) \end{align*} for every previous path $x_{0:s-1}$ for which the integral is finite. The positivity assumption on $Q_s$ ensures that the ratio defining $\omega_s$ is valid on the support of the unnormalized transition density. [/step] [step:Show unbiased resampling and mutation give the conditional Feynman-Kac update] Fix $1 \leq s \leq M$, and let \begin{align*} h:E_0 \times \cdots \times E_s \to \mathbb{R} \end{align*} be measurable with the required integrability. Let $\mathcal F_{s-1}$ denote the sigma-algebra generated by all particles, weights, and resampling variables through time $s-1$. Conditional on $\mathcal F_{s-1}$, define the offspring count \begin{align*} C_{s,j}=\#\{i:A_{s,i}=j\} \end{align*} for $1 \leq j \leq N$. By assumption, \begin{align*} \mathbb{E}[C_{s,j}\mid\mathcal F_{s-1}] = \frac{N W_{s-1,j}}{\sum_{k=1}^N W_{s-1,k}}. \end{align*} Since mutation is conditionally independent given the ancestor indices and $\mathcal F_{s-1}$, the proposal-density calculation from the previous step gives \begin{align*} \mathbb{E}[W_{s,i} h(X_{0:s,i})\mid A_{s,i}=j,\mathcal F_{s-1}] = (G_s h)(X_{0:s-1,j}). \end{align*} Therefore \begin{align*} \mathbb{E}\left[ \frac{1}{N}\sum_{i=1}^N W_{s,i} h(X_{0:s,i}) \mid \mathcal F_{s-1} \right] = \frac{1}{N}\sum_{j=1}^N \mathbb{E}[C_{s,j}\mid\mathcal F_{s-1}] (G_s h)(X_{0:s-1,j}). \end{align*} Substituting the unbiased offspring-count identity yields \begin{align*} \mathbb{E}\left[ \frac{1}{N}\sum_{i=1}^N W_{s,i} h(X_{0:s,i}) \mid \mathcal F_{s-1} \right] = \frac{\sum_{j=1}^N W_{s-1,j}(G_s h)(X_{0:s-1,j})} {\sum_{k=1}^N W_{s-1,k}}. \end{align*} Multiplying by $\prod_{r=0}^{s-1}\overline W_r$, which is $\mathcal F_{s-1}$-measurable, and using \begin{align*} \overline W_{s-1} = \frac{1}{N}\sum_{k=1}^N W_{s-1,k} \end{align*} gives \begin{align*} \mathbb{E}[\gamma_{s,N}(h)\mid\mathcal F_{s-1}] = \gamma_{s-1,N}(G_s h). \end{align*} [guided] This is the main particle-filter step. We want to prove that one resampling-mutation-weighting cycle has the correct conditional mean. Fix $1 \leq s \leq M$ and a measurable function $h:E_0 \times \cdots \times E_s \to \mathbb{R}$ for which the relevant conditional expectations are finite. The sigma-algebra $\mathcal F_{s-1}$ contains the previous paths $X_{0:s-1,j}$ and weights $W_{s-1,j}$, so these quantities are fixed when we condition on $\mathcal F_{s-1}$. For each previous particle $j$, define the random offspring count \begin{align*} C_{s,j}=\#\{i:A_{s,i}=j\}. \end{align*} The resampling scheme is not assumed to be multinomial or independent across children. The only property used is the unbiased count condition \begin{align*} \mathbb{E}[C_{s,j}\mid\mathcal F_{s-1}] = \frac{N W_{s-1,j}}{\sum_{k=1}^N W_{s-1,k}}. \end{align*} The denominator is positive almost surely by hypothesis, so these conditional probabilities and expectations are well-defined. Now condition further on the event that the $i$-th new particle has ancestor $j$. Then the old part of its path is fixed as $X_{0:s-1,j}$, and the new coordinate $X_{s,i}$ is drawn from the proposal density $Q_s(\cdot\mid X_{0:s-1,j})$ with respect to $\lambda_s$. Thus \begin{align*} \mathbb{E}[W_{s,i} h(X_{0:s,i})\mid A_{s,i}=j,\mathcal F_{s-1}] = \int_{E_s} h(X_{0:s-1,j},x_s) \omega_s(X_{0:s-1,j},x_s) Q_s(x_s\mid X_{0:s-1,j}) \,d\lambda_s(x_s). \end{align*} Substituting the incremental weight ratio gives \begin{align*} \mathbb{E}[W_{s,i} h(X_{0:s,i})\mid A_{s,i}=j,\mathcal F_{s-1}] = \int_{E_s} h(X_{0:s-1,j},x_s) f_s(x_s\mid x_{s-1,j})g_s(y_s\mid x_s) \,d\lambda_s(x_s). \end{align*} Here $x_{s-1,j}$ denotes the terminal coordinate of the realized path $X_{0:s-1,j}$, so the transition density is evaluated at the previous state of ancestor $j$. By definition of $G_s h$, this is \begin{align*} \mathbb{E}[W_{s,i} h(X_{0:s,i})\mid A_{s,i}=j,\mathcal F_{s-1}] = (G_s h)(X_{0:s-1,j}). \end{align*} The sum over children can now be grouped by ancestor. Conditional on $\mathcal F_{s-1}$, all children with ancestor $j$ have the same conditional mean contribution $(G_s h)(X_{0:s-1,j})$. Therefore \begin{align*} \mathbb{E}\left[ \frac{1}{N}\sum_{i=1}^N W_{s,i} h(X_{0:s,i}) \mid \mathcal F_{s-1} \right] = \frac{1}{N}\sum_{j=1}^N \mathbb{E}[C_{s,j}\mid\mathcal F_{s-1}] (G_s h)(X_{0:s-1,j}). \end{align*} Using the unbiased offspring-count identity, this becomes \begin{align*} \mathbb{E}\left[ \frac{1}{N}\sum_{i=1}^N W_{s,i} h(X_{0:s,i}) \mid \mathcal F_{s-1} \right] = \frac{\sum_{j=1}^N W_{s-1,j}(G_s h)(X_{0:s-1,j})} {\sum_{k=1}^N W_{s-1,k}}. \end{align*} Finally multiply by the previous product of average weights. Since $\prod_{r=0}^{s-1}\overline W_r$ is $\mathcal F_{s-1}$-measurable, it may be pulled through the conditional expectation. The factor $\overline W_{s-1}=N^{-1}\sum_{k=1}^N W_{s-1,k}$ cancels the denominator above, leaving \begin{align*} \mathbb{E}[\gamma_{s,N}(h)\mid\mathcal F_{s-1}] = \left(\prod_{r=0}^{s-2}\overline W_r\right) \frac{1}{N}\sum_{j=1}^N W_{s-1,j}(G_s h)(X_{0:s-1,j}). \end{align*} The right-hand side is precisely $\gamma_{s-1,N}(G_s h)$ by the definition of the particle unnormalized functional. Hence \begin{align*} \mathbb{E}[\gamma_{s,N}(h)\mid\mathcal F_{s-1}] = \gamma_{s-1,N}(G_s h). \end{align*} [/guided] [/step] [step:Iterate the one-step identity to obtain the likelihood unbiasedness] Define the constant function \begin{align*} \mathbf{1}_M:E_0 \times \cdots \times E_M \to \mathbb{R} \end{align*} by $\mathbf{1}_M(x_{0:M})=1$. Since $\gamma_M(\mathbf{1}_M)=Z_M<\infty$, the backward functions constructed below are nonnegative and finite inside the exact unnormalized integrals; no pointwise finiteness outside null or irrelevant sets is needed. To justify the particle identities for possibly unbounded $h_s$, first apply the previous conditional identities to the bounded truncations $h_{s,m}=\min\{h_s,m\}$ for $m\in\mathbb{N}$. For these truncations, the corresponding weighted sums are bounded above by the same sums with [test function](/page/Test%20Function) $m$, so the displayed conditional expectations are finite by the integrability assumptions on the weights. As $m\to\infty$, the functions $h_{s,m}$ increase pointwise to $h_s$, and the nonnegative particle sums also increase pointwise. The [monotone convergence theorem](/theorems/509) for conditional expectations therefore permits passage to the limit in the conditional identities. The stated integrability assumptions on the particle weights and on $\widehat Z_{M,N}$ ensure that the limiting particle expectations appearing below are finite. For $0 \leq s \leq M-1$, define recursively [measurable functions](/page/Measurable%20Functions) \begin{align*} h_s:E_0 \times \cdots \times E_s \to \mathbb{R} \end{align*} by $h_M=\mathbf{1}_M$ and \begin{align*} h_{s-1}=G_s h_s \end{align*} for $1 \leq s \leq M$. The exact Feynman-Kac identity from above gives \begin{align*} \gamma_M(\mathbf{1}_M)=\gamma_0(h_0). \end{align*} The conditional particle identity gives, for every $1 \leq s \leq M$, \begin{align*} \mathbb{E}[\gamma_{s,N}(h_s)] = \mathbb{E}[\gamma_{s-1,N}(h_{s-1})], \end{align*} where the equality follows by taking expectations on both sides of \begin{align*} \mathbb{E}[\gamma_{s,N}(h_s)\mid\mathcal F_{s-1}] = \gamma_{s-1,N}(h_{s-1}). \end{align*} Iterating from $s=T$ down to $s=1$ yields \begin{align*} \mathbb{E}[\gamma_{M,N}(\mathbf{1}_M)] = \mathbb{E}[\gamma_{0,N}(h_0)]. \end{align*} By the initial unbiasedness identity, \begin{align*} \mathbb{E}[\gamma_{0,N}(h_0)] = \gamma_0(h_0). \end{align*} Combining these identities, \begin{align*} \mathbb{E}[\widehat Z_{M,N}] = \mathbb{E}[\gamma_{M,N}(\mathbf{1}_M)] = \gamma_M(\mathbf{1}_M) = Z_M. \end{align*} In the hidden Markov model interpretation, $p(y_{0:M})$ denotes the marginal likelihood of the fixed observations obtained by integrating the joint latent-observation density over the latent path, so $Z_M=p(y_{0:M})$. [/step]

What brings you to Androma?

Start with a route through the knowledge graph.