Bias and Almost Sure Consistency of Self-Normalized Importance Sampling

Theorem


Let $(E,\mathcal E,\mu)$ be a [measure space](/page/Measure%20Space). Let $\gamma:E\to[0,\infty)$ and $q:E\to[0,\infty)$ be $\mathcal E$-[measurable functions](/page/Measurable%20Functions) such that $q$ is a probability density with respect to $\mu$, and assume that $\gamma(x)>0$ implies $q(x)>0$ for $\mu$-a.e. $x$. Define the proposal probability measure $\mathbb Q$ on $(E,\mathcal E)$ by \begin{align*} \mathbb Q(A)=\int_A q(x)\,d\mu(x) \end{align*} for every $A\in\mathcal E$. Define the normalizing constant \begin{align*} Z:=\int_E \gamma(x)\,d\mu(x) \end{align*} and assume $0<Z<\infty$. Define the target probability measure $\pi$ on $(E,\mathcal E)$ by \begin{align*} \pi(A)=\frac{1}{Z}\int_A \gamma(x)\,d\mu(x) \end{align*} for every $A\in\mathcal E$. Let $w:E\to[0,\infty)$ be the unnormalized importance weight defined by $w(x)=\gamma(x)/q(x)$ when $q(x)>0$ and by $w(x)=0$ when $q(x)=0$. Let $Y:(\Omega,\mathcal F,\mathbb P)\to(E,\mathcal E)$ be an $E$-valued [random variable](/page/Random%20Variable) with distribution $\mathbb Q$, and assume $\mathbb Q(w(Y)>0)>0$. Let $Y_1,Y_2,\dots:(\Omega,\mathcal F,\mathbb P)\to(E,\mathcal E)$ be independent $E$-valued random variables with common distribution $\mathbb Q$. If $h:E\to\mathbb R$ is $\mathcal E$-measurable and \begin{align*} \mathbb E_{\mathbb Q}[|w(Y)h(Y)|]<\infty, \end{align*} then there exists a random integer $N_0:\Omega\to\mathbb N$ such that, with $\mathbb P$-probability one, \begin{align*} \sum_{i=1}^N w(Y_i)>0 \end{align*} for every $N\ge N_0$. The self-normalized [importance sampling estimator](/theorems/2001) \begin{align*} \widehat I_{N,\mathrm{SN}}:=\frac{\sum_{i=1}^N w(Y_i)h(Y_i)}{\sum_{i=1}^N w(Y_i)} \end{align*} is therefore almost surely well-defined for every $N\ge N_0$, and it satisfies \begin{align*} \widehat I_{N,\mathrm{SN}}\xrightarrow{a.s.}\mathbb E_{\pi}[h(X)], \end{align*} where $X$ is an $E$-valued random variable with distribution $\pi$. 
Moreover, for every $N\in\mathbb N$, \begin{align*} \mathbb P\left(\sum_{i=1}^N w(Y_i)=0\right)=\mathbb Q(w(Y)=0)^N. \end{align*} In particular, for each finite $N$, $\widehat I_{N,\mathrm{SN}}$ is well-defined almost surely whenever $\mathbb Q(w(Y)>0)=1$. For finite $N$, the estimator is not unbiased in general.
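
The zero-denominator formula can be checked exactly on a small finite space. The sketch below is only an illustration under stated assumptions: the three-point space $E=\{0,1,2\}$ with counting measure, the dictionaries `q` and `gamma`, and the helper names `w`, `snis`, and `zero_denominator_probability` are hypothetical choices, not part of the theorem. It enumerates every outcome in $E^N$ with exact rational arithmetic and compares the result with $\mathbb Q(w(Y)=0)^N$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example (an assumption for illustration): E = {0, 1, 2}, mu = counting measure.
q = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}  # proposal density
gamma = {0: Fraction(1), 1: Fraction(1, 2), 2: Fraction(0)}    # unnormalized target


def w(x):
    """Unnormalized weight gamma/q, set to 0 where q vanishes."""
    return gamma[x] / q[x] if q[x] > 0 else Fraction(0)


def snis(samples, h):
    """Self-normalized estimate, or None when every weight is zero."""
    total = sum(w(y) for y in samples)
    if total == 0:
        return None  # zero denominator: the ratio is undefined
    return sum(w(y) * h(y) for y in samples) / total


def zero_denominator_probability(n):
    """Exact P(sum_i w(Y_i) = 0), by enumerating all outcomes in E^n."""
    prob = Fraction(0)
    for ys in product(q, repeat=n):
        p = Fraction(1)
        for y in ys:
            p *= q[y]  # independence: the joint probability is a product
        if sum(w(y) for y in ys) == 0:
            prob += p
    return prob


q_zero = sum(q[x] for x in q if w(x) == 0)  # Q(w(Y) = 0)
checks = {n: zero_denominator_probability(n) == q_zero ** n for n in range(1, 5)}
```

In this assumed example $\mathbb Q(w(Y)=0)=1/3$, so the estimator is undefined with positive probability at every finite $N$, although that probability decays geometrically in $N$.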

Discussion

Proof

[proofplan] We identify the two random variables whose empirical averages form the numerator and denominator of the self-normalized estimator. The [Strong Law of Large Numbers](/theorems/520) gives almost sure convergence of these averages to their $\mathbb Q$-expectations; the definition of the importance weight converts those expectations into the unnormalized target integral and the normalizing constant $Z$. Since the denominator limit is strictly positive, the denominator is eventually positive and the ratio converges to the target expectation. The zero-denominator formula follows from nonnegativity of the weights and independence, and a two-point example shows that finite-sample unbiasedness fails in general. [/proofplan] [step:Convert the weighted expectations into target integrals] Let $Y:(\Omega,\mathcal F,\mathbb P)\to(E,\mathcal E)$ be an $E$-valued [random variable](/page/Random%20Variable) with distribution $\mathbb Q$. Define the real-valued random variables \begin{align*} A:=w(Y)h(Y) \end{align*} and \begin{align*} B:=w(Y). \end{align*} The hypothesis gives $\mathbb E_{\mathbb Q}[|A|]<\infty$. Also, \begin{align*} \mathbb E_{\mathbb Q}[B]=\int_E w(x)q(x)\,d\mu(x)=\int_{\{q>0\}} \gamma(x)\,d\mu(x). \end{align*} Since $\gamma(x)>0$ implies $q(x)>0$ for $\mu$-a.e. $x$, the set $\{q=0,\gamma>0\}$ has $\mu$-measure zero, so \begin{align*} \mathbb E_{\mathbb Q}[B]=\int_E \gamma(x)\,d\mu(x)=Z. \end{align*} Thus $B$ is integrable because $0\le B$ and $\mathbb E_{\mathbb Q}[B]=Z<\infty$. Similarly, $w(x)q(x)=\gamma(x)$ wherever $q(x)>0$, and $\gamma=0$ $\mu$-a.e. on $\{q=0\}$, so $wq=\gamma$ $\mu$-a.e.; the absolute integrability hypothesis therefore gives \begin{align*} \int_E |h(x)|\gamma(x)\,d\mu(x)=\int_E |h(x)|w(x)q(x)\,d\mu(x)=\mathbb E_{\mathbb Q}[|A|]<\infty. \end{align*} Hence the signed integral below is well-defined, and \begin{align*} \mathbb E_{\mathbb Q}[A]=\int_E h(x)w(x)q(x)\,d\mu(x)=\int_E h(x)\gamma(x)\,d\mu(x). 
\end{align*} The target expectation is therefore \begin{align*} \mathbb E_{\pi}[h(X)]=\int_E h(x)\,d\pi(x)=\frac{1}{Z}\int_E h(x)\gamma(x)\,d\mu(x)=\frac{\mathbb E_{\mathbb Q}[A]}{\mathbb E_{\mathbb Q}[B]}. \end{align*} [guided] The estimator is a ratio, so we first identify the deterministic limits of its numerator and denominator. Let $Y:(\Omega,\mathcal F,\mathbb P)\to(E,\mathcal E)$ have distribution $\mathbb Q$, and define \begin{align*} A:=w(Y)h(Y) \end{align*} and \begin{align*} B:=w(Y). \end{align*} The random variable $A$ is integrable by the hypothesis \begin{align*} \mathbb E_{\mathbb Q}[|w(Y)h(Y)|]<\infty. \end{align*} For the denominator, we compute its expectation using the fact that $\mathbb Q$ has density $q$ with respect to $\mu$ and that $w=\gamma/q$ where $q>0$: \begin{align*} \mathbb E_{\mathbb Q}[B]=\int_E w(x)q(x)\,d\mu(x)=\int_{\{q>0\}} \gamma(x)\,d\mu(x). \end{align*} The support assumption says that $\gamma(x)>0$ can occur only where $q(x)>0$, except on a $\mu$-null set. Hence removing the set $\{q=0\}$ does not change the $\gamma$-integral, and therefore \begin{align*} \mathbb E_{\mathbb Q}[B]=\int_E \gamma(x)\,d\mu(x)=Z. \end{align*} Because $0\le B$ and $Z<\infty$, this also proves that $B$ is integrable. This is the condition needed to apply the [Strong Law of Large Numbers](/theorems/1852) to the denominator. The same density calculation first proves absolute integrability of the target integral: \begin{align*} \int_E |h(x)|\gamma(x)\,d\mu(x)=\int_E |h(x)|w(x)q(x)\,d\mu(x)=\mathbb E_{\mathbb Q}[|A|]<\infty. \end{align*} Thus the signed integral $\int_E h(x)\gamma(x)\,d\mu(x)$ is well-defined. The numerator limit is then \begin{align*} \mathbb E_{\mathbb Q}[A]=\int_E h(x)w(x)q(x)\,d\mu(x)=\int_E h(x)\gamma(x)\,d\mu(x). \end{align*} Finally, by the definition of $\pi$, \begin{align*} \mathbb E_{\pi}[h(X)]=\int_E h(x)\,d\pi(x)=\frac{1}{Z}\int_E h(x)\gamma(x)\,d\mu(x). 
\end{align*} Combining the last two identities with $\mathbb E_{\mathbb Q}[B]=Z$ gives \begin{align*} \mathbb E_{\pi}[h(X)]=\frac{\mathbb E_{\mathbb Q}[A]}{\mathbb E_{\mathbb Q}[B]}. \end{align*} This is the deterministic ratio to which the random self-normalized ratio will converge. [/guided] [/step] [step:Apply the Strong Law of Large Numbers to the numerator and denominator] For each $i\in\mathbb N$, define \begin{align*} A_i:=w(Y_i)h(Y_i) \end{align*} and \begin{align*} B_i:=w(Y_i). \end{align*} The random variables $A_1,A_2,\dots$ are independent and identically distributed with $\mathbb E_{\mathbb Q}[|A_1|]<\infty$, so each $A_i$ is finite $\mathbb P$-a.s. The random variables $B_1,B_2,\dots$ are independent and identically distributed, nonnegative, and satisfy $\mathbb E_{\mathbb Q}[B_1]=Z<\infty$, so each $B_i$ is finite $\mathbb P$-a.s. By the Strong Law of Large Numbers applied to the integrable i.i.d. sequence $A_1,A_2,\dots$, \begin{align*} \frac{1}{N}\sum_{i=1}^N A_i\xrightarrow{a.s.}\mathbb E_{\mathbb Q}[A] \end{align*} and applying the Strong Law of Large Numbers to the integrable i.i.d. sequence $B_1,B_2,\dots$ gives \begin{align*} \frac{1}{N}\sum_{i=1}^N B_i\xrightarrow{a.s.}Z. \end{align*} Let $\Omega_0\in\mathcal F$ be the event on which both almost sure convergences hold. Then $\mathbb P(\Omega_0)=1$. [/step] [step:Construct a measurable random time after which the denominator is positive] For each $N\in\mathbb N$, define the real-valued random variable \begin{align*} S_N:=\frac{1}{N}\sum_{i=1}^N B_i. \end{align*} For each $k\in\mathbb N$, define the event \begin{align*} C_k:=\bigcap_{N=k}^{\infty}\left\{\omega\in\Omega:S_N(\omega)>\frac{Z}{2}\right\}. \end{align*} Each event $C_k$ belongs to $\mathcal F$, because it is a countable intersection of inverse images of the open interval $(Z/2,\infty)$ under measurable random variables. 
Since $S_N(\omega)\to Z$ for every $\omega\in\Omega_0$ and $Z>0$, every $\omega\in\Omega_0$ belongs to $C_k$ for some $k\in\mathbb N$. Define $N_0:\Omega\to\mathbb N$ by \begin{align*} N_0(\omega):=\min\{k\in\mathbb N:\omega\in C_k\} \end{align*} when $\omega\in\bigcup_{k=1}^{\infty}C_k$, and set $N_0(\omega):=1$ otherwise. For each $m\in\mathbb N$, the event $\{N_0\le m\}$ is the measurable set \begin{align*} \left(\Omega\setminus\bigcup_{k=1}^{\infty}C_k\right)\cup\bigcup_{k=1}^m C_k, \end{align*} so $N_0$ is a measurable $\mathbb N$-valued random variable. If $\omega\in\Omega_0$ and $N\ge N_0(\omega)$, then $\omega\in C_{N_0(\omega)}$, so \begin{align*} \frac{1}{N}\sum_{i=1}^N B_i(\omega)>\frac{Z}{2}>0. \end{align*} Multiplying by $N>0$ gives \begin{align*} \sum_{i=1}^N w(Y_i(\omega))>0. \end{align*} Thus the denominator of $\widehat I_{N,\mathrm{SN}}$ is eventually positive on an event of probability one, with a measurable random integer $N_0$. [/step] [step:Take the ratio of the two almost sure limits] For every $\omega\in\Omega_0$ and every $N\ge N_0(\omega)$, \begin{align*} \widehat I_{N,\mathrm{SN}}(\omega)=\frac{N^{-1}\sum_{i=1}^N A_i(\omega)}{N^{-1}\sum_{i=1}^N B_i(\omega)}. \end{align*} The numerator converges to $\mathbb E_{\mathbb Q}[A]$, and the denominator converges to $Z\ne0$. By the continuity of the map $r:\mathbb R\times(\mathbb R\setminus\{0\})\to\mathbb R$ defined by \begin{align*} r(a,b)=\frac{a}{b}, \end{align*} we obtain \begin{align*} \widehat I_{N,\mathrm{SN}}(\omega)\to\frac{\mathbb E_{\mathbb Q}[A]}{Z}. \end{align*} Using the identity from the first step, \begin{align*} \frac{\mathbb E_{\mathbb Q}[A]}{Z}=\mathbb E_{\pi}[h(X)]. \end{align*} Therefore \begin{align*} \widehat I_{N,\mathrm{SN}}\xrightarrow{a.s.}\mathbb E_{\pi}[h(X)]. 
\end{align*} [/step] [step:Compute the zero-denominator probability from independence] For each $N\in\mathbb N$, the weights are nonnegative, so \begin{align*} \sum_{i=1}^N w(Y_i)=0 \end{align*} if and only if $w(Y_i)=0$ for every $i\in\{1,\dots,N\}$. Hence \begin{align*} \left\{\sum_{i=1}^N w(Y_i)=0\right\}=\bigcap_{i=1}^N \{w(Y_i)=0\}. \end{align*} The random variables $Y_1,\dots,Y_N$ are independent and identically distributed under $\mathbb Q$, so the events $\{w(Y_i)=0\}$ are independent and each has probability $\mathbb Q(w(Y)=0)$. Therefore \begin{align*} \mathbb P\left(\sum_{i=1}^N w(Y_i)=0\right)=\prod_{i=1}^N \mathbb P(w(Y_i)=0)=\mathbb Q(w(Y)=0)^N. \end{align*} If $\mathbb Q(w(Y)>0)=1$, then $\mathbb Q(w(Y)=0)=0$, so this probability is $0$ for every finite $N$. Hence, for each fixed finite $N$, the estimator is well-defined almost surely. [/step] [step:Exhibit a finite-sample case where the estimator is biased] It remains to justify the assertion that finite-sample unbiasedness fails in general. Take $E=\{0,1\}$ with $\mathcal E$ its power set and $\mu$ the counting measure. Define \begin{align*} q(0)=q(1)=\frac{1}{2}, \end{align*} and define $\gamma:E\to[0,\infty)$ by \begin{align*} \gamma(0)=1,\qquad \gamma(1)=\frac{1}{2}. \end{align*} Then \begin{align*} w(0)=\frac{\gamma(0)}{q(0)}=2,\qquad w(1)=\frac{\gamma(1)}{q(1)}=1, \end{align*} and \begin{align*} Z=\int_E \gamma(x)\,d\mu(x)=\frac{3}{2}. \end{align*} Let $h:E\to\mathbb R$ be defined by \begin{align*} h(0)=1,\qquad h(1)=0. \end{align*} For $N=1$, since $w(Y_1)>0$ always, \begin{align*} \widehat I_{1,\mathrm{SN}}=\frac{w(Y_1)h(Y_1)}{w(Y_1)}=h(Y_1). \end{align*} Thus \begin{align*} \mathbb E_{\mathbb Q}[\widehat I_{1,\mathrm{SN}}]=\mathbb E_{\mathbb Q}[h(Y_1)]=\frac{1}{2}. \end{align*} On the other hand, \begin{align*} \mathbb E_{\pi}[h(X)]=\frac{1}{Z}\int_E h(x)\gamma(x)\,d\mu(x)=\frac{1}{3/2}=\frac{2}{3}. 
\end{align*} Therefore \begin{align*} \mathbb E_{\mathbb Q}[\widehat I_{1,\mathrm{SN}}]\ne\mathbb E_{\pi}[h(X)]. \end{align*} This example proves that the self-normalized [importance sampling estimator](/theorems/2001) is not unbiased in general at finite sample size. [/step]
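
The two-point counterexample can be reproduced with exact rational arithmetic. The following is a minimal sketch, assuming the same $q$, $\gamma$, and $h$ as in the final step of the proof; the helper name `expected_snis` and the dictionary `bias` are hypothetical names introduced here. It enumerates all outcomes in $\{0,1\}^N$ to compute $\mathbb E[\widehat I_{N,\mathrm{SN}}]$ exactly for small $N$.

```python
from fractions import Fraction
from itertools import product

# Two-point example from the proof: E = {0, 1}, mu = counting measure.
q = {0: Fraction(1, 2), 1: Fraction(1, 2)}
gamma = {0: Fraction(1), 1: Fraction(1, 2)}
h = {0: Fraction(1), 1: Fraction(0)}

# q > 0 everywhere here, so the weight is simply gamma/q.
w = {x: gamma[x] / q[x] for x in q}

# Target expectation E_pi[h] = (1/Z) * sum_x h(x) gamma(x).
target = sum(h[x] * gamma[x] for x in q) / sum(gamma.values())


def expected_snis(n):
    """Exact expectation of the self-normalized estimator with n samples."""
    total = Fraction(0)
    for ys in product(q, repeat=n):
        p = Fraction(1)
        for y in ys:
            p *= q[y]  # probability of this outcome under i.i.d. sampling
        num = sum(w[y] * h[y] for y in ys)
        den = sum(w[y] for y in ys)  # strictly positive in this example
        total += p * num / den
    return total


# Finite-sample bias E[I_N] - E_pi[h] for a few small sample sizes.
bias = {n: expected_snis(n) - target for n in (1, 2, 3)}
```

Under these assumptions the bias is $-1/6$ at $N=1$, matching the proof, and $-1/12$ at $N=2$. The bias is therefore nonzero at these sample sizes but smaller in magnitude at the larger one, in line with the almost sure consistency established above.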
