Rejection Sampling Correctness Theorem — Statement & Proof

Rejection Sampling Correctness Theorem (Theorem # 7214)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We first analyze one proposal-acceptance pair $(Y,U)$ and compute the joint probability that the proposal lies in a measurable set $B$ and is accepted. The domination hypothesis implies $q a=p/M$ $\mu$-almost everywhere, including on the set where $q=0$, so the joint accepted law is exactly $M^{-1}p\,d\mu$. Taking $B=E$ gives the single-trial success probability $1/M$, and conditioning identifies the law of one accepted proposal. Finally, independence of the repeated pairs makes the acceptance indicators independent Bernoulli random variables, which gives the geometric law of $T$, almost sure finiteness, and the law of $Y_T$. [/proofplan] [step:Compute the accepted law for one proposal] Let $Y:(\Omega,\mathcal F)\to(E,\mathcal E)$ be an $E$-valued [random variable](/page/Random%20Variable) with density $q$ with respect to $\mu$, and let $U:(\Omega,\mathcal F)\to((0,1),\mathcal B((0,1)))$ be independent of $Y$ with distribution $\operatorname{Unif}(0,1)$. Define the event $A\in\mathcal F$ by \begin{align*} A=\{U\le a(Y)\}. \end{align*} First observe that $p=0$ $\mu$-almost everywhere on the measurable set $\{x\in E:q(x)=0\}$. Indeed, on the full-measure set where $p\le Mq$, the condition $q(x)=0$ implies $0\le p(x)\le0$. Hence, for $\mu$-almost every $x\in E$, \begin{align*} q(x)a(x)=\frac{p(x)}{M}. \end{align*} Let $N\in\mathcal E$ be a $\mu$-null set outside which $p\le Mq$. Then $0\le a(x)\le1$ for every $x\in E\setminus N$ with $q(x)>0$, while $a(x)=0$ on $\{q=0\}$. Since \begin{align*} \int_N q(x)\,d\mu(x)=0, \end{align*} the exceptional set has zero mass under the proposal measure $q\,d\mu$. Let $B\in\mathcal E$. Let $\mathcal L^1$ denote one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on $(0,1)$, and let $\nu$ denote the proposal probability measure on $(E,\mathcal E)$ defined by \begin{align*} \nu(C)=\int_C q(y)\,d\mu(y) \end{align*} for $C\in\mathcal E$. Since $Y$ has density $q$ and $U$ is independent of $Y$, the joint law of $(Y,U)$ on $E\times(0,1)$ is $\nu\otimes\mathcal L^1$. Equivalently, this joint law has density $(y,u)\mapsto q(y)$ with respect to $\mu\otimes\mathcal L^1$. Integrating the indicator of the measurable set $\{(y,u)\in B\times(0,1):u\le a(y)\}$ against this product law gives \begin{align*} \mathbb P(Y\in B,A)=\int_B\int_0^1 \mathbb 1_{\{u\le a(y)\}}q(y)\,d\mathcal L^1(u)\,d\mu(y). \end{align*} For every $y\in E\setminus N$, the inner integral equals $a(y)$ because $0\le a(y)\le1$. Since $N$ has zero mass under $q\,d\mu$, changing the value of the inner integral on $N$ does not change the outer integral. Thus \begin{align*} \mathbb P(Y\in B,A)=\int_B q(y)a(y)\,d\mu(y). \end{align*} Using $q a=p/M$ $\mu$-almost everywhere gives \begin{align*} \mathbb P(Y\in B,A)=\frac{1}{M}\int_B p(y)\,d\mu(y). \end{align*} [guided] We isolate one proposal because rejection sampling repeats identical independent trials. Let $Y:(\Omega,\mathcal F)\to(E,\mathcal E)$ have density $q$ with respect to $\mu$, let $U:(\Omega,\mathcal F)\to((0,1),\mathcal B((0,1)))$ be independent of $Y$ and uniformly distributed on $(0,1)$, and define \begin{align*} A=\{U\le a(Y)\}. \end{align*} The first point is the almost-everywhere bookkeeping at the set where $q$ vanishes. The domination assumption says $p(x)\le Mq(x)$ for $\mu$-almost every $x\in E$. Choose a measurable set $N\in\mathcal E$ with $\mu(N)=0$ such that $p(x)\le Mq(x)$ for every $x\in E\setminus N$. On the part of $E\setminus N$ where $q(x)=0$, this gives $0\le p(x)\le0$, hence $p(x)=0$. Therefore the convention $a(x)=0$ on $\{q=0\}$ is compatible with the identity \begin{align*} q(x)a(x)=\frac{p(x)}{M} \end{align*} for $\mu$-almost every $x\in E$. The same domination gives $0\le a(x)\le1$ for every $x\in E\setminus N$: when $q(x)>0$ this follows by dividing $p(x)\le Mq(x)$ by $Mq(x)$, and when $q(x)=0$ it follows from the definition $a(x)=0$. Since \begin{align*} \int_N q(x)\,d\mu(x)=0, \end{align*} the exceptional set has zero mass under the proposal measure $q\,d\mu$ and is never seen by $Y$ with positive probability. Now fix a measurable set $B\in\mathcal E$. Let $\mathcal L^1$ denote one-dimensional Lebesgue measure on $(0,1)$, and define the proposal probability measure $\nu$ on $(E,\mathcal E)$ by \begin{align*} \nu(C)=\int_C q(y)\,d\mu(y) \end{align*} for $C\in\mathcal E$. Independence of $Y$ and $U$ means that the joint law of $(Y,U)$ is $\nu\otimes\mathcal L^1$. Since $\nu$ has density $q$ with respect to $\mu$, the same joint law has density $(y,u)\mapsto q(y)$ with respect to $\mu\otimes\mathcal L^1$. Therefore the event $\{Y\in B,U\le a(Y)\}$ has probability obtained by integrating its indicator against this product law: \begin{align*} \mathbb P(Y\in B,A)=\int_B\int_0^1 \mathbb 1_{\{u\le a(y)\}}q(y)\,d\mathcal L^1(u)\,d\mu(y). \end{align*} For each $y\in E\setminus N$, the one-dimensional Lebesgue measure of $\{u\in(0,1):u\le a(y)\}$ is $a(y)$, because $0\le a(y)\le1$. Hence the inner integral becomes $a(y)$ outside $N$. The contribution of $N$ to the outer integral is zero because $\int_N q\,d\mu=0$, and so \begin{align*} \mathbb P(Y\in B,A)=\int_B q(y)a(y)\,d\mu(y). \end{align*} Substituting the almost-everywhere identity $q a=p/M$ yields \begin{align*} \mathbb P(Y\in B,A)=\frac{1}{M}\int_B p(y)\,d\mu(y). \end{align*} This is the central calculation: among accepted proposals, the unnormalised law is exactly the target density scaled by the constant $1/M$. [/guided] [/step] [step:Identify the one-step acceptance probability and conditional accepted distribution] Taking $B=E$ in the preceding formula and using that $p$ is a probability density gives \begin{align*} \mathbb P(A)=\frac{1}{M}\int_E p(y)\,d\mu(y)=\frac{1}{M}. \end{align*} Since $M<\infty$, this probability is positive. Therefore, for every $B\in\mathcal E$, \begin{align*} \mathbb P(Y\in B\mid A)=\frac{\mathbb P(Y\in B,A)}{\mathbb P(A)}=\int_B p(y)\,d\mu(y). \end{align*} Thus the conditional law of a proposal given acceptance has density $p$ with respect to $\mu$. [/step] [step:Turn repeated acceptances into independent Bernoulli trials] For each $n\in\mathbb N$, define the acceptance event $A_n\in\mathcal F$ and its indicator $I_n:\Omega\to\{0,1\}$ by \begin{align*} A_n=\{U_n\le a(Y_n)\} \end{align*} and \begin{align*} I_n=\mathbb 1_{A_n}. \end{align*} The pairs $(Y_n,U_n)$ are independent because the sequence $(Y_n)_{n\ge1}$ is independent, the sequence $(U_n)_{n\ge1}$ is independent, and the two families are independent of each other. Since $A_n$ is determined by the pair $(Y_n,U_n)$, the events $(A_n)_{n\ge1}$ are independent. Applying the one-step computation to each pair gives \begin{align*} \mathbb P(A_n)=\frac{1}{M} \end{align*} for every $n\in\mathbb N$. Hence $(I_n)_{n\ge1}$ are independent Bernoulli random variables with success probability $1/M$. [/step] [step:Derive the geometric law and almost sure finiteness of the stopping time] For $n\in\mathbb N$, the event $\{T=n\}$ is \begin{align*} \{T=n\}=A_1^c\cap\cdots\cap A_{n-1}^c\cap A_n. \end{align*} Using independence of the events $(A_k)_{k\ge1}$ and the identity $\mathbb P(A_k)=1/M$, we obtain \begin{align*} \mathbb P(T=n)=\left(1-\frac{1}{M}\right)^{n-1}\frac{1}{M}. \end{align*} Also \begin{align*} \mathbb P(T>n)=\mathbb P(A_1^c\cap\cdots\cap A_n^c)=\left(1-\frac{1}{M}\right)^n. \end{align*} Because $M<\infty$, we have $1/M>0$, so \begin{align*} \lim_{n\to\infty}\left(1-\frac{1}{M}\right)^n=0. \end{align*} The events $\{T>n\}$ decrease to $\{T=\infty\}$. By continuity from above for probability measures, therefore \begin{align*} \mathbb P(T=\infty)=\lim_{n\to\infty}\mathbb P(T>n)=0. \end{align*} Thus $T<\infty$ almost surely and $T$ has the geometric distribution on $\{1,2,\dots\}$ with success probability $1/M$. [/step] [step:Sum over the accepted trial to obtain the density of $Y_T$] Let $B\in\mathcal E$. Since $T<\infty$ almost surely, the events $\{T=n\}$ for $n\in\mathbb N$ partition a full-probability subset of $\Omega$. Hence \begin{align*} \mathbb P(Y_T\in B)=\sum_{n=1}^{\infty}\mathbb P(Y_n\in B,T=n). \end{align*} For each $n$, the event $\{T=n\}$ is $A_1^c\cap\cdots\cap A_{n-1}^c\cap A_n$. The event $A_1^c\cap\cdots\cap A_{n-1}^c$ is determined by the pairs $(Y_k,U_k)$ with $1\le k<n$, while the event $\{Y_n\in B\}\cap A_n$ is determined by $(Y_n,U_n)$. By independence of the pairs, \begin{align*} \mathbb P(Y_n\in B,T=n)=\left(1-\frac{1}{M}\right)^{n-1}\mathbb P(Y_n\in B,A_n). \end{align*} The one-step accepted-law computation gives \begin{align*} \mathbb P(Y_n\in B,A_n)=\frac{1}{M}\int_B p(y)\,d\mu(y). \end{align*} Therefore \begin{align*} \mathbb P(Y_T\in B)=\sum_{n=1}^{\infty}\left(1-\frac{1}{M}\right)^{n-1}\frac{1}{M}\int_B p(y)\,d\mu(y). \end{align*} The geometric series has sum $1$ because $1/M>0$, so \begin{align*} \mathbb P(Y_T\in B)=\int_B p(y)\,d\mu(y). \end{align*} Since this holds for every $B\in\mathcal E$, the accepted output $Y_T$ has density $p$ with respect to $\mu$. This completes the proof. [/step]

Explore Further

Finite-Horizon Linear Quadratic Regulator Theorem applied Convergence of Accumulation Points of the Central Path applied Closed Loop Signal Maps applied NP-Completeness of Independent Set applied Kalman Observability Rank Theorem applied Electric Dipole Angular Selection Rules applied Reversibility of the Metropolis-Adjusted Langevin Algorithm applied Polynomial-Time Decidability of Bipartite Matching applied

What brings you to Androma?

Start with a route through the knowledge graph.

Rejection Sampling Correctness Theorem (Theorem # 7214)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Rejection Sampling Correctness Theorem (Theorem # 7214)

Discussion

Proof

Explore Further