Exponential Tilting Change of Measure for Rare-Event Importance Sampling

Theorem


Let $n\in\mathbb N$. Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, and let $X_1,X_2,\dots,X_n: (\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ be independent and identically distributed real-valued random variables with common law $\mu$. Define the finite moment-generating domain \begin{align*} D_X=\left\{\theta\in\mathbb R:\int_{\mathbb R}e^{\theta x}\,d\mu(x)<\infty\right\}. \end{align*} Assume that $D_X$ contains an open interval containing $0$, and define the moment generating function \begin{align*} M_X:D_X&\to(0,\infty) \end{align*} \begin{align*} \theta&\mapsto \int_{\mathbb R} e^{\theta x}\,d\mu(x). \end{align*} Fix $a\in\mathbb R$ and define $S_n=X_1+\cdots+X_n$ and \begin{align*} p_n=\mathbb P(S_n\geq na). \end{align*} For any $\lambda>0$ with $\lambda\in D_X$, define the exponentially tilted probability measure $\mu_\lambda$ on $(\mathbb R,\mathcal B(\mathbb R))$ by \begin{align*} \mu_\lambda(A)=\frac{1}{M_X(\lambda)}\int_A e^{\lambda x}\,d\mu(x) \end{align*} for every $A\in\mathcal B(\mathbb R)$. Let $\mathbb P_\lambda=\mu_\lambda^{\otimes n}$ be the product tilted measure on $(\mathbb R^n,\mathcal B(\mathbb R^n))$, and let \begin{align*} T_n:\mathbb R^n&\to\mathbb R \end{align*} \begin{align*} (x_1,\dots,x_n)&\mapsto x_1+\cdots+x_n \end{align*} be the coordinate sum. Define the scalar likelihood-ratio map \begin{align*} L_{\lambda,n}:\mathbb R&\to(0,\infty) \end{align*} \begin{align*} s&\mapsto \exp\left(-\lambda s+n\log M_X(\lambda)\right). \end{align*} Then \begin{align*} p_n=\mathbb E_{\mathbb P_\lambda}\left[\mathbb 1_{\{T_n\geq na\}}L_{\lambda,n}(T_n)\right]. 
\end{align*} Equivalently, for a tilted sample $(Y_1,\dots,Y_n): (\mathbb R^n,\mathcal B(\mathbb R^n),\mathbb P_\lambda)\to(\mathbb R^n,\mathcal B(\mathbb R^n))$ given by the coordinate projections, the coordinates are independent with common law $\mu_\lambda$, the tilted sum $\widetilde S_n=Y_1+\cdots+Y_n$ satisfies \begin{align*} L_{\lambda,n}(\widetilde S_n)=\exp\left(-\lambda \widetilde S_n+n\log M_X(\lambda)\right), \end{align*} and \begin{align*} p_n=\mathbb E_{\mathbb P_\lambda}\left[\mathbb 1_{\{\widetilde S_n\geq na\}}L_{\lambda,n}(\widetilde S_n)\right]. \end{align*} If, in addition, $\lambda\in\operatorname{int}(D_X)$ and \begin{align*} (\log M_X)'(\lambda)=a, \end{align*} then \begin{align*} \mathbb E_{\mu_\lambda}[Y_1]=a. \end{align*}

Discussion
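As an illustration (an assumed special case, not part of the statement above), suppose $\mu$ is the standard normal law and $a>0$. Then $D_X=\mathbb R$ and \begin{align*} M_X(\theta)=e^{\theta^2/2},\qquad (\log M_X)'(\theta)=\theta. \end{align*} The tilted law has density proportional to $e^{\lambda x}e^{-x^2/2}=e^{\lambda^2/2}e^{-(x-\lambda)^2/2}$, so in this case $\mu_\lambda$ is the normal law with mean $\lambda$ and variance $1$. The likelihood-ratio map becomes \begin{align*} L_{\lambda,n}(s)=\exp\left(-\lambda s+\frac{n\lambda^2}{2}\right), \end{align*} and the equation $(\log M_X)'(\lambda)=a$ is solved by $\lambda=a$, which recenters each tilted coordinate at $a$.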

Proof

[proofplan] We first construct the tilted one-dimensional law and verify that it is a probability measure. Independence then turns the one-dimensional density into a product density depending only on the sum of the coordinates. Taking the reciprocal gives the likelihood ratio from the tilted product law back to the original product law, and integrating the rare-event indicator against this likelihood ratio gives the importance-sampling identity. Finally, differentiating the log moment generating function identifies the tilted mean and shows that the equation $(\log M_X)'(\lambda)=a$ centers the proposal at the rare-event boundary. [/proofplan] [step:Construct the exponentially tilted one-dimensional law] Let $\lambda>0$ satisfy $M_X(\lambda)<\infty$. Define \begin{align*} r_\lambda:\mathbb R&\to[0,\infty) \end{align*} \begin{align*} x&\mapsto \frac{e^{\lambda x}}{M_X(\lambda)}. \end{align*} Since $e^{\lambda x}>0$ for every $x\in\mathbb R$ and $0<M_X(\lambda)<\infty$, the map $r_\lambda$ is a non-negative $\mathcal B(\mathbb R)$-measurable function. Moreover, \begin{align*} \int_{\mathbb R} r_\lambda(x)\,d\mu(x)=\frac{1}{M_X(\lambda)}\int_{\mathbb R}e^{\lambda x}\,d\mu(x)=1. \end{align*} Therefore the formula \begin{align*} \mu_\lambda(A)=\int_A r_\lambda(x)\,d\mu(x) \end{align*} defines a probability measure on $(\mathbb R,\mathcal B(\mathbb R))$. By construction, $\mu_\lambda$ is absolutely continuous with respect to $\mu$, and its Radon-Nikodym density is $r_\lambda$. [/step] [step:Compute the product density under independent tilting] Let $\mathbb P_n=\mu^{\otimes n}$ denote the original product law on $(\mathbb R^n,\mathcal B(\mathbb R^n))$, and let $\mathbb P_\lambda=\mu_\lambda^{\otimes n}$ denote the tilted product law. Define \begin{align*} R_{\lambda,n}:\mathbb R^n&\to(0,\infty) \end{align*} \begin{align*} (x_1,\dots,x_n)&\mapsto \prod_{i=1}^n r_\lambda(x_i). 
\end{align*} For measurable rectangles $A_1\times\cdots\times A_n$ with $A_i\in\mathcal B(\mathbb R)$, the product structure gives \begin{align*} \mathbb P_\lambda(A_1\times\cdots\times A_n)=\prod_{i=1}^n \int_{A_i} r_\lambda(x_i)\,d\mu(x_i). \end{align*} By Tonelli's theorem applied to the non-negative measurable function $\mathbb 1_{A_1\times\cdots\times A_n}R_{\lambda,n}$, this equals \begin{align*} \int_{A_1\times\cdots\times A_n} R_{\lambda,n}(x_1,\dots,x_n)\,d\mathbb P_n(x_1,\dots,x_n). \end{align*} Measurable rectangles form a $\pi$-system that generates $\mathcal B(\mathbb R^n)$, and both sides define probability measures on $\mathcal B(\mathbb R^n)$ (taking every $A_i=\mathbb R$ shows that each has total mass $1$). The uniqueness theorem for measures therefore gives $\mathbb P_\lambda(B)=\int_B R_{\lambda,n}\,d\mathbb P_n$ for every $B\in\mathcal B(\mathbb R^n)$. Thus $\mathbb P_\lambda$ is absolutely continuous with respect to $\mathbb P_n$, with density $R_{\lambda,n}$. For $x=(x_1,\dots,x_n)\in\mathbb R^n$, using $T_n(x)=x_1+\cdots+x_n$ gives \begin{align*} R_{\lambda,n}(x)=\prod_{i=1}^n \frac{e^{\lambda x_i}}{M_X(\lambda)}=\exp\left(\lambda T_n(x)-n\log M_X(\lambda)\right). \end{align*} [guided] The point of exponential tilting is that the one-dimensional density multiplies cleanly when the coordinates are sampled independently. We define the original product measure $\mathbb P_n=\mu^{\otimes n}$ and the tilted product measure $\mathbb P_\lambda=\mu_\lambda^{\otimes n}$ on the same measurable space $(\mathbb R^n,\mathcal B(\mathbb R^n))$. The coordinate-wise density is \begin{align*} r_\lambda(x)=\frac{e^{\lambda x}}{M_X(\lambda)}. \end{align*} Therefore the natural candidate density for the whole sample is the product map \begin{align*} R_{\lambda,n}:\mathbb R^n&\to(0,\infty) \end{align*} \begin{align*} (x_1,\dots,x_n)&\mapsto \prod_{i=1}^n r_\lambda(x_i). \end{align*} We verify this on measurable rectangles first, because rectangles generate the product $\sigma$-algebra. If $A_1,\dots,A_n\in\mathcal B(\mathbb R)$, then independence under the product tilted law gives \begin{align*} \mathbb P_\lambda(A_1\times\cdots\times A_n)=\prod_{i=1}^n \mu_\lambda(A_i). 
\end{align*} Using the definition of $\mu_\lambda$ in each factor, \begin{align*} \prod_{i=1}^n \mu_\lambda(A_i)=\prod_{i=1}^n \int_{A_i} r_\lambda(x_i)\,d\mu(x_i). \end{align*} The product integration formula for non-negative [measurable functions](/page/Measurable%20Functions) gives \begin{align*} \prod_{i=1}^n \int_{A_i} r_\lambda(x_i)\,d\mu(x_i)=\int_{A_1\times\cdots\times A_n} R_{\lambda,n}(x_1,\dots,x_n)\,d\mathbb P_n(x_1,\dots,x_n). \end{align*} Both sides define finite measures in the set variable, and they agree on the measurable rectangles, which form a $\pi$-system generating $\mathcal B(\mathbb R^n)$. By the uniqueness theorem for measures, they agree on all of $\mathcal B(\mathbb R^n)$, so $R_{\lambda,n}$ is the Radon-Nikodym density of $\mathbb P_\lambda$ with respect to $\mathbb P_n$. Now we compute the density in terms of the sample sum. For $x=(x_1,\dots,x_n)$, \begin{align*} R_{\lambda,n}(x)=\prod_{i=1}^n \frac{e^{\lambda x_i}}{M_X(\lambda)}. \end{align*} Multiplying the exponentials adds their exponents, and multiplying the $n$ identical normalising constants gives $M_X(\lambda)^n$. Therefore \begin{align*} R_{\lambda,n}(x)=\frac{e^{\lambda(x_1+\cdots+x_n)}}{M_X(\lambda)^n}=\exp\left(\lambda T_n(x)-n\log M_X(\lambda)\right). \end{align*} This is the central algebraic reason exponential tilting is useful: the likelihood ratio depends on the sample only through the statistic $T_n(x)$. [/guided] [/step] [step:Invert the product density to obtain the likelihood ratio] Since $R_{\lambda,n}(x)>0$ for every $x\in\mathbb R^n$, the original product law $\mathbb P_n$ is absolutely continuous with respect to $\mathbb P_\lambda$, and its density is the reciprocal. Define the scalar likelihood-ratio map \begin{align*} L_{\lambda,n}:\mathbb R&\to(0,\infty) \end{align*} \begin{align*} s&\mapsto \exp\left(-\lambda s+n\log M_X(\lambda)\right). \end{align*} Using the product density computed above, the Radon-Nikodym derivative is \begin{align*} \frac{d\mathbb P_n}{d\mathbb P_\lambda}(x)=\frac{1}{R_{\lambda,n}(x)}=L_{\lambda,n}(T_n(x)). 
\end{align*} Thus, if $(Y_1,\dots,Y_n)$ has law $\mathbb P_\lambda$ and $\widetilde S_n=Y_1+\cdots+Y_n$, then the likelihood ratio evaluated at the tilted sample is \begin{align*} L_{\lambda,n}(\widetilde S_n)=\exp\left(-\lambda \widetilde S_n+n\log M_X(\lambda)\right). \end{align*} [/step] [step:Apply the change of measure identity to the rare-event indicator] Define the rare-event set \begin{align*} B_{n,a}=\{x\in\mathbb R^n:T_n(x)\geq na\}. \end{align*} Since $T_n:\mathbb R^n\to\mathbb R$ is continuous, $B_{n,a}=T_n^{-1}([na,\infty))$ belongs to $\mathcal B(\mathbb R^n)$. The distribution of $(X_1,\dots,X_n)$ under $\mathbb P$ is $\mathbb P_n$, so \begin{align*} p_n=\mathbb P(S_n\geq na)=\mathbb P_n(B_{n,a}). \end{align*} Using the density $d\mathbb P_n/d\mathbb P_\lambda=L_{\lambda,n}\circ T_n$, \begin{align*} \mathbb P_n(B_{n,a})=\int_{\mathbb R^n}\mathbb 1_{B_{n,a}}(x)L_{\lambda,n}(T_n(x))\,d\mathbb P_\lambda(x). \end{align*} Substituting the expression for $L_{\lambda,n}$ gives \begin{align*} p_n=\mathbb E_{\mathbb P_\lambda}\left[\mathbb 1_{\{T_n\geq na\}}\exp\left(-\lambda T_n+n\log M_X(\lambda)\right)\right]. \end{align*} Equivalently, for a tilted sample $(Y_1,\dots,Y_n)$ and $\widetilde S_n=Y_1+\cdots+Y_n$, \begin{align*} p_n=\mathbb E_{\mathbb P_\lambda}\left[\mathbb 1_{\{\widetilde S_n\geq na\}}L_{\lambda,n}(\widetilde S_n)\right]. \end{align*} [/step] [step:Differentiate the log moment generating function to identify the tilted mean] Assume now that $\lambda$ belongs to the interior of the set on which $M_X$ is finite. Then there exists $\delta>0$ such that $M_X(\lambda-\delta)<\infty$ and $M_X(\lambda+\delta)<\infty$. For every $x\in\mathbb R$, the elementary inequality $|x|\leq \delta^{-1}e^{\delta |x|}$ gives \begin{align*} |x|e^{\lambda x}\leq \delta^{-1}\left(e^{(\lambda+\delta)x}+e^{(\lambda-\delta)x}\right). 
\end{align*} The right-hand side is $\mu$-integrable by the two-sided finiteness above, so $x\mapsto x e^{\lambda x}$ is $\mu$-integrable. The standard differentiability property of moment generating functions on the interior of their finite domain, justified here by domination on a smaller neighbourhood of $\lambda$, gives \begin{align*} M_X'(\lambda)=\int_{\mathbb R}x e^{\lambda x}\,d\mu(x). \end{align*} Hence, if $Y_1$ has law $\mu_\lambda$, \begin{align*} \mathbb E_{\mu_\lambda}[Y_1]=\int_{\mathbb R}x\,d\mu_\lambda(x)=\frac{1}{M_X(\lambda)}\int_{\mathbb R}x e^{\lambda x}\,d\mu(x)=\frac{M_X'(\lambda)}{M_X(\lambda)}=(\log M_X)'(\lambda). \end{align*} Therefore, when $(\log M_X)'(\lambda)=a$, the tilted one-step mean is $\mathbb E_{\mu_\lambda}[Y_1]=a$. Since the tilted coordinates are independent and identically distributed with this mean, the tilted sum has mean $na$, so the threshold $na$ coincides with the mean of the sum under the tilted proposal. [guided] We now explain why the equation involving $(\log M_X)'$ chooses the tilt with the desired mean. The tilted law is defined by weighting the original law by $e^{\lambda x}$ and then normalising. Therefore, if $Y_1$ has distribution $\mu_\lambda$, its expectation is computed from the defining density: \begin{align*} \mathbb E_{\mu_\lambda}[Y_1]=\int_{\mathbb R}x\,d\mu_\lambda(x)=\frac{1}{M_X(\lambda)}\int_{\mathbb R}x e^{\lambda x}\,d\mu(x). \end{align*} The remaining question is how to recognize the numerator. Because $\lambda$ is an interior point of the finite domain of the moment generating function, there exists $\delta>0$ such that $M_X(\lambda-\delta)$ and $M_X(\lambda+\delta)$ are finite. The bound \begin{align*} |x|e^{\lambda x}\leq \delta^{-1}\left(e^{(\lambda+\delta)x}+e^{(\lambda-\delta)x}\right) \end{align*} shows that $x\mapsto x e^{\lambda x}$ is $\mu$-integrable. 
Differentiating under the integral sign requires slightly more: a single $\mu$-integrable function that dominates the derivative of the integrand near $\lambda$. For $|\theta-\lambda|\leq\delta/2$, combining $|x|\leq 2\delta^{-1}e^{\delta|x|/2}$ with $e^{\theta x}\leq e^{\lambda x+\delta|x|/2}$ gives \begin{align*} |x|e^{\theta x}\leq 2\delta^{-1}\left(e^{(\lambda+\delta)x}+e^{(\lambda-\delta)x}\right), \end{align*} and the right-hand side is $\mu$-integrable and does not depend on $\theta$. This is the domination needed for the standard differentiability theorem for moment generating functions on the interior of their finite domain, so differentiation may be passed under the integral sign at $\lambda$. This gives \begin{align*} M_X'(\lambda)=\int_{\mathbb R}x e^{\lambda x}\,d\mu(x). \end{align*} Substituting this identity into the expectation formula yields \begin{align*} \mathbb E_{\mu_\lambda}[Y_1]=\frac{M_X'(\lambda)}{M_X(\lambda)}. \end{align*} Since $M_X(\lambda)>0$, the derivative of the logarithm is \begin{align*} (\log M_X)'(\lambda)=\frac{M_X'(\lambda)}{M_X(\lambda)}. \end{align*} Combining the two displayed identities gives \begin{align*} \mathbb E_{\mu_\lambda}[Y_1]=(\log M_X)'(\lambda). \end{align*} Thus, if $\lambda$ is chosen so that $(\log M_X)'(\lambda)=a$, then the tilted one-dimensional mean equals $a$. For an independent tilted sample $(Y_1,\dots,Y_n)$, linearity of expectation gives \begin{align*} \mathbb E_{\mathbb P_\lambda}[Y_1+\cdots+Y_n]=\sum_{i=1}^n\mathbb E_{\mu_\lambda}[Y_i]=na. \end{align*} So the threshold $na$ of the event $\{\widetilde S_n\geq na\}$ no longer lies in the tail of the proposal: it coincides with the mean of the tilted sum. This proves the stated rare-event tilting principle. [/guided] [/step]
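The identity proved above can be checked numerically. The following is a minimal sketch under an assumed special case, not part of the theorem: the $X_i$ are standard normal, so $\log M_X(\lambda)=\lambda^2/2$, the tilted law $\mu_\lambda$ is normal with mean $\lambda$ and variance $1$, and the tilt solving $(\log M_X)'(\lambda)=a$ is $\lambda=a$. The function names `tilted_estimate` and `exact_tail` are hypothetical and chosen only for this illustration.

```python
import math
import random


def tilted_estimate(n, a, num_samples, seed=0):
    """Importance-sampling estimate of P(S_n >= n*a) for i.i.d. N(0,1) steps.

    Assumption for this sketch: mu is the standard normal law, so the
    tilted law mu_lambda is N(lambda, 1) and log M_X(lambda) = lambda^2 / 2.
    """
    rng = random.Random(seed)
    lam = a                      # solves (log M_X)'(lam) = lam = a
    log_mgf = 0.5 * lam * lam    # log M_X(lam) for N(0,1)
    total = 0.0
    for _ in range(num_samples):
        # Draw the tilted sample Y_1, ..., Y_n and form the tilted sum.
        s = sum(rng.gauss(lam, 1.0) for _ in range(n))
        if s >= n * a:
            # Weight by the likelihood ratio L_{lam,n}(s).
            total += math.exp(-lam * s + n * log_mgf)
    return total / num_samples


def exact_tail(n, a):
    """Exact P(S_n >= n*a) in the Gaussian case, where S_n ~ N(0, n)."""
    return 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2.0))


if __name__ == "__main__":
    n, a = 20, 1.0
    print("tilted estimate:", tilted_estimate(n, a, 20000, seed=1))
    print("exact tail     :", exact_tail(n, a))
```

In this Gaussian setting the exact tail is available in closed form, which is why it is used as a reference. For other laws, only `log_mgf` and the sampler for $\mu_\lambda$ would need to change.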
