Central Limit Theorem for Self-Normalized Importance Sampling

Central Limit Theorem for Self-Normalized Importance Sampling (Theorem # 7211)




Proof

[proofplan]
The proof first verifies the two basic importance-sampling identities: $\mathbb{E}_{\mathbb{Q}}[w(Y)]=Z$ and $\mathbb{E}_{\mathbb{Q}}[w(Y)h(Y)]=ZI$. These identities show that the centered weighted summands $w(Y_i)(h(Y_i)-I)$ have mean zero; their second moment is finite by assumption. The normalized numerator then converges in distribution by the one-dimensional [Central Limit Theorem](/theorems/521), while the denominator converges almost surely to $Z>0$ by the [Strong Law of Large Numbers](/theorems/520). Slutsky's Theorem gives the limiting normal distribution for the ratio.
[/proofplan]

[step:Verify the importance-sampling identities under the support condition]
Let $A_0:=\{y\in S:q(y)=0\}$ and $A_1:=\{y\in S:q(y)>0\}$. Since $\gamma=0$ $\mu$-almost everywhere on $A_0$, and since $wq=\gamma$ pointwise on $A_1$ while $wq=0$ on $A_0$, we have $wq=\gamma$ $\mu$-almost everywhere on $S$. Therefore
\begin{align*}
\mathbb{E}_{\mathbb{Q}}[w(Y)]=\int_S w(y)\,d\mathbb{Q}(y)=\int_S w(y)q(y)\,d\mu(y)=\int_S \gamma(y)\,d\mu(y)=Z.
\end{align*}
The assumption $\mathbb{E}_{\mathbb{Q}}[w(Y)|h(Y)|]<\infty$ gives absolute integrability of $w(Y)h(Y)$ under $\mathbb{Q}$. Using $d\pi=Z^{-1}\gamma\,d\mu$ and $wq=\gamma$ $\mu$-almost everywhere,
\begin{align*}
\mathbb{E}_{\mathbb{Q}}[w(Y)h(Y)]=\int_S w(y)h(y)\,d\mathbb{Q}(y)=\int_S w(y)h(y)q(y)\,d\mu(y)=\int_S h(y)\gamma(y)\,d\mu(y)=Z\int_S h(y)\,d\pi(y)=ZI.
\end{align*}
[/step]

[step:Rewrite the centered self-normalized estimator as a ratio]
For each $i\in\mathbb{N}$, define the real-valued random variables
\begin{align*}
X_i:=w(Y_i)(h(Y_i)-I)
\end{align*}
and
\begin{align*}
W_i:=w(Y_i).
\end{align*}
Whenever $\sum_{i=1}^N W_i>0$, the definition of $\widehat I_{N,\mathrm{SN}}$ gives
\begin{align*}
\widehat I_{N,\mathrm{SN}}-I=\frac{\sum_{i=1}^N W_i h(Y_i)}{\sum_{i=1}^N W_i}-I=\frac{\sum_{i=1}^N W_i(h(Y_i)-I)}{\sum_{i=1}^N W_i}.
\end{align*}
Since the numerator equals $\sum_{i=1}^N X_i$, multiplying both sides by $\sqrt{N}$ and then dividing the numerator and the denominator by $N$ yields
\begin{align*}
\sqrt{N}\left(\widehat I_{N,\mathrm{SN}}-I\right)=\frac{N^{-1/2}\sum_{i=1}^N X_i}{N^{-1}\sum_{i=1}^N W_i}.
\end{align*}
[guided]
For each $i\in\mathbb{N}$, introduce two real-valued random variables:
\begin{align*}
X_i:=w(Y_i)(h(Y_i)-I)
\end{align*}
and
\begin{align*}
W_i:=w(Y_i).
\end{align*}
The variable $X_i$ is the centered weighted contribution to the numerator, and $W_i$ is the corresponding normalizing weight. This notation isolates the two parts of the estimator that have different asymptotic behaviour: a sum whose fluctuations are of order $N^{1/2}$ and an average converging to a positive constant.

Assume $\sum_{i=1}^N W_i>0$, so the self-normalized estimator is defined. Starting from the definition,
\begin{align*}
\widehat I_{N,\mathrm{SN}}=\frac{\sum_{i=1}^N W_i h(Y_i)}{\sum_{i=1}^N W_i}.
\end{align*}
Subtracting $I$ requires putting $I$ over the same denominator:
\begin{align*}
\widehat I_{N,\mathrm{SN}}-I=\frac{\sum_{i=1}^N W_i h(Y_i)}{\sum_{i=1}^N W_i}-\frac{I\sum_{i=1}^N W_i}{\sum_{i=1}^N W_i}.
\end{align*}
Combining the fractions gives
\begin{align*}
\widehat I_{N,\mathrm{SN}}-I=\frac{\sum_{i=1}^N W_i(h(Y_i)-I)}{\sum_{i=1}^N W_i}.
\end{align*}
By the definition of $X_i$, the numerator is $\sum_{i=1}^N X_i$. Hence
\begin{align*}
\sqrt{N}\left(\widehat I_{N,\mathrm{SN}}-I\right)=\sqrt{N}\frac{\sum_{i=1}^N X_i}{\sum_{i=1}^N W_i}.
\end{align*}
Dividing both the numerator and the denominator by $N$ gives the asymptotically useful ratio
\begin{align*}
\sqrt{N}\left(\widehat I_{N,\mathrm{SN}}-I\right)=\frac{N^{-1/2}\sum_{i=1}^N X_i}{N^{-1}\sum_{i=1}^N W_i}.
\end{align*}
This is the key algebraic reduction: the numerator is now in the exact normalization used by the [Central Limit Theorem](/theorems/1848), while the denominator is an ordinary sample average.
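The ratio identity can be checked numerically. The following is a minimal sketch, not part of the wiki entry: the function name `ratio_identity_sides` and the argument `center` are illustrative assumptions. Because the identity is purely algebraic, it holds for any real centering value, not only the true $I$, and for any weights with positive sum.

```python
import math
import random


def ratio_identity_sides(weights, h_values, center):
    """Return both sides of sqrt(N) * (I_hat - center) = A_N / B_N.

    `weights` plays the role of W_i = w(Y_i), `h_values` the role of h(Y_i),
    and `center` the role of I. All three names are hypothetical.
    """
    n = len(weights)
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("the estimator is undefined when the weights sum to zero")

    # Left-hand side: the scaled error of the self-normalized estimator.
    estimate = sum(w * h for w, h in zip(weights, h_values)) / total_w
    lhs = math.sqrt(n) * (estimate - center)

    # Right-hand side: N^{-1/2} * sum(X_i) divided by N^{-1} * sum(W_i).
    xs = [w * (h - center) for w, h in zip(weights, h_values)]
    rhs = (sum(xs) / math.sqrt(n)) / (total_w / n)
    return lhs, rhs


rng = random.Random(0)
weights = [rng.expovariate(1.0) for _ in range(50)]   # arbitrary positive weights
h_values = [rng.gauss(0.0, 1.0) for _ in range(50)]   # arbitrary integrand values
lhs, rhs = ratio_identity_sides(weights, h_values, center=0.3)
print(math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9))
```

The two sides agree up to floating-point rounding, which is why the comparison uses a tolerance rather than exact equality.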
[/guided]
[/step]

[step:Apply the Central Limit Theorem to the centered numerator]
The random variables $(X_i)_{i\in\mathbb{N}}$ are independent and identically distributed because $(Y_i)_{i\in\mathbb{N}}$ are independent and identically distributed and each $X_i$ is the same measurable function of $Y_i$. By the identities from the first step,
\begin{align*}
\mathbb{E}[X_1]=\mathbb{E}_{\mathbb{Q}}[w(Y)(h(Y)-I)]=\mathbb{E}_{\mathbb{Q}}[w(Y)h(Y)]-I\mathbb{E}_{\mathbb{Q}}[w(Y)]=ZI-IZ=0.
\end{align*}
Define
\begin{align*}
\sigma^2:=\mathbb{E}_{\mathbb{Q}}\left[w(Y)^2(h(Y)-I)^2\right].
\end{align*}
The assumed second-moment condition gives $\sigma^2<\infty$, and since $\mathbb{E}[X_1]=0$ we have $\operatorname{Var}(X_1)=\mathbb{E}[X_1^2]=\sigma^2$. By the Central Limit Theorem (citing a result not yet in the wiki: Central Limit Theorem), applied to the real-valued independent identically distributed sequence $(X_i)_{i\in\mathbb{N}}$ with mean $0$ and finite variance $\sigma^2$,
\begin{align*}
N^{-1/2}\sum_{i=1}^N X_i \xrightarrow{d}\mathcal{N}(0,\sigma^2).
\end{align*}
[/step]

[step:Use the Strong Law to prove eventual positivity of the denominator]
The random variables $(W_i)_{i\in\mathbb{N}}$ are independent and identically distributed, nonnegative, and integrable because $\mathbb{E}[W_1]=\mathbb{E}_{\mathbb{Q}}[w(Y)]=Z<\infty$. By the [Strong Law of Large Numbers](/theorems/1852) (citing a result not yet in the wiki: Strong Law of Large Numbers),
\begin{align*}
N^{-1}\sum_{i=1}^N W_i \to Z
\end{align*}
almost surely. Since $Z>0$, on this almost sure event there exists a random index $N_0$ such that for every $N\geq N_0$,
\begin{align*}
N^{-1}\sum_{i=1}^N W_i>\frac{Z}{2}.
\end{align*}
Thus $\sum_{i=1}^N w(Y_i)>0$ eventually almost surely.
[guided]
We now study the denominator
\begin{align*}
N^{-1}\sum_{i=1}^N W_i.
\end{align*}
The variables $(W_i)_{i\in\mathbb{N}}$ are independent and identically distributed because each $W_i=w(Y_i)$ is obtained from $Y_i$ by the same measurable map $w:S\to[0,\infty)$. They are nonnegative by construction of $w$.
They are also integrable, since the first importance-sampling identity gives
\begin{align*}
\mathbb{E}[W_1]=\mathbb{E}_{\mathbb{Q}}[w(Y)]=Z<\infty.
\end{align*}
These are precisely the hypotheses needed to apply the Strong Law of Large Numbers for integrable independent identically distributed real-valued random variables. Therefore, by the Strong Law of Large Numbers (citing a result not yet in the wiki: Strong Law of Large Numbers),
\begin{align*}
N^{-1}\sum_{i=1}^N W_i \to \mathbb{E}[W_1]=Z
\end{align*}
almost surely.

The positivity of $Z$ is essential here. Since the sample averages converge almost surely to the strictly positive number $Z$, on the almost sure convergence event there exists a random integer $N_0$ such that for every $N\geq N_0$,
\begin{align*}
\left|N^{-1}\sum_{i=1}^N W_i-Z\right|<\frac{Z}{2}.
\end{align*}
This implies
\begin{align*}
N^{-1}\sum_{i=1}^N W_i>\frac{Z}{2}.
\end{align*}
Multiplying by $N>0$ gives
\begin{align*}
\sum_{i=1}^N w(Y_i)>0
\end{align*}
for every $N\geq N_0$. Hence the denominator of the self-normalized estimator is eventually positive almost surely.
[/guided]
[/step]

[step:Pass to the ratio using Slutsky's Theorem]
Let
\begin{align*}
A_N:=N^{-1/2}\sum_{i=1}^N X_i
\end{align*}
and
\begin{align*}
B_N:=N^{-1}\sum_{i=1}^N W_i.
\end{align*}
The preceding steps show that $A_N\xrightarrow{d}\mathcal{N}(0,\sigma^2)$ and $B_N\to Z$ almost surely, hence $B_N\xrightarrow{\mathbb{P}}Z$. Since $Z>0$, the map $r:\mathbb{R}\times(0,\infty)\to\mathbb{R}$ defined by $r(a,b):=a/b$ is continuous at every point with second coordinate $Z$. By Slutsky's Theorem (citing a result not yet in the wiki: Slutsky's Theorem),
\begin{align*}
\frac{A_N}{B_N}\xrightarrow{d}\mathcal{N}\left(0,\frac{\sigma^2}{Z^2}\right).
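\end{align*}
The ratio identity from the second step holds whenever $\sum_{i=1}^N W_i>0$, which by the Strong Law step is the case for all sufficiently large $N$ almost surely. Hence $\sqrt{N}\left(\widehat I_{N,\mathrm{SN}}-I\right)$ and $A_N/B_N$ differ only on an event whose probability tends to zero, so they have the same limit in distribution. Substituting the definition of $\sigma^2$, we obtain
\begin{align*}
\sqrt{N}\left(\widehat I_{N,\mathrm{SN}}-I\right)\xrightarrow{d}\mathcal{N}\left(0,\frac{1}{Z^2}\mathbb{E}_{\mathbb{Q}}\left[w(Y)^2(h(Y)-I)^2\right]\right).
\end{align*}
This is the asserted central limit theorem for the self-normalized [importance sampling estimator](/theorems/2001).
[/step]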
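The limiting variance can be illustrated by simulation. The sketch below is an assumption-laden example rather than part of the proof: it takes the unnormalized target $\gamma(y)=e^{-y^2/2}$ on $\mathbb{R}$ (so $\pi$ is standard normal and $Z=\sqrt{2\pi}$), a normal proposal with standard deviation `scale`, and $h(y)=y$, so that $I=0$. For this particular choice a direct Gaussian integral gives $\sigma^2/Z^2 = s/(2\sqrt{2}\,a^{3/2})$ with $s$ the proposal standard deviation and $a=1-1/(2s^2)$, which requires $s^2>1/2$ for the second-moment condition to hold. All function names are hypothetical.

```python
import math
import random


def snis_estimate(ys, log_gamma, log_q, h):
    """Self-normalized importance sampling estimate from proposal draws `ys`."""
    # Weights w = gamma / q, formed on the log scale before exponentiating.
    ws = [math.exp(log_gamma(y) - log_q(y)) for y in ys]
    return sum(w * h(y) for w, y in zip(ws, ys)) / sum(ws)


def simulate_scaled_errors(n_samples, n_reps, scale, seed=0):
    """Draw `n_reps` independent copies of sqrt(N) * (I_hat - I) with I = 0."""
    rng = random.Random(seed)

    def log_gamma(y):
        # Unnormalized standard normal target (assumed example).
        return -0.5 * y * y

    def log_q(y):
        # Normalized N(0, scale^2) proposal density.
        return -0.5 * (y / scale) ** 2 - math.log(scale * math.sqrt(2.0 * math.pi))

    def h(y):
        return y

    errors = []
    for _ in range(n_reps):
        ys = [rng.gauss(0.0, scale) for _ in range(n_samples)]
        errors.append(math.sqrt(n_samples) * snis_estimate(ys, log_gamma, log_q, h))
    return errors


def asymptotic_variance(scale):
    """Closed form of sigma^2 / Z^2 for this Gaussian example (needs scale^2 > 1/2)."""
    a = 1.0 - 1.0 / (2.0 * scale ** 2)
    return scale / (2.0 * math.sqrt(2.0) * a ** 1.5)


if __name__ == "__main__":
    errs = simulate_scaled_errors(n_samples=400, n_reps=1000, scale=2.0, seed=1)
    empirical = sum(e * e for e in errs) / len(errs)
    print("empirical variance of scaled errors:", empirical)
    print("theoretical sigma^2 / Z^2:          ", asymptotic_variance(2.0))
```

The empirical variance should be close to the closed-form value but not equal to it, since both the number of replications and the sample size $N$ are finite.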
