Optimal Scaling Theorem for Random-Walk Metropolis on Product Targets (Theorem # 7223)

Proof

[proofplan] The diffusion-limit theorem is assumed as an input, so the proof reduces to optimizing the speed function $h$ displayed in the theorem statement. We remove the nuisance Fisher-information constant by setting $x=\ell\sqrt{I}$, differentiate the resulting one-variable function, and rewrite the critical-point equation using the standard normal density. A monotonicity argument based on the normal Mills ratio proves that this critical point is unique and is therefore the global maximizer. Finally, evaluating the standard normal distribution function at the optimizer gives the acceptance probability $a(\ell_*)\approx0.234$. [/proofplan]

[step:Reduce the optimization to a dimensionless one-variable problem] Define the standard normal density $\phi: \mathbb{R} \to (0,\infty)$ by \begin{align*} \phi(t) := \frac{1}{\sqrt{2\pi}} e^{-t^2/2}. \end{align*} Define the standard normal distribution function $\Phi: \mathbb{R} \to (0,1)$ by \begin{align*} \Phi(t) := \int_{-\infty}^{t} \phi(s)\,d\mathcal{L}^1(s). \end{align*} Since $I \in (0,\infty)$, the map $\ell \mapsto x=\ell\sqrt{I}$ is a bijection from $(0,\infty)$ onto $(0,\infty)$. Define $g: (0,\infty) \to (0,\infty)$ by \begin{align*} g(x) := x^2 \Phi\left(-\frac{x}{2}\right). \end{align*} Then \begin{align*} h(\ell) = \frac{2}{I} g(\ell\sqrt{I}) = 2\ell^2\Phi\left(-\frac{\ell\sqrt{I}}{2}\right). \end{align*} The factor $2/I$ is positive, so maximizing $h$ over $\ell>0$ is equivalent to maximizing $g$ over $x>0$. [/step]

[step:Differentiate the dimensionless speed and obtain the critical-point equation] The function $g$ is differentiable on $(0,\infty)$. Using the product rule and the chain rule, with $\Phi'=\phi$ and $\phi$ even, we obtain \begin{align*} g'(x) = 2x\Phi\left(-\frac{x}{2}\right) - \frac{x^2}{2}\phi\left(\frac{x}{2}\right). \end{align*} Since $x>0$, the equation $g'(x)=0$ is equivalent to \begin{align*} 4\Phi\left(-\frac{x}{2}\right) = x\phi\left(\frac{x}{2}\right). \end{align*}

[guided] We compute the derivative carefully because the numerical constant $2.38$ comes entirely from this scalar optimization. The function being optimized is \begin{align*} g(x) = x^2\Phi\left(-\frac{x}{2}\right). \end{align*} The first factor differentiates to $2x$. For the second factor, the chain rule gives \begin{align*} \frac{d}{dx}\Phi\left(-\frac{x}{2}\right) = -\frac{1}{2}\phi\left(-\frac{x}{2}\right). \end{align*} The density $\phi$ is even, so $\phi(-x/2)=\phi(x/2)$. Therefore \begin{align*} g'(x) = 2x\Phi\left(-\frac{x}{2}\right) - \frac{x^2}{2}\phi\left(\frac{x}{2}\right). \end{align*} Because $x>0$, we may divide the equation $g'(x)=0$ by $x/2$. This gives \begin{align*} 4\Phi\left(-\frac{x}{2}\right) = x\phi\left(\frac{x}{2}\right). \end{align*} This is the critical-point equation that determines the optimal dimensionless scale $x=\ell\sqrt{I}$. [/guided] [/step]

[step:Prove that the critical-point equation has exactly one positive solution] For $u>0$, define the normal upper-tail function $Q: (0,\infty) \to (0,1/2)$ by \begin{align*} Q(u) := \Phi(-u) = \int_u^\infty \phi(t)\,d\mathcal{L}^1(t). \end{align*} With $u=x/2$, the critical-point equation becomes \begin{align*} 2Q(u) = u\phi(u). \end{align*} Define $R: (0,\infty) \to (0,\infty)$ by \begin{align*} R(u) := \frac{u\phi(u)}{2Q(u)}. \end{align*} The critical-point equation is $R(u)=1$. We show that $R$ is strictly increasing. Since $Q'(u)=-\phi(u)$ and $\phi'(u)=-u\phi(u)$, logarithmic differentiation gives \begin{align*} \frac{R'(u)}{R(u)} = \frac{1}{u} - u + \frac{\phi(u)}{Q(u)}. \end{align*} For $u>0$, the Mills ratio upper bound $Q(u)<\phi(u)/u$ follows from $t/u>1$ for $t>u$: \begin{align*} Q(u) = \int_u^\infty \phi(t)\,d\mathcal{L}^1(t) < \int_u^\infty \frac{t}{u}\phi(t)\,d\mathcal{L}^1(t) = \frac{\phi(u)}{u}, \end{align*} where the last equality uses $\phi'(t)=-t\phi(t)$. Thus $\phi(u)/Q(u)>u$, and hence \begin{align*} \frac{R'(u)}{R(u)} > \frac{1}{u} > 0. \end{align*} Since $R(u)>0$, it follows that $R'(u)>0$, so $R$ is strictly increasing. Moreover, \begin{align*} \lim_{u\downarrow 0} R(u)=0, \end{align*} because $u\phi(u)\to0$ and $Q(u)\to1/2$. Also $R(2)>1$, since numerically $Q(2)=\Phi(-2)\approx0.0228$ and $\phi(2)\approx0.0540$, so \begin{align*} R(2) = \frac{2\phi(2)}{2Q(2)} = \frac{\phi(2)}{Q(2)} \approx 2.37 > 1. \end{align*} By continuity of $R$ and the intermediate value theorem, the equation $R(u)=1$ has a solution in $(0,2)$, and by strict monotonicity this solution $u_*$ is the only one in $(0,\infty)$. Consequently there is a unique positive solution $x_*:=2u_*$ of the critical-point equation. [/step]

[step:Identify the unique critical point as the global maximizer] Since $R$ is strictly increasing with $R(x_*/2)=1$, and since $2x\Phi(-x/2)>0$ for $x>0$, the identity \begin{align*} g'(x) = 2x\Phi\left(-\frac{x}{2}\right)\left(1 - R\left(\frac{x}{2}\right)\right) \end{align*} shows that $g'(x)>0$ for $0<x<x_*$ and $g'(x)<0$ for $x>x_*$. Hence $g$ is strictly increasing on $(0,x_*)$ and strictly decreasing on $(x_*,\infty)$. Therefore $x_*$ is the unique global maximizer of $g$ on $(0,\infty)$. Since $x=\ell\sqrt{I}$, the unique maximizer of $h$ is \begin{align*} \ell_* := \frac{x_*}{\sqrt{I}}. \end{align*} [/step]

[step:Evaluate the optimal scale and the corresponding acceptance probability] Define $F: (0,\infty) \to \mathbb{R}$ by \begin{align*} F(x) := 4\Phi\left(-\frac{x}{2}\right) - x\phi\left(\frac{x}{2}\right). \end{align*} The zeros of $F$ are exactly the solutions of the critical-point equation, so the preceding uniqueness argument shows that the unique zero of $F$ is $x_*$, with $F>0$ on $(0,x_*)$ and $F<0$ on $(x_*,\infty)$. A direct interval computation gives \begin{align*} F(2.381)>0 \end{align*} and \begin{align*} F(2.382)<0. \end{align*} Hence $2.381<x_*<2.382$, so \begin{align*} \ell_*\sqrt{I}=x_*\approx2.38. \end{align*} The limiting average acceptance probability at the optimum is \begin{align*} a(\ell_*) = 2\Phi\left(-\frac{\ell_*\sqrt{I}}{2}\right) = 2\Phi\left(-\frac{x_*}{2}\right). \end{align*} Using $2.381<x_*<2.382$ gives $1.1905<x_*/2<1.191$. Since $\Phi$ is increasing, $2\Phi(-1.191)<a(\ell_*)<2\Phi(-1.1905)$, and evaluating the standard normal distribution function at these endpoints gives $0.2336<a(\ell_*)<0.2339$, so \begin{align*} a(\ell_*) \approx 0.234. \end{align*} This proves the stated optimal scaling and the corresponding limiting acceptance probability. [/step]
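
The bracketing $2.381<x_*<2.382$ and the value $a(\ell_*)\approx0.234$ can be checked numerically. The Python sketch below is an illustration only and not part of the proof. The helper names `phi`, `Phi`, `g`, `F`, `R`, `bisect_root`, and `optimal_ell` are hypothetical, $\Phi$ is evaluated through the standard-library function `math.erfc`, and the starting bracket $[1,4]$ is an assumption chosen because $F$ changes sign on it. Plain bisection is used here for simplicity; it relies on the sign pattern of $F$ established in the proof.

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi(t):
    """Standard normal distribution function, written via erfc."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def g(x):
    """Dimensionless speed g(x) = x^2 * Phi(-x/2)."""
    return x * x * Phi(-x / 2.0)


def F(x):
    """Critical-point function F(x) = 4*Phi(-x/2) - x*phi(x/2)."""
    return 4.0 * Phi(-x / 2.0) - x * phi(x / 2.0)


def R(u):
    """Ratio R(u) = u*phi(u) / (2*Q(u)) with Q(u) = Phi(-u)."""
    return u * phi(u) / (2.0 * Phi(-u))


def bisect_root(f, lo, hi, tol=1e-12):
    """Bisection for a root of f, assuming f(lo) > 0 > f(hi)."""
    if not (f(lo) > 0.0 > f(hi)):
        raise ValueError("f must be positive at lo and negative at hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


def optimal_ell(fisher_info):
    """Optimal scale ell_* = x_* / sqrt(I) for a given I > 0 (hypothetical helper)."""
    return x_star / math.sqrt(fisher_info)


# Assumed bracket [1, 4]: F(1) > 0 and F(4) < 0.
x_star = bisect_root(F, 1.0, 4.0)
accept = 2.0 * Phi(-x_star / 2.0)

print(f"x_* = {x_star:.6f}")
print(f"a(ell_*) = {accept:.6f}")
print(f"R(x_*/2) = {R(x_star / 2.0):.6f}")
```

The printed `x_*` should fall in the interval $(2.381, 2.382)$ obtained in the proof, and `R(x_*/2)` should be close to $1$. Calling `optimal_ell(I)` with a concrete Fisher information value rescales the result to $\ell_*$.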
