Posterior Concentration at Rate $n^{-1/2}$ — Statement & Proof

Theorem



Proof

[proofplan] We decompose the posterior mass into contributions from a shrinking neighbourhood of $\theta_0$ and its complement. A second-order Taylor expansion of the log-likelihood ratio around $\theta_0$ shows that the likelihood is approximately Gaussian with precision $nI(\theta_0)$, concentrating the posterior near $\theta_0$ at scale $n^{-1/2}$. The mass outside $\{|\theta - \theta_0| \leq M_n/\sqrt{n}\}$ is shown to be negligible relative to the mass inside by bounding the likelihood ratio separately near $\theta_0$ and far from it. The prior contributes only a bounded multiplicative factor because $\pi$ is continuous and positive at $\theta_0$. [/proofplan]

[step:Write the posterior via Bayes' formula and decompose the normalizing constant] By Bayes' formula, the posterior density at $\theta$ given $X_1, \ldots, X_n$ is \begin{align*} \pi(\theta \mid X_1, \ldots, X_n) = \frac{\prod_{i=1}^n f(X_i, \theta) \cdot \pi(\theta)}{\int_\Theta \prod_{i=1}^n f(X_i, \vartheta) \cdot \pi(\vartheta) \, d\vartheta}. \end{align*} Write $\ell_n(\theta) = \sum_{i=1}^n \log f(X_i, \theta)$ for the log-likelihood, so that the log-likelihood ratio is $\ell_n(\theta) - \ell_n(\theta_0) = \sum_{i=1}^n \log \frac{f(X_i, \theta)}{f(X_i, \theta_0)}$. Then $\prod_{i=1}^n f(X_i, \theta) = \prod_{i=1}^n f(X_i, \theta_0) \cdot e^{\ell_n(\theta) - \ell_n(\theta_0)}$, and the common factor $\prod_{i=1}^n f(X_i, \theta_0)$ cancels between numerator and denominator: \begin{align*} \Pi\!\left(|\theta - \theta_0| > \frac{M_n}{\sqrt{n}} \;\Big|\; X_1, \ldots, X_n\right) = \frac{\int_{|\theta - \theta_0| > M_n/\sqrt{n}} e^{\ell_n(\theta) - \ell_n(\theta_0)} \pi(\theta) \, d\theta}{\int_\Theta e^{\ell_n(\theta) - \ell_n(\theta_0)} \pi(\theta) \, d\theta}. \end{align*} It suffices to show that the numerator is negligible compared to the denominator. [/step]

[step:Approximate the log-likelihood ratio by a quadratic form near $\theta_0$] Fix a small $\delta > 0$ such that $B(\theta_0, \delta) \subset \Theta$ and the regularity conditions hold on $B(\theta_0, \delta)$. 
For $\theta \in B(\theta_0, \delta)$, a second-order Taylor expansion of $\bar{\ell}_n(\theta) = \frac{1}{n}\ell_n(\theta)$ around $\theta_0$ gives \begin{align*} \bar{\ell}_n(\theta) - \bar{\ell}_n(\theta_0) = (\theta - \theta_0)^\top \nabla_\theta \bar{\ell}_n(\theta_0) + \frac{1}{2}(\theta - \theta_0)^\top \nabla^2_\theta \bar{\ell}_n(\theta^*)(\theta - \theta_0), \end{align*} where $\theta^*$ lies on the segment $[\theta_0, \theta]$. By the uniform law of large numbers for the Hessian, and provided the limiting Hessian is continuous at $\theta_0$, $\nabla^2_\theta \bar{\ell}_n(\theta^*)$ is within any prescribed tolerance of $-I(\theta_0)$ uniformly over $\theta \in B(\theta_0, \delta)$, with probability tending to 1, once $\delta$ is chosen small enough. Therefore on an event of probability tending to 1, for all $\theta \in B(\theta_0, \delta)$, \begin{align*} \ell_n(\theta) - \ell_n(\theta_0) = n(\theta - \theta_0)^\top \nabla_\theta \bar{\ell}_n(\theta_0) - \frac{n}{2}(\theta - \theta_0)^\top I(\theta_0)(\theta - \theta_0) + n \cdot o(|\theta - \theta_0|^2), \end{align*} where the remainder $n \cdot o(|\theta - \theta_0|^2)$ is bounded in absolute value by $n \kappa |\theta - \theta_0|^2$ uniformly in $\theta \in B(\theta_0, \delta)$, for a constant $\kappa > 0$ that can be made arbitrarily small by shrinking $\delta$.

[guided] The Taylor expansion reveals that the log-likelihood ratio has a dominant quadratic term $-\frac{n}{2}(\theta - \theta_0)^\top I(\theta_0)(\theta - \theta_0)$, a negative definite quadratic form that penalizes deviations from $\theta_0$ at rate $n$. The linear term $n(\theta - \theta_0)^\top \nabla_\theta \bar{\ell}_n(\theta_0)$ shifts the mode of the likelihood slightly away from $\theta_0$ (to the MLE), but this shift is of order $O_{\mathbb{P}}(n^{-1/2})$ and does not affect the concentration rate. The uniform control of the Hessian is crucial: the quadratic approximation must hold simultaneously over the whole ball $B(\theta_0, \delta)$, not just at a single point. This is where the regularity assumption (uniform law of large numbers for the second derivatives) is consumed. 
[/guided] [/step]

[step:Bound the denominator from below using the Gaussian approximation on $B(\theta_0, \varepsilon/\sqrt{n})$] Fix $\varepsilon > 0$ and restrict the denominator to $B(\theta_0, \varepsilon/\sqrt{n}) \subset B(\theta_0, \delta)$ (for $n$ large enough); since the integrand is nonnegative, this can only decrease the integral. On this ball, the substitution $\theta = \theta_0 + u/\sqrt{n}$ with $u \in B(0, \varepsilon)$ gives \begin{align*} \int_{B(\theta_0, \varepsilon/\sqrt{n})} e^{\ell_n(\theta) - \ell_n(\theta_0)} \pi(\theta) \, d\theta = \frac{1}{n^{p/2}} \int_{B(0, \varepsilon)} e^{u^\top \sqrt{n}\, \nabla_\theta \bar{\ell}_n(\theta_0) - \frac{1}{2} u^\top I(\theta_0) u + o_{\mathbb{P}}(1)} \pi(\theta_0 + u/\sqrt{n}) \, du, \end{align*} where the $o_{\mathbb{P}}(1)$ term is uniform in $u \in B(0, \varepsilon)$. Since $\pi$ is continuous and $\pi(\theta_0) > 0$, for $n$ large, $\pi(\theta_0 + u/\sqrt{n}) \geq \frac{1}{2}\pi(\theta_0)$ uniformly for $u \in B(0, \varepsilon)$. The term $\sqrt{n}\, \nabla_\theta \bar{\ell}_n(\theta_0) = O_{\mathbb{P}}(1)$ by the CLT for the score. Hence the exponent is bounded below uniformly on $B(0, \varepsilon)$ whenever the score term is bounded, and the integral is bounded below by \begin{align*} \frac{c}{n^{p/2}} \end{align*} for some constant $c > 0$ on an event of probability tending to 1. [/step]

[step:Bound the numerator by showing exponential decay outside $B(\theta_0, M_n/\sqrt{n})$] Split the numerator into two regions: $\{M_n/\sqrt{n} < |\theta - \theta_0| \leq \delta\}$ and $\{|\theta - \theta_0| > \delta\}$.

**Region 1:** For $\theta$ with $M_n/\sqrt{n} < |\theta - \theta_0| \leq \delta$, the quadratic approximation gives \begin{align*} \ell_n(\theta) - \ell_n(\theta_0) \leq n|\theta - \theta_0| \cdot |\nabla_\theta \bar{\ell}_n(\theta_0)| - \frac{n}{4} \lambda_{\min}(I(\theta_0)) |\theta - \theta_0|^2 \end{align*} on an event of probability tending to 1, where $\lambda_{\min}(I(\theta_0)) > 0$ is the smallest eigenvalue of $I(\theta_0)$ (positive definite by hypothesis). The remainder of the quadratic approximation has been absorbed by replacing the factor $\frac{1}{2}$ with $\frac{1}{4}$, which is possible once $\delta$ is small enough. 
After the substitution $\theta = \theta_0 + u/\sqrt{n}$, the exponent is at most $|u| \cdot |\sqrt{n}\, \nabla_\theta \bar{\ell}_n(\theta_0)| - \frac{1}{4}\lambda_{\min}(I(\theta_0)) |u|^2$, so the exponential factor on $\{|u| > M_n\}$ decays as $e^{-c' M_n^2}$ for some $c' > 0$, up to a factor $e^{O_{\mathbb{P}}(1) |u|}$ from the linear score term, which is absorbed by halving the constant in the exponent. The prior is bounded on bounded sets. Hence this contribution is at most $C \cdot n^{-p/2} \cdot e^{-c' M_n^2 / 2}$ for some constant $C > 0$, on an event of probability tending to 1.

**Region 2:** For $|\theta - \theta_0| > \delta$, the [Uniform Law of Large Numbers](/theorems/1855) ensures that $\bar{\ell}_n(\theta) - \bar{\ell}_n(\theta_0) \xrightarrow{\mathbb{P}} \ell(\theta) - \ell(\theta_0) < 0$ uniformly, where $\ell$ denotes the limiting (population) log-likelihood; the strict inequality holds since $\theta_0$ is the unique maximizer of $\ell$ (the identifiability condition in the regularity assumptions). Therefore $\sup_{|\theta - \theta_0| > \delta} (\bar{\ell}_n(\theta) - \bar{\ell}_n(\theta_0)) \leq -\eta$ for some $\eta > 0$ with probability tending to 1. Since $\ell_n(\theta) - \ell_n(\theta_0) = n(\bar{\ell}_n(\theta) - \bar{\ell}_n(\theta_0))$, the contribution from this region is at most $e^{-n\eta} \int_\Theta \pi(\theta) \, d\theta = e^{-n\eta}$, which is exponentially small in $n$.

[guided] The two regions capture different mechanisms. In Region 1, the quadratic approximation is valid, and the key is that the Gaussian factor $e^{-\frac{n}{4}\lambda_{\min}|\theta - \theta_0|^2}$ provides strong decay. After rescaling by $u = \sqrt{n}(\theta - \theta_0)$, the mass on $\{|u| > M_n\}$ of a Gaussian density with fixed covariance decays as $e^{-c' M_n^2}$, which overwhelms the at most $e^{O(|u|)}$ growth contributed by the score term. In Region 2, we are at a fixed distance from $\theta_0$, so the law of large numbers drives $\bar{\ell}_n(\theta) - \bar{\ell}_n(\theta_0)$ towards $\ell(\theta) - \ell(\theta_0)$, which is strictly negative by identifiability. The exponential factor $e^{n(\bar{\ell}_n(\theta) - \bar{\ell}_n(\theta_0))} \leq e^{-n\eta}$ kills this region. In both cases, the contribution to the numerator is $o(n^{-p/2})$, which is negligible compared to the denominator bound $c/n^{p/2}$. 
[/guided] [/step]

[step:Combine the bounds to conclude posterior concentration] Assembling the estimates from the previous steps, on an event $E_n$ with $\mathbb{P}_{\theta_0}(E_n) \to 1$: \begin{align*} \Pi\!\left(|\theta - \theta_0| > \frac{M_n}{\sqrt{n}} \;\Big|\; X_1, \ldots, X_n\right) \leq \frac{C \cdot n^{-p/2} \cdot e^{-c' M_n^2/2} + e^{-n\eta}}{c \cdot n^{-p/2}} = \frac{C}{c} e^{-c' M_n^2/2} + \frac{n^{p/2}}{c} e^{-n\eta}. \end{align*} The first term tends to zero because $M_n \to \infty$, and the second because $e^{-n\eta}$ decays faster than any power of $n$. On the complementary event $E_n^c$, we use the trivial bound that the posterior probability is at most 1, and $\mathbb{P}_{\theta_0}(E_n^c) \to 0$. Therefore \begin{align*} \Pi\!\left(|\theta - \theta_0| > \frac{M_n}{\sqrt{n}} \;\Big|\; X_1, \ldots, X_n\right) \xrightarrow{\mathbb{P}_{\theta_0}} 0. \end{align*} This shows that the posterior concentrates in a ball of radius $M_n/\sqrt{n}$ around $\theta_0$ for any $M_n \to \infty$, establishing the $n^{-1/2}$ concentration rate. [/step]
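
The conclusion can be checked numerically in a model where the posterior is available in closed form. The following is a minimal sketch under assumptions that are not part of the proof: the data are $X_i \sim N(\theta, 1)$, the prior is $N(0, 10)$, the true value is $\theta_0 = 0.7$, and $M_n = n^{1/4}$. The function names `posterior_mass_outside` and `concentration_table` are hypothetical helpers introduced only for this illustration. The posterior here is Gaussian, so the mass outside $\{|\theta - \theta_0| \leq M_n/\sqrt{n}\}$ can be computed exactly from the normal CDF.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def posterior_mass_outside(xs, theta0, radius, prior_var=10.0):
    """Posterior mass of {|theta - theta0| > radius}.

    Assumed model (illustrative only): X_i ~ N(theta, 1) with prior
    theta ~ N(0, prior_var), so the posterior is Gaussian in closed form.
    """
    n = len(xs)
    post_prec = n + 1.0 / prior_var      # posterior precision
    post_mean = sum(xs) / post_prec      # posterior mean (prior mean is 0)
    post_sd = 1.0 / math.sqrt(post_prec)
    upper = normal_cdf((theta0 + radius - post_mean) / post_sd)
    lower = normal_cdf((theta0 - radius - post_mean) / post_sd)
    return max(0.0, 1.0 - (upper - lower))


def concentration_table(theta0=0.7, sizes=(10, 100, 1000, 10000), seed=0):
    """Return a list of (n, M_n, posterior mass outside M_n / sqrt(n))."""
    rng = random.Random(seed)
    rows = []
    for n in sizes:
        xs = [rng.gauss(theta0, 1.0) for _ in range(n)]
        m_n = n ** 0.25                  # assumed choice; any M_n -> infinity works
        radius = m_n / math.sqrt(n)
        rows.append((n, m_n, posterior_mass_outside(xs, theta0, radius)))
    return rows


if __name__ == "__main__":
    for n, m_n, mass in concentration_table():
        print(f"n={n:6d}  M_n={m_n:6.2f}  mass outside = {mass:.3e}")
```

In this conjugate setting the radius $M_n/\sqrt{n}$ shrinks while the ratio of the radius to the posterior standard deviation grows like $M_n$, which is the mechanism behind the $e^{-c' M_n^2/2}$ term in the final bound. A slower sequence such as $M_n = \log n$ can be substituted to see how the rate of decay of the outside mass depends on $M_n$.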


Posterior Concentration at Rate $n^{-1/2}$ (Theorem # 1868)
