Asymptotic Normality of Sample Quantiles — Statement & Proof

Asymptotic Normality of Sample Quantiles (Theorem # 6301)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We compare the event that the empirical quantile lies to the left of a moving threshold $q_p + y/\sqrt n$ with the event that the empirical distribution function at that threshold has already reached height $p$. The empirical count at this moving threshold is binomial with success probability $F(q_p + y/\sqrt n)$, so a [central limit theorem](/theorems/521) for Bernoulli triangular arrays gives its limiting fluctuations. The differentiability of $F$ at $q_p$ converts the deterministic shift $F(q_p + y/\sqrt n)-p$ into the linear term $f_X(q_p)y/\sqrt n$, which produces the inverse-density variance factor. [/proofplan] [step:Relate the empirical quantile event to the empirical distribution function] Fix $y \in \mathbb R$. For each $n \in \mathbb N$, define the deterministic threshold $t_n \in \mathbb R$ by \begin{align*} t_n := q_p + \frac{y}{\sqrt n}. \end{align*} Since $F_n: \mathbb R \to [0,1]$ is nondecreasing and right-continuous, and since $F_n^{-1}(p)=\inf\{t \in \mathbb R:F_n(t)\ge p\}$, we have the event identity \begin{align*} \left\{\sqrt n\bigl(F_n^{-1}(p)-q_p\bigr)\le y\right\}=\left\{F_n^{-1}(p)\le t_n\right\}. \end{align*} Also, \begin{align*} \left\{F_n^{-1}(p)\le t_n\right\}=\left\{F_n(t_n)\ge p\right\}. \end{align*} The first equality is the algebraic rearrangement of the inequality, and the second equality follows from monotonicity of $F_n$ and the definition of the generalized inverse. [guided] Fix a real number $y$. To prove convergence in distribution, it is enough to compute the limiting distribution function at each continuity point of the proposed Gaussian limit. The proposed limit is a nondegenerate normal distribution, so every $y \in \mathbb R$ is a continuity point. For each $n \in \mathbb N$, define \begin{align*} t_n := q_p + \frac{y}{\sqrt n}. \end{align*} The event \begin{align*} \left\{\sqrt n\bigl(F_n^{-1}(p)-q_p\bigr)\le y\right\} \end{align*} is exactly the event that the empirical $p$-quantile is at most $t_n$. Rearranging the inequality gives \begin{align*} \left\{\sqrt n\bigl(F_n^{-1}(p)-q_p\bigr)\le y\right\} = \left\{F_n^{-1}(p)\le t_n\right\}. \end{align*} Now use the defining property of a generalized inverse. Since $F_n$ is nondecreasing and right-continuous, the empirical quantile is the first point where $F_n$ reaches height $p$. Therefore the condition $F_n^{-1}(p)\le t_n$ is equivalent to saying that by the time we reach $t_n$, the empirical distribution function has reached height $p$: \begin{align*} \left\{F_n^{-1}(p)\le t_n\right\} = \left\{F_n(t_n)\ge p\right\}. \end{align*} Thus \begin{align*} \left\{\sqrt n\bigl(F_n^{-1}(p)-q_p\bigr)\le y\right\} = \left\{F_n(t_n)\ge p\right\}. \end{align*} This identity is the mechanism that turns a horizontal quantile fluctuation into a vertical empirical distribution fluctuation. [/guided] [/step] [step:Compute the local deterministic shift of $F$ at the moving threshold] Because $t_n \to q_p$ and $F$ is differentiable at $q_p$, the definition of derivative gives \begin{align*} F(t_n)=F(q_p)+f_X(q_p)(t_n-q_p)+o(|t_n-q_p|). \end{align*} Substituting $F(q_p)=p$ and $t_n-q_p=y/\sqrt n$ gives \begin{align*} F(t_n)=p+\frac{f_X(q_p)y}{\sqrt n}+o\left(\frac{1}{\sqrt n}\right). \end{align*} Therefore \begin{align*} \sqrt n\bigl(F(t_n)-p\bigr)\to f_X(q_p)y. \end{align*} [/step] [step:Apply the central limit theorem to the empirical count at $t_n$] For each $n \in \mathbb N$ and each $i \in \{1,\dots,n\}$, define the Bernoulli [random variable](/page/Random%20Variable) Let $Y_{n,i}: \Omega \to \{0,1\}$ be the map defined by \begin{align*} Y_{n,i}(\omega)=\mathbb{1}_{\{X_i(\omega)\le t_n\}}. \end{align*} The random variables $Y_{n,1},\dots,Y_{n,n}$ are independent and identically distributed, with \begin{align*} \mathbb P(Y_{n,i}=1)=F(t_n). \end{align*} Moreover, \begin{align*} F_n(t_n)=\frac{1}{n}\sum_{i=1}^{n}Y_{n,i}. \end{align*} Since $F(t_n)\to F(q_p)=p$ and $p\in(0,1)$, the Bernoulli variances satisfy \begin{align*} F(t_n)(1-F(t_n))\to p(1-p)>0. \end{align*} We verify the Lindeberg condition for the centered Bernoulli triangular array. Define $\mu_n:=F(t_n)$ and $\sigma_n^2:=\mu_n(1-\mu_n)$. Since $\sigma_n^2\to p(1-p)>0$, there exists $N\in\mathbb N$ such that $\sigma_n^2\ge p(1-p)/2$ for all $n\ge N$. For every $\varepsilon>0$ and every $n\ge N$, the centered variables $Y_{n,i}-\mu_n$ satisfy $|Y_{n,i}-\mu_n|\le 1$, while $\varepsilon\sqrt{n\sigma_n^2}\to\infty$; hence for all sufficiently large $n$ the Lindeberg indicators vanish. Therefore the [central limit theorem](/theorems/1848) for Bernoulli triangular arrays (citing a result not yet in the wiki: Central Limit Theorem for Bernoulli triangular arrays) gives \begin{align*} \frac{\sqrt n\bigl(F_n(t_n)-F(t_n)\bigr)}{\sqrt{F(t_n)(1-F(t_n))}}\xrightarrow{d}\mathcal N(0,1). \end{align*} Since $\sqrt{F(t_n)(1-F(t_n))}\to \sqrt{p(1-p)}$, Slutsky's theorem gives \begin{align*} \sqrt n\bigl(F_n(t_n)-F(t_n)\bigr)\xrightarrow{d}\mathcal N(0,p(1-p)). \end{align*} [/step] [step:Convert the empirical distribution fluctuation into the quantile distribution] Define the centered empirical fluctuation random variable $Z_n: \Omega \to \mathbb R$ by \begin{align*} Z_n := \sqrt n\bigl(F_n(t_n)-F(t_n)\bigr). \end{align*} By the preceding step, \begin{align*} Z_n \xrightarrow{d} Z, \end{align*} where $Z$ is a real-valued random variable with distribution $\mathcal N(0,p(1-p))$. Using the event identity from the first step, \begin{align*} \mathbb P\left(\sqrt n\bigl(F_n^{-1}(p)-q_p\bigr)\le y\right)=\mathbb P(F_n(t_n)\ge p). \end{align*} Subtracting $F(t_n)$ from both sides of the inequality inside the event and multiplying by $\sqrt n$ gives \begin{align*} \mathbb P(F_n(t_n)\ge p)=\mathbb P\left(\sqrt n\bigl(F_n(t_n)-F(t_n)\bigr)\ge \sqrt n\bigl(p-F(t_n)\bigr)\right). \end{align*} Using the definition of $Z_n$, this is \begin{align*} \mathbb P\left(\sqrt n\bigl(F_n^{-1}(p)-q_p\bigr)\le y\right)=\mathbb P\left(Z_n\ge -\sqrt n\bigl(F(t_n)-p\bigr)\right). \end{align*} From the local expansion, \begin{align*} -\sqrt n\bigl(F(t_n)-p\bigr)\to -f_X(q_p)y. \end{align*} Since the normal distribution has no atom at $-f_X(q_p)y$, convergence in distribution implies \begin{align*} \lim_{n\to\infty}\mathbb P\left(Z_n \ge -\sqrt n\bigl(F(t_n)-p\bigr)\right)=\mathbb P\left(Z\ge -f_X(q_p)y\right). \end{align*} Since $Z\sim\mathcal N(0,p(1-p))$, this probability is \begin{align*} \mathbb P\left(Z\ge -f_X(q_p)y\right)=\Phi\left(\frac{f_X(q_p)y}{\sqrt{p(1-p)}}\right), \end{align*} where $\Phi: \mathbb R\to[0,1]$ is the standard normal distribution function. If $W$ is a real-valued random variable with distribution \begin{align*} W \sim \mathcal N\left(0,\frac{p(1-p)}{f_X(q_p)^2}\right), \end{align*} then, because $f_X(q_p)>0$, \begin{align*} \mathbb P(W\le y) = \Phi\left(\frac{f_X(q_p)y}{\sqrt{p(1-p)}}\right). \end{align*} Thus for every $y\in\mathbb R$, \begin{align*} \lim_{n\to\infty}\mathbb P\left(\sqrt n\bigl(F_n^{-1}(p)-q_p\bigr)\le y\right) = \mathbb P(W\le y). \end{align*} Therefore \begin{align*} \sqrt n\bigl(F_n^{-1}(p)-q_p\bigr)\xrightarrow{d}\mathcal N\left(0,\frac{p(1-p)}{f_X(q_p)^2}\right). \end{align*} [/step]

Prerequisites (0/8 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Definitions & Concepts

Explore Further

Distribution Definition Random Variable Definition Continuity Definition Variance Definition Event Definition Central Limit Theorem for Nondegenerate U-Statistics Theorem #6336 Central Limit Theorem Theorem #521 Central Limit Theorem Theorem #1848 Existence and Uniqueness of Conditional Expectation Conditional Expectation Doob's Maximal Inequality Martingale Theory Bias-Variance Decomposition for Squared Loss Probability & Statistics Davis–Kahan Rank-One PCA Bound Probability & Statistics Uniqueness of the PGF Probability Theory Hoeffding Confidence Radius for Bounded Independent Means Probability & Statistics Neyman-Pearson Total Variation Formula Probability & Statistics Moments of Branching Processes Probability Theory Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.

Asymptotic Normality of Sample Quantiles (Theorem # 6301)

Discussion

Proof

Prerequisites (0/8 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Asymptotic Normality of Sample Quantiles (Theorem # 6301)

Discussion

Proof

Prerequisites (0/8 completed)

Prerequisites Graph

Explore Further