Central Limit Theorem for Nondegenerate U-Statistics

Central Limit Theorem for Nondegenerate U-Statistics (Theorem # 6336)

Theorem


Discussion

Proof

[proofplan] We use Hoeffding's decomposition to split $U_n-\theta$ into its first-order projection and canonical degenerate U-statistics of orders $2,\dots,m$. The first-order term is $m$ times an average of i.i.d. centered random variables, so the classical [central limit theorem](/theorems/521) gives the Gaussian limit with variance $m^2\zeta_1$. The canonical term of order $s$ has variance of order $n^{-s}$, so for every $s\ge2$ it becomes negligible after multiplication by $\sqrt n$. Slutsky's theorem then transfers the limiting distribution from the first projection to the full U-statistic. [/proofplan] [step:Decompose the U-statistic into Hoeffding projection terms] For each $s\in\{1,\dots,m\}$, let $h_s:E^s\to\mathbb R$ denote the $s$-th canonical Hoeffding kernel associated with $h$, with $h_1$ equal to the first projection in the theorem statement. Thus each $h_s$ is symmetric, square-integrable, and canonical in the sense that for $s\ge2$, \begin{align*} \mathbb E[h_s(x_1,\dots,x_{s-1},X_s)] = 0 \end{align*} for almost every fixed $(x_1,\dots,x_{s-1})\in E^{s-1}$, with respect to the $(s-1)$-fold product of the law of $X_1$. By symmetry of $h_s$, the same holds when any single argument, rather than the last one, is integrated out. For $n\ge s$, define the order-$s$ U-statistic generated by $h_s$ as \begin{align*} U_{n,s}:=\binom{n}{s}^{-1}\sum_{1\le i_1<\cdots<i_s\le n} h_s(X_{i_1},\dots,X_{i_s}). \end{align*} By the Hoeffding decomposition for square-integrable U-statistics (citing a result not yet in the wiki: [Hoeffding decomposition for U-statistics](/theorems/6334)), \begin{align*} U_n-\theta = \sum_{s=1}^m \binom{m}{s}U_{n,s}. \end{align*} Since $U_{n,1}=n^{-1}\sum_{i=1}^n h_1(X_i)$ and $\binom{m}{1}=m$, this becomes \begin{align*} U_n-\theta = \frac{m}{n}\sum_{i=1}^n h_1(X_i) + \sum_{s=2}^m \binom{m}{s}U_{n,s}. \end{align*} [/step] [step:Apply the classical central limit theorem to the first projection] The random variables $h_1(X_1),h_1(X_2),\dots$ are i.i.d. and real-valued.
Moreover, \begin{align*} \mathbb E[h_1(X_1)] = 0, \qquad \operatorname{Var}(h_1(X_1))=\zeta_1\in(0,\infty), \end{align*} where $\zeta_1>0$ is the nondegeneracy hypothesis. Finiteness follows from the square-integrability of $h$ and [Jensen's inequality](/theorems/9) applied to the [conditional expectation](/page/Conditional%20Expectation) defining $h_1$. By the classical [central limit theorem](/theorems/1848) (citing a result not yet in the wiki: Lindeberg-Levy Central Limit Theorem), \begin{align*} \frac{1}{\sqrt n}\sum_{i=1}^n h_1(X_i) \xrightarrow{d} \mathcal N(0,\zeta_1). \end{align*} Multiplying by the constant $m$ gives \begin{align*} \sqrt n\,\frac{m}{n}\sum_{i=1}^n h_1(X_i) = m\,\frac{1}{\sqrt n}\sum_{i=1}^n h_1(X_i) \xrightarrow{d} \mathcal N(0,m^2\zeta_1). \end{align*} [/step] [step:Show that every degenerate Hoeffding term is negligible at scale $\sqrt n$] Fix $s\in\{2,\dots,m\}$. Let $\mathcal I_{n,s}$ denote the set of all subsets $I\subset\{1,\dots,n\}$ with $|I|=s$. For $I=\{i_1<\cdots<i_s\}\in\mathcal I_{n,s}$, define the real-valued [random variable](/page/Random%20Variable) \begin{align*} Y_I := h_s(X_{i_1},\dots,X_{i_s}). \end{align*} Then \begin{align*} U_{n,s}=\binom{n}{s}^{-1}\sum_{I\in\mathcal I_{n,s}}Y_I. \end{align*} If $I,J\in\mathcal I_{n,s}$ and $I\ne J$, then $I\setminus J$ is nonempty because $|I|=|J|$. Fix $i\in I\setminus J$. Since $X_i$ is independent of $(X_k)_{k\ne i}$, the canonical property of $h_s$ gives $\mathbb E[Y_I\mid (X_k)_{k\ne i}]=0$. By symmetry, this property may be applied to the argument occupied by $X_i$. Meanwhile $Y_J$ is a function of $(X_k)_{k\ne i}$. The tower property therefore gives \begin{align*} \mathbb E[Y_IY_J]=0. \end{align*} Consequently, only the diagonal terms contribute to the second moment. Expanding the square, \begin{align*} \mathbb E[U_{n,s}^2]=\binom{n}{s}^{-2}\sum_{I,J\in\mathcal I_{n,s}}\mathbb E[Y_IY_J], \end{align*} and using $\mathbb E[Y_IY_J]=0$ for $I\ne J$, this reduces to \begin{align*} \mathbb E[U_{n,s}^2]=\binom{n}{s}^{-2}\sum_{I\in\mathcal I_{n,s}}\mathbb E[Y_I^2].
\end{align*} Since each $Y_I$ has the same distribution as $h_s(X_1,\dots,X_s)$ and $|\mathcal I_{n,s}|=\binom{n}{s}$, we get \begin{align*} \mathbb E[U_{n,s}^2]=\binom{n}{s}^{-1}\mathbb E[h_s(X_1,\dots,X_s)^2]. \end{align*} Since $h_s$ is square-integrable, define \begin{align*} A_s:=\mathbb E[h_s(X_1,\dots,X_s)^2]<\infty. \end{align*} Then \begin{align*} \mathbb E[(\sqrt n\,U_{n,s})^2] = n\binom{n}{s}^{-1}A_s. \end{align*} For $n\ge s$ we have $\binom{n}{s}=\prod_{j=0}^{s-1}\frac{n-j}{s-j}\ge\left(\frac{n}{s}\right)^s$. Hence $n\binom{n}{s}^{-1}\le s^s n^{1-s}$, which tends to $0$ because $s\ge2$. It follows that \begin{align*} \sqrt n\,U_{n,s}\to0 \end{align*} in $L^2$, and therefore also in probability. [guided] Fix one order $s\in\{2,\dots,m\}$. The goal is to prove that the corresponding Hoeffding term is too small to affect the $\sqrt n$ limit. Define $\mathcal I_{n,s}$ to be the set of all subsets $I\subset\{1,\dots,n\}$ with $|I|=s$. For $I=\{i_1<\cdots<i_s\}\in\mathcal I_{n,s}$, define \begin{align*} Y_I := h_s(X_{i_1},\dots,X_{i_s}). \end{align*} With this notation, \begin{align*} U_{n,s}=\binom{n}{s}^{-1}\sum_{I\in\mathcal I_{n,s}}Y_I. \end{align*} We compute the second moment of $U_{n,s}$. Expanding the square gives \begin{align*} \mathbb E[U_{n,s}^2] = \binom{n}{s}^{-2} \sum_{I,J\in\mathcal I_{n,s}}\mathbb E[Y_IY_J]. \end{align*} The key point is that canonical degeneracy kills every off-diagonal term. Indeed, if $I\ne J$, then, since $|I|=|J|$, at least one index belongs to $I$ but not to $J$. Choose such an index $i\in I\setminus J$ and condition on all random variables except $X_i$. Because $X_i$ is independent of $(X_k)_{k\ne i}$, this conditioning fixes the other $s-1$ arguments of $h_s$ in $Y_I$. The canonical property then gives conditional mean zero in the $X_i$ variable; by symmetry of $h_s$, it may be applied to the argument occupied by $X_i$: \begin{align*} \mathbb E[Y_I\mid (X_k)_{k\ne i}]=0. \end{align*} Since $i\notin J$, $Y_J$ is measurable with respect to $\sigma\big((X_k)_{k\ne i}\big)$, so the tower property gives \begin{align*} \mathbb E[Y_IY_J] = \mathbb E\left[Y_J\,\mathbb E[Y_I\mid (X_k)_{k\ne i}]\right] = 0. \end{align*} Thus the only surviving terms in the double sum are the terms with $I=J$.
Hence \begin{align*} \mathbb E[U_{n,s}^2]=\binom{n}{s}^{-2}\sum_{I\in\mathcal I_{n,s}}\mathbb E[Y_I^2]. \end{align*} Since there are $\binom{n}{s}$ such sets $I$, and each $Y_I$ has the same distribution as $h_s(X_1,\dots,X_s)$, we obtain \begin{align*} \mathbb E[U_{n,s}^2]=\binom{n}{s}^{-1}\mathbb E[h_s(X_1,\dots,X_s)^2]. \end{align*} Define the constant \begin{align*} A_s:=\mathbb E[h_s(X_1,\dots,X_s)^2], \end{align*} which is finite because $h_s$ is square-integrable. Then \begin{align*} \mathbb E[(\sqrt n\,U_{n,s})^2] = n\binom{n}{s}^{-1}A_s. \end{align*} For fixed $s\ge2$, $\binom{n}{s}=\frac{n(n-1)\cdots(n-s+1)}{s!}$ grows like $n^s/s!$, so $n\binom{n}{s}^{-1}\to0$. Therefore \begin{align*} \mathbb E[(\sqrt n\,U_{n,s})^2]\to0. \end{align*} This proves $\sqrt n\,U_{n,s}\to0$ in $L^2$, and convergence in $L^2$ implies convergence in probability. [/guided] [/step] [step:Combine the negligible terms and apply Slutsky's theorem] Define the remainder random variable $R_n$ by \begin{align*} R_n:=\sum_{s=2}^m \binom{m}{s}U_{n,s}. \end{align*} By the Cauchy-Schwarz inequality, \begin{align*} \left(\sum_{s=2}^m a_s\right)^2 \le (m-1)\sum_{s=2}^m a_s^2 \end{align*} for [real numbers](/page/Real%20Numbers) $a_2,\dots,a_m$. Applying this with $a_s=\binom{m}{s}\sqrt n\,U_{n,s}$ and taking expectations, we get \begin{align*} \mathbb E[(\sqrt n\,R_n)^2]\le (m-1)\sum_{s=2}^m \binom{m}{s}^2\mathbb E[(\sqrt n\,U_{n,s})^2]. \end{align*} Each summand on the right tends to $0$ by the preceding step, and there are only finitely many values of $s$. Hence \begin{align*} \sqrt n\,R_n\to0 \end{align*} in $L^2$, and therefore in probability. From the decomposition, \begin{align*} \sqrt n\,(U_n-\theta) = m\,\frac{1}{\sqrt n}\sum_{i=1}^n h_1(X_i) + \sqrt n\,R_n. \end{align*} The first term converges in distribution to $\mathcal N(0,m^2\zeta_1)$, and the second term converges in probability to $0$. By Slutsky's theorem (citing a result not yet in the wiki: Slutsky's Theorem), \begin{align*} \sqrt n\,(U_n-\theta) \xrightarrow{d} \mathcal N(0,m^2\zeta_1). \end{align*} This is the claimed central limit theorem. [/step]
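The proof can be checked numerically. Below is a minimal sketch in Python with NumPy, built on an illustrative example that is not part of the theorem page.

The example uses the variance kernel $h(x,y)=(x-y)^2/2$ with $m=2$, whose U-statistic is the unbiased sample variance. A direct computation gives the two Hoeffding kernels:

- $h_1(x)=((x-\mu)^2-\sigma^2)/2$
- $h_2(x,y)=-(x-\mu)(y-\mu)$

For $X_i\sim\mathcal N(0,1)$ we then have $\theta=1$, $\zeta_1=\operatorname{Var}(X_1^2)/4=1/2$, $m^2\zeta_1=2$ and $A_2=1$.

The sketch does two things:

1. It checks the identity $U_n-\theta=2U_{n,1}+U_{n,2}$ from the decomposition step exactly, on a small sample.
2. It uses Monte Carlo to compare the variance of $\sqrt n(U_n-\theta)$ with $m^2\zeta_1$, and the mean square of $\sqrt n\,R_n$ with the exact value $n\binom{n}{2}^{-1}A_2=2/(n-1)$.

The function names are hypothetical.

```python
import itertools
import math

import numpy as np

# Hypothetical illustration (not from the theorem page): variance kernel with m = 2,
# using the known mean mu and variance sigma2 to form the Hoeffding projections.


def h(x, y):
    """Symmetric order-2 kernel; its U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


def h1(x, mu, sigma2):
    """First Hoeffding projection h_1(x) = E[h(x, X)] - theta for this kernel."""
    return 0.5 * ((x - mu) ** 2 - sigma2)


def u_statistic(x, kernel, m):
    """Brute-force U-statistic: average of the kernel over all m-subsets of the sample."""
    subsets = itertools.combinations(range(len(x)), m)
    total = sum(kernel(*(x[i] for i in idx)) for idx in subsets)
    return total / math.comb(len(x), m)


def decomposition_terms(x, mu, sigma2):
    """Return (binom(2,1) U_{n,1}, binom(2,2) U_{n,2}) for the variance kernel.

    Uses h_2(x, y) = -(x - mu)(y - mu) and the pair-sum identity
    sum_{i<j} c_i c_j = ((sum c)^2 - sum c^2) / 2.
    """
    n = len(x)
    first = 2.0 * np.mean(h1(x, mu, sigma2))
    c = x - mu
    second = -(c.sum() ** 2 - (c ** 2).sum()) / (n * (n - 1))
    return first, second


def simulate(n, reps, rng):
    """Monte Carlo draws of sqrt(n)(U_n - theta) and sqrt(n) R_n for N(0, 1) data."""
    x = rng.standard_normal((reps, n))  # mu = 0, sigma2 = theta = 1
    u = x.var(axis=1, ddof=1)  # same value as the pairwise U-statistic, in closed form
    remainder = -(x.sum(axis=1) ** 2 - (x ** 2).sum(axis=1)) / (n * (n - 1))
    return np.sqrt(n) * (u - 1.0), np.sqrt(n) * remainder


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(8)
    first, second = decomposition_terms(x, 0.0, 1.0)
    # The Hoeffding identity U_n - theta = 2 U_{n,1} + U_{n,2} is algebraic, so it holds
    # for every sample up to floating-point rounding.
    print(np.isclose(u_statistic(x, h, 2) - 1.0, first + second))

    z, rz = simulate(400, 4000, rng)
    # Theory: Var(z) should be near m^2 zeta_1 = 2, and E[(sqrt(n) R_n)^2] = 2 / (n - 1).
    print(z.var(), np.mean(rz ** 2))
```

The brute-force `u_statistic` enumerates all $\binom{n}{m}$ subsets, so it is only practical for small $n$. The Monte Carlo step therefore relies on closed-form pair sums.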
