Strong Law of Large Numbers for U-Statistics — Statement & Proof

Strong Law of Large Numbers for U-Statistics (Theorem # 6335)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We first prove the assertion for bounded kernels by viewing $U_n$ as a bounded function of the independent variables $X_1,\dots,X_n$ and applying a bounded-differences concentration inequality together with Borel-Cantelli. Then we pass from bounded kernels to integrable kernels by truncating $h$ and using the standard tail reduction for U-statistics, whose content is that non-negative integrable tail kernels have asymptotic U-average no larger than their expectation. Finally, dominated convergence sends the truncation level to infinity and identifies the limiting constant as $\theta$. [/proofplan] [step:Prove almost sure convergence for bounded kernels by bounded differences] Assume first that there is a constant $K \in [0,\infty)$ such that \begin{align*} |h(x_1,\dots,x_m)| \le K \end{align*} for every $(x_1,\dots,x_m) \in E^m$. If $K=0$, then $h(X_{i_1},\dots,X_{i_m})=0$ for every index set $1\le i_1<\cdots<i_m\le n$ on a probability-one event, so $U_n=0=\theta$ for every $n\ge m$ on that event. Hence the bounded case is immediate when $K=0$. We therefore assume $K>0$ for the concentration estimate below. For each $n \ge m$, define the measurable map $F_n:E^n\to \mathbb R$ by sending $(x_1,\dots,x_n)\in E^n$ to \begin{align*} \binom{n}{m}^{-1}\sum_{1 \le i_1 < \cdots < i_m \le n} h(x_{i_1},\dots,x_{i_m}). \end{align*} Then $U_n = F_n(X_1,\dots,X_n)$. Fix $j \in \{1,\dots,n\}$. If two points $(x_1,\dots,x_n),(y_1,\dots,y_n) \in E^n$ agree in every coordinate except possibly the $j$-th coordinate, then only those summands whose index set contains $j$ can change. There are exactly $\binom{n-1}{m-1}$ such summands, and each changed summand changes by at most $2K$. Hence \begin{align*} |F_n(x_1,\dots,x_n)-F_n(y_1,\dots,y_n)| \le \binom{n}{m}^{-1}\binom{n-1}{m-1}2K. \end{align*} Using the identity $\binom{n-1}{m-1}/\binom{n}{m}=m/n$, we obtain \begin{align*} |F_n(x_1,\dots,x_n)-F_n(y_1,\dots,y_n)|\le \frac{2Km}{n}. \end{align*} By the independently established [McDiarmid bounded differences inequality](/theorems/6074) for independent random variables, whose hypotheses are satisfied because $X_1,\dots,X_n$ are independent and the $j$-th coordinate oscillation of $F_n$ is at most $2Km/n$, for every $\varepsilon>0$, \begin{align*} \mathbb P(|U_n-\mathbb E[U_n]| \ge \varepsilon) \le 2\exp\left(-\frac{2\varepsilon^2}{\sum_{j=1}^n (2Km/n)^2}\right). \end{align*} Since $\sum_{j=1}^n(2Km/n)^2=4K^2m^2/n$, this is \begin{align*} \mathbb P(|U_n-\mathbb E[U_n]| \ge \varepsilon) \le 2\exp\left(-\frac{\varepsilon^2 n}{2K^2m^2}\right). \end{align*} The series \begin{align*} \sum_{n=m}^{\infty} 2\exp\left(-\frac{\varepsilon^2 n}{2K^2m^2}\right) \end{align*} is finite. By the independently established first Borel-Cantelli lemma applied to the events $\{|U_n-\mathbb E[U_n]|\ge\varepsilon\}$, for each fixed $\varepsilon>0$, \begin{align*} |U_n-\mathbb E[U_n]| < \varepsilon \end{align*} for all sufficiently large $n$, $\mathbb P$-a.s. Applying this to $\varepsilon=1/q$ for $q \in \mathbb N$ and intersecting the resulting probability-one events gives \begin{align*} U_n-\mathbb E[U_n] \to 0 \end{align*} $\mathbb P$-a.s. For each index set $1 \le i_1 < \cdots < i_m \le n$, the random vector $(X_{i_1},\dots,X_{i_m})$ has the same distribution as $(X_1,\dots,X_m)$ because the sequence $(X_i)_{i\ge1}$ is i.i.d. Therefore \begin{align*} \mathbb E[U_n] = \binom{n}{m}^{-1} \sum_{1 \le i_1 < \cdots < i_m \le n} \mathbb E[h(X_{i_1},\dots,X_{i_m})]. \end{align*} Since every expectation in the sum equals $\theta$ and there are $\binom{n}{m}$ summands, \begin{align*} \mathbb E[U_n]=\binom{n}{m}^{-1}\binom{n}{m}\theta=\theta. \end{align*} Thus $U_n \to \theta$ $\mathbb P$-a.s. for bounded kernels. [guided] The bounded case is the part where concentration replaces the more delicate permutation argument. We package the U-statistic as a function of the first $n$ observations. Define $F_n:E^n\to \mathbb R$ by sending $(x_1,\dots,x_n)\in E^n$ to \begin{align*} \binom{n}{m}^{-1}\sum_{1 \le i_1 < \cdots < i_m \le n}h(x_{i_1},\dots,x_{i_m}). \end{align*} Then $U_n=F_n(X_1,\dots,X_n)$. We estimate how much $F_n$ can change if only one coordinate is altered. Fix $j \in \{1,\dots,n\}$. A summand \begin{align*} h(x_{i_1},\dots,x_{i_m}) \end{align*} depends on $x_j$ exactly when $j \in \{i_1,\dots,i_m\}$. The number of such index sets is $\binom{n-1}{m-1}$, because after choosing $j$ we choose the remaining $m-1$ indices from the other $n-1$ positions. Since $|h|\le K$, changing one affected summand can change its value by at most $2K$. Therefore \begin{align*} |F_n(x_1,\dots,x_n)-F_n(y_1,\dots,y_n)| \le \binom{n}{m}^{-1}\binom{n-1}{m-1}2K. \end{align*} The combinatorial identity $\binom{n-1}{m-1}/\binom{n}{m}=m/n$ turns this into \begin{align*} |F_n(x_1,\dots,x_n)-F_n(y_1,\dots,y_n)|\le \frac{2Km}{n}. \end{align*} The independently established McDiarmid bounded differences inequality now applies because $X_1,\dots,X_n$ are independent and the coordinate oscillation of $F_n$ is bounded by $2Km/n$ in each coordinate. It gives, for every $\varepsilon>0$, \begin{align*} \mathbb P(|U_n-\mathbb E[U_n]| \ge \varepsilon) \le 2\exp\left(-\frac{2\varepsilon^2}{\sum_{j=1}^n (2Km/n)^2}\right). \end{align*} Because $\sum_{j=1}^n(2Km/n)^2=4K^2m^2/n$, this becomes \begin{align*} \mathbb P(|U_n-\mathbb E[U_n]| \ge \varepsilon) \le 2\exp\left(-\frac{\varepsilon^2 n}{2K^2m^2}\right). \end{align*} The right-hand side is summable in $n$. Hence the first Borel-Cantelli lemma, applied to the events $\{|U_n-\mathbb E[U_n]|\ge\varepsilon\}$, implies that, for each fixed $\varepsilon>0$, the event $|U_n-\mathbb E[U_n]|\ge\varepsilon$ occurs only finitely many times almost surely. Applying this statement to the countable sequence $\varepsilon=1/q$, $q\in\mathbb N$, gives \begin{align*} U_n-\mathbb E[U_n]\to0 \end{align*} almost surely. It remains to identify the expectation. Each $m$-tuple $(X_{i_1},\dots,X_{i_m})$ with $1\le i_1<\cdots<i_m\le n$ has the same law as $(X_1,\dots,X_m)$, so every summand has expectation $\theta$. Hence \begin{align*} \mathbb E[U_n] = \binom{n}{m}^{-1} \sum_{1 \le i_1 < \cdots < i_m \le n} \mathbb E[h(X_{i_1},\dots,X_{i_m})]. \end{align*} Every expectation in this sum equals $\theta$, and the number of summands is $\binom{n}{m}$. Therefore \begin{align*} \mathbb E[U_n]=\binom{n}{m}^{-1}\binom{n}{m}\theta=\theta. \end{align*} Combining this identity with $U_n-\mathbb E[U_n]\to0$ proves $U_n\to\theta$ almost surely in the bounded case. [/guided] [/step] [step:Control integrable tails by Hoeffding's non-negative U-statistic strong law] We record the external tail theorem used to pass from bounded kernels to integrable kernels. The independently established Hoeffding non-negative U-statistic strong law is used here as a prior input, not as a consequence of the present theorem; without that independent result, the truncation step below would be circular. It states the following: let $g:E^m\to[0,\infty]$ be an $\mathcal E^{\otimes m}$-measurable symmetric kernel such that \begin{align*} \mathbb E[g(X_1,\dots,X_m)]<\infty. \end{align*} For $n\ge m$, define \begin{align*} V_n(g):=\binom{n}{m}^{-1}\sum_{1\le i_1<\cdots<i_m\le n}g(X_{i_1},\dots,X_{i_m}). \end{align*} Then \begin{align*} \limsup_{n\to\infty}V_n(g)\le \mathbb E[g(X_1,\dots,X_m)] \end{align*} $\mathbb P$-a.s. [guided] This step is not proving Hoeffding's tail theorem from the bounded case, and it is not using the theorem currently being proved. It is invoking the independently established Hoeffding non-negative U-statistic strong law as an external nontrivial input for integrable non-negative kernels. This independence matters: if the non-negative result had only been obtained from the theorem currently under proof, the truncation argument would be circular. Its hypotheses match our setting: $g$ is $\mathcal E^{\otimes m}$-measurable, symmetric, non-negative, and satisfies $\mathbb E[g(X_1,\dots,X_m)]<\infty$. Therefore the Hoeffding non-negative U-statistic strong law gives \begin{align*} \limsup_{n\to\infty}V_n(g)\le \mathbb E[g(X_1,\dots,X_m)] \end{align*} $\mathbb P$-a.s. This is exactly the tail estimate needed later for $g=r_M$, the non-negative truncation error kernel. [/guided] [/step] [step:Truncate the kernel and pass the bounded result to the limit] For each $M>0$, define the truncated kernel $h_M:E^m\to\mathbb R$ by \begin{align*} h_M(x_1,\dots,x_m)=\max\{-M,\min\{h(x_1,\dots,x_m),M\}\}. \end{align*} The function $h_M$ is measurable and symmetric because $h$ is measurable and symmetric. It is bounded by $M$, so the bounded-kernel result gives \begin{align*} U_n(h_M) \xrightarrow{a.s.} \theta_M, \end{align*} where \begin{align*} U_n(h_M) :=\binom{n}{m}^{-1} \sum_{1\le i_1<\cdots<i_m\le n} h_M(X_{i_1},\dots,X_{i_m}) \end{align*} and \begin{align*} \theta_M:=\mathbb E[h_M(X_1,\dots,X_m)]. \end{align*} Define the non-negative tail kernel $r_M:E^m\to[0,\infty)$ by \begin{align*} r_M(x_1,\dots,x_m)=|h(x_1,\dots,x_m)-h_M(x_1,\dots,x_m)|. \end{align*} Then $r_M$ is measurable, symmetric, and \begin{align*} r_M(X_1,\dots,X_m)\le |h(X_1,\dots,X_m)|. \end{align*} Thus $\mathbb E[r_M(X_1,\dots,X_m)]<\infty$. Applying the tail reduction claim to $r_M$ gives \begin{align*} \limsup_{n\to\infty}U_n(r_M) \le \mathbb E[r_M(X_1,\dots,X_m)] \end{align*} $\mathbb P$-a.s. For every $n\ge m$, \begin{align*} |U_n(h)-\theta| \le |U_n(h_M)-\theta_M| + |U_n(h)-U_n(h_M)| + |\theta_M-\theta| \le |U_n(h_M)-\theta_M| + U_n(r_M) + \mathbb E[r_M(X_1,\dots,X_m)]. \end{align*} Taking the limit superior and using $U_n(h_M)\to\theta_M$ almost surely yields, for each fixed $M>0$, \begin{align*} \limsup_{n\to\infty}|U_n(h)-\theta|\le 2\mathbb E[r_M(X_1,\dots,X_m)] \end{align*} on a probability-one event that may depend on $M$. To pass to the truncation limit on one event, restrict from now on to integer truncation levels $M=N\in\mathbb N$ and intersect the corresponding probability-one events. On this single probability-one event, the displayed inequality holds for every $N\in\mathbb N$. Finally, $r_N(X_1,\dots,X_m)\to0$ pointwise as $N\to\infty$, and \begin{align*} 0\le r_N(X_1,\dots,X_m)\le |h(X_1,\dots,X_m)|. \end{align*} Since $|h(X_1,\dots,X_m)|$ is integrable, the [Dominated Convergence Theorem](/theorems/4) applies to the sequence $r_N(X_1,\dots,X_m)$ dominated by $|h(X_1,\dots,X_m)|$ and gives \begin{align*} \mathbb E[r_N(X_1,\dots,X_m)]\to0. \end{align*} Therefore \begin{align*} \limsup_{n\to\infty}|U_n(h)-\theta|=0 \end{align*} $\mathbb P$-a.s., which is equivalent to \begin{align*} U_n\xrightarrow{a.s.}\theta. \end{align*} This proves the theorem. [guided] We now remove the boundedness assumption by approximation. For each $M>0$, define the truncation map $h_M:E^m\to\mathbb R$ by \begin{align*} h_M(x_1,\dots,x_m)=\max\{-M,\min\{h(x_1,\dots,x_m),M\}\}. \end{align*} This function is measurable because it is obtained from the measurable function $h$ by applying continuous scalar truncation, and it is symmetric because $h$ is symmetric. It is bounded by $M$, so the bounded-kernel case already proved applies to $h_M$. Thus, with \begin{align*} U_n(h_M) :=\binom{n}{m}^{-1} \sum_{1\le i_1<\cdots<i_m\le n} h_M(X_{i_1},\dots,X_{i_m}) \end{align*} and \begin{align*} \theta_M:=\mathbb E[h_M(X_1,\dots,X_m)], \end{align*} we have \begin{align*} U_n(h_M)\xrightarrow{a.s.}\theta_M. \end{align*} The error made by truncating is measured by the non-negative kernel $r_M:E^m\to[0,\infty)$ defined by \begin{align*} r_M(x_1,\dots,x_m)=|h(x_1,\dots,x_m)-h_M(x_1,\dots,x_m)|. \end{align*} This kernel is measurable and symmetric. Moreover, for the random vector $(X_1,\dots,X_m)$, \begin{align*} 0\le r_M(X_1,\dots,X_m)\le |h(X_1,\dots,X_m)|. \end{align*} The right-hand side is integrable by hypothesis, so $\mathbb E[r_M(X_1,\dots,X_m)]<\infty$. Therefore the independently established Hoeffding non-negative U-statistic strong law applies to $r_M$ and gives \begin{align*} \limsup_{n\to\infty}U_n(r_M) \le \mathbb E[r_M(X_1,\dots,X_m)] \end{align*} $\mathbb P$-a.s. For every $n\ge m$, the triangle inequality gives \begin{align*} |U_n(h)-\theta| \le |U_n(h_M)-\theta_M|+|U_n(h)-U_n(h_M)|+|\theta_M-\theta|. \end{align*} The middle term is bounded by $U_n(r_M)$ by applying the triangle inequality term-by-term in the U-statistic average. The last term is bounded by $\mathbb E[r_M(X_1,\dots,X_m)]$ because \begin{align*} |\theta_M-\theta| =|\mathbb E[h_M(X_1,\dots,X_m)-h(X_1,\dots,X_m)]| \le \mathbb E[r_M(X_1,\dots,X_m)]. \end{align*} Hence \begin{align*} |U_n(h)-\theta| \le |U_n(h_M)-\theta_M|+U_n(r_M)+\mathbb E[r_M(X_1,\dots,X_m)]. \end{align*} Taking the limit superior in $n$, using $U_n(h_M)\to\theta_M$ almost surely, and using the tail estimate for $r_M$, we obtain for each fixed $M>0$ \begin{align*} \limsup_{n\to\infty}|U_n(h)-\theta| \le 2\mathbb E[r_M(X_1,\dots,X_m)] \end{align*} on a probability-one event depending on $M$. To make the final passage occur on one probability-one event, restrict to integer truncation levels $M=N\in\mathbb N$ and intersect the corresponding countably many probability-one events. On that single event, the preceding bound holds for every $N\in\mathbb N$. Finally, $r_N(X_1,\dots,X_m)\to0$ pointwise as $N\to\infty$, and the integrable [random variable](/page/Random%20Variable) $|h(X_1,\dots,X_m)|$ dominates all $r_N(X_1,\dots,X_m)$. The [Dominated Convergence Theorem](/theorems/4) therefore gives \begin{align*} \mathbb E[r_N(X_1,\dots,X_m)]\to0. \end{align*} Letting $N\to\infty$ in the bound for the limit superior yields \begin{align*} \limsup_{n\to\infty}|U_n(h)-\theta|=0 \end{align*} $\mathbb P$-a.s. This is exactly $U_n\xrightarrow{a.s.}\theta$, so the theorem is proved. [/guided] [/step]

Prerequisites (0/5 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Definitions & Concepts

Explore Further

Distribution Definition Expectation Definition Event Definition McDiarmid Bounded Differences Inequality Theorem #6074 Triangle Inequality For Inner Product Spaces Theorem #433 Jensen's Inequality Probability Theory Sherman–Morrison Leave-One-Out Formula Probability & Statistics Equality Condition in the Gauss-Markov Theorem Probability & Statistics Stirling's Formula Probability Theory Lasso Oracle Inequality under a Restricted Eigenvalue Condition Probability & Statistics Sub-Gaussian Tail Implies Moment Growth Probability & Statistics Davis-Kahan Sin Theta Theorem Probability & Statistics Expectation of a Product of Independent Variables Probability Theory Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.

Strong Law of Large Numbers for U-Statistics (Theorem # 6335)

Discussion

Proof

Prerequisites (0/5 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Strong Law of Large Numbers for U-Statistics (Theorem # 6335)

Discussion

Proof

Prerequisites (0/5 completed)

Prerequisites Graph

Explore Further