Hoeffding Decomposition for U-Statistics — Statement & Proof

Hoeffding Decomposition for U-Statistics (Theorem # 6334)

Theorem

Proof

[proofplan] We construct the kernels by Möbius inversion on the finite lattice of subsets of $\{1,\dots,m\}$. First we define [conditional expectation](/page/Conditional%20Expectation) kernels indexed by subsets of coordinates, then subtract all lower-order contributions to obtain canonical projections. The inversion formula reconstructs $h-\theta$ as a sum of these projections over all nonempty coordinate subsets. Averaging this pointwise decomposition over all $m$-subsets of $\{1,\dots,n\}$ gives the stated U-statistic identity. Canonical degeneracy follows by integrating out one coordinate and observing that the terms of the inclusion-exclusion formula cancel in pairs. [/proofplan]
[step:Define the coordinate conditional expectation kernels] Let $[m]:=\{1,\dots,m\}$. For each subset $A\subset [m]$, write $|A|$ for its cardinality. If $A=\{a_1<\cdots<a_r\}$, define the coordinate projection $\pi_A:E^m\to E^r$ by \begin{align*} \pi_A(x_1,\dots,x_m)=(x_{a_1},\dots,x_{a_r}). \end{align*} For $A=\varnothing$, let $E^0$ be the one-point space and let $\pi_\varnothing$ be the unique map to $E^0$. For every $A\subset [m]$, let $A^c=[m]\setminus A$ and define the coordinate insertion map \begin{align*} \iota_A:E^{|A|}\times E^{|A^c|}\to E^m \end{align*} as follows: it places the first argument in the coordinates indexed by $A$ and the second argument in the coordinates indexed by $A^c$, both in increasing order. Define $g_A:E^{|A|}\to\mathbb R$ by \begin{align*} g_A(x_A)=\int_{E^{|A^c|}}h(\iota_A(x_A,z))\,d\mu^{\otimes |A^c|}(z). \end{align*} For $A=\varnothing$, this gives \begin{align*} g_\varnothing=\int_{E^m}h(x)\,d\mu^{\otimes m}(x)=\theta. \end{align*} For $A=[m]$, the integral over the one-point space $E^0$ gives $g_{[m]}=h$ as an element of $L^2(E^m,\mu^{\otimes m})$. Since $\mu^{\otimes m}$ is a probability measure and $h\in L^2(E^m,\mu^{\otimes m})$, the Cauchy-Schwarz inequality gives $h\in L^1(E^m,\mu^{\otimes m})$. By Fubini's theorem, the integral defining $g_A(x_A)$ therefore exists and is finite for $\mu^{\otimes |A|}$-almost every $x_A$. 
[Jensen's inequality](/theorems/9) for the probability measure $\mu^{\otimes |A^c|}$ gives \begin{align*} |g_A(x_A)|^2\leq \int_{E^{|A^c|}}|h(\iota_A(x_A,z))|^2\,d\mu^{\otimes |A^c|}(z). \end{align*} Integrate this inequality over $E^{|A|}$ with respect to $\mu^{\otimes |A|}$ and apply Tonelli's theorem. This gives $\|g_A\|_{L^2(\mu^{\otimes |A|})}\leq\|h\|_{L^2(\mu^{\otimes m})}$, so $g_A\in L^2(E^{|A|},\mu^{\otimes |A|})$. Because $h$ is symmetric, $h(\iota_A(x_A,z))=h(x_A,z)$, where $(x_A,z)$ lists the entries of $x_A$ first and those of $z$ afterwards. Hence $g_A$ depends on $A$ only through $|A|$, and it is symmetric in its arguments. Thus, for each $0\leq r\leq m$, there is a symmetric kernel $g_r:E^r\to\mathbb R$ in $L^2(E^r,\mu^{\otimes r})$, namely \begin{align*} g_r(x_1,\dots,x_r)=\int_{E^{m-r}}h(x_1,\dots,x_r,z)\,d\mu^{\otimes(m-r)}(z), \end{align*} such that $g_A(x_A)=g_r(x_A)$ whenever $|A|=r$, with the coordinates listed in increasing order. [/step]
[step:Construct the canonical kernels by finite Möbius inversion] For $s=0$, set $h_0:=\theta$. For $1\leq s\leq m$, define $h_s:E^s\to \mathbb R$ by \begin{align*} h_s(x_1,\dots,x_s)=\sum_{B\subset \{1,\dots,s\}}(-1)^{s-|B|}g_{|B|}(x_B), \end{align*} where, if $B=\{b_1<\cdots<b_r\}$, the notation $x_B$ means $(x_{b_1},\dots,x_{b_r})$, and for $B=\varnothing$ the term is $g_0=\theta$. Each summand $(x_1,\dots,x_s)\mapsto g_{|B|}(x_B)$ lies in $L^2(E^s,\mu^{\otimes s})$, so the finite sum $h_s$ does too. The symmetry of each $g_r$ implies the symmetry of $h_s$, because permuting $(x_1,\dots,x_s)$ merely permutes the subsets $B\subset \{1,\dots,s\}$ and leaves the cardinalities $|B|$ unchanged. [guided] Fix $s\in\{1,\dots,m\}$. We define the order-$s$ canonical kernel by the inclusion-exclusion formula \begin{align*} h_s(x_1,\dots,x_s)=\sum_{B\subset \{1,\dots,s\}}(-1)^{s-|B|}g_{|B|}(x_B), \end{align*} where, if $B=\{b_1<\cdots<b_r\}$, then $x_B=(x_{b_1},\dots,x_{b_r})$, and the term for $B=\varnothing$ is $g_0=\theta$. The point of this formula is to remove all lower-order coordinate averages from the order-$s$ average. 
For instance, \begin{align*} h_1(x_1)=g_1(x_1)-\theta, \end{align*} and \begin{align*} h_2(x_1,x_2)=g_2(x_1,x_2)-g_1(x_1)-g_1(x_2)+\theta. \end{align*} Thus $h_2$ is the part of the two-coordinate kernel not already accounted for by the constant term and the one-coordinate effects. The formula has only finitely many terms. For each $B\subset\{1,\dots,s\}$, the map $(x_1,\dots,x_s)\mapsto g_{|B|}(x_B)$ belongs to $L^2(E^s,\mu^{\otimes s})$. Its squared norm is an integral over $E^s$ in which the coordinates outside $B$ are integrated against the probability measure $\mu$, so it equals $\|g_{|B|}\|^2_{L^2(\mu^{\otimes |B|})}<\infty$. Therefore the finite sum defining $h_s$ belongs to $L^2(E^s,\mu^{\otimes s})$. It remains to check symmetry. Let $\rho$ be a permutation of $\{1,\dots,s\}$ and set $y_i=x_{\rho(i)}$. Since each $g_r$ is symmetric, $g_{|B|}(y_B)=g_{|B|}(x_{\rho(B)})$. So replacing $(x_1,\dots,x_s)$ by $(y_1,\dots,y_s)$ sends the summand indexed by $B$ to the summand indexed by $\rho(B)$, which has the same cardinality and hence the same sign. The map $B\mapsto\rho(B)$ is a bijection of the subset lattice, so the whole sum is unchanged. Hence $h_s$ is symmetric. [/guided] [/step]
[step:Invert the construction to recover the kernel from its projections] For every subset $A\subset [m]$, define $\psi_A:E^{|A|}\to \mathbb R$ by \begin{align*} \psi_A(x_A)=h_{|A|}(x_A), \end{align*} where the coordinates of $x_A$ are listed in increasing order; in particular $\psi_\varnothing=h_0=\theta$. When $C\subset A$ appears below, $g_C(x_C)$ means the subset-indexed kernel associated to $C$. Equivalently, it is the cardinality kernel $g_{|C|}$ applied to the coordinates $x_C$ in increasing order. We claim that, for every $A\subset [m]$, \begin{align*} g_A(x_A)=\sum_{B\subset A}\psi_B(x_B) \end{align*} for $\mu^{\otimes |A|}$-almost every $x_A\in E^{|A|}$. Indeed, fix $A\subset [m]$ and expand each $\psi_B(x_B)=h_{|B|}(x_B)$ by the inclusion-exclusion formula. The subsets of the $|B|$ positions of $x_B$ correspond to the subsets $C\subset B$, so this gives \begin{align*} \sum_{B\subset A}\psi_B(x_B)=\sum_{B\subset A}\sum_{C\subset B}(-1)^{|B|-|C|}g_C(x_C). 
\end{align*} Reindexing the finite double sum by fixing $C\subset A$ gives \begin{align*} \sum_{B\subset A}\psi_B(x_B)=\sum_{C\subset A}g_C(x_C)\sum_{\substack{B:\ C\subset B\subset A}}(-1)^{|B|-|C|}. \end{align*} For fixed $C\subset A$, write $q=|A|-|C|$. A set $B$ with $C\subset B\subset A$ is determined by the $\ell=|B|-|C|$ elements of $A\setminus C$ that it contains, so the inner sum is \begin{align*} \sum_{\substack{B:\ C\subset B\subset A}}(-1)^{|B|-|C|}=\sum_{\ell=0}^{q}\binom{q}{\ell}(-1)^\ell=(1-1)^q. \end{align*} With the convention $0^0=1$, this equals $1$ when $q=0$, that is, when $C=A$, and equals $0$ otherwise. Hence only the term $C=A$ remains, proving the inversion formula. Taking $A=[m]$ and using $g_{[m]}=h$ in $L^2(E^m,\mu^{\otimes m})$ gives, for $\mu^{\otimes m}$-almost every $(x_1,\dots,x_m)\in E^m$, \begin{align*} h(x_1,\dots,x_m)=\sum_{B\subset [m]}\psi_B(x_B). \end{align*} Separate the term $B=\varnothing$, which equals $\theta$, and group the remaining subsets by cardinality. At the same points, this gives \begin{align*} h(x_1,\dots,x_m)=\theta+\sum_{s=1}^m\sum_{\substack{B\subset [m]\,:\, |B|=s}}h_s(x_B). \end{align*} [/step]
[step:Prove the kernels are canonical in each argument] Fix $s\in\{1,\dots,m\}$ and $j\in\{1,\dots,s\}$. Since $h_s$ is symmetric, it is enough to prove degeneracy in the last coordinate $j=s$. For $C\subset \{1,\dots,s-1\}$, the product-integral definition of the kernels gives \begin{align*} \int_E g_{|C|+1}(x_C,y)\,d\mu(y)=g_{|C|}(x_C) \end{align*} for $\mu^{\otimes |C|}$-almost every $x_C\in E^{|C|}$. Indeed, $h\in L^1(E^m,\mu^{\otimes m})$ because $h\in L^2(E^m,\mu^{\otimes m})$ and $\mu^{\otimes m}$ is a probability measure. Hence [Fubini's theorem](/theorems/2961) applies. Integrating $h$ first over the $m-|C|-1$ free coordinates that define $g_{|C|+1}$ and then over $y$ is the same as integrating over all $m-|C|$ coordinates outside $C$ at once, which is the definition of $g_{|C|}$. By linearity of the integral, for $\mu^{\otimes(s-1)}$-almost every $(x_1,\dots,x_{s-1})$, \begin{align*} \int_E h_s(x_1,\dots,x_{s-1},y)\,d\mu(y)=\sum_{B\subset \{1,\dots,s\}}(-1)^{s-|B|}\int_E g_{|B|}((x_1,\dots,x_{s-1},y)_B)\,d\mu(y). 
\end{align*} Split the subsets $B\subset \{1,\dots,s\}$ into those not containing $s$ and those of the form $C\cup\{s\}$ with $C\subset \{1,\dots,s-1\}$. If $s\notin B$, the integrand does not depend on $y$, and since $\mu(E)=1$ its integral is $g_{|B|}(x_B)$. If $B=C\cup\{s\}$, then $(x_1,\dots,x_{s-1},y)_B=(x_C,y)$, and the conditional-expectation identity above gives the integral $g_{|C|}(x_C)$. Hence \begin{align*} \int_E h_s(x_1,\dots,x_{s-1},y)\,d\mu(y)=\sum_{C\subset \{1,\dots,s-1\}}\left((-1)^{s-|C|}+(-1)^{s-(|C|+1)}\right)g_{|C|}(x_C). \end{align*} For each $C\subset \{1,\dots,s-1\}$, the coefficient satisfies $(-1)^{s-|C|}+(-1)^{s-(|C|+1)}=0$, so \begin{align*} \int_E h_s(x_1,\dots,x_{s-1},y)\,d\mu(y)=0. \end{align*} This proves degeneracy in the last coordinate, and symmetry gives degeneracy in every coordinate. Integrating the degeneracy identity over the remaining $s-1$ variables gives \begin{align*} \int_{E^s}h_s(x_1,\dots,x_s)\,d\mu^{\otimes s}(x_1,\dots,x_s)=0. \end{align*} [/step]
[step:Average the pointwise expansion over all sampled subsets] For each subset $I=\{i_1<\cdots<i_m\}\subset \{1,\dots,n\}$ with $|I|=m$, apply the expansion of $h-\theta$ to the random vector $(X_{i_1},\dots,X_{i_m})$. Since $X_1,\dots,X_n$ are independent with common law $\mu$, this vector has law $\mu^{\otimes m}$. The $\mu^{\otimes m}$-null set on which the expansion may fail is therefore hit with probability zero. Hence the expansion holds almost surely for each fixed $I$. Because there are finitely many such $I$, it also holds almost surely for all of them simultaneously. A subset $B\subset[m]$ corresponds to $A=\{i_b:b\in B\}\subset I$, and because $i_1<\cdots<i_m$, the vector $x_B$ becomes $X_A$. Thus \begin{align*} h(X_{i_1},\dots,X_{i_m})-\theta=\sum_{s=1}^m\sum_{\substack{A\subset I\,:\, |A|=s}}h_s(X_A), \end{align*} where, if $A=\{a_1<\cdots<a_s\}$, then $X_A=(X_{a_1},\dots,X_{a_s})$. Averaging this identity over all $m$-element subsets $I\subset\{1,\dots,n\}$ gives \begin{align*} U_n-\theta=\binom{n}{m}^{-1}\sum_{\substack{I\subset \{1,\dots,n\}\,:\, |I|=m}}\sum_{s=1}^m\sum_{\substack{A\subset I\,:\, |A|=s}}h_s(X_A). \end{align*} Changing the order of the finite sums gives \begin{align*} U_n-\theta=\sum_{s=1}^m\binom{n}{m}^{-1}\sum_{\substack{A\subset \{1,\dots,n\}\,:\, |A|=s}}h_s(X_A)\#\{I\subset \{1,\dots,n\}: |I|=m,\ A\subset I\}. 
\end{align*} For a fixed $s$-element set $A$, the number of $m$-element sets $I$ containing $A$ is $\binom{n-s}{m-s}$, because the remaining $m-s$ elements of $I$ must be chosen from the $n-s$ elements outside $A$. Therefore \begin{align*} U_n-\theta=\sum_{s=1}^m\binom{n}{m}^{-1}\binom{n-s}{m-s}\sum_{\substack{A\subset \{1,\dots,n\}\,:\, |A|=s}}h_s(X_A). \end{align*} Consider the binomial coefficient identity \begin{align*} \binom{n}{m}^{-1}\binom{n-s}{m-s} = \binom{m}{s}\binom{n}{s}^{-1}. \end{align*} It holds because both sides equal $\frac{m!\,(n-s)!}{(m-s)!\,n!}$. Substituting it, we obtain \begin{align*} U_n-\theta=\sum_{s=1}^m\binom{m}{s}\binom{n}{s}^{-1}\sum_{\substack{A\subset \{1,\dots,n\}\,:\, |A|=s}}h_s(X_A). \end{align*} Here $U_{n,s}=\binom{n}{s}^{-1}\sum_{\substack{A\subset \{1,\dots,n\}\,:\, |A|=s}}h_s(X_A)$ is the U-statistic of order $s$ with kernel $h_s$, so this is \begin{align*} U_n-\theta=\sum_{s=1}^m \binom{m}{s}U_{n,s}. \end{align*} This is the asserted Hoeffding decomposition. [/step]
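The construction can be checked numerically on a finite probability space, where every integral becomes a finite weighted sum. The Python sketch below assumes a hypothetical three-point space $E=\{0,1,2\}$ with weights `mu` and a hypothetical symmetric kernel `h` of order $m=3$; none of these come from the theorem. It computes $g_r$ by averaging out the free coordinates and builds $h_s$ by the inclusion-exclusion formula. It then checks the inversion formula and the degeneracy of each $h_s$ exactly, using rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

# Hypothetical finite probability space: E = {0, 1, 2} with weights mu (sum = 1).
E = [0, 1, 2]
mu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
m = 3  # order of the kernel


def h(x):
    """Hypothetical symmetric kernel of order m = 3."""
    a, b, c = x
    return a * b * c + (a + b + c) ** 2 + max(x)


def g(xs):
    """g_r(x_1, ..., x_r): integrate h over the m - r free coordinates against mu."""
    total = Fraction(0)
    for z in product(E, repeat=m - len(xs)):
        weight = prod((mu[zi] for zi in z), start=Fraction(1))
        total += weight * h(tuple(xs) + z)
    return total


def subsets(s):
    """All subsets of the positions {0, ..., s-1}, as sorted tuples."""
    return [B for k in range(s + 1) for B in combinations(range(s), k)]


def h_s(xs):
    """Canonical kernel: sum over B of (-1)^(s-|B|) g_|B|(x_B); gives theta when s = 0."""
    s = len(xs)
    return sum(
        ((-1) ** (s - len(B)) * g(tuple(xs[i] for i in B)) for B in subsets(s)),
        Fraction(0),
    )


theta = g(())

# Inversion: h(x) equals the sum of h_|B|(x_B) over all B subset of [m] (B empty gives theta).
inversion_ok = all(
    h(x) == sum((h_s(tuple(x[i] for i in B)) for B in subsets(m)), Fraction(0))
    for x in product(E, repeat=m)
)

# Degeneracy: integrating out the last coordinate of h_s gives 0 for every s >= 1.
degenerate_ok = all(
    sum(mu[y] * h_s(x + (y,)) for y in E) == 0
    for s in range(1, m + 1)
    for x in product(E, repeat=s - 1)
)

print("theta =", theta)
print("inversion holds:", inversion_ok)
print("each h_s is degenerate:", degenerate_ok)
```

Exact `Fraction` arithmetic is used so that the identities are checked as equalities rather than up to floating-point tolerance.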

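For the averaging step, a worked case with $m=2$ may help. Assume, hypothetically, that $\mu$ is a known discrete law with mean $a$ and variance $\sigma^2$. Then the variance kernel $h(x,y)=(x-y)^2/2$ has $\theta=\sigma^2$ and $g_1(x)=((x-a)^2+\sigma^2)/2$. The canonical kernels are $h_1(x)=((x-a)^2-\sigma^2)/2$ and $h_2(x,y)=-(x-a)(y-a)$, so the decomposition reads $U_n-\sigma^2=2U_{n,1}+U_{n,2}$. The sketch below checks this identity on a made-up sample and checks that $h_2$ integrates to zero in one argument. It also verifies the binomial coefficient identity from the averaging step for small $n$, $m$ and $s$. The law `mu`, the `sample`, and the helper `u_stat` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Hypothetical discrete law mu: value -> probability (an assumption for illustration).
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}
a = sum(p * x for x, p in mu.items())                  # mean of mu
sigma2 = sum(p * (x - a) ** 2 for x, p in mu.items())  # variance of mu, i.e. theta


def h(x, y):
    """Variance kernel of order m = 2: E h(X1, X2) = sigma2."""
    return Fraction((x - y) ** 2, 2)


def h1(x):
    """h_1 = g_1 - theta, where g_1(x) = ((x - a)^2 + sigma2) / 2."""
    return ((x - a) ** 2 - sigma2) / 2


def h2(x, y):
    """h_2 = h - g_1(x) - g_1(y) + theta, which simplifies to -(x - a)(y - a)."""
    return -(x - a) * (y - a)


def u_stat(kernel, xs, order):
    """Hypothetical helper: average of kernel over all order-element index subsets of xs."""
    terms = [kernel(*c) for c in combinations(xs, order)]
    return sum(terms, Fraction(0)) / comb(len(xs), order)


sample = [0, 3, 1, 0, 0, 3]  # hypothetical observations
m = 2
lhs = u_stat(h, sample, m) - sigma2
rhs = sum(comb(m, s) * u_stat(k, sample, s) for s, k in [(1, h1), (2, h2)])
print("U_n - theta == 2 U_n1 + U_n2:", lhs == rhs)

# h_2 is degenerate: integrating one argument against mu gives 0.
degenerate_ok = all(sum(p * h2(x, y) for y, p in mu.items()) == 0 for x in mu)

# Counting identity from the averaging step, checked exactly for small sizes.
binomial_ok = all(
    Fraction(comb(n - s, r - s), comb(n, r)) == Fraction(comb(r, s), comb(n, s))
    for n in range(1, 12)
    for r in range(1, n + 1)
    for s in range(1, r + 1)
)
print(degenerate_ok, binomial_ok)
```

Because the pointwise identity $h(x,y)-\sigma^2=h_1(x)+h_1(y)+h_2(x,y)$ holds for every pair of reals, the U-statistic identity does not depend on which sample is chosen.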