[proofplan]
We construct the kernels by Möbius inversion on the finite lattice of subsets of $\{1,\dots,m\}$. First we define [conditional expectation](/page/Conditional%20Expectation) kernels indexed by subsets of coordinates, then subtract all lower-order contributions to obtain canonical projections. The inversion formula reconstructs $h-\theta$ as the sum of these projections over all nonempty coordinate subsets. Averaging this pointwise decomposition over all $m$-subsets of $\{1,\dots,n\}$ gives the stated U-statistic identity, and the canonical degeneracy follows by integrating out one coordinate and using the cancellation in the inclusion-exclusion formula.
[/proofplan]
[step:Define the coordinate conditional expectation kernels]
Let $[m]:=\{1,\dots,m\}$. For each subset $A\subset [m]$, write $|A|$ for its cardinality. If $A=\{a_1<\cdots<a_r\}$, define the coordinate projection $\pi_A:E^m\to E^r$ by
\begin{align*}
\pi_A(x_1,\dots,x_m)=(x_{a_1},\dots,x_{a_r}).
\end{align*}
For $A=\varnothing$, let $E^0$ be the one-point space and let $\pi_\varnothing$ be the unique map to $E^0$.
For every $A\subset [m]$, let $A^c=[m]\setminus A$ and define the coordinate insertion map
\begin{align*}
\iota_A:E^{|A|}\times E^{|A^c|}\to E^m
\end{align*}
by placing the first argument in the coordinates indexed by $A$ and the second argument in the coordinates indexed by $A^c$, both in increasing order. Define $g_A:E^{|A|}\to\mathbb R$ by
\begin{align*}
g_A(x_A)=\int_{E^{|A^c|}}h(\iota_A(x_A,z))\,d\mu^{\otimes |A^c|}(z).
\end{align*}
For $A=\varnothing$, this gives
\begin{align*}
g_\varnothing=\int_{E^m}h(x)\,d\mu^{\otimes m}(x)=\theta.
\end{align*}
For $A=[m]$, the integral over the one-point space $E^0$ gives $g_{[m]}=h$ as an element of $L^2(E^m,\mu^{\otimes m})$. Since $\mu^{\otimes m}$ is a probability measure and $h\in L^2(E^m,\mu^{\otimes m})$, the Cauchy-Schwarz inequality gives $h\in L^1(E^m,\mu^{\otimes m})$, so by Fubini's theorem the integral defining $g_A(x_A)$ exists for $\mu^{\otimes |A|}$-almost every $x_A$. [Jensen's inequality](/theorems/9) for the probability measure $\mu^{\otimes |A^c|}$ gives
\begin{align*}
|g_A(x_A)|^2\leq \int_{E^{|A^c|}}|h(\iota_A(x_A,z))|^2\,d\mu^{\otimes |A^c|}(z).
\end{align*}
Integrating this inequality over $E^{|A|}$ with respect to $\mu^{\otimes |A|}$ and applying Tonelli's theorem gives $g_A\in L^2(E^{|A|},\mu^{\otimes |A|})$.
Because $h$ is symmetric and the product measure is invariant under coordinate permutations, the function $g_A$ depends only on $|A|$ up to permutation of its arguments. Thus, for each $0\leq r\leq m$, there is a symmetric kernel $g_r:E^r\to\mathbb R$ in $L^2(E^r,\mu^{\otimes r})$ such that $g_A(x_A)=g_r(x_A)$ whenever $|A|=r$, with the coordinates listed in increasing order.
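As an illustration, consider the following sketch; the specific kernel is an assumption made only for this example and is not part of the statement. Take $E=\mathbb R$, $m=2$, and $h(x,y)=\tfrac12(x-y)^2$, and assume that $\mu$ has finite fourth moment (so that $h\in L^2$), mean $a$, and variance $\sigma^2$. Writing $(x-y)^2=(x-a)^2-2(x-a)(y-a)+(y-a)^2$ and integrating out the appropriate coordinates gives
\begin{align*}
g_\varnothing&=\theta=\sigma^2,\\
g_{\{1\}}(x)=g_{\{2\}}(x)&=\tfrac12\bigl((x-a)^2+\sigma^2\bigr)=:g_1(x),\\
g_{\{1,2\}}(x,y)&=h(x,y)=:g_2(x,y).
\end{align*}
In particular $g_{\{1\}}$ and $g_{\{2\}}$ coincide, as the symmetry argument predicts.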
[/step]
[step:Construct the canonical kernels by finite Möbius inversion]
For $s=0$, set $h_0:=\theta$. For $1\leq s\leq m$, define the map $h_s:E^s\to \mathbb R$ by
\begin{align*}
h_s(x_1,\dots,x_s)=\sum_{B\subset \{1,\dots,s\}}(-1)^{s-|B|}g_{|B|}(x_B),
\end{align*}
where, if $B=\{b_1<\cdots<b_r\}$, the notation $x_B$ means $(x_{b_1},\dots,x_{b_r})$, and for $B=\varnothing$ the term is $g_0=\theta$.
The finite sum of $L^2$ functions is in $L^2(E^s,\mu^{\otimes s})$, so $h_s\in L^2(E^s,\mu^{\otimes s})$. The symmetry of each $g_r$ implies the symmetry of $h_s$, because permuting $(x_1,\dots,x_s)$ merely permutes the subsets $B\subset \{1,\dots,s\}$ and leaves the cardinalities $|B|$ unchanged.
[guided]
Fix $s\in\{1,\dots,m\}$. We define the order-$s$ canonical kernel by the inclusion-exclusion formula
\begin{align*}
h_s(x_1,\dots,x_s)=\sum_{B\subset \{1,\dots,s\}}(-1)^{s-|B|}g_{|B|}(x_B),
\end{align*}
where, if $B=\{b_1<\cdots<b_r\}$, then $x_B=(x_{b_1},\dots,x_{b_r})$, and the term for $B=\varnothing$ is $g_0=\theta$. The point of this formula is to remove all lower-order coordinate averages from the order-$s$ average. For instance,
\begin{align*}
h_1(x_1)=g_1(x_1)-\theta,
\end{align*}
and
\begin{align*}
h_2(x_1,x_2)=g_2(x_1,x_2)-g_1(x_1)-g_1(x_2)+\theta.
\end{align*}
Thus $h_2$ is the part of the two-coordinate kernel not already accounted for by the constant term and the one-coordinate effects.
The formula has only finitely many terms. For each $B\subset\{1,\dots,s\}$, the map $(x_1,\dots,x_s)\mapsto g_{|B|}(x_B)$ belongs to $L^2(E^s,\mu^{\otimes s})$, because it is pulled back from $g_{|B|}\in L^2(E^{|B|},\mu^{\otimes |B|})$ and the unused coordinates are integrated against a probability measure. Therefore the finite sum defining $h_s$ belongs to $L^2(E^s,\mu^{\otimes s})$.
It remains to check symmetry. Let $\rho$ be a permutation of $\{1,\dots,s\}$. Since each $g_r$ is symmetric, replacing $(x_1,\dots,x_s)$ by $(x_{\rho(1)},\dots,x_{\rho(s)})$ turns the summand indexed by $B$ into the summand indexed by $\rho(B)$, which has the same cardinality and hence the same sign. The map $B\mapsto\rho(B)$ is a bijection of the subset lattice, so the whole sum is unchanged. Hence $h_s$ is symmetric.
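For a concrete instance, here is a sketch under assumptions that are not part of the statement: $E=\mathbb R$, $m=2$, $h(x,y)=\tfrac12(x-y)^2$, with $\mu$ of finite fourth moment, mean $a$, and variance $\sigma^2$. A direct computation gives $\theta=\sigma^2$, $g_1(x)=\tfrac12\bigl((x-a)^2+\sigma^2\bigr)$, and $g_2=h$, so
\begin{align*}
h_1(x)&=g_1(x)-\theta=\tfrac12\bigl((x-a)^2-\sigma^2\bigr),\\
h_2(x,y)&=\tfrac12(x-y)^2-\tfrac12\bigl((x-a)^2+\sigma^2\bigr)-\tfrac12\bigl((y-a)^2+\sigma^2\bigr)+\sigma^2=-(x-a)(y-a).
\end{align*}
Both kernels are visibly symmetric, and each integrates to zero against $\mu$ in any single argument.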
[/guided]
[/step]
[step:Invert the construction to recover the kernel from its projections]
For every subset $A\subset [m]$, define the map $\psi_A:E^{|A|}\to \mathbb R$ by
\begin{align*}
\psi_A(x_A)=h_{|A|}(x_A),
\end{align*}
where the coordinates of $x_A$ are listed in increasing order. When $C\subset A$ appears below, $g_C(x_C)$ means the subset-indexed kernel associated to $C$; equivalently, it is the cardinality kernel $g_{|C|}$ applied to the coordinates $x_C$ in increasing order. We claim that, for every $A\subset [m]$,
\begin{align*}
g_A(x_A)=\sum_{B\subset A}\psi_B(x_B)
\end{align*}
for $\mu^{\otimes |A|}$-almost every $x_A\in E^{|A|}$.
Indeed, fix $A\subset [m]$. Substituting the definition of $\psi_B$, that is, the inclusion-exclusion formula for $h_{|B|}$ relabeled to the coordinates in $B$, we obtain
\begin{align*}
\sum_{B\subset A}\psi_B(x_B)=\sum_{B\subset A}\sum_{C\subset B}(-1)^{|B|-|C|}g_C(x_C).
\end{align*}
Reindexing the finite double sum by fixing $C\subset A$ gives
\begin{align*}
\sum_{B\subset A}\psi_B(x_B)=\sum_{C\subset A}g_C(x_C)\sum_{\substack{B:\ C\subset B\subset A}}(-1)^{|B|-|C|}.
\end{align*}
For fixed $C\subset A$, write $q=|A|-|C|$. The inner sum is
\begin{align*}
\sum_{\substack{B:\ C\subset B\subset A}}(-1)^{|B|-|C|}=\sum_{\ell=0}^{q}\binom{q}{\ell}(-1)^\ell=(1-1)^q.
\end{align*}
This equals $1$ when $q=0$, equivalently $C=A$, and equals $0$ otherwise. Hence only the term $C=A$ remains, proving the inversion formula.
Taking $A=[m]$ gives
\begin{align*}
h(x_1,\dots,x_m)=\sum_{B\subset [m]}\psi_B(x_B).
\end{align*}
Separating the term $B=\varnothing$, for which $\psi_\varnothing=h_0=\theta$, and grouping the remaining subsets $B\subset [m]$ by their cardinality gives
\begin{align*}
h(x_1,\dots,x_m)=\theta+\sum_{s=1}^m\sum_{\substack{B\subset [m]\,:\, |B|=s}}h_s(x_B)
\end{align*}
for $\mu^{\otimes m}$-almost every $(x_1,\dots,x_m)\in E^m$, because $g_{[m]}=h$ in $L^2(E^m,\mu^{\otimes m})$.
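As a sanity check of the inversion formula, take $A=\{1,2\}$ (assuming $m\geq 2$) and use only the formulas for $h_1$ and $h_2$. The four subsets of $A$ contribute
\begin{align*}
\sum_{B\subset\{1,2\}}\psi_B(x_B)
&=\theta+h_1(x_1)+h_1(x_2)+h_2(x_1,x_2)\\
&=\theta+\bigl(g_1(x_1)-\theta\bigr)+\bigl(g_1(x_2)-\theta\bigr)+\bigl(g_2(x_1,x_2)-g_1(x_1)-g_1(x_2)+\theta\bigr)\\
&=g_2(x_1,x_2).
\end{align*}
When $m=2$ this is exactly the expansion $h(x_1,x_2)=\theta+h_1(x_1)+h_1(x_2)+h_2(x_1,x_2)$.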
[/step]
[step:Prove the kernels are canonical in each argument]
Fix $s\in\{1,\dots,m\}$ and $j\in\{1,\dots,s\}$. We must show that integrating $h_s$ against $\mu$ in its $j$th argument gives zero. Since $h_s$ is symmetric, it is enough to prove this for the last coordinate, $j=s$.
For $C\subset \{1,\dots,s-1\}$, the product-integral definition of the kernels gives
\begin{align*}
\int_E g_{|C|+1}(x_C,y)\,d\mu(y)=g_{|C|}(x_C)
\end{align*}
for $\mu^{\otimes |C|}$-almost every $x_C\in E^{|C|}$. Indeed, $h\in L^1(E^m,\mu^{\otimes m})$ because $h\in L^2(E^m,\mu^{\otimes m})$ and $\mu^{\otimes m}$ is a probability measure. Hence [Fubini's theorem](/theorems/2961) applies to the product integral defining $g_{|C|+1}$, and integrating out $y$ together with the remaining $m-|C|-1$ coordinates is exactly the product integral defining $g_{|C|}$. By the definition of $h_s$ and linearity of the integral,
\begin{align*}
\int_E h_s(x_1,\dots,x_{s-1},y)\,d\mu(y)=\sum_{B\subset \{1,\dots,s\}}(-1)^{s-|B|}\int_E g_{|B|}((x_1,\dots,x_{s-1},y)_B)\,d\mu(y).
\end{align*}
Split the subsets $B\subset \{1,\dots,s\}$ into those not containing $s$ and those of the form $C\cup\{s\}$ with $C\subset \{1,\dots,s-1\}$. If $s\notin B$, the integrand is independent of $y$, so its integral is $g_{|B|}(x_B)$. If $B=C\cup\{s\}$, the previous conditional-expectation identity gives the integral $g_{|C|}(x_C)$. Hence
\begin{align*}
\int_E h_s(x_1,\dots,x_{s-1},y)\,d\mu(y)=\sum_{C\subset \{1,\dots,s-1\}}\left((-1)^{s-|C|}+(-1)^{s-(|C|+1)}\right)g_{|C|}(x_C).
\end{align*}
For each $C\subset \{1,\dots,s-1\}$, the coefficient satisfies $(-1)^{s-|C|}+(-1)^{s-(|C|+1)}=0$, so
\begin{align*}
\int_E h_s(x_1,\dots,x_{s-1},y)\,d\mu(y)=0.
\end{align*}
This proves degeneracy in the last coordinate, and symmetry gives degeneracy in every coordinate. Integrating the degeneracy identity over the remaining $s-1$ variables gives
\begin{align*}
\int_{E^s}h_s(x_1,\dots,x_s)\,d\mu^{\otimes s}(x_1,\dots,x_s)=0.
\end{align*}
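For $s=2$ the cancellation can be seen directly. Using $\int_E g_2(x_1,y)\,d\mu(y)=g_1(x_1)$ and $\int_E g_1(y)\,d\mu(y)=g_0=\theta$, which are the cases $C=\{1\}$ and $C=\varnothing$ of the conditional-expectation identity,
\begin{align*}
\int_E h_2(x_1,y)\,d\mu(y)
&=\int_E g_2(x_1,y)\,d\mu(y)-g_1(x_1)-\int_E g_1(y)\,d\mu(y)+\theta\\
&=g_1(x_1)-g_1(x_1)-\theta+\theta=0.
\end{align*}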
[/step]
[step:Average the pointwise expansion over all sampled subsets]
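For each subset $I=\{i_1<\cdots<i_m\}\subset \{1,\dots,n\}$ with $|I|=m$, apply the expansion of $h-\theta$ to the random vector $(X_{i_1},\dots,X_{i_m})$. Since the variables $X_1,\dots,X_n$ are independent with common law $\mu$, this vector has law $\mu^{\otimes m}$, so the expansion, which holds $\mu^{\otimes m}$-almost everywhere, holds almost surely for every fixed $I$; as there are only finitely many such $I$, it holds almost surely for all of them simultaneously. Thus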
\begin{align*}
h(X_{i_1},\dots,X_{i_m})-\theta=\sum_{s=1}^m\sum_{\substack{A\subset I\,:\, |A|=s}}h_s(X_A),
\end{align*}
where, if $A=\{a_1<\cdots<a_s\}$, then $X_A=(X_{a_1},\dots,X_{a_s})$.
Average this identity over all $m$-element subsets $I\subset\{1,\dots,n\}$. This gives
\begin{align*}
U_n-\theta=\binom{n}{m}^{-1}\sum_{\substack{I\subset \{1,\dots,n\}\,:\, |I|=m}}\sum_{s=1}^m\sum_{\substack{A\subset I\,:\, |A|=s}}h_s(X_A).
\end{align*}
Changing the order of the finite sums gives
\begin{align*}
U_n-\theta=\sum_{s=1}^m\binom{n}{m}^{-1}\sum_{\substack{A\subset \{1,\dots,n\}\,:\, |A|=s}}h_s(X_A)\#\{I\subset \{1,\dots,n\}: |I|=m,\ A\subset I\}.
\end{align*}
For a fixed $s$-element set $A$, the number of $m$-element sets $I$ containing $A$ is $\binom{n-s}{m-s}$, because the remaining $m-s$ elements of $I$ must be chosen from the $n-s$ elements outside $A$. Therefore
\begin{align*}
U_n-\theta=\sum_{s=1}^m\binom{n}{m}^{-1}\binom{n-s}{m-s}\sum_{\substack{A\subset \{1,\dots,n\}\,:\, |A|=s}}h_s(X_A).
\end{align*}
Using the binomial coefficient identity
\begin{align*}
\binom{n}{m}^{-1}\binom{n-s}{m-s}
=
\binom{m}{s}\binom{n}{s}^{-1},
\end{align*}
we obtain
\begin{align*}
U_n-\theta=\sum_{s=1}^m\binom{m}{s}\binom{n}{s}^{-1}\sum_{\substack{A\subset \{1,\dots,n\}\,:\, |A|=s}}h_s(X_A).
\end{align*}
By the definition of $U_{n,s}$, this is
\begin{align*}
U_n-\theta=\sum_{s=1}^m \binom{m}{s}U_{n,s}.
\end{align*}
This is the asserted Hoeffding decomposition.
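For orientation, here is the case $m=2$, assuming (as the last two displays indicate) that $U_{n,s}=\binom{n}{s}^{-1}\sum_{A\subset\{1,\dots,n\}:\,|A|=s}h_s(X_A)$:
\begin{align*}
U_n-\theta=\frac{2}{n}\sum_{i=1}^n h_1(X_i)+\binom{n}{2}^{-1}\sum_{1\leq i<j\leq n}h_2(X_i,X_j).
\end{align*}
The binomial identity used in the argument can also be checked on small numbers, for example $n=5$, $m=3$, $s=2$:
\begin{align*}
\binom{5}{3}^{-1}\binom{3}{1}=\frac{3}{10}=\binom{3}{2}\binom{5}{2}^{-1}.
\end{align*}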
[/step]