[proofplan]
We first prove the assertion for bounded kernels by viewing $U_n$ as a bounded function of the independent variables $X_1,\dots,X_n$ and applying a bounded-differences concentration inequality together with Borel-Cantelli. Then we pass from bounded kernels to integrable kernels by truncating $h$ and using the standard tail reduction for U-statistics, whose content is that non-negative integrable tail kernels have asymptotic U-average no larger than their expectation. Finally, dominated convergence sends the truncation level to infinity and identifies the limiting constant as $\theta$.
[/proofplan]
[step:Prove almost sure convergence for bounded kernels by bounded differences]
Assume first that there is a constant $K \in [0,\infty)$ such that
\begin{align*}
|h(x_1,\dots,x_m)| \le K
\end{align*}
for every $(x_1,\dots,x_m) \in E^m$. If $K=0$, then $h(X_{i_1},\dots,X_{i_m})=0$ for every index set $1\le i_1<\cdots<i_m\le n$ on a probability-one event, so $U_n=0=\theta$ for every $n\ge m$ on that event. Hence the bounded case is immediate when $K=0$. We therefore assume $K>0$ for the concentration estimate below. For each $n \ge m$, define the measurable map $F_n:E^n\to \mathbb R$ by sending $(x_1,\dots,x_n)\in E^n$ to
\begin{align*}
\binom{n}{m}^{-1}\sum_{1 \le i_1 < \cdots < i_m \le n} h(x_{i_1},\dots,x_{i_m}).
\end{align*}
Then $U_n = F_n(X_1,\dots,X_n)$.
Fix $j \in \{1,\dots,n\}$. If two points $(x_1,\dots,x_n),(y_1,\dots,y_n) \in E^n$ agree in every coordinate except possibly the $j$-th coordinate, then only those summands whose index set contains $j$ can change. There are exactly $\binom{n-1}{m-1}$ such summands, and each changed summand changes by at most $2K$. Hence
\begin{align*}
|F_n(x_1,\dots,x_n)-F_n(y_1,\dots,y_n)|
\le \binom{n}{m}^{-1}\binom{n-1}{m-1}2K.
\end{align*}
Using the identity $\binom{n-1}{m-1}/\binom{n}{m}=m/n$, we obtain
\begin{align*}
|F_n(x_1,\dots,x_n)-F_n(y_1,\dots,y_n)|\le \frac{2Km}{n}.
\end{align*}
By the independently established [McDiarmid bounded differences inequality](/theorems/6074) for independent random variables, whose hypotheses are satisfied because $X_1,\dots,X_n$ are independent and the $j$-th coordinate oscillation of $F_n$ is at most $2Km/n$, for every $\varepsilon>0$,
\begin{align*}
\mathbb P(|U_n-\mathbb E[U_n]| \ge \varepsilon) \le 2\exp\left(-\frac{2\varepsilon^2}{\sum_{j=1}^n (2Km/n)^2}\right).
\end{align*}
Since $\sum_{j=1}^n(2Km/n)^2=4K^2m^2/n$, this is
\begin{align*}
\mathbb P(|U_n-\mathbb E[U_n]| \ge \varepsilon) \le 2\exp\left(-\frac{\varepsilon^2 n}{2K^2m^2}\right).
\end{align*}
The series
\begin{align*}
\sum_{n=m}^{\infty}
2\exp\left(-\frac{\varepsilon^2 n}{2K^2m^2}\right)
\end{align*}
is finite. By the independently established first Borel-Cantelli lemma applied to the events $\{|U_n-\mathbb E[U_n]|\ge\varepsilon\}$, for each fixed $\varepsilon>0$,
\begin{align*}
|U_n-\mathbb E[U_n]| < \varepsilon
\end{align*}
for all sufficiently large $n$, $\mathbb P$-a.s. Applying this to $\varepsilon=1/q$ for $q \in \mathbb N$ and intersecting the resulting probability-one events gives
\begin{align*}
U_n-\mathbb E[U_n] \to 0
\end{align*}
$\mathbb P$-a.s.
For each index set $1 \le i_1 < \cdots < i_m \le n$, the random vector $(X_{i_1},\dots,X_{i_m})$ has the same distribution as $(X_1,\dots,X_m)$ because the sequence $(X_i)_{i\ge1}$ is i.i.d. Therefore
\begin{align*}
\mathbb E[U_n]
= \binom{n}{m}^{-1}
\sum_{1 \le i_1 < \cdots < i_m \le n}
\mathbb E[h(X_{i_1},\dots,X_{i_m})].
\end{align*}
Since every expectation in the sum equals $\theta$ and there are $\binom{n}{m}$ summands,
\begin{align*}
\mathbb E[U_n]=\binom{n}{m}^{-1}\binom{n}{m}\theta=\theta.
\end{align*}
Thus $U_n \to \theta$ $\mathbb P$-a.s. for bounded kernels.
[guided]
The bounded case is the part where concentration replaces the more delicate permutation argument. We package the U-statistic as a function of the first $n$ observations. Define $F_n:E^n\to \mathbb R$ by sending $(x_1,\dots,x_n)\in E^n$ to
\begin{align*}
\binom{n}{m}^{-1}\sum_{1 \le i_1 < \cdots < i_m \le n}h(x_{i_1},\dots,x_{i_m}).
\end{align*}
Then $U_n=F_n(X_1,\dots,X_n)$.
We estimate how much $F_n$ can change if only one coordinate is altered. Fix $j \in \{1,\dots,n\}$. A summand
\begin{align*}
h(x_{i_1},\dots,x_{i_m})
\end{align*}
depends on $x_j$ exactly when $j \in \{i_1,\dots,i_m\}$. The number of such index sets is $\binom{n-1}{m-1}$, because after choosing $j$ we choose the remaining $m-1$ indices from the other $n-1$ positions. Since $|h|\le K$, changing one affected summand can change its value by at most $2K$. Therefore
\begin{align*}
|F_n(x_1,\dots,x_n)-F_n(y_1,\dots,y_n)|
\le \binom{n}{m}^{-1}\binom{n-1}{m-1}2K.
\end{align*}
The combinatorial identity $\binom{n-1}{m-1}/\binom{n}{m}=m/n$ turns this into
\begin{align*}
|F_n(x_1,\dots,x_n)-F_n(y_1,\dots,y_n)|\le \frac{2Km}{n}.
\end{align*}
The independently established McDiarmid bounded differences inequality now applies because $X_1,\dots,X_n$ are independent and the coordinate oscillation of $F_n$ is bounded by $2Km/n$ in each coordinate. It gives, for every $\varepsilon>0$,
\begin{align*}
\mathbb P(|U_n-\mathbb E[U_n]| \ge \varepsilon) \le 2\exp\left(-\frac{2\varepsilon^2}{\sum_{j=1}^n (2Km/n)^2}\right).
\end{align*}
Because $\sum_{j=1}^n(2Km/n)^2=4K^2m^2/n$, this becomes
\begin{align*}
\mathbb P(|U_n-\mathbb E[U_n]| \ge \varepsilon) \le 2\exp\left(-\frac{\varepsilon^2 n}{2K^2m^2}\right).
\end{align*}
The right-hand side is summable in $n$. Hence the first Borel-Cantelli lemma, applied to the events $\{|U_n-\mathbb E[U_n]|\ge\varepsilon\}$, implies that, for each fixed $\varepsilon>0$, the event $|U_n-\mathbb E[U_n]|\ge\varepsilon$ occurs only finitely many times almost surely. Applying this statement to the countable sequence $\varepsilon=1/q$, $q\in\mathbb N$, gives
\begin{align*}
U_n-\mathbb E[U_n]\to0
\end{align*}
almost surely.
It remains to identify the expectation. Each $m$-tuple $(X_{i_1},\dots,X_{i_m})$ with $1\le i_1<\cdots<i_m\le n$ has the same law as $(X_1,\dots,X_m)$, so every summand has expectation $\theta$. Hence
\begin{align*}
\mathbb E[U_n]
= \binom{n}{m}^{-1}
\sum_{1 \le i_1 < \cdots < i_m \le n}
\mathbb E[h(X_{i_1},\dots,X_{i_m})].
\end{align*}
Every expectation in this sum equals $\theta$, and the number of summands is $\binom{n}{m}$. Therefore
\begin{align*}
\mathbb E[U_n]=\binom{n}{m}^{-1}\binom{n}{m}\theta=\theta.
\end{align*}
Combining this identity with $U_n-\mathbb E[U_n]\to0$ proves $U_n\to\theta$ almost surely in the bounded case.
[/guided]
[/step]
[step:Control integrable tails by Hoeffding's non-negative U-statistic strong law]
We record the external tail theorem used to pass from bounded kernels to integrable kernels. The independently established Hoeffding non-negative U-statistic strong law is used here as a prior input, not as a consequence of the present theorem; without that independent result, the truncation step below would be circular. It states the following: let $g:E^m\to[0,\infty]$ be an $\mathcal E^{\otimes m}$-measurable symmetric kernel such that
\begin{align*}
\mathbb E[g(X_1,\dots,X_m)]<\infty.
\end{align*}
For $n\ge m$, define
\begin{align*}
V_n(g):=\binom{n}{m}^{-1}\sum_{1\le i_1<\cdots<i_m\le n}g(X_{i_1},\dots,X_{i_m}).
\end{align*}
Then
\begin{align*}
\limsup_{n\to\infty}V_n(g)\le \mathbb E[g(X_1,\dots,X_m)]
\end{align*}
$\mathbb P$-a.s.
[guided]
This step is not proving Hoeffding's tail theorem from the bounded case, and it is not using the theorem currently being proved. It is invoking the independently established Hoeffding non-negative U-statistic strong law as an external nontrivial input for integrable non-negative kernels. This independence matters: if the non-negative result had only been obtained from the theorem currently under proof, the truncation argument would be circular. Its hypotheses match our setting: $g$ is $\mathcal E^{\otimes m}$-measurable, symmetric, non-negative, and satisfies $\mathbb E[g(X_1,\dots,X_m)]<\infty$. Therefore the Hoeffding non-negative U-statistic strong law gives
\begin{align*}
\limsup_{n\to\infty}V_n(g)\le \mathbb E[g(X_1,\dots,X_m)]
\end{align*}
$\mathbb P$-a.s. This is exactly the tail estimate needed later for $g=r_M$, the non-negative truncation error kernel.
[/guided]
[/step]
[step:Truncate the kernel and pass the bounded result to the limit]
For each $M>0$, define the truncated kernel $h_M:E^m\to\mathbb R$ by
\begin{align*}
h_M(x_1,\dots,x_m)=\max\{-M,\min\{h(x_1,\dots,x_m),M\}\}.
\end{align*}
The function $h_M$ is measurable and symmetric because $h$ is measurable and symmetric. It is bounded by $M$, so the bounded-kernel result gives
\begin{align*}
U_n(h_M) \xrightarrow{a.s.} \theta_M,
\end{align*}
where
\begin{align*}
U_n(h_M)
:=\binom{n}{m}^{-1}
\sum_{1\le i_1<\cdots<i_m\le n}
h_M(X_{i_1},\dots,X_{i_m})
\end{align*}
and
\begin{align*}
\theta_M:=\mathbb E[h_M(X_1,\dots,X_m)].
\end{align*}
Define the non-negative tail kernel $r_M:E^m\to[0,\infty)$ by
\begin{align*}
r_M(x_1,\dots,x_m)=|h(x_1,\dots,x_m)-h_M(x_1,\dots,x_m)|.
\end{align*}
Then $r_M$ is measurable, symmetric, and
\begin{align*}
r_M(X_1,\dots,X_m)\le |h(X_1,\dots,X_m)|.
\end{align*}
Thus $\mathbb E[r_M(X_1,\dots,X_m)]<\infty$. Applying the tail reduction claim to $r_M$ gives
\begin{align*}
\limsup_{n\to\infty}U_n(r_M)
\le \mathbb E[r_M(X_1,\dots,X_m)]
\end{align*}
$\mathbb P$-a.s.
For every $n\ge m$,
\begin{align*}
|U_n(h)-\theta|
\le |U_n(h_M)-\theta_M|
+ |U_n(h)-U_n(h_M)|
+ |\theta_M-\theta|
\le |U_n(h_M)-\theta_M|
+ U_n(r_M)
+ \mathbb E[r_M(X_1,\dots,X_m)].
\end{align*}
Taking the limit superior and using $U_n(h_M)\to\theta_M$ almost surely yields, for each fixed $M>0$,
\begin{align*}
\limsup_{n\to\infty}|U_n(h)-\theta|\le 2\mathbb E[r_M(X_1,\dots,X_m)]
\end{align*}
on a probability-one event that may depend on $M$. To pass to the truncation limit on one event, restrict from now on to integer truncation levels $M=N\in\mathbb N$ and intersect the corresponding probability-one events. On this single probability-one event, the displayed inequality holds for every $N\in\mathbb N$.
Finally, $r_N(X_1,\dots,X_m)\to0$ pointwise as $N\to\infty$, and
\begin{align*}
0\le r_N(X_1,\dots,X_m)\le |h(X_1,\dots,X_m)|.
\end{align*}
Since $|h(X_1,\dots,X_m)|$ is integrable, the [Dominated Convergence Theorem](/theorems/4) applies to the sequence $r_N(X_1,\dots,X_m)$ dominated by $|h(X_1,\dots,X_m)|$ and gives
\begin{align*}
\mathbb E[r_N(X_1,\dots,X_m)]\to0.
\end{align*}
Therefore
\begin{align*}
\limsup_{n\to\infty}|U_n(h)-\theta|=0
\end{align*}
$\mathbb P$-a.s., which is equivalent to
\begin{align*}
U_n\xrightarrow{a.s.}\theta.
\end{align*}
This proves the theorem.
[guided]
We now remove the boundedness assumption by approximation. For each $M>0$, define the truncation map $h_M:E^m\to\mathbb R$ by
\begin{align*}
h_M(x_1,\dots,x_m)=\max\{-M,\min\{h(x_1,\dots,x_m),M\}\}.
\end{align*}
This function is measurable because it is obtained from the measurable function $h$ by applying continuous scalar truncation, and it is symmetric because $h$ is symmetric. It is bounded by $M$, so the bounded-kernel case already proved applies to $h_M$. Thus, with
\begin{align*}
U_n(h_M)
:=\binom{n}{m}^{-1}
\sum_{1\le i_1<\cdots<i_m\le n}
h_M(X_{i_1},\dots,X_{i_m})
\end{align*}
and
\begin{align*}
\theta_M:=\mathbb E[h_M(X_1,\dots,X_m)],
\end{align*}
we have
\begin{align*}
U_n(h_M)\xrightarrow{a.s.}\theta_M.
\end{align*}
The error made by truncating is measured by the non-negative kernel $r_M:E^m\to[0,\infty)$ defined by
\begin{align*}
r_M(x_1,\dots,x_m)=|h(x_1,\dots,x_m)-h_M(x_1,\dots,x_m)|.
\end{align*}
This kernel is measurable and symmetric. Moreover, for the random vector $(X_1,\dots,X_m)$,
\begin{align*}
0\le r_M(X_1,\dots,X_m)\le |h(X_1,\dots,X_m)|.
\end{align*}
The right-hand side is integrable by hypothesis, so $\mathbb E[r_M(X_1,\dots,X_m)]<\infty$. Therefore the independently established Hoeffding non-negative U-statistic strong law applies to $r_M$ and gives
\begin{align*}
\limsup_{n\to\infty}U_n(r_M)
\le \mathbb E[r_M(X_1,\dots,X_m)]
\end{align*}
$\mathbb P$-a.s.
For every $n\ge m$, the triangle inequality gives
\begin{align*}
|U_n(h)-\theta|
\le |U_n(h_M)-\theta_M|+|U_n(h)-U_n(h_M)|+|\theta_M-\theta|.
\end{align*}
The middle term is bounded by $U_n(r_M)$ by applying the triangle inequality term-by-term in the U-statistic average. The last term is bounded by $\mathbb E[r_M(X_1,\dots,X_m)]$ because
\begin{align*}
|\theta_M-\theta|
=|\mathbb E[h_M(X_1,\dots,X_m)-h(X_1,\dots,X_m)]|
\le \mathbb E[r_M(X_1,\dots,X_m)].
\end{align*}
Hence
\begin{align*}
|U_n(h)-\theta|
\le |U_n(h_M)-\theta_M|+U_n(r_M)+\mathbb E[r_M(X_1,\dots,X_m)].
\end{align*}
Taking the limit superior in $n$, using $U_n(h_M)\to\theta_M$ almost surely, and using the tail estimate for $r_M$, we obtain for each fixed $M>0$
\begin{align*}
\limsup_{n\to\infty}|U_n(h)-\theta|
\le 2\mathbb E[r_M(X_1,\dots,X_m)]
\end{align*}
on a probability-one event depending on $M$.
To make the final passage occur on one probability-one event, restrict to integer truncation levels $M=N\in\mathbb N$ and intersect the corresponding countably many probability-one events. On that single event, the preceding bound holds for every $N\in\mathbb N$. Finally, $r_N(X_1,\dots,X_m)\to0$ pointwise as $N\to\infty$, and the integrable [random variable](/page/Random%20Variable) $|h(X_1,\dots,X_m)|$ dominates all $r_N(X_1,\dots,X_m)$. The [Dominated Convergence Theorem](/theorems/4) therefore gives
\begin{align*}
\mathbb E[r_N(X_1,\dots,X_m)]\to0.
\end{align*}
Letting $N\to\infty$ in the bound for the limit superior yields
\begin{align*}
\limsup_{n\to\infty}|U_n(h)-\theta|=0
\end{align*}
$\mathbb P$-a.s. This is exactly $U_n\xrightarrow{a.s.}\theta$, so the theorem is proved.
[/guided]
[/step]