[proofplan]
We compare each leave-one empirical projection with the true first Hoeffding projection $h_1(X_i)$. The key estimate is that the average squared error of these projection estimates converges to $0$ in probability; this follows by expanding the conditional U-statistic error, observing that disjoint index configurations have conditional covariance $0$, and using the fourth-moment assumption to obtain the square-integrability needed for the overlapping configurations. Once the projection estimates are close in empirical $L^2$, their centred empirical variance is close to the empirical variance of $h_1(X_i)$, which converges to $\zeta_1$ by the ordinary law of large numbers. The studentised limit then follows from the nondegenerate U-statistic [central limit theorem](/theorems/521) and Slutsky's theorem.
[/proofplan]
[step:Introduce the leave-one conditional averages and isolate the projection error]
Let $(\Omega,\mathcal F,\mathbb P)$ denote the probability space on which the i.i.d. $E$-valued sample $(X_i)_{i\geq 1}$ is defined. For $1\leq p<\infty$, write $L^p(\Omega,\mathcal F,\mathbb P)$ for the space of real-valued random variables $Z:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ satisfying $\mathbb E[|Z|^p]<\infty$, modulo equality $\mathbb P$-a.s. Define the target parameter $\theta:=\mathbb E[h(X_1,\dots,X_m)]$.
For $n\geq m$, define the U-statistic
\begin{align*}
U_n:=\binom{n}{m}^{-1}\sum_{A\in\mathcal A_n} h((X_j)_{j\in A}),
\end{align*}
where $\mathcal A_n$ denotes the family of subsets $A\subset\{1,\dots,n\}$ with $|A|=m$.
Define the first conditional projection function $g:E\to\mathbb R$ by
\begin{align*}
g(x):=\mathbb E[h(x,X_2,\dots,X_m)]
\end{align*}
for $x\in E$, with the convention that $g(x)=h(x)$ when $m=1$. The first Hoeffding projection $h_1:E\to\mathbb R$ is then defined by
\begin{align*}
h_1(x):=g(x)-\theta
\end{align*}
for $x\in E$, and $h_1(X_i)=g(X_i)-\theta$ for every $i\in\{1,\dots,n\}$.
For $i \in \{1,\dots,n\}$, let $\mathcal A_{n,i}$ denote the family of subsets $A \subset \{1,\dots,n\}\setminus\{i\}$ with $|A|=m-1$. Define
\begin{align*}
V_{n,i}:=\binom{n-1}{m-1}^{-1}\sum_{A\in\mathcal A_{n,i}} h\bigl(X_i,(X_j)_{j\in A}\bigr).
\end{align*}
By the definition of the leave-one empirical projection in the theorem statement, $\widehat h_{1,i}=V_{n,i}-U_n$. Define the error variable
\begin{align*}
R_{n,i}:=V_{n,i}-g(X_i).
\end{align*}
Since $h_1(X_i)=g(X_i)-\theta$, we have
\begin{align*}
\widehat h_{1,i}-h_1(X_i)=R_{n,i}-(U_n-\theta).
\end{align*}
It is therefore enough to prove
\begin{align*}
\frac{1}{n}\sum_{i=1}^n R_{n,i}^2 \xrightarrow{\mathbb P}0
\qquad\text{and}\qquad
U_n\xrightarrow{\mathbb P}\theta.
\end{align*}
The second convergence follows from the [Strong Law for U-Statistics](/theorems/4938), which gives the stronger conclusion $U_n\xrightarrow{a.s.}\theta$ and hence convergence in probability. Its hypotheses hold here because the sample $(X_i)_{i\geq 1}$ is i.i.d., the kernel $h$ is symmetric and measurable, and the fourth-moment assumption implies $h(X_1,\dots,X_m)\in L^1(\Omega,\mathcal F,\mathbb P)$.
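To make the objects of this step concrete, $U_n$, $V_{n,i}$ and $\widehat h_{1,i}=V_{n,i}-U_n$ can be computed by brute force over index subsets. The Python sketch below is illustrative only: the function names and the example kernel (the variance kernel $h(x,y)=(x-y)^2/2$, for which $U_n$ is the unbiased sample variance) are assumptions, and the cost grows like $\binom{n}{m}$.

```python
from itertools import combinations
from math import comb
from statistics import variance


def u_statistic(xs, h, m):
    """U_n: average of the symmetric kernel h over all m-subsets of the sample."""
    n = len(xs)
    total = sum(h(*(xs[j] for j in idx)) for idx in combinations(range(n), m))
    return total / comb(n, m)


def leave_one_projections(xs, h, m):
    """hat h_{1,i} = V_{n,i} - U_n, where V_{n,i} averages h(X_i, X_A) over
    the (m-1)-subsets A of the indices other than i."""
    n = len(xs)
    u_n = u_statistic(xs, h, m)
    projections = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = sum(h(xs[i], *(xs[j] for j in idx))
                    for idx in combinations(others, m - 1))
        projections.append(total / comb(n - 1, m - 1) - u_n)
    return projections


def variance_kernel(x, y):
    # Illustrative order-2 kernel (an assumption): E[h(X_1, X_2)] = Var(X_1).
    return (x - y) ** 2 / 2


xs = [1.0, 2.5, -0.3, 4.0, 0.7]
print(u_statistic(xs, variance_kernel, 2), variance(xs))  # agree up to rounding
print(leave_one_projections(xs, variance_kernel, 2))
```

For $m=1$ the same functions return $U_n=\bar X_n$ and $\widehat h_{1,i}=X_i-\bar X_n$, matching the convention $g=h$.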
[/step]
[step:Bound the average squared projection error]
We prove
\begin{align*}
\mathbb E\left[\frac{1}{n}\sum_{i=1}^n R_{n,i}^2\right]\longrightarrow 0.
\end{align*}
If $m=1$, then $V_{n,i}=h(X_i)$ and $g(X_i)=h(X_i)$ for every $i$, so $R_{n,i}=0$ for every $i$ and the desired convergence holds. Hence assume $m\geq 2$ for the rest of this step.
By exchangeability of $X_1,\dots,X_n$,
\begin{align*}
\mathbb E\left[\frac{1}{n}\sum_{i=1}^n R_{n,i}^2\right]=\mathbb E[R_{n,1}^2].
\end{align*}
Condition on $X_1$. For each subset $A \subset \{2,\dots,n\}$ with $|A|=m-1$, define $Y_A:=h(X_1,(X_j)_{j\in A})-\mathbb E[h(X_1,X_2,\dots,X_m)\mid X_1]$.
Then, using the already defined family $\mathcal A_{n,1}$,
\begin{align*}
R_{n,1}=\binom{n-1}{m-1}^{-1}\sum_{A\in\mathcal A_{n,1}}Y_A.
\end{align*}
For every $A\in\mathcal A_{n,1}$, the random vector $(X_j)_{j\in A}$ has the same law as $(X_2,\dots,X_m)$ and is independent of $X_1$, so
\begin{align*}
\mathbb E[h(X_1,(X_j)_{j\in A})\mid X_1]=g(X_1)
\end{align*}
and therefore $\mathbb E[Y_A\mid X_1]=0$. For two such subsets $A$ and $B$, if $A\cap B=\varnothing$, then $Y_A$ and $Y_B$ are conditionally independent given $X_1$, hence
\begin{align*}
\mathbb E[Y_A Y_B\mid X_1]=\mathbb E[Y_A\mid X_1]\mathbb E[Y_B\mid X_1]=0.
\end{align*}
Expanding the square in the representation of $R_{n,1}$ and using the tower property,
\begin{align*}
\mathbb E[R_{n,1}^2]=\binom{n-1}{m-1}^{-2}\sum_{A\in\mathcal A_{n,1}}\sum_{B\in\mathcal A_{n,1}}\mathbb E\bigl[\mathbb E[Y_AY_B\mid X_1]\bigr],
\end{align*}
so only pairs with $A\cap B\neq \varnothing$ contribute. The fourth-moment assumption implies $h(X_1,\dots,X_m)\in L^2(\Omega,\mathcal F,\mathbb P)$. By [Jensen's inequality](/theorems/9) for [conditional expectation](/page/Conditional%20Expectation), $g(X_1)=\mathbb E[h(X_1,\dots,X_m)\mid X_1]$ also belongs to $L^2(\Omega,\mathcal F,\mathbb P)$, so the constant
\begin{align*}
C_h:=\mathbb E\left[\left(h(X_1,\dots,X_m)-g(X_1)\right)^2\right]
\end{align*}
is finite. For every $A\in\mathcal A_{n,1}$ the vector $(X_1,(X_j)_{j\in A})$ has the same law as $(X_1,\dots,X_m)$, hence $\mathbb E[Y_A^2]=C_h$, and the [Cauchy-Schwarz Inequality](/theorems/432) in $L^2(\Omega,\mathcal F,\mathbb P)$ gives
\begin{align*}
|\mathbb E[Y_A Y_B]|\leq \mathbb E[|Y_A Y_B|]\leq \left(\mathbb E[Y_A^2]\right)^{1/2}\left(\mathbb E[Y_B^2]\right)^{1/2}=C_h
\end{align*}
for all $A,B\in\mathcal A_{n,1}$.
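The number of subsets $A \subset \{2,\dots,n\}$ with $|A|=m-1$ is $\binom{n-1}{m-1}$. For a fixed $A$, every subset $B \subset \{2,\dots,n\}$ with $|B|=m-1$ and $A\cap B\neq \varnothing$ is obtained by choosing an element $a\in A\cap B$ and then the remaining $m-2$ elements of $B$ from $\{2,\dots,n\}\setminus\{a\}$, so the number of such $B$ is at most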
\begin{align*}
(m-1)\binom{n-2}{m-2}.
\end{align*}
Therefore
\begin{align*}
\mathbb E[R_{n,1}^2]\leq \binom{n-1}{m-1}^{-2}\binom{n-1}{m-1}(m-1)\binom{n-2}{m-2}C_h.
\end{align*}
Equivalently,
\begin{align*}
\mathbb E[R_{n,1}^2]\leq C_h(m-1)\frac{\binom{n-2}{m-2}}{\binom{n-1}{m-1}}.
\end{align*}
Since
\begin{align*}
\frac{\binom{n-2}{m-2}}{\binom{n-1}{m-1}}=\frac{m-1}{n-1},
\end{align*}
we obtain
\begin{align*}
\mathbb E[R_{n,1}^2]\leq C_h(m-1)\frac{m-1}{n-1}\longrightarrow 0.
\end{align*}
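The two combinatorial facts used above, the overlap count and the ratio of binomial coefficients, can be confirmed by brute force for small $n$ and $m$. The sketch below (function names are illustrative) fixes $A=\{2,\dots,m\}$, enumerates the $(m-1)$-subsets of $\{2,\dots,n\}$, and checks the ratio identity with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def overlap_count(n, m):
    """Number of (m-1)-subsets B of {2,...,n} meeting the fixed set A = {2,...,m}."""
    a = set(range(2, m + 1))
    return sum(1 for b in combinations(range(2, n + 1), m - 1) if a & set(b))


def check(n, m):
    count = overlap_count(n, m)
    # Upper bound from the proof: choose the shared element, then the rest.
    assert count <= (m - 1) * comb(n - 2, m - 2)
    # Exact count: all (m-1)-subsets minus those avoiding A (math.comb gives 0 if k > n).
    assert count == comb(n - 1, m - 1) - comb(n - m, m - 1)
    # Ratio identity binom(n-2, m-2) / binom(n-1, m-1) = (m-1)/(n-1).
    assert Fraction(comb(n - 2, m - 2), comb(n - 1, m - 1)) == Fraction(m - 1, n - 1)


for m in range(2, 5):
    for n in range(m, 12):
        check(n, m)
print("all checks passed")
```

For $m=2$ the bound equals $1$ and is exact, since then $A\cap B\neq\varnothing$ forces $B=A$.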
The [Markov Inequality](/theorems/514) applied to the nonnegative [random variable](/page/Random%20Variable)
\begin{align*}
\frac{1}{n}\sum_{i=1}^n R_{n,i}^2
\end{align*}
gives
\begin{align*}
\frac{1}{n}\sum_{i=1}^n R_{n,i}^2 \xrightarrow{\mathbb P}0.
\end{align*}
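A small simulation illustrates the $O(1/n)$ decay of $\mathbb E[R_{n,1}^2]$. As an assumed example, take $m=2$, $h(x,y)=xy$ and $X_i\sim\mathcal N(\mu,1)$, so that $g(x)=\mu x$, $Y_A=X_1(X_j-\mu)$ for $A=\{j\}$, and $C_h=\mathbb E[X_1^2]\,\mathbb E[(X_2-\mu)^2]=1+\mu^2$. For $m=2$ only the diagonal pairs $A=B$ overlap, so in this example the bound $C_h(m-1)^2/(n-1)=(1+\mu^2)/(n-1)$ is attained, and the Monte Carlo estimates should sit close to it.

```python
import random
from math import fsum


def squared_error_first(n, mu, rng):
    """One draw of R_{n,1}^2 = (V_{n,1} - g(X_1))^2 for h(x, y) = x*y, X ~ N(mu, 1)."""
    xs = [rng.gauss(mu, 1.0) for _ in range(n)]
    v_n1 = fsum(xs[0] * xj for xj in xs[1:]) / (n - 1)  # leave-one average for i = 1
    g_x1 = mu * xs[0]                                   # g(x) = E[x * X_2] = mu * x
    return (v_n1 - g_x1) ** 2


def mc_mean_squared_error(n, mu, reps=10000, seed=0):
    """Monte Carlo estimate of E[R_{n,1}^2] with a fixed seed."""
    rng = random.Random(seed)
    return fsum(squared_error_first(n, mu, rng) for _ in range(reps)) / reps


mu = 1.0
for n in (6, 11, 21):
    bound = (1 + mu**2) / (n - 1)  # C_h (m-1)^2 / (n-1) with m = 2
    print(n, round(mc_mean_squared_error(n, mu), 4), round(bound, 4))
```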
[guided]
We want to show that the empirical leave-one averages behave like the conditional expectation defining $h_1$. Recall the objects being compared. The conditional projection function $g:E\to\mathbb R$ is defined by $g(x)=\mathbb E[h(x,X_2,\dots,X_m)]$ for $x\in E$, with $g(x)=h(x)$ when $m=1$. For $i\in\{1,\dots,n\}$, the leave-one average is
\begin{align*}
V_{n,i}:=\binom{n-1}{m-1}^{-1}\sum_{A\in\mathcal A_{n,i}} h\bigl(X_i,(X_j)_{j\in A}\bigr),
\end{align*}
where $\mathcal A_{n,i}$ is the family of subsets $A\subset\{1,\dots,n\}\setminus\{i\}$ with $|A|=m-1$. The projection error is $R_{n,i}:=V_{n,i}-g(X_i)$. The natural quantity to control is the average squared error
\begin{align*}
\frac{1}{n}\sum_{i=1}^n R_{n,i}^2.
\end{align*}
If $m=1$, then $V_{n,i}=h(X_i)$ and $g(X_i)=h(X_i)$ for every $i$, so $R_{n,i}=0$ for every $i$. Thus the desired convergence is immediate in that case. We now assume $m\geq 2$.
Because the observations are i.i.d. and the construction is symmetric in the index $i$, exchangeability gives
\begin{align*}
\mathbb E\left[\frac{1}{n}\sum_{i=1}^n R_{n,i}^2\right]=\mathbb E[R_{n,1}^2].
\end{align*}
So it is enough to prove that the error for the first observation has vanishing second moment.
Fix $X_1$ and average over the remaining observations. For every subset $A \subset \{2,\dots,n\}$ with $|A|=m-1$, define $Y_A:=h(X_1,(X_j)_{j\in A})-\mathbb E[h(X_1,X_2,\dots,X_m)\mid X_1]$. This is the centred contribution of the tuple using $X_1$ and the observations indexed by $A$. The vector $(X_j)_{j\in A}$ has the same distribution as $(X_2,\dots,X_m)$ and is independent of $X_1$, so
\begin{align*}
\mathbb E[h(X_1,(X_j)_{j\in A})\mid X_1]=g(X_1),
\end{align*}
and therefore $\mathbb E[Y_A\mid X_1]=0$. Moreover, using $\mathcal A_{n,1}$ for the subsets of $\{2,\dots,n\}$ with cardinality $m-1$,
\begin{align*}
R_{n,1}=\binom{n-1}{m-1}^{-1}\sum_{A\in\mathcal A_{n,1}}Y_A.
\end{align*}
The important point is that most pairs of summands are conditionally uncorrelated. If $A\cap B=\varnothing$, then the random vectors $(X_j)_{j\in A}$ and $(X_j)_{j\in B}$ are conditionally independent given $X_1$. Since both $Y_A$ and $Y_B$ have conditional mean $0$, conditional independence gives
\begin{align*}
\mathbb E[Y_A Y_B\mid X_1]
=
\mathbb E[Y_A\mid X_1]\mathbb E[Y_B\mid X_1]
=
0.
\end{align*}
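Thus, after expanding the square in $R_{n,1}^2$ and applying the tower property, $\mathbb E[R_{n,1}^2]=\binom{n-1}{m-1}^{-2}\sum_{A,B\in\mathcal A_{n,1}}\mathbb E[Y_AY_B]$, and only overlapping pairs of subsets can contribute.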
For overlapping pairs, we use a uniform bound on the cross moments. The fourth-moment assumption gives $h(X_1,\dots,X_m)\in L^2(\Omega,\mathcal F,\mathbb P)$. [Jensen's inequality](/theorems/1977) for conditional expectation gives
\begin{align*}
\mathbb E[g(X_1)^2]\leq \mathbb E[h(X_1,\dots,X_m)^2]<\infty.
\end{align*}
Since $h(X_1,\dots,X_m)$ and $g(X_1)$ both belong to $L^2(\Omega,\mathcal F,\mathbb P)$, we may define
\begin{align*}
C_h:=\mathbb E\left[\left(h(X_1,\dots,X_m)-g(X_1)\right)^2\right]<\infty.
\end{align*}
For any two subsets $A$ and $B$ of $\{2,\dots,n\}$ with $|A|=|B|=m-1$, the random variables $Y_A$ and $Y_B$ have the same second moment $C_h$, because $(X_1,(X_j)_{j\in A})$ and $(X_1,(X_j)_{j\in B})$ both have the same law as $(X_1,\dots,X_m)$. The [Cauchy-Schwarz Inequality](/theorems/432) in $L^2(\Omega,\mathcal F,\mathbb P)$ gives
\begin{align*}
|\mathbb E[Y_A Y_B]|\leq \mathbb E[|Y_A Y_B|]\leq \left(\mathbb E[Y_A^2]\right)^{1/2}\left(\mathbb E[Y_B^2]\right)^{1/2}=C_h.
\end{align*}
This is the bound used for every overlapping pair $A,B$.
It remains to count how many overlapping pairs there are. There are $\binom{n-1}{m-1}$ choices for $A$. Once $A$ is fixed, every subset $B$ with $|B|=m-1$ and $A\cap B\neq\varnothing$ arises by first choosing an element $a\in A\cap B$, for which there are at most $m-1$ options, and then choosing the remaining $m-2$ elements of $B$ from the $n-2$ indices in $\{2,\dots,n\}\setminus\{a\}$. This gives the upper bound
\begin{align*}
(m-1)\binom{n-2}{m-2}.
\end{align*}
The count may overcount sets $B$ that share more than one element with $A$, but an upper bound is all we need. Therefore
\begin{align*}
\mathbb E[R_{n,1}^2]\leq \binom{n-1}{m-1}^{-2}\binom{n-1}{m-1}(m-1)\binom{n-2}{m-2}C_h.
\end{align*}
Cancelling one copy of $\binom{n-1}{m-1}$ gives
\begin{align*}
\mathbb E[R_{n,1}^2]\leq C_h(m-1)\frac{\binom{n-2}{m-2}}{\binom{n-1}{m-1}}.
\end{align*}
The ratio of binomial coefficients is
\begin{align*}
\frac{\binom{n-2}{m-2}}{\binom{n-1}{m-1}}=\frac{m-1}{n-1}.
\end{align*}
Consequently
\begin{align*}
\mathbb E[R_{n,1}^2]\leq C_h(m-1)\frac{m-1}{n-1}\longrightarrow 0.
\end{align*}
Since the expectation of the nonnegative random variable
\begin{align*}
\frac{1}{n}\sum_{i=1}^n R_{n,i}^2
\end{align*}
tends to $0$, the [Markov Inequality](/theorems/514) implies
\begin{align*}
\frac{1}{n}\sum_{i=1}^n R_{n,i}^2 \xrightarrow{\mathbb P}0.
\end{align*}
[/guided]
[/step]
[step:Transfer empirical second moments from estimated projections to true projections]
For $i\in\{1,\dots,n\}$, define $a_{n,i}:=\widehat h_{1,i}$, $b_{n,i}:=h_1(X_i)$, and $d_{n,i}:=a_{n,i}-b_{n,i}$.
Since $d_{n,i}=R_{n,i}-(U_n-\theta)$ by the first step and $(x-y)^2\leq 2x^2+2y^2$ for real $x,y$, the preceding step and $U_n\xrightarrow{\mathbb P}\theta$ give
\begin{align*}
\frac{1}{n}\sum_{i=1}^n d_{n,i}^2
\leq
2\frac{1}{n}\sum_{i=1}^n R_{n,i}^2
+
2(U_n-\theta)^2
\xrightarrow{\mathbb P}0.
\end{align*}
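Since $a_{n,i}^2-b_{n,i}^2=d_{n,i}^2+2d_{n,i}b_{n,i}$, the [Cauchy-Schwarz Inequality](/theorems/432) for finite sums, applied to $\frac{1}{n}\sum_{i=1}^n d_{n,i}b_{n,i}$, gives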
\begin{align*}
\left|\frac{1}{n}\sum_{i=1}^n a_{n,i}^2-\frac{1}{n}\sum_{i=1}^n b_{n,i}^2\right|\leq \frac{1}{n}\sum_{i=1}^n d_{n,i}^2+2\left(\frac{1}{n}\sum_{i=1}^n d_{n,i}^2\right)^{1/2}\left(\frac{1}{n}\sum_{i=1}^n b_{n,i}^2\right)^{1/2}.
\end{align*}
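The finite-sum estimate above is deterministic, so it can be sanity-checked on arbitrary vectors. The sketch below (helper names are illustrative) evaluates both sides for random perturbations $a=b+d$ and also for a hand-picked pair where the bound holds with equality.

```python
import random
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def second_moment_gap(a, b):
    """Return (lhs, rhs) of |mean(a^2) - mean(b^2)| <= mean(d^2) + 2 sqrt(mean(d^2) mean(b^2))."""
    d = [ai - bi for ai, bi in zip(a, b)]
    lhs = abs(mean([ai * ai for ai in a]) - mean([bi * bi for bi in b]))
    md2 = mean([di * di for di in d])
    rhs = md2 + 2 * sqrt(md2) * sqrt(mean([bi * bi for bi in b]))
    return lhs, rhs


rng = random.Random(1)
for _ in range(1000):
    n = rng.randint(1, 50)
    b = [rng.gauss(0, 1) for _ in range(n)]
    a = [bi + rng.gauss(0, 0.1) for bi in b]  # a small perturbation of b
    lhs, rhs = second_moment_gap(a, b)
    assert lhs <= rhs + 1e-12

print(second_moment_gap([2.0, 0.0], [1.0, 0.0]))  # equality case: d is a positive multiple of b
```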
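Since $g(X_1)\in L^2(\Omega,\mathcal F,\mathbb P)$ by the preceding step, also $h_1(X_1)=g(X_1)-\theta\in L^2(\Omega,\mathcal F,\mathbb P)$; in particular $h_1(X_1)^2\in L^1(\Omega,\mathcal F,\mathbb P)$. The [Weak Law of Large Numbers](/theorems/1851) therefore gives
\begin{align*}
\frac{1}{n}\sum_{i=1}^n b_{n,i}^2
\xrightarrow{\mathbb P}
\mathbb E[h_1(X_1)^2]
=
\zeta_1,
\end{align*}
where the last equality holds because $h_1(X_1)$ is centred (verified below via the tower property), so that $\mathbb E[h_1(X_1)^2]=\operatorname{Var}(h_1(X_1))=\zeta_1$. Hence the right-hand side of the preceding inequality tends to $0$ in probability, which gives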
\begin{align*}
\frac{1}{n}\sum_{i=1}^n \widehat h_{1,i}^2
-
\frac{1}{n}\sum_{i=1}^n h_1(X_i)^2
\xrightarrow{\mathbb P}0.
\end{align*}
The [Cauchy-Schwarz Inequality](/theorems/432) for finite sums, applied to $d_{n,i}$ and the constant sequence $1$, gives
\begin{align*}
\left|
\frac{1}{n}\sum_{i=1}^n \widehat h_{1,i}
-
\frac{1}{n}\sum_{i=1}^n h_1(X_i)
\right|
\leq
\left(\frac{1}{n}\sum_{i=1}^n d_{n,i}^2\right)^{1/2}
\xrightarrow{\mathbb P}0.
\end{align*}
By the [tower property of conditional expectation](/theorems/1150),
\begin{align*}
\mathbb E[h_1(X_1)]
=
\mathbb E[g(X_1)]-\theta
=
\mathbb E[h(X_1,\dots,X_m)]-\theta
=0.
\end{align*}
Since $\mathbb E[h_1(X_1)]=0$ and $h_1(X_1)\in L^1(\Omega,\mathcal F,\mathbb P)$, the [Weak Law of Large Numbers](/theorems/1851) also gives
\begin{align*}
\frac{1}{n}\sum_{i=1}^n h_1(X_i)\xrightarrow{\mathbb P}0.
\end{align*}
Therefore
\begin{align*}
\frac{1}{n}\sum_{i=1}^n \widehat h_{1,i}\xrightarrow{\mathbb P}0.
\end{align*}
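In fact the average of the estimated projections vanishes identically, not only in the limit. Each $B\in\mathcal A_n$ contributes to $V_{n,i}$ exactly for the $m$ indices $i\in B$, so by symmetry of $h$, $\sum_{i=1}^n V_{n,i}=m\binom{n-1}{m-1}^{-1}\sum_{B\in\mathcal A_n}h((X_j)_{j\in B})$; since $n\binom{n-1}{m-1}=m\binom{n}{m}$, this gives $\frac{1}{n}\sum_{i=1}^n V_{n,i}=U_n$, that is, $\frac{1}{n}\sum_{i=1}^n\widehat h_{1,i}=0$. The convergence argument above does not need this, but it is easy to confirm numerically; the kernel below is a hypothetical symmetric kernel of order $m=3$.

```python
import random
from itertools import combinations
from math import comb


def kernel(x, y, z):
    # Hypothetical symmetric kernel of order m = 3 (illustration only).
    return x * y * z + max(x, y, z)


def leave_one_projection_mean(xs, h, m):
    """(1/n) * sum_i (V_{n,i} - U_n), computed by brute force."""
    n = len(xs)
    u_n = sum(h(*(xs[j] for j in idx)) for idx in combinations(range(n), m)) / comb(n, m)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        s = sum(h(xs[i], *(xs[j] for j in idx)) for idx in combinations(others, m - 1))
        total += s / comb(n - 1, m - 1) - u_n
    return total / n


rng = random.Random(2)
xs = [rng.uniform(-1, 1) for _ in range(9)]
print(leave_one_projection_mean(xs, kernel, 3))  # zero up to floating-point rounding
```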
[/step]
[step:Identify the limit of the centred empirical variance]
The quantity $\widehat\zeta_1$ is the centred empirical variance of the estimated first-projection values $\widehat h_{1,1},\dots,\widehat h_{1,n}$. By the algebraic identity for the centred empirical second moment,
\begin{align*}
\widehat\zeta_1
=
\frac{1}{n}\sum_{i=1}^n \widehat h_{1,i}^2
-
\left(\frac{1}{n}\sum_{i=1}^n \widehat h_{1,i}\right)^2.
\end{align*}
The previous step gives
\begin{align*}
\frac{1}{n}\sum_{i=1}^n \widehat h_{1,i}^2
\xrightarrow{\mathbb P}
\zeta_1
\qquad\text{and}\qquad
\frac{1}{n}\sum_{i=1}^n \widehat h_{1,i}
\xrightarrow{\mathbb P}0.
\end{align*}
Since the square map $x\mapsto x^2$ is continuous on $\mathbb R$, the [Continuous Mapping Theorem](/theorems/1847) gives
\begin{align*}
\left(\frac{1}{n}\sum_{i=1}^n \widehat h_{1,i}\right)^2
\xrightarrow{\mathbb P}0.
\end{align*}
Hence
\begin{align*}
\widehat\zeta_1\xrightarrow{\mathbb P}\zeta_1.
\end{align*}
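Assuming $\widehat\zeta_1$ is normalised by $1/n$, as the identity at the start of this step indicates, it coincides with the population variance of the list $\widehat h_{1,1},\dots,\widehat h_{1,n}$. The sketch below checks the identity on arbitrary numbers against Python's `statistics.pvariance`; the helper name is illustrative.

```python
from math import fsum, isclose
from statistics import pvariance


def centred_empirical_variance(values):
    """(1/n) * sum of squares minus the squared mean."""
    n = len(values)
    mean = fsum(values) / n
    return fsum(v * v for v in values) / n - mean * mean


vals = [0.4, -1.2, 2.5, 0.1, -0.8]
print(centred_empirical_variance(vals), pvariance(vals))
```

In floating point the one-pass form can lose accuracy through cancellation when the mean is large relative to the spread; the identity is used in the proof only as exact algebra.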
[/step]
[step:Apply studentisation through Slutsky's theorem]
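By assumption $\zeta_1>0$, and the consistency just proved together with the [Continuous Mapping Theorem](/theorems/1847) applied to $x\mapsto m\sqrt{x}$ on $[0,\infty)$ implies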
\begin{align*}
m\sqrt{\widehat\zeta_1}\xrightarrow{\mathbb P}m\sqrt{\zeta_1}.
\end{align*}
Since $\widehat\zeta_1\geq 0$ by definition and $\widehat\zeta_1\xrightarrow{\mathbb P}\zeta_1>0$, we also have $\mathbb P(\widehat\zeta_1>0)\to 1$, so the reciprocal factor $1/\sqrt{\widehat\zeta_1}$ is well-defined on events whose probability tends to $1$. Define $q:[0,\infty)\to\mathbb R$ by
\begin{align*}
q(x):=\frac{m\sqrt{\zeta_1}}{m\sqrt{x}}\quad(x>0),\qquad q(0):=0.
\end{align*}
The map $q$ is continuous at $\zeta_1>0$, so the [Continuous Mapping Theorem](/theorems/1847) applied to $q$ gives
\begin{align*}
\frac{m\sqrt{\zeta_1}}{m\sqrt{\widehat\zeta_1}}\xrightarrow{\mathbb P}1.
\end{align*}
The [Central Limit Theorem for Nondegenerate U-Statistics](/theorems/4939) applies in the following standard form: if $(X_i)_{i\geq1}$ is i.i.d., $h$ is symmetric and measurable, $h(X_1,\dots,X_m)\in L^2(\Omega,\mathcal F,\mathbb P)$, and the first Hoeffding projection satisfies $\operatorname{Var}(h_1(X_1))=\zeta_1>0$, then $\sqrt n(U_n-\theta)/(m\sqrt{\zeta_1})$ converges in distribution to $\mathcal N(0,1)$. Here the sample is i.i.d., the kernel is symmetric and measurable, square-integrability follows from the fourth-moment assumption, and $\zeta_1>0$ by assumption. Therefore
\begin{align*}
\frac{\sqrt n(U_n-\theta)}{m\sqrt{\zeta_1}}\xrightarrow{d}\mathcal N(0,1).
\end{align*}
[Slutsky's Lemma](/theorems/1850), applied to the preceding convergence in distribution and to the convergence in probability of the denominator ratio to $1$, yields
\begin{align*}
\frac{\sqrt n(U_n-\theta)}{m\sqrt{\widehat\zeta_1}}=\frac{\sqrt n(U_n-\theta)}{m\sqrt{\zeta_1}}\cdot\frac{m\sqrt{\zeta_1}}{m\sqrt{\widehat\zeta_1}}\xrightarrow{d}\mathcal N(0,1).
\end{align*}
This proves both the plug-in variance consistency and the studentised [central limit theorem](/theorems/1848).
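The studentised statistic can be assembled directly from the leave-one projections. The sketch below is restricted to kernels of order $m=2$ for brevity; the variance kernel $h(x,y)=(x-y)^2/2$, the standard normal data and the sample sizes are assumptions chosen for illustration. Under these assumptions $\theta=1$, the fourth-moment condition holds and $\zeta_1=(\mathbb E[X^4]-1)/4=1/2>0$, so the empirical coverage of $|T_n|\leq 1.96$ should approach $0.95$ as $n$ grows, though at $n=40$ it may fall somewhat short.

```python
import random
from itertools import combinations
from math import comb, sqrt


def studentised_u(xs, h, theta):
    """T_n = sqrt(n) (U_n - theta) / (m sqrt(zeta_hat)) for an order-2 symmetric kernel h."""
    n, m = len(xs), 2
    u_n = sum(h(xs[i], xs[j]) for i, j in combinations(range(n), 2)) / comb(n, 2)
    # hat h_{1,i} = V_{n,i} - U_n; for m = 2, V_{n,i} averages h(X_i, X_j) over j != i.
    proj = [sum(h(xs[i], xs[j]) for j in range(n) if j != i) / (n - 1) - u_n
            for i in range(n)]
    mean = sum(proj) / n
    zeta_hat = sum(p * p for p in proj) / n - mean * mean  # centred, normalised by 1/n
    if zeta_hat <= 0:
        raise ValueError("zeta_hat is not positive; studentisation undefined")
    return sqrt(n) * (u_n - theta) / (m * sqrt(zeta_hat))


def variance_kernel(x, y):
    return (x - y) ** 2 / 2  # assumed example: E[h(X_1, X_2)] = Var(X_1)


rng = random.Random(3)
reps, n = 300, 40
stats = [studentised_u([rng.gauss(0, 1) for _ in range(n)], variance_kernel, 1.0)
         for _ in range(reps)]
coverage = sum(abs(t) <= 1.96 for t in stats) / reps
print(f"empirical coverage of |T| <= 1.96: {coverage:.3f}")
```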
[/step]