[guided]Before computing any expectations we must verify that the kernel evaluations being averaged are integrable; without this, neither linearity of expectation nor Fubini's theorem is licensed. The argument has two ingredients: the universal Cauchy--Schwarz bound satisfied by any positive-definite kernel, and the diagonal integrability hypotheses $\mathbb{E}_\mu[k_\phi(x, x)] < \infty$ and $\mathbb{E}_\nu[k_\phi(y, y)] < \infty$, which are part of the theorem's standing assumptions.
**(a) The reproducing-kernel Cauchy--Schwarz bound.** Since $k_\phi$ is a positive-definite kernel arising from the inner product on the Hilbert space $T_\phi((V))$, the [Cauchy--Schwarz inequality for reproducing kernels](/theorems/???) gives, for all $u, v \in \mathcal{K}$,
\begin{align*}
|k_\phi(u, v)| = |\langle S(u), S(v)\rangle_\phi| \le \|S(u)\|_\phi \, \|S(v)\|_\phi = \sqrt{k_\phi(u, u)} \, \sqrt{k_\phi(v, v)}.
\end{align*}
The first equality expresses $k_\phi$ through the inner product $\langle \cdot, \cdot \rangle_\phi$; the last uses $\|S(u)\|_\phi^2 = \langle S(u), S(u)\rangle_\phi = k_\phi(u, u)$ (and likewise for $v$). The bound is pointwise on $\mathcal{K} \times \mathcal{K}$ and requires no integrability.
**(b) Off-diagonal integrability under $\mu \otimes \mu$.** For $i \neq j$ the joint law of $(x^i, x^j)$ is $\mu \otimes \mu$ by the i.i.d.\ hypothesis on the $\mu$-sample. Both $|k_\phi(u, v)|$ and $(u, v) \mapsto \sqrt{k_\phi(u, u)\, k_\phi(v, v)}$ are non-negative measurable functions, so their integrals against $\mu \otimes \mu$ are well-defined in $[0, \infty]$, and integrating the pointwise bound from (a) gives, by monotonicity of the integral,
\begin{align*}
\mathbb{E}\bigl[|k_\phi(x^i, x^j)|\bigr] = \int_{\mathcal{K} \times \mathcal{K}} |k_\phi(u, v)|\, d(\mu \otimes \mu)(u, v) \le \int_{\mathcal{K} \times \mathcal{K}} \sqrt{k_\phi(u, u) k_\phi(v, v)}\, d(\mu \otimes \mu)(u, v).
\end{align*}
Apply the AM--GM inequality $\sqrt{ab} \le \tfrac{1}{2}(a + b)$ with $a = k_\phi(u, u) \ge 0$ and $b = k_\phi(v, v) \ge 0$, and split the resulting integral by Tonelli:
\begin{align*}
\int_{\mathcal{K} \times \mathcal{K}} \sqrt{k_\phi(u, u) k_\phi(v, v)}\, d(\mu \otimes \mu)(u, v) &\le \frac{1}{2}\int_{\mathcal{K} \times \mathcal{K}} \bigl(k_\phi(u, u) + k_\phi(v, v)\bigr)\, d(\mu \otimes \mu)(u, v) \\
&= \frac{1}{2}\Bigl(\int_\mathcal{K} k_\phi(u, u)\, d\mu(u) + \int_\mathcal{K} k_\phi(v, v)\, d\mu(v)\Bigr) = \mathbb{E}_{x \sim \mu}[k_\phi(x, x)] < \infty,
\end{align*}
where the last inequality is the integrability hypothesis $\mathbb{E}_\mu[k_\phi(x, x)] < \infty$. Hence $\mathbb{E}[|k_\phi(x^i, x^j)|] < \infty$ for any $i \neq j$.
**(c) Cross-term and $\nu$-term integrability.** The same Cauchy--Schwarz and AM--GM argument, applied with the joint law $\mu \otimes \nu$ on $(x^i, y^j)$, gives
\begin{align*}
\mathbb{E}\bigl[|k_\phi(x^i, y^j)|\bigr] \le \frac{1}{2}\bigl(\mathbb{E}_{x \sim \mu}[k_\phi(x, x)] + \mathbb{E}_{y \sim \nu}[k_\phi(y, y)]\bigr) < \infty,
\end{align*}
and applied with $\nu \otimes \nu$ on $(y^i, y^j)$ for $i \neq j$ gives
\begin{align*}
\mathbb{E}\bigl[|k_\phi(y^i, y^j)|\bigr] \le \mathbb{E}_{y \sim \nu}[k_\phi(y, y)] < \infty.
\end{align*}
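Written out, the cross-term bound is the computation of (b) with the second marginal replaced by $\nu$; the factor $\tfrac{1}{2}$ survives because the two diagonal integrals are now taken against different measures. A sketch, assuming only the pointwise bound from (a) and that $(x^i, y^j)$ has joint law $\mu \otimes \nu$:
\begin{align*}
\mathbb{E}\bigl[|k_\phi(x^i, y^j)|\bigr] &\le \int_{\mathcal{K} \times \mathcal{K}} \sqrt{k_\phi(u, u)\, k_\phi(v, v)}\, d(\mu \otimes \nu)(u, v) \\
&\le \frac{1}{2}\int_{\mathcal{K} \times \mathcal{K}} \bigl(k_\phi(u, u) + k_\phi(v, v)\bigr)\, d(\mu \otimes \nu)(u, v) \\
&= \frac{1}{2}\Bigl(\int_\mathcal{K} k_\phi(u, u)\, d\mu(u) + \int_\mathcal{K} k_\phi(v, v)\, d\nu(v)\Bigr).
\end{align*}
The last equality uses that $\mu$ and $\nu$ are probability measures, so integrating a function of $u$ alone against $\mu \otimes \nu$ returns its $\mu$-integral (and likewise for $v$ and $\nu$).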
**Why integrability matters here.** Three downstream operations rely on it:
\begin{enumerate}
\item *Fubini*: to compute $\mathbb{E}[k_\phi(x^i, x^j)] = \iint k_\phi(u, v)\, d\mu(u)\, d\mu(v)$ as an iterated integral (Steps 3 and 4) we need integrability on the product space, which is exactly the $L^1$ bound just established.
\item *Linearity of expectation*: interchanging $\mathbb{E}$ with the finite double sum $\sum_{i \neq j}$ is automatic for finite sums of integrable random variables.
\item *Continuous mapping / WLLN*: convergence-in-probability arguments via Chebyshev or U-statistic theorems require finite first (and ideally second) moments. The first moment is controlled above. For the second, squaring the Cauchy--Schwarz bound from (a) gives $\mathbb{E}[|k_\phi(x^i, x^j)|^2] \le \mathbb{E}[k_\phi(x^i, x^i)\, k_\phi(x^j, x^j)]$ for $i \neq j$, which is finite by Tonelli and the same hypothesis. AM--GM is not needed here: the integrand is already a product of a function of the first argument and a function of the second, so its integral over the product measure factorizes into a product of diagonal expectations.
\end{enumerate}
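The second-moment claim in the last item can be written out in the same way. As a sketch, write $\alpha, \beta \in \{\mu, \nu\}$ for the marginals of an independent pair $(u, v) \sim \alpha \otimes \beta$ (the symbols $\alpha, \beta$ are introduced here only for this display). Squaring the pointwise bound from (a) and applying Tonelli to the non-negative product integrand gives
\begin{align*}
\mathbb{E}\bigl[|k_\phi(u, v)|^2\bigr] \le \int_{\mathcal{K} \times \mathcal{K}} k_\phi(u, u)\, k_\phi(v, v)\, d(\alpha \otimes \beta)(u, v) = \Bigl(\int_\mathcal{K} k_\phi(u, u)\, d\alpha(u)\Bigr)\Bigl(\int_\mathcal{K} k_\phi(v, v)\, d\beta(v)\Bigr) < \infty.
\end{align*}
Taking $(\alpha, \beta) = (\mu, \mu)$, $(\mu, \nu)$ and $(\nu, \nu)$ covers the three families of terms, so only the first-moment hypotheses on the diagonal are used.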
The integrability hypotheses $\mathbb{E}_\mu[k_\phi(x, x)] < \infty$ and $\mathbb{E}_\nu[k_\phi(y, y)] < \infty$ are therefore not technical decoration — they are the entry condition that makes every expectation in the rest of the proof well-defined.[/guided]