[proofplan]
Choose a common dominating measure $\mu_i := P_i + Q_i$ for each pair $(P_i,Q_i)$, and write $p_i$ and $q_i$ for the corresponding Radon-Nikodym densities. The product measure $\mu := \bigotimes_i \mu_i$ dominates the product measures $P$ and $Q$, whose densities with respect to $\mu$ are the products $\prod_i p_i$ and $\prod_i q_i$. Therefore the Hellinger affinity of the product is the integral of a product of non-negative functions, each depending on a single coordinate, and repeated application of Tonelli's theorem factorizes this integral into the product of the one-coordinate affinities.
[/proofplan]
[step:Choose coordinatewise dominating measures and densities]
For each $i \in \{1,\dots,n\}$, define the finite measure
\begin{align*}
\mu_i := P_i + Q_i
\end{align*}
on $(\mathcal{X}_i,\mathcal{A}_i)$. Then $P_i \ll \mu_i$ and $Q_i \ll \mu_i$. Let $p_i: \mathcal{X}_i \to [0,\infty]$ be the Radon-Nikodym density given by
\begin{align*}
p_i(x_i) = \frac{dP_i}{d\mu_i}(x_i).
\end{align*}
Let $q_i: \mathcal{X}_i \to [0,\infty]$ be the Radon-Nikodym density given by
\begin{align*}
q_i(x_i) = \frac{dQ_i}{d\mu_i}(x_i).
\end{align*}
By definition of Hellinger affinity,
\begin{align*}
\rho_H(P_i,Q_i)
=
\int_{\mathcal{X}_i} \sqrt{p_i(x_i)q_i(x_i)}\,d\mu_i(x_i).
\end{align*}
Here $p_i$ and $q_i$ are $\mathcal{A}_i$-measurable, so the function $r_i: \mathcal{X}_i \to [0,\infty]$ defined by
\begin{align*}
r_i(x_i) = \sqrt{p_i(x_i)q_i(x_i)}
\end{align*}
is also $\mathcal{A}_i$-measurable.
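As an illustration (not part of the proof), consider a hypothetical two-point coordinate space $\mathcal{X}_i=\{0,1\}$ with $P_i=\mathrm{Bernoulli}(a)$ and $Q_i=\mathrm{Bernoulli}(b)$. The parameters $a,b\in(0,1)$ are assumptions introduced only for this example. Then $\mu_i(\{1\})=a+b$ and $\mu_i(\{0\})=2-a-b$, so the densities are
\begin{align*}
p_i(1)=\frac{a}{a+b},
\qquad
q_i(1)=\frac{b}{a+b},
\qquad
p_i(0)=\frac{1-a}{2-a-b},
\qquad
q_i(0)=\frac{1-b}{2-a-b}.
\end{align*}
Under these assumptions the affinity evaluates to
\begin{align*}
\rho_H(P_i,Q_i)
=
r_i(1)\,\mu_i(\{1\}) + r_i(0)\,\mu_i(\{0\})
=
\frac{\sqrt{ab}}{a+b}\,(a+b) + \frac{\sqrt{(1-a)(1-b)}}{2-a-b}\,(2-a-b)
=
\sqrt{ab}+\sqrt{(1-a)(1-b)}.
\end{align*}
The masses of $\mu_i$ cancel against the normalizing factors of the densities, so the result depends only on $P_i$ and $Q_i$.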
[/step]
[step:Identify the product densities with respect to the product dominating measure]
Define the product measure
\begin{align*}
\mu := \bigotimes_{i=1}^n \mu_i
\end{align*}
on $(\mathcal{X},\mathcal{A})$. For $x=(x_1,\dots,x_n)\in \mathcal{X}$, define $p: \mathcal{X} \to [0,\infty]$ by
\begin{align*}
p(x) = \prod_{i=1}^n p_i(x_i).
\end{align*}
Define $q: \mathcal{X} \to [0,\infty]$ by
\begin{align*}
q(x) = \prod_{i=1}^n q_i(x_i).
\end{align*}
These functions are $\mathcal{A}$-measurable because each coordinate projection $\pi_i:\mathcal{X}\to\mathcal{X}_i$ is measurable and finite products of non-negative [measurable functions](/page/Measurable%20Functions) are measurable.
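We claim that $p=dP/d\mu$ and $q=dQ/d\mu$. The measurable rectangles form a $\pi$-system that generates $\mathcal{A}$ and contains $\mathcal{X}$, so two finite measures that agree on all measurable rectangles agree on $\mathcal{A}$. It therefore suffices to verify that $A\mapsto\int_A p\,d\mu$ agrees with $P$ on measurable rectangles, and similarly for $q$ and $Q$. If $A_i \in \mathcal{A}_i$ for $1\le i\le n$, then by the definitions of $p$ and $\mu$,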
\begin{align*}
\int_{\prod_{i=1}^n A_i} p(x)\,d\mu(x)
=
\int_{\prod_{i=1}^n A_i}
\prod_{i=1}^n p_i(x_i)\,
d\left(\bigotimes_{i=1}^n \mu_i\right)(x).
\end{align*}
Since the integrand is non-negative and separates by coordinate, Tonelli's theorem shows that this equals
\begin{align*}
\prod_{i=1}^n
\int_{A_i} p_i(x_i)\,d\mu_i(x_i).
\end{align*}
Using $p_i=dP_i/d\mu_i$ for each coordinate, we get
\begin{align*}
\prod_{i=1}^n
\int_{A_i} p_i(x_i)\,d\mu_i(x_i)
=
\prod_{i=1}^n P_i(A_i)
=
P\left(\prod_{i=1}^n A_i\right).
\end{align*}
The same argument with $q_i$ in place of $p_i$ gives
\begin{align*}
\int_{\prod_{i=1}^n A_i} q(x)\,d\mu(x)
=
Q\left(\prod_{i=1}^n A_i\right).
\end{align*}
Since these identities hold on every measurable rectangle, they extend to all of $\mathcal{A}$. Hence $P\ll \mu$, $Q\ll \mu$, and $p$ and $q$ are Radon-Nikodym densities of $P$ and $Q$ with respect to $\mu$.
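To make the rectangle computation concrete, here is a small sketch for $n=2$ under hypothetical assumptions: take $\mathcal{X}_1=\mathcal{X}_2=\{0,1\}$, $P_i=\mathrm{Bernoulli}(a_i)$ and $Q_i=\mathrm{Bernoulli}(b_i)$ with $a_i,b_i\in(0,1)$, so that $\mu_i(\{1\})=a_i+b_i$ and $\mu_i(\{0\})=2-a_i-b_i$. For the rectangle $\{1\}\times\{0\}$,
\begin{align*}
\int_{\{1\}\times\{0\}} p(x)\,d\mu(x)
=
p_1(1)\,p_2(0)\,\mu_1(\{1\})\,\mu_2(\{0\})
=
\frac{a_1}{a_1+b_1}\cdot\frac{1-a_2}{2-a_2-b_2}\cdot(a_1+b_1)(2-a_2-b_2)
=
a_1(1-a_2),
\end{align*}
which is $P_1(\{1\})\,P_2(\{0\})=P(\{1\}\times\{0\})$, as the general argument predicts.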
[/step]
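[step:Rewrite the product Hellinger affinity as an integral of separated factors]
The Hellinger affinity may be computed with respect to any common dominating measure, so we may use $\mu$ together with the densities from the previous step. The Hellinger affinity of $P$ and $Q$ is then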
\begin{align*}
\rho_H(P,Q)
=
\int_{\mathcal{X}} \sqrt{p(x)q(x)}\,d\mu(x).
\end{align*}
Substituting the product formulas for $p$ and $q$ gives
\begin{align*}
\rho_H(P,Q)
=
\int_{\mathcal{X}}
\sqrt{
\left(\prod_{i=1}^n p_i(x_i)\right)
\left(\prod_{i=1}^n q_i(x_i)\right)
}\,d\mu(x).
\end{align*}
Since all factors are non-negative,
\begin{align*}
\sqrt{
\left(\prod_{i=1}^n p_i(x_i)\right)
\left(\prod_{i=1}^n q_i(x_i)\right)
}
=
\prod_{i=1}^n \sqrt{p_i(x_i)q_i(x_i)}
=
\prod_{i=1}^n r_i(x_i).
\end{align*}
Therefore
\begin{align*}
\rho_H(P,Q)
=
\int_{\mathcal{X}} \prod_{i=1}^n r_i(x_i)\,d\mu(x).
\end{align*}
[/step]
[guided]
The point of introducing the densities $p_i$ and $q_i$ is that the square root in the Hellinger affinity interacts well with products. From the previous step, the product measures $P$ and $Q$ have densities
\begin{align*}
p(x)=\prod_{i=1}^n p_i(x_i),
\qquad
q(x)=\prod_{i=1}^n q_i(x_i)
\end{align*}
with respect to $\mu=\bigotimes_{i=1}^n\mu_i$, where $x=(x_1,\dots,x_n)\in\mathcal{X}$. Thus the definition of Hellinger affinity gives
\begin{align*}
\rho_H(P,Q)
=
\int_{\mathcal{X}} \sqrt{p(x)q(x)}\,d\mu(x).
\end{align*}
Substituting the product formulas for $p$ and $q$ yields
\begin{align*}
\rho_H(P,Q)
=
\int_{\mathcal{X}}
\sqrt{
\left(\prod_{i=1}^n p_i(x_i)\right)
\left(\prod_{i=1}^n q_i(x_i)\right)
}\,d\mu(x).
\end{align*}
Why can the square root be distributed across the product? Each $p_i$ and $q_i$ is non-negative because it is a Radon-Nikodym density of a positive measure. Hence every factor $p_i(x_i)q_i(x_i)$ is non-negative, and for finite products of non-negative extended [real numbers](/page/Real%20Numbers) the identity
\begin{align*}
\sqrt{
\left(\prod_{i=1}^n p_i(x_i)\right)
\left(\prod_{i=1}^n q_i(x_i)\right)
}
=
\prod_{i=1}^n \sqrt{p_i(x_i)q_i(x_i)}
\end{align*}
holds pointwise. With the notation $r_i: \mathcal{X}_i \to [0,\infty]$ defined by
\begin{align*}
r_i(x_i) = \sqrt{p_i(x_i)q_i(x_i)},
\end{align*}
we therefore obtain the separated form
\begin{align*}
\rho_H(P,Q)
=
\int_{\mathcal{X}} \prod_{i=1}^n r_i(x_i)\,d\mu(x).
\end{align*}
This separated form is the exact place where tensorization enters: the integrand is a product of functions, each depending on only one coordinate.
[/guided]
[step:Factor the separated integral into coordinate integrals]
The function $R: \mathcal{X} \to [0,\infty]$ defined by
\begin{align*}
R(x) = \prod_{i=1}^n r_i(x_i)
\end{align*}
is non-negative and $\mathcal{A}$-measurable. Each $\mu_i$ is finite, hence $\sigma$-finite, so Tonelli's theorem for non-negative functions on finite product measure spaces applies (citing a result not yet in the wiki: Tonelli's theorem), and we may integrate one coordinate at a time:
\begin{align*}
\int_{\mathcal{X}} \prod_{i=1}^n r_i(x_i)\,d\mu(x)
=
\int_{\mathcal{X}_1}\cdots\int_{\mathcal{X}_n}
\prod_{i=1}^n r_i(x_i)\,
d\mu_n(x_n)\cdots d\mu_1(x_1).
\end{align*}
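Because each factor $r_i(x_i)$ depends only on the coordinate $x_i$, it is constant with respect to every other coordinate and can be pulled out of the corresponding integrals. The iterated integral therefore factors as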
\begin{align*}
\int_{\mathcal{X}_1}\cdots\int_{\mathcal{X}_n}
\prod_{i=1}^n r_i(x_i)\,
d\mu_n(x_n)\cdots d\mu_1(x_1)
=
\prod_{i=1}^n
\int_{\mathcal{X}_i} r_i(x_i)\,d\mu_i(x_i).
\end{align*}
For each $i$, the definition of $r_i$ gives
\begin{align*}
\int_{\mathcal{X}_i} r_i(x_i)\,d\mu_i(x_i)
=
\int_{\mathcal{X}_i} \sqrt{p_i(x_i)q_i(x_i)}\,d\mu_i(x_i).
\end{align*}
By the coordinatewise definition of Hellinger affinity, this equals
\begin{align*}
\rho_H(P_i,Q_i).
\end{align*}
Combining these identities gives
\begin{align*}
\rho_H(P,Q)
=
\prod_{i=1}^n \rho_H(P_i,Q_i),
\end{align*}
which is the desired tensorization formula.
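As a sanity check (an illustration under assumptions that are not part of the statement), take $\mathcal{X}_i=\{0,1\}$, $P_i=\mathrm{Bernoulli}(a_i)$ and $Q_i=\mathrm{Bernoulli}(b_i)$ with hypothetical parameters $a_i,b_i\in(0,1)$. On a finite space, $\sqrt{p(x)q(x)}\,\mu(\{x\})=\sqrt{P(\{x\})\,Q(\{x\})}$ for every point $x$, so for $n=2$ the left-hand side can be expanded directly:
\begin{align*}
\rho_H(P,Q)
&=
\sqrt{a_1a_2b_1b_2}
+\sqrt{a_1(1-a_2)\,b_1(1-b_2)}
+\sqrt{(1-a_1)a_2\,(1-b_1)b_2}
+\sqrt{(1-a_1)(1-a_2)(1-b_1)(1-b_2)} \\
&=
\left(\sqrt{a_1b_1}+\sqrt{(1-a_1)(1-b_1)}\right)
\left(\sqrt{a_2b_2}+\sqrt{(1-a_2)(1-b_2)}\right)
=
\rho_H(P_1,Q_1)\,\rho_H(P_2,Q_2).
\end{align*}
In the special case of $n$ identical coordinates with $a_i=a$ and $b_i=b$, the formula reads
\begin{align*}
\rho_H(P,Q)
=
\left(\sqrt{ab}+\sqrt{(1-a)(1-b)}\right)^n,
\end{align*}
so the affinity decays geometrically in $n$ whenever $a\neq b$.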
[/step]