[step:Prove entropy tensorization on a finite product probability space]
[claim:Entropy tensorizes over finite products]
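Let $\nu=\nu_1\otimes\cdots\otimes\nu_n$ be a finite product of probability measures, where $\nu_i$ is a probability measure on $E_i$ and $E=E_1\times\cdots\times E_n$. Write $E_{-i}=\prod_{j\neq i}E_j$ and $\nu_{-i}=\bigotimes_{j\neq i}\nu_j$. For every integrable $g:E\to[0,\infty)$ with $g\log g\in L^1(\nu)$, with $g_{x_{-i}}:E_i\to[0,\infty)$ denoting the section $x_i\mapsto g(x)$ for fixed $x_{-i}\in E_{-i}$, one has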
\begin{align*}
\operatorname{Ent}_\nu(g)\leq \sum_{i=1}^n \int_{E_{-i}} \operatorname{Ent}_{\nu_i}(g_{x_{-i}})\,d\nu_{-i}(x_{-i}).
\end{align*}
[/claim]
[proof]
It is enough to prove the two-factor inequality and then iterate. Let $(X,\mu)$ and $(Y,\rho)$ be probability spaces, let
\begin{align*}
h:X\times Y\to[0,\infty)
\end{align*}
be integrable, and assume $h\log h\in L^1(\mu\otimes\rho)$. Define
\begin{align*}
a:X\to[0,\infty)
\end{align*}
by
\begin{align*}
x\mapsto \int_Y h(x,y)\,d\rho(y).
\end{align*}
The integrability hypothesis $h\log h\in L^1(\mu\otimes\rho)$ and the lower bound $s\log s\geq -1/e$ for $s\geq 0$ ensure that the positive and negative parts of the integrands occurring below are integrable. Hence the [Fubini theorem](/theorems/513) applies to the entropy decomposition. By the definition of entropy and of the product measure,
\begin{align*}
\operatorname{Ent}_{\mu\otimes\rho}(h)=\int_X \operatorname{Ent}_\rho(h(x,\cdot))\,d\mu(x)+\operatorname{Ent}_\mu(a).
\end{align*}
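As a check of this decomposition, here is a short computation under the standing integrability assumptions and the usual definition $\operatorname{Ent}_\mu(f)=\int f\log f\,d\mu-\left(\int f\,d\mu\right)\log\left(\int f\,d\mu\right)$; the symbol $m$ for the total mass is introduced only for this check. With $m=\int_X a\,d\mu=\int_{X\times Y}h\,d(\mu\otimes\rho)$,
\begin{align*}
\int_X \operatorname{Ent}_\rho(h(x,\cdot))\,d\mu(x)&=\int_X\int_Y h\log h\,d\rho\,d\mu-\int_X a\log a\,d\mu,\\
\operatorname{Ent}_\mu(a)&=\int_X a\log a\,d\mu-m\log m.
\end{align*}
Adding the two lines cancels $\int_X a\log a\,d\mu$ and leaves $\int_{X\times Y}h\log h\,d(\mu\otimes\rho)-m\log m=\operatorname{Ent}_{\mu\otimes\rho}(h)$, which is the decomposition displayed above.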
It remains to bound the second term, $\operatorname{Ent}_\mu(a)$, by the averaged entropy in the $X$ variable, $\int_Y \operatorname{Ent}_\mu(h(\cdot,y))\,d\rho(y)$.
We use the convexity of the relative entropy integrand. Define
\begin{align*}
\Psi:[0,\infty)\times(0,\infty)\to\mathbb{R}
\end{align*}
by
\begin{align*}
(s,r)\mapsto s\log(s/r)-s+r.
\end{align*}
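For fixed $r>0$, take $0\log(0/r)=0$. The function $\Psi$ is non-negative and jointly convex on $[0,\infty)\times(0,\infty)$, since $\Psi(s,r)=r\,\varphi(s/r)$ is the perspective of the non-negative convex function $\varphi(t)=t\log t-t+1$. Moreover, for every measurable $u:X\to[0,\infty)$ with $u\log u\in L^1(\mu)$,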
\begin{align*}
\operatorname{Ent}_\mu(u)=\inf_{r>0}\int_X \Psi(u(x),r)\,d\mu(x).
\end{align*}
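Indeed, when $\int_X u\,d\mu>0$, the infimum in $r$ is attained at
\begin{align*}
r=\int_X u\,d\mu.
\end{align*}
If instead $\int_X u\,d\mu=0$, then $u=0$ $\mu$-almost everywhere, and the infimum, equal to $0=\operatorname{Ent}_\mu(u)$, is approached as $r\downarrow 0$ without being attained.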
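The location of the minimizer can be verified by a one-variable computation, sketched here with the abbreviation $m=\int_X u\,d\mu$ (introduced only for this check) and under the assumption $m>0$. Expanding $\Psi$ and integrating gives
\begin{align*}
\int_X \Psi(u(x),r)\,d\mu(x)=\int_X u\log u\,d\mu-m\log r-m+r.
\end{align*}
The right-hand side is convex in $r$, and its derivative $1-m/r$ vanishes only at $r=m$, where the value is $\int_X u\log u\,d\mu-m\log m=\operatorname{Ent}_\mu(u)$. This confirms the variational formula.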
Apply this formula to $a=\int_Y h(\cdot,y)\,d\rho(y)$. Convexity of $\Psi$ and the fact that $\rho$ is a probability measure give, for every integrable map $r:Y\to(0,\infty)$,
\begin{align*}
\int_X \Psi\left(\int_Y h(x,y)\,d\rho(y),\int_Y r(y)\,d\rho(y)\right)\,d\mu(x)\leq \int_Y\int_X \Psi(h(x,y),r(y))\,d\mu(x)\,d\rho(y).
\end{align*}
By the variational formula, the left-hand side is at least $\operatorname{Ent}_\mu(a)$, because $\int_Y r\,d\rho>0$ is an admissible scalar. The iterated integral on the right is justified by Tonelli's theorem, since $\Psi(h(x,y),r(y))$ is non-negative. To pass from the right-hand side to pointwise entropies, choose for each $m\in\mathbb{N}$ a positive [simple function](/page/Simple%20Function) $r_m:Y\to(0,\infty)$ whose values approximate, within $1/m$ on each level set, the scalar minimizers $\int_X h(x,y)\,d\mu(x)$ in the variational formula for $\operatorname{Ent}_\mu(h(\cdot,y))$. Monotone approximation of measurable non-negative functions and the variational formula then give
\begin{align*}
\operatorname{Ent}_\mu(a)\leq \int_Y \operatorname{Ent}_\mu(h(\cdot,y))\,d\rho(y).
\end{align*}
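One concrete way to realize this approximation, given as a sketch rather than as the construction intended above, avoids simple functions altogether; the function $b$ and the parameter $\varepsilon$ are notation assumed here. Put $b(y)=\int_X h(x,y)\,d\mu(x)$, which is finite for $\rho$-almost every $y$ and $\rho$-integrable by Fubini, and take $r=b+\varepsilon$ with $\varepsilon>0$. Expanding $\Psi$ gives
\begin{align*}
\int_X \Psi(h(x,y),b(y)+\varepsilon)\,d\mu(x)
&=\operatorname{Ent}_\mu(h(\cdot,y))+\varepsilon-b(y)\log\left(1+\frac{\varepsilon}{b(y)}\right)\\
&\leq \operatorname{Ent}_\mu(h(\cdot,y))+\varepsilon,
\end{align*}
where the logarithmic term is read as $0$ when $b(y)=0$. Since $\int_Y (b+\varepsilon)\,d\rho>0$ is an admissible scalar in the variational formula for $\operatorname{Ent}_\mu(a)$, integrating over $Y$ yields $\operatorname{Ent}_\mu(a)\leq \int_Y \operatorname{Ent}_\mu(h(\cdot,y))\,d\rho(y)+\varepsilon$, and letting $\varepsilon\downarrow 0$ recovers the bound on $\operatorname{Ent}_\mu(a)$.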
Substituting this estimate into the entropy decomposition yields
\begin{align*}
\operatorname{Ent}_{\mu\otimes\rho}(h)\leq \int_X \operatorname{Ent}_\rho(h(x,\cdot))\,d\mu(x)+\int_Y \operatorname{Ent}_\mu(h(\cdot,y))\,d\rho(y).
\end{align*}
This is the two-factor tensorization inequality.
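As a sanity check (an illustrative example, not part of the argument), take a product function $h(x,y)=f(x)k(y)$ with $f,k\geq 0$, $\int_X f\,d\mu=\int_Y k\,d\rho=1$, and $f\log f$, $k\log k$ integrable; the names $f$ and $k$ are assumptions of this example. Using the homogeneity $\operatorname{Ent}(cu)=c\operatorname{Ent}(u)$ for $c\geq 0$,
\begin{align*}
\operatorname{Ent}_{\mu\otimes\rho}(h)&=\operatorname{Ent}_\mu(f)+\operatorname{Ent}_\rho(k),\\
\int_X \operatorname{Ent}_\rho(f(x)k)\,d\mu(x)&=\operatorname{Ent}_\rho(k),\\
\int_Y \operatorname{Ent}_\mu(f\,k(y))\,d\rho(y)&=\operatorname{Ent}_\mu(f),
\end{align*}
so the two-factor inequality holds with equality for such $h$.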
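For $n>2$, apply the two-factor inequality with $X=E_1\times\cdots\times E_{n-1}$, $\mu=\nu_1\otimes\cdots\otimes\nu_{n-1}$, $Y=E_n$, and $\rho=\nu_n$, that is, to the decomposition $E=(E_1\times\cdots\times E_{n-1})\times E_n$.
The second term, which involves the entropy with respect to $\nu_1\otimes\cdots\otimes\nu_{n-1}$, is then tensorized again over $E_1,\dots,E_{n-1}$. Induction over $n$ gives the asserted finite-product estimate.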
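Spelled out, the induction step can be sketched as follows, writing $g(\cdot,x_n)$ for the section of $g$ with the last coordinate fixed (notation used only here). The two-factor inequality gives
\begin{align*}
\operatorname{Ent}_\nu(g)\leq \int_{E_{-n}} \operatorname{Ent}_{\nu_n}(g_{x_{-n}})\,d\nu_{-n}(x_{-n})+\int_{E_n}\operatorname{Ent}_{\nu_{-n}}(g(\cdot,x_n))\,d\nu_n(x_n).
\end{align*}
The first integral is the $i=n$ summand. For $\nu_n$-almost every $x_n$, the section $g(\cdot,x_n)$ satisfies the hypotheses by Fubini, so the induction hypothesis bounds $\operatorname{Ent}_{\nu_{-n}}(g(\cdot,x_n))$ by the sum over $i\leq n-1$ of the integrals of $\operatorname{Ent}_{\nu_i}(g_{x_{-i}})$ against $\bigotimes_{j\neq i,\,j\leq n-1}\nu_j$. Integrating in $x_n$ turns each of these into the $i$-th summand $\int_{E_{-i}}\operatorname{Ent}_{\nu_i}(g_{x_{-i}})\,d\nu_{-i}(x_{-i})$.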
[/proof]
[/step]