[proofplan]
We prove the result by revealing the coordinates one at a time. For each $k$, we average $f$ over the coordinates after $k$ and write the total entropy as a telescoping sum of entropy increments. Each increment is an entropy in the $k$-th coordinate of an averaged function. Finally, convexity of $t \mapsto t\log t$ shows that averaging over the later coordinates can only decrease this one-coordinate entropy, giving the desired sum of coordinate entropies.
[/proofplan]
[step:Define the partial averages and the entropy function]
Let $\Phi: [0,\infty) \to \mathbb{R}$ be the convex function defined by $\Phi(t) := t\log t$ for $t>0$ and $\Phi(0) := 0$. Entropy is understood in the standard measure-theoretic sense
\begin{align*}
\operatorname{Ent}_\nu(g) := \int \Phi(g)\,d\nu - \Phi\left(\int g\,d\nu\right),
\end{align*}
with all integrals taken with respect to the displayed measure.
For $k \in \{0,\dots,n\}$, define the product probability measure
\begin{align*}
\mu_{\leq k} := \mu_1 \otimes \cdots \otimes \mu_k
\end{align*}
on $E_{\leq k} := E_1 \times \cdots \times E_k$, with the convention that $E_{\leq 0}$ is a one-point space and $\mu_{\leq 0}$ is the unit mass on it. Also define
\begin{align*}
\mu_{>k} := \mu_{k+1} \otimes \cdots \otimes \mu_n
\end{align*}
on $E_{>k} := E_{k+1} \times \cdots \times E_n$, with the analogous convention when $k=n$. For $x=(x_1,\dots,x_n) \in E$, write $x_{\leq k}:=(x_1,\dots,x_k) \in E_{\leq k}$ and $x_{-k}:=(x_1,\dots,x_{k-1},x_{k+1},\dots,x_n) \in E_1\times\cdots\times E_{k-1}\times E_{k+1}\times\cdots\times E_n$.
For each $k \in \{0,\dots,n\}$, define the partial average $F_k: E_{\leq k} \to [0,\infty)$ by
\begin{align*}
F_k(x_1,\dots,x_k) := \int_{E_{>k}} f(x_1,\dots,x_k,z_{k+1},\dots,z_n)\, d\mu_{>k}(z_{k+1},\dots,z_n).
\end{align*}
Thus $F_n=f$ as a function on $E$, and $F_0=\int_E f\,d\mu$ is a constant.
Since $f$ is non-negative and measurable, Tonelli's theorem applies to the non-negative sections defining $F_k$, so each $F_k$ is measurable. Since $f$ is also $\mu$-integrable and all factors are probability measures, [Fubini's theorem](/theorems/2961) gives that each $F_k$ is $\mu_{\leq k}$-integrable and
\begin{align*}
\int_{E_{\leq k}} F_k\,d\mu_{\leq k} = \int_E f\,d\mu.
\end{align*}
[/step]
[step:Decompose the total entropy into one-coordinate increments]
We telescope the entropy along the sequence $F_0,F_1,\dots,F_n$. First we justify that all terms in the telescoping sum are finite. We use the convention in the definition of finite entropy that $\operatorname{Ent}_\mu(f)<\infty$ means both defining quantities
\begin{align*}
\int_E \Phi(f)\,d\mu(x)
\end{align*}
and
\begin{align*}
\Phi\left(\int_E f\,d\mu(x)\right)
\end{align*}
are finite [real numbers](/page/Real%20Numbers). For each $k$, [Jensen's Inequality](/theorems/9) for the convex function $\Phi$ applied to the probability measure $\mu_{>k}$ gives, for $\mu_{\leq k}$-almost every $x_{\leq k}$,
\begin{align*}
\Phi(F_k(x_{\leq k})) \leq \int_{E_{>k}} \Phi(f(x_{\leq k},z))\,d\mu_{>k}(z).
\end{align*}
Integrating with respect to $\mu_{\leq k}$ and applying Tonelli's theorem to the positive and negative parts, which is legitimate because $\Phi$ is bounded below on $[0,\infty)$, yields
\begin{align*}
\int_{E_{\leq k}} \Phi(F_k)\,d\mu_{\leq k} \leq \int_E \Phi(f)\,d\mu(x) < \infty.
\end{align*}
Thus every intermediate entropy term is a real number, so the following telescoping cancellation is an ordinary finite cancellation. Since $F_n=f$ and
\begin{align*}
F_0 = \int_E f\,d\mu(x),
\end{align*}
we have
\begin{align*}
\operatorname{Ent}_\mu(f) = \int_E \Phi(F_n)\,d\mu(x) - \Phi(F_0).
\end{align*}
Adding and subtracting the intermediate terms gives
\begin{align*}
\operatorname{Ent}_\mu(f) = \sum_{k=1}^n \left(\int_{E_{\leq k}} \Phi(F_k)\,d\mu_{\leq k} - \int_{E_{\leq k-1}} \Phi(F_{k-1})\,d\mu_{\leq k-1}\right).
\end{align*}
For fixed $k$, [Fubini's theorem](/theorems/2961) applies because $f$ is non-negative and integrable on the product probability space, and it gives
\begin{align*}
F_{k-1}(x_1,\dots,x_{k-1}) = \int_{E_k} F_k(x_1,\dots,x_{k-1},y)\,d\mu_k(y)
\end{align*}
for $\mu_{\leq k-1}$-almost every $(x_1,\dots,x_{k-1})$. Therefore the $k$-th summand is
\begin{align*}
\int_{E_{\leq k-1}} \operatorname{Ent}_{\mu_k}\bigl(F_k(x_1,\dots,x_{k-1},\cdot)\bigr)\,d\mu_{\leq k-1}(x_1,\dots,x_{k-1}).
\end{align*}
[guided]
The purpose of the functions $F_k$ is to expose the variables gradually. The function $F_k$ depends only on the first $k$ coordinates; it is obtained from $f$ by integrating out the last $n-k$ coordinates. Tonelli's theorem guarantees these sections are measurable because $f$ is non-negative and measurable, and [Fubini's theorem](/theorems/2961) is available because $f$ is integrable with respect to the product probability measure. Thus $F_n$ is the original function, while $F_0$ is the global average
\begin{align*}
F_0 = \int_E f\,d\mu(x).
\end{align*}
Before telescoping, we must check that we are not subtracting undefined extended quantities. We use the convention in the definition of finite entropy that $\operatorname{Ent}_\mu(f)<\infty$ means both
\begin{align*}
\int_E \Phi(f)\,d\mu(x)
\end{align*}
and
\begin{align*}
\Phi\left(\int_E f\,d\mu(x)\right)
\end{align*}
are finite real numbers. [Jensen's Inequality](/theorems/9) for the convex function $\Phi$ applied to the probability measure $\mu_{>k}$ gives
\begin{align*}
\Phi(F_k(x_{\leq k})) \leq \int_{E_{>k}} \Phi(f(x_{\leq k},z))\,d\mu_{>k}(z)
\end{align*}
for $\mu_{\leq k}$-almost every $x_{\leq k}$. Because $\Phi$ is bounded below on $[0,\infty)$, Tonelli's theorem applies to the positive and negative parts, and hence
\begin{align*}
\int_{E_{\leq k}}\Phi(F_k)\,d\mu_{\leq k} \leq \int_E \Phi(f)\,d\mu(x) < \infty.
\end{align*}
Therefore all intermediate quantities in the telescoping identity are finite real numbers.
The entropy of $f$ is
\begin{align*}
\operatorname{Ent}_\mu(f) = \int_E \Phi(f)\,d\mu - \Phi\left(\int_E f\,d\mu\right).
\end{align*}
Because $F_n=f$ and $F_0=\int_E f\,d\mu$, this becomes
\begin{align*}
\operatorname{Ent}_\mu(f) = \int_E \Phi(F_n)\,d\mu - \Phi(F_0).
\end{align*}
Now insert every intermediate quantity $\int_{E_{\leq k}}\Phi(F_k)\,d\mu_{\leq k}$. The resulting telescoping identity is
\begin{align*}
\operatorname{Ent}_\mu(f) = \sum_{k=1}^n \left(\int_{E_{\leq k}} \Phi(F_k)\,d\mu_{\leq k} - \int_{E_{\leq k-1}} \Phi(F_{k-1})\,d\mu_{\leq k-1}\right).
\end{align*}
We now identify each difference as an entropy in the newly revealed coordinate $x_k$. Fix $k \in \{1,\dots,n\}$ and fix $(x_1,\dots,x_{k-1}) \in E_{\leq k-1}$ outside the null set where the Fubini identity can fail. By definition,
\begin{align*}
F_k(x_1,\dots,x_{k-1},y) = \int_{E_{>k}} f(x_1,\dots,x_{k-1},y,z_{k+1},\dots,z_n)\,d\mu_{>k}(z_{k+1},\dots,z_n).
\end{align*}
Averaging this expression over $y \in E_k$ and applying [Fubini's theorem](/theorems/2961) to the non-negative integrable function $f$ gives exactly $F_{k-1}(x_1,\dots,x_{k-1})$:
\begin{align*}
F_{k-1}(x_1,\dots,x_{k-1}) = \int_{E_k} F_k(x_1,\dots,x_{k-1},y)\,d\mu_k(y).
\end{align*}
Hence, for this fixed past coordinate vector, the expression
\begin{align*}
\int_{E_k}\Phi(F_k(x_1,\dots,x_{k-1},y))\,d\mu_k(y) - \Phi(F_{k-1}(x_1,\dots,x_{k-1}))
\end{align*}
is precisely
\begin{align*}
\operatorname{Ent}_{\mu_k}\bigl(F_k(x_1,\dots,x_{k-1},\cdot)\bigr).
\end{align*}
Integrating over the past variables gives the claimed representation of the $k$-th telescoping increment.
[/guided]
[/step]
[step:Use joint convexity of relative entropy to contract the later-coordinate average]
Fix $k \in \{1,\dots,n\}$ and fix $x_{\leq k-1}=(x_1,\dots,x_{k-1}) \in E_{\leq k-1}$ outside the null set where the relevant sections are measurable and integrable. Define the measurable function $h: E_k \times E_{>k} \to [0,\infty)$ by
\begin{align*}
h(y,z) := f(x_1,\dots,x_{k-1},y,z_{k+1},\dots,z_n).
\end{align*}
Define the averaged slice $H: E_k \to [0,\infty)$ by
\begin{align*}
H(y) := \int_{E_{>k}} h(y,z)\,d\mu_{>k}(z).
\end{align*}
Then $H(y)=F_k(x_1,\dots,x_{k-1},y)$ for $\mu_k$-almost every $y$.
Let $\Psi:[0,\infty)\times[0,\infty)\to[-e^{-1},\infty]$ be defined by $\Psi(u,v):=u\log(u/v)$ when $u>0$ and $v>0$, by $\Psi(0,v):=0$ when $v\geq0$, and by $\Psi(u,0):=+\infty$ when $u>0$. The function $\Psi$ is the perspective of the convex function $\Phi$, hence is jointly convex. For each $v\geq0$ and $u\geq0$, the lower bound $r\log r\geq -e^{-1}$ for $r\geq0$ gives $\Psi(u,v)\geq -v/e$, with the convention already stated when $v=0$.
For each $z\in E_{>k}$ for which the section $h(\cdot,z)$ is integrable, define the map $m:E_{>k}\to[0,\infty]$ by
\begin{align*}
m(z) := \int_{E_k} h(y,z)\,d\mu_k(y).
\end{align*}
Also define
\begin{align*}
M := \int_{E_{>k}} m(z)\,d\mu_{>k}(z) = \int_{E_k} H(y)\,d\mu_k(y),
\end{align*}
where the equality follows from [Fubini's theorem](/theorems/2961) applied to the non-negative integrable function $h$.
Using the representation of entropy by $\Psi$, we have
\begin{align*}
\operatorname{Ent}_{\mu_k}(H) = \int_{E_k} \Psi(H(y),M)\,d\mu_k(y).
\end{align*}
For each fixed $y\in E_k$, [Jensen's Inequality](/theorems/9) for the jointly convex function $\Psi$ on the probability space $(E_{>k},\mu_{>k})$ gives
\begin{align*}
\Psi(H(y),M) \leq \int_{E_{>k}} \Psi(h(y,z),m(z))\,d\mu_{>k}(z).
\end{align*}
Integrating over $E_k$ is legitimate after controlling negative parts. Indeed, the lower bound above gives
\begin{align*}
\Psi(H(y),M)^- \leq M/e
\end{align*}
for $\mu_k$-almost every $y$, and
\begin{align*}
\Psi(h(y,z),m(z))^- \leq m(z)/e
\end{align*}
for $\mu_k\otimes\mu_{>k}$-almost every $(y,z)$. Since
\begin{align*}
\int_{E_{>k}}m(z)\,d\mu_{>k}(z)=M<\infty,
\end{align*}
the negative parts are integrable. Applying [Fubini's theorem](/theorems/2961) to the integrable negative parts and Tonelli's theorem to the positive parts yields
\begin{align*}
\operatorname{Ent}_{\mu_k}(H) \leq \int_{E_{>k}}\int_{E_k}\Psi(h(y,z),m(z))\,d\mu_k(y)\,d\mu_{>k}(z).
\end{align*}
The inner integral is exactly $\operatorname{Ent}_{\mu_k}(h(\cdot,z))$, interpreted in the extended sense if the section is outside the entropy domain. Therefore,
\begin{align*}
\operatorname{Ent}_{\mu_k}\bigl(F_k(x_1,\dots,x_{k-1},\cdot)\bigr) \leq \int_{E_{>k}}\operatorname{Ent}_{\mu_k,k}(f)(x_1,\dots,x_{k-1},z_{k+1},\dots,z_n)\,d\mu_{>k}(z).
\end{align*}
[guided]
We need to compare the entropy of the averaged slice $H$ with the average of the entropies of the unaveraged slices $h(\cdot,z)$. A direct convexity argument for $g\mapsto \operatorname{Ent}_{\mu_k}(g)$ is delicate because entropy contains a negative term involving the mean. The clean scalar way to handle this is to rewrite entropy as a relative entropy integrand.
Fix $k\in\{1,\dots,n\}$ and fix $x_{\leq k-1}=(x_1,\dots,x_{k-1})\in E_{\leq k-1}$ outside the null set where the relevant sections are measurable and integrable. Define the measurable map $h:E_k\times E_{>k}\to[0,\infty)$ by
\begin{align*}
h(y,z) := f(x_1,\dots,x_{k-1},y,z_{k+1},\dots,z_n).
\end{align*}
Define the average over the later variables as the measurable map $H:E_k\to[0,\infty)$ with
\begin{align*}
H(y) := \int_{E_{>k}} h(y,z)\,d\mu_{>k}(z).
\end{align*}
Then $H(y)=F_k(x_1,\dots,x_{k-1},y)$ for $\mu_k$-almost every $y$.
Now define the scalar function $\Psi:[0,\infty)\times[0,\infty)\to[-e^{-1},\infty]$ by $\Psi(u,v)=u\log(u/v)$ for $u>0$ and $v>0$, by $\Psi(0,v)=0$ for $v\geq0$, and by $\Psi(u,0)=+\infty$ for $u>0$. This function is the perspective of the convex function $\Phi(t)=t\log t$, so $\Psi$ is jointly convex in the two scalar variables. This joint convexity is the correct replacement for the invalid subtraction of two convexity inequalities. We will also need a lower bound: for $v>0$, writing $r=u/v$ gives $\Psi(u,v)=v r\log r$, and $r\log r\geq -e^{-1}$ for $r\geq0$. Thus $\Psi(u,v)\geq -v/e$, and the same lower bound holds under the stated conventions when $v=0$.
For each later-coordinate value $z\in E_{>k}$ whose section is integrable, define its $E_k$-mean as the map $m:E_{>k}\to[0,\infty]$ by
\begin{align*}
m(z) := \int_{E_k} h(y,z)\,d\mu_k(y).
\end{align*}
Define the mean of $H$ by
\begin{align*}
M := \int_{E_k} H(y)\,d\mu_k(y).
\end{align*}
[Fubini's theorem](/theorems/2961) applies to the non-negative integrable function $h$, so the two ways of computing the total mean agree:
\begin{align*}
M = \int_{E_{>k}} m(z)\,d\mu_{>k}(z).
\end{align*}
The entropy of $H$ can now be written as
\begin{align*}
\operatorname{Ent}_{\mu_k}(H) = \int_{E_k}\Psi(H(y),M)\,d\mu_k(y).
\end{align*}
For fixed $y\in E_k$, the pair $(H(y),M)$ is the $\mu_{>k}$-average of the pairs $(h(y,z),m(z))$. Therefore [Jensen's Inequality](/theorems/9) for the jointly convex scalar function $\Psi$ gives
\begin{align*}
\Psi(H(y),M) \leq \int_{E_{>k}}\Psi(h(y,z),m(z))\,d\mu_{>k}(z).
\end{align*}
Integrating this inequality over $E_k$ requires a signed-integral justification, because $\Psi$ is not non-negative. The lower bound gives
\begin{align*}
\Psi(H(y),M)^- \leq M/e
\end{align*}
for $\mu_k$-almost every $y$, and
\begin{align*}
\Psi(h(y,z),m(z))^- \leq m(z)/e
\end{align*}
for $\mu_k\otimes\mu_{>k}$-almost every $(y,z)$. Since Fubini gave
\begin{align*}
\int_{E_{>k}}m(z)\,d\mu_{>k}(z)=M<\infty,
\end{align*}
these negative parts are integrable. We may therefore apply [Fubini's theorem](/theorems/2961) to the negative parts and Tonelli's theorem to the positive parts, obtaining
\begin{align*}
\operatorname{Ent}_{\mu_k}(H) \leq \int_{E_{>k}}\int_{E_k}\Psi(h(y,z),m(z))\,d\mu_k(y)\,d\mu_{>k}(z).
\end{align*}
For fixed $z$, the inner integral is exactly the entropy of the slice $h(\cdot,z)$ with respect to $\mu_k$:
\begin{align*}
\int_{E_k}\Psi(h(y,z),m(z))\,d\mu_k(y)=\operatorname{Ent}_{\mu_k}(h(\cdot,z)).
\end{align*}
Thus
\begin{align*}
\operatorname{Ent}_{\mu_k}\bigl(F_k(x_1,\dots,x_{k-1},\cdot)\bigr) \leq \int_{E_{>k}}\operatorname{Ent}_{\mu_k,k}(f)(x_1,\dots,x_{k-1},z_{k+1},\dots,z_n)\,d\mu_{>k}(z),
\end{align*}
with the right-hand side interpreted in the extended sense for exceptional slices.
[/guided]
[/step]
[step:Integrate the coordinate bounds and sum over all coordinates]
Integrating the preceding inequality over $E_{\leq k-1}$ with respect to $\mu_{\leq k-1}$ gives
\begin{align*}
\int_{E_{\leq k-1}} \operatorname{Ent}_{\mu_k}\bigl(F_k(x_1,\dots,x_{k-1},\cdot)\bigr)\,d\mu_{\leq k-1}(x_1,\dots,x_{k-1})
\end{align*}
\begin{align*}
\leq \int_{E_{\leq k-1}}\int_{E_{>k}}\operatorname{Ent}_{\mu_k,k}(f)(x_1,\dots,x_{k-1},z_{k+1},\dots,z_n)\,d\mu_{>k}(z)\,d\mu_{\leq k-1}(x_1,\dots,x_{k-1}).
\end{align*}
Here $\operatorname{Ent}_{\mu_k,k}(f)(x_{-k})$ denotes the entropy, with respect to $\mu_k$, of the one-coordinate slice obtained by fixing all coordinates of $x$ except the $k$-th. Since the integrand on the right is non-negative and independent of the $k$-th coordinate, Tonelli's theorem permits inserting the probability integration over $E_k$ without changing its value. Thus
\begin{align*}
\int_{E_{\leq k-1}}\int_{E_{>k}}\operatorname{Ent}_{\mu_k,k}(f)\,d\mu_{>k}\,d\mu_{\leq k-1} = \int_E \operatorname{Ent}_{\mu_k,k}(f)(x_{-k})\,d\mu(x).
\end{align*}
Combining this with the telescoping decomposition gives
\begin{align*}
\operatorname{Ent}_\mu(f) \leq \sum_{k=1}^n \int_E \operatorname{Ent}_{\mu_k,k}(f)(x_{-k})\,d\mu(x).
\end{align*}
This is the asserted tensorization inequality.
[/step]