[guided]We prove the tensorization estimate rather than citing it. Because the cube is finite, all functions are measurable and all expectations are finite sums. Let $X: \{0,1\}^n \to \{0,1\}^n$ be the identity random vector under $\mu_p$. For $i \in \{1,\dots,n\}$ and $y \in \{0,1\}^{n-1}$, define $g_{i,y}: \{0,1\} \to [0,\infty)$ by $g_{i,y}(b)=g(y^{i,b})$ for $b \in \{0,1\}$.
For a probability measure $\rho$ on a finite set and a non-negative function $u$ on that set, define
\begin{align*}
\operatorname{Ent}_{\rho}(u)=\mathbb{E}_{\rho}[u\log u]-\mathbb{E}_{\rho}[u]\log \mathbb{E}_{\rho}[u],
\end{align*}
with the convention $0\log 0:=0$. We prove by induction on $n$ that
\begin{align*}
\operatorname{Ent}_{\mu_p}(g)\leq \sum_{i=1}^n \mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr].
\end{align*}
When $n=1$, the right-hand side has one term and is exactly $\operatorname{Ent}_{\nu_p}(g)$, so the estimate holds with equality.
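As a concrete instance of the one-dimensional quantity, suppose (an assumption on notation, not restated in this section) that $\nu_p(\{1\})=p$ and $\nu_p(\{0\})=1-p$, and write $s=u(1)$, $t=u(0)$ for the two values of a function $u:\{0,1\}\to[0,\infty)$; the symbols $s,t$ are introduced only for this illustration. Then the definition of entropy unfolds to
\begin{align*}
\operatorname{Ent}_{\nu_p}(u)=p\,s\log s+(1-p)\,t\log t-\bigl(ps+(1-p)t\bigr)\log\bigl(ps+(1-p)t\bigr).
\end{align*}
For example, with $p=\tfrac12$, $s=2$, and $t=0$, the convention $0\log 0:=0$ gives $\operatorname{Ent}_{\nu_{1/2}}(u)=\log 2-1\cdot\log 1=\log 2$.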
Assume the estimate is known in dimension $n-1$. Write a point of the cube as $(z,b)\in\{0,1\}^{n-1}\times\{0,1\}$, let $\rho=\nu_p^{\otimes(n-1)}$, so that $\mu_p=\rho\otimes\nu_p$, and define the averaged function $G:\{0,1\}^{n-1}\to[0,\infty)$ by
\begin{align*}
G(z)=\mathbb{E}_{\nu_p}[g(z,\cdot)].
\end{align*}
Expanding the definition of entropy separates the last coordinate from the first $n-1$ coordinates:
\begin{align*}
\operatorname{Ent}_{\rho\otimes\nu_p}(g)=\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(g(z,\cdot))\bigr]+\operatorname{Ent}_{\rho}(G).
\end{align*}
This identity follows by writing both sides as finite sums. Since $\mathbb{E}_{\rho}[G]=\mathbb{E}_{\rho\otimes\nu_p}[g]$, the terms $\mathbb{E}_{\rho\otimes\nu_p}[g\log g]$ agree, the intermediate term $\mathbb{E}_{\rho}[G\log G]$ appears on the right-hand side once with each sign and cancels, and the remaining term is $-\mathbb{E}_{\rho\otimes\nu_p}[g]\log\mathbb{E}_{\rho\otimes\nu_p}[g]$.
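Spelled out, the computation can be sketched as follows; the only input beyond the definitions is the exchange of finite sums. The two terms on the right-hand side expand as
\begin{align*}
\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(g(z,\cdot))\bigr]
&=\mathbb{E}_{\rho}\bigl[\mathbb{E}_{\nu_p}[g(z,\cdot)\log g(z,\cdot)]-G(z)\log G(z)\bigr]\\
&=\mathbb{E}_{\rho\otimes\nu_p}[g\log g]-\mathbb{E}_{\rho}[G\log G],\\
\operatorname{Ent}_{\rho}(G)
&=\mathbb{E}_{\rho}[G\log G]-\mathbb{E}_{\rho}[G]\log\mathbb{E}_{\rho}[G]\\
&=\mathbb{E}_{\rho}[G\log G]-\mathbb{E}_{\rho\otimes\nu_p}[g]\log\mathbb{E}_{\rho\otimes\nu_p}[g].
\end{align*}
Adding the two lines, the terms $\pm\mathbb{E}_{\rho}[G\log G]$ cancel and what is left is $\operatorname{Ent}_{\rho\otimes\nu_p}(g)$.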
Now apply the induction hypothesis to $G$ on the $(n-1)$-dimensional Bernoulli cube. For each $i \in \{1,\dots,n-1\}$ and $w \in \{0,1\}^{n-2}$, let $w^{i,a} \in \{0,1\}^{n-1}$ denote the point whose $i$-th coordinate is $a \in \{0,1\}$ and whose remaining coordinates, in order, form the vector $w$, so that $\pi_{-i}(w^{i,a})=w$. Define the fibre function $G_{i,w}: \{0,1\} \to [0,\infty)$ by
\begin{align*}
G_{i,w}(a)=G(w^{i,a}).
\end{align*}
If $Z:\{0,1\}^{n-1}\to\{0,1\}^{n-1}$ is the identity random vector under $\rho$, then
\begin{align*}
\operatorname{Ent}_{\rho}(G)\leq\sum_{i=1}^{n-1}\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr].
\end{align*}
For a fixed coordinate $i\leq n-1$ and fixed outside coordinates, $G_{i,\pi_{-i}(Z)}$ is obtained by averaging, with respect to the last Bernoulli coordinate, the corresponding two-point fibre functions of $g$.
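It remains to check that this averaging cannot increase the entropy term. The entropy functional is convex on non-negative functions on the two-point space, because the duality formula $\operatorname{Ent}_{\nu_p}(u)=\sup\bigl\{\mathbb{E}_{\nu_p}[uh] : h:\{0,1\}\to\mathbb{R},\ \mathbb{E}_{\nu_p}[e^{h}]=1\bigr\}$ expresses it as a supremum of linear functionals of $u$. Hence, for non-negative functions $u_0,u_1:\{0,1\}\to[0,\infty)$ and $\lambda\in[0,1]$,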
\begin{align*}
\operatorname{Ent}_{\nu_p}(\lambda u_1+(1-\lambda)u_0)\leq \lambda\operatorname{Ent}_{\nu_p}(u_1)+(1-\lambda)\operatorname{Ent}_{\nu_p}(u_0).
\end{align*}
The boundary cases in this inequality are covered by the hypotheses $u_0,u_1\geq 0$ and the convention $0\log 0:=0$ in the entropy definition. Applying this convexity inequality to the finite average in the last coordinate gives, for each $i\leq n-1$,
\begin{align*}
\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr]\leq\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr].
\end{align*}
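In more detail, this step can be sketched as follows, where $(w,b)\in\{0,1\}^{n-2}\times\{0,1\}$ denotes the vector of outside coordinates of a point of $\{0,1\}^n$ once coordinate $i\leq n-1$ is removed; this pairing notation is an assumption made here for illustration. By the definition of $G$, each fibre of $G$ is the $\nu_p$-average of two fibres of $g$:
\begin{align*}
G_{i,w}(a)=\sum_{b\in\{0,1\}}\nu_p(\{b\})\,g_{i,(w,b)}(a).
\end{align*}
Convexity with $u_b=g_{i,(w,b)}$ and $\lambda=\nu_p(\{1\})$, followed by integration over $w=\pi_{-i}(Z)$, then gives
\begin{align*}
\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr]
&\leq\mathbb{E}_{\rho}\Bigl[\sum_{b\in\{0,1\}}\nu_p(\{b\})\operatorname{Ent}_{\nu_p}\bigl(g_{i,(\pi_{-i}(Z),b)}\bigr)\Bigr]\\
&=\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr].
\end{align*}
The last equality uses $\mu_p=\rho\otimes\nu_p$ together with $\pi_{-i}(X)=(\pi_{-i}(Z),b)$ when $X=(Z,b)$.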
The remaining term in the decomposition is exactly the fibre entropy in the last coordinate:
\begin{align*}
\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(g(z,\cdot))\bigr]=\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{n,\pi_{-n}(X)})\bigr].
\end{align*}
Combining the decomposition, the induction hypothesis, and the convexity comparison proves
\begin{align*}
\operatorname{Ent}_{\mu_p}(g)\leq\sum_{i=1}^n\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr].
\end{align*}
This is the finite product entropy tensorization inequality used below.[/guided]