[proofplan]
We prove the product inequality by decomposing entropy one coordinate at a time. The key finite-space fact is entropy tensorization: the entropy under a product measure is bounded by the sum of the expected one-coordinate conditional entropies. Once this is applied to $f^2$, each conditional entropy is exactly the entropy of a squared function on the two-point Bernoulli space, so the assumed [two-point logarithmic Sobolev inequality](/theorems/6752) bounds it by the corresponding coordinate derivative term.
[/proofplan]
[step:Define the coordinate fibres and conditional functions]
For each $i \in \{1,\dots,n\}$, define the coordinate-deletion map $\pi_{-i}: \{0,1\}^n \to \{0,1\}^{n-1}$ by removing the $i$-th coordinate. For $y \in \{0,1\}^{n-1}$, let $y^{i,b} \in \{0,1\}^n$ denote the point whose deleted-coordinate vector is $y$ and whose $i$-th coordinate is $b \in \{0,1\}$.
For each $i \in \{1,\dots,n\}$, the discrete derivative $D_i f: \{0,1\}^n \to \mathbb{R}$ is defined by $D_i f(x)=f(x^{i,1})-f(x^{i,0})$, where for $x \in \{0,1\}^n$ we write $x^{i,b}:=(\pi_{-i}(x))^{i,b}$ for the point obtained from $x$ by setting its $i$-th coordinate to $b$. For each $i \in \{1,\dots,n\}$ and $y \in \{0,1\}^{n-1}$, define the fibre function $f_{i,y}: \{0,1\} \to \mathbb{R}$ by $f_{i,y}(b)=f(y^{i,b})$ for $b \in \{0,1\}$. Then
\begin{align*}
f_{i,y}(1)-f_{i,y}(0)=D_i f(y^{i,0})=D_i f(y^{i,1}).
\end{align*}
Thus $(D_i f)^2$ is constant on each $i$-th coordinate fibre.
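As a concrete illustration of this notation (a worked case added for orientation, not part of the original argument), take $n=2$ and $i=1$. For $y \in \{0,1\}$ the point $y^{1,b}$ is $(b,y)$, so the fibre function and the discrete derivative read
\begin{align*}
f_{1,y}(b)&=f(b,y), & D_1 f(0,y)&=D_1 f(1,y)=f(1,y)-f(0,y).
\end{align*}
In particular $(D_1 f)^2$ takes the single value $(f(1,y)-f(0,y))^2$ on the fibre $\{(0,y),(1,y)\}$.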
[/step]
[step:Derive finite entropy tensorization by conditioning one coordinate]
We prove the finite product entropy tensorization inequality for a non-negative function $g: \{0,1\}^n \to [0,\infty)$. Let $X: \{0,1\}^n \to \{0,1\}^n$ be the identity random vector under $\mu_p$. For $i \in \{1,\dots,n\}$ and $y \in \{0,1\}^{n-1}$, define $g_{i,y}: \{0,1\} \to [0,\infty)$ by $g_{i,y}(b)=g(y^{i,b})$ for $b \in \{0,1\}$.
For a finite probability measure $\rho$ and a non-negative function $u$ on its support, define $\operatorname{Ent}_{\rho}(u)=\mathbb{E}_{\rho}[u\log u]-\mathbb{E}_{\rho}[u]\log \mathbb{E}_{\rho}[u]$, with $0\log 0:=0$. The following finite tensorization estimate is proved by induction on $n$:
\begin{align*}
\operatorname{Ent}_{\mu_p}(g)\leq \sum_{i=1}^n \mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr].
\end{align*}
For $n=1$ the estimate holds with equality. Assume it holds in dimension $n-1$. Write a point of the cube as $(z,b)\in\{0,1\}^{n-1}\times\{0,1\}$, let $\rho=\nu_p^{\otimes(n-1)}$, and define $G:\{0,1\}^{n-1}\to[0,\infty)$ by $G(z)=\mathbb{E}_{\nu_p}[g(z,\cdot)]$. Expanding the definition of entropy gives the exact decomposition
\begin{align*}
\operatorname{Ent}_{\rho\otimes\nu_p}(g)=\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(g(z,\cdot))\bigr]+\operatorname{Ent}_{\rho}(G).
\end{align*}
For each $i \in \{1,\dots,n-1\}$ and $w \in \{0,1\}^{n-2}$, let $w^{i,a} \in \{0,1\}^{n-1}$ denote the point whose deleted-coordinate vector is $w$ and whose $i$-th coordinate is $a \in \{0,1\}$. Define the fibre function $G_{i,w}: \{0,1\} \to [0,\infty)$ by $G_{i,w}(a)=G(w^{i,a})$ for $a \in \{0,1\}$. Applying the induction hypothesis to $G$ gives
\begin{align*}
\operatorname{Ent}_{\rho}(G)\leq\sum_{i=1}^{n-1}\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr],
\end{align*}
where $Z:\{0,1\}^{n-1}\to\{0,1\}^{n-1}$ is the identity random vector under $\rho$. For each $i\leq n-1$, the function $G_{i,\pi_{-i}(Z)}$ is the $\nu_p$-average, over the last coordinate, of the two corresponding two-point fibre functions of $g$. The entropy functional is convex on non-negative functions on $\{0,1\}$: for non-negative functions $u_0,u_1: \{0,1\} \to [0,\infty)$ and $\lambda\in[0,1]$,
\begin{align*}
\operatorname{Ent}_{\nu_p}(\lambda u_1+(1-\lambda)u_0)\leq \lambda\operatorname{Ent}_{\nu_p}(u_1)+(1-\lambda)\operatorname{Ent}_{\nu_p}(u_0).
\end{align*}
This finite two-point convexity inequality includes the boundary cases because all functions are non-negative and the entropy convention is $0\log 0:=0$. Applying this convexity inequality to the finite average in the last coordinate gives
\begin{align*}
\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr]\leq\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr].
\end{align*}
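The first term of the decomposition equals $\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{n,\pi_{-n}(X)})\bigr]$, the $i=n$ summand. Combining this with the decomposition, the induction hypothesis, and the convexity estimate proves tensorization.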
[guided]
We prove the tensorization estimate rather than citing it. Because the cube is finite, all functions are measurable and all expectations are finite sums. Let $X: \{0,1\}^n \to \{0,1\}^n$ be the identity random vector under $\mu_p$. For $i \in \{1,\dots,n\}$ and $y \in \{0,1\}^{n-1}$, define $g_{i,y}: \{0,1\} \to [0,\infty)$ by $g_{i,y}(b)=g(y^{i,b})$ for $b \in \{0,1\}$.
For a finite probability measure $\rho$ and a non-negative function $u$ on its support, define
\begin{align*}
\operatorname{Ent}_{\rho}(u)=\mathbb{E}_{\rho}[u\log u]-\mathbb{E}_{\rho}[u]\log \mathbb{E}_{\rho}[u],
\end{align*}
with the convention $0\log 0:=0$. We prove by induction on $n$ that
\begin{align*}
\operatorname{Ent}_{\mu_p}(g)\leq \sum_{i=1}^n \mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr].
\end{align*}
When $n=1$, the right-hand side has one term and is exactly $\operatorname{Ent}_{\nu_p}(g)$, so the estimate holds with equality.
Assume the estimate is known in dimension $n-1$. Write a point of the cube as $(z,b)\in\{0,1\}^{n-1}\times\{0,1\}$, let $\rho=\nu_p^{\otimes(n-1)}$, and define the averaged function $G:\{0,1\}^{n-1}\to[0,\infty)$ by
\begin{align*}
G(z)=\mathbb{E}_{\nu_p}[g(z,\cdot)].
\end{align*}
Expanding the definition of entropy separates the last coordinate from the first $n-1$ coordinates:
\begin{align*}
\operatorname{Ent}_{\rho\otimes\nu_p}(g)=\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(g(z,\cdot))\bigr]+\operatorname{Ent}_{\rho}(G).
\end{align*}
This identity follows by writing both sides as finite sums: the term $\mathbb{E}_{\rho\otimes\nu_p}[g\log g]$ appears once on each side, the intermediate term $\mathbb{E}_{\rho}[G\log G]$ appears twice on the right with opposite signs and cancels, and since $\mathbb{E}_{\rho}[G]=\mathbb{E}_{\rho\otimes\nu_p}[g]$ the remaining term on the right is $-\mathbb{E}_{\rho\otimes\nu_p}[g]\log\mathbb{E}_{\rho\otimes\nu_p}[g]$, matching the left.
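Spelled out, the expansion can be organized in three lines; this is only a sketch of the bookkeeping, using the definition of $G$ and the convention $0\log 0:=0$:
\begin{align*}
\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(g(z,\cdot))\bigr]&=\mathbb{E}_{\rho\otimes\nu_p}[g\log g]-\mathbb{E}_{\rho}[G\log G],\\
\operatorname{Ent}_{\rho}(G)&=\mathbb{E}_{\rho}[G\log G]-\mathbb{E}_{\rho}[G]\log\mathbb{E}_{\rho}[G],\\
\mathbb{E}_{\rho}[G]&=\mathbb{E}_{\rho\otimes\nu_p}[g].
\end{align*}
Adding the first two lines and substituting the third gives $\mathbb{E}_{\rho\otimes\nu_p}[g\log g]-\mathbb{E}_{\rho\otimes\nu_p}[g]\log\mathbb{E}_{\rho\otimes\nu_p}[g]=\operatorname{Ent}_{\rho\otimes\nu_p}(g)$.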
Now apply the induction hypothesis to $G$ on the $(n-1)$-dimensional Bernoulli cube. For each $i \in \{1,\dots,n-1\}$ and $w \in \{0,1\}^{n-2}$, let $w^{i,a} \in \{0,1\}^{n-1}$ denote the point whose deleted-coordinate vector is $w$ and whose $i$-th coordinate is $a \in \{0,1\}$. Define the fibre function $G_{i,w}: \{0,1\} \to [0,\infty)$ by
\begin{align*}
G_{i,w}(a)=G(w^{i,a}).
\end{align*}
If $Z:\{0,1\}^{n-1}\to\{0,1\}^{n-1}$ is the identity random vector under $\rho$, then
\begin{align*}
\operatorname{Ent}_{\rho}(G)\leq\sum_{i=1}^{n-1}\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr].
\end{align*}
For a fixed coordinate $i\leq n-1$ and fixed outside coordinates, $G_{i,\pi_{-i}(Z)}$ is obtained by averaging, with respect to the last Bernoulli coordinate, the corresponding two-point fibre functions of $g$.
It remains to justify that averaging over the last coordinate cannot increase the entropy term. The entropy functional is convex on non-negative functions on the two-point space: for non-negative functions $u_0,u_1:\{0,1\}\to[0,\infty)$ and $\lambda\in[0,1]$,
\begin{align*}
\operatorname{Ent}_{\nu_p}(\lambda u_1+(1-\lambda)u_0)\leq \lambda\operatorname{Ent}_{\nu_p}(u_1)+(1-\lambda)\operatorname{Ent}_{\nu_p}(u_0).
\end{align*}
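One standard way to see this convexity, offered as a sketch and not necessarily the argument the source has in mind, is the variational formula for entropy, in which the test function $h$ is auxiliary notation introduced here:
\begin{align*}
\operatorname{Ent}_{\nu_p}(u)=\sup\bigl\{\mathbb{E}_{\nu_p}[u h] : h:\{0,1\}\to\mathbb{R},\ \mathbb{E}_{\nu_p}[e^{h}]\leq 1\bigr\}.
\end{align*}
For each fixed $h$ the map $u\mapsto\mathbb{E}_{\nu_p}[u h]$ is linear, and a supremum of linear functionals is convex, so $\operatorname{Ent}_{\nu_p}$ satisfies the convexity inequality displayed above.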
The boundary cases in this inequality are covered by the hypotheses $u_0,u_1\geq 0$ and the convention $0\log 0:=0$ in the entropy definition. Applying this convexity inequality to the finite average in the last coordinate gives, for each $i\leq n-1$,
\begin{align*}
\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr]\leq\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr].
\end{align*}
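The application of convexity can be made explicit as follows. The symbols $\lambda:=\nu_p(\{1\})$ and $(w,c)\in\{0,1\}^{n-1}$, the vector $w\in\{0,1\}^{n-2}$ with $c\in\{0,1\}$ appended as last coordinate, are notation introduced here for illustration. For $i\leq n-1$ one has $(w,c)^{i,a}=(w^{i,a},c)$, so
\begin{align*}
G_{i,w}(a)&=\lambda\, g(w^{i,a},1)+(1-\lambda)\, g(w^{i,a},0)=\lambda\, g_{i,(w,1)}(a)+(1-\lambda)\, g_{i,(w,0)}(a),\\
\operatorname{Ent}_{\nu_p}(G_{i,w})&\leq\lambda\operatorname{Ent}_{\nu_p}(g_{i,(w,1)})+(1-\lambda)\operatorname{Ent}_{\nu_p}(g_{i,(w,0)}).
\end{align*}
The right-hand side of the second line is the $\nu_p$-average over $c$ of $\operatorname{Ent}_{\nu_p}(g_{i,(w,c)})$. Averaging over $w=\pi_{-i}(Z)$ under $\rho$ then produces $\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr]$, because under $\mu_p=\rho\otimes\nu_p$ the vector $\pi_{-i}(X)$ has law $\nu_p^{\otimes(n-1)}$, which is the law of $(w,c)$ when $c$ is drawn from $\nu_p$ independently of $w$.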
The remaining term in the decomposition is exactly the fibre entropy in the last coordinate:
\begin{align*}
\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(g(z,\cdot))\bigr]=\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{n,\pi_{-n}(X)})\bigr].
\end{align*}
Combining the decomposition, the induction hypothesis, and the convexity comparison proves
\begin{align*}
\operatorname{Ent}_{\mu_p}(g)\leq\sum_{i=1}^n\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr].
\end{align*}
This is the finite product entropy tensorization inequality used below.
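For orientation, in the smallest non-trivial case $n=2$ the inequality reads as follows, where $X=(X_1,X_2)$ and the coordinate names $X_1,X_2$ are notation introduced here:
\begin{align*}
\operatorname{Ent}_{\nu_p\otimes\nu_p}(g)\leq\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g(\cdot,X_2))\bigr]+\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g(X_1,\cdot))\bigr].
\end{align*}
The first term freezes the second coordinate and takes the entropy in the first, and the second term does the reverse.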
[/guided]
[/step]
[step:Apply tensorization to $f^2$]
Define $g: \{0,1\}^n \to [0,\infty)$ by $g(x)=f(x)^2$. Applying the finite tensorization estimate proved in the previous step to this $g$ gives
\begin{align*}
\operatorname{Ent}_{\mu_p}(f^2)\leq\sum_{i=1}^n\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(f_{i,\pi_{-i}(X)}^2)\bigr].
\end{align*}
Here $f_{i,\pi_{-i}(X)}^2$ denotes the function $b \mapsto f_{i,\pi_{-i}(X)}(b)^2$ on $\{0,1\}$.
[/step]
[step:Use the two-point logarithmic Sobolev inequality on each fibre]
Fix $i \in \{1,\dots,n\}$ and $y \in \{0,1\}^{n-1}$. The function $f_{i,y}: \{0,1\} \to \mathbb{R}$ is a function on the two-point Bernoulli space. By the defining property of the constant $C_p$ stated in the theorem,
\begin{align*}
\operatorname{Ent}_{\nu_p}(f_{i,y}^2)\leq C_p\bigl(f_{i,y}(1)-f_{i,y}(0)\bigr)^2.
\end{align*}
Using the fibre identity from the first step,
\begin{align*}
\operatorname{Ent}_{\nu_p}(f_{i,y}^2)\leq C_p (D_i f(y^{i,0}))^2.
\end{align*}
Since $(D_i f)^2$ is constant on the fibre over $y$, averaging over $y$ with respect to $\nu_p^{\otimes(n-1)}$ gives
\begin{align*}
\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(f_{i,\pi_{-i}(X)}^2)\bigr]\leq C_p \mathbb{E}_{\mu_p}\bigl[(D_i f)^2\bigr].
\end{align*}
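The averaging step can be unpacked as follows; this is a sketch in which $Y$ denotes a random vector with law $\nu_p^{\otimes(n-1)}$, a name introduced here. Since $\pi_{-i}(X)$ has law $\nu_p^{\otimes(n-1)}$ under $\mu_p$ and $(D_i f)^2$ takes the same value at $Y^{i,0}$ and $Y^{i,1}$,
\begin{align*}
\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(f_{i,\pi_{-i}(X)}^2)\bigr]&=\mathbb{E}\bigl[\operatorname{Ent}_{\nu_p}(f_{i,Y}^2)\bigr]\leq C_p\,\mathbb{E}\bigl[(D_i f(Y^{i,0}))^2\bigr],\\
\mathbb{E}_{\mu_p}\bigl[(D_i f)^2\bigr]&=\mathbb{E}\bigl[\nu_p(\{1\})(D_i f(Y^{i,1}))^2+\nu_p(\{0\})(D_i f(Y^{i,0}))^2\bigr]=\mathbb{E}\bigl[(D_i f(Y^{i,0}))^2\bigr].
\end{align*}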
[/step]
[step:Sum the coordinate bounds to obtain the product inequality]
Substituting the fibrewise estimate into the tensorized entropy bound yields
\begin{align*}
\operatorname{Ent}_{\mu_p}(f^2)\leq\sum_{i=1}^n C_p \mathbb{E}_{\mu_p}\bigl[(D_i f)^2\bigr].
\end{align*}
Since $C_p$ is independent of the coordinate $i$, this is exactly
\begin{align*}
\operatorname{Ent}_{\mu_p}(f^2)\leq C_p \sum_{i=1}^n \mathbb{E}_{\mu_p}\bigl[(D_i f)^2\bigr].
\end{align*}
This proves the discrete tensorized logarithmic Sobolev inequality on the Bernoulli cube.
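As a quick illustration of how the bound is used (an example added here, not taken from the source), let $f(x)=x_1+\dots+x_n$. Then $D_i f\equiv 1$ for every $i$, and the inequality gives
\begin{align*}
\operatorname{Ent}_{\mu_p}(f^2)\leq C_p\sum_{i=1}^n\mathbb{E}_{\mu_p}[1]=nC_p.
\end{align*}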
[/step]