Androma — The Home of Mathematics on the Internet


[step:Derive finite entropy tensorization by conditioning one coordinate]We prove the finite product entropy tensorization inequality for a non-negative function $g: \{0,1\}^n \to [0,\infty)$. Let $X: \{0,1\}^n \to \{0,1\}^n$ be the identity random vector under $\mu_p$. For $i \in \{1,\dots,n\}$ and $y \in \{0,1\}^{n-1}$, define $g_{i,y}: \{0,1\} \to [0,\infty)$ by $g_{i,y}(b)=g(y^{i,b})$ for $b \in \{0,1\}$. For a finite probability measure $\rho$ and a non-negative function $u$ on its support, define $\operatorname{Ent}_{\rho}(u)=\mathbb{E}_{\rho}[u\log u]-\mathbb{E}_{\rho}[u]\log \mathbb{E}_{\rho}[u]$, with $0\log 0:=0$. We prove the following estimate by induction on $n$: \begin{align*} \operatorname{Ent}_{\mu_p}(g)\leq \sum_{i=1}^n \mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr]. \end{align*} For $n=1$ the right-hand side is $\operatorname{Ent}_{\nu_p}(g)$, so the estimate is an equality. Assume the estimate holds in dimension $n-1$. Write a point as $(z,b)\in\{0,1\}^{n-1}\times\{0,1\}$, let $\rho=\nu_p^{\otimes(n-1)}$, so that $\mu_p=\rho\otimes\nu_p$, and define $G:\{0,1\}^{n-1}\to[0,\infty)$ by $G(z)=\mathbb{E}_{\nu_p}[g(z,\cdot)]$. Directly expanding the definition of entropy gives the exact decomposition \begin{align*} \operatorname{Ent}_{\rho\otimes\nu_p}(g)=\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(g(z,\cdot))\bigr]+\operatorname{Ent}_{\rho}(G). \end{align*} For each $i \in \{1,\dots,n-1\}$ and $w \in \{0,1\}^{n-2}$, let $w^{i,a} \in \{0,1\}^{n-1}$ denote the point whose deleted-coordinate vector is $w$ and whose $i$-th coordinate is $a \in \{0,1\}$. Define the fibre function $G_{i,w}: \{0,1\} \to [0,\infty)$ by $G_{i,w}(a)=G(w^{i,a})$ for $a \in \{0,1\}$. Applying the induction hypothesis to $G$ gives \begin{align*} \operatorname{Ent}_{\rho}(G)\leq\sum_{i=1}^{n-1}\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr], \end{align*} where $Z:\{0,1\}^{n-1}\to\{0,1\}^{n-1}$ is the identity random vector under $\rho$.
For each $i\leq n-1$, the fibre function satisfies $G_{i,w}(a)=\mathbb{E}_{\nu_p}[g(w^{i,a},\cdot)]$, so $G_{i,\pi_{-i}(Z)}$ is the $\nu_p$-average over the last coordinate of the corresponding two-point fibres of $g$. The entropy functional is convex on non-negative functions on $\{0,1\}$: for non-negative functions $u_0,u_1: \{0,1\} \to [0,\infty)$ and $\lambda\in[0,1]$, \begin{align*} \operatorname{Ent}_{\nu_p}(\lambda u_1+(1-\lambda)u_0)\leq \lambda\operatorname{Ent}_{\nu_p}(u_1)+(1-\lambda)\operatorname{Ent}_{\nu_p}(u_0). \end{align*} This two-point convexity inequality includes the boundary cases because all functions are non-negative and the entropy convention is $0\log 0:=0$. Applying it with $\lambda=\nu_p(\{1\})$ to the average over the last coordinate, and then taking the expectation over the remaining coordinates, gives \begin{align*} \mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr]\leq\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr]. \end{align*} The first term of the decomposition is exactly the $i=n$ summand: \begin{align*} \mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(g(z,\cdot))\bigr]=\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{n,\pi_{-n}(X)})\bigr]. \end{align*} Combining these estimates with the decomposition proves tensorization.[/step]
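The two ingredients of the step above, the exact decomposition and the two-point convexity inequality, can be checked numerically on small examples. The following is a minimal sketch, not part of the proof. It assumes that $\nu_p$ puts mass $p$ on $1$ and $1-p$ on $0$. The helper names `xlogx`, `ent`, `bernoulli_product`, `decomposition_gap` and `convexity_gap` are hypothetical and introduced only for illustration.

```python
import itertools
import math
import random


def xlogx(t):
    """t * log(t) with the convention 0 log 0 := 0."""
    return t * math.log(t) if t > 0 else 0.0


def ent(weights, values):
    """Ent(u) = E[u log u] - E[u] log E[u] for a finite probability measure."""
    mean = sum(w * v for w, v in zip(weights, values))
    return sum(w * xlogx(v) for w, v in zip(weights, values)) - xlogx(mean)


def bernoulli_product(p, n):
    """Masses of the n-fold product measure on {0,1}^n (assumes nu_p({1}) = p)."""
    cube = itertools.product((0, 1), repeat=n)
    return {x: math.prod(p if b else 1 - p for b in x) for x in cube}


def decomposition_gap(g, p, n):
    """Ent_{rho x nu_p}(g) - (E_rho[Ent_{nu_p}(g(z, .))] + Ent_rho(G))."""
    nu = (1 - p, p)  # masses of b = 0 and b = 1
    mu = bernoulli_product(p, n)
    rho = bernoulli_product(p, n - 1)
    lhs = ent([mu[x] for x in mu], [g[x] for x in mu])
    # Fibre entropy in the last coordinate, averaged over z.
    fibre = sum(rho[z] * ent(nu, [g[z + (0,)], g[z + (1,)]]) for z in rho)
    # Entropy of the averaged function G(z) = E_{nu_p}[g(z, .)].
    G = {z: nu[0] * g[z + (0,)] + nu[1] * g[z + (1,)] for z in rho}
    averaged = ent([rho[z] for z in rho], [G[z] for z in rho])
    return lhs - (fibre + averaged)


def convexity_gap(u0, u1, lam, p):
    """lam*Ent(u1) + (1-lam)*Ent(u0) - Ent(lam*u1 + (1-lam)*u0) on {0,1}."""
    nu = (1 - p, p)
    mix = [lam * a + (1 - lam) * b for a, b in zip(u1, u0)]
    return lam * ent(nu, u1) + (1 - lam) * ent(nu, u0) - ent(nu, mix)


if __name__ == "__main__":
    random.seed(0)
    n, p = 3, 0.3
    # Random non-negative g with some exact zeros to exercise 0 log 0.
    g = {x: random.choice([0.0, random.random()])
         for x in itertools.product((0, 1), repeat=n)}
    print("decomposition gap:", decomposition_gap(g, p, n))
    print("convexity gap:", convexity_gap((0.0, 2.0), (1.0, 0.0), 0.25, p))
```

Up to floating-point rounding, the decomposition gap should vanish and the convexity gap should be non-negative. A numerical check of this kind illustrates the statements on specific inputs and does not replace the argument.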


[guided]We prove the tensorization estimate rather than citing it. Because the cube is finite, all functions are measurable and all expectations are finite sums. Let $X: \{0,1\}^n \to \{0,1\}^n$ be the identity random vector under $\mu_p$. For $i \in \{1,\dots,n\}$ and $y \in \{0,1\}^{n-1}$, define $g_{i,y}: \{0,1\} \to [0,\infty)$ by $g_{i,y}(b)=g(y^{i,b})$ for $b \in \{0,1\}$. For a finite probability measure $\rho$ and a non-negative function $u$ on its support, define \begin{align*} \operatorname{Ent}_{\rho}(u)=\mathbb{E}_{\rho}[u\log u]-\mathbb{E}_{\rho}[u]\log \mathbb{E}_{\rho}[u], \end{align*} with the convention $0\log 0:=0$. We prove by induction on $n$ that \begin{align*} \operatorname{Ent}_{\mu_p}(g)\leq \sum_{i=1}^n \mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr]. \end{align*} When $n=1$, the right-hand side has one term and is exactly $\operatorname{Ent}_{\nu_p}(g)$, so the estimate is an equality. Assume the estimate is known in dimension $n-1$. Write a point of the cube as $(z,b)\in\{0,1\}^{n-1}\times\{0,1\}$, let $\rho=\nu_p^{\otimes(n-1)}$, so that $\mu_p=\rho\otimes\nu_p$, and define the averaged function $G:\{0,1\}^{n-1}\to[0,\infty)$ by \begin{align*} G(z)=\mathbb{E}_{\nu_p}[g(z,\cdot)]. \end{align*} Expanding the definition of entropy separates the last coordinate from the first $n-1$ coordinates: \begin{align*} \operatorname{Ent}_{\rho\otimes\nu_p}(g)=\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(g(z,\cdot))\bigr]+\operatorname{Ent}_{\rho}(G). \end{align*} This identity follows by writing both sides as finite sums: the terms $\mathbb{E}_{\rho\otimes\nu_p}[g\log g]$ agree, the intermediate term $\mathbb{E}_{\rho}[G\log G]$ appears on the right-hand side once with each sign and cancels, and the remaining term is $-\mathbb{E}_{\rho}[G]\log\mathbb{E}_{\rho}[G]=-\mathbb{E}_{\rho\otimes\nu_p}[g]\log\mathbb{E}_{\rho\otimes\nu_p}[g]$, because $\mathbb{E}_{\rho}[G]=\mathbb{E}_{\rho\otimes\nu_p}[g]$. Now apply the induction hypothesis to $G$ on the $(n-1)$-dimensional Bernoulli cube.
For each $i \in \{1,\dots,n-1\}$ and $w \in \{0,1\}^{n-2}$, let $w^{i,a} \in \{0,1\}^{n-1}$ denote the point whose deleted-coordinate vector is $w$ and whose $i$-th coordinate is $a \in \{0,1\}$. Define the fibre function $G_{i,w}: \{0,1\} \to [0,\infty)$ by \begin{align*} G_{i,w}(a)=G(w^{i,a}). \end{align*} If $Z:\{0,1\}^{n-1}\to\{0,1\}^{n-1}$ is the identity random vector under $\rho$, then \begin{align*} \operatorname{Ent}_{\rho}(G)\leq\sum_{i=1}^{n-1}\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr]. \end{align*} For a fixed coordinate $i\leq n-1$ and fixed outside coordinates, $G_{i,\pi_{-i}(Z)}$ is obtained by averaging, with respect to the last Bernoulli coordinate, the corresponding two-point fibre functions of $g$. It remains to justify that this averaging cannot increase the entropy term. The entropy functional is convex on non-negative functions on the two-point space: for non-negative functions $u_0,u_1:\{0,1\}\to[0,\infty)$ and $\lambda\in[0,1]$, \begin{align*} \operatorname{Ent}_{\nu_p}(\lambda u_1+(1-\lambda)u_0)\leq \lambda\operatorname{Ent}_{\nu_p}(u_1)+(1-\lambda)\operatorname{Ent}_{\nu_p}(u_0). \end{align*} The boundary cases in this inequality are covered by the hypotheses $u_0,u_1\geq 0$ and the convention $0\log 0:=0$ in the entropy definition. Applying this convexity inequality with $\lambda=\nu_p(\{1\})$ to the average over the last coordinate, and then taking the expectation over the remaining coordinates, gives, for each $i\leq n-1$, \begin{align*} \mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr]\leq\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr]. \end{align*} The remaining term in the decomposition is exactly the fibre entropy in the last coordinate: \begin{align*} \mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(g(z,\cdot))\bigr]=\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{n,\pi_{-n}(X)})\bigr]. \end{align*}
Combining the decomposition, the induction hypothesis, and the convexity comparison proves \begin{align*} \operatorname{Ent}_{\mu_p}(g)\leq\sum_{i=1}^n\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr]. \end{align*} This is the finite product entropy tensorization inequality used below.[/guided]
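The final inequality can also be checked by brute force on a small cube. The sketch below is an illustration rather than a proof. It assumes that $\nu_p$ puts mass $p$ on $1$, that $\mu_p=\nu_p^{\otimes n}$, and that $g_{i,\pi_{-i}(x)}$ is the two-point function $b\mapsto g(x \text{ with } x_i=b)$. The function name `tensorization_sides` and the helpers are hypothetical.

```python
import itertools
import math
import random


def xlogx(t):
    """t * log(t) with the convention 0 log 0 := 0."""
    return t * math.log(t) if t > 0 else 0.0


def ent(weights, values):
    """Ent(u) = E[u log u] - E[u] log E[u] for a finite probability measure."""
    mean = sum(w * v for w, v in zip(weights, values))
    return sum(w * xlogx(v) for w, v in zip(weights, values)) - xlogx(mean)


def tensorization_sides(g, p, n):
    """Return (Ent_{mu_p}(g), sum_i E_{mu_p}[Ent_{nu_p}(g_{i, pi_{-i}(X)})])."""
    nu = (1 - p, p)  # masses of b = 0 and b = 1 (assumed convention)
    points = list(itertools.product((0, 1), repeat=n))
    mu = {x: math.prod(nu[b] for b in x) for x in points}
    lhs = ent([mu[x] for x in points], [g[x] for x in points])
    rhs = 0.0
    for i in range(n):
        for x in points:
            # Two-point fibre through x in coordinate i.
            fibre = [g[x[:i] + (b,) + x[i + 1:]] for b in (0, 1)]
            rhs += mu[x] * ent(nu, fibre)
    return lhs, rhs


if __name__ == "__main__":
    random.seed(0)
    n, p = 3, 0.3
    # Random non-negative g, including exact zeros.
    g = {x: random.choice([0.0, random.random(), 5 * random.random()])
         for x in itertools.product((0, 1), repeat=n)}
    lhs, rhs = tensorization_sides(g, p, n)
    print("Ent(g) =", lhs, " sum of fibre entropies =", rhs)
    print("inequality holds:", lhs <= rhs + 1e-12)
```

For $n=1$ the two returned values should coincide up to rounding, matching the base case of the induction.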
