Discrete Tensorized Logarithmic Sobolev Inequality on the Bernoulli Cube (Theorem # 6755)

Proof

[proofplan] We prove the product inequality by decomposing entropy one coordinate at a time. The key finite-space fact is entropy tensorization: the entropy under a product measure is bounded by the sum of the expected conditional one-coordinate entropies. Once this is applied to $f^2$, each conditional entropy is exactly the entropy of a function on the two-point Bernoulli space, so the assumed [two-point logarithmic Sobolev inequality](/theorems/6752) gives the desired coordinate derivative term. [/proofplan]

[step:Define the coordinate fibres and conditional functions] For each $i \in \{1,\dots,n\}$, define the coordinate-deletion map $\pi_{-i}: \{0,1\}^n \to \{0,1\}^{n-1}$ by removing the $i$-th coordinate. For $y \in \{0,1\}^{n-1}$, let $y^{i,b} \in \{0,1\}^n$ denote the point whose deleted-coordinate vector is $y$ and whose $i$-th coordinate is $b \in \{0,1\}$. For $x \in \{0,1\}^n$, write $x^{i,b}$ for $(\pi_{-i}(x))^{i,b}$, that is, $x$ with its $i$-th coordinate set to $b$. For each $i \in \{1,\dots,n\}$, the discrete derivative $D_i f: \{0,1\}^n \to \mathbb{R}$ is defined by $D_i f(x)=f(x^{i,1})-f(x^{i,0})$. For each $i \in \{1,\dots,n\}$ and $y \in \{0,1\}^{n-1}$, define the fibre function $f_{i,y}: \{0,1\} \to \mathbb{R}$ by $f_{i,y}(b)=f(y^{i,b})$ for $b \in \{0,1\}$. Then \begin{align*} f_{i,y}(1)-f_{i,y}(0)=D_i f(y^{i,0})=D_i f(y^{i,1}). \end{align*} Thus $D_i f$, and hence $(D_i f)^2$, is constant on each $i$-th coordinate fibre. [/step]

[step:Derive finite entropy tensorization by conditioning one coordinate] We prove the finite product entropy tensorization inequality for a non-negative function $g: \{0,1\}^n \to [0,\infty)$. Let $X: \{0,1\}^n \to \{0,1\}^n$ be the identity random vector under $\mu_p$. For $i \in \{1,\dots,n\}$ and $y \in \{0,1\}^{n-1}$, define $g_{i,y}: \{0,1\} \to [0,\infty)$ by $g_{i,y}(b)=g(y^{i,b})$ for $b \in \{0,1\}$. For a finite probability measure $\rho$ and a non-negative function $u$ on its support, define $\operatorname{Ent}_{\rho}(u)=\mathbb{E}_{\rho}[u\log u]-\mathbb{E}_{\rho}[u]\log \mathbb{E}_{\rho}[u]$, with $0\log 0:=0$.
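As a numerical companion to the definitions so far, the following Python sketch implements the entropy functional with the convention $0\log 0:=0$, the points $y^{i,b}$, the fibre functions, and the discrete derivative. All function names (`ent`, `insert_coord`, `fibre`, `discrete_derivative`) and the sample function `f` are hypothetical and chosen only for illustration; points of the cube are represented as tuples and coordinates are 1-based as in the text.

```python
import math

def xlogx(t):
    # Convention from the text: 0 log 0 := 0.
    return 0.0 if t == 0 else t * math.log(t)

def ent(probs, values):
    """Ent_rho(u) = E[u log u] - E[u] log E[u] for a finite measure given by probs."""
    mean = sum(q * v for q, v in zip(probs, values))
    return sum(q * xlogx(v) for q, v in zip(probs, values)) - xlogx(mean)

def insert_coord(y, i, b):
    """Return y^{i,b}: the i-th coordinate (1-based) is b, the others are y."""
    return y[: i - 1] + (b,) + y[i - 1:]

def fibre(g, i, y):
    """Values [g_{i,y}(0), g_{i,y}(1)] of the fibre function b -> g(y^{i,b})."""
    return [g(insert_coord(y, i, 0)), g(insert_coord(y, i, 1))]

def discrete_derivative(f, i, x):
    """D_i f(x) = f(x^{i,1}) - f(x^{i,0})."""
    y = x[: i - 1] + x[i:]          # pi_{-i}(x): delete the i-th coordinate
    lo, hi = fibre(f, i, y)
    return hi - lo

# Illustrative function on {0,1}^3 (an assumption, not from the source).
f = lambda x: x[0] + 2.0 * x[1] * x[2]
y = (1, 1)
d0 = discrete_derivative(f, 2, insert_coord(y, 2, 0))
d1 = discrete_derivative(f, 2, insert_coord(y, 2, 1))
print(d0 == d1)  # D_2 f takes the same value at both points of the fibre over y
```

The last two lines illustrate the fibre identity from the first step: the derivative in coordinate $i$ does not depend on the value of the $i$-th coordinate.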
The following finite tensorization estimate is proved by induction on $n$: \begin{align*} \operatorname{Ent}_{\mu_p}(g)\leq \sum_{i=1}^n \mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr]. \end{align*} For $n=1$ this is equality. Assume the estimate holds for $n-1$. Write a point as $(z,b)\in\{0,1\}^{n-1}\times\{0,1\}$, let $\rho=\nu_p^{\otimes(n-1)}$, and define $G:\{0,1\}^{n-1}\to[0,\infty)$ by $G(z)=\mathbb{E}_{\nu_p}[g(z,\cdot)]$. Directly expanding the definition of entropy gives the exact decomposition \begin{align*} \operatorname{Ent}_{\rho\otimes\nu_p}(g)=\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(g(z,\cdot))\bigr]+\operatorname{Ent}_{\rho}(G). \end{align*} For each $i \in \{1,\dots,n-1\}$ and $w \in \{0,1\}^{n-2}$, let $w^{i,a} \in \{0,1\}^{n-1}$ denote the point whose deleted-coordinate vector is $w$ and whose $i$-th coordinate is $a \in \{0,1\}$. Define the fibre function $G_{i,w}: \{0,1\} \to [0,\infty)$ by $G_{i,w}(a)=G(w^{i,a})$ for $a \in \{0,1\}$. Applying the induction hypothesis to $G$ gives \begin{align*} \operatorname{Ent}_{\rho}(G)\leq\sum_{i=1}^{n-1}\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr], \end{align*} where $Z:\{0,1\}^{n-1}\to\{0,1\}^{n-1}$ is the identity random vector under $\rho$. For each $i\leq n-1$, the function $G_{i,\pi_{-i}(Z)}$ is the $\nu_p$-average in the last coordinate of the corresponding two-point fibre of $g$. The entropy functional is convex on non-negative functions on $\{0,1\}$: for non-negative functions $u_0,u_1: \{0,1\} \to [0,\infty)$ and $\lambda\in[0,1]$, \begin{align*} \operatorname{Ent}_{\nu_p}(\lambda u_1+(1-\lambda)u_0)\leq \lambda\operatorname{Ent}_{\nu_p}(u_1)+(1-\lambda)\operatorname{Ent}_{\nu_p}(u_0). \end{align*} This finite two-point convexity inequality includes the boundary cases because all functions are non-negative and the entropy convention is $0\log 0:=0$. 
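The two-point convexity inequality just stated can be spot-checked numerically. The sketch below is a sanity check and not a proof: it draws random non-negative pairs $u_0,u_1$, including zero values to exercise the $0\log 0:=0$ convention, and records the smallest observed value of the right-hand side minus the left-hand side. The names `ent2` and `convexity_gap` are hypothetical, and the code assumes the convention $\nu_p(\{1\})=p$.

```python
import math
import random

def xlogx(t):
    # Convention from the text: 0 log 0 := 0.
    return 0.0 if t == 0 else t * math.log(t)

def ent2(p, u):
    """Two-point entropy Ent_{nu_p}(u) for u = (u(0), u(1)), assuming nu_p({1}) = p."""
    mean = (1.0 - p) * u[0] + p * u[1]
    return (1.0 - p) * xlogx(u[0]) + p * xlogx(u[1]) - xlogx(mean)

def convexity_gap(p, u0, u1, lam):
    """lam*Ent(u1) + (1-lam)*Ent(u0) - Ent(lam*u1 + (1-lam)*u0); convexity says >= 0."""
    mix = (lam * u1[0] + (1.0 - lam) * u0[0],
           lam * u1[1] + (1.0 - lam) * u0[1])
    return lam * ent2(p, u1) + (1.0 - lam) * ent2(p, u0) - ent2(p, mix)

rng = random.Random(0)
gaps = []
for _ in range(1000):
    p, lam = rng.random(), rng.random()
    # Mix in exact zeros so that boundary cases are sampled as well.
    u0 = (rng.choice([0.0, rng.uniform(0, 5)]), rng.uniform(0, 5))
    u1 = (rng.uniform(0, 5), rng.choice([0.0, rng.uniform(0, 5)]))
    gaps.append(convexity_gap(p, u0, u1, lam))
min_gap = min(gaps)
print(min_gap >= -1e-9)  # non-negative up to floating-point rounding
```

A small negative tolerance is used because the gap is computed in floating point; the inequality itself is exact.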
Applying this convexity inequality to the finite average in the last coordinate gives \begin{align*} \mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr]\leq\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr]. \end{align*} Combining these estimates with the decomposition proves tensorization. [guided] We prove the tensorization estimate rather than citing it. Because the cube is finite, all functions are measurable and all expectations are finite sums. Let $X: \{0,1\}^n \to \{0,1\}^n$ be the identity random vector under $\mu_p$. For $i \in \{1,\dots,n\}$ and $y \in \{0,1\}^{n-1}$, define $g_{i,y}: \{0,1\} \to [0,\infty)$ by $g_{i,y}(b)=g(y^{i,b})$ for $b \in \{0,1\}$. For a finite probability measure $\rho$ and a non-negative function $u$ on its support, define \begin{align*} \operatorname{Ent}_{\rho}(u)=\mathbb{E}_{\rho}[u\log u]-\mathbb{E}_{\rho}[u]\log \mathbb{E}_{\rho}[u], \end{align*} with the convention $0\log 0:=0$. We prove by induction on $n$ that \begin{align*} \operatorname{Ent}_{\mu_p}(g)\leq \sum_{i=1}^n \mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr]. \end{align*} When $n=1$, the right-hand side has one term and is exactly $\operatorname{Ent}_{\nu_p}(g)$, so the estimate is equality. Assume the estimate is known in dimension $n-1$. Write a point of the cube as $(z,b)\in\{0,1\}^{n-1}\times\{0,1\}$, let $\rho=\nu_p^{\otimes(n-1)}$, and define the averaged function $G:\{0,1\}^{n-1}\to[0,\infty)$ by \begin{align*} G(z)=\mathbb{E}_{\nu_p}[g(z,\cdot)]. \end{align*} Expanding the definition of entropy separates the last coordinate from the first $n-1$ coordinates: \begin{align*} \operatorname{Ent}_{\rho\otimes\nu_p}(g)=\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(g(z,\cdot))\bigr]+\operatorname{Ent}_{\rho}(G). 
\end{align*} This identity follows by writing both sides as finite sums: the terms $\mathbb{E}_{\rho\otimes\nu_p}[g\log g]$ agree, the intermediate term $\mathbb{E}_{\rho}[G\log G]$ appears once with each sign and cancels, and the remaining term is $-\mathbb{E}_{\rho\otimes\nu_p}[g]\log\mathbb{E}_{\rho\otimes\nu_p}[g]$ because $\mathbb{E}_{\rho}[G]=\mathbb{E}_{\rho\otimes\nu_p}[g]$. Now apply the induction hypothesis to $G$ on the $(n-1)$-dimensional Bernoulli cube. For each $i \in \{1,\dots,n-1\}$ and $w \in \{0,1\}^{n-2}$, let $w^{i,a} \in \{0,1\}^{n-1}$ denote the point whose deleted-coordinate vector is $w$ and whose $i$-th coordinate is $a \in \{0,1\}$. Define the fibre function $G_{i,w}: \{0,1\} \to [0,\infty)$ by \begin{align*} G_{i,w}(a)=G(w^{i,a}). \end{align*} If $Z:\{0,1\}^{n-1}\to\{0,1\}^{n-1}$ is the identity random vector under $\rho$, then \begin{align*} \operatorname{Ent}_{\rho}(G)\leq\sum_{i=1}^{n-1}\mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr]. \end{align*} For a fixed coordinate $i\leq n-1$ and fixed outside coordinates, $G_{i,\pi_{-i}(Z)}$ is obtained by averaging, with respect to the last Bernoulli coordinate, the corresponding two-point fibre functions of $g$. It remains to justify that this averaging cannot increase the entropy term. The entropy functional is convex on non-negative functions on the two-point space: for non-negative functions $u_0,u_1:\{0,1\}\to[0,\infty)$ and $\lambda\in[0,1]$, \begin{align*} \operatorname{Ent}_{\nu_p}(\lambda u_1+(1-\lambda)u_0)\leq \lambda\operatorname{Ent}_{\nu_p}(u_1)+(1-\lambda)\operatorname{Ent}_{\nu_p}(u_0). \end{align*} The boundary cases in this inequality are covered by the hypotheses $u_0,u_1\geq 0$ and the convention $0\log 0:=0$ in the entropy definition. To apply it, fix $w\in\{0,1\}^{n-2}$ and, for $b\in\{0,1\}$, let $u_b:\{0,1\}\to[0,\infty)$ be given by $u_b(a)=g(w^{i,a},b)$, which is the $i$-th coordinate fibre function of $g$ over the point $(w,b)$. Then $G_{i,w}=\lambda u_1+(1-\lambda)u_0$ with $\lambda=\nu_p(\{1\})$, so $\operatorname{Ent}_{\nu_p}(G_{i,w})$ is at most the $\nu_p$-average over $b$ of $\operatorname{Ent}_{\nu_p}(u_b)$. Averaging over $w$ gives, for each $i\leq n-1$, \begin{align*} \mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(G_{i,\pi_{-i}(Z)})\bigr]\leq\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr].
\end{align*} The remaining term in the decomposition is exactly the fibre entropy in the last coordinate: \begin{align*} \mathbb{E}_{\rho}\bigl[\operatorname{Ent}_{\nu_p}(g(z,\cdot))\bigr]=\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{n,\pi_{-n}(X)})\bigr]. \end{align*} Combining the decomposition, the induction hypothesis, and the convexity comparison proves \begin{align*} \operatorname{Ent}_{\mu_p}(g)\leq\sum_{i=1}^n\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(g_{i,\pi_{-i}(X)})\bigr]. \end{align*} This is the finite product entropy tensorization inequality used below. [/guided] [/step]

[step:Apply tensorization to $f^2$] Define $g: \{0,1\}^n \to [0,\infty)$ by $g(x)=f(x)^2$. Applying the finite tensorization estimate proved in the previous step to this $g$ gives \begin{align*} \operatorname{Ent}_{\mu_p}(f^2)\leq\sum_{i=1}^n\mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(f_{i,\pi_{-i}(X)}^2)\bigr]. \end{align*} Here $f_{i,\pi_{-i}(X)}^2$ denotes the function $b \mapsto f_{i,\pi_{-i}(X)}(b)^2$ on $\{0,1\}$. [/step]

[step:Use the two-point logarithmic Sobolev inequality on each fibre] Fix $i \in \{1,\dots,n\}$ and $y \in \{0,1\}^{n-1}$. The function $f_{i,y}: \{0,1\} \to \mathbb{R}$ is a function on the two-point Bernoulli space. By the defining property of the constant $C_p$ stated in the theorem, \begin{align*} \operatorname{Ent}_{\nu_p}(f_{i,y}^2)\leq C_p\bigl(f_{i,y}(1)-f_{i,y}(0)\bigr)^2. \end{align*} Using the fibre identity from the first step, \begin{align*} \operatorname{Ent}_{\nu_p}(f_{i,y}^2)\leq C_p (D_i f(y^{i,0}))^2. \end{align*} Since $(D_i f)^2$ is constant on the fibre over $y$, the right-hand side equals $C_p (D_i f(X))^2$ when $y=\pi_{-i}(X)$. Taking expectations under $\mu_p$ gives \begin{align*} \mathbb{E}_{\mu_p}\bigl[\operatorname{Ent}_{\nu_p}(f_{i,\pi_{-i}(X)}^2)\bigr]\leq C_p \mathbb{E}_{\mu_p}\bigl[(D_i f)^2\bigr].
\end{align*} [/step]

[step:Sum the coordinate bounds to obtain the product inequality] Substituting the fibrewise estimate into the tensorized entropy bound yields \begin{align*} \operatorname{Ent}_{\mu_p}(f^2)\leq\sum_{i=1}^n C_p \mathbb{E}_{\mu_p}\bigl[(D_i f)^2\bigr]. \end{align*} Since $C_p$ is independent of the coordinate $i$, this is exactly \begin{align*} \operatorname{Ent}_{\mu_p}(f^2)\leq C_p \sum_{i=1}^n \mathbb{E}_{\mu_p}\bigl[(D_i f)^2\bigr]. \end{align*} This proves the discrete tensorized logarithmic Sobolev inequality on the Bernoulli cube. [/step]
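
The chain of inequalities in the proof can be sanity-checked on a small cube. The Python sketch below is an illustration under stated assumptions, not part of the proof. It takes $\mu_p=\nu_p^{\otimes n}$ with the convention $\nu_p(\{1\})=p$, represents $f$ as a dictionary on $\{0,1\}^n$, and computes $\operatorname{Ent}_{\mu_p}(f^2)$, the tensorized bound $\sum_i \mathbb{E}_{\mu_p}[\operatorname{Ent}_{\nu_p}(f_{i,\pi_{-i}(X)}^2)]$, and the energy $\sum_i \mathbb{E}_{\mu_p}[(D_i f)^2]$. Because the value of $C_p$ is not restated in this proof, the code uses a stand-in `c`: the largest ratio $\operatorname{Ent}_{\nu_p}(f_{i,y}^2)/(f_{i,y}(1)-f_{i,y}(0))^2$ over the fibres of this particular $f$. The name `check_cube` and the random test function are hypothetical.

```python
import itertools
import math
import random

def xlogx(t):
    # Convention from the text: 0 log 0 := 0.
    return 0.0 if t == 0 else t * math.log(t)

def ent(probs, values):
    """Ent_rho(u) = E[u log u] - E[u] log E[u] for a finite measure given by probs."""
    mean = sum(q * v for q, v in zip(probs, values))
    return sum(q * xlogx(v) for q, v in zip(probs, values)) - xlogx(mean)

def check_cube(f, n, p):
    """Return (Ent_mu(f^2), tensorized bound, sum_i E[(D_i f)^2], fibrewise constant c)."""
    nu = [1.0 - p, p]                       # assumed convention: nu_p({1}) = p
    cube = list(itertools.product((0, 1), repeat=n))
    mu = {x: math.prod(nu[b] for b in x) for x in cube}
    lhs = ent([mu[x] for x in cube], [f[x] ** 2 for x in cube])
    tensor, energy, c = 0.0, 0.0, 0.0
    for i in range(n):                      # 0-based coordinate index
        for x in cube:
            x0 = x[:i] + (0,) + x[i + 1:]   # x with coordinate i set to 0
            x1 = x[:i] + (1,) + x[i + 1:]   # x with coordinate i set to 1
            fibre_ent = ent(nu, [f[x0] ** 2, f[x1] ** 2])
            diff2 = (f[x1] - f[x0]) ** 2    # (D_i f(x))^2, constant on the fibre
            tensor += mu[x] * fibre_ent
            energy += mu[x] * diff2
            if diff2 > 0:
                c = max(c, fibre_ent / diff2)
    return lhs, tensor, energy, c

rng = random.Random(0)
n, p = 4, 0.3
f = {x: rng.uniform(-1.0, 1.0) for x in itertools.product((0, 1), repeat=n)}
lhs, tensor, energy, c = check_cube(f, n, p)
print(lhs <= tensor + 1e-12)         # tensorization applied to f^2
print(tensor <= c * energy + 1e-12)  # fibrewise two-point bound, summed over i
```

The first comparison corresponds to the tensorization step and the second to the fibrewise and summation steps. Replacing `c` by any valid two-point constant $C_p$ from the theorem can only enlarge the right-hand side of the second comparison.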
