Gibbs' Inequality — Statement & Proof

Gibbs' Inequality (Theorem # 1629)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] The argument rests on the concave estimate $\ln x \le x - 1$, valid for all $x > 0$, with equality iff $x = 1$. Applied to $x = q_i / p_i$ on the support of $p$, this turns a weighted sum of log-ratios into a telescoping expression that reduces to at most $\sum q_i - \sum p_i \le 0$. Rearranging gives Gibbs' inequality. Equality forces both $q_i/p_i = 1$ on the support of $p$ and $\sum_{i : p_i \ne 0} q_i = 1$, which together imply $q = p$ on all indices. The conversion from $\ln$ to $\log_2$ is a positive-scalar multiplication that preserves the inequality. [/proofplan] [step:Reduce from $\log_2$ to $\ln$] For $x > 0$, $\log_2 x = \ln x / \ln 2$. Since $\ln 2 > 0$, multiplying both sides of the desired inequality by $\ln 2 > 0$ shows it is equivalent to \begin{align*} -\sum_{i=1}^n p_i \ln p_i \le -\sum_{i=1}^n p_i \ln q_i, \end{align*} with the same equality condition. We prove this reformulated inequality. We use the convention $0 \cdot \ln 0 := 0$ (justified by $\lim_{x \to 0^+} x \ln x = 0$) and $0 \cdot \ln q_i := 0$ even when $q_i = 0$. To avoid pathologies when $q_i = 0$, set \begin{align*} I := \{i \in \{1, \ldots, n\} : p_i \ne 0\}, \end{align*} and observe that both sums are supported on $I$: \begin{align*} -\sum_{i=1}^n p_i \ln p_i = -\sum_{i \in I} p_i \ln p_i, \qquad -\sum_{i=1}^n p_i \ln q_i = -\sum_{i \in I} p_i \ln q_i. \end{align*} If there is any $i \in I$ with $q_i = 0$, then $p_i > 0$ and $p_i \ln q_i = p_i \cdot (-\infty) = -\infty$, so the right-hand side is $+\infty$ and the inequality holds vacuously; equality is then impossible since $q_i = 0 \ne p_i$. Assume therefore that $q_i > 0$ for every $i \in I$. [/step] [step:Record the elementary estimate $\ln x \le x - 1$] [claim:For all $x > 0$, $\ln x \le x - 1$, with equality iff $x = 1$] Define \begin{align*} \varphi : (0, \infty) &\to \mathbb{R}, \\ x &\mapsto (x - 1) - \ln x. \end{align*} Then $\varphi$ is differentiable on $(0, \infty)$ with $\varphi'(x) = 1 - 1/x$. Thus $\varphi'(x) < 0$ on $(0, 1)$, $\varphi'(1) = 0$, and $\varphi'(x) > 0$ on $(1, \infty)$. By the first-derivative test, $\varphi$ attains its global minimum on $(0, \infty)$ at $x = 1$, where $\varphi(1) = 0$. Hence $\varphi(x) \ge 0$ for all $x > 0$, with equality iff $x = 1$. Rearranging gives $\ln x \le x - 1$ with equality iff $x = 1$. [/claim] [proof][/proof] [/step] [step:Apply the estimate to $x = q_i / p_i$ and sum] For each $i \in I$, set $x_i := q_i / p_i > 0$. The claim of Step 2 gives \begin{align*} \ln \frac{q_i}{p_i} \le \frac{q_i}{p_i} - 1, \qquad i \in I, \end{align*} with equality iff $q_i = p_i$. Multiplying by $p_i > 0$ (which preserves the inequality and the equality condition) and summing over $i \in I$, \begin{align*} \sum_{i \in I} p_i \ln \frac{q_i}{p_i} \le \sum_{i \in I} p_i \left(\frac{q_i}{p_i} - 1\right) = \sum_{i \in I} q_i - \sum_{i \in I} p_i. \end{align*} Now $\sum_{i \in I} p_i = \sum_{i=1}^n p_i = 1$ (since $p_i = 0$ outside $I$), and $\sum_{i \in I} q_i \le \sum_{i=1}^n q_i = 1$ (since the summands $q_i \ge 0$ are non-negative). Substituting, \begin{align*} \sum_{i \in I} p_i \ln \frac{q_i}{p_i} \le \sum_{i \in I} q_i - 1 \le 1 - 1 = 0. \end{align*} [guided] We chose $x = q_i/p_i$ specifically to introduce both distributions into the inequality. The weight $p_i$ attached to the $i$-th term plays two roles: it guarantees the sum is supported on $I$ (keeping all ratios well-defined), and it cancels the denominator in $p_i(q_i/p_i - 1) = q_i - p_i$, turning the bound into a telescoping difference of two probability sums. The final $\sum_{i \in I} q_i \le 1$ uses only non-negativity of $q$ and the restriction of the summation to $I$; it becomes an equality exactly when $q$ is also supported in $I$. [/guided] [/step] [step:Rearrange the log-ratio sum into Gibbs' inequality] Split the logarithm: for $i \in I$, $\ln(q_i/p_i) = \ln q_i - \ln p_i$. Therefore \begin{align*} \sum_{i \in I} p_i \ln \frac{q_i}{p_i} = \sum_{i \in I} p_i \ln q_i - \sum_{i \in I} p_i \ln p_i. \end{align*} Using the support identities from Step 1, \begin{align*} \sum_{i \in I} p_i \ln q_i = \sum_{i=1}^n p_i \ln q_i, \qquad \sum_{i \in I} p_i \ln p_i = \sum_{i=1}^n p_i \ln p_i. \end{align*} Combining with the bound from Step 3, \begin{align*} \sum_{i=1}^n p_i \ln q_i - \sum_{i=1}^n p_i \ln p_i \le 0, \end{align*} which rearranges to \begin{align*} -\sum_{i=1}^n p_i \ln p_i \le -\sum_{i=1}^n p_i \ln q_i. \end{align*} By Step 1, this is the form of Gibbs' inequality with $\log_2$ replaced by $\ln$, hence establishes Gibbs' inequality. [/step] [step:Characterise the case of equality] Tracing the equality conditions through Steps 3 and 1: **Equality in Step 3 requires both:** (a) $q_i / p_i = 1$ for every $i \in I$, i.e., $q_i = p_i$ on the support of $p$; and (b) $\sum_{i \in I} q_i = \sum_{i=1}^n q_i$, i.e., $q_i = 0$ for every $i \notin I$. **Conclusion.** Conditions (a) and (b) together say: $q_i = p_i$ for every $i \in I$, and $q_i = 0 = p_i$ for every $i \notin I$ (using the definition of $I$). Hence $q_i = p_i$ for all $i \in \{1, \ldots, n\}$. Conversely, if $q_i = p_i$ for all $i$, then both sides of Gibbs' inequality are the (entropy) value $-\sum p_i \log p_i$, giving equality. Equality therefore holds if and only if $p_i = q_i$ for all $i \in \{1, \ldots, n\}$, completing the proof. [guided] Equality in the Gibbs inequality comes from two sources, both of which must saturate. First, the pointwise estimate $\ln x \le x - 1$ is an equality only at $x = 1$, forcing $q_i = p_i$ on the support $I$ of $p$. Second, the final estimate $\sum_{i \in I} q_i \le \sum_{i=1}^n q_i = 1$ is an equality only when $q_i = 0$ outside $I$. Both must hold simultaneously, and together they pin down $q$ completely: $q = p$ on $I$ and $q = 0 = p$ off $I$. Each condition alone is not enough — one could imagine $q = p$ on the support of $p$ but $q$ placing mass outside that support, which would violate normalisation of $q$. This is why the theorem states the equality condition as the single sharp statement $p = q$. [/guided] [/step]

Explore Further

Structure of Optimal Codes Information Theory Cyclic Codes Are Ideals Information Theory Factorisation for Odd $n$ Information Theory Fano's Inequality Information Theory Characterisation of the Error Locator Polynomial Information Theory Basis from Generator Polynomial Information Theory Minimum Distance and Column Independence Information Theory Shannon's First Coding Theorem Information Theory

What brings you to Androma?

Start with a route through the knowledge graph.

Gibbs' Inequality (Theorem # 1629)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Gibbs' Inequality (Theorem # 1629)

Discussion

Proof

Explore Further