Lovász Extension Convexity Criterion — Statement & Proof

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We prove first that [submodularity](/page/Submodular%20Function) makes the [Lovász extension](/page/Lovasz%20Extension) a support function. For each $x \in \mathbb{R}^E$, the greedy vector obtained from a decreasing ordering of the coordinates of $x$ lies in the [base polyhedron](/page/Base%20Polyhedron) of $f$ and maximizes the linear functional $y \mapsto \sum_{e\in E}x_e y_e$ there. This gives $\hat f(x)$ as a maximum of linear functions, hence convex. Conversely, if $\hat f$ is convex, we apply convexity to indicator vectors and evaluate the Lovász extension at the common midpoint of $\mathbb{1}_A,\mathbb{1}_B$ and $\mathbb{1}_{A\cup B},\mathbb{1}_{A\cap B}$ to recover the submodular inequality. [/proofplan] [step:Separate the empty ground set case] If $E=\varnothing$, then the only subsets of $E$ are $\varnothing$ itself. Hence submodularity reduces to \begin{align*} f(\varnothing)+f(\varnothing)\geq f(\varnothing)+f(\varnothing), \end{align*} which holds. Also $\mathbb{R}^E$ is the one-point [vector space](/page/Vector%20Space) with unique element $0_E:\varnothing\to\mathbb{R}$, and the Lovász extension satisfies $\hat f(0_E)=f(\varnothing)=0$. Therefore $\hat f$ is convex on $\mathbb{R}^E$. Thus the theorem holds when $E=\varnothing$, so for the rest of the proof assume $E\neq\varnothing$. [guided] When $E=\varnothing$, there is exactly one subset of $E$, namely $\varnothing$. The submodular inequality must therefore be checked only for the pair $A=B=\varnothing$, and it reads \begin{align*} f(\varnothing)+f(\varnothing)\geq f(\varnothing)+f(\varnothing). \end{align*} This inequality is an equality, so $f$ is submodular in the empty ground set case. The vector space $\mathbb{R}^E$ is also degenerate in this case: it consists of the single function $0_E:\varnothing\to\mathbb{R}$. The Lovász extension has only one value to assign, and by the normalization hypothesis it is \begin{align*} \hat f(0_E)=f(\varnothing)=0. \end{align*} A function on a one-point convex set is convex, because for the only possible choices $x=z=0_E$ and every $\lambda\in[0,1]$, the convexity inequality reduces to equality. Hence the theorem holds when $E=\varnothing$. We may therefore assume for the remainder of the proof that $E\neq\varnothing$. [/guided] [/step] [step:Define the base polyhedron and the greedy vector] Assume first that $f$ is [submodular](/page/Submodular%20Function). Let $n:=|E|$. For $y \in \mathbb{R}^E$ and $S \subset E$, define \begin{align*} y(S) := \sum_{e\in S} y_e, \end{align*} with $y(\varnothing)=0$. Define the [base polyhedron](/page/Base%20Polyhedron) associated to $f$ as the set $B(f)\subset\mathbb{R}^E$ whose elements are precisely those $y\in\mathbb{R}^E$ satisfying \begin{align*} y(E)&=f(E), \end{align*} and, for every $S\subset E$, \begin{align*} y(S) \leq f(S). \end{align*} Fix $x\in\mathbb{R}^E$, choose an ordering $e_1,\dots,e_n$ of $E$ such that \begin{align*} x_{e_1}\geq x_{e_2}\geq \cdots \geq x_{e_n}, \end{align*} and set $S_0:=\varnothing$ and $S_k:=\{e_1,\dots,e_k\}$ for $1\leq k\leq n$. Define the greedy vector $g_x\in\mathbb{R}^E$ by \begin{align*} (g_x)_{e_k}:=f(S_k)-f(S_{k-1}),\qquad 1\leq k\leq n. \end{align*} By the definition of the [Lovász extension](/page/Lovasz%20Extension), \begin{align*} \hat f(x)=\sum_{k=1}^{n}x_{e_k}(g_x)_{e_k}. \end{align*} This value is independent of the chosen ordering among tied coordinates: if $x_{e_a}=\cdots=x_{e_b}=c$ for a tied block, then the total contribution of that block is \begin{align*} c\sum_{k=a}^{b}\bigl(f(S_k)-f(S_{k-1})\bigr)=c\bigl(f(S_b)-f(S_{a-1})\bigr), \end{align*} which depends only on the set of tied elements in the block and not on their internal order. [/step] [step:Show that the greedy vector belongs to the base polyhedron] We prove $g_x\in B(f)$. First, telescoping gives \begin{align*} g_x(E)=\sum_{k=1}^{n}\bigl(f(S_k)-f(S_{k-1})\bigr)=f(E)-f(\varnothing)=f(E). \end{align*} It remains to prove $g_x(A)\leq f(A)$ for every $A\subset E$. Let $A\subset E$. Write $A=\{e_{k_1},\dots,e_{k_m}\}$ with $1\leq k_1<\cdots<k_m\leq n$, and define $A_0:=\varnothing$ and $A_j:=\{e_{k_1},\dots,e_{k_j}\}$ for $1\leq j\leq m$. For each $j$, we have $A_{j-1}\subset S_{k_j-1}$ and $e_{k_j}\notin S_{k_j-1}$. Submodularity applied to the sets $A_j=A_{j-1}\cup\{e_{k_j}\}$ and $S_{k_j-1}$ gives \begin{align*} f(A_j)+f(S_{k_j-1})\geq f(A_{j-1})+f(S_{k_j}). \end{align*} Rearranging, \begin{align*} f(S_{k_j})-f(S_{k_j-1})\leq f(A_j)-f(A_{j-1}). \end{align*} Summing over $1\leq j\leq m$ yields \begin{align*} g_x(A)=\sum_{j=1}^{m}(g_x)_{e_{k_j}}=\sum_{j=1}^{m}\bigl(f(S_{k_j})-f(S_{k_j-1})\bigr)\leq \sum_{j=1}^{m}\bigl(f(A_j)-f(A_{j-1})\bigr)=f(A)-f(\varnothing)=f(A). \end{align*} Thus $g_x\in B(f)$. [guided] Recall the objects being checked in this step. For $y\in\mathbb{R}^E$ and $S\subset E$, we write $y(S):=\sum_{e\in S}y_e$, with $y(\varnothing)=0$. The base polyhedron $B(f)\subset\mathbb{R}^E$ consists of the vectors $y\in\mathbb{R}^E$ such that $y(E)=f(E)$ and $y(S)\leq f(S)$ for every $S\subset E$. The greedy vector $g_x\in\mathbb{R}^E$ is defined along the sorted chain $S_k=\{e_1,\dots,e_k\}$ by $(g_x)_{e_k}:=f(S_k)-f(S_{k-1})$ for $1\leq k\leq n$. The constraint $g_x(E)=f(E)$ follows from the construction: the increments of $f$ along the chain \begin{align*} \varnothing=S_0\subset S_1\subset\cdots\subset S_n=E \end{align*} telescope: \begin{align*} g_x(E)=\sum_{k=1}^{n}\bigl(f(S_k)-f(S_{k-1})\bigr)=f(E)-f(\varnothing)=f(E). \end{align*} The nontrivial part is checking the inequalities $g_x(A)\leq f(A)$ for arbitrary $A\subset E$, since $A$ need not be one of the chain sets $S_k$. List the elements of $A$ in the order in which they appear in the sorted chain: \begin{align*} A=\{e_{k_1},\dots,e_{k_m}\},\qquad 1\leq k_1<\cdots<k_m\leq n. \end{align*} Define $A_0:=\varnothing$, and for $1\leq j\leq m$ define $A_j:=\{e_{k_1},\dots,e_{k_j}\}$. At the moment when the chain adds $e_{k_j}$, the larger set already present is $S_{k_j-1}$, while the part of $A$ already selected is only $A_{j-1}$. Since $A_{j-1}\subset S_{k_j-1}$ and $e_{k_j}\notin S_{k_j-1}$, submodularity applied to $A_j=A_{j-1}\cup\{e_{k_j}\}$ and $S_{k_j-1}$ gives \begin{align*} f(A_j)+f(S_{k_j-1})\geq f(A_j\cup S_{k_j-1})+f(A_j\cap S_{k_j-1}). \end{align*} Here $A_j\cup S_{k_j-1}=S_{k_j}$ and $A_j\cap S_{k_j-1}=A_{j-1}$, so the inequality becomes \begin{align*} f(A_j)+f(S_{k_j-1})\geq f(S_{k_j})+f(A_{j-1}). \end{align*} Rearranging, \begin{align*} f(S_{k_j})-f(S_{k_j-1})\leq f(A_j)-f(A_{j-1}). \end{align*} This says that the marginal contribution of $e_{k_j}$ along the larger chain is no bigger than its marginal contribution inside $A$ itself. Summing these inequalities gives \begin{align*} g_x(A)=\sum_{j=1}^{m}(g_x)_{e_{k_j}}=\sum_{j=1}^{m}\bigl(f(S_{k_j})-f(S_{k_j-1})\bigr)\leq \sum_{j=1}^{m}\bigl(f(A_j)-f(A_{j-1})\bigr)=f(A)-f(\varnothing)=f(A). \end{align*} Therefore $g_x$ satisfies every defining inequality of $B(f)$, and hence $g_x\in B(f)$. [/guided] [/step] [step:Prove the greedy vector maximizes the sorted linear functional] Let $y\in B(f)$. Since $S_0=\varnothing$, the identity $y_{e_k}=y(S_k)-y(S_{k-1})$ holds for $1\leq k\leq n$. Substituting this identity and telescoping the resulting finite sum gives the summation-by-parts formula \begin{align*} \sum_{k=1}^{n}x_{e_k}y_{e_k}=x_{e_n}y(E)+\sum_{k=1}^{n-1}(x_{e_k}-x_{e_{k+1}})y(S_k). \end{align*} Because $y\in B(f)$, we have $y(E)=f(E)$ and $y(S_k)\leq f(S_k)$ for $1\leq k\leq n-1$. Because the coordinates of $x$ are sorted in nonincreasing order, $x_{e_k}-x_{e_{k+1}}\geq 0$. Hence \begin{align*} \sum_{k=1}^{n}x_{e_k}y_{e_k}\leq x_{e_n}f(E)+\sum_{k=1}^{n-1}(x_{e_k}-x_{e_{k+1}})f(S_k). \end{align*} The right-hand side is the same summation-by-parts expansion applied to $g_x$, since $g_x(S_k)=f(S_k)$ for each chain set $S_k$ and $g_x(E)=f(E).$ Therefore \begin{align*} \sum_{e\in E}x_e y_e\leq \sum_{e\in E}x_e (g_x)_e=\hat f(x). \end{align*} Since $g_x\in B(f)$, this proves \begin{align*} \hat f(x)=\max_{y\in B(f)}\sum_{e\in E}x_e y_e. \end{align*} [guided] The goal is to prove that no vector $y\in B(f)$ gives a larger value of the linear functional \begin{align*} y\mapsto \sum_{e\in E}x_e y_e \end{align*} than the greedy vector $g_x$. The ordering $e_1,\dots,e_n$ was chosen so that the coefficients $x_{e_k}$ decrease, which lets us compare the chain sums $y(S_k)$ with $f(S_k)$ using only nonnegative weights. For $1\leq k\leq n$, the chain sets satisfy $S_k=S_{k-1}\cup\{e_k\}$, so the additivity of $y(S)$ over finite sums gives \begin{align*} y_{e_k}=y(S_k)-y(S_{k-1}). \end{align*} Substituting this identity into the linear functional gives \begin{align*} \sum_{k=1}^{n}x_{e_k}y_{e_k}=\sum_{k=1}^{n}x_{e_k}\bigl(y(S_k)-y(S_{k-1})\bigr). \end{align*} Expanding this finite sum and collecting the coefficient of each $y(S_k)$ gives the summation-by-parts identity \begin{align*} \sum_{k=1}^{n}x_{e_k}y_{e_k}=x_{e_n}y(E)+\sum_{k=1}^{n-1}(x_{e_k}-x_{e_{k+1}})y(S_k). \end{align*} Now we use the defining constraints of $B(f)$. Since $y\in B(f)$, we have $y(E)=f(E)$ and $y(S_k)\leq f(S_k)$ for each $1\leq k\leq n-1$. Since the coordinates are sorted as $x_{e_1}\geq\cdots\geq x_{e_n}$, every coefficient $x_{e_k}-x_{e_{k+1}}$ is nonnegative. Therefore replacing $y(S_k)$ by the larger quantity $f(S_k)$ preserves the inequality direction: \begin{align*} \sum_{k=1}^{n}x_{e_k}y_{e_k}\leq x_{e_n}f(E)+\sum_{k=1}^{n-1}(x_{e_k}-x_{e_{k+1}})f(S_k). \end{align*} The greedy vector has $g_x(E)=f(E)$ and, by telescoping along the initial segment of the chain, $g_x(S_k)=f(S_k)$ for each $1\leq k\leq n-1$. Applying the same summation-by-parts identity to $g_x$ therefore gives \begin{align*} x_{e_n}f(E)+\sum_{k=1}^{n-1}(x_{e_k}-x_{e_{k+1}})f(S_k)=\sum_{e\in E}x_e(g_x)_e. \end{align*} By the definition of the Lovász extension along the sorted order, the last expression is $\hat f(x)$. Hence every $y\in B(f)$ satisfies \begin{align*} \sum_{e\in E}x_e y_e\leq \sum_{e\in E}x_e(g_x)_e=\hat f(x). \end{align*} Since the previous step proved $g_x\in B(f)$, this upper bound is attained at $g_x$, and therefore \begin{align*} \hat f(x)=\max_{y\in B(f)}\sum_{e\in E}x_e y_e. \end{align*} [/guided] [/step] [step:Deduce convexity from the support function formula] Let $x,z\in\mathbb{R}^E$ and let $\lambda\in[0,1]$. Using the support formula proved above, \begin{align*} \hat f(\lambda x+(1-\lambda)z)=\max_{y\in B(f)}\sum_{e\in E}\bigl(\lambda x_e+(1-\lambda)z_e\bigr)y_e=\max_{y\in B(f)}\left(\lambda\sum_{e\in E}x_e y_e+(1-\lambda)\sum_{e\in E}z_e y_e\right)\leq \lambda\max_{y\in B(f)}\sum_{e\in E}x_e y_e +(1-\lambda)\max_{y\in B(f)}\sum_{e\in E}z_e y_e=\lambda\hat f(x)+(1-\lambda)\hat f(z). \end{align*} Thus $\hat f$ is convex. [guided] We now use the support-function formula obtained in the previous step: \begin{align*} \hat f(w)=\max_{y\in B(f)}\sum_{e\in E}w_e y_e \end{align*} for every $w\in\mathbb{R}^E$. Let $x,z\in\mathbb{R}^E$ and let $\lambda\in[0,1]$. The vector $\lambda x+(1-\lambda)z\in\mathbb{R}^E$ has coordinates $\lambda x_e+(1-\lambda)z_e$, so \begin{align*} \hat f(\lambda x+(1-\lambda)z)=\max_{y\in B(f)}\left(\lambda\sum_{e\in E}x_e y_e+(1-\lambda)\sum_{e\in E}z_e y_e\right). \end{align*} For each fixed $y\in B(f)$, the first sum is at most $\max_{u\in B(f)}\sum_{e\in E}x_eu_e$, and the second sum is at most $\max_{v\in B(f)}\sum_{e\in E}z_ev_e$. Since $\lambda\geq0$ and $1-\lambda\geq0$, multiplying these inequalities preserves their directions. Taking the maximum over $y\in B(f)$ gives \begin{align*} \hat f(\lambda x+(1-\lambda)z)\leq \lambda\max_{u\in B(f)}\sum_{e\in E}x_eu_e+(1-\lambda)\max_{v\in B(f)}\sum_{e\in E}z_ev_e. \end{align*} Applying the support-function formula again to $x$ and $z$, the right-hand side is \begin{align*} \lambda\hat f(x)+(1-\lambda)\hat f(z). \end{align*} This is exactly the convexity inequality for $\hat f$ on $\mathbb{R}^E$. [/guided] [/step] [step:Recover submodularity from convexity on indicator vectors] Assume conversely that $\hat f$ is convex. For each $A\subset E$, define the indicator vector $\mathbb{1}_A\in\mathbb{R}^E$ by setting $(\mathbb{1}_A)_e:=1$ for $e\in A$ and $(\mathbb{1}_A)_e:=0$ for $e\in E\setminus A$. By the definition of the Lovász extension, sorting $\mathbb{1}_A$ places the elements of $A$ before the elements of $E\setminus A$, and therefore \begin{align*} \hat f(\mathbb{1}_A)=f(A). \end{align*} Let $A,B\subset E$. The midpoint identity \begin{align*} \frac{\mathbb{1}_A+\mathbb{1}_B}{2}=\frac{\mathbb{1}_{A\cup B}+\mathbb{1}_{A\cap B}}{2} \end{align*} holds coordinatewise: both sides are $1$ on $A\cap B$, $\frac{1}{2}$ on $(A\cup B)\setminus(A\cap B)$, and $0$ on $E\setminus(A\cup B)$. Sorting this common vector gives \begin{align*} \hat f\left(\frac{\mathbb{1}_A+\mathbb{1}_B}{2}\right)=f(A\cap B)+\frac{1}{2}\bigl(f(A\cup B)-f(A\cap B)\bigr)=\frac{f(A\cup B)+f(A\cap B)}{2}. \end{align*} Convexity of $\hat f$ applied to $\mathbb{1}_A$ and $\mathbb{1}_B$ gives \begin{align*} \hat f\left(\frac{\mathbb{1}_A+\mathbb{1}_B}{2}\right)\leq \frac{1}{2}\hat f(\mathbb{1}_A)+\frac{1}{2}\hat f(\mathbb{1}_B)=\frac{f(A)+f(B)}{2}. \end{align*} Combining the two displayed formulas and multiplying by $2$ yields \begin{align*} f(A\cup B)+f(A\cap B)\leq f(A)+f(B). \end{align*} Since $A,B\subset E$ were arbitrary, $f$ is submodular. This completes the proof. [guided] Assume now that $\hat f$ is convex. We must recover the submodular inequality for arbitrary subsets $A,B\subset E$. For each subset $C\subset E$, define the indicator vector $\mathbb{1}_C\in\mathbb{R}^E$ as follows: for $e\in C$ set $(\mathbb{1}_C)_e:=1$, and for $e\in E\setminus C$ set $(\mathbb{1}_C)_e:=0$. When the Lovász extension is evaluated at $\mathbb{1}_C$, any decreasing ordering places all elements of $C$ before all elements of $E\setminus C$. The increments telescope from $\varnothing$ to $C$ and then from $C$ to $E$ with coefficient $0$ after the first block, so \begin{align*} \hat f(\mathbb{1}_C)=f(C). \end{align*} Fix $A,B\subset E$. The coordinatewise identity \begin{align*} \frac{\mathbb{1}_A+\mathbb{1}_B}{2}=\frac{\mathbb{1}_{A\cup B}+\mathbb{1}_{A\cap B}}{2} \end{align*} holds because both sides have value $1$ on $A\cap B$, value $1/2$ on $(A\cup B)\setminus(A\cap B)$, and value $0$ on $E\setminus(A\cup B)$. Sorting this common vector puts the elements of $A\cap B$ first, then the elements of $(A\cup B)\setminus(A\cap B)$, then the remaining elements. Hence the Lovász extension evaluates to \begin{align*} \hat f\left(\frac{\mathbb{1}_A+\mathbb{1}_B}{2}\right)=f(A\cap B)+\frac{1}{2}\bigl(f(A\cup B)-f(A\cap B)\bigr)=\frac{f(A\cup B)+f(A\cap B)}{2}. \end{align*} Convexity of $\hat f$ applies to the two vectors $\mathbb{1}_A,\mathbb{1}_B\in\mathbb{R}^E$ with midpoint parameter $1/2$, giving \begin{align*} \hat f\left(\frac{\mathbb{1}_A+\mathbb{1}_B}{2}\right)\leq \frac{1}{2}\hat f(\mathbb{1}_A)+\frac{1}{2}\hat f(\mathbb{1}_B)=\frac{f(A)+f(B)}{2}. \end{align*} Combining this inequality with the explicit midpoint evaluation yields \begin{align*} \frac{f(A\cup B)+f(A\cap B)}{2}\leq \frac{f(A)+f(B)}{2}. \end{align*} Multiplying by $2$ gives \begin{align*} f(A\cup B)+f(A\cap B)\leq f(A)+f(B). \end{align*} Since $A$ and $B$ were arbitrary subsets of $E$, this is exactly submodularity of $f$. [/guided] [/step]

What brings you to Androma?

Start with a route through the knowledge graph.

Lovász Extension Convexity Criterion (Theorem # 5820)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Lovász Extension Convexity Criterion (Theorem # 5820)

Discussion

Proof

Explore Further