[proofplan]
We prove first that [submodularity](/page/Submodular%20Function) makes the [Lovász extension](/page/Lovasz%20Extension) a support function. For each $x \in \mathbb{R}^E$, the greedy vector obtained from a decreasing ordering of the coordinates of $x$ lies in the [base polyhedron](/page/Base%20Polyhedron) of $f$ and maximizes the linear functional $y \mapsto \sum_{e\in E}x_e y_e$ there. This gives $\hat f(x)$ as a maximum of linear functions, hence convex. Conversely, if $\hat f$ is convex, we apply convexity to indicator vectors and evaluate the Lovász extension at the common midpoint of $\mathbb{1}_A,\mathbb{1}_B$ and $\mathbb{1}_{A\cup B},\mathbb{1}_{A\cap B}$ to recover the submodular inequality.
[/proofplan]
[step:Separate the empty ground set case]
If $E=\varnothing$, then the only subsets of $E$ are $\varnothing$ itself. Hence submodularity reduces to
\begin{align*}
f(\varnothing)+f(\varnothing)\geq f(\varnothing)+f(\varnothing),
\end{align*}
which holds. Also $\mathbb{R}^E$ is the one-point [vector space](/page/Vector%20Space) with unique element $0_E:\varnothing\to\mathbb{R}$, and the Lovász extension satisfies $\hat f(0_E)=f(\varnothing)=0$. Therefore $\hat f$ is convex on $\mathbb{R}^E$. Thus the theorem holds when $E=\varnothing$, so for the rest of the proof assume $E\neq\varnothing$.
[guided]
When $E=\varnothing$, there is exactly one subset of $E$, namely $\varnothing$. The submodular inequality must therefore be checked only for the pair $A=B=\varnothing$, and it reads
\begin{align*}
f(\varnothing)+f(\varnothing)\geq f(\varnothing)+f(\varnothing).
\end{align*}
This inequality is an equality, so $f$ is submodular in the empty ground set case.
The vector space $\mathbb{R}^E$ is also degenerate in this case: it consists of the single function $0_E:\varnothing\to\mathbb{R}$. The Lovász extension has only one value to assign, and by the normalization hypothesis it is
\begin{align*}
\hat f(0_E)=f(\varnothing)=0.
\end{align*}
A function on a one-point convex set is convex, because for the only possible choices $x=z=0_E$ and every $\lambda\in[0,1]$, the convexity inequality reduces to equality. Hence the theorem holds when $E=\varnothing$. We may therefore assume for the remainder of the proof that $E\neq\varnothing$.
[/guided]
[/step]
[step:Define the base polyhedron and the greedy vector]
Assume first that $f$ is [submodular](/page/Submodular%20Function). Let $n:=|E|$. For $y \in \mathbb{R}^E$ and $S \subset E$, define
\begin{align*}
y(S) := \sum_{e\in S} y_e,
\end{align*}
with $y(\varnothing)=0$. Define the [base polyhedron](/page/Base%20Polyhedron) associated to $f$ as the set $B(f)\subset\mathbb{R}^E$ whose elements are precisely those $y\in\mathbb{R}^E$ satisfying
\begin{align*}
y(E)&=f(E),
\end{align*}
and, for every $S\subset E$,
\begin{align*}
y(S) \leq f(S).
\end{align*}
Fix $x\in\mathbb{R}^E$, choose an ordering $e_1,\dots,e_n$ of $E$ such that
\begin{align*}
x_{e_1}\geq x_{e_2}\geq \cdots \geq x_{e_n},
\end{align*}
and set $S_0:=\varnothing$ and $S_k:=\{e_1,\dots,e_k\}$ for $1\leq k\leq n$. Define the greedy vector $g_x\in\mathbb{R}^E$ by
\begin{align*}
(g_x)_{e_k}:=f(S_k)-f(S_{k-1}),\qquad 1\leq k\leq n.
\end{align*}
By the definition of the [Lovász extension](/page/Lovasz%20Extension),
\begin{align*}
\hat f(x)=\sum_{k=1}^{n}x_{e_k}(g_x)_{e_k}.
\end{align*}
This value is independent of the chosen ordering among tied coordinates: if $x_{e_a}=\cdots=x_{e_b}=c$ for a tied block, then the total contribution of that block is
\begin{align*}
c\sum_{k=a}^{b}\bigl(f(S_k)-f(S_{k-1})\bigr)=c\bigl(f(S_b)-f(S_{a-1})\bigr),
\end{align*}
which depends only on the set of tied elements in the block and not on their internal order.
[/step]
[step:Show that the greedy vector belongs to the base polyhedron]
We prove $g_x\in B(f)$. First, telescoping gives
\begin{align*}
g_x(E)=\sum_{k=1}^{n}\bigl(f(S_k)-f(S_{k-1})\bigr)=f(E)-f(\varnothing)=f(E).
\end{align*}
It remains to prove $g_x(A)\leq f(A)$ for every $A\subset E$.
Let $A\subset E$. Write $A=\{e_{k_1},\dots,e_{k_m}\}$ with $1\leq k_1<\cdots<k_m\leq n$, and define $A_0:=\varnothing$ and $A_j:=\{e_{k_1},\dots,e_{k_j}\}$ for $1\leq j\leq m$. For each $j$, we have $A_{j-1}\subset S_{k_j-1}$ and $e_{k_j}\notin S_{k_j-1}$. Submodularity applied to the sets $A_j=A_{j-1}\cup\{e_{k_j}\}$ and $S_{k_j-1}$ gives
\begin{align*}
f(A_j)+f(S_{k_j-1})\geq f(A_{j-1})+f(S_{k_j}).
\end{align*}
Rearranging,
\begin{align*}
f(S_{k_j})-f(S_{k_j-1})\leq f(A_j)-f(A_{j-1}).
\end{align*}
Summing over $1\leq j\leq m$ yields
\begin{align*}
g_x(A)=\sum_{j=1}^{m}(g_x)_{e_{k_j}}=\sum_{j=1}^{m}\bigl(f(S_{k_j})-f(S_{k_j-1})\bigr)\leq \sum_{j=1}^{m}\bigl(f(A_j)-f(A_{j-1})\bigr)=f(A)-f(\varnothing)=f(A).
\end{align*}
Thus $g_x\in B(f)$.
[guided]
Recall the objects being checked in this step. For $y\in\mathbb{R}^E$ and $S\subset E$, we write $y(S):=\sum_{e\in S}y_e$, with $y(\varnothing)=0$. The base polyhedron $B(f)\subset\mathbb{R}^E$ consists of the vectors $y\in\mathbb{R}^E$ such that $y(E)=f(E)$ and $y(S)\leq f(S)$ for every $S\subset E$. The greedy vector $g_x\in\mathbb{R}^E$ is defined along the sorted chain $S_k=\{e_1,\dots,e_k\}$ by $(g_x)_{e_k}:=f(S_k)-f(S_{k-1})$ for $1\leq k\leq n$.
The constraint $g_x(E)=f(E)$ follows from the construction: the increments of $f$ along the chain
\begin{align*}
\varnothing=S_0\subset S_1\subset\cdots\subset S_n=E
\end{align*}
telescope:
\begin{align*}
g_x(E)=\sum_{k=1}^{n}\bigl(f(S_k)-f(S_{k-1})\bigr)=f(E)-f(\varnothing)=f(E).
\end{align*}
The nontrivial part is checking the inequalities $g_x(A)\leq f(A)$ for arbitrary $A\subset E$, since $A$ need not be one of the chain sets $S_k$. List the elements of $A$ in the order in which they appear in the sorted chain:
\begin{align*}
A=\{e_{k_1},\dots,e_{k_m}\},\qquad 1\leq k_1<\cdots<k_m\leq n.
\end{align*}
Define $A_0:=\varnothing$, and for $1\leq j\leq m$ define $A_j:=\{e_{k_1},\dots,e_{k_j}\}$. At the moment when the chain adds $e_{k_j}$, the larger set already present is $S_{k_j-1}$, while the part of $A$ already selected is only $A_{j-1}$. Since $A_{j-1}\subset S_{k_j-1}$ and $e_{k_j}\notin S_{k_j-1}$, submodularity applied to $A_j=A_{j-1}\cup\{e_{k_j}\}$ and $S_{k_j-1}$ gives
\begin{align*}
f(A_j)+f(S_{k_j-1})\geq f(A_j\cup S_{k_j-1})+f(A_j\cap S_{k_j-1}).
\end{align*}
Here $A_j\cup S_{k_j-1}=S_{k_j}$ and $A_j\cap S_{k_j-1}=A_{j-1}$, so the inequality becomes
\begin{align*}
f(A_j)+f(S_{k_j-1})\geq f(S_{k_j})+f(A_{j-1}).
\end{align*}
Rearranging,
\begin{align*}
f(S_{k_j})-f(S_{k_j-1})\leq f(A_j)-f(A_{j-1}).
\end{align*}
This says that the marginal contribution of $e_{k_j}$ along the larger chain is no bigger than its marginal contribution inside $A$ itself. Summing these inequalities gives
\begin{align*}
g_x(A)=\sum_{j=1}^{m}(g_x)_{e_{k_j}}=\sum_{j=1}^{m}\bigl(f(S_{k_j})-f(S_{k_j-1})\bigr)\leq \sum_{j=1}^{m}\bigl(f(A_j)-f(A_{j-1})\bigr)=f(A)-f(\varnothing)=f(A).
\end{align*}
Therefore $g_x$ satisfies every defining inequality of $B(f)$, and hence $g_x\in B(f)$.
[/guided]
[/step]
[step:Prove the greedy vector maximizes the sorted linear functional]
Let $y\in B(f)$. Since $S_0=\varnothing$, the identity $y_{e_k}=y(S_k)-y(S_{k-1})$ holds for $1\leq k\leq n$. Substituting this identity and telescoping the resulting finite sum gives the summation-by-parts formula
\begin{align*}
\sum_{k=1}^{n}x_{e_k}y_{e_k}=x_{e_n}y(E)+\sum_{k=1}^{n-1}(x_{e_k}-x_{e_{k+1}})y(S_k).
\end{align*}
Because $y\in B(f)$, we have $y(E)=f(E)$ and $y(S_k)\leq f(S_k)$ for $1\leq k\leq n-1$. Because the coordinates of $x$ are sorted in nonincreasing order, $x_{e_k}-x_{e_{k+1}}\geq 0$. Hence
\begin{align*}
\sum_{k=1}^{n}x_{e_k}y_{e_k}\leq x_{e_n}f(E)+\sum_{k=1}^{n-1}(x_{e_k}-x_{e_{k+1}})f(S_k).
\end{align*}
The right-hand side is the same summation-by-parts expansion applied to $g_x$, since $g_x(S_k)=f(S_k)$ for each chain set $S_k$ and $g_x(E)=f(E).$ Therefore
\begin{align*}
\sum_{e\in E}x_e y_e\leq \sum_{e\in E}x_e (g_x)_e=\hat f(x).
\end{align*}
Since $g_x\in B(f)$, this proves
\begin{align*}
\hat f(x)=\max_{y\in B(f)}\sum_{e\in E}x_e y_e.
\end{align*}
[guided]
The goal is to prove that no vector $y\in B(f)$ gives a larger value of the linear functional
\begin{align*}
y\mapsto \sum_{e\in E}x_e y_e
\end{align*}
than the greedy vector $g_x$. The ordering $e_1,\dots,e_n$ was chosen so that the coefficients $x_{e_k}$ decrease, which lets us compare the chain sums $y(S_k)$ with $f(S_k)$ using only nonnegative weights.
For $1\leq k\leq n$, the chain sets satisfy $S_k=S_{k-1}\cup\{e_k\}$, so the additivity of $y(S)$ over finite sums gives
\begin{align*}
y_{e_k}=y(S_k)-y(S_{k-1}).
\end{align*}
Substituting this identity into the linear functional gives
\begin{align*}
\sum_{k=1}^{n}x_{e_k}y_{e_k}=\sum_{k=1}^{n}x_{e_k}\bigl(y(S_k)-y(S_{k-1})\bigr).
\end{align*}
Expanding this finite sum and collecting the coefficient of each $y(S_k)$ gives the summation-by-parts identity
\begin{align*}
\sum_{k=1}^{n}x_{e_k}y_{e_k}=x_{e_n}y(E)+\sum_{k=1}^{n-1}(x_{e_k}-x_{e_{k+1}})y(S_k).
\end{align*}
Now we use the defining constraints of $B(f)$. Since $y\in B(f)$, we have $y(E)=f(E)$ and $y(S_k)\leq f(S_k)$ for each $1\leq k\leq n-1$. Since the coordinates are sorted as $x_{e_1}\geq\cdots\geq x_{e_n}$, every coefficient $x_{e_k}-x_{e_{k+1}}$ is nonnegative. Therefore replacing $y(S_k)$ by the larger quantity $f(S_k)$ preserves the inequality direction:
\begin{align*}
\sum_{k=1}^{n}x_{e_k}y_{e_k}\leq x_{e_n}f(E)+\sum_{k=1}^{n-1}(x_{e_k}-x_{e_{k+1}})f(S_k).
\end{align*}
The greedy vector has $g_x(E)=f(E)$ and, by telescoping along the initial segment of the chain, $g_x(S_k)=f(S_k)$ for each $1\leq k\leq n-1$. Applying the same summation-by-parts identity to $g_x$ therefore gives
\begin{align*}
x_{e_n}f(E)+\sum_{k=1}^{n-1}(x_{e_k}-x_{e_{k+1}})f(S_k)=\sum_{e\in E}x_e(g_x)_e.
\end{align*}
By the definition of the Lovász extension along the sorted order, the last expression is $\hat f(x)$. Hence every $y\in B(f)$ satisfies
\begin{align*}
\sum_{e\in E}x_e y_e\leq \sum_{e\in E}x_e(g_x)_e=\hat f(x).
\end{align*}
Since the previous step proved $g_x\in B(f)$, this upper bound is attained at $g_x$, and therefore
\begin{align*}
\hat f(x)=\max_{y\in B(f)}\sum_{e\in E}x_e y_e.
\end{align*}
[/guided]
[/step]
[step:Deduce convexity from the support function formula]
Let $x,z\in\mathbb{R}^E$ and let $\lambda\in[0,1]$. Using the support formula proved above,
\begin{align*}
\hat f(\lambda x+(1-\lambda)z)=\max_{y\in B(f)}\sum_{e\in E}\bigl(\lambda x_e+(1-\lambda)z_e\bigr)y_e=\max_{y\in B(f)}\left(\lambda\sum_{e\in E}x_e y_e+(1-\lambda)\sum_{e\in E}z_e y_e\right)\leq \lambda\max_{y\in B(f)}\sum_{e\in E}x_e y_e +(1-\lambda)\max_{y\in B(f)}\sum_{e\in E}z_e y_e=\lambda\hat f(x)+(1-\lambda)\hat f(z).
\end{align*}
Thus $\hat f$ is convex.
[guided]
We now use the support-function formula obtained in the previous step:
\begin{align*}
\hat f(w)=\max_{y\in B(f)}\sum_{e\in E}w_e y_e
\end{align*}
for every $w\in\mathbb{R}^E$. Let $x,z\in\mathbb{R}^E$ and let $\lambda\in[0,1]$. The vector $\lambda x+(1-\lambda)z\in\mathbb{R}^E$ has coordinates $\lambda x_e+(1-\lambda)z_e$, so
\begin{align*}
\hat f(\lambda x+(1-\lambda)z)=\max_{y\in B(f)}\left(\lambda\sum_{e\in E}x_e y_e+(1-\lambda)\sum_{e\in E}z_e y_e\right).
\end{align*}
For each fixed $y\in B(f)$, the first sum is at most $\max_{u\in B(f)}\sum_{e\in E}x_eu_e$, and the second sum is at most $\max_{v\in B(f)}\sum_{e\in E}z_ev_e$. Since $\lambda\geq0$ and $1-\lambda\geq0$, multiplying these inequalities preserves their directions. Taking the maximum over $y\in B(f)$ gives
\begin{align*}
\hat f(\lambda x+(1-\lambda)z)\leq \lambda\max_{u\in B(f)}\sum_{e\in E}x_eu_e+(1-\lambda)\max_{v\in B(f)}\sum_{e\in E}z_ev_e.
\end{align*}
Applying the support-function formula again to $x$ and $z$, the right-hand side is
\begin{align*}
\lambda\hat f(x)+(1-\lambda)\hat f(z).
\end{align*}
This is exactly the convexity inequality for $\hat f$ on $\mathbb{R}^E$.
[/guided]
[/step]
[step:Recover submodularity from convexity on indicator vectors]
Assume conversely that $\hat f$ is convex. For each $A\subset E$, define the indicator vector $\mathbb{1}_A\in\mathbb{R}^E$ by setting $(\mathbb{1}_A)_e:=1$ for $e\in A$ and $(\mathbb{1}_A)_e:=0$ for $e\in E\setminus A$. By the definition of the Lovász extension, sorting $\mathbb{1}_A$ places the elements of $A$ before the elements of $E\setminus A$, and therefore
\begin{align*}
\hat f(\mathbb{1}_A)=f(A).
\end{align*}
Let $A,B\subset E$. The midpoint identity
\begin{align*}
\frac{\mathbb{1}_A+\mathbb{1}_B}{2}=\frac{\mathbb{1}_{A\cup B}+\mathbb{1}_{A\cap B}}{2}
\end{align*}
holds coordinatewise: both sides are $1$ on $A\cap B$, $\frac{1}{2}$ on $(A\cup B)\setminus(A\cap B)$, and $0$ on $E\setminus(A\cup B)$. Sorting this common vector gives
\begin{align*}
\hat f\left(\frac{\mathbb{1}_A+\mathbb{1}_B}{2}\right)=f(A\cap B)+\frac{1}{2}\bigl(f(A\cup B)-f(A\cap B)\bigr)=\frac{f(A\cup B)+f(A\cap B)}{2}.
\end{align*}
Convexity of $\hat f$ applied to $\mathbb{1}_A$ and $\mathbb{1}_B$ gives
\begin{align*}
\hat f\left(\frac{\mathbb{1}_A+\mathbb{1}_B}{2}\right)\leq \frac{1}{2}\hat f(\mathbb{1}_A)+\frac{1}{2}\hat f(\mathbb{1}_B)=\frac{f(A)+f(B)}{2}.
\end{align*}
Combining the two displayed formulas and multiplying by $2$ yields
\begin{align*}
f(A\cup B)+f(A\cap B)\leq f(A)+f(B).
\end{align*}
Since $A,B\subset E$ were arbitrary, $f$ is submodular. This completes the proof.
[guided]
Assume now that $\hat f$ is convex. We must recover the submodular inequality for arbitrary subsets $A,B\subset E$. For each subset $C\subset E$, define the indicator vector $\mathbb{1}_C\in\mathbb{R}^E$ as follows: for $e\in C$ set $(\mathbb{1}_C)_e:=1$, and for $e\in E\setminus C$ set $(\mathbb{1}_C)_e:=0$. When the Lovász extension is evaluated at $\mathbb{1}_C$, any decreasing ordering places all elements of $C$ before all elements of $E\setminus C$. The increments telescope from $\varnothing$ to $C$ and then from $C$ to $E$ with coefficient $0$ after the first block, so
\begin{align*}
\hat f(\mathbb{1}_C)=f(C).
\end{align*}
Fix $A,B\subset E$. The coordinatewise identity
\begin{align*}
\frac{\mathbb{1}_A+\mathbb{1}_B}{2}=\frac{\mathbb{1}_{A\cup B}+\mathbb{1}_{A\cap B}}{2}
\end{align*}
holds because both sides have value $1$ on $A\cap B$, value $1/2$ on $(A\cup B)\setminus(A\cap B)$, and value $0$ on $E\setminus(A\cup B)$. Sorting this common vector puts the elements of $A\cap B$ first, then the elements of $(A\cup B)\setminus(A\cap B)$, then the remaining elements. Hence the Lovász extension evaluates to
\begin{align*}
\hat f\left(\frac{\mathbb{1}_A+\mathbb{1}_B}{2}\right)=f(A\cap B)+\frac{1}{2}\bigl(f(A\cup B)-f(A\cap B)\bigr)=\frac{f(A\cup B)+f(A\cap B)}{2}.
\end{align*}
Convexity of $\hat f$ applies to the two vectors $\mathbb{1}_A,\mathbb{1}_B\in\mathbb{R}^E$ with midpoint parameter $1/2$, giving
\begin{align*}
\hat f\left(\frac{\mathbb{1}_A+\mathbb{1}_B}{2}\right)\leq \frac{1}{2}\hat f(\mathbb{1}_A)+\frac{1}{2}\hat f(\mathbb{1}_B)=\frac{f(A)+f(B)}{2}.
\end{align*}
Combining this inequality with the explicit midpoint evaluation yields
\begin{align*}
\frac{f(A\cup B)+f(A\cap B)}{2}\leq \frac{f(A)+f(B)}{2}.
\end{align*}
Multiplying by $2$ gives
\begin{align*}
f(A\cup B)+f(A\cap B)\leq f(A)+f(B).
\end{align*}
Since $A$ and $B$ were arbitrary subsets of $E$, this is exactly submodularity of $f$.
[/guided]
[/step]