Quadratic Cost Duality and Convex Subdifferential Structure

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] The quadratic identity follows by expanding $|x-y|^2$ and observing that the marginal constraints make the two second-moment terms independent of the transport plan. For the structural statement, we use [Kantorovich duality](/theorems/6799) for the quadratic cost to obtain optimal dual potentials and then rewrite those potentials as a dual pair for maximizing the correlation $x \cdot y$. Convex conjugation replaces this correlation-dual pair by a Fenchel pair $(\varphi,\varphi^*)$ without increasing its integral value. [Complementary slackness](/theorems/2559) then forces equality in Fenchel's inequality on the optimal plan, and Fenchel-Young equality identifies the equality set with the graph of the convex subdifferential. [/proofplan] [step:Expand the quadratic cost and separate the fixed marginal terms] For every $\pi \in \Pi(\mu,\nu)$, the function \begin{align*} (x,y) \mapsto x \cdot y \end{align*} is $\pi$-integrable because [Young's inequality](/theorems/244) gives \begin{align*} |x \cdot y| \leq \frac{1}{2}|x|^2 + \frac{1}{2}|y|^2 \end{align*} and $\mu,\nu \in \mathcal P_2(\mathbb R^n)$. Since $\pi$ has marginals $\mu$ and $\nu$, we have \begin{align*} \int_{\mathbb R^n \times \mathbb R^n} |x|^2 \, d\pi(x,y)=\int_{\mathbb R^n} |x|^2 \, d\mu(x) \end{align*} and \begin{align*} \int_{\mathbb R^n \times \mathbb R^n} |y|^2 \, d\pi(x,y)=\int_{\mathbb R^n} |y|^2 \, d\nu(y). \end{align*} Using the pointwise identity \begin{align*} |x-y|^2=|x|^2+|y|^2-2x \cdot y, \end{align*} we obtain, for every $\pi \in \Pi(\mu,\nu)$, \begin{align*} \int_{\mathbb R^n \times \mathbb R^n} |x-y|^2 \, d\pi(x,y)=A-2\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y), \end{align*} where the finite constant $A$ is defined by \begin{align*} A:=\int_{\mathbb R^n} |x|^2 \, d\mu(x)+\int_{\mathbb R^n} |y|^2 \, d\nu(y). \end{align*} Taking the infimum over $\pi \in \Pi(\mu,\nu)$ on the left side is therefore the same as taking the supremum of the correlation term on the right side: \begin{align*} \inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathbb R^n \times \mathbb R^n} |x-y|^2 \, d\pi(x,y)=A-2\sup_{\pi \in \Pi(\mu,\nu)}\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y). \end{align*} [/step] [step:Obtain quadratic dual potentials and complementary slackness] By the existence theorem for optimal transport plans with lower semicontinuous costs bounded from below by an integrable function, applied to the continuous nonnegative cost \begin{align*} c: \mathbb R^n \times \mathbb R^n \to [0,\infty), \qquad c(x,y):=|x-y|^2, \end{align*} there exists a plan $\pi_0 \in \Pi(\mu,\nu)$ attaining the quadratic-cost infimum. We now fix any such optimal plan $\pi_0$. We now use the Kantorovich duality hypothesis stated in the theorem statement. Applied to $\pi_0$, it gives Borel functions \begin{align*} f: \mathbb R^n \to \mathbb R, \qquad g: \mathbb R^n \to \mathbb R \end{align*} such that $\int_{\mathbb R^n} |f| \, d\mu < \infty$ and $\int_{\mathbb R^n} |g| \, d\nu < \infty$, \begin{align*} f(x)+g(y)\leq |x-y|^2 \quad \text{for all } x,y \in \mathbb R^n, \end{align*} and \begin{align*} \int_{\mathbb R^n} f \, d\mu+\int_{\mathbb R^n} g \, d\nu=\int_{\mathbb R^n \times \mathbb R^n} |x-y|^2 \, d\pi_0(x,y). \end{align*} Since the nonnegative function $(x,y)\mapsto |x-y|^2-f(x)-g(y)$ has integral zero with respect to $\pi_0$, it vanishes $\pi_0$-almost everywhere: \begin{align*} f(x)+g(y)=|x-y|^2 \quad \text{for } \pi_0\text{-almost every }(x,y). \end{align*} [guided] The role of Kantorovich duality is to replace the global minimization over all couplings by pointwise inequalities involving two one-variable functions. The cost function is \begin{align*} c: \mathbb R^n \times \mathbb R^n \to [0,\infty), \qquad c(x,y):=|x-y|^2. \end{align*} It is continuous, hence lower semicontinuous, and nonnegative. Since $\mu,\nu \in \mathcal P_2(\mathbb R^n)$, the coupling set is nonempty: define the product coupling $\pi_{\mathrm{prod}} \in \Pi(\mu,\nu)$ by \begin{align*} \pi_{\mathrm{prod}}(A \times B):=\mu(A)\nu(B) \end{align*} for all Borel sets $A,B \subset \mathbb R^n$, and extend this prescription uniquely to the Borel product $\sigma$-algebra on $\mathbb R^n \times \mathbb R^n$. This coupling has finite quadratic cost because \begin{align*} |x-y|^2 \leq 2|x|^2+2|y|^2. \end{align*} Thus the usual existence theorem for lower semicontinuous transport costs applies and gives an optimal plan $\pi_0 \in \Pi(\mu,\nu)$. Next we use the Kantorovich duality hypothesis stated in the theorem statement. This is the strong finite-second-moment quadratic-cost version needed here: it gives real-valued Borel representatives of the dual potentials, integrability with respect to the two marginals, the pointwise dual inequality on all of $\mathbb R^n \times \mathbb R^n$, dual attainment, and complementary slackness for the fixed optimal plan $\pi_0$. Thus we obtain Borel functions \begin{align*} f: \mathbb R^n \to \mathbb R, \qquad g: \mathbb R^n \to \mathbb R \end{align*} such that $\int_{\mathbb R^n} |f| \, d\mu < \infty$ and $\int_{\mathbb R^n} |g| \, d\nu < \infty$, satisfying the dual feasibility inequality \begin{align*} f(x)+g(y)\leq |x-y|^2 \quad \text{for all } x,y \in \mathbb R^n. \end{align*} Dual optimality says that the dual value equals the primal value: \begin{align*} \int_{\mathbb R^n} f \, d\mu+\int_{\mathbb R^n} g \, d\nu=\int_{\mathbb R^n \times \mathbb R^n} |x-y|^2 \, d\pi_0(x,y). \end{align*} Now define the nonnegative slack function \begin{align*} s: \mathbb R^n \times \mathbb R^n \to [0,\infty], \qquad s(x,y):=|x-y|^2-f(x)-g(y). \end{align*} Dual feasibility gives $s \geq 0$. Integrating $s$ with respect to $\pi_0$ and using the marginal identities for $\pi_0$ gives \begin{align*} \int_{\mathbb R^n \times \mathbb R^n} s \, d\pi_0=\int_{\mathbb R^n \times \mathbb R^n} |x-y|^2 \, d\pi_0(x,y)-\int_{\mathbb R^n} f \, d\mu-\int_{\mathbb R^n} g \, d\nu=0. \end{align*} A nonnegative measurable function with integral zero is zero almost everywhere. Therefore \begin{align*} f(x)+g(y)=|x-y|^2 \quad \text{for } \pi_0\text{-almost every }(x,y). \end{align*} This is the complementary slackness relation that will later become equality in Fenchel's inequality. [/guided] [/step] [step:Transform the quadratic dual pair into a correlation dual pair] Define the Borel functions \begin{align*} \alpha: \mathbb R^n \to \mathbb R \cup \{\infty\}, \qquad \alpha(x):=\frac{|x|^2-f(x)}{2} \end{align*} and \begin{align*} \beta: \mathbb R^n \to \mathbb R \cup \{\infty\}, \qquad \beta(y):=\frac{|y|^2-g(y)}{2}. \end{align*} The quadratic dual inequality is equivalent to \begin{align*} x \cdot y \leq \alpha(x)+\beta(y) \quad \text{for all } x,y \in \mathbb R^n. \end{align*} Also, \begin{align*} \int_{\mathbb R^n} \alpha \, d\mu+\int_{\mathbb R^n} \beta \, d\nu=\frac{1}{2}\left(A-\int_{\mathbb R^n} f \, d\mu-\int_{\mathbb R^n} g \, d\nu\right). \end{align*} Using the identity proved in the first step and the optimality of $\pi_0$, this becomes \begin{align*} \int_{\mathbb R^n} \alpha \, d\mu+\int_{\mathbb R^n} \beta \, d\nu=\sup_{\pi \in \Pi(\mu,\nu)}\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y). \end{align*} Moreover, complementary slackness for $f,g$ is equivalent to \begin{align*} x \cdot y=\alpha(x)+\beta(y) \quad \text{for } \pi_0\text{-almost every }(x,y). \end{align*} [/step] [step:Replace the correlation dual pair by a Fenchel dual pair] Since $f$ is finite $\mu$-almost everywhere with $\int_{\mathbb R^n} |f| \, d\mu < \infty$ and $g$ is finite $\nu$-almost everywhere with $\int_{\mathbb R^n} |g| \, d\nu < \infty$, the functions $\alpha$ and $\beta$ are finite $\mu$-almost everywhere and $\nu$-almost everywhere, respectively. Choose $x_0 \in \mathbb R^n$ with $\alpha(x_0)<\infty$ and choose $y_0 \in \mathbb R^n$ with $\beta(y_0)<\infty$. The correlation dual inequality gives the affine lower bound \begin{align*} \beta(y) \geq x_0 \cdot y-\alpha(x_0) \quad \text{for all } y \in \mathbb R^n. \end{align*} Define the convex conjugate of $\beta$ by the function \begin{align*} \varphi: \mathbb R^n \to (-\infty,\infty], \qquad \varphi(x):=\sup_{y \in \mathbb R^n}\{x \cdot y-\beta(y)\}. \end{align*} Because $x \cdot y \leq \alpha(x)+\beta(y)$ for all $x,y \in \mathbb R^n$, we have \begin{align*} \varphi(x)\leq \alpha(x) \quad \text{for all } x \in \mathbb R^n. \end{align*} The function $\varphi$ is convex and lower semicontinuous as a supremum of affine functions. The function $\beta$ is proper because it is finite at $y_0$, never takes the value $-\infty$, and has the affine minorant $y \mapsto x_0 \cdot y-\alpha(x_0)$. Hence its convex conjugate is a well-defined extended-real convex function. The function $\varphi$ is proper: the inequality above gives $\varphi(x_0)\leq \alpha(x_0)<\infty$, while $\beta(y_0)<\infty$ gives \begin{align*} \varphi(x)\geq x \cdot y_0-\beta(y_0)>-\infty \quad \text{for all } x \in \mathbb R^n. \end{align*} Moreover, $\varphi$ is bounded below by the affine function $x \mapsto x \cdot y_0-\beta(y_0)$, and $\varphi^*$ will be bounded below by the affine function $y \mapsto x_0 \cdot y-\varphi(x_0)$. Since $\mu,\nu \in \mathcal P_2(\mathbb R^n)$, both affine lower bounds are integrable with respect to $\mu$ and $\nu$. Thus the extended integrals of $\varphi$ and $\varphi^*$ are well-defined from below and cannot contain an ambiguous $\infty-\infty$ cancellation. Let $\varphi^*: \mathbb R^n \to (-\infty,\infty]$ denote the convex conjugate of $\varphi$: \begin{align*} \varphi^*(y):=\sup_{x \in \mathbb R^n}\{x \cdot y-\varphi(x)\}. \end{align*} Since $\beta$ is proper and bounded below by an affine function, the biconjugate $\beta^{**}$ is well-defined and the Fenchel-Moreau biconjugation inequality gives $\beta^{**}\leq \beta$ pointwise. Since $\varphi=\beta^*$, we have $\varphi^*=\beta^{**}$, and therefore $\varphi^*\leq \beta$ pointwise. Hence \begin{align*} \int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu\leq \int_{\mathbb R^n} \alpha \, d\mu+\int_{\mathbb R^n} \beta \, d\nu. \end{align*} On the other hand, Fenchel's inequality gives \begin{align*} x \cdot y \leq \varphi(x)+\varphi^*(y) \quad \text{for all } x,y \in \mathbb R^n. \end{align*} Integrating this inequality against any $\pi \in \Pi(\mu,\nu)$ gives \begin{align*} \int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y)\leq \int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu. \end{align*} Taking the supremum over $\pi \in \Pi(\mu,\nu)$ and using the optimality of $(\alpha,\beta)$ for the correlation dual problem yields \begin{align*} \int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu=\sup_{\pi \in \Pi(\mu,\nu)}\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y). \end{align*} [guided] The delicate point is that $\alpha$, $\beta$, $\varphi$, and $\varphi^*$ may be extended-valued, so we first put affine lower bounds in place before comparing integrals. Since $f$ is finite $\mu$-almost everywhere with $\int_{\mathbb R^n} |f| \, d\mu < \infty$ and $g$ is finite $\nu$-almost everywhere with $\int_{\mathbb R^n} |g| \, d\nu < \infty$, the functions $\alpha$ and $\beta$ are finite almost everywhere for their respective measures. Choose $x_0 \in \mathbb R^n$ with $\alpha(x_0)<\infty$ and $y_0 \in \mathbb R^n$ with $\beta(y_0)<\infty$. From \begin{align*} x \cdot y \leq \alpha(x)+\beta(y) \quad \text{for all } x,y \in \mathbb R^n \end{align*} we get, by fixing $x=x_0$, \begin{align*} \beta(y) \geq x_0 \cdot y-\alpha(x_0) \quad \text{for all } y \in \mathbb R^n. \end{align*} This affine lower bound is exactly what prevents the conjugate from degenerating at $x_0$. Define \begin{align*} \varphi: \mathbb R^n \to (-\infty,\infty], \qquad \varphi(x):=\sup_{y \in \mathbb R^n}\{x \cdot y-\beta(y)\}. \end{align*} For every $x,y \in \mathbb R^n$, the inequality $x \cdot y \leq \alpha(x)+\beta(y)$ gives $x \cdot y-\beta(y)\leq \alpha(x)$, and taking the supremum over $y$ gives \begin{align*} \varphi(x)\leq \alpha(x) \quad \text{for all } x \in \mathbb R^n. \end{align*} In particular, $\varphi(x_0)\leq \alpha(x_0)<\infty$. Also, because $\beta(y_0)<\infty$, the single competitor $y_0$ in the supremum gives \begin{align*} \varphi(x)\geq x \cdot y_0-\beta(y_0)>-\infty \quad \text{for all } x \in \mathbb R^n. \end{align*} Thus $\varphi$ is proper. It is convex and lower semicontinuous because it is the supremum of affine functions of $x$. Now define \begin{align*} \varphi^*: \mathbb R^n \to (-\infty,\infty], \qquad \varphi^*(y):=\sup_{x \in \mathbb R^n}\{x \cdot y-\varphi(x)\}. \end{align*} The same competitor $x_0$ gives the affine lower bound \begin{align*} \varphi^*(y)\geq x_0 \cdot y-\varphi(x_0) \quad \text{for all } y \in \mathbb R^n. \end{align*} Since $\mu$ and $\nu$ have finite second moments, every affine function on $\mathbb R^n$ is integrable with respect to $\mu$ and $\nu$. Therefore the extended integrals of $\varphi$ and $\varphi^*$ are defined from below; no expression of the form $\infty-\infty$ is being used. Because $\beta$ is proper and bounded below by an affine function, the biconjugate $\beta^{**}$ is well-defined and the Fenchel-Moreau biconjugation inequality gives $\beta^{**}\leq \beta$ pointwise. Since $\varphi=\beta^*$, we have $\varphi^*=\beta^{**}$, so $\varphi^*\leq \beta$ pointwise. Together with $\varphi\leq \alpha$, this yields \begin{align*} \int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu\leq \int_{\mathbb R^n} \alpha \, d\mu+\int_{\mathbb R^n} \beta \, d\nu. \end{align*} Fenchel's inequality for the proper lower semicontinuous convex function $\varphi$ gives \begin{align*} x \cdot y \leq \varphi(x)+\varphi^*(y) \quad \text{for all } x,y \in \mathbb R^n. \end{align*} Integrating against any coupling $\pi \in \Pi(\mu,\nu)$ gives \begin{align*} \int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y)\leq \int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu. \end{align*} Taking the supremum over all couplings gives \begin{align*} \sup_{\pi \in \Pi(\mu,\nu)}\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y)\leq \int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu. \end{align*} On the other hand, the previous comparison with $\alpha$ and $\beta$ gives \begin{align*} \int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu\leq \int_{\mathbb R^n} \alpha \, d\mu+\int_{\mathbb R^n} \beta \, d\nu. \end{align*} It remains to identify the right-hand side. From the definitions of $\alpha$ and $\beta$, \begin{align*} \int_{\mathbb R^n} \alpha \, d\mu+\int_{\mathbb R^n} \beta \, d\nu=\frac{1}{2}\left(A-\int_{\mathbb R^n} f \, d\mu-\int_{\mathbb R^n} g \, d\nu\right), \end{align*} where \begin{align*} A:=\int_{\mathbb R^n} |x|^2 \, d\mu(x)+\int_{\mathbb R^n} |y|^2 \, d\nu(y). \end{align*} Dual optimality for $f,g$ says \begin{align*} \int_{\mathbb R^n} f \, d\mu+\int_{\mathbb R^n} g \, d\nu=\int_{\mathbb R^n \times \mathbb R^n} |x-y|^2 \, d\pi_0(x,y). \end{align*} Expanding $|x-y|^2=|x|^2+|y|^2-2x\cdot y$ and using that $\pi_0$ has marginals $\mu$ and $\nu$, we get \begin{align*} \int_{\mathbb R^n} \alpha \, d\mu+\int_{\mathbb R^n} \beta \, d\nu=\int_{\mathbb R^n \times \mathbb R^n} x\cdot y \, d\pi_0(x,y). \end{align*} Since $\pi_0$ is a coupling, this last quantity is at most the correlation supremum. Combining the three inequalities forces equality: \begin{align*} \int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu=\sup_{\pi \in \Pi(\mu,\nu)}\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y). \end{align*} [/guided] [/step] [step:Use complementary slackness for the Fenchel pair] Define the nonnegative measurable function \begin{align*} h: \mathbb R^n \times \mathbb R^n \to [0,\infty], \qquad h(x,y):=\varphi(x)+\varphi^*(y)-x \cdot y. \end{align*} Fenchel's inequality gives $h \geq 0$. The affine lower bounds obtained in the preceding step imply that the extended integrals of $\varphi$ and $\varphi^*$ are defined from below. Since their sum has already been proved equal to the finite correlation supremum, no undefined extended-real subtraction occurs below. Since $\pi_0$ maximizes the correlation functional by the identity in the first step, and since $(\varphi,\varphi^*)$ has the same dual value, we have \begin{align*} \int_{\mathbb R^n \times \mathbb R^n} h \, d\pi_0=\int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu-\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi_0(x,y)=0. \end{align*} Therefore $h=0$ for $\pi_0$-almost every $(x,y)$, meaning \begin{align*} \varphi(x)+\varphi^*(y)=x \cdot y \quad \text{for } \pi_0\text{-almost every }(x,y). \end{align*} [/step] [step:Identify the Fenchel equality set with the subdifferential graph] By the Fenchel-Young equality criterion for proper lower semicontinuous convex functions, for $x,y \in \mathbb R^n$ one has \begin{align*} \varphi(x)+\varphi^*(y)=x \cdot y \end{align*} if and only if $y \in \partial \varphi(x)$. The graph of $\partial \varphi$ is closed in $\mathbb R^n \times \mathbb R^n$, hence Borel measurable. Applying this equivalence to the $\pi_0$-almost everywhere equality obtained in the previous step gives \begin{align*} \pi_0\bigl(\{(x,y) \in \mathbb R^n \times \mathbb R^n : y \in \partial \varphi(x)\}\bigr)=1. \end{align*} Thus the optimal plan $\pi_0$ is concentrated on the graph of the convex subdifferential of the proper lower semicontinuous convex function $\varphi$, as claimed. [/step]

What brings you to Androma?

Start with a route through the knowledge graph.