Sinkhorn Scaling for Finite Entropic Optimal Transport with Positive Marginals

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We first prove that the finite transport polytope is nonempty and compact, and that the entropy-regularized objective is continuous and strictly convex, which gives existence and uniqueness of the minimizer. The main point is then to show that the minimizer is strictly positive; this is done by moving a small amount of mass around a two-by-two cycle and using the fact that the derivative of $t\log t-t$ tends to $-\infty$ at $0$. Once positivity is known, first-order optimality on the affine constraint space gives a matrix of partial derivatives lying in the span of the row and column constraint normals. Rearranging the resulting equations yields the Sinkhorn factorization, and a direct ratio argument proves uniqueness of the scaling vectors up to a reciprocal scalar. [/proofplan] [step:Show the feasible set is nonempty and compact] Define $\bar\pi=(\bar\pi_{ij})\in\mathbb R^{m\times n}$ by \begin{align*} \bar\pi_{ij}:=\mu_i\nu_j. \end{align*} Since $\mu_i>0$ and $\nu_j>0$, we have $\bar\pi_{ij}>0$ for all $i,j$. Also, \begin{align*} \sum_{j=1}^n \bar\pi_{ij}=\mu_i\sum_{j=1}^n\nu_j=\mu_i \end{align*} for every $i\in\{1,\dots,m\}$, and \begin{align*} \sum_{i=1}^m \bar\pi_{ij}=\nu_j\sum_{i=1}^m\mu_i=\nu_j \end{align*} for every $j\in\{1,\dots,n\}$. Hence $\bar\pi\in\Gamma(\mu,\nu)$, so $\Gamma(\mu,\nu)$ is nonempty. The set $\Gamma(\mu,\nu)$ is the intersection of the closed orthant $[0,\infty)^{m\times n}$ with finitely many affine hyperplanes, so it is closed in $\mathbb R^{m\times n}$. If $\pi\in\Gamma(\mu,\nu)$, then $0\le \pi_{ij}\le \mu_i\le 1$ for every $i,j$, so $\Gamma(\mu,\nu)$ is bounded. Since $\mathbb R^{m\times n}$ is finite-dimensional, $\Gamma(\mu,\nu)$ is compact. [/step] [step:Obtain existence and uniqueness of the minimizer] Define the objective function $F:\Gamma(\mu,\nu)\to\mathbb R$ by \begin{align*} F(\pi):=\sum_{i=1}^m\sum_{j=1}^n C_{ij}\pi_{ij}+\varepsilon\sum_{i=1}^m\sum_{j=1}^n h(\pi_{ij}). \end{align*} The function $h:[0,\infty)\to\mathbb R$ is continuous because \begin{align*} \lim_{t\downarrow 0}(t\log t-t)=0. \end{align*} Therefore $F$ is continuous on the compact set $\Gamma(\mu,\nu)$, and so $F$ attains a minimum at some $\pi^*\in\Gamma(\mu,\nu)$. The function $h$ is strictly convex on $[0,\infty)$. Indeed, $h$ is strictly convex on $(0,\infty)$ because $h''(t)=1/t>0$, and strict convexity at endpoints follows by taking one-sided limits from $(0,\infty)$. Since $\varepsilon>0$, the function \begin{align*} \pi\mapsto \varepsilon\sum_{i=1}^m\sum_{j=1}^n h(\pi_{ij}) \end{align*} is strictly convex on $[0,\infty)^{m\times n}$, while \begin{align*} \pi\mapsto \sum_{i=1}^m\sum_{j=1}^n C_{ij}\pi_{ij} \end{align*} is affine. Hence $F$ is strictly convex on the convex set $\Gamma(\mu,\nu)$. If two distinct minimizers existed, their midpoint would lie in $\Gamma(\mu,\nu)$ and strict convexity would give a strictly smaller value of $F$, a contradiction. Thus the minimizer is unique. [/step] [step:Prove the minimizer has only positive entries] We prove that $\pi^*_{ij}>0$ for every $i,j$. Define $q=(q_{ij})\in(0,\infty)^{m\times n}$ by \begin{align*} q_{ij}:=\mu_i\nu_j. \end{align*} As in the first step, $q\in\Gamma(\mu,\nu)$. Suppose, to the contrary, that the zero set \begin{align*} Z:=\{(i,j)\in\{1,\dots,m\}\times\{1,\dots,n\}:\pi^*_{ij}=0\} \end{align*} is nonempty, and let $P$ denote its complement in $\{1,\dots,m\}\times\{1,\dots,n\}$. For $\delta\in[0,1]$, define \begin{align*} \pi_\delta:=(1-\delta)\pi^*+\delta q. \end{align*} Because $\Gamma(\mu,\nu)$ is convex, $\pi_\delta\in\Gamma(\mu,\nu)$ for every $\delta\in[0,1]$. Set \begin{align*} M_Z:=\sum_{(i,j)\in Z}q_{ij}. \end{align*} Since $Z$ is nonempty and every $q_{ij}$ is positive, $M_Z>0$. For $(i,j)\in Z$ and $\delta\in(0,1]$, we have $(\pi_\delta)_{ij}=\delta q_{ij}$, and therefore \begin{align*} h((\pi_\delta)_{ij})-h(\pi^*_{ij})=\delta q_{ij}\log\delta+\delta q_{ij}\log q_{ij}-\delta q_{ij}. \end{align*} For $(i,j)\in P$, set $r_{ij}:=q_{ij}-\pi^*_{ij}$. Since $\pi^*_{ij}>0$ on $P$ and $h$ is differentiable on $(0,\infty)$ with $h'(t)=\log t$, the finite limit \begin{align*} \lim_{\delta\downarrow0}\frac{h(\pi^*_{ij}+\delta r_{ij})-h(\pi^*_{ij})}{\delta}=r_{ij}\log\pi^*_{ij} \end{align*} exists for each $(i,j)\in P$. The linear cost term satisfies \begin{align*} \sum_{i=1}^m\sum_{j=1}^n C_{ij}(\pi_\delta-\pi^*)_{ij}=\delta\sum_{i=1}^m\sum_{j=1}^n C_{ij}(q_{ij}-\pi^*_{ij}). \end{align*} Combining the preceding formulas and dividing by $\delta>0$ gives \begin{align*} \frac{F(\pi_\delta)-F(\pi^*)}{\delta}=\varepsilon M_Z\log\delta+R(\delta), \end{align*} where $R:(0,1]\to\mathbb R$ has a finite limit as $\delta\downarrow0$. Since $\varepsilon>0$ and $M_Z>0$, the term $\varepsilon M_Z\log\delta$ tends to $-\infty$ as $\delta\downarrow0$. Hence $F(\pi_\delta)<F(\pi^*)$ for all sufficiently small $\delta>0$, contradicting the minimality of $\pi^*$. Thus $Z$ is empty, so every entry of $\pi^*$ is positive. [guided] The goal is to rule out boundary minimizers. A two-by-two perturbation is not always available: a feasible coupling can have positive marginals and disconnected support. Instead we use a perturbation that always exists, namely the segment from $\pi^*$ to the strictly positive feasible coupling $q=(q_{ij})$ defined by \begin{align*} q_{ij}:=\mu_i\nu_j. \end{align*} Because $\mu_i>0$ and $\nu_j>0$, every $q_{ij}$ is positive. Also $q\in\Gamma(\mu,\nu)$, since its row and column sums are exactly the marginals. Assume that $\pi^*$ has at least one zero entry. Define the zero set \begin{align*} Z:=\{(i,j)\in\{1,\dots,m\}\times\{1,\dots,n\}:\pi^*_{ij}=0\} \end{align*} and let $P$ be the complement of $Z$. For $\delta\in[0,1]$, define the convex interpolation \begin{align*} \pi_\delta:=(1-\delta)\pi^*+\delta q. \end{align*} The transport polytope is convex because its defining conditions are linear equalities and coordinatewise inequalities. Therefore $\pi_\delta\in\Gamma(\mu,\nu)$ for every $\delta\in[0,1]$. The entropy term at the zero entries gives the contradiction. Define \begin{align*} M_Z:=\sum_{(i,j)\in Z}q_{ij}. \end{align*} This number is strictly positive because $Z$ is nonempty and every $q_{ij}>0$. If $(i,j)\in Z$, then $(\pi_\delta)_{ij}=\delta q_{ij}$, so using $h(0)=0$ and $h(t)=t\log t-t$ for $t>0$ gives \begin{align*} h((\pi_\delta)_{ij})-h(\pi^*_{ij})=\delta q_{ij}\log\delta+\delta q_{ij}\log q_{ij}-\delta q_{ij}. \end{align*} Summing these zero-entry contributions produces the singular term $\delta M_Z\log\delta$. Now consider the positive entries. For $(i,j)\in P$, define the real number \begin{align*} r_{ij}:=q_{ij}-\pi^*_{ij}. \end{align*} Since $\pi^*_{ij}>0$ on $P$, the function $h$ is differentiable at $\pi^*_{ij}$ and $h'(t)=\log t$ for $t>0$. Hence \begin{align*} \lim_{\delta\downarrow0}\frac{h(\pi^*_{ij}+\delta r_{ij})-h(\pi^*_{ij})}{\delta}=r_{ij}\log\pi^*_{ij}. \end{align*} There are only finitely many entries, so the sum of all positive-entry entropy quotients has a finite limit as $\delta\downarrow0$. The cost term is affine, so it contributes only a finite first-order quantity: \begin{align*} \sum_{i=1}^m\sum_{j=1}^n C_{ij}(\pi_\delta-\pi^*)_{ij}=\delta\sum_{i=1}^m\sum_{j=1}^n C_{ij}(q_{ij}-\pi^*_{ij}). \end{align*} Putting the zero-entry entropy contribution, the positive-entry entropy contribution, and the affine cost contribution together, we obtain \begin{align*} \frac{F(\pi_\delta)-F(\pi^*)}{\delta}=\varepsilon M_Z\log\delta+R(\delta), \end{align*} where $R(\delta)$ has a finite limit as $\delta\downarrow0$. The term $\varepsilon M_Z\log\delta$ tends to $-\infty$ because $\varepsilon>0$ and $M_Z>0$. Therefore the quotient is negative for all sufficiently small $\delta>0$, and hence \begin{align*} F(\pi_\delta)<F(\pi^*). \end{align*} This contradicts the minimality of $\pi^*$ over $\Gamma(\mu,\nu)$. Thus the zero set $Z$ must be empty, which means $\pi^*_{ij}>0$ for every $i,j$. [/guided] [/step] [step:Derive the row-column multiplier equations] Let $T\subset\mathbb R^{m\times n}$ denote the tangent space to the affine marginal constraints: \begin{align*} T:=\left\{\sigma\in\mathbb R^{m\times n}: \sum_{j=1}^n\sigma_{ij}=0\ \text{for every }i,\ \sum_{i=1}^m\sigma_{ij}=0\ \text{for every }j\right\}. \end{align*} Since $\pi^*_{ij}>0$ for every $i,j$, for every $\sigma\in T$ and all sufficiently small real $s$, the matrix $\pi^*+s\sigma$ lies in $\Gamma(\mu,\nu)$. Thus the first variation of $F$ at $\pi^*$ vanishes in every direction $\sigma\in T$: \begin{align*} \sum_{i=1}^m\sum_{j=1}^n \left(C_{ij}+\varepsilon\log\pi^*_{ij}\right)\sigma_{ij}=0. \end{align*} Define $G=(G_{ij})\in\mathbb R^{m\times n}$ by \begin{align*} G_{ij}:=C_{ij}+\varepsilon\log\pi^*_{ij}. \end{align*} The preceding identity says that $G$ is orthogonal to $T$ with respect to the Euclidean inner product on $\mathbb R^{m\times n}$. We now identify the orthogonal complement of $T$. Define the linear map $L:\mathbb R^{m\times n}\to\mathbb R^m\times\mathbb R^n$ by \begin{align*} L(\sigma):=\left(\left(\sum_{j=1}^n\sigma_{ij}\right)_{i=1}^m,\left(\sum_{i=1}^m\sigma_{ij}\right)_{j=1}^n\right). \end{align*} Then $T=\ker L$. With the Euclidean inner products on $\mathbb R^{m\times n}$ and $\mathbb R^m\times\mathbb R^n$, the adjoint map $L^*:\mathbb R^m\times\mathbb R^n\to\mathbb R^{m\times n}$ is given by \begin{align*} (L^*(\alpha,\beta))_{ij}=\alpha_i+\beta_j. \end{align*} Indeed, for every $\sigma\in\mathbb R^{m\times n}$, \begin{align*} \sum_{i=1}^m\alpha_i\sum_{j=1}^n\sigma_{ij}+\sum_{j=1}^n\beta_j\sum_{i=1}^m\sigma_{ij}=\sum_{i=1}^m\sum_{j=1}^n(\alpha_i+\beta_j)\sigma_{ij}. \end{align*} By the finite-dimensional identity $(\ker L)^\perp=\operatorname{Range}(L^*)$, there exist vectors $\alpha=(\alpha_1,\dots,\alpha_m)\in\mathbb R^m$ and $\beta=(\beta_1,\dots,\beta_n)\in\mathbb R^n$ such that \begin{align*} G_{ij}=\alpha_i+\beta_j \end{align*} for every $i,j$. Hence \begin{align*} C_{ij}+\varepsilon\log\pi^*_{ij}=\alpha_i+\beta_j \end{align*} for every $i\in\{1,\dots,m\}$ and $j\in\{1,\dots,n\}$. [/step] [step:Convert the multiplier equations into Sinkhorn scaling] From \begin{align*} C_{ij}+\varepsilon\log\pi^*_{ij}=\alpha_i+\beta_j \end{align*} we obtain \begin{align*} \pi^*_{ij}=\exp\left(\frac{\alpha_i}{\varepsilon}\right)\exp\left(-\frac{C_{ij}}{\varepsilon}\right)\exp\left(\frac{\beta_j}{\varepsilon}\right). \end{align*} Define $a=(a_1,\dots,a_m)\in(0,\infty)^m$ and $b=(b_1,\dots,b_n)\in(0,\infty)^n$ by \begin{align*} a_i:=\exp\left(\frac{\alpha_i}{\varepsilon}\right) \end{align*} and \begin{align*} b_j:=\exp\left(\frac{\beta_j}{\varepsilon}\right). \end{align*} Since $K_{ij}=\exp(-C_{ij}/\varepsilon)$, this gives \begin{align*} \pi^*_{ij}=a_iK_{ij}b_j \end{align*} for every $i,j$. Finally, because $\pi^*\in\Gamma(\mu,\nu)$, its row and column sums are prescribed. Substituting $\pi^*_{ij}=a_iK_{ij}b_j$ into those constraints gives \begin{align*} \sum_{j=1}^n a_iK_{ij}b_j=\mu_i \end{align*} for every $i\in\{1,\dots,m\}$, and \begin{align*} \sum_{i=1}^m a_iK_{ij}b_j=\nu_j \end{align*} for every $j\in\{1,\dots,n\}$. [/step] [step:Compare two positive factorizations to prove scalar uniqueness] Let $a'\in(0,\infty)^m$ and $b'\in(0,\infty)^n$ satisfy \begin{align*} a'_iK_{ij}b'_j=\pi^*_{ij} \end{align*} for every $i,j$. Since $K_{ij}>0$ and $\pi^*_{ij}=a_iK_{ij}b_j$, cancellation gives \begin{align*} a'_ib'_j=a_ib_j \end{align*} for every $i,j$. Fix $j_0\in\{1,\dots,n\}$ and define \begin{align*} \lambda:=\frac{b_{j_0}}{b'_{j_0}}. \end{align*} Then $\lambda>0$, and for every $i\in\{1,\dots,m\}$, \begin{align*} a'_i b'_{j_0}=a_i b_{j_0}. \end{align*} Dividing by $b'_{j_0}>0$ gives \begin{align*} a'_i=\lambda a_i. \end{align*} Using $a'_i=\lambda a_i$ in $a'_ib'_j=a_ib_j$ and dividing by $a_i>0$ gives \begin{align*} \lambda b'_j=b_j. \end{align*} Thus $b'_j=\lambda^{-1}b_j$ for every $j$. Therefore $a'=\lambda a$ and $b'=\lambda^{-1}b$, which proves the asserted uniqueness of the scaling vectors up to reciprocal scalar multiplication. [/step]

What brings you to Androma?

Start with a route through the knowledge graph.