Sinkhorn Scaling Theorem — Statement & Proof

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We prove the scaling theorem by a finite-dimensional variational argument. We introduce logarithmic variables $x_i=\log u_i$ and $y_j=\log v_j$, minimize a strictly convex functional modulo its one-dimensional translation invariance, and show that its Euler-Lagrange equations are exactly the prescribed row and column constraints. The strict positivity of $K$, $a$, and $b$ gives coercivity on a normalized affine slice and identifies the only possible degeneracy of the Hessian. Uniqueness follows by applying the same Hessian argument to the line segment between two logarithmic scaling pairs. [/proofplan] [step:Set up the logarithmic scaling functional on a normalized slice] Define the affine subspace \begin{align*} E:=\left\{(x,y)\in\mathbb R^m\times\mathbb R^n:\sum_{i=1}^m a_i x_i=0\right\}. \end{align*} Define the function \begin{align*} \Phi:E\to\mathbb R,\qquad (x,y)\mapsto \sum_{i=1}^m\sum_{j=1}^n K_{ij}\exp(x_i+y_j)-\sum_{i=1}^m a_i x_i-\sum_{j=1}^n b_j y_j \end{align*} Since $(x,y)\in E$ implies $\sum_{i=1}^m a_i x_i=0$, on $E$ this is equivalently \begin{align*} \Phi(x,y)=\sum_{i=1}^m\sum_{j=1}^n K_{ij}\exp(x_i+y_j)-\sum_{j=1}^n b_j y_j. \end{align*} The normalization in the definition of $E$ removes the invariance \begin{align*} (x,y)\mapsto (x+c(1,\dots,1),y-c(1,\dots,1)), \end{align*} which leaves all sums $x_i+y_j$ unchanged. [/step] [step:Prove coercivity of the functional on the normalized slice] We prove that $\Phi(z)\to+\infty$ whenever $z=(x,y)\in E$ and $|z|\to\infty$, where $|z|$ is the Euclidean norm on $\mathbb R^{m+n}$. Let \begin{align*} \alpha:=\min_{1\le i\le m} a_i \end{align*} and \begin{align*} \beta:=\min_{1\le j\le n} b_j. \end{align*} Then $\alpha>0$ and $\beta>0$. For $(x,y)\in E$, the identity $\sum_i a_i x_i=0$ implies that either all $x_i=0$ or the vector $x$ has at least one nonnegative component and at least one nonpositive component. In particular, \begin{align*} \max_{1\le i\le m} x_i\ge 0 \end{align*} and \begin{align*} \min_{1\le i\le m} x_i\le 0. \end{align*} Also, \begin{align*} \sum_{i=1}^m a_i |x_i|\le \max_{1\le i\le m}x_i-\min_{1\le i\le m}x_i. \end{align*} Let \begin{align*} M_K:=\min_{1\le i\le m,\ 1\le j\le n}K_{ij}. \end{align*} Then $M_K>0$. For every $(x,y)\in E$, \begin{align*} \sum_{i=1}^m\sum_{j=1}^n K_{ij}\exp(x_i+y_j)\ge M_K\exp\left(\max_i x_i+\max_j y_j\right). \end{align*} Set \begin{align*} A:=\max_{1\le i\le m}x_i+\max_{1\le j\le n}y_j. \end{align*} Since $\max_i x_i\ge 0$, we have $\max_j y_j\le A$. Hence \begin{align*} -\sum_{j=1}^n b_j y_j\ge -\max_{1\le j\le n}y_j\ge -A. \end{align*} Therefore, when $A\to+\infty$, \begin{align*} \Phi(x,y)\ge M_K\exp(A)-A\to+\infty, \end{align*} because the exponential function dominates the linear function. It remains to control sequences for which $\max_i x_i+\max_j y_j$ is bounded above. Suppose $(x_k,y_k)\in E$ is a sequence with $|(x_k,y_k)|\to\infty$ and with $\max_i x_{k,i}+\max_j y_{k,j}$ bounded above. If $\max_j y_{k,j}\to+\infty$, then $\max_i x_{k,i}\to-\infty$, contradicting $\max_i x_{k,i}\ge0$. Hence $\max_j y_{k,j}$ is bounded above. Since \begin{align*} -\sum_{j=1}^n b_j y_{k,j}\ge -\max_{1\le j\le n}y_{k,j}, \end{align*} the linear part cannot tend to $-\infty$ through the maximum of $y_k$. If some component $y_{k,j}\to-\infty$, then, because $b_j>0$, \begin{align*} -\sum_{\ell=1}^n b_\ell y_{k,\ell}\to+\infty \end{align*} unless another component of $y_k$ tends to $+\infty$, which has already been excluded. Therefore $\Phi(x_k,y_k)\to+\infty$ in this case. Finally, if $y_k$ is bounded and $|(x_k,y_k)|\to\infty$, then $|x_k|\to\infty$. Since $x_k\in E$, the oscillation $\max_i x_{k,i}-\min_i x_{k,i}$ tends to $+\infty$. With $y_k$ bounded, at least one sum $x_{k,i}+y_{k,j}$ tends to $+\infty$, and the positive exponential term again forces $\Phi(x_k,y_k)\to+\infty$. Thus $\Phi$ is coercive on $E$. [guided] The only delicate point is that $\Phi$ has a built-in scaling symmetry: adding the same constant to every $x_i$ and subtracting it from every $y_j$ does not change $x_i+y_j$. The slice \begin{align*} E=\left\{(x,y)\in\mathbb R^m\times\mathbb R^n:\sum_{i=1}^m a_i x_i=0\right\} \end{align*} removes exactly this freedom. We now show that $\Phi$ tends to $+\infty$ along every unbounded sequence in $E$. Let $((x_k,y_k))_{k\in\mathbb N}$ be a sequence in $E$ with $|(x_k,y_k)|\to\infty$, where $x_k=(x_{k,1},\dots,x_{k,m})\in\mathbb R^m$ and $y_k=(y_{k,1},\dots,y_{k,n})\in\mathbb R^n$. Define \begin{align*} M_K:=\min_{1\le i\le m,\ 1\le j\le n}K_{ij}. \end{align*} Because every entry of $K$ is strictly positive, $M_K>0$. For every $(x,y)\in E$, the exponential part satisfies \begin{align*} \sum_{i=1}^m\sum_{j=1}^n K_{ij}\exp(x_i+y_j)\ge M_K\exp\left(\max_i x_i+\max_j y_j\right). \end{align*} Define \begin{align*} A_k:=\max_{1\le i\le m}x_{k,i}+\max_{1\le j\le n}y_{k,j}. \end{align*} If $A_k\to+\infty$, then the normalization gives $\max_i x_{k,i}\ge0$, so $\max_j y_{k,j}\le A_k$. Because the weights $b_j$ are positive and sum to $1$, \begin{align*} -\sum_{j=1}^n b_j y_{k,j}\ge -\max_{1\le j\le n}y_{k,j}\ge -A_k. \end{align*} Consequently \begin{align*} \Phi(x_k,y_k)\ge M_K\exp(A_k)-A_k\to+\infty. \end{align*} This is the precise domination estimate: the worst possible contribution from the linear term is only linear in $A_k$, while the positive exponential term grows like $\exp(A_k)$. It remains to consider the case where $A_k=\max_i x_{k,i}+\max_j y_{k,j}$ is bounded above. Since \begin{align*} \sum_{i=1}^m a_i x_{k,i}=0 \end{align*} and every $a_i$ is positive, the vector $x_k$ cannot have all components strictly negative or all components strictly positive unless it is balanced by components of the opposite sign. In particular, \begin{align*} \max_i x_{k,i}\ge0. \end{align*} Therefore boundedness of $\max_i x_{k,i}+\max_j y_{k,j}$ implies that $\max_j y_{k,j}$ is bounded above. Now the negative linear term in $\Phi$ is helpful. Since each $b_j>0$, if any component $y_{k,j}$ tends to $-\infty$ while $\max_j y_{k,j}$ remains bounded above, then \begin{align*} -\sum_{\ell=1}^n b_\ell y_{k,\ell}\to+\infty. \end{align*} Hence $\Phi(x_k,y_k)\to+\infty$ in that case. The only remaining possibility is that $y_k$ is bounded but $x_k$ is unbounded. Since $x_k$ satisfies the weighted normalization \begin{align*} \sum_{i=1}^m a_i x_{k,i}=0, \end{align*} unboundedness of $x_k$ forces the oscillation between its largest and smallest components to tend to $+\infty$. Because $y_k$ is bounded, some sum $x_{k,i}+y_{k,j}$ must tend to $+\infty$. The corresponding exponential term \begin{align*} K_{ij}\exp(x_{k,i}+y_{k,j}) \end{align*} then tends to $+\infty$, and all other exponential terms are nonnegative. Thus $\Phi(x_k,y_k)\to+\infty$. This proves coercivity on $E$. [/guided] [/step] [step:Minimize the functional and derive the row and column equations] We use the finite-dimensional direct method. Let $((x_k,y_k))_{k\in\mathbb N}$ be a minimizing sequence for $\Phi$ on $E$. Since $\Phi$ is coercive on $E$, this sequence is bounded in the finite-dimensional Euclidean space $\mathbb R^m\times\mathbb R^n$. By Bolzano-Weierstrass, after passing to a subsequence there is a point $(x^*,y^*)\in\mathbb R^m\times\mathbb R^n$ such that $(x_k,y_k)\to(x^*,y^*)$. Since $E$ is a closed affine subspace, $(x^*,y^*)\in E$. Since $\Phi:E\to\mathbb R$ is continuous, $\Phi$ attains its minimum at $(x^*,y^*)$. For $i\in\{1,\dots,m\}$ and $j\in\{1,\dots,n\}$, define \begin{align*} P_{ij}:=K_{ij}\exp(x_i^*+y_j^*). \end{align*} We first compute the column equations. For each $j\in\{1,\dots,n\}$ and each $s\in\mathbb R$, the point $(x^*,y^*+s e_j)$ lies in $E$, where $e_j\in\mathbb R^n$ is the $j$-th standard basis vector. Differentiating the one-variable function $s\mapsto\Phi(x^*,y^*+s e_j)$ at $s=0$ gives \begin{align*} 0=\sum_{i=1}^m K_{ij}\exp(x_i^*+y_j^*)-b_j. \end{align*} Therefore \begin{align*} \sum_{i=1}^m P_{ij}=b_j. \end{align*} For the row equations, let $h=(h_1,\dots,h_m)\in\mathbb R^m$ satisfy \begin{align*} \sum_{i=1}^m a_i h_i=0. \end{align*} Then $(x^*+s h,y^*)\in E$ for every $s\in\mathbb R$. Differentiating $s\mapsto\Phi(x^*+s h,y^*)$ at $s=0$ gives \begin{align*} 0=\sum_{i=1}^m h_i\left(\sum_{j=1}^n K_{ij}\exp(x_i^*+y_j^*)-a_i\right). \end{align*} Define $r=(r_1,\dots,r_m)\in\mathbb R^m$ by \begin{align*} r_i:=\sum_{j=1}^n P_{ij}-a_i. \end{align*} The preceding identity says \begin{align*} \sum_{i=1}^m h_i r_i=0 \end{align*} for every $h\in\mathbb R^m$ satisfying $\sum_i a_i h_i=0$. Hence $r$ is a scalar multiple of $a$. Summing the components of $r$ and using the already proved column equations gives \begin{align*} \sum_{i=1}^m r_i=\sum_{i=1}^m\sum_{j=1}^n P_{ij}-\sum_{i=1}^m a_i=\sum_{j=1}^n b_j-1=0. \end{align*} Since $\sum_i a_i=1$, the scalar multiple must be zero. Thus $r_i=0$ for every $i$, namely \begin{align*} \sum_{j=1}^n P_{ij}=a_i. \end{align*} [/step] [step:Convert the critical point into positive scaling vectors] Define \begin{align*} u:\{1,\dots,m\}\to(0,\infty),\qquad i\mapsto \exp(x_i^*) \end{align*} and \begin{align*} v:\{1,\dots,n\}\to(0,\infty),\qquad j\mapsto \exp(y_j^*) \end{align*} Equivalently, write $u=(u_1,\dots,u_m)\in(0,\infty)^m$ and $v=(v_1,\dots,v_n)\in(0,\infty)^n$. Then \begin{align*} P_{ij}=u_iK_{ij}v_j. \end{align*} The row and column identities proved above show that \begin{align*} P=\operatorname{diag}(u)K\operatorname{diag}(v)\in\Pi(a,b). \end{align*} This proves existence. [/step] [step:Identify the only possible non-uniqueness of the scaling factors] Let $\widetilde u\in(0,\infty)^m$ and $\widetilde v\in(0,\infty)^n$ satisfy \begin{align*} \widetilde P:=\operatorname{diag}(\widetilde u)K\operatorname{diag}(\widetilde v)\in\Pi(a,b). \end{align*} Define logarithmic differences $p=(p_1,\dots,p_m)\in\mathbb R^m$ and $q=(q_1,\dots,q_n)\in\mathbb R^n$ by \begin{align*} p_i:=\log \widetilde u_i-\log u_i \end{align*} and \begin{align*} q_j:=\log \widetilde v_j-\log v_j. \end{align*} Define the parameterized matrix family \begin{align*} P_{\mathrm{seg}}:[0,1]\to(0,\infty)^{m\times n},\qquad t\mapsto P_{\mathrm{seg}}(t) \end{align*} by \begin{align*} P_{\mathrm{seg}}(t)_{ij}:=P_{ij}\exp(t(p_i+q_j)). \end{align*} Both endpoint matrices $P_{\mathrm{seg}}(0)=P$ and $P_{\mathrm{seg}}(1)=\widetilde P$ have row marginals $a$ and column marginals $b$. Consider the convex function $\psi:[0,1]\to\mathbb R$ defined by \begin{align*} \psi(t):=\sum_{i=1}^m\sum_{j=1}^n P_{ij}\exp(t(p_i+q_j))-\sum_{i=1}^m a_i t p_i-\sum_{j=1}^n b_j t q_j. \end{align*} The endpoint marginal equations imply $\psi'(0)=0$ and $\psi'(1)=0$. Its second derivative is \begin{align*} \psi''(t)=\sum_{i=1}^m\sum_{j=1}^n P_{ij}\exp(t(p_i+q_j))(p_i+q_j)^2. \end{align*} Since every $P_{ij}>0$, every term in this sum is nonnegative, and $\psi''(t)=0$ holds only if \begin{align*} p_i+q_j=0 \end{align*} for every $i\in\{1,\dots,m\}$ and $j\in\{1,\dots,n\}$. Because $\psi''\ge0$, the derivative $\psi'$ is nondecreasing on $[0,1]$. Together with $\psi'(0)=\psi'(1)=0$, this forces $\psi'$ to be constant on $[0,1]$. Hence the continuous nonnegative function $\psi''$ vanishes on $(0,1)$. Therefore $p_i+q_j=0$ for all $i,j$. Fix $i_0\in\{1,\dots,m\}$ and define \begin{align*} c:=p_{i_0}. \end{align*} From $p_i+q_j=0$ for all $i,j$, we obtain $q_j=-c$ for every $j$ and then $p_i=c$ for every $i$. Set \begin{align*} \lambda:=\exp(c). \end{align*} Then $\lambda>0$ and \begin{align*} \widetilde u=\lambda u \end{align*} while \begin{align*} \widetilde v=\lambda^{-1}v. \end{align*} Thus the scaling pair is unique up to reciprocal scalar rescaling, completing the proof. [/step]

What brings you to Androma?

Start with a route through the knowledge graph.

Sinkhorn Scaling Theorem (Theorem # 9600)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Sinkhorn Scaling Theorem (Theorem # 9600)

Discussion

Proof

Explore Further