
Attributions & Verification

Track contributions and verify content correctness

Content

Optimal Transport I: Foundations develops the central questions and methods of optimal transport, the study of how to move mass from one distribution to another as efficiently as possible. The course begins with the classical Monge formulation, where transport is described by maps, and then moves to the Kantorovich relaxation, which replaces rigid maps with transport plans and makes the theory flexible enough to handle existence, duality, and computation. Along the way, the course connects analysis, geometry, partial differential equations, and linear programming through a single organizing problem.

text admin

The main themes are the structure of optimal couplings, the role of dual potentials, and the geometric meaning of transport cost. Early chapters build the basic variational framework; discrete optimal transport and linear programming then provide a computable model that clarifies the general theory. From there, the course develops [Kantorovich duality](/theorems/6799), cyclical monotonicity, and Brenier's theorem for quadratic cost, leading to the Monge-Ampère equation as the PDE behind optimal maps. The later chapters use these tools to define Wasserstein distances, study the geometry of Wasserstein spaces, and analyze stability, approximation, and limiting behavior of transport problems in general settings.

text admin

# Introduction

h1 admin

Optimal transport asks how one distribution of mass can be moved into another while paying the least possible cost. The subject begins with a geometric problem about moving piles of material, but the modern theory quickly becomes a language for probability measures, [weak convergence](/page/Weak%20Convergence), convexity, and partial differential equations. This foundations course develops the passage from transport maps to transport plans, then uses compactness and duality to obtain existence and structure theorems. Chapters 6 and 7 develop the quadratic-cost theory, culminating in [Brenier's theorem](/theorems/7477) and its Monge-Ampère equation, while Chapters 8 and 9 treat Wasserstein distances and their geodesics.

text admin

The guiding viewpoint is that a probability measure records where mass sits, while a transport object records how that mass is matched to its new location. At first, the desired object is a map: each point of the source chooses a single destination. The main lesson of the opening lectures is that maps are too rigid for a general existence theory, so the course replaces them by measures on the product space.

text admin

We use two standard pieces of notation from the start. If $X$ is a measurable or [topological space](/page/Topological%20Space), then $\mathcal P(X)$ denotes the set of probability measures on $X$ (Borel probability measures when $X$ is a topological space). On $\mathbb R^n$, the symbol $\mathcal L^n$ denotes [Lebesgue measure](/page/Lebesgue%20Measure).

text admin

## The Central Question of Transport

h2 admin

How can two probability measures be compared when the underlying space has geometry? Total variation detects how much mass differs at the same location, but it does not measure the effort required to move mass from one location to another. Optimal transport inserts a cost $c(x,y)$ between a source point $x$ and a target point $y$, and asks for a mass-preserving assignment that minimises the total cost.

text admin
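The contrast between total variation and a transport cost can be seen on two Dirac masses. The sketch below is illustrative only: it assumes finitely supported measures stored as dictionaries, uses the convention $\sup_B|\mu(B)-\nu(B)|$ for total variation, and the helper names `total_variation` and `dirac_transport_cost` are hypothetical.

```python
from fractions import Fraction


def total_variation(mu, nu):
    """Total variation sup_B |mu(B) - nu(B)| of two finitely supported
    probability measures, computed as half the l1 distance of the weights."""
    support = set(mu) | set(nu)
    return sum(abs(mu.get(x, 0) - nu.get(x, 0)) for x in support) / 2


def dirac_transport_cost(x, y):
    """Quadratic cost c(x, y) = |x - y|^2 of moving a unit atom from x to y.
    Every transport map from delta_x to delta_y sends x to y, so this is
    the value of the transport problem between the two Dirac masses."""
    return (x - y) ** 2


mu = {Fraction(0): Fraction(1)}  # delta_0
for eps in (Fraction(1, 10), Fraction(1), Fraction(10)):
    nu = {eps: Fraction(1)}      # delta_eps
    print(eps, total_variation(mu, nu), dirac_transport_cost(0, eps))
```

In this sketch the total variation equals 1 for every nonzero displacement, while the quadratic transport cost grows with the distance between the atoms.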

The first formulation uses deterministic assignments. Before minimising anything, we need a precise way to say that a measurable map sends one measure to another.

text admin

[definition: Pushforward Measure] Let $(X, \mathcal A)$ and $(Y, \mathcal B)$ be measurable spaces, let $\mu$ be a measure on $(X, \mathcal A)$, and let $T: X \to Y$ be measurable. The pushforward measure $T_\#\mu$ on $(Y, \mathcal B)$ is defined by \begin{align*} T_\#\mu(B) = \mu(T^{-1}(B)) \end{align*} for every $B \in \mathcal B$. [/definition]

definition admin
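For a finitely supported measure, the definition reduces to summing the weights of all source points that land on a given target point. The following is a minimal sketch under that assumption. The dictionary representation and the helper names `pushforward` and `measure_of` are illustrative choices, not part of the course.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(mu, T):
    """Pushforward T_# mu of a finitely supported measure.

    mu maps support points to weights; the weight of each x is added
    to the image point T(x)."""
    out = defaultdict(Fraction)  # Fraction() is 0
    for x, w in mu.items():
        out[T(x)] += w
    return dict(out)


def measure_of(mu, B):
    """mu(B) for a finite set B."""
    return sum((w for x, w in mu.items() if x in B), Fraction(0))


mu = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}
nu = pushforward(mu, abs)  # T(x) = |x|

# Check T_# mu(B) = mu(T^{-1}(B)) for B = {1}, whose preimage is {-1, 1}.
B = {1}
preimage = {x for x in mu if abs(x) in B}
print(nu, measure_of(nu, B), measure_of(mu, preimage))
```

Here the two atoms at $-1$ and $1$ are merged by $T(x)=|x|$, so the mass of $\{1\}$ under the pushforward is the mass of its preimage $\{-1,1\}$ under $\mu$.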

The pushforward condition is the mass-balance constraint: all mass that lands in a target set $B$ must have come from the preimage $T^{-1}(B)$. This lets us state Monge's problem as a constrained minimisation over maps.

text admin

[definition: Transport Map] Let $(X, \mathcal A, \mu)$ and $(Y, \mathcal B, \nu)$ be probability spaces. A transport map from $\mu$ to $\nu$ is a measurable map $T: X \to Y$ such that $T_\#\mu = \nu$. [/definition]

definition admin

A cost function $c: X \times Y \to (-\infty,\infty]$ assigns a price $c(x,y)$ to moving a unit of mass from $x$ to $y$, so a transport map $T$ pays $c(x,T(x))$ at each source point $x$. Minimising the total cost over all transport maps is the historical starting point.

text admin

[definition: Monge Problem] Let $X$ and $Y$ be measurable spaces, let $\mu \in \mathcal P(X)$ and $\nu \in \mathcal P(Y)$, and let $c: X \times Y \to (-\infty,\infty]$ be measurable. The Monge problem is \begin{align*} \inf\left\{\int_X c(x,T(x))\,d\mu(x) : T: X \to Y \text{ measurable},\ T_\#\mu = \nu\right\}. \end{align*} [/definition]

definition admin

This problem is nonlinear in two ways: the constraint $T_\#\mu=\nu$ depends nonlinearly on $T$, and the set of transport maps is not convex, so averaging two admissible maps need not give an admissible map. These features are the reason Monge's formulation is geometrically natural but analytically difficult.

text admin
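In the simplest discrete case the Monge problem can be solved by enumeration. The sketch below assumes that $\mu$ and $\nu$ are uniform measures on $n$ distinct points each, so that transport maps correspond to permutations of the target points. The function name `monge_bruteforce` is hypothetical, and the $n!$ search is only meant to illustrate the definition, not to serve as a practical algorithm.

```python
from itertools import permutations


def monge_bruteforce(xs, ys, cost):
    """Minimise (1/n) * sum_i cost(xs[i], ys[perm[i]]) over permutations.

    Assumes mu and nu are uniform on the distinct points xs and ys,
    so every transport map sends xs[i] to ys[perm[i]] for some perm."""
    n = len(xs)
    assert len(ys) == n, "uniform measures need equally many atoms"
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost(xs[i], ys[perm[i]]) for i in range(n)) / n
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, best_perm


xs = [0.0, 1.0, 3.0]
ys = [2.0, 5.0, 4.0]
value, perm = monge_bruteforce(xs, ys, lambda x, y: (x - y) ** 2)
print(value, [(xs[i], ys[perm[i]]) for i in range(len(xs))])
```

For this data the minimising assignment pairs the points in increasing order, $0\mapsto 2$, $1\mapsto 4$, $3\mapsto 5$, with total cost $17/3$.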

[example: Translating Lebesgue Measure] Let $A\subset\mathbb R^n$ be a Borel set with $0<\mathcal L^n(A)<\infty$, let $\mu$ be given by $\mu(B)=\mathcal L^n(B\cap A)/\mathcal L^n(A)$ for Borel sets $B\subset \mathbb R^n$, and fix $a\in\mathbb R^n$. The map $T_a(x)=x+a$ is continuous, hence Borel measurable. Define $\nu=(T_a)_\#\mu$, so that $T_a$ is a transport map from $\mu$ to $\nu$ by construction. Since $T_a^{-1}(B)=B-a$ and Lebesgue measure is translation invariant, for every Borel set $B\subset\mathbb R^n$, \begin{align*} \nu(B)=\mu(T_a^{-1}(B))=\frac{\mathcal L^n((B-a)\cap A)}{\mathcal L^n(A)}=\frac{\mathcal L^n(B\cap (A+a))}{\mathcal L^n(A+a)}. \end{align*} Thus $\nu$ is normalised Lebesgue measure on the translated set $A+a$. For the squared Euclidean cost $c(x,y)=|x-y|^2$, the cost of this map is \begin{align*} \int_{\mathbb R^n} c(x,T_a(x))\,d\mu(x)=\int_{\mathbb R^n}|x-(x+a)|^2\,d\mu(x). \end{align*} Since $x-(x+a)=-a$ and $|-a|=|a|$, this becomes \begin{align*} \int_{\mathbb R^n}|x-(x+a)|^2\,d\mu(x)=\int_{\mathbb R^n}|a|^2\,d\mu(x). \end{align*} Because $|a|^2$ is constant and $\mu$ is a probability measure, \begin{align*} \int_{\mathbb R^n}|a|^2\,d\mu(x)=|a|^2\mu(\mathbb R^n)=|a|^2. \end{align*} So translating every point by the same vector $a$ has total quadratic transport cost exactly $|a|^2$. [/example]

example admin
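The computation in the example can be checked numerically. The sketch below rests on an assumption that is not in the example itself: the measure $\mu$ is replaced by an empirical measure on random points of the unit square, standing in for normalised Lebesgue measure on $A=[0,1]^2$. The helper name `quadratic_cost_of_map` and the vector `a` are illustrative.

```python
import math
import random


def quadratic_cost_of_map(points, T):
    """Average of |x - T(x)|^2 over the sample points, i.e. the quadratic
    transport cost of T for the uniform measure on those points."""
    total = 0.0
    for x in points:
        y = T(x)
        total += sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return total / len(points)


random.seed(0)
a = (3.0, -4.0)  # translation vector, |a|^2 = 25
points = [(random.random(), random.random()) for _ in range(1000)]


def translate(x):
    """The map T_a(x) = x + a."""
    return tuple(xi + ai for xi, ai in zip(x, a))


cost = quadratic_cost_of_map(points, translate)
print(cost, math.isclose(cost, 25.0))
```

Up to floating-point rounding the cost does not depend on the sample, because the integrand $|x-T_a(x)|^2=|a|^2$ is constant.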

The example shows the simplest geometric meaning of the cost: transport is not just comparison of two measures, but comparison through a chosen geometry. It also hides a rigidity that becomes visible when a single source point must send mass to several destinations. To formulate transport in a way that can represent splitting, the next object must record pairings of source and target locations as a measure rather than as a function.

text admin

## Why Maps Are Too Rigid

h2 admin

What fails if every source point is forced to choose only one destination? The obstruction appears as soon as atoms are present. A point mass cannot be split by a deterministic map, but many natural target measures require exactly such splitting.

text admin

[example: An Atom Cannot Split] Let $\mu=\delta_0$ on $\mathbb R$ and let $\nu=\frac12\delta_{-1}+\frac12\delta_1$. Fix a measurable map $T:\mathbb R\to\mathbb R$. For every Borel set $B\subset\mathbb R$, the definition of pushforward gives \begin{align*} T_\#\mu(B)=\mu(T^{-1}(B))=\delta_0(T^{-1}(B)). \end{align*} The condition $0\in T^{-1}(B)$ is equivalent to $T(0)\in B$, so $T_\#\mu(B)=1$ when $T(0)\in B$ and $T_\#\mu(B)=0$ when $T(0)\notin B$. Hence $T_\#\mu=\delta_{T(0)}$. If $T$ were a transport map from $\mu$ to $\nu$, then $T_\#\mu=\nu$. Evaluating both measures on the singleton $\{T(0)\}$ gives \begin{align*} T_\#\mu(\{T(0)\})=\delta_{T(0)}(\{T(0)\})=1. \end{align*} On the other hand, $\nu(\{T(0)\})$ is $1/2$ if $T(0)=-1$, is $1/2$ if $T(0)=1$, and is $0$ otherwise. Thus $\nu(\{T(0)\})\ne 1$, contradicting $T_\#\mu=\nu$. Therefore no transport map sends $\delta_0$ to $\frac12\delta_{-1}+\frac12\delta_1$, even though the desired movement is exactly to send half of the mass at $0$ to $-1$ and half to $1$. [/example]

example admin
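The obstruction can be checked on finite data. In the sketch below the representation of measures as dictionaries and the helper names `pushforward` and `marginals` are illustrative assumptions. The last part records the splitting as a measure on the product space, in the spirit of the transport plans introduced later in the course.

```python
from collections import defaultdict
from fractions import Fraction

half = Fraction(1, 2)
mu = {0: Fraction(1)}      # delta_0
nu = {-1: half, 1: half}   # (1/2) delta_{-1} + (1/2) delta_1


def pushforward(mu, T):
    """Pushforward of a finitely supported measure under the map T."""
    out = defaultdict(Fraction)
    for x, w in mu.items():
        out[T(x)] += w
    return dict(out)


# On the support of mu a map is determined by the single value t = T(0);
# whatever t is, the pushforward is the Dirac mass at t, never nu.
candidates = [-1, 0, 1, 2]
images = [pushforward(mu, lambda x, t=t: t) for t in candidates]
print(images)


def marginals(plan):
    """First and second marginals of a finitely supported measure on pairs."""
    first, second = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), w in plan.items():
        first[x] += w
        second[y] += w
    return dict(first), dict(second)


# A measure on pairs can split the atom: half of the mass at 0 goes to -1
# and half goes to 1.
plan = {(0, -1): half, (0, 1): half}
print(marginals(plan))
```

Each candidate image is a single atom of mass 1, while the measure on pairs has first marginal $\delta_0$ and second marginal $\frac12\delta_{-1}+\frac12\delta_1$.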

Showing 20 of 629 blocks

Verification Progress

629 Total Blocks

0 Verified

0% verified

Contributors

admin 629 blocks (0 verified)

Who Can Verify

Areas: Analysis

Viktor Miykov Admin

Max Vassiliev Global Reviewer

Horia Neagu Global Reviewer

강현욱 Global Reviewer

Demo Testing Global Reviewer

Archie Pennycook Global Reviewer
