Optimal Transport I: Foundations develops the central questions and methods of optimal transport, the study of how to move mass from one distribution to another as efficiently as possible. The course begins with the classical Monge formulation, where transport is described by maps, and then moves to the Kantorovich relaxation, which replaces rigid maps with transport plans and makes the theory flexible enough to handle existence, duality, and computation. Along the way, the course connects analysis, geometry, partial differential equations, and linear programming through a single organizing problem.
The main themes are the structure of optimal couplings, the role of dual potentials, and the geometric meaning of transport cost. Early chapters build the basic variational framework, then discrete optimal transport and linear programming provide a computable model that clarifies the general theory. From there, the course develops [Kantorovich duality](/theorems/6799), cyclical monotonicity, and Brenier’s theorem for quadratic cost, leading to the Monge-Ampère equation as the PDE behind optimal maps. The later chapters use these tools to define Wasserstein distances, study the geometry of Wasserstein spaces, and analyze stability, approximation, and limiting behavior of transport problems in general settings.
# Introduction
Optimal transport asks how one distribution of mass can be moved into another while paying the least possible cost. The subject begins with a geometric problem about moving piles of material, but the modern theory quickly becomes a language for probability measures, [weak convergence](/page/Weak%20Convergence), convexity, and partial differential equations. This foundations course develops the passage from transport maps to transport plans, then uses compactness and duality to obtain existence and structure theorems. Chapters 6 and 7 develop the quadratic-cost theory culminating in [Brenier's theorem](/theorems/7477) and its Monge-Ampère equation, while Chapters 8 and 9 treat Wasserstein distances and their geodesics.
The guiding viewpoint is that a probability measure records where mass sits, while a transport object records how that mass is matched to its new location. At first, the desired object is a map: each point of the source chooses a single destination. The main lesson of the opening lectures is that maps are too rigid for a general existence theory, so the course replaces them by measures on the product space.
We use two standard pieces of notation from the start. If $X$ is a measurable or [topological space](/page/Topological%20Space), then $\mathcal P(X)$ denotes the set of probability measures on $X$. On $\mathbb R^n$, the symbol $\mathcal L^n$ denotes [Lebesgue measure](/page/Lebesgue%20Measure).
## The Central Question of Transport
How can two probability measures be compared when the underlying space has geometry? Total variation detects how much mass differs at the same location, but it does not measure the effort required to move mass from one location to another. Optimal transport inserts a cost $c(x,y)$ between a source point $x$ and a target point $y$, and asks for a mass-preserving assignment that minimises the total cost.
The first formulation uses deterministic assignments. Before minimising anything, we need a precise way to say that a measurable map sends one measure to another.
[definition: Pushforward Measure]
Let $(X, \mathcal A)$ and $(Y, \mathcal B)$ be measurable spaces, let $\mu$ be a measure on $(X, \mathcal A)$, and let $T: X \to Y$ be measurable. The pushforward measure $T_\#\mu$ on $(Y, \mathcal B)$ is defined by
\begin{align*}
T_\#\mu(B) = \mu(T^{-1}(B))
\end{align*}
for every $B \in \mathcal B$.
[/definition]
The pushforward condition is the mass-balance constraint: all mass that lands in a target set $B$ must have come from the preimage $T^{-1}(B)$. This lets us state Monge's problem as a constrained minimisation over maps.
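
For finitely supported measures the pushforward can be computed directly, which may help make the mass-balance identity concrete. The following is a minimal sketch, assuming a measure is stored as a dictionary from points to masses; the helper names `pushforward` and `measure_of` are illustrative and not part of the course notation.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(mu, T):
    """Return T_# mu for a finitely supported measure mu given as {point: mass}."""
    nu = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[T(x)] += mass  # the mass sitting at x lands at T(x)
    return dict(nu)

def measure_of(measure, B):
    """Mass that a finitely supported measure assigns to the set B."""
    return sum((m for y, m in measure.items() if y in B), Fraction(0))

# A probability measure on {-1, 0, 1} and the map T(x) = x^2.
mu = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}
T = lambda x: x * x
nu = pushforward(mu, T)

# Mass balance on B = {1}: nu(B) equals mu of the preimage T^{-1}(B) = {-1, 1}.
B = {1}
preimage = {x for x in mu if T(x) in B}
assert measure_of(nu, B) == measure_of(mu, preimage)
```

Here the two atoms at $-1$ and $1$ merge into a single atom at $1$, so a map can combine mass from several source points even though it cannot divide the mass at one source point.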
[definition: Transport Map]
Let $(X, \mathcal A, \mu)$ and $(Y, \mathcal B, \nu)$ be probability spaces. A transport map from $\mu$ to $\nu$ is a measurable map $T: X \to Y$ such that $T_\#\mu = \nu$.
[/definition]
Given a transport map $T$, a cost function $c: X \times Y \to (-\infty,\infty]$ assigns a price to moving a unit of mass from $x$ to $T(x)$. The resulting optimisation problem is the historical starting point.
[definition: Monge Problem]
Let $X$ and $Y$ be measurable spaces, let $\mu \in \mathcal P(X)$ and $\nu \in \mathcal P(Y)$, and let $c: X \times Y \to (-\infty,\infty]$ be measurable. The Monge problem is
\begin{align*}
\inf\left\{\int_X c(x,T(x))\,d\mu(x) : T_\#\mu = \nu\right\}.
\end{align*}
[/definition]
This problem is nonlinear in two ways: the constraint $T_\#\mu=\nu$ depends nonlinearly on $T$, and the set of admissible maps is not convex, since an average of two transport maps need not be a transport map. Those features are the reason Monge's formulation is geometrically natural but analytically difficult.
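
In one special case the Monge problem reduces to a finite search. The sketch below assumes that $\mu$ and $\nu$ are both uniform on $n$ distinct points, so that transport maps correspond to bijections between the two supports, and it simply tries every bijection. The function name `monge_bruteforce` and the sample points are illustrative assumptions; the enumeration is only feasible for very small $n$ and is not meant as a practical algorithm.

```python
from itertools import permutations

def monge_bruteforce(xs, ys, cost):
    """Minimise the average cost over bijections xs[i] -> ys[sigma[i]].

    Assumes mu and nu are uniform on the distinct points xs and ys,
    so each point carries mass 1/n and admissible maps are bijections.
    """
    n = len(xs)
    best_cost, best_sigma = None, None
    for sigma in permutations(range(n)):
        total = sum(cost(xs[i], ys[sigma[i]]) for i in range(n)) / n
        if best_cost is None or total < best_cost:
            best_cost, best_sigma = total, sigma
    return best_cost, best_sigma

xs = [0.0, 1.0, 2.0]   # support of mu
ys = [2.5, 0.5, 1.5]   # support of nu
quadratic = lambda x, y: (x - y) ** 2

best_cost, best_sigma = monge_bruteforce(xs, ys, quadratic)
# best_sigma == (1, 2, 0): the map 0 -> 0.5, 1 -> 1.5, 2 -> 2.5, with cost 0.25
```

For this data the minimiser is the order-preserving matching. The search space has $n!$ elements and no linear structure, which is one way to see why the map formulation is hard to analyse directly.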
[example: Translating Lebesgue Measure]
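Let $A\subset\mathbb R^n$ be a Borel set with $0<\mathcal L^n(A)<\infty$, let $\mu$ be the normalised Lebesgue measure on $A$, given by $\mu(B)=\mathcal L^n(B\cap A)/\mathcal L^n(A)$ for Borel sets $B\subset \mathbb R^n$, and fix $a\in\mathbb R^n$. The map $T_a(x)=x+a$ is continuous, hence Borel measurable. Define $\nu=(T_a)_\#\mu$. Since $T_a^{-1}(B)=B-a$ and $\mathcal L^n$ is translation invariant, for every Borel set $B\subset\mathbb R^n$,
\begin{align*}
\nu(B)=\mu(T_a^{-1}(B))=\frac{\mathcal L^n((B-a)\cap A)}{\mathcal L^n(A)}=\frac{\mathcal L^n(B\cap (A+a))}{\mathcal L^n(A+a)}.
\end{align*}
Thus $\nu$ is the normalised Lebesgue measure on the translated set $A+a$, and $T_a$ is a transport map from $\mu$ to $\nu$.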
For the squared Euclidean cost $c(x,y)=|x-y|^2$, the cost of this map is
\begin{align*}
\int_{\mathbb R^n} c(x,T_a(x))\,d\mu(x)=\int_{\mathbb R^n}|x-(x+a)|^2\,d\mu(x).
\end{align*}
Since $x-(x+a)=-a$ and $|-a|=|a|$, this becomes
\begin{align*}
\int_{\mathbb R^n}|x-(x+a)|^2\,d\mu(x)=\int_{\mathbb R^n}|a|^2\,d\mu(x).
\end{align*}
Because $|a|^2$ is constant and $\mu$ is a probability measure,
\begin{align*}
\int_{\mathbb R^n}|a|^2\,d\mu(x)=|a|^2\mu(\mathbb R^n)=|a|^2.
\end{align*}
So translating every point by the same vector $a$ has total quadratic transport cost exactly $|a|^2$.
[/example]
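
The computation in the example can be checked numerically on an empirical measure. The sketch below assumes $n=2$, takes $A$ to be the unit square, and replaces $\mu$ by the uniform measure on random sample points; these choices and the helper name `quadratic_cost_of_map` are illustrative assumptions.

```python
import random

def quadratic_cost_of_map(points, T):
    """Average of |x - T(x)|^2 over the empirical measure on `points`."""
    total = 0.0
    for x in points:
        y = T(x)
        total += sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return total / len(points)

random.seed(0)
a = (3.0, -4.0)  # translation vector with |a|^2 = 25

# Sample points in the unit square, an assumed choice of the set A.
points = [(random.random(), random.random()) for _ in range(1000)]
translate = lambda x: tuple(xi + ai for xi, ai in zip(x, a))

cost = quadratic_cost_of_map(points, translate)
# cost agrees with |a|^2 = 25 up to floating-point rounding
```

The result does not depend on the sample, because the integrand $|x-T_a(x)|^2$ is the constant $|a|^2$; only the total mass of the measure matters.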
The example shows the simplest geometric meaning of the cost: transport is not just comparison of two measures, but comparison through a chosen geometry. It also hides a rigidity that becomes visible when a single source point must send mass to several destinations. To formulate transport in a way that can represent splitting, the next object must record pairings of source and target locations as a measure rather than as a function.
## Why Maps Are Too Rigid
What fails if every source point is forced to choose only one destination? The obstruction appears as soon as atoms are present. A point mass cannot be split by a deterministic map, but many natural target measures require exactly such splitting.
[example: An Atom Cannot Split]
Let $\mu=\delta_0$ on $\mathbb R$ and let $\nu=\frac12\delta_{-1}+\frac12\delta_1$. Fix a measurable map $T:\mathbb R\to\mathbb R$. For every Borel set $B\subset\mathbb R$, the definition of pushforward gives
\begin{align*}
T_\#\mu(B)=\mu(T^{-1}(B))=\delta_0(T^{-1}(B)).
\end{align*}
The condition $0\in T^{-1}(B)$ is equivalent to $T(0)\in B$, so $T_\#\mu(B)=1$ when $T(0)\in B$ and $T_\#\mu(B)=0$ when $T(0)\notin B$. Hence $T_\#\mu=\delta_{T(0)}$.
If $T$ were a transport map from $\mu$ to $\nu$, then $T_\#\mu=\nu$. Evaluating both measures on the singleton $\{T(0)\}$ gives
\begin{align*}
T_\#\mu(\{T(0)\})=\delta_{T(0)}(\{T(0)\})=1.
\end{align*}
On the other hand, $\nu(\{T(0)\})$ is $1/2$ if $T(0)=-1$, is $1/2$ if $T(0)=1$, and is $0$ otherwise. Thus $\nu(\{T(0)\})\ne 1$, contradicting $T_\#\mu=\nu$. Therefore no transport map sends $\delta_0$ to $\frac12\delta_{-1}+\frac12\delta_1$, even though the desired movement is exactly to send half of the mass at $0$ to $-1$ and half to $1$.
[/example]
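
The obstruction can also be seen in a small computation. The sketch below stores finitely supported measures as dictionaries, checks a handful of candidate values for $T(0)$, and then records the intended splitting as a measure on pairs (source, target), which anticipates the transport plans introduced later in the course. The helper `pushforward` and the variable `plan` are illustrative names, and the finite check illustrates the argument rather than replacing it.

```python
from fractions import Fraction

def pushforward(mu, T):
    """Return T_# mu for a finitely supported measure mu given as {point: mass}."""
    nu = {}
    for x, mass in mu.items():
        y = T(x)
        nu[y] = nu.get(y, Fraction(0)) + mass
    return nu

mu = {0: Fraction(1)}                          # delta_0
nu = {-1: Fraction(1, 2), 1: Fraction(1, 2)}   # half mass at -1, half at 1

# Only the value t = T(0) matters, so it is enough to try constant maps.
images = [pushforward(mu, lambda x, t=t: t) for t in range(-3, 4)]
assert all(len(image) == 1 for image in images)   # always a single atom
assert all(image != nu for image in images)       # never equal to nu

# A measure on pairs (x, y) can encode the desired splitting of the atom at 0.
plan = {(0, -1): Fraction(1, 2), (0, 1): Fraction(1, 2)}
first_marginal, second_marginal = {}, {}
for (x, y), mass in plan.items():
    first_marginal[x] = first_marginal.get(x, Fraction(0)) + mass
    second_marginal[y] = second_marginal.get(y, Fraction(0)) + mass
assert first_marginal == mu and second_marginal == nu
```

The measure on pairs has $\mu$ and $\nu$ as its two marginals even though no map achieves the same matching.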