[proofplan]
We first verify that the graph map $G_T:x\mapsto (x,T(x))$ is measurable, so its pushforward $\gamma_T$ is a probability measure on the product measurable space. We then compute the two marginals of $\gamma_T$ by composing $G_T$ with the coordinate projections. The cost identity is the change-of-variables formula for pushforward measures, proved by passing from indicators to simple functions, then to non-negative measurable functions by monotone convergence, and finally to extended-valued [measurable functions](/page/Measurable%20Functions) through positive and negative parts. Consequently, every admissible Monge map supplies an admissible Kantorovich plan with the same cost, so the Kantorovich infimum is bounded above by the Monge infimum.
[/proofplan]
[step:Show that the graph map is measurable]
Let
\begin{align*}
G_T: X \to X \times Y
\end{align*}
be the map defined by $G_T(x)=(x,T(x))$. To prove that $G_T$ is $\mathcal A/(\mathcal A\otimes\mathcal B)$-measurable, define
\begin{align*}
\mathcal C := \{E \in \mathcal A\otimes\mathcal B : G_T^{-1}(E)\in\mathcal A\}.
\end{align*}
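The class $\mathcal C$ is a $\sigma$-algebra on $X\times Y$, since $G_T^{-1}(X\times Y)=X\in\mathcal A$ and preimages commute with complements and countable unions. If $A\in\mathcal A$ and $B\in\mathcal B$, then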
\begin{align*}
G_T^{-1}(A\times B)=A\cap T^{-1}(B).
\end{align*}
Since $T$ is $\mathcal A/\mathcal B$-measurable, $T^{-1}(B)\in\mathcal A$, and hence $A\cap T^{-1}(B)\in\mathcal A$. Thus $\mathcal C$ contains every measurable rectangle $A\times B$. Because $\mathcal A\otimes\mathcal B$ is generated by such rectangles, $\mathcal C=\mathcal A\otimes\mathcal B$. Therefore $G_T$ is measurable.
Since $\mu$ is a probability measure on $(X,\mathcal A)$, its pushforward
\begin{align*}
\gamma_T := (G_T)_{\#}\mu
\end{align*}
is a probability measure on $(X\times Y,\mathcal A\otimes\mathcal B)$.
[guided]
The only point that needs checking before we can speak about the pushforward graph plan is measurability of the graph map. Define
\begin{align*}
G_T: X \to X \times Y
\end{align*}
by $G_T(x)=(x,T(x))$. We must show that $G_T^{-1}(E)\in\mathcal A$ for every $E\in\mathcal A\otimes\mathcal B$.
It is enough to test this on the generators of the product $\sigma$-algebra. To make that reduction precise, define
\begin{align*}
\mathcal C := \{E \in \mathcal A\otimes\mathcal B : G_T^{-1}(E)\in\mathcal A\}.
\end{align*}
This class is a $\sigma$-algebra because inverse images commute with complements and countable unions. Now take a measurable rectangle $A\times B$ with $A\in\mathcal A$ and $B\in\mathcal B$. For $x\in X$,
\begin{align*}
x\in G_T^{-1}(A\times B)
\end{align*}
holds exactly when $G_T(x)=(x,T(x))\in A\times B$, which is exactly when $x\in A$ and $T(x)\in B$. Hence
\begin{align*}
G_T^{-1}(A\times B)=A\cap T^{-1}(B).
\end{align*}
The set $T^{-1}(B)$ belongs to $\mathcal A$ because $T$ is measurable, and therefore $A\cap T^{-1}(B)\in\mathcal A$. Thus every measurable rectangle lies in $\mathcal C$. Since $\mathcal A\otimes\mathcal B$ is the smallest $\sigma$-algebra containing all measurable rectangles, we get $\mathcal C=\mathcal A\otimes\mathcal B$. This proves that $G_T$ is measurable.
Now the pushforward
\begin{align*}
\gamma_T := (G_T)_{\#}\mu
\end{align*}
is well-defined. Since $G_T^{-1}(X\times Y)=X$ and $\mu(X)=1$, we get $\gamma_T(X\times Y)=\mu(X)=1$, so $\gamma_T$ is a probability measure on the product measurable space.
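As a hedged illustration (the finite spaces and the map below are hypothetical and not part of the theorem), take $X=\{1,2,3\}$ and $Y=\{a,b\}$ with their power sets as $\sigma$-algebras, let $\mu$ be the uniform probability measure on $X$, and let $T(1)=T(2)=a$, $T(3)=b$. Then $G_T(1)=(1,a)$, $G_T(2)=(2,a)$, $G_T(3)=(3,b)$, and for every $E\subseteq X\times Y$,
\begin{align*}
\gamma_T(E)=\mu(G_T^{-1}(E))=\tfrac13\,\#\{x\in X:(x,T(x))\in E\}.
\end{align*}
In particular,
\begin{align*}
\gamma_T(\{(1,a)\})=\gamma_T(\{(2,a)\})=\gamma_T(\{(3,b)\})=\tfrac13,
\qquad
\gamma_T(\{(1,b)\})=\gamma_T(\{(2,b)\})=\gamma_T(\{(3,a)\})=0,
\end{align*}
so in this example $\gamma_T$ is carried by the graph of $T$ and has total mass $1$.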
[/guided]
[/step]
[step:Compute the first marginal of the graph plan]
Let
\begin{align*}
\pi_X: X\times Y \to X
\end{align*}
be the first-coordinate projection, $\pi_X(x,y)=x$. For every $A\in\mathcal A$,
\begin{align*}
(\pi_X)_{\#}\gamma_T(A)=\gamma_T(\pi_X^{-1}(A)).
\end{align*}
Using $\gamma_T=(G_T)_{\#}\mu$, this becomes
\begin{align*}
(\pi_X)_{\#}\gamma_T(A)=\mu(G_T^{-1}(\pi_X^{-1}(A))).
\end{align*}
Since $\pi_X(G_T(x))=x$ for every $x\in X$, we have $G_T^{-1}(\pi_X^{-1}(A))=A$. Therefore
\begin{align*}
(\pi_X)_{\#}\gamma_T(A)=\mu(A).
\end{align*}
Thus $(\pi_X)_{\#}\gamma_T=\mu$.
[/step]
[step:Compute the second marginal of the graph plan]
Let
\begin{align*}
\pi_Y: X\times Y \to Y
\end{align*}
be the second-coordinate projection, $\pi_Y(x,y)=y$. For every $B\in\mathcal B$,
\begin{align*}
(\pi_Y)_{\#}\gamma_T(B)=\gamma_T(\pi_Y^{-1}(B)).
\end{align*}
Using $\gamma_T=(G_T)_{\#}\mu$, this gives
\begin{align*}
(\pi_Y)_{\#}\gamma_T(B)=\mu(G_T^{-1}(\pi_Y^{-1}(B))).
\end{align*}
Since $\pi_Y(G_T(x))=T(x)$ for every $x\in X$, we have $G_T^{-1}(\pi_Y^{-1}(B))=T^{-1}(B)$. Hence
\begin{align*}
(\pi_Y)_{\#}\gamma_T(B)=\mu(T^{-1}(B))=T_{\#}\mu(B)=\nu(B).
\end{align*}
Thus $(\pi_Y)_{\#}\gamma_T=\nu$. Combining this with the first marginal computation gives $\gamma_T\in\Pi(\mu,\nu)$.
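The two marginal computations can be checked by hand on a small hypothetical example, which is an illustration only and not part of the theorem. Take $X=\{1,2,3\}$ with the uniform probability measure $\mu$, $Y=\{a,b\}$, the map $T(1)=T(2)=a$, $T(3)=b$, and $\nu:=T_{\#}\mu$, so that $\nu(\{a\})=\tfrac23$ and $\nu(\{b\})=\tfrac13$. The graph plan assigns mass $\tfrac13$ to each of the points $(1,a)$, $(2,a)$, $(3,b)$ and mass $0$ to the remaining points of $X\times Y$. For the first marginal,
\begin{align*}
(\pi_X)_{\#}\gamma_T(\{1\})=\gamma_T(\{1\}\times Y)=\gamma_T(\{(1,a)\})+\gamma_T(\{(1,b)\})=\tfrac13+0=\mu(\{1\}).
\end{align*}
For the second marginal,
\begin{align*}
(\pi_Y)_{\#}\gamma_T(\{a\})=\gamma_T(X\times\{a\})=\tfrac13+\tfrac13+0=\tfrac23=\mu(\{1,2\})=\mu(T^{-1}(\{a\}))=\nu(\{a\}).
\end{align*}
The remaining singletons are handled in the same way.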
[/step]
[step:Identify the cost integral by the pushforward formula]
We prove the pushforward integration identity for the measurable map $G_T$ and the extended-valued measurable function $c$. First let $E\in\mathcal A\otimes\mathcal B$ and let $\mathbb 1_E:X\times Y\to\{0,1\}$ be its indicator function. By the definition of pushforward measure,
\begin{align*}
\int_{X\times Y}\mathbb 1_E(x,y)\,d\gamma_T(x,y)=\gamma_T(E)=\mu(G_T^{-1}(E)).
\end{align*}
Also,
\begin{align*}
\int_X \mathbb 1_E(G_T(x))\,d\mu(x)=\mu(G_T^{-1}(E)).
\end{align*}
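Thus the identity holds for indicators. Since every non-negative simple function is a finite non-negative linear combination of indicators and the integral is linear on such combinations, the identity holds for every non-negative simple $\mathcal A\otimes\mathcal B$-measurable function $s:X\times Y\to[0,\infty)$: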
\begin{align*}
\int_{X\times Y}s(x,y)\,d\gamma_T(x,y)=\int_X s(G_T(x))\,d\mu(x).
\end{align*}
For a non-negative $\mathcal A\otimes\mathcal B$-measurable function $f:X\times Y\to[0,\infty]$, choose a sequence $(s_n)_{n\in\mathbb N}$ of non-negative simple $\mathcal A\otimes\mathcal B$-measurable functions
\begin{align*}
s_n:X\times Y\to[0,\infty)
\end{align*}
such that $s_n\uparrow f$ pointwise. Then $s_n\circ G_T:X\to[0,\infty)$ is a non-negative simple $\mathcal A$-measurable function for each $n\in\mathbb N$, and $s_n\circ G_T\uparrow f\circ G_T$ pointwise. Applying the [Monotone Convergence Theorem](/theorems/509) first on $(X\times Y,\mathcal A\otimes\mathcal B,\gamma_T)$ and then on $(X,\mathcal A,\mu)$, and using the simple-function identity for each $s_n$, gives
\begin{align*}
\int_{X\times Y}f(x,y)\,d\gamma_T(x,y)=\int_X f(G_T(x))\,d\mu(x).
\end{align*}
Apply this equality to the positive and negative parts $c^+,c^-:X\times Y\to[0,\infty]$ of $c$, defined by $c^+=\max\{c,0\}$ and $c^-=\max\{-c,0\}$. Since $(c\circ G_T)^{\pm}=c^{\pm}\circ G_T$, the integrals of $c^{\pm}$ against $\gamma_T$ equal the integrals of $(c\circ G_T)^{\pm}$ against $\mu$. Because the extended integrals in the theorem statement are well-defined, subtracting the negative part from the positive part never produces the indeterminate form $\infty-\infty$. Therefore
\begin{align*}
\int_{X\times Y}c(x,y)\,d\gamma_T(x,y)=\int_X c(G_T(x))\,d\mu(x).
\end{align*}
Since $G_T(x)=(x,T(x))$, this is exactly
\begin{align*}
\int_{X\times Y}c(x,y)\,d\gamma_T(x,y)=\int_X c(x,T(x))\,d\mu(x).
\end{align*}
[guided]
The desired identity says that integrating over the pushed-forward measure $\gamma_T=(G_T)_{\#}\mu$ is the same as pulling the function back along $G_T$ and integrating over $\mu$. We verify this directly, because the cost $c$ may be extended-valued.
Start with indicator functions. Let $E\in\mathcal A\otimes\mathcal B$ and define the indicator function $\mathbb 1_E:X\times Y\to\{0,1\}$ by $\mathbb 1_E(z)=1$ for $z\in E$ and $\mathbb 1_E(z)=0$ for $z\notin E$. By the defining property of the pushforward measure,
\begin{align*}
\gamma_T(E)=\mu(G_T^{-1}(E)).
\end{align*}
Hence
\begin{align*}
\int_{X\times Y}\mathbb 1_E(x,y)\,d\gamma_T(x,y)=\gamma_T(E)=\mu(G_T^{-1}(E)).
\end{align*}
On the other hand, the pulled-back indicator $\mathbb 1_E\circ G_T:X\to\{0,1\}$ is the indicator of $G_T^{-1}(E)$, so
\begin{align*}
\int_X \mathbb 1_E(G_T(x))\,d\mu(x)=\mu(G_T^{-1}(E)).
\end{align*}
Thus both integrals agree for indicators.
Now take a non-negative [simple function](/page/Simple%20Function) $s:X\times Y\to[0,\infty)$. Such a function is a finite sum of non-negative scalar multiples of indicators of measurable sets. Since the integral is linear on such finite sums, the indicator calculation gives
\begin{align*}
\int_{X\times Y}s(x,y)\,d\gamma_T(x,y)=\int_X s(G_T(x))\,d\mu(x).
\end{align*}
Next let $f:X\times Y\to[0,\infty]$ be any non-negative measurable function. By the standard approximation of non-negative measurable functions by simple functions, choose a sequence $(s_n)_{n\in\mathbb N}$ of non-negative simple $\mathcal A\otimes\mathcal B$-measurable functions
\begin{align*}
s_n:X\times Y\to[0,\infty)
\end{align*}
such that $s_n\uparrow f$ pointwise. Since $G_T$ is $\mathcal A/(\mathcal A\otimes\mathcal B)$-measurable, each composition $s_n\circ G_T:X\to[0,\infty)$ is a non-negative simple $\mathcal A$-measurable function. The pointwise monotonicity is preserved after composition: for every $x\in X$, $s_n(G_T(x))\uparrow f(G_T(x))$.
We now apply the Monotone Convergence Theorem twice: once on the [measure space](/page/Measure%20Space) $(X\times Y,\mathcal A\otimes\mathcal B,\gamma_T)$ and once on the source measure space $(X,\mathcal A,\mu)$. This gives
\begin{align*}
\int_{X\times Y}f(x,y)\,d\gamma_T(x,y)=\lim_{n\to\infty}\int_{X\times Y}s_n(x,y)\,d\gamma_T(x,y)
\end{align*}
and
\begin{align*}
\int_X f(G_T(x))\,d\mu(x)=\lim_{n\to\infty}\int_X s_n(G_T(x))\,d\mu(x).
\end{align*}
For every $n\in\mathbb N$, the simple-function identity already proved gives
\begin{align*}
\int_{X\times Y}s_n(x,y)\,d\gamma_T(x,y)=\int_X s_n(G_T(x))\,d\mu(x).
\end{align*}
Taking limits on both sides yields
\begin{align*}
\int_{X\times Y}f(x,y)\,d\gamma_T(x,y)=\int_X f(G_T(x))\,d\mu(x).
\end{align*}
Finally apply this non-negative identity to the positive and negative parts of the cost. Define
\begin{align*}
c^+:X\times Y\to[0,\infty]
\end{align*}
by $c^+(x,y)=\max\{c(x,y),0\}$, and define
\begin{align*}
c^-:X\times Y\to[0,\infty]
\end{align*}
by $c^-(x,y)=\max\{-c(x,y),0\}$. Then $c=c^+-c^-$, and composing with $G_T$ gives $(c\circ G_T)^{\pm}=c^{\pm}\circ G_T$. The hypothesis that the two extended integrals are well-defined is used exactly here: it rules out subtracting $\infty$ from $\infty$. Applying the non-negative formula to $c^+$ and $c^-$ and subtracting the two resulting extended quantities gives
\begin{align*}
\int_{X\times Y}c(x,y)\,d\gamma_T(x,y)=\int_X c(G_T(x))\,d\mu(x).
\end{align*}
Because $G_T(x)=(x,T(x))$, the right-hand side is
\begin{align*}
\int_X c(x,T(x))\,d\mu(x).
\end{align*}
Thus
\begin{align*}
\int_{X\times Y}c(x,y)\,d\gamma_T(x,y)=\int_X c(x,T(x))\,d\mu(x).
\end{align*}
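On a finite space both sides of the identity are finite sums, which makes the formula easy to check. The following sketch uses hypothetical data chosen only for illustration: $X=\{1,2,3\}$ with the uniform probability measure $\mu$, $Y=\{a,b\}$, the map $T(1)=T(2)=a$, $T(3)=b$, and a real-valued cost with $c(1,a)=1$, $c(2,a)=4$, $c(3,b)=2$, and $c(1,b)=c(2,b)=c(3,a)=5$. The graph plan $\gamma_T$ assigns mass $\tfrac13$ to each of $(1,a)$, $(2,a)$, $(3,b)$ and mass $0$ to the other three points, so
\begin{align*}
\int_{X\times Y}c(x,y)\,d\gamma_T(x,y)=\sum_{(x,y)\in X\times Y}c(x,y)\,\gamma_T(\{(x,y)\})=\tfrac13(1+4+2)+0\cdot(5+5+5)=\tfrac73.
\end{align*}
On the other side,
\begin{align*}
\int_X c(x,T(x))\,d\mu(x)=\tfrac13\bigl(c(1,a)+c(2,a)+c(3,b)\bigr)=\tfrac13(1+4+2)=\tfrac73.
\end{align*}
The values of $c$ off the graph of $T$ do not affect either side, as the identity predicts.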
[/guided]
[/step]
[step:Compare the Kantorovich and Monge infima]
Let $\mathcal M$ denote the class of all $\mathcal A/\mathcal B$-measurable maps $S:X\to Y$ such that $S_{\#}\mu=\nu$ and the Monge cost
\begin{align*}
\int_X c(x,S(x))\,d\mu(x)
\end{align*}
is well-defined. For each $S\in\mathcal M$, define the graph map
\begin{align*}
G_S:X\to X\times Y
\end{align*}
by $G_S(x)=(x,S(x))$. The preceding argument applied to $S$ gives a plan
\begin{align*}
\gamma_S := (G_S)_{\#}\mu
\end{align*}
with $\gamma_S\in\Pi(\mu,\nu)$ and
\begin{align*}
\int_{X\times Y}c(x,y)\,d\gamma_S(x,y)=\int_X c(x,S(x))\,d\mu(x).
\end{align*}
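Thus the Kantorovich admissible class contains, through graph pushforwards, a competitor with the same cost as each admissible Monge map. Since $\mathsf K_c(\mu,\nu)$ is the infimum of the plan cost over admissible plans, it follows that $\mathsf K_c(\mu,\nu)\le\int_X c(x,S(x))\,d\mu(x)$ for every $S\in\mathcal M$. Taking the infimum over $S\in\mathcal M$ gives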
\begin{align*}
\mathsf K_c(\mu,\nu)\le \inf_{S\in\mathcal M}\int_X c(x,S(x))\,d\mu(x).
\end{align*}
This is precisely the assertion that the Kantorovich value is no larger than the Monge infimum. The proof is complete.
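The inequality can be strict. A standard illustration, sketched here under assumptions that are not part of the theorem (the specific measures and cost, Borel $\sigma$-algebras on $\mathbb R$, and the convention $\inf\emptyset=+\infty$), takes $X=Y=\mathbb R$, $c(x,y)=|x-y|$, $\mu=\delta_0$, and $\nu=\tfrac12(\delta_{-1}+\delta_{1})$. For every measurable $S:\mathbb R\to\mathbb R$,
\begin{align*}
S_{\#}\delta_0=\delta_{S(0)}\neq\tfrac12(\delta_{-1}+\delta_{1}),
\end{align*}
so $\mathcal M=\emptyset$ and the Monge infimum equals $+\infty$ under the stated convention. The product measure $\mu\otimes\nu$ lies in $\Pi(\mu,\nu)$ and has cost
\begin{align*}
\int_{\mathbb R\times\mathbb R}|x-y|\,d(\mu\otimes\nu)(x,y)=\tfrac12\,|0-(-1)|+\tfrac12\,|0-1|=1,
\end{align*}
so in this example $\mathsf K_c(\mu,\nu)\le 1<+\infty$.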
[/step]