[guided]The Lagrangian velocity along the segment from $x$ to $y$ is the vector $y-x$. The Eulerian velocity, however, must be a function of the current space-time point $(t,z)$, not of the hidden labels $(x,y)$. Therefore we average the Lagrangian velocity over all particles that occupy the same point $z$ at time $t$.
Formally, define
\begin{align*}
\xi:\mathbb R^n\times\mathbb R^n&\to\mathbb R^n,\\
(x,y)&\mapsto y-x,
\end{align*}
and
\begin{align*}
F:[0,1]\times\mathbb R^n\times\mathbb R^n&\to[0,1]\times\mathbb R^n,\\
(t,x,y)&\mapsto (t,(1-t)x+ty).
\end{align*}
Let $\Lambda:=\mathcal L^1\big|_{[0,1]}\otimes\pi$. The map $F$ records the observable space-time position of a particle, while $\xi$ records its actual constant velocity. Since $\pi$ is an optimal coupling of two measures in $\mathcal P_2(\mathbb R^n)$, the quadratic transport cost is finite, and hence
\begin{align*}
\int_{[0,1]\times\mathbb R^n\times\mathbb R^n}|\xi(x,y)|^2\,d\Lambda(t,x,y)
=
\int_{\mathbb R^n\times\mathbb R^n}|y-x|^2\,d\pi(x,y)
<\infty.
\end{align*}
Thus $\xi$ is square-integrable with respect to $\Lambda$.
We now apply the disintegration theorem, in the form of the existence of conditional expectations, to the Borel map $F$ and the probability measure $\Lambda$. It gives a Borel map
\begin{align*}
v:[0,1]\times\mathbb R^n&\to\mathbb R^n,\\
(t,z)&\mapsto v_t(z)
\end{align*}
such that $v(F(t,x,y))$ is the conditional expectation of $\xi(x,y)$ given the value of $F(t,x,y)$, that is, $v\circ F=\mathbb E_\Lambda[\xi\mid F]$ $\Lambda$-almost everywhere. Equivalently, $v_t(z)$ is the average displacement $y-x$ among all particles that are at $z$ at time $t$. The map $v$ is determined only up to null sets of the measure $F_\#\Lambda$, given by $d(F_\#\Lambda)(t,z)=d\mu_t(z)\,d\mathcal L^1(t)$, which is exactly the null-set convention needed for the action integral and the weak continuity equation.
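As an illustration, consider the following toy example, which is chosen here for concreteness and is not part of the argument. Assume $n=1$, $\mu_0=\delta_0$ and $\mu_1=\tfrac12(\delta_{-1}+\delta_1)$, and assume that $\mu_t$ denotes the push-forward of $\pi$ under $(x,y)\mapsto(1-t)x+ty$. The only coupling of $\mu_0$ and $\mu_1$ is $\pi=\tfrac12(\delta_{(0,-1)}+\delta_{(0,1)})$, which is therefore optimal, and one finds
\begin{align*}
\mu_t&=\tfrac12(\delta_{-t}+\delta_t),\\
v_t(\pm t)&=\pm1\quad\text{for }t\in(0,1],\\
v_0(0)&=\tfrac12\cdot(-1)+\tfrac12\cdot(+1)=0.
\end{align*}
For $t>0$ the two particles occupy distinct points, so no averaging takes place and $v$ simply reads off each particle's velocity. At $t=0$ both particles sit at $z=0$ and their velocities average to zero. Since $\mathcal L^1(\{0\})=0$, the set $\{0\}\times\mathbb R$ is $F_\#\Lambda$-null, so the value $v_0(0)$ is immaterial.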
The crucial estimate is Jensen's inequality for conditional expectation. Since the function $a\mapsto |a|^2$ is convex on $\mathbb R^n$,
\begin{align*}
|v(F(t,x,y))|^2
=
|\mathbb E_\Lambda[\xi\mid F](t,x,y)|^2
\le
\mathbb E_\Lambda[|\xi|^2\mid F](t,x,y)
\end{align*}
$\Lambda$-almost everywhere. Integrating this inequality with respect to $\Lambda$ and pushing forward by $F$ gives
\begin{align*}
\int_0^1\int_{\mathbb R^n}|v_t(z)|^2\,d\mu_t(z)\,d\mathcal L^1(t)
\le
\int_{[0,1]\times\mathbb R^n\times\mathbb R^n}|\xi(x,y)|^2\,d\Lambda(t,x,y).
\end{align*}
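In more detail, the following chain sketches how the two steps combine. It uses only the change-of-variables formula for the push-forward $F_\#\Lambda$ and the defining property $\int\mathbb E_\Lambda[g\mid F]\,d\Lambda=\int g\,d\Lambda$ of conditional expectation, applied to $g=|\xi|^2$:
\begin{align*}
\int_0^1\int_{\mathbb R^n}|v_t(z)|^2\,d\mu_t(z)\,d\mathcal L^1(t)
&=\int_{[0,1]\times\mathbb R^n}|v(t,z)|^2\,d(F_\#\Lambda)(t,z)\\
&=\int_{[0,1]\times\mathbb R^n\times\mathbb R^n}|v(F(t,x,y))|^2\,d\Lambda(t,x,y)\\
&\le\int_{[0,1]\times\mathbb R^n\times\mathbb R^n}\mathbb E_\Lambda[|\xi|^2\mid F](t,x,y)\,d\Lambda(t,x,y)\\
&=\int_{[0,1]\times\mathbb R^n\times\mathbb R^n}|\xi(x,y)|^2\,d\Lambda(t,x,y).
\end{align*}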
Substituting the definitions of $\Lambda$ and $\xi$, and using the optimality of $\pi$ for the last equality, we obtain
\begin{align*}
\int_0^1\int_{\mathbb R^n}|v_t(z)|^2\,d\mu_t(z)\,d\mathcal L^1(t)
\le
\int_{\mathbb R^n\times\mathbb R^n}|y-x|^2\,d\pi(x,y)
=
W_2^2(\mu_0,\mu_1).
\end{align*}
This is the action bound required for the upper inequality.[/guided]