[guided]The strategy of this step is to upgrade the inequality of Step 1 to an *equality on a sufficiently rich subset*: for each curve $\gamma_1$ in that subset, we produce another curve in $\Omega(p, q)$ with the same length but constant speed. This is needed because only constant-speed curves saturate Cauchy–Schwarz, whereas a generic $\gamma_1 \in \Omega(p, q)$ may have wildly varying speed.
The standard tool is the arc-length reparametrisation. The arc-length function $s(t) = \int_0^t |\dot\gamma_1(\tau)|_g \, d\mathcal{L}^1(\tau)$ measures the distance covered by time $t$. To invert it as a piecewise $C^1$ map, we need $s'(t) = |\dot\gamma_1(t)|_g > 0$; otherwise $s$ may be flat on intervals and fail to be invertible. So we restrict the construction to **regular paths** and recover the general case by approximation.
**Definition.** Let $\Omega^*(p, q) := \{\gamma \in \Omega(p, q) : |\dot\gamma(t)|_g > 0 \text{ at every } t \text{ where } \dot\gamma \text{ exists}\}$.
**Constant-speed reparametrisation on $\Omega^*$.** Fix $\gamma_1 \in \Omega^*(p, q)$ with $\ell := \ell(\gamma_1) > 0$. Define the arc-length function $s : [0, T] \to [0, \ell]$ by $s(t) = \int_0^t |\dot\gamma_1(\tau)|_g \, d\mathcal{L}^1(\tau)$, and let $0 = t_0 < t_1 < \cdots < t_N = T$ be a partition adapted to the piecewise-$C^1$ structure of $\gamma_1$. On each piece $[t_{i-1}, t_i]$, $s$ is $C^1$ with $s'(t) = |\dot\gamma_1(t)|_g > 0$ by regularity, hence strictly increasing. By the inverse function theorem applied on each piece, the inverse $s^{-1} : [0, \ell] \to [0, T]$ is well-defined, continuous, strictly increasing, and piecewise $C^1$ with $(s^{-1})'(u) = 1/|\dot\gamma_1(s^{-1}(u))|_g$. The rescaling $t \mapsto \tfrac{\ell}{T}\, t$ maps the prescribed parameter interval $[0, T]$ onto the arc-length interval $[0, \ell]$, so we set
\begin{align*}
\tilde\gamma_1(t) := \gamma_1\!\left(s^{-1}\!\left(\tfrac{\ell}{T}\, t\right)\right).
\end{align*}
This is a composition of piecewise $C^1$ maps, hence itself piecewise $C^1$, and $\tilde\gamma_1(0) = \gamma_1(s^{-1}(0)) = \gamma_1(0) = p$, $\tilde\gamma_1(T) = \gamma_1(s^{-1}(\ell)) = \gamma_1(T) = q$. By the chain rule, $|\dot{\tilde\gamma}_1(t)|_g = |\dot\gamma_1(s^{-1}(\tfrac{\ell}{T} t))|_g \cdot \tfrac{\ell}{T} \cdot \tfrac{1}{|\dot\gamma_1(s^{-1}(\tfrac{\ell}{T} t))|_g} \equiv \ell/T$, so the speed is constant. Hence $\ell(\tilde\gamma_1) = \int_0^T \ell/T \, d\mathcal{L}^1 = \ell$, and the equality case of Step 1 yields $E(\tilde\gamma_1) = \ell(\gamma_1)^2/(2T)$. (In the degenerate case $\ell = 0$, the curve $\gamma_1$ is constant, so $p = q$ and the identity reads $0 = 0$.)
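As a sanity check, here is a worked instance under assumptions that are not in the text: take $M = \mathbb{R}^2$ with the Euclidean metric, $T = 1$, $p = (0, 0)$, $q = (2, 0)$, and the hypothetical curve $\gamma_1(t) = (t + t^2, 0)$, whose speed $1 + 2t$ is positive but not constant. We also assume the normalisation $E(\gamma) = \tfrac12 \int_0^T |\dot\gamma|_g^2 \, d\mathcal{L}^1$, which is the one consistent with the identity $E = \ell^2/(2T)$ at constant speed. Then
\begin{align*}
s(t) &= \int_0^t (1 + 2\tau)\, d\tau = t + t^2, \qquad \ell = s(1) = 2, \\
s^{-1}(u) &= \tfrac12\bigl(-1 + \sqrt{1 + 4u}\bigr), \\
\tilde\gamma_1(t) &= \gamma_1\bigl(s^{-1}(2t)\bigr) = (2t, 0), \qquad |\dot{\tilde\gamma}_1(t)| \equiv 2 = \ell/T, \\
E(\tilde\gamma_1) &= \tfrac12 \int_0^1 4 \, dt = 2 = \frac{\ell^2}{2T}, \qquad E(\gamma_1) = \tfrac12 \int_0^1 (1 + 2t)^2 \, dt = \tfrac{13}{6} > 2.
\end{align*}
So the reparametrised curve keeps the length $2$ and attains the Cauchy–Schwarz bound, while the original parametrisation has strictly larger energy.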
**Why a density argument?** The constant-speed reparametrisation is restricted to $\Omega^*$. To transfer the conclusion to all of $\Omega(p, q)$, we show that $\Omega^*$ is dense in $\Omega(p, q)$ in length — i.e., the infima agree:
\begin{align*}
\inf_{\gamma \in \Omega^*(p, q)} \ell(\gamma) = \inf_{\gamma \in \Omega(p, q)} \ell(\gamma) = d(p, q).
\end{align*}
The inequality $\inf_{\Omega^*} \ell \geq \inf_\Omega \ell$ is immediate since $\Omega^* \subseteq \Omega$. For the reverse, fix $\gamma_1 \in \Omega(p, q)$ and $\varepsilon > 0$; we must construct $\gamma_1^\varepsilon \in \Omega^*(p, q)$ with $\ell(\gamma_1^\varepsilon) \leq \ell(\gamma_1) + \varepsilon$.
The idea is to perturb $\gamma_1$ by a small-amplitude oscillation along a unit vector field, which knocks the speed away from zero everywhere while costing arbitrarily little length. Fix any unit vector $v \in T_p M$ and let $V$ be its parallel transport along $\gamma_1$, a continuous unit vector field along $\gamma_1$. By smoothness of $\exp$ on a neighbourhood of the zero section, there exist $r_0 > 0$ and $C \ge 1$ such that for every piecewise $C^1$ function $\eta : [0, T] \to [-r_0, r_0]$ with $\eta(0) = \eta(T) = 0$, the curve $\gamma_1^\eta(t) := \exp_{\gamma_1(t)}(\eta(t) V(t))$ is well-defined, piecewise $C^1$, agrees with $\gamma_1$ at the endpoints, and satisfies the speed comparison
\begin{align*}
\bigl| |\dot{\gamma_1^\eta}(t)|_g - |\dot\gamma_1(t) + \eta'(t) V(t)|_g \bigr| \leq C\, |\eta(t)|\bigl(|\dot\gamma_1(t)|_g + |\eta'(t)|\bigr).
\end{align*}
(This is the standard $C^1$-comparison between $\exp$ and its differential at the origin, which is the identity in normal coordinates.)
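For orientation, consider the flat case $M = \mathbb{R}^n$ with the Euclidean metric (an illustrative special case, not the setting of the text). There $\exp_x(w) = x + w$ and parallel transport is trivial, so $V(t) \equiv v$ and
\begin{align*}
\gamma_1^\eta(t) = \gamma_1(t) + \eta(t)\, v, \qquad \dot{\gamma_1^\eta}(t) = \dot\gamma_1(t) + \eta'(t)\, v.
\end{align*}
The left-hand side of the speed comparison then vanishes identically, so the estimate holds for any $C \ge 1$; in general the constant $C$ measures how far $\exp$ deviates from this flat picture.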
Now the choice of $\eta$. We pick $\eta(t) := \delta\, \rho(t)\bigl(\sin(2\pi t/T + \theta) - \sin\theta\bigr)$, where $\rho$ is a smooth cutoff equal to $1$ except in tiny neighbourhoods of $0$ and $T$ (forcing $\eta(0) = \eta(T) = 0$), and $\theta$ is a generic phase chosen so that no zero of $\cos(2\pi t/T + \theta)$ in $[0, T]$ coincides with a zero of $|\dot\gamma_1|_g$. This last condition is achievable because the zero set of $|\dot\gamma_1|_g$ in $[0, T]$ is closed and the cosine zeros depend continuously on $\theta$, so a set of phases $\theta$ of positive measure avoids the obstruction. Then $\eta(0) = \eta(T) = 0$ and $|\eta'(t)| > 0$ except at finitely many points, none of which coincides with a zero of $|\dot\gamma_1|_g$. Hence at every $t$ where $\dot\gamma_1$ exists, the comparison speed $|\dot\gamma_1 + \eta' V|_g$ on the left-hand side of the speed comparison is bounded below by either $|\dot\gamma_1|_g > 0$ (when $\eta'(t) = 0$, where the speed is nonzero) or $|\eta'(t)| > 0$ (when $|\dot\gamma_1(t)|_g = 0$); the speed comparison then keeps $|\dot{\gamma_1^\eta}|_g > 0$ for $\delta$ small enough. So $\gamma_1^\eta \in \Omega^*(p, q)$.
Finally, the length cost. Integrating the speed comparison and using the triangle inequality $|\dot\gamma_1 + \eta' V|_g \leq |\dot\gamma_1|_g + |\eta'|$, we bound the length increase $\ell(\gamma_1^\eta) - \ell(\gamma_1)$ above by $\int_0^T |\eta'(t)| \, d\mathcal{L}^1(t) + C \delta(\ell(\gamma_1) + 2\pi\delta)$. The first term is $O(\delta)$: the total variation of $\delta \sin$ over one period is $4\delta$, up to the modulation by $\rho$. The second is $O(\delta(\ell(\gamma_1) + \delta))$. Choosing $\delta$ small enough makes the total $\leq \varepsilon$, and we set $\gamma_1^\varepsilon := \gamma_1^\eta$.
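The perturbation can be checked numerically in a simple case. The sketch below is only an illustration under assumptions that are not in the text: the flat plane (so $\exp_x(w) = x + w$ and the comparison error is zero), $T = 1$, the hypothetical base curve $\gamma_1(t) = (4(t - \tfrac12)^3 + \tfrac12, 0)$ whose speed vanishes at $t = \tfrac12$, the field $V \equiv (0, 1)$, and the cutoff $\rho \equiv 1$ (with $T = 1$, subtracting $\sin\theta$ already gives $\eta(0) = \eta(1) = 0$). The function name `perturbed_curve_stats` is made up for this example.

```python
import math


def perturbed_curve_stats(delta=0.01, theta=0.0, n=20000):
    """Flat-plane sketch on [0, 1]: exp_x(w) = x + w, V = (0, 1), cutoff rho = 1.

    Hypothetical base curve gamma_1(t) = (4 (t - 1/2)^3 + 1/2, 0) from (0, 0)
    to (1, 0); its speed 12 (t - 1/2)^2 vanishes at t = 1/2.
    """
    ts = [i / n for i in range(n + 1)]  # uniform grid on [0, 1]

    def eta(t):
        # Oscillation amplitude; vanishes at t = 0 and t = 1 by periodicity.
        return delta * (math.sin(2.0 * math.pi * t + theta) - math.sin(theta))

    def base_speed(t):
        return 12.0 * (t - 0.5) ** 2

    def perturbed_speed(t):
        d_eta = 2.0 * math.pi * delta * math.cos(2.0 * math.pi * t + theta)
        # Here V is orthogonal to the direction of travel, so the two
        # velocity components combine in quadrature.
        return math.hypot(base_speed(t), d_eta)

    def trapezoid(f):
        # Composite trapezoid rule on the uniform grid (step 1/n).
        vals = [f(t) for t in ts]
        return sum(0.5 * (a + b) for a, b in zip(vals, vals[1:])) / n

    return {
        "eta_endpoints": (eta(0.0), eta(1.0)),
        "min_speed": min(perturbed_speed(t) for t in ts),
        "base_length": trapezoid(base_speed),
        "perturbed_length": trapezoid(perturbed_speed),
    }


if __name__ == "__main__":
    print(perturbed_curve_stats())
```

With $\theta = 0$ the cosine zeros sit at $t = \tfrac14, \tfrac34$, away from the stationary point $t = \tfrac12$, so the reported minimum speed is strictly positive, and the length increase stays below the total-variation bound $4\delta$.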
This completes the density argument. The lever for the next step: any minimiser of $E$ on $\Omega(p, q)$ has energy at most $\ell(\gamma_1)^2 / (2T)$ for every $\gamma_1 \in \Omega^*(p, q)$, hence at most $d(p, q)^2 / (2T)$ by passing to the infimum.[/guided]