[guided]The proof rests on a single observation: a curve from $p$ to $q$ of length exactly $r = d(p, q)$ realises the distance, so it is a minimal geodesic (after reparametrization). Our task is therefore to produce such a curve. The natural candidate is a geodesic starting at $p$, but which one? We need to choose its initial direction at $p$ correctly.
If $p = q$, the constant curve at $p$ is a minimal geodesic, so assume $p \neq q$ and set $r = d(p, q) > 0$. We now build the small geodesic sphere centred at $p$ that will host our optimal point.
Here is the geometric idea. Imagine standing at $p$ and walking toward $q$ along an unknown minimal path. After walking a tiny distance $\delta$, you cross a small sphere around $p$ at some point $p_0$. Because $p_0$ lies on a minimal path from $p$ to $q$, the triangle inequality holds with equality: $d(p, p_0) + d(p_0, q) = d(p, q)$. So the question becomes: which point on the small sphere lies on a minimal path?
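As a hedged numerical illustration of this equality, here is a sketch on the unit sphere $S^2 \subset \mathbb{R}^3$ with its round metric. The sphere is an example we choose; it is not part of the theorem. Distances are great-circle distances, and the helper names `sphere_dist` and `triangle_defect` are our own. A point at distance $0.3$ from $p$ on the meridian toward $q$ gives zero defect in the triangle inequality; a point at the same distance from $p$ in a perpendicular direction does not.

```python
import numpy as np

def sphere_dist(x, y):
    """Great-circle distance between unit vectors x and y (numerically stable atan2 form)."""
    return float(np.arctan2(np.linalg.norm(np.cross(x, y)), np.dot(x, y)))

def triangle_defect(p, x, q):
    """d(p, x) + d(x, q) - d(p, q): never negative, and zero when x lies on a minimal path."""
    return sphere_dist(p, x) + sphere_dist(x, q) - sphere_dist(p, q)

p = np.array([0.0, 0.0, 1.0])                          # north pole
q = np.array([1.0, 0.0, 0.0])                          # equator point, d(p, q) = pi/2
on_arc = np.array([np.sin(0.3), 0.0, np.cos(0.3)])     # distance 0.3 along the meridian toward q
off_arc = np.array([0.0, np.sin(0.3), np.cos(0.3)])    # distance 0.3 from p, in another direction

print(triangle_defect(p, on_arc, q))   # 0 up to rounding
print(triangle_defect(p, off_arc, q))  # 0.3 up to rounding
```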
To make "small sphere" precise, we need $\exp_p$ to be a genuine local chart there. Choose $\delta > 0$ small enough that $\exp_p$ restricts to a diffeomorphism on the closed ball $\overline{B}(0, \delta) \subset T_p M$ (possible by the [Exponential Map as a Local Diffeomorphism](/theorems/2712)), and also small enough that $\delta < r$, which the optimal-point theorem below requires. Both conditions hold for every sufficiently small $\delta$. We then define the geodesic sphere
\begin{align*}
S_\delta(p) := \{x \in M : d(x, p) = \delta\} = \exp_p(\{v \in T_p M : |v|_g = \delta\}),
\end{align*}
which is the diffeomorphic image of a Euclidean sphere of radius $\delta$, hence compact. The equality between the metric $\delta$-sphere and the image of the Euclidean $\delta$-sphere is where the diffeomorphism property of $\exp_p$ on $\overline{B}(0, \delta)$ is used: on such a normal ball the radial geodesics are minimising, so $d(p, \exp_p(w)) = |w|_g$ whenever $|w|_g \leq \delta$.
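To see why the diffeomorphism condition matters, here is a small numerical sketch on the unit sphere $S^2$, an example we choose in which $\exp_p$ is a diffeomorphism on balls of radius less than $\pi$. The helpers `exp_map`, `tangent_basis` and `radial_distances` are hypothetical names for this illustration. For $\delta = 0.5$ every point $\exp_p(\delta v)$ with $|v| = 1$ lies at distance exactly $\delta$ from $p$. For $\delta = 4 > \pi$ the radial geodesics have passed the antipode, and the points land at distance $2\pi - 4$ instead, so the metric sphere and the image of the Euclidean sphere no longer agree.

```python
import numpy as np

def sphere_dist(x, y):
    """Great-circle distance between unit vectors x and y (numerically stable atan2 form)."""
    return float(np.arctan2(np.linalg.norm(np.cross(x, y)), np.dot(x, y)))

def exp_map(p, w):
    """exp_p(w) on the unit sphere: follow the great circle from p with initial velocity w (w tangent at p)."""
    n = np.linalg.norm(w)
    if n < 1e-15:
        return p.copy()
    return np.cos(n) * p + np.sin(n) * (w / n)

def tangent_basis(p):
    """An orthonormal basis (e1, e2) of the tangent plane T_p S^2, assuming p is a unit vector."""
    a = np.array([1.0, 0.0, 0.0]) if abs(p[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = a - np.dot(a, p) * p
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(p, e1)
    return e1, e2

def radial_distances(p, delta, num=360):
    """d(p, exp_p(delta * v)) for unit vectors v sampled around the unit circle in T_p S^2."""
    e1, e2 = tangent_basis(p)
    thetas = np.linspace(0.0, 2 * np.pi, num, endpoint=False)
    return np.array([sphere_dist(p, exp_map(p, delta * (np.cos(t) * e1 + np.sin(t) * e2)))
                     for t in thetas])

p = np.array([0.0, 0.0, 1.0])
print(np.max(np.abs(radial_distances(p, 0.5) - 0.5)))  # 0 up to rounding: the two spheres agree
print(radial_distances(p, 4.0)[0])                       # 2*pi - 4 (about 2.283), not 4
```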
We can now invoke the [Optimal Point on a Geodesic Sphere](/theorems/2725). It requires $0 < \delta < d(p, q)$, which holds by our choice of $\delta$ (recall $r = d(p, q)$). The theorem produces $p_0 \in S_\delta(p)$ with
\begin{align*}
d(p, p_0) + d(p_0, q) = d(p, q) = r.
\end{align*}
Since $p_0 \in S_\delta(p)$ we have $d(p, p_0) = \delta$, so $d(p_0, q) = r - \delta$.
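The cited theorem is used as a black box here. As a hedged illustration only (not its proof), the sketch below works on the unit sphere $S^2$ and locates a candidate $p_0$ by brute-force minimisation of $x \mapsto d(x, q)$ over sampled points of $S_\delta(p)$; the sampling approach and the helper `optimal_point` are our own assumptions. It then checks $d(p, p_0) = \delta$ and the identity $d(p, p_0) + d(p_0, q) = r$ numerically, up to an error set by the sampling step.

```python
import numpy as np

def sphere_dist(x, y):
    """Great-circle distance between unit vectors x and y (numerically stable atan2 form)."""
    return float(np.arctan2(np.linalg.norm(np.cross(x, y)), np.dot(x, y)))

def exp_map(p, w):
    """exp_p(w) on the unit sphere for a tangent vector w at p."""
    n = np.linalg.norm(w)
    if n < 1e-15:
        return p.copy()
    return np.cos(n) * p + np.sin(n) * (w / n)

def optimal_point(p, q, delta, e1, e2, num=3600):
    """Hypothetical helper: brute-force minimiser of x -> d(x, q) over sampled points of S_delta(p)."""
    thetas = np.linspace(0.0, 2 * np.pi, num, endpoint=False)
    candidates = [exp_map(p, delta * (np.cos(t) * e1 + np.sin(t) * e2)) for t in thetas]
    return min(candidates, key=lambda x: sphere_dist(x, q))

p = np.array([0.0, 0.0, 1.0])
e1, e2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])  # orthonormal basis of T_p S^2
q = np.array([np.sin(1.2) * np.cos(0.7), np.sin(1.2) * np.sin(0.7), np.cos(1.2)])
r, delta = sphere_dist(p, q), 0.3   # r = 1.2, so 0 < delta < r as the theorem requires

p0 = optimal_point(p, q, delta, e1, e2)
defect = sphere_dist(p, p0) + sphere_dist(p0, q) - r
print(defect)  # small; limited by the angular sampling step
```

Because the defect is quadratic in the angular error near the minimiser, even a coarse grid gives a defect far below the grid spacing.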
We now extract the candidate initial velocity. Since $p_0 \in S_\delta(p) = \exp_p(\{w \in T_p M : |w|_g = \delta\})$, we can write $p_0 = \exp_p(\delta v)$ with $v \in T_p M$ and $|v|_g = 1$, and this $v$ is unique because $\exp_p|_{\overline{B}(0, \delta)}$ is injective. This $v$ is the unit direction at $p$ "pointing toward $p_0$", and it is our candidate for the initial velocity of a length-$r$ curve from $p$ to $q$. Define the candidate geodesic
\begin{align*}
\gamma : [0, \infty) &\to M \\
t &\mapsto \exp_p(t v).
\end{align*}
Why is $\gamma$ defined for all $t \geq 0$, not just for $t \in [0, \delta]$? Because the theorem assumes that $\exp_p$ is defined on all of $T_p M$, that is, completeness of $\exp_p$ at $p$. This is essential: we will need to follow $\gamma$ all the way to time $t = r$, well past the diffeomorphism radius $\delta$. By [Geodesics Have Constant Speed](/theorems/2709), since $\dot\gamma(0) = v$ has $|v|_g = 1$, the curve $\gamma$ has constant unit speed, so $\gamma|_{[0, r]}$ has length exactly $r$.
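On the unit sphere $S^2$ (an example we choose; it is complete, so $\exp_p$ is defined on all of $T_p M$) we can watch the conclusion the rest of the proof aims for. In this sketch we assume the optimal point $p_0$ in closed form, as the point at distance $\delta$ along the minimising great-circle arc toward $q$. We recover $v$ with a hypothetical `log_map` (the inverse of $\exp_p$ on the ball of radius $\pi$), follow $\gamma(t) = \exp_p(tv)$ to $t = r$, and check that $d(\gamma(t), q) = r - t$ on a grid, so that $\gamma(r) = q$. For contrast, a unit direction rotated by $0.05$ radians satisfies that equality only at $t = 0$.

```python
import numpy as np

def sphere_dist(x, y):
    """Great-circle distance between unit vectors x and y (numerically stable atan2 form)."""
    return float(np.arctan2(np.linalg.norm(np.cross(x, y)), np.dot(x, y)))

def exp_map(p, w):
    """exp_p(w) on the unit sphere for a tangent vector w at p."""
    n = np.linalg.norm(w)
    if n < 1e-15:
        return p.copy()
    return np.cos(n) * p + np.sin(n) * (w / n)

def log_map(p, x):
    """Hypothetical helper: the tangent vector w at p with exp_p(w) = x, valid when d(p, x) < pi."""
    u = x - np.dot(p, x) * p          # component of x orthogonal to p
    nu = np.linalg.norm(u)
    if nu < 1e-15:
        return np.zeros_like(p)
    return sphere_dist(p, x) * (u / nu)

p = np.array([0.0, 0.0, 1.0])
q = np.array([np.sin(1.2) * np.cos(0.7), np.sin(1.2) * np.sin(0.7), np.cos(1.2)])
r, delta = sphere_dist(p, q), 0.3
# Assumption: on S^2 the optimal point lies at distance delta along the great-circle arc toward q.
p0 = exp_map(p, delta * log_map(p, q) / r)

v = log_map(p, p0) / delta            # the unique unit v with p0 = exp_p(delta * v)

def gamma(t):
    """Candidate geodesic t -> exp_p(t v), defined for every t >= 0 on the complete sphere."""
    return exp_map(p, t * v)

ts = np.linspace(0.0, r, 201)
remaining = np.array([sphere_dist(gamma(t), q) for t in ts])
print(np.linalg.norm(v))                     # 1 up to rounding: unit initial speed
print(np.max(np.abs(remaining - (r - ts))))  # 0 up to rounding: d(gamma(t), q) = r - t on the grid
print(sphere_dist(gamma(r), q))              # 0 up to rounding: gamma(r) = q

# Contrast: rotate v about the p-axis by 0.05 rad; the resulting geodesic misses q.
c, s = np.cos(0.05), np.sin(0.05)
w = np.array([c * v[0] - s * v[1], s * v[0] + c * v[1], 0.0])
gap = np.array([sphere_dist(exp_map(p, t * w), q) for t in ts]) - (r - ts)
print(np.count_nonzero(gap < 1e-9))          # 1: only t = 0 satisfies d = r - t
```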
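The remaining task, which occupies the rest of the proof, is to show $\gamma(r) = q$. The plan is to track the "remaining distance to $q$" through the set $A = \{t \in [0,r] : d(\gamma(t), q) = r - t\}$. The key facts will be that $A$ is non-empty (it contains $0$, since $\gamma(0) = p$ and $d(p, q) = r$), closed in $[0, r]$, and relatively open in $[0, r)$. Since $[0, r)$ is connected, these force $[0, r) \subseteq A$, and closedness then gives $A = [0, r]$; in particular $r \in A$, that is, $d(\gamma(r), q) = 0$.[/guided]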