[guided]The differential of the exponential map is most naturally studied by varying the initial velocity of a geodesic. Since $v \ne 0$, define
\begin{align*}
e := \frac{v}{r},
\end{align*}
where $r=|v|_g$. Then $|e|_g=1$ and $v=re$. Because $w$ is orthogonal to $v$, it is also orthogonal to $e$. We rescale $w$ by the same factor $1/r$ and define
\begin{align*}
u := \frac{w}{r}.
\end{align*}
Thus $w=ru$ and $(e,u)_g=0$.
Now define the radial geodesic
\begin{align*}
\gamma:[0,r] &\to M,\\
t &\mapsto \exp_p(te).
\end{align*}
Completeness ensures that $\exp_p$ is defined on all of $T_pM$, and $|e|_g=1$ ensures that $\gamma$ has unit speed. Consequently $d(p,\gamma(t)) \le t \le r < R$ for every $t\in[0,r]$, so each point $\gamma(t)$ lies in $B(p,R)$ and the curvature bounds from the theorem apply along this geodesic.
To connect $(d\exp_p)_{re}(ru)$ with a Jacobi field, vary the initial velocity $e$ in the transverse direction $u$. For sufficiently small $\varepsilon>0$, define
\begin{align*}
F:(-\varepsilon,\varepsilon)\times[0,r] &\to M,\\
(s,t) &\mapsto \exp_p(t(e+s u)).
\end{align*}
For each fixed $s$, the curve $t \mapsto F(s,t)$ is the geodesic starting at $p$ with initial velocity $e+s u$. Therefore the variational vector field
\begin{align*}
J:[0,r] &\to TM,\\
t &\mapsto \left.\frac{\partial F}{\partial s}\right|_{s=0}(t)
\end{align*}
is a Jacobi field along $\gamma$, because $F$ is a variation of $\gamma=F(0,\cdot)$ through geodesics (citing a result not yet in the wiki: geodesic variation characterization of Jacobi fields).
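For reference, the equation that $J$ satisfies can be written out explicitly. The sign convention below is an assumption, since the text above does not fix one: we take $R(X,Y)Z=\nabla_X\nabla_YZ-\nabla_Y\nabla_XZ-\nabla_{[X,Y]}Z$, and with the opposite convention the curvature term changes sign. Under this assumption the Jacobi equation along $\gamma$ reads
\begin{align*}
D_t^2J+R(J,\gamma')\gamma'=0.
\end{align*}
This is a linear second-order equation, so $J$ is determined by the two initial values $J(0)$ and $D_tJ(0)$.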
The initial conditions come from differentiating the variation at $t=0$. Since $F(s,0)=p$ for every $s$,
\begin{align*}
J(0)=0.
\end{align*}
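The initial covariant derivative records the derivative of the initial velocity. By the symmetry lemma, $D_t\partial_sF=D_s\partial_tF$. Moreover, $s\mapsto\partial_tF(s,0)=e+su$ takes values in the single vector space $T_pM$, so its covariant derivative along the constant curve $s\mapsto F(s,0)=p$ is the ordinary derivative: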
\begin{align*}
D_tJ(0)
=
\left.\frac{D}{ds}\right|_{s=0}\left.\frac{\partial F}{\partial t}\right|_{t=0}
=
\left.\frac{D}{ds}\right|_{s=0}(e+s u)
=
u.
\end{align*}
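As a sanity check on these initial conditions, consider two model cases that are not part of the argument itself. Both are illustrative assumptions: flat $\mathbb{R}^n$ with $\exp_p(x)=p+x$, and the unit round sphere, where $U$ denotes the parallel transport of $u$ along $\gamma$ (a symbol introduced only for this example). In the flat case the variation and its variational field are
\begin{align*}
F(s,t)&=p+t(e+su),\\
J(t)&=tu,
\end{align*}
so $J(0)=0$, $D_tJ(0)=u$, and $J(r)=ru=w$. On the unit sphere, the Jacobi field with the same initial data, using $(e,u)_g=0$, is
\begin{align*}
J(t)=\sin(t)\,U(t),
\end{align*}
which again gives $J(0)=0$ and $D_tJ(0)=\cos(0)\,U(0)=u$, and at $t=r$ has norm $|J(r)|_g=\frac{\sin r}{r}\,|w|_g$ for $0<r<\pi$.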
Finally, at time $r$, the curve in $M$ obtained by varying $s$ is
\begin{align*}
s \mapsto F(s,r)=\exp_p(r(e+s u)).
\end{align*}
By the chain rule, its derivative at $s=0$ is the differential of $\exp_p$ at $re$ applied to $\left.\frac{d}{ds}\right|_{s=0}r(e+su)=ru$:
\begin{align*}
J(r)
=
(d\exp_p)_{re}(ru).
\end{align*}
Since $re=v$ and $ru=w$, this is also
\begin{align*}
J(r)=(d\exp_p)_v(w).
\end{align*}[/guided]