[proofplan]
Set $a = \dot\gamma(0)$ and $w = J'(0)$. The hypothesis $J(0) = 0$ together with the longitudinal-Jacobi formula from the [Structure of Jacobi Fields](/theorems/2715) shows the radial component of $J$ is $g_p(w, a/|a|^2)\, t\dot\gamma(t)$, but the formula we are after is for the full $J$, so we treat the case directly: the variation
\begin{align*}
f: (t, s) \mapsto \exp_p(t(a + sw))
\end{align*}
has each $s$-slice a geodesic through $p$ with initial velocity $a + sw$, by definition of $\exp_p$ and the [Geodesic Rescaling](/theorems/2710) lemma. So $f$ is a geodesic variation of $\gamma = f(\cdot, 0)$, and by [Jacobi Fields are Geodesic Variations](/theorems/2716) the variation field $\tilde J(t) := \partial_s f(t, 0)$ is a Jacobi field. We compute $\tilde J(t)$ via the chain rule applied to $\exp_p$, which yields exactly $\tilde J(t) = (d\exp_p)_{ta}(tw)$, then verify that $\tilde J$ has the same initial data $(0, w)$ as $J$. Uniqueness of Jacobi fields with given initial data, again from [Structure of Jacobi Fields](/theorems/2715), forces $J = \tilde J$.
[/proofplan]
[step:Set notation and define a candidate geodesic variation $f(t, s) = \exp_p(t(a + sw))$]
Set
\begin{align*}
a := \dot\gamma(0) \in T_p M, \qquad w := J'(0) := \frac{\nabla J}{dt}(0) \in T_p M.
\end{align*}
Since $J(0) = 0$ and $\gamma$ is the geodesic with $\gamma(0) = p$, $\dot\gamma(0) = a$, the [Geodesic Rescaling](/theorems/2710) lemma gives $\gamma(t) = \exp_p(ta)$ for $t$ in the maximal interval of definition of $\gamma$. The hypothesis ensures this includes $[0, L]$.
Define the candidate variation
\begin{align*}
f: [0, L] \times (-\varepsilon, \varepsilon) &\to M \\
(t, s) &\mapsto \exp_p\bigl(t(a + sw)\bigr).
\end{align*}
We must choose $\varepsilon > 0$ so that this map is defined and smooth. The set $\mathcal{D}_p \subseteq T_p M$ on which $\exp_p$ is defined is open and contains $0$; the set
\begin{align*}
K := \{ta : t \in [0, L]\} \subseteq \mathcal{D}_p
\end{align*}
is compact (the image of $[0, L]$ under $t \mapsto ta$). Since $\mathcal{D}_p$ is open and contains the compact $K$, there exists $\varepsilon > 0$ such that $t(a + sw) \in \mathcal{D}_p$ for all $(t, s) \in [0, L] \times (-\varepsilon, \varepsilon)$. With this $\varepsilon$, $f$ is smooth: $\exp_p$ is smooth on $\mathcal{D}_p$ by the [Exponential Map as a Local Diffeomorphism](/theorems/2712) (or simply by smooth dependence of geodesic flow on initial data), and $(t, s) \mapsto t(a + sw)$ is smooth.
[/step]
[step:Verify that $f$ is a geodesic variation: each $s$-slice is a geodesic through $p$]
Fix $s \in (-\varepsilon, \varepsilon)$. The curve
\begin{align*}
\beta_s: [0, L] &\to M \\
t &\mapsto f(t, s) = \exp_p(t(a + sw))
\end{align*}
satisfies $\beta_s(0) = \exp_p(0) = p$ and is a geodesic, by [Geodesic Rescaling](/theorems/2710): the rescaling lemma states that for any $b \in T_p M$ in the domain of $\exp_p$ and any $\lambda \in \mathbb{R}$, the curve $t \mapsto \exp_p(t\lambda b)$ is the geodesic with initial conditions $\beta(0) = p$, $\dot\beta(0) = \lambda b$, on the appropriate interval. Applied here with $b = a + sw$, $\lambda = 1$ on $[0, L]$, we get that $\beta_s$ is the geodesic with $\beta_s(0) = p$ and $\dot\beta_s(0) = a + sw$. In particular $\beta_0 = \gamma$.
So $f: [0, L] \times (-\varepsilon, \varepsilon) \to M$ is a geodesic variation of $\gamma$ in the sense of [Jacobi Fields are Geodesic Variations](/theorems/2716): each $s$-slice is a geodesic, and $f(t, 0) = \gamma(t)$.
[/step]
[step:Compute the variation field $\tilde J(t) := \partial_s f(t, 0)$ via the chain rule on $\exp_p$]
The variation field is, by definition,
\begin{align*}
\tilde J(t) := \frac{\partial f}{\partial s}(t, 0) = df_{(t, 0)}(\partial_s).
\end{align*}
Write $f$ as the composition
\begin{align*}
[0, L] \times (-\varepsilon, \varepsilon) \xrightarrow{\Phi} \mathcal{D}_p \xrightarrow{\exp_p} M, \qquad \Phi(t, s) := t(a + sw).
\end{align*}
By the chain rule,
\begin{align*}
df_{(t, s)}(\partial_s) = (d\exp_p)_{\Phi(t, s)}\bigl(d\Phi_{(t, s)}(\partial_s)\bigr).
\end{align*}
The map $\Phi$ takes values in the affine chart of $T_p M$ (treating $T_p M \cong \mathbb{R}^n$ via any linear isomorphism); its $\partial_s$-derivative is computed in this chart as
\begin{align*}
d\Phi_{(t, s)}(\partial_s) = \frac{\partial}{\partial s}\big|_{s} \bigl(t(a + sw)\bigr) = tw \in T_{t(a + sw)}(T_p M) \cong T_p M.
\end{align*}
(The canonical identification $T_v(T_p M) \cong T_p M$ is used, and is canonical because $T_p M$ is itself a vector space; $tw$ is the directional vector at the point $t(a + sw)$ in the chart $T_p M$.)
Hence
\begin{align*}
df_{(t, s)}(\partial_s) = (d\exp_p)_{t(a + sw)}(tw).
\end{align*}
Setting $s = 0$,
\begin{align*}
\tilde J(t) = df_{(t, 0)}(\partial_s) = (d\exp_p)_{ta}(tw). \tag{$\ast$}
\end{align*}
[guided]
We need to compute the variation field, which by definition is the $\partial_s$-derivative of $f$ along the central curve. Unpacking the definition:
\begin{align*}
\tilde J(t) := \frac{\partial f}{\partial s}(t, 0) = df_{(t, 0)}(\partial_s).
\end{align*}
How do we compute $df$? The trick is to factor $f$ through the vector space $T_p M$. Define $\Phi(t, s) := t(a + sw)$; then $f$ is the composition
\begin{align*}
[0, L] \times (-\varepsilon, \varepsilon) \xrightarrow{\Phi} \mathcal{D}_p \xrightarrow{\exp_p} M, \qquad \Phi(t, s) := t(a + sw).
\end{align*}
This factorisation is useful because $\Phi$ lands in the vector space $T_p M$, where derivatives reduce to ordinary partial derivatives in the affine chart, while the only place we have to do "manifold work" is the single application of $d\exp_p$.
Applying the chain rule to a smooth map between manifolds — with $T_p M$ regarded as a smooth manifold and the canonical identification $T_v(T_p M) \cong T_p M$ for any $v \in T_p M$, which is canonical because $T_p M$ is itself a vector space — we get
\begin{align*}
df_{(t, s)}(\partial_s) = (d\exp_p)_{\Phi(t, s)}\bigl(d\Phi_{(t, s)}(\partial_s)\bigr).
\end{align*}
Now we compute the inner factor $d\Phi_{(t, s)}(\partial_s)$. Since $\Phi$ takes values in a vector space, its derivative is just the partial derivative computed in the affine chart. Differentiating $\Phi(t, s) = ta + tsw$ with respect to $s$ at fixed $t$:
\begin{align*}
d\Phi_{(t, s)}(\partial_s) = \frac{\partial}{\partial s}\big|_{s} \bigl(t(a + sw)\bigr) = tw \in T_{t(a + sw)}(T_p M) \cong T_p M.
\end{align*}
The output is a tangent vector at the point $t(a + sw) \in T_p M$, which we identify with $T_p M$ itself via the canonical identification $T_v V \cong V$ for vector spaces $V$.
Substituting back into the chain-rule expression:
\begin{align*}
df_{(t, s)}(\partial_s) = (d\exp_p)_{t(a + sw)}(tw).
\end{align*}
Finally, setting $s = 0$ to evaluate at the central curve:
\begin{align*}
\tilde J(t) = df_{(t, 0)}(\partial_s) = (d\exp_p)_{ta}(tw). \tag{$\ast$}
\end{align*}
We mark this as $(\ast)$ because we will refer back to it in the next step when we verify that $\tilde J$ has the correct initial data.
This is a coordinate-free version of the standard "differentiate the exponential of a curve in the parameter" computation, expressed via the chain rule on the manifold $T_p M$ and pulled through $\exp_p$.
**Why the formula is morally simple.** $\exp_p(t(a + sw))$ is the geodesic from $p$ in direction $a + sw$, traversed for unit time and reparametrised by $t$. As we increase $s$, we tilt the initial velocity by $sw$, and the variation field at parameter $t$ is the difference between the tilted and untilted geodesics at time $t$ — to first order, this is $(d\exp_p)_{ta}$ applied to $tw$ because the input to $\exp_p$ is $\Phi(t, s) = ta + s(tw)$, and the linear part in $s$ is $tw$.
[/guided]
[/step]
[step:Verify $\tilde J$ has the same initial data $(0, w)$ as $J$ and conclude by uniqueness]
We check that the Jacobi field $\tilde J$ defined by $(\ast)$ satisfies $\tilde J(0) = 0$ and $\tilde J'(0) = w$.
**$\tilde J(0) = 0$.** From $(\ast)$ with $t = 0$:
\begin{align*}
\tilde J(0) = (d\exp_p)_{0}(0 \cdot w) = (d\exp_p)_0(0) = 0,
\end{align*}
by linearity of the differential.
**$\tilde J'(0) = w$.** This requires more care. By Step 2, $\tilde J$ is a Jacobi field along $\gamma$; in particular $\tilde J$ is smooth, so $\tilde J'(0) = \frac{\nabla \tilde J}{dt}(0)$ is well-defined. The cleanest computation goes via the symmetry of mixed covariant derivatives along the variation $f$:
\begin{align*}
\tilde J'(0) = \frac{\nabla}{\partial t}\Big|_{t = 0, s = 0} \partial_s f = \frac{\nabla}{\partial s}\Big|_{s = 0} \partial_t f(0, s),
\end{align*}
where the second equality is the symmetry $\nabla_{\partial t} \partial_s f = \nabla_{\partial s} \partial_t f$ used in the proof of [Jacobi Fields are Geodesic Variations](/theorems/2716) (a consequence of torsion-freeness of the Levi-Civita connection plus $[\partial_t, \partial_s] = 0$).
Now $\partial_t f(0, s)$ is the initial velocity of the geodesic $\beta_s$, which by Step 2 is $a + sw$:
\begin{align*}
\partial_t f(0, s) = \dot\beta_s(0) = a + sw \in T_p M.
\end{align*}
This is a smooth curve in $T_p M$ (a vector space, hence its own tangent space). Its covariant derivative along the curve $s \mapsto p$ in $M$ is, by the canonical identification $T_v T_p M \cong T_p M$, just the ordinary derivative:
\begin{align*}
\frac{\nabla}{\partial s}\big|_{s = 0} \partial_t f(0, s) = \frac{d}{ds}\big|_{s = 0}(a + sw) = w.
\end{align*}
(Here we use that the curve $s \mapsto f(0, s) = \exp_p(0) = p$ is constant, so $\nabla_{\partial s}$ acting on a vector field along this constant curve is the ordinary $s$-derivative in $T_p M$ — this is the special case where parallel transport is the identity on a constant curve.)
Hence $\tilde J'(0) = w = J'(0)$. Combined with $\tilde J(0) = 0 = J(0)$, both $J$ and $\tilde J$ are Jacobi fields along $\gamma$ with the same initial data. By the uniqueness assertion in the [Structure of Jacobi Fields](/theorems/2715),
\begin{align*}
J(t) = \tilde J(t) = (d\exp_p)_{ta}(tw) = (d\exp_p)_{t\dot\gamma(0)}\bigl(t J'(0)\bigr)
\end{align*}
for all $t \in [0, L]$, which is the claimed formula.
[/step]