[proofplan]
We differentiate the energy density along the variation and express the result using the pullback Levi-Civita connection on $U^{-1}TN$. Metric compatibility converts the derivative of the local energy density into the contraction of $\nabla V$ with $du$. We then integrate by parts on the compact boundaryless manifold $M$ by applying the divergence identity to the vector field metrically dual to the one-form $X\mapsto h(V,du(X))$. The remaining term is exactly the negative pairing of $V$ with the tension field $\tau(u)=\operatorname{tr}_g(\nabla du)$.
[/proofplan]
[step:Differentiate the energy density along the variation]
Let $I:=(-\varepsilon,\varepsilon)$, and let $\pi:I\times M\to M$ be the projection defined by $\pi(s,p):=p$. Equip the pullback bundle $U^{-1}TN\to I\times M$ with the connection induced by the Levi-Civita connection of $(N,h)$; denote this connection by $\nabla^U$. For fixed $s\in I$, denote by $\nabla^s$ the induced connection on $U_s^{-1}TN$.
Let $p\in M$, and choose a local $g$-orthonormal frame $e_1,\dots,e_m$ on an open neighbourhood $W\subset M$ of $p$, where $m:=\dim M$. We view each $e_i$ as a vector field on $I\times W$ independent of the $s$-variable. The local energy density of $U_s$ is the smooth map $e_U:I\times W\to \mathbb{R}$ defined at each point $(s,q)\in I\times W$ by
\begin{align*}
e_U(s,q)=\frac{1}{2}\sum_{i=1}^m h\bigl(dU_{(s,q)}(e_i),dU_{(s,q)}(e_i)\bigr).
\end{align*}
This definition is independent of the chosen local $g$-orthonormal frame, since it is the $g$-trace of the symmetric [bilinear form](/page/Bilinear%20Form) $(X,Y)\mapsto h(dU_s(X),dU_s(Y))$. Thus these local definitions patch to a global smooth energy density $e_U:I\times M\to\mathbb{R}$.
Choose $\delta\in(0,\varepsilon)$. Since $M$ is compact and $\partial_s e_U$ is continuous on the compact set $[-\delta,\delta]\times M$, the derivative $\partial_s e_U$ is bounded there. Hence the [differentiation under the integral sign](/page/Differentiation%20Under%20the%20Integral%20Sign) is justified at $s=0$, and
\begin{align*}
\frac{d}{ds}\Big|_{s=0}E[U_s]=\int_M \partial_s e_U(0,p)\,d\operatorname{vol}_g(p).
\end{align*}
By metric compatibility of $\nabla^U$ with $h$,
\begin{align*}
\partial_s e_U
=
\sum_{i=1}^m h\bigl(\nabla^U_{\partial_s}(dU(e_i)),dU(e_i)\bigr).
\end{align*}
The torsion-free property of the [Levi-Civita connection](/page/Levi-Civita%20Connection) on $N$ gives the pullback commutation identity
\begin{align*}
\nabla^U_{\partial_s}(dU(e_i))-\nabla^U_{e_i}(dU(\partial_s))
=
dU([\partial_s,e_i]).
\end{align*}
Because $e_i$ is independent of $s$, $[\partial_s,e_i]=0$, and hence
\begin{align*}
\nabla^U_{\partial_s}(dU(e_i))
=
\nabla^U_{e_i}(dU(\partial_s)).
\end{align*}
Evaluating at $s=0$ gives $dU(\partial_s)|_{\{0\}\times M}=V$ and $dU(e_i)|_{\{0\}\times M}=du(e_i)$. Therefore
\begin{align*}
\partial_s e_U(0,p)
=
\sum_{i=1}^m h\bigl(\nabla^u_{e_i}V,du(e_i)\bigr),
\end{align*}
where $\nabla^u$ denotes the pullback connection on $u^{-1}TN$. Thus
\begin{align*}
\frac{d}{ds}\Big|_{s=0}E[U_s]=\int_M \sum_{i=1}^m h\bigl(\nabla^u_{e_i}V,du(e_i)\bigr)\,d\operatorname{vol}_g.
\end{align*}
[guided]
The energy is an integral of a pointwise quadratic expression in $dU_s$, so the first task is to differentiate that quadratic expression. Let $I:=(-\varepsilon,\varepsilon)$, and let $\pi:I\times M\to M$ be the projection defined by $\pi(s,p):=p$. The map $U:I\times M\to N$ pulls the tangent bundle $TN\to N$ back to a vector bundle $U^{-1}TN\to I\times M$. The Levi-Civita connection of $(N,h)$ induces a connection on this pullback bundle; denote it by $\nabla^U$.
Fix $p\in M$. Choose a local $g$-orthonormal frame $e_1,\dots,e_m$ on an open neighbourhood $W\subset M$ of $p$, where $m:=\dim M$. We extend each $e_i$ to $I\times W$ by making it independent of $s$. Define the local energy density as the smooth map $e_U:I\times W\to \mathbb{R}$. For $(s,q)\in I\times W$, set
\begin{align*}
e_U(s,q)=\frac{1}{2}\sum_{i=1}^m h\bigl(dU_{(s,q)}(e_i),dU_{(s,q)}(e_i)\bigr).
\end{align*}
This expression is independent of the chosen local orthonormal frame because it is the trace of the bilinear form $(X,Y)\mapsto h(dU_s(X),dU_s(Y))$ with respect to $g$.
Choose $\delta\in(0,\varepsilon)$. Since $M$ is compact and $U$ is smooth, the function $e_U$ is smooth on $I\times M$, so $\partial_s e_U$ is continuous on the compact set $[-\delta,\delta]\times M$. Therefore $\partial_s e_U$ is bounded on that compact set. This boundedness is the local domination needed for the [differentiation under the integral sign](/page/Differentiation%20Under%20the%20Integral%20Sign) at $s=0$, and hence
\begin{align*}
\frac{d}{ds}\Big|_{s=0}E[U_s]=\int_M \partial_s e_U(0,p)\,d\operatorname{vol}_g(p).
\end{align*}
Now compute $\partial_s e_U$. Metric compatibility of the pullback connection means that for sections $A,B\in\Gamma(U^{-1}TN)$,
\begin{align*}
\partial_s h(A,B)
=
h(\nabla^U_{\partial_s}A,B)+h(A,\nabla^U_{\partial_s}B).
\end{align*}
Apply this with $A=B=dU(e_i)$. The factor $\frac12$ cancels the two equal terms, so
\begin{align*}
\partial_s e_U
=
\sum_{i=1}^m h\bigl(\nabla^U_{\partial_s}(dU(e_i)),dU(e_i)\bigr).
\end{align*}
We next replace $\nabla^U_{\partial_s}(dU(e_i))$ by a derivative of the variation field. This is where the torsion-free property of the [Levi-Civita connection](/page/Levi-Civita%20Connection) on $N$ enters. For vector fields $A,B$ on $I\times M$, the pullback connection satisfies
\begin{align*}
\nabla^U_A(dU(B))-\nabla^U_B(dU(A))
=
dU([A,B]).
\end{align*}
Taking $A=\partial_s$ and $B=e_i$, and using that the lifted frame $e_i$ is independent of $s$, gives $[\partial_s,e_i]=0$. Hence
\begin{align*}
\nabla^U_{\partial_s}(dU(e_i))
=
\nabla^U_{e_i}(dU(\partial_s)).
\end{align*}
At $s=0$, the section $dU(\partial_s)$ restricts to the variation field $V\in\Gamma(u^{-1}TN)$, defined by $V(p):=dU_{(0,p)}(\partial_s)$ for $p\in M$, and $dU(e_i)$ restricts to $du(e_i)$. Therefore
\begin{align*}
\partial_s e_U(0,p)
=
\sum_{i=1}^m h\bigl(\nabla^u_{e_i}V,du(e_i)\bigr),
\end{align*}
where $\nabla^u$ is the pullback connection on $u^{-1}TN$. Substituting this pointwise identity into the differentiated energy gives
\begin{align*}
\frac{d}{ds}\Big|_{s=0}E[U_s]=\int_M \sum_{i=1}^m h\bigl(\nabla^u_{e_i}V,du(e_i)\bigr)\,d\operatorname{vol}_g.
\end{align*}
This formula is the analytic heart of the first variation: the derivative of the energy is the trace pairing between the covariant derivative of the variation field and the differential of $u$.
[/guided]
[/step]
[step:Convert the trace pairing into a divergence plus the tension field]
Define the one-form $\alpha\in\Omega^1(M)$ by
\begin{align*}
\alpha(X):=h(V,du(X))
\end{align*}
for every vector field $X\in\mathfrak{X}(M)$. Let $\alpha^\sharp\in\mathfrak{X}(M)$ be the vector field metrically dual to $\alpha$, meaning
\begin{align*}
g(\alpha^\sharp,X)=\alpha(X)
\end{align*}
for every $X\in\mathfrak{X}(M)$.
At a point $p\in M$, choose the local $g$-orthonormal frame $e_1,\dots,e_m$ so that $\nabla^M_{e_i}e_j=0$ at $p$, where $\nabla^M$ is the Levi-Civita connection of $g$. Then the definition of divergence in the local orthonormal frame gives
\begin{align*}
\operatorname{div}_g(\alpha^\sharp)(p)=\sum_{i=1}^m e_i\bigl(g(\alpha^\sharp,e_i)\bigr)(p).
\end{align*}
Since $g(\alpha^\sharp,e_i)=\alpha(e_i)=h(V,du(e_i))$, this becomes
\begin{align*}
\operatorname{div}_g(\alpha^\sharp)(p)=\sum_{i=1}^m e_i\bigl(h(V,du(e_i))\bigr)(p).
\end{align*}
Metric compatibility of the pullback connection $\nabla^u$ with $h$ gives
\begin{align*}
\operatorname{div}_g(\alpha^\sharp)(p)=\sum_{i=1}^m h\bigl(\nabla^u_{e_i}V,du(e_i)\bigr)(p)+\sum_{i=1}^m h\bigl(V,\nabla^u_{e_i}(du(e_i))\bigr)(p).
\end{align*}
Because $\nabla^M_{e_i}e_i=0$ at $p$, the second sum is
\begin{align*}
\sum_{i=1}^m h\bigl(V,(\nabla du)(e_i,e_i)\bigr)(p)
=
h(V,\tau(u))(p),
\end{align*}
where
\begin{align*}
\tau(u):=\operatorname{tr}_g(\nabla du)=\sum_{i=1}^m(\nabla du)(e_i,e_i)
\end{align*}
is the tension field. Hence, pointwise on $M$,
\begin{align*}
\sum_{i=1}^m h\bigl(\nabla^u_{e_i}V,du(e_i)\bigr)
=
\operatorname{div}_g(\alpha^\sharp)-h(V,\tau(u)).
\end{align*}
[guided]
We want to rewrite the trace pairing $\sum_{i=1}^m h(\nabla^u_{e_i}V,du(e_i))$ as a divergence plus a geometric error term. Define the one-form $\alpha\in\Omega^1(M)$ by
\begin{align*}
\alpha(X):=h(V,du(X))
\end{align*}
for every vector field $X\in\mathfrak{X}(M)$. Let $\alpha^\sharp\in\mathfrak{X}(M)$ be the vector field metrically dual to $\alpha$, so that
\begin{align*}
g(\alpha^\sharp,X)=\alpha(X)
\end{align*}
for every $X\in\mathfrak{X}(M)$.
Fix $p\in M$. Choose a local $g$-orthonormal frame $e_1,\dots,e_m$ near $p$ such that $\nabla^M_{e_i}e_j=0$ at $p$, where $\nabla^M$ is the Levi-Civita connection of $g$. This choice is available by taking a normal orthonormal frame at the point. The divergence of $\alpha^\sharp$ at $p$ is then
\begin{align*}
\operatorname{div}_g(\alpha^\sharp)(p)=\sum_{i=1}^m e_i\bigl(g(\alpha^\sharp,e_i)\bigr)(p).
\end{align*}
Since $g(\alpha^\sharp,e_i)=\alpha(e_i)=h(V,du(e_i))$, metric compatibility of the pullback connection $\nabla^u$ gives
\begin{align*}
\operatorname{div}_g(\alpha^\sharp)(p)
=
\sum_{i=1}^m h\bigl(\nabla^u_{e_i}V,du(e_i)\bigr)(p)
+
\sum_{i=1}^m h\bigl(V,\nabla^u_{e_i}(du(e_i))\bigr)(p).
\end{align*}
The second sum is where the tension field appears. The covariant derivative of $du$ is defined by
\begin{align*}
(\nabla du)(X,Y):=\nabla^u_X(du(Y))-du(\nabla^M_XY)
\end{align*}
for vector fields $X,Y\in\mathfrak{X}(M)$. Because $\nabla^M_{e_i}e_i=0$ at $p$, we have
\begin{align*}
\nabla^u_{e_i}(du(e_i))(p)=(\nabla du)(e_i,e_i)(p).
\end{align*}
Therefore
\begin{align*}
\sum_{i=1}^m h\bigl(V,\nabla^u_{e_i}(du(e_i))\bigr)(p)
=
h(V,\tau(u))(p),
\end{align*}
where $\tau(u):=\operatorname{tr}_g(\nabla du)=\sum_{i=1}^m(\nabla du)(e_i,e_i)$ is the tension field. Rearranging the divergence identity gives, pointwise on $M$,
\begin{align*}
\sum_{i=1}^m h\bigl(\nabla^u_{e_i}V,du(e_i)\bigr)
=
\operatorname{div}_g(\alpha^\sharp)-h(V,\tau(u)).
\end{align*}
[/guided]
[/step]
[step:Integrate the divergence term away on the compact boundaryless domain]
Since $M$ is compact and has no boundary, the [divergence theorem](/page/Divergence%20Theorem) implies that the integral of the divergence of any smooth vector field over $M$ vanishes. Applying this to $\alpha^\sharp\in\mathfrak{X}(M)$ gives
\begin{align*}
\int_M \operatorname{div}_g(\alpha^\sharp)\,d\operatorname{vol}_g=0.
\end{align*}
Combining this with the identity from the previous step, first we have
\begin{align*}
\frac{d}{ds}\Big|_{s=0}E[U_s]=\int_M \sum_{i=1}^m h\bigl(\nabla^u_{e_i}V,du(e_i)\bigr)\,d\operatorname{vol}_g.
\end{align*}
Substituting the pointwise identity for the trace pairing gives
\begin{align*}
\frac{d}{ds}\Big|_{s=0}E[U_s]=\int_M \operatorname{div}_g(\alpha^\sharp)\,d\operatorname{vol}_g-\int_M h(V,\tau(u))\,d\operatorname{vol}_g.
\end{align*}
The divergence integral vanishes, so
\begin{align*}
\frac{d}{ds}\Big|_{s=0}E[U_s]=-\int_M h(V,\tau(u))\,d\operatorname{vol}_g.
\end{align*}
By symmetry of the Riemannian metric $h$, $h(V,\tau(u))=h(\tau(u),V)$. Therefore
\begin{align*}
\frac{d}{ds}\Big|_{s=0}E[U_s]
=
-\int_M h(\tau(u),V)\,d\operatorname{vol}_g,
\end{align*}
which is the desired [first variation formula](/theorems/2728).
[guided]
We now integrate the pointwise identity from the previous step. The vector field $\alpha^\sharp$ is smooth because $u$, $V$, $g$, and $h$ are smooth. Since $M$ is compact and has no boundary, the [divergence theorem](/page/Divergence%20Theorem) applies to $\alpha^\sharp$ and gives
\begin{align*}
\int_M \operatorname{div}_g(\alpha^\sharp)\,d\operatorname{vol}_g=0.
\end{align*}
From the first step we already proved
\begin{align*}
\frac{d}{ds}\Big|_{s=0}E[U_s]=\int_M \sum_{i=1}^m h\bigl(\nabla^u_{e_i}V,du(e_i)\bigr)\,d\operatorname{vol}_g.
\end{align*}
Substitute the identity
\begin{align*}
\sum_{i=1}^m h\bigl(\nabla^u_{e_i}V,du(e_i)\bigr)
=
\operatorname{div}_g(\alpha^\sharp)-h(V,\tau(u))
\end{align*}
into this integral. We obtain
\begin{align*}
\frac{d}{ds}\Big|_{s=0}E[U_s]
=
\int_M \operatorname{div}_g(\alpha^\sharp)\,d\operatorname{vol}_g
-
\int_M h(V,\tau(u))\,d\operatorname{vol}_g.
\end{align*}
The divergence integral is zero, so
\begin{align*}
\frac{d}{ds}\Big|_{s=0}E[U_s]
=
-\int_M h(V,\tau(u))\,d\operatorname{vol}_g.
\end{align*}
Finally, $h$ is a Riemannian metric and hence symmetric, so $h(V,\tau(u))=h(\tau(u),V)$. Therefore
\begin{align*}
\frac{d}{ds}\Big|_{s=0}E[U_s]
=
-\int_M h(\tau(u),V)\,d\operatorname{vol}_g,
\end{align*}
which is exactly the first variation formula in the theorem statement.
[/guided]
[/step]