[proofplan]
We differentiate the Riemannian density induced by the variation $F_t$ in local coordinates on $M$. The derivative of the determinant reduces the problem to computing the trace of the first variation of the induced metric. That trace is the tangential divergence of the tangential part of $X$ minus the [inner product](/page/Inner%20Product) of $X$ with the mean curvature vector. Since $X$ is supported in the interior of $K$, the divergence term integrates to zero, leaving exactly the stated formula.
[/proofplan]
[step:Differentiate the induced volume density in local coordinates]
Fix $p \in K$. Let $(U,\varphi)$ be a coordinate chart on $M$ with $p \in U$ and coordinate functions $(x_1,\dots,x_m)$. For $t \in (-\varepsilon,\varepsilon)$ and $i \in \{1,\dots,m\}$, define the coordinate vector field along $F_t$ to be the map $E_i(t): U \to F_t^*TN$ sending each $q \in U$ to $d(F_t)_q((\partial/\partial x_i)|_q)$. For $i,j \in \{1,\dots,m\}$, define the induced metric coefficient $\gamma_{ij}: (-\varepsilon,\varepsilon) \times U \to \mathbb{R}$ by the rule that $\gamma_{ij}(t,q)$ is $g_{F_t(q)}(E_i(t)(q),E_j(t)(q))$.
Let $(\gamma^{ij}(t,q))$ denote the inverse matrix of $(\gamma_{ij}(t,q))$. Let $\mathcal L^m$ denote $m$-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on the coordinate image $\varphi(U) \subset \mathbb{R}^m$. In this chart, $\mu_t$ has density $q \mapsto \sqrt{\det(\gamma_{ij}(t,q))}$ with respect to $\mathcal L^m$.
For every $q \in U$, the [Jacobi determinant differentiation formula](/page/Jacobi%20Formula) for a smooth curve of positive definite matrices gives
\begin{align*}
\left.\frac{\partial}{\partial t}\right|_{t=0}\sqrt{\det(\gamma_{ij}(t,q))}
=
\frac{1}{2}\sqrt{\det(\gamma_{ij}(0,q))}
\sum_{i,j=1}^m \gamma^{ij}(0,q)
\left.\frac{\partial \gamma_{ij}}{\partial t}\right|_{t=0}(q).
\end{align*}
Thus the pointwise first variation of the Riemannian density is determined by the scalar
\begin{align*}
\frac{1}{2}\sum_{i,j=1}^m \bar g^{ij}
\dot{\gamma}_{ij},
\end{align*}
where $\bar g^{ij}:=\gamma^{ij}(0,\cdot)$ and
\begin{align*}
\dot{\gamma}_{ij}: U \to \mathbb{R},\qquad q \mapsto \left.\frac{\partial \gamma_{ij}}{\partial t}\right|_{t=0}(q).
\end{align*}
[guided]
The first task is purely local: the area measure is built from the determinant of the induced metric, so we compute how that determinant changes. Choose a coordinate chart $(U,\varphi)$ with coordinate functions $(x_1,\dots,x_m)$, and let $\mathcal L^m$ denote $m$-dimensional Lebesgue measure on $\varphi(U) \subset \mathbb{R}^m$. For each time $t$ and each $i \in \{1,\dots,m\}$, the immersion $F_t$ sends the coordinate vector $\partial/\partial x_i$ to a vector along $F_t$, and we name this vector field $E_i(t): U \to F_t^*TN$ by declaring that $E_i(t)(q)=d(F_t)_q((\partial/\partial x_i)|_q)$ for each $q \in U$. For $i,j \in \{1,\dots,m\}$, the induced metric coefficient $\gamma_{ij}: (-\varepsilon,\varepsilon) \times U \to \mathbb{R}$ is therefore the smooth function satisfying $\gamma_{ij}(t,q)=g_{F_t(q)}(E_i(t)(q),E_j(t)(q))$.
Since each $F_t$ is an immersion, the matrix $(\gamma_{ij}(t,q))$ is positive definite for every $q \in U$. Hence its determinant is positive, and the local Riemannian volume density is the density function $q \mapsto \sqrt{\det(\gamma_{ij}(t,q))}$ with respect to $\mathcal L^m$ in the chosen coordinates.
Now we use the [Jacobi determinant differentiation formula](/page/Jacobi%20Formula) for a smooth curve $A(t)$ of positive definite matrices:
\begin{align*}
\frac{d}{dt}\sqrt{\det A(t)}
=
\frac{1}{2}\sqrt{\det A(t)}\,\operatorname{tr}\bigl(A(t)^{-1}A'(t)\bigr).
\end{align*}
Applying this identity to $A(t)=(\gamma_{ij}(t,q))$ at fixed $q$ gives
\begin{align*}
\left.\frac{\partial}{\partial t}\right|_{t=0}\sqrt{\det(\gamma_{ij}(t,q))}
=
\frac{1}{2}\sqrt{\det(\gamma_{ij}(0,q))}
\sum_{i,j=1}^m \gamma^{ij}(0,q)
\left.\frac{\partial \gamma_{ij}}{\partial t}\right|_{t=0}(q).
\end{align*}
Thus all geometric content is now concentrated in the trace term
\begin{align*}
\frac{1}{2}\sum_{i,j=1}^m \bar g^{ij}\dot{\gamma}_{ij},
\end{align*}
where $\bar g^{ij}:=\gamma^{ij}(0,\cdot)$ and $\dot{\gamma}_{ij}: U \to \mathbb{R}$ is the map $q \mapsto \left.\frac{\partial \gamma_{ij}}{\partial t}\right|_{t=0}(q)$.
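As a sanity check on the formula above, consider a hypothetical special case that is not part of the statement being proved: take $N=\mathbb{R}^n$ with the Euclidean metric and the dilation $F_t=(1+t)F$. This variation is used only to test the pointwise computation, so no support condition on $X$ is needed here. Then $E_i(t)=(1+t)E_i(0)$, and the density can be differentiated directly:
\begin{align*}
\gamma_{ij}(t,q)&=(1+t)^2\,\gamma_{ij}(0,q),\\
\sqrt{\det(\gamma_{ij}(t,q))}&=(1+t)^m\sqrt{\det(\gamma_{ij}(0,q))},\\
\left.\frac{\partial}{\partial t}\right|_{t=0}\sqrt{\det(\gamma_{ij}(t,q))}&=m\sqrt{\det(\gamma_{ij}(0,q))}.
\end{align*}
The trace formula gives the same value, because in this case $\dot{\gamma}_{ij}=2\gamma_{ij}(0,\cdot)$ and $\sum_{i,j=1}^m\bar g^{ij}\gamma_{ij}(0,\cdot)=m$:
\begin{align*}
\frac{1}{2}\sqrt{\det(\gamma_{ij}(0,q))}\sum_{i,j=1}^m \bar g^{ij}\dot{\gamma}_{ij}
=
\frac{1}{2}\sqrt{\det(\gamma_{ij}(0,q))}\cdot 2m
=
m\sqrt{\det(\gamma_{ij}(0,q))}.
\end{align*}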
[/guided]
[/step]
[step:Identify the trace of the metric variation]
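Let $\nabla^N$ denote the [Levi-Civita connection](/page/Levi-Civita%20Connection) of $(N,g)$, and let $\mathcal F: (-\varepsilon,\varepsilon) \times U \to N$ be the variation map defined by $\mathcal F(t,q)=F_t(q)$, so that $E_i=\partial_{x_i}\mathcal F$. Since the coordinate vector fields $\partial_t$ and $\partial_{x_i}$ commute on $(-\varepsilon,\varepsilon)\times U$ and $\nabla^N$ is torsion-free, the covariant derivatives along $\mathcal F$ satisfy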
\begin{align*}
\nabla^N_{\partial_t}E_i
=
\nabla^N_{\partial_{x_i}}\partial_t\mathcal F.
\end{align*}
At $t=0$, this becomes
\begin{align*}
\left.\nabla^N_{\partial_t}E_i\right|_{t=0}
=
\nabla^N_{\partial_{x_i}}X.
\end{align*}
Metric compatibility of $\nabla^N$ gives
\begin{align*}
\dot{\gamma}_{ij}
&=
g(\nabla^N_{\partial_{x_i}}X,E_j(0))
+
g(E_i(0),\nabla^N_{\partial_{x_j}}X).
\end{align*}
Therefore, using the symmetry of $(\bar g^{ij})$,
\begin{align*}
\frac{1}{2}\sum_{i,j=1}^m \bar g^{ij}\dot{\gamma}_{ij}
=
\sum_{i,j=1}^m \bar g^{ij}g(\nabla^N_{\partial_{x_i}}X,E_j(0)).
\end{align*}
Let $(v_1,\dots,v_m)$ be a local $\bar g$-orthonormal tangent frame on $M$ near $p$. Define the corresponding tangent frame along the immersion by $e_i(q):=dF_q(v_i(q)) \in dF_q(T_qM) \subset T_{F(q)}N$. Here $\nabla^N_{e_i}X$ denotes the covariant derivative of the section $X:M\to F^*TN$ in the base direction $v_i$, using the pullback connection induced by $\nabla^N$.
The contracted sum above is the $\bar g$-trace of the bilinear form $(Y,Z)\mapsto g(\nabla^N_{dF(Y)}X,dF(Z))$ on $T_pM$, so it can be evaluated in any basis of $T_pM$. In the [orthonormal basis](/page/Orthonormal%20Basis) $(v_1(p),\dots,v_m(p))$ it equals
\begin{align*}
\sum_{i=1}^m g(\nabla^N_{e_i}X,e_i).
\end{align*}
Since $p \in K$ was arbitrary and the trace is independent of the chosen basis, this identity holds at every point:
\begin{align*}
\frac{1}{2}\operatorname{tr}_{\bar g}\dot{\gamma}
=
\sum_{i=1}^m g(\nabla^N_{e_i}X,e_i).
\end{align*}
[guided]
The determinant computation reduced the proof to identifying the trace of the first variation of the induced metric. Let $\nabla^N$ denote the [Levi-Civita connection](/page/Levi-Civita%20Connection) of $(N,g)$, and let $\mathcal F: (-\varepsilon,\varepsilon) \times U \to N$ be the variation map defined by $\mathcal F(t,q)=F_t(q)$. The coordinate vector fields $\partial_t$ and $\partial_{x_i}$ on $(-\varepsilon,\varepsilon)\times U$ commute. Since $\nabla^N$ is torsion-free, the pullback covariant derivatives along $\mathcal F$ satisfy
\begin{align*}
\nabla^N_{\partial_t}E_i
=
\nabla^N_{\partial_{x_i}}\partial_t\mathcal F.
\end{align*}
At $t=0$, the vector field $\partial_t\mathcal F|_{t=0}$ is exactly the variation vector field $X: M \to F^*TN$. Therefore
\begin{align*}
\left.\nabla^N_{\partial_t}E_i\right|_{t=0}
=
\nabla^N_{\partial_{x_i}}X,
\end{align*}
where the derivative on the right is taken in the base direction $\partial_{x_i}$ using the pullback connection on $F^*TN$.
Now differentiate the scalar function $\gamma_{ij}(t,q)=g_{F_t(q)}(E_i(t)(q),E_j(t)(q))$ with respect to $t$ at $t=0$. By metric compatibility of the Levi-Civita connection, the derivative of the inner product is the sum of the two inner products obtained by differentiating one factor at a time, and the previous identity replaces $\nabla^N_{\partial_t}E_i|_{t=0}$ by $\nabla^N_{\partial_{x_i}}X$. Hence
\begin{align*}
\dot{\gamma}_{ij}
&=
g(\nabla^N_{\partial_{x_i}}X,E_j(0))
+
g(E_i(0),\nabla^N_{\partial_{x_j}}X).
\end{align*}
Contracting with the symmetric inverse matrix $(\bar g^{ij})$ gives
\begin{align*}
\frac{1}{2}\sum_{i,j=1}^m \bar g^{ij}\dot{\gamma}_{ij}
=
\sum_{i,j=1}^m \bar g^{ij}g(\nabla^N_{\partial_{x_i}}X,E_j(0)).
\end{align*}
The factor $1/2$ disappears because the two summands are equal after interchanging the dummy indices $i$ and $j$ and using symmetry of $(\bar g^{ij})$.
To express this invariantly, choose a local $\bar g$-orthonormal tangent frame $(v_1,\dots,v_m)$ near the point under consideration. Define $e_i: U \to F^*TN$ by $e_i(q)=dF_q(v_i(q))$. Since $F$ is an immersion and $\bar g=F^*g$, the vectors $(e_1,\dots,e_m)$ form a $g$-orthonormal frame of the tangent subbundle $dF(TM)$. Evaluating the contracted trace in this orthonormal basis gives
\begin{align*}
\frac{1}{2}\operatorname{tr}_{\bar g}\dot{\gamma}
=
\sum_{i=1}^m g(\nabla^N_{e_i}X,e_i).
\end{align*}
This is the desired pointwise identity: the trace of the metric variation is the tangential trace of the ambient covariant derivative of the variation vector field.
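A quick illustration, under the assumption (made here only as an example) that $N=\mathbb{R}^n$ carries the Euclidean metric and that the variation is the dilation $F_t=(1+t)F$: the variation vector field is the position vector $X=F$, the pullback connection is ordinary directional differentiation, and so $\nabla^N_{e_i}X=dF(v_i)=e_i$. The identity then reads
\begin{align*}
\frac{1}{2}\operatorname{tr}_{\bar g}\dot{\gamma}
=
\sum_{i=1}^m g(e_i,e_i)
=
m,
\end{align*}
which agrees with the direct computation $\dot{\gamma}_{ij}=2\bar g_{ij}$, where $\bar g_{ij}:=\gamma_{ij}(0,\cdot)$, since $\frac{1}{2}\sum_{i,j=1}^m\bar g^{ij}\cdot 2\bar g_{ij}=m$.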
[/guided]
[/step]
[step:Decompose the trace into divergence and mean curvature]
Let $X^\top: M \to TM$ denote the tangent vector field for which $dF(X^\top)$ is the orthogonal projection of $X$ onto $dF(TM)$, and let $X^\perp: M \to (F^*TN)^\perp$ denote the normal component, so that $X=dF(X^\top)+X^\perp$.
Let $\nabla^M$ denote the [Levi-Civita connection](/page/Levi-Civita%20Connection) of $(M,\bar g)$. Let the [second fundamental form](/page/Second%20Fundamental%20Form) be the map $A: TM \times TM \to (F^*TN)^\perp$ defined by
\begin{align*}
A(Y,Z):=(\nabla^N_{dF(Y)} dF(Z)-dF(\nabla^M_Y Z))^\perp,
\end{align*}
for tangent vector fields $Y,Z:M\to TM$. Let the [mean curvature vector](/page/Mean%20Curvature%20Vector) be
\begin{align*}
H:=\sum_{i=1}^m A(v_i,v_i)
\end{align*}
for any local $\bar g$-orthonormal tangent frame $(v_1,\dots,v_m)$ on $M$.
For a tangent vector field $Y:M\to TM$, define its Riemannian divergence with respect to $\bar g$ by
\begin{align*}
\operatorname{div}_{\bar g}Y:=\sum_{i=1}^m \bar g(\nabla^M_{v_i}Y,v_i),
\end{align*}
where the expression is independent of the chosen local $\bar g$-orthonormal frame.
For such a frame, write $e_i=dF(v_i)$ for its image along the immersion; all covariant derivatives $\nabla^N_{e_i}$ are taken along the base direction $v_i$ using the pullback connection. Then
\begin{align*}
\sum_{i=1}^m g(\nabla^N_{e_i}X,e_i)
&=
\sum_{i=1}^m g(\nabla^N_{e_i}dF(X^\top),e_i)
+
\sum_{i=1}^m g(\nabla^N_{e_i}X^\perp,e_i).
\end{align*}
The tangential term is
\begin{align*}
\sum_{i=1}^m g(\nabla^N_{e_i}dF(X^\top),e_i)
=
\sum_{i=1}^m \bar g(\nabla^M_{v_i}X^\top,v_i)
=
\operatorname{div}_{\bar g}(X^\top).
\end{align*}
For the normal term, since $g(X^\perp,e_i)=0$ for every $i$, differentiating this identity in the direction $e_i$ gives
\begin{align*}
g(\nabla^N_{e_i}X^\perp,e_i)
=
-g(X^\perp,\nabla^N_{e_i}e_i).
\end{align*}
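The normal projection of $\nabla^N_{e_i}e_i$ is $A(v_i,v_i)$, and $X^\perp$ is normal, so only this projection contributes to the inner product. Since $H$ is also normal, $g(X^\perp,H)=g(X,H)$, and therefore
\begin{align*}
\sum_{i=1}^m g(\nabla^N_{e_i}X^\perp,e_i)
=
-\sum_{i=1}^m g(X^\perp,A(v_i,v_i))
=
-g(X^\perp,H)
=
-g(X,H).
\end{align*}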
Thus
\begin{align*}
\frac{1}{2}\operatorname{tr}_{\bar g}\dot{\gamma}
=
\operatorname{div}_{\bar g}(X^\top)-g(H,X).
\end{align*}
[guided]
This is the geometric heart of the proof. The trace of the metric variation is
\begin{align*}
\sum_{i=1}^m g(\nabla^N_{e_i}X,e_i),
\end{align*}
where $(v_1,\dots,v_m)$ is a local $\bar g$-orthonormal tangent frame on $M$, $e_i=dF(v_i)$ is its image along the immersion, and $\nabla^N_{e_i}$ denotes covariant differentiation of sections of $F^*TN$ in the base direction $v_i$ using the pullback connection. We split the variation vector field into tangential and normal parts. Define $X^\top: M \to TM$ to be the unique tangent vector field whose image under $dF$ is the tangential projection of $X$, and define $X^\perp: M \to (F^*TN)^\perp$ by
\begin{align*}
X=dF(X^\top)+X^\perp.
\end{align*}
The tangential component should produce only reparametrization of the surface, so it should become a divergence. To verify this, use the Gauss formula
\begin{align*}
\nabla^N_{dF(Y)} dF(Z)=dF(\nabla^M_YZ)+A(Y,Z),
\end{align*}
where $\nabla^M$ is the [Levi-Civita connection](/page/Levi-Civita%20Connection) of $(M,\bar g)$ and $A: TM \times TM \to (F^*TN)^\perp$ is the [second fundamental form](/page/Second%20Fundamental%20Form). Since $A(v_i,X^\top)$ is normal and $e_i$ is tangent, their $g$-inner product is zero. Hence
\begin{align*}
\sum_{i=1}^m g(\nabla^N_{e_i}dF(X^\top),e_i)
=
\sum_{i=1}^m g(dF(\nabla^M_{v_i}X^\top),e_i).
\end{align*}
Because $F$ induces $\bar g=F^*g$, this equals
\begin{align*}
\sum_{i=1}^m \bar g(\nabla^M_{v_i}X^\top,v_i)
=
\operatorname{div}_{\bar g}(X^\top),
\end{align*}
where $\operatorname{div}_{\bar g}$ is the [Riemannian divergence](/page/Divergence).
Now consider the normal component. Since $X^\perp$ is normal and $e_i$ is tangent, we have
\begin{align*}
g(X^\perp,e_i)=0
\end{align*}
for every $i$. Differentiating this scalar identity in the direction $e_i$ and using metric compatibility of $\nabla^N$ gives
\begin{align*}
0
=
e_i\bigl(g(X^\perp,e_i)\bigr)
=
g(\nabla^N_{e_i}X^\perp,e_i)+g(X^\perp,\nabla^N_{e_i}e_i).
\end{align*}
Therefore
\begin{align*}
g(\nabla^N_{e_i}X^\perp,e_i)
=
-g(X^\perp,\nabla^N_{e_i}e_i).
\end{align*}
Only the normal projection of $\nabla^N_{e_i}e_i$ contributes to this inner product, because $X^\perp$ is normal. By definition of the second fundamental form, that normal projection is $A(v_i,v_i)$. Summing over $i$ gives
\begin{align*}
\sum_{i=1}^m g(\nabla^N_{e_i}X^\perp,e_i)
=
-\sum_{i=1}^m g(X^\perp,A(v_i,v_i)).
\end{align*}
The mean curvature vector is the trace
\begin{align*}
H:=\sum_{i=1}^m A(v_i,v_i),
\end{align*}
so the normal contribution is
\begin{align*}
-g(X^\perp,H).
\end{align*}
Since $H$ is normal, $g(X^\perp,H)=g(X,H)$. Combining the tangential and normal computations yields
\begin{align*}
\frac{1}{2}\operatorname{tr}_{\bar g}\dot{\gamma}
=
\operatorname{div}_{\bar g}(X^\top)-g(H,X).
\end{align*}
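To see the sign conventions at work, here is a worked example under assumptions that are ours and not part of the statement: let $F:S^m\to\mathbb{R}^{m+1}$ be the inclusion of the unit sphere with the Euclidean metric, and let $X=\nu$ be the outward unit normal, which is the variation vector field of the dilation $F_t=(1+t)F$. Then $X^\top=0$ and $X^\perp=\nu$. Differentiating $g(e_i,\nu)=0$ in the direction $v_i$ and using $\nabla^N_{e_i}\nu=e_i$ on the unit sphere gives
\begin{align*}
g(A(v_i,v_i),\nu)
=
g(\nabla^N_{e_i}e_i,\nu)
=
-g(e_i,\nabla^N_{e_i}\nu)
=
-1.
\end{align*}
Because the normal bundle of the sphere is spanned by $\nu$, this yields $A(v_i,v_i)=-\nu$ and $H=-m\nu$, so
\begin{align*}
\operatorname{div}_{\bar g}(X^\top)-g(H,X)
=
0-g(-m\nu,\nu)
=
m.
\end{align*}
This matches the direct value $\sum_{i=1}^m g(\nabla^N_{e_i}\nu,e_i)=\sum_{i=1}^m g(e_i,e_i)=m$ of the left-hand side.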
[/guided]
[/step]
[step:Integrate the pointwise variation and remove the divergence term]
Because $K$ is compact, choose finitely many coordinate charts whose domains cover $K$ and a smooth [partition of unity](/page/Partition%20of%20Unity) $(\psi_a)_{a\in I}$ subordinate to this finite cover, where $I$ is a finite index set and each $\psi_a: K \to [0,1]$ is a smooth function supported in one chart domain. On each coordinate patch, the function $(t,q) \mapsto \sqrt{\det(\gamma_{ij}(t,q))}$ is smooth, so after shrinking $\varepsilon>0$ if necessary, its $t$-derivative is bounded on $(-\varepsilon,\varepsilon)\times \operatorname{supp}\psi_a$ for each $a\in I$. By the one-variable [mean value theorem](/theorems/186) applied in the $t$-variable at fixed $q$, each local difference quotient is bounded in absolute value by this local bound on the $t$-derivative. Because $I$ is finite, the maximum of these local bounds is a constant that dominates every local difference quotient, and a constant is integrable over the compact set $K$. Hence the [Dominated Convergence Theorem](/page/Dominated%20Convergence%20Theorem) justifies differentiating the area integral under the integral sign. Combining this with the density computation and the trace identity gives
\begin{align*}
\left.\frac{d}{dt}\right|_{t=0}\mathcal A_K[F_t]
=
\int_K \left(\operatorname{div}_{\bar g}(X^\top)-g(H,X)\right)\,d\mu_{\bar g}.
\end{align*}
Because $\operatorname{supp} X \subset \operatorname{int}K$, the tangential vector field $X^\top$ is smooth on $K$ and also has compact support in $\operatorname{int}K$. Hence there exists an open neighbourhood $V \subset K$ of $\partial K$ such that $X^\top=0$ on $V$, so the boundary trace of $X^\top$ is zero. The [Divergence Theorem](/page/Divergence%20Theorem) applies on the compact Riemannian manifold with smooth boundary $(K,\bar g)$ and gives
\begin{align*}
\int_K \operatorname{div}_{\bar g}(X^\top)\,d\mu_{\bar g}
=
\int_{\partial K}\bar g(X^\top,\nu)\,d\mu_{\partial K}
=
0,
\end{align*}
where $\nu$ is the outward unit conormal vector field along $\partial K$ and $\mu_{\partial K}$ is the induced boundary measure. Therefore
\begin{align*}
\left.\frac{d}{dt}\right|_{t=0}\mathcal A_K[F_t]
=
-\int_K g(H,X)\,d\mu_{\bar g}.
\end{align*}
Since $\bar g=F^*g$, this is exactly
\begin{align*}
\left.\frac{d}{dt}\right|_{t=0}\mathcal A_K[F_t]
=
-\int_K g(H,X)\,d\mu_{F^*g}.
\end{align*}
[guided]
We now assemble the pointwise computation into the variation of the area functional. Because $K$ is compact, we cover $K$ by finitely many coordinate charts and choose a smooth partition of unity $(\psi_a)$ subordinate to that finite cover. On the support of each $\psi_a$, the local density function
\begin{align*}
(t,q) \mapsto \sqrt{\det(\gamma_{ij}(t,q))}
\end{align*}
is smooth. After possibly reducing $\varepsilon>0$, its derivative with respect to $t$ is bounded on $(-\varepsilon,\varepsilon)\times \operatorname{supp}\psi_a$. For fixed $q$, the one-variable mean value theorem in the $t$-variable bounds the corresponding difference quotient by that same derivative bound. Since there are only finitely many partition functions, these bounds combine into an integrable dominating function on the compact set $K$. Therefore the [Dominated Convergence Theorem](/page/Dominated%20Convergence%20Theorem) permits differentiating the area integral under the integral sign.
Using the pointwise density computation and the identity
\begin{align*}
\frac{1}{2}\operatorname{tr}_{\bar g}\dot{\gamma}
=
\operatorname{div}_{\bar g}(X^\top)-g(H,X),
\end{align*}
we obtain
\begin{align*}
\left.\frac{d}{dt}\right|_{t=0}\mathcal A_K[F_t]
=
\int_K \left(\operatorname{div}_{\bar g}(X^\top)-g(H,X)\right)\,d\mu_{\bar g}.
\end{align*}
The hypothesis $\operatorname{supp}X\subset \operatorname{int}K$ implies $\operatorname{supp}X^\top\subset \operatorname{int}K$, because tangential projection cannot create support where $X$ is zero. Hence $X^\top$ vanishes on an open neighbourhood of $\partial K$, so its boundary trace is zero. Applying the [Divergence Theorem](/page/Divergence%20Theorem) on the compact Riemannian manifold with smooth boundary $(K,\bar g)$ gives
\begin{align*}
\int_K \operatorname{div}_{\bar g}(X^\top)\,d\mu_{\bar g}
=
\int_{\partial K}\bar g(X^\top,\nu)\,d\mu_{\partial K}
=
0,
\end{align*}
where $\nu$ is the outward unit conormal vector field along $\partial K$ and $\mu_{\partial K}$ is the induced boundary measure. Therefore
\begin{align*}
\left.\frac{d}{dt}\right|_{t=0}\mathcal A_K[F_t]
=
-\int_K g(H,X)\,d\mu_{\bar g}.
\end{align*}
Since $\bar g=F^*g$, this is exactly
\begin{align*}
\left.\frac{d}{dt}\right|_{t=0}\mathcal A_K[F_t]
=
-\int_K g(H,X)\,d\mu_{F^*g}.
\end{align*}
If $M$ is compact without boundary, then $K=M$ is a compact domain with empty boundary, and the same argument applies.
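As a consistency check of the final formula in the closed case, consider the following example, whose setup is an assumption made for illustration: $M=K=S^m$ is the unit sphere in $\mathbb{R}^{m+1}$ with the Euclidean metric, $F_t=(1+t)F$ for the inclusion $F$, and $\omega_m$ denotes the area of the unit sphere. The image of $F_t$ is the sphere of radius $1+t$, so
\begin{align*}
\mathcal A_K[F_t]=(1+t)^m\,\omega_m,
\qquad
\left.\frac{d}{dt}\right|_{t=0}\mathcal A_K[F_t]=m\,\omega_m.
\end{align*}
On the other side, $X=\nu$ is the outward unit normal and, with the definition of $A$ as the normal part of $\nabla^N_{dF(Y)}dF(Z)$, the mean curvature vector is $H=-m\nu$. Hence
\begin{align*}
-\int_K g(H,X)\,d\mu_{F^*g}
=
-\int_{S^m}(-m)\,d\mu_{F^*g}
=
m\,\omega_m,
\end{align*}
so the two sides agree.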
[/guided]
If $M$ is compact without boundary, then $K=M$ is a compact domain with empty boundary, and the same argument applies. This proves the stated formula.
[/step]