[proofplan]
We first use the explicit definition
\begin{align*}
\mathcal F[g,f] := \int_M \left(S_g + |\nabla f|_g^2\right)e^{-f}\,d\mu_g
\end{align*}
and record that the equation for $f$ preserves the weighted measure in total. The main computation is the [first variation formula](/theorems/2728) for $\mathcal F$ under an arbitrary variation of the pair $(g,f)$; when the weighted measure is fixed to first order, this variation is the negative pairing with $\operatorname{Ric}_g+\operatorname{Hess}_g f$. We then pull the Ricci flow back by the diffeomorphisms generated by $-\nabla f$, which changes the metric velocity to $-2(\operatorname{Ric}_g+\operatorname{Hess}_g f)$ and makes the weighted measure stationary to first order. Since $\mathcal F$ is invariant under diffeomorphism pullback, the gauged computation gives the desired identity for the original flow.
[/proofplan]
[step:Show that the conjugate equation preserves the weighted volume]
Fix $t \in [0,T]$. Write $g_t := g(t)$, $f_t := f(\cdot,t)$, $S_t := S_{g_t}$, and $d\mu_t := d\mu_{g_t}$. Define the weighted measure density as the smooth map
\begin{align*}
u_t: M \to (0,\infty), \qquad u_t(x)=e^{-f_t(x)}.
\end{align*}
Along Ricci flow, the Riemannian volume form satisfies
\begin{align*}
\partial_t d\mu_t = -S_t\,d\mu_t.
\end{align*}
Therefore
\begin{align*}
\partial_t(u_t\,d\mu_t)=(-\partial_t f_t-S_t)e^{-f_t}\,d\mu_t.
\end{align*}
Using the backward equation for $f_t$, this becomes
\begin{align*}
\partial_t(u_t\,d\mu_t)=(\Delta_{g_t} f_t-|\nabla^{g_t}f_t|_{g_t}^2)e^{-f_t}\,d\mu_t.
\end{align*}
By the identity $\operatorname{div}_{g_t}(e^{-f_t}\nabla^{g_t}f_t)=e^{-f_t}(\Delta_{g_t} f_t-|\nabla^{g_t}f_t|_{g_t}^2)$, we have
\begin{align*}
\partial_t(u_t\,d\mu_t)=\operatorname{div}_{g_t}(e^{-f_t}\nabla^{g_t}f_t)\,d\mu_t.
\end{align*}
Since $M$ is closed, the [divergence theorem](/theorems/2754) gives that the integral of a divergence over $M$ is zero. Hence
\begin{align*}
\frac{d}{dt}\int_M e^{-f_t}\,d\mu_t=\int_M \operatorname{div}_{g_t}(e^{-f_t}\nabla^{g_t}f_t)\,d\mu_t.
\end{align*}
Since this divergence integral is zero, we obtain
\begin{align*}
\frac{d}{dt}\int_M e^{-f_t}\,d\mu_t=0.
\end{align*}
[/step]
[step:Compute the first variation of $\mathcal F$ for an arbitrary variation of $(g,f)$]
Let $s \mapsto (g_s,f_s)$ be a smooth variation of a Riemannian metric and a smooth function on $M$. At a fixed parameter value $s=0$, set
\begin{align*}
g:=g_0, \qquad f:=f_0, \qquad h:=\left.\partial_s g_s\right|_{s=0}, \qquad \phi:=\left.\partial_s f_s\right|_{s=0}.
\end{align*}
Let $d\mu := d\mu_g$, let $S := S_g$, and let all contractions, gradients, Laplacians, Hessians, and divergences be taken with respect to $g$.
[claim:First variation formula for $\mathcal F$]
With the above notation,
\begin{align*}
\left.\frac{d}{ds}\right|_{s=0}\mathcal F[g_s,f_s] = \int_M \left[-\langle h,\operatorname{Ric}_g+\operatorname{Hess}_g f\rangle_g + \left(\frac{1}{2}\operatorname{tr}_g h-\phi\right)\left(2\Delta_g f-|\nabla^g f|_g^2+S\right)\right]e^{-f}\,d\mu.
\end{align*}
[/claim]
[proof]
We use the standard Riemannian volume variation formula, scalar-curvature variation formula, and the elementary metric-gradient variation identity obtained by differentiating $g_s^{-1}(df_s,df_s)$ at $s=0$. The smoothness of $s \mapsto (g_s,f_s)$ verifies the differentiability hypotheses of these variation identities, and closedness of $M$ will verify the boundary-free [integration by parts](/theorems/2098) used below. Explicitly,
\begin{align*}
\left.\partial_s d\mu_{g_s}\right|_{s=0}=\frac{1}{2}\operatorname{tr}_g h\,d\mu.
\end{align*}
Also,
\begin{align*}
\left.\partial_s S_{g_s}\right|_{s=0}=-\langle h,\operatorname{Ric}_g\rangle_g+\operatorname{div}_g(\operatorname{div}_g h)-\Delta_g(\operatorname{tr}_g h).
\end{align*}
Finally,
\begin{align*}
\left.\partial_s|\nabla^{g_s}f_s|_{g_s}^2\right|_{s=0}=-h(\nabla^g f,\nabla^g f)+2\langle\nabla^g f,\nabla^g\phi\rangle_g.
\end{align*}
Differentiating the definition of $\mathcal F$ gives
\begin{align*}
\left.\frac{d}{ds}\right|_{s=0}\mathcal F[g_s,f_s] = \int_M \left[-\langle h,\operatorname{Ric}_g\rangle_g + \operatorname{div}_g(\operatorname{div}_g h) - \Delta_g(\operatorname{tr}_g h) - h(\nabla^g f,\nabla^g f) + 2\langle\nabla^g f,\nabla^g\phi\rangle_g\right]e^{-f}\,d\mu + \int_M (S+|\nabla^g f|_g^2)\left(\frac{1}{2}\operatorname{tr}_g h-\phi\right)e^{-f}\,d\mu.
\end{align*}
Because $M$ is closed, [integration by parts](/theorems/210) against the weight $e^{-f}$, equivalently the [divergence theorem](/theorems/3614), gives
\begin{align*}
\int_M
\left[
\operatorname{div}_g(\operatorname{div}_g h)
-
\Delta_g(\operatorname{tr}_g h)
\right]e^{-f}\,d\mu
&=
\int_M
\left[
h(\nabla^g f,\nabla^g f)
-
\langle h,\operatorname{Hess}_g f\rangle_g
-
(\operatorname{tr}_g h)(|\nabla^g f|_g^2-\Delta_g f)
\right]e^{-f}\,d\mu.
\end{align*}
Also, weighted integration by parts first gives
\begin{align*}
\int_M 2\langle\nabla^g f,\nabla^g\phi\rangle_g e^{-f}\,d\mu = -2\int_M \phi\,\operatorname{div}_g(e^{-f}\nabla^g f)\,d\mu.
\end{align*}
Using $\operatorname{div}_g(e^{-f}\nabla^g f)=e^{-f}(\Delta_g f-|\nabla^g f|_g^2)$, this becomes
\begin{align*}
\int_M 2\langle\nabla^g f,\nabla^g\phi\rangle_g e^{-f}\,d\mu = -2\int_M \phi(\Delta_g f-|\nabla^g f|_g^2)e^{-f}\,d\mu.
\end{align*}
Substituting these two identities cancels the terms $h(\nabla^g f,\nabla^g f)$ and yields
\begin{align*}
\left.\frac{d}{ds}\right|_{s=0}\mathcal F[g_s,f_s] = \int_M \left[-\langle h,\operatorname{Ric}_g+\operatorname{Hess}_g f\rangle_g + \left(\frac{1}{2}\operatorname{tr}_g h-\phi\right)\left(2\Delta_g f-|\nabla^g f|_g^2+S\right)\right]e^{-f}\,d\mu.
\end{align*}
[guided]
The point of the first variation formula is to isolate the tensor that is dual to a metric variation. Let $h=\partial_s g_s|_{s=0}$ be the metric velocity and $\phi=\partial_s f_s|_{s=0}$ be the function velocity. Because $s \mapsto (g_s,f_s)$ is a smooth variation on the closed manifold $M$, the hypotheses of the Riemannian volume variation formula, the scalar-curvature variation formula, and the elementary metric-gradient variation identity obtained by differentiating $g_s^{-1}(df_s,df_s)$ at $s=0$ are satisfied:
\begin{align*}
\left.\partial_s d\mu_{g_s}\right|_{s=0}=\frac{1}{2}\operatorname{tr}_g h\,d\mu.
\end{align*}
For scalar curvature,
\begin{align*}
\left.\partial_s S_{g_s}\right|_{s=0}=-\langle h,\operatorname{Ric}_g\rangle_g+\operatorname{div}_g(\operatorname{div}_g h)-\Delta_g(\operatorname{tr}_g h).
\end{align*}
For the gradient term,
\begin{align*}
\left.\partial_s|\nabla^{g_s}f_s|_{g_s}^2\right|_{s=0}=-h(\nabla^g f,\nabla^g f)+2\langle\nabla^g f,\nabla^g\phi\rangle_g.
\end{align*}
The first identity is the volume-form variation; the second is the scalar-curvature variation; the third records both effects in the gradient term: changing the inverse metric contributes $-h(\nabla^g f,\nabla^g f)$, while changing the function contributes $2\langle\nabla^g f,\nabla^g\phi\rangle_g$.
Differentiating
\begin{align*}
\mathcal F[g_s,f_s]
=
\int_M
\left(S_{g_s}+|\nabla^{g_s}f_s|_{g_s}^2\right)e^{-f_s}\,d\mu_{g_s}
\end{align*}
therefore gives
\begin{align*}
\left.\frac{d}{ds}\right|_{s=0}\mathcal F[g_s,f_s] = \int_M \left[-\langle h,\operatorname{Ric}_g\rangle_g + \operatorname{div}_g(\operatorname{div}_g h) - \Delta_g(\operatorname{tr}_g h) - h(\nabla^g f,\nabla^g f) + 2\langle\nabla^g f,\nabla^g\phi\rangle_g\right]e^{-f}\,d\mu + \int_M (S+|\nabla^g f|_g^2)\left(\frac{1}{2}\operatorname{tr}_g h-\phi\right)e^{-f}\,d\mu.
\end{align*}
The divergence terms are not discarded, because the weight $e^{-f}$ is not constant. Instead we move the derivatives from $h$ onto the weight. Since
\begin{align*}
\operatorname{Hess}_g(e^{-f})
=
e^{-f}\left(\nabla^g f\otimes \nabla^g f-\operatorname{Hess}_g f\right),
\end{align*}
and
\begin{align*}
\Delta_g(e^{-f})
=
e^{-f}\left(|\nabla^g f|_g^2-\Delta_g f\right),
\end{align*}
integration by parts on the closed manifold gives
\begin{align*}
\int_M
\left[
\operatorname{div}_g(\operatorname{div}_g h)
-
\Delta_g(\operatorname{tr}_g h)
\right]e^{-f}\,d\mu
&=
\int_M
\left[
h(\nabla^g f,\nabla^g f)
-
\langle h,\operatorname{Hess}_g f\rangle_g
-
(\operatorname{tr}_g h)(|\nabla^g f|_g^2-\Delta_g f)
\right]e^{-f}\,d\mu.
\end{align*}
This is where the Hessian term in the final answer comes from.
The $\phi$-term is handled by one more weighted integration by parts:
\begin{align*}
\int_M 2\langle\nabla^g f,\nabla^g\phi\rangle_g e^{-f}\,d\mu=-2\int_M \phi\,\operatorname{div}_g(e^{-f}\nabla^g f)\,d\mu.
\end{align*}
Using $\operatorname{div}_g(e^{-f}\nabla^g f)=e^{-f}(\Delta_g f-|\nabla^g f|_g^2)$, this becomes
\begin{align*}
\int_M 2\langle\nabla^g f,\nabla^g\phi\rangle_g e^{-f}\,d\mu=-2\int_M \phi(\Delta_g f-|\nabla^g f|_g^2)e^{-f}\,d\mu.
\end{align*}
After substitution, the terms $h(\nabla^g f,\nabla^g f)$ cancel. The remaining $h$-terms combine into
\begin{align*}
-\langle h,\operatorname{Ric}_g+\operatorname{Hess}_g f\rangle_g,
\end{align*}
and the remaining scalar coefficient multiplying $\frac{1}{2}\operatorname{tr}_g h-\phi$ is
\begin{align*}
2\Delta_g f-|\nabla^g f|_g^2+S.
\end{align*}
Thus
\begin{align*}
\left.\frac{d}{ds}\right|_{s=0}\mathcal F[g_s,f_s] = \int_M \left[-\langle h,\operatorname{Ric}_g+\operatorname{Hess}_g f\rangle_g + \left(\frac{1}{2}\operatorname{tr}_g h-\phi\right)\left(2\Delta_g f-|\nabla^g f|_g^2+S\right)\right]e^{-f}\,d\mu.
\end{align*}
[/guided]
[/proof]
[/step]
[step:Pass to the diffeomorphism gauge generated by $-\nabla f$]
Fix $t_0 \in (0,T)$. Let $\mathfrak X(M)=\Gamma(TM)$ denote the space of smooth vector fields on $M$, and for $Y\in\mathfrak X(M)$ let $\mathcal L_Y$ denote Lie differentiation along $Y$. Let $X_t \in \mathfrak X(M)$ be the time-dependent vector field
\begin{align*}
X_t := -\nabla^{g(t)}f(t).
\end{align*}
Because $M$ is closed, the flow of $X_t$ exists for $t$ in a neighbourhood of $t_0$. Let $\Psi_t: M \to M$ be the corresponding family of diffeomorphisms normalized by $\Psi_{t_0}=\operatorname{id}_M$ and satisfying
\begin{align*}
\partial_t\Psi_t = X_t\circ \Psi_t.
\end{align*}
Define the pulled-back metric by
\begin{align*}
\widetilde g(t) := \Psi_t^*g(t).
\end{align*}
Define the pulled-back function by
\begin{align*}
\widetilde f(t) := \Psi_t^*f(t).
\end{align*}
At $t=t_0$, since $\Psi_{t_0}=\operatorname{id}_M$, the variation of the pulled-back metric is
\begin{align*}
\partial_t\widetilde g(t_0)=\partial_t g(t_0)+\mathcal L_{X_{t_0}}g(t_0).
\end{align*}
Since $\partial_t g(t_0)=-2\operatorname{Ric}_{g(t_0)}$ and $\mathcal L_{-\nabla^{g(t_0)}f(t_0)}g(t_0)=-2\operatorname{Hess}_{g(t_0)}f(t_0)$, this gives
\begin{align*}
\partial_t\widetilde g(t_0)=-2\left(\operatorname{Ric}_{g(t_0)}+\operatorname{Hess}_{g(t_0)}f(t_0)\right).
\end{align*}
The variation of the pulled-back function is
\begin{align*}
\partial_t\widetilde f(t_0)=\partial_t f(t_0)+X_{t_0}(f(t_0)).
\end{align*}
Substituting the backward equation and $X_{t_0}=-\nabla^{g(t_0)}f(t_0)$ gives
\begin{align*}
\partial_t\widetilde f(t_0)=-\Delta_{g(t_0)}f(t_0)+|\nabla^{g(t_0)}f(t_0)|_{g(t_0)}^2-S_{g(t_0)}-|\nabla^{g(t_0)}f(t_0)|_{g(t_0)}^2.
\end{align*}
Cancelling the gradient-norm terms yields
\begin{align*}
\partial_t\widetilde f(t_0)=-\Delta_{g(t_0)}f(t_0)-S_{g(t_0)}.
\end{align*}
Taking the trace of the metric variation gives
\begin{align*}
\frac{1}{2}\operatorname{tr}_{g(t_0)}\partial_t\widetilde g(t_0)=-S_{g(t_0)}-\Delta_{g(t_0)}f(t_0).
\end{align*}
By the preceding computation of $\partial_t\widetilde f(t_0)$, this is
\begin{align*}
\frac{1}{2}\operatorname{tr}_{g(t_0)}\partial_t\widetilde g(t_0)=\partial_t\widetilde f(t_0).
\end{align*}
Thus, in this gauge, the weighted measure $e^{-\widetilde f(t)}d\mu_{\widetilde g(t)}$ is stationary to first order at $t=t_0$.
[/step]
[step:Apply the first variation formula in the gauged variables]
Apply the first variation formula from the previous step to the path $(\widetilde g(t),\widetilde f(t))$ at $t=t_0$. The scalar term vanishes because
\begin{align*}
\frac{1}{2}\operatorname{tr}_{g(t_0)}\partial_t\widetilde g(t_0)-\partial_t\widetilde f(t_0)=0.
\end{align*}
Therefore
\begin{align*}
\left.\frac{d}{dt}\right|_{t=t_0}\mathcal F[\widetilde g(t),\widetilde f(t)] = -\int_M \left\langle -2(\operatorname{Ric}_{g(t_0)}+\operatorname{Hess}_{g(t_0)}f(t_0)),\operatorname{Ric}_{g(t_0)}+\operatorname{Hess}_{g(t_0)}f(t_0)\right\rangle_{g(t_0)}e^{-f(t_0)}\,d\mu_{g(t_0)}.
\end{align*}
Equivalently,
\begin{align*}
\left.\frac{d}{dt}\right|_{t=t_0}\mathcal F[\widetilde g(t),\widetilde f(t)] = 2\int_M |\operatorname{Ric}_{g(t_0)}+\operatorname{Hess}_{g(t_0)}f(t_0)|_{g(t_0)}^2 e^{-f(t_0)}\,d\mu_{g(t_0)}.
\end{align*}
[/step]
[step:Use diffeomorphism invariance to return to the original Ricci flow]
For every $t$ near $t_0$, the scalar curvature, gradient norm, and volume measure of $\Psi_t^*g(t)$ are the pullbacks of the corresponding quantities for $g(t)$. Hence the change-of-variables formula gives
\begin{align*}
\mathcal F[\widetilde g(t),\widetilde f(t)]=\mathcal F[\Psi_t^*g(t),\Psi_t^*f(t)].
\end{align*}
By the change-of-variables formula for the diffeomorphism $\Psi_t$, this equals the original functional:
\begin{align*}
\mathcal F[\widetilde g(t),\widetilde f(t)]=\mathcal F[g(t),f(t)].
\end{align*}
Differentiating this identity at $t=t_0$ and using the gauged computation yields
\begin{align*}
\left.\frac{d}{dt}\right|_{t=t_0}\mathcal F[g(t),f(t)]
=
2\int_M
|\operatorname{Ric}_{g(t_0)}+\operatorname{Hess}_{g(t_0)}f(t_0)|_{g(t_0)}^2
e^{-f(t_0)}\,d\mu_{g(t_0)}.
\end{align*}
Since $t_0 \in (0,T)$ was arbitrary and the integrand is pointwise non-negative, the formula holds for every $t \in (0,T)$ and implies
\begin{align*}
\frac{d}{dt}\mathcal F[g(t),f(t)] \geq 0.
\end{align*}
This proves Perelman's $\mathcal F$-functional monotonicity formula.
[/step]