[proofplan]
The proof rewrites the [heat equation](/page/Heat%20Equation) for $u$ as a nonlinear equation for $f=\log u$, then studies the resulting expression $|\nabla f|_g^2-\partial_t f$, which equals $-\Delta_g f$. The Bochner identity and the trace inequality for the Hessian give a parabolic differential inequality for a time-shifted version of this expression on each time slab $[s,\tau]$. A maximum-principle argument bounds the shifted quantity by $n/2$, and sending $s \downarrow 0$ gives the desired estimate. In the noncompact case the same argument is applied with spatial cutoffs, and the assumed vanishing of the cutoff error recovers the compact calculation in the limit.
[/proofplan]
[step:Rewrite the heat equation in terms of $f=\log u$]
Since $u>0$, the function $f=\log u$ is smooth on $M \times (0,T]$. For each fixed time $t \in (0,T]$, the gradient $\nabla f(\cdot,t)$ and Laplacian $\Delta_g f(\cdot,t)$ are computed with respect to the metric $g$; all displayed identities below are scalar identities on $M \times (0,T]$.
Using the chain rule and the heat equation $\partial_t u=\Delta_g u$,
\begin{align*}
\partial_t f = \frac{\partial_t u}{u} = \frac{\Delta_g u}{u}.
\end{align*}
Since $u=e^f$, the standard product and chain rules for the Laplace-Beltrami operator give
\begin{align*}
\Delta_g u
= \Delta_g(e^f)
= e^f\Delta_g f + e^f|\nabla f|_g^2.
\end{align*}
Dividing by $u=e^f$ yields
\begin{align*}
\partial_t f=\Delta_g f+|\nabla f|_g^2.
\end{align*}
Define the Li-Yau quantity $Q: M \times (0,T] \to \mathbb{R}$ by
\begin{align*}
Q(x,t) = |\nabla f(x,t)|_g^2-\partial_t f(x,t).
\end{align*}
The previous identity gives
\begin{align*}
Q=-\Delta_g f.
\end{align*}
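As a sanity check on the identity $Q=-\Delta_g f$, the following worked instance assumes the flat model $M=\mathbb{R}^n$ and takes $u$ to be the Euclidean heat kernel. This example is not part of the argument and is used only for illustration. In particular $|\nabla f|$ is unbounded here, so it should not be read as an instance of the boundedness used later in the noncompact step.
\begin{align*}
u(x,t) &= (4\pi t)^{-n/2}e^{-|x|^2/(4t)},
&
f(x,t) &= -\frac{n}{2}\log(4\pi t)-\frac{|x|^2}{4t},
\\
\nabla f &= -\frac{x}{2t},
&
\Delta f &= -\frac{n}{2t},
\\
\partial_t f &= -\frac{n}{2t}+\frac{|x|^2}{4t^2},
&
Q &= \frac{|x|^2}{4t^2}-\left(-\frac{n}{2t}+\frac{|x|^2}{4t^2}\right)=\frac{n}{2t}=-\Delta f.
\end{align*}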
[/step]
[step:Derive the differential inequality for the Li-Yau quantity]
Let $L$ denote the parabolic operator
\begin{align*}
L:=\partial_t-\Delta_g.
\end{align*}
We use the [Bochner identity](/page/Bochner%20Formula), which applies to the smooth function $f(\cdot,t)$ on the Riemannian manifold $(M,g)$ for each fixed time $t$ and states
\begin{align*}
\frac{1}{2}\Delta_g |\nabla f|_g^2
=
|\operatorname{Hess}_g f|_g^2
+
\langle \nabla f,\nabla \Delta_g f\rangle_g
+
\operatorname{Ric}_g(\nabla f,\nabla f).
\end{align*}
Because the metric is independent of time,
\begin{align*}
\partial_t |\nabla f|_g^2
=
2\langle \nabla f,\nabla \partial_t f\rangle_g.
\end{align*}
Thus
\begin{align*}
L|\nabla f|_g^2 = 2\langle \nabla f,\nabla \partial_t f\rangle_g - 2|\operatorname{Hess}_g f|_g^2 - 2\langle \nabla f,\nabla \Delta_g f\rangle_g - 2\operatorname{Ric}_g(\nabla f,\nabla f).
\end{align*}
Combining the two gradient terms gives
\begin{align*}
L|\nabla f|_g^2 = 2\langle \nabla f,\nabla(\partial_t f-\Delta_g f)\rangle_g - 2|\operatorname{Hess}_g f|_g^2 - 2\operatorname{Ric}_g(\nabla f,\nabla f).
\end{align*}
Since $\partial_t f-\Delta_g f=|\nabla f|_g^2$, this becomes
\begin{align*}
L|\nabla f|_g^2
=
2\langle \nabla f,\nabla |\nabla f|_g^2\rangle_g
-
2|\operatorname{Hess}_g f|_g^2
-
2\operatorname{Ric}_g(\nabla f,\nabla f).
\end{align*}
Also, since $\partial_t$ commutes with $\Delta_g$ for a time-independent metric,
\begin{align*}
L(\partial_t f)
=
\partial_t(\partial_t f-\Delta_g f)
=
\partial_t |\nabla f|_g^2
=
2\langle \nabla f,\nabla \partial_t f\rangle_g.
\end{align*}
Subtracting these identities gives
\begin{align*}
LQ
=
2\langle \nabla f,\nabla Q\rangle_g
-
2|\operatorname{Hess}_g f|_g^2
-
2\operatorname{Ric}_g(\nabla f,\nabla f).
\end{align*}
At each point, choose a $g$-[orthonormal basis](/page/Orthonormal%20Basis) of $T_xM$ diagonalizing the symmetric [bilinear form](/page/Bilinear%20Form) $\operatorname{Hess}_g f$. If its eigenvalues are $\lambda_1,\dots,\lambda_n$, then
\begin{align*}
|\operatorname{Hess}_g f|_g^2
=
\sum_{i=1}^n \lambda_i^2,
\qquad
\Delta_g f
=
\sum_{i=1}^n \lambda_i.
\end{align*}
The [Cauchy-Schwarz inequality](/page/Cauchy-Schwarz%20Inequality) in the Euclidean [inner product](/page/Inner%20Product) on $\mathbb{R}^n$, applied to the vectors $(\lambda_1,\dots,\lambda_n)$ and $(1,\dots,1)$, gives
\begin{align*}
(\Delta_g f)^2
=
\left(\sum_{i=1}^n \lambda_i\right)^2
\leq
n\sum_{i=1}^n \lambda_i^2
=
n|\operatorname{Hess}_g f|_g^2.
\end{align*}
Thus
\begin{align*}
|\operatorname{Hess}_g f|_g^2 \geq \frac{(\Delta_g f)^2}{n}=\frac{Q^2}{n}.
\end{align*}
Since $\operatorname{Ric}_g\geq 0$,
\begin{align*}
LQ
\leq
2\langle \nabla f,\nabla Q\rangle_g
-
\frac{2}{n}Q^2.
\end{align*}
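A quick consistency check of this inequality, under the assumption (made only for illustration) that $M=\mathbb{R}^n$ is flat and $f(x,t)=-\tfrac{n}{2}\log(4\pi t)-\tfrac{|x|^2}{4t}$ is the logarithm of the Euclidean heat kernel. In this model $\operatorname{Hess} f=-\tfrac{1}{2t}I$ and $\operatorname{Ric}=0$, so both the trace inequality and the inequality for $LQ$ appear to hold with equality:
\begin{align*}
|\operatorname{Hess} f|^2 &= \frac{n}{4t^2}=\frac{(\Delta f)^2}{n},
&
Q &= -\Delta f=\frac{n}{2t},
\\
\nabla Q &= 0,
&
LQ &= \partial_t Q=-\frac{n}{2t^2}=-\frac{2}{n}Q^2.
\end{align*}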
[guided]
The quantity we want to estimate is
\begin{align*}
Q=|\nabla f|_g^2-\partial_t f.
\end{align*}
The first useful observation is that the heat equation transforms this quantity into a Laplacian. Indeed, from
\begin{align*}
\partial_t f=\Delta_g f+|\nabla f|_g^2
\end{align*}
we get
\begin{align*}
Q=-\Delta_g f.
\end{align*}
This matters because the Bochner identity controls the Laplacian of $|\nabla f|_g^2$ in terms of the Hessian and Ricci curvature.
Let
\begin{align*}
L:=\partial_t-\Delta_g.
\end{align*}
The [Bochner identity](/page/Bochner%20Formula) applies because $f(\cdot,t)$ is smooth on the Riemannian manifold $(M,g)$ for each fixed $t$. It says
\begin{align*}
\frac{1}{2}\Delta_g |\nabla f|_g^2
=
|\operatorname{Hess}_g f|_g^2
+
\langle \nabla f,\nabla \Delta_g f\rangle_g
+
\operatorname{Ric}_g(\nabla f,\nabla f).
\end{align*}
Because $g$ is fixed in time, differentiating $|\nabla f|_g^2$ with respect to $t$ gives
\begin{align*}
\partial_t |\nabla f|_g^2
=
2\langle \nabla f,\nabla \partial_t f\rangle_g.
\end{align*}
Subtracting the Bochner formula multiplied by $2$ from this time derivative gives
\begin{align*}
L|\nabla f|_g^2
=
2\langle \nabla f,\nabla(\partial_t f-\Delta_g f)\rangle_g
-
2|\operatorname{Hess}_g f|_g^2
-
2\operatorname{Ric}_g(\nabla f,\nabla f).
\end{align*}
Since $\partial_t f-\Delta_g f=|\nabla f|_g^2$, this is
\begin{align*}
L|\nabla f|_g^2
=
2\langle \nabla f,\nabla |\nabla f|_g^2\rangle_g
-
2|\operatorname{Hess}_g f|_g^2
-
2\operatorname{Ric}_g(\nabla f,\nabla f).
\end{align*}
We also compute $L(\partial_t f)$. Since the metric is time-independent, $\partial_t$ commutes with $\Delta_g$, hence
\begin{align*}
L(\partial_t f)
=
\partial_t(\partial_t f-\Delta_g f)
=
\partial_t |\nabla f|_g^2
=
2\langle \nabla f,\nabla \partial_t f\rangle_g.
\end{align*}
Subtracting this identity from the identity for $L|\nabla f|_g^2$ yields
\begin{align*}
LQ
=
2\langle \nabla f,\nabla Q\rangle_g
-
2|\operatorname{Hess}_g f|_g^2
-
2\operatorname{Ric}_g(\nabla f,\nabla f).
\end{align*}
Now the curvature and Hessian terms become useful. The Ricci hypothesis gives
\begin{align*}
\operatorname{Ric}_g(\nabla f,\nabla f)\geq 0.
\end{align*}
For the Hessian term, fix a point $(x,t)$ and choose a $g$-orthonormal basis of $T_xM$ diagonalizing $\operatorname{Hess}_g f$. If $\lambda_1,\dots,\lambda_n$ are the corresponding eigenvalues, then
\begin{align*}
|\operatorname{Hess}_g f|_g^2 = \sum_{i=1}^n \lambda_i^2.
\end{align*}
Also,
\begin{align*}
\Delta_g f = \sum_{i=1}^n \lambda_i.
\end{align*}
Applying the [Cauchy-Schwarz inequality](/page/Cauchy-Schwarz%20Inequality) in the Euclidean inner product on $\mathbb{R}^n$ to the vectors $(\lambda_1,\dots,\lambda_n)$ and $(1,\dots,1)$ gives
\begin{align*}
(\Delta_g f)^2
=
\left(\sum_{i=1}^n \lambda_i\right)^2
\leq
n\sum_{i=1}^n \lambda_i^2
=
n|\operatorname{Hess}_g f|_g^2.
\end{align*}
Since $Q=-\Delta_g f$, this implies
\begin{align*}
|\operatorname{Hess}_g f|_g^2
\geq
\frac{(\Delta_g f)^2}{n}
=
\frac{Q^2}{n}.
\end{align*}
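For a concrete feel for the trace inequality, here is a small numerical instance with hypothetical eigenvalues chosen only for illustration. Take $n=3$ and $(\lambda_1,\lambda_2,\lambda_3)=(1,2,3)$, and compare with the equal-eigenvalue case $(\lambda,\lambda,\lambda)$, where Cauchy-Schwarz is an equality:
\begin{align*}
(1+2+3)^2 &= 36 \leq 42 = 3\,(1^2+2^2+3^2),
\\
(\lambda+\lambda+\lambda)^2 &= 9\lambda^2 = 3\,(\lambda^2+\lambda^2+\lambda^2).
\end{align*}
So the bound $|\operatorname{Hess}_g f|_g^2\geq Q^2/n$ is attained exactly when the Hessian is a multiple of the metric at the point.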
Substituting the Ricci and Hessian estimates into the formula for $LQ$ gives the key differential inequality
\begin{align*}
LQ
\leq
2\langle \nabla f,\nabla Q\rangle_g
-
\frac{2}{n}Q^2.
\end{align*}
[/guided]
[/step]
[step:Apply the maximum principle to the shifted quantity $F_s$]
Fix $s,\tau \in (0,T]$ with $s<\tau$. Define the shifted Li-Yau quantity $F_s: M \times [s,\tau] \to \mathbb{R}$ by
\begin{align*}
F_s(x,t) = (t-s)Q(x,t).
\end{align*}
For $t>s$, the product rule applies; since the factor $t-s$ does not depend on $x$, it gives
\begin{align*}
LF_s = Q+(t-s)LQ.
\end{align*}
The differential inequality for $Q$ gives
\begin{align*}
LF_s \leq Q+2(t-s)\langle \nabla f,\nabla Q\rangle_g -\frac{2(t-s)}{n}Q^2.
\end{align*}
Using $F_s=(t-s)Q$ and $\nabla F_s=(t-s)\nabla Q$ for fixed $t$, this becomes
\begin{align*}
LF_s \leq \frac{F_s}{t-s} + 2\langle \nabla f,\nabla F_s\rangle_g - \frac{2}{n(t-s)}F_s^2.
\end{align*}
Assume first that $M$ is compact. We apply the [parabolic maximum principle](/page/Maximum%20Principle) on the compact cylinder $M\times[s,\tau]$ to the smooth function $F_s$, in the pointwise form for the operator $L-2\langle \nabla f,\nabla\cdot\rangle_g$. This is justified directly at an interior or terminal-time maximum: the spatial gradient vanishes, the Laplacian is nonpositive, and the one-sided time derivative is nonnegative. By compactness, $F_s$ attains its maximum at some point $(x_0,t_0)\in M\times[s,\tau]$. If this maximum is nonpositive, then $F_s\leq 0\leq n/2$ on $M\times[s,\tau]$. Otherwise $F_s(x_0,t_0)>0$, and since $F_s=0$ on $M\times\{s\}$ this forces $t_0>s$. At this positive maximum,
\begin{align*}
\nabla F_s(x_0,t_0)=0.
\end{align*}
Moreover,
\begin{align*}
\Delta_g F_s(x_0,t_0)\leq 0.
\end{align*}
Finally,
\begin{align*}
\partial_t F_s(x_0,t_0)\geq 0,
\end{align*}
where the last inequality is the one-sided time derivative condition if $t_0=\tau$. Hence
\begin{align*}
LF_s(x_0,t_0)\geq 0.
\end{align*}
Evaluating the preceding differential inequality at $(x_0,t_0)$ gives
\begin{align*}
0 \leq \frac{F_s(x_0,t_0)}{t_0-s} - \frac{2}{n(t_0-s)}F_s(x_0,t_0)^2.
\end{align*}
Since $t_0-s>0$ and $F_s(x_0,t_0)>0$, multiplying by $(t_0-s)/F_s(x_0,t_0)$ yields
\begin{align*}
1-\frac{2}{n}F_s(x_0,t_0)\geq 0.
\end{align*}
Therefore
\begin{align*}
F_s(x_0,t_0)\leq \frac{n}{2}.
\end{align*}
Because $(x_0,t_0)$ is a maximum point,
\begin{align*}
F_s(x,t)\leq \frac{n}{2}
\end{align*}
for every $(x,t)\in M\times[s,\tau]$.
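To see that the constant $n/2$ is the natural threshold, here is a formal illustration under the assumption that $Q=n/(2t)$, which is the value produced by the Euclidean heat kernel on $\mathbb{R}^n$. That model is not compact, so this is only a check of the algebra and not an instance of the compact case treated above.
\begin{align*}
F_s(x,t) &= (t-s)\frac{n}{2t}=\frac{n}{2}\left(1-\frac{s}{t}\right),
&
\max_{[s,\tau]}F_s &= \frac{n}{2}\left(1-\frac{s}{\tau}\right)<\frac{n}{2},
\\
LF_s &= \partial_t F_s=\frac{ns}{2t^2},
&
\frac{F_s}{t-s}-\frac{2}{n(t-s)}F_s^2 &= \frac{n}{2t}-\frac{n(t-s)}{2t^2}=\frac{ns}{2t^2}.
\end{align*}
In this model the differential inequality for $F_s$ holds with equality, and the maximum approaches $n/2$ as $s\downarrow 0$.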
[/step]
[step:Pass from compact manifolds to the assumed complete noncompact case]
If $M$ is complete and noncompact, fix a base point $o\in M$. For $r>0$, define the geodesic ball $B_g(o,r)\subset M$ by $B_g(o,r)=\{y\in M: d_g(o,y)<r\}$, where $d_g:M\times M\to[0,\infty)$ is the Riemannian distance induced by $g$. For each radius $R>1$, let $\phi_R:M\to[0,1]$ be a smooth cutoff supported in $B_g(o,2R)$, equal to $1$ on $B_g(o,R)$, and chosen with bounds $|\nabla \phi_R|_g\leq C_1/R$ and $|\Delta_g\phi_R|\leq C_2/R^2$ on its support. Under $\operatorname{Ric}_g\geq0$, such cutoffs are obtained from a fixed one-variable cutoff profile and Laplacian comparison, so $C_1,C_2>0$ depend only on that profile and on $n$.
Define the compactly supported auxiliary function $G_R:M\times[s,\tau]\to\mathbb{R}$ by
\begin{align*}
G_R(x,t)=\phi_R(x)F_s(x,t).
\end{align*}
Apply the compact-support version of the [parabolic maximum principle](/page/Maximum%20Principle) to $G_R$ on $M\times[s,\tau]$. Since $G_R=0$ on $M\times\{s\}$, if its maximum is nonpositive then $G_R\leq0\leq n/2$ and there is nothing to prove. Otherwise let $(x_R,t_R)\in M\times(s,\tau]$ be a positive maximum point of $G_R$. At this point,
\begin{align*}
\nabla G_R(x_R,t_R)=0,
\end{align*}
\begin{align*}
\Delta_g G_R(x_R,t_R)\leq0,
\end{align*}
and
\begin{align*}
\partial_t G_R(x_R,t_R)\geq0,
\end{align*}
with the last inequality understood as the one-sided derivative condition if $t_R=\tau$. Hence $LG_R(x_R,t_R)\geq0$.
At points where $t>s$, the product rule gives
\begin{align*}
LG_R=\phi_R LF_s-F_s\Delta_g\phi_R-2\langle\nabla\phi_R,\nabla F_s\rangle_g.
\end{align*}
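The product rule behind this formula can be spelled out as follows. The only input is that $\phi_R$ does not depend on $t$, so the time derivative falls on $F_s$ alone.
\begin{align*}
\partial_t(\phi_R F_s) &= \phi_R\,\partial_t F_s,
\\
\Delta_g(\phi_R F_s) &= \phi_R\Delta_g F_s+F_s\Delta_g\phi_R+2\langle\nabla\phi_R,\nabla F_s\rangle_g,
\\
L(\phi_R F_s) &= \phi_R(\partial_t F_s-\Delta_g F_s)-F_s\Delta_g\phi_R-2\langle\nabla\phi_R,\nabla F_s\rangle_g.
\end{align*}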
Using the differential inequality for $F_s$ and the identity $\nabla G_R=\phi_R\nabla F_s+F_s\nabla\phi_R$, evaluated at $(x_R,t_R)$, gives
\begin{align*}
0\leq \frac{G_R}{t_R-s}+2\langle\nabla f,\nabla G_R-F_s\nabla\phi_R\rangle_g-\frac{2}{n(t_R-s)}\phi_R F_s^2-F_s\Delta_g\phi_R-2\langle\nabla\phi_R,\nabla F_s\rangle_g.
\end{align*}
Since $\nabla G_R(x_R,t_R)=0$ and $G_R(x_R,t_R)>0$, we have $\phi_R(x_R)>0$ and
\begin{align*}
\nabla F_s(x_R,t_R)=-\frac{F_s(x_R,t_R)}{\phi_R(x_R)}\nabla\phi_R(x_R).
\end{align*}
Set $A_R:=G_R(x_R,t_R)$, $\theta_R:=\phi_R(x_R)$, and $\sigma_R:=t_R-s$. Then $0<\theta_R\leq1$, $0<\sigma_R\leq\tau-s$, $F_s(x_R,t_R)=A_R/\theta_R$, and $\nabla F_s(x_R,t_R)=-(A_R/\theta_R^2)\nabla\phi_R(x_R)$. Each cutoff term therefore carries a factor $A_R/\theta_R$, and the preceding inequality becomes
\begin{align*}
0\leq \frac{A_R}{\sigma_R}-\frac{2A_R^2}{n\sigma_R\theta_R}+\frac{A_R}{\theta_R}\left(-2\langle\nabla f,\nabla\phi_R\rangle_g-\Delta_g\phi_R+\frac{2|\nabla\phi_R|_g^2}{\theta_R}\right)(x_R,t_R).
\end{align*}
Multiplying by $\sigma_R\theta_R/A_R>0$ gives
\begin{align*}
\frac{2A_R}{n}\leq \theta_R+\sigma_R\left(-2\langle\nabla f,\nabla\phi_R\rangle_g-\Delta_g\phi_R+\frac{2|\nabla\phi_R|_g^2}{\theta_R}\right)(x_R,t_R).
\end{align*}
Since $\theta_R\leq1$ and $\sigma_R\leq\tau-s$, bounding the bracket by its absolute value implies
\begin{align*}
A_R\leq \frac{n}{2}+E_R,
\end{align*}
where the explicit cutoff error is
\begin{align*}
E_R:=\frac{n}{2}(\tau-s)\left(2\|\nabla f\|_{L^\infty(M\times[s,\tau])}\frac{C_1}{R}+\frac{C_2}{R^2}+\sup_{\{\phi_R>0\}}\frac{2|\nabla\phi_R|_g^2}{\phi_R}\right).
\end{align*}
The cutoff profile is chosen so that $|\nabla\phi_R|_g^2/\phi_R\leq C_3/R^2$ for a constant $C_3>0$ depending only on the same profile and on $n$. Hence
\begin{align*}
E_R\leq \frac{n}{2}(\tau-s)\left(2\|\nabla f\|_{L^\infty(M\times[s,\tau])}\frac{C_1}{R}+\frac{C_2+2C_3}{R^2}\right).
\end{align*}
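One common way to arrange the bound on $|\nabla\phi_R|_g^2/\phi_R$ is sketched below, under assumptions not stated in the text. Take $\phi_R=\psi(d_g(o,\cdot)/R)^2$ for a hypothetical smooth nonincreasing profile $\psi:[0,\infty)\to[0,1]$ with $\psi=1$ on $[0,1]$ and $\psi=0$ on $[2,\infty)$. The computation is done at points where $\phi_R>0$ and $d_g(o,\cdot)$ is smooth, so that $|\nabla d_g(o,\cdot)|_g=1$. Points of the cut locus need a separate standard argument that this sketch does not address.
\begin{align*}
\nabla\phi_R &= \frac{2}{R}\,\psi\,\psi'\,\nabla d_g(o,\cdot),
\\
|\nabla\phi_R|_g^2 &= \frac{4}{R^2}\,\psi^2(\psi')^2,
\\
\frac{|\nabla\phi_R|_g^2}{\phi_R} &= \frac{4(\psi')^2}{R^2}\leq\frac{4\sup|\psi'|^2}{R^2},
\end{align*}
where $\psi$ and $\psi'$ are evaluated at $d_g(o,\cdot)/R$. In this sketch one may take $C_3=4\sup|\psi'|^2$. Squaring the profile is what keeps the quotient bounded near the edge of the support, where $\phi_R$ tends to zero.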
The boundedness of $|\nabla f|_g$ on $M\times[s,\tau]$ and the theorem's assumed vanishing cutoff-error condition imply $E_R\to0$ as $R\to\infty$. The boundedness of $|\partial_t f|$ ensures $Q=|\nabla f|_g^2-\partial_t f$ and therefore $F_s$ are bounded on the time slab, so the compactly supported maximum of $G_R$ is finite and the above pointwise argument applies.
Now fix $(x,t)\in M\times[s,\tau]$. For all sufficiently large $R$, one has $x\in B_g(o,R)$ and hence $\phi_R(x)=1$. Therefore
\begin{align*}
F_s(x,t)=G_R(x,t)\leq A_R\leq \frac{n}{2}+E_R.
\end{align*}
Letting $R\to\infty$ gives
\begin{align*}
F_s(x,t)\leq \frac{n}{2}
\end{align*}
for every $(x,t)\in M\times[s,\tau]$.
[/step]
[step:Let the initial time shift tend to zero]
Let $(x,t)\in M\times(0,T]$ be fixed. Choose any $s\in(0,t)$ and apply the estimate from the previous steps with $\tau=t$. Since
\begin{align*}
F_s(x,t)=(t-s)Q(x,t),
\end{align*}
we have
\begin{align*}
(t-s)\left(|\nabla f(x,t)|_g^2-\partial_t f(x,t)\right) \leq \frac{n}{2}.
\end{align*}
Dividing by $t-s>0$ gives
\begin{align*}
|\nabla f(x,t)|_g^2-\partial_t f(x,t) \leq \frac{n}{2(t-s)}.
\end{align*}
Letting $s\downarrow 0$ yields
\begin{align*}
|\nabla f(x,t)|_g^2-\partial_t f(x,t) \leq \frac{n}{2t}.
\end{align*}
Since $(x,t)$ was arbitrary, the estimate holds on all of $M\times(0,T]$.
[/step]