[proofplan]
We combine the [heat equation](/page/Heat%20Equation) energy identity with the $L^1$-contractivity of the heat semigroup and Nash's inequality on $\mathbb{R}^n$. Nash's inequality bounds the energy dissipation term $\|\nabla u(t)\|_{L^2}^2$ from below by a quantity involving only $\|u(t)\|_{L^2}$ and the $L^1$ norm, which does not increase in time. This gives a scalar differential inequality for $Y(t)=\|u(t)\|_{L^2}^2$. Solving it gives decay of $Y(t)$ at the rate $t^{-n/2}$, and taking square roots yields the decay rate $t^{-n/4}$.
[/proofplan]
[step:Handle the zero initial mass case]
Let $\mathcal{L}^n$ denote $n$-dimensional [Lebesgue Measure](/page/Lebesgue%20Measure) on $\mathbb{R}^n$. Set
\begin{align*}
M := \|u_0\|_{L^1(\mathbb{R}^n)}.
\end{align*}
If $M=0$, then $u_0=0$ $\mathcal{L}^n$-a.e. on $\mathbb{R}^n$. Since $e^{t\Delta}$ is linear on $L^2(\mathbb{R}^n)$, we have $u(t)=0$ in $L^2(\mathbb{R}^n)$ for every $t \geq 0$. The asserted estimates follow. Hence assume from now on that $M>0$.
[/step]
[step:Use heat semigroup dissipation and $L^1$ contractivity]
We use the whole-space heat semigroup given by the heat-kernel representation in the [Solution Of The Cauchy Problem For The Heat Equation](/theorems/54) and, equivalently on $L^2(\mathbb{R}^n)$, by the Fourier multiplier $e^{-t|\xi|^2}$. For any $g\in L^2(\mathbb{R}^n)$, let $\widehat{g}:\mathbb{R}^n\to\mathbb{C}$ denote its [Fourier transform](/page/Fourier%20Transform) in the Plancherel $L^2$ sense. Let $H^1(\mathbb{R}^n)$ denote the [Sobolev space](/page/Sobolev%20Space) of all $g\in L^2(\mathbb{R}^n)$ whose weak first partial derivatives $\partial_{x_i}g$, for $1\leq i\leq n$, belong to $L^2(\mathbb{R}^n)$; its gradient norm is defined by
\begin{align*}
\|\nabla g\|_{L^2(\mathbb{R}^n)}^2 := \sum_{i=1}^n \|\partial_{x_i}g\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
For each $t>0$, define $S(t): L^2(\mathbb{R}^n) \to L^2(\mathbb{R}^n)$ by requiring that the Fourier transform of $S(t)g$ be $\xi \mapsto e^{-t|\xi|^2}\widehat{g}(\xi)$, so that $u(t)=S(t)u_0$. Since $0\leq e^{-t|\xi|^2}\leq 1$ and $e^{-t|\xi|^2}\to 1$ pointwise as $t\downarrow 0$, Plancherel's theorem and dominated convergence give strong continuity on $L^2(\mathbb{R}^n)$. For $t>0$, the estimate $|\xi|^2e^{-2t|\xi|^2}\leq (2et)^{-1}$ shows $S(t)u_0\in H^1(\mathbb{R}^n)$. For Schwartz initial data, denoted $u_{0,k}$ in anticipation of the approximation argument below, differentiating the multiplier in $t$, using Plancherel, and integrating in time gives the integrated energy identity on every interval $[a,b]\subset (0,\infty)$:
\begin{align*}
\frac{1}{2}\|S(b)u_{0,k}\|_{L^2(\mathbb{R}^n)}^2 + \int_a^b \|\nabla S(t)u_{0,k}\|_{L^2(\mathbb{R}^n)}^2\, d\mathcal{L}^1(t) = \frac{1}{2}\|S(a)u_{0,k}\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
For general $u_0\in L^2(\mathbb{R}^n)$, choose Schwartz functions $u_{0,k}\to u_0$ in $L^2(\mathbb{R}^n)$. To pass to the limit in this integrated identity, fix $0<a<b<\infty$. Plancherel gives
\begin{align*}
\int_a^b \|\nabla S(t)(u_{0,k}-u_0)\|_{L^2(\mathbb{R}^n)}^2\, d\mathcal{L}^1(t)
= \int_{\mathbb{R}^n} |\widehat{u_{0,k}-u_0}(\xi)|^2\left(\int_a^b |\xi|^2e^{-2t|\xi|^2}\, d\mathcal{L}^1(t)\right)d\mathcal{L}^n(\xi).
\end{align*}
Since $|\xi|^2e^{-2t|\xi|^2}\leq (2ea)^{-1}$ for $t\in[a,b]$, the right-hand side is at most $(b-a)(2ea)^{-1}\|u_{0,k}-u_0\|_{L^2(\mathbb{R}^n)}^2$, which tends to $0$. The $L^2$ contraction of $S(t)$ also gives $S(a)u_{0,k}\to S(a)u_0$ and $S(b)u_{0,k}\to S(b)u_0$ in $L^2(\mathbb{R}^n)$. Passing to the limit yields
\begin{align*}
\frac{1}{2}\|S(b)u_0\|_{L^2(\mathbb{R}^n)}^2 + \int_a^b \|\nabla S(t)u_0\|_{L^2(\mathbb{R}^n)}^2\, d\mathcal{L}^1(t) = \frac{1}{2}\|S(a)u_0\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
Because this holds for every compact interval $[a,b]\subset(0,\infty)$, the map $t\mapsto \|u(t)\|_{L^2(\mathbb{R}^n)}^2$ is absolutely continuous on compact subintervals of $(0,\infty)$, and differentiating the integrated identity gives the whole-space version of the [Energy Dissipation For The Heat Equation](/theorems/564). Thus, for a.e. $t>0$,
\begin{align*}
\frac{1}{2}\frac{d}{dt}\|u(t)\|_{L^2(\mathbb{R}^n)}^2 + \|\nabla u(t)\|_{L^2(\mathbb{R}^n)}^2 = 0.
\end{align*}
The heat kernel is nonnegative and has total mass $1$ with respect to $\mathcal{L}^n$. Therefore the same representation, together with [Young's Convolution Inequality](/theorems/463) applied with exponents $1$ and $1$, gives the $L^1$ contraction estimate
\begin{align*}
\|u(t)\|_{L^1(\mathbb{R}^n)} \leq \|u_0\|_{L^1(\mathbb{R}^n)} = M.
\end{align*}
Define $Y: (0,\infty) \to [0,\infty)$ by
\begin{align*}
Y(t) := \|u(t)\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
Then $Y$ is absolutely continuous on compact subintervals of $(0,\infty)$ and, for a.e. $t>0$,
\begin{align*}
Y'(t) = -2\|\nabla u(t)\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
[guided]
Let $\mathcal{L}^n$ denote $n$-dimensional [Lebesgue Measure](/page/Lebesgue%20Measure) on $\mathbb{R}^n$, and define
\begin{align*}
M := \|u_0\|_{L^1(\mathbb{R}^n)}.
\end{align*}
First dispose of the case $M=0$. Since the $L^1$ norm of $u_0$ is zero, $u_0=0$ $\mathcal{L}^n$-a.e. on $\mathbb{R}^n$. The heat semigroup is linear on $L^2(\mathbb{R}^n)$, so $u(t)=e^{t\Delta}u_0=0$ in $L^2(\mathbb{R}^n)$ for every $t\geq 0$, and all asserted estimates follow. Hence, for the remainder of the proof, assume $M>0$; this is what later makes the factor $M^{-4/n}$ meaningful.
The goal is to replace the partial differential equation by an ordinary differential inequality for one scalar quantity. The right scalar quantity is the square of the $L^2$ norm, so define $Y: (0,\infty) \to [0,\infty)$ by
\begin{align*}
Y(t) := \|u(t)\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
The heat equation dissipates $L^2$ energy. The justification for rough initial data comes from the Fourier-multiplier construction of the heat semigroup, in parallel with the heat-kernel representation in the [Solution Of The Cauchy Problem For The Heat Equation](/theorems/54). For any $g\in L^2(\mathbb{R}^n)$, let $\widehat{g}:\mathbb{R}^n\to\mathbb{C}$ denote its Fourier transform in the Plancherel $L^2$ sense. Let $H^1(\mathbb{R}^n)$ denote the Sobolev space of all $g\in L^2(\mathbb{R}^n)$ whose weak first partial derivatives $\partial_{x_i}g$, for $1\leq i\leq n$, belong to $L^2(\mathbb{R}^n)$, with
\begin{align*}
\|\nabla g\|_{L^2(\mathbb{R}^n)}^2 := \sum_{i=1}^n \|\partial_{x_i}g\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
For each $t>0$, the solution operator $S(t): L^2(\mathbb{R}^n) \to L^2(\mathbb{R}^n)$ is defined by the Fourier multiplier $e^{-t|\xi|^2}$. The multiplier is bounded by $1$, so Plancherel gives $\|S(t)g\|_{L^2(\mathbb{R}^n)}\leq \|g\|_{L^2(\mathbb{R}^n)}$ for every $g\in L^2(\mathbb{R}^n)$. Since $e^{-t|\xi|^2}\to 1$ pointwise as $t\downarrow 0$, dominated convergence in Fourier space gives strong continuity on $L^2(\mathbb{R}^n)$. The smoothing assertion follows because $|\xi|^2e^{-2t|\xi|^2}\leq (2et)^{-1}$, so $\xi\widehat{S(t)g}(\xi)$ is square-integrable whenever $g\in L^2(\mathbb{R}^n)$; hence $S(t)g\in H^1(\mathbb{R}^n)$ for $t>0$.
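The multiplier bound used here can be checked by one-variable calculus. The following is a short sketch: the auxiliary function $\varphi_t$ is notation introduced only for this illustration, and the last line assumes the unitary normalization of the Fourier transform, under which $\widehat{\partial_{x_j}g}(\xi)=i\xi_j\widehat{g}(\xi)$ and Plancherel holds with constant $1$. For fixed $t>0$ and $s\geq 0$, set $\varphi_t(s):=se^{-2ts}$. Then
\begin{align*}
\varphi_t'(s) &= (1-2ts)e^{-2ts}, \\
\max_{s\geq 0}\varphi_t(s) &= \varphi_t\!\left(\frac{1}{2t}\right) = \frac{1}{2t}e^{-1} = (2et)^{-1}.
\end{align*}
Taking $s=|\xi|^2$ and integrating against $|\widehat{g}(\xi)|^2$ gives
\begin{align*}
\|\nabla S(t)g\|_{L^2(\mathbb{R}^n)}^2
= \int_{\mathbb{R}^n} |\xi|^2e^{-2t|\xi|^2}|\widehat{g}(\xi)|^2\, d\mathcal{L}^n(\xi)
\leq (2et)^{-1}\|g\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}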
For Schwartz initial data, the map $t\mapsto S(t)g$ is differentiable in $L^2(\mathbb{R}^n)$, and differentiating the Fourier multiplier gives $\partial_t S(t)g=\Delta S(t)g$. Taking the $L^2$ [inner product](/page/Inner%20Product) with $S(t)g$ and using Plancherel to identify $\langle \Delta S(t)g, S(t)g\rangle$ with $-\|\nabla S(t)g\|_{L^2(\mathbb{R}^n)}^2$ gives the differential energy identity; integrating it in time over $[a,b]$ gives the integrated energy identity on every interval $[a,b]\subset(0,\infty)$. For the approximating Schwartz data $u_{0,k}$ chosen below, this reads
\begin{align*}
\frac{1}{2}\|S(b)u_{0,k}\|_{L^2(\mathbb{R}^n)}^2 + \int_a^b \|\nabla S(t)u_{0,k}\|_{L^2(\mathbb{R}^n)}^2\, d\mathcal{L}^1(t) = \frac{1}{2}\|S(a)u_{0,k}\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
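As a sketch of where this identity comes from, the computation can be carried out entirely on the Fourier side. It assumes the unitary normalization of the Fourier transform (so that Plancherel holds with constant $1$), and $g$ stands for an arbitrary Schwartz function:
\begin{align*}
\frac{1}{2}\frac{d}{dt}\|S(t)g\|_{L^2(\mathbb{R}^n)}^2
&= \frac{1}{2}\frac{d}{dt}\int_{\mathbb{R}^n} e^{-2t|\xi|^2}|\widehat{g}(\xi)|^2\, d\mathcal{L}^n(\xi) \\
&= -\int_{\mathbb{R}^n} |\xi|^2e^{-2t|\xi|^2}|\widehat{g}(\xi)|^2\, d\mathcal{L}^n(\xi) \\
&= -\|\nabla S(t)g\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
Differentiation under the integral sign is justified because $|\xi|^2|\widehat{g}(\xi)|^2$ is integrable for Schwartz $g$. Integrating in $t$ from $a$ to $b$ and taking $g=u_{0,k}$ recovers the displayed identity.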
For general $u_0\in L^2(\mathbb{R}^n)$, choose Schwartz functions $u_{0,k}\to u_0$ in $L^2(\mathbb{R}^n)$. The $L^2$ convergence at the endpoints $a$ and $b$ is immediate from the contraction estimate for $S(t)$. For the gradient convergence, fix $0<a<b<\infty$. Plancherel gives
\begin{align*}
\int_a^b \|\nabla S(t)(u_{0,k}-u_0)\|_{L^2(\mathbb{R}^n)}^2\, d\mathcal{L}^1(t)
= \int_{\mathbb{R}^n} |\widehat{u_{0,k}-u_0}(\xi)|^2\left(\int_a^b |\xi|^2e^{-2t|\xi|^2}\, d\mathcal{L}^1(t)\right)d\mathcal{L}^n(\xi).
\end{align*}
For $t\in[a,b]$, the multiplier bound $|\xi|^2e^{-2t|\xi|^2}\leq (2ea)^{-1}$ gives
\begin{align*}
\int_a^b \|\nabla S(t)(u_{0,k}-u_0)\|_{L^2(\mathbb{R}^n)}^2\, d\mathcal{L}^1(t)
\leq (b-a)(2ea)^{-1}\|u_{0,k}-u_0\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
The right-hand side tends to $0$, so $\nabla S(t)u_{0,k}\to \nabla S(t)u_0$ in $L^2((a,b)\times\mathbb{R}^n)$. This is exactly the convergence needed to pass to the limit in the dissipation term of the integrated energy identity on compact subintervals of $(0,\infty)$. Passing to the limit gives
\begin{align*}
\frac{1}{2}\|S(b)u_0\|_{L^2(\mathbb{R}^n)}^2 + \int_a^b \|\nabla S(t)u_0\|_{L^2(\mathbb{R}^n)}^2\, d\mathcal{L}^1(t) = \frac{1}{2}\|S(a)u_0\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
Since this identity holds for every compact interval $[a,b]\subset(0,\infty)$, the function $Y$ is absolutely continuous on compact subintervals of $(0,\infty)$. Differentiating the integrated identity gives the whole-space form of [Energy Dissipation For The Heat Equation](/theorems/564). Therefore, for a.e. $t>0$,
\begin{align*}
\frac{1}{2}\frac{d}{dt}\|u(t)\|_{L^2(\mathbb{R}^n)}^2 + \|\nabla u(t)\|_{L^2(\mathbb{R}^n)}^2 = 0.
\end{align*}
In terms of $Y$, this becomes
\begin{align*}
Y'(t) = -2\|\nabla u(t)\|_{L^2(\mathbb{R}^n)}^2
\end{align*}
for a.e. $t>0$.
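As a cross-check on the integrated energy identity, the time integral of the multiplier can be evaluated exactly. The computation below is a sketch that again assumes the unitary normalization of the Fourier transform; it uses only Tonelli's theorem, since the integrand is nonnegative. For every $\xi\in\mathbb{R}^n$ and $0<a<b<\infty$,
\begin{align*}
\int_a^b |\xi|^2e^{-2t|\xi|^2}\, d\mathcal{L}^1(t)
= \frac{1}{2}\left(e^{-2a|\xi|^2}-e^{-2b|\xi|^2}\right).
\end{align*}
Multiplying by $|\widehat{u_0}(\xi)|^2$ and integrating in $\xi$ gives
\begin{align*}
\int_a^b \|\nabla S(t)u_0\|_{L^2(\mathbb{R}^n)}^2\, d\mathcal{L}^1(t)
= \frac{1}{2}\|S(a)u_0\|_{L^2(\mathbb{R}^n)}^2 - \frac{1}{2}\|S(b)u_0\|_{L^2(\mathbb{R}^n)}^2,
\end{align*}
which agrees with the identity obtained by approximation.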
We also need a quantity that does not grow in time. The heat kernel representation writes $u(t)$ as convolution with a nonnegative kernel of total $\mathcal{L}^n$-mass $1$. Applying [Young's Convolution Inequality](/theorems/463) with exponents $1$ and $1$ gives contraction on $L^1(\mathbb{R}^n)$. Hence
\begin{align*}
\|u(t)\|_{L^1(\mathbb{R}^n)} \leq \|u_0\|_{L^1(\mathbb{R}^n)} = M
\end{align*}
for every $t\geq 0$. This is the point of introducing $M$: Nash's inequality contains an $L^1$ factor, and contractivity allows us to replace the time-dependent factor $\|u(t)\|_{L^1}$ by the fixed initial size $M$.
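For concreteness, here is the contraction estimate written out by hand. This sketch assumes the heat equation is normalized as $\partial_t u=\Delta u$, which is consistent with the multiplier $e^{-t|\xi|^2}$; the symbol $K_t$ for the heat kernel is notation introduced here. For $t>0$,
\begin{align*}
K_t(x) &= (4\pi t)^{-n/2}e^{-|x|^2/(4t)}, \qquad \int_{\mathbb{R}^n}K_t(x)\, d\mathcal{L}^n(x) = 1, \\
\|u(t)\|_{L^1(\mathbb{R}^n)}
&= \int_{\mathbb{R}^n}\left|\int_{\mathbb{R}^n}K_t(x-y)u_0(y)\, d\mathcal{L}^n(y)\right| d\mathcal{L}^n(x) \\
&\leq \int_{\mathbb{R}^n}|u_0(y)|\left(\int_{\mathbb{R}^n}K_t(x-y)\, d\mathcal{L}^n(x)\right) d\mathcal{L}^n(y)
= \|u_0\|_{L^1(\mathbb{R}^n)}.
\end{align*}
The exchange of the order of integration is Tonelli's theorem, applied to the nonnegative integrand $K_t(x-y)|u_0(y)|$.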
[/guided]
[/step]
[step:Apply Nash's inequality to convert dissipation into a power of $Y(t)$]
Let $A_n>0$ be a Nash constant in dimension $n$. By the [Nash Inequality](/theorems/565), every function $f\in L^1(\mathbb{R}^n)\cap H^1(\mathbb{R}^n)$ satisfies
\begin{align*}
\|f\|_{L^2(\mathbb{R}^n)}^{2+4/n}
\leq A_n \|f\|_{L^1(\mathbb{R}^n)}^{4/n}\|\nabla f\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
This is the only analytic input in the proof that does not come from the heat semigroup itself.
Apply this with $f=u(t)$. Since $u(t)\in L^1(\mathbb{R}^n)\cap H^1(\mathbb{R}^n)$ for $t>0$ and $\|u(t)\|_{L^1(\mathbb{R}^n)}\leq M$, we obtain
\begin{align*}
Y(t)^{1+2/n}
\leq A_n M^{4/n}\|\nabla u(t)\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
Therefore
\begin{align*}
\|\nabla u(t)\|_{L^2(\mathbb{R}^n)}^2
\geq A_n^{-1}M^{-4/n}Y(t)^{1+2/n}.
\end{align*}
Combining this lower bound with the energy identity gives, for a.e. $t>0$,
\begin{align*}
Y'(t) \leq -2A_n^{-1}M^{-4/n}Y(t)^{1+2/n}.
\end{align*}
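To make the exponents concrete, the following lines spell out the bookkeeping and specialize to low dimensions. They are a direct substitution into the inequalities above. Since $\|u(t)\|_{L^2(\mathbb{R}^n)}=Y(t)^{1/2}$,
\begin{align*}
\|u(t)\|_{L^2(\mathbb{R}^n)}^{2+4/n} = \left(Y(t)^{1/2}\right)^{2+4/n} = Y(t)^{1+2/n},
\end{align*}
and the differential inequality reads
\begin{align*}
n=1:&\quad Y'(t) \leq -2A_1^{-1}M^{-4}\,Y(t)^{3}, \\
n=2:&\quad Y'(t) \leq -2A_2^{-1}M^{-2}\,Y(t)^{2}, \\
n=3:&\quad Y'(t) \leq -2A_3^{-1}M^{-4/3}\,Y(t)^{5/3}.
\end{align*}
In every dimension the power of $Y$ exceeds $1$, which is what produces algebraic rather than exponential decay.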
[/step]
[step:Solve the scalar differential inequality]
Define the positive constant
\begin{align*}
\kappa_n := 2A_n^{-1}.
\end{align*}
The differential inequality is
\begin{align*}
Y'(t) \leq -\kappa_n M^{-4/n}Y(t)^{1+2/n}
\end{align*}
for a.e. $t>0$. In particular $Y'\leq 0$ a.e., so the absolutely continuous function $Y$ is nonincreasing. Fix $t>0$. If $Y(t)=0$, the desired estimate holds trivially at time $t$. Otherwise $Y(s)\geq Y(t)>0$ for every $s\in(0,t]$, and we argue on this interval.
On every interval where $Y>0$, multiply the inequality by $-\frac{2}{n}Y(t)^{-1-2/n}$, which reverses the sign because the multiplier is negative. This gives
\begin{align*}
\frac{d}{dt}Y(t)^{-2/n} \geq \frac{2}{n}\kappa_n M^{-4/n}
\end{align*}
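for a.e. $t>0$. Integrating over $(\varepsilon,t)$ with $0<\varepsilon<t$, discarding the nonnegative term $Y(\varepsilon)^{-2/n}$ on the left, and letting $\varepsilon\downarrow 0$ yields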
\begin{align*}
Y(t)^{-2/n} \geq \frac{2}{n}\kappa_n M^{-4/n}t.
\end{align*}
Taking the negative power $-n/2$ gives
\begin{align*}
Y(t) \leq \left(\frac{n}{2\kappa_n}\right)^{n/2}M^2t^{-n/2}.
\end{align*}
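It may help to compare with the equality case of the differential inequality, which can be solved in closed form. The following is an illustrative model only: the symbols $y$, $y_0$, and $c$ are introduced here and do not appear elsewhere in the proof. With $c:=\kappa_n M^{-4/n}$ and any initial value $y_0>0$, the solution of $y'=-c\,y^{1+2/n}$, $y(0)=y_0$, is
\begin{align*}
y(t) = \left(y_0^{-2/n}+\frac{2}{n}c\,t\right)^{-n/2}
\leq \left(\frac{2}{n}c\,t\right)^{-n/2}.
\end{align*}
The right-hand side does not depend on $y_0$. This is the mechanism behind the estimate for $Y$: the bound at time $t$ is uniform in the initial $L^2$ size and depends on the data only through $M$.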
[guided]
We first recover the scalar differential inequality from the analytic estimates, so that this guided step is self-contained. The energy identity gives
\begin{align*}
Y'(t) = -2\|\nabla u(t)\|_{L^2(\mathbb{R}^n)}^2
\end{align*}
for a.e. $t>0$. The [Nash Inequality](/theorems/565) applies to $u(t)$ because $u(t)\in L^1(\mathbb{R}^n)\cap H^1(\mathbb{R}^n)$, and the heat semigroup gives $\|u(t)\|_{L^1(\mathbb{R}^n)}\leq M$. Therefore
\begin{align*}
Y(t)^{1+2/n}
\leq A_nM^{4/n}\|\nabla u(t)\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
Equivalently,
\begin{align*}
\|\nabla u(t)\|_{L^2(\mathbb{R}^n)}^2
\geq A_n^{-1}M^{-4/n}Y(t)^{1+2/n}.
\end{align*}
Combining this lower bound with the energy identity gives
\begin{align*}
Y'(t) \leq -\kappa_n M^{-4/n}Y(t)^{1+2/n},
\end{align*}
where
\begin{align*}
\kappa_n := 2A_n^{-1}.
\end{align*}
The exponent $1+2/n$ is the decisive feature: it is larger than $1$, so integrating the reciprocal power of $Y$ produces polynomial decay.
First consider an interval on which $Y(t)>0$. This positivity allows us to multiply by powers of $Y(t)$ without division by zero. Multiply both sides by the negative quantity $-\frac{2}{n}Y(t)^{-1-2/n}$. Because the multiplier is negative, the inequality direction reverses:
\begin{align*}
-\frac{2}{n}Y(t)^{-1-2/n}Y'(t)
\geq \frac{2}{n}\kappa_n M^{-4/n}.
\end{align*}
The expression on the left is exactly the derivative of $Y(t)^{-2/n}$, by the chain rule:
\begin{align*}
\frac{d}{dt}Y(t)^{-2/n}
=
-\frac{2}{n}Y(t)^{-1-2/n}Y'(t).
\end{align*}
Hence
\begin{align*}
\frac{d}{dt}Y(t)^{-2/n} \geq \frac{2}{n}\kappa_n M^{-4/n}
\end{align*}
for a.e. $t>0$.
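Fix $0<\varepsilon<t$. On $[\varepsilon,t]$ the function $Y$ is absolutely continuous, nonincreasing, and bounded below by the positive number $Y(t)$, so $Y^{-2/n}$ is absolutely continuous there as well and the fundamental theorem of calculus applies. Integrating from $\varepsilon$ to $t$ gives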
\begin{align*}
Y(t)^{-2/n} - Y(\varepsilon)^{-2/n}
\geq \frac{2}{n}\kappa_n M^{-4/n}(t-\varepsilon).
\end{align*}
Since $Y(\varepsilon)^{-2/n}\geq 0$, we may discard it from the left-hand side and obtain
\begin{align*}
Y(t)^{-2/n}
\geq \frac{2}{n}\kappa_n M^{-4/n}(t-\varepsilon).
\end{align*}
Letting $\varepsilon \downarrow 0$ yields
\begin{align*}
Y(t)^{-2/n} \geq \frac{2}{n}\kappa_n M^{-4/n}t.
\end{align*}
Finally, taking both sides to the power $-n/2$ reverses the inequality because the power is negative:
\begin{align*}
Y(t) \leq \left(\frac{n}{2\kappa_n}\right)^{n/2}M^2t^{-n/2}.
\end{align*}
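The algebra in this last step can be written out factor by factor; nothing beyond the laws of exponents is used:
\begin{align*}
\left(\frac{2}{n}\kappa_n M^{-4/n}t\right)^{-n/2}
&= \left(\frac{2\kappa_n}{n}\right)^{-n/2}\left(M^{-4/n}\right)^{-n/2}t^{-n/2} \\
&= \left(\frac{n}{2\kappa_n}\right)^{n/2}M^{2}\,t^{-n/2}.
\end{align*}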
If $Y$ vanishes at some positive time $t_0$, then $u(t_0)=0$ in $L^2(\mathbb{R}^n)$, and since $Y$ is nonnegative and nonincreasing it remains $0$ for all $t\geq t_0$. The estimate then holds trivially for $t\geq t_0$, while for $t<t_0$ with $Y(t)>0$ the argument above applies on $(0,t]$.
[/guided]
[/step]
[step:Take square roots and record the small-time $L^2$ control]
Taking square roots in
\begin{align*}
Y(t) \leq \left(\frac{n}{2\kappa_n}\right)^{n/2}M^2t^{-n/2}
\end{align*}
gives
\begin{align*}
\|u(t)\|_{L^2(\mathbb{R}^n)}
\leq \left(\frac{n}{2\kappa_n}\right)^{n/4}t^{-n/4}\|u_0\|_{L^1(\mathbb{R}^n)}.
\end{align*}
Define
\begin{align*}
C_n := \left(\frac{n}{2\kappa_n}\right)^{n/4}
= \left(\frac{nA_n}{4}\right)^{n/4}.
\end{align*}
This proves the asserted Nash decay estimate.
The energy identity also gives $Y'(t)\leq 0$ for a.e. $t>0$, so $Y$ is nonincreasing on $(0,\infty)$. Since the heat semigroup is strongly continuous on $L^2(\mathbb{R}^n)$, we have $Y(t)\to \|u_0\|_{L^2(\mathbb{R}^n)}^2$ as $t\downarrow 0$. Therefore $Y(t)\leq \|u_0\|_{L^2(\mathbb{R}^n)}^2$ for every $t>0$, and hence
\begin{align*}
\|u(t)\|_{L^2(\mathbb{R}^n)}
\leq \|u_0\|_{L^2(\mathbb{R}^n)}.
\end{align*}
Combining this $L^2$ contraction estimate with the Nash decay estimate gives
\begin{align*}
\|u(t)\|_{L^2(\mathbb{R}^n)}
\leq \min\left\{\|u_0\|_{L^2(\mathbb{R}^n)}, C_n t^{-n/4}\|u_0\|_{L^1(\mathbb{R}^n)}\right\}
\end{align*}
for every $t>0$, completing the proof.
[guided]
We now translate the estimate for $Y(t)$ back into the norm appearing in the theorem. By definition,
\begin{align*}
Y(t) = \|u(t)\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
The scalar estimate from the previous step is
\begin{align*}
Y(t) \leq \left(\frac{n}{2\kappa_n}\right)^{n/2}M^2t^{-n/2}.
\end{align*}
Both sides are nonnegative, so taking square roots preserves the inequality and gives
\begin{align*}
\|u(t)\|_{L^2(\mathbb{R}^n)}
\leq \left(\frac{n}{2\kappa_n}\right)^{n/4}t^{-n/4}M.
\end{align*}
Since $M=\|u_0\|_{L^1(\mathbb{R}^n)}$, this is the claimed $L^1$ to $L^2$ heat-decay estimate. We define the dimension-dependent constant
\begin{align*}
C_n := \left(\frac{n}{2\kappa_n}\right)^{n/4}.
\end{align*}
Because $\kappa_n=2A_n^{-1}$, this is equivalently
\begin{align*}
C_n = \left(\frac{nA_n}{4}\right)^{n/4}.
\end{align*}
Thus the constant depends only on $n$, since the Nash constant $A_n$ depends only on the dimension.
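A Gaussian example gives a sanity check on the exponent. This sketch assumes the normalization $\partial_t u=\Delta u$, and the heat-kernel notation $K_t(x)=(4\pi t)^{-n/2}e^{-|x|^2/(4t)}$ is introduced here for illustration. Take $u_0=K_s$ for some $s>0$, so that $M=1$ and, by the semigroup property, $u(t)=K_{t+s}$. A direct Gaussian integral gives
\begin{align*}
\|K_\tau\|_{L^2(\mathbb{R}^n)}^2
&= (4\pi\tau)^{-n}\int_{\mathbb{R}^n}e^{-|x|^2/(2\tau)}\, d\mathcal{L}^n(x)
= (4\pi\tau)^{-n}(2\pi\tau)^{n/2}
= (8\pi\tau)^{-n/2}, \\
\|u(t)\|_{L^2(\mathbb{R}^n)} &= \left(8\pi(t+s)\right)^{-n/4}.
\end{align*}
Hence $t^{n/4}\|u(t)\|_{L^2(\mathbb{R}^n)}\to(8\pi)^{-n/4}$ as $t\to\infty$, so the rate $t^{-n/4}$ cannot be improved, and any constant for which the estimate holds must satisfy $C_n\geq(8\pi)^{-n/4}$.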
The small-time control comes from the same energy identity. The identity implies $Y'(t)\leq 0$ for a.e. $t>0$, so $Y$ is nonincreasing on $(0,\infty)$. Strong continuity of the heat semigroup on $L^2(\mathbb{R}^n)$ gives
\begin{align*}
\lim_{t\downarrow 0}Y(t)=\|u_0\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
Therefore $Y(t)\leq \|u_0\|_{L^2(\mathbb{R}^n)}^2$ for every $t\geq 0$, and taking square roots gives
\begin{align*}
\|u(t)\|_{L^2(\mathbb{R}^n)}
\leq \|u_0\|_{L^2(\mathbb{R}^n)}.
\end{align*}
Keeping both estimates records the stronger combined form
\begin{align*}
\|u(t)\|_{L^2(\mathbb{R}^n)}
\leq \min\left\{\|u_0\|_{L^2(\mathbb{R}^n)}, C_n t^{-n/4}\|u_0\|_{L^1(\mathbb{R}^n)}\right\}.
\end{align*}
This proves both the decay estimate and the advertised small-time $L^2$ control.
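To see which of the two bounds is active, one can solve for the time at which they coincide. The crossover time $t_*$ below is notation introduced here, and the computation assumes $u_0\neq 0$:
\begin{align*}
C_n t_*^{-n/4}\|u_0\|_{L^1(\mathbb{R}^n)} = \|u_0\|_{L^2(\mathbb{R}^n)}
\quad\Longleftrightarrow\quad
t_* = \left(\frac{C_n\|u_0\|_{L^1(\mathbb{R}^n)}}{\|u_0\|_{L^2(\mathbb{R}^n)}}\right)^{4/n}.
\end{align*}
For $0<t<t_*$ the minimum is the $L^2$ bound $\|u_0\|_{L^2(\mathbb{R}^n)}$, and for $t>t_*$ it is the decay bound $C_n t^{-n/4}\|u_0\|_{L^1(\mathbb{R}^n)}$.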
[/guided]
[/step]