[proofplan]
We write the heat semigroup as convolution with the heat kernel and move the spatial derivative onto the kernel. The derivative of the Gaussian has an exact parabolic scaling, so its $L^r$ norm has the factor $t^{-k/2-n(1-1/r)/2}$. Choosing $r$ by the Young convolution relation $1+1/q=1/p+1/r$ gives exactly the exponent in the desired estimate. The argument first identifies the derivative at the kernel level and then applies [Young's convolution inequality](/theorems/463) to extend the estimate to all $L^p$ data.
[/proofplan]
[step:Differentiate the heat kernel convolution in the spatial variable]
Let $\mathcal{L}^n$ denote [Lebesgue measure](/page/Lebesgue%20Measure) on $\mathbb{R}^n$, and let $C_c^\infty(\mathbb{R}^n)$ denote the space of smooth functions $\mathbb{R}^n\to\mathbb{R}$ with compact support. For each $t>0$, define the heat kernel $\Gamma_t: \mathbb{R}^n \to \mathbb{R}$ by
\begin{align*}
\Gamma_t(x)=(4\pi t)^{-n/2} e^{-\frac{|x|^2}{4t}}.
\end{align*}
For $f \in L^p(\mathbb{R}^n)$, define the heat semigroup function $e^{t\Delta}f: \mathbb{R}^n \to \mathbb{R}$ by
\begin{align*}
(e^{t\Delta}f)(x)=\int_{\mathbb{R}^n} \Gamma_t(x-y) f(y)\, d\mathcal{L}^n(y).
\end{align*}
For a multi-index $\alpha \in \mathbb{N}_0^n$ with $|\alpha|=k$, define the derivative kernel $K_{t,\alpha}: \mathbb{R}^n \to \mathbb{R}$ by
\begin{align*}
K_{t,\alpha}(x)=D^\alpha \Gamma_t(x).
\end{align*}
Since $\Gamma_t$ is smooth and every derivative of $\Gamma_t$ is a polynomial multiple of $\Gamma_t$, $K_{t,\alpha} \in L^r(\mathbb{R}^n)$ for every $1 \leq r \leq \infty$. Let $p'\in[1,\infty]$ denote the Hölder conjugate exponent of $p$, with $1/p+1/p'=1$. For each multi-index $\beta\in\mathbb{N}_0^n$ with $|\beta|\leq k$, the function $D^\beta\Gamma_t$ lies in $L^{p'}(\mathbb{R}^n)$, so Hölder's inequality shows that the integral defining $(D^\beta\Gamma_t)*f$ is absolutely convergent for each $x\in\mathbb{R}^n$. The Gaussian derivative bound also gives an $L^{p'}$ dominating function in a neighbourhood of each $x$, so differentiation under the integral is justified for each derivative up to order $k$. Therefore
\begin{align*}
D^\alpha e^{t\Delta}f=K_{t,\alpha}*f.
\end{align*}
[guided]
We want to estimate a derivative of $e^{t\Delta}f$, and the useful point is that the heat semigroup is a convolution operator. Let $\mathcal{L}^n$ denote Lebesgue measure on $\mathbb{R}^n$, and let $C_c^\infty(\mathbb{R}^n)$ denote the space of smooth functions $\mathbb{R}^n\to\mathbb{R}$ with compact support. For each $t>0$, the heat kernel is the smooth function $\Gamma_t: \mathbb{R}^n \to \mathbb{R}$ defined by
\begin{align*}
\Gamma_t(x)=(4\pi t)^{-n/2} e^{-\frac{|x|^2}{4t}}.
\end{align*}
Thus the heat semigroup applied to $f \in L^p(\mathbb{R}^n)$ is the function $e^{t\Delta}f: \mathbb{R}^n \to \mathbb{R}$ defined by
\begin{align*}
(e^{t\Delta}f)(x)=\int_{\mathbb{R}^n} \Gamma_t(x-y) f(y)\, d\mathcal{L}^n(y).
\end{align*}
Now fix a multi-index $\alpha \in \mathbb{N}_0^n$ with $|\alpha|=k$. We introduce the derivative kernel $K_{t,\alpha}: \mathbb{R}^n \to \mathbb{R}$ by
\begin{align*}
K_{t,\alpha}(x)=D^\alpha \Gamma_t(x).
\end{align*}
The reason for this definition is that differentiating the convolution in the $x$ variable differentiates only the factor $\Gamma_t(x-y)$, not the rough function $f(y)$. Since $\Gamma_t$ is a Gaussian, every spatial derivative $D^\alpha \Gamma_t$ is a polynomial in $x/\sqrt{t}$ multiplied by $\Gamma_t$. Hence $K_{t,\alpha}$ belongs to $L^r(\mathbb{R}^n)$ for every $1 \leq r \leq \infty$.
Let $p'\in[1,\infty]$ denote the Hölder conjugate exponent of $p$, with $1/p+1/p'=1$. For each multi-index $\beta\in\mathbb{N}_0^n$ with $|\beta|\leq k$, the function $D^\beta\Gamma_t$ lies in $L^{p'}(\mathbb{R}^n)$. Hence Hölder's inequality gives absolute convergence of
\begin{align*}
\int_{\mathbb{R}^n} D^\beta\Gamma_t(x-y) f(y)\, d\mathcal{L}^n(y)
\end{align*}
for every $x\in\mathbb{R}^n$. Why is it legitimate to differentiate under the integral? In a small neighbourhood of a fixed point $x$, each derivative of $\Gamma_t(x-y)$ up to order $k$ is bounded by an $L^{p'}$ Gaussian derivative majorant in the $y$ variable; multiplying by $f\in L^p(\mathbb{R}^n)$ gives an integrable majorant by Hölder's inequality. The [dominated convergence theorem](/theorems/4) for difference quotients therefore permits differentiation under the integral, one coordinate derivative at a time, up to total order $k$. Consequently
\begin{align*}
D^\alpha e^{t\Delta}f(x)=\int_{\mathbb{R}^n} D^\alpha \Gamma_t(x-y) f(y)\, d\mathcal{L}^n(y)=(K_{t,\alpha}*f)(x).
\end{align*}
[/guided]
[/step]
[step:Compute the parabolic scaling of the derivative kernel]
Let $\Gamma_1: \mathbb{R}^n \to \mathbb{R}$ be defined by
\begin{align*}
\Gamma_1(z)=(4\pi)^{-n/2} e^{-\frac{|z|^2}{4}}.
\end{align*}
The scaling identity for the heat kernel is
\begin{align*}
\Gamma_t(x)=t^{-n/2}\Gamma_1(t^{-1/2}x).
\end{align*}
Differentiating this identity gives
\begin{align*}
D^\alpha \Gamma_t(x)=t^{-n/2-k/2}(D^\alpha\Gamma_1)(t^{-1/2}x).
\end{align*}
For $1 \leq r < \infty$, the change of variables $z=t^{-1/2}x$, equivalently $x=t^{1/2}z$, transforms Lebesgue measure by
\begin{align*}
d\mathcal{L}^n(x)=t^{n/2}\,d\mathcal{L}^n(z).
\end{align*}
Therefore
\begin{align*}
\|K_{t,\alpha}\|_{L^r(\mathbb{R}^n)}
=
t^{-n/2-k/2+n/(2r)}
\|D^\alpha\Gamma_1\|_{L^r(\mathbb{R}^n)}.
\end{align*}
For $r=\infty$, taking the essential supremum in the same scaling identity gives
\begin{align*}
\|K_{t,\alpha}\|_{L^\infty(\mathbb{R}^n)}
=
t^{-n/2-k/2}
\|D^\alpha\Gamma_1\|_{L^\infty(\mathbb{R}^n)}.
\end{align*}
Thus, for every $1 \leq r \leq \infty$, there is a finite constant
\begin{align*}
A_{\alpha,r}:=\|D^\alpha\Gamma_1\|_{L^r(\mathbb{R}^n)}
\end{align*}
such that
\begin{align*}
\|K_{t,\alpha}\|_{L^r(\mathbb{R}^n)}
\leq
A_{\alpha,r} t^{-k/2-\frac{n}{2}\left(1-\frac{1}{r}\right)}.
\end{align*}
[guided]
The heat kernel has parabolic scaling: space scales like $t^{1/2}$, while the prefactor scales like $t^{-n/2}$. Define $\Gamma_1: \mathbb{R}^n\to\mathbb{R}$ by
\begin{align*}
\Gamma_1(z)=(4\pi)^{-n/2}e^{-\frac{|z|^2}{4}}.
\end{align*}
Then
\begin{align*}
\Gamma_t(x)=t^{-n/2}\Gamma_1(t^{-1/2}x).
\end{align*}
Applying the chain rule once contributes one factor $t^{-1/2}$; applying it $k=|\alpha|$ times gives
\begin{align*}
D^\alpha\Gamma_t(x)=t^{-n/2-k/2}(D^\alpha\Gamma_1)(t^{-1/2}x).
\end{align*}
For $1\leq r<\infty$, we compute the $L^r$ norm using the change of variables $z=t^{-1/2}x$, equivalently $x=t^{1/2}z$. Under this substitution, Lebesgue measure transforms as
\begin{align*}
d\mathcal{L}^n(x)=t^{n/2}\,d\mathcal{L}^n(z).
\end{align*}
Therefore
\begin{align*}
\|K_{t,\alpha}\|_{L^r(\mathbb{R}^n)}=t^{-n/2-k/2+n/(2r)}\|D^\alpha\Gamma_1\|_{L^r(\mathbb{R}^n)}.
\end{align*}
For $r=\infty$, the same scaling identity gives
\begin{align*}
\|K_{t,\alpha}\|_{L^\infty(\mathbb{R}^n)}=t^{-n/2-k/2}\|D^\alpha\Gamma_1\|_{L^\infty(\mathbb{R}^n)}.
\end{align*}
Thus for every $1\leq r\leq\infty$, with
\begin{align*}
A_{\alpha,r}:=\|D^\alpha\Gamma_1\|_{L^r(\mathbb{R}^n)},
\end{align*}
we have
\begin{align*}
\|K_{t,\alpha}\|_{L^r(\mathbb{R}^n)}\leq A_{\alpha,r}t^{-k/2-\frac{n}{2}\left(1-\frac{1}{r}\right)}.
\end{align*}
[/guided]
[/step]
[step:Choose the Young exponent matching the desired smoothing]
Define $r \in [1,\infty]$ by
\begin{align*}
\frac{1}{r}=1+\frac{1}{q}-\frac{1}{p}.
\end{align*}
Because $1 \leq p \leq q \leq \infty$, we have $0 \leq 1/r \leq 1$, so this definition is valid. Moreover,
\begin{align*}
1+\frac{1}{q}=\frac{1}{p}+\frac{1}{r}.
\end{align*}
The standard Young convolution inequality for this triple of exponents gives
\begin{align*}
\|K_{t,\alpha}*f\|_{L^q(\mathbb{R}^n)}
\leq
\|K_{t,\alpha}\|_{L^r(\mathbb{R}^n)}
\|f\|_{L^p(\mathbb{R}^n)}.
\end{align*}
[guided]
We now choose the exponent that makes Young's convolution inequality produce exactly the target $L^p\to L^q$ estimate. Define $r\in[1,\infty]$ by
\begin{align*}
\frac{1}{r}=1+\frac{1}{q}-\frac{1}{p}.
\end{align*}
Because $1\leq p\leq q\leq\infty$, we have $1/p\geq 1/q$, so $1/r\leq 1$. Also $1/p-1/q\leq 1$, so $1/r\geq 0$. Hence this formula defines a valid exponent $r\in[1,\infty]$. Rearranging the definition gives the Young relation
\begin{align*}
1+\frac{1}{q}=\frac{1}{p}+\frac{1}{r}.
\end{align*}
Young's convolution inequality applies to $K_{t,\alpha}\in L^r(\mathbb{R}^n)$ and $f\in L^p(\mathbb{R}^n)$ with this triple of exponents, giving
\begin{align*}
\|K_{t,\alpha}*f\|_{L^q(\mathbb{R}^n)}\leq \|K_{t,\alpha}\|_{L^r(\mathbb{R}^n)}\|f\|_{L^p(\mathbb{R}^n)}.
\end{align*}
This is the point where the condition $p\leq q$ is used: it ensures that the exponent $r$ required by convolution is not outside the allowed range.
[/guided]
[/step]
[step:Substitute the kernel norm and simplify the exponent]
Combining the identity $D^\alpha e^{t\Delta}f=K_{t,\alpha}*f$ with the preceding convolution estimate gives
\begin{align*}
\|D^\alpha e^{t\Delta}f\|_{L^q(\mathbb{R}^n)}
\leq
A_{\alpha,r} t^{-k/2-\frac{n}{2}\left(1-\frac{1}{r}\right)}
\|f\|_{L^p(\mathbb{R}^n)}.
\end{align*}
From the definition of $r$,
\begin{align*}
1-\frac{1}{r}
=
1-\left(1+\frac{1}{q}-\frac{1}{p}\right)
=
\frac{1}{p}-\frac{1}{q}.
\end{align*}
Therefore
\begin{align*}
\|D^\alpha e^{t\Delta}f\|_{L^q(\mathbb{R}^n)}
\leq
A_{\alpha,r} t^{-\frac{k}{2}-\frac{n}{2}\left(\frac{1}{p}-\frac{1}{q}\right)}
\|f\|_{L^p(\mathbb{R}^n)}.
\end{align*}
Since there are only finitely many multi-indices $\alpha \in \mathbb{N}_0^n$ with $|\alpha|=k$, define
\begin{align*}
C(n,k,p,q):=
\max_{|\alpha|=k} \|D^\alpha\Gamma_1\|_{L^r(\mathbb{R}^n)}.
\end{align*}
This constant is finite because each $D^\alpha\Gamma_1$ is a polynomial multiple of a Gaussian. The desired estimate follows for every multi-index $\alpha$ with $|\alpha|=k$.
[guided]
We combine the pieces. From the first step,
\begin{align*}
D^\alpha e^{t\Delta}f=K_{t,\alpha}*f.
\end{align*}
[Young's inequality](/theorems/244) and the kernel estimate therefore give
\begin{align*}
\|D^\alpha e^{t\Delta}f\|_{L^q(\mathbb{R}^n)}\leq A_{\alpha,r}t^{-k/2-\frac{n}{2}\left(1-\frac{1}{r}\right)}\|f\|_{L^p(\mathbb{R}^n)}.
\end{align*}
The exponent now simplifies because the definition of $r$ gives
\begin{align*}
1-\frac{1}{r}=\frac{1}{p}-\frac{1}{q}.
\end{align*}
Substituting this identity yields
\begin{align*}
\|D^\alpha e^{t\Delta}f\|_{L^q(\mathbb{R}^n)}\leq A_{\alpha,r}t^{-\frac{k}{2}-\frac{n}{2}\left(\frac{1}{p}-\frac{1}{q}\right)}\|f\|_{L^p(\mathbb{R}^n)}.
\end{align*}
The theorem asks for one constant that works for every multi-index of order $k$, not a different constant for each $\alpha$. The set of multi-indices $\alpha\in\mathbb{N}_0^n$ with $|\alpha|=k$ is finite, so we may define
\begin{align*}
C(n,k,p,q):=\max_{|\alpha|=k}\|D^\alpha\Gamma_1\|_{L^r(\mathbb{R}^n)}.
\end{align*}
This maximum is finite because every derivative $D^\alpha\Gamma_1$ is a polynomial multiple of a Gaussian, hence belongs to $L^r(\mathbb{R}^n)$ for every $1\leq r\leq\infty$. Replacing $A_{\alpha,r}$ by this maximum gives the asserted estimate uniformly for all $|\alpha|=k$.
[/guided]
[/step]