[proofplan]
The [Parabolic Mean Value Formula](/theorems/559) represents $u(x_0, t_0)$ as a weighted space-time integral over a heat ball.
Since $D^\alpha u$ also solves the heat equation (by [Interior Smoothness](/theorems/562), $u \in C^\infty$), the same formula applies to $D^\alpha u$ on a smaller concentric heat ball.
Iterating an integration-by-parts argument on geometrically shrinking heat balls transfers each spatial derivative from $u$ to the weight kernel, each costing a factor of $r^{-1}$, and the parabolic volume scaling $|E(r)| \sim r^{n+2}$ produces the stated powers of $r$.
[/proofplan]
[step:Apply the mean value formula to $D^\alpha u$ on a concentric heat ball]
By the [Parabolic Mean Value Formula](/theorems/559), for $u$ solving $\partial_t u - \Delta u = 0$ and $E(x_0, t_0; r) \subseteq \Omega_T$:
\begin{align*}
u(x_0, t_0) = \frac{1}{4r^n} \iint_{E(x_0, t_0; r)} u(y, s) \frac{|x_0 - y|^2}{(t_0 - s)^2} \, d\mathcal{L}^n(y) \, d\mathcal{L}^1(s).
\end{align*}
By [Interior Smoothness](/theorems/562), $u \in C^\infty(\Omega_T)$, so $D^\alpha u$ is well-defined for any multi-index $\alpha$.
Since $D^\alpha u$ also solves the heat equation (the heat operator commutes with spatial derivatives), the mean value formula applies to $D^\alpha u$ on any heat ball where it is defined.
[/step]
[step:Transfer derivatives to the weight kernel by iterative integration by parts]
For $|\alpha| = 1$, apply the mean value formula for $\partial_{x_i} u$ on the smaller heat ball $E(x_0, t_0; r/2) \subseteq E(x_0, t_0; r)$.
To control $\|\partial_{x_i} u\|_{L^1(E(r/2))}$ in terms of $\|u\|_{L^1(E(r))}$, integrate by parts on the heat ball: the identity $\partial_{x_i} u(y, s) = -\partial_{y_i} u(y, s)$, combined with integration against a smooth cutoff adapted to $E(r)$, transfers the derivative to the weight, picking up a factor of order $r^{-1}$ from the geometry of the heat ball at scale $r$.
Each spatial derivative costs a factor of $r^{-1}$, and each time derivative costs $r^{-2}$ (by the parabolic scaling $\partial_t u = \Delta u$, replacing one time derivative with two spatial derivatives).
For $|\alpha| = k$, iterate $k$ times on geometrically shrinking concentric heat balls with radii $r, r(1 - 1/(2k)), \ldots, r/2$:
\begin{align*}
\|D^\alpha u\|_{L^\infty(E(r/2))} \leq \frac{C_k}{r^k} \cdot \frac{1}{|E(r)|}\|u\|_{L^1(E(r))}.
\end{align*}
[/step]
[step:Substitute the volume scaling and conclude the estimate]
The heat ball $E(x_0, t_0; r)$ has $\mathcal{L}^{n+1}$-measure proportional to $r^{n+2}$: the $n$ spatial dimensions scale as $r$ and the time dimension scales as $r^2$.
Substituting $|E(r)| \sim C_n r^{n+2}$:
\begin{align*}
\max_{E(r/2)} |D^\alpha u| \leq \frac{C_k}{r^{n+2+k}}\|u\|_{L^1(E(r))}.
\end{align*}
Since $E(x_0, t_0; r) \subseteq B(x_0, r) \times (t_0 - 4r^2, t_0]$ (the heat ball is contained in a standard parabolic cylinder of comparable dimensions), the $L^1$ norm over $E(r)$ is bounded by the $L^1$ norm over the cylinder:
\begin{align*}
\max_{B(x_0, r/2) \times [t_0 - r^2, t_0]} |D^\alpha u| \leq \frac{C_k}{r^{n+2+k}} \|u\|_{L^1(B(x_0, r) \times (t_0 - 4r^2, t_0])}.
\end{align*}
[/step]