[guided]The key fact in this step is the identity $\sup_{\lambda > 0}\lambda^m e^{-\lambda t} = (m/(et))^m$, which we first derive carefully and then apply.
**Step (a): the optimization.** Define $\varphi_m: (0, \infty) \to (0, \infty)$ by $\varphi_m(\lambda) = \lambda^m e^{-\lambda t}$, with $t > 0$ and $m \ge 1$ fixed. We seek $\sup_{\lambda > 0}\varphi_m(\lambda)$. Since $\varphi_m$ is smooth and $\varphi_m(\lambda) \to 0$ as $\lambda \to 0^+$ and as $\lambda \to \infty$, the supremum is attained at a critical point. Computing the derivative:
\begin{align*}
\varphi_m'(\lambda) = m\lambda^{m-1}e^{-\lambda t} - t\lambda^m e^{-\lambda t} = \lambda^{m-1}e^{-\lambda t}(m - t\lambda).
\end{align*}
Setting $\varphi_m'(\lambda) = 0$: since $\lambda^{m-1}e^{-\lambda t} > 0$ for $\lambda > 0$, the only critical point is $\lambda^* = m/t$. The sign of $\varphi_m'$ flips from positive to negative at $\lambda^*$, so $\varphi_m$ increases on $(0, \lambda^*)$ and decreases on $(\lambda^*, \infty)$, and $\lambda^*$ is the global maximizer on $(0,\infty)$. Substituting:
\begin{align*}
\sup_{\lambda > 0}\lambda^m e^{-\lambda t} = \varphi_m\!\left(\frac{m}{t}\right) = \left(\frac{m}{t}\right)^m e^{-m\cdot(t/t)} = \left(\frac{m}{t}\right)^m e^{-m} = \frac{m^m}{(et)^m}.
\end{align*}
This is sharp: $\lambda^m e^{-\lambda t} \le (m/(et))^m$ for all $\lambda > 0$, with equality exactly at $\lambda = m/t$.
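As a numerical sanity check of step (a), the following minimal sketch compares the closed form $(m/(et))^m$ with a brute-force maximum of $\lambda^m e^{-\lambda t}$ over a grid. The function names, the grid range $(0, 4m/t]$, and the grid size are illustrative assumptions, not part of the proof.

```python
import math


def phi(lam: float, t: float, m: int) -> float:
    """phi_m(lambda) = lambda^m * exp(-lambda * t), as in step (a)."""
    return lam**m * math.exp(-lam * t)


def sup_phi(t: float, m: int) -> float:
    """Closed-form supremum (m / (e t))^m derived in step (a)."""
    return (m / (math.e * t)) ** m


def grid_max(t: float, m: int, n: int = 20000) -> float:
    """Largest value of phi on a uniform grid over (0, 4m/t].

    The grid is an assumed, illustrative choice; it brackets the
    maximizer lambda* = m/t.
    """
    h = 4.0 * m / (t * n)
    return max(phi(i * h, t, m) for i in range(1, n + 1))


if __name__ == "__main__":
    for t in (0.1, 1.0, 5.0):
        for m in (1, 2, 5):
            # The grid maximum should never exceed the closed form
            # (up to rounding) and should be very close to it.
            print(t, m, grid_max(t, m), sup_phi(t, m))
```

The grid maximum can only approach the closed-form value from below, which is what the supremum identity predicts.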
Why does this control the smoothing rate? In the eigenfunction expansion, the weight on $|c_k|^2$ in the squared $\mathcal{H}^{2m}$-norm is $(1+\lambda_k)^{2m}$, and the semigroup multiplies the $k$-th coefficient by $e^{-\lambda_k t}$, hence $|c_k|^2$ by $e^{-2\lambda_k t}$. We therefore need to bound $(1+\lambda_k)^{2m}e^{-2\lambda_k t}$ uniformly in $k$. The optimization above does exactly that, after a binomial expansion to handle the factor $(1+\lambda_k)^{2m}$.
**Step (b): bound the weight $(1+\lambda_k)^{2m}e^{-2\lambda_k t}$.** Expand by the binomial theorem:
\begin{align*}
(1+\lambda_k)^{2m} = \sum_{j=0}^{2m}\binom{2m}{j}\lambda_k^j.
\end{align*}
Hence
\begin{align*}
(1+\lambda_k)^{2m}e^{-2\lambda_k t} = \sum_{j=0}^{2m}\binom{2m}{j}\lambda_k^j e^{-2\lambda_k t}.
\end{align*}
For each $j \ge 1$, applying the optimization from step (a) with parameter $2t$ in place of $t$:
\begin{align*}
\lambda_k^j e^{-2\lambda_k t} \le \sup_{\lambda > 0}\lambda^j e^{-2\lambda t} = \frac{j^j}{(2et)^j}.
\end{align*}
The $j = 0$ term contributes $\lambda_k^0 e^{-2\lambda_k t} \le 1$. So
\begin{align*}
(1+\lambda_k)^{2m}e^{-2\lambda_k t} \le 1 + \sum_{j=1}^{2m}\binom{2m}{j}\frac{j^j}{(2et)^j} =: M_m(t),
\end{align*}
and crucially $M_m(t)$ is independent of $k$.
What is the rate of $M_m(t)$ as $t \to 0^+$? The $j$-th term scales like $t^{-j}$, and the dominant power is $j = 2m$, so $M_m(t) \sim (m/e)^{2m}\, t^{-2m}$ as $t \to 0^+$. Concretely, for $t \in (0, 1]$ we have $t^{-j} \le t^{-2m}$ for every $0 \le j \le 2m$, so all terms can be absorbed into a single constant: $M_m(t) \le D_m\, t^{-2m}$ with $D_m := M_m(1) > 0$, which depends only on $m$.
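The bound of step (b) can be checked numerically. The sketch below evaluates $M_m(t)$ from its definition and tests both $(1+\lambda)^{2m}e^{-2\lambda t} \le M_m(t)$ and $M_m(t) \le D_m t^{-2m}$ on $(0,1]$. It assumes the particular choice $D_m = M_m(1)$, which is one admissible constant; the function names are hypothetical.

```python
import math


def M(m: int, t: float) -> float:
    """M_m(t) = 1 + sum_{j=1}^{2m} C(2m, j) * j^j / (2 e t)^j from step (b)."""
    return 1.0 + sum(
        math.comb(2 * m, j) * (j / (2.0 * math.e * t)) ** j
        for j in range(1, 2 * m + 1)
    )


def weight(lam: float, m: int, t: float) -> float:
    """The spectral weight (1 + lambda)^(2m) * exp(-2 * lambda * t)."""
    return (1.0 + lam) ** (2 * m) * math.exp(-2.0 * lam * t)


def small_t_bound(m: int, t: float) -> float:
    """D_m * t^(-2m) with the assumed choice D_m = M_m(1); for 0 < t <= 1."""
    return M(m, 1.0) * t ** (-2 * m)


if __name__ == "__main__":
    m = 2
    for t in (0.01, 0.1, 1.0):
        # Worst weight over a sample of lambda values in [0, 500).
        worst = max(weight(0.1 * i, m, t) for i in range(5000))
        print(t, worst, M(m, t), small_t_bound(m, t))
```

For $m = 1$ the definition reduces to $M_1(t) = 1 + (et)^{-1} + (et)^{-2}$, which gives a quick hand check of the implementation.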
What about large $t$? The bound $M_m(t)$ is poor for $t \ge 1$: it tends to $1$ as $t \to \infty$ and so does not decay at all, whereas the actual weight $(1+\lambda_k)^{2m}e^{-2\lambda_k t}$ decays exponentially because $\lambda_k \ge \lambda_1 > 0$. To capture this, write
\begin{align*}
(1+\lambda_k)^{2m}e^{-2\lambda_k t} = (1+\lambda_k)^{2m}e^{-\lambda_k t}\cdot e^{-\lambda_k t} \le M_m(t/2)\cdot e^{-\lambda_1 t},
\end{align*}
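where the first factor is bounded by step (b) with $t/2$ in place of $t$, since $(1+\lambda_k)^{2m}e^{-\lambda_k t} = (1+\lambda_k)^{2m}e^{-2\lambda_k (t/2)}$, and the second factor is bounded using $\lambda_k \ge \lambda_1$. Because $M_m$ is decreasing in $t$, we have $M_m(t/2) \le M_m(1/2)$ for $t \ge 1$, so at large $t$ the weight decays exponentially, at rate $e^{-\lambda_1 t}$.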
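The large-time bound can be illustrated the same way. This sketch assumes a model spectrum $\lambda_k = (k\pi)^2$ (the Dirichlet Laplacian on the unit interval), chosen purely for illustration and not taken from the setting above, and checks $(1+\lambda_k)^{2m}e^{-2\lambda_k t} \le M_m(t/2)\,e^{-\lambda_1 t}$.

```python
import math


def M(m: int, t: float) -> float:
    """M_m(t) = 1 + sum_{j=1}^{2m} C(2m, j) * j^j / (2 e t)^j from step (b)."""
    return 1.0 + sum(
        math.comb(2 * m, j) * (j / (2.0 * math.e * t)) ** j
        for j in range(1, 2 * m + 1)
    )


def weight(lam: float, m: int, t: float) -> float:
    """The spectral weight (1 + lambda)^(2m) * exp(-2 * lambda * t)."""
    return (1.0 + lam) ** (2 * m) * math.exp(-2.0 * lam * t)


def large_t_bound(m: int, t: float, lam1: float) -> float:
    """M_m(t/2) * exp(-lambda_1 * t), the bound derived for large t."""
    return M(m, t / 2.0) * math.exp(-lam1 * t)


# Assumed model spectrum (illustration only): lambda_k = (k * pi)^2.
EIGS = [(k * math.pi) ** 2 for k in range(1, 51)]
LAMBDA_1 = EIGS[0]

if __name__ == "__main__":
    m = 2
    for t in (1.0, 2.0, 5.0):
        worst = max(weight(lam, m, t) for lam in EIGS)
        print(t, worst, large_t_bound(m, t, LAMBDA_1))
```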
**Step (c): bound the Sobolev norm.** Using the bound from step (b):
\begin{align*}
\|T(t)g\|_{\mathcal{H}^{2m}}^2 = \sum_{k=1}^\infty (1+\lambda_k)^{2m}e^{-2\lambda_k t}|c_k|^2 \le M_m(t)\sum_{k=1}^\infty |c_k|^2 = M_m(t)\,\|g\|_{L^2(U)}^2.
\end{align*}
The sum $\sum_k |c_k|^2 = \|g\|_{L^2(U)}^2$ is Parseval's identity. Taking square roots:
\begin{align*}
\|T(t)g\|_{\mathcal{H}^{2m}} \le \sqrt{M_m(t)}\,\|g\|_{L^2(U)}, \qquad t > 0.
\end{align*}
For $t \in (0, 1]$, $\sqrt{M_m(t)} \le \sqrt{D_m}\,t^{-m}$. Combining with the embedding $\mathcal{H}^{2m}\hookrightarrow H^{2m}(U)$ from Step 3 (constant $C_m^{(2)}$):
\begin{align*}
\|T(t)g\|_{H^{2m}(U)} \le C_m^{(2)}\sqrt{M_m(t)}\,\|g\|_{L^2(U)} \le C_m^{(2)}\sqrt{D_m}\,t^{-m}\|g\|_{L^2(U)}, \qquad t \in (0, 1].
\end{align*}
This is the central smoothing estimate. Writing $k = 2m$ for the Sobolev index (not the summation index used above), the $t^{-m}$ rate in the $H^{2m}$-norm is a $t^{-k/2}$ rate in the $H^k$-norm. For $t \in (0, 1]$ this is sharper than the $t^{-k}$ stated in the theorem, which takes $m = k$ instead of $m = k/2$ to simplify the conclusion step. The exponent cannot be improved within this scheme, because the optimization $\sup_{\lambda > 0}\lambda^m e^{-\lambda t}$ is sharp (attained at $\lambda^* = m/t$).
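To see the estimate of step (c) at work, the sketch below evaluates a truncated version of the spectral sum $\sum_k (1+\lambda_k)^{2m}e^{-2\lambda_k t}|c_k|^2$ and compares it with $M_m(t)\sum_k |c_k|^2$ and with $D_m t^{-2m}\sum_k |c_k|^2$. The eigenvalues $\lambda_k = (k\pi)^2$, the coefficients $c_k = 1/k$, the truncation at $K = 200$, and the choice $D_m = M_m(1)$ are all assumptions made for illustration. The sketch works in the spectral $\mathcal{H}^{2m}$-norm only, so the embedding constant $C_m^{(2)}$ does not appear.

```python
import math


def M(m: int, t: float) -> float:
    """M_m(t) = 1 + sum_{j=1}^{2m} C(2m, j) * j^j / (2 e t)^j from step (b)."""
    return 1.0 + sum(
        math.comb(2 * m, j) * (j / (2.0 * math.e * t)) ** j
        for j in range(1, 2 * m + 1)
    )


def spectral_norm_sq(coeffs, eigs, m: int, t: float) -> float:
    """Truncated sum of (1 + lambda_k)^(2m) * exp(-2 lambda_k t) * |c_k|^2."""
    return sum(
        (1.0 + lam) ** (2 * m) * math.exp(-2.0 * lam * t) * c * c
        for c, lam in zip(coeffs, eigs)
    )


# Assumed model data (illustration only).
K = 200
EIGS = [(k * math.pi) ** 2 for k in range(1, K + 1)]
COEFFS = [1.0 / k for k in range(1, K + 1)]  # square-summable, not smoother
L2_SQ = sum(c * c for c in COEFFS)

if __name__ == "__main__":
    m = 1
    for t in (0.001, 0.01, 0.1, 1.0):
        lhs = spectral_norm_sq(COEFFS, EIGS, m, t)
        mid = M(m, t) * L2_SQ
        rhs = M(m, 1.0) * t ** (-2 * m) * L2_SQ  # D_m = M_m(1), 0 < t <= 1
        print(t, lhs, mid, rhs)
```

Taking square roots of the three printed quantities gives the chain $\|T(t)g\|_{\mathcal{H}^{2m}} \le \sqrt{M_m(t)}\,\|g\|_{L^2} \le \sqrt{D_m}\,t^{-m}\|g\|_{L^2}$ for the truncated data.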
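For odd Sobolev indices $H^{2m+1}$, the same argument works with $(1+\lambda_k)^{2m+1}$ in place of $(1+\lambda_k)^{2m}$, since the binomial expansion holds for every nonnegative integer exponent. The weight is then bounded by a constant multiple of $t^{-(2m+1)}$ on $(0, 1]$, and taking square roots gives a smoothing rate $t^{-(m+1/2)}$. The rest of the argument is unchanged.[/guided]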