[proofplan]
The Dirichlet Laplacian $-A$ on a bounded domain $U$ with smooth boundary has compact resolvent and is self-adjoint, so by the spectral theorem there exists an orthonormal basis $(e_k)_{k \ge 1}$ of $L^2(U)$ consisting of eigenfunctions $-\Delta e_k = \lambda_k e_k$ with $0 < \lambda_1 \le \lambda_2 \le \cdots \to \infty$. The semigroup acts diagonally on this basis: $T(t) e_k = e^{-\lambda_k t} e_k$. The key observation is that the exponential $e^{-\lambda_k t}$ decays so fast in $k$ that it dominates *any* polynomial weight $\lambda_k^m$ — concretely, $\sup_{\lambda > 0}\lambda^m e^{-\lambda t} = (m/(et))^m$, which gives an explicit $t^{-m}$ smoothing rate. Combined with the elliptic regularity identification $D(A^m) \subset H^{2m}(U)$ (which uses smoothness of $\partial U$) and the Sobolev characterization $\|u\|_{H^{2m}}^2 \asymp \sum_k (1 + \lambda_k)^{2m}|c_k|^2$ for the eigenfunction coefficients $c_k = (u, e_k)_{L^2}$, this gives the desired bound. The smoothness conclusion follows from $\bigcap_m H^{m}(U) = C^\infty(\bar{U})$ when $\partial U$ is smooth.
[/proofplan]
[step:Diagonalize the Dirichlet Laplacian via the spectral theorem to obtain an orthonormal basis of eigenfunctions]
The Dirichlet Laplacian $-A: D(A) \subset L^2(U) \to L^2(U)$ with $D(A) = H^2(U) \cap H^1_0(U)$ is self-adjoint and positive, with compact resolvent $(-A + I)^{-1}: L^2(U) \to L^2(U)$. (Self-adjointness: $-A$ arises from the symmetric, coercive bilinear form $a[u,v] = \int_U \nabla u \cdot \nabla v\, d\mathcal{L}^n$ on $H^1_0(U)$. Compact resolvent: $(-A+I)^{-1}$ maps $L^2(U)$ into $H^1_0(U)$, and $H^1_0(U) \hookrightarrow\hookrightarrow L^2(U)$ on a bounded domain by [Rellich-Kondrachov](/theorems/64).)
By the spectral theorem for compact self-adjoint operators applied to $(-A + I)^{-1}$, there exist:
- An orthonormal basis $(e_k)_{k \ge 1}$ of $L^2(U)$ with $e_k \in D(A)$;
- Eigenvalues $0 < \lambda_1 \le \lambda_2 \le \cdots \to +\infty$ (the eigenvalues of $-A$);
satisfying
\begin{align*}
-\Delta e_k = \lambda_k e_k \quad \text{in } U, \qquad e_k|_{\partial U} = 0.
\end{align*}
Positivity $\lambda_k > 0$ follows from the Poincaré inequality on $H^1_0$: $\lambda_k \|e_k\|_{L^2}^2 = a[e_k, e_k] = \|\nabla e_k\|_{L^2}^2 \ge C_U^{-2}\|e_k\|_{L^2}^2$ for a Poincaré constant $C_U > 0$.
For $g \in L^2(U)$, expand $g = \sum_{k=1}^\infty c_k e_k$ with $c_k := (g, e_k)_{L^2(U)}$, and Parseval gives $\|g\|_{L^2(U)}^2 = \sum_k |c_k|^2$.
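As a sanity check of the expansion and of Parseval's identity, one can look at the one-dimensional model case $U = (0, \pi)$, which is an assumption made only for illustration: there $e_k(x) = \sqrt{2/\pi}\,\sin(kx)$ and $\lambda_k = k^2$. The sketch below uses the hypothetical sample datum $g(x) = x(\pi - x)$, whose coefficients are $c_k = \sqrt{2/\pi}\cdot 4/k^3$ for odd $k$ and $0$ for even $k$, and compares a partial Parseval sum with $\|g\|_{L^2}^2 = \pi^5/30$.

```python
import math

# Model case (an assumption for illustration): U = (0, pi),
# e_k(x) = sqrt(2/pi) * sin(k x), lambda_k = k^2.

def coeff(k):
    """c_k = (g, e_k) for the sample datum g(x) = x (pi - x)."""
    if k % 2 == 0:
        return 0.0
    return math.sqrt(2.0 / math.pi) * 4.0 / k**3

def parseval_partial_sum(n_terms):
    """Partial sum of |c_k|^2 over k = 1..n_terms."""
    return sum(coeff(k) ** 2 for k in range(1, n_terms + 1))

norm_sq_exact = math.pi**5 / 30.0    # integral of x^2 (pi - x)^2 over (0, pi)
norm_sq_series = parseval_partial_sum(200)

print(norm_sq_exact, norm_sq_series)
```

The two printed numbers should agree to many digits, because the coefficients decay like $k^{-3}$ and the Parseval tail is therefore of order $k^{-5}$.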
[/step]
[step:Express the heat semigroup diagonally on the eigenbasis: $T(t)g = \sum_k e^{-\lambda_k t} c_k e_k$]
For $g = \sum_k c_k e_k \in L^2(U)$ and $t \ge 0$, define
\begin{align*}
S(t)g := \sum_{k=1}^\infty e^{-\lambda_k t} c_k e_k.
\end{align*}
Convergence in $L^2(U)$: $\sum_k |e^{-\lambda_k t} c_k|^2 \le \sum_k |c_k|^2 = \|g\|_{L^2}^2$ since $e^{-\lambda_k t} \le 1$, so the series converges and $\|S(t) g\|_{L^2(U)} \le \|g\|_{L^2(U)}$.
[claim:$S(t) = T(t)$ on $L^2(U)$ for all $t \ge 0$]
[/claim]
[proof]
Verify the three semigroup-generator conditions for $S$:
1. **$S$ is a $C_0$-semigroup of contractions.** $S(0) = I$ (each term reduces to $c_k e_k$), and $S(s)S(t) = S(s+t)$ since $e^{-\lambda_k s}e^{-\lambda_k t} = e^{-\lambda_k(s+t)}$. Strong continuity at $0$: for any $\varepsilon > 0$, choose $N$ with $\sum_{k > N}|c_k|^2 < \varepsilon^2$; then
\begin{align*}
\|S(t) g - g\|_{L^2}^2 = \sum_k |e^{-\lambda_k t} - 1|^2 |c_k|^2 \le \sum_{k \le N} (1 - e^{-\lambda_k t})^2|c_k|^2 + 4\sum_{k > N}|c_k|^2.
\end{align*}
The second sum is $< 4\varepsilon^2$. The first sum tends to $0$ as $t \to 0^+$ (finite sum, each term vanishes). Hence $\|S(t)g - g\|_{L^2} \to 0$.
2. **The generator of $S$ is $A$.** For $g \in D(A)$: $g = \sum c_k e_k$ and $\Delta g = -\sum \lambda_k c_k e_k$ (with $\sum \lambda_k^2 |c_k|^2 < \infty$). Then
\begin{align*}
\frac{S(t) g - g}{t} = \sum_k \frac{e^{-\lambda_k t} - 1}{t} c_k e_k.
\end{align*}
For each $k$, $(e^{-\lambda_k t} - 1)/t \to -\lambda_k$ as $t \to 0^+$, and $|(e^{-\lambda_k t} - 1)/t| \le \lambda_k$ (since $1 - e^{-x} \le x$ for $x \ge 0$). By Parseval, $\|(S(t)g - g)/t - \Delta g\|_{L^2}^2 = \sum_k |(e^{-\lambda_k t} - 1)/t + \lambda_k|^2 |c_k|^2$, and each summand is bounded by $4\lambda_k^2 |c_k|^2$, which is summable since $g \in D(A)$. The dominated convergence theorem applied to the counting measure on $\mathbb{N}$ therefore gives
\begin{align*}
\frac{S(t) g - g}{t} \to -\sum_k \lambda_k c_k e_k = \Delta g \quad \text{in } L^2(U).
\end{align*}
So the generator $B$ of $S$ has $D(A) \subseteq D(B)$ with $B|_{D(A)} = A$. We upgrade this inclusion to equality. Fix $\mu > 0$. By the [Hille-Yosida Theorem](/theorems/3139), $\mu \in \rho(A)$, so $(\mu - A): D(A) \to L^2(U)$ is surjective; likewise $\mu \in \rho(B)$, since $B$ generates the contraction semigroup $S$. Given any $g \in D(B)$, choose $h \in D(A)$ with $(\mu - A)h = (\mu - B)g$; then $(\mu - B)(g - h) = (\mu - B)g - (\mu - B)h = (\mu - A)h - (\mu - A)h = 0$ (using $B|_{D(A)} = A$), so $g - h \in \ker(\mu - B) = \{0\}$, giving $g = h \in D(A)$. Hence $D(B) \subseteq D(A)$, so $A = B$.
3. **Uniqueness of the semigroup generated by $A$.** A $C_0$-semigroup is determined by its generator, hence $S(t) = T(t)$.
[/proof]
So
\begin{align*}
T(t) g = \sum_{k=1}^\infty e^{-\lambda_k t} c_k e_k, \quad t \ge 0, \ g \in L^2(U).
\end{align*}
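The diagonal action can be illustrated on truncated coefficient vectors. The following is a minimal sketch, assuming eigenvalues $\lambda_k = k^2$ (the model case $U = (0,\pi)$) and hypothetical coefficients $c_k = 1/k$; the helper names `apply_semigroup` and `l2_norm` are introduced here for illustration only. It checks the semigroup law, the contraction property, and smallness of $\|T(t)g - g\|_{L^2}$ for small $t$.

```python
import math

def apply_semigroup(t, coeffs, eigenvalues):
    """Coefficients of T(t) g: multiply each c_k by exp(-lambda_k t)."""
    return [math.exp(-lam * t) * c for c, lam in zip(coeffs, eigenvalues)]

def l2_norm(coeffs):
    """L^2 norm via Parseval, truncated to the stored coefficients."""
    return math.sqrt(sum(c * c for c in coeffs))

# Illustrative data (assumptions): lambda_k = k^2 and c_k = 1/k, k = 1..50.
N = 50
eigenvalues = [float(k * k) for k in range(1, N + 1)]
coeffs = [1.0 / k for k in range(1, N + 1)]

s, t = 0.3, 0.7
lhs = apply_semigroup(s, apply_semigroup(t, coeffs, eigenvalues), eigenvalues)
rhs = apply_semigroup(s + t, coeffs, eigenvalues)
semigroup_gap = max(abs(a - b) for a, b in zip(lhs, rhs))

# Contraction: ||T(t) g|| <= ||g||.
contraction_ok = l2_norm(apply_semigroup(t, coeffs, eigenvalues)) <= l2_norm(coeffs)

# Strong continuity at 0, probed at one small time.
small = apply_semigroup(1e-6, coeffs, eigenvalues)
continuity_gap = l2_norm([a - c for a, c in zip(small, coeffs)])

print(semigroup_gap, contraction_ok, continuity_gap)
```

Working on coefficients rather than on grid values of $g$ keeps the check exact up to rounding, since the truncated operator is literally a diagonal matrix.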
[/step]
[step:Identify $D(A^m)$ with the Sobolev-type space $\{u: \sum \lambda_k^{2m}|(u,e_k)_{L^2}|^2 < \infty\}$ and embed $D(A^m) \hookrightarrow H^{2m}(U)$]
For $m \ge 0$, define
\begin{align*}
\mathcal{H}^{2m} := \left\{u = \sum_k c_k e_k \in L^2(U) : \sum_{k=1}^\infty (1 + \lambda_k)^{2m}|c_k|^2 < \infty\right\},
\end{align*}
with norm $\|u\|_{\mathcal{H}^{2m}}^2 := \sum_k (1 + \lambda_k)^{2m}|c_k|^2$.
A standard computation, applying the eigenvalue equation iteratively, gives
\begin{align*}
A^m u = \Delta^m u = (-1)^m \sum_k \lambda_k^m c_k e_k,
\end{align*}
so $u \in D(A^m)$ if and only if $\sum_k \lambda_k^{2m} |c_k|^2 < \infty$, equivalently $u \in \mathcal{H}^{2m}$. The Dirichlet boundary condition is encoded in the eigenfunctions $e_k$ vanishing on $\partial U$.
**Embedding $\mathcal{H}^{2m} \hookrightarrow H^{2m}(U)$:** since $\partial U$ is smooth (a standing hypothesis of the theorem), standard elliptic regularity for the Dirichlet Laplacian on smooth domains (see e.g. Evans, *Partial Differential Equations*, Ch. 6) gives, for any $u \in D(A^m)$,
\begin{align*}
\|u\|_{H^{2m}(U)} \le C_m^{(1)} \left(\|A^m u\|_{L^2(U)} + \|u\|_{L^2(U)}\right) \le C_m^{(2)} \|u\|_{\mathcal{H}^{2m}},
\end{align*}
where $C_m^{(1)}, C_m^{(2)}$ are constants depending only on $m$ and $U$. (The first inequality is the standard $H^{2m}$ elliptic estimate iterated $m$ times; the second uses $\|A^m u\|_{L^2}^2 = \sum_k \lambda_k^{2m}|c_k|^2 \le \sum_k(1+\lambda_k)^{2m}|c_k|^2 = \|u\|_{\mathcal{H}^{2m}}^2$ and the analogous bound for $\|u\|_{L^2}$.) Hence $\mathcal{H}^{2m} \hookrightarrow H^{2m}(U)$ continuously.
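In one dimension the comparison between the two norms can be made fully explicit for $m = 1$. The sketch below assumes the model case $U = (0,\pi)$ with $\lambda_k = k^2$ and the hypothetical sample function $g(x) = x(\pi - x) \in H^2 \cap H^1_0$. There $\|g\|_{H^2}^2 = \|g\|^2 + \|g'\|^2 + \|g''\|^2$, while $\|g\|_{\mathcal{H}^2}^2 = \sum_k (1 + \lambda_k)^2 |c_k|^2 = \|g\|^2 + 2\|g'\|^2 + \|g''\|^2$, so the embedding constant can be taken to be $1$ in this particular example.

```python
import math

# Model case (an assumption for illustration): U = (0, pi), lambda_k = k^2,
# g(x) = x (pi - x) with c_k = sqrt(2/pi) * 4 / k^3 for odd k, 0 for even k.

def coeff(k):
    return math.sqrt(2.0 / math.pi) * 4.0 / k**3 if k % 2 == 1 else 0.0

def spectral_norm_sq(m, n_terms):
    """Partial sum of (1 + lambda_k)^{2m} |c_k|^2 with lambda_k = k^2."""
    return sum((1.0 + k * k) ** (2 * m) * coeff(k) ** 2
               for k in range(1, n_terms + 1))

# Exact integrals of g^2, (g')^2 and (g'')^2 over (0, pi).
g_sq = math.pi**5 / 30.0
dg_sq = math.pi**3 / 3.0
d2g_sq = 4.0 * math.pi

h2_norm_sq = g_sq + dg_sq + d2g_sq             # ||g||_{H^2}^2 in one dimension
spectral_exact = g_sq + 2.0 * dg_sq + d2g_sq   # sum of (1 + k^2)^2 |c_k|^2
spectral_partial = spectral_norm_sq(1, 20_000)

print(h2_norm_sq, spectral_exact, spectral_partial)
```

The partial sum converges slowly here (the tail is of order $1/N$) because $g''$ is a nonzero constant and so $g \notin D(A^2)$; this is consistent with the characterization of $D(A^m)$ by summability of $\lambda_k^{2m}|c_k|^2$.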
[/step]
[step:Bound $\|T(t)g\|_{\mathcal{H}^{2m}}$ via the elementary inequality $\sup_{\lambda > 0}\lambda^m e^{-\lambda t} = (m/(et))^m$]
Define $\varphi_m: (0, \infty) \to (0, \infty)$ by $\varphi_m(\lambda) := \lambda^m e^{-\lambda t}$ for fixed $t > 0$ and $m \ge 1$. We compute the supremum of $\varphi_m$ by setting the derivative to zero:
\begin{align*}
\varphi_m'(\lambda) = m\lambda^{m-1}e^{-\lambda t} - t \lambda^m e^{-\lambda t} = \lambda^{m-1}e^{-\lambda t}(m - t\lambda).
\end{align*}
For $\lambda > 0$, $\varphi_m'(\lambda) = 0$ iff $\lambda = m/t$. Since $\varphi_m'(\lambda) > 0$ for $\lambda < m/t$ and $\varphi_m'(\lambda) < 0$ for $\lambda > m/t$, the critical point is a maximum, and
\begin{align*}
\sup_{\lambda > 0} \lambda^m e^{-\lambda t} = \varphi_m(m/t) = \left(\frac{m}{t}\right)^m e^{-m} = \frac{m^m}{(et)^m}.
\end{align*}
For $m = 0$ the bound $\sup_{\lambda > 0}\lambda^0 e^{-\lambda t} = 1$ is trivial. Hence for every $k \ge 1$ and every $m \ge 0$,
\begin{align*}
\lambda_k^m e^{-\lambda_k t} \le \frac{m^m}{(et)^m} \qquad (m \ge 1), \qquad \lambda_k^0 e^{-\lambda_k t} \le 1.
\end{align*}
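The closed-form supremum can be cross-checked numerically. The sketch below compares $(m/(et))^m$ with a crude grid maximum of $\varphi_m$ over $(0, 10m/t]$; the grid range and resolution are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def phi(lam, m, t):
    """phi_m(lam) = lam^m * exp(-lam t)."""
    return lam**m * math.exp(-lam * t)

def sup_closed_form(m, t):
    """Claimed supremum (m / (e t))^m, attained at lam = m / t."""
    return (m / (math.e * t)) ** m

def sup_on_grid(m, t, n_points=20_000):
    """Maximum of phi_m over a uniform grid on (0, 10 m / t]."""
    step = 10.0 * m / (t * n_points)
    return max(phi(i * step, m, t) for i in range(1, n_points + 1))

for m in (1, 2, 4):
    for t in (0.1, 0.5, 2.0):
        print(m, t, sup_on_grid(m, t), sup_closed_form(m, t))
```

The grid value should never exceed the closed form (up to rounding) and should approach it as the grid is refined, which is what sharpness of the bound predicts.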
Now we bound $\|T(t)g\|_{\mathcal{H}^{2m}}^2$. Since $T(t)g = \sum_k e^{-\lambda_k t}c_k e_k$, we have
\begin{align*}
\|T(t) g\|_{\mathcal{H}^{2m}}^2 = \sum_{k=1}^\infty (1 + \lambda_k)^{2m} e^{-2\lambda_k t}|c_k|^2.
\end{align*}
Expand $(1 + \lambda_k)^{2m}$ via the binomial theorem:
\begin{align*}
(1 + \lambda_k)^{2m} = \sum_{j=0}^{2m} \binom{2m}{j} \lambda_k^j.
\end{align*}
So
\begin{align*}
(1 + \lambda_k)^{2m} e^{-2\lambda_k t} = \sum_{j=0}^{2m} \binom{2m}{j} \lambda_k^j e^{-2\lambda_k t}.
\end{align*}
For each $j$ in the sum, applying the optimization with $t$ replaced by $2t$ and $m$ replaced by $j$ (for $j \ge 1$):
\begin{align*}
\lambda_k^j e^{-2\lambda_k t} \le \sup_{\lambda > 0}\lambda^j e^{-2\lambda t} = \frac{j^j}{(2et)^j} \qquad (j \ge 1),
\end{align*}
with the convention $0^0 = 1$ for the $j = 0$ term, which gives $\lambda_k^0 e^{-2\lambda_k t} \le 1$. Combining,
\begin{align*}
(1 + \lambda_k)^{2m}e^{-2\lambda_k t} \le 1 + \sum_{j=1}^{2m}\binom{2m}{j}\frac{j^j}{(2et)^j} =: M_m(t).
\end{align*}
The bound $M_m(t)$ is independent of $k$. For $t \in (0, 1]$, the dominant term as $t \to 0^+$ is $j = 2m$, giving $M_m(t) \le D_m\, t^{-2m}$ for a constant $D_m > 0$ depending only on $m$ (for instance $D_m = M_m(1)$, since $t^{-j} \le t^{-2m}$ for $0 \le j \le 2m$ and $t \le 1$). For $t \ge 1$, $M_m(t)$ is bounded by a constant depending only on $m$ (each summand is bounded). To capture the exponential decay at large $t$ as well, use the alternative bound $(1 + \lambda_k)^{2m}e^{-2\lambda_k t} = (1 + \lambda_k)^{2m}e^{-\lambda_k t}\cdot e^{-\lambda_k t} \le M_m(t/2)\, e^{-\lambda_1 t}$ for $k \ge 1$: the first factor is at most $M_m(t/2)$ by the bound above with $t$ replaced by $t/2$, and $e^{-\lambda_k t} \le e^{-\lambda_1 t}$ since $\lambda_k \ge \lambda_1$.
Therefore
\begin{align*}
\|T(t) g\|_{\mathcal{H}^{2m}}^2 = \sum_{k=1}^\infty (1 + \lambda_k)^{2m}e^{-2\lambda_k t}|c_k|^2 \le M_m(t) \sum_{k=1}^\infty |c_k|^2 = M_m(t)\, \|g\|_{L^2(U)}^2.
\end{align*}
For $t \in (0, 1]$, $M_m(t) \le D_m\, t^{-2m}$, so $\|T(t)g\|_{\mathcal{H}^{2m}} \le \sqrt{D_m}\, t^{-m}\|g\|_{L^2(U)}$.
Combining with the embedding from Step 3:
\begin{align*}
\|T(t) g\|_{H^{2m}(U)} \le C_m^{(2)} \|T(t) g\|_{\mathcal{H}^{2m}} \le C_m^{(2)}\sqrt{D_m}\, t^{-m}\|g\|_{L^2(U)}, \qquad t \in (0, 1].
\end{align*}
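The uniform bound on the spectral weight can be checked numerically. The following sketch evaluates $M_m(t)$ directly from its definition and tests it against the weight $(1+\lambda)^{2m}e^{-2\lambda t}$ on the sample spectrum $\lambda_k = k^2$ (an assumption, corresponding to $U = (0,\pi)$). It also uses $D_m = M_m(1)$ as one admissible constant, which works because $t^{-j} \le t^{-2m}$ for $0 \le j \le 2m$ and $t \in (0,1]$; other choices of $D_m$ are possible.

```python
import math

def M(m, t):
    """M_m(t) = 1 + sum_{j=1}^{2m} C(2m, j) * j^j / (2 e t)^j."""
    return 1.0 + sum(
        math.comb(2 * m, j) * (j / (2.0 * math.e * t)) ** j
        for j in range(1, 2 * m + 1)
    )

def weight(lam, m, t):
    """Spectral weight (1 + lam)^{2m} * exp(-2 lam t) on one coefficient."""
    return (1.0 + lam) ** (2 * m) * math.exp(-2.0 * lam * t)

def D(m):
    """One admissible constant (an illustrative choice): D_m = M_m(1)."""
    return M(m, 1.0)

# Sample spectrum (assumption): lambda_k = k^2 for k = 1..200.
eigenvalues = [float(k * k) for k in range(1, 201)]

for m in (1, 2, 3):
    for t in (0.01, 0.1, 1.0):
        worst = max(weight(lam, m, t) for lam in eigenvalues)
        print(m, t, worst, M(m, t), D(m) * t ** (-2 * m))
```

Each printed row should be nondecreasing from the third column on: worst-case weight, then $M_m(t)$, then $D_m t^{-2m}$.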
[guided]
The single key inequality of this step is the optimization $\sup_{\lambda > 0}\lambda^m e^{-\lambda t} = (m/(et))^m$, which we now derive carefully and then apply.
**Step (a): the optimization.** Define $\varphi_m: (0, \infty) \to (0, \infty)$ by $\varphi_m(\lambda) = \lambda^m e^{-\lambda t}$, with $t > 0$ and $m \ge 1$ fixed. We seek $\sup_{\lambda > 0}\varphi_m(\lambda)$. Since $\varphi_m$ is smooth and $\varphi_m(\lambda) \to 0$ as $\lambda \to 0^+$ and as $\lambda \to \infty$, the supremum is attained at a critical point. Computing the derivative:
\begin{align*}
\varphi_m'(\lambda) = m\lambda^{m-1}e^{-\lambda t} - t\lambda^m e^{-\lambda t} = \lambda^{m-1}e^{-\lambda t}(m - t\lambda).
\end{align*}
Setting $\varphi_m'(\lambda) = 0$: since $\lambda^{m-1}e^{-\lambda t} > 0$ for $\lambda > 0$, the only critical point is $\lambda^* = m/t$. The sign of $\varphi_m'$ flips from positive to negative at $\lambda^*$, so $\lambda^*$ is the global maximum on $(0,\infty)$. Substituting:
\begin{align*}
\sup_{\lambda > 0}\lambda^m e^{-\lambda t} = \varphi_m\!\left(\frac{m}{t}\right) = \left(\frac{m}{t}\right)^m e^{-(m/t)\,t} = \left(\frac{m}{t}\right)^m e^{-m} = \frac{m^m}{(et)^m}.
\end{align*}
This is sharp: equality holds at $\lambda = m/t$.
Why does this control the smoothing rate? Because in the eigenfunction expansion, the $H^{2m}$-relevant weight on the $k$-th coefficient is $(1+\lambda_k)^{2m}$, and the semigroup multiplies each coefficient by $e^{-\lambda_k t}$. We need to bound $(1+\lambda_k)^{2m}e^{-2\lambda_k t}$ uniformly in $k$ — and the optimization above does exactly that, after a binomial expansion to handle the $(1+\lambda_k)^{2m}$ factor.
**Step (b): bound the weight $(1+\lambda_k)^{2m}e^{-2\lambda_k t}$.** Expand by the binomial theorem:
\begin{align*}
(1+\lambda_k)^{2m} = \sum_{j=0}^{2m}\binom{2m}{j}\lambda_k^j.
\end{align*}
Hence
\begin{align*}
(1+\lambda_k)^{2m}e^{-2\lambda_k t} = \sum_{j=0}^{2m}\binom{2m}{j}\lambda_k^j e^{-2\lambda_k t}.
\end{align*}
For each $j \ge 1$, applying the optimization from step (a) with parameter $2t$ in place of $t$:
\begin{align*}
\lambda_k^j e^{-2\lambda_k t} \le \sup_{\lambda > 0}\lambda^j e^{-2\lambda t} = \frac{j^j}{(2et)^j}.
\end{align*}
The $j = 0$ term contributes $\lambda_k^0 e^{-2\lambda_k t} \le 1$. So
\begin{align*}
(1+\lambda_k)^{2m}e^{-2\lambda_k t} \le 1 + \sum_{j=1}^{2m}\binom{2m}{j}\frac{j^j}{(2et)^j} =: M_m(t),
\end{align*}
and crucially $M_m(t)$ is independent of $k$.
What is the rate of $M_m(t)$ as $t \to 0^+$? Each term $j^j/(2et)^j$ scales like $t^{-j}$, and the largest power is $j = 2m$, so $M_m(t)$ grows like a constant times $t^{-2m}$ as $t \to 0^+$. Concretely, for $t \in (0, 1]$ every term satisfies $t^{-j} \le t^{-2m}$, so all terms can be absorbed into a single constant: $M_m(t) \le D_m\, t^{-2m}$ for some $D_m > 0$ depending only on $m$.
What about large $t$? The estimate $M_m(t) \le D_m t^{-2m}$ is poor for $t \ge 1$ because $t^{-2m}$ decays only polynomially, whereas the actual weight $(1+\lambda_k)^{2m}e^{-2\lambda_k t}$ decays exponentially due to the gap $\lambda_1 > 0$. To capture this, write
\begin{align*}
(1+\lambda_k)^{2m}e^{-2\lambda_k t} = (1+\lambda_k)^{2m}e^{-\lambda_k t}\cdot e^{-\lambda_k t} \le M_m(t/2)\cdot e^{-\lambda_1 t},
\end{align*}
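where the first factor is bounded by the same optimization, now with exponent parameter $t$ in place of $2t$, which amounts to evaluating $M_m$ at $t/2$. Since $M_m$ is decreasing in $t$, $M_m(t/2) \le M_m(1/2)$ for $t \ge 1$, so at large $t$ the weight decays exponentially.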
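The large-time bound $(1+\lambda_k)^{2m}e^{-2\lambda_k t} \le M_m(t/2)\,e^{-\lambda_1 t}$ can also be probed numerically. This is a minimal sketch assuming the sample spectrum $\lambda_k = k^2$ (so $\lambda_1 = 1$); the helper `large_time_bound` is a name introduced here for illustration.

```python
import math

def M(m, t):
    """M_m(t) = 1 + sum_{j=1}^{2m} C(2m, j) * j^j / (2 e t)^j."""
    return 1.0 + sum(
        math.comb(2 * m, j) * (j / (2.0 * math.e * t)) ** j
        for j in range(1, 2 * m + 1)
    )

def weight(lam, m, t):
    """Spectral weight (1 + lam)^{2m} * exp(-2 lam t)."""
    return (1.0 + lam) ** (2 * m) * math.exp(-2.0 * lam * t)

def large_time_bound(m, t, lam_1):
    """Right-hand side M_m(t/2) * exp(-lam_1 t) of the large-time estimate."""
    return M(m, t / 2.0) * math.exp(-lam_1 * t)

# Sample spectrum (assumption): lambda_k = k^2 for k = 1..100.
eigenvalues = [float(k * k) for k in range(1, 101)]
lam_1 = eigenvalues[0]

# Largest ratio (weight / bound) over the sampled m, t, lambda_k.
worst_ratio = max(
    weight(lam, m, t) / large_time_bound(m, t, lam_1)
    for m in (1, 2, 3)
    for t in (1.0, 2.0, 5.0, 10.0)
    for lam in eigenvalues
)
print(worst_ratio)
```

A ratio at most $1$ is what the estimate predicts; the ratio is far from $1$ in practice because half of the exponential decay is spent on controlling the polynomial factor.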
**Step (c): bound the Sobolev norm.** Using the bound from step (b):
\begin{align*}
\|T(t)g\|_{\mathcal{H}^{2m}}^2 = \sum_{k=1}^\infty (1+\lambda_k)^{2m}e^{-2\lambda_k t}|c_k|^2 \le M_m(t)\sum_{k=1}^\infty |c_k|^2 = M_m(t)\,\|g\|_{L^2(U)}^2.
\end{align*}
The sum $\sum_k |c_k|^2 = \|g\|_{L^2(U)}^2$ is Parseval's identity. Taking square roots:
\begin{align*}
\|T(t)g\|_{\mathcal{H}^{2m}} \le \sqrt{M_m(t)}\,\|g\|_{L^2(U)}, \qquad t > 0.
\end{align*}
For $t \in (0, 1]$, $\sqrt{M_m(t)} \le \sqrt{D_m}\,t^{-m}$. Combining with the embedding $\mathcal{H}^{2m}\hookrightarrow H^{2m}(U)$ from Step 3 (constant $C_m^{(2)}$):
\begin{align*}
\|T(t)g\|_{H^{2m}(U)} \le C_m^{(2)}\sqrt{M_m(t)}\,\|g\|_{L^2(U)} \le C_m^{(2)}\sqrt{D_m}\,t^{-m}\|g\|_{L^2(U)}, \qquad t \in (0, 1].
\end{align*}
This is the central smoothing estimate. The $t^{-m}$ rate in the $H^{2m}$-norm corresponds to a $t^{-k/2}$ rate in the $H^k$-norm (with Sobolev index $k = 2m$, not to be confused with the eigenvalue index used above), which is sharper than the $t^{-k}$ stated in the theorem; the theorem's exponent is a deliberately conservative weakening, valid because $t^{-k/2} \le t^{-k}$ for $t \in (0, 1]$. The rate cannot be improved within this scheme because the optimization $\sup_{\lambda > 0}\lambda^m e^{-\lambda t}$ is sharp (attained at $\lambda^* = m/t$), so the $t^{-m}$ exponent is tight.
For odd Sobolev indices $H^{2m+1}$, the same argument works with $(1+\lambda_k)^{2m+1}$ in place of $(1+\lambda_k)^{2m}$, giving a smoothing rate $t^{-(m+1/2)}$. The general case is identical.
[/guided]
[/step]
[step:Conclude the smoothing for all $H^k$ and the $C^\infty$ regularity]
For any $k \ge 0$, choose $m = \lceil k/2 \rceil$, so $2m \ge k$. Then $H^{2m}(U) \hookrightarrow H^k(U)$ continuously (any higher Sobolev norm controls a lower one), and Step 4 gives
\begin{align*}
\|T(t) g\|_{H^k(U)} \le \|T(t) g\|_{H^{2m}(U)} \le C_m^{(2)}\sqrt{D_m}\, t^{-m}\|g\|_{L^2(U)} \le C_k\, t^{-k/2 - 1/2}\|g\|_{L^2(U)}
\end{align*}
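for $t \in (0, 1]$, where $C_k$ depends only on $k$ and $U$; the last inequality uses $m = \lceil k/2 \rceil \le (k+1)/2$ and $t \le 1$. The sharper formulation $C_{k,t} \le C_k\, t^{-k/2}$ stated as a remark in the theorem follows directly by taking $m = k/2$ when $k$ is even. When $k$ is odd, the extra factor $t^{-1/2}$ is unbounded as $t \to 0^+$ and cannot be absorbed into the constant; instead one interpolates between the neighbouring even cases, $\|u\|_{H^k(U)} \le C\,\|u\|_{H^{k-1}(U)}^{1/2}\|u\|_{H^{k+1}(U)}^{1/2}$, which gives the rate $t^{-(k-1)/4}\, t^{-(k+1)/4} = t^{-k/2}$. The conservative bound $C_{k,t} \le C_k\, t^{-k}$ stated in the main theorem follows from either estimate, since $t^{-k/2} \le t^{-(k+1)/2} \le t^{-k}$ for $t \in (0, 1]$ and $k \ge 1$ (the case $k = 0$ is the contraction bound $\|T(t)g\|_{L^2(U)} \le \|g\|_{L^2(U)}$).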
In particular, $T(t) g \in H^k(U)$ for every $k$, so $T(t) g \in \bigcap_{k \ge 0} H^k(U)$.
**$C^\infty$ conclusion.** When $\partial U$ is smooth, the Sobolev embedding theorem gives $H^k(U) \hookrightarrow C^{k - \lfloor n/2 \rfloor - 1}(\bar{U})$ for $k > n/2$ (where $n$ is the dimension), and $k - \lfloor n/2 \rfloor - 1 \to \infty$ as $k \to \infty$. Hence
\begin{align*}
\bigcap_{k \ge 0} H^k(U) \subseteq \bigcap_{m \ge 0} C^m(\bar{U}) = C^\infty(\bar{U}).
\end{align*}
So $T(t) g \in C^\infty(\bar{U})$ for every $t > 0$.
**Real-analyticity in the interior.** The function $u(t, x) := (T(t) g)(x)$ satisfies $\partial_t u = \Delta u$ in $U \times (0, \infty)$ (in the classical sense for $t > 0$, by the smoothness shown). By interior analyticity of solutions to second-order parabolic equations with analytic coefficients (Petrowsky's theorem), $u(t, \cdot)$ is real-analytic in $x$ for $t > 0$ on the interior of $U$.
[/step]
[step:Combine the bounds to state the final estimate]
Setting
\begin{align*}
C_{k, t} := \begin{cases} C_k\, t^{-k}, & 0 < t \le 1, \\ C_k, & t \ge 1, \end{cases}
\end{align*}
we have
\begin{align*}
\|T(t) g\|_{H^k(U)} \le C_{k,t}\|g\|_{L^2(U)}, \quad t > 0, \ g \in L^2(U),
\end{align*}
with $C_{k,t} \le C_k\, t^{-k}$ as $t \to 0^+$ (the constant $C_k$ depending only on $k$ and $U$); the sharper rate $C_{k,t} \le C_k\, t^{-k/2}$ also holds, as shown in Steps 4–5. Combined with $T(t) g \in C^\infty(\bar{U})$ and real-analyticity in the interior (Step 5), this completes the proof.
[/step]