Partial Differential Equations III: Parabolic and Hyperbolic Evolution Equations

This course studies how parabolic and hyperbolic partial differential equations describe the evolution of physical systems in time. Parabolic equations, typified by the heat equation, model diffusion, smoothing, and irreversible dissipation, while hyperbolic equations, typified by the wave equation, model propagation, oscillation, and energy transport. The emphasis is not only on solving specific equations, but on developing a coherent framework for understanding existence, uniqueness, stability, regularity, and asymptotic behavior of evolving fields. The chapters begin by recasting evolution equations as Cauchy problems, then develop the classical theory of the heat equation on Euclidean space and the energy methods that control parabolic solutions. From there, the course moves to semigroups and abstract parabolic theory, which provide a unified language for linear evolution, and then to long-time behavior and the qualitative structure of parabolic flows. The second half shifts to hyperbolic equations: energy conservation for the wave equation, finite propagation speed, domain of dependence, and Fourier and dispersive viewpoints that explain how waves spread and interact. The later chapters extend these linear ideas to semilinear problems, where nonlinear terms introduce new analytical difficulties. Weak compactness and limiting arguments become essential tools for constructing solutions and passing to limits, while damping and dissipation connect the parabolic and hyperbolic theories by showing how energy can decay or stabilize over time. Overall, the course builds from foundational well-posedness theory to deeper qualitative analysis, linking abstract methods with the behavior of concrete evolutionary PDEs.

# Introduction

These notes study time-dependent partial differential equations as evolution laws: an initial state is prescribed, and the equation describes how that state changes with time.
The course focuses on the heat equation, the wave equation, and perturbations of these models because they display two sharply different mechanisms: parabolic smoothing and hyperbolic propagation. The central techniques are energy estimates, semigroup representations, weak formulations, and stability arguments in function spaces. The prerequisites are the standard language of real analysis, measure theory, Sobolev spaces, distributions, Hilbert spaces, basic functional analysis, Fourier transform methods, and elementary elliptic estimates. Earlier PDE courses usually treat elliptic problems as static boundary value problems; here the main question is how estimates evolve from initial data. A recurring theme is that the right formulation of a solution is part of the analysis, not a cosmetic choice. Throughout the abstract-evolution parts, $D(A)$ denotes the domain of an unbounded operator $A$, a $C_0$-semigroup means a strongly continuous family of bounded operators $(S(t))_{t\ge0}$ with $S(0)=I$ and $S(t+s)=S(t)S(s)$, and $A^*$ denotes the Hilbert-space adjoint when $A$ is densely defined. Integrals such as $\int_0^t S(t-s)f(s)\,ds$ are Bochner integrals with values in the underlying [Banach space](/page/Banach%20Space); this is why hypotheses like $f\in L^1(0,T;X)$ appear before differentiability is discussed.

## Evolution Laws and Initial Data

The first organizing problem is how to turn a PDE involving time into a well-posed Cauchy problem. For a scalar function $u:(0,T)\times U\to \mathbb R$, the informal phrase "solve the PDE from initial data" must specify the function space of $u$, the sense in which time derivatives exist, and the topology in which the initial condition is attained. These choices determine which estimates are available and which uniqueness statement can be true. [definition: Evolution Equation] Let $X$ be a Banach space, let $A:D(A)\to X$ be an operator with domain $D(A)\subset X$, and let $f:(0,T)\to X$ be a forcing term.
An evolution equation on $X$ is an equation of the form \begin{align*} \frac{du}{dt}(t) = A u(t) + f(t), \qquad 0<t<T, \end{align*} for an unknown function $u:(0,T)\to X$. [/definition] This definition separates the time variable from the spatial operator. In applications, $X$ may be $L^2(U)$, $H^1_0(U)$, or a distribution space, while $A$ may be a Laplacian, a divergence-form [elliptic operator](/page/Elliptic%20Operator), or a first-order matrix operator obtained from rewriting a second-order equation. The first model to fit into this language is the heat equation, where the spatial operator is elliptic and the time evolution is dissipative. [example: Heat Equation as an Evolution Equation] Let $U\subseteq \mathbb R^n$ be open and consider \begin{align*} \partial_t u-\Delta u=f, \qquad u(0)=u_0. \end{align*} Set $X=L^2(U)$, view $u(t)$ as the spatial function $x\mapsto u(t,x)$, and choose the operator $A=\Delta$ with a domain encoding the boundary condition; for example, homogeneous Dirichlet data lead to $D(A)=H^2(U)\cap H^1_0(U)$ when this elliptic domain description is valid. For each fixed $t$, the PDE says \begin{align*} \partial_t u(t,x)-\Delta_x u(t,x)=f(t,x). \end{align*} Adding $\Delta_xu(t,x)$ to both sides gives \begin{align*} \partial_t u(t,x)=\Delta_xu(t,x)+f(t,x). \end{align*} In $L^2(U)$ notation this is exactly \begin{align*} u'(t)=Au(t)+f(t), \end{align*} because $u'(t)$ denotes the $L^2(U)$-valued time derivative $x\mapsto \partial_tu(t,x)$ and $Au(t)$ denotes $x\mapsto \Delta_xu(t,x)$. Thus the heat equation is a Cauchy problem on the Banach space $L^2(U)$. The benefit is that information about the spatial operator $\Delta$, such as dissipativity or heat-semigroup smoothing, becomes information about the time-dependent solution $u(t)$. [/example] The heat equation suggests a first-order-in-time viewpoint, but the wave equation begins as second order in time. 
The course repeatedly uses the device of enlarging the state space so that the same abstract framework can treat both models. [example: Wave Equation as a First-Order System] For the homogeneous wave equation \begin{align*} \partial_t^2 u-\Delta u=0, \end{align*} introduce the velocity variable $v=\partial_t u$ and the state vector $W(t)=(u(t),v(t))$. The definition of $v$ gives \begin{align*} u'(t)=\partial_tu(t)=v(t). \end{align*} The wave equation gives \begin{align*} \partial_t^2u(t)-\Delta u(t)=0. \end{align*} Adding $\Delta u(t)$ to both sides gives \begin{align*} \partial_t^2u(t)=\Delta u(t). \end{align*} Since $v=\partial_tu$, differentiating in time gives $v'=\partial_t^2u$, and therefore \begin{align*} v'(t)=\Delta u(t). \end{align*} Thus the second-order scalar equation is equivalent, for sufficiently regular $u$, to the first-order system \begin{align*} u'(t)=v(t), \qquad v'(t)=\Delta u(t). \end{align*} With homogeneous Dirichlet boundary conditions, the natural state space is $H^1_0(U)\times L^2(U)$: the first component records finite spatial energy through $\nabla u\in L^2(U)$, while the second component records finite kinetic energy through $v=\partial_tu\in L^2(U)$. In operator form this is \begin{align*} W'(t)=\mathcal A W(t), \qquad \mathcal A(u,v)=(v,\Delta u), \end{align*} with a domain chosen so that $\Delta u$ is an $L^2$-function. The example shows that an equation second order in time can still be treated as an abstract first-order Cauchy problem after enlarging the state. [/example]

## Notions of Solution

The next problem is that classical differentiability is often unavailable for natural data. For instance, heat flow with $u_0\in L^2(U)$ need not have a pointwise value at every spatial point when $t=0$, and forcing terms in $L^1(0,T;L^2(U))$ need not be continuous in time. The course therefore distinguishes several solution concepts and proves that they agree when enough regularity is present.
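
Returning briefly to the first-order reformulation of the wave equation in the example above, the following sketch discretizes $W'=\mathcal A W$ on $U=(0,1)$ with homogeneous Dirichlet data and tracks the energy-space quantity $\frac12\|v\|_{L^2}^2+\frac12\|\nabla u\|_{L^2}^2$. The interval, the grid size, the Runge-Kutta time stepper, the standing-wave initial data $u_0(x)=\sin(\pi x)$, and every function name are illustrative assumptions made for this sketch and are not taken from the notes.

```python
import math

def apply_A(u, v, h):
    """Discrete analogue of A(u, v) = (v, Laplacian u) with u = 0 at both endpoints."""
    n = len(u)
    lap = [0.0] * n  # boundary entries stay 0, so v remains 0 on the boundary
    for j in range(1, n - 1):
        lap[j] = (u[j - 1] - 2.0 * u[j] + u[j + 1]) / (h * h)
    return list(v), lap

def axpy(a, b, s):
    """Return a + s*b componentwise."""
    return [x + s * y for x, y in zip(a, b)]

def combine(w, k1, k2, k3, k4, dt):
    """Classical RK4 update w + dt/6 (k1 + 2 k2 + 2 k3 + k4)."""
    return [a + dt / 6.0 * (p + 2.0 * q + 2.0 * r + s)
            for a, p, q, r, s in zip(w, k1, k2, k3, k4)]

def rk4_step(u, v, h, dt):
    """One Runge-Kutta step for the first-order system W' = A W, W = (u, v)."""
    k1u, k1v = apply_A(u, v, h)
    k2u, k2v = apply_A(axpy(u, k1u, dt / 2), axpy(v, k1v, dt / 2), h)
    k3u, k3v = apply_A(axpy(u, k2u, dt / 2), axpy(v, k2v, dt / 2), h)
    k4u, k4v = apply_A(axpy(u, k3u, dt), axpy(v, k3v, dt), h)
    return (combine(u, k1u, k2u, k3u, k4u, dt),
            combine(v, k1v, k2v, k3v, k4v, dt))

def energy(u, v, h):
    """Discrete kinetic plus gradient energy of the state (u, v)."""
    kinetic = 0.5 * h * sum(x * x for x in v)
    elastic = 0.5 * h * sum(((u[j + 1] - u[j]) / h) ** 2 for j in range(len(u) - 1))
    return kinetic + elastic

def solve_wave(n=50, dt=0.005, t_end=1.0):
    """Evolve u0(x) = sin(pi x), v0 = 0 and return (u, v, initial energy, final energy)."""
    h = 1.0 / n
    u = [math.sin(math.pi * j * h) for j in range(n + 1)]
    u[-1] = 0.0  # remove the rounding error in sin(pi)
    v = [0.0] * (n + 1)
    e0 = energy(u, v, h)
    for _ in range(round(t_end / dt)):
        u, v = rk4_step(u, v, h, dt)
    return u, v, e0, energy(u, v, h)

if __name__ == "__main__":
    u, v, e0, e1 = solve_wave()
    print("u(1, 1/2) =", u[len(u) // 2])
    print("energy before and after:", e0, e1)
```

For this data the continuum solution is $u(t,x)=\cos(\pi t)\sin(\pi x)$, so the midpoint value at $t=1$ should be close to $-1$, and the two printed energies should agree closely. The sketch only illustrates the state-space viewpoint; it says nothing about the domain of $\mathcal A$ or about convergence of the scheme.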
[definition: Classical Solution] Let $Q=(0,T)\times U$, where $U\subseteq\mathbb R^n$ is open, and let $F:Q\times\mathbb R^N\to\mathbb R^m$ denote a pointwise differential expression involving $u$ and finitely many classical derivatives of $u$. A classical solution of $F(t,x,u,\partial u,\dots)=0$ is a function $u:Q\to\mathbb R^N$ belonging to $C^k(Q;\mathbb R^N)$ for every derivative order $k$ appearing in $F$, such that the equation holds at every $(t,x)\in Q$ and the prescribed initial and boundary conditions are attained pointwise. [/definition] Classical solutions are useful for deriving identities by formal calculation. The analysis then asks which of those identities survive under approximation, because the resulting weak or energy solutions are the ones compatible with low-regularity data. [definition: Weak Solution] Let $Q=(0,T)\times U$, let $Y$ be a Banach space of functions or distributions on $Q$, let $\mathcal T\subset C_c^\infty(Q)$ be a test-function space, and let $\mathcal B:Y\times\mathcal T\to\mathbb R$ be the [bilinear form](/page/Bilinear%20Form) obtained from a PDE by [integration by parts](/theorems/210). For a source term $g$ defining a functional $G:\mathcal T\to\mathbb R$, a weak solution is a function or distribution $u\in Y$ such that \begin{align*} \mathcal B[u,\phi]=G(\phi) \end{align*} for every $\phi\in\mathcal T$. [/definition] Weak formulations replace pointwise derivatives by distributional derivatives. This is essential for parabolic and hyperbolic equations because energy estimates naturally control norms such as $\|u(t)\|_{L^2}$, $\|\nabla u\|_{L^2((0,T)\times U)}$, or $\|\partial_t u(t)\|_{L^2}$ rather than all classical derivatives. For linear equations generated by an operator, there is also a formulation based on integrating the evolution in time, which leads to the mild solution concept. [definition: Mild Solution] Let $A$ generate a strongly continuous semigroup $(S(t))_{t\ge 0}$ on a Banach space $X$. 
A mild solution of \begin{align*} u'(t)=Au(t)+f(t), \qquad u(0)=u_0, \end{align*} is a function $u:[0,T]\to X$ satisfying \begin{align*} u(t)=S(t)u_0+\int_0^t S(t-s)f(s)\,ds \end{align*} for $0\le t\le T$. [/definition] Mild solutions encode the PDE through the solution operator rather than through differentiability. This is especially effective for the heat equation, where the semigroup has explicit smoothing bounds, and for perturbative problems, where the integral formula becomes the starting point for fixed-point arguments. The next result explains when this integral formulation returns to the differentiated equation. [quotetheorem:7057] [citeproof:7057] Each hypothesis prevents a specific failure. If $u_0\notin D(A)$, the orbit $S(t)u_0$ may be only continuous at $t=0$ rather than differentiable there; for the heat semigroup on $L^2(U)$, rough $L^2$ data gives a mild solution but not a strong solution at the initial time. If $f$ is merely integrable in time, the Duhamel term may fail to be $C^1([0,T];X)$. If the formula does not take values continuously in $D(A)$ with the graph norm, the expression $Au(t)$ need not depend continuously on time. The theorem is therefore a regularity upgrade, not an existence theorem, and it prepares the later strategy of constructing weak or mild solutions first and then proving additional regularity from estimates.

## Energy as a Measuring Device

A central question in evolution equations is which quantity should be controlled in time. For heat flow, the natural energy decreases and dissipation measures smoothing. For wave flow, the natural energy is conserved or nearly conserved, reflecting propagation without parabolic damping. [definition: Energy Functional] Let $Y$ be a state space for an evolution equation. An energy functional on $Y$ is a map \begin{align*} E:Y\to[0,\infty].
\end{align*} [/definition] An energy is useful when the equation yields an identity or inequality controlling $E[u(t)]$ in terms of initial data, forcing, and lower-order terms. The definition is deliberately flexible because the right energy depends on the equation. In a Hilbert space setting, energies often arise from inner products, while for nonlinear equations they may involve integrals of convex functions or conserved Hamiltonians. The first computation shows how parabolic dissipation appears from the energy method. [example: Energy Identity for the Heat Equation] Assume $u$ is a smooth solution of $\partial_t u-\Delta u=0$ on a bounded [open set](/page/Open%20Set) $U\subset\mathbb R^n$, with $u=0$ on $\partial U$. For each fixed $t$, the equation says $\partial_tu(t,x)=\Delta u(t,x)$, so multiplying by $u(t,x)$ and integrating over $U$ gives \begin{align*} \int_U u(t,x)\partial_tu(t,x)\,dx=\int_U u(t,x)\Delta u(t,x)\,dx. \end{align*} The left-hand side is the time derivative of the squared $L^2$ norm with the factor $1/2$: \begin{align*} \frac{d}{dt}\left(\frac{1}{2}\int_U |u(t,x)|^2\,dx\right)=\int_U u(t,x)\partial_tu(t,x)\,dx. \end{align*} For the right-hand side, Green's identity gives \begin{align*} \int_U u(t,x)\Delta u(t,x)\,dx=-\int_U |\nabla u(t,x)|^2\,dx+\int_{\partial U}u(t,x)\partial_\nu u(t,x)\,dS. \end{align*} The boundary term is zero because $u(t,x)=0$ on $\partial U$, hence \begin{align*} \int_U u(t,x)\Delta u(t,x)\,dx=-\|\nabla u(t)\|_{L^2(U)}^2. \end{align*} Combining the two identities gives \begin{align*} \frac{1}{2}\frac{d}{dt}\|u(t)\|_{L^2(U)}^2=-\|\nabla u(t)\|_{L^2(U)}^2. \end{align*} Equivalently, \begin{align*} \frac{1}{2}\frac{d}{dt}\|u(t)\|_{L^2(U)}^2+\|\nabla u(t)\|_{L^2(U)}^2=0. 
\end{align*} Since $\|\nabla u(t)\|_{L^2(U)}^2\ge 0$, the function $t\mapsto \|u(t)\|_{L^2(U)}^2$ is nonincreasing; integrating the identity from $0$ to $T$ also gives \begin{align*} \frac{1}{2}\|u(T)\|_{L^2(U)}^2+\int_0^T\|\nabla u(t)\|_{L^2(U)}^2\,dt=\frac{1}{2}\|u(0)\|_{L^2(U)}^2. \end{align*} Thus the heat equation both decreases the $L^2$ energy and controls the accumulated spatial gradient over time. [/example] The heat identity contains two pieces of information: stability of the solution size and a gain of spatial control after integration in time. The corresponding wave calculation has a different sign structure and therefore a different conclusion. [example: Energy Conservation for the Wave Equation] Let $u$ solve $\partial_t^2u-\Delta u=0$ smoothly on a bounded open set $U\subset\mathbb R^n$, with $u=0$ on $\partial U$. For each fixed $t$, the equation is \begin{align*} \partial_t^2u(t,x)=\Delta u(t,x). \end{align*} Multiplying by $\partial_tu(t,x)$ and integrating over $U$ gives \begin{align*} \int_U \partial_tu(t,x)\partial_t^2u(t,x)\,dx=\int_U \partial_tu(t,x)\Delta u(t,x)\,dx. \end{align*} The left-hand side is the derivative of the kinetic energy: \begin{align*} \frac{d}{dt}\left(\frac{1}{2}\int_U |\partial_tu(t,x)|^2\,dx\right)=\int_U \partial_tu(t,x)\partial_t^2u(t,x)\,dx. \end{align*} For the right-hand side, [integration by parts](/theorems/2098) in the spatial variables gives \begin{align*} \int_U \partial_tu(t,x)\Delta u(t,x)\,dx=-\int_U \nabla(\partial_tu)(t,x)\cdot \nabla u(t,x)\,dx+\int_{\partial U}\partial_tu(t,x)\partial_\nu u(t,x)\,dS. \end{align*} Since $u(t,x)=0$ on $\partial U$ for all $t$, differentiating the boundary condition in time gives $\partial_tu(t,x)=0$ on $\partial U$, so the boundary term vanishes. Hence \begin{align*} \int_U \partial_tu(t,x)\Delta u(t,x)\,dx=-\int_U \nabla(\partial_tu)(t,x)\cdot \nabla u(t,x)\,dx.
\end{align*} Because $u$ is smooth, $\nabla(\partial_tu)=\partial_t(\nabla u)$, and therefore \begin{align*} \frac{d}{dt}\left(\frac{1}{2}\int_U |\nabla u(t,x)|^2\,dx\right)=\int_U \nabla u(t,x)\cdot \partial_t(\nabla u)(t,x)\,dx. \end{align*} Thus \begin{align*} -\int_U \nabla(\partial_tu)(t,x)\cdot \nabla u(t,x)\,dx=-\frac{d}{dt}\left(\frac{1}{2}\|\nabla u(t)\|_{L^2(U)}^2\right). \end{align*} Combining the kinetic and elastic identities gives \begin{align*} \frac{d}{dt}\left(\frac{1}{2}\|\partial_tu(t)\|_{L^2(U)}^2\right)=-\frac{d}{dt}\left(\frac{1}{2}\|\nabla u(t)\|_{L^2(U)}^2\right). \end{align*} Moving both terms to the same side yields \begin{align*} \frac{d}{dt}\left(\frac{1}{2}\|\partial_tu(t)\|_{L^2(U)}^2+\frac{1}{2}\|\nabla u(t)\|_{L^2(U)}^2\right)=0. \end{align*} Therefore the sum of kinetic energy and elastic energy is constant in time for the homogeneous wave equation with homogeneous Dirichlet boundary condition. [/example] These two examples motivate much of the course. Parabolic estimates exploit dissipation to gain regularity and decay, while hyperbolic estimates exploit conservation or finite-speed localization to control propagation. Once an estimate controls the difference of two solutions, it also becomes a uniqueness principle. [quotetheorem:7058] [citeproof:7058] The linearity and homogeneity assumptions are doing real work. For a nonlinear equation, the difference of two solutions usually satisfies an equation with coefficients depending on both solutions, so a separate Lipschitz or monotonicity estimate is needed. For an inhomogeneous problem, two solutions with different forcing terms differ by a forced equation, and the estimate must include the forcing norm. The constant must also be independent of the particular solution; an estimate whose constant depends on the unknown cannot force zero data to give zero solution. 
This is why later uniqueness proofs focus on deriving estimates for differences in exactly the same class in which existence was proved.

## Parabolic and Hyperbolic Behaviour

The course contrasts two model behaviours rather than treating every equation separately. Parabolic equations diffuse information and often improve regularity immediately. Hyperbolic equations propagate signals at finite speed and preserve roughness more strongly. [definition: Parabolic Model Equation] The parabolic model equation in this course is the heat equation \begin{align*} \partial_t u - \Delta u = f \end{align*} for an unknown $u:(0,T)\times U\to \mathbb R$, with initial condition $u(0)=u_0$ and appropriate boundary conditions when $U\ne\mathbb R^n$. [/definition] The heat equation is tied to the elliptic operator $-\Delta$ and to the Gaussian heat kernel on $\mathbb R^n$. Its estimates combine positivity, averaging, and the spectral behaviour of the Laplacian. To see what is special about this behaviour, we next put beside it the model equation whose energy travels rather than diffuses. [definition: Hyperbolic Model Equation] The hyperbolic model equation in this course is the wave equation \begin{align*} \partial_t^2 u - \Delta u = f \end{align*} for an unknown $u:(0,T)\times U\to \mathbb R$, with initial conditions $u(0)=u_0$ and $\partial_tu(0)=u_1$ and appropriate boundary conditions when $U\ne\mathbb R^n$. [/definition] The wave equation has two pieces of initial data because it is second order in time. Its analysis is governed by energy conservation, domain-of-dependence arguments, and the geometry of characteristic cones. With the two model equations now fixed, the next example records the practical consequence of the distinction. It compares what happens to the same kind of localized initial information under each flow, making the later choice of estimates less mysterious.
[example: Contrasting Smoothing and Propagation] Take compactly supported data $u_0\in L^1(\mathbb R^n)$ for the heat equation. For $t>0$, the heat solution is \begin{align*} u(t,x)=\int_{\mathbb R^n}G_t(x-y)u_0(y)\,dy \end{align*} with \begin{align*} G_t(z)=(4\pi t)^{-n/2}\exp\left(-\frac{|z|^2}{4t}\right). \end{align*} For any multi-index $\alpha$, differentiating the Gaussian gives \begin{align*} \partial_x^\alpha G_t(x-y)=t^{-|\alpha|/2}P_\alpha\left(\frac{x-y}{\sqrt t}\right)G_t(x-y) \end{align*} for a polynomial $P_\alpha$ depending only on $\alpha$ and $n$. Since $u_0$ is integrable and compactly supported, this derivative is bounded on each set $K\times \operatorname{supp}u_0$ with $K\subset\mathbb R^n$ compact, so differentiation under the integral gives \begin{align*} \partial_x^\alpha u(t,x)=\int_{\mathbb R^n}\partial_x^\alpha G_t(x-y)u_0(y)\,dy. \end{align*} Thus $u(t,\cdot)$ has spatial derivatives of every order for every $t>0$. If also $u_0\ge 0$ and $u_0$ is not identically zero, then $G_t(x-y)>0$ for every $x,y$, so \begin{align*} u(t,x)=\int_{\operatorname{supp}u_0}G_t(x-y)u_0(y)\,dy>0 \end{align*} for every $x\in\mathbb R^n$ and every $t>0$. For the wave equation with speed $1$, compact support behaves differently. If the initial displacement and velocity are supported in a ball $B_R(0)$, the *finite propagation speed property* gives \begin{align*} \operatorname{supp}u(t,\cdot)\subseteq B_{R+t}(0). \end{align*} Equivalently, if $|x|>R+t$, then the backward cone from $(t,x)$ does not meet the initial support, so the value at $(t,x)$ is zero. The heat equation immediately smooths and spreads nonnegative localized data, while the wave equation preserves a sharp geometric constraint on where the solution can be nonzero. [/example] This contrast is not merely qualitative. It determines which estimates are plausible: $L^p$-$L^q$ smoothing estimates are natural for the heat semigroup, while energy and local energy estimates are natural for waves. 
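
The contrast in the example above can be made concrete in one space dimension. The sketch below assumes the heat data is the indicator of $[-R,R]$, for which convolution with $G_t$ reduces to a difference of complementary error functions, and it assumes the standard d'Alembert formula $u(t,x)=\tfrac12\big(u_0(x-t)+u_0(x+t)\big)$ for the unit-speed wave equation with zero initial velocity and a smooth-ish bump as displacement. The radius `R`, the bump profile, and the function names are illustrative choices, not part of the notes.

```python
import math

R = 1.0  # radius of the initial support [-R, R] (illustrative choice)

def heat_solution(t, x):
    """Heat flow of the indicator of [-R, R] on the line, u = G_t * 1_[-R, R].

    Written with erfc and |x| (the solution is even in x) so that the tiny
    positive values far from the support are not lost to cancellation.
    """
    s = 2.0 * math.sqrt(t)
    a = abs(x)
    return 0.5 * (math.erfc((a - R) / s) - math.erfc((a + R) / s))

def bump(x):
    """Continuous initial displacement supported in [-R, R]."""
    return (1.0 - (x / R) ** 2) ** 2 if abs(x) < R else 0.0

def wave_solution(t, x):
    """d'Alembert formula with zero initial velocity and unit speed (assumed)."""
    return 0.5 * (bump(x - t) + bump(x + t))

if __name__ == "__main__":
    t = 0.1
    for x in (0.0, 1.05, 3.0, 10.0):
        print(f"x = {x:5.2f}   heat = {heat_solution(t, x):.3e}   wave = {wave_solution(t, x):.3e}")
```

At $t=0.1$ the heat values remain strictly positive even at $x=10$, far outside the initial support, while the wave values are exactly zero once $|x|>R+t$ and nonzero just inside that radius, for instance at $x=1.05$.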
## The Role of Function Spaces

The final introductory problem is choosing spaces that are strong enough to make estimates meaningful but weak enough to include the data of interest. The course uses Sobolev spaces for spatial regularity, Bochner spaces for time-dependent Banach-space-valued functions, and distributional derivatives for weak formulations. [definition: Bochner Space] Let $X$ be a Banach space and let $1\le p\le \infty$. The Bochner space $L^p(0,T;X)$ consists of strongly [measurable functions](/page/Measurable%20Functions) $u:(0,T)\to X$ such that $\|u(\cdot)\|_X\in L^p(0,T)$, with norm \begin{align*} \|u\|_{L^p(0,T;X)}=\left(\int_0^T\|u(t)\|_X^p\,dt\right)^{1/p} \end{align*} for $1\le p<\infty$, and the essential supremum norm for $p=\infty$. [/definition] Bochner spaces let us state estimates such as $u\in L^2(0,T;H^1_0(U))$ and $\partial_tu\in L^2(0,T;H^{-1}(U))$. This is the natural language for weak parabolic theory, where the solution may not be differentiable as an $L^2$-valued function at every time. To express the time derivative in that low-regularity setting, we use the distributional definition in time. [definition: Distributional Time Derivative] Let $X$ be a Banach space and $u\in L^1_{\mathrm{loc}}(0,T;X)$. A function $v\in L^1_{\mathrm{loc}}(0,T;X)$ is the distributional time derivative of $u$ if \begin{align*} \int_0^T u(t)\varphi'(t)\,dt = -\int_0^T v(t)\varphi(t)\,dt \end{align*} for every $\varphi\in C_c^\infty(0,T)$. [/definition] This definition is the time-dependent analogue of weak differentiation in space. It allows the course to prove compactness and stability results by passing to limits in integral identities rather than in pointwise equations. The next theorem identifies the [weak derivative](/page/Weak%20Derivative) with the integral representation that controls initial values and time continuity. [quotetheorem:7059] [citeproof:7059] The integrability assumptions are essential.
If the time derivative exists only as a general distribution rather than as a function in $L^1(0,T;X)$, the integral formula may not define an $X$-valued absolutely continuous representative; jumps and measures in time require a different framework. The conclusion is also only up to changing $u$ on a null set, since Bochner functions are equivalence classes and pointwise values are not part of the original data. The theorem is therefore exactly the tool needed for PDE solutions built from integral identities: it converts weak time differentiation into representatives with meaningful initial traces, while leaving genuinely measure-valued time behaviour outside the present theory.

## Course Map

The course begins by setting up abstract Cauchy problems, [Duhamel's principle](/theorems/55), distributional time derivatives, and Bochner spaces. It then develops the heat equation on $\mathbb R^n$, including kernel representations, maximum principles, $L^p$-$L^q$ smoothing, positivity, mass conservation, and Gaussian decay. The parabolic part continues with energy methods, weak solutions, Galerkin approximation, and compactness arguments for linear and semilinear equations. The hyperbolic part begins with the wave equation, its conserved energy, finite propagation speed, and representation formulas in low dimensions. It then treats weak solutions and stability estimates for forced waves and variable-coefficient perturbations. The final lectures connect the two viewpoints through semigroup theory, spectral methods, and semilinear evolution equations, emphasizing local well-posedness, continuation criteria, and the way linear estimates drive nonlinear analysis. By the end of the course, the reader should be able to translate a time-dependent PDE into a functional-analytic Cauchy problem, choose an appropriate solution concept, derive the basic energy estimate, and use that estimate to prove uniqueness, stability, and persistence of regularity.
The introduction has now fixed the basic language of Cauchy problems, solution concepts, and energy estimates; the rest of the notes develop the estimates that make this vocabulary effective. The next chapter turns that vocabulary into a functional-analytic framework by viewing the unknown as a curve in a Banach or Hilbert space, where evolution equations can be studied abstractly.

# 1. Evolution Equations as Cauchy Problems

This opening chapter sets up time-dependent PDEs as evolution problems in function spaces. The central move is to replace a scalar unknown $u(t,x)$ by a curve $t \mapsto u(t)$ taking values in a Banach or Hilbert space, so that the PDE becomes an abstract differential equation. The chapter assumes the standard background from functional analysis: Banach and Hilbert spaces, densely defined operators, adjoints, weak derivatives, and the basic $L^p$ and Sobolev spaces used in elliptic theory. With that language in place, heat, wave, and many perturbative equations share the same notions of existence, uniqueness, forcing, and weak time differentiation.

## Cauchy Problems in Function Spaces

The first question is how to formulate an initial value problem when the unknown at each time is itself a function of space. For the heat equation the natural state might be $u(t) \in L^2(\mathbb R^n)$ or $u(t) \in H^1(\mathbb R^n)$; for the wave equation the state must contain both displacement and velocity. The abstract Cauchy problem records exactly this information. [definition: Abstract Cauchy Problem] Let $X$ be a Banach space, let $A:D(A)\subset X\to X$ be a linear operator, let $u_0\in X$, and let $f:(0,T)\to X$ be a given forcing term. The abstract inhomogeneous Cauchy problem is the system consisting of \begin{align*} \dot u(t) = Au(t)+f(t),\qquad 0<t<T, \end{align*} and the initial condition $u(0)=u_0$.
[/definition] Here $D(A)$ is part of the data: many PDE operators, such as $\Delta$ on $L^2(\mathbb R^n)$, are not defined on all of $X$. The first natural solution concept therefore requires enough regularity for $Au(t)$ and $\dot u(t)$ to exist as $X$-valued functions. [definition: Strong Solution] Assume $u_0\in X$ and $f\in L^1(0,T;X)$. A function $u:[0,T]\to X$ is a strong solution of the abstract Cauchy problem if $u\in C([0,T];X)$, $u(t)\in D(A)$ for a.e. $t\in(0,T)$, $Au\in L^1(0,T;X)$, $u$ is differentiable as an $X$-valued function for a.e. $t$, $\dot u\in L^1(0,T;X)$, and \begin{align*} \dot u(t)=Au(t)+f(t) \end{align*} for a.e. $t\in(0,T)$, with $u(0)=u_0$. [/definition] A strong solution is the direct analogue of a classical solution in time, but it may be too restrictive when $u_0$ is rough. The mild formulation asks instead that the equation has already been integrated in time, with the homogeneous evolution handled by a semigroup. [definition: Mild Solution] Let $X$ be a Banach space, let $A:D(A)\subset X\to X$ be the generator of a strongly continuous semigroup $(S(t))_{t\ge 0}$ with $S(t):X\to X$ for every $t\ge 0$, let $u_0\in X$, and let $f\in L^1(0,T;X)$. A function $u\in C([0,T];X)$ is a mild solution of \begin{align*} \dot u(t)=Au(t)+f(t),\qquad u(0)=u_0, \end{align*} if \begin{align*} u(t)=S(t)u_0+\int_0^t S(t-s)f(s)\,ds \end{align*} for every $t\in[0,T]$, where the integral is a Bochner integral in $X$. [/definition] The mild formula is meaningful for many initial data outside $D(A)$ because no separate expression $Au(t)$ is required. It still depends on a semigroup, so it is less intrinsic when the course later works from energy identities or adjoint test functions. This motivates a Hilbert-space weak formulation in which the equation is read through scalar pairings against vectors in $D(A^*)$. 
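
Before moving on, the mild formula above can be evaluated directly in a scalar toy case. The sketch assumes $X=\mathbb R$ and the generator $A=-2$, so that $S(t)=e^{-2t}$, and it takes a discontinuous but integrable forcing, the indicator of $[0.5,1)$. These choices, the midpoint quadrature, and all function names are hypothetical and serve only to show that the formula needs nothing more than $f\in L^1(0,T;X)$.

```python
import math

A_RATE = 2.0  # hypothetical scalar generator A = -A_RATE on X = R

def semigroup(t):
    """S(t) = exp(tA) for the scalar generator A = -A_RATE."""
    return math.exp(-A_RATE * t)

def forcing(s):
    """Discontinuous but integrable forcing: the indicator of [0.5, 1)."""
    return 1.0 if 0.5 <= s < 1.0 else 0.0

def mild_solution(t, u0, n=2000):
    """Evaluate S(t)u0 + int_0^t S(t-s) f(s) ds with the composite midpoint rule."""
    h = t / n
    duhamel = h * sum(
        semigroup(t - (k + 0.5) * h) * forcing((k + 0.5) * h) for k in range(n)
    )
    return semigroup(t) * u0 + duhamel

def exact_solution(t, u0):
    """Closed form for this particular forcing, valid for 0 <= t <= 1."""
    if t <= 0.5:
        return semigroup(t) * u0
    return semigroup(t) * u0 + (1.0 - math.exp(-A_RATE * (t - 0.5))) / A_RATE

if __name__ == "__main__":
    for t in (0.0, 0.3, 0.8, 1.0):
        print(t, mild_solution(t, 3.0), exact_solution(t, 3.0))
```

The computed function is continuous in $t$, but its derivative jumps at $t=0.5$ where the forcing switches on, which is consistent with the remark that a mild solution need not be differentiable in the classical sense.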
[definition: Weak Solution in a Hilbert Space] Let $H$ be a Hilbert space, let $A:D(A)\subset H\to H$ be densely defined, let $u_0\in H$, and let $f\in L^1(0,T;H)$. A function $u\in C([0,T];H)$ is a weak solution of $\dot u=Au+f$, $u(0)=u_0$, if for every $v\in D(A^*)$ the scalar function $t\mapsto (u(t),v)_H$ is absolutely continuous and \begin{align*} \frac{d}{dt}(u(t),v)_H=(u(t),A^*v)_H+(f(t),v)_H \end{align*} for a.e. $t\in(0,T)$, with $u(0)=u_0$. [/definition] This weak formulation is useful because it keeps the state space fixed at $H$ while moving the operator onto $v$. The next example shows how a second-order-in-time PDE can be brought into the same first-order framework. [example: Wave Equation as a First-Order System] Consider the wave equation on $\mathbb R^n$, \begin{align*} \partial_t^2 y(t,x)-\Delta y(t,x)=g(t,x),\qquad y(0)=y_0,\qquad \partial_t y(0)=y_1. \end{align*} Use the state variable $u(t)=(u_1(t),u_2(t))=(y(t),\partial_t y(t))$ in the energy space $H^1(\mathbb R^n)\times L^2(\mathbb R^n)$. Then the first component satisfies \begin{align*} \dot u_1(t)=\partial_t y(t)=u_2(t). \end{align*} For the second component, the wave equation gives \begin{align*} \partial_t^2 y(t)-\Delta y(t)=g(t). \end{align*} Adding $\Delta y(t)$ to both sides yields \begin{align*} \partial_t^2 y(t)=\Delta y(t)+g(t). \end{align*} Since $u_2(t)=\partial_t y(t)$, this becomes \begin{align*} \dot u_2(t)=\partial_t^2 y(t)=\Delta u_1(t)+g(t). \end{align*} Define \begin{align*} A(y,z)=(z,\Delta y) \end{align*} on a domain such as \begin{align*} D(A)=\{(y,z)\in H^1(\mathbb R^n)\times L^2(\mathbb R^n):z\in H^1(\mathbb R^n),\ \Delta y\in L^2(\mathbb R^n)\}. \end{align*} Combining the two component equations gives \begin{align*} \dot u(t)=(\dot u_1(t),\dot u_2(t))=(u_2(t),\Delta u_1(t)+g(t)). \end{align*} By the definition of $A$, \begin{align*} Au(t)+(0,g(t))=(u_2(t),\Delta u_1(t))+(0,g(t))=(u_2(t),\Delta u_1(t)+g(t)). 
\end{align*} Thus \begin{align*} \dot u(t)=Au(t)+(0,g(t)), \end{align*} with initial state \begin{align*} u(0)=(y(0),\partial_t y(0))=(y_0,y_1). \end{align*} The wave equation is therefore an abstract first-order Cauchy problem once the state is enlarged to include both displacement and velocity. [/example]

## Duhamel's Principle for Forced Linear Evolution

Once the homogeneous equation is understood, the next problem is how to incorporate an external source term without rebuilding the solution theory. Duhamel's principle says that the inhomogeneous solution is assembled by adding up all homogeneous evolutions launched by the forcing at earlier times. [explanation: Duhamel Formula for Forced Linear Evolution] Let $A$ generate a strongly continuous semigroup $(S(t))_{t\ge0}$ on a Banach space $X$, let $u_0\in X$, and let $f\in L^1(0,T;X)$. The mild solution formula for \begin{align*} u_t=Au+f,\qquad u(0)=u_0 \end{align*} is \begin{align*} u(t)=S(t)u_0+\int_0^t S(t-s)f(s)\,ds,\qquad 0\le t\le T, \end{align*} with the integral understood as a Bochner integral in $X$. Under stronger assumptions such as $u_0\in D(A)$ and sufficiently regular $f$, this mild solution is a classical solution. [/explanation] The formula separates propagation from forcing, but its hypotheses are doing real work. The assumption $f\in L^1(0,T;X)$ makes the Bochner integral meaningful. A scalar counterexample is obtained by taking $X=\mathbb R$, $S(t)=I$, and $f(t)=1/t$ on $(0,T)$: for any $t>0$, the integral $\int_0^t f(s)\,ds$ diverges, so the Duhamel term is not an element of $X$. If the forcing is a distribution in time, for instance $f=\delta_{T/2}x_0$ with $0\ne x_0\in X$, the expression $\int_0^t S(t-s)f(s)\,ds$ is not a Bochner integral; it can only be interpreted after changing the solution concept. Strong continuity and local boundedness of $(S(t))_{t\ge 0}$ are what make the integrand $s\mapsto S(t-s)f(s)$ strongly measurable and integrable: one approximates $f$ by step functions and passes to the limit.
The stronger conclusion, that the mild solution is classical, is also genuinely stronger: without $u_0\in D(A)$, even the homogeneous solution $S(t)u_0$ may fail to have an $X$-valued derivative at $t=0$. Time regularity of $f$ controls the time regularity inherited by the solution. With $X=\mathbb R$, $S(t)=I$, $A=0$, and $f=\mathbf{1}_{(T/2,T)}$, the formula gives an absolutely continuous solution whose a.e. derivative is exactly this discontinuous forcing, so no conclusion requiring a continuous time derivative can follow from mere $L^1$ forcing. Duhamel's formula does not by itself prove that a proposed operator generates a semigroup, nor does it solve nonlinear equations without an additional fixed point or compactness argument. It is nevertheless the starting point for heat kernel representations, energy estimates with sources, and fixed point arguments for nonlinear equations. [example: Heat Equation with Source on Euclidean Space] Let $X=L^2(\mathbb R^n)$ and let $A=\Delta$ with its usual heat-semigroup realization on $L^2(\mathbb R^n)$. For $r>0$, the heat semigroup is convolution with the Gaussian kernel \begin{align*} S(r)h=G_r*h,\qquad G_r(x)=(4\pi r)^{-n/2}e^{-|x|^2/(4r)}. \end{align*} The normalization of $G_r$ is \begin{align*} \int_{\mathbb R^n}G_r(x)\,dx=(4\pi r)^{-n/2}\prod_{j=1}^n\int_{\mathbb R}e^{-x_j^2/(4r)}\,dx_j=(4\pi r)^{-n/2}(4\pi r)^{n/2}=1. \end{align*} Thus [Young's convolution inequality](/theorems/463) gives, for each $h\in L^2(\mathbb R^n)$, \begin{align*} \|G_r*h\|_{L^2(\mathbb R^n)}\le \|G_r\|_{L^1(\mathbb R^n)}\|h\|_{L^2(\mathbb R^n)}=\|h\|_{L^2(\mathbb R^n)}. \end{align*} Hence each $S(r)$ is a contraction on $L^2(\mathbb R^n)$. For $u_0\in L^2(\mathbb R^n)$ and $F\in L^1(0,T;L^2(\mathbb R^n))$, the PDE \begin{align*} \partial_t u(t,x)-\Delta u(t,x)=F(t,x),\qquad u(0,x)=u_0(x), \end{align*} is the abstract Cauchy problem \begin{align*} \dot u(t)=Au(t)+F(t),\qquad u(0)=u_0, \end{align*} in $L^2(\mathbb R^n)$. Applying the Duhamel formula stated above gives \begin{align*} u(t)=S(t)u_0+\int_0^t S(t-s)F(s)\,ds. 
\end{align*} Substituting the heat-kernel representation of $S(r)$ first with $r=t$ and then with $r=t-s$ gives \begin{align*} S(t)u_0=G_t*u_0. \end{align*} For the forcing term, \begin{align*} S(t-s)F(s)=G_{t-s}*F(s). \end{align*} Hence \begin{align*} u(t)=G_t*u_0+\int_0^t G_{t-s}*F(s)\,ds. \end{align*} The integral is a Bochner integral in $L^2(\mathbb R^n)$, since \begin{align*} \int_0^t\|G_{t-s}*F(s)\|_{L^2(\mathbb R^n)}\,ds\le \int_0^t\|F(s)\|_{L^2(\mathbb R^n)}\,ds<\infty. \end{align*} The formula says that the initial data is diffused for time $t$, while the source inserted at time $s$ is diffused only for the remaining time $t-s$. [/example] This representation turns existence into a formula, but it also raises the uniqueness question: could another weakly defined solution satisfy the same evolution law without being equal to it? In the semigroup setting the difference of two candidates is often controlled through an integral inequality rather than by subtracting differentiable equations pointwise. The scalar estimate behind that control is Gronwall's inequality: once the norm of a difference is bounded by its accumulated past values, the inequality forces the difference to vanish when the initial discrepancy is zero. [quotetheorem:872] [citeproof:872] The theorem is purely scalar, and that is exactly why it is reusable. A PDE or semigroup argument first has to produce nonnegative functions $a(t)$ and $b(t)$ with an estimate of the form required by the theorem; only then does Gronwall convert accumulated growth into an explicit exponential bound. The integrability and nonnegativity hypotheses are not cosmetic. If the coefficient multiplying the past size is not integrable, the exponential factor can be infinite, and the estimate gives no finite control. If the quantity being estimated is not nonnegative, it cannot directly represent a norm or energy. 
In applications to mild solutions, the semigroup bounds, forcing estimates, or Lipschitz constants are the ingredients that produce the scalar inequality, while Gronwall is the final step that turns that inequality into uniqueness or continuous dependence. Chapters 3 and 6 refine this argument into energy uniqueness and stability estimates adapted respectively to parabolic and hyperbolic equations. ## Distributional Time Derivatives and Bochner Spaces The final question of the chapter is how to differentiate a curve that may not be classically differentiable in time. PDE estimates often provide bounds such as $u\in L^2(0,T;H^1_0(\Omega))$ and $\partial_t u\in L^2(0,T;H^{-1}(\Omega))$, so the derivative must be interpreted weakly in time and in a Banach-valued sense. Scalar $L^p$ notation by itself is not enough for this: without strong measurability there may be no usable almost-everywhere representative, and without an integrable $X$-norm there is no Bochner integral with values in the state space. This pathology is avoided in the separable spaces most common in PDE, such as $L^2(\Omega)$ and $H^1_0(\Omega)$; in nonseparable spaces it can occur. For example, in $X=\ell^2([0,T])$, the map $t\mapsto e_t$ is weakly measurable but not strongly measurable, because its range is an uncountable set of points at pairwise distance $\sqrt 2$ and so is not contained in any separable subspace, even after removing a null set of times. [definition: Bochner Space] Let $X$ be a Banach space and let $1\le p\le\infty$. The Bochner space $L^p(0,T;X)$ consists of strongly measurable functions $u:(0,T)\to X$ such that $\|u\|_{L^p(0,T;X)}<\infty$, where for $1\le p<\infty$, \begin{align*} \|u\|_{L^p(0,T;X)}=\left(\int_0^T\|u(t)\|_X^p\,dt\right)^{1/p}, \end{align*} and for $p=\infty$, \begin{align*} \|u\|_{L^\infty(0,T;X)}=\operatorname{ess\,sup}_{0<t<T}\|u(t)\|_X. \end{align*} [/definition] Bochner spaces allow the same estimates as scalar $L^p$ spaces, but the integrand is now a norm in $X$. 
To formulate an evolution equation in such a space, we need a weak time derivative defined through integration by parts in the time variable. [definition: Distributional Time Derivative] Let $X$ be a Banach space and let $u\in L^1(0,T;X)$. An element $v\in L^1(0,T;X)$ is the distributional time derivative of $u$ if \begin{align*} \int_0^T u(t)\phi'(t)\,dt=-\int_0^T v(t)\phi(t)\,dt \end{align*} for every $\phi\in C_c^\infty(0,T)$, where both integrals are Bochner integrals in $X$. [/definition] With this definition, a weak time derivative is still an $X$-valued function, not merely a formal symbol. For evolution equations, however, controlling only $u$ is not enough: one also needs the time derivative to live in the same integrability scale, otherwise expressions such as $u(t)-u(s)=\int_s^t \dot u(r)\,dr$ may have no Banach-valued meaning. The appropriate space therefore records both the curve and its distributional time derivative. [definition: Bochner-Sobolev Space] Let $X$ be a Banach space and let $1\le p\le\infty$. The space $W^{1,p}(0,T;X)$ consists of all $u\in L^p(0,T;X)$ whose distributional time derivative $\dot u$ belongs to $L^p(0,T;X)$. [/definition] The reason this space is central is that it restores the [fundamental theorem of calculus](/theorems/632) for Banach-valued functions. This result is the bridge between differential and integral formulations of evolution equations. [quotetheorem:7059] [citeproof:7059] The theorem justifies evaluating $u(0)$ for functions in $W^{1,1}(0,T;X)$, even though the original definition was only a.e. in time. Each hypothesis rules out a specific failure. If $u$ is merely in $L^1(0,T;X)$, then changing its value at $t=0$ produces the same $L^1$ element, so an initial value is not well-defined. 
If $u$ has a jump, such as $u(t)=x_0\mathbf{1}_{(T/2,T)}(t)$ for a nonzero $x_0\in X$, then $u\in L^1(0,T;X)$ but has no continuous representative and its [distributional derivative](/page/Distributional%20Derivative) contains a Dirac mass rather than an $L^1(0,T;X)$ function. If the derivative is only distributional and not represented by an integrable $X$-valued function, the right-hand side $\int_s^t \dot u(r)\,dr$ is not a Bochner integral in $X$. Banach-valued Bochner integrability is also essential: the non-strongly-measurable map $t\mapsto e_t$ from $(0,T)$ into $\ell^2((0,T))$ is bounded in norm but has no Bochner integral, so the formula cannot even be formed in $X$. The range $1\le p\le\infty$ belongs to the Banach-space framework of the theorem; for $0<p<1$, the scalar expression $\|u\|_{L^p}=(\int_0^T\|u(t)\|_X^p\,dt)^{1/p}$ is not a norm, as the scalar functions $\mathbf{1}_{(0,T/2)}$ and $\mathbf{1}_{(T/2,T)}$ show by violating the triangle inequality. Thus the usual completeness and absolute-continuity argument is no longer an argument in a Banach space. The theorem also does not say that $u$ is differentiable at every time; it gives absolute continuity and the integral identity, with the derivative interpreted a.e. In PDE applications the derivative often lies in a larger [dual space](/page/Dual%20Space) than the solution itself, which leads to the Hilbert triple formulation used later for parabolic equations. [example: Energy-Class Time Regularity] Let $V\hookrightarrow H\hookrightarrow V^*$ be a Hilbert triple, meaning that $V$ is densely and continuously embedded in $H$, and $H$ is identified with a subspace of $V^*$ by \begin{align*} h\mapsto \bigl(v\mapsto (h,v)_H\bigr). \end{align*} Assume $u\in L^2(0,T;V)$ and $\dot u\in L^2(0,T;V^*)$. The Hilbert-triple time-[continuity theorem](/theorems/1145), often called the *Lions-Magenes lemma*, gives a representative of $u$ in $C([0,T];H)$. 
The estimate behind the result is the energy identity. For this representative and for $0\le s\le t\le T$, \begin{align*} \|u(t)\|_H^2-\|u(s)\|_H^2=2\int_s^t \langle \dot u(r),u(r)\rangle_{V^*,V}\,dr. \end{align*} The integrand is integrable because the definition of the dual norm gives \begin{align*} |\langle \dot u(r),u(r)\rangle_{V^*,V}|\le \|\dot u(r)\|_{V^*}\|u(r)\|_V. \end{align*} Integrating this inequality and applying scalar Cauchy-Schwarz on $(s,t)$ gives \begin{align*} \int_s^t|\langle \dot u(r),u(r)\rangle_{V^*,V}|\,dr\le \left(\int_s^t\|\dot u(r)\|_{V^*}^2\,dr\right)^{1/2}\left(\int_s^t\|u(r)\|_V^2\,dr\right)^{1/2}. \end{align*} Both factors on the right tend to $0$ as $t-s\to 0$, because $\dot u\in L^2(0,T;V^*)$ and $u\in L^2(0,T;V)$. Thus the $H$-energy varies continuously in time, and the full Lions-Magenes argument upgrades the a.e.-defined Bochner function to an $H$-continuous representative. In parabolic weak solutions, this is exactly why control of spatial regularity in $V$ and time regularity in $V^*$ still permits a meaningful initial value in the middle space $H$. [/example] These three viewpoints now fit together. Strong solutions differentiate the equation in the state space, mild solutions encode the same dynamics through the semigroup and Duhamel integral, and weak solutions use test functions or dual spaces to make sense of rough states. The rest of the course repeatedly moves between these formulations to obtain existence, uniqueness, regularity, and stability for parabolic and hyperbolic PDEs. Chapter 1 set up the abstract Cauchy problem and the roles of mild, weak, and classical solutions. We now specialize that framework to the heat equation on Euclidean space, where the semigroup, the kernel, and the smoothing effect can all be seen explicitly.
# 2. The Heat Equation on Euclidean Space
Chapter 1 introduced evolution equations as Cauchy problems and explained why mild solutions are often the right object before differentiability is available. We now specialise to the model parabolic equation on all of Euclidean space, \begin{align*} \partial_t u - \Delta u = 0, \qquad u(0,x)=f(x), \end{align*} where $t>0$ and $x\in \mathbb R^n$. The main questions are explicit representation, uniqueness, smoothing, decay, and preservation of qualitative features such as positivity and mass. The heat equation is the first place where the semigroup viewpoint from Chapter 1 becomes concrete: the solution operator is convolution with a Gaussian kernel. ## Heat Kernel and Convolution Solutions The first problem is to find a candidate solution operator that evolves rough initial data without requiring pointwise derivatives of $f$. [Translation invariance](/theorems/4911) suggests that the solution at time $t$ should average nearby values of $f$, while scaling suggests that the averaging length scale is $\sqrt{t}$. The resulting kernel is the Gaussian heat kernel. [definition: Heat Kernel] The heat kernel is the function \begin{align*} \Gamma:(0,\infty)\times\mathbb R^n\to(0,\infty) \end{align*} defined by \begin{align*} \Gamma(t,x) := (4\pi t)^{-n/2}\exp\left(-\frac{|x|^2}{4t}\right). \end{align*} [/definition] The normalisation is chosen so that $\Gamma(t,\cdot)$ has integral $1$ over $\mathbb R^n$. Its width is proportional to $\sqrt t$, so heat flow replaces a point value by a Gaussian average over the spatial scale naturally reached by diffusion by time $t$. [example: Gaussian Normalisation] Let $t>0$. By the definition of the heat kernel, \begin{align*} \int_{\mathbb R^n}\Gamma(t,x)\,d\mathcal L^n(x) = \int_{\mathbb R^n}(4\pi t)^{-n/2}\exp\left(-\frac{|x|^2}{4t}\right)\,d\mathcal L^n(x). 
\end{align*} Since $(4\pi t)^{-n/2}$ is independent of $x$, this becomes \begin{align*} \int_{\mathbb R^n}\Gamma(t,x)\,d\mathcal L^n(x) = (4\pi t)^{-n/2}\int_{\mathbb R^n}\exp\left(-\frac{|x|^2}{4t}\right)\,d\mathcal L^n(x). \end{align*} Set $y=x/(2\sqrt t)$, so $x=2\sqrt t\,y$, $|x|^2/(4t)=|y|^2$, and the Jacobian factor is $(2\sqrt t)^n$. Therefore \begin{align*} \int_{\mathbb R^n}\exp\left(-\frac{|x|^2}{4t}\right)\,d\mathcal L^n(x) = (2\sqrt t)^n\int_{\mathbb R^n}e^{-|y|^2}\,d\mathcal L^n(y). \end{align*} Using the standard [Gaussian integral](/theorems/1140) $\int_{\mathbb R^n}e^{-|y|^2}\,d\mathcal L^n(y)=\pi^{n/2}$, we get \begin{align*} \int_{\mathbb R^n}\Gamma(t,x)\,d\mathcal L^n(x) = (4\pi t)^{-n/2}(2\sqrt t)^n\pi^{n/2}. \end{align*} Now $(2\sqrt t)^n=2^n t^{n/2}$ and $(4\pi t)^{n/2}=2^n\pi^{n/2}t^{n/2}$, hence \begin{align*} (4\pi t)^{-n/2}(2\sqrt t)^n\pi^{n/2} = \frac{2^n t^{n/2}\pi^{n/2}}{2^n\pi^{n/2}t^{n/2}} = 1. \end{align*} Thus $\Gamma(t,\cdot)$ has total integral $1$, so convolution with $\Gamma(t,\cdot)$ averages mass rather than changing the total amount of mass. [/example] The normalisation example identifies the kernel as an averaging density, but a solution formula still needs a rule assigning an evolved function to each initial datum. Convolution is the translation-invariant way to place this Gaussian average around every spatial point, so it is the natural definition of the heat evolution operator. [definition: Heat Semigroup on Euclidean Space] For $t>0$ and $1\le p\le\infty$, the heat semigroup on $L^p(\mathbb R^n)$ is the operator \begin{align*} e^{t\Delta}:L^p(\mathbb R^n)\to L^p(\mathbb R^n) \end{align*} defined by \begin{align*} (e^{t\Delta}f)(x):=(\Gamma(t,\cdot)*f)(x)=\int_{\mathbb R^n}\Gamma(t,x-y)f(y)\,d\mathcal L^n(y) \end{align*} for a.e. $x\in\mathbb R^n$. [/definition] The notation $e^{t\Delta}$ is consistent with the abstract semigroup notation from Chapter 1. 
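The definition can be tried out numerically in one space dimension. The following is a minimal sketch, assuming a midpoint rule on a truncated window of half-width `half_width * sqrt(t)`; the helper names, the truncation, and the quadrature rule are illustrative assumptions rather than part of the notes.

```python
import math

def heat_kernel_1d(t, x):
    """Gamma(t, x) = (4 pi t)^(-1/2) exp(-x^2 / (4 t)) in one space dimension."""
    return (4.0 * math.pi * t) ** -0.5 * math.exp(-x * x / (4.0 * t))

def heat_flow_1d(f, t, x, half_width=12.0, steps=4000):
    """Midpoint-rule approximation of (Gamma(t, .) * f)(x) for bounded f on R.

    The integral over R is truncated to |x - y| <= half_width * sqrt(t), which
    discards only a negligible Gaussian tail for the default half_width.
    """
    radius = half_width * math.sqrt(t)
    h = 2.0 * radius / steps
    total = 0.0
    for k in range(steps):
        y = x - radius + (k + 0.5) * h
        total += heat_kernel_1d(t, x - y) * f(y)
    return h * total

# Constant data are reproduced, because the kernel has integral 1.
constant_value = heat_flow_1d(lambda y: 1.0, t=0.3, x=0.7)
# A unit jump at the origin is averaged symmetrically at the jump point.
jump_value = heat_flow_1d(lambda y: 1.0 if y > 0 else 0.0, t=0.3, x=0.0)
print(constant_value, jump_value)
```

The first value is a numerical restatement of the Gaussian normalisation; the second comes out close to one half by the symmetry of the kernel, independently of `t`.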
After defining the operator, the next issue is whether it really solves the PDE and recovers the prescribed initial data as $t\downarrow0$. [quotetheorem:54] [citeproof:54] The theorem turns the heat equation into an explicit averaging formula, but its hypotheses also indicate the limitations of the statement. Boundedness of $f$ is used to justify the simplest dominated-convergence argument and to avoid discussing growth at spatial infinity; for example, $f(y)=\exp(|y|^2)$ is continuous, but the convolution integral diverges as soon as $t\ge 1/4$, because the Gaussian factor $\exp(-|x-y|^2/(4t))$ then no longer dominates that growth. Later estimates replace boundedness with $L^p$ assumptions, but some condition controlling the integral is still needed. The pointwise recovery of $f$ requires continuity at the point: if $f=\mathbb{1}_{(0,\infty)}$ on $\mathbb R$, then $e^{t\Delta}f(0)=1/2$ for every $t>0$, so the limit at $x=0$ is $1/2$ rather than the chosen value $f(0)=0$. The theorem also does not assert uniqueness among all smooth solutions with the same pointwise initial trace on an unbounded domain, does not treat boundary-value problems, and does not give estimates uniform down to $t=0$ for derivatives. The smoothing conclusion is the first sign of the parabolic character of the equation: for positive time, differentiating in $x$ falls on the kernel rather than on $f$. [example: Heat Flow from an Indicator Function] Let $E\subset\mathbb R^n$ be measurable with finite measure and set $f=\mathbb{1}_E$. The heat evolution is \begin{align*} u(t,x)=\int_{\mathbb R^n}\Gamma(t,x-y)\mathbb{1}_E(y)\,d\mathcal L^n(y)=\int_E\Gamma(t,x-y)\,d\mathcal L^n(y). \end{align*} For $t>0$, derivatives in $x$ fall on the heat kernel: \begin{align*} D_x^\alpha u(t,x)=\int_E D_x^\alpha\Gamma(t,x-y)\,d\mathcal L^n(y). \end{align*} This is justified because $E$ has finite measure and $D_x^\alpha\Gamma(t,x-y)$ is bounded uniformly in $y$ when $(t,x)$ ranges over a compact subset of $(0,\infty)\times\mathbb R^n$. 
Since $\Gamma(t,z)>0$ and $\int_{\mathbb R^n}\Gamma(t,z)\,d\mathcal L^n(z)=1$, we have \begin{align*} 0\le u(t,x)\le \int_{\mathbb R^n}\Gamma(t,x-y)\,d\mathcal L^n(y)=1. \end{align*} If $Z_t$ is a random vector with density $\Gamma(t,z)$, then \begin{align*} \mathbb P(x+Z_t\in E)=\int_{\mathbb R^n}\mathbb{1}_E(x+z)\Gamma(t,z)\,d\mathcal L^n(z). \end{align*} With $y=x+z$, this becomes \begin{align*} \mathbb P(x+Z_t\in E)=\int_E\Gamma(t,y-x)\,d\mathcal L^n(y). \end{align*} Because $\Gamma(t,y-x)=\Gamma(t,x-y)$, the probability equals $u(t,x)$. In one dimension, consider the half-line $E=(-\infty,0]$. It has infinite measure, but $\mathbb{1}_E$ is bounded, so the same convolution formula still defines the heat evolution: \begin{align*} u(t,x)=\int_{-\infty}^0(4\pi t)^{-1/2}\exp\left(-\frac{(x-y)^2}{4t}\right)\,dy. \end{align*} Set $s=(y-x)/\sqrt{2t}$, so $dy=\sqrt{2t}\,ds$ and the upper limit becomes $-x/\sqrt{2t}$. Then \begin{align*} u(t,x)=\int_{-\infty}^{-x/\sqrt{2t}}\frac{1}{\sqrt{2\pi}}e^{-s^2/2}\,ds. \end{align*} Thus $u(t,x)=\Phi(-x/\sqrt{2t})$, where $\Phi$ is the standard normal distribution function, or equivalently $u(t,x)=\frac12\operatorname{erfc}(x/(2\sqrt t))$. Differentiating the displayed integral with respect to the upper limit gives \begin{align*} \partial_xu(t,x)=-(4\pi t)^{-1/2}\exp\left(-\frac{x^2}{4t}\right)<0. \end{align*} So the jump from $\mathbb{1}_{(-\infty,0]}$ is instantly replaced by a smooth strictly decreasing transition layer, and the dependence on $x/\sqrt t$ shows that its width is comparable to $\sqrt t$. [/example] The representation theorem gives existence, but uniqueness needs a separate idea because the equation is posed on an unbounded spatial domain. The central uniqueness mechanism is the parabolic maximum principle. ## The Parabolic Maximum Principle The next question is how much of a solution is determined by its boundary and initial values. Elliptic equations have maximum principles in space; the heat equation has a forward-in-time version because the sign of $\partial_t u-\Delta u$ prevents a new positive maximum from forming in the interior. 
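The mechanism has a transparent discrete analogue. The following is a minimal sketch, assuming an explicit finite-difference scheme on a uniform grid in one space dimension; the scheme is an illustration that is not part of these notes. With ratio `lam = dt / dx**2` at most $1/2$, each new interior value is a convex combination of three old values, so no value can exceed the largest one found on the initial face and the lateral boundary.

```python
import math

def explicit_heat_steps(initial, left, right, lam, num_steps):
    """Run u_j^{m+1} = lam*u_{j-1}^m + (1 - 2*lam)*u_j^m + lam*u_{j+1}^m on interior nodes.

    `initial` holds the values on the initial face, `left` and `right` are
    constant lateral boundary values, and lam = dt / dx^2. Returns all levels.
    """
    if not 0.0 < lam <= 0.5:
        raise ValueError("need 0 < lam <= 1/2 so that each update is a convex combination")
    u = list(initial)
    u[0], u[-1] = left, right
    levels = [u]
    for _ in range(num_steps):
        new = list(u)  # boundary entries are carried over unchanged
        for j in range(1, len(u) - 1):
            new[j] = lam * u[j - 1] + (1.0 - 2.0 * lam) * u[j] + lam * u[j + 1]
        levels.append(new)
        u = new
    return levels

# Sine profile on 21 nodes with zero lateral data (hypothetical test values).
profile = [math.sin(math.pi * j / 20) for j in range(21)]
levels = explicit_heat_steps(profile, left=0.0, right=0.0, lam=0.5, num_steps=200)
boundary_max = max(max(levels[0]), 0.0)          # initial face and lateral boundary
overall_max = max(max(level) for level in levels)
print(overall_max <= boundary_max + 1e-12)
```

The comparison in the last line allows a small tolerance for floating-point rounding; in exact arithmetic the convex-combination structure gives the inequality without any tolerance.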
[definition: Parabolic Cylinder] Let $U\subset\mathbb R^n$ be open and let $T>0$. The parabolic cylinder over $U$ with time horizon $T$ is \begin{align*} Q_T:=(0,T)\times U. \end{align*} Its parabolic boundary is \begin{align*} \partial_p Q_T:=\bigl(\{0\}\times \overline U\bigr)\cup\bigl((0,T]\times\partial U\bigr), \end{align*} where $\partial U$ is the topological boundary of $U$ and $\overline U$ is the closure of $U$ in $\mathbb R^n$. [/definition] The parabolic boundary contains the initial face and the spatial boundary, but not the terminal time slice. The quoted maximum principle below uses the time-first convention $Q_T=(0,T)\times\Omega$, while the following comparison theorem uses the space-first convention $Q_T=U\times(0,T)$; these are the same cylinder with the coordinates ordered as written, and in both cases the incoming boundary is the initial face together with the lateral boundary. With this geometry fixed, we can state the precise principle that any new maximum must be inherited from the part of the boundary where data enter the problem. [quotetheorem:693] [citeproof:693] The boundedness of $U$ and continuity up to $\overline{Q_T}$ ensure that the supremum is attained on a compact set; without this compactness, a separate growth argument is needed. This is not only a proof convenience: on the unbounded cylinder $(0,T]\times\mathbb R$, the function $u(t,x)=x$ satisfies $\partial_tu-\partial_x^2u=0$ and has no finite upper bound although its initial values are prescribed pointwise. Continuity up to the parabolic boundary is also essential for reading boundary data in the inequality; for instance, $u(t,x)=x$ on $(0,T]\times(0,1)$ solves the equation, but if the value at the lateral boundary point $x=1$ were assigned discontinuously as $0$, the boundary record would no longer control the interior limit. 
The parabolic boundary hypotheses are also sharp in their geometry: if the initial face is omitted, then $u(t,x)=e^{-\pi^2t}\sin(\pi x)$ on $(0,T]\times(0,1)$ has zero lateral boundary values but positive interior values inherited from the initial profile. The terminal time slice is not included because it is an output of the evolution, not incoming data. The condition $\partial_tu-\Delta u\le0$ is the subsolution sign convention, so the theorem controls upper bounds rather than lower bounds. Applying the same argument to $-u$ gives the corresponding minimum principle for supersolutions. Together these two statements imply comparison: if a subsolution starts below a supersolution and stays below it on the lateral boundary, it remains below for later times. [quotetheorem:7060] [citeproof:7060] The [comparison principle](/theorems/4870) is a uniqueness theorem in disguise: applying it twice to two solutions with the same data forces equality. Its hypotheses still depend on having a genuine parabolic boundary where the ordering is known, which is automatic in bounded cylinders but absent for the Cauchy problem on all of $\mathbb R^n$. The boundary ordering cannot be weakened to an ordering on the initial face alone: on $(0,T]\times(0,1)$, the functions $u(t,x)=t+\tfrac12(x^2-1)$ and $v(t,x)=0$ both solve the heat equation and satisfy $u(0,x)\le v(0,x)$, yet $u(t,x)>v(t,x)$ as soon as $t>\tfrac12(1-x^2)$, because the lateral values $u(t,1)=t$ violate the ordering. The differential inequality is also necessary: if $u(t,x)=t\sin(\pi x)$ and $v(t,x)=0$ on $(0,T]\times(0,1)$, then $u=v$ on the whole parabolic boundary but $(\partial_t-\partial_x^2)u=(1+\pi^2t)\sin(\pi x)>0=(\partial_t-\partial_x^2)v$ in the interior, and the ordering $u\le v$ fails for every $t>0$. Because the Cauchy problem on $\mathbb R^n$ has no lateral boundary on which to impose the ordering, the next uniqueness statement needs a condition at spatial infinity instead. Without such a condition, pathological solutions can hide mass at infinity and enter the domain instantly. [quotetheorem:7061] [citeproof:7061] The boundedness assumption is part of the statement rather than a technical detail. 
It is exactly what lets the quadratic barrier dominate the artificial boundary on large balls before the radius is sent to infinity. The conclusion therefore distinguishes the physical bounded heat flow from exotic smooth solutions that vanish initially but grow too fast at infinity. A named source of counterexamples is Tychonoff's nonuniqueness construction: there are nonzero smooth solutions on $(0,T]\times\mathbb R$ with zero initial trace whose spatial growth is faster than any Gaussian allowed by the standard uniqueness classes. Thus the bounded-growth hypothesis is a real selection principle, not only a device used by the proof. This prepares the estimates in the smoothing and decay section below, where boundedness is replaced by integrability or Gaussian growth conditions. [example: Nonuniqueness Without Growth Restrictions] Choose a nonzero smooth function $a:(0,T]\to\mathbb R$ that is flat at $0$, meaning $a^{(k)}(t)\to0$ as $t\downarrow0$ for every $k\ge0$, and whose derivatives grow slowly enough that the series below converges locally uniformly together with all derivatives. Define \begin{align*} u(t,x)=\sum_{k=0}^{\infty}\frac{a^{(k)}(t)}{(2k)!}x^{2k}. \end{align*} Then $u$ is smooth on $(0,T]\times\mathbb R$, and termwise differentiation is justified by the local [uniform convergence](/page/Uniform%20Convergence) of the differentiated series. Differentiating in time gives \begin{align*} \partial_tu(t,x)=\sum_{k=0}^{\infty}\frac{a^{(k+1)}(t)}{(2k)!}x^{2k}. \end{align*} Differentiating twice in space gives \begin{align*} \partial_x^2u(t,x)=\sum_{k=1}^{\infty}\frac{a^{(k)}(t)}{(2k)!}(2k)(2k-1)x^{2k-2}. \end{align*} Since $(2k)(2k-1)/(2k)! = 1/(2k-2)!$, this becomes \begin{align*} \partial_x^2u(t,x)=\sum_{k=1}^{\infty}\frac{a^{(k)}(t)}{(2k-2)!}x^{2k-2}. \end{align*} With $j=k-1$, the last series is \begin{align*} \partial_x^2u(t,x)=\sum_{j=0}^{\infty}\frac{a^{(j+1)}(t)}{(2j)!}x^{2j}. \end{align*} Thus $\partial_tu(t,x)=\partial_x^2u(t,x)$. 
For each fixed $x$, the flatness of $a$ at $0$ and the convergence estimates imply \begin{align*} \lim_{t\downarrow0}u(t,x)=\sum_{k=0}^{\infty}\frac{\lim_{t\downarrow0}a^{(k)}(t)}{(2k)!}x^{2k}=0. \end{align*} If $a$ is not identically zero, then $u$ is not identically zero because $u(t,0)=a(t)$. Hence the zero pointwise initial trace does not determine a unique solution on $(0,T]\times\mathbb R$ unless some growth condition at spatial infinity is imposed. [/example] ## Smoothing and Decay Estimates The [representation formula](/theorems/39) gives more than existence: it quantifies how heat flow improves integrability and differentiability. The question is how much better $e^{t\Delta}f$ is than $f$ after a positive time, and how the improvement depends on $t$. [quotetheorem:649] [citeproof:649] This functional-analytic inequality is the bridge from the explicit kernel to quantitative PDE estimates. The relation \begin{align*} \frac{1}{p}+\frac{1}{q}=1+\frac{1}{r} \end{align*} is the scaling rule for convolution: it says exactly how much integrability can be gained from the kernel factor. The weak space $L^{q,\infty}$ is important because several kernels arising in PDE, especially singular or borderline kernels, sit naturally at weak type before they satisfy strong $L^q$ bounds. The restrictions $1<p,q<\infty$ are not decorative: the proof uses distribution-function truncation and interpolation, and endpoint cases require different weak-type statements or fail outright. For the heat kernel at positive time the situation is better, since the Gaussian belongs to every $L^q$ space, but the theorem still identifies which kernel norm must be estimated. The hypotheses also explain the limitations of this route. Young-type estimates use translation-invariant convolution structure, so they are best suited to constant-coefficient equations on $\mathbb R^n$ or to localised arguments that reduce to that model. 
Boundary conditions, variable coefficients, and nonlinearities usually require energy estimates, semigroup bounds, or perturbative arguments instead of direct convolution. In the heat-flow application, the missing input is not the abstract inequality but the exact size of $\|\Gamma(t,\cdot)\|_{L^q}$ as $t$ varies. The next calculation supplies that time scaling when the convolution kernel is the heat kernel. [example: Norm of the Heat Kernel] Let $1\le r<\infty$ and $t>0$. From the definition of the heat kernel, \begin{align*} \|\Gamma(t,\cdot)\|_{L^r}^r=\int_{\mathbb R^n}\left((4\pi t)^{-n/2}\exp\left(-\frac{|x|^2}{4t}\right)\right)^r\,d\mathcal L^n(x). \end{align*} Raising each factor to the power $r$ gives \begin{align*} \|\Gamma(t,\cdot)\|_{L^r}^r=(4\pi t)^{-nr/2}\int_{\mathbb R^n}\exp\left(-\frac{r|x|^2}{4t}\right)\,d\mathcal L^n(x). \end{align*} Set $x=\sqrt t\,z$. Then $|x|^2=t|z|^2$ and the Jacobian factor is $t^{n/2}$, so \begin{align*} \int_{\mathbb R^n}\exp\left(-\frac{r|x|^2}{4t}\right)\,d\mathcal L^n(x)=t^{n/2}\int_{\mathbb R^n}\exp\left(-\frac{r|z|^2}{4}\right)\,d\mathcal L^n(z). \end{align*} Substituting this into the previous expression yields \begin{align*} \|\Gamma(t,\cdot)\|_{L^r}^r=(4\pi)^{-nr/2}t^{-nr/2}t^{n/2}\int_{\mathbb R^n}\exp\left(-\frac{r|z|^2}{4}\right)\,d\mathcal L^n(z). \end{align*} Since $-nr/2+n/2=-n(r-1)/2$, this is \begin{align*} \|\Gamma(t,\cdot)\|_{L^r}^r=(4\pi)^{-nr/2}\left(\int_{\mathbb R^n}\exp\left(-\frac{r|z|^2}{4}\right)\,d\mathcal L^n(z)\right)t^{-n(r-1)/2}. \end{align*} Taking the $r$th root gives \begin{align*} \|\Gamma(t,\cdot)\|_{L^r}=(4\pi)^{-n/2}\left(\int_{\mathbb R^n}\exp\left(-\frac{r|z|^2}{4}\right)\,d\mathcal L^n(z)\right)^{1/r}t^{-\frac n2(1-1/r)}. \end{align*} Thus for $1\le r<\infty$, \begin{align*} \|\Gamma(t,\cdot)\|_{L^r}=C_{n,r}t^{-\frac n2(1-1/r)}, \end{align*} where \begin{align*} C_{n,r}:=(4\pi)^{-n/2}\left(\int_{\mathbb R^n}\exp\left(-\frac{r|z|^2}{4}\right)\,d\mathcal L^n(z)\right)^{1/r}. 
\end{align*} For $r=\infty$, the exponential factor is largest when $x=0$, so \begin{align*} \|\Gamma(t,\cdot)\|_{L^\infty}=\Gamma(t,0)=(4\pi t)^{-n/2}=(4\pi)^{-n/2}t^{-n/2}. \end{align*} This is the same formula with $1/r=0$. The power of $t$ comes only from the parabolic scaling $x=\sqrt t\,z$, and this is why the same exponent appears in the later $L^p$-$L^q$ heat estimates. [/example] The kernel norm example converts the abstract convolution inequality into a time-dependent bound for the heat flow. The next theorem records that gain: starting in $L^p$, the solution belongs to every larger $L^q$ space at positive time, with the diffusion scale determining the power of $t$. [quotetheorem:7062] [citeproof:7062] The restriction $p\le q$ reflects the direction of parabolic smoothing in this Young-inequality argument. If $q<p$ on $\mathbb R^n$, no estimate of the displayed form can hold for arbitrary $L^p$ data, because there are functions in $L^p(\mathbb R^n)$ that do not belong to $L^q(\mathbb R^n)$ at spatial infinity, such as slowly decaying power functions chosen with exponent between $n/p$ and $n/q$. The requirement $f\in L^p$ is likewise essential: taking a nonzero constant function shows that an $L^1\to L^\infty$ estimate cannot apply outside $L^1$, since the initial norm on the right would be infinite rather than a finite control quantity. The singular factor as $t\downarrow0$ is necessary when $p<q$; for approximate identities $f_\varepsilon=\varepsilon^{-n/p}\mathbb{1}_{B(0,\varepsilon)}$, the $L^p$ norm stays bounded while the $L^q$ norm of $e^{t\Delta}f_\varepsilon$ at times comparable to $\varepsilon^2$ grows like the stated power. At fixed positive time, diffusion spreads mass over a region of scale $\sqrt t$, and the power of $t$ records exactly that scaling. Once this estimate is known, the next natural question is whether heat flow also creates derivatives, not only better integrability. 
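Before turning to derivatives, the time scaling of the kernel norm computed above can be checked numerically. The following is a minimal sketch in one space dimension, assuming a midpoint rule on a truncated window; it also uses the closed form $\int_{\mathbb R^n}\exp(-r|z|^2/4)\,d\mathcal L^n(z)=(4\pi/r)^{n/2}$ for the Gaussian integral inside $C_{n,r}$. The function names are hypothetical.

```python
import math

def kernel_norm_numeric(t, r, half_width=12.0, steps=20000):
    """Midpoint-rule value of ||Gamma(t, .)||_{L^r(R)} for n = 1 and 1 <= r < infinity."""
    radius = half_width * math.sqrt(t)
    h = 2.0 * radius / steps
    total = 0.0
    for k in range(steps):
        x = -radius + (k + 0.5) * h
        gamma = (4.0 * math.pi * t) ** -0.5 * math.exp(-x * x / (4.0 * t))
        total += gamma ** r
    return (h * total) ** (1.0 / r)

def kernel_norm_formula(t, r, n=1):
    """C_{n,r} * t^{-(n/2)(1 - 1/r)}, with the Gaussian integral taken as (4 pi / r)^{n/2}."""
    c = (4.0 * math.pi) ** (-n / 2.0) * (4.0 * math.pi / r) ** (n / (2.0 * r))
    return c * t ** (-(n / 2.0) * (1.0 - 1.0 / r))

numeric = kernel_norm_numeric(t=0.5, r=2.0)
formula = kernel_norm_formula(t=0.5, r=2.0)
print(abs(numeric - formula) / formula)

# Quadrupling the time multiplies the norm by 4^{-(1/2)(1 - 1/r)} when n = 1.
ratio = kernel_norm_numeric(t=2.0, r=2.0) / numeric
print(ratio, 4.0 ** (-0.5 * (1.0 - 1.0 / 2.0)))
```

For $r=1$ both functions return the value $1$ up to quadrature error, which recovers the normalisation of the kernel; for $r>1$ the ratio test isolates the exponent $-\frac n2(1-1/r)$ without reference to the constant.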
[quotetheorem:7063] [citeproof:7063] These derivative estimates express instantaneous regularization: no matter how rough the initial $L^p$ function is, the solution is smooth at every positive time. The factor $t^{-|\alpha|/2}$ is forced by parabolic scaling, since one spatial derivative costs one power of the diffusion length $\sqrt t$. A concrete test case is $f=\mathbb{1}_{(0,\infty)}$ on $\mathbb R$: $e^{t\Delta}f$ is an error-function profile and its spatial derivative at the origin is $(4\pi t)^{-1/2}$, so the first derivative really blows up like $t^{-1/2}$. For a point mass initial datum, the derivative of the heat kernel itself has exactly the same scaling predicted by the estimate. The hypotheses also cannot be dropped to arbitrary distributions without changing the right-hand side, since a derivative of the Dirac mass produces a heat flow whose size is controlled by distributional order rather than an $L^p$ norm. This limitation is useful rather than accidental, because it tells later energy estimates exactly how regularity should deteriorate near the initial time. [example: Smoothing of a Finite Measure] Let $\mu$ be a finite signed Borel measure on $\mathbb R^n$, with total variation measure $|\mu|$, and define \begin{align*} u(t,x)=\int_{\mathbb R^n}\Gamma(t,x-y)\,d\mu(y) \end{align*} for $t>0$. For each fixed $t>0$ and multi-index $\alpha$, differentiating the Gaussian in the $x$ variable gives \begin{align*} D_x^\alpha\Gamma(t,x-y)=t^{-\frac{n+|\alpha|}{2}}P_\alpha\left(\frac{x-y}{\sqrt t}\right)\exp\left(-\frac{|x-y|^2}{4t}\right) \end{align*} for a polynomial $P_\alpha$ depending only on $\alpha$ and $n$. Since a polynomial times a Gaussian is bounded on $\mathbb R^n$, the function $y\mapsto D_x^\alpha\Gamma(t,x-y)$ is bounded for $x$ in any compact set. 
Because $|\mu|(\mathbb R^n)<\infty$, dominated convergence for finite measures justifies differentiating under the integral, so \begin{align*} D_x^\alpha u(t,x)=\int_{\mathbb R^n}D_x^\alpha\Gamma(t,x-y)\,d\mu(y). \end{align*} Thus $u(t,\cdot)\in C^\infty(\mathbb R^n)$ for every $t>0$. The $L^\infty$ bound follows from the total variation inequality for signed measures: \begin{align*} |u(t,x)|\le \int_{\mathbb R^n}\Gamma(t,x-y)\,d|\mu|(y). \end{align*} Since \begin{align*} \Gamma(t,x-y)=(4\pi t)^{-n/2}\exp\left(-\frac{|x-y|^2}{4t}\right) \end{align*} and $\exp(-|x-y|^2/(4t))\le1$, we have \begin{align*} \Gamma(t,x-y)\le (4\pi t)^{-n/2}. \end{align*} Therefore \begin{align*} |u(t,x)|\le (4\pi t)^{-n/2}\int_{\mathbb R^n}d|\mu|(y)=(4\pi t)^{-n/2}|\mu|(\mathbb R^n). \end{align*} Taking the supremum over $x$ gives \begin{align*} \|u(t,\cdot)\|_{L^\infty}\le (4\pi t)^{-n/2}|\mu|(\mathbb R^n). \end{align*} For $\mu=\delta_0$, the defining property of the Dirac measure gives \begin{align*} u(t,x)=\int_{\mathbb R^n}\Gamma(t,x-y)\,d\delta_0(y)=\Gamma(t,x-0)=\Gamma(t,x). \end{align*} Thus a point mass is instantly replaced, for every $t>0$, by the smooth Gaussian density $\Gamma(t,\cdot)$. [/example] ## Positivity, Conservation, and Gaussian Tails The final question in this chapter is which qualitative features of the initial data survive under the heat flow. The heat kernel is positive, has integral $1$, and decays like a Gaussian; these three facts control sign, total mass, and spatial tails. [quotetheorem:7064] [citeproof:7064] This is the analytic form of the physical statement that heat density cannot become negative if it starts nonnegative. The concrete $L^p$ hypothesis matters because it gives a standard setting in which the convolution operator has already been defined. Positivity uses only the sign of the kernel, not differentiability of the solution or smoothness of the initial datum. 
The statement is limited to sign-preserving data and to the heat semigroup itself: if $f$ changes sign, for instance $f=\mathbb{1}_{(0,\infty)}-\mathbb{1}_{(-\infty,0)}$ on $\mathbb R$, the evolved function remains negative on the left and positive on the right by symmetry, so no nonnegativity conclusion is possible. Positivity preservation also does not say that zero sets remain fixed; if $f\ge0$ is nonzero and integrable, the strict positivity of the kernel makes $e^{t\Delta}f(x)>0$ for every $t>0$ and every $x$. The next property adds the normalisation of the kernel: in the absence of sources or boundary leakage, the total amount of heat is unchanged. [quotetheorem:7065] [citeproof:7065] Mass conservation should be read alongside the $L^p$-$L^q$ estimates. The $L^1$ assumption is essential for the displayed total mass to be finite and for [Fubini's theorem](/theorems/2961) to apply in this direct form: for $f\equiv1$, the heat flow is still $1$, and both integrals over $\mathbb R^n$ are infinite rather than conserved finite masses. For sign-changing data, the conserved quantity is the signed integral, while the $L^1$ norm itself may decrease by cancellation; for example, if $f=\mathbb{1}_{[0,1]}-\mathbb{1}_{[1,2]}$ on $\mathbb R$, then the signed integral is $0$ for all $t$, but the $L^1$ norm of $e^{t\Delta}f$ is smaller for positive time because the positive and negative parts overlap after convolution. For nonnegative data, positivity turns the conserved signed integral into conservation of total mass. Higher norms may decrease because the same mass spreads over a larger region, and the remaining issue is how far that mass can travel by a fixed time. [quotetheorem:7066] [citeproof:7066] This bound records the infinite propagation speed of the heat equation in a quantitative way. 
The compact-support hypothesis is what converts the geometry of the support into the lower bound $|x-y|\ge |x|/2$; without spatial localisation, no such tail estimate follows from the $L^1$ norm alone. A concrete counterexample is obtained by placing small bumps far away: let $f_k=\mathbb{1}_{B(x_k,1)}$ with $|x_k|\to\infty$, rescaled so that $\|f_k\|_{L^1}=1$. At the point $x=x_k$ and any fixed $t>0$, the value $e^{t\Delta}f_k(x_k)$ is bounded below by a positive constant depending on $n$ and $t$, while the displayed Gaussian bound with $|x_k|$ on the right tends to $0$. The constants are not the main point, since sharper estimates are possible when the support is described more precisely. What matters for later parabolic theory is the qualitative lesson: compact support is lost immediately, but the newly created tails are Gaussian rather than arbitrary. [remark: Infinite Propagation Speed] If $f\ge0$ is not identically zero and belongs to $L^1(\mathbb R^n)$, then $e^{t\Delta}f(x)>0$ for every $t>0$ and every $x\in\mathbb R^n$. Thus heat is felt instantly at every point of space. Later chapters contrast this behaviour with the wave equation, where finite propagation speed is a central feature. [/remark] The heat equation on Euclidean space therefore provides the model parabolic picture: explicit Gaussian representation, uniqueness through comparison under suitable growth conditions, immediate smoothing, conservation of total mass, positivity preservation, and Gaussian spatial decay. These facts will be reused when studying variable-coefficient parabolic equations, where the kernel is no longer explicit but the same estimates guide the weak and energy-based theory. The explicit heat kernel makes the parabolic mechanism completely transparent, but it also suggests what survives when formulas disappear. The next chapter replaces convolution identities with energy estimates and weak formulations, so the theory can handle rough coefficients and forcing terms. # 3. 
Energy Methods for Parabolic Equations This chapter turns the heat equation from an explicit convolution formula into an estimate-driven theory. The guiding question is how much control remains when the coefficients are only measurable, the forcing term is rough, and the solution is no longer differentiable enough for pointwise calculations. Energy methods answer this by testing the equation against the solution itself, converting parabolic evolution into inequalities for norms in Hilbert spaces. Chapter 2 used the heat kernel on $\mathbb R^n$ to see smoothing, positivity, conservation of mass, and Gaussian decay in constant-coefficient problems. Here we keep the dissipative structure but replace formulas by weak identities. This is the bridge from classical heat flow to the variational theory used for linear and semilinear parabolic equations. Throughout this chapter, spatial integrals over $U\subset\mathbb R^n$ are taken with respect to [Lebesgue measure](/page/Lebesgue%20Measure) $d\mathcal L^n(x)$ unless another measure is specified. ## Energy Identities for Divergence-Form Heat Equations The basic problem is to measure the size of a solution without solving the equation explicitly. For an equation of the form \begin{align*} u_t-\operatorname{div}(A\nabla u)=f, \end{align*} multiplying by $u$ should say that diffusion decreases the $L^2$ energy while forcing may add energy. The point of the energy identity is to make this statement precise before weakening the hypotheses. [definition: Divergence-Form Parabolic Operator] Let $U\subset \mathbb R^n$ be open, let $T>0$, and let $A\in L^\infty(U\times(0,T);\mathbb R^{n\times n})$. The divergence-form parabolic operator associated to $A$ is the [linear map](/page/Linear%20Map) \begin{align*} L_A:\{u\in L^2(0,T;H^1_0(U)):u_t\in L^2(0,T;H^{-1}(U))\}\to L^2(0,T;H^{-1}(U)) \end{align*} defined by \begin{align*} (L_Au)(t)(v):=u_t(t)(v)+\int_U A(x,t)\nabla u(x,t)\cdot\nabla v(x)\,d\mathcal L^n(x) \end{align*} for a.e. 
$t\in(0,T)$ and every $v\in H^1_0(U)$. [/definition] For smooth coefficients and smooth $u$, this weak definition agrees with the pointwise expression \begin{align*} L_Au = u_t - \sum_{i,j=1}^n \partial_{x_i}(A_{ij}\partial_{x_j}u) =u_t-\operatorname{div}(A\nabla u). \end{align*} This form places all second spatial derivatives in divergence form, so integration by parts transfers derivatives from $u$ to the [test function](/page/Test%20Function). The first estimate we need is therefore an identity showing exactly what is produced by testing the equation with $u$ itself. [example: Smooth Heat Equation Energy Balance] Let $U\subset\mathbb R^n$ be bounded with smooth boundary, let $u\in C^1([0,T];L^2(U))\cap C^0([0,T];H^1_0(U))$ be smooth in space, and suppose \begin{align*} u_t-\Delta u=f \end{align*} in $U\times(0,T)$ with $u=0$ on $\partial U\times(0,T)$. Fix $t\in(0,T)$. Multiplying the equation by $u(\cdot,t)$ and integrating over $U$ gives \begin{align*} \int_U u_t(x,t)u(x,t)\,d\mathcal L^n(x)-\int_U \Delta u(x,t)u(x,t)\,d\mathcal L^n(x)=\int_U f(x,t)u(x,t)\,d\mathcal L^n(x). \end{align*} The first term is the time derivative of the $L^2$ energy, because differentiating under the integral gives \begin{align*} \frac{d}{dt}\frac12\|u(t)\|_{L^2(U)}^2=\frac{d}{dt}\frac12\int_U u(x,t)^2\,d\mathcal L^n(x)=\int_U u(x,t)u_t(x,t)\,d\mathcal L^n(x). \end{align*} For the Laplacian term, Green's identity and the boundary condition $u=0$ on $\partial U$ give \begin{align*} -\int_U \Delta u(x,t)u(x,t)\,d\mathcal L^n(x)=\int_U \nabla u(x,t)\cdot\nabla u(x,t)\,d\mathcal L^n(x)-\int_{\partial U}\partial_\nu u(x,t)u(x,t)\,dS(x). \end{align*} Since $u(x,t)=0$ for $x\in\partial U$, the boundary integral is $0$, and therefore \begin{align*} -\int_U \Delta u(x,t)u(x,t)\,d\mathcal L^n(x)=\int_U |\nabla u(x,t)|^2\,d\mathcal L^n(x)=\|\nabla u(t)\|_{L^2(U)}^2. 
\end{align*} Substituting these two identities into the integrated equation yields \begin{align*} \frac{1}{2}\frac{d}{dt}\|u(t)\|_{L^2(U)}^2+\|\nabla u(t)\|_{L^2(U)}^2=(f(t),u(t))_{L^2(U)}. \end{align*} Thus the heat equation converts the PDE into a balance between the time derivative of the $L^2$ energy, the nonnegative spatial dissipation $\|\nabla u(t)\|_{L^2(U)}^2$, and the work done by the forcing term. [/example] The calculation for the Laplacian suggests that variable coefficients should be allowed when they still create positive dissipation. To make the same identity useful for rough coefficients, we isolate the two assumptions that enter the energy calculation: positivity of the quadratic form and boundedness of the coefficient matrix. [definition: Uniformly Elliptic Coefficient Field] Let $U\subset\mathbb R^n$ be open. A measurable matrix field $A:U\times(0,T)\to\mathbb R^{n\times n}$ is uniformly elliptic and bounded if there exist constants $\theta>0$ and $\Lambda<\infty$ such that, for a.e. $(x,t)\in U\times(0,T)$ and every $\xi,\eta\in\mathbb R^n$, \begin{align*} \sum_{i,j=1}^n A_{ij}(x,t)\xi_i\xi_j \ge \theta |\xi|^2 \end{align*} and \begin{align*} |A(x,t)\eta|\le \Lambda |\eta|. \end{align*} [/definition] The [first inequality](/theorems/2897) is the coercive part of diffusion; the second prevents the coefficient field from creating an unbounded bilinear form. To turn the parabolic equation into estimates, one must know exactly what happens when the equation is tested against the solution itself: the time derivative should become a change in $L^2$ energy, while the elliptic part should become a nonnegative dissipation term. The following balance law isolates that computation before any inequalities are imposed on the forcing term. [quotetheorem:7067] [citeproof:7067] This identity is the parabolic analogue of [conservation of energy](/theorems/1335) for the wave equation, except that the diffusion term has a sign. 
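As a sanity check of the balance law, one can test it on an explicit solution. The sketch below assumes the model case $U=(0,1)$, $A=1$, $f=0$, with $u(x,t)=e^{-\pi^2 t}\sin(\pi x)$, and verifies the time-integrated form $\frac12\|u(t)\|_{L^2}^2+\int_0^t\|u_x(\tau)\|_{L^2}^2\,d\tau=\frac12\|u(0)\|_{L^2}^2$ by midpoint quadrature. The helper names and step counts are hypothetical.

```python
import math

def u(x, t):
    """Explicit solution of u_t - u_xx = 0 on (0,1) with u = 0 at x = 0, 1."""
    return math.exp(-math.pi ** 2 * t) * math.sin(math.pi * x)

def u_x(x, t):
    """Spatial derivative of the explicit solution."""
    return math.pi * math.exp(-math.pi ** 2 * t) * math.cos(math.pi * x)

def midpoint(g, a, b, steps):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

def l2_norm_sq(g, t, steps=200):
    """Squared L^2(0,1) norm of x -> g(x, t)."""
    return midpoint(lambda x: g(x, t) ** 2, 0.0, 1.0, steps)

def energy_balance(t, time_steps=400):
    """Return both sides of the integrated energy identity at time t."""
    dissipation = midpoint(lambda tau: l2_norm_sq(u_x, tau), 0.0, t, time_steps)
    left = 0.5 * l2_norm_sq(u, t) + dissipation
    right = 0.5 * l2_norm_sq(u, 0.0)
    return left, right

left, right = energy_balance(0.1)
print(left, right)
```

The two printed numbers should agree up to quadrature error: the energy lost by $\|u(t)\|_{L^2}^2$ is accounted for by the accumulated dissipation.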
The smoothness hypothesis is used only to justify differentiating the energy and integrating by parts; without a boundary condition such as $u=0$ on $\partial U$, an additional boundary flux term appears and the displayed identity is no longer closed. For example, on an interval $U=(0,1)$, integration by parts for $u_t-u_{xx}=0$ produces the boundary contribution $-u_x(1,t)u(1,t)+u_x(0,t)u(0,t)$, which is not controlled by the interior $L^2$ energy unless boundary data or flux conditions supply extra information. If the coefficient field is not positive, the term involving $A\nabla u$ can fail to be dissipative; in the scalar case $A=-1$, the equation becomes the backward heat equation and the same computation gives the wrong sign for controlling $\|\nabla u\|_{L^2}^2$. The theorem does not give existence or regularity, and it does not by itself bound the forcing term. Since the term involving $A\nabla u$ is nonnegative after ellipticity, the identity immediately leads to an inequality that survives passage to weak limits. [quotetheorem:603] [citeproof:603] The inequality has two outputs: $u$ is bounded in $L^\infty(0,T;L^2(U))$ and its gradient lies in $L^2(0,T;L^2(U))$. Uniform ellipticity is essential here: if the coefficient field loses positivity, the gradient term may no longer control diffusion, and backward-heat behaviour can make the $L^2$ energy grow instead of decay. The forcing assumption is also part of the estimate, not a cosmetic choice: a time-dependent distribution outside $L^2(0,T;H^{-1}(U))$, such as a spatial Dirac mass in dimensions where point evaluation is not bounded on $H^1_0(U)$, cannot be paired with $u(t)$ in the displayed duality estimate. The homogeneous boundary condition is likewise needed to close the estimate; prescribed nonzero boundary values inject energy through the boundary unless one first subtracts an extension and estimates the resulting source terms. 
The estimate also does not assert pointwise smoothness or uniqueness by itself; it is an a priori bound in the natural weak norms. A useful test case is forcing that is too rough to be paired with $u$ by an $L^2$ [inner product](/page/Inner%20Product) but is still meaningful as a functional on $H^1_0(U)$. [example: Heat Equation with Rough Forcing] Let $U\subset\mathbb R^n$ be bounded, let $f\in L^2(0,T;H^{-1}(U))$, and let $u$ solve \begin{align*} u_t-\Delta u=f,\qquad u|_{\partial U}=0. \end{align*} For the heat operator, the coefficient matrix is $A=I$, so \begin{align*} A\nabla u\cdot\nabla u=I\nabla u\cdot\nabla u=|\nabla u|^2. \end{align*} Thus the parabolic energy inequality with $\theta=1$ gives, for every $t\in[0,T]$, \begin{align*} \|u(t)\|_{L^2(U)}^2+\int_0^t\|\nabla u(\tau)\|_{L^2(U)}^2\,d\tau\le \|u(0)\|_{L^2(U)}^2+\int_0^t\|f(\tau)\|_{H^{-1}(U)}^2\,d\tau. \end{align*} Since $\int_0^t\|f(\tau)\|_{H^{-1}(U)}^2\,d\tau\le \int_0^T\|f(\tau)\|_{H^{-1}(U)}^2\,d\tau$, this implies \begin{align*} \|u(t)\|_{L^2(U)}^2+\int_0^t\|\nabla u(\tau)\|_{L^2(U)}^2\,d\tau\le \|u(0)\|_{L^2(U)}^2+\|f\|_{L^2(0,T;H^{-1}(U))}^2. \end{align*} Taking the supremum of the first term over $0\le t\le T$ gives \begin{align*} \sup_{0\le t\le T}\|u(t)\|_{L^2(U)}^2\le \|u(0)\|_{L^2(U)}^2+\|f\|_{L^2(0,T;H^{-1}(U))}^2. \end{align*} Taking $t=T$ in the same estimate gives \begin{align*} \int_0^T\|\nabla u(\tau)\|_{L^2(U)}^2\,d\tau\le \|u(0)\|_{L^2(U)}^2+\|f\|_{L^2(0,T;H^{-1}(U))}^2. \end{align*} Adding the last two inequalities yields \begin{align*} \sup_{0\le t\le T}\|u(t)\|_{L^2(U)}^2+\int_0^T\|\nabla u(\tau)\|_{L^2(U)}^2\,d\tau\le 2\left(\|u(0)\|_{L^2(U)}^2+\|f\|_{L^2(0,T;H^{-1}(U))}^2\right). \end{align*} The forcing term is used only through the duality pairing $f(t)(u(t))$ between $H^{-1}(U)$ and $H^1_0(U)$, so it need not be an $L^2(U)$ function at each time; it may be any square-integrable time-dependent functional on $H^1_0(U)$. 
[/example] ## Coercivity, Gronwall Estimates, and Continuous Dependence Energy identities become more powerful when the equation contains terms that do not have a fixed sign. The main question is how to keep control when lower-order terms, nonhomogeneous data, or comparisons between two solutions introduce an energy inequality with a multiple of the unknown energy on the right-hand side. [definition: Coercive Parabolic Bilinear Form] Let $T>0$, let $V$ be a Hilbert space continuously and densely embedded in a Hilbert space $H$, and identify $H$ with a subspace of $V^*$. A family of bilinear forms $a(t;\cdot,\cdot):V\times V\to\mathbb R$ is coercive on $V$ if there exist constants $\alpha>0$ and $\beta\ge 0$ such that \begin{align*} a(t;v,v)+\beta\|v\|_H^2\ge \alpha\|v\|_V^2 \end{align*} for a.e. $t\in(0,T)$ and every $v\in V$. [/definition] The parameter $\beta$ allows the lower-order part of the operator to be mildly negative. This definition is useful only if the resulting term $\beta\|v\|_H^2$ can be controlled over time, so the next step is the integral inequality that converts such terms into exponential bounds. [quotetheorem:872] [citeproof:872] For parabolic equations, Gronwall turns a differential inequality into stability with respect to initial data and forcing. The nonnegativity of $b$ is what makes the accumulated growth exponential rather than sign-cancelling; if $b$ is not integrable, the exponential factor may be infinite and the conclusion gives no finite bound. A concrete warning is $y'(t)=t^{-1}y(t)$ on $(0,T)$: every solution has the form $y(t)=Ct$ and vanishes at $t=0$, so the initial value does not determine $C$, and Gronwall gives no bound because the coefficient is not integrable near $0$, since $\int_0^t \tau^{-1}\,d\tau$ diverges. In energy estimates, this is the difference between a bounded lower-order coefficient that can be absorbed into an exponential constant and a singular coefficient that destroys continuous dependence at the initial time. 
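The role of the exponential factor can be illustrated with a small numerical sketch. It assumes the differential form of Gronwall's inequality with constant coefficients, $y'\le by+c$ with $b>0$ and $c\ge0$, which gives $y(t)\le y(0)e^{bt}+c(e^{bt}-1)/b$; the quoted lemma may be stated in a different (integral) form. The model equation $y'=by+c-y^2$ and all function names are hypothetical choices made for illustration.

```python
import math

def euler_energy(y0, b, c, t_end, steps=10000):
    """Explicit Euler for y' = b*y + c - y**2, a model satisfying y' <= b*y + c."""
    h = t_end / steps
    y = y0
    for _ in range(steps):
        y += h * (b * y + c - y * y)
    return y

def gronwall_bound(y0, b, c, t):
    """Bound y0*exp(b t) + c*(exp(b t) - 1)/b for constant b > 0 and c >= 0."""
    growth = math.exp(b * t)
    return y0 * growth + c * (growth - 1.0) / b

def singular_factor(eps, t):
    """exp of the integral of 1/tau over [eps, t], which equals t/eps."""
    return math.exp(math.log(t) - math.log(eps))

# Bounded coefficient: the computed energy stays below the Gronwall bound.
print(euler_energy(1.0, 2.0, 0.5, 1.0), gronwall_bound(1.0, 2.0, 0.5, 1.0))

# Singular coefficient b(tau) = 1/tau: the factor blows up as eps -> 0.
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, singular_factor(eps, 1.0))
```

The second loop shows the exponential factor for the warning example growing like $1/\varepsilon$ when the starting time $\varepsilon$ tends to $0$, which is the quantitative meaning of the non-integrable coefficient.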
The lemma does not improve regularity or supply dissipation, so the PDE estimate must provide those terms before Gronwall is applied. The next theorem packages this idea in the Hilbert triple framework, where the same estimate applies to the heat equation, uniformly elliptic divergence-form operators, and the stability bounds later needed for Galerkin approximations. [quotetheorem:7068] Continuous dependence says that data determine the solution stably in the natural energy norms. The regularity assumptions are needed so that $w=u_1-u_2$ can be used as an energy test function and so that $u_i(0)$ is meaningful in $H$; without time continuity in $H$, the initial condition would not be a well-defined trace. Boundedness and coercivity are the analytic substitutes for a positive definite matrix: without boundedness, $\mathcal A(t)w$ need not define an element of $V^*$ with a uniform estimate, while without coercivity the energy calculation may lose the term controlling $\|w\|_V^2$. A model failure is an elliptic part with negative sign, where the difference equation has backward-heat growth rather than damping. The theorem is still conditional on the existence of solutions in the stated class and does not produce one. If two weak heat solutions have identical data, the estimate should force their difference to vanish; the next theorem states this uniqueness conclusion in the concrete Dirichlet heat setting. [quotetheorem:692] [citeproof:692] The variable-coefficient estimate is structurally the same, but it highlights why bounded measurable coefficients are acceptable. The zero initial difference is essential: if $z(0)\ne0$, the same computation gives stability rather than equality. Homogeneous boundary data are also part of the closed energy identity; nonzero boundary data require first reducing to zero boundary values or estimating boundary terms. The next example shows that no derivatives of $A$ are needed when the equation is written in divergence form. 
[example: Variable-Coefficient Diffusion with Measurable Coefficients] Let $A\in L^\infty(U\times(0,T);\mathbb R^{n\times n})$ be uniformly elliptic with ellipticity constant $\theta>0$, and let $u$ solve \begin{align*} u_t-\operatorname{div}(A\nabla u)=0 \end{align*} with $u=0$ on $\partial U\times(0,T)$. Assume that $u$ is regular enough for the *Parabolic Energy Identity for Smooth Solutions* to apply; no regularity of $A$ beyond measurability is assumed. The homogeneous case $f=0$ of that identity gives \begin{align*} \frac{1}{2}\|u(t)\|_{L^2(U)}^2+\int_0^t\int_U A(x,\tau)\nabla u(x,\tau)\cdot\nabla u(x,\tau)\,d\mathcal L^n(x)\,d\tau=\frac{1}{2}\|u(0)\|_{L^2(U)}^2. \end{align*} Multiplying both sides by $2$ yields \begin{align*} \|u(t)\|_{L^2(U)}^2+2\int_0^t\int_U A(x,\tau)\nabla u(x,\tau)\cdot\nabla u(x,\tau)\,d\mathcal L^n(x)\,d\tau=\|u(0)\|_{L^2(U)}^2. \end{align*} For a.e. $(x,\tau)$, uniform ellipticity applied with $\xi=\nabla u(x,\tau)$ gives \begin{align*} A(x,\tau)\nabla u(x,\tau)\cdot\nabla u(x,\tau)\ge \theta|\nabla u(x,\tau)|^2. \end{align*} Therefore \begin{align*} 2\int_0^t\int_U A(x,\tau)\nabla u(x,\tau)\cdot\nabla u(x,\tau)\,d\mathcal L^n(x)\,d\tau\ge 2\theta\int_0^t\int_U|\nabla u(x,\tau)|^2\,d\mathcal L^n(x)\,d\tau. \end{align*} Since \begin{align*} \int_U|\nabla u(x,\tau)|^2\,d\mathcal L^n(x)=\|\nabla u(\tau)\|_{L^2(U)}^2, \end{align*} we obtain \begin{align*} \|u(t)\|_{L^2(U)}^2+2\theta\int_0^t\|\nabla u(\tau)\|_{L^2(U)}^2\,d\tau\le \|u(0)\|_{L^2(U)}^2. \end{align*} No derivative of $A$ appears in this estimate: the coefficient field enters only through the measurable quadratic form $A\nabla u\cdot\nabla u$, so $A$ may oscillate or jump in $(x,t)$ as long as the same ellipticity and boundedness constants hold. [/example] ## Weak Formulation in Bochner Spaces The energy estimates suggest the right regularity class, but they also create a technical question. If $u\in L^2(0,T;H^1_0(U))$ and $u_t\in L^2(0,T;H^{-1}(U))$, then $u(t)$ is not initially defined as an $L^2(U)$-valued function for every time. 
To speak about initial values and energy at time $t$, we need a time-continuity theorem. [definition: Parabolic Energy Space] Let $U\subset\mathbb R^n$ be bounded and $T>0$. The parabolic energy space with zero boundary values is \begin{align*} \mathcal W(0,T;U):=\{u\in L^2(0,T;H^1_0(U)) : u_t\in L^2(0,T;H^{-1}(U))\}. \end{align*} [/definition] The energy space records the two norms controlled by the previous estimates, but a solution concept also needs an equation, boundary condition, and initial value. The next definition packages these requirements so that the PDE is tested against $H^1_0(U)$ functions and the initial datum is imposed through time continuity. [definition: Weak Solution of a Divergence-Form Parabolic Equation] Let $U\subset\mathbb R^n$ be bounded and open, let $T>0$, let $A$ be uniformly elliptic and bounded, let $f\in L^2(0,T;H^{-1}(U))$, and let $u_0\in L^2(U)$. A function $u\in\mathcal W(0,T;U)$ is a weak solution of \begin{align*} u_t-\operatorname{div}(A\nabla u)=f,\qquad u|_{\partial U}=0,\qquad u(0)=u_0, \end{align*} if for a.e. $t\in(0,T)$ and every $v\in H^1_0(U)$, \begin{align*} u_t(t)(v)+\int_U A(x,t)\nabla u(x,t)\cdot\nabla v(x)\,d\mathcal L^n(x)=f(t)(v), \end{align*} and the continuous representative of $u$ in $C([0,T];L^2(U))$ satisfies $u(0)=u_0$. [/definition] The last clause is meaningful only after proving that functions in the parabolic energy space have continuous $L^2$ representatives. Both assumptions in the energy space matter: an $L^2(0,T;V)$ function alone can be redefined on time slices arbitrarily, while control of $u'$ in $L^2(0,T;V^*)$ prevents such jumps in the weak sense. The result still gives continuity only in $H$, not in the stronger space $V$. The next theorem supplies that missing link and also justifies differentiating the square of the $H$-norm in weak energy calculations. [quotetheorem:7069] [citeproof:7069] With this lemma, the weak formulation has the same energy mechanism as the smooth one. 
The Gelfand triple hypothesis is essential: it supplies the duality pairing $V^*,V$ and the intermediate norm in which time traces live. The assumption $u\in L^2(0,T;V)$ alone cannot provide such traces: if $v\in V$ is nonzero and $u(t)=\mathbb{1}_{(T/2,T)}(t)v$, then $u$ belongs to $L^2(0,T;V)$, but its value at $T/2$ is not determined by the equivalence class and its distributional time derivative is a Dirac mass rather than an element of $L^2(0,T;V^*)$. The lemma does not identify boundary traces or improve spatial regularity, so it must be combined with the weak formulation rather than replacing it. The final theorem in this chapter records that the parabolic energy inequality remains valid in the weak class, which is the estimate later used in Galerkin compactness and stability arguments. [quotetheorem:7070] [citeproof:7070] This completes the passage from classical identities to weak parabolic theory. The estimate is also the compactness input behind the variational construction of solutions: boundedness in $L^2(0,T;H^1_0(U))$ and control of $u_t$ in $L^2(0,T;H^{-1}(U))$ are the hypotheses that make weak compactness and time-trace arguments usable. In semigroup language, the same dissipation is the Hilbert-space shadow of contractivity for the heat flow; in [numerical analysis](/page/Numerical%20Analysis), it is the continuum version of the stability estimate sought for implicit time-stepping schemes. The same coercive structure also reappears in stochastic diffusion through Dirichlet forms, where energy decay encodes the irreversible spreading of heat. Later chapters use this pattern for existence by Galerkin approximation, smoothing by bootstrapping, and semilinear problems where the energy inequality is closed by structural assumptions on the nonlinearity. Chapter 3 showed how energy identities and coercivity control parabolic flows even without an explicit kernel. 
The natural next step is to package those ideas in the language of generators and semigroups, which applies on general Banach and Hilbert spaces and unifies many existence arguments. # 4. Semigroups and Abstract Parabolic Theory This chapter replaces the heat-kernel formulas of Chapter 2 and the energy estimates of Chapter 3 by an operator-theoretic language that applies on general Banach and Hilbert spaces. The guiding question is: when does an unbounded spatial operator $A$ define a well-posed evolution $u_t = Au + f$? Semigroup theory answers this by encoding the solution operators $S(t)$ first, and then recovering the differential operator as their infinitesimal generator. ## Strongly Continuous Semigroups and Infinitesimal Generators For an autonomous linear Cauchy problem, the expected solution operators should satisfy two structural rules: evolving for time $s$ and then time $t$ is the same as evolving for time $s+t$, and evolving for time $0$ does nothing. On infinite-dimensional spaces, norm-continuity in time is too restrictive for the PDE examples we need, so the correct continuity requirement is pointwise in the Banach-space norm. [definition: Strongly Continuous Semigroup] Let $X$ be a Banach space. A family $(S(t))_{t \ge 0} \subset \mathcal{L}(X)$ is a strongly continuous semigroup on $X$ if: 1. $S(0)=I$; 2. $S(t+s)=S(t)S(s)$ for all $t,s \ge 0$; 3. for every $u \in X$, $S(t)u \to u$ in $X$ as $t \downarrow 0$. [/definition] The third condition is strong continuity at $0$; the semigroup law then gives strong continuity at every $t \ge 0$. The notation $S(t)$ is chosen to remind us that these operators play the role of $e^{tA}$ even when $A$ is unbounded. [example: Heat Semigroup On Euclidean Space] Let $X=L^p(\mathbb R^n)$ for $1\le p<\infty$. For $t>0$ define \begin{align*} G_t(x)=(4\pi t)^{-n/2}e^{-|x|^2/(4t)} \end{align*} and set $S(t)u_0=G_t*u_0$, while $S(0)u_0=u_0$. 
First, $\int_{\mathbb R^n}G_t\,d\mathcal L^n=1$, since the change of variables $x=\sqrt{4t}\,z$ gives \begin{align*} \int_{\mathbb R^n}(4\pi t)^{-n/2}e^{-|x|^2/(4t)}\,dx=(4\pi t)^{-n/2}(4t)^{n/2}\int_{\mathbb R^n}e^{-|z|^2}\,dz=1. \end{align*} Here $\int_{\mathbb R^n}e^{-|z|^2}\,dz=\pi^{n/2}$, so the final factor is $(4\pi t)^{-n/2}(4t)^{n/2}\pi^{n/2}=1$. Hence Young's convolution inequality gives \begin{align*} \|S(t)u_0\|_{L^p}\le \|G_t\|_{L^1}\|u_0\|_{L^p}=\|u_0\|_{L^p}. \end{align*} For $s,t>0$, compute the convolution kernel explicitly: \begin{align*} (G_t*G_s)(x)=\int_{\mathbb R^n}(4\pi t)^{-n/2}(4\pi s)^{-n/2}e^{-|x-y|^2/(4t)}e^{-|y|^2/(4s)}\,dy. \end{align*} The exponent combines by the identity \begin{align*} \frac{|x-y|^2}{t}+\frac{|y|^2}{s}=\frac{t+s}{ts}\left|y-\frac{s}{t+s}x\right|^2+\frac{|x|^2}{t+s}. \end{align*} Substituting this identity into the integral gives \begin{align*} (G_t*G_s)(x)=(4\pi t)^{-n/2}(4\pi s)^{-n/2}e^{-|x|^2/(4(t+s))}\int_{\mathbb R^n}e^{-(t+s)|y-\frac{s}{t+s}x|^2/(4ts)}\,dy. \end{align*} With $z=\sqrt{(t+s)/(4ts)}\left(y-\frac{s}{t+s}x\right)$, the remaining Gaussian integral is \begin{align*} \int_{\mathbb R^n}e^{-(t+s)|y-\frac{s}{t+s}x|^2/(4ts)}\,dy=\left(\frac{4ts}{t+s}\right)^{n/2}\pi^{n/2}. \end{align*} Therefore \begin{align*} (G_t*G_s)(x)=(4\pi(t+s))^{-n/2}e^{-|x|^2/(4(t+s))}=G_{t+s}(x). \end{align*} Thus $S(t)S(s)u_0=G_t*(G_s*u_0)=(G_t*G_s)*u_0=G_{t+s}*u_0=S(t+s)u_0$. Finally, $(G_t)_{t>0}$ is an approximate identity: $G_t\ge0$, $\|G_t\|_{L^1}=1$, and for every $\delta>0$, \begin{align*} \int_{|x|>\delta}G_t(x)\,dx=\pi^{-n/2}\int_{|z|>\delta/\sqrt{4t}}e^{-|z|^2}\,dz\to0 \end{align*} as $t\downarrow0$. Hence $G_t*u_0\to u_0$ in $L^p(\mathbb R^n)$ for $1\le p<\infty$. The heat kernels therefore form a strongly continuous semigroup, justifying the notation $S(t)=e^{t\Delta}$ for this evolution. [/example] The heat semigroup gives the time evolution, but the original PDE is written using the spatial operator $\Delta$. 
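The convolution identity $G_t*G_s=G_{t+s}$ derived in the heat semigroup example can be checked numerically. The following is a minimal one-dimensional sketch using midpoint quadrature; the truncation window and step count are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def G(t, x):
    """One-dimensional heat kernel G_t(x) = (4 pi t)^(-1/2) exp(-x^2 / (4t))."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def convolve_kernels(t, s, x, half_width=12.0, steps=4000):
    """Midpoint approximation of (G_t * G_s)(x), the integral of G_t(x - y) G_s(y) dy.

    The window [-L, L] with L = half_width * sqrt(s) is an assumed
    truncation; G_s is negligible outside it.
    """
    L = half_width * math.sqrt(s)
    h = 2.0 * L / steps
    total = 0.0
    for k in range(steps):
        y = -L + (k + 0.5) * h
        total += G(t, x - y) * G(s, y)
    return total * h

# Compare the convolution with the kernel at time t + s at a few points.
t, s = 0.3, 0.7
for x in (0.0, 0.5, 2.0):
    print(x, convolve_kernels(t, s, x), G(t + s, x))
```

Agreement of the two columns is the kernel-level form of the semigroup law $S(t)S(s)=S(t+s)$.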
To decide which abstract operator is hidden inside a general semigroup, we need a definition that extracts the derivative of the orbit $t\mapsto S(t)u$ at the initial time. [definition: Infinitesimal Generator] Let $(S(t))_{t \ge 0}$ be a strongly continuous semigroup on a Banach space $X$. Its infinitesimal generator $A$ is the operator with domain \begin{align*} D(A)=\left\{u\in X: \lim_{t\downarrow 0}\frac{S(t)u-u}{t}\text{ exists in }X\right\}, \end{align*} and action \begin{align*} Au=\lim_{t\downarrow 0}\frac{S(t)u-u}{t}, \qquad u\in D(A). \end{align*} [/definition] The generator is often unbounded, and its domain is part of the data. For the heat semigroup on $L^2(\mathbb R^n)$, the generator is the Laplacian with domain $H^2(\mathbb R^n)$; for the Dirichlet heat flow on a bounded domain, the generator is the Dirichlet Laplacian. [example: Dirichlet Heat Semigroup] Let $\Omega\subset\mathbb R^n$ be bounded with [boundary regularity](/theorems/99) sufficient for the elliptic domain identity \begin{align*}D(\Delta_D)=H^2(\Omega)\cap H^1_0(\Omega)\end{align*} on $L^2(\Omega)$. For $u_0\in L^2(\Omega)$, define $S(t)u_0$ to be the weak solution $u(t)$ of \begin{align*}u_t=\Delta u,\qquad u|_{\partial\Omega}=0,\qquad u(0)=u_0.\end{align*} For smooth initial data, multiply the equation by $u(t)$ and integrate over $\Omega$. 
Since $u(t)$ has zero trace on $\partial\Omega$, integration by parts gives \begin{align*}\int_\Omega u_t(t)u(t)\,d\mathcal L^n=\int_\Omega \Delta u(t)u(t)\,d\mathcal L^n=-\int_\Omega |\nabla u(t)|^2\,d\mathcal L^n.\end{align*} The left-hand side is the time derivative of the squared $L^2$ norm divided by $2$: \begin{align*}\int_\Omega u_t(t)u(t)\,d\mathcal L^n=\frac{1}{2}\frac{d}{dt}\int_\Omega |u(t)|^2\,d\mathcal L^n.\end{align*} Therefore \begin{align*}\frac{1}{2}\frac{d}{dt}\|u(t)\|_{L^2}^2=-\|\nabla u(t)\|_{L^2}^2\le 0.\end{align*} Integrating from $0$ to $t$ gives \begin{align*}\|u(t)\|_{L^2}^2+2\int_0^t\|\nabla u(r)\|_{L^2}^2\,dr=\|u_0\|_{L^2}^2.\end{align*} Hence \begin{align*}\|S(t)u_0\|_{L^2}=\|u(t)\|_{L^2}\le \|u_0\|_{L^2}.\end{align*} For general $u_0\in L^2(\Omega)$, choose smooth zero-trace data $u_0^{(k)}\to u_0$ in $L^2(\Omega)$. The estimate above gives \begin{align*}\|S(t)u_0^{(k)}-S(t)u_0^{(m)}\|_{L^2}\le \|u_0^{(k)}-u_0^{(m)}\|_{L^2},\end{align*} so $S(t)u_0^{(k)}$ converges in $L^2(\Omega)$ and defines $S(t)u_0$. The same estimate passes to the limit, so $S(t)$ is a contraction on $L^2(\Omega)$. Finally, uniqueness of the weak heat solution implies $S(t)S(s)u_0=S(t+s)u_0$, because for fixed $s\ge0$ the functions $t\mapsto S(t)S(s)u_0$ and $t\mapsto S(t+s)u_0$ both solve the zero-boundary heat equation with initial datum $S(s)u_0$. Thus $S(t)=e^{t\Delta_D}$ is the Dirichlet heat semigroup, and the boundary condition is encoded in the generator domain $H^2(\Omega)\cap H^1_0(\Omega)$. [/example] The example shows that constructing a semigroup is equivalent to proving a full well-posedness theory for the associated PDE. The next question is how to recognise, from an unbounded operator $A$ and its resolvent equations, whether such a semigroup exists. 
[quotetheorem:3139] [citeproof:3139]

The density hypothesis is not cosmetic: if $D(A)$ is not dense, the derivative at $t=0$ cannot encode a strongly continuous evolution on all of $X$, since generator domains of strongly continuous semigroups are automatically dense. Closedness rules out operators whose graph limits produce incompatible values; without it, the resolvent equation may look solvable on a core but fail to define stable dynamics after taking limits. The resolvent estimates are the analytic substitute for the bound $\|S(t)\|_{\mathcal{L}(X)}\le Me^{\omega t}$; for instance, a formal differential operator with badly posed elliptic resolvent equations cannot generate a semigroup even if it is densely defined. The theorem does not say that the semigroup is contractive, analytic, compact, or smoothing: it gives existence and growth control only. In Hilbert spaces and contraction settings, the energy inequality suggests a more direct test for generation, which leads to dissipative operators.

## Dissipative Operators and the Lumer-Phillips Theorem

The main problem in parabolic theory is to turn an energy estimate into existence, not just uniqueness. Dissipativity captures the infinitesimal form of the estimate $\|S(t)u\|_X\le \|u\|_X$, while a range condition supplies the elliptic solvability needed to build the flow.

[definition: Dissipative Operator] Let $X$ be a Banach space. A densely defined linear operator $A:D(A)\subset X\to X$ is dissipative if for every $u\in D(A)$ and every $\lambda>0$, \begin{align*} \|\lambda u-Au\|_X\ge \lambda\|u\|_X. \end{align*} [/definition]

In a Hilbert space, this condition has a familiar inner-product form. If $H$ is a real Hilbert space and $A$ is densely defined, then dissipativity is implied by $(Au,u)_H\le 0$ for all $u\in D(A)$.

[example: Dirichlet Laplacian Is Dissipative] Let $H=L^2(\Omega)$ and let $A=\Delta$ with domain $D(A)=H^2(\Omega)\cap H^1_0(\Omega)$.
For $u\in D(A)$, the trace of $u$ on $\partial\Omega$ is zero, so Green's identity gives \begin{align*} (Au,u)_{L^2}=\int_\Omega (\Delta u)u\,d\mathcal L^n=-\int_\Omega \nabla u\cdot\nabla u\,d\mathcal L^n+\int_{\partial\Omega}\frac{\partial u}{\partial\nu}u\,dS. \end{align*} The boundary integral is $0$ because $u|_{\partial\Omega}=0$, hence \begin{align*} (Au,u)_{L^2}=-\int_\Omega |\nabla u|^2\,d\mathcal L^n\le 0. \end{align*} To match the Banach-space definition of dissipativity, fix $\lambda>0$. If $u=0$, the inequality is immediate. If $u\ne0$, then \begin{align*} \lambda\|u\|_{L^2}^2\le \lambda\|u\|_{L^2}^2-(Au,u)_{L^2}. \end{align*} The right-hand side equals an inner product: \begin{align*} \lambda\|u\|_{L^2}^2-(Au,u)_{L^2}=(\lambda u-Au,u)_{L^2}. \end{align*} Using $|(v,u)_{L^2}|\le \|v\|_{L^2}\|u\|_{L^2}$ with $v=\lambda u-Au$ gives \begin{align*} \lambda\|u\|_{L^2}^2\le \|\lambda u-Au\|_{L^2}\|u\|_{L^2}. \end{align*} Dividing by $\|u\|_{L^2}$ yields \begin{align*} \|\lambda u-Au\|_{L^2}\ge \lambda\|u\|_{L^2}. \end{align*} Thus the Dirichlet Laplacian is dissipative: the zero boundary condition converts integration by parts into loss of $L^2$ energy. The remaining generation input is not another energy estimate, but solvability of $(\lambda I-\Delta)u=g$ with zero boundary values, supplied by elliptic theory. [/example] The example separates the two ingredients: an energy inequality gives dissipativity, and elliptic theory gives a range condition. This separation is not accidental. Dissipativity alone controls how two possible solutions move apart, but it does not guarantee that the resolvent equation can be solved for arbitrary data; a range condition alone gives solvability without any stability. The generation criterion below identifies exactly when these two static operator properties assemble into a contraction semigroup. [quotetheorem:7071] [citeproof:7071] This theorem explains why energy methods and elliptic estimates repeatedly appear together. 
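To make the two ingredients concrete, here is a one-dimensional sketch, assuming $\Omega=(0,\pi)$ and the sine expansion $g=\sum_{k\ge1}g_k\sin(kx)$ of a given $g\in L^2(0,\pi)$ (the coefficients $g_k$ and $u_k$ are notation introduced only for this illustration). The resolvent equation $\lambda u-u''=g$ with $u(0)=u(\pi)=0$ is solved mode by mode: \begin{align*} u=\sum_{k\ge1}u_k\sin(kx),\qquad u_k=\frac{g_k}{\lambda+k^2}. \end{align*} Since $\lambda+k^2\ge\lambda$ and $k^2\le\lambda+k^2$ for $\lambda>0$, the coefficients satisfy \begin{align*} \sum_{k\ge1}|u_k|^2\le\frac{1}{\lambda^2}\sum_{k\ge1}|g_k|^2,\qquad \sum_{k\ge1}k^4|u_k|^2\le\sum_{k\ge1}|g_k|^2. \end{align*} By Parseval's identity, the first bound is the dissipativity estimate $\lambda\|u\|_{L^2}\le\|\lambda u-Au\|_{L^2}$ read from the side of the data, and the second shows that $u''\in L^2(0,\pi)$, so the solution lies in $H^2(0,\pi)\cap H^1_0(0,\pi)$ for every $g$. In this model case that is exactly the range condition.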
Dense domain is still essential, because a contraction semigroup has a densely defined generator and therefore no operator whose domain lies in a closed proper subspace can qualify. Dissipativity alone gives at most uniqueness and stability: an operator may satisfy the energy inequality on its domain but fail to solve $(\lambda I-A)u=g$ for arbitrary $g\in X$, so there is no way to evolve general initial data. Conversely, the range condition without dissipativity gives solvability of elliptic resolvent equations but no contraction estimate, and the resulting evolution, if it exists, need not be stable. Lumer-Phillips also does not provide analyticity, compactness, or smoothing; the translation semigroup on $L^p(\mathbb R)$ is a contraction semigroup, yet it preserves roughness rather than regularising it. The theorem therefore converts energy plus elliptic solvability into well-posed time evolution, while the stronger parabolic features require extra structure.

[example: Transport Semigroup] Let $X=L^p(\mathbb R)$ for $1\le p<\infty$, and define \begin{align*} (S(t)u_0)(x)=u_0(x-t), \qquad t\ge0. \end{align*} The identity at $t=0$ is immediate, since $(S(0)u_0)(x)=u_0(x)$. For $s,t\ge0$, \begin{align*} (S(t)S(s)u_0)(x)=(S(s)u_0)(x-t)=u_0((x-t)-s)=u_0(x-(t+s))=(S(t+s)u_0)(x). \end{align*} Thus $S(t)S(s)=S(t+s)$. Each $S(t)$ preserves the $L^p$ norm. Indeed, using the change of variables $y=x-t$, \begin{align*} \|S(t)u_0\|_{L^p}^p=\int_{\mathbb R}|u_0(x-t)|^p\,dx=\int_{\mathbb R}|u_0(y)|^p\,dy=\|u_0\|_{L^p}^p. \end{align*} Taking $p$-th roots gives $\|S(t)u_0\|_{L^p}=\|u_0\|_{L^p}$.
Strong continuity follows from the continuity of translations in $L^p(\mathbb R)$: first prove it for $\varphi\in C_c^\infty(\mathbb R)$ by [uniform continuity](/page/Uniform%20Continuity) and compact support, then approximate an arbitrary $u_0\in L^p(\mathbb R)$ by such $\varphi$ and use the isometry estimate \begin{align*} \|S(t)u_0-u_0\|_{L^p}\le \|S(t)(u_0-\varphi)\|_{L^p}+\|S(t)\varphi-\varphi\|_{L^p}+\|\varphi-u_0\|_{L^p}. \end{align*} Since the first term equals $\|u_0-\varphi\|_{L^p}$ and the middle term tends to $0$ as $t\downarrow0$, we get $S(t)u_0\to u_0$ in $L^p$. For the generator, take $u\in W^{1,p}(\mathbb R)$. For almost every $x$, \begin{align*} u(x-t)-u(x)=-\int_{x-t}^{x}u'(r)\,dr=-\int_0^t u'(x-\theta)\,d\theta. \end{align*} Therefore \begin{align*} \frac{S(t)u-u}{t}+u'=-\frac{1}{t}\int_0^t u'(\cdot-\theta)\,d\theta+u'=\frac{1}{t}\int_0^t\bigl(u'-S(\theta)u'\bigr)\,d\theta. \end{align*} Taking norms and using [Minkowski's integral inequality](/theorems/464), \begin{align*} \left\|\frac{S(t)u-u}{t}+u'\right\|_{L^p}\le \frac{1}{t}\int_0^t\|u'-S(\theta)u'\|_{L^p}\,d\theta. \end{align*} The integrand tends to $0$ as $\theta\downarrow0$ by strong continuity of translations, so the average tends to $0$. Hence every $u\in W^{1,p}(\mathbb R)$ lies in the generator domain and $Au=-u'$. Conversely, if the generator limit exists in $L^p$ and equals $g$, then for every test function $\phi\in C_c^\infty(\mathbb R)$, \begin{align*} \int_{\mathbb R}g(x)\phi(x)\,dx=\lim_{t\downarrow0}\int_{\mathbb R}\frac{u(x-t)-u(x)}{t}\phi(x)\,dx. \end{align*} Changing variables in the first term gives \begin{align*} \int_{\mathbb R}\frac{u(x-t)-u(x)}{t}\phi(x)\,dx=\int_{\mathbb R}u(y)\frac{\phi(y+t)-\phi(y)}{t}\,dy. \end{align*} Since $\bigl(\phi(\cdot+t)-\phi\bigr)/t\to\phi'$ uniformly with common compact support, the limit is \begin{align*} \int_{\mathbb R}g(x)\phi(x)\,dx=\int_{\mathbb R}u(y)\phi'(y)\,dy. 
\end{align*} Thus the weak derivative of $u$ is $-g\in L^p(\mathbb R)$, so $u\in W^{1,p}(\mathbb R)$ and $Au=g=-u'$. The generator is therefore $A=-\frac{d}{dx}$ with domain $W^{1,p}(\mathbb R)$. Unlike the heat semigroup, this evolution has no smoothing: $S(t)u_0$ is exactly the profile $u_0$ shifted to the right by distance $t$, so it preserves the $L^p$ size and cannot create spatial regularity that was not already present. [/example]

The transport example is important because semigroup theory is not only parabolic. The difference between heat and transport is not the existence of a semigroup, but the extra regularity and decay properties of that semigroup.

## Analytic Semigroups and Maximal Regularity as a Guiding Principle

Contraction semigroups give well-posedness, but parabolic equations usually do more: they smooth instantly and allow estimates on both $u_t$ and $Au$. Analytic semigroups formalise the idea that $S(t)$ behaves like a [holomorphic function](/page/Holomorphic%20Function) of time for $t>0$, which is the operator-theoretic source of parabolic regularisation.

[definition: Analytic Semigroup] Let $X$ be a Banach space. A strongly continuous semigroup $(S(t))_{t\ge0}$ is analytic if there exists an angle $\theta\in(0,\pi/2]$ and a holomorphic map $S:\Sigma_\theta\to\mathcal{L}(X)$ on \begin{align*} \Sigma_\theta=\{z\in\mathbb C\setminus\{0\}: |\arg z|<\theta\} \end{align*} such that $S(t)$ agrees with the original semigroup for $t>0$ and $S(z_1+z_2)=S(z_1)S(z_2)$ whenever $z_1,z_2,z_1+z_2\in\Sigma_\theta$. [/definition]

The definition gives a complex-time formulation, but for PDEs we need a real-time consequence: smoothing estimates for the generator. The Dirichlet heat semigroup is the reference case where analyticity and the bound on $\Delta_D S(t)$ can be read from the spectral theorem.
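The calculus fact behind such a bound can be sketched directly, assuming an orthonormal eigenbasis with $-\Delta_D e_k=\lambda_k e_k$ and $S(t)e_k=e^{-\lambda_k t}e_k$, and making no claim about the sharp constant. For fixed $t>0$, the function $\lambda\mapsto\lambda e^{-t\lambda}$ on $[0,\infty)$ has derivative $(1-t\lambda)e^{-t\lambda}$ and is therefore maximised at $\lambda=1/t$, so \begin{align*} \sup_{\lambda\ge0}\lambda e^{-t\lambda}=\frac{1}{et}. \end{align*} Consequently, for $u_0\in L^2(\Omega)$, \begin{align*} \|\Delta_D S(t)u_0\|_{L^2}^2=\sum_{k\ge1}\lambda_k^2e^{-2\lambda_k t}|(u_0,e_k)_{L^2}|^2\le\frac{1}{e^2t^2}\|u_0\|_{L^2}^2. \end{align*} The factor $t^{-1}$ is the price of applying an unbounded operator to data that is only in $L^2$.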
[quotetheorem:7072] [citeproof:7072] The estimate shows the singular behaviour near $t=0$ that accompanies smoothing: $S(t)u_0$ gains access to the operator domain for $t>0$, but the bound necessarily degenerates as $t\downarrow0$ for rough initial data. The self-adjoint nonnegative realisation matters here, because the proof uses the spectral theorem; for rougher boundary conditions, non-self-adjoint coefficients, or Banach-space realisations, analyticity can fail or require sectorial operator estimates instead. Transport gives a concrete contrast: its translation semigroup is strongly continuous and contractive, but not analytic and not smoothing. Analyticity also does not by itself solve nonlinear equations or give arbitrary spatial regularity; it supplies operator estimates that must be combined with forcing assumptions and elliptic information. To solve an inhomogeneous equation, we now need a formulation that combines the homogeneous semigroup with a source term without requiring the solution to lie in $D(A)$ at every time. [definition: Mild Solution] Let $A$ generate a strongly continuous semigroup $(S(t))_{t\ge0}$ on a Banach space $X$. Given $u_0\in X$ and $f\in L^1(0,T;X)$, a mild solution of \begin{align*} u_t=Au+f, \qquad u(0)=u_0, \end{align*} on $[0,T]$ is a function $u\in C([0,T];X)$ satisfying \begin{align*} u(t)=S(t)u_0+\int_0^t S(t-s)f(s)\,ds \end{align*} for every $t\in[0,T]$. [/definition] The mild formulation is the abstract Duhamel formula from Chapter 1, and it only uses bounded operators $S(t-s)$ together with a Bochner integral of $f$. This formula is not merely formal: strong continuity of the semigroup and $f\in L^1(0,T;X)$ make the integral term continuous in $X$, while uniqueness follows by subtracting two candidates and using the semigroup representation. This result is the workhorse existence theorem for abstract parabolic problems, but each assumption is doing work. 
Strong continuity is needed to make $t\mapsto S(t)u_0$ continuous at the initial time; without it, the formula could define an orbit that fails the basic Cauchy-problem requirement even before adding $f$. The Banach-space setting is used because Bochner integration and completeness ensure that the integral term belongs to $X$ and depends continuously on time. The condition $f\in L^1(0,T;X)$ is the natural minimal assumption for the Duhamel integral; if $f$ is not Bochner integrable, the expression may not define an $X$-valued function. The theorem also does not assert that $u(t)\in D(A)$, that $u_t$ exists as an $X$-valued function, or that any smoothing occurs. The regularity of the resulting solution is determined by extra structure of the semigroup and by stronger assumptions on the forcing term. [explanation: Maximal Regularity Principle] Maximal regularity is the principle that the equation $u_t=Au+f$ should control the two terms $u_t$ and $Au$ in the same function space as $f$, after the initial datum is placed in the correct trace space. A typical target estimate has the form \begin{align*} \|u_t\|_{L^p(0,T;X)}+\|Au\|_{L^p(0,T;X)}\le C\|f\|_{L^p(0,T;X)} \end{align*} for the zero-initial-data problem. This course uses maximal regularity as a guiding principle rather than as a fully developed theory. The point is to distinguish existence of a mild solution from estimates strong enough to treat nonlinear perturbations. Analytic semigroups supply the first layer of this principle through bounds such as $\|AS(t)\|\lesssim t^{-1}$, with constants depending on the operator and the time interval under consideration. [/explanation] The final comparison links the abstract vocabulary back to the weak formulations used earlier in the course. Mild solutions are defined by semigroup formulas; weak solutions are defined by testing against spatial functions and integrating by parts. 
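As a warm-up in a single mode, assume $X=\mathbb R$, $A=-\lambda$ with $\lambda>0$, and constant forcing $f\equiv f_0$ (all three are simplifying assumptions made only for illustration). Then $S(t)=e^{-\lambda t}$ and the mild formula reads \begin{align*} u(t)=e^{-\lambda t}u_0+\int_0^t e^{-\lambda(t-s)}f_0\,ds=e^{-\lambda t}u_0+\frac{f_0}{\lambda}\bigl(1-e^{-\lambda t}\bigr). \end{align*} Differentiating gives \begin{align*} u'(t)=-\lambda e^{-\lambda t}u_0+f_0e^{-\lambda t}=-\lambda u(t)+f_0, \end{align*} so in this scalar case the mild solution is a classical solution, and it converges to the equilibrium $f_0/\lambda$ as $t\to\infty$.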
[example: Comparison Of Mild And Weak Heat Solutions] Let $\Omega\subset\mathbb R^n$ be bounded, let $A=\Delta_D$ on $L^2(\Omega)$, and let $f\in L^2(0,T;H^{-1}(\Omega))$ with $u_0\in L^2(\Omega)$. The weak formulation asks that \begin{align*} \frac{d}{dt}(u(t),v)_{L^2}+\int_\Omega \nabla u(t)\cdot\nabla v\,d\mathcal L^n=f(t)(v) \end{align*} for every $v\in H^1_0(\Omega)$ in the distributional sense in time. Now assume in addition that $f\in L^2(0,T;L^2(\Omega))$. Let $(e_k)_{k\ge1}$ be an [orthonormal basis](/page/Orthonormal%20Basis) of $L^2(\Omega)$ consisting of Dirichlet eigenfunctions of $-\Delta_D$, so \begin{align*} -\Delta_D e_k=\lambda_k e_k, \qquad \lambda_k\ge0. \end{align*} The mild solution is \begin{align*} u(t)=S(t)u_0+\int_0^t S(t-s)f(s)\,ds. \end{align*} Testing this formula against $e_k$ and using $S(t)e_k=e^{-\lambda_k t}e_k$ gives \begin{align*} (u(t),e_k)_{L^2}=e^{-\lambda_k t}(u_0,e_k)_{L^2}+\int_0^t e^{-\lambda_k(t-s)}(f(s),e_k)_{L^2}\,ds. \end{align*} Differentiating the first term gives \begin{align*} \frac{d}{dt}\left[e^{-\lambda_k t}(u_0,e_k)_{L^2}\right]=-\lambda_k e^{-\lambda_k t}(u_0,e_k)_{L^2}. \end{align*} Differentiating the integral term by the fundamental theorem of calculus and the derivative of the kernel gives \begin{align*} \frac{d}{dt}\int_0^t e^{-\lambda_k(t-s)}(f(s),e_k)_{L^2}\,ds=(f(t),e_k)_{L^2}-\lambda_k\int_0^t e^{-\lambda_k(t-s)}(f(s),e_k)_{L^2}\,ds. \end{align*} Adding these two identities yields, for almost every $t$, \begin{align*} \frac{d}{dt}(u(t),e_k)_{L^2}=-\lambda_k(u(t),e_k)_{L^2}+(f(t),e_k)_{L^2}. \end{align*} Since $-\Delta_D e_k=\lambda_k e_k$, the Dirichlet form identity gives \begin{align*} \int_\Omega \nabla u(t)\cdot\nabla e_k\,d\mathcal L^n=\lambda_k(u(t),e_k)_{L^2}. \end{align*} Therefore \begin{align*} \frac{d}{dt}(u(t),e_k)_{L^2}+\int_\Omega \nabla u(t)\cdot\nabla e_k\,d\mathcal L^n=(f(t),e_k)_{L^2}. 
\end{align*} By linearity, the same identity holds for every finite linear combination of the eigenfunctions. These finite combinations are dense in $H^1_0(\Omega)$ with respect to the Dirichlet form norm, so passing to the limit gives the weak formulation for every $v\in H^1_0(\Omega)$. Thus, when the forcing is $L^2$-valued, the semigroup formula and the weak energy solution describe the same function; the equality is forced by uniqueness of weak solutions. [/example]

Semigroup theory therefore does not replace the energy method; it packages it. The generator records the spatial operator and boundary condition, the semigroup records homogeneous evolution, and Duhamel's formula inserts forcing. Later hyperbolic theory will use related first-order formulations, but without the same analytic smoothing that characterises parabolic evolution.

The semigroup viewpoint identifies the generator, the homogeneous evolution, and the Duhamel term as the core of parabolic dynamics. Having established well-posedness and regularisation on finite intervals, we now ask what these flows do in the large-time limit.

# 5. Long-Time Behavior of Parabolic Flows

This chapter studies what parabolic evolution equations do after the initial smoothing period has passed. Chapters 2 through 4 gave existence, uniqueness, energy identities, semigroup formulations, and regularisation estimates on finite time intervals. The new question is asymptotic: does the solution approach an equilibrium, and if so at what rate? On bounded domains the answer is often exponential because the spectrum has a gap, while on $\mathbb R^n$ the low-frequency part of the heat flow prevents any uniform exponential rate and leads instead to algebraic decay.

## Exponential Decay from a Spectral Gap

The first long-time mechanism is coercivity.
For the heat equation on a bounded domain, the Dirichlet energy controls the $L^2$ norm through Poincare's inequality, so the standard energy identity becomes a differential inequality for the size of the solution. Let $U \subset \mathbb R^n$ be a bounded connected domain with enough boundary regularity for the Poincare inequality on $H^1_0(U)$ to hold. We consider the homogeneous Dirichlet heat equation $\partial_t u-\Delta u=0$ in $U\times(0,\infty)$, with $u=0$ on $\partial U\times(0,\infty)$ and initial condition $u(0)=u_0$. The boundary condition removes constants and creates a positive first eigenvalue. [quotetheorem:7073] [citeproof:7073] This theorem is the model for many bounded-domain decay arguments: energy dissipation alone gives monotonicity, and the spectral gap upgrades monotonicity to exponential decay. Each hypothesis has a precise role. Boundedness is what prevents energy from escaping to arbitrarily large spatial scales; on $\mathbb R^n$ the functions $v_R(x)=R^{-n/2}\phi(x/R)$ have $\|v_R\|_{L^2}$ fixed while $\|\nabla v_R\|_{L^2}\to0$, so no inequality with a positive $\lambda_1$ can hold. Connectedness and the Dirichlet condition remove zero-energy modes: if there were no boundary condition, constants would satisfy $\nabla v=0$ without having zero $L^2$ norm. The theorem is also only an $L^2$ decay statement; it does not by itself identify pointwise asymptotics or higher-regularity decay without additional smoothing estimates. The number $\lambda_1$ is the sharp exponential rate in $L^2$ unless the initial datum is orthogonal to the first eigenspace. [example: Heat Equation on an Interval] Let $U=(0,L)$, impose $u(0,t)=u(L,t)=0$, and set $\phi_k(x)=\sin(k\pi x/L)$ for $k\ge1$. Then $\phi_k(0)=\phi_k(L)=0$, and differentiating twice gives \begin{align*} -\phi_k''(x)=-\frac{d^2}{dx^2}\sin(k\pi x/L)=\left(\frac{k\pi}{L}\right)^2\sin(k\pi x/L). 
\end{align*} Thus the Dirichlet eigenvalues of $-\partial_x^2$ on $(0,L)$ are $(k\pi/L)^2$, and the first one is $\lambda_1=\pi^2/L^2$. Writing the sine expansion of the initial datum as \begin{align*} u_0(x)=\sum_{k=1}^{\infty}a_k\sin(k\pi x/L),\qquad a_k=\frac{2}{L}\int_0^L u_0(x)\sin(k\pi x/L)\,dx, \end{align*} the heat flow multiplies the $k$th eigenmode by $e^{-(k\pi/L)^2t}$, so \begin{align*} u(x,t)=\sum_{k=1}^{\infty} a_k e^{-(k\pi/L)^2t}\sin(k\pi x/L). \end{align*} Because the sine functions are orthogonal and $\int_0^L\sin^2(k\pi x/L)\,dx=L/2$, the squared $L^2$ norm is \begin{align*} \|u(t)\|_{L^2(0,L)}^2=\frac{L}{2}\sum_{k=1}^{\infty}a_k^2 e^{-2(k\pi/L)^2t}. \end{align*} If $a_1\ne0$, then multiplying by $e^{2\pi^2t/L^2}$ gives \begin{align*} e^{2\pi^2t/L^2}\|u(t)\|_{L^2(0,L)}^2=\frac{L}{2}a_1^2+\frac{L}{2}\sum_{k=2}^{\infty}a_k^2 e^{-2((k\pi/L)^2-\pi^2/L^2)t}. \end{align*} Every exponent in the remaining sum is negative, so the sum tends to $0$ as $t\to\infty$. Hence \begin{align*} \|u(t)\|_{L^2(0,L)}\sim \left(\frac{L}{2}\right)^{1/2}|a_1|e^{-\pi^2t/L^2}. \end{align*} The first nonzero sine coefficient therefore determines the leading exponential rate; when $a_1\ne0$, the sharp rate is exactly $e^{-\lambda_1 t}=e^{-\pi^2t/L^2}$. [/example] This interval computation shows how boundary conditions decide which modes survive. The next question is what replaces Dirichlet decay when constants are allowed by the boundary condition. For Neumann problems, the correct statement subtracts the conserved average before applying the spectral gap on the zero-mean subspace. [quotetheorem:7074] [citeproof:7074] This version explains the role of equilibria. A parabolic flow may fail to converge to zero because the equation has a nonzero stationary subspace, but it can still converge exponentially to the projection of the data onto that subspace. 
The mechanism is the Neumann spectral gap on the zero-mean subspace: constants form the zero eigenspace, and every orthogonal mode is damped at a rate controlled by the first positive Neumann eigenvalue. The subtraction of $\bar u_0$ is therefore necessary, not a normalization trick: if $u_0\equiv c\ne0$, then the Neumann solution is the stationary function $u(t)\equiv c$, so decay to zero is false. Connectedness is also essential for using a single average; if $U$ has two disjoint components, each component has its own constant equilibrium, and subtracting only the global mean can leave an undamped piece. Thus the theorem measures decay only of the nonconstant component on each connected domain, and it does not determine convergence rates for domains where the required zero-mean Poincare inequality fails. [example: Decay of Heat Flow with Zero Mean] For the Neumann heat equation on $(0,L)$, the spatial mean is conserved because \begin{align*} \frac{d}{dt}\int_0^L u(x,t)\,dx=\int_0^L u_{xx}(x,t)\,dx=u_x(L,t)-u_x(0,t)=0. \end{align*} Thus $\bar u(t)=L^{-1}\int_0^L u(x,t)\,dx=\bar u_0$. The Neumann eigenfunctions are $\phi_0(x)=1$ and, for $k\ge 1$, $\phi_k(x)=\cos(k\pi x/L)$, since $\phi_k'(x)=-(k\pi/L)\sin(k\pi x/L)$ gives $\phi_k'(0)=\phi_k'(L)=0$, and \begin{align*} -\phi_k''(x)=\left(\frac{k\pi}{L}\right)^2\cos(k\pi x/L). \end{align*} If $\bar u_0=0$, the constant coefficient vanishes, so \begin{align*} u_0(x)=\sum_{k=1}^{\infty}a_k\cos(k\pi x/L),\qquad a_k=\frac{2}{L}\int_0^L u_0(x)\cos(k\pi x/L)\,dx. \end{align*} The heat flow multiplies the $k$th mode by $e^{-(k\pi/L)^2t}$, hence \begin{align*} u(x,t)=\sum_{k=1}^{\infty}a_k e^{-(k\pi/L)^2t}\cos(k\pi x/L). \end{align*} Using orthogonality and $\int_0^L\cos^2(k\pi x/L)\,dx=L/2$ for $k\ge1$, \begin{align*} \|u(t)\|_{L^2(0,L)}^2=\frac{L}{2}\sum_{k=1}^{\infty}a_k^2e^{-2(k\pi/L)^2t}. 
\end{align*} Since $k\ge1$ implies $(k\pi/L)^2\ge \pi^2/L^2$, \begin{align*} \|u(t)\|_{L^2(0,L)}^2\le e^{-2\pi^2t/L^2}\frac{L}{2}\sum_{k=1}^{\infty}a_k^2=e^{-2\pi^2t/L^2}\|u_0\|_{L^2(0,L)}^2. \end{align*} Taking square roots gives \begin{align*} \|u(t)\|_{L^2(0,L)}\le e^{-\pi^2t/L^2}\|u_0\|_{L^2(0,L)}. \end{align*} The zero mode is exactly the conserved average, and once it is absent the first remaining mode $\cos(\pi x/L)$ determines the exponential decay rate. [/example]

## Algebraic Decay on Euclidean Space

On $\mathbb R^n$ there is no Poincare inequality controlling $\|u\|_{L^2}$ by $\|\nabla u\|_{L^2}$. Large-scale, slowly varying functions can have small gradient while keeping a large $L^2$ norm, so the bounded-domain spectral-gap argument cannot work. The replacement is to combine energy dissipation with an inequality that also uses mass or integrability information.

[quotetheorem:7075] [citeproof:7075]

Nash's inequality supplies the missing replacement for a spectral gap, but by itself it is only a static interpolation estimate. The $H^1$ hypothesis is needed because the right-hand side contains $\nabla f$ in $L^2$, while the $L^1$ hypothesis controls the low-frequency mass that cannot be estimated from $L^2$ data alone. For instance, as $R\to\infty$ the rescaled functions $f_R(x)=R^{-n/2}\phi(x/R)$ keep their $L^2$ norm fixed and have $\|\nabla f_R\|_{L^2}\to0$; without an $L^1$ factor there could be no uniform lower bound on the dissipation. The inequality also gives no sign information and no pointwise control: it is an $L^1$-$H^1$ interpolation estimate designed to convert energy dissipation into decay. The next step is to insert it into the heat energy identity and exploit conservation, or contraction, of the $L^1$ size. This turns dissipation into an explicit algebraic decay law.

[quotetheorem:7076] [citeproof:7076]

The exponent $t^{-n/4}$ matches the explicit heat kernel scaling.
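The mechanism can be sketched as a scalar differential inequality, assuming Nash's inequality in the common form $\|f\|_{L^2}^{1+2/n}\le C_N\|f\|_{L^1}^{2/n}\|\nabla f\|_{L^2}$ (this normalisation and the name $C_N$ are assumptions made here and may differ from the theorem as stated), together with $\|u(t)\|_{L^1}\le\|u_0\|_{L^1}$. Writing $y(t)=\|u(t)\|_{L^2}^2$ and assuming $y(t)>0$, since otherwise there is nothing to prove, the energy identity and Nash's inequality give \begin{align*} y'(t)=-2\|\nabla u(t)\|_{L^2}^2\le-\frac{2}{C_N^2\|u_0\|_{L^1}^{4/n}}\,y(t)^{1+2/n}. \end{align*} Hence \begin{align*} \frac{d}{dt}\,y(t)^{-2/n}=-\frac{2}{n}\,y(t)^{-1-2/n}y'(t)\ge\frac{4}{nC_N^2\|u_0\|_{L^1}^{4/n}}, \end{align*} and integrating from $0$ to $t$ and discarding the nonnegative term $y(0)^{-2/n}$ yields \begin{align*} y(t)\le\left(\frac{nC_N^2}{4t}\right)^{n/2}\|u_0\|_{L^1}^2,\qquad\text{that is,}\qquad \|u(t)\|_{L^2}\le\left(\frac{nC_N^2}{4}\right)^{n/4}t^{-n/4}\|u_0\|_{L^1}. \end{align*}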
The $L^1$ assumption is not a technical decoration: it gives control of $\hat u_0$ near $\xi=0$, or equivalently control of the total amount of heat being spread out. If only $u_0\in L^2(\mathbb R^n)$ is assumed, there is no uniform algebraic bound depending only on $\|u_0\|_{L^2}$, because initial data can be concentrated at arbitrarily low frequencies and therefore decay arbitrarily slowly over long but finite time windows. The estimate also gives an upper bound, not an asymptotic expansion; identifying the leading coefficient requires additional moment or mass assumptions. Thus Nash's method is not merely qualitative, but it recovers the rate dictated by diffusion over length scale $t^{1/2}$ only under the integrability hypotheses stated in the theorem. [example: Gaussian Scaling and the Sharp Algebraic Rate] Let $u_0\in L^1(\mathbb R^n)$ have nonzero mass $M=\int_{\mathbb R^n}u_0\,d\mathcal L^n$, and write the predicted leading heat profile as \begin{align*} g_t(x)=M(4\pi t)^{-n/2}e^{-|x|^2/(4t)}. \end{align*} We compute its $L^2$ size explicitly: \begin{align*} \|g_t\|_{L^2(\mathbb R^n)}^2=|M|^2(4\pi t)^{-n}\int_{\mathbb R^n}e^{-|x|^2/(2t)}\,d\mathcal L^n(x). \end{align*} With the change of variables $x=(2t)^{1/2}y$, we have $d\mathcal L^n(x)=(2t)^{n/2}d\mathcal L^n(y)$, so \begin{align*} \int_{\mathbb R^n}e^{-|x|^2/(2t)}\,d\mathcal L^n(x)=(2t)^{n/2}\int_{\mathbb R^n}e^{-|y|^2}\,d\mathcal L^n(y). \end{align*} The $n$-dimensional Gaussian integral factors into $n$ copies of $\int_{\mathbb R}e^{-s^2}\,ds=\pi^{1/2}$, hence \begin{align*} \int_{\mathbb R^n}e^{-|y|^2}\,d\mathcal L^n(y)=\pi^{n/2}. \end{align*} Substituting this into the norm identity gives \begin{align*} \|g_t\|_{L^2(\mathbb R^n)}^2=|M|^2(4\pi t)^{-n}(2t)^{n/2}\pi^{n/2}. \end{align*} Equivalently, \begin{align*} \|g_t\|_{L^2(\mathbb R^n)}^2=|M|^2\,2^{-3n/2}\pi^{-n/2}t^{-n/2}. \end{align*} Taking square roots, \begin{align*} \|g_t\|_{L^2(\mathbb R^n)}=|M|\,2^{-3n/4}\pi^{-n/4}t^{-n/4}. 
\end{align*} Thus a nonzero mass term naturally produces exactly the algebraic rate $t^{-n/4}$ in $L^2$, matching the Nash decay exponent. If $M=0$, this leading Gaussian mass term disappears, so the next asymptotic term must come from higher moments of $u_0$, and faster leading decay can occur. [/example]

Fourier methods give a complementary view of the same phenomenon. The heat semigroup multiplies each frequency by $e^{-t|\xi|^2}$, so high frequencies disappear rapidly while low frequencies decay slowly.

[explanation: Fourier Splitting] For $u(t)=e^{t\Delta}u_0$ on $\mathbb R^n$, Plancherel gives \begin{align*} \|u(t)\|_{L^2}^2=\int_{\mathbb R^n}e^{-2t|\xi|^2}|\hat u_0(\xi)|^2\,d\mathcal L^n(\xi). \end{align*} On $|\xi|\ge R$ the factor $e^{-2t|\xi|^2}$ is at most $e^{-2tR^2}$, so frequencies well above $t^{-1/2}$ are exponentially damped, while frequencies below $t^{-1/2}$ are essentially undamped. The integral over $|\xi|<R(t)$ is controlled by the size of a ball in frequency space, which is proportional to $R(t)^n$, and bounds on $\hat u_0$, such as $\|\hat u_0\|_{L^\infty}\le C\|u_0\|_{L^1}$. The threshold $R(t)$ proportional to $t^{-1/2}$ therefore leaves an undamped ball of volume proportional to $t^{-n/2}$, which gives the same algebraic decay as Nash's inequality; bounding $|\hat u_0|$ by its supremum and integrating the Gaussian factor over all of $\mathbb R^n$ makes this precise. [/explanation]

This frequency picture is often the easiest way to diagnose which rates are possible. Exponential decay corresponds to a spectral gap at the origin, while algebraic decay corresponds to continuous spectrum touching zero.

## Stability of Equilibria for Semilinear Parabolic Equations

Long-time questions for nonlinear parabolic equations usually start by locating equilibria. The next problem is whether solutions beginning near an equilibrium remain near it and converge back to it. The guiding principle is that a stable linearisation, together with a nonlinear remainder of higher order, forces nonlinear stability.

Consider a semilinear equation on a Banach space $X$, \begin{align*} \frac{du}{dt}=Au+F(u), \end{align*} where $A:D(A)\subset X\to X$ generates a strongly continuous semigroup and $F:V\subset X\to X$ is locally Lipschitz on an open neighbourhood $V$ of the equilibrium under study.
If $u_*\in D(A)\cap V$ is an equilibrium, then $Au_*+F(u_*)=0$. Assume that $F$ is Frechet differentiable at $u_*$, with derivative $DF_{u_*}:X\to X$. Writing $v=u-u_*$ gives \begin{align*} \frac{dv}{dt}=Lv+N(v), \end{align*} where $L:D(A)\subset X\to X$ is the linearised operator $L=A+DF_{u_*}$ and, for $v$ in a sufficiently small ball $B_X(0,r)\subset X$ with $u_*+v\in V$, the nonlinear remainder is the map $N:B_X(0,r)\to X$ defined by \begin{align*} N(v)=F(u_*+v)-F(u_*)-DF_{u_*}v. \end{align*} [definition: Exponentially Stable Semigroup] Let $X$ be a Banach space and let $L$ generate a strongly continuous semigroup $(e^{tL})_{t\ge 0}$ on $X$. The semigroup is exponentially stable if there exist constants $M\ge 1$ and $\omega>0$ such that \begin{align*} \|e^{tL}\|_{\mathcal L(X)}\le Me^{-\omega t} \end{align*} for all $t\ge 0$. [/definition] This definition is formulated at the semigroup level because nonlinear stability uses the Duhamel formula rather than only spectral data. In applications, exponential semigroup bounds often come from sectorial elliptic operators and spectral information. [quotetheorem:7077] [citeproof:7077] The theorem says that nonlinear terms of quadratic order cannot overcome a strict exponential decay mechanism when the initial perturbation is small. The proof mechanism is Duhamel plus a weighted contraction estimate: the linear term contributes the factor $e^{-\omega t}$, while the quadratic remainder is controlled by the square of the exponentially weighted norm of the perturbation. Both assumptions are doing real work. Exponential stability rules out neutral modes: for the scalar equation $\dot v=0$, every solution is constant, so no smallness assumption can force decay. The quadratic Lipschitz condition rules out a nonlinear term that is too large near the origin; for example, a perturbation behaving like $|v|^{1/2}$ is not locally Lipschitz at $0$ and cannot be controlled by the weighted contraction argument. 
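A scalar model shows both the decay and the need for smallness. Take, purely as an illustration, $X=\mathbb R$, $L=-\omega$ with $\omega>0$, and $N(v)=v^2$, so that $\dot v=-\omega v+v^2$. For $v_0\ne0$ the substitution $w=1/v$ turns this into the linear equation $\dot w=\omega w-1$, and solving gives \begin{align*} v(t)=\frac{\omega v_0}{v_0+(\omega-v_0)e^{\omega t}}, \end{align*} which also reproduces the trivial solution when $v_0=0$. If $|v_0|<\omega$, the denominator is at least $(\omega-|v_0|)e^{\omega t}$, so \begin{align*} |v(t)|\le\frac{\omega|v_0|}{\omega-|v_0|}\,e^{-\omega t}\qquad\text{for all }t\ge0, \end{align*} which is exponential decay at the linear rate with a constant that depends on how small the datum is. If $v_0>\omega$, the denominator vanishes at the finite time $t_*=\omega^{-1}\ln\bigl(v_0/(v_0-\omega)\bigr)$ and the solution blows up, so the smallness assumption cannot be dropped.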
If the linearised operator has spectrum on the imaginary axis, or if the nonlinear remainder is only of first order, the result no longer applies and centre-manifold, modulation, or normal-form methods are needed.

[example: Reaction-Diffusion Equation Near a Stable Constant State] Let $U\subset\mathbb R^n$ be bounded with smooth boundary, and consider \begin{align*} \partial_t u-\Delta u=f(u) \end{align*} with homogeneous Neumann boundary condition $\partial_\nu u=0$ on $\partial U$. Suppose $a\in\mathbb R$ satisfies $f(a)=0$ and $f'(a)<0$. Set $v=u-a$. Since $a$ is constant, $\partial_t a=0$ and $\Delta a=0$, so \begin{align*} \partial_t v=\partial_t u \end{align*} and \begin{align*} \Delta v=\Delta u-\Delta a=\Delta u. \end{align*} Therefore \begin{align*} \partial_t v-\Delta v=\partial_t u-\Delta u=f(u)=f(a+v). \end{align*} Writing \begin{align*} f(a+v)=f(a)+f'(a)v+\bigl(f(a+v)-f(a)-f'(a)v\bigr) \end{align*} and using $f(a)=0$, we obtain \begin{align*} \partial_t v=\Delta v+f'(a)v+N(v), \end{align*} where \begin{align*} N(v)=f(a+v)-f(a)-f'(a)v. \end{align*} The Neumann condition is unchanged because $\partial_\nu v=\partial_\nu u-\partial_\nu a=\partial_\nu u=0$. If $f\in C^2$ and $|z|\le r$, Taylor's formula with integral remainder gives \begin{align*} f(a+z)=f(a)+f'(a)z+z^2\int_0^1(1-\theta)f''(a+\theta z)\,d\theta. \end{align*} Hence, for $|v|\le r$, \begin{align*} N(v)=v^2\int_0^1(1-\theta)f''(a+\theta v)\,d\theta. \end{align*} If $K=\sup_{|z-a|\le r}|f''(z)|$, then \begin{align*} |N(v)|\le K|v|^2\int_0^1(1-\theta)\,d\theta=\frac{K}{2}|v|^2. \end{align*} Similarly, \begin{align*} N(v)-N(w)=\bigl(f(a+v)-f(a+w)\bigr)-f'(a)(v-w), \end{align*} and the mean value formula gives \begin{align*} N(v)-N(w)=(v-w)\int_0^1\bigl(f'(a+w+\theta(v-w))-f'(a)\bigr)\,d\theta. \end{align*} Using $|f'(a+\eta)-f'(a)|\le K|\eta|$ for $|\eta|\le r$, applied with $\eta=w+\theta(v-w)$, which satisfies $|\eta|\le|v|+|w|$ and $|\eta|\le r$ whenever $|v|,|w|\le r$ and $\theta\in[0,1]$, we obtain \begin{align*} |N(v)-N(w)|\le K(|v|+|w|)|v-w|.
\end{align*} Thus in a Banach algebra space such as $H^s(U)$ with $s>n/2$, the nonlinear remainder has the quadratic Lipschitz form required by the linearized stability principle. For the linear part \begin{align*} L=\Delta_N+f'(a)I, \end{align*} write the Neumann spectrum of $-\Delta_N$ as \begin{align*} 0=\mu_0\le \mu_1\le \mu_2\le\cdots. \end{align*} The corresponding eigenvalues of $\Delta_N$ are $-\mu_j$, so the eigenvalues of $L$ are \begin{align*} -\mu_j+f'(a)=f'(a)-\mu_j. \end{align*} Since $\mu_j\ge0$ and $f'(a)<0$, \begin{align*} f'(a)-\mu_j\le f'(a)<0. \end{align*} Thus the constant Neumann mode, which had eigenvalue $0$ for $\Delta_N$, is moved to the negative eigenvalue $f'(a)$, and every higher mode is even farther to the left. The linearized stability principle therefore applies in any function space where this spectral bound gives an exponentially stable semigroup and the displayed quadratic estimate for $N$ is valid. Consequently, sufficiently small perturbations $u(0)-a$ produce global solutions with $u(t)-a$ decaying exponentially to $0$. [/example] For PDE applications, the abstract theorem should be read together with regularity theory. A natural strategy is to prove stability in a space strong enough to control the nonlinearity, then use parabolic smoothing to obtain additional spatial regularity and pointwise convergence. [remark: Choosing the Stability Space] The choice of $X$ is part of the argument. For a polynomial nonlinearity, $L^2$ may be too weak to make $N(v)$ locally Lipschitz from $X$ to $X$, while $H^s(U)$ with $s>n/2$ has algebra properties that are more suitable. In many parabolic problems, maximal regularity or analytic semigroup estimates allow the stability argument to run in interpolation spaces closer to the natural energy class. [/remark] The chapter's three decay mechanisms fit together as follows. 
On bounded domains, the decisive issue is whether the linearised operator has a spectral gap after removing conserved quantities. On Euclidean space, diffusion still dissipates energy, but low frequencies force algebraic rates. For semilinear equations, the long-time behaviour is governed first by the linearised flow and then by whether the nonlinear remainder is small enough to preserve that decay. The long-time parabolic picture is one of decay and convergence, sometimes to nontrivial equilibria, and it complements the diffusive smoothing developed earlier. The next chapter turns to the wave equation, where energy is conserved rather than dissipated and the main issue is propagation rather than regularisation. # 6. The Wave Equation and Energy Conservation The wave equation is the model hyperbolic evolution equation: disturbances travel, energy is transported, and the solution keeps memory of its initial velocity. In contrast with the heat flow of Chapters 2 and 3, the wave flow does not smooth rough data; the natural topology is the conserved energy topology. This chapter develops the classical energy identity, then uses it to define and control weak solutions with data in $H^1 \times L^2$, and finally adds forcing through Duhamel's principle. ## Classical Solutions and the Conserved Energy What quantity should replace the maximum principle for a second-order equation in time? For waves, the decisive invariant is the sum of kinetic and potential energy. We first work with smooth solutions so that the integration by parts is visible before passing to the energy space. Let $U \subset \mathbb R^n$ be an open set, often with boundary conditions imposed on $\partial U$, and let $T>0$. The homogeneous wave equation with speed $c>0$ is \begin{align*} \partial_t^2 u - c^2 \Delta u = 0 \qquad \text{in } (0,T) \times U. \end{align*} For the Dirichlet problem the boundary condition is $u=0$ on $(0,T)\times \partial U$, while on all of $\mathbb R^n$ no boundary condition is imposed. 
[definition: Classical Wave Solution] Let $U \subset \mathbb R^n$ be open and $T>0$. A classical solution of the homogeneous wave equation on $(0,T)\times U$ is a scalar function $u:(0,T)\times U\to\mathbb R$ with $u \in C^2((0,T)\times U)$ such that \begin{align*} \partial_t^2 u(t,x) - c^2 \Delta u(t,x) = 0 \end{align*} for all $(t,x) \in (0,T)\times U$. [/definition] The definition only records pointwise differentiability and the equation. To obtain a well-posed Cauchy problem, we prescribe both displacement and velocity: \begin{align*} u(0,x) = u_0(x), \qquad \partial_t u(0,x)=u_1(x). \end{align*} The second datum is necessary because the equation is second order in time. [definition: Wave Energy] Let $u$ be a sufficiently regular real-valued function on $[0,T]\times U$ with $\partial_t u(t,\cdot)\in L^2(U)$ and $u(t,\cdot)\in H^1(U)$ for each $t\in[0,T]$. The wave energy of $u$ is the map \begin{align*} E_u:[0,T]\to[0,\infty) \end{align*} defined by \begin{align*} E_u(t) := \frac{1}{2}\int_U |\partial_t u(t,x)|^2\,d\mathcal L^n(x) + \frac{c^2}{2}\int_U |\nabla u(t,x)|^2\,d\mathcal L^n(x). \end{align*} [/definition] The first term is kinetic energy and the second is elastic potential energy. The next question is whether this quantity is merely a useful norm or whether the PDE itself preserves it. Differentiating $E(t)$ is the test that reveals the hyperbolic structure and gives the estimate later used for uniqueness. [quotetheorem:7078] This theorem says that the wave equation trades kinetic and potential energy without dissipating their sum, but the hypotheses are doing real work. If $U$ is bounded and the boundary is not homogeneous Dirichlet, the boundary flux term need not vanish: a moving or forced boundary can inject energy into the string. If $U=\mathbb R^n$ and the solution has insufficient decay, the integration by parts over large balls may leave a nonzero flux through infinity, so the displayed energy may not even be finite. 
The theorem also does not assert smoothing or pointwise control; it preserves exactly the $L^2$ velocity and $H^1$ displacement size, unlike the heat equation, which loses $L^2$ energy through diffusion. [example: Vibrating String Energy] Consider the one-dimensional string on $(0,L)$ with fixed endpoints: \begin{align*} \partial_t^2 u-c^2\partial_x^2u=0,\qquad u(t,0)=u(t,L)=0. \end{align*} For smooth initial data $u(0,x)=u_0(x)$ and $\partial_tu(0,x)=u_1(x)$, its energy is \begin{align*} E(t)=\frac12\int_0^L |\partial_tu(t,x)|^2\,dx+\frac{c^2}{2}\int_0^L |\partial_xu(t,x)|^2\,dx. \end{align*} Differentiating under the integral sign and using the product rule gives \begin{align*} E'(t)=\int_0^L \partial_tu(t,x)\,\partial_t^2u(t,x)\,dx+c^2\int_0^L \partial_xu(t,x)\,\partial_x\partial_tu(t,x)\,dx. \end{align*} Integrating the second integral by parts in $x$ gives \begin{align*} c^2\int_0^L \partial_xu\,\partial_x\partial_tu\,dx=c^2\bigl[\partial_xu\,\partial_tu\bigr]_{x=0}^{x=L}-c^2\int_0^L \partial_x^2u\,\partial_tu\,dx. \end{align*} Since $u(t,0)=u(t,L)=0$ for every $t$, differentiating the boundary condition in time gives $\partial_tu(t,0)=\partial_tu(t,L)=0$, so the boundary term is \begin{align*} c^2\bigl[\partial_xu\,\partial_tu\bigr]_{x=0}^{x=L}=c^2\partial_xu(t,L)\partial_tu(t,L)-c^2\partial_xu(t,0)\partial_tu(t,0)=0. \end{align*} Therefore \begin{align*} E'(t)=\int_0^L \partial_tu(t,x)\bigl(\partial_t^2u(t,x)-c^2\partial_x^2u(t,x)\bigr)\,dx. \end{align*} The wave equation makes the integrand zero at every $x$, hence $E'(t)=0$ and $E(t)=E(0)$, equivalently by *[Energy Conservation for the Homogeneous Wave Equation](/theorems/7078)*. Thus the string can trade kinetic energy $\frac12\int_0^L|\partial_tu|^2\,dx$ with elastic energy $\frac{c^2}{2}\int_0^L|\partial_xu|^2\,dx$, but their sum stays fixed. [/example] The same computation can be localized. Instead of integrating over all of $U$, integrate over a subdomain and keep track of the boundary term. 
This is the first sign of finite propagation: energy crosses boundaries through flux rather than appearing far away. [remark: Boundary Flux] For smooth $V \subset U$, the local energy over $V$ satisfies a balance law whose boundary term, obtained by integrating $c^2\nabla u\cdot\nabla\partial_tu$ by parts over $V$, is \begin{align*} c^2\int_{\partial V} \partial_tu\, \partial_\nu u\,d\mathcal H^{n-1}. \end{align*} Here $\nu$ is the outward unit normal to $\partial V$. With this sign convention the displayed term is the contribution to $\frac{d}{dt}E_V(t)$, the time derivative of the energy remaining inside $V$. The outward energy flux itself is $-c^2\partial_tu\,\partial_\nu u$ integrated over $\partial V$, so positive outward flux decreases the energy inside $V$. [/remark] ## Weak Solutions in the Energy Space Classical solutions require more regularity than the energy identity itself uses. The natural Cauchy data are $u_0\in H^1_0(U)$ and $u_1\in L^2(U)$, because exactly these norms occur in $E(0)$. The problem is to formulate the wave equation so that no second spatial derivative of $u$ is required pointwise. [definition: Energy Space] Let $U \subset \mathbb R^n$ be open. The Dirichlet energy space for the wave equation is \begin{align*} \mathcal H(U) := H^1_0(U) \times L^2(U), \end{align*} with norm $\|\cdot\|_{\mathcal H}:\mathcal H(U)\to[0,\infty)$ defined by \begin{align*} \|(v,w)\|_{\mathcal H}^2 := c^2\|\nabla v\|_{L^2(U)}^2 + \|w\|_{L^2(U)}^2. \end{align*} [/definition] On a bounded domain, Poincaré's inequality makes this equivalent to the usual $H^1_0(U)\times L^2(U)$ norm. The space identifies which quantities can be controlled, but it does not yet say what equation a rough function satisfies. We need a weak formulation whose terms make sense for $u\in H^1_0(U)$ and $\partial_tu\in L^2(U)$. [definition: Weak Wave Solution] Let $u_0\in H^1_0(U)$ and $u_1\in L^2(U)$. 
A weak solution of the homogeneous Dirichlet wave equation on $(0,T)\times U$ is a function \begin{align*} u \in C([0,T];H^1_0(U)), \qquad \partial_t u \in C([0,T];L^2(U)) \end{align*} such that $u(0)=u_0$, $\partial_tu(0)=u_1$, and for every $\phi\in C_c^\infty((0,T)\times U)$, \begin{align*} \int_0^T\int_U u\,\partial_t^2\phi\,d\mathcal L^n dt + c^2\int_0^T\int_U \nabla u\cdot \nabla \phi\,d\mathcal L^n dt=0. \end{align*} [/definition] The distributional formulation transfers derivatives from $u$ to the test function. This makes the Laplacian visible through the Dirichlet form $\int_U \nabla u\cdot \nabla\phi$, which is meaningful for $H^1_0$ functions. Once the equation is meaningful at this level, the first question is whether two such rough solutions can differ while sharing the same initial state. [quotetheorem:7079] [citeproof:7079] Uniqueness shows that the energy estimate is strong enough to rule out hidden weak solutions within the Dirichlet energy class. The assumptions again matter: without a boundary condition that kills the flux, the difference of two solutions can exchange energy with the boundary, and the zero-initial-energy argument no longer closes. Boundedness and smoothness of $U$ are used here to obtain the standard Dirichlet trace theory, the Poincaré inequality, and eigenfunction approximation; on rough domains or exterior domains the same conclusion requires extra functional-analytic input. The result also says nothing about functions outside $C([0,T];H^1_0(U))$ with $\partial_tu\in C([0,T];L^2(U))$, where the energy may be undefined and the multiplier argument has no object to control. It remains to prove that an energy solution exists for every finite-energy initial state, not only for smooth data. The natural method is to solve finite-dimensional approximations and keep the energy bound uniform while passing to the limit. 
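The finite-dimensional approximation idea can be sketched in the simplest setting. The code below assumes $U=(0,\pi)$ and the orthonormal sine basis $e_k(x)=\sqrt{2/\pi}\sin(kx)$, so that the Galerkin system decouples into oscillators $a_k''+c^2k^2a_k=0$. The function names, the truncation level `N`, and the sample coefficients are illustrative assumptions, not part of the theorem. It evaluates the exact modal solution and records the truncated energy at a few times, which should agree with the initial energy up to rounding.

```python
import math

def galerkin_modes(a0, b0, c, t):
    """Exact solution of a_k'' + (c k)^2 a_k = 0 for k = 1..N.

    a0[k-1] and b0[k-1] are the sine coefficients of the initial
    displacement and velocity. Returns (a(t), a'(t)) as lists.
    """
    a, da = [], []
    for k, (ak, bk) in enumerate(zip(a0, b0), start=1):
        w = c * k
        a.append(ak * math.cos(w * t) + bk / w * math.sin(w * t))
        da.append(-ak * w * math.sin(w * t) + bk * math.cos(w * t))
    return a, da

def galerkin_energy(a, da, c):
    """Energy of the truncated solution in the orthonormal sine basis:
    (1/2) sum |a_k'|^2 + (c^2/2) sum k^2 |a_k|^2."""
    kinetic = 0.5 * sum(v * v for v in da)
    potential = 0.5 * c * c * sum((k * x) ** 2 for k, x in enumerate(a, start=1))
    return kinetic + potential

# Illustrative finite-energy data: sum k^2 a_k^2 and sum b_k^2 both converge.
N, c = 50, 2.0
a0 = [k ** -2.0 for k in range(1, N + 1)]
b0 = [k ** -1.5 for k in range(1, N + 1)]

E0 = galerkin_energy(a0, b0, c)
energies = [galerkin_energy(*galerkin_modes(a0, b0, c, t), c)
            for t in (0.3, 1.0, 7.5)]
```

Because the bound on `energies` does not depend on `N`, the same computation suggests why the energy estimate survives the passage to the limit in the truncation level.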
[quotetheorem:7080] [citeproof:7080] The Galerkin argument also explains why the energy topology is stable under rough initial data, but it is deliberately no stronger than that. If $u_0\in H^1_0(U)$ has no second weak derivative, the solution generally remains non-classical; the theorem does not produce $\Delta u$ as an $L^2$ function or make $u$ twice differentiable. The sine-series example below gives the concrete mechanism: wave modes oscillate instead of acquiring a heat-kernel decay factor. Outside the energy space, for instance with displacement not in $H^1_0(U)$ or velocity not in $L^2(U)$, the conserved quantity may be infinite and this well-posedness theorem has no content. [example: Failure of Parabolic Smoothing for Waves] Let $U=(0,\pi)$ and let $e_k(x)=\sqrt{2/\pi}\sin(kx)$, so \begin{align*} \partial_x^2 e_k(x)=-k^2e_k(x). \end{align*} For initial displacement \begin{align*} u_0(x)=\sum_{k=1}^\infty a_k e_k(x), \qquad u_1=0, \end{align*} the separated-mode solution is \begin{align*} u(t,x)=\sum_{k=1}^\infty a_k\cos(ckt)e_k(x). \end{align*} Indeed, for each $k$, \begin{align*} \partial_t^2\bigl(a_k\cos(ckt)e_k(x)\bigr)=-c^2k^2a_k\cos(ckt)e_k(x) \end{align*} and \begin{align*} c^2\partial_x^2\bigl(a_k\cos(ckt)e_k(x)\bigr)=-c^2k^2a_k\cos(ckt)e_k(x), \end{align*} so each mode satisfies $\partial_t^2u-c^2\partial_x^2u=0$. Also \begin{align*} u(0,x)=\sum_{k=1}^\infty a_k e_k(x) \end{align*} and \begin{align*} \partial_tu(0,x)=\sum_{k=1}^\infty -cka_k\sin(0)e_k(x)=0. \end{align*} To see the absence of parabolic smoothing concretely, take $a_k=k^{-2}$. Then \begin{align*} \sum_{k=1}^\infty k^2|a_k|^2=\sum_{k=1}^\infty \frac{1}{k^2}<\infty, \end{align*} so $u_0\in H^1_0(0,\pi)$, while \begin{align*} \sum_{k=1}^\infty k^4|a_k|^2=\sum_{k=1}^\infty 1=\infty, \end{align*} so $u_0$ is not two spatial derivatives smoother. 
At the positive time $t=2\pi/c$, \begin{align*} \cos(ckt)=\cos(2\pi k)=1 \end{align*} for every integer $k$, hence \begin{align*} u(2\pi/c,x)=\sum_{k=1}^\infty a_k e_k(x)=u_0(x). \end{align*} Thus the wave evolution can return exactly to the same rough displacement; unlike the heat equation, its Fourier coefficients are multiplied by oscillatory factors $\cos(ckt)$ rather than decaying factors such as $e^{-k^2t}$. [/example] This example is the sharp contrast with the heat semigroup. Hyperbolic evolution moves Sobolev regularity through time without the immediate regularization associated with parabolic kernels. ## Inhomogeneous Wave Equations and Forced Energy Inequalities How does the picture change when an external force acts on the medium? The total energy is no longer conserved, but the same calculation gives a quantitative inequality: the force can inject or remove energy only through its pairing with the velocity. Consider \begin{align*} \partial_t^2u - c^2\Delta u = f \qquad \text{in } (0,T)\times U, \end{align*} with homogeneous Dirichlet boundary condition and initial data $(u_0,u_1)\in H^1_0(U)\times L^2(U)$. The natural forcing class for an energy solution is $f\in L^1(0,T;L^2(U))$, since it pairs with $\partial_tu\in C([0,T];L^2(U))$. [definition: Energy Solution With Forcing] Let $f\in L^1(0,T;L^2(U))$. An energy solution of the forced Dirichlet wave equation is a function \begin{align*} u \in C([0,T];H^1_0(U)), \qquad \partial_tu\in C([0,T];L^2(U)) \end{align*} with prescribed initial data such that for every $\phi\in C_c^\infty((0,T)\times U)$, \begin{align*} \int_0^T\int_U u\,\partial_t^2\phi\,d\mathcal L^n dt + c^2\int_0^T\int_U \nabla u\cdot\nabla\phi\,d\mathcal L^n dt = \int_0^T\int_U f\phi\,d\mathcal L^n dt. \end{align*} [/definition] The forcing term is placed on the right-hand side as a distribution in space and time. 
The homogeneous energy was constant because the equation supplied no source term; with forcing, the key problem is to measure how much energy the source can add. The same multiplier calculation gives a stability estimate in terms of the $L^1_tL^2_x$ size of $f$. [quotetheorem:7081] The square-root form is important: it bounds the energy norm linearly by the accumulated size of the force. The hypothesis $f\in L^1(0,T;L^2(U))$ is exactly what makes the right-hand side finite and lets $f$ pair with the velocity in $L^2(U)$. If the forcing is rougher, for example only a general distribution in space, the product with $\partial_tu$ need not be defined and this energy estimate may fail without replacing $L^2(U)$ by a dual Sobolev space. The theorem also does not say that forcing improves regularity; it only controls the finite-energy norm and gives continuous dependence on both initial data and forcing. [example: Compactly Supported Force] Let $U=\mathbb R^n$, let $0<a\le b<T$, and assume that $f(t,\cdot)=0$ whenever $t\notin[a,b]$. With zero initial data, the initial energy is \begin{align*} E(0)=\frac12\|\partial_tu(0)\|_{L^2(\mathbb R^n)}^2+\frac{c^2}{2}\|\nabla u(0)\|_{L^2(\mathbb R^n)}^2=0. \end{align*} Applying the *Forced Energy Inequality* with $s=0$ gives \begin{align*} \sqrt{2E(t)}\le \sqrt{2E(0)}+\int_0^t\|f(r)\|_{L^2(\mathbb R^n)}\,dr. \end{align*} Since $E(0)=0$, this becomes \begin{align*} \sqrt{2E(t)}\le \int_0^t\|f(r)\|_{L^2(\mathbb R^n)}\,dr. \end{align*} Because $f(r,\cdot)=0$ for $r<a$ and for $r>b$, the integral is exactly \begin{align*} \int_0^t\|f(r)\|_{L^2(\mathbb R^n)}\,dr=\int_a^{\min\{t,b\}}\|f(r)\|_{L^2(\mathbb R^n)}\,dr. \end{align*} Thus \begin{align*} \sqrt{2E(t)}\le \int_a^{\min\{t,b\}}\|f(r)\|_{L^2(\mathbb R^n)}\,dr. \end{align*} If $t<a$, the right-hand side is the integral over an empty time interval, so $\sqrt{2E(t)}\le0$ and hence $E(t)=0$. For $a\le t\le b$, the upper bound grows only through the accumulated $L^2$ size of the force. 
For $t\ge b$, the equation is homogeneous on $[b,t]$, so the homogeneous energy identity gives $E(t)=E(b)$: after the force is switched off, the wave continues to move, but no further energy is added. [/example] The estimate controls the size of a forced solution, but it does not yet describe how that solution is assembled from the homogeneous wave flow. We now ask how to represent the effect of a time-dependent source as a continuum of impulses. Duhamel's formula answers this by propagating each instantaneous force contribution from its insertion time to the observation time. [quotetheorem:7082] Duhamel's formula separates propagation from forcing: $S(t-s)$ tells how an impulse inserted at time $s$ travels to time $t$. The integral is a Bochner integral in $H^1_0(U)\times L^2(U)$, so the assumption $f\in L^1(0,T;L^2(U))$ is not cosmetic; without it, the accumulated forcing term may not define an energy-space vector. For instance, a forcing term with non-integrable $L^2(U)$ norm near a time $s_0$ can make $\int_0^t S(t-s)(0,f(s))\,ds$ diverge in $\mathcal H(U)$ even though each individual impulse lies in the energy space. The formula also relies on having a homogeneous wave group $S(t)$ on the chosen energy space, which is supplied here by the Dirichlet Laplacian on a bounded smooth domain. The generator notation must be read with its domain: if $(v,w)\in\mathcal H(U)$ but $v\notin H^2(U)$, then $\Delta v$ need not belong to $L^2(U)$, so $A(v,w)$ is not an energy-space vector even though $S(t)(v,w)$ is defined. It is the wave analogue of the heat equation's inhomogeneous semigroup formula, but here the operators preserve energy rather than smoothing the input, so the representation does not upgrade rough data to classical solutions. Chapter 6 established the conserved energy structure of the wave equation, while the parabolic chapters showed how dissipation leads to smoothing. 
We now refine the hyperbolic theory by studying where disturbances can travel and how the domain of dependence controls uniqueness and causality. # 7. Finite Propagation and Domain of Dependence Chapter 6 developed global energy estimates for the wave equation, while Chapters 2 and 3 provided the parabolic smoothing contrast. This chapter refines the hyperbolic picture by asking where the energy can travel. The prerequisites are the energy identity for the homogeneous wave equation, integration by parts, the [divergence theorem](/theorems/2754), and the basic heat kernel representation on $\mathbb R^n$. For the wave equation, information is constrained by cones in space-time, so values at a point depend only on initial data inside a ball of radius equal to the elapsed time. This is the sharp contrast with the heat equation, whose kernel is positive everywhere for every positive time. ## Energy on Space-Time Cones The first question is local: if we only observe the wave inside a backward cone, can we estimate the energy entering the vertex using only the data on the base of the cone? The global energy identity is not enough by itself, because it integrates over all space. The finite propagation result comes from applying the energy method to a moving spatial region whose boundary travels at wave speed. Let $u: [0,T] \times \mathbb R^n \to \mathbb R$ solve the homogeneous wave equation \begin{align*} \partial_t^2 u - \Delta u = 0. \end{align*} For a fixed point $(t_0,x_0)$ and $0 \le s \le t_0$, define the backward cone slices \begin{align*} K_s := B(x_0, t_0-s), \qquad 0 \le s \le t_0. \end{align*} As $s$ increases, the ball shrinks at speed $1$, matching the characteristic speed of the wave operator. For $u\in C^1([0,T]\times\mathbb R^n)$, the local energy density for the wave equation is the map \begin{align*} e[u]:[0,T]\times\mathbb R^n\to [0,\infty),\qquad e[u](t,x) := \frac{1}{2}|\partial_t u(t,x)|^2 + \frac{1}{2}|\nabla u(t,x)|^2. 
\end{align*} For the fixed cone $K=\{(s,x):0\le s\le t_0,\ x\in K_s\}$ and the same $u$, the corresponding local energy inside the cone slice is the function \begin{align*} E_K:[0,t_0]\to [0,\infty),\qquad E_K(s) := \int_{K_s} e[u](s,x)\,d\mathcal L^n(x). \end{align*} The main point is that $E_K(s)$ cannot increase as the slice moves toward the vertex. [quotetheorem:7083] [citeproof:7083] This estimate is the local version of conservation of energy. It says that when the observation region shrinks along the cone, no extra energy can enter through the lateral boundary. The $C^2$ hypothesis is used to justify differentiating under the integral sign and applying the [divergence theorem](/theorems/3614) on the moving region; for weak solutions the same result requires an approximation argument rather than this pointwise computation. The homogeneity of the equation is also essential: if $\partial_t^2u-\Delta u=F$ and $F$ is supported inside the cone, then the energy identity acquires the volume term $\int_{K_s}F\,\partial_tu\,d\mathcal L^n$, so energy can be created after the initial slice. The characteristic speed in the definition of $K_s$ is not cosmetic. In one dimension, a right-moving packet $F(x-t)$ crosses any lateral boundary whose inward motion is slower than speed $1$, which can make the local energy inside the moving interval increase. For the heat equation the failure is more drastic: nonnegative compactly supported data immediately produce positive values everywhere, so no cone-local energy monotonicity of this form can hold. The theorem does not say that energy at the vertex is determined by the boundary sphere alone, nor that energy travels only on the boundary of the cone. It is an exclusion principle for inward flux through a characteristic lateral boundary. This is why it is the right tool for the next support theorem: once the base energy is zero, monotonicity forces every later cone slice to have zero energy as well. 
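The monotonicity of $E_K(s)$ can be sanity-checked numerically in one space dimension. The sketch below assumes unit speed and a solution of the form $u(t,x)=F(x-t)+G(x+t)$ with hypothetical Gaussian profiles (smooth but not compactly supported), and it approximates the slice integral by the midpoint rule. The vertex $(t_0,x_0)=(2,0.5)$ and the number of quadrature cells are arbitrary choices.

```python
import math

def dF(y):
    """Derivative of the right-moving profile F(y) = exp(-y^2)."""
    return -2.0 * y * math.exp(-y * y)

def dG(z):
    """Derivative of the left-moving profile G(z) = exp(-(z - 1)^2)."""
    return -2.0 * (z - 1.0) * math.exp(-(z - 1.0) ** 2)

def energy_density(s, x):
    """e[u](s, x) for u(t, x) = F(x - t) + G(x + t) with unit speed."""
    ut = -dF(x - s) + dG(x + s)
    ux = dF(x - s) + dG(x + s)
    return 0.5 * ut * ut + 0.5 * ux * ux

def cone_energy(s, x0, t0, cells=4000):
    """Midpoint-rule approximation of E_K(s) over the slice
    (x0 - (t0 - s), x0 + (t0 - s))."""
    radius = t0 - s
    if radius <= 0.0:
        return 0.0
    h = 2.0 * radius / cells
    left = x0 - radius
    return h * sum(energy_density(s, left + (i + 0.5) * h) for i in range(cells))

x0, t0 = 0.5, 2.0
slices = [t0 * j / 20 for j in range(21)]      # s = 0, 0.1, ..., 2
values = [cone_energy(s, x0, t0) for s in slices]

# Largest increase between consecutive slices; it should not be positive
# beyond quadrature error if the cone energy is nonincreasing.
largest_increase = max(values[j + 1] - values[j] for j in range(20))
```

Replacing the radius `t0 - s` by a more slowly shrinking one, for instance `0.5 * (t0 - s)`, is a simple way to experiment with how the estimate can fail for a non-characteristic lateral boundary.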
[example: Energy Captured by a Backward Cone] Take $n=1$ and let $u(t,x)=F(x-t)+G(x+t)$ with $F,G\in C_c^2(\mathbb R)$. For the backward cone with vertex $(t_0,x_0)$, write \begin{align*} a_s=x_0-(t_0-s),\qquad b_s=x_0+(t_0-s). \end{align*} Then the slice is $I_s=(a_s,b_s)$, and \begin{align*} \partial_tu(s,x)=-F'(x-s)+G'(x+s),\qquad \partial_xu(s,x)=F'(x-s)+G'(x+s). \end{align*} Expanding the energy density gives \begin{align*} \frac12|\partial_tu|^2+\frac12|\partial_xu|^2=\frac12(-F'+G')^2+\frac12(F'+G')^2. \end{align*} The mixed terms cancel: \begin{align*} \frac12(F'^2-2F'G'+G'^2)+\frac12(F'^2+2F'G'+G'^2)=F'^2+G'^2, \end{align*} where $F'$ is evaluated at $x-s$ and $G'$ at $x+s$. Hence \begin{align*} E_K(s)=\int_{a_s}^{b_s} |F'(x-s)|^2\,dx+\int_{a_s}^{b_s}|G'(x+s)|^2\,dx. \end{align*} In the first integral set $y=x-s$. Since $a_s-s=x_0-t_0$ and $b_s-s=x_0+t_0-2s$, \begin{align*} \int_{a_s}^{b_s}|F'(x-s)|^2\,dx=\int_{x_0-t_0}^{x_0+t_0-2s}|F'(y)|^2\,dy. \end{align*} In the second integral set $z=x+s$. Since $a_s+s=x_0-t_0+2s$ and $b_s+s=x_0+t_0$, \begin{align*} \int_{a_s}^{b_s}|G'(x+s)|^2\,dx=\int_{x_0-t_0+2s}^{x_0+t_0}|G'(z)|^2\,dz. \end{align*} Therefore \begin{align*} E_K(s)=\int_{x_0-t_0}^{x_0+t_0-2s}|F'(y)|^2\,dy+\int_{x_0-t_0+2s}^{x_0+t_0}|G'(z)|^2\,dz. \end{align*} Differentiating these one-variable integrals by the fundamental theorem of calculus gives \begin{align*} E_K'(s)=-2|F'(x_0+t_0-2s)|^2-2|G'(x_0-t_0+2s)|^2\le 0. \end{align*} The $F$-part travels along characteristics $x-t=\text{constant}$ and only the characteristic $x-t=x_0-t_0$ reaches the vertex; the terms with $x-t>x_0-t_0$ are lost through the right lateral boundary. Similarly, the $G$-part travels along $x+t=\text{constant}$ and only $x+t=x_0+t_0$ reaches the vertex; the terms with $x+t<x_0+t_0$ are lost through the left lateral boundary. Thus the decrease of $E_K(s)$ exactly records the energy carried by waves whose characteristics do not meet $(t_0,x_0)$. 
[/example] The preceding example explains why the cone is the right moving region. If the boundary moved slower than speed $1$, some wave packets could cross into the region from outside. If it moved faster, the estimate would still hold but would give a larger possible dependence region than necessary. ## Finite Propagation Speed The next question is global in space but local in support: if the initial displacement and velocity vanish outside a ball, where can the solution be nonzero at later times? For the wave equation the answer is governed by the light cone. A compactly supported disturbance expands at speed at most $1$ when the equation is $\partial_t^2u-\Delta u=0$. [definition: Domain of Influence] For $T>0$, the domain-of-influence construction for the unit-speed wave equation is the map \begin{align*} \mathcal I_T: \mathcal P(\mathbb R^n) \to \mathcal P([0,T]\times \mathbb R^n) \end{align*} defined, for each $A\subseteq \mathbb R^n$, by \begin{align*} \mathcal I_T(A):=\{(t,x)\in [0,T]\times \mathbb R^n : \operatorname{dist}(x,A)\le t\}. \end{align*} [/definition] The definition names the region that can be reached by signals beginning in $A$ at time $0$. Since $\operatorname{dist}(x,A)=\operatorname{dist}(x,\overline A)$, only the closure of $A$ affects this region. The closed condition $\operatorname{dist}(x,A)\le t$ includes the wave front itself, where singularities or discontinuities in lower-regularity solutions may concentrate. The theorem to prove is that this geometric region is not merely suggestive: it is an actual support bound forced by the cone energy inequality. Stating the hypothesis in terms of the supports of the Cauchy data removes any boundary ambiguity and makes the causal set exactly the one generated by the initial disturbance. [quotetheorem:670] [citeproof:670] This theorem is often the most useful support statement for wave equations. 
It turns an initial support condition into a geometric bound on the support of the full solution: \begin{align*} \operatorname{supp} u(t,\cdot) \subseteq \{x\in \mathbb R^n : \operatorname{dist}(x,\operatorname{supp} g\cup \operatorname{supp} h)\le t\}. \end{align*} The vanishing of both Cauchy data on the base ball is essential. If the velocity datum $h$ is nonzero at some point of $B(x_0,t_0)$, the one-dimensional formula \begin{align*} u(t,x)=\frac12(g(x-t)+g(x+t))+\frac12\int_{x-t}^{x+t}h(y)\,d\mathcal L^1(y) \end{align*} shows that the vertex value may be nonzero even when $g=0$. Likewise, if the displacement datum $g$ is nonzero on the base ball but $h=0$, the endpoint contribution $\frac12(g(x-t)+g(x+t))$ can reach the vertex. The set $A$ records the portion of the initial hypersurface where the Cauchy data are allowed to live. Points outside $\mathcal I_T(A)$ have backward base balls disjoint from $A$, so the cone energy argument starts from zero initial energy there. The conclusion is only a support containment. It does not assert that the solution is nonzero everywhere inside the cone, because cancellations and symmetry can make the value vanish at particular points. The regularity assumptions permit the classical energy proof: $u\in C^2$ gives pointwise derivatives and boundary fluxes, while $g\in C^2$ and $h\in C^1$ are compatible with the stated classical Cauchy problem. Lower-regularity finite-energy solutions require density and weak continuity of the energy. The speed changes if the wave equation has the form \begin{align*} \partial_t^2u-c^2\Delta u=0; \end{align*} after rescaling time, the cone radius becomes $ct$. [example: Compactly Supported Initial Data] Suppose $\operatorname{supp} g\cup \operatorname{supp} h\subseteq \overline{B}(0,R)$. Applying *Finite Propagation Speed for the Wave Equation* with $A=\overline{B}(0,R)$ gives \begin{align*} u(t,x)=0 \quad \text{whenever} \quad \operatorname{dist}(x,\overline{B}(0,R))>t. 
\end{align*} If $|x|>R$, then the closest point of $\overline{B}(0,R)$ to $x$ lies on the radial segment from $0$ to $x$, so \begin{align*} \operatorname{dist}(x,\overline{B}(0,R))=|x|-R. \end{align*} Therefore, if $|x|>R+t$, then $|x|-R>t$, hence $\operatorname{dist}(x,\overline{B}(0,R))>t$, and the theorem gives $u(t,x)=0$. Thus \begin{align*} \operatorname{supp} u(t,\cdot)\subseteq \overline{B}(0,R+t) \end{align*} for every $0\le t\le T$. In dimension $3$, [Kirchhoff's formula](/theorems/666) gives the sharper representation \begin{align*} u(t,x)=\partial_t\left(\frac{1}{4\pi t}\int_{\partial B(x,t)} g(y)\,dS(y)\right)+\frac{1}{4\pi t}\int_{\partial B(x,t)} h(y)\,dS(y) \end{align*} for $t>0$. If $|x|>R+t$ and $y\in \partial B(x,t)$, then the [reverse triangle inequality](/theorems/2300) gives \begin{align*} |y|\ge |x|-|y-x|=|x|-t>R. \end{align*} Hence $y\notin \overline{B}(0,R)$, so $g(y)=0$ and $h(y)=0$ on the whole sphere $\partial B(x,t)$. Both integrals in Kirchhoff's formula vanish, and differentiating the identically zero spherical average gives $u(t,x)=0$. The support bound is therefore the geometric content of the representation formula: points outside $\overline{B}(0,R+t)$ have backward spheres missing all initial data. [/example] Representation formulas give another way to see the same geometry. In one dimension, [d'Alembert's formula](/theorems/665) uses the two characteristics $x\pm t$ and the interval between them. In three dimensions, Kirchhoff's formula places both the displacement and the velocity contributions on the sphere $\partial B(x,t)$, with no contribution from the interior of the enclosed ball. In numerical schemes this geometric restriction appears as the Courant-Friedrichs-Lewy condition: the discrete stencil must contain the PDE's physical dependence cone; otherwise the scheme cannot transmit information along all characteristic directions and cannot converge in general. Finite propagation is physically measurable. A detector far away from the initial disturbance must wait until the expanding cone reaches it. 
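The waiting time can be observed directly on d'Alembert's formula in one dimension. The sketch below assumes wave speed `c`, zero initial velocity, and a hypothetical $C^2$ bump `g` supported in $[-1,1]$; the detector position `x_d` and the sampling step `dt` are arbitrary choices. It scans sampled times and reports the first one at which the detector reading is nonzero.

```python
def g(x):
    """Hypothetical C^2 displacement bump supported in [-1, 1]."""
    return (1.0 - x * x) ** 3 if abs(x) < 1.0 else 0.0

def u(t, x, c=1.0):
    """d'Alembert solution with displacement g and zero initial velocity."""
    return 0.5 * (g(x - c * t) + g(x + c * t))

def first_arrival(x_d, c=1.0, dt=1e-3, t_max=10.0):
    """Smallest sampled time at which the detector at x_d reads nonzero,
    or None if nothing is detected up to t_max."""
    steps = int(round(t_max / dt))
    for n in range(steps + 1):
        t = n * dt
        if u(t, x_d, c) != 0.0:
            return t
    return None

x_d = 3.0                            # dist(x_d, [-1, 1]) = 2
t_unit = first_arrival(x_d, c=1.0)   # expected near d = 2
t_fast = first_arrival(x_d, c=2.0)   # expected near d / c = 1
```

The sampled arrival times can only approximate the true thresholds from above, since the scan has resolution `dt`.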
[example: Detector Travel Time] Let the Cauchy data $g,h$ be supported in a compact set $A\subset\mathbb R^n$, and place a detector at $x_d\in\mathbb R^n$. Set \begin{align*} d=\operatorname{dist}(x_d,A)>0. \end{align*} For a time $t$ with $0\le t<d$, the inequality $t<d=\operatorname{dist}(x_d,A)$ gives \begin{align*} \operatorname{dist}(x_d,A)>t. \end{align*} Hence $(t,x_d)\notin \mathcal I_T(A)$, because $\mathcal I_T(A)$ consists exactly of the points satisfying $\operatorname{dist}(x,A)\le t$. By *Finite Propagation Speed for the Wave Equation*, this implies \begin{align*} u(t,x_d)=0 \end{align*} for every $0\le t<d$. Thus no signal from data supported in $A$ can be detected at $x_d$ before time $d$ when the wave speed is $1$. For the equation \begin{align*} \partial_t^2u-c^2\Delta u=0, \end{align*} the reachable region at time $t$ is described by $\operatorname{dist}(x,A)\le ct$, so the detector can only possibly register a signal once \begin{align*} d\le ct. \end{align*} Since $c>0$, dividing by $c$ gives \begin{align*} \frac{d}{c}\le t. \end{align*} Therefore the earliest possible detection time is $d$ for unit wave speed and $d/c$ for wave speed $c$. [/example] The theorem gives a support bound, not a guarantee that the disturbance is nonzero at the first possible arrival time. Cancellation, symmetry, or special initial data may make the detector read zero even after the cone reaches it. The point is causal exclusion: outside the cone, nonzero values are impossible. ## Domain of Dependence The dual question asks which part of the initial data can influence a prescribed solution value. For the point $(t_0,x_0)$, only data inside $\overline{B}(x_0,t_0)$ can matter. This is the domain-of-dependence principle, and it is proved by applying finite propagation to the difference of two solutions. 
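A quick numerical illustration of this principle, again in one dimension with unit speed: the sketch below evaluates d'Alembert's formula for two data sets that agree on $[-1,1]$ and differ only through a hypothetical $C^1$ velocity perturbation `h_bump` supported in $(2,3)$. The midpoint rule for the velocity integral and all names are assumptions made for the illustration.

```python
def h_bump(y):
    """Hypothetical C^1 velocity perturbation supported in (2, 3)."""
    if 2.0 < y < 3.0:
        return ((y - 2.0) * (3.0 - y)) ** 2
    return 0.0

def zero(y):
    """Identically zero datum."""
    return 0.0

def dalembert(t, x, g, h, cells=2000):
    """d'Alembert's formula with unit speed; the velocity integral over
    [x - t, x + t] is approximated by the midpoint rule."""
    if t == 0.0:
        return g(x)
    step = 2.0 * t / cells
    integral = step * sum(h(x - t + (i + 0.5) * step) for i in range(cells))
    return 0.5 * (g(x - t) + g(x + t)) + 0.5 * integral

# u has zero Cauchy data; v has zero displacement and velocity h_bump.
# Both data sets agree on the dependence ball [-1, 1] of the vertex (1, 0).
u_vertex = dalembert(1.0, 0.0, zero, zero)
v_vertex = dalembert(1.0, 0.0, zero, h_bump)

# The point (1, 2) has dependence ball [1, 3], which meets the perturbation.
v_other = dalembert(1.0, 2.0, zero, h_bump)
```

Here `u_vertex` and `v_vertex` coincide because the integration window never touches the perturbation, while `v_other` picks up half the integral of `h_bump`.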
[definition: Domain of Dependence] The domain-of-dependence construction at time $0$ for the unit-speed wave equation is the map \begin{align*} \mathcal D:[0,T]\times \mathbb R^n \to \mathcal P(\mathbb R^n) \end{align*} defined by \begin{align*} \mathcal D(t_0,x_0):=\overline{B}(x_0,t_0). \end{align*} [/definition] This is a definition relative to the initial hypersurface $t=0$. For dependence between two intermediate times $s<t_0$, the corresponding set is $\overline{B}(x_0,t_0-s)$ on the slice $t=s$. Applying the finite-propagation theorem to the difference of two solutions turns that notation into a uniqueness statement: two solutions with the same data on this ball must agree at the vertex, no matter how their data differ outside it. The resulting domain-of-dependence principle says that changing the initial data outside the ball $\overline{B}(x_0,t_0)$ cannot change the value at the vertex. The agreement on the whole dependence ball is essential: in one space dimension, if two velocity data differ on a small subinterval of $(x_0-t_0,x_0+t_0)$, d'Alembert's formula changes the value at $(t_0,x_0)$ by half the integral of that difference. The assumption that both functions solve the same homogeneous wave equation is also essential; adding a forcing term supported inside the cone can alter the vertex even when the initial data agree. The result is pointwise and local. It does not imply equality of the two solutions at other points of the time slice unless their own dependence balls also lie in the region of agreement. Repeating the same argument for all vertices in a backward cone gives the corresponding regional uniqueness statement. In numerical language, the exact PDE has a cone-shaped stencil. Stable numerical schemes for wave equations must respect this causal geometry at the discrete level, at least in an approximate form. [example: Changing Data Outside the Dependence Ball] Let $w=u-v$. 
Since $u$ and $v$ solve the same homogeneous wave equation, subtracting the two equations gives \begin{align*} \partial_t^2 w-\Delta w=(\partial_t^2u-\Delta u)-(\partial_t^2v-\Delta v)=0-0=0. \end{align*} The assumed agreement of the Cauchy data on $\overline{B}(x_0,t_0)$ says \begin{align*} w(0,x)=u(0,x)-v(0,x)=0 \quad \text{and} \quad \partial_t w(0,x)=\partial_tu(0,x)-\partial_tv(0,x)=0 \end{align*} for every $x\in \overline{B}(x_0,t_0)$. Applying finite propagation speed to the solution $w$ forces the value at the vertex to be \begin{align*} w(t_0,x_0)=0. \end{align*} Therefore \begin{align*} u(t_0,x_0)-v(t_0,x_0)=0, \end{align*} so $u(t_0,x_0)=v(t_0,x_0)$. The conclusion is only local in the observation point. For example, in one space dimension take $x_0=0$, $t_0=1$, let $u$ have zero Cauchy data, and let $v$ have zero displacement but velocity data $h\in C_c^1((2,3))$ with $h\ge 0$ and $h$ not identically zero. Then the two data sets agree on $[-1,1]$, so $u(1,0)=v(1,0)$. At the different point $(1,2)$, however, $u(1,2)=0$ because $u$ has zero Cauchy data, while d'Alembert's formula for $v$ gives \begin{align*} v(1,2)=\frac{1}{2}\int_{2-1}^{2+1} h(y)\,d\mathcal L^1(y)=\frac{1}{2}\int_1^3 h(y)\,d\mathcal L^1(y)>0. \end{align*} Hence $u(1,2)\ne v(1,2)$. Thus changing data outside the dependence ball cannot affect the chosen vertex, but it can affect other points on the same time slice. [/example] The example isolates a pointwise consequence, but in applications one often needs equality on an entire space-time region. That requires repeating the same cone argument with many vertices: if two homogeneous wave solutions have matching Cauchy data on the base of every backward cone with vertex in a region $C$, then their difference has zero cone energy for each such vertex and the two solutions agree throughout $C$. This local form is useful when boundary conditions or coefficients are only specified in part of space-time.
The cone hypothesis is doing real work: if the initial agreement is imposed on a smaller ball, a characteristic from the missing annulus can reach a point of $C$ and change the solution there. Similarly, if a forcing term is inserted inside the cone, two solutions with identical Cauchy data can differ after the forcing acts. The regularity and domain hypotheses are also part of the mechanism. If the solution is not regular enough for the local energy identity, jumps or distributional sources on characteristic surfaces must be controlled by a weak formulation before the conclusion is meaningful. If the cone is cut by a boundary without boundary data, reflected or incoming waves from that boundary may enter the region even though the base data agree. The theorem does not give uniqueness beyond the causal future of the prescribed data, and it does not replace boundary conditions for domains with physical boundaries. It identifies exactly the region where the homogeneous interior equation and the given base data determine the solution. This also shows why hyperbolic equations are compatible with local measurements: observing or prescribing data in a causal region determines the solution only inside the corresponding causal future. ## Contrast with Heat Flow The final question is why the preceding conclusions are special to hyperbolic equations. The heat equation has a positive Gaussian kernel, and this creates instantaneous spreading. The distinction is not merely technical; it is one of the main structural differences between parabolic and hyperbolic evolution. For $u_t-\Delta u=0$ on $\mathbb R^n$, the heat kernel representation is \begin{align*} u(t,x)=\int_{\mathbb R^n} (4\pi t)^{-n/2}\exp\left(-\frac{|x-y|^2}{4t}\right)f(y)\,d\mathcal L^n(y),\qquad t>0. \end{align*} The kernel is positive for every $x,y\in\mathbb R^n$ and every $t>0$. [example: Infinite Propagation for Heat Flow] Let $f\in C_c(\mathbb R^n)$ satisfy $f\ge 0$ and suppose $f$ is not identically zero. 
Fix $t>0$ and $x\in\mathbb R^n$. The heat kernel representation gives \begin{align*} u(t,x)=\int_{\mathbb R^n} (4\pi t)^{-n/2}\exp\left(-\frac{|x-y|^2}{4t}\right)f(y)\,d\mathcal L^n(y). \end{align*} Since $f(y)=0$ for $y\notin \operatorname{supp} f$, this is the same as \begin{align*} u(t,x)=\int_{\operatorname{supp} f} (4\pi t)^{-n/2}\exp\left(-\frac{|x-y|^2}{4t}\right)f(y)\,d\mathcal L^n(y). \end{align*} For every $y\in\mathbb R^n$, we have $(4\pi t)^{-n/2}>0$ and $\exp(-|x-y|^2/(4t))>0$, while $f(y)\ge 0$. Hence the integrand is nonnegative everywhere. Because $f$ is not identically zero and $f\ge 0$, there is a point $y_0\in\mathbb R^n$ with $f(y_0)>0$. By continuity of $f$, choose $r>0$ such that \begin{align*} f(y)\ge \frac{1}{2}f(y_0)>0 \end{align*} for every $y\in B(y_0,r)$. On this ball the heat kernel factor is also strictly positive, so \begin{align*} (4\pi t)^{-n/2}\exp\left(-\frac{|x-y|^2}{4t}\right)f(y)>0 \end{align*} for every $y\in B(y_0,r)$. Since $\mathcal L^n(B(y_0,r))>0$, the integral over $B(y_0,r)$ is strictly positive, and therefore \begin{align*} u(t,x)>0. \end{align*} Thus for every positive time the solution is positive at every spatial point, so compact support is destroyed immediately, even though the Gaussian factor becomes small when $|x-y|$ is large. [/example] This example should be read beside the finite propagation theorem. The wave equation conserves energy and transports it through characteristic cones; the heat equation dissipates energy and averages against a kernel with full spatial support. The two behaviours explain why hyperbolic uniqueness is often local and causal, while parabolic uniqueness is tied to smoothing, maximum principles, and backward ill-posedness. The finite-propagation principle explains the local and causal nature of hyperbolic evolution, in contrast with the global smoothing of the heat equation. 
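The instantaneous spreading of heat flow can be checked numerically. The sketch below assumes one space dimension and a midpoint rule for the heat kernel integral; the names `heat_solution` and `tent`, the time $t=0.1$, and the observation points are illustrative choices. The datum is supported in $[-1,1]$, so a unit-speed wave with the same support could not reach $x=2$ or $x=5$ by time $0.1$, yet the heat solution is already positive there.

```python
import math


def heat_solution(f, t, x, a, b, n=4000):
    """Midpoint-rule approximation of the 1D heat kernel integral over [a, b].

    The interval [a, b] is assumed to contain the support of f.
    """
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        y = a + (k + 0.5) * h
        kernel = math.exp(-(x - y) ** 2 / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)
        total += kernel * f(y)
    return total * h


def tent(y):
    """Continuous nonnegative datum supported in [-1, 1]."""
    return max(0.0, 1.0 - abs(y))


t = 0.1
vals = [heat_solution(tent, t, x, -1.0, 1.0) for x in (0.0, 2.0, 5.0)]
for x, v in zip((0.0, 2.0, 5.0), vals):
    print(x, v)
```

The values are tiny far from the support but strictly positive. In floating point the Gaussian factor eventually underflows to zero for very large $|x|$, although the exact solution remains strictly positive.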
With that geometric picture in place, Fourier analysis lets us examine the same wave flow frequency by frequency and derive dispersive estimates. # 8. Fourier and Dispersive Viewpoints for Hyperbolic Equations Fourier analysis turns the free wave equation into a family of independent harmonic oscillators, one for each frequency. The energy estimates of Chapter 6 showed stability, and the cone arguments of Chapter 7 showed finite propagation; this chapter asks what additional information is visible when the equation is diagonalised in frequency. The main themes are explicit representation formulas, the geometric meaning of wave propagation in different dimensions, and the dispersive decay that comes from oscillation rather than dissipation. ## Fourier Solution Formula for the Free Wave Equation How can the Cauchy problem for the wave equation be solved on all of $\mathbb R^n$ without boundary terms? The Fourier transform replaces the spatial derivatives by multiplication and turns the PDE into an ODE in time. This gives a formula that is simultaneously an existence theorem, a uniqueness tool, and a starting point for decay estimates. We consider the free wave equation \begin{align*} \partial_t^2 u - \Delta u = 0, \qquad (t,x) \in \mathbb R \times \mathbb R^n. \end{align*} The Cauchy data are $u(0,x)=f(x)$ and $\partial_tu(0,x)=g(x)$. For smooth rapidly decreasing data, the Fourier transform in $x$ gives \begin{align*} \partial_t^2 \hat{u}(t,\xi) + |\xi|^2 \hat{u}(t,\xi) = 0. \end{align*} For each fixed $\xi\ne 0$ this is a harmonic oscillator with frequency $|\xi|$, and matching the initial conditions $\hat u(0,\xi)=\hat f(\xi)$ and $\partial_t\hat u(0,\xi)=\hat g(\xi)$ gives \begin{align*} \hat u(t,\xi)=\cos(t|\xi|)\hat f(\xi)+\frac{\sin(t|\xi|)}{|\xi|}\hat g(\xi). \end{align*} The zero frequency has the limiting behaviour of this same formula, with $\sin(t|\xi|)/|\xi|$ interpreted as $t$ at $\xi=0$. [quotetheorem:7084] [citeproof:7084] This formula separates the two sources of motion: the cosine propagator transports displacement data, while the sine propagator integrates velocity data.
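The action of the two multipliers can be sketched numerically. The example below makes a simplifying assumption that is not in the text: it replaces $\mathbb R$ by a $2\pi$-periodic interval sampled on a uniform grid, so that the Fourier transform becomes a discrete Fourier transform with integer frequencies. The helper names `dft`, `idft`, and `wave_propagate` are hypothetical, and the transform is written out directly instead of using a library FFT. The check uses one nonzero frequency and the zero frequency, where the sine multiplier is replaced by its limit $t$.

```python
import cmath
import math


def dft(v):
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]


def idft(c):
    n = len(c)
    return [sum(c[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]


def wave_propagate(f, g, t):
    """Apply cos(t|xi|) to f and sin(t|xi|)/|xi| to g, mode by mode.

    f, g are samples on a uniform grid of the 2*pi-periodic interval (an
    assumption of this sketch); the result is the real part of the inverse DFT.
    """
    n = len(f)
    fh, gh = dft(f), dft(g)
    out = []
    for k in range(n):
        xi = k if k <= n // 2 else k - n          # integer frequency of mode k
        w = abs(xi)
        sine_mult = t if w == 0 else math.sin(t * w) / w   # limit t at xi = 0
        out.append(math.cos(t * w) * fh[k] + sine_mult * gh[k])
    return [z.real for z in idft(out)]


n, t = 32, 0.7
xs = [2.0 * math.pi * j / n for j in range(n)]

# Single frequency 3 with zero velocity: compare with cos(3t) cos(3x).
u = wave_propagate([math.cos(3.0 * x) for x in xs], [0.0] * n, t)
err_cos = max(abs(u[j] - math.cos(3.0 * t) * math.cos(3.0 * xs[j])) for j in range(n))

# Constant velocity, zero displacement: only xi = 0 is excited, compare with u = t.
v = wave_propagate([0.0] * n, [1.0] * n, t)
err_zero_mode = max(abs(vj - t) for vj in v)

print(err_cos, err_zero_mode)
```

Both errors are at the level of rounding, since the test data are trigonometric polynomials that the grid resolves exactly.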
The Schwartz hypothesis is used to justify Fourier inversion and differentiation under the integral without first building the finite-energy theory; if $f$ or $g$ is a distribution, the same displayed expression may still make sense but the resulting object need not be a classical solution. The theorem does not give decay by itself: plane waves and narrow wave packets keep their frequency amplitudes, and the multipliers $\cos(t|\xi|)$ and $\sin(t|\xi|)/|\xi|$ do not damp high frequencies as the heat multiplier $e^{-t|\xi|^2}$ does. This limitation motivates both the energy identity below, which measures conserved frequency size, and the later physical-space formulas, which recover the geometry hidden by diagonalisation. [example: One Frequency Wave] Let $k \in \mathbb R^n \setminus \{0\}$, and take complex-valued initial data $f(x)=e^{ik\cdot x}$ and $g(x)=0$. We verify that \begin{align*} u(t,x)=\cos(t|k|)e^{ik\cdot x} \end{align*} solves the Cauchy problem. Since $e^{ik\cdot x}$ is independent of $t$, \begin{align*} \partial_t u(t,x)=-|k|\sin(t|k|)e^{ik\cdot x} \end{align*} and \begin{align*} \partial_t^2u(t,x)=-|k|^2\cos(t|k|)e^{ik\cdot x}. \end{align*} For each $j$, \begin{align*} \partial_{x_j}u(t,x)=ik_j\cos(t|k|)e^{ik\cdot x} \end{align*} and \begin{align*} \partial_{x_j}^2u(t,x)=-k_j^2\cos(t|k|)e^{ik\cdot x}. \end{align*} Therefore \begin{align*} \Delta u(t,x)=-\left(k_1^2+\cdots+k_n^2\right)\cos(t|k|)e^{ik\cdot x}=-|k|^2\cos(t|k|)e^{ik\cdot x}. \end{align*} Hence \begin{align*} \partial_t^2u(t,x)-\Delta u(t,x)=-|k|^2\cos(t|k|)e^{ik\cdot x}+|k|^2\cos(t|k|)e^{ik\cdot x}=0. \end{align*} The initial values are \begin{align*} u(0,x)=\cos(0)e^{ik\cdot x}=e^{ik\cdot x}=f(x) \end{align*} and \begin{align*} \partial_tu(0,x)=-|k|\sin(0)e^{ik\cdot x}=0=g(x). \end{align*} Finally, \begin{align*} \cos(t|k|)e^{ik\cdot x}=\frac{1}{2}e^{i(k\cdot x+t|k|)}+\frac{1}{2}e^{i(k\cdot x-t|k|)}. 
\end{align*} Thus the single spatial frequency $k$ splits into two time oscillations with temporal frequencies $\tau=\pm |k|$, and both satisfy the wave dispersion relation $\tau^2=|k|^2$. [/example] The single-frequency example isolates the oscillator structure but has no spatial decay. To connect the Fourier formula with the energy method from earlier chapters, we next ask which frequency-space quantity remains fixed under the same oscillation. [quotetheorem:7085] [citeproof:7085] The energy identity is not a decay statement. A single plane wave never decays in amplitude, so no estimate of the form $\|u(t,\cdot)\|_{L^\infty}\le C(t)E[u]^{1/2}$ with $C(t)\to 0$ can hold without extra localisation or integrability assumptions. The hypotheses $f\in H^1$ and $g\in L^2$ are exactly matched to the two terms in the conserved energy: if $f\notin H^1$, the initial elastic energy may be infinite, and if $g\notin L^2$, the kinetic energy may be infinite. The identity also does not control $\|u(t,\cdot)\|_{L^2}$ on all of $\mathbb R^n$ when the zero-frequency part is present; velocity data with nonzero low-frequency mass can produce growth in the displacement component. Decay therefore requires additional structure, such as compact support in physical space or frequency localisation away from $\xi=0$. The necessity of the finite-energy assumptions can be seen from concrete frequency profiles. If $\hat f(\xi)$ behaves like $|\xi|^{-n/2}$ along a high-frequency annulus sequence, then $f$ may fail to belong to $H^1$ because $\int |\xi|^2|\hat f(\xi)|^2\,d\mathcal L^n(\xi)$ diverges, and the elastic part of the displayed energy has no finite value. If $g$ is chosen with $\hat g$ not square-integrable, then the kinetic term is already infinite at $t=0$. The conservation law is therefore a statement about data for which both initial energy components are finite, not a regularising mechanism that repairs rough or non-square-integrable data. 
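The conservation mechanism and the low-frequency growth can both be seen mode by mode. The sketch below is an illustration under stated assumptions: it replaces the frequency integral by a finite sum over three hypothetical modes with made-up coefficients, and the names `evolve_mode`, `mode_energy`, `total_energy`, and `zero_mode_size` are not from the text. For each mode it evaluates $\hat u$ and $\partial_t\hat u$ from the cosine and sine multipliers and sums $\tfrac12\bigl(|\partial_t\hat u|^2+|\xi|^2|\hat u|^2\bigr)$.

```python
import math


def evolve_mode(fh, gh, w, t):
    """Return (u_hat, dt_u_hat) at time t for one frequency of size w = |xi|."""
    c, s = math.cos(t * w), math.sin(t * w)
    sine_mult = t if w == 0.0 else s / w      # limit t at zero frequency
    u_hat = c * fh + sine_mult * gh
    dt_u_hat = -w * s * fh + c * gh
    return u_hat, dt_u_hat


def mode_energy(u_hat, dt_u_hat, w):
    """Energy of one mode: (|dt u_hat|^2 + |xi|^2 |u_hat|^2) / 2."""
    return 0.5 * (abs(dt_u_hat) ** 2 + (w * abs(u_hat)) ** 2)


# Hypothetical sample of Fourier coefficients: (|xi|, f_hat, g_hat).
modes = [(0.0, 1.0 + 0.0j, 0.5 + 0.0j),
         (1.5, 0.3 - 0.2j, 0.1 + 0.4j),
         (4.0, -0.2 + 0.1j, 0.7 - 0.3j)]


def total_energy(t):
    return sum(mode_energy(*evolve_mode(fh, gh, w, t), w) for w, fh, gh in modes)


def zero_mode_size(t):
    w, fh, gh = modes[0]
    return abs(evolve_mode(fh, gh, w, t)[0])


for t in (0.0, 1.0, 10.0):
    print(t, total_energy(t), zero_mode_size(t))
```

The summed energy is the same at every time, while the zero-frequency displacement coefficient grows linearly in $t$ when $\hat g(0)\ne 0$. This is the growth in $\|u(t,\cdot)\|_{L^2}$ that the energy identity does not control.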
[example: Radial Wave Packet in Frequency Space] Let $\hat f(\xi)=a(|\xi|)$ and $\hat g(\xi)=0$, where $a\in C_c^\infty((1,2))$. The Fourier representation gives \begin{align*} u(t,x)=\frac{1}{(2\pi)^{n/2}}\int_{\mathbb R^n}e^{ix\cdot \xi}\cos(t|\xi|)a(|\xi|)\,d\mathcal L^n(\xi). \end{align*} Using $\cos s=(e^{is}+e^{-is})/2$, this is the sum \begin{align*} u(t,x)=\frac{1}{2(2\pi)^{n/2}}\int_{\mathbb R^n}e^{i(x\cdot \xi+t|\xi|)}a(|\xi|)\,d\mathcal L^n(\xi)+\frac{1}{2(2\pi)^{n/2}}\int_{\mathbb R^n}e^{i(x\cdot \xi-t|\xi|)}a(|\xi|)\,d\mathcal L^n(\xi). \end{align*} For the two phases \begin{align*} \Phi_\pm(\xi)=x\cdot \xi \pm t|\xi|, \end{align*} and for $\xi\ne 0$, \begin{align*} \nabla_\xi\Phi_\pm(\xi)=x\pm t\frac{\xi}{|\xi|}. \end{align*} A stationary point therefore satisfies \begin{align*} x\pm t\frac{\xi}{|\xi|}=0. \end{align*} Taking Euclidean norms of both sides gives \begin{align*} |x|=\left|t\frac{\xi}{|\xi|}\right|=|t|. \end{align*} Thus stationary phase can occur only on the light cone $|x|=|t|$, with $\xi/|\xi|=-x/t$ for the $+$ phase and $\xi/|\xi|=x/t$ for the $-$ phase when $t\ne 0$. Away from the light cone, say $\bigl||x|-|t|\bigr|\ge \delta(|x|+|t|)$ for some $\delta>0$, the same formula gives \begin{align*} |\nabla_\xi\Phi_\pm(\xi)|= \left|x\pm t\frac{\xi}{|\xi|}\right|\ge \bigl||x|-|t|\bigr|\ge \delta(|x|+|t|). \end{align*} Since $a(|\xi|)$ is supported where $1<|\xi|<2$, all differentiations in $\xi$ stay away from the singular point $\xi=0$. Repeated integration by parts with the vector field \begin{align*} L_\pm=\frac{1}{i|\nabla_\xi\Phi_\pm|^2}\nabla_\xi\Phi_\pm\cdot \nabla_\xi \end{align*} uses $L_\pm e^{i\Phi_\pm}=e^{i\Phi_\pm}$ and gains a factor comparable to $(|x|+|t|)^{-1}$ each time. Hence the oscillatory integral is rapidly small away from $|x|=|t|$, while near $|x|=|t|$ the phase has stationary angular directions. 
The packet therefore propagates near the light cone, and its off-cone decay comes from nonstationary oscillation in frequency. [/example] ## Kirchhoff and Spherical Mean Formulas in Low Dimensions The Fourier representation solves the equation, but it hides the geometry of finite propagation. What does the solution at $(t,x)$ actually depend on in physical space? In low dimensions the answer can be written in terms of averages over spheres or intervals, and these formulas reveal a sharp distinction between odd and even dimensions. The three-dimensional wave equation is governed by spherical means. The displacement at time $t$ depends on the initial data on the sphere of radius $t$ centred at $x$, not on the whole ball. [definition: Spherical Mean] For $r>0$, the spherical mean operator is the map \begin{align*} M_r:C(\mathbb R^3) \to C(\mathbb R^3) \end{align*} where, for $x\in \mathbb R^3$, \begin{align*} M_r h(x) := \frac{1}{4\pi r^2}\int_{\partial B(x,r)} h(y)\,d\mathcal H^2(y). \end{align*} [/definition] Spherical means convert the three-dimensional wave equation into an effectively one-dimensional radial equation. The factor $t$ multiplying the spherical means in the next formula, which is the radius $r=t$ of the sphere, is the analytic trace of the area growth of spheres in $\mathbb R^3$. [explanation: Kirchhoff Formula in Three Dimensions] Let $g\in C^3(\mathbb R^3)$ and $h\in C^2(\mathbb R^3)$, and let $u$ solve \begin{align*} \partial_t^2u-\Delta u=0,\qquad u(0,\cdot)=g,\qquad \partial_tu(0,\cdot)=h \end{align*} on $(0,\infty)\times\mathbb R^3$. Then \begin{align*} u(t,x)=\frac{\partial}{\partial t}\left(\frac{1}{4\pi t^2}\int_{\partial B(x,t)}t\,g\,d\mathcal H^2\right)+\frac{1}{4\pi t^2}\int_{\partial B(x,t)}t\,h\,d\mathcal H^2. \end{align*} Equivalently, in terms of the spherical mean operator, \begin{align*} u(t,x)=\partial_t\bigl[t(M_tg)(x)\bigr]+t(M_th)(x).
\end{align*} [/explanation] Kirchhoff's formula is more than an explicit solution: it is the first place where the Fourier formula's geometric blind spot becomes visible. The multiplier representation says that singularities travel at speed one, but it does not show whether the value at $(t,x)$ comes from the whole ball $B(x,t)$ or only from its boundary. The hypotheses $g\in C^3$ and $h\in C^2$ on the displacement and velocity data make the surface integrals, the time derivative, and the verification of the PDE classical; for merely finite-energy data the same expression is recovered only after approximation and may not have pointwise meaning. Compact support is not needed for local validity of the formula, but it is needed for the global propagation and decay conclusions drawn later. The dimension is essential: replacing $\mathbb R^3$ by $\mathbb R^2$ produces the Poisson disk integral below, while in one dimension the velocity data enter through an interval. [example: Outgoing Radial Pulse in Three Dimensions] Let $u(t,x)=U(t,r)$ with $r=|x|$ be a radial solution with initial data $U(0,r)=F(r)$ and $\partial_tU(0,r)=G(r)$, where $F$ and $G$ are smooth compactly supported radial profiles. For $r>0$, the radial Laplacian is \begin{align*} \Delta u(t,x)=\partial_r^2U(t,r)+\frac{2}{r}\partial_rU(t,r). \end{align*} Define $w(t,r)=rU(t,r)$. Then \begin{align*} \partial_t^2w(t,r)=r\partial_t^2U(t,r). \end{align*} Also, \begin{align*} \partial_rw(t,r)=U(t,r)+r\partial_rU(t,r). \end{align*} Differentiating once more in $r$ gives \begin{align*} \partial_r^2w(t,r)=2\partial_rU(t,r)+r\partial_r^2U(t,r). \end{align*} Hence \begin{align*} \partial_t^2w(t,r)-\partial_r^2w(t,r)=r\partial_t^2U(t,r)-2\partial_rU(t,r)-r\partial_r^2U(t,r). \end{align*} Factoring out $r$ gives \begin{align*} \partial_t^2w(t,r)-\partial_r^2w(t,r)=r\left(\partial_t^2U(t,r)-\partial_r^2U(t,r)-\frac{2}{r}\partial_rU(t,r)\right). \end{align*} Therefore $\partial_t^2u-\Delta u=0$ is equivalent, for $r>0$, to \begin{align*} \partial_t^2w(t,r)-\partial_r^2w(t,r)=0.
\end{align*} The initial data for $w$ are \begin{align*} w(0,r)=rF(r) \end{align*} and \begin{align*} \partial_tw(0,r)=rG(r). \end{align*} Smoothness of the radial solution at the origin corresponds to the odd extension $w(t,-r)=-w(t,r)$ and $w(t,0)=0$. Thus, applying the one-dimensional *D'Alembert Formula* to the odd initial data $W_0(s)=sF(|s|)$ and $W_1(s)=sG(|s|)$ gives, for $r>0$, \begin{align*} w(t,r)=\frac{1}{2}W_0(r+t)+\frac{1}{2}W_0(r-t)+\frac{1}{2}\int_{r-t}^{r+t}W_1(s)\,d\mathcal L^1(s). \end{align*} If $H$ is an antiderivative of $W_1$, so that $H'(s)=W_1(s)$, then the integral term is \begin{align*} \int_{r-t}^{r+t}W_1(s)\,d\mathcal L^1(s)=H(r+t)-H(r-t). \end{align*} Consequently \begin{align*} w(t,r)=\left(\frac{1}{2}W_0(r+t)+\frac{1}{2}H(r+t)\right)+\left(\frac{1}{2}W_0(r-t)-\frac{1}{2}H(r-t)\right). \end{align*} Writing \begin{align*} A(s)=\frac{1}{2}W_0(s)-\frac{1}{2}H(s) \end{align*} and \begin{align*} B(s)=\frac{1}{2}W_0(s)+\frac{1}{2}H(s), \end{align*} we obtain \begin{align*} w(t,r)=A(r-t)+B(r+t). \end{align*} Since $U(t,r)=r^{-1}w(t,r)$ for $r>0$, this becomes \begin{align*} U(t,r)=\frac{A(r-t)}{r}+\frac{B(r+t)}{r}. \end{align*} The profile $A(r-t)$ is constant along outgoing rays $r-t=\text{constant}$, while $B(r+t)$ is constant along incoming rays $r+t=\text{constant}$; in both terms the amplitude is modulated by the factor $1/r$. Thus an outgoing compact radial pulse has the form $r^{-1}A(r-t)$ away from the origin: its profile travels outward at speed one, and its amplitude carries the geometric factor $1/r$ coming from the area growth of spheres in $\mathbb R^3$. [/example] The radial pulse shows the same one-dimensional mechanism hidden inside three-dimensional spherical propagation. To see that mechanism without the spherical averaging step, we now solve the wave equation directly along characteristics in one space dimension. [quotetheorem:665] [citeproof:665] This formula separates right-moving and left-moving waves.
The displacement splits into two travelling copies, while the initial velocity contributes through the interval swept out by characteristics arriving at $(t,x)$. The assumptions $f\in C^2$ and $g\in C^1$ are the classical regularity needed to differentiate the formula twice in $t$ and $x$ and to match the initial velocity pointwise; if $g$ is only a measure, for example, the integral formula can define a weak solution with jumps rather than a classical one. The theorem does not imply sharp Huygens behaviour in the sense used in higher dimensions, because velocity data from the full interval $[x-t,x+t]$ affect the solution. This interval dependence is the one-dimensional model for the even-dimensional tails that appear in Poisson's formula. [example: One-Dimensional Compact Pulse] Suppose $f=0$ and $g\in C_c^1(\mathbb R)$ is supported in $[-1,1]$. By the *D'Alembert Formula*, the solution is \begin{align*} u(t,x)=\frac{1}{2}\bigl(f(x+t)+f(x-t)\bigr)+\frac{1}{2}\int_{x-t}^{x+t}g(y)\,d\mathcal L^1(y). \end{align*} Since $f=0$, both displacement terms vanish: \begin{align*} f(x+t)+f(x-t)=0+0=0. \end{align*} Therefore \begin{align*} u(t,x)=\frac{1}{2}\int_{x-t}^{x+t}g(y)\,d\mathcal L^1(y). \end{align*} Fix $x\in\mathbb R$ and take $t\ge |x|+1$. Then \begin{align*} x-t\le x-(|x|+1)\le -1. \end{align*} Also, \begin{align*} x+t\ge x+(|x|+1)\ge 1. \end{align*} Hence $[-1,1]\subset [x-t,x+t]$. Because $g$ is supported in $[-1,1]$, we have $g(y)=0$ for $y\notin [-1,1]$, so \begin{align*} \int_{x-t}^{x+t}g(y)\,d\mathcal L^1(y)=\int_{-1}^{1}g(y)\,d\mathcal L^1(y). \end{align*} Since $g$ vanishes outside $[-1,1]$, this is the same as \begin{align*} \int_{-1}^{1}g(y)\,d\mathcal L^1(y)=\int_{\mathbb R}g(y)\,d\mathcal L^1(y). \end{align*} Thus, for every fixed $x$ and every $t\ge |x|+1$, \begin{align*} u(t,x)=\frac{1}{2}\int_{\mathbb R}g(y)\,d\mathcal L^1(y). 
\end{align*} The local value therefore need not decay as $t\to+\infty$; if $\int_{\mathbb R}g\,d\mathcal L^1\ne 0$, the solution becomes a nonzero constant at each fixed observation point. [/example] The compact pulse demonstrates that lower-dimensional propagation may retain a lasting contribution from velocity data. The next formula shows the analogous tail in two dimensions, where the relevant physical-space representation integrates over a disk rather than just its boundary. [quotetheorem:667] [citeproof:667] The disk integral shows why even-dimensional waves have memory inside the light cone. A source point $y$ can affect $u(t,x)$ for all later times $t>|x-y|$, not only at the first arrival time. The Fourier formula alone does not reveal this tail, because the same phase relation $\tau^2=|\xi|^2$ underlies both the two- and three-dimensional equations; the distinction comes from how inverse Fourier transform or descent distributes the singularity in physical space. The assumptions $f\in C^3$, $g\in C^2$, and compact support justify the descent argument, keep the boundary singularity $(t^2-|x-y|^2)^{-1/2}$ integrable in the displayed classical formula, and allow the displacement term to be differentiated. Without compact support the formula can remain locally valid, but global tail and decay statements need additional spatial control. This is the basic failure of sharp Huygens' principle in even dimensions. [example: Tail Behaviour in Two Dimensions] Take $f=0$ and let $g\in C_c^2(\mathbb R^2)$ satisfy $g\ge 0$ and $g\not\equiv 0$. Fix $x\in\mathbb R^2$, and set $K=\operatorname{supp}g$ and $R=\sup_{y\in K}|x-y|$. By the *Poisson Formula for the Two-Dimensional Wave Equation*, for every $t>R$, \begin{align*} u(t,x)=\frac{1}{2\pi}\int_K \frac{g(y)}{\sqrt{t^2-|x-y|^2}}\,d\mathcal L^2(y). \end{align*} The denominator is positive because $|x-y|\le R<t$ on $K$. 
Since $g$ is continuous, nonnegative, and not identically zero, there is a point $y_0$ with $g(y_0)>0$; by continuity, $g>0$ on some ball $B(y_0,\varepsilon)$. Hence, for $t>R$, \begin{align*} \int_K \frac{g(y)}{\sqrt{t^2-|x-y|^2}}\,d\mathcal L^2(y)\ge \int_{K\cap B(y_0,\varepsilon)} \frac{g(y)}{\sqrt{t^2-|x-y|^2}}\,d\mathcal L^2(y)>0. \end{align*} Thus the two-dimensional solution has a positive tail at the fixed point $x$ for all sufficiently large times. To compute its leading size, rewrite the kernel as \begin{align*} \frac{1}{\sqrt{t^2-|x-y|^2}}=\frac{1}{t}\left(1-\frac{|x-y|^2}{t^2}\right)^{-1/2}. \end{align*} For $t\ge 2R$ and $y\in K$, we have $0\le |x-y|^2/t^2\le 1/4$. Let $h(s)=(1-s)^{-1/2}$. On $0\le s\le 1/4$, \begin{align*} h'(s)=\frac{1}{2}(1-s)^{-3/2}\le \frac{4}{3\sqrt 3}. \end{align*} The [mean value theorem](/theorems/186) therefore gives \begin{align*} 0\le \left(1-\frac{|x-y|^2}{t^2}\right)^{-1/2}-1\le \frac{4}{3\sqrt 3}\frac{|x-y|^2}{t^2}. \end{align*} Consequently, \begin{align*} u(t,x)-\frac{1}{2\pi t}\int_{\mathbb R^2}g(y)\,d\mathcal L^2(y)=\frac{1}{2\pi t}\int_K g(y)\left[\left(1-\frac{|x-y|^2}{t^2}\right)^{-1/2}-1\right]\,d\mathcal L^2(y). \end{align*} Using $|x-y|\le R$ on $K$, \begin{align*} 0\le u(t,x)-\frac{1}{2\pi t}\int_{\mathbb R^2}g(y)\,d\mathcal L^2(y)\le \frac{2R^2}{3\pi\sqrt 3\,t^3}\int_{\mathbb R^2}g(y)\,d\mathcal L^2(y). \end{align*} Thus \begin{align*} u(t,x)=\frac{1}{2\pi t}\int_{\mathbb R^2}g(y)\,d\mathcal L^2(y)+O(t^{-3}) \end{align*} for this fixed $x$. Since $g\ge 0$ and $g\not\equiv 0$, the integral is positive, so the tail persists and decays like $t^{-1}$ rather than disappearing after the wave front passes. [/example] ## Basic Dispersive Decay and the Role of Dimension Energy conservation says that the total size of a wave is constant, but observed amplitudes often decrease because the wave spreads out. How much decay should be expected, and why does dimension matter? 
The answer is controlled by the geometry of the light cone and by stationary phase in the Fourier representation. For compactly supported smooth data in three dimensions, Kirchhoff's formula already gives a pointwise decay estimate. Although the surface area of $\partial B(x,t)$ grows like $t^2$, the part of the sphere that meets the compact support of the data has area bounded independently of $t$, and the formula carries a prefactor of order $t^{-1}$ after the spherical average is differentiated. [quotetheorem:7086] [citeproof:7086] This estimate is a pointwise statement, not an energy statement. Compact support is essential for this uniform form: a plane wave has finite local amplitude but no spreading shell and does not decay pointwise, while spatially extended periodic data can keep feeding the observation point forever. The smoothness assumptions are used only to bound the Kirchhoff integrands by finitely many sup norms; weaker Sobolev data may conserve energy but need not have pointwise values. The theorem also does not assert decay in every norm, since the conserved energy stays constant. Its role is to identify the geometric source of the $t^{-1}$ amplitude factor that the frequency-localised stationary phase estimate generalises to dimension $n$. [example: Energy Spread over a Spherical Shell] Consider radial compactly supported data in $\mathbb R^3$ whose outgoing part has the form \begin{align*} u(t,r)=\frac{1}{r}F(r-t) \end{align*} on the large-radius region, where $F\in C_c^1(\mathbb R)$ is supported in an interval $[a,b]$. The solution is nonzero only when $a\le r-t\le b$, equivalently \begin{align*} t+a\le r\le t+b. \end{align*} Thus the pulse occupies a spherical shell of thickness $b-a$, independent of $t$. Let $s=r-t$. Differentiating the outgoing profile gives \begin{align*} \partial_tu(t,r)=-\frac{1}{r}F'(s). \end{align*} Also, \begin{align*} \partial_ru(t,r)=-\frac{1}{r^2}F(s)+\frac{1}{r}F'(s). \end{align*} For a radial function in $\mathbb R^3$, $|\nabla u(t,x)|^2=|\partial_ru(t,r)|^2$ when $r=|x|$.
Hence the pointwise energy density is \begin{align*} \frac{1}{2}\left(|\partial_tu|^2+|\nabla u|^2\right)=\frac{1}{2r^2}|F'(s)|^2+\frac{1}{2}\left|\frac{1}{r}F'(s)-\frac{1}{r^2}F(s)\right|^2. \end{align*} Expanding the square gives \begin{align*} \frac{1}{2}\left(|\partial_tu|^2+|\nabla u|^2\right)=\frac{1}{r^2}|F'(s)|^2-\frac{1}{r^3}F(s)F'(s)+\frac{1}{2r^4}|F(s)|^2. \end{align*} On the shell $t+a\le r\le t+b$, we have $r$ comparable to $t$ for large $t$, while $F$ and $F'$ remain bounded because they are smooth and compactly supported. Therefore the leading energy density is of size $r^{-2}$, hence of size $t^{-2}$ on the shell. The spherical volume element is $4\pi r^2\,dr$, so the factor $r^2$ from the area of spheres cancels the leading $r^{-2}$ in the density. The shell has bounded thickness, so the total energy carried by the outgoing pulse remains order one while its pointwise amplitude is order $t^{-1}$. This is the geometric reason that three-dimensional compact waves can decay pointwise without losing conserved energy. [/example] The spherical-shell example explains the three-dimensional decay geometrically, but it does not cover general frequency-localised oscillatory integrals. To estimate waves after decomposing into dyadic frequency bands, we need a dispersive bound whose proof comes from stationary phase on the frequency sphere. [quotetheorem:7087] [citeproof:7087] The exponent $(n-1)/2$ reflects the number of curved angular directions on the frequency sphere. The compact frequency support away from $0$ is essential for this clean form: at very low frequency the phase has weak curvature on the time scale $|t|\ge 1$, and a separate rescaling argument is needed for dyadic pieces near the origin. Smoothness of $a$ is also part of the estimate, since stationary phase bounds require control of finitely many amplitude derivatives; a rough cutoff can introduce kernel tails that are not controlled by the stated constant. 
The theorem does not estimate the full wave propagator on arbitrary data in one step, nor does it give energy decay. It supplies the dyadic building block used later in Strichartz and nonlinear estimates, where frequency pieces are summed with Sobolev weights. [remark: Low Frequency and Derivative Loss] The multiplier $\sin(t|\xi|)/|\xi|$ is singular in derivative counting at low frequency, though the singularity is removable pointwise at $\xi=0$. Global dispersive estimates for rough data therefore require either frequency decompositions or Sobolev norms that account for low and high frequencies separately. Energy estimates avoid this issue because they measure $\partial_t u$ and $\nabla u$ rather than $u$ alone. [/remark] The geometric propagation laws can now be summarised by Huygens' principle. In odd spatial dimensions at least three, compactly supported disturbances leave no tail inside the cone after the wave front has passed. In even dimensions, the solution generally remembers the interior of the cone. [quotetheorem:7088] [citeproof:7088] Sharp Huygens is stronger than finite propagation speed. Finite propagation says that data outside $\overline{B}(x,|t|)$ cannot affect $u(t,x)$; sharp Huygens says that, in odd dimensions at least three, data strictly inside the ball have already passed by. The smooth compact support hypotheses ensure that the spherical representation is pointwise meaningful and that the support statement can be read literally. If the data are only finite-energy, pointwise values on the sphere need not exist, so the statement must be interpreted after testing against smooth functions or by approximation. If the data are smooth but not compactly supported, then every sufficiently large sphere may still meet the support, and the vanishing conclusion outside a compact travelling shell no longer follows. 
The restriction to odd $n\ge 3$ is necessary: $n=1$ has the interval term in d'Alembert's formula, and even dimensions have interior ball terms such as the Poisson formula. The theorem therefore marks a parity-and-dimension phenomenon, and it sets the baseline for later questions about how variable coefficients, obstacles, long-range forcing, and semilinear perturbations create tails even in dimensions where the free constant-coefficient equation has none. [example: Comparing Odd and Even Dimensional Tails] Let $f=0$, and in each dimension let the velocity datum $g$ be compactly supported, nonnegative, and positive somewhere near the origin. In three dimensions, set $K_3=\operatorname{supp}g\subset \mathbb R^3$ and choose $R_3>0$ such that $K_3\subset B(0,R_3)$. By the *Kirchhoff Formula in Three Dimensions*, evaluated at $x=0$ and with $f=0$, \begin{align*} u_3(t,0)=\frac{1}{4\pi t}\int_{\partial B(0,t)}g(y)\,d\mathcal H^2(y) \end{align*} for $t>0$. If $t>R_3$, then every $y\in \partial B(0,t)$ satisfies $|y|=t>R_3$, so $y\notin K_3$ and $g(y)=0$. Hence \begin{align*} \int_{\partial B(0,t)}g(y)\,d\mathcal H^2(y)=0 \end{align*} and therefore \begin{align*} u_3(t,0)=0 \end{align*} for all $t>R_3$. In two dimensions, set $K_2=\operatorname{supp}g\subset \mathbb R^2$ and choose $R_2>0$ such that $K_2\subset B(0,R_2)$. By the *Poisson Formula for the Two-Dimensional Wave Equation*, again with $f=0$ and $x=0$, \begin{align*} u_2(t,0)=\frac{1}{2\pi}\int_{K_2}\frac{g(y)}{\sqrt{t^2-|y|^2}}\,d\mathcal L^2(y) \end{align*} for every $t>R_2$. On $K_2$ we have $|y|\le R_2<t$, so $\sqrt{t^2-|y|^2}>0$. Since $g\ge 0$ and $g$ is positive at some point, continuity gives a ball $B(y_0,\varepsilon)$ on which $g>0$. Thus \begin{align*} \int_{K_2}\frac{g(y)}{\sqrt{t^2-|y|^2}}\,d\mathcal L^2(y)\ge \int_{K_2\cap B(y_0,\varepsilon)}\frac{g(y)}{\sqrt{t^2-|y|^2}}\,d\mathcal L^2(y)>0, \end{align*} so \begin{align*} u_2(t,0)>0 \end{align*} for every $t>R_2$. 
For its leading size, write \begin{align*} \frac{1}{\sqrt{t^2-|y|^2}}=\frac{1}{t}\left(1-\frac{|y|^2}{t^2}\right)^{-1/2}. \end{align*} If $t\ge 2R_2$ and $y\in K_2$, then $0\le |y|^2/t^2\le 1/4$. The mean value theorem applied to $h(s)=(1-s)^{-1/2}$ on $[0,1/4]$ gives \begin{align*} 0\le \left(1-\frac{|y|^2}{t^2}\right)^{-1/2}-1\le C\frac{|y|^2}{t^2} \end{align*} for a constant $C$ independent of $t$ and $y$. Therefore \begin{align*} u_2(t,0)-\frac{1}{2\pi t}\int_{\mathbb R^2}g(y)\,d\mathcal L^2(y)=\frac{1}{2\pi t}\int_{K_2}g(y)\left[\left(1-\frac{|y|^2}{t^2}\right)^{-1/2}-1\right]\,d\mathcal L^2(y), \end{align*} and the absolute value of the right-hand side is bounded by \begin{align*} \frac{CR_2^2}{2\pi t^3}\int_{\mathbb R^2}g(y)\,d\mathcal L^2(y). \end{align*} Thus \begin{align*} u_2(t,0)=\frac{1}{2\pi t}\int_{\mathbb R^2}g(y)\,d\mathcal L^2(y)+O(t^{-3}). \end{align*} The same compact, nonnegative velocity source therefore gives a passing front in three dimensions, where the sphere eventually misses the support, but a persistent two-dimensional tail, where the disk integral continues to see the source for all later times. [/example] This chapter gives three complementary views of the same equation. The Fourier formula diagonalises the flow and exposes oscillation; Kirchhoff and related formulas expose the light-cone geometry; dispersive estimates quantify how oscillation and geometry turn conserved energy into pointwise decay. These viewpoints will be reused when variable coefficients, forcing, and semilinear perturbations are treated as controlled deviations from the free wave evolution. Fourier methods reveal the oscillatory structure behind wave propagation, while the earlier energy and cone arguments give stability and causality. The next chapter returns to parabolic equations, but now with nonlinear reaction terms, where semigroup and Duhamel ideas must be combined with a fixed-point argument. # 9. 
Semilinear Parabolic Equations Semilinear parabolic equations combine the smoothing and maximum-principle structure of the heat equation with nonlinear reaction terms. The heat semigroup from Chapter 2 and the abstract Duhamel formula from Chapters 1 and 4 give the linear starting point; the new question is how far those tools survive when the source depends on the unknown solution itself. The chapter assumes the preceding material on strongly continuous semigroups, Duhamel's formula, the parabolic maximum principle, and basic Sobolev or continuous-function solution spaces. It develops local well-posedness in semigroup spaces, explains the mechanism by which solutions can fail to continue, and then uses order and dissipation to obtain global existence in important reaction-diffusion models. ## Local Existence in Semigroup Spaces The first problem is to make sense of an equation whose linear part has already been solved but whose forcing term is $F(u)$ rather than a prescribed function. We work in a Banach space setting that captures the heat equation on $\mathbb R^n$, on bounded domains, and in function spaces such as $C_0(\Omega)$ or $L^q(\Omega)$. [definition: Mild Solution of a Semilinear Parabolic Equation] Let $X$ be a Banach space, let $A:D(A)\subset X\to X$ generate a strongly continuous semigroup $(S(t))_{t\ge 0}$ on $X$, and let $F:D(F)\subset X\to X$. A function $u\in C([0,T];X)$ is a mild solution of $u_t=Au+F(u)$ with initial datum $u_0\in X$ if $u(t)\in D(F)$ for $0\le t\le T$, the map $t\mapsto F(u(t))$ is continuous from $[0,T]$ to $X$, and \begin{align*} u(t)=S(t)u_0+\int_0^t S(t-s)F(u(s))\,ds \end{align*} for every $t\in[0,T]$. [/definition] This definition packages the linear evolution into the semigroup and leaves the nonlinearity inside the Duhamel integral. 
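As a minimal numerical sketch of the mild formulation, assume the scalar toy case $X=\mathbb R$, $A=-1$ (so $S(t)=e^{-t}$) and $F(u)=u^2$; these choices, the grid sizes, and the function names are illustrative assumptions rather than part of the notes. The code iterates the Duhamel map starting from the free evolution $S(t)u_0$, approximates the integral by the trapezoid rule, and compares the result with the closed-form solution of $u'=-u+u^2$.

```python
import math

def semigroup(t):
    """S(t) = exp(-t): the semigroup generated by A = -1 on X = R (toy assumption)."""
    return math.exp(-t)

def reaction(u):
    """F(u) = u^2, which is Lipschitz on bounded subsets of R (toy assumption)."""
    return u * u

def picard_mild_solution(u0, T=1.0, n_steps=100, n_iter=30):
    """Iterate u -> S(t)u0 + int_0^t S(t-s) F(u(s)) ds on a uniform time grid."""
    h = T / n_steps
    ts = [i * h for i in range(n_steps + 1)]
    u = [semigroup(t) * u0 for t in ts]          # zeroth iterate: free evolution
    for _ in range(n_iter):
        fu = [reaction(v) for v in u]
        new_u = []
        for i, t in enumerate(ts):
            integral = 0.0
            for j in range(i):                   # trapezoid rule on [0, t_i]
                left = semigroup(t - ts[j]) * fu[j]
                right = semigroup(t - ts[j + 1]) * fu[j + 1]
                integral += 0.5 * h * (left + right)
            new_u.append(semigroup(t) * u0 + integral)
        u = new_u
    return ts, u

def exact_solution(u0, t):
    """Closed form for u' = -u + u^2 with u(0) = u0 and 0 < u0 < 1."""
    return u0 * math.exp(-t) / (1.0 - u0 + u0 * math.exp(-t))

ts, u = picard_mild_solution(0.5)
print(u[-1], exact_solution(0.5, ts[-1]))
```

For $u_0=0.5$ the iterates stay in $[0,0.5]$, where $F$ has Lipschitz constant $1$ and the Duhamel map contracts with factor at most $1-e^{-1}$, so the iteration is expected to converge up to the quadrature error.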
The next task is to prove that the integral equation has exactly one solution for a short time; the natural tool is the [contraction mapping theorem](/theorems/71) on a closed ball in $C([0,T];X)$. [quotetheorem:7089] [citeproof:7089] The theorem is deliberately local in time: it constructs the solution only while the fixed-point ball stays inside a region where $F$ has a finite Lipschitz constant. The bounded-subset form of the hypothesis is stronger than pointwise local Lipschitz continuity, and it is the uniformity needed when the initial datum is allowed to vary over a bounded set or when the solution is restarted near a possible maximal time. The hypothesis is not cosmetic. For instance, the scalar equation $y'=|y|^{1/2}$ with $y(0)=0$ has more than one solution, so Lipschitz control cannot be replaced by mere continuity if uniqueness is part of the conclusion. The semigroup bound is also part of the mechanism: it is what keeps the Duhamel operator controlled on a short time interval, while the Banach-space setting supplies the [complete metric space](/page/Complete%20Metric%20Space) needed for the contraction argument. The theorem does not assert global existence, classical differentiability, or smoothing beyond what the chosen semigroup and function space provide. The next example shows how the abstract theorem recovers the usual reaction-diffusion model. [example: Power Reaction Local Solution] Let $\Omega\subset\mathbb R^n$ be bounded and smooth, take $X=C_0(\Omega)$ with the sup norm, and let $A=\Delta$ with homogeneous Dirichlet boundary condition. For $p\in\mathbb N$ with $p>1$, define $F(u)=u^p$ pointwise. If $\|u\|_\infty,\|v\|_\infty\le R$, then for each $x\in\Omega$ the factorization $a^p-b^p=(a-b)(a^{p-1}+a^{p-2}b+\cdots+b^{p-1})$ gives \begin{align*} |u(x)^p-v(x)^p|=|u(x)-v(x)|\left|u(x)^{p-1}+u(x)^{p-2}v(x)+\cdots+v(x)^{p-1}\right|. 
\end{align*} Each of the $p$ terms in the sum has absolute value at most $R^{p-1}$, so \begin{align*} |u(x)^p-v(x)^p|\le pR^{p-1}|u(x)-v(x)|. \end{align*} Taking the supremum over $x$ yields \begin{align*} \|F(u)-F(v)\|_\infty\le pR^{p-1}\|u-v\|_\infty. \end{align*} Thus $F$ is Lipschitz on every bounded subset of $C_0(\Omega)$, and *[Local Well-Posedness for Semilinear Evolution Equations](/theorems/7089)* gives a unique mild solution on some interval $[0,T]$. If $p>1$ is not an integer, the expression $u^p$ is not a real-valued function for arbitrary sign-changing $u\in C_0(\Omega)$. One precise choice is to restrict to the closed positive cone and use nonnegative initial data; then $u^p$ is defined pointwise and is nonnegative. Another precise choice is the signed power $F(u)=|u|^{p-1}u$, for which the scalar map $\phi(s)=|s|^{p-1}s$ satisfies $|\phi'(s)|=p|s|^{p-1}$ for $s\ne0$ and has derivative $0$ at $s=0$, hence the mean value theorem gives $|\phi(a)-\phi(b)|\le pR^{p-1}|a-b|$ whenever $|a|,|b|\le R$. Under either interpretation, the same bounded-subset Lipschitz estimate feeds into the local theorem. For nonnegative data in the positive-cone interpretation, positivity is preserved during the local existence interval. Indeed, the mild formula reads \begin{align*} u(t)=S(t)u_0+\int_0^t S(t-s)u(s)^p\,ds. \end{align*} The Dirichlet heat semigroup is positive, so $S(t)u_0\ge0$ when $u_0\ge0$, and $S(t-s)u(s)^p\ge0$ whenever $u(s)\ge0$. The fixed-point construction can therefore be carried out in the closed cone of nonnegative paths, giving $u(t)\ge0$ for $0\le t\le T$. Thus the abstract semigroup theorem recovers the standard local solution for $u_t-\Delta u=u^p$ once the power nonlinearity is interpreted on a domain where it is genuinely defined. [/example] Semigroup spaces are useful because they separate two effects: the linear semigroup provides smoothing and time propagation, while the fixed-point argument uses bounded-subset Lipschitz control of $F$. 
In applications the theorem is often improved by choosing $X=L^q(\Omega)$, $C^\alpha(\bar{\Omega})$, or a Sobolev space, then using heat estimates to place the nonlinear term in the right integrability class. ## Blow-Up Alternatives and Continuation Criteria Local existence gives a solution up to some time, but it does not say whether the endpoint is a genuine singularity or an artefact of the construction. The continuation question asks for a criterion that distinguishes loss of existence from a removable endpoint. [definition: Maximal Mild Solution] Let $X$ be a Banach space, let $A:D(A)\subset X\to X$ generate a strongly continuous semigroup $(S(t))_{t\ge 0}$ on $X$, and let $F:D(F)\subset X\to X$. A function $u:[0,T_{\max})\to X$ with $u(0)=u_0$ is a mild solution of $u_t=Au+F(u)$ on $[0,T_{\max})$ if its restriction to $[0,T]$ is a mild solution for every $0<T<T_{\max}$. Such a solution is maximal if there do not exist a time $T'>T_{\max}$ and a mild solution $v:[0,T')\to X$ with $v(t)=u(t)$ for all $t<T_{\max}$. [/definition] Maximality means that the solution has been continued as far as the local theory permits. This raises the main diagnostic question for the rest of the chapter: if a finite endpoint occurs, must some quantitative measure of the solution become unbounded? The blow-up alternative answers this in the solution norm. [quotetheorem:7090] [citeproof:7090] The theorem converts a qualitative question about existence into an a priori estimate problem. If a separate argument bounds $\|u(t)\|_X$ on every bounded time interval contained in the lifespan, then $T_{\max}=\infty$ and the local solution is global. The assumptions matter: maximality is needed because a non-maximal solution can stop at an arbitrary time even when its norm is bounded, and the conclusion is only a statement about finite endpoints because no continuation beyond $T_{\max}=\infty$ is being requested. Lipschitz control on bounded subsets is what gives a restart time that is uniform over bounded sets of data; pointwise local Lipschitz continuity alone does not supply this uniform constant in an arbitrary infinite-dimensional Banach space. 
Without enough Lipschitz control, as in non-Lipschitz scalar ODE models such as $y'=|y|^{1/2}$, uniqueness and restart arguments can fail. The theorem does not rule out growth as $t\to\infty$; it says only that finite-time loss of existence must be detected by the chosen $X$-norm. It is useful to name this type of estimate because later arguments will provide it through either comparison or energy dissipation. [definition: Continuation Criterion] Fix a solution class $\mathcal S_T$ for solutions on $[0,T)$ in a Banach space $X$. A continuation criterion for a semilinear evolution equation is a specified predicate \begin{align*} Q_T:\mathcal S_T\to\{\text{true},\text{false}\} \end{align*} together with a time $\varepsilon>0$ such that every $u\in\mathcal S_T$ satisfying $Q_T[u]=\text{true}$ has an extension in the same solution class to $[0,T+\varepsilon)$. [/definition] For nonlinearities that are Lipschitz on bounded subsets of $X$, boundedness in the solution norm is the basic continuation criterion. In stronger function spaces, more refined criteria are possible: for instance, a solution constructed in $C^\alpha(\bar{\Omega})$ may continue as long as its $L^\infty$ norm remains bounded, because parabolic smoothing recovers the Hölder control after any positive time. [example: ODE Profile for Power Blow-Up] For $p>1$ and $y_0>0$, the scalar profile $y'=y^p$, $y(0)=y_0$, stays positive as long as it exists, so we can multiply by $y^{-p}$ and rewrite the equation as \begin{align*} y^{-p}y'=1. \end{align*} Since \begin{align*} \frac{d}{dt}\left(y^{1-p}\right)=(1-p)y^{-p}y'=1-p, \end{align*} integration from $0$ to $t$ gives \begin{align*} y(t)^{1-p}-y_0^{1-p}=(1-p)t. \end{align*} Thus \begin{align*} y(t)^{1-p}=y_0^{1-p}-(p-1)t. \end{align*} Raising both sides to the power $1/(1-p)=-1/(p-1)$ gives \begin{align*} y(t)=\left(y_0^{1-p}-(p-1)t\right)^{-1/(p-1)}. 
\end{align*} The expression is finite exactly while $y_0^{1-p}-(p-1)t>0$, namely for \begin{align*} 0\le t<\frac{y_0^{1-p}}{p-1}. \end{align*} As $t\uparrow T=\frac{y_0^{1-p}}{p-1}$, the positive factor $y_0^{1-p}-(p-1)t$ tends to $0$, and the exponent $-1/(p-1)$ is negative, so $y(t)\to\infty$. For the PDE $u_t-\Delta u=u^p$, a spatially constant profile $U(t,x)=y(t)$ satisfies $\Delta U=0$ and \begin{align*} \partial_t U-\Delta U=y'(t)=y(t)^p=U(t,x)^p. \end{align*} On $\mathbb R^n$, or on a domain with homogeneous Neumann or periodic boundary conditions, this profile is therefore an exact classical solution that blows up at time $T$. A nonzero constant does not satisfy the homogeneous Dirichlet condition, and in that setting diffusion may slow or prevent the ODE growth, but the *Blow-Up Alternative for Semilinear Parabolic Equations* says that any finite-time breakdown of the mild PDE solution must still be detected by divergence of the chosen solution norm. [/example] The power reaction example also warns that local parabolic smoothing does not by itself prevent blow-up. To get global solutions, the equation needs a structural estimate that bounds the solution uniformly in time or bounds an energy that controls the solution norm. ## Maximum-Principle Bounds for Reaction-Diffusion Equations The next question is how to prove a priori bounds without solving the equation explicitly. For scalar reaction-diffusion equations, the parabolic maximum principle compares the solution with scalar barriers solving ODEs. [definition: Reaction-Diffusion Equation] Let $\Omega\subset\mathbb R^n$ be open and let $f:\mathbb R\to\mathbb R$ be locally Lipschitz. A scalar reaction-diffusion equation with homogeneous Dirichlet boundary condition is \begin{align*} \partial_t u-\Delta u=f(u) \quad \text{in } (0,T)\times\Omega, \qquad u=0 \quad \text{on } (0,T)\times\partial\Omega. \end{align*} [/definition] The reaction term describes pointwise growth or decay, while the Laplacian spreads values spatially through diffusion. The pointwise order of two functions should persist when their initial and boundary values are ordered and the corresponding differential inequality points in the same direction. This need is captured by the comparison principle. 
[quotetheorem:7091] [citeproof:7091] Comparison is the bridge from finite-dimensional ODE barriers to infinite-dimensional PDE bounds. The ordering hypotheses are indispensable: if the initial data are reversed at even one point, or if the boundary data force $u>v$ on part of the parabolic boundary, no interior maximum-principle argument can restore the desired order. Local Lipschitz continuity of $f$ is what turns $f(u)-f(v)$ into a bounded coefficient times $u-v$ on the range of the two solutions; for rough nonlinearities this reduction can fail. The scalar nonlinearity $f(s)=|s|^{1/2}$ already shows the danger: the ODE $y'=f(y)$ with $y(0)=0$ has both the solution $y\equiv0$ and delayed positive solutions, so order and uniqueness can no longer be separated cleanly by the coefficient argument above. The bounded smooth domain and classical regularity assumptions are the setting in which the stated parabolic maximum principle applies directly, though weak variants exist with additional approximation work. The next global existence result is needed because an invariant interval gives exactly the $L^\infty$ estimate required by the blow-up alternative. [example: Boundary Ordering Cannot Be Recovered in the Interior] Take $\Omega=(0,1)$, let $f=0$, set $v(t,x)=0$, and set $u(t,x)=1-x$. We compute \begin{align*} \partial_t u(t,x)=0 \end{align*} because $u$ has no $t$-dependence, and \begin{align*} \partial_x u(t,x)=-1 \end{align*} so \begin{align*} \partial_{xx}u(t,x)=0. \end{align*} Hence $\Delta u=\partial_{xx}u=0$, and therefore \begin{align*} \partial_tu-\Delta u-f(u)=0-0-0=0. \end{align*} For $v\equiv0$, we also have $\partial_t v=0$, $\Delta v=0$, and $f(v)=0$, so \begin{align*} \partial_tv-\Delta v-f(v)=0-0-0=0. \end{align*} Thus the differential inequality is satisfied with equality. The order hypotheses fail on the parabolic boundary. 
At time $t=0$, \begin{align*} u(0,x)=1-x \end{align*} and, for every $0<x<1$, the inequality $1-x>0$ gives \begin{align*} u(0,x)>0=v(0,x), \end{align*} so the required initial ordering $u(0,\cdot)\le v(0,\cdot)$ is reversed. At the left boundary point, \begin{align*} u(t,0)=1-0=1 \end{align*} while \begin{align*} v(t,0)=0, \end{align*} so $u(t,0)>v(t,0)$ for every $t$. The desired conclusion also fails in the interior, since for $0<x<1$ we have $1-x>0$, hence \begin{align*} u(t,x)=1-x>0=v(t,x). \end{align*} The differential inequality alone does not create the missing initial and boundary order; comparison needs that order as an input. [/example] With the boundary and initial ordering made explicit, comparison can be used as a global existence tool rather than only as a uniqueness device. The obstruction in semilinear problems is that the reaction term can drive values of $u$ outside any a priori range, which would leave local existence estimates with no uniform continuation bound. If the endpoint constants act as sub- and supersolutions, comparison traps the whole solution between them and converts an ODE phase-line condition into a PDE pointwise bound. [quotetheorem:7092] [citeproof:7092] The hypotheses $f(a)\ge0$ and $f(b)\le0$ say that the vector field points inward at the endpoints of the interval. This is the PDE analogue of a positively invariant interval for an autonomous ODE. If $f(a)<0$, then the constant lower barrier $a$ is pushed downward by the reaction term, so a solution starting at the lower edge can immediately leave the interval. If $f(b)>0$, the same obstruction appears at the upper edge. Boundary data are equally important: even if the reaction points inward, prescribing boundary values outside $[a,b]$ forces the solution to violate the interval at the parabolic boundary, so comparison cannot prove an interior invariant bound. 
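The role of the endpoint conditions can be illustrated numerically. The following is a rough sketch, assuming a one-dimensional domain $(0,1)$, homogeneous Dirichlet data, an explicit finite-difference scheme, and the logistic reaction $f(s)=s(1-s)$ on the interval $[0,1]$ as a test case; the function names, grid sizes, and initial profile are hypothetical choices made only for illustration. A second run uses the constant reaction $f\equiv-1$, for which $f(0)<0$, to show the solution leaving the interval through the lower endpoint.

```python
import math

def evolve(f, u0, n_cells=50, cfl=0.4, n_steps=2000):
    """Explicit finite differences for u_t - u_xx = f(u) on (0, 1) with u = 0
    at both endpoints. Returns the final profile and the smallest and largest
    grid values seen over all time steps."""
    dx = 1.0 / n_cells
    dt = cfl * dx * dx          # cfl <= 1/2 is the usual restriction for the explicit heat step
    u = [u0(i * dx) for i in range(n_cells + 1)]
    u[0] = u[-1] = 0.0          # homogeneous Dirichlet data
    lo, hi = min(u), max(u)
    for _ in range(n_steps):
        new_u = u[:]
        for i in range(1, n_cells):
            lap = (u[i + 1] - 2.0 * u[i] + u[i - 1]) / (dx * dx)
            new_u[i] = u[i] + dt * (lap + f(u[i]))
        u = new_u
        lo, hi = min(lo, min(u)), max(hi, max(u))
    return u, lo, hi

def logistic(s):
    return s * (1.0 - s)

def bump(x):
    return 0.9 * math.sin(math.pi * x)

# Inward-pointing reaction on [0, 1]: f(0) = 0 >= 0 and f(1) = 0 <= 0.
_, lo_good, hi_good = evolve(logistic, bump)

# Outward-pointing reaction at the lower endpoint: f(0) = -1 < 0.
_, lo_bad, _ = evolve(lambda s: -1.0, lambda x: 0.0, n_steps=10)

print(lo_good, hi_good, lo_bad)
```

With these parameters the update is a monotone function of the neighbouring grid values on $[0,1]$, which is the discrete counterpart of the comparison argument; the sketch is not a substitute for the continuous proof.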
The final global-existence sentence should be read as a continuation step after the comparison estimate, not as a new existence theorem for arbitrary boundary data. With nonhomogeneous Dirichlet data, the mild formulation is usually obtained either by subtracting a sufficiently regular extension of $g$ and solving an equation with homogeneous boundary condition plus additional forcing, or by using a parabolic evolution framework that builds the boundary data into the domain and trace condition. Once that framework supplies a local solution and an $L^\infty$ continuation criterion, the invariant interval estimate prevents the finite-time obstruction. [example: Fisher-KPP Equation] The Fisher-KPP equation is \begin{align*} \partial_t u-\Delta u=u(1-u). \end{align*} Here the reaction function is $f(s)=s(1-s)$. At the lower endpoint $0$, \begin{align*} f(0)=0(1-0)=0. \end{align*} At the upper endpoint $1$, \begin{align*} f(1)=1(1-1)=0. \end{align*} Thus, with $a=0$ and $b=1$, the endpoint conditions $f(a)\ge0$ and $f(b)\le0$ hold because $0\ge0$ and $0\le0$. Assume the initial data and boundary data satisfy $0\le u_0\le1$ and $0\le g\le1$ on the parabolic boundary. By *[Global Existence from Invariant Bounds](/theorems/7092)*, the solution remains in the invariant interval: \begin{align*} 0\le u(t,x)\le1 \end{align*} for every $(t,x)$ in its lifespan. Therefore \begin{align*} \|u(t,\cdot)\|_{L^\infty(\Omega)}\le1 \end{align*} throughout that lifespan. The *Blow-Up Alternative for Semilinear Parabolic Equations* says that finite-time breakdown would force the chosen solution norm, in particular the $L^\infty$ norm in this continuation setting, to become unbounded. Since the estimate above keeps that norm bounded by $1$, no finite-time breakdown can occur, so the solution is global. 
The sign of $f(s)=s(1-s)$ also explains the model: for $0<s<1$, both factors $s$ and $1-s$ are positive, so $f(s)>0$ and the population grows; at $s=1$, the factor $1-s$ vanishes, giving saturation at carrying capacity. [/example] The same comparison method applies to equations whose nonlinearity is not sign-definite everywhere, provided the physically or analytically relevant interval is invariant. This is a standard route from local semilinear theory to global bounded solutions. ## Dissipation and Gradient Flow Structure Maximum-principle bounds are pointwise. A complementary problem is to find a quantity that decreases along the flow and therefore measures long-time relaxation toward equilibrium. [definition: Allen-Cahn Energy] Let $\Omega\subset\mathbb R^n$ be bounded and smooth, and let $W:\mathbb R\to[0,\infty)$ be given by \begin{align*} W(s)=\frac14(s^2-1)^2. \end{align*} The Allen-Cahn energy associated with $W$ is the functional $E:H^1(\Omega)\to[0,\infty]$ defined by \begin{align*} E[u]=\int_\Omega \left(\frac12|\nabla u|^2+W(u)\right)\,d\mathcal L^n. \end{align*} [/definition] This energy penalizes both interfaces, through $|\nabla u|^2$, and departure from the preferred phases $u=\pm1$, through $W(u)$. Since $W'(s)=s^3-s$, the formal $L^2$-gradient flow of $E$ is the Allen-Cahn equation $\partial_tu=\Delta u-W'(u)=\Delta u+u-u^3$. The key question is whether this formal variational picture gives a usable estimate for the PDE. Along a genuine $L^2$-gradient flow, energy should not merely be bounded; its rate of decrease should equal the squared $L^2(\Omega)$ norm of the velocity $\partial_tu$, which is the quantity needed to control relaxation and rule out energy creation by the reaction term. [quotetheorem:7093] [citeproof:7093] Energy dissipation alone does not always give an $L^\infty$ bound, but for Allen-Cahn the maximum principle supplies the missing pointwise estimate when the initial data lie between $-1$ and $1$. The smoothness assumption is what justifies differentiating $E[u(t)]$ and integrating by parts without approximation; weak solutions require a separate density or lower-semicontinuity argument. 
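The dissipation mechanism can be observed in a small numerical experiment. The sketch below assumes the one-dimensional domain $(0,1)$, time-independent Dirichlet data $u=0$ (so that $\partial_tu$ vanishes on the boundary), the reaction $s-s^3$, and an explicit finite-difference step, which is gradient descent on a discrete version of $E$; the discretisation, step sizes, initial profile, and function name are illustrative assumptions, not taken from the notes.

```python
import math

def allen_cahn_energies(n_cells=50, cfl=0.4, n_steps=1000):
    """Explicit scheme for u_t = u_xx + u - u^3 on (0, 1) with u = 0 at the endpoints.
    Returns the discrete energy after every step and the largest |u| encountered."""
    dx = 1.0 / n_cells
    dt = cfl * dx * dx
    u = [0.8 * math.sin(2.0 * math.pi * i * dx) for i in range(n_cells + 1)]
    u[0] = u[-1] = 0.0          # fixed boundary values, so no boundary flux term

    def energy(v):
        # Discrete Dirichlet part plus double-well part W(s) = (s^2 - 1)^2 / 4.
        grad = sum(0.5 * ((v[i + 1] - v[i]) / dx) ** 2 for i in range(n_cells))
        well = sum(0.25 * (v[i] ** 2 - 1.0) ** 2 for i in range(1, n_cells))
        return dx * (grad + well)

    energies = [energy(u)]
    sup_norm = max(abs(x) for x in u)
    for _ in range(n_steps):
        new_u = u[:]
        for i in range(1, n_cells):
            lap = (u[i + 1] - 2.0 * u[i] + u[i - 1]) / (dx * dx)
            new_u[i] = u[i] + dt * (lap + u[i] - u[i] ** 3)
        u = new_u
        energies.append(energy(u))
        sup_norm = max(sup_norm, max(abs(x) for x in u))
    return energies, sup_norm

energies, sup_norm = allen_cahn_energies()
print(energies[0], energies[-1], sup_norm)
```

For a sufficiently small time step, as chosen here, the recorded energies should be nonincreasing and the profile should remain in $[-1,1]$, which mirrors the way the energy identity and the invariant interval complement each other.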
The boundary condition is also structural: if the integration by parts boundary term does not vanish, the identity acquires a term such as $\int_{\partial\Omega} \partial_\nu u\,\partial_t u\,d\mathcal H^{n-1}$, whose sign can inject energy through the boundary instead of dissipating it. Finally, a decreasing energy may control only an $H^1$-type quantity and potential integral. In dimensions where $H^1(\Omega)$ does not embed into $L^\infty(\Omega)$, a sequence can have bounded energy while developing increasingly high narrow peaks, so this kind of energy bound alone does not prevent pointwise blow-up or guarantee global bounded classical solutions in every semilinear model. [example: Boundary Flux Can Increase the Energy] On $\Omega=(0,1)$, the outward unit normal is $\nu(0)=-1$ at the left endpoint and $\nu(1)=1$ at the right endpoint. For a smooth function $u(t,x)$, the gradient part of the Allen-Cahn energy satisfies \begin{align*} \frac{d}{dt}\int_0^1 \frac12 |u_x(t,x)|^2\,dx=\int_0^1 u_x(t,x)u_{tx}(t,x)\,dx. \end{align*} Integrating by parts in $x$ gives \begin{align*} \int_0^1 u_xu_{tx}\,dx=\left[u_xu_t\right]_{x=0}^{x=1}-\int_0^1 u_{xx}u_t\,dx. \end{align*} The boundary bracket is \begin{align*} \left[u_xu_t\right]_{x=0}^{x=1}=u_x(t,1)u_t(t,1)-u_x(t,0)u_t(t,0). \end{align*} Equivalently, since $\partial_\nu u(t,0)=-u_x(t,0)$ and $\partial_\nu u(t,1)=u_x(t,1)$, the boundary contribution is \begin{align*} \int_{\partial\Omega}\partial_\nu u\,\partial_tu\,d\mathcal H^0=-u_x(t,0)u_t(t,0)+u_x(t,1)u_t(t,1). \end{align*} Now choose the boundary behavior so that $u_t(t,0)<0$, $u_x(t,0)>0$, and $u_x(t,1)u_t(t,1)=0$. Then \begin{align*} -u_x(t,0)u_t(t,0)>0 \end{align*} because $u_x(t,0)>0$ and $u_t(t,0)<0$. Hence \begin{align*} \int_{\partial\Omega}\partial_\nu u\,\partial_tu\,d\mathcal H^0=-u_x(t,0)u_t(t,0)>0. 
\end{align*} This positive boundary term is energy entering through the boundary, so the Allen-Cahn identity with only the negative bulk term requires homogeneous Neumann, periodic, or otherwise compatible boundary conditions that make this boundary contribution vanish. [/example] This boundary-flux example is separate from the global boundedness mechanism. For Allen-Cahn with compatible boundary conditions and initial data in the invariant interval, comparison and energy dissipation reinforce each other. [example: Allen-Cahn as a Global Semilinear Flow] For the Allen-Cahn reaction term $f(s)=s-s^3$, the endpoint values on the interval $[-1,1]$ are \begin{align*} f(-1)=(-1)-(-1)^3=-1-(-1)=0. \end{align*} At the upper endpoint, \begin{align*} f(1)=1-1^3=1-1=0. \end{align*} Thus, with $a=-1$ and $b=1$, the invariant-interval conditions are satisfied because $f(a)=0\ge0$ and $f(b)=0\le0$. Assume the initial and boundary data satisfy $-1\le u_0\le1$ and are compatible with the same interval on the parabolic boundary. By *Global Existence from Invariant Bounds*, every solution in its lifespan satisfies \begin{align*} -1\le u(t,x)\le1. \end{align*} Taking the supremum over $x\in\Omega$ gives \begin{align*} \|u(t,\cdot)\|_{L^\infty(\Omega)}\le1. \end{align*} The *Allen-Cahn Energy Dissipation* identity gives \begin{align*} \frac{d}{dt}E[u(t)]=-\int_\Omega |\partial_tu(t,x)|^2\,d\mathcal L^n(x). \end{align*} Since $|\partial_tu(t,x)|^2\ge0$ for every $x$, the integral is nonnegative, so \begin{align*} -\int_\Omega |\partial_tu(t,x)|^2\,d\mathcal L^n(x)\le0. \end{align*} Hence \begin{align*} \frac{d}{dt}E[u(t)]\le0, \end{align*} so the energy is nonincreasing along the flow. The pointwise bound prevents finite-time blow-up in the $L^\infty$ continuation setting, while the energy identity records the dissipative motion down the double-well energy landscape. [/example] The three mechanisms in this chapter fit together in the usual workflow for semilinear parabolic equations. 
First, local existence is obtained by applying the contraction mapping theorem to the Duhamel formula. Second, the blow-up alternative reduces global existence to an a priori bound. Third, maximum principles and dissipative energies provide those bounds for reaction-diffusion equations such as Fisher-KPP and Allen-Cahn. The same pattern reappears outside classical PDE: invariant regions are used in population dynamics and chemical reaction networks, while Lyapunov functionals play the role of energy in gradient-based optimization and nonequilibrium thermodynamics. Semilinear parabolic equations retain the smoothing backbone of the heat flow, but the nonlinearity forces us to track how bounds interact with the reaction term. The next chapter makes the same transition for hyperbolic equations, where the highest-order part is still the wave operator but the nonlinear term must be controlled without any parabolic smoothing. # 10. Semilinear Wave Equations This chapter moves from linear wave equations to nonlinear perturbations whose highest-order part is still the wave operator. The guiding question is whether the energy method from the linear theory can control a solution when the equation contains a power of the unknown. We focus on local well-posedness in the natural energy space, the role of scaling in deciding which nonlinearities are subcritical, and the stability estimates that make the solution map continuous. The prerequisites are the linear wave propagator and Duhamel formula from Chapters 6 and 8, Sobolev embedding, and the Strichartz estimates for the wave equation. ## Local Well-Posedness in Energy Spaces For the linear wave equation, the conserved energy gives control of $(u(t),\partial_t u(t))$ in $H^1(\mathbb R^n)\times L^2(\mathbb R^n)$. 
The first nonlinear question is whether the same energy space is strong enough to construct solutions to \begin{align*} \partial_t^2 u-\Delta u+F'(u)=0, \end{align*} where the nonlinearity is lower order and has polynomial growth. [definition: Energy Solution For A Semilinear Wave Equation] Let $I\subset \mathbb R$ be an interval with $0\in I$, let $U\subseteq \mathbb R^n$ be open, and let $F\in C^1(\mathbb R)$ with $F(0)=0$. A function $u:I\times U\to \mathbb R$ is an energy solution of \begin{align*} \partial_t^2u-\Delta u+F'(u)=0 \end{align*} with initial data $(u_0,u_1)\in H^1(U)\times L^2(U)$ if \begin{align*} u&\in C(I;H^1(U)),& \partial_tu&\in C(I;L^2(U)),& F'(u)&\in L^1_{\mathrm{loc}}(I\times U), \end{align*} $u(0)=u_0$, $\partial_tu(0)=u_1$, and the equation holds in $\mathcal D'(I\times U)$. [/definition] This definition keeps the same topology as the linear energy estimate and interprets the equation distributionally, since second derivatives need not be functions. The nonlinear term is meaningful when Sobolev embedding places $u$ in a sufficiently high $L^q$ space. [definition: Defocusing Power Nonlinearity] For $p>1$, the defocusing power nonlinearity is the map \begin{align*} N_p:\mathbb R&\to\mathbb R,& a&\mapsto |a|^{p-1}a. \end{align*} Its potential is the map \begin{align*} F_p:\mathbb R&\to\mathbb R,& a&\mapsto \frac{1}{p+1}|a|^{p+1}. \end{align*} [/definition] The word defocusing refers to the sign of the potential energy: it contributes a nonnegative term to the conserved energy. After fixing this model nonlinearity, the next task is to prove that the Cauchy problem has an energy solution before using conservation laws to extend it. The most direct construction uses only the energy estimate for the forced linear wave equation, so it requires the map $u\mapsto |u|^{p-1}u$ to be locally Lipschitz from $H^1(\mathbb R^n)$ into $L^2(\mathbb R^n)$. 
This gives a clean local theorem in the lower part of the energy-subcritical range; the remaining subcritical powers require the sharper Strichartz and fractional-derivative machinery discussed after the theorem. [quotetheorem:7094] [citeproof:7094] The theorem is local: it gives existence on a short time interval and does not by itself prevent finite-time breakdown. Its exponent range is deliberately narrower than the full energy-subcritical range \begin{align*} 1<p<\frac{n+2}{n-2}. \end{align*} For powers \begin{align*} \frac{n}{n-2}<p<\frac{n+2}{n-2}, \end{align*} the estimate $H^1\to L^{2p}$ is no longer available, so the nonlinearity is not controlled in $L^1_tL^2_x$ by the energy norm alone. The full energy-subcritical local theory replaces the single $L^1_tL^2_x$ forcing estimate by wave Strichartz spaces and, in higher dimensions or near the critical exponent, fractional chain-rule estimates for derivatives of the nonlinearity. Those estimates still give a lifespan depending only on the energy norm, but that conclusion is a separate analytic theorem rather than a consequence of the elementary Lipschitz bound proved above. The energy-space hypothesis is also essential: if initial data are only in $L^2\times H^{-1}$, the potential energy and the product $|u|^{p-1}u$ may not even define the dual forcing term used in Duhamel's formula. For a concrete obstruction, choose in dimension $n\ge 3$ a smooth compactly supported function $\phi$ with $\phi(0)\ne 0$ and set $\phi_\lambda(x)=\lambda^{n/2}\phi(\lambda x)$. Then $\|\phi_\lambda\|_{L^2}$ stays constant, but \begin{align*} \|\nabla\phi_\lambda\|_{L^2}=\lambda\|\nabla\phi\|_{L^2}, \end{align*} so the sequence is bounded in $L^2$ and unbounded in $H^1$. The local theorem cannot assign a uniform lifespan to such data, and the potential term $\int_{\mathbb R^n}|\phi_\lambda|^{p+1}\,d\mathcal L^n$ may diverge along the sequence. 
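The scaling behaviour of the three quantities attached to $\phi_\lambda$ can be checked directly with the substitution $y=\lambda x$. The following worked computation is a sketch under the stated assumptions on $\phi$ and uses only the change of variables formula.

```latex
\begin{align*}
\|\phi_\lambda\|_{L^2}^2
  &=\lambda^{n}\int_{\mathbb R^n}|\phi(\lambda x)|^2\,d\mathcal L^n(x)
   =\int_{\mathbb R^n}|\phi(y)|^2\,d\mathcal L^n(y)
   =\|\phi\|_{L^2}^2,\\
\|\nabla\phi_\lambda\|_{L^2}^2
  &=\lambda^{n+2}\int_{\mathbb R^n}|(\nabla\phi)(\lambda x)|^2\,d\mathcal L^n(x)
   =\lambda^{2}\|\nabla\phi\|_{L^2}^2,\\
\int_{\mathbb R^n}|\phi_\lambda|^{p+1}\,d\mathcal L^n
  &=\lambda^{\frac{n(p+1)}{2}-n}\int_{\mathbb R^n}|\phi|^{p+1}\,d\mathcal L^n
   =\lambda^{\frac{n(p-1)}{2}}\int_{\mathbb R^n}|\phi|^{p+1}\,d\mathcal L^n.
\end{align*}
```

Since $p>1$ and $\phi$ is not identically zero, the last exponent is positive, so the potential term grows like $\lambda^{n(p-1)/2}$ as $\lambda\to\infty$ while the $L^2$ norm stays fixed.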
[example: Defocusing Cubic Wave Equation In Three Dimensions] In dimension $n=3$, take $p=3$. Since the unknown is real-valued, the defocusing power nonlinearity is \begin{align*} N_3(a)=|a|^{3-1}a=|a|^2a=a^3. \end{align*} Thus the Cauchy problem is \begin{align*} \partial_t^2u-\Delta u+u^3=0,\qquad (u(0),\partial_tu(0))=(u_0,u_1)\in H^1(\mathbb R^3)\times L^2(\mathbb R^3). \end{align*} The energy-critical exponent in three dimensions is \begin{align*} \frac{n+2}{n-2}=\frac{3+2}{3-2}=\frac{5}{1}=5. \end{align*} Because \begin{align*} 3<5, \end{align*} the cubic equation is energy-subcritical. The elementary local theorem also applies here, since \begin{align*} \frac{n}{n-2}=\frac{3}{3-2}=3, \end{align*} so $p=3$ lies at the endpoint of the range $1<p\le n/(n-2)$. Hence there is $T>0$ and a unique solution in \begin{align*} C([-T,T];H^1(\mathbb R^3))\cap C^1([-T,T];L^2(\mathbb R^3)). \end{align*} For $p=3$, the potential coefficient is \begin{align*} \frac{1}{p+1}=\frac{1}{3+1}=\frac14, \end{align*} so the energy contains the nonnegative quartic term \begin{align*} \frac14\int_{\mathbb R^3}|u|^4\,d\mathcal L^3. \end{align*} The Sobolev embedding $H^1(\mathbb R^3)\hookrightarrow L^6(\mathbb R^3)$ gives the needed $L^2$ control of the cubic forcing: \begin{align*} \|u^3\|_{L^2}=\left(\int_{\mathbb R^3}|u|^6\,d\mathcal L^3\right)^{1/2}=\|u\|_{L^6}^3\le C\|u\|_{H^1}^3. \end{align*} For two functions $u,v\in H^1(\mathbb R^3)$, the algebraic identity \begin{align*} u^3-v^3=(u-v)(u^2+uv+v^2) \end{align*} and Hölder's inequality give \begin{align*} \|u^3-v^3\|_{L^2}\le \|u-v\|_{L^6}\left(\|u\|_{L^6}^2+\|u\|_{L^6}\|v\|_{L^6}+\|v\|_{L^6}^2\right). \end{align*} Applying the same Sobolev embedding to each $L^6$ factor yields \begin{align*} \|u^3-v^3\|_{L^2}\le C\|u-v\|_{H^1}\left(\|u\|_{H^1}^2+\|u\|_{H^1}\|v\|_{H^1}+\|v\|_{H^1}^2\right). \end{align*} This is exactly the local Lipschitz control of the nonlinearity in $L^2$ used by the energy-space fixed-point argument. 
[/example] ## Energy-Subcritical And Critical Scaling Heuristics Local theory depends on estimates, but scaling predicts which estimates should be possible. The question is: if the equation is rescaled in a way that preserves its form, how does the conserved energy change? [definition: Scaling Of The Power Wave Equation] For $p>1$ and $\lambda>0$, the scaling operator associated with the power wave equation is the map \begin{align*} S_\lambda:C^2(\mathbb R^{1+n})&\to C^2(\mathbb R^{1+n}),& S_\lambda u&=u_\lambda, \end{align*} where \begin{align*} u_\lambda(t,x)=\lambda^{\frac{2}{p-1}}u(\lambda t,\lambda x). \end{align*} [/definition] This scaling leaves the differential equation invariant: the two linear derivatives and the power nonlinearity acquire the same homogeneity. To compare this invariance with well-posedness in Sobolev spaces, we need the regularity level whose norm has the same homogeneity under the scaling. [definition: Critical Sobolev Exponent For The Power Wave Equation] For $p>1$ and $n\ge 1$, the critical Sobolev exponent of the power wave equation is \begin{align*} s_c=\frac n2-\frac{2}{p-1}. \end{align*} [/definition] The energy space corresponds to $s=1$ for $u$ and $s=0$ for $\partial_tu$. This comparison motivates the standard classification of powers according to whether the energy regularity is above, equal to, or below the critical Sobolev exponent. [definition: Energy-Subcritical Critical And Supercritical Powers] Let $n\ge 3$ and $p>1$. The power $p$ is energy-subcritical if \begin{align*} p<\frac{n+2}{n-2}, \end{align*} energy-critical if \begin{align*} p=\frac{n+2}{n-2}, \end{align*} and energy-supercritical if \begin{align*} p>\frac{n+2}{n-2}. \end{align*} [/definition] In the subcritical regime, rescaling a solution to smaller spatial scales decreases the nonlinear part relative to the energy topology. 
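The classification above can be checked by exact arithmetic. The sketch below is illustrative only: the function names `critical_sobolev_exponent` and `classify_power` are hypothetical helpers, not part of the notes. It relies on the observation that, for $n\ge 3$, the condition $s_c<1$ is equivalent to $p<(n+2)/(n-2)$, so comparing $s_c$ with the energy regularity $s=1$ reproduces the three cases of the definition.

```python
from fractions import Fraction


def critical_sobolev_exponent(n: int, p) -> Fraction:
    """Return s_c = n/2 - 2/(p-1) for the power wave equation (hypothetical helper)."""
    p = Fraction(p)
    if p <= 1:
        raise ValueError("the power p must satisfy p > 1")
    return Fraction(n, 2) - Fraction(2) / (p - 1)


def classify_power(n: int, p) -> str:
    """Classify p by comparing s_c with the energy regularity s = 1 (stated for n >= 3)."""
    if n < 3:
        raise ValueError("the classification is stated for n >= 3")
    s_c = critical_sobolev_exponent(n, p)
    if s_c < 1:
        return "energy-subcritical"
    if s_c == 1:
        return "energy-critical"
    return "energy-supercritical"


if __name__ == "__main__":
    # Dimension n = 3: cubic, quintic, and septic powers.
    for power in (3, 5, 7):
        print(power, critical_sobolev_exponent(3, power), classify_power(3, power))
```

Using `Fraction` rather than floating point keeps the borderline case $s_c=1$ exact, which matters because the critical power is defined by an equality.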
The next calculation verifies that the borderline in this classification is exactly the exponent for which the full conserved energy is scale-invariant. [quotetheorem:7095] [citeproof:7095] This calculation explains why the exponent in the local existence theorem is not accidental. The assumption $n\ge 3$ is part of the scaling classification because only then does the Sobolev energy embedding have a finite critical power; for $n=1,2$, the expression \begin{align*} \frac{n+2}{n-2} \end{align*} does not define a positive finite threshold, and every finite power is handled by different subcritical estimates. The gradient and kinetic terms force the same condition, so changing the sign of the potential would not move the scaling threshold even though it would affect global dynamics. The theorem does not prove well-posedness at the critical exponent; it only identifies the point where scaling stops making concentration energetically more expensive. For a concrete failure mode, take a nonzero $\phi\in C_c^\infty(\mathbb R^n)$ and define \begin{align*} \phi_\lambda(x)=\lambda^{\frac{2}{p-1}}\phi(\lambda x). \end{align*} In the supercritical range, these profiles concentrate near the origin while the scaling of the energy no longer penalises the concentration in the way needed by the subcritical fixed-point argument. This is the diagnostic obstruction behind the energy-supercritical regime. [example: Critical Exponent In Three Dimensions] For $n=3$, the energy-critical exponent is \begin{align*} p=\frac{n+2}{n-2}=\frac{3+2}{3-2}=\frac{5}{1}=5. \end{align*} Thus the defocusing equation with $p=5$, namely $\partial_t^2u-\Delta u+u^5=0$, is energy-critical. The cubic equation has $p=3$, and since \begin{align*} 3<5, \end{align*} it is energy-subcritical. For the cubic equation, the scaling exponent is \begin{align*} \frac{2}{p-1}=\frac{2}{3-1}=\frac{2}{2}=1, \end{align*} so the scaled profile is \begin{align*} u_\lambda(t,x)=\lambda u(\lambda t,\lambda x). 
\end{align*} At a fixed time, the kinetic and gradient terms scale with exponent \begin{align*} \frac{4}{p-1}+2-n=\frac{4}{3-1}+2-3=\frac{4}{2}-1=2-1=1. \end{align*} The potential term has exponent \begin{align*} \frac{2(p+1)}{p-1}-n=\frac{2(3+1)}{3-1}-3=\frac{8}{2}-3=4-3=1. \end{align*} Therefore each part of the cubic energy is multiplied by $\lambda$ when a fixed profile is rescaled to spatial scale $\lambda^{-1}$. Concentrating cubic waves to smaller scales therefore costs the positive factor $\lambda$, while in the critical quintic case the corresponding exponent is $0$, so the energy scale is preserved. [/example] ## Stability Estimates And Continuous Dependence Existence alone is not enough for a well-posed Cauchy problem. We also need to know whether nearby initial data produce nearby solutions and whether forcing errors remain controlled over short time intervals. [definition: Energy Functional For The Defocusing Power Wave Equation] For $p>1$, the defocusing power energy is the functional \begin{align*} E:\left(H^1(\mathbb R^n)\cap L^{p+1}(\mathbb R^n)\right)\times L^2(\mathbb R^n)&\to [0,\infty),& (u,v)&\mapsto E[u,v], \end{align*} defined by \begin{align*} E[u,v]=\int_{\mathbb R^n}\left(\frac12|v|^2+\frac12|\nabla u|^2+\frac{1}{p+1}|u|^{p+1}\right)\,d\mathcal L^n. \end{align*} [/definition] The energy functional combines the linear wave energy with the potential associated with the nonlinearity. For a nonlinear wave equation, local solutions can only be continued as long as the quantities in the energy norm remain controlled, but the equation itself contains second time derivatives and does not directly give such a bound. The next issue is therefore whether this formally defined quantity is actually invariant along solutions. 
A conservation statement is needed to turn the energy from a bookkeeping expression into an a priori estimate: it must show that the defocusing potential participates in the same conserved total energy as the kinetic and gradient terms. In the statement below, notation such as $C^2(I;L^2)$ means a twice continuously differentiable map from the time interval $I$ into the Banach space $L^2(\mathbb R^n)$, and $C^1(I;H^2)$ is interpreted similarly with values in the Sobolev space $H^2(\mathbb R^n)$. A compact subinterval $J\subset I$ is a closed bounded interval lying inside the time interval under consideration; assumptions stated on every such $J$ express local-in-time regularity without requiring uniform bounds all the way to the endpoints of $I$. [quotetheorem:7096] [citeproof:7096] The sign of the potential now becomes decisive. The compact-support assumption above is a convenient sufficient condition for removing boundary terms; on a bounded domain the corresponding statement would require compatible boundary conditions such as Dirichlet or Neumann data. If those hypotheses fail, a boundary flux term can appear, so the energy inside the spatial region need not be conserved. The regularity assumptions are also part of the statement: for rough energy solutions, the product with $\partial_tu$ and the chain rule for $|u|^{p+1}$ must be justified by an approximation argument. The identity does not say that every component of the energy is separately conserved; kinetic energy can transfer into gradient or potential energy. In the defocusing case, however, the sum controls the linear energy norm and gives the a priori estimate used next. A concrete boundary-flux failure occurs already for the linear wave equation on the half-line. 
If $u(t,x)=f(x+t)$ on $(0,\infty)$ with $f\in C_c^\infty(\mathbb R)$, then $u_{tt}-u_{xx}=0$, but the energy on the half-line satisfies \begin{align*} \frac{d}{dt}\int_0^\infty \frac12\left(|u_t(t,x)|^2+|u_x(t,x)|^2\right)\,dx =-u_t(t,0)u_x(t,0). \end{align*} For this travelling profile the right-hand side is $-|f'(t)|^2$, so energy leaves the half-line through the boundary unless a boundary condition removes the flux. Energy conservation gives a bound when it applies, but a local existence theorem needs a separate restart principle to turn such a bound into long-time existence. The next result records exactly what has to remain controlled at the edge of the maximal interval: not every spacetime norm used in the proof is assumed a priori, only the energy norm that fixes the local lifespan. [quotetheorem:7097] [citeproof:7097] For defocusing powers in the range covered by the elementary local theorem, energy conservation often supplies the boundedness assumed in the continuation criterion. The boundedness hypothesis is exactly what prevents the local lifespan from shrinking to zero during iteration; without it, a sequence of restart times can accumulate at a finite endpoint. A model failure is an ordinary differential equation such as $y'=y^2$, whose solution $y(t)=(1-t)^{-1}$ exists locally from each finite value but has a norm that blows up as $t\uparrow 1$. For wave equations, focusing signs or supercritical concentration can create the analogous obstruction in the energy topology or in the named fixed-point class. The criterion does not identify a blow-up mechanism, and it does not rule out loss of compactness in critical problems where the energy remains scale-invariant. In more delicate problems, such as focusing equations or critical equations, the same criterion isolates the precise quantity whose blow-up or concentration must be ruled out. 
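The accumulation of restart times in the model equation $y'=y^2$ can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the function name `restart_times` and the rule "advance by a fixed fraction of the remaining lifespan" are hypothetical choices made for illustration, and the exact solution $y(t)=(1-t)^{-1}$ with $y(0)=1$ is used instead of a numerical integrator.

```python
def restart_times(steps: int, fraction: float = 0.5) -> list[tuple[float, float]]:
    """Restart y' = y^2 from y(0) = 1 and record (time, value) pairs.

    At each restart the local lifespan of the exact solution is 1/y, and the
    next restart time advances by `fraction` of that lifespan. Intended for a
    modest number of steps (floating point cannot resolve times very close to 1).
    """
    t, y = 0.0, 1.0
    history = [(t, y)]
    for _ in range(steps):
        lifespan = 1.0 / y          # time to blow-up measured from the current restart
        t += fraction * lifespan    # advance by part of the local lifespan
        y = 1.0 / (1.0 - t)         # exact solution y(t) = (1 - t)^(-1)
        history.append((t, y))
    return history


if __name__ == "__main__":
    for time, value in restart_times(10):
        print(f"t = {time:.6f}, y = {value:.1f}")
```

With `fraction = 0.5` the restart times are $t_k=1-2^{-k}$: each local step is admissible, but the steps shrink geometrically because the norm grows, so the iteration never passes $t=1$. A bound on the norm would keep the step size bounded below, which is the role of the boundedness hypothesis in the continuation criterion.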
[example: Forced Nonlinear Oscillator Field] Consider a smooth solution of the forced equation \begin{align*} \partial_t^2u-\Delta u+|u|^{p-1}u=f \end{align*} on $[0,T]\times\mathbb R^n$, with enough spatial decay to justify integration by parts. Multiplying the equation by $\partial_tu$ and integrating over $\mathbb R^n$ gives \begin{align*} \int_{\mathbb R^n}(\partial_t^2u)\partial_tu\,d\mathcal L^n-\int_{\mathbb R^n}(\Delta u)\partial_tu\,d\mathcal L^n+\int_{\mathbb R^n}|u|^{p-1}u\partial_tu\,d\mathcal L^n=\int_{\mathbb R^n}f\partial_tu\,d\mathcal L^n. \end{align*} The first term is \begin{align*} \int_{\mathbb R^n}(\partial_t^2u)\partial_tu\,d\mathcal L^n=\frac{d}{dt}\int_{\mathbb R^n}\frac12|\partial_tu|^2\,d\mathcal L^n. \end{align*} For the spatial term, integration by parts gives \begin{align*} -\int_{\mathbb R^n}(\Delta u)\partial_tu\,d\mathcal L^n=\int_{\mathbb R^n}\nabla u\cdot\nabla\partial_tu\,d\mathcal L^n. \end{align*} Since $\nabla\partial_tu=\partial_t\nabla u$, this equals \begin{align*} \int_{\mathbb R^n}\nabla u\cdot\nabla\partial_tu\,d\mathcal L^n=\frac{d}{dt}\int_{\mathbb R^n}\frac12|\nabla u|^2\,d\mathcal L^n. \end{align*} For the potential term, the chain rule applied to $a\mapsto \frac{1}{p+1}|a|^{p+1}$ gives \begin{align*} \int_{\mathbb R^n}|u|^{p-1}u\partial_tu\,d\mathcal L^n=\frac{d}{dt}\int_{\mathbb R^n}\frac{1}{p+1}|u|^{p+1}\,d\mathcal L^n. \end{align*} Adding these three identities yields the forced energy balance \begin{align*} \frac{d}{dt}E[u(t),\partial_tu(t)]=\int_{\mathbb R^n}f(t,x)\partial_tu(t,x)\,d\mathcal L^n(x). \end{align*} By the *Cauchy--Schwarz inequality*, \begin{align*} \left|\int_{\mathbb R^n}f(t,x)\partial_tu(t,x)\,d\mathcal L^n(x)\right|\le \|f(t)\|_{L^2}\|\partial_tu(t)\|_{L^2}. \end{align*} Because the energy contains the term $\frac12\|\partial_tu(t)\|_{L^2}^2$, we have \begin{align*} \|\partial_tu(t)\|_{L^2}\le \sqrt{2E[u(t),\partial_tu(t)]}. 
\end{align*} Hence \begin{align*} \left|\frac{d}{dt}E[u(t),\partial_tu(t)]\right|\le \sqrt{2}\|f(t)\|_{L^2}E[u(t),\partial_tu(t)]^{1/2}. \end{align*} Consequently, for $\delta>0$ the derivative of $(E[u(t),\partial_tu(t)]+\delta)^{1/2}$ is bounded by $\frac{1}{\sqrt2}\|f(t)\|_{L^2}$; integrating over $[0,t]$ and then letting $\delta\downarrow0$ gives \begin{align*} E[u(t),\partial_tu(t)]^{1/2}\le E[u(0),\partial_tu(0)]^{1/2}+\frac{1}{\sqrt2}\int_0^t\|f(s)\|_{L^2}\,ds. \end{align*} Thus $f\in L^1(0,T;L^2(\mathbb R^n))$ changes the square root of the energy by a finite amount controlled by its $L^1_tL^2_x$ norm, so forcing perturbs the finite-time energy estimate rather than destroying it. [/example] The forced example records a general stability pattern: errors enter the energy balance through a dual pairing with $\partial_tu$. This prepares the small-data case, where the error is not external forcing but the nonlinear Duhamel term generated by the solution itself. [example: Small-Data Stability Around Zero] Assume $n\ge 3$, $1<p\le n/(n-2)$, and let \begin{align*} \varepsilon=\|(u_0,u_1)\|_{H^1(\mathbb R^n)\times L^2(\mathbb R^n)}. \end{align*} The zero initial data give the zero solution, since substituting $u=0$ into $\partial_t^2u-\Delta u+|u|^{p-1}u=0$ gives $0-0+0=0$, and its energy is \begin{align*} E[0,0]=\int_{\mathbb R^n}\left(\frac12|0|^2+\frac12|\nabla 0|^2+\frac{1}{p+1}|0|^{p+1}\right)\,d\mathcal L^n=0. \end{align*} Let $u$ be the nonlinear solution with initial data $(u_0,u_1)$, and let $z$ be the linear wave with the same initial data: \begin{align*} \partial_t^2z-\Delta z=0,\qquad (z(0),\partial_tz(0))=(u_0,u_1). \end{align*} By the linear energy estimate, \begin{align*} \sup_{|t|\le T}\left(\|z(t)\|_{H^1}+\|\partial_tz(t)\|_{L^2}\right)\le C_0\varepsilon. \end{align*} The nonlinear solution satisfies the Duhamel formula \begin{align*} u(t)=z(t)-\int_0^t\frac{\sin((t-s)|\nabla|)}{|\nabla|}\left(|u(s)|^{p-1}u(s)\right)\,ds. 
\end{align*} Applying the forced linear wave estimate to $u-z$ gives \begin{align*} \sup_{|t|\le T}\left(\|u(t)-z(t)\|_{H^1}+\|\partial_tu(t)-\partial_tz(t)\|_{L^2}\right)\le C\int_{-T}^{T}\||u(s)|^{p-1}u(s)\|_{L^2}\,ds. \end{align*} For each fixed $s$, since $|u|^{p-1}u$ has absolute value $|u|^p$, \begin{align*} \||u(s)|^{p-1}u(s)\|_{L^2}=\left(\int_{\mathbb R^n}|u(s,x)|^{2p}\,d\mathcal L^n(x)\right)^{1/2}=\|u(s)\|_{L^{2p}}^p. \end{align*} Because $1<p\le n/(n-2)$, we have $2<2p\le 2n/(n-2)$, so Sobolev embedding gives \begin{align*} \|u(s)\|_{L^{2p}}\le C_S\|u(s)\|_{H^1}. \end{align*} Hence, if \begin{align*} \sup_{|t|\le T}\left(\|u(t)\|_{H^1}+\|\partial_tu(t)\|_{L^2}\right)\le M, \end{align*} then \begin{align*} \int_{-T}^{T}\||u(s)|^{p-1}u(s)\|_{L^2}\,ds\le \int_{-T}^{T} C_S^p M^p\,ds=2TC_S^pM^p. \end{align*} Thus the nonlinear correction is bounded by $2CC_S^pTM^p$, which is of order $M^p$ and hence, since $p>1$, of higher order in the energy-size bound $M$ than the linear term. Choose $M=2C_0\varepsilon$ and choose $T>0$ so that \begin{align*} 2CC_S^pTM^{p-1}\le \frac12. \end{align*} Then \begin{align*} 2CC_S^pTM^p\le \frac12M=C_0\varepsilon. \end{align*} Therefore, by the triangle inequality applied to $u=z+(u-z)$, \begin{align*} \sup_{|t|\le T}\left(\|u(t)\|_{H^1}+\|\partial_tu(t)\|_{L^2}\right)\le C_0\varepsilon+C_0\varepsilon=2C_0\varepsilon. \end{align*} This recovers the assumed bound $M$, so the a priori assumption is self-consistent; in the fixed-point formulation it says that the Duhamel map sends the ball of radius $M$ into itself. So on this interval the nonlinear solution stays within $2C_0\varepsilon$ of the zero solution in the same $H^1\times L^2$ norm used by the fixed-point argument. Repeating the same estimate from later times is possible as long as the energy norm remains small enough to keep a uniform choice of $T$, which is the precise sense in which small data are stable around zero. [/example] ## Consequences For The Course The semilinear wave equation shows how the linear tools developed earlier combine: Duhamel's formula constructs solutions, Sobolev and Strichartz estimates handle nonlinear terms, and energy identities control continuation. 
The elementary local theorem uses Sobolev embedding to keep the forcing in $L^1_tL^2_x$, while the full energy-subcritical theory uses Strichartz spaces to cover powers beyond this Lipschitz range. Later arguments for critical or focusing problems refine the same three ideas rather than replacing them. Chapter 10 showed how energy estimates and Sobolev control give local and subcritical well-posedness for semilinear waves. Those estimates are often not strong enough to pass to limits directly, so the next chapter develops weak compactness methods to construct solutions from bounded approximations. # 11. Weak Compactness and Passage to the Limit Chapters 3, 6, and 10 developed energy estimates for parabolic, linear hyperbolic, and semilinear hyperbolic equations. This chapter explains how those estimates become existence results: construct finite-dimensional approximate solutions, obtain bounds independent of the dimension, extract weakly convergent subsequences, and pass to the limit. The central difficulty is that [weak convergence](/page/Weak%20Convergence) is well suited to linear terms and convex energies, while nonlinearities and time-dependent products often require compactness stronger than weak convergence. ## Galerkin Approximation for Heat and Wave Equations How can an infinite-dimensional evolution equation be solved without already knowing that the relevant operator generates a semigroup? The Galerkin method answers this by projecting the equation onto finite-dimensional spaces, solving ordinary differential equations there, and using uniform energy estimates to recover a solution of the original PDE. As in the parabolic energy-space formulation of Chapter 3, let $V \hookrightarrow H \hookrightarrow V^*$ be a Gelfand triple, where $V$ is a Hilbert space densely and continuously embedded in the Hilbert space $H$, and $H$ is identified with its dual. 
The model parabolic problem is \begin{align*} u'(t)+A(t)u(t)=f(t) \quad \text{in } V^*, \qquad u(0)=u_0 \quad \text{in } H. \end{align*} Here $A(t):V \to V^*$ is induced by a bilinear form $B_t[u,v]$. [definition: Galerkin Approximation] Let $V \hookrightarrow H$ be separable Hilbert spaces and let $(w_k)_{k=1}^\infty \subset V$ be linearly independent with dense span in $V$. For $m \in \mathbb N$, set $V_m := \operatorname{span}\{w_1,\dots,w_m\}$, and let $P_m:H\to V_m$ denote the $H$-[orthogonal projection](/theorems/437) onto $V_m$. A Galerkin approximation to $u'(t)+A(t)u(t)=f(t)$ is a function \begin{align*} u_m(t)=\sum_{k=1}^m d_{m,k}(t)w_k \end{align*} satisfying, for each $j=1,\dots,m$, \begin{align*} (u_m'(t),w_j)_H + B_t[u_m(t),w_j] = f(t)(w_j) \end{align*} for a.e. $t \in (0,T)$, together with an initial condition $u_m(0)=P_m u_0 \in V_m$. [/definition] The definition turns the PDE into a finite system for the coefficients $d_{m,k}$. Once a basis of $V_m$ is fixed, the equation has the form $M d_m'(t)+K(t)d_m(t)=F_m(t)$, with $M$ positive definite, so standard ODE theory gives local solutions. The next theorem is needed to show that the finite-dimensional construction survives as $m\to\infty$ and gives a genuine weak solution rather than only a sequence of approximations. [quotetheorem:7098] [citeproof:7098] This theorem packages the standard existence mechanism for weak heat equations. The coercivity term supplies spatial control, the time derivative bound supplies enough compactness in time to identify the initial value, and the Hilbert triple records the integration by parts structure. Each hypothesis is tied to a specific part of the argument. Boundedness of $B_t$ makes $A(t)u$ an element of $V^*$ and lets the projected equations pass to the weak limit. Coercivity is the source of the $L^2(0,T;V)$ estimate; without it a finite-dimensional solution may exist but the sequence need not control any spatial derivatives as $m\to\infty$. 
Measurability in $t$ is needed even to interpret $t\mapsto B_t[u_m(t),w_j]$ as a measurable coefficient in the ODE system and to integrate the energy inequality. The coercivity assumption cannot be replaced by boundedness alone. If $B[u,v]=-(\nabla u,\nabla v)_{L^2(\Omega)}$ on $H^1_0(\Omega)$, the equation $u_t+\Delta u=0$ is the backward heat equation; high-frequency eigenmodes grow like $e^{\lambda_k t}$, so the Galerkin sequence loses uniform bounds for rough initial data. If the coefficient multiplying $\nabla u\cdot\nabla v$ is unbounded, then $B$ need not define a bounded map $V\to V^*$, and weak convergence in $L^2(0,T;V)$ is not enough to identify the limit of $A(t)u_m$. The theorem also does not assert uniqueness unless an additional monotonicity or coercive difference estimate is available, nor does it give classical smoothness; it gives the weak regularity listed in the conclusion. [example: Weak Heat Solution by Eigenfunction Truncation] Let $\Omega\subset \mathbb R^n$ be bounded, let $(w_k)$ be an $L^2(\Omega)$-orthonormal basis of Dirichlet eigenfunctions of $-\Delta$, and write \begin{align*} -\Delta w_k=\lambda_k w_k,\qquad w_k|_{\partial\Omega}=0,\qquad (w_i,w_j)_{L^2}=\delta_{ij}. \end{align*} The weak eigenvalue identity is \begin{align*} \int_\Omega \nabla w_k\cdot \nabla v\,d\mathcal L^n=\lambda_k(w_k,v)_{L^2}\qquad \text{for every }v\in H^1_0(\Omega). \end{align*} For the heat equation \begin{align*} u_t-\Delta u=f, \qquad u|_{\partial\Omega}=0, \qquad u(0)=u_0, \end{align*} take $V=H^1_0(\Omega)$, $H=L^2(\Omega)$, and \begin{align*} B[u,v]=\int_\Omega \nabla u\cdot \nabla v\,d\mathcal L^n. \end{align*} With $u_m(t)=\sum_{k=1}^m d_{m,k}(t)w_k$, testing the Galerkin equation against $w_j$ gives \begin{align*} (u_m'(t),w_j)_{L^2}+B[u_m(t),w_j]=f(t)(w_j). \end{align*} Since $u_m'(t)=\sum_{k=1}^m d_{m,k}'(t)w_k$ and the basis is $L^2$-orthonormal, \begin{align*} (u_m'(t),w_j)_{L^2}=d_{m,j}'(t). 
\end{align*} Using the eigenvalue identity with $v=w_j$, \begin{align*} B[u_m(t),w_j]=\sum_{k=1}^m d_{m,k}(t)\int_\Omega \nabla w_k\cdot \nabla w_j\,d\mathcal L^n=\sum_{k=1}^m d_{m,k}(t)\lambda_k(w_k,w_j)_{L^2}=\lambda_j d_{m,j}(t). \end{align*} Thus each coefficient satisfies \begin{align*} d_{m,j}'(t)+\lambda_j d_{m,j}(t)=f(t)(w_j). \end{align*} Multiply this scalar equation by $d_{m,j}(t)$ and sum over $j=1,\dots,m$. The derivative term is \begin{align*} \sum_{j=1}^m d_{m,j}'(t)d_{m,j}(t)=\frac12\frac{d}{dt}\sum_{j=1}^m d_{m,j}(t)^2=\frac12\frac{d}{dt}\|u_m(t)\|_{L^2}^2. \end{align*} The elliptic term is \begin{align*} \sum_{j=1}^m \lambda_j d_{m,j}(t)^2=\int_\Omega \nabla u_m(t)\cdot \nabla u_m(t)\,d\mathcal L^n=\|\nabla u_m(t)\|_{L^2}^2. \end{align*} The forcing term is \begin{align*} \sum_{j=1}^m f(t)(w_j)d_{m,j}(t)=f(t)(u_m(t)). \end{align*} Therefore \begin{align*} \frac12\frac{d}{dt}\|u_m(t)\|_{L^2}^2+\|\nabla u_m(t)\|_{L^2}^2=f(t)(u_m(t)). \end{align*} By the definition of the $H^{-1}$ dual norm and [Young's inequality](/theorems/244), \begin{align*} f(t)(u_m(t))\le \|f(t)\|_{H^{-1}}\|\nabla u_m(t)\|_{L^2}\le \frac12\|f(t)\|_{H^{-1}}^2+\frac12\|\nabla u_m(t)\|_{L^2}^2. \end{align*} Hence \begin{align*} \frac{d}{dt}\|u_m(t)\|_{L^2}^2+\|\nabla u_m(t)\|_{L^2}^2\le \|f(t)\|_{H^{-1}}^2. \end{align*} Integrating from $0$ to $t$ gives \begin{align*} \|u_m(t)\|_{L^2}^2+\int_0^t\|\nabla u_m(s)\|_{L^2}^2\,ds\le \|P_m u_0\|_{L^2}^2+\int_0^t\|f(s)\|_{H^{-1}}^2\,ds. \end{align*} Since $P_m$ is the $L^2$-orthogonal projection, $\|P_m u_0\|_{L^2}\le \|u_0\|_{L^2}$. Each term on the left is therefore bounded separately by the right-hand side with $t$ replaced by $T$, and adding the two bounds gives \begin{align*} \sup_{0\le t\le T}\|u_m(t)\|_{L^2}^2+\int_0^{T}\|\nabla u_m(s)\|_{L^2}^2\,ds\le 2\left(\|u_0\|_{L^2}^2+\int_0^{T}\|f(s)\|_{H^{-1}}^2\,ds\right). 
\end{align*} This bound is independent of $m$, giving uniform control in $L^\infty(0,T;L^2(\Omega))$ and $L^2(0,T;H^1_0(\Omega))$; these are exactly the estimates used to extract a weakly convergent subsequence and construct the weak heat solution. [/example] The example shows how diffusion creates spatial control from the Dirichlet form. Hyperbolic equations require a related theorem because the natural estimate controls both displacement and velocity, and the finite-dimensional limit must retain the second-order time structure. [quotetheorem:7099] [proofunderconstruction:7099] The parabolic theorem gains an $L^2(0,T;V)$ estimate because diffusion smooths in space. The wave theorem instead controls the natural energy, and any extra compactness must come from damping, regularisation, or compact embeddings. The assumptions here are different because the test function is $u_t$ rather than $u$. Symmetry of $B$ is what turns $B[u,u_t]$ into $\frac{d}{dt}\frac12B[u,u]$; for a nonsymmetric form the skew or non-self-adjoint part contributes terms that are not controlled by the displayed energy unless extra bounds are imposed. Coercivity identifies $B[u,u]$ with the elastic energy and prevents the position variable from escaping in directions of negative or zero energy. Data regularity $u_0\in V$ and $u_1\in H$ is also part of the energy framework, since the initial energy contains both $\|u_1\|_H^2$ and $B[u_0,u_0]$. Positive damping is not needed for the basic conservative wave bound, which already holds when $a=0$, but $a>0$ is needed for the dissipative estimate that controls $\int_0^{T}\|u_t(s)\|_H^2\,ds$. If $a<0$, the term becomes anti-damping and the energy identity feeds energy into the system; even the scalar equation $\ddot y-a_0\dot y+\lambda y=0$ with $a_0>0$ has exponentially growing modes. 
If $B$ is not coercive, the finite-dimensional system may contain directions with no restoring force or negative elastic energy, so the estimate no longer controls $\|u\|_V$. The theorem therefore gives weak energy solutions at the natural energy level; it does not provide smoothing, compactness of $u_t$, or strong convergence of nonlinear functions of $u$. ## Aubin-Lions Compactness for Parabolic Limits Weak convergence is enough for the linear variational identity, but many evolution equations contain nonlinear terms such as $g(u_m)$ or products involving $u_m$. The question is how an energy bound in a strong space and a time-derivative bound in a weak space can imply strong convergence in an intermediate space. Before adding strong compactness, it is useful to isolate the weaker compactness-and-limit-passage theorem that underlies the Galerkin construction. [quotetheorem:7100] [citeproof:7100] This theorem is a framework result, not a substitute for checking the equation at hand. Its compactness hypotheses produce a candidate limit and identify the weak time derivative, while the assumptions on the forms and forcing terms are what allow the variational identity to pass to the limit. The energy inequality survives only through convexity or a separately verified lower-semicontinuity argument, which is why the next two results focus on lower semicontinuity itself. When the equation contains nonlinear lower-order terms, this weak [compactness theorem](/theorems/2748) usually has to be combined with Aubin-Lions compactness to obtain the strong convergence needed to identify those nonlinear terms. [quotetheorem:615] [citeproof:615] The lemma is a compactness bridge: it converts the pair of estimates produced by parabolic energy methods into strong convergence in $L^p$ spaces. In applications, $X_0$ is usually a Sobolev space, $X$ an $L^2$ or $L^p$ space, and $X_1$ a negative Sobolev space. The compactness of $X_0\hookrightarrow\hookrightarrow X$ is the decisive input. 
A merely continuous embedding is not enough for most limiting arguments: in an infinite-dimensional Hilbert space $X_0=X=\ell^2$, the constant-in-time sequence $u_k(t)=e_k$ is bounded in $L^p(0,T;\ell^2)$ and has derivative zero, but it has no strongly convergent subsequence in $L^p(0,T;\ell^2)$. Thus the lemma is useful only when the spatial estimate lives in a genuinely compactly embedded space and the time-derivative estimate is available in a weaker ambient space. Aubin-Lions also has a precise limitation. It gives strong convergence only in the intermediate space $L^p(0,T;X)$, not in the stronger space $L^p(0,T;X_0)$, and it does not identify the limit as a solution unless the equation's variational identities are separately passed to the limit. In nonlinear problems this distinction matters: strong convergence in $L^2(0,T;L^2)$ may pass a continuous lower-order term, but it may not be enough for nonlinearities requiring stronger integrability or pointwise control. [example: Compactness for Regularized Semilinear Equations] Let $\Omega\subset\mathbb R^n$ be bounded and suppose $(u_\varepsilon)$ solves a regularized parabolic problem with \begin{align*} \sup_{\varepsilon>0}\left(\|u_\varepsilon\|_{L^2(0,T;H^1_0(\Omega))}+\|\partial_t u_\varepsilon\|_{L^2(0,T;H^{-1}(\Omega))}\right)<\infty. \end{align*} We apply the *Aubin-Lions lemma* with \begin{align*} X_0=H^1_0(\Omega),\qquad X=L^2(\Omega),\qquad X_1=H^{-1}(\Omega). \end{align*} The embedding $H^1_0(\Omega)\hookrightarrow\hookrightarrow L^2(\Omega)$ is compact by the Rellich-Kondrachov compactness theorem. The embedding $L^2(\Omega)\hookrightarrow H^{-1}(\Omega)$ is continuous because, for $z\in L^2(\Omega)$ and $\varphi\in H^1_0(\Omega)$, \begin{align*} |\langle z,\varphi\rangle_{H^{-1},H^1_0}|=\left|\int_\Omega z\varphi\,d\mathcal L^n\right|\le \|z\|_{L^2}\|\varphi\|_{L^2}. 
\end{align*} By Poincaré's inequality, $\|\varphi\|_{L^2}\le C_\Omega\|\nabla\varphi\|_{L^2}=C_\Omega\|\varphi\|_{H^1_0}$, so \begin{align*} |\langle z,\varphi\rangle_{H^{-1},H^1_0}|\le C_\Omega\|z\|_{L^2}\|\varphi\|_{H^1_0}. \end{align*} Taking the supremum over all $\varphi$ with $\|\varphi\|_{H^1_0}\le 1$ gives \begin{align*} \|z\|_{H^{-1}}\le C_\Omega\|z\|_{L^2}. \end{align*} The hypotheses of the *Aubin-Lions lemma* are therefore exactly the displayed uniform bounds, so there are $\varepsilon_k\downarrow 0$ and $u\in L^2(0,T;L^2(\Omega))$ such that \begin{align*} u_{\varepsilon_k}\to u\quad \text{strongly in }L^2(0,T;L^2(\Omega)). \end{align*} This means \begin{align*} \int_0^{T}\int_\Omega |u_{\varepsilon_k}(t,x)-u(t,x)|^2\,d\mathcal L^n(x)\,dt\to 0. \end{align*} After passing to a further subsequence, $u_{\varepsilon_k}(t,x)\to u(t,x)$ for a.e. $(t,x)\in(0,T)\times\Omega$. If $g:\mathbb R\to\mathbb R$ is continuous and its growth is controlled by the available integrability, then $g(u_{\varepsilon_k}(t,x))\to g(u(t,x))$ a.e. by continuity of $g$. For any test function $\varphi\in C_c^\infty((0,T)\times\Omega)$, convergence of the nonlinear term in the weak formulation is obtained once the family $g(u_{\varepsilon_k})\varphi$ is uniformly integrable; for example, if $g(u_{\varepsilon_k})\to g(u)$ strongly in $L^1((0,T)\times\Omega)$, then \begin{align*} \left|\int_0^{T}\int_\Omega \big(g(u_{\varepsilon_k})-g(u)\big)\varphi\,d\mathcal L^n\,dt\right|\le \|\varphi\|_{L^\infty}\|g(u_{\varepsilon_k})-g(u)\|_{L^1}. \end{align*} The right-hand side tends to $0$, so the strong compactness conclusion is precisely what lets the regularized nonlinear term pass to the limit in the weak formulation. [/example] Aubin-Lions does not replace energy estimates; it uses them. The usual workflow is to first prove uniform bounds in the strongest natural spaces, then use the equation itself to estimate the time derivative in a dual space. 
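As a sketch of the second step of this workflow, consider the model heat equation $\partial_tu-\Delta u=f$ with homogeneous Dirichlet condition. This model is an illustrative assumption, since the regularized problem in the example is not specified further. For $\varphi\in H^1_0(\Omega)$ the weak formulation gives \begin{align*} \langle \partial_tu(t),\varphi\rangle_{H^{-1},H^1_0}=-\int_\Omega \nabla u(t)\cdot\nabla\varphi\,d\mathcal L^n+f(t)(\varphi), \end{align*} so, with the norm $\|\varphi\|_{H^1_0}=\|\nabla\varphi\|_{L^2}$, taking the supremum over $\|\varphi\|_{H^1_0}\le 1$ yields \begin{align*} \|\partial_tu(t)\|_{H^{-1}}\le \|\nabla u(t)\|_{L^2}+\|f(t)\|_{H^{-1}}. \end{align*} Squaring, using $(a+b)^2\le 2a^2+2b^2$, and integrating in time gives \begin{align*} \int_0^T\|\partial_tu(t)\|_{H^{-1}}^2\,dt\le 2\int_0^T\|\nabla u(t)\|_{L^2}^2\,dt+2\int_0^T\|f(t)\|_{H^{-1}}^2\,dt. \end{align*} Thus a uniform $L^2(0,T;H^1_0(\Omega))$ bound together with $f\in L^2(0,T;H^{-1}(\Omega))$ controls $\partial_tu$ in $L^2(0,T;H^{-1}(\Omega))$, which is the pair of estimates the lemma requires.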
[remark: Choosing the Three Spaces] For the heat equation with homogeneous Dirichlet boundary condition, the standard choice is \begin{align*} X_0=H^1_0(\Omega), \qquad X=L^2(\Omega), \qquad X_1=H^{-1}(\Omega). \end{align*} For fourth-order parabolic equations, one often shifts the triple upward, for instance $X_0=H^2(\Omega)$ and $X=L^2(\Omega)$, while $X_1$ records the space where the equation defines $u_t$. [/remark] ## Lower Semicontinuity and Weak Convergence in Energy Inequalities After extracting a weakly convergent subsequence, the remaining problem is to prove that the limit satisfies the same energy inequality as the approximations. Equalities are fragile under weak convergence, but convex norms and coercive energies behave well through lower semicontinuity. [definition: Weak Lower Semicontinuity] Let $X$ be a Banach space. A functional $E:X\to (-\infty,\infty]$ is weakly lower semicontinuous if, whenever $u_k\rightharpoonup u$ weakly in $X$, \begin{align*} E[u]\le \liminf_{k\to\infty} E[u_k]. \end{align*} [/definition] Weak lower semicontinuity is the reason energy inequalities pass to weak limits with the inequality in the correct direction. To use the definition in practice, one needs a criterion that produces weak lower semicontinuity from hypotheses that can actually be checked in energy estimates. The first basic source of this property is convexity: weak convergence does not preserve pointwise values or norms exactly, but convex functionals cannot jump upward at the weak limit in the way nonconvex expressions can. The following result supplies the abstract functional-analytic mechanism behind the energy passage to the limit. [quotetheorem:215] For PDE estimates one also needs a version that applies after time integration, because approximate solutions usually converge weakly in spaces such as $L^2(0,T;H^1_0(U))$ rather than at each fixed time. 
The limiting argument then has to compare an integral energy of the weak limit with the liminf of integral energies along the approximating sequence. [quotetheorem:986] This lower-semicontinuity principle is used at the final step of compactness arguments. It justifies replacing an approximate integrated energy by the limiting integrated energy after passing to weak limits in inequalities over time. The convexity hypothesis is not cosmetic. Weak limits average oscillations, and nonconvex functionals can drop or jump under that averaging. For example, in $L^2(0,2\pi)$ the functions $u_k(x)=\sin(kx)$ converge weakly to $0$, but the nonconvex energy $E[u]=-\|u\|_{L^2}^2$ satisfies $E[0]=0$ while $\liminf_k E[u_k]=-\pi$, so weak lower semicontinuity fails. This is why compactness arguments preserve convex energies such as norms, Dirichlet integrals, and positive quadratic forms, but require extra compactness or relaxation for nonconvex energies. Lower semicontinuity is also one-sided. It proves that the limiting energy is no larger than the liminf of approximate energies, so energy inequalities survive in the dissipative direction. It does not prove convergence of the energies, does not preserve energy equalities, and does not by itself identify nonlinear terms such as $u_m^2$ or $g(u_m)$; those terms need strong convergence, monotonicity, or a separate compactness argument. [example: Passing to the Limit in Damped Wave Equations] Suppose $u_m$ solves the Galerkin damped wave equation and satisfies, for every $t\in[0,T]$, \begin{align*} \frac12\|u_m'(t)\|_{L^2}^2+\frac12\|\nabla u_m(t)\|_{L^2}^2+a\int_0^t\|u_m'(s)\|_{L^2}^2\,ds \le E_m(0)+\int_0^t(f(s),u_m'(s))_{L^2}\,ds. \end{align*} Here \begin{align*} E_m(0):=\frac12\|u_m'(0)\|_{L^2}^2+\frac12\|\nabla u_m(0)\|_{L^2}^2. \end{align*} Assume that $u_m\overset{*}{\rightharpoonup}u$ in $L^\infty(0,T;H^1_0(\Omega))$ and $u_m'\overset{*}{\rightharpoonup}u_t$ in $L^\infty(0,T;L^2(\Omega))$. 
Since $L^\infty(0,T;L^2)$ embeds continuously into $L^2(0,T;L^2)$ on a finite time interval, the second convergence also gives \begin{align*} u_m'\rightharpoonup u_t\quad \text{in }L^2(0,T;L^2(\Omega)). \end{align*} Fix a time $t$ for which the weak time traces satisfy $u_m(t)\rightharpoonup u(t)$ in $H^1_0(\Omega)$ and $u_m'(t)\rightharpoonup u_t(t)$ in $L^2(\Omega)$. By *Norm Lower Semicontinuity*, \begin{align*} \|u_t(t)\|_{L^2}^2\le \liminf_{m\to\infty}\|u_m'(t)\|_{L^2}^2. \end{align*} Applying the same result to $\nabla u_m(t)\rightharpoonup \nabla u(t)$ in $L^2(\Omega;\mathbb R^n)$ gives \begin{align*} \|\nabla u(t)\|_{L^2}^2\le \liminf_{m\to\infty}\|\nabla u_m(t)\|_{L^2}^2. \end{align*} Also $u_m'\rightharpoonup u_t$ in $L^2(0,t;L^2(\Omega))$, so another application of lower semicontinuity gives \begin{align*} \int_0^t\|u_t(s)\|_{L^2}^2\,ds\le \liminf_{m\to\infty}\int_0^t\|u_m'(s)\|_{L^2}^2\,ds. \end{align*} The forcing term is linear in $u_m'$. If $f\in L^2(0,T;L^2(\Omega))$, then the map \begin{align*} v\mapsto \int_0^t(f(s),v(s))_{L^2}\,ds \end{align*} is a bounded linear functional on $L^2(0,t;L^2(\Omega))$, because \begin{align*} \left|\int_0^t(f(s),v(s))_{L^2}\,ds\right|\le \|f\|_{L^2(0,t;L^2)}\|v\|_{L^2(0,t;L^2)}. \end{align*} Therefore weak convergence of $u_m'$ implies \begin{align*} \int_0^t(f(s),u_m'(s))_{L^2}\,ds\to \int_0^t(f(s),u_t(s))_{L^2}\,ds. \end{align*} If the projected initial data are chosen so that $u_m(0)\to u(0)$ in $H^1_0(\Omega)$ and $u_m'(0)\to u_t(0)$ in $L^2(\Omega)$, then \begin{align*} E_m(0)\to E(0):=\frac12\|u_t(0)\|_{L^2}^2+\frac12\|\nabla u(0)\|_{L^2}^2. \end{align*} Taking the liminf on the left side of the approximate inequality and using the convergence of the right side yields \begin{align*} \frac12\|u_t(t)\|_{L^2}^2+\frac12\|\nabla u(t)\|_{L^2}^2+a\int_0^t\|u_t(s)\|_{L^2}^2\,ds \le E(0)+\int_0^t(f(s),u_t(s))_{L^2}\,ds. 
\end{align*} Thus weak convergence preserves the damped wave energy inequality: the kinetic, elastic, and damping terms survive by convex lower semicontinuity, while the forcing term passes to the limit by weak convergence against the fixed function $f$. [/example] The example shows how weak convergence and lower semicontinuity preserve an energy inequality term by term. Taken together, the chapter's compactness method has a standard workflow rather than a single additional named theorem. Galerkin approximation supplies finite-dimensional solvability, energy estimates supply uniform bounds, Aubin-Lions supplies strong compactness when lower-order or nonlinear terms must be identified, and lower semicontinuity preserves the energy inequality in the limit. [remark: Compactness Workflow for Energy Solutions] In a linear problem, the limit passage usually rests on bounded bilinear forms: weak convergence in $L^2(0,T;V)$ is enough to pass terms such as $B_t[u_m(t),\varphi(t)]$ to the limit. In a semilinear problem, the same step is often false until compactness gives strong convergence in a space where $g(u_m)$ converges to $g(u)$. The convergence statement is therefore not a formal slogan; it is the application-specific part of the proof. The uniform bounds are also indispensable. Without the $L^2(0,T;V)$ and $L^\infty(0,T;H)$ estimates, weak compactness may not produce a subsequence in the energy space. Without the derivative estimate, the weak limit of $u_m$ may not have the expected time derivative or trace. Without convexity or weak lower semicontinuity, the limiting object can solve the variational identity but fail the desired energy inequality; oscillatory sequences can lower nonconvex energies in the limit. The outcome is correspondingly limited: subsequential existence and an inequality, not uniqueness, full-sequence convergence, classical regularity, or energy equality. 
[/remark] Weak compactness turns uniform energy bounds into existence statements, but often only after passing to subsequences and accepting weaker conclusions. The final chapter adds dissipation to the hyperbolic picture, studying how damping restores decay, stability, and asymptotic convergence. # 12. Damping, Stability, and Evolution with Dissipation Chapters 2 through 11 developed parabolic smoothing, hyperbolic propagation, semigroup generation, and weak energy methods as separate strands of the theory. This chapter studies what happens when a hyperbolic system is forced to lose energy, either through explicit damping or through a dissipative nonlinearity. The guiding question is how the long-time behaviour of an evolution can be read from an energy functional: decay, convergence to equilibria, or stability of the zero solution. ## Damped Wave Equations and Energy Decay The undamped wave equation conserves energy, so its solutions propagate oscillations rather than settle down. Damping changes the problem by converting the conserved Hamiltonian energy into a Lyapunov functional. The first question is quantitative: under which boundary conditions and damping assumptions does the energy decrease, and when can the decrease be upgraded to a rate? Consider a bounded domain $U \subset \mathbb R^n$ with sufficiently regular boundary, and impose homogeneous Dirichlet boundary conditions. The model linearly damped wave equation is \begin{align*} u_{tt} - \Delta u + a u_t = 0 \quad \text{in } (0,\infty) \times U, \end{align*} where $a>0$ is the damping coefficient. The natural phase space is $H^1_0(U) \times L^2(U)$, because the position contributes elastic energy and the velocity contributes kinetic energy. [definition: Wave Energy] The wave energy is the functional \begin{align*} E:H^1_0(U)\times L^2(U)\to \mathbb R \end{align*} defined, for $(u,v)\in H^1_0(U)\times L^2(U)$, by \begin{align*} E[u,v] := \frac{1}{2}\|v\|_{L^2(U)}^2 + \frac{1}{2}\|\nabla u\|_{L^2(U)}^2. 
\end{align*} For a time-dependent solution $u(t)$, write \begin{align*} E(t) := E[u(t),u_t(t)]. \end{align*} [/definition] This is the same energy used for the conservative wave equation, but the damping term gives its derivative a sign. The next calculation records the exact balance law that later decay and stability arguments refine. [quotetheorem:7101] [citeproof:7101] The hypotheses in this identity are doing specific work. The homogeneous Dirichlet boundary condition removes the boundary flux term \begin{align*} \int_{\partial U} u_t\,\partial_\nu u\,d\mathcal H^{n-1}; \end{align*} with Neumann or mixed boundary data, the same multiplication produces an additional boundary contribution unless the boundary condition is chosen to cancel it. The regularity assumption allows the multiplier computation and the integration by parts; for finite-energy weak solutions the identity is obtained by density or Galerkin approximation rather than by differentiating every term pointwise. The condition $a>0$ gives a sign: if $a=0$ the energy is conserved, and if $a<0$ the same formula becomes an energy growth identity. The identity does not give a decay rate by itself. It controls the time integral of the velocity, while the energy also contains the displacement; a solution could have small velocity over long intervals while retaining elastic energy. The next result adds a geometric estimate, through Poincare's inequality on $H^1_0(U)$, that couples displacement and velocity strongly enough to force uniform decay of the whole phase-space state. [quotetheorem:7102] [citeproof:7102] The uniform damping and bounded Dirichlet geometry are essential for this form of decay. If the damping coefficient vanishes on part of the domain, rays may avoid the damping region; without the geometric control condition, localized damping need not produce exponential energy decay. If $U=\mathbb R^n$, Poincare's inequality is unavailable and low-frequency components can decay only polynomially. 
If the boundary condition leaves constants in the kernel, as for the Neumann wave equation, the mean displacement may persist and the full energy need not decay to zero. The theorem also has limits: it gives decay in the natural energy space, not heat-like smoothing, and the constants $C,\gamma$ are not universal because they depend on the domain and damping strength. This distinction prepares the next section, where the telegraph equation shows how damping can create a diffusive limit without making the finite-time damped wave flow identical to heat flow. [example: Damped String] Let $U=(0,L)$ and impose the Dirichlet conditions $u(t,0)=u(t,L)=0$. The functions \begin{align*} e_k(x)=\sin(k\pi x/L), \qquad k\ge 1, \end{align*} vanish at $x=0,L$ and satisfy \begin{align*} -e_k''(x)=\left(\frac{k\pi}{L}\right)^2 e_k(x). \end{align*} Write \begin{align*} u(t,x)=\sum_{k=1}^{\infty}q_k(t)e_k(x). \end{align*} Then \begin{align*} u_{tt}(t,x)=\sum_{k=1}^{\infty}q_k''(t)e_k(x). \end{align*} Also, using the eigenvalue identity for $e_k$, \begin{align*} -u_{xx}(t,x)=\sum_{k=1}^{\infty}q_k(t)\left(\frac{k\pi}{L}\right)^2 e_k(x). \end{align*} Finally, \begin{align*} a u_t(t,x)=\sum_{k=1}^{\infty}a q_k'(t)e_k(x). \end{align*} Substituting these three expansions into $u_{tt}-u_{xx}+a u_t=0$ gives \begin{align*} \sum_{k=1}^{\infty}\left(q_k''(t)+a q_k'(t)+\left(\frac{k\pi}{L}\right)^2q_k(t)\right)e_k(x)=0. \end{align*} Since the sine functions are orthogonal in $L^2(0,L)$, each coefficient must vanish: \begin{align*} q_k''(t)+a q_k'(t)+\left(\frac{k\pi}{L}\right)^2q_k(t)=0. \end{align*} For a fixed mode, set $\omega_k=k\pi/L$. The characteristic equation is \begin{align*} \lambda^2+a\lambda+\omega_k^2=0. \end{align*} Its roots are \begin{align*} \lambda_{k,\pm}=\frac{-a\pm\sqrt{a^2-4\omega_k^2}}{2}. \end{align*} If $a^2<4\omega_k^2$, the roots have real part $-a/2$. 
If $a^2>4\omega_k^2$, then $\sqrt{a^2-4\omega_k^2}<a$, so both numbers $(-a+\sqrt{a^2-4\omega_k^2})/2$ and $(-a-\sqrt{a^2-4\omega_k^2})/2$ are negative. If $a^2=4\omega_k^2$, then the repeated root is $-a/2$, and the solutions have the form $(A_k+B_k t)e^{-a t/2}$, which is bounded by a constant times $e^{-\mu t}$ for every $\mu<a/2$. Thus each fixed Fourier coefficient decays exponentially. In the overdamped case $a^2>4\omega_k^2$ the slower root is $\lambda_{k,+}=-2\omega_k^2/(a+\sqrt{a^2-4\omega_k^2})$, whose decay rate is smaller than $a/2$ and increases with $\omega_k$, so no mode decays more slowly than the first one, with $\omega_1=\pi/L$. The string therefore decomposes into damped oscillators: when $a>2\omega_1$ the first sine mode is the obstruction to faster uniform decay, reflecting the role of the Dirichlet spectral gap behind Poincare's inequality, while for $a<2\omega_1$ every mode decays at the common rate $a/2$ set by the damping itself. [/example] The string example uses linear friction, where the sign of the dissipation is built into the coefficient $a>0$. Many models use feedback laws that are not linear in the velocity, so the next issue is to isolate the monotonicity condition that still guarantees energy loss. [definition: Monotone Damping Feedback] A function $g:\mathbb R\to \mathbb R$ is a monotone damping feedback if $g(0)=0$ and \begin{align*} (g(r)-g(s))(r-s)\ge 0 \qquad \text{for all } r,s\in\mathbb R. \end{align*} [/definition] For such a feedback, taking $s=0$ and using $g(0)=0$ gives $g(r)r\ge 0$ for all $r\in\mathbb R$. This sign condition is the pointwise mechanism behind energy dissipation. [example: Nonlinear Damping with Monotone Feedback] Let $p\ge 1$ and define $g(r)=|r|^{p-1}r$. The map $g$ is monotone: if $r,s>0$, then $g(r)-g(s)=r^p-s^p$ has the same sign as $r-s$; if $r,s<0$, then $g(r)-g(s)=-((-r)^p-(-s)^p)$ also has the same sign as $r-s$; and if $r\ge 0\ge s$, then $g(r)\ge 0\ge g(s)$, so $(g(r)-g(s))(r-s)\ge 0$, with the case $s\ge 0\ge r$ following by symmetry. Also, \begin{align*} g(r)r=|r|^{p-1}r^2=|r|^{p-1}|r|^2=|r|^{p+1}. \end{align*} Now let $u$ be a smooth solution of \begin{align*} u_{tt}-\Delta u+g(u_t)=0 \end{align*} with homogeneous Dirichlet boundary condition. 
Multiplying by $u_t$ and integrating over $U$ gives \begin{align*} \int_U u_{tt}u_t\,d\mathcal L^n-\int_U(\Delta u)u_t\,d\mathcal L^n+\int_U g(u_t)u_t\,d\mathcal L^n=0. \end{align*} The kinetic term is \begin{align*} \int_U u_{tt}u_t\,d\mathcal L^n=\frac{1}{2}\frac{d}{dt}\int_U |u_t|^2\,d\mathcal L^n. \end{align*} For the elastic term, integration by parts and $u_t=0$ on $\partial U$ give \begin{align*} -\int_U(\Delta u)u_t\,d\mathcal L^n=\int_U \nabla u\cdot \nabla u_t\,d\mathcal L^n. \end{align*} Since $\nabla u_t=\partial_t(\nabla u)$, this becomes \begin{align*} \int_U \nabla u\cdot \nabla u_t\,d\mathcal L^n=\frac{1}{2}\frac{d}{dt}\int_U |\nabla u|^2\,d\mathcal L^n. \end{align*} The damping term is \begin{align*} \int_U g(u_t)u_t\,d\mathcal L^n=\int_U |u_t|^{p+1}\,d\mathcal L^n. \end{align*} Combining these identities, \begin{align*} \frac{d}{dt}E(t)+\int_U |u_t(t,x)|^{p+1}\,d\mathcal L^n(x)=0. \end{align*} Integrating from $0$ to $t$ gives \begin{align*} E(t)+\int_0^t\int_U |u_s(s,x)|^{p+1}\,d\mathcal L^n(x)\,ds=E(0). \end{align*} Thus the monotonicity of $g$ appears in the energy law as the nonnegative dissipation density $|u_t|^{p+1}$, even though the resulting decay rate depends on more than the sign of the energy derivative. [/example] ## Heat-Wave Comparisons Through Dissipation Parabolic and hyperbolic equations respond differently to rough data. Heat flow smooths instantly and has infinite propagation speed, while wave flow preserves finite propagation and transports singularities. Damping creates a mixed picture: short-time behaviour can look wave-like, whereas long-time behaviour can resemble diffusion. The telegraph equation is the standard bridge between the two classes. It contains an inertial term $\tau u_{tt}$, a damping term $u_t$, and an elliptic spatial term. [definition: Telegraph Equation] Let $\tau>0$ and $\kappa>0$. The telegraph equation for $u:(0,\infty)\times \mathbb R^n\to\mathbb R$ is \begin{align*} \tau u_{tt}+u_t-\kappa \Delta u=0. 
\end{align*} [/definition] For high frequencies or short times the second time derivative matters, but for long times the balance $u_t\approx \kappa\Delta u$ becomes dominant after rescaling. This gives a precise way to compare smoothing and propagation. [quotetheorem:7103] [proofunderconstruction:7103] The compatibility condition $u_1^\tau\to\kappa\Delta u_0$ is the parabolic initial constraint: evaluating the limiting heat equation $v_t=\kappa\Delta v$ at $t=0$ gives $v_t(0)=\kappa\Delta u_0$. If this condition fails, the fast mode of the telegraph equation carries a transient of size determined by $u_1^\tau-\kappa\Delta u_0$, so convergence at $t=0$ is not expected. The restriction $\delta>0$ excludes precisely this initial layer; on intervals touching $0$, convergence generally requires stronger preparation of the velocity data and higher compatibility. The theorem should therefore not be read as saying that a damped wave instantly becomes heat flow. It gives convergence in a specified Sobolev norm away from the initial time, and it does not transfer heat-flow properties such as infinite propagation speed to each fixed $\tau>0$. This comparison leads naturally to Lyapunov and invariance methods, which study long-time behaviour without claiming full parabolic regularisation. [example: Heat-Like Behaviour of the Telegraph Equation] Taking the spatial Fourier transform of \begin{align*} \tau u_{tt}+u_t-\kappa\Delta u=0 \end{align*} and using $\widehat{-\Delta u}(\xi)=|\xi|^2\hat u(\xi)$ gives, for each fixed frequency $\xi$, \begin{align*} \tau \partial_{tt}\hat u(t,\xi)+\partial_t\hat u(t,\xi)+\kappa |\xi|^2\hat u(t,\xi)=0. \end{align*} Looking for solutions of the form $\hat u(t,\xi)=e^{\lambda t}$ gives \begin{align*} \tau \lambda^2+\lambda+\kappa |\xi|^2=0. \end{align*} The two roots are \begin{align*} \lambda_\pm(\tau,\xi)=\frac{-1\pm\sqrt{1-4\tau\kappa|\xi|^2}}{2\tau}. \end{align*} Fix $\xi$ and assume $\tau>0$ is small enough that $4\tau\kappa|\xi|^2<1$. 
For the root with the plus sign, rationalizing the numerator gives \begin{align*} \lambda_+(\tau,\xi)=\frac{-1+\sqrt{1-4\tau\kappa|\xi|^2}}{2\tau}\cdot \frac{-1-\sqrt{1-4\tau\kappa|\xi|^2}}{-1-\sqrt{1-4\tau\kappa|\xi|^2}}. \end{align*} The numerator becomes \begin{align*} (-1+\sqrt{1-4\tau\kappa|\xi|^2})(-1-\sqrt{1-4\tau\kappa|\xi|^2})=1-(1-4\tau\kappa|\xi|^2)=4\tau\kappa|\xi|^2. \end{align*} Therefore \begin{align*} \lambda_+(\tau,\xi)=-\frac{2\kappa|\xi|^2}{1+\sqrt{1-4\tau\kappa|\xi|^2}}. \end{align*} Since $\sqrt{1-4\tau\kappa|\xi|^2}\to 1$ as $\tau\downarrow 0$, this root satisfies \begin{align*} \lambda_+(\tau,\xi)\to -\kappa|\xi|^2. \end{align*} For the root with the minus sign, \begin{align*} \tau\lambda_-(\tau,\xi)=\frac{-1-\sqrt{1-4\tau\kappa|\xi|^2}}{2}. \end{align*} Again $\sqrt{1-4\tau\kappa|\xi|^2}\to 1$, so \begin{align*} \tau\lambda_-(\tau,\xi)\to -1. \end{align*} Thus $\lambda_-(\tau,\xi)$ is of size $-1/\tau$, while $\lambda_+(\tau,\xi)$ tends to the heat exponent $-\kappa|\xi|^2$. For this fixed frequency, the Fourier mode has the form \begin{align*} \hat u(t,\xi)=A_+(\xi,\tau)e^{\lambda_+(\tau,\xi)t}+A_-(\xi,\tau)e^{\lambda_-(\tau,\xi)t}. \end{align*} The factor $e^{\lambda_+t}$ converges to $e^{-\kappa t|\xi|^2}$, the heat multiplier, while $e^{\lambda_-t}$ decays on the fast time scale $t\sim\tau$ when its coefficient is present. The calculation is pointwise in $\xi$: if $\tau|\xi|^2$ is not small, then $1-4\tau\kappa|\xi|^2$ is not close to $1$, and for $4\tau\kappa|\xi|^2>1$ the roots are complex with real part $-1/(2\tau)$. Those high-frequency modes therefore retain oscillatory wave-like behaviour before damping dominates. [/example] The comparison also clarifies why heat estimates are stronger than damped wave estimates. Heat semigroups gain spatial derivatives for $t>0$, while damped wave semigroups generally decay in their energy space without full parabolic regularisation. 
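The two regimes of the telegraph characteristic equation can be checked numerically. The following is a minimal sketch under stated assumptions: the function name `telegraph_roots` and the sample values $\kappa=1$, $|\xi|=3$ are hypothetical choices for illustration. It evaluates the slow root through the rationalized formula derived in the example, which avoids cancellation for small $\tau$, and uses complex square roots so that the oscillatory case $4\tau\kappa|\xi|^2>1$ is covered by the same code.

```python
import cmath

def telegraph_roots(tau, kappa, xi):
    """Roots of tau*lam^2 + lam + kappa*xi^2 = 0 for tau > 0.

    Returns (lam_plus, lam_minus) as complex numbers.
    """
    disc = cmath.sqrt(1.0 - 4.0 * tau * kappa * xi * xi)
    # Slow root in rationalized form: -2*kappa*xi^2 / (1 + sqrt(...)).
    lam_plus = -2.0 * kappa * xi * xi / (1.0 + disc)
    # Fast root, of size -1/tau when tau is small.
    lam_minus = (-1.0 - disc) / (2.0 * tau)
    return lam_plus, lam_minus

kappa, xi = 1.0, 3.0  # illustrative values, not taken from the notes
for tau in (1e-1, 1e-2, 1e-3, 1e-4):
    lam_plus, lam_minus = telegraph_roots(tau, kappa, xi)
    # As tau decreases, lam_plus approaches -kappa*xi^2 and tau*lam_minus approaches -1.
    print(tau, lam_plus, tau * lam_minus)
```

For the largest sample value $\tau=0.1$ the discriminant is negative, and both printed roots have real part $-1/(2\tau)$, matching the high-frequency remark at the end of the example.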
[remark: Smoothing Versus Decay] For the heat equation, the estimate \begin{align*} \|e^{t\Delta}f\|_{L^q(\mathbb R^n)} \le C t^{-\frac{n}{2}\left(\frac{1}{p}-\frac{1}{q}\right)}\|f\|_{L^p(\mathbb R^n)} \end{align*} reflects smoothing and spreading. For the damped wave equation on a bounded domain, exponential decay of energy reflects loss of mechanical energy, not instantaneous improvement of Sobolev regularity. These are different mechanisms even when both estimates contain decay in time. [/remark] ## Lyapunov Functionals and Invariance Principles Energy identities become more powerful when interpreted dynamically. Instead of solving the equation explicitly, we find a functional that decreases along trajectories and identify the states where its derivative vanishes. The central question is whether this information forces a solution to approach an equilibrium. [definition: Lyapunov Functional] Let $X$ be a Hilbert space and let $S(t):X\to X$ be a semigroup. A functional $\Phi:X\to \mathbb R$ is a Lyapunov functional for $S(t)$ if $t\mapsto \Phi(S(t)x)$ is non-increasing for every $x\in X$. [/definition] A Lyapunov functional gives a preferred direction of time, but convergence requires knowing how much the functional decreases. It is useful to name this decrease before passing to limit sets. [definition: Dissipation Functional] Let $X$ be a Hilbert space, let $S(t):X\to X$ be a semigroup, and let $\Phi:X\to\mathbb R$ be a Lyapunov functional. A function $D:X\to[0,\infty)$ is a dissipation functional for $\Phi$ along $S(t)$ if, for every trajectory for which the derivative exists, \begin{align*} \frac{d}{dt}\Phi(S(t)x)=-D(S(t)x). \end{align*} [/definition] In applications the equality may first be proved for smooth solutions and then extended as an integrated energy inequality. The zero set of $D$ is the set where the Lyapunov functional has no instantaneous loss. 
To turn this information into convergence, the trajectory also needs compactness, since otherwise it may drift through infinitely many almost-stationary states. [definition: Omega-Limit Set] Let $X$ be a Hilbert space and let $S(t):X\to X$ be a semigroup. The omega-limit assignment is the set-valued map $\omega:X\to\mathcal P(X)$ defined by \begin{align*} \omega(x):=\{y\in X: \text{there exist } t_k\to\infty \text{ with } S(t_k)x\to y \text{ in } X\}. \end{align*} [/definition] The omega-limit set records possible limiting states, but by itself it does not say which limiting states are dynamically allowed. The next principle connects compactness of the orbit with the Lyapunov dissipation and rules out limit points away from the invariant zero-dissipation set. [quotetheorem:7104] [citeproof:7104] Compactness is the hypothesis that prevents escape along an infinite-dimensional bounded orbit. For instance, the translation semigroup on $L^2(\mathbb R)$ preserves the $L^2$ norm and has bounded orbits, but a translated bump has no strongly convergent subsequence; without compactness, an omega-limit set may be empty. Continuity of the semigroup is also needed, since invariance of limit points is obtained by passing limits through $S(t)$. Lower semicontinuity of $D$ ensures that zero dissipation survives the limiting process; otherwise a sequence of states with vanishing averaged dissipation could converge to a point where $D$ jumps upward. The conclusion does not say that the trajectory converges to a single equilibrium. It only confines all omega-limit points to the largest invariant subset of $M$, which may contain several equilibria or a continuum of stationary states. Stability results therefore add hypotheses that identify this invariant set, and the next section applies that idea near the zero solution. 
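A finite-dimensional caricature shows how the invariance principle is used in practice. The sketch below is an assumption-laden illustration rather than a PDE computation: it integrates the single damped oscillator $x''+ax'+x=0$ with a classical Runge-Kutta scheme, takes $\Phi(x,v)=\tfrac12(x^2+v^2)$ as Lyapunov functional and $D(x,v)=av^2$ as dissipation, and all names and parameter values are hypothetical. The zero set of $D$ is the whole line $v=0$, but its largest invariant subset is the origin, which is where the computed trajectory ends up.

```python
def rk4_step(field, y, h):
    """One classical Runge-Kutta step for y' = field(y), with y a tuple."""
    k1 = field(y)
    k2 = field(tuple(yi + 0.5 * h * ki for yi, ki in zip(y, k1)))
    k3 = field(tuple(yi + 0.5 * h * ki for yi, ki in zip(y, k2)))
    k4 = field(tuple(yi + h * ki for yi, ki in zip(y, k3)))
    return tuple(
        yi + h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
        for yi, s1, s2, s3, s4 in zip(y, k1, k2, k3, k4)
    )

def simulate(a=0.5, x0=1.0, v0=0.0, h=0.01, t_end=40.0):
    """Integrate x'' + a x' + x = 0 and record Phi = (x^2 + v^2)/2 after each step."""
    field = lambda y: (y[1], -y[0] - a * y[1])
    y = (x0, v0)
    energies = [0.5 * (y[0] ** 2 + y[1] ** 2)]
    for _ in range(int(round(t_end / h))):
        y = rk4_step(field, y, h)
        energies.append(0.5 * (y[0] ** 2 + y[1] ** 2))
    return y, energies

state, energies = simulate()
# Final state, initial energy, final energy.
print(state, energies[0], energies[-1])
```

The recorded energies are non-increasing up to discretization error even though the trajectory repeatedly crosses the zero-dissipation line $v=0$; it cannot remain there because $v=0$ with $x\neq 0$ is not invariant.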
[example: Gradient Flow in a Hilbert Space] Let $X$ be a Hilbert space and let $J:X\to\mathbb R$ be differentiable, with gradient defined by \begin{align*} DJ(u)[h]=(\nabla J(u),h)_X \qquad \text{for every } h\in X. \end{align*} Suppose $u$ is a smooth solution of the gradient-flow equation \begin{align*} u_t+\nabla J(u)=0. \end{align*} By the chain rule for $J(u(t))$, \begin{align*} \frac{d}{dt}J(u(t))=DJ(u(t))[u_t(t)]. \end{align*} Using the definition of the Hilbert-space gradient, \begin{align*} DJ(u(t))[u_t(t)]=(\nabla J(u(t)),u_t(t))_X. \end{align*} The evolution equation gives $u_t(t)=-\nabla J(u(t))$, so \begin{align*} (\nabla J(u(t)),u_t(t))_X=(\nabla J(u(t)),-\nabla J(u(t)))_X. \end{align*} By bilinearity of the inner product, \begin{align*} (\nabla J(u(t)),-\nabla J(u(t)))_X=-\|\nabla J(u(t))\|_X^2. \end{align*} Therefore \begin{align*} \frac{d}{dt}J(u(t))=-\|\nabla J(u(t))\|_X^2. \end{align*} Thus $J$ is a Lyapunov functional and its dissipation functional is \begin{align*} D(v)=\|\nabla J(v)\|_X^2. \end{align*} Since $D(v)=0$ exactly when $\nabla J(v)=0$, the zero-dissipation set is the set of critical points of $J$. At such a point $v$, the equation becomes $u_t=0$, so the corresponding trajectory is the constant equilibrium $u(t)=v$. If the positive orbit is compact, *LaSalle Invariance Principle in a Hilbert Space* confines every omega-[limit point](/page/Limit%20Point) to the largest invariant subset of these critical points, so the only possible limiting states are equilibria of the gradient flow. [/example] ## Asymptotic Stability for Dissipative Semilinear Evolutions The final step is to combine semigroup well-posedness with Lyapunov decay. A semilinear evolution may have a linear dissipative part and a nonlinear term that preserves, weakens, or strengthens stability. The main question is how to state assumptions that guarantee the zero solution remains stable and attracts nearby trajectories. 
[definition: Asymptotic Stability] Let $X$ be a Hilbert space and let $S(t):X\to X$ be a semigroup with equilibrium $0$, meaning $S(t)0=0$ for all $t\ge 0$. The equilibrium $0$ is stable if for every $\varepsilon>0$ there exists $\delta>0$ such that $\|x\|_X<\delta$ implies $\|S(t)x\|_X<\varepsilon$ for all $t\ge 0$. It is asymptotically stable if it is stable and there exists $r>0$ such that $\|x\|_X<r$ implies $S(t)x\to 0$ in $X$ as $t\to\infty$. [/definition] Stability is local in the initial condition, while asymptotic stability adds attraction. The remaining issue is how to verify both properties from estimates available for a semilinear PDE: a Lyapunov functional controls the norm, dissipation prevents escape, and compactness identifies the limiting dynamics. [quotetheorem:7105] [citeproof:7105] Each assumption rules out a concrete failure mode. Without local equivalence between $\Phi$ and $\|x\|_X^2$, small Lyapunov value might not mean small phase-space norm; a functional flat in one direction cannot prove stability in that direction. Without the invariant sublevel condition, the argument would assume the trajectory stays in the region where the Lyapunov estimate is valid while using that same estimate to prove it. Without precompactness, the omega-limit set may be empty, as for translations on an unbounded domain. Without uniqueness of the zero-dissipation invariant set, LaSalle gives convergence only to that larger invariant set; a damped wave with a continuum of nearby stationary solutions may settle to a nonzero equilibrium. The theorem is intentionally abstract, because the same pattern appears in damped waves, reaction-diffusion equations, and relaxation systems. The problem-specific work is to build the functional, verify its local equivalence to the natural norm, prove a trapping sublevel set, and establish the compactness required for LaSalle's principle. 
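The role of the trapping sublevel set can be illustrated on a one-mode model. The sketch below rests on assumptions that are not in the notes: the hypothetical potential $F(x)=-x^4/4$ added to the quadratic energy, giving $\Phi(x,v)=\tfrac12v^2+\tfrac12x^2-\tfrac14x^4$ and the equation $x''+ax'+x-x^3=0$, together with illustrative parameter values. A direct computation gives $\frac{d}{dt}\Phi=-av^2$. The component of $\{\Phi<1/4\}$ containing the origin is trapping, while the state $(1.1,0)$ has an even smaller value of $\Phi$ yet lies outside that component and escapes.

```python
def vector_field(state, a):
    """Right-hand side of x' = v, v' = -a v - x + x^3."""
    x, v = state
    return (v, -a * v - x + x ** 3)

def phi(state):
    """Lyapunov functional Phi = v^2/2 + x^2/2 - x^4/4 for this model."""
    x, v = state
    return 0.5 * v * v + 0.5 * x * x - 0.25 * x ** 4

def rk4_step(state, a, h):
    """One classical Runge-Kutta step for the model system."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))
    k1 = vector_field(state, a)
    k2 = vector_field(shifted(state, k1, 0.5 * h), a)
    k3 = vector_field(shifted(state, k2, 0.5 * h), a)
    k4 = vector_field(shifted(state, k3, h), a)
    return tuple(
        s + h * (p + 2.0 * q + 2.0 * r + w) / 6.0
        for s, p, q, r, w in zip(state, k1, k2, k3, k4)
    )

def run(x0, a=0.5, h=0.01, t_end=60.0, escape_radius=10.0):
    """Return (final_state, escaped) for the initial state (x0, 0)."""
    state = (x0, 0.0)
    for _ in range(int(round(t_end / h))):
        state = rk4_step(state, a, h)
        if abs(state[0]) > escape_radius:
            return state, True  # stop before the finite-time blow-up
    return state, False

for x0 in (0.9, 1.1):
    final_state, escaped = run(x0)
    print(x0, phi((x0, 0.0)), final_state, escaped)
```

The comparison shows why the abstract criterion asks for an invariant region and not only for a small Lyapunov value: smallness of $\Phi$ is useful only inside the region where $\Phi$ is equivalent to the norm.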
[example: Semilinear Damped Wave Near a Strict Potential Minimum] Let $F\in C^2(\mathbb R)$ satisfy $F(0)=F'(0)=0$ and $F''(0)>-\lambda_1$, where $\lambda_1$ is the first Dirichlet eigenvalue of $-\Delta$ on $U$, so *Poincare's inequality* gives \begin{align*} \lambda_1\|u\|_{L^2(U)}^2\le \|\nabla u\|_{L^2(U)}^2 \end{align*} for $u\in H^1_0(U)$. The candidate Lyapunov functional is the wave energy augmented by the potential, \begin{align*} \Phi[u,v]:=\frac{1}{2}\|v\|_{L^2(U)}^2+\frac{1}{2}\|\nabla u\|_{L^2(U)}^2+\int_U F(u)\,d\mathcal L^n. \end{align*} Write $\alpha=F''(0)$. Since $\alpha>-\lambda_1$, choose $\varepsilon>0$ such that $\beta:=\alpha-\varepsilon>-\lambda_1$. By continuity of $F''$, there is $r>0$ such that $F''(s)\ge \beta$ whenever $|s|\le r$. For such $s$, using $F(0)=F'(0)=0$ and integrating twice, \begin{align*} F(s)=\int_0^s\int_0^\sigma F''(\theta)\,d\theta\,d\sigma \end{align*} if $s\ge 0$, and the same formula along the interval from $0$ to $s$ if $s<0$; in both cases the lower bound $F''\ge\beta$ on that interval gives \begin{align*} F(s)\ge \frac{\beta}{2}s^2. \end{align*} Thus, for a state $u\in H^1_0(U)$ with $|u(x)|\le r$ for a.e. $x\in U$, \begin{align*} \frac{1}{2}\|\nabla u\|_{L^2(U)}^2+\int_U F(u)\,d\mathcal L^n\ge \frac{1}{2}\|\nabla u\|_{L^2(U)}^2+\frac{\beta}{2}\|u\|_{L^2(U)}^2. \end{align*} If $\beta\ge 0$, the right-hand side is at least $\frac{1}{2}\|\nabla u\|_{L^2(U)}^2$. If $\beta<0$, then Poincare's inequality gives \begin{align*} \frac{\beta}{2}\|u\|_{L^2(U)}^2\ge \frac{\beta}{2\lambda_1}\|\nabla u\|_{L^2(U)}^2, \end{align*} so \begin{align*} \frac{1}{2}\|\nabla u\|_{L^2(U)}^2+\int_U F(u)\,d\mathcal L^n\ge \frac{1}{2}\left(1+\frac{\beta}{\lambda_1}\right)\|\nabla u\|_{L^2(U)}^2. \end{align*} The factor $1+\beta/\lambda_1$ is positive because $\beta>-\lambda_1$. Conversely, since $F''$ is continuous near $0$, there is $M>0$ such that $|F''(s)|\le M$ for $|s|\le r$, and the same double integration gives \begin{align*} |F(s)|\le \frac{M}{2}s^2. \end{align*} Therefore \begin{align*} \int_U F(u)\,d\mathcal L^n\le \frac{M}{2}\|u\|_{L^2(U)}^2\le \frac{M}{2\lambda_1}\|\nabla u\|_{L^2(U)}^2. \end{align*} These lower and upper bounds show that, on states satisfying the pointwise bound $|u|\le r$, $\Phi[u,u_t]$ is equivalent to the squared $H^1_0(U)\times L^2(U)$ energy norm. 
Now let $u$ be a smooth solution of \begin{align*} u_{tt}-\Delta u+a u_t+F'(u)=0 \end{align*} with homogeneous Dirichlet boundary condition. Differentiating the kinetic part gives \begin{align*} \frac{d}{dt}\frac{1}{2}\|u_t(t)\|_{L^2(U)}^2=\int_U u_{tt}(t,x)u_t(t,x)\,d\mathcal L^n(x). \end{align*} For the elastic part, differentiating under the integral sign and using $\partial_t(\nabla u)=\nabla u_t$ give \begin{align*} \frac{d}{dt}\frac{1}{2}\|\nabla u(t)\|_{L^2(U)}^2=\int_U \nabla u(t,x)\cdot \nabla u_t(t,x)\,d\mathcal L^n(x). \end{align*} Integration by parts, with $u_t=0$ on $\partial U$ removing the boundary term, rewrites this as \begin{align*} \int_U \nabla u\cdot \nabla u_t\,d\mathcal L^n=-\int_U (\Delta u)u_t\,d\mathcal L^n. \end{align*} For the potential term, the chain rule gives \begin{align*} \frac{d}{dt}\int_U F(u(t,x))\,d\mathcal L^n(x)=\int_U F'(u(t,x))u_t(t,x)\,d\mathcal L^n(x). \end{align*} Adding the three identities, \begin{align*} \frac{d}{dt}\Phi[u(t),u_t(t)]=\int_U \left(u_{tt}-\Delta u+F'(u)\right)u_t\,d\mathcal L^n. \end{align*} The equation says $u_{tt}-\Delta u+F'(u)=-a u_t$, hence \begin{align*} \frac{d}{dt}\Phi[u(t),u_t(t)]=-a\int_U |u_t(t,x)|^2\,d\mathcal L^n(x). \end{align*} Thus $\Phi$ is a Lyapunov functional, and its dissipation vanishes exactly when $u_t=0$. On a trajectory along which $u_t$ vanishes identically, the equation reduces to the stationary elliptic equation \begin{align*} -\Delta u+F'(u)=0. \end{align*} If $0$ is the only nearby solution of this stationary problem, then the only nearby invariant zero-dissipation trajectory is the zero trajectory. Under the precompactness hypothesis in the abstract stability criterion, *[Local LaSalle Stability Theorem for Precompact Semiflows](/theorems/7105)* therefore gives local asymptotic stability of $(u,u_t)=(0,0)$. [/example] The chapter's main lesson is that dissipation is a structural property, not just a sign in an equation. 
For parabolic equations it is tied to smoothing and contraction; for hyperbolic equations it appears as decay of an energy on the phase space; for semilinear evolutions it is organised by Lyapunov functionals and invariance principles. These tools will reappear whenever long-time behaviour matters more than explicit representation formulas.

## Beyond and Connections

These notes continue the path from [Partial Differential Equations I: Classical Foundations and First-Order Equations](/page/Partial%20Differential%20Equations%20I%3A%20Classical%20Foundations%20and%20First-Order%20Equations) and [Partial Differential Equations II: Elliptic Theory and Variational Methods](/page/Partial%20Differential%20Equations%20II%3A%20Elliptic%20Theory%20and%20Variational%20Methods). PDE I supplies characteristics, classical solution concepts, and first-order intuition; PDE II supplies weak formulations, elliptic operators, coercivity, and variational estimates. The present page uses those tools in time-dependent settings.

The parabolic chapters connect most directly to [Heat Equation](/page/Heat%20Equation), [Sobolev Space](/page/Sobolev%20Space), [Inhomogeneous Sobolev Space](/page/Inhomogeneous%20Sobolev%20Space), [Semigroup Theory](/page/Semigroup%20Theory), and [Fixed Point Methods in Partial Differential Equations](/page/Fixed%20Point%20Methods%20in%20Partial%20Differential%20Equations). Those pages supply the kernel estimates, weak differentiability language, operator evolution viewpoint, and contraction principles used to pass from linear heat flow to semilinear parabolic problems.

The hyperbolic chapters connect most directly to [Wave Equation](/page/Wave%20Equation), [Fourier Transform](/page/Fourier%20Transform), [Fourier Transform on L²](/page/Fourier%20Transform%20on%20L%C2%B2), [Hilbert Space](/page/Hilbert%20Space), and [Self-Adjoint Operators](/page/Self-Adjoint%20Operators). These topics explain why wave equations are governed by energy, finite propagation, spectral decompositions, and oscillatory representations rather than by instantaneous smoothing.

The compactness and long-time chapters point toward [Calculus of Variations (PDEs)](/page/Calculus%20of%20Variations%20%28PDEs%29), [Banach-Alaoglu Theorem](/page/Banach-Alaoglu%20Theorem), [Compact Operator](/page/Compact%20Operator), and [Cambridge II Dynamical Systems](/page/Cambridge%20II%20Dynamical%20Systems). These links mark the transition from proving existence to understanding limits, invariant sets, Lyapunov functionals, and asymptotic behaviour.

## References

- Lawrence C. Evans, *Partial Differential Equations*, second edition, American Mathematical Society, 2010.
- Amnon Pazy, *Semigroups of Linear Operators and Applications to Partial Differential Equations*, Springer, 1983.
- Roger Temam, *Infinite-Dimensional Dynamical Systems in Mechanics and Physics*, second edition, Springer, 1997.
- Michael E. Taylor, *Partial Differential Equations III: Nonlinear Equations*, second edition, Springer, 2011.

Created by admin on 6/15/2026 | Last updated on 6/15/2026
