This course introduces partial differential equations as the language of evolving functions, geometric motion, and physical balance laws. It begins with the basic idea that a PDE relates a function of several variables to its derivatives, then develops the main first-order models that arise in transport, wave propagation, and nonlinear dynamics. The emphasis is on understanding what PDEs mean, how their solutions are organized, and why different classes of equations require different methods.
The central themes are characteristics, Hamilton-Jacobi theory, conservation laws, weak solutions, and admissibility. The course starts with linear first-order equations and then moves to quasilinear equations, where solution curves and nonlinear flow effects become essential. From there it studies conservation laws, the role of shocks and discontinuities, entropy conditions, and the Riemann problem, which together explain how physically meaningful solutions are selected when classical smooth solutions fail.
Later chapters broaden the toolkit by introducing distributions, fundamental solutions, Green kernels, representation formulas, maximum principles, and energy methods. These tools connect classical explicit methods with the modern weak formulation of PDEs and clarify how boundary and initial data enter well-posed problems. The course ends by synthesizing these ideas into a coherent picture: classical formulas, qualitative estimates, and weak-solution concepts all fit into a single framework for understanding first-order PDEs and the foundations of broader PDE theory.
# Introduction
These introductory notes set the viewpoint for the course: a partial differential equation is not only an equation containing derivatives, but a rule that constrains how a function may vary in several independent directions at once. The course begins with first-order equations because they expose the geometry of propagation before the analytic machinery of elliptic, parabolic, and hyperbolic theory is developed. The recurring questions are existence, uniqueness, stability, and the formation of singularities from smooth data.
## What Is a Partial Differential Equation Trying to Determine?
The first problem is to identify the unknown and the information that is being prescribed. In ordinary differential equations, the unknown usually evolves along one independent variable, often time. In a partial differential equation, the unknown depends on several variables, so the equation may constrain spatial variation, temporal variation, or both. The course therefore needs a broad baseline definition before distinguishing steady equations, evolution equations, and boundary-value problems.
[definition: Partial Differential Equation]
A partial differential equation is an equation for an unknown function $u: U \to \mathbb R^m$, where $U \subseteq \mathbb R^n$ is open, involving $u$ and finitely many of its partial derivatives on $U$.
[/definition]
The definition is intentionally broad. The same formal object can describe a steady state, a time evolution, a conservation law, or a geometric constraint. What turns the formal equation into a mathematical problem is the choice of data imposed on part of the domain or on its boundary.
[example: Three Prototype Meanings]
Let $u$ be a scalar field. For a steady field $u:\mathbb R^n\to\mathbb R$, the equation $\Delta u=0$ means
\begin{align*}
\partial_{x_1x_1}u(x)+\partial_{x_2x_2}u(x)+\cdots+\partial_{x_nx_n}u(x)=0
\end{align*}
at each point $x$, so the second-order bending in the coordinate directions balances to zero. For a time-dependent field $u:\mathbb R^n\times(0,\infty)\to\mathbb R$, the [heat equation](/page/Heat%20Equation) $\partial_tu-\Delta u=0$ is the same as
\begin{align*}
\partial_tu(x,t)=\Delta u(x,t),
\end{align*}
so the time change is determined by the spatial curvature.
For constant transport, fix $b=(b_1,\dots,b_n)\in\mathbb R^n$ and let $u_0\in C^1(\mathbb R^n)$. Define $u(x,t)=u_0(x-bt)$, where $x-bt=(x_1-b_1t,\dots,x_n-b_nt)$. By the chain rule,
\begin{align*}
\partial_tu(x,t)=\sum_{i=1}^n \partial_{x_i}u_0(x-bt)(-b_i)=-\sum_{i=1}^n b_i\partial_{x_i}u_0(x-bt).
\end{align*}
For each $j\in\{1,\dots,n\}$, the chain rule also gives
\begin{align*}
\partial_{x_j}u(x,t)=\sum_{i=1}^n \partial_{x_i}u_0(x-bt)\delta_{ij}=\partial_{x_j}u_0(x-bt).
\end{align*}
Hence
\begin{align*}
b\cdot\nabla u(x,t)=\sum_{i=1}^n b_i\partial_{x_i}u(x,t)=\sum_{i=1}^n b_i\partial_{x_i}u_0(x-bt).
\end{align*}
Combining the two displayed identities,
\begin{align*}
\partial_tu(x,t)+b\cdot\nabla u(x,t)=-\sum_{i=1}^n b_i\partial_{x_i}u_0(x-bt)+\sum_{i=1}^n b_i\partial_{x_i}u_0(x-bt)=0.
\end{align*}
Thus $\Delta u=0$ models balance, $\partial_tu-\Delta u=0$ models diffusion driven by curvature, and $\partial_tu+b\cdot\nabla u=0$ models rigid transport of the initial profile along the lines $x-bt=\text{constant}$.
[/example]
These examples also signal why PDE theory separates equations by qualitative behaviour. The Laplace equation spreads boundary information throughout a region, the heat equation smooths initial data forward in time, and the transport equation carries data along curves.
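The transport computation in the example above can also be checked numerically. The following is a minimal sketch, assuming one space dimension, a hypothetical Gaussian profile `u0`, and a speed `b = 2.0` chosen only for illustration; it approximates $\partial_tu+b\,\partial_xu$ by central differences at a few sample points.

```python
import math

def u0(x):
    """Hypothetical smooth initial profile (an assumption for illustration)."""
    return math.exp(-x * x)

def u(x, t, b):
    """Transported profile u(x, t) = u0(x - b t) in one space dimension."""
    return u0(x - b * t)

def transport_residual(x, t, b, h=1e-5):
    """Central-difference approximation of u_t + b u_x at the point (x, t)."""
    u_t = (u(x, t + h, b) - u(x, t - h, b)) / (2 * h)
    u_x = (u(x + h, t, b) - u(x - h, t, b)) / (2 * h)
    return u_t + b * u_x

b = 2.0  # illustrative transport speed
points = [(-1.0, 0.3), (0.0, 0.0), (0.7, 1.5), (2.5, 0.9)]
residuals = [transport_residual(x, t, b) for x, t in points]
max_residual = max(abs(r) for r in residuals)
print(max_residual)  # small, limited by discretization and rounding error
```

A small residual at sampled points is only a sanity check of the chain-rule computation, not a substitute for it.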
## What Counts as a Solution?
A second problem appears before any equation is solved: the derivatives appearing in the equation may cease to exist as the solution evolves. The course therefore begins with classical solutions, the setting in which the equation can be read literally, and returns in Chapters 5 and 6 to weak and entropy solutions, where shocks in conservation laws force a larger solution class.
[definition: Classical Solution]
Let $U \subseteq \mathbb R^n$ be open. A classical solution of a partial differential equation of order $k$ on $U$ is a function $u \in C^k(U;\mathbb R^m)$ whose derivatives satisfy the equation at every point of $U$.
[/definition]
Classical solutions are the natural first notion because the equation can be checked pointwise. Their limitation is that first-order nonlinear equations can create discontinuities even from smooth initial data, so the pointwise definition is sometimes too restrictive for the long-time problem.
[example: Smooth Transport Solution]
Fix $b=(b_1,\dots,b_n)\in\mathbb R^n$ and let $u_0\in C^1(\mathbb R^n)$. Define $u:\mathbb R^n\times\mathbb R\to\mathbb R$ by $u(x,t)=u_0(x-bt)$, where $x-bt=(x_1-b_1t,\dots,x_n-b_nt)$. We verify that $u$ satisfies $\partial_tu+b\cdot\nabla u=0$ pointwise.
By the chain rule applied to the map $t\mapsto x-bt$,
\begin{align*}
\partial_tu(x,t)=\sum_{i=1}^n \partial_{x_i}u_0(x-bt)\,\partial_t(x_i-b_it).
\end{align*}
Since $\partial_t(x_i-b_it)=-b_i$, this becomes
\begin{align*}
\partial_tu(x,t)=\sum_{i=1}^n \partial_{x_i}u_0(x-bt)(-b_i)=-\sum_{i=1}^n b_i\partial_{x_i}u_0(x-bt).
\end{align*}
For each $j\in\{1,\dots,n\}$, the chain rule applied to the map $x\mapsto x-bt$ gives
\begin{align*}
\partial_{x_j}u(x,t)=\sum_{i=1}^n \partial_{x_i}u_0(x-bt)\,\partial_{x_j}(x_i-b_it).
\end{align*}
Here $\partial_{x_j}(x_i-b_it)=\delta_{ij}$, so
\begin{align*}
\partial_{x_j}u(x,t)=\sum_{i=1}^n \partial_{x_i}u_0(x-bt)\delta_{ij}=\partial_{x_j}u_0(x-bt).
\end{align*}
Therefore
\begin{align*}
b\cdot\nabla u(x,t)=\sum_{j=1}^n b_j\partial_{x_j}u(x,t)=\sum_{j=1}^n b_j\partial_{x_j}u_0(x-bt).
\end{align*}
Combining the time derivative with this spatial derivative,
\begin{align*}
\partial_tu(x,t)+b\cdot\nabla u(x,t)=-\sum_{i=1}^n b_i\partial_{x_i}u_0(x-bt)+\sum_{j=1}^n b_j\partial_{x_j}u_0(x-bt)=0.
\end{align*}
Thus the first-order equation describes rigid motion of the initial values along the lines $x-bt=\text{constant}$, rather than local averaging.
[/example]
The preceding computation is the model for the [method of characteristics](/page/Method%20of%20Characteristics), and it also motivates a theorem because existence is only half of the Cauchy problem. We also need to know that the initial profile determines no other classical solution.
[quotetheorem:6169]
[citeproof:6169]
The $C^1$ hypothesis is doing real work: the proof differentiates both the proposed solution and an arbitrary competing solution along characteristic curves. If $n=1$, $b=1$, and $u_0(x)=|x|$, then the formula gives $u(x,t)=|x-t|$, which transports the initial profile but is not a classical $C^1$ solution across the line $x=t$. Thus the formula may still describe propagation beyond the classical category, but the theorem as stated only proves classical existence and uniqueness.
The result also has a deliberately narrow scope. It has no boundary, no variable velocity field, and no question of characteristics intersecting or leaving the domain. Its value is that it isolates the central mechanism: a PDE is converted into an ODE along carefully chosen curves, and initial data are transported from the initial surface to the point where the solution is being evaluated.
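The loss of classical regularity for $u(x,t)=|x-t|$ can be seen in one-sided difference quotients. The sketch below is illustrative only: the step size `h` and the sample points are assumptions, and the code compares left and right slopes in $x$ at a point on the line $x=t$ and at a point off it.

```python
def u(x, t):
    """Transported kink u(x, t) = |x - t| (the case n = 1, b = 1, u0 = |.|)."""
    return abs(x - t)

def one_sided_slopes(x, t, h=1e-6):
    """Left and right difference quotients of u in the x variable."""
    left = (u(x, t) - u(x - h, t)) / h
    right = (u(x + h, t) - u(x, t)) / h
    return left, right

# On the line x = t the two slopes disagree, so the x-derivative does not exist.
left_on, right_on = one_sided_slopes(1.0, 1.0)

# Away from the line the slopes agree and u is smooth nearby.
left_off, right_off = one_sided_slopes(2.0, 1.0)

print(left_on, right_on)
print(left_off, right_off)
```

The mismatch on the line $x=t$ is exactly what prevents the formula from being a classical $C^1$ solution there, even though the profile is still transported.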
## Which Data Make a PDE Problem Well Posed?
The next problem is that an equation alone rarely determines a unique function. Boundary conditions, initial conditions, or mixed conditions are part of the problem, and they must be placed where the equation can actually use them. To discuss this placement precisely, we separate the differential equation from the data that are prescribed with it.
[definition: Cauchy Problem]
Let $U \subseteq \mathbb R^n$ be open, let $S \subset \overline{U}$ be an initial set, and let $Y$ be a function space of maps $u:U\to\mathbb R^m$. A Cauchy problem for a PDE consists of the equation on $U$, the requirement $u\in Y$, and prescribed values of $u$, and possibly derivatives of $u$, on $S$.
[/definition]
A Cauchy problem records where the data enter, but it does not yet say whether the resulting problem is mathematically reliable. Even when data are specified, three different failures can occur: there may be no solution, there may be more than one solution, or tiny errors in the data may cause large changes in the solution. The term well posed packages exactly the conditions that rule out these failures and gives a standard test for whether a PDE problem is stable enough to be useful.
[definition: Well-Posed Problem]
Let $X$ be a data space and let $Y$ be a solution space, both equipped with specified topologies. A PDE problem with data $g\in X$ is well posed from $X$ to $Y$ if for every admissible $g\in X$ there exists a solution $u\in Y$, this solution is unique in $Y$, and the solution map $g\mapsto u$ is continuous from $X$ to $Y$.
[/definition]
The third condition matters because mathematical models are used with measured or rounded data. A problem may have a unique formal solution but still be unusable if small perturbations in the data produce large changes in the answer.
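For constant-coefficient transport, continuous dependence can be illustrated directly, because the solution formula $u(x,t)=g(x-bt)$ only relocates values of the datum. The sketch below is a sampled check, not a proof. It assumes one space dimension, the supremum norm as the topology on both data and solutions, and a hypothetical datum `g` with a perturbation of size `eps`.

```python
import math

def solve_transport(g, b):
    """Return the map (x, t) -> g(x - b t) built from the datum g."""
    return lambda x, t: g(x - b * t)

def sup_distance(f1, f2, samples):
    """Largest sampled gap between two functions; approximates a sup norm."""
    return max(abs(f1(*p) - f2(*p)) for p in samples)

b = 1.5     # illustrative speed
eps = 1e-3  # illustrative size of the data perturbation

g = math.sin
g_perturbed = lambda x: math.sin(x) + eps * math.cos(3 * x)

u = solve_transport(g, b)
u_perturbed = solve_transport(g_perturbed, b)

# Sample the data on [-10, 10] and the solutions on [-5, 5] x [0, 3].
data_samples = [(-10 + 0.01 * k,) for k in range(2001)]
solution_samples = [(-5 + 0.1 * i, 0.1 * j) for i in range(101) for j in range(31)]

data_gap = sup_distance(g, g_perturbed, data_samples)
solution_gap = sup_distance(u, u_perturbed, solution_samples)
print(data_gap, solution_gap)
```

In this setting the sampled gap between the solutions does not exceed the size of the perturbation of the data. Other choices of norm or equation can behave very differently, which is why the definition fixes the topologies in advance.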
[example: Missing Data for Transport]
Fix $b=(b_1,\dots,b_n)\in\mathbb R^n$ and consider
\begin{align*}
\partial_t u+b\cdot\nabla u=0
\end{align*}
on $\mathbb R^n\times\mathbb R$ with no prescribed initial condition. Let $F\in C^1(\mathbb R^n)$ and define $u(x,t)=F(x-bt)$, where $x-bt=(x_1-b_1t,\dots,x_n-b_nt)$. Since $F$ is $C^1$ and $(x,t)\mapsto x-bt$ is smooth, the composition $u$ is $C^1$. By the chain rule,
\begin{align*}
\partial_tu(x,t)=\sum_{i=1}^n \partial_{x_i}F(x-bt)\,\partial_t(x_i-b_it).
\end{align*}
Since $\partial_t(x_i-b_it)=-b_i$, this gives
\begin{align*}
\partial_tu(x,t)=\sum_{i=1}^n \partial_{x_i}F(x-bt)(-b_i).
\end{align*}
Equivalently,
\begin{align*}
\partial_tu(x,t)=-\sum_{i=1}^n b_i\partial_{x_i}F(x-bt).
\end{align*}
For each $j\in\{1,\dots,n\}$, the chain rule gives
\begin{align*}
\partial_{x_j}u(x,t)=\sum_{i=1}^n \partial_{x_i}F(x-bt)\,\partial_{x_j}(x_i-b_it).
\end{align*}
Because $\partial_{x_j}(x_i-b_it)=\delta_{ij}$, we get
\begin{align*}
\partial_{x_j}u(x,t)=\sum_{i=1}^n \partial_{x_i}F(x-bt)\delta_{ij}.
\end{align*}
By the defining property of the Kronecker delta, this reduces to
\begin{align*}
\partial_{x_j}u(x,t)=\partial_{x_j}F(x-bt).
\end{align*}
Therefore
\begin{align*}
b\cdot\nabla u(x,t)=\sum_{j=1}^n b_j\partial_{x_j}u(x,t).
\end{align*}
Substituting the formula for $\partial_{x_j}u$ yields
\begin{align*}
b\cdot\nabla u(x,t)=\sum_{j=1}^n b_j\partial_{x_j}F(x-bt).
\end{align*}
Combining the time derivative and spatial derivative,
\begin{align*}
\partial_tu(x,t)+b\cdot\nabla u(x,t)=-\sum_{i=1}^n b_i\partial_{x_i}F(x-bt)+\sum_{j=1}^n b_j\partial_{x_j}F(x-bt)=0.
\end{align*}
Thus every $C^1$ profile $F$ produces a classical solution. For example, $F_0(y)=0$ gives $u_0(x,t)=0$, while $F_1(y)=1$ gives $u_1(x,t)=1$; both satisfy the same PDE, but $u_0\ne u_1$. The equation alone therefore determines how a chosen profile is transported, but without data it does not determine which profile is being transported.
[/example]
The example explains why the course treats a PDE problem as a package: equation, domain, data, and function class. Changing any one of these can change both the answer and the method of proof.
## Why First-Order Equations Come First
The central question in the first half of the course is how local directional information determines a function. A first-order scalar equation usually prescribes the directional derivative of $u$ along a vector field or imposes a nonlinear relation between $\nabla u$ and the independent variables. The simplest class isolates this directional mechanism while keeping the unknown $u$ only linearly involved.
[definition: Linear First-Order Equation]
Let $U \subseteq \mathbb R^n$ be open. A linear first-order scalar PDE on $U$ has the form
\begin{align*}
\sum_{i=1}^n a_i(x)\partial_{x_i}u(x)+c(x)u(x)=f(x),
\end{align*}
where $a_i,c,f:U\to\mathbb R$ are given functions and $u:U\to\mathbb R$ is the unknown.
[/definition]
The vector field $a=(a_1,\dots,a_n)$ is the geometric part of the equation. Since its integral curves provide the directions in which derivatives are prescribed, the basic working rule is to restrict the unknown to one such curve and read the PDE as an ordinary differential equation there. This rule is the local characteristic reduction: choose a curve $\gamma:(-\varepsilon,\varepsilon)\to U$ satisfying $\dot{\gamma}(s)=a(\gamma(s))$, and track the value of a known $C^1$ solution along that curve. If $w(s)=u(\gamma(s))$, then applying the chain rule to $u(\gamma(s))$ and using
\begin{align*}
a(x)\cdot\nabla u(x)+c(x)u(x)=f(x)
\end{align*}
gives the one-variable equation
\begin{align*}
\dot{w}(s)+c(\gamma(s))w(s)=f(\gamma(s)).
\end{align*}
Thus the PDE determines how $u$ changes along each characteristic curve, provided the curve stays in the domain where the equation is imposed.
The hypotheses mark the limits of the reduction. The solution $u$ must be $C^1$ because the restriction $s\mapsto u(\gamma(s))$ is being treated as a classical one-variable function, and the curve must remain inside $U$ because the original PDE is only known there. For instance, if $U=(0,1)\subset\mathbb R$ and $a(x)=1$, a characteristic starting near $1$ leaves $U$ after a short time, so the displayed ODE only describes the solution up to that exit time.
This reduction explains what any already-known classical solution must do along the curves of the vector field. To build solutions from data, Chapter 2 adds an initial hypersurface, requires the vector field to meet it transversely, and then solves the characteristic system backwards to recover the point where the data are prescribed.
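The characteristic reduction can be tested on a case with a known solution. The sketch below assumes the hypothetical data $a=(1,1)$, $c=1$, $f=0$ on $\mathbb R^2$, for which $u(x,y)=e^{-x}\sin(y-x)$ satisfies $\partial_xu+\partial_yu+u=0$. It integrates $\dot\gamma=a(\gamma)$ together with $\dot w=f(\gamma)-c(\gamma)w$ by a classical Runge-Kutta step and compares $w$ with $u(\gamma(s))$ at the end of the curve.

```python
import math

def a(p):
    """Hypothetical constant vector field a = (1, 1)."""
    return (1.0, 1.0)

def c(p):
    return 1.0

def f(p):
    return 0.0

def u_exact(p):
    """Known classical solution of u_x + u_y + u = 0 used for comparison."""
    x, y = p
    return math.exp(-x) * math.sin(y - x)

def rhs(state):
    """Right-hand side of the coupled system for (gamma, w)."""
    x, y, w = state
    p = (x, y)
    ax, ay = a(p)
    return (ax, ay, f(p) - c(p) * w)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(
        s + h * (d1 + 2 * d2 + 2 * d3 + d4) / 6.0
        for s, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4)
    )

start = (0.2, 1.0)                            # starting point of the curve
state = (start[0], start[1], u_exact(start))  # w(0) = u(gamma(0))
h, steps = 0.01, 100
for _ in range(steps):
    state = rk4_step(state, h)

end_point, w_end = state[:2], state[2]
error = abs(w_end - u_exact(end_point))
print(end_point, error)
```

The starting value $w(0)$ is taken from the known solution, which matches the role of the reduction in this section: it describes what an existing classical solution must do along a characteristic, and does not by itself construct solutions from data.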
## How This Course Fits into Later PDE Theory
The last introductory problem is to understand why the course mixes classical formulas with weak formulations. Classical methods explain the geometry of equations, while weak methods preserve meaningful solutions after classical derivatives cease to exist.
[definition: Weak Formulation]
Let $Q \subseteq \mathbb R^{n+1}$ be open, and let $u:Q\to\mathbb R$ be a locally integrable unknown. A weak formulation of a PDE on $Q$ is an integral identity required to hold for every [test function](/page/Test%20Function) $\phi\in C_c^\infty(Q)$, obtained by transferring derivatives from $u$ onto $\phi$ through [integration by parts](/theorems/210).
[/definition]
The weak viewpoint does not replace the classical one; it extends it. The point of the definition is that the equation can still be tested even when $u$ has too little regularity for its derivatives to be read pointwise.
This raises a necessary compatibility question. If a transport solution is already smooth in the classical sense, then the weak formulation should recognize exactly the same equation after [integration by parts](/theorems/2098). Otherwise the weak theory would not be an extension of the classical theory, but a different problem. The following consistency statement records this bridge for the constant-coefficient transport equation; here $T>0$ is a fixed final time and $\mathcal L^n$ denotes $n$-dimensional [Lebesgue measure](/page/Lebesgue%20Measure).
[quotetheorem:6170]
[citeproof:6170]
The smoothness assumption is what permits ordinary integration by parts; the weak formulation is designed so that the final integral identity still makes sense after $u$ loses classical derivatives. The compact support of $\phi$ is also essential for this clean statement, because it removes time-boundary terms at $0$ and $T$ and spatial boundary terms at infinity. The theorem proves only consistency of the two notions: it does not prove that weak solutions exist, that they are unique, or that every weak solution arises from a classical one.
This theorem prepares the transition from classical first-order equations to conservation laws. Once discontinuities appear, the weak identity may still make sense, but uniqueness can fail unless an additional entropy condition selects the physically relevant solution.
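A weak identity can be tested numerically even when $u$ is not differentiable. The sketch below assumes the standard weak form of the one-dimensional transport equation, namely that $\iint u\,(\partial_t\phi+b\,\partial_x\phi)\,dx\,dt=0$ for every test function $\phi$; the exact statement of the quoted theorem may be phrased differently. The test function, its support, and the grid size are hypothetical choices. The code compares the kinked profile $u(x,t)=|x-bt|$ with a profile moving in the wrong direction.

```python
import math

def bump(r):
    """Smooth bump function supported in |r| < 1."""
    return math.exp(-1.0 / (1.0 - r * r)) if abs(r) < 1.0 else 0.0

def bump_prime(r):
    """Derivative of bump, computed by the chain rule."""
    if abs(r) >= 1.0:
        return 0.0
    q = 1.0 - r * r
    return bump(r) * (-2.0 * r) / (q * q)

# Hypothetical test function phi(x, t) = bump((x - XC) / R) * bump((t - TC) / R),
# supported in (0.2, 1.8) x (0.2, 1.8), which lies inside R x (0, T) for T = 2.
XC, TC, R = 1.0, 1.0, 0.8

def phi_x(x, t):
    return bump_prime((x - XC) / R) / R * bump((t - TC) / R)

def phi_t(x, t):
    return bump((x - XC) / R) * bump_prime((t - TC) / R) / R

def weak_integral(u, b, n=300):
    """Midpoint-rule value of the double integral of u * (phi_t + b * phi_x)."""
    h = 2.0 * R / n
    total = 0.0
    for i in range(n):
        x = XC - R + (i + 0.5) * h
        for j in range(n):
            t = TC - R + (j + 0.5) * h
            total += u(x, t) * (phi_t(x, t) + b * phi_x(x, t))
    return total * h * h

b = 1.0
kinked_solution = lambda x, t: abs(x - b * t)  # u0(x - b t) with u0 = |.|
wrong_direction = lambda x, t: abs(x + b * t)  # profile moving the other way

I_solution = weak_integral(kinked_solution, b)
I_wrong = weak_integral(wrong_direction, b)
print(I_solution, I_wrong)
```

The first value should vanish up to quadrature error, while the second should be clearly nonzero. One test function cannot verify the identity for all $\phi$, so this is evidence of consistency rather than a proof.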
[remark: Course Trajectory]
The course proceeds from classification and well-posedness to characteristics, then to nonlinear first-order equations, shocks, weak solutions, entropy conditions, and representation formulas for the Laplace, heat, and wave equations. The purpose of the introductory chapter is to fix the questions that recur throughout: what is the unknown, where is the data placed, which solution class is being used, and how does the equation transmit information?
[/remark]
Those opening questions about unknowns, data, solution classes, and information flow now become the formal language of the course. Chapter 1 turns that intuition into a precise framework for classifying PDEs and stating what it means for a problem to be well posed.
# 1. PDEs as Equations for Functions and Flows
Partial differential equations enter the course as equations whose unknowns are functions rather than numbers or curves. This first chapter fixes the language used throughout the course: how PDEs are classified, how data are attached to them, and what it means for a problem to be mathematically well posed. The guiding theme is that a PDE is not only an algebraic relation among derivatives; it also describes directions of propagation, smoothing, oscillation, or constraint.
## Classifying Partial Differential Equations
When first meeting an unfamiliar PDE, the first question is not how to solve it. The first question is what kind of equation it is: which derivatives occur, how they enter, and what geometric information the highest-order terms carry. These distinctions determine which data may be prescribed and which methods are likely to apply.
[definition: Partial Differential Equation]
Let $U \subset \mathbb R^n$ be open and let $m \in \mathbb N$. For an unknown function $u\in C^m(U)$, a partial differential equation of order at most $m$ is an equation of the form
\begin{align*}
F\bigl(x,u(x),\nabla u(x),D^2u(x),\dots,D^m u(x)\bigr)=0,
\end{align*}
where
\begin{align*}
F:U\times\mathbb R\times\mathbb R^n\times\mathcal S^2(\mathbb R^n)\times\cdots\times\mathcal S^m(\mathbb R^n)\to\mathbb R
\end{align*}
is a prescribed function and $\mathcal S^k(\mathbb R^n)$ denotes the space of symmetric $k$-linear forms on $\mathbb R^n$.
[/definition]
This definition records the formal object studied by the course, but it is too broad to guide method selection by itself. The first coarse invariant is the largest derivative order present, because first-order equations lead to characteristic curves while second-order equations include the elliptic, parabolic, and hyperbolic families.
To make that invariant precise, we need notation that can name every mixed partial derivative without listing coordinates one by one. We use standard multi-index notation in the classification definitions. A multi-index is a vector $\alpha=(\alpha_1,\dots,\alpha_n)\in\mathbb N_0^n$, its length is $|\alpha|=\alpha_1+\cdots+\alpha_n$, and
\begin{align*}
D^\alpha u=\partial_{x_1}^{\alpha_1}\cdots\partial_{x_n}^{\alpha_n}u.
\end{align*}
[definition: Order of a PDE]
Let $U\subset\mathbb R^n$ be open, let $m\in\mathbb N$, and let $u\in C^m(U)$. Let
\begin{align*}
F:U\times\mathbb R\times\mathbb R^n\times\mathcal S^2(\mathbb R^n)\times\cdots\times\mathcal S^m(\mathbb R^n)\to\mathbb R
\end{align*}
be a prescribed function, where $\mathcal S^k(\mathbb R^n)$ denotes the space of symmetric $k$-linear forms on $\mathbb R^n$. For a PDE represented as
\begin{align*}
F\bigl(x,u(x),\nabla u(x),D^2u(x),\dots,D^m u(x)\bigr)=0,
\end{align*}
the order is the largest integer $k\le m$ such that $F$ depends nontrivially on at least one derivative $D^\alpha u$ with $|\alpha|=k$.
[/definition]
The order tells us how many derivatives the equation controls, but it does not say whether solutions can be added or scaled. To decide whether superposition and linear operator methods are available, we need to separate linear equations from nonlinear ones.
[definition: Linear PDE]
Let $U \subset \mathbb R^n$ be open and let $m\in\mathbb N$. A PDE for $u\in C^m(U)$ is linear if it can be written in the form
\begin{align*}
Lu=f,
\end{align*}
where $f\in C(U)$ is prescribed and $L:C^m(U)\to C(U)$ is a linear differential operator.
[/definition]
For the classical operators used in this chapter, the coordinate form of a linear differential operator is
\begin{align*}
Lu=\sum_{|\alpha|\le m}a_\alpha(x)D^\alpha u,
\end{align*}
with coefficients $a_\alpha\in C(U)$.
Having separated linear equations from nonlinear ones, we still need to know which part of a linear operator controls its leading local behaviour. This motivates the principal part: the collection of the highest-order derivative terms, which is the part used in the first classification of elliptic, parabolic, and hyperbolic equations.
[definition: Principal Part]
Let $U \subset \mathbb R^n$ be open, let $m\in\mathbb N$, and let $L:C^m(U)\to C(U)$ be a linear differential operator of order $m$ of the form
\begin{align*}
Lu=\sum_{|\alpha|\le m} a_\alpha(x)D^\alpha u,
\end{align*}
where $a_\alpha\in C(U)$ for every multi-index $\alpha$ with $|\alpha|\le m$. The principal part of $L$ is the operator $L_m:C^m(U)\to C(U)$ defined by
\begin{align*}
L_m u=\sum_{|\alpha|=m} a_\alpha(x)D^\alpha u.
\end{align*}
[/definition]
Lower-order terms influence estimates, forcing, and growth, but the principal part gives the first approximation to the equation at high frequency. Comparing principal parts is therefore the fastest way to see why the standard second-order model equations have different qualitative behaviour.
[example: Principal Parts of Model Equations]
For the Laplace equation $\Delta u=0$ on $U\subset\mathbb R^n$, the associated linear operator is
\begin{align*}
Lu=\Delta u=\sum_{i=1}^n \partial_{x_i}^2u.
\end{align*}
For each index $i$, the derivative $\partial_{x_i}^2u$ is the multi-index derivative $D^\alpha u$ with $\alpha_i=2$ and $\alpha_j=0$ for $j\ne i$, so $|\alpha|=2$. Hence every term in $L$ has order $2$, and there are no terms with $|\alpha|<2$. The principal part, obtained by keeping exactly the order-$2$ terms, is therefore
\begin{align*}
L_2u=\sum_{i=1}^n \partial_{x_i}^2u=\Delta u.
\end{align*}
For the heat equation, use variables $(t,x_1,\dots,x_n)$ and write
\begin{align*}
Lu=\partial_tu-\Delta u=\partial_tu-\sum_{i=1}^n\partial_{x_i}^2u.
\end{align*}
The derivative $\partial_tu$ is first order because it differentiates once in the $t$ variable. For each $i$, the derivative $\partial_{x_i}^2u$ is second order because it differentiates twice in the $x_i$ variable. Thus the highest-order terms in $L$ are precisely
\begin{align*}
-\partial_{x_1}^2u,\ -\partial_{x_2}^2u,\ \dots,\ -\partial_{x_n}^2u,
\end{align*}
while $\partial_tu$ is lower order. Keeping only those second-order terms gives
\begin{align*}
L_2u=-\sum_{i=1}^n\partial_{x_i}^2u.
\end{align*}
Since
\begin{align*}
\Delta u=\sum_{i=1}^n\partial_{x_i}^2u,
\end{align*}
this can also be written as
\begin{align*}
L_2u=-\Delta u.
\end{align*}
For the [wave equation](/page/Wave%20Equation), again use variables $(t,x_1,\dots,x_n)$ and write
\begin{align*}
Lu=\partial_t^2u-\Delta u=\partial_t^2u-\sum_{i=1}^n\partial_{x_i}^2u.
\end{align*}
The derivative $\partial_t^2u$ has order $2$, and each derivative $\partial_{x_i}^2u$ also has order $2$. Therefore every displayed term is already a highest-order term, so the principal part is the full operator:
\begin{align*}
L_2u=\partial_t^2u-\sum_{i=1}^n\partial_{x_i}^2u.
\end{align*}
Thus the Laplace operator has only same-sign second spatial derivatives, the heat operator has second-order spatial terms together with a lower-order time derivative, and the wave operator has second time and second spatial derivatives with opposite signs. This principal-part distinction is the first algebraic signal separating elliptic, parabolic, and hyperbolic behaviour.
[/example]
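The bookkeeping in this example can be sketched in code. The representation below is an assumption made for illustration: a constant-coefficient linear operator in the variables $(t,x_1,x_2)$ is stored as a dictionary from multi-indices to coefficients, and the helper names `order` and `principal_part` are hypothetical.

```python
def order(operator):
    """Largest |alpha| among the terms with a nonzero coefficient."""
    return max(sum(alpha) for alpha, coeff in operator.items() if coeff != 0)

def principal_part(operator):
    """Keep exactly the terms whose multi-index has length equal to the order."""
    m = order(operator)
    return {
        alpha: coeff
        for alpha, coeff in operator.items()
        if sum(alpha) == m and coeff != 0
    }

# Multi-indices are (alpha_t, alpha_x1, alpha_x2); coefficients are constants.
laplace = {(0, 2, 0): 1, (0, 0, 2): 1}                # Delta u in two space variables
heat = {(1, 0, 0): 1, (0, 2, 0): -1, (0, 0, 2): -1}   # u_t - Delta u
wave = {(2, 0, 0): 1, (0, 2, 0): -1, (0, 0, 2): -1}   # u_tt - Delta u

for name, op in [("laplace", laplace), ("heat", heat), ("wave", wave)]:
    print(name, order(op), principal_part(op))
```

For the heat operator the first-order time term is dropped, while the Laplace and wave operators coincide with their principal parts, matching the computation in the example.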
The course next turns from classification by formula to classification by geometry. First-order equations provide the cleanest entry point because their geometry is carried by curves.
[definition: Characteristic Curve for a Transport Field]
Let $U\subset\mathbb R^n$ be open and let $b:U\to\mathbb R^n$ be a continuous vector field. A characteristic curve for $b$ is a differentiable curve $X:I\to U$, defined on an open interval $I\subset\mathbb R$, satisfying
\begin{align*}
\dot X(s)=b(X(s))
\end{align*}
for all $s\in I$.
[/definition]
Along such a curve, a first-order transport equation becomes an ordinary differential equation. This observation is the seed of the method of characteristics developed in Chapter 2.
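For a variable field, characteristic curves are rarely straight lines. As a minimal sketch, assume the hypothetical rotation field $b(x_1,x_2)=(-x_2,x_1)$ on $\mathbb R^2$, whose characteristic through $(1,0)$ is $X(s)=(\cos s,\sin s)$. The function $u(x)=x_1^2+x_2^2$ satisfies $b\cdot\nabla u=0$, so it should stay constant along the computed curve.

```python
import math

def b(p):
    """Hypothetical rotation field b(x1, x2) = (-x2, x1)."""
    x1, x2 = p
    return (-x2, x1)

def rk4_step(field, p, h):
    """One classical Runge-Kutta step for the system X' = field(X)."""
    def shifted(q, k, scale):
        return tuple(qi + scale * ki for qi, ki in zip(q, k))
    k1 = field(p)
    k2 = field(shifted(p, k1, 0.5 * h))
    k3 = field(shifted(p, k2, 0.5 * h))
    k4 = field(shifted(p, k3, h))
    return tuple(
        pi + h * (d1 + 2 * d2 + 2 * d3 + d4) / 6.0
        for pi, d1, d2, d3, d4 in zip(p, k1, k2, k3, k4)
    )

def u(p):
    """u(x) = x1^2 + x2^2 satisfies b . grad u = 0 for the rotation field."""
    x1, x2 = p
    return x1 * x1 + x2 * x2

h, steps = 0.01, 200
curve = [(1.0, 0.0)]
for _ in range(steps):
    curve.append(rk4_step(b, curve[-1], h))

# u should be (nearly) constant along the numerically computed characteristic.
drift = max(abs(u(p) - u(curve[0])) for p in curve)

# Compare the endpoint with the exact characteristic (cos s, sin s).
s_end = h * steps
endpoint_error = math.dist(curve[-1], (math.cos(s_end), math.sin(s_end)))
print(drift, endpoint_error)
```

Here the transport equation $b\cdot\nabla u=0$ reduces along each characteristic to $\frac{d}{ds}u(X(s))=0$, which is the simplest instance of the reduction described above.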
## Cauchy Problems, Boundary Value Problems, and Well-Posedness
A PDE becomes a mathematical problem only after data are attached to it. The second guiding question is therefore: where should the data be prescribed so that the equation determines a solution? Initial data, boundary data, and mixed data lead to different notions of solvability.
[definition: Cauchy Problem]
Let $U\subset\mathbb R^n$ be open, let $m\in\mathbb N$, let $r\in\mathbb N\cup\{0\}$, let $\Sigma\subset U$ be a $C^1$ hypersurface, and let $X\subset C^m(U)$ be a chosen solution space for an unknown map $u:U\to\mathbb R$. Let
\begin{align*}
F:U\times\mathbb R\times\mathbb R^n\times\mathcal S^2(\mathbb R^n)\times\cdots\times\mathcal S^m(\mathbb R^n)\to\mathbb R
\end{align*}
be a prescribed function, and let
\begin{align*}
F\bigl(x,u(x),\nabla u(x),\dots,D^m u(x)\bigr)=0
\end{align*}
be a PDE on $U$. A Cauchy problem for this PDE consists of the equation in $U$ together with prescribed trace data
\begin{align*}
\gamma_j u=g_j\quad\text{on }\Sigma,\qquad 0\le j\le r,
\end{align*}
where each $\gamma_j:X\to Y_j$ is a specified trace operator on $\Sigma$ and each $g_j\in Y_j$ is prescribed.
[/definition]
For evolution equations the initial hypersurface is often $t=0$. For example, the transport equation $\partial_t u+c\partial_x u=0$ is paired with $u(0,x)=g(x)$, while the wave equation $\partial_t^2u-\Delta u=0$ requires both $u(0,x)=g(x)$ and $\partial_t u(0,x)=h(x)$.
[definition: Boundary Value Problem]
Let $U\subset\mathbb R^n$ be a domain with boundary $\partial U$, let $m\in\mathbb N$, let $N\in\mathbb N$, let $X\subset C^m(U)$ be a chosen solution space of maps $u:U\to\mathbb R$, and let
\begin{align*}
F:U\times\mathbb R\times\mathbb R^n\times\mathcal S^2(\mathbb R^n)\times\cdots\times\mathcal S^m(\mathbb R^n)\to\mathbb R
\end{align*}
be a prescribed function. Let
\begin{align*}
F\bigl(x,u(x),\nabla u(x),\dots,D^m u(x)\bigr)=0
\end{align*}
be a PDE in $U$. A boundary value problem consists of this equation together with boundary conditions
\begin{align*}
B_j u=g_j\quad\text{on }\partial U,\qquad 1\le j\le N,
\end{align*}
where each $B_j:X\to Z_j$ is a specified boundary trace operator and each boundary datum $g_j\in Z_j$ is prescribed.
[/definition]
The Laplace equation is the basic example: prescribing $u=g$ on $\partial U$ gives the Dirichlet problem, while prescribing the normal derivative gives the Neumann problem. These conditions are not interchangeable with initial conditions because elliptic equations do not propagate information along time-like curves.
A well-formulated problem should not only have a solution. The solution should be determined by the data and should not change wildly under small perturbations of those data.
[definition: Hadamard Well-Posedness]
Let $D$ be a topological data space and let $X$ be a topological solution space. Suppose a PDE problem assigns to each datum $d\in D$ the task of finding $u\in X$ satisfying the equation and the prescribed data conditions. The problem is well posed in the sense of Hadamard if the following three conditions hold:
(i) for every $d\in D$, there exists $u\in X$ satisfying the equation and the prescribed data conditions for $d$;
(ii) for every $d\in D$, there is at most one $u\in X$ satisfying the equation and the prescribed data conditions for $d$;
(iii) the solution map $S:D\to X$, defined by $S(d)=u$ through (i) and (ii), is continuous with respect to the chosen topologies on $D$ and $X$.
[/definition]
The third condition is what turns solvability into a stable mathematical model. Without continuous dependence, measurement error or numerical approximation may produce unrelated outputs. These three requirements together form the modelling test used whenever the course asks whether a proposed PDE problem has been posed with the right data and spaces.
[remark: Hadamard Criterion as a Modelling Test]
Hadamard well-posedness is used in this course as a modelling criterion rather than as a separate theorem. Existence says the model responds to every datum the chosen class permits, so the problem is not overdetermined. Uniqueness says the datum determines at most one state, so the model makes a definite prediction. Continuous dependence says nearby data produce nearby solutions in the selected topologies, so the prediction is stable under approximation and measurement.
[/remark]
Each clause is needed. Without existence, admissible physical or geometric data may lie outside the reach of the equation; without uniqueness, the same datum can lead to incompatible predictions; without continuous dependence, small measurement error can be amplified into a large change in the solution. The criterion is not a statement about a formula alone, because the equation, the data set, and the solution space all matter. The same differential expression can be well posed with one choice of spaces and ill posed with another, and this is why later chapters always state the function space and the prescribed data before proving solvability.
[example: Failure of Uniqueness Without Initial Data]
Consider the transport equation
\begin{align*}
\partial_t u+c\partial_x u=0
\end{align*}
on $\mathbb R\times\mathbb R$, where $c\in\mathbb R$ is fixed. We show that the PDE alone does not determine a unique solution. Let $F\in C^1(\mathbb R)$ and define
\begin{align*}
u(t,x)=F(x-ct).
\end{align*}
Set $y(t,x)=x-ct$. Then
\begin{align*}
\partial_t y(t,x)=\partial_t x-\partial_t(ct)=0-c=-c
\end{align*}
and
\begin{align*}
\partial_x y(t,x)=\partial_x x-\partial_x(ct)=1-0=1.
\end{align*}
Using the chain rule for $u(t,x)=F(y(t,x))$, we get
\begin{align*}
\partial_t u(t,x)=F'(y(t,x))\partial_t y(t,x)=F'(x-ct)(-c)=-cF'(x-ct).
\end{align*}
The same chain-rule computation in the $x$ variable gives
\begin{align*}
\partial_x u(t,x)=F'(y(t,x))\partial_x y(t,x)=F'(x-ct)(1)=F'(x-ct).
\end{align*}
Substituting these identities into the transport operator gives
\begin{align*}
\partial_t u(t,x)+c\partial_x u(t,x)=-cF'(x-ct)+cF'(x-ct).
\end{align*}
Factoring out $F'(x-ct)$ gives
\begin{align*}
-cF'(x-ct)+cF'(x-ct)=(-c+c)F'(x-ct)=0\cdot F'(x-ct)=0.
\end{align*}
Thus every differentiable profile $F$ produces a classical solution.
Different choices of $F$ give different solutions of the same equation. For example, if $F_0(y)=0$, then
\begin{align*}
u_0(t,x)=F_0(x-ct)=0.
\end{align*}
If $F_1(y)=1$, then
\begin{align*}
u_1(t,x)=F_1(x-ct)=1.
\end{align*}
Both functions satisfy $\partial_tu+c\partial_xu=0$ on $\mathbb R\times\mathbb R$, but they are distinct because, for every $(t,x)$,
\begin{align*}
u_1(t,x)-u_0(t,x)=1-0=1.
\end{align*}
If an initial condition $u(0,x)=g(x)$ were prescribed, then the formula $u(t,x)=F(x-ct)$ would give
\begin{align*}
u(0,x)=F(x-c\cdot 0)=F(x)=g(x).
\end{align*}
So the missing datum is exactly the profile $F$ along a transverse line such as $t=0$; without it, the PDE alone leaves infinitely many profiles, and hence infinitely many solutions, available.
[/example]
The example shows why data must meet the geometry of the equation: prescribing data along the wrong set may fail to select one solution. This motivates a local uniqueness statement along each characteristic, because the PDE restricts to an ordinary differential equation on those curves and agreement at one point should then propagate along the curve.
[quotetheorem:6139]
[citeproof:6139]
This theorem is local and one-dimensional in spirit: uniqueness is transported only along characteristic curves, because only along those curves does the PDE reduce to the displayed ordinary differential equation. The hypotheses cannot be treated as cosmetic assumptions. For the equation $\partial_x w=0$ on $\mathbb R^2$, the curve $X(s)=(0,s)$ is not characteristic for $b=(1,0)$; the two solutions $u(x,y)=0$ and $v(x,y)=y$ agree at $X(0)$ but not at $X(s)$ for $s\ne0$. Thus agreement at one point does not propagate along an arbitrary curve.
The pointwise agreement hypothesis is equally essential. For the same equation $\partial_x w=0$, the functions $u(x,y)=0$ and $v(x,y)=1$ are both classical solutions, and they differ by the same nonzero amount along every horizontal characteristic. The equation can carry equality along a characteristic, but it cannot create equality from no initial equality.
The regularity assumptions identify where the classical proof is taking place. The condition $u,v\in C^1(U)$ is what permits the pointwise chain-rule computation of $\frac{d}{ds}z(X(s))$; for merely weak or discontinuous solutions, traces along curves and equality at a point need not be meaningful. Regularity of the vector field is also tied to using characteristics as a coordinate system: if $b(x)=2\sqrt{|x|}$ on $\mathbb R$, then the ordinary differential equation $\dot X=b(X)$ has more than one characteristic starting at $0$, for instance $X(s)=0$ and $X(s)=s^2$ for $s\ge0$. A full Cauchy theorem therefore requires enough regularity and a hypersurface that intersects nearby characteristics in a controlled way, so that this one-curve uniqueness mechanism can be assembled into uniqueness in a neighbourhood.
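The non-uniqueness for $b(x)=2\sqrt{|x|}$ can be checked numerically. The following Python sketch is only an illustration: the helper names `ode_residual`, `zero_curve`, and `parabola` are hypothetical and not part of the notes. It compares a central-difference derivative of each candidate curve with $b$ evaluated on the curve.

```python
import math

def b(x):
    """Non-Lipschitz velocity field b(x) = 2*sqrt(|x|)."""
    return 2.0 * math.sqrt(abs(x))

def ode_residual(X, s, h=1e-6):
    """Central-difference estimate of X'(s) - b(X(s))."""
    dX = (X(s + h) - X(s - h)) / (2.0 * h)
    return dX - b(X(s))

def zero_curve(s):
    """Candidate characteristic X(s) = 0."""
    return 0.0

def parabola(s):
    """Candidate characteristic X(s) = s^2 (used here for s >= 0)."""
    return s * s

# Sample parameters s > 0, kept away from 0 so that s - h stays positive.
samples = [0.1 * k for k in range(1, 21)]
max_res_zero = max(abs(ode_residual(zero_curve, s)) for s in samples)
max_res_parabola = max(abs(ode_residual(parabola, s)) for s in samples)

# Both residuals are at rounding level, and both curves start at X(0) = 0.
print(max_res_zero, max_res_parabola)
```

Both curves pass through $0$ at $s=0$ and both residuals vanish up to rounding, which is the numerical face of the two characteristics described above.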
## Prototype Equations and Model Behaviours
The last question of the chapter is how the main PDE families behave before any general theory is built. The prototypes below are not a catalogue of formulas; they are reference points for the rest of the course. Each one represents a different relation between data, geometry, and solution behaviour.
[definition: Constant-Coefficient Transport Equation]
Let $c\in\mathbb R$. The constant-coefficient transport equation for $u\in C^1(\mathbb R\times\mathbb R)$ is
\begin{align*}
\partial_t u+c\partial_x u=0.
\end{align*}
[/definition]
Its characteristics are straight lines $x-ct=\text{constant}$. The equation says that $u$ is constant along those lines.
[example: Solving Constant-Coefficient Transport]
Let $g\in C^1(\mathbb R)$ and consider
\begin{align*}
\partial_t u+c\partial_x u=0,\qquad u(0,x)=g(x).
\end{align*}
We show that
\begin{align*}
u(t,x)=g(x-ct)
\end{align*}
solves both the PDE and the initial condition. Set
\begin{align*}
y(t,x)=x-ct.
\end{align*}
Then
\begin{align*}
\partial_t y(t,x)=\partial_t x-\partial_t(ct)=0-c=-c
\end{align*}
and
\begin{align*}
\partial_x y(t,x)=\partial_x x-\partial_x(ct)=1-0=1.
\end{align*}
Since $u(t,x)=g(y(t,x))$, the chain rule in the $t$ variable gives
\begin{align*}
\partial_t u(t,x)=g'(y(t,x))\partial_t y(t,x).
\end{align*}
Substituting $y(t,x)=x-ct$ and $\partial_t y(t,x)=-c$ gives
\begin{align*}
\partial_t u(t,x)=g'(x-ct)(-c)=-c g'(x-ct).
\end{align*}
The chain rule in the $x$ variable gives
\begin{align*}
\partial_x u(t,x)=g'(y(t,x))\partial_x y(t,x).
\end{align*}
Substituting $y(t,x)=x-ct$ and $\partial_x y(t,x)=1$ gives
\begin{align*}
\partial_x u(t,x)=g'(x-ct)(1)=g'(x-ct).
\end{align*}
Putting these two derivative formulas into the transport operator, we obtain
\begin{align*}
\partial_t u(t,x)+c\partial_x u(t,x)=-c g'(x-ct)+c g'(x-ct).
\end{align*}
Factoring out $g'(x-ct)$ gives
\begin{align*}
-c g'(x-ct)+c g'(x-ct)=(-c+c)g'(x-ct)=0\cdot g'(x-ct)=0.
\end{align*}
Thus $u$ satisfies $\partial_t u+c\partial_xu=0$. At $t=0$,
\begin{align*}
u(0,x)=g(x-c\cdot 0)=g(x-0)=g(x),
\end{align*}
so the initial condition is also satisfied.
For each fixed value of $x-ct$, the argument of $g$ is unchanged; the solution therefore carries the initial profile along the lines $x-ct=\text{constant}$ without changing its shape.
[/example]
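The verification above can also be reproduced numerically. The sketch below assumes an illustrative Gaussian profile $g$ and speed $c=1.5$ (neither is prescribed by the notes) and estimates $\partial_tu+c\partial_xu$ by central differences at a few sample points.

```python
import math

def transported(g, c):
    """Return u(t, x) = g(x - c*t), the candidate solution."""
    return lambda t, x: g(x - c * t)

def transport_residual(u, c, t, x, h=1e-5):
    """Central-difference estimate of u_t + c*u_x at (t, x)."""
    u_t = (u(t + h, x) - u(t - h, x)) / (2.0 * h)
    u_x = (u(t, x + h) - u(t, x - h)) / (2.0 * h)
    return u_t + c * u_x

c = 1.5                             # illustrative speed
g = lambda x: math.exp(-x * x)      # illustrative smooth initial profile
u = transported(g, c)

points = [(0.0, 0.3), (0.7, -1.2), (2.0, 3.1)]
worst = max(abs(transport_residual(u, c, t, x)) for t, x in points)

# The residual is small (finite-difference error only), and u(0, x) = g(x).
print(worst, u(0.0, 0.3) == g(0.3))
```

The residual is not exactly zero because the derivatives are approximated; it shrinks with the step `h` until rounding error dominates.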
Transport equations move profiles without smoothing them, so they model advection rather than spatial relaxation. The next prototype, the eikonal equation, is also first order, but it constrains the size of the gradient instead of describing motion along a fixed vector field.
[definition: Eikonal Equation]
Let $U\subset\mathbb R^n$ be open. The eikonal equation for $u\in C^1(U)$ is
\begin{align*}
|\nabla u(x)|=1.
\end{align*}
[/definition]
It appears when $u$ represents an arrival time or distance-like phase. Its nonlinearity lies in the gradient magnitude rather than in the value of $u$.
[example: Distance from a Hyperplane]
On $\mathbb R^n$, let $u(x)=x_1$. Writing $x=(x_1,\dots,x_n)$, the coordinate derivatives are
\begin{align*}
\partial_{x_1}u(x)=\partial_{x_1}x_1=1.
\end{align*}
For $2\le i\le n$, the variable $x_i$ is independent of $x_1$, so
\begin{align*}
\partial_{x_i}u(x)=\partial_{x_i}x_1=0.
\end{align*}
Therefore
\begin{align*}
\nabla u(x)=(\partial_{x_1}u(x),\dots,\partial_{x_n}u(x))=(1,0,\dots,0)=e_1.
\end{align*}
Using the Euclidean norm formula,
\begin{align*}
|\nabla u(x)|=\sqrt{1^2+0^2+\cdots+0^2}=\sqrt{1}=1.
\end{align*}
Thus $u$ solves the eikonal equation $|\nabla u|=1$.
More generally, let $a=(a_1,\dots,a_n)\in\mathbb R^n$ and define
\begin{align*}
u(x)=a\cdot x=\sum_{i=1}^n a_i x_i.
\end{align*}
Fix $j\in\{1,\dots,n\}$. Differentiating the finite sum term by term gives
\begin{align*}
\partial_{x_j}u(x)=\partial_{x_j}\left(\sum_{i=1}^n a_i x_i\right)=\sum_{i=1}^n a_i\partial_{x_j}x_i.
\end{align*}
For the coordinate functions,
\begin{align*}
\partial_{x_j}x_j=1.
\end{align*}
If $i\ne j$, then $x_i$ is independent of $x_j$, so
\begin{align*}
\partial_{x_j}x_i=0.
\end{align*}
Hence the sum has only one nonzero term:
\begin{align*}
\partial_{x_j}u(x)=a_j\partial_{x_j}x_j+\sum_{i\ne j}a_i\partial_{x_j}x_i.
\end{align*}
Substituting the coordinate derivatives gives
\begin{align*}
\partial_{x_j}u(x)=a_j\cdot 1+\sum_{i\ne j}a_i\cdot 0=a_j+0=a_j.
\end{align*}
Since this holds for every $j\in\{1,\dots,n\}$,
\begin{align*}
\nabla u(x)=(\partial_{x_1}u(x),\dots,\partial_{x_n}u(x))=(a_1,\dots,a_n)=a.
\end{align*}
Taking the Euclidean norm gives
\begin{align*}
|\nabla u(x)|=\sqrt{a_1^2+\cdots+a_n^2}=|a|.
\end{align*}
Consequently, if $|a|=1$, then
\begin{align*}
|\nabla u(x)|=|a|=1
\end{align*}
for every $x\in\mathbb R^n$, so $u(x)=a\cdot x$ is a smooth solution of the eikonal equation. Its level sets are the hyperplanes $a\cdot x=\text{constant}$, so these solutions represent planar wavefronts.
[/example]
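As a quick sanity check of the planar-wavefront computation, the following sketch estimates $|\nabla u|$ by central differences for $u(x)=a\cdot x$. The unit vector $a=(3,4,12)/13$ and the helper names are illustrative choices, not taken from the notes.

```python
import math

def grad_norm(u, x, h=1e-6):
    """Euclidean norm of a central-difference gradient of u at the point x."""
    total = 0.0
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        total += ((u(xp) - u(xm)) / (2.0 * h)) ** 2
    return math.sqrt(total)

def plane_wave(a):
    """Return the linear function u(x) = a . x."""
    return lambda x: sum(ai * xi for ai, xi in zip(a, x))

unit_a = [3.0 / 13.0, 4.0 / 13.0, 12.0 / 13.0]   # 9 + 16 + 144 = 169, so |a| = 1
u = plane_wave(unit_a)
non_unit = plane_wave([1.0, 1.0, 0.0])           # |a| = sqrt(2): not a solution

point = [0.5, -2.0, 1.0]
norm_unit = grad_norm(u, point)          # close to 1
norm_non_unit = grad_norm(non_unit, point)   # close to sqrt(2)
print(norm_unit, norm_non_unit)
```

The second function shows that the normalisation $|a|=1$ is doing real work: a linear function with $|a|\ne1$ has constant gradient norm $|a|$ and so fails the eikonal equation everywhere.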
The eikonal equation is nonlinear but static: it has no time variable in the displayed form. This motivates the inviscid Burgers equation as the first nonlinear evolution prototype, where the characteristic speed depends on the solution and smooth transport can break down.
[definition: Inviscid Burgers Equation]
The inviscid Burgers equation for $u\in C^1(\mathbb R\times\mathbb R)$ is
\begin{align*}
\partial_t u+u\partial_x u=0.
\end{align*}
[/definition]
Because the characteristic speed is $u$ itself, larger values of the solution travel faster than smaller ones. Where a larger value starts behind a smaller one, characteristics cross, and this motivates weak solutions and shocks later in the course.
[example: Characteristic Speeds in Burgers]
Let $u\in C^1(\mathbb R\times\mathbb R)$ solve
\begin{align*}
\partial_t u+u\partial_xu=0,\qquad u(0,x)=g(x).
\end{align*}
We compute the characteristic curve starting from $x(0)=x_0$. Let $x(t)$ be differentiable. Applying the chain rule to the composition $t\mapsto u(t,x(t))$ gives
\begin{align*}
\frac{d}{dt}u(t,x(t))=\partial_tu(t,x(t))\frac{d}{dt}t+\partial_xu(t,x(t))x'(t).
\end{align*}
Since $\frac{d}{dt}t=1$, this is
\begin{align*}
\frac{d}{dt}u(t,x(t))=\partial_tu(t,x(t))+x'(t)\partial_xu(t,x(t)).
\end{align*}
Choose the curve so that
\begin{align*}
x'(t)=u(t,x(t)).
\end{align*}
Substituting this into the chain-rule identity gives
\begin{align*}
\frac{d}{dt}u(t,x(t))
=\partial_tu(t,x(t))+u(t,x(t))\partial_xu(t,x(t)).
\end{align*}
The Burgers equation holds at the point $(t,x(t))$, so
\begin{align*}
\partial_tu(t,x(t))+u(t,x(t))\partial_xu(t,x(t))=0.
\end{align*}
Therefore
\begin{align*}
\frac{d}{dt}u(t,x(t))=0.
\end{align*}
Thus $u(t,x(t))$ is constant along the characteristic. Evaluating the constant at $t=0$ gives
\begin{align*}
u(t,x(t))=u(0,x(0))=u(0,x_0)=g(x_0).
\end{align*}
The characteristic equation becomes
\begin{align*}
x'(t)=u(t,x(t))=g(x_0).
\end{align*}
Integrating from $0$ to $t$ yields
\begin{align*}
\int_0^t x'(s)\,ds=\int_0^t g(x_0)\,ds.
\end{align*}
The left side is $x(t)-x(0)$, and $g(x_0)$ is independent of $s$, so
\begin{align*}
x(t)-x(0)=g(x_0)\int_0^t 1\,ds=g(x_0)(t-0)=tg(x_0).
\end{align*}
Since $x(0)=x_0$,
\begin{align*}
x(t)-x_0=tg(x_0),
\end{align*}
and adding $x_0$ to both sides gives the characteristic line
\begin{align*}
x(t)=x_0+tg(x_0).
\end{align*}
Now suppose $g$ decreases between two points: choose $x_1<x_2$ with $g(x_1)>g(x_2)$. The characteristic from $x_1$ is
\begin{align*}
x=x_1+tg(x_1),
\end{align*}
and the characteristic from $x_2$ is
\begin{align*}
x=x_2+tg(x_2).
\end{align*}
They meet precisely when their spatial coordinates are equal:
\begin{align*}
x_1+tg(x_1)=x_2+tg(x_2).
\end{align*}
Subtracting $x_1$ from both sides gives
\begin{align*}
tg(x_1)=x_2-x_1+tg(x_2).
\end{align*}
Subtracting $tg(x_2)$ from both sides gives
\begin{align*}
tg(x_1)-tg(x_2)=x_2-x_1.
\end{align*}
Factoring out $t$ gives
\begin{align*}
t\bigl(g(x_1)-g(x_2)\bigr)=x_2-x_1.
\end{align*}
Since $g(x_1)>g(x_2)$, we have $g(x_1)-g(x_2)>0$, so division by this positive number gives
\begin{align*}
t=\frac{x_2-x_1}{g(x_1)-g(x_2)}.
\end{align*}
Because $x_2>x_1$, the numerator satisfies $x_2-x_1>0$, and because $g(x_1)>g(x_2)$, the denominator satisfies $g(x_1)-g(x_2)>0$. Hence
\begin{align*}
\frac{x_2-x_1}{g(x_1)-g(x_2)}>0.
\end{align*}
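So the collision time is positive. At that spacetime point, the characteristic construction assigns both the initial label $x_1$ and the initial label $x_2$ to the same point, and would therefore require $u$ to take the two different values $g(x_1)$ and $g(x_2)$ there. A classical solution cannot persist up to that time, which is the mechanism by which crossing Burgers characteristics lead to shock formation.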
[/example]
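The crossing-time formula is easy to evaluate directly. The sketch below uses the illustrative decreasing profile $g(x)=1-x$ and hypothetical helper names; it returns `None` when the rear characteristic is not faster, in which case the two lines do not meet for $t>0$.

```python
def characteristic(x0, g):
    """Return the straight characteristic t -> x0 + t*g(x0)."""
    speed = g(x0)
    return lambda t: x0 + t * speed

def crossing_time(x1, x2, g):
    """Time at which the characteristics from x1 < x2 meet, or None if they never meet for t > 0."""
    if not x1 < x2:
        raise ValueError("expected x1 < x2")
    gap = g(x1) - g(x2)
    if gap <= 0.0:
        return None   # rear characteristic is not faster: no crossing at positive time
    return (x2 - x1) / gap

g = lambda x: 1.0 - x                  # illustrative decreasing initial profile
t_star = crossing_time(0.0, 2.0, g)    # (2 - 0) / (1 - (-1))

left = characteristic(0.0, g)(t_star)
right = characteristic(2.0, g)(t_star)
print(t_star, left, right)

# An increasing profile spreads characteristics apart instead.
no_cross = crossing_time(0.0, 2.0, lambda x: x)
```

For the linear profile chosen here every pair of characteristics meets at the same time, so this is a degenerate illustration; for a general decreasing stretch of $g$ the earliest crossing time depends on the pair $x_1,x_2$.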
Burgers' equation shows that first-order equations may lose classical smoothness. To compare that behaviour with the main second-order classes, we introduce the elliptic, parabolic, and hyperbolic prototypes side by side, beginning with the spatial constraint represented by [Laplace's equation](/page/Laplace's%20Equation).
[definition: Laplace Equation]
Let $U\subset\mathbb R^n$ be open. The Laplace equation for a function $u\in C^2(U;\mathbb R)$ is
\begin{align*}
\Delta u=0.
\end{align*}
[/definition]
Solutions of the Laplace equation are harmonic functions, and boundary data are the natural way to select a solution on a bounded domain. This motivates the heat equation as the parabolic comparison case, where the Laplacian is coupled to a first time derivative and spatial smoothing becomes forward evolution.
[definition: Heat Equation]
The heat equation for a function $u\in C^{1,2}((0,\infty)\times\mathbb R^n;\mathbb R)$ is
\begin{align*}
\partial_t u-\Delta u=0.
\end{align*}
[/definition]
Here $C^{1,2}$ means one continuous time derivative and two continuous spatial derivatives on the open time slab where the PDE is imposed. In a Cauchy problem one additionally prescribes a trace $u(0,x)=g(x)$, which is a boundary value in time rather than part of the pointwise PDE domain. The heat equation is time-oriented: initial data are prescribed at $t=0$, and the solution evolves forward with smoothing. Reversing time is unstable, which is an early warning that the direction of time is part of the well-posed problem. The hyperbolic prototype instead uses a second time derivative, so both initial position and initial velocity are needed.
[definition: Wave Equation]
The wave equation for a function $u\in C^2(\mathbb R\times\mathbb R^n;\mathbb R)$ is
\begin{align*}
\partial_t^2u-\Delta u=0.
\end{align*}
[/definition]
The wave equation requires position and velocity data. Unlike the heat equation, it preserves a propagation geometry: disturbances travel through characteristic cones rather than appearing instantly everywhere.
[example: Elliptic, Parabolic, and Hyperbolic Contrasts]
On the bounded interval $(0,1)$, Laplace's equation is $u''=0$. If the boundary values are $u(0)=a$ and $u(1)=b$, define
\begin{align*}
u(x)=a+(b-a)x.
\end{align*}
Differentiating term by term gives
\begin{align*}
u'(x)=0+(b-a)\cdot 1=b-a.
\end{align*}
Differentiating once more gives
\begin{align*}
u''(x)=0.
\end{align*}
The boundary values are
\begin{align*}
u(0)=a+(b-a)0=a+0=a
\end{align*}
and
\begin{align*}
u(1)=a+(b-a)1=a+b-a=b.
\end{align*}
Thus this affine function solves the Dirichlet problem.
Changing only the left boundary value from $a$ to $a+\delta$ while keeping the right boundary value equal to $b$ gives
\begin{align*}
\widetilde u(x)=a+\delta+(b-a-\delta)x.
\end{align*}
Indeed,
\begin{align*}
\widetilde u(0)=a+\delta+(b-a-\delta)0=a+\delta
\end{align*}
and
\begin{align*}
\widetilde u(1)=a+\delta+(b-a-\delta)1=a+\delta+b-a-\delta=b.
\end{align*}
The difference between the perturbed and unperturbed solutions is
\begin{align*}
\widetilde u(x)-u(x)=a+\delta+(b-a-\delta)x-\bigl(a+(b-a)x\bigr).
\end{align*}
Distributing the minus sign gives
\begin{align*}
\widetilde u(x)-u(x)=a+\delta+(b-a-\delta)x-a-(b-a)x.
\end{align*}
Combining constant terms gives
\begin{align*}
a+\delta-a=\delta.
\end{align*}
Combining the coefficients of $x$ gives
\begin{align*}
(b-a-\delta)-(b-a)=b-a-\delta-b+a=-\delta.
\end{align*}
Therefore
\begin{align*}
\widetilde u(x)-u(x)=\delta-\delta x.
\end{align*}
Factoring out $\delta$ gives
\begin{align*}
\widetilde u(x)-u(x)=\delta(1-x).
\end{align*}
If $\delta\ne0$ and $0<x<1$, then $1-x\ne0$, so $\widetilde u(x)-u(x)\ne0$. Thus a boundary change at one endpoint changes every interior point.
For the heat equation on $\mathbb R$, fix a frequency $k\in\mathbb R$ and set
\begin{align*}
u(t,x)=e^{-k^2t}\sin(kx).
\end{align*}
Since $\partial_t(e^{-k^2t})=-k^2e^{-k^2t}$ and $\sin(kx)$ is independent of $t$,
\begin{align*}
\partial_tu(t,x)=-k^2e^{-k^2t}\sin(kx).
\end{align*}
Since $e^{-k^2t}$ is independent of $x$ and $\partial_x\sin(kx)=k\cos(kx)$,
\begin{align*}
\partial_xu(t,x)=ke^{-k^2t}\cos(kx).
\end{align*}
Differentiating once more in $x$ and using $\partial_x\cos(kx)=-k\sin(kx)$ gives
\begin{align*}
\partial_x^2u(t,x)=ke^{-k^2t}\bigl(-k\sin(kx)\bigr).
\end{align*}
Multiplying the constants gives
\begin{align*}
\partial_x^2u(t,x)=-k^2e^{-k^2t}\sin(kx).
\end{align*}
Therefore
\begin{align*}
\partial_tu(t,x)-\partial_x^2u(t,x)=-k^2e^{-k^2t}\sin(kx)-\bigl(-k^2e^{-k^2t}\sin(kx)\bigr).
\end{align*}
Changing subtraction of a negative term into addition gives
\begin{align*}
\partial_tu(t,x)-\partial_x^2u(t,x)=-k^2e^{-k^2t}\sin(kx)+k^2e^{-k^2t}\sin(kx).
\end{align*}
The two terms are opposites, so
\begin{align*}
\partial_tu(t,x)-\partial_x^2u(t,x)=0.
\end{align*}
Thus this mode solves $\partial_tu-\partial_x^2u=0$.
For $t>0$, the amplitude multiplier is $e^{-k^2t}$. If $0<|k_1|<|k_2|$, then $k_1^2<k_2^2$, so $-k_2^2t<-k_1^2t$. Since the exponential function is increasing,
\begin{align*}
e^{-k_2^2t}<e^{-k_1^2t}.
\end{align*}
Thus higher spatial frequencies are damped more strongly at positive time. At time $T>0$, the initial mode $\sin(kx)$ has become
\begin{align*}
e^{-k^2T}\sin(kx).
\end{align*}
To recover $\sin(kx)$ from that final-time mode, one must multiply by $e^{k^2T}$, because
\begin{align*}
e^{k^2T}\bigl(e^{-k^2T}\sin(kx)\bigr)=e^{k^2T-k^2T}\sin(kx)=e^0\sin(kx)=\sin(kx).
\end{align*}
Thus a final-time perturbation $\varepsilon\sin(kx)$ corresponds backward in time to
\begin{align*}
\varepsilon e^{k^2T}\sin(kx).
\end{align*}
For fixed $\varepsilon\ne0$ and $T>0$, the factor $e^{k^2T}$ tends to infinity as $|k|\to\infty$, so arbitrarily small high-frequency final-time errors can correspond to arbitrarily large initial-time errors. This displays the instability of the backward heat problem.
For the one-dimensional wave equation, let $f\in C^2(\mathbb R)$ and define
\begin{align*}
u(t,x)=\frac12 f(x-t)+\frac12 f(x+t).
\end{align*}
Using the chain rule in $t$,
\begin{align*}
\partial_t f(x-t)=f'(x-t)\partial_t(x-t)=f'(x-t)(-1)=-f'(x-t).
\end{align*}
Similarly,
\begin{align*}
\partial_t f(x+t)=f'(x+t)\partial_t(x+t)=f'(x+t)(1)=f'(x+t).
\end{align*}
Therefore
\begin{align*}
\partial_tu(t,x)=-\frac12 f'(x-t)+\frac12 f'(x+t).
\end{align*}
Differentiating this expression again in $t$ gives
\begin{align*}
\partial_t^2u(t,x)=-\frac12 f''(x-t)\partial_t(x-t)+\frac12 f''(x+t)\partial_t(x+t).
\end{align*}
Substituting $\partial_t(x-t)=-1$ and $\partial_t(x+t)=1$ gives
\begin{align*}
\partial_t^2u(t,x)=-\frac12 f''(x-t)(-1)+\frac12 f''(x+t)(1).
\end{align*}
Hence
\begin{align*}
\partial_t^2u(t,x)=\frac12 f''(x-t)+\frac12 f''(x+t).
\end{align*}
Using the chain rule in $x$,
\begin{align*}
\partial_x f(x-t)=f'(x-t)\partial_x(x-t)=f'(x-t)(1)=f'(x-t).
\end{align*}
Similarly,
\begin{align*}
\partial_x f(x+t)=f'(x+t)\partial_x(x+t)=f'(x+t)(1)=f'(x+t).
\end{align*}
Therefore
\begin{align*}
\partial_xu(t,x)=\frac12 f'(x-t)+\frac12 f'(x+t).
\end{align*}
Differentiating once more in $x$ gives
\begin{align*}
\partial_x^2u(t,x)=\frac12 f''(x-t)\partial_x(x-t)+\frac12 f''(x+t)\partial_x(x+t).
\end{align*}
Substituting $\partial_x(x-t)=1$ and $\partial_x(x+t)=1$ gives
\begin{align*}
\partial_x^2u(t,x)=\frac12 f''(x-t)+\frac12 f''(x+t).
\end{align*}
Therefore
\begin{align*}
\partial_t^2u(t,x)-\partial_x^2u(t,x)=\left(\frac12 f''(x-t)+\frac12 f''(x+t)\right)-\left(\frac12 f''(x-t)+\frac12 f''(x+t)\right).
\end{align*}
The two parenthesized expressions are identical, so
\begin{align*}
\partial_t^2u(t,x)-\partial_x^2u(t,x)=0.
\end{align*}
Thus $u$ solves $\partial_t^2u-\partial_x^2u=0$.
At the point $(t,x)$, the formula uses only the two values $f(x-t)$ and $f(x+t)$. If $f$ is supported in $[-R,R]$, then $f(y)=0$ whenever $y\notin[-R,R]$. Hence, whenever both $x-t\notin[-R,R]$ and $x+t\notin[-R,R]$,
\begin{align*}
u(t,x)=\frac12 f(x-t)+\frac12 f(x+t)=\frac12\cdot0+\frac12\cdot0=0.
\end{align*}
The wave disturbance therefore travels along the characteristic lines $x-t=\text{constant}$ and $x+t=\text{constant}$, rather than instantly affecting all space.
[/example]
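Two of the contrasts in the example lend themselves to a short numerical illustration. The sketch below is a minimal one under stated assumptions: the time horizon $T=1$, the sample frequencies, and the compactly supported profile `bump` are illustrative choices, and the function names are hypothetical.

```python
import math

def backward_amplification(k, T):
    """Factor e^{k^2 T} by which a final-time mode sin(kx) grows when traced back to t = 0."""
    return math.exp(k * k * T)

def dalembert(f, t, x):
    """u(t, x) = f(x - t)/2 + f(x + t)/2 for the one-dimensional wave equation."""
    return 0.5 * f(x - t) + 0.5 * f(x + t)

def bump(y, R=1.0):
    """Illustrative C^2 profile equal to (R^2 - y^2)^3 on [-R, R] and 0 outside."""
    return (R * R - y * y) ** 3 if abs(y) <= R else 0.0

# Backward heat problem: the amplification grows rapidly with the frequency.
for k in (1, 5, 10, 20):
    print(k, backward_amplification(k, 1.0))

# Wave equation: at t = 3 the disturbance sits near x = -3 and x = 3 only.
at_origin = dalembert(bump, 3.0, 0.0)   # both x - t and x + t lie outside [-1, 1]
on_front = dalembert(bump, 3.0, 3.0)    # x - t = 0 lies inside the support
print(at_origin, on_front)
```

The printed amplification factors make the instability concrete: a final-time error of fixed size in a high-frequency mode corresponds to a vastly larger initial-time error, while the wave values show the data travelling along $x\pm t=\text{constant}$ and leaving the origin undisturbed once the fronts have passed.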
These prototypes set the agenda for the rest of the course. Once PDEs are set up as equations for functions and flows, the next step is to exploit the geometry they suggest: Chapter 2 takes the language of classification and data placement, together with the characteristic curves introduced above, and turns linear first-order PDE questions into ODE questions along carefully chosen curves.
# 2. Linear First-Order Equations and Characteristics
This chapter turns the geometric viewpoint from Chapters 0 and 1 into a working method. It assumes the chain rule for $C^1$ maps, the elementary existence and uniqueness theorem for ODEs with locally Lipschitz vector fields, and the integrating-factor method for scalar linear ODEs. A scalar first-order linear PDE prescribes the derivative of an unknown function along a given vector field, so the equation becomes an ODE once we follow the right curves. The central questions are how to construct those curves, how source terms accumulate along them, and when initial data determine a solution rather than contradicting the equation.
## Scalar Linear Equations as Directional Derivative Equations
The first problem is to recognise what information a first-order linear equation actually gives. It does not prescribe every partial derivative of $u$ separately; it prescribes one directional derivative, namely the derivative in the direction selected by the coefficients of the equation.
[definition: Scalar Linear First-Order Equation]
Let $U \subset \mathbb R^n$ be open. A classical scalar linear first-order equation for $u\in C^1(U)$ has the form
\begin{align*}
\sum_{i=1}^n a_i(x)\,\partial_{x_i}u(x) + c(x)u(x) = f(x), \qquad x\in U,
\end{align*}
where $a_i,c,f:U\to \mathbb R$ are given functions for which the displayed expression is defined pointwise.
[/definition]
Writing $a=(a_1,\dots,a_n)$, the first-order part is $a(x)\cdot \nabla u(x)$. Thus the equation controls the rate of change of $u$ along the vector field $a$, while the lower-order coefficient $c$ and the source $f$ modify that rate.
[example: Constant-Coefficient Transport]
Let $b\in \mathbb R^n$ be nonzero, and let $u\in C^1(\mathbb R^n)$ solve
\begin{align*}
b\cdot \nabla u(x)=0 \qquad \text{for every } x\in\mathbb R^n.
\end{align*}
For a fixed starting point $x_0\in\mathbb R^n$, define
\begin{align*}
x(t)=x_0+tb.
\end{align*}
Then
\begin{align*}
x'(t)=\frac{d}{dt}(x_0+tb)=b.
\end{align*}
By the chain rule,
\begin{align*}
\frac{d}{dt}u(x_0+tb)=\nabla u(x_0+tb)\cdot \frac{d}{dt}(x_0+tb).
\end{align*}
Substituting $\frac{d}{dt}(x_0+tb)=b$ gives
\begin{align*}
\frac{d}{dt}u(x_0+tb)=\nabla u(x_0+tb)\cdot b.
\end{align*}
Since the dot product is symmetric,
\begin{align*}
\nabla u(x_0+tb)\cdot b=b\cdot \nabla u(x_0+tb).
\end{align*}
The PDE evaluated at the point $x_0+tb$ gives
\begin{align*}
b\cdot \nabla u(x_0+tb)=0.
\end{align*}
Therefore
\begin{align*}
\frac{d}{dt}u(x_0+tb)=0.
\end{align*}
Hence $u(x_0+tb)$ is constant in $t$, so $u$ is constant on every affine line parallel to $b$.
Now choose a hyperplane
\begin{align*}
S=\{y\in\mathbb R^n:n\cdot y=\alpha\}
\end{align*}
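where $\alpha\in\mathbb R$ and, in this example, $n$ also denotes a fixed normal vector of $S$ (not to be confused with the dimension in $\mathbb R^n$), chosen with $n\cdot b\ne0$. For a given $x\in\mathbb R^n$, each point on the line through $x$ in direction $-b$ has the form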
\begin{align*}
y(t)=x-tb.
\end{align*}
It lies in $S$ exactly when
\begin{align*}
n\cdot y(t)=\alpha.
\end{align*}
Substituting $y(t)=x-tb$ gives
\begin{align*}
n\cdot(x-tb)=\alpha.
\end{align*}
Using linearity of the dot product,
\begin{align*}
n\cdot(x-tb)=n\cdot x-n\cdot(tb).
\end{align*}
Since $t$ is a scalar,
\begin{align*}
n\cdot(tb)=t(n\cdot b).
\end{align*}
Thus the intersection condition is
\begin{align*}
n\cdot x-t(n\cdot b)=\alpha.
\end{align*}
Rearranging,
\begin{align*}
t(n\cdot b)=n\cdot x-\alpha.
\end{align*}
Because $n\cdot b\ne0$, division by $n\cdot b$ gives
\begin{align*}
t=\frac{n\cdot x-\alpha}{n\cdot b}.
\end{align*}
Therefore the intersection point is
\begin{align*}
y=x-\frac{n\cdot x-\alpha}{n\cdot b}\,b.
\end{align*}
To verify that this point lies on $S$, compute first
\begin{align*}
n\cdot y=n\cdot\left(x-\frac{n\cdot x-\alpha}{n\cdot b}\,b\right).
\end{align*}
By linearity of the dot product,
\begin{align*}
n\cdot y=n\cdot x-\frac{n\cdot x-\alpha}{n\cdot b}(n\cdot b).
\end{align*}
Since $n\cdot b\ne0$,
\begin{align*}
\frac{n\cdot x-\alpha}{n\cdot b}(n\cdot b)=n\cdot x-\alpha.
\end{align*}
Hence
\begin{align*}
n\cdot y=n\cdot x-(n\cdot x-\alpha)=\alpha.
\end{align*}
Thus $y\in S$. Since $x$ and $y$ lie on the same affine line parallel to $b$, constancy along that line gives
\begin{align*}
u(x)=u(y).
\end{align*}
Substituting the formula for $y$,
\begin{align*}
u(x)=u\left(x-\frac{n\cdot x-\alpha}{n\cdot b}\,b\right).
\end{align*}
Values of $u$ on any hyperplane transverse to $b$ therefore determine $u$ everywhere; the characteristic method is already visible as solving the curve equation first and then reading the PDE along those curves.
[/example]
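The foot-point formula translates directly into code. The sketch below is illustrative only: `foot_on_hyperplane` and `solve_transport` are hypothetical names, and the direction $b=(1,2)$, the hyperplane $\{x_1=0\}$, and the data on it are arbitrary choices.

```python
def dot(p, q):
    """Euclidean dot product of two equal-length lists."""
    return sum(pi * qi for pi, qi in zip(p, q))

def foot_on_hyperplane(x, b, normal, alpha):
    """Point y = x - t*b with normal . y = alpha, where t = (normal . x - alpha) / (normal . b)."""
    nb = dot(normal, b)
    if nb == 0.0:
        raise ValueError("hyperplane is not transverse to b")
    t = (dot(normal, x) - alpha) / nb
    return [xi - t * bi for xi, bi in zip(x, b)]

def solve_transport(data_on_S, b, normal, alpha):
    """u(x) = value of the data at the foot point of the line through x parallel to b."""
    return lambda x: data_on_S(foot_on_hyperplane(x, b, normal, alpha))

b = [1.0, 2.0]                  # illustrative transport direction
normal = [1.0, 0.0]             # S = {y : y_1 = 0}
alpha = 0.0
data = lambda y: y[1] ** 2      # illustrative data prescribed on S

u = solve_transport(data, b, normal, alpha)
y = foot_on_hyperplane([3.0, 7.0], b, normal, alpha)
print(y, u([3.0, 7.0]), u([5.0, 11.0]))   # the two points differ by 2*b
```

The two evaluation points lie on the same line parallel to $b$, so they share a foot point and `u` returns the same value at both, as the constancy argument predicts.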
The constant-coefficient example shows that straight lines solve the transport geometry when the velocity is constant. For variable coefficients, the same role must be played by curves whose tangent vector at each point equals the coefficient field; this requirement leads to the characteristic ODE.
[definition: Characteristic Curve]
Let $a:U\to \mathbb R^n$ be a vector field. A characteristic curve for $a$ is a differentiable map $X:I\to U$, where $I\subset \mathbb R$ is an interval, satisfying
\begin{align*}
\dot X(t)=a(X(t)).
\end{align*}
[/definition]
A characteristic curve is designed so that movement along the curve matches the directional derivative appearing in the PDE. The obstruction in a variable-coefficient transport equation is that ordinary coordinate lines no longer follow the direction in which the PDE differentiates $u$. Once the curve itself solves the characteristic ODE, the chain rule differentiates $u$ in exactly the direction prescribed by the equation, reducing the PDE along that curve to a scalar ODE.
[explanation: Linear Characteristic Reduction]
Let $a:U\to\mathbb R^n$ be a vector field and let $u\in C^1(U)$. If $X:I\to U$ is a characteristic curve satisfying $\dot X(s)=a(X(s))$, then the chain rule gives
\begin{align*}
\frac{d}{ds}u(X(s))=\nabla u(X(s))\cdot \dot X(s)=a(X(s))\cdot\nabla u(X(s)).
\end{align*}
Thus a first-order linear equation
\begin{align*}
a(x)\cdot\nabla u(x)+c(x)u(x)=f(x)
\end{align*}
restricts along $X$ to the scalar ordinary differential equation
\begin{align*}
\frac{d}{ds}u(X(s))+c(X(s))u(X(s))=f(X(s)).
\end{align*}
[/explanation]
The hypothesis that $X$ solves the characteristic ODE is essential: along an arbitrary curve, the chain rule gives the derivative of $u$ in the tangent direction of that curve, not in the direction $a$. For instance, $u(t,x)=x-t$ solves $\partial_t u+\partial_xu=0$, but along the curve $s\mapsto(s,0)$, which moves in the $t$ direction only and is not characteristic for $a=(1,1)$, the value is $-s$, not constant. The regularity assumptions also have separate roles. The condition $u\in C^1(U)$ is what permits the classical chain-rule computation; for a merely continuous $u$, the expression $a\cdot\nabla u$ may not be defined pointwise. Local Lipschitz regularity of $a$ is the standard hypothesis that gives uniqueness of characteristics; without it, for example $\dot X=2\sqrt{|X|}$ with $X(0)=0$ has more than one solution, so a point need not select a single characteristic. Continuity of $c$ ensures that the integrating factor is a classical integral along each characteristic. The reduction still does not guarantee that characteristics cover the region of interest; it identifies the ODE satisfied once such a curve is available. This distinction leads next to explicit characteristic computations and, in the section on initial hypersurfaces below, to the question of whether an initial surface is crossed by enough characteristics.
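The reduction can be tried out numerically by integrating the characteristic ODE and the scalar ODE for $z(s)=u(X(s))$ together. The sketch below is a minimal illustration, not a method prescribed by the notes: it uses a hand-written classical Runge-Kutta step, hypothetical helper names, and the test problem $\partial_tu+\partial_xu+u=0$, $u(0,x)=\sin x$, whose solution $e^{-t}\sin(x-t)$ serves as a reference.

```python
import math

def rk4(rhs, state, s0, s1, steps=200):
    """Classical fourth-order Runge-Kutta for state' = rhs(s, state); state is a list of floats."""
    h = (s1 - s0) / steps
    s = s0
    for _ in range(steps):
        k1 = rhs(s, state)
        k2 = rhs(s + h / 2, [y + h / 2 * k for y, k in zip(state, k1)])
        k3 = rhs(s + h / 2, [y + h / 2 * k for y, k in zip(state, k2)])
        k4 = rhs(s + h, [y + h * k for y, k in zip(state, k3)])
        state = [y + h / 6 * (p + 2 * q + 2 * r + w)
                 for y, p, q, r, w in zip(state, k1, k2, k3, k4)]
        s += h
    return state

def along_characteristic(a, c, f, x_start, z_start, s_end):
    """Integrate X' = a(X) and z' = f(X) - c(X)*z from s = 0 to s_end, with z(s) = u(X(s))."""
    n = len(x_start)

    def rhs(s, state):
        X, z = state[:n], state[n]
        return list(a(X)) + [f(X) - c(X) * z]

    out = rk4(rhs, list(x_start) + [z_start], 0.0, s_end)
    return out[:n], out[n]

# Test problem in (t, x) coordinates: a = (1, 1), c = 1, f = 0, u(0, x) = sin(x).
a = lambda X: [1.0, 1.0]
c = lambda X: 1.0
f = lambda X: 0.0
x0 = 0.4
X_end, z_end = along_characteristic(a, c, f, [0.0, x0], math.sin(x0), 1.0)

exact = math.exp(-1.0) * math.sin(x0)   # e^{-t} sin(x - t) at (t, x) = (1, 1 + x0)
print(X_end, z_end, exact)
```

The characteristic ends at $(t,x)=(1,1.4)$ and the integrated value agrees with the reference to within the integrator's error, which shows the PDE being read as an ODE along one curve at a time.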
[example: Variable Velocity Transport]
On $U=\mathbb R\times\mathbb R$, consider
\begin{align*}
\partial_t u(t,x)+x\partial_x u(t,x)=0, \qquad u(0,x)=g(x).
\end{align*}
The coefficient field in the $(t,x)$ variables is $a(t,x)=(1,x)$, so a characteristic $s\mapsto (T(s),X(s))$ satisfies
\begin{align*}
\dot T(s)=1,\qquad \dot X(s)=X(s).
\end{align*}
With $T(0)=0$ and $X(0)=x_0$, integrating $\dot T(s)=1$ from $0$ to $s$ gives
\begin{align*}
T(s)-T(0)=\int_0^s 1\,dr=s.
\end{align*}
Since $T(0)=0$, this is
\begin{align*}
T(s)=s.
\end{align*}
For the second equation, multiply by the integrating factor $e^{-s}$. The product rule gives
\begin{align*}
\frac{d}{ds}\left(e^{-s}X(s)\right)=\frac{d}{ds}(e^{-s})X(s)+e^{-s}\dot X(s).
\end{align*}
Since $\frac{d}{ds}(e^{-s})=-e^{-s}$, we have
\begin{align*}
\frac{d}{ds}\left(e^{-s}X(s)\right)=-e^{-s}X(s)+e^{-s}\dot X(s).
\end{align*}
Using $\dot X(s)=X(s)$, the right-hand side becomes
\begin{align*}
-e^{-s}X(s)+e^{-s}X(s)=0.
\end{align*}
Thus $e^{-s}X(s)$ is constant in $s$. Evaluating this constant at $s=0$ gives
\begin{align*}
e^{-s}X(s)=e^0X(0)=x_0.
\end{align*}
Multiplying by $e^s$ gives
\begin{align*}
X(s)=x_0e^s.
\end{align*}
Along this characteristic, the chain rule gives
\begin{align*}
\frac{d}{ds}u(T(s),X(s))=\partial_tu(T(s),X(s))\dot T(s)+\partial_xu(T(s),X(s))\dot X(s).
\end{align*}
Substituting $\dot T(s)=1$ and $\dot X(s)=X(s)$ yields
\begin{align*}
\frac{d}{ds}u(T(s),X(s))=\partial_tu(T(s),X(s))\cdot 1+\partial_xu(T(s),X(s))X(s).
\end{align*}
Since multiplication of [real numbers](/page/Real%20Numbers) is commutative,
\begin{align*}
\partial_xu(T(s),X(s))X(s)=X(s)\partial_xu(T(s),X(s)).
\end{align*}
Therefore
\begin{align*}
\frac{d}{ds}u(T(s),X(s))=\partial_tu(T(s),X(s))+X(s)\partial_xu(T(s),X(s)).
\end{align*}
The PDE evaluated at $(T(s),X(s))$ says
\begin{align*}
\partial_tu(T(s),X(s))+X(s)\partial_xu(T(s),X(s))=0.
\end{align*}
Hence
\begin{align*}
\frac{d}{ds}u(T(s),X(s))=0.
\end{align*}
So $u(T(s),X(s))$ is constant in $s$. Using $T(s)=s$, $X(s)=x_0e^s$, and the initial condition at $s=0$, we obtain
\begin{align*}
u(s,x_0e^s)=u(T(s),X(s))=u(T(0),X(0)).
\end{align*}
Since $T(0)=0$ and $X(0)=x_0$,
\begin{align*}
u(T(0),X(0))=u(0,x_0)=g(x_0).
\end{align*}
Thus
\begin{align*}
u(s,x_0e^s)=g(x_0).
\end{align*}
To express the value at a prescribed endpoint $(t,x)$, set $s=t$ and require
\begin{align*}
x=X(t)=x_0e^t.
\end{align*}
Multiplying both sides by $e^{-t}$ gives
\begin{align*}
xe^{-t}=x_0e^te^{-t}.
\end{align*}
Since $e^te^{-t}=1$, this becomes
\begin{align*}
x_0=xe^{-t}.
\end{align*}
Substituting $x_0=xe^{-t}$ into the constant-along-characteristics formula gives
\begin{align*}
u(t,x)=g(xe^{-t}).
\end{align*}
The initial value at $x_0$ is carried to the later point $x_0e^t$, so the initial profile is transported by the flow generated by the variable velocity field $x$.
[/example]
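The formula $u(t,x)=g(xe^{-t})$ can be checked in two ways: by estimating the PDE residual with finite differences, and by confirming that $u$ is constant along the characteristics $x_0e^s$. The sketch below assumes the illustrative profile $g=\tanh$ and uses hypothetical helper names.

```python
import math

def solution(g):
    """Candidate solution u(t, x) = g(x * e^{-t})."""
    return lambda t, x: g(x * math.exp(-t))

def pde_residual(u, t, x, h=1e-5):
    """Central-difference estimate of u_t + x*u_x at (t, x)."""
    u_t = (u(t + h, x) - u(t - h, x)) / (2.0 * h)
    u_x = (u(t, x + h) - u(t, x - h)) / (2.0 * h)
    return u_t + x * u_x

g = math.tanh          # illustrative smooth initial profile
u = solution(g)

points = [(0.5, 2.0), (1.0, -3.0), (2.0, 0.1)]
worst_residual = max(abs(pde_residual(u, t, x)) for t, x in points)

# Constancy along the characteristic s -> (s, x0 * e^s).
x0 = 0.8
drift = max(abs(u(s, x0 * math.exp(s)) - g(x0)) for s in (0.0, 0.5, 1.0, 2.0))
print(worst_residual, drift)
```

Both quantities are at the level of discretisation or rounding error, consistent with the initial value at $x_0$ being carried unchanged to $x_0e^t$.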
## Inhomogeneous Equations and Accumulated Source Terms
The next question is what changes when the PDE contains a nonzero source. Homogeneous equations propagate initial values along curves; inhomogeneous equations add contributions picked up continuously along the same curves.
[quotetheorem:6140]
[citeproof:6140]
The continuity of $f$ along the characteristic is what makes the accumulated-source integral a classical object. If the source were not integrable along a flow line, the formula would no longer define a pointwise classical solution by ordinary integration; for instance, a source with a nonintegrable singularity on the characteristic would make the displayed integral diverge. The same point applies to $c$, since the exponential weight is built from $\int c(X(r))\,dr$. The assumption $u\in C^1(U)$ is needed because the derivation differentiates $u(X(t))$, and the regularity of $a$ supplies the characteristic curves on which the formula is evaluated. If uniqueness of characteristics fails, the same starting point may admit different histories and the representation no longer gives a single canonical value without extra choices. The theorem also does not say that the same starting time $t_0$ is available globally for every point; characteristics may leave the domain or fail to meet the chosen initial set. The representation is the first appearance in these notes of the Duhamel principle: solve the homogeneous transport problem and then superpose source contributions inserted along the flow.
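To make the accumulated-source picture concrete, the following sketch evaluates the variation-of-constants formula for the scalar ODE $z'+cz=f$ along one characteristic, namely $z(t)=e^{-\int_{t_0}^t c}\,z(t_0)+\int_{t_0}^t e^{-\int_s^t c}f\,ds$. This is the standard integrating-factor formula for that ODE; it is assumed here to be the form the representation takes, and the precise statement is the one in the theorem quoted above. The quadrature rule and all function names are illustrative choices.

```python
import math

def simpson(fn, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = fn(a) + fn(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * fn(a + i * h)
    return total * h / 3.0

def duhamel(c_along, f_along, z0, t0, t):
    """Variation-of-constants value z(t) for z' + c*z = f with z(t0) = z0.

    c_along(s) and f_along(s) are the coefficient and the source already
    evaluated on the characteristic, that is c(X(s)) and f(X(s)).
    """
    weight = lambda s: math.exp(-simpson(c_along, s, t))
    homogeneous = weight(t0) * z0
    accumulated = simpson(lambda s: weight(s) * f_along(s), t0, t)
    return homogeneous + accumulated

# Case 1: c = 0 and source s along the characteristic, as for u_t + u_x = t.
# The transported value 5.0 stands in for the initial datum at the foot point.
z_source = duhamel(lambda s: 0.0, lambda s: s, 5.0, 0.0, 2.0)   # 5 + 2^2/2

# Case 2: c = 1 and f = 1, where the exact value is 1 + 2*e^{-1}.
z_decay = duhamel(lambda s: 1.0, lambda s: 1.0, 3.0, 0.0, 1.0)
print(z_source, z_decay, 1.0 + 2.0 * math.exp(-1.0))
```

The first case shows pure accumulation: the initial value is carried unchanged and the source adds $\int_0^t s\,ds$. The second shows the exponential weight at work, damping both the initial value and the earlier source contributions.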
[example: Source Term Along Flow Lines]
Consider
\begin{align*}
\partial_t u(t,x)+\partial_x u(t,x)=t, \qquad u(0,x)=g(x),
\end{align*}
on $\mathbb R\times \mathbb R$. The coefficient field in the $(t,x)$ variables is $a(t,x)=(1,1)$, so a characteristic $s\mapsto (T(s),X(s))$ satisfies
\begin{align*}
\dot T(s)=1,\qquad \dot X(s)=1.
\end{align*}
To reach the endpoint $(t,x)$ at parameter value $s=t$, set
\begin{align*}
T(s)=s,\qquad X(s)=x-t+s.
\end{align*}
Then
\begin{align*}
\dot T(s)=\frac{d}{ds}s=1.
\end{align*}
Also,
\begin{align*}
\dot X(s)=\frac{d}{ds}(x-t+s)=1,
\end{align*}
because $x$ and $t$ are fixed endpoint coordinates while $s$ is the parameter. At $s=t$,
\begin{align*}
(T(t),X(t))=(t,x-t+t)=(t,x).
\end{align*}
At $s=0$, the same characteristic meets the initial line at
\begin{align*}
(T(0),X(0))=(0,x-t+0)=(0,x-t).
\end{align*}
Along this curve, the chain rule gives
\begin{align*}
\frac{d}{ds}u(T(s),X(s))=\partial_tu(T(s),X(s))\dot T(s)+\partial_xu(T(s),X(s))\dot X(s).
\end{align*}
Substituting $T(s)=s$, $X(s)=x-t+s$, $\dot T(s)=1$, and $\dot X(s)=1$ gives
\begin{align*}
\frac{d}{ds}u(s,x-t+s)=\partial_tu(s,x-t+s)\cdot 1+\partial_xu(s,x-t+s)\cdot 1.
\end{align*}
Therefore
\begin{align*}
\frac{d}{ds}u(s,x-t+s)=\partial_tu(s,x-t+s)+\partial_xu(s,x-t+s).
\end{align*}
Evaluating the PDE at the point $(s,x-t+s)$ gives
\begin{align*}
\partial_tu(s,x-t+s)+\partial_xu(s,x-t+s)=s.
\end{align*}
Hence
\begin{align*}
\frac{d}{ds}u(s,x-t+s)=s.
\end{align*}
Integrating both sides from $0$ to $t$ gives
\begin{align*}
\int_0^t \frac{d}{ds}u(s,x-t+s)\,ds=\int_0^t s\,ds.
\end{align*}
By the [fundamental theorem of calculus](/theorems/632),
\begin{align*}
\int_0^t \frac{d}{ds}u(s,x-t+s)\,ds=u(t,x-t+t)-u(0,x-t+0).
\end{align*}
Since $x-t+t=x$ and $x-t+0=x-t$, this becomes
\begin{align*}
\int_0^t \frac{d}{ds}u(s,x-t+s)\,ds=u(t,x)-u(0,x-t).
\end{align*}
Using the antiderivative $s^2/2$,
\begin{align*}
\int_0^t s\,ds=\left.\frac{s^2}{2}\right|_{0}^{t}.
\end{align*}
Thus
\begin{align*}
\int_0^t s\,ds=\frac{t^2}{2}-\frac{0^2}{2}=\frac{t^2}{2}.
\end{align*}
Combining the two integrals,
\begin{align*}
u(t,x)-u(0,x-t)=\frac{t^2}{2}.
\end{align*}
The initial condition gives
\begin{align*}
u(0,x-t)=g(x-t).
\end{align*}
Therefore
\begin{align*}
u(t,x)=g(x-t)+\frac{t^2}{2}.
\end{align*}
The source contributes through the accumulated integral along the whole characteristic segment, not merely through its value at the endpoint.
[/example]
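The accumulated-source formula can be checked numerically. The following is a minimal sketch, not part of the derivation: the profile `g = sin` and all function names are illustrative assumptions. It integrates $\dot v(s)=s$ along the characteristic $s\mapsto(s,x-t+s)$ and compares the result with the closed form $g(x-t)+t^2/2$; it also evaluates the PDE residual by central differences.

```python
import math

def g(x):
    # Illustrative initial profile (an assumption; any C^1 function would do).
    return math.sin(x)

def u_closed(t, x):
    # Closed form from the example: transported profile plus accumulated source.
    return g(x - t) + 0.5 * t * t

def u_along_characteristic(t, x, n=1000):
    # Integrate dv/ds = s along s -> (s, x - t + s) with the midpoint rule,
    # starting from the initial value v(0) = g(x - t).
    h = t / n
    v = g(x - t)
    for k in range(n):
        s_mid = (k + 0.5) * h
        v += h * s_mid
    return v

def pde_residual(t, x, h=1e-5):
    # Central-difference approximation of u_t + u_x - t for the closed form.
    ut = (u_closed(t + h, x) - u_closed(t - h, x)) / (2 * h)
    ux = (u_closed(t, x + h) - u_closed(t, x - h)) / (2 * h)
    return ut + ux - t
```

Both checks should agree up to discretisation and rounding error: the value built by integrating along the characteristic matches the closed form, and the residual is close to zero.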
When $c$ is nonzero, the same accumulation occurs with a weight. The weight measures how much a source contribution introduced at an earlier time survives to the observation point.
[example: Damped Transport With Source]
Let $\lambda>0$ and consider
\begin{align*}
\partial_t u+\partial_x u+\lambda u=f(t,x), \qquad u(0,x)=g(x).
\end{align*}
The coefficient field of the first-order part is $a(t,x)=(1,1)$. To reach a fixed endpoint $(t,x)$, use the characteristic
\begin{align*}
T(s)=s,\qquad X(s)=x-t+s,\qquad 0\le s\le t.
\end{align*}
Since $x$ and $t$ are fixed endpoint coordinates while $s$ is the parameter,
\begin{align*}
\dot T(s)=\frac{d}{ds}s=1.
\end{align*}
Similarly,
\begin{align*}
\dot X(s)=\frac{d}{ds}(x-t+s)=1.
\end{align*}
At the endpoint,
\begin{align*}
(T(t),X(t))=(t,x-t+t)=(t,x),
\end{align*}
and at the initial time,
\begin{align*}
(T(0),X(0))=(0,x-t+0)=(0,x-t).
\end{align*}
Along this curve, the chain rule gives
\begin{align*}
\frac{d}{ds}u(T(s),X(s))=\partial_tu(T(s),X(s))\dot T(s)+\partial_xu(T(s),X(s))\dot X(s).
\end{align*}
Substituting $T(s)=s$, $X(s)=x-t+s$, $\dot T(s)=1$, and $\dot X(s)=1$ gives
\begin{align*}
\frac{d}{ds}u(s,x-t+s)=\partial_tu(s,x-t+s)\cdot 1+\partial_xu(s,x-t+s)\cdot 1.
\end{align*}
Thus
\begin{align*}
\frac{d}{ds}u(s,x-t+s)=\partial_tu(s,x-t+s)+\partial_xu(s,x-t+s).
\end{align*}
Evaluating the PDE at $(s,x-t+s)$ gives
\begin{align*}
\partial_tu(s,x-t+s)+\partial_xu(s,x-t+s)+\lambda u(s,x-t+s)=f(s,x-t+s).
\end{align*}
Therefore, with $v(s)=u(s,x-t+s)$,
\begin{align*}
v'(s)+\lambda v(s)=f(s,x-t+s).
\end{align*}
Multiply this scalar ODE by $e^{\lambda s}$. By the product rule,
\begin{align*}
\frac{d}{ds}\left(e^{\lambda s}v(s)\right)=\lambda e^{\lambda s}v(s)+e^{\lambda s}v'(s).
\end{align*}
Factoring the right-hand side gives
\begin{align*}
\lambda e^{\lambda s}v(s)+e^{\lambda s}v'(s)=e^{\lambda s}\bigl(v'(s)+\lambda v(s)\bigr).
\end{align*}
Using $v'(s)+\lambda v(s)=f(s,x-t+s)$, we obtain
\begin{align*}
\frac{d}{ds}\left(e^{\lambda s}v(s)\right)=e^{\lambda s}f(s,x-t+s).
\end{align*}
Integrating from $0$ to $t$ and using the fundamental theorem of calculus gives
\begin{align*}
e^{\lambda t}v(t)-e^{\lambda\cdot 0}v(0)=\int_0^t e^{\lambda s}f(s,x-t+s)\,ds.
\end{align*}
Since $e^{\lambda\cdot0}=1$, this becomes
\begin{align*}
e^{\lambda t}v(t)-v(0)=\int_0^t e^{\lambda s}f(s,x-t+s)\,ds.
\end{align*}
Now $v(t)=u(t,x)$, while the initial condition gives
\begin{align*}
v(0)=u(0,x-t)=g(x-t).
\end{align*}
Hence
\begin{align*}
e^{\lambda t}u(t,x)-g(x-t)=\int_0^t e^{\lambda s}f(s,x-t+s)\,ds.
\end{align*}
Multiplying both sides by $e^{-\lambda t}$ yields
\begin{align*}
u(t,x)=e^{-\lambda t}g(x-t)+e^{-\lambda t}\int_0^t e^{\lambda s}f(s,x-t+s)\,ds.
\end{align*}
Move the constant factor $e^{-\lambda t}$ into the integral:
\begin{align*}
e^{-\lambda t}\int_0^t e^{\lambda s}f(s,x-t+s)\,ds=\int_0^t e^{-\lambda t}e^{\lambda s}f(s,x-t+s)\,ds.
\end{align*}
Since $e^{-\lambda t}e^{\lambda s}=e^{-\lambda t+\lambda s}=e^{-\lambda(t-s)}$, this gives
\begin{align*}
e^{-\lambda t}\int_0^t e^{\lambda s}f(s,x-t+s)\,ds=\int_0^t e^{-\lambda(t-s)}f(s,x-t+s)\,ds.
\end{align*}
Therefore
\begin{align*}
u(t,x)=e^{-\lambda t}g(x-t)+\int_0^t e^{-\lambda(t-s)}f(s,x-t+s)\,ds.
\end{align*}
The first term is the transported initial value multiplied by the damping factor $e^{-\lambda t}$, while the second term records source contributions inserted at time $s$ and then damped for the remaining time $t-s$. If $f\equiv 1$, then the source contribution is
\begin{align*}
\int_0^t e^{-\lambda(t-s)}\,ds.
\end{align*}
Use the substitution $r=t-s$. Then $dr=-ds$, and the endpoint $s=0$ corresponds to $r=t$, while $s=t$ corresponds to $r=0$. Hence
\begin{align*}
\int_0^t e^{-\lambda(t-s)}\,ds=\int_t^0 e^{-\lambda r}(-dr).
\end{align*}
Reversing the limits gives
\begin{align*}
\int_t^0 e^{-\lambda r}(-dr)=\int_0^t e^{-\lambda r}\,dr.
\end{align*}
Since $\lambda>0$, an antiderivative of $e^{-\lambda r}$ is $-\lambda^{-1}e^{-\lambda r}$. Therefore
\begin{align*}
\int_0^t e^{-\lambda r}\,dr=\left(-\frac{1}{\lambda}e^{-\lambda t}\right)-\left(-\frac{1}{\lambda}e^0\right).
\end{align*}
Because $e^0=1$,
\begin{align*}
\int_0^t e^{-\lambda r}\,dr=-\frac{1}{\lambda}e^{-\lambda t}+\frac{1}{\lambda}.
\end{align*}
Thus
\begin{align*}
\int_0^t e^{-\lambda(t-s)}\,ds=\frac{1-e^{-\lambda t}}{\lambda}.
\end{align*}
Since $\lambda>0$, the exponent $-\lambda t$ tends to $-\infty$ as $t\to\infty$, so $e^{-\lambda t}\to0$. The constant-source contribution therefore tends to
\begin{align*}
\frac{1-0}{\lambda}=\frac{1}{\lambda},
\end{align*}
expressing the balance between constant forcing and damping.
[/example]
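The weighted accumulation can be made concrete with a short numerical sketch. Everything named here (`duhamel`, `constant_source_term`, the midpoint quadrature, the sample parameters) is an illustrative assumption rather than part of the notes. The code evaluates the representation formula by quadrature and compares the constant-source case with the closed form $(1-e^{-\lambda t})/\lambda$.

```python
import math

def duhamel(t, x, lam, f, g, n=2000):
    # u(t,x) = e^{-lam t} g(x-t) + int_0^t e^{-lam (t-s)} f(s, x-t+s) ds,
    # with the integral approximated by the composite midpoint rule.
    h = t / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        total += math.exp(-lam * (t - s)) * f(s, x - t + s)
    return math.exp(-lam * t) * g(x - t) + h * total

def constant_source_term(t, lam):
    # Closed form of the source contribution when f is identically 1.
    return (1.0 - math.exp(-lam * t)) / lam

# Zero initial data and unit source isolate the accumulated-source term.
approx = duhamel(3.0, 0.7, 2.0, lambda s, y: 1.0, lambda y: 0.0)
exact = constant_source_term(3.0, 2.0)
```

For large $t$ the value of `constant_source_term(t, lam)` approaches `1 / lam`, which is the forcing-damping balance described in the example.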
## Initial Hypersurfaces and Transversality
The characteristic formulas determine $u$ only after a starting value on each characteristic has been chosen. The geometric problem is therefore to choose an initial hypersurface that intersects nearby characteristics once and in a stable way.
[definition: Initial Hypersurface]
Let $U\subset \mathbb R^n$ be open. An initial hypersurface is a $C^1$ embedded hypersurface $S\subset U$ together with prescribed data $u|_S=g$, where $g:S\to \mathbb R$.
[/definition]
An initial hypersurface by itself is only a place where data are written down; it may or may not be a good cross-section for the characteristic flow. To decide whether the Cauchy problem can propagate the data away from $S$, we must compare the vector field $a$ with the tangent spaces of $S$.
[definition: Characteristic and Noncharacteristic Hypersurface]
Let $S\subset U$ be a $C^1$ hypersurface and let $a:U\to \mathbb R^n$ be a vector field. The hypersurface $S$ is noncharacteristic at $p\in S$ if
\begin{align*}
a(p)\notin T_pS.
\end{align*}
It is characteristic at $p$ if $a(p)\in T_pS$.
[/definition]
This definition is geometric, but computations usually describe a hypersurface as a level set. The practical obstruction is that checking whether $a(p)$ lies in the tangent space $T_pS$ can be inconvenient when $S$ is given by an equation $\Phi=0$. In that representation, tangency should be detectable by testing the vector field against the normal covector $\nabla\Phi(p)$, giving the scalar criterion used in Cauchy problems.
[quotetheorem:6141]
[citeproof:6141]
The transversality condition is local, but it has a strong consequence: near a noncharacteristic point, the flow gives coordinates consisting of the starting point on $S$ and the time travelled along the characteristic. This is exactly the coordinate system needed to turn the Cauchy problem into a family of ODE initial value problems. The theorem is only a pointwise test: it does not say that the same hypersurface remains noncharacteristic away from the point being checked. For example, for the vector field $a(t,x)=(1,x)$ and the curve $S=\{t=x^2/2\}$, the defining function $\Phi(t,x)=t-x^2/2$ gives $a\cdot\nabla\Phi=1-x^2$, so transversality fails at $x=\pm 1$ even though it holds near $x=0$. Thus the scalar condition must be checked on the part of the initial surface used for the Cauchy problem, not only at a convenient base point.
[illustration:pdei-noncharacteristic-hypersurface]
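The pointwise nature of the test is easy to see in a small computation. The sketch below uses the vector field $a(t,x)=(1,x)$ and the curve $S=\{t=x^2/2\}$ from the discussion above; the function names are illustrative assumptions.

```python
def a(t, x):
    # Vector field a(t, x) = (1, x).
    return (1.0, x)

def grad_phi(t, x):
    # Gradient of the defining function Phi(t, x) = t - x**2 / 2.
    return (1.0, -x)

def transversality(x):
    # Evaluate a . grad(Phi) at the point of S with second coordinate x.
    t = 0.5 * x * x
    at, ax = a(t, x)
    pt, px = grad_phi(t, x)
    return at * pt + ax * px

# Sample the scalar criterion along S; it equals 1 - x**2.
samples = {x: transversality(x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)}
```

The sampled values vanish at $x=\pm1$ and are nonzero elsewhere, so only the portion of $S$ away from those two points is a usable initial curve for this vector field.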
The preceding criterion supplies the geometric hypothesis, but well-posedness also requires a theorem converting that geometry into an actual solution. The next result combines the ODE existence theorem, smooth dependence of flows on initial data, and the scalar integrating-factor formula. Its purpose is local: it proves that near a noncharacteristic point the Cauchy problem is equivalent to solving one ordinary differential equation on each characteristic.
[explanation: Local Noncharacteristic Cauchy Construction]
Consider the linear first-order equation
\begin{align*}
a(x)\cdot\nabla u(x)+c(x)u(x)=f(x)
\end{align*}
with initial data $u=g$ on a $C^1$ hypersurface $S=\{\Phi=0\}$. If $a(p)\cdot\nabla\Phi(p)\ne0$ at a point $p\in S$, then the characteristic flow crosses $S$ transversely near $p$. Locally, nearby points can be labelled by their starting point on $S$ and the time travelled along the characteristic, and the PDE reduces to one scalar linear ODE on each such curve.
[/explanation]
The condition $a(p)\cdot\nabla\Phi(p)\ne0$ is exactly the hypothesis that prevents characteristics from sliding along the initial surface. The line $x-t=0$ for $\partial_tu+\partial_xu=0$ shows concretely what happens when it fails: the line is itself a single characteristic, so data there determine $u$ only on that characteristic and say nothing about neighbouring ones. The construction is also only local; it does not rule out later intersections of characteristics, exit from the domain, or global loss of a single-valued coordinate system. These limitations motivate the final comparison with characteristic initial data, where the Cauchy problem loses the mechanism that made the proof work.
[example: Noncharacteristic Initial Line]
For
\begin{align*}
\partial_t u(t,x)+x\partial_xu(t,x)=0, \qquad u(0,x)=g(x),
\end{align*}
the first-order coefficient field in the $(t,x)$ variables is $a(t,x)=(1,x)$. The initial line is
\begin{align*}
S=\{(t,x):t=0\},
\end{align*}
so we may use the defining function $\Phi(t,x)=t$. Its gradient is
\begin{align*}
\nabla\Phi(t,x)=(\partial_t\Phi(t,x),\partial_x\Phi(t,x))=(1,0).
\end{align*}
Therefore
\begin{align*}
a(t,x)\cdot\nabla\Phi(t,x)=(1,x)\cdot(1,0)=1\cdot 1+x\cdot 0=1.
\end{align*}
Since $1\ne0$, the transversality criterion above shows that $S$ is noncharacteristic at every point of the initial line.
To compute the solution determined by these data, solve the characteristic equations through the point $(0,x_0)$:
\begin{align*}
\dot T(s)=1,\qquad \dot X(s)=X(s),\qquad T(0)=0,\qquad X(0)=x_0.
\end{align*}
Integrating $\dot T(s)=1$ from $0$ to $s$ gives
\begin{align*}
T(s)-T(0)=\int_0^s 1\,dr=s.
\end{align*}
Since $T(0)=0$, this gives
\begin{align*}
T(s)=s.
\end{align*}
For $X$, multiply the equation $\dot X(s)=X(s)$ by $e^{-s}$. By the product rule,
\begin{align*}
\frac{d}{ds}\left(e^{-s}X(s)\right)
=\frac{d}{ds}(e^{-s})X(s)+e^{-s}\dot X(s).
\end{align*}
Since $\frac{d}{ds}(e^{-s})=-e^{-s}$, this becomes
\begin{align*}
\frac{d}{ds}\left(e^{-s}X(s)\right)
=-e^{-s}X(s)+e^{-s}\dot X(s).
\end{align*}
Using $\dot X(s)=X(s)$,
\begin{align*}
\frac{d}{ds}\left(e^{-s}X(s)\right)
=-e^{-s}X(s)+e^{-s}X(s)=0.
\end{align*}
Hence $e^{-s}X(s)$ is constant in $s$. Evaluating the constant at $s=0$ gives
\begin{align*}
e^{-s}X(s)=e^0X(0)=x_0.
\end{align*}
Multiplying by $e^s$ yields
\begin{align*}
X(s)=x_0e^s.
\end{align*}
Along this characteristic, the chain rule gives
\begin{align*}
\frac{d}{ds}u(T(s),X(s))
=\partial_tu(T(s),X(s))\dot T(s)+\partial_xu(T(s),X(s))\dot X(s).
\end{align*}
Substituting $\dot T(s)=1$ and $\dot X(s)=X(s)$ gives
\begin{align*}
\frac{d}{ds}u(T(s),X(s))
=\partial_tu(T(s),X(s))\cdot 1+\partial_xu(T(s),X(s))X(s).
\end{align*}
Since multiplication of real numbers is commutative,
\begin{align*}
\partial_xu(T(s),X(s))X(s)=X(s)\partial_xu(T(s),X(s)),
\end{align*}
so
\begin{align*}
\frac{d}{ds}u(T(s),X(s))
=\partial_tu(T(s),X(s))+X(s)\partial_xu(T(s),X(s)).
\end{align*}
Evaluating the PDE at $(T(s),X(s))$ gives
\begin{align*}
\partial_tu(T(s),X(s))+X(s)\partial_xu(T(s),X(s))=0.
\end{align*}
Therefore
\begin{align*}
\frac{d}{ds}u(T(s),X(s))=0.
\end{align*}
Using $T(s)=s$ and $X(s)=x_0e^s$, this says
\begin{align*}
\frac{d}{ds}u(s,x_0e^s)=0.
\end{align*}
Thus $u(s,x_0e^s)$ is constant in $s$, and evaluating at $s=0$ gives
\begin{align*}
u(s,x_0e^s)=u(0,x_0e^0)=u(0,x_0)=g(x_0).
\end{align*}
For a prescribed endpoint $(t,x)$, set $s=t$ and require
\begin{align*}
x=X(t)=x_0e^t.
\end{align*}
Multiplying both sides by $e^{-t}$ gives
\begin{align*}
xe^{-t}=x_0e^te^{-t}.
\end{align*}
Since $e^te^{-t}=1$, we obtain
\begin{align*}
x_0=xe^{-t}.
\end{align*}
Substituting this starting point into the characteristic formula yields
\begin{align*}
u(t,x)=g(xe^{-t}).
\end{align*}
Thus the noncharacteristic initial line crosses each characteristic of this flow once, and the value initially assigned at $xe^{-t}$ is transported to the point $(t,x)$.
[/example]
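The same construction can be carried out numerically when the characteristic ODE has no convenient closed form. As a hedged sketch, the code below integrates $\dot X=X$ with a classical fourth-order Runge-Kutta step and compares the result with $x_0e^s$; it then checks the PDE residual of $u(t,x)=g(xe^{-t})$ by central differences. The names `rk4_flow`, `u`, `residual` and the profile `tanh` are illustrative assumptions.

```python
import math

def rk4_flow(x0, t, n=200):
    # Integrate dX/ds = X from s = 0 to s = t with classical RK4.
    h = t / n
    x = x0
    for _ in range(n):
        k1 = x
        k2 = x + 0.5 * h * k1
        k3 = x + 0.5 * h * k2
        k4 = x + h * k3
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x

def u(t, x, g):
    # Solution formula from the example: trace back to the footpoint x e^{-t}.
    return g(x * math.exp(-t))

def residual(t, x, g, h=1e-5):
    # Central-difference approximation of u_t + x u_x.
    ut = (u(t + h, x, g) - u(t - h, x, g)) / (2 * h)
    ux = (u(t, x + h, g) - u(t, x - h, g)) / (2 * h)
    return ut + x * ux
```

Starting the numerical flow at the footpoint $xe^{-t}$ and running it for time $t$ should land at $x$ up to integration error, which is the statement that each point $(t,x)$ is reached by exactly one characteristic leaving the initial line.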
## Characteristic Data and Breakdown of the Cauchy Problem
The final issue is what happens when data are posed on a characteristic hypersurface. In that case the initial surface is not a cross-section for the flow, so prescribing data there either fails to determine values off the surface or imposes compatibility conditions along the surface itself.
[example: Breakdown on a Characteristic Curve]
Consider
\begin{align*}
\partial_t u(t,x)+\partial_x u(t,x)=0
\end{align*}
in $\mathbb R^2$, with data prescribed on
\begin{align*}
S=\{(t,x):x-t=0\}.
\end{align*}
For the defining function $\Phi(t,x)=x-t$,
\begin{align*}
\nabla\Phi(t,x)=(\partial_t\Phi(t,x),\partial_x\Phi(t,x))=(-1,1).
\end{align*}
The coefficient field is $a(t,x)=(1,1)$, so
\begin{align*}
a(t,x)\cdot\nabla\Phi(t,x)=(1,1)\cdot(-1,1).
\end{align*}
Expanding the dot product gives
\begin{align*}
(1,1)\cdot(-1,1)=1\cdot(-1)+1\cdot 1.
\end{align*}
Since $1\cdot(-1)=-1$ and $1\cdot1=1$,
\begin{align*}
1\cdot(-1)+1\cdot 1=-1+1=0.
\end{align*}
Thus $a(t,x)\cdot\nabla\Phi(t,x)=0$ at every point of $S$. By the transversality criterion above, the vector field is tangent to $S$ at every point, so the initial line is characteristic.
The characteristic equations are
\begin{align*}
\dot T(s)=1,\qquad \dot X(s)=1.
\end{align*}
Integrating $\dot T(s)=1$ from $0$ to $s$ gives
\begin{align*}
T(s)-T(0)=\int_0^s 1\,dr=s.
\end{align*}
Therefore
\begin{align*}
T(s)=T(0)+s.
\end{align*}
Similarly,
\begin{align*}
X(s)-X(0)=\int_0^s 1\,dr=s,
\end{align*}
so
\begin{align*}
X(s)=X(0)+s.
\end{align*}
Subtracting the two formulas gives
\begin{align*}
X(s)-T(s)=(X(0)+s)-(T(0)+s).
\end{align*}
Expanding and canceling the two occurrences of $s$,
\begin{align*}
(X(0)+s)-(T(0)+s)=X(0)+s-T(0)-s=X(0)-T(0).
\end{align*}
Hence every characteristic remains on a line
\begin{align*}
x-t=k,
\end{align*}
where $k=X(0)-T(0)$.
Along such a characteristic, the chain rule gives
\begin{align*}
\frac{d}{ds}u(T(s),X(s))=\partial_tu(T(s),X(s))\dot T(s)+\partial_xu(T(s),X(s))\dot X(s).
\end{align*}
Using $\dot T(s)=1$ and $\dot X(s)=1$,
\begin{align*}
\frac{d}{ds}u(T(s),X(s))=\partial_tu(T(s),X(s))\cdot1+\partial_xu(T(s),X(s))\cdot1.
\end{align*}
Since multiplication by $1$ leaves each term unchanged,
\begin{align*}
\frac{d}{ds}u(T(s),X(s))=\partial_tu(T(s),X(s))+\partial_xu(T(s),X(s)).
\end{align*}
Evaluating the PDE at $(T(s),X(s))$ gives
\begin{align*}
\partial_tu(T(s),X(s))+\partial_xu(T(s),X(s))=0.
\end{align*}
Therefore
\begin{align*}
\frac{d}{ds}u(T(s),X(s))=0.
\end{align*}
Thus $u$ is constant on each line $x-t=k$.
The initial line $S$ is only the characteristic $x-t=0$. If $k\ne0$, then a point on the line $x-t=k$ cannot also lie on $S$, because membership in $S$ would require $x-t=0$, hence $k=0$. Thus data on $S$ do not determine the constants on nearby characteristics. For instance,
\begin{align*}
u_0(t,x)=0,\qquad u_1(t,x)=x-t
\end{align*}
both solve the PDE. Indeed,
\begin{align*}
\partial_tu_0(t,x)+\partial_xu_0(t,x)=0+0=0,
\end{align*}
and
\begin{align*}
\partial_tu_1(t,x)+\partial_xu_1(t,x)=(-1)+1=0.
\end{align*}
On $S$, where $x-t=0$, their traces agree:
\begin{align*}
u_0(t,x)=0
\end{align*}
and
\begin{align*}
u_1(t,x)=x-t=0.
\end{align*}
At any point with $x-t\ne0$, however,
\begin{align*}
u_0(t,x)=0\ne x-t=u_1(t,x).
\end{align*}
So characteristic initial data fail to determine a unique solution in any two-dimensional neighbourhood crossing the line $S$.
[/example]
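The failure of uniqueness can be confirmed with a few lines of arithmetic. This is only an illustrative sketch with assumed function names: it checks by central differences that both candidate functions satisfy the PDE, that their traces agree on $S$, and that they differ off $S$.

```python
def u0(t, x):
    # The zero solution.
    return 0.0

def u1(t, x):
    # A second solution that also vanishes on the line x - t = 0.
    return x - t

def residual(u, t, x, h=1e-4):
    # Central-difference approximation of u_t + u_x.
    ut = (u(t + h, x) - u(t - h, x)) / (2 * h)
    ux = (u(t, x + h) - u(t, x - h)) / (2 * h)
    return ut + ux

# Traces on S = {x = t} at a few sample parameters.
traces_agree = all(u0(s, s) == u1(s, s) for s in (-1.0, 0.0, 0.5, 2.0))
# Values at a point off S.
differ_off_S = u0(0.0, 1.0) != u1(0.0, 1.0)
```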
Characteristic data can also be inconsistent. If the equation forces a particular evolution along the initial curve, then arbitrary prescribed values on that curve may contradict the PDE.
[example: Incompatible Characteristic Data]
For
\begin{align*}
\partial_t u(t,x)+\partial_x u(t,x)=1,
\end{align*}
prescribe data $u=h$ on the line
\begin{align*}
S=\{(t,x):x-t=0\}.
\end{align*}
Parametrize this line by
\begin{align*}
\gamma(s)=(s,s).
\end{align*}
Then
\begin{align*}
\gamma'(s)=\left(\frac{d}{ds}s,\frac{d}{ds}s\right)=(1,1),
\end{align*}
which is the coefficient field of the first-order part of the PDE. Thus $S$ is itself a characteristic curve.
Suppose $u\in C^1(\mathbb R^2)$ solves the PDE and has trace $u|_S=h$. Along $\gamma$, the chain rule gives
\begin{align*}
\frac{d}{ds}u(\gamma(s))
=\partial_tu(\gamma(s))\frac{d}{ds}s+\partial_xu(\gamma(s))\frac{d}{ds}s.
\end{align*}
Since $\gamma(s)=(s,s)$ and $\frac{d}{ds}s=1$, this becomes
\begin{align*}
\frac{d}{ds}u(s,s)
=\partial_tu(s,s)\cdot 1+\partial_xu(s,s)\cdot 1.
\end{align*}
Multiplication by $1$ leaves each term unchanged, so
\begin{align*}
\frac{d}{ds}u(s,s)=\partial_tu(s,s)+\partial_xu(s,s).
\end{align*}
Evaluating the PDE at the point $(s,s)$ gives
\begin{align*}
\partial_tu(s,s)+\partial_xu(s,s)=1.
\end{align*}
Therefore
\begin{align*}
\frac{d}{ds}u(s,s)=1.
\end{align*}
The trace condition $u|_S=h$ means that for every $s$,
\begin{align*}
u(s,s)=h(s,s).
\end{align*}
Differentiating both sides with respect to $s$ gives
\begin{align*}
\frac{d}{ds}u(s,s)=\frac{d}{ds}h(s,s).
\end{align*}
Combining this with $\frac{d}{ds}u(s,s)=1$ yields the necessary compatibility condition
\begin{align*}
\frac{d}{ds}h(s,s)=1.
\end{align*}
If, for example, $h(s,s)=0$ for every $s$, then
\begin{align*}
\frac{d}{ds}h(s,s)=\frac{d}{ds}0=0.
\end{align*}
This gives
\begin{align*}
0=\frac{d}{ds}h(s,s)=1,
\end{align*}
a contradiction. Hence no $C^1$ solution can realize those data: on a characteristic curve, the PDE already fixes how the prescribed trace must change.
[/example]
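The compatibility condition can also be observed on explicit solutions. Any function $u(t,x)=t+F(x-t)$ with $F\in C^1$ satisfies $\partial_tu+\partial_xu=1$, and its trace on $S$ is $s\mapsto s+F(0)$, whose derivative is $1$. The sketch below checks this by finite differences; the profile `F = cos` and the function names are illustrative assumptions.

```python
import math

def F(w):
    # Arbitrary C^1 profile (an illustrative choice).
    return math.cos(w)

def u(t, x):
    # A function of the form t + F(x - t), which solves u_t + u_x = 1.
    return t + F(x - t)

def pde_residual(t, x, h=1e-5):
    # Central-difference approximation of u_t + u_x - 1.
    ut = (u(t + h, x) - u(t - h, x)) / (2 * h)
    ux = (u(t, x + h) - u(t, x - h)) / (2 * h)
    return ut + ux - 1.0

def trace_derivative(s, h=1e-5):
    # Central difference of s -> u(s, s) along the characteristic line S.
    return (u(s + h, s + h) - u(s - h, s - h)) / (2 * h)
```

Whatever profile is chosen, the trace derivative comes out as $1$, so constant data such as $h\equiv0$ cannot be matched by a solution of this form.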
The contrast with the noncharacteristic theorem is the main lesson of the chapter. Linear first-order equations are locally solvable by characteristics when the initial surface is a genuine cross-section for the vector field; without transversality, the Cauchy problem can lose existence, uniqueness, or both. This same flow viewpoint underlies the quasilinear equations of Chapter 3, conservation laws before shocks form in Chapters 5-7, and the Hamilton-Jacobi and geometric-optics material of Chapter 4, where information travels along distinguished curves rather than spreading in all directions at once.
The linear transport picture shows that a PDE can often be reduced to ODEs along curves determined by the equation. Chapter 3 keeps that characteristic viewpoint but lets the velocity field depend on the unknown itself, so the geometry of the solution becomes part of the problem.
# 3. Quasilinear First-Order Equations
This chapter moves from linear transport, where the characteristic curves are prescribed independently of the unknown, to quasilinear equations, where the unknown itself changes the velocity field. The prerequisite picture is the characteristic method from Chapter 2, together with the standard ODE facts of local existence, uniqueness, smooth dependence, and the [inverse function theorem](/theorems/51). The central question is how far the characteristic method survives when the curves must be solved together with the solution. We shall see that local classical solvability still holds under a transversality condition, but that even smooth data can force derivatives to become infinite in finite time.
## Characteristic Systems for Quasilinear Scalar Equations
For a linear first-order equation, the characteristic curves are known once the coefficients are known. The new difficulty in a quasilinear scalar equation is that the coefficient field may depend on the value of the solution, so the geometry of propagation is part of the unknown. The first task is therefore to formulate the class of equations whose highest derivatives are still linear but whose characteristic speeds may be solution-dependent.
[definition: Quasilinear First-Order Scalar Equation]
Let $U \subset \mathbb R^n$ be open, let $I \subset \mathbb R$ be an interval, let $a:U \times I \to \mathbb R^n$ and $b:U \times I \to \mathbb R$ be functions, and let $u:U \to I$ be unknown. A quasilinear first-order scalar equation is an equation of the form
\begin{align*}
a(x,u(x))\cdot \nabla u(x) = b(x,u(x)), \qquad x \in U.
\end{align*}
[/definition]
The word quasilinear records that the first derivatives of $u$ appear linearly, while the coefficients of those derivatives may depend on $u$ itself. This is the lowest-order setting in which characteristics interact with the unknown rather than merely reading it off.
[example: Linear Transport as a Quasilinear Equation]
Let $a:U\to\mathbb R^n$ and $c:U\to\mathbb R$ be given, and consider the linear transport equation
\begin{align*}
a(x)\cdot \nabla u(x)=c(x).
\end{align*}
We show that this is a special case of the quasilinear form by defining coefficient functions on $U\times \mathbb R$ by
\begin{align*}
A(x,z)=a(x)
\end{align*}
and
\begin{align*}
B(x,z)=c(x).
\end{align*}
The definitions of $A$ and $B$ do not depend on the second variable $z$. Hence, when $z=u(x)$,
\begin{align*}
A(x,u(x))=a(x)
\end{align*}
and
\begin{align*}
B(x,u(x))=c(x).
\end{align*}
Taking the dot product of the first identity with $\nabla u(x)$ gives
\begin{align*}
A(x,u(x))\cdot \nabla u(x)=a(x)\cdot \nabla u(x).
\end{align*}
Therefore
\begin{align*}
A(x,u(x))\cdot \nabla u(x)=B(x,u(x))
\end{align*}
is equivalent, term by term, to
\begin{align*}
a(x)\cdot \nabla u(x)=c(x).
\end{align*}
For these coefficients, the characteristic system is
\begin{align*}
\dot{x}(s)=A(x(s),z(s)), \qquad \dot{z}(s)=B(x(s),z(s)).
\end{align*}
Substituting the definitions of $A$ and $B$ into the first equation gives
\begin{align*}
\dot{x}(s)=A(x(s),z(s))=a(x(s)).
\end{align*}
Substituting the definition of $B$ into the second equation gives
\begin{align*}
\dot{z}(s)=B(x(s),z(s))=c(x(s)).
\end{align*}
Thus the equation for $x(s)$ contains no $z(s)$, so the characteristic path is determined by the vector field $a$ alone. Once that path is known, the transported value is obtained from the scalar ODE $\dot z(s)=c(x(s))$, so the linear case sits inside the quasilinear framework exactly as the case where the characteristic geometry is independent of the solution value.
[/example]
The example separates the two pieces that must be coupled in the genuinely quasilinear case: the path in the base space and the value of the solution on that path. To make this coupling into a solvable object, we encode both quantities in a single ODE system on $U\times I$.
[definition: Characteristic System for a Quasilinear Equation]
For the quasilinear equation
\begin{align*}
a(x,u(x))\cdot \nabla u(x)=b(x,u(x)),
\end{align*}
the characteristic system is the system of ordinary differential equations for maps $x:J\to U$ and $z:J\to I$, where $J\subset \mathbb R$ is an interval,
\begin{align*}
\dot{x}(s)=a(x(s),z(s)), \qquad \dot{z}(s)=b(x(s),z(s)).
\end{align*}
[/definition]
Here $x$ is the characteristic position and $z$ is the transported solution value. This system lives in the enlarged space $U\times I$. The next question is why this ODE system is the correct one: if a classical solution already exists, its graph should be invariant under the lifted characteristic flow.
[quotetheorem:6142]
[citeproof:6142]
The classical regularity hypotheses on $u$ are essential here because the argument differentiates $u(x(s))$ along a curve; a discontinuous weak solution of the Burgers equation after shock formation has no pointwise chain rule of this kind. The continuity of $a$ and $b$ is not the full ODE existence theory needed later, but it prevents the right-hand side on the lifted graph from changing value under harmless limiting operations. If this regularity is dropped, the characteristic ODE can stop being a classical object: for instance, the scalar ODE $\dot{x}=\mathbf{1}_{\{0\}}(x)$ has no $C^1$ solution through $x(0)=0$. A solution that stays at $0$ has derivative $0$, whereas the equation demands $\dot x=1$ there; a solution that leaves $0$ satisfies $\dot x=0$ wherever $x\ne0$, so it is constant off $0$ and by continuity can never have left. Thus the hypotheses on $a$ and $b$ are not decorative; they keep the graph calculation tied to an actual characteristic system. The theorem does not construct a solution from data; it only says that any classical solution must be organised by the lifted ODE flow. To build solutions, we also need to reverse the argument: prescribe initial values, solve the characteristic ODE system, and recover $u$ from the resulting characteristic surface.
[example: Inviscid Burgers Characteristics]
Consider the inviscid Burgers equation
\begin{align*}
u_t+u u_x=0
\end{align*}
for a $C^1$ function $u:[0,T)\times\mathbb R\to\mathbb R$, with initial data $u(0,x)=u_0(x)$. Write a characteristic in space-time as $s\mapsto (t(s),x(s))$, and set
\begin{align*}
z(s)=u(t(s),x(s)).
\end{align*}
The chain rule gives
\begin{align*}
\dot z(s)=\frac{d}{ds}u(t(s),x(s)).
\end{align*}
Hence
\begin{align*}
\dot z(s)=u_t(t(s),x(s))\dot t(s)+u_x(t(s),x(s))\dot x(s).
\end{align*}
Choose the characteristic equations
\begin{align*}
\dot t(s)=1
\end{align*}
and
\begin{align*}
\dot x(s)=z(s).
\end{align*}
Since $z(s)=u(t(s),x(s))$, substituting these equations into the chain-rule identity gives
\begin{align*}
\dot z(s)=u_t(t(s),x(s))\cdot 1+u_x(t(s),x(s))z(s).
\end{align*}
Replacing $z(s)$ by $u(t(s),x(s))$ gives
\begin{align*}
\dot z(s)=u_t(t(s),x(s))+u_x(t(s),x(s))u(t(s),x(s)).
\end{align*}
Since multiplication of real numbers is commutative,
\begin{align*}
u_x(t(s),x(s))u(t(s),x(s))=u(t(s),x(s))u_x(t(s),x(s)).
\end{align*}
Therefore
\begin{align*}
\dot z(s)=u_t(t(s),x(s))+u(t(s),x(s))u_x(t(s),x(s)).
\end{align*}
Evaluating the Burgers equation at $(t(s),x(s))$ gives
\begin{align*}
u_t(t(s),x(s))+u(t(s),x(s))u_x(t(s),x(s))=0,
\end{align*}
so
\begin{align*}
\dot z(s)=0.
\end{align*}
Label the characteristic by its initial footpoint $x_0$, so that
\begin{align*}
t(0)=0,\qquad x(0)=x_0,\qquad z(0)=u_0(x_0).
\end{align*}
From $\dot t(s)=1$, integration from $0$ to $s$ gives
\begin{align*}
t(s)-t(0)=\int_0^s 1\,dr.
\end{align*}
Since $\int_0^s 1\,dr=s$ and $t(0)=0$,
\begin{align*}
t(s)=s.
\end{align*}
From $\dot z(s)=0$, integration from $0$ to $s$ gives
\begin{align*}
z(s)-z(0)=\int_0^s 0\,dr.
\end{align*}
Since $\int_0^s 0\,dr=0$ and $z(0)=u_0(x_0)$,
\begin{align*}
z(s)=u_0(x_0).
\end{align*}
The equation $\dot x(s)=z(s)$ therefore becomes
\begin{align*}
\dot x(s)=u_0(x_0).
\end{align*}
Here $x_0$ is the fixed label of the characteristic, so $u_0(x_0)$ is constant with respect to the integration variable. Integrating from $0$ to $s$ gives
\begin{align*}
x(s)-x(0)=\int_0^s u_0(x_0)\,dr.
\end{align*}
Thus
\begin{align*}
x(s)-x(0)=s\,u_0(x_0).
\end{align*}
Using $x(0)=x_0$, we obtain
\begin{align*}
x(s)=x_0+s\,u_0(x_0).
\end{align*}
Because $t=s$, the characteristic relation can be written as
\begin{align*}
x=x_0+t\,u_0(x_0).
\end{align*}
Along this same characteristic,
\begin{align*}
u(t,x)=z(t)=u_0(x_0).
\end{align*}
Thus the solution value is constant along each Burgers characteristic, while the characteristic speed is that same constant value. As long as the map $x_0\mapsto x_0+t\,u_0(x_0)$ is one-to-one, each point $(t,x)$ that it reaches has a unique footpoint $x_0$, and the classical solution is recovered by transporting the initial value from that footpoint.
[/example]
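The implicit relation $x=x_0+t\,u_0(x_0)$ can be solved numerically for the footpoint. The sketch below is a minimal illustration under stated assumptions: the profile $u_0=\tanh$ is chosen because it is increasing and bounded by $1$, so the footpoint map is strictly increasing for every $t\ge0$ and a bisection bracket is available; the function names are hypothetical.

```python
import math

def u0(x):
    # Illustrative increasing initial profile with |u0| <= 1.
    return math.tanh(x)

def footpoint(t, x):
    # Solve x = x0 + t * u0(x0) for x0 by bisection, assuming t >= 0.
    # Since |u0| <= 1, the root lies in [x - t - 1, x + t + 1].
    lo, hi = x - t - 1.0, x + t + 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + t * u0(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def u(t, x):
    # Transport the initial value from the footpoint.
    return u0(footpoint(t, x))

def burgers_residual(t, x, h=1e-4):
    # Central-difference approximation of u_t + u u_x.
    ut = (u(t + h, x) - u(t - h, x)) / (2 * h)
    ux = (u(t, x + h) - u(t, x - h)) / (2 * h)
    return ut + u(t, x) * ux
```

For this profile the residual is small at every sampled point with $t>0$, consistent with the solution being classical for all positive times. A decreasing profile would need a different treatment, because the bisection relies on monotonicity of the footpoint map.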
This calculation displays the main feature of nonlinear transport: different solution values travel at different speeds. If faster characteristics start behind slower ones, the characteristic map can fold and a single-valued classical solution cannot persist.
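Folding can be detected by testing whether the footpoint map $x_0\mapsto x_0+t\,u_0(x_0)$ is still increasing. The following sketch uses the decreasing profile $u_0=-\tanh$, which is an illustrative assumption. For this particular choice the derivative of the map is $1-t\,\operatorname{sech}^2(x_0)$, which first vanishes at $x_0=0$ when $t=1$. The grid and function names are also assumptions.

```python
import math

def u0(x):
    # Illustrative decreasing profile: faster values start behind slower ones.
    return -math.tanh(x)

def footpoint_map(t, x0):
    # Position at time t of the characteristic that starts at x0.
    return x0 + t * u0(x0)

def is_increasing(t, lo=-3.0, hi=3.0, n=601):
    # Check strict monotonicity of the footpoint map on a uniform grid.
    xs = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    images = [footpoint_map(t, x0) for x0 in xs]
    return all(b > a for a, b in zip(images, images[1:]))

before_fold = is_increasing(0.5)
after_fold = is_increasing(1.5)
```

On this grid the map is increasing at $t=0.5$ and fails to be increasing at $t=1.5$, so after the fold several footpoints compete for the same point $(t,x)$ and the transported value is no longer single-valued.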
## Local Classical Solutions and Dependence on Initial Data
The characteristic construction needs initial data on a hypersurface rather than on the whole domain. The guiding question is: when does a family of lifted ODE trajectories form the graph of a function near the initial hypersurface? The answer is that the characteristic direction must be transverse to the initial surface, so that the characteristic labels and characteristic time form local coordinates.
[definition: Noncharacteristic Initial Hypersurface]
Let $U\subset \mathbb R^n$ be open, let $\Gamma\subset U$ be a $C^1$ hypersurface with unit normal $\nu(y)$ at $y\in \Gamma$, let $I\subset \mathbb R$ be an interval, let $a:U\times I\to \mathbb R^n$, let $b:U\times I\to \mathbb R$, and let $g\in C^1(\Gamma;I)$ be an initial datum, so $g:\Gamma\to I$. For the equation
\begin{align*}
a(x,u(x))\cdot \nabla u(x)=b(x,u(x)),
\end{align*}
with initial data $u|_\Gamma=g$, the hypersurface $\Gamma$ is noncharacteristic at $y\in \Gamma$ if
\begin{align*}
a(y,g(y))\cdot \nu(y)\ne 0.
\end{align*}
[/definition]
The noncharacteristic condition says that the characteristic direction crosses the initial surface instead of sliding along it. Without this condition, data on $\Gamma$ may fail to determine values away from $\Gamma$, or may overdetermine values along characteristic curves already contained in $\Gamma$.
[example: Characteristic and Noncharacteristic Data for Transport]
For the equation $u_x=0$ on $\mathbb R^2$, we can write
\begin{align*}
(1,0)\cdot \nabla u(x,y)=0,
\end{align*}
because
\begin{align*}
(1,0)\cdot \nabla u(x,y)=(1,0)\cdot (u_x(x,y),u_y(x,y))=1\cdot u_x(x,y)+0\cdot u_y(x,y)=u_x(x,y).
\end{align*}
Thus the characteristic direction is $(1,0)$. A characteristic curve $s\mapsto (x(s),y(s))$ satisfies
\begin{align*}
\dot x(s)=1
\end{align*}
and
\begin{align*}
\dot y(s)=0.
\end{align*}
Integrating from $0$ to $s$ gives
\begin{align*}
x(s)-x(0)=\int_0^s 1\,dr=s
\end{align*}
and
\begin{align*}
y(s)-y(0)=\int_0^s 0\,dr=0.
\end{align*}
Hence
\begin{align*}
x(s)=x(0)+s
\end{align*}
and
\begin{align*}
y(s)=y(0),
\end{align*}
so the characteristics are horizontal lines.
Along such a curve, the chain rule gives
\begin{align*}
\frac{d}{ds}u(x(s),y(s))=u_x(x(s),y(s))\dot x(s)+u_y(x(s),y(s))\dot y(s).
\end{align*}
Substituting $\dot x(s)=1$ and $\dot y(s)=0$ gives
\begin{align*}
\frac{d}{ds}u(x(s),y(s))=u_x(x(s),y(s))\cdot 1+u_y(x(s),y(s))\cdot 0.
\end{align*}
Since
\begin{align*}
u_x(x(s),y(s))\cdot 1+u_y(x(s),y(s))\cdot 0=u_x(x(s),y(s)),
\end{align*}
we obtain
\begin{align*}
\frac{d}{ds}u(x(s),y(s))=u_x(x(s),y(s)).
\end{align*}
Using $u_x=0$ on $\mathbb R^2$ gives
\begin{align*}
\frac{d}{ds}u(x(s),y(s))=0.
\end{align*}
Thus $u$ is constant along each horizontal characteristic.
Now consider data on the line $y=0$. This line has tangent direction $(1,0)$ and unit normal $(0,1)$. Since
\begin{align*}
(1,0)\cdot(0,1)=1\cdot 0+0\cdot 1=0,
\end{align*}
the characteristic direction is tangent to the line, so $y=0$ is characteristic. If data $u(x,0)=h(x)$ are prescribed there and a $C^1$ solution exists, then for any $x_1,x_2\in\mathbb R$,
\begin{align*}
h(x_2)-h(x_1)=u(x_2,0)-u(x_1,0).
\end{align*}
Applying the fundamental theorem of calculus to the one-variable function $r\mapsto u(r,0)$ gives
\begin{align*}
u(x_2,0)-u(x_1,0)=\int_{x_1}^{x_2}u_x(r,0)\,dr.
\end{align*}
Using $u_x(r,0)=0$ for every $r$ gives
\begin{align*}
\int_{x_1}^{x_2}u_x(r,0)\,dr=\int_{x_1}^{x_2}0\,dr=0.
\end{align*}
Therefore
\begin{align*}
h(x_2)-h(x_1)=0,
\end{align*}
so
\begin{align*}
h(x_2)=h(x_1)
\end{align*}
for all $x_1,x_2\in\mathbb R$. Compatible $C^1$ data on the characteristic line $y=0$ must therefore be constant.
By contrast, the line $x=0$ has tangent direction $(0,1)$ and unit normal $(1,0)$. Since
\begin{align*}
(1,0)\cdot(1,0)=1\cdot 1+0\cdot 0=1\ne 0,
\end{align*}
the line $x=0$ is noncharacteristic. Given data $u(0,y)=g(y)$, define
\begin{align*}
u(x,y)=g(y).
\end{align*}
For each fixed $y$, the expression $g(y)$ is independent of $x$, so differentiating with respect to $x$ gives
\begin{align*}
u_x(x,y)=\frac{\partial}{\partial x}g(y)=0.
\end{align*}
Evaluating the definition at $x=0$ gives
\begin{align*}
u(0,y)=g(y).
\end{align*}
Thus data on the noncharacteristic line $x=0$ determine a solution by carrying the value $g(y)$ unchanged along the horizontal characteristic at height $y$.
[/example]
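The noncharacteristic test in this example reduces to a single dot product, so it can be checked mechanically. The following Python sketch is illustrative only: the helper names `is_noncharacteristic` and `solve_from_vertical_line`, the tolerance, and the sample datum $g=\sin$ are assumptions introduced here, not part of the theory.

```python
import math

def dot(v, w):
    """Euclidean dot product of two vectors given as tuples."""
    return sum(vi * wi for vi, wi in zip(v, w))

def is_noncharacteristic(a, nu, tol=1e-12):
    """True when the characteristic direction a crosses the surface with normal nu."""
    return abs(dot(a, nu)) > tol

# Characteristic direction of u_x = 0.
a = (1.0, 0.0)

# The line y = 0 has unit normal (0, 1); the line x = 0 has unit normal (1, 0).
on_y_axis_data = is_noncharacteristic(a, (0.0, 1.0))
on_x_axis_data = is_noncharacteristic(a, (1.0, 0.0))
print(on_y_axis_data, on_x_axis_data)

def solve_from_vertical_line(g):
    """Solution of u_x = 0 with u(0, y) = g(y): carry g(y) along the line y = const."""
    return lambda x, y: g(y)

# Hypothetical datum g = sin, chosen only for illustration.
u = solve_from_vertical_line(math.sin)
print(u(0.0, 0.5), u(3.0, 0.5))
```

The two printed values of `u` agree because the solution ignores $x$, which is the discrete counterpart of constancy along horizontal characteristics.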
We now state the local theorem in the form used throughout first-order PDE theory. The theorem is a direct application of ODE existence, uniqueness, and smooth dependence, followed by the inverse function theorem.
[quotetheorem:6143]
[citeproof:6143]
Each hypothesis has a specific role in turning the lifted ODE surface into a graph. If the noncharacteristic condition fails, as for $u_x=0$ with data on the line $y=0$, the initial curve is itself characteristic and arbitrary data along it are not compatible with a solution unless they are constant along that curve. By contrast, the line $x=0$ is noncharacteristic for this equation, and its data determine nearby values by crossing the characteristic direction. If the coefficients are not regular enough for ODE uniqueness, distinct characteristic flows can leave the same initial point, so the construction need not give a unique solution. This theorem is local in both space and characteristic time: it gives a solution only as long as the characteristic coordinate map remains invertible and the lifted ODE solution remains in the region where $a$ and $b$ are defined.
[remark: Locality of the Construction]
The theorem does not assert that the solution extends globally. A characteristic may leave $U$, the value $z$ may leave $I$, or the projection $(s,\theta)\mapsto x(s,\theta)$ may lose rank. The last obstruction is the analytic origin of shock formation in scalar conservation laws.
[/remark]
The same construction also records how solutions depend on the initial data. This matters because well-posedness is not only existence and uniqueness; it also requires stability under small perturbations.
[quotetheorem:6144]
[citeproof:6144]
Each part of the statement prevents a different failure mode in the finite-dimensional characteristic construction. The $C^k$ assumptions on $a$ and $b$ are assumptions on the vector field $(a,b)$ on $U\times I$; they ensure that the lifted ODE flow depends $C^k$-smoothly on the initial point $(Y(\theta),h(\lambda,\theta))$. For merely continuous vector fields this flow may fail to be unique, as in $\dot{x}=|x|^{1/2}$ from the initial point $0$, so there is no well-defined characteristic surface to invert. The $C^k$ regularity of $Y$ and $h$ controls the initial graph from which the flow starts, while the noncharacteristic inequality at $\lambda_0$ supplies the determinant that remains nonzero for nearby parameters after shrinking.
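The non-uniqueness example $\dot x=|x|^{1/2}$, $x(0)=0$ can be made concrete by exhibiting two solutions. The sketch below checks that both $x(t)=0$ and $x(t)=t^2/4$ (for $t\ge 0$) satisfy the ODE; the function names and sample times are assumptions chosen for illustration.

```python
import math

def vector_field(x):
    """Continuous but non-Lipschitz right-hand side of x' = |x|^(1/2)."""
    return math.sqrt(abs(x))

def zero_solution(t):
    """The trivial solution x(t) = 0."""
    return 0.0

def parabola_solution(t):
    """A second solution x(t) = t^2 / 4 for t >= 0 with the same initial value."""
    return 0.25 * t * t

def parabola_velocity(t):
    """Exact derivative of the parabola solution."""
    return 0.5 * t

residuals = []
for t in [0.0, 0.5, 1.0, 2.0]:
    # Each residual is (derivative of candidate) minus (vector field at candidate).
    residual_zero = 0.0 - vector_field(zero_solution(t))
    residual_parabola = parabola_velocity(t) - vector_field(parabola_solution(t))
    residuals.append((residual_zero, residual_parabola))
    print(t, residual_zero, residual_parabola)
```

Both residuals vanish at every sample time, yet the two curves separate for $t>0$, so the initial point $0$ does not select a single trajectory.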
The restriction to a finite-dimensional parameter $\lambda\in\mathbb R^m$ is also part of the hypothesis, not a cosmetic choice. It is exactly the setting in which standard ODE smooth dependence and the inverse function theorem with parameters apply without introducing a Banach-space theory of composition. The theorem is therefore not a claim that arbitrary data give a $C^k$ map between naive $C^k$ Banach spaces; such a claim would require additional functional-analytic structure, since composition can lose derivatives. For example, the translation curve $t\mapsto h(\cdot+t)$ is not differentiable as a curve in $C^1$ when $h(x)=x|x|$ near $0$, even though $h\in C^1$. The common physical neighbourhood $V$ is equally necessary: without a single region on which every nearby characteristic projection is invertible, the expression $(\lambda,x)\mapsto u_\lambda(x)$ would compare solutions at different physical points or on different domains. For Burgers with decreasing initial data, the lifted flow remains smooth while this projection folds in finite time. Thus the theorem gives stability only in the shared pre-crossing regime, not continuous dependence of a shock-forming classical solution beyond its lifespan. This prepares the traffic-flow example below, where smooth dependence holds up to the first time at which compression destroys the characteristic coordinates.
[example: Traffic Flow Before Shock Formation]
Let $Q$ be a $C^2$ flux on the relevant density interval, let $\rho(0,x)=\rho_0(x)$, and suppose $\rho$ is $C^1$. The conservation law
\begin{align*}
\rho_t+(Q(\rho))_x=0
\end{align*}
has quasilinear form
\begin{align*}
\rho_t+Q'(\rho)\rho_x=0,
\end{align*}
because, for fixed $t$, the one-variable chain rule applied to $x\mapsto Q(\rho(t,x))$ gives
\begin{align*}
(Q(\rho))_x(t,x)=Q'(\rho(t,x))\rho_x(t,x).
\end{align*}
Let $s\mapsto (t(s),x(s))$ be a characteristic and set
\begin{align*}
z(s)=\rho(t(s),x(s)).
\end{align*}
Choose
\begin{align*}
\dot t(s)=1
\end{align*}
and
\begin{align*}
\dot x(s)=Q'(z(s)).
\end{align*}
The chain rule gives
\begin{align*}
\dot z(s)=\frac{d}{ds}\rho(t(s),x(s)).
\end{align*}
Hence
\begin{align*}
\dot z(s)=\rho_t(t(s),x(s))\dot t(s)+\rho_x(t(s),x(s))\dot x(s).
\end{align*}
Substituting $\dot t(s)=1$ and $\dot x(s)=Q'(z(s))$ gives
\begin{align*}
\dot z(s)=\rho_t(t(s),x(s))\cdot 1+\rho_x(t(s),x(s))Q'(z(s)).
\end{align*}
Since multiplication by $1$ does not change the first term,
\begin{align*}
\dot z(s)=\rho_t(t(s),x(s))+\rho_x(t(s),x(s))Q'(z(s)).
\end{align*}
Using $z(s)=\rho(t(s),x(s))$, this becomes
\begin{align*}
\dot z(s)=\rho_t(t(s),x(s))+\rho_x(t(s),x(s))Q'(\rho(t(s),x(s))).
\end{align*}
Since the factors are real-valued,
\begin{align*}
\rho_x(t(s),x(s))Q'(\rho(t(s),x(s)))=Q'(\rho(t(s),x(s)))\rho_x(t(s),x(s)).
\end{align*}
Therefore
\begin{align*}
\dot z(s)=\rho_t(t(s),x(s))+Q'(\rho(t(s),x(s)))\rho_x(t(s),x(s)).
\end{align*}
Evaluating $\rho_t+Q'(\rho)\rho_x=0$ at $(t(s),x(s))$ gives
\begin{align*}
\rho_t(t(s),x(s))+Q'(\rho(t(s),x(s)))\rho_x(t(s),x(s))=0,
\end{align*}
so
\begin{align*}
\dot z(s)=0.
\end{align*}
If the characteristic starts from the footpoint $x_0$, with
\begin{align*}
t(0)=0,\qquad x(0)=x_0,\qquad z(0)=\rho_0(x_0),
\end{align*}
then integrating $\dot t(s)=1$ from $0$ to $s$ gives
\begin{align*}
t(s)-t(0)=\int_0^s 1\,dr.
\end{align*}
Since $\int_0^s 1\,dr=s$ and $t(0)=0$,
\begin{align*}
t(s)=s.
\end{align*}
Similarly, integrating $\dot z(s)=0$ from $0$ to $s$ gives
\begin{align*}
z(s)-z(0)=\int_0^s 0\,dr.
\end{align*}
Since $\int_0^s 0\,dr=0$ and $z(0)=\rho_0(x_0)$,
\begin{align*}
z(s)=\rho_0(x_0).
\end{align*}
The equation for $x$ is therefore
\begin{align*}
\dot x(s)=Q'(z(s))=Q'(\rho_0(x_0)).
\end{align*}
Here $x_0$ is the fixed label of the characteristic, so $Q'(\rho_0(x_0))$ is constant with respect to $s$. Integrating from $0$ to $s$ gives
\begin{align*}
x(s)-x(0)=\int_0^s Q'(\rho_0(x_0))\,dr.
\end{align*}
Since the integrand is constant in $r$,
\begin{align*}
\int_0^s Q'(\rho_0(x_0))\,dr=sQ'(\rho_0(x_0)).
\end{align*}
Using $x(0)=x_0$ gives
\begin{align*}
x(s)=x_0+sQ'(\rho_0(x_0)).
\end{align*}
Since $t=s$, the characteristic map is
\begin{align*}
x=x_0+tQ'(\rho_0(x_0)).
\end{align*}
Before this map loses local invertibility, the footpoint $x_0$ is recovered from $(t,x)$, and the transported value gives
\begin{align*}
\rho(t,x)=\rho_0(x_0).
\end{align*}
Differentiate the characteristic map with respect to $x_0$, holding $t$ fixed:
\begin{align*}
\frac{\partial x}{\partial x_0}=\frac{\partial}{\partial x_0}\left(x_0+tQ'(\rho_0(x_0))\right).
\end{align*}
By linearity of differentiation,
\begin{align*}
\frac{\partial x}{\partial x_0}=\frac{\partial x_0}{\partial x_0}+t\frac{d}{dx_0}Q'(\rho_0(x_0)).
\end{align*}
Since $\frac{\partial x_0}{\partial x_0}=1$,
\begin{align*}
\frac{\partial x}{\partial x_0}=1+t\frac{d}{dx_0}Q'(\rho_0(x_0)).
\end{align*}
By the chain rule applied to $x_0\mapsto Q'(\rho_0(x_0))$,
\begin{align*}
\frac{d}{dx_0}Q'(\rho_0(x_0))=Q''(\rho_0(x_0))\rho_0'(x_0).
\end{align*}
Hence
\begin{align*}
\frac{\partial x}{\partial x_0}=1+tQ''(\rho_0(x_0))\rho_0'(x_0).
\end{align*}
If $Q''<0$ on the relevant density interval and $\rho_0'(x_0)>0$, then
\begin{align*}
Q''(\rho_0(x_0))<0.
\end{align*}
Multiplying this strict negative number by the strict positive number $\rho_0'(x_0)$ gives
\begin{align*}
Q''(\rho_0(x_0))\rho_0'(x_0)<0.
\end{align*}
Thus the derivative
\begin{align*}
1+tQ''(\rho_0(x_0))\rho_0'(x_0)
\end{align*}
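decreases linearly as $t$ increases and vanishes at $t=-1/\bigl(Q''(\rho_0(x_0))\rho_0'(x_0)\bigr)>0$. The underlying mechanism is the monotonicity of the characteristic speed: if $\rho_2>\rho_1$, then the fundamental theorem of calculus gives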
\begin{align*}
Q'(\rho_2)-Q'(\rho_1)=\int_{\rho_1}^{\rho_2}Q''(r)\,dr.
\end{align*}
Since $Q''(r)<0$ for every $r\in(\rho_1,\rho_2)$, the integral over the positively oriented interval is negative:
\begin{align*}
\int_{\rho_1}^{\rho_2}Q''(r)\,dr<0.
\end{align*}
Therefore
\begin{align*}
Q'(\rho_2)-Q'(\rho_1)<0.
\end{align*}
So
\begin{align*}
Q'(\rho_2)<Q'(\rho_1).
\end{align*}
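Higher density therefore travels with smaller characteristic speed. Where $\rho_0'>0$, the denser and slower values start ahead of the sparser and faster ones, so characteristics from behind catch up; the classical characteristic formula breaks down once the footpoint map stops being locally invertible.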
[/example]
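A concrete instance may help fix the formulas of this example. The sketch below assumes the concave flux $Q(\rho)=\rho(1-\rho)$ and the increasing initial density $\rho_0(x)=\tfrac12+\tfrac14\tanh x$; both choices, together with the function names and the footpoint grid, are hypothetical and serve only to evaluate the characteristic map and its derivative.

```python
import math

def Q_prime(rho):
    """Characteristic speed for the assumed flux Q(rho) = rho * (1 - rho)."""
    return 1.0 - 2.0 * rho

def Q_second(rho):
    """Second derivative of the assumed flux; negative, so the flux is concave."""
    return -2.0

def rho0(x):
    """Assumed increasing initial density with values in (0.25, 0.75)."""
    return 0.5 + 0.25 * math.tanh(x)

def rho0_prime(x):
    """Derivative of rho0: 0.25 / cosh(x)^2 > 0."""
    return 0.25 / math.cosh(x) ** 2

def characteristic_position(t, x0):
    """x = x0 + t * Q'(rho0(x0))."""
    return x0 + t * Q_prime(rho0(x0))

def jacobian(t, x0):
    """dx/dx0 = 1 + t * Q''(rho0(x0)) * rho0'(x0)."""
    return 1.0 + t * Q_second(rho0(x0)) * rho0_prime(x0)

def crossing_time(x0):
    """Time at which the Jacobian vanishes along the characteristic labelled x0."""
    q = Q_second(rho0(x0)) * rho0_prime(x0)
    return -1.0 / q if q < 0 else math.inf

# Sampled footpoints; the steepest initial slope sits at x0 = 0.
footpoints = [i / 10.0 for i in range(-50, 51)]
first_crossing = min(crossing_time(x0) for x0 in footpoints)
print(first_crossing)
```

For these assumed data the steepest slope is $\rho_0'(0)=\tfrac14$, so the derivative $1-\tfrac12 t$ at the footpoint $x_0=0$ vanishes first, and the grid minimum recovers that time because $0$ is a grid point.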
This traffic model shows why the local theorem is useful even when global smoothness is not expected. It gives the correct solution up to the first crossing time, and it identifies the mechanism by which classical solvability fails. The same geometric pattern appears beyond conservation laws: in Hamilton-Jacobi theory, characteristics transport gradients of an action, and caustics form when the characteristic projection ceases to be a graph; in geometric optics, the analogous projection failure corresponds to focusing of rays.
## Gradient Blow-Up and the Limits of Classical Solvability
The characteristic method can produce a smooth lifted surface while the projected graph ceases to be a graph. The central question is therefore not whether the ODE flow survives, but whether the spatial projection remains invertible. In one space dimension this condition can be computed explicitly, especially for conservation laws, because the equation says that a conserved density is transported by a flux depending only on its value. This is the first setting where the local theory above meets a concrete obstruction: the characteristic formula may assign several transported values to the same spatial point.
[definition: Scalar Conservation Law]
Let $T>0$ and let $f\in C^2(I)$, where $I\subset \mathbb R$ is an interval. A scalar conservation law in one space dimension is an equation
\begin{align*}
u_t + (f(u))_x=0
\end{align*}
for an unknown $u:(0,T)\times \mathbb R\to I$.
[/definition]
For a $C^1$ solution, the chain rule rewrites the conservation law as $u_t+f'(u)u_x=0$. The equation then says that the value of $u$ is constant along curves whose speed is determined by that same value. This creates a precise tension: the transported values remain well defined along characteristics, but the map from initial footpoints to current spatial positions may fold and assign more than one value to the same point.
To use this observation as a local solution formula, one needs a theorem that gives both parts of the characteristic representation: how values are transported from their initial footpoints, and what uniqueness condition on the projected characteristic map is required before the formula actually defines a classical graph. The next result provides exactly that conditional representation, setting up the later blow-up test by making the footpoint map explicit.
[quotetheorem:6145]
[citeproof:6145]
The $C^1$ regularity of the solution is needed for the chain rule along curves, and the $C^1$ regularity of $u_0$ is needed for the characteristic map to have a derivative in the footpoint variable. The condition $f\in C^2(I)$ has two roles: $f'\in C^1(I)$ makes the quasilinear speed regular enough for the characteristic calculation, and $f''$ appears in the crossing test below. If $u_0$ has a jump, as in a Riemann problem, the same characteristic picture remains suggestive but the displayed formula is not a classical solution at $t=0$. If $f$ were only $C^1$, the formula could still be written along individual smooth characteristics, but the derivative test for crossing would no longer have the stated form. The theorem also does not assert that the footpoint remains unique for all time; it gives a conditional representation whose failure is detected by the characteristic map. The formula converts loss of classical solvability into a question about the derivative of that map. Differentiating with respect to $x_0$ gives
\begin{align*}
\frac{\partial x}{\partial x_0}=1+t f''(u_0(x_0))u_0'(x_0).
\end{align*}
When this quantity reaches zero, neighbouring characteristics meet and the spatial derivative of the classical solution blows up.
[illustration:pdei-burgers-characteristic-crossing]
The derivative formula supplies a precise crossing test. A negative value of $q(x_0)=f''(u_0(x_0))u_0'(x_0)$ means that the characteristic through $x_0$ is compressed relative to its neighbours, and the more negative $q(x_0)$ is, the sooner the derivative $1+tq(x_0)$ of the characteristic map vanishes. The next theorem turns this local compression into a lifespan bound for the classical characteristic representation.
[quotetheorem:6146]
[citeproof:6146]
The finiteness of $m$ is needed because otherwise $-1/m$ is not a positive time scale; initial slopes with arbitrarily large negative compression can make predicted crossing times accumulate at $0$. The theorem does not say that every failure of classical solvability is a shock, since characteristics may also leave the domain or coefficients may cease to be defined. It isolates the characteristic-crossing mechanism and explains why the sign of the initial slope matters only through its interaction with the convexity of the flux. For Burgers, $f(u)=u^2/2$, so $f''(u)=1$ and decreasing initial data create compression.
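The time scale $-1/m$ with $m=\inf q$ can be approximated by sampling footpoints. The sketch below is a rough numerical aid under stated assumptions: the helper `lifespan_estimate`, the Gaussian profile $u_0(x)=e^{-x^2}$, and the grid are all hypothetical. Because a sampled minimum can only overestimate the infimum, the returned value is an upper estimate of the crossing time.

```python
import math

def lifespan_estimate(f_second, u0, u0_prime, footpoints):
    """Estimate -1/m, where m is the minimum of q = f''(u0) * u0' over sampled footpoints.

    Returns math.inf when no sampled value of q is negative.
    """
    m = min(f_second(u0(x0)) * u0_prime(x0) for x0 in footpoints)
    return -1.0 / m if m < 0 else math.inf

# Burgers flux f(u) = u^2 / 2, so f'' = 1.
burgers_f_second = lambda u: 1.0

# Assumed Gaussian profile, decreasing for x > 0.
gaussian = lambda x: math.exp(-x * x)
gaussian_prime = lambda x: -2.0 * x * math.exp(-x * x)

grid = [i / 1000.0 for i in range(-3000, 3001)]
estimate = lifespan_estimate(burgers_f_second, gaussian, gaussian_prime, grid)

# For this profile the steepest negative slope is at x0 = 1/sqrt(2),
# giving the exact value exp(1/2) / sqrt(2).
exact = math.exp(0.5) / math.sqrt(2.0)
print(estimate, exact)

# Increasing data (assumed profile atan) produce no compression at all.
no_crossing = lifespan_estimate(
    burgers_f_second, math.atan, lambda x: 1.0 / (1.0 + x * x), grid
)
print(no_crossing)
```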
[example: Burgers Equation with Increasing Initial Data]
Let $u$ solve Burgers equation
\begin{align*}
u_t+u u_x=0
\end{align*}
with initial condition $u(0,x)=u_0(x)$, where $u_0\in C^1(\mathbb R)$ and $u_0'(x)\ge 0$ for every $x\in\mathbb R$. By *[Characteristic Formula for Smooth Scalar Conservation Laws](/theorems/6145)* applied to the Burgers flux $f(u)=u^2/2$, the characteristic labelled by $x_0$ satisfies
\begin{align*}
X(t,x_0)=x_0+t u_0(x_0)
\end{align*}
and
\begin{align*}
u(t,X(t,x_0))=u_0(x_0)
\end{align*}
as long as the footpoint map remains locally invertible.
Differentiate the characteristic map with respect to $x_0$, holding $t$ fixed. By linearity of differentiation,
\begin{align*}
X_{x_0}(t,x_0)=\frac{\partial}{\partial x_0}x_0+\frac{\partial}{\partial x_0}\bigl(tu_0(x_0)\bigr).
\end{align*}
Since $t$ is fixed in this differentiation,
\begin{align*}
\frac{\partial}{\partial x_0}x_0=1
\end{align*}
and
\begin{align*}
\frac{\partial}{\partial x_0}\bigl(tu_0(x_0)\bigr)=t u_0'(x_0).
\end{align*}
Therefore
\begin{align*}
X_{x_0}(t,x_0)=1+t u_0'(x_0).
\end{align*}
For $t\ge 0$ and $u_0'(x_0)\ge 0$,
\begin{align*}
t u_0'(x_0)\ge 0.
\end{align*}
Adding $1$ to both sides gives
\begin{align*}
1+t u_0'(x_0)\ge 1,
\end{align*}
so
\begin{align*}
X_{x_0}(t,x_0)\ge 1.
\end{align*}
Thus, for every $t\ge 0$, the footpoint map $x_0\mapsto X(t,x_0)$ is strictly increasing with derivative at least $1$, so it is injective and locally invertible at every footpoint.
To compute the spatial derivative, differentiate the transported identity
\begin{align*}
u(t,X(t,x_0))=u_0(x_0)
\end{align*}
with respect to $x_0$. Since $t$ is fixed, the chain rule gives
\begin{align*}
\frac{\partial}{\partial x_0}u(t,X(t,x_0))=u_x(t,X(t,x_0))X_{x_0}(t,x_0).
\end{align*}
The derivative of the right-hand side is
\begin{align*}
\frac{d}{dx_0}u_0(x_0)=u_0'(x_0).
\end{align*}
Hence
\begin{align*}
u_x(t,X(t,x_0))X_{x_0}(t,x_0)=u_0'(x_0).
\end{align*}
Substituting $X_{x_0}(t,x_0)=1+t u_0'(x_0)$ gives
\begin{align*}
u_x(t,X(t,x_0))\bigl(1+t u_0'(x_0)\bigr)=u_0'(x_0).
\end{align*}
Because $1+t u_0'(x_0)\ge 1$, this factor is nonzero, so division is valid:
\begin{align*}
u_x(t,X(t,x_0))=\frac{u_0'(x_0)}{1+t u_0'(x_0)}.
\end{align*}
The numerator is nonnegative and the denominator is positive, hence
\begin{align*}
0\le \frac{u_0'(x_0)}{1+t u_0'(x_0)}.
\end{align*}
Also, since $1+t u_0'(x_0)\ge 1$ and $u_0'(x_0)\ge 0$,
\begin{align*}
\frac{u_0'(x_0)}{1+t u_0'(x_0)}\le \frac{u_0'(x_0)}{1}=u_0'(x_0).
\end{align*}
Therefore
\begin{align*}
0\le u_x(t,X(t,x_0))\le u_0'(x_0).
\end{align*}
Increasing initial data therefore produce characteristics that separate or remain parallel, and along the classical characteristic solution the spatial slope never exceeds the initial slope at the same footpoint.
[/example]
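The slope bound can be tested numerically for a specific increasing profile. The following sketch assumes $u_0(x)=\arctan x$ and recovers the footpoint by bisection, which is legitimate here because $x_0\mapsto x_0+tu_0(x_0)$ is strictly increasing. The function names, the bracket, and the sample points are illustrative assumptions.

```python
import math

def u0(x):
    """Assumed increasing initial profile."""
    return math.atan(x)

def u0_prime(x):
    """Derivative of the assumed profile; always in (0, 1]."""
    return 1.0 / (1.0 + x * x)

def footpoint(t, x, iterations=200):
    """Solve x0 + t * u0(x0) = x by bisection on a bracket that contains the root."""
    lo = x - t * math.pi / 2.0 - 1.0
    hi = x + t * math.pi / 2.0 + 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid + t * u0(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def u(t, x):
    """Classical solution: the value carried from the footpoint."""
    return u0(footpoint(t, x))

def u_x(t, x):
    """Spatial slope u0'(x0) / (1 + t * u0'(x0)) at the recovered footpoint."""
    slope0 = u0_prime(footpoint(t, x))
    return slope0 / (1.0 + t * slope0)

samples = [(0.5, -1.0), (1.0, 0.3), (4.0, 2.0)]
for t, x in samples:
    x0 = footpoint(t, x)
    print(t, x, u_x(t, x), u0_prime(x0))
```

In each printed row the third entry lies between $0$ and the fourth, in line with the two-sided bound derived in the example.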
Increasing data show the rarefactive side of Burgers dynamics. To see shock formation, reverse the sign of the initial slope so that faster characteristics start behind slower ones.
[example: Burgers Equation with Decreasing Initial Data]
Let $u_0\in C^1(\mathbb R)$ satisfy
\begin{align*}
\inf_{x_0\in\mathbb R}u_0'(x_0)=-\alpha<0,
\end{align*}
so $\alpha>0$. For Burgers equation, the characteristic formula from *[Characteristic Formula for Smooth Scalar Conservation Laws](/theorems/6145)* gives
\begin{align*}
X(t,x_0)=x_0+t u_0(x_0)
\end{align*}
and
\begin{align*}
u(t,X(t,x_0))=u_0(x_0)
\end{align*}
as long as the footpoint map $x_0\mapsto X(t,x_0)$ remains locally invertible.
Differentiate the characteristic map with respect to $x_0$, holding $t$ fixed:
\begin{align*}
X_{x_0}(t,x_0)=\frac{\partial}{\partial x_0}\bigl(x_0+t u_0(x_0)\bigr).
\end{align*}
By linearity of differentiation,
\begin{align*}
\frac{\partial}{\partial x_0}\bigl(x_0+t u_0(x_0)\bigr)=\frac{\partial x_0}{\partial x_0}+\frac{\partial}{\partial x_0}\bigl(tu_0(x_0)\bigr).
\end{align*}
Since $t$ is fixed in this differentiation,
\begin{align*}
\frac{\partial x_0}{\partial x_0}=1
\end{align*}
and
\begin{align*}
\frac{\partial}{\partial x_0}\bigl(tu_0(x_0)\bigr)=t u_0'(x_0).
\end{align*}
Therefore
\begin{align*}
X_{x_0}(t,x_0)=1+t u_0'(x_0).
\end{align*}
Since $\inf_{x_0\in\mathbb R}u_0'(x_0)=-\alpha$, we have
\begin{align*}
u_0'(x_0)\ge -\alpha
\end{align*}
for every $x_0$. For $t\ge 0$, multiplying by $t$ preserves the inequality:
\begin{align*}
t u_0'(x_0)\ge -\alpha t.
\end{align*}
Adding $1$ gives
\begin{align*}
X_{x_0}(t,x_0)=1+t u_0'(x_0)\ge 1-\alpha t.
\end{align*}
The lower bound vanishes when
\begin{align*}
1-\alpha t=0,
\end{align*}
which is equivalent to
\begin{align*}
t=\frac{1}{\alpha}.
\end{align*}
Thus the predicted first crossing time is
\begin{align*}
T_*=\frac{1}{\alpha}.
\end{align*}
This agrees with *[Shock Formation Criterion for Smooth Scalar Conservation Laws](/theorems/6146)*. For Burgers, the flux is
\begin{align*}
f(u)=\frac{u^2}{2}.
\end{align*}
Differentiating once gives
\begin{align*}
f'(u)=u,
\end{align*}
and differentiating again gives
\begin{align*}
f''(u)=1.
\end{align*}
Recall that the compression quantity is
\begin{align*}
q(x_0)=f''(u_0(x_0))u_0'(x_0).
\end{align*}
Substituting $f''(u_0(x_0))=1$ gives
\begin{align*}
q(x_0)=1\cdot u_0'(x_0)=u_0'(x_0).
\end{align*}
Therefore
\begin{align*}
m=\inf_{x_0\in\mathbb R}q(x_0)=\inf_{x_0\in\mathbb R}u_0'(x_0)=-\alpha.
\end{align*}
Substituting $m=-\alpha$ into $T_*=-1/m$ gives
\begin{align*}
T_*=-\frac{1}{-\alpha}=\frac{1}{\alpha}.
\end{align*}
For $0\le t<T_*=1/\alpha$, the inequality above gives
\begin{align*}
X_{x_0}(t,x_0)=1+t u_0'(x_0)\ge 1-\alpha t>0.
\end{align*}
Hence the characteristic map is locally invertible throughout this pre-crossing range. In this range, differentiate the transported identity
\begin{align*}
u(t,X(t,x_0))=u_0(x_0)
\end{align*}
with respect to $x_0$. Since $t$ is fixed, the chain rule gives
\begin{align*}
\frac{\partial}{\partial x_0}u(t,X(t,x_0))=u_x(t,X(t,x_0))X_{x_0}(t,x_0).
\end{align*}
The derivative of the right-hand side is
\begin{align*}
\frac{d}{dx_0}u_0(x_0)=u_0'(x_0).
\end{align*}
Therefore
\begin{align*}
u_x(t,X(t,x_0))X_{x_0}(t,x_0)=u_0'(x_0).
\end{align*}
Substituting $X_{x_0}(t,x_0)=1+t u_0'(x_0)$ gives
\begin{align*}
u_x(t,X(t,x_0))\bigl(1+t u_0'(x_0)\bigr)=u_0'(x_0).
\end{align*}
Because $1+t u_0'(x_0)>0$ for $0\le t<1/\alpha$, division by this factor is valid:
\begin{align*}
u_x(t,X(t,x_0))=\frac{u_0'(x_0)}{1+t u_0'(x_0)}.
\end{align*}
If the infimum is attained at a footpoint $x_0^*$, then
\begin{align*}
u_0'(x_0^*)=-\alpha.
\end{align*}
At that footpoint, the derivative formula gives
\begin{align*}
u_x(t,X(t,x_0^*))=\frac{u_0'(x_0^*)}{1+t u_0'(x_0^*)}.
\end{align*}
Substituting $u_0'(x_0^*)=-\alpha$ gives
\begin{align*}
u_x(t,X(t,x_0^*))=\frac{-\alpha}{1-\alpha t}.
\end{align*}
As $t\uparrow 1/\alpha$, the denominator satisfies
\begin{align*}
1-\alpha t\downarrow 0,
\end{align*}
while the numerator stays equal to $-\alpha<0$. Hence
\begin{align*}
u_x(t,X(t,x_0^*))\to -\infty.
\end{align*}
If the infimum is not attained, then for every $\varepsilon>0$ there is a footpoint $x_\varepsilon$ such that
\begin{align*}
u_0'(x_\varepsilon)<-\alpha+\varepsilon.
\end{align*}
For this footpoint,
\begin{align*}
1+t u_0'(x_\varepsilon)<1+t(-\alpha+\varepsilon).
\end{align*}
Expanding the right-hand side gives
\begin{align*}
1+t(-\alpha+\varepsilon)=1-\alpha t+\varepsilon t.
\end{align*}
Given any $\delta>0$, choose $t<1/\alpha$ close enough to $1/\alpha$ that
\begin{align*}
1-\alpha t<\frac{\delta}{2}.
\end{align*}
Then choose $\varepsilon>0$ small enough that
\begin{align*}
\varepsilon t<\frac{\delta}{2}.
\end{align*}
Combining the two inequalities gives
\begin{align*}
1-\alpha t+\varepsilon t<\delta.
\end{align*}
Therefore
\begin{align*}
1+t u_0'(x_\varepsilon)<\delta.
\end{align*}
Thus even without an attained minimizer, the denominators in the derivative formula can be made arbitrarily small along footpoints and times approaching $T_*$. Decreasing initial data compress characteristics: the transported value remains $u(t,X(t,x_0))=u_0(x_0)$ before crossing, but the classical characteristic formula breaks down when the footpoint map loses spatial invertibility, and the spatial derivative becomes unbounded in the compressive regime.
[/example]
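To watch the blow-up in numbers, one may take the assumed profile $u_0(x)=-\tanh x$, for which $\inf u_0'=-1$ is attained at $x_0^*=0$, so $\alpha=1$ and $T_*=1$. The sketch below evaluates the slope formula along that footpoint and then shows that the footpoint map is no longer monotone after $T_*$. All names and sample times are illustrative assumptions.

```python
import math

def u0(x):
    """Assumed decreasing initial profile."""
    return -math.tanh(x)

def u0_prime(x):
    """Derivative -1 / cosh(x)^2, with minimum value -1 at x = 0."""
    return -1.0 / math.cosh(x) ** 2

def X(t, x0):
    """Characteristic map X(t, x0) = x0 + t * u0(x0)."""
    return x0 + t * u0(x0)

def jacobian(t, x0):
    """X_{x0}(t, x0) = 1 + t * u0'(x0)."""
    return 1.0 + t * u0_prime(x0)

def slope(t, x0):
    """u_x at (t, X(t, x0)); only meaningful while the Jacobian is positive."""
    j = jacobian(t, x0)
    if j <= 0.0:
        raise ValueError("classical formula is not valid: Jacobian is not positive")
    return u0_prime(x0) / j

# Slopes along the footpoint x0 = 0 as t approaches T* = 1 from below.
for t in [0.0, 0.9, 0.99, 0.999]:
    print(t, slope(t, 0.0))

# At t = 2 > T* the larger footpoint 0.5 lands to the left of the footpoint 0.
print(X(2.0, 0.0), X(2.0, 0.5))
```

The printed slopes follow $-1/(1-t)$, and the final line shows the order of two footpoints reversed, which is the fold of the characteristic projection.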
The shock criterion is a classical result, not yet a weak-solution theory. After crossing, conservation laws require a new interpretation because different characteristic values try to occupy the same point.
[remark: Classical Breakdown Is Not Physical Nonexistence]
Gradient blow-up means that the $C^1$ solution constructed by characteristics has reached the boundary of its validity. It does not mean that the conservation law has no meaningful continuation. Chapters 5 and 6 replace classical solutions by weak solutions and add entropy conditions to select the physically relevant continuation after shocks form.
[/remark]
A discontinuity in the initial data can produce the opposite behaviour: rather than forming a shock from compression, characteristics may fan out from a jump. This is not a classical solution at the initial time, but it is the prototype for the [entropy solution](/page/Entropy%20Solution) of a Riemann problem.
[example: Rarefaction Fan from Discontinuous Burgers Data]
Consider Burgers equation
\begin{align*}
u_t+u u_x=0
\end{align*}
with Riemann data $u(0,x)=u_L$ for $x<0$ and $u(0,x)=u_R$ for $x>0$, where $u_L<u_R$. In a region where $u=c$ is constant, the coefficient of $u_x$ in Burgers equation is $c$, so the characteristic speed is $c$. Thus the left constant state emits characteristics with speed $u_L$, giving the boundary ray
\begin{align*}
x=u_Lt.
\end{align*}
The right constant state emits characteristics with speed $u_R$, giving the boundary ray
\begin{align*}
x=u_Rt.
\end{align*}
Since $u_L<u_R$, for every $t>0$,
\begin{align*}
u_Rt-u_Lt=t(u_R-u_L)>0.
\end{align*}
Hence the two boundary rays enclose a nonempty sector for every positive time.
Define a self-similar profile for $t>0$ by
\begin{align*}
u(t,x)=\begin{cases}u_L, & x/t\le u_L,\\ x/t, & u_L<x/t<u_R,\\ u_R, & x/t\ge u_R.\end{cases}
\end{align*}
In the left constant region, differentiating the constant value $u_L$ gives
\begin{align*}
u_t(t,x)=0.
\end{align*}
Also,
\begin{align*}
u_x(t,x)=0.
\end{align*}
Substituting into the Burgers expression gives
\begin{align*}
u_t+u u_x=0+u_L\cdot 0=0.
\end{align*}
In the right constant region, differentiating the constant value $u_R$ gives
\begin{align*}
u_t(t,x)=0.
\end{align*}
Also,
\begin{align*}
u_x(t,x)=0.
\end{align*}
Therefore
\begin{align*}
u_t+u u_x=0+u_R\cdot 0=0.
\end{align*}
In the fan region, $u(t,x)=x/t$ with $t>0$. Since $x/t=xt^{-1}$, differentiating with respect to $t$ while holding $x$ fixed gives
\begin{align*}
u_t(t,x)=\frac{\partial}{\partial t}(xt^{-1})=-xt^{-2}=-\frac{x}{t^2}.
\end{align*}
Differentiating with respect to $x$ while holding $t$ fixed gives
\begin{align*}
u_x(t,x)=\frac{\partial}{\partial x}\left(\frac{x}{t}\right)=\frac{1}{t}.
\end{align*}
Substituting these derivatives and $u(t,x)=x/t$ into the equation gives
\begin{align*}
u_t+u u_x=-\frac{x}{t^2}+\frac{x}{t}\cdot\frac{1}{t}.
\end{align*}
Multiplying the two fractions in the second term gives
\begin{align*}
\frac{x}{t}\cdot\frac{1}{t}=\frac{x}{t^2}.
\end{align*}
Hence
\begin{align*}
u_t+u u_x=-\frac{x}{t^2}+\frac{x}{t^2}=0.
\end{align*}
Thus the proposed profile satisfies Burgers equation at every point away from the two boundary rays.
The formula also matches continuously on the fan boundaries. On the left boundary $x=u_Lt$, with $t>0$,
\begin{align*}
\frac{x}{t}=\frac{u_Lt}{t}=u_L.
\end{align*}
On the right boundary $x=u_Rt$, with $t>0$,
\begin{align*}
\frac{x}{t}=\frac{u_Rt}{t}=u_R.
\end{align*}
Inside the fan, a value $\xi$ with $u_L<\xi<u_R$ lies on the ray
\begin{align*}
x=\xi t.
\end{align*}
Along that ray,
\begin{align*}
u(t,\xi t)=\frac{\xi t}{t}=\xi.
\end{align*}
The characteristic speed at this value is $\xi$, and the ray $x=\xi t$ has slope $x/t=\xi$ in the $(t,x)$-plane. Thus the increasing jump is filled by a continuous family of characteristic speeds between $u_L$ and $u_R$, forming a rarefaction fan.
[/example]
[illustration:pdei-burgers-rarefaction-fan]
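The rarefaction profile is simple enough to evaluate directly. The sketch below implements the three-piece formula and checks the Burgers residual inside the fan with central differences; the function names, the step size, and the sample states $u_L=0$, $u_R=1$ are assumptions made for illustration.

```python
def rarefaction(t, x, u_left, u_right):
    """Self-similar rarefaction profile for Burgers equation with u_left < u_right."""
    if t <= 0.0:
        raise ValueError("the profile is defined only for t > 0")
    if u_left >= u_right:
        raise ValueError("a rarefaction fan requires u_left < u_right")
    xi = x / t
    if xi <= u_left:
        return u_left
    if xi >= u_right:
        return u_right
    return xi

def burgers_residual(t, x, u_left, u_right, h=1e-5):
    """Central-difference approximation of u_t + u * u_x away from the boundary rays."""
    u = rarefaction(t, x, u_left, u_right)
    u_t = (rarefaction(t + h, x, u_left, u_right)
           - rarefaction(t - h, x, u_left, u_right)) / (2.0 * h)
    u_x = (rarefaction(t, x + h, u_left, u_right)
           - rarefaction(t, x - h, u_left, u_right)) / (2.0 * h)
    return u_t + u * u_x

# Sample states u_L = 0, u_R = 1 at time t = 2: left state, fan, right state.
print(rarefaction(2.0, -1.0, 0.0, 1.0))
print(rarefaction(2.0, 1.0, 0.0, 1.0))
print(rarefaction(2.0, 3.0, 0.0, 1.0))
print(burgers_residual(2.0, 1.0, 0.0, 1.0))
```

The residual check is meaningful only at points whose difference stencil stays inside one of the three regions, since the profile is continuous but not differentiable across the boundary rays.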
The chapter ends with a sharp distinction. Quasilinear first-order equations have a robust local classical theory because their graphs are generated by ODE flows. Their global classical theory fails for structural reasons: the same characteristic mechanism that constructs the solution can destroy the projection needed to view the lifted flow as a single-valued differentiable function.
Chapter 4 recasts this mechanism in Hamilton-Jacobi form, where the unknown behaves like an action or phase and variational structure enters the picture.
# 4. Hamilton-Jacobi Equations
Hamilton-Jacobi equations are nonlinear first-order PDEs in which the unknown function acts as an action, phase, or value function. After the linear and quasilinear characteristic methods of Chapters 2 and 3, the new feature is that the characteristic curves are no longer prescribed in physical space alone: they are generated by a Hamiltonian system in phase space. This chapter develops the characteristic representation for smooth solutions, explains complete integrals, and then turns to the eikonal equation and the Hopf-Lax formula as model descriptions of wavefronts and variational propagation.
## From Linear Characteristics to Hamilton-Jacobi Equations
The linear method of characteristics solves a PDE by transporting values along known curves. For a nonlinear first-order equation, the velocity of propagation depends on the gradient of the solution itself, so the central question becomes: how can we track both the position and the gradient at the same time?
A Hamilton-Jacobi equation is the standard form of this problem. The unknown $u$ is scalar, but its gradient determines the dynamics.
[definition: Hamilton-Jacobi Equation]
Let $U \subseteq \mathbb R^n$ be open, let $T>0$, and let $H \in C^1(U \times \mathbb R^n)$. A first-order Hamilton-Jacobi equation for $u:(0,T)\times U \to \mathbb R$ is an equation of the form
\begin{align*}
u_t(t,x)+H(x,\nabla u(t,x))=0.
\end{align*}
The associated Cauchy problem prescribes
\begin{align*}
u(0,x)=g(x), \qquad x \in U,
\end{align*}
for a given function $g:U\to \mathbb R$.
[/definition]
The Hamiltonian $H$ encodes how the slope $\nabla u$ affects propagation. When $H$ is independent of $x$, the medium is homogeneous; when $H$ depends on $x$, the geometry or material properties vary with position.
[example: Free Particle Hamiltonian]
Let
\begin{align*}
H(p)=\frac{|p|^2}{2}
\end{align*}
on $\mathbb R^n$, and consider
\begin{align*}
u_t+\frac{1}{2}|\nabla u|^2=0, \qquad u(0,x)=g(x).
\end{align*}
Writing $p=(p_1,\dots,p_n)$, we have
\begin{align*}
H(p)=\frac{1}{2}\sum_{j=1}^n p_j^2.
\end{align*}
For each $i=1,\dots,n$,
\begin{align*}
\partial_{p_i}H(p)=\partial_{p_i}\left(\frac{1}{2}\sum_{j=1}^n p_j^2\right).
\end{align*}
By linearity of differentiation,
\begin{align*}
\partial_{p_i}H(p)=\frac{1}{2}\sum_{j=1}^n \partial_{p_i}(p_j^2).
\end{align*}
If $j\neq i$, then $p_j^2$ is independent of $p_i$, so $\partial_{p_i}(p_j^2)=0$. The only remaining term is
\begin{align*}
\partial_{p_i}(p_i^2)=2p_i.
\end{align*}
Therefore
\begin{align*}
\partial_{p_i}H(p)=\frac{1}{2}\cdot 2p_i=p_i.
\end{align*}
Since this holds in every component,
\begin{align*}
H_p(p)=(p_1,\dots,p_n)=p.
\end{align*}
Since $H$ has no $x$-dependence, every $x_i$-derivative is zero, so
\begin{align*}
H_x(x,p)=0.
\end{align*}
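Start a characteristic from the initial point $y$, using the Hamiltonian characteristic system $\dot x=H_p$, $\dot p=-H_x$, $\dot z=p\cdot H_p-H$ with initial conditions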
\begin{align*}
x(0)=y, \qquad p(0)=\nabla g(y), \qquad z(0)=g(y).
\end{align*}
The momentum equation is
\begin{align*}
\dot p(t)=-H_x(x(t),p(t))=0.
\end{align*}
Thus every component satisfies $\dot p_i(t)=0$, and hence
\begin{align*}
p_i(t)=p_i(0)=\partial_{x_i}g(y), \qquad i=1,\dots,n.
\end{align*}
Equivalently,
\begin{align*}
p(t)=\nabla g(y).
\end{align*}
The position equation becomes
\begin{align*}
\dot x(t)=H_p(p(t))=p(t)=\nabla g(y).
\end{align*}
For each component,
\begin{align*}
x_i(t)-x_i(0)=\int_0^t \partial_{x_i}g(y)\,ds.
\end{align*}
Since $\partial_{x_i}g(y)$ is constant in the integration variable $s$,
\begin{align*}
x_i(t)-x_i(0)=t\,\partial_{x_i}g(y).
\end{align*}
Using $x(0)=y$, this gives
\begin{align*}
x(t)=y+t\nabla g(y).
\end{align*}
For the value variable, the characteristic system gives
\begin{align*}
\dot z(t)=p(t)\cdot H_p(p(t))-H(p(t)).
\end{align*}
Using $H_p(p(t))=p(t)$ and $H(p(t))=|p(t)|^2/2$, we obtain
\begin{align*}
\dot z(t)=p(t)\cdot p(t)-\frac{|p(t)|^2}{2}.
\end{align*}
Since $p(t)\cdot p(t)=|p(t)|^2$,
\begin{align*}
\dot z(t)=|p(t)|^2-\frac{|p(t)|^2}{2}.
\end{align*}
Hence
\begin{align*}
\dot z(t)=\frac{|p(t)|^2}{2}.
\end{align*}
Substituting the constant momentum $p(t)=\nabla g(y)$ gives
\begin{align*}
\dot z(t)=\frac{|\nabla g(y)|^2}{2}.
\end{align*}
Integrating from $0$ to $t$,
\begin{align*}
z(t)-z(0)=\int_0^t \frac{|\nabla g(y)|^2}{2}\,ds.
\end{align*}
The integrand is constant in $s$, so
\begin{align*}
z(t)-z(0)=\frac{t}{2}|\nabla g(y)|^2.
\end{align*}
Using $z(0)=g(y)$, we get
\begin{align*}
z(t)=g(y)+\frac{t}{2}|\nabla g(y)|^2.
\end{align*}
Thus, as long as the characteristic map
\begin{align*}
y\mapsto y+t\nabla g(y)
\end{align*}
is invertible, the smooth solution is represented along these characteristics by
\begin{align*}
u(t,y+t\nabla g(y))=g(y)+\frac{t}{2}|\nabla g(y)|^2.
\end{align*}
The formula says that the free-particle solution transports the initial slope $\nabla g(y)$ at constant velocity and accumulates the quadratic action along that straight path.
[/example]
This example already shows the main obstruction. The formula is meaningful while the map $y\mapsto y+t\nabla g(y)$ can be inverted; if two different initial points arrive at the same $x$, a single smooth value of $\nabla u$ can no longer represent both momenta.
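The characteristic formulas of the free-particle example can be checked numerically. The following is a minimal sketch in one space dimension, assuming the test datum $g(y)=y^2/2$ (not taken from the notes), for which the map $x=(1+t)y$ can be inverted by hand to give $u(t,x)=x^2/(2(1+t))$. The function names are illustrative only.

```python
def characteristic_endpoint(g, dg, y, t):
    """Follow the free-particle characteristic from y for time t (one dimension)."""
    p = dg(y)                      # momentum stays equal to g'(y)
    x = y + t * p                  # x(t) = y + t g'(y)
    z = g(y) + 0.5 * t * p * p     # z(t) = g(y) + (t/2) |g'(y)|^2
    return x, p, z

# Assumed test datum: g(y) = y^2 / 2, so g'(y) = y.
g = lambda y: 0.5 * y * y
dg = lambda y: y

def u_exact(t, x):
    # Closed form for this particular g, obtained by inverting x = (1 + t) y.
    return x * x / (2.0 * (1.0 + t))

t, y = 0.7, 1.3
x, p, z = characteristic_endpoint(g, dg, y, t)
print(x, p, z, u_exact(t, x))
```

The value carried along the characteristic agrees with the closed form at the arrival point, and the transported momentum agrees with $u_x(t,x)=x/(1+t)$.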
[definition: Complete Integral]
Let $U \subseteq \mathbb R^n$ be open, let $A \subseteq \mathbb R^n$ be open, and let
\begin{align*}
F:U\times \mathbb R\times \mathbb R^n\to \mathbb R
\end{align*}
be a smooth map defining a first-order PDE $F(x,u(x),\nabla u(x))=0$ for functions $u:U\to \mathbb R$. A complete integral is a smooth map
\begin{align*}
V:U\times A\to \mathbb R, \qquad (x,a)\mapsto V(x;a),
\end{align*}
such that, for every fixed $a\in A$, the function $x\mapsto V(x;a)$ solves
\begin{align*}
F(x,V(x;a),\nabla_x V(x;a))=0.
\end{align*}
[/definition]
The parameter set $A$ records the $n$ degrees of freedom in the family. In applications, a nondegeneracy condition in the parameter variables is added so that the stationarity equations defining an envelope can be solved locally for $a=a(x)$. Complete integrals are useful because an envelope of members of the family can produce a solution satisfying prescribed data. They are the nonlinear analogue of having enough independent characteristic data to describe the full solution.
[remark: Envelopes and Singular Solutions]
Given a complete integral $V(x;a)$, an envelope is obtained by imposing stationarity in the parameters:
\begin{align*}
\frac{\partial V}{\partial a_i}(x;a)=0, \qquad i=1,\dots,n.
\end{align*}
Solving these equations for $a=a(x)$ and substituting back gives $u(x)=V(x;a(x))$. The stationarity condition removes the chain-rule term, so $\nabla u(x)=\nabla_x V(x;a(x))$, and $u$ may be a solution not obtained by fixing a constant parameter. In geometric language, the envelope is a surface that is tangent at each of its points to some member of the family of solution graphs.
[/remark]
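A small numerical illustration of the envelope construction may help. The sketch below assumes the one-dimensional equation $(u')^2+u^2-1=0$ with the one-parameter family $V(x;a)=\sin(x-a)$; this example is an assumption chosen for illustration and does not come from the notes. Stationarity in $a$ selects $a(x)=x-\pi/2$, and the resulting envelope is the constant function $u\equiv 1$, which solves the equation but is not a member of the family.

```python
import math

def F(u, du):
    # Residual of the illustrative equation (u')^2 + u^2 - 1 = 0.
    return du * du + u * u - 1.0

def V(x, a):
    return math.sin(x - a)

def V_x(x, a):
    return math.cos(x - a)

def V_a(x, a):
    return -math.cos(x - a)

def envelope_parameter(x):
    # One branch of the solutions of V_a(x; a) = 0.
    return x - math.pi / 2.0

xs = [-2.0, -0.5, 0.0, 1.0, 3.0]

# Every member of the family solves the equation.
family_residuals = [F(V(x, a), V_x(x, a)) for x in xs for a in (0.0, 0.4, 2.0)]

# The envelope u(x) = V(x; a(x)) is constant, so its derivative is zero.
envelope_values = [V(x, envelope_parameter(x)) for x in xs]
envelope_residuals = [F(v, 0.0) for v in envelope_values]
stationarity = [V_a(x, envelope_parameter(x)) for x in xs]
```

All three residual lists vanish up to rounding, and every envelope value equals $1$ up to rounding.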
The complete-integral viewpoint is classical and geometric. For local PDE theory, the next problem is more operational: given initial data, can we evolve the position, momentum, and value by ordinary differential equations and then read off the solution?
## Characteristics as Hamiltonian Flow in Phase Space
For a smooth solution, differentiating the PDE should produce an equation for the gradient. The question is how to choose curves so that this differentiated system closes and gives a representation of $u$ itself.
Let $p(t)=\nabla u(t,x(t))$. If $x(t)$ is chosen with velocity $H_p(x(t),p(t))$, then the evolution of $p(t)$ is governed by the $x$-derivative of the Hamiltonian. This is the phase-space form of characteristics.
[definition: Hamiltonian Characteristic System]
Let $U\subseteq \mathbb R^n$ be open, let
\begin{align*}
H:U\times \mathbb R^n\to \mathbb R
\end{align*}
belong to $C^2(U\times \mathbb R^n)$, and let $I\subseteq \mathbb R$ be a time interval. The Hamiltonian characteristic system associated to
\begin{align*}
u_t+H(x,\nabla u)=0
\end{align*}
is the system for curves
\begin{align*}
x:I\to U, \qquad p:I\to \mathbb R^n, \qquad z:I\to \mathbb R
\end{align*}
given by
\begin{align*}
\dot{x}(t)=H_p(x(t),p(t)),
\end{align*}
\begin{align*}
\dot{p}(t)=-H_x(x(t),p(t)),
\end{align*}
\begin{align*}
\dot{z}(t)=p(t)\cdot H_p(x(t),p(t))-H(x(t),p(t)).
\end{align*}
[/definition]
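The characteristic system is an ordinary differential system, so it can be integrated with any standard ODE scheme. The following sketch uses a classical fourth-order Runge-Kutta step in one space dimension. The function name `rk4_characteristic`, the step count, and the oscillator Hamiltonian $H=p^2/2+x^2/2$ used as an $x$-dependent test are assumptions made for illustration.

```python
import math

def rk4_characteristic(H, H_x, H_p, x0, p0, z0, t_end, steps=1000):
    """Integrate x' = H_p, p' = -H_x, z' = p*H_p - H (one space dimension)."""

    def rhs(state):
        x, p, _z = state
        velocity = H_p(x, p)
        return (velocity, -H_x(x, p), p * velocity - H(x, p))

    def shifted(state, slope, h):
        return tuple(s + h * k for s, k in zip(state, slope))

    h = t_end / steps
    state = (x0, p0, z0)
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(shifted(state, k1, 0.5 * h))
        k3 = rhs(shifted(state, k2, 0.5 * h))
        k4 = rhs(shifted(state, k3, h))
        state = tuple(
            s + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state

# Free particle H = p^2/2: expect x = x0 + t p0, p = p0, z = z0 + t p0^2 / 2.
x_free, p_free, z_free = rk4_characteristic(
    lambda x, p: 0.5 * p * p, lambda x, p: 0.0, lambda x, p: p,
    x0=1.0, p0=2.0, z0=0.3, t_end=1.5,
)

# Illustrative x-dependent case (an assumption): H = p^2/2 + x^2/2.
# Exact curves from (1, 0, 0): x = cos t, p = -sin t, z = -sin(2t)/4.
x_osc, p_osc, z_osc = rk4_characteristic(
    lambda x, p: 0.5 * (p * p + x * x), lambda x, p: x, lambda x, p: p,
    x0=1.0, p0=0.0, z0=0.0, t_end=1.0,
)
```

In the free-particle case the momentum is constant and the position is linear in time, which matches the explicit formulas of the earlier example. In the oscillator case the momentum changes because $H_x\neq 0$.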
The first two equations are Hamilton's equations, and the third records the action accumulated along the projected curve. We need a representation theorem to justify that this ODE system is not merely formal: starting from $(y,\nabla g(y),g(y))$, it must reproduce both the gradient and the value of any smooth solution.
[quotetheorem:3517]
[citeproof:3517]
The hypotheses are not cosmetic. If $u$ is not twice differentiable, the differentiated equation for $\nabla u$ may not exist; for instance the quadratic Hopf-Lax solution can develop a corner once characteristics cross. If $g$ is not smooth, the initial momentum $\nabla g(y)$ may not be defined, so the characteristic system cannot even be initialized at every point. The condition that the characteristic remains inside the domain of the smooth solution is also essential: an ODE trajectory may leave $U$ or reach a caustic where $y\mapsto x(t;y)$ is no longer invertible, and then the theorem no longer produces a single classical graph over physical space.
This theorem explains why smooth Hamilton-Jacobi theory is tied to invertibility of the characteristic projection. The phase-space flow may remain smooth even when its projection to physical space folds, so we need a name for the physical-space failure point where the classical graph description breaks down.
[definition: Caustic]
Let $Y\subseteq \mathbb R^n$ be an open parameter set, let $I\subseteq \mathbb R$ be a time interval, and let
\begin{align*}
X:I\times Y\to U, \qquad (t,y)\mapsto X(t;y),
\end{align*}
be a smooth family of characteristic positions in an [open set](/page/Open%20Set) $U\subseteq \mathbb R^n$. A caustic is a point $(t,x)\in I\times U$ with $x=X(t;y)$ for some $y\in Y$ such that the map
\begin{align*}
X_t:Y\to U, \qquad X_t(y)=X(t;y),
\end{align*}
fails to be locally invertible at $y$.
[/definition]
At a caustic, different characteristic branches meet in physical space. The phase-space trajectories can still be distinct, but a classical solution cannot assign several different gradients to the same point.
[example: Crossing Characteristics]
In one space dimension, let
\begin{align*}
H(p)=\frac{p^2}{2}.
\end{align*}
Then
\begin{align*}
H_p(p)=\frac{d}{dp}\left(\frac{p^2}{2}\right)=p,
\end{align*}
and $H$ has no $x$-dependence, so
\begin{align*}
H_x(x,p)=0.
\end{align*}
The momentum equation in the Hamiltonian characteristic system is therefore
\begin{align*}
\dot p(t)=-H_x(x(t),p(t))=0,
\end{align*}
with initial condition
\begin{align*}
p(0)=g'(y).
\end{align*}
Since $\dot p(t)=0$, the momentum is constant:
\begin{align*}
p(t)=p(0)=g'(y).
\end{align*}
The position equation becomes
\begin{align*}
\dot x(t)=H_p(p(t))=p(t)=g'(y),
\end{align*}
with initial condition
\begin{align*}
x(0)=y.
\end{align*}
Integrating from $0$ to $t$ gives
\begin{align*}
x(t;y)-x(0;y)=\int_0^t g'(y)\,ds.
\end{align*}
The quantity $g'(y)$ is independent of the integration variable $s$, so
\begin{align*}
x(t;y)-y=tg'(y).
\end{align*}
Hence
\begin{align*}
x(t;y)=y+tg'(y).
\end{align*}
Differentiate the characteristic projection $y\mapsto x(t;y)$ with respect to the initial point:
\begin{align*}
\frac{\partial x}{\partial y}(t;y)=\frac{\partial}{\partial y}\left(y+tg'(y)\right).
\end{align*}
Since $t$ is fixed in this derivative,
\begin{align*}
\frac{\partial x}{\partial y}(t;y)=1+tg''(y).
\end{align*}
If $g''(y_0)<0$, define
\begin{align*}
t_0=-\frac{1}{g''(y_0)}.
\end{align*}
Because $g''(y_0)<0$, this gives $t_0>0$. At this time,
\begin{align*}
\frac{\partial x}{\partial y}(t_0;y_0)=1+t_0g''(y_0).
\end{align*}
Substituting the definition of $t_0$,
\begin{align*}
\frac{\partial x}{\partial y}(t_0;y_0)=1+\left(-\frac{1}{g''(y_0)}\right)g''(y_0).
\end{align*}
Since $g''(y_0)\neq 0$,
\begin{align*}
\frac{\partial x}{\partial y}(t_0;y_0)=1-1=0.
\end{align*}
Thus the Jacobian of the projection $y\mapsto x(t;y)$ vanishes at $(t_0,y_0)$, so the characteristic projection is not locally invertible there.
The phase-space curves may still remain distinct. For two initial points $y_1$ and $y_2$, the corresponding momenta are
\begin{align*}
p(t;y_1)=g'(y_1)
\end{align*}
and
\begin{align*}
p(t;y_2)=g'(y_2).
\end{align*}
Their projected positions are
\begin{align*}
x(t;y_1)=y_1+tg'(y_1)
\end{align*}
and
\begin{align*}
x(t;y_2)=y_2+tg'(y_2).
\end{align*}
If the projected positions agree, then
\begin{align*}
y_1+tg'(y_1)=y_2+tg'(y_2).
\end{align*}
If at the same time the momenta differ,
\begin{align*}
g'(y_1)\neq g'(y_2),
\end{align*}
then the same spatial point is reached with two different momenta. Since a smooth function $u(t,x)$ has only one spatial derivative $u_x(t,x)$ at a fixed point, it cannot represent both momenta there. This is the Hamilton-Jacobi analogue of shock formation for conservation laws.
[/example]
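The crossing computation can be made concrete for a specific datum. The sketch below assumes $g(y)=\cos y$, which is not taken from the notes, so that $g''(y)=-\cos y$ and the earliest vanishing of $1+tg''(y)$ occurs at $t_0=1$, $y_0=0$. It then locates two distinct starting points that reach $x=0$ at the later time $t=2$ with different momenta. The helper names are illustrative.

```python
import math

def jacobian(t, y, d2g):
    # d x / d y for the projection x(t; y) = y + t g'(y).
    return 1.0 + t * d2g(y)

def first_caustic_time(d2g, ys):
    """Smallest t0 = -1/g''(y) over the sample points with g''(y) < 0."""
    best_t, best_y = math.inf, None
    for y in ys:
        c = d2g(y)
        if c < 0.0 and -1.0 / c < best_t:
            best_t, best_y = -1.0 / c, y
    return best_t, best_y

def bisect(f, lo, hi, iters=80):
    # Plain bisection; assumes f(lo) and f(hi) have opposite signs.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Assumed datum: g(y) = cos(y), so g'(y) = -sin(y) and g''(y) = -cos(y).
dg = lambda y: -math.sin(y)
d2g = lambda y: -math.cos(y)

ys = [k / 1000.0 for k in range(-3000, 3001)]
t0, y0 = first_caustic_time(d2g, ys)

# After the caustic: two starting points arrive at x = 0 at time t = 2.
t = 2.0
position = lambda y: y + t * dg(y)
y1 = bisect(position, 1.0, 3.0)
y2 = -y1                      # by the odd symmetry of y - 2 sin(y)
print(t0, y0, y1, dg(y1), dg(y2))
```

The two momenta at the common arrival point have opposite signs, so no single value of $u_x(2,0)$ can represent both branches.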
The failure of smoothness motivates weaker solution concepts in Chapters 5 and 6. Before the later viscosity-solution viewpoint is used explicitly, however, there is an important formula for convex Hamiltonians that gives the correct nonsmooth continuation.
## The Hopf-Lax Formula in the Quadratic Model
When $H$ is convex in momentum, Hamilton-Jacobi evolution has a variational interpretation. The guiding question is: if smooth characteristics cross, can the solution be described by choosing the least action among all possible starting points?
For the model Hamiltonian
\begin{align*}
H(p)=\frac{|p|^2}{2},
\end{align*}
the corresponding Lagrangian is also quadratic. This leads to a minimisation formula.
[definition: Quadratic Hopf-Lax Candidate]
Let $g:\mathbb R^n\to \mathbb R$ be bounded below and continuous. For $t>0$, define
\begin{align*}
u(t,x)=\inf_{y\in \mathbb R^n}\left\{g(y)+\frac{|x-y|^2}{2t}\right\}.
\end{align*}
[/definition]
This formula compares all straight-line paths ending at $x$ at time $t$. The term
\begin{align*}
\frac{|x-y|^2}{2t}
\end{align*}
is the action cost of travelling from $y$ to $x$ in time $t$ with constant velocity, and the next theorem checks that the variational candidate satisfies the PDE at points where the minimiser is smooth and unique.
[quotetheorem:6147]
[citeproof:6147]
The unique smooth minimiser assumption is exactly where this theorem is local rather than global. If two different points $y_1$ and $y_2$ minimise the same action at $(t,x)$, the infimum may have a corner and $\nabla u(t,x)$ need not exist; this is the variational form of two characteristic branches reaching the same point. If the minimiser is unique but does not depend smoothly on $(t,x)$, the envelope differentiation used in the proof is not justified. Thus the theorem verifies the PDE only at regular points of the value function; it does not assert that the Hopf-Lax candidate is a classical solution after caustics form.
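The distinction between regular points and points with competing minimisers can be seen in a brute-force evaluation of the quadratic Hopf-Lax candidate. The following sketch minimises over a uniform grid in one dimension. The grid parameters and the two test data are assumptions: $g(y)=y^2/2$ has the closed form $u(t,x)=x^2/(2(1+t))$, while the two-well datum $g(y)=\tfrac12\min\{(y-1)^2,(y+1)^2\}$ gives $u(t,x)=(|x|-1)^2/(2(1+t))$. The two-well datum already has a kink at $y=0$, so it illustrates competing minimisers rather than caustic formation from smooth data.

```python
def hopf_lax(g, t, x, half_width=10.0, n=20001):
    """Approximate inf_y { g(y) + (x - y)^2 / (2t) } on a uniform grid around x."""
    best_val, best_y = float("inf"), None
    for k in range(n):
        y = x - half_width + 2.0 * half_width * k / (n - 1)
        val = g(y) + (x - y) ** 2 / (2.0 * t)
        if val < best_val:
            best_val, best_y = val, y
    return best_val, best_y

smooth = lambda y: 0.5 * y * y
two_wells = lambda y: 0.5 * min((y - 1.0) ** 2, (y + 1.0) ** 2)

t = 1.0

# Regular point: unique minimiser y = x / (1 + t).
val_smooth, y_smooth = hopf_lax(smooth, t, 1.0)

# Point with two competing minimisers: one-sided slopes of u(t, .) at x = 0.
h = 0.01
u_left = hopf_lax(two_wells, t, -h)[0]
u_mid = hopf_lax(two_wells, t, 0.0)[0]
u_right = hopf_lax(two_wells, t, h)[0]
left_slope = (u_mid - u_left) / h
right_slope = (u_right - u_mid) / h
```

For the smooth datum the computed value matches the closed form. For the two-well datum the one-sided slopes at $x=0$ are close to $+1/2$ and $-1/2$, so the value function has a corner exactly where the minimisers near $y=-1/2$ and $y=1/2$ tie.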
The proof shows why convexity matters in the general theory: the solution is built by minimising action. The quadratic calculation is the model for a broader formula, but the broader formula is not usually classical after minimisers collide. Before stating it, we need to name the weaker solution concept that keeps track of the PDE through nonsmooth points.
[definition: Viscosity Solution]
A viscosity solution is a [continuous function](/page/Continuous%20Function) that satisfies a first-order nonlinear PDE through comparison with smooth test functions touching it from above or below, rather than through pointwise derivatives everywhere. This notion keeps the correct solution after classical derivatives fail at corners or caustics.
[/definition]
This definition is deliberately qualitative here; the course only needs the idea that smooth test functions replace unavailable derivatives. The Hopf-Lax formula is the main example: it gives a value function that may not be differentiable everywhere, but still solves the Hamilton-Jacobi problem in this test-function sense.
[explanation: Hopf-Lax Formula for Convex Hamiltonians]
Let $H:\mathbb R^n\to \mathbb R$ be convex, finite-valued, lower semicontinuous, and superlinear, and let $L:\mathbb R^n\to \mathbb R\cup\{+\infty\}$ be its Legendre transform,
\begin{align*}
L(q)=\sup_{p\in\mathbb R^n}\{p\cdot q-H(p)\}.
\end{align*}
Let $g:\mathbb R^n\to \mathbb R$ be bounded and uniformly continuous. Then the function
\begin{align*}
u(t,x)=\inf_{y\in\mathbb R^n}\left\{g(y)+tL\left(\frac{x-y}{t}\right)\right\}, \qquad t>0,
\end{align*}
with $u(0,x)=g(x)$, is the viscosity solution of the Cauchy problem
\begin{align*}
u_t+H(\nabla u)=0, \qquad u(0,x)=g(x).
\end{align*}
[/explanation]
In this course the formula is proved only in the quadratic model above. The theorem is not a classical differentiability statement: after minimisers cease to be unique, $u$ may have corners, and the asserted solution concept is viscosity solution. The hypotheses make the variational expression finite and stable. Lower semicontinuity gives good compactness properties for the Legendre transform, superlinearity gives coercivity in the velocity variable, and bounded [uniform continuity](/page/Uniform%20Continuity) of $g$ gives a well-defined initial trace as $t\downarrow 0$.
The assumptions also mark the boundary of the statement. If superlinearity is removed, the Legendre transform may be finite only on a restricted velocity set. For instance, with the affine Hamiltonian $H(p)=b\cdot p+c$, the Legendre transform is $L(q)=-c$ when $q=b$ and $L(q)=+\infty$ otherwise. In the sign convention $u_t+H(\nabla u)=0$, the corresponding characteristic formula is $u(t,x)=g(x-tb)-ct$: indeed, $u_t=-b\cdot\nabla g(x-tb)-c$ and $\nabla u=\nabla g(x-tb)$, so $u_t+b\cdot\nabla u+c=0$. This still solves a linear transport equation in the viscosity sense, but it is not the finite-action minimisation formula over all endpoint velocities stated above. If the initial datum is allowed to be discontinuous, the formula may not attain the prescribed initial data continuously at $t=0$, so the Cauchy problem in the theorem's terms is no longer the same problem.
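The role of superlinearity can be observed by approximating the Legendre transform on a bounded momentum grid. The sketch below is a rough numerical illustration, with the grid bound `p_max` and the constants $b=2$, $c=0.5$ chosen as assumptions. For the quadratic Hamiltonian the supremum is attained and stable under enlarging the grid; for the affine Hamiltonian it equals $-c$ at $q=b$ but grows with the grid bound when $q\neq b$, reflecting $L(q)=+\infty$.

```python
def legendre(H, q, p_max=50.0, n=100001):
    """Grid approximation of L(q) = sup_p { p q - H(p) } over |p| <= p_max."""
    best = -float("inf")
    for k in range(n):
        p = -p_max + 2.0 * p_max * k / (n - 1)
        best = max(best, p * q - H(p))
    return best

quadratic = lambda p: 0.5 * p * p
b, c = 2.0, 0.5                      # assumed constants for the affine case
affine = lambda p: b * p + c

L_quad = legendre(quadratic, 1.5)                      # exact value q^2 / 2
L_aff_on = legendre(affine, b)                         # exact value -c
L_aff_off_small = legendre(affine, b + 0.1, p_max=50.0)
L_aff_off_large = legendre(affine, b + 0.1, p_max=500.0)
print(L_quad, L_aff_on, L_aff_off_small, L_aff_off_large)
```

The last two numbers scale with `p_max`, which is the discrete signature of an infinite Legendre transform away from $q=b$.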
[example: Recovering Characteristics from Hopf-Lax]
For
\begin{align*}
H(p)=\frac{|p|^2}{2},
\end{align*}
the quadratic Hopf-Lax candidate is
\begin{align*}
u(t,x)=\inf_{a\in\mathbb R^n}\left\{g(a)+\frac{|x-a|^2}{2t}\right\}, \qquad t>0.
\end{align*}
Assume $g\in C^2(\mathbb R^n)$ and that, at the fixed point $(t,x)$ with $t>0$, the infimum is attained at a unique interior point $y$. Define
\begin{align*}
\Phi(a)=g(a)+\frac{|x-a|^2}{2t}.
\end{align*}
Since $y$ is an interior minimiser and $\Phi$ is differentiable, the first-order necessary condition gives
\begin{align*}
\nabla_a\Phi(y)=0.
\end{align*}
Writing
\begin{align*}
|x-a|^2=\sum_{j=1}^n (x_j-a_j)^2,
\end{align*}
we compute, for each $i=1,\dots,n$,
\begin{align*}
\partial_{a_i}\Phi(a)=\partial_{a_i}g(a)+\partial_{a_i}\left(\frac{1}{2t}\sum_{j=1}^n (x_j-a_j)^2\right).
\end{align*}
Since $t$ is fixed and differentiation is linear,
\begin{align*}
\partial_{a_i}\Phi(a)=\partial_{a_i}g(a)+\frac{1}{2t}\sum_{j=1}^n \partial_{a_i}(x_j-a_j)^2.
\end{align*}
If $j\neq i$, then $x_j-a_j$ is independent of $a_i$, so
\begin{align*}
\partial_{a_i}(x_j-a_j)^2=0.
\end{align*}
For the remaining term,
\begin{align*}
\partial_{a_i}(x_i-a_i)^2=2(x_i-a_i)\partial_{a_i}(x_i-a_i).
\end{align*}
Because $\partial_{a_i}(x_i-a_i)=-1$, this becomes
\begin{align*}
\partial_{a_i}(x_i-a_i)^2=-2(x_i-a_i).
\end{align*}
Therefore
\begin{align*}
\partial_{a_i}\Phi(a)=\partial_{a_i}g(a)+\frac{1}{2t}\left(-2(x_i-a_i)\right).
\end{align*}
Hence
\begin{align*}
\partial_{a_i}\Phi(a)=\partial_{a_i}g(a)-\frac{x_i-a_i}{t}.
\end{align*}
Evaluating at $a=y$ and using $\nabla_a\Phi(y)=0$ gives
\begin{align*}
0=\partial_{x_i}g(y)-\frac{x_i-y_i}{t}.
\end{align*}
Adding $(x_i-y_i)/t$ to both sides yields
\begin{align*}
\frac{x_i-y_i}{t}=\partial_{x_i}g(y).
\end{align*}
Since this holds for every component,
\begin{align*}
\frac{x-y}{t}=\nabla g(y).
\end{align*}
Multiplying by $t$ and adding $y$ gives the stationarity relation
\begin{align*}
x=y+t\nabla g(y).
\end{align*}
Now compare this with the free-particle characteristic starting from $y$. Since $H$ has no $x$-dependence,
\begin{align*}
H_x(x,p)=0,
\end{align*}
and since $H(p)=|p|^2/2$,
\begin{align*}
H_p(p)=p.
\end{align*}
With initial data
\begin{align*}
p(0;y)=\nabla g(y), \qquad x(0;y)=y,
\end{align*}
the momentum equation is
\begin{align*}
\dot p(s;y)=-H_x(x(s;y),p(s;y))=0.
\end{align*}
Thus each component satisfies
\begin{align*}
\dot p_i(s;y)=0.
\end{align*}
So
\begin{align*}
p_i(s;y)=p_i(0;y)=\partial_{x_i}g(y),
\end{align*}
and therefore
\begin{align*}
p(s;y)=\nabla g(y).
\end{align*}
The position equation becomes
\begin{align*}
\dot x(s;y)=H_p(p(s;y))=p(s;y)=\nabla g(y).
\end{align*}
For each component,
\begin{align*}
x_i(s;y)-x_i(0;y)=\int_0^s \partial_{x_i}g(y)\,d\sigma.
\end{align*}
The integrand is constant in $\sigma$, so
\begin{align*}
x_i(s;y)-x_i(0;y)=s\,\partial_{x_i}g(y).
\end{align*}
Using $x(0;y)=y$, we obtain
\begin{align*}
x(s;y)=y+s\nabla g(y).
\end{align*}
At time $s=t$,
\begin{align*}
x(t;y)=y+t\nabla g(y),
\end{align*}
which is exactly the stationarity relation found from the Hopf-Lax minimisation. Thus, before the characteristic projection forms a caustic, the unique Hopf-Lax minimiser labels the same branch as the smooth characteristic solution; after several branches reach the same point, the infimum selects the branch with the smallest value of
\begin{align*}
g(a)+\frac{|x-a|^2}{2t}.
\end{align*}
[/example]
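The stationarity relation can also be confirmed numerically at a regular point. The sketch below assumes the datum $g(a)=\cos a$ in one dimension and the point $(t,x)=(0.5,0.8)$; both are illustrative choices. For $t<1$ the function $\Phi$ is strictly convex for this datum, so the minimiser is unique, and a grid search is enough to locate it to within the grid spacing.

```python
import math

def hopf_lax_minimiser(g, t, x, half_width=3.0, n=60001):
    """Grid search for the minimiser of Phi(a) = g(a) + (x - a)^2 / (2t)."""
    best_val, best_a = float("inf"), None
    for k in range(n):
        a = x - half_width + 2.0 * half_width * k / (n - 1)
        val = g(a) + (x - a) ** 2 / (2.0 * t)
        if val < best_val:
            best_val, best_a = val, a
    return best_a, best_val

g = math.cos                      # assumed datum
dg = lambda a: -math.sin(a)

t, x = 0.5, 0.8
y, value = hopf_lax_minimiser(g, t, x)

# Free-particle characteristic launched from the minimiser y.
x_from_characteristic = y + t * dg(y)
z_from_characteristic = g(y) + 0.5 * t * dg(y) ** 2
print(y, x_from_characteristic, value, z_from_characteristic)
```

Up to the grid resolution, the characteristic launched from the minimiser lands at $x$, and the action accumulated along it reproduces the Hopf-Lax value.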
The [quadratic formula](/theorems/1301) is the first place where characteristics and variational principles meet. The next section interprets the same geometry through wavefronts and the eikonal equation.
## The Eikonal Equation and Wavefront Geometry
In geometric optics, a rapidly oscillating wave is described by a phase function. The leading-order equation for that phase asks for a function whose gradient has prescribed length, so the geometric question is: which functions measure distance to a source or wavefront?
[definition: Eikonal Equation]
Let $U\subseteq \mathbb R^n$ be open and let $c:U\to (0,\infty)$ be a speed function. The eikonal equation for a phase $u:U\to\mathbb R$ is
\begin{align*}
|\nabla u(x)|=\frac{1}{c(x)}, \qquad x\in U.
\end{align*}
In the unit-speed case it is
\begin{align*}
|\nabla u(x)|=1.
\end{align*}
[/definition]
The level sets of $u$ are wavefronts, and $\nabla u$ is normal to the fronts where $u$ is smooth. Thus the eikonal equation says that the phase increases at the rate $1/c(x)$ in the normal direction, which is a constant rate in the unit-speed case.
[example: Distance from a Point]
Let $u(x)=|x-x_0|$ on $\mathbb R^n\setminus\{x_0\}$. Since $x\neq x_0$, we have $|x-x_0|>0$, so the denominator in the gradient formula below is nonzero. Writing
\begin{align*}
u(x)=\left(\sum_{j=1}^n (x_j-x_{0,j})^2\right)^{1/2},
\end{align*}
we compute, for each $i=1,\dots,n$, by the chain rule:
\begin{align*}
\partial_{x_i}u(x)=\frac{1}{2}\left(\sum_{j=1}^n (x_j-x_{0,j})^2\right)^{-1/2}\partial_{x_i}\left(\sum_{j=1}^n (x_j-x_{0,j})^2\right).
\end{align*}
By linearity of differentiation,
\begin{align*}
\partial_{x_i}\left(\sum_{j=1}^n (x_j-x_{0,j})^2\right)=\sum_{j=1}^n \partial_{x_i}(x_j-x_{0,j})^2.
\end{align*}
If $j\neq i$, then $x_j-x_{0,j}$ is independent of $x_i$, so
\begin{align*}
\partial_{x_i}(x_j-x_{0,j})^2=0.
\end{align*}
For the remaining term,
\begin{align*}
\partial_{x_i}(x_i-x_{0,i})^2=2(x_i-x_{0,i})\partial_{x_i}(x_i-x_{0,i}).
\end{align*}
Since $\partial_{x_i}(x_i-x_{0,i})=1$, this becomes
\begin{align*}
\partial_{x_i}(x_i-x_{0,i})^2=2(x_i-x_{0,i}).
\end{align*}
Therefore
\begin{align*}
\partial_{x_i}\left(\sum_{j=1}^n (x_j-x_{0,j})^2\right)=2(x_i-x_{0,i}).
\end{align*}
Substituting this into the chain-rule expression gives
\begin{align*}
\partial_{x_i}u(x)=\frac{1}{2}\left(\sum_{j=1}^n (x_j-x_{0,j})^2\right)^{-1/2}2(x_i-x_{0,i}).
\end{align*}
Cancelling the factor $2$ gives
\begin{align*}
\partial_{x_i}u(x)=\frac{x_i-x_{0,i}}{\left(\sum_{j=1}^n (x_j-x_{0,j})^2\right)^{1/2}}.
\end{align*}
Since
\begin{align*}
\left(\sum_{j=1}^n (x_j-x_{0,j})^2\right)^{1/2}=|x-x_0|,
\end{align*}
we obtain
\begin{align*}
\partial_{x_i}u(x)=\frac{x_i-x_{0,i}}{|x-x_0|}.
\end{align*}
Since this holds for every component,
\begin{align*}
\nabla u(x)=\frac{x-x_0}{|x-x_0|}.
\end{align*}
Taking the Euclidean norm,
\begin{align*}
|\nabla u(x)|^2=\sum_{i=1}^n \left(\frac{x_i-x_{0,i}}{|x-x_0|}\right)^2.
\end{align*}
Since $|x-x_0|$ is independent of the summation index $i$,
\begin{align*}
|\nabla u(x)|^2=\sum_{i=1}^n \frac{(x_i-x_{0,i})^2}{|x-x_0|^2}.
\end{align*}
Factoring out the common denominator gives
\begin{align*}
|\nabla u(x)|^2=\frac{\sum_{i=1}^n (x_i-x_{0,i})^2}{|x-x_0|^2}.
\end{align*}
By the definition of the Euclidean norm,
\begin{align*}
\sum_{i=1}^n (x_i-x_{0,i})^2=|x-x_0|^2.
\end{align*}
Therefore
\begin{align*}
|\nabla u(x)|^2=\frac{|x-x_0|^2}{|x-x_0|^2}.
\end{align*}
Because $|x-x_0|>0$, this quotient is $1$, so
\begin{align*}
|\nabla u(x)|^2=1.
\end{align*}
Since $|\nabla u(x)|\ge 0$, it follows that
\begin{align*}
|\nabla u(x)|=1.
\end{align*}
Thus the distance from a point solves the unit-speed eikonal equation away from the source point. Its level sets are the spheres $|x-x_0|=r$, and the gradient direction $(x-x_0)/|x-x_0|$ points along the outward radial rays.
[/example]
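The gradient formula of this example can be checked with central finite differences. The sketch below works in $\mathbb R^3$ with an arbitrarily chosen source point and evaluation point; these values and the helper names are assumptions for illustration.

```python
import math

def distance(x, x0):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x0)))

def gradient_fd(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad

x0 = (1.0, -2.0, 0.5)             # source point (assumed)
x = (3.0, 1.0, -1.5)              # evaluation point away from the source

grad = gradient_fd(lambda point: distance(point, x0), x)
exact = [(a - b) / distance(x, x0) for a, b in zip(x, x0)]
grad_norm = math.sqrt(sum(c * c for c in grad))
print(grad, exact, grad_norm)
```

The numerical gradient agrees with $(x-x_0)/|x-x_0|$ and has length one up to discretisation error.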
Distance functions provide the canonical examples, but smoothness can fail when there is more than one nearest point or when geodesics focus. We need to separate the smooth region from the singular region by naming the set where the nearest-point description loses uniqueness or regularity.
[definition: Cut Locus]
For a [closed set](/page/Closed%20Set) $K\subset\mathbb R^n$, define the distance function
\begin{align*}
d_K:\mathbb R^n\to \mathbb R, \qquad d_K(x)=\inf_{y\in K}|x-y|.
\end{align*}
At points where the nearest point is locally unique, the nearest-point projection is the map
\begin{align*}
\pi:V\to \mathbb R^n
\end{align*}
from an open neighbourhood $V\subseteq \mathbb R^n\setminus K$ assigning to each $x\in V$ its nearest point in $K$, so $\pi(V)\subseteq K$. The cut locus of $K$ is the set of points $x\in \mathbb R^n\setminus K$ at which $d_K$ fails to be differentiable because nearest-point projection is not locally single-valued or because the normal parametrisation degenerates.
[/definition]
Away from the cut locus, the nearest point is locally stable, and the distance function behaves like a smooth phase. The issue is that the distance function is defined by an infimum, so differentiability can fail exactly when competing minimisers appear or the nearest-point map stops varying smoothly.
The definition therefore raises a local PDE question: after excluding the cut locus, does the distance function become smooth enough to satisfy the eikonal equation in the classical sense? The needed statement turns geometric uniqueness of nearest points into analytic information, identifying the gradient with the unit normal direction and proving that its length is one.
[quotetheorem:6148]
[citeproof:6148]
This theorem is the geometric counterpart of the Hopf-Lax calculation: a minimisation principle gives a differentiable solution only where the minimiser is unique and stable. Closedness of $K$ ensures that nearest points exist locally; for a non-closed set such as $K=(0,1)\subset\mathbb R$, the point $2$ has distance $1$ but no nearest point in $K$. Uniqueness is also necessary: for $K=\{-1,1\}\subset\mathbb R$, the distance function is $d_K(x)=\min\{|x+1|,|x-1|\}$ and is not differentiable at $x=0$, where two nearest points compete. Smooth dependence is a separate condition: even with a unique nearest point at a chosen location, corners or degenerating [normal coordinates](/theorems/2713) can make the projection fail to vary smoothly nearby, which is precisely the behaviour excluded by staying away from the cut locus.
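The two-point set $K=\{-1,1\}$ from the discussion above gives a quick numerical view of the cut locus. The following sketch compares one-sided difference quotients of $d_K$ at the midpoint $x=0$ and at the regular point $x=0.5$; the step size and helper names are assumptions.

```python
def d_K(x, K=(-1.0, 1.0)):
    # Distance to the finite set K on the real line.
    return min(abs(x - y) for y in K)

def one_sided_slopes(f, x, h=1e-6):
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right

left_cut, right_cut = one_sided_slopes(d_K, 0.0)    # two nearest points compete
left_reg, right_reg = one_sided_slopes(d_K, 0.5)    # unique nearest point y = 1
print(left_cut, right_cut, left_reg, right_reg)
```

At $x=0$ the slopes are $+1$ from the left and $-1$ from the right, so $d_K$ is not differentiable there. At $x=0.5$ both slopes equal $-1$, and $|d_K'|=1$ as the eikonal equation requires.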
[example: Distance to a Plane]
Let
\begin{align*}
K=\{x\in\mathbb R^n:x_n=0\}.
\end{align*}
We compute the distance function $d_K$ on the two open half-spaces and then check the unit-speed eikonal equation there.
First suppose $x=(x_1,\dots,x_n)$ satisfies $x_n>0$. The point
\begin{align*}
y=(x_1,\dots,x_{n-1},0)
\end{align*}
belongs to $K$, and
\begin{align*}
x-y=(0,\dots,0,x_n).
\end{align*}
Therefore
\begin{align*}
|x-y|=\left(0^2+\cdots+0^2+x_n^2\right)^{1/2}.
\end{align*}
Since $0^2+\cdots+0^2+x_n^2=x_n^2$, this becomes
\begin{align*}
|x-y|=(x_n^2)^{1/2}.
\end{align*}
For a real number $a$, $(a^2)^{1/2}=|a|$, so
\begin{align*}
|x-y|=|x_n|.
\end{align*}
Because $x_n>0$, we have $|x_n|=x_n$, and hence
\begin{align*}
|x-y|=x_n.
\end{align*}
Now let $z=(z_1,\dots,z_{n-1},0)\in K$. Then
\begin{align*}
x-z=(x_1-z_1,\dots,x_{n-1}-z_{n-1},x_n),
\end{align*}
so by the definition of the Euclidean norm,
\begin{align*}
|x-z|^2=\sum_{i=1}^{n-1}(x_i-z_i)^2+x_n^2.
\end{align*}
Each square $(x_i-z_i)^2$ is nonnegative, so
\begin{align*}
\sum_{i=1}^{n-1}(x_i-z_i)^2+x_n^2\ge x_n^2.
\end{align*}
Both sides are nonnegative, and the square-root function is increasing on $[0,\infty)$, so
\begin{align*}
|x-z|\ge (x_n^2)^{1/2}.
\end{align*}
Again $(x_n^2)^{1/2}=|x_n|=x_n$, because $x_n>0$, so
\begin{align*}
|x-z|\ge x_n.
\end{align*}
The point $y$ attains this lower bound, so
\begin{align*}
d_K(x)=x_n, \qquad x_n>0.
\end{align*}
Thus, on the half-space $x_n>0$,
\begin{align*}
\partial_{x_i}d_K(x)=\partial_{x_i}x_n=0 \quad (i=1,\dots,n-1),
\end{align*}
and
\begin{align*}
\partial_{x_n}d_K(x)=\partial_{x_n}x_n=1.
\end{align*}
Therefore
\begin{align*}
\nabla d_K(x)=(0,\dots,0,1)=e_n.
\end{align*}
Taking the Euclidean norm gives
\begin{align*}
|\nabla d_K(x)|^2=0^2+\cdots+0^2+1^2.
\end{align*}
Since the right-hand side is $1$,
\begin{align*}
|\nabla d_K(x)|^2=1.
\end{align*}
Because $|\nabla d_K(x)|\ge 0$, it follows that
\begin{align*}
|\nabla d_K(x)|=1.
\end{align*}
Now suppose $x_n<0$. The same point
\begin{align*}
y=(x_1,\dots,x_{n-1},0)
\end{align*}
belongs to $K$, and
\begin{align*}
x-y=(0,\dots,0,x_n).
\end{align*}
Hence
\begin{align*}
|x-y|=\left(0^2+\cdots+0^2+x_n^2\right)^{1/2}.
\end{align*}
As before,
\begin{align*}
|x-y|=(x_n^2)^{1/2}=|x_n|.
\end{align*}
Because $x_n<0$, we have $|x_n|=-x_n$, so
\begin{align*}
|x-y|=-x_n.
\end{align*}
For any $z=(z_1,\dots,z_{n-1},0)\in K$, the same norm calculation gives
\begin{align*}
|x-z|^2=\sum_{i=1}^{n-1}(x_i-z_i)^2+x_n^2.
\end{align*}
Since each square is nonnegative,
\begin{align*}
|x-z|^2\ge x_n^2.
\end{align*}
Taking square roots of nonnegative quantities gives
\begin{align*}
|x-z|\ge (x_n^2)^{1/2}.
\end{align*}
Since $x_n<0$, this is
\begin{align*}
|x-z|\ge |x_n|=-x_n.
\end{align*}
The point $y$ attains this lower bound, so
\begin{align*}
d_K(x)=-x_n, \qquad x_n<0.
\end{align*}
Thus, on the half-space $x_n<0$,
\begin{align*}
\partial_{x_i}d_K(x)=\partial_{x_i}(-x_n)=0 \quad (i=1,\dots,n-1),
\end{align*}
and
\begin{align*}
\partial_{x_n}d_K(x)=\partial_{x_n}(-x_n)=-1.
\end{align*}
Therefore
\begin{align*}
\nabla d_K(x)=(0,\dots,0,-1)=-e_n.
\end{align*}
Taking the Euclidean norm gives
\begin{align*}
|\nabla d_K(x)|^2=0^2+\cdots+0^2+(-1)^2.
\end{align*}
Since the right-hand side is $1$,
\begin{align*}
|\nabla d_K(x)|^2=1.
\end{align*}
Because $|\nabla d_K(x)|\ge 0$, we get
\begin{align*}
|\nabla d_K(x)|=1.
\end{align*}
Thus the distance to the plane solves the unit-speed eikonal equation on each open half-space. The hyperplane $K$ itself is the interface where the formula for the gradient changes from $e_n$ to $-e_n$, so the distance function is not differentiable across that interface.
[/example]
The eikonal equation also explains caustics: rays are characteristic curves for the phase, and caustics occur when the ray map ceases to parametrise space smoothly.
[example: Caustic from a Curved Wavefront]
Let $\gamma:I\to\mathbb R^2$ be a smooth initial wavefront parametrised by arclength, and write
\begin{align*}
T(s)=\gamma'(s), \qquad |T(s)|=1.
\end{align*}
Choose the unit normal $N(s)$ on the side into which the wavefront moves, so that $|N(s)|=1$ and $T(s)\cdot N(s)=0$. Suppose the signed curvature on this side satisfies
\begin{align*}
N'(s)=-\kappa(s)T(s), \qquad \kappa(s)>0.
\end{align*}
The normal ray map is
\begin{align*}
X(r,s)=\gamma(s)+rN(s), \qquad r\ge 0.
\end{align*}
Differentiating with respect to $r$ gives
\begin{align*}
X_r(r,s)=N(s).
\end{align*}
Differentiating with respect to $s$ gives
\begin{align*}
X_s(r,s)=\gamma'(s)+rN'(s).
\end{align*}
Using $\gamma'(s)=T(s)$ and $N'(s)=-\kappa(s)T(s)$,
\begin{align*}
X_s(r,s)=T(s)+r\bigl(-\kappa(s)T(s)\bigr).
\end{align*}
Thus
\begin{align*}
X_s(r,s)=T(s)-r\kappa(s)T(s).
\end{align*}
Factoring out $T(s)$ gives
\begin{align*}
X_s(r,s)=(1-r\kappa(s))T(s).
\end{align*}
The Jacobian determinant of the map $(r,s)\mapsto X(r,s)$ is
\begin{align*}
\det(X_r(r,s),X_s(r,s))=\det\bigl(N(s),(1-r\kappa(s))T(s)\bigr).
\end{align*}
By linearity of the determinant in the second column,
\begin{align*}
\det(X_r(r,s),X_s(r,s))=(1-r\kappa(s))\det(N(s),T(s)).
\end{align*}
Since $T(s)$ and $N(s)$ are orthonormal in $\mathbb R^2$, the parallelogram they span has area $1$, so
\begin{align*}
|\det(N(s),T(s))|=1.
\end{align*}
Therefore
\begin{align*}
|\det(X_r(r,s),X_s(r,s))|=|1-r\kappa(s)|\,|\det(N(s),T(s))|.
\end{align*}
Substituting $|\det(N(s),T(s))|=1$ gives
\begin{align*}
|\det(X_r(r,s),X_s(r,s))|=|1-r\kappa(s)|.
\end{align*}
Thus the normal coordinates are locally nondegenerate precisely when
\begin{align*}
1-r\kappa(s)\neq 0.
\end{align*}
At
\begin{align*}
r=\frac{1}{\kappa(s)},
\end{align*}
we have
\begin{align*}
1-r\kappa(s)=1-\frac{1}{\kappa(s)}\kappa(s)=1-1=0.
\end{align*}
So the Jacobian vanishes at this distance, the normal parametrisation degenerates, and the projected rays form a caustic.
Before this degeneracy, define the travel-time function by
\begin{align*}
u(X(r,s))=r.
\end{align*}
Differentiating this identity with respect to $r$ and using the chain rule gives
\begin{align*}
\nabla u(X(r,s))\cdot X_r(r,s)=1.
\end{align*}
Since $X_r(r,s)=N(s)$, this becomes
\begin{align*}
\nabla u(X(r,s))\cdot N(s)=1.
\end{align*}
Differentiating the same identity with respect to $s$ gives
\begin{align*}
\nabla u(X(r,s))\cdot X_s(r,s)=0.
\end{align*}
Using $X_s(r,s)=(1-r\kappa(s))T(s)$,
\begin{align*}
\nabla u(X(r,s))\cdot \bigl((1-r\kappa(s))T(s)\bigr)=0.
\end{align*}
Pulling out the scalar factor gives
\begin{align*}
(1-r\kappa(s))\,\nabla u(X(r,s))\cdot T(s)=0.
\end{align*}
When $1-r\kappa(s)\neq 0$, division by this nonzero factor yields
\begin{align*}
\nabla u(X(r,s))\cdot T(s)=0.
\end{align*}
Because $\{T(s),N(s)\}$ is an [orthonormal basis](/page/Orthonormal%20Basis) of $\mathbb R^2$, write
\begin{align*}
\nabla u(X(r,s))=aT(s)+bN(s).
\end{align*}
Taking the dot product with $T(s)$ gives
\begin{align*}
\nabla u(X(r,s))\cdot T(s)=a\,T(s)\cdot T(s)+b\,N(s)\cdot T(s).
\end{align*}
Using $T(s)\cdot T(s)=1$ and $N(s)\cdot T(s)=0$,
\begin{align*}
\nabla u(X(r,s))\cdot T(s)=a.
\end{align*}
Since this dot product is $0$, we have
\begin{align*}
a=0.
\end{align*}
Taking the dot product with $N(s)$ gives
\begin{align*}
\nabla u(X(r,s))\cdot N(s)=a\,T(s)\cdot N(s)+b\,N(s)\cdot N(s).
\end{align*}
Using $T(s)\cdot N(s)=0$ and $N(s)\cdot N(s)=1$,
\begin{align*}
\nabla u(X(r,s))\cdot N(s)=b.
\end{align*}
Since this dot product is $1$, we have
\begin{align*}
b=1.
\end{align*}
Therefore
\begin{align*}
\nabla u(X(r,s))=N(s).
\end{align*}
Taking norms gives
\begin{align*}
|\nabla u(X(r,s))|=|N(s)|=1.
\end{align*}
So the travel-time function solves the unit-speed eikonal equation before the caustic.
For a concrete picture, take the initial curve to be a circle of radius $R$ and let the rays move inward. Then the inward signed curvature is
\begin{align*}
\kappa=\frac{1}{R}.
\end{align*}
The degeneracy condition is
\begin{align*}
r=\frac{1}{\kappa}.
\end{align*}
Substituting $\kappa=1/R$ gives
\begin{align*}
r=\frac{1}{1/R}=R.
\end{align*}
Thus all inward normal rays meet at the centre after distance $R$. At that point different starting points on the circle arrive with different normal directions, so a single differentiable phase cannot assign one gradient there.
[/example]
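The circle example can be checked numerically. The following is a minimal sketch, not part of the derivation: the radius value, the angle parametrisation of the circle, and all function names are illustrative assumptions. It confirms that the factor $1-r\kappa$ vanishes at $r=R$, that every inward normal ray reaches the centre at that distance, and that the travel time $u(x)=R-|x|$ has unit gradient away from the centre.

```python
import math

R = 2.0  # radius of the initial circle (illustrative value)

def ray_point(r, s):
    """Point reached after distance r along the inward normal from angle s."""
    start = (R * math.cos(s), R * math.sin(s))
    inward_normal = (-math.cos(s), -math.sin(s))
    return (start[0] + r * inward_normal[0], start[1] + r * inward_normal[1])

def jacobian_factor(r):
    """The factor 1 - r*kappa with inward curvature kappa = 1/R."""
    return 1.0 - r / R

def travel_time(x, y):
    """Travel time before the caustic: distance covered inward from the circle."""
    return R - math.hypot(x, y)

def gradient_norm(x, y, h=1e-6):
    """Central-difference approximation of |grad u| at (x, y)."""
    ux = (travel_time(x + h, y) - travel_time(x - h, y)) / (2 * h)
    uy = (travel_time(x, y + h) - travel_time(x, y - h)) / (2 * h)
    return math.hypot(ux, uy)

# Eight rays started at equally spaced angles, each followed for distance R.
angles = [2 * math.pi * k / 8 for k in range(8)]
focus_points = [ray_point(R, s) for s in angles]
max_focus_distance = max(math.hypot(px, py) for px, py in focus_points)

print(jacobian_factor(R))        # degeneracy of the normal parametrisation
print(max_focus_distance)        # distance of the ray endpoints from the centre
print(gradient_norm(0.7, -0.4))  # eikonal check at a point before the caustic
```

The gradient check is deliberately made away from the origin, since that is exactly the point where the rays arrive with different directions and no single gradient exists.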
The Hamilton-Jacobi chapter therefore completes the classical characteristic picture and exposes its limitation. Smooth solutions are represented by Hamiltonian flow until the projection folds; variational formulas continue beyond that time by selecting minimising branches; and the eikonal equation translates the same mechanism into the geometry of wavefronts.
Hamilton-Jacobi theory already shows that characteristic flow can continue past the point where a classical graph breaks down. Chapter 5 pushes beyond that limitation by changing the notion of solution itself, so discontinuities and weak conservation become central rather than exceptional.
# 5. Conservation Laws and Weak Solutions
This chapter changes the status of the equations studied so far. Chapters 2 and 3 treated first-order PDEs mostly through smooth solutions transported along characteristics; conservation laws force us to confront the fact that smoothness can break down in finite time even from smooth initial data. The central question is how to keep the equation meaningful after shocks form, and the answer is to move from pointwise differential equations to integral identities tested against smooth compactly supported functions.
The prerequisites are the characteristic method for first-order quasilinear equations, the [divergence theorem](/theorems/2754), integration by parts, and the basic language of test functions. The chapter uses those tools to connect three viewpoints on the same equation: integral balance over spatial regions, divergence-form PDEs, and weak identities in spacetime. This provides the course-level bridge from classical first-order theory to the entropy and admissibility theory of Chapter 6 and the Riemann problems of Chapter 7.
## From Balance Laws to Divergence Form
A conservation law begins with a bookkeeping question: if a quantity is distributed through space, how can its amount inside a region change? In the absence of sources, the only mechanism is flux across the boundary. This section translates that physical principle into the differential equation that will later be interpreted weakly.
Let $u(t,x)$ denote a scalar density on a spacetime domain $(0,T)\times U$, where $U \subset \mathbb R^n$ is open. Let $f(u)$ be the flux, with $f: \mathbb R \to \mathbb R^n$. If $V \subset U$ is a bounded region with smooth boundary, conservation over $V$ asserts that the rate of change of the total amount in $V$ equals minus the outward flux through $\partial V$.
[definition: Integral Conservation Law]
Let $U \subset \mathbb R^n$ be open, let $T>0$, and let $f: \mathbb R \to \mathbb R^n$. A function $u: (0,T)\times U \to \mathbb R$ satisfies the integral conservation law with flux $f$ if, for every bounded open set $V \subset U$ with smooth boundary and every time $t\in(0,T)$ at which both sides are defined,
\begin{align*}
\frac{d}{dt}\int_V u(t,x)\,d\mathcal L^n(x)
= -\int_{\partial V} f(u(t,x))\cdot n(x)\,d\mathcal H^{n-1}(x),
\end{align*}
where $n(x)$ is the outward unit normal to $\partial V$ at $x$, $d\mathcal L^n$ denotes $n$-dimensional Lebesgue measure, and $d\mathcal H^{n-1}$ denotes surface measure on $\partial V$.
[/definition]
The definition records conservation before differentiating in $x$: it is phrased in terms of amounts and boundary fluxes, so it still has conceptual meaning when a density develops sharp fronts. For smooth densities, however, we want to know whether this bookkeeping law agrees with the PDE form used in the characteristic method. The next theorem is the bridge from the physical balance law to the divergence-form equation.
[quotetheorem:6149]
[citeproof:6149]
This theorem explains why conservation laws are written in divergence form. Each hypothesis has a specific role. The assumption $u\in C^1$ is what permits differentiating the integral in time and using the chain rule inside $\operatorname{div}_x f(u)$; a step function in $x$ already shows that a conserved profile may have no pointwise derivative at its jump. The smoothness of $V$ is used for the classical [divergence theorem](/theorems/3614); rough regions require a more careful trace theory. The theorem does not say that every conserved discontinuous profile satisfies the pointwise PDE, and it does not decide which discontinuities are physically admissible. Its scope is exactly the smooth regime where integral bookkeeping and differential form are equivalent.
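In one space dimension the balance law can be tested directly on an explicit smooth solution. The sketch below is only an illustration under stated assumptions: it uses a linear flux $f(u)=Cu$ with a hypothetical constant $C$, for which a translated Gaussian is a classical solution, and takes $V=(a,b)$, so the outward flux is $f(u(t,b))-f(u(t,a))$. The function names and step sizes are not from the text.

```python
import math

C = 1.5  # hypothetical constant transport speed, so the flux is f(u) = C*u

def u(t, x):
    """Smooth solution of u_t + (C*u)_x = 0: a Gaussian translated at speed C."""
    return math.exp(-(x - C * t) ** 2)

def flux(value):
    return C * value

def amount(t, a, b, n=2000):
    """Midpoint-rule approximation of the total amount in V = (a, b) at time t."""
    dx = (b - a) / n
    return sum(u(t, a + (j + 0.5) * dx) for j in range(n)) * dx

def balance_defect(t, a, b, dt=1e-4):
    """Rate of change of the amount in (a, b) plus the net outward flux."""
    rate = (amount(t + dt, a, b) - amount(t - dt, a, b)) / (2 * dt)
    outward_flux = flux(u(t, b)) - flux(u(t, a))
    return rate + outward_flux

defect = balance_defect(0.4, 0.0, 2.0)
print(defect)  # close to zero when the integral balance holds
```

The defect is small but not exactly zero because both the spatial integral and the time derivative are approximated.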
[example: Burgers Equation in Conservation Form]
For the inviscid Burgers flux in one space dimension,
\begin{align*}
f(u)=\frac{u^2}{2},
\end{align*}
the divergence-form conservation law $\partial_tu+\partial_x f(u)=0$ becomes
\begin{align*}
\partial_t u+\partial_x\left(\frac{u^2}{2}\right)=0.
\end{align*}
Assume now that $u$ is $C^1$. Since multiplication by the constant $\frac12$ commutes with differentiation,
\begin{align*}
\partial_x\left(\frac{u^2}{2}\right)=\frac{1}{2}\partial_x(u^2).
\end{align*}
Writing $u^2$ as the product $u\cdot u$,
\begin{align*}
\partial_x(u^2)=\partial_x(u\cdot u).
\end{align*}
The product rule gives
\begin{align*}
\partial_x(u\cdot u)=(\partial_xu)u+u(\partial_xu).
\end{align*}
Since multiplication of real-valued functions is commutative, this is
\begin{align*}
(\partial_xu)u+u(\partial_xu)=u\,\partial_xu+u\,\partial_xu=2u\,\partial_xu.
\end{align*}
Therefore
\begin{align*}
\partial_x(u^2)=2u\,\partial_xu.
\end{align*}
Substituting this into the derivative of $\frac{u^2}{2}$ gives
\begin{align*}
\partial_x\left(\frac{u^2}{2}\right)=\frac12\bigl(2u\,\partial_xu\bigr).
\end{align*}
Canceling the factor $2$ with $\frac12$ yields
\begin{align*}
\partial_x\left(\frac{u^2}{2}\right)=u\,\partial_xu.
\end{align*}
Thus, wherever $u$ is $C^1$, Burgers equation in conservation form is equivalent to
\begin{align*}
\partial_t u+u\,\partial_xu=0.
\end{align*}
Let $t\mapsto x(t)$ be a smooth curve whose velocity equals the value of $u$ along the curve:
\begin{align*}
\frac{dx}{dt}=u(t,x(t)).
\end{align*}
By the chain rule for the composition $t\mapsto u(t,x(t))$,
\begin{align*}
\frac{d}{dt}u(t,x(t))=\partial_tu(t,x(t))+\frac{dx}{dt}\,\partial_xu(t,x(t)).
\end{align*}
Substituting $\frac{dx}{dt}=u(t,x(t))$ gives
\begin{align*}
\frac{d}{dt}u(t,x(t))=\partial_tu(t,x(t))+u(t,x(t))\,\partial_xu(t,x(t)).
\end{align*}
Since the smooth Burgers equation holds at the point $(t,x(t))$,
\begin{align*}
\partial_tu(t,x(t))+u(t,x(t))\,\partial_xu(t,x(t))=0.
\end{align*}
Hence
\begin{align*}
\frac{d}{dt}u(t,x(t))=0.
\end{align*}
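Therefore $u$ is constant along each smooth characteristic, and the characteristic speed is that same constant value of $u$, so each characteristic is a straight line in the $(t,x)$ plane. Larger values of $u$ travel faster than smaller values, so wherever the initial profile is decreasing, faster characteristics start behind slower ones and catch up with them in finite time. Characteristics can therefore collide and form a discontinuity even when the initial profile is smooth.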
[/example]
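Since $u$ is constant along characteristics, the characteristic through $(0,x_0)$ is the straight line $x=x_0+u_0(x_0)t$, where $u_0$ is the initial profile. Two such lines starting at $a<b$ meet at time $(b-a)/(u_0(a)-u_0(b))$ when the rear one is faster. The following sketch computes this crossing time; the Gaussian profile and the function names are illustrative assumptions.

```python
import math

def initial_profile(x):
    """Hypothetical smooth initial profile u_0, chosen only for illustration."""
    return math.exp(-x * x)

def characteristic_position(x0, t):
    """Position at time t of the characteristic that starts at x0."""
    return x0 + initial_profile(x0) * t

def crossing_time(a, b):
    """Time at which the characteristics from a < b meet, or None if they never do."""
    speed_gap = initial_profile(a) - initial_profile(b)
    if speed_gap <= 0.0:
        return None  # the rear characteristic is not faster, so it never catches up
    return (b - a) / speed_gap

t_cross = crossing_time(0.0, 1.0)     # profile decreases from x=0 to x=1
t_none = crossing_time(-1.0, 0.0)     # profile increases from x=-1 to x=0

print(t_cross, characteristic_position(0.0, t_cross), characteristic_position(1.0, t_cross))
print(t_none)
```

At the crossing time the two characteristics carry different values of $u$ to the same point, which is the sense in which the classical solution stops being single-valued.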
The Burgers example is the model for the rest of the chapter. The equation has a classical meaning before characteristics intersect, but the conservation principle still has a meaningful interpretation after the classical derivative has broken down.
## Test Functions and Weak Formulations
Once discontinuities appear, the pointwise equation is no longer the right object. The guiding question becomes: what identity survives after multiplying by a smooth probe function and integrating over spacetime? Test functions allow derivatives to be transferred from the unknown solution onto the probe, where they are harmless.
We use test functions compactly supported in spacetime so that no boundary terms appear at spatial infinity or at the endpoints of the time interval. The price is that the PDE is no longer read point by point; it is read through all possible probes.
[definition: Weak Solution of a Scalar Conservation Law]
Let $U \subset \mathbb R^n$ be open, let $T>0$, and let $f\in C^1(\mathbb R;\mathbb R^n)$. A function $u\in L^\infty_{\mathrm{loc}}((0,T)\times U)$ is a weak solution of
\begin{align*}
\partial_t u+\operatorname{div}_x f(u)=0
\end{align*}
on $(0,T)\times U$ if, for every $\phi\in C_c^\infty((0,T)\times U)$,
\begin{align*}
\int_0^{\,T}\int_U \left(u\,\partial_t\phi+f(u)\cdot \nabla\phi\right)\,d\mathcal L^n(x)\,dt=0.
\end{align*}
[/definition]
The definition is arranged so that the derivatives land on $\phi$, not on $u$. The local boundedness hypothesis ensures that $f(u)$ is locally bounded, hence the integral is finite on the compact support of $\phi$.
[quotetheorem:6150]
[citeproof:6150]
This result justifies the weak definition: it extends the classical equation without changing it when classical solutions exist. The compact support of $\phi$ is essential here because it removes boundary terms at $t=0$, $t=T$, and $\partial U$; if test functions were allowed to touch those boundaries, initial and boundary data would have to appear in the identity. The $C^1$ hypothesis on $u$ is used only for the equivalence with the pointwise PDE, not for the definition of weak solution itself. A discontinuous shock profile for Burgers equation may satisfy the weak identity while failing to satisfy $\partial_tu+u\partial_xu=0$ pointwise at the shock. The theorem also does not provide uniqueness or admissibility: for Riemann data, entropy-violating weak shocks can satisfy the same integral identity.
[remark: Distributional Reading]
If $u\in L^\infty_{\mathrm{loc}}((0,T)\times U)$, then $u$ and each component of $f(u)$ define regular distributions. The weak solution identity says that the distribution
\begin{align*}
\partial_t T_u+\sum_{i=1}^n \partial_{x_i}T_{f_i(u)}
\end{align*}
is zero on $(0,T)\times U$. This is the same equation as the classical divergence form, interpreted in $\mathcal D'((0,T)\times U)$.
[/remark]
Distribution notation is compact, but the test-function identity is usually the working definition because it shows exactly what has to be checked.
[example: Checking a Piecewise Smooth Weak Solution]
Let $s:(0,T)\to\mathbb R$ be a $C^1$ curve, let $\phi\in C_c^\infty((0,T)\times\mathbb R)$, and split spacetime into $\Omega_-=\{(t,x):x<s(t)\}$ and $\Omega_+=\{(t,x):x>s(t)\}$. Assume $u$ is $C^1$ on each side, extends continuously up to the curve from each side, and satisfies $\partial_tu+\partial_x f(u)=0$ separately on $\Omega_-$ and $\Omega_+$. On either region, the product rule gives
\begin{align*}
\partial_t(u\phi)=(\partial_tu)\phi+u\,\partial_t\phi.
\end{align*}
Solving for the term containing $\partial_t\phi$ gives
\begin{align*}
u\,\partial_t\phi=\partial_t(u\phi)-(\partial_tu)\phi.
\end{align*}
The product rule also gives
\begin{align*}
\partial_x(f(u)\phi)=(\partial_x f(u))\phi+f(u)\,\partial_x\phi.
\end{align*}
Solving for the term containing $\partial_x\phi$ gives
\begin{align*}
f(u)\,\partial_x\phi=\partial_x(f(u)\phi)-(\partial_x f(u))\phi.
\end{align*}
Adding the two identities,
\begin{align*}
u\,\partial_t\phi+f(u)\,\partial_x\phi=\partial_t(u\phi)+\partial_x(f(u)\phi)-(\partial_tu+\partial_x f(u))\phi.
\end{align*}
Since $\partial_tu+\partial_x f(u)=0$ inside $\Omega_-$ and inside $\Omega_+$, this reduces on each side to
\begin{align*}
u\,\partial_t\phi+f(u)\,\partial_x\phi=\partial_t(u\phi)+\partial_x(f(u)\phi).
\end{align*}
Let the jump curve be $\Gamma=\{(t,s(t)):0<t<T\}$, and write
\begin{align*}
u_-(t)=\lim_{x\uparrow s(t)}u(t,x).
\end{align*}
Similarly,
\begin{align*}
u_+(t)=\lim_{x\downarrow s(t)}u(t,x).
\end{align*}
Because $\phi$ has compact support in $(0,T)\times\mathbb R$, applying the spacetime divergence theorem to the vector field $(u\phi,f(u)\phi)$ on $\Omega_-$ and $\Omega_+$ leaves no outer boundary contribution. Only the boundary terms on $\Gamma$ remain.
For $\Omega_-$, set $F(t,x)=x-s(t)$. Then $\Omega_-=\{F<0\}$, so the outward unit normal along $\Gamma$ is
\begin{align*}
\nu_-=\frac{\nabla_{t,x}F}{|\nabla_{t,x}F|}=\frac{(-s'(t),1)}{\sqrt{1+(s'(t))^2}}.
\end{align*}
The line element along the parametrized curve $t\mapsto (t,s(t))$ is
\begin{align*}
d\mathcal H^1=\sqrt{1+(s'(t))^2}\,dt.
\end{align*}
Thus the contribution from the left side is
\begin{align*}
\int_\Gamma (u_-\phi,f(u_-)\phi)\cdot \nu_-\,d\mathcal H^1=\int \left(-s'(t)u_-(t)+f(u_-(t))\right)\phi(t,s(t))\,dt.
\end{align*}
For $\Omega_+$, the outward normal points toward $\{F<0\}$, so
\begin{align*}
\nu_+=-\frac{\nabla_{t,x}F}{|\nabla_{t,x}F|}=\frac{(s'(t),-1)}{\sqrt{1+(s'(t))^2}}.
\end{align*}
Therefore the contribution from the right side is
\begin{align*}
\int_\Gamma (u_+\phi,f(u_+)\phi)\cdot \nu_+\,d\mathcal H^1=\int \left(s'(t)u_+(t)-f(u_+(t))\right)\phi(t,s(t))\,dt.
\end{align*}
Adding the left and right boundary contributions gives
\begin{align*}
\int \left(-s'(t)u_-(t)+f(u_-(t))\right)\phi(t,s(t))\,dt+\int \left(s'(t)u_+(t)-f(u_+(t))\right)\phi(t,s(t))\,dt.
\end{align*}
Combining the two integrands,
\begin{align*}
-s'(t)u_-(t)+f(u_-(t))+s'(t)u_+(t)-f(u_+(t))=s'(t)(u_+(t)-u_-(t))-\bigl(f(u_+(t))-f(u_-(t))\bigr).
\end{align*}
Hence the total interface contribution is
\begin{align*}
\int \left(s'(t)(u_+(t)-u_-(t))-\bigl(f(u_+(t))-f(u_-(t))\bigr)\right)\phi(t,s(t))\,dt.
\end{align*}
The weak identity holds for every test function exactly when this integral is zero for every $\phi\in C_c^\infty((0,T)\times\mathbb R)$. Since any compactly supported smooth function $\psi(t)$ can be realized on the curve by taking $\phi(t,x)=\psi(t)\eta(x-s(t))$ with $\eta\in C_c^\infty(\mathbb R)$ and $\eta(0)=1$, the coefficient of $\phi(t,s(t))$ must vanish along the curve:
\begin{align*}
s'(t)(u_+(t)-u_-(t))=f(u_+(t))-f(u_-(t)).
\end{align*}
The weak formulation converts the moving jump into a local balance between the jump speed and the jump in flux.
[/example]
The coefficient obtained in this example is the jump condition. It is the local rule that tells a discontinuity how fast it must move if it is to preserve the conservation law.
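The identity derived in the example can be tested numerically for a piecewise constant Burgers profile. The sketch below is a rough check under stated assumptions: the test function is a hypothetical product of two smooth bumps supported inside $(0,2)\times(-1,2)$, the jump moves along $x=st$, and the integrals are approximated by the midpoint rule. It evaluates the spacetime integral of $u\,\partial_t\phi+f(u)\,\partial_x\phi$ and compares it with the interface integral from the example.

```python
import math

def bump(z):
    """Smooth bump supported in (-1, 1)."""
    return math.exp(-1.0 / (1.0 - z * z)) if abs(z) < 1.0 else 0.0

def dbump(z):
    """Derivative of bump."""
    if abs(z) >= 1.0:
        return 0.0
    return bump(z) * (-2.0 * z) / (1.0 - z * z) ** 2

# Hypothetical test function phi(t, x) = bump((t - 1)/0.8) * bump((x - 0.5)/1.2).
def phi(t, x):
    return bump((t - 1.0) / 0.8) * bump((x - 0.5) / 1.2)

def phi_t(t, x):
    return dbump((t - 1.0) / 0.8) / 0.8 * bump((x - 0.5) / 1.2)

def phi_x(t, x):
    return bump((t - 1.0) / 0.8) * dbump((x - 0.5) / 1.2) / 1.2

def burgers(u):
    return 0.5 * u * u

def weak_residual(u_left, u_right, speed, flux, n=200):
    """Midpoint-rule value of the spacetime integral of u*phi_t + f(u)*phi_x."""
    t_end, x_min, x_max = 2.0, -1.0, 2.0
    dt = t_end / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        shock = speed * t  # assumed to stay inside (x_min, x_max)
        # Integrate each side of the jump separately so every integrand is smooth.
        for a, b, u in ((x_min, shock, u_left), (shock, x_max, u_right)):
            dx = (b - a) / n
            for j in range(n):
                x = a + (j + 0.5) * dx
                total += (u * phi_t(t, x) + flux(u) * phi_x(t, x)) * dx * dt
    return total

def interface_term(u_left, u_right, speed, flux, n=200):
    """Integral over t of (s*(u+ - u-) - (f(u+) - f(u-))) * phi(t, s*t)."""
    coefficient = speed * (u_right - u_left) - (flux(u_right) - flux(u_left))
    dt = 2.0 / n
    return sum(coefficient * phi((i + 0.5) * dt, speed * (i + 0.5) * dt) * dt
               for i in range(n))

right_speed = weak_residual(1.0, 0.0, 0.5, burgers)  # speed from the jump relation
wrong_speed = weak_residual(1.0, 0.0, 0.0, burgers)  # stationary jump
print(right_speed, wrong_speed, interface_term(1.0, 0.0, 0.0, burgers))
```

With the speed $s=1/2$ the residual is zero up to quadrature error, while the stationary jump leaves a residual that matches the interface integral. A single test function cannot prove the weak identity, but a nonzero residual is enough to disprove it.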
## Moving Discontinuities and the Rankine-Hugoniot Condition
A weak formulation permits discontinuities, but it does not permit arbitrary discontinuities. The next question is local and geometric: if a solution has a jump across a moving hypersurface, what speed is compatible with conservation? In one space dimension the answer is an algebraic relation between the shock speed and the jump in flux.
[definition: Shock Joining Two States]
Let $f: \mathbb R\to\mathbb R$ be a $C^1$ map, and let $u_-,u_+\in \mathbb R$. A piecewise constant function $u: (0,T)\times\mathbb R\to\mathbb R$ given by $u(t,x)=u_-$ for $x<st$ and $u(t,x)=u_+$ for $x>st$ is called a shock joining the left state $u_-$ to the right state $u_+$ with speed $s$.
[/definition]
The definition records a candidate discontinuous profile, not yet a solution. The weak formulation from the previous section now gives a way to test it: all interior derivatives vanish on the constant regions, so the entire obstruction is concentrated on the moving line $x=st$. Conservation selects the allowed value of $s$ by requiring that this line contribution cancel.
[explanation: Rankine-Hugoniot Jump Condition]
For a scalar conservation law $u_t+f(u)_x=0$, suppose a piecewise smooth solution has left and right traces $u^-(t)$ and $u^+(t)$ across a $C^1$ interface $x=\xi(t)$. Conservation across the moving interface forces
\begin{align*}
\dot{\xi}(t)\bigl(u^+(t)-u^-(t)\bigr)=f(u^+(t))-f(u^-(t)).
\end{align*}
For a constant-speed shock $x=st$, this becomes $s(u_+-u_-)=f(u_+)-f(u_-)$.
[/explanation]
The condition says that shock speed is the secant slope of the flux between the two states. The piecewise constant hypothesis is what makes the computation purely a boundary calculation: away from $x=st$, both $u$ and $f(u)$ are constant, so there are no interior residuals. The assumption that $f: \mathbb R\to\mathbb R$ is $C^1$ is stronger than is needed for this algebraic jump calculation, but it keeps the condition in the same class of fluxes used for the surrounding conservation law theory. If the speed is chosen incorrectly, the weak identity fails; for Burgers flux, the jump $u_-=1$, $u_+=0$ with speed $s=0$ leaves the nonzero coefficient $s(u_+-u_-)-\bigl(f(u_+)-f(u_-)\bigr)=\tfrac12$ on the shock line. The condition does not say that a Rankine-Hugoniot shock is physically correct: entropy conditions will later rule out some weak shocks. If $u_-\ne u_+$, dividing by the jump $u_+-u_-$ gives the speed explicitly:
\begin{align*}
s=\frac{f(u_+)-f(u_-)}{u_+-u_-}.
\end{align*}
[example: Burgers Shock Joining Two Constant States]
For the Burgers flux
\begin{align*}
f(u)=\frac{u^2}{2},
\end{align*}
the *Rankine-Hugoniot Condition* says that, when $u_-\ne u_+$, the shock speed is
\begin{align*}
s=\frac{f(u_+)-f(u_-)}{u_+-u_-}.
\end{align*}
Substituting $f(u)=u^2/2$ into the numerator gives
\begin{align*}
f(u_+)-f(u_-)=\frac{u_+^2}{2}-\frac{u_-^2}{2}.
\end{align*}
Since the two terms have the same denominator,
\begin{align*}
\frac{u_+^2}{2}-\frac{u_-^2}{2}=\frac{u_+^2-u_-^2}{2}.
\end{align*}
Therefore
\begin{align*}
s=\frac{\frac{u_+^2-u_-^2}{2}}{u_+-u_-}
=\frac{u_+^2-u_-^2}{2(u_+-u_-)}.
\end{align*}
The difference of squares identity gives
\begin{align*}
u_+^2-u_-^2=(u_+-u_-)(u_++u_-),
\end{align*}
so
\begin{align*}
s=\frac{(u_+-u_-)(u_++u_-)}{2(u_+-u_-)}.
\end{align*}
Because $u_-\ne u_+$, we have $u_+-u_-\ne 0$, and cancellation of this nonzero factor gives
\begin{align*}
s=\frac{u_++u_-}{2}=\frac{u_-+u_+}{2}.
\end{align*}
For a jump from $u_-=2$ to $u_+=0$, this formula gives
\begin{align*}
s=\frac{2+0}{2}=\frac{2}{2}=1.
\end{align*}
In the smooth Burgers equation, the characteristic speed is $u$ itself. Indeed, for $C^1$ functions $u$,
\begin{align*}
\partial_x f(u)
=\partial_x\left(\frac{u^2}{2}\right)
=\frac12\,\partial_x(u^2)
=\frac12\left((\partial_x u)u+u(\partial_xu)\right)
=\frac12\left(2u\,\partial_xu\right)
=u\,\partial_xu.
\end{align*}
Thus the conservation law
\begin{align*}
\partial_tu+\partial_x f(u)=0
\end{align*}
becomes
\begin{align*}
\partial_tu+u\,\partial_xu=0
\end{align*}
where $u$ is smooth, so characteristics move with velocity $u$. The left state $u_-=2$ sends characteristics with speed $2$, while the right state $u_+=0$ sends characteristics with speed $0$. Since
\begin{align*}
0<1<2,
\end{align*}
the shock speed lies between the two characteristic speeds: left characteristics overtake the discontinuity, and the discontinuity overtakes right characteristics. Characteristics enter the jump from both sides, which is the compressive shock pattern that the entropy condition will select later.
[/example]
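The statement that characteristics enter the jump from both sides can be made concrete by computing when a given characteristic meets the shock line $x=st$. The sketch below is illustrative only; the function names and starting points are assumptions. It treats the jump from $2$ to $0$ of the example and, for contrast, the reversed jump from $0$ to $2$, which has the same Rankine-Hugoniot speed.

```python
def burgers_flux(u):
    return 0.5 * u * u

def shock_speed(flux, u_left, u_right):
    """Rankine-Hugoniot speed: secant slope of the flux between the two states."""
    if u_left == u_right:
        raise ValueError("no jump: the two states coincide")
    return (flux(u_right) - flux(u_left)) / (u_right - u_left)

def impact_time(x0, characteristic_speed, s):
    """Positive time at which x0 + c*t meets the shock x = s*t, else None."""
    if characteristic_speed == s:
        return None
    t = x0 / (s - characteristic_speed)
    return t if t > 0 else None

# Jump from 2 to 0: for Burgers flux the characteristic speed equals the state.
s = shock_speed(burgers_flux, 2.0, 0.0)
left_hit = impact_time(-1.0, 2.0, s)   # characteristic starting left of the jump
right_hit = impact_time(1.0, 0.0, s)   # characteristic starting right of the jump

# Reversed jump from 0 to 2: same speed, but characteristics move away from it.
s_reversed = shock_speed(burgers_flux, 0.0, 2.0)
left_miss = impact_time(-1.0, 0.0, s_reversed)
right_miss = impact_time(1.0, 2.0, s_reversed)

print(s, left_hit, right_hit)
print(s_reversed, left_miss, right_miss)
```

Both impact times are positive for the jump from $2$ to $0$, while neither characteristic ever reaches the reversed jump. The jump relation alone does not distinguish the two cases, since it gives the same speed for both.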
The Burgers calculation turns the jump condition into a concrete speed, but it also points to a broader geometric principle. In more than one space dimension a discontinuity is not a moving point but a moving hypersurface, so the correct balance must involve the normal component of spacetime flux. The next theorem is the multidimensional form of the same boundary-term cancellation used in the one-dimensional proof.
[quotetheorem:6151]
[citeproof:6151]
This higher-dimensional form is best read as continuity of normal spacetime flux across the jump. The added classical-solvability hypothesis away from $\Sigma$ is necessary: if $u$ has a smooth residual $\partial_tu+\operatorname{div}_x f(u)\ne0$ on one side, then the weak formulation contains interior terms as well as the hypersurface term, so the jump relation alone cannot make $u$ a weak solution. Smoothness of $\Sigma$ supplies a well-defined normal and trace; corners or fractal interfaces require a different formulation. The theorem is also only a necessary condition in this stated form: it identifies the singular contribution along the jump, but it does not by itself impose initial data, boundary data, or entropy admissibility. In one space dimension, the line $x=s t$ has spacetime normal proportional to $(-s,1)$, and the formula reduces to the scalar Rankine-Hugoniot relation.
[illustration:pdei-rankine-hugoniot-shock]
## Flux Shape and Physical Examples
The weak formulation and the jump condition are formal tools; their interpretation depends on the flux. A useful question is how the graph of $f$ controls propagation speeds, shock speeds, and the qualitative meaning of discontinuities.
[example: Traffic Density with Concave Flux]
Let $u(t,x)$ denote car density on a one-dimensional road, normalized so that $0\le u\le 1$, and take the traffic flux
\begin{align*}
f(u)=u(1-u).
\end{align*}
Expanding the product gives
\begin{align*}
f(u)=u\cdot 1-u\cdot u=u-u^2.
\end{align*}
At the empty-road density,
\begin{align*}
f(0)=0(1-0)=0\cdot 1=0.
\end{align*}
At the jammed-road density,
\begin{align*}
f(1)=1(1-1)=1\cdot 0=0.
\end{align*}
To check concavity on $[0,1]$, differentiate $f(u)=u-u^2$:
\begin{align*}
f'(u)=\frac{d}{du}(u)-\frac{d}{du}(u^2)=1-2u.
\end{align*}
Differentiating once more,
\begin{align*}
f''(u)=\frac{d}{du}(1-2u)=0-2=-2.
\end{align*}
Since $-2<0$, the graph of $f$ is concave on $[0,1]$. Thus the conservation law $\partial_tu+\partial_x f(u)=0$ becomes
\begin{align*}
\partial_tu+\partial_x\bigl(u(1-u)\bigr)=0.
\end{align*}
Now consider a jump from $u_-=0.8$ to $u_+=0.2$. By the *Rankine-Hugoniot Condition*, because $u_+\ne u_-$, the shock speed is
\begin{align*}
s=\frac{f(u_+)-f(u_-)}{u_+-u_-}.
\end{align*}
The right-state flux is
\begin{align*}
f(0.2)=0.2(1-0.2)=0.2(0.8).
\end{align*}
Writing $0.2=\frac{2}{10}$ and $0.8=\frac{8}{10}$,
\begin{align*}
0.2(0.8)=\frac{2}{10}\cdot\frac{8}{10}=\frac{16}{100}=0.16.
\end{align*}
Hence
\begin{align*}
f(0.2)=0.16.
\end{align*}
The left-state flux is
\begin{align*}
f(0.8)=0.8(1-0.8)=0.8(0.2).
\end{align*}
Using the same multiplication,
\begin{align*}
0.8(0.2)=\frac{8}{10}\cdot\frac{2}{10}=\frac{16}{100}=0.16,
\end{align*}
so
\begin{align*}
f(0.8)=0.16.
\end{align*}
Substituting these values into the jump-speed formula gives
\begin{align*}
s=\frac{0.16-0.16}{0.2-0.8}.
\end{align*}
The numerator is
\begin{align*}
0.16-0.16=0,
\end{align*}
and the denominator is
\begin{align*}
0.2-0.8=\frac{2}{10}-\frac{8}{10}=-\frac{6}{10}=-0.6.
\end{align*}
Therefore
\begin{align*}
s=\frac{0}{-0.6}=0.
\end{align*}
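The interface is stationary because the flux entering from the left equals the flux leaving on the right, even though the traffic densities on the two sides are different. This computation uses only the Rankine-Hugoniot relation; whether such a jump is admissible is a separate question, taken up in Chapter 6.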
[/example]
This example shows that shock speed is not the velocity of individual particles. It is the velocity of a macroscopic interface, determined by conservation of the total amount of traffic.
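The difference between interface speed and particle speed can be seen in exact arithmetic. The sketch below assumes the usual reading of the traffic flux as density times velocity, so that cars at density $u$ move with velocity $v(u)=1-u$; this velocity law and the function names are assumptions introduced for illustration. For this flux the secant slope simplifies to $s=1-u_--u_+$, which the code also checks.

```python
from fractions import Fraction

def traffic_flux(u):
    return u * (1 - u)

def shock_speed(u_left, u_right):
    """Rankine-Hugoniot speed for the traffic flux."""
    return (traffic_flux(u_right) - traffic_flux(u_left)) / (u_right - u_left)

def car_velocity(u):
    """Assumed velocity law v(u) = 1 - u, so that flux = density * velocity."""
    return 1 - u

u_left, u_right = Fraction(4, 5), Fraction(1, 5)
s = shock_speed(u_left, u_right)

print(s)                      # speed of the interface
print(car_velocity(u_left))   # speed of cars on the dense side
print(car_velocity(u_right))  # speed of cars on the sparse side
print(s == 1 - u_left - u_right)
```

Using `Fraction` avoids the rounding noise that decimal inputs such as `0.8` and `0.2` would introduce into a numerator that is exactly zero.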
[remark: What Weak Solutions Remember]
A weak solution remembers the integral conservation law, not the characteristic construction. Characteristics remain a useful diagnostic in smooth regions, but after crossing they no longer define a single-valued classical solution. The weak formulation keeps the conserved quantity and discards the requirement that derivatives exist pointwise across the shock.
[/remark]
The chapter ends with the main lesson: divergence form is the correct form for passing from classical conservation to weak conservation. The Rankine-Hugoniot condition is the first compatibility law for discontinuities, and it prepares the ground for entropy conditions, where admissibility rather than mere weak solvability becomes the central issue.
Once shocks appear, weak formulations are no longer enough to pick out a physically meaningful solution. Chapter 6 adds entropy and admissibility to the weak theory, giving the extra selection principle needed to control discontinuous conservation laws.
# 6. Entropy Conditions and Admissibility
This chapter studies what remains of the Cauchy problem for scalar conservation laws after the classical solution has broken down. In Chapters 3 and 5, characteristics explained smooth propagation and the formation of shocks, while weak formulations preserved conservation across jumps; after characteristics cross, the weak formulation alone no longer selects a physical solution. The main question is therefore an admissibility question: among many weak solutions with the same initial data, which one is selected by compression, viscosity, and stability?
For most of the chapter we consider a scalar conservation law
\begin{align*}
u_t + f(u)_x = 0, \qquad u(0,x)=u_0(x),
\end{align*}
where $f \in C^2(\mathbb R)$ is the flux. The model case is Burgers equation
\begin{align*}
u_t + \left(\frac{u^2}{2}\right)_x=0,
\end{align*}
where characteristic speed is $f'(u)=u$.
## Nonuniqueness After Shock Formation
The first problem is to say what the conservation law means once $u$ has a jump discontinuity. Classical derivatives no longer exist along the shock curve, but the conserved quantity should still satisfy an integral balance against test functions.
[definition: Weak Solution Of A Scalar Conservation Law]
Let $u \in L^\infty_{\mathrm{loc}}((0,\infty)\times \mathbb R)$ and let $u_0 \in L^\infty_{\mathrm{loc}}(\mathbb R)$. The function $u$ is a weak solution of $u_t+f(u)_x=0$ with initial data $u_0$ if, for every $\phi \in C_c^\infty([0,\infty)\times \mathbb R)$,
\begin{align*}
\int_0^\infty \int_{\mathbb R} \left(u\phi_t + f(u)\phi_x\right)\,dx\,dt + \int_{\mathbb R} u_0(x)\phi(0,x)\,dx = 0.
\end{align*}
[/definition]
The weak formulation is obtained by multiplying the conservation law by a test function and integrating by parts. It permits discontinuities, but it also raises a new question: if a discontinuity separates two constant states, what speed is compatible with conservation across the moving interface?
[explanation: Rankine-Hugoniot Jump Condition]
For a scalar conservation law $u_t+f(u)_x=0$, suppose a piecewise smooth solution has left and right traces $u^-(t)$ and $u^+(t)$ across a $C^1$ interface $x=\xi(t)$. Conservation across the moving interface forces
\begin{align*}
\dot{\xi}(t)\bigl(u^+(t)-u^-(t)\bigr)=f(u^+(t))-f(u^-(t)).
\end{align*}
For a constant-speed shock $x=st$, this becomes $s(u_+-u_-)=f(u_+)-f(u_-)$.
[/explanation]
The jump condition is necessary for weak solutions, but it is not an admissibility criterion. The piecewise $C^1$ and trace hypotheses are what make the boundary calculation meaningful; without well-defined left and right traces along the curve, the expression $[f(u)]/[u]$ may not even have a value. The single-jump assumption is a local simplification: with several shocks the same condition holds on each sufficiently regular shock curve, but interactions require additional bookkeeping. Most importantly, the condition does not select a unique weak solution: for Burgers Riemann data with $u_L<u_R$, both an upward jump moving at Rankine-Hugoniot speed and the rarefaction fan satisfy conservation in the weak sense, but only the rarefaction is admissible. This failure of selection is why the rest of this chapter imposes entropy conditions rather than another conservation identity.
[example: Two Burgers Shocks With The Same Jump Law]
For Burgers equation $f(u)=u^2/2$, take Riemann data $u_0(x)=u_L$ for $x<0$ and $u_0(x)=u_R$ for $x>0$, with $u_L\ne u_R$. A single discontinuity $x=\sigma t$ joining $u_L$ to $u_R$ must satisfy the [Rankine-Hugoniot jump condition](/theorems/232), so
\begin{align*}
\sigma = \frac{f(u_L)-f(u_R)}{u_L-u_R}.
\end{align*}
Substituting $f(u)=u^2/2$ gives
\begin{align*}
\sigma = \frac{\frac{u_L^2}{2}-\frac{u_R^2}{2}}{u_L-u_R}.
\end{align*}
Factoring the numerator,
\begin{align*}
\sigma = \frac{u_L^2-u_R^2}{2(u_L-u_R)} = \frac{(u_L-u_R)(u_L+u_R)}{2(u_L-u_R)} = \frac{u_L+u_R}{2}.
\end{align*}
The characteristic speed for Burgers equation is $f'(u)=u$, so characteristics in the left constant state move with speed $u_L$ and characteristics in the right constant state move with speed $u_R$. If $u_L>u_R$, then
\begin{align*}
u_L-\sigma = u_L-\frac{u_L+u_R}{2} = \frac{u_L-u_R}{2} > 0.
\end{align*}
Also,
\begin{align*}
u_R-\sigma = u_R-\frac{u_L+u_R}{2} = \frac{u_R-u_L}{2} < 0.
\end{align*}
Thus left characteristics move faster than the discontinuity and right characteristics move slower than it, so characteristics from both sides run into the shock.
If instead $u_L<u_R$, then
\begin{align*}
u_L-\sigma = \frac{u_L-u_R}{2}<0.
\end{align*}
Similarly,
\begin{align*}
u_R-\sigma = \frac{u_R-u_L}{2}>0.
\end{align*}
The left characteristics fall behind the discontinuity and the right characteristics move ahead of it, so the same Rankine-Hugoniot jump law produces an expansive discontinuity rather than a compressive front.
[/example]
In the second case the discontinuity should be replaced by a rarefaction fan. This already shows the guiding principle: admissible discontinuities are compressive, while expansive discontinuities are rejected.
## Lax and Oleinik Entropy Conditions
Once nonuniqueness appears, the next problem is to phrase compression using only local information at the discontinuity. For a convex scalar flux, the geometry of characteristics gives a sharp test: the characteristic speed on the left must exceed the shock speed, and the shock speed must exceed the characteristic speed on the right.
[definition: Lax Entropy Shock]
Assume $f\in C^2(\mathbb R)$ and $f''(u)>0$ for all $u\in \mathbb R$. A jump from $u_-$ to $u_+$ with speed
\begin{align*}
\sigma = \frac{f(u_-)-f(u_+)}{u_- - u_+}
\end{align*}
is a Lax entropy shock if
\begin{align*}
f'(u_-) > \sigma > f'(u_+).
\end{align*}
[/definition]
The definition translates compression into inequalities between characteristic speeds. For convex fluxes, the Rankine-Hugoniot speed is a secant slope of $f$, while the incoming characteristic speeds are the endpoint tangent slopes. The admissibility question is therefore whether this secant lies between the two endpoint tangents in the correct order. Strict convexity turns that geometric comparison into a simple test on the left and right states, which is the form needed in Riemann problems.
[quotetheorem:6152]
[citeproof:6152]
The Lax test is local at a single shock, and the strict convexity hypothesis is doing real work. It ensures that $f'$ is strictly increasing and that each secant slope lies strictly between the endpoint characteristic speeds; for a nonconvex flux, a secant can cross regions of changing convexity, so the simple ordering $u_->u_+$ no longer characterises admissibility. The theorem also says nothing about rarefaction fans, shock interactions, or arbitrary weak solutions away from a single discontinuity. To control general weak solutions, Oleinik's condition gives a one-sided estimate that rules out upward jumps and imposes a quantitative decay on positive slopes.
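The role of strict convexity can be explored with a small experiment. The sketch below uses a hypothetical strictly convex flux $f(u)=u^2/2+u^4/4$ and checks, on a coarse grid of state pairs, that the Lax inequalities hold exactly when $u_->u_+$. It then evaluates the same inequalities for the nonconvex flux $f(u)=u^3$ at one pair of states. All names and sample values are illustrative assumptions.

```python
def convex_flux(u):
    """Hypothetical strictly convex flux: f''(u) = 1 + 3u^2 > 0."""
    return 0.5 * u**2 + 0.25 * u**4

def convex_flux_prime(u):
    return u + u**3

def cubic_flux(u):
    """Nonconvex flux with an inflection point at u = 0."""
    return u**3

def cubic_flux_prime(u):
    return 3.0 * u**2

def is_lax_shock(flux, flux_prime, u_minus, u_plus):
    """Check f'(u_-) > sigma > f'(u_+) with sigma the Rankine-Hugoniot speed."""
    sigma = (flux(u_minus) - flux(u_plus)) / (u_minus - u_plus)
    return flux_prime(u_minus) > sigma > flux_prime(u_plus)

states = [0.5 * k for k in range(-4, 5)]
mismatches = [
    (a, b)
    for a in states
    for b in states
    if a != b and is_lax_shock(convex_flux, convex_flux_prime, a, b) != (a > b)
]
print(mismatches)  # pairs where the Lax test and the ordering disagree

# For the cubic flux, the ordering u_- > u_+ does not give the Lax inequalities.
print(is_lax_shock(cubic_flux, cubic_flux_prime, 1.0, -1.0))
```

The cubic case only shows that the ordering of the states and the speed inequalities decouple once convexity is lost; it does not by itself identify the admissible solution for that flux.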
[definition: Kruzhkov Entropy Solution]
For the scalar conservation law $u_t+f(u)_x=0$, a bounded weak solution $u$ is a Kruzhkov entropy solution if, for every constant state $k\in\mathbb R$, it satisfies the entropy inequality with entropy density $|u-k|$ and entropy flux $\operatorname{sgn}(u-k)(f(u)-f(k))$: for every nonnegative $\phi\in C_c^\infty((0,\infty)\times\mathbb R)$,
\begin{align*}
\int_0^\infty\int_{\mathbb R}\left(|u-k|\,\phi_t+\operatorname{sgn}(u-k)\bigl(f(u)-f(k)\bigr)\phi_x\right)\,dx\,dt\ge 0.
\end{align*}
[/definition]
This entropy class is broad enough to include nonsmooth shock and rarefaction solutions, but it still needs quantitative structure. For convex fluxes, Oleinik's estimate supplies that structure by forbidding positive slopes from growing too steeply after positive time, which rules out expansive jumps in a way that can be checked directly on profiles.
[quotetheorem:6153]
[citeproof:6153]
Oleinik's estimate is stronger than rejecting a single expansive shock. The lower bound $f''\ge \alpha>0$ gives a uniform Riccati-type damping of positive slopes; if genuine convexity degenerates, as for a flux with an inflection point, positive slopes need not decay at the same universal rate and nonclassical wave patterns may appear. The estimate is one-sided: it controls upward variation but allows downward jumps, so it is not by itself a [compactness theorem](/theorems/2748) or a full uniqueness theorem. Its role here is diagnostic and quantitative, preparing the transition from geometric shock admissibility to integral entropy inequalities.
[example: Rarefaction Fan For Burgers Equation]
For Burgers equation with $u_L<u_R$, the rarefaction fan is the self-similar function defined, for $t>0$, by $u(t,x)=u_L$ when $x/t<u_L$, by $u(t,x)=x/t$ when $u_L\le x/t\le u_R$, and by $u(t,x)=u_R$ when $x/t>u_R$. In the two constant regions the derivative with respect to $x$ is $0$. Inside the fan,
\begin{align*}
u(t,x)=\frac{x}{t}.
\end{align*}
Differentiating with respect to $x$ at fixed $t>0$ gives
\begin{align*}
u_x(t,x)=\frac{1}{t}.
\end{align*}
For Burgers flux $f(u)=u^2/2$, we have $f'(u)=u$ and $f''(u)=1$, so the constant in the *[Oleinik One-Sided Estimate](/theorems/6153)* is $\alpha=1$. The estimate therefore becomes
\begin{align*}
u(t,y)-u(t,x)\le \frac{y-x}{t}
\end{align*}
for a.e. $x<y$. Inside the fan, equality holds:
\begin{align*}
u(t,y)-u(t,x)=\frac{y}{t}-\frac{x}{t}.
\end{align*}
Since $\frac{y}{t}-\frac{x}{t}=\frac{y-x}{t}$, the rarefaction fan saturates the one-sided bound.
The competing single-jump weak solution would move at Rankine-Hugoniot speed
\begin{align*}
\sigma=\frac{f(u_L)-f(u_R)}{u_L-u_R}.
\end{align*}
Substituting $f(u)=u^2/2$ gives
\begin{align*}
\sigma=\frac{\frac{u_L^2}{2}-\frac{u_R^2}{2}}{u_L-u_R}.
\end{align*}
Writing the numerator over a common denominator gives
\begin{align*}
\sigma=\frac{u_L^2-u_R^2}{2(u_L-u_R)}.
\end{align*}
Since $u_L^2-u_R^2=(u_L-u_R)(u_L+u_R)$ and $u_L\ne u_R$,
\begin{align*}
\sigma=\frac{u_L+u_R}{2}.
\end{align*}
Because $u_L<u_R$, this jump is upward: the value immediately to the right of the discontinuity is larger than the value immediately to the left. At a fixed time $t>0$, choose $x<\sigma t<y$. Then the jump solution has $u(t,x)=u_L$ and $u(t,y)=u_R$, so
\begin{align*}
u(t,y)-u(t,x)=u_R-u_L>0.
\end{align*}
But $y-x$ can be chosen arbitrarily small around $\sigma t$, so for sufficiently small $y-x$ one has
\begin{align*}
u_R-u_L>\frac{y-x}{t}.
\end{align*}
Thus the upward jump violates Oleinik's one-sided estimate, while the rarefaction fan satisfies it sharply.
[/example]
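The comparison in this example can also be checked numerically. The following is a minimal sketch, not part of the original argument: it samples both candidate solutions on a grid and tests the bound $u(t,y)-u(t,x)\le (y-x)/t$ for every sampled pair $x<y$. The function names, the grid, and the states $u_L=0$, $u_R=1$, $t=2$ are illustrative assumptions.

```python
def fan(t, x, u_left, u_right):
    """Burgers rarefaction fan for u_left < u_right, evaluated at time t > 0."""
    xi = x / t
    if xi < u_left:
        return u_left
    if xi > u_right:
        return u_right
    return xi


def upward_jump(t, x, u_left, u_right):
    """Single-jump weak solution moving at the Rankine-Hugoniot speed."""
    sigma = 0.5 * (u_left + u_right)
    return u_left if x < sigma * t else u_right


def oleinik_violations(profile, t, points, tol=1e-12):
    """Return sampled pairs x < y with u(t,y) - u(t,x) > (y - x)/t (alpha = 1)."""
    bad = []
    for i, x in enumerate(points):
        for y in points[i + 1:]:
            if profile(t, y) - profile(t, x) > (y - x) / t + tol:
                bad.append((x, y))
    return bad


# Illustrative data (assumed, not from the text): u_L = 0, u_R = 1, t = 2.
u_left, u_right, t = 0.0, 1.0, 2.0
points = [-1.0 + 0.05 * i for i in range(81)]  # sample grid on [-1, 3]

fan_bad = oleinik_violations(
    lambda time, x: fan(time, x, u_left, u_right), t, points)
jump_bad = oleinik_violations(
    lambda time, x: upward_jump(time, x, u_left, u_right), t, points)

print("fan violations:", len(fan_bad))
print("jump violations found:", len(jump_bad) > 0)
```

On this grid the fan produces no violating pairs, while the upward jump is flagged by every sampled pair that straddles the discontinuity closely enough.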
The rarefaction fan illustrates why admissibility cannot be stated as a preference for discontinuities. Sometimes the admissible solution is continuous and self-similar, even though the weak formulation also permits a discontinuity.
## Kruzhkov Entropies and Uniqueness
The final problem is uniqueness for arbitrary bounded data, not merely for a single Riemann jump. Kruzhkov's idea is to test the conservation law against every constant state $k$ and measure whether the distance from $u$ to $k$ is itself dissipated.
[definition: Kruzhkov Entropy Pair]
For $k\in \mathbb R$, the Kruzhkov entropy is the map
\begin{align*}
\eta_k: \mathbb R &\to [0,\infty), & u &\mapsto |u-k|,
\end{align*}
and the associated entropy flux is the map
\begin{align*}
q_k: \mathbb R &\to \mathbb R, & u &\mapsto \operatorname{sgn}(u-k)(f(u)-f(k)).
\end{align*}
[/definition]
These entropies compare the solution with all constant solutions. The next step is to impose their dissipation as an inequality, because equality is too rigid once shocks create entropy production.
[definition: Kruzhkov Entropy Solution]
Let $u\in L^\infty((0,\infty)\times \mathbb R)$ be a weak solution with initial data $u_0\in L^\infty(\mathbb R)$. The function $u$ is a Kruzhkov entropy solution if, for every $k\in\mathbb R$ and every nonnegative $\phi\in C_c^\infty((0,\infty)\times\mathbb R)$,
\begin{align*}
\int_0^\infty\int_{\mathbb R}\left(|u-k|\phi_t+\operatorname{sgn}(u-k)(f(u)-f(k))\phi_x\right)\,dx\,dt\ge 0.
\end{align*}
[/definition]
The sign of the inequality records dissipation. The reason to impose the condition for every $k$ is that it produces an $L^1$ [comparison principle](/theorems/4870) between two different solutions. This motivates the following uniqueness theorem.
[quotetheorem:579]
[citeproof:579]
The boundedness assumption keeps the nonlinear flux terms controlled uniformly on the range of the solutions, while the $L^1$ assumption on $u_0-v_0$ is what makes the contraction estimate finite on the whole line. The full family of constants $k$ is also essential: using only one convex entropy such as $u^2/2$ can admit weak solutions that dissipate that entropy but fail comparison with other constant states. The theorem does not construct solutions; it says that once entropy solutions exist, the admissibility criterion is strong enough to prevent two different evolutions from the same data. In the smooth convex-flux setting, the core mechanism can be seen without the full doubling formalism: subtract the equations and multiply by a smooth approximation to $\operatorname{sgn}(u-v)$, then integrate by parts to obtain the same contraction.
[example: Entropy Inequality Across A Burgers Shock]
For Burgers equation $f(u)=u^2/2$, fix constants $u_->u_+$ and set $\sigma=(u_-+u_+)/2$. Consider the discontinuous function equal to $u_-$ on the left of the line $x=\sigma t$ and equal to $u_+$ on the right. For a fixed $k\in\mathbb R$, the Kruzhkov entropy and entropy flux are
\begin{align*}
\eta_k(u)=|u-k|,\qquad q_k(u)=\operatorname{sgn}(u-k)\left(\frac{u^2}{2}-\frac{k^2}{2}\right).
\end{align*}
Both $\eta_k(u)$ and $q_k(u)$ are constant away from the shock line, so the distribution $\partial_t\eta_k(u)+\partial_x q_k(u)$ is concentrated on $x=\sigma t$. Its coefficient is
\begin{align*}
E_k=\sigma\bigl(|u_- -k|-|u_+-k|\bigr)-\bigl(q_k(u_-)-q_k(u_+)\bigr).
\end{align*}
With the sign convention in the Kruzhkov inequality, it is enough to prove $E_k\le 0$ for every $k$.
First suppose $k\le u_+$. Then $u_- -k\ge 0$ and $u_+-k\ge 0$, so
\begin{align*}
E_k=\sigma\bigl((u_- -k)-(u_+-k)\bigr)-\left(\frac{u_-^2-k^2}{2}-\frac{u_+^2-k^2}{2}\right).
\end{align*}
Thus
\begin{align*}
E_k=\sigma(u_- -u_+)-\frac{u_-^2-u_+^2}{2}.
\end{align*}
Using $u_-^2-u_+^2=(u_- -u_+)(u_-+u_+)$ and $\sigma=(u_-+u_+)/2$ gives
\begin{align*}
E_k=\frac{u_-+u_+}{2}(u_- -u_+)-\frac{(u_- -u_+)(u_-+u_+)}{2}=0.
\end{align*}
Next suppose $k\ge u_-$. Then $u_- -k\le 0$ and $u_+-k\le 0$, so
\begin{align*}
E_k=\sigma\bigl((k-u_-)-(k-u_+)\bigr)-\left(-\frac{u_-^2-k^2}{2}+\frac{u_+^2-k^2}{2}\right).
\end{align*}
Therefore
\begin{align*}
E_k=-\sigma(u_- -u_+)+\frac{u_-^2-u_+^2}{2}.
\end{align*}
Again substituting $\sigma=(u_-+u_+)/2$ and factoring the difference of squares gives
\begin{align*}
E_k=-\frac{u_-+u_+}{2}(u_- -u_+)+\frac{(u_- -u_+)(u_-+u_+)}{2}=0.
\end{align*}
It remains to treat $u_+<k<u_-$. In this case $u_- -k>0$ and $u_+-k<0$, hence
\begin{align*}
E_k=\sigma\bigl((u_- -k)-(k-u_+)\bigr)-\left(\frac{u_-^2-k^2}{2}+\frac{u_+^2-k^2}{2}\right).
\end{align*}
Substituting $\sigma=(u_-+u_+)/2$ gives
\begin{align*}
E_k=\frac{u_-+u_+}{2}(u_-+u_+-2k)-\frac{u_-^2+u_+^2-2k^2}{2}.
\end{align*}
Expanding the first product,
\begin{align*}
E_k=\frac{u_-^2+2u_-u_+ +u_+^2-2ku_- -2ku_+ -u_-^2-u_+^2+2k^2}{2}.
\end{align*}
Canceling $u_-^2$ and $u_+^2$ leaves
\begin{align*}
E_k=u_-u_+-ku_- -ku_+ +k^2.
\end{align*}
Factoring the quadratic in $k$ gives
\begin{align*}
E_k=(k-u_-)(k-u_+).
\end{align*}
Since $u_+<k<u_-$, we have $k-u_-<0$ and $k-u_+>0$, so
\begin{align*}
E_k<0.
\end{align*}
Thus $E_k\le 0$ for all constants $k$, with strict entropy production exactly when $k$ lies strictly between the two states. The compressive Burgers shock therefore satisfies every Kruzhkov entropy inequality.
[/example]
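The case analysis above can be cross-checked by evaluating $E_k$ directly from the Kruzhkov pair. The sketch below is illustrative only: the helper names and the states $u_-=2$, $u_+=-1$ are assumptions chosen so that all arithmetic is exact in floating point. It also evaluates the reversed (expansive) ordering, for which the same coefficient becomes positive for intermediate $k$.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def burgers_flux(u):
    return 0.5 * u * u


def kruzhkov_pair(u, k):
    """Kruzhkov entropy |u - k| and entropy flux sgn(u - k)(f(u) - f(k))."""
    return abs(u - k), sign(u - k) * (burgers_flux(u) - burgers_flux(k))


def jump_coefficient(u_minus, u_plus, k):
    """E_k = sigma*(eta(u-) - eta(u+)) - (q(u-) - q(u+)) on the line x = sigma*t."""
    sigma = 0.5 * (u_minus + u_plus)
    eta_minus, q_minus = kruzhkov_pair(u_minus, k)
    eta_plus, q_plus = kruzhkov_pair(u_plus, k)
    return sigma * (eta_minus - eta_plus) - (q_minus - q_plus)


# Illustrative states (assumed): compressive ordering u_- = 2 > u_+ = -1.
u_minus, u_plus = 2.0, -1.0
constants = [-3.0 + 0.5 * i for i in range(13)]  # k = -3, -2.5, ..., 3

compressive = [jump_coefficient(u_minus, u_plus, k) for k in constants]
expansive = [jump_coefficient(u_plus, u_minus, k) for k in constants]

print("max E_k, compressive:", max(compressive))
print("max E_k, expansive:  ", max(expansive))
```

For the compressive ordering every sampled $E_k$ is nonpositive and agrees with $(k-u_-)(k-u_+)$ between the states. Swapping the states flips the sign of that product, which is the entropy violation of the expansion shock.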
Kruzhkov's theorem is the rigorous form of selection by admissibility. The entropy inequalities reject nonentropy shocks, accept rarefactions, and make the Cauchy problem stable in $L^1$.
## Vanishing Viscosity Selection
The entropy conditions also arise from a physical regularisation: add a small diffusion term, solve the parabolic problem, and let the viscosity vanish. The question is whether this limiting process chooses the same solution as the entropy inequalities.
[definition: Vanishing Viscosity Approximation]
For $\varepsilon>0$, the vanishing viscosity approximation to $u_t+f(u)_x=0$ is
\begin{align*}
u^\varepsilon_t + f(u^\varepsilon)_x = \varepsilon u^\varepsilon_{xx}, \qquad u^\varepsilon(0,x)=u_0^\varepsilon(x).
\end{align*}
[/definition]
The parabolic term smooths shocks over a layer of width depending on $\varepsilon$. As $\varepsilon\to0$, the layer collapses, so the key issue is whether the limiting weak solution satisfies the same entropy inequalities that gave uniqueness. This motivates the following selection theorem.
[quotetheorem:6154]
[citeproof:6154]
This selection theorem explains why the entropy solution is not merely a convention attached to weak solutions. The convergence hypothesis is substantial: the theorem identifies any strong local $L^1$ limit, but it does not prove that such a convergent subsequence exists in settings where compactness estimates are unavailable. The assumptions $u_0\in L^1\cap L^\infty$ give both spatial integrability for the contraction framework and uniform bounds for the nonlinear flux; without those controls, mass can escape at infinity or nonlinear terms may fail to pass to the limit. To see the mechanism in a single shock, it is useful to inspect the travelling wave profile created by viscosity before the limit is taken.
[example: Viscous Profile For A Burgers Shock]
For Burgers equation with viscosity,
\begin{align*}
u^\varepsilon_t+\left(\frac{(u^\varepsilon)^2}{2}\right)_x=\varepsilon u^\varepsilon_{xx},
\end{align*}
look for a travelling layer
\begin{align*}
u^\varepsilon(t,x)=U(z),\qquad z=\frac{x-\sigma t}{\varepsilon},
\end{align*}
with $U(-\infty)=u_-$ and $U(\infty)=u_+$. Differentiating the ansatz gives
\begin{align*}
u^\varepsilon_t=-\frac{\sigma}{\varepsilon}U'(z),\qquad u^\varepsilon_x=\frac{1}{\varepsilon}U'(z),\qquad u^\varepsilon_{xx}=\frac{1}{\varepsilon^2}U''(z).
\end{align*}
Also,
\begin{align*}
\left(\frac{(u^\varepsilon)^2}{2}\right)_x=\frac{d}{dx}\left(\frac{U(z)^2}{2}\right)=U(z)U'(z)\frac{1}{\varepsilon}.
\end{align*}
Substitution gives
\begin{align*}
-\frac{\sigma}{\varepsilon}U'(z)+\frac{1}{\varepsilon}U(z)U'(z)=\varepsilon\frac{1}{\varepsilon^2}U''(z).
\end{align*}
Multiplying by $\varepsilon$ yields
\begin{align*}
U''(z)=(U(z)-\sigma)U'(z).
\end{align*}
Since
\begin{align*}
\frac{d}{dz}\left(\frac{U(z)^2}{2}-\sigma U(z)\right)=U(z)U'(z)-\sigma U'(z),
\end{align*}
we integrate once to obtain
\begin{align*}
U'(z)=\frac{U(z)^2}{2}-\sigma U(z)+C.
\end{align*}
Using $U(\infty)=u_+$ and $U'(\infty)=0$ gives
\begin{align*}
C=\sigma u_+-\frac{u_+^2}{2}.
\end{align*}
Using $U(-\infty)=u_-$ and $U'(-\infty)=0$ then gives
\begin{align*}
0=\frac{u_-^2-u_+^2}{2}-\sigma(u_- -u_+).
\end{align*}
Factoring $u_-^2-u_+^2=(u_- -u_+)(u_-+u_+)$ and using $u_-\ne u_+$ gives
\begin{align*}
\sigma=\frac{u_-+u_+}{2}.
\end{align*}
With this value of $\sigma$,
\begin{align*}
U'=\frac{U^2}{2}-\frac{u_-+u_+}{2}U+\frac{u_-+u_+}{2}u_+-\frac{u_+^2}{2}.
\end{align*}
The constant terms combine as
\begin{align*}
\frac{u_-+u_+}{2}u_+-\frac{u_+^2}{2}=\frac{u_-u_+}{2},
\end{align*}
so
\begin{align*}
U'=\frac{U^2-(u_-+u_+)U+u_-u_+}{2}.
\end{align*}
Factoring the quadratic gives
\begin{align*}
U'=\frac{(U-u_-)(U-u_+)}{2}.
\end{align*}
If $u_->u_+$ and $u_+<U<u_-$, then $U-u_-<0$ and $U-u_+>0$, hence
\begin{align*}
U'<0.
\end{align*}
Thus the viscous layer decreases from the left state to the right state.
We can solve the separated equation explicitly. Let $a=u_- -u_+>0$ and define
\begin{align*}
R(z)=\frac{U(z)-u_+}{u_- -U(z)}.
\end{align*}
Differentiating logarithmically gives
\begin{align*}
\frac{R'(z)}{R(z)}=\frac{U'(z)}{U(z)-u_+}+\frac{U'(z)}{u_- -U(z)}.
\end{align*}
Substituting $U'=(U-u_-)(U-u_+)/2$ gives
\begin{align*}
\frac{R'(z)}{R(z)}=\frac{(U-u_-)(U-u_+)}{2}\left(\frac{1}{U-u_+}+\frac{1}{u_- -U}\right).
\end{align*}
The parenthesis has common denominator $(U-u_+)(u_- -U)$ and numerator $u_- -u_+$, so
\begin{align*}
\frac{R'(z)}{R(z)}=-\frac{u_- -u_+}{2}=-\frac{a}{2}.
\end{align*}
Choosing the origin so that $R(0)=1$, we get
\begin{align*}
R(z)=e^{-az/2}.
\end{align*}
Therefore
\begin{align*}
\frac{U(z)-u_+}{u_- -U(z)}=e^{-az/2}.
\end{align*}
Solving for $U(z)$ gives
\begin{align*}
U(z)=\frac{u_+ +u_-e^{-az/2}}{1+e^{-az/2}}.
\end{align*}
As $z\to-\infty$, $e^{-az/2}\to\infty$ and $U(z)\to u_-$. As $z\to\infty$, $e^{-az/2}\to0$ and $U(z)\to u_+$.
If instead $u_-<u_+$, then a profile connecting $u_-$ on the left to $u_+$ on the right would have to increase somewhere. But for every intermediate value $u_-<U<u_+$, the same profile equation gives $U-u_->0$ and $U-u_+<0$, so
\begin{align*}
U'=\frac{(U-u_-)(U-u_+)}{2}<0.
\end{align*}
An increasing heteroclinic viscous profile therefore cannot connect $u_-$ to $u_+$. The viscous travelling layer exists for the compressive ordering $u_->u_+$, the same ordering selected by the *[Lax Entropy Condition For Convex Scalar Flux](/theorems/6152)* and by the Kruzhkov entropy inequalities.
[/example]
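A quick numerical check of the closed-form layer is possible. The sketch below is only a sanity check under assumed states $u_-=2$, $u_+=0$: it compares a centered finite difference of $U$ with the right-hand side $(U-u_-)(U-u_+)/2$ and evaluates the profile far to the left and right. The function names and the step size are illustrative choices.

```python
import math


def viscous_profile(z, u_minus, u_plus):
    """Layer U(z) = (u_+ + u_- e^{-az/2}) / (1 + e^{-az/2}) with a = u_- - u_+."""
    a = u_minus - u_plus
    e = math.exp(-0.5 * a * z)
    return (u_plus + u_minus * e) / (1.0 + e)


def profile_rhs(value, u_minus, u_plus):
    """Right-hand side of the profile equation U' = (U - u_-)(U - u_+)/2."""
    return 0.5 * (value - u_minus) * (value - u_plus)


u_minus, u_plus = 2.0, 0.0   # illustrative compressive states (assumed)
h = 1e-5                     # step for the centered difference
max_residual = 0.0
for i in range(-40, 41):
    z = 0.25 * i             # sample points z in [-10, 10]
    slope = (viscous_profile(z + h, u_minus, u_plus)
             - viscous_profile(z - h, u_minus, u_plus)) / (2.0 * h)
    center = viscous_profile(z, u_minus, u_plus)
    residual = abs(slope - profile_rhs(center, u_minus, u_plus))
    max_residual = max(max_residual, residual)

print("largest ODE residual on the sample:", max_residual)
print("far-left value: ", viscous_profile(-40.0, u_minus, u_plus))
print("far-right value:", viscous_profile(40.0, u_minus, u_plus))
```

The residual stays at the level of finite-difference error, and the far-field values approach $u_-$ and $u_+$, consistent with the limits computed in the example.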
Vanishing viscosity closes the circle between physical modelling and weak-solution selection. The weak formulation supplies conservation across discontinuities, the Lax and Oleinik conditions express compression, Kruzhkov inequalities give uniqueness and stability, and viscosity explains why this is the solution observed as a limit of regularised dynamics.
Entropy conditions make the admissible shock structure precise, and the Riemann problem is the cleanest setting in which to see them at work. Chapter 7 isolates a single jump discontinuity so that the full nonlinear selection mechanism can be studied in its simplest form.
# 7. The Riemann Problem for Scalar Conservation Laws
Chapters 5 and 6 introduced weak solutions and entropy admissibility because nonlinear conservation laws can form discontinuities even from smooth data. This chapter studies the simplest discontinuous Cauchy problem: a single jump separating two constant states. The Riemann problem is small enough to solve explicitly, but rich enough to reveal the elementary waves from which more complicated solutions are assembled.
For a scalar conservation law
\begin{align*}
u_t + f(u)_x = 0,
\end{align*}
the main question is how the left state $u_L$ and right state $u_R$ determine the wave fan. Strictly convex fluxes give a complete answer: decreasing jumps produce entropy shocks and increasing jumps produce rarefaction fans. Linear fluxes, by contrast, transport jumps unchanged as contact discontinuities.
## Piecewise Constant Data and Self-Similar Form
How much of a conservation law is already visible in the evolution of a single jump? The Riemann problem isolates this question by removing all length scales except the self-similar variable
\begin{align*}
\xi=\frac{x}{t}.
\end{align*}
This makes it the local model for discontinuities and the building block behind front-tracking approximations.
[definition: Scalar Riemann Problem]
Let $f: \mathbb R \to \mathbb R$ satisfy $f \in C^1(\mathbb R)$, and let $u_L,u_R \in \mathbb R$. The scalar Riemann problem for the conservation law $u_t+f(u)_x=0$ is the Cauchy problem with initial data $u(x,0)=u_L$ for $x<0$ and $u(x,0)=u_R$ for $x>0$.
[/definition]
The definition captures the smallest non-smooth datum that can generate nonlinear wave behaviour. To exploit the scaling symmetry of this datum, we need a class of solutions depending only on the ray through the origin in space-time.
[definition: Self-Similar Riemann Solution]
A self-similar solution of the scalar Riemann problem is a weak solution of the form
\begin{align*}
u(x,t) = U(x/t), \qquad t>0,
\end{align*}
where $U: \mathbb R \to \mathbb R$ satisfies $U(\xi)\to u_L$ as $\xi\to-\infty$ and $U(\xi)\to u_R$ as $\xi\to+\infty$.
[/definition]
Self-similarity turns propagation speed into position inside the fan. The next obstruction is that a discontinuity inside such a profile cannot travel with an arbitrary speed; conservation across the moving interface fixes that speed.
[example: Stationary Jump with Equal Flux]
Suppose $f(u_L)=f(u_R)$, and define $u(x,t)=u_L$ on the left half-plane $x<0$ and $u(x,t)=u_R$ on the right half-plane $x>0$. The interface is the vertical line $x=0$, which is the line $x=st$ with $s=0$.
The Rankine-Hugoniot balance for a jump from $u_L$ to $u_R$ is
\begin{align*}
s(u_R-u_L)=f(u_R)-f(u_L).
\end{align*}
Substituting $s=0$ and $f(u_R)=f(u_L)$ gives
\begin{align*}
0\cdot (u_R-u_L)=0.
\end{align*}
So the stationary interface satisfies the weak conservation balance. If $u_L\ne u_R$, the equivalent speed formula gives
\begin{align*}
s=\frac{f(u_R)-f(u_L)}{u_R-u_L}=\frac{0}{u_R-u_L}=0.
\end{align*}
Thus a nonzero stationary jump can satisfy the weak formulation when the two flux values agree; whether this jump is entropy admissible is a separate question determined by the geometry of $f$ between and around the two states.
[/example]
The example illustrates a stationary balance of fluxes. To decide which moving jumps can appear in any weak solution, we need the general balance law across a moving discontinuity.
[explanation: Rankine-Hugoniot Jump Condition]
For a scalar conservation law $u_t+f(u)_x=0$, suppose a piecewise smooth solution has left and right traces $u^-(t)$ and $u^+(t)$ across a $C^1$ interface $x=\xi(t)$. Conservation across the moving interface forces
\begin{align*}
\dot{\xi}(t)\bigl(u^+(t)-u^-(t)\bigr)=f(u^+(t))-f(u^-(t)).
\end{align*}
For a constant-speed shock $x=st$, this becomes $s(u_+-u_-)=f(u_+)-f(u_-)$.
[/explanation]
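For constant states the jump condition reduces to a chord-slope computation. A minimal sketch, with a hypothetical helper name and illustrative fluxes, is:

```python
def rankine_hugoniot_speed(flux, u_left, u_right):
    """Speed s solving s*(u_right - u_left) = flux(u_right) - flux(u_left)."""
    if u_left == u_right:
        # With no jump the condition holds for every s, so no speed is singled out.
        raise ValueError("the two states coincide; the jump speed is undetermined")
    return (flux(u_right) - flux(u_left)) / (u_right - u_left)


def burgers_flux(u):
    return 0.5 * u * u


# Burgers jump from 3 to 1: the chord slope is the average of the two states.
print(rankine_hugoniot_speed(burgers_flux, 3.0, 1.0))

# Equal flux values f(-1) = f(1) give a stationary interface, s = 0.
print(rankine_hugoniot_speed(burgers_flux, -1.0, 1.0))
```

The helper only encodes conservation across the interface. It returns a speed for increasing and decreasing jumps alike, so it cannot by itself distinguish admissible from inadmissible discontinuities.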
Rankine-Hugoniot is a conservation constraint, not a selection principle. The piecewise $C^1$ structure and the existence of one-sided traces matter because the derivation only balances two well-defined traces across one moving interface; if oscillations or several fronts accumulate at the interface, the formula no longer identifies a single local wave. Even when the formula holds, it does not decide entropy admissibility: for Burgers' equation, both a decreasing shock and an increasing expansion shock can satisfy the same algebraic jump relation, but only the decreasing one is selected by entropy. The next task is to distinguish jumps that compress characteristics, which are entropy admissible for convex fluxes, from jumps that create information at the discontinuity.
## Shock Waves, Rarefaction Waves, and Contact Discontinuities
When a jump evolves, should it remain compressed into a discontinuity or spread into a continuous fan? The answer is governed by the characteristic speed $f'(u)$. Characteristics entering a jump support a shock; characteristics spreading apart produce a rarefaction.
[definition: Characteristic Speed for a Scalar Conservation Law]
Let $f: \mathbb R \to \mathbb R$ satisfy $f \in C^1(\mathbb R)$. The characteristic speed map associated to $f$ is
\begin{align*}
c: \mathbb R \to \mathbb R, \qquad c(u)=f'(u).
\end{align*}
[/definition]
For smooth solutions, curves with velocity $f'(u)$ carry constant values of $u$. This motivates the shock definition: a jump is admissible as a candidate wave only after its speed has been tied to the Rankine-Hugoniot balance.
[definition: Shock Wave]
Let $f: \mathbb R \to \mathbb R$ satisfy $f \in C^1(\mathbb R)$, let $u_L,u_R \in \mathbb R$, and let $s \in \mathbb R$. A shock wave connecting $u_L$ to $u_R$ is a bounded weak solution
\begin{align*}
u: \mathbb R \times (0,\infty) \to \mathbb R, \qquad u \in L^\infty(\mathbb R \times (0,\infty)),
\end{align*}
of $u_t+f(u)_x=0$ such that $u(x,t)=u_L$ for $x<st$, $u(x,t)=u_R$ for $x>st$, and $s$ satisfies the Rankine-Hugoniot condition.
[/definition]
A shock definition alone still permits expansion shocks. To encode the compression of characteristics from both sides, we need the Lax entropy inequality.
[definition: Lax Entropy Shock]
Let $f \in C^2(\mathbb R)$ and let $u_L \ne u_R$. A shock connecting $u_L$ to $u_R$ with speed $s$ is a Lax entropy shock if
\begin{align*}
f'(u_L) > s > f'(u_R).
\end{align*}
[/definition]
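The Lax inequalities are easy to test once the flux and its derivative are available as functions. The sketch below is illustrative: the helper name `is_lax_shock` and the sample fluxes are assumptions, and the speed is taken from the Rankine-Hugoniot chord slope.

```python
def is_lax_shock(flux, flux_derivative, u_left, u_right):
    """Check f'(u_left) > s > f'(u_right) with s the Rankine-Hugoniot speed."""
    if u_left == u_right:
        return False  # no jump, hence no shock
    speed = (flux(u_right) - flux(u_left)) / (u_right - u_left)
    return flux_derivative(u_left) > speed > flux_derivative(u_right)


def burgers_flux(u):
    return 0.5 * u * u


def burgers_speed(u):
    return u


# Decreasing Burgers jump: characteristics enter from both sides.
print(is_lax_shock(burgers_flux, burgers_speed, 2.0, 0.0))

# Increasing Burgers jump: same chord slope, but the inequalities fail.
print(is_lax_shock(burgers_flux, burgers_speed, 0.0, 2.0))

# Linear flux f(u) = 3u: all speeds coincide, so the strict inequalities fail.
print(is_lax_shock(lambda u: 3.0 * u, lambda u: 3.0, 2.0, 0.0))
```

The third call illustrates the degenerate case in which the jump travels with the characteristics rather than absorbing them.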
The Lax condition compares tangent slopes with the chord slope through the two states. A Rankine-Hugoniot discontinuity only enforces conservation across the jump; it does not decide whether characteristics enter the jump or spread away from it. For a strictly convex flux, the chord slope lies between the endpoint tangent slopes, so the remaining question is which ordering of $u_L$ and $u_R$ makes both families of nearby characteristics impinge on the discontinuity. That comparison gives the scalar shock rule.
[quotetheorem:6155]
[citeproof:6155]
The theorem settles decreasing jumps for strictly convex fluxes because strict convexity makes every chord slope lie strictly between the two endpoint tangent slopes. This hypothesis is needed: for a linear flux the tangent slopes are all equal and a nonconstant jump is a contact rather than a compressive shock, while for a nonconvex flux the chord slope may lie on the wrong side of one endpoint tangent. The result also does not say that every Rankine-Hugoniot shock is admissible; increasing jumps for convex fluxes satisfy the jump formula but violate the Lax inequalities. For increasing jumps the same characteristic picture points in the opposite direction, so we need a wave that spreads a continuum of states across the opening fan.
[definition: Rarefaction Wave]
Let $f: \mathbb R \to \mathbb R$ satisfy $f \in C^2(\mathbb R)$, and let $u_L,u_R \in \mathbb R$. A rarefaction wave for the scalar conservation law $u_t+f(u)_x=0$ is a self-similar weak solution $u(x,t)=U(x/t)$ whose profile is a continuous map
\begin{align*}
U: \mathbb R \to \mathbb R
\end{align*}
with $U(\xi)=u_L$ for $\xi$ below the left edge of the fan, $U(\xi)=u_R$ for $\xi$ above the right edge of the fan, and
\begin{align*}
f'(U(\xi))=\xi
\end{align*}
on the interior of the fan.
[/definition]
The rarefaction definition says that the state placed on the ray
\begin{align*}
\frac{x}{t}=\xi
\end{align*}
has characteristic speed $\xi$. Strict convexity makes $f'$ invertible on intervals, so the definition leads to an explicit profile for increasing Riemann data.
[explanation: Rarefaction Fan for Convex Riemann Data]
Let $f\in C^2(\mathbb R)$ satisfy $f''>0$, and let $u_L<u_R$. The centered rarefaction for the Riemann problem is the self-similar profile with value $u_L$ on $x\le f'(u_L)t$, value $(f')^{-1}(x/t)$ on $f'(u_L)t<x<f'(u_R)t$, and value $u_R$ on $x\ge f'(u_R)t$. Strict convexity makes $f'$ strictly increasing, so the middle expression is well defined and fills the spreading fan continuously.
[/explanation]
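When the inverse of the characteristic speed has no closed form, the middle part of the fan can be evaluated numerically. The following sketch assumes only that the derivative of the flux is strictly increasing on the interval between the two states and inverts it by bisection; the helper names and the test flux $e^u$ (whose derivative has inverse $\log$) are illustrative assumptions.

```python
import math


def invert_increasing(func, target, lower, upper, iterations=80):
    """Solve func(v) = target by bisection, assuming func is increasing on [lower, upper]."""
    for _ in range(iterations):
        middle = 0.5 * (lower + upper)
        if func(middle) < target:
            lower = middle
        else:
            upper = middle
    return 0.5 * (lower + upper)


def rarefaction_profile(flux_derivative, left_state, right_state, xi):
    """Centered rarefaction value on the ray x/t = xi, for left_state < right_state."""
    if xi <= flux_derivative(left_state):
        return left_state
    if xi >= flux_derivative(right_state):
        return right_state
    return invert_increasing(flux_derivative, xi, left_state, right_state)


# Flux exp(u): the characteristic speed is exp(u), so the fan value is log(xi).
value = rarefaction_profile(math.exp, 0.0, 1.0, 2.0)
print(value, math.log(2.0))

# Burgers: the characteristic speed is u itself, so the fan value equals xi.
print(rarefaction_profile(lambda u: u, 0.0, 1.0, 0.3))
```

Bisection is chosen here for robustness rather than speed; it uses nothing beyond monotonicity of the characteristic speed, which is exactly what strict convexity provides.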
The increasing-data assumption is essential because the fan uses the monotone inverse of $f'$ on the interval from $u_L$ to $u_R$. Strict convexity is also doing real work: for a concave flux the increasing jump compresses rather than spreads, and for a nonconvex flux such as $f(u)=u^3$ across an interval containing $0$, the speed $f'(u)=3u^2$ is not monotone on the whole state interval. The construction above does not classify those concave or nonconvex cases; it identifies the centered rarefaction selected by entropy when the characteristic speeds open out in the strictly convex scalar setting. Shocks compress and rarefactions spread. A third limiting case occurs when the characteristic speed is the same on both sides, so the next definition records transported discontinuities with no compression or spreading.
[definition: Contact Discontinuity]
Let $f: \mathbb R \to \mathbb R$ satisfy $f \in C^1(\mathbb R)$, let $u_L,u_R \in \mathbb R$, and let $s \in \mathbb R$. A contact discontinuity is a bounded weak solution
\begin{align*}
u: \mathbb R \times (0,\infty) \to \mathbb R, \qquad u \in L^\infty(\mathbb R \times (0,\infty)),
\end{align*}
of $u_t+f(u)_x=0$ such that $u(x,t)=u_L$ for $x<st$, $u(x,t)=u_R$ for $x>st$, and
\begin{align*}
f'(u_L)=s=f'(u_R)
\end{align*}
where $s$ satisfies the Rankine-Hugoniot condition.
[/definition]
For strictly convex fluxes, nonconstant contact discontinuities do not occur because equal tangent speeds force equal states. The role of contacts is still worth isolating because linear scalar equations and later systems both transport jumps without changing their strength.
[example: Linear Flux and Transported Jumps]
Let $f(u)=a u$ with $a \in \mathbb R$, and let $u_L,u_R \in \mathbb R$. For a jump moving along $x=st$, the Rankine-Hugoniot balance is
\begin{align*}
s(u_R-u_L)=f(u_R)-f(u_L).
\end{align*}
Since $f(u_R)=a u_R$ and $f(u_L)=a u_L$, the right-hand side is
\begin{align*}
f(u_R)-f(u_L)=a u_R-a u_L=a(u_R-u_L).
\end{align*}
Thus the jump condition becomes
\begin{align*}
s(u_R-u_L)=a(u_R-u_L).
\end{align*}
If $u_L\ne u_R$, division by $u_R-u_L$ gives
\begin{align*}
s=a.
\end{align*}
If $u_L=u_R$, the left and right states are the same, so the solution is simply the constant state transported by the equation.
Therefore the piecewise constant function with value $u_L$ on $x<at$ and value $u_R$ on $x>at$ satisfies the conservation balance across its only interface. The characteristic speed is constant because
\begin{align*}
f'(u)=a
\end{align*}
for every state $u$. In particular,
\begin{align*}
f'(u_L)=a=s=f'(u_R).
\end{align*}
The jump is transported at the same speed as every smooth characteristic, so this is the scalar contact case.
[/example]
The three wave types are not separate tricks; for a convex scalar flux they exhaust the Riemann problem. We can now package the shock, rarefaction, and constant cases into one solver.
[quotetheorem:6156]
[citeproof:6156]
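The case split can be sketched as a small function of the self-similar variable $\xi=x/t$. This is an illustration under the assumption that the flux is strictly convex and that the caller supplies the inverse of $f'$; the name `riemann_convex` and its parameters are hypothetical, and the tie at $\xi=s$ is resolved arbitrarily in favour of the right state.

```python
def riemann_convex(flux, flux_derivative, derivative_inverse, u_left, u_right, xi):
    """Value on the ray x/t = xi for a strictly convex flux (sketch)."""
    if u_left == u_right:
        return u_left  # constant state, nothing to resolve
    if u_left > u_right:
        # Decreasing jump: a single shock at the Rankine-Hugoniot speed.
        speed = (flux(u_right) - flux(u_left)) / (u_right - u_left)
        return u_left if xi < speed else u_right
    # Increasing jump: centered rarefaction between f'(u_left) and f'(u_right).
    if xi <= flux_derivative(u_left):
        return u_left
    if xi >= flux_derivative(u_right):
        return u_right
    return derivative_inverse(xi)


def burgers_flux(u):
    return 0.5 * u * u


def identity(value):
    return value  # for Burgers, f'(u) = u and its inverse is also the identity


# Shock from 2 to 0 travels at speed 1.
print(riemann_convex(burgers_flux, identity, identity, 2.0, 0.0, 0.9))
print(riemann_convex(burgers_flux, identity, identity, 2.0, 0.0, 1.1))

# Rarefaction from 0 to 2: inside the fan the value equals xi.
print(riemann_convex(burgers_flux, identity, identity, 0.0, 2.0, 1.0))
```

Passing the inverse of $f'$ as an argument keeps the sketch independent of any particular flux; a numerical inverse could be substituted when no closed form is available.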
The theorem depends on strict convexity because it turns the graph of $f$ into a single ordered family of tangent slopes: every interval of states is resolved by either one chord-controlled shock or one inverse-slope rarefaction. If $f$ has a linear segment, nonconstant contacts can appear and uniqueness within the same formula has to be stated with that degeneracy in mind; if $f$ is nonconvex, the entropy solution may contain compound waves chosen by convex or concave envelopes rather than by a single chord. The theorem also does not cover systems, multidimensional conservation laws, or unbounded solution classes; its uniqueness statement is the scalar Kruzhkov one with the prescribed Riemann trace. It is nevertheless the scalar prototype for Riemann solvers in hyperbolic systems. Systems replace a single convex graph by several characteristic families, but the local vocabulary of shocks, rarefactions, and contacts remains.
## Worked Scalar Models
How do these abstract wave rules look in equations that appear in applications? Burgers' equation gives the cleanest algebraic model, while traffic flow shows why nonlinearity and entropy selection carry physical content.
[example: Burgers Shock]
For inviscid Burgers' equation,
\begin{align*}
u_t+\left(\frac{u^2}{2}\right)_x=0,
\end{align*}
the flux is $f(u)=u^2/2$, hence the characteristic speed is
\begin{align*}
f'(u)=\frac{d}{du}\left(\frac{u^2}{2}\right)=u.
\end{align*}
Assume $u_L>u_R$. The Rankine-Hugoniot balance for a jump from $u_L$ to $u_R$ is
\begin{align*}
s(u_R-u_L)=f(u_R)-f(u_L).
\end{align*}
Substituting $f(u)=u^2/2$ gives
\begin{align*}
s(u_R-u_L)=\frac{u_R^2}{2}-\frac{u_L^2}{2}.
\end{align*}
The right-hand side factors as
\begin{align*}
\frac{u_R^2}{2}-\frac{u_L^2}{2}=\frac{u_R^2-u_L^2}{2}=\frac{(u_R-u_L)(u_R+u_L)}{2}.
\end{align*}
Therefore
\begin{align*}
s(u_R-u_L)=\frac{(u_R-u_L)(u_R+u_L)}{2}.
\end{align*}
Since $u_L>u_R$, we have $u_R-u_L\ne 0$, so division by $u_R-u_L$ gives
\begin{align*}
s=\frac{u_L+u_R}{2}.
\end{align*}
The Lax entropy inequalities are
\begin{align*}
f'(u_L)>s>f'(u_R).
\end{align*}
Using $f'(u)=u$ and the speed just computed, these become
\begin{align*}
u_L>\frac{u_L+u_R}{2}>u_R.
\end{align*}
The left inequality holds because
\begin{align*}
u_L-\frac{u_L+u_R}{2}=\frac{u_L-u_R}{2}>0,
\end{align*}
and the right inequality holds because
\begin{align*}
\frac{u_L+u_R}{2}-u_R=\frac{u_L-u_R}{2}>0.
\end{align*}
Thus the decreasing Burgers jump is a Lax entropy shock moving along the line
\begin{align*}
x=\frac{u_L+u_R}{2}t.
\end{align*}
[/example]
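The shock speed can also be observed in a conservative discretisation. The sketch below is an illustration that goes beyond the text: it uses a first-order upwind finite-volume scheme, which is an assumption made here and is valid only because all states in this example are nonnegative. The grid, time step, and states $u_L=2$, $u_R=0$ are likewise illustrative. Because the scheme conserves mass exactly, the front position recovered from the total mass should match $st$.

```python
def upwind_burgers(cells, dx, dt, steps, inflow):
    """Conservative upwind update for Burgers, valid when all states are >= 0."""
    def flux(value):
        return 0.5 * value * value

    u = list(cells)
    for _ in range(steps):
        # interface_flux[i] is the flux through the left face of cell i;
        # the last entry is the outflow through the right face of the last cell.
        interface_flux = [flux(inflow)] + [flux(value) for value in u]
        u = [u[i] - dt / dx * (interface_flux[i + 1] - interface_flux[i])
             for i in range(len(u))]
    return u


u_left, u_right = 2.0, 0.0
n, dx, dt, steps = 200, 0.01, 0.004, 100   # grid on [-1, 1], CFL number 0.8
centers = [-1.0 + (i + 0.5) * dx for i in range(n)]
initial = [u_left if x < 0.0 else u_right for x in centers]
final = upwind_burgers(initial, dx, dt, steps, inflow=u_left)

# Since u_right = 0, total mass equals u_left * (front + 1), which locates the front.
front = -1.0 + sum(final) * dx / u_left
predicted = 0.5 * (u_left + u_right) * steps * dt   # s * t with s = (u_L + u_R)/2

print("front from mass balance:", front)
print("Rankine-Hugoniot prediction:", predicted)
```

The numerical profile is smeared over a few cells, but the mass balance pins the front at the Rankine-Hugoniot position, which is the discrete counterpart of the conservation argument behind the jump condition.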
The same equation with the states reversed produces a continuous fan. This pair of examples is worth remembering because it is the local picture behind characteristic crossing and spreading.
[example: Burgers Rarefaction]
For inviscid Burgers' equation,
\begin{align*}
u_t+\left(\frac{u^2}{2}\right)_x=0,
\end{align*}
the flux is $f(u)=u^2/2$, so its characteristic speed is
\begin{align*}
f'(u)=\frac{d}{du}\left(\frac{u^2}{2}\right)=u.
\end{align*}
Assume $u_L<u_R$. A rarefaction is self-similar, so write $u(x,t)=U(\xi)$ with $\xi=x/t$. On the nonconstant part of the fan, the rarefaction condition is
\begin{align*}
f'(U(\xi))=\xi.
\end{align*}
Substituting $f'(u)=u$ gives
\begin{align*}
U(\xi)=\xi.
\end{align*}
The left edge of the fan travels with speed
\begin{align*}
f'(u_L)=u_L,
\end{align*}
and the right edge travels with speed
\begin{align*}
f'(u_R)=u_R.
\end{align*}
Thus the profile is $U(\xi)=u_L$ when $\xi<u_L$, $U(\xi)=\xi$ when $u_L\le \xi\le u_R$, and $U(\xi)=u_R$ when $\xi>u_R$. Equivalently, since $\xi=x/t$, the solution is $u(x,t)=u_L$ when $x/t<u_L$, $u(x,t)=x/t$ when $u_L\le x/t\le u_R$, and $u(x,t)=u_R$ when $x/t>u_R$.
Inside the fan,
\begin{align*}
u(x,t)=\frac{x}{t}.
\end{align*}
Each ray has speed $\xi=x/t$ between $u_L$ and $u_R$, and the state on that ray is $U(\xi)=\xi$, whose Burgers characteristic speed is $f'(U(\xi))=U(\xi)=\xi$.
[/example]
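As a direct check, the interior formula can be substituted back into the equation. For $t>0$ and $u_L<x/t<u_R$, where $u$ is smooth and $\left(u^2/2\right)_x=uu_x$,
\begin{align*}
u_t=-\frac{x}{t^2},\qquad u_x=\frac{1}{t},\qquad u_t+uu_x=-\frac{x}{t^2}+\frac{x}{t}\cdot\frac{1}{t}=0.
\end{align*}
At the fan edges $x=u_Lt$ and $x=u_Rt$ the profile is continuous, with $u=u_L$ and $u=u_R$ respectively, so no jump condition arises there.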
Traffic flow uses the same mathematics with a different interpretation of the unknown. The conserved quantity is density, and the flux is density times velocity.
[example: Lighthill-Whitham-Richards Traffic Riemann Problem]
Let $\rho(x,t)$ denote car density, with $v_{\max}>0$ and $\rho_{\max}>0$, and consider
\begin{align*}
\rho_t+q(\rho)_x=0,\qquad q(\rho)=\rho v_{\max}\left(1-\frac{\rho}{\rho_{\max}}\right).
\end{align*}
Expanding the flux gives
\begin{align*}
q(\rho)=v_{\max}\rho-\frac{v_{\max}}{\rho_{\max}}\rho^2.
\end{align*}
Therefore
\begin{align*}
q'(\rho)=v_{\max}-\frac{2v_{\max}}{\rho_{\max}}\rho=v_{\max}\left(1-\frac{2\rho}{\rho_{\max}}\right).
\end{align*}
Also,
\begin{align*}
q''(\rho)=-\frac{2v_{\max}}{\rho_{\max}}<0.
\end{align*}
Thus the traffic flux is strictly concave, so the characteristic speed decreases as density increases.
Assume first that $\rho_L<\rho_R$. The Rankine-Hugoniot speed of a jump from $\rho_L$ to $\rho_R$ is
\begin{align*}
s=\frac{q(\rho_R)-q(\rho_L)}{\rho_R-\rho_L}.
\end{align*}
Substituting the expanded formula for $q$ gives
\begin{align*}
s=\frac{\left(v_{\max}\rho_R-\frac{v_{\max}}{\rho_{\max}}\rho_R^2\right)-\left(v_{\max}\rho_L-\frac{v_{\max}}{\rho_{\max}}\rho_L^2\right)}{\rho_R-\rho_L}.
\end{align*}
Collecting the linear and quadratic terms gives
\begin{align*}
s=\frac{v_{\max}(\rho_R-\rho_L)-\frac{v_{\max}}{\rho_{\max}}(\rho_R^2-\rho_L^2)}{\rho_R-\rho_L}.
\end{align*}
Since
\begin{align*}
\rho_R^2-\rho_L^2=(\rho_R-\rho_L)(\rho_R+\rho_L),
\end{align*}
we get
\begin{align*}
s=\frac{v_{\max}(\rho_R-\rho_L)-\frac{v_{\max}}{\rho_{\max}}(\rho_R-\rho_L)(\rho_R+\rho_L)}{\rho_R-\rho_L}.
\end{align*}
Because $\rho_L<\rho_R$, the denominator $\rho_R-\rho_L$ is nonzero, so cancellation gives
\begin{align*}
s=v_{\max}\left(1-\frac{\rho_L+\rho_R}{\rho_{\max}}\right).
\end{align*}
Now compare this speed with the characteristic speeds on the two sides:
\begin{align*}
q'(\rho_L)-s=v_{\max}\left(1-\frac{2\rho_L}{\rho_{\max}}\right)-v_{\max}\left(1-\frac{\rho_L+\rho_R}{\rho_{\max}}\right).
\end{align*}
The right-hand side reduces to
\begin{align*}
q'(\rho_L)-s=\frac{v_{\max}}{\rho_{\max}}(\rho_R-\rho_L)>0.
\end{align*}
Similarly,
\begin{align*}
s-q'(\rho_R)=v_{\max}\left(1-\frac{\rho_L+\rho_R}{\rho_{\max}}\right)-v_{\max}\left(1-\frac{2\rho_R}{\rho_{\max}}\right).
\end{align*}
Thus
\begin{align*}
s-q'(\rho_R)=\frac{v_{\max}}{\rho_{\max}}(\rho_R-\rho_L)>0.
\end{align*}
Hence
\begin{align*}
q'(\rho_L)>s>q'(\rho_R).
\end{align*}
Characteristics enter the discontinuity from both sides, so an increasing-density traffic jump forms a shock.
If instead $\rho_L>\rho_R$, then the decreasing nature of $q'$ gives
\begin{align*}
q'(\rho_L)<q'(\rho_R).
\end{align*}
The characteristic speeds spread apart, so the solution is a self-similar rarefaction $\rho(x,t)=R(\xi)$ with $\xi=x/t$. Inside the fan, the rarefaction condition is
\begin{align*}
q'(R(\xi))=\xi.
\end{align*}
Substituting the formula for $q'$ gives
\begin{align*}
v_{\max}\left(1-\frac{2R(\xi)}{\rho_{\max}}\right)=\xi.
\end{align*}
Dividing by $v_{\max}$ gives
\begin{align*}
1-\frac{2R(\xi)}{\rho_{\max}}=\frac{\xi}{v_{\max}}.
\end{align*}
Moving terms gives
\begin{align*}
\frac{2R(\xi)}{\rho_{\max}}=1-\frac{\xi}{v_{\max}}.
\end{align*}
Multiplying by $\rho_{\max}/2$ gives
\begin{align*}
R(\xi)=\frac{\rho_{\max}}{2}\left(1-\frac{\xi}{v_{\max}}\right).
\end{align*}
Therefore the rarefaction has left state $R(\xi)=\rho_L$ for $\xi<q'(\rho_L)$, interior profile
\begin{align*}
R(\xi)=\frac{\rho_{\max}}{2}\left(1-\frac{\xi}{v_{\max}}\right)
\end{align*}
for $q'(\rho_L)\le \xi\le q'(\rho_R)$, and right state $R(\xi)=\rho_R$ for $\xi>q'(\rho_R)$. The traffic model reverses the convex-flux rule: an increase in density compresses characteristics into a shock, while a decrease in density opens a rarefaction fan.
[/example]
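The two cases of the traffic example can be collected into a small Riemann solver. The following is a minimal sketch, assuming the quadratic flux $q(\rho)=v_{\max}\rho(1-\rho/\rho_{\max})$ used above; the function names and the default values $v_{\max}=\rho_{\max}=1$ are illustrative choices, not part of the notes.

```python
def flux(rho, v_max=1.0, rho_max=1.0):
    """Traffic flux q(rho) = v_max * rho * (1 - rho / rho_max)."""
    return v_max * rho * (1.0 - rho / rho_max)


def char_speed(rho, v_max=1.0, rho_max=1.0):
    """Characteristic speed q'(rho) = v_max * (1 - 2 rho / rho_max)."""
    return v_max * (1.0 - 2.0 * rho / rho_max)


def traffic_riemann(xi, rho_l, rho_r, v_max=1.0, rho_max=1.0):
    """Density on the ray x/t = xi for the Riemann data (rho_l, rho_r)."""
    if rho_l == rho_r:
        return rho_l
    if rho_l < rho_r:
        # Increasing density: shock with the Rankine-Hugoniot speed.
        s = v_max * (1.0 - (rho_l + rho_r) / rho_max)
        return rho_l if xi < s else rho_r
    # Decreasing density: rarefaction fan between q'(rho_l) and q'(rho_r).
    lo = char_speed(rho_l, v_max, rho_max)
    hi = char_speed(rho_r, v_max, rho_max)
    if xi <= lo:
        return rho_l
    if xi >= hi:
        return rho_r
    return 0.5 * rho_max * (1.0 - xi / v_max)
```

Evaluating `traffic_riemann` on a grid of values of `xi` traces the self-similar profile; the shock branch uses the closed-form speed derived above, which can be compared with the difference quotient of `flux`.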
The reversal occurs because the traffic flux is concave rather than convex. The geometric principle is unchanged: compare the characteristic speeds on the two sides, then choose compression or spreading.
## Entropy Admissibility and Nonconvex Warning Cases
Why do the convex formulas not settle every scalar conservation law? Strict convexity makes the characteristic speed $f'(u)$ monotone in the state $u$, so the speed cannot turn around between $u_L$ and $u_R$. When $f''$ changes sign, a single Riemann solution may contain compound waves, and the chord-slope picture no longer gives a two-case classification.
[definition: Kruzhkov Entropy Solution]
Let $f: \mathbb R \to \mathbb R$ satisfy $f \in C^1(\mathbb R)$. A bounded weak solution $u \in L^\infty(\mathbb R \times (0,\infty))$ of $u_t+f(u)_x=0$ is a Kruzhkov entropy solution if for every $k \in \mathbb R$,
\begin{align*}
\partial_t |u-k| + \partial_x\left(\operatorname{sgn}(u-k)(f(u)-f(k))\right) \le 0
\end{align*}
in the sense of distributions.
[/definition]
This condition packages all constant-state entropy inequalities at once. To verify that the explicit convex waves are the right weak solutions, we need to connect the shock and rarefaction formulas with this distributional entropy condition.
[quotetheorem:6157]
[citeproof:6157]
The strict convexity hypothesis is used twice: it gives monotone characteristic speed for the rarefaction construction, and it makes the entropy jump inequality equivalent to Lax compression for shocks. The statement is limited to the explicit convex Riemann waves; it does not claim that arbitrary weak solutions, nonconvex Riemann patterns, or systems satisfy Kruzhkov admissibility by the same chord argument. The entropy condition is especially important when the graph of $f$ has an inflection point, because then intermediate states can be selected by envelope constructions rather than by one shock or one fan. The next example shows why the convex theorem has a genuine convexity hypothesis rather than a cosmetic assumption.
[example: Flux with an Inflection Point]
Take $f(u)=u^3$ and choose states $u_L<0<u_R$. The characteristic speed is
\begin{align*}
f'(u)=\frac{d}{du}(u^3)=3u^2.
\end{align*}
On the negative side,
\begin{align*}
\frac{d}{du}(3u^2)=6u<0 \qquad \text{for } u<0,
\end{align*}
so $f'(u)$ decreases as $u$ moves from $u_L$ toward $0$. On the positive side,
\begin{align*}
\frac{d}{du}(3u^2)=6u>0 \qquad \text{for } u>0,
\end{align*}
so $f'(u)$ increases as $u$ moves from $0$ toward $u_R$. Also
\begin{align*}
f'(0)=0,\qquad f'(u_L)=3u_L^2>0,\qquad f'(u_R)=3u_R^2>0.
\end{align*}
Thus the speed map falls from $3u_L^2$ to $0$ and then rises from $0$ to $3u_R^2$ across the interval $[u_L,u_R]$. A single centered rarefaction would require a continuous profile satisfying
\begin{align*}
f'(U(\xi))=\xi,
\end{align*}
with the fan ordered by the ray speed $\xi=x/t$; here the same positive speed can occur at two different states, one negative and one positive, because
\begin{align*}
f'(-a)=3(-a)^2=3a^2=f'(a) \qquad \text{for } a>0.
\end{align*}
So one monotone rarefaction fan cannot pass through the whole interval from $u_L$ to $u_R$.
A single shock from $u_L$ to $u_R$ would have Rankine-Hugoniot speed
\begin{align*}
s=\frac{f(u_R)-f(u_L)}{u_R-u_L}
=\frac{u_R^3-u_L^3}{u_R-u_L}.
\end{align*}
Factoring the numerator gives
\begin{align*}
u_R^3-u_L^3=(u_R-u_L)(u_R^2+u_Ru_L+u_L^2),
\end{align*}
and since $u_R-u_L\ne 0$,
\begin{align*}
s=u_R^2+u_Ru_L+u_L^2.
\end{align*}
For the intermediate constant $k=0$, write $\eta(u)=|u|$ and $q(u)=\operatorname{sgn}(u)\bigl(f(u)-f(0)\bigr)$. Across a jump moving with speed $s$, the distributional inequality $\partial_t\eta(u)+\partial_x q(u)\le 0$ reduces to $-s\bigl(\eta(u_R)-\eta(u_L)\bigr)+q(u_R)-q(u_L)\le 0$, so the Kruzhkov entropy jump inequality for this single discontinuity reads
\begin{align*}
\operatorname{sgn}(u_R)\bigl(f(u_R)-f(0)\bigr)
-
\operatorname{sgn}(u_L)\bigl(f(u_L)-f(0)\bigr)
\le
s\bigl(|u_R|-|u_L|\bigr).
\end{align*}
Since $u_R>0$, $u_L<0$, and $f(0)=0$, this becomes
\begin{align*}
u_R^3+u_L^3\le s(u_R+u_L).
\end{align*}
Substituting the shock speed gives
\begin{align*}
u_R^3+u_L^3\le (u_R^2+u_Ru_L+u_L^2)(u_R+u_L).
\end{align*}
The right-hand side expands to
\begin{align*}
u_R^3+2u_R^2u_L+2u_Ru_L^2+u_L^3,
\end{align*}
so the inequality is equivalent to
\begin{align*}
0\le 2u_R^2u_L+2u_Ru_L^2.
\end{align*}
Factoring gives
\begin{align*}
0\le 2u_Ru_L(u_R+u_L).
\end{align*}
Because $u_Ru_L<0$, this inequality holds exactly when $u_R+u_L\le 0$. If $u_R>-u_L$, the inequality fails for $k=0$, so the single shock is not always entropy admissible.
The obstruction is the inflection point at $0$, where
\begin{align*}
f''(u)=6u
\end{align*}
changes sign. The entropy Riemann solution is therefore not determined by the two-case convex rule; it is selected by the lower convex envelope of $f$ on $[u_L,u_R]$, producing a compound pattern in which shocks and rarefactions are joined at intermediate states.
[/example]
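The envelope selection can be explored numerically. The following is a rough sketch, assuming $u_L<u_R$ so that the lower convex envelope is the relevant one; it samples the flux on a uniform grid, computes the lower convex hull of the sampled graph, and reads long hull segments as shocks and grid-length segments as pieces of a rarefaction fan. The names `lower_convex_hull` and `envelope_waves`, the grid size, and the threshold separating the two kinds of segment are all illustrative choices.

```python
def lower_convex_hull(points):
    """Lower hull of points listed with increasing first coordinate."""
    hull = []
    for px, py in points:
        while len(hull) >= 2:
            (ax, ay), (bx, by) = hull[-2], hull[-1]
            # A positive cross product means a left turn, so hull[-1] stays.
            cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
            if cross > 0:
                break
            hull.pop()  # hull[-1] lies on or above the chord: discard it
        hull.append((px, py))
    return hull


def envelope_waves(f, u_left, u_right, n=300):
    """Return (kind, a, b, speed) for each hull segment; needs u_left < u_right."""
    step = (u_right - u_left) / n
    grid = [(u_left + i * step, f(u_left + i * step)) for i in range(n + 1)]
    hull = lower_convex_hull(grid)
    waves = []
    for (a, fa), (b, fb) in zip(hull, hull[1:]):
        # Segments longer than one grid cell are chords below the graph.
        kind = "shock" if b - a > 1.5 * step else "fan"
        waves.append((kind, a, b, (fb - fa) / (b - a)))
    return waves


def cubic(u):
    return u ** 3
```

For instance, `envelope_waves(cubic, -1.0, 2.0)` begins with one long chord ending near $u=1/2$, where the chord from $u_L=-1$ is tangent to the graph, and continues with short segments that approximate a fan up to $u_R=2$. With `u_left=-2.0` and `u_right=0.5` the hull is a single chord, so the sketch reports one shock.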
This warning case explains why the scalar theory still needs Kruzhkov entropy solutions outside the convex setting. The explicit Riemann solver must then inspect the convex or concave envelope of the flux between the two states.
## Wave Interaction and the Front-Tracking Preview
What happens when the initial data contain many jumps rather than one? Each jump generates a local Riemann solution, and the resulting elementary waves move until they meet. Understanding their interactions is the next step from exact Riemann solutions to approximate solutions for general initial data.
[definition: Piecewise Constant Front-Tracking Approximation]
Let $T>0$. A front-tracking approximation is an approximate solution
\begin{align*}
u^\delta: \mathbb R \times [0,T] \to \mathbb R, \qquad u^\delta \in L^\infty(\mathbb R \times [0,T]),
\end{align*}
which is piecewise constant on finitely many polygonal regions in the $(x,t)$ plane. The interfaces between these regions are moving fronts, and each front is a shock, contact, or small rarefaction front produced by a local Riemann solver.
[/definition]
Rarefaction fans are continuous, so a front-tracking scheme replaces them by many small jumps. This discretisation turns the PDE evolution into a finite interaction problem in the $(x,t)$ plane.
[explanation: Interaction Picture]
Before two fronts meet, each front moves with the speed prescribed by its local left and right states. At an interaction point, the outgoing pattern is obtained by solving a new Riemann problem whose left and right states are the outer states adjacent to the interaction. For convex scalar laws, shocks tend to merge with shocks, rarefactions spread, and the total variation does not increase under the entropy flow.
This picture is the mechanism behind existence proofs for entropy solutions with bounded variation initial data: construct front-tracking approximations, control their total variation and interaction count, and then pass to an $L^1_{\mathrm{loc}}$ limit. The Riemann problem is therefore the local algebra used at every interaction vertex.
[/explanation]
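The interaction step can be made concrete for two approaching shocks. The following is a minimal sketch, assuming Burgers' flux $f(u)=u^2/2$ as a representative convex example and three decreasing states, so that both incoming fronts and the outgoing front are shocks; the function names and the return convention are hypothetical.

```python
def burgers_flux(u):
    return 0.5 * u * u


def shock_speed(f, u_left, u_right):
    """Rankine-Hugoniot speed of the jump from u_left to u_right."""
    return (f(u_right) - f(u_left)) / (u_right - u_left)


def merge_two_shocks(f, states, positions):
    """Interaction of two shock fronts starting at time 0.

    states = (u0, u1, u2) are the constant states from left to right and
    positions = (x_a, x_b) are the initial front locations with x_a < x_b.
    Returns (t_star, x_star, outgoing_speed), or None if the fronts never meet.
    """
    u0, u1, u2 = states
    x_a, x_b = positions
    s_a = shock_speed(f, u0, u1)
    s_b = shock_speed(f, u1, u2)
    if s_a <= s_b:
        return None
    t_star = (x_b - x_a) / (s_a - s_b)
    x_star = x_a + s_a * t_star
    # The new Riemann problem uses the outer states u0 and u2 only.
    return t_star, x_star, shock_speed(f, u0, u2)
```

The middle state `u1` disappears at the interaction vertex, which is why the number of fronts decreases when shocks merge.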
The chapter's main lesson is that entropy solutions are selected locally. In the scalar convex case, every jump is resolved by a single shock or rarefaction, and this local rule is compatible with global stability through $L^1$ contraction. Later hyperbolic theory keeps the Riemann problem at the centre, but the solver becomes a multi-wave object because systems have several characteristic speeds rather than the single scalar speed $f'(u)$.
The Riemann problem shows how to treat discontinuities by enlarging the notion of solution, but it still lives within conservation-law theory. Chapter 8 broadens the analytical toolkit by introducing distributions, so singular sources, jumps, and derivatives can all be handled in one calculus.
# 8. Distributions, Fundamental Solutions, and Green Kernels
Distributions let us keep the operations of calculus after classical derivatives have stopped existing. In Chapters 5-7, weak formulations and conservation laws already forced us to treat jumps and boundary terms as first-class objects; point sources now join the same distributional language. This chapter makes that language precise: test functions probe an equation locally, distributions act on those probes, and fundamental solutions turn forced linear equations into convolution formulas.
The prerequisites are the divergence theorem, integration by parts on smooth domains, multi-index notation, basic $L^p$ and locally integrable functions, and the weak formulations from the preceding chapters. The guiding question is how much of the classical calculus of PDEs survives when functions are only locally integrable, discontinuous, or even concentrated at a point. The answer is that differentiation, convolution, and Green identities continue to work, provided every statement is interpreted through integration against smooth compactly supported functions.
## Test Functions and Distributions
A classical solution of a PDE can be tested by multiplying the equation by a smooth compactly supported function and integrating by parts. The point of the distributional viewpoint is to reverse this procedure: instead of differentiating the unknown first, we move derivatives onto the test function and use the resulting identity as the definition.
[definition: Test Function]
Let $\Omega \subset \mathbb R^n$ be open. A test function on $\Omega$ is an element of
\begin{align*}
\mathcal D(\Omega) := C_c^\infty(\Omega).
\end{align*}
[/definition]
Thus each $\phi\in\mathcal D(\Omega)$ is a map $\phi:\Omega\to\mathbb R$ whose derivatives of all orders exist and whose support is a compact subset of $\Omega$. Test functions are smooth enough to absorb any number of derivatives, and their compact support removes boundary terms from integration by parts. They give a controlled way to probe local information, but the PDE object itself still needs to be defined by how it responds to all such probes. This motivates replacing functions by linear functionals on $\mathcal D(\Omega)$, which is the smallest enlargement needed to include both ordinary locally integrable functions and point sources.
[definition: Distribution]
Let $\Omega \subset \mathbb R^n$ be open. A distribution on $\Omega$ is a continuous linear functional
\begin{align*}
T : \mathcal D(\Omega) \to \mathbb R.
\end{align*}
The space of distributions on $\Omega$ is denoted $\mathcal D'(\Omega)$.
[/definition]
The continuity condition uses the standard test-function topology; in these notes, the essential consequence is that distributions are determined by how they act on compactly supported smooth probes and are stable under limiting operations used in weak PDE arguments. Since PDEs usually begin with functions rather than abstract functionals, we next record the canonical way a locally integrable function becomes a distribution.
[definition: Regular Distribution]
Let $f \in L^1_{\mathrm{loc}}(\Omega)$. The [regular distribution](/page/Regular%20Distribution) associated to $f$ is the distribution $T_f \in \mathcal D'(\Omega)$ defined by
\begin{align*}
T_f(\phi) := \int_\Omega f(x)\phi(x)\,d\mathcal L^n(x), \qquad \phi \in \mathcal D(\Omega).
\end{align*}
[/definition]
Different locally integrable representatives that agree a.e. define the same regular distribution. This is why distribution theory naturally sits between pointwise classical analysis and measure-theoretic weak formulations. The enlargement is genuine, because some PDE sources are concentrated at a point and cannot be represented by locally integrable densities.
[example: Dirac Mass]
Fix $x_0\in\Omega$. The Dirac mass at $x_0$ is the functional
\begin{align*}
\delta_{x_0}(\phi):=\phi(x_0),\qquad \phi\in\mathcal D(\Omega).
\end{align*}
It is linear because for $a,b\in\mathbb R$ and $\phi,\psi\in\mathcal D(\Omega)$, evaluating the test function $a\phi+b\psi$ at $x_0$ gives
\begin{align*}
\delta_{x_0}(a\phi+b\psi)=(a\phi+b\psi)(x_0)=a\phi(x_0)+b\psi(x_0)=a\delta_{x_0}(\phi)+b\delta_{x_0}(\psi).
\end{align*}
It is continuous in the test-function topology: if $\phi_j\to0$ in $\mathcal D(\Omega)$, then the supports eventually lie in one compact subset of $\Omega$ and $\sup_{x\in\Omega}|\phi_j(x)|\to0$, so
\begin{align*}
|\delta_{x_0}(\phi_j)|=|\phi_j(x_0)|\le \sup_{x\in\Omega}|\phi_j(x)|\to0.
\end{align*}
Thus $\delta_{x_0}\in\mathcal D'(\Omega)$.
This distribution is not induced by any $g\in L^1_{\mathrm{loc}}(\Omega)$. Suppose, for contradiction, that
\begin{align*}
\int_\Omega g(x)\phi(x)\,d\mathcal L^n(x)=\phi(x_0)
\end{align*}
for every $\phi\in\mathcal D(\Omega)$. If $U\Subset\Omega$ is open and $x_0\notin U$, then every $\phi\in C_c^\infty(U)$ satisfies $\phi(x_0)=0$, hence
\begin{align*}
\int_U g(x)\phi(x)\,d\mathcal L^n(x)=0.
\end{align*}
By the standard testing lemma for locally integrable functions, $g=0$ a.e. on each such $U$, and therefore $g=0$ a.e. on $\Omega\setminus\{x_0\}$. Since $\mathcal L^n(\{x_0\})=0$, this gives $g=0$ a.e. on $\Omega$. Choosing $\eta\in\mathcal D(\Omega)$ with $\eta(x_0)=1$ then gives
\begin{align*}
0=\int_\Omega g(x)\eta(x)\,d\mathcal L^n(x)=\eta(x_0)=1,
\end{align*}
a contradiction. Thus $\delta_{x_0}$ is a genuinely singular distribution, representing a unit point source located at $x_0$.
[/example]
The key operation is differentiation. For smooth functions, integration by parts says that differentiating the function is the same as differentiating the test function with a sign change. The next definition promotes that identity to all distributions, including regular distributions from nonsmooth functions and singular point masses.
[definition: Distributional Derivative]
Let $T \in \mathcal D'(\Omega)$ and let $\alpha$ be a multi-index. The [distributional derivative](/page/Distributional%20Derivative) $D^\alpha T \in \mathcal D'(\Omega)$ is defined by
\begin{align*}
D^\alpha T(\phi) := (-1)^{|\alpha|}T(D^\alpha \phi), \qquad \phi \in \mathcal D(\Omega).
\end{align*}
[/definition]
The definition is useful only if it agrees with classical differentiation when classical derivatives exist. The first identity in this chapter is the consistency result: the distributional derivative of a smooth regular distribution is the regular distribution associated to the classical derivative.
[quotetheorem:6158]
[citeproof:6158]
This theorem says that distributional differentiation extends the old operation instead of changing it. The compact support hypothesis on $\phi$ is essential: on $\Omega=(0,1)$ with $f\equiv 1$, integration by parts against the non-compactly supported smooth function $\phi(x)=x$ gives
\begin{align*}
-\int_0^1 \phi'(x)\,d\mathcal L^1(x)=-1,
\end{align*}
although the classical derivative of $f$ is $0$; the missing term is exactly the boundary contribution $\phi(0)-\phi(1)$. Compactly supported test functions remove this dependence on boundary values.
The $C^1$ assumption on $f$ is needed only for the conclusion that the distributional derivative is the regular distribution induced by $\partial_i f$. If $f=H=\mathbb{1}_{(0,\infty)}$ on $\mathbb R$, then $H\in L^1_{\mathrm{loc}}(\mathbb R)$ but no locally integrable function represents its distributional derivative, because the derivative is the point mass $\delta_0$. This limitation is exactly what makes singular terms visible: when a function has a jump, integration by parts detects the boundary of the jump set. The most basic case is the Heaviside function.
[example: Heaviside Derivative]
Let $H:\mathbb R\to\mathbb R$ be $H(x)=\mathbb{1}_{(0,\infty)}(x)$. Since $H=0$ on $(-\infty,0]$ and $H=1$ on $(0,\infty)$, its regular distribution satisfies
\begin{align*}
T_H(\psi)=\int_{\mathbb R}H(x)\psi(x)\,d\mathcal L^1(x)=\int_0^\infty \psi(x)\,d\mathcal L^1(x)
\end{align*}
for every $\psi\in C_c^\infty(\mathbb R)$. Therefore, for $\phi\in C_c^\infty(\mathbb R)$, the definition of distributional derivative gives
\begin{align*}
D T_H(\phi)=-T_H(\phi')=-\int_0^\infty \phi'(x)\,d\mathcal L^1(x).
\end{align*}
Because $\phi$ has compact support, there is $R>0$ such that $\phi(x)=0$ for all $x\ge R$. Hence
\begin{align*}
-\int_0^\infty \phi'(x)\,d\mathcal L^1(x)=-\int_0^R \phi'(x)\,dx=-(\phi(R)-\phi(0)).
\end{align*}
Since $\phi(R)=0$, this becomes
\begin{align*}
-(\phi(R)-\phi(0))=\phi(0).
\end{align*}
Thus
\begin{align*}
D T_H(\phi)=\phi(0)=\delta_0(\phi)
\end{align*}
for every test function $\phi$, so $D T_H=\delta_0$ in $\mathcal D'(\mathbb R)$. The jump of size $1$ at the origin is recorded distributionally as a unit Dirac mass.
[/example]
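The identity $DT_H(\phi)=\phi(0)$ can be checked numerically for one concrete test function. The following is a small sketch, assuming the standard bump $\exp(-1/(1-x^2))$ shifted by $0.3$ as the test function and a plain trapezoid rule for the integral; the shift, the truncation point, and the grid size are arbitrary illustrative choices.

```python
import math


def bump(x):
    """Smooth function supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def bump_prime(x):
    """Classical derivative of bump."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2


def phi(x):
    """Test function with phi(0) != 0, supported in [-0.7, 1.3]."""
    return bump(x - 0.3)


def phi_prime(x):
    return bump_prime(x - 0.3)


def heaviside_derivative_pairing(d_phi, right_end=2.0, n=20000):
    """Approximate -int_0^R phi'(x) dx, with R beyond the support of phi."""
    h = right_end / n
    total = 0.5 * (d_phi(0.0) + d_phi(right_end))
    for i in range(1, n):
        total += d_phi(i * h)
    return -h * total
```

Comparing `heaviside_derivative_pairing(phi_prime)` with `phi(0.0)` reproduces the pairing with $\delta_0$ up to quadrature error.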
## Convolution and Distributional Solutions
Many linear constant-coefficient PDEs are solved by translating the effect of a point source. Convolution is the operation that superposes translated point-source responses, so it is the algebraic bridge between distributions and representation formulas.
[definition: Convolution]
The test-function convolution operation is the bilinear map
\begin{align*}
* : L^1_{\mathrm{loc}}(\mathbb R^n) \times C_c^\infty(\mathbb R^n) \to C^\infty(\mathbb R^n)
\end{align*}
defined by
\begin{align*}
(f*g)(x) := \int_{\mathbb R^n} f(x-y)g(y)\,d\mathcal L^n(y), \qquad x\in\mathbb R^n.
\end{align*}
[/definition]
The same formula extends under standard hypotheses such as one factor being compactly supported or rapidly decaying. In PDE applications, the main use is that differentiating a convolution may be transferred from one factor to the other. The next theorem makes this compatibility precise, which is needed before convolution can be used as a solution formula for differential equations.
[quotetheorem:6159]
[citeproof:6159]
The compact support and smoothness of $g$ are what justify differentiating under the integral and applying Fubini without extra decay assumptions on $f$. The theorem has three separate limits. First, it does not assert that arbitrary locally integrable functions can be convolved: on $\mathbb R$, taking $f\equiv 1$ and $g\equiv 1$ gives $\int_{\mathbb R}1\,d\mathcal L^1=\infty$. Second, it does not define convolution of two arbitrary distributions; distribution-distribution convolution requires additional hypotheses, such as compact support for one factor. Third, it gives regularity by smoothing against a compactly supported smooth function, not by differentiating a nonsmooth kernel pointwise. If $f=\delta_0$ as a distribution and $g=H$ is the Heaviside function, then $\delta_0*g=H$, whose derivative is $\delta_0$ rather than an ordinary function obtained by differentiating a smooth kernel under the integral sign. These boundary cases explain why the theorem uses a compactly supported smooth factor when no global decay or distribution-convolution machinery has been imposed.
This result explains why convolution is compatible with operators whose coefficients do not depend on position. If the coefficients vary with $x$, differentiating a convolution no longer simply passes the operator through the integral: derivatives can hit the variable coefficients and produce extra commutator terms. The [representation formula](/theorems/39) below is therefore restricted to the class of linear differential operators whose coefficients are fixed constants.
[definition: Constant-Coefficient Linear Differential Operator]
A constant-coefficient linear differential operator of order at most $m$ on $\mathbb R^n$ is the [linear map](/page/Linear%20Map)
\begin{align*}
L:C^\infty(\mathbb R^n)\to C^\infty(\mathbb R^n)
\end{align*}
defined by
\begin{align*}
Lu = \sum_{|\alpha|\le m} a_\alpha D^\alpha u,
\end{align*}
where $a_\alpha \in \mathbb R$ are constants and $m \in \mathbb N$.
[/definition]
The next step is to turn the informal idea of a point-source response into an object that can be used in distributional calculations. For a general forcing term $f$, one would like to build a solution by adding up the effects of infinitesimal point sources. This requires first isolating the response to a single unit source at the origin: a distribution whose image under $L$ is exactly the Dirac mass there.
[definition: Fundamental Solution]
Let $L$ be a constant-coefficient linear differential operator on $\mathbb R^n$. A fundamental solution for $L$ is a distribution $E \in \mathcal D'(\mathbb R^n)$ such that
\begin{align*}
LE = \delta_0
\end{align*}
in $\mathcal D'(\mathbb R^n)$.
[/definition]
The equation $LE=\delta_0$ says that $E$ is the field generated by a unit point source at the origin. Translating $E$ gives the field generated by a point source elsewhere, and convolution with $f$ superposes these translated responses. The next theorem is the formal engine behind whole-space representation formulas.
[quotetheorem:6171]
[citeproof:6171]
This theorem converts the search for solutions into the search for kernels. The constant-coefficient hypothesis is essential for the displayed calculation. For instance, in one dimension let $L=a(x)D$ with nonconstant $a\in C^1(\mathbb R)$ and let $E=H$, so $DE=\delta_0$. Then
\begin{align*}
L(E*f)=a(x)(D E*f),
\end{align*}
whereas $(LE)*f=(aD E)*f$ concentrates the coefficient at the singular variable and gives $a(0)(\delta_0*f)$ in the simplest point-source calculation. These agree only when $a(x)=a(0)$ on the relevant region, so variable coefficients cannot be handled by this convolution identity without correction terms.
The theorem also does not assert existence or uniqueness of a fundamental solution, nor does it give regularity of $E*f$ beyond the distributional conclusion $Lu=f$. The compact support and smoothness of $f$ keep the convolution with the distribution $E$ in the standard well-defined setting. A concrete failure occurs for the Laplacian in dimension $n\ge3$: if $f\equiv1$ on $\mathbb R^n$, then the formal potential at $x=0$ would contain
\begin{align*}
\int_{\mathbb R^n} |y|^{2-n}\,d\mathcal L^n(y),
\end{align*}
whose radial form is a nonzero constant times $\int_0^\infty r\,dr$, so it diverges at infinity. Thus the condition that $E*f$ be well-defined is a real hypothesis, not only a technical phrase. Once a fundamental solution is known, the forced equation is solved by an integral formula, and the formula can often be read geometrically from the characteristics of the operator.
[illustration:pdei-transport-fundamental-solution]
[example: Forced Transport Equation]
Consider $L=\partial_t+c\partial_x$ on $\mathbb R^2$, with $c\in\mathbb R$. The point-source response is concentrated on the forward characteristic $x=ct$, $t>0$, so we write it as $E(t,x)=H(t)\delta(x-ct)$ in the distributional sense. For $f\in C_c^\infty(\mathbb R^2)$, convolution with this kernel gives
\begin{align*}
(E*f)(t,x)=\int_{\mathbb R}\int_{\mathbb R} H(t-s)\delta\bigl((x-y)-c(t-s)\bigr)f(s,y)\,d\mathcal L^1(y)\,d\mathcal L^1(s).
\end{align*}
For fixed $s$, the Dirac factor evaluates the $y$-integral at the point determined by $(x-y)-c(t-s)=0$, namely $y=x-c(t-s)$. Hence
\begin{align*}
(E*f)(t,x)=\int_{\mathbb R} H(t-s)f\bigl(s,x-c(t-s)\bigr)\,d\mathcal L^1(s).
\end{align*}
Since $H(t-s)=1$ exactly when $s<t$ and $H(t-s)=0$ when $s>t$, this becomes
\begin{align*}
u(t,x):=(E*f)(t,x)=\int_{-\infty}^t f\bigl(s,x-c(t-s)\bigr)\,ds.
\end{align*}
Because $f$ is smooth and compactly supported, differentiating under the integral is justified. The upper endpoint contributes $f(t,x)$, and the chain rule gives
\begin{align*}
\frac{\partial}{\partial t}f\bigl(s,x-c(t-s)\bigr)=-c\,\partial_x f\bigl(s,x-c(t-s)\bigr).
\end{align*}
Therefore
\begin{align*}
\partial_tu(t,x)=f(t,x)-c\int_{-\infty}^t \partial_x f\bigl(s,x-c(t-s)\bigr)\,ds.
\end{align*}
Similarly, differentiating with respect to $x$ gives
\begin{align*}
\partial_xu(t,x)=\int_{-\infty}^t \partial_x f\bigl(s,x-c(t-s)\bigr)\,ds.
\end{align*}
Adding these two identities in the combination $\partial_tu+c\partial_xu$ yields
\begin{align*}
(\partial_t+c\partial_x)u(t,x)=f(t,x)-c\int_{-\infty}^t \partial_x f\bigl(s,x-c(t-s)\bigr)\,ds+c\int_{-\infty}^t \partial_x f\bigl(s,x-c(t-s)\bigr)\,ds.
\end{align*}
The two integral terms cancel, so
\begin{align*}
(\partial_t+c\partial_x)u(t,x)=f(t,x).
\end{align*}
Thus the convolution formula solves the forced transport equation, and the value at $(t,x)$ is obtained by integrating the forcing along the backward characteristic $s\mapsto (s,x-c(t-s))$ for $s\le t$.
[/example]
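The representation formula can be tested numerically at a single point. The following is a rough sketch, assuming a product of two bump functions as the forcing (so $f$ is smooth with support in $[-1,1]^2$), a trapezoid rule along the backward characteristic, and centered finite differences for $\partial_t$ and $\partial_x$; all names, step sizes, and the sample point are illustrative.

```python
import math


def bump(z):
    """Smooth function supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - z * z)) if abs(z) < 1.0 else 0.0


def forcing(s, y):
    """Compactly supported smooth forcing f(s, y)."""
    return bump(s) * bump(y)


def transport_solution(t, x, c, n=2000):
    """u(t, x) = int_{-inf}^t f(s, x - c (t - s)) ds by the trapezoid rule."""
    lo, hi = -1.0, min(t, 1.0)  # f vanishes for |s| >= 1
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        s = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * forcing(s, x - c * (t - s))
    return total * h


def residual(t, x, c, d=1e-3):
    """Finite-difference value of u_t + c u_x - f at (t, x)."""
    u_t = (transport_solution(t + d, x, c) - transport_solution(t - d, x, c)) / (2 * d)
    u_x = (transport_solution(t, x + d, c) - transport_solution(t, x - d, c)) / (2 * d)
    return u_t + c * u_x - forcing(t, x)
```

A small value of `residual(0.3, 0.2, 0.5)` is consistent with $(\partial_t+c\partial_x)u=f$; it is a spot check at one point, not a proof.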
## Fundamental Solutions for the Laplacian
The Laplacian is the model [elliptic operator](/page/Elliptic%20Operator). Its fundamental solution describes the potential generated by a point source, and it is the prototype for Poisson equations, Green kernels, and boundary value representation formulas.
[explanation: Fundamental Solution of the Laplacian]
For $n\ge3$, the Newtonian kernel
\begin{align*}
\Phi(x)=\frac{1}{n(n-2)\omega_n}|x|^{2-n}
\end{align*}
is harmonic away from the origin and is normalized by $-\Delta\Phi=\delta_0$ in distributions. Here $\omega_n$ denotes the volume of the unit ball in $\mathbb R^n$, so $n\omega_n$ is the surface area of the unit sphere. In dimension $2$, the corresponding convention is $\Phi(x)=-(2\pi)^{-1}\log |x|$. This is the fundamental solution convention used below for the positive operator $-\Delta$.
[/explanation]
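The constant in $\Phi$ is fixed by requiring that the outward flux of $-\nabla\Phi$ through every sphere centered at the origin equals $1$. The following sketch checks this numerically, assuming $\omega_n$ is the volume of the unit ball, computed as $\pi^{n/2}/\Gamma(n/2+1)$; the radial derivative is taken by a centered difference, and the function names are illustrative.

```python
import math


def unit_ball_volume(n):
    """Volume of the unit ball in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def newtonian_kernel(r, n):
    """Radial profile of Phi for n >= 2 under the convention -Laplace(Phi) = delta."""
    if n == 2:
        return -math.log(r) / (2.0 * math.pi)
    return r ** (2 - n) / (n * (n - 2) * unit_ball_volume(n))


def outward_flux(r, n, rel_step=1e-6):
    """Flux of -grad(Phi) through the sphere of radius r."""
    h = rel_step * r
    d_phi = (newtonian_kernel(r + h, n) - newtonian_kernel(r - h, n)) / (2.0 * h)
    sphere_area = n * unit_ball_volume(n) * r ** (n - 1)
    return -d_phi * sphere_area
```

The flux is independent of the radius, which is the radial form of the statement that $\Phi$ is harmonic away from the origin while the whole unit of source sits at $0$.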
The sign convention is chosen for the positive operator $-\Delta$. If the operator is instead $\Delta$, the fundamental solution is $-\Phi$, because $\Delta\Phi=-\delta_0$ under the convention above. This sign is not cosmetic: using $\Phi$ for $\Delta u=f$ would produce $-f$ rather than $f$.
The formulas are dimension-dependent because the radial harmonic functions solving the flux normalisation problem are powers in dimensions $n\ge3$ and logarithmic in dimension $2$. The case $n=1$ is excluded from the displayed formulas because the corresponding radial ODE has a piecewise linear solution rather than a power or logarithm: $\Phi(x)=-|x|/2$ satisfies $-\Phi''=\delta_0$ on $\mathbb R$. The identity $-\Delta\Phi=\delta_0$ does not say that $\Phi$ is a classical solution at $0$; the whole point is that the missing flux at the singularity is recorded as $\delta_0$. With this convention, the Poisson equation $-\Delta u=f$ on the whole space is represented by convolution with $\Phi$, giving the Newtonian potential.
[example: Newtonian Potential]
Let $n\ge3$ and $f\in C_c^\infty(\mathbb R^n)$, and write
\begin{align*}
\Phi(z)=\frac{1}{n(n-2)\omega_n}|z|^{2-n},\qquad z\neq0.
\end{align*}
Define
\begin{align*}
u(x)=\int_{\mathbb R^n}\Phi(x-y)f(y)\,d\mathcal L^n(y)=\int_{\mathbb R^n}\frac{1}{n(n-2)\omega_n}|x-y|^{2-n}f(y)\,d\mathcal L^n(y).
\end{align*}
This is the convolution $u=\Phi*f$ with the convention
\begin{align*}
(\Phi*f)(x)=\int_{\mathbb R^n}\Phi(x-y)f(y)\,d\mathcal L^n(y).
\end{align*}
By *Fundamental Solution of the Laplacian*, $-\Delta\Phi=\delta_0$ in $\mathcal D'(\mathbb R^n)$. Applying *[Fundamental Solution Representation](/theorems/6171)* to $L=-\Delta$ gives
\begin{align*}
-\Delta u=(-\Delta)(\Phi*f).
\end{align*}
Since $-\Delta$ has constant coefficients, it passes through convolution with the test function $f$, so
\begin{align*}
(-\Delta)(\Phi*f)=((-\Delta)\Phi)*f.
\end{align*}
Using $(-\Delta)\Phi=\delta_0$, this becomes
\begin{align*}
((-\Delta)\Phi)*f=\delta_0*f.
\end{align*}
The Dirac mass is the convolution identity on test functions, hence
\begin{align*}
\delta_0*f=f.
\end{align*}
Therefore
\begin{align*}
-\Delta u=f
\end{align*}
in $\mathcal D'(\mathbb R^n)$.
Now let $x_0\notin\operatorname{supp}f$. Choose $r>0$ such that $B(x_0,r)\cap\operatorname{supp}f=\varnothing$. If $x\in B(x_0,r)$ and $y\in\operatorname{supp}f$, then $x-y\neq0$, so $x\mapsto\Phi(x-y)$ is smooth near $x$. Since $f$ has compact support and the kernel is smooth on these pairs $(x,y)$, differentiating under the integral gives
\begin{align*}
\Delta u(x)=\int_{\mathbb R^n}\Delta_x\Phi(x-y)f(y)\,d\mathcal L^n(y).
\end{align*}
The integrand vanishes whenever $y\in\operatorname{supp}f$, because $\Phi$ is harmonic away from $0$, so
\begin{align*}
\Delta u(x)=\int_{\operatorname{supp}f}0\cdot f(y)\,d\mathcal L^n(y)=0.
\end{align*}
Thus the Newtonian potential solves $-\Delta u=f$ distributionally on all of $\mathbb R^n$, and it is classically harmonic at every point outside the support of the forcing term.
[/example]
The one-dimensional Poisson equation shows the same idea with less geometry. A point source produces a kink, and the second derivative detects the jump in the first derivative.
[example: One-Dimensional Point Source]
On $\mathbb R$, set $\Phi(x)=-|x|/2$. Equivalently, $\Phi(x)=x/2$ when $x<0$ and $\Phi(x)=-x/2$ when $x>0$. We show that $-\Phi''=\delta_0$ in the distributional sense.
Let $\phi\in C_c^\infty(\mathbb R)$, and choose $R>0$ such that $\phi(x)=0$ for $|x|\ge R$. Since $\phi$ is identically zero near $\pm R$, also $\phi'(-R)=\phi'(R)=0$. By the definition of the second distributional derivative,
\begin{align*}
-D^2T_\Phi(\phi)=-T_\Phi(\phi'').
\end{align*}
Using the two formulas for $\Phi$ on $(-R,0)$ and $(0,R)$ gives
\begin{align*}
-T_\Phi(\phi'')=-\int_{-R}^0 \frac{x}{2}\phi''(x)\,dx+\frac12\int_0^R x\phi''(x)\,dx.
\end{align*}
For the first integral, integration by parts gives
\begin{align*}
\int_{-R}^0 x\phi''(x)\,dx=\left[x\phi'(x)\right]_{-R}^0-\int_{-R}^0\phi'(x)\,dx.
\end{align*}
The boundary term is $0\cdot\phi'(0)-(-R)\phi'(-R)=0$, and
\begin{align*}
\int_{-R}^0\phi'(x)\,dx=\phi(0)-\phi(-R)=\phi(0).
\end{align*}
Hence
\begin{align*}
\int_{-R}^0 x\phi''(x)\,dx=-\phi(0).
\end{align*}
For the second integral,
\begin{align*}
\int_0^R x\phi''(x)\,dx=\left[x\phi'(x)\right]_0^R-\int_0^R\phi'(x)\,dx.
\end{align*}
The boundary term is $R\phi'(R)-0\cdot\phi'(0)=0$, and
\begin{align*}
\int_0^R\phi'(x)\,dx=\phi(R)-\phi(0)=-\phi(0).
\end{align*}
Therefore
\begin{align*}
\int_0^R x\phi''(x)\,dx=\phi(0).
\end{align*}
Substituting the two computed integrals,
\begin{align*}
-D^2T_\Phi(\phi)=-\frac12\bigl(-\phi(0)\bigr)+\frac12\phi(0)=\phi(0).
\end{align*}
Since $\delta_0(\phi)=\phi(0)$, this proves
\begin{align*}
-\Phi''=\delta_0
\end{align*}
in $\mathcal D'(\mathbb R)$.
For a point source at $a\in\mathbb R$, define $u(x)=\Phi(x-a)=-|x-a|/2$. Translating the preceding computation from $0$ to $a$ gives
\begin{align*}
-u''=\delta_a
\end{align*}
in distributions. Thus a one-dimensional point source produces a kink, and the first derivative jumps by $-1$ at the source point.
[/example]
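Superposing translated kinks gives a solution of $-u''=f$ on the line, and this can be spot-checked numerically. The following is a minimal sketch, assuming a bump function as the forcing, a trapezoid rule split at the kink $y=x$, and a centered second difference; the names and step sizes are illustrative.

```python
import math


def bump(z):
    """Smooth forcing supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - z * z)) if abs(z) < 1.0 else 0.0


def trapezoid(g, a, b, n=2000):
    """Composite trapezoid rule for g on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * g(a) + sum(g(a + i * h) for i in range(1, n)) + 0.5 * g(b))


def potential(x, f=bump, support=(-1.0, 1.0)):
    """u(x) = int Phi(x - y) f(y) dy with Phi(z) = -|z| / 2."""
    a, b = support
    kernel = lambda y: -0.5 * abs(x - y) * f(y)
    if a < x < b:
        # Split at the kink so each piece has a smooth integrand.
        return trapezoid(kernel, a, x) + trapezoid(kernel, x, b)
    return trapezoid(kernel, a, b)


def minus_second_difference(u, x, d=0.05):
    """Centered approximation of -u''(x)."""
    return -(u(x + d) - 2.0 * u(x) + u(x - d)) / (d * d)
```

Inside the support, `minus_second_difference(potential, 0.2)` should be close to `bump(0.2)`; outside the support the potential is affine in $x$, so the second difference is essentially zero.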
## Green Kernels and Boundary Value Problems
Fundamental solutions solve equations in the whole space, where translation symmetry is available. Boundary value problems break translation symmetry, so the kernel must also encode the boundary condition.
[illustration:pdei-dirichlet-green-kernel]
[definition: Green Function]
Let $\Omega\subset\mathbb R^n$ be open, let $X\subset \mathcal D'(\Omega)$ be a space on which a boundary trace condition is defined, and let
\begin{align*}
L:X\to \mathcal D'(\Omega)
\end{align*}
be a linear differential operator. A Green function for $L:X\to\mathcal D'(\Omega)$ with the specified boundary condition is a distribution kernel together with a section map
\begin{align*}
G\in \mathcal D'(\Omega_x\times\Omega_y), \qquad
\Omega\to X,\quad y\mapsto G_y,
\end{align*}
such that, for each $y\in\Omega$, the section $G_y$, written $G(\cdot,y)$ when the kernel is represented by a function off the diagonal, satisfies
\begin{align*}
L_xG_y=\delta_y
\end{align*}
in $\mathcal D'(\Omega)$, together with the prescribed boundary condition in the $x$ variable.
[/definition]
A Green function is a solution operator written as a kernel. To make the mapping property explicit, suppose $F\subset \mathcal D'(\Omega)$ is a forcing space and $Y\subset X$ is the corresponding solution space, with the inclusion $Y\hookrightarrow\mathcal D'(\Omega)$ understood. When the integral pairing below is defined for every $f\in F$ and gives an element of $Y$, the Green kernel induces the linear solution-kernel map
\begin{align*}
K_G:F\to Y,\qquad
K_Gf:=\left(x\mapsto\int_\Omega G(x,y)f(y)\,d\mathcal L^n(y)\right).
\end{align*}
In a classical smooth-kernel setting, one may take for example $F=C_c^\infty(\Omega)$ and $Y=C^2(\Omega)\cap X$ when the integral has that regularity. In weak elliptic settings, typical choices are $F=H^{-1}(\Omega)$ or $F=L^2(\Omega)$ and $Y=H^1_0(\Omega)$, with the formula interpreted through the weak pairing rather than as an absolutely convergent pointwise integral. If $L_xG(x,y)=\delta_y$, then this operator superposes the response to every infinitesimal source in the forcing term. The next theorem gives the boundary-adapted analogue of the whole-space convolution representation.
[quotetheorem:6160]
[citeproof:6160]
The homogeneity of the boundary condition is used so that the integral of kernels satisfying the condition still satisfies the same condition. The hypotheses about passing $L_x$ through the integral are not decorative. For the Laplacian in dimensions $n\ge2$, the fundamental singularity has second derivatives of size comparable to $|x-y|^{-n}$ near the diagonal, which are not absolutely integrable in $y$; applying $\Delta_x$ under the integral as an ordinary pointwise operation can therefore lose the Dirac contribution or produce a divergent integral. The distributional interpretation keeps the cancellation and the point mass together.
The boundary condition can fail just as concretely. If one uses the whole-space fundamental solution $\Phi(x-y)$ on a bounded domain instead of the Dirichlet Green function, then $x\mapsto \int_\Omega \Phi(x-y)f(y)\,d\mathcal L^n(y)$ generally has nonzero trace on $\partial\Omega$; for example, in dimensions $n\ge3$, where the fundamental solution of $-\Delta$ is strictly positive, a nonnegative nonzero $f\in C_c^\infty(\Omega)$ produces strictly positive boundary values. Thus the kernel must encode the boundary condition, not merely the interior point-source equation. The representation theorem is a representation principle rather than an existence theorem for $G$; constructing $G$ is a separate elliptic problem.
The preceding argument explains how a Green kernel acts once it is known, but it does not explain where such kernels come from. For the Laplacian, the construction and representation formulas are governed by an integration-by-parts identity that keeps track of the boundary flux. The classical Green identities stated next supply this boundary bookkeeping.
[explanation: Green Identities]
Let $U\subset\mathbb R^n$ be bounded with $C^1$ boundary, and let $u,v\in C^2(\overline U)$. The first Green identity is
\begin{align*}
\int_U \nabla u\cdot\nabla v\,d\mathcal L^n
=
\int_{\partial U}u\,\frac{\partial v}{\partial\nu}\,d\mathcal H^{n-1}
-
\int_U u\,\Delta v\,d\mathcal L^n.
\end{align*}
The second Green identity is
\begin{align*}
\int_U (u\Delta v-v\Delta u)\,d\mathcal L^n
=
\int_{\partial U}\left(u\frac{\partial v}{\partial\nu}-v\frac{\partial u}{\partial\nu}\right)\,d\mathcal H^{n-1}.
\end{align*}
[/explanation]
The $C^1$ boundary and $C^2$ regularity up to the boundary are the classical assumptions that make the divergence theorem and normal derivatives available without trace theory. If $\Omega=(0,1)$ and $u(x)=|x-1/2|$, then $u$ is not $C^2$ and its second derivative contains the measure $2\delta_{1/2}$; substituting $u''$ as a classical function would miss an interior point-mass term. If the boundary has a cusp, for example $\Omega=\{(x,y):0<x<1,\ |y|<x^2\}$ near the origin, there is no classical outward unit normal at the cusp, so the boundary integral in the displayed formula is not a classical surface integral over a smooth normal field.
On rough domains or for weak solutions, the same identities survive only after replacing pointwise boundary values by traces and normal derivatives by weak fluxes. The displayed identities do not themselves provide those lower-regularity formulations, and they do not prove that a Green kernel exists: they are classical integration-by-parts identities, while weak trace theorems, conormal derivative definitions, and Green-kernel existence are separate results.
These identities are the analytic source of Green representation formulas. Choosing $v$ to be a Green kernel turns one of the Laplacian terms into a Dirac mass, thereby recovering the value of $u$ from interior forcing and boundary data.
[example: Dirichlet Green Kernel]
Let $G(x,y)$ be the Dirichlet Green function for $-\Delta$ on a smooth bounded domain $\Omega$. Thus, for each $y\in\Omega$, the section $G(\cdot,y)$ has zero Dirichlet trace on $\partial\Omega$ and satisfies
\begin{align*}
-\Delta_xG(\cdot,y)=\delta_y
\end{align*}
in $\mathcal D'(\Omega)$. For $f\in C_c^\infty(\Omega)$, define
\begin{align*}
u(x)=\int_\Omega G(x,y)f(y)\,d\mathcal L^n(y).
\end{align*}
We compute $-\Delta u$ distributionally. Let $\phi\in\mathcal D(\Omega)$. Passing $-\Delta_x$ through the $y$-integration in the distributional sense, as in the Green-kernel representation principle above, gives
\begin{align*}
(-\Delta u)(\phi)=\int_\Omega \bigl((-\Delta_xG(\cdot,y))(\phi)\bigr)f(y)\,d\mathcal L^n(y).
\end{align*}
Using the defining identity $-\Delta_xG(\cdot,y)=\delta_y$, this becomes
\begin{align*}
(-\Delta u)(\phi)=\int_\Omega \delta_y(\phi)f(y)\,d\mathcal L^n(y).
\end{align*}
By the definition of the Dirac mass at $y$,
\begin{align*}
\delta_y(\phi)=\phi(y).
\end{align*}
Therefore
\begin{align*}
(-\Delta u)(\phi)=\int_\Omega \phi(y)f(y)\,d\mathcal L^n(y).
\end{align*}
By the definition of the regular distribution associated to $f$,
\begin{align*}
\int_\Omega \phi(y)f(y)\,d\mathcal L^n(y)=T_f(\phi).
\end{align*}
Since this holds for every $\phi\in\mathcal D(\Omega)$, we have
\begin{align*}
-\Delta u=f
\end{align*}
in $\mathcal D'(\Omega)$.
The boundary condition is inherited from the sections of the kernel. If $x\in\partial\Omega$, then the zero Dirichlet trace of $G(\cdot,y)$ gives
\begin{align*}
G(x,y)=0
\end{align*}
in the boundary-trace sense for each $y\in\Omega$. Passing the trace through the integral gives
\begin{align*}
u(x)=\int_\Omega G(x,y)f(y)\,d\mathcal L^n(y).
\end{align*}
Substituting the boundary trace of the kernel,
\begin{align*}
u(x)=\int_\Omega 0\cdot f(y)\,d\mathcal L^n(y).
\end{align*}
Hence
\begin{align*}
u(x)=0.
\end{align*}
Thus the same Green kernel produces the interior forcing term and enforces the homogeneous Dirichlet boundary condition.
[/example]
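A one-dimensional instance makes the example concrete. The sketch below assumes $\Omega=(0,1)$, where the Dirichlet Green function of $-d^2/dx^2$ is $G(x,y)=x(1-y)$ for $x\le y$ and $G(x,y)=y(1-x)$ for $x\ge y$. It also assumes the illustrative forcing $f(y)=\sin(\pi y)$, which is smooth but not compactly supported in $(0,1)$, so it falls outside the hypothesis $f\in C_c^\infty(\Omega)$ and is used only because the exact solution $\sin(\pi x)/\pi^2$ is available for comparison. All function names are hypothetical.

```python
import math


def green_dirichlet(x, y):
    """Dirichlet Green function of -d^2/dx^2 on the interval (0, 1)."""
    return x * (1.0 - y) if x <= y else y * (1.0 - x)


def forcing(y):
    """Illustrative forcing term f(y) = sin(pi y)."""
    return math.sin(math.pi * y)


def exact(x):
    """Exact solution of -u'' = sin(pi x) with u(0) = u(1) = 0."""
    return math.sin(math.pi * x) / math.pi ** 2


def solve_by_kernel(f, x, num_cells=4000):
    """Midpoint-rule value of u(x) = int_0^1 G(x, y) f(y) dy."""
    h = 1.0 / num_cells
    total = 0.0
    for k in range(num_cells):
        y = (k + 0.5) * h
        total += green_dirichlet(x, y) * f(y) * h
    return total


for x in (0.0, 0.3, 0.5, 1.0):
    print(x, solve_by_kernel(forcing, x), exact(x))
```

The values at $x=0$ and $x=1$ vanish because every section $G(\cdot,y)$ vanishes there, which is the boundary inheritance described in the example.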
Green kernels therefore combine two ideas from earlier in the chapter: the singular response of a fundamental solution and the boundary bookkeeping supplied by Green identities. This is the template for later elliptic theory, where existence, regularity, and boundary traces are developed in functional-analytic spaces rather than only through smooth kernels.
Distributions provide the language for singular forcing and for the boundary terms that appear in weak formulations. Chapter 9 then uses that language to recover explicit representation formulas, showing how kernels and Green identities solve the classical model PDEs.
# 9. Classical Representation Formulas for Model PDEs
This chapter is the course's first systematic study of classical representation formulas for model second-order PDEs. The goal is to learn how explicit kernels and characteristic formulas solve the Laplace, Poisson, heat, and wave equations, while also revealing the qualitative behaviour that later elliptic, parabolic, and hyperbolic theory abstracts. The main prerequisites are multivariable calculus, the divergence theorem, differentiation under the integral sign, and basic estimates for integrals over balls, spheres, and the real line. Laplace and Poisson equations average data through the Newtonian potential, the heat equation spreads and smooths data through a Gaussian kernel, and the wave equation transports disturbances along characteristic cones.
## Harmonic Functions, Averages, and the Newtonian Potential
The first problem is to understand why solutions of $\Delta u = 0$ behave more rigidly than general twice differentiable functions. A harmonic function cannot have arbitrary local oscillation: its value at a point is encoded by its values on every surrounding sphere or ball. This averaging principle is the simplest form of elliptic regularity.
[definition: Harmonic Function]
Let $U \subset \mathbb R^n$ be open. A function $u:U\to\mathbb R$ with $u\in C^2(U)$ is harmonic in $U$ if
\begin{align*}
\Delta u = \sum_{i=1}^n \partial_{x_i x_i}u = 0
\end{align*}
in $U$, where $\Delta:C^2(U)\to C(U)$ is the Laplace operator.
[/definition]
The definition is local, but the first theorem turns it into a family of integral identities. The theorem says that harmonicity is equivalent to a mean value property: the value at each point equals the average over the surrounding spheres and balls that remain inside $U$.
[quotetheorem:31]
[citeproof:31]
The hypotheses are part of the analytic content, not decorative assumptions. Smoothness makes the Laplacian equation classical, and requiring balls to remain inside $U$ keeps the averages away from boundary data. The statement is special to harmonic functions; a general $C^2$ function can have a strict interior maximum, for instance $u(x)=-|x|^2$ near the origin. The mean value property therefore gives a way to compare pointwise information with surrounding values, so it is natural to ask whether an interior extremum can occur without forcing extra rigidity. If a harmonic function attained a genuine interior maximum, its surrounding averages would have to equal that maximum even though every sampled value is no larger. The next result turns this tension into a uniqueness principle.
[explanation: Interior Rigidity for Harmonic Functions]
Let $U\subset\mathbb R^n$ be open and connected, and let $u\in C^2(U)$ be harmonic. If $u$ attains a maximum or minimum at an interior point of $U$, then the mean-value property forces $u$ to be constant on $U$. This is an interior statement: boundary maximum estimates require separate boundedness and boundary-continuity hypotheses.
[/explanation]
Connectedness is essential in the maximum principle: on a disconnected open set, a function may be constant on each component with different constants, so an interior maximum on one component does not force a single constant value on all of $U$. The $C^2$ assumption is also part of the classical framework because the equation is being interpreted pointwise; later weak theories replace this by distributional or Sobolev hypotheses. Poisson's equation asks for a function whose Laplacian is prescribed rather than zero. The model formula comes from finding a function whose Laplacian is a point mass, then superposing point-source responses against the source term.
[remark: Laplacian Sign Convention for the Whole-Space Formula]
We keep the convention from the fundamental-solution theorem above: $\Phi$ is normalized by
\begin{align*}
-\Delta\Phi=\delta_0.
\end{align*}
Thus the whole-space Poisson formula in this section is read with the positive elliptic operator $-\Delta$. If one instead writes the equation as $\Delta u=f$, the kernel must be negated.
[/remark]
The fundamental solution explains what a single point source contributes to the potential. The next step is to turn many point sources into a solution of the whole-space Poisson equation by superposing translated copies of the normalized kernel against the density $f$. With the convention $-\Delta\Phi=\delta_0$, the correct whole-space formula for the positive operator is $u=\Phi*f$.
[explanation: Whole-Space Poisson Formula]
Let $f\in C_c^2(\mathbb R^n)$ and define
\begin{align*}
u(x)=(\Phi*f)(x)=\int_{\mathbb R^n}\Phi(x-y)f(y)\,d\mathcal L^n(y),
\end{align*}
where $\Phi$ is the fundamental solution satisfying $-\Delta\Phi=\delta_0$. Passing the constant-coefficient operator $-\Delta$ through convolution gives
\begin{align*}
-\Delta u= ((-\Delta)\Phi)*f=\delta_0*f=f
\end{align*}
in distributions, and the stated smoothness hypotheses upgrade the identity away from the singular integral to the classical Poisson equation on $\mathbb R^n$.
[/explanation]
The compact support and $C^2$ hypotheses keep this statement in the classical setting: compact support controls behaviour at infinity, while smoothness lets the singularity be handled by integration by parts and then upgraded from a distributional identity to a pointwise equality. If $f$ decays too slowly, the integral defining $u$ may fail to converge; if $f$ is only rough, the same formula is better interpreted first as a weak or distributional solution. This exact solution formula also lets us read off qualitative information from the source. For compactly supported data, the potential is controlled at large distances by the lowest-order moments of $f$, with the total mass giving the leading term in dimensions $n\ge 3$.
[example: Compactly Supported Source For Poisson Equation]
Let $n\ge 3$, let $f\in C_c^2(\mathbb R^n)$ be supported in $B(0,R)$, and define
\begin{align*}
u(x)=\frac{1}{n(n-2)\omega_n}\int_{B(0,R)} |x-y|^{2-n}f(y)\,d\mathcal L^n(y).
\end{align*}
By the whole-space Poisson formula above, this Newtonian potential satisfies $-\Delta u=f$ on $\mathbb R^n$. We compute its leading behaviour as $|x|\to\infty$.
Assume $|x|>2R$. Write $r=|x|$ and $e=x/|x|$, so $|e|=1$. For $y\in B(0,R)$, set $z=y/r$. Since $|y|<R$ and $r>2R$,
\begin{align*}
|z|=\frac{|y|}{r}<\frac{R}{r}<\frac{1}{2}.
\end{align*}
Also $x=re$ and $y=rz$, hence
\begin{align*}
x-y=r(e-z).
\end{align*}
Taking norms and raising to the power $2-n$ gives
\begin{align*}
|x-y|^{2-n}=r^{2-n}|e-z|^{2-n}.
\end{align*}
For fixed $e$, define $\varphi(z)=|e-z|^{2-n}$ on $|z|\le 1/2$. For each coordinate $z_i$,
\begin{align*}
\partial_{z_i}\varphi(z)=\partial_{z_i}\left(\sum_{j=1}^n(e_j-z_j)^2\right)^{(2-n)/2}.
\end{align*}
By the chain rule,
\begin{align*}
\partial_{z_i}\varphi(z)=\frac{2-n}{2}\left(\sum_{j=1}^n(e_j-z_j)^2\right)^{-n/2}\partial_{z_i}\left(\sum_{j=1}^n(e_j-z_j)^2\right).
\end{align*}
Since
\begin{align*}
\partial_{z_i}\left(\sum_{j=1}^n(e_j-z_j)^2\right)=-2(e_i-z_i),
\end{align*}
we get
\begin{align*}
\partial_{z_i}\varphi(z)=(n-2)(e_i-z_i)|e-z|^{-n}.
\end{align*}
Therefore
\begin{align*}
\nabla_z\varphi(z)=(n-2)(e-z)|e-z|^{-n}.
\end{align*}
Taking Euclidean norms,
\begin{align*}
|\nabla_z\varphi(z)|=(n-2)|e-z|^{1-n}.
\end{align*}
Because $|z|\le 1/2$,
\begin{align*}
|e-z|\ge |e|-|z|\ge 1-\frac{1}{2}=\frac{1}{2}.
\end{align*}
Thus
\begin{align*}
|\nabla_z\varphi(z)|\le (n-2)2^{n-1}.
\end{align*}
Apply the [mean value theorem](/theorems/186) to $\varphi$ along the segment from $0$ to $z$. There is some $\theta\in(0,1)$ such that
\begin{align*}
\varphi(z)-\varphi(0)=\nabla\varphi(\theta z)\cdot z.
\end{align*}
Since $\varphi(0)=|e|^{2-n}=1$, the [Cauchy-Schwarz inequality](/theorems/432) gives
\begin{align*}
\left||e-z|^{2-n}-1\right|\le |\nabla\varphi(\theta z)|\,|z|.
\end{align*}
Since $|\theta z|\le|z|\le 1/2$, the gradient bound applies at the point $\theta z$. Hence
\begin{align*}
\left||e-z|^{2-n}-1\right|\le (n-2)2^{n-1}|z|.
\end{align*}
Multiplying by $r^{2-n}$ and using $z=y/r$, we obtain
\begin{align*}
\left||x-y|^{2-n}-r^{2-n}\right|\le (n-2)2^{n-1}|y|r^{1-n}.
\end{align*}
Define
\begin{align*}
E(x,y)=|x-y|^{2-n}-r^{2-n}.
\end{align*}
Then
\begin{align*}
|x-y|^{2-n}=r^{2-n}+E(x,y),
\end{align*}
with
\begin{align*}
|E(x,y)|\le (n-2)2^{n-1}|y|r^{1-n}.
\end{align*}
Substituting this decomposition into the integral defining $u$ gives
\begin{align*}
u(x)=\frac{1}{n(n-2)\omega_n}\int_{B(0,R)}\left(r^{2-n}+E(x,y)\right)f(y)\,d\mathcal L^n(y).
\end{align*}
By linearity of the integral,
\begin{align*}
u(x)=\frac{r^{2-n}}{n(n-2)\omega_n}\int_{B(0,R)}f(y)\,d\mathcal L^n(y)+\frac{1}{n(n-2)\omega_n}\int_{B(0,R)}E(x,y)f(y)\,d\mathcal L^n(y).
\end{align*}
The error integral satisfies
\begin{align*}
\left|\int_{B(0,R)}E(x,y)f(y)\,d\mathcal L^n(y)\right|\le \int_{B(0,R)}|E(x,y)|\,|f(y)|\,d\mathcal L^n(y).
\end{align*}
Using the bound for $E(x,y)$,
\begin{align*}
\left|\int_{B(0,R)}E(x,y)f(y)\,d\mathcal L^n(y)\right|\le (n-2)2^{n-1}r^{1-n}\int_{B(0,R)}|y|\,|f(y)|\,d\mathcal L^n(y).
\end{align*}
The last integral is finite because $f$ is continuous and $B(0,R)$ is bounded. Since $f$ vanishes outside $B(0,R)$,
\begin{align*}
\int_{B(0,R)}f(y)\,d\mathcal L^n(y)=\int_{\mathbb R^n}f(y)\,d\mathcal L^n(y).
\end{align*}
Therefore
\begin{align*}
u(x)=\frac{|x|^{2-n}}{n(n-2)\omega_n}\int_{\mathbb R^n} f(y)\,d\mathcal L^n(y)+O(|x|^{1-n})
\end{align*}
as $|x|\to\infty$. Thus the total mass $\int_{\mathbb R^n}f\,d\mathcal L^n$ determines the leading monopole term; if this mass is zero, that term vanishes and the leading far-field behaviour comes from the next nonzero term in the expansion of $|x-y|^{2-n}$.
[/example]
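The key inequality in this example, $\left||e-z|^{2-n}-1\right|\le (n-2)2^{n-1}|z|$ for $|e|=1$ and $|z|\le 1/2$, can be spot-checked on random samples. The sketch below is a sanity check rather than a proof; the sampling scheme, the choice $e=(1,0,\ldots,0)$, and the function name are assumptions made for illustration.

```python
import math
import random


def kernel_bound_holds(n=3, samples=2000, seed=0):
    """Check | |e - z|^(2-n) - 1 | <= (n-2) 2^(n-1) |z| on random z with |z| <= 1/2."""
    rng = random.Random(seed)
    e = [1.0] + [0.0] * (n - 1)          # a fixed unit vector
    constant = (n - 2) * 2 ** (n - 1)    # the constant from the gradient bound
    for _ in range(samples):
        # Random direction from Gaussian coordinates, random radius in [0, 1/2).
        direction = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in direction))
        radius = 0.5 * rng.random()
        z = [radius * c / norm for c in direction]
        dist = math.sqrt(sum((ei - zi) ** 2 for ei, zi in zip(e, z)))
        lhs = abs(dist ** (2 - n) - 1.0)
        if lhs > constant * radius + 1e-12:
            return False
    return True


print(kernel_bound_holds(3), kernel_bound_holds(5))
```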
As in the Green-kernel discussion of Chapter 8, this elliptic formula is spatial and instantaneous: every point of the source influences every point of the solution. The next model equation introduces time, and the representation formula becomes a convolution with a kernel that both spreads and regularises the initial data.
## Heat Flow, Gaussian Kernels, and Smoothing
For the heat equation, the guiding question is how an initial temperature profile evolves under diffusion. A point source should spread out symmetrically, conserve total mass, and become smoother for positive time. The Gaussian kernel is the unique expression satisfying these requirements together with the heat equation scaling.
[definition: Heat Kernel]
For $n\ge 1$, the heat kernel on $\mathbb R^n$ is the function $\Gamma:(0,\infty)\times\mathbb R^n\to\mathbb R$ defined by
\begin{align*}
\Gamma(t,x)=\frac{1}{(4\pi t)^{n/2}}\exp\left(-\frac{|x|^2}{4t}\right).
\end{align*}
[/definition]
The kernel is positive, integrates to one, and solves the heat equation away from $t=0$. It therefore acts like a time-dependent approximate identity, which suggests that convolution with $\Gamma$ should both solve the PDE for $t>0$ and recover the initial condition as the kernel concentrates at the origin. The representation formula makes that heuristic precise for bounded continuous data.
[explanation: Heat Kernel Representation Formula]
Let $C_b(\mathbb R^n)$ denote the space of bounded continuous functions on $\mathbb R^n$. For $g\in C_b(\mathbb R^n)$, define, for $t>0$,
\begin{align*}
u(t,x)=\int_{\mathbb R^n}\Gamma(t,x-y)g(y)\,d\mathcal L^n(y).
\end{align*}
Then $u\in C^\infty((0,\infty)\times\mathbb R^n)$, satisfies
\begin{align*}
\partial_t u-\Delta u=0,
\end{align*}
and $u(t,x)\to g(x)$ as $t\downarrow 0$ at every point $x$ where $g$ is continuous.
[/explanation]
The next check explains why this convolution is more than a formal guess: the Gaussian has exactly the derivative identity, mass normalization, and concentration behaviour needed for the heat equation and the initial trace.
[explanation: Why the Formula Solves the Heat Equation]
Differentiate the Gaussian explicitly to verify $\partial_t\Gamma=\Delta\Gamma$ for $t>0$. For fixed positive time, derivatives of $\Gamma(t,\cdot)$ are integrable, so differentiation under the integral gives smoothness and the heat equation. The mass identity $\int_{\mathbb R^n}\Gamma(t,z)\,d\mathcal L^n(z)=1$ follows by the [Gaussian integral](/theorems/1140). To prove convergence to the initial data, split the integral into $|y-x|<\delta$ and $|y-x|\ge \delta$; continuity of $g$ controls the first part, while Gaussian decay sends the second part to zero as $t\downarrow 0$.
[/explanation]
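Two of the ingredients named above, the identity $\partial_t\Gamma=\Delta\Gamma$ and the unit mass, can be checked numerically in one space dimension. This is a rough sketch under stated assumptions: central finite differences with a hypothetical step `h`, and a midpoint rule on a truncated interval whose half-width is an arbitrary choice large enough that the Gaussian tails are negligible.

```python
import math


def heat_kernel(t, x):
    """One-dimensional heat kernel Gamma(t, x) for t > 0."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)


def heat_residual(t, x, h=1e-3):
    """Central-difference approximation of d_t Gamma - d_xx Gamma at (t, x)."""
    dt = (heat_kernel(t + h, x) - heat_kernel(t - h, x)) / (2.0 * h)
    dxx = (heat_kernel(t, x + h) - 2.0 * heat_kernel(t, x)
           + heat_kernel(t, x - h)) / (h * h)
    return dt - dxx


def heat_mass(t, half_width=40.0, num_cells=40000):
    """Midpoint-rule integral of Gamma(t, .) over [-half_width, half_width]."""
    h = 2.0 * half_width / num_cells
    return sum(heat_kernel(t, -half_width + (k + 0.5) * h) * h
               for k in range(num_cells))


print(heat_residual(0.5, 0.3))  # small, limited by the finite-difference error
print(heat_mass(0.5))           # close to 1
```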
Boundedness ensures that the convolution integral is finite against the unit-mass Gaussian, and continuity at the point $x$ is exactly what is needed to recover $g(x)$ as $t\downarrow0$. The same integral is well defined for data that are merely bounded and measurable: if such a $g$ has a jump, the kernel still produces smooth positive-time solutions, but the pointwise initial limit at the jump may be an averaged value rather than either one-sided value. The formula thus turns rough bounded data into a smooth solution for every positive time, which is the defining analytic feature of parabolic equations and contrasts sharply with the wave equation later in the chapter. Probabilistically, $u(t,x)$ is the expectation of $g(x+\sqrt{2}W_t)$ for [Brownian motion](/page/Brownian%20Motion) $W_t$, so the same Gaussian kernel also describes random diffusion.
[example: Gaussian Heat Flow From Rough Initial Data]
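Let $g=\mathbb{1}_{[-1,1]}$ on $\mathbb R$. This datum is bounded but discontinuous, so it lies outside $C_b(\mathbb R)$; the integral in *Heat Kernel Representation Formula* is nevertheless well defined, and for $t>0$ it gives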
\begin{align*}
u(t,x)=\int_{\mathbb R}\frac{1}{\sqrt{4\pi t}}\exp\left(-\frac{(x-y)^2}{4t}\right)\mathbb 1_{[-1,1]}(y)\,dy.
\end{align*}
Since $\mathbb 1_{[-1,1]}(y)=1$ for $-1\le y\le 1$ and $\mathbb 1_{[-1,1]}(y)=0$ otherwise,
\begin{align*}
u(t,x)=\frac{1}{\sqrt{4\pi t}}\int_{-1}^{1}\exp\left(-\frac{(x-y)^2}{4t}\right)\,dy.
\end{align*}
For fixed $t>0$, the first derivative of the Gaussian factor is
\begin{align*}
\partial_x\exp\left(-\frac{(x-y)^2}{4t}\right)=-\frac{x-y}{2t}\exp\left(-\frac{(x-y)^2}{4t}\right).
\end{align*}
Repeated $x$-derivatives have the form
\begin{align*}
\partial_x^k\exp\left(-\frac{(x-y)^2}{4t}\right)=P_{k,t}(x-y)\exp\left(-\frac{(x-y)^2}{4t}\right),
\end{align*}
where $P_{k,t}$ is a polynomial depending on $k$ and $t$. On each compact set $K\subset\mathbb R$, this derivative is continuous on the compact rectangle $K\times[-1,1]$, hence bounded there. Differentiation under the integral sign therefore gives
\begin{align*}
\partial_x^k u(t,x)=\frac{1}{\sqrt{4\pi t}}\int_{-1}^{1}\partial_x^k\exp\left(-\frac{(x-y)^2}{4t}\right)\,dy
\end{align*}
for every $k\ge 0$, so $u(t,\cdot)\in C^\infty(\mathbb R)$ even though $g$ jumps at $-1$ and $1$.
At the right jump point $x=1$,
\begin{align*}
u(t,1)=\frac{1}{\sqrt{4\pi t}}\int_{-1}^{1}\exp\left(-\frac{(1-y)^2}{4t}\right)\,dy.
\end{align*}
Set
\begin{align*}
s=\frac{1-y}{2\sqrt t}.
\end{align*}
Then $1-y=2\sqrt t\,s$, $(1-y)^2/(4t)=s^2$, and $dy=-2\sqrt t\,ds$. The endpoint $y=-1$ gives $s=1/\sqrt t$, while the endpoint $y=1$ gives $s=0$. Hence
\begin{align*}
u(t,1)=\frac{1}{\sqrt{4\pi t}}\int_{1/\sqrt t}^{0}e^{-s^2}(-2\sqrt t)\,ds.
\end{align*}
Reversing the limits gives
\begin{align*}
u(t,1)=\frac{2\sqrt t}{\sqrt{4\pi t}}\int_{0}^{1/\sqrt t}e^{-s^2}\,ds.
\end{align*}
Since $\sqrt{4\pi t}=2\sqrt{\pi t}=2\sqrt\pi\sqrt t$ for $t>0$,
\begin{align*}
u(t,1)=\frac{1}{\sqrt\pi}\int_{0}^{1/\sqrt t}e^{-s^2}\,ds.
\end{align*}
As $t\downarrow0$, the upper endpoint $1/\sqrt t$ increases to $\infty$. Since $e^{-s^2}\ge 0$, monotone convergence gives
\begin{align*}
\lim_{t\downarrow0}u(t,1)=\frac{1}{\sqrt\pi}\int_0^\infty e^{-s^2}\,ds.
\end{align*}
Using the Gaussian half-integral $\int_0^\infty e^{-s^2}\,ds=\sqrt\pi/2$,
\begin{align*}
\lim_{t\downarrow0}u(t,1)=\frac{1}{\sqrt\pi}\cdot\frac{\sqrt\pi}{2}=\frac{1}{2}.
\end{align*}
At the left jump point $x=-1$,
\begin{align*}
u(t,-1)=\frac{1}{\sqrt{4\pi t}}\int_{-1}^{1}\exp\left(-\frac{(-1-y)^2}{4t}\right)\,dy.
\end{align*}
Since $(-1-y)^2=(y+1)^2$,
\begin{align*}
u(t,-1)=\frac{1}{\sqrt{4\pi t}}\int_{-1}^{1}\exp\left(-\frac{(y+1)^2}{4t}\right)\,dy.
\end{align*}
Set
\begin{align*}
s=\frac{y+1}{2\sqrt t}.
\end{align*}
Then $y+1=2\sqrt t\,s$, $(y+1)^2/(4t)=s^2$, and $dy=2\sqrt t\,ds$. The endpoint $y=-1$ gives $s=0$, while the endpoint $y=1$ gives $s=1/\sqrt t$. Therefore
\begin{align*}
u(t,-1)=\frac{1}{\sqrt{4\pi t}}\int_{0}^{1/\sqrt t}e^{-s^2}(2\sqrt t)\,ds.
\end{align*}
As above,
\begin{align*}
u(t,-1)=\frac{1}{\sqrt\pi}\int_{0}^{1/\sqrt t}e^{-s^2}\,ds.
\end{align*}
Monotone convergence and $\int_0^\infty e^{-s^2}\,ds=\sqrt\pi/2$ give
\begin{align*}
\lim_{t\downarrow0}u(t,-1)=\frac{1}{2}.
\end{align*}
At every continuity point of $g$, the splitting argument behind *Heat Kernel Representation Formula* uses only the boundedness of $g$ and its continuity at $x$, so it gives $u(t,x)\to g(x)$ as $t\downarrow0$. Thus the heat flow immediately smooths the discontinuous initial profile, recovers the original values away from the two jump points, and assigns the symmetric jump value $\frac12$ at each endpoint.
[/example]
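The computation in this example can be reproduced with the error function. Substituting $s=(y-x)/(2\sqrt t)$ in the integral for $u(t,x)$ and using $\operatorname{erf}(a)=\frac{2}{\sqrt\pi}\int_0^a e^{-s^2}\,ds$ gives $u(t,x)=\frac12\left[\operatorname{erf}\left(\frac{1-x}{2\sqrt t}\right)+\operatorname{erf}\left(\frac{1+x}{2\sqrt t}\right)\right]$, which reduces to $\frac12\operatorname{erf}(1/\sqrt t)$ at $x=\pm1$. The sketch below evaluates this closed form; the function name is illustrative.

```python
import math


def heat_step_solution(t, x):
    """Heat flow of the indicator of [-1, 1] on the real line, for t > 0."""
    scale = 2.0 * math.sqrt(t)
    return 0.5 * (math.erf((1.0 - x) / scale) + math.erf((1.0 + x) / scale))


# Columns: t, value at the centre, at the right jump, at the left jump, outside.
for t in (1.0, 1e-2, 1e-4):
    print(t,
          heat_step_solution(t, 0.0),
          heat_step_solution(t, 1.0),
          heat_step_solution(t, -1.0),
          heat_step_solution(t, 2.0))
```

As $t$ decreases, the printed values approach $1$ at the centre, $\frac12$ at both jump points, and $0$ outside $[-1,1]$, in line with the limits computed above.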
The example shows that convolution with a positive kernel averages the initial profile rather than creating new extremes. This raises the bounded-domain question: if heat can enter through the spatial boundary as well as the initial slice, where can the largest value in a space-time cylinder occur? The parabolic maximum principle answers that the interior cannot be the first source of a maximum.
[quotetheorem:560]
[citeproof:560]
The boundedness of $U$ and continuity on $\overline U\times[0,T]$ ensure that the displayed maxima are attained, while the parabolic boundary records exactly where data are allowed to enter the cylinder. If boundary values are omitted, heat entering through $\partial U$ can create later interior values not controlled by the initial slice alone; if the final time were included as data, the direction of parabolic evolution would be lost.
The maximum principle controls the size of the solution but not where it is nonzero. The whole-space representation formula shows that the heat equation has infinite propagation speed: a compactly supported, nonnegative, nonzero datum produces a strictly positive solution everywhere for every $t>0$, because the Gaussian kernel is positive. Hyperbolic equations behave differently because their representation formulas carry data along finite-speed characteristic paths.
## Waves, Characteristics, and Finite Propagation Speed
For the wave equation, the central question is how displacement and velocity data move without diffusive smoothing. In one space dimension, the characteristic variables $x+ct$ and $x-ct$ reduce the equation to two travelling components. This gives an exact formula and a conservation law.
[definition: One-Dimensional Wave Cauchy Problem]
Let $c>0$, let $g:\mathbb R\to\mathbb R$ with $g\in C^2(\mathbb R)$, and let $h:\mathbb R\to\mathbb R$ with $h\in C^1(\mathbb R)$. The one-dimensional wave Cauchy problem on $\mathbb R$ is the problem of finding $u:[0,\infty)\times\mathbb R\to\mathbb R$ such that
\begin{align*}
Lu=0,
\end{align*}
where $L:C^2((0,\infty)\times\mathbb R)\to C((0,\infty)\times\mathbb R)$ is the operator
\begin{align*}
Lu=\partial_{tt}u-c^2\partial_{xx}u,
\end{align*}
with initial displacement
\begin{align*}
u(0,x)=g(x)
\end{align*}
and initial velocity
\begin{align*}
\partial_tu(0,x)=h(x).
\end{align*}
[/definition]
The formula below separates the transported initial displacement from the accumulated initial velocity. The interval $[x-ct,x+ct]$ is the domain of dependence of the point $(t,x)$: the base at time zero of its backward characteristic cone.
[quotetheorem:665]
[citeproof:665]
The regularity assumptions on $g$ and $h$ are sufficient for the displayed formula to be twice differentiable and to satisfy the PDE classically; with rougher data the same expression may still define a weak solution but not a classical one. The whole-line setting avoids boundary reflections, so the only propagation mechanism visible here is transport along characteristics.
d'Alembert's formula is more than an explicit solution: it shows that waves preserve profiles while translating them. Localised data therefore split into pieces whose locations can be tracked by the characteristic lines $x\pm ct=\text{constant}$.
[example: Wave Propagation From Localized Displacement]
Let $c=1$, $h=0$, and let $g\in C_c^2(\mathbb R)$ be supported in $[-R,R]$, meaning $g(s)=0$ for every $s\notin[-R,R]$. By *DAlembert Formula* with $c=1$,
\begin{align*}
u(t,x)=\frac{1}{2}g(x+t)+\frac{1}{2}g(x-t)+\frac{1}{2}\int_{x-t}^{x+t}h(y)\,dy.
\end{align*}
Since $h(y)=0$ for every $y\in\mathbb R$,
\begin{align*}
\int_{x-t}^{x+t}h(y)\,dy=\int_{x-t}^{x+t}0\,dy.
\end{align*}
The integral of the zero function over any interval is zero, so
\begin{align*}
\int_{x-t}^{x+t}h(y)\,dy=0.
\end{align*}
Therefore
\begin{align*}
u(t,x)=\frac{1}{2}g(x+t)+\frac{1}{2}g(x-t).
\end{align*}
We identify where each translated term can be nonzero. If $g(x+t)\ne 0$, then the support condition forces
\begin{align*}
x+t\in[-R,R].
\end{align*}
Equivalently,
\begin{align*}
-R\le x+t\le R.
\end{align*}
Subtracting $t$ from all three parts gives
\begin{align*}
-R-t\le x\le R-t.
\end{align*}
Thus $\frac12 g(x+t)$ is supported in $[-R-t,R-t]$. The centre of this interval is
\begin{align*}
\frac{(-R-t)+(R-t)}{2}=\frac{-2t}{2}=-t.
\end{align*}
Similarly, if $g(x-t)\ne 0$, then
\begin{align*}
x-t\in[-R,R],
\end{align*}
so
\begin{align*}
-R\le x-t\le R.
\end{align*}
Adding $t$ to all three parts gives
\begin{align*}
-R+t\le x\le R+t.
\end{align*}
Thus $\frac12 g(x-t)$ is supported in $[-R+t,R+t]$. The centre of this interval is
\begin{align*}
\frac{(-R+t)+(R+t)}{2}=\frac{2t}{2}=t.
\end{align*}
If $x$ lies outside
\begin{align*}
[-R-t,R-t]\cup[-R+t,R+t],
\end{align*}
then $x+t\notin[-R,R]$ and $x-t\notin[-R,R]$. Hence
\begin{align*}
g(x+t)=0
\end{align*}
and
\begin{align*}
g(x-t)=0.
\end{align*}
Substituting these two equalities into the formula for $u(t,x)$ gives
\begin{align*}
u(t,x)=\frac12\cdot 0+\frac12\cdot 0=0.
\end{align*}
Therefore
\begin{align*}
\operatorname{supp} u(t,\cdot)\subset [-R-t,R-t]\cup[-R+t,R+t].
\end{align*}
The initial displacement has split into a left-moving translate and a right-moving translate, each travelling with speed $1$. The formula uses only translations and multiplication by $\frac12$; for every derivative order $k$ with $0\le k\le 2$,
\begin{align*}
\partial_x^k g(x+t)=g^{(k)}(x+t)
\end{align*}
and
\begin{align*}
\partial_x^k g(x-t)=g^{(k)}(x-t),
\end{align*}
by repeated application of the chain rule, since $\partial_x(x+t)=1$ and $\partial_x(x-t)=1$. Thus the wave equation transports the original regularity of $g$ without creating the smoothing effect seen for the heat equation.
[/example]
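The support statement can be checked on a concrete profile. The sketch below assumes $R=1$ and the hypothetical datum $g(s)=(1-s^2)^3$ for $|s|<1$, extended by zero; this $g$ is $C^2$ because it has a triple zero at $s=\pm1$. It then evaluates the formula with $c=1$ and $h=0$. The function names are illustrative.

```python
def bump(s, radius=1.0):
    """Hypothetical C^2 profile supported in [-radius, radius]."""
    q = 1.0 - (s / radius) ** 2
    return q ** 3 if q > 0.0 else 0.0


def wave_solution(t, x, radius=1.0):
    """d'Alembert solution with c = 1, zero initial velocity, and g = bump."""
    return 0.5 * bump(x + t, radius) + 0.5 * bump(x - t, radius)


t = 3.0
for x in (-4.5, -3.0, 0.0, 3.0, 4.5):
    # Nonzero values can occur only for x in [-1 - t, 1 - t] or [-1 + t, 1 + t].
    print(x, wave_solution(t, x))
```

At $t=3$ the two translates are centred at $x=-3$ and $x=3$, each with peak value $\frac12$, and the solution vanishes at $x=0$, where the initial peak was located.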
The compact support statement is a prototype for finite propagation speed, but it is useful to state the dependence region directly. Unlike the heat kernel, the wave formula samples only a bounded interval in the past. Data outside the backward cone cannot affect the value at a point, which is the hyperbolic analogue of causality.
[quotetheorem:6161]
[citeproof:6161]
This finite-speed statement is a direct consequence of the explicit formula, so it relies on the whole-line constant-coefficient model. It also shows the main qualitative failure of any heat-kernel analogy: wave data outside the backward cone have no influence at $(t,x)$, whereas heat data at any distance contribute immediately through the Gaussian tail. Finite propagation describes where the solution can be nonzero, but it does not measure how much wave motion is present. The complementary invariant is energy: instead of being dissipated or smoothed away, the sum of kinetic and elastic energy remains constant in time for solutions with suitable decay.
[quotetheorem:6162]
[citeproof:6162]
The vanishing boundary term is the analytic place where decay enters the energy identity; without it, energy can cross the artificial boundary at infinity or through a finite spatial boundary. On bounded intervals the corresponding identity contains boundary flux terms unless boundary conditions such as fixed endpoints or periodicity remove them. The elliptic, parabolic, and hyperbolic representation theories in this chapter summarise the classical contrast among model PDEs. Elliptic equations average spatial data and enforce rigidity, parabolic equations average with a time-dependent Gaussian and smooth immediately, and hyperbolic equations transport data along finite-speed characteristic paths while conserving energy.
Representation formulas reveal how elliptic, parabolic, and hyperbolic equations behave, but they also suggest a different question: what can be deduced without solving explicitly? Chapter 10 answers that by replacing formulas with maximum principles and energy estimates as the first structural tools.
# 10. Maximum Principles and Energy Methods as First Tools
This chapter changes the emphasis from solving PDEs to extracting information from them. Chapters 2-4 used characteristics, Chapters 5-7 introduced weak formulations, and Chapters 8-9 developed representation formulas, in each case to build solutions or interpret discontinuities. Here the central question is different: what can be proved about a solution before it has been written down? Maximum principles and energy estimates give the first systematic answers, and they lead directly to uniqueness results. The prerequisites are the classical Laplace, heat, transport, and wave equations; integration by parts and the divergence theorem; basic $C^k$ regularity notation; $L^2$ and $L^\infty$ norms; and Gronwall's inequality.
## Maximum Principles for Laplace and Heat Equations
Suppose a scalar quantity is governed by diffusion or equilibrium rather than transport. A natural question is whether the equation can create a new interior maximum, or whether all extremal behaviour must be inherited from the boundary and initial data. Maximum principles formalise the answer for the model elliptic and parabolic equations.
The elliptic model is the Laplace equation on a bounded domain. The principle says that a harmonic function attains its maximum over the closed domain on the boundary, so it cannot rise to an interior peak above its boundary values. The boundedness and boundary continuity assumptions are part of the mechanism: the maximum must actually be attained on the compact closure before the interior second-derivative test can be used.
[example: Why Boundary Control Needs Hypotheses]
On $U=\mathbb R^n$, define $u(x)=x_1$. For the first coordinate,
\begin{align*}
\partial_{x_1}u=\partial_{x_1}x_1=1.
\end{align*}
Differentiating this constant once more gives
\begin{align*}
\partial_{x_1x_1}u=\partial_{x_1}(1)=0.
\end{align*}
For $2\le i\le n$, the function $x_1$ is independent of $x_i$, so
\begin{align*}
\partial_{x_i}u=0.
\end{align*}
Differentiating again in the same coordinate gives
\begin{align*}
\partial_{x_ix_i}u=\partial_{x_i}(0)=0.
\end{align*}
Hence
\begin{align*}
\Delta u=\sum_{i=1}^n\partial_{x_ix_i}u=\partial_{x_1x_1}u+\sum_{i=2}^n\partial_{x_ix_i}u=0+\sum_{i=2}^n0=0,
\end{align*}
so $u$ is harmonic. For $x_R=(R,0,\ldots,0)$ with $R>0$,
\begin{align*}
u(x_R)=R.
\end{align*}
Thus $\sup_{\mathbb R^n}u\ge R$ for every $R>0$. If $\sup_{\mathbb R^n}u$ were finite, choosing $R>\sup_{\mathbb R^n}u$ would contradict $\sup_{\mathbb R^n}u\ge R$, so
\begin{align*}
\sup_{\mathbb R^n}u=\infty.
\end{align*}
On this unbounded domain there is no compact closure forcing a maximum to be attained, and there is no finite boundary maximum controlling these values.
Now let $U=B(0,1)\setminus\{0\}$, assume $n\ge 3$, and set $u(x)=|x|^{2-n}$. Write $r=|x|$, so $u(x)=f(r)$ with $f(r)=r^{2-n}$ for $0<r<1$. For $1\le i\le n$,
\begin{align*}
\partial_{x_i}r=\partial_{x_i}\left(\sum_{j=1}^n x_j^2\right)^{1/2}=\frac12\left(\sum_{j=1}^n x_j^2\right)^{-1/2}2x_i=\frac{x_i}{r}.
\end{align*}
By the chain rule,
\begin{align*}
\partial_{x_i}u=f'(r)\partial_{x_i}r=f'(r)\frac{x_i}{r}.
\end{align*}
Differentiating this expression in $x_i$ gives
\begin{align*}
\partial_{x_ix_i}u=\partial_{x_i}\left(f'(r)\frac{x_i}{r}\right).
\end{align*}
Using the product rule and the chain rule on the first factor,
\begin{align*}
\partial_{x_ix_i}u=f''(r)\partial_{x_i}r\frac{x_i}{r}+f'(r)\partial_{x_i}\left(x_ir^{-1}\right).
\end{align*}
Since $\partial_{x_i}r=x_i/r$, the first term is
\begin{align*}
f''(r)\partial_{x_i}r\frac{x_i}{r}=f''(r)\frac{x_i^2}{r^2}.
\end{align*}
For the remaining derivative,
\begin{align*}
\partial_{x_i}\left(x_ir^{-1}\right)=r^{-1}+x_i\partial_{x_i}(r^{-1}).
\end{align*}
Again by the chain rule,
\begin{align*}
\partial_{x_i}(r^{-1})=-r^{-2}\partial_{x_i}r=-r^{-2}\frac{x_i}{r}=-\frac{x_i}{r^3}.
\end{align*}
Therefore
\begin{align*}
\partial_{x_i}\left(x_ir^{-1}\right)=\frac1r-\frac{x_i^2}{r^3}.
\end{align*}
Substituting both pieces,
\begin{align*}
\partial_{x_ix_i}u=f''(r)\frac{x_i^2}{r^2}+f'(r)\left(\frac1r-\frac{x_i^2}{r^3}\right).
\end{align*}
Summing over $i$ gives
\begin{align*}
\Delta u=\sum_{i=1}^n f''(r)\frac{x_i^2}{r^2}+\sum_{i=1}^n f'(r)\left(\frac1r-\frac{x_i^2}{r^3}\right).
\end{align*}
Since $f'(r)$ and $f''(r)$ do not depend on $i$,
\begin{align*}
\Delta u=f''(r)\frac{\sum_{i=1}^n x_i^2}{r^2}+f'(r)\left(\frac nr-\frac{\sum_{i=1}^n x_i^2}{r^3}\right).
\end{align*}
Because $\sum_{i=1}^n x_i^2=r^2$, this becomes
\begin{align*}
\Delta u=f''(r)\frac{r^2}{r^2}+f'(r)\left(\frac nr-\frac{r^2}{r^3}\right)=f''(r)+\frac{n-1}{r}f'(r).
\end{align*}
For $f(r)=r^{2-n}$,
\begin{align*}
f'(r)=(2-n)r^{1-n}.
\end{align*}
Differentiating once more,
\begin{align*}
f''(r)=(2-n)(1-n)r^{-n}.
\end{align*}
Substituting these expressions into the radial Laplacian identity,
\begin{align*}
\Delta u=(2-n)(1-n)r^{-n}+\frac{n-1}{r}(2-n)r^{1-n}.
\end{align*}
Since $r^{-1}r^{1-n}=r^{-n}$,
\begin{align*}
\Delta u=(2-n)(1-n)r^{-n}+(n-1)(2-n)r^{-n}.
\end{align*}
Factoring the common term gives
\begin{align*}
\Delta u=(2-n)\bigl((1-n)+(n-1)\bigr)r^{-n}.
\end{align*}
The expression in parentheses is $0$, so
\begin{align*}
\Delta u=(2-n)\cdot 0\cdot r^{-n}=0
\end{align*}
for every $0<r<1$. Thus $u$ is harmonic on the punctured ball.
For $x_k=(1/k,0,\ldots,0)$,
\begin{align*}
|x_k|=\frac1k.
\end{align*}
Therefore
\begin{align*}
u(x_k)=\left(\frac1k\right)^{2-n}=\left(k^{-1}\right)^{2-n}=k^{n-2}.
\end{align*}
Since $n\ge 3$, we have $n-2\ge 1$, and hence $k^{n-2}\to\infty$ as $k\to\infty$. Therefore $u$ is unbounded near the missing point $0$. A continuous function on the compact set $\overline{B(0,1)}$ would be bounded, so this $u$ cannot extend continuously to $\overline U=\overline{B(0,1)}$. These two cases show that $\Delta u=0$ alone does not give boundary control; boundedness of the domain and continuity up to the boundary are part of what makes the weak maximum principle meaningful.
[/example]
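The radial computation above can be checked numerically. The following is a minimal sketch, not part of the course's method: the helper names `u` and `fd_laplacian` and the step size `h` are assumptions made for illustration, and the central-difference Laplacian is only an approximation, so the result is small rather than exactly zero.

```python
import math

def u(x):
    """u(x) = |x|^(2-n) for a point x with n >= 3 coordinates."""
    n = len(x)
    r = math.sqrt(sum(c * c for c in x))
    return r ** (2 - n)

def fd_laplacian(f, x, h=1e-3):
    """Central second differences summed over all coordinates.
    The step h is an illustrative choice, not a tuned value."""
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        total += (f(xp) - 2.0 * f(x) + f(xm)) / (h * h)
    return total

if __name__ == "__main__":
    # Approximate Laplacian at a point of the punctured ball (n = 3).
    print(fd_laplacian(u, [0.3, 0.4, 0.5]))
    # Values along x_k = (1/k, 0, 0) grow like k^(n-2).
    for k in (1, 10, 100, 1000):
        print(k, u([1.0 / k, 0.0, 0.0]))
```

The first printed number should be close to zero, up to discretisation error, while the values along $x_k$ grow without bound, matching the two conclusions of the example.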
These examples separate the analytic content of the maximum principle from the topological facts needed to state it. Once the domain is bounded and the solution extends continuously to the closure, the compactness of $\overline U$ guarantees that an extremum is attained somewhere. The remaining question is whether such an extremum can occur strictly inside $U$ when $\Delta u=0$. The following theorem answers this by turning the second-derivative test at an interior maximum into a contradiction.
[quotetheorem:100]
[citeproof:100]
This theorem is often the fastest route to information about a boundary value problem. It turns the interior differential inequality $Lu\le0$ into an order estimate controlled by the boundary values. The assumptions explain the scope of the result: boundedness of $U$ and continuity on $\overline U$ ensure that the maximum exists and can be compared with boundary values; ellipticity gives the second-derivative sign mechanism; and the condition $c\ge0$ prevents the zeroth-order term from reversing the comparison. The harmonic case $\Delta u=0$ is the model special case obtained from the Laplacian after adjusting sign conventions, but the quoted theorem is deliberately more general than harmonic functions. It does not assert existence of a solution with prescribed boundary data, and it does not identify where on $\partial U$ the maximum occurs.
[example: Uniqueness for the Dirichlet Problem]
Let $U\subset \mathbb R^n$ be bounded and open, and suppose $u,v\in C^2(U)\cap C(\overline U)$ are harmonic in $U$ with the same boundary values on $\partial U$. Define $z=u-v$. Since differences of $C^2(U)$ functions are in $C^2(U)$, and differences of $C(\overline U)$ functions are in $C(\overline U)$,
\begin{align*}
z\in C^2(U)\cap C(\overline U).
\end{align*}
For each $x\in U$, linearity of the Laplacian gives
\begin{align*}
\Delta z(x)=\Delta(u-v)(x)=\Delta u(x)-\Delta v(x).
\end{align*}
Because $u$ and $v$ are harmonic in $U$,
\begin{align*}
\Delta u(x)=0
\end{align*}
and
\begin{align*}
\Delta v(x)=0.
\end{align*}
Therefore
\begin{align*}
\Delta z(x)=0-0=0
\end{align*}
for every $x\in U$, so $z$ is harmonic in $U$.
For each $x\in\partial U$, the boundary values of $u$ and $v$ agree, so
\begin{align*}
u(x)=v(x).
\end{align*}
Hence
\begin{align*}
z(x)=u(x)-v(x)=0.
\end{align*}
Thus $z=0$ on $\partial U$, and consequently
\begin{align*}
\max_{\partial U}z=0.
\end{align*}
Applying the *Weak Maximum Principle for the Laplace Equation* to $z$ gives
\begin{align*}
\max_{\overline U}z=\max_{\partial U}z=0.
\end{align*}
Since every value of $z$ on $\overline U$ is bounded above by $\max_{\overline U}z$,
\begin{align*}
z(x)\le 0
\end{align*}
for every $x\in\overline U$.
Now apply the same argument to $-z$. Since $z\in C^2(U)\cap C(\overline U)$,
\begin{align*}
-z\in C^2(U)\cap C(\overline U).
\end{align*}
For each $x\in U$, linearity gives
\begin{align*}
\Delta(-z)(x)=-\Delta z(x)=-0=0.
\end{align*}
For each $x\in\partial U$, since $z(x)=0$,
\begin{align*}
(-z)(x)=-z(x)=-0=0.
\end{align*}
Thus
\begin{align*}
\max_{\partial U}(-z)=0.
\end{align*}
Applying the *Weak Maximum Principle for the Laplace Equation* to $-z$ gives
\begin{align*}
\max_{\overline U}(-z)=\max_{\partial U}(-z)=0.
\end{align*}
Therefore
\begin{align*}
-z(x)\le 0
\end{align*}
for every $x\in\overline U$. Multiplying by $-1$ reverses the inequality, so
\begin{align*}
z(x)\ge 0
\end{align*}
for every $x\in\overline U$.
Combining the two inequalities gives
\begin{align*}
0\le z(x)\le 0
\end{align*}
for every $x\in\overline U$. Hence
\begin{align*}
z(x)=0
\end{align*}
for every $x\in\overline U$. Since $z=u-v$, this means
\begin{align*}
u(x)-v(x)=0
\end{align*}
for every $x\in\overline U$, and therefore
\begin{align*}
u(x)=v(x)
\end{align*}
for every $x\in\overline U$. Thus the Dirichlet problem has at most one solution in the class $C^2(U)\cap C(\overline U)$.
[/example]
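A discrete analogue makes the uniqueness argument tangible. The sketch below assumes a five-point discrete Laplace equation on the unit square, solved by Gauss-Seidel sweeps; the grid size, sweep count, boundary function `g`, and function names are all hypothetical choices for illustration. Each update replaces a value by the average of its four neighbours, so interior values stay between the boundary extremes, and two runs with the same boundary data but different starting guesses approach the same grid function.

```python
def boundary_values(g, n):
    """Values of g at the boundary nodes of an (n+1) x (n+1) grid on [0,1]^2."""
    return [g(i / n, j / n)
            for i in range(n + 1) for j in range(n + 1)
            if i in (0, n) or j in (0, n)]

def solve_laplace(g, n, fill, sweeps=500):
    """Gauss-Seidel sweeps for the five-point discrete Laplace equation.
    Interior nodes start at the constant `fill` (an illustrative initial guess)."""
    u = [[g(i / n, j / n) if (i in (0, n) or j in (0, n)) else fill
          for j in range(n + 1)] for i in range(n + 1)]
    for _ in range(sweeps):
        for i in range(1, n):
            for j in range(1, n):
                # Discrete mean-value property: average of the four neighbours.
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1])
    return u

if __name__ == "__main__":
    g = lambda x, y: x * x - y * y      # hypothetical boundary data
    n = 10
    lo, hi = min(boundary_values(g, n)), max(boundary_values(g, n))
    a = solve_laplace(g, n, fill=lo)    # start from the boundary minimum
    b = solve_laplace(g, n, fill=hi)    # start from the boundary maximum
    gap = max(abs(a[i][j] - b[i][j]) for i in range(n + 1) for j in range(n + 1))
    print("boundary range:", lo, hi)
    print("largest gap between the two runs:", gap)
```

This is only a finite-dimensional illustration of the order argument; it does not replace the proof for $C^2(U)\cap C(\overline U)$ solutions.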
The weak principle controls the size of a solution, but it does not yet describe what happens if the boundary maximum is also attained inside. For a harmonic function, an interior maximum is not just another point where the same bound is reached: the averaging property forces nearby values to match unless there is room for strict decrease and compensation, which the maximum forbids. On a connected domain this local rigidity propagates, so the obstruction to a nonconstant solution is the presence of an interior extremum.
[explanation: Strong Maximum Principle for Harmonic Functions]
Let $U\subset\mathbb R^n$ be open and connected, and let $u\in C^2(U)$ satisfy $\Delta u=0$ in $U$. If $u$ attains its maximum or minimum at a point of $U$, then $u$ is constant on $U$. The statement assumes the extremum is actually attained; boundary comparison on $\overline U$ is a separate weak maximum-principle statement that needs boundedness and continuity up to the boundary.
[/explanation]
The connectedness assumption in the strong principle is also essential. If $U$ is the disjoint union of two nonempty open sets and $u$ is constant with different values on the two components, then $u$ is harmonic and attains an interior maximum on the component with the larger value, but it is not constant on all of $U$. The open-set and $C^2$ hypotheses place the maximum at a point where the equation and the mean-value argument are valid; a boundary maximum, or a singular point omitted from the domain, is not covered by this conclusion. The result is qualitative rather than quantitative: it rules out a nonconstant interior extremum, but it does not give a boundary estimate unless the boundary continuity hypotheses of the weak principle are also available. This distinction prepares the parabolic case: the idea of excluding new interior extrema carries over to the heat equation discussed in Chapter 9, but the notion of "boundary" must be changed to respect the direction of time.
The elliptic theory used the ordinary boundary because the equation has no preferred direction. For heat flow, time has a direction, so the relevant boundary must include the initial slice and the spatial side boundary, but not the terminal time as prescribed data. This motivates the following boundary convention.
[definition: Parabolic Boundary]
Let $U \subset \mathbb R^n$ be bounded and open, and let $T>0$. The parabolic boundary of $U\times(0,T]$ is
\begin{align*}
\partial_p(U\times(0,T]) = (\overline U\times\{0\}) \cup (\partial U\times[0,T]).
\end{align*}
[/definition]
The parabolic boundary records exactly the places where data are supplied for a forward heat problem.
[illustration:pdei-parabolic-boundary-cylinder]
Including the terminal slice as if it were ordinary boundary data would reverse the causal meaning of the heat equation. For example, on $\mathbb R\times[0,T]$ the function $u(x,t)=e^{-t}\sin x$ solves $u_t-u_{xx}=0$ and is determined by its initial value $\sin x$; prescribing only terminal data would require solving the heat equation backward, which is unstable and not controlled by the forward maximum principle. Once the terminal time slice is separated from the initial and lateral data, the right maximum estimate can be stated: every value at positive time is controlled by the initial and lateral boundary data. The proof again uses a small perturbation, now in the time variable.
[quotetheorem:693]
[citeproof:693]
Each hypothesis of the heat principle has a specific role. The equality $u_t-\Delta u=0$ is what turns an interior positive-time maximum into a contradiction after the perturbation; for inequalities the conclusion changes to the corresponding subsolution or supersolution comparison. Boundedness of $U$ and finiteness of $T$ make $\overline U\times[0,T]$ compact, so a continuous solution attains its maximum. Continuity up to the parabolic boundary lets the maximum be compared with prescribed data; without it, boundary values would not control limiting behaviour near the edge of the cylinder. For instance, on $(0,1)\times(0,T]$ the heat solution with initial data approximating a point mass has interior values that are not bounded by any continuous initial trace, because the limiting datum is not a continuous function on $[0,1]$. The parabolic-boundary hypothesis is also directional: replacing the initial slice by terminal data would ask for backward heat flow, which the forward maximum principle does not control. If the lateral boundary is removed by passing to an unbounded spatial domain, the conclusion can fail: on $\mathbb R\times[0,T]$ the function $u(x,t)=e^{t+x}$ satisfies $u_t-u_{xx}=0$ and has no finite boundary maximum controlling its growth as $x\to\infty$. The theorem yields uniqueness and comparison, but not existence or regularity of solutions. It becomes especially concrete on an interval, where the parabolic boundary consists of the two side walls and the initial line.
[example: Heat Equation on a Bounded Interval]
Let $u,v\in C^{2,1}((0,L)\times(0,T])\cap C([0,L]\times[0,T])$ satisfy
\begin{align*}
u_t-u_{xx}=0
\end{align*}
and
\begin{align*}
v_t-v_{xx}=0
\end{align*}
on $(0,L)\times(0,T]$, with the same initial values on $[0,L]\times\{0\}$ and the same boundary values on $\{0,L\}\times[0,T]$. Define
\begin{align*}
z=u-v.
\end{align*}
Since differences preserve the stated regularity,
\begin{align*}
z\in C^{2,1}((0,L)\times(0,T])\cap C([0,L]\times[0,T]).
\end{align*}
For each $(x,t)\in(0,L)\times(0,T]$, linearity of $\partial_t$ gives
\begin{align*}
z_t=(u-v)_t=u_t-v_t.
\end{align*}
Linearity of $\partial_{xx}$ gives
\begin{align*}
z_{xx}=(u-v)_{xx}=u_{xx}-v_{xx}.
\end{align*}
Subtracting these two identities,
\begin{align*}
z_t-z_{xx}=(u_t-v_t)-(u_{xx}-v_{xx}).
\end{align*}
Expanding the parentheses,
\begin{align*}
z_t-z_{xx}=u_t-v_t-u_{xx}+v_{xx}.
\end{align*}
Regrouping the $u$-terms and $v$-terms,
\begin{align*}
z_t-z_{xx}=(u_t-u_{xx})-(v_t-v_{xx}).
\end{align*}
Using the two heat equations,
\begin{align*}
z_t-z_{xx}=0-0=0.
\end{align*}
Thus $z$ satisfies the homogeneous heat equation on $(0,L)\times(0,T]$.
For the interval $U=(0,L)$, the parabolic boundary is
\begin{align*}
\partial_p((0,L)\times(0,T])=([0,L]\times\{0\})\cup(\{0,L\}\times[0,T]).
\end{align*}
On the initial part, the initial data agree, so for every $x\in[0,L]$,
\begin{align*}
z(x,0)=u(x,0)-v(x,0)=0.
\end{align*}
On the lateral part, the boundary data agree, so for every $t\in[0,T]$,
\begin{align*}
z(0,t)=u(0,t)-v(0,t)=0
\end{align*}
and
\begin{align*}
z(L,t)=u(L,t)-v(L,t)=0.
\end{align*}
Hence $z=0$ on $\partial_p((0,L)\times(0,T])$, and therefore
\begin{align*}
\max_{\partial_p((0,L)\times(0,T])} z=0.
\end{align*}
By the *Weak Maximum Principle for the Heat Equation* applied to $z$,
\begin{align*}
\max_{[0,L]\times[0,T]} z=\max_{\partial_p((0,L)\times(0,T])} z=0.
\end{align*}
Every value of $z$ on the closed cylinder is bounded above by this maximum, so for every $(x,t)\in[0,L]\times[0,T]$,
\begin{align*}
z(x,t)\le 0.
\end{align*}
Now consider $-z$. It has the same regularity as $z$. For each $(x,t)\in(0,L)\times(0,T]$,
\begin{align*}
(-z)_t=-z_t.
\end{align*}
Also,
\begin{align*}
(-z)_{xx}=-z_{xx}.
\end{align*}
Therefore
\begin{align*}
(-z)_t-(-z)_{xx}=-z_t-(-z_{xx}).
\end{align*}
Since $-(-z_{xx})=z_{xx}$,
\begin{align*}
(-z)_t-(-z)_{xx}=-z_t+z_{xx}.
\end{align*}
Factoring out $-1$,
\begin{align*}
(-z)_t-(-z)_{xx}=-(z_t-z_{xx}).
\end{align*}
Because $z_t-z_{xx}=0$,
\begin{align*}
(-z)_t-(-z)_{xx}=-0=0.
\end{align*}
Also, since $z=0$ on $\partial_p((0,L)\times(0,T])$,
\begin{align*}
-z=0
\end{align*}
on $\partial_p((0,L)\times(0,T])$, and hence
\begin{align*}
\max_{\partial_p((0,L)\times(0,T])}(-z)=0.
\end{align*}
Applying the *Weak Maximum Principle for the Heat Equation* to $-z$ gives
\begin{align*}
\max_{[0,L]\times[0,T]}(-z)=\max_{\partial_p((0,L)\times(0,T])}(-z)=0.
\end{align*}
Every value of $-z$ is bounded above by this maximum, so
\begin{align*}
-z(x,t)\le 0
\end{align*}
for every $(x,t)\in[0,L]\times[0,T]$. Multiplying by $-1$ reverses the inequality, and therefore
\begin{align*}
z(x,t)\ge 0.
\end{align*}
Combining the two inequalities, for every $(x,t)\in[0,L]\times[0,T]$,
\begin{align*}
0\le z(x,t)\le 0.
\end{align*}
Thus
\begin{align*}
z(x,t)=0
\end{align*}
for every $(x,t)\in[0,L]\times[0,T]$. Since $z=u-v$,
\begin{align*}
u(x,t)-v(x,t)=0.
\end{align*}
Hence
\begin{align*}
u(x,t)=v(x,t)
\end{align*}
everywhere on $[0,L]\times[0,T]$. The initial and lateral boundary data therefore determine at most one classical heat solution on the closed cylinder.
[/example]
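The role of the parabolic boundary can also be seen on a grid. The following sketch assumes the explicit finite-difference scheme for $u_t=u_{xx}$ with ratio $r=\Delta t/\Delta x^2\le\frac12$, which is a choice made here for illustration and not a method used in the text; the function names and data are hypothetical. Under that restriction each new value is a convex combination of three earlier values, so the discrete solution obeys a maximum principle relative to the initial row and the two side columns.

```python
import math

def heat_grid(u0, left, right, nx=20, nt=100, r=0.4, L=1.0):
    """Explicit scheme for u_t = u_xx on [0, L]; rows[k][i] approximates u(i*dx, k*dt).
    Requires r = dt/dx^2 <= 1/2 so that each update is a convex combination."""
    dx = L / nx
    dt = r * dx * dx
    rows = [[u0(i * dx) for i in range(nx + 1)]]        # initial slice
    for k in range(1, nt + 1):
        prev = rows[-1]
        new = [0.0] * (nx + 1)
        new[0], new[nx] = left(k * dt), right(k * dt)   # lateral boundary data
        for i in range(1, nx):
            new[i] = (1 - 2 * r) * prev[i] + r * (prev[i - 1] + prev[i + 1])
        rows.append(new)
    return rows

def parabolic_boundary_values(rows):
    """Initial row together with both side columns; the terminal row is excluded."""
    vals = list(rows[0])
    for row in rows[1:]:
        vals.extend([row[0], row[-1]])
    return vals

if __name__ == "__main__":
    rows = heat_grid(lambda x: math.sin(math.pi * x),
                     lambda t: 0.0, lambda t: 0.0)
    pb = parabolic_boundary_values(rows)
    print("max over grid:", max(max(row) for row in rows))
    print("max over parabolic boundary:", max(pb))
```

Note that `parabolic_boundary_values` deliberately omits the interior of the last row, mirroring the definition of $\partial_p$.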
## Energy Estimates for Transport and Wave Equations
Maximum principles are order estimates, so they are most natural for scalar elliptic and parabolic equations. For transport and wave equations, signs are less decisive than size. The guiding question becomes: can the equation control a norm of the solution over time?
For transport, the basic quantity is the $L^2$ mass. If the velocity field has no compression or expansion, this mass is conserved for solutions with no boundary flux. The boundary condition is not decorative: energy can enter or leave through the boundary when the vector field points across it.
[example: Boundary Flux Changes Transport Energy]
Let $U=(0,1)$, let $b=1$, and suppose $u\in C^1([0,1]\times[0,T])$ satisfies
\begin{align*}
u_t+u_x=0
\end{align*}
on $(0,1)\times(0,T)$. For fixed $t$, define
\begin{align*}
E(t)=\frac12\int_0^1 |u(x,t)|^2\,dx.
\end{align*}
Since $u$ is real-valued here, $|u(x,t)|^2=u(x,t)^2$, so differentiating under the integral sign gives
\begin{align*}
E'(t)=\frac12\int_0^1 \partial_t\bigl(u(x,t)^2\bigr)\,dx.
\end{align*}
By the chain rule,
\begin{align*}
\partial_t\bigl(u(x,t)^2\bigr)=2u(x,t)u_t(x,t),
\end{align*}
and therefore
\begin{align*}
E'(t)=\frac12\int_0^1 2u(x,t)u_t(x,t)\,dx
=\int_0^1 u(x,t)u_t(x,t)\,dx.
\end{align*}
The transport equation implies
\begin{align*}
u_t(x,t)+u_x(x,t)=0,
\end{align*}
so
\begin{align*}
u_t(x,t)=-u_x(x,t).
\end{align*}
Substituting this into the energy derivative,
\begin{align*}
E'(t)=\int_0^1 u(x,t)(-u_x(x,t))\,dx
=-\int_0^1 u(x,t)u_x(x,t)\,dx.
\end{align*}
Again by the chain rule in the spatial variable,
\begin{align*}
\partial_x\bigl(u(x,t)^2\bigr)=2u(x,t)u_x(x,t),
\end{align*}
so
\begin{align*}
u(x,t)u_x(x,t)=\frac12\partial_x\bigl(u(x,t)^2\bigr).
\end{align*}
Hence
\begin{align*}
E'(t)
=-\int_0^1 \frac12\partial_x\bigl(u(x,t)^2\bigr)\,dx
=-\frac12\int_0^1 \partial_x\bigl(u(x,t)^2\bigr)\,dx.
\end{align*}
By the fundamental theorem of calculus,
\begin{align*}
\int_0^1 \partial_x\bigl(u(x,t)^2\bigr)\,dx
=u(1,t)^2-u(0,t)^2.
\end{align*}
Therefore
\begin{align*}
E'(t)
=-\frac12\bigl(u(1,t)^2-u(0,t)^2\bigr)
=-\frac12u(1,t)^2+\frac12u(0,t)^2.
\end{align*}
Equivalently,
\begin{align*}
\frac{d}{dt}\frac12\int_0^1 |u(x,t)|^2\,dx
=\frac12|u(0,t)|^2-\frac12|u(1,t)|^2.
\end{align*}
Because $b=1>0$, characteristics move to the right: the left endpoint $x=0$ is the inflow boundary and the right endpoint $x=1$ is the outflow boundary. The $L^2$ mass therefore changes by incoming boundary energy minus outgoing boundary energy, and it is conserved only when those two boundary traces balance.
[/example]
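The flux identity can be tested on an explicit solution. In the sketch below, the profile `g` is a hypothetical smooth function chosen for illustration, so that $u(x,t)=g(x-t)$ solves $u_t+u_x=0$; the midpoint quadrature and the central difference in time are numerical approximations, and the two sides of the identity are expected to agree only up to small discretisation error.

```python
import math

def g(s):
    """Hypothetical smooth profile; u(x, t) = g(x - t) solves u_t + u_x = 0."""
    return math.sin(2.0 * s) + 0.5

def energy(t, n=2000):
    """Midpoint-rule approximation of E(t) = 1/2 * int_0^1 u(x, t)^2 dx."""
    h = 1.0 / n
    return 0.5 * h * sum(g((i + 0.5) * h - t) ** 2 for i in range(n))

def energy_rate(t, dt=1e-4):
    """Central-difference approximation of E'(t)."""
    return (energy(t + dt) - energy(t - dt)) / (2.0 * dt)

def boundary_flux(t):
    """Inflow minus outflow: 1/2 u(0, t)^2 - 1/2 u(1, t)^2."""
    return 0.5 * g(-t) ** 2 - 0.5 * g(1.0 - t) ** 2

if __name__ == "__main__":
    for t in (0.0, 0.3, 0.8):
        print(t, energy_rate(t), boundary_flux(t))
```

The two printed columns should nearly coincide at each time, which reflects the balance between the inflow trace at $x=0$ and the outflow trace at $x=1$.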
The example shows that an energy identity has to account for motion through the boundary; for a general velocity field it must also account for compression inside the domain. The condition $b\cdot\nu=0$ removes the boundary exchange, so the estimate can focus on the geometry of the flow in the interior. That leaves a specific question which the example does not answer: once boundary flux has been removed, how large can the $L^2$ norm become solely because the velocity field expands or compresses volume inside $U$? The next theorem gives the corresponding differential identity and turns it into a usable bound by Gronwall's inequality. This is the first multiplier argument in the chapter: multiply the PDE by the unknown, integrate, and read the sign and boundary terms.
[quotetheorem:6163]
[citeproof:6163]
This computation shows the role of divergence: it is the infinitesimal rate at which the velocity field changes volume. The smooth-boundary and tangency assumptions are used exactly to justify integration by parts and remove the boundary flux term. If $b\cdot\nu$ has a sign, the estimate must include boundary inflow or outflow data instead; the one-dimensional example above gives the explicit terms $\frac12|u(0,t)|^2$ and $-\frac12|u(1,t)|^2$. The $C^1$ regularity of $b$ is used to make $\operatorname{div}b$ a bounded function and to apply the classical divergence theorem, while $C^1$ regularity of $u$ justifies differentiating the $L^2$ energy and multiplying pointwise by $u$. If $b$ is merely discontinuous, characteristic curves may fail to be unique and the product rule behind $b\cdot\nabla(u^2)$ can fail in the classical sense; if $u$ has a jump discontinuity, the same computation produces distributional boundary and shock terms rather than the displayed identity. The estimate is an a priori bound for already regular solutions, not a construction of such solutions, and weaker solution theories need additional trace and renormalisation arguments before this calculation can be reused.
Incompressible transport has a sharper statement.
[example: Divergence-Free Transport]
Let $b\in C^1(\overline U;\mathbb R^n)$ satisfy $\operatorname{div}b=0$ in $U$ and $b\cdot\nu=0$ on $\partial U$, and let $u$ be a regular solution of
\begin{align*}
u_t+b\cdot\nabla u=0.
\end{align*}
Define
\begin{align*}
E(t)=\frac12\|u(\cdot,t)\|_{L^2(U)}^2=\frac12\int_U |u(x,t)|^2\,d\mathcal L^n(x).
\end{align*}
By the *[Basic Energy Estimate for Linear Transport](/theorems/6163)*,
\begin{align*}
E'(t)=\frac12\int_U (\operatorname{div}b)(x)|u(x,t)|^2\,d\mathcal L^n(x).
\end{align*}
Since $\operatorname{div}b=0$ in $U$, for every $x\in U$,
\begin{align*}
(\operatorname{div}b)(x)|u(x,t)|^2=0\cdot |u(x,t)|^2=0.
\end{align*}
Substituting this into the identity for $E'(t)$ gives
\begin{align*}
E'(t)=\frac12\int_U 0\,d\mathcal L^n(x)=0.
\end{align*}
Thus $E$ has zero derivative on $[0,T]$, so $E(t)=E(0)$ for every $t\in[0,T]$. Expanding this equality gives
\begin{align*}
\frac12\|u(\cdot,t)\|_{L^2(U)}^2=\frac12\|u(\cdot,0)\|_{L^2(U)}^2.
\end{align*}
Multiplying both sides by $2$ yields
\begin{align*}
\|u(\cdot,t)\|_{L^2(U)}^2=\|u(\cdot,0)\|_{L^2(U)}^2.
\end{align*}
Both norms are nonnegative, so taking the nonnegative square root of both sides gives
\begin{align*}
\|u(\cdot,t)\|_{L^2(U)}=\|u(\cdot,0)\|_{L^2(U)}
\end{align*}
for every $t\in[0,T]$. Thus an incompressible velocity field tangent to the boundary transports the solution without changing its square-integral size.
[/example]
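A concrete instance is rigid rotation of the unit disc. The sketch below assumes $U=B(0,1)\subset\mathbb R^2$ and $b(x,y)=(-y,x)$, for which $\operatorname{div}b=0$ and $b\cdot\nu=0$ on the unit circle; the initial datum `u0` is a hypothetical polynomial, and the solution is obtained by rotating the datum back through the angle $t$. The integral is approximated by a midpoint rule in polar coordinates, so the comparison is numerical rather than exact.

```python
import math

def u0(x, y):
    """Hypothetical initial datum, chosen only for illustration."""
    return x + x * y

def u(x, y, t):
    """Solution of u_t + b . grad u = 0 with b(x, y) = (-y, x):
    the datum is evaluated at the point rotated back by the angle t."""
    c, s = math.cos(t), math.sin(t)
    return u0(c * x + s * y, -s * x + c * y)

def l2_norm_sq(t, nr=200, ntheta=64):
    """Midpoint rule in polar coordinates for the integral of u(., t)^2 over the unit disc."""
    hr = 1.0 / nr
    ht = 2.0 * math.pi / ntheta
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * hr
        for j in range(ntheta):
            th = (j + 0.5) * ht
            total += u(r * math.cos(th), r * math.sin(th), t) ** 2 * r
    return total * hr * ht

if __name__ == "__main__":
    for t in (0.0, 0.7, 2.0):
        print(t, l2_norm_sq(t))
```

The printed values should agree across times to within rounding, which is the discrete reflection of $\|u(\cdot,t)\|_{L^2(U)}=\|u(\cdot,0)\|_{L^2(U)}$ for this particular field.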
The transport estimate controls a first-order evolution by tracking how the flow changes volume. For waves, the correct size is not just $\|u\|_{L^2}$, because the equation is second order in time and stores information in both velocity and spatial gradient. The next theorem identifies the conserved quantity that matches this structure.
[quotetheorem:6164]
[citeproof:6164]
The formula says that the equation moves energy around without changing its total amount. The $C^2$ regularity of $u$ permits differentiating the energy and integrating by parts in the classical sense, while the smooth-boundary hypothesis supplies a well-defined outward normal and the boundary version of Green's identity. For weak solutions of the wave equation the same conservation law requires a density or approximation argument; without enough regularity, the formal differentiation of $E$ may fail to be justified at every time. The homogeneous Dirichlet condition is what kills the boundary work term; for other boundary conditions, the energy identity has an extra flux contribution. Thus conservation is a statement about a closed system, not about every wave equation on a bounded domain. The next one-dimensional example keeps the same computation visible and shows exactly how the boundary term measures work done at the endpoints.
[illustration:pdei-wave-energy-fixed-string]
[example: Boundary Work for the One-Dimensional Wave Equation]
Let $u\in C^2([0,L]\times[0,T])$ solve
\begin{align*}
u_{tt}-u_{xx}=0
\end{align*}
on $(0,L)\times(0,T)$, and define
\begin{align*}
E(t)=\frac12\int_0^L \left(|u_t(x,t)|^2+|u_x(x,t)|^2\right)\,dx.
\end{align*}
Since $u$ is real-valued, $|u_t|^2=u_t^2$ and $|u_x|^2=u_x^2$. Differentiating under the integral sign gives
\begin{align*}
E'(t)=\frac12\int_0^L \partial_t\left(u_t(x,t)^2+u_x(x,t)^2\right)\,dx.
\end{align*}
By the chain rule,
\begin{align*}
\partial_t\left(u_t(x,t)^2\right)=2u_t(x,t)u_{tt}(x,t)
\end{align*}
and
\begin{align*}
\partial_t\left(u_x(x,t)^2\right)=2u_x(x,t)u_{xt}(x,t).
\end{align*}
Substituting these two identities into the derivative of $E$,
\begin{align*}
E'(t)=\frac12\int_0^L \left(2u_t(x,t)u_{tt}(x,t)+2u_x(x,t)u_{xt}(x,t)\right)\,dx.
\end{align*}
Splitting the integral and cancelling the factor $\frac12\cdot 2$ in each term gives
\begin{align*}
E'(t)=\int_0^L u_t(x,t)u_{tt}(x,t)\,dx+\int_0^L u_x(x,t)u_{xt}(x,t)\,dx.
\end{align*}
Because $u\in C^2$, the mixed second derivatives are continuous and therefore commute, so $u_{xt}=\partial_xu_t$. Therefore
\begin{align*}
\int_0^L u_x(x,t)u_{xt}(x,t)\,dx=\int_0^L u_x(x,t)\partial_xu_t(x,t)\,dx.
\end{align*}
Integration by parts in $x$ gives
\begin{align*}
\int_0^L u_x(x,t)\partial_xu_t(x,t)\,dx=\left[u_x(x,t)u_t(x,t)\right]_{x=0}^{x=L}-\int_0^L u_{xx}(x,t)u_t(x,t)\,dx.
\end{align*}
The boundary bracket evaluates to
\begin{align*}
\left[u_x(x,t)u_t(x,t)\right]_{x=0}^{x=L}=u_x(L,t)u_t(L,t)-u_x(0,t)u_t(0,t).
\end{align*}
Thus
\begin{align*}
\int_0^L u_x(x,t)u_{xt}(x,t)\,dx=u_x(L,t)u_t(L,t)-u_x(0,t)u_t(0,t)-\int_0^L u_{xx}(x,t)u_t(x,t)\,dx.
\end{align*}
Substituting this expression into the formula for $E'(t)$ yields
\begin{align*}
E'(t)=\int_0^L u_t(x,t)u_{tt}(x,t)\,dx-\int_0^L u_{xx}(x,t)u_t(x,t)\,dx+u_x(L,t)u_t(L,t)-u_x(0,t)u_t(0,t).
\end{align*}
Combining the two integrals,
\begin{align*}
E'(t)=\int_0^L \left(u_t(x,t)u_{tt}(x,t)-u_{xx}(x,t)u_t(x,t)\right)\,dx+u_x(L,t)u_t(L,t)-u_x(0,t)u_t(0,t).
\end{align*}
Factoring out $u_t(x,t)$ inside the integral gives
\begin{align*}
E'(t)=\int_0^L u_t(x,t)\left(u_{tt}(x,t)-u_{xx}(x,t)\right)\,dx+u_x(L,t)u_t(L,t)-u_x(0,t)u_t(0,t).
\end{align*}
Since $u_{tt}-u_{xx}=0$ on $(0,L)\times(0,T)$,
\begin{align*}
u_t(x,t)\left(u_{tt}(x,t)-u_{xx}(x,t)\right)=u_t(x,t)\cdot 0=0
\end{align*}
for every $x\in(0,L)$ and $t\in(0,T)$. Hence
\begin{align*}
\int_0^L u_t(x,t)\left(u_{tt}(x,t)-u_{xx}(x,t)\right)\,dx=\int_0^L 0\,dx=0.
\end{align*}
Therefore
\begin{align*}
E'(t)=u_x(L,t)u_t(L,t)-u_x(0,t)u_t(0,t).
\end{align*}
If the endpoints are fixed, so that $u(0,t)=0$ and $u(L,t)=0$ for all $t\in[0,T]$, then differentiating these identities with respect to $t$ gives
\begin{align*}
u_t(0,t)=0
\end{align*}
and
\begin{align*}
u_t(L,t)=0.
\end{align*}
Substituting these endpoint velocities into the boundary formula,
\begin{align*}
E'(t)=u_x(L,t)\cdot 0-u_x(0,t)\cdot 0=0-0=0.
\end{align*}
Thus fixed endpoints conserve the one-dimensional wave energy, while nonzero endpoint velocities contribute the boundary power term $u_x(L,t)u_t(L,t)-u_x(0,t)u_t(0,t)$.
[/example]
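The fixed-endpoint case can be checked on a standing wave. The sketch below assumes the explicit solution $u(x,t)=\sin(\pi x/L)\cos(\pi t/L)$, which satisfies $u_{tt}-u_{xx}=0$ with $u(0,t)=u(L,t)=0$; the constant `LENGTH` and the function names are hypothetical, and the energy integral is approximated by a midpoint rule.

```python
import math

LENGTH = 2.0                 # hypothetical string length L
K = math.pi / LENGTH         # wave number of the first standing mode

def u_t(x, t):
    """Time derivative of u(x, t) = sin(K x) cos(K t)."""
    return -K * math.sin(K * x) * math.sin(K * t)

def u_x(x, t):
    """Space derivative of u(x, t) = sin(K x) cos(K t)."""
    return K * math.cos(K * x) * math.cos(K * t)

def energy(t, n=400):
    """Midpoint-rule approximation of E(t) = 1/2 * int_0^L (u_t^2 + u_x^2) dx."""
    h = LENGTH / n
    xs = ((i + 0.5) * h for i in range(n))
    return 0.5 * h * sum(u_t(x, t) ** 2 + u_x(x, t) ** 2 for x in xs)

def boundary_power(t):
    """u_x(L, t) u_t(L, t) - u_x(0, t) u_t(0, t), the boundary term in E'(t)."""
    return u_x(LENGTH, t) * u_t(LENGTH, t) - u_x(0.0, t) * u_t(0.0, t)

if __name__ == "__main__":
    for t in (0.0, 0.4, 0.9, 1.7):
        print(t, energy(t), boundary_power(t))
```

For this mode the exact energy is $\pi^2/(4L)$ at every time, so the first printed column should be essentially constant and the boundary power should vanish up to rounding.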
## Uniqueness from A Priori Estimates
Representation formulas prove uniqueness by writing the solution down. Estimates prove uniqueness in a more robust way: compare two possible solutions, apply an estimate to their difference, and conclude that the difference must vanish. This is the pattern that survives when explicit formulas are unavailable.
The abstract form isolates what is needed from the PDE. Once an a priori estimate has been proved, uniqueness is usually a short consequence.
[quotetheorem:6165]
[citeproof:6165]
This theorem is not tied to any one equation. The previous maximum and energy estimates are examples of the same logic, with different norms and different boundary data. Its hypotheses are deliberately stated at the level where uniqueness actually happens: the estimate must control a norm strong enough to identify two solutions in the chosen class. If an estimate controls only $\|\nabla w\|_{L^2(U)}$ on $H^1(U)$ without boundary normalisation, then constants have zero controlled quantity but are different functions; uniqueness would fail until boundary data or a mean-zero condition removes that kernel. Likewise, if $V$ maps into $X$ by a non-injective linear map, equality in $X$ need not identify equality in $V$; two distinct representatives can have the same image. The theorem does not prove that a solution exists for every $f$ and $g$, and it does not say that a weak solution is unique unless that weak solution belongs to a class where the estimate is valid.
The heat equation shows the pattern in the supremum norm, the norm naturally attached to order estimates.
[example: Heat Uniqueness as an Estimate]
Let $U\subset\mathbb R^n$ be bounded and open, let $T>0$, and suppose $u,v\in C^{2,1}(U\times(0,T])\cap C(\overline U\times[0,T])$ solve the same homogeneous heat equation with the same initial and lateral boundary data. Set
\begin{align*}
w=u-v.
\end{align*}
Since the stated regularity classes are closed under subtraction,
\begin{align*}
w\in C^{2,1}(U\times(0,T])\cap C(\overline U\times[0,T]).
\end{align*}
For $(x,t)\in U\times(0,T]$, linearity of $\partial_t$ gives
\begin{align*}
w_t(x,t)=u_t(x,t)-v_t(x,t).
\end{align*}
Linearity of the Laplacian gives
\begin{align*}
\Delta w(x,t)=\Delta u(x,t)-\Delta v(x,t).
\end{align*}
Therefore
\begin{align*}
w_t(x,t)-\Delta w(x,t)=\bigl(u_t(x,t)-v_t(x,t)\bigr)-\bigl(\Delta u(x,t)-\Delta v(x,t)\bigr).
\end{align*}
Expanding the parentheses gives
\begin{align*}
w_t(x,t)-\Delta w(x,t)=u_t(x,t)-v_t(x,t)-\Delta u(x,t)+\Delta v(x,t).
\end{align*}
Regrouping the $u$-terms and $v$-terms,
\begin{align*}
w_t(x,t)-\Delta w(x,t)=\bigl(u_t(x,t)-\Delta u(x,t)\bigr)-\bigl(v_t(x,t)-\Delta v(x,t)\bigr).
\end{align*}
Since both $u$ and $v$ solve the homogeneous heat equation,
\begin{align*}
u_t(x,t)-\Delta u(x,t)=0
\end{align*}
and
\begin{align*}
v_t(x,t)-\Delta v(x,t)=0.
\end{align*}
Thus
\begin{align*}
w_t(x,t)-\Delta w(x,t)=0-0=0,
\end{align*}
so $w$ also solves the homogeneous heat equation.
By the *Weak Maximum Principle for the Heat Equation* applied to $w$,
\begin{align*}
\max_{\overline U\times[0,T]} w=\max_{\partial_p(U\times(0,T])} w.
\end{align*}
Applying the same principle to $-w$ gives
\begin{align*}
\max_{\overline U\times[0,T]}(-w)=\max_{\partial_p(U\times(0,T])}(-w).
\end{align*}
For any compact set $A$,
\begin{align*}
\max_A(-w)=-\min_A w.
\end{align*}
Indeed, if $m=\min_A w$, then $m\le w(y)$ for every $y\in A$, so $-w(y)\le -m$ for every $y\in A$; and at a point where $w=m$, the value of $-w$ is $-m$. Hence the maximum principle for $-w$ is equivalent to
\begin{align*}
\min_{\overline U\times[0,T]}w=\min_{\partial_p(U\times(0,T])}w.
\end{align*}
Therefore every $(x,t)\in\overline U\times[0,T]$ satisfies
\begin{align*}
\min_{\partial_p(U\times(0,T])}w\le w(x,t)\le \max_{\partial_p(U\times(0,T])}w.
\end{align*}
For every boundary point $y\in\partial_p(U\times(0,T])$,
\begin{align*}
w(y)\le |w(y)|
\end{align*}
and
\begin{align*}
-w(y)\le |w(y)|.
\end{align*}
Thus both the boundary maximum of $w$ and the negative of the boundary minimum of $w$ are bounded above by $\max_{\partial_p(U\times(0,T])}|w|$. It follows that
\begin{align*}
|w(x,t)|\le \max_{\partial_p(U\times(0,T])}|w|
\end{align*}
for every $(x,t)\in\overline U\times[0,T]$. Taking the maximum over $\overline U\times[0,T]$ gives
\begin{align*}
\|w\|_{L^\infty(\overline U\times[0,T])}\le \|w\|_{L^\infty(\partial_p(U\times(0,T]))}.
\end{align*}
Since $w=u-v$, this is
\begin{align*}
\|u-v\|_{L^\infty(\overline U\times[0,T])}\le \|u-v\|_{L^\infty(\partial_p(U\times(0,T]))}.
\end{align*}
On the initial part $\overline U\times\{0\}$, the initial data agree, so
\begin{align*}
w(x,0)=u(x,0)-v(x,0)=0
\end{align*}
for every $x\in\overline U$. On the lateral part $\partial U\times[0,T]$, the boundary data agree, so
\begin{align*}
w(x,t)=u(x,t)-v(x,t)=0
\end{align*}
for every $(x,t)\in\partial U\times[0,T]$. Since
\begin{align*}
\partial_p(U\times(0,T])=(\overline U\times\{0\})\cup(\partial U\times[0,T]),
\end{align*}
we have
\begin{align*}
w=0
\end{align*}
on $\partial_p(U\times(0,T])$. Therefore
\begin{align*}
\|w\|_{L^\infty(\partial_p(U\times(0,T]))}=0.
\end{align*}
Substituting this into the estimate gives
\begin{align*}
\|u-v\|_{L^\infty(\overline U\times[0,T])}\le 0.
\end{align*}
An $L^\infty$ norm is nonnegative, so
\begin{align*}
\|u-v\|_{L^\infty(\overline U\times[0,T])}=0.
\end{align*}
Hence $u-v=0$ on $\overline U\times[0,T]$, which means
\begin{align*}
u(x,t)=v(x,t)
\end{align*}
for every $(x,t)\in\overline U\times[0,T]$. The maximum principle is therefore an a priori estimate: zero initial and lateral boundary difference forces the whole solution difference to vanish.
[/example]
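The same mechanism can be observed on a grid. The sketch below is a minimal illustration, not part of the theory above: it assumes the one-dimensional heat equation $u_t=u_{xx}$ on $(0,1)$, an explicit finite-difference scheme with ratio `r = dt/dx**2` at most $1/2$, and hypothetical names `heat_explicit` and `parabolic_boundary_values`. Under that assumption each new interior value is a convex combination of three old values, so the discrete extrema sit on the discrete parabolic boundary.

```python
import math


def heat_explicit(u0, left, right, r, steps):
    """Explicit finite differences for u_t = u_xx on a 1D grid.

    u0: initial values at all grid points, endpoints included.
    left, right: functions of the step index giving lateral boundary values.
    r = dt / dx**2, assumed to satisfy 0 < r <= 1/2.
    Returns the list of time levels, each a list of grid values.
    """
    levels = [list(u0)]
    for n in range(1, steps + 1):
        prev = levels[-1]
        new = prev[:]
        for j in range(1, len(prev) - 1):
            # Convex combination of three old values when r <= 1/2.
            new[j] = (1 - 2 * r) * prev[j] + r * (prev[j - 1] + prev[j + 1])
        new[0], new[-1] = left(n), right(n)
        levels.append(new)
    return levels


def parabolic_boundary_values(levels):
    """Initial row plus both lateral columns (the final-time interior is excluded)."""
    values = list(levels[0])
    for row in levels[1:]:
        values.extend([row[0], row[-1]])
    return values


m = 20
u0 = [math.sin(math.pi * j / m) for j in range(m + 1)]
levels = heat_explicit(u0, lambda n: 0.0, lambda n: 0.0, r=0.4, steps=200)
all_values = [value for row in levels for value in row]
boundary = parabolic_boundary_values(levels)
print(max(all_values) <= max(boundary) + 1e-12)
print(min(all_values) >= min(boundary) - 1e-12)
```

Applied to the difference of two grid solutions with the same initial and lateral values, the same bound forces that difference to vanish, which is the discrete counterpart of the uniqueness argument.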
The transport equation gives the same conclusion in an integral norm rather than a pointwise norm. This distinction matters because many equations have useful energy estimates even when no maximum principle is available.
[example: Transport Uniqueness from Energy Estimate]
Let $u$ and $v$ be regular solutions of the same forced transport equation,
\begin{align*}
u_t+b\cdot\nabla u=f
\end{align*}
and
\begin{align*}
v_t+b\cdot\nabla v=f
\end{align*}
on $U\times(0,T)$, where $b\cdot\nu=0$ on $\partial U$ and $\|\operatorname{div}b\|_\infty\le M$. Assume that $u$ and $v$ have the same initial data. We show that their difference has zero $L^2$ norm at every time.
Set
\begin{align*}
w=u-v.
\end{align*}
By linearity of the time derivative,
\begin{align*}
w_t=(u-v)_t=u_t-v_t.
\end{align*}
By linearity of the gradient,
\begin{align*}
\nabla w=\nabla(u-v)=\nabla u-\nabla v.
\end{align*}
Taking the dot product with $b$ gives
\begin{align*}
b\cdot\nabla w=b\cdot(\nabla u-\nabla v)=b\cdot\nabla u-b\cdot\nabla v.
\end{align*}
Adding the identities for $w_t$ and $b\cdot\nabla w$,
\begin{align*}
w_t+b\cdot\nabla w=(u_t-v_t)+(b\cdot\nabla u-b\cdot\nabla v).
\end{align*}
Expanding the parentheses gives
\begin{align*}
w_t+b\cdot\nabla w=u_t-v_t+b\cdot\nabla u-b\cdot\nabla v.
\end{align*}
Regrouping the $u$-terms and $v$-terms,
\begin{align*}
w_t+b\cdot\nabla w=(u_t+b\cdot\nabla u)-(v_t+b\cdot\nabla v).
\end{align*}
Using the two forced transport equations,
\begin{align*}
w_t+b\cdot\nabla w=f-f=0.
\end{align*}
Thus $w$ solves the homogeneous transport equation with the same velocity field $b$.
By the *Basic Energy Estimate for Linear Transport* applied to $w$,
\begin{align*}
\|w(\cdot,t)\|_{L^2(U)}\le e^{Mt/2}\|w(\cdot,0)\|_{L^2(U)}.
\end{align*}
Since the initial data agree, for every $x\in U$,
\begin{align*}
w(x,0)=u(x,0)-v(x,0)=0.
\end{align*}
Therefore
\begin{align*}
\|w(\cdot,0)\|_{L^2(U)}=\left(\int_U |w(x,0)|^2\,d\mathcal L^n(x)\right)^{1/2}.
\end{align*}
Substituting $w(x,0)=0$ into the integrand gives
\begin{align*}
\|w(\cdot,0)\|_{L^2(U)}=\left(\int_U |0|^2\,d\mathcal L^n(x)\right)^{1/2}.
\end{align*}
Since $|0|^2=0$,
\begin{align*}
\|w(\cdot,0)\|_{L^2(U)}=\left(\int_U 0\,d\mathcal L^n(x)\right)^{1/2}=0.
\end{align*}
Substituting this into the energy estimate gives
\begin{align*}
\|w(\cdot,t)\|_{L^2(U)}\le e^{Mt/2}\cdot 0=0.
\end{align*}
Because an $L^2$ norm is nonnegative,
\begin{align*}
0\le \|w(\cdot,t)\|_{L^2(U)}\le 0.
\end{align*}
Hence
\begin{align*}
\|w(\cdot,t)\|_{L^2(U)}=0
\end{align*}
for every $t\in[0,T]$. Since $w=u-v$, the two regular solutions agree in $L^2(U)$ at every time; the energy estimate converts equality of forcing and initial data into uniqueness in the controlled norm.
[/example]
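The energy bound can be checked numerically on a case where the characteristics are explicit. The sketch below rests on assumptions chosen only for illustration: $U=(0,1)$, the velocity $b(x)=x(1-x)$ (so $b$ vanishes at both endpoints and $\sup|b'|=1$, giving $M=1$), and the initial profile $\sin(\pi x)$. The helper names `backward_flow` and `l2_norm_at_time` are hypothetical. Since solutions of $w_t+b\,w_x=0$ are constant along characteristics, $w(x,t)=w_0(X(-t;x))$, and the logistic flow has a closed form.

```python
import math


def backward_flow(x, t):
    """Starting point X(-t; x) of the characteristic of X' = X(1 - X) that reaches x at time t."""
    e = math.exp(-t)
    return x * e / (1.0 - x + x * e)


def l2_norm_at_time(w0, t, n=20000):
    """Midpoint-rule approximation of the L^2(0,1) norm of w(., t) = w0(X(-t; .))."""
    dx = 1.0 / n
    total = 0.0
    for j in range(n):
        x = (j + 0.5) * dx
        total += w0(backward_flow(x, t)) ** 2 * dx
    return math.sqrt(total)


M = 1.0  # sup of |div b| = |1 - 2x| on [0, 1]


def w0(y):
    return math.sin(math.pi * y)


norm0 = l2_norm_at_time(w0, 0.0)
for t in (0.5, 1.0, 2.0):
    # Print the time, the computed norm, and the bound from the energy estimate.
    print(t, l2_norm_at_time(w0, t), math.exp(M * t / 2) * norm0)
```

In each printed row the computed norm should not exceed the bound; with zero initial data both sides vanish, which is the uniqueness statement.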
The conceptual gain of this chapter is that qualitative PDE theory can begin before existence theory is complete. Maximum principles give order bounds, energy methods give norm bounds, and both convert homogeneous data into uniqueness. This mirrors comparison arguments in real analysis, Lyapunov functions in dynamical systems, and norm estimates for bounded linear operators: in each setting, a structural inequality replaces an explicit formula. Later elliptic, parabolic, and hyperbolic theories refine these estimates, but the basic proof strategy remains the same: derive a bound from the equation, then use the bound to rule out nonzero differences.
Maximum principles and energy estimates give control without requiring a closed-form solution, but they still depend on the right data being prescribed. Chapter 11 returns to that foundational issue and explains how boundary and initial conditions must be matched to the PDE to obtain a complete classical problem.
# 11. Boundary and Initial Data in Classical Models
This chapter fixes a point that has appeared since the Cauchy-problem and well-posedness definitions in Chapters 0 and 1: a PDE is not a complete problem until the boundary or initial data have been chosen in a way compatible with the equation. The same differential expression may define a well-posed Dirichlet problem, an underdetermined Neumann problem, an ill-posed Cauchy problem, or a mixed problem with different data on different parts of the boundary. We now collect the classical data types and the integral identities that explain the right choices for elliptic and transport models.
## Choosing Data for a PDE Problem
The first question is not only what equation should be solved, but where the missing information should be supplied. For elliptic equations, values on the spatial boundary are natural; for evolution and transport equations, initial or inflow data follow the direction of propagation. The terminology records these choices.
[definition: Dirichlet Data]
Let $U \subset \mathbb R^n$ be open with boundary $\partial U$. Dirichlet data for an unknown function $u: \overline U \to \mathbb R$ prescribe the boundary values
\begin{align*}
u = g \quad \text{on } \partial U,
\end{align*}
where $g: \partial U \to \mathbb R$ is a given function.
[/definition]
Dirichlet data fix the trace of the unknown, so they match problems where the solution itself is controlled at the boundary. For Poisson's equation, this is the standard way to remove additive ambiguity and obtain uniqueness under suitable hypotheses.
[example: Dirichlet Boundary Temperature]
Let $U \subset \mathbb R^n$ be a bounded domain, and let $u$ denote the steady-state temperature in $U$. The model
\begin{align*}
-\Delta u = f \quad \text{in } U
\end{align*}
with
\begin{align*}
u = g \quad \text{on } \partial U
\end{align*}
represents an imposed heat source $f$ in the interior together with a prescribed boundary temperature $g$.
To see why the boundary values are part of the physical problem, suppose $u$ solves the interior equation $-\Delta u=f$, and let $c\in\mathbb R$ with $c\neq 0$. Since $c$ is constant,
\begin{align*}
\frac{\partial c}{\partial x_i}=0 \quad \text{for } i=1,\ldots,n.
\end{align*}
Differentiating these zero functions once more gives
\begin{align*}
\frac{\partial^2 c}{\partial x_i^2}=0 \quad \text{for } i=1,\ldots,n.
\end{align*}
Therefore
\begin{align*}
\Delta c
=
\sum_{i=1}^n \frac{\partial^2 c}{\partial x_i^2}
=
\sum_{i=1}^n 0
=
0.
\end{align*}
For the shifted function $u+c$, linearity of partial differentiation gives
\begin{align*}
\Delta(u+c)
=
\sum_{i=1}^n \frac{\partial^2 (u+c)}{\partial x_i^2}
=
\sum_{i=1}^n \left(\frac{\partial^2 u}{\partial x_i^2}+\frac{\partial^2 c}{\partial x_i^2}\right)
=
\Delta u+\Delta c.
\end{align*}
Multiplying by $-1$ gives
\begin{align*}
-\Delta(u+c)
=
-\Delta u-\Delta c.
\end{align*}
Using $-\Delta u=f$ and $\Delta c=0$, we obtain
\begin{align*}
-\Delta(u+c)
=
f-0
=
f.
\end{align*}
Thus $u$ and $u+c$ solve the same interior equation.
On the boundary, the trace of the shifted function is
\begin{align*}
(u+c)|_{\partial U}
=
u|_{\partial U}+c.
\end{align*}
Since the Dirichlet condition for $u$ is $u|_{\partial U}=g$, this becomes
\begin{align*}
(u+c)|_{\partial U}
=
g+c.
\end{align*}
Because $c\neq 0$, the boundary value $g+c$ is not the same prescribed function as $g$. The forcing term $f$ alone therefore does not determine the temperature profile; the boundary temperature selects which solution of the same interior equation is being modeled.
[/example]
A different experiment fixes the rate at which mass, heat, or charge crosses the boundary. This leads to normal derivative data rather than value data.
[definition: Neumann Data]
Let $U \subset \mathbb R^n$ be a domain whose boundary has an outward unit normal vector $\nu$ at the points under consideration. Neumann data for a differentiable function $u: \overline U \to \mathbb R$ prescribe
\begin{align*}
\frac{\partial u}{\partial \nu} = g \quad \text{on } \partial U,
\end{align*}
where
\begin{align*}
\frac{\partial u}{\partial \nu}(x) := \nabla u(x) \cdot \nu(x).
\end{align*}
[/definition]
Neumann data determine flux across the boundary. They do not determine the absolute additive level of $u$, and that fact becomes an important uniqueness obstruction for elliptic problems.
[definition: Cauchy Data]
Let $\Sigma \subset \mathbb R^n$ be a hypersurface. Cauchy data for a second-order scalar equation prescribe both
\begin{align*}
u|_\Sigma = g
\end{align*}
and
\begin{align*}
\frac{\partial u}{\partial \nu}\bigg|_\Sigma = h,
\end{align*}
where $g,h: \Sigma \to \mathbb R$ are given functions and $\nu$ is a chosen unit normal along $\Sigma$.
[/definition]
For hyperbolic equations, Cauchy data on a non-characteristic initial surface are often the correct input. For elliptic equations, prescribing both value and normal derivative on a boundary portion is usually too rigid and unstable; the Laplace example below makes this visible.
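One standard family that exhibits this instability is $u_n(x,y)=\sin(nx)\sinh(ny)/n^2$. It is chosen here for illustration and is not necessarily the example used elsewhere in these notes. Each $u_n$ is harmonic, has Cauchy data $u_n(x,0)=0$ and $\partial_y u_n(x,0)=\sin(nx)/n$ on the line $y=0$, and the function names in the sketch are hypothetical. The data tend to zero uniformly as $n$ grows, while the solution at height $y=1$ grows without bound.

```python
import math


def cauchy_data_size(n):
    """Sup over x of |d/dy u_n(x, 0)| for u_n(x, y) = sin(n x) sinh(n y) / n**2."""
    return 1.0 / n


def solution_size(n, y):
    """Sup over x of |u_n(x, y)| for the same harmonic function at height y > 0."""
    return math.sinh(n * y) / n ** 2


for n in (5, 10, 20, 40):
    # Data size shrinks, solution size at y = 1 grows.
    print(n, cauchy_data_size(n), solution_size(n, 1.0))
```

No bound of the form "solution size at most a constant times data size" can hold for this family, which is the sense in which the elliptic Cauchy problem fails to be stable.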
[definition: Mixed Boundary Data]
Let $\partial U = \Gamma_D \cup \Gamma_N$ with $\Gamma_D \cap \Gamma_N = \varnothing$. Mixed boundary data prescribe
\begin{align*}
u = g \quad \text{on } \Gamma_D
\end{align*}
and
\begin{align*}
\frac{\partial u}{\partial \nu} = h \quad \text{on } \Gamma_N,
\end{align*}
for given functions $g: \Gamma_D \to \mathbb R$ and $h: \Gamma_N \to \mathbb R$.
[/definition]
Mixed data occur when different parts of the boundary have different physical controls. A membrane may be clamped on one side and insulated on another; a transport equation may require data only where characteristics enter the domain.
[example: Transport Inflow Boundary Condition]
Let $U \subset \mathbb R^n$ be a bounded domain with smooth boundary and let $b \in C^1(\overline U;\mathbb R^n)$. For
\begin{align*}
b(x)\cdot \nabla u(x)=f(x)\quad \text{in }U,
\end{align*}
the sign of $b\cdot \nu$ determines whether the characteristic direction points into or out of the domain. The inflow and outflow portions of the boundary are
\begin{align*}
\Gamma_-=\{x\in \partial U: b(x)\cdot \nu(x)<0\},
\end{align*}
and
\begin{align*}
\Gamma_+=\{x\in \partial U: b(x)\cdot \nu(x)>0\}.
\end{align*}
To justify the sign convention, let $\rho$ be a local defining function near $x_0\in \partial U$, with
\begin{align*}
U=\{\rho<0\},
\qquad
\partial U=\{\rho=0\},
\qquad
\nabla \rho=\nu \quad \text{on } \partial U.
\end{align*}
Let $X$ be the characteristic satisfying
\begin{align*}
X'(s)=b(X(s)),
\qquad
X(0)=x_0.
\end{align*}
Taylor's formula for the scalar function $s\mapsto \rho(X(s))$ at $s=0$ gives
\begin{align*}
\rho(X(h))
=
\rho(X(0))
+
h\frac{d}{ds}\rho(X(s))\bigg|_{s=0}
+
o(h).
\end{align*}
By the chain rule,
\begin{align*}
\frac{d}{ds}\rho(X(s))\bigg|_{s=0}
=
\nabla \rho(X(0))\cdot X'(0).
\end{align*}
Since $X(0)=x_0$, $\rho(x_0)=0$, $\nabla \rho(x_0)=\nu(x_0)$, and $X'(0)=b(x_0)$, this becomes
\begin{align*}
\rho(X(h))
=
0+h\,\nu(x_0)\cdot b(x_0)+o(h).
\end{align*}
Equivalently,
\begin{align*}
\rho(X(h))
=
h\,b(x_0)\cdot \nu(x_0)+o(h).
\end{align*}
If $b(x_0)\cdot \nu(x_0)<0$, set
\begin{align*}
\alpha=-b(x_0)\cdot \nu(x_0)>0.
\end{align*}
Then
\begin{align*}
\rho(X(h))
=
-\alpha h+o(h)
=
h\left(-\alpha+\frac{o(h)}{h}\right).
\end{align*}
Because $o(h)/h\to 0$ as $h\to 0^+$, there is $\delta>0$ such that
\begin{align*}
\left|\frac{o(h)}{h}\right|<\frac{\alpha}{2}
\quad \text{whenever }0<h<\delta.
\end{align*}
For such $h$,
\begin{align*}
-\alpha+\frac{o(h)}{h}
\le
-\alpha+\frac{\alpha}{2}
=
-\frac{\alpha}{2}
<0.
\end{align*}
Hence
\begin{align*}
\rho(X(h))<0
\quad \text{for }0<h<\delta,
\end{align*}
so $X(h)\in U$ for small positive $h$. Thus points with $b\cdot \nu<0$ are inflow points.
Now suppose a characteristic enters at $x_-\in \Gamma_-$ and write
\begin{align*}
X(0)=x_-,
\qquad
X'(s)=b(X(s)).
\end{align*}
Along this curve, the chain rule gives
\begin{align*}
\frac{d}{ds}u(X(s))
=
\nabla u(X(s))\cdot X'(s).
\end{align*}
Substituting the characteristic equation $X'(s)=b(X(s))$ yields
\begin{align*}
\frac{d}{ds}u(X(s))
=
\nabla u(X(s))\cdot b(X(s)).
\end{align*}
Since the dot product is commutative,
\begin{align*}
\nabla u(X(s))\cdot b(X(s))
=
b(X(s))\cdot \nabla u(X(s)).
\end{align*}
Using the transport equation at the point $X(s)$ gives
\begin{align*}
b(X(s))\cdot \nabla u(X(s))
=
f(X(s)).
\end{align*}
Therefore
\begin{align*}
\frac{d}{ds}u(X(s))
=
f(X(s)).
\end{align*}
Integrating from $0$ to $t$ gives
\begin{align*}
\int_0^t \frac{d}{ds}u(X(s))\,ds
=
\int_0^t f(X(s))\,ds.
\end{align*}
By the fundamental theorem of calculus,
\begin{align*}
\int_0^t \frac{d}{ds}u(X(s))\,ds
=
u(X(t))-u(X(0)).
\end{align*}
Thus
\begin{align*}
u(X(t))-u(X(0))
=
\int_0^t f(X(s))\,ds.
\end{align*}
If the inflow boundary condition is $u=g$ on $\Gamma_-$, then $X(0)=x_-\in \Gamma_-$ implies
\begin{align*}
u(X(0))=u(x_-)=g(x_-).
\end{align*}
Substitution gives
\begin{align*}
u(X(t))-g(x_-)
=
\int_0^t f(X(s))\,ds,
\end{align*}
and hence
\begin{align*}
u(X(t))
=
g(x_-)+\int_0^t f(X(s))\,ds.
\end{align*}
If the same characteristic exits at $x_+=X(T)\in \Gamma_+$, then taking $t=T$ gives
\begin{align*}
u(x_+)
=
u(X(T))
=
g(x_-)+\int_0^{\,T} f(X(s))\,ds.
\end{align*}
Thus inflow data determine the transported solution along each characteristic, while an independently prescribed outflow value is compatible only if it equals the value already produced by the inflow datum and the source term.
[/example]
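The formula $u(X(t))=g(x_-)+\int_0^t f(X(s))\,ds$ can be evaluated numerically by integrating the characteristic and the source together. The following is a minimal sketch under stated assumptions: a classical fourth-order Runge-Kutta step, the unit disc as domain, the constant field $b=(1,0)$, the source $f(x)=1+x_1$, and an inflow value $g(x_-)=2$. All of these, and the name `solve_along_characteristic`, are hypothetical choices for illustration.

```python
def solve_along_characteristic(b, f, g_value, x_start, t_end, steps=200):
    """RK4 for X' = b(X) and I' = f(X) with X(0) = x_start and I(0) = 0.

    Returns (X(t_end), g_value + I(t_end)), that is, the point reached and
    the value g(x_-) + int_0^t f(X(s)) ds carried to it.
    """
    h = t_end / steps
    x = list(x_start)
    integral = 0.0

    def step_from(point, direction, scale):
        return [p + scale * d for p, d in zip(point, direction)]

    for _ in range(steps):
        k1 = b(x)
        x2 = step_from(x, k1, h / 2)
        k2 = b(x2)
        x3 = step_from(x, k2, h / 2)
        k3 = b(x3)
        x4 = step_from(x, k3, h)
        k4 = b(x4)
        # Same RK4 weights for the source integral as for the position.
        integral += h * (f(x) + 2 * f(x2) + 2 * f(x3) + f(x4)) / 6
        x = [xi + h * (a + 2 * c + 2 * d + e) / 6
             for xi, a, c, d, e in zip(x, k1, k2, k3, k4)]
    return x, g_value + integral


def b(x):
    return (1.0, 0.0)


def f(x):
    return 1.0 + x[0]


x_minus = (-0.8, 0.6)  # on the unit circle, where b . nu = -0.8 < 0 (inflow)
exit_point, exit_value = solve_along_characteristic(b, f, 2.0, x_minus, t_end=1.6)
print(exit_point, exit_value)
```

For this straight characteristic the integral is available by hand: $\int_0^{1.6}(0.2+s)\,ds=1.6$, so the value carried to the exit point $(0.8,0.6)$ is $3.6$. Any independently prescribed outflow value at that point would have to equal this number.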
## Green's Identities and Boundary Terms
Why do Dirichlet and Neumann conditions appear so persistently in elliptic theory? The answer is that integration by parts turns second derivatives in the interior into normal derivatives and boundary values on $\partial U$. The resulting identities are the bookkeeping device behind uniqueness, compatibility, and weak formulations. Throughout this chapter, $d\mathcal L^n$ denotes integration with respect to $n$-dimensional Lebesgue measure on $U$, and $d\mathcal H^{n-1}$ denotes integration with respect to $(n-1)$-dimensional [Hausdorff measure](/page/Hausdorff%20Measure) on sufficiently regular portions of $\partial U$.
[explanation: Green Identities]
Let $U\subset\mathbb R^n$ be bounded with $C^1$ boundary, and let $u,v\in C^2(\overline U)$. The first Green identity is
\begin{align*}
\int_U \nabla u\cdot\nabla v\,d\mathcal L^n
=
\int_{\partial U}u\,\frac{\partial v}{\partial\nu}\,d\mathcal H^{n-1}
-
\int_U u\,\Delta v\,d\mathcal L^n.
\end{align*}
The second Green identity is
\begin{align*}
\int_U (u\Delta v-v\Delta u)\,d\mathcal L^n
=
\int_{\partial U}\left(u\frac{\partial v}{\partial\nu}-v\frac{\partial u}{\partial\nu}\right)\,d\mathcal H^{n-1}.
\end{align*}
[/explanation]
The first identity is the [basic energy identity](/theorems/3672) for the Laplacian: taking the two functions equal converts the PDE into an integral of $|\nabla u|^2$, while taking one factor as a test function is the entry point to weak formulations. The regularity and boundary assumptions are not cosmetic: if $U$ has a cusp or a highly irregular boundary, an outward unit normal may fail to exist on a set large enough to make the displayed boundary term a classical surface integral. Similarly, if $u$ lacks two classical derivatives, the expression $\Delta u$ may not be a pointwise function, and the identity must be reinterpreted weakly rather than read as a classical formula. The identities do not assert existence or uniqueness for any boundary-value problem; they only record the integration-by-parts formulas that later uniqueness and compatibility arguments use. This motivates asking what remains when the first identity is written twice, once as stated and once with the roles of $u$ and $v$ exchanged, and the two results are subtracted: the symmetric term $\int_U\nabla u\cdot\nabla v\,d\mathcal L^n$ cancels, and what is left is the comparison identity used to relate Laplacians to boundary traces.
The second identity is the comparison form of Green's identities: it measures how two Laplacians differ after the boundary traces and normal derivatives have been accounted for. This is why the identity is useful in uniqueness arguments, representation formulas, and compatibility checks. Its assumptions matter for the same reason as in the first identity: on a nonsmooth domain, the normal derivative term may not be defined pointwise, and for functions outside $C^2(\overline U)$ the Laplacian may only exist in a weak or distributional sense. The identity is therefore a classical statement, not yet a weak formulation theorem and not an existence theorem for Poisson's equation. A useful test case is a harmonic function with zero Cauchy data on a boundary portion: the formula suggests why boundary traces control interior information, but it does not by itself prove stable recovery from partial boundary data. It is also the template for the representation formulas using fundamental solutions and Green kernels developed in Chapters 8 and 9.
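Both identities can be checked by direct computation in one space dimension, where $U=(0,1)$, $\partial U=\{0,1\}$, $\nu(0)=-1$, $\nu(1)=1$, and the boundary integral is a sum over the two endpoints. The sketch below assumes the sample functions $u=x^2$ and $v=x^3$ and uses a composite Simpson rule; the function names are hypothetical.

```python
def simpson(func, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


# Sample functions on U = (0, 1) with their first and second derivatives.
u, du, d2u = (lambda x: x ** 2), (lambda x: 2 * x), (lambda x: 2.0)
v, dv, d2v = (lambda x: x ** 3), (lambda x: 3 * x ** 2), (lambda x: 6 * x)


def boundary_term(p, dq):
    """Sum over the boundary {0, 1} of p * dq/dnu, with nu(0) = -1 and nu(1) = +1."""
    return p(1.0) * dq(1.0) * (+1) + p(0.0) * dq(0.0) * (-1)


# First identity: int u'v' = boundary(u dv/dnu) - int u v''.
first_lhs = simpson(lambda x: du(x) * dv(x), 0.0, 1.0)
first_rhs = boundary_term(u, dv) - simpson(lambda x: u(x) * d2v(x), 0.0, 1.0)

# Second identity: int (u v'' - v u'') = boundary(u dv/dnu - v du/dnu).
second_lhs = simpson(lambda x: u(x) * d2v(x) - v(x) * d2u(x), 0.0, 1.0)
second_rhs = boundary_term(u, dv) - boundary_term(v, du)

print(first_lhs, first_rhs)
print(second_lhs, second_rhs)
```

The integrands here are cubic polynomials, for which Simpson's rule is exact up to rounding, so the two sides of each identity should agree to machine precision. By hand, both sides of the first identity equal $3/2$ and both sides of the second equal $1$.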
[example: Energy Identity for Homogeneous Dirichlet Data]
Suppose $U$ is bounded and connected with $C^1$ boundary, and suppose $u\in C^2(\overline U)$ satisfies
\begin{align*}
\Delta u=0 \quad \text{in } U,
\end{align*}
with
\begin{align*}
u=0 \quad \text{on } \partial U.
\end{align*}
We show that these hypotheses force $u=0$ in $U$. Applying *Green's First Identity* with $v=u$ gives
\begin{align*}
\int_U u\Delta u\,d\mathcal L^n
=
\int_{\partial U} u\frac{\partial u}{\partial \nu}\,d\mathcal H^{n-1}
-
\int_U \nabla u\cdot \nabla u\,d\mathcal L^n.
\end{align*}
Because $\Delta u=0$ pointwise in $U$,
\begin{align*}
\int_U u\Delta u\,d\mathcal L^n
=
\int_U u\cdot 0\,d\mathcal L^n
=
\int_U 0\,d\mathcal L^n
=
0.
\end{align*}
Because $u=0$ pointwise on $\partial U$,
\begin{align*}
\int_{\partial U} u\frac{\partial u}{\partial \nu}\,d\mathcal H^{n-1}
=
\int_{\partial U} 0\cdot \frac{\partial u}{\partial \nu}\,d\mathcal H^{n-1}
=
\int_{\partial U} 0\,d\mathcal H^{n-1}
=
0.
\end{align*}
For each $x\in U$, the Euclidean dot product gives
\begin{align*}
\nabla u(x)\cdot \nabla u(x)
=
\sum_{i=1}^n \frac{\partial u}{\partial x_i}(x)\frac{\partial u}{\partial x_i}(x)
=
\sum_{i=1}^n \left(\frac{\partial u}{\partial x_i}(x)\right)^2
=
|\nabla u(x)|^2.
\end{align*}
Substituting these three identities into Green's identity yields
\begin{align*}
0
=
0-\int_U |\nabla u|^2\,d\mathcal L^n,
\end{align*}
and adding $\int_U |\nabla u|^2\,d\mathcal L^n$ to both sides gives
\begin{align*}
\int_U |\nabla u|^2\,d\mathcal L^n=0.
\end{align*}
Since $u\in C^2(\overline U)$, the function $|\nabla u|^2$ is continuous on $U$, and it is nonnegative because it is a sum of squares. If there were a point $x_0\in U$ with $|\nabla u(x_0)|^2>0$, set
\begin{align*}
a=\frac{1}{2}|\nabla u(x_0)|^2>0.
\end{align*}
By continuity, there is $r>0$ with $B(x_0,r)\subset U$ such that
\begin{align*}
|\nabla u(x)|^2\ge a
\quad \text{for every }x\in B(x_0,r).
\end{align*}
Then monotonicity of the integral for nonnegative functions gives
\begin{align*}
\int_U |\nabla u|^2\,d\mathcal L^n
\ge
\int_{B(x_0,r)} |\nabla u|^2\,d\mathcal L^n
\ge
\int_{B(x_0,r)} a\,d\mathcal L^n
=
a\,\mathcal L^n(B(x_0,r))
>
0,
\end{align*}
contradicting $\int_U |\nabla u|^2\,d\mathcal L^n=0$. Hence
\begin{align*}
|\nabla u(x)|^2=0
\quad \text{for every }x\in U,
\end{align*}
so
\begin{align*}
\nabla u(x)=0
\quad \text{for every }x\in U.
\end{align*}
Because $U$ is open and connected, any two points in $U$ can be joined by a piecewise $C^1$ path $\gamma:[0,1]\to U$. For such a path, the chain rule gives
\begin{align*}
\frac{d}{dt}u(\gamma(t))
=
\nabla u(\gamma(t))\cdot \gamma'(t)
=
0\cdot \gamma'(t)
=
0
\end{align*}
on each smooth segment. Integrating along the path from $0$ to $1$ gives
\begin{align*}
u(\gamma(1))-u(\gamma(0))
=
\int_0^1 \frac{d}{dt}u(\gamma(t))\,dt
=
\int_0^1 0\,dt
=
0.
\end{align*}
Thus $u$ is constant on $U$, say $u=C$.
Since $u\in C(\overline U)$ and $u=0$ on $\partial U$, choose any $z\in\partial U$ and any sequence $x_j\in U$ with $x_j\to z$. Then
\begin{align*}
C
=
\lim_{j\to\infty}u(x_j)
=
u(z)
=
0.
\end{align*}
Therefore $u=0$ in $U$. This is the energy mechanism behind uniqueness for homogeneous Dirichlet data.
[/example]
## Uniqueness and Compatibility for Poisson Problems
The central boundary-value model in this chapter is Poisson's equation. The guiding question is: which boundary conditions produce a unique solution, and which impose hidden integral constraints? The Dirichlet and Neumann cases behave differently even though the interior equation is the same.
[explanation: Dirichlet Uniqueness on a Bounded Domain]
Let $U\subset\mathbb R^n$ be bounded and connected, and let $u,v\in C^2(U)\cap C(\overline U)$ satisfy
\begin{align*}
\Delta u=\Delta v=f \quad \text{in } U,
\qquad
u=v=g \quad \text{on } \partial U.
\end{align*}
Then $w=u-v$ is harmonic in $U$ and has zero boundary values. The weak maximum principle applied to $w$ and to $-w$ gives $w\le0$ and $w\ge0$ on $U$, hence $w=0$ and $u=v$.
[/explanation]
Dirichlet data remove all freedom in this bounded-domain Poisson problem. The hypotheses explain the scope of the conclusion: continuity on $\overline U$ makes the boundary condition visible to the maximum principle, and boundedness ensures extrema are attained on $\overline U$. Connectedness is a convenience here rather than a necessity, since the weak maximum principle can be applied on each connected component separately; it becomes essential only for Neumann data, where different constants could otherwise be assigned on different components. On unbounded domains, extra growth conditions at infinity may be needed for uniqueness. The shared right-hand side is essential as well: two functions with the same boundary values but different forcing terms need not agree. Existence is a separate issue depending on the regularity of $U$, $f$, and $g$, but uniqueness follows from the maximum principle alone.
[example: Why Boundary Values Matter for Poisson's Equation]
Let $U=B(0,1)\subset \mathbb R^n$ and take $f=0$. We compare two functions,
\begin{align*}
u(x)=0
\qquad \text{and} \qquad
v(x)=x_1,
\end{align*}
and show that they satisfy the same interior Poisson equation but correspond to different Dirichlet data.
For $u(x)=0$, each coordinate derivative is zero:
\begin{align*}
\frac{\partial u}{\partial x_i}(x)=0
\quad \text{for } i=1,\ldots,n.
\end{align*}
Differentiating once more gives
\begin{align*}
\frac{\partial^2 u}{\partial x_i^2}(x)=0
\quad \text{for } i=1,\ldots,n.
\end{align*}
Using the definition of the Laplacian,
\begin{align*}
-\Delta u(x)
=
-\sum_{i=1}^n \frac{\partial^2 u}{\partial x_i^2}(x)
=
-\sum_{i=1}^n 0
=
-0
=
0.
\end{align*}
Thus $u$ solves $-\Delta u=0$ in $U$.
For $v(x)=x_1$, the first derivatives are
\begin{align*}
\frac{\partial v}{\partial x_1}(x)=1,
\qquad
\frac{\partial v}{\partial x_i}(x)=0
\quad \text{for } i=2,\ldots,n.
\end{align*}
Differentiating these first derivatives again gives
\begin{align*}
\frac{\partial^2 v}{\partial x_1^2}(x)=\frac{\partial}{\partial x_1}(1)=0,
\end{align*}
and, for $i=2,\ldots,n$,
\begin{align*}
\frac{\partial^2 v}{\partial x_i^2}(x)=\frac{\partial}{\partial x_i}(0)=0.
\end{align*}
Therefore
\begin{align*}
-\Delta v(x)
=
-\sum_{i=1}^n \frac{\partial^2 v}{\partial x_i^2}(x)
=
-\left(0+\cdots+0\right)
=
0.
\end{align*}
Thus both $u$ and $v$ solve the same interior equation $-\Delta w=0$ in $U$.
Their boundary traces differ. Since $u(x)=0$ for every $x\in \overline U$,
\begin{align*}
u|_{\partial U}=0.
\end{align*}
Since $v(x)=x_1$ for every $x\in \overline U$,
\begin{align*}
v|_{\partial U}=x_1.
\end{align*}
At the boundary point $e_1=(1,0,\ldots,0)\in \partial U$, these traces give
\begin{align*}
u(e_1)=0,
\qquad
v(e_1)=1.
\end{align*}
Hence $u|_{\partial U}\neq v|_{\partial U}$. If the Dirichlet condition is $w=0$ on $\partial U$, then $u$ satisfies it, while $v$ does not because $v(e_1)=1\neq 0$. If the Dirichlet condition is $w=x_1$ on $\partial U$, then $v$ satisfies it, while $u$ does not because $u(e_1)=0\neq 1=x_1(e_1)$. By *Dirichlet Uniqueness on a Bounded Domain*, each fixed Dirichlet boundary value selects at most one solution. The interior equation $-\Delta w=0$ alone therefore does not determine the solution; the boundary data decide which harmonic function is being modeled.
[/example]
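A one-dimensional discrete analogue makes the same point computationally. The sketch below assumes the interval $(0,1)$, centred second differences for $-u''=f$, and a tridiagonal elimination (Thomas algorithm); the name `solve_dirichlet_1d` is hypothetical. With zero forcing, the boundary pair $(0,0)$ returns the zero function and the pair $(0,1)$ returns the samples of $x$, the one-dimensional counterpart of $v(x)=x_1$.

```python
def solve_dirichlet_1d(f_values, left, right):
    """Solve -u'' = f on (0, 1) with u(0) = left, u(1) = right by centred differences.

    f_values holds the forcing at the m interior nodes x_j = (j + 1) / (m + 1).
    The linear system -u_{j-1} + 2 u_j - u_{j+1} = h^2 f_j is solved by the
    Thomas algorithm. Returns the m interior values.
    """
    m = len(f_values)
    h = 1.0 / (m + 1)
    rhs = [h * h * fj for fj in f_values]
    rhs[0] += left      # known boundary value moved to the right-hand side
    rhs[-1] += right
    upper = [0.0] * m   # modified upper diagonal
    reduced = [0.0] * m  # modified right-hand side
    upper[0] = -0.5
    reduced[0] = rhs[0] / 2.0
    for j in range(1, m):
        denom = 2.0 + upper[j - 1]
        upper[j] = -1.0 / denom
        reduced[j] = (rhs[j] + reduced[j - 1]) / denom
    u = [0.0] * m
    u[-1] = reduced[-1]
    for j in range(m - 2, -1, -1):
        u[j] = reduced[j] - upper[j] * u[j + 1]
    return u


m = 9
zero_forcing = [0.0] * m
w_zero = solve_dirichlet_1d(zero_forcing, left=0.0, right=0.0)
w_linear = solve_dirichlet_1d(zero_forcing, left=0.0, right=1.0)
print(w_zero)
print(w_linear)
```

The interior equation is identical in both calls; only the boundary pair changes, and it alone decides which discrete harmonic function is returned.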
Neumann data behave differently because constants have zero normal derivative. This already shows that uniqueness cannot hold without a normalisation. Before that issue is repaired, there is an even more basic question: can arbitrary interior forcing be matched by an arbitrary prescribed boundary flux?
[explanation: Compatibility Condition for the Neumann Poisson Problem]
Let $U\subset\mathbb R^n$ be bounded with smooth boundary. If a classical solution satisfies $\Delta u=f$ in $U$ and $\partial_\nu u=h$ on $\partial U$, then the divergence theorem gives
\begin{align*}
\int_U f\,d\mathcal L^n=\int_{\partial U}h\,d\mathcal H^{n-1}.
\end{align*}
With the sign convention $-\Delta u=f$, the same balance is written as $\int_U f\,d\mathcal L^n+\int_{\partial U}\partial_\nu u\,d\mathcal H^{n-1}=0$.
[/explanation]
This condition is the elliptic analogue of a global conservation law: the total source in the domain must be balanced by the outward flux through the boundary. The boundary regularity assumption is used so that the outward normal and surface integral in the divergence theorem are classical objects; on rough domains the same statement belongs to trace and weak normal-flux theory. The identity is a necessary condition, not an existence theorem: satisfying the integral balance does not by itself construct a solution without additional regularity hypotheses on $U$, $f$, and $h$. Connectedness affects uniqueness rather than the balance law, since constants on each connected component have zero normal derivative. When the condition holds, a Neumann solution is still determined only up to an additive constant on each connected component.
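The balance $\int_U \Delta u\,d\mathcal L^n=\int_{\partial U}\partial_\nu u\,d\mathcal H^{n-1}$ can be checked numerically on the unit disc. The sketch below assumes the test function $u(x,y)=x^3+y^2$, for which $\Delta u=6x+2$ and, on the unit circle where $\nu=(x,y)$, $\partial_\nu u=3x^3+2y^2$. The quadrature helpers are hypothetical midpoint rules in polar coordinates.

```python
import math


def disc_integral(f, n_r=200, n_theta=400):
    """Midpoint rule in polar coordinates for the integral of f over the unit disc."""
    dr, dth = 1.0 / n_r, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for k in range(n_theta):
            th = (k + 0.5) * dth
            total += f(r * math.cos(th), r * math.sin(th)) * r * dr * dth
    return total


def circle_integral(h, n_theta=400):
    """Midpoint rule for the arc-length integral of h over the unit circle."""
    dth = 2 * math.pi / n_theta
    return sum(h(math.cos((k + 0.5) * dth), math.sin((k + 0.5) * dth))
               for k in range(n_theta)) * dth


def laplacian_u(x, y):
    return 6 * x + 2


def normal_derivative_u(x, y):
    # grad u = (3 x^2, 2 y) dotted with nu = (x, y) on the unit circle.
    return 3 * x ** 3 + 2 * y ** 2


source_total = disc_integral(laplacian_u)
flux_total = circle_integral(normal_derivative_u)
print(source_total, flux_total)
```

Both numbers should be close to $2\pi$. Prescribing a boundary flux $h$ whose integral differs from $\int_U f\,d\mathcal L^n$ would therefore rule out any classical solution of $\Delta u=f$ with $\partial_\nu u=h$.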
[example: Neumann Problem with Zero-Mean Forcing]
Let $U \subset \mathbb R^n$ be bounded with smooth boundary, and consider
\begin{align*}
-\Delta u=f \quad \text{in } U,
\qquad
\frac{\partial u}{\partial \nu}=0 \quad \text{on } \partial U.
\end{align*}
By *Compatibility Condition for the Neumann Poisson Problem*, any classical solution must satisfy
\begin{align*}
\int_U f\,d\mathcal L^n+\int_{\partial U}\frac{\partial u}{\partial \nu}\,d\mathcal H^{n-1}=0.
\end{align*}
Using the boundary condition $\frac{\partial u}{\partial \nu}=0$ on $\partial U$, this becomes
\begin{align*}
\int_U f\,d\mathcal L^n+\int_{\partial U}0\,d\mathcal H^{n-1}=0.
\end{align*}
Since the integral of the zero function is zero,
\begin{align*}
\int_{\partial U}0\,d\mathcal H^{n-1}=0,
\end{align*}
so the compatibility condition reduces to
\begin{align*}
\int_U f\,d\mathcal L^n+0=0,
\end{align*}
and hence
\begin{align*}
\int_U f\,d\mathcal L^n=0.
\end{align*}
Thus zero boundary flux requires the forcing to have zero total integral over $U$, equivalently zero mean. A strictly positive continuous forcing cannot satisfy this condition. Indeed, if $f(x)>0$ for every $x\in U$, choose $x_0\in U$ and set
\begin{align*}
a=\frac{1}{2}f(x_0)>0.
\end{align*}
Because $U$ is open, there is $r_1>0$ such that $B(x_0,r_1)\subset U$. Because $f$ is continuous at $x_0$, there is $r_2>0$ such that
\begin{align*}
|f(x)-f(x_0)|<\frac{1}{2}f(x_0)
\quad \text{whenever }x\in B(x_0,r_2).
\end{align*}
Let $r=\min\{r_1,r_2\}$. Then $B(x_0,r)\subset U$, and for every $x\in B(x_0,r)$,
\begin{align*}
f(x)
>
f(x_0)-\frac{1}{2}f(x_0)
=
\frac{1}{2}f(x_0)
=
a.
\end{align*}
Therefore, by monotonicity of the integral for nonnegative functions,
\begin{align*}
\int_U f\,d\mathcal L^n
\ge
\int_{B(x_0,r)} f\,d\mathcal L^n
\ge
\int_{B(x_0,r)} a\,d\mathcal L^n
=
a\,\mathcal L^n(B(x_0,r))
>
0.
\end{align*}
This contradicts the necessary identity
\begin{align*}
\int_U f\,d\mathcal L^n=0.
\end{align*}
Finally, homogeneous Neumann data cannot determine the additive constant. If $u$ is one solution and $C\in\mathbb R$ is constant, then
\begin{align*}
\frac{\partial C}{\partial x_i}=0
\quad \text{for } i=1,\ldots,n,
\end{align*}
so
\begin{align*}
\frac{\partial^2 C}{\partial x_i^2}=0
\quad \text{for } i=1,\ldots,n.
\end{align*}
Hence
\begin{align*}
\Delta C
=
\sum_{i=1}^n \frac{\partial^2 C}{\partial x_i^2}
=
\sum_{i=1}^n 0
=
0.
\end{align*}
By linearity of second partial derivatives,
\begin{align*}
\Delta(u+C)
=
\sum_{i=1}^n \frac{\partial^2(u+C)}{\partial x_i^2}
=
\sum_{i=1}^n\left(\frac{\partial^2u}{\partial x_i^2}+\frac{\partial^2C}{\partial x_i^2}\right)
=
\Delta u+\Delta C
=
\Delta u.
\end{align*}
Therefore
\begin{align*}
-\Delta(u+C)
=
-\Delta u
=
f.
\end{align*}
On the boundary,
\begin{align*}
\frac{\partial (u+C)}{\partial \nu}
=
\nabla(u+C)\cdot \nu
=
(\nabla u+\nabla C)\cdot \nu.
\end{align*}
Since $C$ is constant,
\begin{align*}
\nabla C=(0,\ldots,0),
\end{align*}
and so
\begin{align*}
(\nabla u+\nabla C)\cdot \nu
=
(\nabla u+0)\cdot \nu
=
\nabla u\cdot \nu
=
\frac{\partial u}{\partial \nu}
=
0.
\end{align*}
Thus $u+C$ satisfies the same PDE and the same homogeneous Neumann condition as $u$. The flux condition fixes the total balance of the source, but it leaves the absolute additive level of the solution undetermined.
[/example]
The zero-mean example shows both sides of the Neumann story: a global balance condition is needed for existence, and constants remain invisible to the normal derivative. To speak about uniqueness without changing the physical flux condition, we add a separate scalar normalisation.
[remark: Normalisation for Neumann Uniqueness]
On a connected domain, uniqueness for the Neumann problem is usually restored by imposing a normalisation such as
\begin{align*}
\int_U u\,d\mathcal L^n = 0.
\end{align*}
This does not change the PDE or the boundary flux; it chooses one representative from the family $u+C$.
[/remark]
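On a grid, the normalisation amounts to subtracting a weighted mean. The sketch below is a minimal illustration with hypothetical names: it assumes $U=(0,1)$, midpoint quadrature weights, and the family $u(x)=\cos(\pi x)+C$, each member of which solves $-u''=\pi^2\cos(\pi x)$ with $u'(0)=u'(1)=0$.

```python
import math


def mean_zero_representative(values, weights):
    """Subtract the weighted mean so that sum(w_i * u_i) = 0, a discrete form of int_U u = 0."""
    total_weight = sum(weights)
    mean = sum(w * u for w, u in zip(weights, values)) / total_weight
    return [u - mean for u in values]


n = 1000
dx = 1.0 / n
xs = [(j + 0.5) * dx for j in range(n)]
weights = [dx] * n
base = [math.cos(math.pi * x) for x in xs]

# Two members of the family u + C with different additive constants.
rep_a = mean_zero_representative([u + 3.0 for u in base], weights)
rep_b = mean_zero_representative([u - 5.0 for u in base], weights)
print(max(abs(a - b) for a, b in zip(rep_a, rep_b)))
```

Both inputs satisfy the same equation and the same flux condition, and after normalisation they coincide up to rounding, so the printed difference should be negligible.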
## Domains, Normals, and Flux
Boundary conditions only make classical sense when the boundary has enough geometry to support traces and normal vectors. The next question is therefore what assumptions on the domain are being used when we write $\partial u/\partial \nu$ or integrate over $\partial U$. In these introductory notes we keep the boundary regular enough for the classical divergence theorem.
[definition: Classical Domain with Normal]
A domain $U \subset \mathbb R^n$ has a classical outward normal if $U$ is open, connected, bounded, and $\partial U$ is a $C^1$ hypersurface with an outward unit normal field $\nu: \partial U \to \mathbb R^n$.
[/definition]
This assumption is stronger than needed for modern weak theory, but it is the right setting for the formulas in this chapter. It lets us treat $\nu$ as a geometric object on the boundary and interpret $\nabla u\cdot \nu$ pointwise. Once a normal is available, the next object to measure is the signed amount of a vector field crossing the boundary.
[definition: Boundary Flux]
Let $U \subset \mathbb R^n$ have a classical outward normal. The boundary flux functional is the map
\begin{align*}
\Phi_{\partial U}: C(\overline U;\mathbb R^n) \to \mathbb R
\end{align*}
defined by
\begin{align*}
\Phi_{\partial U}(F) := \int_{\partial U} F\cdot \nu\,d\mathcal H^{n-1}.
\end{align*}
[/definition]
For diffusion or heat conduction, the relevant vector field is often proportional to $-\nabla u$, so Neumann data prescribe the amount crossing the boundary. For transport, the sign of $b\cdot \nu$ decides whether information enters or exits.
[example: Flux Sign for Constant Transport]
Let $U=(0,1)\subset \mathbb R$ and consider
\begin{align*}
b u'(x)=f(x)\quad \text{for }0<x<1,
\end{align*}
where $b>0$ is constant. The boundary of $U$ consists of the two endpoints,
\begin{align*}
\partial U=\{0,1\}.
\end{align*}
At $0$, the outward direction points toward decreasing $x$, so
\begin{align*}
\nu(0)=-1.
\end{align*}
At $1$, the outward direction points toward increasing $x$, so
\begin{align*}
\nu(1)=1.
\end{align*}
The signed transport flux at the two boundary points is therefore
\begin{align*}
b\nu(0)
=
b(-1)
=
-b
<
0,
\end{align*}
because $b>0$, and
\begin{align*}
b\nu(1)
=
b(1)
=
b
>
0.
\end{align*}
Thus the constant transport direction enters $U$ at $0$ and exits $U$ at $1$, so the inflow boundary is $\{0\}$.
Now prescribe the inflow value
\begin{align*}
u(0)=a.
\end{align*}
Since $b>0$, in particular $b\neq 0$, so the equation $bu'(x)=f(x)$ may be divided by $b$:
\begin{align*}
\frac{1}{b}bu'(x)
=
\frac{1}{b}f(x).
\end{align*}
Because $\frac{1}{b}b=1$, this gives
\begin{align*}
u'(x)=\frac{1}{b}f(x)
\end{align*}
for $0<x<1$. Integrating both sides from $0$ to $x$ gives
\begin{align*}
\int_0^x u'(s)\,d\mathcal L^1(s)
=
\int_0^x \frac{1}{b}f(s)\,d\mathcal L^1(s).
\end{align*}
Since $1/b$ is constant, linearity of the integral gives
\begin{align*}
\int_0^x \frac{1}{b}f(s)\,d\mathcal L^1(s)
=
\frac{1}{b}\int_0^x f(s)\,d\mathcal L^1(s).
\end{align*}
By the fundamental theorem of calculus,
\begin{align*}
\int_0^x u'(s)\,d\mathcal L^1(s)
=
u(x)-u(0).
\end{align*}
Substituting these two identities into the integrated equation yields
\begin{align*}
u(x)-u(0)
=
\frac{1}{b}\int_0^x f(s)\,d\mathcal L^1(s).
\end{align*}
Using $u(0)=a$, we obtain
\begin{align*}
u(x)-a
=
\frac{1}{b}\int_0^x f(s)\,d\mathcal L^1(s),
\end{align*}
and adding $a$ to both sides gives
\begin{align*}
u(x)
=
a+\frac{1}{b}\int_0^x f(s)\,d\mathcal L^1(s).
\end{align*}
Taking $x=1$ determines the outflow value:
\begin{align*}
u(1)
=
a+\frac{1}{b}\int_0^1 f(s)\,d\mathcal L^1(s).
\end{align*}
Thus the inflow datum starts the characteristic integration and fixes the value at the outflow endpoint; prescribing $u(1)$ independently is compatible only when it equals the value produced by $a$ and the source term $f$.
[/example]
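The inflow formula can be tried numerically. The following is a minimal sketch, assuming the illustrative data $f(s)=\cos s$, $a=2$, $b=4$ (none of which come from the notes) and a midpoint rule for the integral; the function name `solve_inflow` is hypothetical.

```python
import math

def solve_inflow(f, a, b, x, n=2000):
    """Approximate u(x) = a + (1/b) * integral_0^x f(s) ds by the midpoint rule.

    Assumes b > 0, so that x = 0 is the inflow endpoint of (0, 1).
    """
    if b <= 0:
        raise ValueError("this sketch assumes b > 0 (inflow at x = 0)")
    h = x / n
    integral = h * sum(f((i + 0.5) * h) for i in range(n))
    return a + integral / b

# Illustrative data (assumptions for this sketch): f(s) = cos(s), a = 2, b = 4.
a, b = 2.0, 4.0
outflow_value = solve_inflow(math.cos, a, b, 1.0)   # value forced at x = 1
exact_outflow = a + math.sin(1.0) / b               # closed form for comparison
```

The computed `outflow_value` is the only value of $u(1)$ compatible with the inflow datum, which is why a second, independent condition at $x=1$ would overdetermine the problem.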
The same flux language also explains the [Neumann compatibility condition](/theorems/677). If $-\Delta u=f$, then the vector field $\nabla u$ has divergence $\Delta u=-f$, so its total outward flux must equal the total divergence in the domain.
## Ill-Posed Boundary Choices
The final issue is that more data do not necessarily make a PDE better determined. A boundary choice is well matched to an equation only when it respects the equation's propagation, smoothing, or maximum-principle structure. Cauchy data for Laplace's equation provide the standard warning.
[definition: Hadamard Well-Posed Boundary Problem]
Let $X$ be a normed data space, let $Y$ be a normed solution space, and let $\mathcal A \subset X$ be the set of admissible data for a boundary or initial value problem. The problem is Hadamard well-posed from $\mathcal A \subset X$ to $Y$ if for every $g \in \mathcal A$ there exists a unique solution $u \in Y$, and the solution map
\begin{align*}
S: \mathcal A \to Y, \qquad S(g)=u,
\end{align*}
is continuous with respect to the norm inherited from $X$ on $\mathcal A$ and the norm on $Y$.
[/definition]
This definition separates three possible failures. The Dirichlet Poisson theorem above addresses uniqueness; the Neumann compatibility theorem identifies a failure of existence for incompatible data; the next example shows failure of continuous dependence.
[example: Cauchy Data for Laplace Equation]
Consider the upper half-plane $U=\mathbb R\times(0,\infty)$ and, for $k\in\mathbb N$, define
\begin{align*}
u_k(x,y)=\frac{1}{k}e^{-ky}\sin(kx).
\end{align*}
We first verify that $u_k$ is harmonic. Differentiating in $x$ gives
\begin{align*}
\frac{\partial u_k}{\partial x}(x,y)=\frac{1}{k}e^{-ky}k\cos(kx)=e^{-ky}\cos(kx).
\end{align*}
Differentiating once more in $x$ gives
\begin{align*}
\frac{\partial^2 u_k}{\partial x^2}(x,y)=e^{-ky}(-k\sin(kx))=-k e^{-ky}\sin(kx).
\end{align*}
Differentiating in $y$ gives
\begin{align*}
\frac{\partial u_k}{\partial y}(x,y)=\frac{1}{k}\sin(kx)(-k e^{-ky})=-e^{-ky}\sin(kx).
\end{align*}
Differentiating once more in $y$ gives
\begin{align*}
\frac{\partial^2 u_k}{\partial y^2}(x,y)=-\sin(kx)(-k e^{-ky})=k e^{-ky}\sin(kx).
\end{align*}
Therefore
\begin{align*}
\Delta u_k(x,y)=\frac{\partial^2 u_k}{\partial x^2}(x,y)+\frac{\partial^2 u_k}{\partial y^2}(x,y).
\end{align*}
Substituting the two second derivatives gives
\begin{align*}
\Delta u_k(x,y)=-k e^{-ky}\sin(kx)+k e^{-ky}\sin(kx)=0.
\end{align*}
Thus $u_k$ is harmonic in $U$.
On the boundary line $y=0$,
\begin{align*}
u_k(x,0)=\frac{1}{k}e^0\sin(kx)=\frac{1}{k}\sin(kx).
\end{align*}
For every compact interval $I\subset\mathbb R$,
\begin{align*}
\sup_{x\in I}|u_k(x,0)|=\sup_{x\in I}\left|\frac{1}{k}\sin(kx)\right|=\frac{1}{k}\sup_{x\in I}|\sin(kx)|.
\end{align*}
Since $|\sin(kx)|\le 1$ for every $x$, this implies
\begin{align*}
\sup_{x\in I}|u_k(x,0)|\le \frac{1}{k}.
\end{align*}
Because $1/k\to 0$, the boundary values $u_k(\cdot,0)$ tend to $0$ uniformly on every compact interval $I$.
The vertical derivative on $y=0$ is
\begin{align*}
\frac{\partial u_k}{\partial y}(x,0)=-e^0\sin(kx)=-\sin(kx).
\end{align*}
On the fixed compact interval $[0,2\pi]$, set
\begin{align*}
x_k=\frac{\pi}{2k}.
\end{align*}
Since $k\ge 1$,
\begin{align*}
0<\frac{\pi}{2k}\le \frac{\pi}{2}<2\pi,
\end{align*}
so $x_k\in[0,2\pi]$. At this point,
\begin{align*}
\left|\frac{\partial u_k}{\partial y}(x_k,0)\right|=|-\sin(kx_k)|.
\end{align*}
Using $kx_k=k\frac{\pi}{2k}=\frac{\pi}{2}$ gives
\begin{align*}
|-\sin(kx_k)|=|-\sin(\pi/2)|=1.
\end{align*}
Therefore
\begin{align*}
\sup_{x\in[0,2\pi]}\left|\frac{\partial u_k}{\partial y}(x,0)\right|\ge \left|\frac{\partial u_k}{\partial y}(x_k,0)\right|=1.
\end{align*}
Thus the derivative data do not tend to $0$ in the corresponding uniform norm.
The growing modes show the sharper instability. Define
\begin{align*}
v_k(x,y)=\frac{1}{k}e^{ky}\sin(kx).
\end{align*}
Differentiating in $x$ gives
\begin{align*}
\frac{\partial v_k}{\partial x}(x,y)=\frac{1}{k}e^{ky}k\cos(kx)=e^{ky}\cos(kx).
\end{align*}
Differentiating once more in $x$ gives
\begin{align*}
\frac{\partial^2 v_k}{\partial x^2}(x,y)=e^{ky}(-k\sin(kx))=-k e^{ky}\sin(kx).
\end{align*}
Differentiating in $y$ gives
\begin{align*}
\frac{\partial v_k}{\partial y}(x,y)=\frac{1}{k}\sin(kx)k e^{ky}=e^{ky}\sin(kx).
\end{align*}
Differentiating once more in $y$ gives
\begin{align*}
\frac{\partial^2 v_k}{\partial y^2}(x,y)=k e^{ky}\sin(kx).
\end{align*}
Hence
\begin{align*}
\Delta v_k(x,y)=\frac{\partial^2 v_k}{\partial x^2}(x,y)+\frac{\partial^2 v_k}{\partial y^2}(x,y).
\end{align*}
Substituting the two second derivatives gives
\begin{align*}
\Delta v_k(x,y)=-k e^{ky}\sin(kx)+k e^{ky}\sin(kx)=0,
\end{align*}
so $v_k$ is harmonic in $U$.
Its boundary amplitude is still small. Since $e^{k\cdot 0}=1$,
\begin{align*}
v_k(x,0)=\frac{1}{k}\sin(kx).
\end{align*}
For every compact interval $I\subset\mathbb R$,
\begin{align*}
\sup_{x\in I}|v_k(x,0)|=\sup_{x\in I}\left|\frac{1}{k}\sin(kx)\right|.
\end{align*}
As before, $|\sin(kx)|\le 1$ gives
\begin{align*}
\sup_{x\in I}|v_k(x,0)|\le \frac{1}{k}\to 0.
\end{align*}
But at any fixed height $y_0>0$,
\begin{align*}
v_k\left(\frac{\pi}{2k},y_0\right)=\frac{1}{k}e^{ky_0}\sin\left(k\frac{\pi}{2k}\right).
\end{align*}
Since $k\frac{\pi}{2k}=\frac{\pi}{2}$ and $\sin(\pi/2)=1$, this becomes
\begin{align*}
v_k\left(\frac{\pi}{2k},y_0\right)=\frac{e^{ky_0}}{k}.
\end{align*}
To see that this tends to infinity, let $M>0$. Since exponential growth dominates linear growth, there is $K_1$ such that
\begin{align*}
\frac{e^{ky_0/2}}{k}\ge 1
\end{align*}
whenever $k\ge K_1$. Since $e^{ky_0/2}\to\infty$, there is $K_2$ such that
\begin{align*}
e^{ky_0/2}\ge M
\end{align*}
whenever $k\ge K_2$. Let $K=\max\{K_1,K_2\}$. For $k\ge K$,
\begin{align*}
\frac{e^{ky_0}}{k}=\left(\frac{e^{ky_0/2}}{k}\right)e^{ky_0/2}\ge 1\cdot M=M.
\end{align*}
Thus
\begin{align*}
\frac{e^{ky_0}}{k}\to\infty.
\end{align*}
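Arbitrarily small high-frequency boundary amplitudes can therefore produce unbounded interior values at positive height. The normal derivative $\partial_y v_k(x,0)=\sin(kx)$ is not small, but the rescaled functions $k^{-2}e^{ky}\sin(kx)$ have both Cauchy data tending to $0$ uniformly, while their values $e^{ky_0}/k^2$ at the points $(\pi/(2k),y_0)$ still tend to infinity. This is the failure of continuous dependence behind the elliptic Cauchy problem.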
[/example]
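The growth in the example is easy to tabulate. The sketch below evaluates the bound $1/k$ on the boundary amplitude of $v_k$ and the interior value $v_k(\pi/(2k),y_0)=e^{ky_0}/k$ for a few frequencies; the height $y_0=0.5$ and the sampled values of $k$ are arbitrary choices for illustration.

```python
import math

def boundary_sup(k):
    """Upper bound sup_x |v_k(x, 0)| <= 1/k for v_k = (1/k) e^{ky} sin(kx)."""
    return 1.0 / k

def interior_value(k, y0):
    """Value v_k(pi/(2k), y0) = e^{k y0} / k at height y0 > 0."""
    x_k = math.pi / (2 * k)
    return math.exp(k * y0) * math.sin(k * x_k) / k

y0 = 0.5  # fixed height chosen for illustration
rows = [(k, boundary_sup(k), interior_value(k, y0)) for k in (1, 10, 20, 40)]
for k, small, large in rows:
    print(f"k={k:3d}  boundary <= {small:.3e}  interior = {large:.3e}")
```

The boundary column shrinks like $1/k$ while the interior column grows exponentially, so no estimate of the interior values by the boundary amplitude can hold uniformly in $k$.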
The example demonstrates that the Laplace equation does not propagate data away from a hypersurface like a hyperbolic equation. Elliptic equations smooth and constrain interior behaviour through the entire boundary, so prescribing both $u$ and $\partial u/\partial \nu$ on a boundary piece is not a stable general-purpose formulation.
[remark: Boundary Choices by Equation Type]
For elliptic equations such as Poisson's equation, Dirichlet, Neumann, Robin, or mixed boundary conditions are the standard classical choices. For transport equations, data are imposed on initial or inflow hypersurfaces transverse to the characteristic direction. For hyperbolic equations, Cauchy data on non-characteristic hypersurfaces are natural, while for parabolic equations one combines initial data with spatial boundary conditions.
[/remark]
The lesson of the chapter is that boundary data are part of the model, not an accessory appended after the PDE is written. Green's identities expose the boundary terms produced by elliptic operators, the maximum principle gives Dirichlet uniqueness, the divergence theorem forces Neumann compatibility, and characteristic flow dictates inflow data for transport equations.
With the model problems and their data requirements now in place, the course can step back and compare the methods as a whole. Chapter 12 synthesises the classical theory, showing how characteristics, kernels, weak formulations, and estimates fit together and where modern PDE theory begins to extend beyond them.
# 12. Synthesis: From Classical Formulas to Modern PDE Theory
This final chapter gathers the main lessons of the course into a single framework. Earlier chapters solved model PDEs by characteristics, kernels, representation formulas, and maximum principles; here we compare what those tools prove, where they break down, and which ideas survive in modern weak theory. The prerequisites are the classical first-order method from Chapters 2-4, the heat, wave, and Laplace model equations from Chapters 8-10, and the distributional language introduced in Chapters 5 and 8 when discontinuities and point sources first appeared. The aim is not to start a new theory from scratch, but to show how the course's classical tools lead naturally to weak formulations, entropy selection, viscosity limits, and energy estimates.
## What Characteristics Solve and Where They Fail
The first question is how much of first-order PDE theory is genuinely solved by following curves. For linear transport and quasilinear scalar equations, the characteristic method turns a PDE into an ODE system, so it explains both existence and the first obstruction to existence.
[definition: Characteristic Flow]
Let $b: [0,T] \times U \to \mathbb R^n$ be a continuous vector field, with $U \subset \mathbb R^n$ open. A characteristic flow for $b$ is a map $X: [0,T] \times U \to U$ such that $t\mapsto X(t,x_0)$ is differentiable for each $x_0 \in U$ and
\begin{align*}
\frac{d}{dt}X(t,x_0) = b(t,X(t,x_0))
\end{align*}
for all $t\in[0,T]$, with
\begin{align*}
X(0,x_0) = x_0.
\end{align*}
[/definition]
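A minimal numerical sketch of this definition in one space dimension follows. It integrates $X'=b(t,X)$ with the classical fourth-order Runge-Kutta scheme and stops when the curve leaves an interval $U=(\text{lower},\text{upper})$; the function name, the step count, and the test fields are assumptions for illustration. The case $U=(0,1)$, $b\equiv1$, $x_0=3/4$ can be verified by hand, since $x_0+t=1$ at $t=1/4$.

```python
def flow_until_exit(b, x0, t_end, lower=0.0, upper=1.0, steps=1000):
    """Integrate X' = b(t, X), X(0) = x0, with classical RK4 in one dimension.

    Returns (t, X(t)) at the final time, or at the first step where the
    characteristic is no longer inside the open interval (lower, upper).
    """
    dt = t_end / steps
    t, x = 0.0, x0
    for _ in range(steps):
        k1 = b(t, x)
        k2 = b(t + dt / 2, x + dt * k1 / 2)
        k3 = b(t + dt / 2, x + dt * k2 / 2)
        k4 = b(t + dt, x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
        if not (lower < x < upper):
            return t, x  # characteristic reached or crossed the boundary
    return t, x

# Constant field b = 1 starting at x0 = 3/4: exit from (0, 1) near t = 1/4.
exit_time, exit_point = flow_until_exit(lambda t, x: 1.0, 0.75, t_end=1.0)
```

The exit time is only resolved to within one time step, so the sketch detects that the flow fails to map $U$ into $U$ on $[0,T]$ rather than computing the exit time exactly.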
The definition packages the geometric part of the method: it tells us which curves carry the information. This motivates the transport formula, because the next step is to convert the PDE into an ordinary differential equation along each such curve and recover the solution from initial data.
[quotetheorem:6166]
[citeproof:6166]
This theorem explains why the early part of the course had such explicit solutions: once a sufficiently regular curve is known, the PDE is solved along that curve by a one-dimensional integrating factor. The hypotheses are not cosmetic. Smoothness justifies the chain rule and the uniqueness of characteristics: if $b$ is merely continuous rather than Lipschitz, ODE characteristics may fail to be unique, so there may be no single flow $X(t,x_0)$ to insert into the formula. The requirement that the curve remain inside $U$ keeps the PDE available along the whole path: for $U=(0,1)$ and $b\equiv1$, the characteristic starting at $x_0=3/4$ leaves $U$ at time $1/4$. To reconstruct a single-valued solution on spacetime, more is needed: the characteristic labels must determine points uniquely. Burgers characteristics give the basic failure mode, since different labels can reach the same spacetime point after compression. Thus the formula is a representation conditional on a good flow; it does not by itself prove global solvability. The same calculation also reveals a geometric hypothesis that was hidden in the formula. To name the obstruction precisely and state the local invertibility condition for Cauchy data, we need the following definition.
[definition: Transverse Initial Hypersurface]
Let $U\subset\mathbb R^n$ be open, let $b:U\to\mathbb R^n$ be a continuous vector field, and let
\begin{align*}
L:C^1(U)\to C(U), \qquad L u(x)=b(x)\cdot\nabla u(x)
\end{align*}
be the associated first-order transport operator. Let $S \subset U$ be a smooth hypersurface with normal vector $\nu(x)$ at $x \in S$. The hypersurface $S$ is transverse to $L$ at $x$ if
\begin{align*}
b(x)\cdot \nu(x) \ne 0.
\end{align*}
[/definition]
Transversality is the local invertibility condition behind the characteristic parametrisation. If the initial surface is tangent to the vector field, the initial data may propagate along the surface instead of away from it, which motivates testing the definition on examples where uniqueness either holds or fails.
[example: Nontransverse Data for Transport]
Consider $u_x=0$ in $\mathbb R^2$. This is the transport equation
\begin{align*}
(1,0)\cdot \nabla u=0,
\end{align*}
so the transport vector field is $b=(1,0)$. A characteristic $X(s)=(X_1(s),X_2(s))$ satisfies
\begin{align*}
X'(s)=b,
\end{align*}
which means componentwise
\begin{align*}
X_1'(s)=1,\qquad X_2'(s)=0.
\end{align*}
With $X(0)=(0,y_0)$, integrating the two scalar equations gives
\begin{align*}
X_1(s)-X_1(0)=\int_0^s 1\,dr=s,\qquad
X_2(s)-X_2(0)=\int_0^s 0\,dr=0.
\end{align*}
Since $X_1(0)=0$ and $X_2(0)=y_0$, this becomes
\begin{align*}
X_1(s)=s,\qquad X_2(s)=y_0.
\end{align*}
Thus the characteristics are the horizontal lines $y=y_0$.
Now prescribe data on the vertical line $x=0$ by
\begin{align*}
u(0,y)=g(y).
\end{align*}
The vertical line has normal vector $\nu=(1,0)$, and
\begin{align*}
b\cdot \nu=(1,0)\cdot(1,0)=1\ne0,
\end{align*}
so this data line is transverse to the transport direction. For each fixed $y$, the equation gives
\begin{align*}
\frac{d}{dx}u(x,y)=u_x(x,y)=0.
\end{align*}
Integrating from $0$ to $x$ gives
\begin{align*}
u(x,y)-u(0,y)=\int_0^x u_x(r,y)\,dr=\int_0^x 0\,dr=0.
\end{align*}
Therefore
\begin{align*}
u(x,y)=u(0,y)=g(y).
\end{align*}
The vertical data determine exactly one value on each horizontal characteristic.
If instead data are prescribed on the horizontal line $y=0$ by
\begin{align*}
u(x,0)=h(x),
\end{align*}
then the horizontal line has normal vector $\nu=(0,1)$, and
\begin{align*}
b\cdot \nu=(1,0)\cdot(0,1)=0.
\end{align*}
Thus the data surface is tangent to the characteristic direction. Along $y=0$, the PDE forces
\begin{align*}
0=u_x(x,0)=\frac{d}{dx}u(x,0)=h'(x),
\end{align*}
so a classical solution can exist only when $h$ is constant. If $h\equiv c$, then the boundary condition fixes only the characteristic $y=0$:
\begin{align*}
u(x,0)=c.
\end{align*}
For every other horizontal line $y=y_0\ne0$, the equation gives only
\begin{align*}
\frac{d}{dx}u(x,y_0)=0,
\end{align*}
so $u(x,y_0)$ must be constant in $x$, but that constant is not fixed by the data on $y=0$.
Indeed, if $F:\mathbb R\to\mathbb R$ is differentiable and satisfies $F(0)=c$, then
\begin{align*}
u(x,y)=F(y)
\end{align*}
has
\begin{align*}
u_x(x,y)=\frac{\partial}{\partial x}F(y)=0
\end{align*}
and
\begin{align*}
u(x,0)=F(0)=c.
\end{align*}
Different choices of $F$ with the same value $F(0)=c$ give different classical solutions with the same data on the horizontal line. Hence characteristic initial data do not determine a unique solution.
[/example]
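The two halves of the example can be checked with a few lines of code. The sketch below tests transversality through the sign of $b\cdot\nu$ and confirms that two different profiles $F$ with $F(0)=c$ solve $u_x=0$ with identical data on $y=0$; the profiles $1+y$ and $1+y^2$, the value $c=1$, and the helper names are illustrative assumptions.

```python
def is_transverse(b, nu, tol=1e-12):
    """Return True when b . nu is nonzero, i.e. the surface is transverse to b."""
    return abs(sum(bi * ni for bi, ni in zip(b, nu))) > tol

def x_derivative(u, x, y, h=1e-5):
    """Centred difference approximation of u_x at (x, y)."""
    return (u(x + h, y) - u(x - h, y)) / (2 * h)

b = (1.0, 0.0)                      # transport direction of u_x = 0
vertical_normal = (1.0, 0.0)        # normal to the line x = 0
horizontal_normal = (0.0, 1.0)      # normal to the line y = 0

# Two different profiles with F(0) = c = 1 (illustrative choices).
u_first = lambda x, y: 1.0 + y
u_second = lambda x, y: 1.0 + y**2
```

Both functions have vanishing $x$-derivative and equal $1$ on the line $y=0$, yet they differ at every point with $y\notin\{0,1\}$, which is the nonuniqueness for characteristic data.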
The example shows failure caused by bad placement of the data surface. For nonlinear equations there is a second failure even with good initial placement: the flow itself can lose invertibility. This motivates the following theorem, which computes the first time at which Burgers characteristics fold.
[quotetheorem:6167]
[citeproof:6167]
The result identifies shock formation as a geometric folding of the characteristic map. The hypothesis $m<0$ is the compression condition: neighbouring characteristics with larger initial labels can move more slowly than those behind them, so the map $a\mapsto a+t u_0(a)$ loses local invertibility. If the infimum is not attained, the formula should be read as the first possible degeneracy time approached by a sequence of labels, rather than as a guarantee that a single named characteristic collides at $T_*$. The criterion predicts the breakdown of the classical parametrisation; it does not construct the post-shock weak or entropy solution. The breakdown itself is not a numerical defect or a consequence of rough initial data; it is built into the nonlinear transport law.
[illustration:pdei-burgers-characteristic-map-folding]
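The folding can be observed numerically. The sketch below assumes the criterion has the usual form $T_*=-1/m$ with $m=\inf u_0'<0$, and uses the illustrative profile $u_0(a)=e^{-a^2}$, for which $m=-\sqrt{2/e}$ and $T_*=\sqrt{e/2}$; the profile, the label grid, and the function names are assumptions for this example. It estimates $T_*$ from sampled derivatives and tests whether $a\mapsto a+t\,u_0(a)$ is still increasing before and after that time.

```python
import math

def u0(a):
    # Illustrative smooth initial profile (an assumption for this sketch).
    return math.exp(-a * a)

def du0(a):
    # Exact derivative of the profile above.
    return -2.0 * a * math.exp(-a * a)

def breaking_time(derivative, labels):
    """Estimate T_* = -1/m with m = min of u_0' over the sampled labels."""
    m = min(derivative(a) for a in labels)
    if m >= 0:
        return math.inf  # no compression detected on the sample
    return -1.0 / m

def map_is_increasing(t, labels):
    """Check that a -> a + t * u0(a) is strictly increasing on the sample."""
    images = [a + t * u0(a) for a in labels]
    return all(x1 < x2 for x1, x2 in zip(images, images[1:]))

labels = [-4.0 + 8.0 * i / 4000 for i in range(4001)]
t_star = breaking_time(du0, labels)
```

On this grid the map is increasing at $t=0.9\,T_*$ and fails to be increasing at $t=1.5\,T_*$, which is the sampled version of the loss of invertibility.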
[example: Burgers as Four Linked Equations]
Assume $u$ is smooth on the region under discussion. The inviscid Burgers equation
\begin{align*}
\partial_t u+u\partial_xu=0
\end{align*}
is a transport equation whose velocity is the unknown value $u$ itself. If $x=x(t)$ is a characteristic satisfying
\begin{align*}
x'(t)=u(t,x(t)),
\end{align*}
then the chain rule gives
\begin{align*}
\frac{d}{dt}u(t,x(t))=\partial_tu(t,x(t))+x'(t)\partial_xu(t,x(t)).
\end{align*}
Substituting the characteristic equation $x'(t)=u(t,x(t))$ gives
\begin{align*}
\frac{d}{dt}u(t,x(t))=\partial_tu(t,x(t))+u(t,x(t))\partial_xu(t,x(t)).
\end{align*}
Evaluating Burgers' equation at $(t,x(t))$ gives
\begin{align*}
\partial_tu(t,x(t))+u(t,x(t))\partial_xu(t,x(t))=0.
\end{align*}
Therefore
\begin{align*}
\frac{d}{dt}u(t,x(t))=0.
\end{align*}
Thus $u$ is constant along the characteristics generated by its own values.
The same equation is also a conservation law. Since $r\mapsto r^2/2$ has derivative $r$, the chain rule gives
\begin{align*}
\partial_x\left(\frac{u^2}{2}\right)=u\partial_xu.
\end{align*}
Hence
\begin{align*}
\partial_tu+u\partial_xu=0
\end{align*}
is equivalent, for smooth $u$, to
\begin{align*}
\partial_tu+\partial_x\left(\frac{u^2}{2}\right)=0.
\end{align*}
Burgers' equation is also the differentiated form of a Hamilton-Jacobi equation. Suppose $\phi$ is smooth, $u=\partial_x\phi$, and
\begin{align*}
\partial_t\phi+\frac{(\partial_x\phi)^2}{2}=0.
\end{align*}
Differentiating both sides with respect to $x$ gives
\begin{align*}
0=\partial_x\left(\partial_t\phi+\frac{(\partial_x\phi)^2}{2}\right).
\end{align*}
Expanding the derivative gives
\begin{align*}
0=\partial_x\partial_t\phi+\partial_x\left(\frac{(\partial_x\phi)^2}{2}\right).
\end{align*}
Because $\phi$ is smooth, mixed partial derivatives commute, so
\begin{align*}
\partial_x\partial_t\phi=\partial_t\partial_x\phi=\partial_tu.
\end{align*}
Using the chain rule again,
\begin{align*}
\partial_x\left(\frac{(\partial_x\phi)^2}{2}\right)=(\partial_x\phi)(\partial_{xx}\phi).
\end{align*}
Since $u=\partial_x\phi$, we also have
\begin{align*}
\partial_xu=\partial_{xx}\phi.
\end{align*}
Thus
\begin{align*}
\partial_x\left(\frac{(\partial_x\phi)^2}{2}\right)=u\partial_xu.
\end{align*}
Substituting these two identities into the differentiated Hamilton-Jacobi equation gives
\begin{align*}
0=\partial_tu+u\partial_xu.
\end{align*}
Finally, Burgers appears as the zero-viscosity limit of the parabolic equation
\begin{align*}
\partial_tu^\varepsilon+u^\varepsilon\partial_xu^\varepsilon=\varepsilon\partial_{xx}u^\varepsilon.
\end{align*}
For smooth $u^\varepsilon$, the same chain-rule calculation gives
\begin{align*}
\partial_x\left(\frac{(u^\varepsilon)^2}{2}\right)=u^\varepsilon\partial_xu^\varepsilon.
\end{align*}
Therefore the viscous equation can be written in conservative form as
\begin{align*}
\partial_tu^\varepsilon+\partial_x\left(\frac{(u^\varepsilon)^2}{2}\right)=\varepsilon\partial_{xx}u^\varepsilon.
\end{align*}
Thus one equation simultaneously displays self-generated transport, conservation of $u$ with flux $u^2/2$, the slope equation for a Hamilton-Jacobi potential, and the vanishing-viscosity mechanism that selects the entropy solution after shocks form.
[/example]
## Why Weak Solutions and Entropy Conditions Are Unavoidable
The next question is what remains meaningful after characteristics cross. Classical derivatives may not exist across a jump, but conservation of total mass over intervals still has a distributional meaning.
[definition: Weak Solution of a Scalar Conservation Law]
Let $f\in C^1(\mathbb R)$ and $u_0\in L^\infty(\mathbb R)$. A function $u\in L^\infty((0,T)\times\mathbb R)$ is a weak solution of $\partial_t u+\partial_x f(u)=0$ with initial data $u(0,x)=u_0(x)$ if for every $\phi\in C_c^\infty([0,T)\times\mathbb R)$,
\begin{align*}
\int_0^{\,T}\int_{\mathbb R} \left(u\partial_t\phi+f(u)\partial_x\phi\right)\,dx\,dt+
\int_{\mathbb R}u_0(x)\phi(0,x)\,dx=0.
\end{align*}
[/definition]
The weak formulation is stable under limits, which is why it survives shock formation. It also raises a new interface question: if the solution jumps, conservation should impose a precise relation between the jump size and the speed of the moving discontinuity.
[explanation: Rankine-Hugoniot Jump Condition]
For a scalar conservation law $u_t+f(u)_x=0$, suppose a piecewise smooth solution has left and right traces $u^-(t)$ and $u^+(t)$ across a $C^1$ interface $x=\xi(t)$. Conservation across the moving interface forces
\begin{align*}
\dot{\xi}(t)\bigl(u^+(t)-u^-(t)\bigr)=f(u^+(t))-f(u^-(t)).
\end{align*}
For a constant-speed shock $x=st$, this becomes $s(u_+-u_-)=f(u_+)-f(u_-)$.
[/explanation]
The jump condition is a conservation law across the discontinuity, but it is not yet a selection rule. Its hypotheses are also structural: the solution must be smooth on each side of a sufficiently regular curve, and the left and right traces $u^-$ and $u^+$ must exist so that the interface contribution is meaningful. A general $L^\infty$ weak solution need not come with such traces or a single jump curve, so Rankine-Hugoniot describes classical shocks rather than the whole weak theory. For Burgers, both compression shocks and expansion shocks may satisfy Rankine-Hugoniot, which motivates a second structure that distinguishes admissible jumps from nonphysical ones.
[definition: Entropy Pair]
Let $f\in C^1(\mathbb R)$. An entropy pair for the scalar conservation law $\partial_t u+\partial_x f(u)=0$ is a pair of functions $\eta:\mathbb R\to\mathbb R$ and $q:\mathbb R\to\mathbb R$ with $\eta\in C^2(\mathbb R)$ convex and $q\in C^1(\mathbb R)$ satisfying
\begin{align*}
q'(r)=\eta'(r)f'(r)
\end{align*}
for all $r\in\mathbb R$.
[/definition]
Entropy pairs measure irreversible loss across shocks. Convexity is the analytic trace of the physical rule that the correct solution should dissipate every convex entropy, and this motivates building the inequality into the definition of the accepted weak solution.
[definition: Entropy Solution]
Let $u_0\in L^\infty(\mathbb R)$ and $f\in C^1(\mathbb R)$. A weak solution $u\in L^\infty((0,T)\times\mathbb R)$ is an entropy solution if, for every entropy pair $(\eta,q)$ and every nonnegative $\phi\in C_c^\infty((0,T)\times\mathbb R)$,
\begin{align*}
\int_0^{\,T}\int_{\mathbb R}\left(\eta(u)\partial_t\phi+q(u)\partial_x\phi\right)\,dx\,dt\ge 0.
\end{align*}
[/definition]
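For a single jump between constant states moving with speed $s$, the distributional inequality concentrates on the jump line and reduces to $q(u_+)-q(u_-)-s\bigl(\eta(u_+)-\eta(u_-)\bigr)\le0$. The sketch below evaluates this quantity for Burgers' flux $f(u)=u^2/2$ with the entropy pair $\eta(u)=u^2/2$, $q(u)=u^3/3$, which satisfies $q'=\eta'f'$. The choice of pair, the function names, and the use of exact rational arithmetic are assumptions made for the illustration.

```python
from fractions import Fraction

def flux(u):
    return u * u / 2          # Burgers flux f(u) = u^2 / 2

def entropy(u):
    return u * u / 2          # convex entropy eta(u) = u^2 / 2 (illustrative choice)

def entropy_flux(u):
    return u ** 3 / 3         # satisfies q'(u) = eta'(u) f'(u) = u^2

def shock_speed(u_left, u_right):
    """Rankine-Hugoniot speed s = (f(u+) - f(u-)) / (u+ - u-)."""
    return (flux(u_right) - flux(u_left)) / (u_right - u_left)

def entropy_production(u_left, u_right):
    """Jump of q minus s times jump of eta; admissible jumps give a value <= 0."""
    s = shock_speed(u_left, u_right)
    return (entropy_flux(u_right) - entropy_flux(u_left)
            - s * (entropy(u_right) - entropy(u_left)))

compressive = entropy_production(Fraction(1), Fraction(0))   # u- = 1, u+ = 0
expansive = entropy_production(Fraction(0), Fraction(1))     # u- = 0, u+ = 1
```

With these inputs the compressive jump yields $-1/12$ and the reversed jump yields $+1/12$, so only the first passes this entropy test; for Burgers the quantity equals $(u_+-u_-)^3/12$ in general.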
The entropy inequality replaces equality because shocks dissipate entropy: in differential form, the entropy production $\partial_t\eta(u)+\partial_xq(u)$ is a nonpositive distribution, written $\partial_t\eta(u)+\partial_xq(u)\le0$. Once the admissible class has been narrowed, the natural well-posedness question is stability: nearby initial data should lead to nearby entropy solutions.
[quotetheorem:579]
[citeproof:579]
The scalar and entropy hypotheses are essential. Kruzkov's entropy family and the doubling-of-variables argument are the mechanism behind the contraction estimate, and both use the order structure of scalar equations in a way that has no direct analogue for general systems. Arbitrary weak solutions can satisfy the integral conservation law while violating contraction: for Burgers with Riemann data $u_-=0$ and $u_+=1$, the discontinuous expansion shock moving with Rankine-Hugoniot speed $s=1/2$ is a weak solution, but it is rejected by the entropy condition and by vanishing viscosity, which instead gives the rarefaction fan. In systems, even small-data solutions of genuinely nonlinear hyperbolic systems may interact through waves of different families, so total variation and interaction potentials replace any direct scalar $L^1$ contraction principle. The $L^1$ assumptions ensure that the distance being contracted is finite, while boundedness keeps the flux controlled on the range of the solutions. In this course the result is used as the guiding stability idea, and it also explains why viscosity approximations are expected to converge to the entropy solution rather than to an arbitrary weak solution.
[quotetheorem:6168]
[citeproof:6168]
The regularity of $f$ and the smooth viscous approximations justify the entropy calculation before passing to the limit, while the uniform $L^\infty$ and $L^1$ controls are stability inputs that prevent mass from escaping or oscillating without control. Subsequence convergence alone only identifies possible limits; it does not prove that a limit exists for every sequence. Uniqueness of entropy solutions says that all subsequential limits, whenever they exist, must coincide. A compactness or relative compactness theorem is the extra ingredient that turns this identification of possible limits into convergence of the whole vanishing-viscosity family.
[example: Choosing a Solution Concept for a Jump]
Take Burgers' equation, so $f(u)=u^2/2$, with Riemann data $u_-=1$ on the left and $u_+=0$ on the right. For a single jump moving along $x=st$, the *[Rankine-Hugoniot jump condition](/theorems/578)* says
\begin{align*}
s(u_+-u_-)=f(u_+)-f(u_-).
\end{align*}
Substituting $u_-=1$ and $u_+=0$ gives
\begin{align*}
s(0-1)=f(0)-f(1).
\end{align*}
Since
\begin{align*}
f(0)=\frac{0^2}{2}=0
\end{align*}
and
\begin{align*}
f(1)=\frac{1^2}{2}=\frac12,
\end{align*}
the jump condition becomes
\begin{align*}
-s=0-\frac12=-\frac12.
\end{align*}
Multiplying both sides by $-1$ gives
\begin{align*}
s=\frac12.
\end{align*}
Thus the weak shock has value $1$ on the region $x<t/2$ and value $0$ on the region $x>t/2$.
The characteristic speed for Burgers is
\begin{align*}
f'(u)=\frac{d}{du}\left(\frac{u^2}{2}\right)=u.
\end{align*}
On the left of the shock the speed is
\begin{align*}
f'(u_-)=f'(1)=1.
\end{align*}
The shock speed is $1/2$, and
\begin{align*}
1>\frac12.
\end{align*}
Thus left characteristics move faster than the shock and catch up to it. On the right the speed is
\begin{align*}
f'(u_+)=f'(0)=0.
\end{align*}
Since
\begin{align*}
0<\frac12,
\end{align*}
the shock moves faster than the right characteristics and overtakes them. Characteristics enter the discontinuity from both sides, so this jump is the entropy shock.
If the data are reversed, with $u_-=0$ and $u_+=1$, the same jump condition permits a discontinuity:
\begin{align*}
s(1-0)=f(1)-f(0).
\end{align*}
Using $f(1)=1/2$ and $f(0)=0$, this becomes
\begin{align*}
s=\frac12-0=\frac12.
\end{align*}
Now the left characteristic speed is
\begin{align*}
f'(u_-)=f'(0)=0,
\end{align*}
and the right characteristic speed is
\begin{align*}
f'(u_+)=f'(1)=1.
\end{align*}
Therefore
\begin{align*}
0<\frac12<1.
\end{align*}
The left characteristics move more slowly than the proposed jump, while the right characteristics move faster than it. Characteristics therefore move away from the discontinuity, so this is an expansion shock and is rejected by the entropy condition.
For the reversed data, the entropy solution is the rarefaction fan with value $0$ for $x<0$, value $x/t$ for $0<x<t$, and value $1$ for $x>t$. Inside the fan, where $0<x<t$, we have $u(t,x)=x/t$. Differentiating in $t$ with $x$ fixed gives
\begin{align*}
\partial_tu(t,x)=\partial_t\left(\frac{x}{t}\right)=x\partial_t(t^{-1})=x(-t^{-2})=-\frac{x}{t^2}.
\end{align*}
Differentiating in $x$ with $t$ fixed gives
\begin{align*}
\partial_xu(t,x)=\partial_x\left(\frac{x}{t}\right)=\frac1t\partial_xx=\frac1t.
\end{align*}
Hence
\begin{align*}
u\partial_xu=\frac{x}{t}\cdot\frac1t=\frac{x}{t^2}.
\end{align*}
Adding the two Burgers terms gives
\begin{align*}
\partial_tu+u\partial_xu=-\frac{x}{t^2}+\frac{x}{t^2}=0.
\end{align*}
On the constant regions $x<0$ and $x>t$, both $\partial_tu$ and $\partial_xu$ vanish, so Burgers' equation also holds there. Thus the rarefaction fan solves Burgers' equation away from its two boundary rays, and since it is continuous across those rays it is also a weak solution. The *vanishing-viscosity principle* selects this fan rather than the expansion shock. The weak formulation alone allows both the expansion shock and the fan for the reversed data; the entropy or viscosity criterion chooses the stable one.
[/example]
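The case distinction in the example can be collected into a small Riemann solver for Burgers' equation. This is a minimal sketch under the assumptions of the example (flux $u^2/2$, constant left and right states); the function names are hypothetical, and the value returned exactly on the shock line is an arbitrary convention.

```python
def burgers_riemann(u_left, u_right, t, x):
    """Entropy solution of Burgers' equation for Riemann data at (t, x), t > 0.

    u_left > u_right: single shock with Rankine-Hugoniot speed (u_left + u_right)/2.
    u_left < u_right: rarefaction fan u = x/t between x = u_left*t and x = u_right*t.
    """
    if t <= 0:
        raise ValueError("the formula is stated for t > 0")
    if u_left > u_right:                       # compressive data: entropy shock
        speed = 0.5 * (u_left + u_right)
        return u_left if x < speed * t else u_right
    if x <= u_left * t:                        # left constant state
        return u_left
    if x >= u_right * t:                       # right constant state
        return u_right
    return x / t                               # inside the fan

def lax_admissible(u_left, u_right):
    """Lax condition f'(u-) > s > f'(u+) with f'(u) = u for Burgers."""
    speed = 0.5 * (u_left + u_right)
    return u_left > speed > u_right
```

For instance, `burgers_riemann(1, 0, 2.0, 0.9)` lies to the left of the shock at $x=t/2=1$ and returns the left state, while `burgers_riemann(0, 1, 2.0, 1.0)` lies inside the fan and returns $x/t=0.5$; `lax_admissible(0, 1)` is false, matching the rejection of the expansion shock.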
## Beyond and Connections: Elliptic, Parabolic, Hyperbolic, Sobolev, Viscosity, and Conservation-Law Theory
The final question is what the course's classical formulas teach beyond the first-order setting. The unifying lesson is that explicit representations are valuable, but robust estimates are what survive in general domains, weak spaces, and variable-coefficient problems. This is the bridge to later elliptic theory, parabolic regularity, hyperbolic energy estimates, Sobolev weak solutions, viscosity methods, and entropy theory for conservation laws.
[explanation: Representation Versus Estimate]
The transport formula, d'Alembert formula, heat kernel, Poisson kernel, and characteristic solutions all represent solutions directly. They reveal propagation, smoothing, finite speed, boundary influence, and singularity formation in a concrete form.
Modern PDE theory often begins where representation formulas stop. On rough domains or for variable coefficients, one usually proves bounds that any solution must satisfy: maximum principles, energy inequalities, compactness estimates, comparison principles, and stability estimates. These estimates are weaker than explicit formulas in appearance, but they apply to larger classes of equations and are stable under approximation.
[/explanation]
The three second-order model families illustrate this shift in different ways. Each keeps a trace of the formulas studied earlier, while also demanding its own solution space and estimate.
[example: Heat Smoothing Versus Wave Propagation]
For the heat equation $\partial_tu-\Delta u=0$ on $\mathbb R^n$, the heat kernel representation is
\begin{align*}
u(t,x)=(G_t*u_0)(x)=\int_{\mathbb R^n}G_t(x-y)u_0(y)\,dy,\qquad G_t(z)=\frac{1}{(4\pi t)^{n/2}}\exp\left(-\frac{|z|^2}{4t}\right),
\end{align*}
for $t>0$. The kernel is nonnegative because $(4\pi t)^{-n/2}>0$ and the exponential is positive. Its total mass is computed from the one-dimensional Gaussian integral $\int_{\mathbb R}e^{-r^2/(4t)}\,dr=(4\pi t)^{1/2}$:
\begin{align*}
\int_{\mathbb R^n}G_t(z)\,dz=\frac{1}{(4\pi t)^{n/2}}\prod_{j=1}^n\int_{\mathbb R}\exp\left(-\frac{z_j^2}{4t}\right)\,dz_j=\frac{1}{(4\pi t)^{n/2}}\prod_{j=1}^n(4\pi t)^{1/2}=1.
\end{align*}
Thus $u(t,x)$ is a weighted average of the initial values $u_0(y)$.
More than averaging occurs. For each multi-index $\alpha$,
\begin{align*}
\partial_x^\alpha u(t,x)=\partial_x^\alpha\int_{\mathbb R^n}G_t(x-y)u_0(y)\,dy=\int_{\mathbb R^n}\partial_x^\alpha G_t(x-y)u_0(y)\,dy.
\end{align*}
The interchange of derivative and integral is justified because, for fixed $t>0$, each derivative $\partial_x^\alpha G_t(x-y)$ is a polynomial in $x-y$ times the Gaussian factor $\exp(-|x-y|^2/(4t))$, hence is integrable in $y$ and gives a dominated convergence bound when $u_0$ is bounded. Therefore bounded nonsmooth initial data become smooth in $x$ for every positive time.
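The averaging and smoothing claims can be checked numerically in one space dimension. The sketch below is illustrative only: the helper names `heat_kernel`, `midpoint`, `kernel_mass`, and `heat_solution_step`, the midpoint quadrature, and the truncation parameter `cutoff` are assumptions of this example, not part of the course. It uses the step datum $u_0=1$ for $y>0$ and $u_0=0$ for $y<0$, for which the convolution has the standard closed form $u(t,x)=\tfrac12\bigl(1+\operatorname{erf}(x/(2\sqrt t))\bigr)$, a smooth function of $x$ for every $t>0$ even though $u_0$ jumps.

```python
import math


def heat_kernel(t, z):
    """One-dimensional heat kernel G_t(z) for t > 0."""
    return math.exp(-z * z / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)


def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def kernel_mass(t, cutoff=10.0):
    """Approximate total mass of G_t; the Gaussian tails beyond the cutoff are dropped."""
    width = cutoff * math.sqrt(2.0 * t)  # sqrt(2t) is the standard deviation of G_t
    return midpoint(lambda z: heat_kernel(t, z), -width, width)


def heat_solution_step(t, x, cutoff=10.0):
    """Approximate (G_t * u0)(x) for the step datum u0 = 1 on y > 0, u0 = 0 on y < 0."""
    width = cutoff * math.sqrt(2.0 * t)
    lo, hi = max(0.0, x - width), x + width  # only y > 0 contributes
    if hi <= lo:
        return 0.0
    return midpoint(lambda y: heat_kernel(t, x - y), lo, hi)


def exact_step_solution(t, x):
    """Closed form of the same convolution, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / (2.0 * math.sqrt(t))))


if __name__ == "__main__":
    t = 0.1
    print("kernel mass:", kernel_mass(t))
    for x in (-0.3, 0.0, 0.4):
        print(x, heat_solution_step(t, x), exact_step_solution(t, x))
```

The quadrature value and the closed form should agree to several digits, and every computed value lies between the two values taken by $u_0$, as a weighted average must.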
For the one-dimensional wave equation $\partial_{tt}u-c^2\partial_{xx}u=0$, *[d'Alembert's formula](/theorems/665)* gives
\begin{align*}
u(t,x)=\frac12 g(x-ct)+\frac12 g(x+ct)+\frac{1}{2c}\int_{x-ct}^{x+ct}h(y)\,dy
\end{align*}
when $u(0,x)=g(x)$ and $\partial_tu(0,x)=h(x)$. Every term on the right uses only values of $g$ or $h$ from the interval $[x-ct,x+ct]$: the translated values are taken at the endpoints, and the integral ranges between the same endpoints. If $g$ has a corner at $x_0$, then $g(x-ct)$ has that same corner when $x-ct=x_0$, which is the line
\begin{align*}
x=x_0+ct.
\end{align*}
Similarly, $g(x+ct)$ has the corner when $x+ct=x_0$, which is the line
\begin{align*}
x=x_0-ct.
\end{align*}
Translation moves the singularity along these two characteristic lines, and no averaging is applied to the two translated $g$ terms, so the corner is not smoothed. In higher dimensions, the *spherical mean formula* has the same geometric content: the solution at $(t,x)$ is determined by initial data in the ball $|y-x|\le ct$, which is the base of the backward cone with vertex $(t,x)$, or on its spherical boundary, depending on the dimension and the term being represented. Thus parabolic theory is organised around dissipation and regularisation, while hyperbolic theory is organised around propagation and energy.
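The propagation of a corner can also be observed numerically. The following sketch evaluates d'Alembert's formula with a midpoint rule for the $h$-integral and measures the jump in $\partial_xu$ by a scaled second difference. The names `dalembert` and `kink_strength`, the sample datum $g(x)=|x-x_0|$, and the step sizes are assumptions chosen for illustration.

```python
import math


def dalembert(g, h, c, t, x, n=2000):
    """Evaluate d'Alembert's formula; the integral of h uses a midpoint rule."""
    a, b = x - c * t, x + c * t
    step = (b - a) / n
    integral = step * sum(h(a + (k + 0.5) * step) for k in range(n))
    return 0.5 * g(a) + 0.5 * g(b) + integral / (2.0 * c)


def kink_strength(u, x, d=1e-3):
    """Scaled second difference: approximates the jump of u' across x, near 0 where u is smooth."""
    return (u(x + d) - 2.0 * u(x) + u(x - d)) / d


if __name__ == "__main__":
    c, t, x0 = 2.0, 0.75, 0.5

    def g(x):
        return abs(x - x0)  # corner at x0

    def zero(x):
        return 0.0

    def u(x):
        return dalembert(g, zero, c, t, x)

    for x in (x0 - c * t, x0, x0 + c * t):
        print(x, kink_strength(u, x))
```

With these choices the second difference should be of order one at $x=x_0\pm ct$ and negligible at $x=x_0$: the corner has split into two corners travelling along the characteristic lines, each carrying half of the original jump in slope.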
[/example]
The example separates smoothing from propagation, leaving elliptic equations as the remaining model family, the one with no preferred time direction. This contrast also connects PDE to probability and geometry: the heat equation averages like Brownian motion, the wave equation propagates along the light cones of Lorentzian geometry, and harmonic functions are controlled by boundary data. The elliptic case motivates a uniqueness template based on boundary comparison, where the central estimate is the maximum principle rather than a formula along curves.
[quotetheorem:1187]
[citeproof:1187]
This template uses each hypothesis in a visible way. Boundedness of $U$ and continuity on $\overline U$ ensure that the boundary extrema are meaningful and attained, while harmonicity gives the comparison principle in the interior. On unbounded domains, zero boundary data alone need not control behaviour at infinity: in the upper half-plane, $u(x,y)=y$ is harmonic and vanishes on the boundary line $y=0$, yet it is not identically zero, because nothing prevents it from growing as $y\to\infty$. With nonzero forcing, the same argument must be replaced by a comparison with barriers or by estimates involving the sign and size of the forcing term. Maximum principles are comparison estimates rather than representation formulas, and they depend on pointwise regularity. To prepare for weak elliptic theory, the parallel uniqueness argument should be phrased in an integral form that only uses first derivatives, the measure $\mathcal L^n$ on $U$, and the boundary condition encoded by $H^1_0(U)$.
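A discrete analogue makes the comparison estimate concrete. The sketch below is not a method from these notes: it assumes the unit square, a uniform grid, and plain Jacobi iteration for the five-point Laplacian, and the name `solve_laplace_jacobi` is hypothetical. Each sweep replaces an interior value by the average of its four neighbours, so no interior value can leave the range spanned by the boundary values and the starting guess.

```python
def solve_laplace_jacobi(boundary, n=12, sweeps=600):
    """Jacobi iteration for the five-point Laplace equation on the unit square.

    Returns the grid u[i][j] ~ u(i/n, j/n) together with the minimum and
    maximum of the prescribed boundary values.
    """
    h = 1.0 / n

    def on_edge(i, j):
        return i in (0, n) or j in (0, n)

    edge_values = [boundary(i * h, j * h)
                   for i in range(n + 1) for j in range(n + 1) if on_edge(i, j)]
    start = sum(edge_values) / len(edge_values)  # initial guess inside the boundary range
    u = [[boundary(i * h, j * h) if on_edge(i, j) else start
          for j in range(n + 1)] for i in range(n + 1)]
    for _ in range(sweeps):
        new = [row[:] for row in u]
        for i in range(1, n):
            for j in range(1, n):
                # Discrete mean value property: average of the four neighbours.
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
        u = new
    return u, min(edge_values), max(edge_values)


if __name__ == "__main__":
    grid, lo, hi = solve_laplace_jacobi(lambda x, y: x * x - y * y)
    interior = [grid[i][j] for i in range(1, 12) for j in range(1, 12)]
    print("boundary range:", lo, hi)
    print("interior range:", min(interior), max(interior))
```

For the boundary data $x^2-y^2$ the interior values stay inside the boundary range and the iteration approaches the harmonic polynomial $x^2-y^2$ itself, for which the five-point scheme is exact. Two grid functions with the same boundary values therefore have a difference squeezed between $0$ and $0$, which is the discrete form of the uniqueness template.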
Later weak elliptic theory keeps the same uniqueness idea but changes the language. Instead of differentiating twice pointwise, one works in Sobolev spaces, encodes homogeneous boundary values by a zero-trace condition, and tests the equation against the solution itself. The prototype energy identity is
\begin{align*}
\int_U |\nabla u|^2\,d\mathcal L^n=0.
\end{align*}
This identity forces the weak gradient to vanish almost everywhere, and the boundary condition then rules out nonzero constants. This paragraph is only a bridge to the Sobolev course: the present course uses it to name the direction of the later theory, not to develop the full trace and Hilbert-space machinery here.
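A minimal sketch of where the identity comes from, under assumptions that these notes do not develop: suppose $U$ is bounded and $u\in H^1_0(U)$ is a weak solution of $-\Delta u=0$, meaning that
\begin{align*}
\int_U \nabla u\cdot\nabla v\,d\mathcal L^n=0\qquad\text{for every } v\in H^1_0(U).
\end{align*}
Since $u$ itself is an admissible test function, the choice $v=u$ gives
\begin{align*}
\int_U |\nabla u|^2\,d\mathcal L^n=\int_U \nabla u\cdot\nabla u\,d\mathcal L^n=0.
\end{align*}
Granting the Poincaré inequality $\|u\|_{L^2(U)}\le C_U\|\nabla u\|_{L^2(U)}$ for $u\in H^1_0(U)$, which belongs to the Sobolev course rather than to this one, it follows that $u=0$ almost everywhere. Applied to the difference of two weak solutions with the same data, this is the integral counterpart of the maximum-principle uniqueness argument.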
[example: Distributional Derivative of One Jump]
Let $a,b\in\mathbb R$, and define $u(x)=a$ for $x<0$ and $u(x)=b$ for $x>0$. The value assigned to $u(0)$ is irrelevant for the regular distribution associated to $u$, because changing a function at one point does not change any integral against a test function. Define
\begin{align*}
T_u(\psi)=\int_{\mathbb R}u(x)\psi(x)\,d\mathcal L^1(x), \qquad \psi\in C_c^\infty(\mathbb R).
\end{align*}
We compute the distributional derivative $D T_u$ and show that it equals $(b-a)\delta_0$.
For $\phi\in C_c^\infty(\mathbb R)$, the distributional derivative is defined by
\begin{align*}
D T_u(\phi)=-T_u(\phi').
\end{align*}
By the definition of $T_u$, applied to the test function $\phi'$,
\begin{align*}
T_u(\phi')=\int_{\mathbb R}u(x)\phi'(x)\,d\mathcal L^1(x).
\end{align*}
Since the point $x=0$ has $\mathcal L^1$-measure zero, splitting the integral at the jump and inserting the values of $u$ on the two open half-lines gives
\begin{align*}
T_u(\phi')=\int_{-\infty}^{0}a\phi'(x)\,dx+\int_{0}^{\infty}b\phi'(x)\,dx.
\end{align*}
Because $\phi$ has compact support, there is $R_0>0$ such that $\phi(x)=0$ whenever $|x|\ge R_0$. Choose $R>R_0$. Then the first half-line integral is
\begin{align*}
\int_{-\infty}^{0}a\phi'(x)\,dx=\int_{-R}^{0}a\phi'(x)\,dx.
\end{align*}
Pulling out the constant $a$ gives
\begin{align*}
\int_{-R}^{0}a\phi'(x)\,dx=a\int_{-R}^{0}\phi'(x)\,dx.
\end{align*}
By the fundamental theorem of calculus,
\begin{align*}
\int_{-R}^{0}\phi'(x)\,dx=\phi(0)-\phi(-R).
\end{align*}
Since $R>R_0$, we have $\phi(-R)=0$, and hence
\begin{align*}
\int_{-\infty}^{0}a\phi'(x)\,dx=a\phi(0).
\end{align*}
Similarly, the second half-line integral is
\begin{align*}
\int_{0}^{\infty}b\phi'(x)\,dx=\int_{0}^{R}b\phi'(x)\,dx.
\end{align*}
Pulling out the constant $b$ gives
\begin{align*}
\int_{0}^{R}b\phi'(x)\,dx=b\int_{0}^{R}\phi'(x)\,dx.
\end{align*}
Again by the fundamental theorem of calculus,
\begin{align*}
\int_{0}^{R}\phi'(x)\,dx=\phi(R)-\phi(0).
\end{align*}
Since $R>R_0$, we have $\phi(R)=0$, and therefore
\begin{align*}
\int_{0}^{\infty}b\phi'(x)\,dx=-b\phi(0).
\end{align*}
Adding the two contributions gives
\begin{align*}
T_u(\phi')=a\phi(0)-b\phi(0).
\end{align*}
Factoring out $\phi(0)$ gives
\begin{align*}
T_u(\phi')=(a-b)\phi(0).
\end{align*}
Therefore
\begin{align*}
D T_u(\phi)=-T_u(\phi')=-(a-b)\phi(0)=(b-a)\phi(0).
\end{align*}
The Dirac distribution at $0$ is defined by $\delta_0(\phi)=\phi(0)$, so for every test function $\phi$,
\begin{align*}
D T_u(\phi)=(b-a)\delta_0(\phi).
\end{align*}
Hence
\begin{align*}
D T_u=(b-a)\delta_0.
\end{align*}
Thus the distributional derivative records the jump size $b-a$ as a point mass at the interface, which is the local model for jump terms in weak formulations of conservation laws.
[/example]
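The identity $DT_u(\phi)=(b-a)\phi(0)$ can be tested numerically for one concrete test function. The sketch below assumes the standard bump $\phi(x)=\exp\bigl(-1/(1-x^2)\bigr)$ on $|x|<1$, extended by zero, and a midpoint rule on each half-line. The names `bump`, `bump_prime`, and `derivative_pairing` are illustrative, not from the notes.

```python
import math


def bump(x):
    """Smooth test function supported in [-1, 1]; bump(0) = exp(-1)."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def bump_prime(x):
    """Derivative of bump, obtained from the chain rule."""
    if abs(x) >= 1.0:
        return 0.0
    s = 1.0 - x * x
    return bump(x) * (-2.0 * x / (s * s))


def midpoint(f, a, b, n=4000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def derivative_pairing(a, b):
    """Approximate D T_u(phi) = -T_u(phi') for u = a on x < 0 and u = b on x > 0."""
    left = a * midpoint(bump_prime, -1.0, 0.0)   # integral over the left half-line
    right = b * midpoint(bump_prime, 0.0, 1.0)   # integral over the right half-line
    return -(left + right)


if __name__ == "__main__":
    a, b = 2.0, 5.0
    print("numerical pairing :", derivative_pairing(a, b))
    print("(b - a) * phi(0)  :", (b - a) * bump(0.0))
```

The two printed numbers should agree up to quadrature error, and replacing $(a,b)$ by $(a+k,b+k)$ leaves the pairing unchanged, because only the jump $b-a$ is recorded.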
Putting the chapter together, the course has moved from solving PDEs by curves and kernels to selecting solution concepts by stability and estimates. Characteristics explain why classical solutions exist and why they fail; weak formulations preserve conservation laws past the failure; entropy and viscosity select the physically stable branch. The later courses on elliptic, parabolic, and hyperbolic equations develop these same principles with stronger functional-analytic tools.
Partial Differential Equations I: Classical Foundations and First-Order Equations
Created by admin on 6/10/2026 | Last updated on 6/11/2026