Many equations in analysis describe how a quantity changes along one variable: time for an oscillator, arc length for a curve, or a parameter for an [ordinary differential equation](/page/Ordinary%20Differential%20Equation). A partial differential equation begins when the unknown depends on several variables at once. The derivative in one direction can no longer tell the whole story, because spatial variation, temporal change, boundary behaviour, and geometry all interact in the same equation.
The simplest warning comes from heat flow. If $u(t,x)$ denotes temperature, then an equation involving only the time derivative $\partial_t u$ cannot describe the evolution, because heat also responds to spatial curvature through $\Delta u$. A steep linear temperature profile and a curved one may have the same value and the same first spatial derivative at a point, but only the curved profile has a local heat imbalance. The PDE records that imbalance.
[example: Heat Flow Needs Spatial Curvature]
Think of $u_1(x)=0$ and $u_2(x)=x^2$ as instantaneous spatial temperature profiles on $\mathbb{R}$, not as full time-dependent solutions of the [heat equation](/page/Heat%20Equation). At $x=0$, their values agree:
\begin{align*}
u_1(0)=0
\end{align*}
and
\begin{align*}
u_2(0)=0^2=0.
\end{align*}
Their first derivatives also agree, since
\begin{align*}
u_1'(x)=0
\end{align*}
and
\begin{align*}
u_2'(x)=2x,
\end{align*}
so
\begin{align*}
u_1'(0)=0
\end{align*}
and
\begin{align*}
u_2'(0)=2\cdot 0=0.
\end{align*}
On $\mathbb{R}$, the Laplacian is the [second derivative](/page/Second%20Derivative). For $u_1$, differentiating $u_1'(x)=0$ gives
\begin{align*}
\Delta u_1(x)=u_1''(x)=0.
\end{align*}
For $u_2$, differentiating $u_2'(x)=2x$ gives
\begin{align*}
\Delta u_2(x)=u_2''(x)=2.
\end{align*}
In particular,
\begin{align*}
\Delta u_1(0)=0
\end{align*}
while
\begin{align*}
\Delta u_2(0)=2.
\end{align*}
If $U_1$ and $U_2$ solve the heat equation $\partial_t U_i=\Delta U_i$ with initial data $U_i(0,x)=u_i(x)$, and are regular enough up to $t=0$ for the equation to hold there, then evaluating the equation at $(t,x)=(0,0)$ gives
\begin{align*}
\partial_t U_1(0,0)=\Delta U_1(0,0)=\Delta u_1(0)=0
\end{align*}
and
\begin{align*}
\partial_t U_2(0,0)=\Delta U_2(0,0)=\Delta u_2(0)=2.
\end{align*}
The second profile starts changing at the origin while the first does not, even though their value and slope at the origin are the same. The heat equation is therefore sensitive to second spatial variation, not only to slope.
[/example]
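As a sanity check on the last step of the example, the two profiles can be evolved in closed form. The explicit formulas below are supplied here for illustration and are not part of the example itself. The functions $U_1(t,x)=0$ and $U_2(t,x)=x^2+2t$ satisfy
\begin{align*}
\partial_t U_1&=0=\Delta U_1, & U_1(0,x)&=0,\\
\partial_t U_2&=2=\partial_x^2 U_2=\Delta U_2, & U_2(0,x)&=x^2.
\end{align*}
Along $U_1$ the temperature at the origin stays at $0$, while along $U_2$ it rises linearly, $U_2(t,0)=2t$. These are solutions with the stated initial data; whether they are the only ones on $\mathbb{R}$ depends on growth conditions that are not discussed at this point.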
This chapter introduces the basic language for partial differential equations, explains why boundary and initial data are part of the problem rather than decoration, and organizes the first major families of equations by the features that control their analysis. The subject connects naturally with [Cambridge IA Differential Equations](/page/Cambridge%20IA%20Differential%20Equations), [Cambridge II Analysis of Functions](/page/Cambridge%20II%20Analysis%20of%20Functions), and the geometric viewpoint developed in [Cambridge III Differential Geometry](/page/Cambridge%20III%20Differential%20Geometry).
## Definition
The central object is not just an equation, but an equation imposed on an unknown function over a domain. The domain specifies where derivatives are taken, the target specifies what kind of quantity is being solved for, and the differential operator specifies which local measurements of the function enter the law.
Before writing a general PDE in coordinates, we fix the notation used to record all derivatives up to a chosen order. A multi-index is a vector $\alpha=(\alpha_1,\dots,\alpha_n)\in\mathbb{N}_0^n$, with length $|\alpha|=\alpha_1+\cdots+\alpha_n$, and
\begin{align*}
\partial^\alpha u=\frac{\partial^{|\alpha|}u}{\partial x_1^{\alpha_1}\cdots\partial x_n^{\alpha_n}}.
\end{align*}
The space $C^k(U;\mathbb{R}^m)$ consists of maps $U\to\mathbb{R}^m$ whose partial derivatives through order $k$ are continuous. When a formula uses variables indexed by all $\alpha$ with $|\alpha|\leq k$, those variables represent the finite jet of values and derivatives of $u$ up to order $k$ at a point.
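To make the jet variables concrete, the following Python sketch enumerates the multi-indices with $|\alpha|\leq k$ in $n$ variables. The function name `multi_indices` is hypothetical and introduced only for this illustration; the number of indices it returns should agree with the binomial coefficient $\binom{n+k}{k}$.

```python
from itertools import product
from math import comb


def multi_indices(n, k):
    """Return all multi-indices alpha in N_0^n with |alpha| <= k.

    Illustrative helper (not from the text): each alpha is a tuple of
    n nonnegative integers, and |alpha| is the sum of its entries.
    """
    return [alpha for alpha in product(range(k + 1), repeat=n)
            if sum(alpha) <= k]


if __name__ == "__main__":
    # Second-order jet in two variables: one index for u itself,
    # two for first derivatives, three for second derivatives.
    jet = multi_indices(2, 2)
    print(sorted(jet, key=sum))
    print(len(jet), comb(2 + 2, 2))
```

For a map with values in $\mathbb{R}^m$, each index carries a vector $z_\alpha\in\mathbb{R}^m$, so a function of the $k$-jet takes $m$ times this many real arguments in addition to the point $x$.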
[definition: Partial Differential Equation]
Let $U$ be an open subset of $\mathbb{R}^n$, let $m,q \in \mathbb{N}$, and let $k \in \mathbb{N}$. For each multi-index $\alpha$ with $0\leq |\alpha|\leq k$, let $z_\alpha \in \mathbb{R}^m$, and let
\begin{align*}
F: U \times \prod_{0\leq |\alpha|\leq k}\mathbb{R}^m &\to \mathbb{R}^q
\end{align*}
be a specified continuous function. For $u\in C^k(U;\mathbb{R}^m)$, the induced differential operator has type
\begin{align*}
P:C^k(U;\mathbb{R}^m)&\to C^0(U;\mathbb{R}^q).
\end{align*}
It is defined by
\begin{align*}
P(u)(x)&=F\left(x,\{\partial^\alpha u(x):0 \leq |\alpha| \leq k\}\right).
\end{align*}
A partial differential equation of order at most $k$ for an unknown function $u: U \to \mathbb{R}^m$ is an equation of the form
\begin{align*}
F\left(x,\{\partial^\alpha u(x):0 \leq |\alpha| \leq k\}\right)=0
\end{align*}
for all $x \in U$.
[/definition]
The notation hides a large amount of structure. The same displayed form may describe a scalar equation, a coupled system, a stationary law, or an evolution equation. Even so, the definition isolates the local character of the subject: a PDE constrains a function by comparing its values and its partial derivatives at each point.
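A small instance may help fix the notation; the choice of example is made here, though the equation is standard. Take $n=2$, $m=q=1$, and $k=2$. The multi-indices with $|\alpha|\leq 2$ are $(0,0)$, $(1,0)$, $(0,1)$, $(2,0)$, $(1,1)$, $(0,2)$, and the function
\begin{align*}
F\left(x,\{z_\alpha:0\leq|\alpha|\leq 2\}\right)=z_{(2,0)}+z_{(0,2)}
\end{align*}
induces
\begin{align*}
P(u)(x)=\partial_{x_1}^2u(x)+\partial_{x_2}^2u(x)=\Delta u(x),
\end{align*}
so the equation $F=0$ is Laplace's equation. In this case $F$ ignores the point $x$ and four of the six jet variables.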
## Notions of Solution
The most direct interpretation, and the most demanding, asks for every derivative in the displayed equation to exist and be continuous, so that the equation holds pointwise. This is the natural starting point when the coefficients and data are smooth, and it is the version closest to multivariable calculus.
[definition: Classical Solution]
Let $U$ be an open subset of $\mathbb{R}^n$, let $m,q,k\in\mathbb{N}$, and let
\begin{align*}
F: U\times\prod_{0\leq |\alpha|\leq k}\mathbb{R}^m&\to\mathbb{R}^q
\end{align*}
define a partial differential equation of order at most $k$ for an unknown map $u:U\to\mathbb{R}^m$. A classical solution of this equation is a function $u\in C^k(U;\mathbb{R}^m)$ such that
\begin{align*}
F\left(x,\{\partial^\alpha u(x):0\leq |\alpha|\leq k\}\right)&=0
\end{align*}
for every $x\in U$.
[/definition]
Classical solutions are valuable when they exist, but many natural equations create corners, shocks, or functions whose derivatives exist only after [integration by parts](/theorems/210). This pressure leads to weak formulations, where the equation is tested against smooth compactly supported functions instead of being read point by point.
[definition: Weak Solution]
Let $U$ be an open subset of $\mathbb{R}^n$, let $m,q\in\mathbb{N}$, let $X$ be a function space of maps $U\to\mathbb{R}^m$, and let $\mathcal{T}\subset C_c^\infty(U;\mathbb{R}^q)$ be a specified test-function space. Let
\begin{align*}
\mathcal{A}:X\times \mathcal{T}&\to\mathbb{R}
\end{align*}
be the residual functional associated to the chosen weak formulation of the equation. A weak solution in $X$ for this weak formulation is a function $u\in X$ such that
\begin{align*}
\mathcal{A}(u,\varphi)=0
\end{align*}
for every $\varphi\in\mathcal{T}$.
[/definition]
The residual functional is part of the formulation, not a canonical object attached to the words "partial differential equation" alone. It is usually built by multiplying the differential expression by a [test function](/page/Test%20Function), integrating over the domain, and applying the chosen integration-by-parts identity. For a scalar equation one often takes $\mathcal{T}=C_c^\infty(U)$; for a system with $q$ equations, vector-valued tests in $C_c^\infty(U;\mathbb{R}^q)$ record each residual component. For example, a divergence-form equation is integrated by parts, placing derivatives on the test function. This is the bridge from PDE to Sobolev spaces and functional analysis.
[example: Weak Form of Poisson's Equation]
Let $U$ be a bounded open subset of $\mathbb{R}^n$, let $f\in L^2(U)$, and first suppose $u\in C^2(U)$ satisfies $-\Delta u=f$ pointwise. For a test function $\varphi\in C_c^\infty(U)$, multiplying the equation by $\varphi$ and integrating gives
\begin{align*}
\int_U (-\Delta u)\varphi\,d\mathcal{L}^n=\int_U f\varphi\,d\mathcal{L}^n.
\end{align*}
Since $\Delta u=\sum_{i=1}^n \partial_{x_i}\partial_{x_i}u$, the left-hand side is
\begin{align*}
\int_U (-\Delta u)\varphi\,d\mathcal{L}^n=-\sum_{i=1}^n\int_U (\partial_{x_i}\partial_{x_i}u)\varphi\,d\mathcal{L}^n.
\end{align*}
Because $\varphi$ has compact support in $U$, one-dimensional [integration by parts](/theorems/2098) in each coordinate has no boundary term, so
\begin{align*}
-\int_U (\partial_{x_i}\partial_{x_i}u)\varphi\,d\mathcal{L}^n=\int_U (\partial_{x_i}u)(\partial_{x_i}\varphi)\,d\mathcal{L}^n.
\end{align*}
Summing over $i$ gives
\begin{align*}
\int_U (-\Delta u)\varphi\,d\mathcal{L}^n=\sum_{i=1}^n\int_U (\partial_{x_i}u)(\partial_{x_i}\varphi)\,d\mathcal{L}^n.
\end{align*}
Since $\nabla u\cdot\nabla\varphi=\sum_{i=1}^n(\partial_{x_i}u)(\partial_{x_i}\varphi)$, the identity becomes
\begin{align*}
\int_U \nabla u\cdot\nabla\varphi\,d\mathcal{L}^n=\int_U f\varphi\,d\mathcal{L}^n.
\end{align*}
The weak formulation therefore declares $u$ to solve $-\Delta u=f$ when this last identity holds for every $\varphi\in C_c^\infty(U)$; only the first weak derivatives of $u$ appear, even though the classical equation contains second derivatives.
[/example]
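The weak identity can be checked numerically in one dimension. The sketch below is only an illustration under assumptions chosen here: $U=(0,1)$, $u(x)=\sin(\pi x)$ (so $f=-u''=\pi^2\sin(\pi x)$), a standard bump function supported in $[0.1,0.9]$ as the test function, and a midpoint rule for the integrals. The function names are hypothetical.

```python
import math


def bump(x, center=0.5, radius=0.4):
    """Smooth test function supported in [center - radius, center + radius].

    Returns (phi(x), phi'(x)) for phi(x) = exp(-1 / (1 - s^2)),
    s = (x - center) / radius, extended by zero for |s| >= 1.
    """
    s = (x - center) / radius
    if abs(s) >= 1.0:
        return 0.0, 0.0
    d = 1.0 - s * s
    phi = math.exp(-1.0 / d)
    dphi = phi * (-2.0 * s / (d * d)) / radius  # chain rule in s
    return phi, dphi


def weak_residual(num_cells=20000):
    """Midpoint-rule values of both sides of the weak identity on U = (0, 1).

    Assumed example data: u(x) = sin(pi x), so f = -u'' = pi^2 sin(pi x).
    """
    h = 1.0 / num_cells
    lhs = 0.0  # approximates the integral of u' * phi'
    rhs = 0.0  # approximates the integral of f * phi
    for j in range(num_cells):
        x = (j + 0.5) * h
        phi, dphi = bump(x)
        lhs += math.pi * math.cos(math.pi * x) * dphi * h
        rhs += math.pi ** 2 * math.sin(math.pi * x) * phi * h
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = weak_residual()
    print(lhs, rhs, abs(lhs - rhs))
```

The two sums should agree up to quadrature error. Note that the code never evaluates $u''$ on the left-hand side, which mirrors the point of the weak formulation.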
## Data and Well-Posed Problems
### Boundary Data
A PDE written without data is often underdetermined. The Laplace equation admits many harmonic functions, the heat equation needs an initial temperature, and transport equations need information entering through characteristics. A mathematical problem therefore couples the equation with side conditions that select the solution.
[definition: Boundary Value Problem]
Let $U$ be an open subset of $\mathbb{R}^n$ with boundary $\partial U$, let $X$ be a function space of maps $U\to\mathbb{R}^m$, let $Y$ be a space of boundary data on $\partial U$, and let $B:X\to Y$ be a boundary operator. A boundary value problem consists of a partial differential equation posed in $U$ together with a condition
\begin{align*}
B(u)=g
\end{align*}
for prescribed data $g\in Y$.
[/definition]
Boundary data express how the domain interacts with its surroundings. The first boundary question asks what value the unknown should take on the edge of the region, because prescribing those values is the most direct way to anchor an elliptic problem.
[definition: Dirichlet Boundary Condition]
Let $U$ be an open subset of $\mathbb{R}^n$ with boundary $\partial U$, let $X$ be a function space of maps $U\to\mathbb{R}^m$ for which a trace operator
\begin{align*}
\operatorname{Tr}:X&\to Y
\end{align*}
is defined into a boundary function space $Y$ on $\partial U$. A Dirichlet boundary condition prescribes
\begin{align*}
\operatorname{Tr}u=g
\end{align*}
for specified boundary data $g\in Y$.
[/definition]
For smooth functions on smooth domains, this is the familiar notation $u|_{\partial U}=g$. In Sobolev spaces, the trace operator is the object that makes the boundary value meaningful.
The Dirichlet condition controls the value itself. Many models instead prescribe how material crosses the boundary, so the next boundary condition records the outward flux rather than the boundary state.
[definition: Neumann Boundary Condition]
Let $U$ be an open subset of $\mathbb{R}^n$ with $C^1$ boundary and outward unit normal vector field $\nu$, let $X$ be a function space of scalar functions on $U$, and let $Y$ be a boundary function space on $\partial U$. Suppose the normal trace operator
\begin{align*}
\partial_\nu:X&\to Y
\end{align*}
is defined by $\partial_\nu u=\nabla u\cdot\nu$ for smooth functions and by extension on $X$. A Neumann boundary condition prescribes
\begin{align*}
\partial_\nu u=h
\end{align*}
on $\partial U$, for specified boundary data $h\in Y$.
[/definition]
The Neumann condition measures flux through the boundary. It is natural for heat insulation, electrostatics, and conservation laws, and it often determines solutions only up to constants unless an additional normalization is imposed. For problems such as $-\Delta u=f$ with Neumann data $\partial_\nu u=h$, integration by parts gives the compatibility condition
\begin{align*}
\int_U f \, d\mathcal{L}^n&=-\int_{\partial U} h \, d\mathcal{H}^{n-1},
\end{align*}
with this sign convention.
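The derivation behind this condition is short. Assuming, for this sketch, that $U$ is bounded and that $u$ is smooth up to the boundary, the divergence theorem gives
\begin{align*}
\int_U f\,d\mathcal{L}^n
=-\int_U \Delta u\,d\mathcal{L}^n
=-\int_U \operatorname{div}(\nabla u)\,d\mathcal{L}^n
=-\int_{\partial U}\nabla u\cdot\nu\,d\mathcal{H}^{n-1}
=-\int_{\partial U}h\,d\mathcal{H}^{n-1}.
\end{align*}
If the equation is written as $\Delta u=f$, or the boundary datum is taken to be $-\partial_\nu u$, the sign on the right-hand side flips. The same computation shows the non-uniqueness: for every constant $c$, the function $u+c$ has the same Laplacian and the same normal derivative as $u$.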
### Initial Data and Stability
Evolution equations need a different kind of side condition. The equation explains how the state moves in time, but the initial state tells the evolution where to begin.
[definition: Initial Value Problem]
Let $I$ be an interval in $\mathbb{R}$ containing $0$, let $U$ be an open subset of $\mathbb{R}^n$, let $X$ be a function space of maps $I\times U\to\mathbb{R}^m$, let $Y$ be a data space on $I\times U$, and let
\begin{align*}
P:X&\to Y
\end{align*}
be the differential operator defining a time-dependent partial differential equation $P(u)=f$. Let $Z$ be an initial-data space on $U$, let
\begin{align*}
\gamma_0:X&\to Z
\end{align*}
be the initial trace operator, with $\gamma_0(u)=u(0,\cdot)$ whenever pointwise evaluation at $t=0$ is meaningful. Let $u_0\in Z$ be the prescribed initial datum. An [initial value problem](/page/Initial%20Value%20Problem) consists of the equation $P(u)=f$ for an unknown $u\in X$ together with the condition
\begin{align*}
\gamma_0(u)=u_0.
\end{align*}
[/definition]
When the equation has second time derivatives, as in the [wave equation](/page/Wave%20Equation), the initial velocity is also part of the data. After the equation and its data have been chosen, the next question is whether they determine a stable mathematical problem rather than a formal prescription.
[definition: Well-Posed Problem]
Let $X$ be a solution space, let $Y$ be a data space, and let
\begin{align*}
P:X&\to Y
\end{align*}
be the problem map associated to a partial differential equation and its boundary or initial operators. Let $Y_{\mathrm{adm}}\subset Y$ be the set of admissible data. The problem $P(u)=g$ is well-posed if for every $g\in Y_{\mathrm{adm}}$ there exists a unique $u\in X$ satisfying $P(u)=g$, and the solution map
\begin{align*}
S:Y_{\mathrm{adm}}&\to X
\end{align*}
defined by
\begin{align*}
S(g)&=u
\end{align*}
is continuous.
[/definition]
Well-posedness is a contract between the equation and the space in which it is studied. For parabolic equations, the first well-posedness test is uniqueness: if two temperature profiles begin from the same data and satisfy the same boundary law, the model should not allow them to separate. The following theorem records that uniqueness mechanism in the heat equation, where an energy estimate rules out two different evolutions from the same data.
[quotetheorem:10006]
The point of this principle is not the heat equation alone. It illustrates a recurring method: subtract two candidate solutions, multiply by the difference, integrate over the domain, and use the sign of the highest-order term to force the difference to vanish.
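A sketch of that computation follows, under assumptions chosen here for simplicity; the quoted theorem may be stated under different hypotheses. Let $U$ be bounded with $C^1$ boundary, and let $u_1,u_2$ be solutions of the heat equation, smooth up to the boundary, with the same initial data and the same Dirichlet data. Set $w=u_1-u_2$, so that $\partial_t w=\Delta w$ in $U$, $w=0$ on $\partial U$, and $w(0,\cdot)=0$. Define
\begin{align*}
E(t)=\frac{1}{2}\int_U w(t,x)^2\,d\mathcal{L}^n(x).
\end{align*}
Differentiating under the integral and integrating by parts, with no boundary term because $w=0$ on $\partial U$, gives
\begin{align*}
E'(t)=\int_U w\,\partial_t w\,d\mathcal{L}^n
=\int_U w\,\Delta w\,d\mathcal{L}^n
=-\int_U |\nabla w|^2\,d\mathcal{L}^n\leq 0.
\end{align*}
Since $E\geq 0$, $E(0)=0$, and $E$ is nonincreasing, $E(t)=0$ for all $t\geq 0$, so $w\equiv 0$ and $u_1=u_2$.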
## Orders, Linearity, and Principal Parts
### Order
The order of a PDE measures the highest derivative appearing in the equation. This is more than bookkeeping. First-order equations propagate information along curves, [second-order elliptic equations](/page/Second-Order%20Elliptic%20Equations) smooth information across regions, and higher-order equations require additional boundary data.
[definition: Order of a Partial Differential Equation]
Let $U$ be an open subset of $\mathbb{R}^n$, let $m,q\in\mathbb{N}$, let $k\in\mathbb{N}$, and let $u:U\to\mathbb{R}^m$ be the unknown. For each multi-index $\alpha$ with $0\leq |\alpha|\leq k$, let $z_\alpha\in\mathbb{R}^m$, and let
\begin{align*}
F:U\times\prod_{0\leq |\alpha|\leq k}\mathbb{R}^m&\to\mathbb{R}^q
\end{align*}
be a specified function. If the partial differential equation for $u$ is represented by
\begin{align*}
F\left(x,\{\partial^\alpha u(x):0\leq |\alpha|\leq k\}\right)=0,
\end{align*}
then its order is the largest integer $r\leq k$ for which the differential expression has genuine dependence on at least one derivative variable $z_\alpha$ with $|\alpha|=r$.
[/definition]
When the equation is written directly in terms of an unknown $u:U\to\mathbb{R}^m$, this says that the order is the largest $r$ for which a derivative $\partial^\alpha u$ with $|\alpha|=r$ enters the specified differential operator.
Order helps predict what data are needed. A first-order time equation generally asks for one initial condition, while a second-order time equation asks for initial position and velocity.
### Linearity
After order identifies which derivatives matter, linearity asks how the unknown enters the equation. This distinction is worth isolating because the linear case supports superposition and operator methods, while nonlinear equations require tools adapted to interaction among solutions. The first model is scalar; linear systems use the same operator idea with matrix-valued coefficients and vector-valued unknowns, but the scalar case is the cleanest place to see the structure.
[definition: Linear Partial Differential Equation]
Let $U$ be an open subset of $\mathbb{R}^n$. A partial differential equation of order $k$ for a scalar unknown $u:U\to\mathbb{R}$ is linear if it can be written in the form
\begin{align*}
\sum_{|\alpha|\leq k} a_\alpha(x)\partial^\alpha u(x)=f(x)
\end{align*}
with prescribed coefficient functions $a_\alpha:U\to\mathbb{R}$ and prescribed forcing term $f:U\to\mathbb{R}$.
[/definition]
Linearity allows superposition, duality, Green functions, and spectral methods. Beyond the basic linear models, most PDEs are nonlinear, and the first question is where the nonlinearity enters. If it avoids the highest derivatives, the equation often keeps some linear estimates; if it changes the highest-order part itself, the geometry and the estimates can depend on the unknown solution.
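Superposition can be stated in one line. Writing $Lu=\sum_{|\alpha|\leq k}a_\alpha\partial^\alpha u$, where the symbol $L$ is introduced only for this remark, suppose $Lu_1=f_1$ and $Lu_2=f_2$. Then for constants $c_1,c_2\in\mathbb{R}$,
\begin{align*}
L(c_1u_1+c_2u_2)=c_1Lu_1+c_2Lu_2=c_1f_1+c_2f_2.
\end{align*}
In particular, the difference of two solutions with the same forcing solves the homogeneous equation $Lw=0$. The identity fails as soon as a term such as $u^3$ appears, since $(u_1+u_2)^3\neq u_1^3+u_2^3$ in general.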
### Nonlinear Dependence
The mildest nonlinear case keeps the highest-order differential operator linear and fixed, while allowing nonlinear lower-order terms. This pattern appears when a linear diffusion, wave, or dispersive operator is coupled to a reaction term depending on the unknown.
[definition: Semilinear Partial Differential Equation]
Let $U$ be an open subset of $\mathbb{R}^n$, let $k\in\mathbb{N}$, and let $u\in C^k(U)$ be a scalar unknown. A partial differential equation of order $k$ is semilinear if it can be written in the form
\begin{align*}
\sum_{|\alpha|=k} a_\alpha(x)\partial^\alpha u(x)+F\left(x,\{\partial^\beta u(x):0\leq |\beta|<k\}\right)=0
\end{align*}
with coefficient functions $a_\alpha:U\to\mathbb{R}$ and a prescribed function
\begin{align*}
F:U\times \prod_{0\leq |\beta|<k}\mathbb{R}&\to\mathbb{R}.
\end{align*}
[/definition]
Semilinear equations preserve a fixed principal part. Quasilinear equations are harder because the highest derivatives still enter linearly, but their coefficients now depend on the unknown or its lower derivatives. This is the setting of many geometric and fluid models, where the solution changes the medium in which it evolves.
[definition: Quasilinear Partial Differential Equation]
Let $U$ be an open subset of $\mathbb{R}^n$, let $k\in\mathbb{N}$, and let $u\in C^k(U)$ be a scalar unknown. For each $\alpha$ with $|\alpha|=k$, let
\begin{align*}
a_\alpha:U\times \prod_{0\leq |\beta|<k}\mathbb{R}&\to\mathbb{R}
\end{align*}
be a prescribed coefficient function, and let
\begin{align*}
F:U\times \prod_{0\leq |\beta|<k}\mathbb{R}&\to\mathbb{R}
\end{align*}
be prescribed. A partial differential equation of order $k$ is quasilinear if it can be written in the form
\begin{align*}
\sum_{|\alpha|=k} a_\alpha\left(x,\{\partial^\beta u(x):0\leq |\beta|<k\}\right)\partial^\alpha u(x)
+F\left(x,\{\partial^\beta u(x):0\leq |\beta|<k\}\right)=0.
\end{align*}
[/definition]
Quasilinear structure still lets the highest derivatives appear through a linear expression once the lower-order jet of $u$ is fixed, so the top-order operator can be read as a linear operator with coefficients frozen from the solution. The next obstruction occurs when even that frozen-top-order reading breaks down: the Hessian might enter through a determinant, an eigenvalue, or another nonlinear function of the highest derivatives. This case needs its own name because the leading operator is no longer a linear expression in the highest-order variables; convexity, viscosity solutions, and comparison principles often become the main tools.
[definition: Fully Nonlinear Partial Differential Equation]
Let $U$ be an open subset of $\mathbb{R}^n$, let $k\in\mathbb{N}$, let $u\in C^k(U)$ be a scalar unknown, and let
\begin{align*}
G:U\times \prod_{0\leq |\alpha|\leq k}\mathbb{R}&\to\mathbb{R}
\end{align*}
be prescribed. A partial differential equation of order $k$ written as
\begin{align*}
G\left(x,\{\partial^\alpha u(x):0\leq |\alpha|\leq k\}\right)=0
\end{align*}
is fully nonlinear in this chosen jet representation if the dependence of $G$ on the highest-order variables $\{z_\alpha:|\alpha|=k\}$ is not affine after the lower-order variables $\{z_\beta:0\leq |\beta|<k\}$ are fixed: there do not exist prescribed functions
\begin{align*}
a_\alpha:U\times \prod_{0\leq |\beta|<k}\mathbb{R}&\to\mathbb{R}
\end{align*}
for all $\alpha$ with $|\alpha|=k$, and a prescribed lower-order function
\begin{align*}
H:U\times \prod_{0\leq |\beta|<k}\mathbb{R}&\to\mathbb{R}
\end{align*}
such that the highest-order dependence can be written, on the chosen jet variables, in the quasilinear form
\begin{align*}
\sum_{|\alpha|=k} a_\alpha\left(x,\{z_\beta:0\leq |\beta|<k\}\right)z_\alpha
+H\left(x,\{z_\beta:0\leq |\beta|<k\}\right).
\end{align*}
[/definition]
This definition is still tied to the chosen differential expression, but it is meant to exclude equations whose highest derivatives enter linearly after the lower-order jet has been fixed. Rewritings that introduce auxiliary variables or differentiate the equation can change the displayed order and should be treated as a different formulation, not as an automatic reclassification of the original PDE.
[example: Linear, Semilinear, Quasilinear, and Fully Nonlinear Equations]
On an open subset $U\subset\mathbb{R}^n$, the Poisson equation
\begin{align*}
-\Delta u=f
\end{align*}
is linear because
\begin{align*}
-\Delta u=-\sum_{i=1}^n \partial_{x_i}\partial_{x_i}u
\end{align*}
is a sum of prescribed coefficients times derivatives of $u$, with coefficient $-1$ on each second derivative $\partial_{x_i}\partial_{x_i}u$ and coefficient $0$ on all other derivative terms.
For a time-dependent unknown $u(t,x)$, the reaction-diffusion equation
\begin{align*}
\partial_t u-\Delta u+u^3=0
\end{align*}
has fixed differential part
\begin{align*}
\partial_t u-\Delta u=\partial_t u-\sum_{i=1}^n\partial_{x_i}\partial_{x_i}u.
\end{align*}
The remaining term $u^3$ depends on $u$ but on none of its derivatives, so the nonlinearity is of lower order and the equation is semilinear.
For smooth scalar $u$, the divergence-form equation
\begin{align*}
-\operatorname{div}\left((1+|u|^2)\nabla u\right)=f
\end{align*}
can be expanded component by component. Since $u$ is scalar, $|u|^2=u^2$, and
\begin{align*}
\operatorname{div}\left((1+u^2)\nabla u\right)=\sum_{i=1}^n \partial_{x_i}\left((1+u^2)\partial_{x_i}u\right).
\end{align*}
Using the product rule on each summand gives
\begin{align*}
\partial_{x_i}\left((1+u^2)\partial_{x_i}u\right)=\partial_{x_i}(1+u^2)\partial_{x_i}u+(1+u^2)\partial_{x_i}\partial_{x_i}u.
\end{align*}
By the chain rule,
\begin{align*}
\partial_{x_i}(1+u^2)=2u\partial_{x_i}u.
\end{align*}
Therefore
\begin{align*}
\partial_{x_i}\left((1+u^2)\partial_{x_i}u\right)=2u(\partial_{x_i}u)^2+(1+u^2)\partial_{x_i}\partial_{x_i}u.
\end{align*}
Summing over $i$ and inserting the minus sign gives
\begin{align*}
-\operatorname{div}\left((1+u^2)\nabla u\right)=-\sum_{i=1}^n(1+u^2)\partial_{x_i}\partial_{x_i}u-2u\sum_{i=1}^n(\partial_{x_i}u)^2.
\end{align*}
The second derivatives occur only through the linear expression $\sum_{i=1}^n -(1+u^2)\partial_{x_i}\partial_{x_i}u$, but the coefficient $1+u^2$ depends on the unknown, so the equation is quasilinear.
For $n\geq 2$, the Monge-Ampère equation
\begin{align*}
\det\left(\partial_{x_i}\partial_{x_j}u\right)=f
\end{align*}
is fully nonlinear because the second derivatives enter through the determinant of the Hessian. In two variables, writing $u_{ij}=\partial_{x_i}\partial_{x_j}u$, the Hessian determinant is
\begin{align*}
\det(D^2u)=u_{11}u_{22}-u_{12}u_{21}.
\end{align*}
Equivalently,
\begin{align*}
\det(D^2u)=(\partial_{x_1}\partial_{x_1}u)(\partial_{x_2}\partial_{x_2}u)-(\partial_{x_1}\partial_{x_2}u)(\partial_{x_2}\partial_{x_1}u).
\end{align*}
This expression contains products of second derivatives, so it is not affine in the highest-order derivative variables. Thus the leading part is no longer a linear expression in the highest derivatives, which is the distinction between fully nonlinear equations and the linear, semilinear, and quasilinear examples above.
[/example]
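The quasilinear expansion in the example can be checked numerically. The following Python sketch works in one space dimension only, which is a simplification made here, and compares a finite-difference evaluation of $-((1+u^2)u')'$ with the expanded form $-(1+u^2)u''-2u(u')^2$. The function names and the sample choice $u=\sin$ are assumptions for illustration.

```python
import math


def divergence_form(u, x, h=1e-4):
    """Central-difference value of -((1 + u^2) u')' at x, in one dimension."""
    def flux(y):
        # (1 + u^2) u' with u' approximated by a central difference
        du = (u(y + h) - u(y - h)) / (2.0 * h)
        return (1.0 + u(y) ** 2) * du
    return -(flux(x + h) - flux(x - h)) / (2.0 * h)


def expanded_form(u, du, d2u, x):
    """Quasilinear expansion -(1 + u^2) u'' - 2 u (u')^2 from exact derivatives."""
    return -(1.0 + u(x) ** 2) * d2u(x) - 2.0 * u(x) * du(x) ** 2


if __name__ == "__main__":
    # Sample unknown u = sin, with u' = cos and u'' = -sin.
    x0 = 0.7
    a = divergence_form(math.sin, x0)
    b = expanded_form(math.sin, math.cos, lambda y: -math.sin(y), x0)
    print(a, b, abs(a - b))
```

The two values should agree up to finite-difference error. The coefficient multiplying $u''$ in `expanded_form` is $1+u^2$, which is where the dependence on the unknown enters the top-order part.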
These distinctions are not taxonomy for its own sake. They tell the reader which tools are likely to survive from the linear theory: semilinear equations often begin with estimates for the fixed linear part, quasilinear equations require controlling coefficients generated by the solution, and fully nonlinear equations usually need monotonicity, convexity, or comparison structures.
Fully nonlinear equations also expose a limitation of weak formulations based on integration by parts. If the equation contains a determinant or an eigenvalue of the Hessian, there may be no useful way to move derivatives onto a test function. Instead of testing by integration, viscosity theory tests whether smooth functions can touch the unknown from above or below without violating the differential inequality.
[definition: Viscosity Solution]
Let $U$ be an open subset of $\mathbb{R}^n$, and let
\begin{align*}
F:U\times\mathbb{R}\times\mathbb{R}^n\times\mathbb{R}^{n\times n}&\to\mathbb{R}
\end{align*}
be prescribed. A [continuous function](/page/Continuous%20Function) $u:U\to\mathbb{R}$ is a viscosity subsolution of
\begin{align*}
F(x,u,\nabla u,D^2u)=0
\end{align*}
if, whenever $\phi\in C^2(U)$ and $u-\phi$ has a local maximum at $x_0\in U$, one has
\begin{align*}
F(x_0,u(x_0),\nabla\phi(x_0),D^2\phi(x_0))&\leq 0.
\end{align*}
It is a viscosity supersolution if, whenever $\phi\in C^2(U)$ and $u-\phi$ has a local minimum at $x_0\in U$, one has
\begin{align*}
F(x_0,u(x_0),\nabla\phi(x_0),D^2\phi(x_0))&\geq 0.
\end{align*}
It is a viscosity solution if it is both a viscosity subsolution and a viscosity supersolution.
[/definition]
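The definition replaces missing pointwise second derivatives of $u$ by the derivatives of smooth comparison functions. It is consistent with the classical notion when $F$ is nonincreasing in its Hessian argument, as for $F(x,r,p,X)=-\operatorname{tr}X-f(x)$ under this sign convention. Its strength is that it remains meaningful for merely continuous solutions, which is exactly the regularity one often has before proving further estimates.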
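A standard first-order illustration, added here as a sketch, shows how the two touching conditions act at a corner. Let $U=(-1,1)$, $F(x,r,p,X)=|p|-1$, and $u(x)=1-|x|$. Away from $x_0=0$ the function $u$ is smooth with $|u'|=1$, and any test function touching at such a point has the same first derivative, so both inequalities hold with equality. At $x_0=0$, if $u-\phi$ has a local maximum, then $\phi$ lies above the corner and
\begin{align*}
-1\leq \phi'(0)\leq 1,
\end{align*}
so $|\phi'(0)|-1\leq 0$ and the subsolution condition holds. If $u-\phi$ had a local minimum at $0$, then $\phi(x)-\phi(0)\leq -|x|$ near $0$, which would force
\begin{align*}
\phi'(0)\leq -1
\end{align*}
from the right and
\begin{align*}
\phi'(0)\geq 1
\end{align*}
from the left. No differentiable $\phi$ satisfies both, so the supersolution condition is vacuous at the corner, and $u$ is a viscosity solution of $|u'|-1=0$. The reflected function $x\mapsto |x|-1$ is not: the constant test function $\phi=-1$ touches it from below at $0$, where $|\phi'(0)|-1=-1<0$.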
To understand which part of a linear or partially linear equation drives its analytic behaviour, the next step is to isolate the highest-order terms. This separation matters because differentiating a rapidly oscillating function amplifies the highest derivatives most strongly, so the top-order terms usually decide whether estimates behave like diffusion, propagation, or transport. Lower-order terms can change solvability and constants, but the leading differential structure is the first object to read.
[definition: Principal Part]
Let $U$ be an open subset of $\mathbb{R}^n$, let $k\in\mathbb{N}$, let $u\in C^k(U)$ be a scalar unknown, let $a_\alpha:U\to\mathbb{R}$ be coefficient functions for all multi-indices $\alpha$ with $|\alpha|\leq k$, and let $f:U\to\mathbb{R}$ be prescribed. For a linear partial differential equation of order $k$,
\begin{align*}
\sum_{|\alpha|\leq k} a_\alpha(x)\partial^\alpha u(x)=f(x),
\end{align*}
the principal part is the sum of all terms with $|\alpha|=k$:
\begin{align*}
\sum_{|\alpha|=k} a_\alpha(x)\partial^\alpha u(x).
\end{align*}
[/definition]
The principal part controls the highest-frequency behaviour of the equation. Lower-order terms matter for estimates and qualitative behaviour, but the principal part usually determines whether the equation resembles diffusion, oscillation, or transport.
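The frequency scaling is visible in the simplest setting. As a one-dimensional illustration chosen here, with constants $b,c\in\mathbb{R}$, apply the operator $-\frac{d^2}{dx^2}+b\frac{d}{dx}+c$ to $u_\lambda(x)=\sin(\lambda x)$:
\begin{align*}
-u_\lambda''+bu_\lambda'+cu_\lambda=\lambda^2\sin(\lambda x)+b\lambda\cos(\lambda x)+c\sin(\lambda x).
\end{align*}
As $\lambda\to\infty$, the second-order term has size $\lambda^2$, the first-order term has size $\lambda$, and the zeroth-order term stays bounded, so the principal part dominates at high frequency.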
[example: Lower-Order Terms Do Not Change the Principal Part]
For the equation
\begin{align*}
-\Delta u+b(x)\cdot\nabla u+c(x)u=f(x)
\end{align*}
on an open subset $U\subset\mathbb{R}^n$, write $b(x)=(b_1(x),\ldots,b_n(x))$. The Laplacian term is
\begin{align*}
-\Delta u=-\sum_{i=1}^n \partial_{x_i}\partial_{x_i}u.
\end{align*}
The drift term expands as
\begin{align*}
b(x)\cdot\nabla u=\sum_{i=1}^n b_i(x)\partial_{x_i}u.
\end{align*}
Thus the full left-hand side is
\begin{align*}
-\sum_{i=1}^n \partial_{x_i}\partial_{x_i}u+\sum_{i=1}^n b_i(x)\partial_{x_i}u+c(x)u.
\end{align*}
The only second-order derivatives appearing are the terms $-\partial_{x_i}\partial_{x_i}u$, so the principal part is
\begin{align*}
-\Delta u.
\end{align*}
The terms $\sum_{i=1}^n b_i(x)\partial_{x_i}u$ and $c(x)u$ have differential order $1$ and $0$, respectively, so they are lower-order terms.
For example, if $u\in C_c^\infty(U)$, integration by parts in each coordinate has no boundary term and gives
\begin{align*}
-\int_U (\partial_{x_i}\partial_{x_i}u)u\,d\mathcal{L}^n=\int_U (\partial_{x_i}u)^2\,d\mathcal{L}^n.
\end{align*}
Summing over $i$ yields
\begin{align*}
\int_U (-\Delta u)u\,d\mathcal{L}^n=\sum_{i=1}^n\int_U (\partial_{x_i}u)^2\,d\mathcal{L}^n=\int_U |\nabla u|^2\,d\mathcal{L}^n.
\end{align*}
This is why elliptic estimates start from the second-order part; the drift and potential terms must then be controlled as lower-order contributions.
[/example]
## The Three Classical Second-Order Types
Second-order scalar equations in two variables have a traditional classification that still guides intuition in higher dimensions. The classification comes from the [quadratic form](/page/Quadratic%20Form) attached to the second derivatives. Its sign pattern predicts the geometry of information flow.
### Elliptic Behaviour
The first second-order type is the one associated with equilibrium rather than evolution. Its defining feature is a definite principal part, which forces the equation to couple all spatial directions together.
[definition: Elliptic Equation]
Let $U$ be an open subset of $\mathbb{R}^n$, let $a_{ij}:U\to\mathbb{R}$ for $1\leq i,j\leq n$, let $b_i:U\to\mathbb{R}$ for $1\leq i\leq n$, let $c:U\to\mathbb{R}$, and let $f:U\to\mathbb{R}$. For a scalar unknown $u\in C^2(U)$, a second-order linear equation
\begin{align*}
\sum_{i,j=1}^n a_{ij}(x)\partial_{x_i}\partial_{x_j}u(x)
+\sum_{i=1}^n b_i(x)\partial_{x_i}u(x)
+c(x)u(x)
&=f(x)
\end{align*}
is elliptic at $x\in U$ if the principal coefficient matrix
\begin{align*}
A(x)&=(a_{ij}(x))_{1\leq i,j\leq n}:\mathbb{R}^n\to\mathbb{R}^n
\end{align*}
has symmetric part that is definite at $x$.
[/definition]
This is a pointwise classification of the principal part. It says what kind of second-order behaviour is present at a chosen point, but it does not yet give the quantitative lower bounds needed for estimates. When later results require constants that work throughout a domain, the stronger uniform ellipticity hypotheses will be stated separately.
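For $n=2$ the pointwise test can be written out by hand. The Python sketch below, with a hypothetical function name, uses the fact that a symmetric $2\times 2$ matrix is positive or negative definite exactly when its determinant is positive; it is an illustration of the definition at one point, not a substitute for the uniform hypotheses mentioned above.

```python
def is_elliptic_2x2(a):
    """Check pointwise ellipticity for a 2x2 principal coefficient matrix.

    `a` is [[a11, a12], [a21, a22]] evaluated at one point x.  Only the
    symmetric part matters, and a symmetric 2x2 matrix is (positive or
    negative) definite exactly when its determinant is positive.
    """
    s11 = a[0][0]
    s22 = a[1][1]
    s12 = 0.5 * (a[0][1] + a[1][0])  # off-diagonal entry of the symmetric part
    return s11 * s22 - s12 * s12 > 0


if __name__ == "__main__":
    print(is_elliptic_2x2([[1, 0], [0, 1]]))    # identity matrix (Laplacian)
    print(is_elliptic_2x2([[1, 0], [0, -1]]))   # indefinite: one sign of each
    print(is_elliptic_2x2([[1, 5], [-5, 1]]))   # antisymmetric part is ignored
```

The third call has symmetric part equal to the identity, so it is classified in the same way as the first.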
Elliptic equations model equilibrium. They do not describe time evolution; instead, they spread boundary information throughout the domain. For the model equation $\Delta u=0$, the central obstruction is to rule out hidden interior peaks or troughs that are not forced by boundary data. The maximum principle supplies exactly this control by turning qualitative ellipticity into a boundary-dominance statement.
[quotetheorem:32]
The maximum principle says that a harmonic function has no interior source of new extremes. Boundary values govern the whole region, which is one reason elliptic problems are often boundary value problems.
For this formulation, no smoothness of $\partial U$ is needed: the continuity of $u$ on $\overline{U}$ supplies the boundary values, and boundedness of $U$ makes $\overline{U}$ compact, so the extrema are attained.
### Parabolic Behaviour
Equilibrium is only one way a second-order spatial operator appears. If an elliptic spatial part is coupled to a single forward time direction, the resulting equations describe diffusion and irreversible smoothing.
[definition: Parabolic Equation]
Let $I$ be an open interval in $\mathbb{R}$, let $U$ be an open subset of $\mathbb{R}^n$, let $X$ be the space of functions $u:I\times U\to\mathbb{R}$ whose time derivative $\partial_t u$ and spatial derivatives $\partial_{x_i}u$, $\partial_{x_i}\partial_{x_j}u$ are continuous on $I\times U$, let $Y=C^0(I\times U)$, and let $a_{ij},b_i,c\in C^0(I\times U)$ be real-valued coefficient functions. Write $A_s(t,x)$ for the symmetric part of $A(t,x)=(a_{ij}(t,x))_{1\leq i,j\leq n}$. Define the linear operator $L:X\to Y$ by
\begin{align*}
L(u)&=\partial_t u-\sum_{i,j=1}^n a_{ij}(t,x)\partial_{x_i}\partial_{x_j}u+\sum_{i=1}^n b_i(t,x)\partial_{x_i}u+c(t,x)u
\end{align*}
for $u\in X$. The equation $Lu=f$ with $f\in Y$ is uniformly parabolic if there exists $\theta>0$ such that
\begin{align*}
\sum_{i,j=1}^n (A_s(t,x))_{ij}\xi_i\xi_j\geq \theta |\xi|^2
\end{align*}
for every $(t,x)\in I\times U$ and every $\xi\in\mathbb{R}^n$.
[/definition]
The signs on the lower-order terms are part of this chosen convention; parabolicity is governed by the time derivative together with the uniformly elliptic second-order spatial part. Nonsymmetric coefficients are allowed only insofar as their symmetric part is uniformly positive definite, since the antisymmetric part does not contribute to the second-derivative quadratic form.
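The claim about the antisymmetric part can be checked in a few lines. Write $A_a=\tfrac12(A-A^\top)$ for the antisymmetric part, a notation introduced here only for this computation, so that $A=A_s+A_a$ and $(A_a)_{ij}=-(A_a)_{ji}$. Swapping the names of the summation indices gives
\begin{align*}
\sum_{i,j=1}^n (A_a)_{ij}\xi_i\xi_j
=\sum_{i,j=1}^n (A_a)_{ji}\xi_j\xi_i
=-\sum_{i,j=1}^n (A_a)_{ij}\xi_i\xi_j,
\end{align*}
so the sum equals its own negative and must vanish. Hence
\begin{align*}
\sum_{i,j=1}^n a_{ij}\xi_i\xi_j=\sum_{i,j=1}^n (A_s)_{ij}\xi_i\xi_j.
\end{align*}
The same index swap, combined with the equality of mixed partial derivatives $\partial_{x_i}\partial_{x_j}u=\partial_{x_j}\partial_{x_i}u$ for functions with continuous second spatial derivatives, shows that
\begin{align*}
\sum_{i,j=1}^n (A_a)_{ij}\partial_{x_i}\partial_{x_j}u=0,
\end{align*}
so $A_a$ drops out of the second-order term as well.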
Parabolic equations model irreversible evolution. Their solutions often become smoother for positive time, but this is not just a regularity slogan: the initial profile may be much rougher than any later-time slice. For the heat equation, the key question is whether convolution with the heat kernel genuinely converts rough initial data into smooth profiles as soon as $t>0$.
[quotetheorem:10007]
This result expresses the directionality of parabolic equations. Rough initial data can become smooth after any positive time, while the reverse-time heat equation is unstable.
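A single Fourier mode gives a quick check of both statements. The following sketch assumes $n=1$ and the normalization $\partial_t u=\partial_x\partial_x u$; the functions $u_k$ and $v_k$ are illustrative and are not part of the theorem. For an integer $k\geq 1$, set
\begin{align*}
u_k(t,x)=e^{-k^2t}\sin(kx).
\end{align*}
Then
\begin{align*}
\partial_t u_k=-k^2u_k=\partial_x\partial_x u_k,
\end{align*}
so $u_k$ solves the heat equation and its amplitude decays like $e^{-k^2t}$. High frequencies are damped fastest, which is the mechanism behind smoothing. For the reverse-time equation $\partial_t v=-\partial_x\partial_x v$, the function
\begin{align*}
v_k(t,x)=\frac{1}{k}e^{k^2t}\sin(kx)
\end{align*}
satisfies
\begin{align*}
\partial_t v_k=k^2v_k=-\partial_x\partial_x v_k.
\end{align*}
Its initial size is
\begin{align*}
\sup_{x\in\mathbb{R}}|v_k(0,x)|=\frac{1}{k},
\end{align*}
which tends to $0$ as $k\to\infty$, while for every fixed $t>0$
\begin{align*}
\sup_{x\in\mathbb{R}}|v_k(t,x)|=\frac{e^{k^2t}}{k}
\end{align*}
tends to infinity. Arbitrarily small data can therefore produce arbitrarily large solutions after any positive time, so the reverse-time problem has no continuous dependence on the data.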
### Hyperbolic Behaviour
Diffusion is not the only time-dependent behaviour. When the principal part carries real characteristic directions, the equation transports information along wave paths rather than instantly mixing it through the whole domain.
[definition: Wave-Type Hyperbolic Equation]
Let $I$ be an interval in $\mathbb{R}$, let $U$ be an open subset of $\mathbb{R}^n$, let $X=C^2(I\times U)$, let $Y=C^0(I\times U)$, and let $a_{ij},b_i,c\in C^0(I\times U)$ be real-valued coefficient functions. Define the linear operator $L:X\to Y$ by
\begin{align*}
L(u)&=\partial_t^2u-\sum_{i,j=1}^n a_{ij}(t,x)\partial_{x_i}\partial_{x_j}u+\sum_{i=1}^n b_i(t,x)\partial_{x_i}u+c(t,x)u
\end{align*}
for $u\in X$. The equation $Lu=f$ with $f\in Y$ is uniformly hyperbolic with respect to $t$ if the matrix $A(t,x)=(a_{ij}(t,x))$ is symmetric and there exists $\theta>0$ such that
\begin{align*}
\sum_{i,j=1}^n a_{ij}(t,x)\xi_i\xi_j\geq \theta |\xi|^2
\end{align*}
for every $(t,x)\in I\times U$ and every $\xi\in\mathbb{R}^n$.
[/definition]
This definition isolates the standard second-order wave-type model rather than the most general principal-symbol formulation of hyperbolicity. Hyperbolic equations model propagation rather than equilibration. The wave equation is the basic example: disturbances travel with finite speed, so the next theorem records the finite region of initial data that can influence a later point.
[quotetheorem:670]
Finite speed is not just a qualitative slogan; it is the mechanism that makes hyperbolic initial-value problems local in spacetime. For the wave equation, data outside the relevant cone cannot affect the value at a point, so one can study solutions by restricting attention to a bounded domain of dependence. The theorem depends on the wave structure and on enough regularity for the energy argument behind the result; it should not be expected for elliptic equations, where boundary data can influence the whole domain, or for parabolic equations, where heat-like diffusion produces immediate long-range effects. This is why the elliptic, parabolic, and hyperbolic trichotomy is useful: it predicts whether the next estimates should be boundary estimates, smoothing estimates, or propagation estimates.
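In one space dimension, finite speed can be read off from an explicit formula. The sketch below assumes $n=1$, a speed $c>0$, and initial data $g\in C^2(\mathbb{R})$ and $h\in C^1(\mathbb{R})$; the symbols $g$ and $h$ are introduced here for illustration. For the problem $\partial_t^2u-c^2\partial_x\partial_x u=0$ with $u(0,x)=g(x)$ and $\partial_t u(0,x)=h(x)$, d'Alembert's formula reads
\begin{align*}
u(t,x)=\frac{1}{2}\bigl(g(x+ct)+g(x-ct)\bigr)+\frac{1}{2c}\int_{x-ct}^{x+ct}h(y)\,dy.
\end{align*}
For $t>0$, the right-hand side involves $g$ and $h$ only on the interval
\begin{align*}
[x-ct,\;x+ct].
\end{align*}
Changing the data outside this interval leaves $u(t,x)$ unchanged, so the interval is the one-dimensional domain of dependence of the point $(t,x)$.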
[example: Three Model Second-Order Equations]
On $\mathbb{R}^n$, the Poisson equation
\begin{align*}
-\Delta u=f
\end{align*}
has principal part
\begin{align*}
-\Delta u=-\sum_{i=1}^n \partial_{x_i}\partial_{x_i}u.
\end{align*}
Thus the principal coefficient matrix is $A=-I_n$, since
\begin{align*}
\sum_{i,j=1}^n A_{ij}\partial_{x_i}\partial_{x_j}u=-\sum_{i=1}^n \partial_{x_i}\partial_{x_i}u.
\end{align*}
For every $\xi\in\mathbb{R}^n$,
\begin{align*}
\xi^\top A\xi=-\sum_{i=1}^n \xi_i^2=-|\xi|^2.
\end{align*}
This matrix is negative definite, so the equation is elliptic. Since no time variable appears, it is an equilibrium equation rather than an evolution equation.
For the heat equation
\begin{align*}
\partial_t u-\Delta u=0,
\end{align*}
the second-order spatial part is again
\begin{align*}
-\Delta u=-\sum_{i=1}^n \partial_{x_i}\partial_{x_i}u,
\end{align*}
and the time part is the first derivative $\partial_t u$. In the parabolic convention used here, the equation reads
\begin{align*}
\partial_t u-\sum_{i,j=1}^n a_{ij}\partial_{x_i}\partial_{x_j}u=0.
\end{align*}
Taking $a_{ij}=\delta_{ij}$ gives
\begin{align*}
\sum_{i,j=1}^n a_{ij}\partial_{x_i}\partial_{x_j}u=\sum_{i=1}^n \partial_{x_i}\partial_{x_i}u=\Delta u.
\end{align*}
The coefficient matrix is $I_n$, and
\begin{align*}
\xi^\top I_n\xi=\sum_{i=1}^n \xi_i^2=|\xi|^2.
\end{align*}
Thus the spatial part is uniformly elliptic while the equation has one forward time derivative, so the heat equation is parabolic and models dissipative evolution.
For the wave equation with $c>0$,
\begin{align*}
\partial_t^2 u-c^2\Delta u=0,
\end{align*}
the spatial term expands as
\begin{align*}
-c^2\Delta u=-c^2\sum_{i=1}^n \partial_{x_i}\partial_{x_i}u.
\end{align*}
In the wave-type hyperbolic convention
\begin{align*}
\partial_t^2u-\sum_{i,j=1}^n a_{ij}\partial_{x_i}\partial_{x_j}u=0,
\end{align*}
taking $a_{ij}=c^2\delta_{ij}$ gives
\begin{align*}
\sum_{i,j=1}^n a_{ij}\partial_{x_i}\partial_{x_j}u=c^2\sum_{i=1}^n \partial_{x_i}\partial_{x_i}u=c^2\Delta u.
\end{align*}
The spatial coefficient matrix is $c^2I_n$, and
\begin{align*}
\xi^\top(c^2I_n)\xi=c^2|\xi|^2.
\end{align*}
Because $c^2>0$, this is uniformly positive definite. The principal part contains $\partial_t^2u$ with the opposite sign from the spatial second derivatives, so the model is wave-type hyperbolic: it describes propagation along cones rather than instantaneous smoothing.
[/example]
## First-Order Equations and Characteristics
First-order equations are the place where geometry enters immediately. A derivative in a direction says how the unknown changes along curves tangent to that direction. Solving the PDE often means following those curves.
[definition: Characteristic Curve]
Let $U$ be an open subset of $\mathbb{R}^n$, let $I$ be an interval in $\mathbb{R}$, and let $b\in C^0(U;\mathbb{R}^n)$ be a vector field. The transport operator associated to $b$ is the map $b\cdot\nabla:C^1(U)\to C^0(U)$ defined by $(b\cdot\nabla)(u)=b\cdot\nabla u$. A characteristic curve for $b\cdot\nabla$ is a differentiable curve $\gamma:I\to U$ satisfying
\begin{align*}
\gamma'(s)=b(\gamma(s))
\end{align*}
for $s\in I$.
[/definition]
Characteristics turn a PDE into an ordinary differential equation along selected paths. This is why ordinary differential equation theory remains part of the foundations of first-order PDE.
[definition: Transport Equation]
Let $U$ be an open subset of $\mathbb{R}^n$, let $b\in C^0(U;\mathbb{R}^n)$ be a vector field, and let $f\in C^0(U)$ be a prescribed function. A transport equation for an unknown $u\in C^1(U)$ has the form
\begin{align*}
b(x)\cdot \nabla u(x)=f(x).
\end{align*}
[/definition]
The transport equation asks for the [directional derivative](/page/Directional%20Derivative) of $u$ along $b$ to match a prescribed source. Along a characteristic curve $\gamma$, the chain rule changes this into an ordinary differential equation for $u(\gamma(s))$.
[quotetheorem:10008]
This theorem explains why characteristics are not merely a picture. They are a coordinate system adapted to the PDE, whenever the vector field is regular enough and the curves cover the region of interest.
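A constant-coefficient case shows the mechanism in full. The following sketch assumes $U=\mathbb{R}^2$ with coordinates $(x_1,x_2)$, the constant field $b=(1,a)$ with $a\in\mathbb{R}$, the source $f=0$, and prescribed values $u(0,x_2)=g(x_2)$ with $g\in C^1(\mathbb{R})$; all of these choices are illustrative. The characteristic through $(0,y)$ solves $\gamma'(s)=(1,a)$ with $\gamma(0)=(0,y)$, so
\begin{align*}
\gamma(s)=(s,\;y+as).
\end{align*}
Along this curve the chain rule gives
\begin{align*}
\frac{d}{ds}u(\gamma(s))=\nabla u(\gamma(s))\cdot\gamma'(s)=b\cdot\nabla u(\gamma(s))=0,
\end{align*}
so $u$ is constant along the characteristic:
\begin{align*}
u(\gamma(s))=u(\gamma(0))=g(y).
\end{align*}
A point $(x_1,x_2)$ lies on the characteristic with $s=x_1$ and $y=x_2-ax_1$, which yields
\begin{align*}
u(x_1,x_2)=g(x_2-ax_1).
\end{align*}
A direct check confirms the result:
\begin{align*}
\partial_{x_1}u+a\,\partial_{x_2}u=-a\,g'(x_2-ax_1)+a\,g'(x_2-ax_1)=0.
\end{align*}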
## Weak Formulations and Function Spaces
Many PDE methods begin by multiplying the equation by a test function and integrating. This does two things at once: it lowers the number of derivatives required from the unknown, and it exposes the algebraic structure needed for estimates. Compact support is the condition that keeps this operation local: after integration by parts, no boundary term appears because the test function is supported in a compact subset of $U$ and has vanished before reaching $\partial U$.
[definition: Test Function]
Let $U$ be an open subset of $\mathbb{R}^n$. A test function on $U$ is a function $\varphi\in C_c^\infty(U)$.
[/definition]
Test functions need not vanish near the boundary of their own support: the support is the closure of the set where the function is nonzero, so points where $\varphi\neq 0$ lie arbitrarily close to that boundary. What matters for PDE is that $\operatorname{supp}\varphi$ is compactly contained in $U$, so $\varphi$ vanishes outside a compact subset of $U$ and, in particular, near $\partial U$. This lets integration by parts ignore boundary terms and also points to a broader idea: if derivatives can be moved onto test functions, then an equation can still make sense when the unknown has no pointwise derivatives, or even when the unknown is a distribution such as a Dirac mass.
For a differential operator on distributions, the signs are normally fixed by duality against test functions. For example, the [distributional derivative](/page/Distributional%20Derivative) is defined by moving the derivative onto the test function and paying the integration-by-parts sign. The next definition assumes that this distributional operator has already been constructed, so it records what it means to solve the resulting identity.
For weak solutions, derivatives are interpreted by testing against compactly supported smooth functions. The notation $\mathcal{D}'(U)$ denotes the space of distributions on $U$, meaning continuous linear functionals on compactly supported smooth test functions. If $f$ is locally integrable, then $T_f\in\mathcal{D}'(U)$ is the [regular distribution](/page/Regular%20Distribution) defined by integration against $f$.
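The sign convention can be seen in the simplest singular example. The sketch below takes $U=\mathbb{R}$ and the Heaviside function $H$, equal to $1$ on $(0,\infty)$ and $0$ elsewhere, both chosen here for illustration, and writes $\delta_0(\varphi)=\varphi(0)$ for the Dirac mass at the origin. On $\mathbb{R}$, the distributional derivative of $T\in\mathcal{D}'(\mathbb{R})$ is defined by
\begin{align*}
T'(\varphi)=-T(\varphi').
\end{align*}
For the regular distribution $T_H$ and a test function $\varphi\in C_c^\infty(\mathbb{R})$, the compact support of $\varphi$ gives
\begin{align*}
(T_H)'(\varphi)=-\int_0^\infty \varphi'(x)\,dx=\varphi(0)=\delta_0(\varphi).
\end{align*}
Thus $(T_H)'=\delta_0$, even though $H$ has no classical derivative at the origin. Both sides of this identity are distributions, and the identity is verified entirely by testing against $\varphi$.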
This raises the basic solvability notion needed before any energy space is chosen: once both sides of an equation are distributions, a solution should be defined by equality of their actions on every test function. The definition below isolates that minimal meaning of solving $P(u)=F$.
[definition: Distributional Solution]
Let $U$ be an open subset of $\mathbb{R}^n$, let $P:\mathcal{D}'(U)\to\mathcal{D}'(U)$ be a differential operator defined on distributions, and let $F\in\mathcal{D}'(U)$ be prescribed data. A distributional solution of $P(u)=F$ is a distribution $u\in\mathcal{D}'(U)$ such that
\begin{align*}
P(u)(\varphi)=F(\varphi)
\end{align*}
for every $\varphi\in C_c^\infty(U)$.
[/definition]
This definition is broader than the Sobolev weak formulations used below. It treats the equation as an identity after testing, while a Sobolev weak solution also commits to a function space such as $H^1_{\mathrm{loc}}(U)$ or $H_0^1(U)$ where energy estimates can be made. To make such energy estimates available for second-order elliptic equations, the differential operator should be arranged so that integration by parts leaves only first derivatives on the solution.
The next step is to specialize the distributional viewpoint to the elliptic operators for which energy methods are effective. Writing the second-order part in divergence form lets one transfer one derivative from $u$ to the test function, so the natural assumptions involve only first weak derivatives of the solution and bounded measurable coefficients.
The standard energy space for divergence-form elliptic equations is the local Sobolev space $H^1_{\mathrm{loc}}(U)$, which consists of the functions that, together with their first weak derivatives, lie in $L^2$ on every compact subset of $U$. The coefficient condition $A\in L^\infty(U;\mathbb{R}^{n\times n})$ means that the matrix-valued coefficient is essentially bounded, and "a.e." means almost everywhere, so pointwise inequalities are required outside a measure-zero exceptional set.
The formal definition below packages these analytic choices into a single class of equations. It records both the operator $-\operatorname{div}(A\nabla u)$ and the uniform ellipticity condition on $A$, because later existence, uniqueness, and regularity results use precisely this combination of weak differentiability, bounded coefficients, and quantitative positivity.
[definition: Divergence-Form Elliptic Equation]
Let $U$ be an open subset of $\mathbb{R}^n$, let $A\in L^\infty(U;\mathbb{R}^{n\times n})$, and define the distributional operator
\begin{align*}
P:H^1_{\mathrm{loc}}(U)&\to \mathcal{D}'(U)
\end{align*}
by
\begin{align*}
P(u)&=-\operatorname{div}(A\nabla u).
\end{align*}
Let $f\in L^1_{\mathrm{loc}}(U)$, identified with the regular distribution
\begin{align*}
T_f(\varphi)&=\int_U f\varphi \, d\mathcal{L}^n.
\end{align*}
A second-order scalar equation is a uniformly elliptic divergence-form equation if it has the form
\begin{align*}
P(u)=T_f
\end{align*}
and there are constants $\theta,\Lambda>0$ such that
\begin{align*}
\sum_{i,j=1}^n A_{ij}(x)\xi_i\xi_j\geq \theta|\xi|^2
\end{align*}
and
\begin{align*}
|A(x)\xi\cdot\eta|\leq \Lambda|\xi|\,|\eta|
\end{align*}
for a.e. $x\in U$ and every $\xi,\eta\in\mathbb{R}^n$.
[/definition]
Divergence form is well suited to weak solutions because integration by parts places only first derivatives on $u$. It is also the form that arises naturally from minimizing energy integrals when the coefficient matrix is symmetric; without symmetry, the same coercive weak theory still applies, but the equation need not come from a scalar energy functional.
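The integration by parts can be written out under extra smoothness. The sketch below assumes, for this computation only, that $u\in C^2(U)$ and that $A$ has $C^1$ entries; $\varphi\in C_c^\infty(U)$ is a test function. The product rule gives
\begin{align*}
\operatorname{div}(\varphi A\nabla u)=\varphi\operatorname{div}(A\nabla u)+A\nabla u\cdot\nabla\varphi.
\end{align*}
The vector field $\varphi A\nabla u$ is $C^1$ with compact support in $U$, so the integral of its divergence vanishes:
\begin{align*}
\int_U \operatorname{div}(\varphi A\nabla u)\,d\mathcal{L}^n=0.
\end{align*}
Combining the two displays yields
\begin{align*}
\int_U -\operatorname{div}(A\nabla u)\,\varphi\,d\mathcal{L}^n=\int_U A\nabla u\cdot\nabla\varphi\,d\mathcal{L}^n.
\end{align*}
The right-hand side involves only first derivatives of $u$ and no derivatives of $A$, so it still makes sense when $u\in H^1_{\mathrm{loc}}(U)$ and $A$ is merely bounded and measurable.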
[definition: Weak Form of a Divergence-Form Equation]
Let $U$ be an open subset of $\mathbb{R}^n$, let $A\in L^\infty_{\mathrm{loc}}(U;\mathbb{R}^{n\times n})$, and define
\begin{align*}
P:H^1_{\mathrm{loc}}(U)&\to\mathcal{D}'(U)
\end{align*}
by
\begin{align*}
P(u)&=-\operatorname{div}(A\nabla u).
\end{align*}
Let $f\in L^1_{\mathrm{loc}}(U)$ and let $T_f\in\mathcal{D}'(U)$ be its regular distribution. A function $u\in H^1_{\mathrm{loc}}(U)$ satisfies the weak form of $P(u)=T_f$ if the residual functional $R_u:C_c^\infty(U)\to\mathbb{R}$ defined by
\begin{align*}
R_u(\varphi)&=\int_U A\nabla u\cdot\nabla\varphi \, d\mathcal{L}^n-\int_U f\varphi \, d\mathcal{L}^n
\end{align*}
vanishes identically, meaning
\begin{align*}
\int_U A\nabla u\cdot\nabla\varphi \, d\mathcal{L}^n
&=
\int_U f\varphi \, d\mathcal{L}^n
\end{align*}
for every $\varphi\in C_c^\infty(U)$.
[/definition]
Although the display is written over $U$, the compact support of $\varphi$ localizes both integrals to a compact subset of $U$, so the local integrability assumptions are enough. The weak form is not a weaker mathematical standard; it is a different formulation whose solutions can often be proved to have additional regularity later. The Sobolev-space framework makes that statement precise.
[quotetheorem:4869]
This theorem is a prototype for modern PDE: convert the differential equation into a functional equation, prove coercivity and boundedness, solve in a [Hilbert space](/page/Hilbert%20Space), and then study regularity afterward.
## Estimates as the Engine of PDE
The same PDE can be approached by formulas, kernels, variational methods, compactness, or geometry, but estimates hold the subject together. An estimate states that some norm of the unknown is controlled by the data. Once such control is available, existence can be obtained by approximation and uniqueness by applying the estimate to a difference.
[definition: A Priori Estimate]
Let $X$ be a solution space, let $Y$ be a data space, and let $P:X\to Y$ denote a partial differential equation problem map. An a priori estimate for $P(u)=g$ is an inequality of the form
\begin{align*}
\|u\|_X\leq C\|g\|_Y
\end{align*}
for every solution $u\in X$ with data $g\in Y$, where $C>0$ is independent of $u$ and $g$.
[/definition]
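For a linear problem, such an estimate gives uniqueness at once. The sketch below assumes, in addition to the definition, that $X$ and $Y$ are normed vector spaces and that $P$ is linear. If $u_1,u_2\in X$ both solve $P(u)=g$, then
\begin{align*}
P(u_1-u_2)=P(u_1)-P(u_2)=g-g=0,
\end{align*}
and applying the estimate to the difference gives
\begin{align*}
\|u_1-u_2\|_X\leq C\|0\|_Y=0.
\end{align*}
Hence $u_1=u_2$. For nonlinear problems the difference of two solutions need not solve the same equation, so this argument requires a separate estimate on differences.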
The phrase "a priori" means the estimate is derived before constructing the solution. For the heat equation, the central estimate is an exact balance law: the $L^2$ norm decreases, and the loss is accounted for exactly by the gradient energy accumulated over time. The next theorem states that balance in a form that can drive uniqueness and compactness arguments.
[quotetheorem:10009]
Energy estimates are powerful because they use the sign structure of the equation. The heat equation dissipates $L^2$ energy, elliptic equations control spatial gradients, and wave equations conserve a different combination of displacement and velocity.
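The dissipation for the heat equation can be seen in a formal computation. The sketch below assumes a smooth solution of $\partial_t u=\Delta u$ on $\mathbb{R}^n$ that decays fast enough at infinity to justify differentiating under the integral and discarding boundary terms; these hypotheses are made here for illustration and may differ from those of the theorem above. Multiplying the equation by $u$ and integrating by parts gives
\begin{align*}
\frac{d}{dt}\,\frac{1}{2}\int_{\mathbb{R}^n}u^2\,dx
=\int_{\mathbb{R}^n}u\,\partial_t u\,dx
=\int_{\mathbb{R}^n}u\,\Delta u\,dx
=-\int_{\mathbb{R}^n}|\nabla u|^2\,dx.
\end{align*}
The right-hand side is nonpositive, which is the sign structure at work. Integrating from $0$ to $t$ yields
\begin{align*}
\frac{1}{2}\int_{\mathbb{R}^n}u(t,x)^2\,dx+\int_0^t\int_{\mathbb{R}^n}|\nabla u(s,x)|^2\,dx\,ds
=\frac{1}{2}\int_{\mathbb{R}^n}u(0,x)^2\,dx.
\end{align*}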
[remark: PDE as Local Law Plus Global Constraint]
A partial differential equation is local in its differential expression, but its solution theory is global. Boundary conditions, topology of the domain, conservation laws, and function spaces decide which local laws produce well-posed mathematical problems.
[/remark]
This local-global tension is why PDE sits at the intersection of analysis, geometry, and mathematical physics. Local derivatives state the law; estimates and spaces determine whether the law selects a function.
## Beyond and Connected Topics
After the basic language is in place, PDE theory branches quickly, and this parent chapter is deliberately selective. Elliptic regularity asks when weak solutions become smoother, building on tools from [Cambridge II Analysis of Functions](/page/Cambridge%20II%20Analysis%20of%20Functions). Conservation laws study shocks, entropy conditions, and the failure of classical differentiability after characteristics cross. Dispersive equations balance oscillation with nonlinearity. Geometric PDE treats curvature and shape as unknowns, connecting with [Cambridge III Differential Geometry](/page/Cambridge%20III%20Differential%20Geometry). Stochastic PDE adds randomness to the forcing or evolution.
Weak, distributional, and viscosity formulations are a second natural continuation. The weak form used above is tailored to an energy identity, while [Distribution](/page/Distribution) theory gives a broader language in which derivatives are defined by testing against compactly supported smooth functions. Viscosity solutions, introduced above for fully nonlinear equations, replace weak derivatives with comparison against smooth test functions from above and below when integration by parts is not the right organizing principle. These distinctions matter when solutions are measures, when sources are singular, when shocks or corners appear, or when no Hilbert-space coercivity is available.
[Boundary regularity](/theorems/99) and compatibility conditions form another branch. A smooth boundary permits traces, normal derivatives, and integration by parts in the familiar form; rough domains force these ideas to be rebuilt with functional analysis. Neumann problems already show the issue: the boundary flux and the interior source cannot be chosen independently.
The chapter should therefore be read as a map. Definitions give the coordinates, examples show why pointwise calculus is not enough, classifications identify the main terrain, and estimates supply the machinery that turns formal equations into theorems.
## References
Androma, [Cambridge IA Differential Equations](/page/Cambridge%20IA%20Differential%20Equations).
Androma, [Cambridge II Analysis of Functions](/page/Cambridge%20II%20Analysis%20of%20Functions).
Androma, [Cambridge III Differential Geometry](/page/Cambridge%20III%20Differential%20Geometry).
Androma, [Differential Forms II: Manifolds and Cohomology](/page/Differential%20Forms%20II%3A%20Manifolds%20and%20Cohomology).
Androma, [Distribution](/page/Distribution).