Distribution — Verification

Raw Database Data

ID	Page ID	Section	Type	Contributor ID	Partition Order	Created At
460	1087	content	create	1	0	Fri Feb 27 2026 21:11:42 GMT+0000 (Coordinated Universal Time)

Partition Content

Classical analysis is built on functions: measurable maps from $\mathbb{R}^n$ (or an open subset) to $\mathbb{R}$ or $\mathbb{C}$. But many objects that arise naturally in mathematics and physics are not functions in any reasonable sense. The Dirac delta — a "unit mass concentrated at a point" — cannot be described by a function, since a function that is zero everywhere except at one point has zero integral. The derivative of the Heaviside step function should be the delta, but classical differentiation fails at the discontinuity. Green's functions of elliptic operators are singular. Shock waves in gas dynamics have discontinuous velocity fields whose "derivatives" carry the physical content of the conservation law.

The theory of distributions, introduced by Laurent Schwartz in the 1940s and 1950s, provides a single framework that handles all of these difficulties. A distribution is not a function but a *continuous linear functional on a space of test functions*. Instead of asking "what is the value of $u$ at $x$?", one asks "what is $u(\varphi)$ for every smooth, compactly supported test function $\varphi$?" This shift from pointwise evaluation to averaged evaluation against test functions is what allows distributions to encompass delta masses, derivatives of discontinuous functions, and solutions to PDEs in the weakest possible sense.

## Motivation

[motivation]

### Why Functions Are Not Enough

Consider the one-dimensional wave equation $u_{tt} - u_{xx} = 0$ on $\mathbb{R} \times (0, \infty)$ with initial data $u(x, 0) = f(x)$ and $u_t(x, 0) = 0$. D'Alembert's formula gives $u(x,t) = \tfrac{1}{2}(f(x+t) + f(x-t))$. If $f$ is $C^2$, this is a classical solution: $u_{tt}$ and $u_{xx}$ exist pointwise and are equal. But if $f$ is merely continuous (say, a triangular pulse with a corner at $x_0$), then $u$ is continuous but not $C^2$, and the equation $u_{tt} = u_{xx}$ has no pointwise meaning along the characteristic lines $x \pm t = x_0$ that carry the corner. Yet the formula still describes the physically correct propagation of the wave. We need a framework in which $u$ "solves" the wave equation without requiring pointwise derivatives to exist.
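
As a rough numerical illustration of this loss of regularity, the following Python sketch evaluates d'Alembert's formula for a triangular pulse and compares the one-sided slopes of $u(\cdot, t)$ at the point the corner has reached. The particular pulse, the sample time $t = 3$, and the helper names `triangular_pulse`, `dalembert`, `one_sided_slopes` and `profile` are assumptions chosen for illustration only.

```python
def triangular_pulse(x: float) -> float:
    """Continuous hat function with corners at x = -1, 0, 1 (illustrative choice of f)."""
    return max(0.0, 1.0 - abs(x))


def dalembert(f, x: float, t: float) -> float:
    """d'Alembert's formula for u_tt = u_xx with u(x, 0) = f(x) and u_t(x, 0) = 0."""
    return 0.5 * (f(x + t) + f(x - t))


def one_sided_slopes(g, x: float, h: float = 1e-6):
    """Left and right difference quotients of g at x."""
    left = (g(x) - g(x - h)) / h
    right = (g(x + h) - g(x)) / h
    return left, right


t = 3.0


def profile(x: float) -> float:
    """Spatial profile u(., t) at the fixed sample time t."""
    return dalembert(triangular_pulse, x, t)


# The corner of f at 0 has travelled along the characteristic x - t = 0 to x = t.
left_slope, right_slope = one_sided_slopes(profile, t)
print(left_slope, right_slope)  # the two slopes differ, so u_x jumps at x = t
```

The formula itself evaluates without difficulty; only the pointwise second derivatives fail to exist on the characteristics through the corners.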

The situation is worse for nonlinear conservation laws. The inviscid Burgers equation $u_t + uu_x = 0$ with smooth initial data can develop discontinuities (shock waves) in finite time. After the shock forms, no classical solution exists, but the physics continues — the shock propagates according to the Rankine-Hugoniot conditions. The equation must be interpreted in a sense that allows discontinuous solutions and their "derivatives" to make sense.

### The Integration-by-Parts Idea

The key observation is that *testing against smooth functions* can replace pointwise evaluation. Suppose $f$ is a smooth function and $\varphi \in C_c^\infty(\Omega)$ is a test function. Integration by parts gives
\begin{align*}
\int_\Omega (\partial^\alpha f)(x)\, \varphi(x) \, d\mathcal{L}^n(x) &= (-1)^{|\alpha|} \int_\Omega f(x)\, (\partial^\alpha \varphi)(x) \, d\mathcal{L}^n(x),
\end{align*}
with no boundary terms because $\varphi$ has compact support. The right-hand side makes sense even when $f$ is merely locally integrable — we never differentiate $f$, only the smooth test function $\varphi$. This suggests defining the "derivative" of $f$ as the *rule* that assigns to each $\varphi$ the number $(-1)^{|\alpha|} \int f \, \partial^\alpha \varphi \, d\mathcal{L}^n$. More generally, any linear functional on test functions that is continuous in a suitable sense can serve as a "generalised function" — a distribution.
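
The identity can be checked numerically in the simplest case $n = 1$, $\alpha = 1$. The sketch below is only an illustration under stated assumptions: it takes $f = \sin$, uses the unnormalised bump $\exp(-1/(1-x^2))$ on $(-1, 1)$ as the test function, and approximates both integrals with a composite midpoint rule. The names `bump`, `bump_prime` and `midpoint` are hypothetical helpers, not part of the text.

```python
import math


def bump(x: float) -> float:
    """Smooth test function supported on [-1, 1]."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def bump_prime(x: float) -> float:
    """Classical derivative of bump, obtained from the chain rule."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2


def midpoint(g, a: float, b: float, n: int = 20000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


f, f_prime = math.sin, math.cos  # a smooth f with a known derivative

# Left side: integral of f' * phi.  Right side: minus the integral of f * phi'.
lhs = midpoint(lambda x: f_prime(x) * bump(x), -1.0, 1.0)
rhs = -midpoint(lambda x: f(x) * bump_prime(x), -1.0, 1.0)
print(lhs, rhs)  # the two values should agree up to quadrature error
```

Only the right-hand computation survives when $f$ is not differentiable, since it never evaluates a derivative of $f$.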

### From Weak Derivatives to Distributions

The weak derivative, central to [Sobolev space](/pages/1018) theory, is a special case: $v \in L^p(\Omega)$ is the weak derivative $\partial^\alpha f$ if $\int v \varphi \, d\mathcal{L}^n = (-1)^{|\alpha|} \int f \, \partial^\alpha \varphi \, d\mathcal{L}^n$ for all $\varphi \in C_c^\infty(\Omega)$. But weak derivatives are required to be *functions* in $L^p$. Distributions remove this restriction: the distributional derivative of $f$ is the functional $\varphi \mapsto (-1)^{|\alpha|} \int f \, \partial^\alpha \varphi \, d\mathcal{L}^n$, which need not be representable by any function. The Heaviside function $H$ has no weak derivative in any $L^p$ (its distributional derivative is the Dirac delta, which is not a function), but it has a perfectly well-defined distributional derivative. The theory of distributions is the completion of the weak derivative idea: every locally integrable function has distributional derivatives of all orders, regardless of regularity.
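
To see concretely that the distributional derivative of $H$ acts as point evaluation, the following sketch computes $-\int H \, \psi_a' \, d\mathcal{L}^1$ for a few translated bumps $\psi_a(x) = \varphi(x - a)$ and compares the result with $\psi_a(0)$. It is a minimal numerical check, not a proof: the bump $\varphi$, the shifts $a$, the midpoint quadrature, and the helper names (`bump`, `bump_prime`, `midpoint`, `pairing_dH`) are all assumptions made for illustration.

```python
import math


def bump(x: float) -> float:
    """Smooth test function supported on [-1, 1]."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def bump_prime(x: float) -> float:
    """Classical derivative of bump, obtained from the chain rule."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2


def midpoint(g, a: float, b: float, n: int = 20000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def pairing_dH(a: float) -> float:
    """Value of -integral(H * psi_a') for psi_a(x) = bump(x - a), assuming |a| < 1."""
    # H * psi_a' vanishes outside [0, a + 1], so it suffices to integrate there.
    return -midpoint(lambda x: bump_prime(x - a), 0.0, a + 1.0)


for a in (0.0, 0.3, -0.5):
    # psi_a(0) = bump(-a); the last two columns should agree up to quadrature error.
    print(a, pairing_dH(a), bump(-a))
```

No locally integrable function reproduces this behaviour for every test function, which is why the derivative of $H$ lies outside every $L^p$.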

[/motivation]

## Test Functions

[definition: Test Function Space]
Let $\Omega \subseteq \mathbb{R}^n$ be a non-empty open set. The **space of test functions** on $\Omega$, denoted $\mathcal{D}(\Omega)$, is the vector space of all smooth functions with compact support contained in $\Omega$:
\begin{align*}
\mathcal{D}(\Omega) &:= \{  \varphi \in C^\infty(\Omega) \mid \mathrm{supp}(\varphi) \subset \Omega \text{ is compact}  \}.
\end{align*}
[/definition]

The support condition is essential: because $\varphi$ vanishes identically outside a compact subset of $\Omega$, integration by parts against $\varphi$ produces no boundary terms. The space $\mathcal{D}(\Omega)$ is non-trivial: a standard construction produces a "bump function" $\varphi \in \mathcal{D}(\mathbb{R}^n)$ by setting $\varphi(x) = c \exp(-1/(1-|x|^2))$ for $|x| < 1$ and $\varphi(x) = 0$ for $|x| \geq 1$, where $c > 0$ is a normalising constant. Translating and rescaling, the function $x \mapsto \varphi((x - x_0)/r)$ belongs to $\mathcal{D}(\Omega)$ whenever the closed ball $\overline{B}(x_0, r)$ is contained in $\Omega$.

[definition: Convergence In Test Function Space]
A sequence $\{\varphi_k\}_{k=1}^\infty \subseteq \mathcal{D}(\Omega)$ **converges** to $\varphi \in \mathcal{D}(\Omega)$ if there exists a compact set $K \subset \Omega$ containing $\mathrm{supp}(\varphi_k)$ for every $k \in \mathbb{N}$, and $\partial^\alpha \varphi_k \to \partial^\alpha \varphi$ uniformly on $K$ for every multi-index $\alpha \in \mathbb{N}_0^n$.
[/definition]

The role of $K$ in this definition deserves emphasis. Because $\mathrm{supp}(\varphi_k) \subseteq K$ for every $k$, each $\varphi_k$ is identically zero on $\Omega \setminus K$, and so is every derivative $\partial^\alpha \varphi_k$. Reading the convergence $\partial^\alpha \varphi_k \to \partial^\alpha \varphi$ as holding pointwise on all of $\Omega$ (and uniformly on $K$), the vanishing of $\partial^\alpha \varphi_k$ on $\Omega \setminus K$ forces $\partial^\alpha \varphi = 0$ on $\Omega \setminus K$ as well, since a pointwise limit of zeros is zero. In particular, $\mathrm{supp}(\varphi) \subseteq K$, and the limit $\varphi$ is uniquely determined on all of $\Omega$: it is pinned down by uniform convergence on $K$ and vanishes on $\Omega \setminus K$. The set $K$ itself is not unique (any larger compact subset of $\Omega$ also works); what matters is that a single compact set contains every support, and the limit inherits the same support constraint.

This convergence is very strong: it requires all supports to be contained in a single compact set (no mass escaping to infinity or to the boundary of $\Omega$) and all derivatives to converge uniformly. The topology on $\mathcal{D}(\Omega)$ is *not* metrisable: it is the strict inductive limit of the Fréchet spaces $\mathcal{D}_K(\Omega) = \{\varphi \in C^\infty(\Omega) : \mathrm{supp}(\varphi) \subseteq K\}$ over an exhaustion of $\Omega$ by compact sets. This technical point rarely matters in practice: a linear functional on $\mathcal{D}(\Omega)$ is continuous if and only if it is sequentially continuous (i.e. it preserves convergent sequences), so sequences suffice for defining and checking distributions.
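
A short worked example may make the support condition concrete. It is a sketch under the assumptions that $\Omega = \mathbb{R}$ and that $\psi \in \mathcal{D}(\mathbb{R})$ is a fixed non-zero bump function; the names $\psi$ and $\varphi_k$ are introduced here for illustration only.
\begin{align*}
\varphi_k(x) &:= \tfrac{1}{k}\, \psi(x - k), \\
\sup_{x \in \mathbb{R}} |\varphi_k^{(m)}(x)| &= \tfrac{1}{k} \sup_{x \in \mathbb{R}} |\psi^{(m)}(x)| \to 0 \quad \text{for every } m \in \mathbb{N}_0.
\end{align*}
Every derivative tends to zero uniformly on $\mathbb{R}$, yet $\mathrm{supp}(\varphi_k) = \mathrm{supp}(\psi) + k$ eventually leaves every compact set. No single $K$ contains all the supports, so $\{\varphi_k\}$ does not converge in $\mathcal{D}(\mathbb{R})$.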

## The Space of Distributions

[definition: Distribution]
Let $\Omega \subseteq \mathbb{R}^n$ be a non-empty open set. A **distribution** on $\Omega$ is a continuous linear functional on $\mathcal{D}(\Omega)$: a linear map $T: \mathcal{D}(\Omega) \to \mathbb{R}$ (or $\mathbb{C}$) such that $T(\varphi_k) \to T(\varphi)$ whenever $\varphi_k \to \varphi$ in $\mathcal{D}(\Omega)$.
[/definition]

[definition:Space Of Distributions]
The **space of distributions** on $\Omega$, denoted $\mathcal{D}'(\Omega)$, is the set of all distributions on $\Omega$. It is equipped with the weak* topology: a sequence $\{T_k\}_{k=1}^\infty \subseteq \mathcal{D}'(\Omega)$ converges to $T \in \mathcal{D}'(\Omega)$ if $T_k(\varphi) \to T(\varphi)$ for every $\varphi \in \mathcal{D}(\Omega)$.
[/definition]
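
As an illustration of this notion of convergence, consider the following sketch on $\Omega = \mathbb{R}$ with the hypothetical choice $T_k := T_{\sin(k \, \cdot)}$, the regular distribution generated by $x \mapsto \sin(kx)$. For $\varphi \in \mathcal{D}(\mathbb{R})$, one integration by parts (with no boundary terms, since $\varphi$ has compact support) gives
\begin{align*}
T_k(\varphi) &= \int_{\mathbb{R}} \sin(kx)\, \varphi(x) \, d\mathcal{L}^1(x) = \frac{1}{k} \int_{\mathbb{R}} \cos(kx)\, \varphi'(x) \, d\mathcal{L}^1(x), \\
|T_k(\varphi)| &\leq \frac{1}{k} \int_{\mathbb{R}} |\varphi'(x)| \, d\mathcal{L}^1(x) \to 0 \quad \text{as } k \to \infty.
\end{align*}
Hence $T_k \to 0$ in $\mathcal{D}'(\mathbb{R})$, even though the functions $\sin(kx)$ do not converge pointwise almost everywhere. Convergence in $\mathcal{D}'$ records averaged behaviour against test functions, not pointwise behaviour.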

Writing $T(\varphi)$ for the action of the distribution $T$ on the test function $\varphi$, linearity means $T(\alpha \varphi + \beta \psi) = \alpha T(\varphi) + \beta T(\psi)$ and continuity means that convergent sequences of test functions are mapped to convergent sequences of real numbers. We will frequently write $\partial^\alpha T$ or $T_f$ when the distribution arises from differentiation or integration against a function; the context will always clarify the notation.

### Regular and Singular Distributions

Every locally integrable function defines a distribution, but not every distribution arises this way.

[definition:Regular Distribution]
Let $f \in L^1_\mathrm{loc}(\Omega)$. The **regular distribution** associated to $f$ is the distribution $T_f \in \mathcal{D}'(\Omega)$ defined by
\begin{align*}
T_f: \mathcal{D}(\Omega) &\to \mathbb{R} \\
\varphi &\mapsto \int_\Omega f(x)\, \varphi(x) \, d\mathcal{L}^n(x).
\end{align*}
[/definition]

Linearity of $T_f$ follows from linearity of the integral. For continuity, suppose $\varphi_k \to 0$ in $\mathcal{D}(\Omega)$, with all supports in a compact set $K$. Then $|T_f(\varphi_k)| \leq \|\varphi_k\|_{L^\infty(K)} \int_K |f| \, d\mathcal{L}^n \to 0$, since $\varphi_k \to 0$ uniformly and $f \in L^1(K)$ (because $f$ is locally integrable and $K$ is compact).

The map $f \mapsto T_f$ from $L^1_\mathrm{loc}(\Omega)$ to $\mathcal{D}'(\Omega)$ is injective: if $T_f = T_g$, then $\int (f - g) \varphi \, d\mathcal{L}^n = 0$ for all $\varphi \in \mathcal{D}(\Omega)$, and a standard approximation argument (mollify $\mathrm{sgn}(f - g)$ on compact subsets) forces $f = g$ almost everywhere. We may therefore identify $L^1_\mathrm{loc}(\Omega)$ with a subspace of $\mathcal{D}'(\Omega)$ and write $f$ for $T_f$ when no confusion arises.

Distributions that are not representable by any locally integrable function are called **singular distributions**. The most important example is the Dirac delta.

[example:Dirac Delta]
For $x_0 \in \Omega$, the **Dirac delta** at $x_0$ is the distribution
\begin{align*}
\delta_{x_0}: \mathcal{D}(\Omega) &\to \mathbb{R} \\
\varphi &\mapsto \varphi(x_0).
\end{align*}
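Linearity is immediate. For continuity, if $\varphi_k \to 0$ in $\mathcal{D}(\Omega)$, then $\varphi_k \to 0$ uniformly, so $\delta_{x_0}(\varphi_k) = \varphi_k(x_0) \to 0$. The delta is singular: no $f \in L^1_\mathrm{loc}(\Omega)$ satisfies $\int f \varphi \, d\mathcal{L}^n = \varphi(x_0)$ for all $\varphi$. Indeed, fix $r > 0$ with $\overline{B}(x_0, r) \subset \Omega$ and choose test functions $\varphi_k$ with $0 \leq \varphi_k \leq 1$, $\varphi_k(x_0) = 1$, and $\mathrm{supp}(\varphi_k) \subseteq \overline{B}(x_0, r/k)$. Then $\left|\int f \varphi_k \, d\mathcal{L}^n\right| \leq \int_{\overline{B}(x_0, r/k)} |f| \, d\mathcal{L}^n \to 0$ by dominated convergence, while $\varphi_k(x_0) = 1$ for every $k$.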
[/example]

## The Distributional Derivative

The central operation on distributions — and the primary reason the theory exists — is differentiation. Every distribution has derivatives of all orders, regardless of any notion of regularity.

[definition:Distributional Derivative]
Let $T \in \mathcal{D}'(\Omega)$ and let $\alpha \in \mathbb{N}_0^n$ be a multi-index. The **distributional derivative** of $T$ of order $\alpha$ is the distribution $\partial^\alpha T \in \mathcal{D}'(\Omega)$ defined by
\begin{align*}
\partial^\alpha T: \mathcal{D}(\Omega) &\to \mathbb{R} \\
\varphi &\mapsto (-1)^{|\alpha|} T(\partial^\alpha \varphi).
\end{align*}
[/definition]

The sign $(-1)^{|\alpha|}$ is chosen so that for regular distributions generated by smooth functions, the distributional derivative agrees with the classical one — this is the content of the integration-by-parts formula. The map $\varphi \mapsto \partial^\alpha \varphi$ is continuous on $\mathcal{D}(\Omega)$ (differentiating a convergent sequence in $\mathcal{D}$ produces a convergent sequence), so $\partial^\alpha T$ is indeed a distribution. A key consequence is that *the derivative of any distribution is again a distribution*, so every element of $\mathcal{D}'(\Omega)$ is infinitely differentiable. This is in sharp contrast to classical analysis, where differentiability is a restrictive regularity condition.
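
As a first instance of the definition, the derivatives of the Dirac delta can be computed directly. This is a worked illustration rather than a new result; it only unwinds the definition above for $T = \delta_{x_0}$.
\begin{align*}
(\partial^\alpha \delta_{x_0})(\varphi) &= (-1)^{|\alpha|} \, \delta_{x_0}(\partial^\alpha \varphi) = (-1)^{|\alpha|} \, (\partial^\alpha \varphi)(x_0).
\end{align*}
In one dimension with $x_0 = 0$ this reads $\delta_0'(\varphi) = -\varphi'(0)$ and $\delta_0''(\varphi) = \varphi''(0)$: the derivatives of the delta evaluate derivatives of the test function, with alternating sign.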

The distributional derivative is consistent with both the classical derivative and the weak derivative used in Sobolev space theory — these are proved on the [Distributional Derivative](/pages/1046) page. In particular, if $f \in C^{|\alpha|}(\Omega)$ then $\partial^\alpha T_f = T_{\partial^\alpha f}$, and if $f \in W^{|\alpha|,p}(\Omega)$ with weak derivative $v$, then $\partial^\alpha T_f = T_v$. The three notions of derivative form a hierarchy: classical $\Rightarrow$ weak $\Rightarrow$ distributional, with each level strictly more general than the previous one. The Heaviside function has a distributional derivative ($\delta_0$) but no weak derivative in any $L^p$; a $W^{1,p}$ function has a weak derivative that also serves as its distributional derivative; a $C^1$ function has all three, and they agree.

### Computing Distributional Derivatives

[example:Derivative Of The Heaviside Function]
The Heaviside step function $H(x) = \mathbb{1}_{[0,\infty)}(x)$ is locally integrable on $\mathbb{R}$ and defines a regular distribution $T_H$. For any $\varphi \in \mathcal{D}(\mathbb{R})$:
\begin{align*}
(\partial T_H)(\varphi) &= -T_H(\varphi') = -\int_0^\infty \varphi'(x) \, d\mathcal{L}^1(x) = -[\varphi(x)]_0^\infty = \varphi(0) = \delta_0(\varphi),
\end{align*}
where the boundary term at infinity vanishes because $\varphi$ has compact support. Therefore $\partial T_H = \delta_0$: the distributional derivative of the step function is the Dirac delta. The unit jump discontinuity at $x = 0$ produces a delta mass of weight $1$.

More generally, if $f$ is piecewise $C^1$ on $\mathbb{R}$ with jump discontinuities $[f]_{x_k} = f(x_k^+) - f(x_k^-)$ at finitely many points $\{x_k\}$, then $\partial T_f = T_{f'} + \sum_k [f]_{x_k} \delta_{x_k}$, where $f'$ denotes the classical derivative on the complement of $\{x_k\}$. Each jump contributes a delta mass whose weight equals the size of the jump.
[/example]
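
A companion computation, given here as an illustrative sketch, shows the first two levels of the hierarchy in action for the (hypothetically chosen) function $f(x) = |x|$ on $\mathbb{R}$. Integrating by parts separately on $(0, \infty)$ and $(-\infty, 0)$, where the boundary terms vanish because $x \varphi(x) = 0$ at $x = 0$ and $\varphi$ has compact support,
\begin{align*}
(\partial T_f)(\varphi) &= -\int_0^\infty x \, \varphi'(x) \, d\mathcal{L}^1(x) + \int_{-\infty}^0 x \, \varphi'(x) \, d\mathcal{L}^1(x) \\
&= \int_0^\infty \varphi(x) \, d\mathcal{L}^1(x) - \int_{-\infty}^0 \varphi(x) \, d\mathcal{L}^1(x) = T_{\mathrm{sgn}}(\varphi).
\end{align*}
So $\partial T_{|x|} = T_{\mathrm{sgn}}$ is still a regular distribution (the weak derivative of $|x|$ is $\mathrm{sgn}$). Differentiating once more, and using $\mathrm{sgn} = 2H - 1$ almost everywhere together with the fact that constants have zero derivative,
\begin{align*}
\partial^2 T_{|x|} &= \partial T_{\mathrm{sgn}} = 2 \, \partial T_H = 2 \delta_0,
\end{align*}
which is singular. This agrees with the jump formula above: $\mathrm{sgn}$ has classical derivative $0$ away from the origin and a jump of size $2$ at $x = 0$.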

[example:Distributional Laplacian Of The Newton Potential]
In dimension $n = 3$, the Newton potential $\Phi(x) = (4\pi|x|)^{-1}$ is locally integrable (the singularity at the origin is integrable because $\int_0^1 r^{-1} r^2 \, d\mathcal{L}^1(r) < \infty$ in polar coordinates) and is smooth and harmonic away from the origin: $\Delta \Phi = 0$ on $\mathbb{R}^3 \setminus \{0\}$. Yet the distributional Laplacian detects the singularity. For $\varphi \in \mathcal{D}(\mathbb{R}^3)$, the full computation (carried out on the [Distributional Derivative](/pages/1046) page using Green's identity on $\mathbb{R}^3 \setminus B(0, \varepsilon)$ and taking $\varepsilon \to 0$) gives
\begin{align*}
(\Delta T_\Phi)(\varphi) &= T_\Phi(\Delta \varphi) = -\varphi(0) = -\delta_0(\varphi).
\end{align*}
Therefore $\Delta T_\Phi = -\delta_0$, or equivalently $-\Delta \Phi = \delta_0$ in $\mathcal{D}'(\mathbb{R}^3)$. This is the distributional identity underlying the Poisson equation: the fundamental solution $\Phi$ satisfies $-\Delta \Phi = \delta_0$, and a solution to $-\Delta u = f$ on $\mathbb{R}^3$ (for suitable $f$, for instance $f \in \mathcal{D}(\mathbb{R}^3)$) is the convolution $u = \Phi * f$.
[/example]

## Operations on Distributions

### Multiplication by Smooth Functions

[definition:Smooth Multiplication]
Let $T \in \mathcal{D}'(\Omega)$ and $\psi \in C^\infty(\Omega)$. The **product** $\psi T \in \mathcal{D}'(\Omega)$ is defined by
\begin{align*}
(\psi T): \mathcal{D}(\Omega) &\to \mathbb{R} \\
\varphi &\mapsto T(\psi \varphi).
\end{align*}
[/definition]

This is well-defined because $\psi \varphi \in \mathcal{D}(\Omega)$ whenever $\varphi \in \mathcal{D}(\Omega)$ (the product of a smooth function with a compactly supported smooth function is compactly supported and smooth), and the map $\varphi \mapsto \psi \varphi$ is continuous on $\mathcal{D}(\Omega)$. For regular distributions, $\psi T_f = T_{\psi f}$, so the definition extends pointwise multiplication.
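
Two short computations, included as illustrative sketches, show how the definition behaves on singular distributions. For $\psi \in C^\infty(\Omega)$ and $x_0 \in \Omega$,
\begin{align*}
(\psi \, \delta_{x_0})(\varphi) &= \delta_{x_0}(\psi \varphi) = \psi(x_0) \, \varphi(x_0), \quad \text{so} \quad \psi \, \delta_{x_0} = \psi(x_0) \, \delta_{x_0}.
\end{align*}
In particular $x \, \delta_0 = 0$ on $\mathbb{R}$. For the derivative of the delta on $\mathbb{R}$, with $\psi(x) = x$,
\begin{align*}
(x \, \delta_0')(\varphi) &= \delta_0'(x \varphi) = -(x \varphi)'(0) = -\varphi(0), \quad \text{so} \quad x \, \delta_0' = -\delta_0.
\end{align*}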

One *cannot* multiply two arbitrary distributions: the product $\delta_0 \cdot \delta_0$ has no consistent definition within distribution theory (it would require evaluating $\delta_0$ at a point, which distributions cannot do). This limitation is fundamental — the space $\mathcal{D}'(\Omega)$ is a module over $C^\infty(\Omega)$ but not an algebra. The difficulty of multiplying distributions is the source of the renormalisation problem in quantum field theory and the need for paraproduct decompositions in nonlinear PDE.

The Leibniz rule extends to distributions: for $\psi \in C^\infty(\Omega)$ and $T \in \mathcal{D}'(\Omega)$,
\begin{align*}
\partial_j(\psi T) &= (\partial_j \psi) T + \psi (\partial_j T),
\end{align*}
as is verified directly from the definitions.
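
For completeness, the verification can be written out as follows; this is a sketch that uses only the definitions of the distributional derivative and of smooth multiplication, together with the classical product rule $\psi \, \partial_j \varphi = \partial_j(\psi \varphi) - (\partial_j \psi) \varphi$ for smooth functions.
\begin{align*}
(\partial_j(\psi T))(\varphi) &= -(\psi T)(\partial_j \varphi) = -T(\psi \, \partial_j \varphi) \\
&= -T(\partial_j(\psi \varphi)) + T((\partial_j \psi) \varphi) \\
&= (\partial_j T)(\psi \varphi) + ((\partial_j \psi) T)(\varphi) \\
&= (\psi \, \partial_j T)(\varphi) + ((\partial_j \psi) T)(\varphi).
\end{align*}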

### Support of a Distribution

[definition:Support Of A Distribution]
Let $T \in \mathcal{D}'(\Omega)$. The **support** of $T$, denoted $\mathrm{supp}(T)$, is the complement in $\Omega$ of the largest open set on which $T$ vanishes. Explicitly, $x_0 \notin \mathrm{supp}(T)$ if and only if there exists an open neighbourhood $U$ of $x_0$ such that $T(\varphi) = 0$ for every $\varphi \in \mathcal{D}(U)$.
[/definition]

For a regular distribution $T_f$ with $f \in L^1_\mathrm{loc}(\Omega)$, the support of $T_f$ coincides with the essential support of $f$: the smallest closed set outside of which $f = 0$ almost everywhere. The Dirac delta $\delta_{x_0}$ has support $\{x_0\}$ — it is a distribution concentrated at a single point. The following structure theorem, due to Schwartz, classifies all such distributions.

[theorem:Distributions Supported At A Point]
Let $T \in \mathcal{D}'(\Omega)$ with $\mathrm{supp}(T) = \{x_0\}$ for some $x_0 \in \Omega$. Then $T$ is a finite linear combination of derivatives of the Dirac delta at $x_0$: there exist $N \in \mathbb{N}_0$ and constants $c_\alpha \in \mathbb{R}$ for $|\alpha| \leq N$ such that
\begin{align*}
T &= \sum_{|\alpha| \leq N} c_\alpha \, \partial^\alpha \delta_{x_0}.
\end{align*}
[/theorem]

The theorem says that the only distributions that "live" at a single point are the delta and its derivatives. This is remarkable: without any a priori regularity assumption, the support condition alone forces the distribution to be a finite-order differential operator applied to $\delta_{x_0}$. The proof proceeds by showing that $T$ annihilates every test function that vanishes to sufficiently high order at $x_0$ (by Taylor expansion and a cutoff argument), and then the remaining information is captured by the Taylor coefficients, which are precisely the $c_\alpha$.

## The Hierarchy of Function and Distribution Spaces

The various spaces of functions and distributions form a chain of continuous inclusions that organises the entire theory. When $\Omega = \mathbb{R}^n$, the chain is
\begin{align*}
\mathcal{D}(\mathbb{R}^n) \subseteq \mathcal{S}(\mathbb{R}^n) \subseteq L^p(\mathbb{R}^n) \subseteq \mathcal{S}'(\mathbb{R}^n) \subseteq \mathcal{D}'(\mathbb{R}^n),
\end{align*}
where $1 \leq p \leq \infty$, and all inclusions are continuous. Each space in the chain is larger and allows rougher objects:

**$\mathcal{D}(\mathbb{R}^n)$** (test functions): smooth, compactly supported. The smallest space, but dense in all $L^p$ for $p < \infty$.

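**$\mathcal{S}(\mathbb{R}^n)$** ([Schwartz space](/pages/1050)): smooth functions that decay rapidly together with all their derivatives. Larger than $\mathcal{D}$ (it includes Gaussians), and still dense in $L^p$ for $p < \infty$. The natural domain of the Fourier transform; equipped with a Fréchet topology.

**$L^p(\mathbb{R}^n)$**: measurable functions with finite $L^p$ norm, that is, $\int |f|^p \, d\mathcal{L}^n < \infty$ for $p < \infty$, and essentially bounded functions for $p = \infty$. Contains non-smooth functions (e.g. indicator functions of bounded sets, or functions with a singularity like $|x|^{-\gamma}$ near the origin for $\gamma p < n$) but requires global integrability.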

**$\mathcal{S}'(\mathbb{R}^n)$** ([tempered distributions](/pages/1053)): the dual of $\mathcal{S}$. Contains all $L^p$ functions, all polynomials, and the Dirac delta. The Fourier transform extends to $\mathcal{S}'$ by duality ([quotetheorem:230]). The "temperedness" condition — continuity with respect to the Schwartz semi-norms — excludes distributions of faster-than-polynomial growth, such as $T_{e^{e^x}}$.

**$\mathcal{D}'(\mathbb{R}^n)$** (distributions): the dual of $\mathcal{D}$. The largest space: it contains all tempered distributions and also objects like $T_{e^{e^x}}$ that grow super-polynomially. The Fourier transform is *not* defined on all of $\mathcal{D}'$ — this is the reason for introducing $\mathcal{S}'$ as an intermediate space.

The dual pairing reverses the chain: as the test function space shrinks from $\mathcal{S}$ to $\mathcal{D}$, the dual space grows from $\mathcal{S}'$ to $\mathcal{D}'$. Making the test functions more restrictive (compact support rather than merely rapid decay) allows the distributions to be wilder, because a functional only has to be defined and continuous on a smaller class of test functions.

On a general open set $\Omega \neq \mathbb{R}^n$, the Schwartz space is not defined (it requires decay at infinity, which is only meaningful on all of $\mathbb{R}^n$), and the relevant chain reduces to $\mathcal{D}(\Omega) \subseteq L^p(\Omega) \subseteq \mathcal{D}'(\Omega)$.

## Application to PDEs: Distributional Solutions

### The Concept of a Distributional Solution

The distributional framework provides the weakest notion of "solution" to a PDE: a distributional solution is a distribution that satisfies the equation when tested against all test functions. This is weaker than a weak solution in the Sobolev sense (which requires the solution to be a function in some $L^p$ or $W^{k,p}$ space) and allows genuinely singular solutions.

[definition:Distributional Solution]
Let $L = \sum_{|\alpha| \leq m} a_\alpha(x) \partial^\alpha$ be a linear differential operator with smooth coefficients $a_\alpha \in C^\infty(\Omega)$, and let $f \in \mathcal{D}'(\Omega)$. A distribution $u \in \mathcal{D}'(\Omega)$ is a **distributional solution** of $Lu = f$ if
\begin{align*}
u(L^* \varphi) &= f(\varphi) \quad \text{for every } \varphi \in \mathcal{D}(\Omega),
\end{align*}
where $L^* = \sum_{|\alpha| \leq m} (-1)^{|\alpha|} \partial^\alpha (a_\alpha \, \cdot)$ is the formal adjoint of $L$.
[/definition]

For the Laplacian $L = -\Delta$, the formal adjoint is $L^* = -\Delta$ (since $\Delta$ is formally self-adjoint), so a distributional solution of $-\Delta u = f$ satisfies $u(-\Delta \varphi) = f(\varphi)$ for all $\varphi \in \mathcal{D}(\Omega)$. When $u \in W^{1,1}_\mathrm{loc}(\Omega)$ and $f \in L^1_\mathrm{loc}(\Omega)$, moving one derivative back onto $u$ turns this into the standard weak formulation $\int \nabla u \cdot \nabla \varphi \, d\mathcal{L}^n = \int f \varphi \, d\mathcal{L}^n$.
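
To see where the formula for $L^*$ comes from, consider the following sketch for a first-order operator in one dimension. The operator $L = a(x) \, \partial_x + b(x)$ with $a, b \in C^\infty(\Omega)$, $\Omega \subseteq \mathbb{R}$ open, is a hypothetical example chosen for illustration. The definition gives
\begin{align*}
L^* \varphi &= -\partial_x(a \varphi) + b \varphi = -a \varphi' - a' \varphi + b \varphi,
\end{align*}
and for a smooth function $u$, integration by parts (with no boundary terms, since $\varphi$ has compact support) confirms the duality:
\begin{align*}
\int_\Omega (Lu) \, \varphi \, d\mathcal{L}^1 &= \int_\Omega (a u' + b u) \, \varphi \, d\mathcal{L}^1 = \int_\Omega u \left(-\partial_x(a \varphi) + b \varphi\right) d\mathcal{L}^1 = \int_\Omega u \, (L^* \varphi) \, d\mathcal{L}^1.
\end{align*}
Thus $T_{Lu}(\varphi) = T_u(L^* \varphi)$ for smooth $u$, and the definition of a distributional solution extends exactly this identity to arbitrary $u \in \mathcal{D}'(\Omega)$.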

### Shock Waves as Distributional Solutions

The most physically compelling application of distributional solutions is to conservation laws with discontinuous solutions.

[example:Burgers Equation Shock Wave]
The inviscid Burgers equation in one dimension is
\begin{align*}
u_t + u \, u_x &= 0 \quad \text{on } \mathbb{R} \times (0, \infty),
\end{align*}
or equivalently in conservation form,
\begin{align*}
u_t + \partial_x\!\left(\tfrac{1}{2}u^2\right) &= 0.
\end{align*}
Consider the initial data
\begin{align*}
u_0: \mathbb{R} &\to \mathbb{R} \\
x &\mapsto \begin{cases} 1 & \text{if } x < 0, \\ 0 & \text{if } x > 0. \end{cases}
\end{align*}
No classical solution exists for $t > 0$: characteristics from the left carry the value $1$ at speed $1$, while characteristics from the right carry the value $0$ at speed $0$, and they collide immediately. But a distributional solution exists as a travelling discontinuity. Define
\begin{align*}
u: \mathbb{R} \times (0, \infty) &\to \mathbb{R} \\
(x, t) &\mapsto \begin{cases} 1 & \text{if } x < t/2, \\ 0 & \text{if } x > t/2. \end{cases}
\end{align*}
This is a shock wave propagating at speed $s = 1/2$, which is the Rankine-Hugoniot speed $s = [u^2/2]/[u] = (1/2 - 0)/(1 - 0) = 1/2$.

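**Verification as a distributional solution.** The functions $u$ and $\tfrac{1}{2}u^2$ are bounded and measurable, hence locally integrable on $\mathbb{R} \times (0, \infty)$, and each defines a regular distribution. Because the equation is nonlinear, it is interpreted in conservation form: we must show that $\partial_t T_u + \partial_x T_{u^2/2} = 0$ in $\mathcal{D}'(\mathbb{R} \times (0, \infty))$. Unwinding the definition of the distributional derivative, this means that for every $\varphi \in \mathcal{D}(\mathbb{R} \times (0, \infty))$,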
\begin{align*}
\int_0^\infty \int_{-\infty}^\infty \left(u \, \varphi_t + \tfrac{1}{2}u^2 \, \varphi_x\right) d\mathcal{L}^1(x) \, d\mathcal{L}^1(t) &= 0.
\end{align*}
Split the integral at the shock line $x = t/2$. On $\{x < t/2\}$, $u = 1$ and $u^2/2 = 1/2$, giving $\int\!\!\int_{x < t/2} (\varphi_t + \tfrac{1}{2}\varphi_x) \, d\mathcal{L}^1(x) \, d\mathcal{L}^1(t)$. On $\{x > t/2\}$, $u = 0$ and both terms vanish. Integrating by parts on the region $\{x < t/2\}$ (where $u$ is constant, so $u_t = u_x = 0$ classically), the volume integral vanishes and the boundary contribution along $x = t/2$ is
\begin{align*}
\int_0^\infty \varphi(t/2, t) \left(-s \cdot [u] + [u^2/2]\right) d\mathcal{L}^1(t),
\end{align*}
where $[u] = 1 - 0 = 1$ and $[u^2/2] = 1/2 - 0 = 1/2$ are the jumps across the shock. Since $s = 1/2$, the factor $-s[u] + [u^2/2] = -1/2 + 1/2 = 0$, and the integral vanishes for every $\varphi$. The Rankine-Hugoniot condition is precisely the condition that makes the distributional equation hold across the shock.
[/example]
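
The integration-by-parts step in the shock-wave verification can also be carried out by hand. The following is a sketch; the auxiliary function $F$ is introduced here for illustration. For $\varphi \in \mathcal{D}(\mathbb{R} \times (0, \infty))$ set $F(t) := \int_{-\infty}^{t/2} \varphi(x, t) \, d\mathcal{L}^1(x)$, which is smooth with compact support in $(0, \infty)$. Differentiating under the integral sign and using the fundamental theorem of calculus in $x$,
\begin{align*}
F'(t) &= \tfrac{1}{2} \varphi(t/2, t) + \int_{-\infty}^{t/2} \varphi_t(x, t) \, d\mathcal{L}^1(x), \\
\int_{-\infty}^{t/2} \tfrac{1}{2} \varphi_x(x, t) \, d\mathcal{L}^1(x) &= \tfrac{1}{2} \varphi(t/2, t).
\end{align*}
Since $\int_0^\infty F'(t) \, d\mathcal{L}^1(t) = 0$, the first identity gives $\int_0^\infty \int_{-\infty}^{t/2} \varphi_t \, d\mathcal{L}^1(x) \, d\mathcal{L}^1(t) = -\tfrac{1}{2} \int_0^\infty \varphi(t/2, t) \, d\mathcal{L}^1(t)$. Adding the second identity integrated over $t$,
\begin{align*}
\int_0^\infty \int_{-\infty}^{t/2} \left(\varphi_t + \tfrac{1}{2} \varphi_x\right) d\mathcal{L}^1(x) \, d\mathcal{L}^1(t) &= \left(-\tfrac{1}{2} + \tfrac{1}{2}\right) \int_0^\infty \varphi(t/2, t) \, d\mathcal{L}^1(t) = 0.
\end{align*}
The coefficient $-\tfrac{1}{2} + \tfrac{1}{2}$ is $-s[u] + [u^2/2]$ with $s = 1/2$, so the cancellation is exactly the Rankine-Hugoniot condition.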

### The Heat Kernel as a Distributional Initial-Value Solution

[example:Heat Kernel Distributional Solution]
The heat equation on $\mathbb{R}^n \times (0, \infty)$ is $u_t - \Delta u = 0$. The **heat kernel** (or Gauss-Weierstrass kernel) is
\begin{align*}
K: \mathbb{R}^n \times (0, \infty) &\to \mathbb{R} \\
(x, t) &\mapsto \frac{1}{(4\pi t)^{n/2}} \exp\!\left(-\frac{|x|^2}{4t}\right).
\end{align*}
For each $t > 0$, $K(\cdot, t) \in \mathcal{S}(\mathbb{R}^n)$ (it is a Gaussian, hence smooth and rapidly decaying). A direct computation confirms that $K$ satisfies $K_t - \Delta K = 0$ classically for $t > 0$.
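
The direct computation can be sketched as follows, writing $K = K(x, t)$ and differentiating the explicit formula with the product and chain rules.
\begin{align*}
\partial_t K &= \left(-\frac{n}{2t} + \frac{|x|^2}{4t^2}\right) K, \\
\partial_{x_j} K &= -\frac{x_j}{2t} \, K, \qquad \partial_{x_j}^2 K = \left(-\frac{1}{2t} + \frac{x_j^2}{4t^2}\right) K, \\
\Delta K &= \sum_{j=1}^n \partial_{x_j}^2 K = \left(-\frac{n}{2t} + \frac{|x|^2}{4t^2}\right) K = \partial_t K.
\end{align*}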

The distributional content lies in the initial condition. As $t \to 0^+$, the Gaussian $K(\cdot, t)$ concentrates: its $L^1$ norm is $\int_{\mathbb{R}^n} K(x, t) \, d\mathcal{L}^n(x) = 1$ for every $t > 0$ (by the Gaussian integral), while its support effectively shrinks to $\{0\}$. For any $\varphi \in \mathcal{D}(\mathbb{R}^n)$,
\begin{align*}
\int_{\mathbb{R}^n} K(x, t) \, \varphi(x) \, d\mathcal{L}^n(x) &= \int_{\mathbb{R}^n} \frac{1}{(4\pi t)^{n/2}} e^{-|x|^2/(4t)} \varphi(x) \, d\mathcal{L}^n(x).
\end{align*}
Substituting $y = x / \sqrt{4t}$ gives $\int_{\mathbb{R}^n} \pi^{-n/2} e^{-|y|^2} \varphi(\sqrt{4t}\, y) \, d\mathcal{L}^n(y) \to \varphi(0) \int_{\mathbb{R}^n} \pi^{-n/2} e^{-|y|^2} \, d\mathcal{L}^n(y) = \varphi(0)$ as $t \to 0^+$, by the dominated convergence theorem (using $|\varphi(\sqrt{4t}\, y)| \leq \|\varphi\|_{L^\infty}$ and integrability of the Gaussian). Therefore $T_{K(\cdot,t)} \to \delta_0$ in $\mathcal{D}'(\mathbb{R}^n)$ as $t \to 0^+$.

This says that $K$ is the **fundamental solution** of the heat equation: it is a classical solution for $t > 0$ whose initial data, in the distributional sense, is the Dirac delta. A solution to $u_t - \Delta u = 0$ with initial data $u(\cdot, 0) = f$ (for $f \in L^p(\mathbb{R}^n)$, $1 \leq p \leq \infty$) is then given by convolution: $u(x, t) = (K(\cdot, t) * f)(x) = \int_{\mathbb{R}^n} K(x - y, t) f(y) \, d\mathcal{L}^n(y)$.
[/example]

## References

1. L. Schwartz, *Théorie des Distributions*, 2nd ed. (1966).
2. L. Hörmander, *The Analysis of Linear Partial Differential Operators I* (1983).
3. L. C. Evans, *Partial Differential Equations* (1998).
4. W. Rudin, *Functional Analysis* (1991).
5. F. G. Friedlander and M. Joshi, *Introduction to the Theory of Distributions*, 2nd ed. (1998).

Fri Feb 27 2026 21:11:42 GMT+0000 (Coordinated Universal Time)

Current Content

Classical analysis is built on functions: measurable maps from $\mathbb{R}^n$ (or an open subset) to $\mathbb{R}$. But many natural objects — the Dirac delta, derivatives of discontinuous functions, Green's functions of elliptic operators — are not functions in any reasonable sense. The theory of distributions, introduced by Laurent Schwartz in the 1940s and 1950s, resolves these difficulties by replacing pointwise evaluation with continuous linear functionals on a space of test functions. Instead of asking "what is the value of $u$ at $x$?", one asks "what is $T(\varphi)$ for every smooth, compactly supported $\varphi$?" This shift allows distributions to encompass delta masses, derivatives of discontinuous functions, and solutions to PDEs in the weakest possible sense.

Motivation

[motivation]

Why Functions Are Not Enough

Consider the one-dimensional wave equation $u_{tt} - u_{xx} = 0$ on $\mathbb{R} \times (0, \infty)$ with initial data $u(x, 0) = f(x)$ and $u_t(x, 0) = 0$. D'Alembert's formula gives $u(x,t) = \tfrac{1}{2}(f(x+t) + f(x-t))$. If $f \in C^2(\mathbb{R})$, this is a classical solution: $u_{tt}$ and $u_{xx}$ exist pointwise and are equal. But if $f$ is merely continuous — say, a triangular pulse with a corner — then $u$ is continuous but not $C^2$, and the equation $u_{tt} = u_{xx}$ has no pointwise meaning at the corner. Yet the formula still describes the physically correct propagation of the wave. We need a framework in which $u$ "solves" the wave equation without requiring pointwise derivatives to exist.

The situation is worse for nonlinear conservation laws. The inviscid Burgers equation $u_t + uu_x = 0$ with smooth initial data can develop discontinuities (shock waves) in finite time. After the shock forms, no classical solution exists, but the physics continues — the shock propagates according to the Rankine–Hugoniot conditions. The equation must be interpreted in a sense that allows discontinuous solutions and their "derivatives" to make sense.

The Integration-by-Parts Idea

The key observation is that testing against smooth functions can replace pointwise evaluation. Suppose $f \in C^{|\alpha|}(\Omega)$ and $\varphi \in C_c^\infty(\Omega)$. Integration by parts gives
\begin{align*} \int_\Omega (\partial^\alpha f)(x)\, \varphi(x) \, d\mathcal{L}^n(x) &= (-1)^{|\alpha|} \int_\Omega f(x)\, (\partial^\alpha \varphi)(x) \, d\mathcal{L}^n(x), \end{align*}
with no boundary terms because $\varphi$ has compact support in $\Omega$. The right-hand side makes sense even when $f$ is merely locally integrable — one never differentiates $f$, only the smooth test function $\varphi$. This suggests defining the "derivative" of $f$ as the rule that assigns to each $\varphi$ the number $(-1)^{|\alpha|} \int f \, \partial^\alpha \varphi \, d\mathcal{L}^n$. More generally, any linear functional on test functions that is continuous in a suitable sense can serve as a "generalised function" — a distribution.

From Weak Derivatives to Distributions

The weak derivative, central to Sobolev space theory, is a special case: $v \in L^p(\Omega)$ is the weak derivative $\partial^\alpha f$ if $\int v \varphi \, d\mathcal{L}^n = (-1)^{|\alpha|} \int f \, \partial^\alpha \varphi \, d\mathcal{L}^n$ for all $\varphi \in C_c^\infty(\Omega)$. But weak derivatives are required to be functions in $L^p$. Distributions remove this restriction: the distributional derivative of $T_f$ is the functional $\varphi \mapsto (-1)^{|\alpha|} \int f \, \partial^\alpha \varphi \, d\mathcal{L}^n$, which need not be representable by any locally integrable function. The Heaviside function $H$ has no weak derivative in any $L^p$ (its distributional derivative is the Dirac delta, which is not a function), but $T_H$ has a perfectly well-defined distributional derivative. The theory of distributions is the completion of the weak derivative idea: every locally integrable function generates a distribution whose distributional derivatives of all orders exist, regardless of the regularity of the original function.

[/motivation]

Test Functions

The definition of a distribution requires a space of "probe" functions against which distributions act. Smoothness ensures that integration by parts produces no error terms from non-differentiability; compact support ensures that boundary contributions vanish and all pairings are finite.

[definition: Space Of Test Functions]
Let $\Omega \subseteq \mathbb{R}^n$ be a non-empty open set. The space of test functions on $\Omega$ is
\begin{align*} \mathcal{D}(\Omega) &:= \{\varphi \in C^\infty(\Omega) \mid \mathrm{supp}(\varphi) \subset \Omega \text{ is compact}\}. \end{align*}
It carries the strict inductive limit topology: the finest locally convex topology making each inclusion $\mathcal{D}_K(\Omega) \hookrightarrow \mathcal{D}(\Omega)$ continuous, where $\mathcal{D}_K(\Omega)$ is the Fréchet space of smooth functions supported in a compact set $K \subset \Omega$.
[/definition]

This topology is Hausdorff and complete but not metrisable. Despite the non-metrisability, a sequence $\{\varphi_k\}$ converges to $\varphi$ in $\mathcal{D}(\Omega)$ if and only if all supports lie in a single compact set and all derivatives converge uniformly (Theorem 448). The full construction (bump functions, mollifiers, partitions of unity, and density results) is developed on the Test Functions page.

The Space of Distributions

With the test function space and its topology in hand, a distribution is defined as a continuous linear functional — a linear map from $\mathcal{D}(\Omega)$ to the scalars that is continuous with respect to the strict inductive limit topology.

[definition: Distribution]
Let $\Omega \subseteq \mathbb{R}^n$ be a non-empty open set and equip $\mathcal{D}(\Omega)$ with the strict inductive limit topology. A distribution on $\Omega$ is a continuous linear map $T: \mathcal{D}(\Omega) \to \mathbb{R}$.
[/definition]

The definition is topological: the preimage under $T$ of every open subset of $\mathbb{R}$ must be open in $\mathcal{D}(\Omega)$. Since the strict inductive limit topology on $\mathcal{D}(\Omega)$ is abstract and non-metrisable, it is natural to ask whether there are more concrete ways to verify that a given linear functional is a distribution. The following result provides two equivalent characterisations.

[quotetheorem:449]

The equivalence (1) $\Leftrightarrow$ (2) is the reason one can work with sequences throughout distribution theory despite the non-metrisability of $\mathcal{D}(\Omega)$: a linear functional is continuous if and only if it is sequentially continuous. This equivalence is a consequence of the LF-space structure (strict inductive limit of Fréchet spaces) and can fail for general locally convex spaces. The equivalence (1) $\Leftrightarrow$ (3) gives a quantitative bound: the integer $N_K$ measures how many derivatives of the test function the distribution "uses" on the compact set $K$. When a single $N$ works for all compact sets, $T$ is said to have finite order (at most $N$). For instance, regular distributions generated by $L^1_\mathrm{loc}$ functions have order $0$ (only the supremum of $\varphi$ appears), the Dirac delta also has order $0$, and $\partial^\alpha \delta_{x_0}$ has order $|\alpha|$.
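
Two examples on $\Omega = \mathbb{R}$ may help fix the notion of order. They are sketches that assume the estimate in (3) has the usual form $|T(\varphi)| \leq C_K \sum_{|\alpha| \leq N_K} \sup_K |\partial^\alpha \varphi|$; the distribution $T$ in the second line is a hypothetical example introduced here.
\begin{align*} |\delta_0'(\varphi)| &= |\varphi'(0)| \leq \sup_K |\varphi'|, \\ T(\varphi) &:= \sum_{k=0}^\infty \varphi^{(k)}(k), \qquad |T(\varphi)| \leq \sum_{k=0}^{m} \sup_K |\varphi^{(k)}| \quad \text{when } \mathrm{supp}(\varphi) \subseteq K = [-m, m]. \end{align*}
The first line shows that $\delta_0'$ has order at most $1$. In the second, the sum is finite for each $\varphi$ because $\varphi$ has compact support, so $T$ is a distribution; but the number of derivatives needed grows with $K$, and no single $N$ works for all compact sets, so $T$ does not have finite order.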

[definition: Space Of Distributions]
Let $\Omega \subseteq \mathbb{R}^n$ be a non-empty open set. The space of distributions on $\Omega$, denoted $\mathcal{D}'(\Omega)$, is the set of all distributions on $\Omega$. It is a vector space under pointwise operations:
\begin{align*} (T_1 + T_2)(\varphi) &:= T_1(\varphi) + T_2(\varphi), \\ (\lambda T)(\varphi) &:= \lambda \, T(\varphi) \end{align*}
for $T_1, T_2, T \in \mathcal{D}'(\Omega)$, $\lambda \in \mathbb{R}$, and $\varphi \in \mathcal{D}(\Omega)$.
[/definition]

The space $\mathcal{D}'(\Omega)$ is the continuous dual of the locally convex space $\mathcal{D}(\Omega)$, and as such it carries a natural topology: the weak* topology $\sigma(\mathcal{D}'(\Omega), \mathcal{D}(\Omega))$, which is the coarsest topology making every evaluation map $\operatorname{ev}_\varphi: T \mapsto T(\varphi)$ continuous. By the Pointwise Characterisation of Weak Star Convergence, a net (or sequence) $\{T_k\}$ converges to $T$ in $\sigma(\mathcal{D}'(\Omega), \mathcal{D}(\Omega))$ if and only if
\begin{align*} T_k(\varphi) \to T(\varphi) \quad \text{for every } \varphi \in \mathcal{D}(\Omega). \end{align*}
This is convergence "tested pointwise against all test functions," and it is the standard notion of convergence in distribution theory.
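
A standard illustration, sketched here with the hypothetical choice $f_k := k \, \mathbb{1}_{[0, 1/k]}$ on $\Omega = \mathbb{R}$, shows regular distributions converging to a singular one. For $\varphi \in \mathcal{D}(\mathbb{R})$,
\begin{align*} \left| T_{f_k}(\varphi) - \varphi(0) \right| &= \left| k \int_0^{1/k} \left(\varphi(x) - \varphi(0)\right) d\mathcal{L}^1(x) \right| \leq \sup_{0 \leq x \leq 1/k} |\varphi(x) - \varphi(0)| \to 0 \quad \text{as } k \to \infty, \end{align*}
by continuity of $\varphi$ at the origin. Hence $T_{f_k} \to \delta_0$ in $\mathcal{D}'(\mathbb{R})$, although $f_k(x) \to 0$ for every $x \neq 0$: the weak* limit is not determined by the pointwise limit of the generating functions.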

Regular and Singular Distributions

Every locally integrable function gives rise to a distribution by integration, but not every distribution arises this way. The distinction between these two cases is fundamental.

[definition: Regular Distribution]
Let $f \in L^1_\mathrm{loc}(\Omega)$. The regular distribution generated by $f$ is the distribution $T_f \in \mathcal{D}'(\Omega)$ defined by
\begin{align*} T_f: \mathcal{D}(\Omega) &\to \mathbb{R} \\ \varphi &\mapsto \int_\Omega f(x)\, \varphi(x) \, d\mathcal{L}^n(x). \end{align*}
[/definition]

That $T_f$ is indeed a distribution must be verified. Linearity follows from linearity of the Lebesgue integral. For continuity, one checks the seminorm condition from the Characterisation of Distributions: if $\mathrm{supp}(\varphi) \subseteq K$, then $|T_f(\varphi)| \leq \left(\int_K |f| \, d\mathcal{L}^n\right) \sup_K |\varphi|$, which is a bound of the required form with $N_K = 0$ and $C_K = \int_K |f| \, d\mathcal{L}^n < \infty$ (the finiteness uses local integrability and compactness of $K$). The order-zero bound means that $|T_f(\varphi)|$ is controlled by the supremum of $\varphi$ alone; no derivatives of $\varphi$ enter the estimate.

The map $f \mapsto T_f$ sends locally integrable functions to distributions. A natural question is whether distinct functions generate distinct distributions — i.e., whether $T_f$ retains all the information about $f$ (up to null sets). The answer is yes.

[quotetheorem:450]

The injectivity means that no information is lost in passing from $f$ to $T_f$: if $T_f = T_g$ as distributions, then $f = g$ as elements of $L^1_\mathrm{loc}(\Omega)$. The proof uses mollification: the hypothesis $T_f = T_g$ implies that $f * \rho_\varepsilon = g * \rho_\varepsilon$ pointwise (since mollifiers are test functions), and passing $\varepsilon \to 0$ recovers $f = g$ a.e. via the Lebesgue differentiation theorem. Despite the injectivity, the objects $f$ and $T_f$ remain logically distinct: $f$ is an equivalence class of measurable functions, while $T_f$ is a continuous linear functional on $\mathcal{D}(\Omega)$. We maintain this distinction throughout.

A second compatibility question arises when $f \in L^1(\mathbb{R}^n)$ and $T_f$ is viewed as a tempered distribution: does the distributional Fourier transform $\widehat{T_f}$ (defined by duality as $\widehat{T_f}(\varphi) := T_f(\hat{\varphi})$) agree with the regular distribution $T_{\hat{f}}$ generated by the classical Fourier transform $\hat{f}(\xi) = \int f(x)\,e^{-ix\cdot\xi}\,d\mathcal{L}^n(x)$? The answer is yes — the two notions are compatible, with the proof reducing to Fubini's theorem.

[quotetheorem:718]

[definition: Singular Distribution]
A distribution $T \in \mathcal{D}'(\Omega)$ is singular if there is no $f \in L^1_\mathrm{loc}(\Omega)$ such that $T = T_f$.
[/definition]

The most important singular distribution is the Dirac delta, which evaluates a test function at a prescribed point.

[example: Dirac Delta]
For $x_0 \in \Omega$, the Dirac delta at $x_0$ is the map
\begin{align*} \delta_{x_0}: \mathcal{D}(\Omega) &\to \mathbb{R} \\ \varphi &\mapsto \varphi(x_0). \end{align*}
Linearity is immediate from the linearity of evaluation. For continuity, the seminorm estimate $|\delta_{x_0}(\varphi)| = |\varphi(x_0)| \leq \sup_K |\varphi|$ holds for any compact $K$ containing $x_0$, with $N_K = 0$ and $C_K = 1$. Thus $\delta_{x_0} \in \mathcal{D}'(\Omega)$, and it has order $0$.

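The distribution $\delta_{x_0}$ is singular. To see this, suppose for contradiction that $\delta_{x_0} = T_f$ for some $f \in L^1_\mathrm{loc}(\Omega)$. Then $\int_\Omega f\varphi \, d\mathcal{L}^n = \varphi(x_0)$ for all $\varphi \in \mathcal{D}(\Omega)$. Fix $\psi \in \mathcal{D}(\mathbb{R}^n)$ with $\mathrm{supp}(\psi) \subseteq \overline{B(0,1)}$, $0 \leq \psi \leq 1$ and $\psi(0) = 1$, and set $\varphi_\varepsilon(x) := \psi((x - x_0)/\varepsilon)$, which belongs to $\mathcal{D}(\Omega)$ as soon as $\varepsilon$ is small enough that $\overline{B(x_0, \varepsilon)} \subseteq \Omega$. The right-hand side equals $\varphi_\varepsilon(x_0) = 1$ for every such $\varepsilon$, while the left-hand side satisfies $\left|\int_\Omega f\varphi_\varepsilon \, d\mathcal{L}^n\right| \leq \int_{\overline{B(x_0,\varepsilon)}} |f| \, d\mathcal{L}^n \to 0$ as $\varepsilon \to 0$, by dominated convergence, since $f$ is integrable near $x_0$ and the balls shrink to the null set $\{x_0\}$. The two sides therefore cannot agree for all test functions, contradicting $\delta_{x_0} = T_f$.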
[/example]

The Distributional Derivative

The central operation on distributions — and the primary reason the theory exists — is differentiation. Every distribution has derivatives of all orders, with no regularity requirements whatsoever. The definition is motivated by the integration-by-parts identity from the Motivation section: if $f \in C^{|\alpha|}(\Omega)$ and $\varphi \in \mathcal{D}(\Omega)$, then $\int (\partial^\alpha f)\varphi \, d\mathcal{L}^n = (-1)^{|\alpha|} \int f \, \partial^\alpha\varphi \, d\mathcal{L}^n$. The right-hand side makes sense for any distribution $T$ in place of $T_f$: one simply applies $T$ to $\partial^\alpha \varphi$.

[definition: Distributional Derivative]
Let $T \in \mathcal{D}'(\Omega)$ and let $\alpha \in \mathbb{N}_0^n$ be a multi-index. The distributional derivative of $T$ of order $\alpha$ is the distribution $\partial^\alpha T \in \mathcal{D}'(\Omega)$ defined by
\begin{align*} (\partial^\alpha T)(\varphi) &:= (-1)^{|\alpha|} T(\partial^\alpha \varphi) \quad \text{for every } \varphi \in \mathcal{D}(\Omega). \end{align*}
[/definition]

Three things must be checked. First, $\partial^\alpha \varphi \in \mathcal{D}(\Omega)$ whenever $\varphi \in \mathcal{D}(\Omega)$, since differentiating a smooth compactly supported function yields a smooth function whose support is contained in $\mathrm{supp}(\varphi)$. Second, $\partial^\alpha T$ is linear because $T$ is linear and $\partial^\alpha$ is linear. Third, $\partial^\alpha T$ is continuous: if $\varphi_k \to \varphi$ in $\mathcal{D}(\Omega)$ (with all supports in a compact set $K$ and all derivatives converging uniformly, by the Sequential Characterisation), then $\partial^\alpha \varphi_k \to \partial^\alpha \varphi$ in $\mathcal{D}(\Omega)$ (the supports remain in $K$, and $\partial^\beta(\partial^\alpha \varphi_k) = \partial^{\alpha+\beta}\varphi_k \to \partial^{\alpha+\beta}\varphi$ uniformly for every $\beta$). Continuity of $T$ then gives $T(\partial^\alpha \varphi_k) \to T(\partial^\alpha \varphi)$, hence $(\partial^\alpha T)(\varphi_k) \to (\partial^\alpha T)(\varphi)$.

The sign $(-1)^{|\alpha|}$ is chosen so that the distributional derivative of $T_f$ agrees with the classical derivative when $f \in C^{|\alpha|}(\Omega)$: the integration-by-parts identity gives $(\partial^\alpha T_f)(\varphi) = (-1)^{|\alpha|} T_f(\partial^\alpha \varphi) = (-1)^{|\alpha|} \int f \, \partial^\alpha \varphi \, d\mathcal{L}^n = \int (\partial^\alpha f) \varphi \, d\mathcal{L}^n = T_{\partial^\alpha f}(\varphi)$. Consistency with the weak derivative used in Sobolev space theory is proved on the Distributional Derivative page: if $f \in W^{|\alpha|,p}(\Omega)$ with weak derivative $v$, then $\partial^\alpha T_f = T_v$. The three notions — classical, weak, distributional — form a strict hierarchy, each more general than the last.

A key consequence is that the distributional derivative of any distribution is again a distribution, so every element of $\mathcal{D}'(\Omega)$ is infinitely differentiable in the distributional sense — in sharp contrast to classical analysis, where differentiability is a restrictive regularity condition.

Computing Distributional Derivatives

[example: Derivative Of The Heaviside Function]
The Heaviside step function $H: \mathbb{R} \to \mathbb{R}$ defined by $H(x) = \mathbb{1}_{[0,\infty)}(x)$ is locally integrable and generates a regular distribution $T_H$. For any $\varphi \in \mathcal{D}(\mathbb{R})$:
\begin{align*} (\partial T_H)(\varphi) &= -T_H(\varphi') = -\int_0^\infty \varphi'(x) \, d\mathcal{L}^1(x) = -[\varphi(x)]_0^\infty = \varphi(0) = \delta_0(\varphi), \end{align*}
where the boundary term at $+\infty$ vanishes because $\varphi$ has compact support. Therefore $\partial T_H = \delta_0$: the distributional derivative of $T_H$ is the Dirac delta. The unit jump discontinuity at $x = 0$ produces a delta mass of weight $1$.

More generally, if $g: \mathbb{R} \to \mathbb{R}$ is piecewise $C^1$ with jump discontinuities $[g]_{x_k} := g(x_k^+) - g(x_k^-)$ at finitely many points $\{x_k\}$, then
\begin{align*} \partial T_g &= T_{g'} + \sum_k [g]_{x_k}\, \delta_{x_k}, \end{align*}
where $g'$ denotes the classical derivative on the complement of $\{x_k\}$ (which is locally integrable). Each jump contributes a delta mass whose weight equals the size of the jump.
[/example]
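The identity $\partial T_H = \delta_0$ can be checked numerically on a single test function. The following is a minimal sketch, assuming the standard bump function $\exp(-1/(1-x^2))$ shifted by $0.3$ as the test function and a composite midpoint rule for the integral; the helper names `bump`, `bump_prime` and `midpoint` are illustrative choices, not part of the page.

```python
import math

def bump(x):
    """Standard bump function exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def bump_prime(x):
    """Classical derivative of bump, obtained from the chain rule."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x / (1.0 - x * x) ** 2)

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Assumed test function: phi(x) = bump(x - 0.3), supported in [-0.7, 1.3].
shift = 0.3
phi = lambda x: bump(x - shift)
phi_prime = lambda x: bump_prime(x - shift)

# (dT_H)(phi) = -T_H(phi') = -(integral of phi' over [0, infinity)).
# phi' vanishes beyond x = 1.3, so the integral can stop there.
lhs = -midpoint(phi_prime, 0.0, 1.0 + shift)
rhs = phi(0.0)  # delta_0(phi) = phi(0)
print(lhs, rhs)
```

The two printed numbers should agree up to quadrature error, whichever admissible shift is chosen, since only the value of the test function at the jump enters.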

[example: Distributional Laplacian Of The Newton Potential]
In dimension $n = 3$, the Newton potential $\Phi: \mathbb{R}^3 \setminus \{0\} \to \mathbb{R}$ defined by $\Phi(x) = (4\pi|x|)^{-1}$ extends to a locally integrable function on $\mathbb{R}^3$ (the singularity is integrable because $\int_0^1 r^{-1} \cdot r^2 \, d\mathcal{L}^1(r) < \infty$ in spherical coordinates). The function $\Phi$ is smooth and harmonic on $\mathbb{R}^3 \setminus \{0\}$: $\Delta \Phi(x) = 0$ for $x \neq 0$. Yet the distributional Laplacian of $T_\Phi$ detects the singularity at the origin. For $\varphi \in \mathcal{D}(\mathbb{R}^3)$, the full computation (carried out on the Distributional Derivative page using Green's identity on $\mathbb{R}^3 \setminus B(0, \varepsilon)$ and taking $\varepsilon \to 0$) gives $\Delta T_\Phi = -\delta_0$ in $\mathcal{D}'(\mathbb{R}^3)$. This is a special case of the general result for the fundamental solution of the Laplacian in all dimensions $n \geq 2$, stated as Distributional Laplacian of the Fundamental Solution. The distributional identity $-\Delta T_\Phi = \delta_0$ underlies the Poisson equation: the solution to $-\Delta u = f$ on $\mathbb{R}^3$ (for suitable $f$) is given by the convolution $u = \Phi * f$.
[/example]
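For a radial test function the identity $\Delta T_\Phi = -\delta_0$ reduces to a one-dimensional integral, which makes a quick numerical sanity check possible. The sketch below assumes the radial bump $\varphi(x) = \exp(-1/(1-|x|^2))$ and uses the radial form $\Delta\varphi = \varphi'' + (2/r)\varphi'$ in $\mathbb{R}^3$, so that $T_\Phi(\Delta\varphi) = \int_0^1 (4\pi r)^{-1} \Delta\varphi(r)\, 4\pi r^2 \, dr = \int_0^1 r\,\Delta\varphi(r)\,dr$. The function names are hypothetical.

```python
import math

def phi(r):
    """Radial bump exp(-1/(1 - r^2)) for 0 <= r < 1, zero for r >= 1."""
    if r >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - r * r))

def radial_laplacian(r):
    """Laplacian of the radial bump in R^3: phi'' + (2/r) phi', for r > 0."""
    if r >= 1.0:
        return 0.0
    q = 1.0 - r * r
    a = -2.0 * r / q ** 2                        # a = phi'/phi
    a_prime = -2.0 / q ** 2 - 8.0 * r * r / q ** 3
    p = phi(r)
    first = p * a                                # phi'
    second = p * (a * a + a_prime)               # phi''
    return second + 2.0 * first / r

def newton_pairing(n=20000):
    """Midpoint approximation of T_Phi(Laplacian phi) = int_0^1 r * Lap(phi)(r) dr."""
    h = 1.0 / n
    return h * sum((i + 0.5) * h * radial_laplacian((i + 0.5) * h) for i in range(n))

lhs = newton_pairing()   # (Laplacian of T_Phi)(phi) = T_Phi(Laplacian phi)
rhs = -phi(0.0)          # (-delta_0)(phi) = -phi(0)
print(lhs, rhs)
```

Although $\Delta\Phi = 0$ away from the origin, the pairing does not vanish: it returns $-\varphi(0)$ up to quadrature error, reflecting the delta mass at the singularity.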

Operations on Distributions

Multiplication by Smooth Functions

One can multiply a distribution by a smooth function, but the definition requires care: since a distribution is a functional on test functions (not a pointwise-defined object), multiplication must be defined by transferring the smooth factor to the test function.

[definition: Smooth Multiplication]
Let $T \in \mathcal{D}'(\Omega)$ and $\psi \in C^\infty(\Omega)$. The product $\psi T \in \mathcal{D}'(\Omega)$ is defined by
\begin{align*} (\psi T)(\varphi) &:= T(\psi \varphi) \quad \text{for every } \varphi \in \mathcal{D}(\Omega). \end{align*}
[/definition]

This is well-defined because $\psi \varphi \in \mathcal{D}(\Omega)$ whenever $\varphi \in \mathcal{D}(\Omega)$: the product of a smooth function with a compactly supported smooth function is smooth and has $\mathrm{supp}(\psi\varphi) \subseteq \mathrm{supp}(\varphi)$, which is compact in $\Omega$. Moreover, the map $\varphi \mapsto \psi\varphi$ is continuous on $\mathcal{D}(\Omega)$ (it preserves the common support and, by the classical Leibniz rule, uniform convergence of all derivatives), so $\psi T$ is a distribution. For regular distributions, $\psi T_f = T_{\psi f}$: the distributional product reduces to pointwise multiplication.
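A short worked instance with a singular distribution: for $\psi \in C^\infty(\Omega)$ and $x_0 \in \Omega$, applying the definition to the Dirac delta gives
\begin{align*} (\psi\, \delta_{x_0})(\varphi) &= \delta_{x_0}(\psi\varphi) = \psi(x_0)\, \varphi(x_0) = \psi(x_0)\, \delta_{x_0}(\varphi) \quad \text{for every } \varphi \in \mathcal{D}(\Omega), \end{align*}
so $\psi\, \delta_{x_0} = \psi(x_0)\, \delta_{x_0}$. In particular, on $\Omega = \mathbb{R}$ with $\psi(x) = x$ one obtains $x\, \delta_0 = 0$, although neither factor is zero.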

The product of two arbitrary distributions cannot be defined within the theory. The expression $\delta_0 \cdot \delta_0$, for instance, has no consistent meaning: to evaluate it on a test function $\varphi$, one would need to evaluate $\delta_0$ at the "function" $\delta_0 \cdot \varphi$, but $\delta_0 \cdot \varphi$ is not a test function (it is not even a function). The space $\mathcal{D}'(\Omega)$ is a module over the ring $C^\infty(\Omega)$, but it is not an algebra. This limitation is the source of the renormalisation problem in quantum field theory and the need for paraproduct decompositions in nonlinear PDE.

The Leibniz rule extends to distributional products.

[quotetheorem:452]

This is verified by a direct computation from the definitions of the distributional derivative and smooth multiplication, using the classical Leibniz rule $\psi \, \partial_j \varphi = \partial_j(\psi\varphi) - (\partial_j\psi)\varphi$ to rearrange the argument of $T$. The result is used routinely in localisation arguments: multiplying a distribution by a smooth cutoff function restricts its behaviour to a compact set, and the Leibniz rule controls the error introduced by the cutoff.
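Assuming the quoted theorem is the first-order product rule $\partial_j(\psi T) = (\partial_j \psi)\, T + \psi\, \partial_j T$ (its statement is not reproduced in this section), the direct computation can be sketched as follows, for $\varphi \in \mathcal{D}(\Omega)$:
\begin{align*} (\partial_j(\psi T))(\varphi) &= -(\psi T)(\partial_j \varphi) = -T(\psi\, \partial_j \varphi) \\ &= -T(\partial_j(\psi\varphi)) + T((\partial_j \psi)\varphi) \\ &= (\partial_j T)(\psi\varphi) + ((\partial_j\psi)\, T)(\varphi) \\ &= (\psi\, \partial_j T)(\varphi) + ((\partial_j\psi)\, T)(\varphi). \end{align*}
The second line uses the classical identity $\psi\, \partial_j\varphi = \partial_j(\psi\varphi) - (\partial_j\psi)\varphi$ and the linearity of $T$; the third and fourth use the definitions of the distributional derivative and of smooth multiplication.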

Support of a Distribution

To define the support of a distribution, one first specifies what it means for a distribution to vanish on an open set.

[definition: Vanishing On An Open Set]
Let $T \in \mathcal{D}'(\Omega)$ and let $U \subseteq \Omega$ be open. The distribution $T$ vanishes on $U$ if $T(\varphi) = 0$ for every $\varphi \in \mathcal{D}(\Omega)$ with $\mathrm{supp}(\varphi) \subseteq U$.
[/definition]

Before defining the support, one must know that the collection of all open sets on which $T$ vanishes is closed under arbitrary unions — otherwise the "largest open vanishing set" might not exist, and the support would not be well-defined.

[quotetheorem:453]

The proof is a partition-of-unity argument: a test function supported in the union $\bigcup U_i$ has compact support, which is covered by finitely many $U_i$; a smooth partition of unity subordinate to this finite cover decomposes $\varphi$ into pieces, each supported in some $U_i$, on which $T$ vanishes by hypothesis. With this result in hand, the support is well-defined.

[definition: Support Of A Distribution]
Let $T \in \mathcal{D}'(\Omega)$. The support of $T$, denoted $\mathrm{supp}(T)$, is the complement in $\Omega$ of the union of all open subsets of $\Omega$ on which $T$ vanishes.
[/definition]

By the Localisation of Vanishing, this union is itself an open set on which $T$ vanishes, so $\mathrm{supp}(T)$ is a well-defined closed subset of $\Omega$. Unwinding the definition, $x_0 \in \mathrm{supp}(T)$ precisely when, for every open neighbourhood $U$ of $x_0$ in $\Omega$, there is some $\varphi \in \mathcal{D}(\Omega)$ with $\mathrm{supp}(\varphi) \subseteq U$ and $T(\varphi) \neq 0$; that is, $T$ cannot be "killed" by localising near $x_0$. For a regular distribution $T_f$ with $f \in L^1_\mathrm{loc}(\Omega)$, $\mathrm{supp}(T_f)$ coincides with the essential support of $f$: the smallest closed subset of $\Omega$ outside of which $f = 0$ almost everywhere. The Dirac delta $\delta_{x_0}$ has $\mathrm{supp}(\delta_{x_0}) = \{x_0\}$: it is a distribution concentrated at a single point.

Distributions Supported at a Point

A natural question is: what are all the distributions concentrated at a single point? The following structure theorem, due to Schwartz, gives a complete answer.

[quotetheorem:451]

The theorem asserts that the only distributions supported at a single point are finite linear combinations of the delta and its derivatives at that point — no other behaviour is possible. This is remarkable: without any a priori regularity assumption, the support condition alone forces $T$ to be a finite-order differential operator applied to $\delta_{x_0}$. The key step in the proof is to show that $T$ annihilates every test function that vanishes to sufficiently high order at $x_0$ (using the seminorm estimate from the Characterisation of Distributions together with a Taylor expansion and cutoff argument), so the action of $T$ on an arbitrary test function depends only on the finitely many Taylor coefficients of $\varphi$ at $x_0$.
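To make the form of such a distribution concrete, consider dimension one and write $T = \sum_{k=0}^{N} c_k\, \partial^k \delta_{x_0}$, where the coefficients $c_k \in \mathbb{R}$ and the order $N$ are notation introduced here for illustration. The definition of the distributional derivative then gives
\begin{align*} T(\varphi) &= \sum_{k=0}^{N} c_k\, (\partial^k \delta_{x_0})(\varphi) = \sum_{k=0}^{N} (-1)^k c_k\, \varphi^{(k)}(x_0) \quad \text{for every } \varphi \in \mathcal{D}(\Omega), \end{align*}
so $T$ reads off finitely many derivatives of $\varphi$ at $x_0$ and nothing else. The simplest non-trivial case is $\partial \delta_0$, which acts by $\varphi \mapsto -\varphi'(0)$.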

The Hierarchy of Function and Distribution Spaces

The various spaces of functions and distributions on $\mathbb{R}^n$ are connected by a chain of continuous linear embeddings:
\begin{align*} \mathcal{D}(\mathbb{R}^n) \hookrightarrow \mathcal{S}(\mathbb{R}^n) \hookrightarrow L^p(\mathbb{R}^n) \xrightarrow{\; f \,\mapsto\, T_f \;} \mathcal{S}'(\mathbb{R}^n) \hookrightarrow \mathcal{D}'(\mathbb{R}^n), \end{align*}
where $1 \leq p \leq \infty$. The first two arrows are set-theoretic inclusions. The third is the canonical embedding $f \mapsto T_f$: linear, injective (by the Injectivity of the Canonical Embedding), and continuous, but not a set-theoretic inclusion — it sends an equivalence class of measurable functions to a continuous linear functional. The fourth is restriction of functionals from $\mathcal{S}(\mathbb{R}^n)$ to $\mathcal{D}(\mathbb{R}^n)$.

Moving left to right, the spaces grow: $\mathcal{D}(\mathbb{R}^n)$ (smooth, compactly supported test functions, dense in $L^p$ for $p < \infty$), the Schwartz space $\mathcal{S}(\mathbb{R}^n)$ (smooth, rapidly decaying; the natural domain of the Fourier transform), $L^p(\mathbb{R}^n)$ (integrability without smoothness), $\mathcal{S}'(\mathbb{R}^n)$ (tempered distributions, to which the Fourier transform extends by duality as an automorphism), and $\mathcal{D}'(\mathbb{R}^n)$ (all distributions, including those of super-polynomial growth). Passing to duals reverses the inclusion order: the smaller test function space $\mathcal{D}$ carries a finer topology than the one it inherits from $\mathcal{S}$, so continuity on $\mathcal{D}$ is a weaker requirement than continuity on $\mathcal{S}$, and the distribution spaces grow ($\mathcal{S}' \hookrightarrow \mathcal{D}'$).

[example: A Distribution That Is Not Tempered]
The function $g: \mathbb{R} \to \mathbb{R}$ defined by $g(x) = e^{e^x}$ is locally integrable and generates a regular distribution $T_g \in \mathcal{D}'(\mathbb{R})$. However, $T_g \notin \mathcal{S}'(\mathbb{R})$: the functional $\varphi \mapsto \int_{\mathbb{R}} e^{e^x}\varphi(x) \, d\mathcal{L}^1(x)$ is continuous on $\mathcal{D}(\mathbb{R})$ (where the compact support of $\varphi$ controls the integral) but does not extend continuously to $\mathcal{S}(\mathbb{R})$ (where the rapid decay of $\varphi$ cannot compensate for the super-exponential growth of $g$). For instance, $\varphi(x) = e^{-x^2}$ belongs to $\mathcal{S}(\mathbb{R})$, yet $e^{e^x - x^2} \to \infty$ as $x \to +\infty$, so the integral diverges. The Fourier transform of $T_g$ is therefore not defined as a tempered distribution; this is the obstruction that the temperedness condition is designed to exclude.
[/example]

On a general open set $\Omega \subsetneq \mathbb{R}^n$, the Schwartz space is not defined (rapid decay at infinity is meaningful only on all of $\mathbb{R}^n$), and the chain reduces to $\mathcal{D}(\Omega) \hookrightarrow L^p(\Omega) \xrightarrow{f \mapsto T_f} \mathcal{D}'(\Omega)$.

Application to PDEs: Distributional Solutions

The Concept of a Distributional Solution

The distributional framework provides the weakest notion of "solution" to a PDE: a distributional solution is a distribution that satisfies the equation when tested against all test functions. This is weaker than a weak solution in the Sobolev sense (which requires the solution to belong to some $W^{k,p}$ space) and allows genuinely singular solutions.

The definition uses the formal adjoint of a differential operator. Given a linear differential operator $L = \sum_{|\alpha| \leq m} a_\alpha \partial^\alpha$ with coefficients $a_\alpha \in C^\infty(\Omega)$, its formal adjoint is $L^* = \sum_{|\alpha| \leq m} (-1)^{|\alpha|} \partial^\alpha(a_\alpha \, \cdot\,)$. Since each $a_\alpha$ is smooth, $L^*$ maps $\mathcal{D}(\Omega)$ into $\mathcal{D}(\Omega)$: if $\varphi \in \mathcal{D}(\Omega)$, then $L^*\varphi$ is smooth (by the classical Leibniz rule) and $\mathrm{supp}(L^*\varphi) \subseteq \mathrm{supp}(\varphi)$ (since $\partial^\alpha(a_\alpha \varphi)$ vanishes wherever $\varphi$ does).

[definition: Distributional Solution]
Let $L = \sum_{|\alpha| \leq m} a_\alpha \partial^\alpha$ be a linear differential operator with coefficients $a_\alpha \in C^\infty(\Omega)$, let $L^* = \sum_{|\alpha| \leq m} (-1)^{|\alpha|} \partial^\alpha(a_\alpha \, \cdot\,)$ be its formal adjoint, and let $S \in \mathcal{D}'(\Omega)$. A distribution $T \in \mathcal{D}'(\Omega)$ is a distributional solution of $LT = S$ if
\begin{align*} T(L^* \varphi) &= S(\varphi) \quad \text{for every } \varphi \in \mathcal{D}(\Omega). \end{align*}
[/definition]

For the Laplacian $L = -\Delta$, the formal adjoint is $L^* = -\Delta$ (since $\Delta$ is formally self-adjoint), so a distributional solution of $-\Delta T = S$ satisfies $T(-\Delta \varphi) = S(\varphi)$ for all $\varphi \in \mathcal{D}(\Omega)$. When $T = T_u$ and $S = T_f$ are both regular distributions, this reads $-\int u \, \Delta\varphi \, d\mathcal{L}^n = \int f\varphi \, d\mathcal{L}^n$. If in addition $u$ has a weak gradient $\nabla u \in L^1_\mathrm{loc}(\Omega)$, one integration by parts turns this into the standard weak formulation $\int \nabla u \cdot \nabla \varphi \, d\mathcal{L}^n = \int f\varphi \, d\mathcal{L}^n$.
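A first-order example shows how the formal adjoint arises. On an open interval $\Omega \subseteq \mathbb{R}$, take $L = a\, \partial_x + b$ with $a, b \in C^\infty(\Omega)$ (an operator chosen here for illustration), so that $L^*\varphi = -\partial_x(a\varphi) + b\varphi$. For $u \in C^1(\Omega)$ and $\varphi \in \mathcal{D}(\Omega)$, integrating the first term by parts (no boundary terms, since $\varphi$ has compact support) gives
\begin{align*} T_{Lu}(\varphi) &= \int_\Omega (a\, u' + b\, u)\, \varphi \, d\mathcal{L}^1 = \int_\Omega u \left(-\partial_x(a\varphi) + b\varphi\right) d\mathcal{L}^1 = T_u(L^*\varphi). \end{align*}
Hence a classical solution of $Lu = f$ with $f$ continuous satisfies $T_u(L^*\varphi) = T_f(\varphi)$ for every test function, i.e. $T_u$ is a distributional solution of $LT = T_f$ in the sense of the definition above.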

Shock Waves as Distributional Solutions

The most physically compelling application of distributional solutions is to conservation laws with discontinuous solutions.

[example: Burgers Equation Shock Wave]
The inviscid Burgers equation in one dimension, written in conservation form, is
\begin{align*} u_t + \partial_x\!\left(\tfrac{1}{2}u^2\right) &= 0 \quad \text{on } \mathbb{R} \times (0, \infty). \end{align*}
Consider the initial data $u_0: \mathbb{R} \to \mathbb{R}$ defined by $u_0(x) = 1$ for $x < 0$ and $u_0(x) = 0$ for $x > 0$. No classical solution exists for $t > 0$: characteristics from the left carry the value $1$ at speed $1$, while characteristics from the right carry the value $0$ at speed $0$, and they collide immediately. A distributional solution exists as a travelling discontinuity. Define $u: \mathbb{R} \times (0, \infty) \to \mathbb{R}$ by $u(x,t) = 1$ for $x < t/2$ and $u(x,t) = 0$ for $x > t/2$. This is a shock wave propagating at speed $s = 1/2$, which equals the Rankine–Hugoniot speed $s = [u^2/2]/[u] = (1/2 - 0)/(1 - 0) = 1/2$.

Verification as a distributional solution. The function $u$ is locally integrable on $\mathbb{R} \times (0, \infty)$ and generates a regular distribution $T_u$. The conservation law $u_t + \partial_x(u^2/2) = 0$ holds in the distributional sense if $\partial_t T_u + \partial_x T_{u^2/2} = 0$ in $\mathcal{D}'(\mathbb{R} \times (0,\infty))$, which means
\begin{align*} \int_0^\infty \int_{-\infty}^\infty \left(u \, \varphi_t + \tfrac{1}{2}u^2 \, \varphi_x\right) d\mathcal{L}^1(x) \, d\mathcal{L}^1(t) &= 0 \end{align*}
for every $\varphi \in \mathcal{D}(\mathbb{R} \times (0, \infty))$. Split the integral at the shock line $x = t/2$. On the region $\{x < t/2\}$, $u = 1$ and $u^2/2 = 1/2$, giving $\int\!\!\int_{x < t/2} (\varphi_t + \tfrac{1}{2}\varphi_x) \, d\mathcal{L}^1(x) \, d\mathcal{L}^1(t)$. On $\{x > t/2\}$, $u = 0$ and both terms vanish. Integrating by parts on the region $\{x < t/2\}$ (where $u$ is constant, so the classical equation holds trivially), the volume integral vanishes and the boundary contribution along $x = t/2$ is
\begin{align*} \int_0^\infty \varphi(t/2, t) \left(-s \cdot [u] + [u^2/2]\right) d\mathcal{L}^1(t), \end{align*}
where $[u] = 1 - 0 = 1$ and $[u^2/2] = 1/2 - 0 = 1/2$ are the jumps across the shock, here taken as left state minus right state (the ratio defining $s$ does not depend on this orientation). Since $s = 1/2$, the factor $-s[u] + [u^2/2] = -1/2 + 1/2 = 0$, and the integral vanishes for every $\varphi$. The Rankine–Hugoniot condition is precisely the condition that makes the distributional equation hold across the shock.
[/example]
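The role of the shock speed can be seen numerically by evaluating the weak-form integral $\iint (u\,\varphi_t + \tfrac{1}{2}u^2\varphi_x)$ for a discontinuity travelling along $x = st$ with different values of $s$. The sketch below assumes the product test function $\varphi(x,t) = b(x-1)\,b(t-2)$, where $b$ is the standard bump on $(-1,1)$, and a nested midpoint rule; all names are hypothetical. Only the region $\{x < st\}$ contributes, since $u = 0$ elsewhere.

```python
import math

def bump(x):
    """Standard bump function exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def bump_prime(x):
    """Classical derivative of bump, obtained from the chain rule."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x / (1.0 - x * x) ** 2)

def weak_residual(s, n_t=400, n_x=400):
    """Approximate the double integral of u*phi_t + (u^2/2)*phi_x, where
    u = 1 for x < s*t, u = 0 for x > s*t, and phi(x, t) = bump(x-1)*bump(t-2)."""
    t_lo, t_hi = 1.0, 3.0                      # t-support of phi
    h_t = (t_hi - t_lo) / n_t
    total = 0.0
    for i in range(n_t):
        t = t_lo + (i + 0.5) * h_t
        b_t, db_t = bump(t - 2.0), bump_prime(t - 2.0)
        # u = 1 only for x < s*t, and phi vanishes outside 0 < x < 2.
        x_hi = min(max(s * t, 0.0), 2.0)
        if x_hi <= 0.0:
            continue
        h_x = x_hi / n_x
        inner = 0.0
        for j in range(n_x):
            x = (j + 0.5) * h_x
            b_x, db_x = bump(x - 1.0), bump_prime(x - 1.0)
            inner += b_x * db_t + 0.5 * db_x * b_t   # phi_t + (1/2) phi_x
        total += inner * h_x * h_t
    return total

rankine_hugoniot = weak_residual(0.5)   # shock speed from the example
wrong_speed = weak_residual(1.0)        # a discontinuity moving too fast
print(rankine_hugoniot, wrong_speed)
```

With $s = 1/2$ the residual should vanish up to quadrature error, while for $s = 1$ it is visibly non-zero, consistent with the boundary factor $-s[u] + [u^2/2]$ computed in the example.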

The Heat Kernel as a Distributional Initial-Value Solution

[example: Heat Kernel Distributional Solution]
The heat equation on $\mathbb{R}^n \times (0, \infty)$ is $u_t - \Delta u = 0$. The heat kernel is the function $K: \mathbb{R}^n \times (0, \infty) \to \mathbb{R}$ defined by
\begin{align*} K(x, t) &:= \frac{1}{(4\pi t)^{n/2}} \exp\!\left(-\frac{|x|^2}{4t}\right). \end{align*}
For each fixed $t > 0$, $K(\cdot, t) \in \mathcal{S}(\mathbb{R}^n)$ (it is a Gaussian, hence smooth and rapidly decaying). A direct computation confirms that $K$ satisfies $K_t - \Delta K = 0$ classically for all $t > 0$.

The distributional content lies in the initial condition. As $t \to 0^+$, the Gaussian $K(\cdot, t)$ concentrates: $\int_{\mathbb{R}^n} K(x, t) \, d\mathcal{L}^n(x) = 1$ for every $t > 0$ (by the standard Gaussian integral), while the mass concentrates near $\{0\}$. For any $\varphi \in \mathcal{D}(\mathbb{R}^n)$, the substitution $y = x/\sqrt{4t}$ gives
\begin{align*} T_{K(\cdot,t)}(\varphi) &= \int_{\mathbb{R}^n} K(x, t) \, \varphi(x) \, d\mathcal{L}^n(x) = \int_{\mathbb{R}^n} \pi^{-n/2} e^{-|y|^2} \varphi(\sqrt{4t}\, y) \, d\mathcal{L}^n(y). \end{align*}
As $t \to 0^+$, the integrand converges pointwise to $\pi^{-n/2} e^{-|y|^2} \varphi(0)$, and is dominated by $\pi^{-n/2} e^{-|y|^2} \|\varphi\|_{L^\infty}$, which is integrable. By the dominated convergence theorem,
\begin{align*} T_{K(\cdot,t)}(\varphi) &\to \varphi(0) \int_{\mathbb{R}^n} \pi^{-n/2} e^{-|y|^2} \, d\mathcal{L}^n(y) = \varphi(0) = \delta_0(\varphi). \end{align*}
Therefore $T_{K(\cdot,t)} \to \delta_0$ in $\mathcal{D}'(\mathbb{R}^n)$ as $t \to 0^+$: the heat kernel is the fundamental solution of the heat equation, a classical solution for $t > 0$ whose initial data in the distributional sense is the Dirac delta. A solution to $u_t - \Delta u = 0$ with initial data $u(\cdot, 0) = f$ (for $f \in L^p(\mathbb{R}^n)$, $1 \leq p \leq \infty$) is then given by convolution: $u(x, t) = (K(\cdot, t) * f)(x) = \int_{\mathbb{R}^n} K(x - y, t)\, f(y) \, d\mathcal{L}^n(y)$.
[/example]
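The convergence $T_{K(\cdot,t)} \to \delta_0$ can be observed numerically in dimension $n = 1$. The following sketch assumes the standard bump function on $(-1,1)$ as the test function $\varphi$ and a midpoint rule over its support; the names `heat_kernel` and `pairing` are illustrative.

```python
import math

def bump(x):
    """Standard bump function exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def heat_kernel(x, t):
    """One-dimensional heat kernel K(x, t) = (4 pi t)^(-1/2) exp(-x^2 / (4 t))."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def pairing(t, n=20000):
    """Midpoint approximation of T_{K(., t)}(phi) over supp(phi) = [-1, 1]."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        total += heat_kernel(x, t) * bump(x)
    return h * total

times = [0.1, 0.01, 0.001]
target = bump(0.0)                                   # delta_0(phi) = phi(0)
errors = [abs(pairing(t) - target) for t in times]
for t, err in zip(times, errors):
    print(t, err)
```

The printed errors should shrink as $t$ decreases, in line with the dominated convergence argument in the example.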

References

L. Schwartz, Théorie des Distributions, 2nd ed. (1966).
L. Hörmander, The Analysis of Linear Partial Differential Operators I (1983).
L. C. Evans, Partial Differential Equations (1998).
W. Rudin, Functional Analysis (1991).
F. G. Friedlander and M. Joshi, Introduction to the Theory of Distributions, 2nd ed. (1998).
