Convolution is the operation that "smears" one function against another, producing a weighted average that inherits the best properties of both factors. It is the algebraic operation underlying three pillars of analysis: $L^p$ theory (where [Young's inequality](/theorems/463) controls its [integrability](/page/Integral)), Fourier analysis (where the [Convolution Theorem](/theorems/250) converts it to pointwise multiplication), and PDE (where solutions are built by convolving with fundamental solutions). This page develops the operation itself; for its role in regularisation see the [mollifier page](/page/Standard%20Mollifier), and for its interaction with the Fourier transform see the [Fourier Transform page](/page/Fourier%20Transform).
[motivation]
## Motivation
### What Convolution Does
Given two [functions](/page/Function) $f$ and $g$ on $\mathbb{R}^n$, the convolution $(f * g)(x) = \int f(x - y)g(y) \, d\mathcal{L}^n(y)$ evaluates $f$ at all points near $x$, weighted by $g$. If $g$ is a concentrated bump near the origin, $(f * g)(x)$ is a local average of $f$ around $x$ — a smoothing operation. If $g$ is itself rough, the convolution still makes sense (under integrability conditions) and is at least as regular as the smoother of the two factors.
### Why Domains Matter
The integral $(f * g)(x) = \int f(x - y)g(y) \, dy$ requires two things: $y$ must lie in the domain of $g$, and $x - y$ must lie in the domain of $f$. If $f$ is supported on a [set](/page/Set) $A$ and $g$ on a set $B$, then the integrand is nonzero only when $y \in B$ and $x - y \in A$, i.e., $x \in A + B$. The convolution's natural domain is therefore the **Minkowski sum** $A + B = \{a + b : a \in A, \, b \in B\}$, which is typically *larger* than either $A$ or $B$ individually. This growth is not a technicality: it is the reason convolution with a kernel supported in a ball of radius $\varepsilon$ enlarges the support by at most $\varepsilon$ in every direction.
### The Three Algebraic Structures
Convolution gives $L^1(\mathbb{R}^n)$ the structure of a commutative Banach algebra (without identity). The [Fourier transform](/page/Fourier%20Transform) is the Gelfand transform of this algebra, converting convolution to multiplication and reducing the spectral theory of translation-invariant operators to algebra. In PDE, a linear constant-coefficient equation $P(D)u = f$ on $\mathbb{R}^n$ is solved (for compactly supported $f$, say) by $u = E * f$, where $E$ is a fundamental solution satisfying $P(D)E = \delta_0$: the solution formulas built from the [heat kernel](/page/Heat%20Equation), the [Newtonian potential](/page/Laplace's%20Equation), and the [Duhamel integral](/theorems/55) are all convolutions.
[/motivation]
## Definition and Domain
The definition requires care about where the integral makes sense. The integrand involves $f$ evaluated at $x - y$ and $g$ at $y$, so both the integrability of $f$ and $g$ and the geometry of their supports determine the set of $x$ for which the convolution is finite.
[definition: Convolution]
Let $f, g: \mathbb{R}^n \to \mathbb{C}$ be measurable functions. The **convolution** of $f$ and $g$ is the function
\begin{align*}
(f * g)(x) := \int_{\mathbb{R}^n} f(x - y) \, g(y) \, d\mathcal{L}^n(y),
\end{align*}
defined for those $x \in \mathbb{R}^n$ for which the integral exists: as a [Lebesgue integral](/page/Lebesgue%20Integral) whenever $y \mapsto f(x - y)\,g(y)$ is absolutely integrable, or, in some applications, as an improper or principal-value limit.
[/definition]
The notation is deceptively simple. The integral involves $f$ evaluated at $x - y$ (a *reflected and translated* version of $f$) multiplied by $g(y)$. For the integral to be finite at a given $x$, one needs $y \mapsto f(x-y)g(y)$ to be integrable. This is where the integrability conditions (Young's inequality) and the support conditions (Minkowski sum) enter.
### When Is the Convolution Well-Defined?
The most common sufficient conditions are:
**$L^1 * L^1$:** If $f, g \in L^1(\mathbb{R}^n)$, Fubini-Tonelli gives $\int\!\!\int |f(x-y)||g(y)| \, dy \, dx = \|f\|_{L^1}\|g\|_{L^1} < \infty$, so $f * g$ is defined a.e. and belongs to $L^1$.
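**$L^p * L^q$:** More generally, if $f \in L^p$ and $g \in L^q$ with $1 \leq p, q, r \leq \infty$ and $1/p + 1/q = 1 + 1/r$, [Young's Convolution Inequality](/theorems/463) shows that $f * g$ is defined a.e. and lies in $L^r$.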
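**Compact support:** If one of $f, g$ is bounded with compact support and the other is locally integrable, the integral reduces to a bounded region and $f * g$ is defined everywhere. If both are merely locally integrable and one of them has compact support, $f * g$ is still defined almost everywhere.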
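Some boundedness is needed for the conclusion that $f * g$ is defined *everywhere*. On $\mathbb{R}$, take $f = g = |x|^{-1/2}\,\mathbb{1}_{[-1,1]}$, which are integrable and compactly supported. At $x = 0$ the integrand is $|y|^{-1/2} \cdot |y|^{-1/2} = |y|^{-1}$ on $[-1, 1]$, so
\begin{align*}
(f * g)(0) = \int_{-1}^{1} |{-y}|^{-1/2} \, |y|^{-1/2} \, dy = \int_{-1}^{1} \frac{dy}{|y|} = \infty,
\end{align*}
even though $f * g$ is finite almost everywhere because $f, g \in L^1(\mathbb{R})$.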
[example: Domain Growth Under Convolution]
Let $f = \mathbb{1}_{[0,1]}$ and $g = \mathbb{1}_{[2,4]}$ on $\mathbb{R}$. The convolution is
\begin{align*}
(f * g)(x) = \int_{\mathbb{R}} \mathbb{1}_{[0,1]}(x - y) \, \mathbb{1}_{[2,4]}(y) \, d\mathcal{L}^1(y) = \mathcal{L}^1\!\big([2,4] \cap [x-1, x]\big).
\end{align*}
The intersection $[2,4] \cap [x-1, x]$ is nonempty exactly when $x - 1 \leq 4$ and $x \geq 2$, i.e., $x \in [2, 5]$, and measuring it case by case gives
\begin{align*}
(f * g)(x) = \begin{cases} 0, & x < 2 \text{ or } x > 5, \\ x - 2, & 2 \leq x \leq 3, \\ 1, & 3 \leq x \leq 4, \\ 5 - x, & 4 \leq x \leq 5. \end{cases}
\end{align*}
The support of $f * g$ is $[2, 5] = [0, 1] + [2, 4]$, the Minkowski sum of the two supports. Its length $3$ is the sum of the lengths $1$ and $2$ of the supports of $f$ and $g$: the support has *grown* by adding the widths.
Note that $f$ "lives" on $[0,1]$ and $g$ on $[2,4]$, which are disjoint, so the pointwise product $fg$ vanishes identically. Yet $f * g$ is nonzero on $(2, 5)$: the integrand pairs the value of $f$ at $x - y$ with the value of $g$ at $y$, so disjoint supports do not force the convolution to vanish. Convolution reaches across gaps.
[/example]
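A quick numerical check of this example: the sketch below approximates the defining integral with a midpoint rule (the quadrature, its resolution, and the sampling grid are illustrative choices, not part of the definition) and compares the result with the piecewise formula. Sampled on a grid of step $1/4$, the convolution should be nonzero exactly at the grid points inside $(2, 5)$.

```python
def indicator(a, b):
    """Return the indicator function of the closed interval [a, b]."""
    return lambda t: 1.0 if a <= t <= b else 0.0

f = indicator(0.0, 1.0)
g = indicator(2.0, 4.0)

def convolve_at(x, f, g, lo=1.0, hi=5.0, n=4000):
    """Midpoint-rule estimate of (f * g)(x) = ∫ f(x - y) g(y) dy, with y restricted to [lo, hi].

    [lo, hi] must contain the support of g, which is [2, 4] here.
    """
    step = (hi - lo) / n
    ys = (lo + (k + 0.5) * step for k in range(n))
    return step * sum(f(x - y) * g(y) for y in ys)

def closed_form(x):
    """Piecewise formula for (f * g)(x) derived in the example."""
    if x <= 2 or x >= 5:
        return 0.0
    if x <= 3:
        return x - 2
    if x <= 4:
        return 1.0
    return 5 - x

xs = [k / 4 for k in range(-4, 29)]                  # x from -1 to 7 in steps of 1/4
values = {x: convolve_at(x, f, g) for x in xs}
max_err = max(abs(values[x] - closed_form(x)) for x in xs)
support = [x for x in xs if values[x] > 1e-9]        # grid points where f * g is nonzero
print(f"max error {max_err:.1e}; nonzero on grid points {support[0]} .. {support[-1]}")
```

The first and last nonzero grid points are $2.25$ and $4.75$, the grid points just inside the Minkowski sum $[2, 5]$.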
## Commutativity and Algebraic Properties
Convolution on $\mathbb{R}^n$ is commutative: $(f * g)(x) = (g * f)(x)$. The proof requires a global change of variables.
[remark: Commutativity Proof]
Substitute $z = x - y$, so that $y = x - z$; the map $y \mapsto x - y$ preserves Lebesgue measure, so $dy = dz$:
\begin{align*}
(f * g)(x) = \int f(x - y) g(y) \, dy = \int f(z) g(x - z) \, dz = (g * f)(x).
\end{align*}
This substitution is valid on $\mathbb{R}^n$ because Lebesgue measure is invariant under both translations and the reflection $y \mapsto -y$. On other domains, for instance a non-abelian [group](/page/Group), the substitution introduces the group inverse, and convolution is *not* commutative in general.
[/remark]
[example: Commutativity And Asymmetric Supports]
Take $f = \mathbb{1}_{[0,1]}$ and let $g = \rho_\varepsilon$ be a smooth approximation of $\delta_0$ supported on $[-\varepsilon, \varepsilon]$. Then $f * g$ is a smooth function supported in $[-\varepsilon, 1 + \varepsilon]$: it is the mollification of $f$, which smooths out the jumps at $0$ and $1$. Computing $g * f$ instead gives the *same* function, but the intuition is different. In $f * g$, we average $f$ against a narrow bump; in $g * f$, we smear the bump $g$ across the support of $f$. Commutativity says these produce the same output, despite the asymmetry between a rough function ($f$) and a smooth one ($g$).
[/example]
Beyond commutativity, convolution is associative, $(f * g) * h = f * (g * h)$, and distributes over addition; for $f, g, h \in L^1(\mathbb{R}^n)$ both facts follow from Fubini's theorem. The Dirac delta $\delta_0$ acts as a (distributional) identity: $f * \delta_0 = f$. However, no function in $L^1(\mathbb{R}^n)$ serves as a convolution identity. If $e \in L^1$ satisfied $e * f = f$ for all $f \in L^1$, the convolution theorem would give $\hat{e}\hat{f} = \hat{f}$; taking $f$ to be a Gaussian, whose Fourier transform never vanishes, forces $\hat{e}(\xi) = 1$ for all $\xi$. This contradicts the [Riemann–Lebesgue Lemma](/theorems/245), which forces $\hat{e}(\xi) \to 0$ as $|\xi| \to \infty$.
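The algebraic identities have exact discrete analogues on $\mathbb{Z}$, where the convolution of finitely supported sequences is a finite sum and integer inputs make every comparison exact. The sketch below is that analogue, not the $L^1(\mathbb{R}^n)$ statement, and the sample sequences are arbitrary. One difference is instructive: on $\mathbb{Z}$ the unit impulse is an honest summable sequence, so $\ell^1(\mathbb{Z})$ *does* have a convolution identity, in contrast to $L^1(\mathbb{R}^n)$.

```python
def conv(a, b):
    """Full discrete convolution (a * b)[k] = sum_j a[k - j] b[j] of finitely supported sequences.

    Index 0 of each list is position 0 on Z; the output has length len(a) + len(b) - 1,
    the discrete counterpart of adding supports.
    """
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def add(a, b):
    """Pointwise sum, padding the shorter sequence with trailing zeros."""
    n = max(len(a), len(b))
    return [x + y for x, y in zip(a + [0] * (n - len(a)), b + [0] * (n - len(b)))]

# Arbitrary integer sequences, so every comparison below is exact.
f, g, h = [1, -2, 3], [4, 0, -1, 2], [2, 5]
delta = [1]  # unit impulse at 0

assert conv(f, g) == conv(g, f)                            # commutativity
assert conv(conv(f, g), h) == conv(f, conv(g, h))          # associativity
assert conv(f, add(g, h)) == add(conv(f, g), conv(f, h))   # distributivity
assert conv(f, delta) == f                                 # the impulse is an identity on Z
print(conv(f, g))  # -> [4, -8, 11, 4, -7, 6]
```

The entries of `conv(f, g)` sum to $10 = 2 \cdot 5$, the product of the sums of `f` and `g`: the discrete form of $\int f * g = \int f \int g$.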
## The Support of a Convolution
The domain example above illustrates a general principle: convolution adds supports. If $f$ is nonzero on a set $A$ and $g$ on a set $B$, the integrand $f(x-y)g(y)$ can be nonzero only when $y \in B$ and $x - y \in A$, forcing $x \in A + B$. This makes precise the intuition that convolution "spreads out" a function by the width of the kernel.
[quotetheorem:588]
The support property is the mechanism behind the $\varepsilon$-enlargement in mollification: if $\operatorname{supp}(\rho_\varepsilon) = \overline{B}(0, \varepsilon)$ and $\operatorname{supp}(f) \subseteq K$, then $\operatorname{supp}(f * \rho_\varepsilon) \subseteq K + \overline{B}(0, \varepsilon)$ — the support expands by at most $\varepsilon$ in every direction. See [Properties of Mollification](/theorems/461) for the full statement in the mollification context.
## $L^p$ Theory
Knowing that $f * g$ is well-defined is not enough: we need to control which $L^r$ space it lands in. For pointwise products, Hölder's inequality gives $\|fg\|_{L^r} \leq \|f\|_{L^p}\|g\|_{L^q}$ with $1/r = 1/p + 1/q$. Convolution does better: the averaging effect gains one full unit of integrability, so the exponent relation shifts to $1/r = 1/p + 1/q - 1$. Without this gain, the approximate identity theory would not work. Under the Hölder relation, $q = 1$ would force $1/r = 1/p + 1 > 1$, outside the admissible range $1 \leq r \leq \infty$; the shifted relation instead gives exactly $r = p$, the $L^1 * L^p \to L^p$ bound that underlies mollification.
[quotetheorem:463]
Two special cases of the relation $1/p + 1/q = 1 + 1/r$ matter most. The first is $L^1 * L^p \to L^p$: convolution with an $L^1$ kernel preserves $L^p$, with $\|f * g\|_p \leq \|f\|_p\|g\|_1$, and this is the bound behind every approximate identity argument. The second is $L^1 * L^1 \to L^1$, with the multiplicative norm bound $\|f * g\|_1 \leq \|f\|_1 \|g\|_1$, which makes $L^1(\mathbb{R}^n)$ a Banach algebra under convolution.
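Young's inequality has a discrete counterpart on $\mathbb{Z}$ with counting measure: $\|a * b\|_{\ell^r} \leq \|a\|_{\ell^p}\|b\|_{\ell^q}$ under the same relation $1/p + 1/q = 1 + 1/r$, with constant $1$. The sketch below checks this numerically on random finitely supported sequences. It illustrates the exponent bookkeeping rather than proving the continuous theorem, and the exponent pairs, sequence lengths, and seed are arbitrary choices.

```python
import math
import random

def conv(a, b):
    """Full discrete convolution of two finite sequences (finitely supported sequences on Z)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def norm(a, p):
    """l^p norm of a finite sequence; p = math.inf gives the sup norm."""
    if p == math.inf:
        return max(abs(x) for x in a)
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

def young_slack(a, b, p, q):
    """||a||_p ||b||_q - ||a * b||_r with 1/r = 1/p + 1/q - 1; Young's inequality says this is >= 0."""
    inv_r = 1.0 / p + 1.0 / q - 1.0
    if not 0.0 <= inv_r <= 1.0:
        raise ValueError("need p, q >= 1 and 1/p + 1/q >= 1")
    r = math.inf if inv_r == 0.0 else 1.0 / inv_r
    return norm(a, p) * norm(b, q) - norm(conv(a, b), r)

rng = random.Random(0)                            # fixed seed for reproducibility
exponents = [(1, 1), (2, 1), (1.5, 1.2), (2, 2)]  # (p, q); the matching r are 1, 2, 2, inf
slacks = []
for _ in range(200):
    a = [rng.uniform(-1, 1) for _ in range(rng.randint(1, 30))]
    b = [rng.uniform(-1, 1) for _ in range(rng.randint(1, 30))]
    slacks.extend(young_slack(a, b, p, q) for p, q in exponents)
print(f"smallest slack over {len(slacks)} trials: {min(slacks):.3e}")
```

The pair $(2, 1)$ is the $\ell^1 * \ell^2 \to \ell^2$ case, and $(2, 2)$ with $r = \infty$ is just the Cauchy–Schwarz inequality.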
Young's inequality controls the norm of $f * g$. But for convergence of approximate identities, showing $f * \varphi_\varepsilon \to f$, we need to control the *difference*, which (using $\int \varphi_\varepsilon = 1$) can be written as $f * \varphi_\varepsilon - f = \int \varphi_\varepsilon(y)\big(f(\cdot - y) - f(\cdot)\big) \, dy$. Estimating it requires pulling the $L^p_x$ norm inside the $dy$ integral, which is a job for Minkowski's integral inequality.
[quotetheorem:464]
Together, Young and Minkowski reduce $L^p$ convergence of convolutions to the **$L^p$-[continuity](/page/Continuity) of translation**: $\|f(\cdot - h) - f\|_{L^p} \to 0$ as $h \to 0$, which holds for all $f \in L^p$ with $1 \leq p < \infty$ but fails in general for $p = \infty$.
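For a concrete instance, take $f = \mathbb{1}_{[0,1]}$ and $0 < |h| \leq 1$. The translate $f(\cdot - h) = \mathbb{1}_{[h, 1+h]}$ differs from $f$ exactly on the symmetric difference $[h, 1 + h] \,\triangle\, [0, 1]$, a set of measure $2|h|$ on which $|f(\cdot - h) - f| = 1$. Hence
\begin{align*}
\|f(\cdot - h) - f\|_{L^p} = (2|h|)^{1/p} \to 0 \quad (1 \leq p < \infty), \qquad \|f(\cdot - h) - f\|_{L^\infty} = 1 \quad (h \neq 0),
\end{align*}
so translation is continuous in every $L^p$ with $p$ finite but not in $L^\infty$.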
## Regularity: Convolution Inherits the Best
Young's inequality and support control are quantitative but say nothing about differentiability. The following regularity principle is equally fundamental: if one factor is smooth, the convolution inherits that smoothness. The reason is that a [derivative](/page/Derivative) of $f * g$ can be moved onto whichever factor is differentiable, and one always chooses the smooth one. Without this principle, mollification could not produce smooth approximations.
[quotetheorem:35]
The principle generalises: if $f \in L^1_{\mathrm{loc}}$ and $g \in C_c^k$, then $f * g \in C^k$ and $D^\alpha(f * g) = f * D^\alpha g$ for $|\alpha| \leq k$. Intuitively, $f * g$ is at least as smooth as the smoother factor. This is the fundamental mechanism of mollification: convolving any rough function with $\rho_\varepsilon \in C_c^\infty$ produces a $C^\infty$ function. See the [mollifier page](/page/Standard%20Mollifier) for the full development.
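The differentiation rule can be tested on a mollified step function. In this one-dimensional sketch, the kernel is the usual bump $\exp(-1/(1 - t^2))$ rescaled to width $\varepsilon = 1/4$ and normalised numerically to unit mass; the width, the midpoint quadrature, and the finite-difference step are illustrative assumptions. Although $f = \mathbb{1}_{[0,1]}$ is not differentiable, $D(f * \rho_\varepsilon) = f * \rho_\varepsilon'$ evaluates to $\rho_\varepsilon(x) - \rho_\varepsilon(x - 1)$, and a central difference of the numerically computed convolution should agree with it.

```python
import math

def bump(t):
    """Unnormalised bump exp(-1 / (1 - t^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def integrate(func, a, b, n=2000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))

EPS = 0.25                          # mollifier width (illustrative choice)
MASS = integrate(bump, -1.0, 1.0)   # normalising constant so that rho has unit mass

def rho(x):
    """rho_eps(x) = bump(x / eps) / (MASS * eps): supported in [-eps, eps], unit mass."""
    return bump(x / EPS) / (MASS * EPS)

def mollified_step(x):
    """(f * rho_eps)(x) = integral over y in [0, 1] of rho_eps(x - y), for f = 1_[0,1]."""
    return integrate(lambda y: rho(x - y), 0.0, 1.0)

# With f = 1_[0,1], D(f * g) = f * Dg predicts (f * rho_eps)'(x) = rho_eps(x) - rho_eps(x - 1).
DX = 1e-4
checks = []
for x in [-0.2, -0.1, 0.0, 0.1, 0.5, 0.9, 1.1]:
    numeric = (mollified_step(x + DX) - mollified_step(x - DX)) / (2 * DX)  # central difference
    predicted = rho(x) - rho(x - 1.0)
    checks.append((x, numeric, predicted))
    print(f"x={x:+.2f}  numeric={numeric:+.5f}  predicted={predicted:+.5f}")
```

The derivative is concentrated near the two jumps of $f$ and vanishes on the plateau around $x = 0.5$, where $f * \rho_\varepsilon \equiv 1$.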
## Interaction with the Fourier Transform
The deepest structural property of convolution is that the [Fourier transform](/page/Fourier%20Transform) diagonalises it: convolution in the spatial domain becomes pointwise multiplication in the frequency domain.
[quotetheorem:250]
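A discrete analogue makes the exchange easy to check. The sketch below works on the cyclic group $\mathbb{Z}/N\mathbb{Z}$ with the unnormalised discrete Fourier transform $A[k] = \sum_n a[n] e^{-2\pi i k n/N}$, an assumed convention under which circular convolution maps to pointwise multiplication with no constant factor; other normalisations move constants around, as in the continuous case. The sample sequences are arbitrary.

```python
import cmath

def dft(a):
    """Unnormalised DFT A[k] = sum_n a[n] exp(-2*pi*i*k*n/N)."""
    N = len(a)
    return [sum(a[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def circular_conv(a, b):
    """(a * b)[m] = sum_n a[(m - n) mod N] b[n] on the cyclic group Z/NZ."""
    N = len(a)
    return [sum(a[(m - n) % N] * b[n] for n in range(N)) for m in range(N)]

# Arbitrary sample sequences of equal length N = 8.
a = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 0.0, 3.0]
b = [0.5, -1.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0]

lhs = dft(circular_conv(a, b))                 # transform of the convolution
rhs = [x * y for x, y in zip(dft(a), dft(b))]  # product of the transforms
err = max(abs(x - y) for x, y in zip(lhs, rhs))
print(f"max |DFT(a*b) - DFT(a) DFT(b)| = {err:.1e}")
```

The discrepancy is at the level of floating-point rounding, since on a finite group the exchange is an exact algebraic identity.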
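The dual identity, in which multiplication becomes convolution, also holds: with the convention $\hat{f}(\xi) = \int f(x) e^{-i x \cdot \xi} \, d\mathcal{L}^n(x)$ and for, e.g., Schwartz functions $f, g$, one has $\widehat{fg}(\xi) = (2\pi)^{-n}(\hat{f} * \hat{g})(\xi)$. This duality is the reason the Fourier transform is so effective for PDE: with $D = -i\partial$, a constant-coefficient equation $P(D)u = f$ transforms to $P(\xi)\hat{u}(\xi) = \hat{f}(\xi)$, which is algebraic. Formally inverting gives $\hat{u} = \hat{f}/P$, i.e., $u = \mathcal{F}^{-1}[1/P] * f$, expressing the solution as a convolution with the fundamental solution; where $P$ has zeros, $1/P$ must first be given a meaning as a distribution.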
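The recipe $u = \mathcal{F}^{-1}[1/P] * f$ can be sketched on a periodic grid. In this hypothetical discretisation, the operator $u - u''$ on $[0, 1)$ is replaced by $u[n] - (u[n+1] - 2u[n] + u[n-1])/h^2$ with periodic indices. Its symbol on the $k$-th Fourier mode is $P_k = 1 + 4\sin^2(\pi k/N)/h^2 \geq 1$, so dividing by $P$ is harmless. The code builds the discrete fundamental solution $E = \mathcal{F}^{-1}[1/P]$ and checks that $LE = \delta$ and $L(E * f) = f$; the grid size and the source term `f` are illustrative assumptions.

```python
import cmath
import math

N = 32          # number of grid points on the periodic interval [0, 1)
h = 1.0 / N     # grid spacing

def idft(A):
    """Inverse DFT a[n] = (1/N) sum_k A[k] exp(2*pi*i*k*n/N)."""
    M = len(A)
    return [sum(A[k] * cmath.exp(2j * cmath.pi * k * n / M) for k in range(M)) / M
            for n in range(M)]

def circular_conv(a, b):
    """(a * b)[m] = sum_n a[(m - n) mod N] b[n] on Z/NZ."""
    M = len(a)
    return [sum(a[(m - n) % M] * b[n] for n in range(M)) for m in range(M)]

def apply_L(u):
    """Periodic discretisation of P(D)u = u - u'' using centred second differences."""
    M = len(u)
    return [u[n] - (u[(n + 1) % M] - 2 * u[n] + u[n - 1]) / h**2 for n in range(M)]

# Symbol of apply_L on the k-th Fourier mode; it is >= 1, so 1/P never blows up.
P = [1 + 4 * math.sin(math.pi * k / N) ** 2 / h**2 for k in range(N)]

# Discrete fundamental solution E = F^{-1}[1/P]; P is symmetric, so E is real up to rounding.
E = [z.real for z in idft([1 / p for p in P])]

# Hypothetical smooth periodic source term.
f = [math.exp(math.sin(2 * math.pi * n / N)) for n in range(N)]
u = circular_conv(E, f)  # u = E * f

delta_err = max(abs(v - (1.0 if n == 0 else 0.0)) for n, v in enumerate(apply_L(E)))
residual = max(abs(v - w) for v, w in zip(apply_L(u), f))
print(f"max |L E - delta| = {delta_err:.1e}, max |L u - f| = {residual:.1e}")
```

Both checks rest on the same fact as in the continuous case: `apply_L` is translation-invariant, so it commutes with convolution and acts on each Fourier mode by multiplication.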
The convolution theorem extends to distributions: when $u$ is a tempered distribution and $\varphi$ is a Schwartz function, the convolution $u * \varphi$ is a smooth function of at most polynomial growth, and the Fourier exchange $\widehat{u * \varphi} = \hat{u} \cdot \hat{\varphi}$ continues to hold. This is essential for PDE, where the fundamental solution is often a distribution rather than a function (for the three-dimensional wave equation it is a measure carried by the light cone) while the source term is a function.
[quotetheorem:458]
## Approximate Identities
There is no convolution identity in $L^1$ (as shown above), but there are *approximate* identities: families of kernels $\varphi_\varepsilon$ that act increasingly like $\delta_0$ as $\varepsilon \to 0$. The conditions needed are surprisingly minimal: unit mass, a uniform $L^1$ bound, and concentration at the origin suffice to guarantee $f * \varphi_\varepsilon \to f$ in $L^p$ for every $f \in L^p$ with $1 \leq p < \infty$.
[definition: Approximate Identity]
A family $\{\varphi_\varepsilon\}_{\varepsilon > 0}$ in $L^1(\mathbb{R}^n)$ is an **approximate identity** if: (1) $\int_{\mathbb{R}^n} \varphi_\varepsilon \, d\mathcal{L}^n = 1$ for all $\varepsilon > 0$; (2) $\sup_\varepsilon \|\varphi_\varepsilon\|_{L^1} < \infty$; (3) for every $\delta > 0$, $\int_{|x| > \delta} |\varphi_\varepsilon(x)| \, d\mathcal{L}^n(x) \to 0$ as $\varepsilon \to 0$.
[/definition]
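The convergence $f * \varphi_\varepsilon \to f$ in $L^p$ (for $1 \leq p < \infty$) follows from the same argument used in part (3) of [Properties of Mollification](/theorems/461). Writing $f * \varphi_\varepsilon - f$ as a $\varphi_\varepsilon$-weighted average of the differences $f(\cdot - y) - f$ (this uses (1)), Minkowski's integral inequality gives $\|f * \varphi_\varepsilon - f\|_{L^p} \leq \int |\varphi_\varepsilon(y)| \, \|f(\cdot - y) - f\|_{L^p} \, dy$. On $\{|y| \leq \delta\}$ the norm inside is small by $L^p$-continuity of translation and the total weight is bounded by (2); on $\{|y| > \delta\}$ the norm is at most $2\|f\|_{L^p}$ and the weight tends to $0$ by the concentration property (3).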
Important examples: the standard [mollifier](/page/Standard%20Mollifier) $\rho_\varepsilon$ (compactly supported, gives $C^\infty$ approximations); the [heat kernel](/page/Heat%20Equation) $(4\pi\varepsilon)^{-n/2}e^{-|x|^2/(4\varepsilon)}$ (solves the heat equation, not compactly supported); the [Poisson kernel](/theorems/576) (solves the Laplace equation on the half-space); the [Fejér kernel](/theorems/584) (trigonometric approximate identity on $\mathbb{T}$). The standard mollifier is distinguished by compact support, which gives the [support control](/theorems/588) that the others lack.
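For the heat kernel the three conditions can be checked explicitly in one dimension. $K_\varepsilon(x) = (4\pi\varepsilon)^{-1/2} e^{-x^2/(4\varepsilon)}$ is nonnegative with unit mass, so (2) follows from (1). Because $K_\varepsilon$ is the density of a centred normal distribution with variance $2\varepsilon$, its mass outside $[-\delta, \delta]$ is $\operatorname{erfc}\big(\delta/(2\sqrt{\varepsilon})\big)$, which tends to $0$ as $\varepsilon \to 0$. The sketch below compares numerical integrals with this closed form; $\delta = 0.1$, the values of $\varepsilon$, and the midpoint rule are illustrative choices.

```python
import math

def heat_kernel(x, eps):
    """One-dimensional heat kernel K_eps(x) = (4 pi eps)^{-1/2} exp(-x^2 / (4 eps))."""
    return math.exp(-x * x / (4.0 * eps)) / math.sqrt(4.0 * math.pi * eps)

def integrate(func, a, b, n=20_000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))

DELTA = 0.1  # fixed radius in the concentration condition (3)
rows = []
for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    width = 40.0 * math.sqrt(eps)  # K_eps is negligible beyond this distance from 0
    mass = integrate(lambda x: heat_kernel(x, eps), -width, width)
    tail_numeric = 2.0 * integrate(lambda x: heat_kernel(x, eps), DELTA, DELTA + width)
    tail_exact = math.erfc(DELTA / (2.0 * math.sqrt(eps)))  # closed form for the Gaussian tail
    rows.append((eps, mass, tail_numeric, tail_exact))
    print(f"eps={eps:.0e}  mass={mass:.8f}  tail={tail_exact:.3e} (numeric {tail_numeric:.3e})")
```

The mass stays at $1$ while the tail beyond $\delta$ collapses, which is exactly the behaviour conditions (1) and (3) ask for; unlike the standard mollifier, the tail is never exactly zero.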
## Convolution of Distributions
The definition extends beyond functions. For a distribution $T \in \mathcal{D}'(\mathbb{R}^n)$ and a [test function](/page/Test%20Function) $\varphi \in \mathcal{D}(\mathbb{R}^n)$, the convolution $(T * \varphi)(x) := T_y(\varphi(x - y))$ is always well-defined (the map $y \mapsto \varphi(x - y)$ is in $\mathcal{D}$ for each $x$) and produces a $C^\infty$ function. This is the distributional analogue of the regularity principle: convolving with a smooth function always smooths.
For two distributions $T, S \in \mathcal{D}'(\mathbb{R}^n)$, the convolution $T * S$ is only defined under a support condition; the standard sufficient one is that at least one of $T, S$ has compact support, which guarantees that the pairing defining $T * S$ makes sense. When this holds, $T * S \in \mathcal{D}'(\mathbb{R}^n)$ and satisfies:
\begin{align*}
\partial^\alpha(T * S) = (\partial^\alpha T) * S = T * (\partial^\alpha S)
\end{align*}
for every multi-index $\alpha$. The Dirac delta is the convolution identity: $T * \delta_0 = T$ for every distribution $T$. See the [Distribution page](/page/Distribution) for the full framework and [Theorem 458](/theorems/458) for the tempered distribution case.
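As a one-dimensional check of both identities, take $T = \delta_0'$, defined by $\delta_0'(\psi) = -\psi'(0)$, and a test function $\varphi \in \mathcal{D}(\mathbb{R})$. The definition $(T * \varphi)(x) = T_y(\varphi(x - y))$ gives
\begin{align*}
(\delta_0' * \varphi)(x) = -\frac{d}{dy}\Big[\varphi(x - y)\Big]_{y = 0} = \varphi'(x) = \big(\delta_0 * \varphi\big)'(x),
\end{align*}
which is the rule $\partial(T * S) = (\partial T) * S$ with $T = \delta_0$ and $S = \varphi$, combined with $\delta_0 * \varphi = \varphi$.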