The [Fourier transform](/page/Fourier%20Transform) assigns to each [integrable](/page/Integral) function $f \in L^1(\mathbb{R}^n, \mathcal{L}^n)$ a continuous function $\hat{f}$ on frequency space. But many of the most important objects in analysis and probability — point masses, singular distributions of random variables, spectral measures of stochastic processes — are not [functions](/page/Function) at all. They are measures, and the integral defining $\hat{f}$ makes no sense for them because there is no density to integrate against Lebesgue measure. The Fourier–Stieltjes transform resolves this by replacing $f(x) \, d\mathcal{L}^n(x)$ with a general finite Borel measure $\mu$ in the exponent-weighted integral, producing a bounded continuous function that encodes the oscillatory content of $\mu$ in the same way that $\hat{f}$ encodes the oscillatory content of $f$. In probability, this object is the characteristic function of a random variable; in harmonic analysis, it is the natural extension of the Fourier transform to the measure algebra $M(\mathbb{R}^n)$.
## Motivation
[motivation]
### The $L^1$ [Boundary](/page/Boundary)
Written with the sign convention $e^{i\xi \cdot x}$ used throughout this page, the [Fourier transform](/page/Fourier%20Transform) on $L^1(\mathbb{R}^n, \mathcal{L}^n)$ is defined by
\begin{align*}
\hat{f}(\xi) &= \int_{\mathbb{R}^n} f(x) \, e^{i\xi \cdot x} \, d\mathcal{L}^n(x).
\end{align*}
This integral converges absolutely because $|f(x) e^{i\xi \cdot x}| = |f(x)|$ and $f \in L^1$. The resulting function $\hat{f}$ is bounded and uniformly continuous, and the [Riemann–Lebesgue lemma](/theorems/245) guarantees that it vanishes at infinity: $\hat{f} \in C_0(\mathbb{R}^n)$. These are powerful properties, and within $L^1$ the theory works well.
The difficulty is that $L^1$ is not large enough. Consider the Dirac mass $\delta_0$, defined by $\delta_0(A) = \mathbb{1}_A(0)$ for every Borel set $A \subseteq \mathbb{R}^n$. This is a perfectly well-defined finite Borel measure — it assigns mass $1$ to any set containing the origin — but it has no density with respect to Lebesgue measure. There is no function $f \in L^1(\mathbb{R}^n, \mathcal{L}^n)$ such that $\delta_0(A) = \int_A f \, d\mathcal{L}^n$ for all Borel sets $A$. The $L^1$ Fourier transform therefore cannot see $\delta_0$ at all.
Yet the formal computation is immediate: if we insert $\delta_0$ into the integral, we obtain
\begin{align*}
\int_{\mathbb{R}^n} e^{i\xi \cdot x} \, d\delta_0(x) &= e^{i\xi \cdot 0} = 1.
\end{align*}
The "Fourier transform" of the Dirac mass is the constant function $1$ — a perfectly concrete object. The integral makes sense; it is only the insistence on having a Lebesgue density that fails. This suggests that the correct domain for the Fourier transform should include measures, not just integrable functions.
### Measures as the Natural Domain
An $L^1$ function $f$ acts on frequency space through the integral $\xi \mapsto \int f(x) e^{i\xi \cdot x} \, d\mathcal{L}^n(x)$. But this integral depends on $f$ only through the measure $\mu_f$ defined by $\mu_f(A) = \int_A f \, d\mathcal{L}^n$ — the absolutely continuous measure with density $f$. The mapping $f \mapsto \mu_f$ embeds $L^1(\mathbb{R}^n, \mathcal{L}^n)$ isometrically into the space $M(\mathbb{R}^n)$ of finite complex Borel measures (equipped with the total variation norm), and the Fourier transform of $f$ is really the Fourier transform of the measure $\mu_f$:
\begin{align*}
\hat{f}(\xi) &= \int_{\mathbb{R}^n} e^{i\xi \cdot x} \, d\mu_f(x).
\end{align*}
Once this is recognised, the generalisation is immediate: for any $\mu \in M(\mathbb{R}^n)$, the integral $\int e^{i\xi \cdot x} \, d\mu(x)$ converges absolutely because $|e^{i\xi \cdot x}| = 1$ and $|\mu|(\mathbb{R}^n) < \infty$. No density is required: the resulting function of $\xi$ is bounded by $|\mu|(\mathbb{R}^n)$, and its [continuity](/page/Continuity) follows from the [dominated convergence theorem](/theorems/4). The space $M(\mathbb{R}^n)$ is strictly larger than $L^1$: it contains point masses, singular continuous measures (such as the Cantor measure), and all finite linear combinations of these with absolutely continuous measures.
### The Probabilistic Perspective
In probability theory, the same object appears under a different name. If $X: \Omega \to \mathbb{R}^n$ is a random variable on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, its law $\mu_X = \mathbb{P} \circ X^{-1}$ is a Borel probability measure on $\mathbb{R}^n$, and the characteristic function of $X$ is
\begin{align*}
\phi_X(\xi) &= \mathbb{E}[e^{i\xi \cdot X}] = \int_{\mathbb{R}^n} e^{i\xi \cdot x} \, d\mu_X(x).
\end{align*}
This is exactly the Fourier–Stieltjes transform of $\mu_X$. The characteristic function determines the distribution uniquely (the uniqueness theorem below), encodes the moments of $X$ (when they exist) through its derivatives at the origin, and characterises convergence in distribution through the Lévy continuity theorem. These facts make it one of the most important analytic tools in probability theory, and they are all consequences of the general theory of the Fourier–Stieltjes transform.
[/motivation]
## The Fourier–Stieltjes Transform
### The Stieltjes Connection: Measures and Bounded Variation
The name "Stieltjes" refers to the Riemann–Stieltjes integral: the classical formulation defines the transform by integrating $e^{i\xi x}$ not against a density $f(x) \, d\mathcal{L}^1(x)$, but against a function of [bounded variation](/page/Functions%20of%20Bounded%20Variation) $dF(x)$. In one dimension, every finite Borel measure $\mu$ on $\mathbb{R}$ corresponds to a right-continuous BV function $F: \mathbb{R} \to \mathbb{C}$ via the **distribution function**
\begin{align*}
F(x) := \mu((-\infty, x]),
\end{align*}
and the Fourier–Stieltjes transform of $\mu$ coincides with the Riemann–Stieltjes integral:
\begin{align*}
\hat{\mu}(\xi) = \int_{-\infty}^\infty e^{i\xi x} \, dF(x).
\end{align*}
The total variation of $\mu$ as a measure equals the total variation of $F$ as a BV function: $|\mu|(\mathbb{R}) = V(F; \mathbb{R})$. This correspondence is a bijection: every right-continuous BV function $F$ with $\lim_{x \to -\infty} F(x) = 0$ determines a unique finite Borel measure, and conversely.
[example: Distribution Function Of A Mixed Measure]
Let $\mu = \frac{1}{2}\mathcal{L}^1|_{[0,1]} + \frac{1}{2}\delta_2$ — half of its mass is spread uniformly over $[0,1]$, and the other half is a point mass at $x = 2$. The distribution function is:
\begin{align*}
F(x) = \begin{cases} 0 & \text{if } x < 0, \\ x/2 & \text{if } 0 \leq x < 1, \\ 1/2 & \text{if } 1 \leq x < 2, \\ 1 & \text{if } x \geq 2. \end{cases}
\end{align*}
The function $F$ is right-continuous, non-decreasing, with a continuous part (the ramp on $[0,1]$) and a jump of size $1/2$ at $x = 2$. Its total variation is $V(F; \mathbb{R}) = 1 = |\mu|(\mathbb{R})$. The Fourier–Stieltjes transform of $\mu$ can be computed as the Riemann–Stieltjes integral $\int e^{i\xi x} \, dF(x)$, which splits into the continuous part $\frac{1}{2}\int_0^1 e^{i\xi x} \, d\mathcal{L}^1(x)$ and the jump contribution $\frac{1}{2}e^{2i\xi}$.
[/example]
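The splitting in the example above can be checked numerically. The following is a minimal sketch, not a definitive implementation: it approximates the Riemann–Stieltjes integral $\int e^{i\xi x} \, dF(x)$ by a sum with right-endpoint tags and compares it with the closed form $\frac{e^{i\xi} - 1}{2i\xi} + \frac{1}{2}e^{2i\xi}$ obtained by evaluating the two pieces. The function names, the integration window $[-1, 3]$, and the step count are illustrative assumptions.

```python
import cmath


def F(x):
    """Distribution function of mu = (1/2) Lebesgue|[0,1] + (1/2) delta_2."""
    if x < 0:
        return 0.0
    if x < 1:
        return x / 2
    if x < 2:
        return 0.5
    return 1.0


def stieltjes_sum(xi, lo=-1.0, hi=3.0, steps=40_000):
    """Riemann-Stieltjes sum of exp(i*xi*x) dF(x) over [lo, hi].

    Each cell (a, b] contributes exp(i*xi*b) * (F(b) - F(a)), so the jump
    of F at x = 2 is picked up by exactly one cell of the partition.
    """
    h = (hi - lo) / steps
    xs = [lo + k * h for k in range(steps + 1)]  # one shared grid for all cells
    return sum(cmath.exp(1j * xi * b) * (F(b) - F(a)) for a, b in zip(xs, xs[1:]))


def closed_form(xi):
    """Continuous part plus jump contribution; the value at xi = 0 is the total mass."""
    if xi == 0:
        return 1 + 0j
    return (cmath.exp(1j * xi) - 1) / (2j * xi) + 0.5 * cmath.exp(2j * xi)


if __name__ == "__main__":
    for xi in (0.0, 1.0, 3.0):
        print(xi, abs(stieltjes_sum(xi) - closed_form(xi)))
```

The discrepancy is bounded by roughly $|\xi| h$ times the total variation of $F$, where $h$ is the cell width, so refining the partition shrinks it linearly.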
In higher dimensions ($n \geq 2$), there is no equally convenient analogue of the distribution function, and the measure-theoretic formulation becomes essential. But in $\mathbb{R}^1$, the BV and measure viewpoints are interchangeable, and the BV formulation is how the transform appears in classical probability and in older texts on harmonic analysis.
### The Measure-Theoretic Definition
To state the definition in its modern form, we need the ambient space of finite measures. Without this space, we have no norm to control the transform and no algebraic structure ([convolution](/page/Convolution)) to exploit. The space of finite complex Borel measures is to the Fourier–Stieltjes transform what $L^1(\mathbb{R}^n)$ is to the ordinary Fourier transform: the natural domain on which the transform is a bounded [linear map](/page/Linear%20Map).
[definition:Space Of Finite Complex Borel Measures]
Let $M(\mathbb{R}^n)$ denote the space of all finite complex Borel measures on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$: that is, the space of all countably additive set functions $\mu: \mathcal{B}(\mathbb{R}^n) \to \mathbb{C}$ with finite total variation
\begin{align*}
\|\mu\| &:= |\mu|(\mathbb{R}^n) < \infty,
\end{align*}
where $|\mu|$ is the total variation measure of $\mu$. Equipped with the total variation norm $\|\cdot\|$ and the convolution product (defined below), $M(\mathbb{R}^n)$ is a unital Banach algebra with identity $\delta_0$.
[/definition]
The total variation norm is the natural norm on $M(\mathbb{R}^n)$: it measures the "total mass" of $\mu$ when sign and phase are ignored. For a positive measure, the total variation equals the total mass $\mu(\mathbb{R}^n)$; for a signed or complex measure, it adds up the mass of the parts with different sign or phase without allowing them to cancel, so that $\|\mu\| \geq |\mu(\mathbb{R}^n)|$ in general.
With this space in hand, the Fourier–Stieltjes transform is defined by the same exponential integral that defines the $L^1$ Fourier transform, but with the Lebesgue density replaced by a general measure.
[definition:Fourier Stieltjes Transform]
Let $\mu \in M(\mathbb{R}^n)$. The **Fourier–Stieltjes transform** of $\mu$ is the function
\begin{align*}
\hat{\mu}: \mathbb{R}^n &\to \mathbb{C} \\
\xi &\mapsto \int_{\mathbb{R}^n} e^{i\xi \cdot x} \, d\mu(x).
\end{align*}
[/definition]
The sign convention $e^{i\xi \cdot x}$ (rather than $e^{-i\xi \cdot x}$) matches the standard probabilistic convention for characteristic functions. The analysis convention, used on the [Fourier Transform](/page/Fourier%20Transform) page, differs by a sign in the exponent; when $\mu$ has a density $f \in L^1$, the two conventions are related by $\hat{\mu}(\xi) = \hat{f}(-\xi)$. Either choice leads to an equivalent theory, but one must be consistent. Throughout this page, we use $e^{i\xi \cdot x}$.
The integral converges absolutely for every $\xi \in \mathbb{R}^n$ because
\begin{align*}
\left| \int_{\mathbb{R}^n} e^{i\xi \cdot x} \, d\mu(x) \right| &\leq \int_{\mathbb{R}^n} |e^{i\xi \cdot x}| \, d|\mu|(x) = |\mu|(\mathbb{R}^n) = \|\mu\|,
\end{align*}
so the definition requires no additional integrability hypotheses beyond $\mu \in M(\mathbb{R}^n)$.
## Basic Analytic Properties
The most fundamental properties of the Fourier–Stieltjes transform follow directly from the definition and the dominated convergence theorem. The key point is that the integrand $x \mapsto e^{i\xi \cdot x}$ is bounded and continuous, while $|\mu|$ is a finite measure — precisely the setting in which dominated convergence is most powerful.
[theorem:Basic Properties Of The Fourier Stieltjes Transform]
Let $\mu \in M(\mathbb{R}^n)$. Then:
(a) **Boundedness.** $\|\hat{\mu}\|_\infty \leq \|\mu\|$.
(b) **Value at the origin.** $\hat{\mu}(0) = \mu(\mathbb{R}^n)$.
(c) **Uniform continuity.** The function $\hat{\mu}: \mathbb{R}^n \to \mathbb{C}$ is uniformly continuous.
(d) **Hermitian symmetry for real measures.** If $\mu$ is a real-valued (signed) measure, then $\overline{\hat{\mu}(\xi)} = \hat{\mu}(-\xi)$ for all $\xi \in \mathbb{R}^n$.
(e) **Positive-definiteness for positive measures.** If $\mu$ is a positive (non-negative) measure, then $\hat{\mu}$ is positive-definite: for every $m \in \mathbb{N}$, every $\xi_1, \ldots, \xi_m \in \mathbb{R}^n$, and every $c_1, \ldots, c_m \in \mathbb{C}$,
\begin{align*}
\sum_{j=1}^{m} \sum_{k=1}^{m} c_j \overline{c_k} \, \hat{\mu}(\xi_j - \xi_k) &\geq 0.
\end{align*}
[/theorem]
Property (a) is the estimate derived above; it says that the Fourier–Stieltjes transform is a contraction from $(M(\mathbb{R}^n), \|\cdot\|)$ into $(L^\infty(\mathbb{R}^n), \|\cdot\|_\infty)$. Property (b) recovers the total mass of $\mu$ from a single evaluation of $\hat{\mu}$ — in probability, this gives the normalisation $\phi_X(0) = 1$ for characteristic functions.
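A small worked example, chosen here for illustration, shows both properties at once and shows that the bound in (a) can be attained even when the total mass vanishes. For the signed measure $\mu = \delta_0 - \delta_1$ on $\mathbb{R}$:

\begin{align*}
\mu(\mathbb{R}) &= 0, & \|\mu\| &= 2, \\
\hat{\mu}(\xi) &= 1 - e^{i\xi}, & |\hat{\mu}(\xi)| &= 2\left|\sin(\xi/2)\right|.
\end{align*}

Thus $\hat{\mu}(0) = 0 = \mu(\mathbb{R})$, in agreement with (b), while $\|\hat{\mu}\|_\infty = 2 = \|\mu\|$ is attained at $\xi = \pi$, in agreement with (a).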
Property (c) follows from dominated convergence: if $\xi_m \to \xi$ in $\mathbb{R}^n$, then $e^{i\xi_m \cdot x} \to e^{i\xi \cdot x}$ pointwise, and the integrand is bounded by $1 \in L^1(|\mu|)$, so $\hat{\mu}(\xi_m) \to \hat{\mu}(\xi)$. The uniformity of the continuity requires a slightly more careful argument using the fact that the family of functions $\{x \mapsto e^{i\xi \cdot x} - e^{i\eta \cdot x}\}$ can be made uniformly small on a [set](/page/Set) of large $|\mu|$-measure when $|\xi - \eta|$ is small.
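One way to make the uniformity explicit, sketched here under the elementary bound $|e^{it} - 1| \leq \min(|t|, 2)$ for real $t$, is to split the integral at a radius $R > 0$. For all $\xi, \eta \in \mathbb{R}^n$:

\begin{align*}
|\hat{\mu}(\xi) - \hat{\mu}(\eta)| &\leq \int_{\mathbb{R}^n} \left| e^{i(\xi - \eta) \cdot x} - 1 \right| \, d|\mu|(x) \\
&\leq |\xi - \eta| \, R \, |\mu|(\{x : |x| \leq R\}) + 2 \, |\mu|(\{x : |x| > R\}).
\end{align*}

Given $\varepsilon > 0$, first choose $R$ so large that the second term is below $\varepsilon/2$ (possible because $|\mu|$ is finite), then require $|\xi - \eta| \, R \, \|\mu\| < \varepsilon/2$. The resulting bound on $|\xi - \eta|$ depends only on $\varepsilon$ and $\mu$, not on the location of $\xi$ and $\eta$.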
Property (e) is the starting point for Bochner's theorem (§ Positive-Definite Functions and Bochner's Theorem below). The computation verifying positive-definiteness is:
\begin{align*}
\sum_{j=1}^{m} \sum_{k=1}^{m} c_j \overline{c_k} \, \hat{\mu}(\xi_j - \xi_k) &= \sum_{j,k} c_j \overline{c_k} \int_{\mathbb{R}^n} e^{i(\xi_j - \xi_k) \cdot x} \, d\mu(x) \\
&= \int_{\mathbb{R}^n} \left( \sum_{j=1}^{m} c_j e^{i\xi_j \cdot x} \right) \overline{\left( \sum_{k=1}^{m} c_k e^{i\xi_k \cdot x} \right)} \, d\mu(x) \\
&= \int_{\mathbb{R}^n} \left| \sum_{j=1}^{m} c_j e^{i\xi_j \cdot x} \right|^2 \, d\mu(x) \geq 0,
\end{align*}
where the interchange of sum and integral is just linearity of the integral (the sums are finite and each integrand is bounded, hence $\mu$-integrable), and the final inequality holds because the integrand is non-negative and $\mu$ is a positive measure.
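The identity above can be checked numerically for a finitely supported positive measure. This is a minimal sketch under stated assumptions: the measure `weights`, the frequencies `xis`, and the coefficients `cs` are hypothetical values chosen for illustration, and the measure lives on $\mathbb{R}$ so that integrals reduce to finite sums.

```python
import cmath

# Hypothetical positive measure on R: weights[x] is the mass placed at x.
weights = {-1.0: 0.2, 0.5: 0.5, 2.0: 0.3}


def mu_hat(xi):
    """Fourier-Stieltjes transform of the discrete measure, convention e^{+i xi x}."""
    return sum(w * cmath.exp(1j * xi * x) for x, w in weights.items())


def quadratic_form(xis, cs):
    """Double sum of c_j * conj(c_k) * mu_hat(xi_j - xi_k)."""
    return sum(
        cj * ck.conjugate() * mu_hat(xj - xk)
        for xj, cj in zip(xis, cs)
        for xk, ck in zip(xis, cs)
    )


def integral_of_square(xis, cs):
    """Integral of |sum_j c_j e^{i xi_j x}|^2 against the measure."""
    return sum(
        w * abs(sum(c * cmath.exp(1j * xi * x) for xi, c in zip(xis, cs))) ** 2
        for x, w in weights.items()
    )


xis = [0.0, 0.7, -1.3, 2.1]
cs = [1 + 0j, -2 + 1j, 0.5j, 3 - 1j]
q = quadratic_form(xis, cs)
```

Up to rounding, `q` is real, non-negative, and equal to `integral_of_square(xis, cs)`, which is the chain of equalities in the displayed computation.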
### The Failure of the Riemann–Lebesgue Lemma
For $L^1$ functions, the [Riemann–Lebesgue lemma](/theorems/245) guarantees that $\hat{f}(\xi) \to 0$ as $|\xi| \to \infty$. This decay-at-infinity property is one of the most useful features of the $L^1$ Fourier transform. A natural question is whether the same holds for the Fourier–Stieltjes transform of a general measure.
It does not. The Dirac mass $\delta_a$ at a point $a \in \mathbb{R}^n$ has Fourier–Stieltjes transform $\hat{\delta}_a(\xi) = e^{ia \cdot \xi}$, which satisfies $|\hat{\delta}_a(\xi)| = 1$ for all $\xi$. Far from decaying, it oscillates with constant amplitude. More generally, any measure with a non-trivial discrete (atomic) part will have a Fourier–Stieltjes transform that does not vanish at infinity.
[example:Fourier Stieltjes Transform Of A Point Mass]
Let $a \in \mathbb{R}^n$ and let $\delta_a$ denote the Dirac measure at $a$: for every Borel set $A \subseteq \mathbb{R}^n$, $\delta_a(A) = \mathbb{1}_A(a)$. Then $\delta_a \in M(\mathbb{R}^n)$ with $\|\delta_a\| = 1$, and
\begin{align*}
\hat{\delta}_a(\xi) &= \int_{\mathbb{R}^n} e^{i\xi \cdot x} \, d\delta_a(x) = e^{i\xi \cdot a}.
\end{align*}
In particular, $\hat{\delta}_0(\xi) = 1$ for all $\xi$, and $|\hat{\delta}_a(\xi)| = 1$ for all $\xi$ and all $a$. The Fourier–Stieltjes transform of a point mass does not decay at infinity.
[/example]
This example shows that the image of the Fourier–Stieltjes transform is not contained in $C_0(\mathbb{R}^n)$: it includes bounded continuous functions that do not vanish at infinity. The precise characterisation of which continuous bounded functions arise as Fourier–Stieltjes transforms of positive measures is given by Bochner's theorem below.
The Riemann–Lebesgue lemma does still apply to the absolutely continuous part of a measure. If $\mu = f \, d\mathcal{L}^n + \mu_s$ is the Lebesgue decomposition of $\mu$ with respect to $\mathcal{L}^n$, then $\hat{\mu}(\xi) = \hat{f}(-\xi) + \hat{\mu}_s(\xi)$, and the first term decays at infinity by the Riemann–Lebesgue lemma. The behavior at infinity is therefore governed by the singular part $\mu_s$.
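As a concrete illustration, consider the measure $\mu = \frac{1}{2}\mathcal{L}^1|_{[0,1]} + \frac{1}{2}\delta_2$ on $\mathbb{R}$, whose absolutely continuous part is $\frac{1}{2}\mathcal{L}^1|_{[0,1]}$ and whose singular part is $\frac{1}{2}\delta_2$. For $\xi \neq 0$:

\begin{align*}
\hat{\mu}(\xi) &= \frac{e^{i\xi} - 1}{2i\xi} + \frac{1}{2} e^{2i\xi}, &
\left| \frac{e^{i\xi} - 1}{2i\xi} \right| &\leq \frac{1}{|\xi|}.
\end{align*}

By the reverse triangle inequality, $\bigl| |\hat{\mu}(\xi)| - \tfrac{1}{2} \bigr| \leq 1/|\xi|$, so $|\hat{\mu}(\xi)| \to \tfrac{1}{2}$ as $|\xi| \to \infty$: the absolutely continuous part decays, and the modulus at infinity is exactly the contribution of the atom.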
## Embedding of $L^1$ into $M(\mathbb{R}^n)$
The relationship between the $L^1$ Fourier transform and the Fourier–Stieltjes transform is not merely an analogy — there is a canonical isometric embedding that makes the former a special case of the latter. Understanding this embedding clarifies why every theorem about Fourier–Stieltjes transforms automatically specialises to a theorem about $L^1$ Fourier transforms.
Every function $f \in L^1(\mathbb{R}^n, \mathcal{L}^n)$ defines a complex Borel measure $\mu_f$ by
\begin{align*}
\mu_f(A) &= \int_A f \, d\mathcal{L}^n \quad \text{for all } A \in \mathcal{B}(\mathbb{R}^n).
\end{align*}
The total variation of $\mu_f$ equals the $L^1$ norm: $\|\mu_f\| = \|f\|_{L^1}$. This means the map $f \mapsto \mu_f$ is an isometric embedding of $L^1(\mathbb{R}^n, \mathcal{L}^n)$ into $M(\mathbb{R}^n)$. Its image consists precisely of the measures that are absolutely continuous with respect to $\mathcal{L}^n$. Under this embedding, the Fourier–Stieltjes transform of $\mu_f$ recovers the $L^1$ Fourier transform up to the sign convention:
\begin{align*}
\hat{\mu}_f(\xi) &= \int_{\mathbb{R}^n} e^{i\xi \cdot x} f(x) \, d\mathcal{L}^n(x) = \hat{f}(-\xi),
\end{align*}
where $\hat{f}$ uses the analysis convention $\hat{f}(\xi) = \int f(x) e^{-i\xi \cdot x} \, d\mathcal{L}^n(x)$ from the [Fourier Transform](/page/Fourier%20Transform) page.
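The sign relation can be verified by hand on a simple density, chosen here for illustration. For $f = \mathbb{1}_{[0,1]}$ on $\mathbb{R}$ and $\xi \neq 0$:

\begin{align*}
\hat{\mu}_f(\xi) &= \int_0^1 e^{i\xi x} \, d\mathcal{L}^1(x) = \frac{e^{i\xi} - 1}{i\xi}, \\
\hat{f}(\xi) &= \int_0^1 e^{-i\xi x} \, d\mathcal{L}^1(x) = \frac{1 - e^{-i\xi}}{i\xi}, \\
\hat{f}(-\xi) &= \frac{1 - e^{i\xi}}{-i\xi} = \frac{e^{i\xi} - 1}{i\xi} = \hat{\mu}_f(\xi).
\end{align*}

At $\xi = 0$ both sides equal $1 = \|f\|_{L^1}$.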
[example:Gaussian Measure]
Let $\mu$ be the Gaussian probability measure on $\mathbb{R}$ with density
\begin{align*}
f: \mathbb{R} &\to \mathbb{R} \\
x &\mapsto \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.
\end{align*}
This defines a positive Borel measure $\mu = f \, d\mathcal{L}^1 \in M(\mathbb{R})$ with $\|\mu\| = 1$. Its Fourier–Stieltjes transform is
\begin{align*}
\hat{\mu}(\xi) &= \int_{\mathbb{R}} e^{i\xi x} \cdot \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, d\mathcal{L}^1(x).
\end{align*}
To evaluate this, complete the square in the exponent:
\begin{align*}
i\xi x - \frac{x^2}{2} &= -\frac{1}{2}(x - i\xi)^2 - \frac{\xi^2}{2}.
\end{align*}
Substituting $u = x - i\xi$ and shifting the contour of integration (justified because the Gaussian decays rapidly in horizontal strips of $\mathbb{C}$):
\begin{align*}
\hat{\mu}(\xi) &= \frac{1}{\sqrt{2\pi}} e^{-\xi^2/2} \int_{\mathbb{R}} e^{-u^2/2} \, d\mathcal{L}^1(u) = e^{-\xi^2/2}.
\end{align*}
The Fourier–Stieltjes transform of the standard Gaussian is itself a Gaussian $\xi \mapsto e^{-\xi^2/2}$. This function decays rapidly at infinity — consistent with the Riemann–Lebesgue lemma, since $\mu$ is absolutely continuous.
[/example]
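The closed form in the Gaussian example above can be sanity-checked without any contour argument. The following minimal sketch approximates the defining integral by the trapezoid rule on a truncated interval and compares it with $e^{-\xi^2/2}$; the function names, the truncation half-width, and the step count are illustrative assumptions.

```python
import cmath
import math


def gaussian_density(x):
    """Density of the standard Gaussian measure on R."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def transform_trapezoid(xi, half_width=12.0, steps=4000):
    """Trapezoid approximation of the integral of e^{i xi x} f(x) over [-half_width, half_width].

    The tails beyond half_width carry mass of order exp(-half_width**2 / 2)
    and are ignored.
    """
    h = 2 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # endpoint weights of the trapezoid rule
        total += weight * cmath.exp(1j * xi * x) * gaussian_density(x)
    return total * h


if __name__ == "__main__":
    for xi in (0.0, 1.0, 2.5):
        print(xi, abs(transform_trapezoid(xi) - math.exp(-xi * xi / 2)))
```

The imaginary part of the numerical value is close to zero, as it must be: the Gaussian measure is symmetric, so its transform is real-valued.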
## Convolution of Measures
One of the most important algebraic features of the Fourier–Stieltjes transform is its behavior under convolution. Just as the [Fourier transform](/page/Fourier%20Transform) converts convolution of $L^1$ functions into pointwise multiplication, the Fourier–Stieltjes transform converts convolution of measures into pointwise multiplication.
The convolution of two measures needs careful definition, since we cannot simply "multiply densities and integrate" when no densities exist. The correct approach uses the product measure and a summation map.
[definition:Convolution Of Measures]
Let $\mu, \nu \in M(\mathbb{R}^n)$. The **convolution** $\mu * \nu \in M(\mathbb{R}^n)$ is defined by
\begin{align*}
(\mu * \nu)(A) &= \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} \mathbb{1}_A(x + y) \, d\mu(x) \, d\nu(y)
\end{align*}
for every $A \in \mathcal{B}(\mathbb{R}^n)$. Equivalently, $\mu * \nu$ is the image (pushforward) of the product measure $\mu \otimes \nu$ under the addition map $(x, y) \mapsto x + y$.
[/definition]
The equivalence of these two descriptions is immediate from the definition of a pushforward: $(\mu * \nu)(A) = (\mu \otimes \nu)(\{(x,y) : x + y \in A\}) = \int \int \mathbb{1}_A(x + y) \, d\mu(x) \, d\nu(y)$. The total variation satisfies $\|\mu * \nu\| \leq \|\mu\| \cdot \|\nu\|$, so convolution is a bounded bilinear operation on $M(\mathbb{R}^n)$. Together with the total variation norm, this makes $M(\mathbb{R}^n)$ a Banach algebra — and $\delta_0$ is the identity, since $\mu * \delta_0 = \mu$ for every $\mu$.
The Fourier–Stieltjes transform is useful precisely because it converts algebraic operations on measures into pointwise operations on functions. We already saw that it maps the measure algebra $(M(\mathbb{R}^n), \|\cdot\|)$ into $(L^\infty(\mathbb{R}^n), \|\cdot\|_\infty)$ contractively. The next result shows it is actually a **Banach algebra homomorphism**: it converts convolution of measures into pointwise multiplication of transforms. Without this property, the transform would be a mere curiosity — a way to assign a function to a measure. With it, the transform becomes a computational tool: convolution equations become algebraic equations in the frequency domain.
[theorem:Convolution And The Fourier Stieltjes Transform]
Let $\mu, \nu \in M(\mathbb{R}^n)$. Then
\begin{align*}
\widehat{\mu * \nu}(\xi) &= \hat{\mu}(\xi) \cdot \hat{\nu}(\xi) \quad \text{for all } \xi \in \mathbb{R}^n.
\end{align*}
[/theorem]
The proof is a direct computation using Fubini's theorem. Since $|\mu| \otimes |\nu|$ is a finite measure on $\mathbb{R}^n \times \mathbb{R}^n$ and the integrand $(x,y) \mapsto e^{i\xi \cdot (x+y)}$ is bounded, Fubini applies without further conditions:
\begin{align*}
\widehat{\mu * \nu}(\xi) &= \int_{\mathbb{R}^n} e^{i\xi \cdot z} \, d(\mu * \nu)(z) \\
&= \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} e^{i\xi \cdot (x + y)} \, d\mu(x) \, d\nu(y) \\
&= \left( \int_{\mathbb{R}^n} e^{i\xi \cdot x} \, d\mu(x) \right) \left( \int_{\mathbb{R}^n} e^{i\xi \cdot y} \, d\nu(y) \right) \\
&= \hat{\mu}(\xi) \cdot \hat{\nu}(\xi).
\end{align*}
This is the measure-theoretic version of the [Convolution Theorem for Fourier Transforms](/theorems/527). In probability, it yields the well-known fact that the characteristic function of a sum of independent random variables is the product of their characteristic functions: if $X$ and $Y$ are independent, then $\mu_{X+Y} = \mu_X * \mu_Y$, and so $\phi_{X+Y} = \phi_X \cdot \phi_Y$.
[example:Convolution Of Point Masses]
Let $a, b \in \mathbb{R}^n$. The convolution of two Dirac masses is
\begin{align*}
(\delta_a * \delta_b)(A) &= \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} \mathbb{1}_A(x+y) \, d\delta_a(x) \, d\delta_b(y) = \mathbb{1}_A(a + b) = \delta_{a+b}(A)
\end{align*}
for every Borel set $A$. On the Fourier side:
\begin{align*}
\hat{\delta}_a(\xi) \cdot \hat{\delta}_b(\xi) &= e^{ia \cdot \xi} \cdot e^{ib \cdot \xi} = e^{i(a+b) \cdot \xi} = \hat{\delta}_{a+b}(\xi),
\end{align*}
confirming the convolution theorem. The Fourier–Stieltjes transform converts convolution (addition of mass locations) into pointwise multiplication of exponentials.
[/example]
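The same computation extends by bilinearity to finitely supported complex measures on $\mathbb{R}$, where the convolution places mass $\mu(\{x\}) \, \nu(\{y\})$ at each point $x + y$. The sketch below represents such a measure as a dictionary from points to complex masses (a representation assumed here purely for illustration) and checks the convolution theorem and the norm inequality $\|\mu * \nu\| \leq \|\mu\| \, \|\nu\|$ on hypothetical data.

```python
import cmath
from collections import defaultdict


def convolve(mu, nu):
    """Pushforward of the product measure under (x, y) -> x + y."""
    out = defaultdict(complex)
    for x, a in mu.items():
        for y, b in nu.items():
            out[x + y] += a * b
    return dict(out)


def transform(mu, xi):
    """Fourier-Stieltjes transform of a finitely supported measure, convention e^{+i xi x}."""
    return sum(m * cmath.exp(1j * xi * x) for x, m in mu.items())


def total_variation(mu):
    """Total variation norm: sum of the moduli of the point masses."""
    return sum(abs(m) for m in mu.values())


# Hypothetical measures; nu carries a purely imaginary mass at x = 2.
mu = {0.0: 0.5, 1.0: 0.5}
nu = {-1.0: 0.25, 0.0: 0.5, 2.0: -0.25j}
conv = convolve(mu, nu)
```

For every frequency `xi`, `transform(conv, xi)` agrees with `transform(mu, xi) * transform(nu, xi)` up to rounding, and `convolve({a: 1}, {b: 1})` returns the single point mass at `a + b`, matching the example.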
## The Uniqueness Theorem
The most consequential property of the Fourier–Stieltjes transform is that it is injective: two distinct measures in $M(\mathbb{R}^n)$ cannot have the same Fourier–Stieltjes transform. Without injectivity, the transform would be a lossy operation — useful for computation but unable to serve as a faithful representation of measures. The uniqueness theorem guarantees that no information is lost, and that any property of $\mu$ can in principle be recovered from $\hat{\mu}$.
In probability, this is the statement that the characteristic function determines the distribution: if two random variables have the same characteristic function, they have the same law. This underpins the classical strategy of proving [distributional](/page/Distribution) identities by checking that characteristic functions agree.
[theorem:Uniqueness Of The Fourier Stieltjes Transform]
Let $\mu, \nu \in M(\mathbb{R}^n)$. If $\hat{\mu}(\xi) = \hat{\nu}(\xi)$ for all $\xi \in \mathbb{R}^n$, then $\mu = \nu$.
Equivalently, the Fourier–Stieltjes transform $\mu \mapsto \hat{\mu}$ is an injective homomorphism from the Banach algebra $(M(\mathbb{R}^n), *, \|\cdot\|)$ into $(C_b(\mathbb{R}^n), \cdot, \|\cdot\|_\infty)$.
[/theorem]
The idea of the proof is to use the Fourier–Stieltjes transform to test $\mu$ against a sufficiently rich family of functions. Fix $\varepsilon > 0$ and consider the Gaussian measure $\gamma_\varepsilon$ with density $(2\pi\varepsilon)^{-n/2} \exp(-|x|^2 / 2\varepsilon)$. The Fourier–Stieltjes transform of $\gamma_\varepsilon$ is $\hat{\gamma}_\varepsilon(\xi) = e^{-\varepsilon|\xi|^2/2}$. By the convolution theorem, the function
\begin{align*}
(\hat{\mu} \cdot \hat{\gamma}_\varepsilon)(\xi) &= \widehat{\mu * \gamma_\varepsilon}(\xi)
\end{align*}
is the Fourier–Stieltjes transform of the smoothed measure $\mu * \gamma_\varepsilon$. The measure $\mu * \gamma_\varepsilon$ has a smooth density with respect to $\mathcal{L}^n$ (it is the convolution of $\mu$ with a Gaussian), so the $L^1$ [Fourier inversion theorem](/theorems/364) recovers this density from $\hat{\mu} \cdot \hat{\gamma}_\varepsilon$. Since $\hat{\mu} = \hat{\nu}$ everywhere, $\mu * \gamma_\varepsilon = \nu * \gamma_\varepsilon$ for every $\varepsilon > 0$. Letting $\varepsilon \to 0$, the Gaussian approximation to the identity gives $\mu * \gamma_\varepsilon \to \mu$ and $\nu * \gamma_\varepsilon \to \nu$ in the weak* [topology](/page/Topology), and hence $\mu = \nu$.
This argument illustrates a recurring theme: Gaussian convolution serves as a regularisation that transfers information from the Fourier side back to the measure side, exploiting the self-reproducing property of the Gaussian under the Fourier transform.
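The inversion step can be written out explicitly. Writing $g_\varepsilon$ for the density of $\mu * \gamma_\varepsilon$ (a name introduced here for illustration), and using the inversion formula that matches the convention $e^{i\xi \cdot x}$ of this page:

\begin{align*}
g_\varepsilon(x) &= \int_{\mathbb{R}^n} (2\pi\varepsilon)^{-n/2} \exp\left(-\frac{|x - y|^2}{2\varepsilon}\right) d\mu(y) \\
&= \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \hat{\mu}(\xi) \, e^{-\varepsilon|\xi|^2/2} \, e^{-i\xi \cdot x} \, d\mathcal{L}^n(\xi).
\end{align*}

The second line follows by applying the inversion formula to the Gaussian density and exchanging the order of integration, which is permitted because $\hat{\mu}$ is bounded and the Gaussian factor is integrable. The right-hand side depends on $\mu$ only through $\hat{\mu}$, which is why $\hat{\mu} = \hat{\nu}$ forces $\mu * \gamma_\varepsilon = \nu * \gamma_\varepsilon$.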
## Positive-Definite Functions and Bochner's Theorem
We saw above that the Fourier–Stieltjes transform of a positive measure is a continuous positive-definite function. A deep result of Bochner shows that the converse holds: every continuous positive-definite function arises as the Fourier–Stieltjes transform of some positive finite Borel measure. This provides a complete characterisation of which continuous functions are Fourier–Stieltjes transforms of positive measures.
To state Bochner's theorem precisely, we need to isolate what "positive-definite" means as a standalone condition, independent of any underlying measure. The definition below captures the algebraic property that the computation in property (e) established: the matrix $(\varphi(\xi_j - \xi_k))_{j,k}$ is positive semi-definite for every finite collection of points.
[definition:Positive Definite Function]
A function $\varphi: \mathbb{R}^n \to \mathbb{C}$ is **positive-definite** if for every $m \in \mathbb{N}$, every $\xi_1, \ldots, \xi_m \in \mathbb{R}^n$, and every $c_1, \ldots, c_m \in \mathbb{C}$,
\begin{align*}
\sum_{j=1}^{m} \sum_{k=1}^{m} c_j \overline{c_k} \, \varphi(\xi_j - \xi_k) &\geq 0.
\end{align*}
[/definition]
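Positive-definiteness is a strong condition. Taking $m = 1$ gives $\varphi(0) \geq 0$. Taking $m = 2$ with the points $\xi_1 = \xi$ and $\xi_2 = 0$, the condition says that the $2 \times 2$ matrix with diagonal entries $\varphi(0)$ and off-diagonal entries $\varphi(\xi)$ and $\varphi(-\xi)$ has a real, non-negative quadratic form for all $c_1, c_2 \in \mathbb{C}$, so it is Hermitian and positive semi-definite. Hermitian symmetry gives $\varphi(-\xi) = \overline{\varphi(\xi)}$, and non-negativity of the determinant gives $\varphi(0)^2 - |\varphi(\xi)|^2 \geq 0$. In particular, $|\varphi(\xi)| \leq \varphi(0)$ for all $\xi$: the function is bounded and attains its maximum modulus at the origin.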
With this definition, Bochner's theorem asserts that continuity and positive-definiteness together are necessary and sufficient for a function to be the Fourier–Stieltjes transform of a positive finite Borel measure.
[theorem:Bochner]
A function $\varphi: \mathbb{R}^n \to \mathbb{C}$ is continuous and positive-definite if and only if there exists a positive finite Borel measure $\mu \in M(\mathbb{R}^n)$ such that
\begin{align*}
\varphi(\xi) &= \hat{\mu}(\xi) = \int_{\mathbb{R}^n} e^{i\xi \cdot x} \, d\mu(x) \quad \text{for all } \xi \in \mathbb{R}^n.
\end{align*}
The measure $\mu$ is uniquely determined by $\varphi$ (by the uniqueness theorem), and $\mu(\mathbb{R}^n) = \varphi(0)$.
[/theorem]
The "if" direction (every Fourier–Stieltjes transform of a positive finite measure is continuous and positive-definite) was verified in the discussion of property (e) above: we showed by direct computation that $\hat{\mu}$ is positive-definite whenever $\mu$ is a positive measure, and we already know $\hat{\mu}$ is continuous.
The "only if" direction (every continuous positive-definite function is such a transform) is the deep part. The idea is to use positive-definiteness to construct the measure $\mu$ as a [limit](/page/Limit) of absolutely continuous approximations. For $\varepsilon > 0$, define the approximate density
\begin{align*}
f_\varepsilon(x) &= \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \varphi(\xi) \, e^{-i\xi \cdot x} \, e^{-\varepsilon|\xi|^2/2} \, d\mathcal{L}^n(\xi).
\end{align*}
Positive-definiteness of $\varphi$ ensures that $f_\varepsilon(x) \geq 0$ for all $x$. The function $f_\varepsilon$ is integrable (the Gaussian factor ensures convergence), and the measures $\mu_\varepsilon = f_\varepsilon \, d\mathcal{L}^n$ form a bounded net in $M(\mathbb{R}^n)$ with $\|\mu_\varepsilon\| = \int f_\varepsilon \, d\mathcal{L}^n = \varphi(0)$. By the Banach–Alaoglu theorem (applied to $M(\mathbb{R}^n)$ as the dual of $C_0(\mathbb{R}^n)$), a subnet converges weak* to some positive measure $\mu$, and a dominated convergence argument shows $\hat{\mu} = \varphi$.
In probability, Bochner's theorem is the reason that characteristic functions are such effective tools: any function satisfying a handful of checkable conditions (continuity, $\varphi(0) = 1$, positive-definiteness) is guaranteed to be the characteristic function of some probability distribution. This is used, for instance, to prove the existence of Gaussian processes and infinitely divisible distributions.
[example:Positive Definiteness Of The Gaussian]
Consider the function $\varphi: \mathbb{R} \to \mathbb{R}$ defined by $\varphi(\xi) = e^{-|\xi|^2/2}$. We verified in the Gaussian example above that $\varphi = \hat{\mu}$ where $\mu$ is the standard Gaussian measure. Therefore, by the easy direction of Bochner's theorem (the transform of a positive measure is positive-definite, which is property (e)), $\varphi$ must be positive-definite.
This can also be verified directly. For any $\xi_1, \ldots, \xi_m \in \mathbb{R}$ and $c_1, \ldots, c_m \in \mathbb{C}$:
\begin{align*}
\sum_{j,k} c_j \overline{c_k} \, e^{-|\xi_j - \xi_k|^2/2} &= \sum_{j,k} c_j \overline{c_k} \, e^{-\xi_j^2/2} \, e^{-\xi_k^2/2} \, e^{\xi_j \xi_k} \\
&= \sum_{j,k} c_j \overline{c_k} \, e^{-\xi_j^2/2} \, e^{-\xi_k^2/2} \sum_{p=0}^{\infty} \frac{(\xi_j \xi_k)^p}{p!} \\
&= \sum_{p=0}^{\infty} \frac{1}{p!} \left| \sum_{j=1}^{m} c_j \, e^{-\xi_j^2/2} \, \xi_j^p \right|^2 \geq 0.
\end{align*}
Each term in the [series](/page/Series) is a squared modulus and hence non-negative, confirming positive-definiteness.
[/example]
## The Lévy Continuity Theorem
In probability, the most important application of the Fourier–Stieltjes transform is to questions of convergence. The [central limit theorem](/theorems/521), for instance, is proved by showing that the characteristic functions of normalised sums converge pointwise to $e^{-|\xi|^2/2}$ — and then concluding that the distributions converge weakly to the Gaussian. The theorem that justifies this final step is the Lévy continuity theorem.
The difficulty that the Lévy continuity theorem addresses is the gap between pointwise convergence of transforms and [weak convergence](/page/Weak%20Convergence) of measures. Pointwise convergence of $\hat{\mu}_m$ to some function $\varphi$ does not automatically imply that the measures $\mu_m$ converge to anything — the [sequence](/page/Sequence) could "leak mass to infinity." The Lévy continuity theorem identifies the precise condition under which no mass escapes: continuity of the limit $\varphi$ at the origin.
[theorem:Levy Continuity Theorem]
Let $(\mu_m)_{m \in \mathbb{N}}$ be a sequence of Borel probability measures on $\mathbb{R}^n$, and suppose that the Fourier–Stieltjes transforms converge pointwise:
\begin{align*}
\hat{\mu}_m(\xi) &\to \varphi(\xi) \quad \text{for every } \xi \in \mathbb{R}^n.
\end{align*}
(a) If $\varphi$ is continuous at $\xi = 0$, then $\varphi$ is the Fourier–Stieltjes transform of a Borel probability measure $\mu$ on $\mathbb{R}^n$, and $\mu_m \to \mu$ weakly (that is, $\mu_m \xrightarrow{d} \mu$).
(b) Conversely, if $\mu_m \to \mu$ weakly for some probability measure $\mu$, then $\hat{\mu}_m(\xi) \to \hat{\mu}(\xi)$ for every $\xi \in \mathbb{R}^n$.
[/theorem]
Part (b) is the easier direction: if $\mu_m \to \mu$ weakly, then by definition $\int g \, d\mu_m \to \int g \, d\mu$ for every bounded continuous function $g$. Since $x \mapsto e^{i\xi \cdot x}$ is bounded and continuous for each fixed $\xi$, the conclusion follows immediately.
Part (a) is the substantial content. The hypothesis that $\varphi$ is continuous at $0$ is what prevents mass from escaping to infinity. To see why, suppose instead that mass did escape: then there would exist $\varepsilon > 0$ and a subsequence $\mu_{m_k}$ with $\mu_{m_k}(\mathbb{R}^n \setminus B(0, R_k)) \geq \varepsilon$ for some $R_k \to \infty$. The mass at infinity causes the transforms $\hat{\mu}_{m_k}$ to oscillate near the origin, which is incompatible with the limit $\varphi$ being continuous there.
The formal proof uses the tightness criterion: continuity of $\varphi$ at $0$ implies that the sequence $(\mu_m)$ is tight (this is established via a Gaussian averaging argument), and Prokhorov's theorem then gives [weak sequential compactness](/theorems/214). The uniqueness theorem identifies the limit.
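One standard form of the averaging estimate behind tightness is sketched below in dimension one, with a uniform average over $[-u, u]$ in place of a Gaussian one (a simplification assumed here, not necessarily the variant the proof outline has in mind). For a Borel probability measure $\mu$ on $\mathbb{R}$ and $u > 0$, Fubini's theorem gives

\begin{align*}
\frac{1}{u} \int_{-u}^{u} \bigl( 1 - \hat{\mu}(t) \bigr) \, d\mathcal{L}^1(t) &= \int_{\mathbb{R}} 2\left( 1 - \frac{\sin(ux)}{ux} \right) d\mu(x) \\
&\geq \mu\bigl( \{ x : |x| \geq 2/u \} \bigr),
\end{align*}

because the integrand on the right is non-negative everywhere (it is read as $0$ at $x = 0$) and is at least $1$ when $|ux| \geq 2$. Applied to $\mu_m$, the left-hand side converges to $\frac{1}{u} \int_{-u}^{u} (1 - \varphi(t)) \, d\mathcal{L}^1(t)$ by bounded convergence, and this is small for small $u$ precisely because $\varphi$ is continuous at $0$ with $\varphi(0) = 1$. The mass of $\mu_m$ outside $[-2/u, 2/u]$ is therefore small for all large $m$, and since each of the finitely many remaining measures is individually tight, the whole sequence is tight.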
## References
- Folland, G.B., *Real Analysis: Modern Techniques and Their Applications* (1999), Chapter 8.
- Rudin, W., *Fourier Analysis on Groups* (1962), Chapters 1–2.
- Katznelson, Y., *An Introduction to Harmonic Analysis* (2004), Chapter VI.
- Durrett, R., *Probability: Theory and Examples* (2019), Chapter 3.