Harmonic analysis is the study of how functions decompose into simpler oscillatory components and how operators act on these decompositions. This graduate course explores the modern theory of singular integral operators, the workhorses of harmonic analysis, and develops the function spaces and techniques needed to understand their boundedness. The theory culminates in the T(1) and T(b) theorems, which give testing conditions under which a singular integral operator is bounded on $L^2$, with applications ranging from partial differential equations to signal processing and approximation theory.
The course begins by establishing foundational tools: interpolation theory allows us to leverage bounds on simple operators to obtain bounds on more complex ones, while the Hardy-Littlewood maximal function and Calderón-Zygmund decomposition provide mechanisms for controlling oscillations and localizing analysis. The Hilbert transform serves as the canonical example of a singular integral, introducing the key phenomenon that integrating along a line with a singular kernel can still yield bounded operators. This motivates the general Calderón-Zygmund theory, which systematically identifies when such operators are bounded, before the T(1) and T(b) theorems offer practical criteria for verification.
The latter half of the course shifts focus to deeper structures and refined techniques. The real Hardy space $H^1$ and its dual BMO capture the fine regularity properties of functions beyond classical Lebesgue spaces, while the Littlewood-Paley decomposition and Fourier multipliers provide alternative ways to analyze function behavior through frequency localization. Specialization topics — wavelets, Besov and Triebel-Lizorkin spaces, Muckenhoupt weights, and stationary phase methods — illustrate how these core ideas apply to modern problems, from harmonic analysis on weighted spaces to geometric measure theory and restriction phenomena in Fourier analysis.
# 1. Real and Complex Interpolation
This opening chapter establishes the two interpolation machines that drive almost every boundedness argument in the course. The central question is: if a linear operator behaves well at two extreme exponents, does it behave well at all intermediate exponents? The Riesz–Thorin theorem (complex interpolation) and the Marcinkiewicz interpolation theorem (real interpolation) give two different — and complementary — affirmative answers. To state these results precisely, one must first introduce the distribution function and its companion, the decreasing rearrangement, together with the Lorentz spaces that refine the $L^p$ scale. As a first major application, the Hausdorff–Young inequality for the Fourier transform follows directly from Riesz–Thorin.
## Distribution Functions and Decreasing Rearrangements
The starting point is an efficient way to encode the size of a measurable function without caring about where on the domain its large values occur. This leads to the distribution function, a classical device that converts pointwise information into measure-theoretic information.
[definition: Distribution Function]
Let $(X, \mu)$ be a $\sigma$-finite measure space and let $f: X \to \mathbb{C}$ be measurable. The **distribution function** of $f$ is the function $d_f: (0, \infty) \to [0, \infty]$ defined by
\begin{align*}
d_f(\lambda) = \mu\bigl(\{x \in X : |f(x)| > \lambda\}\bigr).
\end{align*}
[/definition]
The map $\lambda \mapsto d_f(\lambda)$ is non-increasing and right-continuous. It records how much of the domain is occupied by the "large" part of $f$, at each level $\lambda$.
[example: Distribution Function of a Power]
Let $f(x) = |x|^{-\alpha}$ on $\mathbb{R}^n$ with $0 < \alpha < n$. Then $|f(x)| > \lambda$ iff $|x| < \lambda^{-1/\alpha}$, so
\begin{align*}
d_f(\lambda) = \mathcal{L}^n\bigl(\{|x| < \lambda^{-1/\alpha}\}\bigr) = \omega_n \lambda^{-n/\alpha},
\end{align*}
where $\omega_n$ is the volume of the unit ball in $\mathbb{R}^n$. The power law $d_f(\lambda) \sim \lambda^{-n/\alpha}$ is the signature of a weak-$L^p$ function with $p = n/\alpha$: in the notation of the Lorentz spaces defined below, $\|f\|_{L^{p,\infty}} = \omega_n^{1/p} < \infty$. On the other hand, in polar coordinates $\int_{|x| < R} |x|^{-\alpha p}\, d\mathcal{L}^n = n\omega_n \int_0^R r^{n - \alpha p - 1}\, dr$ (the unit sphere has surface measure $n\omega_n$), and since $\alpha p = n$ the integrand is $r^{-1}$, whose integral diverges logarithmically at $0$. Hence $f \in L^{p,\infty}(\mathbb{R}^n) \setminus L^p(\mathbb{R}^n)$. This is the canonical example showing that the Lorentz refinement $L^{p,\infty} \supsetneq L^p$ is not vacuous.
[/example]
The fundamental link between the distribution function and $L^p$ norms is the layer-cake formula. It allows one to compute $L^p$ norms by integrating the distribution function — a computation over $\lambda \in (0, \infty)$ rather than over $X$.
[quotetheorem:2956]
[citeproof:2956]
The hypotheses that $f$ is non-negative and that $(X, \mu)$ is $\sigma$-finite are both used in the proof, which exchanges the order of two integrals. Non-negativity ensures the integrand is non-negative, so Tonelli's theorem applies without integrability preconditions; without $\sigma$-finiteness, Tonelli's theorem need not apply.
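For a simple function the layer-cake identity can be checked by direct arithmetic. The sketch below assumes the formula in its standard form, $\int |f|^p\, d\mu = p\int_0^\infty \lambda^{p-1} d_f(\lambda)\, d\lambda$, and models $f$ as a list of (value, measure) pairs; the function names are illustrative and not part of the course.

```python
# A simple function is modelled as a list of (value, measure) pairs:
# |f| takes `value` on a set of measure `measure`, and the sets are disjoint.

def distribution(pieces, lam):
    """d_f(lam): total measure of the set where |f| > lam."""
    return sum(m for v, m in pieces if abs(v) > lam)

def lp_norm_p(pieces, p):
    """Integral of |f|^p, computed directly over the domain."""
    return sum(abs(v) ** p * m for v, m in pieces)

def layer_cake(pieces, p):
    """p * integral of lam^(p-1) * d_f(lam) over (0, inf), computed exactly.

    d_f is constant between consecutive values of |f|, and the integral
    of p * lam^(p-1) over [a, b) equals b^p - a^p.
    """
    levels = sorted({abs(v) for v, _ in pieces if v != 0})
    total, prev = 0.0, 0.0
    for level in levels:
        mid = (prev + level) / 2          # any point of [prev, level) works
        total += distribution(pieces, mid) * (level ** p - prev ** p)
        prev = level
    return total

# f equals 3 on a set of measure 1 and 5 on a set of measure 2
f = [(3, 1.0), (5, 2.0)]
for p in (1, 2, 3.5):
    print(p, lp_norm_p(f, p), layer_cake(f, p))
```

For $p = 2$ both columns equal $9 \cdot 1 + 25 \cdot 2 = 59$: the layer-cake side contributes $3 \cdot (9 - 0)$ from the levels below $3$ and $2 \cdot (25 - 9)$ from the levels between $3$ and $5$.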
The layer-cake formula motivates studying $d_f$ directly. The decreasing rearrangement packages $d_f$ into a function on $(0, \infty)$ that has the same distribution as $f$ but is arranged in decreasing order.
<!-- illustration-needed: graph of $f^*$ as the sorted version of $|f|$ — show a step function $f$ on the real line alongside its rearrangement $f^*$ on $(0,\infty)$, with level sets of $|f|$ at height $\lambda$ corresponding to the jump in $f^*$ at $t = d_f(\lambda)$ -->
[definition: Decreasing Rearrangement]
The **decreasing rearrangement** of $f$ is the function $f^*: (0, \infty) \to [0, \infty]$ defined, with the convention $\inf \emptyset = \infty$, by
\begin{align*}
f^*(t) = \inf\{\lambda > 0 : d_f(\lambda) \le t\}.
\end{align*}
[/definition]
Equivalently, $f^*$ is the right-continuous inverse (generalised inverse) of $d_f$. The key properties are: $f^*$ is non-increasing, right-continuous, and $f$ and $f^*$ are equimeasurable in the sense that $d_{f^*} = d_f$. In particular, $\|f^*\|_{L^p(0,\infty)} = \|f\|_{L^p(X)}$ for all $1 \le p \le \infty$.
[example: Rearrangement of a Step Function]
Let $f(x) = 3 \cdot \mathbb{1}_{[0,1]}(x) + 5 \cdot \mathbb{1}_{[2,4]}(x)$ on $\mathbb{R}$ with Lebesgue measure. Then
\begin{align*}
d_f(\lambda) = \begin{cases} 3 & 0 < \lambda < 3, \\ 2 & 3 \le \lambda < 5, \\ 0 & \lambda \ge 5. \end{cases}
\end{align*}
Computing the inverse: $f^*(t) = 5$ for $t \in (0,2)$, and $f^*(t) = 3$ for $t \in [2,3)$, and $f^*(t) = 0$ for $t \ge 3$. This function on $(0, \infty)$ is the "sorted" version of $f$: largest values first.
[/example]
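The computation in the example can be mechanised for any simple function. The following is a minimal sketch, again with $f$ given as hypothetical (value, measure) pairs: sorting the values of $|f|$ in decreasing order and accumulating measure reproduces $f^*(t) = \inf\{\lambda > 0 : d_f(\lambda) \le t\}$.

```python
def distribution(pieces, lam):
    """d_f(lam): total measure of the set where |f| > lam."""
    return sum(m for v, m in pieces if abs(v) > lam)

def rearrangement(pieces, t):
    """f*(t) for a simple function given as (value, measure) pairs.

    Walk through the values of |f| from largest to smallest; f*(t) is the
    first value whose cumulative measure exceeds t, and 0 once t reaches
    the total measure of the support.
    """
    acc = 0.0
    for v, m in sorted(((abs(v), m) for v, m in pieces), reverse=True):
        acc += m
        if t < acc:
            return v
    return 0.0

# The step function of the example: 3 on [0,1] and 5 on [2,4]
f = [(3, 1.0), (5, 2.0)]
for lam in (1.0, 3.0, 4.0, 5.0):
    print("d_f(%.1f) =" % lam, distribution(f, lam))
for t in (0.5, 2.0, 2.9, 3.0):
    print("f*(%.1f) =" % t, rearrangement(f, t))
```

The strict comparison `t < acc` is what makes the computed $f^*$ right-continuous: at $t = 2$ the value has already dropped from $5$ to $3$.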
The rearrangement repackages all the size information of $f$ into a one-dimensional non-increasing function, but the $L^p$ norm of $f^*$, being equal to that of $f$, still depends only on the superlevel-set measures $d_f(\lambda)$ in an aggregate way. To detect finer distinctions between functions whose distribution functions decay at the same rate, one needs to weight the rearrangement by an additional integrability index. This is exactly what the Lorentz scale does, and the next section introduces it.
## Lorentz Spaces
Consider the function $f(x) = |x|^{-n/p}$ on $\mathbb{R}^n$. Its distribution function satisfies $d_f(\lambda) = \omega_n \lambda^{-p}$, so the layer-cake integral $\int_0^\infty \lambda^{p-1} d_f(\lambda)\, d\lambda$ diverges at both $0$ and $\infty$: $f \notin L^p(\mathbb{R}^n)$. Yet $d_f(\lambda)$ decays exactly like $\lambda^{-p}$ — the borderline case. Does the $L^p$ scale see this function at all? It does not: $f$ lives in $L^{p,\infty} \setminus L^p$, a gap that the standard scale cannot resolve. Lorentz spaces are designed precisely to fill this gap. They refine $L^p$ by introducing a second parameter $q$ that controls how regularly $d_f$ decays: not merely whether the decay is $O(\lambda^{-p})$, but whether the contributions at each level $\lambda$ are summable in a finer $L^q$ sense.
[definition: Lorentz Space]
Let $1 \le p < \infty$ and $1 \le q \le \infty$. The **Lorentz space** $L^{p,q}(X, \mu)$ consists of all measurable functions $f$ for which the quantity $\|f\|_{L^{p,q}}$ is finite, where
\begin{align*}
\|f\|_{L^{p,q}} = \begin{cases} \displaystyle \left( \int_0^\infty \bigl(t^{1/p} f^*(t)\bigr)^q \frac{dt}{t} \right)^{1/q} & \text{if } q < \infty, \\ \sup_{t > 0}\, t^{1/p} f^*(t) & \text{if } q = \infty. \end{cases}
\end{align*}
[/definition]
The quantity $t^{1/p} f^*(t)$ captures how large the $p$-th moment contribution is at level $t$: if $f^*(t) \approx t^{-1/p}$, the contributions are constant, corresponding to a borderline $L^p$ function. The $L^q(dt/t)$ norm of this expression measures whether those contributions are summable.
[remark: The Special Cases $L^{p,p}$ and $L^{p,\infty}$]
Since $\int_0^\infty \bigl(t^{1/p} f^*(t)\bigr)^p \frac{dt}{t} = \int_0^\infty f^*(t)^p\, dt = \|f\|_{L^p}^p$ by equimeasurability, we have $L^{p,p} = L^p$ with equal norms. The endpoint space $L^{p,\infty}$ is called **weak-$L^p$** or **Marcinkiewicz space**: because $\sup_{t > 0} t^{1/p} f^*(t) = \sup_{\lambda > 0} \lambda\, d_f(\lambda)^{1/p}$, the condition $\|f\|_{L^{p,\infty}} \le C$ is equivalent to $d_f(\lambda) \le C^p \lambda^{-p}$ for all $\lambda > 0$. Weak-$L^p$ is strictly larger than $L^p$; for instance, $|x|^{-n/p} \in L^{p,\infty}(\mathbb{R}^n) \setminus L^p(\mathbb{R}^n)$.
[/remark]
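For a simple function the Lorentz quantity reduces to a finite sum, which makes the special cases easy to test numerically. The sketch below is an illustration under the definition above, with $f$ given as hypothetical (value, measure) pairs; on each step $[t_{k-1}, t_k)$ of $f^*$ with height $c_k$, the integral of $(t^{1/p} c_k)^q\, dt/t$ equals $c_k^q \tfrac{p}{q}\bigl(t_k^{q/p} - t_{k-1}^{q/p}\bigr)$.

```python
import math

def lorentz_norm(pieces, p, q):
    """||f||_{L^{p,q}} for a simple function given as (value, measure) pairs."""
    steps = sorted(((abs(v), m) for v, m in pieces if v != 0), reverse=True)
    t_prev, acc, best = 0.0, 0.0, 0.0
    for c, m in steps:
        t_next = t_prev + m               # f* equals c on [t_prev, t_next)
        if q == math.inf:
            # sup of t^(1/p) * c over the step is approached at the right end
            best = max(best, c * t_next ** (1 / p))
        else:
            acc += c ** q * (p / q) * (t_next ** (q / p) - t_prev ** (q / p))
        t_prev = t_next
    return best if q == math.inf else acc ** (1 / q)

f = [(3, 1.0), (5, 2.0)]
print(lorentz_norm(f, 2, 2))         # coincides with the L^2 norm, sqrt(59)
print(lorentz_norm(f, 2, math.inf))  # weak-L^2 quantity
print(lorentz_norm(f, 2, 1))         # the smallest space of the three
```

On this example the weak-$L^2$ quantity is $5\sqrt{2}$, attained in the limit at the right end of the first step, and it is smaller than the $L^2$ norm, as Chebyshev's inequality requires.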
The ordering in $q$ is captured by a chain of embeddings. This monotonicity in the secondary index (a smaller $q$ imposes a stronger summability condition and hence gives a smaller space) is the key structural property of the Lorentz scale and will reappear when we interpolate between weak-type endpoint bounds using Marcinkiewicz.
[quotetheorem:3148]
[citeproof:3148]
[remark: Why Lorentz Spaces Appear]
In interpolation theory, weak-type bounds — which place an operator in $\mathcal{L}(L^p, L^{p,\infty})$ rather than $\mathcal{L}(L^p, L^p)$ — arise naturally at endpoint exponents. The Lorentz scale provides the precise language for tracking these endpoint losses and gains during interpolation. The Marcinkiewicz theorem below is best understood as an interpolation theorem in the Lorentz scale.
[/remark]
With the Lorentz scale in hand, the stage is set for the interpolation theorems. Both Riesz–Thorin and Marcinkiewicz can be understood as assertions that the Lorentz scale is the natural receptacle for operators whose bounds vary continuously with the exponent.
## The Riesz–Thorin Theorem
The complex interpolation theorem is arguably the most elegant result in the subject: a linear operator bounded at two exponent pairs is bounded at all interpolated pairs, and the norm estimate is convex (in logarithm) along the interpolation path.
The proof uses complex analysis in an essential way — specifically, the three-lines lemma, which is the maximum principle applied to a strip in $\mathbb{C}$.
<!-- illustration-needed: the strip $\mathcal{S} = \{z \in \mathbb{C} : 0 \le \operatorname{Re}(z) \le 1\}$ in the complex plane, with the two boundary lines $\operatorname{Re}(z) = 0$ and $\operatorname{Re}(z) = 1$ labelled with bounds $M_0$ and $M_1$, and the intermediate line $\operatorname{Re}(z) = \theta$ labelled with $M_0^{1-\theta} M_1^\theta$ -->
[quotetheorem:3149]
[citeproof:3149]
The three-lines lemma is the analytic engine that powers Riesz–Thorin: it converts a pair of boundary bounds on a holomorphic function into a logarithmically convex bound on the interior of the strip, which is exactly the convexity expressed by the geometric mean $M_0^{1-\theta} M_1^\theta$ in the interpolation conclusion. To deploy this engine, we will encode the operator $T$ acting on a fixed simple function $f$ of $L^{p_\theta}$-norm one as a holomorphic function $F(z)$ on the strip, by attaching analytic exponents to the amplitudes of the elementary pieces of $f$ (and similarly on the dual side). At $\operatorname{Re}(z) = 0$ and $\operatorname{Re}(z) = 1$ the construction recovers test functions in $L^{p_0}$ and $L^{p_1}$ respectively, so the strong-type hypotheses on $T$ pin down the boundary values; at $z = \theta$ it recovers the action of $T$ on $f$ itself. Three-lines then delivers the conclusion in a single step.
[quotetheorem:949]
[citeproof:949]
The key structural insight is that the norm estimate $M_0^{1-\theta} M_1^\theta$ is the geometric mean of the two endpoint norms — the same convexity that log-convexity of $L^p$ norms expresses. This geometric convexity is essentially forced by the three-lines lemma.
To see Riesz–Thorin in action, we apply it to convolution operators where it recovers Young's inequality.
[example: Young's Convolution Inequality]
The convolution operator $T_g f = f * g$ satisfies $\|T_g f\|_1 \le \|g\|_1 \|f\|_1$ (so it is of type $(1, 1)$ with norm at most $\|g\|_1$) and $\|T_g f\|_\infty \le \|g\|_\infty \|f\|_1$ (type $(1, \infty)$ with norm at most $\|g\|_\infty$). Applying Riesz–Thorin with $p_0 = q_0 = 1$, $p_1 = 1$, $q_1 = \infty$, $M_0 = \|g\|_1$, $M_1 = \|g\|_\infty$, and interpolation parameter $\theta = 1 - 1/r$ gives $1/q_\theta = 1 - \theta = 1/r$ and hence the estimate $\|f * g\|_r \le \|g\|_1^{1/r} \|g\|_\infty^{1-1/r} \|f\|_1$. A more refined application fixes $g \in L^q$ and interpolates between type $(1, q)$ (Minkowski's integral inequality) and type $(q', \infty)$ (Hölder's inequality), both with norm at most $\|g\|_q$; this recovers Young's inequality $\|f * g\|_r \le \|f\|_p \|g\|_q$ for $1/p + 1/q = 1 + 1/r$.
[/example]
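Both estimates have exact analogues for finitely supported sequences on $\mathbb{Z}$ with counting measure, where they can be sanity-checked numerically. The sketch below is only such a check on one hypothetical pair of sequences, not a proof; the helper names are ours.

```python
def convolve(f, g):
    """Discrete convolution of two finitely supported sequences."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def norm(seq, p):
    """l^p norm of a finite sequence; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(a) for a in seq)
    return sum(abs(a) ** p for a in seq) ** (1 / p)

f = [1.0, -2.0, 0.5, 3.0]
g = [0.5, 1.0, -1.5]
r = 3.0
lhs = norm(convolve(f, g), r)

# Interpolated bound with theta = 1 - 1/r between types (1,1) and (1,inf)
interpolated = norm(g, 1) ** (1 / r) * norm(g, float("inf")) ** (1 - 1 / r) * norm(f, 1)

# Young's inequality with 1/p + 1/q = 1 + 1/r; here p = q = 3/2 and r = 3
p = q = 1.5
young = norm(f, p) * norm(g, q)

print(lhs, interpolated, young)
```

The first bound only uses $\|f\|_1$, so it is typically cruder than Young's inequality, which distributes the integrability between $f$ and $g$.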
## The Hausdorff–Young Inequality
The Riesz–Thorin theorem has a direct application to the Fourier transform. The Fourier transform $\mathcal{F}$ is bounded from $L^1(\mathbb{R}^n)$ to $L^\infty(\mathbb{R}^n)$ with norm 1, and by the Plancherel theorem it is an isometry on $L^2(\mathbb{R}^n)$ (up to a normalisation constant depending on the convention). Interpolating between these two facts gives:
[quotetheorem:3150]
[citeproof:3150]
The inequality is sharp at $p = 2$ (where $\mathcal{F}$ is an isometry) and at $p = 1$ (for non-negative $f$ one has $\|\mathcal{F}f\|_\infty \ge |\mathcal{F}f(0)| = \|f\|_1$). For $1 < p < 2$ the constant $1$ is not optimal: the optimal constant, known as the Babenko–Beckner constant, is $(p^{1/p} / p'^{1/p'})^{n/2}$, which is strictly less than $1$. Proving this sharp constant requires a different approach (Gaussian test functions and complex analysis), and we state it without proof here.
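To get a feel for how far the optimal constant sits below the interpolation constant $1$, the formula can simply be evaluated. This is a small sketch of the stated expression $(p^{1/p}/p'^{1/p'})^{n/2}$ and nothing more; the function name is ours.

```python
def babenko_beckner(p, n=1):
    """Evaluate (p^(1/p) / p'^(1/p'))^(n/2) for 1 <= p <= 2, with 1/p + 1/p' = 1."""
    if p == 1:
        return 1.0                 # limiting value: p'^(1/p') tends to 1
    p_conj = p / (p - 1)           # conjugate exponent p'
    return (p ** (1 / p) / p_conj ** (1 / p_conj)) ** (n / 2)

for p in (1.0, 1.25, 1.5, 1.75, 2.0):
    print(p, babenko_beckner(p), babenko_beckner(p, n=3))
```

The constant equals $1$ at both endpoints and dips below $1$ in between; since it is an $n/2$-th power of a fixed number below $1$, the gain over Riesz–Thorin improves exponentially with the dimension.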
[remark: Failure for $p > 2$]
The Hausdorff–Young inequality fails for $p > 2$. By a result of Herz, there exist functions $f \in L^p(\mathbb{R})$ for $p > 2$ whose Fourier transforms are not in any $L^q$. This asymmetry reflects the fact that the Fourier transform is not bounded from $L^p$ to any $L^q$ when $p > 2$.
[/remark]
The failure for $p > 2$ draws a sharp boundary in the $L^p$ theory of the Fourier transform. The next interpolation theorem — Marcinkiewicz — operates on the same principle but relaxes strong-type to weak-type, vastly extending the class of operators it can handle.
## The Marcinkiewicz Interpolation Theorem
The Riesz–Thorin theorem requires that the operator be bounded — that is, it maps $L^{p_i}$ to $L^{q_i}$ in the strong sense. Many natural operators satisfy only a weaker condition at the endpoints: they map $L^{p_i}$ to weak-$L^{q_i}$, meaning $d_{Tf}(\lambda) \lesssim \lambda^{-q_i}$ but $\|Tf\|_{L^{q_i}}$ may be infinite. The Marcinkiewicz theorem shows that weak-type endpoint bounds still suffice for strong-type bounds at all intermediate exponents.
[definition: Weak Type and Strong Type]
A linear (or sublinear) operator $T$ is of **strong type $(p, q)$** if
\begin{align*}
\|Tf\|_{L^q} \le C \|f\|_{L^p} \quad \text{for all } f \in L^p.
\end{align*}
It is of **weak type $(p, q)$** if
\begin{align*}
\|Tf\|_{L^{q,\infty}} \le C \|f\|_{L^p},
\end{align*}
that is, $d_{Tf}(\lambda) \le (C \|f\|_{L^p} / \lambda)^q$ for all $\lambda > 0$.
[/definition]
Strong type $(p,q)$ implies weak type $(p,q)$ by Chebyshev's inequality, but not conversely. For example, the Hardy–Littlewood maximal operator $M$ is of weak type $(1,1)$ but not of strong type $(1,1)$: $Mf$ need not be integrable even when $f \in L^1$ (in fact $Mf \notin L^1$ unless $f = 0$ almost everywhere).
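To see the first implication concretely, fix $q < \infty$ and $\lambda > 0$. On the set where $|Tf| > \lambda$ the integrand $|Tf|^q$ is at least $\lambda^q$, so
\begin{align*}
\lambda^q\, d_{Tf}(\lambda) \le \int_{\{|Tf| > \lambda\}} |Tf|^q\, d\mu \le \|Tf\|_{L^q}^q \le C^q \|f\|_{L^p}^q.
\end{align*}
Dividing by $\lambda^q$ gives exactly the weak-type condition $d_{Tf}(\lambda) \le (C\|f\|_{L^p}/\lambda)^q$, with the same constant $C$.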
[quotetheorem:3151]
[citeproof:3151]
The splitting $f = f_\lambda + r_\lambda$ is the prototype of the Calderón–Zygmund decomposition that will appear throughout the course. Here it is used in a simple pointwise form, but the same idea — separating the "big" and "small" parts of a function at a threshold level — underlies the weak-type bounds for the maximal function and singular integrals.
[example: Diagonal Marcinkiewicz]
The important special case $q_i = p_i$ (diagonal interpolation) says: if $T$ is of weak type $(p_0, p_0)$ and weak type $(p_1, p_1)$ with $p_0 < p_1$, then $T$ is of strong type $(p, p)$ for all $p_0 < p < p_1$ (when $p_1 = \infty$, weak type is understood as strong type). This is exactly how $L^p$ boundedness of the Hardy–Littlewood maximal function is deduced in Chapter 2: $M$ is of weak type $(1,1)$ and of strong type $(\infty, \infty)$ (immediate from $Mf(x) \le \|f\|_\infty$), and the theorem gives strong type $(p,p)$ for all $1 < p < \infty$; the case $p = \infty$ is one of the hypotheses.
[/example]
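The diagonal case can be traced by hand for the maximal function. A sketch, assuming the weak-$(1,1)$ bound in the form $|\{Mf > \lambda\}| \le A\|f\|_{L^1}/\lambda$ (the symbol $A$ for the constant is our notation): split $f = f\,\mathbb{1}_{\{|f| > \lambda/2\}} + f\,\mathbb{1}_{\{|f| \le \lambda/2\}}$. The second piece has maximal function at most $\lambda/2$, so $\{Mf > \lambda\} \subseteq \{M(f\,\mathbb{1}_{\{|f| > \lambda/2\}}) > \lambda/2\}$, and the layer-cake formula together with Tonelli's theorem gives, for $1 < p < \infty$,
\begin{align*}
\|Mf\|_{L^p}^p &= p \int_0^\infty \lambda^{p-1}\, |\{Mf > \lambda\}|\, d\lambda \\
&\le p \int_0^\infty \lambda^{p-1}\, \frac{2A}{\lambda} \int_{\{|f| > \lambda/2\}} |f(x)|\, d\mathcal{L}^n(x)\, d\lambda \\
&= 2A\,p \int_{\mathbb{R}^n} |f(x)| \int_0^{2|f(x)|} \lambda^{p-2}\, d\lambda\, d\mathcal{L}^n(x) \\
&= \frac{2^p A\, p}{p-1}\, \|f\|_{L^p}^p.
\end{align*}
The factor $1/(p-1)$ shows how the strong-type constant degenerates as $p \to 1^+$, in line with the failure of the strong $(1,1)$ bound.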
With both interpolation theorems now in hand, it is worth pausing to compare their hypotheses, conclusions, and typical uses side by side.
[remark: Comparison with Riesz–Thorin]
The two interpolation theorems are complementary. Riesz–Thorin applies to linear operators with strong-type hypotheses, produces the loss-free log-convex constant $M_0^{1-\theta} M_1^\theta$ (which need not be the optimal operator norm, as Hausdorff–Young shows), and requires complex-analytic machinery. Marcinkiewicz applies to sublinear operators with weak-type hypotheses and therefore covers a larger class of operators, but it produces a constant $C$ that is not optimal and degenerates as the exponent approaches either endpoint. In practice, one uses Riesz–Thorin when the operator is explicitly given by a multiplier or kernel and the endpoint norms are computable, and one uses Marcinkiewicz when only weak-type bounds are available at the endpoints, as is the case for most integral operators with singular kernels.
[/remark]
Interpolation theory provides the technical framework for measuring operator boundedness across different function spaces. The Hardy-Littlewood maximal function exemplifies this theory in practice, serving as a foundational maximal operator whose boundedness in $L^p$ spaces demonstrates how interpolation applies to concrete harmonic analysis problems.
# 2. The Hardy-Littlewood Maximal Function
The Hardy–Littlewood maximal function is the fundamental tool of real-variable harmonic analysis. Rather than studying the pointwise values of an integral average, one asks for the worst-case average over all balls centred at a point — this is the maximal function. The two central results of this chapter are the Hardy–Littlewood maximal theorem, which quantifies the distribution of large values of the maximal function, and the Lebesgue differentiation theorem, which shows that for locally integrable functions the averages over shrinking balls converge to the function value almost everywhere. The chapter builds directly on the interpolation tools of Chapter 1: once the weak-(1,1) bound is in hand, Marcinkiewicz interpolation immediately yields strong $L^p$ bounds for $1 < p \le \infty$.
## Variants of the Maximal Function
The Hardy–Littlewood maximal function formalises the idea of controlling a function by its local averages. Given $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, the natural question is: how large can the average of $|f|$ over balls centred at $x$ be, across all possible radii?
[definition: Hardy-Littlewood Maximal Function]
Let $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$. The **centered Hardy–Littlewood maximal function** is
\begin{align*}
Mf(x) = \sup_{r > 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\, d\mathcal{L}^n(y),
\end{align*}
where $|B(x,r)|$ denotes the Lebesgue measure of the open ball $B(x,r)$.
[/definition]
Several variants appear naturally. The **uncentered maximal function** $\widetilde{M}f(x)$ takes the supremum over all balls $B$ that contain $x$ (not necessarily centred at $x$):
\begin{align*}
\widetilde{M}f(x) = \sup_{B \ni x} \frac{1}{|B|} \int_B |f(y)|\, d\mathcal{L}^n(y).
\end{align*}
The uncentered variant is larger, but the two are comparable: $Mf(x) \le \widetilde{M}f(x) \le 2^n Mf(x)$. The factor $2^n$ arises because a ball of radius $r$ containing $x$ is contained in a ball of radius $2r$ centred at $x$.
The **dyadic maximal function** $M_{\mathcal{D}}f(x)$ restricts to averages over dyadic cubes containing $x$:
\begin{align*}
M_{\mathcal{D}}f(x) = \sup_{Q \in \mathcal{D},\, Q \ni x} \frac{1}{|Q|} \int_Q |f(y)|\, d\mathcal{L}^n(y).
\end{align*}
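The dyadic variant is central to stopping-time arguments (Chapter 3) and avoids the geometric complexity of balls. The **strong maximal function** $M_{\mathrm{s}}f(x)$ takes the supremum over all rectangles containing $x$ with sides parallel to the coordinate axes. Since cubes are among these rectangles, $M_{\mathrm{s}}f$ dominates $Mf$ up to a dimensional constant, but for $n \ge 2$ it is genuinely larger: it fails to be of weak type $(1,1)$ and satisfies only the weaker estimate of Jessen–Marcinkiewicz–Zygmund, $|\{M_{\mathrm{s}}f > \lambda\}| \le C_n \int \frac{|f|}{\lambda}\bigl(1 + \log^+\frac{|f|}{\lambda}\bigr)^{n-1} d\mathcal{L}^n$, which is of $L(\log L)^{n-1}$ type rather than $L^1$ type.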
[remark: Basic Properties of the Maximal Function]
The maximal function satisfies two structural properties that underlie its use. First, **sublinearity**: $M(f + g)(x) \le Mf(x) + Mg(x)$ and $M(cf) = |c|\, Mf$ for every scalar $c$. Second, $Mf$ is **lower semicontinuous**: the superlevel set $\{Mf > \lambda\}$ is open for each $\lambda > 0$. Indeed, if $Mf(x_0) > \lambda$, then the average of $|f|$ over some ball $B(x_0, r)$ exceeds $\lambda$; for $y$ close to $x_0$ the slightly larger ball $B(y, r + |y - x_0|)$ contains $B(x_0, r)$ and has almost the same measure, so the average of $|f|$ over it still exceeds $\lambda$ and $Mf(y) > \lambda$.
[/remark]
[example: The Maximal Function of an Indicator]
Let $f = \mathbb{1}_{[-1,1]}$ on $\mathbb{R}$ and fix $|x| > 1$. For the radius $r = |x| + 1$, the ball $B(x, r)$ contains $[-1,1]$ up to an endpoint, and the average is $2/(2r) = 1/(|x|+1)$. Smaller radii capture only part of $[-1,1]$ and larger radii only dilute the average, so this radius is optimal and $Mf(x) = 1/(|x|+1) \asymp |x|^{-1}$ for $|x| \gg 1$. Since $\int_2^\infty \frac{1}{x+1}\, d\mathcal{L}^1(x) = \infty$, we conclude $Mf \notin L^1(\mathbb{R})$, even though $f \in L^1(\mathbb{R})$. This shows that the maximal function does not preserve $L^1$ integrability, a phenomenon the maximal theorem makes precise via the weak-(1,1) bound.
[/example]
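The closed form $Mf(x) = 1/(|x|+1)$ can be cross-checked by brute force. The sketch below evaluates the averages of $\mathbb{1}_{[-1,1]}$ over $(x - r, x + r)$ on a finite grid of radii (the grid is an arbitrary choice of ours, so the result is a lower bound for the true supremum) and compares with the formula.

```python
import math

def average(x, r):
    """Average of the indicator of [-1, 1] over the interval (x - r, x + r)."""
    overlap = max(0.0, min(x + r, 1.0) - max(x - r, -1.0))
    return overlap / (2 * r)

def maximal(x, radii):
    """Largest average over the given radii: a lower bound for Mf(x)."""
    return max(average(x, r) for r in radii)

def tail_integral(R):
    """Integral of 1/(x + 1) over [1, R], i.e. of Mf over 1 <= x <= R."""
    return math.log((R + 1) / 2)

radii = [k / 1000 for k in range(1, 50001)]      # radii in (0, 50]
for x in (0.0, 2.0, 3.0, 9.0):
    closed_form = 1.0 if abs(x) <= 1 else 1 / (abs(x) + 1)
    print(x, maximal(x, radii), closed_form)

for R in (10, 1000, 100000):
    print(R, tail_integral(R))                   # grows without bound
```

The logarithmic growth of `tail_integral` is the numerical face of the statement $Mf \notin L^1(\mathbb{R})$.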
## The Vitali Covering Lemma
The key geometric input to the maximal theorem is a covering argument: given a finite family of balls, one can extract a pairwise disjoint subfamily whose dilates cover everything. This is the Vitali covering lemma.
[quotetheorem:2967]
[citeproof:2967]
<!-- illustration-needed: Vitali covering — show a finite collection of balls in the plane, the selected disjoint subfamily highlighted, and arrows indicating how each non-selected ball is absorbed into the 5-fold dilate of a nearby selected ball -->
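The factor $5$ is not sharp: for a finite family, the greedy selection by decreasing radius already works with the factor $3$, and a larger factor is only needed for infinite families. The key point is that the measure of the union is controlled: by the covering property, subadditivity, and scaling,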
\begin{align*}
\Big|\bigcup_{j=1}^N B_j\Big| \le \Big|\bigcup_{\ell=1}^k 5B_{i_\ell}\Big| \le \sum_{\ell=1}^k |5B_{i_\ell}| = 5^n \sum_{\ell=1}^k |B_{i_\ell}|.
\end{align*}
This measure estimate is the geometric core of the weak-(1,1) maximal theorem.
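One standard way to produce the disjoint subfamily is a greedy scan by decreasing radius. The sketch below carries this out for intervals on the line, represented as hypothetical (centre, radius) pairs; it is an illustration of the selection idea and is not claimed to be the proof cited above. With this selection every discarded interval meets a selected one of at least its own radius, so it lies in the 3-fold dilate, and a fortiori in the 5-fold one.

```python
def vitali_select(balls):
    """Greedy selection: scan by decreasing radius, keep a ball if it is
    disjoint from every ball kept so far. Balls are open intervals (c, r)."""
    chosen = []
    for c, r in sorted(balls, key=lambda b: b[1], reverse=True):
        if all(abs(c - c2) >= r + r2 for c2, r2 in chosen):
            chosen.append((c, r))
    return chosen

def in_some_dilate(ball, chosen, factor):
    """True if the ball lies inside factor * B for some selected ball B."""
    c, r = ball
    return any(abs(c - c2) + r <= factor * r2 for c2, r2 in chosen)

def union_length(balls):
    """Lebesgue measure of the union of the intervals (c - r, c + r)."""
    total, right = 0.0, float("-inf")
    for a, b in sorted((c - r, c + r) for c, r in balls):
        if b > right:
            total += b - max(a, right)
            right = b
    return total

balls = [(0.0, 1.0), (1.5, 1.0), (2.5, 0.5), (4.0, 2.0), (-0.5, 0.25), (7.0, 0.5)]
chosen = vitali_select(balls)
print(chosen)
print(all(in_some_dilate(b, chosen, 3) for b in balls))
print(union_length(balls), 3 * sum(2 * r for _, r in chosen))
```

In dimension one the measure estimate reads $|\bigcup_j B_j| \le 3 \sum_\ell |B_{i_\ell}|$ for this selection, which the last line compares numerically.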
## The Hardy–Littlewood Maximal Theorem
With the Vitali lemma in hand, the weak-(1,1) bound is within reach. The strategy is direct: to control the measure of $\{Mf > \lambda\}$, cover this set by balls on which the average of $|f|$ exceeds $\lambda$, apply the Vitali lemma to pass to a disjoint subcollection, and estimate the total measure using $\|f\|_1$.
[quotetheorem:2968]
[citeproof:2968]
[remark: Behaviour at the Endpoints]
The $L^1$ endpoint fails: as illustrated above, $\mathbb{1}_{[-1,1]} \in L^1(\mathbb{R})$ but $M(\mathbb{1}_{[-1,1]}) \notin L^1(\mathbb{R})$. The best one can say for $f \in L^1$ is the weak-$L^1$ bound. The $L^\infty$ endpoint, by contrast, holds trivially: $Mf(x) \le \|f\|_\infty$ for every $x$, with equality for constants ($M(1) = 1$), so $M$ maps $L^\infty$ to $L^\infty$ with norm exactly $1$. The only genuine failure is therefore at $p = 1$, and the constant in the strong-type $(p,p)$ bound blows up as $p \to 1^+$.
[/remark]
The constant $5^n$ in the weak-(1,1) bound is not optimal. It can be improved to $3^n$ by running the covering argument with the dilation factor $3$, which suffices for finite families, or by the Calderón–Zygmund decomposition approach of Chapter 3, which yields explicit constants. For applications the precise constant matters less than the dimension dependence.
[example: Sharpness of the Weak-(1,1) Bound]
Let $f = \mathbb{1}_{B(0,1)}$ in $\mathbb{R}^n$. For large $|x|$, the ball $B(x, 2|x|)$ contains $B(0,1)$ and has volume $c_n (2|x|)^n$, so $Mf(x) \ge c_n' |x|^{-n}$. Thus $|\{Mf > \lambda\}| \gtrsim \lambda^{-1}$ for small $\lambda$, matching the weak-(1,1) bound up to constants. This confirms the bound is sharp in its $\lambda$-dependence.
[/example]
## The Lebesgue Differentiation Theorem
The maximal theorem has an immediate and important consequence: the local averages of an integrable function converge to the function value at almost every point. This is the Lebesgue differentiation theorem, one of the foundational results of measure theory.
[quotetheorem:74]
[citeproof:74]
The proof makes the role of the maximal function transparent: it is the right tool to dominate the oscillation of $f$ because $Mf$ is, by definition, the smallest function that majorises every local average of $|f|$, and it depends sublinearly on $f$.
[remark: Generalisations]
The differentiation theorem holds for averages over more general families of sets shrinking to a point, so-called **regular families**, where the regularity condition requires that the sets remain comparable to balls (avoiding long thin needles). For balls that contain $x$ without being centred at $x$, the proof above applies verbatim with the uncentered maximal function $\widetilde{M}$ in place of $M$. The dyadic analogue holds as well: for $\mathcal{L}^n$-a.e. $x$, the dyadic averages $\frac{1}{|Q|}\int_Q f$ over dyadic cubes $Q$ containing $x$ with $\ell(Q) \to 0$ converge to $f(x)$.
[/remark]
## Lebesgue Points
The Lebesgue differentiation theorem gives almost-everywhere convergence of averages, but one can ask for a stronger local regularity property: at which points does the function $f$ behave, not merely in average, but as if it were continuous?
[definition: Lebesgue Point]
Let $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$. A point $x \in \mathbb{R}^n$ is a **Lebesgue point** of $f$ if
\begin{align*}
\lim_{r \to 0^+} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y) - f(x)|\, d\mathcal{L}^n(y) = 0.
\end{align*}
[/definition]
The condition says that the average deviation of $f$ from $f(x)$ over small balls goes to zero; it is strictly stronger than the convergence of averages (which only requires $\frac{1}{|B|}\int_B f \to f(x)$, without the absolute value inside). Every point of continuity of $f$ is a Lebesgue point, but Lebesgue points exist far more generally.
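A one-dimensional illustration of the gap, offered as a sketch: take $f = \mathbb{1}_{[0,\infty)}$ on $\mathbb{R}$ and assign any value $c$ to $f(0)$. The symmetric averages converge, but the mean deviation does not vanish:
\begin{align*}
\frac{1}{2r}\int_{-r}^{r} f(y)\, d\mathcal{L}^1(y) = \frac12, \qquad \frac{1}{2r}\int_{-r}^{r} |f(y) - c|\, d\mathcal{L}^1(y) = \frac{|c| + |1 - c|}{2} \ge \frac12 \quad \text{for all } r > 0.
\end{align*}
With the choice $c = 1/2$ the averages converge to $f(0)$, yet $0$ is not a Lebesgue point of $f$ for any choice of $c$.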
[quotetheorem:2975]
[citeproof:2975]
[explanation: Approximate Continuity]
The concept of a Lebesgue point connects naturally to that of **approximate continuity**. A measurable function $f$ is approximately continuous at $x$ if there exists a measurable set $E$ with $x$ as a density point of $E$ (meaning $|B(x,r) \cap E|/|B(x,r)| \to 1$ as $r \to 0$) and $f|_E$ is continuous at $x$.
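Every Lebesgue point of $f$ is a point of approximate continuity, and the converse holds when $f$ is bounded near $x$. For unbounded $f$ the converse can fail, because tall spikes on sets of vanishing density do not disturb approximate continuity but can keep the averages of $|f - f(x)|$ from tending to zero. For the forward direction, suppose $x$ is a Lebesgue point and set $E_\varepsilon = \{y : |f(y) - f(x)| < \varepsilon\}$. Chebyshev's inequality gives $|B(x,r) \setminus E_\varepsilon| \le \varepsilon^{-1}\int_{B(x,r)} |f - f(x)|\, d\mathcal{L}^n$, so $x$ is a density point of every $E_\varepsilon$; a diagonal argument over $\varepsilon = 1/k$ then assembles a single set $E$ with density one at $x$ along which $f(y) \to f(x)$. For the converse under the boundedness assumption, split $B(x,r)$ into $E$ and its complement: on $E$ the integrand $|f - f(x)|$ is small, while the complement has vanishing relative measure and the integrand stays bounded there.
This relation makes precise the sense in which a Lebesgue point is a point of "approximate" regularity: $f$ may oscillate wildly on a sparse set near $x$, but on a set of full density near $x$ it stays close to $f(x)$.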
[/explanation]
[remark: Lebesgue Points and Pointwise Representatives]
For $f \in L^1_{\mathrm{loc}}$, the Lebesgue point condition selects a canonical pointwise representative of the equivalence class: at each Lebesgue point $x$, one defines $f(x) = \lim_{r \to 0} \frac{1}{|B(x,r)|}\int_{B(x,r)} f\, d\mathcal{L}^n$. This representative is defined $\mathcal{L}^n$-a.e. and is the natural choice when one needs actual pointwise values rather than equivalence classes. This construction is used implicitly whenever one writes "$f(x)$" for an $L^p$ function and asks about pointwise convergence.
[/remark]
The maximal function's control over function behavior motivates a deeper decomposition of functions into manageable pieces. The Calderón-Zygmund decomposition builds on this insight, providing an algorithmic way to partition a function based on level sets of the maximal function for precise analysis.
# 3. The Calderón-Zygmund Decomposition
## Overview
The Calderón-Zygmund decomposition is the most versatile stopping-time argument in harmonic analysis. Given an integrable function $f$ and a threshold $\lambda > 0$, it splits $f$ into a "good" part $g$ bounded pointwise by $C\lambda$ and a "bad" part $b$ concentrated on a disjoint collection of cubes with small total measure. The bad part has mean zero on each cube — the cancellation condition that makes singular integrals tractable. Alongside this function decomposition sits the Whitney decomposition, which tiles any proper open set by dyadic cubes proportional in size to their distance from the boundary. Together, these tools appear in the proofs of the weak-(1,1) bound for Calderón-Zygmund operators (Chapter 5), the John-Nirenberg inequality (Chapter 8), and many other results throughout the course. Both constructions rest on the same foundation: the dyadic grid and the Lebesgue differentiation theorem established in Chapter 2.
## The Dyadic Grid
The dyadic grid provides a canonical multiscale tiling of $\mathbb{R}^n$ by half-open cubes with a remarkable nesting property.
[definition: Dyadic Cube]
For $k \in \mathbb{Z}$ and a multi-index $m = (m_1, \dots, m_n) \in \mathbb{Z}^n$, the **dyadic cube of generation $k$** associated to $m$ is
\begin{align*}
Q_{k,m} := \bigl[2^{-k}m_1,\, 2^{-k}(m_1+1)\bigr) \times \cdots \times \bigl[2^{-k}m_n,\, 2^{-k}(m_n+1)\bigr).
\end{align*}
The side length of $Q_{k,m}$ is $2^{-k}$, so its volume is $|Q_{k,m}| = 2^{-kn}$. The **dyadic grid** is the collection
\begin{align*}
\mathcal{D} := \bigl\{Q_{k,m} : k \in \mathbb{Z},\, m \in \mathbb{Z}^n\bigr\}.
\end{align*}
[/definition]
The key structural property of $\mathcal{D}$ is that any two dyadic cubes are either disjoint or one is contained in the other.
[quotetheorem:3152]
[citeproof:3152]
This nesting property is what makes stopping-time arguments work: when we "stop" at a cube, every other dyadic cube either lies inside it, contains it, or misses it entirely, so the maximal selected cubes are automatically pairwise disjoint.
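The grid is easy to experiment with in dimension one. The following sketch, with helper names of our own choosing, computes the generation-$k$ dyadic interval containing a point, the dyadic parent of an interval, and checks the nested-or-disjoint alternative on a sample of intervals.

```python
import math

def dyadic_interval(x, k):
    """The generation-k dyadic interval [m / 2^k, (m + 1) / 2^k) containing x."""
    m = math.floor(x * 2 ** k)
    return (m / 2 ** k, (m + 1) / 2 ** k)

def parent(interval):
    """The unique dyadic interval of twice the length containing `interval`."""
    a, b = interval
    length = b - a
    m = math.floor(a / (2 * length))
    return (m * 2 * length, (m + 1) * 2 * length)

def nested_or_disjoint(I, J):
    """Check the dyadic alternative for two half-open intervals."""
    (a, b), (c, d) = I, J
    disjoint = b <= c or d <= a
    nested = (c <= a and b <= d) or (a <= c and d <= b)
    return disjoint or nested

print(dyadic_interval(0.3, 1))      # generation 1, side length 1/2
print(parent((0.0, 1.0)))           # the parent of [0,1) is [0,2), not [-1,1)
print(parent((-1.0, 0.0)))

sample = {dyadic_interval(x, k) for x in (-1.7, -0.3, 0.3, 0.9, 2.6) for k in range(-2, 4)}
print(all(nested_or_disjoint(I, J) for I in sample for J in sample))
```

Note that symmetric intervals such as $[-1,1)$ are never dyadic: the origin is always an endpoint of the dyadic intervals adjacent to it, which is why parents grow to one side only.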
The other essential fact is a boundary condition at infinity.
[quotetheorem:3153]
[citeproof:3153]
## The Calderón-Zygmund Decomposition
The stopping-time argument is the heart of the construction. We start with $f \in L^1(\mathbb{R}^n)$ and a level $\lambda > 0$. The idea is to scan through the dyadic grid from large to small scales, stopping at the first generation where the average of $|f|$ over a cube exceeds $\lambda$. By the previous theorem, large cubes have averages near zero; by the Lebesgue differentiation theorem, the averages over small cubes shrinking to $x$ converge to $|f(x)|$ for almost every $x$. So for almost every point where $|f(x)| > \lambda$, there must be a first "generation of excess."
[quotetheorem:3154]
[citeproof:3154]
[remark: Why Dyadic?]
The upper bound $2^n\lambda$ on the average comes from comparing $Q_j$ to its parent, which costs a factor of $2^n$. The argument also runs with balls (using the Vitali covering lemma from Chapter 2), at the cost of a different absolute constant and the loss of the clean disjointness. The dyadic version gives disjoint cubes with explicit constants, which is why it is used in inductive arguments like the John-Nirenberg inequality.
[/remark]
### The Good and Bad Parts
From the geometric decomposition $\mathbb{R}^n = \Omega \cup (\mathbb{R}^n \setminus \Omega)$, we now decompose $f$ itself.
[definition: Good and Bad Parts]
Let $\{Q_j\}$ be the cubes from the Calderón-Zygmund decomposition at level $\lambda$. Define the **good function** $g$ and **bad function** $b$ by $f = g + b$, where
\begin{align*}
g(x) :=
\begin{cases}
f(x) & x \notin \Omega, \\
\displaystyle\frac{1}{|Q_j|}\int_{Q_j} f\, d\mathcal{L}^n & x \in Q_j,
\end{cases}
\end{align*}
and $b := f - g = \sum_j b_j$ with $b_j := (f - f_{Q_j})\,\mathbb{1}_{Q_j}$, where $f_{Q_j} := \frac{1}{|Q_j|}\int_{Q_j} f\, d\mathcal{L}^n$.
[/definition]
[quotetheorem:3155]
[citeproof:3155]
The good part $g$ is bounded in $L^\infty$ by $2^n\lambda$ and lies in $L^1$, so $g \in L^p$ for all $1 \leq p \leq \infty$. The bad part $b$ is a sum of localized pieces, each with mean zero; this cancellation is the key ingredient that allows singular integral operators to act on $b$ without blowing up. The $L^2$ norm of $g$ can be estimated by interpolating between $L^1$ and $L^\infty$: $\|g\|_{L^2}^2 \le \|g\|_{L^\infty}\|g\|_{L^1} \le 2^n\lambda\|f\|_{L^1}$. No such bound is available for the pieces $b_j$, which need not be bounded, so for $b$ the $L^1$ and cancellation properties above are the core facts used in applications.
[example: A Step Function]
Let $n = 1$, $f = \mathbb{1}_{[0,1]}$, and $\lambda = 1/2$. The average of $|f|$ over the dyadic interval $[0,1)$ is $1 > \lambda$, while the average over its parent $[0,2)$ is $1/2 = \lambda$, which is not strictly greater than $\lambda$; the larger dyadic intervals $[0, 2^k)$ have average $2^{-k}$, which is smaller still. So the stopping-time argument selects $Q_1 = [0,1)$, the dyadic cube at level $k = 0$. The set $\Omega = [0,1)$, and off $\Omega$ we have $f = 0 \leq \lambda$ almost everywhere. The good part takes the constant value $f_{Q_1} = 1$ on $Q_1$ and equals $f = 0$ almost everywhere off $Q_1$, so $g = \mathbb{1}_{[0,1)}$ and $b = 0$ almost everywhere. The average $|Q_1|^{-1}\int_{Q_1}|f| = 1 \in (\lambda, 2\lambda] = (1/2, 1]$, consistent with the bound. If instead $\lambda = 1/4$, then $[0,1)$ is no longer maximal: its parent $[0,2)$ has average $1/2 > 1/4$, while $[0,4)$ has average $1/4 \not> \lambda$. So $Q_1 = [0,2)$, $f_{Q_1} = 1/2$, $g = \tfrac{1}{2}\mathbb{1}_{[0,2)}$, and $b = \tfrac{1}{2}\left(\mathbb{1}_{[0,1)} - \mathbb{1}_{[1,2)}\right)$ almost everywhere, which has mean zero. The average $1/2$ again lies in $(\lambda, 2\lambda] = (1/4, 1/2]$.
For an example with a sign change, take $f = \mathbb{1}_{[0,1/2)} - \mathbb{1}_{[1/2, 1)}$ and $\lambda = 1/4$. The stopping-time argument selects the maximal dyadic cube whose average of $|f|$ exceeds $\lambda$. The cube $[0,2)$ has average $1/2 > 1/4$ and its parent $[0,4)$ has average $1/4 \not> \lambda$; thus $Q_1 = [0,2)$ is selected. Since $\int_{[0,2)} f = 0$, we get $g = 0$ on $Q_1$, $b_1 = f \cdot \mathbb{1}_{[0,2)} = f$, $\int b_1 = 0$, and $\|b_1\|_{L^1} = 1 \le 2\int_{[0,2)}|f| = 2$.
[/example]
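The stopping-time selection in these examples can be replayed mechanically. The following is a minimal sketch, assuming $f$ is piecewise constant on a uniform grid of cells inside a top dyadic interval whose average is at most $\lambda$; the function name `cz_cubes` and the grid representation are illustrative choices, not part of the theorem.

```python
from fractions import Fraction

def cz_cubes(values, cell, lam, start=Fraction(0)):
    """Top-down dyadic stopping time on [start, start + len(values) * cell).

    `values` lists f on consecutive cells of length `cell`; len(values) must be
    a power of two. Returns the maximal dyadic intervals whose average of |f|
    exceeds lam, as triples (left endpoint, right endpoint, mean of f).
    """
    values = [Fraction(v) for v in values]
    n = len(values)
    if sum(abs(v) for v in values) / n > lam:
        # First generation of excess: stop here and record the mean of f.
        return [(start, start + n * cell, sum(values) / n)]
    if n == 1:
        return []
    half = n // 2
    return (cz_cubes(values[:half], cell, lam, start)
            + cz_cubes(values[half:], cell, lam, start + half * cell))

CELL = Fraction(1, 2)
step = [1, 1, 0, 0, 0, 0, 0, 0]     # indicator of [0, 1) sampled on [0, 4)
signed = [1, -1, 0, 0, 0, 0, 0, 0]  # 1 on [0, 1/2), -1 on [1/2, 1)

print(cz_cubes(step, CELL, Fraction(1, 2)))
print(cz_cubes(step, CELL, Fraction(1, 4)))
print(cz_cubes(signed, CELL, Fraction(1, 4)))
```

The recorded mean is the constant value of $g$ on the selected cube, so $b$ on that cube is $f$ minus that mean. Exact rational arithmetic is used so that borderline comparisons such as "average equal to $\lambda$" are decided exactly.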
## The Whitney Decomposition
The Whitney decomposition addresses a different but closely related problem: given an open set $\Omega \subsetneq \mathbb{R}^n$, tile it with dyadic cubes that are "as large as possible while staying well inside $\Omega$." The precise statement is that each cube has diameter comparable to its distance from the complement.
[quotetheorem:3156]
[citeproof:3156]
<!-- illustration-needed: Whitney decomposition of an open set — show a bounded open set (e.g. a disk with a slit) and its covering by dyadic cubes, with larger cubes far from the boundary and smaller cubes near the boundary, illustrating diam(Q_j) ~ dist(Q_j, Omega^c) -->
[remark: Whitney Decomposition vs. Whitney Extension]
The Whitney decomposition should not be confused with the Whitney extension theorem, which constructs a smooth extension of a function defined on a closed set. The decomposition is the geometric tool underlying that theorem, but they are distinct results: the decomposition produces a tiling, while the extension theorem produces a function. The decomposition is also the right framework for defining smooth partitions of unity on open sets adapted to the boundary geometry.
[/remark]
The Whitney decomposition is used throughout analysis whenever one needs to integrate over an open set by integrating over cubes whose sizes reflect their distance to the boundary. It appears in the proof of the Sobolev extension theorem, in the construction of the Calderón-Zygmund decomposition for non-dyadic analogues, and in the characterization of BMO via Carleson measures (Chapter 8).
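A one-dimensional sketch makes the geometry concrete. It tiles $\Omega = (0,1)$ by maximal dyadic intervals $I$ with $|I| \le \operatorname{dist}(I, \Omega^c)$. This selection rule and the resulting comparability constants are assumptions made for illustration and may differ from the constants in the theorem above; the depth cut-off is needed only because the true decomposition has infinitely many intervals.

```python
from fractions import Fraction

def whitney_intervals(max_depth):
    """Whitney-type dyadic tiling of (0, 1), truncated at length 2**-max_depth.

    Returns (left endpoint, length) pairs for the selected intervals.
    """
    selected = []

    def visit(start, length, depth):
        # Distance from [start, start + length) to the complement of (0, 1).
        dist = min(start, 1 - (start + length))
        if length <= dist:
            selected.append((start, length))
        elif depth < max_depth:
            half = length / 2
            visit(start, half, depth + 1)
            visit(start + half, half, depth + 1)

    visit(Fraction(0), Fraction(1), 0)
    return selected

tiles = whitney_intervals(6)
for start, length in tiles:
    print(start, length)
```

Because the recursion stops at the first dyadic interval satisfying the rule, each selected interval has a parent that violates it, which is what forces the length to be comparable to the distance from the boundary rather than merely smaller than it.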
## A Second Proof of Weak-(1,1) for the Maximal Function
Chapter 2 proved the weak-(1,1) bound for the Hardy-Littlewood maximal function $Mf$ using the Vitali covering lemma. The Calderón-Zygmund decomposition gives a second proof that produces explicit constants and generalizes immediately to the setting of Calderón-Zygmund operators (Chapter 5).
[quotetheorem:3157]
[citeproof:3157]
[explanation: Why This Proof Generalizes]
The Vitali covering proof of weak-(1,1) relied on the specific geometry of balls and the covering lemma. The Calderón-Zygmund proof instead decomposes $f$ itself into structured pieces and exploits the mean-zero property of the bad part. This pattern — decompose $f = g + b$, handle $g$ by an $L^2$ bound (using Chebyshev), handle $b$ by size and cancellation — is precisely the template for proving weak-(1,1) for all Calderón-Zygmund singular integral operators in Chapter 5.
The explicit constant $2^n/\lambda$ in the bound above can be compared with the constant from the Vitali argument. Neither is sharp: determining the optimal constant in the weak-(1,1) bound for $M$ is a delicate problem that remains open in higher dimensions. The CZ method nevertheless gives a route to track constants through the subsequent machinery.
[/explanation]
With a systematic decomposition in hand, we can now study singular integral operators that respect this structure. The Hilbert transform emerges as the canonical example of such an operator, whose boundedness properties follow from the maximal function and decomposition techniques.
# 4. The Hilbert Transform
The Hilbert transform is the simplest non-trivial singular integral, and the right place to see every phenomenon of the subject in its cleanest form. Unlike the Hardy–Littlewood maximal function, which averages over balls, the Hilbert transform integrates against a kernel $1/(x-y)$ that is not locally integrable — the integral must be interpreted as a principal value. The reward for this additional complexity is a rich structure: the transform is simultaneously an isometry on $L^2$, a bounded operator on $L^p$ for $1 < p < \infty$, and weak-(1,1). At the endpoints $p = 1$ and $p = \infty$ it fails to be bounded, and computing an explicit example reveals exactly why. The maximal truncation and Cotlar's identity then prepare the ground for the almost-everywhere convergence result, which is the prototype for the general theory of singular integrals in Chapter 5.
## Three Equivalent Definitions
The kernel $K(x) = 1/(\pi x)$ is odd and homogeneous of degree $-1$ in one dimension. Because it is not integrable near the origin, the convolution $K * f$ must be defined by removing a symmetric neighbourhood of the singularity and taking a limit.
[definition: Hilbert Transform]
The **Hilbert transform** is the operator $H : \mathcal{S}(\mathbb{R}) \to L^2(\mathbb{R})$ defined for $f \in \mathcal{S}(\mathbb{R})$ by
\begin{align*}
Hf(x) = \frac{1}{\pi} \operatorname{p.v.} \int_{\mathbb{R}} \frac{f(y)}{x - y}\, d\mathcal{L}^1(y) := \frac{1}{\pi} \lim_{\varepsilon \to 0^+} \int_{|x-y| > \varepsilon} \frac{f(y)}{x - y}\, d\mathcal{L}^1(y).
\end{align*}
[/definition]
The notation $\operatorname{p.v.}$ stands for *principal value*. The symmetry of the truncation, removing both $(x - \varepsilon, x)$ and $(x, x + \varepsilon)$ from the region of integration, is essential: the kernel $1/(\pi x)$ is odd, so the divergent contributions from $y$ just to the left and just to the right of $x$ cancel. An asymmetric truncation can change the limit or destroy it: removing $(x - \varepsilon, x + 2\varepsilon)$ instead shifts the limit by $\frac{\log 2}{\pi} f(x)$.
The limit defining $Hf$ exists for every $x$ when $f \in \mathcal{S}(\mathbb{R})$, and the Schwartz class is the natural domain for establishing equivalence of definitions.
### The Conjugate Poisson Integral
The second approach comes from complex analysis. Given $f \in \mathcal{S}(\mathbb{R})$, form the Poisson integral
\begin{align*}
u(x, t) = (P_t * f)(x) = \frac{1}{\pi} \int_{\mathbb{R}} \frac{t}{(x-y)^2 + t^2} f(y)\, d\mathcal{L}^1(y), \quad t > 0,
\end{align*}
which is the real part of a holomorphic function $F = u + iv$ on the upper half-plane. The harmonic conjugate $v$ is the conjugate Poisson integral:
\begin{align*}
v(x, t) = (Q_t * f)(x) = \frac{1}{\pi} \int_{\mathbb{R}} \frac{x - y}{(x-y)^2 + t^2} f(y)\, d\mathcal{L}^1(y).
\end{align*}
[definition: Hilbert Transform via Conjugate Poisson Integral]
The Hilbert transform $H : \mathcal{S}(\mathbb{R}) \to L^2(\mathbb{R})$ is defined for $f \in \mathcal{S}(\mathbb{R})$ as the boundary value of the conjugate Poisson integral:
\begin{align*}
Hf(x) = \lim_{t \to 0^+} v(x, t) = \lim_{t \to 0^+} (Q_t * f)(x).
\end{align*}
[/definition]
The kernel $Q_t(x) = \frac{x}{\pi(x^2 + t^2)}$ converges in a distributional sense to the principal-value kernel $\frac{1}{\pi x}$ as $t \to 0^+$, which is why the two definitions agree on $\mathcal{S}(\mathbb{R})$.
### The Fourier Multiplier
The cleanest definition for analysis is as a Fourier multiplier.
[definition: Hilbert Transform as Fourier Multiplier]
The Hilbert transform is the operator $H : L^2(\mathbb{R}) \to L^2(\mathbb{R})$ defined by the Fourier multiplier $-i\operatorname{sgn}(\xi)$:
\begin{align*}
\widehat{Hf}(\xi) = -i\operatorname{sgn}(\xi)\, \hat{f}(\xi), \quad \xi \in \mathbb{R}.
\end{align*}
Here $\operatorname{sgn}(\xi) = \mathbb{1}_{(0,\infty)}(\xi) - \mathbb{1}_{(-\infty,0)}(\xi)$.
[/definition]
To reconcile this with the principal-value definition, one computes the distributional Fourier transform of $\operatorname{p.v.}(1/(\pi x))$. Truncating at $\varepsilon > 0$ and letting $\varepsilon \to 0^+$, write
\begin{align*}
\mathcal{F}\!\left(\operatorname{p.v.}\frac{1}{\pi x}\right)(\xi) = \lim_{\varepsilon \to 0^+} \int_{|x| > \varepsilon} \frac{e^{-i\xi x}}{\pi x}\, d\mathcal{L}^1(x).
\end{align*}
Split $e^{-i\xi x} = \cos(\xi x) - i\sin(\xi x)$. The cosine term gives an odd integrand, which integrates to zero over the symmetric region, while the sine term gives an even integrand, so only the sine term survives:
\begin{align*}
= \lim_{\varepsilon \to 0^+} \frac{-2i}{\pi} \int_\varepsilon^\infty \frac{\sin(\xi x)}{x}\, d\mathcal{L}^1(x).
\end{align*}
The classical Dirichlet integral gives $\int_0^\infty \frac{\sin(u)}{u}\, d\mathcal{L}^1(u) = \frac{\pi}{2}$, so the limit equals $-i\operatorname{sgn}(\xi)$, confirming the multiplier formula.
### Equivalence on $\mathcal{S}(\mathbb{R})$
[quotetheorem:3158]
[citeproof:3158]
[remark: Skew-Symmetry and Involution]
The Hilbert transform satisfies $H^2 = -\mathrm{Id}$ on $L^2(\mathbb{R})$, because $(-i\operatorname{sgn}(\xi))^2 = -1$ for $\xi \neq 0$. In particular $H$ is injective, and $H^{-1} = -H$. The transform is also skew-symmetric: $(Hf, g)_{L^2} = -(f, Hg)_{L^2}$, since $\overline{-i\operatorname{sgn}(\xi)} = i\operatorname{sgn}(\xi)$.
[/remark]
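The multiplier description and the identity $H^2 = -\mathrm{Id}$ can be checked on a discrete periodic analogue. The sketch below is an illustration only: it replaces the Fourier transform on $\mathbb{R}$ by a naive discrete Fourier transform on $N$ samples, and it assumes the convention that the zero and Nyquist frequencies are sent to zero. With the multiplier $-i\operatorname{sgn}(\xi)$, a cosine should be mapped to the corresponding sine.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]

def inverse_dft(coeffs):
    n = len(coeffs)
    return [sum(coeffs[m] * cmath.exp(2j * math.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]

def sign_of_frequency(m, n):
    """Index m stands for frequency m if m < n/2 and for m - n if m > n/2."""
    if m == 0 or 2 * m == n:
        return 0
    return 1 if 2 * m < n else -1

def discrete_hilbert(x):
    """Apply the multiplier -i sgn(frequency) to a real sample vector."""
    n = len(x)
    coeffs = dft(x)
    shifted = [-1j * sign_of_frequency(m, n) * c for m, c in enumerate(coeffs)]
    return [v.real for v in inverse_dft(shifted)]

N = 32
cosine = [math.cos(2 * math.pi * 3 * k / N) for k in range(N)]
sine = [math.sin(2 * math.pi * 3 * k / N) for k in range(N)]

h_cos = discrete_hilbert(cosine)
err_first = max(abs(a - b) for a, b in zip(h_cos, sine))                      # H cos = sin
err_square = max(abs(a + b) for a, b in zip(discrete_hilbert(h_cos), cosine)) # H(H cos) = -cos
print(err_first, err_square)
```

Both errors should be at the level of floating-point rounding. On the line, the same computation with $e^{\pm i x}$ in place of the sampled exponentials gives $H\cos = \sin$ and $H\sin = -\cos$.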
## $L^2$ Boundedness via Plancherel
The Fourier multiplier definition makes $L^2$ boundedness immediate.
[quotetheorem:3168]
[citeproof:3168]
[explanation: Why $L^2$ is the Right Starting Point]
The $L^2$ isometry is the entry point for everything that follows. In the Calderón–Zygmund scheme for obtaining $L^p$ bounds, one needs: (i) a strong-type bound at some exponent, and (ii) a weak-type bound at $L^1$. Here the strong-type bound at $p = 2$ is provided by Plancherel, and the weak-(1,1) bound will be proved by the Calderón–Zygmund decomposition in the next section. Marcinkiewicz interpolation then fills in $1 < p < 2$, and duality extends to $2 < p < \infty$. The $L^2$ step cannot be replaced by a direct kernel estimate: the kernel $1/(\pi x)$ is not integrable, so Young's convolution inequality does not apply, and a frequency-domain argument is essential.
[/explanation]
## Weak-(1,1) and $L^p$ Boundedness
### The Hörmander Kernel Condition
To apply the Calderón–Zygmund machinery, one must verify a smoothness condition on the kernel $K(x) = 1/(\pi x)$. The relevant condition is the Hörmander condition, which controls how fast the kernel varies.
[definition: Hörmander Kernel Condition]
A kernel $K: \mathbb{R} \setminus \{0\} \to \mathbb{C}$ satisfies the **Hörmander condition** if there exists a constant $C > 0$ such that for every $y \ne 0$,
\begin{align*}
\int_{|x| > 2|y|} |K(x - y) - K(x)|\, d\mathcal{L}^1(x) \le C.
\end{align*}
[/definition]
For the Hilbert transform kernel $K(x) = 1/(\pi x)$, this is verified by explicit estimation. For $|x| > 2|y|$ one writes
\begin{align*}
K(x - y) - K(x) = \frac{1}{\pi}\left(\frac{1}{x - y} - \frac{1}{x}\right) = \frac{y}{\pi x(x-y)}.
\end{align*}
Since $|x| > 2|y|$ implies $|x - y| \ge |x|/2$, we obtain
\begin{align*}
|K(x-y) - K(x)| = \frac{|y|}{\pi|x||x-y|} \le \frac{2|y|}{\pi|x|^2}.
\end{align*}
Integrating over $|x| > 2|y|$ gives
\begin{align*}
\int_{|x| > 2|y|} |K(x-y) - K(x)|\, d\mathcal{L}^1(x) \le \frac{2|y|}{\pi} \int_{|x| > 2|y|} \frac{d\mathcal{L}^1(x)}{|x|^2} = \frac{2|y|}{\pi} \cdot 2 \cdot \frac{1}{2|y|} = \frac{2}{\pi}.
\end{align*}
So the Hörmander condition holds with constant $C = 2/\pi$.
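As a sanity check on this estimate, the Hörmander integral for $K(x) = 1/(\pi x)$ can be evaluated numerically. The sketch below uses the substitution $x = \pm 2|y|/t$ with $0 < t < 1$ and a midpoint rule; the function names and the number of quadrature steps are illustrative choices. A partial-fractions computation gives the exact value $\frac{1}{\pi}(\log 2 + \log\tfrac{3}{2}) = \frac{\log 3}{\pi}$, independent of $y$, which sits below the constant obtained from the pointwise bound.

```python
import math

def kernel(x):
    return 1.0 / (math.pi * x)

def hormander_integral(y, steps=20000):
    """Midpoint approximation of the integral of |K(x - y) - K(x)| over |x| > 2|y|."""
    r = 2.0 * abs(y)
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps
        x = r / t              # maps t in (0, 1) onto x in (2|y|, infinity)
        jacobian = r / (t * t)
        for s in (x, -x):      # both half-lines x > 2|y| and x < -2|y|
            total += abs(kernel(s - y) - kernel(s)) * jacobian
    return total / steps

exact = math.log(3.0) / math.pi
for y in (0.37, -5.0):
    print(y, hormander_integral(y), exact)
```

The scale invariance visible in the output reflects the homogeneity of the kernel: replacing $y$ by $ty$ and $x$ by $tx$ leaves the integral unchanged.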
### Weak-(1,1) via the Calderón–Zygmund Decomposition
[quotetheorem:3159]
[citeproof:3159]
### $L^p$ Boundedness for $1 < p < \infty$
With the weak-(1,1) bound and the $L^2$ isometry in hand, Marcinkiewicz interpolation gives $L^p$ for $1 < p \le 2$.
[quotetheorem:3169]
[citeproof:3169]
### Failure at the Endpoints
The Hilbert transform is not bounded on $L^1(\mathbb{R})$ or $L^\infty(\mathbb{R})$. The computation of $H\mathbb{1}_{[0,1]}$ makes both failures concrete.
[example: $H\mathbb{1}_{[0,1]}$ Is Not in $L^1$ or $L^\infty$]
Let $f = \mathbb{1}_{[0,1]}$. Then, for $x \notin [0, 1]$,
\begin{align*}
Hf(x) = \frac{1}{\pi} \operatorname{p.v.} \int_0^1 \frac{d\mathcal{L}^1(y)}{x - y} = \frac{1}{\pi} \log\left|\frac{x}{x-1}\right|.
\end{align*}
To verify this: $\int_0^1 \frac{dy}{x - y} = [-\log|x - y|]_0^1 = \log|x| - \log|x - 1| = \log|x/(x-1)|$. For $x \in (0, 1)$, the integral is a principal value but the computation remains valid by symmetry.
Near $x = 0$, $Hf(x) \sim \frac{1}{\pi}\log|x| \to -\infty$; near $x = 1$, $Hf(x) \sim \frac{1}{\pi}\log|x - 1|^{-1} \to +\infty$. Both logarithmic singularities are locally integrable (since $\int_0^\delta |\log x|\, dx < \infty$), but they show that $Hf$ does not belong to $L^\infty(\mathbb{R})$. Moreover, for large $|x|$,
\begin{align*}
Hf(x) = \frac{1}{\pi}\log\left|\frac{x}{x-1}\right| = \frac{1}{\pi}\log\left|1 + \frac{1}{x-1}\right| \sim \frac{1}{\pi(x-1)} \sim \frac{1}{\pi x},
\end{align*}
and since $\int_2^\infty \frac{dx}{x} = \infty$, the function $H\mathbb{1}_{[0,1]}$ is not in $L^1(\mathbb{R})$.
This computation shows that $f = \mathbb{1}_{[0,1]} \in L^1 \cap L^\infty$ but $Hf \notin L^1$ and $Hf \notin L^\infty$. In fact, $Hf$ does lie in weak-$L^1$ (consistent with the weak-(1,1) bound), and the failure to be in $L^\infty$ indicates that $H$ maps $L^\infty$ into BMO but not back into $L^\infty$.
[/example]
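The failure of integrability at infinity can be made quantitative. The sketch below evaluates the closed form of $Hf$ for $f = \mathbb{1}_{[0,1]}$ and the exact tail integral $\int_2^R Hf\,dx$, using the antiderivative $x\log x - (x-1)\log(x-1)$ of $\log\frac{x}{x-1}$; the function names are illustrative.

```python
import math

def hilbert_indicator(x):
    """Closed form of H applied to the indicator of [0, 1], for x not in {0, 1}."""
    return math.log(abs(x / (x - 1.0))) / math.pi

def tail_l1(R):
    """Exact integral of Hf over [2, R]; Hf is positive there."""
    def antiderivative(x):
        return x * math.log(x) - (x - 1.0) * math.log(x - 1.0)
    return (antiderivative(R) - antiderivative(2.0)) / math.pi

# The tail behaves like 1 / (pi x): the rescaled values approach 1.
for x in (1e1, 1e2, 1e4):
    print(x, math.pi * x * hilbert_indicator(x))

# The partial L^1 norms grow like log(R) / pi, so they are unbounded.
for R in (1e1, 1e3, 1e6):
    print(R, tail_l1(R))
```

Each factor of $10$ in $R$ adds roughly $\log(10)/\pi$ to the tail integral, which is the logarithmic divergence of $\int dx/x$ in disguise.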
## The Maximal Hilbert Transform and Cotlar's Identity
The principal-value integral $Hf(x)$ is defined as a limit of truncated integrals, but for an arbitrary $L^p$ function it is far from obvious that this limit exists at almost every point. The maximal Hilbert transform controls the size of all truncations simultaneously.
[definition: Truncated Hilbert Transform and Maximal Hilbert Transform]
For $f \in L^p(\mathbb{R})$ and $\varepsilon > 0$, the **$\varepsilon$-truncated Hilbert transform** is
\begin{align*}
H_\varepsilon f(x) = \frac{1}{\pi} \int_{|x-y| > \varepsilon} \frac{f(y)}{x - y}\, d\mathcal{L}^1(y).
\end{align*}
The **maximal Hilbert transform** is
\begin{align*}
H_* f(x) = \sup_{\varepsilon > 0} |H_\varepsilon f(x)|.
\end{align*}
[/definition]
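For $f = \mathbb{1}_{[0,1]}$ the truncations can be computed in closed form, which gives a concrete picture of $H_\varepsilon f$ and $H_* f$. In the sketch below, the function names and the finite grid of $\varepsilon$ values used to approximate the supremum are assumptions made for illustration.

```python
import math

def truncated_hilbert_indicator(x, eps):
    """H_eps applied to the indicator of [0, 1], evaluated exactly."""
    total = 0.0
    # The two pieces of [0, 1] that survive removing (x - eps, x + eps).
    for a, b in ((0.0, min(1.0, x - eps)), (max(0.0, x + eps), 1.0)):
        if a < b:
            # Integral of dy / (x - y) over [a, b], which avoids y = x.
            total += math.log(abs(x - a)) - math.log(abs(x - b))
    return total / math.pi

def maximal_hilbert_indicator(x, eps_grid):
    """Approximate H_* by a supremum over a finite grid of truncation radii."""
    return max(abs(truncated_hilbert_indicator(x, eps)) for eps in eps_grid)

grid = [2.0 ** (-k) for k in range(-3, 12)]
for x in (0.3, 0.9, 2.0):
    limit = math.log(abs(x / (x - 1.0))) / math.pi
    print(x, truncated_hilbert_indicator(x, 1e-3), limit, maximal_hilbert_indicator(x, grid))
```

For this particular $f$, the truncation $H_\varepsilon f(x)$ already equals the limit $Hf(x)$ as soon as $\varepsilon$ is smaller than the distance from $x$ to the endpoints $0$ and $1$, because the removed interval is symmetric about $x$ and $f$ is constant on it.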
To prove almost-everywhere convergence of $H_\varepsilon f$ as $\varepsilon \to 0$, it suffices to show that $H_*$ is bounded on $L^p$. The key tool is Cotlar's identity, which bounds $H_*$ in terms of the Hardy–Littlewood maximal function $M$ and $H$ itself.
[quotetheorem:3161]
[citeproof:3161]
[explanation: Why Cotlar's Identity Gives Almost-Everywhere Convergence]
From Cotlar's identity and the $L^p$ bounds on $M$ and $H$, one obtains $\|H_* f\|_{L^p} \le C_p \|f\|_{L^p}$ for $1 < p < \infty$. The almost-everywhere convergence $H_\varepsilon f(x) \to Hf(x)$ as $\varepsilon \to 0$ then follows by the standard argument: the convergence holds for $f \in \mathcal{S}(\mathbb{R})$ (where everything is smooth), the set of $f$ for which $\limsup_{\varepsilon \to 0} |H_\varepsilon f - Hf| = 0$ a.e. is closed in $L^p$ (because the maximal function $H_*$ bounds the limsup), and $\mathcal{S}$ is dense in $L^p$, so convergence holds for all $f \in L^p$.
This is the prototype for almost-everywhere convergence in the general Calderón–Zygmund theory of Chapter 5, where the same strategy — bound the maximal truncation via a Cotlar-type identity, invoke $L^p$ boundedness, then pass to dense subsets — handles operators with more general kernels.
[/explanation]
[quotetheorem:3170]
[citeproof:3170]
<!-- illustration-needed: the Hilbert transform of the indicator function 1_{[0,1]} — show the graph of Hf(x) = (1/π) log|x/(x-1)| with the logarithmic blow-ups at x=0 and x=1, and the 1/πx tail for large |x| -->
The Hilbert transform exemplifies a broader class of singular integral operators with similar analytic properties. Calderón-Zygmund operators generalize this framework, extending the boundedness results to a wide class of convolution operators with singular kernels.
# 5. Calderón-Zygmund Operators
The Hilbert transform, studied in Chapter 4, is the archetype of a singular integral operator: a convolution whose kernel $1/(\pi x)$ is too large at the origin to define the integral classically, yet which is bounded on $L^p$ for all $1 < p < \infty$ and weak-type $(1,1)$. This chapter abstracts the essential features of that kernel into a set of axioms — size, smoothness, and cancellation — that define the class of Calderón–Zygmund operators. The main theorem states that $L^2$ boundedness, together with these kernel conditions, automatically forces weak-$(1,1)$ and $L^p$ boundedness for $1 < p < \infty$. We begin by meeting the Riesz transforms, the natural higher-dimensional analogues of the Hilbert transform, before developing the general theory.
## The Riesz Transforms
The Hilbert transform on $\mathbb{R}$ acts on the Fourier side by the multiplier $-i\operatorname{sgn}(\xi)$. On $\mathbb{R}^n$ there is no single such transform, but there are $n$ natural ones, one for each coordinate direction.
[definition: Riesz Transform]
For $j = 1, \dots, n$, the $j$-th **Riesz transform** is the operator $R_j : \mathcal{S}(\mathbb{R}^n) \to \mathcal{S}'(\mathbb{R}^n)$ — extending to a bounded $R_j : L^2(\mathbb{R}^n) \to L^2(\mathbb{R}^n)$ — defined for $f \in \mathcal{S}(\mathbb{R}^n)$ by the principal-value singular integral
\begin{align*}
R_j f(x) = c_n \, \mathrm{p.v.} \int_{\mathbb{R}^n} \frac{x_j - y_j}{|x - y|^{n+1}} f(y) \, d\mathcal{L}^n(y),
\end{align*}
where $c_n = \Gamma\!\left(\tfrac{n+1}{2}\right)/\pi^{(n+1)/2}$ is a dimensional constant chosen so that the Fourier multiplier takes the cleanest form.
[/definition]
The Fourier side description is the quickest route to all algebraic properties of the Riesz transforms.
[quotetheorem:3163]
[citeproof:3163]
[remark: Plancherel and $L^2$ Boundedness]
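Since the multiplier $m_j(\xi) = -i\xi_j/|\xi|$ satisfies $|m_j(\xi)| = |\xi_j|/|\xi| \le 1$ for all $\xi \neq 0$, Plancherel immediately gives $\|R_j f\|_{L^2} \le \|f\|_{L^2}$: each Riesz transform is a contraction on $L^2$. For $n \ge 2$ no single $R_j$ is an isometry, but since $\sum_j |m_j(\xi)|^2 = 1$ the family is jointly isometric: $\sum_{j=1}^n \|R_j f\|_{L^2}^2 = \|f\|_{L^2}^2$.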
[/remark]
The family $(R_1, \dots, R_n)$ satisfies a remarkable algebraic identity that mirrors the one-dimensional identity $H^2 = -\mathrm{Id}$.
[quotetheorem:3164]
[citeproof:3164]
[explanation: Second Derivatives via Riesz Transforms]
The identity $\sum_j R_j^2 = -\mathrm{Id}$ is more than algebraic bookkeeping — it gives a powerful representation formula for second-order derivatives of solutions to elliptic PDEs.
If $u$ solves $-\Delta u = f$ on $\mathbb{R}^n$, then $\partial_{x_i}\partial_{x_j} u = R_i R_j(-\Delta u) = R_i R_j f$. On the Fourier side, $\widehat{\partial_{x_i}\partial_{x_j} u} = (i\xi_i)(i\xi_j)\hat{u}$ and $\hat{f} = |\xi|^2\hat{u}$, so
\begin{align*}
\widehat{\partial_{x_i}\partial_{x_j} u}(\xi) = (-i\xi_i/|\xi|)(-i\xi_j/|\xi|)\hat{f}(\xi) = -\frac{\xi_i\xi_j}{|\xi|^2}\hat{f}(\xi).
\end{align*}
Since the Riesz transforms are bounded on $L^p$ for $1 < p < \infty$ (which follows from the general theory below), this gives the Calderón–Zygmund inequality
\begin{align*}
\|\partial_{x_i}\partial_{x_j} u\|_{L^p} \lesssim \|\Delta u\|_{L^p}, \quad 1 < p < \infty.
\end{align*}
This is the foundation of elliptic $L^p$ regularity theory.
[/explanation]
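The signs in these symbol identities are easy to get wrong, so a direct numerical check is worthwhile. The sketch below works purely on the Fourier side, with the multiplier convention $m_j(\xi) = -i\xi_j/|\xi|$ used in this chapter; the helper names and the random sample frequencies are illustrative. It checks that $\sum_j m_j^2 = -1$ and that the symbol $(i\xi_i)(i\xi_j)$ of $\partial_{x_i}\partial_{x_j}$ equals $m_i m_j$ times the symbol $|\xi|^2$ of $-\Delta$.

```python
import math
import random

def riesz_multiplier(xi, j):
    """Multiplier -i xi_j / |xi| of the j-th Riesz transform at frequency xi."""
    norm = math.sqrt(sum(c * c for c in xi))
    return complex(0.0, -xi[j] / norm)

def symbol_errors(xi):
    n = len(xi)
    norm_sq = sum(c * c for c in xi)
    m = [riesz_multiplier(xi, j) for j in range(n)]
    # Deviation from the identity: sum of squared multipliers equals -1.
    sum_error = abs(sum(mj * mj for mj in m) + 1.0)
    # Deviation from: (i xi_a)(i xi_b) == m_a * m_b * |xi|^2 for all a, b.
    hessian_error = max(
        abs((1j * xi[a]) * (1j * xi[b]) - m[a] * m[b] * norm_sq)
        for a in range(n) for b in range(n)
    )
    return sum_error, hessian_error

random.seed(0)
worst = (0.0, 0.0)
for _ in range(100):
    xi = [random.uniform(-5.0, 5.0) for _ in range(3)]
    errors = symbol_errors(xi)
    worst = (max(worst[0], errors[0]), max(worst[1], errors[1]))
print(worst)
```

Both deviations should be at rounding level, and each modulus $|m_j(\xi)| = |\xi_j|/|\xi|$ is at most $1$.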
[example: Riesz Transforms in Two Dimensions]
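In $\mathbb{R}^2$, the two Riesz transforms $R_1, R_2$ interact via the identity $R_1^2 + R_2^2 = -\mathrm{Id}$. Complex combinations of them are closely related to the Beurling–Ahlfors transform $\mathcal{B}f(z) = -\frac{1}{\pi}\,\mathrm{p.v.}\int_{\mathbb{C}} \frac{f(w)}{(z-w)^2}\,d\mathcal{L}^2(w)$ (with $z = x_1 + ix_2$), whose Fourier multiplier is $\bar{\xi}/\xi$ (with $\xi = \xi_1 + i\xi_2$) up to a sign that depends on normalisation conventions. For instance, $R_1 - iR_2$ has multiplier $-i\bar{\xi}/|\xi|$, so $(R_1 - iR_2)^2$ has multiplier $-\bar{\xi}/\xi$. Because this multiplier has modulus one, the Beurling–Ahlfors transform is an isometry on $L^2(\mathbb{R}^2)$; it plays a central role in quasiconformal mapping theory.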
[/example]
## Calderón–Zygmund Kernels
The Riesz transform kernel $K_j(x) = c_n x_j |x|^{-(n+1)}$ shares several features with the Hilbert transform kernel $1/(\pi x)$: it decays like $|x|^{-n}$ at large scales, it has a smooth gradient that decays like $|x|^{-(n+1)}$, and it has mean zero over spheres centered at the origin. These three properties — size, smoothness, and cancellation — are the defining features of a Calderón–Zygmund kernel.
[definition: Calderón–Zygmund Kernel]
A measurable function $K: \mathbb{R}^n \setminus \{0\} \to \mathbb{C}$ is a **Calderón–Zygmund kernel** with constant $A > 0$ if it satisfies:
**Size condition:** $|K(x)| \le A|x|^{-n}$ for all $x \neq 0$.
**Hörmander smoothness condition:** For all $y \neq 0$,
\begin{align*}
\int_{|x| > 2|y|} |K(x - y) - K(x)| \, d\mathcal{L}^n(x) \le A.
\end{align*}
The associated **Calderón–Zygmund operator** is the principal-value convolution
\begin{align*}
Tf(x) = \mathrm{p.v.} \int_{\mathbb{R}^n} K(x - y) f(y) \, d\mathcal{L}^n(y) = \lim_{\varepsilon \to 0} \int_{|x-y|>\varepsilon} K(x-y)f(y)\,d\mathcal{L}^n(y),
\end{align*}
provided this limit exists. The **cancellation condition** requires that $T$, initially defined on a dense class such as $\mathcal{S}(\mathbb{R}^n)$, extends to a bounded operator $T : L^2(\mathbb{R}^n) \to L^2(\mathbb{R}^n)$.
[/definition]
[remark: The Hörmander Condition]
The Hörmander condition bounds the total variation in $K$ over large scales when one shifts the argument. For differentiable kernels it is implied by (but not equivalent to) the gradient condition $|\nabla K(x)| \le A|x|^{-(n+1)}$: if this holds, then the mean value theorem gives, for $|x| > 2|y|$,
\begin{align*}
|K(x - y) - K(x)| \le |y| \sup_{|z - x| \le |y|} |\nabla K(z)| \lesssim \frac{|y|}{|x|^{n+1}},
\end{align*}
and integrating over $|x| > 2|y|$ yields a bound by a constant. The Hörmander form is preferred because it is weaker: it is an integrated condition, so it also covers rougher kernels for which pointwise gradient bounds fail, for instance homogeneous kernels whose angular part satisfies only a Dini-type continuity condition.
[/remark]
[example: Examples of Calderón–Zygmund Kernels]
The following all satisfy the size and Hörmander conditions with a constant $A$ depending only on $n$.
**The Hilbert transform kernel** ($n=1$): $K(x) = 1/(\pi x)$. Here $|K(x)| = |\pi x|^{-1}$ satisfies the size condition, and since $K'(x) = -1/(\pi x^2)$ satisfies $|K'(x)| \lesssim |x|^{-2}$, the Hörmander condition holds. The cancellation condition (i.e., $L^2$ boundedness) follows from the Fourier multiplier $-i\operatorname{sgn}(\xi)$ having modulus one.
**The Riesz transform kernels**: $K_j(x) = c_n x_j |x|^{-(n+1)}$ for $j = 1, \dots, n$. The size condition $|K_j(x)| \le c_n |x|^{-n}$ holds, and $|\nabla K_j(x)| \lesssim |x|^{-(n+1)}$ gives the Hörmander condition.
**Homogeneous kernels with mean-zero angular profile**: Fix $\Omega$ on $S^{n-1}$ with $\int_{S^{n-1}} \Omega(\theta) \, d\sigma(\theta) = 0$. The kernel $K(x) = \Omega(x/|x|)|x|^{-n}$ satisfies the size condition whenever $\Omega \in L^\infty$, and the Hörmander condition whenever $\Omega$ is, say, Lipschitz on the sphere; here the constant $A$ depends on $\Omega$ as well as on $n$. The mean-zero condition on $\Omega$ is what yields the cancellation condition. In particular, odd kernels (where $\Omega(-\theta) = -\Omega(\theta)$) automatically have mean zero.
[/example]
The size and Hörmander conditions alone do not guarantee $L^2$ boundedness: the cancellation condition is logically independent. The next section shows what $L^2$ boundedness buys, and the Cotlar–Stein lemma later in this chapter gives a powerful tool for establishing it directly.
## The Calderón–Zygmund Theorem
The central result of this chapter converts $L^2$ boundedness into the full range of $L^p$ estimates.
[quotetheorem:3165]
[citeproof:3165]
[explanation: Why the Calderón–Zygmund Theorem Is Sharp at the Endpoints]
The $L^p$ range $1 < p < \infty$ in the theorem is sharp. The Hilbert transform already shows both endpoints fail:
For $p = 1$: Consider $f = \mathbb{1}_{[0,1]}$. Then $Hf(x) = \frac{1}{\pi}\log|x/(x-1)|$. Near $x = 0$ and $x = 1$ this has logarithmic singularities, which are locally integrable and cause no trouble. The obstruction is at infinity: $Hf(x) \sim 1/(\pi x)$ as $|x| \to \infty$, and $1/|x|$ is not integrable there, so $Hf \notin L^1(\mathbb{R})$. So $H \not\colon L^1 \to L^1$.
For $p = \infty$: Again with $f = \mathbb{1}_{[0,1]}$, the function $Hf$ is unbounded near $x = 0$ and $x = 1$ (both are logarithmic singularities). The best one can say is that $H$ maps $L^\infty$ into $\mathrm{BMO}$, which will be developed in Chapter 8.
The weak-$(1,1)$ bound at the boundary $p = 1$ is optimal in the sense that $Tf$ need not lie in $L^1$ but its distribution function is controlled: $\mathcal{L}^n(\{|Tf| > \lambda\}) = O(\|f\|_{L^1}/\lambda)$.
[/explanation]
## The Cotlar–Stein Lemma
The Calderón–Zygmund theorem takes $L^2$ boundedness as a hypothesis. In many applications, one must establish $L^2$ boundedness from scratch — for instance, when the operator is not directly given by a convolution or when one wishes to decompose the operator into pieces and reassemble. The Cotlar–Stein lemma (also called the almost-orthogonality lemma) is the standard tool for this.
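The idea is familiar from Hilbert space theory: if the operators $T_j$ have mutually orthogonal ranges and mutually orthogonal initial spaces, that is, $T_j^* T_k = 0$ and $T_j T_k^* = 0$ for $j \neq k$, then Pythagoras gives $\|\sum_j T_j\| = \sup_j \|T_j\|$, no matter how many terms there are. The Cotlar–Stein lemma relaxes exact orthogonality to near-orthogonality: if the products $T_j^* T_k$ and $T_j T_k^*$ are small when $|j - k|$ is large, the sum is still bounded, with a constant controlled by the summed decay rate rather than by the number of terms.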
[quotetheorem:3166]
[citeproof:3166]
[explanation: Applying Cotlar–Stein to Calderón–Zygmund Operators]
To verify $L^2$ boundedness of a Calderón–Zygmund operator $T$ with kernel $K$ directly, one decomposes $K$ into pieces supported in dyadic annuli:
\begin{align*}
K = \sum_{j \in \mathbb{Z}} K_j, \quad K_j(x) = K(x)\,\mathbb{1}_{2^j < |x| \le 2^{j+1}}(x).
\end{align*}
Let $T_j$ be convolution with $K_j$. The almost-orthogonality conditions $\|T_j^* T_k\| \le \gamma(j-k)$ are verified using the size condition $|K(x)| \le A|x|^{-n}$, the Hörmander smoothness, and the cancellation of each piece: when $|j - k|$ is large, the composition pairs a mean-zero kernel at the finer scale against a kernel that is nearly constant at that scale, and this is what produces the decay. The Cotlar–Stein lemma then assembles the pieces into a bounded operator on $L^2$.
This approach is particularly useful for non-translation-invariant operators, where the Fourier multiplier argument is unavailable. More precisely, for a general operator $T$ with standard kernel $K(x,y)$ satisfying $|K(x,y)| \le A|x-y|^{-n}$, one decomposes $T = \sum_j T_j$ where $T_j f(x) = \int_{2^j < |x-y| \le 2^{j+1}} K(x,y)f(y)\,d\mathcal{L}^n(y)$, and verifies the almost-orthogonality by estimating the kernels of $T_j^* T_k$ and $T_j T_k^*$. Separation of scales alone does not make these compositions vanish: writing $K_j$ for the kernel of $T_j$, the kernel of $T_j T_k^*$ is $\int K_j(x,z)\overline{K_k(y,z)}\,d\mathcal{L}^n(z)$, which is generally nonzero even when $|j-k| \ge 2$. The decay comes instead from combining cancellation at the finer scale (for instance $T_j 1 = 0$ and $T_j^* 1 = 0$) with the smoothness of the kernel at the coarser scale. Under such hypotheses the resulting $\gamma(j-k)$ decays exponentially, so $\sum_k \gamma(k) < \infty$ and Cotlar–Stein applies.
[/explanation]
[remark: Cotlar–Stein in Practice]
The Cotlar–Stein lemma is more than a device for convolution operators. It is central to the $T(1)$ theorem (Chapter 6), where one must verify $L^2$ boundedness of a non-convolution operator by decomposing it via paraproducts and verifying almost-orthogonality. The exponential decay $\gamma(k) = 2^{-\delta|k|}$ for some $\delta > 0$ is the typical form that arises, and it implies $A = \sum_k \gamma(k) < \infty$ with $A \lesssim 1/(1 - 2^{-\delta})$.
[/remark]
[example: Cotlar–Stein for the Hilbert Transform]
Decompose the Hilbert transform kernel $1/(\pi x)$ into dyadic pieces $K_j(x) = \frac{1}{\pi x} \mathbb{1}_{2^j < |x| \le 2^{j+1}}$. Each $T_j$ (convolution with $K_j$) is bounded on $L^2$ with $\|T_j\| \le \|K_j\|_{L^1} = \frac{2}{\pi}\log 2$, uniformly in $j$. The operator $T_j^* T_k$ is convolution with $\tilde K_j * K_k$, where $\tilde K_j(x) = \overline{K_j(-x)}$. Disjointness of the supports of $K_j$ and $K_k$ does not make this convolution vanish. Instead, each piece is odd and hence has mean zero, and this cancellation, combined with the size and regularity of the pieces and the separation of scales, gives $\|\tilde K_j * K_k\|_{L^1} \lesssim 2^{-|j-k|/2}$ via a direct calculation, hence $\|T_j^* T_k\|_{L^2 \to L^2} \lesssim 2^{-|j-k|/2}$ by Young's inequality. The same bound holds for $\|T_j T_k^*\|$. Setting $\gamma(m) = 2^{-|m|/2}$ gives $A = \sum_m \gamma(m) < \infty$ and Cotlar–Stein yields $L^2$ boundedness.
[/example]
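The decay of the cross terms in this example can be observed numerically. The sketch below samples the dyadic pieces of $1/(\pi x)$ at the midpoints of a uniform grid and computes a Riemann-sum approximation of the $L^1$ norm of the convolution of the reflected piece at scale $0$ with the piece at scale $k$. The grid spacing, the function names, and the restriction to $k = 0, \dots, 4$ are illustrative choices; the printed values only approximate the continuous norms.

```python
import math

STEP = 1.0 / 16  # grid spacing; an illustrative choice

def dyadic_piece(j, h=STEP):
    """Sample K_j(x) = 1/(pi x) on 2^j < |x| <= 2^(j+1) at x = (i + 1/2) h."""
    lo, hi = 2.0 ** j, 2.0 ** (j + 1)
    i_max = int(round(hi / h))
    piece = {}
    for i in range(-i_max, i_max):
        x = (i + 0.5) * h
        if lo < abs(x) <= hi:
            piece[i] = 1.0 / (math.pi * x)
    return piece

def reflect(piece):
    """Samples of x -> K(-x); the kernels are real, so no conjugate is needed."""
    return {-i - 1: v for i, v in piece.items()}

def l1_norm_of_convolution(a, b, h=STEP):
    """Riemann-sum approximation of the L^1 norm of the convolution a * b."""
    conv = {}
    for i, ai in a.items():
        for m, bm in b.items():
            conv[i + m] = conv.get(i + m, 0.0) + ai * bm * h
    return h * sum(abs(v) for v in conv.values())

norms = [l1_norm_of_convolution(reflect(dyadic_piece(0)), dyadic_piece(k))
         for k in range(5)]
for k, value in enumerate(norms):
    print(k, round(value, 4))
```

The midpoint grid is symmetric about the origin, so each sampled piece is exactly odd and has zero sum; this discrete mean-zero property is what drives the decay, mirroring the role of cancellation in the continuous estimate.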
Establishing boundedness for individual Calderón-Zygmund operators raises the question of when a general kernel defines a bounded operator. The T(1) and T(b) theorems provide characterizations that reduce this question to testing the operator on a few specific functions, making the boundedness problem tractable.
# 6. The T(1) and T(b) Theorems
The Calderón–Zygmund theorem in Chapter 5 reduces $L^p$ boundedness to $L^2$ boundedness, but it does not tell us when a non-convolution operator with a standard kernel is actually $L^2$ bounded. The T(1) theorem of David and Journé fills this gap: it gives a checkable criterion — the images $T(1)$ and $T^*(1)$ lie in $\text{BMO}$, together with a weak boundedness condition — that is both necessary and sufficient. The T(b) theorem then generalises the test function from the constant $1$ to any accretive function $b$, unlocking the proof of $L^2$ boundedness of the Cauchy integral on Lipschitz curves.
## Standard Kernels and Non-Convolution Operators
When $T$ is a convolution operator, its kernel $K(x,y) = K(x-y)$ is translation-invariant and $L^2$ boundedness is read off from the Fourier multiplier. Variable-coefficient operators arising in complex analysis and geometry break translation invariance, so a different framework is needed. The right class is that of operators associated with a standard kernel.
[definition: Standard Kernel]
A function $K: \mathbb{R}^n \times \mathbb{R}^n \setminus \{x = y\} \to \mathbb{C}$ is a **standard kernel** if there exists a constant $A > 0$ such that:
**Size condition:**
\begin{align*}
|K(x,y)| \le \frac{A}{|x-y|^n}
\end{align*}
for all $x \neq y$.
**Smoothness condition:** For each fixed $y$, the map $x \mapsto K(x,y)$ is differentiable away from $y$, and analogously in $y$; more precisely,
\begin{align*}
|\nabla_x K(x,y)| + |\nabla_y K(x,y)| \le \frac{A}{|x-y|^{n+1}}
\end{align*}
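for all $x \neq y$. By the mean value theorem, this implies the Hörmander-type regularity estimate
\begin{align*}
|K(x,y) - K(x,y')| \le 2^{n+1}A\,\frac{|y-y'|}{|x-y|^{n+1}}
\end{align*}
whenever $|y-y'| \le \tfrac{1}{2}|x-y|$ (every point $z$ on the segment from $y$ to $y'$ then satisfies $|x-z| \ge \tfrac{1}{2}|x-y|$), and the analogous estimate with the roles of $x$ and $y$ swapped.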
[/definition]
Convolution kernels $K(x,y) = \Omega(x-y)|x-y|^{-n}$ with a smooth mean-zero angular profile $\Omega$ satisfy both conditions, as do the Riesz kernels $(x_j - y_j)/|x-y|^{n+1}$. But the definition also covers kernels whose behavior depends on position, such as the Cauchy kernel $1/(z-w)$ on a Lipschitz curve.
The difficulty with a general standard kernel is that the integral $\int K(x,y) f(y)\,d\mathcal{L}^n(y)$ need not converge absolutely. The correct formulation is through the pairing with test functions on disjoint supports.
[definition: Operator with Standard Kernel]
A continuous linear operator $T: C_c^\infty(\mathbb{R}^n) \to (C_c^\infty(\mathbb{R}^n))'$ is said to be **associated with the standard kernel** $K$ if for all $f, g \in C_c^\infty(\mathbb{R}^n)$ with $\operatorname{supp}(f) \cap \operatorname{supp}(g) = \varnothing$,
\begin{align*}
T(f)(g) = \iint K(x,y)\,f(y)\,g(x)\,d\mathcal{L}^n(y)\,d\mathcal{L}^n(x).
\end{align*}
The adjoint $T^*$ is then associated with $K^*(x,y) = \overline{K(y,x)}$.
[/definition]
This is the natural extension of the principal-value framework. Having a standard kernel is a standing hypothesis rather than a boundedness criterion: it is far from sufficient for $L^2$ boundedness on its own, since it says nothing about cancellation across scales when the supports are not disjoint.
[remark: Extending T to bounded inputs]
To even make sense of $T(1)$, where $1$ is not compactly supported, one needs a way to apply $T$ to bounded functions. The standard device is to define $T(1)$ as a distribution modulo constants by testing against $g \in C_c^\infty$ with $\int g = 0$: set $T(1)(g) = \lim_{R \to \infty} T(\eta_R)(g)$, where $\eta_R \in C_c^\infty$ is a smooth cutoff equal to $1$ on $B(0,R)$ and supported in $B(0,2R)$. The cancellation in $g$, combined with the kernel smoothness, ensures convergence. The hypothesis $T(1) \in \text{BMO}$ then means that this functional is represented by a function in $\text{BMO}$.
[/remark]
## The Weak Boundedness Property
Knowing that $T$ has a standard kernel and that $T(1) \in \text{BMO}$ does not by itself give $L^2$ boundedness. One more condition is needed, encoding a quantitative bound when both $f$ and $g$ are smooth bumps at the same scale and location.
[definition: Weak Boundedness Property]
An operator $T$ associated with a standard kernel satisfies the **weak boundedness property (WBP)** if there exists $C > 0$ such that
\begin{align*}
|\langle T\varphi_R^x, \psi_R^x \rangle| \le C R^n
\end{align*}
uniformly over all $x \in \mathbb{R}^n$, $R > 0$, and all functions $\varphi, \psi \in C_c^\infty(B(0,1))$ with $\|\varphi\|_{C^1}, \|\psi\|_{C^1} \le 1$. Here $\varphi_R^x(y) = \varphi((y-x)/R)$ and $\psi_R^x(y) = \psi((y-x)/R)$ are rescaled bumps at center $x$ and radius $R$.
[/definition]
The bound $CR^n$ is exactly what one would expect from a bounded operator on $L^2$: if $T$ were $L^2$ bounded, then $|\langle T\varphi_R^x, \psi_R^x \rangle| \le \|T\|_{L^2 \to L^2}\|\varphi_R^x\|_2\|\psi_R^x\|_2 \lesssim \|T\|_{L^2\to L^2} R^n$. So WBP is a necessary condition for $L^2$ boundedness, capturing cancellation at a single scale.
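The scaling behind the factor $R^n$ can be made explicit. Under the change of variables $y = x + Rz$, and assuming (as is usual, though not stated above) that the $C^1$ norm dominates the sup norm,
\begin{align*}
\|\varphi_R^x\|_2^2 = \int \left|\varphi\!\left(\frac{y-x}{R}\right)\right|^2 d\mathcal{L}^n(y) = R^n \int_{B(0,1)} |\varphi(z)|^2\, d\mathcal{L}^n(z) \le R^n\,|B(0,1)|.
\end{align*}
The same bound holds for $\psi_R^x$, so $\|\varphi_R^x\|_2\|\psi_R^x\|_2 \le |B(0,1)|\,R^n$, and the constant in WBP can be taken to be $|B(0,1)|\,\|T\|_{L^2 \to L^2}$ for an $L^2$-bounded $T$.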
The WBP is stable under transposition ($T^*$ satisfies WBP iff $T$ does), under composition with smooth multiplications, and under the paraproduct decompositions used in the proof. These stability properties make it a robust hypothesis.
[remark: WBP versus kernel estimates]
The WBP is a genuinely additional hypothesis: one can construct operators with standard kernels for which $T(1) = T^*(1) = 0$ but WBP fails, and these operators are not $L^2$ bounded. In practice, verifying WBP for a specific operator is often an estimate of the form $|\langle Tf, g \rangle| \le C\|f\|_2\|g\|_2$ restricted to bump functions, which can be done by direct computation or by an $L^2$ argument on a dense class.
[/remark]
## The T(1) Theorem
The T(1) theorem identifies precisely the three conditions that characterize $L^2$ boundedness among operators with standard kernels.
[quotetheorem:3167]
[citeproof:3167]
The necessity of conditions 1–3 follows from the remarks above: WBP is necessary by Cauchy–Schwarz, and the images $T(1)$ and $T^*(1)$ lie in BMO because $L^2$-bounded Calderón–Zygmund operators map $L^\infty$ to BMO (a consequence of Fefferman duality). The substance of the theorem is sufficiency.
The paraproduct decomposition is the conceptual heart of this argument.
[explanation: The role of the paraproduct]
The paraproduct $\Pi_b$ is engineered so that its action on $1$ gives back $b$ while its adjoint annihilates $1$. With the convention $\Pi_b(f) = \sum_j \Delta_j(b)\, S_{j-1}(f)$, where $S_{j-1}$ is the low-frequency Littlewood–Paley projection and $\Delta_j$ projects onto frequencies of size about $2^j$, one has $S_{j-1}(1) = 1$ and hence $\Pi_b(1) = \sum_j \Delta_j(b) = b$ modulo constants. For the adjoint, $(\Pi_b)^*(1) = \sum_j S_{j-1}(\overline{\Delta_j b}) = 0$, because (for a suitable choice of cutoffs) $S_{j-1}$ removes exactly the frequencies on which $\Delta_j b$ lives. Consequently, subtracting the paraproducts $\Pi_{T(1)}$ and $(\Pi_{T^*(1)})^*$ from $T$ cancels the "main term" contributions of $T(1)$ and $T^*(1)$, leaving a remainder $R$ with $R(1) = 0$ and $R^*(1) = 0$.
The condition $R(1) = 0$ is the cancellation property that makes almost-orthogonality work: without it, the dyadic pieces $R_j$ would have correlations at large separation $|j-k|$ that fail to decay, and the Cotlar–Stein sum would diverge.
[/explanation]
[example: Verification of the T(1) condition]
Consider the operator $Tf(x) = \text{p.v.}\int K(x,y) f(y)\,d\mathcal{L}^n(y)$ where $K(x,y) = \Omega((x-y)/|x-y|)|x-y|^{-n}$ with $\Omega$ odd and smooth. Since $K(x,y) = -K(y,x)$ (by oddness of $\Omega$), the operator $T$ is antisymmetric: $\langle Tf, g \rangle = -\langle Tg, f\rangle$, so $T^* = -T$.
To compute $T(1)$, pair with a Schwartz function $g$ satisfying $\int g = 0$:
\begin{align*}
T(1)(g) = \lim_{R \to \infty} \int \left(\int_{|y| \le R} K(x,y)\,d\mathcal{L}^n(y)\right) g(x)\,d\mathcal{L}^n(x).
\end{align*}
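For fixed $x$, substitute $y = x - z$, so that $\int_{|y| \le R} K(x,y)\,d\mathcal{L}^n(y) = \int_{|x-z| \le R} K(x, x-z)\,d\mathcal{L}^n(z)$, understood as a principal value at $z = 0$. By the oddness of $\Omega$, $K(x, x-z) = -K(x, x+z)$, so the integral over any region symmetric under $z \mapsto -z$ vanishes; in particular the part over $\{|z| \le R - |x|\}$ is zero once $R > |x|$. The remaining region is contained in the annulus $\{R - |x| < |z| \le R + |x|\}$, which has measure $\lesssim |x| R^{n-1}$ and on which $|K| \lesssim R^{-n}$ when $R \ge 2|x|$. Its contribution is therefore $O(|x|/R)$, which tends to zero after integration against $g$. Thus $T(1) = 0$, and by antisymmetry $T^*(1) = -T(1) = 0$. With WBP verified directly, the T(1) theorem confirms that $T$ is $L^2$ bounded, consistent with the classical Calderón–Zygmund result for odd kernels.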
[/example]
## The T(b) Theorem
In applications such as the Cauchy integral on Lipschitz curves, the natural test function is not the constant $1$ but an accretive function $b$ adapted to the curve. The T(b) theorem relaxes the condition from $T(1) \in \text{BMO}$ to $T(b) \in \text{BMO}$ for a suitable $b$.
[definition: Accretive Function]
A function $b \in L^\infty(\mathbb{R}^n)$ is **accretive** if there exists $\delta > 0$ such that
\begin{align*}
\operatorname{Re}(b(x)) \ge \delta > 0 \quad \text{for a.e. } x \in \mathbb{R}^n.
\end{align*}
More generally, $b$ is called **para-accretive** if there exist $\delta, C > 0$ such that for every cube $Q \subset \mathbb{R}^n$ there exists a subcube $Q' \subset Q$ with $|Q'| \ge C|Q|$ and $\left|\frac{1}{|Q'|}\int_{Q'} b\right| \ge \delta$.
[/definition]
Accretivity ensures that $b$ is nondegenerate everywhere, making multiplication by $b$ an invertible operation on $L^\infty$. The para-accretive condition is weaker and requires only a substantial average in some subcube of every cube, but it still provides enough control for the theory to work.
[quotetheorem:3171]
[citeproof:3171]
The proof follows the same structure as T(1): construct $b$-adapted paraproducts $\Pi^b_a$ whose subtraction from $T$ removes the contribution of $T(b_1)$ and $T^*(b_2)$, leaving a remainder with $R(b_1) = R^*(b_2) = 0$, to which a version of Cotlar–Stein applies.
[remark: The b-adapted paraproduct]
When $b \not\equiv 1$, the standard Littlewood–Paley paraproduct must be replaced by a $b$-adapted version that accounts for the weight $b$ in the averaging process. The construction uses a $b$-adapted martingale averaging operator in place of the standard dyadic conditional expectation, and the cancellation condition becomes $R(b_1) = 0$ rather than $R(1) = 0$. The para-accretivity condition guarantees that these adapted averaging operators are well-defined and bounded.
[/remark]
## The Cauchy Integral on Lipschitz Curves
The main application of the T(b) theorem is a short proof that the Cauchy integral operator on a Lipschitz curve is bounded on $L^2$. This boundedness, conjectured by Calderón, was first established by Coifman, McIntosh, and Meyer.
Let $A: \mathbb{R} \to \mathbb{R}$ be a Lipschitz function with $\|A'\|_\infty \le M < \infty$, and let $\Gamma = \{x + iA(x) : x \in \mathbb{R}\}$ be the corresponding Lipschitz curve in $\mathbb{C}$. The Cauchy integral operator along $\Gamma$ is
\begin{align*}
C_\Gamma f(z) = \text{p.v.}\int_\Gamma \frac{f(w)}{z - w}\,dw, \quad z \in \Gamma.
\end{align*}
Parametrising by $x$, this becomes the operator on $L^2(\mathbb{R})$:
\begin{align*}
C_A f(x) = \text{p.v.}\int_{-\infty}^\infty \frac{f(y)}{(x-y) + i(A(x)-A(y))}\,(1 + iA'(y))\,d\mathcal{L}^1(y).
\end{align*}
The kernel $K(x,y) = [(x-y) + i(A(x)-A(y))]^{-1}$ is a standard kernel. Since $A$ is real-valued, $|1 + is| \ge 1$ for $s = (A(x)-A(y))/(x-y)$, so
\begin{align*}
|K(x,y)| = \frac{1}{|x-y|} \cdot \frac{1}{|1 + i(A(x)-A(y))/(x-y)|} \le \frac{1}{|x-y|}.
\end{align*}
The Lipschitz condition enters in the smoothness estimate: at points where $A$ is differentiable, $\partial_x K(x,y) = -(1+iA'(x))\,K(x,y)^2$, hence $|\partial_x K(x,y)| \le (1+M^2)^{1/2}\,|x-y|^{-2}$, and similarly for $\partial_y K$.
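The kernel bounds can be spot-checked numerically. The following is a minimal sketch, not part of the proof: the sample curve $A(x) = M\sin x$, the value $M = 2$, and all function names are illustrative assumptions. It evaluates $|K(x,y)|\,|x-y|$ and $|\partial_x K(x,y)|\,|x-y|^2$ at random points, using the closed form $\partial_x K = -(1+iA'(x))K^2$, and reports the largest values found.

```python
import math
import random

M = 2.0  # hypothetical Lipschitz constant for the sample curve


def A(x):
    # Sample Lipschitz function (an assumption): |A'(x)| = M|cos x| <= M.
    return M * math.sin(x)


def A_prime(x):
    return M * math.cos(x)


def K(x, y):
    # Cauchy kernel of the graph of A, parametrised by the real variable.
    return 1.0 / complex(x - y, A(x) - A(y))


def dK_dx(x, y):
    # Closed form of the x-derivative: -(1 + i A'(x)) * K(x, y)^2.
    return -complex(1.0, A_prime(x)) * K(x, y) ** 2


def check_standard_estimates(trials=10_000, seed=0):
    # Returns the largest observed values of |K| |x-y| and |dK/dx| |x-y|^2.
    rng = random.Random(seed)
    worst_size, worst_grad = 0.0, 0.0
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)
        y = rng.uniform(-10.0, 10.0)
        d = abs(x - y)
        if d < 1e-6:
            continue  # stay away from the diagonal
        worst_size = max(worst_size, abs(K(x, y)) * d)
        worst_grad = max(worst_grad, abs(dK_dx(x, y)) * d * d)
    return worst_size, worst_grad


size_ratio, grad_ratio = check_standard_estimates()
print(size_ratio, grad_ratio, math.sqrt(1 + M * M))
```

The first ratio should not exceed $1$ and the second should not exceed $(1+M^2)^{1/2}$, in line with the size and smoothness estimates. A random search of this kind only illustrates the bounds; it does not establish them.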
Write $T_A$ for the operator with kernel $K$, so that $C_A f = T_A(bf)$ with $b(x) = 1 + iA'(x)$, the tangent multiplier to $\Gamma$. Verifying $T_A(1) \in \text{BMO}$ directly is difficult; the function $b$ is the better test function. Since $|A'| \le M$, the function $b$ is bounded, and $\operatorname{Re}(b(x)) = 1 > 0$, so $b$ is accretive.
[quotetheorem:3172]
[citeproof:3172]
[explanation: Why T(1) fails here and T(b) works]
The difficulty with applying T(1) directly is not that its hypotheses fail but that they cannot be checked. Writing $T_A$ for the operator with kernel $K$ (so that $C_A f = T_A(bf)$), the function $T_A(1)(x) = \text{p.v.}\int_{-\infty}^\infty (x-y+i(A(x)-A(y)))^{-1}\,dy$ admits no explicit evaluation for general $A$, and showing that it lies in BMO is essentially as hard as the $L^2$ bound itself. (In the flat case $A \equiv 0$ the operator is a multiple of the Hilbert transform, and $H(1) = 0$.) The accretive function $b$ repairs this by converting the integrand to $b(y)\,dy = dw$, the holomorphic differential along $\Gamma$. The form $dw/(z-w)$ is the exact differential of $-\log(z-w)$, so $T_A(b)$ can be evaluated by Cauchy's theorem and vanishes as an element of BMO. This is a genuinely complex-analytic identity, and it holds for every Lipschitz graph.
More broadly, the T(b) theorem applies whenever the natural "test function" for the operator is not a constant but a function reflecting the geometry of the problem — the tangential measure, an approximate identity adapted to a measure, or a weight intrinsic to the operator.
[/explanation]
[example: The first-order commutator]
As a preliminary case of the Coifman–McIntosh–Meyer theorem, consider the first-order commutator
\begin{align*}
C_1 f(x) = \text{p.v.}\int_{-\infty}^\infty \frac{A(x)-A(y)}{(x-y)^2}\,f(y)\,d\mathcal{L}^1(y).
\end{align*}
This is the first Calderón commutator. Its kernel $(A(x)-A(y))/(x-y)^2$ is antisymmetric, and the identity $(A(x)-A(y))/(x-y) = \int_0^1 A'(tx+(1-t)y)\,dt$ together with $A' \in L^\infty$ shows that it satisfies the standard estimates with constant $\lesssim \|A'\|_\infty$. Integrating by parts in $y$ gives $C_1(1) = \text{p.v.}\int A'(y)/(x-y)\,dy$ modulo constants, a constant multiple of the Hilbert transform of $A'$. This lies in BMO because $A' \in L^\infty$, although in general it is not bounded. By antisymmetry, $C_1^*(1) = -C_1(1)$ for real-valued $A$ and the weak boundedness property holds automatically, so the T(1) theorem already gives $\|C_1 f\|_2 \lesssim \|A'\|_\infty \|f\|_2$. When $\|A'\|_\infty$ is small, the Cauchy kernel expands as a geometric series in $(A(x)-A(y))/(x-y)$ whose terms are the commutators of all orders; the T(b) theorem applied directly handles every Lipschitz $A$ without a smallness assumption.
[/example]
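A formal sketch of the action of $C_1$ on the constant function, assuming the normalisation $Hf(x) = \frac{1}{\pi}\,\text{p.v.}\int f(y)/(x-y)\,dy$ for the Hilbert transform (an assumption, since the normalisation is not fixed in this section). Since $\partial_y (x-y)^{-1} = (x-y)^{-2}$, integration by parts over the truncated region gives
\begin{align*}
\int_{\varepsilon < |x-y| < R} \frac{A(x)-A(y)}{(x-y)^2}\,d\mathcal{L}^1(y) = \beta_{\varepsilon,R}(x) + \int_{\varepsilon < |x-y| < R} \frac{A'(y)}{x-y}\,d\mathcal{L}^1(y),
\end{align*}
where $\beta_{\varepsilon,R}(x)$ collects the boundary terms. The terms at $|x-y| = \varepsilon$ are two difference quotients of $A$ at $x$ with opposite signs, which cancel as $\varepsilon \to 0$ wherever $A$ is differentiable. The terms at $|x-y| = R$ equal $(A(x+R) + A(x-R) - 2A(x))/R$, which is bounded by $2M$ and changes by at most $4M|x-x'|/R$ between two points $x, x'$, so in the limit $R \to \infty$ it contributes only a constant. Modulo constants, therefore,
\begin{align*}
C_1(1) = \text{p.v.}\int_{-\infty}^\infty \frac{A'(y)}{x-y}\,d\mathcal{L}^1(y) = \pi\,H(A').
\end{align*}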
These classical theorems operate in the world of $L^p$ spaces, but harmonic analysis increasingly requires spaces capturing finer regularity distinctions. The real Hardy space $H^1$ introduces a space where cancellation properties, not just size, determine membership, revealing structures invisible in $L^p$ theory.
# 7. The Real Hardy Space $H^1$
The Lebesgue space $L^1(\mathbb{R}^n)$ is the natural home for integrable functions, but it fails as a domain for singular integrals: the Hilbert transform and Riesz transforms are not bounded on $L^1$, and Calderón–Zygmund operators in general are not either. The real Hardy space $H^1(\mathbb{R}^n)$ is the correct substitute, a proper subspace of $L^1$ on which the full Calderón–Zygmund theory applies. This chapter develops $H^1$ from the ground up: the maximal-function definition and its independence of the choice of approximation kernel, the equivalent characterisation via Riesz transforms, the atomic decomposition theorem of Coifman and Latter, and the fundamental boundedness result that every Calderón–Zygmund operator maps $H^1$ to $L^1$. The space $H^1$ also appears as the predual of $\mathrm{BMO}$ in Fefferman's duality theorem, treated in the next chapter.
## The Maximal-Function Definition
The difficulty in using $L^1$ as a domain for singular integrals is not just a matter of the kernel being too singular at zero; it is a structural failure. The principal-value integral defining $Hf$ does exist for $f \in L^1$, but the resulting function $Hf$ may not be integrable — it can only be guaranteed to lie in weak-$L^1$. To isolate the subspace of $L^1$ on which the Hilbert transform remains $L^1$, one needs a quantitative test that encodes the cancellation properties responsible for integrability. The key insight, due to Fefferman and Stein, is that this cancellation is already visible in the maximal averages of $f$ against a Schwartz kernel.
Fix a function $\varphi \in \mathcal{S}(\mathbb{R}^n)$ with $\int_{\mathbb{R}^n} \varphi \, d\mathcal{L}^n = 1$. For $t > 0$ define the dilate $\varphi_t(x) = t^{-n} \varphi(x/t)$, so that $\int \varphi_t \, d\mathcal{L}^n = 1$ for every $t$. The family $(\varphi_t)_{t > 0}$ is an approximate identity.
[definition: Smooth Maximal Function]
Let $\varphi \in \mathcal{S}(\mathbb{R}^n)$ with $\int \varphi \, d\mathcal{L}^n = 1$, and for $t > 0$ set $\varphi_t(x) = t^{-n}\varphi(x/t)$. For a tempered distribution $f \in \mathcal{S}'(\mathbb{R}^n)$, the **$\varphi$-maximal function** is
\begin{align*}
M_\varphi f(x) = \sup_{t > 0} |(\varphi_t * f)(x)|, \quad x \in \mathbb{R}^n.
\end{align*}
Here $\varphi_t * f$ denotes the convolution of the Schwartz function $\varphi_t$ with the tempered distribution $f$, which is a smooth function.
[/definition]
Concretely, $(\varphi_t * f)(x) = \langle f, \varphi_t(x - \cdot) \rangle$ is the pairing of the distribution $f$ with a translated and reflected copy of $\varphi_t$. It is well-defined for every $x$ because $\varphi_t(x - \cdot) \in \mathcal{S}(\mathbb{R}^n)$, and it depends smoothly on $x$. The maximal function $M_\varphi f$ then records, at each point, the largest absolute value attained by these smoothed approximations over all scales $t$.
[definition: Hardy Space $H^1$]
The **real Hardy space** $H^1(\mathbb{R}^n)$ is defined as
\begin{align*}
H^1(\mathbb{R}^n) = \{ f \in \mathcal{S}'(\mathbb{R}^n) : M_\varphi f \in L^1(\mathbb{R}^n) \},
\end{align*}
equipped with the norm
\begin{align*}
\|f\|_{H^1} = \|M_\varphi f\|_{L^1(\mathbb{R}^n)}.
\end{align*}
[/definition]
The immediate question is whether this definition depends on the choice of $\varphi$. If two Schwartz functions $\varphi$ and $\psi$, each integrating to $1$, give different classes, then $H^1$ would not be intrinsically defined. The independence theorem shows this does not happen.
[quotetheorem:3173]
[citeproof:3173]
The independence theorem is not just a convenience: it means $H^1(\mathbb{R}^n)$ is a genuinely intrinsic function space, not an artifact of a particular averaging procedure. The hypothesis that $\int \varphi = 1$ (or at least $\int \varphi \neq 0$) is essential. If $\int \varphi = 0$, then $M_\varphi f$ no longer controls $f$: for a constant $f$ one has $\varphi_t * f = 0$ for all $t$, so $M_\varphi f$ vanishes identically, and the resulting space would be strictly larger than $H^1$.
[remark: $H^1$ is a subspace of $L^1$]
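Every $f \in H^1(\mathbb{R}^n)$ lies in $L^1(\mathbb{R}^n)$. To see this, note that $|\varphi_t * f| \le M_\varphi f \in L^1$ for every $t$, so the family $(\varphi_t * f)_{t > 0}$ is dominated by a fixed integrable function, while $\varphi_t * f \to f$ in $\mathcal{S}'$ as $t \to 0$. A weak compactness argument then shows that $f$ is represented by an $L^1$ function, and the Lebesgue differentiation theorem for approximate identities gives $|f(x)| \leq M_\varphi f(x)$ almost everywhere. So $\|f\|_{L^1} \leq \|M_\varphi f\|_{L^1} = \|f\|_{H^1}$, and the inclusion $H^1(\mathbb{R}^n) \hookrightarrow L^1(\mathbb{R}^n)$ is a bounded embedding. The inclusion is proper: every $f \in H^1$ satisfies $\int f = 0$, so a non-negative $L^1$ function lies in $H^1$ if and only if it is zero.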
[/remark]
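The properness of the inclusion rests on the fact that an $L^1$ function with nonzero integral has a non-integrable maximal function. A sketch, taking for concreteness the Gaussian $\varphi(x) = \pi^{-n/2}e^{-|x|^2}$ (an illustrative choice, permitted by the independence theorem): for $f \in L^1$ with $c = \int f\, d\mathcal{L}^n$, evaluate the convolution at the scale $t = |x|$,
\begin{align*}
|x|^n\,(\varphi_{|x|} * f)(x) = \int \varphi\!\left(\frac{x}{|x|} - \frac{y}{|x|}\right) f(y)\, d\mathcal{L}^n(y) \longrightarrow \pi^{-n/2}e^{-1}\,c \quad \text{as } |x| \to \infty,
\end{align*}
by dominated convergence, since $\varphi$ is bounded, uniformly continuous, and equal to $\pi^{-n/2}e^{-1}$ on the unit sphere. Hence $M_\varphi f(x) \ge \tfrac{1}{2}\pi^{-n/2}e^{-1}\,|c|\,|x|^{-n}$ for all sufficiently large $|x|$, and $|x|^{-n}$ is not integrable at infinity. So $M_\varphi f \in L^1$ forces $c = 0$.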
## The Riesz Transform Characterisation
The maximal-function definition of $H^1$ is flexible and well-suited to proving mapping properties, but it gives little direct intuition for which $L^1$ functions belong to the space. A function in $L^1$ whose Riesz transforms are also in $L^1$ must satisfy cancellation conditions at every scale, and this turns out to be precisely what is needed.
[quotetheorem:3174]
[citeproof:3174]
The hypothesis that all $n$ Riesz transforms are in $L^1$ cannot in general be weakened to a proper subset. In dimension $n = 2$, for instance, the multiplier $-i\xi_1/|\xi|$ of $R_1$ vanishes on the line $\xi_1 = 0$, so the condition $R_1 f \in L^1$ gives no control over frequencies near that line and does not by itself force $R_2 f \in L^1$. In one dimension, the characterisation simplifies:
[quotetheorem:3175]
[citeproof:3175]
The one-dimensional result makes the failure of $L^1$ transparent. The function $f = \mathbb{1}_{[0,1]}$ has $Hf(x) = \frac{1}{\pi}\log|x/(x-1)|$. The logarithmic singularities at $x = 0$ and $x = 1$ are locally integrable; the obstruction is at infinity, where $Hf(x) \sim 1/(\pi x)$ is not integrable. Thus $Hf$ belongs to $L^{1,\infty}$ but not to $L^1$, and $f$ lies in $L^1 \setminus H^1$.
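The Hilbert transform of the indicator can be computed in closed form, which shows where integrability is lost. A short derivation, assuming the normalisation $Hf(x) = \frac{1}{\pi}\,\text{p.v.}\int f(y)/(x-y)\,dy$:
\begin{align*}
H\mathbb{1}_{[0,1]}(x) = \frac{1}{\pi}\,\text{p.v.}\int_0^1 \frac{dy}{x-y} = \frac{1}{\pi}\Big[-\log|x-y|\Big]_{y=0}^{y=1} = \frac{1}{\pi}\log\left|\frac{x}{x-1}\right|, \quad x \neq 0, 1.
\end{align*}
As $|x| \to \infty$, $\log|x/(x-1)| = -\log|1 - 1/x| = 1/x + O(1/x^2)$, so $|H\mathbb{1}_{[0,1]}(x)|$ behaves like $1/(\pi|x|)$ and $\int_{|x| > 2} |H\mathbb{1}_{[0,1]}|\,d\mathcal{L}^1 = \infty$. Near $x = 0$ and $x = 1$ the function has only logarithmic singularities, which are locally integrable.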
## Atoms and the Coifman–Latter Decomposition
The Riesz transform characterisation shows which functions belong to $H^1$, but it does not give a constructive picture of the space. The atomic decomposition fills this gap: it shows that every $H^1$ function is a convergent series of elementary building blocks, called atoms, each of which encodes the two essential features of $H^1$ membership — compact support and cancellation.
Atoms, defined next, are subject to a mean-zero condition $\int a \, d\mathcal{L}^n = 0$ and a size condition $\|a\|_{L^\infty} \leq |B|^{-1}$, and both matter. The mean-zero condition is the key cancellation property: a non-negative bump supported in a ball $B$ with $\|a\|_\infty \leq |B|^{-1}$ lies in $L^\infty \cap L^1$ but, unless it vanishes, is not in $H^1$, since non-negative functions in $H^1$ must be zero. The size condition is a normalisation: without it, a bounded mean-zero function supported in $B$ would still belong to $H^1$, but its $H^1$ norm could be arbitrarily large, and the uniform bound on atoms would be lost.
[definition: $H^1$-Atom]
A measurable function $a : \mathbb{R}^n \to \mathbb{C}$ is an **$H^1$-atom** if there exists a ball $B \subset \mathbb{R}^n$ such that:
1. $\operatorname{supp}(a) \subset B$,
2. $\int_{\mathbb{R}^n} a(x) \, d\mathcal{L}^n(x) = 0$,
3. $\|a\|_{L^\infty(\mathbb{R}^n)} \leq |B|^{-1}$,
where $|B| = \mathcal{L}^n(B)$ denotes the Lebesgue measure of the ball.
[/definition]
Every atom has a uniform $H^1$ norm bound, making atoms natural building blocks.
[quotetheorem:3176]
[citeproof:3176]
The two-region splitting in this proof is prototypical: it appears again in the proof that Calderón–Zygmund operators map $H^1$ to $L^1$. The cancellation condition eliminates the leading term in a Taylor expansion, and the size condition controls the error.
The converse direction — that every $H^1$ function decomposes into atoms — is the deep result.
[quotetheorem:3177]
[citeproof:3177]
The hypothesis that $f \in H^1$ (rather than just $L^1$) is genuinely used in the forward direction to control the number and sizes of the cubes arising in the Calderón–Zygmund decomposition at each level and to ensure the series converges. For a function in $L^1 \setminus H^1$, the decomposition still produces pieces with mean zero on cubes, but the coefficient sum $\sum_j |\lambda_j|$ would diverge. The example $f = \mathbb{1}_{[0,1]} - \mathbb{1}_{[-1,0]}$ lies in $H^1(\mathbb{R})$: it is supported in $[-1,1]$, has mean zero, and satisfies $\|f\|_\infty = 1$, so $f/2$ is an atom. By contrast, $g = \mathbb{1}_{[0,1]}$ does not lie in $H^1(\mathbb{R})$, since its integral is nonzero.
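The contrast between $\mathbb{1}_{[0,1]}$ and $\mathbb{1}_{[0,1]} - \mathbb{1}_{[-1,0]}$ can be illustrated numerically through their Hilbert transforms. The sketch below assumes the normalisation $Hf(x) = \frac{1}{\pi}\,\text{p.v.}\int f(y)/(x-y)\,dy$ and the resulting closed forms $\frac{1}{\pi}\log|x/(x-1)|$ and $\frac{1}{\pi}\log(x^2/|x^2-1|)$; the function names are hypothetical. It measures how much $L^1$ mass each transform carries on $[10^2, 10^4]$.

```python
import math


def hilbert_indicator(x):
    # H(1_[0,1])(x) = (1/pi) * log|x / (x - 1)|, for x outside {0, 1}.
    return math.log(abs(x / (x - 1.0))) / math.pi


def hilbert_difference(x):
    # H(1_[0,1] - 1_[-1,0])(x) = (1/pi) * log(x^2 / |x^2 - 1|).
    return math.log(x * x / abs(x * x - 1.0)) / math.pi


def tail_l1(h, a, b, n=20_000):
    # Midpoint rule for the integral of |h| over [a, b] with a > 1,
    # computed in the variable s = log x (so that dx = x ds).
    lo, hi = math.log(a), math.log(b)
    ds = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = math.exp(lo + (i + 0.5) * ds)
        total += abs(h(x)) * x * ds
    return total


growth_indicator = tail_l1(hilbert_indicator, 1e2, 1e4)
growth_difference = tail_l1(hilbert_difference, 1e2, 1e4)
print(growth_indicator, growth_difference)
```

The first quantity is close to $\frac{1}{\pi}\log 100$, reflecting the $1/(\pi x)$ tail that keeps $H\mathbb{1}_{[0,1]}$ out of $L^1$; the second is of order $10^{-3}$, reflecting the integrable $1/(\pi x^2)$ tail produced by the cancellation in the mean-zero function.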
The atomic decomposition is the central tool of the subject. It reduces questions about $H^1$ functions to questions about atoms, which are elementary objects with three explicit properties. The next section demonstrates this reduction for Calderón–Zygmund operators.
## Calderón–Zygmund Operators on $H^1$
The main theorem of this chapter explains why $H^1$ is the right substitute for $L^1$: the Calderón–Zygmund operators, which fail to be bounded on $L^1$, are bounded from $H^1$ to $L^1$. The proof reduces everything to atoms via the decomposition theorem.
[quotetheorem:3178]
[citeproof:3178]
The hypothesis that $T$ is $L^2$-bounded is used on $2B$ to control the operator without cancellation. The Hörmander condition is used on $(2B)^c$, where the cancellation of the atom compensates for the slow decay of the kernel. If the Hörmander condition fails, as for an operator whose kernel does not satisfy the smoothness hypothesis, then the argument on $(2B)^c$ breaks down and the theorem can fail. Note that the cancellation exploited here belongs to the atom, not to the operator: the $H^1 \to L^1$ bound requires no condition on $T(1)$.
When a Calderón–Zygmund operator satisfies the additional cancellation condition $T(1) = 0$ — meaning that $T$ applied to the constant function $1$ (interpreted in the distributional or $\mathrm{BMO}$ sense) vanishes — then $T$ preserves the Hardy space.
[quotetheorem:3179]
[citeproof:3179]
The cancellation condition is not just a technical hypothesis: it is what distinguishes operators that map $H^1$ to $H^1$ from those that merely map $H^1$ to $L^1$. Every function in $H^1$ has integral zero, so $Tf$ can lie in $H^1$ only if $\int Tf = 0$. Formally $\int Tf = \langle f, T^*(1) \rangle$, so the vanishing that is actually used is that of $T^*(1)$; for antisymmetric kernels such as those of the Hilbert and Riesz transforms, $T^* = -T$ and this is the same as $T(1) = 0$. The Hilbert transform satisfies $H(1) = 0$ in the principal-value sense, which is why it maps $H^1(\mathbb{R})$ to itself. The identity operator is a degenerate example: its kernel vanishes off the diagonal, $\mathrm{Id}(1) = 1$ is zero as an element of $\mathrm{BMO}$ (where constants are identified with zero), and $\mathrm{Id} : H^1 \to H^1$ holds trivially. The interesting content is for operators with non-trivial kernels.
[example: Riesz Transforms Map $H^1$ to $H^1$]
The Riesz transforms $R_j : L^2(\mathbb{R}^n) \to L^2(\mathbb{R}^n)$, defined by the multiplier $\widehat{R_j f}(\xi) = -i\xi_j / |\xi| \cdot \hat{f}(\xi)$, are Calderón–Zygmund operators. The kernel of $R_j$ is $K_j(x,y) = c_n (x_j - y_j)|x-y|^{-n-1}$, which is odd in $x - y$. This oddness means $\int_{|x-y| = r} K_j(x,y) \, d\mathcal{H}^{n-1}(y) = 0$ for every sphere, so in particular the row-integral condition $T(1) = 0$ holds. By the preceding theorem, $R_j : H^1(\mathbb{R}^n) \to H^1(\mathbb{R}^n)$ boundedly.
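This is consistent with the Fefferman–Stein characterisation. If $f \in H^1$, then $R_j f \in L^1$. Moreover, each composition $R_k R_j$ has the multiplier $-\xi_k\xi_j/|\xi|^2$, which is smooth away from the origin and homogeneous of degree zero, so $R_k R_j$ is a Calderón–Zygmund operator (plus a multiple of the identity when $k = j$) and maps $H^1$ to $L^1$. Hence $R_k(R_j f) \in L^1$ for all $k$, and the Riesz transform characterisation theorem gives $R_j f \in H^1$ with $\|R_j f\|_{H^1} \asymp \|R_j f\|_{L^1} + \sum_k \|R_k R_j f\|_{L^1} \lesssim \|f\|_{H^1}$, which is the $H^1 \to H^1$ bound.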
[/example]
[remark: Comparison with $L^1$]
The contrast between $L^1$ and $H^1$ as domains for Calderón–Zygmund operators can now be summarised precisely. Every Calderón–Zygmund operator $T$ maps $L^1 \to L^{1,\infty}$ (weak type $(1,1)$, proved via the Calderón–Zygmund decomposition in Chapter 5). However, $T$ need not map $L^1 \to L^1$: the Hilbert transform of $\mathbb{1}_{[0,1]}$ decays only like $1/(\pi x)$ at infinity and is therefore not in $L^1(\mathbb{R})$. By contrast, if $f \in H^1 \subset L^1$, then $Tf \in L^1$: the extra cancellation encoded in $H^1$ membership is precisely enough to restore integrability.
[/remark]
The atomic decomposition and the $H^1 \to L^1$ theorem together give a complete picture of how $H^1$ sits inside $L^1$ and why singular integrals respect this finer structure. The next chapter reveals the dual side of this story: $\mathrm{BMO}$ is the dual space of $H^1$, and the pairing is defined by the same cancellation conditions that appear in the atomic decomposition.
The duality between $H^1$ and bounded mean oscillation spaces provides the natural completion of Hardy space theory. BMO and its relationship to $L^p$ through Fefferman duality illuminate the complementary role of spaces with unbounded $L^\infty$ norm but controlled oscillation.
# 8. BMO and Fefferman Duality
The previous chapter showed that Calderón–Zygmund operators are bounded from the Hardy space $H^1$ to $L^1$, filling the gap left by $L^1$, on which these operators fail to be bounded. A natural question immediately follows: what is the dual of $H^1$? The answer, due to Fefferman, is the space $\mathrm{BMO}$ of functions of bounded mean oscillation, an analytic object that had appeared independently in work of John and Nirenberg on functions with controlled local fluctuations. This chapter develops BMO from scratch, proves the John–Nirenberg exponential distribution inequality, introduces the Fefferman–Stein sharp maximal function, characterises BMO through Carleson measures, and culminates in the duality theorem $(H^1)^* = \mathrm{BMO}$.
## The Mean Oscillation Seminorm
The starting difficulty is this: if one wants a space that serves as the dual of $H^1$ and contains $L^\infty$, neither $L^\infty$ nor $L^1$ will do. Calderón–Zygmund operators fail to map $L^\infty$ to itself — the simplest instance is the Hilbert transform applied to a bounded function, which can log-diverge near the boundary of the support. The remedy is to measure not the pointwise size of a function, but how much it oscillates around its local averages over cubes.
For a cube $Q \subset \mathbb{R}^n$ with side length $\ell(Q)$ and a function $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, write the average of $f$ over $Q$ as
\begin{align*}
f_Q = \frac{1}{|Q|} \int_Q f(x)\, d\mathcal{L}^n(x).
\end{align*}
The mean oscillation of $f$ over $Q$ is $\frac{1}{|Q|}\int_Q |f(x) - f_Q|\, d\mathcal{L}^n(x)$, which measures the average deviation of $f$ from its own mean. Taking the supremum over all cubes produces the BMO seminorm.
[definition: BMO Space]
Let $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$. Define the **BMO seminorm** of $f$ by
\begin{align*}
\|f\|_{\mathrm{BMO}} = \sup_Q \frac{1}{|Q|} \int_Q |f(x) - f_Q|\, d\mathcal{L}^n(x),
\end{align*}
where the supremum is taken over all cubes $Q \subset \mathbb{R}^n$ with sides parallel to the coordinate axes. The **space of bounded mean oscillation** is
\begin{align*}
\mathrm{BMO}(\mathbb{R}^n) = \{ f \in L^1_{\mathrm{loc}}(\mathbb{R}^n) : \|f\|_{\mathrm{BMO}} < \infty \}.
\end{align*}
Since $\|f\|_{\mathrm{BMO}} = 0$ if and only if $f$ is a.e. constant, $\|\cdot\|_{\mathrm{BMO}}$ is only a seminorm. The quotient $\mathrm{BMO}(\mathbb{R}^n)/\{\text{constants}\}$, equipped with the induced norm, is a Banach space.
[/definition]
The key structural point is that $\mathrm{BMO}$ strictly contains $L^\infty$, which in turn distinguishes it from all $L^p$ spaces.
[example: $L^\infty$ embeds strictly into BMO]
If $f \in L^\infty(\mathbb{R}^n)$, then for every cube $Q$,
\begin{align*}
\frac{1}{|Q|}\int_Q |f - f_Q|\, d\mathcal{L}^n \leq \frac{1}{|Q|}\int_Q (|f| + |f_Q|)\, d\mathcal{L}^n \leq 2\|f\|_\infty,
\end{align*}
since $|f - f_Q| \leq |f| + |f_Q| \leq 2\|f\|_\infty$. Hence $\|f\|_{\mathrm{BMO}} \leq 2\|f\|_\infty$, so $L^\infty \hookrightarrow \mathrm{BMO}$.
For strict containment, consider $f(x) = \log|x|$ on $\mathbb{R}^n$. This function is locally integrable (the singularity at $0$ is integrable in any dimension since $\int_{B(0,1)} |\log|x||\, d\mathcal{L}^n < \infty$) and unbounded, so $f \notin L^\infty(\mathbb{R}^n)$. To check $f \in \mathrm{BMO}$, fix a cube $Q$. By a scaling and translation argument, the mean oscillation of $\log|x|$ over any cube is bounded by a universal constant. The key estimate: for a cube $Q$ of side $\ell$ centered at $x_0$, the oscillation satisfies $\frac{1}{|Q|}\int_Q |\log|x| - (\log|x|)_Q|\, d\mathcal{L}^n \leq C_n$ uniformly in $Q$, which one verifies by splitting according to whether $|x_0|$ is large or small relative to $\ell$.
On the other hand, membership in BMO depends on more than the size of $|f|$. On $\mathbb{R}$, the odd function $g(x) = \operatorname{sgn}(x)\log|x|$ has the same absolute value as $\log|x|$ but is not in BMO. To see this, consider the intervals $I_\varepsilon = [-\varepsilon, \varepsilon]$ for $0 < \varepsilon < 1$. By oddness $g_{I_\varepsilon} = 0$, so the mean oscillation over $I_\varepsilon$ is $\frac{1}{2\varepsilon}\int_{-\varepsilon}^{\varepsilon} |\log|x||\, d\mathcal{L}^1(x) = 1 + |\log\varepsilon|$, which is unbounded as $\varepsilon \to 0$. Hence $\|g\|_{\mathrm{BMO}} = \infty$: the jump of $g$ across the origin destroys the cancellation that keeps $\log|x|$ in BMO. In particular, BMO is not stable under multiplication by bounded functions such as $\operatorname{sgn}$.
[/example]
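The scale invariance of the mean oscillation of $\log|x|$ can be observed numerically in one dimension. The sketch below is illustrative only: the helper names and the midpoint-rule discretisation are assumptions, not part of the theory. It also evaluates the odd function $\operatorname{sgn}(x)\log|x|$, a standard example outside BMO, on the same intervals $[-\varepsilon, \varepsilon]$.

```python
import math


def log_abs(x):
    return math.log(abs(x))


def sgn_log_abs(x):
    # The odd function sgn(x) * log|x|.
    return math.copysign(1.0, x) * math.log(abs(x))


def mean_oscillation(f, a, b, n=100_000):
    # Midpoint-rule approximation of (1/|I|) * integral over I = [a, b]
    # of |f - f_I|. With n even and a = -b, no sample point lands on x = 0.
    h = (b - a) / n
    values = [f(a + (i + 0.5) * h) for i in range(n)]
    mean = sum(values) / n
    return sum(abs(v - mean) for v in values) / n


for eps in (1.0, 1e-3, 1e-6):
    print(eps,
          mean_oscillation(log_abs, -eps, eps),
          mean_oscillation(sgn_log_abs, -eps, eps))
```

For $\log|x|$ the value stays near $2/e$ at every scale, because rescaling the interval only shifts $\log|x|$ by a constant. For $\operatorname{sgn}(x)\log|x|$ the value grows like $1 + |\log\varepsilon|$, so no uniform bound over intervals is possible.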
## The John–Nirenberg Inequality
The definition of BMO controls the $L^1$ average of $|f - f_Q|$ over cubes. A priori it says nothing about large deviations: a function could have most of its mass near the average $f_Q$ but occasionally spike to enormous values, while keeping the $L^1$ mean oscillation small. The John–Nirenberg inequality shows this cannot happen: the distribution of $|f - f_Q|$ is controlled by an exponential, so BMO functions have exponentially decaying tails inside every cube.
[quotetheorem:3180]
[citeproof:3180]
The hypothesis $\|f\|_{\mathrm{BMO}} < \infty$ is genuinely necessary: without a uniform bound on the mean oscillation over all cubes, the exponential estimate can fail. For example, $f(x) = (\log|x|)^2$ on $\mathbb{R}$ is not in BMO, and on $Q = [-1,1]$ the level sets $\{x \in Q : |f(x) - f_Q| > \lambda\}$ have measure comparable to $e^{-\sqrt{\lambda}}$ for large $\lambda$, which decays more slowly than any exponential $e^{-c\lambda}$.
The John–Nirenberg inequality has an immediate and important corollary: all $L^p$ averages of the local oscillation are comparable.
[quotetheorem:3181]
[citeproof:3181]
The bound $\|f\|_{\mathrm{BMO}} \leq \|f\|_{\mathrm{BMO}_p}$ follows from Hölder's inequality. The reverse bound follows from the John–Nirenberg exponential: since $\mathcal{L}^n(\{|f - f_Q| > \lambda\} \cap Q) \leq c_1 |Q| e^{-c_2\lambda/\|f\|_{\mathrm{BMO}}}$, the layer-cake formula gives
\begin{align*}
\frac{1}{|Q|}\int_Q |f - f_Q|^p\, d\mathcal{L}^n = p \int_0^\infty \lambda^{p-1} \frac{\mathcal{L}^n(\{|f - f_Q| > \lambda\} \cap Q)}{|Q|}\, d\lambda \leq p c_1 \int_0^\infty \lambda^{p-1} e^{-c_2 \lambda / \|f\|_{\mathrm{BMO}}}\, d\lambda,
\end{align*}
and the last integral evaluates to $C_{p,n} \|f\|_{\mathrm{BMO}}^p$ by the change of variables $t = c_2 \lambda/\|f\|_{\mathrm{BMO}}$.
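The constant in this step can be written out explicitly. Writing $B = \|f\|_{\mathrm{BMO}}$ and substituting $t = c_2\lambda/B$, so that $\lambda = Bt/c_2$ and $d\lambda = (B/c_2)\,dt$,
\begin{align*}
p c_1 \int_0^\infty \lambda^{p-1} e^{-c_2\lambda/B}\, d\lambda = p c_1 \left(\frac{B}{c_2}\right)^{p} \int_0^\infty t^{p-1} e^{-t}\, dt = c_1\,\Gamma(p+1)\, c_2^{-p}\, B^p.
\end{align*}
Assuming $\|f\|_{\mathrm{BMO}_p}$ denotes the supremum over cubes of $\left(\frac{1}{|Q|}\int_Q |f - f_Q|^p\right)^{1/p}$ (the notation is not spelled out in this section), taking $p$-th roots gives $\|f\|_{\mathrm{BMO}_p} \le (c_1\Gamma(p+1))^{1/p}\, c_2^{-1}\, \|f\|_{\mathrm{BMO}}$. Since $\Gamma(p+1)^{1/p}$ grows linearly in $p$, so does the constant.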
This equivalence is fundamental: it shows that the $\mathrm{BMO}$ condition is stable under changing the exponent in the definition, which will be crucial when verifying BMO membership of specific operators.
## The Sharp Maximal Function
The Hardy–Littlewood maximal function $Mf$ controls $|f|$ by its local $L^1$ averages. For BMO, the natural analogue controls local oscillation rather than local size.
[definition: Sharp Maximal Function]
For $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, the **Fefferman–Stein sharp maximal function** is the map $f^\sharp: \mathbb{R}^n \to [0, \infty]$ defined by
\begin{align*}
f^\sharp(x) = \sup_{Q \ni x} \frac{1}{|Q|} \int_Q |f(y) - f_Q|\, d\mathcal{L}^n(y),
\end{align*}
where the supremum is over all cubes $Q$ containing $x$ with sides parallel to the axes.
[/definition]
The relationship between $f^\sharp$ and the BMO norm is immediate: $\|f\|_{\mathrm{BMO}} = \|f^\sharp\|_{L^\infty}$, since $\|f^\sharp\|_\infty = \sup_x f^\sharp(x) = \sup_Q \frac{1}{|Q|}\int_Q |f - f_Q| = \|f\|_{\mathrm{BMO}}$. So BMO is exactly the space of locally integrable functions for which the sharp maximal function is bounded.
The sharp maximal function is more useful than the norm alone because it admits a norm inequality connecting the $L^p$ norm of $f$ with that of $f^\sharp$.
[quotetheorem:3182]
[citeproof:3182]
The hypothesis that $f \in L^{p_0}$ for some $p_0$ is necessary to rule out non-trivial constants, which have $f^\sharp = 0$ but are not zero in $L^p$. The inequality fails at $p = 1$: there exist functions with small $\|f^\sharp\|_{L^1}$ but large $\|f\|_{L^1}$ — in fact, any nonzero $L^1$ function whose oscillations are concentrated near a point provides a counterexample.
The Fefferman–Stein inequality is the technical backbone for proving that operators whose sharp maximal function is controlled by the Hardy–Littlewood maximal function of $f$ are bounded on $L^p$.
[example: CZ Operators Map $L^\infty$ to BMO]
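Let $T$ be a Calderón–Zygmund operator and $f \in L^\infty(\mathbb{R}^n)$, with $Tf$ understood modulo constants. Fix a cube $Q$ with centre $x_Q$, let $Q^* = 2\sqrt{n}\,Q$ be its concentric dilate, and split $f = f\mathbb{1}_{Q^*} + f\mathbb{1}_{(Q^*)^c}$. The local part is controlled by the $L^2$ bound and Cauchy–Schwarz, the far part by the kernel regularity in the first variable:
\begin{align*}
\frac{1}{|Q|}\int_Q |T(f\mathbb{1}_{Q^*})|\, d\mathcal{L}^n &\le |Q|^{-1/2}\,\|T(f\mathbb{1}_{Q^*})\|_2 \lesssim |Q|^{-1/2}\,\|f\mathbb{1}_{Q^*}\|_2 \lesssim \|f\|_\infty, \\
|T(f\mathbb{1}_{(Q^*)^c})(x) - T(f\mathbb{1}_{(Q^*)^c})(x_Q)| &\le \int_{(Q^*)^c} |K(x,y) - K(x_Q,y)|\,|f(y)|\, d\mathcal{L}^n(y) \lesssim \|f\|_\infty \quad (x \in Q).
\end{align*}
Taking $c_Q = T(f\mathbb{1}_{(Q^*)^c})(x_Q)$ as the comparison constant gives $\frac{1}{|Q|}\int_Q |Tf - c_Q|\, d\mathcal{L}^n \lesssim \|f\|_\infty$ uniformly in $Q$, hence $(Tf)^\sharp \lesssim \|f\|_\infty$ pointwise and $\|Tf\|_{\mathrm{BMO}} \lesssim \|f\|_\infty$. The far-part estimate is where the Hörmander condition enters: for $x \in Q$ and $y \notin Q^*$ one has $|x - x_Q| \le \tfrac{1}{2}|x_Q - y|$, so $|K(x,y) - K(x_Q,y)| \lesssim \ell(Q)\,|x_Q - y|^{-n-1}$, which integrates to a constant over $(Q^*)^c$.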
[/example]
## Carleson Measures and the Poisson Characterisation
BMO has a natural characterisation in terms of the upper half-space $\mathbb{R}^{n+1}_+ = \{(x, t) : x \in \mathbb{R}^n, t > 0\}$ via the Poisson extension. This characterisation, due to Fefferman and Stein, connects BMO to a geometric condition on measures in the upper half-space — the Carleson measure condition — and is the key to the duality theorem.
[definition: Carleson Measure]
A positive Borel measure $\mu$ on $\mathbb{R}^{n+1}_+$ is a **Carleson measure** if there exists a constant $C_\mu < \infty$ such that for every cube $Q \subset \mathbb{R}^n$,
\begin{align*}
\mu(T(Q)) \leq C_\mu |Q|,
\end{align*}
where $T(Q) = Q \times (0, \ell(Q)]$ is the **tent** over $Q$ (the region in the upper half-space directly above $Q$ with height equal to the side length of $Q$). The smallest such $C_\mu$ is called the **Carleson norm** of $\mu$.
[/definition]
<!-- illustration-needed: The tent T(Q) over a cube Q in the upper half-space — show Q on the boundary x-axis and the region Q × (0, l(Q)] above it forming a tent shape -->
The tent condition $\mu(T(Q)) \leq C_\mu |Q|$ says that the measure $\mu$ does not concentrate mass above any cube more than proportionally to the cube's volume. Without the tent condition, a measure could pile all its mass above a single point, which would prevent the Poisson integral from being bounded.
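Two model measures illustrate the definition; both are illustrative choices introduced here and do not appear in the results quoted in this chapter. Take first $d\mu_1 = \mathbb{1}_{\{0 < t \le 1\}}\, dx\, dt$ and then $d\mu_2 = t^{-1}\, dx\, dt$. For a cube $Q$ with side length $\ell(Q)$,
\begin{align*}
\mu_1(T(Q)) &= |Q| \cdot \min\{\ell(Q), 1\} \le |Q|, \\
\mu_2(T(Q)) &= |Q| \int_0^{\ell(Q)} \frac{dt}{t} = \infty.
\end{align*}
So $\mu_1$ is a Carleson measure with Carleson norm $1$, while $\mu_2$ is not, even though $\mu_2$ scales by the natural factor $\lambda^n$ under the dilations $(x, t) \mapsto (\lambda x, \lambda t)$: the logarithmic divergence at $t = 0$ is exactly the kind of concentration near the boundary that the tent condition excludes.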
For $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, let $u(x, t) = P_t * f(x)$ be the Poisson extension of $f$ to $\mathbb{R}^{n+1}_+$, where $P_t(x) = c_n t(|x|^2 + t^2)^{-(n+1)/2}$ is the Poisson kernel.
[quotetheorem:3183]
[citeproof:3183]
The measure $|\nabla u|^2 t\, dx\, dt$ records how much the harmonic extension oscillates at each scale $t$ and location $x$. The factor $t$ is a natural weight: it weights the gradient at height $t$ proportionally to the distance from the boundary, so oscillations at small scales (close to the boundary) count with small weight, and the condition becomes a scale-invariant bound on total oscillation.
The proof uses the square function identity relating the $L^2$ norm of $\nabla u$ weighted by $t$ over the tent $T(Q)$ to the local oscillation of $f$ over $Q$, combined with the John–Nirenberg inequality. The direction $\mathrm{BMO} \Rightarrow$ Carleson follows from estimating $\int_{T(Q)} |\nabla u|^2 t$ by writing $f = f_Q + (f - f_Q)$ and using Plancherel for the harmonic extension of each part. The converse uses the fact that Carleson measures are precisely those for which the area integral is bounded on $L^2$.
## Fefferman Duality: $(H^1)^* = \mathrm{BMO}$
The entire structure built in this chapter converges to one theorem. The Hardy space $H^1(\mathbb{R}^n)$ was introduced as the substitute for $L^1$ on which Calderón–Zygmund operators are bounded. Its dual must be identified.
The pairing is formal: for $f \in H^1$ and $g \in \mathrm{BMO}$, one wants to make sense of $\Lambda_g(f) = \int_{\mathbb{R}^n} f(x) g(x)\, d\mathcal{L}^n(x)$. For general $H^1$ functions this integral need not converge absolutely: $f$ is only known to be integrable, while $g$ may be unbounded (for instance $g(x) = \log|x|$ belongs to $\mathrm{BMO}$). The atomic decomposition from the previous chapter is the tool that resolves this.
[quotetheorem:3184]
[citeproof:3184]
The Fefferman duality theorem is one of the central structural results of modern harmonic analysis. It places $H^1$ and $\mathrm{BMO}$ in the same role relative to each other as $L^1$ and $L^\infty$, but adapted to the cancellation structure of singular integrals.
[remark: The $L^1$--$L^\infty$ Analogy]
The pairing $(H^1, \mathrm{BMO})$ mirrors the $(L^1, L^\infty)$ duality, but with a crucial difference. The $L^1$--$L^\infty$ pairing holds for all bounded functions and all integrable functions. The $H^1$--$\mathrm{BMO}$ pairing requires the additional cancellation $\int a = 0$ for the $H^1$ side, which is exactly the condition that makes atoms well-adapted to BMO functions (since only the oscillation of the BMO function, not its mean, contributes to the pairing). This cancellation is what singular integrals possess and what makes the theory work.
[/remark]
[quotetheorem:3185]
[citeproof:3185]
This result completes the endpoint picture for Calderón–Zygmund operators: they are bounded on $L^p$ for $1 < p < \infty$ (Chapter 5), map $L^1$ to weak $L^1$, that is $L^{1,\infty}$, map $H^1$ to $L^1$, and map $L^\infty$ to $\mathrm{BMO}$. By duality and interpolation, this gives a coherent endpoint theory in which $H^1$ and $\mathrm{BMO}$ fill the roles that $L^1$ and $L^\infty$ cannot.
The corollary that CZ operators map $L^\infty$ to BMO, combined with their $L^2$ boundedness, recovers via interpolation between $L^2$ and $\mathrm{BMO}$ their boundedness on $L^p$ for $2 \le p < \infty$; the range $1 < p \le 2$ then follows by duality. The space BMO is not merely a byproduct of duality theory: it is the natural home of oscillatory functions that arise throughout analysis, from solutions of elliptic PDE near their singularities to the output of singular integral operators applied to the worst-behaved bounded inputs.
While Fourier analysis via individual multipliers has limitations, frequency-localized decompositions offer more flexible tools. The Littlewood-Paley decomposition partitions functions by frequency, revealing how local behavior in frequency space translates to global boundedness and regularity properties.
# 9. The Littlewood-Paley Decomposition
The preceding chapters developed the Calderón–Zygmund machinery for controlling singular integrals and established $H^1$–BMO duality as the correct endpoint theory. Both tools, however, operate in physical space: they detect cancellation and oscillation through averages over balls and cubes. The Littlewood–Paley decomposition brings frequency space into the picture in a systematic way. The guiding question is: given a function $f \in L^p(\mathbb{R}^n)$, can we analyze its $L^p$ norm by looking at each dyadic frequency band $\{2^j \le |\xi| \le 2^{j+1}\}$ separately? The answer is yes, and the mechanism is the square function $S(f)$, whose $L^p$ norm is equivalent to $\|f\|_p$ for $1 < p < \infty$. This equivalence converts $L^p$ estimates into vector-valued $L^2$ estimates across frequency scales, and it underlies the multiplier theorems of Chapter 10 and the definition of Besov and Triebel–Lizorkin spaces in Chapter 11.
## Dyadic Frequency Projections
The starting point is a clean way to isolate the portion of a tempered distribution $f \in \mathcal{S}'(\mathbb{R}^n)$ that lives at frequencies of size $\approx 2^j$.
[motivation]
### Why not just use characteristic functions in frequency space?
The most naive approach would be to define $\Delta_j f$ by setting $\widehat{\Delta_j f}(\xi) = \hat{f}(\xi) \cdot \mathbb{1}_{\{2^{j-1} \le |\xi| \le 2^{j+1}\}}(\xi)$. The difficulty is that multiplication by $\mathbb{1}_{\{2^{j-1} \le |\xi| \le 2^{j+1}\}}$ is a Fourier multiplier whose symbol is not smooth: it has jump discontinuities at $|\xi| = 2^{j-1}$ and $|\xi| = 2^{j+1}$. Such a symbol violates the derivative hypotheses of the Mihlin multiplier theorem (to be proved in Chapter 10), so that theorem gives no information, and in dimension $n \ge 2$ the sharp cutoff is in fact unbounded on $L^p$ for every $p \ne 2$ (this is the content of Fefferman's ball multiplier theorem). The operator on physical space would be convolution with an oscillating kernel that decays only like $|x|^{-(n+1)/2}$, which is not in $L^1$ and does not satisfy the Hörmander smoothness condition needed for Calderón–Zygmund theory.
The remedy is to replace these sharp cutoffs with smooth ones. We construct a smooth radial bump $\hat{\psi}(\xi)$ with $0 \le \hat{\psi} \le 1$ that is supported in the annulus $\{1/2 \le |\xi| \le 2\}$, with the essential property that the dilates $\hat{\psi}(2^{-j}\xi)$ sum to $1$ for all $\xi \ne 0$. The convolution kernel $\psi \in \mathcal{S}(\mathbb{R}^n)$ is then in the Schwartz class, giving rapid decay and making the operator well-behaved on every $L^p$.
[/motivation]
With this motivation in place, we give the precise construction.
[definition: Littlewood-Paley Cutoff Functions]
Fix a radial function $\hat{\varphi} \in C_c^\infty(\mathbb{R}^n)$, non-increasing in $|\xi|$, satisfying $\hat{\varphi}(\xi) = 1$ for $|\xi| \le 1/2$ and $\hat{\varphi}(\xi) = 0$ for $|\xi| \ge 1$, with $0 \le \hat{\varphi} \le 1$ throughout. Define
\begin{align*}
\hat{\psi}(\xi) := \hat{\varphi}(\xi/2) - \hat{\varphi}(\xi),
\end{align*}
so that $\hat{\psi} \in C_c^\infty(\mathbb{R}^n)$ with $\operatorname{supp}(\hat{\psi}) \subset \{1/2 \le |\xi| \le 2\}$ and $\hat{\psi} \ge 0$. For each $j \in \mathbb{Z}$ define the dilated multiplier
\begin{align*}
\hat{\psi}_j(\xi) := \hat{\psi}(2^{-j}\xi),
\end{align*}
whose support lies in the dyadic annulus $\{2^{j-1} \le |\xi| \le 2^{j+1}\}$. The corresponding convolution kernel is $\psi_j(x) = 2^{jn}\psi(2^j x)$ where $\psi = \mathcal{F}^{-1}(\hat{\psi}) \in \mathcal{S}(\mathbb{R}^n)$. The **Littlewood–Paley projection** at frequency scale $j$ is the operator
\begin{align*}
\Delta_j f := \psi_j * f,
\end{align*}
defined initially for $f \in \mathcal{S}(\mathbb{R}^n)$ and extended by duality to $f \in \mathcal{S}'(\mathbb{R}^n)$.
The **low-frequency projection** at scale $j$ is $S_j f := \varphi_j * f$ where $\hat{\varphi}_j(\xi) = \hat{\varphi}(2^{-j}\xi)$, so $\widehat{S_j f}(\xi) = \hat{\varphi}(2^{-j}\xi)\hat{f}(\xi)$.
[/definition]
The construction ensures that $\Delta_j = S_{j+1} - S_j$: the $j$-th piece is the difference between consecutive low-frequency approximations. The key structural property is the telescoping identity.
[quotetheorem:3186]
[citeproof:3186]
[remark: Low-Frequency Residual]
The homogeneous sum $\sum_{j \in \mathbb{Z}} \Delta_j f$ carries no information at the single frequency $\xi = 0$: every $\hat{\psi}_j$ vanishes near the origin, so polynomials (whose Fourier transforms are supported at $\xi = 0$) are annihilated by each $\Delta_j$. In the inhomogeneous convention, one instead works with $f = S_0 f + \sum_{j \ge 0} \Delta_j f$, where the piece $S_0 f = \varphi * f$ collects all the low frequencies at once. For the purpose of $L^p$ norm equivalences with $1 < p < \infty$, the choice of where to start the sum is immaterial; what matters is that the sum reconstructs $f$.
[/remark]
A second fundamental fact is that the projections are almost orthogonal: $\Delta_j f$ and $\Delta_k f$ have essentially disjoint Fourier supports when $|j - k| \ge 2$. Precisely, $\hat{\psi}_j \hat{\psi}_k \equiv 0$ whenever $|j - k| \ge 2$, because the annuli $\{2^{j-1} \le |\xi| \le 2^{j+1}\}$ and $\{2^{k-1} \le |\xi| \le 2^{k+1}\}$ then share at most a single sphere, on which both cutoffs vanish by continuity.
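The construction can be checked numerically. The sketch below is a minimal illustration in one radial variable, assuming a particular smooth profile for $\hat{\varphi}$ (equal to $1$ on $[0, 1/2]$ and $0$ on $[1, \infty)$, one normalisation compatible with the support $\{1/2 \le |\xi| \le 2\}$ stated above). The helper names `smooth_step`, `phi_hat`, `psi_hat`, `psi_hat_j` and `lp_sum` are hypothetical and chosen only for this example. It checks the support of $\hat{\psi}$, the dyadic partition of unity, and the vanishing of $\hat{\psi}_j \hat{\psi}_k$ for $|j - k| \ge 2$.

```python
import math

def smooth_step(t):
    """Smooth transition: 0 for t <= 0, 1 for t >= 1 (standard exp(-1/t) construction)."""
    def g(s):
        return math.exp(-1.0 / s) if s > 0 else 0.0
    return g(t) / (g(t) + g(1.0 - t))

def phi_hat(r):
    """Assumed radial low-pass profile: 1 for |r| <= 1/2, 0 for |r| >= 1."""
    return 1.0 - smooth_step(2.0 * abs(r) - 1.0)

def psi_hat(r):
    """Annular cutoff phi_hat(r/2) - phi_hat(r), supported in 1/2 <= |r| <= 2."""
    return phi_hat(r / 2.0) - phi_hat(r)

def psi_hat_j(r, j):
    """Dilated cutoff psi_hat(2^{-j} r), supported in 2^{j-1} <= |r| <= 2^{j+1}."""
    return psi_hat(r * 2.0 ** (-j))

def lp_sum(r, jmin=-60, jmax=60):
    """Truncated sum over j of psi_hat_j(r); equals 1 when 2^jmin << |r| << 2^jmax."""
    return sum(psi_hat_j(r, j) for j in range(jmin, jmax + 1))

if __name__ == "__main__":
    for r in (1e-3, 0.3, 0.75, 1.0, 1.7, 250.0):
        print(r, psi_hat(r), lp_sum(r))
```

The sum telescopes to $\hat{\varphi}(2^{-j_{\max}-1} r) - \hat{\varphi}(2^{-j_{\min}} r)$, which is why the truncation does not matter for moderate values of $r$.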
## The Square Function and the $L^p$ Equivalence
Having isolated each frequency band, the natural question is: how does the $L^p$ norm of $f$ compare to the combined sizes of its pieces $\Delta_j f$? The square function packages the answer.
[definition: Littlewood-Paley Square Function]
For $f \in \mathcal{S}(\mathbb{R}^n)$, the **Littlewood–Paley square function** $S(f) : \mathbb{R}^n \to [0,\infty)$ is defined by
\begin{align*}
S(f)(x) := \left( \sum_{j \in \mathbb{Z}} |\Delta_j f(x)|^2 \right)^{1/2}.
\end{align*}
[/definition]
The square function $S(f)$ takes values in $[0,\infty)$ and measures, at each point $x$, how the energy of $f$ is distributed across all frequency scales. The fundamental theorem of Littlewood–Paley theory asserts that this measurement is equivalent to $|f(x)|$ in an $L^p$ sense.
[quotetheorem:3187]
[citeproof:3187]
The necessity of the constraint $1 < p < \infty$ is genuine. For $p = 1$, the upper bound fails: $S(f)$ need not be integrable when $f \in L^1$. The lower bound $\|f\|_{L^1} \lesssim \|S(f)\|_{L^1}$ survives for $f \in L^1$, but it is no longer an equivalence. The correct replacement is $H^1$: the Littlewood–Paley characterisation of $H^1(\mathbb{R}^n)$ is $\|f\|_{H^1} \approx \|S(f)\|_{L^1}$, which is where the theory connects back to Chapter 7. At the other endpoint $p = \infty$, boundedness of $S(f)$ is not the right condition; instead a Carleson-measure condition on the pieces $\Delta_j f$ characterises BMO, providing another link to Chapter 8.
[example: The Square Function for a Frequency-Localised Function]
Let $f \in L^2(\mathbb{R}^n)$ with $\operatorname{supp}(\hat{f}) \subset \{|\xi| \le 1\}$. Then $\Delta_j f = 0$ for all $j \ge 1$ (since $\hat{\psi}_j$ vanishes on $\{|\xi| \le 1\}$ for $j \ge 1$) and $S(f)(x)^2 = \sum_{j \le 0} |\Delta_j f(x)|^2$. The square function inequality gives $\|S(f)\|_p \approx_p \|f\|_p$ for $1 < p < \infty$, but for $p = 2$ one can be more explicit. By Plancherel, $\|S(f)\|_{L^2}^2 = \sum_{j \le 0} \|\Delta_j f\|_{L^2}^2$ is the squared $L^2$ norm of the function whose Fourier transform is $\big(\sum_{j \le 0} \hat{\psi}_j^2\big)^{1/2} \hat{f}$. The cutoffs $\hat{\psi}_j$ are non-negative and sum to $1$ on the support of $\hat{f}$ (away from $\xi = 0$), with at most two of them non-zero at any given $\xi$, so $\tfrac{1}{2} \le \sum_{j \le 0} \hat{\psi}_j^2 \le 1$ there. Hence $\tfrac{1}{2}\|f\|_{L^2}^2 \le \|S(f)\|_{L^2}^2 \le \|f\|_{L^2}^2$, and equality on the right would require $\hat{f}$ to be supported where a single cutoff equals $1$.
[/example]
[example: Rademacher Functions and Khintchine's Inequality]
The proof of the $L^p$ square function inequality for $p \ne 2$ relies on a probabilistic ingredient: Khintchine's inequality. Consider independent $\pm 1$-valued Rademacher random variables $(\varepsilon_j)_{j \in \mathbb{Z}}$. Khintchine's inequality states that for $1 \le q < \infty$,
\begin{align*}
\mathbb{E}\left[\left|\sum_j \varepsilon_j a_j\right|^q\right]^{1/q} \approx_q \left(\sum_j |a_j|^2\right)^{1/2}
\end{align*}
for any sequence $(a_j)$ of real numbers. This connects the $\ell^2$ norm of $(a_j)$ to the $L^q(\Omega)$ norm of the random sum $\sum_j \varepsilon_j a_j$.
The role in the Littlewood–Paley proof is as follows: write $\|S(f)\|_{L^p}^p = \int_{\mathbb{R}^n} \left(\sum_j |\Delta_j f(x)|^2\right)^{p/2} d\mathcal{L}^n(x)$. By Khintchine, $\left(\sum_j |\Delta_j f(x)|^2\right)^{1/2} \approx_p \mathbb{E}_\varepsilon\left[|\sum_j \varepsilon_j \Delta_j f(x)|\right]$. For each realisation of $(\varepsilon_j)$, the randomised sum $T_\varepsilon f := \sum_j \varepsilon_j \Delta_j f$ is a Fourier multiplier operator with multiplier $\sum_j \varepsilon_j \hat{\psi}_j(\xi)$. The Calderón–Zygmund theorem (proved for sums of the $\Delta_j$ by verifying the kernel conditions) bounds $\|T_\varepsilon f\|_{L^p} \lesssim_p \|f\|_{L^p}$ uniformly in $\varepsilon$. Combining with Khintchine and Minkowski's inequality yields the upper bound $\|S(f)\|_{L^p} \lesssim_p \|f\|_{L^p}$.
[/example]
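Khintchine's inequality can be checked exactly for short sequences by enumerating all sign patterns. The following sketch does this in plain Python; the function names `rademacher_moment` and `l2_norm` and the sample sequence are hypothetical choices for illustration. For $q = 2$ the two sides coincide by orthogonality of the $\varepsilon_j$, and for $q = 4$ one has the identity $\mathbb{E}|\sum_j \varepsilon_j a_j|^4 = 3(\sum_j a_j^2)^2 - 2\sum_j a_j^4$, which exhibits the equivalence with constants between $1$ and $3^{1/4}$.

```python
from itertools import product

def rademacher_moment(a, q):
    """(E |sum_j eps_j a_j|^q)^(1/q), averaging exactly over all 2^len(a) sign patterns."""
    total = 0.0
    for signs in product((-1.0, 1.0), repeat=len(a)):
        total += abs(sum(s * x for s, x in zip(signs, a))) ** q
    return (total / 2 ** len(a)) ** (1.0 / q)

def l2_norm(a):
    """Euclidean norm (sum_j a_j^2)^(1/2)."""
    return sum(x * x for x in a) ** 0.5

if __name__ == "__main__":
    a = [3.0, 1.0, 2.0, 0.5, 1.5]  # arbitrary sample coefficients (assumption)
    for q in (1, 2, 4):
        # Ratio of the L^q(Omega) norm of the random sum to the l^2 norm of a.
        print(q, rademacher_moment(a, q) / l2_norm(a))
```

The enumeration costs $2^N$ evaluations, so this is only practical for short sequences; it is meant as a sanity check rather than a proof technique.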
## Bernstein Inequalities and Frequency-Localised Functions
The Littlewood–Paley projections have a second key property beyond the $L^p$ equivalence: functions with compact Fourier support behave like smooth functions, with derivatives controlled by the size of the support. This is the content of the Bernstein inequalities.
[motivation]
### What problem do Bernstein inequalities solve?
Suppose $f \in L^p(\mathbb{R}^n)$ and $\operatorname{supp}(\hat{f}) \subset \{|\xi| \le R\}$. Since $f$ is band-limited, one expects its derivatives to be controlled by $R$ times the function itself: differentiating in physical space corresponds to multiplying by $i\xi$ in frequency space, and $|i\xi| \le R$ on the support. But this naive reasoning gives only $\|\partial_j f\|_{L^2} \le R\|f\|_{L^2}$ by Plancherel. The Bernstein inequalities extend this in two ways: they work for general $L^p$ (not just $L^2$), and they also capture the gain available when passing from a smaller exponent $p$ to a larger exponent $q \ge p$, an effect that reflects the Lebesgue measure of the frequency support, which is of order $R^n$.
[/motivation]
The Bernstein inequalities make this intuition precise.
[quotetheorem:3188]
[citeproof:3188]
The Bernstein inequalities have an immediate consequence for the Littlewood–Paley projections: since $\operatorname{supp}(\widehat{\Delta_j f}) \subset \{2^{j-1} \le |\xi| \le 2^{j+1}\}$, one applies the Bernstein inequality with $R = 2^{j+1}$ to obtain $\|D^\alpha(\Delta_j f)\|_{L^q} \lesssim 2^{j(|\alpha| + n(1/p - 1/q))} \|\Delta_j f\|_{L^p}$ for $p \le q$. In particular, taking $p = q$, each $\Delta_j f$ satisfies $\|D^\alpha(\Delta_j f)\|_{L^p} \lesssim 2^{j|\alpha|} \|\Delta_j f\|_{L^p}$, meaning that differentiating $\Delta_j f$ to order $|\alpha|$ costs at most a factor of $2^{j|\alpha|}$. This makes precise the informal statement that $\Delta_j f$ behaves as a smooth function at spatial scale $2^{-j}$.
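A one-dimensional periodic analogue makes the two Bernstein effects concrete. The sketch below is not the $\mathbb{R}^n$ statement quoted above but the classical version for trigonometric polynomials of degree at most $N$, used here as an assumed stand-in: $\|T'\|_{L^\infty} \le N \|T\|_{L^\infty}$ (the derivative cost) and $\|T\|_{L^\infty} \le (2N+1)^{1/2} \|T\|_{L^2(\mathbb{T})}$ (the gain from $p = 2$ to $q = \infty$, matching $R^{n(1/p - 1/q)}$ with $n = 1$). The coefficients and function names are hypothetical, and the suprema are approximated on a finite grid.

```python
import math

def trig_poly(a, x):
    """T(x) = sum_{k>=1} a[k-1] * cos(k x), a cosine polynomial of degree len(a)."""
    return sum(ak * math.cos(k * x) for k, ak in enumerate(a, start=1))

def trig_poly_derivative(a, x):
    """T'(x) = -sum_{k>=1} k * a[k-1] * sin(k x)."""
    return -sum(k * ak * math.sin(k * x) for k, ak in enumerate(a, start=1))

def grid_sup(func, a, points=4096):
    """Approximate the sup over [0, 2*pi) of |func(a, x)| by sampling a uniform grid."""
    return max(abs(func(a, 2.0 * math.pi * i / points)) for i in range(points))

def l2_norm_torus(a):
    """L^2 norm for the normalised measure dx/(2*pi): each cos(kx) has squared norm 1/2."""
    return math.sqrt(sum(ak * ak for ak in a) / 2.0)

if __name__ == "__main__":
    N = 8
    a = [1.0 / k ** 2 for k in range(1, N + 1)]  # sample coefficients (assumption)
    sup_T = grid_sup(trig_poly, a)
    sup_dT = grid_sup(trig_poly_derivative, a)
    print("derivative cost:", sup_dT, "<=", N * sup_T)
    print("L2 -> Linf gain:", sup_T, "<=", math.sqrt(2 * N + 1) * l2_norm_torus(a))
```

The second bound follows from Cauchy–Schwarz applied to the $2N + 1$ Fourier coefficients, which is the discrete counterpart of the support-volume factor $R^n$.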
[remark: Reverse Bernstein Inequalities]
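When the Fourier support is an annulus rather than a ball (as for $\Delta_j f$, whose support lies in $\{2^{j-1} \le |\xi| \le 2^{j+1}\}$), a reverse inequality also holds, provided one uses all derivatives of a given order: for each integer $k \ge 0$, $\sum_{|\alpha| = k} \|D^\alpha(\Delta_j f)\|_{L^p} \gtrsim 2^{jk} \|\Delta_j f\|_{L^p}$. A single derivative $D^\alpha$ does not suffice in general, since the monomial $\xi^\alpha$ can vanish on parts of the annulus. The point is that $|\xi|^k$ is bounded below by $2^{(j-1)k}$ on the support of $\hat{\psi}_j$, so on $L^2$ Plancherel gives $\||\xi|^k \hat{\psi}_j(\xi)\hat{f}(\xi)\|_{L^2} \ge 2^{(j-1)k} \|\hat{\psi}_j \hat{f}\|_{L^2}$, and the $L^p$ version follows by recovering $\Delta_j f$ from the derivatives $D^\alpha \Delta_j f$, $|\alpha| = k$, through smooth multipliers localised to the annulus. Together with the forward inequality, this gives $\sum_{|\alpha| = k}\|D^\alpha(\Delta_j f)\|_{L^p} \approx_{k,p} 2^{jk} \|\Delta_j f\|_{L^p}$.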
[/remark]
[example: Sobolev Embedding via Littlewood-Paley]
As an illustration of how the Bernstein inequalities interact with the Littlewood–Paley decomposition, consider $f \in H^{s,p}(\mathbb{R}^n)$ (the Bessel potential space, defined in Chapter 10) with $1 < p \le q < \infty$. Use the inhomogeneous decomposition $f = S_0 f + \sum_{j \ge 0} \Delta_j f$. By Bernstein, $\|\Delta_j f\|_{L^q} \lesssim 2^{j \cdot n(1/p - 1/q)} \|\Delta_j f\|_{L^p}$ for $p \le q$, and similarly $\|S_0 f\|_{L^q} \lesssim \|f\|_{L^p}$. The triangle inequality then gives $\|f\|_{L^q} \lesssim \|f\|_{L^p} + \sum_{j \ge 0} 2^{j(n(1/p-1/q) - s)} \cdot 2^{js} \|\Delta_j f\|_{L^p}$. If $s > n(1/p - 1/q)$, the factors $2^{j(n(1/p-1/q) - s)}$ form a convergent geometric series over $j \ge 0$, and since $2^{js}\|\Delta_j f\|_{L^p} \lesssim \|f\|_{H^{s,p}}$ uniformly in $j \ge 0$, this yields the subcritical Sobolev embedding $H^{s,p} \hookrightarrow L^q$. The critical case $s = n/p - n/q$ also holds, but the crude triangle inequality no longer suffices there and a finer argument is needed. The example illustrates the general strategy of the next two chapters: use Littlewood–Paley to localize frequency, apply Bernstein within each band, and then reassemble.
[/example]
## Almost Orthogonality and Paraproducts
The Littlewood–Paley decomposition also underlies a more subtle interaction: the decomposition of a product $fg$ into frequency interactions. When $f$ and $g$ are both written as $\sum_j \Delta_j f$ and $\sum_k \Delta_k g$, the product $fg = \sum_{j,k} (\Delta_j f)(\Delta_k g)$ involves all frequency interactions. However, the interactions split into three regimes based on the relative sizes of $j$ and $k$.
[definition: Paraproduct Decomposition]
For $f, g \in \mathcal{S}(\mathbb{R}^n)$, the **Bony paraproduct decomposition** is
\begin{align*}
fg = \Pi(f, g) + \Pi(g, f) + R(f, g),
\end{align*}
where
\begin{align*}
\Pi(f, g) &:= \sum_{j \in \mathbb{Z}} (S_{j-1} f)(\Delta_j g), \\
R(f, g) &:= \sum_{\substack{j, k \in \mathbb{Z} \\ |j - k| \le 1}} (\Delta_j f)(\Delta_k g).
\end{align*}
Here $\Pi(f, g)$ is the **paraproduct of $f$ acting on $g$** (low-frequency of $f$ times high-frequency of $g$) and $R(f, g)$ is the **resonance term** (same-frequency interactions).
[/definition]
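The decomposition is a regrouping of the double sum. The following short computation records the bookkeeping, assuming the convention $\Delta_j = S_{j+1} - S_j$ used above and that $S_j f \to 0$ as $j \to -\infty$ (true, for instance, uniformly in $x$ for Schwartz $f$). Splitting the index set $\mathbb{Z}^2$ according to whether $j \le k - 2$, $k \le j - 2$, or $|j - k| \le 1$,
\begin{align*}
fg = \sum_{j, k} (\Delta_j f)(\Delta_k g)
&= \sum_{k} \Big(\sum_{j \le k-2} \Delta_j f\Big)(\Delta_k g) + \sum_{j} (\Delta_j f)\Big(\sum_{k \le j-2} \Delta_k g\Big) + \sum_{|j-k| \le 1} (\Delta_j f)(\Delta_k g) \\
&= \sum_{k} (S_{k-1} f)(\Delta_k g) + \sum_{j} (S_{j-1} g)(\Delta_j f) + R(f, g),
\end{align*}
since the inner sums telescope: $\sum_{j \le k-2} \Delta_j f = \sum_{j \le k-2} (S_{j+1} f - S_j f) = S_{k-1} f$. The first two terms are $\Pi(f, g)$ and $\Pi(g, f)$ respectively.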
[explanation: Why the Paraproduct Decomposition Matters]
The three terms in the Bony decomposition have fundamentally different frequency-support properties. In $\Pi(f, g)$, the factor $S_{j-1} f$ has Fourier support in $\{|\xi| \le 2^{j-1}\}$, while $\Delta_j g$ has support in $\{2^{j-1} \le |\xi| \le 2^{j+1}\}$. Since the Fourier transform of a product is a convolution, $(S_{j-1}f)(\Delta_j g)$ has Fourier support in the sum of these two sets, which is contained in $\{|\xi| \le 5 \cdot 2^{j-1}\}$; the frequency scale of the product is thus set by the higher-frequency factor $\Delta_j g$. (Many treatments use $S_{j-2}$ or $S_{j-3}$ in place of $S_{j-1}$ so that the product is localised to a genuine annulus $\{|\xi| \approx 2^j\}$, at the cost of a slightly wider resonance band.) This locality in frequency means that $\Pi(f, g)$ is a genuine paraproduct: it behaves like $g$ in terms of regularity, modulated by $f$.
The resonance term $R(f, g)$ pairs frequencies of the same scale. Each product $(\Delta_j f)(\Delta_k g)$ with $|j - k| \le 1$ has Fourier support in a ball $\{|\xi| \lesssim 2^{j}\}$ rather than an annulus, since frequencies of comparable size can cancel, so $R(f, g)$ may contain low frequencies. The compensating gain is in size: if $f$ has Hölder-type regularity $s$ and $g$ has regularity $t$ with $s + t > 0$, each product is of size $\lesssim 2^{-j(s+t)}$ and the resonance term has regularity $s + t$, the sum of the regularities of the two factors.
This decomposition is the key tool in the proof of the Kato–Ponce commutator estimate and in the theory of paradifferential operators, and it is the foundation for the nonlinear analysis of PDEs in Besov and Triebel–Lizorkin spaces.
[/explanation]
The paraproduct $\Pi(f, g)$ obeys Hölder-type bounds: if $f \in L^\infty$ and $g \in L^p$ with $1 < p < \infty$, then $\Pi(f, g) \in L^p$ with $\|\Pi(f,g)\|_{L^p} \lesssim \|f\|_{L^\infty}\|g\|_{L^p}$. More refined estimates hold when $f \in \mathrm{BMO}$ or $f$ belongs to a Sobolev or Besov space, and these form the analytic core of paralinearization in PDE theory.
<!-- illustration-needed: the frequency interaction diagram for the Bony paraproduct — show three regions in the $(j, k)$ plane, shading the low-high region (|k| >> |j|) for Pi(f,g), the high-low region for Pi(g,f), and the diagonal band |j-k| <= 1 for the resonance term R(f,g) -->
Fourier multipliers acting on frequency-localized pieces benefit from the refined understanding Littlewood-Paley theory provides. Fourier multipliers extend classical convolution operators to the frequency domain, with boundedness determined by careful control of the multiplier symbol across all frequencies.
# 10. Fourier Multipliers
The preceding chapter equipped us with the Littlewood--Paley square-function, a tool that converts $L^p$ norm control into $\ell^2$ control over dyadic frequency pieces. The natural next question is: which operations on the Fourier side of a function preserve $L^p$ boundedness? This chapter answers that question systematically. The central objects are **Fourier multiplier operators** $T_m$, defined by $\widehat{T_m f}(\xi) = m(\xi)\hat{f}(\xi)$ for a symbol $m : \mathbb{R}^n \to \mathbb{C}$, and the central results are two multiplier theorems — one due to Mihlin and one due to Marcinkiewicz — that give checkable derivative conditions on $m$ guaranteeing $T_m : L^p(\mathbb{R}^n) \to L^p(\mathbb{R}^n)$ for all $1 < p < \infty$. The chapter closes by applying these results to establish the $L^p$ Calderón--Zygmund inequality for the Laplacian and to define the Bessel potential spaces $H^{s,p}(\mathbb{R}^n)$, which are the $L^p$-analogues of the Sobolev spaces $H^s(\mathbb{R}^n)$.
## Fourier Multiplier Operators
Before proving any boundedness theorems, we need to set up the class of operators under study and understand what kind of conditions on the symbol $m$ might be sufficient.
Suppose $f \in \mathcal{S}(\mathbb{R}^n)$. The Fourier transform $\hat{f} : \mathbb{R}^n \to \mathbb{C}$ decomposes $f$ into plane waves. Multiplying $\hat{f}(\xi)$ by a function $m(\xi)$ and taking the inverse Fourier transform is the most natural way to build a translation-invariant operator: if $T$ commutes with all translations $\tau_y f(x) = f(x - y)$, then on the Fourier side $T$ must act as multiplication by some symbol.
[definition: Fourier Multiplier Operator]
Let $m \in L^\infty(\mathbb{R}^n)$. The **Fourier multiplier operator** with symbol $m$ is the operator $T_m : \mathcal{S}(\mathbb{R}^n) \to \mathcal{S}'(\mathbb{R}^n)$ defined by
\begin{align*}
\widehat{T_m f}(\xi) = m(\xi)\,\hat{f}(\xi), \quad \xi \in \mathbb{R}^n.
\end{align*}
By Plancherel, $T_m$ extends to a bounded operator $T_m : L^2(\mathbb{R}^n) \to L^2(\mathbb{R}^n)$ with $\|T_m\|_{\mathcal{L}(L^2)} = \|m\|_{L^\infty}$.
[/definition]
The $L^2$ theory is immediate — it reduces to a pointwise bound on $m$. The hard question is $L^p$ for $p \ne 2$. Boundedness on $L^p$ is a much more delicate property of $m$ that cannot be read off from $\|m\|_{L^\infty}$ alone.
[example: Riesz Transforms as Multipliers]
The Riesz transforms $R_j$, $j = 1, \ldots, n$, are the prototypical Fourier multiplier operators, with symbols
\begin{align*}
m_j(\xi) = \frac{-i\xi_j}{|\xi|}, \quad \xi \in \mathbb{R}^n \setminus \{0\}.
\end{align*}
Each $m_j$ is homogeneous of degree zero and smooth away from the origin. Since $|m_j(\xi)| = |\xi_j|/|\xi| \le 1$, the $L^\infty$ bound holds. The $L^p$ boundedness of $R_j$ for $1 < p < \infty$ was established in Chapter 5 via the Calderón--Zygmund theorem. The Riesz transforms thus exemplify the pattern: a symbol that is bounded and has controlled derivatives away from the origin yields a bounded multiplier operator on $L^p$.
[/example]
[example: The Hilbert Transform as a Multiplier]
In dimension $n = 1$, the Hilbert transform has symbol $m(\xi) = -i\,\operatorname{sgn}(\xi)$. This is bounded, homogeneous of degree zero, and smooth away from $\xi = 0$. The $L^p$ boundedness of $H$ for $1 < p < \infty$ established in Chapter 4 is again consistent with the pattern.
[/example]
The key structural insight underlying both the Mihlin and Marcinkiewicz theorems is:
> A multiplier operator $T_m$ is bounded on $L^p$ for all $1 < p < \infty$ whenever it is a Calderón--Zygmund operator, i.e., whenever $m \in L^\infty$ and the kernel $K = \mathcal{F}^{-1}m$ satisfies the size and Hörmander smoothness conditions from Chapter 5.
Derivative conditions on $m$ are precisely conditions that force $K$ to be a Calderón--Zygmund kernel.
## The Mihlin Multiplier Theorem
The question the Mihlin theorem answers is: what pointwise differential conditions on $m$ guarantee that $K = \mathcal{F}^{-1} m$ is a Calderón--Zygmund kernel and hence $T_m$ is bounded on $L^p$?
The relevant conditions are scale-invariant bounds on the partial derivatives of $m$, reflecting the homogeneity of degree zero that characterises operators like the Riesz transforms.
[definition: Mihlin Multiplier]
A measurable function $m : \mathbb{R}^n \setminus \{0\} \to \mathbb{C}$ is a **Mihlin multiplier** with constant $A$ if
\begin{align*}
|D^\alpha m(\xi)| \le A\,|\xi|^{-|\alpha|}
\end{align*}
for all multi-indices $\alpha$ with $|\alpha| \le \lfloor n/2 \rfloor + 1$ and all $\xi \ne 0$.
[/definition]
The threshold order $\lfloor n/2\rfloor + 1$ is dimension-dependent: it is the smallest integer strictly greater than $n/2$, and it reflects the cost of passing from $L^2$ control of derivatives of $m$ to integrable kernel estimates.
[remark: The Threshold Order]
The threshold $\lfloor n/2 \rfloor + 1$ appears because the proof works with $L^2$-based quantities: by Plancherel, $L^2$ bounds on derivatives of order $s$ of a dyadic piece of $m$ become weighted $L^2$ bounds, with weight $|x|^{2s}$, on the corresponding piece of the kernel. Cauchy--Schwarz converts these into the $L^1$-type bounds required by the Hörmander kernel condition exactly when $|x|^{-s}$ is square-integrable at infinity in $\mathbb{R}^n$, that is, when $s > n/2$.
[/remark]
[quotetheorem:3189]
[citeproof:3189]
The Mihlin theorem reduces the $L^p$ theory of a multiplier to the size of its derivatives. The necessity of the condition is not claimed — there are bounded $L^p$ multipliers that fail Mihlin bounds — but sufficiency is what is needed for the applications in this course.
[example: Partial Differential Operators as Multipliers]
Let $P(\xi) = \sum_{|\alpha| = k} a_\alpha \xi^\alpha$ be a homogeneous elliptic polynomial of degree $k$ (so $|P(\xi)| \ge c|\xi|^k$ for all $\xi \ne 0$). The multiplier $m(\xi) = \xi^\beta / P(\xi)$ for a multi-index $|\beta| = k$ satisfies the Mihlin condition: since both numerator and denominator are homogeneous of degree $k$, the quotient is homogeneous of degree zero and smooth away from the origin. Each derivative $D^\alpha m$ is therefore homogeneous of degree $-|\alpha|$ and bounded on the unit sphere, which gives $|D^\alpha m(\xi)| \lesssim |\xi|^{-|\alpha|}$; the ellipticity bound $|P(\xi)| \ge c|\xi|^k$ is what keeps the denominator away from zero. For instance, the second-order symbols $\xi_i \xi_j / |\xi|^2$ are of this type with $P(\xi) = |\xi|^2$. The Riesz symbols $-i\xi_j / |\xi|$ are not literally of this form, because $|\xi|$ is not a polynomial, but the same homogeneity argument applies to them.
[/example]
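For a concrete instance, the first-order Mihlin bound can be verified by hand for the symbol $m_{12}(\xi) = \xi_1\xi_2/|\xi|^2$ in $\mathbb{R}^2$ (an illustrative choice, corresponding to $P(\xi) = |\xi|^2$ and $\beta = e_1 + e_2$). The quotient rule gives
\begin{align*}
\partial_{\xi_1} m_{12}(\xi) = \frac{\xi_2}{|\xi|^2} - \frac{2\xi_1^2 \xi_2}{|\xi|^4} = \frac{\xi_2(\xi_2^2 - \xi_1^2)}{|\xi|^4},
\qquad
|\partial_{\xi_1} m_{12}(\xi)| \le \frac{|\xi| \cdot |\xi|^2}{|\xi|^4} = |\xi|^{-1},
\end{align*}
and symmetrically for $\partial_{\xi_2}$. In dimension $n = 2$ the threshold is $\lfloor n/2 \rfloor + 1 = 2$, so second derivatives are also needed; they are homogeneous of degree $-2$ and smooth away from the origin, hence bounded by a constant multiple of $|\xi|^{-2}$.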
## The Marcinkiewicz Multiplier Theorem
The Mihlin theorem requires derivative bounds at every order up to $\lfloor n/2 \rfloor + 1$. The Marcinkiewicz theorem is a more economical condition: it asks only that $m$ has bounded variation on each dyadic interval (in dimension one) or on each dyadic rectangle (in higher dimensions), with a bound independent of the dyadic scale.
### The One-Dimensional Statement
In dimension $n = 1$, fix a dyadic decomposition of $\mathbb{R} \setminus \{0\}$ into intervals $I_j = [2^j, 2^{j+1})$ and $I_j^- = (-2^{j+1}, -2^j]$ for $j \in \mathbb{Z}$.
[definition: Marcinkiewicz Multiplier in Dimension One]
A function $m \in L^\infty(\mathbb{R})$, continuously differentiable on the interior of each dyadic interval $I_j$ and $I_j^-$, is a **Marcinkiewicz multiplier** in dimension one with constant $A$ if $\|m\|_{L^\infty} \le A$ and
\begin{align*}
\sup_{j \in \mathbb{Z}} \int_{2^j \le |\xi| \le 2^{j+1}} |m'(\xi)|\,d\mathcal{L}^1(\xi) \le A.
\end{align*}
[/definition]
[quotetheorem:3190]
[citeproof:3190]
The Marcinkiewicz condition is strictly weaker than the Mihlin condition in dimension one: a function $m$ satisfying $|m'(\xi)| \le A/|\xi|$ automatically satisfies the Marcinkiewicz variation bound (since $\int_{2^j}^{2^{j+1}} A/|\xi|\,d\mathcal{L}^1(\xi) \le A \log 2$), but the Marcinkiewicz condition allows symbols with merely bounded variation on each dyadic interval even if the derivative is not pointwise controlled.
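The scale invariance of the dyadic variation can be seen numerically. The sketch below uses the sample symbol $m(\xi) = \cos(\log \xi)$ on $\xi > 0$, an illustrative choice satisfying $|m'(\xi)| \le 1/\xi$, and approximates $\int_{2^j}^{2^{j+1}} |m'(\xi)|\,d\xi$ by the midpoint rule; the function names are hypothetical. Every dyadic interval should return a value below $\log 2 \approx 0.693$, independently of $j$.

```python
import math

def m_prime(xi):
    """Derivative of the sample symbol m(xi) = cos(log xi) for xi > 0."""
    return -math.sin(math.log(xi)) / xi

def dyadic_variation(j, steps=20000):
    """Midpoint-rule approximation of the integral of |m'| over [2^j, 2^(j+1)]."""
    a = 2.0 ** j
    h = a / steps  # the interval [2^j, 2^(j+1)] has length 2^j
    return sum(abs(m_prime(a + (i + 0.5) * h)) for i in range(steps)) * h

if __name__ == "__main__":
    for j in range(-6, 7):
        print(j, round(dyadic_variation(j), 6))
    print("Mihlin-type bound A*log(2) with A = 1:", math.log(2.0))
```

After the substitution $u = \log \xi$ each integral becomes $\int_{j\log 2}^{(j+1)\log 2} |\sin u|\,du$, which makes the uniform bound by $\log 2$ transparent.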
### The Higher-Dimensional Product Condition
In $\mathbb{R}^n$, the Marcinkiewicz theorem applies to symbols that behave like products of one-dimensional Marcinkiewicz multipliers in each coordinate direction.
[definition: Marcinkiewicz Multiplier in Dimension $n$]
For $n \ge 2$, a function $m \in L^\infty(\mathbb{R}^n)$ is a **Marcinkiewicz multiplier** with constant $A$ if $\|m\|_{L^\infty} \le A$ and for every non-empty subset $S \subseteq \{1, \ldots, n\}$ and every dyadic rectangle $R = I_{j_1} \times \cdots \times I_{j_n}$ (where each $I_{j_k}$ is a dyadic interval in the $\xi_k$-variable, of either sign),
\begin{align*}
\sup_{\xi_k \in I_{j_k},\, k \notin S} \; \int_{\prod_{k \in S} I_{j_k}} \left| D^\alpha m(\xi) \right| d\mathcal{L}^{|S|}(\xi_S) \le A,
\end{align*}
where $\alpha = \sum_{k \in S} e_k$, the integration is over the coordinates $\xi_S = (\xi_k)_{k \in S}$ only, and the supremum runs over the remaining coordinates.
[/definition]
[quotetheorem:3191]
[citeproof:3191]
The proof in the higher-dimensional case proceeds by iterating the one-dimensional argument in each coordinate, using the tensor structure of the dyadic rectangles. The Littlewood--Paley theory in each variable independently provides the needed square-function estimates.
[remark: Comparison of Mihlin and Marcinkiewicz]
In dimension $n = 1$, Mihlin requires $|m(\xi)| \le A$ and $|m'(\xi)| \le A/|\xi|$, whereas Marcinkiewicz requires $|m(\xi)| \le A$ and bounded variation on each dyadic interval. Since $|m'(\xi)| \le A/|\xi|$ implies variation $\le A \log 2$ per dyadic interval, every Mihlin multiplier is a Marcinkiewicz multiplier. In dimension $n \ge 2$, the Mihlin condition requires pointwise bounds on all derivatives up to order $\lfloor n/2 \rfloor + 1$, while the Marcinkiewicz condition requires integral bounds on mixed derivatives over dyadic rectangles. In higher dimensions the two conditions are in general not directly comparable, but both imply $L^p$ boundedness and are used in different contexts depending on what regularity of $m$ is available.
[/remark]
## The Calderón--Zygmund Inequality for the Laplacian
A fundamental application of the Mihlin theorem is the control of all second-order partial derivatives of a function $u$ by its Laplacian in $L^p$. This is the $L^p$ analogue of the classical Schauder estimate for elliptic operators.
The question is the following: if $\Delta u \in L^p(\mathbb{R}^n)$ for $1 < p < \infty$, can we conclude that all second-order mixed partial derivatives $\partial_i \partial_j u$ lie in $L^p$ as well, with a uniform bound?
The Fourier transform converts this into a multiplier problem. On the Fourier side,
\begin{align*}
\widehat{\partial_i \partial_j u}(\xi) = -\xi_i \xi_j \hat{u}(\xi), \quad \widehat{\Delta u}(\xi) = -|\xi|^2 \hat{u}(\xi),
\end{align*}
so formally $\widehat{\partial_i \partial_j u}(\xi) = \frac{\xi_i \xi_j}{|\xi|^2} \widehat{\Delta u}(\xi)$. That is, the operator $u \mapsto \partial_i \partial_j u$ factors through $\Delta$ via the multiplier
\begin{align*}
m_{ij}(\xi) = \frac{\xi_i \xi_j}{|\xi|^2}.
\end{align*}
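This is precisely the symbol of $-R_i \circ R_j$, since the product of the two Riesz symbols is $(-i\xi_i/|\xi|)(-i\xi_j/|\xi|) = -\xi_i\xi_j/|\xi|^2$; in other words, $\partial_i \partial_j u = -R_i R_j \Delta u$.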
[quotetheorem:3192]
[citeproof:3192]
[explanation: Why This Fails at $p = 1$ and $p = \infty$]
The inequality breaks down at the endpoints of the $L^p$ scale. For $p = \infty$, one can construct $u$ with $\Delta u \in L^\infty$ but $\partial_i \partial_j u \notin L^\infty$; a standard example is $u(x) = x_1 x_2 \log |x|$ near the origin in $\mathbb{R}^2$, for which $\Delta u = 4 x_1 x_2 / |x|^2$ is bounded while $\partial_1 \partial_2 u = \log|x| + O(1)$ is not. For $p = 1$, the Riesz transforms are not bounded on $L^1$ (the Calderón--Zygmund theorem gives only the weak-type $(1,1)$ bound), and correspondingly $T_{m_{ij}} : L^1 \to L^1$ fails. The correct substitute at $p = 1$ is the Hardy space $H^1$: since $T_{m_{ij}}$ is a Calderón--Zygmund operator, it maps $H^1(\mathbb{R}^n) \to L^1(\mathbb{R}^n)$.
At $p = \infty$, the correct space is BMO: $T_{m_{ij}} : L^\infty \to \mathrm{BMO}$, which is the endpoint Calderón--Zygmund result from Chapter 8. The inequality $\|\partial_i \partial_j u\|_{\mathrm{BMO}} \lesssim \|\Delta u\|_{L^\infty}$ thus holds.
[/explanation]
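A concrete instance of the $p = \infty$ failure can be checked by hand. The computation below takes $u(x) = x_1 x_2 \log|x|$ on $\mathbb{R}^2 \setminus \{0\}$ and writes $r = |x|$; this particular $u$ is a choice made here for illustration.
\begin{align*}
\partial_1^2 u &= \frac{3x_1x_2}{r^2} - \frac{2x_1^3x_2}{r^4}, \qquad \partial_2^2 u = \frac{3x_1x_2}{r^2} - \frac{2x_1x_2^3}{r^4}, \\
\Delta u &= \frac{6x_1x_2}{r^2} - \frac{2x_1x_2\,(x_1^2 + x_2^2)}{r^4} = \frac{4x_1x_2}{r^2}, \\
\partial_1\partial_2 u &= \log r + 1 - \frac{2x_1^2x_2^2}{r^4}.
\end{align*}
Thus $|\Delta u| \le 2$ everywhere, while $\partial_1\partial_2 u \to -\infty$ logarithmically as $r \to 0$.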
The Calderón--Zygmund inequality for the Laplacian has an important consequence for elliptic PDE. On a bounded domain $\Omega \subset \mathbb{R}^n$ with smooth boundary, the $W^{2,p}$ regularity estimate for solutions of the Poisson equation $-\Delta u = f$ with homogeneous Dirichlet boundary conditions reads $\|u\|_{W^{2,p}(\Omega)} \le C_{p,\Omega}\,\|f\|_{L^p(\Omega)}$ for $1 < p < \infty$. The key ingredient in the proof, after localising and flattening the boundary, is precisely the Calderón--Zygmund inequality on $\mathbb{R}^n$.
## Bessel Potential Spaces
The classical Sobolev spaces $W^{k,p}(\mathbb{R}^n)$ are defined for integer $k \ge 0$ using weak derivatives. To define a scale of Sobolev-type spaces for all real orders $s \in \mathbb{R}$, and to maintain good $L^p$ theory rather than the $L^2$ theory of $H^s(\mathbb{R}^n)$, one uses the Bessel potential operator.
The idea is to replace the homogeneous operator $(-\Delta)^{s/2}$ — whose multiplier $|\xi|^s$ is singular at the origin — by the inhomogeneous operator $(I - \Delta)^{s/2}$, whose multiplier $(1 + |\xi|^2)^{s/2}$ is smooth everywhere.
[definition: Bessel Potential Operator]
For $s \in \mathbb{R}$, the **Bessel potential operator** $J^s : \mathcal{S}(\mathbb{R}^n) \to \mathcal{S}'(\mathbb{R}^n)$ is the Fourier multiplier operator
\begin{align*}
\widehat{J^s f}(\xi) = (1 + |\xi|^2)^{-s/2}\,\hat{f}(\xi), \quad \xi \in \mathbb{R}^n.
\end{align*}
The symbol $(1 + |\xi|^2)^{-s/2}$ defines a smooth function on all of $\mathbb{R}^n$.
[/definition]
The operator $J^s$ is the inverse of $(I - \Delta)^{s/2}$ in the sense that $J^{-s}$ has symbol $(1 + |\xi|^2)^{s/2}$, and $J^s \circ J^{-s} = \mathrm{Id}$ on $\mathcal{S}$.
To see that $J^s$ is bounded on $L^p$ for $s \ge 0$ and $1 < p < \infty$, note that the symbol $(1 + |\xi|^2)^{-s/2}$ then satisfies the Mihlin condition: it is smooth on all of $\mathbb{R}^n$ and for $|\xi| \ge 1$ behaves like $|\xi|^{-s}$, so $|D^\alpha [(1 + |\xi|^2)^{-s/2}]| \lesssim (1 + |\xi|^2)^{(-s - |\alpha|)/2} \lesssim |\xi|^{-|\alpha|}$ for $|\xi| \ge 1$, while for $|\xi| \le 1$ the symbol and all its derivatives are bounded. The Mihlin theorem therefore gives $J^s : L^p(\mathbb{R}^n) \to L^p(\mathbb{R}^n)$ for all $s \ge 0$ and $1 < p < \infty$. For $s < 0$ the symbol is unbounded, and $J^s$ is not bounded on $L^p$.
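As a sanity check on the Mihlin condition for the Bessel symbol, the first-order case can be written out explicitly. The computation below assumes $s \ge 0$, which is what keeps the symbol bounded:
\begin{align*}
\partial_{\xi_j} (1 + |\xi|^2)^{-s/2} &= -s\,\xi_j\,(1 + |\xi|^2)^{-s/2 - 1}, \\
|\xi|\,\big|\partial_{\xi_j} (1 + |\xi|^2)^{-s/2}\big| &\le s\,\frac{|\xi|^2}{1 + |\xi|^2}\,(1 + |\xi|^2)^{-s/2} \le s .
\end{align*}
Higher derivatives follow the same pattern: each differentiation gains a factor of size $(1 + |\xi|^2)^{-1/2} \le |\xi|^{-1}$.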
[definition: Bessel Potential Space]
For $s \in \mathbb{R}$ and $1 < p < \infty$, the **Bessel potential space** $H^{s,p}(\mathbb{R}^n)$ is
\begin{align*}
H^{s,p}(\mathbb{R}^n) = \{f \in \mathcal{S}'(\mathbb{R}^n) : J^{-s} f \in L^p(\mathbb{R}^n)\},
\end{align*}
equipped with the norm $\|f\|_{H^{s,p}(\mathbb{R}^n)} = \|J^{-s} f\|_{L^p(\mathbb{R}^n)}$.
[/definition]
Equivalently, $H^{s,p}(\mathbb{R}^n) = J^s(L^p(\mathbb{R}^n))$: the space consists of all functions of the form $f = J^s g$ with $g \in L^p$, and $\|f\|_{H^{s,p}} = \|g\|_{L^p}$. The operator $J^s : L^p \to H^{s,p}$ is an isometric isomorphism by construction.
[remark: Special Cases]
When $s = 0$, $J^0 = \mathrm{Id}$ and $H^{0,p} = L^p(\mathbb{R}^n)$. When $s > 0$, $H^{s,p}$ consists of functions with additional regularity; when $s < 0$, it is a space of less-regular distributions.
[/remark]
The central structural theorem is the identification of $H^{s,p}$ with the Sobolev spaces for integer orders.
[quotetheorem:3193]
[citeproof:3193]
This identification justifies using $H^{s,p}(\mathbb{R}^n)$ as the fractional-order generalisation of Sobolev spaces for all $s \in \mathbb{R}$ and $1 < p < \infty$. The Bessel potential spaces will reappear in Chapter 11 as the $F^s_{p,2}$ Triebel--Lizorkin spaces, where the identification $H^{s,p} = F^s_{p,2}$ for $1 < p < \infty$ places them in a two-parameter scale.
[example: The Heat Semigroup and Bessel Potentials]
The heat semigroup $e^{t\Delta}$ has symbol $e^{-t|\xi|^2}$. For each fixed $t > 0$, the symbol decays rapidly and defines a bounded operator on every $H^{s,p}$. More precisely, $e^{t\Delta} : H^{s,p} \to H^{r,p}$ for any $r > s$ and any $t > 0$, with an operator norm that depends on $r - s$ and $t$; for $0 < t \le 1$ the gain in differentiability is quantified by
\begin{align*}
\|e^{t\Delta} f\|_{H^{r,p}} \lesssim t^{-(r-s)/2}\,\|f\|_{H^{s,p}}.
\end{align*}
This follows because the symbol of $J^{-r} \circ e^{t\Delta} \circ J^s$ is $(1 + |\xi|^2)^{(r-s)/2} e^{-t|\xi|^2}$, which decays faster than any polynomial thanks to the Gaussian factor and hence satisfies the Mihlin condition, with a constant proportional to $t^{-(r-s)/2}$ coming from the critical scale $|\xi| \sim t^{-1/2}$.
[/example]
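The power of $t$ in the heat-semigroup estimate can be read off from the size of the symbol alone. A sketch for $0 < t \le 1$, writing $\sigma = (r - s)/2 > 0$ and $u = t|\xi|^2$ (both abbreviations introduced here), and using $1 + |\xi|^2 = t^{-1}(t + t|\xi|^2) \le t^{-1}(1 + u)$:
\begin{align*}
(1 + |\xi|^2)^{\sigma}\, e^{-t|\xi|^2} \le t^{-\sigma}\,(1 + u)^{\sigma} e^{-u} \le C_\sigma\, t^{-\sigma}, \qquad C_\sigma := \sup_{u \ge 0}\,(1 + u)^{\sigma} e^{-u} < \infty .
\end{align*}
The derivative bounds required by the Mihlin condition follow the same pattern, with constants that again scale like $t^{-\sigma}$. For $t \ge 1$ the symbol is bounded by $C_\sigma$ uniformly in $t$, so the operator norm is bounded but does not decay.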
The Bessel potential operator provides the language in which fractional-order Sobolev regularity is expressed throughout the rest of the course. The spaces $H^{s,p}(\mathbb{R}^n)$ are the correct domains for studying the mapping properties of differential operators of non-integer order, and they form the backbone of the theory of pseudodifferential operators.
The frequency-domain perspective extends naturally to entire scales of function spaces parametrised by smoothness and integrability. Besov and Triebel-Lizorkin spaces unify Sobolev, Hölder, and intermediate regularity classes through their Littlewood-Paley pieces, providing the right scale for sharp embeddings.
# 11. Besov and Triebel-Lizorkin Spaces
The Bessel potential spaces $H^{s,p}(\mathbb{R}^n)$ introduced in Chapter 10 provide a natural extension of Sobolev spaces to non-integer smoothness, but they leave an uncomfortable gap: for $p = \infty$ the theory degenerates, and there is no clean characterisation of functions with Hölder regularity in terms of Fourier multipliers. The two families studied here, $B^s_{p,q}$ and $F^s_{p,q}$, are designed to close that gap. Both are defined by imposing quantitative conditions on the Littlewood–Paley pieces $\Delta_j f$ introduced in Chapter 9, but they differ in the order of operations: Besov spaces take $L^p$ over space first and then $\ell^q$ over scales, while Triebel–Lizorkin spaces reverse the order. This seemingly minor difference separates classical objects that have different functional-analytic characters: Sobolev spaces and Bessel potential spaces sit inside the Triebel–Lizorkin family, while Hölder–Zygmund spaces and the Slobodeckij fractional Sobolev spaces sit inside the Besov family. Real interpolation between Sobolev spaces produces Besov spaces, which explains why these spaces arise naturally in regularity theory for PDEs and in approximation theory.
## Besov Spaces
### Why $\ell^2$ over scales is not enough
After the Littlewood–Paley square function $\|f\|_p \approx \|S(f)\|_p$ (with $S(f) = (\sum_j |\Delta_j f|^2)^{1/2}$), one natural question is: what if we weight each dyadic piece $\Delta_j f$ by a factor $2^{js}$ encoding $s$ derivatives, and measure the result in $L^p$? For integer $s$ this recovers the Sobolev norm. But to build a two-parameter family — one parameter for integrability, one for measuring the summability over scales — we need to choose how to aggregate the contributions $2^{js}\|\Delta_j f\|_p$ across $j$. Replacing the implicit $\ell^2$ by a general $\ell^q$ gives the Besov norm.
[definition: Besov Space]
Let $s \in \mathbb{R}$, $1 \le p \le \infty$, and $1 \le q \le \infty$. Fix a Littlewood–Paley resolution $\{\Delta_j\}_{j \in \mathbb{Z}}$ as in Chapter 9. The **Besov space** $B^s_{p,q}(\mathbb{R}^n)$ consists of all tempered distributions $f \in \mathcal{S}'(\mathbb{R}^n)$ for which
\begin{align*}
\|f\|_{B^s_{p,q}} := \left(\sum_{j \in \mathbb{Z}} 2^{jsq}\|\Delta_j f\|_{L^p}^q\right)^{1/q} < \infty,
\end{align*}
with the obvious modification $\|f\|_{B^s_{p,\infty}} := \sup_{j \in \mathbb{Z}} 2^{js}\|\Delta_j f\|_{L^p}$ when $q = \infty$.
[/definition]
The structure of this norm is transparent: $\|\Delta_j f\|_{L^p}$ measures the size of $f$ near frequency $|\xi| \sim 2^j$, and the weight $2^{js}$ penalises or rewards contributions at high frequencies according to the sign of $s$. Taking $s > 0$ with large $s$ forces the high-frequency pieces to be very small, meaning the function is very smooth. Taking $s < 0$ allows high-frequency blow-up at a controlled rate, capturing distributions of negative regularity.
The notation "$\ell^q$ in scale, $L^p$ in space" records that the norm integrates spatially (in $L^p$) before summing over scales (in $\ell^q$). The $q$ parameter is thus a second measure of smoothness: small $q$ is more restrictive (the partial sums $\sum_{j \le N} 2^{jsq}\|\Delta_j f\|_p^q$ must grow slowly), while large $q$ is more permissive.
### Independence of the Littlewood–Paley resolution
A crucial point is that the space $B^s_{p,q}(\mathbb{R}^n)$ does not depend on the particular resolution $\{\Delta_j\}$ used to define it — different choices of $\hat\psi$ satisfying the standard support and partition-of-unity conditions yield equivalent norms.
[quotetheorem:3194]
[citeproof:3194]
The proof uses the fact that each $\Delta_j$ can be expanded as a rapidly convergent series in the $\tilde\Delta_k$ (since the frequency supports have bounded overlap), and the resulting transfer matrix has rapidly decaying entries that can be controlled by Young's convolution inequality for $\ell^q$ sequences. This independence is what makes $B^s_{p,q}$ an intrinsic object rather than an artifact of a particular decomposition.
[example: A Smooth Function and a Distribution in Besov Spaces]
Take $f \in \mathcal{S}(\mathbb{R}^n)$. For any $j$, the frequency projection $\Delta_j f$ is Schwartz, and for $j \ge 0$ the rapid decay of $\hat f$ on the annulus $|\xi| \sim 2^j$ gives $\|\Delta_j f\|_{L^p} \lesssim_{N,f} 2^{-jN}$ for any $N > 0$, with a constant depending on finitely many Schwartz seminorms of $f$. Therefore $2^{js}\|\Delta_j f\|_{L^p} \lesssim_{N,f} 2^{j(s-N)}$, which is summable over $j \ge 0$ in $\ell^q$ once $N > s$. For the low-frequency blocks $j < 0$, Young's inequality gives $\|\Delta_j f\|_{L^p} \lesssim 2^{jn(1-1/p)}\|f\|_{L^1}$, which is summable against $2^{js}$ when $s > -n/p'$. This shows $\mathcal{S}(\mathbb{R}^n) \subset B^s_{p,q}(\mathbb{R}^n)$ for all $p, q$ and all $s > -n/p'$.
At the other extreme, the Dirac delta $\delta_0 \in \mathcal{S}'(\mathbb{R}^n)$ satisfies $\widehat{\Delta_j \delta_0}(\xi) = \hat\psi(2^{-j}\xi)$, so $\|\Delta_j\delta_0\|_{L^p} \sim 2^{jn(1-1/p)}$ by scaling. Thus $2^{js}\|\Delta_j\delta_0\|_{L^p} \sim 2^{j(s + n - n/p)}$. This stays bounded as $j \to +\infty$, so that the high-frequency part of the $\ell^\infty$ norm is finite, precisely when $s \le -n + n/p = -n/p'$. For example, $\delta_0 \in B^{-n/p'}_{p,\infty}(\mathbb{R}^n)$, which quantifies the regularity deficit of a point mass.
[/example]
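The scaling step in the Dirac delta computation is short enough to write out. It assumes, consistently with the formula $\widehat{\Delta_j \delta_0}(\xi) = \hat\psi(2^{-j}\xi)$, that $\Delta_j f = \psi_j * f$ with $\psi_j(x) = 2^{jn}\psi(2^j x)$; the exact normalisation of Chapter 9 may differ by constants.
\begin{align*}
\|\Delta_j \delta_0\|_{L^p}^p = \|\psi_j\|_{L^p}^p = \int_{\mathbb{R}^n} 2^{jnp}\,|\psi(2^j x)|^p\, d\mathcal{L}^n(x) = 2^{jnp - jn}\,\|\psi\|_{L^p}^p,
\qquad\text{so}\qquad \|\Delta_j \delta_0\|_{L^p} = 2^{jn(1 - 1/p)}\,\|\psi\|_{L^p}.
\end{align*}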
## Triebel–Lizorkin Spaces
### Reversing the order of integration
The Besov norm applies $L^p$ in space first: compute the spatial norm of each dyadic piece $\Delta_j f$, weight by $2^{js}$, then aggregate over scales in $\ell^q$. The Triebel–Lizorkin norm reverses the operations: for each fixed $x$, look at the sequence $(2^{js}|\Delta_j f(x)|)_{j \in \mathbb{Z}}$ of values across scales, take its $\ell^q$ norm to produce a scalar function of $x$, and then measure this in $L^p$.
[definition: Triebel–Lizorkin Space]
Let $s \in \mathbb{R}$, $1 \le p < \infty$, and $1 \le q \le \infty$. Fix a Littlewood–Paley resolution $\{\Delta_j\}$. The **Triebel–Lizorkin space** $F^s_{p,q}(\mathbb{R}^n)$ consists of all $f \in \mathcal{S}'(\mathbb{R}^n)$ for which
\begin{align*}
\|f\|_{F^s_{p,q}} := \left\|\left(\sum_{j \in \mathbb{Z}} 2^{jsq}|\Delta_j f(\cdot)|^q\right)^{1/q}\right\|_{L^p} < \infty,
\end{align*}
with the modification $\|f\|_{F^s_{p,\infty}} := \|\sup_{j \in \mathbb{Z}} 2^{js}|\Delta_j f(\cdot)|\|_{L^p}$ when $q = \infty$.
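The case $p = \infty$ is excluded because the formula as written does not produce a satisfactory space: the essential supremum in $x$ cannot be interchanged with the $\ell^q$ norm in scale, so the expression is not the Besov norm $\|\cdot\|_{B^s_{\infty,q}}$, and the accepted definition of $F^s_{\infty,q}$ requires a localised modification of the norm.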
[/definition]
[remark: Comparing Besov and Triebel–Lizorkin norms]
The Besov norm sees only the sequence of numbers $\|\Delta_j f\|_{L^p}$: it records how much of $f$ lives at each scale but is blind to where in space each piece is concentrated. Two functions whose dyadic pieces have the same $L^p$ norms have the same $B^s_{p,q}$ norm, whether those pieces sit on top of one another or on disjoint sets. For the Triebel–Lizorkin norm, the $\ell^q$ norm over $j$ is taken pointwise before $L^p$, so spatial alignment across scales matters: if $2^{js}|\Delta_j f(x)|$ is large at many scales $j$ for all $x$ in a set of positive measure, the pointwise $\ell^q$ norm is large everywhere on that set, and the pieces cannot be accounted for one scale at a time.
[/remark]
As with Besov spaces, the Triebel–Lizorkin space $F^s_{p,q}(\mathbb{R}^n)$ is independent of the particular resolution $\{\Delta_j\}$: changing the bump function $\hat\psi$ produces an equivalent norm, via the same rapidly-decaying transfer-matrix argument.
## Identifications with Classical Spaces
The natural question now is whether Besov and Triebel–Lizorkin spaces are genuinely new objects or whether they redescribe classical spaces. The identifications below show the latter — but in a way that organises the classical hierarchy coherently through the Littlewood–Paley resolution.
### Bessel potential spaces are Triebel–Lizorkin
[quotetheorem:3195]
[citeproof:3195]
This identification is the payoff for the seemingly strange choice $q = 2$ in the Triebel–Lizorkin norm. The Bessel potential norm $\|f\|_{H^{s,p}} = \|J^{-s} f\|_{L^p} = \|\mathcal{F}^{-1}[(1 + |\xi|^2)^{s/2}\hat f]\|_{L^p}$ asks for the $L^p$ norm of a Fourier multiplier applied to $f$. The Triebel–Lizorkin norm with $q = 2$ gives
\begin{align*}
\|f\|_{F^s_{p,2}} = \left\|\left(\sum_j 2^{2js}|\Delta_j f(\cdot)|^2\right)^{1/2}\right\|_{L^p},
\end{align*}
which is precisely the Littlewood–Paley square function applied to the frequency-weighted pieces. The Littlewood–Paley theorem (Chapter 9) then equates this, up to constants, with $\|J^{-s} f\|_{L^p}$ via a comparison of the multipliers $(1 + |\xi|^2)^{s/2}$ and $2^{js}$ on each dyadic annulus. The restriction $1 < p < \infty$ is essential: the Littlewood–Paley theorem fails at $p = 1$ and $p = \infty$.
Since $H^{s,p} = W^{s,p}$ for integer $s \ge 0$ and $1 < p < \infty$ (as established in Chapter 10), we obtain the chain of identifications $F^k_{p,2} = W^{k,p}$ for $k \in \mathbb{N}$, $1 < p < \infty$.
### Hölder–Zygmund spaces are Besov
[quotetheorem:3196]
[citeproof:3196]
The condition $\|f\|_{B^s_{\infty,\infty}} = \sup_j 2^{js}\|\Delta_j f\|_{L^\infty} < \infty$ says that the piece of $f$ localised near frequency $|\xi| \sim 2^j$ is bounded in $L^\infty$ by a constant times $2^{-js}$. By Bernstein's inequality and the support of $\hat\psi$, a function satisfying this condition has bounded $s$-th order differences, which is exactly the Hölder–Zygmund condition. The integer values of $s$ require some care: the classical Hölder space $C^{k,0} = C^k$ is strictly smaller than $C^k_*$ because the Zygmund condition on second differences is weaker than Lipschitz in the first derivatives. This is why the identification $C^s_* = C^s$ requires $s \notin \mathbb{Z}$.
### Fractional Sobolev spaces are Besov
[quotetheorem:3197]
[citeproof:3197]
The Slobodeckij seminorm measures $s$-th order fractional differentiability by integrating a difference quotient. For $p = 2$, Plancherel's theorem turns the double integral into a constant multiple of $\int |\xi|^{2s}|\hat f(\xi)|^2\, d\mathcal{L}^n(\xi)$; for general $p$ the same heuristic applies scale by scale. A Littlewood–Paley decomposition splits the seminorm into contributions from each dyadic annulus, and within annulus $j$ the contribution is $\sim 2^{jsp}\|\Delta_j f\|_{L^p}^p$. Summing over $j$ recovers the Besov norm $\|f\|_{B^s_{p,p}}^p = \sum_j 2^{jsp}\|\Delta_j f\|_{L^p}^p$ up to constants. The identification is restricted to $0 < s < 1$ because it is stated for first differences: for $s \ge 1$ the first-difference seminorm is finite only for constant functions, so derivatives or higher-order differences must be used instead. At integer orders $k$, the Sobolev space $W^{k,p} = F^k_{p,2}$ differs from $B^k_{p,p}$ unless $p = 2$.
[remark: The Parameter $q$ as a Fine Index]
The identifications above make clear what the parameter $q$ measures. For Besov spaces $B^s_{p,q}$, varying $q$ with fixed $s$ and $p$ gives a family of spaces nested by inclusion: $B^s_{p,q_1} \subset B^s_{p,q_2}$ for $q_1 \le q_2$ (since $\ell^{q_1} \hookrightarrow \ell^{q_2}$). The classical spaces occupy specific positions: $W^{s,p} = B^s_{p,p}$, and making $q$ smaller (more stringent) or larger (more permissive) gives spaces with strictly finer or coarser smoothness properties at the same nominal regularity $s$.
[/remark]
## Embeddings
### Sobolev-type embeddings within the Besov family
The critical quantity governing Sobolev embeddings is the **Sobolev exponent** $s - n/p$, which measures smoothness relative to spatial dimension. Two Besov spaces at the same Sobolev exponent but different integrability are related by a continuous embedding.
[quotetheorem:3198]
[citeproof:3198]
Keeping $q$ the same on both sides of the embedding is the natural formulation rather than a necessity: composing with the inclusion $\ell^{q_0} \hookrightarrow \ell^{q_1}$ extends the embedding to any pair $q_0 \le q_1$. The reason $q$ is preserved here is that Bernstein's inequality acts purely on each dyadic piece separately, without coupling scales.
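The scale-by-scale mechanism can be made explicit. The sketch below assumes $p_0 \le p_1$, the equal-Sobolev-exponent condition $s_0 - n/p_0 = s_1 - n/p_1$, and Bernstein's inequality in the form $\|\Delta_j f\|_{L^{p_1}} \lesssim 2^{jn(1/p_0 - 1/p_1)}\|\Delta_j f\|_{L^{p_0}}$:
\begin{align*}
2^{js_1}\|\Delta_j f\|_{L^{p_1}} \lesssim 2^{j(s_1 + n/p_0 - n/p_1)}\,\|\Delta_j f\|_{L^{p_0}} = 2^{js_0}\,\|\Delta_j f\|_{L^{p_0}} .
\end{align*}
Taking the $\ell^q$ norm in $j$ on both sides gives $\|f\|_{B^{s_1}_{p_1,q}} \lesssim \|f\|_{B^{s_0}_{p_0,q}}$ with the same $q$ throughout.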
### Cross embeddings between Besov and Triebel–Lizorkin
Besov and Triebel–Lizorkin spaces at the same $(s, p)$ but different $q$ are related by continuous embeddings that arise from $\ell^p \hookrightarrow \ell^q$ inequalities at the level of pointwise sequences.
[quotetheorem:3199]
[citeproof:3199]
The proof uses Minkowski's inequality for mixed-norm spaces. The inner embedding says the more restrictive Besov norm (with $\ell^{\min(p,q)}$ over scales) controls the Triebel–Lizorkin norm, and the outer embedding says the Triebel–Lizorkin norm controls the more permissive Besov norm (with $\ell^{\max(p,q)}$ over scales). The special case $q = 2$ and $p = 2$ gives $B^s_{2,2} = F^s_{2,2} = H^{s,2} = H^s$, which is the single $L^2$-based scale.
[remark: What the Cross Embeddings Reveal]
The embeddings $B^s_{p,p} \hookrightarrow F^s_{p,p} \hookrightarrow B^s_{p,p}$ (the case $q = p$, where the two norms coincide by Fubini's theorem) confirm that the Slobodeckij space $W^{s,p} = B^s_{p,p}$ and the Triebel–Lizorkin space $F^s_{p,p}$ coincide. More generally, for $q < p$ the Besov space $B^s_{p,q}$ is strictly smaller than $F^s_{p,q}$, and for $q > p$ the containment reverses. This interplay is one reason Triebel–Lizorkin spaces are harder to work with than Besov spaces in interpolation theory: their structure is sensitive to the relationship between the spatial exponent $p$ and the scale-summability exponent $q$.
[/remark]
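The coincidence of the two scales at $q = p$ is a one-line computation. For $1 \le p < \infty$, interchanging the sum and the integral (Tonelli's theorem, all terms being non-negative) gives
\begin{align*}
\|f\|_{F^s_{p,p}}^p = \int_{\mathbb{R}^n} \sum_{j \in \mathbb{Z}} 2^{jsp}\,|\Delta_j f(x)|^p\, d\mathcal{L}^n(x) = \sum_{j \in \mathbb{Z}} 2^{jsp}\,\|\Delta_j f\|_{L^p}^p = \|f\|_{B^s_{p,p}}^p .
\end{align*}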
## Real Interpolation and the $K$-Functional
### The interpolation setup
Real interpolation, introduced in Chapter 1 via the $K$-functional, gives a clean calculus for spaces that lie between two given ones. Besov spaces arise as real interpolation spaces between Bessel potential spaces, which gives a second intrinsic characterisation and a conceptual explanation for why Besov spaces appear in PDE regularity theory.
[definition: $K$-Functional]
Let $(A_0, A_1)$ be a compatible Banach couple (meaning both embed continuously into a common Hausdorff topological vector space). For $f \in A_0 + A_1$ and $t > 0$, the **$K$-functional** is
\begin{align*}
K(t, f; A_0, A_1) := \inf_{f = f_0 + f_1,\, f_i \in A_i} \left(\|f_0\|_{A_0} + t\|f_1\|_{A_1}\right).
\end{align*}
For $0 < \theta < 1$ and $1 \le q \le \infty$, the real interpolation space $(A_0, A_1)_{\theta,q}$ consists of all $f \in A_0 + A_1$ for which
\begin{align*}
\|f\|_{(A_0,A_1)_{\theta,q}} := \left(\int_0^\infty \left(t^{-\theta} K(t, f; A_0, A_1)\right)^q \frac{dt}{t}\right)^{1/q} < \infty.
\end{align*}
[/definition]
The $K$-functional measures how cheaply $f$ can be decomposed as $f_0 + f_1$: when $t$ is small it is cheap to have $f_1$ large and $f_0$ small (the weight $t$ on $\|f_1\|_{A_1}$ is small), while for large $t$ the decomposition is weighted in the opposite direction. The parameter $\theta$ records how much weight to place on the $A_1$ component, and $q$ controls the integrability of the resulting function of $t$.
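A simple consequence of the definition illustrates how the two trivial decompositions already carry information. For $f \in A_0 \cap A_1$, write $a = \|f\|_{A_0}$ and $b = \|f\|_{A_1}$ (abbreviations used only here). Taking $f = f + 0$ and $f = 0 + f$ gives $K(t, f; A_0, A_1) \le \min(a, tb)$, and splitting the integral at $t = a/b$ yields, for $1 \le q < \infty$,
\begin{align*}
\|f\|_{(A_0,A_1)_{\theta,q}}^q &\le \int_0^{a/b} t^{(1-\theta)q}\, b^q\, \frac{dt}{t} + \int_{a/b}^\infty t^{-\theta q}\, a^q\, \frac{dt}{t} \\
&= \left(\frac{1}{(1-\theta)q} + \frac{1}{\theta q}\right) a^{(1-\theta)q}\, b^{\theta q} = \frac{1}{q\,\theta(1-\theta)}\; a^{(1-\theta)q}\, b^{\theta q} .
\end{align*}
Hence $\|f\|_{(A_0,A_1)_{\theta,q}} \le \big(q\,\theta(1-\theta)\big)^{-1/q}\,\|f\|_{A_0}^{1-\theta}\|f\|_{A_1}^{\theta}$, a convexity inequality showing that $A_0 \cap A_1$ embeds into every real interpolation space.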
### Interpolation produces Besov spaces
[quotetheorem:3200]
[citeproof:3200]
The striking feature of this result is that the output exponent $q$ depends only on the interpolation parameter $q$, not on $q_0$ or $q_1$. This is a special feature of Besov spaces: the secondary index $q$ is precisely the one produced by real interpolation. By contrast, the complex interpolation method (the abstract counterpart of the Riesz–Thorin theorem) produces Triebel–Lizorkin spaces from Triebel–Lizorkin spaces.
As a corollary, interpolating between the Sobolev spaces $W^{s_0,p}$ and $W^{s_1,p}$ of integer orders $s_0 \ne s_1$ (which, for $1 < p < \infty$, agree with $F^{s_i}_{p,2}$) by the real method gives Besov spaces rather than Sobolev spaces: with $s_\theta = (1-\theta)s_0 + \theta s_1$,
\begin{align*}
\left(W^{s_0,p}(\mathbb{R}^n),\, W^{s_1,p}(\mathbb{R}^n)\right)_{\theta,q} = B^{s_\theta}_{p,q}(\mathbb{R}^n), \quad 1 < p < \infty.
\end{align*}
This explains the ubiquity of Besov spaces in the regularity theory of elliptic and parabolic PDEs: whenever a priori estimates at two integer Sobolev levels $W^{s_0,p}$ and $W^{s_1,p}$ are combined by a real interpolation argument, the intermediate spaces that emerge are Besov spaces.
[example: Interpolation between $L^p$ and $W^{1,p}$]
Taking $s_0 = 0$ and $s_1 = 1$ with $p \in (1, \infty)$, the interpolation theorem gives $(L^p, W^{1,p})_{\theta,q} = B^\theta_{p,q}$ for $\theta \in (0,1)$. In particular, $(L^p, W^{1,p})_{\theta,p} = W^{\theta,p}$ (the Slobodeckij space at fractional exponent $\theta$), since the output has $q = p$ matching the Slobodeckij identification $B^\theta_{p,p} = W^{\theta,p}$. For $q \ne p$, the interpolation space is a strict Besov space $B^\theta_{p,q}$ that is neither a Sobolev nor a Hölder space.
[/example]
## Hardy–Littlewood–Sobolev and Sobolev Embedding
### The fractional integral operator
The **Riesz potential operator** or **fractional integral** $I_\alpha : \mathcal{S}(\mathbb{R}^n) \to \mathcal{S}'(\mathbb{R}^n)$ for $0 < \alpha < n$ is defined via the Fourier multiplier
\begin{align*}
\widehat{I_\alpha f}(\xi) = |\xi|^{-\alpha}\hat f(\xi),
\end{align*}
or equivalently by the convolution formula $I_\alpha f(x) = c_{n,\alpha}\int_{\mathbb{R}^n} |x-y|^{\alpha - n} f(y)\, d\mathcal{L}^n(y)$, where $c_{n,\alpha} = \pi^{-n/2}2^{-\alpha}\Gamma((n-\alpha)/2)/\Gamma(\alpha/2)$. The operator $I_\alpha$ is also written as $(-\Delta)^{-\alpha/2}$.
The question driving this section is: for which exponents $p$ and $q$ is $I_\alpha : L^p(\mathbb{R}^n) \to L^q(\mathbb{R}^n)$ bounded? A scaling argument pins down the only possible relation. Writing $f_\lambda := f(\lambda \cdot)$ for $\lambda > 0$, the homogeneity of the multiplier gives $I_\alpha f_\lambda = \lambda^{-\alpha} (I_\alpha f)(\lambda \cdot)$. Hence $\|f_\lambda\|_{L^p} = \lambda^{-n/p}\|f\|_{L^p}$ and $\|I_\alpha f_\lambda\|_{L^q} = \lambda^{-\alpha}\lambda^{-n/q}\|I_\alpha f\|_{L^q}$. If $\|I_\alpha f\|_{L^q} \le C\|f\|_{L^p}$ is to hold for every $f$ and every dilate $f_\lambda$, the two powers of $\lambda$ must agree, $\lambda^{-n/p} = \lambda^{-\alpha - n/q}$ for all $\lambda > 0$, which forces $1/q = 1/p - \alpha/n$.
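The behaviour of $I_\alpha$ under dilations can be verified on the Fourier side. Writing $f_\lambda = f(\lambda\cdot)$, so that $\widehat{f_\lambda}(\xi) = \lambda^{-n}\hat f(\xi/\lambda)$, and using $|\xi|^{-\alpha} = \lambda^{-\alpha}|\xi/\lambda|^{-\alpha}$:
\begin{align*}
\widehat{I_\alpha f_\lambda}(\xi) = |\xi|^{-\alpha}\,\lambda^{-n}\,\hat f(\xi/\lambda) = \lambda^{-\alpha}\cdot \lambda^{-n}\,\big(|\cdot|^{-\alpha}\hat f\big)(\xi/\lambda) = \lambda^{-\alpha}\,\widehat{(I_\alpha f)(\lambda\,\cdot)}(\xi) .
\end{align*}
Thus $I_\alpha f_\lambda = \lambda^{-\alpha}(I_\alpha f)(\lambda\,\cdot)$, and taking $L^q$ norms produces the factor $\lambda^{-\alpha - n/q}$ used in the scaling argument.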
[quotetheorem:469]
[citeproof:469]
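The case $p = 1$ requires a weak-type conclusion: $I_\alpha : L^1(\mathbb{R}^n) \to L^{n/(n-\alpha),\infty}(\mathbb{R}^n)$ (weak-$L^{n/(n-\alpha)}$). The kernel $|x|^{\alpha-n}$ just fails to lie in $L^{n/(n-\alpha)}$, since its $n/(n-\alpha)$-th power is $|x|^{-n}$, whose integral diverges logarithmically both at the origin and at infinity; it does, however, lie in weak-$L^{n/(n-\alpha)}$.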
### Sobolev embedding for Bessel potential spaces
The Sobolev embedding theorem for $H^{s,p}$ spaces follows immediately from the Hardy–Littlewood–Sobolev inequality by interpreting $H^{s,p}$ as the image of $L^p$ under the Bessel potential operator.
[quotetheorem:903]
[citeproof:903]
The condition $1/q = 1/p - s/n > 0$ is equivalent to $s < n/p$ and ensures $q < \infty$; since $s > 0$ and $p > 1$, the resulting exponent automatically satisfies $q > p > 1$. When $s > n/p$, the embedding target is $L^\infty$ or a Hölder space, handled separately. The critical case $1/q = 1/p - s/n = 0$ (i.e., $sp = n$) corresponds to borderline embeddings into $\mathrm{BMO}$ rather than $L^\infty$.
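Two numerical instances of the exponent relation $1/q = 1/p - s/n$ may help fix ideas; the particular values of $n$, $p$, $s$ are chosen here purely for illustration.
\begin{align*}
n = 3,\ p = 2,\ s = 1: &\quad \frac{1}{q} = \frac{1}{2} - \frac{1}{3} = \frac{1}{6}, \quad\text{so}\quad H^{1,2}(\mathbb{R}^3) \hookrightarrow L^6(\mathbb{R}^3), \\
n = 2,\ p = 2,\ s = \tfrac{1}{2}: &\quad \frac{1}{q} = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}, \quad\text{so}\quad H^{1/2,2}(\mathbb{R}^2) \hookrightarrow L^4(\mathbb{R}^2).
\end{align*}
In both cases $s < n/p$, so the target exponent $q$ is finite.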
### Sharpness via the Besov scale
The Hardy–Littlewood–Sobolev inequality and the Sobolev embedding have their sharpest formulation in the Besov setting. The embedding $B^{s_0}_{p_0,q} \hookrightarrow B^{s_1}_{p_1,q}$ under the equal-Sobolev-exponent condition $s_0 - n/p_0 = s_1 - n/p_1$ is sharp in the sense that no embedding holds if $s_0 - n/p_0 < s_1 - n/p_1$. In the Triebel–Lizorkin scale the same exponent governs the Sobolev embedding, which reads $F^s_{p,2} = H^{s,p} \hookrightarrow L^q = F^0_{q,2}$ with $s - n/p = -n/q$.
[example: Sharpness of the Sobolev Embedding]
Fix a Schwartz function $g$ whose Fourier transform is supported in the ball $\{|\xi - e_1| \le 1/4\}$, and set $g_k(x) = g(2^k x)$ for $k \ge 1$. Then $\hat g_k$ is supported in the annulus $|\xi| \sim 2^k$, so only the pieces $\Delta_j g_k$ with $j = k + O(1)$ are non-zero, and $\|g_k\|_{L^p} = 2^{-kn/p}\|g\|_{L^p}$ by scaling. Consequently $\|g_k\|_{B^s_{p,q}} \sim 2^{ks}\, 2^{-kn/p} = 2^{k(s - n/p)}$ for every $s$, $p$, $q$. If an embedding $B^{s_0}_{p_0,q} \hookrightarrow B^{s_1}_{p_1,q}$ held, the ratio $\|g_k\|_{B^{s_1}_{p_1,q}} / \|g_k\|_{B^{s_0}_{p_0,q}} \sim 2^{k[(s_1 - n/p_1) - (s_0 - n/p_0)]}$ would stay bounded as $k \to \infty$, which forces $s_1 - n/p_1 \le s_0 - n/p_0$. The example shows that no embedding can raise the Sobolev exponent, confirming sharpness. For the borderline case $1/q = 1/p - s/n = 0$, the function $h(x) = |x|^{-n/p}(\log(2/|x|))^{-1/p - \varepsilon}\mathbb{1}_{\{|x| \le 1\}}$ lies in $L^p$ for every $\varepsilon > 0$, while for $0 < \varepsilon \le 1 - 1/p$ its Bessel potential $f = J^{n/p} h \in H^{n/p,p}$ is unbounded near the origin, because the Bessel kernel behaves like $|y|^{-n + n/p}$ near $y = 0$ and $\int_{|y| \le 1} |y|^{-n}(\log(2/|y|))^{-1/p - \varepsilon}\, d\mathcal{L}^n(y) = \infty$. This shows the Sobolev embedding cannot reach $L^\infty$ at the critical exponent.
[/example]
<!-- illustration-needed: the lattice of Besov and Triebel–Lizorkin spaces in the (1/p, s) plane — show the Sobolev line s = n/p, the classical spaces $W^{k,p}$, $H^{s,p}$, $C^s$, $W^{s,p}$ as points in this lattice, and the direction of the embedding arrow under constant Sobolev exponent s - n/p = const -->
Smooth oscillatory phenomena in harmonic analysis and PDE require understanding how phase interactions affect integral estimates. Stationary phase analysis provides the asymptotic tools for evaluating oscillatory integrals, essential for proving bounds on operators depending on phase behavior.
# 12. Stationary Phase
This chapter develops the theory of oscillatory integrals — integrals of the form $\int e^{i\varphi(x)} \psi(x)\, d\mathcal{L}^n(x)$ where the phase $\varphi$ oscillates rapidly. The central question is quantitative: how fast does such an integral decay as the oscillation frequency grows? Two complementary tools answer this question. The Van der Corput lemma handles the one-dimensional case using lower bounds on derivatives of the phase, and the stationary-phase lemma treats the higher-dimensional case by localising near critical points of the phase. Together they underpin the Stein–Tomas restriction theorem and the Strichartz estimates of the following chapters.
## The Problem of Rapid Oscillation
Before formulating any estimates, it is worth understanding why oscillatory integrals decay at all, and why the rate of decay depends on how the phase $\varphi$ behaves.
Consider the simplest case: $\int_a^b e^{i\lambda t}\, d\mathcal{L}^1(t)$ for large $\lambda > 0$. This evaluates to $(e^{i\lambda b} - e^{i\lambda a})/(i\lambda)$, which has modulus at most $2/\lambda$. The mechanism is cancellation: over each period $2\pi/\lambda$, the integrand completes a full cycle and the positive and negative parts cancel. As $\lambda \to \infty$ the periods shrink and the cancellation becomes more effective.
The challenge arises when the phase is not purely linear. If $\varphi$ has a critical point, that is, a point where $\varphi'(x_0) = 0$, then near $x_0$ the integrand $e^{i\lambda\varphi(x)}$ oscillates slowly. The integrand stays near $e^{i\lambda\varphi(x_0)}$ for a range of $x$ values of width roughly $\lambda^{-1/2}$ (since $\lambda\varphi(x) \approx \lambda\varphi(x_0) + \frac{\lambda}{2}\varphi''(x_0)(x-x_0)^2$ and the quadratic term is of order $1$ when $|x - x_0| \sim \lambda^{-1/2}$). The contribution from this stationary region is of order $\lambda^{-1/2}$, which is larger than the $\lambda^{-1}$ decay from points where $\varphi' \ne 0$. Critical points are thus the dominant contributors to oscillatory integrals, and understanding them determines the decay rate.
<!-- illustration-needed: A plot of $e^{i\lambda\varphi(t)}$ for a phase with one critical point: show rapid oscillation away from the critical point and slow oscillation near it, indicating the dominant contribution region of width ~lambda^{-1/2} -->
## The Van der Corput Lemma
The Van der Corput lemma provides a quantitative bound for one-dimensional oscillatory integrals under a lower bound on some derivative of the phase. It is the workhorse of the theory.
[definition: Oscillatory Integral]
Let $\varphi : [a, b] \to \mathbb{R}$ be a smooth function (the **phase**) and $\psi : [a, b] \to \mathbb{C}$ a smooth function (the **amplitude**). The associated **oscillatory integral** is
\begin{align*}
I(\varphi, \psi) = \int_a^b e^{i\varphi(t)}\, \psi(t)\, d\mathcal{L}^1(t).
\end{align*}
[/definition]
When $\psi \equiv 1$ the amplitude is absent and we write $I(\varphi) = \int_a^b e^{i\varphi(t)}\, d\mathcal{L}^1(t)$.
[quotetheorem:637]
[citeproof:637]
The $-1/k$ exponent is sharp: for $\varphi(t) = \lambda t^k$ on $[-1,1]$, a scaling argument shows the integral is of order $\lambda^{-1/k}$.
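The scaling argument behind the sharpness claim can be spelled out for $k \ge 2$. Substituting $u = \lambda^{1/k} t$:
\begin{align*}
\int_{-1}^{1} e^{i\lambda t^k}\, d\mathcal{L}^1(t) = \lambda^{-1/k} \int_{-\lambda^{1/k}}^{\lambda^{1/k}} e^{iu^k}\, d\mathcal{L}^1(u) .
\end{align*}
As $\lambda \to \infty$ the integral on the right converges to the conditionally convergent integral $\int_{\mathbb{R}} e^{iu^k}\, d\mathcal{L}^1(u)$, which is non-zero: using the classical formula $\int_0^\infty e^{iu^k}\, d\mathcal{L}^1(u) = \Gamma(1 + 1/k)\, e^{i\pi/(2k)}$, it equals $2\Gamma(1 + 1/k)\, e^{i\pi/(2k)}$ for even $k$ and $2\Gamma(1 + 1/k)\cos(\pi/(2k))$ for odd $k$. The original integral is therefore comparable to $\lambda^{-1/k}$ for large $\lambda$, and no faster rate is possible.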
[remark: Necessity of Monotonicity at $k=1$]
The monotonicity hypothesis on $\varphi'$ when $k = 1$ is needed in the proof to control the total variation of $1/\varphi'$. Without it, $\varphi'$ might oscillate rapidly between values of size $\lambda$ and $2\lambda$, say, while still satisfying $|\varphi'| \ge \lambda$; then $1/\varphi'$ has large total variation, and the integration-by-parts bound breaks down. When $k \ge 2$, the lower bound on $|\varphi^{(k)}|$ implies that $\varphi^{(k-1)}$ is monotonic (being a function whose derivative has a definite sign), so no separate monotonicity hypothesis is needed.
[/remark]
### Applications of the Van der Corput Bound
The Van der Corput lemma is a tool for converting pointwise lower bounds on phase derivatives into integral decay estimates. Two standard applications appear repeatedly in the course.
[example: Decay of the Fourier Transform of a Smooth Measure]
Let $d\mu = \psi(t)\, d\mathcal{L}^1(t)$ where $\psi \in C_c^\infty(\mathbb{R})$. Then $\hat{\mu}(\xi) = \int e^{-i\xi t} \psi(t)\, d\mathcal{L}^1(t)$ is an oscillatory integral with linear phase $\varphi(t) = -\xi t$. For $|\xi| \ge 1$, the phase satisfies $|\varphi'| = |\xi|$, and $\varphi'$ is constant (hence monotonic). The $k=1$ Van der Corput bound with amplitude gives $|\hat{\mu}(\xi)| \lesssim |\xi|^{-1} \|\psi'\|_{L^1}$, the boundary term vanishing because $\psi$ has compact support. This is of course the standard integration-by-parts decay for the Fourier transform of a smooth compactly supported function, recovered here as a special case.
[/example]
[example: Decay of the Fourier Transform of Arc Length Measure on a Curved Arc]
Let $\gamma : [a, b] \to \mathbb{R}^2$ be a smooth regular curve with non-vanishing curvature $\kappa$, and let $d\sigma$ be arc length measure on $\gamma$. For a unit vector $e \in \mathbb{R}^2$ and parameter $\lambda > 0$, consider $\int_a^b e^{i\lambda e \cdot \gamma(t)} |\gamma'(t)|\, d\mathcal{L}^1(t)$. The phase is $\varphi(t) = \lambda e \cdot \gamma(t)$, so $\varphi'(t) = \lambda e \cdot \gamma'(t)$ and $\varphi''(t) = \lambda e \cdot \gamma''(t)$. The phase is stationary at $t_0$ exactly when $e$ is normal to the curve at $\gamma(t_0)$, and there $|e \cdot \gamma''(t_0)| = |\kappa(t_0)|\,|\gamma'(t_0)|^2 \ge c$ for a fixed $c > 0$, because the curvature is non-vanishing. By continuity $|\varphi''| \ge c\lambda/2$ on a neighbourhood of $t_0$, where the $k=2$ Van der Corput bound gives decay of order $\lambda^{-1/2}$. Away from stationary points $|\varphi'| \gtrsim \lambda$, and the $k = 1$ bound, applied on finitely many pieces where $\varphi'$ is monotonic, gives the better rate $\lambda^{-1}$. The overall decay of order $\lambda^{-1/2}$ is consistent with the stationary-phase result below.
[/example]
## The Stationary Phase Lemma
Van der Corput's lemma is one-dimensional and assumes that some derivative of the phase is bounded away from zero. In higher dimensions, and to obtain precise asymptotics near critical points where $\nabla \varphi = 0$, a different approach is needed. The stationary-phase lemma provides both the rate of decay and the asymptotic expansion.
[definition: Non-Degenerate Critical Point]
Let $\varphi : U \to \mathbb{R}$ be a smooth function defined on an open set $U \subseteq \mathbb{R}^n$. A point $x_0 \in U$ is a **critical point** of $\varphi$ if $\nabla \varphi(x_0) = 0$. The critical point is **non-degenerate** if the Hessian matrix
\begin{align*}
H_\varphi(x_0) = \left(\frac{\partial^2 \varphi}{\partial x_i \partial x_j}(x_0)\right)_{1 \le i, j \le n}
\end{align*}
is invertible, i.e., $\det H_\varphi(x_0) \ne 0$.
[/definition]
At a non-degenerate critical point, the phase behaves locally like a non-degenerate quadratic form, and this governs the integral.
[quotetheorem:636]
[citeproof:636]
[explanation: The Role of the Hessian Determinant]
The factor $|\det H_\varphi(x_0)|^{-1/2}$ in the leading term has a direct geometric meaning. At a non-degenerate critical point $x_0$, the phase $\varphi$ grows quadratically in each direction. The region where the phase $\lambda\varphi(x)$ differs from $\lambda\varphi(x_0)$ by less than order $1$ — the region of effective non-cancellation — is an ellipsoid of volume approximately $(2\pi/\lambda)^{n/2} |\det H_\varphi(x_0)|^{-1/2}$. The integral over this region of an amplitude of order $1$ is exactly this volume, which is the leading term. A large Hessian determinant means sharp curvature in all directions, a small effective ellipsoid, and hence a smaller integral. A small Hessian determinant (approaching a degenerate critical point) means flat curvature in some direction, a large effective region, and a larger integral.
The signature phase factor $e^{i\pi\,\mathrm{sgn}/4}$ arises from the Gaussian integrals over directions of positive and negative curvature. For $\int_{\mathbb{R}} e^{i\lambda t^2/2}\, d\mathcal{L}^1(t)$ the contour is rotated by $45^\circ$ (adding a phase $e^{i\pi/4}$), while for $\int_{\mathbb{R}} e^{-i\lambda t^2/2}\, d\mathcal{L}^1(t)$ it contributes $e^{-i\pi/4}$. The product over all $n$ coordinates gives $e^{i\pi \,\mathrm{sgn}/4}$.
[/explanation]
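The one-dimensional Gaussian computation can be checked numerically. The following is a minimal sketch, assuming the amplitude $\psi(t) = e^{-t^2}$ (a Schwartz stand-in for a compactly supported bump, chosen here only for illustration) and the phase $\varphi(t) = t^2/2$; the function names are hypothetical. It compares a direct quadrature of $\int e^{i\lambda t^2/2}\psi(t)\,d\mathcal{L}^1(t)$ with the predicted leading term $(2\pi/\lambda)^{1/2} e^{i\pi/4}\psi(0)$.

```python
import numpy as np

def oscillatory_integral(lam, half_width=8.0, num=400_001):
    """Riemann-sum approximation of the integral of exp(i*lam*t^2/2) * exp(-t^2) over R.

    The Gaussian amplitude is about e^{-64} at |t| = half_width, so truncating
    the real line to [-half_width, half_width] is harmless at double precision.
    """
    t = np.linspace(-half_width, half_width, num)
    dt = t[1] - t[0]
    integrand = np.exp(1j * lam * t**2 / 2) * np.exp(-t**2)
    return np.sum(integrand) * dt

def leading_term(lam):
    """Stationary-phase prediction for n = 1, Hessian 1, signature +1, psi(0) = 1."""
    return np.sqrt(2 * np.pi / lam) * np.exp(1j * np.pi / 4)

if __name__ == "__main__":
    for lam in (50.0, 200.0, 800.0):
        value = oscillatory_integral(lam)
        prediction = leading_term(lam)
        rel_err = abs(value - prediction) / abs(prediction)
        print(f"lambda={lam:7.1f}  relative error={rel_err:.2e}")
```

For this particular amplitude the integral is an explicit complex Gaussian, and the relative error of the leading term is of size $1/\lambda$, in line with the next term of the expansion being one power of $\lambda^{-1}$ smaller.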
### The Case Without Stationary Points
When the amplitude $\psi$ is supported away from all critical points of $\varphi$, the integral decays faster than any power of $\lambda$.
[quotetheorem:3201]
[citeproof:3201]
This theorem clarifies the structure of the proof of the stationary-phase lemma: one first cuts off near the critical point (the remainder is rapidly decaying by this theorem), then handles the localised integral by the Gaussian computation.
## Asymptotic Expansions and Full Amplitude
The leading-order term in the stationary-phase expansion involves only the phase value, the Hessian, and the amplitude value at the critical point. The full expansion is an asymptotic series in powers of $\lambda^{-1}$.
[quotetheorem:636]
[citeproof:636]
The coefficients $a_j$ are obtained by expanding $\psi(x)$ and the Jacobian of the Morse-lemma coordinate change in a Taylor series around $x_0$, then integrating term by term against the Gaussian $e^{i\lambda y^\top A y/2}$. Each monomial $y^\alpha$ in the Taylor expansion contributes a factor $\lambda^{-|\alpha|/2}$ relative to the leading term. Monomials of odd total degree integrate to zero against the even Gaussian, while those of even degree $|\alpha| = 2j$ contribute at relative order $\lambda^{-j}$. This is why only integer powers of $\lambda^{-1}$ appear in the expansion.
The proof that this course provides is a proof sketch. The full rigorous justification of the remainder estimates requires careful handling of the Morse coordinate change and the Taylor expansion of the Jacobian; see Stein, *Harmonic Analysis* (1993), Chapter VIII, for the complete treatment.
[example: The Fourier Transform of Surface Measure on the Sphere]
Let $S^{n-1}$ be the unit sphere in $\mathbb{R}^n$ with $n \ge 2$, and let $d\sigma$ denote the surface measure. The Fourier transform of $d\sigma$ is
\begin{align*}
\widehat{d\sigma}(\xi) = \int_{S^{n-1}} e^{-i\xi \cdot \omega}\, d\sigma(\omega).
\end{align*}
By rotational symmetry $\widehat{d\sigma}(\xi)$ depends only on $|\xi|$, so write $\lambda = |\xi|$ and take $e = \xi/|\xi|$. Parametrising near the poles $\pm e$, the phase at the north pole is $\varphi(\omega) = -e \cdot \omega$, which has a non-degenerate minimum at $\omega = e$ (since the Hessian restricted to $T_e S^{n-1}$ is the $(n-1) \times (n-1)$ identity). The stationary-phase lemma applied in the $(n-1)$-dimensional tangent coordinates gives
\begin{align*}
\widehat{d\sigma}(\xi) = c_+ e^{-i|\xi|} |\xi|^{-(n-1)/2} + c_- e^{i|\xi|} |\xi|^{-(n-1)/2} + O(|\xi|^{-(n+1)/2})
\end{align*}
for $|\xi| \to \infty$, where $c_\pm$ are explicit constants depending on $n$. The two terms come from the two stationary points $\omega = \pm e$. The leading-order bound $|\widehat{d\sigma}(\xi)| \lesssim |\xi|^{-(n-1)/2}$ is the key decay input in the Stein–Tomas theorem.
To see concretely why the exponent is $-(n-1)/2$ and not $-n/2$: the stationary phase is computed on the $(n-1)$-dimensional sphere, not in all of $\mathbb{R}^n$. The Hessian at each stationary point has $n-1$ non-zero eigenvalues (the sphere has $n-1$ dimensions of curvature), so the stationary-phase rate $\lambda^{-m/2}$ in $m$ variables, applied with $m = n-1$, gives $|\xi|^{-(n-1)/2}$.
[/example]
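For $n = 3$ the integral can be computed in closed form, which gives a quick consistency check on the expansion above. The following is a worked computation, assuming the convention $\widehat{d\sigma}(\xi) = \int e^{-i\xi\cdot\omega}\,d\sigma(\omega)$ used in this example and writing $\lambda = |\xi|$; the variable $s = e \cdot \omega$ is introduced here for the calculation. Slices of $S^2$ at constant height $s$ carry surface measure $2\pi\, d\mathcal{L}^1(s)$, so
\begin{align*}
\widehat{d\sigma}(\xi) = 2\pi \int_{-1}^{1} e^{-i\lambda s}\, d\mathcal{L}^1(s) = 4\pi\, \frac{\sin \lambda}{\lambda} = \frac{2\pi i}{\lambda}\, e^{-i\lambda} - \frac{2\pi i}{\lambda}\, e^{i\lambda}.
\end{align*}
The decay rate is $\lambda^{-1} = \lambda^{-(n-1)/2}$, the two exponentials correspond to the two poles $\omega = \pm e$, and in this normalisation $c_+ = 2\pi i$ and $c_- = -2\pi i$. In this special case the remainder term vanishes identically.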
## Connection to the Method of Descent and Multidimensional Integrals
The Van der Corput lemma in dimension $n$ follows from the one-dimensional version by a Fubini-type argument when the phase decomposes as a sum over coordinates, $\varphi(x) = \sum_j \varphi_j(x_j)$, and the amplitude as a product: if $|\varphi_j^{(k)}| \ge \lambda$ for each $j$, integrating over each variable in succession gives $\lambda^{-n/k}$, which for $k = 2$ matches the stationary-phase rate $\lambda^{-n/2}$. For a general phase $\varphi(x_1, \ldots, x_n)$ with $|\partial_{x_j}^k \varphi| \ge \lambda$ uniformly, however, the inner integral is no longer an oscillatory integral of the same form in the remaining variables, so the one-dimensional lemma can be applied in only one variable and yields just $\lambda^{-1/k}$. Recovering the full $\lambda^{-n/2}$ at a non-degenerate critical point requires the quadratic geometry of the Morse lemma; it cannot be obtained by one-dimensional arguments alone.
[remark: Degenerate Critical Points]
When $\det H_\varphi(x_0) = 0$ the critical point is degenerate and the $\lambda^{-n/2}$ bound no longer holds. The decay rate depends on the order of vanishing of $\det H_\varphi$ as one approaches $x_0$. For example, if $n = 1$ and $\varphi(t) = t^k$ near $t = 0$ (so the critical point has order $k-1$), the stationary-phase integral decays like $\lambda^{-1/k}$, consistent with Van der Corput at order $k$. The classification of degenerate critical points (in the sense of singularity theory) leads to a complete description of possible decay rates, but this is treated in the singularity-theory literature (see, e.g., Arnold–Gusein-Zade–Varchenko).
[/remark]
## Summary and Forward Reference
The two theorems of this chapter serve distinct roles in what follows. The Van der Corput lemma is used in a direct and computational way: given an explicit bound on a derivative of the phase, it immediately yields a decay rate. The stationary-phase lemma is used more structurally: its conclusion that the Fourier transform of a curved surface measure decays like $|\xi|^{-(n-1)/2}$ is an input to the $TT^*$ argument in the Stein–Tomas theorem.
The general principle uniting both results is that curvature (of the phase, of a hypersurface) forces cancellation and produces decay. A flat phase (all derivatives small) or a flat surface (all principal curvatures zero, as for a hyperplane) gives no decay and no restriction estimate. The connection between curvature and cancellation, which is what the stationary-phase method makes precise, is one of the deepest structural features of harmonic analysis.
The next chapter puts these decay estimates to work on the restriction problem: whether the Fourier transform of an $L^p$ function can be meaningfully restricted to a hypersurface. The answer is governed by the curvature of the surface, with the stationary-phase decay of $\widehat{d\sigma}$ as the key input.
# 13. Restriction Theorems
The preceding chapter on stationary phase developed the key decay estimates for oscillatory integrals — in particular, the fact that the Fourier transform of surface measure on a curved hypersurface decays at a specific rate in the spatial variable. This chapter puts that decay to work on a question that at first sounds elementary: can the Fourier transform of an $L^p$ function be restricted to a curved hypersurface? The answer turns out to depend delicately on $p$ and on the geometry of the surface, and the Stein--Tomas theorem gives the sharp range of $p$ for which restriction to the sphere is possible in an $L^2$ sense.
## The Restriction Problem
The Fourier transform of a Schwartz function $f \in \mathcal{S}(\mathbb{R}^n)$ is a smooth, rapidly decaying function on all of $\mathbb{R}^n$, and its restriction to any smooth submanifold $S \subset \mathbb{R}^n$ is well-defined. The problem arises when we try to extend this operation to $L^p$ functions.
For $f \in L^2(\mathbb{R}^n)$, the Fourier transform $\hat{f}$ is an $L^2$ function, and an $L^2$ function is defined only up to sets of measure zero. A smooth hypersurface $S \subset \mathbb{R}^n$ has Lebesgue measure zero in $\mathbb{R}^n$, so $\hat{f}|_S$ is not defined in any obvious sense for $f \in L^2$. For $f \in L^1(\mathbb{R}^n)$, the Fourier transform $\hat{f}$ is continuous (and vanishes at infinity, by the Riemann-Lebesgue lemma), so restriction to $S$ is unambiguous; but $L^1$ is a much smaller space than we would like to work with.
The question is: for which exponents $p \in [1, 2]$ does the map $f \mapsto \hat{f}|_S$, defined initially on $\mathcal{S}(\mathbb{R}^n)$, extend to a bounded operator from $L^p(\mathbb{R}^n)$ to some $L^q(S, d\sigma)$, where $d\sigma$ denotes the surface measure on $S$?
[motivation]
### Why curvature matters
The restriction problem for flat hypersurfaces — such as a hyperplane $\{x_n = 0\}$ — is hopeless in the following sense. The Fourier transform of $f \in L^p(\mathbb{R}^n)$ for $p > 1$ need not be locally integrable on the hyperplane; there is no restriction inequality of the form $\|\hat{f}|_{S}\|_{L^q(d\sigma)} \lesssim \|f\|_{L^p(\mathbb{R}^n)}$ for any $q > 0$.
The situation changes when $S$ is a curved hypersurface with non-vanishing principal curvatures. The curvature makes the surface measure $d\sigma$ behave more like an $L^1$ measure from the Fourier perspective: its Fourier transform $\widehat{d\sigma}(x)$ decays as $|x| \to \infty$, and the rate of decay is governed by the curvature. For a hypersurface with $k$ non-vanishing principal curvatures, the stationary phase lemma gives
\begin{align*}
|\widehat{d\sigma}(x)| \lesssim |x|^{-k/2}.
\end{align*}
For the unit sphere $S^{n-1} \subset \mathbb{R}^n$, all $n-1$ principal curvatures are non-zero, so $\widehat{d\sigma}(x) = O(|x|^{-(n-1)/2})$. This decay is the key geometric input in the Stein--Tomas proof.
[/motivation]
[definition: Restriction Operator]
Let $S \subset \mathbb{R}^n$ be a smooth compact hypersurface with surface measure $d\sigma$, and let $1 \le p \le 2$, $1 \le q \le \infty$. The **restriction operator** $\mathcal{R}: \mathcal{S}(\mathbb{R}^n) \to L^q(S, d\sigma)$ is defined by
\begin{align*}
\mathcal{R}f = \hat{f}|_S.
\end{align*}
We say the **restriction estimate** $R(p \to q)$ holds for $S$ if $\mathcal{R}$ extends to a bounded operator $\mathcal{R}: L^p(\mathbb{R}^n) \to L^q(S, d\sigma)$, that is, if
\begin{align*}
\|\hat{f}\|_{L^q(S, d\sigma)} \lesssim \|f\|_{L^p(\mathbb{R}^n)}
\end{align*}
holds for all $f \in \mathcal{S}(\mathbb{R}^n)$ with constant independent of $f$.
[/definition]
The most natural target space is $L^2(S, d\sigma)$, both because $d\sigma$ is the geometrically natural measure on $S$ and because the $TT^*$ method (the proof technique) naturally delivers $L^2$ control on $S$.
### Necessary conditions from scaling
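Before asking what the correct range of exponents is, two test computations constrain the problem. The sphere is not invariant under dilations, so a pure scaling argument $f \mapsto f(\lambda \cdot)$ gives no useful condition by itself; the constraints come instead from the dual (extension) form of the estimate, $\|\widehat{g\, d\sigma}\|_{L^{p'}(\mathbb{R}^n)} \lesssim \|g\|_{L^{q'}(d\sigma)}$. Taking $g \equiv 1$, the stationary-phase asymptotics show that $|\widehat{d\sigma}(x)|$ is of size $|x|^{-(n-1)/2}$ on a large portion of space as $|x| \to \infty$, so $\widehat{d\sigma} \in L^{p'}(\mathbb{R}^n)$ only if $p'(n-1)/2 > n$, that is, $p' > 2n/(n-1)$, which translates to the constraint $p < 2n/(n+1)$.
This first condition constrains $p$ but says nothing about the target exponent $q$. A more refined necessary condition comes from testing the inequality against specific functions concentrated near a small cap of $S^{n-1}$ and letting the cap shrink (the Knapp example): it gives $q \le \frac{n-1}{n+1}\, p'$, which for the target $q = 2$ reads $p \le 2(n+1)/(n+3)$.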
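The cap test mentioned above can be made explicit. The following is a sketch of the standard Knapp computation in the dual (extension) form $\|\widehat{g\,d\sigma}\|_{L^{p'}(\mathbb{R}^n)} \lesssim \|g\|_{L^{q'}(d\sigma)}$; the cap $C_\delta$, the tube $T_\delta$, and the parameter $\delta$ are notation introduced here. Take $g = \mathbb{1}_{C_\delta}$, where $C_\delta \subset S^{n-1}$ is a cap of radius $\delta \ll 1$. The cap lies within $O(\delta^2)$ of its tangent plane, so there is no cancellation in $\widehat{g\,d\sigma}$, and $|\widehat{g\,d\sigma}| \gtrsim \sigma(C_\delta) \sim \delta^{n-1}$, on a tube $T_\delta$ of length about $\delta^{-1}$ in the $n-1$ tangential directions and about $\delta^{-2}$ in the normal direction. Hence
\begin{align*}
\|g\|_{L^{q'}(d\sigma)} \sim \delta^{(n-1)/q'}, \qquad \|\widehat{g\,d\sigma}\|_{L^{p'}(\mathbb{R}^n)} \gtrsim \delta^{n-1}\, |T_\delta|^{1/p'} \sim \delta^{\,n-1-(n+1)/p'}.
\end{align*}
Letting $\delta \to 0$, the estimate can hold only if $n - 1 - (n+1)/p' \ge (n-1)/q'$, that is, $q \le \frac{n-1}{n+1}\, p'$. For $q = 2$ this becomes $p' \ge 2(n+1)/(n-1)$, or equivalently $p \le 2(n+1)/(n+3)$.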
[definition: Restriction Conjecture]
The **Stein restriction conjecture** for $S^{n-1}$ asserts that the restriction estimate $R(p \to q)$ holds, that is,
\begin{align*}
\|\hat{f}\|_{L^q(S^{n-1}, d\sigma)} \lesssim \|f\|_{L^p(\mathbb{R}^n)},
\end{align*}
for all exponents with $1 \le p < 2n/(n+1)$ and $q \le \frac{n-1}{n+1}\, p'$. For the target $q = 2$ the second condition reads $p \le 2(n+1)/(n+3)$.
[/definition]
[remark: Status of the Conjecture]
The restriction conjecture is fully resolved only for $n = 2$ (by Fefferman in 1970 and Zygmund in 1974). For $n \ge 3$ it remains open. The Stein--Tomas theorem, proved in the 1970s, establishes the estimate with target $q = 2$ in the range $p \le 2(n+1)/(n+3)$. Subsequent progress, by Bourgain, Wolff, Tao, Guth, and others through bilinear and multilinear estimates and polynomial partitioning, has progressively widened the known range of $p$, but the conjecture in full generality is open for all $n \ge 3$.
[/remark]
## The Stein--Tomas Theorem
The Stein--Tomas theorem is the landmark result on restriction to the sphere. It establishes the restriction estimate $R(p \to 2)$ for $p \le 2(n+1)/(n+3)$. This range is strictly smaller than the conjectured range $p < 2n/(n+1)$, which can only be reached with target exponents $q < 2$, but it already requires a genuinely substantive proof. The proof proceeds via the $TT^*$ method, a general Hilbert-space technique that replaces a bound for $T$ by a bound for the composite of $T$ with its adjoint, combined with the stationary-phase decay of $\widehat{d\sigma}$.
[quotetheorem:3202]
[citeproof:3202]
### The dual extension formulation
The Stein--Tomas theorem admits an equivalent dual form that is often more convenient for applications. If $T: L^p \to L^2(d\sigma)$ is bounded, its adjoint $T^*: L^2(d\sigma) \to L^{p'}$ is bounded with the same operator norm. Writing this out explicitly gives the extension theorem.
[quotetheorem:3203]
[citeproof:3203]
[remark: Restriction vs Extension]
The restriction and extension formulations are dual to each other and carry equal information. In practice, the extension formulation is often the more useful one: it says that functions whose Fourier transform is supported on the sphere — the so-called **Strichartz regime** — enjoy improved integrability in physical space. This is the bridge to the Strichartz estimates of the next chapter.
[/remark]
The passage from $p = 2(n+1)/(n+3)$ (the Stein--Tomas range) to $p < 2n/(n+1)$ (the conjectured range, necessarily with a target exponent $q < 2$) is a genuine open problem for $n \ge 3$. Every improvement requires additional geometric input beyond the $TT^*$ method.
## The Decay of the Fourier Transform of Surface Measure
The central geometric computation underlying the Stein--Tomas theorem is the decay rate of $\widehat{d\sigma}(x)$ as $|x| \to \infty$. We record this as a standalone result, since it is used independently of the Stein--Tomas theorem in oscillatory integral estimates.
[quotetheorem:3204]
[citeproof:3204]
[example: Explicit computation for $n = 2$]
For $n = 2$, the sphere $S^1$ is the unit circle in $\mathbb{R}^2$ and $d\sigma$ is arc length. Taking $x = (0, r)$:
\begin{align*}
\widehat{d\sigma}(x) = \int_0^{2\pi} e^{-i r \sin\theta}\, d\theta = 2\pi J_0(r),
\end{align*}
where $J_0$ is the Bessel function of order zero. The classical asymptotics $J_0(r) = \sqrt{2/(\pi r)} \cos(r - \pi/4) + O(r^{-3/2})$ as $r \to \infty$ give exactly $|\widehat{d\sigma}(x)| \lesssim |x|^{-1/2}$, which matches the general formula $(n-1)/2 = 1/2$ for $n = 2$. The oscillatory factor $\cos(r - \pi/4)$ reflects the constructive and destructive interference of the two stationary-phase contributions from the top and bottom of the circle.
[/example]
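The Bessel asymptotics in this example are easy to test numerically. The following is a minimal sketch using only NumPy; the function names are hypothetical. It evaluates $\int_0^{2\pi} e^{-ir\sin\theta}\,d\theta$ with an equally spaced rule, which is very accurate for smooth periodic integrands once the number of nodes is well above $r$, and compares the result with $2\pi\sqrt{2/(\pi r)}\cos(r - \pi/4)$.

```python
import numpy as np

def sigma_hat_circle(r, num=4096):
    """Approximate the integral of exp(-i r sin(theta)) over [0, 2*pi).

    Equally spaced nodes on a full period; the mean times 2*pi is the
    periodic trapezoid rule.
    """
    theta = np.linspace(0.0, 2.0 * np.pi, num, endpoint=False)
    return 2.0 * np.pi * np.mean(np.exp(-1j * r * np.sin(theta)))

def leading_asymptotic(r):
    """2*pi times the leading term sqrt(2/(pi r)) * cos(r - pi/4) of J_0(r)."""
    return 2.0 * np.pi * np.sqrt(2.0 / (np.pi * r)) * np.cos(r - np.pi / 4)

if __name__ == "__main__":
    for r in (10.0, 100.0, 1000.0):
        value = sigma_hat_circle(r)
        error = abs(value - leading_asymptotic(r))
        # The scaled error should stay bounded, reflecting the O(r^{-3/2}) remainder.
        print(f"r={r:7.1f}  value={value.real:+.6f}  r^1.5 * error={r**1.5 * error:.3f}")
```

The imaginary part of the computed value is zero up to rounding, as expected from the symmetry $\theta \mapsto 2\pi - \theta$ of the integrand.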
## Structure of the $TT^*$ Argument
The $TT^*$ method that drives the Stein--Tomas proof is a general tool that recurs throughout harmonic analysis, and it is worth understanding it abstractly before applying it to restriction.
[explanation: The $TT^*$ Method]
Let $T: H_1 \to H_2$ be a bounded linear operator between Hilbert spaces, with adjoint $T^*: H_2 \to H_1$. The fundamental identity
\begin{align*}
\|T\|_{\mathcal{L}(H_1, H_2)}^2 = \|TT^*\|_{\mathcal{L}(H_2, H_2)}
\end{align*}
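holds because $\|Tf\|_{H_2}^2 = (Tf, Tf)_{H_2} = (T^*Tf, f)_{H_1} \le \|T^*T\|_{\mathcal{L}(H_1,H_1)} \|f\|_{H_1}^2$, which gives $\|T\|^2 \le \|T^*T\| \le \|T^*\| \|T\| = \|T\|^2$ and hence $\|T\|^2 = \|T^*T\|_{\mathcal{L}(H_1)}$. Applying the same argument to $T^*$ in place of $T$ gives $\|T^*\|^2 = \|TT^*\|_{\mathcal{L}(H_2)}$, and $\|T\| = \|T^*\|$.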
In the restriction context the target $H_2 = L^2(S^{n-1}, d\sigma)$ is a Hilbert space, but the domain $L^p(\mathbb{R}^n)$ of $Tf = \hat{f}|_{S^{n-1}}$ is not when $p \ne 2$. The $TT^*$ argument nonetheless works via a duality chain. The restriction estimate $\|Tf\|_{L^2(d\sigma)} \lesssim \|f\|_{L^p}$ is equivalent (by duality) to the extension estimate $\|T^*g\|_{L^{p'}} \lesssim \|g\|_{L^2(d\sigma)}$, and both follow from an $L^p \to L^{p'}$ bound for the composite $T^*T$, since $\|Tf\|_{L^2(d\sigma)}^2 = (T^*Tf, f) \le \|T^*Tf\|_{L^{p'}} \|f\|_{L^p}$ by Hölder. (Starting instead from the extension operator $E = T^*$, the same composite is $EE^*$, which is why the method keeps its name.)
The formula for the kernel of the composite: for $g \in L^2(S^{n-1})$,
\begin{align*}
T^*g(x) &= \int_{S^{n-1}} g(\omega) e^{ix \cdot \omega}\, d\sigma(\omega) = \widehat{g\, d\sigma}(-x),
\end{align*}
so
\begin{align*}
T^*Tf(x) &= \int_{S^{n-1}} \hat{f}(\omega) e^{ix \cdot \omega}\, d\sigma(\omega) = \int_{\mathbb{R}^n} f(y)\, \widehat{d\sigma}(y - x)\, d\mathcal{L}^n(y).
\end{align*}
Since $\widehat{d\sigma}$ is radial, the composite is thus convolution on $\mathbb{R}^n$ with the function $\widehat{d\sigma}$, and the restriction estimate reduces to the bound $\|f * \widehat{d\sigma}\|_{L^{p'}} \lesssim \|f\|_{L^p}$. Bounding the kernel by $|x|^{-(n-1)/2}$ and applying the Hardy--Littlewood--Sobolev inequality already gives this for $p \le 4n/(3n+1)$; the full Stein--Tomas range requires exploiting the oscillation of $\widehat{d\sigma}$ as well, typically through a dyadic decomposition of the kernel and interpolation.
[/explanation]
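The fundamental identity $\|T\|^2 = \|TT^*\|$ can be sanity-checked in finite dimensions, where $T$ is a matrix and the operator norm is the largest singular value. The following is a minimal sketch, assuming a random complex matrix as a stand-in for $T$; the helper names are hypothetical.

```python
import numpy as np

def operator_norm(a):
    """Operator norm of a matrix between Euclidean spaces (largest singular value)."""
    return np.linalg.norm(a, 2)

def tt_star_norms(rows=5, cols=8, seed=0):
    """Return (||T||^2, ||T T*||, ||T* T||) for a random complex matrix T."""
    rng = np.random.default_rng(seed)
    t = rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))
    t_star = t.conj().T  # adjoint with respect to the standard inner products
    return operator_norm(t) ** 2, operator_norm(t @ t_star), operator_norm(t_star @ t)

if __name__ == "__main__":
    norm_sq, norm_tt_star, norm_t_star_t = tt_star_norms()
    print(norm_sq, norm_tt_star, norm_t_star_t)  # the three values agree up to rounding
```

Note that $TT^*$ and $T^*T$ act on spaces of different dimension here (5 and 8), yet have the same norm, which is the point of the identity.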
[remark: Sharpness of the Stein--Tomas Range]
The exponent $p = 2(n+1)/(n+3)$ is sharp for the $L^2$ restriction estimate. On one side, the endpoint of the Stein--Tomas theorem is achieved, meaning the bound holds at $p = 2(n+1)/(n+3)$ itself, not merely for strictly smaller $p$. On the other side, testing the inequality on Knapp-type examples (functions concentrated near a small spherical cap) shows that the $L^2(d\sigma)$-to-$L^{p'}$ bound on $T^*$ fails for every larger $p$, whatever the method of proof. Going beyond the Stein--Tomas exponent therefore requires replacing the target $L^2(d\sigma)$ by $L^q(d\sigma)$ with $q < 2$.
[/remark]
<!-- illustration-needed: the restriction problem geometry — show the sphere S^{n-1} inside frequency space R^n, with a Schwartz function f in physical space, its Fourier transform hat(f) defined on all of R^n, and the restriction hat(f)|_{S^{n-1}} as a function on the sphere carrying L^2(d sigma) norm -->
Restriction results on hypersurfaces extend to broader wave equation estimates capturing how dispersive effects influence solution behavior. Strichartz estimates bound mixed space-time norms of solutions to dispersive PDE, combining restriction theory with energy methods to quantify decay.
# 14. Strichartz Estimates
The Strichartz estimates are mixed-norm space-time bounds for solutions to the free Schrödinger equation. They convert the pointwise dispersive decay of the propagator $e^{it\Delta}$ — which shrinks amplitude and spreads mass across space as time grows — into global integrability over the space-time cylinder $\mathbb{R}_t \times \mathbb{R}^n_x$. The key inputs are the dispersive estimate, Hardy–Littlewood–Sobolev in the time variable, and the $TT^*$ method that also appeared in the proof of the Stein–Tomas theorem in Chapter 13. These estimates are indispensable in nonlinear dispersive PDE: they provide the a priori control needed to run fixed-point arguments for equations like the nonlinear Schrödinger equation.
## The Free Schrödinger Propagator
The question motivating this section is: how does an initial datum $f \in L^2(\mathbb{R}^n)$ evolve under the free Schrödinger equation $i\partial_t u + \Delta u = 0$? On the Fourier side, the equation becomes $i\partial_t \hat{u}(t,\xi) = |\xi|^2 \hat{u}(t,\xi)$, which gives the explicit solution $\hat{u}(t,\xi) = e^{-it|\xi|^2} \hat{f}(\xi)$. Inverting the Fourier transform in $\xi$ produces a formula for $u(t,x)$ as a convolution of $f$ against an explicit kernel.
[definition: Schrödinger Propagator]
For $f \in \mathcal{S}(\mathbb{R}^n)$ and $t \neq 0$, the **free Schrödinger propagator** is the operator $e^{it\Delta} : \mathcal{S}(\mathbb{R}^n) \to C^\infty(\mathbb{R}^n)$ defined by
\begin{align*}
e^{it\Delta} f(x) := \mathcal{F}^{-1}\bigl(e^{-it|\xi|^2} \hat{f}(\xi)\bigr)(x) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} e^{i x \cdot \xi} e^{-it|\xi|^2} \hat{f}(\xi) \, d\mathcal{L}^n(\xi).
\end{align*}
This can also be written as a spatial convolution: by completing the square in the exponent, one computes $e^{it\Delta}f(x) = K_t * f(x)$ where the **Schrödinger kernel** is
\begin{align*}
K_t(x) = \frac{1}{(4\pi i t)^{n/2}} e^{i|x|^2/(4t)}, \qquad t \neq 0.
\end{align*}
The function $u(t,x) := e^{it\Delta}f(x)$ is the unique solution in $C(\mathbb{R}; L^2(\mathbb{R}^n))$ of the Cauchy problem $i\partial_t u + \Delta u = 0$ with $u(0,\cdot) = f$.
[/definition]
Two fundamental estimates govern $e^{it\Delta}$. The first is conservation of $L^2$ mass, which follows directly from Plancherel and the fact that $|e^{-it|\xi|^2}| = 1$. The second is the dispersive estimate, which captures the spreading of mass over time.
[quotetheorem:3205]
[citeproof:3205]
The dispersive estimate says the amplitude of the solution decays like $|t|^{-n/2}$ as $t \to \infty$. This reflects the physical spreading of the wave packet: energy, which is proportional to amplitude squared, spreads over a region of volume $\sim |t|^n$ in the spatial variable, so the peak amplitude shrinks accordingly. The exponent $n/2$ is the same as the $O(\lambda^{-n/2})$ rate in the stationary-phase lemma, because the kernel $K_t(x)$ is itself an oscillatory integral.
By $L^2$ conservation, $e^{it\Delta}$ extends to a unitary operator on $L^2(\mathbb{R}^n)$ for every $t \in \mathbb{R}$.
[remark: Kernel Computation]
The formula $K_t(x) = (4\pi it)^{-n/2} e^{i|x|^2/(4t)}$ is obtained by computing the inverse Fourier transform of $e^{-it|\xi|^2}$. Completing the square, $-it|\xi|^2 + ix\cdot\xi = -it|\xi - x/(2t)|^2 + i|x|^2/(4t)$, and the Gaussian integral $\int_{\mathbb{R}^n} e^{-it|y|^2} d\mathcal{L}^n(y) = (\pi/it)^{n/2}$, understood as an oscillatory integral, provides the constant. The factor $(4\pi it)^{-n/2}$ requires interpreting $(it)^{n/2}$ via the principal branch, with $\arg(it) = \pi/2$ for $t > 0$ and $\arg(it) = -\pi/2$ for $t < 0$; in either case $|K_t(x)| = (4\pi |t|)^{-n/2}$.
[/remark]
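Both estimates can be observed on a concrete datum. The following is a minimal numerical sketch in dimension $n = 1$, assuming the Gaussian datum $f(x) = e^{-x^2/2}$ and a large periodic grid as a stand-in for $\mathbb{R}$ (both are choices made here for illustration, and the function names are hypothetical). It applies the multiplier $e^{-it\xi^2}$ with the FFT and compares $\|e^{it\Delta}f\|_{L^\infty}$ with the bound $(4\pi|t|)^{-1/2}\|f\|_{L^1}$ implied by $|K_t(x)| = (4\pi|t|)^{-n/2}$.

```python
import numpy as np

def free_schrodinger_1d(f_vals, dx, t):
    """Apply the Fourier multiplier exp(-i t xi^2) to samples on a periodic grid."""
    xi = 2.0 * np.pi * np.fft.fftfreq(f_vals.size, d=dx)
    return np.fft.ifft(np.exp(-1j * t * xi**2) * np.fft.fft(f_vals))

def dispersive_demo(t, half_length=200.0, num=2**14):
    """Return (sup norm of u(t), dispersive bound, L^2 mass of u(t)) for Gaussian data."""
    x = np.linspace(-half_length, half_length, num, endpoint=False)
    dx = x[1] - x[0]
    f = np.exp(-x**2 / 2)
    u = free_schrodinger_1d(f, dx, t)
    sup_norm = np.max(np.abs(u))
    l1_norm = np.sum(np.abs(f)) * dx
    mass = np.sum(np.abs(u) ** 2) * dx
    bound = (4.0 * np.pi * abs(t)) ** -0.5 * l1_norm
    return sup_norm, bound, mass

if __name__ == "__main__":
    for t in (1.0, 5.0, 20.0):
        sup_norm, bound, mass = dispersive_demo(t)
        print(f"t={t:5.1f}  sup|u|={sup_norm:.5f}  bound={bound:.5f}  mass={mass:.5f}")
```

For this datum the solution is an explicit complex Gaussian with $\|u(t)\|_{L^\infty} = (1+4t^2)^{-1/4}$, which approaches the bound $(2|t|)^{-1/2}$ from below as $|t|$ grows, while the mass stays at $\sqrt{\pi}$.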
## Admissible Pairs and the Strichartz Estimates
The dispersive estimate alone only gives $L^\infty_x$ control at a single time. To obtain global space-time bounds, we integrate over $t$ using Hardy–Littlewood–Sobolev in the time variable, combined with the $TT^*$ method to handle cross-terms. The precise range of exponents for which this works is encoded in the admissibility condition.
[definition: Admissible Pair]
A pair of exponents $(q, r)$ with $2 \leq q, r \leq \infty$ is called **admissible** (for the Schrödinger equation in dimension $n$) if
\begin{align*}
\frac{2}{q} + \frac{n}{r} = \frac{n}{2},
\end{align*}
and the endpoint $(q, r, n) = (2, \infty, 2)$ is excluded.
[/definition]
The constraint $2/q + n/r = n/2$ is the scaling condition. The Schrödinger equation has the scaling symmetry $u(t,x) \mapsto \lambda^{n/2} u(\lambda^2 t, \lambda x)$, and the mixed norm $\|u\|_{L^q_t L^r_x}$ is invariant under this scaling precisely when $2/q + n/r = n/2$. The conditions $q \geq 2$ and $r \geq 2$ come from the direction of the inequality. The triple $(2, \infty, 2)$ is excluded because the estimate is false there (Montgomery-Smith, 1998). The other endpoints with $q = 2$, namely $(q, r) = (2, 2n/(n-2))$ for $n \geq 3$, are admissible, but the $TT^*$ argument with Hardy–Littlewood–Sobolev fails to close there; they were established by a different method (Keel–Tao, 1998), and the course treats only the non-endpoint case.
[example: Specific Admissible Pairs]
In dimension $n = 3$, the admissibility condition becomes $2/q + 3/r = 3/2$. Several important pairs are:
- $(q, r) = (2, 6)$: here $2/2 + 3/6 = 1 + 1/2 = 3/2$. This is the endpoint pair $(2, 2n/(n-2))$ for $n = 3$; the space $L^2_t L^6_x(\mathbb{R} \times \mathbb{R}^3)$ appears in the analysis of the cubic NLS.
- $(q, r) = (4, 3)$: here $2/4 + 3/3 = 1/2 + 1 = 3/2$.
- $(q, r) = (\infty, 2)$: here $2/\infty + 3/2 = 3/2$, which gives the $L^\infty_t L^2_x$ estimate — this is just $L^2$ conservation.
In dimension $n = 1$, the condition $2/q + 1/r = 1/2$ requires $q \geq 4$. The extreme pair $(q, r) = (4, \infty)$ is admissible: since $q = 4 > 2$, it is not an endpoint in the Keel–Tao sense and is covered by the standard argument.
In dimension $n = 2$, the pair $(2, \infty)$ is excluded; the simplest admissible pair is $(q, r) = (4, 4)$: $2/4 + 2/4 = 1/2 + 1/2 = 1 = 2/2$.
[/example]
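The arithmetic in these examples is easy to mechanise. The following is a small sketch of a checker for the admissibility condition as stated in the definition above, using exact rational arithmetic; the function names are hypothetical, and `math.inf` stands for the exponent $\infty$.

```python
import math
from fractions import Fraction

def reciprocal(p):
    """Return 1/p as an exact fraction, with the convention 1/infinity = 0."""
    return Fraction(0) if p == math.inf else 1 / Fraction(p)

def is_admissible(q, r, n):
    """Check 2/q + n/r = n/2 with 2 <= q, r <= infinity, excluding (2, inf) when n = 2."""
    if q < 2 or r < 2:
        return False
    if (q, r, n) == (2, math.inf, 2):
        return False
    return 2 * reciprocal(q) + n * reciprocal(r) == Fraction(n, 2)

if __name__ == "__main__":
    examples = [(2, 6, 3), (4, 3, 3), (math.inf, 2, 3), (4, 4, 2), (2, math.inf, 2)]
    for q, r, n in examples:
        print(f"n={n}: (q, r)=({q}, {r}) admissible? {is_admissible(q, r, n)}")
```

Non-integer exponents can be passed as `Fraction` values, for instance `is_admissible(Fraction(10, 3), Fraction(10, 3), 3)` for the diagonal pair $q = r = 2(n+2)/n$ in dimension 3.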
The admissibility condition is necessary, not just sufficient: scaling $u(t,x) \mapsto \lambda^{n/2} u(\lambda^2 t, \lambda x)$ fixes $\|f\|_{L^2}$ and scales $\|e^{it\Delta}f\|_{L^q_t L^r_x}$ by $\lambda^{n/2 - n/r - 2/q}$, so dimensional consistency requires $2/q + n/r = n/2$.
[quotetheorem:639]
[citeproof:639]
The $TT^*$ method deserves emphasis: instead of estimating $\|Tf\|$ directly, one squares the problem by computing $\|TT^*\|$. This converts the bilinear problem of bounding the $L^2$-inner product $\langle Tf, g\rangle$ into a convolution-type estimate on the kernel of $TT^*$, where Hardy–Littlewood–Sobolev applies. The same strategy appeared in the proof of the Stein–Tomas theorem in Chapter 13: there $TT^*$ had kernel $\widehat{d\sigma}$, which decays by stationary phase; here the kernel $e^{i(t-s)\Delta}$ decays by the dispersive estimate.
## The Retarded Strichartz Estimate and the Inhomogeneous Problem
The homogeneous estimate controls the free propagator applied to initial data. For nonlinear applications, one needs to control solutions to the inhomogeneous equation $i\partial_t u + \Delta u = F(t,x)$ with initial data $u(0) = 0$. By Duhamel's formula, the solution is
\begin{align*}
u(t,x) = -i \int_0^t e^{i(t-s)\Delta} F(s,x) \, d\mathcal{L}^1(s).
\end{align*}
The question becomes: in what norms can $u$ be controlled in terms of $F$?
[quotetheorem:3206]
[citeproof:3206]
[remark: Retarded vs. Full Duhamel]
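In practice one uses the retarded (half-line) version $\int_0^t e^{i(t-s)\Delta}F(s)\,d\mathcal{L}^1(s)$ rather than $\int_\mathbb{R}$. The restriction to $s \in [0,t]$ cannot simply be dropped by monotonicity, since the integrand is oscillatory rather than positive. In the non-endpoint case, however, the proof bounds the kernel in absolute value by the dispersive estimate before applying Hardy–Littlewood–Sobolev in time, and inserting the cutoff $\mathbb{1}_{[0,t]}(s)$ only decreases that majorant, so the same argument gives the retarded estimate with the same constants when the same admissible pair is used on both sides. For mixed pairs the Christ–Kiselev lemma is the standard tool.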
[/remark]
## Application: Well-Posedness of the Cubic NLS
The reason Strichartz estimates matter is that they make nonlinear problems tractable. Consider the defocusing cubic nonlinear Schrödinger equation in dimension $n = 3$:
\begin{align*}
i\partial_t u + \Delta u = |u|^2 u, \quad u(0, \cdot) = f \in L^2(\mathbb{R}^3).
\end{align*}
By Duhamel's formula, a solution satisfies the integral equation
\begin{align*}
u(t) = e^{it\Delta}f - i\int_0^t e^{i(t-s)\Delta}(|u|^2 u)(s)\,d\mathcal{L}^1(s).
\end{align*}
A remark on criticality: cubic NLS in $\mathbb{R}^3$ is $\dot H^{1/2}$-critical (and $L^2$-supercritical), so the natural functional setting is $H^{1/2}$ or $H^1$, not $L^2$. We work with $H^1$ data and the admissible pair $(q,r) = (2,6)$ in dimension $n=3$.
The homogeneous Strichartz estimate gives $\|e^{it\Delta}f\|_{L^2_t L^6_x} \lesssim \|f\|_{L^2_x}$, and the same bound holds with one derivative on both sides. For the nonlinear term, one estimates $\||u|^2 u\|_{L^2_t L^{6/5}_x}$ using Hölder in $x$ with exponents $6$, $3$, $3$ (since $1/6 + 2 \cdot 1/3 = 1/6 + 2/3 = 5/6$), which bounds it by $\|u\|_{L^2_t L^6_x} \|u\|_{L^\infty_t L^3_x}^2$, and then applies the inhomogeneous Strichartz estimate. The factor $\|u\|_{L^\infty_t L^3_x}$ is not controlled by the $L^2$ norm; it is controlled by $\|u\|_{L^\infty_t H^{1/2}_x}$ through Sobolev embedding, which is where the regularity of the data enters. For $\|f\|_{H^1}$ small, this sets up a contraction mapping on a ball in $L^\infty_t H^1_x \cap L^2_t W^{1,6}_x$, yielding global well-posedness for small data; large data require a local-in-time variant of the argument.
This sketch illustrates the structure: Strichartz estimates convert the nonlinear term's growth into a manageable norm, and the fixed-point argument closes because the Strichartz norm is both controlled by the linear propagator and coercive enough to absorb the nonlinearity.
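The fixed-point bookkeeping behind this sketch can be made explicit. The following is a worked outline under stated assumptions: the space $X$ and the map $\Phi$ are notation introduced here, the Sobolev embedding $H^{1/2}(\mathbb{R}^3) \subset L^3(\mathbb{R}^3)$ is taken for granted, and the Strichartz estimates are used at the pair $(2,6)$ for both $u$ and $\nabla u$. Write $\|u\|_X = \|u\|_{L^\infty_t H^1_x} + \|u\|_{L^2_t W^{1,6}_x}$ and $\Phi(u)(t) = e^{it\Delta}f - i\int_0^t e^{i(t-s)\Delta}(|u|^2 u)(s)\,d\mathcal{L}^1(s)$. Since $\tfrac{5}{6} = \tfrac{1}{6} + \tfrac{1}{3} + \tfrac{1}{3}$ and $|\nabla(|u|^2 u)| \le 3|u|^2 |\nabla u|$,
\begin{align*}
\||u|^2 u\|_{L^2_t W^{1,6/5}_x} &\lesssim \|u\|_{L^2_t W^{1,6}_x}\, \|u\|_{L^\infty_t L^3_x}^2 \lesssim \|u\|_X^3, \\
\|\Phi(u)\|_X &\le C\|f\|_{H^1_x} + C\|u\|_X^3.
\end{align*}
If $\|f\|_{H^1} = \varepsilon$ and $\|u\|_X \le 2C\varepsilon$, then $\|\Phi(u)\|_X \le C\varepsilon + 8C^4\varepsilon^3 \le 2C\varepsilon$ as soon as $8C^3\varepsilon^2 \le 1$, so $\Phi$ maps the ball of radius $2C\varepsilon$ in $X$ to itself for small $\varepsilon$. The difference estimate needed for the contraction property follows the same pattern and is omitted here.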
[example: Failure of the Estimate at the Excluded Endpoint]
Consider dimension $n = 2$ and the excluded pair $(q, r) = (2, \infty)$. Formally, $2/2 + 2/\infty = 1 = 2/2$, which satisfies the admissibility relation. The estimate $\|e^{it\Delta}f\|_{L^2_t L^\infty_x} \lesssim \|f\|_{L^2_x}$ would require the function $t \mapsto \|e^{it\Delta}f\|_{L^\infty_x}$ to be square-integrable in $t$. The dispersive estimate gives $\|e^{it\Delta}f\|_\infty \leq C|t|^{-1}\|f\|_1$ for $f \in L^1 \cap L^2$, and $\int_1^\infty |t|^{-2} d\mathcal{L}^1(t) < \infty$, so for such $f$ there is no obstruction away from $t = 0$. The issue is the singularity at $t = 0$: as $t \to 0$, $e^{it\Delta}f \to f$ in $L^2$ but not necessarily in $L^\infty$, so $\|e^{it\Delta}f\|_\infty$ can blow up near $t = 0$ for general $f \in L^2$. In the $TT^*$ argument the time kernel is $|t-s|^{-n(1/2 - 1/r)} = |t-s|^{-1}$, so the Hardy–Littlewood–Sobolev step formally requires the Riesz potential exponent $\alpha = 1 - n(1/2 - 1/r) = 1 - 2(1/2) = 0$: the Riesz potential degenerates to convolution with $|t|^{-1}$, which is not integrable near the origin, and Hardy–Littlewood–Sobolev does not apply. This is the precise reason the standard argument breaks down at this endpoint; the estimate itself is known to be false there (Montgomery-Smith, 1998).
[/example]
## The Energy Space and Higher Regularity
The Strichartz estimate in Theorem 14.2 works for $f \in L^2(\mathbb{R}^n)$. For initial data in the Sobolev space $H^s(\mathbb{R}^n) = W^{s,2}(\mathbb{R}^n)$, stronger Strichartz estimates hold by commuting $e^{it\Delta}$ with the Bessel potential operator $J^s = (1-\Delta)^{s/2}$.
Since $J^s$ is a Fourier multiplier with symbol $(1+|\xi|^2)^{s/2}$ and $e^{it\Delta}$ has multiplier $e^{-it|\xi|^2}$, these two operators commute: $J^s e^{it\Delta} = e^{it\Delta} J^s$. Therefore
\begin{align*}
\|e^{it\Delta}f\|_{L^q_t W^{s,r}_x} = \|J^s e^{it\Delta} f\|_{L^q_t L^r_x} = \|e^{it\Delta} J^s f\|_{L^q_t L^r_x} \lesssim \|J^s f\|_{L^2_x} = \|f\|_{H^s_x},
\end{align*}
where the inequality is the standard Strichartz estimate applied to $J^s f \in L^2$. This gives $\|e^{it\Delta}f\|_{L^q_t W^{s,r}_x} \lesssim \|f\|_{H^s_x}$ for any admissible $(q,r)$.
[remark: Connection to Restriction Estimates]
There is a precise relationship between Strichartz estimates and the Fourier restriction theory of Chapter 13. The map $f \mapsto e^{it\Delta}f$ is essentially the Fourier extension operator associated with the paraboloid $\{(\xi, |\xi|^2) : \xi \in \mathbb{R}^n\} \subset \mathbb{R}^{n+1}$, which has non-vanishing Gaussian curvature. The space-time Strichartz bound $\|e^{it\Delta}f\|_{L^q_{t,x}} \lesssim \|f\|_{L^2_x}$ (for $q = 2(n+2)/n$, the $L^q_{t,x}$ estimate with equal exponents in $t$ and $x$) is exactly the dual, extension form of the $L^2$-restriction estimate for the paraboloid. The Stein–Tomas exponent $2(d+1)/(d-1)$ for a curved hypersurface in $\mathbb{R}^d$ becomes $2(n+2)/n$ when $d = n+1$, and for the non-compact paraboloid scaling singles out this one exponent. The Strichartz estimates in the mixed-norm spaces $L^q_t L^r_x$ arise from the same $TT^*$ scheme with time and space treated separately: the dispersive estimate in $x$ and Hardy–Littlewood–Sobolev in $t$. The admissibility condition $2/q + n/r = n/2$ reflects the geometry of the paraboloid and the resulting scaling.
[/remark]
While Strichartz estimates exploit curvature, some obstructions to sharp estimates are purely geometric and concern how thin tubes pointing in different directions can overlap. The Kakeya conjecture, which asserts that a set containing a unit segment in every direction must have full dimension, captures this constraint on oscillatory integral operators.
# 15. The Kakeya Conjecture
The preceding chapters on restriction theorems and Strichartz estimates made repeated use of the curvature of the sphere to extract decay from the Fourier transform. This chapter examines a more primitive geometric obstruction, one that does not require curvature, only direction. The Kakeya conjecture asks how small a set can be while still containing a unit line segment pointing in every direction. The conjecture (open in dimensions $n \ge 3$) is implied by the restriction conjecture and by the Bochner--Riesz conjecture, so it is a necessary step toward both, and partial results on it feed into partial results on restriction. The chapter develops the problem from scratch: we construct needle-thin Kakeya sets, state the conjecture precisely in terms of Hausdorff dimension, and trace the chain of implications that links geometry to Fourier analysis.
## From Rotating Needles to Measure Zero
### The Kakeya Needle Problem
How much area does it take to rotate a unit needle in the plane? The obvious answer, a disc of diameter 1, which has area $\pi/4$, turns out to be far from optimal. In 1917, Kakeya asked for the minimum-area convex region in which a unit segment can be continuously rotated through all angles. The answer for convex regions is the equilateral triangle of height 1, with area $1/\sqrt{3}$ (Pál, 1921). But if we drop convexity, Besicovitch showed in 1928 that the infimum of the area is zero: for every $\varepsilon > 0$ there is a planar set of area less than $\varepsilon$ in which a unit needle can be continuously turned around. The underlying construction also produces compact sets of Lebesgue measure zero that contain a unit segment in every direction (without the requirement of continuous motion).
The construction is remarkable because it shows that "pointing in every direction" carries essentially no metric-geometric content in the plane — a set can do it for free, at least as far as Lebesgue measure is concerned. This motivates a cleaner combinatorial formulation that strips away the rotation requirement and focuses on direction alone.
[definition: Kakeya Set]
A **Kakeya set** (also called a Besicovitch set) in $\mathbb{R}^n$ is a compact set $E \subset \mathbb{R}^n$ that contains a unit line segment in every direction. More precisely, for every $e \in S^{n-1}$ there exists a point $a \in \mathbb{R}^n$ such that
\begin{align*}
\{a + te : t \in [0,1]\} \subset E.
\end{align*}
[/definition]
The definition does not require continuity in the direction parameter — we simply need one translate of the segment $[0,1] \cdot e$ to lie inside $E$ for each $e \in S^{n-1}$.
### Besicovitch's Construction
The existence of Kakeya sets of Lebesgue measure zero in $\mathbb{R}^2$ is not just a curiosity; it is the starting point for understanding why the Kakeya conjecture is both difficult and important. We outline the construction to make concrete how a set can cover all directions while being geometrically tiny.
[example: Kakeya Set of Measure Zero in $\mathbb{R}^2$]
The construction proceeds in two phases.
**Phase 1: Perron trees.** Start with a triangle $T$ with a horizontal base of length 1 and height at least 1. Cut $T$ into two sub-triangles by joining the apex to the midpoint of the base, and translate the two pieces towards each other along the base axis so that they overlap near the base while their apexes separate. The result is a "Perron tree": a union of two triangles whose total area is strictly less than $|T|$ but which still contains a segment in every direction that $T$ did, since translation does not change directions. Iterating this splitting $k$ times yields a union of $2^k$ thin triangles, each responsible for its own narrow range of directions, and the translations can be chosen so that the total area tends to zero as $k \to \infty$.
**Phase 2: Covering all directions.** Because the pieces are only translated, the $k$-th Perron tree still contains a unit segment in every direction lying in the angular sector spanned by the original triangle $T$. Taking finitely many rotated copies of the tree, one for each sector in a finite cover of the circle of directions, gives a compact set $E_k$ that contains a unit segment in every direction.
**Why the area is zero.** The area estimate at stage $k$ shows that the Perron tree built from $2^k$ sub-triangles has measure at most $C/k$, which tends to 0 as $k \to \infty$; hence $\mathcal{L}^2(E_k) \to 0$. Each $E_k$ still has positive measure, so a limiting step is needed. With some care the construction can be arranged so that the stages are nested, $E_1 \supset E_2 \supset \cdots$, and then $E = \bigcap_k E_k$ is compact with $\mathcal{L}^2(E) = \lim_k \mathcal{L}^2(E_k) = 0$. For each direction $e$, the segments in direction $e$ contained in $E_k$ have a convergent subsequence (by compactness of $E_1$), and the limit segment lies in every $E_k$, hence in $E$. The upshot is: $\mathcal{L}^2(E) = 0$ while $E$ contains a unit segment in every direction in $\mathbb{R}^2$.
[/example]
In $\mathbb{R}^n$ for $n \ge 2$, the same construction (applied to two-dimensional slices) shows that Kakeya sets of Lebesgue measure zero exist in every dimension. The question of measure is therefore settled: Kakeya sets can be measure zero. The finer question is about Hausdorff dimension.
<!-- illustration-needed: a Perron tree; show a triangle split into 2^k thin sub-triangles that are translated along the base so that they overlap near the base while their apexes fan out, illustrating how the area decreases while the same range of directions is covered -->
## Hausdorff Dimension and the Conjecture
### Hausdorff Dimension of Kakeya Sets
[definition: Hausdorff Dimension]
For $E \subset \mathbb{R}^n$ and $s \ge 0$, the **$s$-dimensional Hausdorff measure** of $E$ is
\begin{align*}
\mathcal{H}^s(E) := \lim_{\delta \to 0} \inf\left\{ \sum_{j} (\operatorname{diam} U_j)^s : E \subset \bigcup_j U_j,\ \operatorname{diam}(U_j) \le \delta \right\}.
\end{align*}
The **Hausdorff dimension** of $E$ is
\begin{align*}
\dim_{\mathcal{H}} E := \inf\{s \ge 0 : \mathcal{H}^s(E) = 0\} = \sup\{s \ge 0 : \mathcal{H}^s(E) = \infty\}.
\end{align*}
[/definition]
Hausdorff dimension captures a finer notion of "size" than Lebesgue measure: a set can have $\mathcal{L}^n$-measure zero but Hausdorff dimension strictly between 0 and $n$, or it can have full dimension $n$ while still being measure-zero. For Kakeya sets, the relevant question is whether $\dim_{\mathcal{H}} E = n$.
[quotetheorem:3207]
[citeproof:3207]
This lower bound follows from the fact that $E$ contains a unit line segment, which has Hausdorff dimension exactly 1. The depth of the conjecture lies in showing the dimension is not merely 1 but the maximum possible value $n$.
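As a quick check of the value 1, the following sketch computes directly with the definition. Here $I$ denotes a unit segment and $\mathcal{H}^s_\delta$ the infimum inside the limit defining $\mathcal{H}^s$; both names are introduced only for this illustration. Covering $I$ by $N$ consecutive pieces of length $1/N \le \delta$ gives, for $s > 1$,
\begin{align*}
\mathcal{H}^s_\delta(I) \le \sum_{j=1}^{N} \left(\frac{1}{N}\right)^{s} = N^{1-s} \to 0 \quad (N \to \infty),
\end{align*}
so $\mathcal{H}^s(I) = 0$ for every $s > 1$ and $\dim_{\mathcal{H}} I \le 1$. Conversely, if $I \subset \bigcup_j U_j$, then projecting each $U_j$ onto the line through $I$ shows $\sum_j \operatorname{diam} U_j \ge 1$, so $\mathcal{H}^1(I) \ge 1 > 0$ and $\dim_{\mathcal{H}} I \ge 1$. Monotonicity of Hausdorff dimension then gives $\dim_{\mathcal{H}} E \ge \dim_{\mathcal{H}} I = 1$ for any $E \supset I$.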
### The Kakeya Conjecture
[conjecture: Kakeya]
Every Kakeya set $E \subseteq \mathbb{R}^n$ satisfies $\dim_{\mathcal{H}} E = n$.
[/conjecture]
[remark: Status of the Conjecture]
The conjecture is known in dimension $n = 1$ (immediate, since any set containing a segment of length 1 in $\mathbb{R}^1$ has $\dim_{\mathcal{H}} E = 1 = n$), in dimension $n = 2$, and, by recent work of Wang and Zahl discussed below, in dimension $n = 3$.
[/remark]
The $n = 2$ case was proved by Davies in 1971.
[quotetheorem:3209]
[citeproof:3209]
For $n \ge 4$, the conjecture remains open. What is known there is a sequence of improving lower bounds on $\dim_{\mathcal{H}} E$, and the same was true for $n = 3$ until recently.
## Dimension Bounds in Higher Dimensions
### The $\delta$-Discretisation and Maximal Functions
To make progress on the conjecture in high dimensions, it is useful to reformulate it in terms of a $\delta$-discretised analogue. Fix a small parameter $\delta > 0$. A **$(\delta, n)$-Kakeya set** is a union of $\delta$-tubes (cylinders of radius $\delta$ and length 1) pointing in a $\delta$-separated family of directions covering $S^{n-1}$.
[definition: Kakeya Maximal Function]
For $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ and $\delta > 0$, the **Kakeya maximal function** is
\begin{align*}
f^*_\delta(e) := \sup_{T_\delta(e)} \frac{1}{\mathcal{L}^n(T_\delta(e))} \int_{T_\delta(e)} |f(x)|\, d\mathcal{L}^n(x), \quad e \in S^{n-1},
\end{align*}
where the supremum is over all $\delta$-tubes $T_\delta(e)$ in direction $e$ (cylinders of radius $\delta$, length 1, axis parallel to $e$).
[/definition]
The Kakeya maximal function measures how concentrated $f$ can be along tubes in each direction. A bound on $\|f^*_\delta\|_{L^p(S^{n-1})}$ in terms of $\|f\|_{L^p(\mathbb{R}^n)}$ (with a controlled $\delta$-dependence) implies a lower bound on the Hausdorff dimension of Kakeya sets. Specifically:
[quotetheorem:3210]
[citeproof:3210]
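The proof that the maximal conjecture implies the geometric conjecture uses the following argument. Let $E$ be a Kakeya set, let $E_\delta$ be its $\delta$-neighbourhood, and take $f = \mathbb{1}_{E_\delta}$. For every $e \in S^{n-1}$ the set $E_\delta$ contains the $\delta$-tube around the unit segment of $E$ in direction $e$, so $f^*_\delta(e) = 1$ and $\|f^*_\delta\|_{L^n(S^{n-1})} = c_n > 0$. The maximal conjecture yields $c_n \le C_\varepsilon \delta^{-\varepsilon} \|f\|_{L^n} = C_\varepsilon \delta^{-\varepsilon} \mathcal{L}^n(E_\delta)^{1/n}$, which rearranges to the lower bound $\mathcal{L}^n(E_\delta) \ge (c_n/C_\varepsilon)^n \delta^{n\varepsilon}$. This does not say that $E$ has positive measure (the constant $C_\varepsilon$ may blow up as $\varepsilon \to 0$, and Besicovitch sets have measure zero). It says that the $\delta$-neighbourhoods shrink more slowly than any power of $\delta$, which is precisely the statement that $E$ has Minkowski dimension $n$. The Hausdorff dimension bound follows from the same argument applied to coverings by balls of varying radii, after pigeonholing a dominant scale.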
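The passage from a volume bound to a dimension bound can be made explicit for Minkowski (box-counting) dimension. The following sketch assumes a bound of the form $\mathcal{L}^n(E_\delta) \ge c_\varepsilon \delta^{n\varepsilon}$ for the $\delta$-neighbourhood $E_\delta$ of $E$, and writes $N(E,\delta)$ for the least number of balls of radius $\delta$ needed to cover $E$ and $\omega_n$ for the volume of the unit ball (notation introduced here for illustration). If $E \subset \bigcup_{j=1}^{N} B(x_j, \delta)$ then $E_\delta \subset \bigcup_j B(x_j, 2\delta)$, so
\begin{align*}
c_\varepsilon \delta^{n\varepsilon} \le \mathcal{L}^n(E_\delta) \le N(E,\delta)\, \omega_n (2\delta)^n
\quad\Longrightarrow\quad
N(E,\delta) \ge \frac{c_\varepsilon}{2^n \omega_n}\, \delta^{-(n - n\varepsilon)}.
\end{align*}
Taking logarithms, $\liminf_{\delta \to 0} \log N(E,\delta) / \log(1/\delta) \ge n - n\varepsilon$ for every $\varepsilon > 0$, so the lower Minkowski dimension of $E$ equals $n$. Note that this is fully compatible with $\mathcal{L}^n(E) = 0$.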
### Successive Bounds
In dimension $n = 3$, a sequence of results established progressively stronger lower bounds on $\dim_{\mathcal{H}} E$.
**Wolff's bound (1995).** Wolff introduced a method based on comparing how different families of tubes (in a Kakeya set) can interact. His result gives:
\begin{align*}
\dim_{\mathcal{H}} E \ge \frac{n}{2} + 1 \quad \text{for } n \ge 3.
\end{align*}
For $n = 3$ this gives $\dim_{\mathcal{H}} E \ge 5/2$. The key tool is the **hairbrush argument**: if many tubes pass through a common tube (the stem), the resulting "hairbrush" already has large volume, because the tubes meeting the stem spread out into nearly disjoint planes through the stem's axis and so overlap very little. If no such concentration occurs, the tubes must be well-spread, which also forces large dimension.
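**Katz--Łaba--Tao bound (2000).** Katz, Łaba, and Tao obtained the first improvement over Wolff's $5/2$ threshold in $\mathbb{R}^3$, using a more refined arithmetic-combinatorial argument. Their result is stated for the upper Minkowski (box-counting) dimension $\overline{\dim}_{\mathcal{M}}$:
\begin{align*}
\overline{\dim}_{\mathcal{M}} E \ge \frac{5}{2} + \varepsilon_0
\end{align*}
for an absolute (but very small) $\varepsilon_0 > 0$. The method introduced the concept of **sticky Kakeya sets** and exploited additive combinatorics to control the combinatorial complexity of tube configurations. Katz and Tao (2002) used related arithmetic methods to improve the known bounds in higher dimensions, and the corresponding improvement for Hausdorff dimension in $\mathbb{R}^3$ came later (Katz--Zahl, 2019).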
**Guth's bound.** Guth obtained further improvements via polynomial methods: bounding the complexity of algebraic varieties that can contain many lines, connecting the Kakeya problem to questions in incidence geometry over $\mathbb{R}$.
**Wang--Zahl (2025).** The most recent major advance, due to Wang and Zahl, proves $\dim_{\mathcal{H}} E = 3$ for every Kakeya set $E \subset \mathbb{R}^3$, resolving the Kakeya conjecture in dimension 3. The proof builds on their earlier resolution of the sticky case and proceeds by a careful induction on scales, phrased in terms of volume estimates for unions of convex sets.
[remark: State of the Conjecture]
As of this writing, the Kakeya conjecture is resolved for $n \le 3$ (Davies for $n = 2$, Wang--Zahl for $n = 3$) and remains open for $n \ge 4$. The general pattern of the bounds is that dimension at least $\frac{n+2}{2}$ is known via classical methods, with improvements requiring increasingly refined combinatorial and algebraic tools.
[/remark]
## Connections to Restriction and Bochner--Riesz
### Why Kakeya Matters for Fourier Analysis
The Kakeya conjecture might appear purely geometric, but it has direct implications for several central problems of Fourier analysis. The connection arises because a function whose Fourier transform is supported on a small cap of the sphere is concentrated, in physical space, on a long thin tube pointing in the direction normal to the cap. Superpositions of such wave packets, one for each cap, overlap exactly as the tubes of a Kakeya set do.
### Implication for the Restriction Conjecture
Recall from Chapter 13 the Stein–Tomas theorem and the restriction conjecture for the sphere $S^{n-1}$:
\begin{align*}
\|\hat{f}\|_{L^2(S^{n-1})} &\lesssim \|f\|_{L^p(\mathbb{R}^n)}, \quad 1 \le p \le \frac{2(n+1)}{n+3} && \text{(Stein–Tomas, proved)}, \\
\|\hat{f}\|_{L^q(S^{n-1})} &\lesssim \|f\|_{L^p(\mathbb{R}^n)}, \quad 1 \le p < \frac{2n}{n+1},\ q \le \frac{n-1}{n+1}\, p' && \text{(restriction conjecture)}.
\end{align*}
[quotetheorem:3211]
[citeproof:3211]
The proof of the implication "restriction $\Rightarrow$ Kakeya maximal" is a dual argument: the restriction operator $R: f \mapsto \hat{f}|_{S^{n-1}}$ and its adjoint (the extension operator $E: g \mapsto \widehat{g\, d\sigma}$) relate $L^p$ norms over $\mathbb{R}^n$ to $L^2$ norms over $S^{n-1}$. Concentrating $f$ along $\delta$-tubes in physical space translates (via the Fourier transform) into concentration near caps of size $\delta$ on the sphere, and the bilinear form of the restriction estimate controls how many such caps can independently concentrate.
The logical chain is:
\begin{align*}
\text{Restriction conjecture} \implies \text{Kakeya maximal conjecture} \implies \text{Kakeya conjecture}.
\end{align*}
Resolving the Kakeya conjecture is therefore a necessary step toward resolving the restriction conjecture, though it is not sufficient — restriction contains additional information about the curvature of $S^{n-1}$ that Kakeya does not.
### Implication for Bochner--Riesz Multipliers
The Bochner--Riesz summation method attempts to invert the Fourier transform using the multiplier
\begin{align*}
m_\lambda(\xi) = (1 - |\xi|^2)_+^\lambda, \quad \lambda \ge 0.
\end{align*}
[definition: Bochner--Riesz Multiplier Operator]
For $\lambda \ge 0$, the **Bochner--Riesz operator** of order $\lambda$ is the Fourier multiplier operator $B^\lambda : \mathcal{S}(\mathbb{R}^n) \to L^2(\mathbb{R}^n)$ defined by
\begin{align*}
\widehat{B^\lambda f}(\xi) = (1 - |\xi|^2)_+^\lambda\, \hat{f}(\xi).
\end{align*}
[/definition]
The Bochner--Riesz conjecture asserts that, for $\lambda > 0$ and $1 \le p \le \infty$, the operator $B^\lambda$ extends to a bounded operator $L^p(\mathbb{R}^n) \to L^p(\mathbb{R}^n)$ if and only if
\begin{align*}
\lambda > \max\!\left(n\left|\frac{1}{2} - \frac{1}{p}\right| - \frac{1}{2},\, 0\right).
\end{align*}
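To get a feel for the conjectured threshold, it helps to evaluate it in a few cases. Writing $\lambda(p) = \max\left(n\left|\tfrac12 - \tfrac1p\right| - \tfrac12,\, 0\right)$ (a shorthand used only in this illustration):
\begin{align*}
n = 2,\ p = 4: &\quad \lambda(4) = \max\left(2 \cdot \tfrac14 - \tfrac12,\, 0\right) = 0, \\
n = 3,\ p = 4: &\quad \lambda(4) = \max\left(3 \cdot \tfrac14 - \tfrac12,\, 0\right) = \tfrac14, \\
\text{any } n,\ p = 1: &\quad \lambda(1) = \tfrac{n}{2} - \tfrac12 = \tfrac{n-1}{2}.
\end{align*}
In general $\lambda(p) = 0$ exactly when $\left|\tfrac12 - \tfrac1p\right| \le \tfrac{1}{2n}$, that is, for $\tfrac{2n}{n+1} \le p \le \tfrac{2n}{n-1}$. In this range the conjecture predicts boundedness for every $\lambda > 0$.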
[quotetheorem:3212]
[citeproof:3212]
The connection between these two problems is that the multiplier $(1 - |\xi|^2)_+^\lambda$ is supported on the unit ball and has a rapidly oscillating Fourier transform whose $L^p$ properties are controlled by how the sphere $S^{n-1}$ (the boundary of the support) can be covered by small caps — exactly the data encoded in Kakeya-type estimates. The proof goes through the observation that $B^\lambda$ can be expressed as an average of projection operators onto frequency slabs, and the $L^p$ bound for each slab is a restriction-type estimate.
[remark: Summary of Implications]
The three major conjectures are ordered by difficulty:
\begin{align*}
\text{Bochner--Riesz conjecture} \implies \text{Restriction conjecture} \implies \text{Kakeya conjecture}.
\end{align*}
The first implication is due to Tao (1999). None of the reverse implications is known, so resolving a weaker conjecture does not by itself resolve a stronger one. Kakeya is the "bottom" of this chain: it is necessary for both restriction and Bochner--Riesz, and its resolution in dimension $n = 3$ (Wang--Zahl, 2025) gives hope that the chain can eventually be climbed.
[/remark]
## The $\delta$-Tube Geometry
### Bourgain's Arithmetic Approach
One productive strategy for proving Kakeya bounds uses arithmetic combinatorics. The key observation is that a large collection of $\delta$-tubes in a Kakeya set cannot all pass through a common small region without violating combinatorial constraints. The algebraic structure of $\mathbb{R}^n$ — specifically, the way lines can intersect in a real vector space — controls the possible configurations.
[definition: $(\delta, s)$-Set]
A Borel set $E \subset \mathbb{R}^n$ is a **$(\delta, s)$-set** if for every ball $B(x, r)$ with $r \ge \delta$:
\begin{align*}
\mathcal{L}^n(E \cap B(x, r)) \le C r^s \delta^{n-s}.
\end{align*}
[/definition]
A $(\delta, s)$-set is at most $s$-dimensional at all scales between $\delta$ and 1: it cannot be too concentrated in any ball. Roughly speaking, the Kakeya conjecture asks that the $\delta$-neighbourhood of any Kakeya set $E$ contain a $(\delta, n - \varepsilon)$-set of measure $\gtrsim \delta^{\varepsilon}$, for every $\varepsilon > 0$ and all sufficiently small $\delta > 0$.
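As a concrete instance of the definition, a single $\delta$-tube is a $(\delta, 1)$-set. The sketch below takes $T$ to be a cylinder of radius $\delta$ and length 1 in $\mathbb{R}^n$ and writes $\omega_{n-1}$ for the volume of the unit ball in $\mathbb{R}^{n-1}$ (notation introduced here for illustration). For $\delta \le r \le 1$ the intersection $T \cap B(x,r)$ lies in a sub-cylinder of radius $\delta$ and length at most $2r$, and for $r > 1$ it lies in $T$ itself, so
\begin{align*}
\mathcal{L}^n(T \cap B(x,r)) \le \omega_{n-1}\, \delta^{n-1} \cdot \min(2r, 1) \le 2\omega_{n-1}\, r^{1}\, \delta^{n-1},
\end{align*}
which is the defining inequality with $s = 1$ and $C = 2\omega_{n-1}$. No exponent $s < 1$ works with a constant independent of $\delta$: for $r = 1$ and $x$ the midpoint of the axis, the left side is $\omega_{n-1}\delta^{n-1}$, while $r^s \delta^{n-s} = \delta^{n-s}$ is smaller by a factor $\delta^{1-s}$.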
### The Sticky Kakeya Problem
A Kakeya set is called **sticky** if the tubes in the $\delta$-thickening do not separate: tubes in nearby directions remain close to one another across their length, rather than diverging. Sticky Kakeya sets are the hardest case — the tube overlap is maximised and the combinatorial constraints are weakest.
The breakthrough of Katz, Łaba, and Tao was to show that any near-extremal Kakeya configuration in $\mathbb{R}^3$ (one with dimension close to Wolff's value $n/2 + 1 = 5/2$) must be approximately sticky, and then to use additive-combinatorial methods to rule out such configurations. The core tool is the following:
[quotetheorem:3213]
[citeproof:3213]
The proof is outside the scope of this chapter. The application to Kakeya proceeds by encoding the direction data of a near-extremal Kakeya set as an arithmetic set in $\mathbb{Z}$, invoking Freiman-type structure theorems to show the direction set must lie in a small algebraic variety, and then deriving a contradiction with the assumption that the set covers all of $S^{n-1}$.
### The Polynomial Method
Another family of recent advances (Guth--Katz for the Erdős distinct distances problem, and subsequent work of Guth and others on restriction and Kakeya-type estimates) uses **polynomial partitioning**: for any finite set of $N$ points in $\mathbb{R}^3$ and any $D \ge 1$ there is a nonzero polynomial $p \in \mathbb{R}[x_1, x_2, x_3]$ of degree at most $D$ whose zero set $Z(p)$ divides $\mathbb{R}^3$ into $O(D^3)$ cells, each containing $O(N/D^3)$ of the points. Points lying on $Z(p)$ itself belong to no cell, and they are handled using the algebraic structure of $Z(p)$ (degree, irreducibility, tangency conditions with lines).
In Kakeya-type problems, the polynomial method is used to handle tubes that are "flat", meaning nearly contained in a neighbourhood of the zero set of a low-degree polynomial. The complementary case of "non-flat" tubes is handled by separate incidence-geometric arguments. The Wang--Zahl proof in three dimensions is organised differently, around a multi-scale induction, but it must likewise control how many tubes can cluster inside a thin convex set.
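One simple computation shows why partitioning is effective for lines as well as points. In the sketch below, $L$ is a hypothetical collection of $N$ lines, none contained in $Z(p)$. For each $\ell \in L$, the restriction of $p$ to $\ell$ is a nonzero polynomial in one variable of degree at most $D$, so it has at most $D$ zeros and $\ell$ is cut into at most $D + 1$ pieces, each lying in a single cell. Summing over the lines,
\begin{align*}
\sum_{\text{cells } O} \#\{\ell \in L : \ell \cap O \neq \emptyset\} \le N(D+1).
\end{align*}
If there are about $D^3$ cells, the average cell therefore meets at most about $N(D+1)/D^3 \approx N/D^2$ lines.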
## Lower Bounds via Examples
### Sharpness of the Conjectured Bound
The conclusion $\dim_{\mathcal{H}} E = n$ cannot be strengthened to a statement about Lebesgue measure: the Besicovitch construction produces Kakeya sets with $\mathcal{L}^n(E) = 0$, so full Hausdorff dimension is the strongest size property one can hope for. In the discretised formulation this is the reason the factor $\delta^{-\varepsilon}$ in the maximal conjecture cannot be removed: the $\delta$-neighbourhoods of a Besicovitch set have measure tending to zero as $\delta \to 0$, so no bound uniform in $\delta$ can hold.
In the other direction, no Kakeya set of Hausdorff dimension strictly less than $n$ is known in any dimension $n \ge 2$. The conjecture asserts that none exists, i.e., that the minimum possible dimension of a Kakeya set is exactly $n$.
### The $n$-Dimensional Volumetric Bound
A simple lower bound in every dimension follows from a volumetric argument.
[quotetheorem:3214]
[citeproof:3214]
This bound $(n+1)/2$ is the baseline from which Wolff's hairbrush argument gives $n/2 + 1$ (an improvement of $1/2$ for $n \ge 3$) and the subsequent results give further gains.
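The mechanism behind the exponent $(n+1)/2$ can be seen in a back-of-envelope computation, often called the bush argument. The sketch below is heuristic and is phrased for the $\delta$-neighbourhood $E_\delta$ of a Kakeya set (so it points towards a Minkowski-type bound); the symbols $T_j$, $N$, and $\mu$ are introduced here for illustration. Let $T_1, \dots, T_N \subset E_\delta$ be $\delta$-tubes with $\delta$-separated directions, $N \approx \delta^{-(n-1)}$, each of volume $\approx \delta^{n-1}$, and let $\mu$ be the largest number of tubes passing through a single point. Counting with multiplicity, and then looking at the $\mu$ tubes through a point of maximal multiplicity (which are essentially disjoint away from that point), gives
\begin{align*}
\mathcal{L}^n(E_\delta) \gtrsim \frac{1}{\mu} \sum_{j=1}^N \mathcal{L}^n(T_j) \approx \frac{1}{\mu},
\qquad
\mathcal{L}^n(E_\delta) \gtrsim \mu\, \delta^{n-1}.
\end{align*}
Whatever the value of $\mu$, the larger of the two bounds is at least their geometric mean:
\begin{align*}
\mathcal{L}^n(E_\delta) \gtrsim \sqrt{\tfrac{1}{\mu} \cdot \mu\, \delta^{n-1}} = \delta^{(n-1)/2} = \delta^{\,n - (n+1)/2},
\end{align*}
which corresponds to dimension at least $(n+1)/2$.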
[explanation: Why This Problem Is Hard]
The difficulty of the Kakeya conjecture is not that any single tube is hard to understand, but that a Kakeya set must contain $\sim \delta^{-(n-1)}$ tubes (one per direction in a $\delta$-net of $S^{n-1}$) and these tubes can overlap in highly intricate ways. The worst case for the conjecture is when the tubes are arranged so that every point of $E$ is covered by as many tubes as possible — i.e., when the overlap is maximised. In this case, the set $E$ is small (many tubes share the same points), but the combinatorial counting of how many tubes can share a point in different ways is controlled by the geometry of lines in $\mathbb{R}^n$.
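In $\mathbb{R}^2$, two $\delta$-tubes whose directions differ by an angle $\theta$ intersect in a set of diameter $\lesssim \delta/\theta$, and this pairwise information alone is enough to settle the problem (Córdoba's $L^2$ argument). In $\mathbb{R}^3$ the same pairwise bound holds but is no longer sufficient: the tubes through a given point need not be coplanar, a pair of tubes may miss each other entirely, and the extremal configurations are governed by how triples and larger families of tubes interact. The additional freedom in $\mathbb{R}^n$ for $n \ge 4$ makes the problem progressively harder, and the polynomial and algebraic tools available over $\mathbb{R}$ do not yet provide sufficient control.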
[/explanation]
The Kakeya problem's difficulty motivates studying weighted versions of classical operators respecting geometric structure. Muckenhoupt weights characterize when weighted $L^p$ norms of the maximal function and other operators remain controlled, providing the natural framework for handling weights in harmonic analysis.
# 16. Muckenhoupt Weights
The Hardy--Littlewood maximal operator $M : L^p(\mathbb{R}^n) \to L^p(\mathbb{R}^n)$ and Calderón--Zygmund operators $T : L^p(\mathbb{R}^n) \to L^p(\mathbb{R}^n)$ are bounded for $1 < p < \infty$, but this says nothing about boundedness on $L^p(w)$, the space of functions $p$-integrable against a weight $w$. The question is: for which weights does the theory extend? Muckenhoupt's answer, discovered in the early 1970s, is a single self-improving condition on the averages of $w$ over cubes. This chapter defines the $A_p$ classes, establishes their structural properties via the reverse Hölder inequality, and proves Muckenhoupt's theorem that $A_p$ is precisely the class of weights for which the maximal function is bounded on $L^p(w)$. The Coifman--Fefferman theorem then extends this to all Calderón--Zygmund operators. The chapter closes with Hytönen's $A_2$ theorem, a sharp quantitative refinement that remained open for roughly two decades.
## Weighted $L^p$ spaces and the problem
A **weight** is a function $w: \mathbb{R}^n \to [0,\infty)$ that is locally integrable and positive almost everywhere. Given a weight $w$ and $1 \le p < \infty$, the weighted Lebesgue space $L^p(w)$ consists of all measurable $f: \mathbb{R}^n \to \mathbb{C}$ for which
\begin{align*}
\|f\|_{L^p(w)} := \left(\int_{\mathbb{R}^n} |f(x)|^p\, w(x)\, d\mathcal{L}^n(x)\right)^{1/p} < \infty.
\end{align*}
This is simply $L^p$ with respect to the measure $d\mu = w\, d\mathcal{L}^n$.
The central problem is to determine, for a given operator $T$, which weights $w$ ensure that $T: L^p(w) \to L^p(w)$ is bounded. The maximal function $M$ provides the prototypical case. Since $Mf$ involves averaging $|f|$ over balls, one expects the weight to interact with these averages in a controlled way — but precisely which control is needed is not obvious from the problem statement alone.
[example: Power weights and the failure of naive conditions]
Consider $w(x) = |x|^\alpha$ on $\mathbb{R}^n$ for $\alpha \in \mathbb{R}$. This weight is locally integrable on $\mathbb{R}^n$ provided $\alpha > -n$. If $f = \mathbb{1}_{B(0,1)}$, then for $x$ near the origin,
\begin{align*}
Mf(x) \ge \frac{1}{|B(0,2)|} \int_{B(0,1)} d\mathcal{L}^n \asymp 1.
\end{align*}
For $\|Mf\|_{L^p(w)}$ to be finite, we need $\int_{B(0,2)} |x|^\alpha\, d\mathcal{L}^n(x) < \infty$, which requires $\alpha > -n$. But the condition $\alpha > -n$ is not sufficient for boundedness of $M$ on all of $L^p(w)$: when $\alpha$ is too large, taking $f = \mathbb{1}_{B(0,1)} \cdot |x|^{-\beta}$ for suitable $\beta$ produces a function with $\|f\|_{L^p(w)} < \infty$ that is not locally integrable at the origin, so that $Mf \equiv \infty$. The precise threshold turns out to be the $A_p$ condition below.
[/example]
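The failure for large $\alpha$ can be checked by hand. The following sketch assumes $\alpha > n(p-1)$ and picks any $\beta$ with $n \le \beta < (n+\alpha)/p$ (such $\beta$ exists precisely because $\alpha > n(p-1)$); $|S^{n-1}|$ denotes the surface area of the unit sphere. With $f = \mathbb{1}_{B(0,1)} |x|^{-\beta}$, polar coordinates give
\begin{align*}
\|f\|_{L^p(w)}^p &= |S^{n-1}| \int_0^1 r^{\alpha - \beta p + n - 1}\, dr < \infty \quad \text{since } \alpha - \beta p + n > 0, \\
\int_{B(0,\rho)} f\, d\mathcal{L}^n &= |S^{n-1}| \int_0^{\rho} r^{n - 1 - \beta}\, dr = \infty \quad \text{since } \beta \ge n,
\end{align*}
for every $0 < \rho \le 1$. Every point $x$ lies in a ball containing a neighbourhood of the origin, so $Mf(x) = \infty$ for all $x$ even though $f \in L^p(w)$. The borderline case $\alpha = n(p-1)$ needs an extra logarithmic factor in $f$ and is not treated in this sketch.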
## The $A_p$ classes
[motivation]
### Why averages over cubes?
The maximal function $Mf(x) = \sup_{Q \ni x} \frac{1}{|Q|} \int_Q |f|$ (supremum over cubes containing $x$) controls the local behaviour of $f$ at every scale. For $M$ to be bounded on $L^p(w)$, the weight must not concentrate mass in ways that magnify the averages of functions whose $L^p(w)$ norm is controlled. Heuristically, $M$ is bounded on $L^p(w)$ if and only if the weighted and unweighted averages over each cube are comparable in a uniform sense. Making this precise gives the $A_p$ condition.
### Why Hölder duality?
The weak-type inequality $w(\{Mf > \lambda\}) \lesssim \lambda^{-p} \|f\|_{L^p(w)}^p$ (where $w(E) = \int_E w$) requires controlling $w(Q)$ against $\|f\|_{L^p(w)}$ when $f$ saturates the average over $Q$. The natural test function is $f = w^{1-p'} \mathbb{1}_Q$, the extremiser of Hölder's inequality for the pairing of $L^p(w)$ with its dual. Its unweighted average over $Q$ is $\langle w^{1-p'} \rangle_Q$, so $Mf \ge \langle w^{1-p'} \rangle_Q$ on $Q$, while $\|f\|_{L^p(w)}^p = \int_Q w^{1-p'}\, d\mathcal{L}^n$. Inserting this into the weak-type inequality, the constraint becomes a product of two averages, one of $w$ and one of $w^{1-p'}$, which must be uniformly bounded. This is exactly the $A_p$ condition.
[/motivation]
For a cube $Q \subset \mathbb{R}^n$, write $\langle f \rangle_Q = \frac{1}{|Q|} \int_Q f\, d\mathcal{L}^n$ for the average of $f$ over $Q$.
[definition: Muckenhoupt $A_p$ Condition]
Let $1 < p < \infty$ and let $p' = p/(p-1)$ be the Hölder conjugate. A weight $w$ belongs to the **Muckenhoupt class** $A_p$ if
\begin{align*}
[w]_{A_p} := \sup_Q \langle w \rangle_Q \cdot \langle w^{1-p'} \rangle_Q^{p-1} < \infty,
\end{align*}
where the supremum is taken over all cubes $Q \subset \mathbb{R}^n$ with sides parallel to the coordinate axes. The quantity $[w]_{A_p}$ is called the **$A_p$ characteristic** of $w$.
For $p = 1$: a weight $w$ belongs to $A_1$ if there exists $C > 0$ such that $\langle w \rangle_Q \le C \cdot \operatorname{ess\,inf}_Q w$ for all cubes $Q$, equivalently $Mw(x) \le C w(x)$ for a.e. $x \in \mathbb{R}^n$. The infimum of such $C$ is $[w]_{A_1}$.
[/definition]
The product packages a two-sided constraint on $w$.
[remark: Interpretation of the $A_p$ condition]
By Hölder's inequality applied with exponents $p$ and $p'$,
\begin{align*}
1 = \langle 1 \rangle_Q = \langle w^{1/p} \cdot w^{-1/p} \rangle_Q \le \langle w \rangle_Q^{1/p} \cdot \langle w^{-1/(p-1)} \rangle_Q^{(p-1)/p},
\end{align*}
so $[w]_{A_p} \ge 1$ always. The condition $[w]_{A_p} < \infty$ says that the weight $w$ and its "dual weight" $w^{1-p'}$ are simultaneously controlled in average — neither can concentrate too severely on any cube. Note that $w^{1-p'} = w^{-1/(p-1)}$ is the weight that appears in the dual space: $(L^p(w))^* = L^{p'}(w^{1-p'})$.
[/remark]
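The identification of the dual space comes from a one-line Hölder computation with respect to the unweighted pairing $\int f g\, d\mathcal{L}^n$. As a sketch (with $\sigma = w^{1-p'}$ a shorthand introduced here):
\begin{align*}
\left| \int f g\, d\mathcal{L}^n \right| = \left| \int (f w^{1/p})(g w^{-1/p})\, d\mathcal{L}^n \right| \le \|f\|_{L^p(w)} \left( \int |g|^{p'} w^{-p'/p}\, d\mathcal{L}^n \right)^{1/p'} = \|f\|_{L^p(w)}\, \|g\|_{L^{p'}(\sigma)},
\end{align*}
using $-p'/p = -1/(p-1) = 1 - p'$.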
[example: Power weights]
Let $w(x) = |x|^\alpha$ on $\mathbb{R}^n$. Then $w \in A_p$ if and only if $-n < \alpha < n(p-1)$.
To verify the upper bound: for a cube $Q$ centred at the origin of side length $2r$,
\begin{align*}
\langle w \rangle_Q \asymp r^{\alpha}, \qquad \langle w^{1-p'} \rangle_Q^{p-1} = \langle |x|^{\alpha(1-p')} \rangle_Q^{p-1} \asymp r^{\alpha(1-p')(p-1)} = r^{-\alpha}.
\end{align*}
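Here we used the integrability of $|x|^\beta$ near the origin in dimension $n$, which requires $\beta > -n$; applied to $\beta = \alpha$ and $\beta = \alpha(1-p')$ this gives $\alpha > -n$ and $\alpha(1-p') > -n$, i.e. $\alpha < n(p-1)$. When both conditions hold, $\langle w \rangle_Q \cdot \langle w^{-1/(p-1)} \rangle_Q^{p-1} \asymp r^\alpha \cdot r^{-\alpha} = 1$ uniformly over all cubes $Q$ centred at the origin. A cube whose distance from the origin is large compared with its side length sees an essentially constant weight, so the product is again $\asymp 1$; the remaining cubes are comparable to cubes centred at the origin.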
[/example]
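The implied constants can be made explicit if one averages over balls $B_r = B(0,r)$ instead of cubes (an assumption made here for convenience; it changes the $A_p$ characteristic only by a dimensional factor). For $\beta > -n$, polar coordinates give
\begin{align*}
\frac{1}{|B_r|} \int_{B_r} |x|^\beta\, d\mathcal{L}^n(x) = \frac{n}{n+\beta}\, r^\beta,
\end{align*}
and applying this with $\beta = \alpha$ and $\beta = -\alpha/(p-1)$ yields
\begin{align*}
\langle w \rangle_{B_r} \cdot \langle w^{1-p'} \rangle_{B_r}^{p-1} = \frac{n}{n+\alpha} \left( \frac{n(p-1)}{n(p-1) - \alpha} \right)^{p-1}.
\end{align*}
The right-hand side is independent of $r$, equals 1 at $\alpha = 0$, and blows up as $\alpha \to -n$ or $\alpha \to n(p-1)$, which matches the stated range.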
## Structural properties of $A_p$ classes
The $A_p$ classes have a rich nested structure that follows from Jensen's inequality and a remarkable self-improvement property.
[quotetheorem:3215]
[citeproof:3215]
The nesting $A_p \subset A_q$ for $p \le q$ shows that the classes form an increasing filtration. The union $A_\infty = \bigcup_{p \ge 1} A_p$ is a natural limiting class; it admits a characterisation in terms of the reverse Hölder inequality.
[quotetheorem:3216]
[citeproof:3216]
The reverse Hölder inequality is the key to the openness property of $A_p$:
[quotetheorem:3217]
[citeproof:3217]
[remark: Significance of openness]
The openness property says that $A_p$ is not a sharp boundary condition: if a weight satisfies the $A_p$ condition, it satisfies a strictly stronger condition $A_{p-\varepsilon}$. This implies, for instance, that any weight in $A_2$ also lies in some $A_{2-\varepsilon}$, and hence gives bounded operators at exponents strictly below $2$. This self-improvement is invisible from the definition alone and relies fundamentally on the stopping-time structure of the proof.
[/remark]
## Muckenhoupt's characterisation of the maximal function
The $A_p$ condition was introduced precisely to characterise the boundedness of $M$ on $L^p(w)$. Muckenhoupt proved in 1972 that it is both necessary and sufficient.
[quotetheorem:3218]
[citeproof:3218]
[explanation: Why $A_p$ and not a simpler condition]
The $A_p$ condition may at first appear to be a technical artefact, but the necessity proof reveals its inevitability. The worst-case test function for the maximal bound is precisely $f = w^{1-p'} \mathbb{1}_Q$: it saturates the Hölder inequality on the cube, and the maximal function of this $f$ is at least $\langle w^{1-p'} \rangle_Q$ on $Q$. Requiring the weighted $L^p$ norm of $Mf$ not to exceed a constant times the weighted $L^p$ norm of $f$ for these specific test functions is equivalent to the $A_p$ condition. There is no room to replace the $A_p$ condition by something weaker and still bound the worst case.
The condition also has a clean geometric interpretation via the Radon--Nikodym theorem: $w \in A_p$ means that the measure $w\, d\mathcal{L}^n$ is a "doubling measure" that is not too far from Lebesgue measure, in the sense that the ratio of weighted to unweighted averages over cubes is uniformly controlled from both above and below (the latter via the dual weight $w^{1-p'}$).
[/explanation]
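The necessity computation described above can be written out in a few lines. The sketch assumes that $M$ is bounded on $L^p(w)$ with norm $\|M\|$, writes $\sigma = w^{1-p'}$ (a shorthand introduced here), and assumes $0 < \langle \sigma \rangle_Q < \infty$ (otherwise one first truncates $\sigma$). With $f = \sigma \mathbb{1}_Q$ we have $Mf \ge \langle \sigma \rangle_Q$ on $Q$ and $\sigma^p w = w^{p(1-p') + 1} = w^{1-p'} = \sigma$, so
\begin{align*}
\langle \sigma \rangle_Q^p\, w(Q) \le \|Mf\|_{L^p(w)}^p \le \|M\|^p\, \|f\|_{L^p(w)}^p = \|M\|^p \int_Q \sigma^p w\, d\mathcal{L}^n = \|M\|^p\, \sigma(Q).
\end{align*}
Dividing by $|Q|\, \langle \sigma \rangle_Q$ gives $\langle w \rangle_Q\, \langle \sigma \rangle_Q^{p-1} \le \|M\|^p$ for every cube $Q$, that is, $[w]_{A_p} \le \|M\|^p$.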
## Weighted Calderón--Zygmund theory
The step from the maximal function to Calderón--Zygmund operators follows the same structural logic. Coifman and Fefferman established in 1974 that $A_p$ is also the right condition for singular integrals.
[quotetheorem:3219]
[citeproof:3219]
[remark: The role of $A_\infty$]
The Coifman--Fefferman theorem actually shows something stronger: for any $0 < q < \infty$ and $w \in A_\infty = \bigcup_{p \ge 1} A_p$,
\begin{align*}
\|Tf\|_{L^q(w)} \lesssim \|Mf\|_{L^q(w)}.
\end{align*}
This "good-$\lambda$ inequality" form of the result, proved via a distributional comparison of the level sets of $Tf$ and $Mf$, is the most general weighted estimate for CZ operators and does not require $q > 1$.
[/remark]
## The $A_2$ theorem and sharp constants
The Muckenhoupt and Coifman--Fefferman theorems give qualitative weighted boundedness but say nothing about how the operator norm depends on the weight. A central question in modern harmonic analysis, raised in the 1990s and resolved by Hytönen in 2012, is to find the optimal dependence of $\|T\|_{L^2(w) \to L^2(w)}$ on $[w]_{A_2}$.
[quotetheorem:3220]
[citeproof:3220]
The theorem is sharp: the linear dependence $[w]_{A_2}$ cannot be replaced by any function growing slower than linearly at infinity. For the specific case of the Hilbert transform $H$ on $\mathbb{R}$, the sharp bound $\|H\|_{L^2(w) \to L^2(w)} \lesssim [w]_{A_2}$ was established by Petermichl (2007), using her representation of $H$ as an average of dyadic shifts.
The proof is not covered in this course. It rests on Hytönen's representation of an arbitrary CZ operator as an average of dyadic shift operators (with respect to random dyadic grids), which reduces the estimate to the dyadic case, where dyadic techniques apply.
[explanation: Context and significance of the $A_2$ theorem]
The linear bound $\|T\|_{L^2(w)} \lesssim [w]_{A_2}$ was conjectured by Astala, Iwaniec, and Saksman in 2001 in the context of the Beltrami equation in quasiconformal analysis. The operator bound on $L^2(w)$ for $w \in A_2$ is closely related to regularity theory for degenerate elliptic PDEs: if the coefficient matrix $A(x)$ of an elliptic operator satisfies $w(x) I \le A(x) \le w(x)^{-1} I$ for some $A_2$ weight $w$, then the $A_2$ theorem controls the $L^2$ estimate for the associated singular integral.
The result for $p \ne 2$ follows from the $A_2$ theorem by extrapolation (Rubio de Francia extrapolation), which shows that weighted $L^p$ bounds for a single $p$ and all $A_p$ weights imply weighted $L^p$ bounds for all other exponents, with the dependence on $[w]_{A_p}$ explicitly tracked. Specifically, for $1 < p < \infty$,
\begin{align*}
\|T\|_{L^p(w) \to L^p(w)} \lesssim [w]_{A_p}^{\max(1, 1/(p-1))},
\end{align*}
which is also sharp.
[/explanation]
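The shape of the exponent $\max(1, 1/(p-1))$ is consistent with duality, as the following sketch indicates. Write $\sigma = w^{1-p'}$ (a shorthand introduced here) and recall $(p-1)(p'-1) = 1$, so that $\sigma^{1-p} = w$. Then
\begin{align*}
[\sigma]_{A_{p'}} = \sup_Q \langle \sigma \rangle_Q\, \langle w \rangle_Q^{p'-1} = \sup_Q \left( \langle w \rangle_Q\, \langle \sigma \rangle_Q^{p-1} \right)^{p'-1} = [w]_{A_p}^{1/(p-1)}.
\end{align*}
Since $\|T\|_{L^p(w) \to L^p(w)} = \|T^*\|_{L^{p'}(\sigma) \to L^{p'}(\sigma)}$ and $T^*$ is again a CZ operator, a linear bound in $[\sigma]_{A_{p'}}$ for $p' \ge 2$ translates into the power $1/(p-1)$ of $[w]_{A_p}$ for $p \le 2$. For instance, the exponent is $2$ at $p = 3/2$ and $1$ at $p = 3$.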
[example: The $A_2$ bound for power weights]
Take $w(x) = |x|^\alpha$ on $\mathbb{R}$ with $|\alpha| < 1$ (so $w \in A_2$). For an interval $Q = [-r, r]$,
\begin{align*}
\langle w \rangle_Q = \frac{1}{2r} \int_{-r}^r |x|^\alpha\, d\mathcal{L}^1(x) = \frac{r^\alpha}{\alpha + 1}, \qquad \langle w^{-1} \rangle_Q = \frac{r^{-\alpha}}{1 - \alpha},
\end{align*}
so on intervals centred at the origin $\langle w \rangle_Q \cdot \langle w^{-1} \rangle_Q = \frac{1}{(1+\alpha)(1-\alpha)} = \frac{1}{1-\alpha^2}$, which is independent of $r$ (as expected from the homogeneity of the power weight) and tends to $1$ as $\alpha \to 0$ and to $\infty$ as $|\alpha| \to 1$. Hence $[w]_{A_2} \ge \frac{1}{1-\alpha^2}$, and one can check that the supremum over all intervals is comparable to this quantity. The $A_2$ theorem then gives $\|H\|_{L^2(|x|^\alpha)} \lesssim \frac{1}{1 - \alpha^2}$, which blows up at the boundary $|\alpha| = 1$ of the $A_2$ range.
[/example]
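Centred intervals do not quite capture the supremum defining $[w]_{A_2}$. As a sketch, for an interval $Q = [-a, b]$ with $a, b > 0$ (an illustrative choice), the same integration gives
\begin{align*}
\langle w \rangle_Q \cdot \langle w^{-1} \rangle_Q = \frac{(a^{1+\alpha} + b^{1+\alpha})(a^{1-\alpha} + b^{1-\alpha})}{(1-\alpha^2)(a+b)^2} = \frac{a^2 + b^2 + a^{1+\alpha} b^{1-\alpha} + a^{1-\alpha} b^{1+\alpha}}{(1-\alpha^2)(a^2 + b^2 + 2ab)}.
\end{align*}
By the AM–GM inequality the two cross terms in the numerator sum to at least $2ab$, so the product is at least $\frac{1}{1-\alpha^2}$, with equality when $a = b$. By the weighted AM–GM inequality the same two terms sum to at most $a^2 + b^2$, so the product is at most $\frac{2}{1-\alpha^2}$. For intervals containing the origin the product therefore stays within a factor 2 of the centred value; for $\alpha = 1/2$ the centred value is $4/3$.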
## The class $A_\infty$ and connections to BMO
The limit class $A_\infty = \bigcup_{p \ge 1} A_p$ admits several equivalent characterisations that connect weight theory to BMO.
[quotetheorem:3221]
[citeproof:3221]
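The proof that (i) $\iff$ (ii) follows from the reverse Hölder inequality established above. The characterisation (iii) says that the measure $w\, d\mathcal{L}^n$ is absolutely continuous with respect to Lebesgue measure in a quantitative, uniform way over all cubes. The link between (i) and (iv) connects $A_\infty$ to BMO: the logarithm of any $A_\infty$ weight has bounded mean oscillation, and conversely, if $\log w \in \mathrm{BMO}$ then $w^\delta \in A_\infty$ for all sufficiently small $\delta > 0$. The power $\delta$ cannot be dropped in general: $\log |x|^{-n} \in \mathrm{BMO}$, but $|x|^{-n}$ is not locally integrable and so does not belong to any $A_p$.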
[remark: Connection to quasiconformal mappings]
The relation between $A_\infty$ and $\mathrm{BMO}$ has a striking geometric consequence. In the theory of quasiconformal mappings in the plane, the Jacobian $J$ of a $K$-quasiconformal map satisfies a reverse Hölder inequality (Gehring's theorem) and therefore belongs to $A_p$ for $p$ depending on $K$. In particular $\log J \in \mathrm{BMO}$ (a result of Reimann), and the entire $A_\infty$ theory becomes a statement about the regularity of quasiconformal deformations.
[/remark]
Weighted inequalities with Muckenhoupt weights establish that classical singular integrals and Littlewood-Paley operators behave predictably under appropriate weighting. Wavelets provide an orthonormal basis that simultaneously diagonalizes many weighted operators, completing the course by showing how modern multiresolution analysis synthesizes all preceding themes.
# 17. Wavelets
The previous chapters built an elaborate frequency-domain language: Littlewood–Paley decompositions split a function by dyadic frequency annuli, and Besov and Triebel–Lizorkin spaces measure regularity by the sizes of those pieces. A natural question emerges: can one produce a true orthonormal basis for $L^2(\mathbb{R}^n)$ that simultaneously captures spatial location and frequency content — something the Fourier basis, being global in space, cannot do? Wavelets answer this question affirmatively. A single function $\psi$, the *mother wavelet*, is dilated and translated to produce an orthonormal basis for $L^2(\mathbb{R})$; the wavelet coefficients $\langle f, \psi_{j,k} \rangle$ encode the content of $f$ at spatial location $\sim k 2^{-j}$ and frequency $\sim 2^j$. The chapter begins with the algebraic framework — multiresolution analysis — that makes this possible, then describes Daubechies's resolution of the smoothness-versus-compact-support tradeoff, and concludes by showing how wavelet coefficients characterise Besov spaces, connecting back directly to Chapter 11.
## Multiresolution Analysis
### The Problem of Simultaneous Localisation
The Fourier transform is global: modifying $f$ on a small interval changes every frequency component. For signal processing and PDE applications one wants a decomposition that is local in both space and frequency. A first attempt is to tile $\mathbb{R}$ with intervals and expand $f$ on each piece using Fourier series, but this creates artificial discontinuities at boundaries. The goal is an orthonormal basis $\{\psi_{j,k}\}$ of $L^2(\mathbb{R})$ where each basis element lives at a definite spatial scale and location.
### The Multiresolution Structure
[definition: Multiresolution Analysis]
A **multiresolution analysis (MRA)** of $L^2(\mathbb{R})$ is a sequence of closed subspaces $\{V_j\}_{j \in \mathbb{Z}} \subset L^2(\mathbb{R})$ satisfying:
1. **(Nesting)** $V_j \subset V_{j+1}$ for all $j \in \mathbb{Z}$.
2. **(Density and separation)** $\overline{\bigcup_{j \in \mathbb{Z}} V_j} = L^2(\mathbb{R})$ and $\bigcap_{j \in \mathbb{Z}} V_j = \{0\}$.
3. **(Scaling)** $f \in V_j \iff f(2 \cdot) \in V_{j+1}$ for all $j \in \mathbb{Z}$.
4. **(Translation invariance)** $f \in V_0 \implies f(\cdot - k) \in V_0$ for all $k \in \mathbb{Z}$.
5. **(Orthonormal basis)** There exists $\varphi \in V_0$, called the **scaling function** or **father wavelet**, such that $\{\varphi(\cdot - k) : k \in \mathbb{Z}\}$ is an orthonormal basis for $V_0$.
[/definition]
The scaling condition connects adjacent levels: $V_{j+1}$ consists precisely of the dilates $f(2\,\cdot)$ of functions $f \in V_j$, so $V_j$ contains the functions resolved down to scale $2^{-j}$. Since $V_0 \subset V_1$ and $\{\sqrt{2}\,\varphi(2x - k)\}_{k \in \mathbb{Z}}$ is an orthonormal basis for $V_1$, the father wavelet satisfies the **two-scale relation**
\begin{align*}
\varphi(x) = \sqrt{2} \sum_{k \in \mathbb{Z}} h_k \, \varphi(2x - k)
\end{align*}
for a sequence of real coefficients $(h_k)_{k \in \mathbb{Z}}$ in $\ell^2(\mathbb{Z})$. Taking the Fourier transform — using the convention $\hat{f}(\xi) = \int_{\mathbb{R}} f(x) e^{-i\xi x} \, d\mathcal{L}^1(x)$ throughout this chapter — this becomes
\begin{align*}
\hat{\varphi}(\xi) = m_0\!\left(\frac{\xi}{2}\right) \hat{\varphi}\!\left(\frac{\xi}{2}\right), \qquad m_0(\xi) = \frac{1}{\sqrt{2}} \sum_{k \in \mathbb{Z}} h_k e^{-ik\xi}.
\end{align*}
The function $m_0$, called the **low-pass filter**, is $2\pi$-periodic and must satisfy $|m_0(\xi)|^2 + |m_0(\xi + \pi)|^2 = 1$ for a.e. $\xi$ — the quadrature mirror filter condition — for the translates of $\varphi$ to form an orthonormal system.
[example: The Haar MRA]
The simplest example is the Haar system. Set $\varphi = \mathbb{1}_{[0,1)}$, the indicator of the unit interval. Define
\begin{align*}
V_j = \overline{\operatorname{span}}\{ \mathbb{1}_{[k2^{-j},(k+1)2^{-j})} : k \in \mathbb{Z} \},
\end{align*}
the space of functions in $L^2(\mathbb{R})$ that are constant on each dyadic interval of length $2^{-j}$. These spaces satisfy all five MRA axioms: nesting holds because any interval at scale $j$ is a union of two intervals at scale $j+1$; the union is dense by the Lebesgue differentiation theorem; the intersection is $\{0\}$; scaling and translation invariance are immediate from the definition; and $\{\varphi(\cdot - k)\} = \{\mathbb{1}_{[k,k+1)}\}$ is an orthonormal family spanning $V_0$ (the indicator functions are pairwise disjointly supported and unit-norm).
With the two-scale coefficients $h_0 = h_1 = 1/\sqrt{2}$ (and $h_k = 0$ otherwise), the low-pass filter is $m_0(\xi) = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1 + e^{-i\xi}) = \frac{1}{2}(1 + e^{-i\xi})$, so that $m_0(\xi + \pi) = \frac{1}{2}(1 - e^{-i\xi})$. Since $|1 + e^{-i\xi}|^2 = 2 + 2\cos\xi$ and $|1 - e^{-i\xi}|^2 = 2 - 2\cos\xi$, we obtain $|m_0(\xi)|^2 + |m_0(\xi+\pi)|^2 = \frac{1}{4}(2 + 2\cos\xi) + \frac{1}{4}(2-2\cos\xi) = 1$, as required.
[/example]
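The quadrature mirror filter condition is easy to test numerically for any finite filter. The following Python sketch evaluates $m_0$ from the coefficients $h_k$ using the formula above and measures the worst deviation of $|m_0(\xi)|^2 + |m_0(\xi+\pi)|^2$ from $1$ on a grid; the function names and the grid size are illustrative choices.

```python
import cmath
import math


def low_pass(h, xi):
    """m_0(xi) = (1/sqrt 2) * sum_k h[k] * exp(-i k xi), with h indexed from k = 0."""
    return sum(hk * cmath.exp(-1j * k * xi) for k, hk in enumerate(h)) / math.sqrt(2)


def qmf_defect(h, samples=200):
    """Largest deviation of |m_0(xi)|^2 + |m_0(xi + pi)|^2 from 1 on a uniform grid in [0, 2 pi)."""
    worst = 0.0
    for n in range(samples):
        xi = 2 * math.pi * n / samples
        total = abs(low_pass(h, xi)) ** 2 + abs(low_pass(h, xi + math.pi)) ** 2
        worst = max(worst, abs(total - 1.0))
    return worst


haar = [1 / math.sqrt(2), 1 / math.sqrt(2)]
print(qmf_defect(haar))          # close to 0: the Haar filter satisfies the condition
print(qmf_defect([0.5, 0.5]))    # not close to 0: a wrongly normalised filter fails it
```

A grid check of this kind can only detect failures; it does not prove the identity, which for Haar follows from the computation in the example.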
### The Mother Wavelet
Given an MRA, let $W_j$ denote the orthogonal complement of $V_j$ in $V_{j+1}$:
\begin{align*}
V_{j+1} = V_j \oplus W_j.
\end{align*}
The spaces $W_j$ are mutually orthogonal, and $L^2(\mathbb{R}) = \bigoplus_{j \in \mathbb{Z}} W_j$ in the sense that every $f \in L^2(\mathbb{R})$ has the orthogonal decomposition $f = \sum_{j \in \mathbb{Z}} P_{W_j} f$, with convergence in $L^2(\mathbb{R})$.
[definition: Mother Wavelet]
The **mother wavelet** associated with an MRA $\{V_j\}$ with father wavelet $\varphi$ is the function $\psi \in W_0 \subset V_1$ defined by
\begin{align*}
\hat{\psi}(\xi) = m_1\!\left(\frac{\xi}{2}\right) \hat{\varphi}\!\left(\frac{\xi}{2}\right),
\end{align*}
where $m_1(\xi) = e^{-i\xi} \overline{m_0(\xi + \pi)}$ is the **high-pass filter**. For $j, k \in \mathbb{Z}$, set
\begin{align*}
\psi_{j,k}(x) = 2^{j/2} \psi(2^j x - k).
\end{align*}
[/definition]
The normalisation $2^{j/2}$ ensures $\|\psi_{j,k}\|_{L^2} = \|\psi\|_{L^2}$. The collection $\{\psi_{j,k} : j, k \in \mathbb{Z}\}$ forms an orthonormal basis for $L^2(\mathbb{R})$, with $\{\psi_{j,k} : k \in \mathbb{Z}\}$ forming an orthonormal basis for $W_j$.
[quotetheorem:3222]
[citeproof:3222]
[example: Haar Wavelet]
For the Haar MRA, $V_1$ consists of the $L^2(\mathbb{R})$ functions that are constant on each interval $[k/2, (k+1)/2)$, and $W_0$ is the subspace of those with zero integral over every unit interval $[k, k+1)$, $k \in \mathbb{Z}$. The unique element (up to sign) of $W_0$ with unit norm supported in $[0,1)$ is
\begin{align*}
\psi(x) = \mathbb{1}_{[0,1/2)}(x) - \mathbb{1}_{[1/2,1)}(x).
\end{align*}
The resulting system $\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$ is the classical Haar wavelet basis, the oldest known wavelet system (Haar, 1910). Each $\psi_{j,k}$ is supported on the dyadic interval $[k 2^{-j}, (k+1)2^{-j})$, equals $2^{j/2}$ on the left half and $-2^{j/2}$ on the right half, and has integral zero. The wavelet coefficient $\langle f, \psi_{j,k} \rangle$ measures the difference of the averages of $f$ over the two halves of $[k2^{-j}, (k+1)2^{-j})$: it is the local oscillation of $f$ at scale $2^{-j}$ and location $k 2^{-j}$.
The Haar wavelet is not smooth (indeed $\psi$ is discontinuous), and with the Fourier convention fixed above its transform is $\hat\psi(\xi) = i e^{-i\xi/2}\, \frac{\sin^2(\xi/4)}{\xi/4}$, which decays only as $|\xi|^{-1}$ for large $|\xi|$. For applications requiring smooth wavelets, the Haar system is insufficient, which motivates the construction of the next section.
[/example]
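For the Haar wavelet the statement about averages is an exact identity: $\langle f, \psi_{j,k} \rangle = 2^{-j/2 - 1}\,(\text{average of } f \text{ on the left half} - \text{average on the right half})$. The following Python sketch approximates the coefficient with a midpoint rule so that this can be checked on simple functions; the function name and the number of sample points are illustrative.

```python
def haar_coefficient(f, j, k, samples=2000):
    """Approximate <f, psi_{j,k}> for the Haar wavelet with a midpoint rule.

    psi_{j,k} equals 2**(j/2) on the left half of [k 2^-j, (k+1) 2^-j)
    and -2**(j/2) on the right half.
    """
    left = k * 2.0 ** (-j)
    half = 2.0 ** (-j) / 2
    step = half / samples
    left_integral = sum(f(left + (i + 0.5) * step) for i in range(samples)) * step
    right_integral = sum(f(left + half + (i + 0.5) * step) for i in range(samples)) * step
    return 2.0 ** (j / 2) * (left_integral - right_integral)


print(haar_coefficient(lambda x: 1.0, 3, 5))   # constants are invisible to psi_{j,k}
print(haar_coefficient(lambda x: x, 0, 0))     # difference of the two half-interval integrals of x
```

For $f(x) = x$, $j = 0$, $k = 0$ the two half-interval integrals are $1/8$ and $3/8$, so the coefficient is $-1/4$, in agreement with the identity above.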
### Vanishing Moments
A key property of a mother wavelet $\psi$ is the number of its *vanishing moments*. We say $\psi$ has $N$ vanishing moments if
\begin{align*}
\int_{\mathbb{R}} x^m \psi(x) \, d\mathcal{L}^1(x) = 0 \qquad \text{for all } 0 \le m < N.
\end{align*}
In Fourier terms this means $\hat\psi$ vanishes to order $N$ at the origin: $\hat\psi(\xi) = O(|\xi|^N)$ as $\xi \to 0$. The Haar wavelet has exactly one vanishing moment ($\int \psi = 0$). Vanishing moments control how well the wavelet can detect polynomial behaviour: if $f$ is a polynomial of degree less than $N$ near a point $x_0$, then $\langle f, \psi_{j,k} \rangle \approx 0$ for $\psi_{j,k}$ localised near $x_0$. More vanishing moments means the wavelet basis "sees" only the irregular part of $f$, making wavelet coefficients a sensitive detector of singularities.
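The decay of the coefficients can be made quantitative by a Taylor expansion. As a sketch, assume in addition that $\psi$ is compactly supported and that $f$ is $C^N$ on an interval $I$ containing $x_0 = k2^{-j}$ and the support of $\psi_{j,k}$ (these hypotheses are not stated above). Let $P$ be the Taylor polynomial of $f$ of degree $N-1$ at $x_0$. Since $\psi_{j,k}$ is orthogonal to polynomials of degree less than $N$, the substitution $y = 2^j x - k$ gives
\begin{align*}
|\langle f, \psi_{j,k} \rangle| = \left| \int_{\mathbb{R}} (f - P)\, \psi_{j,k} \, d\mathcal{L}^1 \right| \le \frac{\|f^{(N)}\|_{L^\infty(I)}}{N!}\, 2^{j/2} \int_{\mathbb{R}} |x - x_0|^N\, |\psi(2^j x - k)| \, d\mathcal{L}^1(x) = \frac{\|f^{(N)}\|_{L^\infty(I)}}{N!}\, 2^{-j(N + 1/2)} \int_{\mathbb{R}} |y|^N |\psi(y)| \, d\mathcal{L}^1(y).
\end{align*}
Thus, where $f$ is smooth, the coefficients decay like $2^{-j(N+1/2)}$, and any slower decay signals a loss of regularity near the support of $\psi_{j,k}$.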
## Daubechies Wavelets
### The Smoothness–Support Tradeoff
The Haar wavelet is compactly supported but not smooth. The Shannon wavelet, which lives at a fixed dyadic frequency band $\{|\xi| \in [\pi, 2\pi]\}$, is smooth (in fact, band-limited) but has infinite support, decaying only as $|x|^{-1}$. A fundamental question is whether one can construct compactly supported wavelets with any prescribed number of vanishing moments and any prescribed degree of smoothness. The answer, resolved by Daubechies in 1988, is yes.
[quotetheorem:3223]
[citeproof:3223]
The course states this theorem without full proof. The construction proceeds as follows. One designs the low-pass filter $m_0$ as a trigonometric polynomial of degree $2N - 1$ (so that the filter has $2N$ coefficients) satisfying the quadrature mirror filter condition $|m_0(\xi)|^2 + |m_0(\xi + \pi)|^2 = 1$, with $m_0$ having a zero of order $N$ at $\xi = \pi$ (to enforce $N$ vanishing moments for $\psi$). The Daubechies solution is
\begin{align*}
m_0(\xi) = \left(\frac{1 + e^{-i\xi}}{2}\right)^N Q(\xi),
\end{align*}
where $Q$ is a trigonometric polynomial of degree $N - 1$ chosen so that $|m_0(\xi)|^2 + |m_0(\xi+\pi)|^2 = 1$. Writing $|Q(\xi)|^2 = P(\sin^2(\xi/2))$ for a polynomial $P$ and setting $y = \sin^2(\xi/2)$, the condition becomes $(1-y)^N P(y) + y^N P(1-y) = 1$. Its unique solution of degree at most $N-1$ is $P(y) = \sum_{k=0}^{N-1} \binom{N-1+k}{k} y^k$, and $Q$ is recovered from $P$ by the Fejér–Riesz factorisation. The father wavelet $\varphi$ is then defined by the infinite product
\begin{align*}
\hat{\varphi}(\xi) = \prod_{j=1}^{\infty} m_0\!\left(\frac{\xi}{2^j}\right),
\end{align*}
which converges because $m_0(0) = 1$ and $m_0$ is smooth. The smoothness exponent $r(N)$ grows linearly in $N$ but with constant less than one: $r(N) \approx 0.2075 N$ for large $N$.
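In the standard presentation of this construction (for instance in Daubechies' *Ten Lectures on Wavelets*), one writes $|Q(\xi)|^2 = P(y)$ with $y = \sin^2(\xi/2)$, so that $|m_0(\xi)|^2 = (1-y)^N P(y)$ and the quadrature mirror filter condition reads $(1-y)^N P(y) + y^N P(1-y) = 1$, with lowest-degree solution $P(y) = \sum_{k=0}^{N-1}\binom{N-1+k}{k} y^k$. The following Python sketch checks this identity in exact rational arithmetic at a few sample points; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def daubechies_P(N, y):
    """P(y) = sum_{k=0}^{N-1} C(N-1+k, k) y^k."""
    return sum(comb(N - 1 + k, k) * y**k for k in range(N))


def qmf_in_y(N, y):
    """(1-y)^N P(y) + y^N P(1-y), which should equal 1 for every y."""
    return (1 - y) ** N * daubechies_P(N, y) + y**N * daubechies_P(N, 1 - y)


# Evaluate at the rational points y = 0, 1/7, ..., 1 for N = 1, ..., 6.
checks = [qmf_in_y(N, Fraction(p, 7)) for N in range(1, 7) for p in range(8)]
print(all(value == 1 for value in checks))
```

Because the left-hand side is a polynomial of degree at most $2N - 1$ in $y$, agreement at $2N$ or more distinct points would already force the identity; here the check is only meant as a sanity test.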
[remark: Regularity and Support]
For $N = 1$, the Daubechies wavelet is the Haar wavelet (discontinuous). For $N = 2$ (the Daubechies $D_4$ wavelet, named for its 4-coefficient filter), the wavelet is continuous but not differentiable, supported on $[0, 3]$. For $N = 3$ (Daubechies $D_6$), the wavelet has one continuous derivative. The trade-off is explicit: higher $N$ means longer support (the interval has length $2N - 1$) but greater smoothness. Infinite smoothness is out of reach: an orthonormal wavelet of class $C^r$ with sufficient decay must have at least $r + 1$ vanishing moments, so a compactly supported wavelet of class $C^\infty$ would have all moments vanishing, which is impossible for a nonzero compactly supported function. This nonexistence result was also established by Daubechies. One must therefore either sacrifice compact support or accept that smoothness is at most $C^r$ for finite $r$.
[/remark]
[example: Daubechies $D_4$ Wavelet]
For $N = 2$, the low-pass filter coefficients $h_0, h_1, h_2, h_3$ satisfying the orthonormality and moment conditions are
\begin{align*}
h_0 &= \frac{1 + \sqrt{3}}{4\sqrt{2}}, & h_1 &= \frac{3 + \sqrt{3}}{4\sqrt{2}}, & h_2 &= \frac{3 - \sqrt{3}}{4\sqrt{2}}, & h_3 &= \frac{1 - \sqrt{3}}{4\sqrt{2}}.
\end{align*}
These satisfy $\sum_k h_k = \sqrt{2}$ (normalisation), $\sum_k h_k h_{k-2m} = \delta_{m,0}$ (orthogonality), and $\sum_k (-1)^k h_k = \sum_k (-1)^k k h_k = 0$ (the two moment conditions, which give $\psi$ two vanishing moments). The father wavelet $\varphi$ is defined by the two-scale relation $\varphi(x) = \sqrt{2} \sum_{k=0}^3 h_k \varphi(2x - k)$, supported on $[0, 3]$, and is continuous but not differentiable. The mother wavelet, taken here as $\psi(x) = \sqrt{2} \sum_{k=0}^3 (-1)^k h_{3-k} \varphi(2x - k)$ (an integer translate, up to sign, of the wavelet produced by the high-pass filter $m_1$ in the definition above), is similarly supported on $[0, 3]$, has two vanishing moments, and is Hölder continuous with exponent approximately $0.55$.
[/example]
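The algebraic identities satisfied by the $D_4$ coefficients can be confirmed in floating-point arithmetic. The following Python sketch does so; the helper `shifted_product` and the high-pass indexing $g_k = (-1)^k h_{3-k}$ are illustrative choices (the latter is one common convention, differing from others by an integer shift and a sign).

```python
import math

s3 = math.sqrt(3)
h = [c / (4 * math.sqrt(2)) for c in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]


def shifted_product(h, m):
    """sum_k h[k] * h[k - 2m], treating h as zero outside its index range."""
    return sum(h[k] * h[k - 2 * m] for k in range(len(h)) if 0 <= k - 2 * m < len(h))


total = sum(h)                                              # should be sqrt(2)
norm = shifted_product(h, 0)                                # should be 1
overlap = shifted_product(h, 1)                             # should be 0
moment0 = sum((-1) ** k * hk for k, hk in enumerate(h))     # should be 0
moment1 = sum((-1) ** k * k * hk for k, hk in enumerate(h)) # should be 0

# High-pass coefficients under the assumed convention g_k = (-1)^k h_{3-k}.
g = [(-1) ** k * h[3 - k] for k in range(4)]
g_moment0 = sum(g)                                          # discrete analogue of int psi = 0
g_moment1 = sum(k * gk for k, gk in enumerate(g))           # discrete analogue of int x psi = 0

print(total, norm, overlap, moment0, moment1, g_moment0, g_moment1)
```

The two vanishing sums for $g$ are the discrete counterparts of the two vanishing moments of $\psi$.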
### Higher Dimensions
The construction extends to $\mathbb{R}^n$ via tensor products. Given an MRA $\{V_j\}$ of $L^2(\mathbb{R})$ with father wavelet $\varphi$ and mother wavelet $\psi$, one forms the $n$-dimensional MRA $\{\mathbf{V}_j\}$ of $L^2(\mathbb{R}^n)$ by setting $\mathbf{V}_j = V_j \otimes \cdots \otimes V_j$ ($n$ factors, with the closure taken in $L^2(\mathbb{R}^n)$). The orthonormal basis for $L^2(\mathbb{R}^n)$ consists of $2^n - 1$ families: for each $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n) \in \{0, 1\}^n \setminus \{(0, \ldots, 0)\}$, set
\begin{align*}
\Psi^\varepsilon(x_1, \ldots, x_n) = \prod_{i=1}^n \phi^{\varepsilon_i}(x_i),
\end{align*}
where $\phi^0 = \varphi$ and $\phi^1 = \psi$, and dilate and translate:
\begin{align*}
\Psi^\varepsilon_{j,k}(x) = 2^{jn/2} \Psi^\varepsilon(2^j x - k), \qquad j \in \mathbb{Z},\; k \in \mathbb{Z}^n.
\end{align*}
Then $\{\Psi^\varepsilon_{j,k} : \varepsilon \in \{0,1\}^n \setminus \{0\},\; j \in \mathbb{Z},\; k \in \mathbb{Z}^n\}$ is an orthonormal basis for $L^2(\mathbb{R}^n)$.
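The index set $\{0,1\}^n \setminus \{0\}$ and the tensor-product structure can be made concrete with the Haar system, where each $\Psi^\varepsilon$ is constant on the $2^n$ half-cells of the unit cube. The following Python sketch enumerates the $2^n - 1$ wavelet types and computes inner products at level $j = 0$, $k = 0$; the names and the sampled representation are illustrative.

```python
from itertools import product

PHI = (1.0, 1.0)    # Haar father wavelet on the two halves of [0, 1)
PSI = (1.0, -1.0)   # Haar mother wavelet on the same two halves


def wavelet_types(n):
    """All epsilon in {0,1}^n except the all-zero vector."""
    return [eps for eps in product((0, 1), repeat=n) if any(eps)]


def tensor_samples(eps):
    """Values of prod_i phi^{eps_i}(x_i) on the 2^n half-cells of the unit cube."""
    factors = [PSI if e else PHI for e in eps]
    values = {}
    for cell in product((0, 1), repeat=len(eps)):
        value = 1.0
        for factor, index in zip(factors, cell):
            value *= factor[index]
        values[cell] = value
    return values


def inner(u, v, n):
    """L^2 inner product of functions constant on the 2^n half-cells (each of volume 2^-n)."""
    return sum(u[cell] * v[cell] for cell in u) / 2**n


print(len(wavelet_types(3)))   # number of wavelet families in dimension 3
types_2d = wavelet_types(2)
print([[inner(tensor_samples(a), tensor_samples(b), 2) for b in types_2d] for a in types_2d])
```

In dimension $2$ the three types correspond to $\varphi \otimes \psi$, $\psi \otimes \varphi$, and $\psi \otimes \psi$, and the printed Gram matrix shows that they are orthonormal.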
## Wavelet Characterisation of Besov Spaces
### Why Wavelets Characterise Smoothness
Recall from Chapter 11 that a function $f \in L^p(\mathbb{R}^n)$ lies in the Besov space $B^s_{p,q}$ when the Littlewood–Paley pieces $\Delta_j f$ (frequency-localised to the annulus $\{|\xi| \sim 2^j\}$) satisfy the $\ell^q$-in-scale summability condition
\begin{align*}
\|f\|_{B^s_{p,q}} = \left( \sum_{j \in \mathbb{Z}} 2^{jsq} \|\Delta_j f\|_{L^p}^q \right)^{1/q} < \infty.
\end{align*}
A wavelet $\psi$ with $N$ vanishing moments and compact support has the property that its dilates $\psi_{j,\cdot}$ are, from the Fourier side, localised near frequency $2^j$: the Fourier transform $\hat\psi_{j,k}(\xi) = 2^{-j/2} \hat\psi(2^{-j}\xi) e^{-ik\xi/2^j}$ has its main mass in $\{|\xi| \sim 2^j\}$. This means the wavelet coefficient $\langle f, \psi_{j,k} \rangle$ captures the same information as $\Delta_j f$, but instead of measuring an $L^p$ norm over space it measures a single coefficient. The wavelet characterisation of Besov spaces replaces the spatial $L^p$ norm of the Littlewood–Paley piece by a discrete $\ell^p$ norm over coefficients indexed by $k$.
### The Characterisation Theorem
[quotetheorem:3224]
[citeproof:3224]
[remark: Scaling of the Norm]
The exponent $s + n/2 - n/p$ in the theorem has a transparent interpretation. The factor $2^{js}$ penalises or rewards high frequencies in the standard Besov sense: positive $s$ means regularity. The factor $2^{j(n/2 - n/p)}$ compensates for the different normalisations of the $L^2$ and $L^p$ norms of a function localised on a cube of side $2^{-j}$: if $\|f\|_{L^2(Q)} \sim 2^{-jn/2}$ then $\|f\|_{L^p(Q)} \sim 2^{-jn/p}$, and the ratio $2^{j(n/2 - n/p)}$ converts between them.
[/remark]
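For a single basis function the conversion factor can be computed exactly. Substituting $y = 2^j x - k$,
\begin{align*}
\|\Psi^\varepsilon_{j,k}\|_{L^p(\mathbb{R}^n)} = 2^{jn/2} \left( \int_{\mathbb{R}^n} |\Psi^\varepsilon(2^j x - k)|^p \, d\mathcal{L}^n(x) \right)^{1/p} = 2^{jn/2} \cdot 2^{-jn/p}\, \|\Psi^\varepsilon\|_{L^p(\mathbb{R}^n)} = 2^{j(n/2 - n/p)}\, \|\Psi^\varepsilon\|_{L^p(\mathbb{R}^n)}.
\end{align*}
A term $c\, \Psi^\varepsilon_{j,k}$ therefore has $L^p$ norm $|c|\, 2^{j(n/2 - n/p)} \|\Psi^\varepsilon\|_{L^p}$, and multiplying by the smoothness weight $2^{js}$ produces the factor $2^{j(s + n/2 - n/p)}$ attached to the coefficient.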
### Special Cases and Applications
The wavelet characterisation specialises cleanly to previously encountered spaces. For $p = q = 2$, the Besov norm becomes
\begin{align*}
\|f\|_{B^s_{2,2}}^2 \asymp \sum_{j \in \mathbb{Z}} 2^{2js} \sum_{\varepsilon,k} |c^\varepsilon_{j,k}(f)|^2 = \sum_{\varepsilon,j,k} 2^{2js} |\langle f, \Psi^\varepsilon_{j,k} \rangle|^2,
\end{align*}
which is simply a weighted $\ell^2$ norm of the wavelet coefficients, with weight $2^{js}$ on the $j$-th level. Since $B^s_{2,2} = H^s(\mathbb{R}^n)$ (the $L^2$-Sobolev space), this gives a wavelet characterisation of Sobolev regularity purely in terms of the decay rate of wavelet coefficients.
For $p = q = \infty$, the exponent $s + n/2 - n/p$ reduces to $s + n/2$, and the characterisation of the Hölder–Zygmund space $C^s_* = B^s_{\infty,\infty}$ becomes
\begin{align*}
\|f\|_{C^s_*} \asymp \sup_{j \in \mathbb{Z}} \sup_{\varepsilon,k} 2^{j(s + n/2)} |c^\varepsilon_{j,k}(f)|,
\end{align*}
so $f \in C^s_*$ if and only if the wavelet coefficients at level $j$ are all bounded by $C 2^{-j(s + n/2)}$. This is a uniform decay condition on individual coefficients, not an averaged one.
[example: Identifying Singularities via Wavelet Coefficients]
Consider $f(x) = |x|^\alpha \mathbb{1}_{[-1,1]}(x)$ for $\alpha > 0$. This function has a singularity at the origin. The wavelet coefficient at scale $j$ and location $k$ is
\begin{align*}
c_{j,k}(f) = \langle f, \psi_{j,k} \rangle = 2^{j/2} \int_{-1}^1 |x|^\alpha \psi(2^j x - k) \, d\mathcal{L}^1(x).
\end{align*}
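Suppose $\psi$ is supported in $[0, A]$ and has $N > \alpha$ vanishing moments, so that $\psi_{j,k}$ is supported in $[k2^{-j}, (k+A)2^{-j}]$, and take $\alpha$ not an even integer, so that the origin is a genuine singularity. Consider only coefficients whose support lies inside $(-1, 1)$; the truncation also produces jumps at $x = \pm 1$, which are ignored here. For $|k| \ge 2A$ the support stays at distance comparable to $|k|2^{-j}$ from the origin, where $f$ is smooth with $|f^{(N)}(x)| \lesssim |x|^{\alpha - N}$. Subtracting the Taylor polynomial of degree $N - 1$ and using the vanishing moments together with $\|\psi_{j,k}\|_{L^1} = 2^{-j/2}\|\psi\|_{L^1}$ gives $|c_{j,k}(f)| \lesssim (|k|2^{-j})^{\alpha - N} \cdot 2^{-jN} \cdot 2^{-j/2} = 2^{-j(\alpha + 1/2)} |k|^{\alpha - N}$. At a fixed location $x_0 = k2^{-j} \ne 0$ this reads $|c_{j,k}(f)| \lesssim 2^{-j(N + 1/2)} |x_0|^{\alpha - N}$, which decays rapidly in $j$. For $|k| < 2A$ the support contains or is adjacent to the origin and the Taylor argument is no longer available; instead one estimates $|c_{j,k}(f)| \le \|f\|_{L^\infty(\operatorname{supp}\psi_{j,k})} \|\psi_{j,k}\|_{L^1} \lesssim 2^{-j\alpha} \cdot 2^{-j/2} = 2^{-j(\alpha + 1/2)}$. The coefficients near the singularity therefore decay only at rate $2^{-j(\alpha + 1/2)}$, which for $n = 1$ and $p = \infty$ is exactly the rate $2^{-j(s + n/2 - n/p)}$ with $s = \alpha$: this encodes that $f$ has regularity $B^\alpha_{\infty,\infty}$ near the origin, with the correct exponent $\alpha$. The wavelet coefficients thus pinpoint both the *location* (the slow rate persists only for coefficients whose supports shrink to the origin) and the *strength* (exponent $\alpha$) of the singularity.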
[/example]
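For the Haar wavelet ($N = 1$) and $0 < \alpha < 1$, the coefficients of $x^\alpha$ on $[0, 1)$ have a closed form, which allows a numerical check of the two decay regimes. The following Python sketch estimates the decay exponent from two consecutive levels; the function names and the choice of levels are illustrative.

```python
import math


def haar_coeff_power(alpha, j, k):
    """Exact <x**alpha, psi_{j,k}> for the Haar wavelet, for k >= 0 and alpha > -1."""
    h = 2.0 ** (-j)
    a, mid, b = k * h, (k + 0.5) * h, (k + 1) * h
    F = lambda x: x ** (alpha + 1) / (alpha + 1)   # antiderivative of x**alpha
    return 2.0 ** (j / 2) * ((F(mid) - F(a)) - (F(b) - F(mid)))


def decay_exponent(alpha, location, j):
    """Observed rate r in |c_j| ~ 2**(-j r), estimated from levels j and j + 1.

    location = 0 tracks the coefficient touching the origin (k = 0);
    location = 0.5 tracks the coefficient whose interval starts at x = 1/2.
    """
    k_now = int(location * 2**j)
    k_next = int(location * 2 ** (j + 1))
    ratio = abs(haar_coeff_power(alpha, j + 1, k_next) / haar_coeff_power(alpha, j, k_now))
    return -math.log2(ratio)


alpha = 0.5
print(decay_exponent(alpha, 0.0, 10))   # about alpha + 1/2 at the singularity
print(decay_exponent(alpha, 0.5, 10))   # about N + 1/2 = 3/2 away from it
```

The gap between the two observed exponents is what allows the coefficients to locate the singularity and to read off its strength.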
### Connection to Littlewood–Paley Theory
The wavelet basis and the Littlewood–Paley decomposition are two ways of organising the same spectral information. The key difference is that wavelet coefficients are discrete, indexed by $j$ and $k$, while the Littlewood–Paley pieces $\Delta_j f$ are functions measured in $L^p$. The wavelet characterisation of Besov spaces shows that the sequence of function norms $\{2^{js} \|\Delta_j f\|_{L^p}\}_{j \in \mathbb{Z}}$ in the Besov norm is equivalent to the sequence of coefficient norms $\{2^{j(s+n/2-n/p)} (\sum_{\varepsilon,k} |c^\varepsilon_{j,k}(f)|^p)^{1/p}\}_{j \in \mathbb{Z}}$.
This discretisation is the key advantage of wavelets in applications such as numerical analysis and signal compression: instead of storing $f$ as a continuous function, one stores its wavelet coefficients $c_{j,k}(f)$, and the Besov norm controls how well one can approximate $f$ by truncating to the largest coefficients. Specifically, if one retains only the $M$ largest wavelet coefficients of $f \in B^s_{p,q}$, the $L^p$ approximation error decays at rate $M^{-s/n}$, a rate determined entirely by the Besov regularity of $f$. This is the theoretical foundation of wavelet-based image compression.
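The truncation step can be illustrated in a discrete setting. The following Python sketch uses an orthonormal discrete Haar transform on a vector of length $2^J$ (a simplification of the continuous $L^p$ setting above, with $\ell^2$ error only), keeps the $M$ largest coefficients, and compares the reconstruction error with the size of the discarded coefficients; all function names and the test signal are illustrative.

```python
import math


def haar_transform(values):
    """Orthonormal discrete Haar transform of a list whose length is a power of two."""
    coeffs = []
    current = list(values)
    while len(current) > 1:
        averages = [(current[i] + current[i + 1]) / math.sqrt(2) for i in range(0, len(current), 2)]
        details = [(current[i] - current[i + 1]) / math.sqrt(2) for i in range(0, len(current), 2)]
        coeffs = details + coeffs      # coarser details are placed in front
        current = averages
    return current + coeffs            # [coarsest average, coarse details, ..., fine details]


def inverse_haar_transform(coeffs):
    """Invert haar_transform by rebuilding one level at a time."""
    current = coeffs[:1]
    position = 1
    while position < len(coeffs):
        details = coeffs[position:position + len(current)]
        rebuilt = []
        for a, d in zip(current, details):
            rebuilt += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        position += len(current)
        current = rebuilt
    return current


def keep_largest(coeffs, M):
    """Zero all but the M coefficients of largest absolute value."""
    keep = set(sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:M])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


signal = [math.sqrt(abs(i / 32 - 1)) for i in range(64)]   # samples of |x|^(1/2) on [-1, 1)
coeffs = haar_transform(signal)
truncated = keep_largest(coeffs, 8)
approx = inverse_haar_transform(truncated)
error = math.sqrt(sum((s - a) ** 2 for s, a in zip(signal, approx)))
discarded = math.sqrt(sum((c - t) ** 2 for c, t in zip(coeffs, truncated)))
print(error, discarded)   # equal up to rounding, by orthonormality
```

Because the transform is orthonormal, the $\ell^2$ error of the $M$-term approximation equals the $\ell^2$ norm of the discarded coefficients, so faster coefficient decay translates directly into better compression.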
<!-- illustration-needed: side-by-side comparison of Littlewood–Paley frequency annuli (rings in the Fourier domain) versus wavelet tiling (dyadic rectangles in the time-frequency plane), with wavelet $\psi_{j,k}$ shown as a rectangle centered at $(k 2^{-j}, 2^j)$ in the Heisenberg box diagram — this is the standard Heisenberg box or "tiling of the time-frequency plane" illustration -->
## References
- Grafakos, *Classical Fourier Analysis* (3rd ed., Springer, 2014)
- Grafakos, *Modern Fourier Analysis* (3rd ed., Springer, 2014)
- Stein, *Singular Integrals* (Princeton, 1970)
- Stein, *Harmonic Analysis* (Princeton, 1993)
- Duoandikoetxea, *Fourier Analysis* (AMS, 2001)
- Triebel, *Theory of Function Spaces* (Birkhäuser, 1983)
- Daubechies, *Ten Lectures on Wavelets* (SIAM, 1992)
Androma Graduate Harmonic Analysis
Created by admin on 5/4/2026 | Last updated on 6/1/2026