The central problem of partial differential equations is existence: given a PDE such as $\Delta f = g$, does a solution exist, and in what sense? Classical methods require candidates to be twice continuously differentiable, but the spaces $C^k(\mathbb{R}^n)$ have terrible compactness properties — the closed unit ball is never compact in infinite dimensions, and [sequences](/page/Sequence) of smooth [functions](/page/Function) can converge pointwise to discontinuous [limits](/page/Limit). This makes it impossible to extract convergent subsequences from approximating sequences, which is the standard strategy for proving existence.
The resolution is to work in larger, better-behaved function spaces — the $L^p$ spaces and their cousins the [Sobolev spaces](/page/Sobolev%20Space) $W^{k,p}$ — where completeness and compactness are available. The cost is that functions in these spaces are only defined up to [sets](/page/Set) of measure zero, and "derivatives" must be reinterpreted in a weak ([distributional](/page/Distribution)) sense. The payoff is enormous: the weak formulation of a PDE makes existence a consequence of functional analysis (Riesz representation, Lax-Milgram), and the Sobolev embedding theorems then bootstrap the weak solution back to classical regularity.
This course develops the machinery needed for this program. Chapter 1 establishes the properties of $L^p$ spaces — completeness, density, [separability](/page/Separable), and regularity results (Lebesgue [differentiation](/page/Derivative), Lusin's theorem) — that form the analytic foundation. Chapter 2 addresses the "hunt for compactness" in infinite-dimensional spaces: weak and weak-$*$ topologies, reflexivity (Kakutani), uniform convexity (Milman-Pettis), and the identification of dual spaces. Chapter 3 develops Fourier analysis as a tool for decomposing functions and reducing PDEs to algebraic equations. Chapter 4 introduces Sobolev spaces, proves the Sobolev embedding theorems, and applies everything to the Dirichlet problem for Poisson's equation.
# Integration
Measure theory and Lebesgue integration provide the technical substrate on which the entire course is built. The key advantage of the [Lebesgue integral](/page/Lebesgue%20Integral) over the [Riemann integral](/page/Riemann%20Integral) is not greater generality per se, but the availability of powerful limit theorems — the Monotone Convergence Theorem, Fatou's Lemma, and the Dominated Convergence Theorem — that allow us to interchange limits and integrals under controlled hypotheses. These theorems fail for the Riemann integral: if $\{q_1, q_2, \ldots\}$ is an enumeration of $\mathbb{Q} \cap [0,1]$, the sequence $f_n := \mathbb{1}_{\{q_1, \ldots, q_n\}}$ converges pointwise to $\mathbb{1}_{\mathbb{Q} \cap [0,1]}$, each $f_n$ is Riemann integrable, but the limit is not. The Lebesgue theory handles this effortlessly.
## [Integrability](/page/Integral) and the Convergence Theorems
The Lebesgue integral is constructed in three stages: first for simple functions (measurable functions taking finitely many values), then for non-negative measurable functions (via supremum over simple functions below), and finally for general measurable functions (by decomposing into positive and negative parts). For a simple function $s = \sum_{i=1}^n \alpha_i \mathbb{1}_{A_i}$ on a measure space $(E, \mathcal{E}, \mu)$, the integral over a measurable set $A$ is $\int_A s \, d\mu := \sum_{i=1}^n \alpha_i \mu(A_i \cap A)$. For a non-negative measurable function $f$, $\int_E f \, d\mu := \sup\{\int_E s \, d\mu : 0 \le s \le f, \, s \text{ simple}\}$.
A basic but useful estimate relating the integral to the measure of level sets is the Markov (or Chebyshev) inequality: if $f \ge 0$ is measurable and $\alpha > 0$, then $\mu(\{f \ge \alpha\}) \le \frac{1}{\alpha} \int f \, d\mu$. This follows immediately by noting that $\alpha \mathbb{1}_{\{f \ge \alpha\}} \le f$.
[quotetheorem:514]
A direct consequence is that if $\int_E f \, d\mu < \infty$ for $f \ge 0$, then $f < \infty$ a.e.: the sets $A_n := \{f \ge n\}$ decrease to $\{f = +\infty\}$, and $\mu(A_n) \le \frac{1}{n} \int f \, d\mu \to 0$.
[citeproof:514]
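As a quick sanity check of the Markov inequality, the following Python sketch evaluates both sides on a small finite measure space. The four point masses and the function values are made up for illustration, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

# Hypothetical finite measure space: four points with weights mu({x_i}).
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
# A non-negative function f, given by its values at the four points.
values = [Fraction(0), Fraction(1), Fraction(3), Fraction(10)]

def integral(vals, wts):
    """Integral of a simple function: sum of value times measure of the point."""
    return sum(v * w for v, w in zip(vals, wts))

def level_set_measure(vals, wts, alpha):
    """mu({f >= alpha}) on the finite measure space above."""
    return sum(w for v, w in zip(vals, wts) if v >= alpha)

total = integral(values, weights)
for alpha in [Fraction(1), Fraction(2), Fraction(5), Fraction(10)]:
    lhs = level_set_measure(values, weights, alpha)
    rhs = total / alpha
    print(f"alpha={alpha}: mu(f >= alpha)={lhs}, bound={rhs}")
```

In each printed row the level-set measure should not exceed the bound, in line with $\alpha \mathbb{1}_{\{f \ge \alpha\}} \le f$.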
The three convergence theorems are the workhorses of the Lebesgue theory. The Monotone Convergence Theorem handles the simplest case: non-negative functions increasing to a limit.
[quotetheorem:509]
The proof constructs, for each simple function $0 \le s \le f$ and each $c \in (0,1)$, the sets $E_k := \{x : f_k(x) \ge c \cdot s(x)\}$, which increase to $E$ by pointwise convergence. The key step is that the map $A \mapsto \int_A s \, d\mu$ is itself a measure (by the $\sigma$-additivity of $\mu$ and the finiteness of the sum defining $s$), so [continuity](/page/Continuity) from below gives $\int_{E_k} s \, d\mu \uparrow \int_E s \, d\mu$. The inequality $\int f_k \, d\mu \ge c \int_{E_k} s \, d\mu$ (valid since $f_k \ge cs$ on $E_k$) then yields $\lim \int f_k \, d\mu \ge c \int s \, d\mu$, and taking $c \uparrow 1$ and the supremum over $s$ finishes the argument. The non-negativity and monotonicity hypotheses are both essential: without monotonicity, mass can be lost in the limit by concentrating on ever smaller sets (consider $f_n = n \mathbb{1}_{(0, 1/n]}$ on $[0,1]$, where $\int f_n = 1$ for all $n$ but $f_n \to 0$ pointwise).
An immediate and important corollary is that for non-negative measurable functions $(f_k)_k$, $\int \sum_k f_k \, d\mu = \sum_k \int f_k \, d\mu$ — infinite sums and integrals commute when everything is non-negative.
[citeproof:509]
When the sequence is not monotone, we lose equality but retain a one-sided estimate.
[quotetheorem:510]
Fatou's Lemma is proved by applying the Monotone Convergence Theorem to the increasing sequence $g_n := \inf_{k \ge n} f_k$, which satisfies $g_n \uparrow \liminf_k f_k$ and $g_n \le f_k$ for all $k \ge n$. The inequality can be strict: the sliding bump $f_n = n \mathbb{1}_{(0,1/n]}$ has $\liminf f_n = 0$ a.e. but $\liminf \int f_n = 1$.
[citeproof:510]
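The strict inequality in Fatou's Lemma for $f_n = n \mathbb{1}_{(0,1/n]}$ can be checked directly. The sketch below is only an illustration: the sample point $x = 1/100$ and the index ranges are arbitrary choices, and exact fractions are used so that the integrals come out as exactly $1$.

```python
from fractions import Fraction

def f(n, x):
    """f_n = n * indicator of (0, 1/n], evaluated at a point x of [0, 1]."""
    return n if 0 < x <= Fraction(1, n) else 0

def integral_f(n):
    """Exact integral of f_n over [0, 1]: height n times width 1/n."""
    return n * Fraction(1, n)

x = Fraction(1, 100)                         # any fixed point of (0, 1]
tail = [f(n, x) for n in range(101, 200)]    # indices with 1/n < x
integrals = [integral_f(n) for n in range(1, 200)]

print(set(tail))        # pointwise values once 1/n < x
print(set(integrals))   # values taken by the integrals
```

The pointwise values vanish as soon as $1/n < x$, while every integral equals $1$, so $\int \liminf f_n = 0 < 1 = \liminf \int f_n$.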
To upgrade from the one-sided Fatou inequality to an equality, we need to prevent mass from escaping. The Dominated Convergence Theorem identifies the precise condition: the existence of an integrable dominating function.
[quotetheorem:4]
The proof is a beautiful application of Fatou's lemma to the *non-negative* functions $h_k := 2g - |f_k - f|$, where $g$ is the dominating function. Since $h_k \ge 0$ (by the triangle inequality and $|f_k|, |f| \le g$) and $h_k \to 2g$ pointwise, Fatou gives $\int 2g \le \liminf \int h_k = \int 2g - \limsup \int |f_k - f|$, which forces $\limsup \int |f_k - f| \le 0$. The domination hypothesis $|f_k| \le g$ with $g$ integrable is essential: without it, mass can escape (as in the sliding bump example) and the conclusion fails.
The DCT is the single most important tool in the Lebesgue theory. It justifies differentiation under the integral sign, computes limits of parametric integrals, and is the engine behind most approximation arguments in analysis.
[citeproof:4]
[example:Computing a Limit Using the DCT]
Consider $I_n := \int_0^1 \frac{nx}{1 + n^2 x^2} \, d\mathcal{L}^1(x)$. For each fixed $x \in (0, 1]$, $\frac{nx}{1 + n^2 x^2} = \frac{1}{n^{-1}x^{-1} + nx} \to 0$ as $n \to \infty$. At $x = 0$ the integrand is $0$ for all $n$. So $f_n(x) \to 0$ pointwise on $[0, 1]$.
To apply DCT we need a dominating function that does not depend on $n$. The naive bound $\frac{nx}{1 + n^2x^2} \le \frac{1}{nx}$ (from dropping the $1$) depends on $n$, and its supremum over $n \ge 1$ is $\frac{1}{x}$, which is not integrable near $0$, so it does not serve as an integrable dominator. The correct approach uses AM-GM: $1 + n^2x^2 \ge 2nx$, giving $f_n(x) \le \frac{nx}{2nx} = \frac{1}{2}$ for all $x \in (0, 1]$ and all $n$ (and $f_n(0) = 0$). Since $g \equiv 1/2$ is integrable on $[0, 1]$, DCT gives $\lim_{n \to \infty} I_n = \int_0^1 0 \, dx = 0$.
As a check: the substitution $u = nx$ gives $I_n = \frac{1}{n} \int_0^n \frac{u}{1 + u^2} \, du = \frac{1}{2n} \log(1 + n^2) \to 0$, confirming the DCT result. The direct computation works here, but the DCT argument is far more general — it applies to any dominated pointwise-convergent sequence without explicit antiderivatives.
[/example]
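The closed form $I_n = \frac{1}{2n}\log(1 + n^2)$ and the uniform bound $f_n \le \frac{1}{2}$ from the example can be checked numerically. The sketch below uses a plain midpoint rule; the step count and the sample grid are arbitrary choices made for illustration, not part of the argument.

```python
import math

def integrand(n, x):
    """f_n(x) = n x / (1 + n^2 x^2)."""
    return n * x / (1.0 + (n * x) ** 2)

def midpoint_integral(n, steps=100_000):
    """Midpoint-rule approximation of I_n on [0, 1]."""
    h = 1.0 / steps
    return h * sum(integrand(n, (k + 0.5) * h) for k in range(steps))

def closed_form(n):
    """I_n = log(1 + n^2) / (2 n), from the substitution u = n x."""
    return math.log(1.0 + n * n) / (2.0 * n)

for n in [1, 10, 100]:
    print(n, midpoint_integral(n), closed_form(n))

# Largest sampled value of f_n on a grid; the AM-GM bound says it is at most 1/2,
# with equality where n x = 1.
peak = max(integrand(n, k / 1000.0) for n in [1, 10, 100] for k in range(1001))
print(peak)
```

The two columns should agree to several digits and decrease with $n$, consistent with $I_n \to 0$.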
## Lebesgue Spaces
The convergence theorems give us the tools to build function spaces with good analytic properties. For $p \in [1, \infty)$, the $L^p$ norm $\|f\|_p := (\int |f|^p \, d\mu)^{1/p}$ measures the "average size" of $f$ with a weighting that depends on $p$: larger values of $p$ penalise large values of $|f|$ (tall peaks) more severely and small values less. For $p = \infty$, the essential supremum $\|f\|_\infty := \operatorname{ess\,sup} |f|$ captures the "worst case" size up to measure-zero sets.
[definition:Lebesgue Space]
Let $(E, \mathcal{E}, \mu)$ be a measure space and $p \in [1, \infty]$. The **Lebesgue space** $L^p(E, \mathcal{E}, \mu)$ is the set of equivalence classes of measurable functions $f: E \to \mathbb{C}$ with $\|f\|_p < \infty$, where $f \sim g$ if $f = g$ $\mu$-a.e.
[/definition]
The quotient by almost-everywhere equivalence is necessary to make $\|\cdot\|_p$ a genuine norm (rather than a seminorm): $\|f\|_p = 0$ implies $f = 0$ a.e. but not $f = 0$ everywhere. The triangle inequality for $\|\cdot\|_p$ is the [Minkowski inequality](/theorems/517), whose proof for $p > 1$ uses the [Hölder inequality](/theorems/516).
[example:Which Functions Are in $L^p$?]
Consider $f(x) = |x|^{-\alpha}$ on $B_1(0) \subset \mathbb{R}^n$ for $\alpha > 0$. Switching to polar coordinates ($d\mathcal{L}^n = r^{n-1} dr \, d\sigma$):
\begin{align*}
\int_{B_1(0)} |x|^{-\alpha p} \, d\mathcal{L}^n = \omega_{n-1} \int_0^1 r^{-\alpha p} \cdot r^{n-1} \, dr = \omega_{n-1} \int_0^1 r^{n - 1 - \alpha p} \, dr
\end{align*}
where $\omega_{n-1}$ is the surface area of $S^{n-1}$. The integral $\int_0^1 r^\beta \, dr$ converges if and only if $\beta > -1$, i.e. $n - 1 - \alpha p > -1$, i.e. $\alpha p < n$. So $|x|^{-\alpha} \in L^p(B_1(0))$ if and only if $\alpha < n/p$.
For example, in $\mathbb{R}^3$: $|x|^{-1} \in L^p(B_1(0))$ for $p < 3$ but not for $p \ge 3$. The function $|x|^{-1}$ is the Newtonian potential in $\mathbb{R}^3$ and appears as the fundamental solution of [Laplace's equation](/page/Laplace's%20Equation) — its $L^p$ integrability determines which Sobolev spaces it belongs to.
At infinity the situation reverses: $|x|^{-\alpha} \in L^p(\mathbb{R}^n \setminus B_1(0))$ if and only if $\alpha p > n$ (the integral $\int_1^\infty r^{n-1-\alpha p} dr$ converges iff $n - 1 - \alpha p < -1$). So no power function $|x|^{-\alpha}$ lies in $L^p(\mathbb{R}^n)$ globally — the local and global conditions are contradictory.
[/example]
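The threshold $\alpha p < n$ can be illustrated by evaluating the truncated radial integral $\int_\varepsilon^1 r^{n-1-\alpha p} \, dr$ in closed form and letting $\varepsilon \downarrow 0$. The function names below are hypothetical helpers introduced only for this sketch.

```python
import math

def radial_exponent(alpha, p, n):
    """Exponent beta in the radial integrand r^beta = r^(n - 1 - alpha p)."""
    return n - 1 - alpha * p

def truncated_radial_integral(alpha, p, n, eps):
    """Exact value of the integral of r^beta over [eps, 1]."""
    beta = radial_exponent(alpha, p, n)
    if beta == -1:
        return math.log(1.0 / eps)
    return (1.0 - eps ** (beta + 1)) / (beta + 1)

def in_Lp_unit_ball(alpha, p, n):
    """|x|^(-alpha) lies in L^p(B_1(0)) iff alpha * p < n."""
    return alpha * p < n

# The example from the text: |x|^(-1) in R^3.
for p in [2, 3, 4]:
    values = [truncated_radial_integral(1, p, 3, eps) for eps in (1e-2, 1e-4, 1e-8)]
    print(p, in_Lp_unit_ball(1, p, 3), values)
```

For $p = 2$ the truncated integrals stay below $1$, while for $p = 3$ they grow like $\log(1/\varepsilon)$ and for $p = 4$ like $1/\varepsilon$.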
The most fundamental structural property of $L^p$ spaces — and the reason they are indispensable in PDE theory — is completeness.
[quotetheorem:892]
The proof for $p < \infty$ extracts a fast Cauchy subsequence $(f_{n_k})$ with $\|f_{n_{k+1}} - f_{n_k}\|_p < 2^{-k}$, constructs the dominating function $G := \sum_k |f_{n_{k+1}} - f_{n_k}|$ (which is in $L^p$ by the Minkowski inequality and the Monotone Convergence Theorem, since the partial sums have $L^p$ norm at most $1$ and hence $\|G\|_p \le 1$), and uses $G < \infty$ a.e. to deduce that the telescoping series converges pointwise a.e. to some $f$. The Dominated Convergence Theorem (applied to $|f - f_{n_k}|^p \le G^p \in L^1$) then upgrades pointwise convergence to $L^p$ convergence of the subsequence, and a Cauchy sequence with a convergent subsequence converges to the same limit. The proof yields an important byproduct: every $L^p$-convergent sequence has a *pointwise a.e. convergent subsequence*.
The $p = \infty$ case is different in character: the Cauchy condition in $\|\cdot\|_\infty$ means [uniform convergence](/page/Uniform%20Convergence) outside a measure-zero set, so the limit is constructed by genuine uniform convergence on the "good" set and extended by zero on the null set.
[citeproof:892]
[example:Incompleteness of $C([0,1])$ Under the $L^1$ Norm]
Completeness depends on the norm, not just the vector space. Consider $C([0,1])$ with the $L^1$ norm $\|f\|_1 = \int_0^1 |f| \, d\mathcal{L}^1$ (instead of the supremum norm, under which it *is* complete). Define:
\begin{align*}
f_n(x) := \begin{cases} 0 & x \le 1/2, \\ n(x - 1/2) & 1/2 < x \le 1/2 + 1/n, \\ 1 & x > 1/2 + 1/n. \end{cases}
\end{align*}
Each $f_n$ is continuous. For $m > n$: $f_n$ and $f_m$ agree outside $[1/2, 1/2 + 1/n]$, and both take values in $[0, 1]$ on this interval, so $\|f_n - f_m\|_1 \le 1/n \to 0$. Hence $(f_n)$ is Cauchy in $\|\cdot\|_1$.
However, $f_n \to \mathbb{1}_{(1/2, 1]}$ pointwise, which is discontinuous. If some $g \in C([0,1])$ were the $L^1$ limit, then $\|f_n - g\|_1 \to 0$ would force a subsequence to converge pointwise a.e. to $g$, giving $g = \mathbb{1}_{(1/2,1]}$ a.e. — contradicting continuity of $g$ at $x = 1/2$. So $(f_n)$ has no limit in $(C([0,1]), \|\cdot\|_1)$. The $L^1$ norm is too weak to prevent limits from developing discontinuities: it controls the area between graphs, not the height of oscillations. The completion of $(C([0,1]), \|\cdot\|_1)$ is $L^1([0,1])$, which includes $\mathbb{1}_{(1/2, 1]}$.
[/example]
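The contrast between the two norms in this example can be made quantitative. For $m > n \ge 2$ one has $f_m \ge f_n$ pointwise, so $\|f_n - f_m\|_1 = \int f_m - \int f_n$, while the largest gap between the graphs occurs at $x = 1/2 + 1/m$. The sketch below encodes these two closed forms (derived here for illustration, not taken from the text) with exact fractions.

```python
from fractions import Fraction

def integral_f(n):
    """Exact integral of f_n over [0, 1] for n >= 2:
    ramp area 1/(2n) plus plateau length 1/2 - 1/n."""
    return Fraction(1, 2) - Fraction(1, 2 * n)

def l1_distance(n, m):
    """||f_n - f_m||_1; one function dominates the other, so this is a difference of integrals."""
    return abs(integral_f(m) - integral_f(n))

def sup_distance(n, m):
    """||f_n - f_m||_sup, attained at x = 1/2 + 1/max(n, m)."""
    n, m = min(n, m), max(n, m)
    return 1 - Fraction(n, m)

for n in [2, 10, 100, 1000]:
    print(n, l1_distance(n, 2 * n), sup_distance(n, 2 * n))
```

The $L^1$ distances $\|f_n - f_{2n}\|_1 = 1/(4n)$ tend to zero, whereas the supremum distances stay at $1/2$: the sequence is Cauchy in $\|\cdot\|_1$ but not in the supremum norm, which is consistent with $C([0,1])$ being complete under the latter.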
For analysis in $\mathbb{R}^n$, we need to know that $L^p$ spaces contain enough "nice" functions. The density results tell us that smooth, compactly supported functions are dense, so any $L^p$ function can be approximated by smooth functions — the starting point for most regularisation arguments.
[quotetheorem:893]
The proof for simple functions is a direct application of the Dominated Convergence Theorem: the standard approximation $0 \le s_n \uparrow f$ satisfies $|f - s_n|^p \le |f|^p \in L^1$, so DCT gives $\|f - s_n\|_p \to 0$. The approximation by $C_c^\infty$ functions requires the regularity of the Lebesgue measure: outer regularity approximates Borel sets by [open sets](/page/Open%20Set), which are then covered by rational-endpoint cubes, and [mollification](/page/Standard%20Mollifier) produces smooth approximations.
[citeproof:893]
Closely related is separability — the existence of a countable dense subset — which is essential for metrisability of weak topologies (as we shall see in Chapter 2).
[quotetheorem:548]
The countable dense subset consists of finite rational linear combinations of indicator functions of rational-endpoint cubes. Non-separability of $L^\infty$ follows from the uncountable family $(\mathbb{1}_{[-r,r]})_{r > 0}$, which satisfies $\|\mathbb{1}_{[-r,r]} - \mathbb{1}_{[-r',r']}\|_\infty = 1$ for $r \ne r'$ — no [countable set](/page/Countable%20Set) can approximate all of them.
[citeproof:548]
## Regularity of Measurable and Integrable Functions
Measurable and integrable functions can behave badly on sets of measure zero (we can redefine them arbitrarily on null sets without changing their equivalence class). But away from such pathologies, how "nice" are they? Two results answer this question: the Lebesgue Differentiation Theorem (the [fundamental theorem of calculus](/theorems/632) holds a.e. for $L^1$ functions) and Lusin's Theorem (measurable functions are continuous after removing sets of arbitrarily small measure).
The central concept is that of a **Lebesgue point**: $x$ is a Lebesgue point of $f$ if the average oscillation of $f$ around $x$ vanishes as we zoom in. Every continuity point of $f$ is a Lebesgue point (the $\varepsilon$-$\delta$ argument gives $\frac{1}{\mathcal{L}^n(B_r(x))} \int_{B_r(x)} |f(y) - f(x)| \, d\mathcal{L}^n(y) \le \varepsilon$ for $r < \delta$), but Lebesgue points can exist even where $f$ is discontinuous.
[quotetheorem:74]
The proof uses a density argument. The result holds for continuous $L^1$ functions (as every point is a Lebesgue point). For general $f \in L^1(\mathbb{R}^n)$, decompose $f = g + h$ where $g$ is continuous with $\|h\|_{L^1} < \varepsilon$. Since the limit is not yet known to exist, one works with the quantity $T_f(x) := \limsup_{r \downarrow 0} \frac{1}{\mathcal{L}^n(B_r(x))} \int_{B_r(x)} |f(y) - f(x)| \, d\mathcal{L}^n(y)$, which satisfies $T_f(x) \le T_g(x) + T_h(x) \le |h(x)| + M_h(x)$ where $M_h$ is the Hardy-Littlewood maximal function. The key estimate $\mathcal{L}^n(\{M_h > a\}) \le \frac{3^n}{a} \|h\|_{L^1}$ (proved using the [Vitali Covering Lemma](/theorems/15)), combined with the Markov inequality for $|h|$, then gives $\mathcal{L}^n(\{T_f > 1/k\}) \le Ck\varepsilon$ for any $\varepsilon > 0$, forcing $T_f = 0$ a.e.
[citeproof:74]
Two important corollaries follow immediately.
[quotetheorem:894]
The Lebesgue Density Theorem says that when you zoom in on almost any point of a Borel set, the proportion of the ball occupied by the set converges to either $0$ (if the point is outside) or $1$ (if inside). The proof is a one-line application of the Lebesgue Differentiation Theorem to the function $f = \mathbb{1}_{E \cap B_{M+1}(0)}$ on each ball $B_M(0)$, followed by a countable union over $M$.
[citeproof:894]
[quotetheorem:895]
The Lebesgue Fundamental Theorem of Calculus recovers the classical FTC for $L^1$ functions: if $F(x) = \int_{-\infty}^x f$, then $F'(x) = f(x)$ for a.e. $x$. The estimate $\left|\frac{F(x+\delta) - F(x)}{\delta} - f(x)\right| \le \frac{2}{\mathcal{L}^1(B_\delta(x))} \int_{B_\delta(x)} |f(y) - f(x)| \, dy \to 0$ is immediate from the definition of a Lebesgue point. The converse fails: Cantor's staircase function is differentiable a.e. with $F' = 0$ a.e., but $F(1) - F(0) = 1 \ne \int_0^1 0 \, dx$. The correct converse requires *absolute continuity* of $F$.
[citeproof:895]
The link between measurability and continuity is made precise by Egorov's and Lusin's theorems.
[quotetheorem:896]
Egorov's Theorem converts pointwise convergence to uniform convergence at the cost of removing a set of arbitrarily small measure. The proof builds nested sets $E_N^{(k)} := \bigcap_{p \ge N} \{|f_p - f| \le 1/k\}$ which increase to $A$ for each $k$ (by pointwise convergence), chooses $N_k$ with $\mu(A \setminus E_{N_k}^{(k)}) \le \varepsilon/2^k$, and takes $A_\varepsilon := A \setminus \bigcup_k (A \setminus E_{N_k}^{(k)})$. The finite-measure assumption $\mu(A) < \infty$ is essential: if $f_k = \mathbb{1}_{[k, k+1]}$ on $\mathbb{R}$, then $f_k \to 0$ pointwise but not uniformly on any set of finite co-measure.
[citeproof:896]
[quotetheorem:12]
Lusin's Theorem is a remarkable result: any measurable function, no matter how pathological, becomes continuous after removing a set of arbitrarily small measure. The key distinction is that this says $f|_{K}$ is continuous (the restriction to $K$ is continuous as a function on $K$), not that $f$ is continuous at every point of $K$. For instance, for $f = \mathbb{1}_{\mathbb{Q}}$ the restriction $f|_{\mathbb{R} \setminus \mathbb{Q}}$ is continuous (it is the constant function $0$), even though $f$ is discontinuous at every point of $\mathbb{R}$. The proof uses inner regularity to find compact sets $K_n, K_n'$ inside $f^{-1}(V_n)$ and $F \setminus f^{-1}(V_n)$ respectively (for rational-endpoint intervals $V_n$), takes $K = \bigcap_n (K_n \cup K_n')$, and verifies continuity of $f|_K$ using normality to separate $K_n$ from $K_n'$.
[citeproof:12]
# Weak Topologies, Reflexivity, and Separability
The closed unit ball of a normed vector space is compact if and only if the space is finite-dimensional — a fact proved in the [Linear Analysis](/page/Cambridge%20II%20Linear%20Analysis) course. For function spaces such as $L^p(\mathbb{R}^n)$ and Sobolev spaces, which are infinite-dimensional, this means that bounded sequences need not have convergent subsequences in the norm topology. Since extracting convergent subsequences is the standard method for proving existence of solutions to PDEs, this is a serious obstacle.
The resolution is to weaken the topology. Fewer open sets means more compact sets, and the **[weak topology](/page/Weak*%20Topology)** — the coarsest topology making all bounded linear functionals continuous — turns out to strike the right balance: weak enough for compactness, strong enough for the limits to retain useful properties. The program of this chapter is: define the weak and weak-$*$ topologies, prove compactness of the closed unit ball in the weak-$*$ topology (Banach-Alaoglu), transfer this to the weak topology via reflexivity (Kakutani), and identify when the weak topology is metrizable (separability). The concrete payoff is that in separable reflexive [Banach spaces](/page/Banach%20Space) (such as $L^p$ for $1 < p < \infty$), every bounded sequence has a [weakly convergent](/page/Weak%20Convergence) subsequence.
## Initial Topologies and Weak Convergence
The weak topology is a special case of a general construction from point-set topology.
[definition:Initial Topology]
Let $X$ be a set and $(\phi_i: X \to Y_i)_{i \in I}$ a family of maps into [topological](/page/Topology) spaces $(Y_i)_{i \in I}$. The **initial topology** on $X$ generated by the $(\phi_i)_{i \in I}$ is the coarsest topology on $X$ making all $\phi_i$ continuous. Its open sets are arbitrary unions of finite intersections of preimages $\phi_i^{-1}(U_i)$ for $U_i \subseteq Y_i$ open.
[/definition]
The key property of initial topologies is that convergence is determined componentwise: $x_n \to x$ in the initial topology if and only if $\phi_i(x_n) \to \phi_i(x)$ in $Y_i$ for every $i \in I$. This follows directly from the description of a neighbourhood basis: a basic neighbourhood of $x$ is a finite intersection $\bigcap_{j \in J} \phi_j^{-1}(U_j)$, and $x_n$ is eventually in such a set if and only if $\phi_j(x_n) \in U_j$ for each of the finitely many $j$.
The initial topology also satisfies a **universal property**: a map $\psi: Z \to X$ from another topological space is continuous if and only if $\phi_i \circ \psi: Z \to Y_i$ is continuous for every $i$. This is the tool we will use repeatedly to verify continuity of maps between spaces with weak topologies.
[definition:Weak Topology]
Let $E$ be a [normed vector space](/page/Normed%20Vector%20Space) with dual $E^*$. The **weak topology** on $E$, denoted $\sigma(E, E^*)$, is the initial topology generated by $E^*$: the coarsest topology on $E$ making every $F \in E^*$ continuous.
[/definition]
By the characterisation of convergence in initial topologies: $x_n \rightharpoonup x$ weakly if and only if $F(x_n) \to F(x)$ for every $F \in E^*$. The weak topology is Hausdorff — if $x \ne y$ then the Geometric [Hahn-Banach Theorem](/theorems/879) produces $F \in E^*$ separating them.
The weak topology is always coarser than the norm (strong) topology: every $F \in E^*$ is strongly continuous by definition, so every weakly open set is strongly open. In finite dimensions the two topologies coincide (projections onto coordinates are dual elements, and the strong topology on $\mathbb{R}^n$ is generated by coordinate projections). In infinite dimensions they differ: weakly open neighbourhoods of $0$ are "huge" — they contain entire lines through the origin (because if $F_1, \ldots, F_n \in E^*$ all vanish at some $x \ne 0$, which must happen when $\dim E > n$ by rank-nullity, then $\lambda x$ lies in any basic weak neighbourhood of $0$ for all $\lambda$). In particular, the unit sphere $\{x : \|x\| = 1\}$ is strongly closed but not weakly closed in infinite dimensions, since $0$ lies in its weak closure.
[definition:Weak-Star Topology]
Let $E$ be a normed vector space. The **weak-$*$ topology** on $E^*$, denoted $\sigma(E^*, E)$, is the initial topology generated by the evaluation maps $\hat{f}: E^* \to \mathbb{R}$ defined by $\hat{f}(F) := F(f)$ for each $f \in E$.
[/definition]
Weak-$*$ convergence $F_n \overset{*}{\rightharpoonup} F$ means $F_n(f) \to F(f)$ for every $f \in E$ — pointwise convergence of functionals. The weak-$*$ topology is generated by a smaller class of functionals than the weak topology on $E^*$ (the evaluation maps $\hat{f} \in E^{**}$ form the image of the canonical embedding, which is a proper subset of $E^{**}$ when $E$ is not reflexive), so $\sigma(E^*, E) \subseteq \sigma(E^*, E^{**}) \subseteq \tau_{\text{strong}}$.
[example:Weak Convergence Without Strong Convergence in $\ell^2$]
Let $e_n := (0, \ldots, 0, 1, 0, \ldots) \in \ell^2(\mathbb{R})$ be the $n$-th standard basis vector. Then $e_n \rightharpoonup 0$ weakly but $e_n \not\to 0$ strongly.
**Weak convergence:** For any $F \in (\ell^2)^* \cong \ell^2$, the Riesz representation gives $F = F_y$ for some $y \in \ell^2$, with $F_y(x) = \sum_k x_k y_k$. Then $F_y(e_n) = y_n \to 0$ as $n \to \infty$, since $y \in \ell^2$ forces $\sum_k |y_k|^2 < \infty$ and hence $y_n \to 0$. So $F(e_n) \to 0 = F(0)$ for every $F \in (\ell^2)^*$.
**No strong convergence:** $\|e_n - 0\|_{\ell^2} = 1$ for all $n$, so $e_n \not\to 0$ in norm.
This example reveals the essential difference between the two topologies. The weak topology "sees" only the components $F(e_n) = y_n$ individually, each of which vanishes as $n \to \infty$ for any fixed $y \in \ell^2$. The norm, by contrast, measures all components simultaneously: $\|e_n\|^2 = \sum_k |e_n^{(k)}|^2 = 1$ does not depend on $n$. The mass does not disappear — it moves to higher and higher coordinates, escaping every fixed functional but maintaining constant norm.
[/example]
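The behaviour of $e_n$ can be mimicked on a computer by truncating sequences to finitely many coordinates. The truncation level `N` and the choice $y_k = 1/k$ below are assumptions made only for illustration; the genuine statement concerns infinite sequences.

```python
import math

N = 10_000  # truncation level: an assumption so that the vectors are finite

# y_k = 1/k for k = 1..N, a truncation of a square-summable sequence.
y = [1.0 / k for k in range(1, N + 1)]

def pairing_with_e(n):
    """F_y(e_n) = sum_k (e_n)_k y_k = y_n (1-based index n)."""
    return y[n - 1]

def norm_e(n):
    """l^2 norm of e_n, computed from its single non-zero entry."""
    e = [0.0] * N
    e[n - 1] = 1.0
    return math.sqrt(sum(t * t for t in e))

for n in [1, 10, 100, 1000, 10_000]:
    print(n, pairing_with_e(n), norm_e(n))
```

The pairings $F_y(e_n) = 1/n$ shrink while the norms stay equal to $1$, which is the numerical shadow of $e_n \rightharpoonup 0$ without $e_n \to 0$.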
## Weak-Star Compactness: The Banach-Alaoglu Theorem
The central compactness result is that the closed unit ball of $E^*$ is always compact in the weak-$*$ topology. The proof embeds $B_{E^*}$ into a product of compact intervals $\prod_{f \in B_E} [-1, 1]$ (using that $|F(f)| \le \|F\| \cdot \|f\| \le 1$ for $F \in B_{E^*}$ and $f \in B_E$), shows the image is closed in the product topology, and invokes Tychonoff's theorem.
[quotetheorem:212]
This is the foundational result for weak-$*$ compactness. The weak-$*$ topology contains few enough open sets for compactness to hold, but enough for the topology to be Hausdorff and for limits to retain linearity and boundedness. The theorem requires no hypotheses on $E$ — it holds for any normed vector space.
However, Banach-Alaoglu is only directly useful when the space we are working in is itself a dual space. The next question is how to transfer this compactness from $E^*$ back to the original space $E$.
[citeproof:212]
## Reflexivity and Weak Compactness
The [canonical embedding](/theorems/875) $\phi: E \to E^{**}$ defined by $\phi(f)(F) := F(f)$ is an isometry, so $\phi(B_E) \subseteq B_{E^{**}}$. If $\phi$ is surjective — meaning $E$ is **reflexive** — then $\phi(B_E) = B_{E^{**}}$, and the inverse $\phi^{-1}$ is continuous from $(E^{**}, \sigma(E^{**}, E^*))$ to $(E, \sigma(E, E^*))$ (by the universal property of initial topologies). Since $B_{E^{**}}$ is weak-$*$ compact by Banach-Alaoglu, $B_E = \phi^{-1}(B_{E^{**}})$ is weakly compact.
The converse also holds: if $B_E$ is weakly compact, then $\phi(B_E)$ is weak-$*$ compact (hence closed) in $E^{**}$, and by Goldstine's Lemma it is weak-$*$ dense, so $\phi(B_E) = B_{E^{**}}$ and $E$ is reflexive.
[quotetheorem:898]
Goldstine's Lemma is the density half of the characterisation. The proof shows that any weak-$*$ neighbourhood of any $\psi \in B_{E^{**}}$ must intersect $\phi(B_E)$: if it did not, the [linear map](/page/Linear%20Map) $H: E \to \mathbb{R}^n$ defined by $H(f) = (F_1(f), \ldots, F_n(f))$ (for the $F_i$ defining the neighbourhood) would have $\alpha := (\psi(F_1), \ldots, \psi(F_n))$ outside $H(B_E)$, and separation by a hyperplane would contradict $\|\psi\|_{E^{**}} \le 1$.
[citeproof:898]
[quotetheorem:897]
Kakutani's Theorem is the fundamental result connecting reflexivity to compactness. It says that reflexive Banach spaces are precisely those whose closed unit ball is weakly compact. In combination with separability (which makes the weak topology metrizable on bounded sets), this gives weak *sequential* compactness: every bounded sequence in a separable reflexive Banach space has a weakly convergent subsequence.
[citeproof:897]
Some useful permanence properties: a closed subspace of a reflexive Banach space is reflexive (extend functionals by Hahn-Banach). A Banach space $E$ is reflexive if and only if $E^*$ is reflexive (the forward direction gives $\sigma(E^*, E) = \sigma(E^*, E^{**})$, so $B_{E^*}$ is weakly compact by Banach-Alaoglu; the reverse uses the canonical embedding).
## Uniform Convexity and Reflexivity
How do we verify that a given Banach space is reflexive? The most powerful criterion is **uniform convexity**: a geometric condition on the unit ball that implies reflexivity via the Milman-Pettis Theorem.
[definition:Uniform Convexity]
A Banach space $E$ is **uniformly convex** if for every $\varepsilon > 0$ there exists $\delta > 0$ such that for all $f, g \in B_E$,
\begin{align*}
\|f - g\| > \varepsilon \implies \left\|\frac{f + g}{2}\right\| < 1 - \delta.
\end{align*}
[/definition]
Geometrically, uniform convexity says that the midpoint of any two points on the unit sphere that are more than $\varepsilon$ apart lies strictly inside the unit ball, by a uniform amount $\delta$ that depends only on $\varepsilon$ and not on the particular points. In $\mathbb{R}^2$, the Euclidean unit ball (a disc) is uniformly convex; the unit balls of the $\ell^1$ and $\ell^\infty$ norms (a diamond and a square) are not, because their boundaries contain line segments.
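The two-dimensional picture can be tested with explicit points. The sketch below computes $\|f - g\|$ and $\|(f+g)/2\|$ for pairs of unit vectors in $\mathbb{R}^2$ under the $\ell^1$, $\ell^2$ and $\ell^\infty$ norms; the specific vectors are illustrative choices.

```python
def p_norm(v, p):
    """l^p norm on R^2; p = float('inf') gives the max norm."""
    if p == float("inf"):
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def gap_and_midpoint(f, g, p):
    """Return (||f - g||_p, ||(f + g)/2||_p)."""
    diff = [a - b for a, b in zip(f, g)]
    mid = [(a + b) / 2.0 for a, b in zip(f, g)]
    return p_norm(diff, p), p_norm(mid, p)

inf = float("inf")
print(gap_and_midpoint((1.0, 0.0), (0.0, 1.0), 1))     # diamond
print(gap_and_midpoint((1.0, 0.0), (0.0, 1.0), 2))     # disc
print(gap_and_midpoint((1.0, 1.0), (1.0, -1.0), inf))  # square
```

For the diamond and the square the two points are at distance $2$ yet their midpoint still has norm $1$, so no $\delta > 0$ can work; for the disc the midpoint has norm $1/\sqrt{2} < 1$.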
[quotetheorem:899]
The proof uses Goldstine's Lemma to show that $\phi(B_E)$ is not merely weak-$*$ dense in $B_{E^{**}}$ but **strongly** dense. For any $\psi \in B_{E^{**}}$ and $\varepsilon > 0$, Goldstine gives $f \in B_E$ with $\phi(f)$ in a weak-$*$ neighbourhood of $\psi$. If $\|\phi(f) - \psi\| > \varepsilon$, a second application of Goldstine finds $g \in B_E$ with $\phi(g)$ also close to $\psi$ in the weak-$*$ topology. Then $\|f - g\| > \varepsilon$ but a dual functional $F$ witnesses $\|(f+g)/2\| > 1 - \delta$, contradicting uniform convexity. Since $\phi(B_E)$ is strongly dense and strongly closed (by completeness), $\phi(B_E) = B_{E^{**}}$.
[citeproof:899]
The main application of the Milman-Pettis Theorem is to the $L^p$ spaces. The Clarkson inequalities establish uniform convexity for $p \in (1, \infty)$.
[quotetheorem:900]
The first Clarkson inequality for $p \ge 2$ is proved by a scalar argument: for $\theta \in [0, 1]$ and $p \ge 2$, convexity of $t \mapsto t^{p/2}$ gives $\theta^{p/2} + (1-\theta)^{p/2} \le 1$, which after substitution and integration yields the inequality for $L^p$ functions. The second Clarkson inequality for $1 < p \le 2$ follows by duality from the first inequality applied on $L^q$ with $q = p/(p-1) \ge 2$. The failure at $p = 1$ and $p = \infty$ is witnessed by explicit examples: $f = \mathbb{1}_{[0,1]}$, $g = \mathbb{1}_{[1,2]}$ in $L^1$ have $\|f - g\|_1 = 2$ but $\|(f+g)/2\|_1 = 1$.
[citeproof:900]
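A numerical spot check can build some confidence in the first Clarkson inequality. The sketch below assumes its standard form for $p \ge 2$, namely $\left\|\frac{f+g}{2}\right\|_p^p + \left\|\frac{f-g}{2}\right\|_p^p \le \frac{1}{2}\left(\|f\|_p^p + \|g\|_p^p\right)$ (the quoted theorem may be phrased differently), and tests it on random vectors in $\mathbb{R}^5$, viewed as $L^p$ of a five-point set with counting measure. The seed, dimension and sample count are arbitrary.

```python
import random

def p_norm_pow(v, p):
    """||v||_p^p for a finite vector (counting measure on a finite set)."""
    return sum(abs(t) ** p for t in v)

def clarkson_gap(f, g, p):
    """Right side minus left side of the first Clarkson inequality
    (assumed standard form, p >= 2)."""
    half_sum = [(a + b) / 2.0 for a, b in zip(f, g)]
    half_diff = [(a - b) / 2.0 for a, b in zip(f, g)]
    lhs = p_norm_pow(half_sum, p) + p_norm_pow(half_diff, p)
    rhs = 0.5 * (p_norm_pow(f, p) + p_norm_pow(g, p))
    return rhs - lhs

rng = random.Random(0)  # fixed seed so the run is reproducible
worst = min(
    clarkson_gap(
        [rng.uniform(-1, 1) for _ in range(5)],
        [rng.uniform(-1, 1) for _ in range(5)],
        p,
    )
    for p in (2, 3, 4, 10)
    for _ in range(1000)
)
print(worst)  # smallest gap over all samples
```

The smallest gap should be non-negative up to rounding; for $p = 2$ the inequality is the parallelogram identity, so gaps of size comparable to machine precision are expected there. A random search of this kind is evidence, not a proof.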
Uniform convexity also gives a useful characterisation of strong convergence: in a uniformly convex space, $f_n \to f$ strongly if and only if $f_n \rightharpoonup f$ weakly and $\|f_n\| \to \|f\|$. The proof normalises to the unit ball and uses the definition of uniform convexity to show $\|(f_n + f)/2\| \to 1$ implies $\|f_n - f\| \to 0$.
## Separability, Metrisability, and Sequential Compactness
Kakutani tells us when the closed unit ball is weakly compact; separability tells us when this compactness is *sequential*. In a [metric space](/page/Metric%20Space), compactness and sequential compactness coincide, so the question reduces to: when is the weak topology metrizable on bounded sets?
The answer involves the separability of the dual space. If $E$ is separable, the weak-$*$ topology on $B_{E^*}$ is metrizable (define $d(F, G) := \sum_n 2^{-n} |F(f_n) - G(f_n)|$ for a dense sequence $(f_n)_n$ in $B_E$). Conversely, if the weak-$*$ topology on $B_{E^*}$ is metrizable, then $E$ is separable (the basic weak-$*$ neighbourhoods of $0$ determine a countable subset of $E$ whose span is dense, by a Hahn-Banach argument). Similarly, $E^*$ separable implies the weak topology on $B_E$ is metrizable (and the converse holds as well).
Combining everything: if $E$ is a separable, reflexive Banach space, then $B_E$ is weakly compact (Kakutani) and the weak topology on $B_E$ is metrizable (since $E^*$ is also separable and reflexive by the permanence properties of reflexivity). Hence $B_E$ is weakly sequentially compact, and by rescaling, every bounded sequence in $E$ has a weakly convergent subsequence. Similarly, every bounded sequence in $E^*$ has a weak-$*$ convergent subsequence.
[quotetheorem:214]
This is the culmination of the "hunt for compactness." For $L^p(\mathbb{R}^n)$ with $1 < p < \infty$, all the hypotheses are satisfied: $L^p$ is a separable Banach space (from Chapter 1), reflexive (by Milman-Pettis and Clarkson), and its dual $L^q$ is also separable and reflexive. Hence every bounded sequence in $L^p$ has a weakly convergent subsequence — the starting point for PDE existence theory.
[citeproof:214]
## Concrete Function Spaces
We now apply the abstract theory to identify the duals of the concrete function spaces.
**$L^p(\mathbb{R}^n)$ for $p \in (1, \infty)$:** The space is reflexive (Milman-Pettis + Clarkson), separable, and its dual is $L^q(\mathbb{R}^n)$ where $1/p + 1/q = 1$.
[quotetheorem:901]
The map $\Phi: L^q \to (L^p)^*$ sending $g \mapsto \Phi_g$ where $\Phi_g(f) = \int fg \, d\mu$ is well-defined and isometric: the upper bound $|\Phi_g(f)| \le \|f\|_p \|g\|_q$ is Hölder's inequality, and the lower bound is achieved by the test function $f = \operatorname{sign}(g)|g|^{q-1}/\|g\|_q^{q/p}$. Surjectivity for $p \in (1, \infty)$ uses reflexivity: the image is a closed subspace of a reflexive space, and a Hahn-Banach argument shows any functional vanishing on the image must be zero. The $p = 1$ case requires a different argument via the Hilbert space $L^2$ and the Riesz Representation Theorem.
[citeproof:901]
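The lower bound in the isometry argument can be checked numerically. The sketch below replaces Lebesgue measure by a weighted grid (an assumption for illustration; the function names and the sample $g(x) = x e^{-x^2}$ are hypothetical choices) and confirms that the test function $f = \operatorname{sign}(g)|g|^{q-1}/\|g\|_q^{q/p}$ has $\|f\|_p = 1$ and $\int fg = \|g\|_q$.

```python
import math

def lp_norm(values, weights, p):
    """Weighted discrete L^p norm: (sum of w_i |v_i|^p)^(1/p)."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def extremal_test_function(g, weights, p):
    """Return f = sign(g) |g|^(q-1) / ||g||_q^(q/p), where 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    scale = lp_norm(g, weights, q) ** (q / p)
    return [math.copysign(abs(v) ** (q - 1.0), v) / scale for v in g]

# Illustrative data: samples of g(x) = x exp(-x^2) at midpoints of a grid on [-3, 3].
m = 600
h = 6.0 / m
xs = [-3.0 + (j + 0.5) * h for j in range(m)]
g = [x * math.exp(-x * x) for x in xs]
weights = [h] * m

p = 3.0
q = p / (p - 1.0)
f = extremal_test_function(g, weights, p)
pairing = sum(w * fv * gv for fv, gv, w in zip(f, g, weights))

# Expect ||f||_p = 1 and the pairing equal to ||g||_q (up to rounding).
print(lp_norm(f, weights, p), pairing, lp_norm(g, weights, q))
```

On a discrete measure the identity is exact rather than approximate, because the exponent arithmetic $(q-1)p = q$ does not depend on the underlying measure.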
**$L^1(\mathbb{R}^n)$:** Its dual is $L^\infty(\mathbb{R}^n)$ (from the $p = 1$ case of the theorem above). However, $L^1$ is **not** reflexive. To see this, consider $f_n := \frac{1}{|B_{1/n}(0)|} \mathbb{1}_{B_{1/n}(0)}$, which satisfies $\|f_n\|_{L^1} = 1$ for all $n$. If $L^1$ were reflexive, $(f_n)$ would have a weakly convergent subsequence $f_{n_k} \rightharpoonup f$ for some $f \in L^1$. Testing against any $g \in C_c^0(\mathbb{R}^n \setminus \{0\})$ gives $\int f_{n_k} g \to 0$ (since $\operatorname{spt}(f_{n_k}) \to \{0\}$), so $\int fg = 0$ for all such $g$, forcing $f = 0$ a.e. But testing against $g \equiv 1$ gives $\int f_{n_k} = 1$ for all $k$, so $\int f = 1$, a contradiction.
**$L^\infty(\mathbb{R}^n)$:** It is the dual of $L^1$, so Banach-Alaoglu gives weak-$*$ compactness of $B_{L^\infty}$. It is not reflexive (otherwise $L^1$ would be) and not separable (the uncountable family $(\mathbb{1}_{B_r(0)})_{r>0}$ has pairwise $\|\cdot\|_\infty$-distance $1$). There exist continuous linear functionals on $L^\infty$ not representable by $L^1$ functions: for instance, the Dirac delta $F(f) := f(0)$ on $C_c^0 \subseteq L^\infty$, extended to all of $L^\infty$ by Hahn-Banach.
**$L^2(\mathbb{R}^n)$:** This is the special case $p = q = 2$, where $L^2$ is a [Hilbert space](/page/Hilbert%20Space). The [Riesz Representation Theorem](/theorems/221) for Hilbert spaces gives $(L^2)^* \cong L^2$ directly, and the orthogonal [projection onto closed convex sets](/theorems/240) exists and is Lipschitz. The Hilbert structure will be essential for the Fourier theory in Chapter 3 and the variational formulation of PDEs in Chapter 4.
# Fourier Decomposition of Functions
The [Fourier transform](/page/Fourier%20Transform) converts differentiation into multiplication by polynomials and convolution into pointwise multiplication, reducing partial differential equations to algebraic equations. This chapter develops the Fourier transform on $L^1(\mathbb{R}^d)$, extends it to $L^2(\mathbb{R}^d)$ via the Plancherel identity, and uses Fourier bases to decompose $L^2$ functions into oscillatory modes.
## The Fourier Transform on $L^1$
The Fourier transform of $f \in L^1(\mathbb{R}^d)$ is defined by $\mathcal{F}(f)(\xi) := \int_{\mathbb{R}^d} e^{-2\pi i x \cdot \xi} f(x) \, d\mathcal{L}^d(x)$. The integral converges absolutely since $|e^{-2\pi i x \cdot \xi} f(x)| = |f(x)|$ is integrable, giving $\|\mathcal{F}(f)\|_\infty \le \|f\|_{L^1}$. Continuity of $\mathcal{F}(f)$ follows from the Dominated Convergence Theorem applied to $\xi_n \to \xi$.
The key algebraic properties are: convolution becomes multiplication ($\mathcal{F}(f * g) = \mathcal{F}(f) \cdot \mathcal{F}(g)$, proved by Fubini), translation becomes phase shift ($\mathcal{F}(\tau_h f)(\xi) = e^{-2\pi i h \cdot \xi} \mathcal{F}(f)(\xi)$), and most importantly, **differentiation becomes polynomial multiplication**: if $f \in L^1(\mathbb{R}^d) \cap C^1(\mathbb{R}^d)$ with $\partial_{x_j} f \in L^1(\mathbb{R}^d)$, then $\mathcal{F}(\partial_{x_j} f)(\xi) = 2\pi i \xi_j \mathcal{F}(f)(\xi)$. This last property is what makes the Fourier transform useful for PDEs: it transforms the Laplacian $\Delta f$ into multiplication by $-4\pi^2 |\xi|^2$, reducing Poisson's equation $\Delta f = g$ to the algebraic equation $\mathcal{F}(f)(\xi) = -\mathcal{F}(g)(\xi) / (4\pi^2 |\xi|^2)$ for $\xi \ne 0$.
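For $f \in C_c^1(\mathbb{R}^d)$ (a simplifying assumption that removes all boundary terms), the differentiation rule is a one-line integration by parts, sketched here:
\begin{align*}
\mathcal{F}(\partial_{x_j} f)(\xi) = \int_{\mathbb{R}^d} e^{-2\pi i x \cdot \xi} \, \partial_{x_j} f(x) \, dx = -\int_{\mathbb{R}^d} \partial_{x_j}\!\left(e^{-2\pi i x \cdot \xi}\right) f(x) \, dx = 2\pi i \xi_j \, \mathcal{F}(f)(\xi).
\end{align*}
Applying this twice in each coordinate (now for $f \in C_c^2$) gives $\mathcal{F}(\Delta f)(\xi) = \sum_{j} (2\pi i \xi_j)^2 \mathcal{F}(f)(\xi) = -4\pi^2 |\xi|^2 \mathcal{F}(f)(\xi)$.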
The decay of the Fourier transform at infinity is governed by the Riemann-Lebesgue Lemma.
[quotetheorem:245]
The intuition is that high-frequency oscillations in the exponential $e^{-2\pi i x \cdot \xi}$ cause massive cancellation in the integral when $|\xi|$ is large. The proof reduces to $C_c^\infty$ functions (where $(1 + 4\pi^2|\xi|^2) \mathcal{F}(f) = \mathcal{F}((1 - \Delta)f) \in L^\infty$ gives $|\mathcal{F}(f)(\xi)| \le C/(1 + |\xi|^2)$) and extends by density. An important consequence is that $\mathcal{F}: L^1(\mathbb{R}^d) \to C_0(\mathbb{R}^d)$ (continuous functions vanishing at infinity), but this map is not surjective.
[citeproof:245]
[example:Fourier Transform of the Gaussian]
Consider $f(x) = e^{-\alpha x^2}$ for $\alpha > 0$ and $d = 1$. The function $f$ satisfies the ODE $f'(x) = -2\alpha x f(x)$. Taking the Fourier transform of both sides and using the derivative-to-multiplication rules ($\mathcal{F}(f')(\xi) = 2\pi i \xi \mathcal{F}(f)(\xi)$ and $\mathcal{F}(xf)(\xi) = -\frac{1}{2\pi i} \mathcal{F}(f)'(\xi)$) gives an ODE for $g := \mathcal{F}(f)$:
\begin{align*}
2\pi i \xi \, g(\xi) = -2\alpha \cdot \left(-\frac{1}{2\pi i}\right) g'(\xi) = \frac{\alpha}{\pi i} g'(\xi)
\end{align*}
which rearranges to $g'(\xi) = -\frac{2\pi^2}{\alpha} \xi \, g(\xi)$. This has the same form as the ODE for $f$, with $\alpha$ replaced by $\pi^2/\alpha$, so $g(\xi) = C e^{-\pi^2 \xi^2 / \alpha}$ for some constant $C$. Evaluating at $\xi = 0$: $C = g(0) = \int_{\mathbb{R}} e^{-\alpha x^2} \, dx = \sqrt{\pi/\alpha}$ (by the Gaussian integral). Hence:
\begin{align*}
\mathcal{F}(e^{-\alpha x^2})(\xi) = \sqrt{\frac{\pi}{\alpha}} \, e^{-\pi^2 \xi^2 / \alpha}.
\end{align*}
A narrow Gaussian ($\alpha$ large, concentrated near $0$) transforms to a wide Gaussian ($\pi^2/\alpha$ small, spread out in frequency space), and vice versa. In the limit $\alpha \to 0^+$, $f \to 1$ pointwise, while $\mathcal{F}(f)$ concentrates at the origin with total integral $\sqrt{\pi/\alpha} \cdot \sqrt{\alpha/\pi} = 1$, so $\mathcal{F}(f) \to \delta_0$ in the sense of distributions. This is an instance of the **uncertainty principle**: localisation in space and localisation in frequency are inversely related.
[/example]
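The closed form for the Gaussian can be compared against direct quadrature of the defining integral. This is only a sketch: the truncation to $[-12, 12]$, the trapezoid rule and the function names are assumptions for illustration, not part of the notes.

```python
import cmath
import math

def fourier_transform(func, xi, half_width=12.0, m=4000):
    """Trapezoid-rule approximation of the integral of exp(-2*pi*i*x*xi) func(x)
    over [-half_width, half_width]."""
    h = 2.0 * half_width / m
    total = 0.0 + 0.0j
    for j in range(m + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, m) else 1.0
        total += weight * cmath.exp(-2j * math.pi * x * xi) * func(x)
    return total * h

def gaussian_transform_exact(alpha, xi):
    """Closed form sqrt(pi/alpha) exp(-pi^2 xi^2 / alpha) from the example."""
    return math.sqrt(math.pi / alpha) * math.exp(-(math.pi ** 2) * xi ** 2 / alpha)

alpha = 2.0
for xi in (0.0, 0.5, 1.0):
    approx = fourier_transform(lambda x: math.exp(-alpha * x * x), xi)
    print(xi, approx.real, approx.imag, gaussian_transform_exact(alpha, xi))
```

The imaginary part should vanish up to rounding, since the Gaussian is even and its transform is real.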
The Fourier transform on $L^1$ is injective — the inversion formula recovers $f$ from $\mathcal{F}(f)$ when the latter is also in $L^1$.
[quotetheorem:528]
The proof constructs an approximation to the identity $h_k$ using [convolutions](/page/Convolution) with rescaled versions of $H(\xi) = e^{-2\pi \sum |\xi_j|}$, shows $(f * h_k)(y) = \int H(\xi/k) e^{2\pi i y \cdot \xi} \mathcal{F}(f)(\xi) \, d\xi$ by Fubini, and takes $k \to \infty$ using the Dominated Convergence Theorem. The condition $\mathcal{F}(f) \in L^1$ is needed for the limit to converge.
[citeproof:528]
## The Fourier Transform on $L^2$ and the Plancherel Identity
The $L^1$ theory has a fundamental limitation: $\mathcal{F}(f)$ need not be in $L^1$ even when $f$ is, so the inversion formula has restricted applicability. On $L^2$, the situation is dramatically better: the Fourier transform is an isometry.
[quotetheorem:529]
The proof uses $\mathcal{F}(f * \tilde{f}) = |\mathcal{F}(f)|^2$, where $\tilde{f}(x) = \overline{f(-x)}$, so that $\|\mathcal{F}(f)\|_2^2 = \int \mathcal{F}(f * \tilde{f})(\xi) \, d\xi$; the approximation-to-the-identity argument from the inversion theorem shows that this equals $(f * \tilde{f})(0) = \|f\|_2^2$. Since $L^1 \cap L^2$ contains $C_c^\infty(\mathbb{R}^d)$, which is dense in $L^2$, the Plancherel identity extends $\mathcal{F}$ uniquely to an isometric bijection $L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^d)$ (the **Fourier-Plancherel transform**). The inversion formula becomes $\mathcal{F}(\mathcal{F}(f)) = \check{f}$, where $\check{f}(x) := f(-x)$, holding for all $f \in L^2$ without any additional integrability assumption.
[citeproof:529]
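As a sanity check, the Plancherel identity can be verified by hand for $f(x) = e^{-\alpha x^2}$ in dimension $d = 1$, using only the Gaussian transform from the earlier example and the Gaussian integral $\int_{\mathbb{R}} e^{-b y^2} \, dy = \sqrt{\pi/b}$:
\begin{align*}
\|f\|_{L^2}^2 = \int_{\mathbb{R}} e^{-2\alpha x^2} \, dx = \sqrt{\frac{\pi}{2\alpha}}, \qquad \|\mathcal{F}(f)\|_{L^2}^2 = \frac{\pi}{\alpha} \int_{\mathbb{R}} e^{-2\pi^2 \xi^2/\alpha} \, d\xi = \frac{\pi}{\alpha} \sqrt{\frac{\alpha}{2\pi}} = \sqrt{\frac{\pi}{2\alpha}}.
\end{align*}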
## Fourier Bases and [Fourier Series](/page/Fourier%20Series)
The Fourier transform decomposes $L^2(\mathbb{R}^d)$ functions into a continuum of oscillatory modes. On bounded domains, the decomposition becomes discrete — a Fourier series.
[example:Fourier Series on $[0, 2\pi]$]
Consider the Hilbert space $H = L^2([0, 2\pi], \frac{1}{2\pi} d\mathcal{L}^1)$. The exponentials $e_n(x) := e^{inx}$ for $n \in \mathbb{Z}$ form an orthonormal system: $(e_n, e_m) = \frac{1}{2\pi} \int_0^{2\pi} e^{i(n-m)x} dx = \delta_{nm}$ (by direct computation of the integral). This system is a Hilbert basis: Fejér's theorem shows that the [Cesàro means](/page/Ces%C3%A0ro%20Means) $\frac{1}{N+1} \sum_{n=0}^N \sum_{k=-n}^n \hat{f}_k e^{ikx}$ converge uniformly to $f$ for continuous $f$ with $f(0) = f(2\pi)$, using the [Fejér kernel](/page/Fej%C3%A9r%20Kernel) $K_N(y) = \frac{1}{N+1} \cdot \frac{\sin^2((N+1)y/2)}{2\pi \sin^2(y/2)}$, which is a smooth approximation to the identity. Thus $\operatorname{span}\{e^{inx}\}$ is dense in the continuous periodic functions, in the uniform norm and therefore in $L^2$; since these functions are dense in $L^2$, the system is complete. Consequently every $f \in L^2([0, 2\pi])$ has a Fourier series $f = \sum_{k \in \mathbb{Z}} \hat{f}_k e^{ikx}$ converging in $L^2$, with Parseval's identity $\|f\|_{L^2}^2 = \sum_k |\hat{f}_k|^2$. Pointwise convergence is a subtler question not resolved by the Hilbert space theory alone.
[/example]
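Parseval's identity can be watched converging on a concrete function. The sketch below takes the sawtooth $f(x) = x$ on $[0, 2\pi]$ (an illustrative choice, as are the function names and the midpoint quadrature), for which a direct integration by parts gives $\hat{f}_0 = \pi$ and $\hat{f}_k = i/k$ for $k \ne 0$, and $\|f\|_{L^2}^2 = 4\pi^2/3$ in the normalised measure.

```python
import cmath
import math

def fourier_coefficient(func, k, m=2000):
    """Midpoint-rule approximation of (1/(2*pi)) times the integral over
    [0, 2*pi] of func(x) exp(-i k x)."""
    h = 2.0 * math.pi / m
    total = 0.0 + 0.0j
    for j in range(m):
        x = (j + 0.5) * h
        total += func(x) * cmath.exp(-1j * k * x)
    return total * h / (2.0 * math.pi)

def parseval_partial_sum(func, k_max):
    """Sum of |f_k|^2 over |k| <= k_max."""
    return sum(abs(fourier_coefficient(func, k)) ** 2
               for k in range(-k_max, k_max + 1))

def sawtooth(x):
    """Illustrative L^2 function f(x) = x on [0, 2*pi]."""
    return x

norm_sq = 4.0 * math.pi ** 2 / 3.0  # (1/(2*pi)) times the integral of x^2 over [0, 2*pi]
for k_max in (5, 20, 40):
    print(k_max, parseval_partial_sum(sawtooth, k_max), norm_sq)
```

The partial sums increase towards $4\pi^2/3$ from below, as Bessel's inequality requires. The slow $1/k_{\max}$ convergence reflects the jump of the periodic extension at $0$.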
The Poisson Summation Formula connects the values of a function at integers to the values of its Fourier transform at integers — linking the continuous and discrete worlds.
[quotetheorem:902]
The proof periodises $f$ to $\phi(t) := \sum_n f(t+n)$, computes its Fourier coefficients as $\hat{\phi}_k = \mathcal{F}(f)(k)$ by Fubini, reconstructs $\phi$ from its Fourier series (which converges uniformly by the absolute summability hypothesis), and evaluates at $t = 0$.
[citeproof:902]
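The coefficient computation in the middle step is short enough to write out. The sketch assumes enough decay of $f$ to interchange sum and integral, and uses the $1$-periodic exponentials $e^{2\pi i k t}$:
\begin{align*}
\hat{\phi}_k = \int_0^1 \sum_{n \in \mathbb{Z}} f(t + n) \, e^{-2\pi i k t} \, dt = \sum_{n \in \mathbb{Z}} \int_n^{n+1} f(s) \, e^{-2\pi i k s} \, ds = \int_{\mathbb{R}} f(s) \, e^{-2\pi i k s} \, ds = \mathcal{F}(f)(k).
\end{align*}
The substitution $s = t + n$ uses $e^{-2\pi i k n} = 1$ for integers $k$ and $n$.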
[example:Applications of Poisson Summation]
**Theta function identity.** Taking $f(x) = e^{-\pi s x^2}$ for $s > 0$, the Gaussian computation gives $\mathcal{F}(f)(\xi) = s^{-1/2} e^{-\pi \xi^2/s}$. Poisson summation yields the theta function identity:
\begin{align*}
\sum_{n \in \mathbb{Z}} e^{-\pi n^2 s} = \frac{1}{\sqrt{s}} \sum_{n \in \mathbb{Z}} e^{-\pi n^2/s}
\end{align*}
which is fundamental in analytic number theory (it appears in Riemann's proof of the functional equation of the zeta function).
**Basel problem: $\sum_{n=1}^\infty 1/n^2 = \pi^2/6$.** Take $f(x) = e^{-2\pi|x|t}$ for $t > 0$. Its Fourier transform is $\mathcal{F}(f)(\xi) = \int_\mathbb{R} e^{-2\pi|x|t} e^{-2\pi i x \xi} \, dx$. Splitting into $x > 0$ and $x < 0$ and computing each geometric-type integral:
\begin{align*}
\mathcal{F}(f)(\xi) = \int_0^\infty e^{-2\pi x(t + i\xi)} \, dx + \int_0^\infty e^{-2\pi x(t - i\xi)} \, dx = \frac{1}{2\pi(t + i\xi)} + \frac{1}{2\pi(t - i\xi)} = \frac{t}{\pi(t^2 + \xi^2)}.
\end{align*}
Poisson summation gives $\sum_{n \in \mathbb{Z}} e^{-2\pi|n|t} = \frac{t}{\pi} \sum_{k \in \mathbb{Z}} \frac{1}{t^2 + k^2}$. The left side is $1 + 2\sum_{n=1}^\infty e^{-2\pi n t} = 1 + \frac{2e^{-2\pi t}}{1 - e^{-2\pi t}} = \coth(\pi t)$. Hence:
\begin{align*}
\coth(\pi t) = \frac{t}{\pi} \left(\frac{1}{t^2} + 2\sum_{k=1}^\infty \frac{1}{t^2 + k^2}\right) = \frac{1}{\pi t} + \frac{2t}{\pi} \sum_{k=1}^\infty \frac{1}{t^2 + k^2}.
\end{align*}
Taking $t \to 0^+$: $\coth(\pi t) = \frac{1}{\pi t} + \frac{\pi t}{3} - \cdots$ (from the Laurent expansion), while the right side becomes $\frac{1}{\pi t} + \frac{2t}{\pi} \sum_{k=1}^\infty \frac{1}{k^2} + O(t^3)$. Comparing the coefficients of $t$: $\frac{\pi}{3} = \frac{2}{\pi} \sum_{k=1}^\infty \frac{1}{k^2}$, giving $\sum_{k=1}^\infty \frac{1}{k^2} = \frac{\pi^2}{6}$.
[/example]
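Both identities in this example are easy to test numerically. The following sketch uses truncated sums (the truncation levels and function names are assumptions for illustration) to compare the two sides of the theta identity at $s = 1/2$ and of the $\coth$ expansion at $t = 0.3$.

```python
import math

def theta(s, n_max=60):
    """Truncated sum over |n| <= n_max of exp(-pi n^2 s)."""
    return sum(math.exp(-math.pi * n * n * s) for n in range(-n_max, n_max + 1))

def coth_series(t, k_max=100000):
    """Truncated right-hand side 1/(pi t) + (2t/pi) sum_{k=1}^{k_max} 1/(t^2 + k^2)."""
    partial = sum(1.0 / (t * t + k * k) for k in range(1, k_max + 1))
    return 1.0 / (math.pi * t) + (2.0 * t / math.pi) * partial

s = 0.5
print(theta(s), theta(1.0 / s) / math.sqrt(s))   # two sides of the theta identity

t = 0.3
print(1.0 / math.tanh(math.pi * t), coth_series(t))  # coth(pi t) and its series
```

The theta sums converge extremely fast (Gaussian decay), whereas the $\coth$ series has a tail of order $2t/(\pi k_{\max})$, which is why far more terms are taken there.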
# Generalised Derivatives and Sobolev Spaces
The final chapter brings everything together. Spaces of classically differentiable functions ($C^k$) lack the compactness and completeness properties needed for PDE existence theory. Sobolev spaces $W^{k,p}$ replace classical derivatives with **weak derivatives** (defined via [integration by parts](/theorems/210) against [test functions](/page/Test%20Function)), gaining completeness and reflexivity while retaining enough information about derivatives to solve PDEs. The Sobolev embedding theorems then connect weak regularity back to classical regularity, completing the circle.
[example:Why $C^k$ Spaces Fail for PDE Theory]
The standard PDE strategy is to minimise a functional such as $J(f) = \int |\nabla f|^2 - 2gf$ over some class of functions. A minimising sequence $(f_n)$ satisfies $J(f_n) \to \inf J$, and we need to extract a convergent subsequence and show the limit is a minimiser. Two obstacles arise in $C^k$ spaces.
**Incompleteness.** Consider the space $C^1([0,1])$ with the norm $\|f\|_{H^1} := (\|f\|_{L^2}^2 + \|f'\|_{L^2}^2)^{1/2}$. The piecewise-linear ramp functions $f_n(x) := \max(0, \min(1, n(x - 1/2 + 1/n)))$ on $[0,1]$ (rising from $0$ to $1$ over an interval of width $1/n$) form a [Cauchy sequence](/page/Cauchy%20Sequence) in $L^2$ whose limit is the step function $\mathbb{1}_{[1/2, 1]}$, which is not continuous. Their antiderivatives $F_n(x) := \int_0^x f_n(y) \, dy$ lie in $C^1([0,1])$ and satisfy $F_n' = f_n$, so $(F_n)$ is Cauchy in the $H^1$ norm; its limit $\max(0, x - 1/2)$ has a kink at $x = 1/2$ and is not $C^1$. The space is not complete in the $H^1$ norm.
**No compactness.** In $C^1$ with its natural norm $\|f\|_\infty + \|f'\|_\infty$, the [Arzelà-Ascoli Theorem](/theorems/885) gives compactness only when the derivatives are uniformly bounded and equicontinuous. But minimising sequences for PDE functionals typically have bounded $L^2$ norms of derivatives, not pointwise bounds. The $C^1$ topology requires uniform convergence of derivatives, which is far more than the $L^2$ control available from the functional.
Sobolev spaces resolve both problems: $W^{1,2}(\mathbb{R}^d)$ is complete (as a closed subspace of $L^2 \times (L^2)^d$), reflexive (since $L^2$ is), and every bounded sequence in it has a weakly convergent subsequence (by [Kakutani's Theorem](/theorems/897) together with separability). A bounded minimising sequence in $W^{1,2}$ therefore has a weakly convergent subsequence, and lower semicontinuity of the functional under weak convergence shows the limit is a minimiser.
[/example]
## Weak Derivatives and Sobolev Spaces
The integration-by-parts formula $\int u \cdot D^\alpha \phi \, dx = (-1)^{|\alpha|} \int (D^\alpha u) \cdot \phi \, dx$ holds for $u \in C^{|\alpha|}$ and $\phi \in C_c^\infty$. The idea is to use this as a *definition* of the derivative: a function $g \in L^1_{\text{loc}}(\mathbb{R}^d)$ is the $\alpha$-th **weak derivative** of $f \in L^1_{\text{loc}}(\mathbb{R}^d)$ if $\int_{\mathbb{R}^d} g \cdot \phi \, dx = (-1)^{|\alpha|} \int_{\mathbb{R}^d} f \cdot D^\alpha \phi \, dx$ for all $\phi \in C_c^\infty(\mathbb{R}^d)$. The [weak derivative](/page/Weak%20Derivative) is unique when it exists (by the fundamental lemma of the [calculus of variations](/page/Calculus%20of%20Variations)), and agrees with the classical derivative when $f$ is classically differentiable.
[definition:Sobolev Space]
Let $k \in \mathbb{N}$ and $p \in [1, \infty]$. The **Sobolev space** $W^{k,p}(\mathbb{R}^d)$ consists of all $f \in L^p(\mathbb{R}^d)$ whose weak derivatives $D^\alpha f$ exist and lie in $L^p(\mathbb{R}^d)$ for all $|\alpha| \le k$, equipped with the norm
\begin{align*}
\|f\|_{W^{k,p}} := \left(\sum_{|\alpha| \le k} \|D^\alpha f\|_p^p\right)^{1/p}.
\end{align*}
When $p = \infty$, the sum is replaced by $\max_{|\alpha| \le k} \|D^\alpha f\|_\infty$. When $p = 2$, we write $H^k(\mathbb{R}^d) := W^{k,2}(\mathbb{R}^d)$.
[/definition]
[example:Weak Derivative of $|x|$]
The function $f(x) = |x|$ on $\mathbb{R}$ has weak derivative $g(x) = 2H(x) - 1$ (where $H$ is the Heaviside function). Integration by parts: $\int_\mathbb{R} |x| \phi'(x) \, dx = \int_0^\infty x \phi'(x) \, dx - \int_{-\infty}^0 x \phi'(x) \, dx = -\int_0^\infty \phi \, dx + \int_{-\infty}^0 \phi \, dx = -\int_\mathbb{R} g \phi \, dx$. However, the Heaviside function $H(x)$ itself has no weak derivative in $L^1_{\text{loc}}$: the integration-by-parts formula gives $\int H \phi' = -\phi(0) = -\int \delta_0 \phi$, and the Dirac delta $\delta_0$ is not a function. This distinction — $|x|$ is weakly differentiable, $H(x)$ is not — illustrates that weak differentiability is a genuine regularity condition, not an empty formalism.
[/example]
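The two integration-by-parts identities in this example can be checked against one explicit test function. In the sketch below the bump $\phi$ centred at $0.3$, the quadrature rule and all names are illustrative assumptions. The centre is moved off the origin so that neither side vanishes by symmetry.

```python
import math

CENTER = 0.3  # off-centre bump, so the integrals are not zero by symmetry

def phi(x):
    """Smooth bump supported on (CENTER - 1, CENTER + 1)."""
    u = x - CENTER
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0

def dphi(x):
    """Classical derivative of phi."""
    u = x - CENTER
    return phi(x) * (-2.0 * u / (1.0 - u * u) ** 2) if abs(u) < 1.0 else 0.0

def midpoint(func, a, b, m=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / m
    return h * sum(func(a + (j + 0.5) * h) for j in range(m))

a, b = CENTER - 1.0, CENTER + 1.0
lhs_abs = midpoint(lambda x: abs(x) * dphi(x), a, b)                  # integral of |x| phi'
rhs_abs = -midpoint(lambda x: math.copysign(1.0, x) * phi(x), a, b)   # minus integral of (2H - 1) phi
lhs_step = midpoint(dphi, 0.0, b)                                     # integral of H phi'

print(lhs_abs, rhs_abs)        # should agree: |x| has weak derivative 2H - 1
print(lhs_step, -phi(0.0))     # should agree: H pairs to -phi(0), a point evaluation
```

The second line of output shows why $H$ has no weak derivative in $L^1_{\text{loc}}$: the pairing depends only on the value of $\phi$ at a single point, which no locally integrable function can reproduce for every test function.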
Sobolev spaces are Banach spaces (as closed subspaces of products of $L^p$ spaces), and $W^{k,p}(\mathbb{R}^d)$ is reflexive for $p \in (1, \infty)$ (inheriting reflexivity from $L^p$). An equivalent definition for $p < \infty$ is $W^{k,p}(\mathbb{R}^d) = \overline{C_c^\infty(\mathbb{R}^d)}^{\|\cdot\|_{W^{k,p}}}$ — the closure of test functions in the Sobolev norm. For the Hilbert space case $H^k(\mathbb{R}^d)$, the Fourier transform gives a useful characterisation: $f \in H^k(\mathbb{R}^d)$ if and only if $(1 + |\xi|^2)^{k/2} \hat{f}(\xi) \in L^2(\mathbb{R}^d)$, and $\|f\|_{H^k}$ is equivalent to $\|(1 + |\xi|^2)^{k/2} \hat{f}\|_{L^2}$.
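For $k = 1$ the equivalence can be made explicit. Assuming the rule $\mathcal{F}(\partial_{x_j} f) = 2\pi i \xi_j \hat{f}$ carries over from $C_c^\infty$ to weak derivatives of $f \in H^1$ by density, Plancherel gives:
\begin{align*}
\|f\|_{H^1}^2 = \|f\|_{L^2}^2 + \sum_{j=1}^d \|\partial_{x_j} f\|_{L^2}^2 = \int_{\mathbb{R}^d} \left(1 + 4\pi^2 |\xi|^2\right) |\hat{f}(\xi)|^2 \, d\xi.
\end{align*}
Since $1 + |\xi|^2 \le 1 + 4\pi^2 |\xi|^2 \le 4\pi^2 (1 + |\xi|^2)$, the two norms differ by at most a factor of $2\pi$.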
## The Sobolev Embeddings
The central question is: if a function has $k$ weak derivatives in $L^p$, what classical regularity does it possess? The answer depends on the relationship between $k$, $p$, and the dimension $d$, through the **critical exponent** $d/p$. When $k > d/p$, weak differentiability implies classical continuity (and even Hölder regularity); when $k < d/p$, we gain integrability but not continuity; in the borderline case $k = d/p$, we gain integrability in every $L^q$ with $q < \infty$ but, in general, still not boundedness or continuity.
[quotetheorem:903]
The three cases use different proof techniques. The **subcritical case** ($k < d/p$, handled for $k = 1$ by Proposition 4.2 of the notes) uses the Gagliardo-Nirenberg-Sobolev product inequality and a clever choice of test function $v = |u|^{t-1}u$ to match exponents. The **supercritical case** ($k > d/p$) is [Morrey's Inequality](/theorems/62): the fundamental theorem of calculus on cubes, combined with Hölder's inequality, bounds the oscillation $|u(y_1) - u(y_2)|$ by $C|y_1 - y_2|^{1-d/p} \|\nabla u\|_{L^p}$. The **critical case** ($k = d/p$) uses an interpolation inequality to bridge between the other two.
The general $k$ is proved by iteration: each application of the $k = 1$ embedding uses one derivative and either raises the Lebesgue exponent (subcritical) or produces Hölder continuity (supercritical). The Sobolev embeddings extend to bounded domains $U \subset \mathbb{R}^d$ with smooth [boundary](/page/Boundary), via extension operators that reflect $u$ across $\partial U$.
[citeproof:903]
[example:The $d = 1$ Sobolev Embedding]
For $d = 1$ and $p = 2$, the critical exponent is $d/p = 1/2 < 1 = k$, so we are in the supercritical regime. The embedding gives $H^1(\mathbb{R}) \hookrightarrow C^{0,1/2}(\mathbb{R})$: every $H^1$ function has a representative that is $1/2$-Hölder continuous. We prove this directly for $f \in C_c^\infty(\mathbb{R})$ and extend by density.
**$L^\infty$ bound.** By the fundamental theorem of calculus and the product rule:
\begin{align*}
|f(x)|^2 = \int_{-\infty}^x \frac{d}{dy}(f(y)^2) \, dy = 2\int_{-\infty}^x f(y)f'(y) \, dy \le 2\int_{\mathbb{R}} |f||f'| \, d\mathcal{L}^1 \le 2\|f\|_{L^2} \|f'\|_{L^2}
\end{align*}
by the Cauchy-Schwarz inequality. Using $2ab \le a^2 + b^2$:
\begin{align*}
\|f\|_\infty^2 \le 2\|f\|_{L^2}\|f'\|_{L^2} \le \|f\|_{L^2}^2 + \|f'\|_{L^2}^2 = \|f\|_{H^1}^2.
\end{align*}
**Hölder bound.** For $x \ne y$, the Cauchy-Schwarz inequality on the interval $[y, x]$ gives:
\begin{align*}
|f(x) - f(y)| = \left|\int_y^x f'(z) \, dz\right| \le \left(\int_y^x 1^2 \, dz\right)^{1/2} \left(\int_y^x |f'(z)|^2 \, dz\right)^{1/2} \le |x - y|^{1/2} \|f'\|_{L^2}.
\end{align*}
Combining: $\|f\|_{C^{0,1/2}} = \|f\|_\infty + \sup_{x \ne y} \frac{|f(x) - f(y)|}{|x-y|^{1/2}} \le 2\|f\|_{H^1}$ for all $f \in C_c^\infty(\mathbb{R})$. For general $f \in H^1(\mathbb{R})$, approximate by $(f_k)_k \subset C_c^\infty$ with $\|f - f_k\|_{H^1} \to 0$. The inequality shows $(f_k)$ is Cauchy in $C^{0,1/2}(\mathbb{R})$, so it converges to some $g \in C^{0,1/2}(\mathbb{R})$. Since $f_k \to f$ in $L^2$ forces $f = g$ a.e., the function $f$ has a $1/2$-Hölder continuous representative.
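This result is sharp: the exponent $1/2$ cannot be improved. To see this, consider $f_\varepsilon(x) = (x^2 + \varepsilon^2)^{1/4}$ on $[-1, 1]$ (multiplied by a fixed smooth cutoff to place it in $H^1(\mathbb{R})$), which as $\varepsilon \to 0$ converges to $|x|^{1/2}$, a function that is exactly $1/2$-Hölder but no better. The limit itself is not in $H^1$, since its derivative behaves like $|x|^{-1/2}$, whose square is not integrable near $0$. However, $\|f_\varepsilon'\|_{L^2}^2$ grows only like $\log(1/\varepsilon)$, whereas for any $\gamma > 1/2$ the quotient $|f_\varepsilon(\varepsilon) - f_\varepsilon(0)|/\varepsilon^\gamma = (2^{1/4} - 1)\varepsilon^{1/2 - \gamma}$ blows up polynomially. Hence no bound of the form $\sup_{x \ne y} |f(x) - f(y)|/|x - y|^\gamma \le C\|f\|_{H^1}$ can hold for $\gamma > 1/2$.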
[/example]
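The chain of inequalities in this example can be checked numerically on a concrete function. The sketch below uses $f(x) = e^{-x^2/2}$, which is not compactly supported but lies in $H^1(\mathbb{R})$, so the bounds apply by the density argument. The choice of $f$, the quadrature and the sampling grid are assumptions for illustration.

```python
import math

def f(x):
    """Illustrative H^1 function: a Gaussian."""
    return math.exp(-0.5 * x * x)

def df(x):
    """Classical derivative of f."""
    return -x * math.exp(-0.5 * x * x)

def l2_norm(func, half_width=8.0, m=16000):
    """Midpoint-rule approximation of the L^2 norm of func on [-half_width, half_width]."""
    h = 2.0 * half_width / m
    return math.sqrt(h * sum(func(-half_width + (j + 0.5) * h) ** 2 for j in range(m)))

norm_f, norm_df = l2_norm(f), l2_norm(df)
sup_f = max(abs(f(-8.0 + 0.001 * j)) for j in range(16001))

# Largest Hölder quotient |f(x) - f(y)| / |x - y|^(1/2) over a coarse sample of pairs.
points = [-3.0 + 0.05 * j for j in range(121)]
holder = max(abs(f(x) - f(y)) / math.sqrt(abs(x - y))
             for x in points for y in points if x != y)

# Expect: sup^2 <= 2 ||f|| ||f'|| <= ||f||_{H^1}^2, and Hölder quotient <= ||f'||.
print(sup_f ** 2, 2.0 * norm_f * norm_df, norm_f ** 2 + norm_df ** 2)
print(holder, norm_df)
```

The sampled Hölder quotient only gives a lower estimate of the true supremum, so this is a consistency check rather than a verification of the bound.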
## The Dirichlet Problem for Poisson's Equation
We conclude by assembling the tools of the entire course to solve a classical PDE problem: find $f$ with $\Delta f = g$ in a bounded domain $U$ and $f = 0$ on $\partial U$.
The strategy has three steps: an **a priori estimate** (bounding the solution by the data), a **weak formulation** (recasting the PDE as an equation in $W_0^{1,2}(U)$), and a **regularity bootstrap** (promoting the weak solution to a classical one).
[example:Solving the Dirichlet Problem]
**Step 1: A priori estimate.** Multiply $\Delta f = g$ by $f$ and integrate by parts: $\int_U |\nabla f|^2 \, dx = -\int_U gf \, dx \le \|g\|_{L^2} \|f\|_{L^2}$. The [Poincaré Inequality](/theorems/76) on $W_0^{1,2}(U)$ gives $\|f\|_{L^2} \le C\|\nabla f\|_{L^2}$, so $\|\nabla f\|_{L^2}^2 \le C\|g\|_{L^2}\|\nabla f\|_{L^2}$, hence $\|\nabla f\|_{L^2} \le C\|g\|_{L^2}$ and $\|f\|_{L^2} \le C^2\|g\|_{L^2}$. This immediately gives **uniqueness**: if $f_1, f_2$ both solve the problem, then $f_1 - f_2$ solves $\Delta(f_1 - f_2) = 0$ with zero boundary conditions, so $\|f_1 - f_2\|_{L^2} \le C^2 \cdot 0 = 0$.
**Step 2: Weak formulation and existence.** Define the bilinear form $a(f_1, f_2) := \int_U \nabla f_1 \cdot \nabla f_2 \, dx$ on $H := W_0^{1,2}(U)$. Poincaré shows this is an equivalent inner product on $H$: coercivity $a(f, f) = \|\nabla f\|_{L^2}^2 \ge (1 + C^2)^{-1}\|f\|_{H^1}^2$ follows from Poincaré (with $C$ the Poincaré constant), and boundedness $|a(f_1, f_2)| \le \|\nabla f_1\|_{L^2} \|\nabla f_2\|_{L^2}$ from Cauchy-Schwarz. The right-hand side $G(\phi) := -\int_U g\phi \, dx$ is a bounded linear functional on $H$ by the [Hölder Inequality](/theorems/516). By the [Riesz Representation Theorem](/theorems/221) on the Hilbert space $(H, a(\cdot, \cdot))$, there exists a unique $f \in H$ with $a(f, \phi) = G(\phi)$ for all $\phi \in H$. Unwinding definitions, this says $\int_U \nabla f \cdot \nabla \phi \, dx = -\int_U g\phi \, dx$ for all $\phi \in H$, and in particular for all $\phi \in C_c^\infty(U)$, which is the weak formulation of $\Delta f = g$.
**Step 3: Regularity bootstrap.** The weak solution $f \in W_0^{1,2}(U)$ has one weak derivative in $L^2$. If $g \in C^\infty(U)$, then by choosing appropriate test functions and using elliptic regularity estimates, we can show $f \in W^{k,2}$ on every subdomain compactly contained in $U$, for all $k \ge 1$. For $k > d/2$, the [Sobolev Embedding Theorem](/theorems/903) then gives $f \in C^{k - \lfloor d/2 \rfloor - 1}(U)$. Since $k$ was arbitrary, $f \in C^\infty(U)$: the weak solution is in fact smooth.
[/example]
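To make Step 2 concrete, here is a minimal one-dimensional sketch. It discretises the weak formulation on $U = (0, 1)$ with piecewise-linear hat functions and a lumped load vector, which is a swapped-in numerical technique rather than anything taken from the notes; the function name `solve_dirichlet_1d` and the test datum are likewise hypothetical. The datum $g(x) = -\pi^2 \sin(\pi x)$ is chosen so that the exact solution is $f(x) = \sin(\pi x)$.

```python
import math

def solve_dirichlet_1d(g, n):
    """Piecewise-linear Galerkin solve of f'' = g on (0, 1) with f(0) = f(1) = 0.

    Weak form: integral of f' phi' = -integral of g phi for every hat function phi.
    Returns the interior nodal values at x_i = i / n for i = 1, ..., n - 1.
    """
    h = 1.0 / n
    size = n - 1
    diag = [2.0 / h] * size    # integral of phi_i' phi_i'
    off = -1.0 / h             # integral of phi_i' phi_{i+1}'
    rhs = [-h * g((i + 1) * h) for i in range(size)]  # lumped load, approximates -integral of g phi_i

    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, size):
        factor = off / diag[i - 1]
        diag[i] -= factor * off
        rhs[i] -= factor * rhs[i - 1]
    u = [0.0] * size
    u[-1] = rhs[-1] / diag[-1]
    for i in range(size - 2, -1, -1):
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return u

def datum(x):
    """Right-hand side g chosen so that the exact solution is sin(pi x)."""
    return -math.pi ** 2 * math.sin(math.pi * x)

n = 200
u = solve_dirichlet_1d(datum, n)
max_error = max(abs(u[i] - math.sin(math.pi * (i + 1) / n)) for i in range(n - 1))
print(max_error)
```

The stiffness matrix is symmetric positive definite, the discrete counterpart of the coercivity of $a(\cdot, \cdot)$, which is why the linear system has a unique solution for every $n$.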
This example illustrates the general strategy of modern PDE theory: reformulate the equation weakly in a Sobolev space where existence comes from functional analysis, then recover classical regularity from the Sobolev embeddings. The same approach — with the [Lax-Milgram Theorem](/theorems/91) replacing Riesz representation for non-symmetric problems — applies to general [second-order elliptic equations](/page/Second-Order%20Elliptic%20Equations) and extends (with modifications) to parabolic and hyperbolic equations.