This course develops the theory of Banach and Hilbert spaces, extending finite-dimensional linear algebra to infinite dimensions. Beginning with the Hahn-Banach extension theorem—a cornerstone result asserting the richness of the dual space—we establish the fundamental duality principle that underlies all of functional analysis. We then concretely study the duals of classical function spaces like $L^p$ and $C(K)$, before ascending to the more abstract study of weak topologies and the geometry of convex sets, culminating in the Krein-Milman theorem. This foundation naturally leads us to Banach algebras, which provide an algebraic framework unifying both operators and functions under a single structure.
The latter chapters reveal the power of this framework through functional calculus and spectral theory. We develop the holomorphic functional calculus for elements of a Banach algebra, showing how to evaluate analytic functions of operators, and then specialize to $C^*$-algebras—a class rich enough to capture quantum mechanics and harmonic analysis. The Borel functional calculus extends these ideas to normal operators, and the spectral theorem emerges as a unifying principle: a normal operator on a Hilbert space can be understood as multiplication by a function on a measure space. Throughout, duality and weak topologies serve as connecting threads, while spectral theory reveals the hidden diagonal structure of infinite-dimensional operators.
# 1. Hahn-Banach Extension Theorems
This chapter develops the Hahn-Banach extension theorems and their major consequences. The central question is deceptively simple: given a bounded linear functional defined on a subspace, can it be extended to the whole space without increasing its norm? The answer — yes, always — has far-reaching structural consequences for the geometry of Banach spaces. We also introduce the bidual, reflexivity, dual operators, quotient spaces, locally convex spaces, and several landmark corollaries that reveal how richly populated every dual space must be.
## Dual Spaces and Basic Notation
The first thing to establish is how large the dual of a normed space really is. Could it happen that $X^*$ is trivial, containing only the zero functional? This would be catastrophic: we could not distinguish any two points of $X$ by linear measurements, and nearly all of functional analysis would collapse. The Hahn-Banach theorem, proved below, guarantees that $X^*$ is always abundantly populated. But before proving that, we fix notation.
Let $X$ be a normed space (real or complex). The **dual space** of $X$ is
\begin{align*}
X^* = \{ f : X \to \mathbb{R} \text{ or } \mathbb{C} : f \text{ is linear and continuous} \}.
\end{align*}
Since $f$ is linear, continuity is equivalent to boundedness, so we may equivalently define $X^*$ as the space of bounded linear functionals on $X$. The dual is always a Banach space under the norm
\begin{align*}
\|f\|_{X^*} := \sup\{|f(x)| : x \in B_X\},
\end{align*}
where $B_X := \{x \in X : \|x\|_X \le 1\}$ is the closed unit ball and $S_X := \{x \in X : \|x\|_X = 1\}$ is the unit sphere.
[example: Standard Dual Space Identifications]
The following classical identifications describe the duals of familiar sequence spaces.
(i) $\ell_p^* \cong \ell_q$ for $1 \le p < \infty$, where $q$ is the Hölder conjugate of $p$, i.e. $\frac{1}{p} + \frac{1}{q} = 1$ (with $q = \infty$ when $p = 1$). The isomorphism sends $(y_n) \in \ell_q$ to the functional $f_y(x) = \sum_n x_n y_n$. To see this is isometric, the key estimate is Hölder's inequality: $|f_y(x)| = |\sum_n x_n y_n| \le \|x\|_{\ell_p} \|y\|_{\ell_q}$, so $\|f_y\|_{\ell_p^*} \le \|y\|_{\ell_q}$. For $1 < p < \infty$ and $y \ne 0$, equality is achieved by the sequence $z_n = |y_n|^{q-2}\bar{y}_n/\|y\|_{\ell_q}^{q-1}$ (with $z_n = 0$ where $y_n = 0$), which satisfies $\|z\|_{\ell_p} = 1$ and $f_y(z) = \|y\|_{\ell_q}$, establishing $\|f_y\|_{\ell_p^*} = \|y\|_{\ell_q}$. For $p = 1$ the supremum need not be attained, but testing $f_y$ against unit-modulus multiples of the basis vectors $e_n$ gives $\|f_y\|_{\ell_1^*} \ge \sup_n |y_n| = \|y\|_{\ell_\infty}$.
(ii) $c_0^* \cong \ell_1$. Here the natural pairing is again $f_y(x) = \sum_n x_n y_n$ for $y = (y_n) \in \ell_1$ and $x = (x_n) \in c_0$. To verify surjectivity: given any $f \in c_0^*$, let $e_n$ be the standard basis vectors in $c_0$ and set $y_n = f(e_n)$. Then for any finite sum $x = \sum_{n=1}^N a_n e_n$ we have $f(x) = \sum_n a_n y_n$, and the isometry $\|f\| = \|y\|_{\ell_1}$ follows because we can choose signs to make $|\sum_n y_n a_n| = \sum_n |y_n|$ for any truncation.
(iii) If $H$ is a Hilbert space, then $H^* \cong H$ (conjugate-linearly in the complex case), by the Riesz representation theorem. Every $f \in H^*$ is uniquely of the form $f = (\cdot, u)_H$ for some $u \in H$, with $\|f\| = \|u\|_H$.
The identification (iii) underpins the pairing notation introduced below.
[/example]
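The norm computation behind (i) can be made explicit. The following is a short verification, for $1 < p < \infty$ and $y \ne 0$, that the norming sequence from (i), written here as $z$ with $|z_n| = |y_n|^{q-1}/\|y\|_{\ell_q}^{q-1}$ (the label $z$ is chosen for this sketch), has unit $\ell_p$ norm and attains $\|y\|_{\ell_q}$. It uses only the identity $(q-1)p = q$, which is a rearrangement of $\frac{1}{p} + \frac{1}{q} = 1$.
\begin{align*}
\sum_n |z_n|^p &= \frac{\sum_n |y_n|^{(q-1)p}}{\|y\|_{\ell_q}^{(q-1)p}} = \frac{\sum_n |y_n|^{q}}{\|y\|_{\ell_q}^{q}} = 1, \\
f_y(z) &= \sum_n \frac{|y_n|^{q-2}\,\bar{y}_n\, y_n}{\|y\|_{\ell_q}^{q-1}} = \frac{\sum_n |y_n|^{q}}{\|y\|_{\ell_q}^{q-1}} = \|y\|_{\ell_q}.
\end{align*}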
**Notation for isomorphism and pairing.** For normed spaces $X$ and $Y$:
- $X \sim Y$ means $X$ and $Y$ are isomorphic, i.e. there exists a linear homeomorphism $T: X \to Y$.
- $X \cong Y$ means $X$ and $Y$ are isometrically isomorphic, i.e. there exists a surjective linear $T: X \to Y$ with $\|T(x)\|_Y = \|x\|_X$ for all $x \in X$ (which forces $T$ to be injective, hence bijective, with continuous inverse).
- For $x \in X$ and $f \in X^*$, we write the **duality pairing**
\begin{align*}
f(x)
\end{align*}
using ordinary function notation. (Some sources use angle brackets $\langle x, f \rangle$; we write $f(x)$ throughout.)
The open mapping theorem gives a useful shortcut: if $X$ and $Y$ are both Banach and $T: X \to Y$ is a linear bijection, then to show $T$ is an isomorphism it suffices to check that one of $T$ or $T^{-1}$ is continuous, because the continuity of the other is then automatic.
By the definition of $\|f\|_{X^*}$, we always have the fundamental estimate
\begin{align*}
|f(x)| \le \|f\|_{X^*} \|x\|_X.
\end{align*}
## The Hahn-Banach Theorem
The heart of the chapter is the Hahn-Banach theorem. To state the most general version (for real spaces), we need a class of functions that generalises norms.
[definition: Positively Homogeneous and Subadditive Functional]
Let $X$ be a real vector space. A functional $p: X \to \mathbb{R}$ is called:
- **positively homogeneous** if $p(tx) = t\, p(x)$ for all $x \in X$ and all $t \ge 0$;
- **subadditive** if $p(x + y) \le p(x) + p(y)$ for all $x, y \in X$.
Together these two properties are what remains of the norm axioms once absolute homogeneity is weakened to positive homogeneity and the non-negativity and definiteness conditions are dropped.
[/definition]
[quotetheorem:2627]
[citeproof:2627]
[explanation: What the Real Hahn-Banach Theorem Does and Does Not Say]
The hypothesis that $p$ is positively homogeneous and subadditive is precisely what makes the one-step extension argument work: the critical inequality $\sup \le \inf$ in the proof follows from the subadditivity of $p$ (together with the bound by $p$ on the current domain), and positive homogeneity is used when normalising by $|\lambda|$ to reduce the constraint on a general $z + \lambda x_0$ to the constraint on $z \pm x_0$. If $p$ were merely subadditive (without positive homogeneity), the extension step would break down. The generality of allowing $p$ to be something weaker than a norm (for instance, $p$ can take negative values as long as it is subadditive and positively homogeneous) is deliberate: some important applications, such as the geometric separation theorems proved in later chapters, use Minkowski functionals that are not norms.
What the theorem does **not** say: the extension $f$ carries no continuity guarantee unless $p$ is itself continuous for some topology on $X$ (e.g. when $p$ is a norm or a continuous semi-norm). When $X$ is just an algebraic vector space, $f$ is a purely algebraic object. It also does not say the extension is unique; in general it is far from unique, because the one-step argument typically leaves a whole interval of admissible values for $f(x_0)$. Uniqueness of norm-preserving extensions is a special situation: by the Taylor-Foguel theorem, every bounded functional on every subspace of a normed space $X$ has a unique norm-preserving extension if and only if the dual $X^*$ is strictly convex.
Finally, the proof uses Zorn's lemma, which is equivalent to the Axiom of Choice. Some form of choice is unavoidable: the Hahn-Banach theorem cannot be proved in ZF alone, and there are models of set theory without choice in which some nonzero normed spaces have trivial dual. The full Axiom of Choice is more than is needed, however: Hahn-Banach already follows from the strictly weaker ultrafilter lemma. In the proof, Zorn's lemma handles the bookkeeping of transfinitely many extension steps all at once, avoiding the need to specify a well-ordering of $X$.
[/explanation]
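The role of the two hypotheses in the one-step extension can be seen in a short computation. The sketch below assumes the standard form of the argument: $g$ is linear on a subspace $Z$ with $g \le p$ on $Z$, and $x_0 \notin Z$ (the names $Z$, $g$, $c$ are chosen here for illustration and may differ from those in the proof above). For $z, w \in Z$, subadditivity gives
\begin{align*}
g(z) + g(w) = g(z + w) \le p(z + w) \le p(z - x_0) + p(w + x_0),
\end{align*}
so $g(z) - p(z - x_0) \le p(w + x_0) - g(w)$, and hence some real $c$ satisfies
\begin{align*}
\sup_{z \in Z} \bigl( g(z) - p(z - x_0) \bigr) \le c \le \inf_{w \in Z} \bigl( p(w + x_0) - g(w) \bigr).
\end{align*}
Setting $f(z + \lambda x_0) = g(z) + \lambda c$, positive homogeneity transfers the bound to every $\lambda > 0$:
\begin{align*}
g(z) + \lambda c = \lambda \bigl( g(z/\lambda) + c \bigr) \le \lambda\, p(z/\lambda + x_0) = p(z + \lambda x_0),
\end{align*}
and the case $\lambda < 0$ follows in the same way from the lower bound on $c$.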
Before extending to the complex case, we introduce semi-norms.
[definition: Semi-norm]
Let $X$ be a real or complex vector space. A **semi-norm** on $X$ is a function $p: X \to \mathbb{R}$ satisfying:
(i) $p(x) \ge 0$ for all $x \in X$;
(ii) $p(\lambda x) = |\lambda| p(x)$ for all $x \in X$ and all scalars $\lambda$;
(iii) $p(x + y) \le p(x) + p(y)$ for all $x, y \in X$.
A semi-norm differs from a norm only in that $p(x) = 0$ does not imply $x = 0$. The implication chain is: norm $\Rightarrow$ semi-norm $\Rightarrow$ positively homogeneous and subadditive.
[/definition]
[quotetheorem:2628]
[citeproof:2628]
[explanation: Why the Complex Case Is Stated with $|g(y)| \le p(y)$ and a Semi-norm]
In the complex setting a one-sided inequality $g(y) \le p(y)$ is not meaningful, since $g$ takes complex values; the natural hypothesis is $|g(y)| \le p(y)$. For a semi-norm $p$ this is in fact equivalent to the apparently weaker bound $\operatorname{Re}(g(y)) \le p(y)$ on $Y$: choosing $\theta$ with $e^{i\theta} g(y) = |g(y)|$ gives $|g(y)| = \operatorname{Re}(g(e^{i\theta}y)) \le p(e^{i\theta}y) = p(y)$. The step $p(e^{i\theta}y) = p(y)$ uses absolute homogeneity under complex scalars, which is why the complex theorem is stated for semi-norms rather than for general positively homogeneous, subadditive functionals.
The proof passes through the real part. The real Hahn-Banach theorem extends $\operatorname{Re}(g)$ to a real-linear functional $f_1: X \to \mathbb{R}$ with $f_1(x) \le p(x)$, and $f(x) = f_1(x) - if_1(ix)$ is then $\mathbb{C}$-linear. This $f$ automatically restricts to $g$ on $Y$, because any complex-linear $g$ satisfies $\operatorname{Im}(g(y)) = -\operatorname{Re}(g(iy))$, so the imaginary parts match without further hypotheses. The bound $|f(x)| \le p(x)$ then follows from $\operatorname{Re}(f) = f_1 \le p$ by the same rotation argument. If $p$ were only positively homogeneous, the rotation step would fail and one could conclude only $\operatorname{Re}(f(x)) \le p(x)$.
Regarding uniqueness: just as in the real version, extensions are far from unique in general. In the special case of extending the norming functional from the line spanned by a unit vector $x_0$, the extension is unique exactly when the unit ball has a unique supporting hyperplane at $x_0$, i.e. when $x_0$ is a smooth point of the unit sphere.
[/explanation]
[remark: Real Part Duality]
For a complex normed space $X$, let $X_\mathbb{R}$ denote $X$ viewed as a real normed space. The proof above shows that the map $(X^*)_\mathbb{R} \to (X_\mathbb{R})^*$ sending $f \mapsto \operatorname{Re}(f)$ is an isometric isomorphism. So the real and complex duals are the same in a precise sense.
[/remark]
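The identity behind this remark is a one-line computation. As a sketch, write $u = \operatorname{Re}(f)$ for a complex-linear $f$ (the symbol $u$ is introduced here only for illustration). Since $f(ix) = i f(x)$, taking real parts gives $u(ix) = -\operatorname{Im}(f(x))$, so
\begin{align*}
f(x) = \operatorname{Re}(f(x)) + i \operatorname{Im}(f(x)) = u(x) - i\,u(ix).
\end{align*}
For the norms, $|u(x)| \le |f(x)|$ gives $\|u\| \le \|f\|$. Conversely, choosing $\theta$ with $e^{i\theta} f(x) = |f(x)|$,
\begin{align*}
|f(x)| = f(e^{i\theta}x) = u(e^{i\theta}x) \le \|u\|\,\|e^{i\theta}x\|_X = \|u\|\,\|x\|_X,
\end{align*}
so $\|f\| \le \|u\|$ as well.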
A key corollary shows that every point in $X$ has a corresponding functional that "sees" it at full norm.
[quotetheorem:2629]
[citeproof:2629]
This theorem has four important consequences.
[remark: Consequences of the Normed Space Hahn-Banach]
(i) **Linear analogue of Tietze extension.** Part (i) is the linear version of Tietze's extension theorem: just as continuous functions on closed subsets of compact Hausdorff spaces extend without increasing the sup norm, bounded linear functionals on subspaces extend without increasing the operator norm.
(ii) **$X^*$ separates points.** Part (ii) implies that $X^*$ separates the points of $X$: if $x \ne y$, apply (ii) to $x_0 = x - y$ to get $f \in X^*$ with $f(x - y) = \|x - y\| \ne 0$.
(iii) **Dual expression for the norm.** The norming functional $f$ achieves the supremum in
\begin{align*}
\|x_0\|_X = \sup_{g \in B_{X^*}} |g(x_0)| = \sup_{g \in X^* \setminus \{0\}} \frac{|g(x_0)|}{\|g\|_{X^*}}.
\end{align*}
This mirrors the expression $\|g\|_{X^*} = \sup_{x \ne 0} |g(x)|/\|x\|_X$ for the dual norm, with the roles of $X$ and $X^*$ swapped.
(iv) **Geometric interpretation.** When $X$ is real and $\|x_0\|_X = 1$, the set $\{x : f(x) = 1\}$ is a hyperplane that is tangent to $B_X$ at $x_0$, with $B_X$ lying entirely on one side. This is the supporting hyperplane at $x_0$.
[/remark]
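A small example illustrates (iv) and shows that the supporting hyperplane need not be unique. As a sketch, take $X = \mathbb{R}^2$ with the norm $\|x\|_1 = |x_1| + |x_2|$ and $x_0 = e_1 = (1, 0)$ (this choice of space is made here for illustration). For each $c \in [-1, 1]$ the functional $f_c(x) = x_1 + c\,x_2$ satisfies
\begin{align*}
|f_c(x)| \le |x_1| + |c|\,|x_2| \le \|x\|_1, \qquad f_c(e_1) = 1 = \|e_1\|_1,
\end{align*}
so every $f_c$ is a norming functional for $e_1$, and each line $\{x : x_1 + c\,x_2 = 1\}$ supports the unit ball at the corner $e_1$. Each $f_c$ is also a norm-preserving extension of the functional $t e_1 \mapsto t$ from the span of $e_1$, so such extensions are not unique in this space.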
<!-- illustration-needed: the supporting hyperplane — show the unit ball B_X with a boundary point x_0, and the tangent hyperplane {f = 1} touching B_X at x_0, with B_X on the side {f <= 1} -->
## The Bidual and Reflexivity
Once we have the dual $X^*$, it is natural to ask: what is the dual of $X^*$? And when we embed $X$ into the bidual $X^{**}$ via evaluation functionals, how much of $X^{**}$ does this image fill? The question is not merely aesthetic. The canonical image of $X$ is all of $X^{**}$ exactly when $X$ is reflexive. Non-reflexive spaces like $c_0$ have a strictly larger bidual, and the extra elements witness the failure of the canonical embedding to be surjective. Understanding when the embedding is onto has deep consequences for weak compactness and the solvability of optimisation problems.
The dual space $X^*$ is always a Banach space. Taking the dual again gives the bidual.
[definition: Bidual]
Let $X$ be a normed space. The **bidual** (or second dual) of $X$ is $X^{**} := (X^*)^*$, which is a Banach space under the operator norm $\|\phi\| = \sup_{f \in B_{X^*}} |\phi(f)|$.
[/definition]
There is a natural way to embed $X$ into $X^{**}$: every element $x \in X$ defines an evaluation functional $\hat{x}: X^* \to \mathbb{F}$ (where $\mathbb{F}$ denotes the scalar field, $\mathbb{R}$ or $\mathbb{C}$) by $\hat{x}(f) := f(x)$. This is bounded since $|\hat{x}(f)| = |f(x)| \le \|f\|_{X^*} \|x\|_X$, so $\hat{x} \in X^{**}$.
[quotetheorem:875]
[citeproof:875]
Two remarks illuminate the significance of this embedding.
[remark: Completions via the Bidual]
(i) $\hat{X}$ is closed in $X^{**}$ if and only if $X$ is complete. (If $X$ is complete then $\hat{X} \cong X$ is complete, and a complete subspace of a normed space is always closed; conversely, a closed subspace of the Banach space $X^{**}$ is complete.)
(ii) The closure $\overline{\hat{X}}$ in $X^{**}$ is always a Banach space containing an isometric copy of $X$ as a dense subspace. This shows every normed space has a completion.
[/remark]
[citedefinition:Reflexive Space]
[example: Non-Reflexivity of $c_0$ via an Explicit Element of $\ell_\infty \setminus \hat{c_0}$]
The space $c_0^{**} \cong \ell_\infty$ (since $c_0^* \cong \ell_1$ and $\ell_1^* \cong \ell_\infty$), and the canonical embedding $\hat{\cdot}: c_0 \hookrightarrow \ell_\infty$ is simply the inclusion of convergent-to-zero sequences into all bounded sequences. To show $c_0$ is not reflexive, we must exhibit an element of $\ell_\infty$ that is not in the image $\hat{c_0}$, i.e. a bounded sequence that does not converge to $0$.
The simplest example: the constant sequence $\mathbb{1} = (1, 1, 1, \ldots) \in \ell_\infty$ is not in $\hat{c_0}$, since it does not converge to $0$. More explicitly, $\mathbb{1}$ corresponds to the element $\phi \in c_0^{**} \cong \ell_1^*$ given by $\phi(\alpha) = \sum_n \alpha_n$ for $\alpha = (\alpha_n) \in \ell_1$. If $\phi = \hat{x}$ for some $x \in c_0$, then $\hat{x}(e_n^*) = e_n^*(x) = x_n$ for each standard basis functional $e_n^* \in \ell_1$, while $\phi(e_n^*) = 1$ for all $n$. But $x_n = 1$ for all $n$ contradicts $x \in c_0$ (which requires $x_n \to 0$). So the canonical embedding $c_0 \to c_0^{**}$ is not surjective, and $c_0$ is not reflexive.
A common trap: the non-reflexivity of $c_0$ does not follow merely from $c_0 \ne \ell_\infty$ as sets — one needs to check that no element of $c_0$ maps to $\mathbb{1}$ under the canonical embedding, not under some other isomorphism.
[/example]
The reflexive spaces include $\ell_p$ for any $p \in (1, \infty)$, every Hilbert space $H$, and every finite-dimensional normed space. The non-reflexive spaces include $c_0$, $\ell_1$, $\ell_\infty$, and $L^1([0,1])$. We will see in Chapter 2 that $L^p(\mu)$ is reflexive for all $p \in (1, \infty)$ and any measure $\mu$.
**How to verify reflexivity in practice.** A practical recipe: (1) Identify $X^*$ explicitly (as in the $\ell_p$ duality example). (2) Identify $X^{**} = (X^*)^*$ using the same method. (3) Track what the canonical embedding sends $x \in X$ to in $X^{**}$ and check whether every element of $X^{**}$ arises this way. In the $\ell_p$ case for $1 < p < \infty$, the chain $\ell_p^{**} = (\ell_p^*)^* \cong \ell_q^* \cong \ell_p$ via the Hölder pairing is compatible with the canonical embedding, so $\ell_p$ is reflexive. The key insight is that reflexivity is a statement about the specific evaluation map, not just an abstract isomorphism.
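Step (3) can be made concrete for $\ell_p$ with $1 < p < \infty$. As a sketch, write $f_y \in \ell_p^*$ for the functional represented by $y \in \ell_q$ under the Hölder pairing, as in the duality example. For $x \in \ell_p$,
\begin{align*}
\hat{x}(f_y) = f_y(x) = \sum_n x_n y_n,
\end{align*}
which is exactly the functional on $\ell_q$ represented by $x$ under the identification $\ell_q^* \cong \ell_p$. Since every element of $\ell_q^*$ is of this form for some $x \in \ell_p$, every element of $\ell_p^{**}$ equals $\hat{x}$ for some $x$, so the canonical embedding is onto.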
## Dual Operators
Every bounded linear map $T: X \to Y$ should induce a corresponding map on dual spaces. The naturality demand is: if $g \in Y^*$ measures elements of $Y$, then $g \circ T$ measures elements of $X$ by first applying $T$. This pullback construction is the dual operator, and it arises whenever one studies how linear maps interact with duality — from the open mapping theorem to spectral theory.
[definition: Dual Operator]
For $T \in \mathcal{L}(X, Y)$, the **dual operator** (or adjoint) $T^*: Y^* \to X^*$ is defined by
\begin{align*}
T^*(g) = g \circ T, \qquad g \in Y^*.
\end{align*}
In function notation: $T^*(g)(x) = g(T(x))$ for all $x \in X$.
[/definition]
The map $T \mapsto T^*$ is well-behaved in every respect.
[quotetheorem:2630]
[citeproof:2630]
[explanation: What the Dual Operator Theorem Does and Does Not Say]
The norm equality $\|T^*\| = \|T\|$ is not an obvious observation — it relies crucially on the dual expression for the norm, which is itself a consequence of Hahn-Banach. Without Hahn-Banach, we would only know $\|T^*\| \le \|T\|$ from the immediate bound $|g(T(x))| \le \|g\| \|T\| \|x\|$; the matching lower bound requires finding, for each $T(x)$, a functional $g$ that sees $T(x)$ at full norm, and that is exactly what the support functional guarantee of Hahn-Banach provides.
The theorem says nothing directly about surjectivity or kernel structure of $T^*$. These are governed by annihilator relations: $T^*$ is injective exactly when $T$ has dense range, whereas injectivity of $T$ does not make $T^*$ surjective in general. The kernel-range relations are:
\begin{align*}
\ker(T^*) = (\operatorname{Range}(T))^\perp := \{g \in Y^* : g(y) = 0 \text{ for all } y \in \operatorname{Range}(T)\}
\end{align*}
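and dually $\ker(T) = {}^\perp(\operatorname{Range}(T^*)) := \{x \in X : f(x) = 0 \text{ for all } f \in \operatorname{Range}(T^*)\}$. The first relation is immediate from the definition of $T^*$; the second also uses the fact that $Y^*$ separates the points of $Y$, a consequence of Hahn-Banach.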
The dual operator in the Banach space setting differs from the Hilbert space adjoint in one important respect: in a Hilbert space, $T^*$ is conjugate-linear in $T$ (there is a complex conjugate in the definition due to the sesquilinear inner product), whereas the Banach dual operator $T \mapsto T^*$ is linear (no conjugation). The Hilbert adjoint satisfies $(Tx, y)_H = (x, T^*y)_H$, while the Banach dual operator satisfies $T^*(g)(x) = g(T(x))$ — two quite different conventions, sharing the notation $T^*$ by tradition.
[/explanation]
[example: Dual of the Right Shift on $\ell_p$]
For $p \in (1, \infty)$ with Hölder conjugate $q$, define the right shift $R: \ell_p \to \ell_p$ by
\begin{align*}
R(x_1, x_2, x_3, \ldots) = (0, x_1, x_2, x_3, \ldots).
\end{align*}
Then $R^*: \ell_q \to \ell_q$ is the left shift: $R^*(y_1, y_2, y_3, \ldots) = (y_2, y_3, \ldots)$.
To see this, for $x = (x_n) \in \ell_p$ and $y = (y_n) \in \ell_q$, identify $y$ with the functional $g_y(x) = \sum_n x_n y_n$. Then $R^*(g_y)(x) = g_y(Rx) = \sum_n (Rx)_n y_n = \sum_{n \ge 2} x_{n-1} y_n = \sum_{m \ge 1} x_m y_{m+1} = g_{(y_2, y_3, \ldots)}(x)$.
[/example]
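As a quick sanity check of the index shift, here is one concrete instance with finitely supported sequences (the particular numbers are chosen only for illustration). Take $x = (1, 2, 0, 0, \ldots)$ and $y = (3, 4, 5, 0, \ldots)$, so that $Rx = (0, 1, 2, 0, \ldots)$ and the left shift of $y$ is $(4, 5, 0, \ldots)$. Then
\begin{align*}
g_y(Rx) &= 0 \cdot 3 + 1 \cdot 4 + 2 \cdot 5 = 14, \\
g_{(y_2, y_3, \ldots)}(x) &= 1 \cdot 4 + 2 \cdot 5 = 14,
\end{align*}
in agreement with $R^*(g_y) = g_{(y_2, y_3, \ldots)}$.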
The dual operator also interacts with the canonical embedding: for $T \in \mathcal{L}(X, Y)$, the diagram
\begin{align*}
\begin{array}{ccc}
X & \xrightarrow{T} & Y \\
\downarrow\hat{\cdot} & & \downarrow\hat{\cdot} \\
X^{**} & \xrightarrow{T^{**}} & Y^{**}
\end{array}
\end{align*}
commutes, i.e. $\widehat{T(x)} = T^{**}(\hat{x})$. This is verified by checking that $\widehat{T(x)}(g) = g(T(x)) = T^*(g)(x) = \hat{x}(T^*(g)) = T^{**}(\hat{x})(g)$ for all $g \in Y^*$.
A consequence of the duality properties is that isomorphic spaces have isomorphic duals:
[remark: Isomorphism Passes to Duals]
If $X \sim Y$ (isomorphic), then $X^* \sim Y^*$. Indeed, if $T: X \to Y$ is an isomorphism with inverse $S$, then $TS = \operatorname{id}_Y$ and $ST = \operatorname{id}_X$, so taking duals gives $S^*T^* = \operatorname{id}_{Y^*}$ and $T^*S^* = \operatorname{id}_{X^*}$, showing $X^* \sim Y^*$ via $T^*: Y^* \to X^*$ with inverse $S^*$. In particular $(T^{-1})^* = (T^*)^{-1}$.
The converse is false: many non-isomorphic spaces share the same dual.
[/remark]
## Quotient Spaces
The algebraic quotient $X/Y$ of a vector space by a subspace is always well-defined, but equipping it with a norm requires care. The most natural attempt — taking the norm of a coset to be the infimum of norms of its members — is a legitimate norm only when $Y$ is closed. Without closedness, one can have $x \notin Y$ yet $\operatorname{dist}(x, Y) = 0$, so the infimum for the coset $x + Y$ is zero but the coset is not the zero element. This is the precise algebraic obstruction that closedness of $Y$ removes: on a closed subspace, if $\operatorname{dist}(x, Y) = 0$ then $x \in Y$ (since $Y = \overline{Y}$), so the infimum is zero only for the zero coset.
Let $Y \subset X$ be a closed subspace of a normed space $X$. The quotient $X/Y = \{x + Y : x \in X\}$ becomes a normed space under the **quotient norm**
\begin{align*}
\|x + Y\| := \operatorname{dist}(x, Y) = \inf_{y \in Y} \|x + y\|_X.
\end{align*}
The quotient norm is independent of the choice of coset representative: if $x + Y = z + Y$ then $x - z \in Y$, and shifting by $Y$ does not change the infimum.
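A small finite-dimensional example shows how the infimum is computed. As a sketch, take $X = \mathbb{R}^2$ with $\|x\|_1 = |x_1| + |x_2|$ and the closed subspace $Y = \{(t, t) : t \in \mathbb{R}\}$ (both chosen here for illustration). For $x = (a, b)$,
\begin{align*}
\|x + Y\| = \inf_{t \in \mathbb{R}} \bigl( |a + t| + |b + t| \bigr) = |a - b|,
\end{align*}
since $|a + t| + |b + t| \ge |(a + t) - (b + t)| = |a - b|$ for every $t$, with equality at $t = -a$. In particular $\|x + Y\| = 0$ exactly when $a = b$, i.e. when $x \in Y$.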
[quotetheorem:2631]
[citeproof:2631]
[quotetheorem:2632]
[citeproof:2632]
[explanation: The Universal Property and the Role in Later Theorems]
The factorisation theorem expresses a **universal property**: every operator that kills $Y$ factors uniquely through the quotient $X/Y$. This is the standard algebraic statement that $X/Y$ is the "largest" quotient of $X$ on which elements of $Y$ vanish. Combined with the open mapping theorem, it yields the first isomorphism theorem for Banach spaces: if $T: X \to Z$ is a surjective bounded linear operator between Banach spaces, then $\tilde{T}: X/\ker(T) \to Z$ is a bounded bijection, and the open mapping theorem gives that $\tilde{T}^{-1}$ is bounded, so $X/\ker(T) \sim Z$.
The closedness of $Y$ is indispensable. If $Y$ is merely a dense proper subspace (not closed), the quotient norm degenerates: every coset $x + Y$ has infimum norm $0$ (since $x$ can be approximated by elements of $Y$), so the would-be norm vanishes identically even though $X/Y \ne \{0\}$. This is not a useful object. The key lesson is: always check closedness before forming a quotient normed space.
The norm equality $\|\tilde{T}\| = \|T\|$ reflects the fact that the quotient map $q$ sends the open unit ball $D_X$ of $X$ onto the open unit ball $D_{X/Y}$ of $X/Y$: the quotient does not artificially shrink the unit ball, so the factored operator has the same reach as the original.
[/explanation]
## Three Big Corollaries of Hahn-Banach
We now derive three major consequences of the Hahn-Banach theorem that illustrate the power of being able to construct functionals at will.
### Separability Passes from $X^*$ to $X$
[quotetheorem:2633]
The converse fails: $\ell_1$ is separable but $\ell_1^* \cong \ell_\infty$ is not. So this theorem tells us that $X^*$ can never be smaller than $X$ in the sense of separability.
[citeproof:2633]
### Every Separable Space Embeds Isometrically into $\ell_\infty$
[quotetheorem:2634]
[citeproof:2634]
This shows $\ell_\infty$ is isometrically universal for the class of separable Banach spaces. However, $\ell_\infty$ is itself not separable, which raises a natural question: is there a separable universal space? The answer is yes — one construction appears later in the course.
### Banach-Valued Liouville Theorem
The Hahn-Banach theorem also enables a Banach-space version of Liouville's theorem from complex analysis. The strategy — reduce a vector-valued problem to a scalar-valued one by composing with arbitrary elements of $X^*$ — is a recurring technique throughout functional analysis. The key insight is: if $\phi(f(z)) = \phi(f(0))$ for every $\phi \in X^*$, and $X^*$ separates points, then $f(z) = f(0)$.
[quotetheorem:2635]
[citeproof:2635]
This is a typical use of the "reduce to the scalar case via functionals" strategy that permeates functional analysis. The same technique works for: proving that analytic $X$-valued functions have convergent power series expansions (compose with each $\phi \in X^*$ and use the scalar result, then use the Hahn-Banach separation to conclude for $X$), showing that the Gelfand transform carries spectral information, and proving vector-valued versions of the Cauchy integral formula.
## Locally Convex Spaces
Normed spaces are not the most general framework in which Hahn-Banach applies. Locally convex spaces (lcs) replace the single norm with a family of semi-norms, capturing spaces like $C^\infty(\Omega)$ and $\mathcal{O}(U)$ that do not carry a natural norm. The challenge for $C^\infty(\Omega)$ is immediate: a function can have arbitrarily large derivatives on some compact subset while remaining small on another, so no single uniform norm captures all relevant convergence information. The family of all sup-norms $\|D^\alpha f\|_K$ over compact $K$ and multi-index $\alpha$ does the job, at the cost of moving outside the normed-space framework.
[definition: Locally Convex Space]
A **locally convex space** is a pair $(X, \mathcal{P})$ where $X$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$ and $\mathcal{P}$ is a family of semi-norms on $X$ that **separates points**: for every $x \ne 0$, there exists $p \in \mathcal{P}$ with $p(x) \ne 0$.
[/definition]
The family $\mathcal{P}$ defines a topology on $X$ — the **lcs topology** — by declaring $U \subset X$ open if for every $x \in U$ there exist $\varepsilon > 0$, $n \in \mathbb{N}$, and $p_1, \ldots, p_n \in \mathcal{P}$ such that
\begin{align*}
\{y \in X : p_k(y - x) < \varepsilon \text{ for } 1 \le k \le n\} \subset U.
\end{align*}
The key properties of this topology are: vector addition and scalar multiplication are continuous (making $X$ a topological vector space); the topology is Hausdorff (because $\mathcal{P}$ separates points); and $x_n \to x$ in this topology if and only if $p(x_n - x) \to 0$ for every $p \in \mathcal{P}$.
Two families $\mathcal{P}$, $\mathcal{Q}$ of semi-norms on $X$ are **equivalent** (written $\mathcal{P} \sim \mathcal{Q}$) if they generate the same topology. The lcs $(X, \mathcal{P})$ is metrizable if and only if $\mathcal{P}$ has a countable equivalent subfamily.
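When $\mathcal{P}$ is equivalent to a countable family $\{p_n\}_{n \ge 1}$, one standard choice of metric (a sketch; other equivalent metrics work equally well) is
\begin{align*}
d(x, y) = \sum_{n=1}^{\infty} 2^{-n} \min\{1, p_n(x - y)\}.
\end{align*}
This $d$ is translation-invariant, it separates points because the family does, and $d(x_k, x) \to 0$ if and only if $p_n(x_k - x) \to 0$ for every $n$, matching the convergence criterion above.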
[citedefinition:Fréchet Space]
[example: Standard Locally Convex Spaces]
(i) Every normed space $(X, \|\cdot\|)$ is a lcs with $\mathcal{P} = \{\|\cdot\|\}$.
(ii) Let $U \subset \mathbb{C}$ be open and non-empty. The space of analytic functions
\begin{align*}
\mathcal{O}(U) = \{f: U \to \mathbb{C} : f \text{ is analytic}\}
\end{align*}
with semi-norms $p_K(f) = \sup_{z \in K} |f(z)|$ for each compact $K \subset U$ is a lcs. The induced topology is the topology of local uniform convergence. Using a compact exhaustion $U = \bigcup_n K_n$ (which exists for any open set in $\mathbb{C}$), the topology is generated by the countable subfamily $\{p_{K_n}\}$, and one verifies completeness using the fact that locally uniform limits of analytic functions are analytic (Weierstrass's theorem). So $\mathcal{O}(U)$ is a Fréchet space. It is not normable: the closed unit ball of any norm inducing the topology would be closed and bounded in every semi-norm $p_K$, and by Montel's theorem every closed, locally bounded family of analytic functions on $U$ is compact. A normed space with compact unit ball is finite-dimensional, whereas $\mathcal{O}(U)$ is infinite-dimensional.
(iii) For $\Omega \subset \mathbb{R}^d$ open, the space $C^\infty(\Omega)$ with semi-norms $p_{K,\alpha}(f) = \sup_{x \in K} |D^\alpha f(x)|$ for compact $K \subset \Omega$ and multi-index $\alpha \in (\mathbb{Z}_{\ge 0})^d$ is a Fréchet space, again not normable.
[/example]
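The compact exhaustion used in (ii) can be written down explicitly. One standard choice, given here as a sketch, is
\begin{align*}
K_n = \{ z \in U : |z| \le n \text{ and } \operatorname{dist}(z, \mathbb{C} \setminus U) \ge 1/n \}, \qquad n \ge 1,
\end{align*}
with the distance condition dropped when $U = \mathbb{C}$. Each $K_n$ is closed and bounded, hence compact, $K_n \subset \operatorname{int}(K_{n+1})$, and $\bigcup_n K_n = U$. Every compact $K \subset U$ therefore lies in some $K_n$, so $p_K \le p_{K_n}$ and the countable family $\{p_{K_n}\}$ generates the same topology as the full family.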
The correct notion of continuity for linear maps between lcs's is captured by the following characterisation.
[quotetheorem:2636]
[citeproof:2636]
[explanation: Why Finitely Many Semi-norms Suffice, and What This Means for the Dual]
The surprising consequence of condition (iii) is that only finitely many semi-norms from $\mathcal{P}$ are needed to control any fixed output semi-norm — even if $\mathcal{P}$ is infinite. This is not obvious from the definition, and it reflects the fact that the topology is defined by finite intersections of basic open sets.
The contrast with the normed case is instructive: on a normed space, condition (iii) simply says $\|T(x)\| \le C\|x\|$, which is boundedness. On $C^\infty(\Omega)$, however, no single input semi-norm controls all output semi-norms at once: which semi-norms are needed depends on the output semi-norm being estimated. The partial derivative $\partial_j: C^\infty(\Omega) \to C^\infty(\Omega)$ satisfies $p_{K,\alpha}(\partial_j f) = p_{K,\alpha+e_j}(f)$, where $e_j$ is the $j$-th unit multi-index, so each output semi-norm is bounded by a single semi-norm of one order higher, and $\partial_j$ is continuous. By contrast, a linear map for which some output semi-norm could only be controlled using derivatives of all orders simultaneously would need infinitely many semi-norms on the right-hand side of (iii), and the theorem says such a map cannot be continuous.
The **algebraic dual** of a lcs is the space of all linear maps $X \to \mathbb{F}$, while the **topological dual** $X^*$ consists only of continuous ones. The two coincide for finite-dimensional normed spaces, but already for infinite-dimensional normed spaces the algebraic dual is strictly larger, since discontinuous linear functionals can be constructed from a Hamel basis. On $C^\infty(\Omega)$ the topological dual is the space $\mathcal{E}'(\Omega)$ of distributions with compact support, a rich but manageable object. The algebraic dual of $C^\infty(\Omega)$ contains many more maps, most of which have no sensible interpretation.
[/explanation]
The dual of a locally convex space is defined as $X^* = \{f: X \to \mathbb{F} : f \text{ linear and continuous}\}$. For such maps, continuity has a clean characterisation in terms of kernels.
[quotetheorem:2637]
[citeproof:2637]
[explanation: Closed Kernels, Hyperplane Separation, and a Practical Test]
The closed kernel characterisation is special to maps with finite-dimensional range, and in particular to functionals (maps to $\mathbb{F}$); it does not generalise to operators $T: X \to Y$ with $Y$ infinite-dimensional, since any discontinuous injective operator has the closed kernel $\{0\}$. For functionals the reason is geometric: the kernel of a nonzero functional is a maximal proper subspace (a hyperplane), and a hyperplane is either closed or dense. If $\ker(f)$ is dense and $f \ne 0$, then $f$ is discontinuous: a continuous functional vanishing on a dense set vanishes everywhere. There is no intermediate case for hyperplanes in a topological vector space.
The practical consequence is a convenient test: to verify that a linear functional on a lcs is continuous, it suffices to check that its kernel is closed. This is often easier than directly verifying the continuity criterion (iii) above, especially when the functional is defined abstractly. For example, to show that a linear functional $T$ on $C_c^\infty(\Omega)$ is a distribution, i.e. continuous for the usual locally convex topology on $C_c^\infty(\Omega)$, one can check that $T^{-1}(\{0\}) = \ker(T)$ is closed, which is sometimes transparent from the definition of $T$.
The connection to hyperplane separation is direct: the closed kernel theorem says that a functional is continuous iff its level sets $\{f = c\}$ are closed, i.e. iff the functional separates points from its zero level set in a topologically meaningful way. This connects naturally to the geometric Hahn-Banach separation theorems proved in Chapter 3, which characterise when two convex sets can be separated by a closed hyperplane.
[/explanation]
The Hahn-Banach theorem extends to locally convex spaces.
[quotetheorem:2638]
[citeproof:2638]
Part (ii) says that $X^*$ separates points of $X$ from closed subspaces: the dual is large enough to distinguish any point from any closed subspace. In particular, $X^*$ separates points of $X$, which is the lcs analogue of the normed-space separation result.
The Hahn-Banach theorem and duality machinery of Chapter 1 are now deployed concretely: Chapter 2 identifies the dual spaces of the two most important function spaces in analysis, the $L^p$ spaces and $C(K)$, by converting abstract functionals into explicit integrals through the Radon-Nikodym theorem. This concrete duality is the foundation on which all subsequent functional analysis rests.
# 2. The Dual Spaces of $L^p(\mu)$ and $C(K)$
This chapter identifies the dual spaces of the two most important families of Banach spaces in analysis: the $L^p(\mu)$ spaces and the space $C(K)$ of continuous functions on a compact Hausdorff space. Both identifications rest on a common strategy — Radon-Nikodym theory converts a bounded linear functional into an integral against a measure — but the two settings require genuinely different tools. Chapter 1 gave us the Hahn-Banach machinery to study functionals in the abstract; here we make that machinery concrete by computing explicitly what the dual spaces look like.
## $L^p(\mu)$ Spaces and Their Completeness
What does a "generic" function space look like? Analysis constantly needs to take limits of sequences of functions: Fourier series converge pointwise or in norm, approximations to solutions of PDEs need to converge, and variational arguments require weak limits. The classical spaces $C(K)$ or $C^1[0,1]$ are too rigid: a pointwise limit of continuous functions need not be continuous, and so these spaces are not closed under the very operations analysis demands. The $L^p$ spaces are designed precisely to fix this: they are complete with respect to the integral norm $\|\cdot\|_p$, and in standard settings such as Lebesgue measure on $[0,1]$ with $p < \infty$ they arise as the completion of the continuous functions under that norm. Every series of $L^p$ functions whose norms are summable converges in the same space, and that is the substance of the Riesz-Fischer theorem below. Completeness is indispensable for much of what follows: the open mapping theorem, the closed graph theorem, and the uniform boundedness principle all require Banach spaces, although the Hahn-Banach theorem itself does not.
We fix a measure space $(\Omega, \Sigma, \mu)$. The $L^p$ spaces are defined as follows.
[definition: $L^p$ Space]
For $p \in [1, \infty)$, the space $L^p(\Omega, \Sigma, \mu)$, abbreviated $L^p(\mu)$, consists of all measurable functions $f : \Omega \to \mathbb{C}$ (or $\mathbb{R}$) satisfying
\begin{align*}
\int_\Omega |f|^p \, d\mu < \infty,
\end{align*}
with the $L^p$-norm
\begin{align*}
\|f\|_p := \left(\int_\Omega |f|^p \, d\mu\right)^{1/p}.
\end{align*}
Elements are identified up to $\mu$-a.e. equality, so $L^p(\mu)$ is a quotient of the set of $p$-integrable functions by the equivalence relation $f \sim g \iff f = g$ a.e.
For $p = \infty$, the space $L^\infty(\mu)$ consists of all measurable essentially bounded functions, with norm
\begin{align*}
\|f\|_\infty := \operatorname{ess\,sup}|f| = \inf\!\left\{\sup_{\Omega \setminus N} |f| : N \in \Sigma,\, \mu(N) = 0\right\}.
\end{align*}
Again elements are identified $\mu$-a.e.
[/definition]
A useful fact about $L^\infty$: for every $f \in L^\infty(\mu)$, there exists a null set $N$ such that $\|f\|_\infty = \sup_{\Omega \setminus N} |f|$. To see this, take a sequence $(N_n)$ of null sets with $\sup_{\Omega \setminus N_n}|f| \to \|f\|_\infty$ and set $N = \bigcup_n N_n$; then $\sup_{\Omega \setminus N}|f| \le \sup_{\Omega \setminus N_n}|f|$ for all $n$, giving $\sup_{\Omega \setminus N}|f| \le \|f\|_\infty$, while the reverse inequality is immediate from the definition.
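To illustrate the role of the null set, here is a small worked example on $[0,1]$ with Lebesgue measure $\mathcal{L}^1$. The function $f$ below is an assumption made only for illustration.

```latex
\begin{align*}
f(x) &= \begin{cases} 7 & x \in \mathbb{Q} \cap [0,1], \\ x & x \in [0,1] \setminus \mathbb{Q}, \end{cases} \\
\sup_{[0,1]} |f| &= 7, \\
\sup_{[0,1] \setminus N} |f| &= 1 = \|f\|_\infty \quad \text{for } N = \mathbb{Q} \cap [0,1],\ \mathcal{L}^1(N) = 0.
\end{align*}
```

The lower bound $\|f\|_\infty \ge 1$ holds because, for any null set $N'$ and any $\varepsilon > 0$, the set $(1 - \varepsilon, 1] \setminus (N' \cup \mathbb{Q})$ has positive measure and $f > 1 - \varepsilon$ on it.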
[quotetheorem:2639]
[citeproof:2639]
The proof reveals something important: the identification of functions agreeing a.e. is not a cosmetic convenience but a logical necessity. On the pre-quotient space of literally all $p$-integrable functions, $\|\cdot\|_p$ is only a seminorm: for Lebesgue measure on $\mathbb{R}$, the function $f = \mathbb{1}_{\{0\}}$ has $\|f\|_p = 0$ but $f \ne 0$. The completeness argument above would not single out a limit if we did not collapse null sets, because the limit function $f$ is only defined a.e. and different representatives are genuinely different functions in the pre-quotient. This is not a triviality: it means $L^p(\mu)$ is a space of equivalence classes, not a space of functions, and one must exercise care whenever pointwise values at a specific point are invoked.
Completeness also underlies everything that follows. The dual-space identifications of the next sections are statements about Banach spaces, and tools such as the open mapping theorem require completeness (the Hahn-Banach theorem, by contrast, holds in any normed space). The case $p = 2$ is particularly significant: $L^2(\mu)$ is a Hilbert space with inner product $(f, g)_{L^2} = \int_\Omega f \bar{g} \, d\mu$, and Hilbert-space theory (Riesz representation in the Hilbert sense, orthonormal bases, Parseval's identity) is available there but not for other $p$.
[example: A Cauchy Sequence That Fails Outside $L^p$]
The necessity of completeness becomes concrete if one works in the wrong space. Consider the space $C([0,1])$ of continuous functions equipped with the $L^1$ norm $\|f\|_1 = \int_0^1 |f| \, d\mathcal{L}^1$. Define a sequence of continuous functions by
\begin{align*}
f_n(x) = \begin{cases} 0 & 0 \le x \le \tfrac{1}{2} - \tfrac{1}{n}, \\ n(x - \tfrac{1}{2} + \tfrac{1}{n}) & \tfrac{1}{2} - \tfrac{1}{n} \le x \le \tfrac{1}{2}, \\ 1 & \tfrac{1}{2} \le x \le 1. \end{cases}
\end{align*}
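Each $f_n$ (defined for $n \ge 2$, so that $\tfrac{1}{2} - \tfrac{1}{n} \ge 0$) is a continuous "ramp" rising from $0$ to $1$ across the interval $[\tfrac{1}{2} - \tfrac{1}{n}, \tfrac{1}{2}]$, which has length $1/n$ and lies immediately to the left of $1/2$. For $m > n$, $\|f_m - f_n\|_1 \le 1/n \to 0$, so $(f_n)$ is Cauchy in the $L^1$-norm. The pointwise limit is $\mathbb{1}_{[1/2, 1]}$, which is not continuous, and $\|f_n - \mathbb{1}_{[1/2,1]}\|_1 \le 1/n \to 0$ as well. If some $h \in C([0,1])$ were an $L^1$-limit of $(f_n)$, then $h = \mathbb{1}_{[1/2,1]}$ a.e., which is impossible for a continuous $h$ because of the jump at $1/2$. Thus $(f_n)$ is a Cauchy sequence in $(C([0,1]), \|\cdot\|_1)$ with no limit in that space: the space is not complete. Passing to $L^1([0,1])$ resolves this: the indicator function $\mathbb{1}_{[1/2,1]}$ is in $L^1$ and is the $L^1$ limit of $(f_n)$.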
[/example]
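The distances in the example above can be computed exactly. The sketch below uses only the definition of $f_n$ and the pointwise ordering $f_n \ge f_m \ge \mathbb{1}_{[1/2,1]}$ for $m > n \ge 2$, which lets each $L^1$ distance be written as a difference of integrals.

```latex
\begin{align*}
\int_0^1 f_n \, d\mathcal{L}^1 &= \underbrace{\tfrac{1}{2} \cdot \tfrac{1}{n} \cdot 1}_{\text{ramp triangle}} + \underbrace{\tfrac{1}{2}}_{[1/2,\,1]} = \tfrac{1}{2} + \tfrac{1}{2n}, \\
\|f_n - f_m\|_1 &= \int_0^1 (f_n - f_m) \, d\mathcal{L}^1 = \tfrac{1}{2n} - \tfrac{1}{2m} \le \tfrac{1}{n} \qquad (m > n \ge 2), \\
\|f_n - \mathbb{1}_{[1/2,1]}\|_1 &= \tfrac{1}{2n} \to 0.
\end{align*}
```

So the bound $1/n$ used in the example is safe but not sharp; the true distance is at most $1/(2n)$.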
## The Radon-Nikodym Theorem
Suppose we are handed a bounded linear functional $\psi : L^p(\mu) \to \mathbb{C}$ and asked to describe it concretely. What does $\psi$ look like? We know $\psi$ is bounded and linear, but that tells us nothing about its internal structure. The natural guess is that $\psi$ should look like integration against some fixed function: $\psi(f) = \int f \cdot g \, d\mu$ for some $g$. But why should every bounded linear functional have this form? The Radon-Nikodym theorem is the key: it says that if a measure $\nu$ is "subordinate" to $\mu$ in the precise sense of absolute continuity, then $\nu$ is itself an integral against $\mu$, i.e., $\nu(A) = \int_A g \, d\mu$ for some $g \in L^1(\mu)$. This converts the problem of understanding an abstract functional into the problem of understanding a measure, and then the Radon-Nikodym theorem converts the measure into a concrete function.
[definition: Complex Measure and Total Variation]
Let $(\Omega, \Sigma)$ be a measurable space. A **complex measure** on $\Sigma$ is a countably additive function $\nu : \Sigma \to \mathbb{C}$, meaning
\begin{align*}
\nu\!\left(\bigsqcup_{n=1}^\infty A_n\right) = \sum_{n=1}^\infty \nu(A_n)
\end{align*}
for pairwise disjoint $A_n \in \Sigma$, where the series is absolutely convergent. (Absolute convergence is required because reordering the $A_n$ does not change their union.)
The **total variation measure** of $\nu$ is the positive measure $|\nu| : \Sigma \to [0, \infty]$ defined by
\begin{align*}
|\nu|(A) := \sup\!\left\{\sum_{k=1}^n |\nu(A_k)| : A = \bigsqcup_{k=1}^n A_k \text{ measurable partition}\right\}.
\end{align*}
The **total variation norm** is $\|\nu\|_1 := |\nu|(\Omega)$.
[/definition]
The total variation $|\nu|$ plays the role that $|f|$ plays for functions: it is always a positive measure, and $|\nu|$ being finite is the correct notion of "$\nu$ is bounded." Complex measures always have finite total variation (this follows from the Jordan decomposition below). This distinguishes them from the extended-real-valued signed measures used in some texts, which may assign $+\infty$ or $-\infty$ (but not both) to some sets; the signed measures defined below are real-valued and therefore finite. The intuition is the same as for absolutely convergent series of complex numbers: $|\nu|(A)$ is the largest "absolute total mass" one can extract from $\nu$ by partitioning $A$, and finiteness of this quantity is what makes $\nu$ amenable to integration theory.
To make practical use of complex measures, we need a way to reduce computations involving $\nu$ to computations involving ordinary positive measures, where monotone and dominated convergence are available. The Jordan decomposition supplies exactly this reduction: every signed (real-valued) measure splits canonically into a difference of two finite positive measures, and a complex measure splits into four. This is the analogue of writing a real-valued function as $f = f^+ - f^-$, except now applied at the level of the measure itself. Once we have this decomposition in hand, all of the integration machinery built for positive measures transfers to complex measures by linearity, and the total variation $|\nu|$ becomes the natural majorant playing the role of $|f|$ in absolute integrability bounds.
[definition: Signed Measure and Jordan Decomposition]
A **signed measure** on $\Sigma$ is a countably additive function $\nu : \Sigma \to \mathbb{R}$. A **Hahn decomposition** of $(\Omega, \Sigma, \nu)$ is a measurable partition $\Omega = P \sqcup N$ such that $\nu(A) \ge 0$ for all measurable $A \subset P$ and $\nu(A) \le 0$ for all measurable $A \subset N$. The **Jordan decomposition** writes $\nu = \nu^+ - \nu^-$, where $\nu^+(A) = \nu(A \cap P)$ and $\nu^-(A) = -\nu(A \cap N)$ are finite positive measures. For a complex measure, one applies the Jordan decomposition to $\operatorname{Re}(\nu)$ and $\operatorname{Im}(\nu)$ separately to write $\nu = \nu_1 - \nu_2 + i(\nu_3 - \nu_4)$, with each $\nu_k$ a finite positive measure. Since $|\nu(A)| \le \nu_1(A) + \nu_2(A) + \nu_3(A) + \nu_4(A)$ for every $A \in \Sigma$, summing over a partition gives $|\nu| \le \nu_1 + \nu_2 + \nu_3 + \nu_4$, so $|\nu|$ is always finite for complex measures.
[/definition]
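As an illustration of the Hahn and Jordan decompositions, consider the signed measure $\nu(A) = \int_A (x - \tfrac{1}{2}) \, d\mathcal{L}^1(x)$ on the Borel sets of $[0,1]$. This particular $\nu$ is an assumption chosen for the sketch, not an object from the text. The density changes sign at $1/2$, which suggests the Hahn decomposition $P = [\tfrac{1}{2}, 1]$, $N = [0, \tfrac{1}{2})$.

```latex
\begin{align*}
\nu^+([0,1]) &= \nu(P) = \int_{1/2}^{1} \left(x - \tfrac{1}{2}\right) dx = \tfrac{1}{8}, \\
\nu^-([0,1]) &= -\nu(N) = -\int_{0}^{1/2} \left(x - \tfrac{1}{2}\right) dx = \tfrac{1}{8}, \\
\nu([0,1]) &= \nu^+([0,1]) - \nu^-([0,1]) = 0, \\
|\nu|([0,1]) &= \nu^+([0,1]) + \nu^-([0,1]) = \tfrac{1}{4}.
\end{align*}
```

The last line uses the standard identity $|\nu| = \nu^+ + \nu^-$ for real signed measures. It shows that the total variation records mass that cancels in $\nu$ itself.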
With the Jordan decomposition in hand, working with complex or signed measures reduces to working with finite positive measures. The next concept identifies which measures arise as integrals. The motivating observation is the following: every $f \in L^1(\mu)$ produces a complex measure via the formula $\nu(A) = \int_A f \, d\mu$, and one would like a clean structural characterisation of the measures that arise this way — a property of $\nu$ alone, formulated without reference to any putative $f$, that is necessary and sufficient for $\nu$ to be representable as an integral against $\mu$.
The candidate property is striking in its simplicity: if $\nu$ comes from integrating an $L^1$ function against $\mu$, then any set with $\mu$-measure zero contributes nothing to the integral, hence has $\nu$-measure zero. This necessary condition turns out to be sufficient as well, and it earns its own name — absolute continuity — because in the finite-measure case it is equivalent to the stronger-looking $\varepsilon$-$\delta$ statement that small $\mu$-measure forces small $|\nu|$. The Radon-Nikodym theorem, which we state next, will reverse this implication: every absolutely continuous complex measure on a $\sigma$-finite space arises from integration against some $L^1$ density.
[definition: Absolute Continuity]
Let $(\Omega, \Sigma, \mu)$ be a measure space and $\nu : \Sigma \to \mathbb{C}$ a complex measure. We say $\nu$ is **absolutely continuous with respect to $\mu$**, written $\nu \ll \mu$, if
\begin{align*}
\mu(A) = 0 \implies \nu(A) = 0 \quad \text{for all } A \in \Sigma.
\end{align*}
[/definition]
Because a complex measure $\nu$ has finite total variation, absolute continuity has an equivalent $\varepsilon$-$\delta$ characterisation, valid for every positive measure $\mu$: $\nu \ll \mu$ if and only if for every $\varepsilon > 0$ there exists $\delta > 0$ such that $\mu(A) < \delta$ implies $|\nu(A)| < \varepsilon$. This is the origin of the name.
The canonical example: if $f \in L^1(\mu)$, then $\nu(A) := \int_A f \, d\mu$ defines a complex measure with $\nu \ll \mu$. The Radon-Nikodym theorem asserts that, when $\mu$ is $\sigma$-finite, this is the only way such measures arise.
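For the canonical example $\nu(A) = \int_A f \, d\mu$ with $f \in L^1(\mu)$, the $\varepsilon$-$\delta$ form of absolute continuity can be checked by hand. The following is a sketch of one standard argument; the truncation level $M$ is an auxiliary parameter introduced here.

```latex
\begin{align*}
|\nu(A)| \le \int_A |f| \, d\mu
&= \int_{A \cap \{|f| \le M\}} |f| \, d\mu + \int_{A \cap \{|f| > M\}} |f| \, d\mu \\
&\le M \, \mu(A) + \int_{\{|f| > M\}} |f| \, d\mu .
\end{align*}
```

Given $\varepsilon > 0$, dominated convergence lets one choose $M$ with $\int_{\{|f| > M\}} |f| \, d\mu < \varepsilon/2$. Setting $\delta = \varepsilon/(2M)$, any $A$ with $\mu(A) < \delta$ then satisfies $|\nu(A)| < \varepsilon$.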
What can fail if absolute continuity does not hold? Consider $\mu = \mathcal{L}^1$ (Lebesgue measure on $\mathbb{R}$) and $\nu = \delta_0$ (the Dirac measure at $0$). The set $A = \{0\}$ satisfies $\mu(A) = \mathcal{L}^1(\{0\}) = 0$ but $\nu(A) = \delta_0(\{0\}) = 1 \ne 0$, so $\delta_0 \not\ll \mathcal{L}^1$. Correspondingly, there is no $f \in L^1(\mathcal{L}^1)$ with $\delta_0(A) = \int_A f \, d\mathcal{L}^1$ for all $A$: taking $A = \{0\}$ would require $1 = \delta_0(\{0\}) = \int_{\{0\}} f \, d\mathcal{L}^1 = 0$, since the integral of any function over a Lebesgue-null set vanishes.
[definition: Sigma-Finiteness]
A set $A \in \Sigma$ is **$\sigma$-finite** for $\mu$ if $A = \bigcup_n A_n$ with $\mu(A_n) < \infty$ for each $n$. The measure $\mu$ is $\sigma$-finite if $\Omega$ itself is $\sigma$-finite.
[/definition]
Sigma-finiteness is the second essential hypothesis of the Radon-Nikodym theorem, and its necessity is equally concrete. Consider $\Omega = \mathbb{R}$ equipped with the counting measure $\mu$ (which assigns to each set its cardinality, or $+\infty$ if the set is infinite), and let $\nu = \mathcal{L}^1$ be Lebesgue measure. Then $\nu \ll \mu$ (any set of counting-measure zero is empty, hence has Lebesgue measure zero), but $\mu$ is not $\sigma$-finite ($\mathbb{R}$ cannot be written as a countable union of sets with finite cardinality). The Radon-Nikodym theorem fails here: there is no $f \in L^1(\mu)$ with $\mathcal{L}^1(A) = \int_A f \, d\mu$ for all $A$, because such an $f$ would need to satisfy $\int_{\{x\}} f \, d\mu = f(x) = \mathcal{L}^1(\{x\}) = 0$ for every singleton, forcing $f \equiv 0$ and giving $\int_{\mathbb{R}} f \, d\mu = 0 \ne \mathcal{L}^1(\mathbb{R}) = \infty$.
[quotetheorem:2640]
[citeproof:2640]
The Radon-Nikodym theorem is a powerful identification result, but its hypotheses are not mere technicalities — each one is genuinely necessary, and the theorem gives no information about what $d\nu/d\mu$ looks like beyond the fact that it is in $L^1(\mu)$. In particular: the derivative need not be continuous, bounded, or even locally bounded; it is an a.e.-equivalence class, not a pointwise-defined function. The two counterexamples above (Dirac vs. Lebesgue for absolute continuity; counting measure vs. Lebesgue for $\sigma$-finiteness) show that both hypotheses are tight. The forward significance is substantial: in the next section, we apply the theorem to convert an arbitrary bounded linear functional $\psi \in L^p(\mu)^*$ into a measure $\nu(A) := \psi(\mathbb{1}_A)$ and then invoke Radon-Nikodym to extract the integrand $g \in L^q(\mu)$ representing $\psi$. Every step of that argument relies on the theorem as a black box, and dropping either hypothesis would break the argument.
[remark: Radon-Nikodym Derivative Notation]
For suitable measurable $g$, the chain rule holds:
\begin{align*}
\int_\Omega g \, d\nu = \int_\Omega g \cdot \frac{d\nu}{d\mu} \, d\mu.
\end{align*}
This mirrors the classical change-of-variables formula and makes the derivative notation natural.
[/remark]
The chain rule above is more than notational convenience: it tells us that once we have identified the Radon-Nikodym derivative $d\nu/d\mu$, every $\nu$-integral collapses into a $\mu$-integral with the derivative absorbed into the integrand. In practice this is how Radon-Nikodym derivatives are used — one rarely needs to know $\nu$ as a measure once its density against $\mu$ is in hand. The next example makes this concrete by computing a Radon-Nikodym derivative from scratch on a simple measure space, verifying the existence claim of the theorem and illustrating the chain rule.
[example: Computing a Radon-Nikodym Derivative Explicitly]
Let $\Omega = (0,1)$ with Lebesgue measure $\mu = \mathcal{L}^1$. Define a measure $\nu$ on the Borel sets of $(0,1)$ by $\nu(A) = \int_A x \, d\mathcal{L}^1(x)$. This is a finite positive measure with $\nu \ll \mathcal{L}^1$: if $\mathcal{L}^1(A) = 0$ then $\int_A x \, d\mathcal{L}^1(x) = 0$. The Radon-Nikodym theorem guarantees a unique $f \in L^1(\mathcal{L}^1)$ with $\nu(A) = \int_A f \, d\mathcal{L}^1$, and by inspection $f(x) = x$ works. So $d\nu/d\mathcal{L}^1 = x$ (the identity function).
To see the chain rule in action: for $g(x) = x^2$,
\begin{align*}
\int_{(0,1)} g \, d\nu = \int_{(0,1)} x^2 \cdot x \, d\mathcal{L}^1(x) = \int_0^1 x^3 \, dx = \tfrac{1}{4},
\end{align*}
where the first equality is the chain rule applied with $\frac{d\nu}{d\mu}(x) = x$: it reduces the $\nu$-integral to an ordinary Lebesgue integral, which is then evaluated by elementary calculus.
[/example]
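The value $\tfrac{1}{4}$ in the example above can also be obtained without using the chain rule, which gives an independent check. The sketch below assumes the layer-cake formula $\int g \, d\nu = \int_0^\infty \nu(\{g > t\}) \, dt$ for non-negative measurable $g$, a standard fact not stated in the text, and uses only the defining formula $\nu(A) = \int_A x \, d\mathcal{L}^1(x)$ on intervals.

```latex
\begin{align*}
\{g > t\} &= (\sqrt{t}, 1) \quad (0 \le t < 1), \qquad \{g > t\} = \emptyset \quad (t \ge 1), \\
\nu\big((\sqrt{t}, 1)\big) &= \int_{\sqrt{t}}^{1} x \, dx = \frac{1 - t}{2}, \\
\int_{(0,1)} g \, d\nu &= \int_0^1 \frac{1 - t}{2} \, dt = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}.
\end{align*}
```

Both routes give the same number, as the chain rule predicts.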
## The Dual of $L^p(\mu)$
What does an arbitrary bounded linear functional on $L^p(\mu)$ look like? Unlike on a finite-dimensional space, where every linear functional is given by a dot product, on an infinite-dimensional space it is not a priori clear that functionals must be of any particular form. The naive guess is that $\psi(f) = \int f g \, d\mu$ for some fixed $g$, but why should this exhaust all possibilities? The question is whether the integration pairing $L^p \times L^q \to \mathbb{C}$ is already the "complete" description of the dual, or whether there might be exotic functionals not captured by any $g \in L^q$. The answer below is that for $p \in (1, \infty)$ — and for $p = 1$ under the mild hypothesis of $\sigma$-finiteness — the integration pairing captures everything: the dual is exactly $L^q$.
The natural candidate for elements of $L^p(\mu)^*$ comes from integration: for $g \in L^q(\mu)$, the map
\begin{align*}
\varphi_g(f) := \int_\Omega f \cdot g \, d\mu
\end{align*}
is well-defined and bounded by Hölder's inequality: $|\varphi_g(f)| \le \|f\|_p \|g\|_q$. So $\varphi_g \in L^p(\mu)^*$ and $\|\varphi_g\| \le \|g\|_q$. This gives a contractive linear map $\varphi : L^q(\mu) \to L^p(\mu)^*$ via $\varphi(g) = \varphi_g$.
Here $q$ is the Hölder conjugate of $p$: $1/p + 1/q = 1$, with the convention $q = \infty$ when $p = 1$ and $q = 1$ when $p = \infty$.
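For $p \in (1, \infty)$ the bound $\|\varphi_g\| \le \|g\|_q$ is in fact an equality, and the extremal test function can be written down explicitly. The following is a sketch of the standard computation (the cited proof may proceed differently); the function $f$ and the convention $\operatorname{sgn}(g) = g/|g|$ where $g \ne 0$, and $0$ elsewhere, are introduced here for illustration. Assume $g \ne 0$ in $L^q(\mu)$.

```latex
\begin{align*}
f &:= \overline{\operatorname{sgn}(g)} \, |g|^{q-1}, \qquad f g = |g|^{q}, \\
\|f\|_p^p &= \int_\Omega |g|^{(q-1)p} \, d\mu = \int_\Omega |g|^{q} \, d\mu
\quad \text{since } (q-1)p = q, \qquad \text{so } \|f\|_p = \|g\|_q^{\,q-1}, \\
\varphi_g(f) &= \int_\Omega |g|^{q} \, d\mu = \|g\|_q^{\,q}, \qquad
\frac{|\varphi_g(f)|}{\|f\|_p} = \frac{\|g\|_q^{\,q}}{\|g\|_q^{\,q-1}} = \|g\|_q .
\end{align*}
```

Together with Hölder's inequality this gives $\|\varphi_g\| = \|g\|_q$, so $g \mapsto \varphi_g$ is isometric for these $p$. The endpoint $p = 1$ needs a different test function.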
[quotetheorem:2641]
[citeproof:2641]
The theorem identifies the dual of $L^p$ with $L^q$ for $p \in (1, \infty)$, but the endpoints require separate comment. The exclusion of $p = \infty$ is genuine and not an oversight: the dual of $L^\infty(\mu)$ is strictly larger than $L^1(\mu)$ in general. To see why, consider $\mu = \mathcal{L}^1$ on $[0,1]$. The space $L^\infty[0,1]$ contains $C([0,1])$ as a closed subspace, and the functional $\psi(f) = f(0)$ (evaluation at $0$) is a bounded linear functional on $C([0,1])$ with $\|\psi\| = 1$. By the Hahn-Banach theorem, $\psi$ extends to a bounded linear functional $\Psi$ on $L^\infty[0,1]$. If $\Psi$ were represented by some $g \in L^1$, then we would need $f(0) = \int_0^1 f(x) g(x) \, dx$ for all $f \in C([0,1])$. Take $f_n(x) = \max(1 - nx, 0)$, which converges to $0$ a.e. while $f_n(0) = 1$. Since $|f_n g| \le |g|$, dominated convergence would give $\int f_n g \, dx \to 0 \ne 1$, a contradiction. So the image of $L^1[0,1]$ in $L^\infty[0,1]^*$ is a proper subspace, and the remaining elements include functionals given by finitely additive set functions that are not $\sigma$-additive measures.
For $p = 1$, the hypothesis of $\sigma$-finiteness is essential. On a non-$\sigma$-finite space, $L^1(\mu)^*$ can be larger than $L^\infty(\mu)$. A concrete obstruction: in the isometry proof, we used $\sigma$-finiteness to find a set $B$ of positive finite measure inside $\{|g| > s\}$. Without $\sigma$-finiteness, every set of positive measure might have infinite measure, making it impossible to construct the test function $\mathbb{1}_B \in L^1$.
[example: Reflexivity of $L^p$ for $p \in (1, \infty)$]
The dual identification gives an immediate structural consequence: for $p \in (1, \infty)$ and any measure space $(\Omega, \Sigma, \mu)$, the space $L^p(\mu)$ is reflexive.
To see this concretely on $L^2([0,1])$: the dual $(L^2)^*$ is identified with $L^2$ itself via the map $g \mapsto \varphi_g$ (since the Hölder conjugate of $2$ is $2$). The bidual $(L^2)^{**}$ is therefore identified with $(L^2)^* \cong L^2$. The canonical embedding $\iota : L^2 \to (L^2)^{**}$ sends $f$ to the functional $\hat{f}(\psi) = \psi(f)$. Under the identification $(L^2)^{**} \cong L^2$, the functional $\hat{f}$ corresponds to $f$ itself: for any $g \in L^2$,
\begin{align*}
\hat{f}(\varphi_g) = \varphi_g(f) = \int_0^1 f(x) g(x) \, dx = \varphi_f(g),
\end{align*}
so under these identifications the canonical embedding sends $f$ to $\varphi_f$. Conversely, given any $\Phi \in (L^2)^{**}$, the map $g \mapsto \Phi(\varphi_g)$ is a bounded linear functional on $L^2$, hence equals $\varphi_f$ for some $f \in L^2$. Then $\Phi(\varphi_g) = \varphi_f(g) = \hat{f}(\varphi_g)$ for all $g$, and since every element of $(L^2)^*$ has the form $\varphi_g$, we get $\Phi = \hat{f}$. Thus $\iota$ is surjective and $L^2([0,1])$ is reflexive.
The same argument works for any $p \in (1, \infty)$: the two isometric isomorphisms $L^q \cong (L^p)^*$ and $L^p \cong (L^q)^*$ compose to show that the canonical embedding $L^p \to (L^p)^{**}$ is surjective. The endpoints behave differently. For $\sigma$-finite $\mu$ we have $(L^1)^* \cong L^\infty$, but $(L^\infty)^*$ is in general strictly larger than $L^1$, so the canonical embedding of $L^1$ into its bidual is not surjective. Since a Banach space is reflexive if and only if its dual is, $L^\infty$ fails to be reflexive as well. Both failures occur whenever the spaces are infinite-dimensional, for example for Lebesgue measure on $[0,1]$.
[/example]
## The Dual of $C(K)$ and the Riesz Representation Theorem
The $L^p$ dual identification relied on the ambient measure $\mu$ from the start. What if there is no measure at all — just a compact Hausdorff space $K$ and the space $C(K)$ of continuous functions on it? A bounded linear functional $\varphi : C(K) \to \mathbb{C}$ assigns a number to each continuous function, linearly and continuously. The question is: must $\varphi$ be of the form $\varphi(f) = \int_K f \, d\nu$ for some measure $\nu$ on $K$? There is no measure given in advance, so the theorem must construct one from $\varphi$. This is the content of the Riesz representation theorem, and it is more subtle than the $L^p$ case precisely because we have to build the measure from scratch using only the topological structure of $K$.
We now turn to the second main identification. Let $K$ be a compact Hausdorff topological space. Define:
\begin{align*}
C(K) &= \{f : K \to \mathbb{C} : f \text{ continuous}\}, \\
C_\mathbb{R}(K) &= \{f : K \to \mathbb{R} : f \text{ continuous}\}, \\
C_+(K) &= \{f \in C_\mathbb{R}(K) : f \ge 0\}.
\end{align*}
The space $C(K)$ is a Banach space under the uniform norm $\|f\|_\infty = \sup_K |f|$. We write $M(K) = C(K)^*$ for the dual, and define subsets
\begin{align*}
M_\mathbb{R}(K) &= \{\varphi \in M(K) : \varphi(f) \in \mathbb{R} \text{ for all } f \in C_\mathbb{R}(K)\}, \\
M_+(K) &= \{\varphi \in M(K) : \varphi(f) \ge 0 \text{ for all } f \in C_+(K)\}.
\end{align*}
Elements of $M_+(K)$ are called **positive linear functionals**. The strategy of the Riesz representation theorem is to work primarily with $M_+(K)$, because the following lemma reduces everything to the positive case.
[quotetheorem:2642]
[citeproof:2642]
This lemma is the starting point: to describe all of $M(K)$, it suffices to describe $M_+(K)$, the positive functionals. Part (iv) is the functional-analytic analogue of the Jordan decomposition for signed measures — every real functional splits canonically into a difference of positive functionals, with the total mass equal to the sum. This is not an accident: the proof of the Riesz theorem will reveal that positive functionals correspond exactly to positive measures, and the Jordan decomposition of functionals then mirrors the Jordan decomposition of signed measures. Part (iii) identifies positive functionals by a norm condition: the norm is attained at $\mathbb{1}_K$ if and only if the functional is positive, which is a remarkably clean characterisation with no explicit reference to the ordering on $C(K)$.
### Topological Preliminaries: Urysohn and Partitions of Unity
Before proving the Riesz theorem, we collect the topological tools that make the proof work. Their role is to bridge the gap between sets (in the Borel $\sigma$-algebra) and continuous functions. The Riesz theorem constructs a measure from a functional: we need continuous functions to "approximate" indicator functions of sets, and this requires that the topology on $K$ is rich enough to separate closed sets by continuous functions. Compact Hausdorff spaces have exactly this richness, through Urysohn's lemma.
Recall that a compact Hausdorff space $K$ is **normal**: disjoint closed subsets can be separated by disjoint open sets. Normality is stronger than the Hausdorff property, and for compact spaces it follows from it.
**Urysohn's Lemma** (quoted without proof): If $E, F \subset K$ are disjoint closed sets, there exists $f : K \to [0,1]$ continuous with $f \equiv 0$ on $E$ and $f \equiv 1$ on $F$. Equivalently, if $E \subset U$ with $E$ closed and $U$ open, there exists a continuous $f : K \to [0,1]$ with $E \prec f \prec U$, meaning $f \equiv 1$ on $E$ and $\operatorname{supp}(f) \subset U$.
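When $K$ happens to be metrisable, the Urysohn function can be written down explicitly. The sketch below assumes a metric $d$ inducing the topology of $K$ and non-empty disjoint closed sets $E, F$; the metric is an extra assumption that the general lemma does not need.

```latex
\begin{align*}
d(x, E) &:= \inf_{y \in E} d(x, y), \qquad
f(x) := \frac{d(x, E)}{d(x, E) + d(x, F)}, \\
f|_E &\equiv 0, \qquad f|_F \equiv 1, \qquad 0 \le f \le 1 .
\end{align*}
```

The denominator never vanishes: $d(x, E) = 0$ exactly when $x \in E$ because $E$ is closed, and likewise for $F$, while $E \cap F = \emptyset$. Continuity of $f$ follows from continuity of $x \mapsto d(x, E)$ and $x \mapsto d(x, F)$.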
[quotetheorem:2643]
[citeproof:2643]
Partitions of unity are the workhorses of the existence proof for the Riesz measure: they allow us to decompose a continuous function $f$ into pieces supported on small open sets, apply the functional to each piece, and estimate the resulting sum against a pre-measure built from $\varphi$. Without the Hausdorff condition on $K$, points could not be separated and Urysohn's lemma would fail, meaning we could not construct the continuous functions needed to approximate indicators of closed sets.
### Borel Measures and Regularity
Not every Borel measure on a topological space behaves well with respect to the topology. A Borel measure that assigns large mass to topologically thin sets, or that cannot be approximated from inside by compact sets, would not interact predictably with continuous functions. The Riesz theorem identifies functionals on $C(K)$ with measures, so the measures that arise must be compatible with the topology in a precise sense: they must be determined by their values on open sets (outer regularity) and on compact sets (inner regularity). These two properties together are called **regularity**.
[definition: Regular Borel Measure]
Let $X$ be a topological space with Borel $\sigma$-algebra $\mathcal{B}(X)$. A positive Borel measure $\mu$ on $X$ is **regular** if:
(i) $\mu(E) < \infty$ for all compact $E \subset X$;
(ii) $\mu(A) = \inf\{\mu(U) : A \subset U \text{ open}\}$ for all $A \in \mathcal{B}(X)$ (outer regularity);
(iii) $\mu(U) = \sup\{\mu(E) : E \subset U,\, E \text{ compact}\}$ for all open $U$ (inner regularity).
A complex Borel measure $\nu$ is regular if $|\nu|$ is a regular positive Borel measure.
[/definition]
For a compact Hausdorff $K$, a positive Borel measure $\mu$ is regular if and only if $\mu(K) < \infty$ and $\mu$ is outer regular; inner regularity then follows by taking complements.
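The phrase "by taking complements" can be unpacked as follows. This is a sketch of the argument under the stated hypotheses ($K$ compact Hausdorff, $\mu(K) < \infty$, $\mu$ outer regular); the names $V$ and $E$ are introduced here. Fix an open set $U \subset K$ and $\varepsilon > 0$.

```latex
\begin{align*}
&\text{Outer regularity at the closed set } K \setminus U: \quad
\exists\, V \supset K \setminus U \text{ open with } \mu(V) < \mu(K \setminus U) + \varepsilon . \\
&\text{Set } E := K \setminus V . \text{ Then } E \text{ is closed in } K, \text{ hence compact, and } E \subset U . \\
&\mu(E) = \mu(K) - \mu(V) > \mu(K) - \mu(K \setminus U) - \varepsilon = \mu(U) - \varepsilon .
\end{align*}
```

Since $\varepsilon$ was arbitrary, $\mu(U) = \sup\{\mu(E) : E \subset U \text{ compact}\}$. Finiteness of $\mu(K)$ is what makes the subtractions legitimate.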
Why is regularity the right condition? Borel measures that are not regular can ignore the topology entirely. For instance, let $\mu$ be the counting measure on $[0,1]$ with its usual topology, so that $\mu(A) = |A|$. This is a Borel measure, but it fails condition (i), since the compact set $[0,1]$ has infinite measure, and it fails outer regularity: $\mu(\{0\}) = 1$, while every open set $U \ni 0$ is infinite and so has $\mu(U) = \infty$. (By contrast, the counting measure on a discrete space is regular: there the compact sets are exactly the finite sets and every set is open.) Finite examples are harder to find: on a compact metric space every finite Borel measure is automatically regular, so a finite non-regular Borel measure can only live on a non-metrisable compact Hausdorff space. The Riesz theorem guarantees that the measure it constructs is always regular, and regularity is what makes the measure unique.
Integration against a complex Borel measure $\nu$ is defined via the Jordan decomposition $\nu = \nu_1 - \nu_2 + i(\nu_3 - \nu_4)$: a measurable $f$ is $\nu$-integrable if $\int |f| \, d|\nu| < \infty$, and
\begin{align*}
\int_\Omega f \, d\nu := \int_\Omega f \, d\nu_1 - \int_\Omega f \, d\nu_2 + i\int_\Omega f \, d\nu_3 - i\int_\Omega f \, d\nu_4.
\end{align*}
The triangle inequality $|\int f \, d\nu| \le \int |f| \, d|\nu|$ holds. Every $f \in C(K)$ is $\nu$-integrable for any complex Borel measure $\nu$ on compact $K$, since $\int |f| \, d|\nu| \le \|f\|_\infty |\nu|(K) < \infty$.
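A small worked case makes the definition and the triangle inequality concrete. The measure $\nu = \delta_0 + i\,\delta_1$ on $K = [0,1]$ (a combination of Dirac measures at the endpoints) is an assumption chosen for illustration.

```latex
\begin{align*}
\nu_1 &= \delta_0, \quad \nu_2 = 0, \quad \nu_3 = \delta_1, \quad \nu_4 = 0, \qquad
\int_K f \, d\nu = f(0) + i f(1), \\
|\nu| &= \delta_0 + \delta_1, \qquad |\nu|(K) = 2, \\
\left| \int_K f \, d\nu \right| &= |f(0) + i f(1)| \le |f(0)| + |f(1)| = \int_K |f| \, d|\nu| \le 2 \|f\|_\infty .
\end{align*}
```

The value $|\nu|(K) = 2$ comes from the partition $\{0\} \sqcup \{1\} \sqcup (0,1)$, which gives $|1| + |i| + 0 = 2$, together with the upper bound $|\nu| \le \nu_1 + \nu_3$.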
### The Riesz Representation Theorem
[quotetheorem:221]
[citeproof:221]
The Riesz theorem is a profound statement about the relationship between topology and measure theory, and each hypothesis is load-bearing. Compactness of $K$ matters: on a non-compact locally compact Hausdorff space $X$ the statement changes substantially. For example, $f \mapsto \int_{\mathbb{R}} f \, d\mathcal{L}^1$ is a positive linear functional on $C_c(\mathbb{R})$ (continuous functions of compact support) that is unbounded in the uniform norm and is represented by an infinite measure; to recover a duality with finite measures one works with $C_0(X)$ or with the one-point compactification. The Hausdorff condition ensures normality, which is needed for Urysohn's lemma and hence for partitions of unity; without it, the continuous functions might be too few to separate closed sets and the construction of $\mu^*$ would break down. Regularity is what makes the representing measure unique: without the regularity requirement, two distinct finite Borel measures on a (necessarily non-metrisable) compact Hausdorff space can assign the same integral to every continuous function, and uniqueness would fail.
[quotetheorem:2644]
[citeproof:2644]
This theorem completes the circle: every bounded linear functional on $C(K)$ is exactly integration against a regular complex Borel measure, and the correspondence is isometric. The result has wide implications. The weak-$*$ topology on $M(K) = C(K)^*$, in which a net $\nu_\alpha$ converges to $\nu$ if $\int f \, d\nu_\alpha \to \int f \, d\nu$ for every $f \in C(K)$, is the natural topology for sequences of probability measures in analysis and probability. The Helly selection theorem (any bounded sequence of positive measures on $[0,1]$ has a weak-$*$ convergent subsequence) follows from the Banach-Alaoglu theorem (Chapter 3) applied to $M([0,1]) = C([0,1])^*$, together with the separability of $C([0,1])$, which makes bounded subsets of the dual weak-$*$ metrisable. The space $\mathcal{P}(K)$ of probability measures on $K$ (i.e., regular positive Borel measures with total mass $1$) is then a compact convex subset of the unit ball of $M(K)$ in the weak-$*$ topology, making it amenable to fixed-point theorems such as Markov-Kakutani. The extreme points of $\mathcal{P}(K)$ are the Dirac measures; they reappear in Chapter 4 in the description of the extreme points of the closed unit ball of $M(K)$, which are the unimodular multiples $\lambda \delta_z$ with $|\lambda| = 1$.
[remark: Completeness of the Measure Space]
Since dual spaces are always Banach spaces and isometric isomorphisms preserve completeness, this theorem implies that the space of regular complex Borel measures on $K$ is a Banach space under the total variation norm $\|\nu\|_1 = |\nu|(K)$. The isometric isomorphism $C(K)^* \cong \{\text{regular complex Borel measures on } K\}$ is sometimes stated as the defining property of the **space of measures** $M(K)$ (not to be confused with a measure space $(\Omega, \Sigma, \mu)$).
[/remark]
The Banach space structure on $M(K)$ is not immediately obvious from the measure-theoretic side: why should the total variation norm make the regular measures complete? The theorem answers this indirectly: since $C(K)^*$ is always a Banach space (a general fact about duals of Banach spaces), and the isomorphism is isometric, the completeness transfers. Trying to prove completeness of $M(K)$ directly from measure theory would require checking that every Cauchy sequence of regular measures converges to a regular measure, which is possible but more involved.
[example: Dirac Measures as Functionals]
For any fixed $z \in K$, the evaluation functional $\varphi_z : C(K) \to \mathbb{C}$, $\varphi_z(f) = f(z)$, is a bounded linear functional on $C(K)$ with $\|\varphi_z\| = 1$ (it is bounded since $|f(z)| \le \|f\|_\infty$, and the norm is attained by the constant function $\mathbb{1}_K$). By the Riesz representation theorem, there exists a unique regular complex Borel measure $\delta_z$ on $K$ such that
\begin{align*}
\int_K f \, d\delta_z = f(z) \quad \text{for all } f \in C(K).
\end{align*}
This is the **Dirac measure** at $z$: it is the positive measure satisfying $\delta_z(\{z\}) = 1$ and $\delta_z(K \setminus \{z\}) = 0$. The Riesz theorem thus guarantees the existence of Dirac measures as a special case, and identifies evaluation functionals with point masses.
Note that $\|\delta_z\|_1 = \delta_z(K) = 1 = \|\varphi_z\|$, confirming the isometric nature of the correspondence. Moreover, different points give different measures: if $z \ne w$ in $K$, Urysohn's lemma gives $f \in C(K)$ with $f(z) = 1$ and $f(w) = 0$, so $\int f \, d\delta_z = 1 \ne 0 = \int f \, d\delta_w$, hence $\delta_z \ne \delta_w$ as measures. The map $z \mapsto \delta_z$ is therefore injective. In the total variation norm its image is discrete, since $\|\delta_z - \delta_w\|_1 = 2$ whenever $z \ne w$. In the weak-$*$ topology, however, the map is continuous, since $\int f \, d\delta_{z_\alpha} = f(z_\alpha) \to f(z) = \int f \, d\delta_z$ whenever $z_\alpha \to z$ in $K$. Being a continuous injection from a compact space into a Hausdorff space, it is a homeomorphism of $K$ onto its image in $M(K)$ with the weak-$*$ topology.
[/example]
The chapter culminates in a unified picture: the dual of $L^p(\mu)$ for $1 \le p < \infty$ is $L^q(\mu)$ with $1/p + 1/q = 1$ (via the integration pairing, with $\mu$ assumed $\sigma$-finite in the case $p = 1$), and the dual of $C(K)$ is the space of regular complex Borel measures on $K$ (also via integration). Both are instances of the general principle that bounded linear functionals on function spaces are represented by integration against a suitable function or measure.
With the norm topology well understood and duals computed, Chapter 3 introduces coarser topologies on Banach spaces in which bounded sets can be compact even when they are not norm-compact. The central result is the Banach-Alaoglu theorem, which shows that the closed unit ball of a dual space is always weak-$*$ compact; as a consequence, bounded sequences in a separable reflexive space have weakly convergent subsequences. Both facts are cornerstones of the calculus of variations and optimization.
# 3. Weak Topologies
The norm topology on a Banach space is in many ways too strong: bounded sequences need not have convergent subsequences, and closed bounded sets need not be compact. A vivid illustration: in $L^2([0,1])$, the closed unit ball is not norm-compact, so a minimisation problem (find $u$ in the unit ball minimising some functional) may have no solution, because a minimising sequence can fail to have any norm-convergent subsequence. The weak and weak-$*$ topologies, which are coarser (fewer open sets), repair exactly this: the closed unit ball of $X^*$ is always $w^*$-compact. We begin with the general theory of weak topologies generated by families of functions, then specialise to the two key examples arising from a normed space and its dual. The chapter culminates in the Banach-Alaoglu theorem, Mazur's theorem, and the characterisation of reflexivity in terms of weak compactness.
## Weak Topologies Generated by a Family of Functions
Let $X$ be a set and let $\mathcal{F}$ be a family of functions, where each $f \in \mathcal{F}$ maps $X$ into some topological space $Y_f$ (the target may vary with $f$). The central question is: what is the smallest topology on $X$ that makes all members of $\mathcal{F}$ continuous simultaneously?
[definition: Weak Topology Generated by a Family]
Let $X$ be a set and $\mathcal{F} = \{f : X \to Y_f\}$ a family of functions into topological spaces $Y_f$. The **weak topology generated by $\mathcal{F}$**, denoted $\sigma(X, \mathcal{F})$, is the smallest topology on $X$ such that every $f \in \mathcal{F}$ is continuous.
[/definition]
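This topology exists. The family of all topologies on $X$ with respect to which every $f \in \mathcal{F}$ is continuous is non-empty (it contains the discrete topology), and the intersection of any non-empty family of topologies is again a topology. Each set $f^{-1}(U)$ with $U \subset Y_f$ open lies in every member of the family, hence in the intersection, so the intersection is itself a member of the family and is the smallest one.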
The key structural description is in terms of a sub-basis. The collection
\begin{align*}
\mathcal{S} = \{f^{-1}(U) : f \in \mathcal{F},\ U \subset Y_f \text{ open}\}
\end{align*}
is a sub-basis for $\sigma(X, \mathcal{F})$. Unrolling what this means for open sets:
A set $V \subset X$ is open in $\sigma(X, \mathcal{F})$ if and only if for every $x \in V$, there exist $n \in \mathbb{N}$, functions $f_1, \ldots, f_n \in \mathcal{F}$, and open sets $U_i \subset Y_{f_i}$ such that
\begin{align*}
x \in \bigcap_{i=1}^n f_i^{-1}(U_i) \subset V.
\end{align*}
The phrase "finitely many $f \in \mathcal{F}$" is essential: one only imposes constraints on finitely many members of the family at a time.
[remark: Sub-basis Refinement]
If $\mathcal{S}_f$ is a sub-basis for the topology on $Y_f$ for each $f \in \mathcal{F}$, then $\{f^{-1}(U) : f \in \mathcal{F},\ U \in \mathcal{S}_f\}$ is already a sub-basis for $\sigma(X, \mathcal{F})$. In particular, to verify that a map into $(X, \sigma(X,\mathcal{F}))$ is continuous, it suffices to check pre-images of sub-basis elements.
[/remark]
The most useful property of the weak topology is its universal property, which reduces continuity of a map into $(X, \sigma(X, \mathcal{F}))$ to a family of continuity conditions, one for each $f \in \mathcal{F}$.
[quotetheorem:2645]
[citeproof:2645]
The universal property is the reason weak topologies are so powerful: to check continuity of a map $g : Z \to (X, \sigma(X, \mathcal{F}))$, one never needs to understand the full topology on $X$. It suffices to check each composed scalar (or target-space) map $f \circ g$ one at a time. In the context of normed spaces, this means checking weak continuity of a map reduces to checking each functional evaluation separately, which is often tractable. The separation hypothesis is also worth noting: if $\mathcal{F}$ separates the points of $X$ (for every $x \neq y$, some $f \in \mathcal{F}$ satisfies $f(x) \neq f(y)$) and each target space $Y_f$ is Hausdorff, then $\sigma(X, \mathcal{F})$ is Hausdorff, since two distinct points can be separated by the pre-images under $f$ of disjoint open sets around their different images in $Y_f$.
### Classical Topologies as Weak Topologies
Before specialising to normed spaces, it is instructive to see that two familiar topologies are already weak topologies in disguise.
[example: Subspace Topology]
Let $X$ be a topological space, $Y \subset X$ a subset, and $i : Y \hookrightarrow X$ the inclusion map. Then $\sigma(Y, \{i\})$ is precisely the subspace topology on $Y$: the sub-basis consists of sets $i^{-1}(U) = U \cap Y$ for $U$ open in $X$, which is exactly how the subspace topology is defined.
[/example]
A more elaborate but equally familiar example is the product topology, which fits the same template once one views the projection maps as the generating family. The crucial subtlety, as we will see, is that an open set in the product can constrain only finitely many coordinates — a feature that traces back directly to the finite intersection in the sub-basis description.
[example: Product Topology]
Let $(X_\gamma)_{\gamma \in \Gamma}$ be a family of topological spaces and $X = \prod_{\gamma \in \Gamma} X_\gamma$. Think of elements of $X$ as functions $x : \Gamma \to \bigsqcup_{\gamma} X_\gamma$ with $x(\gamma) \in X_\gamma$. Define the projection maps $\pi_\gamma : X \to X_\gamma$ by $\pi_\gamma(x) = x(\gamma)$. Then the product topology on $X$ is exactly $\sigma(X, \{\pi_\gamma : \gamma \in \Gamma\})$.
Explicitly, a set $V \subset X$ is open if and only if for every $x \in V$, there exist finitely many indices $\gamma_1, \ldots, \gamma_n \in \Gamma$ and open neighbourhoods $U_i$ of $x(\gamma_i)$ in $X_{\gamma_i}$ such that $\{y \in X : y(\gamma_i) \in U_i,\ 1 \leq i \leq n\} \subset V$. All other coordinates are unconstrained.
[/example]
These two examples produce well-behaved topologies because the generating family separates points. To see that this hypothesis cannot be omitted — and to motivate the role Hahn-Banach will play later — it is worth examining what goes wrong when the family is too small.
[example: Hypothesis Failure — When the Topology Collapses]
To see why point-separation by $\mathcal{F}$ is indispensable for the Hausdorff property, consider $X = \mathbb{R}^2$ with $\mathcal{F} = \{\pi_1\}$ (projection onto the first coordinate). The open sets of $\sigma(\mathbb{R}^2, \{\pi_1\})$ are exactly the sets $\pi_1^{-1}(U) = U \times \mathbb{R}$ for $U$ open in $\mathbb{R}$, that is, unions of vertical strips. The two points $(0, 0)$ and $(0, 1)$ have identical first coordinates, so $\pi_1$ does not separate them. Any open set containing $(0, 0)$ has the form $U \times \mathbb{R}$ with $0 \in U$, and so automatically contains $(0, 1)$ as well. Thus the topology is not Hausdorff, and one cannot distinguish the two points by open sets. This failure is not exotic: the weak topology on a vector space has the same defect if the generating family $F$ of functionals is too small to separate points (e.g., if only the zero functional is available). Hahn-Banach guarantees that the dual separates the points of a normed space, which is precisely what excludes this collapse.
[/example]
## Metrizability of Weak Topologies
In analysis, it is often useful to know when a topology is metrizable, since in metric spaces one can work with sequences rather than nets. The following proposition gives a sufficient condition.
[quotetheorem:2646]
[citeproof:2646]
Both hypotheses — countability of the family and point-separation — are essential. Without countability, the construction of a single weighted metric $d$ collapses: uncountable products of metric spaces are generally not metrizable, the standard counterexample being $\{0,1\}^{[0,1]}$ with the product topology, which is compact (by Tychonov below) but not first countable. Without separation, the proposed $d$ degenerates to a pseudometric and the resulting topology is not Hausdorff. We will see this proposition resurface in the proof that $(B_{X^*}, w^*)$ is metrizable when $X$ is separable: a countable dense sequence in $X$ provides the countable separating family of evaluations.
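For orientation, one standard choice of such a weighted metric can be written down explicitly. The sketch below assumes a countable separating family $\{f_n\}_{n \geq 1}$ with each target a metric space $(Y_n, d_n)$; the symbols $Y_n$, $d_n$ and the weights $2^{-n}$ are illustrative choices and may differ from the normalisation used in the proof above.
\begin{align*}
d(x, y) &:= \sum_{n=1}^\infty 2^{-n} \min\{1,\ d_n(f_n(x), f_n(y))\}, \\
d(x, y) = 0 &\iff f_n(x) = f_n(y) \text{ for all } n \iff x = y \quad \text{(by point-separation)}.
\end{align*}
The truncation by $1$ makes the series converge, and the terms with $n > N$ contribute at most $2^{-N}$ in total, so a small $d$-ball constrains only finitely many $f_n$ up to a small error, matching the finite intersections in the sub-basis description. If the family fails to separate points, the first equivalence still holds but the second does not, and $d$ is only a pseudometric.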
## Tychonov's Theorem
The most important compactness theorem in topology, Tychonov's theorem, is best understood as a statement about weak topologies: the product topology (itself a weak topology) preserves compactness.
[quotetheorem:2647]
[citeproof:2647]
Tychonov's theorem rests squarely on the Axiom of Choice, via Zorn's lemma; the argument fails without it. In fact, Tychonov's theorem in its full generality (arbitrary products of compact spaces) is equivalent to the Axiom of Choice. A sharp contrast: sequential compactness does not behave well for uncountable products. The product $\{0,1\}^{[0,1]}$ is compact by Tychonov's theorem but not sequentially compact. For our purposes, the key invocation of Tychonov is in the proof of the Banach-Alaoglu theorem, where $B_{X^*}$ is embedded as a closed subset of a (generally uncountable) product of compact disks; it is this uncountability that makes Tychonov's theorem, rather than a sequential diagonal argument, the correct tool.
## Weak Topologies on Vector Spaces
In normed spaces, the norm topology often forces sequences to carry more information than is structurally necessary. A linear functional $f \in X^*$ defines an equivalence relation on $X$ (two points are "equivalent" if $f$ cannot distinguish them), and the norm topology on $X$ knows about all functionals simultaneously. The question is whether one can impose a coarser topology — one generated by a designated subspace $F$ of functionals — without losing separation of points. This is the abstract framework for both the weak topology and the weak-$*$ topology.
Let $E$ be a real or complex vector space and $F$ a subspace of the space of all linear functionals on $E$ (no topology on $E$ is assumed, so there is no continuity requirement). Assume $F$ separates the points of $E$; by linearity this is equivalent to: for every $x \neq 0$ in $E$, there exists $f \in F$ with $f(x) \neq 0$.
For each $f \in F$, the map $P_f : E \to \mathbb{R}$ defined by $P_f(x) = |f(x)|$ is a seminorm. The family $\mathcal{P} = \{P_f : f \in F\}$ is a family of seminorms separating points, so it defines a locally convex space (lcs) structure on $E$. The locally convex topology from $\mathcal{P}$ is precisely $\sigma(E, F)$: a set $U \subset E$ is open if and only if for every $x \in U$ there exist $n$, functionals $f_1, \ldots, f_n \in F$ and $\varepsilon > 0$ such that
\begin{align*}
\{y \in E : |f_i(y) - f_i(x)| < \varepsilon \text{ for all } 1 \leq i \leq n\} \subset U.
\end{align*}
In particular, $\sigma(E, F)$ is Hausdorff and makes addition and scalar multiplication continuous. The following lemma identifies the continuous dual of $(E, \sigma(E,F))$.
[quotetheorem:2648]
[citeproof:2648]
The Kernel Lemma is the algebraic engine behind the dual identification: the key point is that finiteness of the family $\{g_i\}$ is essential. If one tried to use an infinite family $\{g_i\}_{i=1}^\infty$ with $\bigcap_{i=1}^\infty \ker(g_i) \subset \ker(f)$, the conclusion $f \in \operatorname{span}\{g_i : i \in \mathbb{N}\}$ fails in general — there is no finite linear combination that captures $f$. The proof uses only finite-dimensional linear algebra, which is why the constraint to finitely many $g_i$ in the weak topology (one open set pins down only finitely many functionals at once) is exactly the right finiteness condition.
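A concrete instance of this failure, offered as an illustration rather than as part of the lemma: take $E = c_{00}$ (eventually-zero sequences), let $g_i$ be the $i$-th coordinate functional, and let $f$ be the summation functional. Any finite linear combination of the $g_i$ involves only indices up to some $n$, and the unit vector $e_{n+1}$ then distinguishes it from $f$.
\begin{align*}
g_i(x) &= x_i, \qquad f(x) = \sum_{k=1}^\infty x_k \quad (x = (x_k) \in c_{00}), \\
\bigcap_{i=1}^\infty \ker(g_i) &= \{0\} \subset \ker(f), \\
\Big(\sum_{i=1}^n \lambda_i g_i\Big)(e_{n+1}) &= 0 \neq 1 = f(e_{n+1}) \quad \text{for every } n \text{ and all scalars } \lambda_1, \ldots, \lambda_n.
\end{align*}
So the kernel inclusion holds for the infinite family, yet $f \notin \operatorname{span}\{g_i : i \in \mathbb{N}\}$.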
[quotetheorem:2649]
[citeproof:2649]
This proposition closes the loop: the weak topology generated by $F$ has continuous dual exactly $F$ — no more, no less. This is in stark contrast to the norm topology, where the continuous dual $X^*$ can be much larger than any seed family used to define a coarser topology. The lesson is that the weak topology $\sigma(E, F)$ is precisely the coarsest topology that witnesses $F$ as a dual: one cannot accidentally generate extra continuous functionals by using too many seminorms. This rigidity is what makes weak topologies the natural setting for duality arguments.
## The Weak and Weak-$*$ Topologies
Both the weak topology and the weak-$*$ topology emerge from the same question asked from different sides of the pairing between a normed space and its dual: which linear functionals should be used to generate the coarsening? For the weak topology on $X$, one uses all continuous functionals $f \in X^*$ to keep fewer open sets on $X$ while preserving Hausdorffness (the dual $X^*$ separates points of $X$ by Hahn-Banach, as established in Chapter 1). For the weak-$*$ topology on $X^*$, one goes even further and uses only the evaluations $f \mapsto f(x)$ for $x \in X$, rather than all of $X^{**}$, to generate the topology on $X^*$. The payoff for this second coarsening is dramatic: whereas the unit ball in $X^*$ under the norm topology is compact only in finite dimensions, under the weak-$*$ topology it is always compact.
[definition: Weak Topology on a Normed Space]
Let $X$ be a normed vector space. The **weak topology** (or $w$-topology) on $X$ is $\sigma(X, X^*)$, the smallest topology on $X$ making every continuous linear functional continuous. We write $(X, w)$ for $X$ equipped with this topology. Sets open in $(X, w)$ are called **weakly open**; a sequence $(x_n)$ is said to converge **weakly** to $x$, written $x_n \rightharpoonup x$, if it converges to $x$ in $(X, w)$, equivalently if $f(x_n) \to f(x)$ for every $f \in X^*$.
[/definition]
By Hahn-Banach, $X^*$ separates points of $X$, so the weak topology is Hausdorff. Concretely:
\begin{align*}
U \subset X \text{ is } w\text{-open} \iff \forall x \in U,\ \exists n,\ f_1, \ldots, f_n \in X^*,\ \varepsilon > 0 : \{y : |f_i(y-x)| < \varepsilon,\ 1 \leq i \leq n\} \subset U.
\end{align*}
[definition: Weak-$*$ Topology]
Let $X$ be a normed space. The **weak-$*$ topology** (or $w^*$-topology) on $X^*$ is $\sigma(X^*, X)$, where we identify $X$ with $\hat{X} \subset X^{**}$ via the canonical embedding $x \mapsto \hat{x}$. This is the smallest topology on $X^*$ making every evaluation map $f \mapsto f(x)$ continuous. We write $(X^*, w^*)$. Convergence $f_n \overset{*}{\rightharpoonup} f$ means $f_n(x) \to f(x)$ for every $x \in X$, i.e. pointwise convergence.
[/definition]
The evaluations separate the points of $X^*$, so the weak-$*$ topology is Hausdorff: if $f \neq g$ then there exists $x \in X$ with $f(x) \neq g(x)$, i.e. $\hat{x}(f) \neq \hat{x}(g)$.
### Comparing the Topologies
Two natural topologies on $X$, the weak topology and the norm topology, are related by inclusion:
\begin{align*}
\sigma(X, X^*) \subset \|\cdot\|-\text{topology on }X,
\end{align*}
because every $f \in X^*$ is norm-continuous. The inclusion is strict in infinite dimensions:
[remark: Weak Equals Norm Only in Finite Dimensions]
$\sigma(X, X^*) = \|\cdot\|$-topology on $X$ if and only if $\dim(X) < \infty$. In infinite dimensions the weak topology is strictly coarser: there are norm-open sets that are not weakly open.
[/remark]
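In finite dimensions the coordinate functionals with respect to a basis already generate the norm topology. For the converse, the following sketch (not necessarily the argument used elsewhere in these notes) shows why the open unit ball is never weakly open when $\dim(X) = \infty$. A basic weak neighbourhood $W$ of $0$ contains a subspace $N$ of finite codimension:
\begin{align*}
W = \{y \in X : |f_i(y)| < \varepsilon,\ 1 \leq i \leq n\} \ \supset\ N := \bigcap_{i=1}^n \ker(f_i), \qquad \dim(X / N) \leq n.
\end{align*}
The codimension bound holds because $x \mapsto (f_1(x), \ldots, f_n(x))$ is a linear map into an $n$-dimensional space with kernel $N$. Since $X$ is infinite-dimensional, $N \neq \{0\}$, so $W$ contains a whole line $\{\lambda y : \lambda \text{ scalar}\}$ with $y \neq 0$ and is therefore norm-unbounded. Hence no weak neighbourhood of $0$ fits inside $\{x : \|x\| < 1\}$, and this norm-open set is not weakly open.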
Similarly, on $X^*$ we have the chain
\begin{align*}
\sigma(X^*, X) \subset \sigma(X^*, X^{**}) \subset \|\cdot\|-\text{topology on }X^*,
\end{align*}
with the first inclusion because $\sigma(X^*, X)$ is generated by fewer evaluation functionals (only those in $\hat{X} \subset X^{**}$), and the second because every element of $X^{**}$ is norm-continuous on $X^*$.
[quotetheorem:2650]
[citeproof:2650]
The equivalence in (iii) is more than a curiosity: it gives a purely topological criterion for reflexivity. Reflexivity, defined a priori via the canonical embedding $X \hookrightarrow X^{**}$, turns out to be detectable by comparing two topologies on $X^*$. This foreshadows a deeper theme — explored fully later in this chapter — where reflexivity is equivalent to weak compactness of the unit ball. The pattern is consistent: structural properties of a Banach space find their cleanest expression as compactness or coincidence statements about the weak and weak-$*$ topologies.
## Weak Convergence and Boundedness
Working with the weak topology in full generality requires nets, since the weak topology is rarely first countable. In practice, however, one often wants to extract subsequences and compute weak limits the way one does with norm convergence. The structural results below show that despite the weakness of the topology, sequences still carry essential information: weakly convergent sequences are norm-bounded, and the norm is lower semicontinuous along them. Moreover, weak boundedness of a set is no weaker than norm boundedness — a non-trivial fact that reduces the apparently topological notion of boundedness to a familiar metric one. We treat the weak and weak-$*$ versions in parallel because both rely on the Principle of Uniform Boundedness applied to the canonical embedding $X \hookrightarrow X^{**}$.
[citedefinition:Weak Convergence]
The notion of weak convergence pairs naturally with a notion of weak boundedness for subsets of $X$, and analogously for the dual.
[definition: Weak and Weak-$*$ Boundedness]
A subset $A \subset X$ is **weakly bounded** if $f(A) = \{f(x) : x \in A\}$ is bounded in $\mathbb{R}$ (or $\mathbb{C}$) for every $f \in X^*$. A subset $B \subset X^*$ is **weak-$*$ bounded** if $\{\varphi(x) : \varphi \in B\}$ is bounded for every $x \in X$.
[/definition]
These topological notions of boundedness coincide with the natural norm-boundedness, by the Principle of Uniform Boundedness.
[quotetheorem:2651]
[citeproof:2651]
The asymmetry between (i) and (ii) is worth dwelling on: part (i) holds for any normed space because the dual $X^*$ is automatically complete, but part (ii) requires completeness of $X$ itself. The next example shows this hypothesis is genuinely necessary and not a defect of the proof.
[example: Completeness is Needed for Part (ii)]
The completeness hypothesis in part (ii) is not superfluous. Consider $X = c_{00}$, the space of eventually-zero sequences equipped with the $\ell^1$ norm (so $X^* \cong \ell^\infty$). This is not a Banach space. Define functionals $\varphi_n \in X^*$ by $\varphi_n(x) = n \cdot x_n$ for $x = (x_k) \in c_{00}$. For any fixed $x \in c_{00}$, we have $x_n = 0$ for all but finitely many $n$, so $\varphi_n(x) = 0$ eventually; in particular, $\{\varphi_n(x) : n \in \mathbb{N}\}$ is bounded. Thus $\{\varphi_n\}$ is $w^*$-bounded. However, $\|\varphi_n\| = n \to \infty$, so the set is not norm-bounded. The PUB fails here because $c_{00}$ is incomplete — in a Banach space, pointwise boundedness forces uniform boundedness.
[/example]
The following semi-continuity property of norms under weak and weak-$*$ convergence is a direct consequence of the Principle of Uniform Boundedness.
[quotetheorem:2652]
[citeproof:2652]
Equality need not hold in either inequality: a concrete witness is $X = \ell^2$ and the standard basis $e_n$, which converges weakly to $0$ (since for $f = (a_k) \in \ell^2$, $f(e_n) = a_n \to 0$) yet $\|e_n\| = 1$ for all $n$, so $\|0\| = 0 < 1 = \liminf_n \|e_n\|$. The norm can drop in the weak limit but never jump up. This lower semicontinuity is precisely the property exploited in the calculus of variations: a weakly lower semicontinuous functional, such as the norm, attains its infimum on a weakly compact set, because its value at a weak cluster point of a minimising sequence cannot exceed the infimum.
## The Hahn-Banach Separation Theorems
If the norm topology is too fine and the trivial topology too coarse, the natural question is: can one systematically find linear functionals that "see" geometrically meaningful separations? Given two disjoint convex sets in a locally convex space, the answer is yes — under hypotheses that are both necessary and essentially sharp.
### The Minkowski Functional
The key tool is a device for converting geometric data (the shape of a convex set) into analytic data (a subadditive functional that bounds linear maps). Before Hahn-Banach can be applied to extend a linear functional, one needs a subadditive positively homogeneous functional — a "gauge" — that dominates it. The Minkowski functional of a convex set is exactly this gauge.
[definition: Minkowski Functional]
Let $(X, \mathcal{P})$ be a locally convex space and $C \subset X$ a convex set with $0 \in \operatorname{int}(C)$. The **Minkowski functional** of $C$ is $\mu_C : X \to [0, \infty)$ defined by
\begin{align*}
\mu_C(x) := \inf\{t > 0 : x \in tC\}.
\end{align*}
[/definition]
This is well-defined (the infimum is finite) because scalar multiplication is continuous and $0 \in \operatorname{int}(C)$: for any $x \in X$, the map $\lambda \mapsto \lambda x$ is continuous, so there exists $\delta > 0$ such that $\lambda x \in C$ for all $|\lambda| \leq \delta$, giving $x \in (1/\delta)C$.
[example: Minkowski Functional of the Unit Ball]
If $X$ is a normed space and $C = \{x : \|x\| < 1\}$ is the open unit ball, then $\mu_C = \|\cdot\|$. To see this: for $t > 0$, $x \in tC$ means $\|x\| < t$, so $\inf\{t > 0 : \|x\| < t\} = \|x\|$. The Minkowski functional generalises the norm to convex bodies: any open convex neighbourhood of the origin has a Minkowski functional, and that functional has exactly the analytic properties (subadditivity, positive homogeneity) needed to apply Hahn-Banach.
[/example]
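To see that a Minkowski functional need not be a norm, here is a small illustration with a non-symmetric set; the choice $X = \mathbb{R}$, $C = (-1, 2)$ is made purely for illustration.
\begin{align*}
tC = (-t, 2t), \qquad
\mu_C(x) = \inf\{t > 0 : -t < x < 2t\} =
\begin{cases}
x/2, & x \geq 0, \\
-x, & x < 0.
\end{cases}
\end{align*}
Here $\mu_C(1) = 1/2$ while $\mu_C(-1) = 1$, so $\mu_C(-x) \neq \mu_C(x)$ in general. Since $\mu_C(x) = \max\{x/2, -x\}$ is a maximum of two linear functions, it is positively homogeneous and subadditive, but it is not symmetric. One can also read off $\{x : \mu_C(x) < 1\} = (-1, 2) = C$.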
The properties of the Minkowski functional are what make it the right tool for Hahn-Banach: subadditivity and positive homogeneity are precisely the conditions under which a linear functional dominated by the gauge can be extended from a subspace to the whole space.
[quotetheorem:2653]
<!-- illustration-needed: the Minkowski functional of a convex body — a convex body C in the plane with 0 in its interior, a point x outside C, and the boundary point t*x where the ray from 0 through x first exits C; the value mu_C(x) is the reciprocal of the scaling t needed to bring x back inside C, so mu_C(x) > 1 outside C, mu_C(x) < 1 strictly inside, mu_C(x) = 1 on the boundary -->
[citeproof:2653]
The two containments capture complementary information about the relationship between $\mu_C$ and $C$: the left containment says points where the gauge is strictly less than $1$ already lie inside $C$, while the right says $C$ is contained in the closed sub-level set $\{\mu_C \leq 1\}$. Openness of $C$ promotes the left containment to equality because any interior point can be slightly inflated and still remain in $C$, forcing $\mu_C < 1$. Closedness promotes the right containment to equality because the limit $(1-1/n)x \to x$ is captured. The combination of subadditivity and positive homogeneity is exactly what the abstract Hahn-Banach theorem requires of a dominating functional, which is why $\mu_C$ is the canonical bridge from "convex set" to "extendable linear functional."
### Separation Theorems
When can two disjoint convex sets in a locally convex space be separated by a continuous linear functional? The answer depends on the regularity of the sets: openness on one side is enough for weak separation; compactness on one side and closedness on the other give strict separation. Without convexity, separation by hyperplanes is generally impossible: the half-spaces $\{f < c\}$ and $\{f > c\}$ cut out by a functional are themselves convex, so, for example, a circle in the plane cannot be separated from its centre.
[quotetheorem:2654]
[citeproof:2654]
The hypothesis that $0 \in C$ is purely a normalisation: any non-empty open convex set has a translate containing the origin, and one can recover the general statement by translating. Openness, on the other hand, is essential: for open $C$ one has $C = \{\mu_C < 1\}$, so a point $x_0 \notin C$ satisfies $\mu_C(x_0) \geq 1$ while every $x \in C$ satisfies $\mu_C(x) < 1$, and this gap is what produces the strict inequality between the values of the functional on $C$ and at $x_0$. This one-sided result, separating a single point from an open convex set, is the engine behind the two-set Hahn-Banach separation theorem below: the difference set $A - B$ of two disjoint convex sets is convex and avoids the origin, reducing the two-set separation to point-from-set separation.
<!-- illustration-needed: Hahn-Banach separation of two disjoint convex sets — a compact convex set A and a closed convex set B in the plane with A ∩ B = ∅, and a separating affine line {f = α} that lies strictly between them; the gap reflects strict separation sup_A f < inf_B f, which compactness of A is what guarantees -->
[quotetheorem:974]
[citeproof:974]
The compactness assumption in part (ii) is not a technical convenience: without it, the strict-separation conclusion $\sup_A f < \inf_B f$ can fail outright, even for two disjoint closed convex sets in the plane.
[example: Why Compactness Cannot Be Dropped from Part (ii)]
The hypothesis in part (ii) that $A$ is compact cannot be weakened to merely closed. Consider $X = \mathbb{R}^2$ with the ordinary topology, $A = \{(x, y) : xy \geq 1,\ x > 0,\ y > 0\}$ (the region on or above the branch of the hyperbola $xy = 1$ in the open first quadrant), and $B = \{(x, 0) : x \in \mathbb{R}\}$ (the $x$-axis). Both are closed and convex, and $A \cap B = \varnothing$, but neither is compact and the distance between them is zero, since $(t, 1/t) \in A$ and $(t, 0) \in B$ are at distance $1/t \to 0$. Suppose for contradiction that there is a continuous linear functional $f(x,y) = ax + by$ with $\sup_A f < \alpha < \inf_B f$ for some $\alpha \in \mathbb{R}$. Evaluating on $B$: $f(x, 0) = ax > \alpha$ for all $x \in \mathbb{R}$ forces $a = 0$ (otherwise $ax \to -\infty$ as $x \to \infty$ or as $x \to -\infty$), and then $0 = f(0, 0) > \alpha$. Evaluating on $A$ at the points $(t, 1/t)$ for $t > 0$ gives $b/t < \alpha$ for all $t > 0$; sending $t \to \infty$ yields $0 \leq \alpha$, contradicting $\alpha < 0$. The same argument with the inequalities reversed rules out $\sup_B f < \inf_A f$. So no continuous linear functional strictly separates $A$ and $B$, despite both sets being closed, convex, and disjoint.
[/example]
## Reflexivity, Separability, and the Cantor Universality of Weak Topologies
### Mazur's Theorem
A fundamental observation is that the norm-closed and weakly closed convex sets coincide. This is a powerful tool: norm-closedness is usually the easier property to establish (it can be checked with norm-convergent sequences), while weak closedness, a priori the stronger property because the weak topology has fewer closed sets, is what one needs for weak compactness arguments.
[quotetheorem:985]
[citeproof:985]
This yields a striking corollary about weakly null sequences.
[quotetheorem:2655]
The corollary says that even if the sequence itself does not converge in norm, one can find convex combinations that do. A standard illustration: in $\ell^2$, the standard basis vectors $e_n \rightharpoonup 0$ (since for $f = (a_n) \in \ell^2$, $f(e_n) = a_n \to 0$), yet $\|e_n\| = 1$ for all $n$. However, $\|\frac{1}{n}(e_1 + \cdots + e_n)\|_{\ell^2} = \frac{1}{\sqrt{n}} \to 0$, and each such element lies in $\operatorname{conv}\{e_n\}$.
[citeproof:2655]
More precisely, if $x_n \rightharpoonup 0$, then one can find indices $p_1 < q_1 < p_2 < q_2 < \cdots$ and convex combinations $\sum_{i=p_k}^{q_k} t_i x_i$ (with $t_i \geq 0$, $\sum_{i=p_k}^{q_k} t_i = 1$) that converge to $0$ in norm. This block-convex-combination version is sometimes called Mazur's lemma and is particularly useful in PDE theory, where one has a weakly convergent minimising sequence and needs strong (norm) convergence of a suitable sequence of averages.
### The Banach-Alaoglu Theorem
The norm topology on $X^*$ is too strong for compactness: the unit ball $B_{X^*}$ is norm-compact only when $X$ is finite-dimensional. The weak-$*$ topology repairs this.
[quotetheorem:212]
[citeproof:212]
The Banach-Alaoglu theorem is one of the most important results in functional analysis, and will be a key ingredient in Chapter 4: the weak-$*$ compactness of $B_{X^*}$ is precisely what makes the unit ball of any dual space eligible for the Krein-Milman theorem. But it comes with important caveats. First, the $w^*$-compactness of $B_{X^*}$ does not in general imply sequential compactness: when $X$ is non-separable, bounded sequences in $X^*$ need not have $w^*$-convergent subsequences. For instance, for $X = \ell^\infty$ the unit ball $B_{(\ell^\infty)^*}$ is $w^*$-compact by Banach-Alaoglu but not $w^*$-sequentially compact. Second, the theorem says nothing about norm-compactness: $B_{X^*}$ under the norm topology is compact only in finite dimensions. The correct slogan is: Banach-Alaoglu gives compactness in the weakest reasonable topology.
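A standard witness for the first caveat, sketched here for concreteness, is the sequence of coordinate functionals on $\ell^\infty$. The notation $e_n^*$ for these functionals is introduced only for this illustration.
\begin{align*}
& e_n^*(x) := x_n \quad (x \in \ell^\infty), \qquad \|e_n^*\| = 1, \\
& \text{given } n_1 < n_2 < \cdots: \quad x_{n_k} := (-1)^k,\ x_j := 0 \text{ for } j \notin \{n_k\} \ \Longrightarrow\ e^*_{n_k}(x) = (-1)^k.
\end{align*}
Since $(e^*_{n_k}(x))_k$ does not converge, no subsequence of $(e_n^*)$ converges weak-$*$, even though the whole sequence lies in the $w^*$-compact set $B_{(\ell^\infty)^*}$. Compactness only guarantees a $w^*$-convergent subnet.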
### Separability and Metrizability
The Banach-Alaoglu theorem gives compactness; when $X$ is separable, the weak-$*$ topology on $B_{X^*}$ is additionally metrizable, which allows sequential arguments.
[quotetheorem:2656]
[citeproof:2656]
The combination of Banach-Alaoglu and separability-metrizability is what makes the weak-$*$ topology so effective in practice. When $X$ is separable, one gets not just $w^*$-compactness but also $w^*$-sequential compactness of $B_{X^*}$: one can extract subsequences, run diagonal arguments, and perform all the sequential manipulations that ordinary metric-space topology allows. This sequential compactness is indispensable for applications in PDE and calculus of variations, where one typically works in spaces such as $L^p(\Omega)$ with $1 < p \leq \infty$, which are duals of the separable spaces $L^q(\Omega)$ with $1 \leq q < \infty$, and needs to extract weak-$*$ convergent subsequences from bounded minimising sequences.
[remark: Sequential Weak-$*$ Compactness]
If $X$ is separable, then $(B_{X^*}, w^*)$ is metrizable and compact (by Banach-Alaoglu), hence sequentially compact: every bounded sequence in $X^*$ has a weak-$*$ convergent subsequence.
[/remark]
### Goldstein's Theorem and Reflexivity
How large is $\hat{X}$ inside $X^{**}$? The canonical embedding $J : X \to X^{**}$, $J(x) = \hat{x}$, is always an isometric isomorphism onto its image $\hat{X}$, so $B_X$ maps to an isometric copy $\hat{B}_X$ inside $B_{X^{**}}$. The question is whether anything in $B_{X^{**}}$ is "far" from $\hat{B}_X$ in the $w^*$-topology — that is, whether $\hat{B}_X$ is $w^*$-dense in $B_{X^{**}}$. For non-reflexive spaces, $\hat{X}$ is a proper subspace of $X^{**}$, yet Goldstein's theorem shows the unit balls are as close as they can be in the $w^*$-topology.
[quotetheorem:2657]
[citeproof:2657]
Why does convexity play the crucial role here? The key is that $w^*$-closure is hard to control in general, but for convex sets, the Hahn-Banach separation theorem allows one to work with functionals rather than points. If $\hat{B}_X$ were not convex, the $w^*$-closure might miss pieces of $B_{X^{**}}$ in ways that cannot be separated out by a single functional. One also cannot replace $w^*$-closure by norm-closure: in a non-reflexive Banach space, $\hat{B}_X$ is not norm-dense in $B_{X^{**}}$ (since $\hat{X}$, being complete, is a proper norm-closed subspace of $X^{**}$). The weak-$*$ topology is genuinely coarser in a way that allows the density to hold.
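An immediate consequence: if $X$ is norm-separable, then $X^{**}$ is $w^*$-separable. Indeed, the image of a countable norm-dense subset of $X$ is $w^*$-dense in $\hat{X}$ (the $w^*$-topology is coarser than the norm topology), and $\hat{X}$ is $w^*$-dense in $X^{**}$ by Goldstein's theorem. In particular, $(\ell^\infty)^* \cong (\ell^1)^{**}$ is $w^*$-separable.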
The reflexivity of a Banach space is characterised entirely in terms of the weak compactness of its unit ball.
[quotetheorem:2658]
[citeproof:2658]
The equivalence (i) $\iff$ (iii) is a landmark result: reflexivity has a purely topological characterisation in terms of compactness. Non-reflexive spaces fail this in a very concrete way. For example, $c_0$ (sequences converging to zero) is not reflexive: $(c_0)^* = \ell^1$ and $(c_0)^{**} = \ell^\infty \neq c_0$. The standard basis vectors $e_n \in c_0$ do not witness the failure of compactness, since $e_n \rightharpoonup 0$ (every $f = (f_k) \in \ell^1$ gives $f(e_n) = f_n \to 0$) and $0 \in B_{c_0}$. The partial sums $s_n = e_1 + \cdots + e_n$ do: each $s_n$ lies in $B_{c_0}$, and if $x \in c_0$ were a weak cluster point of $(s_n)$, then testing against the $k$-th coordinate functional $e_k^*$, which satisfies $e_k^*(s_n) = 1$ for all $n \geq k$, would give $x_k = 1$ for every $k$, so $x = (1, 1, 1, \ldots) \notin c_0$. Thus $(s_n)$ has no weak cluster point in $c_0$, and $(B_{c_0}, w)$ is not compact. Similarly, $\ell^1$ is not reflexive, since $(\ell^1)^* = \ell^\infty$ and $(\ell^1)^{**} = (\ell^\infty)^* \supsetneq \ell^1$. Here the unit vectors $\delta_n \in \ell^1$ (the sequence with $1$ in position $n$ and $0$ elsewhere) satisfy $\|\delta_n\|_{\ell^1} = 1$ but do not converge weakly to $0$: the functional $\varphi = (1, 1, 1, \ldots) \in \ell^\infty$ gives $\varphi(\delta_n) = 1$ for all $n$. In fact $(\delta_n)$ has no weak cluster point at all: testing against the coordinate functionals forces any cluster point $x$ to be $0$, while testing against $\varphi$ forces $\varphi(x) = 1$. Hence $(B_{\ell^1}, w)$ is not compact either.
[remark: Separable Reflexive Spaces]
If $X$ is a separable reflexive Banach space, then $(B_X, w)$ is both compact (by the theorem) and metrizable (by Proposition 3.7, since $X^*$ is separable for reflexive separable $X$). Hence $(B_X, w)$ is weakly sequentially compact: every bounded sequence in $X$ has a weakly convergent subsequence. This is a key tool in the calculus of variations and PDE theory — for instance, if $X = H^1_0(\Omega)$ and one has a bounded minimising sequence $(u_n)$, reflexivity and separability together guarantee a weakly convergent subsequence $u_{n_k} \rightharpoonup u$, and one then passes to the limit in the variational problem.
[/remark]
## Separable Spaces and the Cantor Set
The structural results on separability culminate in two universality theorems. The Cantor set $\Delta = \{0,1\}^{\mathbb{N}}$ (with the product topology) is in some sense the "most complicated" compact metric space, yet every compact metric space is its continuous image. This universality of the Cantor set then propagates, via the isometric embedding of normed spaces into $C(K)$ spaces, to show that $C([0,1])$ is universal for all separable normed spaces.
[quotetheorem:2659]
[citeproof:2659]
A natural follow-up is to identify $\Delta$ with a familiar concrete object, which fixes intuition for the abstract Cantor space.
[remark: Cantor Set and the Middle-Third Construction]
The space $\Delta = \{0,1\}^\mathbb{N}$ is homeomorphic to the standard middle-third Cantor set $\mathcal{C} \subset [0,1]$ via $(\varepsilon_i)_{i=1}^\infty \mapsto \sum_{i=1}^\infty (2\varepsilon_i) \cdot 3^{-i}$.
[/remark]
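The map in the remark can be explored numerically. The following is a minimal sketch, assuming we only evaluate finite prefixes (padded with zeros) and use exact rational arithmetic; the function name `cantor_point` is hypothetical and chosen for illustration.

```python
from fractions import Fraction

def cantor_point(bits):
    """Map a finite 0/1 prefix (eps_1, ..., eps_n) to sum_i 2*eps_i * 3**(-i).

    The prefix stands for the infinite sequence obtained by padding with zeros.
    """
    return sum(Fraction(2 * eps, 3 ** i) for i, eps in enumerate(bits, start=1))

# (1, 0, 0, ...) maps to 2/3, the left endpoint of the right-hand third.
right = cantor_point([1])

# (0, 1, 1, ..., 1) with 20 ones lies just below 1/3, the right endpoint
# of the left-hand third; the gap to 1/3 is exactly 3**(-21).
left = cantor_point([0] + [1] * 20)
```

Sequences starting with $0$ land in $[0, 1/3]$ and sequences starting with $1$ land in $[2/3, 1]$, which mirrors the first step of the middle-third construction.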
The theorem above is the nontrivial direction of a characterisation: a non-empty Hausdorff space $K$ is a continuous image of $\Delta$ if and only if $K$ is compact and metrizable. The converse direction (any Hausdorff continuous image of $\Delta$ is compact and metrizable) is a standard fact of general topology. Together, these statements identify $\Delta$ as the "universal" compact metric space.
[quotetheorem:2660]
[citeproof:2660]
This result shows that $C([0,1])$ is "universal" among separable normed spaces: every separable normed space is, up to isometry, a subspace of $C([0,1])$. The proof assembles nearly every major tool from this chapter: the isometric embedding via the unit ball (requiring Banach-Alaoglu for the compactness of $B_{X^*}$), the metrizability of $(B_{X^*}, w^*)$ from separability, the Cantor-set surjection, and the linear-interpolation extension — a complete picture of how weak topologies and separability interact.
The weak compactness of the unit ball motivates a geometric question: which compact convex sets can be recovered from their extreme points? Chapter 4 proves the Krein-Milman theorem (every compact convex subset of a locally convex space is the closed convex hull of its extreme points) and then applies it to extreme points of dual balls, culminating in the Banach-Stone theorem: the isometry class of $C(K)$ completely determines the topology of $K$.
# 4. Convexity and the Krein-Milman Theorem
This chapter demonstrates the full power of the Hahn-Banach separation theorems developed in the preceding chapter. The central object is the notion of an extreme point of a convex set — a point that cannot be written as a proper convex combination of other points in the set. The Krein-Milman theorem asserts that every compact convex subset of a locally convex space is the closed convex hull of its extreme points, a result both surprising and immensely useful. We also prove a partial converse, characterise the extreme points of certain function-space dual balls, and conclude with the Banach-Stone theorem, which shows that the isometry class of $C(K)$ completely determines the homeomorphism class of the compact space $K$.
## Extreme Points of Convex Sets
Before stating any major theorem, we need to identify the "vertices" of a convex set — the points that cannot be obtained by averaging two distinct points of the set. These are the building blocks from which the whole set can, in favourable circumstances, be reconstructed.
[definition: Extreme Point]
Let $X$ be a real or complex vector space and let $K \subset X$ be a convex subset. A point $x \in K$ is an **extreme point** of $K$ if whenever
\begin{align*}
x = (1-t)y + tz
\end{align*}
for some $y, z \in K$ and $t \in (0,1)$, it follows that $y = z = x$.
We write $\operatorname{Ext}(K)$ for the set of all extreme points of $K$.
[/definition]
Unpacking the definition: $x$ is extreme precisely when it does not lie in the interior of any non-degenerate line segment contained in $K$. The restriction $t \in (0,1)$ (rather than $t \in [0,1]$) is necessary: with $t = 0$ the equation reads $x = y$ and places no constraint on $z$, so the conclusion $y = z = x$ would fail for any $z \neq x$, and no point of a set with more than one element would be extreme.
[example: Extreme Points in Low-Dimensional Sequence Spaces]
Three standard examples build intuition for which points are and are not extreme.
**(i) $\ell^1(\mathbb{R}^2)$** — the space $\mathbb{R}^2$ with norm $\|x\|_1 = |x_1| + |x_2|$. The unit ball $B_{\ell^1}$ is a diamond (square rotated $45^\circ$) with vertices $\{\pm e_1, \pm e_2\}$. Every point on an edge strictly between two vertices can be written as a convex combination of those vertices, so it is not extreme. Hence
\begin{align*}
\operatorname{Ext}(B_{\ell^1}) = \{\pm e_1, \pm e_2\}.
\end{align*}
**(ii) $\ell^2(\mathbb{R}^2)$** — the space $\mathbb{R}^2$ with the Euclidean norm. The unit ball $B_{\ell^2}$ is the closed unit disc. Any point on the boundary circle is extreme: if $x = (1-t)y + tz$ with $\|x\|_2 = 1$ and $y, z \in B_{\ell^2}$, then by the strict convexity of the norm we must have $y = z = x$. Interior points are never extreme. Hence
\begin{align*}
\operatorname{Ext}(B_{\ell^2}) = S_{\ell^2} = \{x \in \mathbb{R}^2 : \|x\|_2 = 1\}.
\end{align*}
**(iii) $c_0$** — the space of sequences converging to zero, with $\|x\|_\infty = \sup_n |x_n|$. Remarkably, the unit ball $B_{c_0}$ has **no** extreme points.
To see this, let $x = (x_n)_n \in B_{c_0}$. Since $x_n \to 0$, there exists some index $n_0$ with $|x_{n_0}| < 1$. Set $\varepsilon = 1 - |x_{n_0}| > 0$, and define sequences $y$ and $z$ by $y_i = z_i = x_i$ for $i \neq n_0$, and $y_{n_0} = x_{n_0} + \varepsilon$, $z_{n_0} = x_{n_0} - \varepsilon$. Then $y, z \in B_{c_0}$ (since $\sup_n |y_n|, \sup_n |z_n| \leq 1$), both $y$ and $z$ still converge to $0$ (differing from $x$ in only one coordinate), and $x = \frac{1}{2}(y + z)$ with $y \neq x$. So $x$ is not extreme. Since $x$ was arbitrary:
\begin{align*}
\operatorname{Ext}(B_{c_0}) = \varnothing.
\end{align*}
[/example]
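The perturbation argument in part (iii) is entirely explicit, so it can be checked on a concrete sequence. The sketch below is illustrative only: it works on a finite prefix of a sequence (entries beyond the prefix are untouched, so they play no role), and the helper name `split_in_unit_ball` is hypothetical.

```python
from fractions import Fraction

def split_in_unit_ball(x):
    """Given a finite prefix x of a sequence in B_{c_0} containing some
    entry of modulus < 1, return (y, z) with x = (y + z)/2 and y != z,
    both still of sup-norm at most 1."""
    # First index n0 with |x_{n0}| < 1 (exists for any element of c_0).
    n0 = next(n for n, xn in enumerate(x) if abs(xn) < 1)
    eps = 1 - abs(x[n0])
    y, z = list(x), list(x)
    y[n0] = x[n0] + eps   # |y_{n0}| <= |x_{n0}| + eps = 1
    z[n0] = x[n0] - eps   # |z_{n0}| <= |x_{n0}| + eps = 1
    return y, z

x = [Fraction(1), Fraction(-1), Fraction(1, 2), Fraction(1, 4)]
y, z = split_in_unit_ball(x)
```

Here the first entry of modulus below $1$ is $x_3 = 1/2$, so $\varepsilon = 1/2$, and $y$ and $z$ differ from $x$ only in that coordinate.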
The third example is a warning: a convex set need not have any extreme points at all. The Krein-Milman theorem will show this cannot happen for compact convex sets.
## The Krein-Milman Theorem
[quotetheorem:2661]
Before proving this, we introduce the key structural tool — the notion of a face — which allows us to systematically "slice" $K$ into smaller and smaller pieces until we isolate an extreme point.
[definition: Face of a Convex Set]
Let $(X, \mathcal{P})$ be a locally convex space and let $K \subset X$ be a non-empty, compact, convex subset. A **face** of $K$ is a non-empty, compact, convex subset $E \subset K$ such that: whenever $y, z \in K$ and $t \in (0,1)$ satisfy $(1-t)y + tz \in E$, we have $y, z \in E$.
In other words, if $E$ contains an interior point of a line segment in $K$, it must contain the entire line segment.
[/definition]
Three fundamental properties of faces make them useful:
**Property 1.** $K$ itself is a face of $K$. Moreover, for any $x \in K$:
\begin{align*}
\{x\} \text{ is a face of } K \iff x \in \operatorname{Ext}(K).
\end{align*}
This is immediate from the definitions.
**Property 2.** For any $f \in X^*$ (with $f$ replaced by $\operatorname{Re} f$ throughout if the scalars are complex), let $\alpha = \sup_K f$. Then the set
\begin{align*}
E = \{x \in K : f(x) = \alpha\}
\end{align*}
is a face of $K$. Such sets are called **supporting hyperplane faces**.
To verify this: $E$ is non-empty because $f$ is continuous and $K$ is compact, so the supremum is attained. $E$ is compact (preimage of $\{\alpha\}$ under a continuous map, intersected with compact $K$) and convex (since $f$ is linear). If $(1-t)y + tz \in E$ with $y, z \in K$ and $t \in (0,1)$, then
\begin{align*}
\alpha = f((1-t)y + tz) = (1-t)f(y) + tf(z) \leq (1-t)\alpha + t\alpha = \alpha,
\end{align*}
with equality only if $f(y) = f(z) = \alpha$, i.e., $y, z \in E$.
Geometrically: we take a hyperplane $\{f = \beta\}$ and push it outward (sending $\beta \to \alpha$) until it reaches the boundary of $K$; the resulting intersection is a face.
**Property 3.** The face relation is transitive: if $E$ is a face of $F$ and $F$ is a face of $K$, then $E$ is a face of $K$.
This follows directly from the definition: if $y, z \in K$ and $(1-t)y + tz \in E \subset F$, then the face property of $F$ gives $y, z \in F$, and then the face property of $E$ within $F$ gives $y, z \in E$.
As a consequence: if $x \in \operatorname{Ext}(E)$ and $E$ is a face of $K$, then $x \in \operatorname{Ext}(K)$.
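Property 2 is easy to visualise on a polygon. The following is a minimal sketch under the assumption that $K \subset \mathbb{R}^2$ is the convex hull of finitely many points, in which case a linear functional attains its supremum at vertices and the face $\{f = \alpha\}$ is the convex hull of the maximising vertices. The function name `exposed_face_vertices` is hypothetical.

```python
def exposed_face_vertices(vertices, f):
    """For K = conv(vertices) in R^2 and a linear functional f given by its
    coefficient pair, return (alpha, V) where alpha = sup_K f and V lists
    the vertices spanning the face {x in K : f(x) = alpha}."""
    values = [f[0] * vx + f[1] * vy for vx, vy in vertices]
    alpha = max(values)
    return alpha, [v for v, val in zip(vertices, values) if val == alpha]

# Vertices of the l^1 unit ball in R^2 (the diamond from the earlier example).
diamond = [(1, 0), (0, 1), (-1, 0), (0, -1)]

# f(x) = x_1 + x_2 is maximised along a whole edge: a one-dimensional face.
alpha_edge, edge = exposed_face_vertices(diamond, (1, 1))

# f(x) = x_1 is maximised at a single vertex: a face {x} with x extreme.
alpha_vertex, vertex = exposed_face_vertices(diamond, (1, 0))
```

The second call illustrates Property 1 as well: a one-point face is exactly an extreme point.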
[citeproof:2661]
<!-- illustration-needed: a compact convex polygon with its extreme points (vertices) marked, alongside an arrow showing how the closed convex hull of the vertices recovers the whole polygon — to build intuition for the Krein-Milman conclusion -->
## The Dual Unit Ball Always Has Extreme Points
A striking immediate consequence concerns dual unit balls.
[quotetheorem:2662]
[citeproof:2662]
The content of this theorem is more surprising than it first appears: the existence of at least one extreme point in every dual unit ball is not obvious by inspection, and it rules out certain spaces from ever appearing as duals. The two ingredients of the proof are essential and cannot be relaxed. Banach-Alaoglu supplies weak* compactness of $B_{X^*}$, and Krein-Milman then needs both compactness and local convexity of the ambient topology — the weak* topology on $X^*$ is locally convex precisely because it is generated by the seminorms $f \mapsto |f(x)|$ for $x \in X$. Drop the dual structure and the conclusion can fail outright: $B_{c_0}$ is the unit ball of a primal Banach space, equipped with its norm topology (in which it is bounded but **not** compact), and we showed earlier that $\operatorname{Ext}(B_{c_0}) = \varnothing$. The contrast is decisive — every dual unit ball has extreme points, but a non-dual closed convex bounded set need not.
[remark: No Space Has Dual Equal to c0]
This corollary shows that there is no normed space $X$ with $X^* \cong c_0$ isometrically. Indeed, $\operatorname{Ext}(B_{c_0}) = \varnothing$ by the example above, yet every dual unit ball must have at least one extreme point. (This does not rule out a non-isometric isomorphism $X^* \sim c_0$, only an isometric one.)
[/remark]
## Faces and Slices: Towards a Converse
How can one recognise an extreme point intrinsically — without checking the defining "no convex combination" property against every pair of points in $K$? The geometric picture suggests a local-separation criterion: a vertex of a polygon can be cut off from the rest by a single straight cut, while a point on a flat edge cannot. Making this precise is the goal of the next two theorems, which together show that extreme points of compact convex sets are exactly those that can be separated from the rest of $K$ by a single open half-space — a much stronger statement than the multi-hyperplane result available for general compact sets.
[quotetheorem:2663]
[citeproof:2663]
This result is a direct application of compactness — without it, the open cover of $K \setminus V$ might not admit a finite subcover, and we would be left with infinitely many hyperplane conditions that cannot be combined into a finite description. The "finitely many" bound is, in general, the best one can say for an arbitrary point $x_0 \in K$: if $x_0$ lies on a flat boundary segment, no single hyperplane can separate a neighbourhood of $x_0$ from the rest of $K$ (the hyperplane would have to contain the entire flat segment). This is exactly the obstruction that extremality removes — and that the next theorem exploits.
[quotetheorem:2664]
[citeproof:2664]
The geometric content is transparent: at an extreme point, a single supporting hyperplane isolates the point from the rest of $K$. This is exactly the intuition for "vertex" — a vertex of a polygon can be cut off by a single straight cut, while a flat boundary segment cannot.
The extremality hypothesis is genuinely essential here. Take $K$ to be the closed unit disc in $\mathbb{R}^2$ and let $x_0 = (1, 0)$, which is an extreme point: the supporting hyperplane $\{x_1 = 1\}$ meets $K$ only at $x_0$, and the slices $\{x_1 > 1 - \varepsilon\} \cap K$ shrink to $x_0$ as $\varepsilon \to 0$. By contrast, take a point in the relative interior of a flat edge of a polygon: any open half-space containing this point also contains at least one endpoint of the edge, so no slice fits inside a small neighbourhood of the point within $K$. The convex combination argument in the proof breaks down precisely there: at a non-extreme boundary point, $x_0$ can be written as a convex combination of points in $K_i$, so $x_0 \in \operatorname{conv}(\bigcup K_i)$ and the Hahn-Banach separation step fails.
Note also what the theorem does *not* claim: the slice $\{f < \alpha\} \cap K$ need not consist of $x_0$ alone. A slice is an open half-space intersected with $K$, so it is a relatively open subset of $K$ containing $x_0$. What the theorem gives is that every neighbourhood of $x_0$ in $K$ contains a slice around $x_0$; a particular slice can still be large. The forward connection to the partial converse is immediate: we will use the slice theorem to show that $x_0 \notin \overline{\operatorname{conv}}(S)$ whenever $x_0 \in \operatorname{Ext}(K)$ and $x_0 \notin \overline{S}$.
## A Partial Converse to Krein-Milman
The Krein-Milman theorem says $K$ equals the closed convex hull of $\operatorname{Ext}(K)$. A natural question is: is $\operatorname{Ext}(K)$ the *smallest* subset with this property? The following theorem confirms this, up to taking closures.
[quotetheorem:2665]
[citeproof:2665]
It is worth pausing to understand why Krein-Milman alone does not give this minimality. Krein-Milman tells us $K = \overline{\operatorname{conv}}(\operatorname{Ext}(K))$, but a priori a set $S$ whose closure misses some extreme points could already have closed convex hull equal to $K$. The partial converse rules this out: every extreme point lies in $\overline{S}$. (A proper dense subset of $\operatorname{Ext}(K)$ can still suffice, as a countable dense subset of the unit circle shows for the closed disc, so minimality holds only up to closure.) The reason is that an extreme point $x_0 \notin \overline{S}$ cannot be approximated by convex combinations of points of $S$: the single-hyperplane theorem provides a slice of $K$ containing $x_0$ and disjoint from $\overline{S}$, so $S$ lies in a closed half-space that excludes $x_0$, and hence so does $\overline{\operatorname{conv}}(S)$.
The condition in the theorem is on $\overline{\operatorname{conv}}(S)$, not on $\operatorname{conv}(S)$ without closure. This is not a coincidence: $\operatorname{conv}(S)$ itself can be smaller even when $\overline{\operatorname{conv}}(S) = K$. The closure is indispensable, and the proof uses it by replacing $S$ with $\overline{S}$ at the outset.
Finally, this minimality result is a key tool for the next task: identifying $\operatorname{Ext}(B_{C(K)^*})$ exactly. One route is to exhibit a candidate set $S$ of obvious extreme points (namely the unimodular scalar multiples of evaluation functionals), show that $S$ is weak* closed with $\overline{\operatorname{conv}}(S) = B_{C(K)^*}$, and conclude by the partial converse that $\operatorname{Ext}(B_{C(K)^*}) \subset \overline{S} = S$; the reverse inclusion is then verified directly. The example below takes a more direct, measure-theoretic route via the Riesz representation theorem.
## Extreme Points of the Dual Ball of $C(K)$
The Krein-Milman framework now pays for itself: combined with the partial converse, it lets us pin down the extreme points of a concrete and important dual unit ball — that of $C(K)$ for a compact Hausdorff space $K$. The payoff will be the Banach-Stone theorem, where these extreme points encode the topology of $K$ inside the Banach-space structure of $C(K)$. Recall from Chapter 2 that by the Riesz representation theorem, $C(K)^*$ is the space of regular Borel measures on $K$; the evaluation functionals $\delta_k$ (defined by $\delta_k(f) = f(k)$) play the role of "atoms."
[example: Extreme Points of the C(K) Dual Ball]
Let $K$ be a compact Hausdorff space. We claim
\begin{align*}
\operatorname{Ext}(B_{C(K)^*}) = \{\lambda \delta_k : |\lambda| = 1, k \in K\},
\end{align*}
where $\delta_k \in C(K)^*$ is evaluation at $k$, defined by $\delta_k(f) = f(k)$.
**Step 1: Each $\lambda \delta_k$ is an extreme point.**
Fix $k \in K$ and $|\lambda| = 1$. Suppose $\lambda \delta_k = (1-t)\mu + t\nu$ with $\mu, \nu \in B_{C(K)^*}$ and $t \in (0,1)$. We must show $\mu = \nu = \lambda \delta_k$.
For any $f \in C(K)$ with $\|f\|_\infty \leq 1$, the evaluation gives
\begin{align*}
\lambda f(k) = (1-t)\mu(f) + t\nu(f).
\end{align*}
Since $\|\mu\|, \|\nu\| \leq 1$, we have $|\mu(f)| \leq \|f\|_\infty \leq 1$ and $|\nu(f)| \leq 1$ for every such $f$, and likewise $|\lambda f(k)| = |f(k)| \leq 1$ because $|\lambda| = 1$. So both sides of the identity have modulus at most $1$.
Now choose a continuous function $f$ supported near $k$ with $f(k) = \lambda^{-1}$ and $\|f\|_\infty = 1$ (using Urysohn's lemma on the compact Hausdorff space $K$). Then $\lambda f(k) = 1$, so
\begin{align*}
1 = (1-t)\mu(f) + t\nu(f)
\end{align*}
with $|\mu(f)| \leq 1$ and $|\nu(f)| \leq 1$. A convex combination of two numbers of modulus at most $1$ equals $1$ only if both equal $1$. Hence $\mu(f) = \nu(f) = 1$ for this particular $f$.
To pass from this to all of $C(K)$, represent $\mu$ by a regular complex Borel measure $\tilde{\mu}$ via the Riesz representation theorem, so that $\mu(g) = \int_K g \, d\tilde{\mu}$ and $|\tilde{\mu}|(K) = \|\mu\| \leq 1$. The identity $\mu(f) = 1$ holds for every $f \in C(K)$ with $f(k) = \lambda^{-1}$ and $\|f\|_\infty = 1$, and this forces $|\tilde{\mu}|$ to be concentrated at $k$. Explicitly: if $|\tilde{\mu}|(K \setminus \{k\}) > 0$, then by regularity there is a compact set $C \subset K \setminus \{k\}$ with $|\tilde{\mu}|(C) > 0$. Urysohn's lemma gives a continuous $h : K \to [0,1]$ with $h(k) = 1$ and $h = 0$ on $C$, and then $f = \lambda^{-1} h$ satisfies $|\mu(f)| \leq \int_K |f| \, d|\tilde{\mu}| \leq |\tilde{\mu}|(K \setminus C) < 1$, contradicting $\mu(f) = 1$. Therefore $\tilde{\mu} = c\, \delta_k$ for some scalar $c$, and $1 = \mu(f) = c \lambda^{-1}$ gives $c = \lambda$, i.e., $\mu = \lambda \delta_k$. The same argument gives $\nu = \lambda \delta_k$. So $\lambda \delta_k \in \operatorname{Ext}(B_{C(K)^*})$.
**Step 2: No other functionals are extreme.**
Let $\mu \in B_{C(K)^*}$ with $\mu \notin \{\lambda \delta_k : |\lambda| = 1, k \in K\}$. We must exhibit $\nu_1 \neq \nu_2$ in $B_{C(K)^*}$ and $t \in (0,1)$ with $\mu = (1-t)\nu_1 + t\nu_2$.
By the Riesz representation theorem, $\mu$ corresponds to a regular Borel measure $\tilde{\mu}$ with $\|\mu\| = |\tilde{\mu}|(K) \leq 1$. Since $\mu$ is not of the form $\lambda \delta_k$ with $|\lambda| = 1$, either $\|\mu\| < 1$, or $\|\mu\| = 1$ and $|\tilde{\mu}|$ is not concentrated at a single point.
*Case 1: $\|\mu\| < 1$.* Set $\nu_1 = \mu + \varepsilon \delta_{k_0}$ and $\nu_2 = \mu - \varepsilon \delta_{k_0}$ for any $k_0 \in K$ and $\varepsilon > 0$ small enough so $\|\nu_1\|, \|\nu_2\| \leq 1$. Then $\mu = \frac{1}{2}(\nu_1 + \nu_2)$ and $\nu_1 \neq \nu_2$, so $\mu \notin \operatorname{Ext}(B_{C(K)^*})$.
*Case 2: $\|\mu\| = 1$ and $\tilde{\mu}$ is not a unit point mass.* The total variation $|\tilde{\mu}|$ is then a probability measure on $K$ that is not concentrated at any single point. We exhibit a direct convex decomposition of $\tilde{\mu}$ into two distinct elements of $B_{C(K)^*}$.
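Since $|\tilde{\mu}|$ is not a point mass, there exists a Borel set $E \subset K$ with $0 < |\tilde{\mu}|(E) < 1$: the support of $|\tilde{\mu}|$ contains two distinct points $x \neq y$, and if $U \ni x$ and $V \ni y$ are disjoint open sets, then $|\tilde{\mu}|(U) > 0$ and $|\tilde{\mu}|(V) > 0$, so $E = U$ works. Set $a = |\tilde{\mu}|(E) \in (0, 1)$, and write $F = K \setminus E$, so $|\tilde{\mu}|(F) = 1 - a$. Define two complex Borel measures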
\begin{align*}
\nu_1 = \frac{1}{a}\, \tilde{\mu}\, \chi_E, \qquad \nu_2 = \frac{1}{1-a}\, \tilde{\mu}\, \chi_F,
\end{align*}
where $(\tilde{\mu}\, \chi_E)(B) := \tilde{\mu}(B \cap E)$ for any Borel set $B$. Each $\nu_i$ is a regular complex Borel measure on $K$, and we compute its total variation: $|\nu_1|(K) = \frac{1}{a}|\tilde{\mu}\, \chi_E|(K) = \frac{1}{a}|\tilde{\mu}|(E) = 1$, and similarly $|\nu_2|(K) = \frac{1}{1-a}|\tilde{\mu}|(F) = 1$. Hence $\nu_1, \nu_2 \in B_{C(K)^*}$ (identifying measures with functionals via Riesz).
The convex combination recovers $\tilde{\mu}$:
\begin{align*}
a \nu_1 + (1-a)\nu_2 = \tilde{\mu}\, \chi_E + \tilde{\mu}\, \chi_F = \tilde{\mu}\, \chi_K = \tilde{\mu}.
\end{align*}
Finally, $\nu_1 \neq \nu_2$: the total variations satisfy $|\nu_1|(E) = \frac{1}{a}|\tilde{\mu}|(E) = 1$ while $|\nu_2|(E) = \frac{1}{1-a}|\tilde{\mu}|(E \cap F) = 0$, so the two measures cannot coincide.
Therefore $\mu \notin \operatorname{Ext}(B_{C(K)^*})$, completing the argument.
[/example]
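The Case 2 decomposition can be checked by hand when $K$ is finite, where measures are just weight vectors and total variation is the $\ell^1$-norm. The sketch below makes two simplifying assumptions that are not in the text: $K = \{0, \dots, n-1\}$ is finite and the scalars are real. The function name `split_measure` is hypothetical.

```python
from fractions import Fraction

def split_measure(mu, E):
    """Case 2 decomposition on a finite set K = {0, ..., len(mu)-1}.

    mu: list of real weights with total variation sum(|mu_k|) == 1.
    E:  set of indices with 0 < |mu|(E) < 1.
    Returns (a, nu1, nu2) with mu = a*nu1 + (1-a)*nu2, where nu1 is mu
    restricted to E and rescaled by 1/a, and nu2 is mu restricted to the
    complement of E and rescaled by 1/(1-a)."""
    a = sum(abs(mu[k]) for k in E)
    assert 0 < a < 1, "E must have total-variation mass strictly between 0 and 1"
    nu1 = [mu[k] / a if k in E else Fraction(0) for k in range(len(mu))]
    nu2 = [Fraction(0) if k in E else mu[k] / (1 - a) for k in range(len(mu))]
    return a, nu1, nu2

# A signed measure of total variation 1 that is not a point mass.
mu = [Fraction(1, 2), Fraction(-1, 4), Fraction(1, 4)]
a, nu1, nu2 = split_measure(mu, E={0})
```

With this choice, $a = 1/2$, $\nu_1 = \delta_0$ and $\nu_2 = -\tfrac12 \delta_1 + \tfrac12 \delta_2$, each of total variation $1$, and $\mu = \tfrac12 \nu_1 + \tfrac12 \nu_2$.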
## The Banach-Stone Theorem
The identification of extreme points of $B_{C(K)^*}$ has a beautiful application: it shows that the isometry class of $C(K)$ completely determines the topology of $K$.
[quotetheorem:2666]
[citeproof:2666]
Each hypothesis in Banach-Stone is genuinely necessary. **Compactness of $K$** is essential: for non-compact (locally compact Hausdorff) $K$ one typically replaces $C(K)$ by $C_0(K)$, and the analogue holds with proper continuous maps, but the plain $C(K)$ statement breaks down because the dual is no longer described by regular Borel measures on $K$ alone. **Hausdorffness** is needed for $K$ to embed as evaluation functionals: without separation of points, distinct points of $K$ give the same $\delta_k$ and the recovery of $K$ from $C(K)$ fails. **Isometric (rather than merely linear-topological) isomorphism** is also essential: there exist non-homeomorphic compact Hausdorff spaces $K, L$ with $C(K)$ and $C(L)$ isomorphic as Banach spaces. For instance, Milyutin's theorem states that $C(K)$ is linearly isomorphic to $C([0, 1])$ for every uncountable compact metric space $K$; in particular $C([0, 1]) \cong C([0, 1]^2)$ and $C([0, 1]) \cong C([0, 1] \cup \{2\})$ as Banach spaces, even though $[0, 1]$ is homeomorphic to neither $[0, 1]^2$ nor $[0, 1] \cup \{2\}$. It is precisely the isometry that pins down the unit ball, and hence the extreme-point set, and hence the topology of $K$. The theorem holds for both real and complex scalars, with the unit circle in the extreme-point characterisation replaced by $\{\pm 1\}$ in the real case; the proof is otherwise identical.
[remark: Categorical Interpretation]
The Banach-Stone theorem shows that the assignment $K \mapsto C(K)$ is injective on homeomorphism classes of compact Hausdorff spaces: distinct compact Hausdorff spaces yield non-isometric $C(K)$ spaces. Combined with the Commutative Gelfand–Naimark theorem proved in Chapter 7 (which states that every commutative unital $C^*$-algebra is isometrically isomorphic to some $C(K)$), this establishes a perfect duality between compact Hausdorff spaces and commutative unital $C^*$-algebras.
[/remark]
Having analyzed the geometric and dual-space structure of Banach spaces, Chapter 5 shifts to the algebraic study of bounded operators through the lens of Banach algebras. The spectrum of an operator becomes the key invariant, captured by characters on commutative algebras, and the Gelfand representation theorem encodes every spectrum as a function algebra—marrying algebraic and analytic structure.
# 5. Banach Algebras
Eigenvalues tell us everything about a matrix: they control invertibility, long-run behaviour, and the structure of every invariant subspace. But for a bounded linear operator $T$ on an infinite-dimensional Banach space, the equation $Tx = \lambda x$ may have no non-zero solution at all — $T$ may have no eigenvalues. What replaces eigenvalues in this setting? The answer is the **spectrum**: the set of $\lambda \in \mathbb{C}$ for which $\lambda I - T$ fails to be invertible. Computing and understanding spectra requires both the algebraic structure of multiplication (composition of operators) and the analytic structure of completeness (to run power-series arguments). This chapter develops the abstract framework — Banach algebras — that captures precisely this combination, and the central achievement is the **Gelfand representation theorem**, which encodes the entire spectrum of every element in a commutative Banach algebra in terms of a single continuous map into a function algebra. This chapter builds directly on the dual-space and functional machinery developed in Chapters 1–3 — in particular, the Banach-space Liouville theorem (Chapter 1) is the key analytic tool that forces non-emptiness of spectra — and the Gelfand–Mazur theorem shows that the only complete normed division algebra over $\mathbb{C}$ is $\mathbb{C}$ itself.
## Algebras and Banach Algebras
A bounded linear operator on a Banach space can be added to another, scaled by a complex number, and composed. The composition is associative and distributes over addition. What is missing from the Banach-space picture is that the norm interacts only with the additive and scalar structures; it says nothing about composition. Without a compatibility condition between the norm and multiplication, sequences of products can escape to infinity in uncontrollable ways — the limit of a product need not be the product of the limits. The right notion is **submultiplicativity**: a normed algebra in which $\|ab\| \leq \|a\|\|b\|$. This single inequality ensures that multiplication is jointly continuous, that the geometric series argument for inverses works, and that spectral theory has good analytic properties.
[definition: Algebra]
An **algebra** $A$ is a vector space (over $\mathbb{R}$ or $\mathbb{C}$) equipped with a bilinear multiplication $A \times A \to A$, $(a, b) \mapsto ab$, satisfying:
- $(ab)c = a(bc)$ (associativity),
- $a(b + c) = ab + ac$ and $(a + b)c = ac + bc$ (distributivity),
- $\lambda(ab) = (\lambda a)b = a(\lambda b)$ for all $a, b, c \in A$ and scalars $\lambda$.
In other words, $A$ is simultaneously a ring and a module over the scalar field.
[/definition]
The algebraic definition alone is very permissive: an algebra can be the zero space, and it need not contain a multiplicative identity. The additional condition that the algebra contain a genuine identity element $1 \neq 0$ is essential for defining inverses and hence spectra.
[definition: Unital Algebra]
A **unital algebra** is an algebra $A$ containing an element $1 \neq 0$ such that $1 \cdot a = a \cdot 1 = a$ for all $a \in A$.
[/definition]
Having an algebraic structure, we now impose the analytic requirement: the norm must respect multiplication.
[definition: Algebra Norm]
An **algebra norm** on an algebra $A$ is a vector space norm $\|\cdot\| : A \to \mathbb{R}$ satisfying the submultiplicativity condition:
\begin{align*}
\|ab\| \leq \|a\| \cdot \|b\| \quad \forall\, a, b \in A.
\end{align*}
The pair $(A, \|\cdot\|)$ is called a **normed algebra**. The condition implies that multiplication is jointly continuous.
[/definition]
A normed algebra is not yet sufficient for most results — we need sequences to converge. Completeness is what allows the geometric series $\sum_{n=0}^\infty x^n$ to converge when $\|x\| < 1$, which is the key mechanism for constructing inverses.
[definition: Banach Algebra]
A **Banach algebra** (BA) is a complete normed algebra: a normed algebra $(A, \|\cdot\|)$ that is complete as a metric space.
[/definition]
For unital algebras, one can always normalise the unit to have norm 1. Submultiplicativity forces $\|1\| = \|1 \cdot 1\| \leq \|1\|^2$, so $\|1\| \geq 1$ always; an equivalent renorming achieves $\|1\| = 1$ without changing the topology.
[definition: Unital Normed Algebra]
A **unital normed algebra** is a normed algebra $A$ that is also unital with $\|1\| = 1$.
[/definition]
[remark: Normalising the Unit]
If $A$ is a normed algebra with unit $1 \neq 0$, then submultiplicativity gives $\|1\| = \|1 \cdot 1\| \leq \|1\|^2$, so $\|1\| \geq 1$ always. For any such algebra one may pass to the norm $|\!|\!|a|\!|\!| := \sup\{\|ab\| : \|b\| \leq 1\}$, which is again an algebra norm, satisfies $|\!|\!|1|\!|\!| = 1$, and is equivalent to the original norm because $\|a\| / \|1\| \leq |\!|\!|a|\!|\!| \leq \|a\|$ (take $b = 1/\|1\|$ for the lower bound and use submultiplicativity for the upper bound). Thus without loss of generality, unital Banach algebras are unital normed algebras with $\|1\| = 1$.
[/remark]
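As a small sanity check of the renorming in the remark, here is a worked example; the specific algebra and norm are chosen purely for illustration and do not come from the text above. Take $A = \mathbb{C}$ with the norm $\|a\|' := 2|a|$. This is an algebra norm, since $\|ab\|' = 2|a||b| \leq 4|a||b| = \|a\|'\|b\|'$, but the unit has $\|1\|' = 2 > 1$. The renorming gives
\begin{align*}
|\!|\!|a|\!|\!| = \sup\{\|ab\|' : \|b\|' \leq 1\} = \sup\{2|a||b| : |b| \leq \tfrac{1}{2}\} = |a|,
\end{align*}
so $|\!|\!|1|\!|\!| = 1$, and the new norm is equivalent to the old one because $|\!|\!|a|\!|\!| = \tfrac{1}{2}\|a\|'$ for every $a$.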
Structure-preserving maps between algebras are ring homomorphisms that are also linear. Notice that continuity is not automatic — a homomorphism may fail to be bounded. This is in sharp contrast to the situation for characters into $\mathbb{C}$, which are automatically continuous (see the Characters are Continuous theorem below).
[definition: Algebra Homomorphism]
A linear map $\theta : A \to B$ between algebras is a **homomorphism** if $\theta(xy) = \theta(x)\theta(y)$ for all $x, y \in A$. If $A$ and $B$ are unital with units $1_A$ and $1_B$, then $\theta$ is a **unital homomorphism** if additionally $\theta(1_A) = 1_B$.
If $A, B$ are normed algebras, a homomorphism need not be continuous; an **isomorphism** is a bijective homomorphism $\theta$ such that both $\theta$ and $\theta^{-1}$ are continuous.
[/definition]
The three canonical examples below cover the full range of the theory: a commutative algebra arising from topology, a commutative algebra arising from integration, and a non-commutative algebra arising from operator theory.
[example: Fundamental Examples of Banach Algebras]
The following examples illustrate the range and depth of the Banach algebra framework.
**(i) $C(K)$ for compact Hausdorff $K$**: With pointwise multiplication $(fg)(k) = f(k)g(k)$ and the supremum norm $\|f\|_\infty = \sup_{k \in K}|f(k)|$, submultiplicativity follows directly: $\|fg\|_\infty \leq \|f\|_\infty \|g\|_\infty$. The constant function $1_K$ is the unit, and completeness is standard (uniform limit of continuous functions is continuous). So $C(K)$ is a commutative unital BA. A **uniform algebra** on $K$ is a closed subalgebra of $C(K)$ that contains $1_K$ and separates points.
For non-empty compact $K \subset \mathbb{C}$, the following chain of closed subalgebras runs from the most restrictive to the least restrictive:
\begin{align*}
P(K) \subset R(K) \subset H(K) \subset A(K) \subset C(K),
\end{align*}
where $P(K)$ is the closure in $C(K)$ of the polynomials, $R(K)$ is the closure of the rational functions without poles in $K$, $H(K)$ is the closure of the functions holomorphic on some neighbourhood of $K$, and $A(K) = \{f \in C(K) : f \text{ is holomorphic on } \operatorname{int}(K)\}$, which is the **disk algebra** when $K = \overline{\mathbb{D}}$ is the closed unit disk. The equality $R(K) = H(K)$ follows from the Runge approximation theorem.
The disk algebra $A(\overline{\mathbb{D}})$ consists of functions continuous on the closed unit disk $\overline{\mathbb{D}}$ and holomorphic on the open disk $\mathbb{D}$. The identity function $\mathrm{id}(z) = z$ belongs to $A(\overline{\mathbb{D}})$, and the Gelfand transform for this algebra will encode the full geometry of the disk — a fact whose significance we will see when the spectrum of $\mathrm{id}$ in $A(\overline{\mathbb{D}})$ turns out to be the entire closed disk $\overline{\mathbb{D}}$, not just the boundary circle.
<!-- illustration-needed: the chain P(K) subset R(K) subset H(K) subset A(K) subset C(K) for K the closed unit disk — show as nested annular rings or concentric circles labeling each subalgebra, with P(K) the innermost and C(K) the outermost -->
**(ii) $L^1(\mathbb{R})$ with convolution**: Define multiplication by
\begin{align*}
(f * g)(t) := \int_{\mathbb{R}} f(s)g(t - s)\, ds.
\end{align*}
To verify submultiplicativity: by Tonelli's theorem and the translation invariance of Lebesgue measure, $\|f * g\|_1 = \int_{\mathbb{R}}\left|\int_{\mathbb{R}} f(s)g(t-s)\,ds\right|\,dt \leq \int_{\mathbb{R}}\int_{\mathbb{R}}|f(s)||g(t-s)|\,ds\,dt = \|f\|_1\|g\|_1$. So $L^1(\mathbb{R})$ with convolution and the $L^1$-norm is a commutative BA. It is **not** unital: a unit $e$ would satisfy $e * f = f$ for all $f$, forcing $e$ to behave like the Dirac delta, which is not an $L^1$-function. The Riemann–Lebesgue lemma makes this rigorous: a unit would satisfy $\hat{e}\hat{f} = \hat{f}$ for all $f \in L^1(\mathbb{R})$, and taking $f$ to be a Gaussian (whose Fourier transform never vanishes) gives $\hat{e} \equiv 1$, whereas the Fourier transform of any $L^1$-function vanishes at infinity.
**(iii) $\mathcal{L}(X)$ for a Banach space $X$**: Composition satisfies $\|ST\|_{\mathcal{L}(X)} \leq \|S\|_{\mathcal{L}(X)}\|T\|_{\mathcal{L}(X)}$ by definition of the operator norm. The identity operator is the unit, with $\|I\|_{\mathcal{L}(X)} = 1$. Completeness follows from the completeness of $X$: a Cauchy sequence of operators converges pointwise to a bounded operator, and the convergence then holds in operator norm. So $\mathcal{L}(X)$ with composition is a unital BA. When $\dim X > 1$, it is non-commutative; for instance, on $\ell^2$ the right shift $R$ and the rank-one projection $P$ onto the span of $e_1$ satisfy $PR = 0 \neq RP$. These operator algebras are the prototypical non-commutative Banach algebras.
[/example]
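The submultiplicativity estimate in (ii) has an exact discrete analogue that can be checked by machine. The following is a minimal sketch, assuming finitely supported sequences indexed from $0$ in place of $L^1(\mathbb{R})$, so that sums replace integrals; the helper names `convolve` and `l1_norm` are illustrative and not part of the course.

```python
# Discrete stand-in for L^1(R): finitely supported sequences with convolution.
# This is an illustrative assumption, not the continuous setting of the text.

def convolve(f, g):
    """(f * g)[n] = sum over i + j = n of f[i] * g[j]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def l1_norm(f):
    """The l^1 norm: sum of absolute values."""
    return sum(abs(v) for v in f)


f = [1.0, -2.0, 0.5]
g = [3.0, 1.0, -1.0, 2.0]
h = convolve(f, g)

print(l1_norm(h), "<=", l1_norm(f) * l1_norm(g))
```

For these inputs $\|f * g\|_1 = 19.5$ and $\|f\|_1\|g\|_1 = 24.5$; cancellation between terms of opposite sign is what makes the inequality strict.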
## Elementary Constructions
The examples above expose a structural obstacle to the general theory: $L^1(\mathbb{R})$ has no unit, so the definition of spectrum (which requires inverting $\lambda 1 - x$) is not directly applicable. More generally, a Banach algebra arising from some natural construction may lack a unit, may fail to be complete, or may need to be embedded in a larger algebra to make its spectrum computable. The five constructions below address such obstacles systematically.
**(i) Subalgebras.** A **subalgebra** $B$ of an algebra $A$ is a subspace closed under the multiplication of $A$. A **unital subalgebra** of a unital $A$ is a subalgebra containing the unit $1$. The closure of a subalgebra of a normed algebra is again a subalgebra.
**(ii) Unitisation.** The problem: $L^1(\mathbb{R})$ is non-unital, so we cannot directly define the spectrum of a convolution operator $f \in L^1(\mathbb{R})$. The spectrum requires a unit — we need $\lambda 1 - f$ to be an element of the algebra. The fix is to adjoin a formal unit. Given any algebra $A$ (possibly non-unital), its **unitisation** is $A^+ := A \oplus \mathbb{C}$ as a vector space, with multiplication:
\begin{align*}
(a, \lambda) \cdot (b, \mu) := (ab + \lambda b + \mu a, \lambda\mu), \quad a, b \in A,\ \lambda, \mu \in \mathbb{C}.
\end{align*}
Then $A^+$ is a unital algebra with unit $(0, 1)$. The original $A$ embeds as a proper ideal via $a \mapsto (a, 0)$. If $A$ is normed, the unitisation carries the norm $\|(a, \lambda)\| := \|a\| + |\lambda|$, which gives $\|(0,1)\| = 1$. Completeness of $A$ implies completeness of $A^+$.
One can think of $A^+$ as $\{a + \lambda I : a \in A, \lambda \in \mathbb{C}\}$, where $I$ is a formal identity; then $(a + \lambda I)(b + \mu I) = (ab + \lambda b + \mu a) + \lambda\mu I$, which is the rule above.
For $L^1(\mathbb{R})$, the unitisation $(L^1(\mathbb{R}))^+$ consists of pairs $(f, \lambda)$ with $f \in L^1(\mathbb{R})$ and $\lambda \in \mathbb{C}$, thought of as $f + \lambda\delta_0$ where $\delta_0$ is the Dirac mass — precisely the structure of measures on $\mathbb{R}$ that are absolutely continuous plus a point mass at the origin.
**(iii) Ideals.** A left (resp. right, two-sided) ideal $J$ of an algebra $A$ is a subspace closed under left (resp. right, both) multiplication from $A$. If $J \triangleleft A$ is a two-sided ideal in a normed algebra, then $\overline{J}$ is also an ideal. For quotients to give normed algebras one wants closed ideals: if $J$ is closed and $A$ is a unital normed algebra with $J$ a proper ideal, then $A/J$ is a unital normed algebra with unit $1 + J$, and $\|1 + J\|_{A/J} = 1$.
**(iv) Completion.** Let $A$ be a normed algebra and $\tilde{A}$ its Banach space completion. For $a, b \in \tilde{A}$, choose sequences $(a_n), (b_n)$ in $A$ with $a_n \to a$ and $b_n \to b$, and define $ab := \lim_{n \to \infty} a_n b_n$. Submultiplicativity ensures this limit exists and is independent of the sequences. With this multiplication, $\tilde{A}$ becomes a Banach algebra.
**(v) Left multiplication (any unital BA embeds in $\mathcal{L}(X)$).** Let $A$ be a unital BA. For $a \in A$, define $L_a : A \to A$ by $L_a(b) := ab$. Then $L_a \in \mathcal{L}(A)$, with $\|L_a\| \leq \|a\|$ by submultiplicativity and $\|L_a\| \geq \|L_a(1)\| = \|a\|$ since $\|1\| = 1$. So the map $a \mapsto L_a$ is an isometric homomorphism $A \hookrightarrow \mathcal{L}(A)$. Thus every unital Banach algebra is isometrically isomorphic to a closed subalgebra of $\mathcal{L}(X)$ for some Banach space $X$; for non-unital $A$, apply this to $A^+$.
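The unitisation rule in (ii) can be exercised on a small non-unital algebra. The sketch below assumes, purely for illustration, that $A$ consists of finitely supported sequences indexed by $n \geq 1$ with convolution (non-unital, since every product is supported on $n \geq 2$) and the $\ell^1$-norm; the names `conv`, `unit_mul` and `unit_norm` are hypothetical helpers.

```python
# A = finitely supported sequences on n >= 1 with convolution (illustrative choice).
# List index i stores the value at n = i + 1.

def conv(f, g):
    out = [0j] * (len(f) + len(g))
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j + 1] += fi * gj  # (i+1) + (j+1) = (i+j+1) + 1
    return out


def add(f, g):
    n = max(len(f), len(g))
    f = f + [0j] * (n - len(f))
    g = g + [0j] * (n - len(g))
    return [u + v for u, v in zip(f, g)]


def scale(c, f):
    return [c * v for v in f]


def unit_mul(x, y):
    """(a, lam)(b, mu) = (ab + lam*b + mu*a, lam*mu)."""
    (a, lam), (b, mu) = x, y
    return (add(conv(a, b), add(scale(lam, b), scale(mu, a))), lam * mu)


def unit_norm(x):
    """||(a, lam)|| = ||a||_1 + |lam|."""
    a, lam = x
    return sum(abs(v) for v in a) + abs(lam)


e = ([], 1)                    # the adjoined unit (0, 1)
x = ([1, 2j], 3)
y = ([-1, 0, 1], -2j)

print(unit_mul(e, x) == x, unit_mul(x, e) == x)
print(unit_norm(unit_mul(x, y)), "<=", unit_norm(x) * unit_norm(y))
```

The second line of output illustrates that the norm $\|a\| + |\lambda|$ is submultiplicative on $A^+$, which follows by expanding the product and applying the triangle inequality.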
## Spectral Theory
The spectrum of an element $x$ in a unital BA generalises the set of eigenvalues of a matrix. The key to the whole theory is the following lemma on invertibility.
[quotetheorem:2667]
[citeproof:2667]
This lemma is the engine of spectral theory: it says the geometric series $(1-x)^{-1} = \sum_{n=0}^\infty x^n$, valid in $\mathbb{C}$ for $|x| < 1$, works in any Banach algebra. The hypothesis $\|1 - a\| < 1$ is sharp: in $C([0,1])$, the function $a(t) = t$ satisfies $\|1 - a\|_\infty = 1$ and is not invertible (it vanishes at $t = 0$). The lemma gives no information about inverses far from the identity — for those, the spectrum is the right tool. The immediate consequence is that the set of invertible elements is open, and the map $x \mapsto x^{-1}$ is continuous.
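The geometric series can be watched converging in a small matrix algebra. This is a minimal sketch, assuming $A = M_2(\mathbb{R})$ with the maximum row-sum norm (which is submultiplicative and gives the identity norm $1$); the matrix `a` is an arbitrary choice with $\|1 - a\| < 1$.

```python
# Neumann series in M_2(R): a^{-1} = sum_{n >= 0} (1 - a)^n when ||1 - a|| < 1.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def row_sum_norm(m):
    """Operator norm on (R^2, sup norm): the maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in m)


identity = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 0.3], [-0.2, 0.9]]
x = [[identity[i][j] - a[i][j] for j in range(2)] for i in range(2)]  # x = 1 - a

partial = [[0.0, 0.0], [0.0, 0.0]]  # running sum of x^n
power = identity                     # current power x^n
for _ in range(60):
    partial = [[partial[i][j] + power[i][j] for j in range(2)] for i in range(2)]
    power = matmul(power, x)

check = matmul(a, partial)           # should be close to the identity
print(row_sum_norm(x), check)
```

Since $a \sum_{n < N} x^n = 1 - x^N$ and $\|x^N\| \leq \|x\|^N$, the error after $N$ terms decays geometrically.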
[definition: Group of Invertibles]
For a unital algebra $A$, let $G(A) := \{a \in A : a \text{ is invertible}\}$ be the **group of invertible elements**, analogous to $GL_n$ for matrices.
[/definition]
The topological properties of $G(A)$ underpin everything that follows. The openness of $G(A)$ is what makes the spectrum a closed set; part (iv) below is the crucial ingredient in the subalgebra spectrum theorem.
[quotetheorem:2668]
[citeproof:2668]
Part (iii) captures a critical asymmetry: approaching the boundary of invertibility forces the inverse norm to blow up. This is why no element on the boundary of $G(A)$ can be invertible in any larger algebra — property (iv) shows such elements are fundamentally singular. Properties (i) and (ii) together say that $G(A)$ is a topological group (open in $A$, with continuous inversion). Property (iv) will be the key ingredient in the proof that boundary spectra in a subalgebra lie in the spectrum of the ambient algebra.
[definition: Spectrum]
Let $A$ be an algebra and $x \in A$. The **spectrum** of $x$, written $\sigma_A(x)$, is:
- If $A$ is unital: $\sigma_A(x) = \{\lambda \in \mathbb{C} : \lambda 1 - x \notin G(A)\}$.
- If $A$ is non-unital: $\sigma_A(x) := \sigma_{A^+}((x, 0))$, where $(x, 0) \in A^+$ is the image of $x$ under the canonical embedding $a \mapsto (a, 0)$.
[/definition]
The spectrum is the algebraic analogue of the set of eigenvalues, but it is defined purely in terms of invertibility — no need for eigenvectors. Let us see what this gives in concrete cases.
[example: Spectra in Concrete Algebras]
**(i) Matrices**: For $A = M_n(\mathbb{C})$, a scalar $\lambda$ lies in $\sigma_A(X)$ iff $\lambda I - X$ is not invertible, i.e., iff $\det(\lambda I - X) = 0$. This is exactly the characteristic polynomial condition, so $\sigma_A(X) = \{\text{eigenvalues of } X\}$. For example, if $X = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, then $\det(\lambda I - X) = \lambda^2$, so $\sigma_A(X) = \{0\}$ — a single eigenvalue with a two-dimensional generalised eigenspace. The nilpotent $X$ satisfies $X^2 = 0$, so $\|X^n\|^{1/n} \to 0$ as $n \to \infty$, consistent with spectral radius zero.
**(ii) $C(K)$**: An element $f \in C(K)$ is invertible iff $1/f$ is continuous, which holds iff $f(k) \neq 0$ for all $k \in K$. The unit is the constant function $1_K$. Thus $\lambda 1_K - f$ is invertible iff $\lambda \neq f(k)$ for all $k \in K$, giving $\sigma_{C(K)}(f) = f(K)$, the image of $f$ — a compact subset of $\mathbb{C}$.
**(iii) $\mathcal{L}(X)$**: For the right-shift operator $T$ on $\ell^2$, defined by $T(x_1, x_2, \ldots) = (0, x_1, x_2, \ldots)$, one checks: $\lambda I - T$ is injective for every $\lambda \in \mathbb{C}$ (so $T$ has no eigenvalues), but for $|\lambda| < 1$ it fails to be surjective, since its range is orthogonal to the non-zero vector $(1, \bar\lambda, \bar\lambda^2, \ldots) \in \ell^2$. Thus $\sigma_{\mathcal{L}(\ell^2)}(T)$ contains the open unit disk. Since the spectrum is closed and contained in the disk of radius $\|T\| = 1$, it is exactly the closed unit disk $\overline{\mathbb{D}}$. In particular, the spectrum can be large even when the point spectrum is empty.
[/example]
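For the right shift in (iii), one way to see that $\lambda I - T$ is not surjective when $|\lambda| < 1$ is that its range is orthogonal to $v_\lambda = (1, \bar\lambda, \bar\lambda^2, \ldots) \in \ell^2$. The sketch below checks this on a finitely supported vector, for which the inner product is a finite sum; `shift_minus` and `pair_with_v` are hypothetical helper names and the test vector is arbitrary.

```python
# Range of (lam*I - T) is orthogonal to v = (1, conj(lam), conj(lam)^2, ...).

def shift_minus(lam, x):
    """Apply (lam*I - T) to a finitely supported sequence x; T is the right shift."""
    shifted = [0] + list(x)   # T x
    padded = list(x) + [0]    # x, padded to the same length
    return [lam * p - s for p, s in zip(padded, shifted)]


def pair_with_v(lam, y):
    """<y, v> = sum_k y_k * conj(v_k) = sum_k y_k * lam**k (0-based index k)."""
    return sum(yk * lam ** k for k, yk in enumerate(y))


lam = 0.3 + 0.4j                     # any point of the open unit disk
x = [1, -2, 0.5j, 3]
y = shift_minus(lam, x)

print(abs(pair_with_v(lam, y)))      # zero up to rounding
print(pair_with_v(lam, [1]))         # e_1 pairs to 1, so e_1 is not in the range
```

The cancellation is a telescoping sum: $\sum_k \lambda^{k+1} x_k - \sum_k \lambda^{k+1} x_k = 0$.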
The key theorem of spectral theory establishes that spectra are non-empty compact sets. The proof has two parts: compactness follows from the openness of $G(A)$ together with the bound $|\lambda| \leq \|x\|$ on spectral values, and non-emptiness requires a Banach-space version of Liouville's theorem.
[quotetheorem:2669]
[citeproof:2669]
The role of complex scalars here is indispensable. Over $\mathbb{R}$, the spectrum can be empty: the rotation matrix $R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ in $M_2(\mathbb{R})$ with $\theta \notin \pi\mathbb{Z}$ has characteristic polynomial $\lambda^2 - 2\cos\theta\,\lambda + 1$, whose discriminant $4\cos^2\theta - 4$ is negative, so it has no real roots and the real spectrum is empty. The complex field is what allows Liouville's theorem to force non-emptiness. The theorem does not constrain the shape of $\sigma_A(x)$ beyond non-emptiness and compactness: the spectrum of a continuous function on the Cantor set $K$ can itself be a Cantor set, and the spectrum of a normal operator on a Hilbert space can be any non-empty compact subset of $\mathbb{C}$. An immediate consequence is the following fundamental theorem.
[quotetheorem:2670]
[citeproof:2670]
The Gelfand–Mazur theorem explains why Banach algebra theory over $\mathbb{C}$ is so much richer than over $\mathbb{R}$: in the complex case, any division algebra collapses to $\mathbb{C}$ itself. Over $\mathbb{R}$, the quaternions $\mathbb{H}$ provide a non-trivial four-dimensional normed division algebra. The proof is short but uses the full weight of the Spectrum theorem — it is an analytic result disguised as an algebraic one.
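The contrast between real and complex scalars can be made concrete with the rotation matrix $R_\theta$ discussed above. The following sketch, with the arbitrary choice $\theta = \pi/3$, computes the roots of $\lambda^2 - 2\cos\theta\,\lambda + 1$: the discriminant is negative, so there are no real spectral values, while over $\mathbb{C}$ the roots are $e^{\pm i\theta}$.

```python
import cmath
import math


def rotation_char_roots(theta):
    """Roots and discriminant of lambda^2 - 2 cos(theta) lambda + 1."""
    c = math.cos(theta)
    disc = 4 * c * c - 4
    root = cmath.sqrt(disc)
    return ((2 * c + root) / 2, (2 * c - root) / 2), disc


theta = math.pi / 3
(root_plus, root_minus), disc = rotation_char_roots(theta)

print(disc)                    # negative: no real roots
print(root_plus, root_minus)   # the complex pair exp(+i*theta), exp(-i*theta)
```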
[definition: Spectral Radius]
For a BA $A$ and $x \in A$, the **spectral radius** of $x$ is:
\begin{align*}
r_A(x) := \sup\{|\lambda| : \lambda \in \sigma_A(x)\}.
\end{align*}
By the spectrum theorem, $r_A(x) \leq \|x\|$.
[/definition]
The spectral mapping theorem for polynomials shows how the spectrum transforms under polynomial maps.
[quotetheorem:2671]
[citeproof:2671]
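As a sanity check of the identity $\sigma_A(p(x)) = p(\sigma_A(x))$ in the simplest setting, the sketch below takes $A = M_2(\mathbb{C})$, where the spectrum is the set of eigenvalues. The matrix `X` and the polynomial $p(z) = z^2 - 3z + 2$ are arbitrary illustrative choices.

```python
import cmath


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def eigenvalues(m):
    """Eigenvalues of a 2x2 matrix via the quadratic formula, in sorted order."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return sorted([(tr - root) / 2, (tr + root) / 2],
                  key=lambda z: (z.real, z.imag))


def p(z):
    return z * z - 3 * z + 2


def p_matrix(m):
    """p applied to a matrix: m^2 - 3m + 2I."""
    sq = matmul(m, m)
    return [[sq[i][j] - 3 * m[i][j] + (2 if i == j else 0) for j in range(2)]
            for i in range(2)]


X = [[2, 1], [1, 2]]
spec_X = eigenvalues(X)
spec_pX = eigenvalues(p_matrix(X))
mapped = sorted((p(lam) for lam in spec_X), key=lambda z: (z.real, z.imag))

print(spec_X, spec_pX, mapped)
```

Here $\sigma(X) = \{1, 3\}$, and both $\sigma(p(X))$ and $p(\sigma(X))$ equal $\{0, 2\}$.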
The polynomial spectral mapping theorem is the gateway to functional calculus. The restriction to polynomials is a genuine limitation: the analogous statement $\sigma_A(f(x)) = f(\sigma_A(x))$ for entire functions $f$ requires the holomorphic functional calculus, which uses contour integration to define $f(x) = \frac{1}{2\pi i}\oint_\gamma f(\lambda)(\lambda 1 - x)^{-1}\,d\lambda$. This goes beyond what polynomial algebra can provide, and its development relies on $\sigma_A(x)$ being compact, so that a contour $\gamma$ surrounding the spectrum can be chosen inside the domain of $f$. The Beurling–Gelfand formula gives an intrinsic formula for the spectral radius in terms of the norm.
[quotetheorem:2672]
[citeproof:2672]
The formula $r_A(x) = \lim \|x^n\|^{1/n}$ bridges two very different notions: the spectral radius is defined algebraically (via invertibility), yet it equals a limit that depends only on the norm. That the limit even exists follows from the submultiplicativity $\|x^{m+n}\| \leq \|x^m\|\|x^n\|$, which by Fekete's lemma guarantees $\lim \|x^n\|^{1/n} = \inf_n \|x^n\|^{1/n}$.
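The convergence $\|x^n\|^{1/n} \to r_A(x)$ can be observed numerically. This is a minimal sketch, assuming $A = M_2(\mathbb{R})$ with the maximum row-sum norm and an arbitrarily chosen upper-triangular matrix whose only eigenvalue is $0.5$ but whose norm is $10.5$.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def row_sum_norm(m):
    """Maximum absolute row sum: a submultiplicative norm on M_2."""
    return max(sum(abs(v) for v in row) for row in m)


x = [[0.5, 10.0], [0.0, 0.5]]   # spectral radius 0.5, norm 10.5

roots = {}
power = x                        # holds x^n at the start of iteration n
for n in range(1, 201):
    if n in (1, 10, 50, 200):
        roots[n] = row_sum_norm(power) ** (1.0 / n)
    power = matmul(power, x)

for n in sorted(roots):
    print(n, roots[n])
```

Here $\|x^n\| = 0.5^n(1 + 20n)$, so $\|x^n\|^{1/n} = 0.5\,(1 + 20n)^{1/n}$, which decreases to $0.5$ slowly: the polynomial factor is the trace of the off-diagonal entry.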
The formula also illuminates the gap between $\|x\|$ and $r_A(x)$. Consider the nilpotent matrix $N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ in $M_2(\mathbb{C})$: $\|N\| = 1$ but $N^2 = 0$, so $\|N^n\|^{1/n} = 0$ for $n \geq 2$, giving $r(N) = 0$. The norm alone is too crude an invariant — it fails to detect that all spectral values are $0$. The formula $\lim \|x^n\|^{1/n}$ sees through the norm's over-counting to extract the true spectral information. The next result examines how spectra behave when we restrict to a subalgebra.
[quotetheorem:2673]
[citeproof:2673]
The subalgebra theorem says $\sigma_B(x)$ can only be larger than $\sigma_A(x)$, and it can grow only by filling in holes, that is, bounded connected components of the complement $\mathbb{C} \setminus \sigma_A(x)$. A striking example: take $A = C(\partial \mathbb{D})$ (continuous functions on the unit circle) and $B = A(\overline{\mathbb{D}})$ (the disk algebra), regarded as a closed subalgebra of $A$ via restriction to the boundary, which is isometric by the maximum modulus principle. The identity function $\mathrm{id}(z) = z$ has spectrum $\sigma_A(\mathrm{id}) = \partial \mathbb{D}$ in $C(\partial \mathbb{D})$, but $\sigma_B(\mathrm{id}) = \overline{\mathbb{D}}$ in $A(\overline{\mathbb{D}})$: for $|\lambda| < 1$ the only candidate inverse of $\lambda \cdot 1 - \mathrm{id}$ is $z \mapsto 1/(\lambda - z)$, which has a pole at $z = \lambda \in \mathbb{D}$ and so does not belong to the disk algebra. Passing to the subalgebra $A(\overline{\mathbb{D}})$ fills in the entire open unit disk, the unique bounded component of $\mathbb{C} \setminus \partial \mathbb{D}$. This spectrum-filling phenomenon is intrinsically geometric: the disk algebra sees the interior of the disk, while $C(\partial \mathbb{D})$ sees only the boundary.
<!-- illustration-needed: the spectrum-filling phenomenon — show sigma_A(id) = partial D (unit circle) in C(partial D) on the left, and sigma_B(id) = closed disk D-bar in A(D-bar) on the right, with an arrow showing the subalgebra inclusion filling in the bounded component -->
[quotetheorem:2674]
[citeproof:2674]
Maximal commutative subalgebras provide the bridge from non-commutative to commutative algebra. To estimate $r_A(x + y)$ and $r_A(xy)$ for commuting $x, y$, we pass to a maximal commutative subalgebra $C$ containing both; by the theorem, elements of $C$ have the same spectrum in $C$ as in $A$. Inside the commutative algebra $C$, the character formula applies, and the linearity and multiplicativity of characters give the subadditivity and submultiplicativity of the spectral radius. Without passing to a maximal commutative subalgebra, the character formula is unavailable: this reduction step is what makes the commutative theory applicable to non-commutative problems.
## Commutative Banach Algebras and the Gelfand Transform
The spectrum of an element is defined via invertibility — an abstract algebraic condition. In a commutative BA, we can ask for something more concrete: a formula that computes $\sigma_A(x)$ directly as a set of values. The obstacle is that inverting $\lambda 1 - x$ in a general algebra is hard to check. But if we can represent $A$ as a function algebra, invertibility becomes pointwise non-vanishing, which is easy to check. The key insight is that in a commutative BA, every maximal ideal is the kernel of a **character** — a multiplicative linear functional $\varphi : A \to \mathbb{C}$ — and the spectrum of $x$ is exactly the set of values $\{\varphi(x) : \varphi \in \Phi_A\}$.
[definition: Character]
A **character** on an algebra $A$ is a non-zero algebra homomorphism $\varphi : A \to \mathbb{C}$. We write $\Phi_A$ for the set of all characters of $A$. If $A$ is unital, then $\varphi(1) = 1$ for every $\varphi \in \Phi_A$.
[/definition]
A character is an algebraic object — just a multiplicative linear map. The remarkable fact is that in a Banach algebra, algebra implies analysis: every character is automatically bounded.
[quotetheorem:2675]
[citeproof:2675]
The automatic continuity of characters is striking and specific to maps into $\mathbb{C}$. A homomorphism between two general Banach algebras need not be continuous; boundedness is an additional hypothesis in that setting, not a consequence (as noted in the definition of algebra isomorphism above). What makes characters special is that $\mathbb{C}$ has no room for a value to be large: if $|\varphi(x)| > \|x\|$, then $\|x/\varphi(x)\| < 1$, so $1 - x/\varphi(x)$ is invertible in $A$ by the lemma on invertibility; but $\varphi(1 - x/\varphi(x)) = 1 - 1 = 0$, and a character cannot vanish on an invertible element. The continuity of characters has a crucial structural consequence: $\Phi_A$ sits inside the unit ball of $A^*$, enabling the Banach–Alaoglu theorem to make $\Phi_A$ compact in the weak$^*$ topology.
[quotetheorem:2676]
[citeproof:2676]
This bijection is a remarkable bridge between ring theory and analysis: a purely algebraic object (a maximal ideal) is classified by an analytic object (a multiplicative linear functional of norm 1). The key step is Gelfand–Mazur in the surjectivity argument: without complex scalars, the quotient field $A/M$ need not coincide with the scalar field (over $\mathbb{R}$ it could be $\mathbb{C}$, as for $M = \{0\}$ in $\mathbb{C}$ viewed as a real algebra), and the bijection would fail. Commutativity is also essential: in a non-commutative BA, the quotient $A/M$ by a maximal ideal need not be a field, so the argument breaks down. The bijection immediately gives a character-based formula for the spectrum.
[quotetheorem:2677]
[citeproof:2677]
This is the payoff of the commutative theory: invertibility, which was defined abstractly via the group $G(A)$, is now completely described by pointwise evaluation on characters. Statement (i) says $x$ is invertible iff no character vanishes at $x$, exactly the analogue of "$f \in C(K)$ is invertible iff $f$ is nowhere zero." Statement (ii) shows the spectrum is the image of $x$ under the Gelfand transform. In the non-commutative case this formula breaks down: the algebra $\mathcal{L}(\ell^2)$ admits no characters at all (the kernel of a character would be a closed two-sided ideal of codimension 1, but the only non-zero proper closed two-sided ideal of $\mathcal{L}(\ell^2)$ is the ideal of compact operators, which has infinite codimension), yet the spectrum of the right-shift $T$ on $\ell^2$ is the closed unit disk.
[example: Characters of $C(K)$ and $R(K)$]
**(i) $A = C(K)$**: The characters are exactly the evaluation maps $\delta_k(f) = f(k)$ for $k \in K$. Indeed, $\delta_k \in \Phi_A$ for all $k$. Conversely, let $M \in \mathfrak{M}_A$ and suppose $M \neq \ker(\delta_k)$ for all $k$. Since $M$ is maximal, it is not contained in the proper ideal $\ker(\delta_k)$, so for each $k \in K$ there exists $f_k \in M$ with $f_k(k) \neq 0$, and by continuity $f_k$ has no zero on some open neighbourhood $U_k$ of $k$. Compactness gives a finite subcover $K = \bigcup_{i=1}^n U_{k_i}$. Then $f := \sum_{i=1}^n |f_{k_i}|^2 = \sum_i f_{k_i} \overline{f_{k_i}} \in M$ (as $M$ is an ideal), and $f > 0$ on $K$, so $f$ is invertible. Thus $M \cap G(A) \neq \varnothing$, contradicting the properness of $M$. Hence every maximal ideal is $\ker(\delta_k)$ for some $k$, and $\Phi_{C(K)} = \{\delta_k : k \in K\}$.
This recovers the earlier calculation: $\sigma_{C(K)}(f) = f(K)$.
**(ii) $A = R(K)$ for $K \subset \mathbb{C}$ compact, $K \neq \varnothing$**: The characters are again the point evaluations $\{\delta_k : k \in K\}$. Let $\varphi \in \Phi_{R(K)}$ and let $\mathrm{id} \in R(K)$ denote the identity function $\mathrm{id}(z) = z$. Set $w = \varphi(\mathrm{id})$. If $w \notin K$, then $w \cdot 1 - \mathrm{id}$ is invertible in $R(K)$ (the function $z \mapsto w - z$ has no zero in $K$), so $\varphi(w \cdot 1 - \mathrm{id}) \neq 0$, but it equals $w - \varphi(\mathrm{id}) = 0$ — a contradiction. So $w \in K$. For any polynomial $p(z) = \sum a_k z^k$, we have $\varphi(p) = \sum a_k \varphi(\mathrm{id})^k = p(w) = \delta_w(p)$. The same extends to rational functions without poles in $K$, and by continuity (characters have norm 1) to all of $R(K)$. So $\varphi = \delta_w$.
[/example]
### The Gelfand Topology and Gelfand Transform
We now want to use the character space $\Phi_A$ as a compact topological space on which elements of $A$ act as continuous functions. To do this, $\Phi_A$ needs a topology. The natural choice is the weakest topology making all evaluation maps $\varphi \mapsto \varphi(x)$ continuous — this is the restriction of the weak$^*$ topology from $A^*$. The payoff is that $\Phi_A$ becomes compact, enabling us to apply the full theory of continuous functions on compact spaces.
The set $\Phi_A$ can be viewed as a subset of the unit ball of $A^*$:
\begin{align*}
\Phi_A = \{\varphi \in \overline{B}_{A^*} : \varphi(xy) = \varphi(x)\varphi(y)\ \forall x, y \in A,\ \varphi(1) = 1\},
\end{align*}
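which is an intersection of weak$^*$-closed sets: for fixed $x, y \in A$, the maps $\varphi \mapsto \varphi(xy) - \varphi(x)\varphi(y)$ and $\varphi \mapsto \varphi(1)$ are weak$^*$-continuous, and $\Phi_A$ is cut out by the preimages of $\{0\}$ under the former and of $\{1\}$ under the latter. So $\Phi_A$ is weak$^*$-closed in $\overline{B}_{A^*}$. By the Banach–Alaoglu theorem, $\overline{B}_{A^*}$ is weak$^*$-compact, hence $\Phi_A$ is weak$^*$-compact.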
[definition: Gelfand Topology and Character Space]
The **Gelfand topology** on $\Phi_A$ is the weak$^*$-topology inherited from $A^*$. The compact space $(\Phi_A, w^*)$ is called the **spectrum** of $A$, also the **character space** or **maximal ideal space**.
[/definition]
With $\Phi_A$ now a compact Hausdorff space, every element $x \in A$ defines a continuous function on it.
[definition: Gelfand Transform]
For $x \in A$, define $\hat{x} : \Phi_A \to \mathbb{C}$ by $\hat{x}(\varphi) := \varphi(x)$. Then $\hat{x} \in C(\Phi_A)$ (it is continuous by definition of the Gelfand topology). The function $\hat{x}$ is the **Gelfand transform** of $x$, and the map
\begin{align*}
\widehat{\ } : A \to C(\Phi_A), \quad x \mapsto \hat{x},
\end{align*}
is the **Gelfand map**.
[/definition]
The Gelfand map is a canonical algebra homomorphism from $A$ to a function algebra. Its significance is that all spectral information is preserved: the spectrum of $\hat{x}$ in $C(\Phi_A)$ equals the spectrum of $x$ in $A$.
[quotetheorem:2678]
[citeproof:2678]
The Gelfand map is the commutative Banach algebra analogue of diagonalising a normal matrix. In general it is neither injective nor surjective, and the two failures are worth understanding concretely. It fails to be injective when $A$ contains non-zero quasi-nilpotents: if $r_A(x) = 0$ but $x \neq 0$, then $\hat{x} \equiv 0$ in $C(\Phi_A)$. A concrete example is the commutative algebra of matrices $\begin{pmatrix} a & b \\ 0 & a \end{pmatrix}$ with $a, b \in \mathbb{C}$: its only character is $\begin{pmatrix} a & b \\ 0 & a \end{pmatrix} \mapsto a$, so the non-zero nilpotent $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ lies in the kernel of the Gelfand map. It fails to be surjective for the disk algebra $A(\overline{\mathbb{D}})$: the characters are the evaluation maps $\delta_z$ for $z \in \overline{\mathbb{D}}$, and the Gelfand transform of $f \in A(\overline{\mathbb{D}})$ is $\hat{f}(\delta_z) = f(z)$, so the Gelfand map is an isometry whose image is $A(\overline{\mathbb{D}}) \subsetneq C(\overline{\mathbb{D}})$. The failure of injectivity is measured precisely by the Jacobson radical.
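A small computation shows how a nilpotent element is invisible to characters. The sketch below assumes the two-dimensional commutative algebra $\{aI + bN : a, b \in \mathbb{C}\}$ with $N^2 = 0$, encoded as pairs `(a, b)`; the names `mul` and `character` are illustrative.

```python
# Elements a*I + b*N with N^2 = 0, stored as pairs (a, b).

def mul(u, v):
    """(a1 + b1 N)(a2 + b2 N) = a1 a2 + (a1 b2 + a2 b1) N."""
    (a1, b1), (a2, b2) = u, v
    return (a1 * a2, a1 * b2 + a2 * b1)


def character(u):
    """The map a*I + b*N -> a. Any character phi has phi(N)^2 = phi(N^2) = 0."""
    return u[0]


N = (0, 1)
u, v = (2, 5), (-1, 3)

print(mul(N, N))                                        # N is nilpotent
print(character(mul(u, v)), character(u) * character(v))
print(character(N))                                     # the Gelfand transform of N
```

Since $\varphi(N) = 0$ is forced for every character $\varphi$, the non-zero element $N$ has Gelfand transform identically zero.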
### The Jacobson Radical
The Gelfand map can fail to be injective: if $r_A(x) = 0$, then $\hat{x} \equiv 0$ even when $x \neq 0$. Which elements are killed by every character simultaneously? An element $x$ lies in the kernel of every character iff $\varphi(x) = 0$ for all $\varphi \in \Phi_A$, i.e., iff $\sigma_A(x) = \{0\}$. These are precisely the **quasi-nilpotent** elements — elements that look nilpotent from the spectral perspective, even if $x^n \neq 0$ for all $n$. The Jacobson radical is the subspace they form.
[definition: Jacobson Radical]
The **Jacobson radical** of a commutative, unital BA $A$ is:
\begin{align*}
J(A) := \ker(\widehat{\ }) = \bigcap_{\varphi \in \Phi_A} \ker(\varphi) = \bigcap_{M \in \mathfrak{M}_A} M.
\end{align*}
Equivalently, $J(A) = \{x \in A : \sigma_A(x) = \{0\}\}$, the set of elements of zero spectral radius.
[/definition]
The three descriptions of $J(A)$ (as the kernel of the Gelfand map, as the intersection of all maximal ideals, and as the set of zero-spectral-radius elements) each illuminate a different facet of the same object.
[definition: Semisimple Algebra]
$A$ is **semisimple** if $J(A) = \{0\}$, i.e. if the Gelfand map is injective.
[/definition]
Semisimplicity is the condition under which the Gelfand map becomes a faithful representation of $A$ as a function algebra — every element is determined by its values on characters.
[definition: Quasi-Nilpotent Element]
An element $x \in A$ is **quasi-nilpotent** if $\lim_{n \to \infty} \|x^n\|^{1/n} = 0$, equivalently if $r_A(x) = 0$, equivalently if $\sigma_A(x) = \{0\}$.
[/definition]
Thus $J(A)$ is exactly the set of quasi-nilpotent elements of $A$. Nilpotent elements (with $x^n = 0$ for some $n$) are automatically quasi-nilpotent, but not conversely. The canonical example of a quasi-nilpotent that is not nilpotent is the **Volterra operator**: $V : L^2([0,1]) \to L^2([0,1])$ defined by $(Vf)(x) = \int_0^x f(t)\,dt$. An iterated integral computation gives $\|V^n\|_{L^2 \to L^2} \le 1/n!$, so $\|V^n\|^{1/n} \le (1/n!)^{1/n} \to 0$ as $n \to \infty$, giving $r(V) = 0$. Yet $V^n \neq 0$ for all $n$: the Volterra operator is genuinely quasi-nilpotent, not nilpotent. The algebra $\mathcal{L}(L^2([0,1]))$ is non-commutative and has no characters, so to apply the Gelfand theory one passes to the commutative closed unital subalgebra $A_V \subset \mathcal{L}(L^2([0,1]))$ generated by $V$. There the Gelfand map kills $V$ entirely: $\hat{V} \equiv 0$ on $\Phi_{A_V}$. A semisimple algebra has no non-zero quasi-nilpotents: the Gelfand map is an honest representation.
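The behaviour of the Volterra operator can be traced exactly on polynomials. The sketch below applies $V$ repeatedly to the constant function $1$, using exact rational arithmetic, and recovers $V^n 1 = x^n/n!$. This only illustrates the factorial decay and the fact that $V^n \neq 0$; it is not a proof of the operator-norm bound. The helper name `volterra` is hypothetical.

```python
from fractions import Fraction
from math import factorial


def volterra(coeffs):
    """Integrate a polynomial from 0 to x; coeffs[k] is the coefficient of x**k."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]


p = [Fraction(1)]            # the constant function 1
leading = []
for n in range(1, 6):
    p = volterra(p)          # now p represents V^n 1 = x^n / n!
    leading.append(p[-1])

# n-th roots of 1/n!, the quantity controlling the spectral radius estimate.
roots = [(1 / factorial(n)) ** (1 / n) for n in (1, 5, 20, 80)]

print(leading)
print(roots)
```

The leading coefficients are $1, 1/2, 1/6, 1/24, 1/120$, never zero, while $(1/n!)^{1/n}$ decreases towards $0$.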
[quotetheorem:2679]
[citeproof:2679]
The commutativity hypothesis in this theorem is not a technicality — it is essential. Consider the nilpotent matrices $E_{12} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ and $E_{21} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$ in $M_2(\mathbb{C})$. Both are nilpotent: $E_{12}^2 = E_{21}^2 = 0$, so $r(E_{12}) = r(E_{21}) = 0$. Their sum is $E_{12} + E_{21} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, which has eigenvalues $\pm 1$, so $r(E_{12} + E_{21}) = 1$. This violates $r(x+y) \leq r(x) + r(y) = 0$ — subadditivity fails completely without commutativity. The character-based proof breaks down because $\mathcal{L}(\mathbb{C}^2) = M_2(\mathbb{C})$ has no characters at all (it is a simple algebra with no proper ideals of codimension 1).
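The failure of subadditivity for $E_{12}$ and $E_{21}$ is easy to confirm by direct computation. A minimal sketch, computing the spectral radius of a $2 \times 2$ matrix from its trace and determinant (the helper `spectral_radius` is an illustrative name):

```python
import cmath


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def spectral_radius(m):
    """Largest modulus of an eigenvalue of a 2x2 matrix."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + root) / 2), abs((tr - root) / 2))


E12 = [[0, 1], [0, 0]]
E21 = [[0, 0], [1, 0]]
S = [[E12[i][j] + E21[i][j] for j in range(2)] for i in range(2)]

print(spectral_radius(E12), spectral_radius(E21), spectral_radius(S))
print(matmul(E12, E21), matmul(E21, E12))   # the two products differ
```

The two products are the distinct diagonal projections, which confirms that $E_{12}$ and $E_{21}$ do not commute.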
The abstract spectral theory of Chapter 5 is made concrete and powerful in Chapter 6 through the holomorphic functional calculus: for an element $x$ of a commutative Banach algebra, the HFC defines $f(x)$ for any function $f$ holomorphic on a neighbourhood of the spectrum of $x$, via Cauchy's integral formula. This functional calculus is then used to prove Runge's approximation theorem, demonstrating a surprising bridge between functional analysis and classical complex analysis.
# 6. Holomorphic Functional Calculus
This chapter develops the **Holomorphic Functional Calculus (HFC)**, which gives a rigorous meaning to expressions like $f(a)$ for an element $a$ of a commutative Banach algebra and a holomorphic function $f$. The polynomial spectral mapping theorem established in Chapter 5 is the starting point: the HFC extends that result from polynomials to all holomorphic functions using contour integration. The key difficulty is that while polynomials in $a$ are perfectly well-defined via the algebra operations, extending to arbitrary analytic functions requires a new idea. The natural one, originating in complex analysis, is Cauchy's integral formula — and the HFC is precisely a Banach-space-valued version of it. The chapter builds up from vector-valued integration, establishes the main theorem, and then uses it to prove Runge's Approximation Theorem — an elegant exchange of ideas between functional analysis and complex analysis.
## Setting Up the Algebra of Holomorphic Functions
The central question is: what is the natural domain on which the functional calculus $\Theta_x$ should be defined? Polynomials are too rigid — there is no polynomial formula for $\log$ or $\exp$ on a large disk. Smooth functions admit no Cauchy integral formula. The right space turns out to be the algebra $\mathcal{O}(U)$ of functions holomorphic on some open neighborhood $U$ of the spectrum, but its topology must be chosen so that contour integrals against elements of $A$ are continuous in the function $f$. The locally uniform topology — generated by sup-norms over compact subsets of $U$ — is exactly what makes this work.
Let $U \subset \mathbb{C}$ be non-empty and open. We write $\mathcal{O}(U)$ for the algebra of all analytic functions $f : U \to \mathbb{C}$. For a compact set $\varnothing \neq K \subset U$, define the seminorm $\|f\|_K := \sup_K |f|$. The family of all such seminorms (over all compact $K \subset U$) turns $\mathcal{O}(U)$ into a locally convex space (lcs), and in fact a Fréchet algebra: the algebra operations are jointly continuous.
Write $\mathbb{1}_U \in \mathcal{O}(U)$ for the constant function $1$ on $U$, and $\operatorname{id} \in \mathcal{O}(U)$ for the identity map $z \mapsto z$. Then $\mathcal{O}(U)$ is a unital algebra with unit $\mathbb{1}_U$, and every polynomial $p(z) = \sum_{k=0}^n a_k z^k$ belongs to $\mathcal{O}(U)$ (as a finite linear combination of powers of $\operatorname{id}$).
The main theorem of the chapter asserts the existence and uniqueness of a functional calculus on this algebra:
[quotetheorem:2680]
The proof requires significant set-up and is given in stages below. Before embarking on it, it is worth pausing to understand both what the theorem says and what it requires.
[remark: Interpreting the HFC as Evaluation]
If $p = \sum_{k=0}^n a_k \operatorname{id}^k$ is a polynomial and $\Theta_x$ is a homomorphism with $\Theta_x(\operatorname{id}) = x$, then $\Theta_x(p) = \sum_{k=0}^n a_k x^k = p(x)$. So for polynomials, $\Theta_x(f)$ is simply evaluation at $x$. The theorem extends this to all holomorphic $f$: we think of $\Theta_x(f)$ as "$f(x)$". The spectral mapping property $\varphi(f(x)) = f(\varphi(x))$ holds for polynomials by direct computation, and the theorem extends it to all analytic $f$. For a concrete example: the HFC defines $\log(M)$ for an invertible matrix $M$ (applied in the commutative algebra generated by $M$), and more generally $\log(x)$ whenever some branch of the logarithm is holomorphic on an open neighbourhood of $\sigma_A(x)$. No single polynomial formula achieves this for all such $x$.
[/remark]
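The Cauchy-integral description of "$f(x)$" can be tested numerically in a matrix algebra. The sketch below assumes $x \in M_2(\mathbb{C})$ (working implicitly in the commutative algebra generated by $x$), takes $f = \exp$, and approximates $\frac{1}{2\pi i}\oint_{|\lambda| = R} f(\lambda)(\lambda 1 - x)^{-1}\,d\lambda$ by the trapezoid rule on a circle of radius $R = 3$ enclosing the spectrum $\{\pm i\}$. The function name `cauchy_apply`, the radius and the number of nodes are illustrative choices.

```python
import cmath
import math


def inv2(m):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def cauchy_apply(f, x, radius, points=64):
    """Trapezoid rule for (1/(2 pi i)) * contour integral of f(lam) (lam - x)^{-1}."""
    total = [[0j, 0j], [0j, 0j]]
    for k in range(points):
        lam = radius * cmath.exp(2j * math.pi * k / points)
        resolvent = inv2([[lam - x[0][0], -x[0][1]],
                          [-x[1][0], lam - x[1][1]]])
        weight = f(lam) * lam / points   # d(lam) = i * lam * d(theta)
        for i in range(2):
            for j in range(2):
                total[i][j] += weight * resolvent[i][j]
    return total


x = [[0.0, 1.0], [-1.0, 0.0]]            # x^2 = -1, spectrum {i, -i}
exp_x = cauchy_apply(cmath.exp, x, radius=3.0)
expected = [[math.cos(1), math.sin(1)], [-math.sin(1), math.cos(1)]]

print(exp_x)
print(expected)
```

Because $x^2 = -1$, the power series gives $\exp(x) = \cos(1)\,1 + \sin(1)\,x$, which is the comparison value `expected`; the trapezoid rule converges very fast here because the integrand is analytic and periodic in the angle.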
Having seen what the theorem produces, we should ask what it requires. Each of the three hypotheses — holomorphy on a neighbourhood of the spectrum, commutativity of $A$, and continuity of $\Theta_x$ — plays a non-redundant role, and the theorem fails the moment any one is dropped. The next discussion examines each in turn.
[explanation: Necessity of the Hypotheses and Limitations]
Each hypothesis in the theorem is essential, and the theorem fails without it.
**Holomorphy on a neighbourhood of the spectrum, not just on the spectrum itself.** The hypothesis requires $f$ to be holomorphic on an open set $U$ containing $\sigma_A(x)$; it is not enough for $f$ to be continuous, or even real-analytic, on $\sigma_A(x)$. The reason is the Cauchy integral formula: the contour $\Gamma$ encircling $\sigma_A(x)$ must lie in the domain of holomorphy of $f$, and the contour necessarily surrounds a neighbourhood of $\sigma_A(x)$. If $f$ were merely continuous on $\sigma_A(x)$, the integral $\frac{1}{2\pi i} \int_\Gamma f(z)(z - x)^{-1}\, dz$ could not be defined, since $f$ would not have values on $\Gamma$. More concretely: take $A = A(\overline{\mathbb{D}})$ and $x = \mathrm{id}$, so that $\sigma_A(x) = \overline{\mathbb{D}}$, and let $f(z) = \bar{z}$, which is continuous on the spectrum but holomorphic on no open set. The only reasonable candidate for "$f(x)$" is the function $z \mapsto \bar{z}$, which does not belong to the disk algebra.
**Commutativity.** The proof of the homomorphism property $\Theta_x(fg) = \Theta_x(f)\Theta_x(g)$ uses commutativity of $A$ in an essential way: multiplicativity is extended from rational functions (where it follows by algebra) to all holomorphic $f$ by density, and the density argument invokes the characters $\Phi_A$, which describe the spectrum only in the commutative setting. For non-commutative $A$ the character space may be empty, so the spectral mapping property in the form $\varphi(f(x)) = f(\varphi(x))$ carries no information. One standard way to treat a non-commutative algebra is to apply the theorem inside a maximal commutative subalgebra containing $x$, where the spectrum of $x$ is unchanged. A calculus for merely continuous functions is available for normal elements of $C^*$-algebras, but that requires the additional $C^*$-structure.
**Uniqueness is within the class of continuous unital homomorphisms.** The theorem asserts that $\Theta_x$ is the unique continuous unital homomorphism $\mathcal{O}(U) \to A$ sending $\operatorname{id} \mapsto x$. Without continuity, uniqueness is not guaranteed: discontinuous homomorphisms from $\mathcal{O}(U)$ to $A$ that extend evaluation on polynomials are not excluded by the algebraic conditions alone. The role of continuity is precisely to allow a density argument: once two continuous homomorphisms agree on the dense subalgebra $\mathcal{R}(U)$, they agree everywhere. The theorem does not assert that every map $f \mapsto f(x)$ is the HFC; it asserts that there is exactly one such map with the right continuity and unitality properties, which is what makes it canonical.
[/explanation]
Alongside the HFC, we will prove a classical complex-analytic result by purely functional-analytic means:
[quotetheorem:2681]
The statement compresses two distinct claims — a density result and a precise topological constraint on which poles are needed — and both deserve unpacking. The discussion below clarifies the role of the bounded components of $\mathbb{C} \setminus K$, shows why polynomials alone are insufficient, and contrasts Runge's theorem with stronger approximation results.
[explanation: What Runge Says and What It Does Not]
Runge's theorem answers a question raised by polynomial approximation: which functions can be uniformly approximated on $K$ by rational functions? The answer is that every function holomorphic on some neighborhood of $K$ can be. The bounded components of $\mathbb{C} \setminus K$ are the key obstruction: each one may need to contain a pole of the approximating rational functions.
To see why the hypothesis on $\Lambda$ is necessary, consider $K = \{z \in \mathbb{C} : 1/2 \le |z| \le 2\}$, the closed annulus. The bounded component of $\mathbb{C} \setminus K$ is the open disk $\{|z| < 1/2\}$. The function $f(z) = 1/z$ is holomorphic on $\mathbb{C} \setminus \{0\}$, which is an open neighborhood of $K$, so $f \in \mathcal{O}(K)$. But $f$ cannot be uniformly approximated on $K$ by polynomials: if $p_n \to 1/z$ uniformly on $K$, then integrating around $|z| = 1$ would give $\int_{|z|=1} p_n(z)\, dz \to \int_{|z|=1} z^{-1}\, dz = 2\pi i$, yet $\int_{|z|=1} p_n(z)\, dz = 0$ for every polynomial $p_n$ by Cauchy's theorem. So polynomial approximation fails, and at least one pole inside the hole $\{|z| < 1/2\}$ is required. The refined Runge statement says one pole per bounded component suffices.
The theorem also does not claim that polynomials are dense in $\mathcal{O}(K)$ for general $K$: polynomial approximation of every $f \in \mathcal{O}(K)$ is possible only when $\mathbb{C} \setminus K$ is connected, so that there are no holes to fill with poles. Nor should it be confused with Mergelyan's theorem, a separate and stronger statement about functions that are merely continuous on $K$ and holomorphic in its interior. Runge's theorem is about rational approximation of functions holomorphic near $K$, where poles are available to handle the topological obstructions.
[/explanation]
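The obstruction in the annulus example can be checked numerically. The following is a minimal sketch, assuming a plain Riemann-sum discretisation of the contour integral over $|z| = 1$; the helper name `circle_integral` and the sample polynomial `p` are illustrative choices, not part of the course material.

```python
import cmath
import math

def circle_integral(g, radius=1.0, n=2000):
    """Approximate the integral of g over |z| = radius (counterclockwise)
    by an n-point Riemann sum in the parametrisation z = radius * e^{it}."""
    total = 0j
    for j in range(n):
        t = 2 * math.pi * j / n
        z = radius * cmath.exp(1j * t)
        dz = 1j * z * (2 * math.pi / n)  # gamma'(t) * dt
        total += g(z) * dz
    return total

def p(z):
    # An arbitrary sample polynomial (illustrative choice).
    return 3 * z**4 - (2 + 1j) * z**2 + 5 * z - 7

poly_integral = circle_integral(p)
inverse_integral = circle_integral(lambda z: 1 / z)

print(abs(poly_integral))                    # numerically zero
print(abs(inverse_integral - 2j * math.pi))  # numerically zero
```

Since $\left|\int_{|z|=1} (p(z) - z^{-1})\, dz\right| = 2\pi$ for every polynomial $p$ and the circle has length $2\pi$, the length bound for contour integrals forces $\sup_{|z|=1} |p(z) - z^{-1}| \ge 1$: no polynomial comes uniformly closer than $1$ to $1/z$ on $K$.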
The two theorems are proved together: most of the HFC is established first (Theorem 6.2), then Runge's theorem is deduced from it, and finally the density result of Runge's theorem is used to complete the proof of the HFC (specifically, to verify the homomorphism property by a density argument). The circularity is productive: the map $\Theta_x$ is partially defined and verified first, Runge is then proved as a corollary for the specific algebra $\mathcal{C}(K)$, and the density result from Runge flows back to close the loop.
## Vector-Valued Integration
The HFC is defined via a contour integral in which the integrand takes values in the Banach algebra $A$, not in $\mathbb{C}$. A direct attempt to extend scalar integration to $X$-valued functions runs into a subtlety: Riemann sums are still well-defined as limits in the norm topology, but the key properties — especially the ability of bounded linear functionals to pass through the integral — must be verified from scratch. Without this commutation property, the connection between the vector-valued Cauchy integral and scalar Cauchy formulas would be severed, and the entire approach would fail. We therefore need a theory of integration for Banach-space-valued continuous functions whose foundational properties we now record.
[definition: Riemann Integral of a Banach-Valued Function]
Let $a < b \in \mathbb{R}$, let $X$ be a Banach space, and let $f : [a, b] \to X$ be continuous. For each $n \in \mathbb{N}$, let $\Delta_n : a = t_0^{(n)} < t_1^{(n)} < \cdots < t_{k_n}^{(n)} = b$ be a dissection with mesh $|\Delta_n| := \max_{1 \le j \le k_n}(t_j^{(n)} - t_{j-1}^{(n)}) \to 0$ as $n \to \infty$. Then define:
\begin{align*}
\int_a^b f(t)\, dt := \lim_{n \to \infty} \sum_{j=1}^{k_n} f(t_j^{(n)})(t_j^{(n)} - t_{j-1}^{(n)}).
\end{align*}
This limit exists in $X$ and is independent of the choice of dissections $(\Delta_n)_n$ (using the uniform continuity of $f$ on the compact interval $[a, b]$).
[/definition]
The key property connecting this to the classical integral is that continuous linear functionals pass through it:
[quotetheorem:2682]
[citeproof:2682]
This commutation property is the entire reason vector-valued integration is workable for our purposes. The continuity of $\varphi$ is essential: without it, the limit of Riemann sums need not commute with $\varphi$, and the reduction to scalar integrals would fail. The result also identifies the correct proof technique for essentially every claim about vector-valued contour integrals: reduce to the scalar case via an arbitrary $\varphi \in X^*$, apply the classical scalar result, then invoke Hahn-Banach to conclude the vector-valued statement. This template will recur throughout the chapter.
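The commutation property can be illustrated in the simplest case $X = \mathbb{R}^2$. The sketch below uses right-endpoint Riemann sums as in the definition; the curve `f` and the functional `phi` are illustrative assumptions chosen only so that the exact answer is known.

```python
import math

def riemann_sum(f, a, b, n):
    """Right-endpoint Riemann sum of a vector-valued f over n equal subintervals."""
    dt = (b - a) / n
    dim = len(f(a))
    total = [0.0] * dim
    for j in range(1, n + 1):
        value = f(a + j * dt)
        for i in range(dim):
            total[i] += value[i] * dt
    return total

def f(t):
    # A continuous curve in X = R^2 (illustrative choice).
    return [math.cos(t), t * t]

def phi(v):
    # A bounded linear functional on R^2 (illustrative choice).
    return 3.0 * v[0] - 2.0 * v[1]

n = 20000
vector_integral = riemann_sum(f, 0.0, 1.0, n)
lhs = phi(vector_integral)                                # phi applied to the integral of f
rhs = riemann_sum(lambda t: [phi(f(t))], 0.0, 1.0, n)[0]  # integral of phi(f(t))
exact = 3.0 * math.sin(1.0) - 2.0 / 3.0

print(lhs, rhs, exact)
```

At the level of Riemann sums the two sides agree by linearity of `phi` alone; continuity of $\varphi$ is what allows the identity to survive the passage to the limit in a general Banach space.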
With this in hand, we define contour integrals of $X$-valued functions:
[definition: Vector-Valued Contour Integral]
Let $\gamma : [a, b] \to \mathbb{C}$ be a (piecewise smooth) path with image $[\gamma] := \operatorname{Im}(\gamma)$. Let $f : [\gamma] \to X$ be continuous. Define:
\begin{align*}
\int_\gamma f(z)\, dz := \int_a^b f(\gamma(t)) \cdot \gamma'(t)\, dt \in X.
\end{align*}
If $\Gamma = (\gamma_1, \ldots, \gamma_m)$ is a chain (a finite sequence of paths) with $[\Gamma] = \bigcup_{j=1}^m [\gamma_j]$, set $\int_\Gamma f(z)\, dz := \sum_{j=1}^m \int_{\gamma_j} f(z)\, dz$.
[/definition]
From the scalar estimate one obtains, for a single path, the bound:
\begin{align*}
\left\|\int_\gamma f(z)\, dz\right\| \le \|f\|_{[\gamma]} \cdot \ell(\gamma),
\end{align*}
where $\|f\|_{[\gamma]} = \sup_{z \in [\gamma]} \|f(z)\|$ and $\ell(\gamma) = \int_a^b |\gamma'(t)|\, dt$ is the length of $\gamma$. For a chain, the same bound holds with $\ell(\Gamma) = \sum_{j=1}^m \ell(\gamma_j)$.
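One way to obtain this bound follows the functional template, sketched here under the assumption that the norming-functional corollary of Hahn-Banach from Chapter 1 is available. Write $I = \int_\gamma f(z)\, dz$ and choose $\varphi \in X^*$ with $\|\varphi\| = 1$ and $\varphi(I) = \|I\|$. Passing $\varphi$ through the integral and applying the scalar estimate gives
\begin{align*}
\|I\| = \varphi(I) &= \int_a^b \varphi\bigl(f(\gamma(t))\bigr)\, \gamma'(t)\, dt \\
&\le \int_a^b \bigl|\varphi\bigl(f(\gamma(t))\bigr)\bigr|\, |\gamma'(t)|\, dt \\
&\le \int_a^b \|f(\gamma(t))\|\, |\gamma'(t)|\, dt \le \|f\|_{[\gamma]} \cdot \ell(\gamma).
\end{align*}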
The following vector-valued Cauchy theorem is the foundation of the HFC. The proof strategy is precisely the Hahn-Banach template from above: reduce to the scalar case via functionals.
[quotetheorem:2683]
[citeproof:2683]
The winding-number hypothesis is essential. Without it, the scalar Cauchy theorem fails, and the reduction to the scalar case yields nothing. A concrete illustration: take $X = \mathbb{C}$, $f(z) = 1/z$, and $\Gamma = \partial B(0,1)$ oriented counterclockwise. Then $n(\Gamma, 0) = 1 \neq 0$, and indeed $\int_\Gamma f(z)\, dz = 2\pi i \neq 0$. The point $0$ lies outside $U = \mathbb{C} \setminus \{0\}$, so the theorem is not violated — but it shows what happens when a contour has non-zero winding around a singularity. For the HFC, we will use this theorem with $f(z) = g(z)(z - x)^{-1}$ and $\Gamma$ chosen to wind around $\sigma_A(x)$ with winding number $1$ — the theorem then guarantees that deforming $\Gamma$ (without crossing $\sigma_A(x)$) does not change the value of the integral. This independence-of-contour fact is what makes $\Theta_x(f)$ well-defined.
## Constructing and Proving the Functional Calculus
The problem is to define $f(x)$ for $x \in A$ and $f \in \mathcal{O}(U)$ when the power series approach is unavailable. The series $\sum_{n=0}^\infty c_n x^n$ is guaranteed to converge in $A$ when the radius of convergence of $\sum c_n z^n$ exceeds $\|x\|$, but a single disc of convergence may fail to contain the full spectrum. For $f = \log$, or $f(z) = (z - \lambda)^{-1}$ with $\lambda \notin \sigma_A(x)$, no single power series need converge on a neighborhood of all of $\sigma_A(x)$. The right idea is to use Cauchy's integral formula directly: in the scalar case, $f(\lambda) = \frac{1}{2\pi i} \int_\Gamma f(z)(z - \lambda)^{-1}\, dz$. Replacing the scalar $\lambda$ with the Banach-algebra element $x$ gives the candidate definition $\Theta_x(f) = \frac{1}{2\pi i} \int_\Gamma f(z)(z \cdot 1 - x)^{-1}\, dz$, where the resolvent $(z \cdot 1 - x)^{-1}$ plays the role of $(z - \lambda)^{-1}$.
Let $A$, $x$, $U$ be as in Theorem 6.1, and set $K = \sigma_A(x) \subset U$. By a standard fact from complex analysis (see the Complex Analysis handout), there exists a cycle $\Gamma$ in $U \setminus K$ with winding numbers:
\begin{align*}
n(\Gamma, \omega) = \begin{cases} 0 & \text{if } \omega \notin U, \\ 1 & \text{if } \omega \in K. \end{cases}
\end{align*}
Concretely, $\Gamma$ is a finite union of rectangles whose interiors cover $K$ and whose boundaries lie in $U \setminus K$.
<!-- illustration-needed: A cycle Γ in the annular region U∖K encircling the compact spectrum K exactly once, with winding number 1 on K and 0 outside U. Show several rectangular contours assembled into the cycle, the compact set K shaded in the interior, and the open set U as a larger region. -->
[definition: Cauchy Integral Formula for Banach Algebras]
With notation as above, define $\Theta_x : \mathcal{O}(U) \to A$ by:
\begin{align*}
\Theta_x(f) := \frac{1}{2\pi i} \int_\Gamma f(z) \cdot (z \cdot 1 - x)^{-1}\, dz.
\end{align*}
[/definition]
This is well-defined because $[\Gamma]$ is a compact subset of $U \setminus \sigma_A(x) \subset \rho_A(x)$, and $z \mapsto (z \cdot 1 - x)^{-1}$ is continuous (and hence bounded) on $[\Gamma]$; continuity follows from the continuity of inversion on $G(A)$ (Chapter 5).
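For a concrete feel, the defining integral can be approximated numerically when $x$ is a $2 \times 2$ matrix (viewed inside the commutative algebra it generates). This is a minimal sketch, assuming a circular contour discretised at equally spaced points; the names `resolvent` and `theta` are hypothetical helpers, and the check uses $f(z) = z^2$, for which the answer should be $x^2$.

```python
import cmath
import math

def resolvent(x, z):
    """(z*1 - x)^{-1} for a 2x2 matrix x = [[a, b], [c, d]]."""
    (a, b), (c, d) = x
    p, q, r, s = z - a, -b, -c, z - d
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def theta(x, f, center, radius, n=400):
    """Approximate (1/(2 pi i)) * integral of f(z) (z*1 - x)^{-1} dz over
    the circle |z - center| = radius, sampled at n equally spaced points."""
    acc = [[0j, 0j], [0j, 0j]]
    for j in range(n):
        w = radius * cmath.exp(2j * math.pi * j / n)
        z = center + w
        dz = 1j * w * (2 * math.pi / n)
        res = resolvent(x, z)
        fz = f(z)
        for r in range(2):
            for s in range(2):
                acc[r][s] += fz * res[r][s] * dz
    return [[v / (2j * math.pi) for v in row] for row in acc]

x = [[2.0, 1.0], [0.0, 3.0]]  # spectrum {2, 3}
approx = theta(x, lambda z: z * z, center=2.5, radius=2.0)
exact = [[4.0, 5.0], [0.0, 9.0]]  # x squared, computed by hand
error = max(abs(approx[r][s] - exact[r][s]) for r in range(2) for s in range(2))
print(error)
```

The circle of radius $2$ about $2.5$ winds once around both spectral points, which is the matrix analogue of the winding-number condition on $\Gamma$.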
[quotetheorem:2684]
[citeproof:2684]
It is worth pausing to note the structure of what was just proved: every property except the homomorphism property has been verified, and the homomorphism property was deliberately deferred. The reason is methodological — verifying multiplicativity directly from the contour integral formula leads to a double integral that does not simplify, but verifying it on rational functions is straightforward. The next remark explains why this two-step strategy is the right one.
[remark: Why Rational Functions First]
The argument for Property 2 bypasses any notion of $f(x)$ for general analytic $f$ — it relies only on the algebra structure of $A$ and the vector-valued Cauchy theorem. The extension to all holomorphic $f$ then goes by density: rational functions without poles in $U$ are dense in $\mathcal{O}(U)$, which is exactly the content of Runge's theorem. The strategy is therefore circular in a productive way: partial HFC $\Rightarrow$ Runge $\Rightarrow$ full HFC.
[/remark]
Properties 3 and 4 deserve emphasis. Property 3 is the conceptual heart of the construction: it reduces the claim that $\varphi(\Theta_x(f)) = f(\varphi(x))$, a statement about the element $\Theta_x(f)$ of the Banach algebra $A$, to the scalar Cauchy integral formula, via the characters of $A$. Without commutativity, the characters $\Phi_A$ would not determine the spectrum of elements of $A$, and this reduction would be unavailable. Property 4 is then a formal consequence: the spectrum is computed by evaluating all characters. Together, they confirm that the integral formula for $\Theta_x(f)$ is the correct generalization of evaluating $f$ at $x$.
[example: Resolvent as a Functional Calculus Output]
The function $f_\lambda(z) = (z - \lambda)^{-1}$, for a fixed $\lambda \notin \sigma_A(x)$, is holomorphic on $U = \mathbb{C} \setminus \{\lambda\}$, which contains $\sigma_A(x)$. Applying the HFC:
\begin{align*}
\Theta_x(f_\lambda) = \frac{1}{2\pi i} \int_\Gamma (z - \lambda)^{-1} (z \cdot 1 - x)^{-1}\, dz.
\end{align*}
Writing $x - \lambda \cdot 1 = (z - \lambda) \cdot 1 - (z \cdot 1 - x)$ gives the algebraic identity $(z - \lambda)^{-1}(z \cdot 1 - x)^{-1}(x - \lambda \cdot 1) = (z \cdot 1 - x)^{-1} - (z - \lambda)^{-1} \cdot 1$, which holds in any unital algebra. Integrating over $\Gamma$, the first term contributes $\Theta_x(1) = 1$ and the second contributes $n(\Gamma, \lambda) \cdot 1 = 0$ because $\lambda \notin U$, so $\Theta_x(f_\lambda)(x - \lambda \cdot 1) = 1$. The spectral mapping is consistent with this:
\begin{align*}
\sigma_A(\Theta_x(f_\lambda)) = \{(\mu - \lambda)^{-1} : \mu \in \sigma_A(x)\} = \sigma_A((x - \lambda \cdot 1)^{-1}).
\end{align*}
Hence $\Theta_x(f_\lambda) = (x - \lambda \cdot 1)^{-1} = -(\lambda \cdot 1 - x)^{-1}$: up to the sign convention, the HFC recovers the resolvent. This is not just a sanity check: it shows that the resolvent, which was defined algebraically, is already an instance of the functional calculus. The power series expansion $(z \cdot 1 - x)^{-1} = \sum_{n \ge 0} x^n z^{-(n+1)}$ (valid for $|z| > \|x\|$) is a special case, but the HFC handles all $\lambda \notin \sigma_A(x)$, not just large $\lambda$.
[/example]
The previous example showed the HFC reproducing a familiar object, the resolvent, and so functioned as a consistency check. The next example uses the HFC to *construct* something new, an idempotent associated to a topological splitting of the spectrum. This is qualitatively different: the input is not given by a single analytic formula but is a locally constant function, exploiting the disconnectedness of $\sigma_A(x)$, and the output is an algebraic feature ($e^2 = e$) extracted purely from the topology.
[example: Spectral Projection from an Idempotent]
Suppose $\sigma_A(x) = \sigma_1 \cup \sigma_2$ where $\sigma_1$ and $\sigma_2$ are non-empty, disjoint, and both open-and-closed in $\sigma_A(x)$ (e.g., $x$ has isolated eigenvalues). Choose disjoint open sets $U_1 \supset \sigma_1$ and $U_2 \supset \sigma_2$, set $U = U_1 \cup U_2$, and define $f : U \to \mathbb{C}$ by $f|_{U_1} = 1$ and $f|_{U_2} = 0$. Then $f$ is holomorphic on $U$ and $f^2 = f$, so since $\Theta_x$ is a homomorphism:
\begin{align*}
\Theta_x(f)^2 = \Theta_x(f^2) = \Theta_x(f).
\end{align*}
Thus $e := \Theta_x(f) \in A$ is an idempotent. The spectral mapping gives $\sigma_A(e) = f(\sigma_A(x)) = \{0, 1\}$ (both values occur because $\sigma_1$ and $\sigma_2$ are non-empty), so $e$ is neither $0$ nor $1$. When $A$ is a commutative closed subalgebra of $\mathcal{L}(H)$ for a Hilbert space $H$, such an idempotent is a bounded projection onto a closed subspace; it is an orthogonal projection exactly when $e$ is also self-adjoint, as happens, for instance, when $x$ is normal. So the HFC produces spectral projections directly from the topology of the spectrum, isolating the part of $x$ corresponding to $\sigma_1$. This construction is the prototype for the spectral theorem: decomposing an operator via indicator functions on parts of its spectrum.
[/example]
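The idempotent construction can be tried out on a $2 \times 2$ matrix with two distinct eigenvalues. The sketch below is illustrative only: it assumes a circular contour around $\sigma_1 = \{1\}$ that excludes $\sigma_2 = \{4\}$, so that integrating the resolvent alone corresponds to $f = 1$ near $\sigma_1$ and $f = 0$ near $\sigma_2$. The helper names are hypothetical.

```python
import cmath
import math

def resolvent(x, z):
    """(z*1 - x)^{-1} for a 2x2 matrix x = [[a, b], [c, d]]."""
    (a, b), (c, d) = x
    p, q, r, s = z - a, -b, -c, z - d
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def mul2(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def max_diff(m, n):
    return max(abs(m[i][j] - n[i][j]) for i in range(2) for j in range(2))

def spectral_idempotent(x, center, radius, n=200):
    """(1/(2 pi i)) * integral of (z*1 - x)^{-1} dz over |z - center| = radius."""
    acc = [[0j, 0j], [0j, 0j]]
    for j in range(n):
        w = radius * cmath.exp(2j * math.pi * j / n)
        dz = 1j * w * (2 * math.pi / n)
        res = resolvent(x, center + w)
        for r in range(2):
            for s in range(2):
                acc[r][s] += res[r][s] * dz
    return [[v / (2j * math.pi) for v in row] for row in acc]

x = [[1.0, 1.0], [0.0, 4.0]]                       # spectrum {1} union {4}
e = spectral_idempotent(x, center=1.0, radius=1.0)  # contour around 1 only

print(max_diff(mul2(e, e), e))           # e is idempotent
print(max_diff(mul2(e, x), mul2(x, e)))  # e commutes with x
print(e[0][1])                           # close to -1/3
```

In this example $e = \begin{pmatrix} 1 & -1/3 \\ 0 & 0 \end{pmatrix}$, which is idempotent but not self-adjoint: the matrix $x$ is not normal, and the resulting projection is oblique rather than orthogonal.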
## Runge's Approximation Theorem via the HFC
The question that Runge's theorem answers is: can every function holomorphic near $K$ be uniformly approximated on $K$ by rational functions? The obstruction is topological: each bounded connected component of $\mathbb{C} \setminus K$ acts as a "hole" through which residues can accumulate, and a rational function needs a pole inside each hole to represent that residue. Phrased differently, if we try to use only polynomials (rational functions with the only "pole" at $\infty$), the bounded components cause the approximation to fail — as we saw with $1/z$ on the annulus. The HFC gives an elegant proof of Runge's theorem by applying the partial results from Theorem 6.2 to the specific commutative Banach algebra $A = \mathcal{C}(K)$.
[citeproof:2681]
Runge's theorem describes density on a compact set $K$, but the application we actually need (to close the proof of the HFC) is density in $\mathcal{O}(U)$ for an open set $U$. The transition is a routine exhaustion argument: a basic neighbourhood in the locally uniform topology on $\mathcal{O}(U)$ tests convergence on a single compact $K \subset U$, so it suffices to approximate on a slightly enlarged compact $\tilde{K} \subset U$, each of whose holes contains a point outside $U$. Runge applied to $\tilde{K}$, with the poles placed at such points, provides the rational approximant.
[quotetheorem:2685]
[citeproof:2685]
## Completing the Proof of the Holomorphic Functional Calculus
The remaining difficulty in proving Theorem 6.1 is the homomorphism property: we need $\Theta_x(fg) = \Theta_x(f)\Theta_x(g)$ for all $f, g \in \mathcal{O}(U)$. This is non-trivial because the contour integral formula for $\Theta_x(fg)$ is not the product of two integrals of the form $\frac{1}{2\pi i}\int_\Gamma h(z)(z - x)^{-1}\, dz$ — writing out $\Theta_x(f)\Theta_x(g)$ as a product of two contour integrals leads to a double integral that does not simplify directly. The resolution is a density argument: prove multiplicativity for rational functions (where it holds by pure algebra), then extend by continuity using the density of $\mathcal{R}(U)$ in $\mathcal{O}(U)$ — which is exactly the Corollary to Runge's theorem. This is where the circular dependence closes.
[citeproof:2685]
With the HFC fully established, we can deploy it to construct elements that no power-series argument can produce. The classical motivating problem is the existence of a logarithm: given an invertible element $x$ of a Banach algebra, can one solve $e^y = x$ for some $y \in A$? Power series fail because $\log$ has no power series expansion converging on all of $\mathbb{C} \setminus \{0\}$, but the HFC handles this with no extra effort.
[example: Logarithm of an Invertible Element]
Let $A$ be a commutative unital Banach algebra and $x \in A$ be invertible. Suppose the spectrum $\sigma_A(x)$ is contained in a simply connected open set $U \subset \mathbb{C}$ not containing $0$ (e.g., an open half-plane). Then there is a branch of $\log$ holomorphic on $U$ (the principal branch, if $U$ avoids the negative real axis), so the HFC provides an element $\log(x) := \Theta_x(\log) \in A$. The spectral mapping gives $\sigma_A(\log(x)) = \{\log(\lambda) : \lambda \in \sigma_A(x)\}$. Moreover, $e^{\log(x)} = x$: the partial sums $\sum_{n \le N} (\log z)^n / n!$ converge to $e^{\log(z)} = z$ locally uniformly on $U$, and $\Theta_x$ is a continuous homomorphism with $\Theta_x(\operatorname{id}) = x$, so $\sum_{n \ge 0} \log(x)^n / n! = \Theta_x(\exp \circ \log) = x$. This would be impossible to establish via power series alone, since in general no single power series for the logarithm converges on all of $U$.
If $0 \in \sigma_A(x)$, the logarithm is simply not defined — no branch of $\log$ is holomorphic on any neighborhood of $0$. This is a concrete illustration of why the spectrum-avoidance hypothesis $0 \notin \sigma_A(x)$ (equivalently, $x$ invertible) is not a technical convenience but a genuine necessity.
[/example]
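The logarithm can be computed numerically for a small matrix and checked against the exponential series. This is a sketch under stated assumptions: the contour is a circle in the right half-plane (so the principal branch `cmath.log` is holomorphic on and inside it), and the helper names `theta` and `exp_series` are hypothetical.

```python
import cmath
import math

def resolvent(x, z):
    """(z*1 - x)^{-1} for a 2x2 matrix x = [[a, b], [c, d]]."""
    (a, b), (c, d) = x
    p, q, r, s = z - a, -b, -c, z - d
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def mul2(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def theta(x, f, center, radius, n=400):
    """Approximate (1/(2 pi i)) * integral of f(z) (z*1 - x)^{-1} dz over a circle."""
    acc = [[0j, 0j], [0j, 0j]]
    for j in range(n):
        w = radius * cmath.exp(2j * math.pi * j / n)
        z = center + w
        dz = 1j * w * (2 * math.pi / n)
        res, fz = resolvent(x, z), f(z)
        for r in range(2):
            for s in range(2):
                acc[r][s] += fz * res[r][s] * dz
    return [[v / (2j * math.pi) for v in row] for row in acc]

def exp_series(m, terms=40):
    """Partial sum of the exponential series sum_k m^k / k!."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mul2(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

x = [[2.0, 1.0], [0.0, 3.0]]  # spectrum {2, 3}, inside |z - 2.5| < 1.5
log_x = theta(x, cmath.log, center=2.5, radius=1.5)
back = exp_series(log_x)
error = max(abs(back[i][j] - x[i][j]) for i in range(2) for j in range(2))
print(log_x[0][0], log_x[0][1], log_x[1][1])  # near log 2, log(3/2), log 3
print(error)
```

The circle $|z - 2.5| = 1.5$ stays at distance at least $1$ from the origin, which is what keeps the branch of the logarithm well defined along the contour.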
Two further remarks situate the HFC within the broader theory. The first concerns the independence of the construction from the choice of cycle, and the second connects the HFC to more powerful functional calculi developed later in the course.
[remark: Dependence on the Cycle $\Gamma$]
The definition of $\Theta_x(f)$ uses a specific cycle $\Gamma$ encircling $\sigma_A(x)$ with winding number $1$. Different choices of $\Gamma$ (satisfying the same winding number conditions) give the same value, by the vector-valued Cauchy theorem applied between two such cycles. Thus $\Theta_x$ is genuinely independent of the choice of $\Gamma$, as one should expect from the analogy with Cauchy's integral formula.
[/remark]
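Independence of the contour is easy to observe numerically. The sketch below, with hypothetical helper names and an illustrative $2 \times 2$ matrix, evaluates the integral for $f = \exp$ over two concentric circles that both enclose the spectrum and compares the results.

```python
import cmath
import math

def resolvent(x, z):
    """(z*1 - x)^{-1} for a 2x2 matrix x = [[a, b], [c, d]]."""
    (a, b), (c, d) = x
    p, q, r, s = z - a, -b, -c, z - d
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def theta(x, f, center, radius, n=400):
    """Approximate (1/(2 pi i)) * integral of f(z) (z*1 - x)^{-1} dz over a circle."""
    acc = [[0j, 0j], [0j, 0j]]
    for j in range(n):
        w = radius * cmath.exp(2j * math.pi * j / n)
        z = center + w
        dz = 1j * w * (2 * math.pi / n)
        res, fz = resolvent(x, z), f(z)
        for r in range(2):
            for s in range(2):
                acc[r][s] += fz * res[r][s] * dz
    return [[v / (2j * math.pi) for v in row] for row in acc]

x = [[2.0, 1.0], [0.0, 3.0]]  # spectrum {2, 3}
small = theta(x, cmath.exp, center=2.5, radius=1.0)
large = theta(x, cmath.exp, center=2.5, radius=3.0)
gap = max(abs(small[i][j] - large[i][j]) for i in range(2) for j in range(2))
print(gap)  # numerically zero: both contours give the same element
```

Both results approximate $\exp(x)$, whose diagonal entries are $e^2$ and $e^3$.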
The second remark looks outward from the HFC to its successors. The functional calculus we have constructed handles holomorphic functions on commutative Banach algebras; richer ambient structures permit richer calculi. The Borel functional calculus, developed later in the course, is the natural extension when the algebra is $\mathcal{L}(H)$ and the element is a normal operator.
[remark: Comparison with the Borel Functional Calculus]
For a normal operator $T \in \mathcal{L}(H)$ on a Hilbert space, the course later develops the Borel Functional Calculus (Chapter 8), which extends the map $f \mapsto f(T)$ from holomorphic functions to all bounded Borel functions on the spectrum. The HFC sits inside the Borel FC as the restriction to analytic $f$. The HFC applies in the much more general setting of commutative Banach algebras and uses no Hilbert space structure, while the Borel FC exploits the richer structure of $\mathcal{L}(H)$ and spectral measures. This chapter is one of the great payoffs of soft analysis: a genuinely non-trivial theorem in complex function theory (Runge's approximation) emerges almost for free from purely algebraic considerations about characters of Banach algebras.
[/remark]
Beyond Banach algebras, Chapter 7 studies $C^*$-algebras: Banach algebras with an involution satisfying the single rigid axiom $\|x^* x\| = \|x\|^2$. This equation forces every commutative unital $C^*$-algebra to be isomorphic to $C(K)$, and enables the construction of positive square roots and polar decompositions of invertible operators.
# 7. $C^*$-Algebras
This chapter introduces $C^*$-algebras, the central objects of modern operator theory. A $C^*$-algebra is a Banach algebra (in the sense of Chapter 5) equipped with an involution satisfying one deceptively simple identity — the $C^*$-equation $\|x^* x\| = \|x\|^2$ — yet this single constraint forces the entire algebraic and analytic structure into rigid shape. The chapter culminates in the Commutative Gelfand–Naimark Theorem, which classifies all commutative unital $C^*$-algebras as continuous function algebras $C(K)$, building on the Gelfand representation developed in Chapter 5 and the holomorphic functional calculus of Chapter 6, and then applies this classification to obtain square roots and polar decompositions for operators on Hilbert spaces.
## Involutions and the $C^*$-Equation
To talk about adjoints in an abstract algebraic setting, we need a counterpart to the Hilbert space adjoint. The right abstraction is an involution.
[definition: Star-Algebra]
A **$*$-algebra** is a complex algebra $A$ equipped with an **involution**: a map $*: A \to A$, $x \mapsto x^*$, satisfying for all $x, y \in A$ and $\lambda, \mu \in \mathbb{C}$:
1. $(\lambda x + \mu y)^* = \bar{\lambda} x^* + \bar{\mu} y^*$ (conjugate-linearity),
2. $(xy)^* = y^* x^*$ (anti-multiplicativity),
3. $x^{**} = x$ (involutivity, i.e., $* \circ * = \mathrm{id}$).
[/definition]
The three axioms have an immediate consequence that is constantly used but easy to overlook: in a unital $*$-algebra the involution must fix the identity. This is not built into the axioms — it falls out of them, and the proof is a small but instructive exercise in pushing $*$ through products.
[remark: Identity Is Self-Adjoint]
If $A$ is unital, then $1^* = 1$. To see this, let $x \in A$ be arbitrary. By anti-multiplicativity and involutivity, $1^* \cdot x = (x^* \cdot 1)^* = (x^*)^* = x$ and $x \cdot 1^* = (1 \cdot x^*)^* = (x^*)^* = x$. So $1^*$ is a two-sided identity of $A$, and since the identity of an algebra is unique, $1^* = 1$.
[/remark]
With an involution in hand, we can add a norm condition to turn a $*$-algebra into a $C^*$-algebra.
[definition: C-Star-Algebra]
A **$C^*$-algebra** is a Banach algebra $A$ with an involution $*$ such that the **$C^*$-equation** holds:
\begin{align*}
\|x^* x\| = \|x\|^2 \qquad \forall x \in A.
\end{align*}
A norm on $A$ satisfying this equation (together with the Banach algebra axioms) is called a **$C^*$-norm**.
[/definition]
The $C^*$-equation is the central axiom of the entire theory. It links the involution to the norm in a way that propagates rigidity throughout the structure — for instance, it immediately forces the norm to be unique (if it exists) and the involution to be isometric. To appreciate why this single axiom is so powerful, it helps to know what can go wrong without it.
[example: A Banach *-Algebra Without the C*-Equation]
Take $A = M_2(\mathbb{C})$ with its operator-norm structure (a unital Banach $*$-algebra under the usual conjugate-transpose $T \mapsto T^*$). Now twist the involution: fix $S = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$ (so $S = S^*$ but $S$ is not unitary) and define $T^\dagger := S T^* S^{-1}$. One checks $\dagger$ is conjugate-linear, anti-multiplicative, and involutive, so $(A, \dagger)$ is a Banach $*$-algebra.
Compute the $C^*$-equation for $T = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$. Then
\begin{align*}
T^\dagger = S T^* S^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1/2 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 2 & 0 \end{pmatrix} \cdot \begin{pmatrix} 1 & 0 \\ 0 & 1/2 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 2 & 0 \end{pmatrix},
\end{align*}
so $T^\dagger T = \begin{pmatrix} 0 & 0 \\ 2 & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}$. The operator norm gives $\|T^\dagger T\| = 2$ but $\|T\|^2 = 1^2 = 1$, so $\|T^\dagger T\| \neq \|T\|^2$. The point is that the $C^*$-equation is an additional axiom — it is not automatic from the Banach algebra and involution structures, and without it the algebraic and metric structures do not constrain each other.
[/example]
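The computation in this example can be reproduced in a few lines. The sketch below assumes the operator norm on $M_2(\mathbb{C})$ is computed as the square root of the largest eigenvalue of $M^* M$; the function names are illustrative.

```python
import math

def mul2(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]

def op_norm(m):
    """Operator norm of a 2x2 matrix: sqrt of the largest eigenvalue of m* m."""
    g = mul2(adjoint(m), m)
    tr = (g[0][0] + g[1][1]).real
    det = (g[0][0] * g[1][1] - g[0][1] * g[1][0]).real
    disc = max(tr * tr - 4.0 * det, 0.0)
    return math.sqrt((tr + math.sqrt(disc)) / 2.0)

S = [[1, 0], [0, 2]]
S_INV = [[1, 0], [0, 0.5]]

def dagger(m):
    """The twisted involution m -> S m* S^{-1} from the example."""
    return mul2(mul2(S, adjoint(m)), S_INV)

T = [[0, 1], [0, 0]]
usual_lhs = op_norm(mul2(adjoint(T), T))    # ||T* T|| with the usual adjoint
twisted_lhs = op_norm(mul2(dagger(T), T))   # ||T^dagger T|| with the twist
rhs = op_norm(T) ** 2                       # ||T||^2

print(usual_lhs, twisted_lhs, rhs)
```

With the usual adjoint the two sides of the $C^*$-equation agree for this $T$, while the twisted involution gives $2$ on the left and $1$ on the right.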
The example shows the $C^*$-equation is genuinely a constraint, not a tautology. Two formal consequences follow immediately from it, and we record them now since they will be invoked throughout the chapter.
[remark: Basic Consequences of the C*-Equation]
Let $A$ be a $C^*$-algebra and $x \in A$.
**(i) The involution is isometric:** $\|x^*\| = \|x\|$.
To see this: from the $C^*$-equation and submultiplicativity of the norm, $\|x\|^2 = \|x^* x\| \leq \|x^*\| \|x\|$, giving $\|x\| \leq \|x^*\|$. Applying the same inequality to $x^*$ gives $\|x^*\| \leq \|x^{**}\| = \|x\|$, so $\|x^*\| = \|x\|$.
**(ii) Unital algebras automatically have $\|1\| = 1$:** From the $C^*$-equation, $\|1\| = \|1^* \cdot 1\| = \|1\|^2$, so $\|1\| \in \{0, 1\}$; since $1 \neq 0$ we have $\|1\| = 1$.
[/remark]
Two further structural notions are needed before we can speak meaningfully of "examples" of $C^*$-algebras: a notion of subobject, and a notion of structure-preserving map. Both must respect the involution as well as the algebra structure.
[definition: Star-Subalgebra]
A **$*$-subalgebra** of a $*$-algebra $A$ is a subalgebra $B \subset A$ that is closed under the involution, i.e., $x \in B \implies x^* \in B$. A closed $*$-subalgebra of a $C^*$-algebra is itself a $C^*$-algebra, and is called a **$C^*$-subalgebra**. In particular, the norm-closure of a $*$-subalgebra of a $C^*$-algebra is again a $C^*$-subalgebra.
[/definition]
The corresponding notion of map between $*$-algebras is an algebra homomorphism that intertwines the involutions.
[definition: Star-Homomorphism]
A **$*$-homomorphism** between $*$-algebras is an algebra homomorphism $\theta: A \to B$ such that $\theta(x^*) = \theta(x)^*$ for all $x \in A$, i.e., $\theta$ and $*$ commute. A bijective $*$-homomorphism is a **$*$-isomorphism**.
[/definition]
Notice that the definition of $*$-homomorphism makes no continuity assumption — only an algebraic compatibility. One might expect that a separate boundedness hypothesis is needed to do analysis with such maps, but in the $C^*$-setting this expectation is wrong, and the next remark explains why.
[remark: Automatic Continuity]
There is no continuity assumption in the definition of a $*$-homomorphism. However, the $C^*$-equation is rigid enough to force continuity automatically — a $*$-homomorphism between $C^*$-algebras is always continuous (and in fact contractive). This is a hallmark of $C^*$-algebra theory: algebraic structure alone determines the analytic structure.
[/remark]
## Examples and Special Elements
We have built up an axiom system, but so far we have not exhibited a single non-trivial $C^*$-algebra. The pressing question is whether the axioms are even consistent — and if so, whether they admit a rich supply of examples or only pathological ones. A naive search is hopeless: the $C^*$-equation is a strong constraint that ties the norm to the involution at every point, and most natural-looking norms on a $*$-algebra simply fail it (we just saw this with the dagger involution on $M_2(\mathbb{C})$).
Two classes of examples stand out, and they turn out to be representative of *all* $C^*$-algebras (a theorem we will state but not prove). The first is commutative: continuous functions on a compact space. The second is non-commutative: bounded operators on a Hilbert space. Verifying the $C^*$-equation in either case is a short calculation, and we record a practical recipe before doing it: it suffices to (a) write down an explicit involution $*$, (b) show $\|x^* x\| = \|x\|^2$ directly from the norm formula. In $C(K)$ this reduces to a pointwise calculation; in $\mathcal{L}(H)$ it follows from the Cauchy–Schwarz inequality.
[example: Fundamental Examples of C*-Algebras]
The two archetypal $C^*$-algebras are:
**(i) $C(K)$ for $K$ compact Hausdorff:** Define $f^* := \bar{f}$ (complex conjugation). This is a commutative, unital $C^*$-algebra with $C^*$-norm $\|f\|_\infty = \sup_{k \in K} |f(k)|$. The $C^*$-equation is immediate: $\|f^* f\|_\infty = \|\bar{f} f\|_\infty = \||f|^2\|_\infty = \|f\|_\infty^2$.
**(ii) $\mathcal{L}(H)$ for $H$ a Hilbert space:** With $T^* :=$ the Hilbert space adjoint of $T$, this is a unital $C^*$-algebra. The $C^*$-equation follows from Cauchy–Schwarz: $\|T^* T\| \geq \sup_{\|x\| \leq 1} |(T^* T x, x)_H| = \sup_{\|x\| \leq 1} \|Tx\|^2 = \|T\|^2$, and combined with $\|T^* T\| \leq \|T^*\| \|T\| = \|T\|^2$, this gives $\|T^*T\| = \|T\|^2$. Any $C^*$-subalgebra of $\mathcal{L}(H)$ is again a $C^*$-algebra.
The Gelfand–Naimark Theorem (below) will show that every commutative unital $C^*$-algebra is isomorphic to $C(K)$, and a deeper result (the general Gelfand–Naimark Theorem, not proved here) shows every $C^*$-algebra embeds isometrically as a $C^*$-subalgebra of some $\mathcal{L}(H)$.
[/example]
Within a $C^*$-algebra, elements that interact nicely with the involution play a distinguished role.
[definition: Hermitian, Unitary, Normal Elements]
Let $A$ be a $C^*$-algebra and $x \in A$.
- $x$ is **Hermitian** (or **self-adjoint**) if $x^* = x$.
- $x$ is **unitary** if $A$ is unital and $x^* x = x x^* = 1$.
- $x$ is **normal** if $x^* x = x x^*$, i.e., $[x, x^*] = 0$.
Hermitian $\implies$ Normal and Unitary $\implies$ Normal (neither implication reverses in general).
[/definition]
The classes of Hermitian, unitary, and normal elements are not exclusive — a single element can fail to be in any class, lie in just one, or lie in several. To get a feel for how restrictive each condition really is, it is worth contrasting them on a concrete normal element where we can compute the spectrum directly and watch which subset each class carves out.
[example: Hermitian vs Unitary on a Concrete Normal Element]
Work in $A = C([0, 2\pi])$ with $f^* = \bar{f}$, and consider the normal element $f(t) = e^{it}$. Then
\begin{align*}
f^* f(t) = \overline{e^{it}} \cdot e^{it} = e^{-it} e^{it} = 1, \qquad f f^*(t) = 1,
\end{align*}
so $f^* f = f f^* = 1$, i.e., $f$ is unitary. But $f^*(t) = e^{-it} \neq e^{it} = f(t)$ whenever $t \notin \{0, \pi, 2\pi\}$, so $f$ is **not** Hermitian. The spectrum is $\sigma_A(f) = f([0, 2\pi]) = S^1$, the unit circle, consistent with unitarity.
Now decompose $f$ into Hermitian parts (per the Hermitian decomposition below): $f = h + ik$ with $h(t) = \cos t$ and $k(t) = \sin t$. Each of $h, k$ is Hermitian (real-valued), and $\sigma_A(h) = h([0, 2\pi]) = [-1, 1] \subset \mathbb{R}$ — consistent with the spectral inclusion theorem we will prove. Moreover $h^2 + k^2 = 1$ (the Pythagorean identity in $C(K)$), which is precisely the abstract relation $f^* f = (h - ik)(h + ik) = h^2 + k^2 = 1$.
Two structural points emerge. First, the implications Hermitian $\implies$ Normal and Unitary $\implies$ Normal do not reverse: $f$ is normal but not Hermitian, and $h$ is normal but not unitary. Second, in $C(K)$ the Gelfand picture is visible directly: the spectrum equals the image $f(K)$, so being Hermitian (real-valued image) and being unitary (unit-modulus image) are mutually exclusive except for functions taking values in $\{\pm 1\}$, which on a connected $K$ such as $[0, 2\pi]$ means $f \equiv \pm 1$.
[/example]
Every element of a $C^*$-algebra decomposes into Hermitian parts, exactly analogous to real and imaginary parts of a complex number.
[remark: Hermitian Decomposition]
For any $x \in A$, there exist **unique** Hermitian elements $h, k \in A$ such that $x = h + ik$. They are given by
\begin{align*}
h = \frac{x + x^*}{2}, \qquad k = \frac{x - x^*}{2i}.
\end{align*}
Uniqueness follows because $x^* = h - ik$, which determines $h$ and $k$. Existence is verified directly: both expressions are Hermitian and $x = h + ik$.
[/remark]
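The direct verification can be written out in full, using only conjugate-linearity and involutivity of $*$:
\begin{align*}
h^* &= \frac{x^* + x^{**}}{2} = \frac{x^* + x}{2} = h, \\
k^* &= \frac{x^* - x^{**}}{\overline{2i}} = \frac{x^* - x}{-2i} = \frac{x - x^*}{2i} = k, \\
h + ik &= \frac{x + x^*}{2} + \frac{x - x^*}{2} = x.
\end{align*}
For uniqueness, if $x = h' + ik'$ with $h', k'$ Hermitian, then $x^* = h' - ik'$, and solving the two equations gives $h' = (x + x^*)/2$ and $k' = (x - x^*)/(2i)$.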
A second basic consequence of the $*$-structure concerns invertibility: applying the involution preserves the group of units, and the spectrum of $x^*$ is therefore tightly tied to the spectrum of $x$. We record this now because it underlies many of the spectral arguments in the next section.
[remark: Invertibility and the Spectrum of x*]
In a unital $C^*$-algebra $A$, $x \in G(A)$ (the group of invertible elements) if and only if $x^* \in G(A)$: indeed, if $xy = yx = 1$ then applying $*$ gives $y^* x^* = x^* y^* = 1^* = 1$, and the reverse direction follows by applying this to $x^*$. As a consequence,
\begin{align*}
\sigma_A(x^*) = \overline{\sigma_A(x)} := \{\bar{\lambda} : \lambda \in \sigma_A(x)\},
\end{align*}
since $\lambda \cdot 1 - x$ is invertible if and only if $(\lambda \cdot 1 - x)^* = \bar{\lambda} \cdot 1 - x^*$ is invertible. In particular, $r_A(x^*) = r_A(x)$.
[/remark]
## Spectral Properties and Characters
In a general Banach algebra, the spectral radius can be strictly smaller than the norm — a nilpotent matrix has spectral radius $0$ but nonzero norm — and there is no a priori reason for a Hermitian element to behave any differently. The question we want to settle is: what extra structure forces $r(x) = \|x\|$, and for which elements? The $C^*$-equation provides exactly the right rigidity, but only for the elements that interact nicely with the involution — the normal ones. The next theorem makes this precise, and its proof shows in miniature how the $C^*$-equation propagates from a single algebraic identity into a global metric statement.
[quotetheorem:2686]
[citeproof:2686]
This is a striking fact: for normal elements, the spectral radius — a purely algebraic quantity — equals the norm. It means that the norm of a normal element is completely determined by its spectrum, which is the source of much of the rigidity of $C^*$-algebras.
The normality hypothesis cannot be dropped. A nilpotent operator $T$ on a Hilbert space satisfies $T^n = 0$ for some $n$, so $r(T) = 0$, yet $\|T\|$ can be as large as desired. For instance, the upper-triangular matrix
\begin{align*}
T = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \in \mathcal{L}(\mathbb{C}^2)
\end{align*}
has $T^2 = 0$ so $r(T) = 0$, but $\|T\| = 1$. Nilpotent operators are never normal (unless they are zero), so this is not a contradiction — but it illustrates that the theorem would be false without the normality hypothesis. The theorem also does not extend to general Banach algebras: in a Banach algebra without the $C^*$-equation, the spectral radius can be strictly smaller than the norm even for Hermitian elements.
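The claims $\|T\| = 1$ and "not normal" can be checked by hand. The following short computation is a sketch that uses only the $C^*$-equation $\|T\|^2 = \|T^*T\|$ and the fact that the operator norm of a diagonal matrix is its largest absolute entry.
\begin{align*}
T^*T = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
TT^* = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.
\end{align*}
Hence $\|T\|^2 = \|T^*T\| = 1$, while $T^*T \ne TT^*$ confirms that $T$ is not normal. The gap $r(T) = 0 < 1 = \|T\|$ is therefore consistent with the theorem.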
Characters of commutative $C^*$-algebras have a special relationship with the involution.
[quotetheorem:2687]
[citeproof:2687]
This theorem reflects the rigidity of the $C^*$-condition. In a general commutative Banach algebra, characters are algebra homomorphisms $A \to \mathbb{C}$, but there is no reason they should interact with an involution in any particular way. The $C^*$-equation is what forces $\varphi(x^*) = \overline{\varphi(x)}$ — the proof works precisely because the $C^*$-equation gives $\|z_t\|^2 = \|z_t^* z_t\|$, enabling the trick with $z_t = x + it$. Without the $C^*$-equation, one could define an involution on a commutative Banach algebra for which this identity fails for some character.
The following corollary pins down the spectra of Hermitian and unitary elements, and proves spectral permanence across $C^*$-subalgebras.
[quotetheorem:2688]
[citeproof:2688]
Property (3) is the **spectral permanence** property of $C^*$-algebras. It fails for general Banach subalgebras — inverting in the smaller algebra need not match inverting in the larger one — but the $C^*$-structure forces them to agree for normal elements. This is one of the reasons $C^*$-algebras are so well-behaved.
[example: Spectral Permanence Fails for Banach Subalgebras]
Let $A = C(\mathbb{T})$ where $\mathbb{T} = S^1$ is the unit circle, and let $B \subset A$ be the **disk algebra**: the Banach algebra of functions continuous on the closed unit disk $\overline{\mathbb{D}}$ and holomorphic on the open disk $\mathbb{D}$, restricted to the boundary $\mathbb{T}$. The function $f(z) = z$ lies in $B$. In $A = C(\mathbb{T})$, $\sigma_A(f) = f(\mathbb{T}) = \mathbb{T}$. In $B$, however, $\sigma_B(f) = \overline{\mathbb{D}}$, the full closed disk: for every $|\lambda| < 1$ the element $f - \lambda$ is not invertible in $B$ (the would-be inverse $1/(z - \lambda)$ has a pole inside $\mathbb{D}$), while for $|\lambda| = 1$ it is already non-invertible in $A$. So $\sigma_B(f) \supsetneq \sigma_A(f)$. The disk algebra $B$ is not a $C^*$-subalgebra of $A$: it is not closed under complex conjugation, since $\bar{f}(z) = \bar{z} = 1/z$ on $\mathbb{T}$ does not extend holomorphically to $\mathbb{D}$. This is precisely why spectral permanence fails.
[/example]
## The Commutative Gelfand–Naimark Theorem
A priori, a commutative unital $C^*$-algebra could be a complicated abstract object — a Banach space with a multiplicative structure and an involution, but with no geometric picture attached. The question is whether such an algebra must, in disguise, be the algebra of continuous functions on some compact space. One might hope that Gelfand's theory from the preceding chapter answers this, but on its own the Gelfand transform is only a homomorphism into $C(\Phi_A)$; it need not be injective or surjective for a general Banach algebra. What the $C^*$-condition adds is exactly the missing piece.
[quotetheorem:2689]
[citeproof:2689]
The proof is a beautiful example of how several ingredients (Gelfand theory, the spectral radius formula, the characters-are-$*$-homomorphisms theorem, and Stone–Weierstrass) combine to give a classification result that is far stronger than any one ingredient alone. Notice that isometry and surjectivity each required the $C^*$-structure in an essential way: isometry used $r_A(x) = \|x\|$ (valid for normal elements of a $C^*$-algebra, and every element of a commutative $C^*$-algebra is normal), and surjectivity used conjugation-invariance (which required characters to be $*$-homomorphisms). The theorem fails for general commutative Banach algebras.
[explanation: What the Theorem Means]
The Commutative Gelfand–Naimark Theorem completely solves the classification problem for commutative unital $C^*$-algebras: every such algebra is, up to isometric $*$-isomorphism, a space of continuous functions on a compact Hausdorff space. The space $K = \Phi_A$ is constructed canonically from $A$ — it is the spectrum or **maximal ideal space** of $A$.
The theorem also reverses: every $C(K)$ (for compact Hausdorff $K$) is a commutative unital $C^*$-algebra, and the construction $A \mapsto \Phi_A$ recovers $K$ from $A$. So there is a categorical duality between compact Hausdorff spaces and commutative unital $C^*$-algebras — this is the starting point of noncommutative geometry, where one studies non-commutative $C^*$-algebras as "algebras of functions on noncommutative spaces."
[/explanation]
## Applications of the Gelfand–Naimark Theorem
With the classification in hand, we can transport problems about abstract $C^*$-algebras into the concrete setting of $C(K)$ — solve them pointwise — and then transport the solution back. Two applications illustrate the technique. First, the existence of positive square roots, which reduces to taking the pointwise square root of a non-negative function. Second, the polar decomposition of an invertible operator, which uses the square root construction to produce the "size" factor of the decomposition. Each application has a subtlety about which hypothesis is really needed, and we treat them in turn.
### Positive Elements and Square Roots
Every non-negative real number has a unique non-negative square root. Whether the same is true for elements of a $C^*$-algebra — and if so, what hypothesis on $x$ is required — is the question of this subsection. The answer is not "Hermitian": being Hermitian is the analogue of being real, which is not enough (negative reals have no real square root). The right hypothesis is positivity, an algebraic condition that we will state below in spectral terms. Without positivity the construction breaks down, and the failure mode is visible in $C(K)$ itself.
[example: Square Roots Fail Without Positivity]
In $C([-1, 1])$, consider $f(t) = t$ (Hermitian, since it is real-valued). Its spectrum is $\sigma(f) = f([-1,1]) = [-1, 1]$, which contains negative numbers. The "obvious" candidate $\sqrt{t}$ is not real for $t \in [-1, 0)$, and no other choice works: if $g \in C([-1, 1])$ is real-valued, then $g(t)^2 \ge 0$ for every $t$, whereas $f(-1) = -1 < 0$. Hence there is no Hermitian $g \in C([-1, 1])$ with $g^2 = f$. So the Hermitian hypothesis alone does not suffice: the spectrum must be non-negative.
[/example]
This counterexample isolates the missing ingredient: the spectrum must be contained in $[0, \infty)$. Promoting that observation to a definition gives the right notion of "non-negative element" in an abstract $C^*$-algebra.
[definition: Positive Element]
Let $A$ be a unital $C^*$-algebra. An element $x \in A$ is **positive** if $x$ is Hermitian and $\sigma_A(x) \subset [0, \infty)$.
[/definition]
Positive elements are the $C^*$-algebra analogue of non-negative real numbers.
[quotetheorem:2690]
[citeproof:2690]
In particular, for any Hilbert space $H$, every positive operator $T \in \mathcal{L}(H)$ has a unique positive square root. Recall that $T \in \mathcal{L}(H)$ is positive in this sense if and only if $(T(x), x)_H \geq 0$ for all $x \in H$.
The positivity hypothesis is essential and cannot be relaxed to Hermitian alone, as the $C([-1,1])$ counterexample showed. The theorem also does not give a general continuous functional calculus — it applies only to the square root function. Extending this to arbitrary continuous functions on the spectrum is a significant further result (the continuous functional calculus for normal elements) that goes beyond this chapter.
**How to compute $x^{1/2}$ in $C(K)$:** The proof gives an explicit algorithm. Given a positive element $x \in A$: (1) identify the commutative $C^*$-subalgebra $B$ generated by $x$; (2) apply the Gelfand transform to identify $B \cong C(K)$; (3) compute $\sqrt{f}$ pointwise in $C(K)$; (4) transport back via $\theta$. In $C(K)$ itself, this is trivial — $x = f \geq 0$ implies $x^{1/2} = \sqrt{f}$ pointwise.
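The four steps can be traced on a small example. The matrix below is our own choice, and the identification of $B$ with functions on the two-point spectrum is a sketch rather than a full proof.
\begin{align*}
x = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \in M_2(\mathbb{C}), \qquad \sigma(x) = \{1, 3\}, \qquad B = \operatorname{span}\{1, x\} \cong C(\{1, 3\}).
\end{align*}
Under this identification $x$ corresponds to the function $\lambda \mapsto \lambda$, and the indicators of the points $1$ and $3$ correspond to
\begin{align*}
p_1 = \frac{3 - x}{2} = \frac12\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \qquad
p_3 = \frac{x - 1}{2} = \frac12\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.
\end{align*}
Taking the pointwise square root $\lambda \mapsto \sqrt{\lambda}$ and transporting back gives
\begin{align*}
x^{1/2} = 1 \cdot p_1 + \sqrt{3} \cdot p_3 = \frac12\begin{pmatrix} \sqrt{3} + 1 & \sqrt{3} - 1 \\ \sqrt{3} - 1 & \sqrt{3} + 1 \end{pmatrix}.
\end{align*}
As a check, write $a = (\sqrt{3}+1)/2$ and $b = (\sqrt{3}-1)/2$. Then $a^2 + b^2 = 2$ and $2ab = 1$, so $(x^{1/2})^2 = x$.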
### Polar Decomposition of Invertible Operators
Every non-zero complex number $z$ can be written as $z = |z| \cdot e^{i\theta}$, separating its "size" $|z|$ from its "phase" $e^{i\theta}$. A natural question for operators is whether something similar holds: can every invertible $T \in \mathcal{L}(H)$ be written as a positive part times a unitary part? The naive approach of writing $T = \|T\| \cdot (T/\|T\|)$ fails immediately, since $T/\|T\|$ is not unitary in general. What is needed is a finer decomposition, and the obstacle is precisely constructing the "size" piece — which turns out to require the positive square root theorem.
[quotetheorem:2691]
[citeproof:2691]
The invertibility hypothesis is not just a convenience: it is necessary for the conclusion to hold in the stated form. If $T$ is not invertible, then at least one of $TT^*$ and $T^*T$ fails to be invertible, and the construction breaks down in one of two ways: either $R = (TT^*)^{1/2}$ is not invertible, so the formula $U = R^{-1}T$ is undefined, or $R$ is invertible but $U = R^{-1}T$ is not unitary. In the non-invertible case, one can still decompose $T = RU$ where $U$ is a **partial isometry** (meaning that $U^*U$ is an orthogonal projection, not necessarily the identity), but uniqueness requires care and the statement is more subtle. For example, the shift operator $S: \ell^2 \to \ell^2$ defined by $S(e_n) = e_{n+1}$ satisfies $S^*S = I$ but $SS^* = I - P_0$, where $P_0$ is the projection onto the span of the first basis vector. So $S$ is an isometry but not unitary, and its polar decomposition must use a partial isometry rather than a unitary.
[remark: Structure of the Proof]
The key insight is that the Gelfand–Naimark Theorem, via the square root theorem, provides the positive factor $R$ directly from the algebraic expression $TT^*$. The polar decomposition then reduces understanding an arbitrary invertible operator to understanding a positive operator (the "size") and a unitary operator (the "phase" or "direction").
[/remark]
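The construction can be followed by hand on a $2 \times 2$ example. The matrix $T$ below is our own illustrative choice, and we assume the decomposition has the form $T = RU$ with $R = (TT^*)^{1/2}$ and $U = R^{-1}T$, as in the discussion above.
\begin{align*}
T = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}, \qquad
TT^* = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix} = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
R = (TT^*)^{1/2} = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}.
\end{align*}
Then
\begin{align*}
U = R^{-1}T = \begin{pmatrix} \tfrac12 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad U^*U = UU^* = I, \qquad RU = T.
\end{align*}
Here $R$ records the stretch factors $2$ and $1$, and $U$ is the coordinate swap.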
The spectral theory of Chapter 7 is extended in Chapter 8 to the Borel functional calculus: for a normal operator $T$ on a Hilbert space, a resolution of the identity assigns orthogonal projections to Borel subsets of the spectrum, which makes it possible to define $f(T)$ for every bounded Borel function $f$. This is the culmination of spectral theory, reducing arbitrary normal operators to multiplication operators and unlocking the full power of measure theory in operator theory.
# 8. Borel Functional Calculus and Spectral Theory
This final chapter brings together the spectral theory and $C^*$-algebra machinery developed throughout the course into its most powerful form: the Borel Functional Calculus (BFC). Recall from Chapter 7 that the Commutative Gelfand–Naimark theorem classifies every commutative unital $C^*$-algebra as a $C(K)$; the BFC uses this classification together with the Riesz Representation Theorem (Chapter 2) to produce an operator-valued measure, the resolution of the identity, from which every normal operator can be reconstructed. The central question is: given a normal operator $T \in \mathcal{L}(H)$, how do we make sense of $f(T)$ for a general bounded Borel-measurable function $f$ on the spectrum $\sigma(T)$? The answer requires a careful notion of operator-valued integration, built around the concept of a resolution of the identity. Once in place, the BFC allows us to decompose any normal operator spectrally and to apply the full machinery of measure theory to operator theory. The chapter closes with three striking applications: the polar decomposition, a logarithm theorem for unitary operators, and a topological result about the group of invertible operators.
## Resolutions of the Identity
Throughout this chapter, $H$ is a non-trivial complex Hilbert space and $K$ is a compact Hausdorff space. We write $\mathcal{B}$ for the Borel $\sigma$-algebra of $K$ and $\mathcal{L}(H)$ for the bounded operators $H \to H$. The inner product on $H$ is $(\cdot, \cdot)_H$.
The problem we face is this: we want to "integrate" operator-valued functions over a spectral parameter. A naive approach — pick a Borel partition of $K$, form operator-valued Riemann sums, and take a limit — runs into trouble immediately, because the operator norm limit need not exist unless the projections corresponding to different partition cells are orthogonal. Without that orthogonality, the partial sums need not even form a bounded sequence. The resolution of the identity is the axiom system that encodes exactly the orthogonality constraint needed to make the integral well-defined: it assigns to each Borel set $E \subseteq K$ an orthogonal projection $P(E)$ on $H$, in a way that is coherent with the Boolean structure of $\mathcal{B}$.
[definition: Resolution of the Identity]
A **resolution of the identity** of $H$ over $K$ is a function $P : \mathcal{B} \to \mathcal{L}(H)$ satisfying:
(i) $P(\varnothing) = 0$ and $P(K) = I$.
(ii) $P(E)$ is an orthogonal projection for every $E \in \mathcal{B}$.
(iii) $P(E \cap F) = P(E) \circ P(F)$ for all $E, F \in \mathcal{B}$.
(iv) If $E \cap F = \varnothing$, then $P(E \cup F) = P(E) + P(F)$ (finite additivity).
(v) For every $x, y \in H$, the map $P_{x,y} : \mathcal{B} \to \mathbb{C}$ defined by
\begin{align*}
P_{x,y}(E) := (P(E)(x),\, y)_H
\end{align*}
is a regular complex Borel measure on $K$.
[/definition]
A resolution of the identity is an "operator-valued measure", though it need not be countably additive in the operator norm. The scalar measures $P_{x,y}$ supply the required countable additivity in the "weak" direction.
<!-- illustration-needed: a projection-valued measure picture — show a compact set K (e.g. an interval or disk) partitioned into Borel pieces E_1, E_2, E_3, with each piece labelled by its associated orthogonal projection P(E_i) on H, and the ranges of the P(E_i) drawn as mutually orthogonal subspaces of H whose direct sum reconstructs H -->
[example: Multiplication Operator on $L^2(0,1)$]
Let $H = L^2([0,1])$ and $K = [0,1]$. For a Borel set $E \subseteq [0,1]$, define
\begin{align*}
P(E)(f) := \mathbb{1}_E \cdot f.
\end{align*}
Multiplication by $\mathbb{1}_E$ is an orthogonal projection since $\mathbb{1}_E^2 = \mathbb{1}_E$ and $\mathbb{1}_E$ is real-valued. The map $P_{x,y}(E) = \int_E x(t)\overline{y(t)}\, d\mathcal{L}^1(t)$ is a standard complex Borel measure, so all axioms are satisfied. This is the prototype: the BFC will reduce every normal operator to a version of this multiplication picture.
[/example]
The continuous prototype above shows the framework working with a measure that has no atoms; the next example complements it by showing how the same axioms recover the familiar finite-dimensional spectral decomposition, where the measure is a finite sum of point masses.
[example: Spectral Projections of a Self-Adjoint Matrix]
Let $H = \mathbb{C}^n$ and let $T$ be a self-adjoint matrix with distinct eigenvalues $\lambda_1 < \lambda_2 < \cdots < \lambda_k$ and corresponding orthonormal eigenvectors $e_1, \dots, e_k$ (each eigenvalue of multiplicity one, for simplicity, so that $k = n$). For a Borel set $E \subseteq \mathbb{R}$, define
\begin{align*}
P(E) := \sum_{j:\, \lambda_j \in E} e_j e_j^*,
\end{align*}
the orthogonal projection onto the span of all eigenvectors whose eigenvalue lies in $E$. We have:
- $P(\varnothing) = 0$ and $P(\mathbb{R}) = I$ (since the eigenvectors span $\mathbb{C}^n$).
- $P(E)^2 = P(E)$ and $P(E)^* = P(E)$ (projection onto the span of orthonormal vectors).
- $P(E \cap F) = P(E)P(F)$: only eigenvalues in both $E$ and $F$ contribute.
- Finite additivity from disjointness of the index sets.
The scalar measure $P_{x,y}(E) = \sum_{j:\lambda_j \in E} (x, e_j)(e_j, y)$ is a finite positive measure (for $x = y$) supported on the finite set $\{\lambda_1, \dots, \lambda_k\}$. For this matrix, $T = \int \lambda\, dP(\lambda) = \sum_j \lambda_j P(\{\lambda_j\}) = \sum_j \lambda_j e_j e_j^*$, which is exactly the spectral decomposition. The BFC then gives $f(T) = \sum_j f(\lambda_j) e_j e_j^*$ for any bounded Borel $f$.
[/example]
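For a concrete instance with two eigenvalues, here is the same construction on a $2 \times 2$ matrix. The choice of $T$ is ours, and the eigenvectors are normalised by hand.
\begin{align*}
T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
\lambda_1 = -1,\; e_1 = \tfrac{1}{\sqrt2}\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad
\lambda_2 = 1,\; e_2 = \tfrac{1}{\sqrt2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\end{align*}
The two atomic projections are
\begin{align*}
P(\{-1\}) = e_1 e_1^* = \frac12\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \qquad
P(\{1\}) = e_2 e_2^* = \frac12\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.
\end{align*}
One checks directly that $P(\{-1\})P(\{1\}) = 0$, that $P(\{-1\}) + P(\{1\}) = I$, and that
\begin{align*}
(-1)\cdot P(\{-1\}) + 1 \cdot P(\{1\}) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = T.
\end{align*}
For $f(\lambda) = \lambda^2$ the formula $f(T) = \sum_j f(\lambda_j) e_j e_j^*$ gives $P(\{-1\}) + P(\{1\}) = I$, in agreement with $T^2 = I$. For a Borel set such as $E = [0, \infty)$, only $\lambda_2$ lies in $E$, so $P(E) = P(\{1\})$.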
<!-- illustration-needed: spectrum and spectral projections of a self-adjoint matrix — show eigenvalues lambda_1 < lambda_2 < ... < lambda_k as labelled points on the real axis, a Borel set E shaded over a subset of the real line, and arrows from the points lambda_j in E pointing to the corresponding orthonormal eigenvectors e_j, whose span is the range of P(E) -->
### Immediate Properties of Resolutions
Several structural properties flow immediately from the definition.
[explanation: Properties of $P$]
Let $P$ be a resolution of the identity over $K$.
**Commutativity.** For all $E, F \in \mathcal{B}$,
\begin{align*}
P(E) \circ P(F) = P(E \cap F) = P(F \cap E) = P(F) \circ P(E).
\end{align*}
The projections $P(E)$ and $P(F)$ always commute.
**Orthogonality of disjoint sets.** If $E \cap F = \varnothing$, then $P(E \cap F) = P(\varnothing) = 0$, so $P(E) \circ P(F) = 0$. Because $P(E)$ is self-adjoint, for any $x, y \in H$,
\begin{align*}
(P(E)(x),\, P(F)(y))_H = (x,\, (P(E) \circ P(F))(y))_H = (x, 0)_H = 0.
\end{align*}
Thus disjoint Borel sets yield projections with orthogonal ranges.
**The measures $P_{x,x}$ are positive.** For any $x \in H$ and $E \in \mathcal{B}$,
\begin{align*}
P_{x,x}(E) = (P(E)x, x)_H = (P(E)x, P(E)x)_H = \|P(E)x\|^2 \ge 0,
\end{align*}
where we used that $x = P(E)x + (x - P(E)x)$ with the two summands orthogonal, so $(P(E)x, x)_H = (P(E)x, P(E)x)_H$. Moreover, $\|P_{x,x}\|_1 = P_{x,x}(K) = \|P(K)x\|^2 = \|x\|^2$.
**Countable additivity in $H$.** The map $E \mapsto P(E)(x)$ is countably additive from $\mathcal{B}$ to $H$. Indeed, if $E = \bigsqcup_{n=1}^\infty E_n$ is a Borel partition, then for all $N$,
\begin{align*}
\sum_{n=1}^N \|P(E_n)x\|^2 = \Bigl\|\sum_{n=1}^N P(E_n)x\Bigr\|^2 = \Bigl\|P\Bigl(\bigsqcup_{n=1}^N E_n\Bigr)x\Bigr\|^2 \le \|x\|^2.
\end{align*}
(The first equality uses orthogonality of the ranges.) Hence $\sum_n \|P(E_n)x\|^2 \le \|x\|^2 < \infty$, so the partial sums $\sum_{n=1}^N P(E_n)x$ form a Cauchy sequence in $H$ and therefore converge. For any $y \in H$, by continuity of the inner product,
\begin{align*}
\Bigl(\sum_n P(E_n)x,\, y\Bigr)_H = \sum_n (P(E_n)x, y)_H = \sum_n P_{x,y}(E_n) = P_{x,y}(E) = (P(E)x, y)_H.
\end{align*}
Since this holds for all $y$, we conclude $\sum_n P(E_n)x = P(E)x$.
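**Null sets propagate.** If $P(E_n) = 0$ for all $n$, then $P\bigl(\bigcup_n E_n\bigr) = 0$. To see this, disjointify: set $F_1 = E_1$ and $F_n = E_n \setminus \bigcup_{m<n} E_m$, so that $\bigcup_n E_n = \bigsqcup_n F_n$ and $P(F_n) = P(F_n \cap E_n) = P(F_n) \circ P(E_n) = 0$. The countable additivity just proved then gives $P\bigl(\bigcup_n E_n\bigr)x = \sum_n P(F_n)x = 0$ for every $x \in H$.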
[/explanation]
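To see why countable additivity holds pointwise in $H$ but can fail in operator norm, consider the multiplication resolution $P(E)f = \mathbb{1}_E \cdot f$ on $L^2([0,1])$ from the earlier example. The sets $E_n = (\tfrac{1}{n+1}, \tfrac{1}{n}]$, $n \ge 1$, are our own illustrative choice; they partition $(0, 1]$, and the tail after $N$ terms is $\bigsqcup_{n > N} E_n = (0, \tfrac{1}{N+1}]$. In operator norm the partial sums do not converge:
\begin{align*}
\Bigl\| P((0,1]) - \sum_{n=1}^N P(E_n) \Bigr\| = \bigl\| P\bigl((0, \tfrac{1}{N+1}]\bigr) \bigr\| = 1 \qquad \text{for every } N,
\end{align*}
because a non-zero orthogonal projection has norm $1$ and $(0, \tfrac{1}{N+1}]$ has positive Lebesgue measure. For each fixed $f \in L^2([0,1])$, however,
\begin{align*}
\Bigl\| P((0,1])f - \sum_{n=1}^N P(E_n)f \Bigr\|^2 = \int_0^{1/(N+1)} |f(t)|^2 \, dt \longrightarrow 0 \qquad (N \to \infty),
\end{align*}
by dominated convergence. So the series $\sum_n P(E_n)$ converges at each vector but not in norm.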
## Integration Against a Resolution of the Identity
Suppose we want to define $\int_K f\, dP$ for a bounded Borel function $f: K \to \mathbb{C}$. The naive approach of using the uniform norm on $K$ would be too restrictive: a function that differs from a bounded function only on a $P$-null set (a Borel set $E$ with $P(E) = 0$) ought to integrate to the same operator, since $P$ assigns no "weight" to such sets. Two functions that are equal everywhere except on $P$-null sets should be identified. This forces us to introduce a refined notion of boundedness adapted to $P$, rather than insisting on global uniform bounds.
[definition: $P$-Essential Boundedness and $L^\infty(P)$]
A Borel function $f: K \to \mathbb{C}$ is **$P$-essentially bounded** if there exists a Borel set $E \in \mathcal{B}$ with $P(E) = 0$ such that $f$ is bounded on $K \setminus E$. The space of such functions is
\begin{align*}
L^\infty(P) := \{ f: K \to \mathbb{C} \mid f \text{ Borel and } P\text{-essentially bounded}\},
\end{align*}
with norm
\begin{align*}
\|f\|_\infty := \inf\bigl\{ \sup_{K \setminus E} |f| : E \in \mathcal{B},\; P(E) = 0 \bigr\}.
\end{align*}
Functions $f$ and $g$ are identified if $f = g$ $P$-a.e., meaning there exists $E \in \mathcal{B}$ with $P(E) = 0$ and $f = g$ on $K \setminus E$. With this identification and the norm $\|\cdot\|_\infty$, $L^\infty(P)$ is a commutative, unital $C^*$-algebra, where the involution is complex conjugation.
[/definition]
The infimum in the definition of $\|f\|_\infty$ is attained: taking a sequence $(E_n)$ with $P(E_n) = 0$ and $\sup_{K \setminus E_n}|f| \to \|f\|_\infty$, set $E = \bigcup_n E_n$. Then $P(E) = 0$ by the null-set propagation property, and $\sup_{K \setminus E}|f| = \|f\|_\infty$.
The ordinary bounded Borel functions $L^\infty(K) = \{f: K \to \mathbb{C} \mid f \text{ Borel and bounded}\}$, with the supremum norm, map into $L^\infty(P)$ by sending $f$ to its $P$-a.e. equivalence class. This map is a norm-decreasing unital $*$-homomorphism; it is injective only when no non-empty Borel set is $P$-null.
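A small computation shows how the $P$-essential norm can differ from the supremum norm. We take $K = [0,1]$ and the multiplication resolution $P(E)f = \mathbb{1}_E \cdot f$ on $L^2([0,1])$, for which $P(E) = 0$ exactly when $E$ has Lebesgue measure zero; the two functions are our own examples.
\begin{align*}
f = 5 \cdot \mathbb{1}_{\{1/2\}}: \qquad \sup_K |f| = 5, \qquad \|f\|_\infty = \sup_{K \setminus \{1/2\}} |f| = 0,
\end{align*}
since $P(\{1/2\}) = 0$. By contrast,
\begin{align*}
g(t) = t: \qquad \sup_K |g| = 1 = \|g\|_\infty,
\end{align*}
because for every $c < 1$ the set $\{t : g(t) > c\} = (c, 1]$ has positive Lebesgue measure and so cannot be contained in a $P$-null set. Thus $f$ and $0$ represent the same element of $L^\infty(P)$, while $g$ keeps its full supremum norm.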
The key lemma constructs the integration map and establishes all the properties that make it useful.
[quotetheorem:2692]
[citeproof:2692]
The integral $\Phi$ is multiplicative — a property worth highlighting separately because it has no analogue for ordinary Lebesgue integration, where $\int fg \ne (\int f)(\int g)$ in general.
[remark: Multiplicativity]
The integral $\int_K f\, dP$ is multiplicative: $\int_K fg\, dP = \bigl(\int_K f\, dP\bigr)\circ\bigl(\int_K g\, dP\bigr)$. This is not typical behaviour for classical integrals — it reflects the fact that $\Phi$ is a homomorphism of algebras, not just a linear map.
[/remark]
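For simple functions the multiplicativity can be verified in one line. The sketch below assumes that the integral of a simple function is defined in the usual way, $\int_K \sum_i a_i \mathbb{1}_{E_i}\, dP = \sum_i a_i P(E_i)$; the symbols $a_i, b_j, E_i, F_j$ are ours. For $f = \sum_i a_i \mathbb{1}_{E_i}$ and $g = \sum_j b_j \mathbb{1}_{F_j}$ we have $fg = \sum_{i,j} a_i b_j \mathbb{1}_{E_i \cap F_j}$, and axiom (iii) gives
\begin{align*}
\Bigl(\int_K f\, dP\Bigr) \circ \Bigl(\int_K g\, dP\Bigr)
= \sum_{i,j} a_i b_j\, P(E_i) \circ P(F_j)
= \sum_{i,j} a_i b_j\, P(E_i \cap F_j)
= \int_K fg\, dP.
\end{align*}
Compare this with Lebesgue measure on $[0,1]$, where already
\begin{align*}
\int_0^1 t \cdot t \, dt = \frac13 \ne \frac14 = \Bigl(\int_0^1 t\, dt\Bigr)^2.
\end{align*}
The difference is that a scalar measure satisfies $\mu(E \cap F) \ne \mu(E)\mu(F)$ in general, whereas the projection-valued $P$ satisfies $P(E \cap F) = P(E) \circ P(F)$ by definition.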
The integration theorem is more restrictive than it might appear. The hypothesis that each $P_{x,y}$ is a regular complex Borel measure is not automatic from axioms (i)–(iv), which give only finite additivity; it must be required explicitly. Countable additivity of $P_{x,y}$ is what guarantees $P_{x,y}(E_n) \to P_{x,y}(E)$ for monotone sequences of Borel sets $E_n \uparrow E$, and regularity is what allows the Riesz Representation Theorem to identify $P_{x,y}$ with a unique bounded functional on $C(K)$. Similarly, the isometry $\|\Phi(f)\| = \|f\|_{L^\infty(P)}$ would break if the $P$-null sets were not handled correctly: a function that differs from zero only on a $P$-null set might produce a non-zero operator via a naive construction, violating isometry. The framework of $P$-essential boundedness is precisely what rules this out. Finally, the theorem says nothing about countable additivity in the operator norm, which fails in general; the additivity holds only in the weak sense through the scalar measures $P_{x,y}$.
## The Spectral Theorem for Commutative $C^*$-Algebras
We now have all the tools to state and prove the spectral theorem at the level of commutative $C^*$-algebras. This is the operator-theoretic analogue of the classical Riesz Representation Theorem (RRT): it says that any commutative unital $C^*$-algebra sitting inside $\mathcal{L}(H)$ is "represented" by integration against a canonical resolution of the identity over its character space.
[quotetheorem:2693]
[citeproof:2693]
This theorem is the cornerstone of spectral theory: it says every commutative $C^*$-subalgebra of $\mathcal{L}(H)$ is "diagonalised" by a canonical resolution of the identity. The integration formula $\int_K \hat{T}\, dP = T$ makes precise the idea that $T$ is "multiplication by its eigenvalue function."
Each hypothesis of the theorem is load-bearing. Commutativity of $A$ is essential: a non-commutative $C^*$-algebra can have too few characters for the Gelfand transform to be injective (the matrix algebra $M_2(\mathbb{C})$ has none at all), so the character space $\Phi_A$ no longer captures $A$ and there is no single compact space to integrate over. The requirement that $A$ be a $C^*$-subalgebra of $\mathcal{L}(H)$ (not merely a Banach algebra or abstract $C^*$-algebra) is what gives us the concrete operator actions $(T(x), y)_H$ needed to extract the measures $\mu_{x,y}$. Without this, the theorem degenerates to a pure isomorphism of abstract algebras with no Hilbert space content.
## Exponentials in Banach Algebras
The Fuglede–Putnam–Rosenblum lemma, which we need next, and the unitary representation theorem, which closes the chapter, both hinge on a single analytic tool: the operator exponential. In the proofs given here, it is what connects the algebraic condition $xz = zy$ to the adjoint condition $x^*z = zy^*$, and what allows every unitary to be written as $e^{iQ}$. The underlying reason is that the map $Q \mapsto e^{iQ}$ is the bridge between the self-adjoint world (real spectra, bounded observables) and the unitary world (spectra on the circle), and the exponential series makes this bridge algebraically tractable.
[definition: Exponential in a Banach Algebra]
Let $A$ be a unital Banach algebra and $x \in A$. Define
\begin{align*}
e^x := \sum_{n=0}^\infty \frac{x^n}{n!},
\end{align*}
where $x^0 = 1_A$. The series converges absolutely in $A$ (as $\sum_n \|x\|^n/n! = e^{\|x\|} < \infty$), so the partial sums are Cauchy and the limit exists.
[/definition]
If $x, y \in A$ commute ($xy = yx$), then $e^{x+y} = e^x \cdot e^y$, exactly as for scalars. Note that commutativity is required: without it, the binomial expansion used to equate $\sum_n (x+y)^n/n!$ with $(\sum_n x^n/n!)(\sum_n y^n/n!)$ is invalid, and $e^{x+y} \ne e^x e^y$ in general.
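The failure of $e^{x+y} = e^x e^y$ without commutativity can be seen explicitly in $M_2(\mathbb{C})$. The matrices below are our own illustrative choice; both are nilpotent, so their exponential series terminate.
\begin{align*}
x = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
y = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad
xy = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \ne \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = yx.
\end{align*}
Since $x^2 = y^2 = 0$, we get $e^x = 1 + x$ and $e^y = 1 + y$, so
\begin{align*}
e^x e^y = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}.
\end{align*}
On the other hand $(x+y)^2 = 1$, so the even and odd terms of the series separate:
\begin{align*}
e^{x+y} = \cosh(1) \cdot 1 + \sinh(1) \cdot (x + y) = \begin{pmatrix} \cosh 1 & \sinh 1 \\ \sinh 1 & \cosh 1 \end{pmatrix}.
\end{align*}
The two diagonal entries of $e^{x+y}$ are equal, while those of $e^x e^y$ are $2$ and $1$, so $e^{x+y} \ne e^x e^y$.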
## The Fuglede–Putnam–Rosenblum Lemma
Here is the difficulty: if $xz = zy$ for normal elements $x, y$ and arbitrary $z$, can we conclude $x^*z = zy^*$? At first glance, there is no reason to expect this: the condition $xz = zy$ says nothing about how $z$ interacts with the adjoints $x^*$ and $y^*$, and in a non-commutative setting these can behave completely differently. For a non-normal operator, the conclusion indeed fails: take $x = y = z = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$. Then $xz = zy = x^2 = 0$, but $x^*z = x^*x = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$ while $zy^* = xx^* = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$. The key property that makes $x$ normal, namely that $x$ and $x^*$ commute, is precisely what allows the exponential argument to translate information about $x$ to information about $x^*$.
[quotetheorem:2694]
[citeproof:2694]
## The Spectral Theorem for Normal Operators
We are now in a position to prove the principal theorem of the chapter, which generalises the finite-dimensional spectral theorem (diagonalisation of normal matrices) to infinite-dimensional Hilbert spaces.
[quotetheorem:2695]
Before turning to the proof, it is worth pausing to interpret the formula $T = \int_K \lambda\, dP(\lambda)$ — especially in light of the finite-dimensional case, where the integral collapses to a familiar sum.
[remark: What the Spectral Decomposition Means]
In finite dimensions, a normal matrix $T$ with distinct eigenvalues $\lambda_1, \dots, \lambda_m$ satisfies $T = \sum_k \lambda_k P_k$, where $P_k$ is the orthogonal projection onto the $\lambda_k$-eigenspace. The spectral decomposition $T = \int_K \lambda\, dP(\lambda)$ is the infinite-dimensional analogue: the "sum" over eigenvalues becomes an integral over the spectrum, and the discrete spectral projections become the continuous projection-valued measure $P$.
[/remark]
[citeproof:2695]
The normality hypothesis is not negotiable. For a non-normal operator — say, the unilateral left shift $S$ on $\ell^2(\mathbb{N})$ defined by $S(e_n) = e_{n-1}$ for $n \ge 1$ and $S(e_0) = 0$ — the operator fails to be diagonalised by any resolution of the identity over $\sigma(S)$. Concretely, $\sigma(S)$ is the closed unit disk, and there is no projection-valued measure $P$ on the disk such that $S = \int \lambda\, dP(\lambda)$. The reason is that such a decomposition would force $S$ to be normal (since $\int \lambda\, dP$ and $\int \bar\lambda\, dP$ commute by the homomorphism property), contradicting $SS^* \ne S^*S$ for the shift. Furthermore, the theorem does not assert the existence of eigenvectors: when $T$ has continuous spectrum (e.g., the multiplication operator $M_\lambda$ on $L^2([0,1])$ has spectrum $[0,1]$ with no point spectrum), there are no classical eigenvectors at all, and the spectral measure $P$ replaces the role of eigenspaces entirely.
## The Borel Functional Calculus
The spectral decomposition is only as powerful as the functions we can plug into it. The previous chapter's Gelfand–Naimark theory gives us a continuous functional calculus: for $f \in C(\sigma(T))$, we can define $f(T)$ as the image of $f$ under the inverse Gelfand transform. But continuous functions cannot capture spectral projections — the indicator function $\mathbb{1}_E$ is not continuous on $\sigma(T)$ (unless $E$ is both open and closed), yet it is precisely the function that should give $f(T) = P(E)$. To define $P(E)$ from $T$ alone, we must extend the functional calculus from $C(\sigma(T))$ to all bounded Borel functions on $\sigma(T)$. The spectral decomposition makes this extension possible.
[quotetheorem:2696]
[citeproof:2696]
The BFC is the capstone of the course's spectral theory programme. Starting from the Gelfand–Naimark theorem (which gave the continuous functional calculus $C(K) \to \mathcal{L}(H)$), we have extended to all of $L^\infty(K)$. The price is losing isometry (only norm-decrease for general Borel functions), but we gain the ability to apply discontinuous functions — most importantly, indicator functions $\mathbb{1}_E$ (which give the spectral projections $P(E)$) and the function $|\cdot|$ (which gives the absolute value $|T|$).
The normality hypothesis cannot be dropped: for the unilateral shift $S$ on $\ell^2$, which has $\sigma(S)$ equal to the closed unit disk but is not normal, there is no projection-valued measure $P$ on $\sigma(S)$ with $S = \int \lambda\, dP(\lambda)$, and so no BFC. Indeed, any $*$-homomorphism $f \mapsto f(S)$ from the commutative algebra $L^\infty(\sigma(S))$ that sends the coordinate function $\lambda$ to $S$ must send $\bar\lambda$ to $S^*$, and then $S^*S = (\bar\lambda\lambda)(S) = (\lambda\bar\lambda)(S) = SS^*$, which is false for the shift. Likewise, the restriction to bounded Borel functions is essential: for unbounded Borel $f$, the integral $\int f\, dP$ need not converge in $\mathcal{L}(H)$ (it produces an unbounded, in general only densely defined, operator), and one must work in the framework of unbounded self-adjoint operators rather than $\mathcal{L}(H)$.
The inclusion in property (iv) can be strict. Take $T = M_\lambda$, the multiplication operator on $L^2([0,1])$, with spectrum $K = [0,1]$. For $f = \mathbb{1}_{[0,1/2]}$, the operator $f(T) = P([0,1/2])$ is a projection with spectrum $\{0,1\} = f(K)$, so here equality holds. But for $f = \mathbb{1}_{\{1/2\}}$ (the indicator of a single point), $f(T) = P(\{1/2\}) = 0$, since $\{1/2\}$ has Lebesgue measure zero and is therefore a $P$-null set; then $\sigma(f(T)) = \{0\} \subsetneq \{0,1\} = f(K)$. The discrepancy arises precisely because $f$ can take a non-zero value on a set that $P$ "does not see."
[example: Computing $f(T)$ for a Diagonal Operator]
Let $H = \ell^2(\mathbb{N})$ and $T(e_n) = \lambda_n e_n$ for a bounded sequence $(\lambda_n) \subset \mathbb{C}$, where $(e_n)$ is the standard orthonormal basis. Then $T$ is normal with spectrum $\sigma(T) = \overline{\{\lambda_n : n \ge 1\}}$, and the resolution of the identity is
\begin{align*}
P(E) = \sum_{n:\, \lambda_n \in E} e_n e_n^*.
\end{align*}
For any bounded Borel $f: \sigma(T) \to \mathbb{C}$, the BFC gives
\begin{align*}
f(T)(e_n) = f(\lambda_n) e_n,
\end{align*}
so $f(T)$ is again a diagonal operator. To verify: $(f(T)x, y) = \int f\, dP_{x,y} = \sum_n f(\lambda_n)(x, e_n)\overline{(y, e_n)}$, which is the inner product of $\sum_n f(\lambda_n)(x,e_n)e_n$ with $y$.
As a concrete case, take $\lambda_n = 1/n$ and $f(\lambda) = \mathbb{1}_{(0, 1/2]}(\lambda)$. Then $f(T)(e_n) = e_n$ for $n \ge 2$ and $f(T)(e_1) = 0$ (since $\lambda_1 = 1 > 1/2$). The operator $f(T)$ is the orthogonal projection onto $\overline{\operatorname{span}\{e_2, e_3, \dots\}}$, the spectral projection of $T$ onto the spectral region $(0, 1/2]$.
[/example]
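The concrete case above can be checked on a finite truncation. The following is a minimal sketch, assuming we keep only the first $N$ coordinates of $\ell^2$; the helper names `apply_borel` and `indicator_half` are illustrative and not part of the course notation.

```python
# Finite truncation of the diagonal operator T e_n = (1/n) e_n.
# Assumption: only the first N coordinates are kept; names are illustrative.

def apply_borel(f, eigenvalues, x):
    """Apply f(T) to x for T = diag(eigenvalues): f(T) e_n = f(lambda_n) e_n."""
    return [f(lam) * xn for lam, xn in zip(eigenvalues, x)]

def indicator_half(lam):
    """Indicator function of the interval (0, 1/2]."""
    return 1.0 if 0 < lam <= 0.5 else 0.0

N = 6
eigenvalues = [1.0 / n for n in range(1, N + 1)]
x = [float(n) for n in range(1, N + 1)]  # arbitrary test vector

px = apply_borel(indicator_half, eigenvalues, x)    # f(T) x
ppx = apply_borel(indicator_half, eigenvalues, px)  # f(T)^2 x

print(px)         # the e_1 coordinate is removed, the others are unchanged
print(ppx == px)  # f(T) is idempotent, as a projection should be
```

Since $f$ takes only the values $0$ and $1$, the diagonal entries of $f(T)$ are $0$ or $1$, which is exactly why $f(T)$ is an orthogonal projection.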
## Applications of the Borel Functional Calculus
We close the course with three elegant applications that illustrate the power of the BFC.
### Polar Decomposition of Normal Operators
Every complex number $z \ne 0$ factors as $z = |z| \cdot (z/|z|)$, a product of its modulus and a point on the unit circle. The polar decomposition asks whether a similar factorisation $T = RU$, with $R$ positive and $U$ unitary, holds for operators. For general bounded operators, the answer involves partial isometries (operators that are isometric on the orthogonal complement of their kernel and need not be unitary), and the positive part $|T| = (T^*T)^{1/2}$ need not commute with the rotation part. For normal operators, the BFC provides both parts simultaneously and guarantees they commute.
[quotetheorem:2697]
[citeproof:2697]
The commutativity conclusion, that $R$, $U$, and $T$ pairwise commute, is specific to normal operators. For a general invertible operator $T$, one can still write $T = U|T|$ where $|T| = (T^*T)^{1/2}$ and $U = T|T|^{-1}$ is unitary, but $U$ and $|T|$ commute only when $T$ is normal. As a concrete failure, take $T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ on $\mathbb{C}^2$: then $T^*T = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$, and a direct computation gives $|T| = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$ and $U = T|T|^{-1} = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix}$, so that $|T|U = \frac{1}{5}\begin{pmatrix} 3 & 4 \\ -1 & 7 \end{pmatrix} \ne T = U|T|$. The deeper difference is this: for invertible $T$, both $|T|$ and $U$ lie in the unital $C^*$-algebra generated by $T$ whether or not $T$ is normal, but that algebra is commutative exactly when $T$ is normal. For normal $T$, the BFC expresses $R$ and $U$ as functions of $T$ itself, which is why commutativity is automatic. Finally, if $T$ is normal but not invertible ($0 \in \sigma(T)$), the value $u(0) = 1$ is a convention: there is no canonical choice on $\ker(T)$, but setting $u = 1$ there gives a globally unitary operator that still satisfies $T = RU$.
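The failure of commutativity for the matrix $T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ can be verified numerically. The sketch below uses only the standard library; the helper names are illustrative, and the closed-form square root is a standard identity for $2 \times 2$ symmetric positive definite matrices rather than anything taken from the course.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    """Transpose (equal to the adjoint here, since all entries are real)."""
    return [[a[j][i] for j in range(2)] for i in range(2)]

def sqrt_spd(a):
    """Positive square root of a 2x2 symmetric positive definite matrix,
    via sqrt(A) = (A + sqrt(det A) I) / sqrt(tr A + 2 sqrt(det A))."""
    s = math.sqrt(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    t = math.sqrt(a[0][0] + a[1][1] + 2 * s)
    return [[(a[0][0] + s) / t, a[0][1] / t],
            [a[1][0] / t, (a[1][1] + s) / t]]

def inverse(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def max_abs_diff(a, b):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

T = [[1.0, 1.0], [0.0, 1.0]]
abs_T = sqrt_spd(matmul(transpose(T), T))  # |T| = (T*T)^{1/2}
U = matmul(T, inverse(abs_T))              # U = T |T|^{-1}

identity = [[1.0, 0.0], [0.0, 1.0]]
unitarity_error = max_abs_diff(matmul(transpose(U), U), identity)
factor_error = max_abs_diff(matmul(U, abs_T), T)
commutator_size = max_abs_diff(matmul(U, abs_T), matmul(abs_T, U))

print(unitarity_error, factor_error, commutator_size)
```

The first two printed numbers should be at rounding-error level (so $U$ is unitary and $T = U|T|$), while the third is visibly non-zero, confirming $U|T| \ne |T|U$.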
### Every Unitary Operator Is an Exponential
The next application gives the analogue, for unitary operators, of Euler's formula $e^{i\theta}$ for points on the unit circle.
[quotetheorem:2698]
[citeproof:2698]
The operator $Q = f(U)$ is a "logarithm" of $U$ in the sense that $e^{iQ} = U$. Unlike the complex logarithm on $\mathbb{C}$, this is a globally defined, bounded, self-adjoint operator on $H$. The reason a global definition is possible here, when the complex logarithm requires a branch cut, is that the BFC operates in $L^\infty$ and can use a discontinuous function $f$ (a measurable choice of argument) without any penalty. By contrast, if we required $f$ to be continuous on $\sigma(U)$, we would need a continuous branch of the argument on $\sigma(U)$, which exists precisely when $\sigma(U)$ is a proper subset of $S^1$. If $\sigma(U) = S^1$ (the full circle), no continuous branch of argument exists, and the continuous functional calculus alone cannot produce $Q$.
The logarithm $Q$ is not unique: any bounded Borel function $f$ on $S^1$ satisfying $e^{if(t)} = t$ gives a valid $Q$, and there are infinitely many such functions, differing from one another by integer multiples of $2\pi$ on Borel subsets of $S^1$. Two choices give the same operator only if they agree outside a $P$-null set, where $P$ is the resolution of the identity of $U$; otherwise they give different self-adjoint operators $Q$ with $e^{iQ} = U$.
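For a diagonal unitary, both the existence and the non-uniqueness of the logarithm are easy to see numerically. The following is a minimal sketch under that diagonal assumption, using the principal argument from `cmath.phase` as the measurable branch; the function names are illustrative.

```python
import cmath
import math

def principal_log(u_diag):
    """Phases q_k in [-pi, pi] with exp(i q_k) = u_k, for a diagonal unitary."""
    return [cmath.phase(u) for u in u_diag]

def exp_i(q_diag):
    """Diagonal entries of exp(iQ) for Q = diag(q_diag)."""
    return [cmath.exp(1j * q) for q in q_diag]

# Diagonal entries of a unitary U: points on the unit circle.
u_diag = [1 + 0j, 1j, -1 + 0j, cmath.exp(2j)]

q = principal_log(u_diag)
# A second logarithm: shift the phase by multiples of 2*pi on some entries.
q_alt = [q[0], q[1] + 2 * math.pi, q[2], q[3] - 4 * math.pi]

err = max(abs(a - b) for a, b in zip(exp_i(q), u_diag))
err_alt = max(abs(a - b) for a, b in zip(exp_i(q_alt), u_diag))

print(err, err_alt)  # both at rounding-error level
print(q != q_alt)    # two different real diagonals with the same exponential
```

Both `q` and `q_alt` are real, so the corresponding diagonal operators are self-adjoint; they differ on the second and fourth coordinates yet exponentiate to the same $U$.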
### The Invertible Operators Form a Connected Group
The final application combines the polar decomposition and the exponential representation to prove a striking topological fact about $\mathcal{L}(H)$.
[quotetheorem:2699]
[citeproof:2699]
The path $\gamma(t) = e^{tS} e^{itQ}$ has a concrete meaning: it deforms $T$ back to the identity by simultaneously "unrotating" the unitary factor ($e^{itQ} \to I$ as $t \to 0$) and "unshrinking" the positive factor ($e^{tS} \to I$ as $t \to 0$). Both deformations pass through invertible operators because exponentials of bounded operators are always invertible.
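To make the path concrete, here is a small sketch for a diagonal invertible operator, where $S$ and $Q$ are diagonal and can be computed entrywise. The diagonal case is a simplifying assumption (everything commutes), and the sample entries and sampling grid are arbitrary illustrative choices.

```python
import cmath
import math

# Diagonal entries of an invertible operator T (all non-zero).
t_diag = [2 + 0j, -3j, -0.5 + 0.5j]

# Polar data entrywise: R = diag(|z|) = e^S and U = diag(z/|z|) = e^{iQ}.
s = [math.log(abs(z)) for z in t_diag]  # S = log R, real
q = [cmath.phase(z) for z in t_diag]    # Q = arg, real

def gamma(t):
    """Diagonal entries of gamma(t) = e^{tS} e^{itQ}."""
    return [cmath.exp(t * sk) * cmath.exp(1j * t * qk) for sk, qk in zip(s, q)]

start = gamma(0.0)
end = gamma(1.0)
end_error = max(abs(a - b) for a, b in zip(end, t_diag))

# Smallest modulus of a diagonal entry along a sampled path:
# it stays away from zero, so every sampled gamma(t) is invertible.
min_modulus = min(abs(z) for k in range(101) for z in gamma(k / 100))

print(start, end_error, min_modulus)
```

Here $|\gamma_k(t)| = |z_k|^t$, so each entry moves monotonically between $1$ and $|z_k|$ and never vanishes, which is the diagonal shadow of the statement that exponentials are invertible.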
The hypotheses are essential. Invertibility is what gives $\sigma(R) \subset (0, \infty)$ and so makes $\log(R)$ well-defined; if $T$ is not invertible, $0 \in \sigma(R)$ and the logarithm blows up. The Hilbert-space hypothesis is also load-bearing: for a general Banach space $X$, there is no inner-product structure, hence no notion of "unitary," and the polar decomposition $T = RU$ has no analogue. Indeed, the analogous statement for $G(\mathcal{L}(X))$ on a Banach space $X$ can fail: there exist Banach spaces whose group of invertible operators is disconnected, although the classical sequence spaces $\ell^p$ ($1 \le p < \infty$) are not among them. The combination of the polar decomposition and the unitary exponential representation, both of which require the Hilbert-space structure, is precisely what makes the proof work.
[explanation: Significance of Connectedness]
The connectedness of $G(\mathcal{L}(H))$ contrasts sharply with the real finite-dimensional case: $GL(n, \mathbb{R})$ has two connected components, separated by the sign of the determinant. (Over $\mathbb{C}$, the group $GL(n, \mathbb{C})$ is connected, in agreement with the theorem for $H = \mathbb{C}^n$.) In infinite dimensions there is no determinant on $\mathcal{L}(H)$ for general operators, and the proof shows directly why no such obstruction can arise: every invertible operator is a product of two exponentials, and an exponential $e^{A}$ is joined to the identity by the path $t \mapsto e^{tA}$. This result has consequences in $K$-theory and in the study of Fredholm operators and index theory.
[/explanation]