Linear analysis is the study of infinite-dimensional vector spaces equipped with extra structure — norms, inner products, topologies — that enables analysis. In finite dimensions, linear algebra suffices: every linear map is automatically continuous, every bounded sequence has a convergent subsequence, and every subspace is closed. None of these properties survive the passage to infinite dimensions, and the central challenge of the subject is understanding precisely what does and does not carry over.
The motivation is concrete. Differential equations — ordinary and partial — are naturally formulated as operator equations $Lu = f$, where $L$ is a differential operator acting on a space of [functions](/page/Function). To apply the tools of linear algebra (existence of solutions, spectral decompositions, perturbation theory), we need the space of functions to be a vector space, and $L$ to be a linear map between such spaces. But to do *analysis* — to take [limits](/page/Limit), extract convergent subsequences, approximate rough objects by smooth ones — we need more: we need a topology compatible with the algebraic structure, and ideally completeness. The theory developed in these notes provides exactly this framework.
The notes follow the Cambridge Part IB Linear Analysis course. We begin with [normed vector spaces](/page/Normed%20Vector%20Space) and their basic theory (including dual spaces and adjoint maps), then study finite-dimensional spaces (where everything is controlled by compactness of the unit ball), and proceed to the three pillars of the subject: the Hahn-Banach theorem (which guarantees a rich supply of linear functionals), the Baire Category theorem (which yields uniform boundedness, the open mapping theorem, and the closed graph theorem), and the theory of compact operators on [Hilbert spaces](/page/Hilbert%20Space) (culminating in the spectral theorem).
# Normed Vector Spaces
The starting point of linear analysis is to combine the algebraic structure of a vector space with a notion of "size" — a norm — that allows us to measure distances, define convergence, and do analysis. This chapter introduces normed vector spaces and develops the basic constructions: [Banach spaces](/page/Banach%20Space) (complete normed spaces), bounded [linear maps](/page/Linear%20Map), dual spaces, adjoints, and the bidual. These objects form the vocabulary of the entire subject.
## The Norm and its Topology
A vector space on its own carries no analytic structure: there is no notion of convergence, no concept of [continuity](/page/Continuity), no way to say that two elements are "close." A norm remedies this by assigning a non-negative real number to each vector, measuring its size.
[definition:Normed Vector Space]
A **normed vector space** is a pair $(V, \|\cdot\|)$ where $V$ is a vector space over a field $\mathbb{F}$ (either $\mathbb{R}$ or $\mathbb{C}$) and $\|\cdot\|: V \to \mathbb{R}$ is a **norm**, i.e. a function satisfying:
(i) $\|v\| \ge 0$ for all $v \in V$, with equality if and only if $v = 0$.
(ii) $\|\lambda v\| = |\lambda| \cdot \|v\|$ for all $\lambda \in \mathbb{F}$, $v \in V$.
(iii) $\|v + w\| \le \|v\| + \|w\|$ for all $v, w \in V$ (the triangle inequality).
[/definition]
A norm induces a metric $d(v, w) := \|v - w\|$, which in turn induces a topology: a set $U \subset V$ is open if and only if every point $x \in U$ has an open ball $B_\varepsilon(x) := \{y \in V : \|y - x\| < \varepsilon\} \subset U$ around it. This gives $V$ the structure of a [topological](/page/Topology) space, and a natural question arises: how does the topology interact with the algebraic operations?
The answer is that the algebraic operations are continuous with respect to the norm topology. This is the content of the following result.
[theorem:Continuity of Algebraic Operations]
Let $(V, \|\cdot\|)$ be a normed vector space. Then addition $+: V \times V \to V$ and scalar multiplication $\cdot: \mathbb{F} \times V \to V$ are continuous with respect to the norm topology on $V$ (and the product topology on the domain).
[/theorem]
The key point is that the norm provides *quantitative* control over both operations. For addition, this is immediate from the triangle inequality: if $w_i \in B_{\varepsilon/2}(v_i)$ for $i = 1, 2$, then $\|w_1 + w_2 - v_1 - v_2\| \le \|w_1 - v_1\| + \|w_2 - v_2\| < \varepsilon$, so the sum stays in $B_\varepsilon(v_1 + v_2)$. For scalar multiplication, the argument requires a bit more care because both the scalar and the vector can vary simultaneously, but the essential mechanism is the same: the estimate $\|aw - \lambda v\| \le |a| \cdot \|w - v\| + |a - \lambda| \cdot \|v\|$ shows that both factors can be controlled independently.
[proof]
**Step 1: Continuity of addition.** Let $U \subset V$ be open and take $(v_1, v_2)$ with $v_1 + v_2 \in U$. Since $U$ is open, there exists $\varepsilon > 0$ with $B_\varepsilon(v_1 + v_2) \subset U$. For any $w_i \in B_{\varepsilon/2}(v_i)$, the triangle inequality gives:
\begin{align*}
\|w_1 + w_2 - v_1 - v_2\| \le \|w_1 - v_1\| + \|w_2 - v_2\| < \varepsilon/2 + \varepsilon/2 = \varepsilon.
\end{align*}
Hence $B_{\varepsilon/2}(v_1) \times B_{\varepsilon/2}(v_2) \subset (+)^{-1}(U)$, which is open in the product topology.
**Step 2: Continuity of scalar multiplication.** Let $U \subset V$ be open and take $(\lambda, v)$ with $\lambda v \in U$. Choose $\varepsilon > 0$ with $B_\varepsilon(\lambda v) \subset U$.
*Case $\lambda \ne 0$, $v \ne 0$:* Set $\delta = \varepsilon / (2\|v\|)$ and $r = \varepsilon / (2(|\lambda| + \delta))$. Every $a \in \mathbb{F}$ with $|a - \lambda| < \delta$ satisfies $|a| < |\lambda| + \delta$, and hence $|a| r < \varepsilon/2$. For such $a$ and any $w \in B_r(v)$:
\begin{align*}
\|aw - \lambda v\| \le \|aw - av\| + \|av - \lambda v\| = |a| \cdot \|w - v\| + |a - \lambda| \cdot \|v\| < \varepsilon/2 + \varepsilon/2 = \varepsilon.
\end{align*}
*Case $\lambda = 0$:* We need $\|aw\| < \varepsilon$. For $w \in B_1(v)$ we have $\|w\| < \|v\| + 1$, so taking $|a| < \varepsilon / (\|v\| + 1)$ gives $\|aw\| = |a| \cdot \|w\| < \varepsilon$.
*Case $v = 0$:* We need $\|aw\| < \varepsilon$. Fix any $\delta > 0$ and choose $r > 0$ with $(|\lambda| + \delta) r < \varepsilon$. Then for $|a - \lambda| < \delta$ and $w \in B_r(0)$ we get $\|aw\| = |a| \cdot \|w\| < (|\lambda| + \delta)r < \varepsilon$.
In every case we have found an open neighbourhood of $(\lambda, v)$ mapping into $U$.
[/proof]
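The choice of $\delta$ and $r$ in the case $\lambda \ne 0$, $v \ne 0$ can be sanity-checked numerically. This is a minimal sketch that assumes $\mathbb{F} = \mathbb{R}$, $V = \mathbb{R}^3$ with the Euclidean norm, and the concrete choice $r = \varepsilon / (2(|\lambda| + \delta))$, which is one way to meet the requirement $|a| r < \varepsilon/2$. The helper `neighbourhood_radii` is a hypothetical name. The code samples random pairs $(a, w)$ from the neighbourhood and records the largest value of $\|aw - \lambda v\|$.

```python
import numpy as np

def neighbourhood_radii(lam, v, eps):
    """Hypothetical helper: (delta, r) for the case lam != 0, v != 0.

    delta = eps / (2 ||v||) controls the |a - lam| * ||v|| term;
    r = eps / (2 (|lam| + delta)) forces |a| * r < eps / 2 whenever |a - lam| < delta.
    """
    norm_v = np.linalg.norm(v)
    delta = eps / (2 * norm_v)
    r = eps / (2 * (abs(lam) + delta))
    return delta, r

rng = np.random.default_rng(0)
lam, v, eps = 2.5, np.array([1.0, -3.0, 0.5]), 1e-2
delta, r = neighbourhood_radii(lam, v, eps)

worst = 0.0
for _ in range(10_000):
    # Sample a with |a - lam| < delta and w with ||w - v|| < r.
    a = lam + delta * rng.uniform(-1.0, 1.0)
    direction = rng.normal(size=v.shape)
    w = v + r * rng.uniform(0.0, 1.0) * direction / np.linalg.norm(direction)
    worst = max(worst, np.linalg.norm(a * w - lam * v))

print(worst < eps)  # True: every sampled point lands in B_eps(lam * v)
```

Random sampling only illustrates the estimate. The proof is what guarantees it for every point of the neighbourhood.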
This result, together with the fact that singletons are closed in any metric space, shows that every normed vector space is a **[topological vector space](/page/Topological%20Vector%20Space)**: a vector space equipped with a topology making addition and scalar multiplication continuous, and in which singleton sets are closed. The converse question (when is a topological vector space normable?) has a clean answer: a topological vector space admits a norm inducing its topology if and only if it has an absolutely convex bounded neighbourhood of $0$. We will not pursue this direction further, but it explains why normed spaces occupy a privileged position: they are precisely the topological vector spaces whose local geometry is controlled by a single convex, balanced set (the unit ball).
## Banach Spaces
A normed vector space provides the setting for analysis, but without completeness (the property that every Cauchy sequence converges) many fundamental tools are unavailable. We cannot guarantee that limits of approximating [sequences](/page/Sequence) exist in the space. A series $\sum_n v_n$ with $\sum_n \|v_n\| < \infty$ need not converge. We also cannot apply the powerful results (Baire category, open mapping, uniform boundedness) that form the backbone of the subject. Completeness is therefore not a luxury but a necessity.
[definition:Banach Space]
A normed vector space $(V, \|\cdot\|)$ is a **Banach space** if it is complete as a metric space, i.e. if every Cauchy sequence in $V$ converges to a limit in $V$.
[/definition]
[example:Bounded Functions]
Let $X$ be a compact Hausdorff space. Define
\begin{align*}
B(X) := \{f: X \to \mathbb{R} \mid f \text{ is bounded}\}
\end{align*}
with the supremum norm $\|f\|_\infty := \sup_{x \in X} |f(x)|$. We claim $(B(X), \|\cdot\|_\infty)$ is a Banach space.
Let $(f_n)_{n=1}^\infty$ be a Cauchy sequence in $B(X)$. For each fixed $x \in X$, the numerical sequence $(f_n(x))_{n=1}^\infty$ satisfies $|f_n(x) - f_m(x)| \le \|f_n - f_m\|_\infty$, so it is Cauchy in $\mathbb{R}$. By completeness of $\mathbb{R}$, the pointwise limit $f(x) := \lim_{n \to \infty} f_n(x)$ exists for every $x$.
We now show $f \in B(X)$ and $f_n \to f$ in $B(X)$. Let $\varepsilon > 0$. Since $(f_n)$ is Cauchy, there exists $N$ such that $\|f_n - f_m\|_\infty < \varepsilon/2$ for all $n, m \ge N$. For each fixed $x \in X$ and $n \ge N$, let $m \to \infty$ in the inequality $|f_n(x) - f_m(x)| < \varepsilon/2$ to obtain $|f_n(x) - f(x)| \le \varepsilon/2$. In particular $f$ is bounded, because $|f(x)| \le |f(x) - f_N(x)| + |f_N(x)| \le \varepsilon/2 + \|f_N\|_\infty$ for all $x$, so $f \in B(X)$. Since the bound $|f_n(x) - f(x)| \le \varepsilon/2$ holds for every $x$, we conclude that $\|f_n - f\|_\infty \le \varepsilon/2 < \varepsilon$ for all $n \ge N$.
[/example]
The argument above — construct the limit pointwise using completeness of the scalars, then upgrade to norm convergence — is the standard method for proving completeness of function spaces defined by supremum-type norms. It applies equally to the next example.
[example:Continuous Functions on a Compact Space]
Let $X$ be a compact Hausdorff space. Define $C(X) := \{f: X \to \mathbb{R} \mid f \text{ is continuous}\}$ with the supremum norm. Since $X$ is compact, every continuous $f: X \to \mathbb{R}$ is bounded, so $C(X) \subset B(X)$ is a linear subspace.
The space $(C(X), \|\cdot\|_\infty)$ is a Banach space. Since $(B(X), \|\cdot\|_\infty)$ is complete, it suffices to show that $C(X)$ is a closed subspace. Suppose $f_n \to f$ in $B(X)$ with each $f_n$ continuous. Then $f_n \to f$ uniformly, and the uniform limit of continuous functions is continuous. Hence $f \in C(X)$, so $C(X)$ is closed and therefore complete.
[/example]
[example:The Space $\ell^p$]
For $p \in [1, \infty)$, define the sequence space
\begin{align*}
\ell^p(\mathbb{R}) := \left\{x = (x_1, x_2, \ldots) \mid x_i \in \mathbb{R} \text{ for all } i \text{ and } \sum_{i=1}^\infty |x_i|^p < \infty \right\}
\end{align*}
with the norm $\|x\|_{\ell^p} := \left(\sum_{i=1}^\infty |x_i|^p\right)^{1/p}$. The triangle inequality for this norm is the **Minkowski inequality**. The space $(\ell^p, \|\cdot\|_{\ell^p})$ is a Banach space. The completeness proof follows the same pointwise-then-upgrade structure as for $B(X)$:
Let $(x^{(n)})_{n=1}^\infty$ be a Cauchy sequence in $\ell^p$. For each fixed index $i$, the scalar sequence $(x_i^{(n)})_n$ satisfies $|x_i^{(n)} - x_i^{(m)}| \le \|x^{(n)} - x^{(m)}\|_{\ell^p}$, because $|y_i|^p \le \sum_{j=1}^\infty |y_j|^p = \|y\|_{\ell^p}^p$ for every $y \in \ell^p$. It is therefore Cauchy in $\mathbb{R}$ and converges to some $x_i \in \mathbb{R}$. Define $x := (x_1, x_2, \ldots)$. To show $x \in \ell^p$ and $x^{(n)} \to x$ in $\ell^p$: fix $\varepsilon > 0$ and choose $N$ with $\|x^{(n)} - x^{(m)}\|_{\ell^p}^p < \varepsilon^p$ for all $n, m \ge N$. For any finite truncation length $K$:
\begin{align*}
\sum_{i=1}^K |x_i^{(n)} - x_i|^p = \lim_{m \to \infty} \sum_{i=1}^K |x_i^{(n)} - x_i^{(m)}|^p \le \liminf_{m \to \infty} \|x^{(n)} - x^{(m)}\|_{\ell^p}^p \le \varepsilon^p.
\end{align*}
Taking $K \to \infty$ gives $\|x^{(n)} - x\|_{\ell^p}^p \le \varepsilon^p$ for all $n \ge N$, and the triangle inequality gives $\|x\|_{\ell^p} \le \|x - x^{(N)}\|_{\ell^p} + \|x^{(N)}\|_{\ell^p} < \infty$.
Similarly, $\ell^\infty(\mathbb{R}) := \{x = (x_1, x_2, \ldots) \mid \sup_i |x_i| < \infty\}$ with norm $\|x\|_{\ell^\infty} := \sup_i |x_i|$ is a Banach space (the completeness proof is identical to the proof for $B(X)$, with the index set $\mathbb{N}$ replacing $X$). These sequence spaces serve as fundamental building blocks throughout functional analysis.
[/example]
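For finitely supported sequences (all later terms zero), the $\ell^p$ and $\ell^\infty$ norms reduce to finite sums and maxima, so the Minkowski inequality can be spot-checked numerically. This is a minimal sketch. The helper `lp_norm` is a hypothetical name, and random testing is of course no substitute for the proof.

```python
import numpy as np

def lp_norm(x, p):
    """l^p norm of a finitely supported sequence (p = np.inf gives the sup norm)."""
    x = np.abs(np.asarray(x, dtype=float))
    if p == np.inf:
        return x.max()
    return (x ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(1)
for p in [1, 1.5, 2, 3, np.inf]:
    for _ in range(1000):
        x, y = rng.normal(size=20), rng.normal(size=20)
        # Minkowski inequality, up to floating-point rounding.
        assert lp_norm(x + y, p) <= lp_norm(x, p) + lp_norm(y, p) + 1e-12

print(lp_norm([3, 4], 2), lp_norm([1, -2, 3], 1), lp_norm([1, -5, 2], np.inf))  # 5.0 6.0 5.0
```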
[example:An Incomplete Normed Space]
Consider the space $\hat{L}^1([0,1])$ of continuous functions $f: [0,1] \to \mathbb{R}$ with the $L^1$ norm $\|f\|_1 := \int_0^1 |f(x)| \, d\mathcal{L}^1(x)$. This is a normed vector space (the triangle inequality is the Minkowski inequality), but it is **not** complete. We exhibit an explicit Cauchy sequence with no limit in the space.
Define $f_n: [0,1] \to \mathbb{R}$ by:
\begin{align*}
f_n(x) := \begin{cases} 0 & \text{if } 0 \le x \le \tfrac{1}{2}, \\ n(x - \tfrac{1}{2}) & \text{if } \tfrac{1}{2} < x \le \tfrac{1}{2} + \tfrac{1}{n}, \\ 1 & \text{if } \tfrac{1}{2} + \tfrac{1}{n} < x \le 1. \end{cases}
\end{align*}
Each $f_n$ is continuous, so $f_n \in \hat{L}^1([0,1])$. We claim $(f_n)_n$ is Cauchy in the $L^1$ norm. For $m > n$, the functions $f_n$ and $f_m$ agree outside the interval $[\tfrac{1}{2}, \tfrac{1}{2} + \tfrac{1}{n}]$, and on this interval both take values in $[0, 1]$. Therefore:
\begin{align*}
\|f_n - f_m\|_1 = \int_{1/2}^{1/2 + 1/n} |f_n(x) - f_m(x)| \, d\mathcal{L}^1(x) \le \int_{1/2}^{1/2 + 1/n} 1 \, d\mathcal{L}^1(x) = \frac{1}{n} \to 0.
\end{align*}
So $(f_n)_n$ is Cauchy. However, $f_n$ converges pointwise to the Heaviside function $f(x) = \mathbb{1}_{(1/2, 1]}(x)$, which is discontinuous at $x = 1/2$. If $(f_n)_n$ had a limit $g \in \hat{L}^1([0,1])$ (i.e., $g$ continuous), then $\|f_n - g\|_1 \to 0$ would force $f_n \to g$ in measure, and hence some subsequence would converge to $g$ pointwise a.e. But the full sequence already converges pointwise to $f$, so $g = f$ a.e. This is impossible for continuous $g$. Since $g = 0$ a.e. on $[0, 1/2]$ and $g = 1$ a.e. on $(1/2, 1]$, continuity forces $g \equiv 0$ on $[0, 1/2]$ and $g \equiv 1$ on $[1/2, 1]$, so $g(1/2)$ would have to equal both $0$ and $1$.
This example reveals the fundamental mechanism behind incompleteness: the $L^1$ norm is *too weak* to prevent limits from developing discontinuities. The continuous functions which are "close" in $L^1$ can differ on [sets](/page/Set) of small measure but by large amounts — the norm controls the area between the graphs, not the height of the oscillations. The completion of $\hat{L}^1([0,1])$ is the Lebesgue space $L^1([0,1])$, consisting of [integrable](/page/Integral) measurable functions identified up to a.e. equivalence, which includes the Heaviside function.
[/example]
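The $L^1$ distances in this example can be computed in closed form. For $2 \le n < m$ one has $f_m \ge f_n$ pointwise and $\int_0^1 f_n \, d\mathcal{L}^1 = \tfrac{1}{2} - \tfrac{1}{2n}$. It follows that $\|f_n - f_m\|_1 = \tfrac{1}{2n} - \tfrac{1}{2m}$, which sharpens the bound $1/n$. Similarly, the distance from $f_n$ to the Heaviside function is $\tfrac{1}{2n}$. The sketch below checks these values with a midpoint rule. The grid size and helper names are illustrative choices.

```python
import numpy as np

def ramp(n, x):
    """f_n from the example: 0 on [0, 1/2], slope n, then constant 1."""
    return np.clip(n * (x - 0.5), 0.0, 1.0)

def heaviside(x):
    """Indicator of (1/2, 1], the pointwise limit of f_n."""
    return np.where(x > 0.5, 1.0, 0.0)

N = 200_000
x = (np.arange(N) + 0.5) / N          # midpoints of N equal cells in [0, 1]

def l1(values):
    """Midpoint-rule approximation of the L^1 norm on [0, 1]."""
    return np.mean(np.abs(values))

for n, m in [(2, 3), (5, 50), (10, 1000)]:
    approx = l1(ramp(n, x) - ramp(m, x))
    exact = 1 / (2 * n) - 1 / (2 * m)  # closed form, valid for 2 <= n < m
    print(f"n={n:>2}, m={m:>4}: {approx:.6f} vs {exact:.6f}")
```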
## Bounded Linear Maps
Given two normed vector spaces $X$ and $Y$, the natural morphisms to study are linear maps $T: X \to Y$ that respect the topological structure — that is, continuous linear maps. In finite dimensions every linear map is continuous, but this fails dramatically in infinite dimensions: for instance, the differentiation operator $T: C^1([0,1]) \to C([0,1])$ defined by $T(f) = f'$ is unbounded when $C^1([0,1])$ carries the supremum norm (the sequence $f_n(x) = x^n$ has $\|f_n\|_\infty = 1$ but $\|T(f_n)\|_\infty = n$). Understanding which linear maps are well-behaved requires the notion of boundedness.
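The unboundedness of differentiation is easy to see numerically. On a grid containing $x = 1$, where both suprema are attained, the ratio $\|T(f_n)\|_\infty / \|f_n\|_\infty$ equals $n$ exactly. This is a minimal sketch that samples on a uniform grid, which is an illustrative choice:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 10_001)     # uniform grid on [0, 1], includes x = 1

for n in [1, 5, 20, 100]:
    f_n = x ** n                      # f_n(x) = x^n
    df_n = n * x ** (n - 1)           # T(f_n)(x) = f_n'(x) = n x^(n-1)
    ratio = np.abs(df_n).max() / np.abs(f_n).max()
    print(n, ratio)                   # ratio equals n, so no single C bounds ||T f|| / ||f||
```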
[definition:Bounded Linear Map]
Let $X$ and $Y$ be normed vector spaces. A linear map $T: X \to Y$ is **bounded** if there exists $C > 0$ such that $\|T(x)\|_Y \le C\|x\|_X$ for all $x \in X$. We write $\mathcal{L}(X, Y)$ for the space of all bounded linear maps $X \to Y$.
[/definition]
The condition says that $T$ maps the unit ball of $X$ into a bounded subset of $Y$ — equivalently, that $T$ does not "blow up" any bounded set. The key insight is that for linear maps, this is the *same* condition as continuity. This equivalence is the most basic and important fact about linear maps between normed spaces.
[quotetheorem:873]
The equivalence (ii) $\Leftrightarrow$ (iii) is the heart of the result. The point is that linearity reduces a global condition (continuity everywhere) to a local one (continuity at a single point), and then the homogeneity of the norm upgrades this further to a quantitative bound. The Lipschitz condition in (iii) $\Rightarrow$ (i) shows that bounded linear maps are not just continuous but uniformly continuous — their modulus of continuity is linear.
[citeproof:873]
With the equivalence established, we equip $\mathcal{L}(X, Y)$ with a natural norm that measures the "worst-case amplification" of a linear map.
[definition:Operator Norm]
For a bounded linear map $T \in \mathcal{L}(X, Y)$, the **operator norm** is defined by:
\begin{align*}
\|T\|_{\mathcal{L}(X,Y)} := \sup_{\substack{x \in X \\ \|x\|_X \le 1}} \|T(x)\|_Y.
\end{align*}
[/definition]
The operator norm is the smallest constant $C$ for which the boundedness condition $\|T(x)\|_Y \le C\|x\|_X$ holds for all $x \in X$. An equivalent expression, often more convenient for computation, is $\|T\|_{\mathcal{L}(X,Y)} = \sup_{x \ne 0} \|T(x)\|_Y / \|x\|_X$, which follows from the linearity of $T$. The key consequence for applications is the fundamental estimate:
\begin{align*}
\|T(x)\|_Y \le \|T\|_{\mathcal{L}(X,Y)} \cdot \|x\|_X \quad \text{for all } x \in X.
\end{align*}
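In finite dimensions the supremum can be explored directly. For a matrix $A$ acting between Euclidean spaces, it is a standard fact of linear algebra (not proved in these notes) that the operator norm equals the largest singular value, which NumPy returns as `np.linalg.norm(A, 2)`. The sketch below uses an arbitrary random matrix and compares this value with a brute-force supremum over sampled unit vectors. The sampled supremum can only give a lower estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 4))               # T: R^4 -> R^3, Euclidean norms on both sides

# Lower estimate: maximum of ||A x|| over many random unit vectors x.
xs = rng.normal(size=(4, 100_000))
xs /= np.linalg.norm(xs, axis=0)          # normalise each column to ||x|| = 1
sampled = np.max(np.linalg.norm(A @ xs, axis=0))

exact = np.linalg.norm(A, 2)              # largest singular value of A
print(sampled, exact)

# The fundamental estimate ||A y|| <= ||A|| ||y|| for an arbitrary y.
y = rng.normal(size=4)
assert np.linalg.norm(A @ y) <= exact * np.linalg.norm(y) + 1e-12
```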
[example:Operator Norm of the Integration Operator]
Define $T: (C([0,1]), \|\cdot\|_\infty) \to (C([0,1]), \|\cdot\|_\infty)$ by
\begin{align*}
(Tf)(x) := \int_0^x f(t) \, d\mathcal{L}^1(t).
\end{align*}
We compute $\|T\|_{\mathcal{L}(C([0,1]))}$ exactly.
**Upper bound.** For any $f \in C([0,1])$ with $\|f\|_\infty \le 1$ and any $x \in [0,1]$:
\begin{align*}
|(Tf)(x)| = \left|\int_0^x f(t) \, d\mathcal{L}^1(t)\right| \le \int_0^x |f(t)| \, d\mathcal{L}^1(t) \le \int_0^x 1 \, d\mathcal{L}^1(t) = x \le 1.
\end{align*}
Taking the supremum over $x$: $\|Tf\|_\infty \le 1$. Hence $\|T\| \le 1$.
**Lower bound.** Take the constant function $f \equiv 1$, which has $\|f\|_\infty = 1$. Then $(Tf)(x) = x$, so $\|Tf\|_\infty = \sup_{x \in [0,1]} x = 1$. Therefore:
\begin{align*}
\|T\| \ge \frac{\|Tf\|_\infty}{\|f\|_\infty} = \frac{1}{1} = 1.
\end{align*}
Combining: $\|T\|_{\mathcal{L}(C([0,1]))} = 1$. The supremum in the definition of the operator norm is attained by the constant function — not by a sequence of increasingly pathological functions. This is a special feature of this operator; in general, the supremum need not be attained.
Note also that $T$ is injective (if $Tf \equiv 0$ then $\int_0^x f(t) \, dt = 0$ for all $x$, so $f \equiv 0$ by the [Fundamental Theorem of Calculus](/theorems/632)) but *not* surjective (e.g., the constant function $g \equiv 1$ is not in the range, since $Tf$ must vanish at $x = 0$). So $T$ is a bounded injective operator that is not invertible — the invertibility of a bounded linear map between infinite-dimensional spaces cannot be read off from injectivity alone.
[/example]
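A discretisation makes both the value $\|T\| = 1$ and the failure of surjectivity visible. This sketch assumes a left-endpoint Riemann sum on a uniform grid, which is an illustrative choice and not part of the example. Under that assumption, $T$ becomes a strictly lower-triangular matrix. With the supremum norm on both sides, the operator norm of a matrix is its maximum absolute row sum, which NumPy provides as `np.linalg.norm(A, np.inf)`.

```python
import numpy as np

N = 1000
h = 1.0 / N
x = np.linspace(0.0, 1.0, N + 1)               # grid points x_i = i h

# Left-endpoint rule: (Tf)(x_i) is approximated by h * sum_{j < i} f(x_j).
A = h * np.tril(np.ones((N + 1, N + 1)), k=-1)

print(np.linalg.norm(A, np.inf))               # max row sum: N * h = 1 (up to rounding)

Tf = A @ np.ones(N + 1)                        # f = 1 attains the bound: Tf(x) = x
print(np.max(np.abs(Tf)))

# The first row is zero, so every discretised Tf vanishes at x = 0,
# mirroring why the constant function 1 is not in the range of T.
print(np.all(A[0] == 0))
```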
## Dual Spaces
Among all bounded linear maps out of a normed vector space $V$, those mapping into the scalar field $\mathbb{F}$ play a distinguished role. These are the **linear functionals** — they assign a number to each vector, and they are the objects through which we "probe" the space. In finite dimensions, the dual space is isomorphic to the space itself, but in infinite dimensions the dual can be much larger and carries structure that the original space may lack. Most importantly, the dual is *always* complete, even when the original space is not.
[definition:Dual Space]
Let $V$ be a normed vector space over $\mathbb{F}$. The **dual space** of $V$, denoted $V^*$, is defined as $V^* := \mathcal{L}(V, \mathbb{F})$, the space of all bounded linear functionals $V \to \mathbb{F}$, equipped with the operator norm.
[/definition]
The fact that dual spaces are always Banach spaces is one of the most useful structural results in the theory. It means that even when working with an incomplete normed vector space, its dual provides a complete space to work in.
[quotetheorem:874]
The proof is a model instance of the "pointwise limit" technique that appears repeatedly in functional analysis: construct the candidate limit pointwise using completeness of the scalars, verify that it belongs to the space (linearity from linearity of limits, boundedness from uniform boundedness of the Cauchy sequence), and then upgrade pointwise convergence to norm convergence using the Cauchy condition. The same argument, with $\mathbb{F}$ replaced by an arbitrary Banach space $Y$, shows that $\mathcal{L}(X, Y)$ is complete whenever $Y$ is complete.
[citeproof:874]
[example:Dual of $\ell^p$]
Consider $\ell^p(\mathbb{R})$ for $p \in [1, \infty)$, and let $q$ denote its Hölder conjugate: $1/p + 1/q = 1$ (with $q = \infty$ if $p = 1$). Every element $y = (y_1, y_2, \ldots) \in \ell^q$ defines a bounded linear functional $y^*: \ell^p \to \mathbb{R}$ via
\begin{align*}
y^*(x) := \sum_{i=1}^\infty x_i y_i.
\end{align*}
This sum converges absolutely by Hölder's inequality: $|y^*(x)| \le \|x\|_{\ell^p} \cdot \|y\|_{\ell^q}$, which also gives $\|y^*\|_{(\ell^p)^*} \le \|y\|_{\ell^q}$. The map $y \mapsto y^*$ is an isometric isomorphism $\ell^q \cong (\ell^p)^*$. The identification $(\ell^p)^* \cong \ell^q$ is fundamental: it shows that dualisation "swaps" the integrability exponent to its conjugate.
[/example]
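Hölder's inequality gives $\|y^*\| \le \|y\|_{\ell^q}$. For $1 < p < \infty$, equality is attained by the standard choice $x_i = \operatorname{sgn}(y_i)|y_i|^{q-1}$. This norming vector is not stated in the example above, so treat it as a supplement. The sketch checks both facts for a finitely supported $y$, with $p = 3$ and an arbitrarily chosen vector length. It illustrates the isometry property of $y \mapsto y^*$, not its surjectivity.

```python
import numpy as np

def lp_norm(x, p):
    """l^p norm of a finite vector."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

p = 3.0
q = p / (p - 1)                     # Hölder conjugate: 1/p + 1/q = 1

rng = np.random.default_rng(3)
y = rng.normal(size=50)             # a finitely supported element of l^q

# Hölder: |sum x_i y_i| <= ||x||_p ||y||_q for random x.
for _ in range(1000):
    x = rng.normal(size=50)
    assert abs(x @ y) <= lp_norm(x, p) * lp_norm(y, q) + 1e-9

# x_i = sign(y_i) |y_i|^(q-1) attains equality, since (q - 1) p = q.
x_star = np.sign(y) * np.abs(y) ** (q - 1)
ratio = (x_star @ y) / lp_norm(x_star, p)
print(ratio, lp_norm(y, q))         # the two values agree up to rounding
```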
## Adjoint Maps
Given a bounded linear map $T: X \to Y$ between normed vector spaces, there is a natural way to "pull back" linear functionals from $Y^*$ to $X^*$. If $g: Y \to \mathbb{F}$ is a bounded linear functional on $Y$, then the composition $g \circ T: X \to \mathbb{F}$ is a bounded linear functional on $X$. This construction defines the adjoint.
[definition:Adjoint Map]
Let $X$, $Y$ be normed vector spaces and $T \in \mathcal{L}(X, Y)$. The **adjoint** of $T$ is the map $T^* \in \mathcal{L}(Y^*, X^*)$ defined by:
\begin{align*}
(T^*(g))(x) := g(T(x)) \quad \text{for all } x \in X, \, g \in Y^*.
\end{align*}
[/definition]
The adjoint is characterised by the identity $T^*(g) = g \circ T$ for every $g \in Y^*$. In other words, pulling a functional back along $T$ is the same as first applying $T$ and then the functional. The construction is natural in the sense that it depends only on the map $T$ and the linear structure, not on any choice of basis or inner product. A fundamental question is whether this transposition preserves the size of the operator.
[quotetheorem:876]
The inequality $\|T^*\| \le \|T\|$ says that the adjoint never amplifies more than the original map. With the Hahn-Banach theorem (which we will prove in the next chapter), this inequality strengthens to equality: $\|T^*\|_{\mathcal{L}(Y^*, X^*)} = \|T\|_{\mathcal{L}(X, Y)}$. The missing ingredient is a supply of support functionals. For each $x \in X$ with $T(x) \ne 0$, we need a functional $g \in Y^*$ with $\|g\|_{Y^*} = 1$ and $g(T(x)) = \|T(x)\|_Y$. Such a $g$ gives $\|T^*\| \ge \|T^*(g)\|_{X^*} \ge |g(T(x))| / \|x\|_X = \|T(x)\|_Y / \|x\|_X$, and taking the supremum over $x$ yields the reverse inequality. We will return to this after developing the Hahn-Banach theorem.
[citeproof:876]
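For matrices, the adjoint is the familiar transpose. This sketch identifies functionals on $\mathbb{R}^m$ with vectors via the dot product, which is an assumption of the illustration. Under that identification, the defining identity $(T^*(g))(x) = g(T(x))$ says exactly that $T^*$ acts as $A^{\mathsf{T}}$. With Euclidean norms the dual norm is again Euclidean. Moreover, $\|A^{\mathsf{T}}\| = \|A\|$ because both equal the largest singular value, so in this finite-dimensional setting the equality $\|T^*\| = \|T\|$ can be observed without Hahn-Banach. The matrix and vectors are arbitrary random choices.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 5))     # T: R^5 -> R^3
x = rng.normal(size=5)
g = rng.normal(size=3)          # a functional on R^3, represented by a vector

# Defining identity (T* g)(x) = g(T x): with dot-product pairings, T* acts as A.T.
assert np.isclose(g @ (A @ x), (A.T @ g) @ x)

# With Euclidean norms (self-dual), ||T*|| = ||T||: both are the largest singular value.
print(np.linalg.norm(A, 2), np.linalg.norm(A.T, 2))
```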
## The Bidual and Reflexivity
We have seen that every normed vector space $V$ has a dual $V^*$, which is always a Banach space. Since $V^*$ is itself a normed vector space, it too has a dual — the **bidual** $V^{**} := (V^*)^*$. This is the space of bounded linear functionals on $V^*$, and it is again a Banach space (by the [Completeness of the Dual Space](/theorems/874), applied to $V^*$).
There is a natural way for elements of $V$ to act as functionals on $V^*$: each $v \in V$ defines an "evaluation at $v$" map $\phi(v): V^* \to \mathbb{F}$ by $(\phi(v))(f) := f(v)$. This gives a map $\phi: V \to V^{**}$, the **canonical embedding**. The fundamental question is: does $\phi$ identify $V$ with all of $V^{**}$, or is $V^{**}$ strictly larger?
[definition:Canonical Embedding]
Let $V$ be a normed vector space. The **canonical embedding** $\phi: V \to V^{**}$ is defined by:
\begin{align*}
(\phi(v))(f) := f(v) \quad \text{for all } f \in V^*, \, v \in V.
\end{align*}
[/definition]
At this stage we can prove that $\phi$ is bounded with $\|\phi\|_{\mathcal{L}(V, V^{**})} \le 1$. The argument is elementary: for any $v \in V$ and $f \in V^*$ with $\|f\|_{V^*} \le 1$, we have $|(\phi(v))(f)| = |f(v)| \le \|f\|_{V^*} \cdot \|v\|_V \le \|v\|_V$, so $\|\phi(v)\|_{V^{**}} \le \|v\|_V$.
The deeper fact — that $\phi$ is actually an *isometry*, meaning $\|\phi(v)\|_{V^{**}} = \|v\|_V$ for all $v \in V$ — requires producing a functional $f_v \in V^*$ that achieves the maximum $|f_v(v)| = \|v\|_V$ with $\|f_v\|_{V^*} = 1$. Such a functional is called a **support functional**, and its existence is guaranteed by the Hahn-Banach theorem. We will prove this isometry in the next chapter.
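In finite dimensions support functionals can be written down explicitly. As an illustration, take $V = \mathbb{R}^n$ with the $\ell^1$ norm. Its dual is $\mathbb{R}^n$ with the max norm, acting through the dot product. For $v \ne 0$, the sign vector of $v$ has dual norm $1$ and returns $\|v\|_1$, so it is a support functional. On the zero coordinates of $v$ any value in $[-1, 1]$ may be used, which shows that support functionals need not be unique. The helper name below is hypothetical.

```python
import numpy as np

def support_functional_l1(v):
    """Hypothetical helper: a support functional for v != 0 in (R^n, ||.||_1).

    Functionals are represented by vectors f acting as f(v) = f @ v; the dual
    norm is then the max norm ||f||_inf.
    """
    f = np.sign(v)
    f[f == 0] = 1.0   # any value in [-1, 1] works on zero coordinates
    return f

v = np.array([2.0, -1.0, 0.0, 3.5])
f_v = support_functional_l1(v)
print(f_v, f_v @ v, np.abs(v).sum(), np.max(np.abs(f_v)))
```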
Since every isometry is injective, the canonical embedding allows us to view $V$ as a subspace of $V^{**}$. The bidual is therefore "at least as large" as $V$. When the two spaces coincide, the structure theory becomes particularly clean.
[definition:Reflexive Space]
A normed vector space $V$ is **reflexive** if the canonical embedding $\phi: V \to V^{**}$ is surjective, i.e. $\phi(V) = V^{**}$.
[/definition]
Reflexivity is not merely the existence of *some* isomorphism $V \cong V^{**}$. It demands that the *canonical* embedding is surjective. This distinction matters, because there exist non-reflexive Banach spaces $X$ for which $X \cong X^{**}$ via a non-canonical isomorphism. The standard example is the James space constructed by R. C. James.
[example:Finite-Dimensional Spaces Are Reflexive]
Let $V = \mathbb{R}^n$ with the Euclidean norm and standard basis $\{e_1, \ldots, e_n\}$. Every linear functional $g: V \to \mathbb{R}$ is determined by $g(e_1), \ldots, g(e_n)$. Every such functional is bounded, since $|g(v)| \le \sum_{i=1}^n |v_i| \cdot |g(e_i)| \le \max_i |g(e_i)| \cdot \sum_{i=1}^n |v_i| \le \sqrt{n} \cdot \max_i |g(e_i)| \cdot \|v\|$ by the Cauchy-Schwarz inequality $\sum_{i=1}^n |v_i| \le \sqrt{n}\,\|v\|$. Hence $V^*$ is $n$-dimensional. Applying the same estimate to $V^*$ (with the dual basis, and using $|g(e_i)| \le \|g\|_{V^*}$ in place of Cauchy-Schwarz) shows that every linear functional on $V^*$ is bounded, so $V^{**}$ is $n$-dimensional as well. The canonical embedding $\phi: V \to V^{**}$ is injective: if $\phi(v) = 0$, then every coordinate functional vanishes on $v$, so $v = 0$. An injective linear map between spaces of the same finite dimension is surjective, so $\phi$ is onto. So all finite-dimensional normed vector spaces are reflexive.
[/example]
[remark:Reflexivity Implies Completeness]
If $V$ is reflexive, then $V$ must be a Banach space. Indeed, $V^{**} = (V^*)^*$ is the dual of a normed vector space and hence is complete by the [Completeness of the Dual Space](/theorems/874). Since $\phi(V) = V^{**}$ and $\phi$ is an isometry (a fact we will establish with the Hahn-Banach theorem in the next chapter), $V$ is isometrically isomorphic to a complete space and is therefore complete. This means that incompleteness is an immediate obstruction to reflexivity: no incomplete normed vector space can be reflexive.
[/remark]
The spaces $\ell^p$ are reflexive for $p \in (1, \infty)$, since $(\ell^p)^* \cong \ell^q$ and $(\ell^q)^* \cong \ell^p$ where $1/p + 1/q = 1$, and the composition of these identifications coincides with the canonical embedding. However, $\ell^1$ is not reflexive (its dual is $\ell^\infty$, which is "too large"), and $\ell^\infty$ is not reflexive either. The failure of reflexivity for $\ell^1$ and $\ell^\infty$ is closely connected to the weak and weak* topologies, which we will not pursue in this course but which are central to Part II Analysis of Functions.
The canonical embedding, adjoint maps, and dual spaces interact in a coherent way. Given $T \in \mathcal{L}(X, Y)$, its adjoint $T^* \in \mathcal{L}(Y^*, X^*)$ preserves the norm (once we have the Hahn-Banach theorem to strengthen the inequality $\|T^*\| \le \|T\|$ to equality). The adjoint of the adjoint, $T^{**} \in \mathcal{L}(X^{**}, Y^{**})$, extends the original map $T$ in the sense that $T^{**} \circ \phi_X = \phi_Y \circ T$, where $\phi_X$ and $\phi_Y$ are the canonical embeddings. This coherence is what makes the theory of duality a powerful structural tool rather than merely a formal construction.
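The identity $T^{**} \circ \phi_X = \phi_Y \circ T$ can be checked by unwinding the definitions, using the fact that $T^{**} := (T^*)^*$ acts on $\Lambda \in X^{**}$ by $(T^{**}(\Lambda))(g) = \Lambda(T^*(g))$. For $x \in X$ and $g \in Y^*$:
\begin{align*}
(T^{**}(\phi_X(x)))(g) = (\phi_X(x))(T^*(g)) = (T^*(g))(x) = g(T(x)) = (\phi_Y(T(x)))(g).
\end{align*}
Since this holds for every $g \in Y^*$, we get $T^{**}(\phi_X(x)) = \phi_Y(T(x))$ for every $x \in X$.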
# Finite-Dimensional Normed Vector Spaces
In the previous chapter we built the general theory of normed vector spaces, dual spaces, and bounded linear maps. The present chapter specialises to the finite-dimensional setting, where four remarkable properties hold simultaneously: all norms are equivalent, the closed unit ball is compact, every normed vector space is complete, and every linear map is bounded. None of these statements survive in infinite dimensions, and each fails for a different reason. Understanding *why* they hold in finite dimensions — and precisely where the proofs break down — is essential preparation for the deeper theory that follows.
The single key mechanism behind all four properties is **compactness of the closed unit ball**, which is itself a consequence of the Bolzano-Weierstrass theorem applied componentwise. This one fact cascades through the entire theory: it forces norms to be comparable via continuous functions on compact sets, it makes all [Cauchy sequences](/page/Cauchy%20Sequence) convergent by extracting subsequences, and it bounds linear maps by evaluating continuous functions on compact domains.
## Equivalence of Norms
In infinite-dimensional spaces, the choice of norm fundamentally affects the topology: a sequence can converge in one norm and diverge in another. In finite dimensions, this cannot happen. The deep reason is that all norms see the same compact sets, and compactness pins down the topology uniquely.
[definition:Lipschitz Equivalent Norms]
Let $V$ be a vector space. Two norms $\|\cdot\|$ and $\|\cdot\|'$ on $V$ are **Lipschitz equivalent** (or simply equivalent) if there exist constants $A, B > 0$ such that
\begin{align*}
A\|v\|' \le \|v\| \le B\|v\|' \quad \text{for all } v \in V.
\end{align*}
[/definition]
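Lipschitz equivalence is an [equivalence relation](/page/Equivalence%20Relation). It preserves all topological properties (convergence, continuity, openness, closedness, compactness), so two equivalent norms define the same topology. It also preserves Cauchy sequences, and hence completeness, which is a metric rather than a purely topological property.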
The fundamental result is that in finite dimensions, there is only one equivalence class.
[quotetheorem:877]
The proof has a beautiful structure. The upper bound $\|v\| \le C\|v\|_{\ell^1_n}$ is elementary (just the triangle inequality applied to a basis expansion). The lower bound is where the hard work lies: we need the norm $\|\cdot\|$, viewed as a continuous function on the $\ell^1_n$-unit sphere, to achieve a strictly positive infimum. This requires the unit sphere to be compact — and compactness follows from the Bolzano-Weierstrass theorem applied componentwise. The argument fails in infinite dimensions precisely because the unit sphere is no longer compact: one can find sequences on the sphere with no convergent subsequence, allowing the infimum of a continuous function to be zero without being attained.
[citeproof:877]
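On $\mathbb{R}^n$ the equivalence constants between the familiar norms are explicit: $\|v\|_{\ell^\infty} \le \|v\|_{\ell^2} \le \|v\|_{\ell^1} \le \sqrt{n}\,\|v\|_{\ell^2} \le n\,\|v\|_{\ell^\infty}$. The sketch below spot-checks this chain on random vectors, with an arbitrarily chosen sample size and dimension. The all-ones vector attains equality in the last two inequalities, and $e_1$ attains equality in the first two. The constants grow with $n$, which hints at why no uniform comparison survives in infinite dimensions.

```python
import numpy as np

n = 10
rng = np.random.default_rng(5)
V = rng.normal(size=(100_000, n))            # each row is a test vector in R^n

l1 = np.abs(V).sum(axis=1)
l2 = np.sqrt((V ** 2).sum(axis=1))
linf = np.abs(V).max(axis=1)

tol = 1e-12                                   # allowance for rounding
assert np.all(linf <= l2 + tol)
assert np.all(l2 <= l1 + tol)
assert np.all(l1 <= np.sqrt(n) * l2 + tol)
assert np.all(np.sqrt(n) * l2 <= n * linf + tol)

ones = np.ones(n)                             # equality case for the last two constants
print(np.abs(ones).sum(), np.sqrt(n) * np.linalg.norm(ones), n * np.abs(ones).max())
```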
[example:Failure in Infinite Dimensions]
Consider $\ell^2(\mathbb{R})$ and define a second norm by $\|x\|' := \sup_{i} |x_i / i|$. Then $\|x\|' \le \|x\|_{\ell^2}$ (since $|x_i/i| \le |x_i| \le \|x\|_{\ell^2}$), but there is no constant $A > 0$ with $A\|x\|_{\ell^2} \le \|x\|'$. Indeed, the standard basis vectors $e_n$ satisfy $\|e_n\|_{\ell^2} = 1$ but $\|e_n\|' = 1/n \to 0$, so the ratio $\|e_n\|'/\|e_n\|_{\ell^2} = 1/n$ is not bounded away from zero. These norms are genuinely inequivalent: $e_n \to 0$ in $\|\cdot\|'$ but not in $\|\cdot\|_{\ell^2}$.
[/example]
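The ratio $\|e_n\|'/\|e_n\|_{\ell^2}$ can be evaluated on truncations. This sketch assumes the vectors live in $\mathbb{R}^K$ for an arbitrarily chosen $K$, which is harmless because $e_n$ has only one non-zero entry. The helper name is hypothetical.

```python
import numpy as np

def weighted_sup(x):
    """Hypothetical helper: the norm ||x||' = sup_i |x_i / i| (indices start at 1)."""
    i = np.arange(1, len(x) + 1)
    return np.max(np.abs(x) / i)

K = 1000  # truncation length; e_n with n <= K lives in R^K
for n in [1, 10, 100, 1000]:
    e_n = np.zeros(K)
    e_n[n - 1] = 1.0
    print(n, weighted_sup(e_n) / np.linalg.norm(e_n))  # equals 1/n
```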
## Compactness and Completeness
The equivalence of norms has two immediate and powerful consequences: the closed unit ball is always compact (in any norm), and the space is always complete. Both follow because these properties hold for the $\ell^1_n$ norm (by Bolzano-Weierstrass and the completeness of $\mathbb{R}^n$), and Lipschitz equivalence preserves them.
[quotetheorem:878]
This result has enormous consequences. In the forward direction, it tells us that bounded closed sets in finite-dimensional spaces are compact, which is the familiar Heine-Borel property. In the reverse direction, it tells us that the norm topology in infinite-dimensional spaces is fundamentally different from the finite-dimensional case. There the closed unit ball is never compact, so bounded [closed sets](/page/Closed%20Set) need not be compact; in particular, no closed ball of positive radius is compact. This is the source of most of the difficulties in infinite-dimensional analysis.
The reverse direction is worth contemplating: the proof constructs a covering of $\overline{B}_1(0)$ by balls of radius $1/2$, extracts a finite subcover (by the assumed compactness), and then iteratively refines the covering to show that $V$ is contained in the span of finitely many centres. The iteration is the delicate part: we repeatedly use the fact that $B_{1/2^k}(0) \subset Y + B_{1/2^{k+1}}(0)$ (from rescaling the covering), and that the intersection $\bigcap_k (Y + B_{1/2^k}(0))$ collapses to $\overline{Y} = Y$ (since finite-dimensional subspaces are closed). This forces $\overline{B}_1(0) \subset Y$, and since every vector can be rescaled into $\overline{B}_1(0)$, we get $V = Y$.
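Written out, the iteration is a chain of inclusions. Let $Y$ be the span of the finitely many centres. The finite subcover gives $\overline{B}_1(0) \subset Y + B_{1/2}(0)$, and rescaling gives $B_{1/2^k}(0) \subset Y + B_{1/2^{k+1}}(0)$. Since $Y$ is a subspace, $Y + Y = Y$, so:
\begin{align*}
\overline{B}_1(0) \subset Y + B_{1/2}(0) \subset Y + \big(Y + B_{1/4}(0)\big) = Y + B_{1/4}(0) \subset \cdots \subset Y + B_{1/2^k}(0) \quad \text{for every } k \ge 1.
\end{align*}
Hence $\overline{B}_1(0) \subset \bigcap_{k} (Y + B_{1/2^k}(0)) = \{v \in V : \operatorname{dist}(v, Y) = 0\} = \overline{Y} = Y$.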
[citeproof:878]
[example:The Unit Ball in $\ell^1$ Is Not Compact]
In $\ell^1(\mathbb{R})$, define $e_i = (0, \ldots, 0, 1, 0, \ldots)$ with $1$ in position $i$. Then $\|e_i\|_{\ell^1} = 1$ for all $i$, so $(e_i)_{i=1}^\infty \subset \overline{B}_1(0)$. But for $i \ne j$, $\|e_i - e_j\|_{\ell^1} = 2$. No subsequence of $(e_i)_i$ can be Cauchy (let alone convergent), since all pairwise distances equal $2$. In particular, $\overline{B}_1(0)$ in $\ell^1$ is not sequentially compact, and therefore not compact. This is consistent with the theorem since $\ell^1$ is infinite-dimensional.
More geometrically: in $\mathbb{R}^n$ the unit ball has "finitely many directions," and any sequence in it must return infinitely often to some small neighbourhood. In $\ell^1$ the standard basis vectors all lie on the unit sphere, yet any two of them are at distance $2$: the ball has "infinitely many independent directions."
[/example]
As an immediate consequence of compactness, finite-dimensional normed vector spaces are always Banach spaces: every Cauchy sequence is bounded, hence contained in a compact ball, hence has a convergent subsequence, and a Cauchy sequence with a convergent subsequence converges.
A further consequence is that every linear map out of a finite-dimensional space is bounded. The proof is elegant: given $T: V \to W$ with $V$ finite-dimensional, define a new norm on $V$ by $\|v\|' := \|v\|_V + \|T(v)\|_W$. By the equivalence of norms, $\|v\|' \le C\|v\|_V$ for some $C$, which gives $\|T(v)\|_W \le C\|v\|_V$. This argument does not use any property of $T$ other than linearity — it is the finite-dimensionality of the domain that does all the work.
[example:All Linear Maps Are Bounded in Finite Dimensions]
Let $V = \mathbb{R}^2$ with the Euclidean norm and let $T: \mathbb{R}^2 \to \mathbb{R}$ be the linear map $T(x_1, x_2) := 3x_1 - 7x_2$. We verify boundedness directly and via the equivalence-of-norms argument.
**Direct computation.** By the Cauchy-Schwarz inequality:
\begin{align*}
|T(x)| = |3x_1 - 7x_2| \le \sqrt{3^2 + 7^2} \cdot \sqrt{x_1^2 + x_2^2} = \sqrt{58} \cdot \|x\|.
\end{align*}
Moreover, taking $x = (3, -7)/\sqrt{58}$ gives $T(x) = (9 + 49)/\sqrt{58} = \sqrt{58}$. So $\|T\| = \sqrt{58}$.
**Via the new-norm trick.** Define $\|x\|' := \|x\| + |T(x)|$. This is a norm on $\mathbb{R}^2$ (positive-definiteness comes from $\|\cdot\|$, while homogeneity and the triangle inequality follow from those of $\|\cdot\|$ and $|\cdot|$ together with the linearity of $T$). By the [Equivalence of Norms](/theorems/877), there exists $C > 0$ with $\|x\|' \le C\|x\|$ for all $x$. Since $|T(x)| \le \|x\|' \le C\|x\|$, $T$ is bounded with $\|T\| \le C$. The same argument works for *any* linear map from $\mathbb{R}^2$ into a normed space $W$, without any explicit estimate on $T$.
The key point is that the equivalence-of-norms argument is *non-constructive*: it uses compactness to guarantee the existence of $C$ but does not compute it. The Cauchy-Schwarz computation gives the sharp constant $\sqrt{58}$, while the new-norm trick gives only existence. The power of the trick is its generality — it applies uniformly to all linear maps from any finite-dimensional space.
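As a numerical sanity check (not part of either argument), one can sample the Euclidean unit circle and compare $\max |T(x)|$ with the sharp constant $\sqrt{58}$. The Python sketch below uses a helper name of our own, `estimate_norm`; note that sampling only ever produces a lower estimate of $\|T\|$, which is why an upper bound must come from Cauchy-Schwarz or from the equivalence-of-norms argument.

```python
import math

def T(x1: float, x2: float) -> float:
    # The linear functional from the example: T(x1, x2) = 3*x1 - 7*x2.
    return 3.0 * x1 - 7.0 * x2

def estimate_norm(samples: int = 20000) -> float:
    """Lower estimate of ||T||: the largest |T(x)| over equally spaced
    points x = (cos(theta), sin(theta)) of the Euclidean unit circle."""
    best = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        best = max(best, abs(T(math.cos(theta), math.sin(theta))))
    return best
```

With the default number of samples, `estimate_norm()` agrees with `math.sqrt(58)` to well within $10^{-6}$.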
[/example]
[example:Failure in Infinite Dimensions — An Unbounded Linear Functional]
The equivalence-of-norms argument fails in infinite dimensions because norms need not be equivalent. Here is a concrete unbounded linear functional.
Consider the space $c_{00}(\mathbb{R}) := \{x = (x_1, x_2, \ldots) : x_i = 0 \text{ for all but finitely many } i\}$ with the $\ell^\infty$ norm $\|x\|_{\ell^\infty} = \sup_i |x_i|$. Define $T: c_{00} \to \mathbb{R}$ by:
\begin{align*}
T(x) := \sum_{i=1}^\infty i \cdot x_i.
\end{align*}
This is a well-defined linear map (the sum is finite since $x \in c_{00}$). But $T$ is unbounded: the standard basis vector $e_n = (0, \ldots, 0, 1, 0, \ldots)$ (with $1$ in position $n$) satisfies $\|e_n\|_{\ell^\infty} = 1$ and $T(e_n) = n$. Therefore:
\begin{align*}
\|T\| \ge \sup_n \frac{|T(e_n)|}{\|e_n\|_{\ell^\infty}} = \sup_n n = \infty.
\end{align*}
The space $c_{00}$ is infinite-dimensional (it contains all the linearly independent vectors $e_n$), and its unit ball is not compact (the sequence $(e_n)_n$ has $\|e_i - e_j\|_{\ell^\infty} = 1$ for $i \ne j$ and no convergent subsequence). This is why the compactness argument of the finite-dimensional proof breaks down.
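The ratios $|T(e_n)|/\|e_n\|_{\ell^\infty} = n$ can be tabulated directly. The Python sketch below stores an element of $c_{00}$ as a dictionary of its nonzero entries (a representation of our own choosing). It also evaluates $T$ on $(1, \ldots, 1, 0, \ldots)$ with $N$ ones: restricted to $\operatorname{span}\{e_1, \ldots, e_N\}$, the map $T$ has norm $\sum_{i \le N} i = N(N+1)/2$. This is finite for each $N$, as the finite-dimensional theory predicts, but unbounded as $N \to \infty$.

```python
from typing import Dict

# A finitely supported sequence in c_00, stored as {index: value} with indices >= 1.
Seq = Dict[int, float]

def T(x: Seq) -> float:
    """T(x) = sum_i i * x_i; the sum is finite because x has finite support."""
    return sum(i * xi for i, xi in x.items())

def sup_norm(x: Seq) -> float:
    # The l^infinity norm of a finitely supported sequence.
    return max((abs(v) for v in x.values()), default=0.0)

def e(n: int) -> Seq:
    # Standard basis vector with 1 in position n.
    return {n: 1.0}

def ones(N: int) -> Seq:
    # The vector (1, ..., 1, 0, 0, ...) with N ones; it attains ||T|| on span{e_1..e_N}.
    return {i: 1.0 for i in range(1, N + 1)}
```

For example, `T(e(5)) / sup_norm(e(5))` is `5.0`, while `T(ones(10))` is `55.0` with `sup_norm(ones(10))` equal to `1.0`.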
[/example]
# The Hahn-Banach Theorem
The previous chapters established the basic vocabulary of normed vector spaces and dual spaces. We now face a fundamental question: **are there enough linear functionals?** In finite dimensions the answer is immediate: the dual of $\mathbb{R}^n$ can be identified with $\mathbb{R}^n$ itself, and we can construct any functional we want by specifying its values on a basis. In infinite dimensions, the existence of *bounded* linear functionals with prescribed properties is far from clear. The Hahn-Banach theorem provides the affirmative answer and is the most important existence result in functional analysis.
The theorem has many forms, but at its core it says one thing: **bounded linear functionals defined on a subspace can always be extended to the entire space without increasing their norm.** This single statement has far-reaching consequences: it guarantees that dual spaces are "large enough" to separate points, it shows that the canonical embedding into the bidual is an isometry, and it implies that the adjoint of a bounded linear map preserves the operator norm exactly.
## The Extension Theorem
The starting point is a purely algebraic question: if $f: W \to \mathbb{R}$ is a linear functional on a subspace $W \subset V$ satisfying $f(w) \le p(w)$ for some sublinear function $p$, can we extend $f$ to all of $V$ while maintaining this domination? The answer is yes, and the proof proceeds in two stages: first handle the codimension-one case (which is a concrete calculation), then iterate using Zorn's lemma.
The codimension-one step is the heart of the matter. If $W$ has codimension $1$ in $V$, then $V = W \oplus \operatorname{span}\{v_0\}$ for some $v_0 \notin W$, and extending $f$ amounts to choosing a single number $\tilde{f}(v_0)$. The constraint $\tilde{f}(v) \le p(v)$ for all $v$ translates into an interval of permissible values for $\tilde{f}(v_0)$, and the key computation shows this interval is non-empty — the supremum of the lower bounds is at most the infimum of the upper bounds, thanks to the subadditivity of $p$ and the linearity of $f$.
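In more detail, here is a sketch of the standard computation, with $c$ denoting the value $\tilde{f}(v_0)$ to be chosen. Write a general vector as $u + t v_0$ with $u \in W$ and $t \in \mathbb{R}$. For $t > 0$, dividing $f(u) + tc \le p(u + t v_0)$ by $t$ and using positive homogeneity of $p$ gives $c \le p(u' + v_0) - f(u')$ with $u' = u/t$. For $t = -s < 0$, dividing by $s$ gives $c \ge f(u) - p(u - v_0)$ after renaming $u/s$ as $u$. The permissible values of $c$ therefore form the interval
\begin{align*}
\sup_{u \in W}\big(f(u) - p(u - v_0)\big) \le c \le \inf_{u' \in W}\big(p(u' + v_0) - f(u')\big).
\end{align*}
It is non-empty because, for all $u, u' \in W$,
\begin{align*}
f(u) + f(u') = f(u + u') \le p(u + u') = p\big((u - v_0) + (u' + v_0)\big) \le p(u - v_0) + p(u' + v_0),
\end{align*}
which rearranges to $f(u) - p(u - v_0) \le p(u' + v_0) - f(u')$.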
[quotetheorem:879]
Several aspects of this result deserve comment. First, the function $p$ is *not* assumed to be linear or even a norm: it is only positively homogeneous and subadditive. This generality is essential for applications: in the normed-space corollary below, we will take $p(v) = \|f\|_{W^*} \cdot \|v\|_V$, which is sublinear but not linear. Second, some appeal to the axiom of choice (here via Zorn's lemma) is unavoidable in general: the theorem cannot be proved in ZF alone, although it is strictly weaker than the full axiom of choice. For [separable](/page/Separable) normed spaces with $p$ a multiple of the norm, however, one can avoid it by extending step by step along a countable dense subset and then passing to the closure by continuity. Third, the theorem gives existence but not uniqueness: the extension is in general not unique, although it is unique when $W$ is dense and $p$ is a multiple of the norm (two continuous extensions that agree on a dense set coincide).
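Non-uniqueness is already visible in two dimensions. Take $V = \mathbb{R}^2$ with the $\ell^1$ norm, $W = \operatorname{span}\{(1,0)\}$ and $f(t, 0) = t$, so $\|f\|_{W^*} = 1$. Every $g_b(x, y) = x + by$ extends $f$, and its norm is $\max\{1, |b|\}$, so each $b \in [-1, 1]$ gives a different norm-preserving extension. The Python sketch below (function names are ours) confirms these norms by evaluating $|g_b|$ on points of the $\ell^1$ unit sphere. The four vertices $\pm e_1, \pm e_2$ are included exactly, and the supremum of $|g_b|$ over the ball is attained at one of them.

```python
from typing import List, Tuple

def l1_sphere(points_per_edge: int = 100) -> List[Tuple[float, float]]:
    """Points on the boundary of the l^1 unit ball in R^2 (the square with
    vertices (1,0), (0,1), (-1,0), (0,-1)); the vertices are included exactly."""
    verts = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    pts = []
    for i in range(4):
        (x0, y0), (x1, y1) = verts[i], verts[(i + 1) % 4]
        for k in range(points_per_edge):
            t = k / points_per_edge
            pts.append(((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1))
    return pts

def g(b: float, x: float, y: float) -> float:
    # Candidate extension of f(t, 0) = t from W = span{(1, 0)} to all of R^2.
    return x + b * y

def norm_of_g(b: float) -> float:
    # Operator norm of g_b on (R^2, l^1), computed over the sampled sphere.
    return max(abs(g(b, x, y)) for x, y in l1_sphere())
```

For instance, `norm_of_g(b)` is `1.0` for every `b` in $[-1, 1]$ and `2.0` for `b = 2`.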
[citeproof:879]
## Consequences for Normed Spaces
The power of the Hahn-Banach theorem becomes apparent when we specialise the sublinear function $p$ to the norm. This yields a cascade of increasingly strong structural results about dual spaces.
### Norm-Preserving Extension
The first and most direct consequence is that bounded linear functionals on subspaces can be extended to the whole space without any loss of norm.
[quotetheorem:880]
This is the normed-space form of the Hahn-Banach theorem, and it is the version most frequently used in practice. It is analogous to the Tietze extension theorem in topology, which extends bounded continuous functions from closed subsets of normal spaces without increasing the supremum norm. The difference is that here the extension must also be linear, and the norm being preserved is the dual norm. The choice $p(v) = \|f\|_{W^*} \cdot \|v\|_V$ turns the one-sided Hahn-Banach domination $\tilde{f}(v) \le p(v)$ into the absolute-value bound $|\tilde{f}(v)| \le \|f\|_{W^*} \cdot \|v\|_V$ by applying the inequality to both $v$ and $-v$.
[citeproof:880]
### Support Functionals
The norm-preserving extension immediately yields the existence of "maximising" functionals for any nonzero vector — functionals that achieve the operator-norm supremum.
[quotetheorem:881]
Support functionals are the workhorses of the Hahn-Banach theory. The idea is breathtakingly simple: define the functional on the one-dimensional subspace $\operatorname{span}\{v\}$ by $f(\lambda v) = \lambda \|v\|_V$, which has norm $1$, and then extend to all of $V$ by the norm-preserving extension. The resulting functional "looks at" the component of a vector in the direction of $v$ and measures its size optimally.
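In the finite-dimensional space $\ell^p_n$ with $1 < p < \infty$, a support functional can be written down explicitly, without Hahn-Banach. By the equality case of Hölder's inequality, $f(x) = \sum_i a_i x_i$ with $a_i = \operatorname{sign}(v_i)|v_i|^{p-1}/\|v\|_p^{p-1}$ satisfies $f(v) = \|v\|_p$ and $\|a\|_q = 1$, where $1/p + 1/q = 1$. Since $(\ell^p_n)^*$ is identified with $\ell^q_n$, the quantity $\|a\|_q$ is the dual norm of $f$. The Python sketch below (names are ours) checks these identities numerically for one vector.

```python
import math
from typing import List

def lp_norm(v: List[float], p: float) -> float:
    # The l^p norm of a finite vector.
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def support_functional(v: List[float], p: float) -> List[float]:
    """Coefficients a of f(x) = sum_i a_i x_i with f(v) = ||v||_p and
    ||a||_q = 1, where 1/p + 1/q = 1 (assumes 1 < p < infinity and v != 0)."""
    nv = lp_norm(v, p)
    return [math.copysign(abs(t) ** (p - 1), t) / nv ** (p - 1) for t in v]

def apply(a: List[float], x: List[float]) -> float:
    # Evaluate the functional with coefficient vector a at x.
    return sum(ai * xi for ai, xi in zip(a, x))
```

For `v = [3.0, -4.0, 1.0]` and `p = 3.0`, `apply(a, v)` matches `lp_norm(v, 3.0)` and `lp_norm(a, 1.5)` is `1.0` up to rounding, where `a = support_functional(v, 3.0)`.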
[citeproof:881]
### Separation of Points
The existence of support functionals has an immediate and important corollary: the dual space is rich enough to distinguish any two distinct points.
[quotetheorem:882]
This result is the functional-analytic analogue of the statement that there are "enough observables" to determine a state. It says that no information about a vector is lost when we pass from the vector itself to the collection of all its evaluations by bounded linear functionals. In physical terms: if every measurement gives the same result on two states, then the states are identical. The proof is a one-line application of support functionals: given $v \ne w$, the vector $v - w$ is nonzero and therefore has a support functional $f$ with $f(v - w) = \|v - w\|_V \ne 0$.
[citeproof:882]
## Completing the Duality Theory
With the Hahn-Banach theorem in hand, we can now close the gaps left open in Chapter 1.
### The Canonical Embedding is an Isometry
In Chapter 1, we proved that the canonical embedding $\phi: V \to V^{**}$ satisfies $\|\phi(v)\|_{V^{**}} \le \|v\|_V$. The missing reverse inequality required a support functional, which we now have.
[quotetheorem:875]
The isometry means that $V$ embeds into $V^{**}$ as a *metrically faithful* copy — distances are preserved exactly, not just up to a constant. In particular, $\phi$ is injective, so we may identify $V$ with its image $\phi(V) \subset V^{**}$. For reflexive spaces ($\phi(V) = V^{**}$), the bidual is nothing more than $V$ itself, dressed up in different notation.
[citeproof:875]
### The Adjoint Preserves the Operator Norm
In Chapter 1, we proved the inequality $\|T^*\|_{\mathcal{L}(Y^*, X^*)} \le \|T\|_{\mathcal{L}(X,Y)}$. The Hahn-Banach theorem upgrades this to equality.
[quotetheorem:883]
The proof reveals a general principle: the operator norm of $T$ is "witnessed" by vectors $v$ for which $\|T(v)\|_Y$ is close to $\|T\| \cdot \|v\|_X$, and the support functional of $T(v)$ produces a functional $g \in Y^*$ for which $\|T^*(g)\|_{X^*}$ is correspondingly large. The Hahn-Banach theorem ensures that such witnessing functionals always exist.
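Here is a finite-dimensional illustration. A real $m \times n$ matrix $A$ defines $T: \ell^1_n \to \ell^1_m$, and under the identification $(\ell^1_k)^* \cong \ell^\infty_k$ the adjoint becomes the transpose $A^{\mathsf{T}}: \ell^\infty_m \to \ell^\infty_n$. The Python sketch below (helper names are ours) computes both operator norms by brute force over the extreme points of the relevant unit balls. It takes $\|T\|$ as the largest $\|Ae_j\|_1$ and $\|T^*\|$ as the largest $\|A^{\mathsf{T}}s\|_\infty$ over sign vectors $s \in \{\pm 1\}^m$. Both reduce to the maximal absolute column sum of $A$, in line with $\|T^*\| = \|T\|$.

```python
from itertools import product
from typing import List

Matrix = List[List[float]]

def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]

def matvec(A: Matrix, x: List[float]) -> List[float]:
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm_l1_to_l1(A: Matrix) -> float:
    """Norm of A on l^1: the sup over the l^1 unit ball is attained at some +-e_j."""
    n = len(A[0])
    basis = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    return max(sum(abs(t) for t in matvec(A, e)) for e in basis)

def norm_linf_to_linf(B: Matrix) -> float:
    """Norm of B on l^inf: the sup over the l^inf unit ball is attained at a sign vector."""
    n = len(B[0])
    return max(max(abs(t) for t in matvec(B, list(s)))
               for s in product((-1.0, 1.0), repeat=n))
```

For `A = [[1.0, -2.0, 0.5], [3.0, 0.0, -1.0]]`, both `norm_l1_to_l1(A)` and `norm_linf_to_linf(transpose(A))` return `4.0`.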
[citeproof:883]
### Summary of the Duality Framework
Let us collect the results established across Chapters 1 and 3 into a coherent picture. For any normed vector space $V$:
The dual space $V^* = \mathcal{L}(V, \mathbb{F})$ is always a Banach space, even when $V$ is not. The dual separates points: $f(v) = 0$ for all $f \in V^*$ implies $v = 0$. The canonical embedding $\phi: V \to V^{**}$ is an isometry, so $V$ sits isometrically inside the Banach space $V^{**}$. The image $\phi(V)$ is a closed subspace exactly when $V$ is complete (an isometric copy of a complete space is complete, hence closed), and in general its closure is a completion of $V$. For a bounded linear map $T: X \to Y$, the adjoint $T^*: Y^* \to X^*$ satisfies $\|T^*\| = \|T\|$, and the double adjoint $T^{**}: X^{**} \to Y^{**}$ extends $T$ through the canonical embeddings. Reflexive spaces are characterised by $\phi(V) = V^{**}$, and reflexivity implies completeness.
This framework — support functionals, norm-preserving extensions, point separation, isometric embedding into the bidual — is the foundation upon which the deeper results of the course (Banach-Steinhaus, open mapping theorem, spectral theory) will be built.
# The Baire Category Theorem
The Hahn-Banach theorem told us that dual spaces are rich — there are enough linear functionals to separate points. We now turn to a completely different source of power: **completeness**. The Baire Category Theorem says that complete [metric spaces](/page/Metric%20Space) are "large" in a precise topological sense, and this single observation has three major consequences for the theory of bounded linear maps between Banach spaces: the Uniform Boundedness Principle (pointwise boundedness implies uniform boundedness), the Open Mapping Theorem (surjective bounded maps are open), and the Closed Graph Theorem (closed linear maps are bounded). These three results, together with the Hahn-Banach theorem, form the four pillars of linear functional analysis.
The common mechanism is that completeness prevents a space from being "thin" — from being exhausted by small, negligible pieces. The Baire Category Theorem makes this precise.
## Meagreness and the Baire Category Theorem
The key definitions come from topology.
[definition:Meagre Set]
Let $X$ be a topological space and $Z \subset X$. We say $Z$ is **of first category** (or **meagre**) if it can be written as a countable union of nowhere dense sets:
\begin{align*}
Z = \bigcup_{n=1}^\infty E_n
\end{align*}
where each $\overline{E_n}$ has empty interior. We say $Z$ is **of second category** (or **non-meagre**) if it is not meagre.
[/definition]
A nowhere dense set is one whose closure has no "bulk": it contains no nonempty open set (in a metric space, no open ball). A meagre set is a countable union of such sets, and can be thought of as "topologically negligible." The Baire Category Theorem asserts that a nonempty complete metric space is never meagre in itself: it is too large to be exhausted by countably many negligible pieces.
[quotetheorem:630]
The proof constructs a point in the intersection of countably many dense [open sets](/page/Open%20Set) by building a nested sequence of closed balls, the $n$-th lying inside the $n$-th dense open set, with radii shrinking to zero. The centres then form a Cauchy sequence, completeness supplies a limit, and the nesting ensures that this limit lies in every ball and hence in every set of the intersection. The argument is a beautiful instance of the "dig deeper" principle: at each stage we dig into a smaller ball that lies inside the next dense open set, and completeness ensures we reach a definite point at the bottom.
The equivalent reformulation is often more useful in practice: if $X = \bigcup_{n=1}^\infty F_n$ where each $F_n$ is closed, then at least one $F_n$ has nonempty interior. This is the form we will use in the proof of the Uniform Boundedness Principle.
[citeproof:630]
## Applications of the Baire Category Theorem
Before proceeding to the main theorems of functional analysis, we note two striking existence results that demonstrate the power of the Baire Category Theorem.
[example:Existence of Irrationals]
The rational numbers $\mathbb{Q}$ are countable, so $\mathbb{Q} = \bigcup_{q \in \mathbb{Q}} \{q\}$ is a countable union of singletons. Each singleton $\{q\}$ is closed and has empty interior in $\mathbb{R}$ (no open interval is contained in a single point). Therefore $\mathbb{Q}$ is meagre in $\mathbb{R}$. Since $\mathbb{R}$ is a complete metric space, the Baire Category Theorem tells us $\mathbb{R}$ is non-meagre. Therefore $\mathbb{R} \ne \mathbb{Q}$, and irrational numbers exist. This argument extends: the irrationals $\mathbb{R} \setminus \mathbb{Q}$, being a countable intersection of dense open sets $\mathbb{R} \setminus \{q\}$, are not just nonempty but dense — a far stronger statement than mere existence.
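The nested-ball argument is concrete enough to run. The Python sketch below is our own illustration. It works in the complete space $[0,1]$ with exact rational arithmetic and applies the nested-interval version of the proof to the dense open sets $[0,1] \setminus \{q_k\}$. At step $k$ it keeps the left closed third of the current interval if that third misses $q_k$, and otherwise the right closed third (which is disjoint from the left one). The lengths shrink like $3^{-k}$, so for an enumeration of all rationals in $[0,1]$ the intervals close down on a single point that differs from every $q_k$, that is, an irrational number. A program can of course run only finitely many steps.

```python
from fractions import Fraction
from typing import List, Tuple

def enumerate_rationals(count: int) -> List[Fraction]:
    """The first `count` distinct rationals in [0, 1], listed by increasing denominator."""
    seen = set()
    out: List[Fraction] = []
    d = 1
    while len(out) < count:
        for num in range(d + 1):
            q = Fraction(num, d)
            if q not in seen:
                seen.add(q)
                out.append(q)
                if len(out) == count:
                    break
        d += 1
    return out

def nested_intervals(qs: List[Fraction]) -> List[Tuple[Fraction, Fraction]]:
    """Closed intervals I_1 containing I_2 containing ..., with q_k not in I_k
    and length(I_k) = 3^(-k)."""
    a, b = Fraction(0), Fraction(1)
    intervals = []
    for q in qs:
        third = (b - a) / 3
        if not (a <= q <= a + third):
            b = a + third          # the left closed third misses q
        else:
            a = b - third          # q is in the left third, so the right third misses it
        intervals.append((a, b))
    return intervals
```

After 50 steps the final interval contains none of the first 50 rationals and has length exactly $3^{-50}$.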
[/example]
[example:Existence of Nowhere Differentiable Continuous Functions]
The Baire Category Theorem proves that "most" continuous functions are nowhere [differentiable](/page/Derivative). Define for each $m, n \in \mathbb{N}$:
\begin{align*}
E_{m,n} := \left\{f \in C([0,1]) \mid \exists x \in [0,1] \text{ s.t. } \forall y \in [0,1] \setminus \{x\} \text{ with } |y - x| < \tfrac{1}{m}, \; \left|\frac{f(y) - f(x)}{y - x}\right| \le n\right\}.
\end{align*}
Each $E_{m,n}$ is closed: if $f_k \to f$ uniformly with $f_k \in E_{m,n}$, and $x_k$ is a point at which the difference quotients of $f_k$ are bounded by $n$ on a $1/m$-neighbourhood, then by compactness of $[0,1]$ a subsequence of the $x_k$ converges to some $x$, and uniform convergence forces $f$ to inherit the bound at $x$. Each $E_{m,n}$ has empty interior. Given any $f \in E_{m,n}$ and $\varepsilon > 0$, the [Weierstrass approximation theorem](/theorems/480) provides a polynomial $p$ with $\|f - p\|_\infty < \varepsilon/2$. Adding to $p$ a piecewise-linear zigzag with supremum norm at most $\varepsilon/2$ and slopes $\pm M$, where $M > n + \|p'\|_\infty$, produces a function $g$ with $\|f - g\|_\infty < \varepsilon$ and $g \notin E_{m,n}$. Indeed, every $x$ has points $y$ arbitrarily close to it, on at least one side, lying on the same linear piece of the zigzag; for these $y$ the difference quotient of $g$ has absolute value at least $M - \|p'\|_\infty > n$, by the mean value theorem applied to $p$. Every function differentiable at some point $x$ belongs to $E_{m,n}$ for some $m, n$ (take $n = \lceil |f'(x)| \rceil + 1$ and $m$ large enough). Hence $\bigcup_{m,n} E_{m,n}$ is meagre, and since $C([0,1])$ is a Banach space, the Baire Category Theorem implies that its complement, the set of continuous functions differentiable at no point, is a dense $G_\delta$ set; in particular it is nonempty.
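The zigzag estimate can be checked numerically. The Python sketch below makes choices of our own: $f = p$ with $p(x) = x^2$ (so $\|p'\|_\infty = 2$ on $[0,1]$), and the zigzag $z(x) = M \operatorname{dist}(x, h\mathbb{Z})$ with $M = n + \|p'\|_\infty + 1$ and $h = 1/K$. Here $K$ is chosen so that $\|z\|_\infty = Mh/2 \le \varepsilon/2$ and $h \le 1/m$. The linear pieces of $z$ are the intervals between consecutive multiples of $h/2$, and they tile $[0,1]$ exactly. Each has length $h/2$, so for every $x$ one of $x \pm h/8$ stays on the piece containing $x$. That point is within $1/m$ of $x$ and gives a difference quotient of $g = p + z$ of absolute value larger than $n$.

```python
import math
from typing import Callable, Tuple

Fn = Callable[[float], float]

def zigzag(M: float, h: float) -> Fn:
    """z(x) = M * dist(x, h*Z): piecewise linear, slopes +-M, values in [0, M*h/2]."""
    return lambda x: M * abs(x - h * round(x / h))

def perturb(p: Fn, p_prime_bound: float, n: int, m: int, eps: float) -> Tuple[Fn, float]:
    """Return g = p + z and the period h, where z has slopes +-M with
    M = n + ||p'|| + 1, and h = 1/K is chosen so that ||z|| <= eps/2 and h <= 1/m."""
    M = n + p_prime_bound + 1.0
    K = max(math.ceil(M / eps), m)
    h = 1.0 / K
    z = zigzag(M, h)
    return (lambda x: p(x) + z(x)), h

def has_steep_quotient(g: Fn, x: float, n: int, eta: float) -> bool:
    """True if some y = x - eta or x + eta in [0, 1] has |(g(y) - g(x)) / (y - x)| > n."""
    for y in (x - eta, x + eta):
        if 0.0 <= y <= 1.0 and abs((g(y) - g(x)) / (y - x)) > n:
            return True
    return False
```

With `n, m, eps = 5, 3, 0.1`, the perturbed function stays within `eps / 2` of `p` on a fine grid, and every grid point passes `has_steep_quotient` with `eta = h / 8`.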
[/example]
## The Uniform Boundedness Principle
We now come to the first of the three great consequences of the Baire Category Theorem. The Uniform Boundedness Principle (also known as the Banach-Steinhaus theorem) says that a family of bounded linear maps that is bounded at every point must be bounded uniformly in operator norm. The passage from pointwise to uniform is precisely the kind of upgrade that fails for general functions (a sequence of continuous functions can be pointwise bounded everywhere yet have suprema tending to infinity), and the theorem identifies the structural reason it works for linear maps: the level sets $\{x : \sup_\alpha \|T_\alpha(x)\| \le n\}$ are closed (by continuity) and exhaust the space (by the pointwise bound), so the Baire Category Theorem forces one of them to contain a ball, and linearity promotes the ball to a global bound.
[quotetheorem:549]
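The theorem requires the domain $X$ to be a Banach space; completeness is essential. Without it, the Baire Category Theorem does not apply and the conclusion can fail. For example, on $c_{00}$ with the $\ell^\infty$ norm, the functionals $T_n(x) := n x_n$ satisfy $\sup_n |T_n(x)| < \infty$ for each fixed $x$ (only finitely many $x_n$ are nonzero), yet $\|T_n\| = n$. The target $Y$ need only be a normed space; its completeness is not used.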
The key step in the proof is the passage from "some $F_N$ contains a ball $B_\varepsilon(v_0)$" to "the family is uniformly bounded." This uses a centering trick. For any $x$ with $\|x\|_X \le 1$, the point $v_0 + \varepsilon x/2$ lies in $B_\varepsilon(v_0) \subset F_N$, and $x = \tfrac{2}{\varepsilon}\big((v_0 + \tfrac{\varepsilon}{2}x) - v_0\big)$. By linearity and the triangle inequality:
\begin{align*}
\|T_\alpha(x)\|_Y = \frac{2}{\varepsilon}\left\|T_\alpha\left(v_0 + \tfrac{\varepsilon}{2}x\right) - T_\alpha(v_0)\right\|_Y \le \frac{2}{\varepsilon}\left(N + \sup_\alpha \|T_\alpha(v_0)\|_Y\right)
\end{align*}
which is a bound independent of $\alpha$ and $x$.
[citeproof:549]
[remark:Failure for General Functions]
The Uniform Boundedness Principle is specific to bounded linear maps. For general continuous functions the conclusion fails: define $f_n: [0,1] \to \mathbb{R}$ by a triangular spike of height $n$ on an interval of width $2/n$ centred at $1/n$ (and zero elsewhere). Then for each fixed $x \in [0,1]$, $\sup_n |f_n(x)| < \infty$ (since eventually $x$ lies outside the support of $f_n$), but $\sup_n \sup_x |f_n(x)| = \sup_n n = \infty$. The linearity of the operators is what makes the centering trick work.
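The counterexample is easy to tabulate. In the Python sketch below (helper names are ours), `spike(n)` rises with slope $n^2$ from $0$ at $x = 0$ to $n$ at $x = 1/n$ and falls back to $0$ at $x = 2/n$. For fixed $x > 0$ only the finitely many $n < 2/x$ contribute to $\sup_n |f_n(x)|$, and $f_n(0) = 0$ for every $n$, while $f_n(1/n) = n$.

```python
from typing import Callable

def spike(n: int) -> Callable[[float], float]:
    """Triangular spike of height n on [0, 2/n], peaking at x = 1/n, zero elsewhere."""
    def f(x: float) -> float:
        return max(0.0, n - n * n * abs(x - 1.0 / n))
    return f

def pointwise_sup(x: float, N: int) -> float:
    """max over 1 <= n <= N of |f_n(x)|; for x > 0 only n < 2/x contribute."""
    return max(abs(spike(n)(x)) for n in range(1, N + 1))
```

For $x = 0.1$ the pointwise supremum is `10.0` whether $n$ runs up to $1000$ or $5000$, whereas `spike(n)(1.0 / n)` equals `n`.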
[/remark]
## The Open Mapping Theorem
The next major application of the Baire Category Theorem is the Open Mapping Theorem. It answers a natural question: if $T: X \to Y$ is a surjective bounded linear map between Banach spaces, does $T$ send open sets to open sets? In finite dimensions this is automatic: a surjective linear map has a linear right inverse $S$, which is bounded, and then $T(B_1(0)) \supset B_{1/\|S\|}(0)$. In infinite dimensions there is no such algebraic shortcut, and it is not clear a priori that the image of an open ball under a bounded surjection contains an open ball.
The Open Mapping Theorem says that it does, and the proof proceeds in three stages. First, the Baire Category Theorem shows that the *closure* of $T(B_1^X(0))$ contains a ball in $Y$ (approximate openness). Second, a completeness argument promotes this to actual containment: $T(B_1^X(0))$ itself contains a ball. Third, a rescaling argument extends this from the unit ball to arbitrary open sets.
[quotetheorem:631]
Surjectivity is necessary: if $T$ is not surjective, its image misses some point $q \in Y$, and by linearity $T(X)$ cannot contain any open ball around $0$ (since scaling $q$ shows that arbitrarily small scalar multiples of $q$ are also missed). The completeness of both $X$ and $Y$ is used in different ways: completeness of $Y$ enters via the Baire Category Theorem (to show approximate openness), while completeness of $X$ is used in the geometric series argument (to sum the sequence of corrections and obtain an actual preimage).
[citeproof:631]
## The Inverse Mapping Theorem
An immediate and extremely useful corollary of the Open Mapping Theorem is a global inverse theorem for bounded linear maps. In finite dimensions, an injective linear map between spaces of the same dimension is an isomorphism, and its inverse is automatically bounded because every linear map out of a finite-dimensional space is. In infinite dimensions, the inverse of a bijective bounded linear map is still linear, but there is no a priori reason for it to be bounded. The Inverse Mapping Theorem says that it is, provided both spaces are complete.
[quotetheorem:884]
The proof is a one-line application of the Open Mapping Theorem: since $T$ is a bounded surjection between Banach spaces, it is an open map. But the preimage of an open set under $T^{-1}$ is the same as the image of that set under $T$, which is open. So $T^{-1}$ is continuous, hence bounded.
This result is remarkable because it gives a *global* inverse — not just a local one as in the finite-dimensional [inverse function theorem](/page/Inverse%20Function%20Theorem) of multivariable calculus. The price we pay is the stronger hypothesis of linearity (and completeness of both spaces).
[citeproof:884]
## The Closed Graph Theorem
The final consequence of the Baire Category Theorem addresses a practical problem: given a linear map $T: X \to Y$ between Banach spaces, how can we verify that $T$ is bounded without explicitly computing an operator norm bound?
The [Equivalence of Continuity and Boundedness](/theorems/873) tells us that boundedness is equivalent to continuity. Continuity means that if $v_n \to v$ then $T(v_n) \to T(v)$ — both convergence and the correct identification of the limit. There is a weaker condition that is often easier to check: the **closed graph property** says that if $v_n \to v$ and $T(v_n) \to w$ for some $w \in Y$, then $w = T(v)$. In other words, we assume both sequences converge, and merely verify that the limit of $T(v_n)$ is the "right" one. This is weaker than continuity because continuity demands that $T(v_n)$ converges (not just that if it converges, it converges correctly). The Closed Graph Theorem says that for linear maps between Banach spaces, this weaker condition suffices.
[definition:Closed Graph]
Let $X$ and $Y$ be normed vector spaces and $T: X \to Y$ a linear map. The **graph** of $T$ is the set
\begin{align*}
\Gamma(T) := \{(x, T(x)) \in X \times Y \mid x \in X\}.
\end{align*}
We say $T$ has a **closed graph** if $\Gamma(T)$ is a closed subset of $X \times Y$ (equipped with the norm $\|(x, y)\|_{X \times Y} := \max\{\|x\|_X, \|y\|_Y\}$).
[/definition]
The closed graph condition says precisely: if $v_n \to v$ in $X$ and $T(v_n) \to w$ in $Y$, then $T(v) = w$. Every bounded (continuous) linear map has a closed graph, but the converse is non-trivial.
[quotetheorem:217]
The proof is elegant and short. The graph $\Gamma(T)$, being a closed subspace of the Banach space $X \times Y$, is itself a Banach space. The projection $\pi_X: \Gamma(T) \to X$ defined by $\pi_X(x, T(x)) = x$ is a bounded bijective linear map between Banach spaces. The [Inverse Mapping Theorem](/theorems/884) guarantees that $\pi_X^{-1}$ is bounded. Therefore the composition $T = \pi_Y \circ \pi_X^{-1}$ (where $\pi_Y(x, T(x)) = T(x)$) is bounded as the composition of bounded maps.
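The Closed Graph Theorem is often the most convenient tool for proving that a linear map is bounded, because verifying a closed graph only requires identifying the limit $w$, and this can often be done in a weaker topology. If $T(v_n) \to w$ in norm, then $T(v_n) \to w$ also weakly or in the sense of [distributions](/page/Distribution), so if one knows $T(v_n) \to T(v)$ in that weaker sense, uniqueness of limits forces $w = T(v)$. It is used extensively in PDE theory, where one constructs operators via limit processes and needs to verify boundedness without explicit norm estimates.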
[citeproof:217]
[example:Differentiation Has a Closed Graph but Is Unbounded]
Consider $T: C^1([0,1]) \to C([0,1])$ defined by $T(f) := f'$, where $C^1([0,1])$ carries the supremum norm $\|\cdot\|_\infty$ (not the $C^1$ norm). The sequence $f_n(x) = x^n$ has $\|f_n\|_\infty = 1$ but $\|T(f_n)\|_\infty = n$, so $T$ is unbounded.
However, $T$ has a closed graph. Suppose $f_n \to f$ uniformly and $f_n' \to g$ uniformly. Then for each $t \in [0,1]$:
\begin{align*}
f(t) = \lim_{n \to \infty} f_n(t) = \lim_{n \to \infty} \left(f_n(0) + \int_0^t f_n'(s) \, d\mathcal{L}^1(s)\right) = f(0) + \int_0^t g(s) \, d\mathcal{L}^1(s).
\end{align*}
Since $g$ is continuous, the Fundamental Theorem of Calculus shows that $f$ is $C^1$ with $f' = g$, so $f$ lies in the domain and $T(f) = g$.
The Closed Graph Theorem does not apply here because the domain $C^1([0,1])$ with the $\|\cdot\|_\infty$ norm is *not* a Banach space: a sequence of $C^1$ functions that is Cauchy in $\|\cdot\|_\infty$ can converge uniformly to a function that is not $C^1$. This example shows precisely why completeness of the domain is necessary.
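Here is a concrete instance, under our own choice of functions. Each $f_n(x) = \sqrt{(x - 1/2)^2 + n^{-2}}$ is $C^1$ on $[0,1]$, and $0 \le f_n(x) - |x - 1/2| \le 1/n$ because $\sqrt{a^2 + b^2} - |a| \le b$ for $b \ge 0$. So $(f_n)$ converges uniformly, and hence is Cauchy in $\|\cdot\|_\infty$, to $|x - 1/2|$, which is not differentiable at $1/2$. The Python sketch below checks the uniform error on a grid (the maximum $1/n$ is attained at $x = 1/2$) and the two one-sided difference quotients of the limit at $1/2$.

```python
import math

def f_n(n: int):
    # C^1 approximations of |x - 1/2| on [0, 1].
    return lambda x: math.sqrt((x - 0.5) ** 2 + 1.0 / n ** 2)

def limit(x: float) -> float:
    return abs(x - 0.5)

def sup_error(n: int, grid_size: int = 10001) -> float:
    # Largest |f_n - limit| over an equally spaced grid containing x = 1/2.
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(f_n(n)(x) - limit(x)) for x in xs)

def one_sided_quotients(h: float = 1e-6):
    """Difference quotients of the limit at 1/2 from the right and from the left."""
    right = (limit(0.5 + h) - limit(0.5)) / h
    left = (limit(0.5 - h) - limit(0.5)) / (-h)
    return right, left
```

The right quotient is close to $1$ and the left one close to $-1$, reflecting the corner of the limit at $1/2$.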
[/example]
## Interplay of the Four Pillars
The four fundamental theorems (Hahn-Banach, Uniform Boundedness, Open Mapping, and Closed Graph) interact in deep ways. The Uniform Boundedness Principle and the Open Mapping Theorem are both consequences of the Baire Category Theorem, but they serve different purposes: the former converts pointwise bounds to uniform bounds, while the latter converts surjectivity to openness. The Closed Graph Theorem follows from the Open Mapping Theorem (via the Inverse Mapping Theorem). The Hahn-Banach theorem stands apart: it uses no completeness and instead relies on the axiom of choice, but it provides the "duality infrastructure" (support functionals, extensions) on which many applications of the other three depend.
Together, these four results justify the central role of Banach spaces in analysis: they are the natural setting in which linear operators behave predictably, surjective maps have bounded inverses, pointwise convergence has uniform consequences, and there are enough functionals to probe the space completely.
# The Topology of $C(K)$
Throughout this chapter, $K$ denotes a compact Hausdorff topological space and $C(K) := \{f: K \to \mathbb{R} \mid f \text{ is continuous}\}$, equipped with the supremum norm $\|f\|_{C(K)} := \sup_{x \in K} |f(x)|$. We have already seen that $C(K)$ is a Banach space. This chapter addresses three natural questions about the structure of $C(K)$: how many continuous functions are there (answered by Urysohn's lemma and the Tietze extension theorem), when can we extract convergent subsequences from a family of continuous functions (answered by Arzelà-Ascoli), and when can we approximate arbitrary continuous functions by elements of a prescribed subalgebra (answered by Stone-Weierstrass).
These three themes — abundance, compactness, and approximation — are the core structural questions for any function space in analysis.
## Separation and Extension
A compact Hausdorff space has a fundamental topological property that makes it amenable to analysis: **normality**. A topological space is normal if any two disjoint closed sets can be separated by disjoint open sets. Every compact Hausdorff space is normal. The proof uses the Hausdorff property to separate a point from a compact set not containing it, then compactness to upgrade this to separation of two disjoint compact sets; since closed subsets of a compact space are compact, this separates disjoint closed sets.
Normality enables the construction of continuous real-valued functions with prescribed values on closed sets, which is the content of Urysohn's lemma.
[quotetheorem:887]
The proof is a beautiful piece of constructive topology. The normality condition is used iteratively to build a nested family of open and closed sets indexed by dyadic rationals $q \in (0, 1)$, with $U_q \subset C_q \subset U_{q'}$ whenever $q < q'$. The function $f$ is then defined as an infimum: $f(x) = \inf\{q : x \in U_q\}$ (with the convention $\inf \emptyset = 1$). The key insight is that continuity of $f$ follows from the nesting: $\{f < \alpha\}$ is a union of open sets $U_q$, and $\{f > \alpha\}$ is a union of complements of closed sets $C_q$, so both are open.
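When $K$ is a metric space, no dyadic construction is needed. A standard explicit formula, swapped in here only as an illustration and not the proof above, is $f(x) = d(x, A)/(d(x, A) + d(x, B))$ for disjoint nonempty closed sets $A$ and $B$. It is continuous, takes values in $[0,1]$, and equals $0$ on $A$ and $1$ on $B$; the denominator never vanishes because a point at distance $0$ from both closed sets would lie in $A \cap B$. The Python sketch below (our own example) takes $A$ and $B$ to be disjoint closed discs in $\mathbb{R}^2$, for which $d(x, \text{disc}) = \max\{0, |x - c| - r\}$.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def dist_to_disc(x: Point, centre: Point, radius: float) -> float:
    # Distance from x to the closed disc with the given centre and radius.
    return max(0.0, math.dist(x, centre) - radius)

def urysohn(x: Point) -> float:
    """f = d_A / (d_A + d_B) for A = closed unit disc at (0, 0) and
    B = closed unit disc at (4, 0); f = 0 on A, f = 1 on B, 0 <= f <= 1."""
    d_a = dist_to_disc(x, (0.0, 0.0), 1.0)
    d_b = dist_to_disc(x, (4.0, 0.0), 1.0)
    return d_a / (d_a + d_b)
```

For example, `urysohn((2.0, 0.0))` is `0.5`, midway between the two discs.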
[citeproof:887]
Urysohn's lemma produces "step-like" functions that are constant on two specified closed sets. The Tietze-Urysohn extension theorem is a substantial generalisation: it extends *any* bounded continuous function from a closed subset to the whole space, preserving the norm.
[quotetheorem:888]
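The proof is an iterative approximation scheme that repeatedly applies Urysohn's lemma to reduce the error. If the current error on the closed set is bounded by $M$, Urysohn's lemma supplies a correction $g_i$ with $\|g_i\|_\infty \le M/3$ that equals $M/3$ where the error is at least $M/3$ and $-M/3$ where it is at most $-M/3$. Subtracting it leaves an error of at most $2M/3$. Starting from $M = \|f\|_\infty$, the corrections therefore satisfy $\|g_i\|_\infty \le \tfrac{M}{3}(2/3)^i$, their sum converges uniformly, and $\sum_i \tfrac{M}{3}(2/3)^i = M$ is exactly why the norm is preserved. This is a precursor of the approximation-and-limit technique that pervades functional analysis.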
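The iteration can be run on a toy instance. In the Python sketch below (our own illustration), the closed set is a finite set $C \subset \mathbb{R}$, and each Urysohn step uses the metric-space formula $d(x, B)/(d(x, A) + d(x, B))$ in place of the dyadic construction. With current error bound $M$ and residual $r$ on $C$, the correction equals $M/3$ on $A = \{r \ge M/3\}$ and $-M/3$ on $B = \{r \le -M/3\}$, and it is bounded by $M/3$ everywhere. The returned function sums finitely many continuous corrections, so it is continuous on $\mathbb{R}$, matches $f$ on $C$ up to $(2/3)^{\text{steps}} \|f\|_\infty$, and has supremum at most $\|f\|_\infty$.

```python
from typing import Callable, Dict, List

Fn = Callable[[float], float]

def dist(x: float, S: List[float]) -> float:
    return min(abs(x - s) for s in S)

def urysohn_step(residual: Dict[float, float], M: float) -> Fn:
    """Continuous g on R with |g| <= M/3, g = M/3 where residual >= M/3 and
    g = -M/3 where residual <= -M/3 (metric-space Urysohn function)."""
    third = M / 3.0
    A = [c for c, r in residual.items() if r >= third]
    B = [c for c, r in residual.items() if r <= -third]

    def g(x: float) -> float:
        if A and B:
            da, db = dist(x, A), dist(x, B)   # da + db > 0 since A, B are disjoint
            return third * (db - da) / (da + db)
        if A:
            return third
        if B:
            return -third
        return 0.0
    return g

def tietze_extend(f_on_C: Dict[float, float], steps: int = 30) -> Fn:
    """Extend f from the finite closed set C to R by summing Urysohn corrections."""
    M = max(abs(v) for v in f_on_C.values())
    corrections: List[Fn] = []
    residual = dict(f_on_C)
    for _ in range(steps):
        if M == 0.0:
            break
        g = urysohn_step(residual, M)
        corrections.append(g)
        residual = {c: r - g(c) for c, r in residual.items()}
        M *= 2.0 / 3.0   # after the step, |residual| <= 2M/3 on C
    return lambda x: sum(corr(x) for corr in corrections)
```

For `f_on_C = {0.0: 1.0, 0.25: -2.0, 0.5: 0.5, 1.0: 3.0}`, thirty steps reproduce the data to within $3 \cdot (2/3)^{30}$ while keeping $|F| \le 3$.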
The Tietze-Urysohn theorem is the analogue of the [Norm-Preserving Extension](/theorems/880) from the Hahn-Banach theory, but for continuous functions rather than linear functionals. It guarantees that $C(K)$ is "rich": given any finite set of points $\{p_1, \ldots, p_n\} \subset K$ and any values $\{y_1, \ldots, y_n\} \subset \mathbb{R}$, there exists $f \in C(K)$ with $f(p_i) = y_i$ for each $i$.
[citeproof:888]
## Compact Subsets and the Arzelà-Ascoli Theorem
In finite-dimensional spaces, bounded closed sets are compact (by Heine-Borel). In $C(K)$, the closed unit ball is *not* compact (by the [Characterisation of Finite Dimensionality](/theorems/878), since $C(K)$ is infinite-dimensional when $K$ is infinite). The Arzelà-Ascoli theorem identifies exactly which subsets of $C(K)$ are compact: those that are closed, bounded, and **equicontinuous**.
The need for equicontinuity is illustrated by a simple example: define $f_n: [0,1] \to \mathbb{R}$ by $f_n(x) := \max\{0, 1 - nx\}$, a triangular spike of height $1$ at $x = 0$ supported on $[0, 1/n]$. Each $f_n$ is continuous with $\|f_n\|_\infty = 1$, but the pointwise limit is $f(x) = 0$ for $x \ne 0$ and $f(0) = 1$, which is discontinuous. No subsequence can converge uniformly: a uniform limit would have to coincide with this pointwise limit and would also have to be continuous. The problem is that the modulus of continuity of $f_n$ near $x = 0$ depends on $n$ and deteriorates: the functions are "individually continuous but collectively discontinuous."
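Failure of equicontinuity at $0$ can be made explicit, taking $f_n(x) = \max\{0, 1 - nx\}$ as the spike. With $\varepsilon = 1/2$, every neighbourhood $[0, \delta)$ contains the point $y = 1/n$ for any $n > 1/\delta$, and there $f_n(y) = 0$ while $f_n(0) = 1$. The Python sketch below (names are ours) produces such a witness for any given $\delta$.

```python
import math
from typing import Callable, Tuple

def f(n: int) -> Callable[[float], float]:
    # f_n(x) = max(0, 1 - n x): a spike of height 1 at 0, supported on [0, 1/n].
    return lambda x: max(0.0, 1.0 - n * x)

def witness(delta: float) -> Tuple[int, float]:
    """For epsilon = 1/2 at x = 0: return (n, y) with 0 < y < delta and
    |f_n(y) - f_n(0)| >= 1/2, so no single delta works for every f_n."""
    n = math.ceil(1.0 / delta) + 1   # guarantees 1/n < delta
    return n, 1.0 / n
```

For `delta = 0.013`, for instance, `witness` returns `n = 78` and `y = 1/78`, where `f(78)` has dropped from `1.0` to (numerically) `0`.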
[definition:Equicontinuity]
Let $K$ be a topological space and $F \subset C(K)$. The family $F$ is **equicontinuous at $x \in K$** if for every $\varepsilon > 0$, there exists an open neighbourhood $U$ of $x$ such that $|f(y) - f(x)| < \varepsilon$ for all $y \in U$ and all $f \in F$. The family $F$ is **equicontinuous** if it is equicontinuous at every $x \in K$.
[/definition]
[quotetheorem:885]
The proof exploits the interplay between two types of compactness: the compactness of $K$ (which converts equicontinuity into a finite covering argument) and the compactness of closed bounded subsets of $\mathbb{R}^n$ (which handles the values of the functions at finitely many points). The equicontinuity condition ensures that controlling the values at finitely many points suffices to control the entire function uniformly.
The reverse direction, showing that a totally bounded family is equicontinuous, is equally instructive. A finite $\varepsilon$-net $\{f_1, \ldots, f_m\}$ provides a finite "template" of functions. Finitely many continuous functions are equicontinuous at each point (intersect their finitely many neighbourhoods), and since every $f \in F$ is within $\varepsilon$ of some $f_j$, a $3\varepsilon$ argument transfers this to the whole of $F$.
[citeproof:885]
[example:Application to ODEs]
The Arzelà-Ascoli theorem is the key tool in Peano's existence theorem for ODEs. Consider $x'(t) = f(x(t))$, $x(0) = x_0$, with $f$ continuous (but not necessarily Lipschitz). The strategy is to approximate $f$ uniformly near $x_0$ by Lipschitz functions $f_n$, solve the approximate ODEs $x_n'(t) = f_n(x_n(t))$, $x_n(0) = x_0$, on a common short time interval using Picard-Lindelöf, show that $\{x_n\}$ is bounded and equicontinuous (a uniform bound $|f_n| \le M$ gives $|x_n(t) - x_n(s)| \le M|t - s|$), extract a [uniformly convergent](/page/Uniform%20Convergence) subsequence by Arzelà-Ascoli, and verify that the limit satisfies the integral equation $x(t) = x_0 + \int_0^t f(x(s))\,ds$. Without a compactness result such as Arzelà-Ascoli, there is no reason for the approximate solutions to converge to anything, so this step is what produces a genuine solution.
[/example]
## Approximation: The Stone-Weierstrass Theorem
The classical Weierstrass approximation theorem states that every continuous function on $[0, 1]$ can be uniformly approximated by polynomials. The Stone-Weierstrass theorem vastly generalises this: polynomials are replaced by an arbitrary subalgebra, and $[0, 1]$ is replaced by an arbitrary compact Hausdorff space. The key hypothesis is that the subalgebra **separates points**.
[quotetheorem:886]
The theorem subsumes the Weierstrass approximation theorem: polynomials on $[0, 1]$ form a subalgebra that separates points (the function $f(x) = x$ distinguishes any two points) and includes the constant function $1$ (so no point is annihilated by all polynomials). Hence $\overline{A} = C_\mathbb{R}([0, 1])$.
The proof has three layers. First, the closure of a subalgebra is shown to be a **sublattice** (closed under $\max$ and $\min$) by approximating the absolute value function $|f|$ via polynomials in $f^2$, and then using $\max(f, g) = \frac{1}{2}(f + g + |f - g|)$ and $\min(f, g) = \frac{1}{2}(f + g - |f - g|)$. Second, a closed sublattice with the "interpolation property" (the ability to approximately match any continuous function at any two points) is shown to be dense via a two-stage compactness argument: for fixed $x$, a finite minimum of two-point interpolants lies below $f + \varepsilon$ everywhere while equalling $f$ at $x$, and a finite maximum of these functions over finitely many points $x$ then lies within $\varepsilon$ of $f$ everywhere. Third, the subalgebra's separation and non-vanishing properties are used to verify the interpolation property.
The second case of the theorem — where $\overline{A} = \{f \in C(K) : f(x_0) = 0\}$ for some point $x_0$ — cannot be eliminated in general. The algebra generated by $f(x) = x$ on $[0, 1]$ consists of polynomials vanishing at $0$, and its closure is exactly the space of continuous functions vanishing at $0$.
[citeproof:886]
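The first layer rests on approximating $|t|$ uniformly on $[-1, 1]$ by polynomials in $t^2$. One standard explicit choice (assumed here for illustration; the proof cited above may use a different approximation) is the recursion $P_0 = 0$, $P_{k+1}(t) = P_k(t) + \frac{1}{2}\big(t^2 - P_k(t)^2\big)$, for which $0 \le |t| - P_n(t) \le \frac{2|t|}{2 + n|t|} \le \frac{2}{n}$ on $[-1, 1]$. The sketch below evaluates $P_n$ pointwise (helper names are hypothetical) and measures the error on a grid.

```python
def abs_approx(t, n):
    """Value at t of P_n, where P_0 = 0 and P_{k+1} = P_k + (t**2 - P_k**2) / 2.

    Each P_k is a polynomial in t**2 with zero constant term, so substituting
    f**2 for t**2 (with ||f||_inf <= 1) stays inside any algebra containing f.
    """
    s = t * t
    p = 0.0
    for _ in range(n):
        p = p + (s - p * p) / 2
    return p


def abs_sup_error(n, samples=1001):
    """Estimate sup_{|t| <= 1} (|t| - P_n(t)) on a uniform grid."""
    grid = [-1.0 + 2.0 * k / (samples - 1) for k in range(samples)]
    return max(abs(t) - abs_approx(t, n) for t in grid)


for n in (1, 10, 100):
    print(n, abs_sup_error(n), 2 / n)  # observed error versus the bound 2/n
```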
[remark:Complex Stone-Weierstrass]
The theorem as stated is for real-valued functions. The complex version requires an additional hypothesis: the subalgebra must be closed under complex conjugation (i.e. $f \in A \Rightarrow \bar{f} \in A$). Without this, the theorem fails: complex polynomials in $z$ on $\overline{B}(0,1) \subset \mathbb{C}$ are holomorphic, and their uniform limits are holomorphic on the open disc $B(0,1)$ by Morera's theorem. But $\bar{z}$ is not holomorphic there, so it cannot be uniformly approximated by polynomials in $z$. The conjugation hypothesis is automatically satisfied for real-valued subalgebras (since $\bar{f} = f$), which is why it does not appear in the real version.
[/remark]
[example:Weierstrass Approximation as a Special Case]
The classical Weierstrass Approximation Theorem states: every $f \in C_\mathbb{R}([0,1])$ can be uniformly approximated by polynomials. We derive this from Stone-Weierstrass.
Let $A := \{p: [0,1] \to \mathbb{R} \mid p \text{ is a polynomial}\}$. Then $A$ is a subalgebra of $C_\mathbb{R}([0,1])$ (closed under addition, scalar multiplication, and pointwise multiplication of polynomials). It separates points: the polynomial $f(x) = x$ satisfies $f(a) \ne f(b)$ whenever $a \ne b$. Moreover, $A$ contains the constant polynomial $1$, so for every $x_0 \in [0,1]$, there exists $p \in A$ with $p(x_0) = 1 \ne 0$. By the [Stone-Weierstrass Theorem](/theorems/886), either $\overline{A} = C_\mathbb{R}([0,1])$ or $\overline{A} = \{f \in C_\mathbb{R}([0,1]) : f(x_0) = 0\}$ for some $x_0$. Since $1 \in A$ and $1(x_0) = 1 \ne 0$, the second case is impossible. Hence $\overline{A} = C_\mathbb{R}([0,1])$.
This argument extends immediately: polynomials are dense in $C_\mathbb{R}(K)$ for any compact subset $K \subset \mathbb{R}^n$, since polynomials in $n$ variables still form a subalgebra containing the constants, and the coordinate functions $x_1, \ldots, x_n$ already separate points. The Stone-Weierstrass theorem thus unifies a whole family of approximation results into a single criterion: separation of points, together with the requirement that the functions in the subalgebra do not all vanish at a common point.
[/example]
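The Stone-Weierstrass argument does not exhibit an explicit approximating polynomial. For comparison, the classical Weierstrass theorem also has a constructive proof via Bernstein polynomials $B_n f(x) = \sum_{k=0}^{n} f(k/n)\binom{n}{k}x^k(1-x)^{n-k}$; this is a different technique from the Stone-Weierstrass route above and is included only as an illustration. The sketch (helper names hypothetical) measures the uniform error for the kinked function $f(x) = |x - 1/2|$ on a grid.

```python
from math import comb


def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f, evaluated at x in [0, 1]."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))


def bernstein_sup_error(f, n, samples=201):
    """Estimate ||B_n f - f||_inf on a uniform grid of [0, 1]."""
    grid = [j / (samples - 1) for j in range(samples)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in grid)


def kink(x):
    return abs(x - 0.5)


for n in (10, 100, 400):
    print(n, bernstein_sup_error(kink, n))  # the error shrinks as n grows
```

Bernstein polynomials reproduce affine functions exactly, which is a convenient sanity check on the implementation.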
# Hilbert Spaces
Normed vector spaces provide the setting for analysis, and Banach spaces add the crucial property of completeness. But the most powerful and elegant theory emerges when the norm comes from an **inner product** — a bilinear (or sesquilinear) form that generalises the dot product of Euclidean space. Inner products give us a notion of **angle** and **orthogonality**, and with these come orthogonal projections, orthogonal decompositions, and the ability to expand elements in orthonormal bases. The resulting spaces — Hilbert spaces — are the natural setting for Fourier analysis, quantum mechanics, and the spectral theory of operators.
The key insight of this chapter is that the geometry of Hilbert spaces is governed by **orthogonality**, and that this geometric structure completely determines the dual space (via the Riesz Representation Theorem) and enables the representation of elements as convergent series in an orthonormal basis (leading to the identification with $\ell^2$).
## Inner Products and the Cauchy-Schwarz Inequality
[definition:Inner Product Space]
An **inner product space** (or **Euclidean space**) is a vector space $V$ over $\mathbb{F}$ (either $\mathbb{R}$ or $\mathbb{C}$) equipped with an **inner product** $(\cdot, \cdot): V \times V \to \mathbb{F}$ satisfying:
(i) $(v, w) = \overline{(w, v)}$ for all $v, w \in V$ (conjugate symmetry).
(ii) $(\lambda v_1 + \mu v_2, w) = \lambda(v_1, w) + \mu(v_2, w)$ for all $\lambda, \mu \in \mathbb{F}$, $v_1, v_2, w \in V$ (linearity in the first argument).
(iii) $(v, v) \ge 0$ for all $v \in V$, with equality if and only if $v = 0$ (positive-definiteness).
A **Hilbert space** is an inner product space that is complete under the induced norm $\|v\| := \sqrt{(v, v)}$.
[/definition]
The fact that $\|v\| := \sqrt{(v, v)}$ is indeed a norm follows from the **Cauchy-Schwarz inequality**: $|(v, w)| \le \|v\| \cdot \|w\|$ for all $v, w \in V$. For real $t$, the quadratic $0 \le (v + tw, v + tw) = \|v\|^2 + 2t \operatorname{Re}(v, w) + t^2\|w\|^2$ is non-negative, so its discriminant is non-positive, giving $|\operatorname{Re}(v, w)| \le \|v\| \cdot \|w\|$. In the complex case, replacing $w$ by $e^{i\theta}w$ with $\theta$ chosen so that $(v, e^{i\theta}w) = e^{-i\theta}(v, w) = |(v, w)|$ upgrades this to the full inequality. From Cauchy-Schwarz, the triangle inequality $\|v + w\|^2 = \|v\|^2 + 2\operatorname{Re}(v, w) + \|w\|^2 \le (\|v\| + \|w\|)^2$ follows immediately.
Inner product spaces satisfy two additional identities with no analogue for general normed spaces: the **parallelogram law** $\|v + w\|^2 + \|v - w\|^2 = 2\|v\|^2 + 2\|w\|^2$ and **Pythagoras' theorem**: if $(v, w) = 0$, then $\|v + w\|^2 = \|v\|^2 + \|w\|^2$. The parallelogram law characterises inner product spaces among normed spaces: a norm satisfies the parallelogram law if and only if it arises from an inner product (via the polarisation identities).
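Both identities are easy to test numerically. The sketch below (helper names hypothetical; real scalars only, so the two-term real polarisation identity $(v, w) = \frac{1}{4}(\|v + w\|^2 - \|v - w\|^2)$ is used) evaluates the parallelogram defect for the Euclidean norm on $\mathbb{R}^2$, where it vanishes, and for the supremum norm, where it does not. Hence the supremum norm cannot come from any inner product.

```python
import math


def euclid(v):
    return math.sqrt(sum(c * c for c in v))


def sup_norm(v):
    return max(abs(c) for c in v)


def parallelogram_defect(norm, v, w):
    """||v+w||^2 + ||v-w||^2 - 2||v||^2 - 2||w||^2; zero iff the law holds for (v, w)."""
    s = [a + b for a, b in zip(v, w)]
    d = [a - b for a, b in zip(v, w)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * norm(v) ** 2 - 2 * norm(w) ** 2


def polarise(norm, v, w):
    """Real polarisation identity: (v, w) = (||v+w||^2 - ||v-w||^2) / 4."""
    s = [a + b for a, b in zip(v, w)]
    d = [a - b for a, b in zip(v, w)]
    return (norm(s) ** 2 - norm(d) ** 2) / 4


v, w = [1.0, 0.0], [0.0, 1.0]
print(parallelogram_defect(euclid, v, w))    # zero up to rounding
print(parallelogram_defect(sup_norm, v, w))  # -2.0
print(polarise(euclid, [1.0, 2.0], [3.0, 4.0]))  # recovers the dot product 11, up to rounding
```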
[example:The Space $\ell^2$]
The sequence space $\ell^2(\mathbb{R}) := \{x = (x_1, x_2, \ldots) : \sum_{i=1}^\infty |x_i|^2 < \infty\}$ with inner product $(a, b)_{\ell^2} := \sum_{i=1}^\infty a_i b_i$ is a Hilbert space. The inner product is well-defined by Cauchy-Schwarz (applied in finite dimensions and passed to the limit): $\sum_{i=1}^\infty |a_i b_i| \le \|a\|_{\ell^2} \cdot \|b\|_{\ell^2} < \infty$. Every separable infinite-dimensional real Hilbert space is isometrically isomorphic to $\ell^2(\mathbb{R})$ (and every complex one to $\ell^2(\mathbb{C})$, with inner product $\sum_i a_i \overline{b_i}$); this is the fundamental classification result of the chapter.
[/example]
## Orthogonal Projections
The key geometric structure of Hilbert spaces is the **orthogonal decomposition**: every closed subspace $M$ satisfies $H = M \oplus M^\perp$, so every vector can be uniquely decomposed into a component in $M$ and a component perpendicular to it.
[quotetheorem:241]
The existence of orthogonal projections relies on completeness in an essential way: the closest point in the subspace $M$ to a given vector $x$ is found as the limit of a minimising sequence, and the parallelogram law is used to show this sequence is Cauchy. The uniqueness of the decomposition follows from the fact that $M \cap M^\perp = \{0\}$.
This result is the Hilbert-space analogue of writing a vector in $\mathbb{R}^3$ as the sum of its projection onto a plane and its perpendicular component. The difference is that in infinite dimensions, the subspace $M$ must be *closed* (equivalently, complete): a subspace that is not closed need not contain a point closest to a given vector.
[citeproof:241]
[example:Orthogonal Projection in $\ell^2$]
Let $H = \ell^2(\mathbb{R})$ and let $M := \{x \in \ell^2 : x_1 = x_2\}$. This is a closed subspace (it is the kernel of the bounded linear functional $f(x) = x_1 - x_2$). We compute the orthogonal projection of $x = (3, 1, 0, 0, \ldots)$ onto $M$.
We seek $m \in M$ minimising $\|x - m\|_{\ell^2}^2$. Write $m = (a, a, m_3, m_4, \ldots)$ (since $m_1 = m_2 = a$ for $m \in M$). Then:
\begin{align*}
\|x - m\|^2 = (3 - a)^2 + (1 - a)^2 + m_3^2 + m_4^2 + \cdots.
\end{align*}
This is minimised by taking $m_k = 0$ for $k \ge 3$ (adding nonzero components only increases the distance) and choosing $a$ to minimise $(3 - a)^2 + (1 - a)^2 = 2a^2 - 8a + 10$. Differentiating: $4a - 8 = 0$, so $a = 2$. The projection is $P_M(x) = (2, 2, 0, 0, \ldots)$.
We verify: $x - P_M(x) = (1, -1, 0, 0, \ldots)$, and for any $m = (a, a, m_3, \ldots) \in M$:
\begin{align*}
(x - P_M(x), m)_{\ell^2} = 1 \cdot a + (-1) \cdot a + 0 + \cdots = 0.
\end{align*}
So $x - P_M(x) \in M^\perp$, confirming the decomposition $x = P_M(x) + (x - P_M(x))$ with $P_M(x) \in M$ and $x - P_M(x) \in M^\perp$.
[/example]
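The computation can be cross-checked with the closed-form projection onto a hyperplane. Because $M$ is the kernel of $x \mapsto (x, u)_{\ell^2}$ with $u = (1, -1, 0, \ldots)$, one has $M^\perp = \operatorname{span}\{u\}$ and $P_M(x) = x - \frac{(x, u)}{(u, u)}u$. The sketch stores finitely supported sequences as Python lists (an assumption of the illustration; helper names are hypothetical).

```python
def pad(x, n):
    """Extend a finitely supported sequence by zeros to length n."""
    return list(x) + [0.0] * (n - len(x))


def inner(x, y):
    """Real l^2 inner product of two finitely supported sequences."""
    n = max(len(x), len(y))
    return sum(a * b for a, b in zip(pad(x, n), pad(y, n)))


def project_onto_kernel(x, u):
    """Orthogonal projection onto M = {m : (m, u) = 0}, i.e. x - ((x, u) / (u, u)) u."""
    n = max(len(x), len(u))
    c = inner(x, u) / inner(u, u)
    return [a - c * b for a, b in zip(pad(x, n), pad(u, n))]


u = [1.0, -1.0]               # M = {x_1 = x_2} = ker(x -> x_1 - x_2)
x = [3.0, 1.0, 0.0, 0.0]
p = project_onto_kernel(x, u)
print(p)                      # [2.0, 2.0, 0.0, 0.0]
```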
## The Riesz Representation Theorem
In a general Banach space, the dual $V^*$ can be much larger and more complicated than $V$ itself. In a Hilbert space, the situation is dramatically simpler: every bounded linear functional is given by an inner product with a fixed element.
[quotetheorem:221]
This is one of the most elegant results in functional analysis. It says that the natural map $v \mapsto (\cdot, v)_H$ from $H$ to $H^*$ is an isometric *anti*-isomorphism (conjugate-linear in the complex case, linear in the real case); the functional $x \mapsto (x, v)_H$ is linear precisely because the inner product is linear in its first argument. The dual of a Hilbert space is therefore "the same" as the space itself, up to conjugation.
The proof splits into two parts. The isometry $\|f\|_{H^*} = \|u_f\|_H$ (which also gives injectivity) follows from Cauchy-Schwarz in one direction and from testing $f$ on $u_f / \|u_f\|_H$ in the other. Surjectivity is the substantive content: given $f \in H^*$ with $f \not\equiv 0$, the kernel $\ker(f)$ is a closed subspace of codimension $1$. The orthogonal decomposition gives $H = \ker(f) \oplus \ker(f)^\perp$, where $\ker(f)^\perp$ is one-dimensional. Choosing a unit vector $v \in \ker(f)^\perp$ and setting $u_f := \overline{f(v)}\, v$ gives $(v, u_f)_H = f(v)$. Since every $x \in H$ can be written as $x = \big(x - \tfrac{f(x)}{f(v)} v\big) + \tfrac{f(x)}{f(v)} v$ with the first term in $\ker(f)$, it follows that $(x, u_f)_H = \tfrac{f(x)}{f(v)}(v, u_f)_H = f(x)$.
An immediate corollary is that every Hilbert space is reflexive. Transporting the inner product through the Riesz map makes $H^*$ a Hilbert space, and the canonical embedding $\phi: H \to H^{**}$ coincides with the composition of the Riesz maps for $H$ and $H^*$. As a composition of two conjugate-linear bijections, it is a linear bijection, so $\phi$ is surjective.
[citeproof:221]
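In finite dimensions the representing vector can be written down explicitly. The sketch below (helper names hypothetical) works in $\mathbb{C}^n$ with the inner product $(x, y) = \sum_i x_i \overline{y_i}$, linear in the first argument as in the definition, and uses the orthonormal-basis formula $u_f = \sum_i \overline{f(e_i)}\, e_i$ rather than the kernel construction from the proof; by uniqueness both give the same vector. It also checks $\|f\| = \|u_f\|$ on the maximising vector $u_f / \|u_f\|$.

```python
import math


def inner(x, y):
    """(x, y) = sum_i x_i * conj(y_i): linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def riesz_representer(f, n):
    """Return u with f(x) = (x, u) for a linear functional f on C^n."""
    basis = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    # Coordinates of u_f = sum_i conj(f(e_i)) e_i.
    return [complex(f(e)).conjugate() for e in basis]


def f(x):
    """An illustrative linear functional on C^3."""
    return (2 - 1j) * x[0] + 3j * x[2]


u = riesz_representer(f, 3)
x = [1 + 2j, -1.0, 0.5j]
print(f(x), inner(x, u))                  # the two values should agree

norm_u = math.sqrt(inner(u, u).real)
unit = [c / norm_u for c in u]
print(f(unit), norm_u)                    # f attains ||u_f|| at u_f / ||u_f||
```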
## Orthonormal Bases and the Isomorphism with $\ell^2$
[definition:Orthonormal System]
Let $H$ be a Hilbert space. A set $\{e_\alpha\}_{\alpha \in I} \subset H$ is an **orthonormal system** if $(e_\alpha, e_\beta)_H = \delta_{\alpha\beta}$ for all $\alpha, \beta \in I$. An orthonormal system is a **(Hilbert space) basis** if it is maximal — it cannot be extended to a strictly larger orthonormal system.
[/definition]
In a Hilbert space, a maximal orthonormal system $S$ satisfies $\overline{\operatorname{span}(S)} = H$. The proof uses the orthogonal decomposition: $M := \overline{\operatorname{span}(S)}$ is a closed subspace, so $H = M \oplus M^\perp$, and $M^\perp = S^\perp$. If $M \ne H$, then $S^\perp \ne \{0\}$, and any unit vector in $S^\perp$ could be added to $S$, contradicting maximality. The converse also holds (for Euclidean spaces, not just Hilbert spaces): if $\overline{\operatorname{span}(S)} = H$ and $e \perp S$, then $e$ is orthogonal to $\overline{\operatorname{span}(S)} = H$, so $e = 0$ and $S$ is maximal. The existence of a maximal orthonormal system is guaranteed by Zorn's lemma.
For separable Hilbert spaces (those with a countable dense subset), bases are countable and can be constructed by applying the Gram-Schmidt procedure to a countable dense subset, discarding any vector that lies in the span of its predecessors. The central result is that every element can be expanded as a convergent series in the basis elements.
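A minimal sketch of the Gram-Schmidt step in $\mathbb{C}^n$ (helper names and the tolerance `tol` are assumptions of the illustration). It uses the "modified" form, subtracting $(w, e)\,e$ from the running vector for each earlier basis vector $e$, and discards vectors that are numerically in the span of their predecessors, as one must when orthonormalising a dense sequence.

```python
import math


def inner(x, y):
    """(x, y) = sum_i x_i * conj(y_i), linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalise vectors in order, skipping those (numerically) in the span of earlier ones."""
    basis = []
    for v in vectors:
        w = [complex(c) for c in v]
        for e in basis:
            coeff = inner(w, e)                       # component of w along e
            w = [a - coeff * b for a, b in zip(w, e)]
        norm = math.sqrt(inner(w, w).real)
        if norm > tol:
            basis.append([a / norm for a in w])
    return basis


vectors = [[1, 1, 0], [1, 0, 1], [2, 1, 1], [0, 0, 1]]  # the third is the sum of the first two
es = gram_schmidt(vectors)
print(len(es))  # 3: the dependent vector is discarded
```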
[theorem:Fourier Expansion in Hilbert Spaces]
Let $H$ be a separable Hilbert space with orthonormal basis $\{e_i\}_{i=1}^N$ ($N \in \mathbb{N} \cup \{\infty\}$). For every $x \in H$, setting $x_i := (x, e_i)_H$:
(i) $x = \sum_{i=1}^N x_i e_i$ (convergence in $H$).
(ii) $\|x\|_H^2 = \sum_{i=1}^N |x_i|^2$ (Parseval's identity).
(iii) $(x, y)_H = \sum_{i=1}^N x_i \overline{y_i}$ for all $x, y \in H$, where $y_i := (y, e_i)_H$ (the series converges absolutely).
[/theorem]
The proof uses Bessel's inequality ($\sum_{i=1}^n |x_i|^2 \le \|x\|^2$, which follows from $\|P_{F_n}(x)\| \le \|x\|$ where $P_{F_n}$ is the projection onto $\operatorname{span}\{e_1, \ldots, e_n\}$) to show that the partial sums $S_n := \sum_{i=1}^n x_i e_i$ form a Cauchy sequence, then verifies that $x - \lim S_n \in \{e_1, e_2, \ldots\}^\perp = \{0\}$, so $x = \lim S_n$.
[proof]
**Step 1: The partial sums are Cauchy.** By Bessel's inequality, $\sum_{i=1}^n |x_i|^2 \le \|x\|^2$ for all $n$. Hence $\sum_{i=1}^\infty |x_i|^2 \le \|x\|^2 < \infty$, and for $n > m$:
\begin{align*}
\|S_n - S_m\|^2 = \left\|\sum_{i=m+1}^n x_i e_i\right\|^2 = \sum_{i=m+1}^n |x_i|^2 \to 0 \text{ as } m, n \to \infty.
\end{align*}
**Step 2: Identify the limit.** Since $H$ is complete, $S_n \to S$ for some $S \in H$. By continuity of the inner product, $(S, e_j) = \lim_{n \to \infty} (S_n, e_j) = x_j = (x, e_j)$ for all $j$. Hence $x - S \in \{e_1, e_2, \ldots\}^\perp = \{0\}$, so $x = S$.
**Step 3: Parseval.** $\|x\|^2 = \lim_{n \to \infty} \|S_n\|^2 = \lim_{n \to \infty} \sum_{i=1}^n |x_i|^2 = \sum_{i=1}^\infty |x_i|^2$.
**Step 4: Inner product formula.** By Cauchy-Schwarz in $\ell^2$: $\sum |x_i \bar{y}_i| \le (\sum |x_i|^2)^{1/2}(\sum |y_i|^2)^{1/2} = \|x\| \cdot \|y\| < \infty$. The identity follows from continuity of the inner product applied to $x = \sum x_i e_i$ and $y = \sum y_i e_i$.
[/proof]
The Fourier expansion shows that the map $H \to \ell^2$ sending $x \mapsto \{(x, e_i)\}_{i=1}^\infty$ is an isometric isomorphism (surjectivity follows from the Riesz-Fischer theorem: every $\ell^2$ sequence arises as the coefficients of some element of $H$, constructed as the limit of partial sums). Hence all separable infinite-dimensional Hilbert spaces are isometrically isomorphic to $\ell^2$ — there is essentially only one such space.
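The finite-dimensional case $N < \infty$ of the theorem can be checked directly. The sketch below assumes the discrete Fourier basis $e_k(j) = e^{2\pi i jk/n}/\sqrt{n}$ of $\mathbb{C}^n$ (an illustrative choice; helper names are hypothetical), computes the coefficient map $x \mapsto ((x, e_k))_k$, and verifies the expansion (i), Parseval's identity (ii) and the inner product formula (iii) up to rounding.

```python
import cmath
import math


def inner(x, y):
    """(x, y) = sum_j x_j * conj(y_j), linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def dft_basis(n):
    """Orthonormal basis e_k(j) = exp(2 pi i j k / n) / sqrt(n), k = 0, ..., n - 1."""
    return [[cmath.exp(2j * math.pi * j * k / n) / math.sqrt(n) for j in range(n)]
            for k in range(n)]


def coefficients(x, basis):
    """The coefficient map x -> ((x, e_k))_k."""
    return [inner(x, e) for e in basis]


n = 8
basis = dft_basis(n)
x = [complex(j % 3, -j) for j in range(n)]
y = [complex(1, j * j) for j in range(n)]
cx, cy = coefficients(x, basis), coefficients(y, basis)

reconstruction = [sum(c * e[j] for c, e in zip(cx, basis)) for j in range(n)]  # (i)
parseval_gap = inner(x, x).real - sum(abs(c) ** 2 for c in cx)                 # (ii)
inner_gap = inner(x, y) - sum(a * b.conjugate() for a, b in zip(cx, cy))       # (iii)
print(max(abs(r - v) for r, v in zip(reconstruction, x)), parseval_gap, abs(inner_gap))
```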
[example:Fourier Series Convergence in $L^2$]
Consider $H = L^2(S^1)$, the completion of $C_\mathbb{C}(S^1)$ under the inner product $(f, g)_{L^2} := \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) \overline{g(x)} \, d\mathcal{L}^1(x)$. The functions $e_n(x) := e^{inx}$ for $n \in \mathbb{Z}$ form an orthonormal system:
\begin{align*}
(e_n, e_m)_{L^2} = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{inx} e^{-imx} \, d\mathcal{L}^1(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i(n-m)x} \, d\mathcal{L}^1(x) = \delta_{nm}.
\end{align*}
The orthogonality follows by direct computation: for $n \ne m$, $\int_{-\pi}^{\pi} e^{i(n-m)x} \, dx = \frac{e^{i(n-m)\pi} - e^{-i(n-m)\pi}}{i(n-m)} = \frac{2i\sin((n-m)\pi)}{i(n-m)} = 0$.
We claim $\{e_n\}_{n \in \mathbb{Z}}$ is a Hilbert space basis, i.e., $\overline{\operatorname{span}\{e_n\}} = L^2(S^1)$. The trigonometric polynomials $\operatorname{span}\{e^{inx}\}_{n \in \mathbb{Z}}$ form a subalgebra of $C_\mathbb{C}(S^1)$ that contains the constants ($e_0 = 1$), separates points ($e^{ix}$ is injective on $S^1$), and is closed under complex conjugation ($\overline{e^{inx}} = e^{-inx}$), so the complex Stone-Weierstrass theorem gives density in $C_\mathbb{C}(S^1)$ under the supremum norm. Since $\|f\|_{L^2} \le \|f\|_\infty$ for functions on $S^1$ (the measure $\frac{1}{2\pi}\,d\mathcal{L}^1$ on $[-\pi, \pi]$ has total mass $1$), the trigonometric polynomials are also dense in $C_\mathbb{C}(S^1)$ under $\|\cdot\|_{L^2}$. Since $C_\mathbb{C}(S^1)$ is dense in $L^2(S^1)$ (by definition of $L^2$ as a completion), density follows.
As a consequence, for any $f \in C_\mathbb{C}(S^1)$ (indeed for any $f \in L^2(S^1)$), the partial Fourier sums
\begin{align*}
(S_N f)(x) := \sum_{n=-N}^{N} \hat{f}(n) e^{inx}, \quad \text{where } \hat{f}(n) := \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) e^{-inx} \, d\mathcal{L}^1(x),
\end{align*}
converge to $f$ in $L^2(S^1)$ by the Fourier Expansion theorem. The projection $S_N = P_{F_N}$ onto $F_N := \operatorname{span}\{e^{inx} : |n| \le N\}$ satisfies $\|S_N\|_{\mathcal{L}(L^2)} \le 1$ and fixes every element of $F_N$, so for any trigonometric polynomial $P \in F_N$:
\begin{align*}
\|S_N f - f\|_{L^2} \le \|S_N(f - P)\|_{L^2} + \|P - f\|_{L^2} \le 2\|f - P\|_{L^2}.
\end{align*}
By Stone-Weierstrass, for continuous $f$ and any $\varepsilon > 0$ there is a trigonometric polynomial $P$ with $\|f - P\|_{L^2} \le \|f - P\|_\infty < \varepsilon$; since $P \in F_N$ whenever $N$ is at least the degree of $P$, the estimate gives $\|S_N f - f\|_{L^2} < 2\varepsilon$ for all such $N$. This is a purely functional-analytic proof of $L^2$ convergence of [Fourier series](/page/Fourier%20Series): no explicit computation of the [Dirichlet kernel](/page/Dirichlet%20Kernel) is needed.
$L^2$ convergence does not imply pointwise convergence (nor conversely). There exist continuous functions whose Fourier series diverge at specific points, a fact that follows from the Uniform Boundedness Principle applied to the Dirichlet kernel operators, as discussed in Chapter 4. The Hilbert space framework gives $L^2$ convergence for free from the orthonormal basis theory, but says nothing about pointwise behaviour.
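As a concrete check, take $f(x) = |x|$ on $[-\pi, \pi]$ (an example chosen for illustration). Integrating by parts gives $\hat{f}(0) = \pi/2$ and $\hat{f}(n) = \frac{(-1)^n - 1}{\pi n^2}$ for $n \ne 0$, while $\|f\|_{L^2}^2 = \pi^2/3$. By Pythagoras, $\|S_N f - f\|_{L^2}^2 = \|f\|_{L^2}^2 - \sum_{|n| \le N} |\hat{f}(n)|^2$. The sketch (helper names hypothetical) compares the closed-form coefficients with a midpoint-rule quadrature and tabulates the squared $L^2$ error, which decreases towards $0$.

```python
import cmath
import math


def abs_coefficient(n):
    """Closed-form Fourier coefficient of f(x) = |x| on [-pi, pi]."""
    if n == 0:
        return math.pi / 2
    return ((-1) ** n - 1) / (math.pi * n * n)


def quadrature_coefficient(f, n, samples=20_000):
    """Midpoint-rule approximation of (1 / 2pi) * integral_{-pi}^{pi} f(x) exp(-inx) dx."""
    h = 2 * math.pi / samples
    total = 0j
    for k in range(samples):
        x = -math.pi + (k + 0.5) * h
        total += f(x) * cmath.exp(-1j * n * x)
    return total * h / (2 * math.pi)


def l2_error_squared(N):
    """||S_N f - f||^2 = ||f||^2 - sum_{|n| <= N} |f^(n)|^2 for f(x) = |x|."""
    norm_sq = math.pi ** 2 / 3
    return norm_sq - sum(abs_coefficient(n) ** 2 for n in range(-N, N + 1))


for n in range(4):
    print(n, abs_coefficient(n), quadrature_coefficient(abs, n))
for N in (1, 5, 25, 125):
    print(N, l2_error_squared(N))
```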
[/example]
# Operators and Spectral Theory
We now turn to the study of bounded linear operators $T: X \to X$ on a Banach space $X$ (written $T \in \mathcal{L}(X) := \mathcal{L}(X, X)$). In finite dimensions, the central objects of study are eigenvalues and eigenvectors, which lead to diagonalisation and the [Jordan normal form](/theorems/864). In infinite dimensions, the appropriate generalisation is the **spectrum** of an operator: the set of scalars $\lambda$ for which $T - \lambda I$ fails to be invertible. On a nonzero complex Banach space the spectrum is always nonempty and compact, but its structure is richer than in finite dimensions: an operator can fail to be invertible by being non-injective (eigenvalue) or non-surjective (continuous or residual spectrum).
The most complete results are obtained for **compact operators** — operators whose image of the unit ball has compact closure. Compact operators are the infinite-dimensional analogues of matrices: they have at most countably many eigenvalues (accumulating only at zero), and their non-zero eigenspaces are finite-dimensional. For compact **self-adjoint** operators on Hilbert spaces, the full spectral theorem gives an orthonormal eigenbasis decomposition, reducing the operator to a diagonal form — the infinite-dimensional generalisation of the principal axis theorem.
## The Spectrum
[definition:Spectrum]
Let $X$ be a Banach space over $\mathbb{C}$ and $T \in \mathcal{L}(X)$. The **spectrum** of $T$ is
\begin{align*}
\sigma(T) := \{\lambda \in \mathbb{C} : T - \lambda I \text{ is not invertible in } \mathcal{L}(X)\}.
\end{align*}
The **resolvent set** is $\rho(T) := \mathbb{C} \setminus \sigma(T)$. The **point spectrum** (set of eigenvalues) is $\sigma_p(T) := \{\lambda \in \mathbb{C} : \ker(T - \lambda I) \ne \{0\}\}$.
[/definition]
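The point spectrum $\sigma_p(T)$ records failure of injectivity. For $\lambda \in \sigma(T) \setminus \sigma_p(T)$, the operator $T - \lambda I$ is injective but not surjective: by the open mapping theorem, a bounded bijection between Banach spaces automatically has a bounded inverse, so failure of surjectivity is the only remaining obstruction. In finite dimensions, injectivity and surjectivity of $T - \lambda I$ are equivalent (by rank-nullity), so $\sigma(T) = \sigma_p(T)$; in infinite dimensions the inclusion $\sigma_p(T) \subset \sigma(T)$ can be strict.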
[example:The Spectrum of the Right Shift on $\ell^\infty$]
Define the right shift $S: \ell^\infty \to \ell^\infty$ by $S(a_1, a_2, \ldots) := (0, a_1, a_2, \ldots)$. We compute $\sigma(S)$, $\sigma_p(S)$, and the approximate point spectrum $\sigma_{ap}(S)$ in full.
**Operator norm.** We have $\|S(x)\|_{\ell^\infty} = \sup_i |x_i| = \|x\|_{\ell^\infty}$ for all $x$ (the supremum is unchanged since we only add a zero at the front). So $S$ is an isometry: $\|S\| = 1$.
**Point spectrum: $\sigma_p(S) = \varnothing$.** Suppose $S(a_1, a_2, \ldots) = \lambda(a_1, a_2, \ldots)$ for some $\lambda \in \mathbb{C}$ and some $a \in \ell^\infty$. Comparing components: $0 = \lambda a_1$ (first component) and $a_k = \lambda a_{k+1}$ for $k \ge 1$. If $\lambda = 0$, the second relation gives $a_k = 0$ for all $k$ directly. If $\lambda \ne 0$, the first relation forces $a_1 = 0$, and then $a_{k+1} = \lambda^{-1} a_k$ gives $a_k = 0$ for all $k$ by induction. In both cases $a = 0$, so there is no eigenvector. The right shift has **no eigenvalues**.
**Full spectrum: $\sigma(S) = \overline{B}(0, 1)$.** We show that for every $|\lambda| \le 1$, the operator $S - \lambda I$ is not surjective. Fix such a $\lambda$ (assume $\lambda \ne 0$; the case $\lambda = 0$ is immediate since $S$ itself is not surjective — the sequence $(1, 0, 0, \ldots)$ is not in the image). Take $b = (b_1, b_2, \ldots) \in \ell^\infty$ and suppose $(S - \lambda I)(a) = b$, i.e.
\begin{align*}
(0, a_1, a_2, \ldots) - \lambda(a_1, a_2, \ldots) = (b_1, b_2, \ldots).
\end{align*}
Comparing components: $-\lambda a_1 = b_1$ gives $a_1 = -\lambda^{-1} b_1$. Then $a_1 - \lambda a_2 = b_2$ gives $a_2 = \lambda^{-1}(a_1 - b_2) = -\lambda^{-1} b_2 - \lambda^{-2} b_1$. Inductively:
\begin{align*}
a_n = -\sum_{k=1}^n \lambda^{-(n-k+1)} b_k = -\lambda^{-1} \sum_{k=1}^n \lambda^{-(n-k)} b_k.
\end{align*}
When $|\lambda| \le 1$, the idea is to choose $b \in \ell^\infty$ so that, for each fixed $n$, all the terms $\lambda^{-(n-k)} b_k$ ($1 \le k \le n$) have the same argument, so that no cancellation occurs. Write $\lambda = |\lambda| e^{i\theta}$ and take $b_k := e^{-ik\theta} = (\bar{\lambda}/|\lambda|)^k$, a sequence of modulus $1$. Then $\lambda^{-(n-k)} b_k = |\lambda|^{-(n-k)} e^{-in\theta}$, whose argument does not depend on $k$, so
\begin{align*}
|a_n| = |\lambda|^{-1} \sum_{k=1}^n |\lambda|^{-(n-k)} = |\lambda|^{-1}\left(1 + |\lambda|^{-1} + \cdots + |\lambda|^{-(n-1)}\right) \ge n,
\end{align*}
since $|\lambda|^{-1} \ge 1$. The equations determine $a$ uniquely, and this $a$ is unbounded, so $a \notin \ell^\infty$ for this choice of $b$, proving that $S - \lambda I$ is not surjective.
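The recursion can be run directly. The sketch below (helper names hypothetical) solves $(S - \lambda I)a = b$ row by row for the bounded choice $b_k = (\bar{\lambda}/|\lambda|)^k$ and confirms $|a_n| \ge n$ when $|\lambda| \le 1$. For contrast, with $|\lambda| > 1$ the same recursion produces a bounded solution, consistent with $\lambda \notin \sigma(S)$.

```python
import cmath


def solve_shift_equation(lam, b):
    """Solve (S - lam I) a = b for the right shift S, lam != 0, row by row.

    Row 1 reads -lam * a_1 = b_1; row n >= 2 reads a_{n-1} - lam * a_n = b_n.
    Starting from a_0 = 0, both rows become a_n = (a_{n-1} - b_n) / lam.
    """
    a, prev = [], 0j
    for bn in b:
        prev = (prev - bn) / lam
        a.append(prev)
    return a


def unimodular_b(lam, n):
    """b_k = (conj(lam) / |lam|)**k for k = 1, ..., n, so that |b_k| = 1."""
    w = lam.conjugate() / abs(lam)
    return [w ** k for k in range(1, n + 1)]


def residual(lam, a, b):
    """Largest entry of (S - lam I) a - b over the first len(b) rows."""
    shifted = [0j] + a[:-1]                       # (S a)_n = a_{n-1}
    return max(abs(s - lam * x - y) for s, x, y in zip(shifted, a, b))


n = 50
for lam in (cmath.exp(0.7j), 0.9 * cmath.exp(2.0j), 2.0 + 0j):
    b = unimodular_b(lam, n)
    a = solve_shift_equation(lam, b)
    print(abs(lam), abs(a[-1]), residual(lam, a, b))
```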
Conversely, we already know $\sigma(S) \subset \overline{B}(0, \|S\|) = \overline{B}(0, 1)$ from the [Spectrum Is Nonempty and Compact](/theorems/889). Therefore $\sigma(S) = \overline{B}(0, 1)$.
**Approximate point spectrum: $\sigma_{ap}(S) = \partial B(0, 1)$.** The approximate point spectrum consists of $\lambda$ for which there exist $x_n$ with $\|x_n\| = 1$ and $\|(S - \lambda I)x_n\| \to 0$. First, $\sigma_{ap}(S) \supset \partial\sigma(S) = \partial B(0, 1)$ (a general fact: the [boundary](/page/Boundary) of the spectrum is always contained in the approximate point spectrum). Second, for $|\lambda| < 1$, the reverse triangle inequality and the isometry property $\|Sx\|_{\ell^\infty} = \|x\|_{\ell^\infty}$ give:
\begin{align*}
\|(S - \lambda I)x\|_{\ell^\infty} = \|Sx - \lambda x\|_{\ell^\infty} \ge \|Sx\|_{\ell^\infty} - |\lambda| \cdot \|x\|_{\ell^\infty} = (1 - |\lambda|)\|x\|_{\ell^\infty}.
\end{align*}
So $\|(S - \lambda I)x\| \ge (1 - |\lambda|) > 0$ for all unit vectors $x$, meaning $\lambda \notin \sigma_{ap}(S)$.
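Combining, and using $\sigma_{ap}(S) \subset \sigma(S) = \overline{B}(0, 1)$ to exclude $|\lambda| > 1$ (if $S - \lambda I$ is invertible, then $\|x_n\| \le \|(S - \lambda I)^{-1}\| \cdot \|(S - \lambda I)x_n\| \to 0$, contradicting $\|x_n\| = 1$): $\sigma_{ap}(S) = \partial B(0, 1) = \{|\lambda| = 1\}$.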
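Both halves of this computation can be illustrated on finitely supported sequences (an assumption that keeps everything finite; helper names are hypothetical). For $|\lambda| = 1$, the "tent" vectors $x_k = \lambda^{-k} g_k$, where $g$ climbs from $1/N$ up to $1$ and back down in steps of $1/N$, satisfy $\|x\|_{\ell^\infty} = 1$ and $\|(S - \lambda I)x\|_{\ell^\infty} = 1/N$, so they are approximate eigenvectors. For $|\lambda| < 1$, random vectors respect the lower bound $(1 - |\lambda|)\|x\|_{\ell^\infty}$.

```python
import cmath
import random


def shift_minus(lam, x):
    """(S - lam I) x for finitely supported x, keeping the extra entry produced by S."""
    sx = [0j] + list(x)                 # S x = (0, x_1, ..., x_n)
    lx = [lam * c for c in x] + [0j]    # lam x, padded to the same length
    return [s - t for s, t in zip(sx, lx)]


def sup_norm(x):
    return max(abs(c) for c in x)


def tent_vector(lam, N):
    """x_k = lam**(-k) * g_k with g = (1/N, 2/N, ..., 1, ..., 2/N, 1/N)."""
    g = [k / N for k in range(1, N + 1)] + [(N - k) / N for k in range(1, N)]
    return [lam ** (-(k + 1)) * gk for k, gk in enumerate(g)]


lam = cmath.exp(1.3j)                   # a point on the unit circle
for N in (10, 100, 1000):
    x = tent_vector(lam, N)
    print(N, sup_norm(x), sup_norm(shift_minus(lam, x)))   # about 1 and 1/N

random.seed(0)
mu = 0.6 - 0.3j                         # |mu| < 1
x = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]
print(sup_norm(shift_minus(mu, x)) >= (1 - abs(mu)) * sup_norm(x))   # True
```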
This example demonstrates a phenomenon impossible in finite dimensions: the entire closed disk is in the spectrum, but no point is an eigenvalue. The "spectral obstruction" comes entirely from failure of surjectivity, not injectivity.
[/example]
[quotetheorem:889]
The three parts of this theorem use different tools. Containment in $\overline{B}(0, \|T\|)$ follows from the Neumann [series](/page/Series): if $|\lambda| > \|T\|$, then $\|\lambda^{-1}T\| < 1$, so $I - \lambda^{-1}T$ is invertible with inverse $\sum_{k=0}^\infty (\lambda^{-1}T)^k$, and hence so is $T - \lambda I = -\lambda(I - \lambda^{-1}T)$. Closedness of $\sigma(T)$ follows from the openness of the set of invertible operators (Lemma: if $S$ is invertible and $\|T - S\| < \|S^{-1}\|^{-1}$, then $T$ is invertible). Nonemptiness is the deep part: it uses Liouville's theorem for Banach-space-valued analytic functions, which itself relies on the Hahn-Banach theorem.
[citeproof:889]
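A small matrix example makes the Neumann-series step concrete. Assuming an illustrative $2 \times 2$ matrix $T$ and a scalar $\lambda$ with $|\lambda| > \|T\|$ (helper names hypothetical), the sketch sums $-\lambda^{-1}\sum_{k<K}(\lambda^{-1}T)^k$ and checks that the result inverts $T - \lambda I$ up to rounding. Indeed $(T - \lambda I)\big(-\lambda^{-1}\sum_{k<K}(\lambda^{-1}T)^k\big) = I - (\lambda^{-1}T)^K$, so the truncation error decays geometrically.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def resolvent_by_neumann(T, lam, terms=60):
    """Approximate (T - lam I)^{-1} = -lam^{-1} * sum_k (lam^{-1} T)^k (valid for |lam| > ||T||)."""
    n = len(T)
    A = [[t / lam for t in row] for row in T]          # lam^{-1} T
    power, total = identity(n), identity(n)
    for _ in range(1, terms):
        power = matmul(power, A)                         # (lam^{-1} T)^k
        total = [[s + p for s, p in zip(rs, rp)] for rs, rp in zip(total, power)]
    return [[-s / lam for s in row] for row in total]


T = [[0.5, 0.2],
     [0.1, 0.3]]
lam = 2.0
R = resolvent_by_neumann(T, lam)
T_minus_lam = [[T[i][j] - (lam if i == j else 0.0) for j in range(2)] for i in range(2)]
print(matmul(T_minus_lam, R))          # close to the identity matrix
```

The same function accepts complex $\lambda$, since only division by $\lambda$ is used.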
## Compact Operators
[definition:Compact Operator]
A linear map $T: X \to Y$ between Banach spaces is **compact** if for every bounded subset $E \subset X$, the image $T(E)$ is precompact in $Y$ (its closure is compact). Equivalently, $T$ is compact if and only if $T(\overline{B}_1(0))$ is precompact. We write $\mathcal{K}(X) := \mathcal{K}(X, X)$ for the set of compact operators on $X$.
[/definition]
Every bounded finite-rank operator (one with finite-dimensional range) is compact, since it maps bounded sets to bounded subsets of a finite-dimensional space, and these are precompact. The set $\mathcal{K}(X)$ is a closed two-sided ideal in $\mathcal{L}(X)$: it is closed under addition, scalar multiplication, and composition with arbitrary bounded operators, and it is closed in the operator norm. The closedness implies that operator-norm limits of finite-rank operators are compact; this is how compact operators arise in practice (e.g., integral operators with smooth kernels are limits of finite-rank approximations).
Compact operators are the "almost finite-dimensional" operators. The [Characterisation of Finite Dimensionality](/theorems/878) shows that the identity $I: X \to X$ is compact if and only if $X$ is finite-dimensional. Consequently, on an infinite-dimensional space no compact operator $T$ is invertible: otherwise $I = T^{-1}T$ would be compact, because $\mathcal{K}(X)$ is an ideal. This forces $0 \in \sigma(T)$ for every compact operator on an infinite-dimensional space.
[example:Integral Operators]
Let $K: \mathbb{R} \to \mathbb{R}$ be a smooth function. Define $T: C([0,1]) \to C([0,1])$ by $(Tf)(x) := \int_0^1 K(x - y) f(y) \, d\mathcal{L}^1(y)$. This operator is compact: since only values $x - y \in [-1, 1]$ occur, if $\|f_n\|_\infty \le 1$ then $\|Tf_n\|_\infty \le \sup_{[-1,1]} |K|$ and, differentiating under the integral sign, $\|(Tf_n)'\|_\infty \le \sup_{[-1,1]} |K'|$, so $\{Tf_n\}$ is bounded and equicontinuous (indeed uniformly Lipschitz). By the [Arzelà-Ascoli Theorem](/theorems/885), the sequence has a uniformly convergent subsequence, so $T(\overline{B}_1(0))$ is precompact.
[/example]
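The smoothing effect can be seen numerically. The sketch below assumes the illustrative kernel $K(t) = e^{-t^2}$ and the bounded sequence $f_n(y) = \sin(n\pi y)$, which has $\|f_n\|_\infty = 1$ but no uniformly convergent subsequence, and approximates $Tf_n$ by the midpoint rule (helper names hypothetical). The sup norms of $Tf_n$ shrink as $n$ grows (integration by parts gives a bound of order $1/n$, uniformly in $x$), so here the images $Tf_n$ even converge uniformly to $0$.

```python
import math


def apply_integral_operator(kernel, f, x, samples=4000):
    """Midpoint-rule approximation of (Tf)(x) = integral_0^1 kernel(x - y) f(y) dy."""
    h = 1.0 / samples
    return h * sum(kernel(x - (k + 0.5) * h) * f((k + 0.5) * h) for k in range(samples))


def sup_of_image(kernel, f, points=11):
    """Estimate ||Tf||_inf by sampling x on a uniform grid of [0, 1]."""
    return max(abs(apply_integral_operator(kernel, f, j / (points - 1)))
               for j in range(points))


def kernel(t):
    return math.exp(-t * t)                      # illustrative smooth kernel K


def f_n(n):
    return lambda y: math.sin(n * math.pi * y)   # ||f_n||_inf = 1 for every n >= 1


for n in (1, 10, 100):
    print(n, sup_of_image(kernel, f_n(n)))
```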
[example:The Diagonal Operator on $\ell^2$]
Let $(\lambda_n)_{n=1}^\infty$ be a bounded sequence of real numbers, and define $T: \ell^2 \to \ell^2$ by $T(x_1, x_2, \ldots) := (\lambda_1 x_1, \lambda_2 x_2, \ldots)$. This is a bounded linear operator with $\|T\| = \sup_n |\lambda_n|$.
We claim: $T$ is compact if and only if $\lambda_n \to 0$ as $n \to \infty$.
**($\Leftarrow$):** Suppose $\lambda_n \to 0$. Define the finite-rank operators $T_N(x) := (\lambda_1 x_1, \ldots, \lambda_N x_N, 0, 0, \ldots)$. Then:
\begin{align*}
\|T - T_N\| = \sup_{n > N} |\lambda_n| \to 0 \text{ as } N \to \infty.
\end{align*}
Each $T_N$ has finite-dimensional range (contained in $\operatorname{span}\{e_1, \ldots, e_N\}$), so $T_N$ is compact. Since compact operators form a closed subspace of $\mathcal{L}(\ell^2)$ and $T_N \to T$ in operator norm, $T$ is compact.
**($\Rightarrow$):** Suppose $\lambda_n \not\to 0$. Then there exists $\varepsilon > 0$ and a subsequence $(n_k)$ with $|\lambda_{n_k}| \ge \varepsilon$ for all $k$. The standard basis vectors $e_{n_k}$ satisfy $\|e_{n_k}\|_{\ell^2} = 1$ and for $j \ne k$:
\begin{align*}
\|Te_{n_j} - Te_{n_k}\|_{\ell^2} = \|(\lambda_{n_j} e_{n_j} - \lambda_{n_k} e_{n_k})\|_{\ell^2} = \sqrt{\lambda_{n_j}^2 + \lambda_{n_k}^2} \ge \varepsilon\sqrt{2}.
\end{align*}
So $(Te_{n_k})_k$ has no convergent subsequence, contradicting compactness of $T$.
This example illustrates a general principle on Hilbert spaces: there, compact operators are exactly the operator-norm limits of finite-rank operators (on general Banach spaces such limits are always compact, but the converse can fail), and the "tail" of the operator must decay. The condition $\lambda_n \to 0$ is precisely the statement that the operator becomes "increasingly finite-rank" in the basis directions.
[/example]
## The Spectral Theorem for Compact [Self-Adjoint Operators](/page/Self-Adjoint%20Operators)
We now restrict to compact self-adjoint operators on a Hilbert space, where the spectral theory is as complete as in finite dimensions. A self-adjoint operator $T \in \mathcal{L}(H)$ satisfies $(Tx, y)_H = (x, Ty)_H$ for all $x, y \in H$. Self-adjointness forces all eigenvalues to be real (for an eigenvector $x \ne 0$ with $Tx = \lambda x$, $\lambda\|x\|^2 = (Tx, x) = (x, Tx) = \bar{\lambda}\|x\|^2$) and eigenvectors for distinct eigenvalues to be orthogonal (if $Tx = \lambda x$ and $Ty = \mu y$, then, since $\mu$ is real, $(\lambda - \mu)(x, y) = (Tx, y) - (x, Ty) = 0$).
The critical observation is that the operator norm of a self-adjoint operator is "witnessed" by the quadratic form: $\|T\| = \sup_{\|x\| = 1} |(Tx, x)_H|$. This is proved using the polarisation identity and the parallelogram law. The compactness of $T$ then allows us to extract an eigenvector. Take unit vectors $x_n$ with $|(Tx_n, x_n)| \to \|T\|$; passing to a subsequence, $(Tx_n, x_n) \to \lambda$ for some $\lambda \in \{\pm\|T\|\}$, and by compactness we may also assume $Tx_n \to y$. The estimate
\begin{align*}
\|Tx_n - \lambda x_n\|^2 = \|Tx_n\|^2 - 2\lambda (Tx_n, x_n) + \lambda^2 \le 2\lambda^2 - 2\lambda(Tx_n, x_n) \to 0
\end{align*}
then shows (when $T \ne 0$) that $x_n = \lambda^{-1}\big(Tx_n - (Tx_n - \lambda x_n)\big) \to \lambda^{-1} y =: x$, a unit vector with $Tx = \lambda x$. So $\pm\|T\|$ is an eigenvalue.
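In finite dimensions the variational characterisation can be observed with power iteration, a standard numerical method that is *not* the argument used in the proof but tracks the same quantity: repeatedly applying a real symmetric matrix $A$ and normalising drives the Rayleigh quotient $(Ax, x)$ towards an eigenvalue of largest modulus, i.e. towards $\pm\|A\|$. The sketch assumes the illustrative matrix below, with eigenvalues $2 - \sqrt{2}$, $2$, $2 + \sqrt{2}$, and a starting vector not orthogonal to the top eigenvector (helper names hypothetical).

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def power_iteration(A, iterations=200):
    """Return (Rayleigh quotient, unit vector) after normalised repeated application of A.

    Assumes A is real symmetric, its eigenvalue of largest modulus is unique in
    modulus, and the starting vector (all ones) is not orthogonal to its eigenvector.
    """
    x = [1.0] * len(A)
    for _ in range(iterations):
        y = matvec(A, x)
        norm = math.sqrt(sum(c * c for c in y))
        x = [c / norm for c in y]
    rayleigh = sum(a * b for a, b in zip(matvec(A, x), x))
    return rayleigh, x


A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
mu, v = power_iteration(A)
residual = math.sqrt(sum((a - mu * b) ** 2 for a, b in zip(matvec(A, v), v)))
print(mu, 2 + math.sqrt(2), residual)
```

If $A$ had both $\|A\|$ and $-\|A\|$ as eigenvalues, plain power iteration would oscillate between them, which is why the uniqueness assumption appears in the docstring.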
[quotetheorem:538]
This theorem is the culmination of the course. It gives a complete diagonalisation of compact self-adjoint operators: $T = \sum_k \mu_k (\cdot, e_k)_H e_k$, where $\{e_k\}$ is an orthonormal set of eigenvectors and $\mu_k \to 0$. The convergence is in operator norm, which is a much stronger statement than pointwise convergence. The decomposition reduces $T$ to a "diagonal" form — the infinite-dimensional analogue of diagonalising a real symmetric matrix by an orthogonal change of basis.
The proof proceeds by induction. At each stage, we find the eigenvalue of largest magnitude (using the variational characterisation and compactness), isolate its eigenspace, restrict $T$ to the orthogonal complement (which is invariant under $T$ by self-adjointness), and repeat. The key points are: compactness forces eigenvalues to accumulate only at zero (if there were infinitely many with $|\mu_k| \ge a > 0$, the corresponding orthonormal eigenvectors would form a bounded sequence whose images satisfy $\|Te_j - Te_k\| \ge a\sqrt{2}$, so no subsequence of the images converges, contradicting compactness); finite dimensionality of eigenspaces follows from the same argument; and the expansion formula $T = \sum \mu_k P_{e_k}$ holds because, with $T_n := \sum_{k=1}^n \mu_k P_{e_k}$ and $H_n := \{e_1, \ldots, e_n\}^\perp$, we have $\|T - T_n\| = \|T|_{H_n}\| = |\mu_{n+1}| \to 0$.
[citeproof:538]
[example:The Right Shift Operator Revisited]
Returning to the right shift $S$ on $\ell^\infty$: $\sigma_p(S) = \varnothing$, $\sigma(S) = \overline{B}(0, 1)$, and the approximate point spectrum $\sigma_{ap}(S) = \partial B(0, 1)$ (the unit circle). This example shows that for general operators (here a non-compact operator on a space that is not even a Hilbert space, so self-adjointness is not available), the spectrum can be a full disk with no eigenvalues at all. This is impossible for compact self-adjoint operators, whose spectrum consists only of eigenvalues (plus possibly zero).
[/example]
The spectral theorem for compact self-adjoint operators is the starting point for more advanced spectral theory: the spectral theorem for bounded self-adjoint operators (which replaces the sum over eigenvalues by an integral over the spectrum), the theory of unbounded operators (essential for quantum mechanics and PDE), and the functional calculus (which allows us to define $f(T)$ for continuous functions $f$ on the spectrum). These topics are developed in Part II courses (Analysis of Functions, Functional Analysis).