# 0. Introduction

This opening chapter fixes the viewpoint of the course. Spectral theory begins with a simple finite-dimensional question: what remains of eigenvalues, diagonalisation, and matrix functions when the [vector space](/page/Vector%20Space) is infinite-dimensional? The answer is not only a list of analogues, because infinite-dimensional operators can fail to have eigenvectors while still having spectral behaviour detected by invertibility, approximation, and holomorphic methods. The course will work mainly with bounded [linear operators on Banach spaces](/page/Linear%20Operators%20on%20Banach%20Spaces) and Hilbert spaces. Banach spaces provide the natural setting for resolvents and complex analysis; Hilbert spaces add orthogonality, adjoints, and normality, which make the strongest diagonalisation theorems possible. This chapter is a map of the terrain, the prerequisites, and the recurring examples.

## What Spectral Theory Tries to Replace

The finite-dimensional model asks us to study a matrix $A \in \mathbb C^{n \times n}$ by the values of $\lambda \in \mathbb C$ for which $A - \lambda I$ is not invertible. In finite dimensions, non-invertibility is equivalent to the existence of a non-zero vector $v$ with $Av = \lambda v$. Infinite-dimensional spaces separate these two ideas, and that separation is the main source of the subject. [definition: Bounded Linear Operator] Let $X$ and $Y$ be normed spaces over $\mathbb C$. A map $T: X \to Y$ is a [bounded linear operator](/page/Bounded%20Linear%20Operator) if $T$ is linear and there exists $C \ge 0$ such that $\|Tx\|_Y \le C\|x\|_X$ for every $x \in X$. [/definition] Boundedness is the continuity condition that makes analysis possible. To compare different operators, discuss convergence of operator sequences, and treat perturbations quantitatively, we need a numerical size attached to each bounded operator.
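In finite dimensions this numerical size can be explored directly. The sketch below is illustrative only: it assumes NumPy, the Euclidean norm on $\mathbb C^n$, and a hypothetical helper `sampled_norm` and example matrix `T`. Sampling random unit vectors gives a lower estimate of the supremum of $\|Tx\|$ over the unit ball. For the Euclidean norm this supremum is attained and equals the largest singular value, which `np.linalg.norm(T, 2)` returns.

```python
import numpy as np

def sampled_norm(T: np.ndarray, samples: int, seed: int = 0) -> float:
    """Lower estimate of sup{||Tx|| : ||x|| <= 1} using random unit vectors x."""
    rng = np.random.default_rng(seed)
    n = T.shape[1]
    # Random complex vectors, one per column, rescaled to Euclidean length 1.
    X = rng.standard_normal((n, samples)) + 1j * rng.standard_normal((n, samples))
    X /= np.linalg.norm(X, axis=0)
    # Largest Euclidean length among the images T x.
    return float(np.max(np.linalg.norm(T @ X, axis=0)))

# Hypothetical example matrix acting on C^2 with the Euclidean norm.
T = np.array([[1.0, 2.0], [0.0, 3.0]])
exact = float(np.linalg.norm(T, 2))   # largest singular value of T
estimate = sampled_norm(T, samples=10_000)

print(f"sampled estimate {estimate:.4f}, exact operator norm {exact:.4f}")
```

Sampling only approaches the supremum from below, so the estimate never exceeds the exact value.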
[definition: Operator Norm] Let $X$ and $Y$ be normed spaces and let $T: X \to Y$ be a bounded linear operator. The operator norm of $T$ is \begin{align*} \|T\|_{\mathcal L(X,Y)} = \sup\{\|Tx\|_Y : x \in X,\ \|x\|_X \le 1\}. \end{align*} [/definition] The notation $\mathcal L(X,Y)$ denotes the normed space of bounded linear operators from $X$ to $Y$, and $\mathcal L(X)$ denotes $\mathcal L(X,X)$. The course often studies $\mathcal L(X)$ not only as a vector space but as an algebra under composition. [example: Multiplication By A Bounded Function] Let $(E,\mathcal E,\mu)$ be a [measure space](/page/Measure%20Space), let $1 \le p \le \infty$, and let $m \in L^\infty(E)$. Define $M_m:L^p(E)\to L^p(E)$ by $(M_m f)(x)=m(x)f(x)$ for representatives. Linearity follows pointwise: for scalars $a,b$ and functions $f,g$, \begin{align*} M_m(af+bg)(x)=m(x)(af(x)+bg(x))=a(M_m f)(x)+b(M_m g)(x). \end{align*} For $1\le p<\infty$, the defining property of the essential supremum gives $|m(x)|\le \|m\|_{L^\infty}$ for almost every $x$, hence \begin{align*} \|M_m f\|_{L^p}^p=\int_E |m(x)f(x)|^p\,d\mu(x)\le \int_E \|m\|_{L^\infty}^p |f(x)|^p\,d\mu(x)=\|m\|_{L^\infty}^p\|f\|_{L^p}^p. \end{align*} Taking $p$th roots gives $\|M_m f\|_{L^p}\le \|m\|_{L^\infty}\|f\|_{L^p}$. For $p=\infty$, the same almost-everywhere bound gives \begin{align*} \|M_m f\|_{L^\infty}=\operatorname*{ess\,sup}_{x\in E}|m(x)f(x)|\le \|m\|_{L^\infty}\|f\|_{L^\infty}. \end{align*} Thus $M_m$ is bounded and $\|M_m\|_{\mathcal L(L^p(E))}\le \|m\|_{L^\infty}$. To see the reverse inequality in the non-degenerate case, fix $0<c<\|m\|_{L^\infty}$ and set $A_c=\{x\in E:|m(x)|>c\}$. By the definition of essential supremum, $\mu(A_c)>0$. For $p=\infty$, take $f=\mathbf 1_{A_c}$; then $\|f\|_{L^\infty}=1$ and \begin{align*} \|M_m f\|_{L^\infty}=\operatorname*{ess\,sup}_{x\in A_c}|m(x)|\ge c. \end{align*} For $1\le p<\infty$, whenever $A_c$ contains a measurable subset $B$ with $0<\mu(B)<\infty$, take $f=\mathbf 1_B$. 
Then $\|f\|_{L^p}=\mu(B)^{1/p}$ and \begin{align*} \|M_m f\|_{L^p}^p=\int_B |m(x)|^p\,d\mu(x)\ge \int_B c^p\,d\mu(x)=c^p\mu(B)=c^p\|f\|_{L^p}^p. \end{align*} Hence $\|M_m\|\ge c$. Since this holds for every $c<\|m\|_{L^\infty}$, we get $\|M_m\|_{\mathcal L(L^p(E))}=\|m\|_{L^\infty}$ for $p=\infty$. For $1\le p<\infty$ the same equality holds whenever each $A_c$ contains such a set $B$, which is automatic when $\mu$ is semifinite (for instance, $\sigma$-finite). Without that hypothesis equality can fail: if $E$ is a single point of infinite measure, then $L^p(E)=\{0\}$ for $p<\infty$, while $\|m\|_{L^\infty}$ may be positive. This example already suggests that the spectrum of an operator should remember the essential range of a function, not only eigenvalues. [/example] The previous example is diagonal in the pointwise sense, but many important operators are not given by multiplication. The next object records the obstruction to inverting $T-\lambda I$ and is the organising definition for the first part of the course. [definition: Spectrum And Resolvent Set] Let $X$ be a complex [Banach space](/page/Banach%20Space) and let $T \in \mathcal L(X)$. The resolvent set of $T$ is \begin{align*} \rho(T)=\{\lambda \in \mathbb C : T-\lambda I \text{ is bijective and } (T-\lambda I)^{-1}\in \mathcal L(X)\}. \end{align*} The spectrum of $T$ is \begin{align*} \sigma(T)=\mathbb C\setminus \rho(T). \end{align*} [/definition] For Banach spaces, the bounded inverse theorem means that bijectivity of $T-\lambda I$ already gives boundedness of the inverse. Boundedness of the inverse is nevertheless written into the definition, because it is not automatic over incomplete normed spaces, and because the resolvent operator itself is the [analytic function](/page/Analytic%20Function) we study. [definition: Resolvent Operator] Let $X$ be a complex Banach space, let $T \in \mathcal L(X)$, and let $\lambda \in \rho(T)$. The resolvent operator of $T$ at $\lambda$ is \begin{align*} R(\lambda,T)=(T-\lambda I)^{-1}:X\to X. \end{align*} [/definition] Since $\lambda\in\rho(T)$, this inverse belongs to $\mathcal L(X)$. The sign convention here uses $T-\lambda I$, so later identities must keep this order.
Some books use $(\lambda I-T)^{-1}$ instead; the mathematical content is the same after changing signs consistently.

## Why Infinite Dimensions Change Eigenvalues

The first warning is that spectral points need not be eigenvalues. In finite dimensions, injectivity and surjectivity are equivalent for linear maps on the same space; in infinite dimensions, an operator may be injective with dense non-closed range, injective with non-dense range, or fail to be injective. Spectral theory refines these failures rather than hiding them. We write $X^*$ for the [dual space](/page/Dual%20Space) of a normed space $X$, meaning the space of bounded linear functionals $f:X\to\mathbb C$. [definition: Point Spectrum] Let $X$ be a complex Banach space and let $T \in \mathcal L(X)$. The point spectrum of $T$ is \begin{align*} \sigma_p(T)=\{\lambda \in \mathbb C : \ker(T-\lambda I)\ne \{0\}\}. \end{align*} [/definition] The point spectrum is the set of genuine eigenvalues. It is often too small to control the operator, so the course introduces approximate eigenvectors and range conditions as complementary spectral data. [definition: Approximate Point Spectrum] Let $X$ be a complex Banach space and let $T \in \mathcal L(X)$. The approximate point spectrum of $T$ is the set of all $\lambda \in \mathbb C$ for which there exists a sequence $(x_n)$ in $X$ such that $\|x_n\|_X=1$ for every $n$ and \begin{align*} \|(T-\lambda I)x_n\|_X \to 0. \end{align*} [/definition] Approximate eigenvectors detect failure of $T-\lambda I$ to be bounded below. This is especially important in Hilbert spaces, where spectral measures and the spectral theorem eventually turn approximate spectral behaviour into a form of diagonalisation. [example: Unilateral Shift] Let $S:\ell^2\to\ell^2$ be the unilateral shift $S(x_1,x_2,\dots)=(0,x_1,x_2,\dots)$. If $Sx=\lambda x$, then comparing coordinates gives \begin{align*} 0=\lambda x_1.
\end{align*} For every $n\ge 1$, the $(n+1)$st coordinate gives \begin{align*} x_n=\lambda x_{n+1}. \end{align*} If $\lambda=0$, the equations $x_n=0\cdot x_{n+1}$ give $x_n=0$ for every $n$. If $\lambda\ne 0$, the first equation gives $x_1=0$, and then $x_n=\lambda x_{n+1}$ gives $x_{n+1}=\lambda^{-1}x_n=0$ inductively. Thus $S$ has no non-zero eigenvector, so $\sigma_p(S)=\varnothing$; in particular, no point of the open unit disc is an eigenvalue of $S$. The adjoint $S^*$ is the backward shift. Indeed, for $x,y\in\ell^2$, \begin{align*} (Sx,y)_{\ell^2}=\sum_{n=1}^{\infty}(Sx)_n\overline{y_n}=\sum_{n=2}^{\infty}x_{n-1}\overline{y_n}=\sum_{k=1}^{\infty}x_k\overline{y_{k+1}}=(x,(y_2,y_3,\dots))_{\ell^2}. \end{align*} Hence $S^*y=(y_2,y_3,\dots)$. For $|\alpha|<1$, the vector \begin{align*} v_\alpha=(1,\alpha,\alpha^2,\dots) \end{align*} belongs to $\ell^2$ because \begin{align*} \|v_\alpha\|_{\ell^2}^2=\sum_{n=0}^{\infty}|\alpha|^{2n}=\frac{1}{1-|\alpha|^2}. \end{align*} Applying $S^*$ gives \begin{align*} S^*v_\alpha=(\alpha,\alpha^2,\alpha^3,\dots)=\alpha(1,\alpha,\alpha^2,\dots)=\alpha v_\alpha. \end{align*} Thus every $\alpha$ with $|\alpha|<1$ is an eigenvalue of $S^*$ even though no point of the open unit disc is an eigenvalue of $S$. Boundary points are detected by approximate eigenvectors for $S$. If $|\lambda|=1$, set \begin{align*} u_N=\frac{1}{\sqrt N}(1,\lambda^{-1},\lambda^{-2},\dots,\lambda^{-(N-1)},0,0,\dots). \end{align*} Then $\|u_N\|_{\ell^2}=1$, and the two vectors $Su_N$ and $\lambda u_N$ differ only in their first and $(N+1)$st coordinates: \begin{align*} (S-\lambda I)u_N=\frac{1}{\sqrt N}(-\lambda,0,\dots,0,\lambda^{-(N-1)},0,\dots). \end{align*} Therefore \begin{align*} \|(S-\lambda I)u_N\|_{\ell^2}^2=\frac{|\lambda|^2+|\lambda|^{-2(N-1)}}{N}=\frac{2}{N}. \end{align*} So $\|(S-\lambda I)u_N\|_{\ell^2}\to 0$.
The shift therefore shows that eigenvalues, adjoint eigenvalues, and approximate eigenvectors can carry different spectral information in infinite dimensions. [/example] This example motivates a central theme: the spectrum is stable and geometrically meaningful even when eigenvectors are absent. The course will return to shifts as test cases for spectral radius, approximate spectrum, and the distinction between normal and non-normal behaviour. [remark: Finite-Dimensional Intuition] In finite-dimensional complex vector spaces, every spectral point is an eigenvalue because $T-\lambda I$ fails to be invertible exactly when its kernel is non-zero. In infinite-dimensional Banach spaces, the failure of invertibility may instead come from a non-surjective range or from an inverse that exists algebraically on the range but is not bounded. The course treats these possibilities as structural information, not as pathologies. [/remark] The finite-dimensional picture still supplies the right questions. What replaces the [characteristic polynomial](/page/Characteristic%20Polynomial)? How large can the spectrum be? When can an operator be represented by multiplication by the independent variable? These questions lead to resolvent methods first, and Hilbert-space spectral theorems later.

## The Resolvent As An Analytic Object

The course begins with a question about stability: if $T-\lambda I$ is invertible, must $T-\mu I$ remain invertible for nearby $\mu$? The relevant mechanism is the elementary Neumann-series principle: on a Banach space, if $\|A\|_{\mathcal L(X)}<1$, then $I-A$ is invertible and its inverse is represented by the norm-convergent geometric series $\sum_{n=0}^{\infty}A^n$. Completeness is essential because the inverse is obtained as a norm limit in $\mathcal L(X)$, and the strict norm bound is a sufficient condition for convergence rather than a necessary condition for invertibility. For example, $A=2I$ has norm $2$, but $I-A=-I$ is still invertible.
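The Neumann-series principle can be checked on matrices, where completeness is automatic. In this sketch (NumPy assumed; the helper `neumann_partial_sum` and the random test matrix are illustrative), $A$ is rescaled so that $\|A\|=\tfrac12$, and the error of each partial sum is compared with the geometric tail bound $\|A\|^N/(1-\|A\|)$. The last lines revisit the $A=2I$ example, where the series diverges although $I-A$ is invertible.

```python
import numpy as np

def neumann_partial_sum(A: np.ndarray, terms: int) -> np.ndarray:
    """Return I + A + A^2 + ... + A^(terms-1)."""
    n = A.shape[0]
    total = np.zeros((n, n), dtype=A.dtype)
    power = np.eye(n, dtype=A.dtype)
    for _ in range(terms):
        total += power
        power = power @ A
    return total

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = 0.5 * B / np.linalg.norm(B, 2)        # rescaled so that ||A|| = 1/2 < 1
I = np.eye(4)
exact = np.linalg.inv(I - A)

for terms in (5, 10, 20, 40):
    error = np.linalg.norm(neumann_partial_sum(A, terms) - exact, 2)
    # Tail estimate: ||sum_{n >= terms} A^n|| <= ||A||^terms / (1 - ||A||).
    tail_bound = 0.5**terms / (1 - 0.5)
    print(terms, error, error <= tail_bound + 1e-12)

# The norm condition is sufficient, not necessary: for A = 2I the series
# diverges, yet I - A = -I is invertible.
A2 = 2.0 * np.eye(2)
print(np.allclose(np.linalg.inv(np.eye(2) - A2), -np.eye(2)))
```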
For the resolvent, this principle is applied to the scalar perturbation \begin{align*} T-\mu I=(T-\lambda I)\bigl(I-(\mu-\lambda)R(\lambda,T)\bigr), \end{align*} whenever $\lambda\in\rho(T)$. Thus nearby parameters remain in the resolvent set as long as $|\mu-\lambda|\,\|R(\lambda,T)\|<1$. The same expansion also explains why the resolvent depends analytically on the spectral parameter. [quotetheorem:8387] [citeproof:8387] Analyticity is the bridge from operator theory to complex analysis, and the hypotheses explain why this is a Banach-space theorem over $\mathbb C$. Completeness enters through the Neumann series in $\mathcal L(X)$: if $X=c_{00}$ with the sup norm and $A=\frac12 S$, where $S$ is the right shift, then $\|A\|<1$, but the equation $(I-A)y=e_1$ is solved only by $y=(1,\frac12,\frac14,\dots)\notin c_{00}$. Hence $I-A$ is not surjective on $X$, and the Neumann inverse lives in the completion rather than in $X$. The complex scalar field is also part of the statement, because the conclusion is holomorphy of $\lambda\mapsto R(\lambda,T)$; over a real Banach space there is only a real-parameter resolvent unless the space is complexified. The theorem is local and says only that invertibility persists under sufficiently small scalar perturbations. It does not say that spectral points are isolated, that spectral points are eigenvalues, or that the boundary of the resolvent set has a discrete shape. Those global questions are deliberately postponed to the first development chapter, where the compactness and nonemptiness of the spectrum, the spectral radius formula, and the Volterra example appear in their natural order.

## Hilbert Spaces And Adjoint Structure

The second half of the course asks when an operator can be treated like a diagonal matrix. Banach-space methods give the spectrum and resolvent, but Hilbert spaces add the adjoint, orthogonality, and projection geometry. These features make self-adjoint and normal operators the infinite-dimensional analogues of Hermitian and normal matrices.
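In the matrix model behind this analogy, $\mathbb C^n$ carries the inner product $(x,y)=\sum_k x_k\overline{y_k}$, linear in the first argument as in these notes, and the adjoint of a matrix is its conjugate transpose. The short NumPy check below uses random, purely illustrative data and a hypothetical helper `inner`. Note that `np.vdot(y, x)` computes the same quantity, because `np.vdot` conjugates its first argument.

```python
import numpy as np

def inner(x: np.ndarray, y: np.ndarray) -> complex:
    """(x, y) = sum_k x_k * conj(y_k): linear in x, conjugate-linear in y."""
    return complex(np.sum(x * np.conj(y)))

rng = np.random.default_rng(1)
n = 5
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
T_star = T.conj().T                       # conjugate transpose

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# (Tx, y) = (x, T* y) holds for the conjugate transpose ...
print(abs(inner(T @ x, y) - inner(x, T_star @ y)))
# ... but generally not for the plain transpose when T has complex entries.
print(abs(inner(T @ x, y) - inner(x, T.T @ y)))

# A Hermitian matrix H = H* is the finite-dimensional model of a self-adjoint operator.
H = T + T_star
print(abs(inner(H @ x, y) - inner(x, H @ y)))
```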
The definitions in this section are not merely a vocabulary list. Each one names a different way in which an operator can respect Hilbert-space geometry: the adjoint encodes how an operator moves inner products, self-adjointness and normality control spectral decomposition, unitaries and isometries preserve lengths, projections split the space orthogonally, and positivity imports order-like information into operator theory. The first construction is the adjoint, because all of the later classes are expressed by comparing $T$ with $T^*$. [definition: Adjoint Operator] Let $H$ be a complex Hilbert space and let $T \in \mathcal L(H)$. The adjoint of $T$ is the unique operator $T^*\in \mathcal L(H)$ such that \begin{align*} (Tx,y)_H=(x,T^*y)_H \end{align*} for all $x,y\in H$. [/definition] The existence of the adjoint is a Hilbert-space result using the [Riesz representation theorem](/theorems/218). Once $T^*$ is available, the next question is which compatibility conditions between $T$ and $T^*$ reproduce the spectral behaviour of Hermitian and normal matrices. The two most important answers are self-adjointness, which is the operator analogue of being Hermitian, and normality, which is the broader condition under which adjoints commute with the original operator and spectral decompositions remain possible. [definition: Self-Adjoint And Normal Operators] Let $H$ be a complex Hilbert space and let $T\in \mathcal L(H)$. The operator $T$ is self-adjoint if $T=T^*$. The operator $T$ is normal if $TT^*=T^*T$. [/definition] Self-adjoint operators are normal, and normal operators are the correct setting for the bounded spectral theorem. In finite dimensions, normality is exactly the hypothesis that permits unitary diagonalisation. Infinite-dimensional Hilbert spaces introduce a new obstruction: even a bounded self-adjoint operator need not have any eigenvectors. 
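For matrices, the link between normality and unitary diagonalisation can be tested numerically. The sketch below assumes NumPy; the helpers `is_normal` and `eigenbasis_is_orthonormal` and both test matrices are illustrative choices. It relies on the fact that eigenvectors of a normal matrix belonging to distinct eigenvalues are orthogonal. So when the eigenvalues are distinct, the unit eigenvectors returned by `np.linalg.eig` should form a unitary matrix; for a non-normal matrix they need not.

```python
import numpy as np

def is_normal(A: np.ndarray, tol: float = 1e-10) -> bool:
    """Check A A* = A* A up to a tolerance."""
    A_star = A.conj().T
    return bool(np.linalg.norm(A @ A_star - A_star @ A) < tol)

def eigenbasis_is_orthonormal(A: np.ndarray, tol: float = 1e-8) -> bool:
    """Check whether the unit eigenvectors returned by numpy form a unitary matrix.

    Only meaningful when the eigenvalues are distinct, so that each eigenvector
    is determined up to a scalar factor.
    """
    _, V = np.linalg.eig(A)
    return bool(np.linalg.norm(V.conj().T @ V - np.eye(A.shape[0])) < tol)

rng = np.random.default_rng(2)
# A random unitary Q from the QR factorisation of a random complex matrix.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
# Normal but not self-adjoint: unitary conjugate of a complex diagonal matrix.
N = Q @ np.diag([1.0, 2.0j, -1.5, 0.5 + 0.5j]) @ Q.conj().T
# Not normal: upper triangular with distinct eigenvalues 1 and 2.
J = np.array([[1.0, 1.0], [0.0, 2.0]])

print(is_normal(N), eigenbasis_is_orthonormal(N))
print(is_normal(J), eigenbasis_is_orthonormal(J))
```

For `N` both checks should succeed. For `J` both fail: its unit eigenvectors are multiples of $(1,0)$ and $(1,1)/\sqrt2$, which are not orthogonal.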
The compact self-adjoint spectral theorem, introduced later in the compact-operator discussion, is used here in its structural role: compactness recovers a discrete spectral picture, where nonzero spectral values are detected by genuine eigenvectors and the space splits orthogonally around them. This result is the prototype for later developments: spectral information becomes an [orthogonal decomposition](/theorems/436). Its hypotheses also mark the limits of the eigenvector method. Compactness is essential for this conclusion; multiplication by $x$ on $L^2([0,1])$ is bounded and self-adjoint but has no eigenvectors, with spectrum filling the interval $[0,1]$. That example is the reason later chapters introduce special language for spectral values attached to multiplication and unitary models before passing to the full spectral-measure theorem. Self-adjointness, or more generally normality, is also essential: the Volterra operator $(Vf)(x)=\int_0^x f(t)\,dt$ on $L^2([0,1])$ is compact but not self-adjoint, and it has no eigenvalues at all (if $Vf=\lambda f$ with $\lambda\ne 0$, then $f$ is continuous with $f'=\lambda^{-1}f$ and $f(0)=0$, so $f=0$; and $Vf=0$ forces $f=0$ almost everywhere), so no orthonormal eigenbasis can diagonalise it. Thus the compact theorem should be read as the discrete case of spectral theory. Later chapters replace the discrete sum over eigenvectors by a more flexible functional calculus, which treats an operator by applying suitable scalar functions to its spectrum. [example: Compact Diagonal Operator] Let $H=\ell^2$, let $(e_n)$ be the standard [orthonormal basis](/page/Orthonormal%20Basis), and define \begin{align*} K(x_1,x_2,\dots)=(a_1x_1,a_2x_2,\dots), \end{align*} where each $a_n\in\mathbb R$ and $a_n\to 0$. Since a convergent sequence is bounded, set $M=\sup_{n\ge 1}|a_n|<\infty$. For $x=(x_n)\in\ell^2$, \begin{align*} \|Kx\|_{\ell^2}^2=\sum_{n=1}^{\infty}|a_nx_n|^2\le M^2\sum_{n=1}^{\infty}|x_n|^2=M^2\|x\|_{\ell^2}^2. \end{align*} Thus $K$ is bounded and $\|K\|\le M$. The operator is self-adjoint.
Indeed, for $x=(x_n)$ and $y=(y_n)$ in $\ell^2$, using that the [inner product](/page/Inner%20Product) is linear in the first variable, \begin{align*} (Kx,y)_{\ell^2}=\sum_{n=1}^{\infty}a_nx_n\overline{y_n}. \end{align*} Since $a_n=\overline{a_n}$, \begin{align*} (x,Ky)_{\ell^2}=\sum_{n=1}^{\infty}x_n\overline{a_ny_n}=\sum_{n=1}^{\infty}x_n a_n\overline{y_n}=\sum_{n=1}^{\infty}a_nx_n\overline{y_n}. \end{align*} Hence $(Kx,y)_{\ell^2}=(x,Ky)_{\ell^2}$ for all $x,y$, so $K=K^*$. To prove compactness, define the finite-rank truncation \begin{align*} K_N(x_1,x_2,\dots)=(a_1x_1,\dots,a_Nx_N,0,0,\dots). \end{align*} Then $K_N$ has range contained in $\operatorname{span}\{e_1,\dots,e_N\}$, so $K_N$ is compact by the finite-dimensional compactness criterion. For $\|x\|_{\ell^2}\le 1$, \begin{align*} \|(K-K_N)x\|_{\ell^2}^2=\sum_{n>N}|a_nx_n|^2\le \left(\sup_{n>N}|a_n|\right)^2\sum_{n>N}|x_n|^2\le \left(\sup_{n>N}|a_n|\right)^2. \end{align*} Therefore \begin{align*} \|K-K_N\|\le \sup_{n>N}|a_n|. \end{align*} Because $a_n\to 0$, the right side tends to $0$, so $K$ is the operator-[norm limit of compact operators](/theorems/4892). Hence $K$ is compact. Each standard basis vector is an eigenvector: \begin{align*} Ke_j=(0,\dots,0,a_j,0,\dots)=a_je_j. \end{align*} Thus every listed value $a_j$ belongs to $\sigma_p(K)$, hence to $\sigma(K)$. Now compute the whole spectrum. If $\lambda\notin \overline{\{a_n:n\in\mathbb N\}}$, then \begin{align*} d=\inf_{n\ge 1}|a_n-\lambda|>0. \end{align*} Define \begin{align*} R_\lambda(y_1,y_2,\dots)=\left(\frac{y_1}{a_1-\lambda},\frac{y_2}{a_2-\lambda},\dots\right). \end{align*} For $y\in\ell^2$, \begin{align*} \|R_\lambda y\|_{\ell^2}^2=\sum_{n=1}^{\infty}\frac{|y_n|^2}{|a_n-\lambda|^2}\le d^{-2}\sum_{n=1}^{\infty}|y_n|^2=d^{-2}\|y\|_{\ell^2}^2. \end{align*} Also, coordinate by coordinate, \begin{align*} ((K-\lambda I)R_\lambda y)_n=(a_n-\lambda)\frac{y_n}{a_n-\lambda}=y_n. 
\end{align*} And for $x\in\ell^2$, \begin{align*} (R_\lambda(K-\lambda I)x)_n=\frac{(a_n-\lambda)x_n}{a_n-\lambda}=x_n. \end{align*} Thus $K-\lambda I$ is invertible with bounded inverse $R_\lambda$, so $\lambda\in\rho(K)$. Conversely, if $\lambda\in\overline{\{a_n:n\in\mathbb N\}}$ and $\lambda=a_j$ for some $j$, then $(K-\lambda I)e_j=0$, so $K-\lambda I$ is not injective. If $\lambda$ is a [limit point](/page/Limit%20Point) not equal to any $a_j$, choose indices $n_k$ with $a_{n_k}\to\lambda$. Then $\|e_{n_k}\|_{\ell^2}=1$ and \begin{align*} \|(K-\lambda I)e_{n_k}\|_{\ell^2}=|a_{n_k}-\lambda|\to 0. \end{align*} If $K-\lambda I$ had a bounded inverse, then \begin{align*} 1=\|e_{n_k}\|_{\ell^2}\le \|(K-\lambda I)^{-1}\|\,\|(K-\lambda I)e_{n_k}\|_{\ell^2}\to 0, \end{align*} which is impossible. Hence $\lambda\in\sigma(K)$. Therefore \begin{align*} \sigma(K)=\overline{\{a_n:n\in\mathbb N\}}. \end{align*} Since $a_n\to 0$, the only possible accumulation point is $0$, so this diagonal compact self-adjoint operator has exactly the spectral shape predicted by the compact self-adjoint theory: eigenvalues along the diagonal, with accumulation only at $0$. [/example] The compact theorem explains why compact self-adjoint operators behave like diagonal matrices with entries tending to zero. Removing compactness forces a new language, because multiplication by a [continuous function](/page/Continuous%20Function) $g$ on $L^2$ of an interval has $c$ as an eigenvalue only when $g=c$ on a set of positive measure. Multiplication by a non-constant polynomial, for example, has no eigenvectors at all, yet its spectrum is still perfectly controlled.

## Functional Calculus As The Course Destination

The final question of the course is how to make sense of $f(T)$ when $f$ is a function defined on the spectrum of $T$. For matrices, polynomials in $A$ are immediate and diagonalisation extends the definition to many functions. For bounded normal operators, the spectral theorem supplies the same idea without requiring eigenvectors.
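For Hermitian matrices, the diagonalisation route to $f(A)$ can be sketched as follows. The sketch assumes NumPy and a hypothetical helper `hermitian_function` that sets $f(A)=U\,\mathrm{diag}(f(w_1),\dots,f(w_n))\,U^*$ from a unitary diagonalisation $A=U\,\mathrm{diag}(w)\,U^*$. For a polynomial, the result agrees with direct evaluation. The same recipe also gives meaning to non-polynomial functions, such as a square root of the positive matrix $A^*A$.

```python
import numpy as np

def hermitian_function(A: np.ndarray, f) -> np.ndarray:
    """f(A) for a Hermitian matrix A, defined through A = U diag(w) U^*."""
    w, U = np.linalg.eigh(A)        # real eigenvalues w, unitary U
    return U @ np.diag(f(w)) @ U.conj().T

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T                  # Hermitian test matrix
I = np.eye(4)

# For a polynomial, the spectral definition agrees with direct evaluation.
p_direct = A @ A - 3 * A + 2 * I
p_spectral = hermitian_function(A, lambda t: t**2 - 3 * t + 2)
print(np.allclose(p_direct, p_spectral))

# For a non-polynomial function the spectral definition still makes sense:
# the positive matrix A^* A gets a positive square root.
P = A.conj().T @ A
root = hermitian_function(P, lambda t: np.sqrt(np.clip(t, 0.0, None)))
print(np.allclose(root @ root, P))
```

Clipping at zero only removes tiny negative eigenvalues produced by rounding. In exact arithmetic the eigenvalues of $A^*A$ are non-negative.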
[definition: Polynomial Functional Calculus] Let $X$ be a complex Banach space and let $T\in\mathcal L(X)$. For a polynomial $p(z)=\sum_{k=0}^n a_k z^k$, define \begin{align*} p(T)=\sum_{k=0}^n a_k T^k \in \mathcal L(X). \end{align*} [/definition] The polynomial calculus is algebraic, but it already interacts strongly with the spectrum. The first compatibility question is whether applying a polynomial to an operator transforms spectral values by applying the same polynomial to the original spectrum. [quotetheorem:2671] [citeproof:2671] The polynomial result is a first version of the functional calculus, and its proof shows exactly where the hypotheses enter. The scalar field is complex so that $p(z)-\mu$ factors into linear terms. Without that factorisation, the real version with real spectra can fail: if $J$ is rotation by $90$ degrees on $\mathbb R^2$, then $\sigma_{\mathbb R}(J)=\varnothing$, but for $p(t)=t^2+1$ we have $p(J)=0$, so $\sigma_{\mathbb R}(p(J))=\{0\}\ne p(\varnothing)$. Boundedness keeps every expression inside the Banach algebra $\mathcal L(X)$; outside that setting, polynomial expressions in an unbounded operator require domain control, and the simple product-of-invertibles proof no longer applies without extra hypotheses. The theorem is not yet a continuous or Borel functional calculus: it only treats polynomials, and it gives no direct meaning to $f(T)$ for an arbitrary continuous function $f$ on $\sigma(T)$. Later chapters extend the same spectral compatibility first to rational functions on Banach spaces and then to continuous functions on the spectrum of a bounded normal operator on a Hilbert space. [definition: Continuous Functional Calculus] Let $H$ be a complex Hilbert space and let $N\in\mathcal L(H)$ be normal. A continuous functional calculus for $N$ is a unital $*$-homomorphism \begin{align*} C(\sigma(N))\to \mathcal L(H), \qquad f\mapsto f(N), \end{align*} which sends the coordinate function $z\mapsto z$ to $N$. 
[/definition] This definition states the destination rather than proving existence. The bounded normal spectral theorem asserts that such a calculus exists and is isometric, so spectral questions about $N$ can be translated into function theory on the compact set $\sigma(N)$. [example: Multiplication Operators As Functional Calculus] Let $(E,\mathcal E,\mu)$ be a measure space and let $m\in L^\infty(E)$. On $L^2(E)$, multiplication by $m$ is bounded by the earlier multiplication-operator estimate, and its adjoint is multiplication by $\overline m$. Indeed, using the convention that the inner product is linear in the first variable, for $g,h\in L^2(E)$ we have \begin{align*} (M_m g,h)_{L^2}=\int_E m(x)g(x)\overline{h(x)}\,d\mu(x). \end{align*} Since $\overline{(\overline{m(x)}h(x))}=m(x)\overline{h(x)}$, this becomes \begin{align*} (M_m g,h)_{L^2}=\int_E g(x)\overline{\overline{m(x)}h(x)}\,d\mu(x)=(g,M_{\overline m}h)_{L^2}. \end{align*} Thus $M_m^*=M_{\overline m}$. Now, for every $g\in L^2(E)$, \begin{align*} (M_mM_m^*g)(x)=m(x)\overline{m(x)}g(x)=|m(x)|^2g(x). \end{align*} Also, \begin{align*} (M_m^*M_mg)(x)=\overline{m(x)}m(x)g(x)=|m(x)|^2g(x). \end{align*} Hence $M_mM_m^*=M_m^*M_m$, so $M_m$ is normal. The polynomial calculus already shows why the formula should be $f(M_m)=M_{f\circ m}$. If $p(z)=\sum_{k=0}^n c_kz^k$, then $M_m^0=I=M_1$, and induction gives $M_m^k=M_{m^k}$ for every $k\ge 1$, because \begin{align*} (M_mM_{m^k}g)(x)=m(x)m(x)^kg(x)=m(x)^{k+1}g(x). \end{align*} Therefore, for every $g\in L^2(E)$, \begin{align*} (p(M_m)g)(x)=\sum_{k=0}^n c_km(x)^kg(x)=p(m(x))g(x). \end{align*} Thus $p(M_m)=M_{p\circ m}$. Since $M_m^*=M_{\overline m}$, the same computation gives $q(M_m,M_m^*)=M_{q(m,\overline m)}$ for every polynomial $q(z,\bar z)$ in $z$ and $\bar z$, and by the Stone-Weierstrass theorem such polynomials are dense in $C(\sigma(M_m))$. The continuous functional calculus for the normal operator $M_m$ extends this rule from these polynomials to continuous functions on $\sigma(M_m)$, so it gives \begin{align*} f(M_m)=M_{f\circ m}. \end{align*} This is the basic model behind the slogan that normal operators are generalised multiplication operators.
[/example] Functional calculus is why the course does not stop at listing spectral sets. It provides a way to build projections, square roots, exponentials, and evolution operators from spectral data, and it prepares the language used in $C^*$-algebras and quantum mechanics.

## Prerequisites And Recurring Conventions

The course assumes that Banach and Hilbert space basics are already available, but it will reuse them constantly. The guiding question in this chapter is not whether these facts are new, but which ones will be invoked so often that they should be kept visible. [definition: Banach Space] A Banach space is a [normed vector space](/page/Normed%20Vector%20Space) $(X,\|\cdot\|_X)$ that is complete with respect to the metric induced by its norm. [/definition] Completeness of $X$ makes $\mathcal L(X)$ complete, so absolutely convergent series of operators converge to operators; every use of the Neumann series depends on this point. The Hilbert-space part of the course requires an additional structure: a complete norm coming from an inner product, which turns geometry into algebra through orthogonality. [definition: Hilbert Space] A Hilbert space is an [inner product space](/page/Inner%20Product%20Space) $H$ that is complete with respect to the norm $\|x\|_H=(x,x)_H^{1/2}$. [/definition] The Hilbert-space inner product is taken linear in the first argument. Orthogonal decompositions, adjoints, and projection-valued measures all use this convention. The remaining prerequisite is a compactness notion for operators, because compact operators are the first infinite-dimensional class whose spectral theory is still largely eigenvalue-based. [definition: Compact Operator] Let $X$ and $Y$ be Banach spaces. An operator $K\in\mathcal L(X,Y)$ is compact if, for every bounded sequence $(x_n)$ in $X$, the sequence $(Kx_n)$ has a convergent subsequence in $Y$. [/definition] Compact operators form the bridge between finite-dimensional linear algebra and general spectral theory.
They need not have finite rank, but their effect on bounded sequences is finite-dimensional enough to force discrete non-zero spectrum. [example: Integral Operator With Continuous Kernel] Let $K:[0,1]\times[0,1]\to\mathbb C$ be continuous and define $T:C[0,1]\to C[0,1]$ by \begin{align*} (Tf)(x)=\int_0^1 K(x,y)f(y)\,dy. \end{align*} Linearity follows from linearity of the integral: \begin{align*} T(af+bg)(x)=\int_0^1 K(x,y)(af(y)+bg(y))\,dy=a(Tf)(x)+b(Tg)(x). \end{align*} Since $K$ is continuous on the compact square $[0,1]^2$, it is bounded; set $M=\sup_{(x,y)\in[0,1]^2}|K(x,y)|$. Then \begin{align*} |(Tf)(x)|\le \int_0^1 |K(x,y)||f(y)|\,dy\le M\|f\|_\infty. \end{align*} Taking the supremum over $x$ gives $\|Tf\|_\infty\le M\|f\|_\infty$, so $T$ is bounded. We also check that $Tf$ is continuous. Since $K$ is uniformly continuous on $[0,1]^2$, for every $\varepsilon>0$ there is $\delta>0$ such that $|x-x'|<\delta$ implies $|K(x,y)-K(x',y)|<\varepsilon/(1+\|f\|_\infty)$ for every $y\in[0,1]$. Hence \begin{align*} |(Tf)(x)-(Tf)(x')|\le \int_0^1 |K(x,y)-K(x',y)||f(y)|\,dy\le \varepsilon. \end{align*} Thus $Tf\in C[0,1]$. To prove compactness, let $(f_n)$ be a bounded sequence in $C[0,1]$, and choose $R$ such that $\|f_n\|_\infty\le R$ for all $n$. The previous estimate gives \begin{align*} \|Tf_n\|_\infty\le MR. \end{align*} For equicontinuity, [uniform continuity](/page/Uniform%20Continuity) of $K$ gives, for every $\varepsilon>0$, a $\delta>0$ such that $|x-x'|<\delta$ implies $\sup_{y\in[0,1]}|K(x,y)-K(x',y)|<\varepsilon/(1+R)$. Therefore \begin{align*} |(Tf_n)(x)-(Tf_n)(x')|\le R\int_0^1 |K(x,y)-K(x',y)|\,dy\le \varepsilon \end{align*} for every $n$. The family $\{Tf_n:n\in\mathbb N\}$ is uniformly bounded and equicontinuous, so by the *Arzela-Ascoli Theorem* it has a uniformly convergent subsequence in $C[0,1]$. Hence $T$ is compact. In the $L^2$ setting, the [symmetry condition](/theorems/1360) $K(x,y)=\overline{K(y,x)}$ gives self-adjointness. 
For $g,h\in L^2[0,1]$, boundedness of $K$ and Cauchy-Schwarz justify Fubini, and \begin{align*} (Tg,h)_{L^2}=\int_0^1\int_0^1 K(x,y)g(y)\overline{h(x)}\,dy\,dx=\int_0^1 g(y)\int_0^1 \overline{K(y,x)}\,\overline{h(x)}\,dx\,dy=(g,Th)_{L^2}. \end{align*} Thus continuous kernels supply a concrete source of compact operators, and symmetric continuous kernels supply compact self-adjoint operators in Hilbert-space examples. [/example] The course also uses [weak convergence](/page/Weak%20Convergence), dual spaces, quotient arguments, and elementary complex analysis. The central complex-analytic inputs are [power series](/page/Power%20Series) for operator-valued holomorphic functions, [Cauchy's theorem](/theorems/797) in Banach-valued form via scalar testing, and [Liouville's theorem](/theorems/38). [remark: Scope Of The Course] The course studies bounded operators. Unbounded self-adjoint operators, closed operators, semigroups, and the spectral theory of differential operators are natural continuations, but they require domain considerations that are deliberately postponed. The bounded theory developed here supplies the language and many of the methods for those later topics. [/remark] The next chapter begins the systematic development by treating bounded operators as elements of the Banach algebra $\mathcal L(X)$. From there the [resolvent identity](/theorems/8388), the compactness of the spectrum, and the spectral radius formula become the basic tools for everything that follows, so that later chapters can build on concrete analytic tools rather than on intuition alone.

# 1. Operators, Resolvents, and the Spectrum

This first development chapter turns the overview from Chapter 0 into the working language used throughout spectral theory. The prerequisites are the basic theory of complex Banach spaces, bounded linear operators, operator norms, and the standard complex-analysis facts used later, especially [Liouville's theorem](/theorems/346) and power-series arguments. The central shift is from studying a [linear map](/page/Linear%20Map) by its values on vectors to studying the family of operators $T-\lambda I$ as the complex parameter $\lambda$ varies. The resolvent records where this family is invertible, while the spectrum records the precise obstruction to solving $(T-\lambda I)x=y$ inside the Banach space.

## Invertibility and Spectral Obstructions

For a finite-dimensional matrix, failure of invertibility means a non-zero kernel, that is, an eigenvector for the eigenvalue $0$. On an infinite-dimensional Banach space there are several ways invertibility can fail: the equation may have non-unique solutions, it may be solvable only for $y$ in a dense but non-closed range, or the range may fail to be dense, so that its closure is a proper closed subspace of the codomain. The first task is to separate these failures, since later theorems detect different parts of the spectrum by different methods. Let $X$ be a complex Banach space. We write $\mathcal{L}(X)$ for the Banach algebra of bounded linear operators $T:X\to X$, equipped with the operator norm $\|T\|_{\mathcal{L}(X)}$. [definition: Invertible Operator in a Banach Algebra] Let $X$ be a Banach space and let $T\in \mathcal{L}(X)$. The operator $T$ is invertible in $\mathcal{L}(X)$ if there exists $S\in \mathcal{L}(X)$ such that \begin{align*} ST=TS=I. \end{align*} The inverse, when it exists, is denoted $T^{-1}$. [/definition] This definition requires the inverse to be bounded and defined on all of $X$.
For bounded bijections between Banach spaces, boundedness of the inverse is automatic by the bounded inverse theorem; nevertheless, including $T^{-1}\in\mathcal{L}(X)$ keeps the algebraic and analytic requirements visible. [example: Right Shift Is an Isometry but Not Onto] Let $S:\ell^2\to\ell^2$ be the unilateral right shift \begin{align*} S(x_1,x_2,x_3,\dots)=(0,x_1,x_2,\dots). \end{align*} For $x=(x_1,x_2,x_3,\dots)\in\ell^2$, \begin{align*} \|Sx\|_{\ell^2}^2=|0|^2+\sum_{n=1}^{\infty}|x_n|^2=\|x\|_{\ell^2}^2. \end{align*} Hence $\|Sx\|_{\ell^2}=\|x\|_{\ell^2}$ for every $x\in\ell^2$. In particular, if $Sx=0$, then $\|x\|_{\ell^2}=\|Sx\|_{\ell^2}=0$, so $x=0$; thus $S$ is injective. Its range is exactly \begin{align*} \operatorname{Range}(S)=\{y=(y_1,y_2,y_3,\dots)\in\ell^2:y_1=0\}. \end{align*} Indeed, every vector $Sx$ has first coordinate $0$. Conversely, if $y_1=0$, then $x=(y_2,y_3,y_4,\dots)$ belongs to $\ell^2$ because \begin{align*} \sum_{n=1}^{\infty}|x_n|^2=\sum_{n=2}^{\infty}|y_n|^2\le \sum_{n=1}^{\infty}|y_n|^2<\infty, \end{align*} and $Sx=y$. This range is closed: if $y^{(m)}\in\operatorname{Range}(S)$ and $y^{(m)}\to y$ in $\ell^2$, then \begin{align*} |y_1|=|y_1-y^{(m)}_1|\le \|y-y^{(m)}\|_{\ell^2}\to 0, \end{align*} so $y_1=0$ and $y\in\operatorname{Range}(S)$. The vector $e_1=(1,0,0,\dots)$ is not in $\operatorname{Range}(S)$, since every vector in the range has first coordinate $0$. Therefore $S$ is not onto. If $S$ were invertible in $\mathcal L(\ell^2)$, then for every $y\in\ell^2$ we would have $y=S(S^{-1}y)$, so $S$ would be onto. Hence $S$ is not invertible in $\mathcal L(\ell^2)$. This example shows that preserving norms does not force invertibility in infinite-dimensional spaces. [/example] Once invertibility is phrased inside $\mathcal{L}(X)$, the spectral question asks for which scalars the shifted operator $T-\lambda I$ is invertible. 
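Returning briefly to the right-shift example, its isometry and range computations can be mirrored on finitely supported sequences. This is a minimal sketch under the assumption that a sequence is stored as a Python list whose omitted tail is zero; the helper names `right_shift`, `left_shift`, and `l2_norm` are hypothetical, and a finite check only illustrates the algebra, it proves nothing about $\ell^2$.

```python
import math


def right_shift(x):
    """Unilateral right shift S on a finitely supported sequence (tail assumed zero)."""
    return [0] + list(x)


def left_shift(y):
    """Drop the first coordinate; a left inverse of right_shift."""
    return list(y[1:])


def l2_norm(x):
    """Euclidean (l^2) norm of a finitely supported sequence."""
    return math.sqrt(sum(abs(c) ** 2 for c in x))


x = [3, -4j, 1.5, 0, 2]
sx = right_shift(x)

# S is an isometry, hence injective.
print(math.isclose(l2_norm(sx), l2_norm(x)))  # True
# Every vector in the range has first coordinate 0, so e_1 = (1, 0, 0, ...) is missed.
print(sx[0] == 0)  # True
# Dropping the first coordinate undoes S ...
print(left_shift(sx) == x)  # True
# ... but S(left_shift(e_1)) is the zero vector, not e_1, so left_shift is no right inverse.
e1 = [1, 0, 0]
print(right_shift(left_shift(e1)) == e1)  # False
```

The last line is the coordinate form of the failure of surjectivity: the left shift inverts $S$ only on $\operatorname{Range}(S)=\{y:y_1=0\}$.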
The identity operator is often suppressed in notation, but the shift by $\lambda I$ is the object being tested. [definition: Spectrum and Resolvent Set] Let $X$ be a complex Banach space and let $T\in\mathcal{L}(X)$. The resolvent set of $T$ is \begin{align*} \rho(T)=\{\lambda\in\mathbb C: T-\lambda I \text{ is invertible in }\mathcal{L}(X)\}. \end{align*} The spectrum of $T$ is \begin{align*} \sigma(T)=\mathbb C\setminus \rho(T). \end{align*} [/definition] The sign convention $T-\lambda I$ is the one used in these notes. Some texts use $\lambda I-T$; this changes the sign of the resolvent operator but not the set $\sigma(T)$. The spectrum splits according to how $T-\lambda I$ fails to be invertible. The first obstruction is algebraic: a non-zero vector is killed by $T-\lambda I$. [definition: Point Spectrum] Let $X$ be a complex Banach space and let $T\in\mathcal{L}(X)$. The point spectrum of $T$ is \begin{align*} \sigma_p(T)=\{\lambda\in\mathbb C: \ker(T-\lambda I)\ne\{0\}\}. \end{align*} [/definition] Elements of $\sigma_p(T)$ are eigenvalues of $T$. Eigenvalues do not exhaust the infinite-dimensional spectrum. A sequence of unit vectors may behave more and more like eigenvectors even when no actual eigenvector exists, and this is the obstruction captured next. [definition: Approximate Point Spectrum] Let $X$ be a complex Banach space and let $T\in\mathcal{L}(X)$. The approximate point spectrum of $T$ is the set of all $\lambda\in\mathbb C$ for which there exists a sequence $(x_n)_{n\ge 1}$ in $X$ such that \begin{align*} \|x_n\|_X=1, \qquad \|(T-\lambda I)x_n\|_X\to 0. \end{align*} [/definition] Approximate eigenvectors are central in Hilbert-space spectral theory because they detect boundary spectral values and, for normal operators, all spectral values. The next two pieces classify what remains when $T-\lambda I$ is injective. [definition: Residual and Continuous Spectrum] Let $X$ be a complex Banach space and let $T\in\mathcal{L}(X)$. 
The residual spectrum of $T$ is the set of all $\lambda\in\mathbb C$ such that $T-\lambda I$ is injective and $\overline{\operatorname{Range}(T-\lambda I)}\ne X$. The continuous spectrum of $T$ is the set of all $\lambda\in\mathbb C$ such that $T-\lambda I$ is injective, $\overline{\operatorname{Range}(T-\lambda I)}=X$, and $\operatorname{Range}(T-\lambda I)\ne X$. [/definition] For $\lambda$ in the continuous spectrum, the inverse of $T-\lambda I$ exists algebraically on a dense range but is unbounded there: if it were bounded, $T-\lambda I$ would be bounded below, its range would be closed, and a closed dense range is all of $X$. This phenomenon has no finite-dimensional analogue, since a dense subspace of a finite-dimensional normed space is the whole space. [example: Diagonal Operators on Sequence Spaces] Let $1\le p<\infty$ and let $a=(a_n)_{n\ge 1}\in \ell^\infty$. Define $D_a:\ell^p\to\ell^p$ by \begin{align*} (D_a x)_n=a_nx_n. \end{align*} If $x=(x_n)_{n\ge 1}\in\ell^p$, then \begin{align*} \|D_a x\|_{\ell^p}^p=\sum_{n=1}^{\infty}|a_nx_n|^p\le \|a\|_{\ell^\infty}^p\sum_{n=1}^{\infty}|x_n|^p=\|a\|_{\ell^\infty}^p\|x\|_{\ell^p}^p. \end{align*} Thus $D_a$ is bounded and $\|D_a\|_{\mathcal L(\ell^p)}\le \|a\|_{\ell^\infty}$. Conversely, for every $\varepsilon>0$ there is some $k$ with $|a_k|>\|a\|_{\ell^\infty}-\varepsilon$, and for the standard basis vector $e_k$, \begin{align*} \|D_a e_k\|_{\ell^p}=|a_k|\|e_k\|_{\ell^p}=|a_k|>\|a\|_{\ell^\infty}-\varepsilon. \end{align*} Letting $\varepsilon\downarrow 0$ gives $\|D_a\|_{\mathcal L(\ell^p)}=\|a\|_{\ell^\infty}$. We now compute the spectrum. Put $A=\overline{\{a_n:n\ge 1\}}$. If $\lambda\notin A$, then \begin{align*} \delta=\operatorname{dist}(\lambda,A)>0. \end{align*} Define $M_\lambda:\ell^p\to\ell^p$ by \begin{align*} (M_\lambda y)_n=\frac{y_n}{a_n-\lambda}. \end{align*} Since $|a_n-\lambda|\ge \delta$ for every $n$, \begin{align*} \|M_\lambda y\|_{\ell^p}^p=\sum_{n=1}^{\infty}\frac{|y_n|^p}{|a_n-\lambda|^p}\le \delta^{-p}\sum_{n=1}^{\infty}|y_n|^p=\delta^{-p}\|y\|_{\ell^p}^p.
\end{align*} So $M_\lambda\in\mathcal L(\ell^p)$. For every $x\in\ell^p$, \begin{align*} (M_\lambda(D_a-\lambda I)x)_n=\frac{(a_n-\lambda)x_n}{a_n-\lambda}=x_n. \end{align*} For every $y\in\ell^p$, \begin{align*} ((D_a-\lambda I)M_\lambda y)_n=(a_n-\lambda)\frac{y_n}{a_n-\lambda}=y_n. \end{align*} Hence $D_a-\lambda I$ is invertible, with inverse $M_\lambda$, so $\lambda\in\rho(D_a)$. If $\lambda=a_k$ for some $k$, then \begin{align*} ((D_a-\lambda I)e_k)_n=(a_n-\lambda)(e_k)_n. \end{align*} The only possibly non-zero coordinate is the $k$th one, and it equals \begin{align*} (a_k-\lambda)(e_k)_k=0. \end{align*} Thus $e_k\ne 0$ lies in $\ker(D_a-\lambda I)$, so $D_a-\lambda I$ is not invertible. It remains to treat the case where $\lambda\in A$ but $\lambda\ne a_n$ for every $n$. For each $j\ge 1$, choose $n_j$ such that \begin{align*} |a_{n_j}-\lambda|<\frac{1}{j}. \end{align*} Then $\|e_{n_j}\|_{\ell^p}=1$, and \begin{align*} \|(D_a-\lambda I)e_{n_j}\|_{\ell^p}=|a_{n_j}-\lambda|<\frac{1}{j}. \end{align*} Therefore $\|(D_a-\lambda I)e_{n_j}\|_{\ell^p}\to 0$. If $D_a-\lambda I$ had a bounded inverse $B$, then \begin{align*} 1=\|e_{n_j}\|_{\ell^p}=\|B(D_a-\lambda I)e_{n_j}\|_{\ell^p}\le \|B\|_{\mathcal L(\ell^p)}\|(D_a-\lambda I)e_{n_j}\|_{\ell^p}, \end{align*} which tends to $0$, a contradiction. Hence $D_a-\lambda I$ is not invertible. Combining the two directions, \begin{align*} \sigma(D_a)=\overline{\{a_n:n\ge 1\}}. \end{align*} Attained diagonal values are eigenvalues with eigenvectors among the standard basis vectors, while non-attained limit points are detected by unit vectors $e_{n_j}$ on which $D_a-\lambda I$ becomes arbitrarily small. [/example] This diagonal model is the main mental picture for much of spectral theory: the spectrum behaves like the closure of possible diagonal entries. General operators are not diagonal, but the resolvent still measures how close $T-\lambda I$ is to losing invertibility. 
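A truncated version of the diagonal example makes the two spectral mechanisms concrete. The sketch below is only an illustration under stated assumptions: it uses the hypothetical diagonal $a_n=1/n$ (not taken from the text), keeps finitely many coordinates, and indexes basis vectors from $0$. It checks the norm bound, that an attained value $a_k$ has $e_k$ in the kernel, and that the non-attained limit point $0$ is approached by $\|D_a e_n\|_{\ell^p}=1/n$.

```python
import math


def lp_norm(x, p):
    """l^p norm of a finite vector, 1 <= p < infinity."""
    return sum(abs(c) ** p for c in x) ** (1.0 / p)


def diag_apply(a, x):
    """Coordinatewise product (D_a x)_n = a_n x_n."""
    return [an * xn for an, xn in zip(a, x)]


def basis(k, size):
    """Standard basis vector e_k, using a 0-based index k."""
    return [1.0 if j == k else 0.0 for j in range(size)]


N, p = 50, 3
a = [1.0 / (n + 1) for n in range(N)]  # hypothetical diagonal a_n = 1/n

# Norm bound ||D_a x||_p <= ||a||_inf ||x||_p on a sample vector.
x = [(-1) ** n / math.sqrt(n + 1) for n in range(N)]
print(lp_norm(diag_apply(a, x), p) <= max(abs(c) for c in a) * lp_norm(x, p))  # True

# Attained value lambda = a_k: e_k lies in the kernel of D_a - lambda I.
k = 4
shifted = [an - a[k] for an in a]
print(lp_norm(diag_apply(shifted, basis(k, N)), p) == 0.0)  # True

# Non-attained limit point lambda = 0: ||D_a e_n||_p = |a_n| = 1/n tends to 0.
residuals = [lp_norm(diag_apply(a, basis(n, N)), p) for n in range(N)]
print(all(math.isclose(r, 1.0 / (n + 1)) for n, r in enumerate(residuals)))  # True
```

In the infinite-dimensional operator the residuals $1/n$ never vanish, which is exactly why $0$ is a spectral value without being an eigenvalue.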
## The Resolvent Operator and Its Identity

Knowing that $T-\lambda I$ is invertible is useful only if we also study the inverse as a function of $\lambda$. The resolvent operator packages the solutions of $(T-\lambda I)x=y$, and its algebraic identity is the computational engine behind spectral theory. The next definitions and theorem explain why the resolvent set is not merely a set but an open region carrying an analytic operator-valued function. [definition: Resolvent Operator] Let $X$ be a complex Banach space, let $T\in\mathcal{L}(X)$, and let $\lambda\in\rho(T)$. The resolvent operator of $T$ at $\lambda$ is \begin{align*} R(\lambda,T)=(T-\lambda I)^{-1}\in\mathcal{L}(X). \end{align*} [/definition] The parameter $\lambda$ belongs to the resolvent set precisely when this bounded inverse exists. To compare the inverse at two different parameters, we need a formula that turns a change in $\lambda$ into an operator identity; that formula is the basic algebraic mechanism behind differentiability of the resolvent and perturbation estimates. [quotetheorem:8388] [citeproof:8388] The identity turns differences of resolvents into products of resolvents. Both parameters must lie in $\rho(T)$ because the formula is a statement about bounded everywhere-defined inverses; at a spectral value, one of the displayed operators is not defined in $\mathcal{L}(X)$. The identity also does not say that $R(\lambda,T)$ is uniformly bounded as $\lambda$ moves through the resolvent set; for diagonal operators the resolvent norm can blow up when $\lambda$ approaches the spectrum. Its real use is therefore local and analytic: once inverses exist at nearby parameters, the identity controls their difference, and combined with local norm bounds this yields continuity and differentiability of the resolvent. This is the first sign that spectral theory is controlled by the algebraic structure of $\mathcal{L}(X)$ rather than by a choice of basis. The next question is stability: if $T-\lambda I$ is invertible, must nearby operators $T-\mu I$ also be invertible?
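Before turning to that question, the identity can be checked coordinatewise for a diagonal operator, where $R(\lambda,D_a)$ acts by the scalars $(a_n-\lambda)^{-1}$. The sketch assumes the form $R(\lambda,T)-R(\mu,T)=(\lambda-\mu)R(\lambda,T)R(\mu,T)$, which is what the convention $R(\lambda,T)=(T-\lambda I)^{-1}$ produces from $(T-\lambda I)^{-1}-(T-\mu I)^{-1}=(T-\lambda I)^{-1}\big[(T-\mu I)-(T-\lambda I)\big](T-\mu I)^{-1}$; the diagonal entries and test points are made up for illustration.

```python
import cmath


def resolvent_diag(a, lam):
    """Diagonal entries of R(lam, D_a) = (D_a - lam I)^{-1}; requires lam != a_n."""
    return [1.0 / (an - lam) for an in a]


def identity_defect(a, lam, mu):
    """Max over n of |R(lam)_n - R(mu)_n - (lam - mu) R(lam)_n R(mu)_n|."""
    r_lam = resolvent_diag(a, lam)
    r_mu = resolvent_diag(a, mu)
    return max(abs(x - y - (lam - mu) * x * y) for x, y in zip(r_lam, r_mu))


# Hypothetical diagonal entries: twelve points on the unit circle.
a = [cmath.exp(2j * cmath.pi * k / 12) for k in range(12)]
# lam and mu are chosen away from all a_n, so both lie in the resolvent set.
print(identity_defect(a, 0.3 + 0.1j, -2.0 + 1.5j) < 1e-12)  # True
```

The defect is zero up to rounding because each coordinate satisfies the scalar identity $\frac{1}{a-\lambda}-\frac{1}{a-\mu}=\frac{\lambda-\mu}{(a-\lambda)(a-\mu)}$.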
The answer uses the Neumann-series idea that a sufficiently small perturbation of the identity remains invertible. [quotetheorem:8389] [citeproof:8389] Since $\rho(T)$ is open, the spectrum is closed. The boundedness of $T$ is part of the mechanism: the perturbation $(\lambda-\lambda_0)R(\lambda_0,T)$ must be a bounded operator with norm less than $1$ for the Neumann series to apply. The theorem is local and says nothing by itself about whether $\sigma(T)$ is nonempty or bounded; those global facts require the large-$\lambda$ estimate and complex analysis below. Later this local Neumann-series argument becomes the main perturbation tool: invertibility is stable under perturbations that are small relative to the inverse. [example: Resolvent of a Diagonal Operator] Let $A=\overline{\{a_n:n\ge 1\}}$ and suppose $\lambda\notin A$. Set $\delta=\operatorname{dist}(\lambda,A)>0$. For $y\in\ell^p$, define \begin{align*} (M_\lambda y)_n=\frac{y_n}{a_n-\lambda}. \end{align*} Since $|a_n-\lambda|\ge \delta$ for every $n$, \begin{align*} \|M_\lambda y\|_{\ell^p}^p=\sum_{n=1}^{\infty}\left|\frac{y_n}{a_n-\lambda}\right|^p\le \delta^{-p}\sum_{n=1}^{\infty}|y_n|^p=\delta^{-p}\|y\|_{\ell^p}^p. \end{align*} Thus $M_\lambda\in\mathcal L(\ell^p)$. For every $x\in\ell^p$ and every coordinate $n$, \begin{align*} (M_\lambda(D_a-\lambda I)x)_n=\frac{(a_n-\lambda)x_n}{a_n-\lambda}=x_n. \end{align*} For every $y\in\ell^p$ and every coordinate $n$, \begin{align*} ((D_a-\lambda I)M_\lambda y)_n=(a_n-\lambda)\frac{y_n}{a_n-\lambda}=y_n. \end{align*} Hence $M_\lambda=(D_a-\lambda I)^{-1}=R(\lambda,D_a)$, so \begin{align*} R(\lambda,D_a)x=\left(\frac{x_n}{a_n-\lambda}\right)_{n\ge 1}. \end{align*} Now put \begin{align*} m=\sup_{n\ge 1}\frac{1}{|a_n-\lambda|}. \end{align*} The upper bound follows coordinatewise: \begin{align*} \|R(\lambda,D_a)x\|_{\ell^p}^p=\sum_{n=1}^{\infty}\frac{|x_n|^p}{|a_n-\lambda|^p}\le m^p\sum_{n=1}^{\infty}|x_n|^p=m^p\|x\|_{\ell^p}^p. 
\end{align*} So $\|R(\lambda,D_a)\|_{\mathcal L(\ell^p)}\le m$. Conversely, for every $\varepsilon>0$ choose $k$ with \begin{align*} \frac{1}{|a_k-\lambda|}>m-\varepsilon. \end{align*} Since $\|e_k\|_{\ell^p}=1$, \begin{align*} \|R(\lambda,D_a)e_k\|_{\ell^p}=\frac{1}{|a_k-\lambda|}>m-\varepsilon. \end{align*} Letting $\varepsilon\downarrow 0$ gives $\|R(\lambda,D_a)\|_{\mathcal L(\ell^p)}=m$. Finally, because $z\mapsto |\lambda-z|$ is continuous, taking the infimum over $\{a_n:n\ge 1\}$ gives the same value as taking it over its closure $A$. Therefore \begin{align*} \sup_{n\ge 1}\frac{1}{|a_n-\lambda|}=\frac{1}{\inf_{n\ge 1}|a_n-\lambda|}=\frac{1}{\operatorname{dist}(\lambda,A)}. \end{align*} Thus \begin{align*} \|R(\lambda,D_a)\|_{\mathcal L(\ell^p)}=\frac{1}{\operatorname{dist}(\lambda,\overline{\{a_n:n\ge 1\}})}. \end{align*} As $\lambda$ moves through the resolvent set toward the spectral set $\overline{\{a_n:n\ge 1\}}$, this distance tends to $0$, so the resolvent norm tends to infinity. [/example] This example should be read as a model rather than as a special trick. In general the resolvent norm quantifies the conditioning of the equation $(T-\lambda I)x=y$; near the spectrum, solving this equation becomes unstable or impossible.

## Compactness and Nonemptiness of the Spectrum

The next structural question is global. The openness theorem gives local information about $\rho(T)$, but it does not yet say where the spectrum lies or whether the spectrum must exist. On a non-zero complex Banach space the spectrum of a bounded operator is always a nonempty compact subset of the complex plane. A first enclosure comes from the Neumann series: if $|\lambda|>\|T\|_{\mathcal L(X)}$, then $T-\lambda I=-\lambda(I-\lambda^{-1}T)$ is invertible, so $\sigma(T)\subset\{\lambda\in\mathbb C:|\lambda|\le\|T\|_{\mathcal L(X)}\}$. This enclosure can be very crude, so we measure the size of the spectrum directly through the spectral radius \begin{align*} r(T)=\sup\{|\lambda|:\lambda\in\sigma(T)\}.
\end{align*} For operators this notation is a special case of the following algebraic language used later in the course. A unital Banach algebra $A$ is a complex Banach space with an associative multiplication, a multiplicative identity $1$, and a submultiplicative norm. For $x\in A$, its spectrum is \begin{align*} \sigma_A(x)=\{\lambda\in\mathbb C:x-\lambda 1\text{ is not invertible in }A\}, \end{align*} and its spectral radius is $r_A(x)=\sup\{|\lambda|:\lambda\in\sigma_A(x)\}$. When $A=\mathcal L(X)$ and $x=T$, this recovers the operator notation $\sigma(T)$ and $r(T)$. The spectral radius formula answers a question left open by the elementary norm enclosure: can the size of the spectrum be recovered from the asymptotic behaviour of powers rather than from resolvent computations? [quotetheorem:2672] [citeproof:2672] Conceptually, the formula says that the spectral radius is governed by long-term multiplicative behaviour. This is why operators with comparable one-step norms can have different spectral radii, and it is also why later compact and normal theory often extracts spectral information from invariant subspaces and powers rather than from direct computation of inverses. The preceding discussion explains the size of the spectrum once the spectrum is known to exist. The remaining global issue is existence itself. In the next quoted result the standing context is a non-zero complex Banach space $X$; this hypothesis is essential for the nonemptiness conclusion. [quotetheorem:889] [citeproof:889] This is the first point where complex scalars are essential. Compactness and nonemptiness do not say that the spectrum contains eigenvalues, nor do they describe which part of the spectrum comes from injectivity, range, or density failures. Over real Banach spaces an operator may have empty real spectrum, as rotation by $90^\circ$ on $\mathbb R^2$ shows.
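The rotation example reduces to a $2\times 2$ determinant. For $T=\begin{pmatrix}0&-1\\1&0\end{pmatrix}$ one has $\det(T-\lambda I)=\lambda^2+1$, which never vanishes for real $\lambda$ but vanishes at $\lambda=\pm i$. The sketch below evaluates that determinant in plain Python; storing the matrix as nested tuples and the helper name `det_shift` are choices made only for this illustration.

```python
def det_shift(t, lam):
    """det(T - lam I) for a 2x2 matrix t given as ((t11, t12), (t21, t22))."""
    (a, b), (c, d) = t
    return (a - lam) * (d - lam) - b * c


rotation = ((0.0, -1.0), (1.0, 0.0))  # rotation by 90 degrees on R^2

# Over the reals, det(T - lam I) = lam^2 + 1 >= 1, so T - lam I is always invertible;
# a sample grid only illustrates this algebraic fact.
real_grid = [k / 10 for k in range(-50, 51)]
print(min(det_shift(rotation, lam) for lam in real_grid))  # 1.0

# Over the complex numbers the determinant vanishes at lam = i and lam = -i.
print(det_shift(rotation, 1j), det_shift(rotation, -1j))
```

Complexifying $\mathbb R^2$ to $\mathbb C^2$ therefore recovers the spectrum $\{i,-i\}$, so the empty real spectrum is a real-scalar phenomenon that disappears after complexification.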
The theorem therefore supplies the global existence result that later refinements, such as spectral radius estimates and spectral mapping, can sharpen but not replace. [example: The Volterra Operator Has Spectrum Only at Zero] Let $V:C[0,1]\to C[0,1]$ be the Volterra operator, with $C[0,1]$ carrying the supremum norm, defined by \begin{align*} (Vf)(x)=\int_0^x f(t)\,dt. \end{align*} For $f\in C[0,1]$ and $0\le x\le 1$, \begin{align*} |(Vf)(x)|\le \int_0^x |f(t)|\,dt\le \int_0^x \|f\|_\infty\,dt=x\|f\|_\infty\le \|f\|_\infty. \end{align*} Thus $V$ is bounded and $\|V\|_{\mathcal L(C[0,1])}\le 1$. We claim that for every $n\ge 1$ and every $0\le x\le 1$, \begin{align*} |(V^n f)(x)|\le \frac{x^n}{n!}\|f\|_\infty. \end{align*} For $n=1$, this is exactly the estimate \begin{align*} |(Vf)(x)|\le x\|f\|_\infty. \end{align*} Assume the estimate holds for some $n\ge 1$. Then \begin{align*} |(V^{n+1}f)(x)|=\left|\int_0^x (V^n f)(t)\,dt\right|. \end{align*} Using the induction hypothesis inside the integral gives \begin{align*} |(V^{n+1}f)(x)|\le \int_0^x \frac{t^n}{n!}\|f\|_\infty\,dt=\frac{x^{n+1}}{(n+1)!}\|f\|_\infty. \end{align*} Hence, for every $n\ge 1$, \begin{align*} \|V^n f\|_\infty=\sup_{0\le x\le 1}|(V^n f)(x)|\le \frac{1}{n!}\|f\|_\infty. \end{align*} Therefore \begin{align*} \|V^n\|_{\mathcal L(C[0,1])}\le \frac{1}{n!}. \end{align*} By the *Spectral Radius Formula*, \begin{align*} r(V)=\lim_{n\to\infty}\|V^n\|_{\mathcal L(C[0,1])}^{1/n}. \end{align*} The estimate above implies \begin{align*} 0\le r(V)\le \lim_{n\to\infty}\left(\frac{1}{n!}\right)^{1/n}=0, \end{align*} so $r(V)=0$. By the definition of spectral radius, every $\lambda\in\sigma(V)$ satisfies $|\lambda|\le r(V)=0$, and therefore \begin{align*} \sigma(V)\subset\{0\}. \end{align*} Since $C[0,1]$ is a non-zero complex Banach space, the *Spectrum Is Compact and Nonempty* theorem gives $\sigma(V)\ne\varnothing$. Thus \begin{align*} \sigma(V)=\{0\}. \end{align*} Finally, $0$ is not an eigenvalue. 
If $Vf=0$, then for every $x\in[0,1]$, \begin{align*} \int_0^x f(t)\,dt=0. \end{align*} The function $F(x)=\int_0^x f(t)\,dt$ is differentiable and satisfies $F'(x)=f(x)$ by the [fundamental theorem of calculus](/theorems/632) for continuous functions. Since $F(x)=0$ for all $x$, we get $f(x)=F'(x)=0$ for all $x\in[0,1]$. Hence $\ker V=\{0\}$, so the only spectral value of $V$ is not an eigenvalue. [/example] This example is a warning against treating the spectrum as a list of eigenvalues: the Volterra operator has nonempty spectrum, but no eigenvector corresponding to its only spectral value.

## Spectral Radius

The norm bound $\sigma(T)\subset\{|\lambda|\le\|T\|\}$ is often too crude. Powers of $T$ contain sharper long-term information: if $T^n$ grows slowly, then spectral values must be small. The spectral radius is the number extracted from this long-term growth. [definition: Spectral Radius] Let $X$ be a complex Banach space and let $T\in\mathcal{L}(X)$. The spectral radius of $T$ is \begin{align*} r(T)=\sup\{|\lambda|:\lambda\in\sigma(T)\}. \end{align*} [/definition] The [compactness theorem](/theorems/2748) ensures that this supremum is finite, and since $\sigma(T)$ is also nonempty, the supremum is attained: $r(T)=|\lambda|$ for some $\lambda\in\sigma(T)$. But the definition still depends on knowing the spectrum itself, which is often the hardest object to compute. A more workable quantity should use only the iterates of the operator, since powers can be bounded directly even when the resolvent is inaccessible. The following result recovers the spectral radius from the long-term norm growth of $T^n$. [quotetheorem:2672] [citeproof:2672] The proof rests on two ingredients that recur later: submultiplicativity of powers and analyticity of the resolvent. The formula converts spectral questions into norm estimates and norm estimates back into spectral information. The bounded-operator hypothesis is essential because the expression $\|T^n\|_{\mathcal{L}(X)}$ and the Neumann/Laurent expansions live in the Banach algebra $\mathcal{L}(X)$.
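The gap between one-step norms and the spectral radius is easy to see numerically. The sketch below takes a hypothetical upper-triangular matrix $T=\begin{pmatrix}1/2&10\\0&1/2\end{pmatrix}$, whose spectrum is $\{1/2\}$, and uses the largest absolute row sum, which is the operator norm on $\mathbb C^2$ equipped with the supremum norm; in finite dimensions any operator norm gives the same limit in the spectral radius formula. It also evaluates the Volterra bound $(1/n!)^{1/n}$ through `math.lgamma`, using $\log n!=\operatorname{lgamma}(n+1)$.

```python
import math


def mat_mul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def row_sum_norm(x):
    """Operator norm induced by the supremum norm: largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in x)


def gelfand_estimate(t, n):
    """Return ||T^n||^(1/n) in the row-sum norm."""
    power = t
    for _ in range(n - 1):
        power = mat_mul(power, t)
    return row_sum_norm(power) ** (1.0 / n)


T = [[0.5, 10.0], [0.0, 0.5]]  # hypothetical example with sigma(T) = {1/2}
print(row_sum_norm(T))  # 10.5

# The estimates decrease towards r(T) = 0.5, but slowly.
for n in (1, 10, 100, 400):
    print(n, gelfand_estimate(T, n))

# Volterra bound: (1/n!)^(1/n) = exp(-log(n!)/n) tends to 0, forcing r(V) = 0.
for n in (5, 50, 500):
    print(n, math.exp(-math.lgamma(n + 1) / n))
```

Here $T^n$ has diagonal entries $2^{-n}$ and off-diagonal entry $10n\,2^{-(n-1)}$, so $\|T^n\|^{1/n}=\tfrac12(1+20n)^{1/n}$: the one-step norm $10.5$ says almost nothing about $r(T)=\tfrac12$.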
The formula also does not say that $\|T\|_{\mathcal{L}(X)}$ itself is close to $r(T)$: a non-zero nilpotent operator has $r(T)=0$ but positive operator norm. Thus the theorem is most powerful for long-term behaviour of powers, and it prepares the later spectral mapping principle by showing how algebraic operations on $T$ are reflected in analytic information about the resolvent. [example: Spectral Radius of the Unilateral Shift] For the unilateral right shift $S(x_1,x_2,x_3,\dots)=(0,x_1,x_2,\dots)$ on $\ell^2$, its $n$th power is \begin{align*} S^n(x_1,x_2,x_3,\dots)=(\underbrace{0,\dots,0}_{n\text{ zeros}},x_1,x_2,\dots). \end{align*} Hence, for every $x\in\ell^2$, \begin{align*} \|S^n x\|_{\ell^2}^2=\sum_{k=1}^{\infty}|x_k|^2=\|x\|_{\ell^2}^2. \end{align*} Thus $\|S^n x\|_{\ell^2}=\|x\|_{\ell^2}$ for all $x$, so $\|S^n\|_{\mathcal L(\ell^2)}=1$ for every $n\ge 1$. By the *Spectral Radius Formula*, \begin{align*} r(S)=\lim_{n\to\infty}\|S^n\|_{\mathcal L(\ell^2)}^{1/n}=\lim_{n\to\infty}1^{1/n}=1. \end{align*} We can also see the full spectrum. If $|\lambda|>1$, then \begin{align*} S-\lambda I=-\lambda(I-\lambda^{-1}S). \end{align*} Since $\|\lambda^{-1}S\|_{\mathcal L(\ell^2)}=|\lambda|^{-1}<1$, the Neumann series makes $I-\lambda^{-1}S$ invertible, and therefore $S-\lambda I$ is invertible. Hence $\sigma(S)\subset\{\lambda\in\mathbb C:|\lambda|\le 1\}$. Now let $|\lambda|<1$. If $\lambda=0$, then $S$ is not onto because $e_1=(1,0,0,\dots)$ is not in its range. If $0<|\lambda|<1$ and $(S-\lambda I)x=e_1$, then the first coordinate gives \begin{align*} -\lambda x_1=1, \end{align*} so $x_1=-\lambda^{-1}$. For $n\ge 2$, the $n$th coordinate gives \begin{align*} x_{n-1}-\lambda x_n=0, \end{align*} so induction yields \begin{align*} x_n=-\lambda^{-n}. \end{align*} Then \begin{align*} \sum_{n=1}^{\infty}|x_n|^2=\sum_{n=1}^{\infty}|\lambda|^{-2n}=\infty, \end{align*} because $|\lambda|^{-2}>1$. 
Thus no $x\in\ell^2$ solves $(S-\lambda I)x=e_1$, so $S-\lambda I$ is not onto. It remains to put the boundary $|\lambda|=1$ in the spectrum. For $N\ge 1$, define \begin{align*} u_N=\frac{1}{\sqrt N}(1,\lambda^{-1},\lambda^{-2},\dots,\lambda^{-(N-1)},0,0,\dots). \end{align*} Since $|\lambda|=1$, \begin{align*} \|u_N\|_{\ell^2}^2=\frac{1}{N}\sum_{k=0}^{N-1}|\lambda|^{-2k}=1. \end{align*} The coordinates of $(S-\lambda I)u_N$ cancel from the second through the $N$th coordinate; the first coordinate is $-\lambda/\sqrt N$, and the $(N+1)$st coordinate is $\lambda^{-(N-1)}/\sqrt N$. Therefore \begin{align*} \|(S-\lambda I)u_N\|_{\ell^2}^2=\frac{|\lambda|^2}{N}+\frac{|\lambda|^{-2(N-1)}}{N}=\frac{2}{N}. \end{align*} Thus $\|(S-\lambda I)u_N\|_{\ell^2}\to 0$. If $S-\lambda I$ had a bounded inverse $B$, then \begin{align*} 1=\|u_N\|_{\ell^2}=\|B(S-\lambda I)u_N\|_{\ell^2}\le \|B\|_{\mathcal L(\ell^2)}\|(S-\lambda I)u_N\|_{\ell^2}, \end{align*} and the right side would tend to $0$, a contradiction. Hence every $|\lambda|=1$ also belongs to $\sigma(S)$. Combining the outside resolvent calculation with the interior and boundary obstructions gives \begin{align*} \sigma(S)=\{\lambda\in\mathbb C:|\lambda|\le 1\}. \end{align*} The point spectrum is empty: if $Sx=\lambda x$, then the first coordinate gives $0=\lambda x_1$; for $n\ge 2$, the coordinate equation is $x_{n-1}=\lambda x_n$. If $\lambda=0$, this forces every coordinate of $x$ to be $0$. If $\lambda\ne 0$, then $x_1=0$, and the recurrence forces $x_n=0$ for every $n$. Thus no non-zero eigenvector exists, even though the spectrum fills the whole closed unit disc. [/example] This example is the basic infinite-dimensional contrast with matrices. The spectrum fills a two-dimensional region even though there are no eigenvalues. [example: Spectral Radius of a Diagonal Operator] Let $1\le p<\infty$ and let $D_a:\ell^p\to\ell^p$ be defined by $(D_a x)_k=a_kx_k$. We first verify the formula for powers. 
For $n=1$ it is the definition of $D_a$. If $(D_a^n x)_k=a_k^n x_k$, then \begin{align*} (D_a^{n+1}x)_k=(D_a(D_a^n x))_k=a_k(D_a^n x)_k=a_k(a_k^n x_k)=a_k^{n+1}x_k. \end{align*} Thus, for every $n\ge 1$, \begin{align*} (D_a^n x)_k=a_k^n x_k. \end{align*} Put $\alpha=\sup_{k\ge 1}|a_k|$. For every $x\in\ell^p$, \begin{align*} \|D_a^n x\|_{\ell^p}^p=\sum_{k=1}^{\infty}|a_k^n x_k|^p=\sum_{k=1}^{\infty}|a_k|^{np}|x_k|^p\le \alpha^{np}\sum_{k=1}^{\infty}|x_k|^p=\alpha^{np}\|x\|_{\ell^p}^p. \end{align*} Hence $\|D_a^n\|_{\mathcal L(\ell^p)}\le \alpha^n$. Conversely, if $\varepsilon>0$, choose $j$ with $|a_j|>\alpha-\varepsilon$. Since $\|e_j\|_{\ell^p}=1$, \begin{align*} \|D_a^n e_j\|_{\ell^p}=|a_j|^n>(\alpha-\varepsilon)^n. \end{align*} Letting $\varepsilon\downarrow 0$ gives $\|D_a^n\|_{\mathcal L(\ell^p)}\ge \alpha^n$, so \begin{align*} \|D_a^n\|_{\mathcal L(\ell^p)}=\alpha^n=\sup_{k\ge 1}|a_k|^n. \end{align*} By the *Spectral Radius Formula*, \begin{align*} r(D_a)=\lim_{n\to\infty}\|D_a^n\|_{\mathcal L(\ell^p)}^{1/n}=\lim_{n\to\infty}(\alpha^n)^{1/n}=\alpha=\sup_{k\ge 1}|a_k|. \end{align*} This agrees with $\sigma(D_a)=\overline{\{a_k:k\ge 1\}}$, because the continuous function $z\mapsto |z|$ has the same supremum on a set and on its closure. Thus, for diagonal operators, the spectral radius is exactly the largest modulus approached by the diagonal entries. [/example] The chapter has built the vocabulary needed for the rest of the course: spectra classify invertibility failures, resolvents turn invertibility into analytic functions, and the spectral radius measures spectral size through the growth of powers. The next chapter develops the Banach-space tools behind the Neumann series, holomorphic resolvent methods, and spectral mapping principles used in these first results. By the end of Chapter 1, the spectrum has been defined and its most basic properties are available, but those results still depend on a small set of general estimates and analytic facts. 
# 2. Banach-Space Spectral Theory Tools

This chapter develops the Banach-space tools that make the spectrum usable rather than only definitional. Chapter 1 introduced the resolvent set and the spectrum of an operator $T \in \mathcal{L}(X)$; here we learn how the resolvent behaves under perturbations, how complex function theory enters through $\lambda \mapsto (T - \lambda I)^{-1}$, and how spectra transform under algebraic functional calculus. The guiding theme is that invertibility in the Banach algebra $\mathcal{L}(X)$ is stable, analytic, and compatible with polynomial and rational expressions.

## Stability of Invertibility and Neumann Series

When solving operator equations, the first practical question is whether an invertible operator remains invertible after a small error term is added. In finite dimensions this follows from continuity of the determinant, but Banach-space operator theory needs a norm estimate that also gives the inverse explicitly. The Neumann series is the operator-theoretic analogue of the geometric series. [remark: Neumann-Series Inversion] If $A \in \mathcal{L}(X)$ and $\|A\|_{\mathcal{L}(X)}<1$, then the geometric series \begin{align*} \sum_{n=0}^{\infty} A^n \end{align*} converges in the operator norm, and its sum is the inverse of $I-A$. This is the Banach-space version of the scalar identity $(1-a)^{-1}=\sum_{n=0}^{\infty}a^n$ for $|a|<1$: since $\|A^n\|_{\mathcal{L}(X)}\le\|A\|_{\mathcal{L}(X)}^n$, the partial sums form a Cauchy sequence, and completeness of $\mathcal{L}(X)$ turns them into an actual bounded inverse. [/remark] This principle does more than prove existence of an inverse: it gives an explicit formula for the inverse together with the bound $\|(I-A)^{-1}\|_{\mathcal{L}(X)}\le\sum_{n=0}^{\infty}\|A\|_{\mathcal{L}(X)}^n=(1-\|A\|_{\mathcal{L}(X)})^{-1}$. Its norm hypothesis is a sufficient condition for this geometric-series argument, not a characterisation of invertibility.
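The geometric-series construction can be run directly on a small matrix. This is a minimal sketch, not a general solver: it sums $\sum_{n\ge 0}A^n$ for a hypothetical $2\times 2$ matrix whose row-sum norm is $0.7$, stops once the current power has norm below a tolerance, and checks that the result inverts $I-A$ and obeys $\|(I-A)^{-1}\|\le(1-\|A\|)^{-1}$ in that same norm. Matrices are plain nested lists, and all helper names are illustrative.

```python
def mat_mul(x, y):
    """Product of square matrices stored as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_add(x, y):
    """Entrywise sum of two matrices of the same shape."""
    return [[u + v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def row_sum_norm(x):
    """Operator norm induced by the supremum norm on C^n."""
    return max(sum(abs(v) for v in row) for row in x)


def neumann_inverse(a, tol=1e-14, max_terms=10_000):
    """Approximate (I - A)^{-1} by partial sums of sum_n A^n; requires ||A|| < 1."""
    if row_sum_norm(a) >= 1:
        raise ValueError("the Neumann-series estimate needs ||A|| < 1")
    n = len(a)
    total, term = identity(n), identity(n)
    for _ in range(max_terms):
        term = mat_mul(term, a)  # term now holds the next power of A
        total = mat_add(total, term)
        if row_sum_norm(term) < tol:
            break
    return total


A = [[0.2, -0.5], [0.4, 0.3]]  # hypothetical, row-sum norm 0.7
S = neumann_inverse(A)
I_minus_A = mat_add(identity(2), [[-v for v in row] for row in A])
product = mat_mul(I_minus_A, S)
print(max(abs(product[i][j] - (1.0 if i == j else 0.0)) for i in range(2) for j in range(2)) < 1e-12)  # True
print(row_sum_norm(S) <= 1 / (1 - row_sum_norm(A)))  # True
```

The guard clause mirrors the hypothesis of the remark: it refuses to run the series when $\|A\|\ge 1$, even though $I-A$ may still be invertible there.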
For instance, if $A=2I$, then $\|A\|_{\mathcal{L}(X)}=2$ but $I-A=-I$ is invertible; if $A=I$, then $\|A\|_{\mathcal{L}(X)}=1$ and $I-A=0$ is not invertible. Thus the boundary $\|A\|_{\mathcal{L}(X)}=1$ cannot be treated by the same estimate, and the theorem does not say that $I-A$ fails to be invertible whenever $\|A\|_{\mathcal{L}(X)}\ge 1$. The next problem is to transfer this control from $I-A$ to an arbitrary invertible operator $B$ after adding an error term $E$. [quotetheorem:8390] [citeproof:8390] This perturbation result is the local engine behind resolvent estimates: once a single point belongs to the resolvent set, nearby points do as well. The smallness condition is sufficient rather than necessary, because it is designed to put $I+B^{-1}E$ within the Neumann-series ball around the identity. As a boundary case, take $B=I$ and $E=I$; then $\|B^{-1}E\|_{\mathcal{L}(X)}=1$, so the theorem does not apply, but $B+E=2I$ is invertible. In contrast, $B=I$ and $E=-I$ gives the same norm value while $B+E=0$ is not invertible, so norm size alone at the boundary cannot decide the question. This motivates looking at finite-rank errors, where the inverse can sometimes be written using scalar data rather than an infinite series. [example: Rank-One Perturbation] Let $X$ be a Banach space, let $u \in X$, and let $f \in X^*$. Define $R \in \mathcal{L}(X)$ by $R x=f(x)u$, and assume first that $1+f(u)\ne 0$. We show that the inverse of $I+R$ is the operator $S$ given by \begin{align*} Sx=x-\frac{f(x)}{1+f(u)}u. \end{align*} For $x\in X$, linearity of $f$ gives \begin{align*} f(Sx)=f(x)-\frac{f(x)}{1+f(u)}f(u). \end{align*} Combining the two terms over the denominator $1+f(u)$ gives \begin{align*} f(Sx)=\frac{(1+f(u))f(x)-f(u)f(x)}{1+f(u)}=\frac{f(x)}{1+f(u)}. \end{align*} Hence \begin{align*} (I+R)Sx=Sx+f(Sx)u=x-\frac{f(x)}{1+f(u)}u+\frac{f(x)}{1+f(u)}u=x. 
\end{align*} In the other order, first compute \begin{align*} S((I+R)x)=S(x+f(x)u)=x+f(x)u-\frac{f(x+f(x)u)}{1+f(u)}u. \end{align*} By linearity of $f$, \begin{align*} f(x+f(x)u)=f(x)+f(x)f(u)=f(x)(1+f(u)). \end{align*} Substituting this into the previous expression gives \begin{align*} S((I+R)x)=x+f(x)u-\frac{f(x)(1+f(u))}{1+f(u)}u=x. \end{align*} Thus $S$ is both a left and right inverse of $I+R$, so $I+R$ is invertible and \begin{align*} (I+R)^{-1}x=x-\frac{f(x)}{1+f(u)}u. \end{align*} The identity behind the scalar denominator is \begin{align*} R^2x=R(f(x)u)=f(x)f(u)u=f(u)Rx. \end{align*} If $1+f(u)=0$, then $f(u)=-1$; in particular $u\ne 0$, since $f(0)=0$. Therefore \begin{align*} (I+R)u=u+f(u)u=u-u=0, \end{align*} so $I+R$ has a nonzero vector in its kernel and is not invertible. This shows that a rank-one perturbation can be inverted by one scalar correction exactly when the scalar $1+f(u)$ does not vanish. [/example] The example shows that the norm-small condition is sufficient but not necessary. Spectral theory often combines local norm estimates with exact algebraic identities, and the resolvent function is where the two viewpoints meet. ## The Holomorphic Resolvent and Nonempty Spectrum The next question is why the resolvent is more than a set-valued complement of the spectrum. Since $T-\lambda I$ depends complex-linearly on $\lambda$, the inverse should vary analytically wherever it exists. This analytic structure is the bridge from Banach algebra estimates to complex analysis. [definition: Resolvent Function] Let $X$ be a complex Banach space and let $T \in \mathcal{L}(X)$. The resolvent function of $T$ is the map $R_T : \rho(T) \to \mathcal{L}(X)$ defined by \begin{align*} R_T(\lambda) = (T - \lambda I)^{-1}. \end{align*} [/definition] The sign convention follows the course convention from Chapter 1. 
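For a diagonal operator, the local expansion that the perturbation theorem below provides can already be tested one coordinate at a time. The sketch assumes the expansion takes the Neumann form $R_T(\lambda)=\sum_{n\ge 0}(\lambda-\lambda_0)^nR_T(\lambda_0)^{n+1}$, which is what the sign convention above produces from $T-\lambda I=(T-\lambda_0 I)\big[I-(\lambda-\lambda_0)R_T(\lambda_0)\big]$; the quoted theorem may present it differently. Each coordinate is then the scalar series $\frac{1}{a-\lambda}=\sum_{n\ge 0}\frac{(\lambda-\lambda_0)^n}{(a-\lambda_0)^{n+1}}$, valid for $|\lambda-\lambda_0|<|a-\lambda_0|$. The diagonal entries and base point are hypothetical.

```python
def resolvent_entries(a, lam):
    """Diagonal entries of R_T(lam) = (T - lam I)^{-1} for T = diag(a)."""
    return [1.0 / (an - lam) for an in a]


def resolvent_series(a, lam0, lam, terms):
    """Partial sum of sum_{n < terms} (lam - lam0)^n R_T(lam0)^(n+1), coordinatewise."""
    base = resolvent_entries(a, lam0)
    h = lam - lam0
    return [sum(h ** n * r ** (n + 1) for n in range(terms)) for r in base]


a = [1.0, 2.0, 1j, -1.5]  # hypothetical diagonal entries, so sigma(T) = set(a)
lam0 = 0.2 + 0.2j
dist = min(abs(an - lam0) for an in a)  # distance from lam0 to the spectrum
lam = lam0 + 0.5 * dist  # well inside the disc of convergence

exact = resolvent_entries(a, lam)
approx = resolvent_series(a, lam0, lam, terms=80)
print(max(abs(x - y) for x, y in zip(exact, approx)) < 1e-12)  # True
```

For a diagonal operator $\|R_T(\lambda_0)\|=1/\operatorname{dist}(\lambda_0,\sigma(T))$, so here the radius guaranteed by the Neumann estimate coincides with the distance to the spectrum.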
To use the resolvent as an analytic object, we need a local power-series expansion around each point of $\rho(T)$, and the perturbation theorem gives that expansion by treating $\lambda-\lambda_0$ as the small parameter. [quotetheorem:8391] [citeproof:8391] Holomorphy is useful because scalar complex analysis can be recovered by applying bounded linear functionals to the operator-valued resolvent; this scalarisation principle recurs throughout spectral theory. The complex-scalar hypothesis is essential for this argument, since holomorphy and Liouville's theorem are statements over $\mathbb C$. The displayed series is local: the Neumann-series estimate guarantees convergence only for $|\lambda-\lambda_0|<1/\|R_T(\lambda_0)\|_{\mathcal{L}(X)}$, and this radius never exceeds, and may be strictly smaller than, $\operatorname{dist}(\lambda_0,\sigma(T))$. The theorem also does not provide [analytic continuation](/page/Analytic%20Continuation) across $\sigma(T)$; spectral points are precisely the obstruction to defining the inverse operator. The next problem is to rule out the possibility that the resolvent exists everywhere, and Liouville's theorem gives the needed global obstruction. [quotetheorem:889] [citeproof:889] This argument is the first major payoff of working over $\mathbb C$. The assumption $X\ne\{0\}$ excludes a degenerate case: if $X=\{0\}$, then $I=0$, so $T-\lambda I=0$ is invertible (with inverse $0$) for every $\lambda$, and $\sigma(T)=\varnothing$. Complex scalars matter as well: over a real Banach space the spectrum can be empty, as rotation by $90^\circ$ on $\mathbb R^2$ has no real $\lambda$ for which $T-\lambda I$ fails to be invertible. The proof is therefore not only a Banach-space completeness argument; it combines scalarisation with Liouville's theorem, and that complex-analysis obstruction has no direct real analogue. To see the same ideas in a computable model, it helps to examine diagonal multiplication, where the resolvent estimate becomes a pointwise bound on reciprocal scalars.
[example: Multiplication by a Bounded Sequence] Let $a=(a_n)_{n\ge 1}\in \ell^\infty$ and define $M_a\in \mathcal{L}(\ell^p)$, $1\le p\le \infty$, by $(M_a x)_n=a_nx_n$. We compute the spectrum by showing that \begin{align*} \sigma(M_a)=\overline{\{a_n:n\ge 1\}}. \end{align*} First, if $\lambda\notin \overline{\{a_n:n\ge 1\}}$, then \begin{align*} d:=\inf_{n\ge 1}|a_n-\lambda|>0. \end{align*} Set $b_n=(a_n-\lambda)^{-1}$. Then $|b_n|\le d^{-1}$ for every $n$, so $b\in \ell^\infty$ and the diagonal multiplier $M_b$ is bounded on $\ell^p$. For $x\in \ell^p$ and each coordinate $n$, \begin{align*} ((M_a-\lambda I)M_bx)_n=(a_n-\lambda)b_nx_n=(a_n-\lambda)(a_n-\lambda)^{-1}x_n=x_n. \end{align*} Also, \begin{align*} (M_b(M_a-\lambda I)x)_n=b_n(a_n-\lambda)x_n=(a_n-\lambda)^{-1}(a_n-\lambda)x_n=x_n. \end{align*} Thus $M_b$ is a two-sided inverse of $M_a-\lambda I$, so $\lambda\in \rho(M_a)$. Conversely, let $\lambda\in \overline{\{a_n:n\ge 1\}}$. If $a_m=\lambda$ for some $m$, then the coordinate vector $e_m$ is nonzero and \begin{align*} (M_a-\lambda I)e_m=(a_m-\lambda)e_m=0e_m=0. \end{align*} Hence $M_a-\lambda I$ has a nontrivial kernel and is not invertible. If no coordinate equals $\lambda$, then there is a subsequence $(a_{n_k})$ with $a_{n_k}\to \lambda$. For each $k$, $\|e_{n_k}\|_p=1$, while \begin{align*} \|(M_a-\lambda I)e_{n_k}\|_p=\|(a_{n_k}-\lambda)e_{n_k}\|_p=|a_{n_k}-\lambda|\|e_{n_k}\|_p=|a_{n_k}-\lambda|. \end{align*} If $M_a-\lambda I$ had a bounded inverse $S$, then applying $S$ to $(M_a-\lambda I)e_{n_k}$ would give \begin{align*} 1=\|e_{n_k}\|_p=\|S(M_a-\lambda I)e_{n_k}\|_p\le \|S\|_{\mathcal{L}(\ell^p)}|a_{n_k}-\lambda|. \end{align*} The right-hand side tends to $0$, contradicting $1\le \|S\|_{\mathcal{L}(\ell^p)}|a_{n_k}-\lambda|$. Therefore every point in the closure belongs to $\sigma(M_a)$, and the spectrum is exactly the closure of the scalar values appearing on the diagonal. 
[/example] Diagonal multiplication is the model case where the spectrum is visible directly as a closure of scalar values. The next section explains why polynomial and rational expressions preserve this scalar intuition for every bounded operator. ## Spectral Mapping for Algebraic Functional Calculus Once an operator has a spectrum, the next question is how the spectrum changes when we form $p(T)$ or more generally $r(T)$. The guiding scalar principle is that applying a function to the operator should apply the same function to its spectral values. For polynomials this is a consequence of factorisation over $\mathbb C$. [quotetheorem:2671] [citeproof:2671] The theorem says that polynomial algebra in $T$ does not create spectral values by any mechanism other than applying the polynomial to old ones. The hypotheses are doing real work: nonzero complex Banach spaces give a nonempty spectrum, and the proof uses factorisation of $p(z)-\lambda$ into linear factors over $\mathbb C$. Over real scalars the same statement must be formulated with care, often by complexifying the space, because real polynomials need not split into real linear factors. The theorem also says only where the spectrum of $p(T)$ lies; it does not assert that $T$ has eigenvectors, that $p(T)$ is diagonalizable, or that spectral multiplicities are preserved. Nilpotent finite-dimensional blocks are a useful test case because their spectrum is small while their powers retain information about non-diagonal structure. [example: Nilpotent Jordan Blocks] Let $N \in \mathcal{L}(\mathbb C^m)$ be the nilpotent Jordan block with ones on the superdiagonal and zeros elsewhere. With respect to the standard basis $e_1,\dots,e_m$, $N$ sends $e_1$ to $0$ and $e_k$ to $e_{k-1}$ for $2\le k\le m$. Hence $N^je_k=e_{k-j}$ when $j<k$ and $N^je_k=0$ when $j\ge k$; in particular $N^m=0$. By the *Polynomial Spectral Mapping Theorem* applied to $p(z)=z^m$, \begin{align*} \sigma(N^m)=\{\mu^m:\mu\in\sigma(N)\}.
\end{align*} Since $N^m=0$, the operator $N^m-\lambda I=-\lambda I$ is invertible for $\lambda\ne 0$, while $N^m-0I=0$ is not invertible on $\mathbb C^m\ne\{0\}$. Thus $\sigma(N^m)=\{0\}$, so \begin{align*} \{\mu^m:\mu\in\sigma(N)\}=\{0\}. \end{align*} Every complex number satisfying $\mu^m=0$ is $\mu=0$, so $\sigma(N)\subseteq\{0\}$. Also $Ne_1=0$ and $e_1\ne 0$, so $N=N-0I$ is not invertible; therefore $0\in\sigma(N)$. Hence \begin{align*} \sigma(N)=\{0\}. \end{align*} For $\lambda\ne 0$, set \begin{align*} Q_\lambda=-\sum_{k=0}^{m-1}\lambda^{-k-1}N^k. \end{align*} Then, using $N^m=0$ and collecting the two finite sums, \begin{align*} (N-\lambda I)Q_\lambda=-\sum_{k=0}^{m-1}\lambda^{-k-1}N^{k+1}+\sum_{k=0}^{m-1}\lambda^{-k}N^k=I-\lambda^{-m}N^m=I. \end{align*} The same calculation on the other side gives \begin{align*} Q_\lambda(N-\lambda I)=-\sum_{k=0}^{m-1}\lambda^{-k-1}N^{k+1}+\sum_{k=0}^{m-1}\lambda^{-k}N^k=I-\lambda^{-m}N^m=I. \end{align*} Thus \begin{align*} (N-\lambda I)^{-1}=-\sum_{k=0}^{m-1}\lambda^{-k-1}N^k,\qquad \lambda\ne 0. \end{align*} So all spectral values have collapsed to the single point $0$, but the resolvent still records the nilpotent powers $I,N,\dots,N^{m-1}$; for $m\ge 2$, this is exactly the information lost by looking only at spectral location. [/example] Polynomials are not the end of the algebraic calculus. If a rational function has no pole on the spectrum of $T$, its denominator is invertible after substituting $T$, so the same question can be asked for quotients. [definition: Rational Functional Calculus] Let $X$ be a complex Banach space and let $T \in \mathcal{L}(X)$. Let \begin{align*} \mathcal{R}_{\sigma(T)}=\{r=p/q:p,q\in \mathbb C[z],\ q(\lambda)\ne 0 \text{ for every } \lambda\in\sigma(T)\} \end{align*} be the algebra of rational functions with no poles on $\sigma(T)$. 
The rational functional calculus for $T$ is the map \begin{align*} \Phi_T:\mathcal{R}_{\sigma(T)}\to \mathcal{L}(X), \qquad r\mapsto r(T), \end{align*} defined as follows: if $r=p/q$ with $q(\lambda)\ne 0$ for every $\lambda \in \sigma(T)$, then \begin{align*} r(T)=p(T)q(T)^{-1}. \end{align*} [/definition] The condition on $q$ is exactly the condition needed to make $q(T)$ invertible, by polynomial spectral mapping. This definition does not depend on the chosen representation. Suppose $p_1/q_1=p_2/q_2$ with $q_1,q_2$ nonvanishing on $\sigma(T)$. Then $p_1q_2=p_2q_1$ as polynomials, so $p_1(T)q_2(T)=p_2(T)q_1(T)$. Multiplying by $q_1(T)^{-1}q_2(T)^{-1}$, which commutes with every polynomial in $T$, gives $p_1(T)q_1(T)^{-1}=p_2(T)q_2(T)^{-1}$. [quotetheorem:8392] [citeproof:8392] Rational spectral mapping packages many resolvent identities into one statement. The no-poles-on-spectrum hypothesis is necessary because $q(\mu)=0$ for some $\mu\in\sigma(T)$ forces $0\in\sigma(q(T))$ by polynomial spectral mapping, so $q(T)$ is not invertible and $p(T)q(T)^{-1}$ is not defined in $\mathcal{L}(X)$. This is not a removable technicality: if $r(z)=1/z$ and $0\in\sigma(T)$, then applying $r$ would require an inverse for $T$, precisely what the spectral condition rules out. Taking $r(z)=(z-\alpha)^{-1}$, for example, describes the spectrum of the resolvent operator itself whenever $\alpha \in \rho(T)$. [example: Spectrum of a Resolvent] Let $X$ be a nonzero complex Banach space, let $T\in\mathcal{L}(X)$, and let $\alpha\in\rho(T)$. Consider the rational function $r(z)=(z-\alpha)^{-1}$, so $r=p/q$ with $p(z)=1$ and $q(z)=z-\alpha$. If $\lambda\in\sigma(T)$, then $\lambda\ne\alpha$ because $\alpha\in\rho(T)=\mathbb C\setminus\sigma(T)$, hence \begin{align*} q(\lambda)=\lambda-\alpha\ne 0. \end{align*} Thus $r$ has no pole on $\sigma(T)$, and the rational functional calculus gives \begin{align*} r(T)=p(T)q(T)^{-1}=I(T-\alpha I)^{-1}=(T-\alpha I)^{-1}. \end{align*} By the *[Rational Spectral Mapping Theorem](/theorems/8392)*, \begin{align*} \sigma(r(T))=r(\sigma(T)).
\end{align*} Substituting the two explicit expressions just computed gives \begin{align*} \sigma((T-\alpha I)^{-1})=\{r(\lambda):\lambda\in\sigma(T)\}=\{(\lambda-\alpha)^{-1}:\lambda\in\sigma(T)\}. \end{align*} So passing from $T$ to its resolvent at $\alpha$ shifts the spectral set by $-\alpha$ and then takes pointwise reciprocals. [/example] The chapter's tools will be used repeatedly in Chapters 3 through 10, especially when Hilbert-space geometry is added to the Banach-space resolvent theory. Neumann series control local invertibility, holomorphic resolvents import complex analysis, and spectral mapping lets algebraic expressions in an operator be read from the scalar geometry of its spectrum. This chapter has shown how resolvent estimates and complex analysis turn the spectrum from a definition into an effective tool. The next step is to place all of this inside Hilbert space, where the inner product supplies orthogonality, adjoints, and orthogonal decompositions, all of which are essential for the rest of the course. # 3. Hilbert-Space Preliminaries for Spectral Theory This chapter supplies the Hilbert-space language used throughout the spectral theory of bounded operators. Chapters 1 and 2 treated operators mostly as elements of the Banach algebra $\mathcal L(X)$; here the inner product adds geometry. The main questions are how an operator interacts with orthogonality, when subspaces split the space in a useful way, and how numerical information from inner products constrains the spectrum. ## Adjoint Operators and Distinguished Classes The Banach-space inverse problem asks whether $T-\lambda I$ is invertible. On a Hilbert space there is a second, geometric question: how does $T$ move angles, lengths, and orthogonal complements? The adjoint is the device that converts this question into an operator equation. [definition: Adjoint Operator] Let $H$ be a Hilbert space and let $T \in \mathcal{L}(H)$.
The adjoint of $T$ is the operator $T^* \in \mathcal{L}(H)$ such that \begin{align*} (Tx,y)_H=(x,T^*y)_H \end{align*} for all $x,y\in H$. [/definition] Existence and uniqueness use the [Riesz representation theorem](/theorems/221). For fixed $y$, the map $x \mapsto (Tx,y)_H$ is a bounded linear functional on $H$, so it is represented by a unique vector, which is $T^*y$. Linearity of $y\mapsto T^*y$ follows from this uniqueness, and boundedness follows from $|(Tx,y)_H|\le\|T\|_{\mathcal{L}(H)}\|x\|_H\|y\|_H$. The definition by itself only assigns an operator $T^*$ to a single bounded operator $T$. Spectral arguments will repeatedly pass from an operator to its adjoint inside sums, products, scalar multiples, and norm estimates. One must therefore know exactly how taking adjoints interacts with these operations; otherwise identities involving resolvents, quadratic forms, and composite operators would break as soon as one passed to adjoints. The required rules are the Hilbert-space analogues of the conjugate-transpose identities for matrices, and they are needed before introducing special operator classes such as self-adjoint and unitary operators. [quotetheorem:8393] [citeproof:8393] The boundedness hypothesis is essential here: without it the adjoint may fail to be defined on all of $H$, and domain questions replace these algebraic identities. A concrete model is the unbounded differentiation operator $Df=f'$ on $L^2(0,1)$ with domain $H^1_0(0,1)$. Its adjoint acts by $D^*g=-g'$, with $g'$ the weak derivative, on the larger Sobolev domain $H^1(0,1)$, so formulas such as $(ST)^*=T^*S^*$ require domain hypotheses before they even have a precise meaning.
The complex scalars explain the conjugates in the first formula; over a real Hilbert space the same statement loses the bars. The identity for $T^*T$ is stronger than a formal rule, because it turns an operator norm into a quadratic-form computation, which will be the main mechanism behind spectral estimates. These identities make adjoints behave like conjugate transposes, so the next task is to name the operator classes that replace familiar matrix classes. The most important one for spectral theory is the analogue of a Hermitian matrix. [definition: Self-Adjoint Operator] Let $H$ be a Hilbert space. An operator $T\in\mathcal{L}(H)$ is self-adjoint if $T=T^*$. [/definition] Self-adjoint operators are the infinite-dimensional replacement for Hermitian matrices, and later they will have real spectra and a functional calculus. A different geometric problem is to recognize operators that preserve the whole Hilbert-space geometry rather than a real-valued quadratic form. Length preservation alone does not force surjectivity in infinite dimension, so the unitary condition records both preservation of inner-product geometry and invertibility through the adjoint. [definition: Unitary Operator] Let $H$ be a Hilbert space. An operator $U\in\mathcal{L}(H)$ is unitary if $U^*U=UU^*=I$. [/definition] A unitary operator preserves the Hilbert-space geometry and is onto. To separate preservation of length from surjectivity, the course also uses the following weaker notion. [definition: Isometry] Let $H$ be a Hilbert space. An operator $V\in\mathcal{L}(H)$ is an isometry if \begin{align*} \|Vx\|_H=\|x\|_H \end{align*} for all $x\in H$. [/definition] An isometry need not be onto. In contrast, a unitary operator is precisely a surjective isometry in the Hilbert-space setting, and the missing surjectivity is responsible for different spectral behaviour of unilateral and bilateral shifts. [definition: Orthogonal Projection] Let $H$ be a Hilbert space. 
An operator $P\in\mathcal{L}(H)$ is an [orthogonal projection](/theorems/437) if $P^2=P$ and $P^*=P$. [/definition] The algebraic condition $P^2=P$ says that applying the projection twice changes nothing, while the adjoint condition says that the discarded component is orthogonal to the retained component. Idempotents without self-adjointness can project along an oblique direction, so they need not respect distances or orthogonal complements. Spectral theory also needs to detect when an operator behaves like a non-negative real number; an arbitrary operator may rotate phases so that $(Tx,x)_H$ is not even real. This motivates the next definition, which uses the inner product to record non-negativity of an operator through its quadratic form. [definition: Positive Operator] Let $H$ be a Hilbert space. An operator $T\in\mathcal{L}(H)$ is positive, written $T\geq 0$, if \begin{align*} (Tx,x)_H\geq 0 \end{align*} for all $x\in H$. [/definition] Positivity is a quadratic-form condition. On a complex Hilbert space it forces $(Tx,x)_H$ to be real for every $x$, and the polarization identity then forces $T=T^*$. The next class is broader: it includes self-adjoint and unitary operators, and it is broad enough to support the spectral theorem later in the course. The obstruction it removes is that a general operator can have different left and right Hilbert-space behaviour: $T^*T$ measures the size after applying $T$, while $TT^*$ measures the size after applying $T^*$. Normality asks these two measurements to agree as operators. [definition: Normal Operator] Let $H$ be a Hilbert space. An operator $N\in\mathcal{L}(H)$ is normal if \begin{align*} N^*N=NN^*. \end{align*} [/definition] Normality means exactly that $N$ commutes with its adjoint $N^*$. The first concrete model of these definitions is projection onto a closed subspace.
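In finite dimensions each of these definitions is a matrix identity that can be tested directly. The Python sketch below is an illustration under stated assumptions:

- operators on $\mathbb C^2$ are stored as nested lists;
- the inner product is linear in the first argument, so that $T^*$ is the conjugate transpose;
- all helper names are hypothetical.

It classifies a rotation, a nilpotent Jordan block, an oblique idempotent, a coordinate projection, and a Hermitian matrix, and it checks the rule $(ST)^*=T^*S^*$ on one pair of matrices.

```python
# Sketch: operators on C^n as square matrices (nested lists), with the standard
# inner product (x, y) = sum x_i * conj(y_i); helper names are illustrative.

def adjoint(A):
    """Conjugate transpose, the matrix of A^* for the standard inner product."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def close(A, B, tol=1e-12):
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) <= tol for i in range(n) for j in range(n))

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def is_self_adjoint(A):
    return close(A, adjoint(A))

def is_unitary(A):
    I, As = identity(len(A)), adjoint(A)
    return close(matmul(As, A), I) and close(matmul(A, As), I)

def is_normal(A):
    return close(matmul(adjoint(A), A), matmul(A, adjoint(A)))

def is_orthogonal_projection(P):
    return close(matmul(P, P), P) and is_self_adjoint(P)

rotation = [[0, -1], [1, 0]]       # rotation by 90 degrees: unitary and normal, not self-adjoint
jordan = [[0, 1], [0, 0]]          # nilpotent block: not normal
oblique = [[1, 1], [0, 0]]         # idempotent, but projects along a non-orthogonal direction
coordinate = [[1, 0], [0, 0]]      # orthogonal projection onto span{e_1}
hermitian = [[2, 1j], [-1j, 3]]    # self-adjoint

print(is_unitary(rotation), is_normal(jordan), is_orthogonal_projection(oblique))
```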
[example: Orthogonal Projection Onto a Closed Subspace] Let $M$ be a closed subspace of a Hilbert space $H$, and let $P_Mx$ be the nearest point in $M$ to $x$. The nearest-point condition gives the orthogonal decomposition \begin{align*} x=P_Mx+(x-P_Mx) \end{align*} with $P_Mx\in M$ and $x-P_Mx\in M^\perp$. This decomposition identifies the range and kernel. Since $P_Mx\in M$ for every $x\in H$, $\operatorname{Range}(P_M)\subset M$. Conversely, if $m\in M$, then \begin{align*} m=m+0 \end{align*} with $m\in M$ and $0\in M^\perp$, so $P_Mm=m$ and $m\in\operatorname{Range}(P_M)$. Hence $\operatorname{Range}(P_M)=M$. If $P_Mx=0$, then \begin{align*} x=0+x \end{align*} and the decomposition forces $x\in M^\perp$, so $\ker(P_M)\subset M^\perp$. If $x\in M^\perp$, then \begin{align*} x=0+x \end{align*} with $0\in M$ and $x\in M^\perp$, so $P_Mx=0$. Thus $\ker(P_M)=M^\perp$. Now $P_Mx\in M$, so applying $P_M$ again gives \begin{align*} P_M(P_Mx)=P_Mx \end{align*} for every $x\in H$, and therefore $P_M^2=P_M$. To check self-adjointness, write \begin{align*} x=P_Mx+x_\perp \end{align*} and \begin{align*} y=P_My+y_\perp \end{align*} where $x_\perp,y_\perp\in M^\perp$. Since $P_Mx,P_My\in M$, orthogonality gives \begin{align*} (P_Mx,y)_H=(P_Mx,P_My)_H+(P_Mx,y_\perp)_H=(P_Mx,P_My)_H \end{align*} and \begin{align*} (x,P_My)_H=(P_Mx,P_My)_H+(x_\perp,P_My)_H=(P_Mx,P_My)_H. \end{align*} Hence \begin{align*} (P_Mx,y)_H=(x,P_My)_H \end{align*} for all $x,y\in H$, so $P_M^*=P_M$. Therefore $P_M$ is an orthogonal projection onto $M$, with kernel exactly $M^\perp$. [/example] This example is the geometric source of the definition. The next section proves that every closed subspace of a Hilbert space arises this way. ## Orthogonal Decompositions and Reducing Subspaces Spectral theory often simplifies an operator by splitting the Hilbert space into pieces. The central problem is to know which subspaces admit an orthogonal complement and which splittings respect the operator. 
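The nearest-point description of $P_M$ can be made explicit when $M$ is one-dimensional. The Python sketch below assumes $H=\mathbb C^3$ with the inner product linear in the first argument, and $M=\operatorname{span}\{u\}$ for a hypothetical nonzero vector $u$. In that case the nearest point is $P_Mx=\frac{(x,u)_H}{(u,u)_H}u$. The code checks numerically that:

- $x-P_Mx\in M^\perp$;
- $P_M^2=P_M$;
- $(P_Mx,y)_H=(x,P_My)_H$;
- $P_Mx$ is at least as close to $x$ as a few other points of $M$.

The function names are illustrative only.

```python
import math

# Sketch: projection onto the one-dimensional closed subspace M = span{u} of C^n,
# with (x, y) = sum x_i * conj(y_i).  The helper names are illustrative.

def inner(x, y):
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

def project(x, u):
    """Nearest point of span{u} to x, namely ((x, u) / (u, u)) u, for u != 0."""
    c = inner(x, u) / inner(u, u)
    return [c * ui for ui in u]

u = [1, 1j, 0]
x = [2, 0, 1]
y = [0, 1, 1j]
px = project(x, u)
residual = [a - b for a, b in zip(x, px)]
print(abs(inner(residual, u)))                          # x - P_M x is orthogonal to M
print(abs(inner(px, y) - inner(x, project(y, u))))      # (P_M x, y) = (x, P_M y)
```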
[definition: Orthogonal Complement] Let $H$ be a Hilbert space and let $M\subset H$ be a subset. The orthogonal complement of $M$ is \begin{align*} M^\perp=\{x\in H:(x,m)_H=0\text{ for all }m\in M\}. \end{align*} [/definition] The set $M^\perp$ is always a closed linear subspace. What is not automatic is that $M$ and $M^\perp$ together account for every vector in $H$. Spectral arguments often need to split an arbitrary vector into a component lying in a chosen closed subspace and a component invisible to it, and this requires an existence and uniqueness theorem for orthogonal projections. [quotetheorem:241] [citeproof:241] This theorem is the Hilbert-space substitute for coordinates adapted to a subspace. Closedness is essential: a non-closed dense proper subspace has orthogonal complement $\{0\}$, so it cannot split $H$ as $M\oplus M^\perp$. Completeness enters through the minimizing sequence, which is Cauchy. Completeness of $H$ and closedness of $M$ place its limit in $M$, whereas in an incomplete inner-product space a nearest point need not exist. The theorem does not say that every algebraic complement is orthogonal; it produces the unique complement compatible with the inner product. To use such decompositions in operator theory, the next question is whether the operator maps a chosen summand back into itself. [definition: Invariant Subspace] Let $H$ be a Hilbert space and let $T\in\mathcal{L}(H)$. A closed subspace $M\subset H$ is invariant for $T$ if \begin{align*} T(M)\subset M. \end{align*} [/definition] Invariant subspaces permit restriction: if $M$ is invariant for $T$, then $T|_M\in\mathcal{L}(M)$. They do not by themselves give a block diagonal decomposition: the unilateral shift on $\ell^2(\mathbb N)$ leaves the closed span of $e_2,e_3,\dots$ invariant, but its orthogonal complement is not invariant. Thus a single invariant summand can still allow the operator to mix information from the complementary direction into the chosen subspace.
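The gap between an invariant subspace and a reducing one already appears in finite dimensions. The Python sketch below replaces the unilateral shift by its three-dimensional truncation, an assumption made only for illustration. The truncated map is nilpotent rather than isometric, but it still sends the span of the last two basis vectors into itself while moving the first basis vector out of its own span. Indices in the code are $0$-based, so code index $k$ corresponds to $e_{k+1}$ in the text, and the helper names are hypothetical.

```python
# Sketch: a finite-dimensional stand-in for the unilateral shift on C^3 (0-based
# indices): S e_0 = e_1, S e_1 = e_2, S e_2 = 0.  Names are illustrative.

def apply(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def basis(k, n):
    return [1 if i == k else 0 for i in range(n)]

def in_coordinate_span(x, support):
    """Does x lie in span{e_k : k in support}?"""
    return all(x[i] == 0 for i in range(len(x)) if i not in support)

def coordinate_subspace_invariant(A, support):
    """Check A(M) subset M for M = span{e_k : k in support}; basis vectors suffice by linearity."""
    n = len(A)
    return all(in_coordinate_span(apply(A, basis(k, n)), support) for k in support)

S = [[0, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]        # column k is S e_k
M, M_perp = {1, 2}, {0}
print(coordinate_subspace_invariant(S, M))        # span{e_1, e_2} is invariant
print(coordinate_subspace_invariant(S, M_perp))   # its complement span{e_0} is not
```

A diagonal matrix, by contrast, leaves both coordinate subspaces invariant, which is the situation the next definition isolates.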
The next definition asks for [invariance of the orthogonal complement](/theorems/2410) as well. [definition: Reducing Subspace] Let $H$ be a Hilbert space and let $T\in\mathcal{L}(H)$. A closed subspace $M\subset H$ reduces $T$ if both $M$ and $M^\perp$ are invariant for $T$. [/definition] Reducing subspaces are the correct Hilbert-space notion of decomposing an operator into independent summands. The definition is geometric, but checking both $M$ and $M^\perp$ directly can be awkward, especially when $M$ is given as the range of a projection. A usable criterion should express reduction in terms of the adjoint or the orthogonal projection $P_M$, because those are operators that can be manipulated inside the same algebra as $T$. [quotetheorem:8394] [citeproof:8394] This criterion is practical because projections are operators in the same algebra as $T$. Closedness is needed so that $P_M$ exists as a bounded orthogonal projection, and the adjoint condition is not cosmetic: invariance for $T$ alone does not force invariance of $M^\perp$. The theorem does not classify reducing subspaces; it only converts the question into adjoint-invariance or a commutator equation. That conversion is what will later let spectral projections automatically produce reducing subspaces for normal operators. The bilateral shift gives a useful test case where the geometry is exact but invariant subspaces remain restrictive. [example: Bilateral Shift] Let $H=\ell^2(\mathbb Z)$, with standard basis $(e_k)_{k\in\mathbb Z}$, and define $U:H\to H$ by $(Ux)_n=x_{n-1}$. For $x\in H$, \begin{align*} \|Ux\|_{\ell^2}^2=\sum_{n\in\mathbb Z}|(Ux)_n|^2=\sum_{n\in\mathbb Z}|x_{n-1}|^2=\sum_{m\in\mathbb Z}|x_m|^2=\|x\|_{\ell^2}^2. \end{align*} Thus $U$ preserves norms. Define $V:H\to H$ by $(Vx)_n=x_{n+1}$. Then \begin{align*} (UVx)_n=(Vx)_{n-1}=x_n. \end{align*} Also, \begin{align*} (VUx)_n=(Ux)_{n+1}=x_n. \end{align*} Hence $UV=VU=I$, so $U$ is invertible and $U^{-1}=V$. 
To identify the adjoint, take $x,y\in\ell^2(\mathbb Z)$. Reindexing the absolutely convergent inner-product series gives \begin{align*} (Ux,y)_{\ell^2}=\sum_{n\in\mathbb Z}x_{n-1}\overline{y_n}=\sum_{m\in\mathbb Z}x_m\overline{y_{m+1}}=(x,Vy)_{\ell^2}. \end{align*} Therefore $U^*=V$, so \begin{align*} (U^*x)_n=x_{n+1}. \end{align*} Since $U^*U=UU^*=I$, the operator $U$ is unitary. For a basis vector, $Ue_k=e_{k+1}$. Because $e_{k+1}$ is orthogonal to $e_k$, the one-dimensional subspace $\operatorname{span}\{e_k\}$ is not invariant under $U$. More generally, if $A\subset\mathbb Z$ and $H_A=\overline{\operatorname{span}}\{e_k:k\in A\}$, then $UH_A\subset H_A$ forces $k+1\in A$ whenever $k\in A$, since $Ue_k=e_{k+1}$. Invariance under $U^*$ forces $k-1\in A$ whenever $k\in A$, since $U^*e_k=e_{k-1}$. If $A$ is nonempty and $k\in A$, repeated use of these two implications gives $k+m\in A$ for every $m\in\mathbb Z$, so $A=\mathbb Z$. Hence the only coordinate subspaces invariant under both $U$ and $U^*$ are $\{0\}$ and all of $H$. The bilateral shift is geometrically perfect, because it is unitary, but its only coordinate reducing subspaces are the trivial ones; this is why shifts are useful tests for invariant-subspace intuition. [/example] The bilateral shift will return when the spectrum of unitary operators is discussed. It also shows why geometry alone does not make invariant-subspace questions simple. ## Numerical Range and Spectral Consequences The spectrum is defined through invertibility, but in Hilbert spaces inner products provide scalar tests of an operator. This is needed because invertibility is global: an operator can fail to have eigenvectors while still having spectral values detected by vectors on which it almost acts like scalar multiplication. The numerical range records all values obtained by compressing the operator to one-dimensional directions, and it gives quick spectral information for self-adjoint and normal operators.
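The index bookkeeping in the bilateral-shift example can be mirrored on finitely supported sequences, which are dense in $\ell^2(\mathbb Z)$. The Python sketch below stores such a sequence as a dictionary from indices to values, a representation chosen for illustration with hypothetical function names. It checks the following on sample vectors:

- $(Ux,y)_{\ell^2}=(x,Vy)_{\ell^2}$;
- $UV=VU=I$;
- norm preservation;
- $Ue_k=e_{k+1}$.

Since $U$ and $V$ are bounded, identities of this kind extend from the dense subspace by continuity. The code itself proves nothing about infinite sequences.

```python
import math

# Sketch: finitely supported elements of l^2(Z) stored as dicts {index: value}.
# The function names are illustrative.

def U(x):
    """Bilateral shift (Ux)_n = x_{n-1}: the entry at index m moves to m + 1."""
    return {m + 1: v for m, v in x.items()}

def V(x):
    """(Vx)_n = x_{n+1}: the entry at index m moves to m - 1; V = U^* = U^{-1}."""
    return {m - 1: v for m, v in x.items()}

def inner(x, y):
    """(x, y) = sum_n x_n * conj(y_n) over the common support."""
    return sum(v * complex(y[n]).conjugate() for n, v in x.items() if n in y)

def norm(x):
    return math.sqrt(sum(abs(v) ** 2 for v in x.values()))

x = {-2: 1 + 1j, 0: 3, 5: -2j}
y = {-1: 2, 1: 1j, 6: 1}
print(inner(U(x), y), inner(x, V(y)))     # the two inner products agree
print(U({3: 1}))                          # U e_3 = e_4
```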
[definition: Numerical Range] Let $H$ be a Hilbert space and let $T\in\mathcal{L}(H)$. The numerical range of $T$ is \begin{align*} W(T)=\{(Tx,x)_H:x\in H,\ \|x\|_H=1\}. \end{align*} [/definition] The numerical range need not be closed, but it captures the scalar shadow of $T$. For self-adjoint operators this shadow lies on the real line, yet the spectrum is defined by failure of invertibility and could a priori be a more complicated subset of $\mathbb C$. The key localization question is whether the inner-product symmetry is strong enough to exclude non-real spectral values altogether. [quotetheorem:552] [citeproof:552] The self-adjointness hypothesis is doing the whole localization work: without it, even a bounded operator on a complex Hilbert space can have non-real spectrum, as multiplication by $i$ on any nonzero $H$ has spectrum $\{i\}$. Complex scalars matter because the proof excludes every point with non-zero imaginary part; over real Hilbert spaces one usually complexifies before making spectral statements. The theorem does not assert that every real point in an interval is spectral, nor does it describe eigenvectors. It only gives the first geometric constraint on the spectrum, which the spectral theorem later refines into a functional model. The following multiplication operator is the model for self-adjoint operators with continuous spectrum rather than a basis of eigenvectors. [example: Multiplication by $x$ on $L^2$ of the Unit Interval] Let $H=L^2([0,1])$ and define $M_x:H\to H$ by $(M_xf)(t)=t f(t)$. First $M_x$ is bounded, because $0\leq t\leq 1$ on $[0,1]$, so \begin{align*} \|M_xf\|_{L^2}^2=\int_0^1 |t f(t)|^2\,d\mathcal L^1(t)=\int_0^1 t^2|f(t)|^2\,d\mathcal L^1(t)\leq \int_0^1 |f(t)|^2\,d\mathcal L^1(t)=\|f\|_{L^2}^2. \end{align*} For $f,g\in L^2([0,1])$, \begin{align*} (M_xf,g)_{L^2}=\int_0^1 t f(t)\overline{g(t)}\,d\mathcal L^1(t)=\int_0^1 f(t)\overline{t g(t)}\,d\mathcal L^1(t)=(f,M_xg)_{L^2}, \end{align*} since $t$ is real.
Hence $M_x^*=M_x$, so $M_x$ is self-adjoint. Also, \begin{align*} (M_xf,f)_{L^2}=\int_0^1 t|f(t)|^2\,d\mathcal L^1(t)\geq 0, \end{align*} because $t\geq 0$ and $|f(t)|^2\geq 0$ almost everywhere. Thus $M_x$ is positive. We now compute the spectrum. If $\lambda\notin[0,1]$, let $\delta=\operatorname{dist}(\lambda,[0,1])>0$. Then $|t-\lambda|\geq\delta$ for every $t\in[0,1]$, so the function $h(t)=(t-\lambda)^{-1}$ is bounded and $\|h\|_\infty\leq\delta^{-1}$. Define $R:L^2([0,1])\to L^2([0,1])$ by $(Rf)(t)=h(t)f(t)$. Then \begin{align*} \|Rf\|_{L^2}^2=\int_0^1 |h(t)f(t)|^2\,d\mathcal L^1(t)\leq \delta^{-2}\|f\|_{L^2}^2. \end{align*} Moreover, \begin{align*} ((M_x-\lambda I)Rf)(t)=(t-\lambda)\frac{f(t)}{t-\lambda}=f(t) \end{align*} and \begin{align*} (R(M_x-\lambda I)f)(t)=\frac{(t-\lambda)f(t)}{t-\lambda}=f(t). \end{align*} Thus $M_x-\lambda I$ is invertible, so $\lambda\notin\sigma(M_x)$. Conversely, fix $\lambda\in[0,1]$. For each $n$ choose an interval $I_n\subset[0,1]$ of positive measure such that $\lambda\in I_n$ and $|t-\lambda|\leq n^{-1}$ for every $t\in I_n$. Set \begin{align*} f_n=\frac{\mathbf 1_{I_n}}{\sqrt{\mathcal L^1(I_n)}}. \end{align*} Then $\|f_n\|_{L^2}=1$, and \begin{align*} \|(M_x-\lambda I)f_n\|_{L^2}^2=\frac{1}{\mathcal L^1(I_n)}\int_{I_n}|t-\lambda|^2\,d\mathcal L^1(t)\leq \frac{1}{n^2}. \end{align*} Hence $\|(M_x-\lambda I)f_n\|_{L^2}\leq n^{-1}$ while $\|f_n\|_{L^2}=1$. If $M_x-\lambda I$ were invertible, then $\|(M_x-\lambda I)f\|_{L^2}\geq \|(M_x-\lambda I)^{-1}\|^{-1}\|f\|_{L^2}$ for every $f$, contradicting the sequence $f_n$. Therefore $\lambda\in\sigma(M_x)$. Combining the two inclusions gives $\sigma(M_x)=[0,1]$: the spectral values are exactly the possible values of the multiplier $t$ on the unit interval. [/example] This example is the prototype for the multiplication-operator form of the spectral theorem. 
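The approximate eigenvectors $f_n$ in the example can be examined numerically. The Python sketch below is a hedged illustration. It chooses one admissible interval $I_n$ of length $1/n$ inside $[0,1]$ containing $\lambda$ (centred on $\lambda$ when possible) and approximates $\|(M_x-\lambda I)f_n\|_{L^2}$ with the midpoint rule. The function name and the number of sample points are assumptions. For a centred interval of length $h$ the exact value is $h/(2\sqrt3)$, which sits well below the bound $1/n$ used in the proof.

```python
import math

# Sketch: approximate eigenvectors of (M_x f)(t) = t f(t) on L^2([0, 1]).
# f_n = 1_{I_n} / sqrt(|I_n|), where I_n has length 1/n, lies in [0, 1] and contains lam.
# The integral is approximated by the midpoint rule; the names are illustrative.

def residual_norm(lam, n, samples=10_000):
    """Approximate ||(M_x - lam I) f_n||_{L^2} for 0 <= lam <= 1 and n >= 1."""
    h = 1.0 / n
    left = min(max(lam - h / 2, 0.0), 1.0 - h)    # keeps I_n = [left, left + h] inside [0, 1]
    step = h / samples
    integral = sum((left + (k + 0.5) * step - lam) ** 2 for k in range(samples)) * step
    return math.sqrt(integral / h)                # divide by |I_n| = h from the normalisation

for lam in (0.0, 0.3, 1.0):
    print(lam, [residual_norm(lam, n) for n in (1, 10, 100)])
```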
Self-adjoint operators need not have any genuine eigenvectors, as $M_x$ shows, but their spectral values still behave like the essential range of a real-valued function. [definition: Spectrum of a Unitary Direction] Let $H$ be a complex Hilbert space and let $U\in\mathcal{L}(H)$ be unitary. A spectral value of $U$ is a complex number $\lambda\in\sigma(U)$, where the spectrum is taken in the Banach algebra $\mathcal{L}(H)$. [/definition] The definition uses no new object; it fixes the setting for localization of unitary spectra. Since $U$ is an isometry, $\|U\|_{\mathcal{L}(H)}\le 1$, so the spectral-radius bound already places $\sigma(U)$ in the closed unit disc. The obstruction is the interior: an isometry that is not onto can have spectral values inside the disc, so unitarity must use invertibility through $U^*$ to force the opposite bound as well. [quotetheorem:8395] [citeproof:8395] The invertibility of $U$ is essential, since an isometry that is not onto can have spectrum filling the closed unit disc rather than just the unit circle. The norm preservation gives the outer bound. For the inner bound, $0\notin\sigma(U)$ because $U$ is invertible. If $0<|\lambda|<1$ and $\lambda\in\sigma(U)$, then $\lambda^{-1}\in\sigma(U^{-1})$ would lie outside the closed unit disc, which is impossible because $U^{-1}=U^*$ is also an isometry. The theorem does not say that every point of the unit circle belongs to the spectrum; a unitary operator on $\mathbb C^n$ has at most $n$ spectral values. Together with the real-spectrum theorem, this result explains the two most important spectral pictures: self-adjoint spectra live on the real line, and unitary spectra live on the unit circle. To prepare for the normal-operator spectral theorem, the next result records the basic norm identity that replaces order or unitarity. [quotetheorem:8396] [citeproof:8396] Normality is essential because the equality compares two different defects, one for $N-\lambda I$ and one for its adjoint. For the unilateral shift $S$ on $\ell^2(\mathbb N)$, $\|Se_1\|_{\ell^2}=1$ but $\|S^*e_1\|_{\ell^2}=0$, so the displayed equality already fails at $\lambda=0$.
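Both the unit-circle picture and the normal-operator norm identity can be probed in small cases. The Python sketch below assumes that the identity in the last theorem has the standard form $\|(N-\lambda I)x\|_H=\|(N^*-\bar\lambda I)x\|_H$. It performs three checks:

- it computes the eigenvalues of a $2\times 2$ rotation matrix, which is a unitary, hence normal, operator on $\mathbb C^2$;
- it compares the two defects for one choice of $\lambda$ and $x$;
- it evaluates $\|Se_1\|$ and $\|S^*e_1\|$ for the unilateral shift on finitely supported sequences stored as dictionaries.

All names are hypothetical, and the checks are numerical illustrations rather than proofs.

```python
import cmath
import math

# Sketch: finite-dimensional and finitely supported checks; all names are illustrative.

def rotation(theta):
    return [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]

def eigenvalues_2x2(A):
    """Roots of z^2 - (tr A) z + det A, computed with the quadratic formula."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def apply2(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]

def adjoint2(A):
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]

def minus_lam(A, lam):
    return [[A[0][0] - lam, A[0][1]], [A[1][0], A[1][1] - lam]]

def norm2(x):
    return math.sqrt(sum(abs(v) ** 2 for v in x))

def defects(A, lam, x):
    """Return (||(A - lam I) x||, ||(A^* - conj(lam) I) x||)."""
    lam = complex(lam)
    return (norm2(apply2(minus_lam(A, lam), x)),
            norm2(apply2(minus_lam(adjoint2(A), lam.conjugate()), x)))

def shift(x):
    """Unilateral shift on finitely supported x = {n: x_n}, n >= 1: (Sx)_{n+1} = x_n."""
    return {n + 1: v for n, v in x.items()}

def shift_adjoint(x):
    """(S^* x)_n = x_{n+1}; the entry at index 1 is discarded."""
    return {n - 1: v for n, v in x.items() if n > 1}

def seq_norm(x):
    return math.sqrt(sum(abs(v) ** 2 for v in x.values()))

R = rotation(0.7)
print([abs(z) for z in eigenvalues_2x2(R)])              # both moduli equal 1 up to rounding
print(defects(R, 0.3 + 0.2j, [1 + 2j, -0.5]))            # equal, since R is normal
print(seq_norm(shift({1: 1})), seq_norm(shift_adjoint({1: 1})))
```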
The conclusion is still only a norm identity, not a spectral theorem by itself, and it does not produce eigenvectors. Its force is that approximate eigenvector behaviour for a normal operator cannot be hidden on the adjoint side. This elementary identity is the seed of the normal-operator spectral theorem. It says that for normal operators, defects of invertibility are seen symmetrically by the operator and its adjoint, which prevents the residual spectrum from appearing. Once Hilbert-space geometry is available, the central questions shift from mere invertibility to how operators interact with orthogonality and adjoints. Chapter 4 uses that structure to study compact operators, which behave like finite-dimensional matrices in many spectral respects and provide the model case for the stronger spectral theorems to come. # 4. Compact Operators on Hilbert Space This chapter introduces the compact operators, the class of bounded operators that behave most like finite-dimensional matrices even on infinite-dimensional Hilbert spaces. The preceding chapters developed the spectrum through resolvents, holomorphic arguments, and spectral mapping; compactness now adds a strong geometric constraint. The central theme is that compact operators may have many eigenvalues, but away from $0$ their spectral behaviour is finite-dimensional. ## Compactness Through Total Boundedness The first question is how to recognise when a bounded operator has a genuinely finite-dimensional flavour. Boundedness only says that the image of the unit ball remains bounded, but in infinite-dimensional Hilbert space bounded sets need not have convergent subsequences. Compactness asks for much more: the image of the unit ball must be small enough to be covered by finitely many balls of every positive radius. [definition: Compact Operator] Let $H$ and $K$ be Hilbert spaces. A bounded linear operator $T \in \mathcal{L}(H,K)$ is compact if $T(\overline{B}(0,1))$ has compact closure in $K$. 
[/definition] This definition is stated through the unit ball, but in practice proofs usually produce bounded sequences rather than explicit finite covers or closures. To use compactness in contradiction arguments, one must know that every bounded sequence in the domain has an image subsequence converging in norm. The sequential criterion provides exactly that bridge from the topological definition to the form used in weak convergence and spectral arguments. [quotetheorem:4919] [citeproof:4919] The boundedness hypothesis on the sequence is essential: compactness controls only the image of bounded sets, and an unbounded sequence may have images escaping to infinity even under a compact operator. The Hilbert-space assumption is being used only through the metric completeness of the codomain and the norm topology; the result is really a Banach-space compactness criterion. The criterion is especially useful when a weakly or algebraically defined sequence is already known to be bounded: compactness then upgrades boundedness of the inputs into norm convergence of a subsequence of the outputs. Its limitation is just as important. The theorem is a recognition principle, not a construction principle: it tells us what compactness does to every bounded sequence, but it does not explain how to verify compactness without testing all such sequences. For example, diagonal operators on $\ell^2$ are easiest to prove compact by truncating their tails, not by directly chasing arbitrary subsequences. For approximation arguments we need to cover the image of the unit ball by finitely many small balls, so the next criterion rewrites compactness in the finite-covering language that makes those truncations visible. 
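The sequential criterion can be seen on the most basic bounded sequence, the standard unit vectors $(e_n)$ of $\ell^2$. The Python sketch below stores finitely supported vectors as dictionaries, an illustrative representation with hypothetical names. Distinct $e_n$ stay $\sqrt2$ apart, so the identity maps $(e_n)$ to a sequence with no norm-convergent subsequence. The diagonal operator with $a_n=1/n$, by contrast, sends $e_n$ to $e_n/n$, which tends to $0$. In line with the limitation described above, this tests one bounded sequence; it does not by itself prove compactness.

```python
import math

# Sketch: finitely supported vectors of l^2 stored as dicts {n: x_n}; names are illustrative.

def dist(x, y):
    keys = set(x) | set(y)
    return math.sqrt(sum(abs(x.get(k, 0) - y.get(k, 0)) ** 2 for k in keys))

def e(n):
    """Standard unit vector e_n."""
    return {n: 1.0}

def diag(a, x):
    """(D_a x)_n = a(n) x_n, with the diagonal given as a function a(n)."""
    return {n: a(n) * v for n, v in x.items()}

zero = {}
# The identity sends e_n to e_n, and distinct unit vectors stay sqrt(2) apart,
# so (e_n) has no norm-convergent subsequence.
print(dist(e(3), e(7)))
# With a(n) = 1/n the images D_a e_n = e_n / n converge to 0 in norm.
print([dist(diag(lambda n: 1 / n, e(n)), zero) for n in (1, 10, 1000)])
```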
[quotetheorem:8397] [citeproof:8397] Completeness of $K$ is the point that turns finite coverings at every scale into compact closure; in a non-[complete metric space](/page/Complete%20Metric%20Space), [total boundedness](/page/Total%20Boundedness) alone need not give compactness. The theorem also shows that compactness is stronger than boundedness: in an infinite-dimensional Hilbert space the closed unit ball is bounded but not totally bounded, since an orthonormal sequence $(e_n)$ satisfies $\|e_n-e_m\|_H=\sqrt2$ for $n\ne m$, so no ball of radius $1/2$ contains two of its terms and no finite family of such balls covers it. This finite-covering formulation is exactly what allows compactness to survive approximation in operator norm. Total boundedness shows why operators with finite-dimensional range are compact: finite-dimensional bounded sets admit finite coverings at every scale. To turn this observation into a reusable approximation method, we need a name for operators whose images live in some finite-dimensional subspace of the codomain. These will be the basic finite-dimensional models from which more complicated compact operators are built. [definition: Finite-Rank Operator] Let $H$ and $K$ be Hilbert spaces. An operator $F \in \mathcal{L}(H,K)$ has finite rank if $\operatorname{Range}(F)$ is finite-dimensional. [/definition] The definition isolates the strongest possible finite-dimensional behaviour for an operator. The remaining issue is whether having finite-dimensional range really forces the image of the infinite-dimensional unit ball to be compact after closure. Since the unit ball in the domain may be huge, the argument must pass through bounded subsets of a finite-dimensional range, where closed and bounded sets are compact. [quotetheorem:4891] [citeproof:4891] Boundedness of $F$ is needed here: finite-dimensional range by itself does not make an everywhere-defined linear map continuous if no boundedness is assumed. The finite-dimensionality assumption is also doing real work, because the identity on an infinite-dimensional Hilbert space maps the unit ball into a bounded set but not into a compact one.
Thus the theorem is not merely a source of examples; it identifies the exact finite-dimensional mechanism behind compactness: once the image of the unit ball is trapped in a finite-dimensional subspace, ordinary Heine-Borel compactness becomes available. Finite-rank operators alone are too restrictive for analysis, where natural compact operators usually have infinite-dimensional range. Integral operators, diagonal operators with entries tending to $0$, and smoothing operators are typically compact because they can be uniformly approximated by finite-dimensional pieces, not because their actual ranges are finite-dimensional. The point of the finite-rank theorem is therefore preparatory: it supplies the compact building blocks. The next result explains why taking operator-norm limits of those blocks does not leave the compact world. [quotetheorem:4892] [citeproof:4892] The convergence must be in operator norm; strong (pointwise) convergence of compact operators is not enough, since on $\ell^2$ the orthogonal projections onto $\operatorname{span}\{e_1,\ldots,e_N\}$ have finite rank and converge strongly to the identity, which is not compact. The theorem also uses compactness of every approximant, not merely boundedness, because bounded operators are already closed in operator norm. Its main use is to reduce compactness proofs to finite-rank approximation plus a uniform tail estimate. This closure theorem changes compactness from a topological definition into a usable approximation method. The diagonal operators on $\ell^2$ are the cleanest test case: compactness is exactly decay of the diagonal, because truncating the tail gives finite rank. [example: Compact Diagonal Operators on $\ell^2$] Let $a=(a_n)_{n\ge 1}$ be a bounded sequence of complex numbers, and define the diagonal operator $D_a$ on $\ell^2$ by $(D_ax)_n=a_nx_n$. [claim]The diagonal operator $D_a$ is compact if and only if $a_n\to 0$.[/claim] [proof]Let $M=\sup_{n\ge 1}|a_n|<\infty$. For $x=(x_n)\in\ell^2$, \begin{align*}\|D_ax\|_{\ell^2}^2=\sum_{n=1}^{\infty}|a_nx_n|^2\le M^2\sum_{n=1}^{\infty}|x_n|^2=M^2\|x\|_{\ell^2}^2.\end{align*} Thus $D_a$ is bounded.
Suppose first that $a_n\to 0$, and define \begin{align*}D_a^{(N)}x=(a_1x_1,\ldots,a_Nx_N,0,0,\ldots).\end{align*} The range of $D_a^{(N)}$ is contained in $\operatorname{span}\{e_1,\ldots,e_N\}$, so $D_a^{(N)}$ has finite rank and is compact by *[Finite-Rank Operators Are Compact](/theorems/4891)*. For $x\in\ell^2$, \begin{align*}\|(D_a-D_a^{(N)})x\|_{\ell^2}^2=\sum_{n>N}|a_nx_n|^2\le \left(\sup_{n>N}|a_n|\right)^2\sum_{n>N}|x_n|^2\le \left(\sup_{n>N}|a_n|\right)^2\|x\|_{\ell^2}^2.\end{align*} Hence $\|D_a-D_a^{(N)}\|\le \sup_{n>N}|a_n|$. Conversely, for each $m>N$, \begin{align*}\|(D_a-D_a^{(N)})e_m\|_{\ell^2}=|a_m|.\end{align*} Taking the supremum over $m>N$ gives $\|D_a-D_a^{(N)}\|\ge \sup_{m>N}|a_m|$, so \begin{align*}\|D_a-D_a^{(N)}\|=\sup_{n>N}|a_n|.\end{align*} Since $a_n\to 0$, this norm tends to $0$, and *Norm Limit of Compact Operators* implies that $D_a$ is compact. Conversely, suppose $D_a$ is compact. The standard basis $(e_n)$ is bounded because $\|e_n\|_{\ell^2}=1$ for every $n$. By *[Sequential Characterisation of Compact Operators](/theorems/4919)*, every subsequence of $(D_ae_n)=(a_ne_n)$ has a norm-convergent further subsequence. If $a_n$ did not tend to $0$, then there would be an $\varepsilon>0$ and indices $n_1<n_2<\cdots$ such that $|a_{n_j}|\ge\varepsilon$ for every $j$. For $j\ne k$, orthonormality of the basis gives \begin{align*}\|a_{n_j}e_{n_j}-a_{n_k}e_{n_k}\|_{\ell^2}^2=|a_{n_j}|^2+|a_{n_k}|^2\ge 2\varepsilon^2.\end{align*} Thus the subsequence $(a_{n_j}e_{n_j})$ has no Cauchy subsequence, hence no norm-convergent subsequence, contradicting compactness. Therefore $a_n\to 0$.[/proof] This identifies compactness of a diagonal operator with decay of its diagonal entries: truncating the diagonal gives the finite-dimensional approximations, and compactness forces the images of the basis vectors to vanish in norm. [/example] The diagonal example shows the general pattern: compactness is finite-dimensional approximation plus a tail estimate. 
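A quick numerical check of the truncation estimate may help. The sketch below is only a finite-dimensional model: it assumes that an element of $\ell^2$ can be represented by a Python list of fixed length `dim`, and the helper names (`tail_sup`, `tail_apply`, `basis_vector`) are hypothetical. For $a_n=1/n$ it compares $\|(D_a-D_a^{(N)})x\|_{\ell^2}$ with $\sup_{n>N}|a_n|\,\|x\|_{\ell^2}$ on random vectors and evaluates the tail operator on $e_{N+1}$; the constant sequence $a_n=1$ gives a tail supremum that never decreases.

```python
import math
import random

# Finite-dimensional model (assumption): an element of l^2 is represented by a
# Python list of length `dim`; list index i stands for the coordinate n = i + 1.

def tail_sup(a, N):
    """Return sup_{n > N} |a_n|, the value of ||D_a - D_a^{(N)}|| from the example."""
    return max((abs(v) for v in a[N:]), default=0.0)

def tail_apply(a, N, x):
    """Return (D_a - D_a^{(N)}) x: zero out coordinates 1..N, scale the rest by a_n."""
    return [0.0 if i < N else a[i] * x[i] for i in range(len(x))]

def l2_norm(x):
    """Euclidean (l^2) norm of a finite list."""
    return math.sqrt(sum(abs(v) ** 2 for v in x))

def basis_vector(m, dim):
    """Standard basis vector e_m (1-based index m), truncated to length dim."""
    e = [0.0] * dim
    e[m - 1] = 1.0
    return e

dim = 500
decaying = [1.0 / n for n in range(1, dim + 1)]  # a_n = 1/n, tends to 0
constant = [1.0] * dim                            # a_n = 1, no decay

rng = random.Random(0)  # fixed seed so the check is reproducible
for N in (1, 10, 100):
    bound = tail_sup(decaying, N)
    # Upper bound: ||(D_a - D_a^{(N)}) x|| <= sup_{n>N} |a_n| * ||x|| on random x.
    for _ in range(20):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        assert l2_norm(tail_apply(decaying, N, x)) <= bound * l2_norm(x) + 1e-12
    # Lower bound: here |a_n| is decreasing, so e_{N+1} realises the supremum.
    attained = l2_norm(tail_apply(decaying, N, basis_vector(N + 1, dim)))
    print(N, bound, attained, tail_sup(constant, N))
```

Because this particular $a_n$ is decreasing, $e_{N+1}$ realises the supremum exactly; for a general sequence one would instead use basis vectors $e_m$, $m>N$, with $|a_m|$ close to the tail supremum, as in the proof.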
The integral-operator analogue replaces diagonal truncation by kernel approximation: we encode an operator by a square-integrable kernel whose approximations by finite sums of separated kernels $u(x)v(y)$ play the role of diagonal truncations. [definition: Hilbert-Schmidt Integral Operator] Let $(X,\mathcal E,\mu)$ be a $\sigma$-finite measure space and let $k \in L^2(X\times X,\mu\otimes\mu)$. The Hilbert-Schmidt integral operator associated to $k$ is the bounded operator $T_k:L^2(X)\to L^2(X)$ given by \begin{align*} (T_k f)(x)=\int_X k(x,y)f(y)\,d\mu(y) \end{align*} for every $f\in L^2(X)$ and a.e. $x\in X$. [/definition] The definition packages an integral kernel as an operator on $L^2$. To prove compactness, we need to connect $L^2$ approximation of the kernel with operator-norm approximation of the induced operator. [quotetheorem:4896] [citeproof:4896] The $L^2$ hypothesis on the kernel is sufficient rather than necessary; there are compact integral operators whose kernels are not square-integrable in this sense. Sigma-finiteness is a standard measurability and approximation assumption ensuring that finite sums of separated functions approximate kernels in $L^2(X\times X)$. The result connects compact operator theory with Fourier analysis, PDE, and probability, where smoothing or averaging kernels often produce compactness. The theorem applies to many familiar smoothing integral operators. The Volterra operator is a useful example because its compactness comes from an $L^2$ kernel even though its spectral behaviour is very different from a diagonal operator with visible eigenvectors. [example: Volterra-Type Hilbert-Schmidt Operator] On $L^2(0,1)$ define \begin{align*}(Tf)(x)=\int_0^x f(y)\,dy.\end{align*} This is the integral operator with kernel $k(x,y)=\mathbf{1}_{\{0\le y\le x\le 1\}}$, because for a.e.
$x\in(0,1)$, \begin{align*}\int_0^1 k(x,y)f(y)\,dy=\int_0^1 \mathbf{1}_{\{0\le y\le x\le 1\}}f(y)\,dy=\int_0^x f(y)\,dy.\end{align*} Moreover, \begin{align*}\|k\|_{L^2((0,1)^2)}^2=\int_0^1\int_0^1 \mathbf{1}_{\{0\le y\le x\le 1\}}\,dy\,dx=\int_0^1\int_0^x 1\,dy\,dx=\int_0^1 x\,dx=\frac{1}{2}.\end{align*} Thus $k\in L^2((0,1)^2)$, and *Hilbert-Schmidt Operators Are Compact* implies that $T$ is compact. This compact operator has no nonzero eigenvalues. Suppose $\lambda\ne 0$ and $Tf=\lambda f$ for some $f\in L^2(0,1)$. Set $g=Tf$. The function $g$ is absolutely continuous, $g(0)=0$, and by the fundamental theorem for Lebesgue integrals, $g'(x)=f(x)$ for a.e. $x\in(0,1)$. Since $g=\lambda f$ a.e., we get \begin{align*}g'(x)=\lambda^{-1}g(x)\quad\text{for a.e. }x\in(0,1).\end{align*} For $h(x)=e^{-x/\lambda}g(x)$, the product rule for absolutely continuous functions gives \begin{align*}h'(x)=e^{-x/\lambda}g'(x)-\lambda^{-1}e^{-x/\lambda}g(x)=0\quad\text{for a.e. }x\in(0,1).\end{align*} Since $h$ is absolutely continuous on $[0,1]$, as a product of absolutely continuous functions, and $h'=0$ a.e., it is constant on $[0,1]$. As $h(0)=g(0)=0$, we get $h=0$, then $g=0$, and finally $f=\lambda^{-1}g=0$. Therefore no $\lambda\ne 0$ is an eigenvalue of $T$. The Volterra operator shows that compactness can occur without producing any nonzero point spectrum; additional symmetry assumptions are needed for the richer eigenvalue theory developed later. [/example] Sobolev-type inclusions give a third model: compactness comes from extra smoothness controlling oscillation. In this course we use a sequence-space model that captures the same mechanism without invoking the full [Rellich-Kondrachov theorem](/theorems/64). [example: Compact Sobolev-Type Inclusion] For real numbers $s>t$, define \begin{align*}H^s_{\mathrm{seq}}=\left\{x=(x_n)_{n\ge 1}: \sum_{n=1}^{\infty}(1+n^2)^s|x_n|^2<\infty\right\},\qquad \|x\|_{H^s_{\mathrm{seq}}}^2=\sum_{n=1}^{\infty}(1+n^2)^s|x_n|^2,\end{align*} and define $H^t_{\mathrm{seq}}$ and its norm in the same way. We show that the inclusion $J:H^s_{\mathrm{seq}}\to H^t_{\mathrm{seq}}$, $Jx=x$, is compact.
First, $J$ is well-defined and bounded: for $x\in H^s_{\mathrm{seq}}$, \begin{align*}\|Jx\|_{H^t_{\mathrm{seq}}}^2=\sum_{n=1}^{\infty}(1+n^2)^t|x_n|^2=\sum_{n=1}^{\infty}(1+n^2)^{t-s}(1+n^2)^s|x_n|^2.\end{align*} Since $s>t$, we have $t-s<0$, so $(1+n^2)^{t-s}\le 1$ for every $n\ge 1$. Hence \begin{align*}\|Jx\|_{H^t_{\mathrm{seq}}}^2\le \sum_{n=1}^{\infty}(1+n^2)^s|x_n|^2=\|x\|_{H^s_{\mathrm{seq}}}^2.\end{align*} Define unitary maps (surjective isometries) $U_s:H^s_{\mathrm{seq}}\to \ell^2$ and $U_t:H^t_{\mathrm{seq}}\to \ell^2$ by \begin{align*}(U_sx)_n=(1+n^2)^{s/2}x_n,\qquad (U_tx)_n=(1+n^2)^{t/2}x_n.\end{align*} Indeed, \begin{align*}\|U_sx\|_{\ell^2}^2=\sum_{n=1}^{\infty}|(1+n^2)^{s/2}x_n|^2=\sum_{n=1}^{\infty}(1+n^2)^s|x_n|^2=\|x\|_{H^s_{\mathrm{seq}}}^2,\end{align*} and $U_s$ is onto because every $z\in\ell^2$ equals $U_sx$ for $x_n=(1+n^2)^{-s/2}z_n$, which lies in $H^s_{\mathrm{seq}}$ by the same computation; the same argument applies to $U_t$. The conjugated operator $A=U_tJU_s^{-1}$ on $\ell^2$ is diagonal. If $z\in\ell^2$ and $x=U_s^{-1}z$, then $x_n=(1+n^2)^{-s/2}z_n$, so \begin{align*}(Az)_n=(U_tJx)_n=(1+n^2)^{t/2}x_n=(1+n^2)^{t/2}(1+n^2)^{-s/2}z_n=(1+n^2)^{(t-s)/2}z_n.\end{align*} Thus $A$ is the diagonal operator with diagonal entries \begin{align*}a_n=(1+n^2)^{(t-s)/2}=\frac{1}{(1+n^2)^{(s-t)/2}}.\end{align*} Because $s-t>0$, the denominator tends to $\infty$, so $a_n\to 0$. By *Compact Diagonal Operators on $\ell^2$*, $A$ is compact. Since $J=U_t^{-1}AU_s$, and composing a compact operator with bounded operators on either side gives a compact operator, the inclusion $J:H^s_{\mathrm{seq}}\to H^t_{\mathrm{seq}}$ is compact. The compactness comes from the decay factor $(1+n^2)^{(t-s)/2}$: high-frequency coordinates are increasingly damped when vectors controlled in the stronger $H^s_{\mathrm{seq}}$ norm are measured in the weaker $H^t_{\mathrm{seq}}$ norm. [/example] ## Compactness and Weak Convergence The next question is how compactness interacts with Hilbert-space weak convergence. Weak convergence retains all scalar testing information and, as shown below, also forces boundedness, while compactness turns boundedness of the inputs into norm-convergent subsequences of the images.
The key result is that a compact operator upgrades weak convergence in the domain to norm convergence in the codomain. [definition: Weak Convergence in Hilbert Space] Let $H$ be a Hilbert space. A sequence $(x_n)$ in $H$ converges weakly to $x\in H$, written $x_n\rightharpoonup x$, if $(x_n,y)_H\to (x,y)_H$ for every $y\in H$. [/definition] Weak convergence gives convergence only after testing against fixed vectors, so it does not immediately look like a compactness hypothesis. Compactness can only be applied to bounded sets, and the definition of weak convergence does not visibly include a bound on the norms $\|x_n\|_H$ that is uniform in $n$. The first issue is therefore to prove that pointwise convergence of all scalar tests prevents the sequence itself from escaping to infinity in norm. [quotetheorem:983] [citeproof:983] The theorem is a Hilbert-space instance of a general Banach-space principle using uniform boundedness; weak convergence is never allowed to have unbounded norm growth. The conclusion would fail if convergence were tested only against a small subset of vectors, since uniform boundedness requires pointwise control on the whole dual family. This boundedness result is exactly the entry point for applying compactness to weakly convergent sequences. Boundedness lets compactness produce norm-convergent subsequences of the image. That alone is not enough to prove convergence of the full image sequence, because different subsequences could conceivably have different norm limits. The Hilbert-space scalar tests identify any possible subsequential limit as $Tx$, forcing compactness to upgrade weak convergence $x_n\rightharpoonup x$ into norm convergence of $Tx_n$. [quotetheorem:4895] [citeproof:4895] Compactness of $T$ is essential: a general bounded operator maps weakly convergent sequences to weakly convergent sequences, but not necessarily to norm-convergent ones, as the identity on an infinite-dimensional Hilbert space shows.
The Hilbert structure enters when identifying the norm limit by testing against all $z\in K$ through the adjoint $T^*$. This theorem is one of the main bridges between geometric compactness and variational methods, where weakly convergent minimizing sequences become strongly convergent after applying compact embeddings or compact solution operators. It also serves as a diagnostic for non-compactness: if an operator maps some weakly null sequence to a sequence that does not converge to $0$ in norm, it cannot be compact. [example: Identity Operator Is Not Compact in Infinite Dimension] Let $H$ be an infinite-dimensional Hilbert space, and choose an orthonormal sequence $(e_n)$. We first verify that $e_n\rightharpoonup 0$. Fix $y\in H$. By [Bessel's inequality](/theorems/540) applied to the orthonormal sequence $(e_n)$, \begin{align*}\sum_{n=1}^{\infty}|(y,e_n)_H|^2\le \|y\|_H^2.\end{align*} The series on the left converges, so its terms satisfy $|(y,e_n)_H|^2\to 0$. Hence $|(y,e_n)_H|\to 0$, and since $|(e_n,y)_H|=|(y,e_n)_H|$, we get \begin{align*}(e_n,y)_H\to 0.\end{align*} Because this holds for every $y\in H$, the sequence $(e_n)$ converges weakly to $0$. Now suppose, for contradiction, that the identity operator $I\in\mathcal L(H)$ is compact. Since $e_n\rightharpoonup 0$, *Compact Operators Send Weak Convergence to Norm Convergence* gives \begin{align*}Ie_n\to I0\quad\text{in norm}.\end{align*} But $Ie_n=e_n$ and $I0=0$, so this says $\|e_n\|_H\to 0$. Orthonormality gives \begin{align*}\|e_n\|_H^2=(e_n,e_n)_H=1\end{align*} for every $n$, so $\|e_n\|_H=1$ for every $n$, a contradiction. Therefore the identity operator on an infinite-dimensional Hilbert space is not compact. [/example] The same idea detects the non-compactness of the unilateral shift and of many multiplication operators. It also explains why compactness is rare among operators that move infinitely many orthogonal directions without decay.
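The contrast between the identity, the unilateral shift $S(x_1,x_2,\dots)=(0,x_1,x_2,\dots)$ and a compact diagonal operator can be seen numerically. The following sketch assumes that $\ell^2$ is truncated to lists of length `DIM`, with indices kept well below `DIM`; the test vector $y_k=1/k$ and the diagonal entries $a_k=1/k$ are illustrative choices, not taken from the text. The pairing $|(e_n,y)_{\ell^2}|$ decays, reflecting $e_n\rightharpoonup 0$, while $\|Ie_n\|$ and $\|Se_n\|$ stay equal to $1$ and only $\|D_ae_n\|$ tends to $0$.

```python
import math

# Finite model (assumption): sequences in l^2 are lists of length DIM, and only
# basis vectors e_n with n well below DIM are used, so truncation does not matter.
DIM = 2000

def e(n):
    """Standard basis vector e_n (1-based)."""
    v = [0.0] * DIM
    v[n - 1] = 1.0
    return v

def inner(x, y):
    """(x, y) = sum_k x_k * conj(y_k): linear in x, conjugate-linear in y."""
    return sum(p * complex(q).conjugate() for p, q in zip(x, y))

def norm(x):
    return math.sqrt(sum(abs(v) ** 2 for v in x))

def shift(x):
    """Unilateral shift S(x_1, x_2, ...) = (0, x_1, x_2, ...), truncated to DIM."""
    return [0.0] + x[:-1]

def diag(a, x):
    """Diagonal operator (D_a x)_n = a(n) x_n, with the entries given by a function a."""
    return [a(n + 1) * v for n, v in enumerate(x)]

y = [1.0 / k for k in range(1, DIM + 1)]  # illustrative fixed vector in l^2

for n in (1, 10, 100, 1000):
    print(n,
          abs(inner(e(n), y)),                   # (e_n, y) -> 0, so e_n -> 0 weakly
          norm(e(n)),                            # ||I e_n|| = 1 for every n
          norm(shift(e(n))),                     # ||S e_n|| = ||e_{n+1}|| = 1
          norm(diag(lambda k: 1.0 / k, e(n))))   # ||D_a e_n|| = |a_n| = 1/n -> 0
```

Only the last column tends to $0$, which is exactly the behaviour that the weak-to-norm theorem demands of a compact operator.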
[example: Unilateral Shift Is Not Compact] On $\ell^2$, define $S(x_1,x_2,x_3,\dots)=(0,x_1,x_2,\dots)$, so that $S(e_n)=e_{n+1}$ on the standard orthonormal basis. We show that $S$ is not compact by testing it on the weakly null sequence $(e_n)$. First verify that $e_n\rightharpoonup 0$. Fix $y=(y_k)_{k\ge 1}\in\ell^2$. Since $(e_n,y)_{\ell^2}=\overline{y_n}$ and $\sum_{k=1}^{\infty}|y_k|^2<\infty$, the terms of this convergent nonnegative series satisfy $|y_n|^2\to 0$. Hence $y_n\to 0$, so \begin{align*}(e_n,y)_{\ell^2}\to 0.\end{align*} Because this holds for every $y\in\ell^2$, we have $e_n\rightharpoonup 0$. If $S$ were compact, *Compact Operators Send Weak Convergence to Norm Convergence* would imply $Se_n\to S0$ in $\ell^2$ norm. But $S0=0$ and $Se_n=e_{n+1}$, so \begin{align*}\|Se_n-S0\|_{\ell^2}=\|e_{n+1}\|_{\ell^2}.\end{align*} Orthonormality gives \begin{align*}\|e_{n+1}\|_{\ell^2}^2=(e_{n+1},e_{n+1})_{\ell^2}=1,\end{align*} so $\|e_{n+1}\|_{\ell^2}=1$ for every $n$. Thus $Se_n$ does not converge to $0$ in norm, contradicting compactness. Therefore the unilateral shift is not compact. The obstruction is that the shift moves the orthonormal basis without decay: the vectors $Se_n$ remain unit vectors, unlike compact diagonal operators where $\|D_ae_n\|_{\ell^2}=|a_n|\to 0$. [/example] ## Spectral Restrictions for Compact Operators The last question is how compactness reshapes the spectrum. In general Banach-space spectral theory, the spectrum of a bounded operator can be a complicated compact subset of $\mathbb C$. For compact operators, every nonzero spectral value is forced to be an eigenvalue, and such eigenvalues cannot accumulate away from $0$. We first need the geometric obstruction behind the result. [Riesz's lemma](/theorems/1222) says that if $Y$ is a proper closed subspace of a normed space $X$, then for every $\theta\in(0,1)$ there is a unit vector $x\in X$ with $\operatorname{dist}(x,Y)\ge\theta$; applied along a strictly increasing chain of closed subspaces of an infinite-dimensional space, it produces infinitely many unit vectors that are uniformly separated. [quotetheorem:1222] [citeproof:1222] The lemma is a compactness detector rather than just a geometric separation statement.
It explains why a compact operator cannot keep producing new unit directions that stay uniformly separated after applying the operator. That obstruction is exactly what is needed in spectral theory: if a nonzero spectral value were not an eigenvalue, the failure of invertibility would have to persist without producing a kernel vector, creating an infinite-dimensional chain on which compactness is incompatible with a lower bound. The next formal result converts this geometric obstruction into the spectral conclusion that nonzero spectral points of a compact operator must be genuine eigenvalues. [quotetheorem:4897] [citeproof:4897] The condition $\lambda\ne 0$ cannot be removed: the Volterra operator is compact, so $0\in\sigma(T)$ because a compact operator on an infinite-dimensional space is never invertible, yet $0$ is not an eigenvalue, since $Tf=0$ forces the continuous function $x\mapsto\int_0^x f(y)\,dy$ to vanish identically, and hence $f=0$ a.e. Complex scalars are assumed so that the usual spectral theory gives a nonempty compact spectrum and the spectral alternatives are stated over $\mathbb C$. This theorem is the first point where compact operators begin to look like infinite-dimensional matrices with a discrete nonzero spectrum. The theorem says that the nonzero [spectrum of a compact operator](/theorems/220) is point spectrum. It remains to rule out infinite-dimensional eigenspaces, since an infinite-dimensional eigenspace would make the operator act like a nonzero multiple of the identity on too large a subspace. [quotetheorem:220] [citeproof:220] The nonzero assumption is again essential, because $\ker T$ may be infinite-dimensional for a compact operator: the orthogonal projection of an infinite-dimensional Hilbert space onto a finite-dimensional subspace has finite rank, and its kernel is the infinite-dimensional orthogonal complement. Compactness is used only on the unit ball of the eigenspace, where $T$ acts as a scalar multiple of the identity; this isolates exactly why infinite-dimensional eigenspaces are impossible away from $0$. Finite-dimensional eigenspaces control the local size of each spectral subspace, but they do not yet describe the shape of the whole nonzero spectrum.
The next issue is whether compactness also forces the different nonzero spectral values themselves to thin out. Finite multiplicity rules out large eigenspaces at a single nonzero spectral value, but it does not by itself prevent infinitely many distinct nonzero eigenvalues from crowding around another nonzero number. Such a cluster would give infinitely many nearly independent spectral directions on which the operator still acts with size bounded away from zero, and compactness cannot preserve that much separated information away from $0$. The following theorem rules out exactly this obstruction and turns the local eigenspace information into a discreteness statement for the nonzero spectrum. [quotetheorem:4923] [citeproof:4923] This result is the promised global compactness statement. Distinctness of the spectral values matters: a constant sequence of one nonzero eigenvalue says nothing about accumulation, while a sequence of genuinely different nonzero spectral values would represent new spectral directions that do not disappear in norm.
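The finite-multiplicity and accumulation statements can be tested exactly on a diagonal operator. This sketch uses the hypothetical sequence $a_n=1/\lceil n/2\rceil$, so every non-zero value occurs twice, together with exact rational arithmetic from Python's `fractions` module. It assumes, as holds for this particular sequence, that $|a_n|$ is non-increasing, so the cutoff $N$ with $|a_n|<r$ for all $n\ge N$ can be found by scanning to the first such index.

```python
from fractions import Fraction

def a(n):
    """Hypothetical diagonal entries a_n = 1/ceil(n/2): 1, 1, 1/2, 1/2, 1/3, ..."""
    return Fraction(1, (n + 1) // 2)

def cutoff(r):
    """Smallest N with |a_n| < r for all n >= N (r > 0).

    Assumes (true for this a) that |a_n| is non-increasing, so the first
    index with |a_n| < r already works for every later index.
    """
    n = 1
    while abs(a(n)) >= r:
        n += 1
    return n

def eigenspace_indices(lam):
    """Indices n with a_n = lam (lam != 0); these e_n span ker(D_a - lam I)."""
    N = cutoff(abs(lam) / 2)  # for n >= N, |a_n| < |lam|/2, so a_n != lam
    return [n for n in range(1, N) if a(n) == lam]

def values_near(mu):
    """Distinct values a_n with |a_n - mu| < |mu|/2 (mu != 0).

    Any such value has |a_n| > |mu|/2, so it must come from an index n < N.
    """
    N = cutoff(abs(mu) / 2)
    return sorted({a(n) for n in range(1, N) if abs(a(n) - mu) < abs(mu) / 2})

print(eigenspace_indices(Fraction(1, 2)))  # [3, 4]
print(eigenspace_indices(Fraction(2, 5)))  # [] : 2/5 is not an eigenvalue
print(values_near(Fraction(2, 5)))
```

Here $\lambda=1/2$ has the two-dimensional eigenspace spanned by $e_3,e_4$, and near $\mu=2/5$ only the three values $1/4,1/3,1/2$ occur, so $\mu$ is not an accumulation point.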
Compactness is what forbids infinitely many separated finite-dimensional spectral directions from remaining visible away from $0$; without compactness, the unilateral shift has spectrum filling the closed unit disk, and multiplication operators can have whole intervals of spectrum. The theorem does not say that $0$ must be an eigenvalue, nor does it say that every nonzero eigenvalue occurs only once; it says that the nonzero spectral data can only accumulate at the one value where compact operators are allowed to lose information. A typical model is a compact diagonal operator on $\ell^2$ with diagonal entries tending to $0$: the nonzero diagonal values may form an infinite list, but they can only pile up at $0$. This example also shows the limitation of the theorem. Compactness gives discreteness away from $0$, not a finite spectrum and not, by itself, an orthogonal expansion. Those stronger structural conclusions require the symmetry hypotheses introduced in the self-adjoint and normal spectral theorems later in the note. Together these results give the basic spectral picture for compact operators. Away from $0$, the spectrum is a discrete list of eigenvalues, each with finite-dimensional eigenspace; any infinite list must tend to $0$. This is the qualitative compact-operator theorem before symmetry is imposed. When self-adjointness or normality is added later, the same discreteness can be combined with orthogonality and expansion formulas; without those extra hypotheses, compactness controls size and accumulation but not an orthonormal eigenbasis. [example: Spectrum of a Compact Diagonal Operator] Let $D_a\in\mathcal L(\ell^2)$ be given by $(D_ax)_n=a_nx_n$, with $a_n\to 0$. For each standard basis vector $e_m$, \begin{align*}D_ae_m=a_me_m.\end{align*} Thus every nonzero value $a_m$ is a nonzero eigenvalue of $D_a$, with eigenvector $e_m$. Conversely, suppose $\lambda\ne 0$ and $D_ax=\lambda x$ for some nonzero $x=(x_n)\in\ell^2$. 
Equality of coordinates gives \begin{align*}a_nx_n=\lambda x_n\end{align*} for every $n$, hence \begin{align*}(a_n-\lambda)x_n=0\end{align*} for every $n$. Therefore $x_n=0$ whenever $a_n\ne\lambda$, so \begin{align*}x=\sum_{\{n:a_n=\lambda\}}x_ne_n.\end{align*} Thus the eigenspace for $\lambda\ne 0$ is exactly \begin{align*}\ker(D_a-\lambda I)=\operatorname{span}\{e_n:a_n=\lambda\}.\end{align*} Since $a_n\to 0$ and $\lambda\ne 0$, choose $N$ such that $|a_n|<|\lambda|/2$ for all $n\ge N$. If $a_n=\lambda$, then $|a_n|=|\lambda|$, so such an $n$ must satisfy $n<N$. Hence only finitely many basis vectors occur in this span, agreeing with the finite-dimensional eigenspace conclusion above. Finally, if $\mu\ne 0$, choose $N$ such that $|a_n|<|\mu|/2$ for all $n\ge N$. Any value $a_n$ in the ball of radius $|\mu|/2$ around $\mu$ satisfies $|a_n|>|\mu|/2$, so all such values must come from the finite set $\{a_1,\dots,a_{N-1}\}$, and $\mu$ cannot be an accumulation point of $\{a_n:n\ge 1\}$. Therefore the only possible accumulation point is $0$, as in *[Zero Is the Only Possible Accumulation Point of the Spectrum](/theorems/4923)*. This diagonal model makes the compact spectral restrictions visible coordinate by coordinate. [/example] The diagonal computation also highlights why zero needs separate treatment. Nonzero spectral values are controlled by eigenspaces and finite-dimensionality, but $0$ may enter the spectrum through failure of surjectivity, failure of bounded inverse estimates, or accumulation of nonzero eigenvalues. [remark: Role of Zero] The point $0$ behaves differently from the rest of the spectrum. If $H$ is infinite-dimensional and $T$ is compact, then $0\in\sigma(T)$; only in finite dimensions, where every operator is compact, can a compact operator be invertible. Indeed, if $T$ were invertible, then $I=T^{-1}T$ would be compact, because composing the compact operator $T$ with the bounded operator $T^{-1}$ gives a compact operator, contradicting non-[compactness of the identity](/theorems/4920) on infinite-dimensional Hilbert space. [/remark] The compact spectral theory developed here is not yet the spectral theorem.
It gives the topological restrictions on possible spectra, but symmetry is still missing. The next step is to add symmetry, first self-adjointness and later normality, which supplies orthogonality of eigenspaces and diagonalisation. Chapter 5 begins with self-adjointness, and that extra structure turns the compact theory into a genuine diagonalization result, with real eigenvalues, orthogonal eigenspaces, and a complete orthonormal eigenvector decomposition. # 5. Compact Self-Adjoint Operators Compactness changes spectral theory from a theory of resolvents into a theory of coordinates. In the previous chapters, the spectrum of a bounded operator was controlled by analytic and algebraic tools, but it could still be complicated. For compact self-adjoint operators on a Hilbert space, the non-zero spectrum consists of eigenvalues, the eigenspaces are mutually orthogonal, and the operator is recovered from its action on those eigenspaces. The guiding question of this chapter is: when does a bounded operator behave like a diagonal matrix with entries tending to $0$? Self-adjointness supplies real quadratic quantities, while compactness supplies convergent subsequences. Their combination gives a usable spectral theorem with applications to integral operators and boundary-value problems. ## Extremal Eigenvalues from the Rayleigh Quotient The finite-dimensional spectral theorem often begins by maximizing the quadratic form $(Ax,x)$ on the unit sphere. In infinite-dimensional Hilbert spaces the unit sphere is not compact in norm, so the same argument needs an extra mechanism. Compact self-adjoint operators provide that mechanism: compactness turns weak convergence into norm convergence after applying the operator. [definition: Rayleigh Quotient] Let $H$ be a complex Hilbert space and let $T \in \mathcal{L}(H)$ be self-adjoint.
The Rayleigh quotient of $T$ is the map \begin{align*} \mathcal R_T:H\setminus\{0\}\to \mathbb R, \qquad \mathcal R_T(x)=\frac{(Tx,x)_H}{(x,x)_H}. \end{align*} [/definition] The quotient is homogeneous of degree zero, so it is enough to study it on the unit sphere. Self-adjointness is what makes $\mathcal R_T(x)$ real, and this lets us compare its [supremum and infimum](/page/Supremum%20and%20Infimum) as genuine [real numbers](/page/Real%20Numbers). [example: Diagonal Rayleigh Quotient] Let $H=\ell^2$ and define $T(a_1,a_2,\dots)=(\lambda_1a_1,\lambda_2a_2,\dots)$, where $\lambda_j\in\mathbb R$ and $B=\sup_j|\lambda_j|<\infty$. For $x=(a_j)\neq 0$, the series $\sum_{j=1}^{\infty}\lambda_j|a_j|^2$ converges absolutely because \begin{align*}\sum_{j=1}^{\infty}|\lambda_j||a_j|^2\le B\sum_{j=1}^{\infty}|a_j|^2<\infty.\end{align*} Using the standard inner product on $\ell^2$, \begin{align*}(Tx,x)_{\ell^2}=\sum_{j=1}^{\infty}\lambda_j a_j\overline{a_j}=\sum_{j=1}^{\infty}\lambda_j|a_j|^2.\end{align*} Also, \begin{align*}(x,x)_{\ell^2}=\sum_{j=1}^{\infty}a_j\overline{a_j}=\sum_{j=1}^{\infty}|a_j|^2.\end{align*} Therefore \begin{align*}\mathcal R_T(x)=\frac{\sum_{j=1}^{\infty}\lambda_j|a_j|^2}{\sum_{j=1}^{\infty}|a_j|^2}.\end{align*} If $w_j=|a_j|^2/\sum_{k=1}^{\infty}|a_k|^2$, then $w_j\ge 0$ and $\sum_{j=1}^{\infty}w_j=1$, so \begin{align*}\mathcal R_T(x)=\sum_{j=1}^{\infty}\lambda_jw_j.\end{align*} Thus $\mathcal R_T(x)$ is a weighted average of the diagonal entries. If $S=\sup_j\lambda_j$ and $\lambda_1=S$, then for every $x\neq 0$, \begin{align*}\sum_{j=1}^{\infty}\lambda_j|a_j|^2\le S\sum_{j=1}^{\infty}|a_j|^2.\end{align*} Dividing by $\sum_{j=1}^{\infty}|a_j|^2>0$ gives $\mathcal R_T(x)\le S$, while \begin{align*}\mathcal R_T(e_1)=\frac{\lambda_1}{1}=S.\end{align*} So $e_1$ attains the maximum value. If the supremum $S$ is not attained, choose indices $j_k$ with $\lambda_{j_k}>S-1/k$. Then \begin{align*}\mathcal R_T(e_{j_k})=\lambda_{j_k}>S-\frac{1}{k}.\end{align*} Hence the Rayleigh quotient can approach $S$.
It cannot attain $S$: if $x\neq 0$ and $w_j=|a_j|^2/\sum_k|a_k|^2$, then each $S-\lambda_j$ is positive, and at least one $w_j$ is positive, so \begin{align*}S-\mathcal R_T(x)=\sum_{j=1}^{\infty}(S-\lambda_j)w_j>0.\end{align*} Thus the infinite diagonal example shows exactly how a supremum may exist as a limiting spectral value without being achieved by any vector. [/example] This example shows both the matrix intuition and the infinite-dimensional obstruction. Compactness will force an extremal value of the Rayleigh quotient to be attained whenever that value is non-zero. The variational lesson is that a compact self-adjoint operator has enough finite-dimensional behavior at the edge of its spectrum to start an eigenvector construction. One should not read this as saying that every limiting extremum is attained: the obstruction is exactly the possibility that the relevant extremal value is only approached through eigenvalues tending to $0$. The full compact self-adjoint spectral theorem will be quoted once the orthogonality and recursive decomposition have been prepared; here we only need the conceptual starting point that a nonzero spectral edge supplies an eigenvector. [remark: Norm of a Self-Adjoint Operator] For a bounded self-adjoint operator $T \in \mathcal{L}(H)$, \begin{align*} \|T\|_{\mathcal{L}(H)} = \sup_{\|x\|_H=1} |(Tx,x)_H|. \end{align*} Consequently, if $T$ is compact, self-adjoint, and non-zero, then at least one of the extremal values $M=\sup_{\|x\|_H=1}(Tx,x)_H$ and $m=\inf_{\|x\|_H=1}(Tx,x)_H$ is non-zero, because the identity above says $\|T\|_{\mathcal L(H)}=\max(|m|,|M|)$; by compactness, that non-zero extremal value is an eigenvalue. [/remark] This norm identity connects the variational problem to the operator norm. It also gives the recursive starting point for diagonalization: find an eigenvector at the largest absolute spectral value, remove its direction, and repeat on the orthogonal complement. ## Orthogonality and Recursive Diagonalization Once the first eigenvector has been found, the next question is whether the remaining part of the Hilbert space is still stable under the operator.
Self-adjointness gives exactly the invariance needed: eigenspaces for different eigenvalues are orthogonal, and orthogonal complements of invariant finite-dimensional eigenspaces remain invariant. [quotetheorem:8398] [citeproof:8398] Orthogonality prevents eigenvectors belonging to different spectral values from interfering with each other, but both hypotheses matter. If the eigenvalues are the same, orthogonality need not hold: in the identity operator every non-zero vector is an eigenvector with eigenvalue $1$. If self-adjointness is dropped, distinct eigenspaces can also fail to be orthogonal; for example, on $\mathbb C^2$ the operator $A(z_1,z_2)=(z_1+z_2,2z_2)$ has the eigenvector $(1,0)$ for $1$ and the eigenvector $(1,1)$ for $2$, and these are not orthogonal in the standard inner product. The theorem therefore gives a structural separation that depends on self-adjointness (more generally, it holds for normal operators) and fails for arbitrary diagonalizable operators. The next step is to check that after an eigenspace has been removed, the same compact self-adjoint problem reappears on the remaining Hilbert space. [quotetheorem:8399] [citeproof:8399] The recursive procedure is now legitimate, but again self-adjointness is doing essential work. For the non-self-adjoint operator $A(z_1,z_2)=(z_1+z_2,2z_2)$ on $\mathbb C^2$, the eigenspace for $1$ is $\operatorname{span}((1,0))$, while its orthogonal complement $\operatorname{span}((0,1))$ is not invariant because $A(0,1)=(1,2)$. The theorem does not say that every complement of an eigenspace is invariant, nor that invariant subspaces are automatically orthogonal; it singles out the orthogonal complement because adjoints convert invariance into orthogonality. On each invariant orthogonal complement, the restriction of $T$ is still compact and self-adjoint, so the extremal eigenvalue theorem can be applied again unless the restriction is the zero operator.
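The two-dimensional counterexample can be checked directly. The sketch below evaluates the operator $A(z_1,z_2)=(z_1+z_2,2z_2)$ from the text on its eigenvectors and, for comparison, a hypothetical self-adjoint operator $B(z_1,z_2)=(z_1+z_2,z_1+2z_2)$ with symmetric matrix; $B$ is an illustrative choice, not from the text. The eigenvalues of $B$ are the roots $t=(3\pm\sqrt5)/2$ of $t^2-3t+1=0$, with eigenvectors $(1,t-1)$.

```python
import math

def apply_A(z):
    """Non-self-adjoint operator from the text: A(z1, z2) = (z1 + z2, 2 z2)."""
    z1, z2 = z
    return (z1 + z2, 2 * z2)

def apply_B(z):
    """Hypothetical self-adjoint comparison (symmetric real matrix [[1, 1], [1, 2]])."""
    z1, z2 = z
    return (z1 + z2, z1 + 2 * z2)

def inner(u, v):
    """Standard inner product on C^2, linear in u and conjugate-linear in v."""
    return sum(p * complex(q).conjugate() for p, q in zip(u, v))

# Eigenvectors of A: (1, 0) for eigenvalue 1 and (1, 1) for eigenvalue 2.
v1, v2 = (1, 0), (1, 1)
assert apply_A(v1) == (1, 0) and apply_A(v2) == (2, 2)
print("(v1, v2) =", inner(v1, v2))   # non-zero: not orthogonal
print("A(0, 1) =", apply_A((0, 1)))  # (1, 2) is not a multiple of (0, 1)

# Eigenvalues of B solve t^2 - 3t + 1 = 0; eigenvectors are (1, t - 1).
roots = [(3 + math.sqrt(5)) / 2, (3 - math.sqrt(5)) / 2]
w1, w2 = [(1.0, t - 1.0) for t in roots]
print("|(w1, w2)| =", abs(inner(w1, w2)))  # zero up to rounding
```

The inner product of the eigenvectors of $A$ is $1$, whereas for $B$ it is $1+\frac{1+\sqrt5}{2}\cdot\frac{1-\sqrt5}{2}=0$, in line with the orthogonality theorem for self-adjoint operators.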
At this point the recursive construction has all the ingredients needed for the compact self-adjoint expansion theorem: eigenvectors from extremal spectral values, orthogonality between distinct eigenspaces, invariance of orthogonal complements, and compactness forcing any infinite list of non-zero eigenvalues to tend to $0$. Rather than beginning from the abstract spectral theorem, we now use the diagonal model to show what the theorem looks like in its simplest concrete form. [example: Compact Diagonal Operators] Let $H=\ell^2$ and define $T(a_1,a_2,\dots)=(\lambda_1a_1,\lambda_2a_2,\dots)$, where each $\lambda_j\in\mathbb R$ and $\lambda_j\to 0$. Since a convergent real sequence is bounded, choose $B=\sup_j|\lambda_j|<\infty$. For $x=(a_j)\in\ell^2$, \begin{align*}\|Tx\|_{\ell^2}^2=\sum_{j=1}^{\infty}|\lambda_ja_j|^2\le B^2\sum_{j=1}^{\infty}|a_j|^2=B^2\|x\|_{\ell^2}^2.\end{align*} Thus $T$ is a bounded operator on $\ell^2$. For $x=(a_j)$ and $y=(b_j)$ in $\ell^2$, the standard inner product gives \begin{align*}(Tx,y)_{\ell^2}=\sum_{j=1}^{\infty}\lambda_j a_j\overline{b_j}.\end{align*} Because $\lambda_j$ is real, \begin{align*}(x,Ty)_{\ell^2}=\sum_{j=1}^{\infty}a_j\overline{\lambda_jb_j}=\sum_{j=1}^{\infty}a_j\lambda_j\overline{b_j}=\sum_{j=1}^{\infty}\lambda_j a_j\overline{b_j}.\end{align*} Hence $(Tx,y)_{\ell^2}=(x,Ty)_{\ell^2}$ for all $x,y\in\ell^2$, so $T$ is self-adjoint. To see compactness, define the finite-rank truncation \begin{align*}T_N(a_1,a_2,\dots)=(\lambda_1a_1,\dots,\lambda_Na_N,0,0,\dots).\end{align*} For $\|x\|_{\ell^2}=1$, \begin{align*}\|(T-T_N)x\|_{\ell^2}^2=\sum_{j=N+1}^{\infty}|\lambda_j|^2|a_j|^2\le \left(\sup_{j>N}|\lambda_j|\right)^2\sum_{j=N+1}^{\infty}|a_j|^2\le \left(\sup_{j>N}|\lambda_j|\right)^2.\end{align*} Therefore $\|T-T_N\|_{\mathcal L(\ell^2)}\le \sup_{j>N}|\lambda_j|$, and this upper bound tends to $0$ because $\lambda_j\to 0$. 
Each $T_N$ has range contained in $\operatorname{span}(e_1,\dots,e_N)$, so each $T_N$ is finite-rank; since $T$ is the operator-norm limit of these finite-rank operators, $T$ is compact. For the standard basis vector $e_j$, all coordinates are $0$ except the $j$th coordinate, which is $1$. Applying $T$ gives \begin{align*}Te_j=(0,\dots,0,\lambda_j,0,\dots)=\lambda_j e_j.\end{align*} Thus each $e_j$ is an eigenvector with eigenvalue $\lambda_j$. If $\lambda\neq 0$ is one of the diagonal values, then $Tx=\lambda x$ means \begin{align*}\lambda_j a_j=\lambda a_j\end{align*} for every $j$. Equivalently, \begin{align*}(\lambda_j-\lambda)a_j=0\end{align*} for every $j$, so $a_j=0$ whenever $\lambda_j\neq\lambda$. Hence the eigenspace for $\lambda$ is exactly the closed span of those $e_j$ with $\lambda_j=\lambda$. Since $\lambda_j\to 0$ and $\lambda\neq 0$, only finitely many indices $j$ qualify, so this eigenspace is finite-dimensional. In this diagonal model, the compact self-adjoint spectral expansion is simply the usual coordinate expansion in the standard orthonormal basis, with non-zero diagonal entries tending to $0$. [/example] This model is the template for the general theorem. Compact self-adjoint operators are precisely the operators that become diagonal, with real diagonal entries, in a suitable orthonormal eigenbasis, subject to two provisos. First, the kernel may be large, and its basis vectors carry the diagonal entry $0$. Second, if there are infinitely many non-zero diagonal entries, they must tend to $0$. ## Expansion Theorem and Eigenfunction Series Diagonalization is useful only if vectors and operators can be reconstructed from the eigenvectors. The next question is therefore not just whether eigenvectors exist, but whether they form a basis for the part of the Hilbert space seen by $T$, namely $\overline{\operatorname{Range}(T)}=(\ker T)^\perp$, and whether the resulting series converges in the correct norm. [quotetheorem:538] [citeproof:538] The theorem is a diagonal expansion of the operator, not necessarily of every vector in $H$.
If $\ker T$ is non-zero, the eigenvectors with non-zero eigenvalues have closed span $(\ker T)^\perp$ and therefore miss the kernel. Adjoining an orthonormal basis of $\ker T$ gives a full orthonormal basis of $H$ when a basis-level statement is desired. Both hypotheses prevent familiar failures. Without compactness, a self-adjoint operator can have continuous spectrum rather than a discrete list of eigenvalues. For example, multiplication by $t$ on $L^2[0,1]$ is self-adjoint with spectrum $[0,1]$ but has no eigenvalues at all, since $(t-\lambda)f(t)=0$ for almost every $t$ forces $f=0$ almost everywhere. Without self-adjointness, compactness alone does not force orthogonal diagonalization. A $2\times 2$ Jordan block is compact, as is every operator on a finite-dimensional space, but its only eigenspace is one-dimensional, so it cannot be diagonalized at all, let alone by an orthonormal eigenbasis. [definition: Spectral Projection for a Non-Zero Eigenvalue] Let $T \in \mathcal{L}(H)$ be compact and self-adjoint, and let $\lambda \neq 0$ be an eigenvalue. The spectral projection onto $E_\lambda=\ker(T-\lambda I)$ is the orthogonal projection $P_\lambda:H\to E_\lambda$. [/definition] The projection $P_\lambda$ packages all eigenvectors with the same eigenvalue into a single coordinate block. This grouped form is needed when eigenvalues have multiplicity, because a statement depending on a chosen basis inside $E_\lambda$ hides the intrinsic geometry. The next expansion rewrites the spectral theorem in terms of these basis-independent blocks. If $\{\lambda_k\}$ denotes the set of distinct non-zero eigenvalues of $T$, and $P_{\lambda_k}$ denotes the orthogonal projection onto $E_{\lambda_k}$, the eigenvector expansion above can be grouped as \begin{align*} Tx=\sum_k \lambda_k P_{\lambda_k}x, \end{align*} with convergence in $H$ for every $x\in H$. This is not a second theorem separate from the compact self-adjoint spectral theorem; it is the same diagonal expansion with all equal-eigenvalue coordinates collected into their eigenspaces.
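For a diagonal operator, the grouped expansion $Tx=\sum_k\lambda_kP_{\lambda_k}x$ is just bookkeeping over coordinates. The Python sketch below assumes a finite diagonal truncation with a repeated eigenvalue and a two-dimensional kernel. The diagonal entries and helper names are illustrative assumptions. The sketch groups coordinates by distinct non-zero eigenvalue, rebuilds $Tx$ from the projections, and shows that a kernel vector contributes nothing.

```python
from collections import defaultdict

# Illustrative diagonal entries: 0.5 is repeated (a 2-dimensional eigenspace),
# and the zero entries at positions 3 and 5 span the kernel.
DIAG = [0.5, 0.25, 0.5, 0.0, 0.125, 0.0]

def T(x):
    """The diagonal operator: multiply coordinate j by DIAG[j]."""
    return [d * xj for d, xj in zip(DIAG, x)]

def eigen_blocks(diag):
    """Map each distinct non-zero eigenvalue to the coordinates spanning its eigenspace."""
    blocks = defaultdict(list)
    for j, d in enumerate(diag):
        if d != 0:
            blocks[d].append(j)
    return dict(blocks)

def project(indices, x):
    """Orthogonal projection onto span{e_j : j in indices}."""
    return [xj if j in indices else 0.0 for j, xj in enumerate(x)]

def grouped_expansion(x):
    """Compute sum_k lambda_k P_{lambda_k} x over the distinct non-zero eigenvalues."""
    out = [0.0] * len(x)
    for lam, idx in eigen_blocks(DIAG).items():
        p = project(set(idx), x)
        out = [o + lam * pj for o, pj in zip(out, p)]
    return out

x = [1.0, -2.0, 3.0, 4.0, 8.0, 1.0]
kernel_vector = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

print(eigen_blocks(DIAG))                # {0.5: [0, 2], 0.25: [1], 0.125: [4]}
print(T(x) == grouped_expansion(x))      # True
print(grouped_expansion(kernel_vector))  # all zeros: the kernel is not resolved
```

The projection for the eigenvalue $0.5$ acts on two coordinates at once. This is the basis-independent block that $P_{\lambda_k}$ represents.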
The projection form makes the connection with finite-dimensional diagonal matrices especially transparent, with each $P_{\lambda_k}$ playing the role of projection onto a coordinate block. These projections cover only the non-zero eigenspaces and do not by themselves resolve the kernel. If a vector lies in $\ker T$, every term in the displayed sum vanishes, and a basis of $\ker T$ must be adjoined for a full orthonormal decomposition of $H$. Compactness is what keeps every non-zero eigenspace finite-dimensional and prevents the non-zero eigenvalues from accumulating anywhere except at $0$. Without it, a self-adjoint operator may have continuous spectrum and no such countable block expansion. Self-adjointness is equally essential, since non-normal compact operators need not have enough orthogonal eigenvectors to diagonalize. The formula therefore describes the compact self-adjoint situation, not arbitrary compact operators. It also prepares for the min-max principle, where eigenvalues are characterized by optimizing the Rayleigh quotient over subspaces. ## The Basic Min-Max Principle The extremal eigenvalue theorem identifies the first positive or negative spectral value. The min-max principle answers the recursive version of the same question: after several orthogonal directions have been removed, how is the next eigenvalue detected variationally? [quotetheorem:8400] [citeproof:8400] The principle turns eigenvalue estimates into subspace estimates, but its formulation is tied to the hypotheses. Non-negativity lets the positive eigenvalues be ordered as a decreasing sequence without having to separate positive and negative branches. Compactness ensures that, above any positive threshold, only finitely many eigenvalues occur; otherwise there may be no discrete list $\lambda_1\ge \lambda_2\ge\cdots$ to optimize over.
Self-adjointness is needed because the Rayleigh quotient is real and detects spectral values; for a non-self-adjoint operator the quadratic form may not encode eigenvalues at all. Multiplicity is also part of the statement: repeated eigenvalues must be listed once for each independent eigenvector, or the dimension count in the proof gives the wrong index. In applications, trial subspaces give lower bounds in the max-min form, while constraints or orthogonality conditions give upper bounds in the min-max form. [example: Finite-Rank Approximation Bound] Let $T\ge 0$ be compact and self-adjoint, and list its positive eigenvalues with multiplicity as $\lambda_1\ge \lambda_2\ge \cdots >0$. Suppose $c>0$ and $L\subset H$ is a subspace of dimension $n$ such that $(Tx,x)_H\ge c$ for every $x\in L$ with $\|x\|_H=1$. Then $T$ has at least $n$ positive eigenvalues. Otherwise the eigenvectors for positive eigenvalues would span a subspace of dimension less than $n$, so $L$ would contain a unit vector orthogonal to all of them. Such a vector lies in $\ker T$, where $(Tx,x)_H=0<c$. By the *Basic Min-Max Principle*, \begin{align*}\lambda_n=\max_{\dim E=n}\min_{\{x\in E:\|x\|_H=1\}}(Tx,x)_H.\end{align*} Since $L$ is one of the $n$-dimensional subspaces included in the maximum, the maximum is at least the value obtained from $L$: \begin{align*}\lambda_n\ge \min_{\{x\in L:\|x\|_H=1\}}(Tx,x)_H.\end{align*} The hypothesis says every unit vector in $L$ has quadratic form at least $c$, so \begin{align*}\min_{\{x\in L:\|x\|_H=1\}}(Tx,x)_H\ge c.\end{align*} Combining these two inequalities gives \begin{align*}\lambda_n\ge c.\end{align*} Thus $n$ independent test directions with a uniform Rayleigh lower bound force the first $n$ positive eigenvalues, counted with multiplicity, to stay at or above that threshold. [/example] This example is the abstract form of many PDE eigenvalue estimates. There the operator is a compact Green operator, that is, the inverse of an [elliptic operator](/page/Elliptic%20Operator), and test functions encode geometric information about the domain. ## Integral Operators and Hilbert-Schmidt Diagonal Expansions Compact self-adjoint operators often arise from kernels.
The abstract theorem then becomes an eigenfunction expansion: the operator acts by projecting a function onto eigenfunctions and multiplying by eigenvalues. [definition: Symmetric Hilbert-Schmidt Kernel] Let $K \in L^2([0,1]^2)$ be a complex-valued function. It is a symmetric Hilbert-Schmidt kernel if \begin{align*} K(s,t)=\overline{K(t,s)} \end{align*} for $\mathcal L^2$-a.e. $(s,t) \in [0,1]^2$. [/definition] Such a kernel defines an integral operator \begin{align*} (T_Kf)(s)=\int_0^1 K(s,t)f(t)\,d\mathcal L^1(t) \end{align*} on $L^2[0,1]$, but three facts must be checked before the spectral theorem can be used: boundedness, compactness, and self-adjointness. The square-integrability of the kernel gives boundedness and compactness through finite-rank approximation, while symmetry gives self-adjointness. The next theorem records this verification and then applies the abstract diagonal expansion. [quotetheorem:4896] [citeproof:4896] This result is the point where compact-operator spectral theory becomes an integral-kernel expansion. The hypotheses should be read carefully. The condition $K\in L^2([0,1]^2)$ is what places the operator in the Hilbert-Schmidt class and gives compactness by approximation with finite sums of products $a(s)b(t)$. The symmetry condition is what turns the operator into a self-adjoint one, so that the compact self-adjoint spectral theorem applies with an orthonormal eigenfunction basis and real eigenvalues. The conclusion is therefore not a statement about every integral operator. If the kernel is square-integrable but not symmetric, the operator is still compact but need not admit any orthonormal eigenfunction expansion. The appropriate replacement is singular-value theory, applied to $T_K^*T_K$ and $T_KT_K^*$. In the symmetric case, however, the abstract orthogonal decomposition from the compact self-adjoint theorem is expressed concretely as a Hilbert-Schmidt diagonal expansion of the kernel operator, much like a [Fourier series](/page/Fourier%20Series) whose basis has been adapted to $K$.
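One way to see the symmetric-kernel theorem numerically is to discretize $T_K$ by the midpoint rule (the Nyström method). This turns a symmetric kernel into a symmetric matrix, whose top eigenvalue can then be estimated by power iteration. This is a minimal sketch, not a method prescribed by the text, and the grid size and iteration count are arbitrary assumptions. For illustration it uses $K(s,t)=\min(s,t)\,(1-\max(s,t))$, the Dirichlet Green kernel treated later in this chapter, whose largest eigenvalue is $1/\pi^2$.

```python
import math

def green_kernel(s, t):
    """K(s, t) = min(s, t) * (1 - max(s, t)); real and symmetric in (s, t)."""
    return min(s, t) * (1.0 - max(s, t))

def nystrom_matrix(kernel, n):
    """Midpoint-rule discretisation: entry (i, j) is K(s_i, s_j) * h."""
    h = 1.0 / n
    nodes = [(i + 0.5) * h for i in range(n)]
    return [[kernel(si, sj) * h for sj in nodes] for si in nodes]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def top_eigenvalue(A, iterations=60):
    """Power iteration followed by a Rayleigh quotient.

    Assumes A is symmetric positive semidefinite and that the starting vector
    (all ones) is not orthogonal to the top eigenvector.
    """
    v = [1.0] * len(A)
    for _ in range(iterations):
        w = matvec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = matvec(A, v)
    return sum(a * b for a, b in zip(Av, v))   # (Av, v) with ||v|| = 1

n = 100
A = nystrom_matrix(green_kernel, n)
symmetric = all(A[i][j] == A[j][i] for i in range(n) for j in range(n))
lam1 = top_eigenvalue(A)
print(symmetric, lam1, 1 / math.pi ** 2)   # lam1 should be close to 1/pi^2
```

The kernel symmetry $K(s,t)=\overline{K(t,s)}$ becomes exact symmetry of the matrix. Power iteration relies on that symmetry (together with positivity) to converge to a real top eigenvalue.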
[example: Rank-One Symmetric Kernel] Let $u\in L^2[0,1]$ and set $K(s,t)=u(s)\overline{u(t)}$. Since \begin{align*}\int_0^1\int_0^1 |K(s,t)|^2\,d\mathcal L^1(t)\,d\mathcal L^1(s)=\left(\int_0^1 |u(s)|^2\,d\mathcal L^1(s)\right)\left(\int_0^1 |u(t)|^2\,d\mathcal L^1(t)\right)=\|u\|_{L^2}^4,\end{align*} the kernel is Hilbert-Schmidt. Also, \begin{align*}\overline{K(t,s)}=\overline{u(t)\overline{u(s)}}=\overline{u(t)}u(s)=u(s)\overline{u(t)}=K(s,t),\end{align*} so the kernel is symmetric. For $f\in L^2[0,1]$, the associated integral operator satisfies \begin{align*}(T_Kf)(s)=\int_0^1 u(s)\overline{u(t)}f(t)\,d\mathcal L^1(t)=u(s)\int_0^1 f(t)\overline{u(t)}\,d\mathcal L^1(t).\end{align*} With the standard $L^2$ inner product, this is \begin{align*}(T_Kf)(s)=(f,u)_{L^2}u(s).\end{align*} Thus every vector in the range of $T_K$ is a scalar multiple of $u$, so $\operatorname{Range}(T_K)\subseteq \operatorname{span}(u)$; if $u\neq 0$, taking $f=u$ gives \begin{align*}T_Ku=(u,u)_{L^2}u=\|u\|_{L^2}^2u,\end{align*} so $\operatorname{Range}(T_K)=\operatorname{span}(u)$. If $u=0$, then $T_K=0$ and there are no non-zero eigenvalues. Assume now that $u\neq 0$. The vector \begin{align*}e=\frac{u}{\|u\|_{L^2}}\end{align*} has norm $1$, and \begin{align*}T_Ke=\left(e,u\right)_{L^2}u=\left(\frac{u}{\|u\|_{L^2}},u\right)_{L^2}u=\|u\|_{L^2}u=\|u\|_{L^2}^2e.\end{align*} Hence $\|u\|_{L^2}^2$ is an eigenvalue with normalized eigenfunction $e$. Conversely, if $T_Kf=\lambda f$ with $\lambda\neq 0$, then \begin{align*}\lambda f=T_Kf=(f,u)_{L^2}u.\end{align*} Dividing by $\lambda$ gives \begin{align*}f=\frac{(f,u)_{L^2}}{\lambda}u,\end{align*} so every eigenvector with non-zero eigenvalue lies in $\operatorname{span}(u)$. For $f=\alpha u$ with $\alpha\neq 0$, \begin{align*}T_K(\alpha u)=(\alpha u,u)_{L^2}u=\alpha\|u\|_{L^2}^2u=\|u\|_{L^2}^2(\alpha u),\end{align*} so the only non-zero eigenvalue is $\|u\|_{L^2}^2$. 
Therefore the spectral expansion has exactly one non-zero term: \begin{align*}T_Kf=\|u\|_{L^2}^2(f,e)_{L^2}e=(f,u)_{L^2}u.\end{align*} This rank-one kernel is the compact self-adjoint expansion in its smallest possible non-zero form. [/example] Rank-one kernels show the expansion without any convergence issues. General Hilbert-Schmidt kernels are built as limits of finite-rank kernels, and the compact spectral theorem controls the limiting eigenfunction series. ## Green Operators as Compact Self-Adjoint Models A central reason compact self-adjoint operators matter is that differential equations can sometimes be inverted into compact integral operators. Instead of studying an unbounded differential operator directly, this course first studies its compact inverse model. [example: Green Operator for the Dirichlet Interval Problem] Let $H=L^2[0,1]$. For $f\in H$, set \begin{align*} u(s)=\int_0^s t(1-s)f(t)\,d\mathcal L^1(t)+\int_s^1 s(1-t)f(t)\,d\mathcal L^1(t). \end{align*} This is the same as $u(s)=\int_0^1 G(s,t)f(t)\,d\mathcal L^1(t)$ with $G(s,t)=s(1-t)$ when $s\le t$ and $G(s,t)=t(1-s)$ when $t\le s$. The boundary values are immediate from the displayed formula: \begin{align*} u(0)=\int_0^0 t f(t)\,d\mathcal L^1(t)+\int_0^1 0\cdot(1-t)f(t)\,d\mathcal L^1(t)=0. \end{align*} Similarly, \begin{align*} u(1)=\int_0^1 t\cdot 0\cdot f(t)\,d\mathcal L^1(t)+\int_1^1 (1-t)f(t)\,d\mathcal L^1(t)=0. \end{align*} Write the formula as \begin{align*} u(s)=(1-s)\int_0^s t f(t)\,d\mathcal L^1(t)+s\int_s^1 (1-t)f(t)\,d\mathcal L^1(t). \end{align*} Apply the product rule together with the fundamental theorem for absolutely continuous indefinite integrals. The terms $(1-s)sf(s)$ and $-s(1-s)f(s)$ cancel, so for a.e. $s\in(0,1)$, \begin{align*} u'(s)=-\int_0^s t f(t)\,d\mathcal L^1(t)+\int_s^1 (1-t)f(t)\,d\mathcal L^1(t). \end{align*} Differentiating once more a.e. gives \begin{align*} u''(s)=-s f(s)-(1-s)f(s)=-f(s). \end{align*} Thus $u=Gf$ satisfies $-u''=f$ almost everywhere, with $u(0)=u(1)=0$; in particular it is the weak solution of this boundary-value problem. The kernel is square-integrable because $0\le G(s,t)\le 1$ on $[0,1]^2$, so \begin{align*} \int_0^1\int_0^1 |G(s,t)|^2\,d\mathcal L^1(t)\,d\mathcal L^1(s)\le 1.
\end{align*} It is also symmetric: if $s\le t$, then $G(s,t)=s(1-t)$ and, since $t\ge s$, $G(t,s)=s(1-t)$; the case $t\le s$ is the same with $s$ and $t$ interchanged. Since the kernel is real, $G(s,t)=\overline{G(t,s)}$. Hence the associated integral operator is compact and self-adjoint on $L^2[0,1]$ by the symmetric Hilbert-Schmidt kernel theorem. Now let $Gf=\lambda f$ with $\lambda\ne 0$, and put $u=Gf$. Then $u=\lambda f$, so $f=\lambda^{-1}u$. Since $-u''=f$, we obtain \begin{align*} -u''=\lambda^{-1}u,\qquad u(0)=u(1)=0. \end{align*} Write $\mu=\lambda^{-1}$, which is real because $G$ is self-adjoint. The differential equation is then $u''+\mu u=0$. Non-zero solutions satisfying both Dirichlet boundary conditions occur exactly when $\mu=(j\pi)^2$ for some integer $j\ge 1$. For $\mu\le 0$, the only solution vanishing at $0$ and $1$ is $u=0$. For $\mu>0$, the condition $u(0)=0$ gives $u(s)=c\sin(\sqrt{\mu}\,s)$, and $u(1)=0$ with $c\neq 0$ forces $\sqrt{\mu}=j\pi$. The eigenfunctions are therefore proportional to $\sin(j\pi s)$. Indeed, \begin{align*} -\frac{d^2}{ds^2}\sin(j\pi s)=j^2\pi^2\sin(j\pi s). \end{align*} Also $\sin(j\pi\cdot 0)=0$ and $\sin(j\pi\cdot 1)=0$. The normalization is \begin{align*} \int_0^1 |\sin(j\pi s)|^2\,d\mathcal L^1(s)=\frac12. \end{align*} Therefore the normalized eigenfunctions are $e_j(s)=\sqrt{2}\sin(j\pi s)$, and the corresponding eigenvalues of the Green operator are \begin{align*} \lambda_j=\frac{1}{j^2\pi^2}. \end{align*} Finally, $Gf=0$ forces $f=-(Gf)''=0$, so $G$ is injective. By the compact self-adjoint spectral theorem, the functions $e_j$ therefore form an orthonormal basis of $L^2[0,1]$. So the compact Green operator diagonalizes in the sine basis, and its eigenvalues are the reciprocals of the Dirichlet eigenvalues of $-d^2/ds^2$. [/example] This example links the compact theorem to the familiar sine series from boundary-value problems. The eigenvalues of the compact Green operator tend to $0$, while the corresponding eigenvalues of the differential operator $-d^2/ds^2$ tend to infinity. [remark: Compact Inverse Philosophy] Many elliptic boundary-value problems lead to a compact self-adjoint solution operator after choosing the correct Hilbert space framework.
The spectral theorem then gives an orthonormal family of eigenfunctions and real eigenvalues, while the original differential operator is recovered by taking reciprocals of the non-zero compact eigenvalues. [/remark] The compact inverse viewpoint is the bridge from the matrix-like theory of this chapter to later spectral theory for unbounded self-adjoint operators. It explains why compactness and self-adjointness appear together throughout mathematical physics, PDE, and harmonic analysis. After Chapter 5, the compact self-adjoint case is fully understood, but the course still needs the corresponding picture for the broader normal case. Chapter 6 extends the compact theory by combining the self-adjoint theorem with adjoint and orthogonal-decomposition methods, leading to the compact normal theorem and the singular-value decomposition. # 6. Compact Normal Operators and Singular Values This chapter completes the compact-operator part of the course by moving beyond the self-adjoint case. We use the compact self-adjoint spectral theorem from Chapter 5 together with the adjoint and orthogonal-decomposition tools from Chapter 3. The goals are to prove the spectral theorem for compact normal operators, construct singular values for arbitrary compact operators, and explain why Schmidt expansions give the correct infinite-dimensional analogue of matrix [singular value decomposition](/theorems/3071). Compact operators on Hilbert spaces behave like infinite matrices whose entries become small at infinity. In the previous chapter, compact self-adjoint operators gave an orthonormal basis of eigenvectors and real eigenvalues tending to $0$. This chapter keeps compactness but weakens self-adjointness in two directions: normal operators, where the operator still diagonalises orthogonally, and arbitrary compact operators, where singular values replace eigenvalues as the stable diagonal data. 
The guiding point is that compactness isolates all non-zero spectral behaviour into finite-dimensional eigenspaces, while Hilbert-space geometry supplies orthogonality. For non-normal compact operators the eigenvalues may be too unstable or too sparse, so we study $T^*T$ and extract the positive numbers measuring how much $T$ stretches orthogonal directions. ## Orthogonal Decomposition for Compact Normal Operators The first problem is whether the spectral theorem for compact self-adjoint operators survives after replacing self-adjointness by normality. A normal operator need not have real eigenvalues, so the order structure used for self-adjoint operators disappears. The replacement is geometric: eigenspaces belonging to different eigenvalues are orthogonal, and compactness still forces all non-zero spectral values to be eigenvalues of finite multiplicity. [definition: Normal Operator] Let $H$ be a complex Hilbert space. An operator $N \in \mathcal{L}(H)$ is normal if \begin{align*} N^*N = NN^*. \end{align*} [/definition] Normality is the operator identity that makes $N$ and $N^*$ compatible enough for spectral subspaces to be mutually orthogonal. The condition includes self-adjoint operators, unitary operators, and multiplication by bounded scalar functions, but compactness will make the spectral picture much more discrete. [example: Diagonal Compact Normal Operator] Let $H=\ell^2$ with standard orthonormal basis $(e_n)_{n\ge 1}$, let $(\lambda_n)_{n\ge 1}$ be a bounded sequence of complex numbers, and define \begin{align*} N(x_1,x_2,\dots)=(\lambda_1x_1,\lambda_2x_2,\dots). \end{align*} For $x=(x_n)\in \ell^2$, \begin{align*} \|Nx\|_{\ell^2}^2=\sum_{n=1}^{\infty}|\lambda_n x_n|^2\le \left(\sup_{n\ge 1}|\lambda_n|^2\right)\sum_{n=1}^{\infty}|x_n|^2, \end{align*} so $N$ is bounded and $\|N\|\le \sup_n|\lambda_n|$.
Its adjoint is the diagonal operator \begin{align*} N^*(x_1,x_2,\dots)=(\overline{\lambda_1}x_1,\overline{\lambda_2}x_2,\dots), \end{align*} because for $x,y\in\ell^2$, \begin{align*} (Nx,y)_{\ell^2}=\sum_{n=1}^{\infty}\lambda_n x_n\overline{y_n}=\sum_{n=1}^{\infty}x_n\overline{\overline{\lambda_n}y_n}=(x,N^*y)_{\ell^2}. \end{align*} Therefore \begin{align*} N^*N(x_1,x_2,\dots)=(|\lambda_1|^2x_1,|\lambda_2|^2x_2,\dots) \end{align*} and \begin{align*} NN^*(x_1,x_2,\dots)=(|\lambda_1|^2x_1,|\lambda_2|^2x_2,\dots), \end{align*} so $N^*N=NN^*$ and $N$ is normal. If $\lambda_n\to 0$, define the finite-rank truncation \begin{align*} N_m(x_1,x_2,\dots)=(\lambda_1x_1,\dots,\lambda_mx_m,0,0,\dots). \end{align*} For $\|x\|_{\ell^2}\le 1$, \begin{align*} \|(N-N_m)x\|_{\ell^2}^2=\sum_{n>m}|\lambda_nx_n|^2\le \left(\sup_{n>m}|\lambda_n|^2\right)\sum_{n>m}|x_n|^2\le \sup_{n>m}|\lambda_n|^2. \end{align*} Hence $\|N-N_m\|\le \sup_{n>m}|\lambda_n|\to 0$, so $N$ is a norm limit of finite-rank operators and is compact. Conversely, if $N$ is compact and $\lambda_n$ did not tend to $0$, then some $\varepsilon>0$ and subsequence $(n_k)$ would satisfy $|\lambda_{n_k}|\ge\varepsilon$. The vectors $Ne_{n_k}=\lambda_{n_k}e_{n_k}$ would then obey, for $k\ne \ell$, \begin{align*} \|Ne_{n_k}-Ne_{n_\ell}\|_{\ell^2}^2=|\lambda_{n_k}|^2+|\lambda_{n_\ell}|^2\ge 2\varepsilon^2. \end{align*} So the image under the compact operator $N$ of the bounded sequence $(e_{n_k})$ would have no norm-convergent subsequence, a contradiction. Thus this diagonal operator is compact exactly when $\lambda_n\to 0$. Finally, each standard basis vector satisfies \begin{align*} Ne_n=\lambda_ne_n, \end{align*} so the diagonal entries are eigenvalues, and a value repeated $k$ times among the $\lambda_n$ has a $k$-dimensional eigenspace. Since compactness forces $\lambda_n\to 0$, the only possible accumulation point among these diagonal eigenvalues is $0$, which is exactly the pattern abstracted by the compact normal spectral theorem.
[/example] The diagonal example identifies the target statement: every non-zero spectral value should be represented by an actual eigenspace. The next theorem is needed before any orthogonal decomposition can be built, because spectral values that were only approximate would have no eigenspace to insert into the direct sum. Compactness is the mechanism that turns approximate eigenvectors into genuine eigenvectors. [quotetheorem:4897] [citeproof:4897] This theorem leaves $0$ in a special role. It may be an eigenvalue, as for a finite-rank projection, or it may lie in the spectrum only as an accumulation point, as for the diagonal operator on $\ell^2$ with diagonal entries $1/n$. The hypothesis $\lambda \ne 0$ is therefore essential: compact injective diagonal operators can have $0$ in the spectrum without $0$ being an eigenvalue. Compactness is essential too. Multiplication by $t$ on $L^2[0,1]$ is normal, indeed self-adjoint, with spectrum $[0,1]$, yet it has no eigenvalues. Normality is what guarantees that every spectral value is an approximate eigenvalue, and compactness then upgrades this to a genuine eigenvalue when $\lambda\neq 0$. The infinite-dimensional assumption is mainly to avoid the finite-dimensional case, where every spectral value is already an eigenvalue. The preceding result supplies the eigenspaces that will appear in the decomposition, but it does not yet explain how those eigenspaces sit inside $H$. A direct-sum decomposition built from arbitrary eigenspaces could be badly conditioned, because different summands might interact through the inner product. To obtain an orthogonal Hilbert-space expansion, we need the normality hypothesis to give a geometric separation statement: different spectral values must contribute perpendicular directions rather than merely linearly independent ones. [quotetheorem:8398] [citeproof:8398] This orthogonality is a genuinely normal-operator phenomenon, not a general fact about diagonalisation.
For example, on $\mathbb C^2$ the operator $A$ given by $Ae_1=e_1$ and $Ae_2=e_1+2e_2$ has distinct eigenvalues $1$ and $2$, with eigenvectors $e_1$ and $e_1+e_2$ respectively. Since $(e_1,e_1+e_2)=1$, its eigenspaces are not perpendicular for the standard inner product. The theorem therefore does not say that every diagonalizable operator has orthogonal eigenspaces, nor does it control generalized eigenspaces for non-normal operators. Normality is exactly the condition preventing spectral subspaces from leaning into one another, which is why the next decomposition can be an orthogonal direct sum rather than just an algebraic splitting. Orthogonality is what turns eigenspaces into Hilbert-space geometry, but bounded normal operators require a still broader language. Compact normal operators are completely described by their eigenvectors; non-compact normal operators may have continuous spectrum, as multiplication by $t$ on $L^2[0,1]$ does. In the full bounded normal theory, the replacement for an eigenbasis is a projection-valued measure, whose precise axioms are developed later. For the present compact chapter, the important point is only the guiding picture: spectral pieces should correspond to mutually orthogonal parts of the Hilbert space, and compactness is the extra hypothesis that turns most of those pieces into honest eigenspaces. This is why the compact normal theorem below should be read as the discrete shadow of the later spectral-measure theorem, not as the full measure-theoretic statement. Non-zero spectral values become eigenvalues of finite multiplicity, possible accumulation is forced to occur only at $0$, and the normality hypothesis makes the resulting eigenspaces orthogonal. Without compactness, continuous spectral parts may appear; without normality, even compact operators need not admit an orthogonal diagonalisation, as the weighted-shift example below illustrates.
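The contrast between a normal operator and the operator $A$ above can be checked on $\mathbb C^2$. This Python sketch uses an illustrative normal matrix, $N=\begin{pmatrix}0&-1\\1&0\end{pmatrix}$, which is not taken from the text. $N$ is rotation by a right angle: it is normal but not self-adjoint, and its eigenvalues are $\pm i$. The sketch uses an inner product that is linear in the first argument, as in the text. Helper names are illustrative.

```python
# Normality test and eigenvector orthogonality on C^2, using Python complex numbers.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def adjoint(X):
    """Conjugate transpose of a 2x2 matrix."""
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def apply(X, v):
    return [X[0][0] * v[0] + X[0][1] * v[1], X[1][0] * v[0] + X[1][1] * v[1]]

def inner(u, v):
    """Linear in the first argument, conjugate-linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def is_normal(X):
    return matmul(adjoint(X), X) == matmul(X, adjoint(X))

N = [[0, -1], [1, 0]]   # illustrative normal matrix with eigenvalues i and -i
A = [[1, 1], [0, 2]]    # A e1 = e1, A e2 = e1 + 2 e2

u, v = [1, -1j], [1, 1j]               # eigenvectors of N for i and -i
assert apply(N, u) == [1j * z for z in u]
assert apply(N, v) == [-1j * z for z in v]

print(is_normal(N), inner(u, v) == 0)       # True True: orthogonal eigenvectors
print(is_normal(A), inner([1, 0], [1, 1]))  # False 1: eigenvectors e1 and e1 + e2
```

The matrix `N` has non-real eigenvalues, so this is orthogonality coming from normality rather than from self-adjointness.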
[example: Compact Weighted Shift Is Not Usually Normal] Let $H=\ell^2$ with standard orthonormal basis $(e_n)_{n\ge 1}$, and define \begin{align*} T e_n = w_n e_{n+1}, \qquad n\ge 1, \end{align*} where $w_n\to 0$. Since every convergent scalar sequence is bounded, let $M=\sup_{n\ge 1}|w_n|<\infty$. For $x=(x_n)\in \ell^2$, \begin{align*} Tx=(0,w_1x_1,w_2x_2,w_3x_3,\dots). \end{align*} Therefore \begin{align*} \|Tx\|_{\ell^2}^2=\sum_{n=1}^{\infty}|w_nx_n|^2\le M^2\sum_{n=1}^{\infty}|x_n|^2=M^2\|x\|_{\ell^2}^2, \end{align*} so $T$ is bounded. For $m\ge 1$, define $T_m$ by $T_m e_n=w_ne_{n+1}$ when $n\le m$, and $T_m e_n=0$ when $n>m$. Its range is contained in $\operatorname{span}\{e_2,\dots,e_{m+1}\}$, so $T_m$ has finite rank. If $\|x\|_{\ell^2}\le 1$, then \begin{align*} \|(T-T_m)x\|_{\ell^2}^2=\sum_{n>m}|w_nx_n|^2\le \left(\sup_{n>m}|w_n|^2\right)\sum_{n>m}|x_n|^2\le \sup_{n>m}|w_n|^2. \end{align*} Hence \begin{align*} \|T-T_m\|\le \sup_{n>m}|w_n|\to 0. \end{align*} Thus $T$ is a norm limit of finite-rank operators, and therefore $T$ is compact. We compute the adjoint on the basis. For every $n\ge 1$, \begin{align*} (Te_n,e_1)_{\ell^2}=(w_ne_{n+1},e_1)_{\ell^2}=0, \end{align*} so \begin{align*} T^*e_1=0. \end{align*} For $k\ge 2$, \begin{align*} (Te_n,e_k)_{\ell^2}=(w_ne_{n+1},e_k)_{\ell^2}=w_n\delta_{n+1,k}. \end{align*} Also, \begin{align*} (e_n,\overline{w_{k-1}}e_{k-1})_{\ell^2}=w_{k-1}\delta_{n,k-1}=w_n\delta_{n+1,k}. \end{align*} Since this holds for every basis vector $e_n$, \begin{align*} T^*e_k=\overline{w_{k-1}}e_{k-1}, \qquad k\ge 2. \end{align*} Now for each $n\ge 1$, \begin{align*} T^*T e_n=T^*(w_ne_{n+1})=w_n\overline{w_n}e_n=|w_n|^2e_n. \end{align*} On the other hand, \begin{align*} TT^*e_1=T0=0. \end{align*} For $n\ge 2$, \begin{align*} TT^*e_n=T(\overline{w_{n-1}}e_{n-1})=\overline{w_{n-1}}w_{n-1}e_n=|w_{n-1}|^2e_n. \end{align*} Therefore $T^*T=TT^*$ exactly when $|w_1|^2=0$ and $|w_n|^2=|w_{n-1}|^2$ for every $n\ge 2$. 
The first condition gives $w_1=0$, and then induction gives $|w_n|=0$ for every $n$, so $w_n=0$ for every $n$. Thus this compact unilateral weighted shift is normal only in the degenerate zero case. An operator with an orthonormal basis of eigenvectors is diagonal in that basis, and hence normal by the same computation as in the diagonal example. A non-zero weighted shift therefore admits no orthogonal eigenvalue diagonalisation. In fact, if every $w_n\neq 0$, then $T$ has no eigenvalues at all. The equation $Tx=\lambda x$ reads $\lambda x_1=0$ and $\lambda x_{n+1}=w_nx_n$ for $n\ge 1$, which forces $x=0$ both when $\lambda\neq 0$ and when $\lambda=0$. This shows why compactness alone is not enough for the compact normal spectral theorem: arbitrary compact operators need singular values rather than orthogonal eigenvalue diagonalisation. [/example] ## Singular Values and Compact Polar Decomposition For a general compact operator $T$, eigenvectors may fail to span the space and eigenvalues may miss the main geometry of the map. The better question is: along which orthogonal directions does $T$ stretch most, and by what amounts? This is answered by applying the compact self-adjoint spectral theorem to the positive operator $T^*T$. [definition: Singular Values] Let $H$ and $K$ be Hilbert spaces, and let $T \in \mathcal{L}(H,K)$ be compact. The non-zero singular values of $T$ are the positive numbers \begin{align*} s_j(T) = \sqrt{\lambda_j}, \end{align*} where $\lambda_j>0$ ranges over the non-zero eigenvalues of the compact positive operator $T^*T$, repeated according to multiplicity. [/definition] The definition replaces the possibly non-normal operator $T$ by the positive operator $T^*T$ on the domain space. Singular values are positive by definition. They are unchanged when $T$ is replaced by $T^*$, because $T^*T$ and $TT^*$ have the same non-zero eigenvalues with the same multiplicities, and only finitely many of them exceed any positive threshold. When listed as a sequence, they are arranged in non-increasing order. [example: Singular Values of a Finite Matrix] Let $A\in\mathbb C^{m\times n}$ act from $\mathbb C^n$ to $\mathbb C^m$ with the standard inner products, and let $(e_1,\dots,e_n)$ and $(f_1,\dots,f_m)$ be the standard orthonormal bases. Suppose $A$ is rectangular diagonal, so for some $r\le \min\{m,n\}$, \begin{align*} Ae_j=a_jf_j \text{ for } 1\le j\le r \end{align*} and \begin{align*} Ae_j=0 \text{ for } j>r. \end{align*} We compute $A^*A$ on the basis.
For $1\le i\le m$ and $1\le j\le n$, \begin{align*} (Ae_j,f_i)_{\mathbb C^m}=a_j\delta_{ij} \text{ when } j\le r \end{align*} and this inner product is $0$ when $j>r$. Since the inner product is conjugate-linear in the second variable, \begin{align*} (e_j,\overline{a_i}e_i)_{\mathbb C^n}=a_i\delta_{ij}, \end{align*} which agrees with $(Ae_j,f_i)_{\mathbb C^m}$ for every $j$ when $i\le r$. Therefore \begin{align*} A^*f_i=\overline{a_i}e_i \text{ for } 1\le i\le r \end{align*} and \begin{align*} A^*f_i=0 \text{ for } i>r. \end{align*} Hence, for $1\le j\le r$, \begin{align*} A^*Ae_j=A^*(a_jf_j)=a_j\overline{a_j}e_j=|a_j|^2e_j, \end{align*} while for $j>r$, \begin{align*} A^*Ae_j=A^*0=0. \end{align*} Thus the eigenvalues of $A^*A$ are $|a_1|^2,\dots,|a_r|^2$, together with $n-r$ zeros. The singular values, being the square roots of the non-zero eigenvalues of $A^*A$, are the positive numbers among $|a_1|,\dots,|a_r|$. Under the finite-dimensional convention that also lists zero singular values, the remaining entries are zeros. The multiplication $a_j\overline{a_j}=|a_j|^2$ is the point: the phase of $a_j$ disappears, while its magnitude remains. [/example] The finite matrix example shows that the singular-value calculation begins with $A^*A$, but a decomposition of $A$ itself requires a separate positive factor. We therefore need a name for the positive square root of $T^*T$, because this operator will carry all singular values and will be the magnitude part in [polar decomposition](/theorems/3074). This definition is the bridge from numerical singular values to an operator factorisation. [definition: Absolute Value of an Operator] Let $H$ and $K$ be Hilbert spaces and let $T \in \mathcal{L}(H,K)$. The absolute value of $T$ is \begin{align*} |T| = (T^*T)^{1/2} \in \mathcal{L}(H). \end{align*} [/definition] Thus $|T|$ is positive and self-adjoint, and, when $T$ is compact, its non-zero eigenvalues are exactly the singular values of $T$.
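For a real invertible $2\times 2$ matrix, $|A|=(A^*A)^{1/2}$ has a closed form. If $M$ is symmetric positive definite and $s=\sqrt{\det M}$, then $M^{1/2}=(M+sI)/\sqrt{\operatorname{tr}M+2s}$. The Python sketch below is a minimal illustration under that assumption, using the matrix $\begin{pmatrix}1&1\\0&2\end{pmatrix}$ from the earlier examples. It computes $|A|$ and checks that $|A|^2=A^*A$. It then compares the eigenvalues of $|A|$ with the singular values $\sqrt{3\pm\sqrt5}$, which differ from the eigenvalues $1,2$ of $A$. Finally, it checks the identity $\|Ax\|=\||A|x\|$, which holds because $(|A|x,|A|x)=(A^*Ax,x)=(Ax,Ax)$. Helper names are illustrative.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def apply(X, v):
    return [X[0][0] * v[0] + X[0][1] * v[1], X[1][0] * v[0] + X[1][1] * v[1]]

def sqrt_pd_2x2(M):
    """Positive square root of a real symmetric positive definite 2x2 matrix."""
    s = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])   # sqrt(det M)
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)                # sqrt(tr M + 2 s)
    return [[(M[0][0] + s) / t, M[0][1] / t], [M[1][0] / t, (M[1][1] + s) / t]]

def eig_sym_2x2(M):
    """Eigenvalues of a real symmetric 2x2 matrix, largest first."""
    tr = M[0][0] + M[1][1]
    disc = math.sqrt((M[0][0] - M[1][1]) ** 2 + 4 * M[0][1] ** 2)
    return ((tr + disc) / 2, (tr - disc) / 2)

A = [[1.0, 1.0], [0.0, 2.0]]           # eigenvalues 1 and 2
AtA = matmul(transpose(A), A)          # [[1, 1], [1, 5]]
absA = sqrt_pd_2x2(AtA)                # |A| = [[3, 1], [1, 7]] / sqrt(10)

square_error = max(abs(p - q) for rp, rq in zip(matmul(absA, absA), AtA)
                   for p, q in zip(rp, rq))
singular_values = tuple(math.sqrt(mu) for mu in eig_sym_2x2(AtA))
eig_absA = eig_sym_2x2(absA)

x = [1.0, -1.0]
norm_Ax = math.hypot(*apply(A, x))        # ||(0, -2)|| = 2
norm_absAx = math.hypot(*apply(absA, x))  # ||(2, -6) / sqrt(10)|| = 2

print(square_error, singular_values, eig_absA, norm_Ax, norm_absAx)
```

The equality of norms says that $|A|x\mapsto Ax$ is an isometry on the range of $|A|$. This is the phase factor that polar decomposition isolates.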
The notation is consistent with the scalar identity $|z|=(\overline z z)^{1/2}$, but the operator factor carrying phase and range direction remains to be identified. [quotetheorem:8401] [citeproof:8401] Polar decomposition separates magnitude from direction. The partial isometry $U$ is not generally unitary: if $T$ has nonzero kernel or non-dense range, $U$ is only isometric on $(\ker T)^\perp$ and vanishes on the kernel under the standard choice. This is why the theorem is a Hilbert-space statement rather than a purely Banach-space statement; it uses adjoints, orthogonal complements, and the positive square root of $T^*T$. Compactness is not required for the factorisation itself, but it is required for the next step: for compact operators, the magnitude $|T|$ has an orthonormal eigenbasis away from its kernel, so applying $U$ to those eigenvectors gives the left singular vectors. [example: Compact Integral Operator with Non-Symmetric Kernel] Let $T:L^2[0,1]\to L^2[0,1]$ be given by \begin{align*} (Tf)(x)=\int_0^1 k(x,y)f(y)\,dy, \end{align*} where $k\in L^2([0,1]^2)$. For $f\in L^2[0,1]$, Cauchy-Schwarz in the $y$-variable gives \begin{align*} |(Tf)(x)|^2\le \left(\int_0^1 |k(x,y)|^2\,dy\right)\left(\int_0^1 |f(y)|^2\,dy\right). \end{align*} Integrating in $x$ gives \begin{align*} \|Tf\|_2^2\le \left(\int_0^1\int_0^1 |k(x,y)|^2\,dy\,dx\right)\|f\|_2^2=\|k\|_{L^2([0,1]^2)}^2\|f\|_2^2, \end{align*} so $T$ is bounded. Since $L^2([0,1]^2)$ is the norm closure of finite sums $\sum_{\ell=1}^N a_\ell(x)b_\ell(y)$, the corresponding integral operators are finite-rank and converge to $T$ in operator norm by the same estimate; hence $T$ is compact. We compute the adjoint. For $f,g\in L^2[0,1]$, \begin{align*} (Tf,g)=\int_0^1\left(\int_0^1 k(x,y)f(y)\,dy\right)\overline{g(x)}\,dx. \end{align*} By [Fubini's theorem](/theorems/2961), \begin{align*} (Tf,g)=\int_0^1 f(y)\left(\int_0^1 k(x,y)\overline{g(x)}\,dx\right)\,dy. 
\end{align*} Since \begin{align*} \overline{\int_0^1 \overline{k(x,y)}g(x)\,dx}=\int_0^1 k(x,y)\overline{g(x)}\,dx, \end{align*} we have \begin{align*} (Tf,g)=\int_0^1 f(y)\overline{\left(\int_0^1 \overline{k(x,y)}g(x)\,dx\right)}\,dy. \end{align*} Thus \begin{align*} (T^*g)(y)=\int_0^1 \overline{k(x,y)}g(x)\,dx. \end{align*} Now apply this formula with $g=Tf$. For $y\in[0,1]$, \begin{align*} (T^*Tf)(y)=\int_0^1 \overline{k(x,y)}(Tf)(x)\,dx. \end{align*} Substituting the definition of $Tf$ gives \begin{align*} (T^*Tf)(y)=\int_0^1 \overline{k(x,y)}\left(\int_0^1 k(x,z)f(z)\,dz\right)\,dx. \end{align*} Using Fubini again, \begin{align*} (T^*Tf)(y)=\int_0^1\left(\int_0^1 \overline{k(x,y)}k(x,z)\,dx\right)f(z)\,dz. \end{align*} Therefore $T^*T$ is the integral operator with kernel \begin{align*} h(y,z)=\int_0^1 \overline{k(x,y)}k(x,z)\,dx. \end{align*} This operator is positive because \begin{align*} (T^*Tf,f)=(Tf,Tf)=\|Tf\|_2^2\ge 0. \end{align*} It is self-adjoint because \begin{align*} (T^*T)^*=T^*(T^*)^*=T^*T. \end{align*} The original operator need not share either symmetry property: for example, $k(x,y)=x$ gives \begin{align*} (Tf)(x)=x\int_0^1 f(y)\,dy, \end{align*} while \begin{align*} (T^*g)(y)=\int_0^1 xg(x)\,dx, \end{align*} so $T\ne T^*$. Moreover, \begin{align*} (T^*Tf)(y)=\left(\int_0^1 x^2\,dx\right)\int_0^1 f(z)\,dz=\frac13\int_0^1 f(z)\,dz, \end{align*} so $T^*T$ is $1/3$ times the orthogonal projection onto the constant functions, whereas \begin{align*} (TT^*g)(x)=x\int_0^1 yg(y)\,dy \end{align*} is $\|x\|_2^2=1/3$ times the orthogonal projection onto the span of the function $x$. The two ranges differ, so $T^*T\ne TT^*$, although both operators have the single non-zero eigenvalue $1/3$; hence $T$ has exactly one non-zero singular value, $1/\sqrt3$. Thus the singular values are read from the positive self-adjoint operator $T^*T$, whose kernel is $h$, even when the original integral kernel has no symmetry. [/example] ## Schmidt Decomposition for Compact Operators The last problem is to turn singular values into an actual expansion of the operator. For matrices, singular value decomposition writes $A$ as a sum of rank-one maps. The compact Hilbert-space version is the Schmidt decomposition, with convergence in operator norm because the singular values of a compact operator, when there are infinitely many, tend to zero.
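The kernel $k(x,y)=x$ from the preceding example is the simplest case of such an expansion: a single rank-one map $f\mapsto s_1(f,v)u$. The sketch below checks this numerically under an assumption of its own, a midpoint discretisation of $L^2[0,1]$ with $n=40$ cells; the names `kernel`, `max_diff` and the grid are illustrative only. It confirms $T\ne T^*$ and $T^*T\ne TT^*$, that both products share the eigenvalue $\approx 1/3$, and that $T$ equals one term with $v$ the normalised constant function and $u$ the normalised function $x$.

```python
import math

# Assumption for illustration: midpoint rule with n cells of width h on [0, 1].
# A function f is stored as the coordinates sqrt(h) * f(x_i); these are
# orthonormal coordinates for the discrete inner product h * sum f(x_i) g(x_i).
n = 40
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]


def kernel(x, y):
    """The non-symmetric kernel k(x, y) = x from the example."""
    return x


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def max_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(len(a)) for j in range(len(a[0])))


# Matrix of T in these coordinates: M[i][j] = h * k(x_i, x_j).
M = [[h * kernel(xi, yj) for yj in xs] for xi in xs]
Mt = [list(col) for col in zip(*M)]  # the kernel is real, so T* is the transpose

TstarT = matmul(Mt, M)
TTstar = matmul(M, Mt)

# T has rank one, so the only non-zero eigenvalue of T*T (and of TT*) is its trace.
trace_TstarT = sum(TstarT[i][i] for i in range(n))
trace_TTstar = sum(TTstar[i][i] for i in range(n))
s1 = math.sqrt(trace_TstarT)

# Schmidt data: v = normalised constant function, u = normalised function x.
v = [math.sqrt(h) for _ in xs]
norm_x = math.sqrt(sum(h * x * x for x in xs))
u = [math.sqrt(h) * x / norm_x for x in xs]
rank_one = [[s1 * u[i] * v[j] for j in range(n)] for i in range(n)]

print(max_diff(M, Mt), max_diff(TstarT, TTstar))
print(trace_TstarT, trace_TTstar, s1)
print(max_diff(M, rank_one))
```

The discrete trace equals $1/3-h^2/12$ (the midpoint-rule error for $\int_0^1 x^2\,dx$), so it approaches $1/3$ as the grid is refined.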
[definition: Rank-One Operator] Let $H$ and $K$ be Hilbert spaces. For $u\in K$ and $v\in H$, the rank-one operator $u\otimes v:H\to K$ is defined by \begin{align*} (u\otimes v)(x)= (x,v)_H u. \end{align*} [/definition] Rank-one operators are the atoms in the singular-value expansion. The convention above is compatible with the inner product being linear in the first argument: the map $x\mapsto (x,v)_H$ is linear in $x$. With these atoms available, the next theorem upgrades the polar decomposition from a factorisation into a convergent diagonal series. [quotetheorem:8402] [citeproof:8402] This theorem is the infinite-dimensional SVD. It describes any compact operator as an operator-norm limit of finite-rank operators whose summands are mutually orthogonal on both the domain and codomain sides. Compactness is essential for the operator-norm convergence statement: the identity operator on an infinite-dimensional Hilbert space has singular value $1$ in every orthogonal direction and cannot be approximated in operator norm by finite-rank operators. The theorem is also stronger than pointwise expansion alone; the tail control by singular values is what makes the approximation uniform over the unit ball. This is the structural reason singular values appear in compact approximation, regularisation of ill-posed inverse problems, and Hilbert-Schmidt theory: they measure how rapidly the operator loses information along orthogonal directions. [example: Schmidt Decomposition of a Finite-Rank Operator] Let \begin{align*} T x = \sum_{j=1}^r a_j (x,e_j)_H f_j, \end{align*} where $(e_1,\dots,e_r)$ is orthonormal in $H$, $(f_1,\dots,f_r)$ is orthonormal in $K$, and $a_j\ne 0$. We compute $T^*T$ to identify the singular values. For $1\le i\le r$ and $x\in H$, \begin{align*} (Tx,f_i)_K=\left(\sum_{j=1}^r a_j(x,e_j)_H f_j,f_i\right)_K=a_i(x,e_i)_H. \end{align*} Since the inner product is conjugate-linear in the second variable, \begin{align*} (x,\overline{a_i}e_i)_H=a_i(x,e_i)_H. 
\end{align*} Thus \begin{align*} T^*f_i=\overline{a_i}e_i. \end{align*} For each $1\le i\le r$, \begin{align*} T^*Te_i=T^*(a_i f_i)=a_i\overline{a_i}e_i=|a_i|^2e_i. \end{align*} If $x$ is orthogonal to $\operatorname{span}\{e_1,\dots,e_r\}$, then $(x,e_j)_H=0$ for every $j$, so $Tx=0$ and hence \begin{align*} T^*Tx=0. \end{align*} Therefore the non-zero eigenvalues of $T^*T$ are $|a_1|^2,\dots,|a_r|^2$, repeated with multiplicity. By the definition of singular values, the singular values of $T$ are $|a_1|,\dots,|a_r|$, reordered in non-increasing order. Write \begin{align*} a_j=|a_j|e^{i\theta_j}. \end{align*} If we set \begin{align*} u_j=e^{i\theta_j}f_j, \end{align*} then $(u_1,\dots,u_r)$ is orthonormal because multiplying each $f_j$ by a complex number of modulus $1$ preserves inner products. Also, \begin{align*} |a_j|(x,e_j)_H u_j=|a_j|(x,e_j)_H e^{i\theta_j}f_j=a_j(x,e_j)_H f_j. \end{align*} Substituting this identity into the original formula gives \begin{align*} T x = \sum_{j=1}^r |a_j|(x,e_j)_H u_j. \end{align*} Thus the complex phases of the coefficients have moved into the left singular vectors, while the singular values themselves are the positive numbers $|a_j|$. [/example] The finite-rank example shows that truncating a Schmidt expansion produces controlled approximations, not merely formal partial sums. This leads to a sharper question: among all rank-$n$ approximations, does the truncation obtained by keeping the largest singular values minimise the operator-norm error? [quotetheorem:8403] [citeproof:8403] The result explains why singular values are not only spectral invariants but approximation numbers. Compactness appears again as the statement that these approximation errors tend to $0$, so compact operators are exactly norm limits of finite-rank operators in this spectral model. 
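A minimal sketch of the truncation error for a concrete finite-rank operator, assuming the standard basis for $e_j$, the rows of a normalised $4\times4$ Hadamard matrix for $f_j$, and hypothetical singular values $3, 2, 0.5, 0.1$. The operator norm is estimated by power iteration on $A^TA$, a technique chosen for this illustration rather than taken from the text. Entry `errors[k]` is the operator-norm error after keeping the $k$ largest terms; it should equal $s_{k+1}$ in the text's $1$-based indexing. The sketch checks only this truncation error, not minimality over all rank-$k$ operators.

```python
import math

# Hypothetical Schmidt data in R^4: e_j = standard basis, f_j = rows of a normalised
# Hadamard matrix (an orthonormal basis), singular values s_1 >= ... >= s_4.
d = 4
F = [[0.5, 0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5, -0.5],
     [0.5, 0.5, -0.5, -0.5],
     [0.5, -0.5, -0.5, 0.5]]
s = [3.0, 2.0, 0.5, 0.1]


def partial_sum(k):
    """Matrix of T_k x = sum_{j<k} s_j (x, e_j) f_j; column j is s_j f_j when j < k."""
    return [[s[col] * F[col][row] if col < k else 0.0 for col in range(d)]
            for row in range(d)]


def matvec(a, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def operator_norm(a, iterations=500):
    """Estimate the largest singular value of a real matrix by power iteration on A^T A."""
    at = transpose(a)
    v = [1.0] * len(a[0])
    for _ in range(iterations):
        w = matvec(at, matvec(a, v))
        norm_w = math.sqrt(sum(t * t for t in w))
        if norm_w == 0.0:
            return 0.0  # in this example, this happens only for the zero tail
        v = [t / norm_w for t in w]
    return math.sqrt(sum(t * t for t in matvec(a, v)))


T = partial_sum(d)
errors = []
for k in range(d + 1):
    tail = [[x - y for x, y in zip(rt, rk)] for rt, rk in zip(T, partial_sum(k))]
    errors.append(operator_norm(tail))

print(errors)  # approximately [3.0, 2.0, 0.5, 0.1, 0.0]
```

The start vector of all ones has a non-zero component along every $e_j$, which is what lets the power iteration find the top remaining singular value of each tail.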
The statement is specific to the operator norm; in Hilbert-Schmidt or trace-class norms the optimal error is measured by different tails of the singular-value sequence, such as square-sums or ordinary sums. For non-compact operators there may be no sequence $s_j\to 0$ governing finite-rank approximation, and the identity operator again shows that the best rank-$n$ operator-norm error can remain equal to $1$ for every $n$. [remark: Relation with Normal Diagonalisation] For a compact normal operator $N$, the singular values are $|\lambda_j|$, where $\lambda_j$ are the non-zero eigenvalues repeated with multiplicity. The spectral theorem keeps the complex phases in the diagonal entries, while the singular-value decomposition moves each phase into the partial isometry in the polar decomposition. Thus the normal theory and the singular-value theory agree on size but answer different structural questions. [/remark] Chapter 6 shows that compact operators retain a matrix-like structure: normal ones through orthogonal eigenvalue expansions that keep the phases in the eigenvalues, and general ones through singular values, where phases and sizes separate into different factors. Chapter 7 replaces those discrete decompositions with projection-valued measures, which are the right language once continuous spectrum enters and countable eigenvalue lists are no longer enough. # 7. Spectral Measures and Projection-Valued Measures This chapter replaces the finite and countable sums from the compact spectral theorems in Chapters 5 and 6 by projection-valued measures. The guiding point is that a normal operator may have continuous spectrum, so its diagonal form cannot usually be written as a countable list of eigenvalues and eigenspaces. Projection-valued measures provide the correct substitute: they assign orthogonal projections to Borel sets in the spectral space, and integration against them turns bounded Borel functions into bounded operators. The chapter builds the measure-theoretic language needed for the [spectral theorem for bounded normal operators](/theorems/8412).
## Projection-Valued Measures on Compact Spectral Spaces The problem is to describe a decomposition of a Hilbert space indexed by Borel subsets rather than by individual eigenvalues. For a diagonal matrix, a subset of the eigenvalue set determines the span of the corresponding eigenspaces. For a multiplication operator, a Borel subset determines multiplication by its indicator function. Projection-valued measures abstract exactly this rule. [definition: Projection-Valued Measure] Let $H$ be a Hilbert space and let $K$ be a compact [metric space](/page/Metric%20Space). A projection-valued measure on $K$ is a map \begin{align*} E : \mathcal B(K) \to \mathcal L(H) \end{align*} such that: 1. $E(\Delta)$ is an orthogonal projection for every $\Delta \in \mathcal B(K)$. 2. $E(\varnothing)=0$ and $E(K)=I$. 3. $E(\Delta_1 \cap \Delta_2)=E(\Delta_1)E(\Delta_2)$ for all $\Delta_1,\Delta_2 \in \mathcal B(K)$. 4. If $(\Delta_n)_{n=1}^\infty$ are pairwise disjoint Borel sets, then \begin{align*} E\left(\bigcup_{n=1}^\infty \Delta_n\right)x = \sum_{n=1}^\infty E(\Delta_n)x \end{align*} for every $x \in H$, with convergence in the norm of $H$. [/definition] Axiom 4, strong countable additivity, uses the mode of convergence naturally forced by orthogonal decompositions of vectors. It asks for norm convergence after applying the projections to each vector, rather than [uniform convergence](/page/Uniform%20Convergence) in operator norm. The stronger requirement would generally fail: if infinitely many $E(\Delta_n)$ are non-zero, then each tail $E\left(\bigcup_{n>N}\Delta_n\right)$ is a non-zero orthogonal projection, so the operator-norm error of the $N$th partial sum equals $1$ for every $N$. [remark: Orthogonality of Disjoint Spectral Pieces] If $\Delta_1 \cap \Delta_2=\varnothing$, then axioms 2 and 3 give $E(\Delta_1)E(\Delta_2)=E(\varnothing)=0$. Hence the subspaces $E(\Delta_1)H$ and $E(\Delta_2)H$ are orthogonal. This is the operator-theoretic counterpart of disjoint measurable sets having no overlap. [/remark] This orthogonality explains why the strong series in the definition has a chance to converge. The summands $E(\Delta_n)x$ are mutually orthogonal vectors, so the Hilbert-space Pythagorean identity controls the partial sums.
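On a finite spectral set every subset is Borel, so the axioms can be checked exhaustively. The sketch below is a minimal illustration, assuming three hypothetical spectral points and a fixed orthonormal basis of $\mathbb R^3$; the helper names (`E`, `close`, `apply`) are ours. It verifies axioms 1 to 3 on all $64$ pairs of subsets and the Pythagorean identity $\sum_j\|E(\{\lambda_j\})x\|^2=\|x\|^2$, whose summands are the point masses of the scalar measure $\Delta\mapsto(E(\Delta)x,x)$.

```python
import itertools
import math

# Hypothetical atomic data: three spectral points and an orthonormal basis of R^3.
K = [2.0, -1.0, 0.5]
r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
basis = [[1 / r3, 1 / r3, 1 / r3], [1 / r2, -1 / r2, 0.0], [1 / r6, 1 / r6, -2 / r6]]
P = [[[ui * vj for vj in v] for ui in v] for v in basis]  # P_j projects onto span{v_j}
ZERO = [[0.0] * 3 for _ in range(3)]
IDENTITY = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]


def E(delta):
    """Projection-valued measure on K: the sum of P_j over the points lambda_j in delta."""
    total = ZERO
    for lam, p in zip(K, P):
        if lam in delta:
            total = [[x + y for x, y in zip(rt, rp)] for rt, rp in zip(total, p)]
    return total


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def apply(a, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def close(a, b, tol=1e-12):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Axioms 1-3 on every subset of the finite set K.
subsets = [set(c) for r in range(len(K) + 1) for c in itertools.combinations(K, r)]
axioms_ok = close(E(set()), ZERO) and close(E(set(K)), IDENTITY)
for d1 in subsets:
    e1 = E(d1)
    axioms_ok = axioms_ok and close(matmul(e1, e1), e1) and close(transpose(e1), e1)
    for d2 in subsets:
        axioms_ok = axioms_ok and close(matmul(e1, E(d2)), E(d1 & d2))

# Disjoint pieces of a vector are orthogonal, so the Pythagorean identity holds;
# the masses ||E({lambda_j})x||^2 are the atoms of the scalar measure of x.
x = [1.0, 2.0, -0.5]
pieces = [apply(E({lam}), x) for lam in K]
masses = [sum(t * t for t in piece) for piece in pieces]
norm_sq = sum(t * t for t in x)
print(axioms_ok, masses, sum(masses), norm_sq)
```

Axiom 4 is automatic here, since a disjoint family of subsets of a three-point set has at most three non-empty members.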
To use scalar measure theory in the construction of spectral integrals, we next need to extract ordinary complex measures from the operator-valued set function. [quotetheorem:8404] [citeproof:8404] Matrix coefficient measures let us translate operator-valued statements into scalar measure theory, but the theorem uses the full projection-valued structure. If $E(\Delta)$ were merely a family of bounded operators rather than orthogonal projections, the number $(E(\Delta)x,x)_H$ need not be real and non-negative, and scalar integration would have no positive measure to start from. If additivity were only asserted for finite disjoint unions, the set functions $\Delta\mapsto(E(\Delta)x,y)_H$ could fail to be countably additive, so the usual dominated convergence and approximation arguments would be unavailable. Thus the theorem identifies the precise bridge from Hilbert-space decompositions to scalar measure theory: first build positive measures $\mu_x$, then recover mixed coefficients by polarization. In the finite-dimensional case this bridge has no hidden analytic difficulty, because every scalar coefficient measure is a finite sum of point masses. That example is worth isolating before moving to continuous spectrum, since it shows that the abstract definition is not replacing diagonalization but extending the same bookkeeping rule beyond finite sums. [example: Finite-Dimensional Atomic Spectral Measure] Let $H=\mathbb C^n$ with its usual inner product, let $A$ be a normal matrix with distinct eigenvalues $\lambda_1,\dots,\lambda_m$, and let $P_j$ be the orthogonal projection onto $\ker(A-\lambda_jI)$. By the finite-dimensional spectral theorem, \begin{align*} A=\sum_{j=1}^m \lambda_jP_j, \end{align*} and the eigenspaces in this decomposition are mutually orthogonal and span $H$, so \begin{align*} P_iP_j=0 \text{ for } i\ne j,\qquad P_j^2=P_j,\qquad P_j^*=P_j,\qquad \sum_{j=1}^m P_j=I. \end{align*} On $K=\{\lambda_1,\dots,\lambda_m\}$, define \begin{align*} E(\Delta)=\sum_{\lambda_j\in\Delta}P_j \end{align*} for each Borel set $\Delta\subset K$.
For each $\Delta\subset K$, the operator $E(\Delta)$ is an orthogonal projection. Indeed, \begin{align*} E(\Delta)^2=\left(\sum_{\lambda_i\in\Delta}P_i\right)\left(\sum_{\lambda_j\in\Delta}P_j\right)=\sum_{\lambda_i,\lambda_j\in\Delta}P_iP_j=\sum_{\lambda_j\in\Delta}P_j=E(\Delta). \end{align*} Also, \begin{align*} E(\Delta)^*=\left(\sum_{\lambda_j\in\Delta}P_j\right)^*=\sum_{\lambda_j\in\Delta}P_j^*=E(\Delta). \end{align*} Thus $E(\Delta)$ is self-adjoint and idempotent, hence an orthogonal projection. The empty set and whole space satisfy \begin{align*} E(\varnothing)=0 \end{align*} and \begin{align*} E(K)=\sum_{j=1}^mP_j=I. \end{align*} For two Borel sets $\Delta,\Omega\subset K$, \begin{align*} E(\Delta)E(\Omega)=\left(\sum_{\lambda_i\in\Delta}P_i\right)\left(\sum_{\lambda_j\in\Omega}P_j\right)=\sum_{\lambda_i\in\Delta,\lambda_j\in\Omega}P_iP_j=\sum_{\lambda_j\in\Delta\cap\Omega}P_j=E(\Delta\cap\Omega). \end{align*} If $(\Delta_n)_{n=1}^{\infty}$ are pairwise disjoint subsets of $K$, then each eigenvalue $\lambda_j$ lies in at most one $\Delta_n$, so for every $x\in H$, \begin{align*} E\left(\bigcup_{n=1}^{\infty}\Delta_n\right)x=\sum_{\lambda_j\in\bigcup_n\Delta_n}P_jx=\sum_{n=1}^{\infty}\sum_{\lambda_j\in\Delta_n}P_jx=\sum_{n=1}^{\infty}E(\Delta_n)x. \end{align*} Only finitely many terms in this series can be nonzero, because $K$ has only $m$ points. Hence the convergence is in norm, and $E$ is a projection-valued measure. For $x\in H$, the scalar coefficient measure is \begin{align*} \mu_x(\Delta)=(E(\Delta)x,x)_H=\left(\sum_{\lambda_j\in\Delta}P_jx,x\right)_H=\sum_{\lambda_j\in\Delta}(P_jx,x)_H. \end{align*} Since $P_j$ is an orthogonal projection, \begin{align*} (P_jx,x)_H=(P_jx,P_jx)_H=\|P_jx\|_H^2. \end{align*} Therefore \begin{align*} \mu_x(\Delta)=\sum_{\lambda_j\in\Delta}\|P_jx\|_H^2. \end{align*} In particular, $\mu_x$ has an atom of mass $\|P_jx\|_H^2$ at $\lambda_j$. 
Finally, integrating the coordinate function $z$ against this atomic projection-valued measure gives \begin{align*} \int_K z\,dE(z)=\sum_{j=1}^m\lambda_jE(\{\lambda_j\})=\sum_{j=1}^m\lambda_jP_j=A. \end{align*} Thus the usual diagonal decomposition of a normal matrix is exactly spectral integration against a finite atomic projection-valued measure. [/example] The finite-dimensional example shows the intended interpretation of $E(\Delta)$: it selects the component of a vector whose spectral values lie in $\Delta$. In infinite dimension, however, spectral pieces are no longer exhausted by finite partitions, so finite additivity is too weak to support integration or limiting decompositions. When Borel sets increase to a larger set, the corresponding projections must converge on each vector to the projection for the union; otherwise mass could disappear in the limit. The following criterion packages that countable-additivity requirement in the strong-operator form used to build projection-valued measures from monotone projection families. [quotetheorem:8405] [citeproof:8405] This criterion is useful because spectral projections are often constructed by first proving monotone convergence of projections, but it also shows why finite additivity is not enough. A finitely additive assignment can behave correctly on disjoint finite unions while losing mass along an increasing sequence of Borel sets; such an assignment would not support integration against countable partitions. The continuity-from-below hypothesis prevents this pathology by forcing the finite partial spectral pieces to converge to the whole spectral piece in the topology relevant to vectors. Once strong countable additivity is available, the familiar scalar tools of integration can be imported through the measures $\mu_{x,y}$. ## Integration Against Projection-Valued Measures The next problem is to turn a bounded Borel function $f:K\to\mathbb C$ into an operator. 
For diagonal matrices the rule is $f(A)=\sum_j f(\lambda_j)P_j$. For a general projection-valued measure, the same formula becomes an operator integral. [definition: Spectral Integral of a Simple Function] Let $E$ be a projection-valued measure on $K$, and let $s:K\to\mathbb C$ be a bounded Borel [simple function](/page/Simple%20Function) of the form \begin{align*} s = \sum_{j=1}^m \alpha_j \mathbb{1}_{\Delta_j}, \end{align*} where the sets $\Delta_1,\dots,\Delta_m$ are pairwise disjoint. The spectral integral of $s$ with respect to $E$ is \begin{align*} \int_K s\,dE := \sum_{j=1}^m \alpha_j E(\Delta_j) \in \mathcal L(H). \end{align*} [/definition] The definition is independent of the chosen disjoint representation of $s$ because common refinements of finite measurable partitions give the same projection sum. It already behaves like a diagonal operator: each spectral region is multiplied by the scalar value assigned to that region. [quotetheorem:8406] [citeproof:8406] The norm bound shows that simple spectral integrals are stable under uniform changes of the scalar function. The hypotheses cannot be dropped casually: without disjointness of the partition, the Pythagorean norm computation double-counts spectral pieces, and without the projection multiplication rule the inner product formula would not reduce to an ordinary scalar integral. The real-valued conclusion is also sharp in the expected way, since complex coefficients generally produce normal diagonal operators rather than self-adjoint ones. The next construction problem is to define $\int f\,dE$ for an arbitrary bounded Borel function $f$, where a finite partition formula is no longer available but uniform simple approximation still is. The simple-function estimate above is the mechanism that makes this extension possible: it turns uniform convergence of scalar functions into operator-norm convergence of their spectral integrals. 
Boundedness is the condition that keeps this approximation inside $\mathcal L(H)$, before any domain theory for unbounded operators enters. [definition: Spectral Integral of a Bounded Borel Function] Let $E$ be a projection-valued measure on a compact metric space $K$, and let $f:K\to\mathbb C$ be bounded and Borel. Choose bounded Borel simple functions $(s_n)_{n=1}^\infty$ such that $\|s_n-f\|_\infty\to 0$. The spectral integral of $f$ with respect to $E$ is the operator \begin{align*} \int_K f\,dE := \lim_{n\to\infty} \int_K s_n\,dE \end{align*} where the limit is taken in operator norm. [/definition] Such sequences exist, because the bounded range of $f$ can be covered by finitely many disjoint Borel sets of arbitrarily small diameter. The preceding norm estimate shows that $\left(\int_K s_n\,dE\right)_{n\ge1}$ is Cauchy in operator norm, so the limit exists in the Banach space $\mathcal L(H)$, and that the limit is independent of the approximating sequence. The resulting map $f\mapsto\int_K f\,dE$ deserves the name functional calculus once it is shown to respect products and complex conjugation. [quotetheorem:8407] [citeproof:8407] This theorem says that spectral integration is a representation of the algebra of bounded Borel functions. The multiplicativity is the decisive point: a merely linear assignment $f\mapsto \int f\,dE$ would not deserve to be called a functional calculus, because it need not preserve products, inverses where they exist, or adjoints. The boundedness assumption also marks the present scope of the construction; unbounded functions require domain questions and belong to the unbounded spectral theorem rather than this chapter. Within the bounded setting, the theorem gives a precise meaning to applying a Borel function to a spectral variable. Step functions give the most visible approximation picture for this calculus, because they make the spectral space look like finitely many bins. That viewpoint also clarifies what changes from Riemann integration: the intervals are not subintervals of the Hilbert space, but subsets of the possible spectral values.
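For an atomic measure the simple spectral integral is the finite sum $\sum_j f(\lambda_j)P_j$, so the algebraic claims can be tested directly. A minimal sketch, assuming a hypothetical three-point set $K=\{1+i,-2,i/2\}\subset\mathbb C$ with rank-one projections onto a fixed orthonormal basis of $\mathbb C^3$; the test functions $f(z)=z^2+1$ and $g(z)=e^z$ are arbitrary choices. It checks $\int fg\,dE=\left(\int f\,dE\right)\left(\int g\,dE\right)$, $\left(\int f\,dE\right)^*=\int\overline f\,dE$, the bound $\|(\int f\,dE)x\|\le\sup_K|f|\,\|x\|$, and that $\int z\,dE(z)$ is normal with eigenvectors $v_j$.

```python
import cmath
import math

# Hypothetical atomic projection-valued measure on K = {1+i, -2, i/2}: E({lambda_j}) = P_j,
# the orthogonal projection onto span{v_j} for an orthonormal basis v_1, v_2, v_3 of C^3.
K = [1 + 1j, -2 + 0j, 0.5j]
r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
basis = [[1 / r3, 1 / r3, 1 / r3], [1 / r2, -1 / r2, 0.0], [1 / r6, 1 / r6, -2 / r6]]
P = [[[ui * vj for vj in v] for ui in v] for v in basis]


def spectral_integral(f):
    """Simple spectral integral over the atoms: sum_j f(lambda_j) P_j."""
    return [[sum(f(lam) * p[i][j] for lam, p in zip(K, P)) for j in range(3)]
            for i in range(3)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def adjoint(a):
    return [[complex(a[j][i]).conjugate() for j in range(3)] for i in range(3)]


def apply(a, x):
    return [sum(a[i][j] * x[j] for j in range(3)) for i in range(3)]


def norm(x):
    return math.sqrt(sum(abs(t) ** 2 for t in x))


def close(a, b, tol=1e-10):
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(3) for j in range(3))


def f(z):
    return z * z + 1


def g(z):
    return cmath.exp(z)


F, G = spectral_integral(f), spectral_integral(g)
A = spectral_integral(lambda z: z)  # the coordinate spectral integral

multiplicative = close(matmul(F, G), spectral_integral(lambda z: f(z) * g(z)))
star_preserving = close(adjoint(F), spectral_integral(lambda z: f(z).conjugate()))
A_is_normal = close(matmul(A, adjoint(A)), matmul(adjoint(A), A))
eigen_ok = all(norm([p - lam * q for p, q in zip(apply(A, v), v)]) < 1e-10
               for lam, v in zip(K, basis))
sup_f = max(abs(f(lam)) for lam in K)
norm_bound_ok = all(norm(apply(F, x)) <= sup_f * norm(x) + 1e-10
                    for x in ([1, 0, 0], [1, 2, -0.5], [0.3j, -1, 2 + 1j]))
print(multiplicative, star_preserving, A_is_normal, eigen_ok, norm_bound_ok)
```

Because the projections are real symmetric, the adjoint only conjugates the scalar weights $f(\lambda_j)$, which is exactly why $\overline f$ appears in the second check.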
[example: Step-Function Spectral Decomposition] Let $K=[a,b]$, let $E$ be a projection-valued measure on $K$, fix a partition $a=t_0<t_1<\dots<t_m=b$, and set \begin{align*} \Delta_j=(t_{j-1},t_j] \end{align*} for $1\le j\le m$. For the simple function \begin{align*} s=\sum_{j=1}^m \alpha_j\mathbb{1}_{\Delta_j}, \end{align*} the definition of the simple spectral integral gives \begin{align*} \int_K s\,dE=\sum_{j=1}^m \alpha_jE(\Delta_j). \end{align*} If $x\in H$ and we look only at its spectral component $E(\Delta_k)x$ in one interval $\Delta_k$, then \begin{align*} \left(\int_K s\,dE\right)E(\Delta_k)x=\sum_{j=1}^m \alpha_jE(\Delta_j)E(\Delta_k)x. \end{align*} By the multiplication rule for a projection-valued measure, \begin{align*} E(\Delta_j)E(\Delta_k)=E(\Delta_j\cap\Delta_k). \end{align*} The intervals $\Delta_1,\dots,\Delta_m$ are pairwise disjoint, so $E(\Delta_j\cap\Delta_k)=0$ when $j\ne k$, while $E(\Delta_k\cap\Delta_k)=E(\Delta_k)$. Hence \begin{align*} \left(\int_K s\,dE\right)E(\Delta_k)x=\alpha_kE(\Delta_k)x. \end{align*} Thus the operator $\int_K s\,dE$ multiplies the spectral component lying in $\Delta_k$ by the scalar $\alpha_k$. The point $a$ is not contained in any $\Delta_j$, and for the spectral component $E(\{a\})x$ one has \begin{align*} \left(\int_K s\,dE\right)E(\{a\})x=\sum_{j=1}^m \alpha_jE(\Delta_j\cap\{a\})x=0. \end{align*} In particular $s(a)=0$, so the error $|s(a)-a|=|a|$ does not shrink when the partition is refined; to approximate the identity function $\operatorname{id}(t)=t$ uniformly on $[a,b]$, the left endpoint must be covered as well. Set \begin{align*} \tilde s=a\mathbb{1}_{\{a\}}+\sum_{j=1}^m \alpha_j\mathbb{1}_{\Delta_j} \quad\text{with}\quad \alpha_j\in[t_{j-1},t_j]. \end{align*} Then $|\tilde s(t)-t|\le\max_{1\le j\le m}(t_j-t_{j-1})$ for every $t\in[a,b]$, and the norm estimate for spectral integrals gives \begin{align*} \left\|\int_K \tilde s\,dE-\int_K \operatorname{id}\,dE\right\|_{\mathcal L(H)}\le \|\tilde s-\operatorname{id}\|_\infty\le\max_{1\le j\le m}(t_j-t_{j-1}). \end{align*} Therefore step functions approximate the operator represented by the spectral resolution by multiplying finer and finer spectral intervals by scalar sample values.
[/example] Step functions show how the integral generalizes Riemann sums, but the projections are indexed by spectral sets rather than intervals of vectors. The most important example comes from ordinary multiplication by functions on an $L^2$ space. [example: Multiplication Operators on $L^2$] Let $(X,\mathcal A,\mu)$ be a finite measure space, let $H=L^2(X,\mathcal A,\mu)$, and let $m:X\to K$ be a measurable map into a compact metric space $K$. For each Borel set $\Delta\subset K$, define \begin{align*} (E(\Delta)u)(x)=\mathbb{1}_{m^{-1}(\Delta)}(x)u(x), \qquad u\in L^2(X,\mathcal A,\mu). \end{align*} Since $m^{-1}(\Delta)\in\mathcal A$ and $|\mathbb{1}_{m^{-1}(\Delta)}|\le 1$, this defines a bounded operator on $L^2(X,\mathcal A,\mu)$. We verify that $E(\Delta)$ is an orthogonal projection. For $u\in L^2(X,\mathcal A,\mu)$, \begin{align*} (E(\Delta)^2u)(x)=\mathbb{1}_{m^{-1}(\Delta)}(x)\mathbb{1}_{m^{-1}(\Delta)}(x)u(x)=\mathbb{1}_{m^{-1}(\Delta)}(x)u(x)=(E(\Delta)u)(x). \end{align*} Thus $E(\Delta)^2=E(\Delta)$. Also, for $u,v\in L^2(X,\mathcal A,\mu)$, \begin{align*} (E(\Delta)u,v)_H=\int_X \mathbb{1}_{m^{-1}(\Delta)}(x)u(x)\overline{v(x)}\,d\mu(x). \end{align*} Since the indicator is real-valued, \begin{align*} \int_X \mathbb{1}_{m^{-1}(\Delta)}(x)u(x)\overline{v(x)}\,d\mu(x)=\int_X u(x)\overline{\mathbb{1}_{m^{-1}(\Delta)}(x)v(x)}\,d\mu(x)=(u,E(\Delta)v)_H. \end{align*} Hence $E(\Delta)^*=E(\Delta)$, so $E(\Delta)$ is an orthogonal projection. The normalization axioms are immediate from the indicators: \begin{align*} E(\varnothing)u=\mathbb{1}_{\varnothing}u=0. \end{align*} Also, \begin{align*} E(K)u=\mathbb{1}_{m^{-1}(K)}u=\mathbb{1}_Xu=u. \end{align*} For Borel sets $\Delta,\Omega\subset K$, \begin{align*} (E(\Delta)E(\Omega)u)(x)=\mathbb{1}_{m^{-1}(\Delta)}(x)\mathbb{1}_{m^{-1}(\Omega)}(x)u(x)=\mathbb{1}_{m^{-1}(\Delta\cap\Omega)}(x)u(x). \end{align*} Therefore $E(\Delta)E(\Omega)=E(\Delta\cap\Omega)$. 
It remains to check countable additivity in the strong operator topology. Let $(\Delta_n)_{n=1}^{\infty}$ be pairwise disjoint Borel subsets of $K$ and let $u\in L^2(X,\mathcal A,\mu)$. The preimages $m^{-1}(\Delta_n)$ are pairwise disjoint, so for each $N$, \begin{align*} \sum_{n=1}^N E(\Delta_n)u=\mathbb{1}_{m^{-1}(\bigcup_{n=1}^N\Delta_n)}u. \end{align*} Thus \begin{align*} E\left(\bigcup_{n=1}^{\infty}\Delta_n\right)u-\sum_{n=1}^N E(\Delta_n)u=\mathbb{1}_{m^{-1}(\bigcup_{n=N+1}^{\infty}\Delta_n)}u. \end{align*} Taking $L^2$ norms gives \begin{align*} \left\|E\left(\bigcup_{n=1}^{\infty}\Delta_n\right)u-\sum_{n=1}^N E(\Delta_n)u\right\|_H^2=\int_X \mathbb{1}_{m^{-1}(\bigcup_{n=N+1}^{\infty}\Delta_n)}(x)|u(x)|^2\,d\mu(x). \end{align*} The indicators on the right decrease pointwise to $0$ and are bounded by $1$, so dominated convergence gives convergence of the integral to $0$. Hence the partial sums converge in norm to $E(\bigcup_n\Delta_n)u$, and $E$ is a projection-valued measure. Now let $f:K\to\mathbb C$ be bounded and Borel. First suppose $s=\sum_{j=1}^r\alpha_j\mathbb{1}_{\Gamma_j}$ is a bounded Borel simple function written over pairwise disjoint Borel sets. Then \begin{align*} \left(\int_K s\,dE\right)u=\sum_{j=1}^r\alpha_jE(\Gamma_j)u. \end{align*} Evaluating at $x\in X$ gives \begin{align*} \left(\sum_{j=1}^r\alpha_jE(\Gamma_j)u\right)(x)=\sum_{j=1}^r\alpha_j\mathbb{1}_{m^{-1}(\Gamma_j)}(x)u(x)=\left(\sum_{j=1}^r\alpha_j\mathbb{1}_{\Gamma_j}(m(x))\right)u(x)=(s\circ m)(x)u(x). \end{align*} Therefore \begin{align*} \left(\int_K s\,dE\right)u=(s\circ m)u. \end{align*} Choose bounded Borel simple functions $s_n$ with $\|s_n-f\|_\infty\to 0$. Since $|s_n(m(x))-f(m(x))|\le\|s_n-f\|_\infty$, we have \begin{align*} \|(s_n\circ m)u-(f\circ m)u\|_H\le \|s_n-f\|_\infty\|u\|_H. \end{align*} By the definition of the bounded Borel spectral integral, \begin{align*} \int_K f\,dE=\lim_{n\to\infty}\int_K s_n\,dE \end{align*} in operator norm. 
Applying this limit to $u$ and using the simple-function identity gives \begin{align*} \left(\int_K f\,dE\right)u=\lim_{n\to\infty}(s_n\circ m)u=(f\circ m)u. \end{align*} Thus spectral integration for this projection-valued measure is exactly multiplication by $f\circ m$, which is the usual functional calculus for multiplication operators. [/example] This model should be kept in mind throughout spectral theory. The [spectral theorem for normal operators](/theorems/2695) says that, after a unitary change of representation, every bounded normal operator behaves like this multiplication example. ## Spectral Support and Generalized Diagonalization The remaining question is how the projection-valued measure records the part of the [compact space](/page/Compact%20Space) that is actually seen by the Hilbert space. This is not just a naming issue: the same multiplication operator can be presented using a compact set $K$ that is larger than the values actually taken up to null sets, and then points in the extra region would falsely look like possible spectral values unless the zero-projection regions are removed. For example, multiplication by a function whose essential range is contained in $[0,1]$ may still be discussed as a map into $[-1,2]$; the intervals outside $[0,1]$ are invisible to every vector in $L^2$. The support extracts the smallest closed part of $K$ that the Hilbert space can detect. [definition: Support of a Projection-Valued Measure] Let $E$ be a projection-valued measure on a compact metric space $K$. The support of $E$ is \begin{align*} \operatorname{supp}(E) := K\setminus \bigcup\{U\subset K : U\text{ is open and }E(U)=0\}. \end{align*} [/definition] This definition says that a point belongs to the support precisely when every open neighbourhood of it carries a nonzero spectral projection. 
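A discrete version of the multiplication example makes both the functional calculus and the support concrete. The sketch assumes, for illustration only, a finite measure space of six points with hypothetical weights, one of them $0$, so $L^2$ norms are weighted sums and the zero-weight point is a null set. It checks that a step function sampling right endpoints of the bins (and taking the value $a$ at $a$) gives $\|(s\circ m)u-mu\|\le(\text{mesh})\,\|u\|$, and that the value $1.8$, taken by $m$ only on the null point, lies outside the support: $E((1.5,2))=0$ on $L^2$.

```python
import math

# Hypothetical finite measure space X = {0, ..., 5} with point weights; the last point
# has weight 0, so it is a null set. The map m: X -> K = [-1, 2] takes the value 1.8
# only on that null set.
weights = [0.2, 0.1, 0.3, 0.25, 0.15, 0.0]
m_values = [0.0, 0.25, 0.5, 0.9, 1.0, 1.8]
a, b = -1.0, 2.0


def l2_norm(u):
    return math.sqrt(sum(w * abs(ui) ** 2 for w, ui in zip(weights, u)))


def apply_integral(f, u):
    """For E(Delta) = multiplication by 1_{m^{-1}(Delta)}, (int f dE) u = (f o m) u."""
    return [f(mx) * ui for mx, ui in zip(m_values, u)]


def mass(indicator):
    """||E(Delta) 1||^2 = mu(m^{-1}(Delta)); E(Delta) = 0 on L^2 exactly when this is 0."""
    return sum(w for w, mx in zip(weights, m_values) if indicator(mx))


def step_approximation(bins):
    """Step function equal to a at a and to t_j on (t_{j-1}, t_j]; returns (s, mesh)."""
    width = (b - a) / bins

    def s(t):
        if t <= a:
            return a
        j = min(bins, math.ceil((t - a) / width))
        return a + j * width

    return s, width


u = [1.0, -2.0, 0.5, 3.0, 1.0, 7.0]
bound_ok = True
for bins in (3, 10, 100):
    s, width = step_approximation(bins)
    diff = [p - q for p, q in zip(apply_integral(s, u), apply_integral(lambda t: t, u))]
    bound_ok = bound_ok and l2_norm(diff) <= width * l2_norm(u) + 1e-12

# Values hit only on null sets are invisible: the support is the essential range.
support = sorted({mx for w, mx in zip(weights, m_values) if w > 0})
print(bound_ok, support)
print(mass(lambda t: 1.5 < t < 2.0), mass(lambda t: 0.8 < t < 1.2))
```

On a finite space the essential range is simply the set of values taken on points of positive weight, which is why `support` can be read off directly here.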
For later comparison with the spectrum of an operator, we need the support to be the minimal closed subset that carries all the spectral mass; the next theorem verifies this minimality. [quotetheorem:8408] [citeproof:8408] The minimality result says that no smaller closed subset of $K$ contains all spectral pieces. The second-countability of the compact metric space is doing real work here: it turns the possibly uncountable union of zero open sets into a countable one, so countable additivity can see the whole complement of the support. Without such a carrier theorem, enlarging $K$ would make spectral statements depend on arbitrary ambient choices rather than on the projections themselves. The next definition is needed when a particular operator is represented as the integral of the coordinate function: it names this closed carrier as the spectral support of that representation. [definition: Spectral Support of an Operator Representation] Let $H$ be a Hilbert space, let $E$ be a projection-valued measure on a compact subset $K\subset\mathbb C$, and define the bounded operator $T\in\mathcal L(H)$ by \begin{align*} T=\int_K z\,dE(z). \end{align*} The spectral support of this representation of $T:H\to H$ is $\operatorname{supp}(E)$. [/definition] The word representation is important here. Later, the spectral theorem identifies this set with $\sigma(T)$ for bounded normal operators when $K$ is chosen as a compact set containing the spectrum. Before proving that full theorem, we need to check that an operator built from a coordinate spectral integral already has the formal properties expected of a diagonal normal operator. [quotetheorem:8409] [citeproof:8409] This theorem is the formal version of diagonalization. The scalar variable $z$ plays the role of an eigenvalue, while $E(\Delta)H$ plays the role of the spectral subspace whose spectral values lie in $\Delta$. 
Normality is not an accidental consequence: it comes from the commutative scalar identity $z\overline z=\overline z z$, so the construction cannot represent a general non-normal operator by a single projection-valued measure in this way. The real-support case also explains the self-adjoint theorem as the special case where the spectral variable has no imaginary part. These limitations point back to the finite-dimensional picture, where normal matrices are exactly the matrices unitarily diagonalized by orthogonal spectral projections. [example: Diagonal Matrices Revisited] Let \begin{align*} K=\{\lambda_1,\dots,\lambda_m\} \end{align*} and define \begin{align*} E(\Delta)=\sum_{\lambda_j\in\Delta}P_j \end{align*} for $\Delta\subset K$. Since $E(\{\lambda_j\})=P_j$, the coordinate spectral integral is \begin{align*} \int_K z\,dE(z)=\sum_{j=1}^m \lambda_jE(\{\lambda_j\}). \end{align*} Substituting $E(\{\lambda_j\})=P_j$ gives \begin{align*} \int_K z\,dE(z)=\sum_{j=1}^m\lambda_jP_j=A. \end{align*} If $f:K\to\mathbb C$ is any function, then on the finite set $K$ it has the simple-function expansion \begin{align*} f=\sum_{j=1}^m f(\lambda_j)\mathbb{1}_{\{\lambda_j\}}. \end{align*} Therefore \begin{align*} \int_K f(z)\,dE(z)=\sum_{j=1}^m f(\lambda_j)E(\{\lambda_j\}). \end{align*} Again using $E(\{\lambda_j\})=P_j$, this becomes \begin{align*} \int_K f(z)\,dE(z)=\sum_{j=1}^m f(\lambda_j)P_j. \end{align*} This is exactly the usual diagonal functional calculus for $A$, so \begin{align*} f(A)=\sum_{j=1}^m f(\lambda_j)P_j=\int_K f(z)\,dE(z). \end{align*} The support can also be read off from the atoms. If $U\subset K$ is open in the relative topology, then \begin{align*} E(U)=\sum_{\lambda_j\in U}P_j. \end{align*} If $\lambda_k\in U$ and $P_k\ne 0$, then \begin{align*} E(U)P_k=\sum_{\lambda_j\in U}P_jP_k=P_k, \end{align*} because $P_jP_k=0$ for $j\ne k$ and $P_k^2=P_k$. Hence $E(U)\ne 0$. 
Conversely, if $U$ contains no eigenvalue with nonzero eigenspace, then every projection in the displayed sum is zero, so $E(U)=0$. Thus $\operatorname{supp}(E)$ is precisely the set of eigenvalues whose eigenspaces are nonzero. Ordinary diagonalization is therefore spectral integration against a finite atomic projection-valued measure: the atom at $\lambda_j$ is the projection $P_j$ onto the corresponding eigenspace. [/example] The same language also describes continuous spectrum, where there need not be any eigenvectors at all. Consider multiplication by a bounded measurable function $m$ on $L^2(X,\mu)$, where $m$ takes values in a compact set $K\subset\mathbb C$. The spectral projection $E(\Delta)$ is multiplication by $\mathbb{1}_{m^{-1}(\Delta)}$, and the operator is \begin{align*} M_m=\int_K z\,dE(z). \end{align*} The support is the essential range of $m$: a point lies outside it exactly when it has a neighbourhood $U$ with $\mu(m^{-1}(U))=0$. This is the model that turns the spectral theorem from a statement about bases of eigenvectors into a statement about measure-theoretic diagonalization. Chapter 7 recasts diagonalization in measure-theoretic form, preparing the final operator-theoretic statement of the spectral theorem. Chapter 8 then combines that framework with the earlier Hilbert-space and functional-analytic tools to give the spectral theorem for bounded normal operators in its full generality. # 8. Spectral Theorem for Bounded Normal Operators The previous chapters developed spectral theory through resolvents, compactness, and the Hilbert-space geometry of self-adjoint operators. This chapter uses the projection-valued measure integration developed in Chapter 7, the [Riesz representation theorem for positive functionals](/theorems/2979) on $C(K)$, and the basic Hilbert-space facts about orthogonal direct sums and projections. It replaces finite-dimensional diagonalisation by a measure-theoretic model: a bounded normal operator is multiplication by the coordinate function after decomposing the Hilbert space into cyclic pieces.
The central questions are how the operator determines its spectral measure, how functional calculus becomes multiplication by functions, and how the spectrum is read from the support of that measure. ## Cyclic Subspaces and Multiplication Models A finite-dimensional normal matrix is diagonalised by finding enough orthogonal eigenvectors. Infinite-dimensional normal operators often have too few eigenvectors. For example, multiplication by $x$ on $L^2([0,1],\mathcal L^1)$ has no eigenvectors: $xf(x)=\lambda f(x)$ almost everywhere forces $f$ to vanish almost everywhere outside the singleton $\{\lambda\}$, which has measure zero. This failure shows that the right substitute is not a basis of eigenvectors but a measure space on which the operator acts by multiplication. The first problem is to understand this model on the smallest invariant subspaces generated by a single vector. [definition: Normal Operator] Let $H$ be a complex Hilbert space. A bounded operator $T \in \mathcal{L}(H)$ is normal if \begin{align*} T^*T = TT^*. \end{align*} [/definition] Normality is the structural hypothesis that lets the adjoint and the operator be analysed simultaneously. To build a multiplication model from this condition, we need to know what part of the Hilbert space is generated by repeatedly applying $T$ and $T^*$ to one vector. [definition: Cyclic Subspace] Let $T \in \mathcal{L}(H)$ and let $h \in H$. The cyclic subspace generated by $h$ for $T$ is \begin{align*} H_h = \overline{\{p(T,T^*)h : p \in \mathbb{C}[z,\bar z]\}}. \end{align*} The vector $h$ is cyclic for $T$ if $H_h = H$. [/definition] The closure is essential. In the model, the vectors $p(T,T^*)h$ correspond to polynomials in $z$ and $\bar z$, which are only dense in the relevant $L^2$ space; the Hilbert-space model is obtained by completing in the $L^2$ norm. The next problem is to prove that a cyclic normal operator is not merely similar to multiplication, but unitarily equivalent to multiplication by the coordinate function on an $L^2$ space.
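In finite dimensions the unitary equivalence can be checked directly. The sketch below assumes a hypothetical normal $3\times3$ matrix $T=\sum_j\lambda_ju_ju_j^*$. It is built from the discrete Fourier basis $u_j$ and distinct eigenvalues $\lambda_j$, chosen only for illustration. The scalar spectral measure of $h$ is then the atomic measure $\mu_h=\sum_j|(h,u_j)_H|^2\delta_{\lambda_j}$. For one polynomial $p(z,\bar z)$, the code verifies three identities:

- $\|p(T,T^*)h\|_H^2=\int|p|^2\,d\mu_h$, which is the norm identity behind the map $p(T,T^*)h\mapsto p$.
- $(p(T,T^*)h,h)_H=\int p\,d\mu_h$.
- $\mu_h(K)=\|h\|_H^2$.

```python
import cmath
import math

N = 3
OMEGA = cmath.exp(2j * math.pi / N)
# Hypothetical data: an orthonormal eigenbasis (discrete Fourier vectors)
# and distinct eigenvalues, so T = sum_j lambda_j u_j u_j^* is normal.
U_VECS = [[OMEGA ** (j * k) / math.sqrt(N) for k in range(N)] for j in range(N)]
LAMBDAS = [1.0, 1j, -2.0]

def inner(u, v):
    """Hilbert-space inner product (u, v), linear in u."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def matvec(A, v):
    return [sum(A[r][c] * v[c] for c in range(N)) for r in range(N)]

T = [[sum(LAMBDAS[j] * U_VECS[j][r] * U_VECS[j][c].conjugate() for j in range(N))
      for c in range(N)] for r in range(N)]
T_STAR = [[T[c][r].conjugate() for c in range(N)] for r in range(N)]

def apply_poly(coeffs, h):
    """p(T, T*) h for p(z, zbar) = sum c_ab z^a zbar^b, with coeffs = {(a, b): c_ab}."""
    out = [0j] * N
    for (a, b), c in coeffs.items():
        v = list(h)
        for _ in range(b):
            v = matvec(T_STAR, v)
        for _ in range(a):
            v = matvec(T, v)
        out = [o + c * x for o, x in zip(out, v)]
    return out

def eval_poly(coeffs, z):
    """The scalar function p(z, zbar) representing p(T, T*) h in L^2(mu_h)."""
    z = complex(z)
    return sum(c * z ** a * z.conjugate() ** b for (a, b), c in coeffs.items())

def spectral_weights(h):
    """Atoms of the scalar spectral measure: mu_h({lambda_j}) = |(h, u_j)|^2."""
    return [abs(inner(h, u)) ** 2 for u in U_VECS]
```

Because $T$ is normal, the order in which powers of $T$ and $T^*$ are applied in `apply_poly` does not affect the result.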
[quotetheorem:8410] [citeproof:8410] This theorem is the infinite-dimensional analogue of diagonalising on the orbit of one vector, but it also marks the point where eigenvectors stop being the central object. The nonzero hypothesis on $h$ is needed because the constant function $1$ in the model has norm $\mu_h(K)^{1/2}=\|h\|_H$, so the zero vector would give the degenerate zero measure rather than a genuine cyclic representation. Normality is the hypothesis that prevents hidden Jordan behaviour: on $\mathbb C^2$, the nilpotent operator $N e_2=e_1$ and $N e_1=0$ is cyclic, but $\sigma(N)=\{0\}$ and any multiplication model on a measure space supported at $\{0\}$ would be the zero operator, not $N$. Thus the theorem is not a cyclic representation theorem for arbitrary bounded operators; it uses normality to turn the cyclic orbit into a scalar multiplication model. The limitation is that this theorem only describes $H_h$, not necessarily all of $H$; the next section explains how to assemble many such pieces. The following example is the basic test case: it has a cyclic vector and a complete spectral representation, but no eigenbasis at all. [example: Continuous Spectrum With A Cyclic Vector] Let $H=L^2([0,1],\mathcal L^1)$ and let $T=M_x$, so $(Tf)(x)=xf(x)$. Since $x$ is real-valued, $T^*=T$: for $f,g\in L^2([0,1])$, \begin{align*} (Tf,g)_{L^2}=\int_0^1 xf(x)\overline{g(x)}\,d\mathcal L^1(x)=\int_0^1 f(x)\overline{xg(x)}\,d\mathcal L^1(x)=(f,Tg)_{L^2}. \end{align*} Thus polynomials in $T$ and $T^*$ applied to $1$ are exactly polynomial functions in $x$. Polynomial functions are uniformly dense in $C([0,1])$, and $C([0,1])$ is dense in $L^2([0,1],\mathcal L^1)$, so the constant function $1$ is cyclic. For a polynomial $p$, the vector $p(T)1$ is the function $x\mapsto p(x)$, hence \begin{align*} (p(T)1,1)_{L^2}=\int_0^1 p(x)\overline{1}\,d\mathcal L^1(x)=\int_0^1 p(x)\,d\mathcal L^1(x). 
\end{align*} Therefore the cyclic spectral measure associated to $1$ is [Lebesgue measure](/page/Lebesgue%20Measure) on $[0,1]$. The operator is self-adjoint and cyclic, but it has no eigenvectors. If $M_xf=\lambda f$, then for almost every $x\in[0,1]$, \begin{align*} xf(x)=\lambda f(x). \end{align*} Subtracting the right-hand side gives \begin{align*} (x-\lambda)f(x)=0 \end{align*} for almost every $x$. Hence $f(x)=0$ for almost every $x\ne\lambda$, so $f$ is supported, up to a null set, on $\{\lambda\}\cap[0,1]$. This set has Lebesgue measure $0$, and therefore $\|f\|_{L^2}^2=\int_0^1 |f(x)|^2\,d\mathcal L^1(x)=0$. Thus $f=0$ in $L^2([0,1],\mathcal L^1)$, so no nonzero eigenvector exists. The spectral measure is continuous mass spread across intervals, not mass coming from eigenspaces at points. [/example] The example shows why spectral theory for normal operators is a theory of measures rather than only of eigenvalues. To make this representation independent of a chosen vector, the next step is to split the whole Hilbert space into orthogonal cyclic pieces. ## Orthogonal Cyclic Decomposition and Spectral Measures A single cyclic vector need not exist. For instance, if $T=\lambda I$ on a Hilbert space of dimension greater than one, then $H_h=\operatorname{span}\{h\}$ for every nonzero $h$, so no single vector can generate the whole space. More generally, a diagonal matrix has no cyclic vector once an eigenvalue is repeated, because $\dim H_h$ is at most the number of distinct eigenvalues. General Hilbert spaces can require many cyclic summands. The guiding question is how to assemble the cyclic models without losing the operator-theoretic structure. Here a cyclic subspace means the closed span of all vectors obtained from one vector by applying continuous functions of the operator. For normal $T$ this agrees with the polynomial definition, since polynomials in $z$ and $\bar z$ are uniformly dense in $C(\sigma(T))$. More algebraically, $C^*(T,I)$ denotes the smallest closed self-adjoint operator algebra containing $T$ and the identity $I$; a subspace is reducing when it is invariant for both $T$ and $T^*$.
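For diagonal matrices the obstruction can be computed exactly. The sketch below assumes real diagonal entries, so that $T^*=T$ and $p(T,T^*)h$ reduces to a polynomial in $T$ applied to $h$. By the Cayley-Hamilton theorem, $H_h$ is then spanned by $h,Th,\dots,T^{n-1}h$, and its dimension is a rank computed with exact rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of rational row vectors, by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def cyclic_dimension(diag, h):
    """dim H_h for T = diag(diag) with real entries (so T* = T).

    By Cayley-Hamilton, H_h is spanned by h, Th, ..., T^{n-1} h.
    """
    vectors, v = [], [Fraction(x) for x in h]
    for _ in range(len(diag)):
        vectors.append(v)
        v = [Fraction(d) * x for d, x in zip(diag, v)]
    return rank(vectors)
```

For $T=\operatorname{diag}(1,1,2)$, every vector generates a cyclic subspace of dimension at most $2$. However, the orthogonal vectors $(1,0,1)$ and $(0,1,0)$ generate cyclic subspaces of dimensions $2$ and $1$, which together fill $\mathbb C^3$.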
The general construction of maximal orthogonal cyclic reducing summands is useful background, but in this note it should be treated as a structural method rather than as a new theorem to be used computationally. It relies on maximality arguments and on cyclic representation theory that are not developed here. [remark: Cyclic Reduction as a Structural Tool] The normal spectral theorem may be organized by decomposing the representation of the algebra $C^*(T,I)$ into orthogonal cyclic reducing pieces. On each cyclic piece, the operator can be modeled as multiplication by the spectral variable on an $L^2$-space. The reducing condition is essential: without invariance for both $T$ and $T^*$, normality may be lost on the summands. This is not a constructive algorithm in general, and the family of cyclic pieces need not be countable. Its value is architectural: once the multiplication models are assembled, the remaining coordinate-free data are the projections corresponding to Borel subsets of the spectral variable. [/remark] Chapter 7 introduced projection-valued measures, scalar spectral measures, and spectral integrals. We now use that language rather than redefining it: the next theorem proves that every bounded normal operator supplies exactly such a projection-valued measure, so that the operator itself is the integral of the coordinate function. [quotetheorem:8412] [citeproof:8412] The theorem says that a bounded normal operator has been diagonalised, but the diagonal is now a measurable function rather than a sequence. Normality is essential here: non-normal operators can have nilpotent or Jordan-type behaviour that cannot be represented by multiplication by a scalar function. The theorem also has a limitation worth keeping in view: the spectral measure is projection-valued, not a single scalar measure, until a vector or pair of vectors is chosen. 
The ranges $E(B)H$ are the spectral subspaces corresponding to spectral windows $B$, and the projections $E(B)$ are what make the Borel functional calculus possible. [example: Bilateral Shift As Multiplication By The Circle Coordinate] Let $H=\ell^2(\mathbb Z)$ with standard basis $(e_n)_{n\in\mathbb Z}$, and define $Ue_n=e_{n+1}$. Let $m$ denote normalised arc-length measure on the unit circle $\mathbb T$, and define the [Fourier transform](/page/Fourier%20Transform) $\mathcal F:\ell^2(\mathbb Z)\to L^2(\mathbb T,m)$ on basis vectors by \begin{align*} \mathcal F e_n(z)=z^n. \end{align*} Since $(z^n)_{n\in\mathbb Z}$ is an orthonormal basis of $L^2(\mathbb T,m)$, this prescription extends to a unitary map. On each basis vector, \begin{align*} (\mathcal F Ue_n)(z)=\mathcal F e_{n+1}(z)=z^{n+1}. \end{align*} Also, \begin{align*} (M_z\mathcal F e_n)(z)=z\cdot z^n=z^{n+1}. \end{align*} Thus $\mathcal F Ue_n=M_z\mathcal F e_n$ for every $n\in\mathbb Z$, and by linearity and continuity, \begin{align*} \mathcal F U\mathcal F^{-1}=M_z. \end{align*} In the multiplication model, the projection associated to a Borel set $B\subset\mathbb T$ is multiplication by its indicator: \begin{align*} \widetilde E(B)f=\mathbb 1_B f. \end{align*} Transporting this projection back to $\ell^2(\mathbb Z)$ gives \begin{align*} E(B)=\mathcal F^{-1}M_{\mathbb 1_B}\mathcal F. \end{align*} For this projection-valued measure, \begin{align*} \int_{\mathbb T} z\,d\widetilde E(z)=M_z, \end{align*} so \begin{align*} \int_{\mathbb T} z\,dE(z)=\mathcal F^{-1}M_z\mathcal F=U. \end{align*} Finally, $\sigma(U)=\sigma(M_z)=\mathbb T$. Indeed, if $\lambda\notin\mathbb T$, then $|z-\lambda|$ is bounded below on $\mathbb T$, so $M_z-\lambda I=M_{z-\lambda}$ has bounded inverse $M_{1/(z-\lambda)}$. If $\lambda\in\mathbb T$, then every arc $A_n=\{z\in\mathbb T:|z-\lambda|<1/n\}$ has positive $m$-measure. So $f_n=m(A_n)^{-1/2}\mathbb 1_{A_n}$ is a unit vector with $\|(M_z-\lambda I)f_n\|_{L^2}\le 1/n$. Hence $M_z-\lambda I$ is not bounded below, and therefore it is not invertible.
The bilateral shift is therefore exactly multiplication by the circle coordinate, with spectral projections given by cutting the circle into Borel pieces. [/example] This model is the normal-operator replacement for Jordan form. There are no nilpotent blocks because normality forces orthogonal spectral pieces rather than chains of generalized eigenvectors. ## Functional Calculus and Spectral Projections Once $T$ is represented as multiplication by $z$, every bounded Borel function of $T$ is obtained by multiplying by $f(z)$. The remaining issue is to translate familiar operations on functions into operator statements that can be used without returning to a chosen model. [quotetheorem:8413] [citeproof:8413] These rules justify treating $T$ as a measurable coordinate function, but only within the normal-operator setting supplied by the spectral measure. The uniformly bounded convergence hypothesis is necessary. For $T=M_x$ on $L^2([0,1],\mathcal L^1)$, the functions $f_n(x)=n\mathbf{1}_{(0,1/n)}(x)$ converge pointwise to $0$, but \begin{align*} \|f_n(T)1\|_{L^2}^2=\int_0^{1/n} n^2\,d\mathcal L^1=n, \end{align*} so the corresponding operators do not converge strongly to $0$. The theorem also does not assert norm convergence from pointwise convergence: even the uniformly bounded functions $\mathbf{1}_{(0,1/n)}$ converge pointwise to $0$ while the multiplication operators have norm $1$ for every $n$. The most important special case is obtained by taking indicator functions of Borel sets, because these functions isolate parts of the operator by subsets of the spectrum. [example: Atomic And Continuous Diagonal Parts] Let \begin{align*} H=\ell^2(\mathbb N)\oplus L^2([0,1],\mathcal L^1) \end{align*} and define \begin{align*} T((a_n)_{n=1}^\infty,f)=\left(\left(\frac{1}{n}a_n\right)_{n=1}^\infty,\, M_x f\right). 
\end{align*} The operator is self-adjoint because, for $a,b\in\ell^2(\mathbb N)$ and $f,g\in L^2([0,1])$, \begin{align*} (T(a,f),(b,g))_H=\sum_{n=1}^\infty \frac{1}{n}a_n\overline{b_n}+\int_0^1 xf(x)\overline{g(x)}\,d\mathcal L^1(x). \end{align*} Since each $1/n$ and each $x\in[0,1]$ is real, this equals \begin{align*} \sum_{n=1}^\infty a_n\overline{\frac{1}{n}b_n}+\int_0^1 f(x)\overline{xg(x)}\,d\mathcal L^1(x)=((a,f),T(b,g))_H. \end{align*} For a Borel set $B\subset[0,1]$, define \begin{align*} E(B)((a_n)_{n=1}^\infty,f)=\left((\mathbb 1_B(1/n)a_n)_{n=1}^\infty,\,\mathbb 1_B f\right). \end{align*} This is an orthogonal projection: on the sequence part, multiplication by $\mathbb 1_B(1/n)$ squares to itself coordinate by coordinate, and on the $L^2$ part, \begin{align*} M_{\mathbb 1_B}^2=M_{\mathbb 1_B^2}=M_{\mathbb 1_B}=M_{\overline{\mathbb 1_B}}=M_{\mathbb 1_B}^*. \end{align*} It also satisfies $E(\varnothing)=0$ and $E([0,1])=I$, and countable additivity follows coordinatewise on $\ell^2(\mathbb N)$ and from countable additivity of multiplication by indicator functions on $L^2([0,1])$. The atoms are visible on the standard coordinate vectors. If $e_k\in\ell^2(\mathbb N)$ and we regard $(e_k,0)\in H$, then \begin{align*} E_{(e_k,0)}(B)=(E(B)(e_k,0),(e_k,0))_H=\mathbb 1_B(1/k). \end{align*} Thus the scalar spectral measure of $(e_k,0)$ is the point mass $\delta_{1/k}$. On the continuous summand, for $(0,f)\in H$, \begin{align*} E_{(0,f)}(B)=(E(B)(0,f),(0,f))_H=\int_0^1 \mathbb 1_B(x)|f(x)|^2\,d\mathcal L^1(x). \end{align*} Equivalently, \begin{align*} E_{(0,f)}(B)=\int_B |f(x)|^2\,d\mathcal L^1(x), \end{align*} so this part is absolutely continuous with respect to Lebesgue measure. Finally, the spectral integral recovers $T$. On the $k$th sequence coordinate, the function $z$ takes the value $1/k$, so \begin{align*} \left(\int_{[0,1]} z\,dE(z)(a,f)\right)_k=\frac{1}{k}a_k. 
\end{align*} On the $L^2$ summand, integration against the multiplication projections gives multiplication by the coordinate function: \begin{align*} \left(\int_{[0,1]} z\,dE(z)(a,f)\right)_{L^2}=M_x f. \end{align*} Therefore \begin{align*} \int_{[0,1]} z\,dE(z)(a,f)=\left(\left(\frac{1}{n}a_n\right)_{n=1}^\infty,M_x f\right)=T(a,f). \end{align*} The same projection-valued measure therefore records pure point mass at the eigenvalues $1/n$ and continuous spectral mass spread through the interval $[0,1]$. [/example] The functional calculus also detects invariant subspaces generated by spectral sets. If $B$ is Borel, then $E(B)H$ reduces $T$, and the restriction has spectrum contained in the closure of $B\cap\sigma(T)$. [remark: Spectral Projections As Infinite-Dimensional Eigenspaces] For an eigenvalue $\lambda$ of a normal operator $T$, the projection $E(\{\lambda\})$ is the orthogonal projection onto $\ker(T-\lambda I)$. For a general Borel set $B$, the range $E(B)H$ is the subspace of vectors whose spectral mass is concentrated in $B$. Thus spectral projections generalise eigenspaces from points to measurable subsets. [/remark] This point of view will be used in the next chapter to define continuous and holomorphic functional calculi more systematically. Here it already gives a precise meaning to localising an operator to part of its spectrum. ## Spectrum, Approximate Eigenvalues, and Support For multiplication by a function, invertibility is governed by whether the function is bounded away from zero on the measure-theoretic support. The final question of this chapter is how the spectrum of a normal operator is recovered from its spectral measure and why approximate eigenvectors replace missing eigenvectors. [definition: Support Of A Spectral Measure] Let $K$ be a compact metric space, let $H$ be a complex Hilbert space, and let \begin{align*} E:\mathcal B(K)\to\mathcal L(H) \end{align*} be a projection-valued measure. 
The support of $E$ is the complement of the union of all open sets $U\subset K$ such that $E(U)=0$. [/definition] The support records exactly the spectral values that the Hilbert space can detect. The obstruction to invertibility is local. Suppose first that $|z-\lambda|$ is bounded below by a positive constant on the support. Then $1/(z-\lambda)$ is a bounded function there, and its spectral integral is a bounded inverse of $\int_K z\,dE(z)-\lambda I$. If instead every neighbourhood of $\lambda$ carries a nonzero spectral projection, no bounded inverse can remove that mass. Thus the spectral question becomes whether the projection-valued measure sees arbitrarily small neighbourhoods of the proposed spectral value. [quotetheorem:8414] [citeproof:8414] This result explains why the spectrum can be large even when no eigenvectors exist: support is a measure-theoretic condition, while an eigenvalue requires positive spectral mass at a single point. The hypotheses are doing more than supplying notation. For the Jordan nilpotent $N$ on $\mathbb C^2$, the spectrum is $\{0\}$, but there is no projection-valued spectral measure whose integral of $z$ gives $N$. Every operator of the form $\int z\,dE(z)$ is normal, while $N$ is not; in particular, a measure supported at $\{0\}$ would give $\int z\,dE(z)=0$. Thus the support formula is a statement about operators that have a normal spectral representation; it is not a general recipe for assigning a support to an arbitrary operator. For normal operators, support points can still be detected by vectors whose spectral mass is concentrated in smaller and smaller neighbourhoods. This is a special feature of the spectral theorem; for general operators, residual spectrum can prevent spectral values from being approximated by eigenvectors in this way. To capture such spectral values by vectors, we need a weaker notion of eigenvalue in which the eigenvector equation is satisfied in the limit. [definition: Approximate Point Spectral Value] Let $T\in\mathcal{L}(H)$ and let $\lambda\in\mathbb C$.
The scalar $\lambda$ is an approximate eigenvalue of $T$ if there exists a sequence $(h_n)_{n=1}^\infty$ in $H$ such that $\|h_n\|_H=1$ for all $n$ and \begin{align*} \|(T-\lambda I)h_n\|_H\to 0. \end{align*} [/definition] Approximate eigenvectors are built by cutting the spectral measure closer and closer to the point. For a general operator, a spectral value can persist because the range is not dense, rather than because vectors almost satisfy an eigenvalue equation. Normal spectral measures remove that residual obstruction: nonzero spectral mass near $\lambda$ gives unit vectors concentrated near $\lambda$, and the multiplier $z-\lambda$ is small on those vectors. [quotetheorem:8415] [citeproof:8415] For self-adjoint operators this recovers the familiar picture that spectral values can be detected by vectors on which the operator almost acts as scalar multiplication. The theorem does not say that every spectral value is an eigenvalue; it says that the eigenvalue equation can be approximated on unit vectors. Normality is again doing real work, because the proof uses orthogonal spectral projections near $\lambda$. A concrete failure occurs for the unilateral shift $S$ on $\ell^2(\mathbb N)$, defined by $S e_n=e_{n+1}$. Every $\lambda$ with $|\lambda|<1$ lies in $\sigma(S)$, but it is an eigenvalue of $S^*$ rather than an approximate eigenvalue of $S$; indeed, for every $h\in\ell^2(\mathbb N)$, \begin{align*} \|(S-\lambda I)h\|_{\ell^2}\ge (1-|\lambda|)\|h\|_{\ell^2}, \end{align*} so no unit vectors can make $(S-\lambda I)h$ tend to $0$. This is residual spectrum, and it is exactly the obstruction absent in the normal theorem. For unitary operators, the same idea says that spectral values on the unit circle may be detected by wave packets localised near arcs. 
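The lower bound for the unilateral shift can be tested numerically on finitely supported sequences, which is all this sketch uses. A Python list $(h_1,\dots,h_N)$ stands for $\sum_n h_ne_n$; this is an illustrative convention in which list index $0$ corresponds to $e_1$. The same code also checks a contrasting fact: for $|\mu|<1$, the vector $x=\sum_{n\ge1}\mu^{n-1}e_n$ is an eigenvector of $S^*$, and truncating it after $N$ terms leaves a residual of size $|\mu|^N$. The helper names are hypothetical.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def s_minus_lambda(h, lam):
    """(S - lam I) h for finitely supported h = (h_1, ..., h_N), where S e_n = e_{n+1}."""
    shifted = [0j] + list(h)   # S h occupies coordinates 1..N+1
    padded = list(h) + [0j]    # lam h, padded to the same length
    return [a - lam * b for a, b in zip(shifted, padded)]

def adjoint_shift(h):
    """S^* e_1 = 0 and S^* e_{n+1} = e_n: drop the first coordinate."""
    return list(h[1:])

def truncated_eigen_residual(mu, n_terms):
    """||S^* x - mu x|| / ||x|| for x = (1, mu, mu^2, ...) cut after n_terms."""
    x = [mu ** k for k in range(n_terms)]
    r = [a - mu * b for a, b in zip(adjoint_shift(x) + [0j], x)]
    return norm(r) / norm(x)
```

Since $\|Sh\|=\|h\|$, the inequality $\|(S-\lambda I)h\|\ge(1-|\lambda|)\|h\|$ holds for every $h$. Random vectors are used here only to exercise the formula.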
[example: Bilateral Shift Has No Eigenvectors But Full Approximate Spectrum] Under the Fourier transform $\mathcal F:\ell^2(\mathbb Z)\to L^2(\mathbb T,m)$, the bilateral shift $U$ is represented by $M_z$, meaning $\mathcal F U\mathcal F^{-1}=M_z$. We first show that $U$ has no eigenvectors by checking the equivalent multiplication equation. If $M_z f=\lambda f$, then for $m$-almost every $z\in\mathbb T$, \begin{align*} zf(z)=\lambda f(z). \end{align*} Subtracting the right-hand side gives \begin{align*} (z-\lambda)f(z)=0. \end{align*} If $\lambda\notin\mathbb T$, then $z-\lambda\ne 0$ for every $z\in\mathbb T$, so $f(z)=0$ for almost every $z$. If $\lambda\in\mathbb T$, then $f(z)=0$ for almost every $z\in\mathbb T\setminus\{\lambda\}$, so $f$ is supported, up to a null set, on $\{\lambda\}$. Since $m(\{\lambda\})=0$, \begin{align*} \|f\|_{L^2(\mathbb T,m)}^2=\int_{\{\lambda\}} |f(z)|^2\,dm(z)=0. \end{align*} Thus $f=0$ in $L^2(\mathbb T,m)$, and therefore $U$ has no nonzero eigenvectors. Now fix $\lambda\in\mathbb T$. For each $n$, set \begin{align*} A_n=\{z\in\mathbb T:|z-\lambda|<1/n\}. \end{align*} This is a nonempty open arc, so $m(A_n)>0$. Define \begin{align*} f_n=\frac{\mathbb 1_{A_n}}{m(A_n)^{1/2}}. \end{align*} Then \begin{align*} \|f_n\|_{L^2(\mathbb T,m)}^2=\int_{\mathbb T}\frac{\mathbb 1_{A_n}(z)}{m(A_n)}\,dm(z)=1. \end{align*} Also, since $f_n$ is supported on $A_n$ and $|z-\lambda|<1/n$ on $A_n$, \begin{align*} \|(M_z-\lambda I)f_n\|_{L^2}^2=\int_{A_n}|z-\lambda|^2\frac{1}{m(A_n)}\,dm(z)\le \frac{1}{n^2}. \end{align*} Hence \begin{align*} \|(M_z-\lambda I)f_n\|_{L^2}\le \frac{1}{n}. \end{align*} Let $h_n=\mathcal F^{-1}f_n$. Since $\mathcal F$ is unitary, $\|h_n\|_{\ell^2}=1$, and \begin{align*} \|(U-\lambda I)h_n\|_{\ell^2}=\|(\mathcal F U\mathcal F^{-1}-\lambda I)f_n\|_{L^2}=\|(M_z-\lambda I)f_n\|_{L^2}\le \frac{1}{n}. 
\end{align*} Therefore every $\lambda\in\mathbb T$ is an approximate eigenvalue of $U$, even though no point of $\mathbb T$ is an actual eigenvalue. [/example] The chapter has now replaced diagonalisation by the statement that bounded normal operators are multiplication operators assembled from spectral measures. The remaining task is to turn that representation into a usable calculus, in which continuous, Borel, and holomorphic functions of an operator become the corresponding functions of the spectral variable. Chapter 9 develops this as the natural extension of scalar functions to operators. # 9. Functional Calculus This chapter turns the spectral theorem into a calculus: instead of applying only polynomials to a normal operator, we apply functions defined on its spectrum. The guiding question is how much of a scalar function $f$ can be transferred to an operator $T$ while preserving algebraic relations, norm estimates, positivity, and spectral information. The continuous calculus gives a norm-controlled answer for continuous functions, and the Borel calculus adds characteristic functions, hence spectral projections. ## Continuous Functions of One Bounded Normal Operator The polynomial and rational calculi from Chapter 2 let us form $p(T)$ and suitable $r(T)$, and the spectral theorem for bounded normal operators from Chapter 8 suggests that a normal operator should behave like multiplication by the coordinate function on its spectrum. The problem is to pass from polynomials to all continuous functions without losing uniqueness or control of the operator norm. [definition: Continuous Functional Calculus] Let $H$ be a complex Hilbert space and let $T \in \mathcal{L}(H)$ be bounded and normal.
A continuous functional calculus for $T$ is a unital $*$-homomorphism \begin{align*} \Phi_T : C(\sigma(T)) \to \mathcal{L}(H) \end{align*} such that $\Phi_T(\operatorname{id}_{\sigma(T)}) = T$, where $\operatorname{id}_{\sigma(T)}(z)=z$. For $f \in C(\sigma(T))$, write $f(T) := \Phi_T(f)$. [/definition] This definition asks for more than a notation: it packages the algebraic rules that functions of $T$ must satisfy. Here $C(\sigma(T))$ is the algebra of continuous complex-valued functions on the compact spectrum. A unital $*$-homomorphism is a linear map that preserves products, sends the constant function $1$ to $I$, and intertwines the involutions. The adjoint rule means that complex conjugation of scalar functions corresponds to taking Hilbert-space adjoints, and the unital condition keeps constant functions tied to scalar multiples of $I$. At this stage we use the continuous part of the normal-operator spectral theorem as a calculus principle rather than introducing the full projection-valued-measure machinery. For each bounded normal $T$, there is a unique such map $f\mapsto f(T)$, and it extends the polynomial calculus. Each hypothesis is doing work. Boundedness makes $\sigma(T)$ compact, so $C(\sigma(T))$ has the supremum norm and contains the constant functions; normality is what makes the polynomial expressions in $T$ and $T^*$ behave like commutative functions of $z$ and $\bar z$. For a non-normal operator, the estimate $\|p(T)\|=\sup_{\lambda \in \sigma(T)}|p(\lambda)|$ can fail badly. A nilpotent Jordan block $N$ has spectrum $\{0\}$ but is not the zero operator, so for $p(z)=z$ the right-hand side is $0$ while $\|N\|=1$; the spectrum alone cannot control polynomial functions of $N$. Continuity is also a real limitation. This continuous calculus does not produce $\mathbf{1}_E(T)$ for a sharp spectral set $E$, because $\mathbf{1}_E$ lies in $C(\sigma(T))$ only when $E$ is both open and closed in $\sigma(T)$.
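A small numerical sketch makes the failure concrete. It assumes $2\times2$ matrices only and computes the operator norm as the square root of the largest eigenvalue of $A^*A$, using the closed form for a Hermitian $2\times2$ matrix. It compares $\|p(A)\|$ with $\sup_{\lambda\in\sigma(A)}|p(\lambda)|$ for two matrices: the Jordan block $N$, and a hypothetical normal diagonal matrix $D$ with spectrum $\{i,-1/2\}$. For $D$ the two numbers agree; for $N$ they already differ when $p(z)=z$. The function names are illustrative.

```python
import math

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def adjoint2(A):
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]

def op_norm2(A):
    """Operator norm of a 2x2 complex matrix: sqrt of the top eigenvalue of A^*A."""
    M = matmul2(adjoint2(A), A)  # Hermitian, so its eigenvalues have a closed form
    a, d, b = M[0][0].real, M[1][1].real, M[0][1]
    top = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return math.sqrt(max(top, 0.0))

def poly_of_matrix(coeffs, A):
    """p(A) for p(z) = sum_k coeffs[k] z^k, evaluated by Horner's rule."""
    result = [[0j, 0j], [0j, 0j]]
    for c in reversed(coeffs):
        result = matmul2(result, A)
        result[0][0] += c
        result[1][1] += c
    return result

def sup_on_spectrum(coeffs, spectrum):
    """max |p(lambda)| over a finite spectrum."""
    return max(abs(sum(c * lam ** k for k, c in enumerate(coeffs))) for lam in spectrum)
```

For instance, with $q(z)=2-z+3z^2$ one gets $q(N)=2I-N$, whose norm is about $2.56$, while $|q|$ on $\sigma(N)=\{0\}$ is only $2$.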
[example: Square Root of a Positive Operator] Let $A \in \mathcal{L}(H)$ be bounded, self-adjoint, and positive, so $\sigma(A)\subset [0,\infty)$. On $\sigma(A)$ define $f(t)=\sqrt{t}$ and let $\operatorname{id}_{\sigma(A)}(t)=t$. Since $f(t)^2=t$ for every $t\in \sigma(A)$, we have $f\cdot f=\operatorname{id}_{\sigma(A)}$ as functions on $\sigma(A)$. The continuous functional calculus defines $A^{1/2}:=f(A)$. By multiplicativity and the defining property of the continuous functional calculus, \begin{align*} A^{1/2}A^{1/2}=f(A)f(A)=(f\cdot f)(A)=\operatorname{id}_{\sigma(A)}(A)=A. \end{align*} Also $f$ is real-valued and $f(t)\ge 0$ for every $t\in\sigma(A)$, so the positivity criterion in the functional calculus gives $A^{1/2}=f(A)\ge 0$. Thus the continuous functional calculus produces a positive operator whose square is $A$, without choosing eigenvectors or assuming that $A$ is compact. [/example] The square-root construction illustrates the main shift in viewpoint: formulas can be made directly on the spectrum. The same method applies to functions that encode evolution, especially exponential functions on the [spectrum of a self-adjoint operator](/theorems/552). [example: Unitary Exponentials of Bounded Self-Adjoint Operators] Let $A \in \mathcal{L}(H)$ be bounded and self-adjoint, so $\sigma(A)\subset \mathbb R$. Fix $t\in \mathbb R$ and define $f_t(s)=e^{its}$ for $s\in\sigma(A)$. The continuous functional calculus defines $e^{itA}:=f_t(A)$. For each $s\in\sigma(A)$, \begin{align*} \overline{f_t}(s)f_t(s)=\overline{e^{its}}e^{its}=e^{-its}e^{its}=e^0=1. \end{align*} Hence $\overline{f_t}\,f_t=1$ as functions on $\sigma(A)$. Using the adjoint rule and multiplicativity in the continuous functional calculus, \begin{align*} (e^{itA})^*e^{itA}=f_t(A)^*f_t(A)=\overline{f_t}(A)f_t(A)=(\overline{f_t}\,f_t)(A)=1(A)=I. 
\end{align*} Similarly, since $f_t(s)\overline{f_t}(s)=e^{its}e^{-its}=1$ for every $s\in\sigma(A)$, \begin{align*} e^{itA}(e^{itA})^*=f_t(A)\overline{f_t}(A)=(f_t\overline{f_t})(A)=1(A)=I. \end{align*} Thus $e^{itA}$ is unitary. If $r,t\in\mathbb R$, then for every $s\in\sigma(A)$, \begin{align*} f_{t+r}(s)=e^{i(t+r)s}=e^{its+irs}=e^{its}e^{irs}=f_t(s)f_r(s). \end{align*} Therefore $f_{t+r}=f_tf_r$ on $\sigma(A)$, and multiplicativity gives \begin{align*} e^{i(t+r)A}=f_{t+r}(A)=(f_tf_r)(A)=f_t(A)f_r(A)=e^{itA}e^{irA}. \end{align*} So the scalar exponential law on the spectrum becomes a unitary one-parameter group of operators. [/example] These examples depend only on continuous functions. To isolate spectral pieces, however, we need functions such as $\mathbf{1}_E$, and characteristic functions are usually discontinuous. This leads from the continuous calculus to a measure-theoretic enlargement. ## Borel Functions and Spectral Projections The next problem is to recover projections onto the part of $H$ where the spectrum lies in a Borel set $E \subset \sigma(T)$. Continuous functions cannot usually detect a sharp set boundary, so the calculus must be enlarged from $C(\sigma(T))$ to bounded Borel functions. [definition: Bounded Borel Functional Calculus] Let $H$ be a complex Hilbert space and let $T \in \mathcal{L}(H)$ be bounded and normal. A bounded Borel functional calculus for $T$ is a unital $*$-homomorphism \begin{align*} \Psi_T : B_b(\sigma(T)) \to \mathcal{L}(H) \end{align*} from bounded Borel functions on $\sigma(T)$ into bounded operators on $H$ extending the continuous functional calculus. For $f \in B_b(\sigma(T))$, write $f(T):=\Psi_T(f)$. [/definition] This enlargement keeps the algebraic rules of the continuous calculus but adds discontinuous functions that can detect exact spectral regions. To state the resulting decomposition theorem, we need a name for the operator obtained by applying the Borel calculus to a characteristic function. 
[definition: Spectral Projection] Let $H$ be a complex Hilbert space, let $T \in \mathcal{L}(H)$ be bounded and normal, and let $E \subset \sigma(T)$ be a Borel set. The spectral projection of $T$ associated to $E$ is the operator \begin{align*} P_T(E) : H \to H, \qquad P_T(E) := \mathbf{1}_E(T). \end{align*} [/definition] Since $\mathbf{1}_E^2=\mathbf{1}_E$ and $\overline{\mathbf{1}_E}=\mathbf{1}_E$, the calculus gives $P_T(E)^2=P_T(E)$ and $P_T(E)^*=P_T(E)$. These algebraic identities explain why characteristic functions should produce projections, but they do not by themselves justify countable decompositions of the spectrum. What is needed is the full Borel functional calculus: the map $\Psi_T : B_b(\sigma(T)) \to \mathcal{L}(H)$, $f \mapsto f(T)$, of the definition above, together with the bounded monotone convergence property. This property states that if $0\le f_1\le f_2\le\cdots$ are Borel functions on $\sigma(T)$ converging pointwise to a bounded function $f$, then $f_n(T)h\to f(T)h$ for every $h\in H$. [remark: Borel Functional Calculus] For a bounded normal operator $T$ on a complex Hilbert space $H$, the Borel functional calculus extends continuous functional calculus from $C(\sigma(T))$ to bounded Borel functions $B_b(\sigma(T))$. Applying it to characteristic functions gives spectral projections $P_T(E)=\mathbf{1}_E(T)$, and the monotone convergence property ensures the countable additivity needed for a projection-valued measure. [/remark] The Borel functional calculus theorem is needed because spectral decompositions are built from sharp cuts of the spectrum, and sharp cuts are encoded by bounded Borel functions rather than continuous functions. The monotone convergence condition is not decorative: it is the countable-additivity mechanism that turns $E \mapsto P_T(E)$ into a genuine projection-valued measure instead of only a finitely additive algebraic assignment.
Normality remains essential because the projections must commute coherently and come from a spectral measure; for general non-normal operators there is no comparable projection-valued decomposition determined just by the spectrum. The theorem is also limited in scope: it handles bounded Borel functions on the compact spectrum of a bounded operator, not arbitrary unbounded Borel functions without domain questions. The Borel calculus is often summarized by the spectral integral notation \begin{align*} T = \int_{\sigma(T)} \lambda \, dP_T(\lambda), \end{align*} where $P_T$ is the projection-valued measure $E\mapsto P_T(E)$ and the integral is understood through the Borel functional calculus. This notation is useful because it makes $f(T)$ look like the scalar operation $\lambda \mapsto f(\lambda)$ integrated against the operator-valued spectral measure, and it prepares the later extension to unbounded self-adjoint operators where the domain of an unbounded $f(T)$ becomes part of the construction. [example: Indicator Functions as Spectral Projections] Let $A \in \mathcal{L}(H)$ be bounded and self-adjoint, and fix $a \in \mathbb R$. Put $E=(-\infty,a]\cap\sigma(A)$. The Borel functional calculus defines \begin{align*} P_A(E)=\mathbf{1}_E(A). \end{align*} For each $\lambda\in\sigma(A)$, the value $\mathbf{1}_E(\lambda)$ is either $0$ or $1$, so $\mathbf{1}_E(\lambda)^2=\mathbf{1}_E(\lambda)$ and $\overline{\mathbf{1}_E(\lambda)}=\mathbf{1}_E(\lambda)$. Hence $\mathbf{1}_E^2=\mathbf{1}_E$ and $\overline{\mathbf{1}_E}=\mathbf{1}_E$ on $\sigma(A)$. By multiplicativity and the adjoint rule in the Borel calculus, \begin{align*} P_A(E)^2=\mathbf{1}_E(A)\mathbf{1}_E(A)=(\mathbf{1}_E^2)(A)=\mathbf{1}_E(A)=P_A(E). \end{align*} Also, \begin{align*} P_A(E)^*=\mathbf{1}_E(A)^*=\overline{\mathbf{1}_E}(A)=\mathbf{1}_E(A)=P_A(E). \end{align*} Thus $P_A(E)$ is an orthogonal projection. Now let $E$ and $F$ be disjoint Borel subsets of $\sigma(A)$.
For every $\lambda\in\sigma(A)$, disjointness gives \begin{align*} \mathbf{1}_E(\lambda)\mathbf{1}_F(\lambda)=\mathbf{1}_{E\cap F}(\lambda)=0. \end{align*} Therefore $\mathbf{1}_E\mathbf{1}_F=0$ as a Borel function on $\sigma(A)$, and multiplicativity gives \begin{align*} P_A(E)P_A(F)=\mathbf{1}_E(A)\mathbf{1}_F(A)=(\mathbf{1}_E\mathbf{1}_F)(A)=0(A)=0. \end{align*} If $x=P_A(E)u$ and $y=P_A(F)v$, then \begin{align*} (x,y)_H=(P_A(E)u,P_A(F)v)_H=(u,P_A(E)P_A(F)v)_H=(u,0)_H=0. \end{align*} So disjoint spectral sets give orthogonal spectral subspaces, and, for $E=(-\infty,a]\cap\sigma(A)$, $P_A(E)$ is the projection onto the spectral part of $H$ corresponding to spectral values of $A$ at most $a$. [/example] This example explains why the Borel calculus is indispensable in spectral theory: it turns subsets of the spectrum into Hilbert-space decompositions. Continuous functions smooth over spectral boundaries, while Borel indicators cut the space according to them.

## Spectral Mapping and Positivity

Once functions of operators are available, the main structural questions are whether spectra transform as expected and whether order properties of scalar functions transfer to operators. These are the tests that show the functional calculus behaves like substitution into a diagonal matrix. [quotetheorem:8416] [citeproof:8416] For continuous functions, invertibility of $f-\lambda$ in the algebra $C(\sigma(T))$ is exactly the statement that $f-\lambda$ does not vanish on the compact set $\sigma(T)$. The continuity assumption is what makes this reasoning legitimate inside $C(\sigma(T))$; without it, nonvanishing need not give a continuous inverse in the same algebra, and discontinuities must be handled by the Borel calculus instead. The normality assumption is also part of the mechanism, because it identifies the operator algebra generated by $T$ with functions on $\sigma(T)$.
Outside this setting the mechanism breaks down: a non-normal operator has in general no continuous functional calculus, and its spectrum does not record nilpotent directions (the nilpotent Jordan block on $\mathbb C^2$ has spectrum $\{0\}$ but is not the zero operator). Spectral mapping statements for such operators, for instance for holomorphic functions, must therefore come from the holomorphic calculus rather than from a norm-preserving calculus on the spectrum alone. [example: Spectrum of a Square Root] Let $A \in \mathcal{L}(H)$ be positive and bounded. Since $A$ is self-adjoint and positive, $\sigma(A)\subset [0,\infty)$, so the function $f:\sigma(A)\to \mathbb R$ given by $f(t)=\sqrt t$ is continuous. The continuous functional calculus defines $A^{1/2}:=f(A)$. Applying the *Spectral Mapping Theorem for Continuous Functions* to this $f$ gives \begin{align*} \sigma(A^{1/2})=\sigma(f(A))=f(\sigma(A)). \end{align*} It remains only to spell out the set $f(\sigma(A))$. By definition of the image of a set under a function, \begin{align*} f(\sigma(A))=\{f(\lambda):\lambda\in\sigma(A)\}. \end{align*} Since $f(\lambda)=\sqrt{\lambda}$ for each $\lambda\in\sigma(A)$, this becomes \begin{align*} f(\sigma(A))=\{\sqrt{\lambda}:\lambda\in\sigma(A)\}. \end{align*} Therefore \begin{align*} \sigma(A^{1/2})=\{\sqrt{\lambda}:\lambda\in\sigma(A)\}=:\sqrt{\sigma(A)}. \end{align*} Thus the spectrum of the square-root operator consists exactly of the scalar square roots of the spectral values of $A$, so the functional-calculus construction preserves the expected spectral information. [/example] Spectral mapping describes what happens to spectra after applying functions. The companion order question asks when a scalar inequality $f \ge 0$ on the spectrum becomes the Hilbert-space inequality $f(T)\ge 0$, and this is what makes the calculus compatible with positive operators. For a bounded normal operator $T$ and a real-valued continuous function $f\in C(\sigma(T))$, the continuous functional calculus gives a self-adjoint operator $f(T)$. If \begin{align*} f(\lambda)\ge 0\qquad\text{for every }\lambda\in\sigma(T), \end{align*} then \begin{align*} f(T)\ge 0.
\end{align*} Indeed, $g=\sqrt f$ is a real-valued continuous function with $f=g\overline g$ on $\sigma(T)$, so the $*$-homomorphism property yields \begin{align*} f(T)=g(T)g(T)^*=g(T)^*g(T), \end{align*} and therefore $(f(T)x,x)_H=(g(T)x,g(T)x)_H=\|g(T)x\|_H^2\ge 0$ for every $x\in H$. Conversely, when $T$ is self-adjoint, positivity of $f(T)$ forces $f$ to be non-negative on $\sigma(T)$, because the spectral mapping theorem gives $f(\sigma(T))=\sigma(f(T))\subset[0,\infty)$. This criterion is the bridge between scalar inequalities and operator inequalities, but its hypotheses are precise. Real-valuedness is needed even to make $f(T)$ self-adjoint, since a genuinely complex-valued function produces a normal operator rather than an ordered self-adjoint one. Nonnegativity on the spectrum is exactly what permits the factorisation $f=g\bar g$ with $g=\sqrt f$, which turns the scalar inequality into $f(T)=g(T)^*g(T)$. The converse is stated for self-adjoint $T$ because then $\sigma(T) \subset \mathbb R$ and positivity has an order-theoretic meaning on real spectral values. The spectral-mapping argument itself uses only normality, so for a normal operator with non-real spectrum $f(T)\ge 0$ still forces $f\ge 0$ on $\sigma(T)$; what is lost is the interpretation as an order statement about the complex spectral variable itself. This also explains why square roots of positive operators are positive and why spectral projections are positive operators. [remark: Order Is Spectral for Self-Adjoint Operators] For bounded self-adjoint $A$, the statement $A \ge 0$ is equivalent to $\sigma(A) \subset [0,\infty)$. One direction follows from positivity and the resolvent estimate on $(-\infty,0)$; the other follows by applying the positivity criterion to the identity function on $\sigma(A)$. Thus the order structure on bounded self-adjoint operators is encoded by their spectra. [/remark] The chapter closes the circle started by the spectral theorem: a bounded normal operator is controlled by functions on its spectrum.
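For a positive $2\times2$ matrix both the square-root spectral mapping and the positivity identity can be checked by hand. The Cayley-Hamilton identity $A^2=(\operatorname{tr}A)A-(\det A)I$ shows that, with $s=\sqrt{\det A}$ and $t=\sqrt{\operatorname{tr}A+2s}$, the matrix $(A+sI)/t$ squares to $A$ and has eigenvalues $\sqrt{\lambda_1},\sqrt{\lambda_2}$, so it is $A^{1/2}$. The Python sketch below uses an illustrative matrix and hypothetical helper names; it checks $\sigma(A^{1/2})=\sqrt{\sigma(A)}$ through trace and determinant, and $(Ax,x)=\|A^{1/2}x\|^2\ge 0$.

```python
import math

def sqrt_psd_2x2(a, b, c):
    """Positive square root of the positive semidefinite matrix [[a, b], [b, c]].

    By Cayley-Hamilton, (A + s I)^2 = (tr A + 2 s) A when s = sqrt(det A),
    so sqrt(A) = (A + s I) / t with t = sqrt(tr A + 2 s).
    """
    s = math.sqrt(max(a * c - b * b, 0.0))  # clamp tiny negative rounding errors
    t = math.sqrt(a + c + 2 * s)
    if t == 0.0:  # only the zero matrix gives t = 0
        return [[0.0, 0.0], [0.0, 0.0]]
    return [[(a + s) / t, b / t], [b / t, (c + s) / t]]

def apply(m, x):
    return [m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1]]

def dot(x, y):
    return x[0] * y[0] + x[1] * y[1]

A = [[2.0, 1.0], [1.0, 2.0]]   # illustrative positive matrix with spectrum {1, 3}
R = sqrt_psd_2x2(2.0, 1.0, 2.0)

trace_R = R[0][0] + R[1][1]                     # should be sqrt(1) + sqrt(3)
det_R = R[0][0] * R[1][1] - R[0][1] * R[1][0]   # should be sqrt(1) * sqrt(3)
```

Since a $2\times2$ matrix is determined spectrally by its trace and determinant, matching them with $1+\sqrt3$ and $\sqrt3$ confirms that the spectrum of `R` is $\{1,\sqrt3\}$.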
Continuous functions give norm-accurate operator functions, Borel functions give spectral projections, spectral mapping identifies the resulting spectra, and positivity is checked pointwise on the spectrum. These tools will be the language for later extensions to unbounded self-adjoint operators and to operator algebras. By the end of Chapter 9, the spectral theorem has become a flexible calculus: spectra can be mapped, positivity can be tested pointwise, and operator functions can be built directly from scalar ones. Chapter 10 uses that machinery on concrete examples and classification problems, showing how the abstract theory organizes and explains the main operator-theoretic phenomena of the course.

# 10. Applications and Classification Examples

The final chapter turns the general spectral theorem into a working set of diagnostic tools. Chapters 1 and 2 built the resolvent, Chapters 4 through 6 developed compact spectral theory, and Chapters 8 and 9 built the Borel functional calculus for bounded normal operators; here we use those tools to locate spectra, separate spectral types, and compare model examples. The guiding theme is that normality converts algebraic information about an operator into geometric information about its spectral measure, while compact perturbations show which spectral features are fragile and which persist.

## Resolvent Estimates for Special Classes of Operators

Given a bounded operator $T \in \mathcal{L}(H)$ on a complex Hilbert space $H$, the first practical question is: where can the spectrum lie, and how large can the resolvent become away from it? For arbitrary operators the norm of $(T - \lambda I)^{-1}$ can be much larger than the reciprocal distance from $\lambda$ to $\sigma(T)$. For self-adjoint, unitary, and normal operators, the spectral theorem removes this instability. The basic location result for self-adjoint operators is the operator-theoretic analogue of the fact that a real-valued function has real range.
We need this estimate first because it turns self-adjointness into a concrete exclusion principle for all non-real spectral parameters. [quotetheorem:8417] [citeproof:8417] The estimate says more than real spectral location: it controls the resolvent quantitatively near the real line. Self-adjointness is essential here: the unilateral shift is not self-adjoint and its spectrum is the closed unit disc, while a non-normal Jordan block can have resolvent behaviour not governed by distance to a real set. The theorem does not classify real spectral points as eigenvalues or continuous spectrum; it only excludes non-real parameters and bounds the corresponding inverse. The next natural class is unitary operators, where the inverse is already built into the operator ($U^{-1}=U^*$) and the spectrum is confined to the circle $|\lambda|=1$ rather than to the real axis. This motivates a location theorem for the unit circle. [quotetheorem:8395] [citeproof:8395] The unitary hypothesis is doing real work: the unilateral shift is an isometry that is not onto, and its spectrum is the whole closed unit disc rather than a subset of the unit circle. The theorem also does not say which points of the unit circle actually occur; it gives an inclusion, not an equality. Self-adjoint and unitary operators are special cases of normal operators, so the common mechanism should be a unified principle measuring the size of an operator directly from its spectral set. The right question is whether the spectral radius already gives the operator norm. For normal operators the answer is yes, and this is the main reason their spectral pictures are reliable. [quotetheorem:8418] [citeproof:8418] Norm equality turns the spectrum into an exact norm gauge. Normality is essential: the nilpotent Jordan block on $\mathbb{C}^2$ defined by $Te_1=0$ and $Te_2=e_1$ has spectrum $\{0\}$ but norm $1$, so the spectral radius need not control the norm for general operators.
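The Jordan-block warning can be checked numerically. The Python sketch below relies on closed-form $2\times2$ formulas (an illustrative assumption, not a general routine): it computes the operator norm as the largest singular value, the spectral radius from the characteristic polynomial, and compares resolvent norms with $1/\operatorname{dist}(\lambda,\sigma(T))$ for the nilpotent block $Te_1=0$, $Te_2=e_1$ and for the normal matrix $N=\begin{pmatrix}1&-2\\2&1\end{pmatrix}$ with spectrum $\{1\pm 2i\}$.

```python
import cmath
import math

def op_norm(m):
    """Operator norm of a 2x2 complex matrix: its largest singular value."""
    (a, b), (c, d) = m
    p = abs(a) ** 2 + abs(c) ** 2                  # (M^* M)_{11}
    r = abs(b) ** 2 + abs(d) ** 2                  # (M^* M)_{22}
    q = a.conjugate() * b + c.conjugate() * d      # (M^* M)_{12}
    return math.sqrt((p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + abs(q) ** 2))

def spectral_radius(m):
    """Largest modulus of a root of the characteristic polynomial."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr / 4 - det)
    return max(abs(tr / 2 + disc), abs(tr / 2 - disc))

def resolvent(m, lam):
    """(M - lam I)^{-1} by the 2x2 adjugate formula; assumes lam is not an eigenvalue."""
    (a, b), (c, d) = m
    a, d = a - lam, d - lam
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

J = [[0, 1], [0, 0]]    # nilpotent Jordan block: J e_1 = 0, J e_2 = e_1
N = [[1, -2], [2, 1]]   # normal, spectrum {1 + 2i, 1 - 2i}
```

For $\lambda=0.01$ the resolvent of the Jordan block has norm roughly $10^4$, about a hundred times the value $1/\operatorname{dist}(\lambda,\sigma(J))=100$ that the normal resolvent formula would give; for $N$ the norm equals the spectral radius $\sqrt5$, and $\|N^{-1}\|=1/\sqrt5=1/\operatorname{dist}(0,\sigma(N))$.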
The theorem also does not describe eigenvectors or spectral multiplicity; two normal operators can have the same spectral set but different spectral measures. To use spectra for perturbation and approximation arguments, we also need the exact size of the inverse away from the spectrum. Applying the same functional calculus to the reciprocal function gives the normal resolvent formula. [quotetheorem:8419] [citeproof:8419] The formula becomes most transparent in multiplication form. Normality is again indispensable: pseudospectral examples and Jordan blocks can have resolvent norms much larger than $1/\operatorname{dist}(\lambda,\sigma(T))$. The theorem also does not locate the spectrum by itself; it assumes $\lambda\in\rho(N)$ and then measures the inverse exactly. The following example turns the abstract distance from $\lambda$ to the spectrum into an elementary essential supremum computation, which is the prototype for the rest of the chapter. [example: Resolvent of a Self-Adjoint Multiplication Operator] Let $H=L^2([0,1])$ and define $(M_x f)(t)=t f(t)$. Fix $\lambda \notin [0,1]$ and put \begin{align*} d=\operatorname{dist}(\lambda,[0,1])=\min_{0\le t\le 1}|t-\lambda|. \end{align*} Since $[0,1]$ is compact and $\lambda\notin[0,1]$, we have $d>0$. Hence the function $g(t)=(t-\lambda)^{-1}$ is bounded on $[0,1]$, with $|g(t)|\le 1/d$, so multiplication by $g$ defines a bounded operator on $L^2([0,1])$. For every $f\in L^2([0,1])$, \begin{align*} ((M_x-\lambda I)(gf))(t)=(t-\lambda)\frac{1}{t-\lambda}f(t)=f(t). \end{align*} Similarly, \begin{align*} (g(M_x-\lambda I)f)(t)=\frac{1}{t-\lambda}(t-\lambda)f(t)=f(t). \end{align*} Thus $(M_x-\lambda I)^{-1}$ is multiplication by $(t-\lambda)^{-1}$. The norm of this multiplier is its essential supremum. The function $t\mapsto 1/|t-\lambda|$ is continuous on $[0,1]$ and attains its maximum $1/d$ at some $t_0\in[0,1]$. For every $\eta>0$ it exceeds $1/d-\eta$ on a relatively open neighbourhood of $t_0$ in $[0,1]$, and such a neighbourhood has positive Lebesgue measure, so \begin{align*} \operatorname*{ess\,sup}_{t\in[0,1]}\frac{1}{|t-\lambda|}=\frac{1}{d}.
\end{align*} Therefore \begin{align*} \|(M_x-\lambda I)^{-1}\|=\frac{1}{\operatorname{dist}(\lambda,[0,1])}. \end{align*} The essential range of $t\mapsto t$ on $[0,1]$ is $[0,1]$, so this is exactly the normal resolvent formula in the multiplication model: the resolvent grows precisely as $\lambda$ approaches the spectral interval. [/example] The example also previews the central models of the chapter: multiplication operators. Before returning to them, we examine compact perturbations, where explicit diagonal examples show which spectral features can move and which ones remain as limiting structure.

## Compact Perturbations and Stable Essential Behaviour

Which parts of the spectrum survive under a compact perturbation? Finite-dimensional intuition suggests that small-rank changes can move eigenvalues, create isolated spectral points, or remove accidental kernel dimensions. Infinite-dimensional spectral theory adds another layer: accumulation and continuous spectral behaviour often persist even when individual eigenvalues do not. We use a simple diagonal model to separate the movable discrete spectrum from the stable limiting part. The point is not the full Fredholm theory of the essential spectrum, but the concrete phenomenon that compact operators can alter isolated eigenvalues while leaving the main accumulation set visible. [example: Compact Perturbation of a Diagonal Operator] Let $H=\ell^2$ with standard orthonormal basis $(e_n)_{n\ge 1}$, and define $D e_n=\frac{1}{n}e_n$. For $x=\sum_{n\ge 1}x_n e_n$, we have \begin{align*} Dx=\sum_{n\ge 1}\frac{x_n}{n}e_n. \end{align*} Thus $D$ is bounded, with $\|D\|=1$, and self-adjoint because its diagonal entries are real. If $D_N e_n=\frac{1}{n}e_n$ for $1\le n\le N$ and $D_N e_n=0$ for $n>N$, then $D_N$ has finite rank and \begin{align*} \|(D-D_N)x\|_{\ell^2}^2=\sum_{n>N}\frac{|x_n|^2}{n^2}\le \frac{1}{(N+1)^2}\sum_{n>N}|x_n|^2\le \frac{1}{(N+1)^2}\|x\|_{\ell^2}^2.
\end{align*} Hence $\|D-D_N\|\le \frac{1}{N+1}$, so $D$ is compact as a norm limit of finite-rank operators. For each $n\ge 1$, \begin{align*} (D-\tfrac{1}{n}I)e_n=0, \end{align*} so every $\frac{1}{n}$ is an eigenvalue. Also $0\in\sigma(D)$, because if $D^{-1}$ were bounded then $D^{-1}e_n=n e_n$, which would give \begin{align*} \|D^{-1}e_n\|_{\ell^2}=n \end{align*} for unit vectors $e_n$, impossible for a bounded operator. Conversely, if $\lambda\notin \{0\}\cup\{\frac{1}{n}:n\ge 1\}$, then the numbers $\frac{1}{n}-\lambda$ are bounded away from $0$, since the only accumulation point of $\{\frac{1}{n}:n\ge 1\}$ is $0$. The diagonal operator \begin{align*} R_\lambda e_n=\frac{1}{\frac{1}{n}-\lambda}e_n \end{align*} is therefore bounded, and \begin{align*} (D-\lambda I)R_\lambda e_n=(\tfrac{1}{n}-\lambda)\frac{1}{\tfrac{1}{n}-\lambda}e_n=e_n. \end{align*} The same calculation gives $R_\lambda(D-\lambda I)e_n=e_n$, so $R_\lambda=(D-\lambda I)^{-1}$. Hence \begin{align*} \sigma(D)=\{0\}\cup\left\{\frac{1}{n}:n\ge 1\right\}. \end{align*} Now define $K e_1=2e_1$ and $K e_n=0$ for $n\ge 2$. This operator has range $\operatorname{span}\{e_1\}$, so it is rank one and compact. The perturbed operator is still diagonal: \begin{align*} (D+K)e_1=De_1+Ke_1=e_1+2e_1=3e_1. \end{align*} For $n\ge 2$, \begin{align*} (D+K)e_n=De_n+Ke_n=\frac{1}{n}e_n+0=\frac{1}{n}e_n. \end{align*} Therefore \begin{align*} \sigma(D+K)=\{0,3\}\cup\left\{\frac{1}{n}:n\ge 2\right\}. \end{align*} The compact perturbation replaces the isolated eigenvalue $1$ of $D$ by the isolated eigenvalue $3$, while the remaining eigenvalues $\frac{1}{2},\frac{1}{3},\dots$ still accumulate at $0$; the limiting spectral behaviour at $0$ is unchanged. [/example] This model is the compact self-adjoint spectral theorem in miniature: the nonzero spectral values are not mysterious continuous spectral points, but genuine eigenvalues. 
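Finite truncations make the two mechanisms of this example visible. The sketch below keeps only the first `SIZE` coordinates (a hypothetical cutoff, not part of the example) and uses exact fractions. It checks that for a diagonal operator $\|D-D_N\|$ is the largest discarded entry, here $\frac{1}{N+1}$, and that the rank-one $K$ changes only the eigenvalue $1$, which becomes $3$.

```python
from fractions import Fraction

SIZE = 50  # hypothetical truncation: only coordinates 1..SIZE are kept

d = [Fraction(1, n) for n in range(1, SIZE + 1)]   # diagonal entries of D
k = [Fraction(2)] + [Fraction(0)] * (SIZE - 1)      # diagonal entries of the rank-one K
d_plus_k = [x + y for x, y in zip(d, k)]            # diagonal entries of D + K

def truncation_error(diag, n):
    """Norm of D - D_N for a diagonal operator: the largest discarded entry in modulus."""
    return max(abs(x) for x in diag[n:])
```

The smallest truncated eigenvalue is `Fraction(1, SIZE)`, so the spectral point $0$ is never an eigenvalue of a truncation; it appears only as the limit of the eigenvalues, as in the closure argument above.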
To justify why diagonal examples are representative, we need the structural theorem governing nonzero spectrum for compact self-adjoint operators. [quotetheorem:4923] [citeproof:4923] The accumulation theorem controls where the nonzero spectral values can lie, but it does not yet describe how the operator acts on the Hilbert space. For self-adjoint compact operators, different eigenspaces should be orthogonal, and the operator should be recoverable by summing its scalar action on those eigenspaces. The remaining structural question is therefore whether the compact self-adjoint operator admits a diagonal expansion analogous to a diagonal matrix with entries tending to zero. [quotetheorem:538] [citeproof:538] The theorem explains why compact self-adjoint operators resemble diagonal matrices with entries tending to zero. Compactness is essential: the identity operator has nonzero spectrum $\{1\}$ in infinite dimension, but $1$ is an eigenvalue with infinite-dimensional eigenspace and the spectral picture is not forced into a sequence tending to $0$. Self-adjointness is also essential for the orthogonality conclusion; non-normal compact operators can have more complicated root-space behaviour. The theorem does not say that $0$ is an eigenvalue, only that it is the only possible accumulation point of the nonzero eigenvalues. The next example combines a non-compact diagonal operator with a compact diagonal perturbation, so the limiting spectral background is no longer just $0$. [example: Diagonal Operator with a Compact Diagonal Perturbation] Let $D \in \mathcal{L}(\ell^2)$ be defined by $D e_n=(-1)^n e_n$. Since $D e_{2m}=e_{2m}$ and $D e_{2m-1}=-e_{2m-1}$, both $1$ and $-1$ are eigenvalues. If $\lambda\notin\{-1,1\}$, define $R_\lambda e_n=((-1)^n-\lambda)^{-1}e_n$. The two numbers $1-\lambda$ and $-1-\lambda$ are nonzero, so $R_\lambda$ is bounded, and for every basis vector, \begin{align*} (D-\lambda I)R_\lambda e_n=((-1)^n-\lambda)\frac{1}{(-1)^n-\lambda}e_n=e_n.
\end{align*} The same calculation gives $R_\lambda(D-\lambda I)e_n=e_n$, hence $D-\lambda I$ is invertible. Therefore $\sigma(D)=\{-1,1\}$. Now let $K e_n=\frac{1}{n}e_n$. For $N\ge 1$, define $K_N e_n=\frac{1}{n}e_n$ for $n\le N$ and $K_N e_n=0$ for $n>N$. Then $K_N$ has finite rank, and for $x=\sum_{n\ge 1}x_n e_n$, \begin{align*} \|(K-K_N)x\|_{\ell^2}^2=\sum_{n>N}\frac{|x_n|^2}{n^2}\le \frac{1}{(N+1)^2}\sum_{n>N}|x_n|^2\le \frac{1}{(N+1)^2}\|x\|_{\ell^2}^2. \end{align*} Thus $\|K-K_N\|\le \frac{1}{N+1}$, so $K$ is compact as a norm limit of finite-rank operators. The perturbation remains diagonal. For each $n\ge 1$, \begin{align*} (D+K)e_n=De_n+Ke_n=(-1)^n e_n+\frac{1}{n}e_n=\left((-1)^n+\frac{1}{n}\right)e_n. \end{align*} Hence every number $(-1)^n+\frac{1}{n}$ is an eigenvalue of $D+K$. Along even indices, \begin{align*} (-1)^{2m}+\frac{1}{2m}=1+\frac{1}{2m}\to 1. \end{align*} Along odd indices, \begin{align*} (-1)^{2m-1}+\frac{1}{2m-1}=-1+\frac{1}{2m-1}\to -1. \end{align*} Neither $1$ nor $-1$ is of the form $(-1)^n+\frac{1}{n}$ (the index $n=1$ gives the eigenvalue $0$), so neither is an eigenvalue of $D+K$; both still lie in $\sigma(D+K)$ because spectra are closed. Thus the compact diagonal perturbation creates the displaced eigenvalues $(-1)^n+\frac{1}{n}$, while the two limiting spectral points $1$ and $-1$ remain as the persistent background. [/example] This example shows why compact perturbation theory tracks more than individual eigenvalues. It motivates the language of essential spectral behaviour, even though the full Fredholm definition belongs to a later course. [remark: What Stability Means Here] In this chapter, stability under compact perturbation is meant in this concrete spectral sense: isolated eigenvalues may move, but accumulation patterns and non-discrete spectral behaviour are harder to remove. The full theory is expressed using Fredholm operators and the essential spectrum, developed in later courses. [/remark] The examples give a useful warning. Spectra are closed sets, so when infinitely many eigenvalues move, their limit points must be retained even if no eigenvector belongs to the limiting value.
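The eigenvalue bookkeeping for $D+K$ can be checked exactly on finitely many indices. The sketch below (the function name and the cutoff at $n=200$ are illustrative choices) lists $(-1)^n+\frac{1}{n}$, confirms that none of these values equals $\pm1$, that $n=1$ gives $0$, and that the gaps to $\pm1$ are exactly $\frac{1}{n}$, so the limit points are approached but never attained.

```python
from fractions import Fraction

def perturbed_eigenvalue(n):
    """Eigenvalue of D + K on e_n, where D e_n = (-1)^n e_n and K e_n = e_n / n."""
    return (-1) ** n + Fraction(1, n)

values = [perturbed_eigenvalue(n) for n in range(1, 201)]
even_gaps = [perturbed_eigenvalue(2 * m) - 1 for m in range(1, 101)]     # should be 1/(2m)
odd_gaps = [perturbed_eigenvalue(2 * m - 1) + 1 for m in range(1, 101)]  # should be 1/(2m-1)
```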
## Point Spectrum, Continuous Spectrum, and Spectral Measure Support

Once the location of the spectrum is known, the next question is what kind of spectral value each point represents. Is there an eigenvector? Is the range dense but not onto? Does the spectral measure see the point only through intervals around it? Multiplication operators answer these questions with the least technical overhead. To make the comparison precise, we first name the three classical spectral types for bounded operators. These definitions record whether failure of invertibility comes from failure of injectivity, failure of surjectivity with dense range, or failure of density of the range. [definition: Spectral Types] Let $T \in \mathcal{L}(X)$ be a bounded operator on a complex Banach space $X$. A point $\lambda \in \sigma(T)$ lies in the point spectrum $\sigma_p(T)$ if $T - \lambda I$ is not injective; it lies in the continuous spectrum $\sigma_c(T)$ if $T - \lambda I$ is injective, has dense range, and is not surjective; it lies in the residual spectrum $\sigma_r(T)$ if $T - \lambda I$ is injective and its range is not dense. [/definition] The definition separates all possible failures of invertibility, but normal Hilbert-space operators should have a sharper classification. For them, the adjoint relates the range of $N-\lambda I$ to its kernel, so injectivity should force density of the range. This motivates the theorem that normal operators have no residual spectrum, leaving only point and continuous spectral behaviour to classify. [quotetheorem:8420] [citeproof:8420] The theorem leaves two possibilities for normal spectral values: eigenvalue behaviour and continuous behaviour. Normality is essential, since non-normal operators can have residual spectrum; the adjoint kernel argument fails without the identity $\ker(N-\lambda I)=\ker(N^*-\bar\lambda I)$. For instance, the unilateral shift $S$ is injective but its range is orthogonal to $e_1$, so $0\in\sigma_r(S)$. The result also does not decide whether a given spectral point is point or continuous spectrum, so it is a structural exclusion rather than a classification.
Multiplication operators distinguish the remaining possibilities by asking how large the level sets of the multiplier are. This leads to the measure-theoretic replacement for ordinary range. [definition: Essential Range] Let $(\Omega,\mathcal{F},\mu)$ be a measure space and let $\varphi:\Omega\to\mathbb{C}$ be a measurable representative of an element of $L^\infty(\Omega,\mathcal{F},\mu)$. The essential range of $\varphi$ is the set of all $\lambda \in \mathbb{C}$ such that for every $\varepsilon>0$, \begin{align*} \mu\bigl(\{\omega \in \Omega : |\varphi(\omega)-\lambda|<\varepsilon\}\bigr)>0. \end{align*} [/definition] This definition filters out changes on null sets, exactly as $L^\infty$ does. The key obstruction for a multiplication operator is whether division by $\varphi-\lambda$ remains essentially bounded. If $\lambda$ is outside the essential range, the multiplier stays away from $\lambda$ except on a null set and a bounded inverse multiplier exists; if $\lambda$ is in the essential range, there are positive-measure regions where $\varphi$ is arbitrarily close to $\lambda$, so bounded below estimates fail. The spectral theorem for multiplication operators is precisely this essential-range test. [quotetheorem:8421] [citeproof:8421] The theorem reduces spectral classification to a measure calculation. The $L^\infty$ hypothesis is needed so that $M_\varphi$ is a bounded operator on $L^2$; unbounded multipliers require the theory of unbounded operators and domains. The ordinary range would give the wrong answer because $L^2$ does not see null-set changes: if $\varphi(t)=t$ on $[0,1]$ and $\psi$ agrees with $\varphi$ except that $\psi(1/2)=2$, then $M_\psi=M_\varphi$ on $L^2([0,1])$, so the spectrum is still $[0,1]$ rather than $[0,1]\cup\{2\}$. The theorem also does not say that every spectral point is an eigenvalue: that depends on whether the corresponding level set has positive measure. 
Counting measure produces point spectrum at attained values, while limit points can remain spectral without being eigenvalues. This is the discrete multiplication model. [example: Discrete Spectrum of a Diagonal Multiplication Operator] Let $H=\ell^2$ with standard orthonormal basis $(e_n)_{n\ge 1}$, and let $D e_n=a_n e_n$, where $(a_n)$ is bounded. For $x=\sum_{n\ge 1}x_n e_n$, the operator acts by \begin{align*} Dx=\sum_{n\ge 1}a_n x_n e_n. \end{align*} Thus $D$ is multiplication by the function $n\mapsto a_n$ on $(\mathbb{N},2^{\mathbb{N}},\#)$. We compute the spectrum. If $\lambda\notin \overline{\{a_n:n\ge 1\}}$, then \begin{align*} \delta=\operatorname{dist}\bigl(\lambda,\overline{\{a_n:n\ge 1\}}\bigr)>0. \end{align*} Hence $|a_n-\lambda|\ge \delta$ for every $n$, so the diagonal formula \begin{align*} R_\lambda e_n=\frac{1}{a_n-\lambda}e_n \end{align*} defines a bounded operator with $\|R_\lambda\|\le 1/\delta$. On each basis vector, \begin{align*} (D-\lambda I)R_\lambda e_n=(a_n-\lambda)\frac{1}{a_n-\lambda}e_n=e_n. \end{align*} Also, \begin{align*} R_\lambda(D-\lambda I)e_n=\frac{1}{a_n-\lambda}(a_n-\lambda)e_n=e_n. \end{align*} By linearity and continuity, $R_\lambda=(D-\lambda I)^{-1}$ on all of $\ell^2$, so $\lambda\in\rho(D)$. Conversely, if $\lambda\in \overline{\{a_n:n\ge 1\}}$, there are indices $n_k$ with $|a_{n_k}-\lambda|\to 0$. For each $k$, \begin{align*} \|(D-\lambda I)e_{n_k}\|_{\ell^2}=\|(a_{n_k}-\lambda)e_{n_k}\|_{\ell^2}=|a_{n_k}-\lambda|. \end{align*} Since $\|e_{n_k}\|_{\ell^2}=1$ and $\|(D-\lambda I)e_{n_k}\|_{\ell^2}\to 0$, the operator $D-\lambda I$ cannot be bounded below. An invertible bounded operator has bounded inverse and therefore is bounded below, so $D-\lambda I$ is not invertible. Hence \begin{align*} \sigma(D)=\overline{\{a_n:n\ge 1\}}. \end{align*} The eigenvalue criterion is pointwise. If $a_m=\lambda$ for some $m$, then \begin{align*} (D-\lambda I)e_m=(a_m-\lambda)e_m=0, \end{align*} so $\lambda$ is an eigenvalue. 
Conversely, if $(D-\lambda I)x=0$ for a nonzero vector $x=\sum_{n\ge 1}x_n e_n$, then \begin{align*} 0=(D-\lambda I)x=\sum_{n\ge 1}(a_n-\lambda)x_n e_n. \end{align*} Since the [coordinates in an orthonormal basis](/theorems/3267) are unique, $(a_n-\lambda)x_n=0$ for every $n$. Some coordinate $x_m$ is nonzero, so $a_m-\lambda=0$, and therefore $a_m=\lambda$. For the special sequence $a_n=1/n$, the spectral formula gives \begin{align*} \sigma(D)=\{0\}\cup\left\{\frac{1}{n}:n\ge 1\right\}. \end{align*} Each $\frac{1}{n}$ is an eigenvalue because $D e_n=\frac{1}{n}e_n$. The point $0$ is spectral because $\frac{1}{n}\to 0$, but it is not an eigenvalue: if $Dx=0$ and $x=\sum_{n\ge 1}x_n e_n$, then \begin{align*} 0=Dx=\sum_{n\ge 1}\frac{x_n}{n}e_n. \end{align*} Coordinate uniqueness gives $x_n/n=0$ for every $n$, hence $x_n=0$ for every $n$, so $x=0$. Thus the diagonal model has eigenvalues at the attained values $1/n$, while the limit point $0$ remains spectral without having an eigenvector. [/example] The diagonal example has atoms in the underlying measure space, so many spectral points are eigenvalues. Lebesgue measure gives the opposite behaviour for the basic coordinate multiplication operator. [example: Continuous Spectrum of Multiplication by the Coordinate] Let $H=L^2([0,1])$ with Lebesgue measure, and let $(M_x f)(t)=t f(t)$. We first identify the essential range of $t\mapsto t$. If $\lambda\in[0,1]$ and $\varepsilon>0$, then the set \begin{align*} [0,1]\cap(\lambda-\varepsilon,\lambda+\varepsilon) \end{align*} contains a non-degenerate interval, so it has positive Lebesgue measure. Thus every $\lambda\in[0,1]$ lies in the essential range. If $\lambda\notin[0,1]$, then \begin{align*} d=\operatorname{dist}(\lambda,[0,1])>0, \end{align*} and choosing $\varepsilon=d/2$ gives \begin{align*} \{t\in[0,1]: |t-\lambda|<\varepsilon\}=\varnothing. 
\end{align*} Hence the essential range is exactly $[0,1]$, and by *Spectrum of a Multiplication Operator*, \begin{align*} \sigma(M_x)=[0,1]. \end{align*} Now fix $\lambda\in[0,1]$. If $(M_x-\lambda I)f=0$, then \begin{align*} (t-\lambda)f(t)=0 \end{align*} for almost every $t\in[0,1]$. Hence $f(t)=0$ for almost every $t\ne\lambda$, while the single point $\{\lambda\}$ has Lebesgue measure zero. Therefore $f=0$ in $L^2([0,1])$, so $M_x-\lambda I$ is injective and $\lambda$ is not an eigenvalue. The range is dense. Indeed, if $h\in L^2([0,1])$ is orthogonal to $\operatorname{Range}(M_x-\lambda I)$, then for every $f\in L^2([0,1])$, \begin{align*} 0=((M_x-\lambda I)f,h)_{L^2}=\int_0^1 (t-\lambda)f(t)\overline{h(t)}\,dt. \end{align*} Taking $f(t)=(t-\lambda)h(t)$, which belongs to $L^2([0,1])$ because $|t-\lambda|\le 1$, gives \begin{align*} 0=\int_0^1 |t-\lambda|^2|h(t)|^2\,dt. \end{align*} Thus $(t-\lambda)h(t)=0$ almost everywhere, and the same null-set argument gives $h=0$. Hence the orthogonal complement of the range is $\{0\}$, so the range is dense. Finally, the range is not all of $L^2([0,1])$. For each $\varepsilon>0$, let \begin{align*} E_\varepsilon=[0,1]\cap(\lambda-\varepsilon,\lambda+\varepsilon), \end{align*} which has positive Lebesgue measure $m(E_\varepsilon)>0$ (here $m$ denotes Lebesgue measure), and define \begin{align*} u_\varepsilon=\frac{\mathbb{1}_{E_\varepsilon}}{\sqrt{m(E_\varepsilon)}}. \end{align*} Then $\|u_\varepsilon\|_{L^2}=1$, while \begin{align*} \|(M_x-\lambda I)u_\varepsilon\|_{L^2}^2=\frac{1}{m(E_\varepsilon)}\int_{E_\varepsilon}|t-\lambda|^2\,dt\le \varepsilon^2. \end{align*} So $M_x-\lambda I$ is not bounded below. If it were bijective, the bounded inverse theorem would make its inverse bounded, hence $M_x-\lambda I$ would be bounded below. Therefore it is not surjective. For each $\lambda\in[0,1]$, the operator $M_x-\lambda I$ is thus injective with dense, non-surjective range, so every point of $[0,1]$ lies in the continuous spectrum.
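The approximate-eigenvector estimate can also be checked numerically. For an interior $\lambda$ with $(\lambda-\varepsilon,\lambda+\varepsilon)\subset[0,1]$ the exact value is $\|(M_x-\lambda I)u_\varepsilon\|_{L^2}=\varepsilon/\sqrt3$, since $\frac{1}{2\varepsilon}\int_{\lambda-\varepsilon}^{\lambda+\varepsilon}(t-\lambda)^2\,dt=\frac{\varepsilon^2}{3}$; at the endpoint $\lambda=0$ the one-sided computation gives the same value. The Python sketch below approximates the integral by a midpoint rule, an illustrative choice; the function name and the number of cells are hypothetical.

```python
import math

def approx_eigenvector_residual(lam, eps, steps=20000):
    """||(M_x - lam I) u_eps||_{L^2} for u_eps = 1_E / sqrt(m(E)), E = [0,1] ∩ (lam-eps, lam+eps).

    The integral of (t - lam)^2 over E is approximated by the midpoint rule with `steps` cells.
    """
    left, right = max(0.0, lam - eps), min(1.0, lam + eps)
    h = (right - left) / steps
    integral = sum((left + (k + 0.5) * h - lam) ** 2 for k in range(steps)) * h
    return math.sqrt(integral / (right - left))
```

The residual shrinks linearly in $\varepsilon$ while $\|u_\varepsilon\|_{L^2}=1$, which is the quantitative form of "not bounded below".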
[/example] This example is the standard picture of continuous spectrum: approximate eigenvectors concentrate near a point, but no nonzero vector is supported at a single point. The same support-versus-null-set distinction appears in Fourier analysis, where translation-invariant operators become multiplication by functions of the frequency variable and spectral localisation is governed by measure-theoretic support. Spectral measure records this concentration using projections onto measurable subsets of the spectral line.

## Functional Calculus and Spectral Projections in Examples

The final question of the chapter is how to compute functions of a normal operator in practice. The spectral theorem says that a bounded normal operator is unitarily equivalent to multiplication by the identity function on a measure model. In computations, this means that $f(N)$ is obtained by applying $f$ to the spectral variable. The functional calculus is useful only if it controls operator norms. In the Borel functional calculus, the map $\Psi_T$ sends a bounded Borel function $f$ on $\sigma(T)$ to the operator $f(T)$, and the governing principle is that $\|f(T)\|$ equals the essential supremum of $|f|$ with respect to the associated spectral measure. For continuous functions this specializes to the familiar rule $\|f(T)\|=\sup_{\lambda\in\sigma(T)}|f(\lambda)|$ of the continuous functional calculus. This norm control is only one part of the spectral-measure story. Actual computations also require an integration theorem. Here a resolution of the identity $P$ means the normalized projection-valued measure attached to the normal operator, and $L^\infty(P)$ denotes bounded [measurable functions](/page/Measurable%20Functions) modulo equality outside $P$-null sets. Once such a $P$ is available, one must know that functions in $L^\infty(P)$ can be integrated against $P$ to produce bounded operators with the expected algebraic and norm properties.
Integration against $P$ does more than the continuous functional calculus: it must accept discontinuous functions, and functions that agree outside a $P$-null set must define the same operator. The next formal result supplies that operator-valued integral for essentially bounded measurable functions. It is the technical bridge from the abstract Borel calculus to concrete formulas such as $f(N)=\int f\,dP$, and it explains why characteristic functions of measurable sets become projections. [quotetheorem:2692] [citeproof:2692] The Borel functional calculus includes discontinuous functions, so it can apply characteristic functions of Borel sets to $N$. Normality is again the hypothesis that makes this calculus available with exact norm control; for general bounded operators there is no comparable projection-valued calculus on Borel subsets of the spectrum. The set $\sigma(N)$ alone also does not determine the spectral projections: the norm formula involves the spectral measure, and two normal operators with the same spectrum can have very different spectral measures. Characteristic functions should become projections. Since $\mathbb{1}_B^2=\mathbb{1}_B=\overline{\mathbb{1}_B}$, and the calculus is multiplicative and respects adjoints, $\mathbb{1}_B(N)$ is a self-adjoint idempotent. This motivates the projection-valued language attached to a normal operator. [definition: Spectral Projection] Let $N \in \mathcal{L}(H)$ be normal with spectral measure $E$. For a Borel set $B \subset \mathbb{C}$, the spectral projection associated to $B$ is the operator $E(B)\in\mathcal{L}(H)$ defined by \begin{align*} E(B)=\mathbb{1}_B(N). \end{align*} [/definition] A spectral projection is useful because it isolates an invariant part of the operator rather than merely selecting a subset of the spectrum. Since $E(B)$ is made from the same functional calculus as $N$, it commutes with $N$ and with $N^*$. The next theorem turns that commutation into a reducing decomposition of the Hilbert space. [quotetheorem:2695] [citeproof:2695] The reducing property explains why spectral projections are the operational version of restricting an operator to part of its spectrum.
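The commutation and projection properties can be checked in a finite-dimensional stand-in. There, a normal matrix has the form $N=U\operatorname{diag}(\lambda_1,\lambda_2)U^*$ with $U$ unitary, and the Borel calculus is $f(N)=U\operatorname{diag}(f(\lambda_1),f(\lambda_2))U^*$. The Python sketch below assumes such a diagonalisation is given. The rotation angle, the eigenvalues, the set $B=\{z:\operatorname{Re}z>0\}$, and all helper names are hypothetical choices for illustration.

```python
import math

Matrix = list[list[complex]]

def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Product of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def adjoint(A: Matrix) -> Matrix:
    """Conjugate transpose."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def close(A: Matrix, B: Matrix, tol: float = 1e-12) -> bool:
    """Entrywise comparison up to rounding error."""
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(n) for j in range(n))

def borel_of_normal(f, U: Matrix, eigenvalues: list[complex]) -> Matrix:
    """f(N) = U diag(f(lambda_i)) U^* for N = U diag(lambda_i) U^*."""
    n = len(U)
    F = [[complex(f(eigenvalues[i])) if i == j else 0j for j in range(n)] for i in range(n)]
    return matmul(matmul(U, F), adjoint(U))

theta = 0.3
U = [[complex(math.cos(theta)), complex(-math.sin(theta))],
     [complex(math.sin(theta)), complex(math.cos(theta))]]   # real rotation, hence unitary
lams = [2 + 1j, -1 + 0.5j]                                    # hypothetical eigenvalues

N = borel_of_normal(lambda z: z, U, lams)
N_star = adjoint(N)
assert close(matmul(N, N_star), matmul(N_star, N))            # N is normal

# Spectral projection E(B) = 1_B(N) for B = {z : Re z > 0}
E = borel_of_normal(lambda z: 1.0 if z.real > 0 else 0.0, U, lams)
assert close(matmul(E, E), E)                                 # idempotent
assert close(adjoint(E), E)                                   # self-adjoint
assert close(matmul(E, N), matmul(N, E))                      # commutes with N
assert close(matmul(E, N_star), matmul(N_star, E))            # and with N^*
print("E(B) =", E)
```

Here $E(B)$ is the orthogonal projection onto the eigenvector for $2+i$. Its range and the range of $I-E(B)$ are the two pieces of the reducing decomposition.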
In the reducing theorem, the projection hypothesis is essential. An arbitrary closed subspace need not be invariant under either $N$ or $N^*$, and an $N$-invariant subspace need not reduce $N$, even when $N$ is normal. For instance, the bilateral shift $U$ on $\ell^2(\mathbb Z)$, given by $Ue_n=e_{n+1}$, is unitary, and the closed span of $\{e_n:n\ge 0\}$ is invariant under $U$ but not under $U^*$, since $U^*e_0=e_{-1}$. The theorem also makes no converse claim. Spectral subspaces come from arbitrary Borel sets, not only intervals, and a reducing subspace need not be of the form $E(B)H$ at all: every closed subspace reduces the identity operator $I$, whose only spectral projections are $0$ and $I$. For self-adjoint operators, intervals are the most natural Borel sets, and the multiplication model makes the projection formula concrete. [example: Spectral Projections for an Interval] Let $A \in \mathcal{L}(H)$ be bounded and self-adjoint with spectral measure $E$, and let $B=[a,b]\subset\mathbb{R}$. By definition of spectral projection, \begin{align*} E([a,b])=\mathbb{1}_{[a,b]}(A). \end{align*} Thus $E([a,b])x$ is the part of $x$ selected by the indicator function of the spectral interval $[a,b]$. In the multiplication model $A=M_x$ on $L^2([0,1])$, applying a bounded Borel function $\varphi$ to $M_x$ gives multiplication by $\varphi(t)$. Taking $\varphi=\mathbb{1}_{[a,b]}$, we get, for every $f\in L^2([0,1])$, \begin{align*} (E([a,b])f)(t)=(\mathbb{1}_{[a,b]}(M_x)f)(t)=\mathbb{1}_{[a,b]}(t)f(t) \end{align*} for almost every $t\in[0,1]$. Since $t$ is restricted to $[0,1]$, this is equivalently multiplication by $\mathbb{1}_{[a,b]\cap[0,1]}$. The operator is a projection because the multiplier is idempotent: \begin{align*} \mathbb{1}_{[a,b]}(t)^2=\mathbb{1}_{[a,b]}(t) \end{align*} for every $t$. Therefore, for $f\in L^2([0,1])$, \begin{align*} (E([a,b])^2f)(t)=\mathbb{1}_{[a,b]}(t)\mathbb{1}_{[a,b]}(t)f(t)=\mathbb{1}_{[a,b]}(t)f(t)=(E([a,b])f)(t). \end{align*} It is self-adjoint because the multiplier $\mathbb{1}_{[a,b]}$ is real-valued. Thus spectral projections are ordinary cutoffs in the multiplication model: they keep the part of $f$ supported on $[a,b]\cap[0,1]$ and remove the rest. [/example] This computation gives a practical interpretation of the abstract measure $E$.
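The cutoff description can be imitated on a grid. The Python sketch below samples functions at the midpoints of a uniform partition of $[0,1]$, which is an approximation we introduce. The names `cutoff` and `l2_inner` are hypothetical helpers, and real-valued samples are assumed, so no complex conjugation is needed. Multiplication by the indicator is exactly idempotent and symmetric on the grid, and $(E([a,b])f,f)$ approximates $\int_{[a,b]\cap[0,1]}|f(t)|^2\,dt$.

```python
n = 10_000
h = 1.0 / n
grid = [(k + 0.5) * h for k in range(n)]          # midpoints of [0, 1]

def cutoff(a: float, b: float, f: list[float]) -> list[float]:
    """Discretised E([a,b])f: multiply the samples of f by the indicator of [a,b]."""
    return [fv if a <= t <= b else 0.0 for t, fv in zip(grid, f)]

def l2_inner(f: list[float], g: list[float]) -> float:
    """Midpoint-rule approximation of (f, g)_{L^2([0,1])} for real samples."""
    return sum(x * y for x, y in zip(f, g)) * h

f = [1.0 + t for t in grid]                        # f(t) = 1 + t
g = [t * t for t in grid]                          # g(t) = t^2
Ef = cutoff(0.25, 0.75, f)

assert cutoff(0.25, 0.75, Ef) == Ef                # E^2 f = E f: the multiplier is 0 or 1
assert cutoff(0.5, 2.0, f) == cutoff(0.5, 1.0, f)  # only [a,b] ∩ [0,1] matters
# Symmetry (E f, g) = (f, E g), the discrete form of self-adjointness
assert abs(l2_inner(Ef, g) - l2_inner(f, cutoff(0.25, 0.75, g))) < 1e-12

# Scalar spectral measure of [1/4, 3/4] for f: integral of (1 + t)^2 over [1/4, 3/4]
approx = l2_inner(Ef, f)
exact = (1.75 ** 3 - 1.25 ** 3) / 3
print(approx, exact)
```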
For each $x \in H$, the scalar measure $B \mapsto (E(B)x,x)_H$ describes how the vector $x$ is distributed across the spectrum, and its total mass is $\|x\|_H^2$. [example: Borel Functional Calculus for a Diagonal Normal Operator] Let $D\in\mathcal{L}(\ell^2)$ be diagonal with respect to the standard orthonormal basis $(e_n)$, so that $D e_n=a_n e_n$, where $(a_n)$ is bounded. First consider a polynomial $p(z)=\sum_{k=0}^m c_k z^k$. Since $D^k e_n=a_n^k e_n$ for every $k\ge 0$, we have \begin{align*} p(D)e_n=\sum_{k=0}^m c_k D^k e_n=\sum_{k=0}^m c_k a_n^k e_n=p(a_n)e_n. \end{align*} Since also $D^*e_n=\overline{a_n}e_n$, the same computation applies to polynomials in $z$ and $\bar z$, and the Borel functional calculus extends this diagonal rule to bounded Borel functions on $\sigma(D)$. Hence, for every bounded Borel function $f$ on $\sigma(D)$, \begin{align*} f(D)e_n=f(a_n)e_n. \end{align*} Now let $B\subset\sigma(D)$ be Borel. The spectral projection is \begin{align*} E(B)=\mathbb{1}_B(D). \end{align*} Applying the preceding formula with $f=\mathbb{1}_B$ gives \begin{align*} E(B)e_n=\mathbb{1}_B(a_n)e_n. \end{align*} Thus $E(B)e_n=e_n$ when $a_n\in B$, and $E(B)e_n=0$ when $a_n\notin B$. For a vector $x=\sum_{n\ge 1}x_n e_n\in\ell^2$, continuity of the bounded projection $E(B)$ gives \begin{align*} E(B)x=\sum_{n\ge 1}x_n E(B)e_n=\sum_{a_n\in B}x_n e_n. \end{align*} Therefore the range of $E(B)$ is exactly the closed span of the basis vectors whose diagonal entries lie in $B$: \begin{align*} E(B)\ell^2=\overline{\operatorname{span}}\{e_n:a_n\in B\}. \end{align*} For the scalar spectral measure associated to $x=\sum_{n\ge 1}x_n e_n$, this gives \begin{align*} (E(B)x,x)_{\ell^2}=\sum_{a_n\in B}|x_n|^2. \end{align*} In particular, for a single value $\alpha\in\sigma(D)$, \begin{align*} E(\{\alpha\})\ell^2=\overline{\operatorname{span}}\{e_n:a_n=\alpha\}. \end{align*} Thus the spectral measure is atomic at precisely the values actually assumed by the sequence $(a_n)$, while the full spectral support is the closure of the diagonal entries. [/example] The contrast between this atomic example and $M_x$ on $L^2([0,1])$ is the main classification lesson.
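The diagonal formulas can be sketched on a finite truncation. The Python code below assumes that only finitely many coordinates $a_1,\dots,a_N$ and $x_1,\dots,x_N$ are kept, stored 0-indexed in lists, and that a Borel set $B$ is represented by a membership test. The names `spectral_projection` and `scalar_measure`, and the sample data, are hypothetical.

```python
from typing import Callable

def spectral_projection(in_B: Callable[[complex], bool],
                        a: list[complex], x: list[complex]) -> list[complex]:
    """E(B)x for D e_n = a_n e_n: keep x_n when a_n lies in B, replace it by 0 otherwise."""
    return [xn if in_B(an) else 0j for an, xn in zip(a, x)]

def scalar_measure(in_B: Callable[[complex], bool],
                   a: list[complex], x: list[complex]) -> float:
    """(E(B)x, x) = sum of |x_n|^2 over the indices n with a_n in B."""
    return sum(abs(xn) ** 2 for an, xn in zip(a, x) if in_B(an))

a = [1, 1j, 1, -1, 0.5]     # diagonal entries; the value 1 occurs twice
x = [3, 1j, 4, 2, 1]        # coordinates of x in the basis (e_n)

def atom_at_one(z: complex) -> bool:
    """Membership test for the Borel set B = {1}."""
    return z == 1

print(spectral_projection(atom_at_one, a, x))   # [3, 0j, 4, 0j, 0j]
print(scalar_measure(atom_at_one, a, x))        # 25, i.e. 3^2 + 4^2
```

The repeated entry $1$ gives $E(\{1\})$ a two-dimensional range, matching $E(\{\alpha\})\ell^2=\overline{\operatorname{span}}\{e_n:a_n=\alpha\}$. The masses of $B$ and of its complement add up to $\|x\|^2=31$.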
The same spectral set can support different spectral measures, and the measure determines whether spectral values are eigenvalues, continuous values, or mixtures of both. [remark: What the Examples Classify] Diagonal operators on $\ell^2$ model pure point behaviour, multiplication by $x$ on $L^2([0,1])$ models continuous behaviour, and compact perturbations show how isolated eigenvalues can be added or moved around a stable limiting set. Normal operator theory classifies these examples through spectral measures rather than through the spectrum alone. [/remark]

## Beyond and Connections

The operator-theoretic ideas developed here connect naturally with several next directions on Androma. The resolvent viewpoint leads into spectral theory for unbounded operators, where domains become part of the structure and the resolvent set is often the most stable object to study. The Neumann-series perturbation principle also reappears throughout functional analysis: it is the basic local mechanism behind openness of the invertible group, stability of Fredholm operators, and many inverse-mapping arguments. For normal operators, the compact case should be read as the bridge between finite-dimensional diagonalisation and the full projection-valued-measure form of the spectral theorem. Compactness turns the spectral measure into a discrete eigenvalue expansion, while the general bounded normal theorem replaces sums over eigenvectors by integration over the spectrum. The continuous functional calculus is the corresponding scalar-to-operator dictionary for continuous functions on the spectrum; the Borel calculus extends this dictionary to bounded Borel functions, but at the price of using the spectral measure more explicitly. Positivity is another recurring theme.
The criterion that a self-adjoint operator is positive exactly when its spectral values are nonnegative is the spectral-theoretic version of the elementary fact that a Hermitian matrix is positive semidefinite exactly when its eigenvalues are nonnegative. This perspective is useful in polar decomposition, square roots of positive operators, $C^*$-algebras, and the study of quadratic forms. For related Androma material, see the compact diagonalisation theorem [Compact Self-Adjoint Spectral Theorem](/theorems/538), the advanced functional-calculus reference [Borel Functional Calculus](/theorems/2696), and the background notes on [Hilbert Space](/page/Hilbert%20Space), [Compact Operator](/page/Compact%20Operator), and [Self-Adjoint Operators](/page/Self-Adjoint%20Operators). The functional-calculus material developed here is the local bridge from these operator-theoretic foundations to the spectral-measure viewpoint. Beyond this page, the same ideas feed into several larger directions. In analysis, compact self-adjoint operators provide the Hilbert-space model behind Fourier expansions, Sturm-Liouville theory, and many integral-equation methods: one studies a problem by finding orthogonal modes and reading analytic information from the resulting scalar coefficients. In operator algebras and quantum mechanics, the passage from eigenvalue sums to functional calculus becomes the language for observables, projections, and spectral decompositions even when no eigenbasis is available. In partial differential equations, compact resolvents turn unbounded differential operators into discrete spectral data, while the full spectral-measure viewpoint handles continuous spectrum and scattering phenomena. The natural continuation is therefore twofold. One route stays with compactness and studies Fredholm theory, compact resolvents, and spectral approximation. 
The other route drops compactness and develops projection-valued measures, Borel functional calculus, and the spectral theorem for general normal operators. The present course is the common entry point. It explains why finite-dimensional diagonalisation survives in compact settings in the form of an orthonormal eigenvector expansion, and why functional calculus is the correct replacement once the spectrum is no longer just a sequence of eigenvalues.

## References

- [Spectral Theorem for Compact Self-Adjoint Operators](/theorems/538)
- [Borel Functional Calculus](/theorems/2696)
- [Hilbert Space](/page/Hilbert%20Space)
- [Compact Operator](/page/Compact%20Operator)
- [Self-Adjoint Operators](/page/Self-Adjoint%20Operators)

Created by admin on 6/20/2026 | Last updated on 6/20/2026

Operator Theory I: Spectral Theory
