A [vector space](/page/Vector%20Space) with an [inner product](/page/Inner%20Product) gives two different ways to understand a vector. We can treat the vector as a single object, or we can ask how much of it points in each chosen direction. In finite-dimensional Euclidean space this second description feels automatic: choose perpendicular unit coordinate axes, take dot products, and recover the vector from its coordinates. The question behind an orthonormal basis is whether the same idea survives in an infinite-dimensional Hilbert space, where a vector may require infinitely many coordinates and convergence becomes part of the coordinate system itself.
The first warning is that algebraic bases are too rigid for analysis. A function in $L^2(0,1)$ is not usefully described by a finite list of basis vectors, and a Hamel basis gives no workable way to approximate, compute energy, or pass to limits. What analysis needs is a coordinate system compatible with the norm: finite partial sums should approximate the vector, squared coefficients should measure squared length, and orthogonality should turn geometry into scalar arithmetic.
[example: Fourier Sines as Coordinates]
Let $\tau>0$, let $\mathcal L^1$ denote one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure), and let $H=L^2(0,\tau)$ with inner product
\begin{align*}
(f,g)_H = \int_0^\tau f(x)\overline{g(x)}\,d\mathcal L^1(x).
\end{align*}
For $n\in\mathbb N$, define
\begin{align*}
e_n(x)=\sqrt{\frac{2}{\tau}}\sin\left(\frac{n\pi x}{\tau}\right).
\end{align*}
For $m\ne n$, the product-to-sum identity gives
\begin{align*}
(e_m,e_n)_H
&=\frac{2}{\tau}\int_0^\tau
\sin\left(\frac{m\pi x}{\tau}\right)
\sin\left(\frac{n\pi x}{\tau}\right)\,d\mathcal L^1(x)\\
&=\frac{1}{\tau}\int_0^\tau
\left[
\cos\left(\frac{(m-n)\pi x}{\tau}\right)
-
\cos\left(\frac{(m+n)\pi x}{\tau}\right)
\right]\,d\mathcal L^1(x)\\
&=\frac{1}{\tau}
\left[
\frac{\tau}{(m-n)\pi}\sin\left(\frac{(m-n)\pi x}{\tau}\right)
-
\frac{\tau}{(m+n)\pi}\sin\left(\frac{(m+n)\pi x}{\tau}\right)
\right]_{0}^{\tau}\\
&=\frac{1}{\tau}
\left[
\frac{\tau}{(m-n)\pi}\sin((m-n)\pi)
-
\frac{\tau}{(m+n)\pi}\sin((m+n)\pi)
\right]\\
&=0.
\end{align*}
For $n=m$, the identity $\sin^2\theta=(1-\cos(2\theta))/2$ gives
\begin{align*}
(e_n,e_n)_H
&=\frac{2}{\tau}\int_0^\tau
\sin^2\left(\frac{n\pi x}{\tau}\right)\,d\mathcal L^1(x)\\
&=\frac{1}{\tau}\int_0^\tau
\left[
1-\cos\left(\frac{2n\pi x}{\tau}\right)
\right]\,d\mathcal L^1(x)\\
&=\frac{1}{\tau}
\left[
x-\frac{\tau}{2n\pi}\sin\left(\frac{2n\pi x}{\tau}\right)
\right]_{0}^{\tau}\\
&=\frac{1}{\tau}\left[\tau-\frac{\tau}{2n\pi}\sin(2n\pi)\right]\\
&=1.
\end{align*}
Thus $(e_n)_{n\in\mathbb N}$ is an orthonormal sequence.
Now take $f(x)=x$. Its $n$th coefficient is
\begin{align*}
(f,e_n)_H
&=\sqrt{\frac{2}{\tau}}\int_0^\tau x\sin\left(\frac{n\pi x}{\tau}\right)\,d\mathcal L^1(x).
\end{align*}
Using [integration by parts](/theorems/2098) with $u=x$ and $dv=\sin(n\pi x/\tau)\,dx$, so that $du=dx$ and $v=-\tau\cos(n\pi x/\tau)/(n\pi)$, we get
\begin{align*}
\int_0^\tau x\sin\left(\frac{n\pi x}{\tau}\right)\,d\mathcal L^1(x)
&=
\left[
-\frac{\tau x}{n\pi}\cos\left(\frac{n\pi x}{\tau}\right)
\right]_0^\tau
+
\frac{\tau}{n\pi}\int_0^\tau
\cos\left(\frac{n\pi x}{\tau}\right)\,d\mathcal L^1(x)\\
&=
-\frac{\tau^2}{n\pi}\cos(n\pi)
+
\frac{\tau}{n\pi}
\left[
\frac{\tau}{n\pi}\sin\left(\frac{n\pi x}{\tau}\right)
\right]_0^\tau\\
&=
-\frac{\tau^2}{n\pi}(-1)^n
+
\frac{\tau^2}{n^2\pi^2}\sin(n\pi)\\
&=
(-1)^{n+1}\frac{\tau^2}{n\pi}.
\end{align*}
Therefore
\begin{align*}
(f,e_n)_H
&=\sqrt{\frac{2}{\tau}}\cdot (-1)^{n+1}\frac{\tau^2}{n\pi}\\
&=(-1)^{n+1}\sqrt{2}\,\frac{\tau^{3/2}}{n\pi}.
\end{align*}
The partial sums $\sum_{n=1}^N (f,e_n)_H e_n$ are not algebraic coordinate expansions in a finite-dimensional space; they are approximations in the $L^2$ norm. This example shows the central pattern: orthogonality computes the coefficients explicitly, while completeness decides whether the infinite expansion returns the original vector.
[/example]
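The coefficient formula can be checked numerically. The sketch below is a minimal Python illustration that assumes plain midpoint-rule quadrature; the helper names `sine_basis`, `sine_coefficient`, `closed_form_coefficient`, and `squared_error` are hypothetical. The squared error uses only orthonormality: the residual $f-S_N$ is orthogonal to $e_1,\ldots,e_N$, so $\|f-S_N\|_H^2=\|f\|_H^2-\sum_{n\le N}|(f,e_n)_H|^2$ with $\|f\|_H^2=\int_0^\tau x^2\,dx=\tau^3/3$. Its decay to $0$ is where completeness of the sine system enters.

```python
import math


def sine_basis(n, tau, x):
    """Evaluate e_n(x) = sqrt(2/tau) * sin(n*pi*x/tau)."""
    return math.sqrt(2.0 / tau) * math.sin(n * math.pi * x / tau)


def sine_coefficient(n, tau, samples=20000):
    """Approximate (f, e_n)_H for f(x) = x on (0, tau) by the midpoint rule."""
    h = tau / samples
    total = 0.0
    for j in range(samples):
        x = (j + 0.5) * h
        total += x * sine_basis(n, tau, x)
    return total * h


def closed_form_coefficient(n, tau):
    """Exact coefficient (-1)^(n+1) * sqrt(2) * tau^(3/2) / (n*pi)."""
    return (-1) ** (n + 1) * math.sqrt(2.0) * tau ** 1.5 / (n * math.pi)


def squared_error(N, tau):
    """||f - S_N||^2 = ||f||^2 - sum_{n<=N} (f, e_n)^2, with ||f||^2 = tau^3/3."""
    energy = tau ** 3 / 3.0
    captured = sum(closed_form_coefficient(n, tau) ** 2 for n in range(1, N + 1))
    return energy - captured


if __name__ == "__main__":
    tau = 2.0
    for n in range(1, 6):
        # Quadrature and closed form should agree to many digits.
        print(n, sine_coefficient(n, tau), closed_form_coefficient(n, tau))
    for N in (1, 10, 100, 1000):
        # The L^2 error of the partial sum shrinks as N grows.
        print(N, squared_error(N, tau))
```

The error is computed from the coefficients rather than by integrating $(f-S_N)^2$, which keeps the check exact up to rounding.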
The purpose of this chapter is to build that pattern in abstract Hilbert spaces. We start with orthonormal systems, separate orthogonality from completeness, prove the energy inequalities that make coefficients meaningful, and then explain why every Hilbert space has enough orthonormal vectors to act as coordinates.
## Orthogonality, Span, and Completeness
The basic obstruction to usable coordinates is interaction between basis vectors. If two directions are not perpendicular, changing one coordinate changes the contribution of another. The inner product removes this interference when the chosen vectors are pairwise orthogonal and normalized to length $1$.
[definition: Orthonormal Set]
Let $H$ be a Hilbert space over $\mathbb R$ or $\mathbb C$. A subset $E \subset H$ is an orthonormal set if every $e \in E$ satisfies $\|e\|_H = 1$ and every pair of distinct elements $e,f \in E$ satisfies $(e,f)_H = 0$.
[/definition]
Orthonormality is a metric condition, not yet a spanning condition. A single unit vector in an infinite-dimensional Hilbert space is orthonormal, but it does not describe most vectors. The next issue is whether the chosen directions leave a hidden perpendicular direction behind; if they do, every coefficient in the system can vanish while the vector itself remains nonzero.
[definition: Complete Orthonormal Set]
Let $H$ be a Hilbert space and let $E \subset H$ be an orthonormal set. The set $E$ is complete if the only vector $x \in H$ satisfying $(x,e)_H = 0$ for every $e \in E$ is $x=0$.
[/definition]
Completeness is often the right conceptual condition, but approximation is the condition used in estimates. To connect them, we need the closed subspace generated by a family of vectors. This construction records not only finite linear combinations, but also every vector that can be reached as a norm limit of such combinations.
[definition: Closed Linear Span]
Let $H$ be a Hilbert space and let $A \subset H$. The closed linear span of $A$ is
\begin{align*}
\overline{\operatorname{span}}(A) = \overline{\left\{\sum_{i=1}^n \alpha_i a_i : n \in \mathbb N,\ \alpha_i \in \mathbb F,\ a_i \in A\right\}},
\end{align*}
where $\mathbb F$ is the scalar field of $H$ and the closure is taken in the norm of $H$.
[/definition]
## Definition
The problem is now to name the kind of orthonormal set that actually serves as coordinates for the whole Hilbert space. Orthogonality alone gives independent directions, while closed spanning gives approximation of arbitrary vectors. The central object of the chapter is the structure that has both features at once.
[definition: Orthonormal Basis]
Let $H$ be a Hilbert space. An orthonormal basis of $H$ is a complete orthonormal set $E \subset H$.
[/definition]
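This terminology differs from the algebraic notion of a Hamel basis. In an infinite-dimensional Hilbert space an orthonormal basis is never a Hamel basis: vectors are generally not finite linear combinations of basis elements. Instead, each vector is a norm limit of finite linear combinations, with coefficients controlled by the inner product; for example, if $e_1,e_2,\ldots$ are distinct basis elements, the convergent series $\sum_{n=1}^\infty 2^{-n}e_n$ has a nonzero coefficient at every $e_n$ and so is not a finite combination.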
## Coefficients
A coordinate system needs a way to extract coordinates without solving a new approximation problem each time. Orthogonality makes this possible: when all other basis directions are perpendicular to $e$, the inner product with $e$ isolates exactly the scalar multiplying that direction. This scalar is the coordinate attached to $e$.
[definition: Fourier Coefficient]
Let $H$ be a Hilbert space, let $E \subset H$ be an orthonormal set, and let $x \in H$. For $e \in E$, the Fourier coefficient of $x$ at $e$ is the scalar $(x,e)_H$.
[/definition]
The word Fourier here is structural rather than limited to trigonometric functions. It signals that vectors are being analyzed by their projections onto orthonormal directions.
[remark: Linear Convention]
Throughout this page the inner product $(\cdot,\cdot)_H$ is linear in the first argument and conjugate-linear in the second argument. With this convention, the coefficient multiplying $e$ in an expansion of $x$ is $(x,e)_H$.
[/remark]
## Orthogonality and Finite Coordinates
Before infinite expansions enter the story, orthonormality already gives exact finite-dimensional geometry. The key point is that squared norms add along perpendicular directions, turning inner products into coordinate arithmetic.
A finite orthonormal family should behave like the standard coordinate axes in $\mathbb F^n$. The following theorem is the algebraic core behind every later convergence statement.
[quotetheorem:4883]
This theorem says that the map sending a finite scalar list to the corresponding linear combination is an isometry from $\mathbb F^n$ into $H$. It is why normalization is not cosmetic: without length $1$, coefficients would carry scale factors. The next issue is approximation from such a finite span: if a vector lies outside the span, the coordinate partial sum should be the nearest vector inside it.
[quotetheorem:4858]
The minimization statement is the analytic meaning of coordinate projection. It tells us that finite orthonormal expansions are not arbitrary truncations; they are best approximations from their finite span.
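The best-approximation property can be checked in a small Euclidean example. The sketch below is a minimal Python illustration in $\mathbb R^4$; the orthonormal pair, the vector $x$, and the helper names are chosen for illustration only. It verifies the Pythagorean identity for the coefficient combination and then checks that random perturbations of the coefficients never bring the combination closer to $x$.

```python
import math
import random


def inner(u, v):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def combination(coeffs, family):
    """Return the vector sum_i coeffs[i] * family[i]."""
    dim = len(family[0])
    return [sum(c * e[k] for c, e in zip(coeffs, family)) for k in range(dim)]


def distance(u, v):
    """Euclidean distance ||u - v||."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# An orthonormal pair in R^4 and a vector outside its span (illustrative choices).
s = 1.0 / math.sqrt(2.0)
family = [[1.0, 0.0, 0.0, 0.0], [0.0, s, s, 0.0]]
x = [3.0, 1.0, 2.0, 5.0]

coeffs = [inner(x, e) for e in family]   # Fourier coefficients (x, e_i)
best = combination(coeffs, family)       # sum_i (x, e_i) e_i
best_dist = distance(x, best)

# Pythagoras: ||sum_i a_i e_i||^2 = sum_i a_i^2 for an orthonormal family.
assert abs(inner(best, best) - sum(c * c for c in coeffs)) < 1e-12

# Random competitors from the same span are never closer to x.
rng = random.Random(0)
for _ in range(1000):
    trial = [c + rng.uniform(-1.0, 1.0) for c in coeffs]
    assert distance(x, combination(trial, family)) >= best_dist - 1e-12

print(coeffs, best, best_dist)
```

The small tolerance in the comparison only absorbs floating-point rounding; geometrically the inequality is exact.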
[example: Failure Without Orthogonality]
In $\mathbb R^2$ with the Euclidean inner product, let $u_1=(1,0)$, $u_2=(1,1)$, and $x=(0,1)$. The two chosen directions are not orthogonal, because
\begin{align*}
(u_1,u_2)
&=(1,0)\cdot(1,1)\\
&=1\cdot 1+0\cdot 1\\
&=1\ne 0.
\end{align*}
The inner products of $x$ with these two vectors are
\begin{align*}
(x,u_1)
&=(0,1)\cdot(1,0)\\
&=0\cdot 1+1\cdot 0\\
&=0,
\end{align*}
and
\begin{align*}
(x,u_2)
&=(0,1)\cdot(1,1)\\
&=0\cdot 1+1\cdot 1\\
&=1.
\end{align*}
If we tried to reconstruct $x$ by the orthonormal-coordinate formula using these non-orthogonal vectors, the proposed sum would be
\begin{align*}
(x,u_1)u_1+(x,u_2)u_2
&=0(1,0)+1(1,1)\\
&=(0,0)+(1,1)\\
&=(1,1).
\end{align*}
Since $(1,1)\ne(0,1)=x$, these inner products do not give the coordinates of $x$ in the family $\{u_1,u_2\}$; the actual coordinates are $-1$ and $1$, because $x=-u_1+u_2$. The source of the failure is the nonzero interaction $(u_1,u_2)=1$: inner products against a non-orthogonal family do not isolate independent coordinate directions, and normalizing $u_2$ alone would not repair this.
[/example]
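For a non-orthogonal family the correct coordinates come from a linear system. Taking inner products of $x=c_1u_1+c_2u_2$ with $u_1$ and $u_2$ gives $Gc=b$, where $G_{ij}=(u_j,u_i)$ is the Gram matrix and $b_i=(x,u_i)$; for an orthonormal family $G$ is the identity and $c=b$. The minimal Python sketch below (real scalars assumed, helper names hypothetical) contrasts this with the naive orthonormal formula.

```python
def inner(u, v):
    """Euclidean inner product on R^2."""
    return sum(a * b for a, b in zip(u, v))


def naive_reconstruction(x, u1, u2):
    """Apply the orthonormal formula (x,u1)u1 + (x,u2)u2 to a non-orthogonal pair."""
    a, b = inner(x, u1), inner(x, u2)
    return [a * p + b * q for p, q in zip(u1, u2)]


def true_coordinates(x, u1, u2):
    """Solve the 2x2 Gram system G c = b by Cramer's rule.

    (x,u1) = c1*(u1,u1) + c2*(u2,u1)
    (x,u2) = c1*(u1,u2) + c2*(u2,u2)
    """
    g11, g12, g22 = inner(u1, u1), inner(u1, u2), inner(u2, u2)
    b1, b2 = inner(x, u1), inner(x, u2)
    det = g11 * g22 - g12 * g12          # nonzero because u1, u2 are independent
    c1 = (b1 * g22 - g12 * b2) / det
    c2 = (g11 * b2 - g12 * b1) / det
    return c1, c2


u1, u2, x = [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]
print(naive_reconstruction(x, u1, u2))   # [1.0, 1.0], not x
print(true_coordinates(x, u1, u2))       # (-1.0, 1.0), since x = -u1 + u2
```

Orthonormality is exactly what removes the need to solve this system: the Gram matrix becomes the identity.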
This finite example explains why orthonormal bases are special among spanning families. They make the coordinate functionals explicit and stable.
## Bessel Inequality and Energy Control
In an infinite-dimensional Hilbert space an orthonormal set can be infinite, so a vector may have infinitely many coefficients, and nothing so far prevents their squared magnitudes from having an infinite sum. A vector has finite norm, so its coordinate magnitudes must have finite total energy if the coordinate system is to be meaningful.
The first result says that every orthonormal set is automatically safe in this sense. Even without completeness, its coefficients cannot contain more energy than the vector itself. The quoted theorem card below records the real Hilbert-space form of [Bessel's inequality](/theorems/540). For the complex Hilbert spaces also covered by this page, the corresponding estimate is obtained by replacing each real square with the absolute square $|(x,e)|^2$; all later complex examples use that absolute-square form rather than $c_k^2$.
[quotetheorem:540]
Bessel's inequality is the reason Fourier coefficients belong to an $\ell^2$ space; in the complex examples below it is used with absolute squares. It also settles a subtle point for nonseparable Hilbert spaces: even if the orthonormal set $E$ is uncountable, a fixed vector $x$ has only countably many nonzero coefficients. Indeed, if $m$ elements $e\in E$ satisfy $|(x,e)_H|>1/k$, then Bessel's inequality over those $m$ elements gives $m/k^2\le\|x\|_H^2$, so only finitely many coefficients exceed $1/k$ for each $k\in\mathbb N$, and every nonzero coefficient exceeds $1/k$ for some $k$.
[quotetheorem:4931]
This result matters in nonseparable Hilbert spaces, where an orthonormal basis may be uncountable. Even there, a single vector uses at most countably many basis directions with nonzero coefficient.
[example: Missing Energy in an Incomplete System]
Let $H=\ell^2(\mathbb N)$ with inner product
\begin{align*}
(x,y)_H=\sum_{k=1}^{\infty}x_k\overline{y_k},
\end{align*}
and let $E=\{e_2,e_3,e_4,\ldots\}$, where $e_n$ has $1$ in the $n$th coordinate and $0$ in every other coordinate. For $m,n\ge 2$,
\begin{align*}
(e_m,e_n)_H
&=\sum_{k=1}^{\infty}(e_m)_k\overline{(e_n)_k}.
\end{align*}
If $m=n$, then the only nonzero summand occurs at $k=n$, so
\begin{align*}
(e_n,e_n)_H
&=(e_n)_n\overline{(e_n)_n}\\
&=1\cdot \overline{1}\\
&=1.
\end{align*}
If $m\ne n$, then there is no coordinate $k$ for which both $(e_m)_k$ and $(e_n)_k$ are nonzero, so every summand is $0$ and
\begin{align*}
(e_m,e_n)_H=0.
\end{align*}
Thus $E$ is an orthonormal set.
Now take $x=e_1$. For every $n\ge 2$,
\begin{align*}
(x,e_n)_H
&=(e_1,e_n)_H\\
&=\sum_{k=1}^{\infty}(e_1)_k\overline{(e_n)_k}\\
&=0,
\end{align*}
because $(e_1)_k$ is nonzero only when $k=1$, while $(e_n)_k$ is nonzero only when $k=n\ge 2$. Hence for every finite subset $F\subset E$,
\begin{align*}
\sum_{e\in F}|(x,e)_H|^2
&=\sum_{e\in F}0\\
&=0.
\end{align*}
On the other hand,
\begin{align*}
\|x\|_{\ell^2}^2
&=(e_1,e_1)_H\\
&=\sum_{k=1}^{\infty}|(e_1)_k|^2\\
&=|1|^2\\
&=1.
\end{align*}
Thus the complex form of Bessel's inequality reads $0\le 1$, and equality fails. The missing energy is carried by the first coordinate direction $e_1$, which is orthogonal to every vector in $E$ and is not included in the system.
[/example]
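The missing-energy computation can be mirrored in a finite truncation of $\ell^2(\mathbb N)$. The sketch below is a minimal Python illustration, assuming truncation to the first six coordinates only so that vectors are finite lists; helper names are hypothetical. It uses the complex inner product, linear in the first argument, and shows that for $E=\{e_2,\ldots,e_6\}$ the Bessel gap $\|x\|^2-\sum_{e\in E}|(x,e)_H|^2$ equals $|x_1|^2$, the energy in the omitted direction.

```python
def inner(x, y):
    """Complex inner product sum_k x_k * conj(y_k), linear in the first slot."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def unit_vector(n, dim):
    """Standard vector e_n (1-indexed) in a truncation of l^2 to `dim` coordinates."""
    return [1.0 + 0j if k == n else 0j for k in range(1, dim + 1)]


def bessel_gap(x, family):
    """||x||^2 - sum_e |(x, e)|^2; nonnegative for any orthonormal family."""
    norm_sq = inner(x, x).real
    captured = sum(abs(inner(x, e)) ** 2 for e in family)
    return norm_sq - captured


dim = 6
E = [unit_vector(n, dim) for n in range(2, dim + 1)]   # e_2, ..., e_6
x = [3 + 4j, 0.5j, 2.0, -1.0, 0.25, 0.0]

print(bessel_gap(unit_vector(1, dim), E))   # 1.0: all energy of e_1 is missed
print(bessel_gap(x, E), abs(x[0]) ** 2)     # the gap equals |x_1|^2
```

Appending $e_1$ to the family closes the gap, which is the finite-dimensional shadow of completeness.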
The example separates orthonormality from completeness. Orthonormality prevents overcounting energy; completeness prevents losing energy.
## Completeness and Parseval Identity
The defining test for completeness says that no nonzero vector is orthogonal to every basis direction. For computations, the more useful test is whether all energy is captured by the coefficients.
The next theorem is the central equivalence behind orthonormal bases. It says that density of the span, the absence of nonzero vectors orthogonal to the system, norm recovery, and series expansion are the same phenomenon.
[quotetheorem:4932]
The equality in condition 3 is the norm formula anticipated at the start of the page, where squared coefficients were required to measure squared length; it is worth isolating because it is used constantly in estimates. Once the orthonormal system is complete, the missing-energy gap in Bessel's inequality disappears.
The remaining computational question is how much geometry is recovered from the coefficient sequence, not just how much norm is recovered. Parseval's identities answer that question by turning both norms and inner products into coordinate sums. The quoted Parseval card is stated for real Hilbert spaces; in complex Hilbert spaces the norm identity uses $|(x,e)|^2$, and with our convention that the inner product is linear in the first argument the inner-product identity reads $(x,y)=\sum_e (x,e)\overline{(y,e)}$.
[quotetheorem:541]
The second identity is the coordinate form of the inner product. Once a Hilbert space has an orthonormal basis, its geometry is encoded by square-summable scalar sequences.
[example: Parseval for the Sine Expansion of $x$]
Returning to $H=L^2(0,\tau)$ with $\tau>0$ and $f(x)=x$, the opening computation gave
\begin{align*}
(f,e_n)_H
&=(-1)^{n+1}\sqrt{2}\,\frac{\tau^{3/2}}{n\pi}.
\end{align*}
Therefore
\begin{align*}
|(f,e_n)_H|^2
&=\left|(-1)^{n+1}\sqrt{2}\,\frac{\tau^{3/2}}{n\pi}\right|^2\\
&=|(-1)^{n+1}|^2|\sqrt{2}|^2\frac{|\tau^{3/2}|^2}{|n\pi|^2}\\
&=1\cdot 2\cdot \frac{\tau^3}{n^2\pi^2}\\
&=\frac{2\tau^3}{n^2\pi^2}.
\end{align*}
Because the sine system is complete in $L^2(0,\tau)$, *[Parseval Identity](/theorems/248)* applies to $f$ and gives
\begin{align*}
\|f\|_{L^2(0,\tau)}^2
&=\sum_{n=1}^\infty |(f,e_n)_H|^2\\
&=\sum_{n=1}^\infty \frac{2\tau^3}{n^2\pi^2}.
\end{align*}
The left-hand side is
\begin{align*}
\|f\|_{L^2(0,\tau)}^2
&=\int_0^\tau |x|^2\,d\mathcal L^1(x)\\
&=\int_0^\tau x^2\,d\mathcal L^1(x)\\
&=\left[\frac{x^3}{3}\right]_0^\tau\\
&=\frac{\tau^3}{3}.
\end{align*}
Combining the two expressions for $\|f\|_{L^2(0,\tau)}^2$ gives
\begin{align*}
\frac{\tau^3}{3}
&=\sum_{n=1}^\infty \frac{2\tau^3}{n^2\pi^2}\\
&=\frac{2\tau^3}{\pi^2}\sum_{n=1}^\infty \frac{1}{n^2}.
\end{align*}
Since $\tau>0$, we may divide by $\tau^3$ and then multiply by $\pi^2/2$:
\begin{align*}
\frac{1}{3}
&=\frac{2}{\pi^2}\sum_{n=1}^\infty \frac{1}{n^2},\\
\frac{\pi^2}{6}
&=\sum_{n=1}^\infty \frac{1}{n^2}.
\end{align*}
Thus the completeness of the sine basis turns the norm identity for $f(x)=x$ into Euler's evaluation $\sum_{n=1}^\infty n^{-2}=\pi^2/6$.
[/example]
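Both Parseval identities for the sine system can be checked numerically. The minimal Python sketch below uses hypothetical helper names and, besides $f(x)=x$, the function $g(x)=1$: since $\int_0^\tau\sin(n\pi x/\tau)\,dx=\tau(1-(-1)^n)/(n\pi)$, its coefficients are $(g,e_n)_H=\sqrt{2/\tau}\,\tau(1-(-1)^n)/(n\pi)$, and $(f,g)_H=\int_0^\tau x\,dx=\tau^2/2$. Partial sums of $\sum|(f,e_n)_H|^2$ and of $\sum(f,e_n)_H(g,e_n)_H$ (the real case of the inner-product identity) should approach $\tau^3/3$ and $\tau^2/2$ from below.

```python
import math


def coeff_f(n, tau):
    """(f, e_n)_H for f(x) = x, from the opening example."""
    return (-1) ** (n + 1) * math.sqrt(2.0) * tau ** 1.5 / (n * math.pi)


def coeff_g(n, tau):
    """(g, e_n)_H for g(x) = 1, namely sqrt(2/tau) * tau * (1 - (-1)^n) / (n*pi)."""
    return math.sqrt(2.0 / tau) * tau * (1 - (-1) ** n) / (n * math.pi)


def parseval_partial_sums(N, tau):
    """Partial sums of sum |(f,e_n)|^2 and sum (f,e_n)(g,e_n) over n = 1..N."""
    norm_sum = sum(coeff_f(n, tau) ** 2 for n in range(1, N + 1))
    inner_sum = sum(coeff_f(n, tau) * coeff_g(n, tau) for n in range(1, N + 1))
    return norm_sum, inner_sum


tau = 1.5
norm_sum, inner_sum = parseval_partial_sums(100_000, tau)
print(norm_sum, tau ** 3 / 3)        # approaches ||f||^2 = tau^3 / 3
print(inner_sum, tau ** 2 / 2)       # approaches (f, g) = tau^2 / 2
# Rearranging tau^3/3 = (2 tau^3 / pi^2) * sum 1/n^2 recovers the Basel sum.
print(norm_sum * math.pi ** 2 / (2 * tau ** 3), math.pi ** 2 / 6)
```

The tails decay only like $1/N$, so this is a sanity check of the identities, not an efficient way to compute $\pi^2/6$.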
Completeness is therefore the bridge from approximation to exact representation. The next question is how to obtain complete orthonormal systems in the first place.
## Construction by Orthogonalization
In finite-dimensional linear algebra, a basis can be turned into an orthonormal basis by subtracting projections and normalizing. The same idea works for countable dense sets, provided we discard vectors that contribute no new direction.
Before stating the construction, we isolate the operation it uses. Orthogonalization takes a list of independent vectors and replaces each vector by the part not already explained by the previous ones.
[definition: Gram-Schmidt Orthogonalization]
Let $H$ be an [inner product space](/page/Inner%20Product%20Space) and let $(v_n)_{n\in\mathbb N}$ be a sequence of linearly independent vectors in $H$. The Gram-Schmidt orthogonalization of $(v_n)$ is the sequence $(e_n)_{n\in\mathbb N}$ defined recursively by
\begin{align*}
w_n &= v_n - \sum_{i=1}^{n-1}(v_n,e_i)_H e_i, \\
e_n &= \frac{w_n}{\|w_n\|_H}.
\end{align*}
[/definition]
The key question is whether this recursive procedure preserves the approximation power of the original list. If orthogonalization changed the span at some finite stage, it could produce perpendicular vectors that no longer approximate the original functions or vectors. The theorem records that this loss does not happen.
[quotetheorem:435]
The theorem turns a linearly independent list into perpendicular coordinates without changing any finite span. To obtain a full basis from this process, we need a countable supply of vectors whose closure reaches the whole Hilbert space. That topological countability condition is separability.
[definition: Separable Hilbert Space]
A Hilbert space $H$ is separable if there exists a countable subset $D \subset H$ such that $\overline{D}=H$.
[/definition]
Separability says that the space has countably many points from which all other points can be approximated. In a Hilbert space, this is enough to build an at most countable family of orthonormal directions whose closed span is the whole space. The next theorem explains why most Hilbert spaces appearing in analysis have orthonormal bases that can be written as finite or infinite sequences.
[quotetheorem:543]
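The construction begins from a countable dense set, removes vectors already in the span of earlier choices, and applies Gram-Schmidt to the remaining linearly independent sequence. Density supplies completeness: since Gram-Schmidt does not change finite spans, the closed span of the resulting orthonormal sequence contains the dense set and is therefore all of $H$.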
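The construction can be sketched numerically in $\mathbb R^3$. The minimal Python version below follows the recursion $w_n=v_n-\sum_{i<n}(v_n,e_i)_He_i$ from the definition and, as in the separable construction, discards a vector when its residual vanishes. The tolerance `tol` is an assumption needed for floating-point arithmetic; in exact arithmetic the test would be $w_n=0$. The input vectors are illustrative.

```python
import math


def inner(u, v):
    """Euclidean inner product on R^n (real scalars assumed)."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize `vectors`, skipping any vector already in the span.

    For each v, subtract its components along the orthonormal vectors
    found so far (w = v - sum (v, e_i) e_i). If ||w|| <= tol, v adds no
    new direction and is discarded; otherwise e = w / ||w|| is kept.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = inner(v, e)
            w = [wk - c * ek for wk, ek in zip(w, e)]
        norm = math.sqrt(inner(w, w))
        if norm > tol:
            basis.append([wk / norm for wk in w])
    return basis


vectors = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
E = gram_schmidt(vectors)
print(len(E))   # 3: the second vector is a multiple of the first
```

This is the classical form of the recursion; for long or ill-conditioned lists, numerical codes usually prefer the modified variant, which projects the running residual instead of $v$.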
[example: Legendre Polynomials]
Let $H=L^2(-1,1)$ with inner product
\begin{align*}
(f,g)_H = \int_{-1}^1 f(x)\overline{g(x)}\,d\mathcal L^1(x).
\end{align*}
We apply Gram-Schmidt to $v_1(x)=1$, $v_2(x)=x$, and $v_3(x)=x^2$. First,
\begin{align*}
\|v_1\|_H^2
&=\int_{-1}^1 1\,d\mathcal L^1(x)\\
&=2,
\end{align*}
so
\begin{align*}
e_1(x)=\frac{1}{\sqrt 2}.
\end{align*}
For the second vector,
\begin{align*}
(v_2,e_1)_H
&=\int_{-1}^1 x\cdot \frac{1}{\sqrt 2}\,d\mathcal L^1(x)\\
&=\frac{1}{\sqrt 2}\left[\frac{x^2}{2}\right]_{-1}^1\\
&=\frac{1}{\sqrt 2}\left(\frac12-\frac12\right)\\
&=0.
\end{align*}
Thus $w_2=v_2-(v_2,e_1)_He_1=x$, and
\begin{align*}
\|w_2\|_H^2
&=\int_{-1}^1 x^2\,d\mathcal L^1(x)\\
&=\left[\frac{x^3}{3}\right]_{-1}^1\\
&=\frac13-\left(-\frac13\right)\\
&=\frac23.
\end{align*}
Therefore
\begin{align*}
e_2(x)=\frac{x}{\sqrt{2/3}}=\sqrt{\frac32}\,x.
\end{align*}
For the third vector, the coefficient in the $e_1$ direction is
\begin{align*}
(v_3,e_1)_H
&=\int_{-1}^1 x^2\cdot \frac{1}{\sqrt 2}\,d\mathcal L^1(x)\\
&=\frac{1}{\sqrt 2}\left[\frac{x^3}{3}\right]_{-1}^1\\
&=\frac{1}{\sqrt 2}\left(\frac13-\left(-\frac13\right)\right)\\
&=\frac{2}{3\sqrt 2}.
\end{align*}
Hence
\begin{align*}
(v_3,e_1)_H e_1
&=\frac{2}{3\sqrt 2}\cdot \frac{1}{\sqrt 2}\\
&=\frac13.
\end{align*}
The coefficient in the $e_2$ direction is
\begin{align*}
(v_3,e_2)_H
&=\int_{-1}^1 x^2\sqrt{\frac32}\,x\,d\mathcal L^1(x)\\
&=\sqrt{\frac32}\int_{-1}^1 x^3\,d\mathcal L^1(x)\\
&=\sqrt{\frac32}\left[\frac{x^4}{4}\right]_{-1}^1\\
&=\sqrt{\frac32}\left(\frac14-\frac14\right)\\
&=0.
\end{align*}
Therefore
\begin{align*}
w_3
&=v_3-(v_3,e_1)_He_1-(v_3,e_2)_He_2\\
&=x^2-\frac13-0\\
&=x^2-\frac13.
\end{align*}
Its squared norm is
\begin{align*}
\|w_3\|_H^2
&=\int_{-1}^1 \left(x^2-\frac13\right)^2\,d\mathcal L^1(x)\\
&=\int_{-1}^1 \left(x^4-\frac23x^2+\frac19\right)\,d\mathcal L^1(x)\\
&=\left[\frac{x^5}{5}-\frac{2x^3}{9}+\frac{x}{9}\right]_{-1}^1\\
&=\left(\frac15-\frac29+\frac19\right)-\left(-\frac15+\frac29-\frac19\right)\\
&=\frac25-\frac49+\frac29\\
&=\frac25-\frac29\\
&=\frac{18-10}{45}\\
&=\frac{8}{45}.
\end{align*}
Thus
\begin{align*}
e_3(x)
&=\frac{x^2-\frac13}{\sqrt{8/45}}\\
&=\sqrt{\frac{45}{8}}\left(x^2-\frac13\right).
\end{align*}
So the first three orthonormal vectors obtained from $1,x,x^2,\ldots$ are
\begin{align*}
\frac{1}{\sqrt2},\qquad \sqrt{\frac32}\,x,\qquad \sqrt{\frac{45}{8}}\left(x^2-\frac13\right).
\end{align*}
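The term $1/3$ is exactly the projection of $x^2$ onto the constant direction, so Gram-Schmidt turns the elementary polynomial list into an orthonormal polynomial system by subtracting earlier coordinate components and normalizing. The resulting functions are the normalized Legendre polynomials $\sqrt{(2n+1)/2}\,P_n$ for $n=0,1,2$, where $P_0(x)=1$, $P_1(x)=x$, and $P_2(x)=(3x^2-1)/2$.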
[/example]
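The Legendre computation can be reproduced in exact rational arithmetic. This Python sketch stores a polynomial as its list of coefficients and uses $\int_{-1}^1x^m\,dx=2/(m+1)$ for even $m$ and $0$ for odd $m$. To keep everything rational it postpones normalization: since $(v,e_i)_He_i=(v,w_i)_Hw_i/\|w_i\|_H^2$, the recursion can be run on the unnormalized vectors $w_n$, and each $e_n=w_n/\|w_n\|_H$ is recovered at the end. Helper names are illustrative.

```python
from fractions import Fraction


def inner(p, q):
    """(p, q) = integral of p(x) q(x) over (-1, 1) for coefficient lists.

    p[k] is the coefficient of x^k; the integral of x^m over (-1, 1)
    is 2/(m+1) for even m and 0 for odd m.
    """
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:
                total += a * b * Fraction(2, i + j + 1)
    return total


def subtract(p, q, c):
    """Return p - c*q, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a - c * b for a, b in zip(p, q)]


def orthogonalize(polys):
    """Gram-Schmidt without normalizing: w_n = v_n - sum (v_n, w_i)/(w_i, w_i) w_i."""
    ws = []
    for v in polys:
        w = v
        for u in ws:
            w = subtract(w, u, inner(v, u) / inner(u, u))
        ws.append(w)
    return ws


monomials = [[Fraction(1)], [Fraction(0), Fraction(1)], [Fraction(0), Fraction(0), Fraction(1)]]
ws = orthogonalize(monomials)
print(ws[2])                      # coefficients of x^2 - 1/3
print([inner(w, w) for w in ws])  # squared norms 2, 2/3, 8/45
```

The squared norms $2$, $2/3$, and $8/45$ match the hand computation, so dividing by their square roots reproduces $e_1$, $e_2$, and $e_3$.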
The separable case is constructive. General Hilbert spaces require a maximality argument rather than a countable algorithm.
## Maximal Orthonormal Families
### Existence Beyond Separable Spaces
A Hilbert space may be too large to have a countable [dense subset](/page/Dense%20Subset). In that setting, the right question is not how to list the basis, but whether a maximal orthonormal family exists and whether maximality forces completeness.
The relevant definition captures the idea that no new orthonormal direction can be added.
[definition: Maximal Orthonormal Set]
Let $H$ be a Hilbert space. An orthonormal set $E \subset H$ is maximal if there is no orthonormal set $F \subset H$ such that $E \subsetneq F$.
[/definition]
The important question is whether maximality is merely a stopping rule or whether it has analytic force. If a nonzero vector were orthogonal to every element of $E$, normalizing it would produce a new direction and enlarge $E$. Thus maximality is expected to rule out exactly the hidden perpendicular directions that obstruct completeness.
[quotetheorem:4887]
This theorem converts the problem of constructing bases into the problem of finding maximal orthonormal sets. Chains of orthonormal sets have unions that remain orthonormal, so [Zorn's lemma](/theorems/1226) supplies such maximal families. Combining those two ideas gives the general existence statement: every Hilbert space has an orthonormal basis, although the indexing set may be uncountable.
The existence argument is nonconstructive in large spaces. It guarantees coordinates, but not a practical indexing scheme. For applications, separability remains the condition that makes bases usable as sequences.
### Dimension
Once bases exist, a natural classification question appears. In finite-dimensional linear algebra, all bases have the same number of vectors; Hilbert spaces have an analogous invariant, but it counts orthonormal basis vectors rather than Hamel basis vectors.
[definition: Hilbert Dimension]
Let $H$ be a Hilbert space. The Hilbert dimension of $H$ is the cardinality of any orthonormal basis of $H$.
[/definition]
This definition is meaningful only if the cardinality does not depend on which orthonormal basis is chosen. The possible obstruction is that a Hilbert space might admit two orthonormal bases of different sizes, making the phrase "the Hilbert dimension" ambiguous. The invariance result below rules out that obstruction: the number of perpendicular coordinate directions is determined by the space, not by the particular basis chosen.
[quotetheorem:4933]
Hilbert dimension should not be confused with Hamel dimension. An infinite-dimensional separable Hilbert space has countable Hilbert dimension, while its Hamel dimension is uncountable: by a Baire category argument, no infinite-dimensional Banach space has a countable Hamel basis.
## Coordinate Isomorphisms
### The $\ell^2$ Model
Once an orthonormal basis is fixed, a Hilbert space can be compared directly with an $\ell^2$ space over the index set of that basis. This is the precise form of the slogan that Hilbert spaces are Euclidean spaces with possibly infinitely many perpendicular axes.
To state the coordinate model for arbitrary index sets, we need the square-summable scalar space over a set.
[definition: $\ell^2(I)$]
Let $I$ be a set and let $\mathbb F$ be $\mathbb R$ or $\mathbb C$. The space $\ell^2(I)$ consists of all functions $a:I\to\mathbb F$ such that
\begin{align*}
\sum_{i\in I}|a_i|^2 < \infty,
\end{align*}
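where this sum of nonnegative terms is defined as the supremum of its finite partial sums. Its inner product is
\begin{align*}
(a,b)_{\ell^2(I)} = \sum_{i\in I} a_i\overline{b_i},
\end{align*}
where the sum converges absolutely because $|a_i\overline{b_i}|\le\frac12\left(|a_i|^2+|b_i|^2\right)$.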
[/definition]
The definition allows uncountable $I$, but each element of $\ell^2(I)$ has countable support. The coordinate theorem now asks for more than an inequality: it asks whether every square-summable scalar family comes from a unique vector and whether every vector is recovered from its coefficients.
[quotetheorem:4889]
This theorem is the strongest coordinate statement. It says that after choosing an orthonormal basis, all Hilbert-space questions can be transported to $\ell^2(E)$, where geometry is encoded by scalar sequences.
[example: The Standard Basis of $\ell^2$]
Let $H=\ell^2(\mathbb N)$, and let $e_n$ be the sequence whose $k$th coordinate is
\begin{align*}
(e_n)_k=
\begin{cases}
1, & k=n,\\
0, & k\ne n.
\end{cases}
\end{align*}
For $x=(x_k)_{k\in\mathbb N}\in\ell^2(\mathbb N)$, the $n$th coefficient is
\begin{align*}
(x,e_n)_H
&=\sum_{k=1}^{\infty}x_k\overline{(e_n)_k}\\
&=x_n\overline{(e_n)_n}+\sum_{\substack{k=1\\ k\ne n}}^{\infty}x_k\overline{(e_n)_k}\\
&=x_n\overline{1}+\sum_{\substack{k=1\\ k\ne n}}^{\infty}x_k\overline{0}\\
&=x_n.
\end{align*}
Thus the coordinate map sends $x$ to the same scalar sequence $(x_n)_{n\in\mathbb N}$.
The norm identity from *Parseval Identity* becomes
\begin{align*}
\|x\|_{\ell^2}^2
&=\sum_{n=1}^{\infty}|(x,e_n)_H|^2\\
&=\sum_{n=1}^{\infty}|x_n|^2,
\end{align*}
which is exactly the defining squared norm on $\ell^2(\mathbb N)$. For the partial sums,
\begin{align*}
\sum_{n=1}^N (x,e_n)_H e_n
&=\sum_{n=1}^N x_n e_n\\
&=(x_1,x_2,\ldots,x_N,0,0,\ldots).
\end{align*}
Therefore the error is
\begin{align*}
x-\sum_{n=1}^N x_n e_n
&=(0,0,\ldots,0,x_{N+1},x_{N+2},\ldots),
\end{align*}
and its squared norm is
\begin{align*}
\left\|x-\sum_{n=1}^N x_n e_n\right\|_{\ell^2}^2
&=\sum_{k=N+1}^{\infty}|x_k|^2.
\end{align*}
Since $x\in\ell^2(\mathbb N)$, the series $\sum_{k=1}^{\infty}|x_k|^2$ converges, so its tails satisfy
\begin{align*}
\sum_{k=N+1}^{\infty}|x_k|^2 \to 0.
\end{align*}
Hence
\begin{align*}
\sum_{n=1}^N (x,e_n)_H e_n \to x
\end{align*}
in $\ell^2$ norm. The abstract orthonormal-basis expansion is therefore the usual reconstruction of a square-summable sequence from its coordinates.
[/example]
### Unitary Changes of Coordinates
Different orthonormal bases give different coordinate labels, but the underlying Hilbert geometry should not change. The maps that preserve this geometry are unitary operators, and they are the correct notion of change of orthonormal coordinates.
[definition: Unitary Operator]
Let $H$ and $K$ be Hilbert spaces over the same scalar field. A [linear map](/page/Linear%20Map) $U:H\to K$ is unitary if it is surjective and
\begin{align*}
(Ux,Uy)_K = (x,y)_H
\end{align*}
for every $x,y\in H$.
[/definition]
The key test for a change of coordinates is whether it transports complete perpendicular coordinate systems to complete perpendicular coordinate systems. A unitary operator preserves lengths, inner products, orthogonality, projections, and convergence of series, so it should preserve both parts of the orthonormal-basis condition.
[quotetheorem:4934]
This result is often used in reverse: to prove a family is an orthonormal basis, identify it as the unitary image of a known one.
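A finite-dimensional instance of this principle is the unitary discrete Fourier transform on $\mathbb C^n$, with matrix entries $e^{-2\pi i jk/n}/\sqrt n$. It sends the standard basis to the discrete Fourier vectors, so those vectors form an orthonormal basis. The minimal Python sketch below, with $n=8$ and the test vector chosen arbitrarily, checks their Gram matrix numerically and verifies Parseval's norm identity in the new coordinates.

```python
import cmath
import math


def inner(x, y):
    """Complex inner product, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def fourier_vectors(n):
    """Images U e_k of the standard basis under the unitary DFT matrix.

    U has entries exp(-2*pi*i*j*k/n) / sqrt(n); its k-th column is returned.
    """
    return [
        [cmath.exp(-2j * math.pi * j * k / n) / math.sqrt(n) for j in range(n)]
        for k in range(n)
    ]


n = 8
F = fourier_vectors(n)
gram = [[inner(F[a], F[b]) for b in range(n)] for a in range(n)]
max_dev = max(abs(gram[a][b] - (1 if a == b else 0)) for a in range(n) for b in range(n))
print(max_dev)   # rounding-level: the columns form an orthonormal basis of C^8

# Parseval in the new coordinates: ||x||^2 = sum_k |(x, f_k)|^2.
x = [1, 2j, -3, 0.5, 1 - 1j, 0, 4, 2]
print(inner(x, x).real, sum(abs(inner(x, f)) ** 2 for f in F))
```

In finite dimensions completeness is automatic once there are $n$ orthonormal vectors; the point of the sketch is only that orthonormality transfers through the unitary map.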
## Approximation and Operators
Orthonormal bases are not only coordinate systems for vectors; they also turn many operator questions into questions about scalar sequences. This is especially powerful when the operator respects the basis directions.
The simplest such operators scale each basis vector independently. These are the Hilbert-space analogue of diagonal matrices.
[definition: Diagonal Operator with Respect to an Orthonormal Basis]
Let $H$ be a Hilbert space with orthonormal basis $E$, and let $(\lambda_e)_{e\in E}$ be a family of scalars. Define
\begin{align*}
\mathcal D(T)=\left\{x\in H: \sum_{e\in E}|\lambda_e|^2 |(x,e)_H|^2<\infty\right\}.
\end{align*}
A linear operator
\begin{align*}
T: \mathcal D(T) &\to H \\
x &\mapsto \sum_{e\in E}\lambda_e (x,e)_H e
\end{align*}
is diagonal with respect to $E$.
[/definition]
The problem is to determine when this diagonal rule defines an operator on all of $H$ and how large that operator is. In finite dimensions the largest diagonal magnitude controls the matrix norm; in the Hilbert-space setting the same principle survives exactly when the diagonal entries are bounded.
[quotetheorem:4935]
This theorem reduces the norm of a diagonal operator to the supremum of its diagonal entries. It is a prototype for spectral theory, where suitable operators are studied by finding orthonormal bases adapted to them.
[example: Compact Diagonal Operator]
On $\ell^2(\mathbb N)$, define
\begin{align*}
T(x_1,x_2,x_3,\ldots)
&=\left(x_1,\frac{x_2}{2},\frac{x_3}{3},\ldots\right).
\end{align*}
For $x=(x_n)_{n\in\mathbb N}\in \ell^2(\mathbb N)$,
\begin{align*}
\|Tx\|_{\ell^2}^2
&=\sum_{n=1}^{\infty}\left|\frac{x_n}{n}\right|^2\\
&=\sum_{n=1}^{\infty}\frac{|x_n|^2}{n^2}\\
&\le \sum_{n=1}^{\infty}|x_n|^2\\
&=\|x\|_{\ell^2}^2.
\end{align*}
Thus $T$ is bounded and $\|T\|\le 1$. Since
\begin{align*}
Te_1=e_1,
\end{align*}
we also have
\begin{align*}
\|T\|
&\ge \frac{\|Te_1\|_{\ell^2}}{\|e_1\|_{\ell^2}}\\
&=\frac{1}{1}\\
&=1.
\end{align*}
Therefore $\|T\|=1$.
For $N\in\mathbb N$, define the finite-rank truncation
\begin{align*}
T_N(x_1,x_2,x_3,\ldots)
&=\left(x_1,\frac{x_2}{2},\ldots,\frac{x_N}{N},0,0,\ldots\right).
\end{align*}
Its range is contained in $\operatorname{span}\{e_1,\ldots,e_N\}$, so $T_N$ has finite rank. For $x\in\ell^2(\mathbb N)$,
\begin{align*}
(T-T_N)x
&=\left(0,\ldots,0,\frac{x_{N+1}}{N+1},\frac{x_{N+2}}{N+2},\ldots\right),
\end{align*}
and hence
\begin{align*}
\|(T-T_N)x\|_{\ell^2}^2
&=\sum_{n=N+1}^{\infty}\frac{|x_n|^2}{n^2}\\
&\le \frac{1}{(N+1)^2}\sum_{n=N+1}^{\infty}|x_n|^2\\
&\le \frac{1}{(N+1)^2}\|x\|_{\ell^2}^2.
\end{align*}
Thus $\|T-T_N\|\le 1/(N+1)$. Taking $x=e_{N+1}$ gives
\begin{align*}
\|(T-T_N)e_{N+1}\|_{\ell^2}
&=\left\|\frac{1}{N+1}e_{N+1}\right\|_{\ell^2}\\
&=\frac{1}{N+1},
\end{align*}
so
\begin{align*}
\|T-T_N\|=\frac{1}{N+1}.
\end{align*}
Since $1/(N+1)\to 0$, the finite-rank operators $T_N$ converge to $T$ in operator norm. By *[Finite-Rank Operators are Compact](/theorems/4891)* and *Norm Limit of Compact Operators*, $T$ is compact. This example shows that, in orthonormal coordinates, compactness appears as decay of the diagonal entries $\lambda_n=1/n$ to $0$.
[/example]
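The estimate $\|T-T_N\|=1/(N+1)$ can be probed numerically. This minimal Python sketch assumes finitely supported vectors, truncated to 50 coordinates, and random Gaussian test vectors; sampling only bounds the operator norm from below, so it illustrates the computation rather than proving it. The ratio never exceeds $1/(N+1)$, and $e_{N+1}$ attains it.

```python
import math
import random


def norm(x):
    """Euclidean (l^2) norm of a finite coefficient list."""
    return math.sqrt(sum(abs(xk) ** 2 for xk in x))


def tail_operator(N, x):
    """(T - T_N) x for T = diag(1, 1/2, 1/3, ...): keep x_n / n only for n > N."""
    return [xk / n if n > N else 0.0 for n, xk in enumerate(x, start=1)]


def error_ratio(N, x):
    """||(T - T_N) x|| / ||x|| for a nonzero finitely supported x."""
    return norm(tail_operator(N, x)) / norm(x)


dim, N = 50, 5                      # x is truncated to `dim` coordinates
rng = random.Random(1)
worst = max(
    error_ratio(N, [rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(200)
)
e_next = [1.0 if n == N + 1 else 0.0 for n in range(1, dim + 1)]
print(worst, error_ratio(N, e_next), 1 / (N + 1))
```

The diagonal structure is what makes the norm explicit: $T-T_N$ is again diagonal, with entries $0$ for $n\le N$ and $1/n$ for $n>N$.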
The same basis language underlies [Fourier series](/page/Fourier%20Series), Sturm-Liouville theory, spectral decompositions of compact [self-adjoint operators](/page/Self-Adjoint%20Operators), and weak convergence arguments in Hilbert spaces.
## Beyond and Connected Topics
Orthonormal bases are the coordinate foundation for [Hilbert Space](/page/Hilbert%20Space) theory. The next layer is projection theory: closed subspaces, orthogonal complements, and the decomposition $H=M\oplus M^\perp$ explain why best approximation exists and why least-squares problems have geometric solutions.
They also lead directly into [Weak Convergence](/page/Weak%20Convergence). In a Hilbert space with orthonormal basis $(e_n)_{n\in\mathbb N}$, a sequence $(x_k)$ converges weakly to $x$ if and only if it is bounded and $(x_k,e_n)_H\to(x,e_n)_H$ for every $n$. This is one reason orthonormal bases appear throughout compactness arguments in PDE and variational problems.
Spectral theory is the operator-theoretic continuation. By the spectral theorem, every compact self-adjoint operator on a Hilbert space admits an orthonormal basis of eigenvectors, turning an infinite-dimensional operator into a diagonal coordinate rule. This is the Hilbert-space version of diagonalizing a symmetric matrix.
Fourier analysis is the classical analytic counterpart. Trigonometric systems, sine and cosine systems, Hermite functions, and wavelets are all examples of orthonormal systems designed to match a particular operator, domain, or scale structure.
In PDE, orthonormal bases are used to construct Galerkin approximations. A solution is first sought in the span of finitely many basis functions, estimates are made uniformly in the dimension, and compactness or weak convergence is used to pass to a limit.
## References
Androma, [Hilbert Space](/page/Hilbert%20Space).
Androma, [Weak Convergence](/page/Weak%20Convergence).
Reed and Simon, *Methods of Modern Mathematical Physics I: Functional Analysis* (1980).
Conway, *A Course in Functional Analysis* (1990).
Brezis, *Functional Analysis, Sobolev Spaces and Partial Differential Equations* (2011).