Geometry lives in the notion of angle. You can have a vector space (a collection of objects you can add and scale) without any concept of perpendicularity, and without any way to measure how "close" two directions are to each other. Linear algebra as developed so far gives you span, independence, bases, and dimension, but nothing that distinguishes the standard basis vectors of $\mathbb{R}^2$ from any other basis. The moment you ask "are these two vectors orthogonal?" or "what is the projection of $v$ onto $u$?", you are reaching for something more: a rule that turns pairs of vectors into numbers, encoding their geometric relationship. That rule is an inner product.
The payoff is enormous. From a single inner product, you recover lengths (by setting both inputs equal), angles (via the ratio of inner product to lengths), orthogonality, orthonormal bases, and eventually an entire calculus of projections. The [Gram-Schmidt process](/page/Gram-Schmidt%20Process) turns any basis of a finite-dimensional inner product space into an orthonormal one. The [spectral theorem](/page/Spectral%20Theorem), one of the deepest results in linear algebra, says that self-adjoint operators on finite-dimensional inner product spaces are diagonalizable by an orthonormal basis. Much of the theory of [Fourier series](/page/Fourier%20Series) and $L^2$ spaces is an infinite-dimensional echo of the geometry we develop here.
[example: Dot Product in Euclidean Space]
In $\mathbb{R}^n$, the familiar dot product is
\begin{align*}
\langle v, w \rangle &= \sum_{i=1}^{n} v_i w_i.
\end{align*}
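With the Euclidean length $\|v\| = \sqrt{\langle v, v \rangle}$, this gives $|\langle v, w \rangle| \le \|v\| \, \|w\|$ (Cauchy-Schwarz), with equality iff one of $v, w$ is a scalar multiple of the other. For nonzero $v$ and $w$, the angle $\theta \in [0, \pi]$ between them satisfies
\begin{align*}
\cos \theta &= \frac{\langle v, w \rangle}{\|v\| \, \|w\|}.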
\end{align*}
For example, $v = (1, 1, 0)$ and $w = (1, -1, 0)$ satisfy $\langle v, w \rangle = 1 \cdot 1 + 1 \cdot (-1) + 0 = 0$, so $\theta = \pi/2$: these vectors are orthogonal. The dot product is the prototype that the abstract definition will axiomatize.
[/example]
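To make the example concrete, here is a minimal Python sketch (standard library only; the helper names `dot`, `norm`, and `angle` are illustrative) that computes the dot product, the induced length, and the angle for the vectors above. The cosine is clamped to $[-1, 1]$ before `math.acos` only to absorb floating-point rounding; Cauchy-Schwarz guarantees that the exact value already lies in that range.

```python
import math

def dot(v, w):
    """Standard dot product on R^n: the sum of v_i * w_i."""
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    """Euclidean length, obtained from the dot product of v with itself."""
    return math.sqrt(dot(v, v))

def angle(v, w):
    """Angle in [0, pi] between nonzero vectors v and w."""
    c = dot(v, w) / (norm(v) * norm(w))
    # Clamp to [-1, 1] to guard against rounding just outside the range.
    return math.acos(max(-1.0, min(1.0, c)))

v, w = (1, 1, 0), (1, -1, 0)
print(dot(v, w))     # 0
print(angle(v, w))   # 1.5707963267948966, i.e. pi/2
```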
But $\mathbb{R}^n$ with the dot product is only one example. The space $\mathbb{C}^n$ of complex $n$-tuples needs a Hermitian inner product to give real, non-negative lengths. The space of continuous functions $C([0,1])$ carries an $L^2$ inner product. The abstract definition captures all of these at once.
## Definition
What properties does the dot product actually use? It is bilinear (or sesquilinear over $\mathbb{C}$), symmetric (or conjugate-symmetric), and positive definite. These are precisely the axioms we isolate.
[definition: Inner Product]
Let $V$ be a vector space over a field $\mathbb{F}$, where $\mathbb{F}$ is either $\mathbb{R}$ or $\mathbb{C}$. An **inner product** on $V$ is a function
\begin{align*}
(\cdot, \cdot)_V : V \times V &\to \mathbb{F}
\end{align*}
satisfying the following three axioms for all $u, v, w \in V$ and all $\alpha \in \mathbb{F}$:
1. **Linearity in the first argument:** $(u + \alpha v, w)_V = (u, w)_V + \alpha (v, w)_V$.
2. **Conjugate symmetry:** $(u, v)_V = \overline{(v, u)_V}$.
3. **Positive definiteness:** $(v, v)_V \ge 0$, with $(v, v)_V = 0$ iff $v = 0$.
When $\mathbb{F} = \mathbb{R}$, conjugate symmetry reduces to ordinary symmetry: $(u, v)_V = (v, u)_V$.
[/definition]
[remark: Linearity Convention]
The convention here is linearity in the first argument and conjugate-linearity in the second. This matches the convention used throughout this text (and in most functional analysis), where $(f, g)_{L^2} = \int f \bar{g} \, d\mathcal{L}^n$ is linear in $f$ and conjugate-linear in $g$. Some textbooks (particularly in physics) reverse this convention, so be alert to the difference when reading other sources.
[/remark]
With an inner product in hand, we can name the structure it equips a vector space with. This is not merely a formality: the definition packages together the vector space and its inner product as a unit, so that statements about the geometry — angles, lengths, projections — refer unambiguously to a specific pairing, not just the underlying set of vectors.
[definition: Inner Product Space]
An **inner product space** is a pair $(V, (\cdot, \cdot)_V)$ where $V$ is a vector space over $\mathbb{F} \in \{\mathbb{R}, \mathbb{C}\}$ and $(\cdot, \cdot)_V$ is an inner product on $V$.
[/definition]
Every inner product induces a norm. The formula below is a definition, but the fact that it produces a norm is a theorem: positive definiteness guarantees that $(v, v)_V$ is a non-negative real number, so its square root is well defined, and the remaining norm axioms must then be checked.
[definition: Induced Norm]
Let $(V, (\cdot, \cdot)_V)$ be an inner product space. The **induced norm** (or **norm induced by the inner product**) is
\begin{align*}
\|v\|_V &:= \sqrt{(v, v)_V}
\end{align*}
for $v \in V$.
[/definition]
That $\|\cdot\|_V$ satisfies the triangle inequality — the hardest norm axiom to verify — follows from the Cauchy-Schwarz inequality, which we prove in the next section. The other axioms are immediate: $\|v\|_V \ge 0$ by positive definiteness, $\|\alpha v\|_V = |\alpha| \|v\|_V$ because $(\alpha v, \alpha v)_V = \alpha \bar{\alpha} (v,v)_V = |\alpha|^2 (v,v)_V$, and $\|v\|_V = 0$ iff $v = 0$ again by positive definiteness.
[example: Standard Inner Products]
The most important examples of inner product spaces:
**1. Euclidean space $\mathbb{R}^n$.** Let $v = (v_1, \ldots, v_n)$ and $w = (w_1, \ldots, w_n)$. The standard inner product is
\begin{align*}
(v, w)_{\mathbb{R}^n} &= \sum_{i=1}^{n} v_i w_i.
\end{align*}
This is real-valued, symmetric, and positive definite. The induced norm is the Euclidean norm $\|v\|_{\mathbb{R}^n} = \sqrt{\sum_{i=1}^n v_i^2}$.
**2. Unitary space $\mathbb{C}^n$.** For $v, w \in \mathbb{C}^n$:
\begin{align*}
(v, w)_{\mathbb{C}^n} &= \sum_{i=1}^{n} v_i \overline{w_i}.
\end{align*}
Note the conjugate on $w_i$: this ensures $(v, v)_{\mathbb{C}^n} = \sum_i |v_i|^2 \ge 0$. Without the conjugate, taking $v = (i, 0, \ldots, 0)$ gives $(v, v) = i^2 = -1$, violating positive definiteness.
**3. $L^2([a,b])$.** On the space of square-integrable functions $f, g: [a,b] \to \mathbb{C}$:
\begin{align*}
(f, g)_{L^2} &= \int_a^b f(x) \overline{g(x)} \, d\mathcal{L}^1(x).
\end{align*}
Here $\mathcal{L}^1$ is Lebesgue measure on $[a,b]$. This is the inner product that makes Fourier analysis work: the trigonometric functions $\{e^{2\pi i n x}\}_{n \in \mathbb{Z}}$ are orthonormal in $L^2([0,1])$.
[/example]
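The role of the conjugate in the $\mathbb{C}^n$ inner product can be checked directly. The sketch below (hypothetical function names, using Python's built-in `complex` type) compares the Hermitian inner product with the same sum taken without the conjugate, for the vector $v = (i, 0)$ from the example.

```python
def inner_cn(v, w):
    """Standard inner product on C^n: linear in v, conjugate-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

def bilinear_no_conj(v, w):
    """The same sum without the conjugate; NOT an inner product on C^n."""
    return sum(a * b for a, b in zip(v, w))

v = (1j, 0j)
print(inner_cn(v, v))          # equals 1, i.e. |i|^2
print(bilinear_no_conj(v, v))  # equals -1, i.e. i^2, so positivity fails
```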
The standard inner product on $\mathbb{R}^n$ and the $L^2$ inner product on function spaces show that inner products are plentiful, but not every norm you encounter will come from one. The question of which norms arise from inner products has a clean, testable answer — and the failure of that test reveals the geometric rigidity that an inner product imposes.
[example: The $\ell^1$ Norm Is Not an Inner Product Norm]
Not every norm comes from an inner product. The $\ell^1$ norm on $\mathbb{R}^2$ — $\|(v_1, v_2)\|_1 = |v_1| + |v_2|$ — satisfies the triangle inequality but does not arise from any inner product. The precise criterion is the [parallelogram law](/page/Parallelogram%20Law): a norm comes from an inner product if and only if $\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$ for all $u, v$.
For $\ell^1$: take $u = (1, 0)$ and $v = (0, 1)$. Then $\|u + v\|_1 = \|(1,1)\|_1 = 2$ and $\|u - v\|_1 = \|(1,-1)\|_1 = 2$, so the left side is $2^2 + 2^2 = 8$. The right side is $2\|u\|_1^2 + 2\|v\|_1^2 = 2 \cdot 1 + 2 \cdot 1 = 4$. The parallelogram law fails, so $\ell^1$ is not an inner product norm.
[/example]
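The parallelogram test is easy to run numerically. This sketch (helper names are illustrative) evaluates both sides of the law for the $\ell^1$ norm and for the Euclidean norm on the vectors $u = (1, 0)$, $v = (0, 1)$ used above.

```python
def norm_l1(v):
    return sum(abs(x) for x in v)

def norm_l2(v):
    return sum(x * x for x in v) ** 0.5

def parallelogram_sides(norm, u, v):
    """Return (LHS, RHS) of ||u+v||^2 + ||u-v||^2 = 2||u||^2 + 2||v||^2."""
    s = [a + b for a, b in zip(u, v)]
    d = [a - b for a, b in zip(u, v)]
    lhs = norm(s) ** 2 + norm(d) ** 2
    rhs = 2 * norm(u) ** 2 + 2 * norm(v) ** 2
    return lhs, rhs

u, v = (1, 0), (0, 1)
print(parallelogram_sides(norm_l1, u, v))  # (8, 4): the law fails for l^1
print(parallelogram_sides(norm_l2, u, v))  # both sides equal 4 up to rounding
```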
## The Cauchy-Schwarz Inequality
The foundational inequality of inner product spaces is Cauchy-Schwarz. It is not a theorem about geometry — it is a consequence purely of the axioms, and it unlocks nearly everything else: the triangle inequality, the definition of angle, the continuity of the inner product, and eventually the Riesz representation theorem.
The key idea behind the proof is to ask: for fixed $u$ and $v$, what choice of scalar $t \in \mathbb{F}$ minimizes $\|u - tv\|_V^2$? Since $\|u - tv\|_V^2 \ge 0$ for every $t$, expanding the square and substituting the minimizing $t$ yields an inequality between $|(u, v)_V|$ and $\|u\|_V \|v\|_V$.
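For $v \ne 0$ the minimizer is $t^* = (u, v)_V / \|v\|_V^2$, and substituting it gives $\|u - t^* v\|_V^2 = \|u\|_V^2 - |(u, v)_V|^2 / \|v\|_V^2$, whose non-negativity is exactly the Cauchy-Schwarz inequality. The sketch below checks this identity on random vectors in $\mathbb{C}^4$ with the standard inner product; the seed, dimension, sample count, and tolerance are arbitrary choices, and a numerical check illustrates the inequality rather than proving it.

```python
import random

def inner(v, w):
    """Standard inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

def sq_norm(v):
    """||v||^2 = (v, v), which is real and non-negative."""
    return inner(v, v).real

def residual_after_best_t(u, v):
    """||u - t* v||^2 for the minimizer t* = (u, v) / ||v||^2 (requires v != 0)."""
    t = inner(u, v) / sq_norm(v)
    return sq_norm([a - t * b for a, b in zip(u, v)])

random.seed(0)
for _ in range(1000):
    u = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
    v = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
    predicted = sq_norm(u) - abs(inner(u, v)) ** 2 / sq_norm(v)
    assert abs(residual_after_best_t(u, v) - predicted) < 1e-9
    assert abs(inner(u, v)) ** 2 <= sq_norm(u) * sq_norm(v) + 1e-9
print("identity and Cauchy-Schwarz hold on all samples")
```

Equality corresponds to a zero residual, which happens exactly when $u$ is a multiple of $v$.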
[quotetheorem:432]
Cauchy-Schwarz makes it legitimate to define the angle between two nonzero vectors in a real inner product space. Since $|(u, v)_V| \le \|u\|_V \|v\|_V$, the ratio $(u, v)_V / (\|u\|_V \|v\|_V)$ lies in $[-1, 1]$, and the angle $\theta \in [0, \pi]$ is well-defined by $\cos \theta = (u, v)_V / (\|u\|_V \|v\|_V)$.
With Cauchy-Schwarz in hand, the triangle inequality for $\|\cdot\|_V$ follows immediately. For any $u, v \in V$:
\begin{align*}
\|u + v\|_V^2 &= (u + v, u + v)_V \\
&= \|u\|_V^2 + 2\operatorname{Re}(u, v)_V + \|v\|_V^2 \\
&\le \|u\|_V^2 + 2|(u, v)_V| + \|v\|_V^2 \\
&\le \|u\|_V^2 + 2\|u\|_V \|v\|_V + \|v\|_V^2 \\
&= (\|u\|_V + \|v\|_V)^2.
\end{align*}
Taking square roots gives $\|u + v\|_V \le \|u\|_V + \|v\|_V$.
[example: Cauchy-Schwarz for Integrals]
Take $V = L^2([0,1])$ with the standard $L^2$ inner product. Let $f(x) = x$ and $g(x) = \sqrt{x}$, so both $f, g \in L^2([0,1])$.
We compute each quantity. First,
\begin{align*}
(f, g)_{L^2} &= \int_0^1 x \cdot \sqrt{x} \, d\mathcal{L}^1 = \int_0^1 x^{3/2} \, d\mathcal{L}^1 = \left[\frac{x^{5/2}}{5/2}\right]_0^1 = \frac{2}{5}.
\end{align*}
Next,
\begin{align*}
\|f\|_{L^2}^2 &= \int_0^1 x^2 \, d\mathcal{L}^1 = \frac{1}{3}, \qquad \|g\|_{L^2}^2 = \int_0^1 x \, d\mathcal{L}^1 = \frac{1}{2}.
\end{align*}
So $\|f\|_{L^2} = 1/\sqrt{3}$ and $\|g\|_{L^2} = 1/\sqrt{2}$. Cauchy-Schwarz asserts
\begin{align*}
\frac{2}{5} &\le \frac{1}{\sqrt{3}} \cdot \frac{1}{\sqrt{2}} = \frac{1}{\sqrt{6}}.
\end{align*}
Indeed $2/5 = 0.4$ and $1/\sqrt{6} \approx 0.408$, so the inequality holds with strict inequality (as expected, since $f$ and $g$ are not proportional as functions).
[/example]
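The three integrals can also be approximated numerically as a sanity check. The sketch below uses a composite midpoint rule (an arbitrary choice of quadrature with $100{,}000$ subintervals); since $f$ and $g$ are real-valued, no conjugate is needed.

```python
import math

def midpoint(func, a, b, n=100_000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def f(x):
    return x

def g(x):
    return math.sqrt(x)

ip = midpoint(lambda x: f(x) * g(x), 0.0, 1.0)            # (f, g), exact value 2/5
nf = math.sqrt(midpoint(lambda x: f(x) ** 2, 0.0, 1.0))   # ||f||, exact 1/sqrt(3)
ng = math.sqrt(midpoint(lambda x: g(x) ** 2, 0.0, 1.0))   # ||g||, exact 1/sqrt(2)
print(ip, nf * ng)   # roughly 0.4 and 0.408
```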
## Orthogonality
Orthogonality is the heart of inner product geometry. Two vectors are orthogonal when their inner product is zero; geometrically, they are at right angles, and neither has a component in the direction of the other. The leverage this gives is enormous: in an orthogonal or orthonormal set, computing projections and decompositions becomes a matter of taking individual inner products, with no need to solve a system of equations.
[definition: Orthogonality]
Let $(V, (\cdot, \cdot)_V)$ be an inner product space. Two vectors $u, v \in V$ are **orthogonal**, written $u \perp v$, if $(u, v)_V = 0$.
A set $S \subset V$ is called **orthogonal** if $(u, v)_V = 0$ for all distinct $u, v \in S$. It is called **orthonormal** if additionally $(v, v)_V = 1$ for all $v \in S$, i.e., $\|v\|_V = 1$ for all $v \in S$.
[/definition]
Two important structural results follow directly from the definition of orthogonality. The first is the Pythagorean theorem, which in the abstract setting says that orthogonal vectors contribute independently to the squared length of their sum.
[quotetheorem:3266]
The proof is immediate from linearity and conjugate symmetry: $(u + v, u + v)_V = \|u\|_V^2 + (u, v)_V + (v, u)_V + \|v\|_V^2 = \|u\|_V^2 + \|v\|_V^2$, since $(u, v)_V = 0$ and $(v, u)_V = \overline{(u, v)_V} = 0$. The general case of finitely many mutually orthogonal vectors follows by induction.
The second structural result says that orthogonality forces linear independence — an orthogonal set with no zero vectors cannot have any redundancy.
[quotetheorem:3307]
The argument is clean: if $\alpha_1 v_1 + \cdots + \alpha_k v_k = 0$ with the $v_j$ mutually orthogonal and nonzero, take the inner product of both sides with $v_j$:
\begin{align*}
0 &= \left(\sum_{i=1}^k \alpha_i v_i, v_j\right)_V = \sum_{i=1}^k \alpha_i (v_i, v_j)_V = \alpha_j \|v_j\|_V^2.
\end{align*}
Since $\|v_j\|_V \ne 0$, we get $\alpha_j = 0$. This holds for every $j$, so the set is linearly independent.
Given a subset $W \subset V$, there is a canonical way to describe all vectors in $V$ that are perpendicular to everything in $W$. This is worth isolating: the set of such vectors is always a subspace, whether or not $W$ is one, and it will play a central role in the decomposition theorem below.
[definition: Orthogonal Complement]
Let $(V, (\cdot, \cdot)_V)$ be an inner product space and let $W \subset V$ be a nonempty subset. The **orthogonal complement** of $W$ is
\begin{align*}
W^\perp &:= \{v \in V : (v, w)_V = 0 \text{ for all } w \in W\}.
\end{align*}
[/definition]
That $W^\perp$ is always a subspace of $V$, even when $W$ itself is not, follows directly from linearity in the first argument: $0 \in W^\perp$, and if $(u, w)_V = 0$ and $(v, w)_V = 0$ for all $w \in W$, then $(\alpha u + \beta v, w)_V = \alpha (u, w)_V + \beta (v, w)_V = 0$ for all scalars $\alpha, \beta$.
The relationship between $W$ and $W^\perp$ in a finite-dimensional inner product space is particularly clean.
[quotetheorem:436]
This theorem tells you that every subspace has a canonical complement — the set of all vectors perpendicular to it. The decomposition $v = w + w^\perp$ gives rise to the orthogonal projection.
[definition: Orthogonal Projection]
In the setting of the theorem above, the **orthogonal projection onto $W$** is the linear map
\begin{align*}
P_W : V &\to W
\end{align*}
defined by $P_W v = w$, where $v = w + w^\perp$ is the unique orthogonal decomposition.
[/definition]
[remark: Projection as Best Approximation]
The projection $P_W v$ is the element of $W$ closest to $v$ in the norm induced by the inner product. That is,
\begin{align*}
\|v - P_W v\|_V &= \min_{w \in W} \|v - w\|_V.
\end{align*}
This best-approximation property is what makes projections useful in applications. For example, $\hat{x}$ is a least-squares solution of an overdetermined linear system $Ax = b$ (that is, $\hat{x}$ minimizes $\|Ax - b\|$) exactly when $A\hat{x} = P_W b$, where $W$ is the column space of $A$.
[/remark]
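As an illustration of the least-squares remark, the sketch below fits a line $y = x_1 + x_2 s$ to three hypothetical data points $(0, 1), (1, 2), (2, 2)$ by solving the normal equations $A^\top A x = A^\top b$ with exact rational arithmetic. The normal equations are one standard way to compute the projection (QR factorization is usually preferred in floating point); the check at the end confirms that the residual $b - A\hat{x}$ is orthogonal to both columns of $A$.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def least_squares_2col(c1, c2, b):
    """Solve the 2x2 normal equations (A^T A) x = A^T b for A = [c1 c2].

    Assumes c1, c2 are linearly independent, so A^T A is invertible.
    """
    g11, g12, g22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)  # Gram matrix A^T A
    r1, r2 = dot(c1, b), dot(c2, b)                          # A^T b
    det = g11 * g22 - g12 * g12
    x1 = Fraction(g22 * r1 - g12 * r2) / det                 # Cramer's rule
    x2 = Fraction(g11 * r2 - g12 * r1) / det
    return x1, x2

c1, c2, b = [1, 1, 1], [0, 1, 2], [1, 2, 2]
x1, x2 = least_squares_2col(c1, c2, b)
p = [x1 * a + x2 * c for a, c in zip(c1, c2)]   # A x_hat = P_W b
r = [bi - pi for bi, pi in zip(b, p)]           # residual b - P_W b
print(x1, x2)                  # 7/6 1/2
print(dot(r, c1), dot(r, c2))  # 0 0: the residual lies in W^perp
```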
The best-approximation characterization makes orthogonal projection the right tool whenever one needs the closest point in a subspace to a given vector. The following example works this out concretely in three dimensions.
[example: Projection in $\mathbb{R}^3$]
Let $V = \mathbb{R}^3$ with the standard inner product and let $W = \operatorname{span}\{e_1, e_2\}$, the $xy$-plane. Given $v = (3, 4, 5)$, we decompose $v = w + w^\perp$ where $w \in W$ and $w^\perp \in W^\perp = \operatorname{span}\{e_3\}$.
Since $W$ has the orthonormal basis $\{e_1, e_2\}$, the projection formula (developed in the next section) gives
\begin{align*}
w &= (v, e_1)_{\mathbb{R}^3} e_1 + (v, e_2)_{\mathbb{R}^3} e_2 = 3 e_1 + 4 e_2 = (3, 4, 0).
\end{align*}
Then $w^\perp = v - w = (0, 0, 5)$. We verify: $(w, w^\perp)_{\mathbb{R}^3} = 3 \cdot 0 + 4 \cdot 0 + 0 \cdot 5 = 0$. The distance from $v$ to $W$ is $\|w^\perp\|_{\mathbb{R}^3} = 5$.
The closest point in the $xy$-plane to $(3, 4, 5)$ is $(3, 4, 0)$: this is $(3, 4, 5)$ with its $z$-coordinate set to zero. Any other point $(a, b, 0) \in W$ satisfies $\|(3,4,5) - (a,b,0)\|^2 = (3-a)^2 + (4-b)^2 + 25 > 25$ unless $a = 3$ and $b = 4$.
[/example]
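The projection formula used in this example can be written as a short function. This sketch (illustrative names, standard library only) assumes the list `basis` is orthonormal; for a non-orthonormal spanning set, the formula $\sum_j (v, e_j) \, e_j$ would not give the orthogonal projection.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_orthonormal(v, basis):
    """P_W v = sum_j (v, e_j) e_j, valid when `basis` is orthonormal in R^n."""
    p = [0.0] * len(v)
    for e in basis:
        c = dot(v, e)
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p

e1, e2 = (1, 0, 0), (0, 1, 0)
v = (3, 4, 5)
w = project_orthonormal(v, [e1, e2])
w_perp = [a - b for a, b in zip(v, w)]
print(w, w_perp)   # [3.0, 4.0, 0.0] [0.0, 0.0, 5.0]
```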
## Orthonormal Bases and the Gram-Schmidt Process
Having an orthonormal basis in an inner product space transforms all computations. With a general basis $\{v_1, \ldots, v_n\}$, computing coordinates of a vector $u$ requires solving the linear system whose matrix is the Gram matrix $(v_i, v_j)$. With an orthonormal basis $\{e_1, \ldots, e_n\}$, the coordinates are simply inner products: the coefficient of $e_j$ in the expansion of $u$ is $(u, e_j)_V$.
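To see the difference concretely, the sketch below computes the coordinates of $u = (3, 2)$ in the non-orthogonal basis $v_1 = (1, 0)$, $v_2 = (1, 1)$ of $\mathbb{R}^2$ by solving the $2 \times 2$ Gram system, and then in the standard orthonormal basis by taking inner products. The vectors and helper names are hypothetical examples.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def coords_via_gram(u, v1, v2):
    """Coordinates (c1, c2) of u in a (not necessarily orthogonal) basis {v1, v2} of R^2.

    Solves the Gram system  c1 (v1, v_i) + c2 (v2, v_i) = (u, v_i),  i = 1, 2,
    by Cramer's rule; assumes v1, v2 are linearly independent.
    """
    g11, g12, g22 = dot(v1, v1), dot(v1, v2), dot(v2, v2)
    r1, r2 = dot(u, v1), dot(u, v2)
    det = g11 * g22 - g12 * g12
    return Fraction(g22 * r1 - g12 * r2, det), Fraction(g11 * r2 - g12 * r1, det)

u = (3, 2)
c1, c2 = coords_via_gram(u, (1, 0), (1, 1))
print(c1, c2)                          # 1 2: u = 1*(1, 0) + 2*(1, 1)
print(dot(u, (1, 0)), dot(u, (0, 1)))  # 3 2: orthonormal case, no system to solve
```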
The question is: does every finite-dimensional inner product space have an orthonormal basis? The answer is yes, and the constructive proof — the Gram-Schmidt process — is both simple and fundamental.
The idea is to take any basis and modify it one vector at a time, subtracting off the component in the direction of the vectors already orthogonalized. At each step, you subtract the projection onto the span of the previous vectors. The result is an orthogonal set; normalizing gives an orthonormal set.
[quotetheorem:435]
Why does $u_j \ne 0$ at each step? Because $v_j \notin \operatorname{span}\{v_1, \ldots, v_{j-1}\} = \operatorname{span}\{u_1, \ldots, u_{j-1}\}$ (by linear independence), and $u_j$ is $v_j$ minus its projection onto that span. If $u_j = 0$, then $v_j$ would equal its projection onto $\operatorname{span}\{u_1, \ldots, u_{j-1}\}$, contradicting linear independence.
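A direct transcription of the process into Python might look like the sketch below (function names are illustrative). It implements classical Gram-Schmidt for vectors in $\mathbb{C}^n$ with the standard inner product, normalizing at each step, so each $u_j$ is computed as $v_j - \sum_{i<j} (v_j, e_i) \, e_i$. In floating point, classical Gram-Schmidt can lose orthogonality for nearly dependent inputs; the modified variant, which projects the running vector $u$ instead of the original $v_j$, is the usual remedy.

```python
import math

def inner(v, w):
    """Standard inner product on C^n, linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def gram_schmidt(vectors, tol=1e-12):
    """Classical Gram-Schmidt: orthonormal list with the same span as `vectors`.

    Raises ValueError if some u_j is numerically zero, which signals that
    the inputs were linearly dependent.
    """
    basis = []
    for v in vectors:
        u = [complex(x) for x in v]
        for e in basis:
            c = inner(v, e)                      # coefficient (v_j, e_i)
            u = [ui - c * ei for ui, ei in zip(u, e)]
        length = math.sqrt(inner(u, u).real)
        if length < tol:
            raise ValueError("input vectors are linearly dependent")
        basis.append([ui / length for ui in u])
    return basis

E = gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
for ei in E:
    print([round(abs(inner(ei, ej)), 12) for ej in E])   # rows of the identity
```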
[example: Gram-Schmidt on Polynomials]
Let $V = \mathbb{R}[x]_{\le 2}$, the space of real polynomials of degree at most $2$, with the $L^2$ inner product
\begin{align*}
(p, q)_{L^2} &= \int_{-1}^{1} p(x) q(x) \, d\mathcal{L}^1.
\end{align*}
Start with the monomial basis $\{1, x, x^2\}$.
**Step 1.** $u_1 = 1$. We have $\|u_1\|_{L^2}^2 = \int_{-1}^1 1 \, d\mathcal{L}^1 = 2$, so $e_1 = 1/\sqrt{2}$.
**Step 2.** Compute the projection of $x$ onto $u_1$:
\begin{align*}
\frac{(x, u_1)_{L^2}}{\|u_1\|_{L^2}^2} &= \frac{\int_{-1}^1 x \, d\mathcal{L}^1}{2} = \frac{0}{2} = 0.
\end{align*}
So $u_2 = x - 0 \cdot 1 = x$. We have $\|u_2\|_{L^2}^2 = \int_{-1}^1 x^2 \, d\mathcal{L}^1 = 2/3$, so $e_2 = x / \sqrt{2/3} = x\sqrt{3/2}$.
**Step 3.** Project $x^2$ onto $u_1$ and $u_2$:
\begin{align*}
\frac{(x^2, u_1)_{L^2}}{\|u_1\|_{L^2}^2} &= \frac{\int_{-1}^1 x^2 \, d\mathcal{L}^1}{2} = \frac{2/3}{2} = \frac{1}{3}.
\end{align*}
\begin{align*}
\frac{(x^2, u_2)_{L^2}}{\|u_2\|_{L^2}^2} &= \frac{\int_{-1}^1 x^3 \, d\mathcal{L}^1}{2/3} = \frac{0}{2/3} = 0.
\end{align*}
So $u_3 = x^2 - \frac{1}{3} \cdot 1 - 0 \cdot x = x^2 - \frac{1}{3}$.
We verify orthogonality: $(u_3, u_1)_{L^2} = \int_{-1}^1 (x^2 - 1/3) \, d\mathcal{L}^1 = 2/3 - 2/3 = 0$. And $(u_3, u_2)_{L^2} = \int_{-1}^1 (x^2 - 1/3)x \, d\mathcal{L}^1 = \int_{-1}^1 x^3 \, d\mathcal{L}^1 - \frac{1}{3}\int_{-1}^1 x \, d\mathcal{L}^1 = 0 - 0 = 0$.
The resulting orthogonal polynomials $\{1, x, x^2 - 1/3\}$ are scalar multiples of the first three [Legendre polynomials](/page/Legendre%20Polynomials), illustrating how Gram-Schmidt on $L^2([-1,1])$ naturally produces classical orthogonal polynomials.
[/example]
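The polynomial computation can be reproduced exactly with rational arithmetic. In the sketch below, a polynomial is represented (by assumption) as a list of coefficients $[c_0, c_1, c_2, \ldots]$, and the inner product uses $\int_{-1}^{1} x^k \, d\mathcal{L}^1 = \frac{2}{k+1}$ for even $k$ and $0$ for odd $k$. Normalization is skipped so that every coefficient stays rational.

```python
from fractions import Fraction

def moment(k):
    """Integral of x^k over [-1, 1]: 2/(k+1) for even k, 0 for odd k."""
    return Fraction(2, k + 1) if k % 2 == 0 else Fraction(0)

def inner(p, q):
    """L^2([-1, 1]) inner product of real polynomials given as coefficient lists."""
    return sum((a * b * moment(i + j)
                for i, a in enumerate(p) for j, b in enumerate(q)), Fraction(0))

def gram_schmidt_unnormalized(polys):
    """Gram-Schmidt without the normalization step, so coefficients stay rational."""
    us = []
    for v in polys:
        u = [Fraction(c) for c in v]
        for w in us:
            coeff = inner(v, w) / inner(w, w)          # (v_j, u_i) / ||u_i||^2
            size = max(len(u), len(w))
            u = [(u[i] if i < len(u) else 0) - coeff * (w[i] if i < len(w) else 0)
                 for i in range(size)]
        us.append(u)
    return us

u1, u2, u3 = gram_schmidt_unnormalized([[1], [0, 1], [0, 0, 1]])   # 1, x, x^2
print([str(c) for c in u3])   # ['-1/3', '0', '1'], i.e. x^2 - 1/3
print(inner(u3, u3))          # 8/45
```

Multiplying $u_3$ by $\frac{3}{2}$ gives $\frac{3}{2}x^2 - \frac{1}{2}$, the second Legendre polynomial.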
Once we have an orthonormal basis $\{e_1, \ldots, e_n\}$, every vector $v \in V$ has the expansion
\begin{align*}
v &= \sum_{j=1}^{n} (v, e_j)_V \, e_j,
\end{align*}
and the inner product takes the simple form
\begin{align*}
(u, v)_V &= \sum_{j=1}^{n} (u, e_j)_V \overline{(v, e_j)_V}.
\end{align*}
This is Parseval's identity in finite dimensions. In the infinite-dimensional setting of $L^2$, the analogous statement — that the $L^2$ norm of a function equals the $\ell^2$ norm of its Fourier coefficients — is a central theorem of harmonic analysis.
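Both identities are easy to confirm numerically. The sketch below uses the orthonormal basis $e_1 = (1, i)/\sqrt{2}$, $e_2 = (1, -i)/\sqrt{2}$ of $\mathbb{C}^2$ (an arbitrary choice) and two arbitrary vectors $u$, $v$, reconstructing $u$ from its coefficients and comparing both sides of Parseval's identity.

```python
import math

def inner(v, w):
    """Standard inner product on C^n, linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

s = 1 / math.sqrt(2)
e = [(s, 1j * s), (s, -1j * s)]   # orthonormal basis of C^2 (an arbitrary choice)

u = (2 + 1j, -3j)
v = (1 - 2j, 4)

# Expansion: u = sum_j (u, e_j) e_j
coeffs = [inner(u, ej) for ej in e]
recon = [sum(c * ej[k] for c, ej in zip(coeffs, e)) for k in range(2)]

# Parseval: (u, v) = sum_j (u, e_j) * conj((v, e_j))
lhs = inner(u, v)
rhs = sum(inner(u, ej) * inner(v, ej).conjugate() for ej in e)
print(recon, lhs, rhs)
```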
[illustration:gram-schmidt-r2]
With orthonormal bases and projections in hand, the natural next question is: how does a linear operator interact with the inner product? In particular, can we "move" an operator from one argument of the inner product to the other — and what happens when we can?
## Adjoints and Self-Adjoint Operators
The inner product gives a way to "transpose" a linear operator. In $\mathbb{R}^n$ with the standard inner product, the adjoint of a matrix $A$ is its transpose $A^\top$, which is characterized by the identity $(Av) \cdot w = v \cdot (A^\top w)$ for all $v, w$. The abstract version of this, for operators on inner product spaces, leads to one of the most important classes of operators in linear algebra.
Given a linear operator $T: V \to V$ on an inner product space, can we always find another operator $T^*: V \to V$ satisfying $(Tu, v)_V = (u, T^*v)_V$ for all $u, v$? In finite dimensions, yes — uniquely.
[definition: Adjoint Operator]
Let $(V, (\cdot, \cdot)_V)$ be a finite-dimensional inner product space and let $T: V \to V$ be a linear operator. The **adjoint** of $T$ is the unique linear operator $T^*: V \to V$ satisfying
\begin{align*}
(Tu, v)_V &= (u, T^*v)_V \quad \text{for all } u, v \in V.
\end{align*}
[/definition]
Existence and uniqueness of $T^*$ in finite dimensions follow from the Riesz representation theorem, which in finite dimensions is elementary. Fix $v \in V$. The map $u \mapsto (Tu, v)_V$ is a linear functional on $V$, so there is a unique $w \in V$ with $(Tu, v)_V = (u, w)_V$ for all $u \in V$. Setting $T^* v := w$ defines $T^*$, and the uniqueness of $w$ shows that $T^*$ is linear.
In terms of the matrix representation with respect to an orthonormal basis $\{e_1, \ldots, e_n\}$: if $A$ is the matrix of $T$, then $A^* = \bar{A}^\top$ (the conjugate transpose) is the matrix of $T^*$. Over $\mathbb{R}$, this is just $A^\top$.
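The conjugate-transpose description can be checked numerically. The sketch below (hypothetical helper names, standard library only) draws a random complex $3 \times 3$ matrix and random vectors, then compares $(Au, v)$ with $(u, A^* v)$ for $A^* = \bar{A}^\top$; the difference should be at the level of floating-point rounding.

```python
import random

def inner(v, w):
    """Standard inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def conj_transpose(A):
    """A^* = conj(A)^T: entry (j, i) of the result is conj(A[i][j])."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def rand_c():
    return complex(random.uniform(-1, 1), random.uniform(-1, 1))

random.seed(1)
n = 3
A = [[rand_c() for _ in range(n)] for _ in range(n)]
A_star = conj_transpose(A)
u = [rand_c() for _ in range(n)]
v = [rand_c() for _ in range(n)]
gap = abs(inner(matvec(A, u), v) - inner(u, matvec(A_star, v)))
print(gap)   # rounding-level
```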
[definition: Self-Adjoint Operator]
A linear operator $T: V \to V$ on a finite-dimensional inner product space is **self-adjoint** (or **Hermitian** over $\mathbb{C}$, **symmetric** over $\mathbb{R}$) if $T^* = T$, i.e.,
\begin{align*}
(Tu, v)_V &= (u, Tv)_V \quad \text{for all } u, v \in V.
\end{align*}
[/definition]
Two foundational properties of self-adjoint operators underpin the spectral theorem. The first is that all eigenvalues are real — a striking constraint that has no analogue for general operators.
[quotetheorem:3279]
The argument is direct: if $Tv = \lambda v$ with $v \ne 0$, then
\begin{align*}
\lambda \|v\|_V^2 &= (\lambda v, v)_V = (Tv, v)_V = (v, Tv)_V = (v, \lambda v)_V = \bar{\lambda} \|v\|_V^2.
\end{align*}
Since $\|v\|_V \ne 0$, we get $\lambda = \bar{\lambda}$, so $\lambda \in \mathbb{R}$.
The second property is that eigenvectors corresponding to distinct eigenvalues are automatically orthogonal; this too follows directly from the self-adjoint identity.
[quotetheorem:3280]
The proof: let $Tu = \lambda u$ and $Tv = \mu v$. Since $T$ is self-adjoint and $\mu$ is real by the previous result, $\lambda (u, v)_V = (Tu, v)_V = (u, Tv)_V = \bar{\mu} (u, v)_V = \mu (u, v)_V$. Since $\lambda \ne \mu$, we conclude $(u, v)_V = 0$.
These two properties — real eigenvalues and orthogonal eigenspaces — are the key ingredients in the spectral theorem.
[quotetheorem:440]
The spectral theorem is one of the most powerful results in all of linear algebra. It says that, from an intrinsic perspective that uses only the inner product geometry and no coordinates, a self-adjoint operator is completely characterized by its eigenvalues and its mutually orthogonal eigenspaces. In matrix terms: every real symmetric matrix $A$ can be written as $A = Q \Lambda Q^\top$ where $Q$ is orthogonal and $\Lambda$ is a real diagonal matrix, and every Hermitian matrix $A = \bar{A}^\top$ can be written as $A = U \Lambda U^*$ where $U$ is unitary and $\Lambda$ is again real and diagonal. The following example carries out this diagonalization explicitly for a $2 \times 2$ symmetric matrix, illustrating all the steps the theorem guarantees.
[example: Diagonalizing a Symmetric Matrix]
Let $V = \mathbb{R}^2$ with the standard inner product and let $T$ be the operator with matrix
\begin{align*}
A &= \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.
\end{align*}
This is symmetric, so the spectral theorem guarantees a real orthonormal eigenbasis.
The characteristic polynomial is $\det(A - \lambda I) = (2 - \lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda - 1)(\lambda - 3)$, giving eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 3$.
For $\lambda_1 = 1$: $(A - I)v = 0$ gives $v_1 + v_2 = 0$, so the eigenspace is $\operatorname{span}\{(1, -1)^\top\}$. Normalizing: $e_1 = \frac{1}{\sqrt{2}}(1, -1)^\top$.
For $\lambda_2 = 3$: $(A - 3I)v = 0$ gives $-v_1 + v_2 = 0$, so the eigenspace is $\operatorname{span}\{(1, 1)^\top\}$. Normalizing: $e_2 = \frac{1}{\sqrt{2}}(1, 1)^\top$.
We verify orthogonality: $(e_1, e_2)_{\mathbb{R}^2} = \frac{1}{2}(1 \cdot 1 + (-1) \cdot 1) = 0$. The orthonormal eigenbasis $\{e_1, e_2\}$ diagonalizes $A$:
\begin{align*}
A &= Q \Lambda Q^\top, \quad Q = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}, \quad \Lambda = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}.
\end{align*}
The columns of $Q$ are the eigenvectors $e_1$ and $e_2$. Since $Q$ is orthogonal ($Q^\top Q = I$), we have $Q^{-1} = Q^\top$.
[/example]
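For larger symmetric matrices, an orthonormal eigenbasis is usually computed numerically rather than by factoring the characteristic polynomial. One classical method (not the only one) is the cyclic Jacobi eigenvalue algorithm, which repeatedly replaces $A$ by $J^\top A J$ for a plane rotation $J$ chosen to zero one off-diagonal entry; the accumulated product of rotations is an orthogonal matrix whose columns approximate the eigenvectors. The sketch below is a minimal, unoptimized version that assumes the input is real symmetric. On the matrix above, a single rotation already diagonalizes it, and the accumulated rotation is the matrix $Q$ found by hand.

```python
import math

def jacobi_eigen(A, tol=1e-20, max_sweeps=50):
    """Cyclic Jacobi eigenvalue method for a real symmetric matrix A.

    Returns (eigenvalues, V) where the columns of V are orthonormal
    eigenvectors, so A = V diag(eigenvalues) V^T up to rounding.
    `tol` bounds the sum of squared off-diagonal entries.
    """
    n = len(A)
    A = [list(map(float, row)) for row in A]          # work on a copy
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation chosen so the (p, q) entry of J^T A J vanishes.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                    # A <- A J (columns p, q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                    # A <- J^T A (rows p, q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                    # V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V

vals, Q = jacobi_eigen([[2, 1], [1, 2]])
print(vals)   # approximately [1.0, 3.0]
print(Q)      # columns approximately (1, -1)/sqrt(2) and (1, 1)/sqrt(2)
```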
## Hilbert Spaces
The inner product spaces we have encountered so far are finite-dimensional. The theory extends naturally to infinite dimensions, but a new phenomenon appears: completeness. In a finite-dimensional inner product space, every Cauchy sequence converges (since all norms on a finite-dimensional space are equivalent and $\mathbb{R}$ and $\mathbb{C}$ are complete). In infinite dimensions, completeness can fail and must be imposed as an additional hypothesis.
Consider the space $C([0,1])$ of real-valued continuous functions on $[0,1]$ with the $L^2$ inner product $(f, g)_{L^2} = \int_0^1 f(x) g(x) \, d\mathcal{L}^1$. For $n \ge 2$, the continuous functions
\begin{align*}
f_n(x) &= \begin{cases} 0 & 0 \le x \le \frac{1}{2} - \frac{1}{n} \\ \frac{n}{2}(x - \frac{1}{2}) + \frac{1}{2} & \frac{1}{2} - \frac{1}{n} \le x \le \frac{1}{2} + \frac{1}{n} \\ 1 & \frac{1}{2} + \frac{1}{n} \le x \le 1 \end{cases}
\end{align*}
rise linearly from $0$ to $1$ across $[\frac{1}{2} - \frac{1}{n}, \frac{1}{2} + \frac{1}{n}]$, and a direct computation gives $\|f_n - \mathbb{1}_{[1/2, 1]}\|_{L^2}^2 = \frac{1}{6n} \to 0$. A sequence that converges in $L^2$ is Cauchy in the $L^2$ norm, so $(f_n)$ is Cauchy in $C([0,1])$. But $L^2$ limits are unique up to sets of measure zero, and no continuous function agrees with $\mathbb{1}_{[1/2, 1]}$ almost everywhere, so $(f_n)$ has no limit in $C([0,1])$. Hence $C([0,1])$ with the $L^2$ inner product is not complete.
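The $L^2$ distance between the ramp functions and the step function can be approximated numerically. This sketch assumes the ramp rises linearly from $0$ at $\frac{1}{2} - \frac{1}{n}$ to $1$ at $\frac{1}{2} + \frac{1}{n}$ (slope $\frac{n}{2}$ in the middle piece) and uses a midpoint rule with an arbitrary number of subintervals; the output should track the exact value $\frac{1}{6n}$, which tends to $0$.

```python
def f_n(x, n):
    """Ramp: 0 left of 1/2 - 1/n, 1 right of 1/2 + 1/n, linear (slope n/2) between."""
    if x <= 0.5 - 1 / n:
        return 0.0
    if x >= 0.5 + 1 / n:
        return 1.0
    return (n / 2) * (x - 0.5) + 0.5

def step(x):
    """Indicator function of [1/2, 1]."""
    return 1.0 if x >= 0.5 else 0.0

def l2_dist_sq(f, g, N=100_000):
    """Midpoint-rule approximation of the integral of (f - g)^2 over [0, 1]."""
    h = 1.0 / N
    return h * sum((f((k + 0.5) * h) - g((k + 0.5) * h)) ** 2 for k in range(N))

for n in (4, 16, 64):
    approx = l2_dist_sq(lambda x, n=n: f_n(x, n), step)
    print(n, approx, 1 / (6 * n))   # numerical value vs. exact 1/(6n)
```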
The correct infinite-dimensional setting is the space $L^2([0,1])$ of (equivalence classes of) square-integrable functions, where completeness holds. Spaces that combine the inner product structure with completeness have a name.
[definition: Hilbert Space]
A **Hilbert space** is an inner product space $(H, (\cdot, \cdot)_H)$ that is complete with respect to the induced norm $\|\cdot\|_H$: every Cauchy sequence in $H$ converges to an element of $H$.
[/definition]
Every finite-dimensional inner product space is automatically a Hilbert space. The fundamental infinite-dimensional examples are:
- $\ell^2 := \{(a_n)_{n \ge 1} \subset \mathbb{R} : \sum_{n=1}^\infty a_n^2 < \infty\}$ with $(a, b)_{\ell^2} = \sum_{n=1}^\infty a_n b_n$.
- $L^2(U)$ for an open set $U \subset \mathbb{R}^n$, with $(f, g)_{L^2} = \int_U f \bar{g} \, d\mathcal{L}^n$.
- $H^1(U) = W^{1,2}(U)$, with $(u, v)_{H^1} = \int_U (u\bar{v} + \nabla u \cdot \nabla \bar{v}) \, d\mathcal{L}^n$.
The theory of Hilbert spaces retains much of the finite-dimensional inner product geometry: the orthogonal decomposition theorem holds for closed subspaces, and the Riesz representation theorem characterizes every bounded linear functional as an inner product with a fixed vector.
[quotetheorem:221]
This theorem says that a Hilbert space can be identified with its dual: the map sending $w \in H$ to the functional $v \mapsto (v, w)_H$ is a norm-preserving bijection $H \to H^*$ (linear over $\mathbb{R}$, conjugate-linear over $\mathbb{C}$), so every bounded functional "is" an inner product with a fixed vector. This identification $H \cong H^*$ is what allows one to replace dual-space pairings by inner products throughout Hilbert space theory, and it is the foundation for the weak formulation of PDEs in $H^1_0$ spaces.
[remark: Orthonormal Bases in Hilbert Spaces]
In an infinite-dimensional separable Hilbert space $H$ (one that has a countable dense subset), there exists a countable orthonormal basis (also called a **complete orthonormal system**) $\{e_n\}_{n=1}^\infty$. "Basis" here means that every $v \in H$ has the expansion
\begin{align*}
v &= \sum_{n=1}^\infty (v, e_n)_H \, e_n,
\end{align*}
where convergence is in the norm $\|\cdot\|_H$. The Parseval identity holds: $\|v\|_H^2 = \sum_{n=1}^\infty |(v, e_n)_H|^2$.
For $L^2([0, 1])$, the exponentials $\{e^{2\pi i n x}\}_{n \in \mathbb{Z}}$ form a complete orthonormal system: the Parseval identity in this case is the statement that the $L^2$ norm of $f$ equals the $\ell^2$ norm of its Fourier coefficients. This is the bridge between inner product space theory and classical Fourier analysis.
[/remark]
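Orthonormality of the exponentials, and Parseval's identity for a trigonometric polynomial, can be checked with a uniform $N$-point sum. For integers $m, n$ with $|m - n| < N$, $\frac{1}{N}\sum_{k=0}^{N-1} e^{2\pi i (m-n) k/N}$ equals $1$ if $m = n$ and $0$ otherwise, so the sketch below (with an arbitrary choice $N = 64$ and an arbitrary test function) reproduces the exact integrals up to rounding.

```python
import cmath

def e(n, x):
    """The exponential e^{2 pi i n x}."""
    return cmath.exp(2j * cmath.pi * n * x)

def l2_inner(f, g, N=64):
    """Uniform N-point approximation of (f, g)_{L^2([0,1])} = int_0^1 f conj(g).

    Exact up to rounding when f * conj(g) is a trigonometric polynomial
    whose frequencies all have absolute value less than N.
    """
    return sum(f(k / N) * g(k / N).conjugate() for k in range(N)) / N

# Gram matrix of e_{-2}, ..., e_2: should be the 5x5 identity
for m in range(-2, 3):
    row = [l2_inner(lambda x: e(m, x), lambda x: e(n, x)) for n in range(-2, 3)]
    print([round(abs(z), 9) for z in row])

def f(x):
    """Trigonometric polynomial with coefficients 3, 1 - 2i, 2i at frequencies 0, 1, -2."""
    return 3 * e(0, x) + (1 - 2j) * e(1, x) + 2j * e(-2, x)

# Parseval: ||f||^2 = |3|^2 + |1 - 2i|^2 + |2i|^2 = 9 + 5 + 4 = 18
print(l2_inner(f, f).real)
```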