Orthogonal projection is the mechanism that turns perpendicularity into an actual operation. In the closed-subspace setting of a [Hilbert Space](/page/Hilbert%20Space), a subspace is not merely a subset: it determines a canonical nearest-point map, a decomposition of the ambient space, and a self-adjoint idempotent operator. In Euclidean geometry this is the familiar shadow of a vector on a line or plane. In analysis, the same idea supports [Fourier series](/page/Fourier%20Series), least-squares approximation, spectral theory, [conditional expectation](/page/Conditional%20Expectation) in $L^2$, and the fibre-metric geometry developed in [Fibre Bundles I: Bundles, Sections, and Transition Data](/page/Fibre%20Bundles%20I%3A%20Bundles%2C%20Sections%2C%20and%20Transition%20Data).
The word "projection" by itself only says that applying the map twice changes nothing. Orthogonality adds the crucial metric condition: the error vector is perpendicular to the chosen subspace. That extra condition is what makes the projection canonical. Without it, many different complementary subspaces produce many different projections onto the same range; with it, a closed subspace of a Hilbert space determines a unique [bounded linear operator](/page/Bounded%20Linear%20Operator).
The concept has two parallel forms. The Hilbert-space form treats orthogonal projection as an operator $P \in \mathcal{L}(H)$ satisfying $P^2=P$ and $P^*=P$, where $\mathcal{L}(H)$ denotes the space of bounded linear operators from $H$ to itself and $P^*$ denotes the adjoint operator of $P$. The geometric form treats it as the nearest-point map onto a closed subspace. In smooth geometry the finite-dimensional construction is applied fibrewise: if a vector bundle $E \to M$ carries a smooth fibre metric and $F \subset E$ is a smooth subbundle, then each fibre splits into $F_p$ and its orthogonal complement, producing a smooth bundle map $P_F:E\to E$.
## Definition
The starting point is a Hilbert space, where inner products make it meaningful to ask for perpendicular errors. The key operation is not merely a decomposition theorem but a map: each vector should be assigned its component in the target subspace. This is a strong demand in infinite-dimensional spaces, and it is precisely why closed subspaces are the natural domains of the theory.
[definition: Orthogonal Projection onto a Closed Subspace]
Let $H$ be a Hilbert space and let $M \subset H$ be a closed linear subspace. The orthogonal projection onto $M$ is the map $P_M:H\to H$ whose value at $x \in H$ is denoted by $P_Mx$, defined by the condition that $P_Mx \in M$ and
\begin{align*}
(x-P_Mx,m)_H=0
\end{align*}
for every $m \in M$.
[/definition]
The definition forces the error $x-P_Mx$ to have no component in any allowed direction $m\in M$. To describe those possible error vectors without repeatedly mentioning every test vector in $M$, one needs a separate object: the subspace of all vectors perpendicular to $M$. That object is the orthogonal complement, and it is the language in which projection decompositions are stated.
[definition: Orthogonal Complement]
Let $H$ be a Hilbert space and let $M \subset H$ be a linear subspace. The orthogonal complement of $M$ is
\begin{align*}
M^\perp := \{x \in H : (x,m)_H = 0 \text{ for every } m \in M\}.
\end{align*}
[/definition]
The orthogonal complement is the candidate space for all possible projection errors. If every $x \in H$ can be written as a sum of a vector in $M$ and a vector in $M^\perp$, then the component in $M$ is forced to be the projected point. This is the content needed to justify the definition above: the Hilbert space [projection theorem](/theorems/1985) guarantees that the required vector $P_Mx$ exists and is unique. The error $x-P_Mx$ is not an arbitrary residual; it is orthogonal to every allowable correction in $M$.
Analysts often identify orthogonal projections by algebraic properties of the operator. This version is compact and powerful because it can be checked without first naming the range as a subspace.
[definition: Orthogonal Projection Operator]
Let $H$ be a Hilbert space. A linear map
\begin{align*}
P:H &\to H
\end{align*}
is an orthogonal projection operator if $P \in \mathcal{L}(H)$ and
\begin{align*}
P^2 &= P, & P^* &= P.
\end{align*}
[/definition]
The equation $P^2=P$ says that the operator is a projection; once a vector has been projected, projecting it again does not move it. The equation $P^*=P$ says that the projection is compatible with the [inner product](/page/Inner%20Product). Together they distinguish orthogonal projections from oblique projections.
In differential geometry and geometric analysis, orthogonal projection is used point by point on vector bundles. The definition is the same fibrewise idea, but smoothness matters because the projection must vary smoothly with the base point.
[definition: Orthogonal Projection onto a Subbundle]
Let $E \to M$ be a smooth real vector bundle equipped with a smooth fibre metric $g$, and let $F \subset E$ be a smooth vector subbundle. The orthogonal projection onto $F$ is the smooth bundle map
\begin{align*}
P_F:E &\to E
\end{align*}
such that, for each $p \in M$, the restriction
\begin{align*}
(P_F)_p:E_p &\to E_p
\end{align*}
is the orthogonal projection of the [inner product space](/page/Inner%20Product%20Space) $(E_p,g_p)$ onto $F_p$.
[/definition]
The fibrewise condition says that each vector $v \in E_p$ decomposes as a component in $F_p$ plus a component in the fibrewise orthogonal complement $F_p^\perp$. Smoothness is not a decorative condition: it is what allows $P_F$ to act on smooth sections and appear in differential operators.
## Equivalent Characterisations
The decomposition definition is geometric, while the operator definition is algebraic. In practice one often meets an endomorphism $P$ before knowing whether it comes from a chosen subspace and its orthogonal complement. The obstruction is that idempotence alone only says vectors split into a range and a kernel; it does not force those two pieces to be perpendicular. Self-adjointness supplies exactly that missing orthogonality, so the identities $P^2=P$ and $P^*=P$ become a complete intrinsic test for orthogonal projection.
[quotetheorem:9185]
This result lets the reader move freely between geometry and operator theory. When studying [Self-Adjoint Operators](/page/Self-Adjoint%20Operators), an orthogonal projection is the simplest nontrivial example: its spectrum is contained in $\{0,1\}$, and its range and kernel are perpendicular. The next characterisation explains why the same operator is also the natural solution to approximation problems. Once a closed subspace is interpreted as a class of allowable approximants, the projection is the element of that class nearest to the original vector.
[quotetheorem:9186]
This is the analytical heart of orthogonal projection. It explains why projections occur in [approximation theory](/page/Approximation%20Theory), Fourier analysis, and least-squares problems: the orthogonal component is the unique residual that cannot be improved by moving within $M$. The [orthogonal decomposition theorem](/theorems/241), quoted in the Properties section below, supplies the structural reason this minimisation statement is possible: each vector separates into an allowable component and an error component perpendicular to every allowable direction.
The hypotheses matter. In infinite-dimensional settings the subspace must be closed. Otherwise a best approximant may fail to exist, even when vectors can be approximated arbitrarily well. Closedness is therefore what makes the decomposition more than a formal splitting: it is the condition under which norm identities, Parseval-type formulas, and energy decompositions in $L^2$ theory behave cleanly.
## Standard Examples
The finite-dimensional case is the model every other example generalises. Projection onto a line shows both the formula and the role of normalisation.
[example: Projection onto a Line in Euclidean Space]
Let $u \in \mathbb{R}^n$ with $u \ne 0$, and let $M=\operatorname{span}(u)$ with the Euclidean inner product. To project $x\in\mathbb{R}^n$ onto the line $M$, write the candidate projected vector as $\alpha u$ and impose that the residual be perpendicular to $u$:
\begin{align*}
(x-\alpha u)\cdot u=0.
\end{align*}
Expanding the dot product gives
\begin{align*}
x\cdot u-\alpha(u\cdot u)=0.
\end{align*}
Since $u\ne 0$, we have $u\cdot u>0$, so
\begin{align*}
\alpha=\frac{x\cdot u}{u\cdot u}.
\end{align*}
Thus
\begin{align*}
P_Mx=\frac{x\cdot u}{u\cdot u}u.
\end{align*}
The residual is indeed orthogonal to $M$, because
\begin{align*}
\left(x-\frac{x\cdot u}{u\cdot u}u\right)\cdot u=x\cdot u-\frac{x\cdot u}{u\cdot u}(u\cdot u)=x\cdot u-x\cdot u=0.
\end{align*}
If $m\in M$, then $m=\beta u$ for some $\beta\in\mathbb{R}$, and therefore
\begin{align*}
\left(x-\frac{x\cdot u}{u\cdot u}u\right)\cdot m=\left(x-\frac{x\cdot u}{u\cdot u}u\right)\cdot(\beta u)=\beta\left(x-\frac{x\cdot u}{u\cdot u}u\right)\cdot u=\beta\cdot 0=0.
\end{align*}
So the residual lies in $M^\perp$, and the displayed vector is the orthogonal projection onto $M$.
If $|u|=1$, then $u\cdot u=1$, so the formula becomes
\begin{align*}
P_Mx=(x\cdot u)u.
\end{align*}
Writing vectors as columns, $u^\top x=u\cdot x=x\cdot u$, hence
\begin{align*}
P_Mx=u(u^\top x)=(u u^\top)x
\end{align*}
in the unit-vector case. In general,
\begin{align*}
P_Mx=\frac{u(u^\top x)}{u\cdot u}=\left(\frac{u u^\top}{u\cdot u}\right)x.
\end{align*}
Thus the projection matrix is $u u^\top$ when $|u|=1$, and $u u^\top/(u\cdot u)$ for an arbitrary nonzero direction vector $u$.
[/example]
This example already illustrates why orthogonal projections are self-adjoint. The matrix $u u^\top/(u\cdot u)$ is symmetric and idempotent, so the finite-dimensional operator satisfies the algebraic definition.
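The symmetry and idempotence of $u u^\top/(u\cdot u)$ can be checked mechanically. The sketch below is a minimal illustration, assuming integer entries for $u$ so that exact rational arithmetic (Python's `fractions`) makes the identities hold exactly rather than up to rounding. The helper names `line_projection_matrix`, `matmul`, `transpose` and `apply` are hypothetical illustration names.

```python
from fractions import Fraction


def line_projection_matrix(u):
    """Return u u^T / (u . u), the orthogonal projection onto span(u) (integer u assumed)."""
    uu = sum(c * c for c in u)
    if uu == 0:
        raise ValueError("u must be nonzero")
    return [[Fraction(a * b, uu) for b in u] for a in u]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


u = [1, 2, 2]   # nonzero direction, u . u = 9
x = [3, 0, 1]   # x . u = 5
P = line_projection_matrix(u)
Px = apply(P, x)
residual = [a - b for a, b in zip(x, Px)]

assert matmul(P, P) == P                              # idempotent: P^2 = P
assert transpose(P) == P                              # symmetric: P^T = P
assert sum(r * c for r, c in zip(residual, u)) == 0   # residual is perpendicular to u
print(Px)  # equals (5/9) u, since x . u = 5 and u . u = 9
```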
Projection onto a finite-dimensional subspace is controlled by orthonormal coordinates. This is the local pattern behind Fourier coefficients.
[example: Projection onto a Finite-Dimensional Orthonormal Span]
Let $H$ be a Hilbert space and let $e_1,\ldots,e_n \in H$ be orthonormal, so $(e_i,e_j)_H=0$ for $i\ne j$ and $(e_j,e_j)_H=1$. Set
\begin{align*}
M=\operatorname{span}(e_1,\ldots,e_n).
\end{align*}
For $x\in H$, define
\begin{align*}
y=\sum_{i=1}^n (x,e_i)_H e_i.
\end{align*}
Then $y\in M$. We show that $x-y$ is perpendicular to every vector in $M$.
Fix $j\in\{1,\ldots,n\}$. By linearity of the inner product in the first variable,
\begin{align*}
(y,e_j)_H=\left(\sum_{i=1}^n (x,e_i)_H e_i,e_j\right)_H=\sum_{i=1}^n (x,e_i)_H(e_i,e_j)_H.
\end{align*}
Since the set is orthonormal, every term with $i\ne j$ is zero and the $i=j$ term is $(x,e_j)_H$, hence
\begin{align*}
(y,e_j)_H=(x,e_j)_H.
\end{align*}
Therefore
\begin{align*}
(x-y,e_j)_H=(x,e_j)_H-(y,e_j)_H=(x,e_j)_H-(x,e_j)_H=0.
\end{align*}
Now let $m\in M$. Then $m=\sum_{j=1}^n \beta_j e_j$ for some scalars $\beta_1,\ldots,\beta_n$. Using conjugate-linearity in the second variable in the complex case, and ordinary bilinearity in the real case,
\begin{align*}
(x-y,m)_H=\left(x-y,\sum_{j=1}^n \beta_j e_j\right)_H=\sum_{j=1}^n \overline{\beta_j}(x-y,e_j)_H=\sum_{j=1}^n \overline{\beta_j}\cdot 0=0.
\end{align*}
Thus $x-y\in M^\perp$, so the orthogonal projection of $x$ onto $M$ is
\begin{align*}
P_Mx=\sum_{i=1}^n (x,e_i)_H e_i.
\end{align*}
The formula says that projection onto an orthonormal finite-dimensional span keeps exactly the coordinates of $x$ in the directions $e_1,\ldots,e_n$.
[/example]
This formula is one of the main reasons orthonormal bases are useful. Coordinates are inner products, and projection is obtained by retaining the coordinates belonging to the chosen subspace.
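The coordinate formula $P_Mx=\sum_i (x,e_i)_H e_i$ translates directly into code. Here is a minimal sketch in $\mathbb{C}^4$. The inner product is linear in the first slot and conjugate-linear in the second, matching the convention used above. The orthonormal pair `e1`, `e2` and the vector `x` are illustrative choices, not taken from the text.

```python
def inner(x, y):
    """Complex inner product, linear in the first slot, conjugate-linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def project_onto_orthonormal_span(x, basis):
    """P_M x = sum_i (x, e_i) e_i, assuming `basis` is orthonormal."""
    y = [0j] * len(x)
    for e in basis:
        coeff = inner(x, e)                       # coordinate of x along e
        y = [yi + coeff * ei for yi, ei in zip(y, e)]
    return y


s = 2 ** -0.5
e1 = [s, s * 1j, 0, 0]      # orthonormal pair in C^4 (illustrative)
e2 = [0, 0, s, -s]
x = [1 + 2j, 3, -1j, 4]

y = project_onto_orthonormal_span(x, [e1, e2])
r = [a - b for a, b in zip(x, y)]
for e in (e1, e2):
    assert abs(inner(r, e)) < 1e-12   # residual x - y is orthogonal to each e_j
print(y)
```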
Infinite-dimensional examples require the subspace to be closed. The next example shows what fails when closedness is omitted.
[example: Nonclosed Subspace with No Nearest Point]
Let $H=L^2(0,1)$ and let $M$ be the subspace of polynomial functions on $(0,1)$, viewed as equivalence classes in $L^2(0,1)$. Polynomials are dense in $L^2(0,1)$, for three reasons. Continuous functions on $[0,1]$ are dense in $L^2(0,1)$. Each of them is a uniform limit of polynomials by the *[Weierstrass Approximation Theorem](/theorems/480)*. Uniform convergence on $[0,1]$ implies convergence in $L^2(0,1)$, because $\|g\|_{L^2}\le\sup_{[0,1]}|g|$. Thus the closure of $M$ in $L^2(0,1)$ is all of $L^2(0,1)$.
Take
\begin{align*}
f=\mathbf{1}_{(0,1/2)}.
\end{align*}
This function is not equal almost everywhere to any polynomial. Indeed, suppose a polynomial $p$ satisfied $p=f$ almost everywhere. Then $p=0$ almost everywhere on $(1/2,1)$. Since $p$ is continuous, its zero set in $(1/2,1)$ is relatively closed. It also has full measure, hence is dense in $(1/2,1)$, so it is all of $(1/2,1)$. A polynomial with infinitely many zeros is identically zero, so $p$ must be the zero polynomial. But then $p=0$ almost everywhere on $(0,1/2)$, while $f=1$ almost everywhere there, a contradiction.
Since $M$ is dense in $L^2(0,1)$, for every $\varepsilon>0$ there is a polynomial $p_\varepsilon\in M$ such that
\begin{align*}
\|f-p_\varepsilon\|_{L^2}<\varepsilon.
\end{align*}
Therefore
\begin{align*}
0\le \inf_{p\in M}\|f-p\|_{L^2}\le \|f-p_\varepsilon\|_{L^2}<\varepsilon.
\end{align*}
Because this holds for every $\varepsilon>0$,
\begin{align*}
\inf_{p\in M}\|f-p\|_{L^2}=0.
\end{align*}
No polynomial attains this infimum. If some $p\in M$ satisfied
\begin{align*}
\|f-p\|_{L^2}=0,
\end{align*}
then
\begin{align*}
\int_0^1 |f(x)-p(x)|^2\,d\mathcal{L}^1(x)=0,
\end{align*}
so $f=p$ almost everywhere, contradicting the choice of $f$. Thus $f$ can be approximated arbitrarily well by elements of $M$, but it has no nearest point in $M$; this is exactly why orthogonal projection requires the target subspace to be closed.
[/example]
This example is a useful warning: dense subspaces can approximate every vector arbitrarily well without containing the nearest point. Orthogonal projection belongs to closed subspaces, not merely large subspaces.
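The shrinking distance can be observed numerically. Below is a sketch under stated assumptions: it uses shifted Legendre polynomials $q_k(x)=\sqrt{2k+1}\,P_k(2x-1)$ as an orthonormal basis for polynomials of degree $\le d$ on $(0,1)$. It computes the exact squared distance $\|f\|^2-\sum_{k\le d}|(f,q_k)|^2$ from $f=\mathbf{1}_{(0,1/2)}$ using the Pythagorean identity. The Legendre construction and the helper names are illustrative additions, not part of the text above. A finite computation cannot prove that the infimum is never attained; the argument above does that. The numbers only illustrate distances that decrease while staying positive.

```python
from fractions import Fraction


def legendre_polys(n):
    """Legendre polynomials P_0, ..., P_n on [-1, 1] as coefficient lists (index = power)."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for k in range(1, n):
        prev, cur = polys[k - 1], polys[k]
        nxt = [Fraction(0)] * (k + 2)
        # Bonnet recurrence: (k+1) P_{k+1}(t) = (2k+1) t P_k(t) - k P_{k-1}(t)
        for i, c in enumerate(cur):
            nxt[i + 1] += (2 * k + 1) * c
        for i, c in enumerate(prev):
            nxt[i] -= k * c
        polys.append([c / (k + 1) for c in nxt])
    return polys[: n + 1]


def integral_minus1_to_0(coeffs):
    """Exact value of the integral of sum_i c_i t^i over [-1, 0]."""
    return sum(c * Fraction((-1) ** i, i + 1) for i, c in enumerate(coeffs))


def squared_distances(n):
    """dist(f, polynomials of degree <= d)^2 in L^2(0,1) for f = 1_(0,1/2), d = 0..n.

    Uses (f, q_k) = sqrt(2k+1)/2 * int_{-1}^{0} P_k and the Pythagorean identity
    dist^2 = ||f||^2 - sum_{k <= d} |(f, q_k)|^2.
    """
    norm_sq = Fraction(1, 2)   # ||f||^2 = length of (0, 1/2)
    captured = Fraction(0)
    out = []
    for k, p in enumerate(legendre_polys(n)):
        captured += Fraction(2 * k + 1, 4) * integral_minus1_to_0(p) ** 2
        out.append(norm_sq - captured)
    return out


d = squared_distances(7)
print(d)   # positive and non-increasing in the degree
```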
The following example separates orthogonal projections from general projections. Idempotence alone does not encode perpendicularity.
[example: Oblique Projection in the Plane]
Let $T:\mathbb{R}^2\to\mathbb{R}^2$ be given by
\begin{align*}
T(x_1,x_2)=(x_1+x_2,0).
\end{align*}
For any $(x_1,x_2)\in\mathbb{R}^2$,
\begin{align*}
T^2(x_1,x_2)=T(T(x_1,x_2))=T(x_1+x_2,0)=(x_1+x_2,0)=T(x_1,x_2).
\end{align*}
Hence $T^2=T$, so $T$ is a projection. Its range is the $x_1$-axis: every value of $T$ has the form $(a,0)$, and conversely
\begin{align*}
T(a,0)=(a,0).
\end{align*}
The kernel is obtained by solving $T(x_1,x_2)=(0,0)$:
\begin{align*}
T(x_1,x_2)=(0,0)\iff (x_1+x_2,0)=(0,0)\iff x_1+x_2=0\iff x_2=-x_1.
\end{align*}
Therefore
\begin{align*}
\ker(T)=\{(t,-t):t\in\mathbb{R}\}.
\end{align*}
This kernel is not perpendicular to the range, since $(1,0)$ lies on the $x_1$-axis, $(1,-1)\in\ker(T)$, and
\begin{align*}
(1,0)\cdot(1,-1)=1\cdot 1+0\cdot(-1)=1\ne 0.
\end{align*}
The same failure appears in the matrix. With respect to the standard basis,
\begin{align*}
T(1,0)=(1,0)
\end{align*}
and
\begin{align*}
T(0,1)=(1,0),
\end{align*}
so the matrix of $T$ has columns $(1,0)$ and $(1,0)$:
\begin{align*}
[T]=\begin{pmatrix}1&1\\0&0\end{pmatrix}.
\end{align*}
Its $(1,2)$ entry is $1$ while its $(2,1)$ entry is $0$, so the matrix is not symmetric and $T$ is not self-adjoint for the Euclidean inner product. Thus $T$ is an idempotent projection onto the $x_1$-axis, but it is not an orthogonal projection.
[/example]
This boundary case explains the role of self-adjointness in the operator definition. A projection may land in the correct subspace while measuring errors along the wrong directions.
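The algebraic test $P^2=P$, $P^\top=P$ separates the two maps in this example, and the "wrong directions" show up as a longer error. This is a minimal sketch for real matrices stored as lists of rows. The function `is_orthogonal_projection` is a hypothetical helper written for illustration.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def apply(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def is_orthogonal_projection(A):
    """Algebraic test for a real square matrix: A^2 = A and A^T = A."""
    return matmul(A, A) == A and transpose(A) == A


T = [[1, 1], [0, 0]]   # T(x1, x2) = (x1 + x2, 0), the oblique projection above
P = [[1, 0], [0, 0]]   # orthogonal projection onto the x1-axis

assert matmul(T, T) == T                # T is idempotent ...
assert not is_orthogonal_projection(T)  # ... but not symmetric
assert is_orthogonal_projection(P)

x = [0, 1]
err_T = [a - b for a, b in zip(x, apply(T, x))]   # x - Tx = (-1, 1)
err_P = [a - b for a, b in zip(x, apply(P, x))]   # x - Px = (0, 1)
print(sum(c * c for c in err_T), sum(c * c for c in err_P))   # 2 1
```

Both maps land on the $x_1$-axis, but $T$ sends $(0,1)$ to $(1,0)$, which is farther from $(0,1)$ than the orthogonal projection $(0,0)$.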
## Properties
The most basic structural theorem is the Hilbert space projection theorem. It is the existence theorem behind the definition and the reason closed linear subspaces behave like Euclidean subspaces.
[quotetheorem:105]
The theorem is variational rather than merely algebraic: it says that the best approximation from a closed subspace is exactly the point whose error is orthogonal to that subspace. Closedness is essential here, because a non-closed subspace can have an infimum distance that is never attained. In the special case where the closed convex set is a closed linear subspace $M$, the nearest point assignment is precisely the orthogonal projection onto $M$, so the projection operator is the functional-analytic form of the nearest-point principle.
Once the projection exists as a bounded operator, its size becomes important: repeated approximation arguments often need estimates that do not enlarge norms. The next theorem records the sharp operator norm bound. It also explains why orthogonal projections are stable tools in analysis rather than merely geometric constructions.
[quotetheorem:9187]
The estimate is sharp because every nonzero orthogonal projection fixes each nonzero vector in its range, so its operator norm cannot be less than $1$. The zero-subspace case is the only exception: the projection is the zero operator, and its norm is $0$. This distinction is useful in applications because projecting onto a genuine approximation space preserves existing approximants while never increasing error estimates, energy bounds, or convergence norms.
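A quick numerical sanity check of $\|Px\|\le\|x\|$, with equality on the range, can be run by sampling. This is an illustrative sketch only: it assumes the projection onto the span of one sample vector `u` in $\mathbb{R}^4$ and a fixed random seed, and sampling illustrates the bound rather than proving it.

```python
import math
import random


def apply(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def norm(x):
    return math.sqrt(sum(c * c for c in x))


u = [1.0, -2.0, 0.5, 3.0]                  # illustrative direction in R^4
uu = sum(c * c for c in u)
P = [[a * b / uu for b in u] for a in u]   # orthogonal projection onto span(u)

random.seed(0)
ratios = []
for _ in range(1000):
    x = [random.gauss(0.0, 1.0) for _ in u]
    ratios.append(norm(apply(P, x)) / norm(x))   # ||Px|| / ||x||

print(max(ratios))                   # at most 1, up to rounding
print(norm(apply(P, u)) / norm(u))   # 1 up to rounding: P fixes vectors in its range
```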
Hierarchies of approximation spaces require compatibility between different projections. For instance, a finite-dimensional Galerkin space may sit inside a larger one, and the smaller approximation should not depend on whether the larger projection was taken first. Nested subspaces give this compatibility.
[quotetheorem:9188]
The nested formula says that once a vector has been compressed into the larger space, projecting further onto the smaller space gives exactly the same result as projecting directly onto the smaller space. The inclusion hypothesis cannot be dropped: projections onto two unrelated closed subspaces need not commute, and their composition need not be a projection at all. In concrete approximation schemes, this is the mechanism that makes coarse and fine models compatible.
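Both halves of this remark can be seen in $\mathbb{R}^3$ with exact arithmetic. In the sketch below, the choices are illustrative: $M$ is the $x_1x_2$-plane, $N=\operatorname{span}(1,1,0)\subset M$, and $L=\operatorname{span}(1,0,1)$ is an unrelated line.

```python
from fractions import Fraction as F


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def line_projection(u):
    """u u^T / (u . u) for an integer vector u."""
    uu = sum(c * c for c in u)
    return [[F(a * b, uu) for b in u] for a in u]


P_M = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]   # projection onto the x1x2-plane M
P_N = line_projection([1, 1, 0])          # N = span(1, 1, 0) is contained in M
P_L = line_projection([1, 0, 1])          # L = span(1, 0, 1) is not nested with N

# Nested case: projecting onto M first changes nothing about the projection onto N.
assert matmul(P_N, P_M) == P_N and matmul(P_M, P_N) == P_N

# Unrelated case: the projections do not commute, and the product is not idempotent.
Q = matmul(P_N, P_L)
assert Q != matmul(P_L, P_N)
assert matmul(Q, Q) != Q
print(Q)
```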
The nested formula concerns how two projections interact. A more basic question is what geometry a single orthogonal projection carries by itself. To use projections as coordinates, three things should hold. The range should be recoverable as the retained component. The orthogonal complement of the range should be recoverable as the discarded component. Every vector should split uniquely through those two pieces. The following result records that direct-sum decomposition and the corresponding Pythagorean norm identity.
[quotetheorem:241]
The direct-sum statement is stronger than saying merely that $P$ keeps some vectors and kills others. It says that the range and kernel exhaust the whole Hilbert space without overlap, and that the error $x-Px$ is not an arbitrary remainder but the unique component orthogonal to the retained subspace. This is why the hypothesis that the projection is orthogonal matters: a general idempotent still splits the space algebraically into range and kernel, but the Pythagorean identity and best-approximation interpretation can fail when the two summands are not perpendicular.
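The contrast between orthogonal and oblique splittings shows up directly in the Pythagorean identity. In this sketch, both matrices project $\mathbb{R}^3$ onto the plane $x_3=0$. `P` does so orthogonally. `T` is an illustrative oblique projection along $(1,1,1)$, chosen here and not taken from the text; one can check that $T^2=T$ and $\ker T=\operatorname{span}(1,1,1)$.

```python
def apply(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def sq(v):
    """Squared Euclidean norm."""
    return sum(c * c for c in v)


P = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]     # orthogonal projection onto the plane x3 = 0
T = [[1, 0, -1], [0, 1, -1], [0, 0, 0]]   # oblique projection onto the same plane along (1, 1, 1)

x = [1, 2, 3]
results = {}
for name, A in (("orthogonal", P), ("oblique", T)):
    Ax = apply(A, x)
    err = [a - b for a, b in zip(x, Ax)]
    results[name] = (sq(x), sq(Ax) + sq(err))   # compare ||x||^2 with ||Ax||^2 + ||x - Ax||^2
    print(name, *results[name])
# orthogonal 14 14   (Pythagoras holds)
# oblique 14 32      (Pythagoras fails)
```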
The decomposition also explains how the page will use projections from this point onward. In $L^2$ examples, the projected part is the finite Fourier approximation and the orthogonal part is the error invisible to the chosen modes. In operator theory, the same geometry underlies spectral projections, which behave like characteristic functions of parts of the spectrum while acting on vectors by separating them into orthogonal spectral components.
## Orthogonal Projection in Function Spaces
The analytic power of orthogonal projection is especially visible in $L^2$ spaces, where inner products are integrals. The finite-dimensional formula becomes the formula for Fourier coefficients.
[example: Fourier Partial Sum as Orthogonal Projection]
Let $H=L^2(-\pi,\pi)$ with
\begin{align*}
(f,g)_{L^2}=\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,d\mathcal{L}^1(x).
\end{align*}
For $n\in\mathbb{N}$, define
\begin{align*}
e_k(x)=\frac{1}{\sqrt{2\pi}}e^{ikx}
\end{align*}
for $-n\le k\le n$, and let $M_n=\operatorname{span}(e_{-n},\ldots,e_n)$. We first check that these functions are orthonormal. If $j=k$, then
\begin{align*}
(e_j,e_j)_{L^2}=\int_{-\pi}^{\pi}\frac{1}{\sqrt{2\pi}}e^{ijx}\frac{1}{\sqrt{2\pi}}e^{-ijx}\,d\mathcal{L}^1(x)=\frac{1}{2\pi}\int_{-\pi}^{\pi}1\,d\mathcal{L}^1(x)=1.
\end{align*}
If $j\ne k$, then
\begin{align*}
(e_j,e_k)_{L^2}=\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{i(j-k)x}\,d\mathcal{L}^1(x)=\frac{1}{2\pi}\left(\frac{e^{i(j-k)\pi}-e^{-i(j-k)\pi}}{i(j-k)}\right)=0,
\end{align*}
because $e^{i(j-k)\pi}=e^{-i(j-k)\pi}=(-1)^{j-k}$.
For $f\in L^2(-\pi,\pi)$, set
\begin{align*}
y=\sum_{k=-n}^{n}(f,e_k)_{L^2}e_k.
\end{align*}
Then $y\in M_n$. For a fixed $j$ with $-n\le j\le n$, linearity in the first variable gives
\begin{align*}
(y,e_j)_{L^2}=\sum_{k=-n}^{n}(f,e_k)_{L^2}(e_k,e_j)_{L^2}=(f,e_j)_{L^2}.
\end{align*}
Therefore
\begin{align*}
(f-y,e_j)_{L^2}=(f,e_j)_{L^2}-(y,e_j)_{L^2}=(f,e_j)_{L^2}-(f,e_j)_{L^2}=0.
\end{align*}
Now let $m\in M_n$. Then $m=\sum_{j=-n}^{n}\beta_j e_j$ for some scalars $\beta_j$, so conjugate-linearity in the second variable gives
\begin{align*}
(f-y,m)_{L^2}=\sum_{j=-n}^{n}\overline{\beta_j}(f-y,e_j)_{L^2}=\sum_{j=-n}^{n}\overline{\beta_j}\cdot 0=0.
\end{align*}
Thus $f-y$ is perpendicular to every vector in $M_n$, so the orthogonal projection of $f$ onto $M_n$ is
\begin{align*}
P_{M_n}f=\sum_{k=-n}^{n}(f,e_k)_{L^2}e_k.
\end{align*}
Writing the coefficient out,
\begin{align*}
(f,e_k)_{L^2}=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(t)e^{-ikt}\,d\mathcal{L}^1(t),
\end{align*}
so
\begin{align*}
(P_{M_n}f)(x)=\sum_{k=-n}^{n}\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}f(t)e^{-ikt}\,d\mathcal{L}^1(t)\right)e^{ikx}.
\end{align*}
This is the $n$th symmetric Fourier partial sum: it keeps exactly the Fourier modes with frequencies $-n,\ldots,n$, and the discarded residual is orthogonal to every trigonometric polynomial in $M_n$.
[/example]
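The projection formula can be tested numerically by replacing the integral with the trapezoidal rule on $N$ equispaced nodes of $[-\pi,\pi)$. This is a sketch under that assumption. Because $2n<N$, the sampled functions $e_{-n},\ldots,e_n$ remain exactly orthonormal for the discrete inner product. The computed `y` is therefore the orthogonal projection for that discrete inner product and approximates $P_{M_n}f$. The test function $f(x)=e^{\cos x}$ is an illustrative choice.

```python
import cmath
import math

N = 64                                  # equispaced nodes on [-pi, pi)
h = 2 * math.pi / N
xs = [-math.pi + j * h for j in range(N)]


def inner(u, v):
    """Trapezoidal approximation of (u, v)_{L^2} = int u(x) conj(v(x)) dx on a periodic grid."""
    return h * sum(a * b.conjugate() for a, b in zip(u, v))


def e(k):
    """Samples of e_k(x) = exp(ikx) / sqrt(2 pi)."""
    return [cmath.exp(1j * k * x) / math.sqrt(2 * math.pi) for x in xs]


f = [math.exp(math.cos(x)) for x in xs]   # smooth 2pi-periodic test function (illustrative)
n = 3
basis = {k: e(k) for k in range(-n, n + 1)}
coeffs = {k: inner(f, ek) for k, ek in basis.items()}                 # (f, e_k)
y = [sum(coeffs[k] * basis[k][j] for k in basis) for j in range(N)]   # samples of P_{M_n} f
residual = [a - b for a, b in zip(f, y)]

worst = max(abs(inner(residual, ek)) for ek in basis.values())
print(worst)   # at rounding level: the residual is orthogonal to every e_k with |k| <= n
```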
Partial information determines a closed subspace of square-integrable random variables. The right replacement for a [random variable](/page/Random%20Variable), when only that information is available, should be the best $L^2$ approximation among [measurable functions](/page/Measurable%20Functions) of the available information. This requirement leads to conditional expectation as an orthogonal projection.
[definition: Conditional Expectation as Orthogonal Projection]
Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and let $\mathcal{G}\subset\mathcal{F}$ be a sub-$\sigma$-algebra. Regard $L^2(\Omega,\mathcal{G},\mathbb{P})$ as the closed subspace of $L^2(\Omega,\mathcal{F},\mathbb{P})$ consisting of $\mathcal{G}$-measurable equivalence classes. The conditional expectation operator is the endomorphism
\begin{align*}
\mathbb{E}[\,\cdot\mid\mathcal{G}]:L^2(\Omega,\mathcal{F},\mathbb{P}) &\to L^2(\Omega,\mathcal{F},\mathbb{P})
\end{align*}
whose value at $X\in L^2(\Omega,\mathcal{F},\mathbb{P})$ is $\mathbb{E}[X\mid\mathcal{G}] \in L^2(\Omega,\mathcal{G},\mathbb{P})$.
[/definition]
The defining map is the usual conditional expectation operator, now viewed inside $L^2$. In the Hilbert space $L^2(\Omega,\mathcal{F},\mathbb{P})$, it is the orthogonal projection onto the closed subspace $L^2(\Omega,\mathcal{G},\mathbb{P})$. This Hilbert-space formulation explains why conditional expectation satisfies both an averaging property and a best-approximation property.
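On a finite probability space the projection can be computed by hand. Take a sub-$\sigma$-algebra $\mathcal{G}$ generated by a partition. Then $\mathbb{E}[X\mid\mathcal{G}]$ is the probability-weighted average of $X$ over each block. The sketch below checks the two properties named above. The first is that the residual is orthogonal to every block indicator, and hence to every $\mathcal{G}$-measurable variable. The second is that it beats another $\mathcal{G}$-measurable candidate. The probabilities, partition and values of $X$ are illustrative.

```python
from fractions import Fraction as F

prob = [F(1, 12), F(1, 6), F(1, 4), F(1, 12), F(1, 4), F(1, 6)]  # P({w}) for w = 0..5
blocks = [[0, 1], [2, 3, 4], [5]]       # partition generating the sub-sigma-algebra G
X = [F(v) for v in (3, -1, 2, 0, 5, 7)]


def inner(U, V):
    """(U, V) = E[U V] for real random variables on the finite space."""
    return sum(p * u * v for p, u, v in zip(prob, U, V))


def cond_exp(X):
    """E[X | G]: on each block, replace X by its probability-weighted average."""
    out = [F(0)] * len(X)
    for B in blocks:
        mass = sum(prob[w] for w in B)
        avg = sum(prob[w] * X[w] for w in B) / mass
        for w in B:
            out[w] = avg
    return out


Y = cond_exp(X)
R = [x - y for x, y in zip(X, Y)]
# G-measurable variables are combinations of block indicators, so testing those suffices.
indicators = [[F(int(w in B)) for w in range(len(X))] for B in blocks]
assert all(inner(R, I) == 0 for I in indicators)

Z = [F(0), F(0), F(3), F(3), F(3), F(7)]   # another G-measurable candidate
D = [x - z for x, z in zip(X, Z)]
assert inner(R, R) < inner(D, D)           # E[X | G] is the closer approximation
print(Y)
```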
## Vector Bundle Form
In geometry, the target of projection may vary from point to point. A subbundle $F\subset E$ assigns a subspace $F_p\subset E_p$ in every fibre, and a fibre metric supplies the notion of orthogonality inside each fibre.
[example: Tangential Projection along an Embedded Submanifold]
Let $M\subset\mathbb{R}^n$ be a smooth embedded $k$-dimensional submanifold with the metric induced from the Euclidean inner product. Fix a local parametrization $\Phi:U\subset\mathbb{R}^k\to M$, write $p=\Phi(s)$, and set
\begin{align*}
E_i(p)=\frac{\partial \Phi}{\partial s_i}(s)
\end{align*}
for $1\le i\le k$. The vectors $E_1(p),\ldots,E_k(p)$ form a basis of $T_pM$.
For $v\in\mathbb{R}^n$, write the projected vector as
\begin{align*}
y=\sum_{i=1}^k a_i E_i(p).
\end{align*}
The condition that $v-y$ be perpendicular to $T_pM$ is equivalent to requiring
\begin{align*}
(v-y)\cdot E_\ell(p)=0
\end{align*}
for every $1\le \ell\le k$. Expanding $y$ gives
\begin{align*}
v\cdot E_\ell(p)-\sum_{i=1}^k a_i E_i(p)\cdot E_\ell(p)=0.
\end{align*}
Define
\begin{align*}
G_{i\ell}(p)=E_i(p)\cdot E_\ell(p)
\end{align*}
and
\begin{align*}
b_\ell(p,v)=v\cdot E_\ell(p).
\end{align*}
Since $E_1(p),\ldots,E_k(p)$ are linearly independent, the Gram matrix $G(p)$ is positive definite and hence invertible. Therefore the coefficients are determined by
\begin{align*}
a_i(p,v)=\sum_{\ell=1}^k (G(p)^{-1})_{i\ell} b_\ell(p,v).
\end{align*}
Thus, in this local parametrization,
\begin{align*}
P_{T_pM}v=\sum_{i=1}^k\sum_{\ell=1}^k (G(p)^{-1})_{i\ell}\bigl(v\cdot E_\ell(p)\bigr)E_i(p).
\end{align*}
Now check the perpendicularity. For each $r$,
\begin{align*}
(P_{T_pM}v)\cdot E_r(p)=\sum_{i=1}^k a_i(p,v)G_{ir}(p)=b_r(p,v)=v\cdot E_r(p).
\end{align*}
Hence
\begin{align*}
(v-P_{T_pM}v)\cdot E_r(p)=v\cdot E_r(p)-v\cdot E_r(p)=0.
\end{align*}
Since every $w\in T_pM$ has the form $w=\sum_{r=1}^k c_rE_r(p)$, it follows that
\begin{align*}
(v-P_{T_pM}v)\cdot w=\sum_{r=1}^k c_r(v-P_{T_pM}v)\cdot E_r(p)=0.
\end{align*}
So $P_{T_pM}v$ is the orthogonal projection of $v$ onto $T_pM$.
The formula depends smoothly on $p$ because the fields $E_i$, the functions $G_{i\ell}$, the inverse matrix entries $(G^{-1})_{i\ell}$, and the inner products $v\cdot E_\ell(p)$ all vary smoothly in the chosen chart. On overlapping charts the formula gives the same vector, because an orthogonal projection onto the fixed subspace $T_pM$ is unique. Therefore the maps $P_{T_pM}:\mathbb{R}^n\to\mathbb{R}^n$ assemble to a smooth bundle map
\begin{align*}
P: M\times\mathbb{R}^n\to M\times\mathbb{R}^n
\end{align*}
given by $P(p,v)=(p,P_{T_pM}v)$, whose image is exactly $TM\subset M\times\mathbb{R}^n$. This is the tangential projection: it keeps the tangent component of an ambient vector and discards the normal component.
[/example]
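The Gram-matrix formula can be checked on a concrete surface with exact arithmetic. This sketch assumes the illustrative parametrization $\Phi(s_1,s_2)=(s_1,s_2,s_1s_2)$ in $\mathbb{R}^3$, evaluated at $s=(1/2,-3)$, and builds the matrix of $P_{T_pM}$ as $\sum_{i,\ell}(G^{-1})_{i\ell}E_iE_\ell^\top$. It then cross-checks the result against $I-nn^\top/|n|^2$, where $n=E_1\times E_2$. The helper `tangential_projection` hard-codes $k=2$ and is a hypothetical name.

```python
from fractions import Fraction as F


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def tangential_projection(E):
    """Matrix of P_{T_pM} for a 2-dimensional tangent space with basis E = [E_1, E_2].

    Entry (r, c) is sum_{i,l} (G^{-1})_{il} E_i[r] E_l[c], where G is the Gram matrix.
    """
    G = [[dot(Ei, El) for El in E] for Ei in E]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    Ginv = [[G[1][1] / det, -G[0][1] / det],
            [-G[1][0] / det, G[0][0] / det]]
    n = len(E[0])
    return [[sum(Ginv[i][l] * E[l][c] * E[i][r] for i in range(2) for l in range(2))
             for c in range(n)] for r in range(n)]


# Surface Phi(s1, s2) = (s1, s2, s1*s2) at the parameter s = (1/2, -3).
s1, s2 = F(1, 2), F(-3)
E = [[F(1), F(0), s2],    # dPhi/ds1
     [F(0), F(1), s1]]    # dPhi/ds2
P = tangential_projection(E)

v = [F(2), F(-1), F(5)]
Pv = [dot(row, v) for row in P]
r = [a - b for a, b in zip(v, Pv)]
assert all(dot(r, Ei) == 0 for Ei in E)   # v - Pv is normal to T_pM

# Cross-check: projecting onto T_pM removes exactly the component along n = E_1 x E_2.
nvec = [-s2, -s1, F(1)]
nn = dot(nvec, nvec)
assert P == [[F(int(a == b)) - nvec[a] * nvec[b] / nn for b in range(3)] for a in range(3)]
```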
This is the projection underlying tangential gradients, normal components, and many constructions in [Cambridge III Riemannian Geometry](/page/Cambridge%20III%20Riemannian%20Geometry). It turns ambient vectors into tangent vectors without choosing coordinates. To use the construction intrinsically on vector bundles, one needs an algebraic test that can be checked fibre by fibre. The next theorem gives exactly that test and mirrors the Hilbert-space condition $P^2=P$ and $P^*=P$.
[quotetheorem:9189]
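The smooth idempotent condition matters because it forces the pointwise image to have locally constant rank. For an idempotent the rank equals the trace, and the trace varies continuously with the base point while taking integer values. Hence the images form a smooth subbundle, and the fibrewise identities $P^2=P$ and $P^*=P$ play exactly the role they play in Hilbert space.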
## Relationship to Other Concepts
Orthogonal projection is a meeting point for several parts of analysis. In [Linear Map](/page/Linear%20Map) theory, it is a special idempotent. In Hilbert space theory, it is the operator form of a closed subspace. In approximation theory, it is the unique best approximation map. In spectral theory, it is the prototype for spectral projections.
It should be distinguished from a general projection. A general projection on a [vector space](/page/Vector%20Space) requires a direct sum $X=M\oplus N$ and maps $m+n$ to $m$. Orthogonal projection takes $N=M^\perp$. This choice depends on an inner product, and in a Hilbert space it requires $M$ to be closed so that $H=M\oplus M^\perp$.
It is also closely related to [The Adjoint of an Operator](/page/The%20Adjoint%20of%20an%20Operator). The identity $P^*=P$ says that the projection is symmetric with respect to the Hilbert space inner product. For finite-dimensional real inner product spaces, this is the familiar symmetry of the projection matrix.
In [numerical analysis](/page/Numerical%20Analysis) and statistics, least-squares solutions are orthogonal projections onto column spaces. If $A\in\mathbb{R}^{m\times n}$ has full column rank, the least-squares approximation to $b\in\mathbb{R}^m$ in the range of $A$ is $A\hat{x}$, where
\begin{align*}
A^\top A\hat{x}=A^\top b.
\end{align*}
The residual $b-A\hat{x}$ is perpendicular to every column of $A$.
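A small worked instance makes the normal equations concrete. The sketch below is illustrative: the $3\times 2$ matrix `A` and vector `b` are made-up data. It solves $A^\top A\hat{x}=A^\top b$ exactly by Cramer's rule and confirms that the residual is perpendicular to both columns. In floating-point practice, QR or SVD based solvers are usually preferred, because forming $A^\top A$ squares the condition number. Exact rational arithmetic avoids that issue here.

```python
from fractions import Fraction as F

A = [[1, 0], [1, 1], [1, 2]]   # full column rank, 3 x 2 (illustrative data)
b = [1, 2, 4]

cols = list(zip(*A))                                                   # columns of A
AtA = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
Atb = [sum(p * q for p, q in zip(ci, b)) for ci in cols]

# Solve the 2 x 2 normal equations A^T A x = A^T b by Cramer's rule.
det = F(AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0])
x_hat = [(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1]) / det,
         (AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0]) / det]

fit = [sum(a * x for a, x in zip(row, x_hat)) for row in A]   # A x_hat, the projection of b
residual = [bi - fi for bi, fi in zip(b, fit)]
assert all(sum(r * c for r, c in zip(residual, col)) == 0 for col in cols)
print(x_hat)   # [Fraction(5, 6), Fraction(3, 2)]
```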
In geometry, the vector-bundle version is the natural language for splitting tangent and normal components. It is used when decomposing ambient derivatives, defining second fundamental forms, and projecting sections of a bundle onto a chosen subbundle.
## Beyond and Connections
Orthogonal projection sits between Hilbert-space geometry and operator theory. The identity $P=P^*$ connects it to [The Adjoint of an Operator](/page/The%20Adjoint%20of%20an%20Operator), while the idempotent law $P^2=P$ connects it to the broader study of projections in algebras of operators. The self-adjoint condition is the part that turns an algebraic projection into a geometric one: it forces the range and kernel to meet at right angles.
This perspective is also the entry point to spectral methods. In the theory of [Self-Adjoint Operators](/page/Self-Adjoint%20Operators), spectral projections decompose a Hilbert space into orthogonal pieces associated with regions of the spectrum, generalizing the finite-dimensional decomposition into eigenspaces. In analysis, the same idea appears in Fourier approximation, least-squares methods, conditional expectation, and Galerkin schemes.
For geometric applications, compare this page with [Fibre Bundles I: Bundles, Sections, and Transition Data](/page/Fibre%20Bundles%20I%3A%20Bundles%2C%20Sections%2C%20and%20Transition%20Data) and [Cambridge III Riemannian Geometry](/page/Cambridge%20III%20Riemannian%20Geometry). A Riemannian metric supplies inner products on tangent spaces, making it meaningful to project tangent vectors onto subspaces such as tangent and normal directions. Thus the Hilbert-space theorem here is the linear model behind many decompositions used in differential geometry.
## References
[Hilbert Space](/page/Hilbert%20Space).
[Self-Adjoint Operators](/page/Self-Adjoint%20Operators).
[The Adjoint of an Operator](/page/The%20Adjoint%20of%20an%20Operator).
[Fibre Bundles I: Bundles, Sections, and Transition Data](/page/Fibre%20Bundles%20I%3A%20Bundles%2C%20Sections%2C%20and%20Transition%20Data).
[Cambridge III Riemannian Geometry](/page/Cambridge%20III%20Riemannian%20Geometry).
John B. Conway, *A Course in Functional Analysis* (1990).
Michael Reed and Barry Simon, *Methods of Modern Mathematical Physics I: Functional Analysis* (1980).
Walter Rudin, *Functional Analysis* (1991).
Orthogonal Projection
Also known as: Orthogonal projector, Orthogonal projection operator, Hilbert space projection, Projection onto a closed subspace, Perpendicular projection, Best approximation projection