A [linear map](/page/Linear%20Map) can preserve addition and scalar multiplication while destroying every analytic estimate we hoped to use. Differentiation is the warning sign: it is linear, but if the domain norm only measures the height of a function, then rapidly oscillating functions can have small height and enormous derivative. The operator norm is the device that separates linear maps which are merely algebraic from linear maps that are controlled by the geometry of their spaces.
The central question is this: how large can $Tx$ be when $x$ has size at most one? Once that question has a finite answer, the map $T$ is stable under approximation, sends convergent sequences to convergent sequences, and can be used safely inside limiting arguments. Without such a bound, a linear formula may be unusable in analysis.
[example: Differentiation Without the Right Norm]
Let $X=C^1([0,1])$ be equipped with the norm $\|f\|_{C^0}=\sup_{x\in[0,1]}|f(x)|$, let $Y=C([0,1])$ have the same supremum norm, and define $T:X\to Y$ by $Tf=f'$. This map is well-defined because every $f\in C^1([0,1])$ has derivative $f'\in C([0,1])$. If $f,g\in X$ and $\alpha,\beta$ are scalars, then $\alpha f+\beta g\in C^1([0,1])$, and for each $x\in[0,1]$,
\begin{align*}
T(\alpha f+\beta g)(x)=(\alpha f+\beta g)'(x)=\alpha f'(x)+\beta g'(x)=\alpha Tf(x)+\beta Tg(x).
\end{align*}
Thus $T(\alpha f+\beta g)=\alpha Tf+\beta Tg$, so $T$ is linear.
For each positive integer $n$, put $f_n(x)=\sin(nx)$. Then $f_n\in C^1([0,1])$. Since $|\sin t|\le 1$ for every real $t$, we have $|\sin(nx)|\le 1$ for every $x\in[0,1]$, and therefore
\begin{align*}
\|f_n\|_{C^0}=\sup_{x\in[0,1]}|\sin(nx)|\le 1.
\end{align*}
Differentiating $f_n$ gives, for each $x\in[0,1]$,
\begin{align*}
(Tf_n)(x)=f_n'(x)=n\cos(nx).
\end{align*}
Because $n>0$,
\begin{align*}
\|Tf_n\|_{C^0}=\sup_{x\in[0,1]}|n\cos(nx)|=n\sup_{x\in[0,1]}|\cos(nx)|.
\end{align*}
For every $x\in[0,1]$, $|\cos(nx)|\le 1$, so
\begin{align*}
\sup_{x\in[0,1]}|\cos(nx)|\le 1.
\end{align*}
At $x=0$, we have $|\cos(n\cdot 0)|=|\cos 0|=1$, so the same supremum is at least $1$. Hence
\begin{align*}
\sup_{x\in[0,1]}|\cos(nx)|=1.
\end{align*}
Substituting this into the previous norm computation gives
\begin{align*}
\|Tf_n\|_{C^0}=n.
\end{align*}
Suppose, toward a contradiction, that $T$ were bounded for these norms. Then there would be a constant $C\ge 0$ such that
\begin{align*}
\|Tf\|_{C^0}\le C\|f\|_{C^0}
\end{align*}
for every $f\in X$. Applying this estimate to $f_n$ gives
\begin{align*}
n=\|Tf_n\|_{C^0}\le C\|f_n\|_{C^0}\le C\cdot 1=C.
\end{align*}
Thus $n\le C$ for every positive integer $n$. Choosing a positive integer $n>C$ contradicts this inequality. Therefore differentiation is not bounded from $(C^1([0,1]),\|\cdot\|_{C^0})$ to $(C([0,1]),\|\cdot\|_{C^0})$. The input norm controls only the height of a function, while differentiation can turn bounded oscillations into arbitrarily large outputs.
[/example]
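The blow-up can also be seen numerically. The sketch below is a rough sanity check, not part of the proof. It assumes that the maximum over a uniform grid of 20001 points is a good stand-in for the supremum over $[0,1]$, and the helper names `sup_on_grid` and `c0_amplification` are hypothetical. The derivative attains its maximum $n$ at the grid point $x=0$, and the grid value of $\|f_n\|_{C^0}$ is at most $1$, so the reported ratio is always at least $n$.

```python
import math


def sup_on_grid(func, num_points=20001):
    """Approximate sup_{x in [0,1]} |func(x)| by sampling a uniform grid (hypothetical helper)."""
    return max(abs(func(k / (num_points - 1))) for k in range(num_points))


def c0_amplification(n):
    """Ratio ||T f_n||_{C^0} / ||f_n||_{C^0} for f_n(x) = sin(n x) and T f = f'."""
    height = sup_on_grid(lambda x: math.sin(n * x))
    slope = sup_on_grid(lambda x: n * math.cos(n * x))
    return slope / height


for n in (1, 10, 100, 1000):
    # The ratio grows without bound, so no single constant C can serve for every f.
    print(n, round(c0_amplification(n), 3))
```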
This failure gives the guiding theme of the page. A useful operator norm is not just a number attached to a formula; it records a compatibility between the domain norm, the codomain norm, and the operation being performed.
## Definition
The opening example shows that the real issue is not linearity itself, but whether a linear map has a finite worst-case amplification on unit-sized inputs. This is the number we want to name first: it is the quantity that later becomes a Lipschitz constant, an embedding constant, a stability constant, and the radius controlling perturbation arguments.
[definition: Operator Norm]
Let $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ be normed vector spaces over the same scalar field, and write $\mathcal L(X,Y)$ for the space of bounded linear operators from $X$ to $Y$. For a linear map $T:X\to Y$ that is not yet known to be bounded, the supremum below is taken in $[0,\infty]$ and may equal $+\infty$; in that case the subscript $\mathcal L(X,Y)$ only records the intended domain and codomain. The operator norm of $T$ is
\begin{align*}
\|T\|_{\mathcal L(X,Y)}:=\sup\{\|Tx\|_Y:x\in X,\ \|x\|_X\le 1\}.
\end{align*}
[/definition]
This definition presupposes that the domain and codomain already have norms, so we record the ambient structure explicitly. Normed vector spaces are the setting where linear algebra meets approximation: they let us ask whether a sequence converges, whether an error is small, and whether a map respects those errors.
[definition: Normed Vector Space]
A [normed vector space](/page/Normed%20Vector%20Space) is a [vector space](/page/Vector%20Space) $X$ over $\mathbb R$ or $\mathbb C$ together with a function $\|\cdot\|_X:X\to[0,\infty)$ such that $\|x\|_X=0$ if and only if $x=0$, $\|\lambda x\|_X=|\lambda|\|x\|_X$ for every scalar $\lambda$, and $\|x+y\|_X\le \|x\|_X+\|y\|_X$ for all $x,y\in X$.
[/definition]
Once vectors have sizes, the next question is which linear maps have finite operator norm. The differentiation example failed because bounded inputs could produce unbounded outputs, so we isolate the exact condition that forbids that behaviour.
[definition: Bounded Linear Operator]
Let $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ be normed vector spaces over the same scalar field. A [bounded linear operator](/page/Bounded%20Linear%20Operator) from $X$ to $Y$ is a linear map $T:X\to Y$ for which there exists $C\ge 0$ such that
\begin{align*}
\|Tx\|_Y\le C\|x\|_X
\end{align*}
for every $x\in X$.
[/definition]
The definition uses the closed unit ball, but computations often begin from a unit vector, a nonzero vector of arbitrary size, or a candidate constant in an inequality. The next result explains why all of those routes compute the same number, so it is the practical form of the definition.
[quotetheorem:9320]
The last formula says that proving an operator norm bound and proving a uniform estimate are the same act. The operational consequence is so common that it is worth isolating: once the norm of $T$ is known, every vector estimate follows by scaling from the unit ball.
[quotetheorem:9321]
This inequality is the form in which the operator norm is usually used inside a proof. The next example shows that the operator norm depends on the norms, not only on the underlying algebraic map.
[example: Identity Between Different Norms]
Let $I:(\mathbb R^n,|\cdot|)\to(\mathbb R^n,\|\cdot\|_\infty)$ be the identity map, where
\begin{align*}
|x|=(x_1^2+\cdots+x_n^2)^{1/2}
\end{align*}
and
\begin{align*}
\|x\|_\infty=\max_{1\le i\le n}|x_i|.
\end{align*}
We compute the operator norm of $I$ and then compare it with the operator norm of the same identity map in the reverse direction.
For $x=(x_1,\ldots,x_n)\in\mathbb R^n$ and each $i$, every term $x_k^2$ is nonnegative, so
\begin{align*}
|x_i|^2=x_i^2\le x_1^2+\cdots+x_n^2=|x|^2.
\end{align*}
Both $|x_i|$ and $|x|$ are nonnegative, so taking square roots gives $|x_i|\le |x|$ for every $i$. Hence the maximum of the coordinate absolute values also satisfies
\begin{align*}
\|Ix\|_\infty=\max_{1\le i\le n}|x_i|\le |x|.
\end{align*}
Therefore every $x$ with $|x|\le 1$ satisfies $\|Ix\|_\infty\le 1$, and the definition of the operator norm gives
\begin{align*}
\|I\|_{\mathcal L((\mathbb R^n,|\cdot|),(\mathbb R^n,\|\cdot\|_\infty))}\le 1.
\end{align*}
To see that the bound is sharp, take $e_1=(1,0,\ldots,0)$. Then
\begin{align*}
|e_1|=(1^2+0^2+\cdots+0^2)^{1/2}=1.
\end{align*}
Since $Ie_1=e_1$,
\begin{align*}
\|Ie_1\|_\infty=\max\{|1|,|0|,\ldots,|0|\}=1.
\end{align*}
Thus the supremum over the Euclidean unit ball is at least $1$. Combining this lower bound with the previous upper bound gives
\begin{align*}
\|I\|_{\mathcal L((\mathbb R^n,|\cdot|),(\mathbb R^n,\|\cdot\|_\infty))}=1.
\end{align*}
Now consider the reverse identity $J:(\mathbb R^n,\|\cdot\|_\infty)\to(\mathbb R^n,|\cdot|)$. For $x=(x_1,\ldots,x_n)$ and each coordinate $i$,
\begin{align*}
|x_i|\le \max_{1\le k\le n}|x_k|=\|x\|_\infty.
\end{align*}
Squaring this inequality gives
\begin{align*}
x_i^2=|x_i|^2\le \|x\|_\infty^2.
\end{align*}
Adding these $n$ coordinate inequalities yields
\begin{align*}
x_1^2+\cdots+x_n^2\le \|x\|_\infty^2+\cdots+\|x\|_\infty^2=n\|x\|_\infty^2.
\end{align*}
Since $Jx=x$, the left-hand side is $|Jx|^2$, so
\begin{align*}
|Jx|^2\le n\|x\|_\infty^2.
\end{align*}
Taking square roots, with both sides nonnegative, gives
\begin{align*}
|Jx|\le \sqrt n\,\|x\|_\infty.
\end{align*}
Therefore every $x$ with $\|x\|_\infty\le 1$ satisfies $|Jx|\le \sqrt n$, and hence
\begin{align*}
\|J\|_{\mathcal L((\mathbb R^n,\|\cdot\|_\infty),(\mathbb R^n,|\cdot|))}\le \sqrt n.
\end{align*}
For $u=(1,\ldots,1)$, every coordinate has absolute value $1$, so
\begin{align*}
\|u\|_\infty=\max\{|1|,\ldots,|1|\}=1.
\end{align*}
Since $Ju=u$,
\begin{align*}
|Ju|=(1^2+\cdots+1^2)^{1/2}=n^{1/2}=\sqrt n.
\end{align*}
Thus the supremum over the $\|\cdot\|_\infty$ unit ball is at least $\sqrt n$. Combining this lower bound with the upper bound gives
\begin{align*}
\|J\|_{\mathcal L((\mathbb R^n,\|\cdot\|_\infty),(\mathbb R^n,|\cdot|))}=\sqrt n.
\end{align*}
The underlying algebraic map is the identity in both directions, but its operator norm changes because the domain and codomain norms change.
[/example]
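Both computations can be checked by brute force in small dimensions. The sketch assumes one fact not used above: a convex function such as the Euclidean norm attains its maximum over the cube $\{\|x\|_\infty\le 1\}$ at a vertex, so it suffices to test the $2^n$ sign vectors. The function names are hypothetical, and the random trials only illustrate the two inequalities proved in the example.

```python
import itertools
import math
import random


def euclidean(x):
    return math.sqrt(sum(t * t for t in x))


def sup_norm(x):
    return max(abs(t) for t in x)


def norm_J(n):
    """||J|| for the identity (R^n, ||.||_inf) -> (R^n, |.|), maximised over the 2^n cube vertices."""
    return max(euclidean(v) for v in itertools.product((-1.0, 1.0), repeat=n))


def check_identity_bounds(n, trials=1000, seed=0):
    """Test ||x||_inf <= |x| <= sqrt(n) ||x||_inf, i.e. ||I|| <= 1 and ||J|| <= sqrt(n), on random x."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if sup_norm(x) > euclidean(x) + 1e-12:
            return False
        if euclidean(x) > math.sqrt(n) * sup_norm(x) + 1e-12:
            return False
    return True


for n in range(1, 6):
    print(n, norm_J(n), math.sqrt(n), check_identity_bounds(n))
```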
## Boundedness and Continuity
The operator norm is powerful because, for linear maps, a single global inequality controls all local behaviour. Nonlinear maps can be continuous at one point and badly behaved elsewhere; a linear map transports every local question back to the origin.
The next theorem explains why bounded linear maps are the continuous linear maps of normed-space analysis. It is the reason the notation $\mathcal L(X,Y)$ means bounded linear maps rather than all algebraic linear maps.
[quotetheorem:873]
This theorem turns the opening failure into a diagnosis. Differentiation was not continuous for the $C^0$ norm on the domain. If the domain norm is strengthened so that it records derivative size, the same formula becomes bounded.
[example: Differentiation with the $C^1$ Norm]
Let $X=C^1([0,1])$ with $\|f\|_{C^1}=\|f\|_{C^0}+\|f'\|_{C^0}$, let $Y=C([0,1])$ with $\|\cdot\|_{C^0}$, and define $T:X\to Y$ by $Tf=f'$. This is well-defined because every $f\in C^1([0,1])$ has continuous derivative $f'\in C([0,1])$. If $f,g\in X$ and $\alpha,\beta$ are scalars, then $\alpha f+\beta g\in C^1([0,1])$, and for every $x\in[0,1]$,
\begin{align*}
T(\alpha f+\beta g)(x)=(\alpha f+\beta g)'(x)=\alpha f'(x)+\beta g'(x)=\alpha Tf(x)+\beta Tg(x).
\end{align*}
Thus $T(\alpha f+\beta g)=\alpha Tf+\beta Tg$, so $T$ is linear.
For every $f\in X$,
\begin{align*}
\|Tf\|_{C^0}=\|f'\|_{C^0}.
\end{align*}
Since $\|f\|_{C^0}\ge 0$, we have
\begin{align*}
\|f'\|_{C^0}\le \|f\|_{C^0}+\|f'\|_{C^0}.
\end{align*}
Therefore
\begin{align*}
\|Tf\|_{C^0}\le \|f\|_{C^1}.
\end{align*}
So $T$ is bounded and $\|T\|_{\mathcal L(X,Y)}\le 1$.
To prove that the constant $1$ is sharp, fix an integer $n\ge 2$ and set
\begin{align*}
f_n(x)=\frac{\sin(nx)}{n}.
\end{align*}
For every $x\in[0,1]$, $|\sin(nx)|\le 1$, so
\begin{align*}
|f_n(x)|=\frac{|\sin(nx)|}{n}\le \frac{1}{n}.
\end{align*}
Hence $\|f_n\|_{C^0}\le 1/n$. Since $n\ge 2$, the point $x_n=\pi/(2n)$ belongs to $[0,1]$, because
\begin{align*}
0\le \frac{\pi}{2n}\le \frac{\pi}{4}<1.
\end{align*}
At this point,
\begin{align*}
|f_n(x_n)|=\frac{|\sin(n\pi/(2n))|}{n}=\frac{|\sin(\pi/2)|}{n}=\frac{1}{n}.
\end{align*}
Thus $\|f_n\|_{C^0}\ge 1/n$, and combining the two inequalities gives
\begin{align*}
\|f_n\|_{C^0}=\frac{1}{n}.
\end{align*}
Differentiating $f_n$ gives, for every $x\in[0,1]$,
\begin{align*}
f_n'(x)=\frac{n\cos(nx)}{n}=\cos(nx).
\end{align*}
Since $|\cos(nx)|\le 1$ for every $x\in[0,1]$, we have $\|f_n'\|_{C^0}\le 1$. At $x=0$,
\begin{align*}
|f_n'(0)|=|\cos(0)|=1,
\end{align*}
so $\|f_n'\|_{C^0}\ge 1$. Hence
\begin{align*}
\|f_n'\|_{C^0}=1.
\end{align*}
Combining the two norm computations,
\begin{align*}
\|f_n\|_{C^1}=\|f_n\|_{C^0}+\|f_n'\|_{C^0}=\frac{1}{n}+1.
\end{align*}
Define
\begin{align*}
h_n=\frac{f_n}{\|f_n\|_{C^1}}.
\end{align*}
Since $\|f_n\|_{C^1}=1+1/n>0$, homogeneity of the norm gives
\begin{align*}
\|h_n\|_{C^1}=\left\|\frac{f_n}{\|f_n\|_{C^1}}\right\|_{C^1}=\frac{1}{\|f_n\|_{C^1}}\|f_n\|_{C^1}=1.
\end{align*}
By linearity of $T$,
\begin{align*}
Th_n=T\left(\frac{f_n}{\|f_n\|_{C^1}}\right)=\frac{Tf_n}{\|f_n\|_{C^1}}.
\end{align*}
Using homogeneity of the $C^0$ norm and $Tf_n=f_n'$,
\begin{align*}
\|Th_n\|_{C^0}=\left\|\frac{Tf_n}{\|f_n\|_{C^1}}\right\|_{C^0}=\frac{\|Tf_n\|_{C^0}}{\|f_n\|_{C^1}}=\frac{\|f_n'\|_{C^0}}{1+1/n}.
\end{align*}
Since $\|f_n'\|_{C^0}=1$ and $1+1/n=(n+1)/n$,
\begin{align*}
\|Th_n\|_{C^0}=\frac{1}{(n+1)/n}=\frac{n}{n+1}.
\end{align*}
Each $h_n$ has $C^1$ norm $1$, so the definition of the operator norm gives
\begin{align*}
\|T\|_{\mathcal L(X,Y)}\ge \frac{n}{n+1}
\end{align*}
for every integer $n\ge 2$. Since
\begin{align*}
\frac{n}{n+1}=1-\frac{1}{n+1}
\end{align*}
and $1/(n+1)\to 0$, these lower bounds tend to $1$. Therefore $\|T\|_{\mathcal L(X,Y)}\ge 1$. Together with the earlier upper bound $\|T\|_{\mathcal L(X,Y)}\le 1$, this proves
\begin{align*}
\|T\|_{\mathcal L(X,Y)}=1.
\end{align*}
The derivative term in the $C^1$ norm exactly controls the output size of differentiation, so the same differentiation map that was unbounded for the $C^0$ domain norm becomes a bounded operator of norm $1$.
[/example]
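A grid-based sanity check shows the ratios settling near $n/(n+1)$ once the domain carries the $C^1$ norm. This is a sketch under the assumption that a fine uniform grid on $[0,1]$ approximates each supremum well; the names `sup_on_grid` and `c1_amplification` are hypothetical.

```python
import math


def sup_on_grid(func, num_points=20001):
    """Approximate sup_{x in [0,1]} |func(x)| on a uniform grid (hypothetical helper)."""
    return max(abs(func(k / (num_points - 1))) for k in range(num_points))


def c1_amplification(n):
    """Ratio ||T f_n||_{C^0} / ||f_n||_{C^1} for f_n(x) = sin(n x) / n and T f = f'."""
    c0 = sup_on_grid(lambda x: math.sin(n * x) / n)
    derivative = sup_on_grid(lambda x: math.cos(n * x))
    return derivative / (c0 + derivative)


for n in (2, 5, 20, 100):
    # Each ratio stays below 1 and is close to n / (n + 1).
    print(n, round(c1_amplification(n), 6), round(n / (n + 1), 6))
```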
A linear operator is determined by what it does to the unit ball, because every nonzero vector is a scalar multiple of a unit vector. This suggests a geometric test for boundedness: instead of checking every vector separately, check whether the whole image of the unit ball remains bounded.
[quotetheorem:9322]
The unit-ball viewpoint is the doorway to compact operators, where the image of the unit ball is required to have compact closure rather than merely finite diameter.
## Algebra of Bounded Operators
### The Space $\mathcal L(X,Y)$
If bounded operators are to be objects of analysis, they must form their own normed spaces. We want to approximate operators by simpler operators, add error terms, and take limits of operator sequences, all while retaining boundedness.
[definition: Space of Bounded Linear Operators]
Let $X$ and $Y$ be normed vector spaces over the same scalar field. The space of bounded linear operators from $X$ to $Y$ is
\begin{align*}
\mathcal L(X,Y):=\{T:X\to Y:T\text{ is linear and bounded}\}.
\end{align*}
When $X=Y$, write $\mathcal L(X):=\mathcal L(X,X)$.
[/definition]
Once $\mathcal L(X,Y)$ has been named, there is a structural problem to solve: addition and scalar multiplication of operators are defined pointwise, but boundedness could in principle be lost under those operations, and the supremum formula for $\|T\|$ still has to satisfy the norm axioms. The needed result verifies that these operations stay inside $\mathcal L(X,Y)$ and that operator size behaves like vector size.
[quotetheorem:9323]
Convergence in this norm is [uniform convergence](/page/Uniform%20Convergence) on the unit ball. That strength is exactly what is needed when operators are composed, inverted, or inserted into estimates.
### Composition and Multiplication
Operators can be multiplied by composition. A norm useful for operator theory must respect this multiplication: applying two bounded maps in succession should have a controlled amplification factor.
[quotetheorem:1054]
This is the operator analogue of $|ab|\le |a||b|$. It is the estimate that allows products, powers, and series of operators to be treated quantitatively.
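For matrices the inequality can be tested directly. The sketch uses a standard formula that is not derived on this page: when $\mathbb R^n$ carries $\|\cdot\|_\infty$, the operator norm of a matrix is its largest absolute row sum. The function names are hypothetical, and the random trials are a sanity check rather than a proof.

```python
import random


def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def norm_inf(A):
    """Operator norm on (R^n, ||.||_inf): the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def check_submultiplicative(size=3, trials=2000, seed=1):
    """Return True if ||ST|| <= ||S|| ||T|| holds on every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        S = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
        T = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
        if norm_inf(matmul(S, T)) > norm_inf(S) * norm_inf(T) + 1e-12:
            return False
    return True


# The inequality can be strict: here ST = 0 although ||S|| = ||T|| = 1.
S = [[1.0, 0.0], [0.0, 0.0]]
T = [[0.0, 0.0], [0.0, 1.0]]
print(check_submultiplicative(), norm_inf(matmul(S, T)), norm_inf(S) * norm_inf(T))
```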
[example: Projection Matrix]
Let $P:\mathbb R^2\to\mathbb R^2$ be defined by $P(x_1,x_2)=(x_1,0)$, and put the Euclidean norm on both copies of $\mathbb R^2$. We compute the operator norm of $P$ and verify that applying $P$ twice does not change the operator.
First, $P$ is linear. If $x=(x_1,x_2)$, $y=(y_1,y_2)$, and $\alpha,\beta$ are scalars, then
\begin{align*}
\alpha x+\beta y=(\alpha x_1+\beta y_1,\alpha x_2+\beta y_2).
\end{align*}
Applying $P$ gives
\begin{align*}
P(\alpha x+\beta y)=P(\alpha x_1+\beta y_1,\alpha x_2+\beta y_2)=(\alpha x_1+\beta y_1,0).
\end{align*}
On the other hand,
\begin{align*}
Px=(x_1,0)
\end{align*}
and
\begin{align*}
Py=(y_1,0).
\end{align*}
Therefore
\begin{align*}
\alpha Px+\beta Py=\alpha(x_1,0)+\beta(y_1,0)=(\alpha x_1,0)+(\beta y_1,0)=(\alpha x_1+\beta y_1,0).
\end{align*}
The two expressions agree, so $P(\alpha x+\beta y)=\alpha Px+\beta Py$.
For $x=(x_1,x_2)$, we have $Px=(x_1,0)$, so
\begin{align*}
|Px|=\bigl(x_1^2+0^2\bigr)^{1/2}=(x_1^2)^{1/2}=|x_1|.
\end{align*}
Since $x_2^2\ge 0$,
\begin{align*}
x_1^2\le x_1^2+x_2^2.
\end{align*}
Both sides are nonnegative, so taking square roots preserves the inequality:
\begin{align*}
(x_1^2)^{1/2}\le (x_1^2+x_2^2)^{1/2}.
\end{align*}
Thus
\begin{align*}
|Px|=|x_1|\le (x_1^2+x_2^2)^{1/2}=|x|.
\end{align*}
Hence every $x$ with $|x|\le 1$ satisfies $|Px|\le 1$, and the definition of the operator norm gives
\begin{align*}
\|P\|_{\mathcal L(\mathbb R^2,\mathbb R^2)}\le 1.
\end{align*}
To prove that this bound is sharp, take $e_1=(1,0)$. Then
\begin{align*}
|e_1|=(1^2+0^2)^{1/2}=1.
\end{align*}
Also,
\begin{align*}
Pe_1=P(1,0)=(1,0)=e_1,
\end{align*}
so
\begin{align*}
|Pe_1|=|e_1|=1.
\end{align*}
Because $e_1$ belongs to the Euclidean unit ball, the supremum defining the operator norm is at least $1$:
\begin{align*}
\|P\|_{\mathcal L(\mathbb R^2,\mathbb R^2)}\ge |Pe_1|=1.
\end{align*}
Combining the upper and lower bounds gives
\begin{align*}
\|P\|_{\mathcal L(\mathbb R^2,\mathbb R^2)}=1.
\end{align*}
Finally, for every $x=(x_1,x_2)$,
\begin{align*}
P^2x=P(Px)=P(x_1,0)=(x_1,0)=Px.
\end{align*}
Thus $P^2=P$ as operators. Since the two operators are equal, their operator norms are equal, and therefore
\begin{align*}
\|P^2\|_{\mathcal L(\mathbb R^2,\mathbb R^2)}=\|P\|_{\mathcal L(\mathbb R^2,\mathbb R^2)}=1.
\end{align*}
This projection keeps the first coordinate and discards the second, so applying it twice has exactly the same effect as applying it once.
[/example]
### Completeness and Series
Infinite sums of operators occur in perturbation theory and spectral theory. To make sense of such sums we need the operator space to be complete, and this is guaranteed whenever the codomain is complete. The relevant norm on $\mathcal L(X,Y)$ is
\begin{align*}
\|T\|_{\mathcal L(X,Y)}=\sup\{\|Tx\|_Y : x\in X,\ \|x\|_X\le 1\},
\end{align*}
so completeness asks whether Cauchy sequences of bounded operators converge to bounded operators in this norm. When $Y$ is complete, the answer is yes: the space $\mathcal L(X,Y)$ is complete under the operator norm.
Completeness lets norm estimates produce actual operators as limits. The most important first example is the operator-valued geometric series, where the powers of an operator must be summed in the operator norm rather than only pointwise on vectors. The next theorem is the basic mechanism behind many inverse and perturbation arguments.
[quotetheorem:8545]
The Neumann series turns a bound smaller than $1$ into an inverse. This principle reappears in fixed-point arguments, integral equations, and perturbations of invertible maps.
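A minimal sketch of the geometric series in action, assuming the standard hypothesis $\|T\|<1$ in some operator norm; here that norm is the largest absolute row sum for $\|\cdot\|_\infty$, a formula not derived on this page. Summing the powers of a $2\times 2$ matrix reproduces $(I-T)^{-1}$. The function names are hypothetical.

```python
def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(T, terms=200):
    """Partial sum I + T + ... + T^(terms-1), approximating (I - T)^{-1} when ||T|| < 1."""
    total = identity(len(T))
    power = identity(len(T))
    for _ in range(1, terms):
        power = matmul(power, T)
        total = matadd(total, power)
    return total


T = [[0.2, 0.3], [0.1, 0.4]]  # every absolute row sum is 0.5, so ||T||_inf = 0.5 < 1
S = neumann_inverse(T)
I_minus_T = matadd(identity(2), [[-t for t in row] for row in T])
print(matmul(I_minus_T, S))  # should be the identity matrix up to rounding
```

With $\|T\|=1/2$, the neglected tail is bounded by $\|T\|^{200}/(1-\|T\|)$, far below rounding error.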
## Computing Operator Norms
### Finite-Dimensional Computations
Computing an operator norm means solving an optimization problem on the unit ball. In finite-dimensional Euclidean spaces, compactness guarantees that the maximum is attained, and the answer is governed by the largest singular value.
[quotetheorem:9324]
Diagonal maps show the formula in its most transparent form: each coordinate is stretched independently, so the largest coordinate stretch wins.
[example: Diagonal Matrices]
Let $A:\mathbb R^n\to\mathbb R^n$ be the diagonal map
\begin{align*}
A(x_1,\ldots,x_n)=(a_1x_1,\ldots,a_nx_n),
\end{align*}
with the Euclidean norm on both domain and codomain. We compute its operator norm.
First, $A$ is linear. If $x=(x_1,\ldots,x_n)$, $y=(y_1,\ldots,y_n)$, and $\alpha,\beta\in\mathbb R$, then
\begin{align*}
\alpha x+\beta y=(\alpha x_1+\beta y_1,\ldots,\alpha x_n+\beta y_n).
\end{align*}
Applying $A$ gives
\begin{align*}
A(\alpha x+\beta y)=A(\alpha x_1+\beta y_1,\ldots,\alpha x_n+\beta y_n).
\end{align*}
By the definition of $A$,
\begin{align*}
A(\alpha x+\beta y)=(a_1(\alpha x_1+\beta y_1),\ldots,a_n(\alpha x_n+\beta y_n)).
\end{align*}
Distributing in each coordinate,
\begin{align*}
A(\alpha x+\beta y)=(\alpha a_1x_1+\beta a_1y_1,\ldots,\alpha a_nx_n+\beta a_ny_n).
\end{align*}
Also,
\begin{align*}
\alpha Ax=\alpha(a_1x_1,\ldots,a_nx_n)=(\alpha a_1x_1,\ldots,\alpha a_nx_n)
\end{align*}
and
\begin{align*}
\beta Ay=\beta(a_1y_1,\ldots,a_ny_n)=(\beta a_1y_1,\ldots,\beta a_ny_n).
\end{align*}
Adding these coordinatewise gives
\begin{align*}
\alpha Ax+\beta Ay=(\alpha a_1x_1+\beta a_1y_1,\ldots,\alpha a_nx_n+\beta a_ny_n).
\end{align*}
Thus $A(\alpha x+\beta y)=\alpha Ax+\beta Ay$, so $A$ is linear.
Set
\begin{align*}
M=\max_{1\le i\le n}|a_i|.
\end{align*}
For $x=(x_1,\ldots,x_n)$,
\begin{align*}
Ax=(a_1x_1,\ldots,a_nx_n).
\end{align*}
Therefore the Euclidean norm gives
\begin{align*}
|Ax|^2=|a_1x_1|^2+\cdots+|a_nx_n|^2.
\end{align*}
For each $i$, the definition of $M$ gives $|a_i|\le M$, and since $|x_i|^2\ge 0$,
\begin{align*}
|a_ix_i|^2=|a_i|^2|x_i|^2\le M^2|x_i|^2.
\end{align*}
Adding these $n$ inequalities,
\begin{align*}
|Ax|^2\le M^2|x_1|^2+\cdots+M^2|x_n|^2.
\end{align*}
Factoring out $M^2$,
\begin{align*}
M^2|x_1|^2+\cdots+M^2|x_n|^2=M^2(|x_1|^2+\cdots+|x_n|^2).
\end{align*}
By the definition of the Euclidean norm,
\begin{align*}
|x|^2=|x_1|^2+\cdots+|x_n|^2.
\end{align*}
Hence
\begin{align*}
|Ax|^2\le M^2|x|^2.
\end{align*}
Since $|Ax|$, $M$, and $|x|$ are nonnegative, taking square roots gives
\begin{align*}
|Ax|\le M|x|.
\end{align*}
Thus every $x$ with $|x|\le 1$ satisfies $|Ax|\le M$, so
\begin{align*}
\|A\|_{\mathcal L(\mathbb R^n,\mathbb R^n)}\le M.
\end{align*}
Choose an index $j$ such that $|a_j|=M$; this exists because the maximum is taken over the finite set $\{|a_1|,\ldots,|a_n|\}$. Let $e_j=(0,\ldots,0,1,0,\ldots,0)$ be the $j$th standard basis vector. Then
\begin{align*}
|e_j|^2=0^2+\cdots+0^2+1^2+0^2+\cdots+0^2=1.
\end{align*}
Since $|e_j|\ge 0$,
\begin{align*}
|e_j|=1.
\end{align*}
Also,
\begin{align*}
Ae_j=(0,\ldots,0,a_j,0,\ldots,0).
\end{align*}
Therefore
\begin{align*}
|Ae_j|^2=0^2+\cdots+0^2+|a_j|^2+0^2+\cdots+0^2=|a_j|^2.
\end{align*}
Since $|Ae_j|\ge 0$ and $|a_j|\ge 0$,
\begin{align*}
|Ae_j|=|a_j|=M.
\end{align*}
The vector $e_j$ lies in the Euclidean unit ball, so the supremum defining the operator norm is at least the value at $e_j$:
\begin{align*}
\|A\|_{\mathcal L(\mathbb R^n,\mathbb R^n)}\ge |Ae_j|=M.
\end{align*}
Combining the upper and lower bounds,
\begin{align*}
\|A\|_{\mathcal L(\mathbb R^n,\mathbb R^n)}=M=\max_{1\le i\le n}|a_i|.
\end{align*}
For a diagonal operator on Euclidean space, the operator norm is exactly the largest coordinate stretch.
[/example]
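Outside the diagonal case the largest singular value is usually found numerically. The sketch below uses power iteration on $A^{\mathsf T}A$, a standard technique rather than one taken from this page. It assumes a random starting vector that is not orthogonal to the top singular direction. On a diagonal matrix it recovers $\max_i|a_i|$, as computed above. The names are hypothetical.

```python
import math
import random


def matvec(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def spectral_norm(A, iterations=500, seed=0):
    """Estimate the Euclidean operator norm of a real matrix by power iteration on A^T A."""
    rng = random.Random(seed)
    At = transpose(A)
    x = [rng.uniform(-1.0, 1.0) for _ in range(len(A[0]))]
    for _ in range(iterations):
        y = matvec(At, matvec(A, x))
        size = math.sqrt(sum(t * t for t in y))
        if size == 0.0:
            # x lies in the kernel of A; with a random start this essentially only happens when A = 0.
            return 0.0
        x = [t / size for t in y]
    # x is now approximately a unit top right-singular vector, so |Ax| estimates ||A||.
    return math.sqrt(sum(t * t for t in matvec(A, x)))


print(spectral_norm([[3.0, 0.0, 0.0], [0.0, -5.0, 0.0], [0.0, 0.0, 1.0]]))  # approximately max |a_i| = 5
print(spectral_norm([[1.0, 2.0], [3.0, 4.0]]))
```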
### Dual Norms
Scalar-valued bounded linear maps are important enough to have their own name: they are continuous linear functionals. Their operator norm is the dual norm, and it measures how strongly a functional can read a unit vector.
[definition: Dual Norm]
Let $X$ be a normed vector space over $\mathbb R$ or $\mathbb C$, and let $\mathbb F$ denote its scalar field. The continuous [dual space](/page/Dual%20Space) is
\begin{align*}
X^*:=\mathcal L(X,\mathbb F).
\end{align*}
For a bounded linear functional $f:X\to\mathbb F$, the dual norm of $f$ is
\begin{align*}
\|f\|_{X^*}:=\sup\{|f(x)|:x\in X,\ \|x\|_X\le 1\}.
\end{align*}
[/definition]
The dual norm packages inequalities such as Hölder's inequality into operator language. When equality cases are understood, it often gives exact norms rather than estimates.
[example: Integral Functional on $L^p$]
Let $(E,\mathcal E,\mu)$ be a [measure space](/page/Measure%20Space), let $1<p<\infty$, and let $p'$ satisfy $1/p+1/p'=1$. For $g\in L^{p'}(E,\mathcal E,\mu)$, define
\begin{align*}
F_g(f)=\int_E f\overline g\,d\mu
\end{align*}
for $f\in L^p(E,\mathcal E,\mu)$. By *Hölder's inequality*, $|f||g|\in L^1$ and
\begin{align*}
\int_E |f||g|\,d\mu\le \|f\|_{L^p}\|g\|_{L^{p'}}.
\end{align*}
Since $|f\overline g|=|f||g|$, the integral defining $F_g(f)$ is finite.
If $f,h\in L^p(E,\mathcal E,\mu)$ and $\alpha,\beta\in\mathbb C$, then $\alpha f+\beta h\in L^p(E,\mathcal E,\mu)$, and linearity of the integral gives
\begin{align*}
F_g(\alpha f+\beta h)=\int_E(\alpha f+\beta h)\overline g\,d\mu.
\end{align*}
Expanding the integrand,
\begin{align*}
(\alpha f+\beta h)\overline g=\alpha f\overline g+\beta h\overline g.
\end{align*}
Therefore
\begin{align*}
F_g(\alpha f+\beta h)=\alpha\int_E f\overline g\,d\mu+\beta\int_E h\overline g\,d\mu=\alpha F_g(f)+\beta F_g(h).
\end{align*}
Thus $F_g$ is linear. Also,
\begin{align*}
|F_g(f)|=\left|\int_E f\overline g\,d\mu\right|\le \int_E |f\overline g|\,d\mu=\int_E |f||g|\,d\mu.
\end{align*}
Using Hölder's inequality again,
\begin{align*}
|F_g(f)|\le \|f\|_{L^p}\|g\|_{L^{p'}}.
\end{align*}
Hence $F_g$ is bounded and
\begin{align*}
\|F_g\|_{(L^p)^*}\le \|g\|_{L^{p'}}.
\end{align*}
If $g=0$ in $L^{p'}$, then $F_g(f)=0$ for every $f$, so $\|F_g\|_{(L^p)^*}=0=\|g\|_{L^{p'}}$. Now assume $g\ne 0$. Define
\begin{align*}
f_0(x)=|g(x)|^{p'-2}g(x)
\end{align*}
where $g(x)\ne 0$, and set $f_0(x)=0$ where $g(x)=0$. Then $f_0$ is measurable, and for every $x$,
\begin{align*}
|f_0(x)|=|g(x)|^{p'-1}.
\end{align*}
Thus
\begin{align*}
|f_0(x)|^p=|g(x)|^{(p'-1)p}.
\end{align*}
From $1/p+1/p'=1$ we get
\begin{align*}
\frac{1}{p'}=1-\frac{1}{p}=\frac{p-1}{p}.
\end{align*}
Hence
\begin{align*}
p'=\frac{p}{p-1}.
\end{align*}
Therefore
\begin{align*}
p'-1=\frac{p}{p-1}-1=\frac{1}{p-1}.
\end{align*}
Multiplying by $p$ gives
\begin{align*}
(p'-1)p=\frac{p}{p-1}=p'.
\end{align*}
Consequently,
\begin{align*}
\int_E |f_0|^p\,d\mu=\int_E |g|^{p'}\,d\mu<\infty.
\end{align*}
So $f_0\in L^p(E,\mathcal E,\mu)$, and
\begin{align*}
\|f_0\|_{L^p}=\left(\int_E |g|^{p'}\,d\mu\right)^{1/p}.
\end{align*}
Since
\begin{align*}
\|g\|_{L^{p'}}=\left(\int_E |g|^{p'}\,d\mu\right)^{1/p'},
\end{align*}
raising both sides to the power $p'/p$ gives
\begin{align*}
\|g\|_{L^{p'}}^{p'/p}=\left(\int_E |g|^{p'}\,d\mu\right)^{1/p}.
\end{align*}
Thus
\begin{align*}
\|f_0\|_{L^p}=\|g\|_{L^{p'}}^{p'/p}.
\end{align*}
Next,
\begin{align*}
F_g(f_0)=\int_E |g|^{p'-2}g\overline g\,d\mu.
\end{align*}
Because $g\overline g=|g|^2$, the integrand equals $|g|^{p'-2}|g|^2=|g|^{p'}$ on $\{g\ne 0\}$ and equals $0$ on $\{g=0\}$. Hence
\begin{align*}
F_g(f_0)=\int_E |g|^{p'}\,d\mu=\|g\|_{L^{p'}}^{p'}.
\end{align*}
Since $g\ne 0$, $\|g\|_{L^{p'}}>0$, and therefore $\|f_0\|_{L^p}>0$. Define
\begin{align*}
u_0=\frac{f_0}{\|f_0\|_{L^p}}.
\end{align*}
By homogeneity of the $L^p$ norm,
\begin{align*}
\|u_0\|_{L^p}=\frac{\|f_0\|_{L^p}}{\|f_0\|_{L^p}}=1.
\end{align*}
By linearity of $F_g$,
\begin{align*}
F_g(u_0)=\frac{F_g(f_0)}{\|f_0\|_{L^p}}.
\end{align*}
Also, multiplying $1/p+1/p'=1$ by $p'$ gives
\begin{align*}
\frac{p'}{p}+1=p'.
\end{align*}
Thus
\begin{align*}
\frac{p'}{p}=p'-1.
\end{align*}
Using the formulas for $F_g(f_0)$ and $\|f_0\|_{L^p}$,
\begin{align*}
|F_g(u_0)|=\frac{\|g\|_{L^{p'}}^{p'}}{\|g\|_{L^{p'}}^{p'/p}}=\frac{\|g\|_{L^{p'}}^{p'}}{\|g\|_{L^{p'}}^{p'-1}}=\|g\|_{L^{p'}}.
\end{align*}
Since $u_0$ has $L^p$ norm $1$, the supremum defining the dual norm gives
\begin{align*}
\|F_g\|_{(L^p)^*}\ge \|g\|_{L^{p'}}.
\end{align*}
Combining this lower bound with the earlier upper bound gives
\begin{align*}
\|F_g\|_{(L^p)^*}=\|g\|_{L^{p'}}.
\end{align*}
Thus integration against $g$ defines a bounded linear functional on $L^p$, and its operator norm is exactly the $L^{p'}$ norm of $g$.
[/example]
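The equality case can be checked on a finite measure space, where integrals become weighted sums. The sketch assumes $E=\{0,\ldots,m-1\}$ with $\mu(\{i\})=w_i>0$, builds $u_0$ exactly as in the example, and also tests Hölder's bound on random complex $f$. The helper names are hypothetical.

```python
import random


def lp_norm(f, weights, p):
    """||f||_{L^p(mu)} for the discrete measure mu({i}) = weights[i]."""
    return sum(w * abs(v) ** p for v, w in zip(f, weights)) ** (1.0 / p)


def pairing(f, g, weights):
    """F_g(f) = integral of f * conj(g) d mu."""
    return sum(w * v * complex(u).conjugate() for v, u, w in zip(f, g, weights))


def extremal_unit_vector(g, weights, p):
    """u_0 = f_0 / ||f_0||_p with f_0 = |g|^{p'-2} g on {g != 0}, 0 elsewhere; assumes g != 0."""
    q = p / (p - 1)  # the conjugate exponent p'
    f0 = [abs(u) ** (q - 2) * u if u != 0 else 0.0 for u in g]
    size = lp_norm(f0, weights, p)
    return [v / size for v in f0]


def check_holder(g, weights, p, trials=500, seed=2):
    """Return True if |F_g(f)| <= ||f||_p ||g||_{p'} for random complex f."""
    rng = random.Random(seed)
    q = p / (p - 1)
    for _ in range(trials):
        f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in g]
        if abs(pairing(f, g, weights)) > lp_norm(f, weights, p) * lp_norm(g, weights, q) + 1e-12:
            return False
    return True


p = 3.0
weights = [0.5, 1.0, 2.0, 0.25]
g = [1.0, -2.0, 0.0, 3 + 1j]
u0 = extremal_unit_vector(g, weights, p)
print(lp_norm(u0, weights, p))  # should be 1 up to rounding
print(abs(pairing(u0, g, weights)), lp_norm(g, weights, p / (p - 1)))  # should agree up to rounding
print(check_holder(g, weights, p))
```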
### Supremum Without Attainment
Infinite-dimensional spaces add a new phenomenon: the supremum in the operator norm may fail to be achieved. This is why norm computations often use maximizing sequences rather than maximizers.
[example: A Norm Not Attained]
Let $T:\ell^2\to\ell^2$ be defined by
\begin{align*}
T(x_1,x_2,\ldots)=\left(\frac{1}{2}x_1,\frac{2}{3}x_2,\frac{3}{4}x_3,\ldots\right).
\end{align*}
For $x=(x_1,x_2,\ldots)\in\ell^2$, the $n$th coordinate of $Tx$ is $(n/(n+1))x_n$. Since $0<n/(n+1)<1$ for every $n\ge 1$,
\begin{align*}
\left|\frac{n}{n+1}x_n\right|^2=\left(\frac{n}{n+1}\right)^2|x_n|^2\le |x_n|^2.
\end{align*}
For every $N\ge 1$, summing the first $N$ inequalities gives
\begin{align*}
\sum_{n=1}^{N}\left|\frac{n}{n+1}x_n\right|^2\le \sum_{n=1}^{N}|x_n|^2\le \sum_{n=1}^{\infty}|x_n|^2=\|x\|_{\ell^2}^2.
\end{align*}
The partial sums on the left are increasing and bounded above, so $Tx\in\ell^2$ and
\begin{align*}
\|Tx\|_{\ell^2}^2=\sum_{n=1}^{\infty}\left(\frac{n}{n+1}\right)^2|x_n|^2\le \|x\|_{\ell^2}^2.
\end{align*}
Taking square roots of nonnegative quantities gives
\begin{align*}
\|Tx\|_{\ell^2}\le \|x\|_{\ell^2}.
\end{align*}
The map is linear because scalar multiplication and addition occur coordinatewise: for $x,y\in\ell^2$ and scalars $\alpha,\beta$, the $n$th coordinate of $T(\alpha x+\beta y)$ is
\begin{align*}
\frac{n}{n+1}(\alpha x_n+\beta y_n)=\alpha\frac{n}{n+1}x_n+\beta\frac{n}{n+1}y_n,
\end{align*}
which is the $n$th coordinate of $\alpha Tx+\beta Ty$. Thus $T$ is bounded and
\begin{align*}
\|T\|_{\mathcal L(\ell^2)}\le 1.
\end{align*}
For the standard unit vector $e_n=(0,\ldots,0,1,0,\ldots)$, whose only nonzero coordinate is the $n$th coordinate,
\begin{align*}
\|e_n\|_{\ell^2}^2=1.
\end{align*}
Also,
\begin{align*}
Te_n=\left(0,\ldots,0,\frac{n}{n+1},0,\ldots\right),
\end{align*}
so
\begin{align*}
\|Te_n\|_{\ell^2}^2=\left(\frac{n}{n+1}\right)^2.
\end{align*}
Since $n/(n+1)>0$,
\begin{align*}
\|Te_n\|_{\ell^2}=\frac{n}{n+1}.
\end{align*}
Each $e_n$ has $\ell^2$ norm $1$, so the supremum defining the operator norm satisfies
\begin{align*}
\|T\|_{\mathcal L(\ell^2)}\ge \frac{n}{n+1}
\end{align*}
for every $n\ge 1$. Since
\begin{align*}
\frac{n}{n+1}=1-\frac{1}{n+1},
\end{align*}
the numbers $n/(n+1)$ tend to $1$. Therefore $\|T\|_{\mathcal L(\ell^2)}\ge 1$, and combining this with the upper bound gives
\begin{align*}
\|T\|_{\mathcal L(\ell^2)}=1.
\end{align*}
Now suppose that a unit vector $x=(x_1,x_2,\ldots)\in\ell^2$ attained the norm, so that $\|x\|_{\ell^2}=1$ and $\|Tx\|_{\ell^2}=1$. Then
\begin{align*}
0=\|x\|_{\ell^2}^2-\|Tx\|_{\ell^2}^2=\sum_{n=1}^{\infty}|x_n|^2-\sum_{n=1}^{\infty}\left(\frac{n}{n+1}\right)^2|x_n|^2.
\end{align*}
Because each term satisfies $\left(n/(n+1)\right)^2|x_n|^2\le |x_n|^2$, this difference is the sum of nonnegative terms:
\begin{align*}
0=\sum_{n=1}^{\infty}\left(1-\left(\frac{n}{n+1}\right)^2\right)|x_n|^2.
\end{align*}
For each fixed $k$, the $k$th term is nonnegative and is bounded above by the whole sum, so
\begin{align*}
\left(1-\left(\frac{k}{k+1}\right)^2\right)|x_k|^2=0.
\end{align*}
Since $0<k/(k+1)<1$, we have
\begin{align*}
1-\left(\frac{k}{k+1}\right)^2>0.
\end{align*}
Thus $|x_k|^2=0$ for every $k$, so $x=0$. This contradicts $\|x\|_{\ell^2}=1$. Hence the operator norm is $1$, but no unit vector attains it; the vectors $e_n$ approach the norm through the values $n/(n+1)\to 1$.
[/example]
This example is a compact lesson in infinite-dimensional thinking: the best constant can exist even when no best vector does.
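Finite sections make the phenomenon concrete. Restricted to $\operatorname{span}\{e_1,\ldots,e_N\}$, the operator is the diagonal matrix with entries $n/(n+1)$. By the diagonal-matrix computation, its norm there is $N/(N+1)$, so every finite section falls short of $1$. A minimal sketch with hypothetical names, checking finitely supported vectors only:

```python
import math
import random


def section_norm(N):
    """Norm of T on span{e_1, ..., e_N}: the largest diagonal entry n/(n+1), namely N/(N+1)."""
    return max(n / (n + 1) for n in range(1, N + 1))


def amplification(x):
    """||Tx|| / ||x|| for the finitely supported vector x = (x_1, ..., x_N, 0, 0, ...), x != 0."""
    tx = [n / (n + 1) * v for n, v in enumerate(x, start=1)]
    return math.sqrt(sum(t * t for t in tx)) / math.sqrt(sum(v * v for v in x))


rng = random.Random(3)
for N in (1, 10, 100, 1000):
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]
    # Each ratio is strictly below 1, while the section norms creep up toward 1.
    print(N, section_norm(N), amplification(x))
```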
## Hilbert Space Structure
### Adjoints
Hilbert spaces add inner products to the story. This extra geometry lets an operator be moved from one side of the [inner product](/page/Inner%20Product) to the other, producing adjoints and sharper norm identities. The existence of the adjoint is a genuine theorem, supplied by the [Riesz representation theorem](/theorems/218): for each fixed $y\in K$, boundedness of $T$ makes $x\mapsto (Tx,y)_K$ a bounded linear functional on $H$, and $T^*y$ is its representing vector.
[definition: Hilbert Space Adjoint]
Let $H$ and $K$ be Hilbert spaces over the same scalar field, with inner products linear in the first argument. For $T\in\mathcal L(H,K)$, the Hilbert space adjoint of $T$ is the unique operator $T^*\in\mathcal L(K,H)$ satisfying
\begin{align*}
(Tx,y)_K=(x,T^*y)_H
\end{align*}
for all $x\in H$ and $y\in K$.
[/definition]
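In finite dimensions, with the standard inner product $(u,v)=\sum_i u_i\overline{v_i}$ (linear in the first argument, as in the definition), the adjoint of a matrix is its conjugate transpose. The sketch below checks the defining identity $(Tx,y)=(x,T^*y)$ for one hypothetical $3\times 2$ complex matrix $T:\mathbb C^2\to\mathbb C^3$; the matrix, vectors, and helper names are illustrative choices.

```python
def inner(u, v):
    """Inner product on C^n, linear in the first argument: sum u_i * conj(v_i)."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


def apply(M, x):
    """Matrix-vector product, with M stored as a list of rows."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def adjoint(M):
    """Conjugate transpose: the adjoint for the standard inner products."""
    return [[complex(M[i][j]).conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]


# T : C^2 -> C^3, so T^* : C^3 -> C^2.
T = [[1 + 2j, 3], [0, -1j], [2, 1 - 1j]]
x = [2 - 1j, 1j]
y = [1, -2 + 1j, 3j]

lhs = inner(apply(T, x), y)           # (Tx, y) computed in C^3
rhs = inner(x, apply(adjoint(T), y))  # (x, T^* y) computed in C^2
```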
Adjoints are useful in estimates because passing to the adjoint does not change the operator-norm scale. The following identities convert norm questions about $T$ into norm questions about $T^*$ and $T^*T$.
[quotetheorem:9325]
Submultiplicativity alone gives only $\|T^*T\|\le\|T^*\|\|T\|=\|T\|^2$; the identity $\|T^*T\|=\|T\|^2$ upgrades this inequality to an equality. It is one of the places where Hilbert space geometry gives additional structure beyond Banach space theory.
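A numerical sketch for a real matrix acting on $\mathbb R^2$ with the Euclidean norm, where the adjoint is the transpose. It estimates the operator norm by power iteration on $A^TA$, using the fact that $\|A\|^2$ is the largest eigenvalue of $A^TA$. The matrix and helper names are illustrative, and the sketch assumes the all-ones start vector is not orthogonal to a top eigenvector, which holds for this example. For $A=\begin{pmatrix}1&2\\3&4\end{pmatrix}$ one has $\|A\|^2=15+\sqrt{221}$.

```python
import math


def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def op_norm_2(A, iters=200):
    """Estimate the Euclidean operator norm of a nonzero real matrix.

    Power iteration on the symmetric matrix A^T A, whose largest
    eigenvalue is ||A||^2.  Assumes the all-ones start vector is not
    orthogonal to a top eigenvector of A^T A."""
    G = matmul(transpose(A), A)
    x = [1.0] * len(G)
    for _ in range(iters):
        y = matvec(G, x)
        size = math.sqrt(sum(t * t for t in y))
        x = [t / size for t in y]
    # Rayleigh quotient of the unit vector x approximates the top eigenvalue.
    top = sum(xi * gi for xi, gi in zip(x, matvec(G, x)))
    return math.sqrt(max(top, 0.0))


A = [[1.0, 2.0], [3.0, 4.0]]
norm_A = op_norm_2(A)
norm_At = op_norm_2(transpose(A))
norm_AtA = op_norm_2(matmul(transpose(A), A))
```

The three estimates agree with $\|A^T\|=\|A\|$ and $\|A^TA\|=\|A\|^2$ up to rounding, whereas submultiplicativity would only have predicted an inequality for the last one.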
### Projections
Orthogonal projections are the geometric operators that split a Hilbert space into a closed subspace and its perpendicular complement. Their operator norm measures the fact that discarding a perpendicular component cannot increase length.
[definition: Orthogonal Projection]
Let $H$ be a Hilbert space and let $M\subset H$ be a closed linear subspace. The [orthogonal projection](/theorems/437) onto $M$ is the map $P_M:H\to H$ such that $P_Mx\in M$ and $x-P_Mx\in M^\perp$ for each $x\in H$.
[/definition]
Once the projection exists, the next analytic question is whether it is controlled in operator norm. The answer is the norm version of the [Pythagorean theorem](/theorems/3266): a component of a vector cannot be longer than the vector itself. This makes closed subspaces manageable in Hilbert space estimates.
[quotetheorem:9326]
A concrete projection in $L^2$ shows the operator norm as an energy estimate. The estimate below is just Cauchy-Schwarz, but the conclusion is a statement about a bounded operator.
[example: Projection Onto Constants in $L^2$]
Let $H=L^2(0,1)$ with inner product
\begin{align*}
(f,g)_{L^2}=\int_0^1 f\overline g\,d\mathcal L^1.
\end{align*}
Let $M$ be the subspace of constant functions. For $f\in H$, the integral $\int_0^1 f\,d\mathcal L^1$ is finite because the *[Cauchy-Schwarz inequality](/theorems/432)* gives
\begin{align*}
\int_0^1 |f(x)|\,d\mathcal L^1(x)\le \left(\int_0^1 |f(x)|^2\,d\mathcal L^1(x)\right)^{1/2}\left(\int_0^1 1^2\,d\mathcal L^1(x)\right)^{1/2}.
\end{align*}
Since $\mathcal L^1((0,1))=1$, the second factor is $1$, so $\int_0^1 |f|\,d\mathcal L^1\le \|f\|_{L^2}$. Define $P_Mf$ to be the constant function with value
\begin{align*}
a_f=\int_0^1 f(x)\,d\mathcal L^1(x).
\end{align*}
First we verify that this is the orthogonal projection onto $M$. Certainly $P_Mf\in M$. If $c$ is a constant function, then
\begin{align*}
(f-P_Mf,c)_{L^2}=\int_0^1 (f(x)-a_f)\overline c\,d\mathcal L^1(x).
\end{align*}
Since $\overline c$ is constant,
\begin{align*}
\int_0^1 (f(x)-a_f)\overline c\,d\mathcal L^1(x)=\overline c\int_0^1 f(x)\,d\mathcal L^1(x)-\overline c\int_0^1 a_f\,d\mathcal L^1(x).
\end{align*}
The first integral is $a_f$, and the second integral is
\begin{align*}
\int_0^1 a_f\,d\mathcal L^1(x)=a_f\mathcal L^1((0,1))=a_f.
\end{align*}
Therefore
\begin{align*}
(f-P_Mf,c)_{L^2}=\overline c\,a_f-\overline c\,a_f=0.
\end{align*}
Thus $f-P_Mf\in M^\perp$, so $P_M$ is the orthogonal projection onto the constants.
The map $P_M$ is linear. If $f,g\in L^2(0,1)$ and $\alpha,\beta$ are scalars, then
\begin{align*}
a_{\alpha f+\beta g}=\int_0^1(\alpha f(x)+\beta g(x))\,d\mathcal L^1(x)=\alpha\int_0^1 f(x)\,d\mathcal L^1(x)+\beta\int_0^1 g(x)\,d\mathcal L^1(x).
\end{align*}
Hence
\begin{align*}
a_{\alpha f+\beta g}=\alpha a_f+\beta a_g.
\end{align*}
Since $P_Mh$ is the constant function with value $a_h$, this gives
\begin{align*}
P_M(\alpha f+\beta g)=\alpha P_Mf+\beta P_Mg.
\end{align*}
Now compute the operator norm. Since $P_Mf$ is the constant function $a_f$,
\begin{align*}
\|P_Mf\|_{L^2}^2=\int_0^1 |a_f|^2\,d\mathcal L^1=|a_f|^2\mathcal L^1((0,1))=|a_f|^2.
\end{align*}
Both sides are nonnegative, so
\begin{align*}
\|P_Mf\|_{L^2}=|a_f|.
\end{align*}
Also,
\begin{align*}
|a_f|=\left|\int_0^1 f(x)\cdot 1\,d\mathcal L^1(x)\right|.
\end{align*}
By the *Cauchy-Schwarz inequality* in $L^2(0,1)$,
\begin{align*}
|a_f|\le \|f\|_{L^2}\|1\|_{L^2}.
\end{align*}
Since
\begin{align*}
\|1\|_{L^2}=\left(\int_0^1 1^2\,d\mathcal L^1\right)^{1/2}=\mathcal L^1((0,1))^{1/2}=1,
\end{align*}
we obtain
\begin{align*}
\|P_Mf\|_{L^2}=|a_f|\le \|f\|_{L^2}.
\end{align*}
Thus $P_M$ is bounded and
\begin{align*}
\|P_M\|_{\mathcal L(L^2(0,1))}\le 1.
\end{align*}
For the constant function $f=1$,
\begin{align*}
a_1=\int_0^1 1\,d\mathcal L^1=\mathcal L^1((0,1))=1.
\end{align*}
So $P_M1$ is the constant function $1$. Also,
\begin{align*}
\|1\|_{L^2}=\left(\int_0^1 1^2\,d\mathcal L^1\right)^{1/2}=1.
\end{align*}
Therefore
\begin{align*}
\|P_M1\|_{L^2}=\|1\|_{L^2}=1.
\end{align*}
The vector $1$ has $L^2$ norm $1$, so the supremum defining the operator norm is at least $1$:
\begin{align*}
\|P_M\|_{\mathcal L(L^2(0,1))}\ge \|P_M1\|_{L^2}=1.
\end{align*}
Combining the upper and lower bounds gives
\begin{align*}
\|P_M\|_{\mathcal L(L^2(0,1))}=1.
\end{align*}
Projecting onto constants keeps the average part of a function and discards the orthogonal fluctuation, so it cannot increase $L^2$ energy, and constants show that the bound is sharp.
[/example]
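The same projection can be sketched after discretizing $(0,1)$ into $N$ cells of equal length, assuming real-valued samples and replacing $\int_0^1$ by the average over the samples. The discrete $P_M$ replaces each sample by the mean. With exact rationals, the orthogonality of $f-P_Mf$ to constants, the Pythagorean identity, and the bound $\|P_Mf\|\le\|f\|$ all hold exactly. The sample values and names are illustrative.

```python
from fractions import Fraction


def inner(f, g):
    """Discrete analogue of the L^2(0,1) inner product: each of the N
    samples carries weight 1/N (real-valued samples)."""
    return sum(a * b for a, b in zip(f, g)) / len(f)


def project_onto_constants(f):
    """Replace every sample by the average a_f, the discrete analogue of P_M."""
    a_f = sum(f) / len(f)
    return [a_f] * len(f)


f = [Fraction(v) for v in (3, -1, 4, 1, 5, 9, 2, 6)]
Pf = project_onto_constants(f)
residual = [a - b for a, b in zip(f, Pf)]  # the fluctuation f - P_M f
ones = [Fraction(1)] * len(f)              # the constant function 1
```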
## Operator Norms in Analysis
### Embeddings
Many analytic theorems say that one function space sits continuously inside another. The operator norm of the inclusion is the best constant in the embedding inequality.
[definition: Continuous Embedding]
Let $X$ and $Y$ be normed vector spaces such that $X$ is a linear subspace of $Y$ (typically both are function spaces inside a common ambient vector space). A continuous embedding $X\hookrightarrow Y$ is the inclusion map $j:X\to Y$, $jx=x$, when $j\in\mathcal L(X,Y)$.
[/definition]
Once an embedding is interpreted as an operator, its norm becomes the quantitative loss incurred when moving from the stronger space to the weaker one. The next result records this loss as the optimal constant in the embedding inequality, which is the form used in Sobolev and interpolation estimates.
[quotetheorem:9327]
For a basic example, the $C^1$ norm controls the $C^0$ norm without loss. The inclusion forgets derivative information, but it does not enlarge the supremum norm.
[example: Inclusion from $C^1$ to $C$]
Let $X=C^1([0,1])$ with
\begin{align*}
\|f\|_{C^1}=\|f\|_{C^0}+\|f'\|_{C^0},
\end{align*}
and let $Y=C([0,1])$ with the supremum norm $\|\cdot\|_{C^0}$. Define $j:X\to Y$ by $jf=f$. This is well-defined because every continuously differentiable function on $[0,1]$ is continuous on $[0,1]$.
The map $j$ is linear. If $f,g\in X$ and $\alpha,\beta$ are scalars, then for every $x\in[0,1]$,
\begin{align*}
j(\alpha f+\beta g)(x)=(\alpha f+\beta g)(x)=\alpha f(x)+\beta g(x).
\end{align*}
Also,
\begin{align*}
(\alpha jf+\beta jg)(x)=\alpha(jf)(x)+\beta(jg)(x)=\alpha f(x)+\beta g(x).
\end{align*}
Thus $j(\alpha f+\beta g)=\alpha jf+\beta jg$.
For every $f\in C^1([0,1])$,
\begin{align*}
\|jf\|_{C^0}=\|f\|_{C^0}.
\end{align*}
Since $\|f'\|_{C^0}\ge 0$, we have
\begin{align*}
\|f\|_{C^0}\le \|f\|_{C^0}+\|f'\|_{C^0}.
\end{align*}
Using the definition of the $C^1$ norm,
\begin{align*}
\|jf\|_{C^0}\le \|f\|_{C^1}.
\end{align*}
Therefore $j$ is bounded and
\begin{align*}
\|j\|_{\mathcal L(C^1([0,1]),C([0,1]))}\le 1.
\end{align*}
To prove that the bound is sharp, take the constant function $u(x)=1$. Then $u\in C^1([0,1])$ and $u'(x)=0$ for every $x\in[0,1]$. Hence
\begin{align*}
\|u\|_{C^0}=\sup_{x\in[0,1]}|1|=1.
\end{align*}
Also,
\begin{align*}
\|u'\|_{C^0}=\sup_{x\in[0,1]}|0|=0.
\end{align*}
So
\begin{align*}
\|u\|_{C^1}=\|u\|_{C^0}+\|u'\|_{C^0}=1+0=1.
\end{align*}
Since $ju=u$,
\begin{align*}
\|ju\|_{C^0}=\|u\|_{C^0}=1.
\end{align*}
The vector $u$ has $C^1$ norm $1$, so the supremum defining the operator norm is at least $1$:
\begin{align*}
\|j\|_{\mathcal L(C^1([0,1]),C([0,1]))}\ge \|ju\|_{C^0}=1.
\end{align*}
Combining the upper and lower bounds gives
\begin{align*}
\|j\|_{\mathcal L(C^1([0,1]),C([0,1]))}=1.
\end{align*}
The inclusion forgets the derivative term, and constant functions, whose derivatives vanish, show that the constant $1$ in the embedding inequality cannot be improved.
[/example]
### Solution Operators
Differential and integral equations often produce maps from data to solutions. The operator norm of such a map is a stability constant: it measures how much a solution can change when the input data changes.
[example: Antiderivative as a Solution Operator]
Let $S:C([0,1])\to C^1([0,1])$ be defined by
\begin{align*}
(Sf)(x)=\int_0^x f(t)\,d\mathcal L^1(t),
\end{align*}
where $C([0,1])$ has $\|\cdot\|_{C^0}$ and $C^1([0,1])$ has $\|u\|_{C^1}=\|u\|_{C^0}+\|u'\|_{C^0}$. Since $f$ is continuous, *the [fundamental theorem of calculus](/theorems/632)* gives $Sf\in C^1([0,1])$ and
\begin{align*}
(Sf)'=f.
\end{align*}
Thus $S$ is well-defined.
If $f,g\in C([0,1])$ and $\alpha,\beta$ are scalars, then for every $x\in[0,1]$,
\begin{align*}
S(\alpha f+\beta g)(x)=\int_0^x(\alpha f(t)+\beta g(t))\,d\mathcal L^1(t).
\end{align*}
By linearity of the integral,
\begin{align*}
S(\alpha f+\beta g)(x)=\alpha\int_0^x f(t)\,d\mathcal L^1(t)+\beta\int_0^x g(t)\,d\mathcal L^1(t).
\end{align*}
Using the definition of $S$ on each integral,
\begin{align*}
S(\alpha f+\beta g)(x)=\alpha(Sf)(x)+\beta(Sg)(x).
\end{align*}
Since this holds for every $x\in[0,1]$, we have $S(\alpha f+\beta g)=\alpha Sf+\beta Sg$. Hence $S$ is linear.
Now fix $f\in C([0,1])$ and $x\in[0,1]$. By the triangle inequality for integrals,
\begin{align*}
|(Sf)(x)|=\left|\int_0^x f(t)\,d\mathcal L^1(t)\right|\le \int_0^x |f(t)|\,d\mathcal L^1(t).
\end{align*}
For every $t\in[0,1]$, the definition of the supremum norm gives $|f(t)|\le \|f\|_{C^0}$. Therefore
\begin{align*}
\int_0^x |f(t)|\,d\mathcal L^1(t)\le \int_0^x \|f\|_{C^0}\,d\mathcal L^1(t).
\end{align*}
Since $\|f\|_{C^0}$ is constant in $t$,
\begin{align*}
\int_0^x \|f\|_{C^0}\,d\mathcal L^1(t)=\|f\|_{C^0}\mathcal L^1([0,x])=x\|f\|_{C^0}.
\end{align*}
Because $0\le x\le 1$, this gives
\begin{align*}
|(Sf)(x)|\le x\|f\|_{C^0}\le \|f\|_{C^0}.
\end{align*}
Taking the supremum over $x\in[0,1]$ yields
\begin{align*}
\|Sf\|_{C^0}\le \|f\|_{C^0}.
\end{align*}
Also, since $(Sf)'=f$,
\begin{align*}
\|(Sf)'\|_{C^0}=\|f\|_{C^0}.
\end{align*}
Thus
\begin{align*}
\|Sf\|_{C^1}=\|Sf\|_{C^0}+\|(Sf)'\|_{C^0}\le \|f\|_{C^0}+\|f\|_{C^0}=2\|f\|_{C^0}.
\end{align*}
So $S$ is bounded and
\begin{align*}
\|S\|_{\mathcal L(C([0,1]),C^1([0,1]))}\le 2.
\end{align*}
To prove that this constant is sharp, take the constant function $f=1$. Then
\begin{align*}
\|f\|_{C^0}=\sup_{x\in[0,1]}|1|=1.
\end{align*}
For this input,
\begin{align*}
(Sf)(x)=\int_0^x 1\,d\mathcal L^1(t)=\mathcal L^1([0,x])=x.
\end{align*}
Hence
\begin{align*}
\|Sf\|_{C^0}=\sup_{x\in[0,1]}|x|=1.
\end{align*}
Also $(Sf)'=1$, so
\begin{align*}
\|(Sf)'\|_{C^0}=\sup_{x\in[0,1]}|1|=1.
\end{align*}
Therefore
\begin{align*}
\|Sf\|_{C^1}=\|Sf\|_{C^0}+\|(Sf)'\|_{C^0}=1+1=2.
\end{align*}
Since this input satisfies $\|f\|_{C^0}=1$, the definition of the operator norm gives
\begin{align*}
\|S\|_{\mathcal L(C([0,1]),C^1([0,1]))}\ge \|Sf\|_{C^1}=2.
\end{align*}
Combining the lower bound with the upper bound,
\begin{align*}
\|S\|_{\mathcal L(C([0,1]),C^1([0,1]))}=2.
\end{align*}
The antiderivative operator sends data $f$ to the solution $u=Sf$ of $u'=f$ with $u(0)=0$, and its norm records the simultaneous control of both the solution size and its derivative.
[/example]
This is a small model for elliptic and parabolic estimates, where the map sending the data of a PDE to its solution is a bounded operator from a data space into a solution space.
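A discretized sketch of the stability constant for $S$, assuming sup norms are taken over the grid $x_i=i/n$ and $Sf$ is built with the cumulative trapezoid rule. Since $(Sf)'=f$, the derivative part of the $C^1$ norm is just $\sup|f|$. The function name and test inputs are illustrative. The constant input reproduces the extremal ratio $2$, while the other inputs stay below it.

```python
import math


def solution_operator_ratio(f, n=1000):
    """Approximate ||Sf||_{C^1} / ||f||_{C^0} for (Sf)(x) = int_0^x f.

    Sups are taken over the grid x_i = i/n and Sf is built by the
    cumulative trapezoid rule.  Assumes f is not identically zero on the grid."""
    h = 1.0 / n
    xs = [i * h for i in range(n + 1)]
    fs = [f(x) for x in xs]
    u = [0.0]  # u(0) = 0
    for i in range(n):
        u.append(u[-1] + 0.5 * h * (fs[i] + fs[i + 1]))
    sup_f = max(abs(v) for v in fs)
    sup_u = max(abs(v) for v in u)
    # (Sf)' = f, so the derivative part of the C^1 norm is sup |f|.
    return (sup_u + sup_f) / sup_f


ratio_const = solution_operator_ratio(lambda x: 1.0)   # extremal input, Sf = x
ratio_linear = solution_operator_ratio(lambda x: x)    # Sf = x^2 / 2
ratio_wave = solution_operator_ratio(lambda x: math.cos(2 * math.pi * x))
```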
## Spectral and Perturbative Uses
### Spectral Bounds
The spectrum of an operator is defined by invertibility, not by vector stretching. Even so, the operator norm gives a first enclosure for the spectrum, because large spectral parameters make a Neumann-series inversion possible.
[definition: Spectral Radius]
Let $X$ be a complex Banach space, and let $T\in\mathcal L(X)$. The spectrum of $T$ is
\begin{align*}
\sigma(T):=\{\lambda\in\mathbb C:\lambda I_X-T\text{ is not invertible in }\mathcal L(X)\}.
\end{align*}
The spectral radius of $T$ is
\begin{align*}
r(T):=\sup\{|\lambda|:\lambda\in\sigma(T)\}.
\end{align*}
[/definition]
The spectral radius may look unrelated to the unit-ball stretching measured by $\|T\|$. The obstruction is that $\lambda\in\sigma(T)$ is defined by failure of invertibility of $\lambda I-T$, while $\|T\|$ only bounds the size of $T$ on the unit ball. For $|\lambda|>\|T\|$, however, the operator $\lambda I-T=\lambda(I-\lambda^{-1}T)$ is a small perturbation of the invertible operator $\lambda I$, so the Neumann-series principle forces invertibility. This gives the first disk that must contain the spectrum.
[quotetheorem:9328]
This estimate is crude for some operators and sharp for others, but it is often the first guaranteed region in which the spectrum must lie.
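In finite dimensions the spectrum is the set of eigenvalues, so the bound $r(T)\le\|T\|$ can be inspected directly on $2\times 2$ matrices. The sketch below uses the Euclidean operator norm, computed as the square root of the largest eigenvalue of $A^TA$. The three matrices are illustrative. A nilpotent matrix shows the bound can be crude, a diagonal matrix shows it can be sharp, and a rotation shows why complex scalars are needed.

```python
import cmath
import math


def eigenvalues_2x2(A):
    """Roots of lambda^2 - tr(A) lambda + det(A) = 0 (complex in general)."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def spectral_radius_2x2(A):
    return max(abs(lam) for lam in eigenvalues_2x2(A))


def op_norm_2x2(A):
    """Euclidean operator norm: sqrt of the largest eigenvalue of A^T A (real A)."""
    (a, b), (c, d) = A
    g11, g12, g22 = a * a + c * c, a * b + c * d, b * b + d * d
    top = (g11 + g22) / 2 + math.sqrt(((g11 - g22) / 2) ** 2 + g12 ** 2)
    return math.sqrt(top)


examples = {
    "nilpotent": [[0.0, 1.0], [0.0, 0.0]],  # r(T) = 0 < 1 = ||T||: bound is crude
    "diagonal": [[2.0, 0.0], [0.0, 1.0]],   # r(T) = 2 = ||T||: bound is sharp
    "rotation": [[0.0, -1.0], [1.0, 0.0]],  # eigenvalues +i and -i
}
results = {name: (spectral_radius_2x2(A), op_norm_2x2(A)) for name, A in examples.items()}
```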
### Perturbations
Analysis frequently asks whether a solvable problem remains solvable after a small error is introduced. Invertibility is stable in the operator-norm topology: an invertible operator stays invertible under perturbations that are small in operator norm.
[quotetheorem:9329]
The condition depends on $\|A^{-1}\|$, not just on $\|E\|$: the larger $\|A^{-1}\|$ is, the less room there is for perturbation.
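A $2\times 2$ illustration of why a large inverse norm leaves little room. For a diagonal matrix the Euclidean operator norm is the largest absolute diagonal entry, so $A=\operatorname{diag}(1,\varepsilon)$ has $\|A^{-1}\|=1/\varepsilon$. A perturbation $E$ of norm exactly $\varepsilon=1/\|A^{-1}\|$ already makes $A+E$ singular, while one of half that size does not. The matrices and names are hypothetical; invertibility is tested through the determinant.

```python
def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c


def add2(M, N):
    return [[M[i][j] + N[i][j] for j in range(2)] for i in range(2)]


eps = 1e-3
A = [[1.0, 0.0], [0.0, eps]]            # A^{-1} = diag(1, 1/eps), so ||A^{-1}|| = 1/eps
E_full = [[0.0, 0.0], [0.0, -eps]]      # ||E_full|| = eps = 1/||A^{-1}||
E_half = [[0.0, 0.0], [0.0, -eps / 2]]  # ||E_half|| = eps/2 < 1/||A^{-1}||

det_full = det2(add2(A, E_full))        # A + E_full is singular
det_half = det2(add2(A, E_half))        # A + E_half is still invertible
```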
[example: Perturbing the Identity]
Let $X$ be a Banach space and let $E\in\mathcal L(X)$ satisfy $\|E\|_{\mathcal L(X)}\le 1/2$. Put $A=-E$. By homogeneity of the operator norm,
\begin{align*}
\|A\|_{\mathcal L(X)}=\|-E\|_{\mathcal L(X)}=|-1|\|E\|_{\mathcal L(X)}=\|E\|_{\mathcal L(X)}\le \frac12.
\end{align*}
For $n=0$, $A^0=I$ and $\|I\|_{\mathcal L(X)}\le 1=\|A\|_{\mathcal L(X)}^0$. For $n\ge 1$, repeated use of *[Submultiplicativity of the Operator Norm](/theorems/1054)* gives
\begin{align*}
\|A^n\|_{\mathcal L(X)}=\|A^{n-1}A\|_{\mathcal L(X)}\le \|A^{n-1}\|_{\mathcal L(X)}\|A\|_{\mathcal L(X)}\le \cdots \le \|A\|_{\mathcal L(X)}^n.
\end{align*}
Therefore, for every $n\ge 0$,
\begin{align*}
\|A^n\|_{\mathcal L(X)}\le \|A\|_{\mathcal L(X)}^n\le \left(\frac12\right)^n.
\end{align*}
For integers $M>N\ge 0$, define the partial sums $S_N=\sum_{n=0}^{N}A^n$. Then
\begin{align*}
\|S_M-S_N\|_{\mathcal L(X)}=\left\|\sum_{n=N+1}^{M}A^n\right\|_{\mathcal L(X)}\le \sum_{n=N+1}^{M}\|A^n\|_{\mathcal L(X)}.
\end{align*}
Using the bound above,
\begin{align*}
\sum_{n=N+1}^{M}\|A^n\|_{\mathcal L(X)}\le \sum_{n=N+1}^{M}\left(\frac12\right)^n\le \sum_{n=N+1}^{\infty}\left(\frac12\right)^n.
\end{align*}
The geometric tail is
\begin{align*}
\sum_{n=N+1}^{\infty}\left(\frac12\right)^n=\frac{(1/2)^{N+1}}{1-1/2}=2\left(\frac12\right)^{N+1}.
\end{align*}
This tends to $0$ as $N\to\infty$, so $(S_N)$ is Cauchy in operator norm. Since $X$ is Banach, the completeness fact for operator spaces discussed above implies that $\mathcal L(X)$ is Banach, so $S_N$ converges in operator norm to some $S\in\mathcal L(X)$. We write
\begin{align*}
S=\sum_{n=0}^{\infty}A^n=\sum_{n=0}^{\infty}(-E)^n.
\end{align*}
Now compute the finite products. Since $A A^n=A^{n+1}=A^n A$ for every $n\ge 0$,
\begin{align*}
AS_N=A\sum_{n=0}^{N}A^n=\sum_{n=0}^{N}A^{n+1}=A+A^2+\cdots+A^{N+1}.
\end{align*}
Also
\begin{align*}
S_NA=\left(\sum_{n=0}^{N}A^n\right)A=\sum_{n=0}^{N}A^{n+1}=A+A^2+\cdots+A^{N+1}.
\end{align*}
Hence
\begin{align*}
(I-A)S_N=S_N-AS_N=(I+A+\cdots+A^N)-(A+A^2+\cdots+A^{N+1})=I-A^{N+1}.
\end{align*}
Similarly,
\begin{align*}
S_N(I-A)=S_N-S_NA=(I+A+\cdots+A^N)-(A+A^2+\cdots+A^{N+1})=I-A^{N+1}.
\end{align*}
The remainder tends to zero in operator norm because
\begin{align*}
\|A^{N+1}\|_{\mathcal L(X)}\le \left(\frac12\right)^{N+1}.
\end{align*}
Also, by *Submultiplicativity of the Operator Norm*,
\begin{align*}
\|(I-A)S_N-(I-A)S\|_{\mathcal L(X)}=\|(I-A)(S_N-S)\|_{\mathcal L(X)}\le \|I-A\|_{\mathcal L(X)}\|S_N-S\|_{\mathcal L(X)}.
\end{align*}
Since $S_N\to S$ in operator norm, the right-hand side tends to $0$. Passing to the limit in $(I-A)S_N=I-A^{N+1}$ gives
\begin{align*}
(I-A)S=I.
\end{align*}
The same argument applied to
\begin{align*}
S_N(I-A)=I-A^{N+1}
\end{align*}
gives
\begin{align*}
S(I-A)=I.
\end{align*}
Thus $S$ is a two-sided inverse for $I-A$. Since $A=-E$, we have $I-A=I+E$, and therefore
\begin{align*}
(I+E)^{-1}=S=\sum_{n=0}^{\infty}(-E)^n.
\end{align*}
It remains to record the norm bound. For each $N$,
\begin{align*}
\|S_N\|_{\mathcal L(X)}=\left\|\sum_{n=0}^{N}A^n\right\|_{\mathcal L(X)}\le \sum_{n=0}^{N}\|A^n\|_{\mathcal L(X)}\le \sum_{n=0}^{N}\left(\frac12\right)^n\le \sum_{n=0}^{\infty}\left(\frac12\right)^n=2.
\end{align*}
By the triangle inequality and the bound just proved, for every $N$,
\begin{align*}
\|S\|_{\mathcal L(X)}\le \|S-S_N\|_{\mathcal L(X)}+\|S_N\|_{\mathcal L(X)}\le \|S-S_N\|_{\mathcal L(X)}+2.
\end{align*}
Since $S_N\to S$ in operator norm, letting $N\to\infty$ gives
\begin{align*}
\|S\|_{\mathcal L(X)}\le 2.
\end{align*}
Since $S=(I+E)^{-1}$, we conclude
\begin{align*}
\|(I+E)^{-1}\|_{\mathcal L(X)}\le 2.
\end{align*}
Thus the estimate $\|E\|_{\mathcal L(X)}\le 1/2$ gives both existence of the inverse and the quantitative stability bound $\|(I+E)^{-1}\|_{\mathcal L(X)}\le 2$.
[/example]
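Because the argument works in any Banach space, a sketch can take $X=(\mathbb R^2,\|\cdot\|_\infty)$, whose operator norm is the largest absolute row sum. The choice of space and of the matrix $E$ with $\|E\|_\infty=1/2$ is ours. The code forms the partial sum $S_N=\sum_{n=0}^{N}(-E)^n$ and checks that $(I+E)S_N$ is close to $I$, that $\|S_N\|_\infty\le 2$, and that $S_N$ matches the exact inverse $\frac{1}{1.53}\begin{pmatrix}1.25&0.3\\-0.1&1.2\end{pmatrix}$.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inf_norm(A):
    """Operator norm on (R^n, ||.||_inf): the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def neumann_inverse(E, terms=60):
    """Partial sum S_N = sum_{k=0}^{N} (-E)^k with N = terms."""
    n = len(E)
    A = [[-e for e in row] for row in E]  # A = -E, as in the example
    power = identity(n)                   # holds A^k
    S = identity(n)
    for _ in range(terms):
        power = matmul(power, A)
        S = matadd(S, power)
    return S


E = [[0.2, -0.3], [0.1, 0.25]]            # row sums 0.5 and 0.35, so ||E||_inf = 1/2
S = neumann_inverse(E)
minus_I = [[-1.0, 0.0], [0.0, -1.0]]
residual = matadd(matmul(matadd(identity(2), E), S), minus_I)  # (I+E)S - I = -A^{61}
```

The residual is exactly $-A^{N+1}$ in exact arithmetic, so its size is bounded by $(1/2)^{61}$ plus rounding error.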
## Beyond and Connected Topics
The operator norm is the entry point to [Banach spaces](/page/Banach%20Space) and [Hilbert spaces](/page/Hilbert%20Space). In Banach space theory, it defines the topology on $\mathcal L(X,Y)$, the dual norm on $X^*$, and the meaning of uniform convergence of operators. The notes [Cambridge II Linear Analysis](/page/Cambridge%20II%20Linear%20Analysis) develop this normed-space viewpoint systematically.
In functional analysis, operator norms interact with [weak convergence](/page/Weak%20Convergence), compactness, adjoints, and spectral theory. Compact operators refine the unit-ball picture by sending bounded sets to relatively compact sets, leading to a spectral theory closer to finite-dimensional linear algebra. This direction belongs naturally with [Cambridge III Functional Analysis](/page/Cambridge%20III%20Functional%20Analysis).
In analysis and PDE, many estimates are operator norm bounds in disguise. Sobolev embeddings, trace maps, elliptic solution operators, and integral transforms all become continuous linear maps between function spaces. The operator norm is the optimal constant in the relevant estimate, even when computing that constant is hard.
In complex analysis, bounded linear operators appear through integral transforms, multiplication operators, composition operators, and Cauchy-type projections. The norm viewpoint complements the [analytic function](/page/Analytic%20Function) theory developed in [Cambridge IB Complex Analysis](/page/Cambridge%20IB%20Complex%20Analysis), especially when holomorphic objects are studied inside normed spaces of functions.
Operator-norm convergence is only one mode of convergence for operators. The strong and weak operator topologies are weaker topologies that are indispensable in advanced analysis, but they give up uniform control on the unit ball. Deciding which topology is strong enough for a given argument is a central judgment in operator theory.
## References
Androma, [Cambridge II Linear Analysis](/page/Cambridge%20II%20Linear%20Analysis).
Androma, [Cambridge III Functional Analysis](/page/Cambridge%20III%20Functional%20Analysis).
Androma, [Cambridge IB Analysis and Topology](/page/Cambridge%20IB%20Analysis%20and%20Topology).
Androma, [Cambridge IB Complex Analysis](/page/Cambridge%20IB%20Complex%20Analysis).
Walter Rudin, *Functional Analysis* (1991).
John B. Conway, *A Course in Functional Analysis* (1990).
Michael Reed and Barry Simon, *Methods of Modern Mathematical Physics I: Functional Analysis* (1980).
Operator Norm
Also known as: operator norm, induced norm, induced operator norm, norm of a linear operator, bounded operator norm