Sequences can appear to settle down long before we understand what they are approaching. The decimals $3$, $3.1$, $3.14$, $3.141$ suggest a limiting number, while $(-1)^n$ never chooses a side. Convergence is the language that separates genuine stabilization from the mere repetition of a pattern.
The main problem is not only whether a limit exists. The deeper question is what survives after passing to that limit. Continuity, inequalities, integrals, derivatives, and minimizers are all fragile under weak forms of convergence. A useful theory must say which mode of convergence protects which operation.
[example: A Pointwise Limit That Breaks Continuity]
Here $C([0,1];\mathbb{R})$ denotes the set of continuous functions from $[0,1]$ to $\mathbb{R}$. For each $n \in \mathbb{N}$, define $f_n\in C([0,1];\mathbb{R})$ by
\begin{align*}
f_n(x)=x^n,\qquad 0\le x\le 1.
\end{align*}
We compute the pointwise limit. If $x=0$, then $f_n(0)=0^n=0$ for every $n$, so $f_n(0)\to 0$. If $0<x<1$, write $x=1/(1+t)$ with $t=(1-x)/x>0$. For every $n\in\mathbb{N}$, Bernoulli's inequality gives $(1+t)^n\ge 1+nt$, hence
\begin{align*}
0\le x^n=\frac{1}{(1+t)^n}\le \frac{1}{1+nt}.
\end{align*}
Given $\varepsilon>0$, choose $N\in\mathbb{N}$ with $N>(1/\varepsilon-1)/t$ when $\varepsilon<1$, and take $N=1$ when $\varepsilon\ge 1$. Then for every $n\ge N$,
\begin{align*}
0\le x^n\le \frac{1}{1+nt}<\varepsilon.
\end{align*}
Thus $x^n\to 0$ for every $0\le x<1$. At the endpoint,
\begin{align*}
f_n(1)=1^n=1
\end{align*}
for every $n$, so $f_n(1)\to 1$.
Therefore the pointwise limit $f:[0,1]\to\mathbb{R}$ is
\begin{align*}
f(x)=0\text{ for }0\le x<1,\qquad f(1)=1.
\end{align*}
This limit is not continuous at $1$: for $\varepsilon=1/2$, every $\delta>0$ admits a point $y\in[0,1)$ with $|y-1|<\delta$, for instance $y=1-\min(\delta/2,1/2)$, and then
\begin{align*}
|f(y)-f(1)|=|0-1|=1>\frac12.
\end{align*}
So each $f_n$ is continuous, but the pointwise limit is discontinuous; pointwise convergence controls each fixed input separately and does not preserve continuity.
[/example]
The failure is concentrated near $1$. For each fixed $x<1$ the sequence eventually becomes small, but there is no single stage after which all points near $1$ are controlled. This distinction between point-by-point control and uniform control will guide the whole chapter.
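The dependence of the stage $N$ on the point $x$ can be made concrete. The following Python sketch computes the $N$ supplied by the Bernoulli bound in the example above, using exact rational arithmetic. The helper name `stage` and the sample points are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from math import floor


def stage(x: Fraction, eps: Fraction) -> int:
    """Return an N with x**n < eps for all n >= N, via the Bernoulli bound.

    Assumes 0 < x < 1 and eps > 0. The name `stage` is illustrative only.
    """
    if eps >= 1:
        return 1
    t = (1 - x) / x                      # x = 1/(1+t) with t > 0
    return floor((1 / eps - 1) / t) + 1  # smallest integer N > (1/eps - 1)/t


eps = Fraction(1, 2)
for x in (Fraction(9, 10), Fraction(99, 100), Fraction(999, 1000)):
    N = stage(x, eps)
    assert x ** N < eps                  # the bound really holds at stage N
    print(f"x = {x}: N = {N}")
```

The Bernoulli bound is not sharp, so the returned $N$ is sufficient rather than minimal. What matters is that it grows without bound as $x$ approaches $1$, so no single $N$ serves every $x<1$.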
## Definition
To speak about convergence, we first need a structure that measures closeness. In $\mathbb{R}$ this is absolute value; in a general metric space it is a distance function. The definition asks whether every tolerance around a candidate limit eventually traps the sequence.
[definition: Convergence in a Metric Space]
Let $(X,d)$ be a metric space, let $(x_n)_{n=1}^\infty$ be a sequence in $X$, and let $x\in X$. The sequence $(x_n)$ converges to $x$ if for every $\varepsilon>0$ there exists $N\in\mathbb{N}$ such that for every $n\ge N$,
\begin{align*}
d(x_n,x)<\varepsilon.
\end{align*}
Write $x_n\to x$ as $n\to\infty$.
[/definition]
This definition depends on a proposed limit. In many constructions, however, an approximation scheme gives terms that are close to each other without revealing the limiting object in advance. We therefore need an internal convergence test that mentions only the tail of the sequence.
## Cauchy Sequences and Completeness
When the candidate limit is unknown, the previous definition is hard to use directly. A numerical scheme, an iteration, or a sequence of approximations may only tell us that late terms are close to one another. The Cauchy condition isolates this internal stabilization: the tail must eventually fit inside any prescribed tolerance, even before we know where it should land.
[definition: Cauchy Sequence]
Let $(X,d)$ be a metric space. A sequence $(x_n)_{n=1}^\infty$ in $X$ is a Cauchy sequence if for every $\varepsilon>0$ there exists $N\in\mathbb{N}$ such that for all $m,n\ge N$,
\begin{align*}
d(x_m,x_n)<\varepsilon.
\end{align*}
[/definition]
A Cauchy sequence may still fail to converge if the ambient space has holes. The rational approximations to $\sqrt{2}$ want to converge, but their limit is absent from $\mathbb{Q}$. This motivates making completeness a property of the space itself.
[definition: Complete Metric Space]
A metric space $(X,d)$ is complete if every Cauchy sequence in $X$ converges to an element of $X$.
[/definition]
Completeness matters because existence proofs often build Cauchy sequences rather than closed formulas. Before completeness can be used, we need the basic fact that genuine convergence always gives the Cauchy property.
[quotetheorem:7844]
The theorem gives a necessary test for convergence. It also explains why completeness is the missing converse condition: in a complete space, the internal Cauchy test is enough.
[example: A Cauchy Sequence Missing Its Limit in $\mathbb{Q}$]
Let
\begin{align*}q_n=\frac{\lfloor 10^n\sqrt{2}\rfloor}{10^n}\in\mathbb{Q}.\end{align*}
By the defining property of the floor function,
\begin{align*}\lfloor 10^n\sqrt{2}\rfloor\le 10^n\sqrt{2}<\lfloor 10^n\sqrt{2}\rfloor+1.\end{align*}
Dividing by $10^n$ gives
\begin{align*}q_n\le \sqrt{2}<q_n+10^{-n},\end{align*}
so
\begin{align*}0\le \sqrt{2}-q_n<10^{-n}.\end{align*}
We show that $(q_n)$ is Cauchy in $\mathbb{Q}$ with the metric $d(x,y)=|x-y|$. Let $\varepsilon>0$, and choose $N\in\mathbb{N}$ such that $2\cdot 10^{-N}<\varepsilon$. If $m,n\ge N$, then
\begin{align*}|q_m-q_n|=|(\sqrt{2}-q_n)-(\sqrt{2}-q_m)|.\end{align*}
By the triangle inequality and the error bound above,
\begin{align*}|q_m-q_n|\le |\sqrt{2}-q_n|+|\sqrt{2}-q_m|<10^{-n}+10^{-m}\le 2\cdot 10^{-N}<\varepsilon.\end{align*}
Thus $(q_n)$ is Cauchy in $\mathbb{Q}$.
It remains to see why the sequence has no limit in $\mathbb{Q}$. Suppose, toward a contradiction, that $q_n\to q$ in $\mathbb{Q}$ for some $q\in\mathbb{Q}$. The same absolute-value metric is being used inside $\mathbb{R}$, so also $q_n\to q$ as a real sequence. But the bound $0\le \sqrt{2}-q_n<10^{-n}$ shows $q_n\to\sqrt{2}$ in $\mathbb{R}$: given $\varepsilon>0$, choose $N$ with $10^{-N}<\varepsilon$, and then $n\ge N$ implies $|q_n-\sqrt{2}|<\varepsilon$.
A real sequence has at most one limit: if $q_n\to q$ and $q_n\to \sqrt{2}$ with $q\ne\sqrt{2}$, take $\varepsilon=|q-\sqrt{2}|/3$ and choose $n$ large enough that $|q_n-q|<\varepsilon$ and $|q_n-\sqrt{2}|<\varepsilon$; then
\begin{align*}|q-\sqrt{2}|\le |q-q_n|+|q_n-\sqrt{2}|<2\varepsilon=\frac{2}{3}|q-\sqrt{2}|,\end{align*}
which is impossible. Hence $q=\sqrt{2}$.
Finally, $\sqrt{2}$ is not rational. If $\sqrt{2}=a/b$ with integers $a,b$ having no common factor and $b\ne 0$, then $a^2=2b^2$, so $a^2$ is even and therefore $a$ is even. Write $a=2c$. Then
\begin{align*}4c^2=2b^2.\end{align*}
Dividing by $2$ gives
\begin{align*}2c^2=b^2,\end{align*}
so $b$ is even, contradicting that $a$ and $b$ have no common factor. Therefore $(q_n)$ is Cauchy in $\mathbb{Q}$ but does not converge in $\mathbb{Q}$; the missing limit is the real number $\sqrt{2}$.
[/example]
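The truncations $q_n$ can be computed exactly, without floating-point rounding, which makes the two estimates in the example easy to check by machine. The sketch below is one possible approach in Python. It relies on the identity $\lfloor 10^n\sqrt{2}\rfloor=\lfloor\sqrt{2\cdot 10^{2n}}\rfloor$, and the function name `q` is chosen here only to mirror the notation.

```python
from fractions import Fraction
from math import isqrt


def q(n: int) -> Fraction:
    """Return floor(10**n * sqrt(2)) / 10**n as an exact rational."""
    # isqrt(2 * 10**(2n)) equals floor(10**n * sqrt(2)) with no rounding error.
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)


for n in range(1, 8):
    step = Fraction(1, 10 ** n)
    # q_n <= sqrt(2) < q_n + 10**(-n), checked by squaring nonnegative numbers.
    assert q(n) ** 2 <= 2 < (q(n) + step) ** 2

N = 5
for m in range(N, N + 10):
    for n in range(N, N + 10):
        # Cauchy estimate from the text: |q_m - q_n| < 2 * 10**(-N).
        assert abs(q(m) - q(n)) < 2 * Fraction(1, 10 ** N)
```

Every quantity here is rational, so the checks stay inside $\mathbb{Q}$: the sequence is verifiably Cauchy even though its limit is not available in the same arithmetic.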
## Numerical Limits and Algebraic Stability
### Limit Laws
Numerical convergence is the first setting where limits become calculational. We want to replace complicated sequences by their limits inside algebraic expressions, but this is legitimate only when the operations are continuous with respect to the chosen notion of closeness.
Convergence can be formulated in any normed vector space by measuring the error with the norm of the difference. This generality is needed because many analytic objects are vectors: functions, sequences, matrices, and distributions after embedding into suitable spaces. As a standing example, the space $\ell^2$ denotes the square-summable scalar sequences $(a_n)$ with $\sum_{n=1}^\infty |a_n|^2<\infty$.
[definition: Norm Convergence]
Let $(X,\|\cdot\|_X)$ be a normed vector space over $\mathbb{R}$ or $\mathbb{C}$. A sequence $(x_n)_{n=1}^\infty$ in $X$ converges in norm to $x\in X$ if
\begin{align*}
\|x_n-x\|_X\to 0.
\end{align*}
[/definition]
Once convergence is expressed as small norm error, addition and scalar multiplication behave predictably. For real scalar sequences, multiplication and division can also be controlled, but division needs the limiting denominator to stay away from zero. Order comparisons are also meaningful in the real setting, where inequalities can pass to limits under the usual hypotheses; for complex sequences one keeps only the algebraic limit laws and avoids order language.
[remark: Algebraic Limit Laws for Scalars]
If $(x_n)$ and $(y_n)$ are real scalar sequences with $x_n\to x$ and $y_n\to y$, then sums, differences, scalar multiples, and products converge to the corresponding sums, differences, scalar multiples, and products of the limits. If additionally $y\ne 0$, then $y_n\ne 0$ for all sufficiently large $n$, and $x_n/y_n\to x/y$ along those $n$. Inequality statements such as order preservation belong to the real-valued case, not to arbitrary complex-valued sequences.
[/remark]
The nonzero denominator condition is essential because reciprocals magnify errors near zero. Without a positive distance from zero, small denominator errors can dominate the whole expression.
[example: Division Near a Vanishing Limit]
Let $x_n=1$ and $y_n=1/n$ in $\mathbb{R}$. For every $\varepsilon>0$, $|x_n-1|=|1-1|=0<\varepsilon$, so $x_n\to 1$. Also, by the Archimedean property, choose $N\in\mathbb{N}$ with $N>1/\varepsilon$; then for every $n\ge N$,
\begin{align*}
|y_n-0|=\left|\frac{1}{n}\right|=\frac{1}{n}\le \frac{1}{N}<\varepsilon.
\end{align*}
Thus $y_n\to 0$.
For every $n\in\mathbb{N}$, $y_n\ne 0$, and
\begin{align*}
\frac{x_n}{y_n}=\frac{1}{1/n}=n.
\end{align*}
The sequence $(n)$ does not converge in $\mathbb{R}$. Indeed, if $n\to L$ for some $L\in\mathbb{R}$, then with $\varepsilon=1$ there would be $N\in\mathbb{N}$ such that $|n-L|<1$ for every $n\ge N$. Choose an integer $n\ge N$ with $n>L+1$. Then
\begin{align*}
|n-L|=n-L>1,
\end{align*}
contradicting $|n-L|<1$. Hence $x_n/y_n$ does not converge, even though the numerator and denominator converge separately; the quotient rule needs the limiting denominator to be nonzero.
[/example]
### Order and Closed Conditions
Algebraic rules are not enough for analysis; inequalities also need to pass to the limit. The correct principle is that closed inequalities survive, while strict inequalities may collapse at the boundary.
[quotetheorem:7846]
The theorem explains why closed sets behave well under sequential limits. A sequence in $[0,\infty)$ can converge to $0$, but it cannot converge to a negative number.
[example: Strict Positivity Can Collapse]
Let $x_n=1/n$ for $n\in\mathbb{N}$. Since $n>0$, division by the positive number $n$ gives
\begin{align*}
x_n=\frac{1}{n}>0.
\end{align*}
Thus every term is strictly positive.
We show that $x_n\to 0$. Let $\varepsilon>0$. By the Archimedean property, choose $N\in\mathbb{N}$ such that $N>1/\varepsilon$. If $n\ge N$, then $1/n\le 1/N$, and from $N>1/\varepsilon$ we get $1/N<\varepsilon$. Therefore
\begin{align*}
|x_n-0|=\left|\frac{1}{n}\right|=\frac{1}{n}\le \frac{1}{N}<\varepsilon.
\end{align*}
Hence $x_n\to 0$, even though $x_n>0$ for every $n$. The strict inequalities $x_n>0$ survive in the limit only as the closed inequality $\lim_{n\to\infty}x_n\ge 0$, which here holds with equality.
[/example]
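The Archimedean choice of $N$ in the example is explicit enough to compute. A minimal Python sketch follows, assuming rational tolerances so that the comparisons are exact. The helper name `archimedean_stage` is hypothetical.

```python
from fractions import Fraction
from math import floor


def archimedean_stage(eps: Fraction) -> int:
    """Return the smallest positive integer N with N > 1/eps (eps > 0 assumed)."""
    return floor(1 / eps) + 1


for eps in (Fraction(1, 10), Fraction(3, 100), Fraction(1, 1000)):
    N = archimedean_stage(eps)
    # Every checked term past stage N is strictly positive yet within eps of 0.
    assert all(0 < Fraction(1, n) < eps for n in range(N, N + 1000))
```

The loop only samples finitely many indices, so it illustrates the estimate rather than proving it. The proof is the inequality chain $1/n\le 1/N<\varepsilon$ given above.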
## Subsequences, Compactness, and Accumulation
### Subsequences
A sequence may fail to converge because it has several incompatible limiting behaviours. To diagnose this, we need a way to keep infinitely many terms while discarding the rest. The order of the retained terms must be preserved so that late behaviour remains late behaviour.
[definition: Subsequence]
Let $(x_n)_{n=1}^\infty$ be a sequence in a set $X$. A subsequence of $(x_n)$ is a sequence $(x_{n_k})_{k=1}^\infty$ where $(n_k)_{k=1}^\infty$ is a strictly increasing sequence in $\mathbb{N}$.
[/definition]
Subsequences are useful because convergence of the whole sequence should leave no freedom for different limiting behaviour. We need a theorem that turns this intuition into a diagnostic tool: if a proposed convergent sequence has two subsequential limits, the original convergence claim must fail.
[quotetheorem:7847]
Two subsequences with different limits therefore prove non-convergence. The alternating sequence is the standard model.
[example: Oscillation Detected by Subsequences]
Let $x_n=(-1)^n$. For the even subsequence, $n=2k$, so
\begin{align*}
x_{2k}=(-1)^{2k}=\left((-1)^2\right)^k=1^k=1.
\end{align*}
Thus $(x_{2k})$ converges to $1$. For the odd subsequence, $n=2k-1$, so
\begin{align*}
x_{2k-1}=(-1)^{2k-1}=(-1)^{2k}(-1)=1\cdot(-1)=-1.
\end{align*}
Thus $(x_{2k-1})$ converges to $-1$.
We show that the full sequence cannot converge. Suppose, toward a contradiction, that $x_n\to L$ for some $L\in\mathbb{R}$. With $\varepsilon=1/2$, there is $N\in\mathbb{N}$ such that $|x_n-L|<1/2$ for every $n\ge N$. Choose $k$ large enough that $2k\ge N$ and $2k-1\ge N$. Then
\begin{align*}
|1-L|=|x_{2k}-L|<\frac12.
\end{align*}
Also,
\begin{align*}
|-1-L|=|x_{2k-1}-L|<\frac12.
\end{align*}
By the triangle inequality,
\begin{align*}
2=|1-(-1)|=|(1-L)+(L+1)|\le |1-L|+|L+1|<\frac12+\frac12=1,
\end{align*}
which is impossible. Hence $((-1)^n)$ does not converge; the two subsequences reveal the persistent oscillation between $1$ and $-1$.
[/example]
### Compactness
Subsequences also lead to compactness. Boundedness in finite-dimensional Euclidean space prevents escape to infinity, but boundedness alone does not yet identify a limit. The key compactness question is whether bounded data must contain a convergent part after discarding enough terms.
[quotetheorem:628]
The theorem is special to bounded sets in finite-dimensional spaces. In infinite-dimensional spaces, bounded sequences may remain separated forever, so stronger compactness hypotheses or weaker convergence notions are needed.
[example: Bounded Sequence Without Norm-Convergent Subsequence in $\ell^2$]
Let $e_k\in\ell^2$ be the sequence whose $k$-th coordinate is $1$ and whose other coordinates are $0$. Its squared $\ell^2$ norm is
\begin{align*}
\|e_k\|_{\ell^2}^2=\sum_{r=1}^{\infty}|(e_k)_r|^2=|1|^2+\sum_{r\ne k}0^2=1.
\end{align*}
Since norms are nonnegative, $\|e_k\|_{\ell^2}=1$ for every $k$, so the sequence is bounded.
If $j\ne k$, then $e_j-e_k$ has coordinate $1$ at $j$, coordinate $-1$ at $k$, and coordinate $0$ everywhere else. Therefore
\begin{align*}
\|e_j-e_k\|_{\ell^2}^2=\sum_{r=1}^{\infty}|(e_j-e_k)_r|^2=|1|^2+|-1|^2+\sum_{r\ne j,k}0^2=2.
\end{align*}
Thus $\|e_j-e_k\|_{\ell^2}=\sqrt{2}$ whenever $j\ne k$.
Now take any subsequence $(e_{k_m})_{m=1}^{\infty}$. Since the indices $k_m$ are strictly increasing, $k_p\ne k_q$ whenever $p\ne q$, and hence
\begin{align*}
\|e_{k_p}-e_{k_q}\|_{\ell^2}=\sqrt{2}>1.
\end{align*}
With $\varepsilon=1$, no tail of the subsequence can have all its terms within $\varepsilon$ of each other, so no subsequence is Cauchy.
Finally, a norm-convergent subsequence would have to be Cauchy: if $e_{k_m}\to x$ in $\ell^2$, then for $\varepsilon=1$ choose $M$ such that $\|e_{k_m}-x\|_{\ell^2}<1/2$ for all $m\ge M$, and for $p,q\ge M$ the triangle inequality gives
\begin{align*}
\|e_{k_p}-e_{k_q}\|_{\ell^2}\le \|e_{k_p}-x\|_{\ell^2}+\|x-e_{k_q}\|_{\ell^2}<\frac12+\frac12=1.
\end{align*}
This contradicts $\|e_{k_p}-e_{k_q}\|_{\ell^2}=\sqrt{2}$ for $p\ne q$. Hence the bounded sequence $(e_k)$ in $\ell^2$ has no norm-convergent subsequence.
[/example]
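The separation computed in this example can be reproduced with a small model of finitely supported sequences. The Python sketch below stores each sequence as a dictionary from index to nonzero coordinate. This representation and the names `e` and `l2_distance` are assumptions made for illustration.

```python
from math import sqrt


def e(k: int) -> dict[int, float]:
    """Finitely supported model of e_k: index -> nonzero coordinate."""
    return {k: 1.0}


def l2_distance(x: dict[int, float], y: dict[int, float]) -> float:
    """l^2 distance between two finitely supported sequences."""
    support = set(x) | set(y)
    return sqrt(sum((x.get(r, 0.0) - y.get(r, 0.0)) ** 2 for r in support))


# Every pair of distinct basis vectors is exactly sqrt(2) apart.
for j in range(1, 30):
    for k in range(j + 1, 30):
        assert l2_distance(e(j), e(k)) == sqrt(2)
```

Because the pairwise distance never drops below $\sqrt{2}$, thinning the index set cannot produce a Cauchy subsequence, which is the obstruction described in the example.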
## Pointwise and Uniform Convergence of Functions
### Pointwise Control
When integrals appear below, $\mathcal{L}^1$ denotes one-dimensional Lebesgue measure, so $d\mathcal{L}^1$ means integration with respect to this measure.
For functions, a limit can be tested at each input separately. This is often the most accessible notion, but it allows the required stage of convergence to depend on the input.
[definition: Pointwise Convergence]
Let $E$ be a set, let $(Y,d_Y)$ be a metric space, and let $f_n:E\to Y$ and $f:E\to Y$ be functions. The sequence $(f_n)_{n=1}^\infty$ converges pointwise to $f$ on $E$ if for every $x\in E$, the sequence $(f_n(x))_{n=1}^\infty$ converges to $f(x)$ in $Y$.
[/definition]
Pointwise convergence does not control how convergence varies across the domain. To preserve continuity and interchange limits with uniform estimates, we need a single stage that works for all inputs at once.
[definition: Uniform Convergence]
Let $E$ be a set, let $(Y,d_Y)$ be a metric space, and let $f_n:E\to Y$ and $f:E\to Y$ be functions. The sequence $(f_n)_{n=1}^\infty$ converges uniformly to $f$ on $E$ if for every $\varepsilon>0$ there exists $N\in\mathbb{N}$ such that for every $n\ge N$ and every $x\in E$,
\begin{align*}
d_Y(f_n(x),f(x))<\varepsilon.
\end{align*}
[/definition]
Uniform convergence gives a global error bound, but it is helpful to package that bound as a distance between functions. We need the supremum norm because it turns the phrase “the largest pointwise error is small” into ordinary norm convergence.
[definition: Supremum Norm]
Let $E$ be a set and let $B(E;\mathbb{R})$ be the vector space of bounded functions $f:E\to\mathbb{R}$. The supremum norm on $B(E;\mathbb{R})$ is the map
\begin{align*}
\|f\|_\infty &= \sup_{x\in E}|f(x)|.
\end{align*}
[/definition]
The reason for demanding uniform control is that continuity is local in the input but global in the error estimate. We need a theorem that transfers continuity from approximating functions to the limit, and the supremum norm gives the right amount of control to do this at every point.
[quotetheorem:258]
The opening example therefore cannot be uniformly convergent on $[0,1]$. On smaller intervals away from $1$, the same functions do converge uniformly.
[example: Uniform Convergence Away from the Boundary]
Fix $a\in(0,1)$ and define $f_n:[0,a]\to\mathbb{R}$ by $f_n(x)=x^n$. For $x\in[0,a]$ we have $0\le x\le a$, so multiplying the inequalities $0\le x\le a$ by the nonnegative number $x^{n-1}$ gives
\begin{align*}
0\le x^n\le ax^{n-1}.
\end{align*}
Repeating this comparison $n$ times gives $0\le x^n\le a^n$. Since equality occurs at $x=a$,
\begin{align*}
\|f_n\|_\infty=\sup_{x\in[0,a]}|x^n|=a^n.
\end{align*}
It remains to check that $a^n\to 0$. Write $a=1/(1+t)$, where $t=(1-a)/a>0$. Bernoulli's inequality gives $(1+t)^n\ge 1+nt$, and therefore
\begin{align*}
0\le a^n=\frac{1}{(1+t)^n}\le \frac{1}{1+nt}.
\end{align*}
Given $\varepsilon>0$, choose $N\in\mathbb{N}$ with $N>(1/\varepsilon-1)/t$ if $\varepsilon<1$, and choose $N=1$ if $\varepsilon\ge 1$. Then for every $n\ge N$,
\begin{align*}
\|f_n-0\|_\infty=a^n\le \frac{1}{1+nt}<\varepsilon.
\end{align*}
Thus $f_n\to 0$ uniformly on $[0,a]$.
On $[0,1]$, the pointwise limit is $f(x)=0$ for $0\le x<1$ and $f(1)=1$. The convergence is not uniform there: if it were uniform, then for $\varepsilon=1/2$ there would be $N\in\mathbb{N}$ such that $|x^n-f(x)|<1/2$ for every $n\ge N$ and every $x\in[0,1]$. Taking $n=N$ and $x=(3/4)^{1/N}$ gives $0<x<1$, so $f(x)=0$, while
\begin{align*}
|x^N-f(x)|=\left|\left((3/4)^{1/N}\right)^N-0\right|=\frac34>\frac12.
\end{align*}
Hence moving the domain boundary away from $1$ turns the pointwise decay of $x^n$ into uniform decay.
[/example]
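A numerical look at the same two facts may help. The Python sketch below approximates the supremum of $x^n$ over $[0,a]$ on a finite grid and then evaluates the witness point $x=(3/4)^{1/N}$ used on $[0,1]$. The grid size and the helper name `sup_on_grid` are arbitrary choices for illustration, and floating-point values are only approximations of the exact quantities.

```python
def sup_on_grid(n: int, a: float, m: int = 1000) -> float:
    """Approximate sup of x**n over [0, a] using m + 1 equally spaced points."""
    # The last grid point is a * 1.0 == a, where the true supremum is attained.
    return max((a * (i / m)) ** n for i in range(m + 1))


a = 0.9
for n in (1, 10, 50, 200):
    print(n, sup_on_grid(n, a), a ** n)   # the two columns should agree

# On [0, 1] the witness x = (3/4)**(1/N) keeps the error near 3/4 at stage N.
for N in (1, 10, 100, 1000):
    x = 0.75 ** (1.0 / N)
    assert 0.0 < x < 1.0 and x ** N > 0.5
```

On $[0,a]$ the printed supremum shrinks with $n$, while on $[0,1]$ the witness point simply moves closer to $1$ and keeps the error above $1/2$.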
### Integrals and Derivatives
Uniform convergence is strong enough to pass limits through Riemann or Lebesgue integrals on finite intervals. The underlying estimate is simple: if the error is at most $\varepsilon$ everywhere on an interval of length $\ell$, then the integral of the error is at most $\varepsilon\ell$.
[quotetheorem:7848]
Pointwise convergence alone does not control area. A sequence may become small at every fixed point while concentrating fixed mass into narrower intervals.
[example: Moving Spikes and Failure of Pointwise Integral Convergence]
Define $f_n:[0,1]\to\mathbb{R}$ by
\begin{align*}
f_n(x)=n\mathbb{1}_{(0,1/n)}(x).
\end{align*}
We first compute the pointwise limit. Since $0\notin(0,1/n)$ for every $n$, we have
\begin{align*}
f_n(0)=n\mathbb{1}_{(0,1/n)}(0)=n\cdot 0=0.
\end{align*}
Now fix $x\in(0,1]$. Choose $N\in\mathbb{N}$ with $N>1/x$. If $n\ge N$, then
\begin{align*}
\frac{1}{n}\le \frac{1}{N}<x.
\end{align*}
Thus $x\notin(0,1/n)$ for every $n\ge N$, so
\begin{align*}
f_n(x)=n\mathbb{1}_{(0,1/n)}(x)=n\cdot 0=0.
\end{align*}
Hence $f_n(x)\to 0$ for every fixed $x\in[0,1]$.
The integrals do not converge to the integral of the pointwise limit. Since $\mathbb{1}_{(0,1/n)}$ is $1$ on $(0,1/n)$ and $0$ outside it,
\begin{align*}
\int_0^1 f_n(x)\,d\mathcal{L}^1(x)=\int_0^1 n\mathbb{1}_{(0,1/n)}(x)\,d\mathcal{L}^1(x)=n\,\mathcal{L}^1((0,1/n)).
\end{align*}
The interval $(0,1/n)$ has Lebesgue measure $1/n-0=1/n$, so
\begin{align*}
n\,\mathcal{L}^1((0,1/n))=n\cdot\frac{1}{n}=1.
\end{align*}
Thus $f_n\to 0$ pointwise on $[0,1]$, but $\int_0^1 f_n\,d\mathcal{L}^1=1$ for every $n$; the mass remains fixed while the support shrinks toward $0$.
[/example]
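Both halves of this example can be checked with exact arithmetic. In the Python sketch below, `f` implements the spike and `midpoint_sum` is a hypothetical helper that evaluates a midpoint Riemann sum on a grid aligned with the jump at $1/n$. With that alignment the sum happens to equal the integral, which would not hold for an arbitrary grid.

```python
from fractions import Fraction


def f(n: int, x: Fraction) -> int:
    """The spike f_n = n * indicator of the open interval (0, 1/n)."""
    return n if 0 < x < Fraction(1, n) else 0


def midpoint_sum(n: int, k: int = 4) -> Fraction:
    """Midpoint Riemann sum of f_n on [0, 1] with n * k equal cells.

    The grid is aligned with the jump at 1/n, so the sum is exact here.
    """
    m = n * k
    return sum(f(n, Fraction(2 * i + 1, 2 * m)) for i in range(m)) * Fraction(1, m)


x = Fraction(1, 7)                       # any fixed point of (0, 1]
print([f(n, x) for n in range(1, 12)])   # nonzero only while 1/n > x
print([midpoint_sum(n) for n in range(1, 12)])
```

The first list eventually consists of zeros, while the second stays constant: the values at a fixed point vanish, but the total mass does not.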
Differentiation is more fragile than integration because derivatives measure small-scale variation, and a sequence can have stable derivatives while its vertical position drifts. We need a theorem that adds exactly the missing anchor: convergence at one point fixes the constants of integration, while uniform convergence of derivatives controls the variation everywhere else.
[quotetheorem:260]
The base point condition is necessary because derivatives do not record vertical displacement.
[example: Derivatives Converge but Functions Drift]
Let $f_n:[0,1]\to\mathbb{R}$ be defined by $f_n(x)=n$. For each fixed $n$, the function $f_n$ is constant on $[0,1]$, so
\begin{align*}
f_n'(x)=0
\end{align*}
for every $x\in[0,1]$. Hence, if $g(x)=0$ on $[0,1]$, then for every $n$,
\begin{align*}
\|f_n'-g\|_\infty=\sup_{x\in[0,1]}|0-0|=0.
\end{align*}
Therefore $f_n'\to 0$ uniformly on $[0,1]$.
The functions themselves do not converge at any point. Fix $x\in[0,1]$. Then
\begin{align*}
f_n(x)=n
\end{align*}
for every $n$. Suppose, toward a contradiction, that $f_n(x)\to L$ for some real number $L$. Taking $\varepsilon=1$, there would be $N\in\mathbb{N}$ such that $|n-L|<1$ for every $n\ge N$. Choose an integer $n\ge N$ with $n>L+1$. Then
\begin{align*}
|f_n(x)-L|=|n-L|=n-L>1,
\end{align*}
contradicting the required inequality. Thus the derivatives converge uniformly, but without convergence at even one base point the functions can drift vertically and fail to converge anywhere.
[/example]
## Convergence in Measure and $L^p$ Spaces
### Measure-Theoretic Convergence
Pointwise convergence treats every point as equally visible, but integration theory ignores changes on sets of measure zero. We therefore need convergence notions that measure the size of the exceptional set where the error remains large.
[definition: Convergence in Measure]
Let $(E,\mathcal{E},\mu)$ be a measure space, and let $f_n:E\to\mathbb{R}$ and $f:E\to\mathbb{R}$ be measurable functions. The sequence $(f_n)_{n=1}^\infty$ converges to $f$ in measure if for every $\varepsilon>0$,
\begin{align*}
\mu(\{x\in E:|f_n(x)-f(x)|>\varepsilon\})\to 0.
\end{align*}
[/definition]
Convergence in measure allows the bad set to move with $n$. To compare it with pointwise behaviour, we need a pointwise notion that ignores only a fixed null set.
[definition: Almost Everywhere Convergence]
Let $(E,\mathcal{E},\mu)$ be a measure space, and let $f_n:E\to\mathbb{R}$ and $f:E\to\mathbb{R}$ be measurable functions. The sequence $(f_n)_{n=1}^\infty$ converges to $f$ almost everywhere if there exists a measurable set $N\subset E$ with $\mu(N)=0$ such that $f_n(x)\to f(x)$ for every $x\in E\setminus N$.
[/definition]
Almost everywhere convergence fixes the exceptional null set once and for all, while convergence in measure allows the exceptional set to move. On finite measure spaces, this fixed-null-set control should force the moving exceptional sets to become small, and the next theorem makes that comparison precise.
[quotetheorem:1022]
The finite measure assumption cannot be ignored. A bump can move away through an infinite space while keeping the same size.
[example: Translation on an Infinite Measure Space]
Let $E=\mathbb{R}$ with Lebesgue measure $\mathcal{L}^1$, and define
\begin{align*}
f_n(x)=\mathbb{1}_{[n,n+1]}(x).
\end{align*}
We first check pointwise convergence. Fix $x\in\mathbb{R}$. Choose $N\in\mathbb{N}$ with $N>x$. If $n\ge N$, then $n>x$, so $x<n$ and therefore $x\notin[n,n+1]$. Hence, for every $n\ge N$,
\begin{align*}
f_n(x)=\mathbb{1}_{[n,n+1]}(x)=0.
\end{align*}
Therefore $f_n(x)\to 0$ for every fixed $x\in\mathbb{R}$.
Now take $\varepsilon=1/2$. Since $f_n$ only takes the values $0$ and $1$,
\begin{align*}
\{x\in\mathbb{R}:|f_n(x)-0|>1/2\}=\{x\in\mathbb{R}:\mathbb{1}_{[n,n+1]}(x)=1\}=[n,n+1].
\end{align*}
The interval $[n,n+1]$ has Lebesgue measure
\begin{align*}
\mathcal{L}^1([n,n+1])=(n+1)-n=1.
\end{align*}
So for every $n\in\mathbb{N}$,
\begin{align*}
\mathcal{L}^1(\{x\in\mathbb{R}:|f_n(x)|>1/2\})=1.
\end{align*}
This sequence of measures is constantly $1$, so it does not converge to $0$. Hence $f_n$ converges pointwise to $0$, but not in measure on the infinite measure space $\mathbb{R}$.
[/example]
### $L^p$ Control
For integrals and function spaces, it is often better to measure the average size of the error. The $L^p$ norm does this by integrating $p$-th powers of the error.
[definition: $L^p$ Convergence]
Let $(E,\mathcal{E},\mu)$ be a measure space, let $1\le p<\infty$, and let $f_n,f\in L^p(E,\mathcal{E},\mu)$. The sequence $(f_n)_{n=1}^\infty$ converges to $f$ in $L^p$ if
\begin{align*}
\|f_n-f\|_{L^p(E)}=\left(\int_E |f_n-f|^p\,d\mu\right)^{1/p}\to 0.
\end{align*}
[/definition]
$L^p$ convergence should imply that large errors are rare, because a large-error set contributes at least its measure times a fixed power of the threshold to the $L^p$ error. The next theorem turns this intuition into the standard bridge from norm convergence to convergence in measure.
[quotetheorem:1075]
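The counting argument behind this bridge, namely that $\varepsilon^p\,\mu(\{|f_n-f|>\varepsilon\})\le\int_E|f_n-f|^p\,d\mu$, can be tested on a toy measure space. The Python sketch below uses a five-point set with point masses. The error values, the weights, and the helper names are all hypothetical and serve only to illustrate the inequality.

```python
from fractions import Fraction


def large_error_measure(g, mu, eps):
    """Measure of {i : |g_i| > eps} for point masses mu_i on a finite set."""
    return sum(w for v, w in zip(g, mu) if abs(v) > eps)


def lp_error_power(g, mu, p):
    """The integral of |g|**p with respect to the same point masses."""
    return sum(abs(v) ** p * w for v, w in zip(g, mu))


# Hypothetical error values g = f_n - f and weights on a five-point space.
g = [Fraction(3), Fraction(-1, 4), Fraction(0), Fraction(2), Fraction(1, 10)]
mu = [Fraction(1, 8), Fraction(1, 2), Fraction(1, 8), Fraction(1, 8), Fraction(1, 8)]

for p in (1, 2, 3):
    for eps in (Fraction(1, 5), Fraction(1), Fraction(5, 2)):
        # eps**p * mu(|g| > eps) <= integral of |g|**p
        assert eps ** p * large_error_measure(g, mu, eps) <= lp_error_power(g, mu, p)
```

Dividing by $\varepsilon^p$ shows that a small $L^p$ error forces the large-error set to have small measure, for each fixed threshold $\varepsilon$.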
The reverse direction needs extra hypotheses. A dominating integrable function is one of the most common ways to upgrade pointwise convergence to $L^1$ convergence.
[quotetheorem:4]
Dominated convergence prevents the moving-spike pathology by placing one integrable envelope over every term.
[example: Applying Dominated Convergence]
Let $f_n:[0,1]\to\mathbb{R}$ be $f_n(x)=x^n$. We first identify the almost-everywhere limit. If $x=0$, then $f_n(0)=0^n=0$ for every $n$. If $0<x<1$, write $x=1/(1+t)$ with $t=(1-x)/x>0$. Bernoulli's inequality gives $(1+t)^n\ge 1+nt$, so
\begin{align*}
0\le x^n=\frac{1}{(1+t)^n}\le \frac{1}{1+nt}.
\end{align*}
Given $\varepsilon>0$, choose $N\in\mathbb{N}$ with $N>(1/\varepsilon-1)/t$ when $\varepsilon<1$, and choose $N=1$ when $\varepsilon\ge 1$. Then for every $n\ge N$,
\begin{align*}
0\le x^n\le \frac{1}{1+nt}<\varepsilon.
\end{align*}
Thus $x^n\to 0$ for every $x\in[0,1)$. At $x=1$, $f_n(1)=1^n=1$ for every $n$, but the exceptional set $\{1\}$ has $\mathcal{L}^1$-measure $0$. Hence $f_n\to 0$ almost everywhere on $[0,1]$.
For every $x\in[0,1]$ and every $n\in\mathbb{N}$, we have $0\le x^n\le 1$, so $|f_n(x)|\le 1$. Also,
\begin{align*}
\int_0^1 1\,d\mathcal{L}^1=\mathcal{L}^1([0,1])=1<\infty.
\end{align*}
Therefore the constant function $1$ belongs to $L^1([0,1])$, and the *Dominated Convergence Theorem* gives
\begin{align*}
\int_0^1 x^n\,d\mathcal{L}^1(x)\to 0.
\end{align*}
The same conclusion is visible from the explicit integral. Since
\begin{align*}
\frac{d}{dx}\left(\frac{x^{n+1}}{n+1}\right)=x^n,
\end{align*}
the fundamental theorem of calculus gives
\begin{align*}
\int_0^1 x^n\,d\mathcal{L}^1(x)=\left.\frac{x^{n+1}}{n+1}\right|_0^1.
\end{align*}
Evaluating the endpoints,
\begin{align*}
\left.\frac{x^{n+1}}{n+1}\right|_0^1=\frac{1^{n+1}}{n+1}-\frac{0^{n+1}}{n+1}.
\end{align*}
Since $1^{n+1}=1$ and $0^{n+1}=0$,
\begin{align*}
\int_0^1 x^n\,d\mathcal{L}^1(x)=\frac{1}{n+1}.
\end{align*}
Finally, $1/(n+1)\to 0$, matching the conclusion from dominated convergence.
[/example]
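The closed form $1/(n+1)$ can also be compared with a direct numerical approximation. The following Python sketch uses a plain midpoint rule. The helper name `midpoint_integral` and the number of cells are arbitrary choices, and the printed values are floating-point approximations rather than exact integrals.

```python
def midpoint_integral(n: int, m: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of x**n over [0, 1]."""
    h = 1.0 / m
    return h * sum(((i + 0.5) * h) ** n for i in range(m))


for n in (1, 5, 10, 50):
    approx = midpoint_integral(n)
    exact = 1.0 / (n + 1)
    print(f"n = {n:2d}: midpoint = {approx:.8f}, 1/(n+1) = {exact:.8f}")
```

The agreement degrades slowly as $n$ grows, because $x^n$ becomes steep near $1$. This is the same boundary concentration that prevents uniform convergence on $[0,1]$.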
## Weak Convergence and Testing by Functionals
### Weak Limits
Norm convergence is often too strong in infinite-dimensional analysis. Bounded sequences may have no norm-convergent subsequence, so analysts test convergence through continuous linear functionals instead.
[definition: Weak Convergence]
Let $X$ be a Banach space over $\mathbb{R}$ or $\mathbb{C}$, and let $(x_n)_{n=1}^\infty$ be a sequence in $X$. The sequence $(x_n)$ converges weakly to $x\in X$, written $x_n\rightharpoonup x$, if for every $f\in X^*$,
\begin{align*}
f(x_n)\to f(x).
\end{align*}
[/definition]
After defining weak convergence, the next problem is to compare it with the stronger norm topology. We need this comparison because every later weak compactness argument starts from bounds or convergence in norm and then asks what all bounded linear tests see. The theorem below supplies that bridge from strong convergence to weak convergence.
[quotetheorem:982]
The converse fails in infinite dimension, which is exactly why weak convergence is useful: it can produce limits when norm convergence is unavailable.
[example: Weak but Not Norm Convergence in $\ell^2$]
Let $(e_n)_{n=1}^\infty$ be the standard orthonormal sequence in $\ell^2$, so $(e_n)_n=1$ and $(e_n)_k=0$ when $k\ne n$. We show that $e_n\rightharpoonup 0$ but $e_n$ does not converge to $0$ in norm.
Fix $a=(a_k)_{k=1}^\infty\in\ell^2$. Since $a\in\ell^2$, the series $\sum_{k=1}^{\infty}|a_k|^2$ converges. Hence its tails converge to $0$, and for every $n$,
\begin{align*}
0\le |a_n|^2\le \sum_{k=n}^{\infty}|a_k|^2.
\end{align*}
Therefore $|a_n|^2\to 0$, so $a_n\to 0$ and also $\overline{a_n}\to 0$.
For the functional $T_a:\ell^2\to\mathbb{C}$ defined by $T_a(x)=(x,a)_{\ell^2}$, we compute
\begin{align*}
T_a(e_n)=(e_n,a)_{\ell^2}=\sum_{k=1}^{\infty}(e_n)_k\overline{a_k}.
\end{align*}
All terms in the sum vanish except the $k=n$ term, because $(e_n)_k=0$ for $k\ne n$ and $(e_n)_n=1$. Thus
\begin{align*}
T_a(e_n)=1\cdot\overline{a_n}=\overline{a_n}\to 0.
\end{align*}
Also $T_a(0)=(0,a)_{\ell^2}=0$, so $T_a(e_n)\to T_a(0)$.
By the *Riesz representation theorem for Hilbert spaces*, every continuous linear functional on $\ell^2$ has the form $T_a$ for some $a\in\ell^2$. Hence every element of $(\ell^2)^*$ sends $e_n$ to a scalar sequence converging to its value at $0$, so $e_n\rightharpoonup 0$ in $\ell^2$.
This convergence is not norm convergence. For every $n$,
\begin{align*}
\|e_n-0\|_{\ell^2}^2=\sum_{k=1}^{\infty}|(e_n)_k|^2=|1|^2+\sum_{k\ne n}0^2=1.
\end{align*}
Since norms are nonnegative, $\|e_n-0\|_{\ell^2}=1$ for every $n$. Taking $\varepsilon=1/2$, there is no $N\in\mathbb{N}$ such that $\|e_n-0\|_{\ell^2}<1/2$ for all $n\ge N$. Thus $e_n$ converges weakly to $0$, but not in norm; weak convergence records the values of all continuous linear tests, while norm convergence would require the vectors themselves to become close to $0$.
[/example]
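The contrast between the two modes of convergence can be tabulated. In the Python sketch below, a few real test vectors $a\in\ell^2$ are given by coordinate functions. These particular vectors, the truncation length, and the helper names are assumptions chosen for illustration. Since $T_a(e_n)=\overline{a_n}$, pairing with $e_n$ just reads off one coordinate.

```python
# Hypothetical test vectors in l^2, given by their coordinate functions k -> a_k.
tests = {
    "1/k": lambda k: 1.0 / k,
    "2^-k": lambda k: 2.0 ** -k,
    "(-1)^k / k^0.6": lambda k: (-1.0) ** k / k ** 0.6,
}


def pair_with_basis(a, n: int) -> float:
    """T_a(e_n) for a real vector a: only the n-th coordinate survives."""
    return a(n)


def basis_norm(n: int, length: int = 2000) -> float:
    """l^2 norm of e_n, summed over the first `length` coordinates (n <= length)."""
    return sum((1.0 if r == n else 0.0) ** 2 for r in range(1, length + 1)) ** 0.5


for n in (1, 10, 100, 1000):
    values = {name: round(pair_with_basis(a, n), 6) for name, a in tests.items()}
    print(n, values, basis_norm(n))
```

Each pairing tends to $0$, at a rate that depends on the test vector, while the norm column stays at $1$. A finite table cannot replace the Riesz argument, which covers every functional at once.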
### Lower Semicontinuity
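Weak convergence loses norm information, but variational arguments still need inequalities to survive. Lower semicontinuity is the property that prevents energy from jumping upward at the limit: the energy of the weak limit can be smaller than the energies along the sequence, but never larger than their limit inferior.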
[definition: Weak Sequential Lower Semicontinuity]
Let $X$ be a Banach space and let $F:X\to(-\infty,\infty]$ be a function. The function $F$ is weakly sequentially lower semicontinuous if whenever $x_n\rightharpoonup x$ weakly in $X$,
\begin{align*}
F(x)\le \liminf_{n\to\infty}F(x_n).
\end{align*}
[/definition]
The norm itself has this lower semicontinuity property, and this is the estimate that makes many weak compactness arguments usable. We need it because bounded minimizing sequences often converge only weakly, so the limiting candidate must inherit an energy bound without requiring norm convergence.
[quotetheorem:215]
The inequality may be strict. In $\ell^2$, the sequence $e_n\rightharpoonup 0$ has norms equal to $1$, while the weak limit has norm $0$.
## Series and Uniform Tests
Infinite series are sequences of partial sums. The convergence question is therefore the same, but the structure of sums gives useful tests. Uniform convergence of a function series is especially important because it preserves continuity and justifies termwise operations under suitable hypotheses.
[definition: Uniform Convergence of a Series of Functions]
Let $E$ be a set, let $(Y,\|\cdot\|_Y)$ be a normed vector space, and let $u_n:E\to Y$ be functions. The series $\sum_{n=1}^\infty u_n$ converges uniformly on $E$ if the sequence of partial sums $s_N:E\to Y$ defined by $s_N(x)=\sum_{n=1}^N u_n(x)$ converges uniformly on $E$ as $N\to\infty$.
[/definition]
A practical convergence test should compare every term of the function series to a numerical majorant. We need the Weierstrass $M$-test because it gives uniform convergence without knowing the sum in advance, which is exactly the situation in power-series and approximation arguments.
[quotetheorem:261]
The test replaces a function problem by a numerical comparison. It is strong enough for many power-series and approximation arguments.
[example: Uniform Convergence of a Power-Type Series]
For $x\in[0,1]$, define $u_n(x)=x^n/n^2$. Multiplying the inequalities $0\le x\le 1$ by the nonnegative number $x^{n-1}$ gives $0\le x^n\le x^{n-1}$ for $n\ge 1$, and iterating this comparison down to $x^0=1$ gives $0\le x^n\le 1$. Hence, for every $n\in\mathbb{N}$ and every $x\in[0,1]$,
\begin{align*}
|u_n(x)|=\left|\frac{x^n}{n^2}\right|=\frac{|x^n|}{n^2}=\frac{x^n}{n^2}\le \frac{1}{n^2}.
\end{align*}
Set $M_n=1/n^2$. The numerical series $\sum_{n=1}^{\infty}M_n=\sum_{n=1}^{\infty}1/n^2$ converges by the $p$-series test with $p=2>1$. Therefore the hypotheses of the *Weierstrass $M$-test* are satisfied: $|u_n(x)|\le M_n$ for every $x\in[0,1]$, with $M_n$ independent of $x$, and $\sum_{n=1}^{\infty}M_n<\infty$.
It follows that
\begin{align*}
\sum_{n=1}^{\infty}u_n(x)=\sum_{n=1}^{\infty}\frac{x^n}{n^2}
\end{align*}
converges uniformly and absolutely on $[0,1]$. The point is that the entire function series is controlled by the single convergent numerical majorant $\sum_{n=1}^{\infty}1/n^2$, independently of $x$.
[/example]
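The majorant bound can be observed numerically. The sketch below is an illustration under stated assumptions rather than a proof: the sample grid, the cutoff `M = 2000` standing in for the full series, and the helper names `partial_sum` and `sup_gap` are all hypothetical choices. It compares the largest sampled gap $|s_M(x)-s_N(x)|$ with the numerical tail $\sum_{n=N+1}^{M}1/n^2$, which is itself below $1/N$.

```python
def partial_sum(x, N):
    """s_N(x) = sum_{n=1}^N x^n / n^2, with powers built up iteratively."""
    total, power = 0.0, 1.0
    for n in range(1, N + 1):
        power *= x
        total += power / (n * n)
    return total

def sup_gap(N, M, grid):
    """Largest value of |s_M(x) - s_N(x)| over the sample grid."""
    return max(abs(partial_sum(x, M) - partial_sum(x, N)) for x in grid)

grid = [i / 200 for i in range(201)]  # sample points in [0, 1], including x = 1
M = 2000  # finite stand-in for the full series (assumption of this sketch)

results = []
for N in (10, 100, 1000):
    majorant_tail = sum(1.0 / (n * n) for n in range(N + 1, M + 1))
    gap = sup_gap(N, M, grid)
    results.append((N, gap, majorant_tail))
    print(f"N={N:5d}  sampled sup gap={gap:.6f}  majorant tail={majorant_tail:.6f}")
```

On this grid the sampled gap never exceeds the majorant tail (up to rounding), and both shrink as $N$ grows, with no dependence on which $x$ realizes the maximum.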
## Choosing the Right Mode of Convergence
The different notions answer different questions. Pointwise convergence asks whether each input stabilizes. Uniform convergence asks whether the whole domain stabilizes at one rate. $L^p$ convergence asks whether average error vanishes. Convergence in measure asks whether large errors become rare. Weak convergence asks whether every bounded linear measurement stabilizes.
Because each notion discards different information, implications must be handled carefully. The following summary records the basic directions that are used most often.
[quotetheorem:7849]
The converses fail in important ways. Pointwise convergence need not preserve continuity or integrals. Convergence in measure need not give pointwise convergence of the full sequence. Weak convergence need not give norm convergence.
[example: Convergence in Measure Without Pointwise Convergence]
Let $E=[0,1]$ with Lebesgue measure. For each dyadic level $k\ge 0$ and each $j\in\{0,1,\dots,2^k-1\}$, set
\begin{align*}
I_{k,j}=\left[\frac{j}{2^k},\frac{j+1}{2^k}\right).
\end{align*}
List these intervals by increasing $k$, and within each fixed level by increasing $j$. Thus level $k$ occupies the indices $n=2^k,2^k+1,\dots,2^{k+1}-1$. Let $I_n$ be the $n$-th interval in this list and define $f_n=\mathbb{1}_{I_n}$.
We first show that $f_n\to 0$ in measure. If $I_n=I_{k,j}$, then
\begin{align*}
\mathcal{L}^1(I_n)=\mathcal{L}^1\left(\left[\frac{j}{2^k},\frac{j+1}{2^k}\right)\right)=\frac{j+1}{2^k}-\frac{j}{2^k}=\frac{1}{2^k}.
\end{align*}
Given $\eta>0$, choose $K\in\mathbb{N}$ such that $2^{-K}<\eta$. If $n\ge 2^K$, then $I_n$ belongs to some level $k\ge K$, so
\begin{align*}
\mathcal{L}^1(I_n)=2^{-k}\le 2^{-K}<\eta.
\end{align*}
Hence $\mathcal{L}^1(I_n)\to 0$.
Now fix $\varepsilon>0$. Since $f_n$ only takes the values $0$ and $1$, if $\varepsilon\ge 1$ then
\begin{align*}
\{x\in[0,1]:|f_n(x)-0|>\varepsilon\}=\varnothing.
\end{align*}
If $0<\varepsilon<1$, then $|f_n(x)|>\varepsilon$ exactly when $f_n(x)=1$, so
\begin{align*}
\{x\in[0,1]:|f_n(x)-0|>\varepsilon\}=I_n.
\end{align*}
Therefore, in both cases,
\begin{align*}
\mathcal{L}^1(\{x\in[0,1]:|f_n(x)-0|>\varepsilon\})\to 0.
\end{align*}
Thus $f_n\to 0$ in measure.
The sequence does not converge at any point of $[0,1)$. Fix $x\in[0,1)$. For each $k\ge 0$, let $j_k=\lfloor 2^k x\rfloor$. Since $0\le x<1$, we have $0\le j_k\le 2^k-1$, and the defining property of the floor function gives
\begin{align*}
j_k\le 2^k x<j_k+1.
\end{align*}
Dividing by $2^k$ gives
\begin{align*}
\frac{j_k}{2^k}\le x<\frac{j_k+1}{2^k},
\end{align*}
so $x\in I_{k,j_k}$. Hence at the index corresponding to $I_{k,j_k}$, the value of $f_n(x)$ is $1$. This happens once at every dyadic level, so $f_n(x)=1$ for infinitely many $n$.
For every level $k\ge 1$, choose some $\ell_k\in\{0,1,\dots,2^k-1\}$ with $\ell_k\ne j_k$. Then $x\notin I_{k,\ell_k}$, and at the index corresponding to $I_{k,\ell_k}$ we have
\begin{align*}
f_n(x)=\mathbb{1}_{I_{k,\ell_k}}(x)=0.
\end{align*}
This also happens for infinitely many $n$. Therefore, for every $x\in[0,1)$, the scalar sequence $(f_n(x))$ takes the values $1$ and $0$ infinitely often, so it cannot converge. The example shows that convergence in measure allows the exceptional set to move, while pointwise convergence of the full sequence requires eventual stabilization at each fixed point.
[/example]
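The indexing in this example can be made concrete. The following sketch assumes the enumeration $n=2^k+j$ described above; the function names `level_and_position`, `f`, and `measure`, the sample point `x = 0.3`, and the cutoff at ten dyadic levels are illustrative choices, not part of the original construction.

```python
def level_and_position(n):
    """Return (k, j) with n = 2**k + j and 0 <= j < 2**k, for n >= 1."""
    k = n.bit_length() - 1
    return k, n - (1 << k)

def f(n, x):
    """Indicator of I_{k,j} = [j/2^k, (j+1)/2^k) evaluated at x."""
    k, j = level_and_position(n)
    return 1 if j <= x * 2**k < j + 1 else 0

def measure(n):
    """Lebesgue measure of I_n, namely 2^{-k} for the level k of n."""
    k, _ = level_and_position(n)
    return 2.0 ** (-k)

x = 0.3                              # fixed sample point in [0, 1)
indices = range(1, 2**10)            # levels k = 0, ..., 9
values = [f(n, x) for n in indices]
hits = [n for n in indices if f(n, x) == 1]

print("indices where f_n(x) = 1:", hits)
print("measure of the last interval:", measure(2**10 - 1))
```

Each dyadic level contributes exactly one index where $f_n(x)=1$, so the list of hits keeps growing with the number of levels, while the measure of $I_n$ halves at every level.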
In applications, the operation to be preserved determines which mode of convergence is needed. Continuity suggests uniform convergence. Integral limits suggest $L^1$ convergence, domination, or uniform integrability. Infinite-dimensional compactness suggests weak convergence or compact embeddings. This chapter has focused on the core sequence and function-space modes; locally uniform convergence, convergence of measures, compact embeddings, and convergence in distribution are boundary topics that use the same philosophy but need their own hypotheses and examples.
## Beyond and Connected Topics
Convergence is a central thread in [Cambridge IA Analysis Notes](/page/Cambridge%20IA%20Analysis%20Notes), where sequences, series, continuity, and differentiability are first tied together through epsilon arguments.
In [Cambridge IB Analysis and Topology](/page/Cambridge%20IB%20Analysis%20and%20Topology), convergence is reorganized through metric spaces, compactness, completeness, and topological structure. This is the natural continuation for readers who want the general language behind the definitions above.
In [Cambridge II Analysis of Functions](/page/Cambridge%20II%20Analysis%20of%20Functions), convergence becomes a tool for function spaces, Fourier analysis, differentiation under the limit, and approximation. The distinction between pointwise, uniform, and norm convergence becomes operational there.
In [Cambridge IB Complex Analysis](/page/Cambridge%20IB%20Complex%20Analysis), locally uniform convergence is especially important because it preserves holomorphicity and supports termwise manipulation of power series.
Further directions include weak convergence in Banach spaces, convergence of measures, compact embeddings in Sobolev spaces, and convergence in distribution for probability. Each direction keeps the same guiding question: which tests define the limit, and which structures survive after the limit is taken?
## References
Androma, [Cambridge IA Analysis Notes](/page/Cambridge%20IA%20Analysis%20Notes).
Androma, [Cambridge IB Analysis and Topology](/page/Cambridge%20IB%20Analysis%20and%20Topology).
Androma, [Cambridge IB Complex Analysis](/page/Cambridge%20IB%20Complex%20Analysis).
Androma, [Cambridge II Analysis of Functions](/page/Cambridge%20II%20Analysis%20of%20Functions).
Walter Rudin, *Principles of Mathematical Analysis* (1976).
Gerald B. Folland, *Real Analysis* (1999).
Haim Brezis, *Functional Analysis, Sobolev Spaces and Partial Differential Equations* (2011).