# Metric Space
How far apart are two [functions](/page/Function)? Two matrices? Two probability [distributions](/page/Distribution)? On the real line, the distance between numbers $x$ and $y$ is $|x - y|$, and the entire edifice of real analysis --- convergence, [continuity](/page/Continuity), compactness --- rests on this single measurement. But the real line is only one setting among many. In the space $C[0,1]$ of continuous functions on the unit interval, the "distance" between two functions $f$ and $g$ might be the largest vertical gap $\sup_{t \in [0,1]} |f(t) - g(t)|$, or the total area between their graphs $\int_0^1 |f(t) - g(t)| \, dt$. In a network, distance might count hops rather than metres. The concept of a metric space isolates the three properties that make any notion of distance analytically useful --- positivity, symmetry, and the triangle inequality --- and demonstrates that once these three axioms hold, the full machinery of convergence, continuity, compactness, and completeness follows without reference to coordinates, dimensions, or algebraic structure.
[motivation]
### Beyond Euclidean distance
In a first analysis course, every distance is $|x - y|$ on $\mathbb{R}$ or $\|x - y\|$ on $\mathbb{R}^n$. This is sufficient for studying [sequences](/page/Sequence) of numbers and functions of a real variable, but it becomes limiting as soon as the objects of study are themselves more complex. Consider the space of all continuous functions $f: [0,1] \to \mathbb{R}$. Two functions $f$ and $g$ might agree everywhere except on a tiny interval near $t = 1/2$, where $f$ spikes sharply above $g$. Are they "close"? That depends on how we measure: the supremum distance $\sup |f(t) - g(t)|$ says they are far apart (the spike is tall), while the $L^1$ distance $\int_0^1 |f(t) - g(t)| \, dt$ says they are close (the spike is narrow, so the area is small). Different notions of distance on the same underlying set can lead to different convergent sequences, different continuous functions, and different compact subsets. A framework that accommodates all such notions simultaneously is not a luxury --- it is a necessity.
### What makes a "distance" useful?
Not every assignment of non-negative numbers to pairs of points deserves to be called a distance. Three requirements emerge from examining what makes $|x - y|$ so effective on $\mathbb{R}$.
First, **positivity and identity of indiscernibles**: $d(x, y) \geq 0$ always, and $d(x, y) = 0$ if and only if $x = y$. Without this, distinct points could be "distance zero apart," collapsing the ability to distinguish them. Relaxing this condition leads to pseudometrics, which arise naturally in quotient constructions but sacrifice the power to separate points.
Second, **symmetry**: $d(x, y) = d(y, x)$. The distance from London to Paris is the same as from Paris to London. Dropping symmetry leads to quasimetrics, which model situations like travel times on one-way streets, but which require separate theories for "forward convergence" and "backward convergence."
Third, the **triangle inequality**: $d(x, z) \leq d(x, y) + d(y, z)$. This is the axiom with the deepest consequences. It prevents "shortcuts" --- going from $x$ to $z$ via $y$ can never be cheaper than going directly. Every proof in metric space theory that involves approximation --- and that is nearly all of them --- relies on the triangle inequality to control accumulated errors. The $\varepsilon/3$ argument, the passage from pointwise to uniform convergence, the proof that Cauchy sequences are bounded: all depend on chaining triangle inequality estimates.
### The payoff
Once a set $X$ is equipped with a function $d$ satisfying these three axioms, the entire toolkit of analysis becomes available. Open balls define a topology; convergence is characterised by $d(x_n, x) \to 0$; continuity means $\varepsilon$-$\delta$ with $d$ in place of absolute value; Cauchy sequences and completeness generalise directly. The remarkable fact is that these constructions do not depend on the specific formula for $d$ --- only on the three axioms. A single proof of the Heine-Cantor theorem, for instance, works simultaneously for real-valued functions on an interval, for maps between surfaces, and for operators between Banach spaces. The metric space framework is the engine that makes this kind of unification possible.
[/motivation]
## Definition
[definition:Metric Space]
A **metric space** is an ordered pair $(X, d)$, where $X$ is a non-empty set and $d: X \times X \to \mathbb{R}$ is a function, called the **metric** (or **distance function**), satisfying the following three axioms for all $x, y, z \in X$:
1. **Positivity and identity of indiscernibles.**
\begin{align*}
d(x, y) \geq 0, \quad \text{and} \quad d(x, y) = 0 \iff x = y.
\end{align*}
2. **Symmetry.**
\begin{align*}
d(x, y) = d(y, x).
\end{align*}
3. **Triangle inequality.**
\begin{align*}
d(x, z) \leq d(x, y) + d(y, z).
\end{align*}
The elements of $X$ are called **points**, and $d(x, y)$ is called the **distance** between $x$ and $y$.
[/definition]
The three axioms are independent: no two of them imply the third, and each rules out a distinct pathology. Axiom 1 ensures that the metric separates points --- it functions as an identity test. Axiom 2 ensures that the relation "being close to" is symmetric. Axiom 3 ensures that distances compose in a controlled way, which is essential for every approximation argument in the subject.
A consequence of Axioms 2 and 3 is the **reverse triangle inequality**: for all $x, y, z \in X$,
\begin{align*}
|d(x, z) - d(y, z)| \leq d(x, y).
\end{align*}
To see this, apply the triangle inequality twice: $d(x, z) \leq d(x, y) + d(y, z)$ gives $d(x, z) - d(y, z) \leq d(x, y)$, and swapping the roles of $x$ and $y$ gives $d(y, z) - d(x, z) \leq d(y, x) = d(x, y)$. Taking the maximum of the two inequalities yields the result. The reverse triangle inequality shows that the distance function $d$ is Lipschitz continuous (with constant $1$) in each variable separately, a fact used repeatedly in the sequel.
[example:Euclidean Metric]
The **Euclidean metric** on $\mathbb{R}^n$ is defined by
\begin{align*}
d_2: \mathbb{R}^n \times \mathbb{R}^n &\to \mathbb{R} \\
(x, y) &\mapsto \left( \sum_{i=1}^{n} (x_i - y_i)^2 \right)^{1/2},
\end{align*}
where $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$. Positivity and symmetry are immediate. The triangle inequality $d_2(x, z) \leq d_2(x, y) + d_2(y, z)$ is the Minkowski inequality for $p = 2$, which in turn follows from the Cauchy-Schwarz inequality. When $n = 1$, this reduces to the familiar $|x - y|$ on the real line.
[/example]
[example:Supremum Metric On Bounded Functions]
Let $S$ be a non-empty set and let $B(S)$ denote the set of all bounded functions $f: S \to \mathbb{R}$. Define
\begin{align*}
d_\infty: B(S) \times B(S) &\to \mathbb{R} \\
(f, g) &\mapsto \sup_{s \in S} |f(s) - g(s)|.
\end{align*}
The supremum exists because $f - g$ is bounded. Positivity and symmetry follow from those of $|\cdot|$. For the triangle inequality, let $f, g, h \in B(S)$ and $s \in S$. Then
\begin{align*}
|f(s) - h(s)| \leq |f(s) - g(s)| + |g(s) - h(s)| \leq d_\infty(f, g) + d_\infty(g, h).
\end{align*}
Taking the supremum over $s$ gives $d_\infty(f, h) \leq d_\infty(f, g) + d_\infty(g, h)$. This metric is fundamental in analysis: convergence in $d_\infty$ is precisely [uniform convergence](/page/Uniform%20Convergence), and the resulting metric space structure on $C[0,1]$ underpins the Arzela-Ascoli theorem and the theory of [Banach spaces](/page/Banach%20Space).
[/example]
[example:Discrete Metric]
On any non-empty set $X$, define
\begin{align*}
d_{\text{disc}}: X \times X &\to \mathbb{R} \\
(x, y) &\mapsto \begin{cases} 0 & \text{if } x = y, \\ 1 & \text{if } x \neq y. \end{cases}
\end{align*}
Axioms 1 and 2 are immediate. For the triangle inequality, if $x = z$ then $d_{\text{disc}}(x, z) = 0 \leq d_{\text{disc}}(x, y) + d_{\text{disc}}(y, z)$. If $x \neq z$, then $y$ differs from at least one of $x$ or $z$, so $d_{\text{disc}}(x, y) + d_{\text{disc}}(y, z) \geq 1 = d_{\text{disc}}(x, z)$. The discrete metric is the crudest possible measure of distance: it detects only whether points are equal or distinct, nothing about "how far apart" they are. Every subset of $(X, d_{\text{disc}})$ is open, so the induced topology is the discrete topology, which is the finest topology on $X$. A sequence converges in this metric if and only if it is eventually constant.
[/example]
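The axiom checks carried out by hand in the examples above can also be spot-checked numerically. The following is a minimal sketch, not a proof: it tests the three axioms by brute force on a small finite sample of points in $\mathbb{R}^2$. The helper name `check_metric_axioms`, the sample points, and the tolerance `tol` are illustrative assumptions, not part of the definitions.

```python
import itertools
import math

def euclidean(x, y):
    """Euclidean metric d_2 on tuples of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def discrete(x, y):
    """Discrete metric: 0 for equal points, 1 otherwise."""
    return 0 if x == y else 1

def check_metric_axioms(d, points, tol=1e-12):
    """Return the sorted list of axioms (1, 2, 3) violated on the finite sample.

    Hypothetical helper: passing this check on a sample does not prove
    that d is a metric, but a reported violation disproves it.
    """
    violated = set()
    for x, y in itertools.product(points, repeat=2):
        # Axiom 1: non-negativity, and d(x, y) = 0 exactly when x = y.
        if d(x, y) < 0 or ((d(x, y) == 0) != (x == y)):
            violated.add(1)
        # Axiom 2: symmetry.
        if abs(d(x, y) - d(y, x)) > tol:
            violated.add(2)
    for x, y, z in itertools.product(points, repeat=3):
        # Axiom 3: triangle inequality (with a small floating-point tolerance).
        if d(x, z) > d(x, y) + d(y, z) + tol:
            violated.add(3)
    return sorted(violated)

sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (-1.5, 2.5)]

def squared(x, y):
    """Squared Euclidean distance: symmetric and point-separating, but not a metric."""
    return euclidean(x, y) ** 2

print(check_metric_axioms(euclidean, sample))  # []
print(check_metric_axioms(discrete, sample))   # []
print(check_metric_axioms(squared, sample))    # [3]
```

The squared Euclidean distance fails only the triangle inequality: for $x = (1, 0)$, $y = (0, 0)$, $z = (-1.5, 2.5)$ the squared distances are $12.5 > 1 + 8.5$.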
## Open Sets and Topology
The concept of an open set arises naturally from the metric: a set is open if every point in it has room to move in all directions without leaving the set. This is formalised through open balls.
[definition:Open Ball]
Let $(X, d)$ be a metric space, let $x \in X$, and let $r > 0$. The **open ball** of radius $r$ centred at $x$ is the set
\begin{align*}
B_r(x) = \{ y \in X : d(x, y) < r \}.
\end{align*}
[/definition]
In $\mathbb{R}$ with the standard metric, $B_r(x) = (x - r, x + r)$ is an open interval. In $\mathbb{R}^2$ with the Euclidean metric, $B_r(x)$ is an open disc. In $\mathbb{R}^2$ with the taxicab metric $d_1(x, y) = |x_1 - y_1| + |x_2 - y_2|$, the "ball" $B_r(x)$ is a diamond (square rotated $45^\circ$). The shape of open balls depends on the metric, not just on the underlying set.
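To make the dependence on the metric concrete, here is a small sketch that tests membership in the unit ball around the origin of $\mathbb{R}^2$ under both metrics. The function name `in_open_ball` and the test point $(0.6, 0.6)$ are illustrative choices.

```python
import math

def d1(x, y):
    """Taxicab metric on tuples of equal length."""
    return sum(abs(a - b) for a, b in zip(x, y))

def d2(x, y):
    """Euclidean metric on tuples of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def in_open_ball(d, centre, r, y):
    """True if y lies in the open ball B_r(centre) for the metric d."""
    return d(centre, y) < r

origin = (0.0, 0.0)
p = (0.6, 0.6)

# p is inside the Euclidean unit disc: d2 = 0.6 * sqrt(2), roughly 0.849.
print(in_open_ball(d2, origin, 1.0, p))  # True
# p is outside the taxicab unit diamond: d1 = 1.2.
print(in_open_ball(d1, origin, 1.0, p))  # False
```

The point lies in the corner region of the disc that the inscribed diamond cuts off, which is why the two balls disagree.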
[definition:Open Set In A Metric Space]
Let $(X, d)$ be a metric space. A subset $U \subseteq X$ is **open** if for every $x \in U$, there exists $r > 0$ such that $B_r(x) \subseteq U$.
[/definition]
Openness captures the idea that $U$ has no "[boundary](/page/Boundary) points" among its members: every point of $U$ is an interior point, surrounded by a small ball entirely contained in $U$.
The collection of all [open sets](/page/Open%20Set) in a metric space $(X, d)$ satisfies three properties that define a topology:
(i) The empty set $\varnothing$ and the full space $X$ are open.
(ii) The union of any collection of open sets is open.
(iii) The intersection of finitely many open sets is open.
Property (i) holds because $\varnothing$ has no points to check, and $X$ contains every ball. Property (ii) holds because if $x$ lies in a union $\bigcup_\alpha U_\alpha$, then $x \in U_\alpha$ for some $\alpha$, and the ball witnessing openness of $U_\alpha$ at $x$ lies in the union. Property (iii) holds because if $x \in U_1 \cap \cdots \cap U_k$, each $U_j$ provides a ball $B_{r_j}(x) \subseteq U_j$, and $B_r(x) \subseteq U_1 \cap \cdots \cap U_k$ for $r = \min(r_1, \ldots, r_k)$. The finiteness is essential: an infinite intersection of open sets need not be open. For instance, $\bigcap_{n=1}^\infty (-1/n, 1/n) = \{0\}$, which is not open in $\mathbb{R}$.
This collection of open sets is called the **topology induced by the metric** $d$. It is the bridge between metric spaces and the more general setting of [topological spaces](/page/Topology): every metric space is a topological space, but not every topological space arises from a metric. The induced topology determines which functions are continuous, which sequences converge, and which [sets](/page/Set) are compact --- the topological, rather than the quantitative, content of the metric.
The following theorem makes precise the relationship between continuity and the induced topology. It shows that continuity, originally defined pointwise via $\varepsilon$-$\delta$, can be characterised globally in terms of open sets.
[quotetheorem:269]
This characterisation is the gateway from metric spaces to general topology. In a topological space where there is no metric, the $\varepsilon$-$\delta$ definition of continuity is unavailable, and the preimage condition becomes the *definition* of continuity. The theorem guarantees that for metric spaces, this topological definition agrees with the familiar $\varepsilon$-$\delta$ one. The proof in both directions relies on the open ball structure: the forward direction uses the $\varepsilon$-$\delta$ condition to build a ball around each preimage point, and the reverse direction applies the preimage condition to an open ball in the codomain to extract a $\delta$ from the openness of the preimage. A more detailed treatment of continuity between metric spaces, including composition, sequential characterisation, and the relationship to [continuity on the real line](/page/Continuity%20(Real%20Analysis)), can be found on the [Continuity (Metric Spaces)](/page/Continuity%20(Metric%20Spaces)) page.
## Convergence and Completeness
Convergence in a metric space generalises the $\varepsilon$-$N$ definition from [real sequences](/page/Convergence%20(Real%20Sequences)): a sequence approaches a limit if the distances to that limit become arbitrarily small.
A sequence $(x_n)_{n \in \mathbb{N}}$ in a metric space $(X, d)$ **converges** to a point $x \in X$ if for every $\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that $d(x_n, x) < \varepsilon$ for all $n \geq N$. When this holds, we write $x_n \to x$ and call $x$ the **limit** of the sequence. The limit, if it exists, is unique: if $x_n \to x$ and $x_n \to y$, then for every $\varepsilon > 0$ we have
\begin{align*}
0 \leq d(x, y) \leq d(x, x_n) + d(x_n, y) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon
\end{align*}
for all sufficiently large $n$, so $d(x, y) = 0$ and hence $x = y$ by Axiom 1. This argument uses both the triangle inequality and the identity of indiscernibles --- both axioms are essential for [uniqueness of limits](/theorems/625).
Convergence can be rephrased topologically: $x_n \to x$ if and only if every open set containing $x$ contains all but finitely many terms of the sequence. This equivalence connects the sequential and topological viewpoints, and in metric spaces (which are first-countable), these two viewpoints are fully interchangeable.
### Cauchy sequences
In many situations, one suspects that a sequence converges but does not know what the limit should be. The Cauchy condition captures the idea that the terms of the sequence cluster together, without reference to any candidate limit.
A sequence $(x_n)_{n \in \mathbb{N}}$ in a metric space $(X, d)$ is a **Cauchy sequence** if for every $\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that $d(x_m, x_n) < \varepsilon$ for all $m, n \geq N$. Every convergent sequence is Cauchy (the proof is identical to the real-valued case, using the triangle inequality), but the converse may fail: a Cauchy sequence can "want to converge" to a point that is not in the space. The standard example is the sequence of rationals $1, 1.4, 1.41, 1.414, \ldots$ in $(\mathbb{Q}, |\cdot|)$: it is Cauchy, but its limit $\sqrt{2}$ does not belong to $\mathbb{Q}$.
A metric space $(X, d)$ is called **complete** if every Cauchy sequence in $X$ converges to a point of $X$. Completeness is a property of the pair $(X, d)$, not of the set $X$ alone. For a detailed treatment, see [Cauchy Sequence](/page/Cauchy%20Sequence).
[example:Real Line Is Complete]
The real line $(\mathbb{R}, |\cdot|)$ is complete. This is one of several equivalent formulations of the completeness axiom for $\mathbb{R}$. The proof, given in full on the [Convergence (Real Sequences)](/page/Convergence%20(Real%20Sequences)) page, proceeds by showing that every Cauchy sequence in $\mathbb{R}$ is bounded, extracting a convergent subsequence via the Bolzano-Weierstrass theorem, and then using the Cauchy condition to show that the full sequence converges to the same limit.
More generally, $(\mathbb{R}^n, d_2)$ is complete: a sequence in $\mathbb{R}^n$ is Cauchy if and only if each of its $n$ coordinate sequences is Cauchy in $\mathbb{R}$, and coordinate-wise convergence in $\mathbb{R}$ (which is complete) gives convergence in $\mathbb{R}^n$.
[/example]
[example:Rationals Are Not Complete]
The metric space $(\mathbb{Q}, |\cdot|)$ is not complete. Define the sequence $(q_n)_{n \in \mathbb{N}}$ in $\mathbb{Q}$ by the recursion
\begin{align*}
q_1 = 1, \qquad q_{n+1} = \frac{1}{2}\left(q_n + \frac{2}{q_n}\right).
\end{align*}
This is the Babylonian algorithm for $\sqrt{2}$. Each $q_n$ is rational (since $\mathbb{Q}$ is closed under the arithmetic operations involved), and the sequence is Cauchy in $(\mathbb{Q}, |\cdot|)$ because it converges to $\sqrt{2}$ in $\mathbb{R}$ and convergent sequences are Cauchy. However, $\sqrt{2} \notin \mathbb{Q}$, so the sequence has no limit in $\mathbb{Q}$. The "gap" at $\sqrt{2}$ is precisely the kind of deficiency that completeness rules out.
[/example]
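The Babylonian recursion can be run in exact rational arithmetic, which makes both claims visible: every term is a rational number, and no term squares to $2$. This is a minimal sketch using Python's standard `fractions` module; the function name `babylonian_sqrt2` is an illustrative assumption.

```python
from fractions import Fraction

def babylonian_sqrt2(steps):
    """Return [q_1, ..., q_steps] with q_1 = 1 and q_{n+1} = (q_n + 2/q_n) / 2."""
    q = Fraction(1)
    terms = [q]
    for _ in range(steps - 1):
        q = (q + 2 / q) / 2  # stays in Q: only +, / of rationals
        terms.append(q)
    return terms

terms = babylonian_sqrt2(6)
for n, q in enumerate(terms, start=1):
    # q is an exact rational; q*q - 2 is never zero but shrinks rapidly.
    print(n, q, float(q * q - 2))
```

The first four terms are $1$, $3/2$, $17/12$, $577/408$, and the gaps between consecutive terms ($1/2$, $1/12$, $1/408$) shrink quickly, consistent with the Cauchy property.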
[example:Function Space Completeness]
Let $C[0,1]$ denote the set of continuous functions $f: [0,1] \to \mathbb{R}$, equipped with the supremum metric $d_\infty(f, g) = \sup_{t \in [0,1]} |f(t) - g(t)|$. The space $(C[0,1], d_\infty)$ is complete.
To verify this, let $(f_n)$ be a Cauchy sequence in $(C[0,1], d_\infty)$. For each fixed $t \in [0,1]$, the inequality $|f_m(t) - f_n(t)| \leq d_\infty(f_m, f_n)$ shows that the real-valued sequence $(f_n(t))_{n \in \mathbb{N}}$ is Cauchy, hence convergent in $\mathbb{R}$. Define $f(t) = \lim_{n \to \infty} f_n(t)$. The Cauchy condition in $d_\infty$ gives uniform convergence $f_n \to f$, and since the [uniform limit of continuous functions is continuous](/page/Uniform%20Convergence), $f \in C[0,1]$. Therefore $(C[0,1], d_\infty)$ is complete --- it is in fact a [Banach space](/page/Banach%20Space) under the supremum norm.
This completeness fails for different metrics on the same underlying set. Under the $L^1$ metric $d_1(f, g) = \int_0^1 |f(t) - g(t)| \, dt$, the space $C[0,1]$ is not complete: one can construct a Cauchy sequence of continuous functions whose $L^1$-limit is a discontinuous function (such as a step function), which lies outside $C[0,1]$.
[/example]
## Compactness
Compactness is the metric space analogue of the closed-and-bounded condition on intervals in $\mathbb{R}$. In general topology, compactness is defined via open covers (every open cover has a finite subcover), but in metric spaces, a more intuitive sequential characterisation is available and equivalent.
A metric space $(X, d)$ is **sequentially compact** if every sequence in $X$ has a subsequence that converges to a point of $X$. A subset $K \subseteq X$ is sequentially compact if the subspace $(K, d|_{K \times K})$ is sequentially compact. In metric spaces, sequential compactness is equivalent to the open-cover definition of compactness, so we use the two terms interchangeably.
Compactness is the strongest of the "finiteness" conditions in topology. Every compact metric space is complete (a Cauchy sequence in a compact space has a convergent subsequence, and a Cauchy sequence with a convergent subsequence converges). Every compact metric space is also totally bounded (for every $\varepsilon > 0$, finitely many open balls of radius $\varepsilon$ cover the space). The converse also holds: a metric space is compact if and only if it is complete and totally bounded.
In $\mathbb{R}^n$, compactness admits a concrete characterisation.
[quotetheorem:271]
The Heine-Borel theorem is specific to finite-dimensional Euclidean space. Its proof in the forward direction uses the fact that $\mathbb{R}^n$ is Hausdorff (so compact sets are closed) and that continuous real-valued functions on compact sets are bounded (applied to the norm function $x \mapsto \|x\|$). The reverse direction embeds the closed bounded set $K$ into a product of closed intervals $[-M, M]^n$, which is compact by Tychonoff's theorem (or, in this finite-dimensional case, by iterated application of the Bolzano-Weierstrass theorem), and uses the fact that closed subsets of compact spaces are compact.
The theorem fails in infinite-dimensional spaces. The closed unit ball $\overline{B}_1(0) = \{f \in C[0,1] : \|f\|_\infty \leq 1\}$ in $(C[0,1], d_\infty)$ is closed and bounded but not compact: the sequence $f_n(t) = t^n$ has $\|f_n\|_\infty = 1$ for all $n$, but the pointwise limit is discontinuous, so no subsequence converges uniformly. This failure is not accidental --- in any infinite-dimensional normed space, the closed unit ball is never compact (a theorem of Riesz). Compactness in infinite dimensions requires additional hypotheses beyond closedness and boundedness.
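The failure can also be seen quantitatively, without passing through the pointwise limit. As a sketch (the substitution $u = t^n$ is introduced here for illustration), the distance between $f_n$ and $f_{2n}$ is the same for every $n$:
\begin{align*}
d_\infty(f_n, f_{2n}) = \sup_{t \in [0,1]} \left( t^n - t^{2n} \right) = \max_{u \in [0,1]} \left( u - u^2 \right) = \frac{1}{4}.
\end{align*}
So $(f_n)$ is not Cauchy in $d_\infty$. More generally, for fixed $n$ and $m > n$, the supremum of $t^n - t^m$ is attained where $t^{m-n} = n/m$ and equals $(n/m)^{n/(m-n)} (1 - n/m)$, which tends to $1$ as $m \to \infty$. Hence every subsequence contains pairs of terms at distance greater than $1/2$, and no subsequence is Cauchy.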
[quotetheorem:628]
The Bolzano-Weierstrass theorem is the sequential formulation of compactness for bounded subsets of $\mathbb{R}^n$. It states that boundedness alone is sufficient to extract a convergent subsequence, though the limit may lie outside the original set (it will lie in its closure). Combined with closedness, Bolzano-Weierstrass yields sequential compactness and hence compactness, since the two notions agree in metric spaces; this recovers the reverse direction of the Heine-Borel theorem. In $\mathbb{R}$, the proof proceeds by the bisection method: bisect a bounding interval, choose a half containing infinitely many terms, and repeat. The nested [closed sets](/page/Closed%20Set) shrink to a point by the completeness of $\mathbb{R}$, and a subsequence is selected with one term from each set. The extension from $\mathbb{R}$ to $\mathbb{R}^n$ uses a coordinate-by-coordinate extraction: apply the one-dimensional result to the first coordinates to extract a subsequence, then to the second coordinates of that subsequence, and so on through all $n$ coordinates.
### Compactness in function spaces
The question of when a subset of $C(K)$, the space of continuous real-valued functions on a compact metric space $K$ equipped with the supremum metric, is compact has a definitive answer given by the Arzela-Ascoli theorem. The two conditions, uniform boundedness and equicontinuity, are the function-space analogues of "bounded" and "closed" in the Heine-Borel theorem, though the analogy is imperfect.
[quotetheorem:66]
The Arzela-Ascoli theorem is one of the most widely applied results in analysis. Its proof combines a diagonal argument (to extract a pointwise convergent subsequence on a countable dense subset of $K$) with an $\varepsilon/3$ argument (to upgrade pointwise convergence on a dense set to uniform convergence on all of $K$, using equicontinuity and compactness of the domain). The theorem explains why compactness in function spaces is harder to come by than in $\mathbb{R}^n$: uniform boundedness provides a bound on function values, but equicontinuity is an additional constraint on the regularity of the functions, preventing wild oscillations that would defeat the extraction of a uniformly convergent subsequence. In applications to differential equations, the Arzela-Ascoli theorem is the standard tool for extracting convergent subsequences from families of approximate solutions.
## Continuity in Metric Spaces
The $\varepsilon$-$\delta$ definition of continuity extends directly from the real line to arbitrary metric spaces: small perturbations of the input (measured by $d$) produce small perturbations of the output (measured by $d'$).
Let $(X, d)$ and $(Y, d')$ be metric spaces. A function $f: X \to Y$ is **continuous at** $a \in X$ if for every $\varepsilon > 0$, there exists $\delta > 0$ such that
\begin{align*}
d(x, a) < \delta \implies d'(f(x), f(a)) < \varepsilon.
\end{align*}
The function $f$ is **continuous** (on $X$) if it is continuous at every point $a \in X$. This definition is identical in structure to the real-valued case, with the metric $d$ replacing $|x - a|$ and $d'$ replacing $|f(x) - f(a)|$. A detailed treatment appears on the [Continuity (Metric Spaces)](/page/Continuity%20(Metric%20Spaces)) page.
Continuity has an equivalent sequential characterisation in metric spaces: $f$ is continuous at $a$ if and only if $x_n \to a$ implies $f(x_n) \to f(a)$ for every sequence $(x_n)$ in $X$. This equivalence, which relies on the first-countability of metric spaces, is often more convenient for verifying continuity or proving discontinuity (by exhibiting a sequence that converges but whose images do not).
### Uniform continuity
Continuity is a pointwise condition: the $\delta$ may depend on both $\varepsilon$ and the point $a$. Uniform continuity strengthens this by requiring that $\delta$ work simultaneously at every point.
A function $f: (X, d) \to (Y, d')$ is **uniformly continuous** if for every $\varepsilon > 0$, there exists $\delta > 0$ such that for all $x, y \in X$,
\begin{align*}
d(x, y) < \delta \implies d'(f(x), f(y)) < \varepsilon.
\end{align*}
The quantifier structure is the key difference: in continuity, the order is $\forall a \, \forall \varepsilon \, \exists \delta$; in uniform continuity, it is $\forall \varepsilon \, \exists \delta \, \forall x, y$. The $\delta$ in uniform continuity depends only on $\varepsilon$, not on the particular points involved.
The distinction matters. The function $f: (0, \infty) \to \mathbb{R}$ defined by $f(x) = 1/x$ is continuous but not uniformly continuous: near $x = 0$, ever-smaller values of $\delta$ are needed for a given $\varepsilon$, and no single $\delta$ works for all points. Compactness of the domain eliminates this phenomenon.
[quotetheorem:280]
The Heine-Cantor theorem is stated above for closed bounded intervals in $\mathbb{R}$, but the result generalises: if $(X, d)$ is a compact metric space, $(Y, d')$ is any metric space, and $f: X \to Y$ is continuous, then $f$ is uniformly continuous. The proof is by contradiction using sequential compactness. If $f$ is not uniformly continuous, there exist $\varepsilon_0 > 0$ and sequences $(x_n)$, $(y_n)$ with $d(x_n, y_n) < 1/n$ but $d'(f(x_n), f(y_n)) \geq \varepsilon_0$. By compactness, $(x_n)$ has a convergent subsequence $x_{n_k} \to c$. Then $y_{n_k} \to c$ as well (since $d(x_{n_k}, y_{n_k}) < 1/n_k \to 0$), and continuity at $c$ forces $d'(f(x_{n_k}), f(y_{n_k})) \to 0$, contradicting $d'(f(x_{n_k}), f(y_{n_k})) \geq \varepsilon_0$. This argument illustrates the power of compactness: it converts a pointwise property (continuity) into a uniform one.
### Lipschitz continuity
A function $f: (X, d) \to (Y, d')$ is **Lipschitz continuous** with constant $L \geq 0$ if
\begin{align*}
d'(f(x), f(y)) \leq L \cdot d(x, y) \quad \text{for all } x, y \in X.
\end{align*}
Lipschitz continuity implies uniform continuity (take $\delta = \varepsilon / L$ when $L > 0$; when $L = 0$ the function is constant), which in turn implies continuity. Neither converse holds in general. The function $f(x) = \sqrt{x}$ on $[0, 1]$ is uniformly continuous (by Heine-Cantor, since $[0,1]$ is compact) but not Lipschitz: near $x = 0$, the [derivative](/page/Derivative) $f'(x) = 1/(2\sqrt{x})$ is unbounded, so no finite Lipschitz constant $L$ exists. The function $f(x) = x^2$ on $\mathbb{R}$ is continuous but not uniformly continuous: for any $\delta > 0$ and $x \geq 0$, we have $f(x + \delta/2) - f(x) = x\delta + \delta^2/4 \geq x\delta$, which exceeds any prescribed $\varepsilon$ once $x$ is large enough.
Lipschitz continuity plays a central role in the theory of ordinary differential equations (the Picard-Lindelof theorem requires a Lipschitz condition on the right-hand side) and in the Banach [contraction mapping principle](/page/Contraction%20Mapping%20Principle), where a Lipschitz constant strictly less than $1$ guarantees existence and uniqueness of fixed points.
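The failure of the Lipschitz condition for $\sqrt{x}$ on $[0, 1]$ can be observed directly from difference quotients, each of which is a lower bound for any admissible Lipschitz constant. The sketch below compares $\sqrt{x}$ at $x = 4^{-k}$ with its value at $0$; the helper name `difference_quotient` and the choice of sample points are illustrative assumptions.

```python
import math

def difference_quotient(f, x, y):
    """|f(x) - f(y)| / |x - y|: a lower bound for any Lipschitz constant of f."""
    return abs(f(x) - f(y)) / abs(x - y)

# At x = 4**(-k) the quotient against 0 is sqrt(x) / x = 2**k,
# which grows without bound as x approaches 0.
quotients = [difference_quotient(math.sqrt, 4.0 ** -k, 0.0) for k in range(1, 11)]
print(quotients)
```

Since the quotients double at every step, no single constant $L$ can bound them all, even though the function is uniformly continuous on the compact interval.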
## The Landscape of Metric Spaces
The examples presented so far --- Euclidean, supremum, discrete --- are only the beginning. The versatility of the metric space concept lies in its ability to capture wildly different notions of distance on different sets, and sometimes different notions of distance on the same set.
[example:Taxicab Metric]
The **taxicab metric** (or **$\ell^1$ metric**) on $\mathbb{R}^n$ is defined by
\begin{align*}
d_1: \mathbb{R}^n \times \mathbb{R}^n &\to \mathbb{R} \\
(x, y) &\mapsto \sum_{i=1}^{n} |x_i - y_i|.
\end{align*}
In $\mathbb{R}^2$, this measures distance "along the grid lines" --- the shortest path if one can only travel horizontally and vertically, as in a city laid out on a rectangular grid. The open balls in $d_1$ are diamonds (squares rotated by $45^\circ$), in contrast to the circular balls of $d_2$. Despite the different geometry of balls, the taxicab and Euclidean metrics induce the same topology on $\mathbb{R}^n$: a set is open in one if and only if it is open in the other. This follows from the inequalities
\begin{align*}
d_2(x, y) \leq d_1(x, y) \leq \sqrt{n} \, d_2(x, y),
\end{align*}
which show that every $d_1$-ball contains a $d_2$-ball of appropriate radius and vice versa. Any two metrics $d$ and $d'$ on a set $X$ are called **topologically equivalent** if they induce the same topology, and **strongly equivalent** if there exist constants $c, C > 0$ with $c \, d(x,y) \leq d'(x,y) \leq C \, d(x,y)$ for all $x, y$. Strong equivalence implies topological equivalence, but not conversely.
[/example]
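The two-sided bound $d_2 \leq d_1 \leq \sqrt{n} \, d_2$ can be spot-checked on random points. This is a numerical sketch only; the function name `check_equivalence`, the sampling range, the seed, and the tolerance are assumptions made for illustration.

```python
import math
import random

def d1(x, y):
    """Taxicab metric on sequences of equal length."""
    return sum(abs(a - b) for a, b in zip(x, y))

def d2(x, y):
    """Euclidean metric on sequences of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def check_equivalence(n, trials=1000, seed=0, tol=1e-9):
    """Test d2 <= d1 <= sqrt(n) * d2 on random pairs of points in R^n."""
    rng = random.Random(seed)
    sqrt_n = math.sqrt(n)
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(n)]
        y = [rng.uniform(-10, 10) for _ in range(n)]
        a, b = d1(x, y), d2(x, y)
        if not (b <= a + tol and a <= sqrt_n * b + tol):
            return False
    return True

print(all(check_equivalence(n) for n in (1, 2, 3, 10)))  # True

# Both constants are attained, shown here for n = 4:
zero, ones, e1 = [0.0] * 4, [1.0] * 4, [1.0, 0.0, 0.0, 0.0]
print(d1(zero, ones), math.sqrt(4) * d2(zero, ones))  # 4.0 4.0
print(d1(zero, e1), d2(zero, e1))                     # 1.0 1.0
```

The last two lines indicate that neither constant can be improved: the diagonal direction saturates the upper bound and a coordinate direction saturates the lower one.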
[example:P Adic Metric]
Fix a prime $p$. Every non-zero rational number $x \in \mathbb{Q} \setminus \{0\}$ can be written as $x = p^v \cdot a/b$, where $v \in \mathbb{Z}$ and $a, b$ are integers not divisible by $p$. The integer $v$ is called the **$p$-adic valuation** of $x$, written $v_p(x) = v$. By convention, $v_p(0) = +\infty$. The **$p$-adic metric** on $\mathbb{Q}$ is defined by
\begin{align*}
d_p: \mathbb{Q} \times \mathbb{Q} &\to \mathbb{R} \\
(x, y) &\mapsto p^{-v_p(x - y)},
\end{align*}
with the convention $d_p(x, x) = p^{-\infty} = 0$. This metric satisfies a stronger form of the triangle inequality, the **ultrametric inequality**:
\begin{align*}
d_p(x, z) \leq \max(d_p(x, y), d_p(y, z)),
\end{align*}
which follows from the property $v_p(a + b) \geq \min(v_p(a), v_p(b))$. In the $p$-adic metric, numbers are "close" when their difference is divisible by a high power of $p$. For example, with $p = 7$, the integers $1$ and $1 + 7^{10}$ are very close ($d_7 = 7^{-10}$), even though they differ by $7^{10} = 282{,}475{,}249$, over $282$ million, in the Euclidean sense. The $p$-adic metric is not topologically equivalent to the Euclidean metric on $\mathbb{Q}$: the sequence $a_n = p^n$ converges to $0$ in $d_p$ (since $d_p(p^n, 0) = p^{-n} \to 0$) but diverges in the Euclidean metric. The completion of $(\mathbb{Q}, d_p)$ is the field of $p$-adic numbers $\mathbb{Q}_p$, a fundamental object in number theory.
[/example]
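The valuation and the metric are straightforward to compute in exact arithmetic. The following sketch uses Python's `fractions` module; the function names `vp` and `dp` and the sample of rationals are illustrative assumptions, and the ultrametric inequality is only spot-checked on that sample, not proved.

```python
import itertools
from fractions import Fraction

def vp(x, p):
    """p-adic valuation of a non-zero rational x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("v_p(0) is +infinity by convention")
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:  # powers of p in the numerator raise the valuation
        num //= p
        v += 1
    while den % p == 0:  # powers of p in the denominator lower it
        den //= p
        v -= 1
    return v

def dp(x, y, p):
    """p-adic distance p**(-v_p(x - y)), with d_p(x, x) = 0."""
    diff = Fraction(x) - Fraction(y)
    if diff == 0:
        return Fraction(0)
    return Fraction(p) ** (-vp(diff, p))

print(7 ** 10)                  # 282475249
print(dp(1, 1 + 7 ** 10, 7))    # 1/282475249
print(dp(Fraction(1, 7), 0, 7)) # 7

# Spot-check the ultrametric inequality on a small sample of rationals.
sample = [0, 1, 7, 50, Fraction(1, 7), Fraction(3, 49), 1 + 7 ** 10]
print(all(dp(x, z, 7) <= max(dp(x, y, 7), dp(y, z, 7))
          for x, y, z in itertools.product(sample, repeat=3)))  # True
```

Note that $1/7$ is at $7$-adic distance $7$ from $0$: dividing by $p$ moves a number further away, the reverse of the Euclidean picture.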
[example:Hausdorff Metric]
Let $(X, d)$ be a metric space and let $\mathcal{K}(X)$ denote the collection of non-empty compact subsets of $X$. For $A \in \mathcal{K}(X)$ and $x \in X$, define the distance from $x$ to $A$ by $d(x, A) = \inf_{a \in A} d(x, a)$. The **Hausdorff metric** on $\mathcal{K}(X)$ is
\begin{align*}
d_H: \mathcal{K}(X) \times \mathcal{K}(X) &\to \mathbb{R} \\
(A, B) &\mapsto \max\!\left(\sup_{a \in A} d(a, B), \; \sup_{b \in B} d(b, A)\right).
\end{align*}
Informally, $d_H(A, B) < \varepsilon$ means that every point of $A$ is within distance $\varepsilon$ of some point of $B$, and every point of $B$ is within distance $\varepsilon$ of some point of $A$. The Hausdorff metric turns the space of compact subsets into a metric space, and if $(X, d)$ is complete, then so is $(\mathcal{K}(X), d_H)$. This metric is widely used in fractal geometry (where one studies [limits](/page/Limit) of iterated function systems in the Hausdorff metric) and in computational geometry (where it measures the similarity of shapes).
[/example]
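For finite sets, which are compact in any metric space, the suprema and infima in the definition become maxima and minima, so $d_H$ can be computed by direct enumeration. This is a minimal sketch under that finiteness assumption; the function names `dist_to_set` and `hausdorff` and the sets `A`, `B` are illustrative.

```python
def dist_to_set(x, A, d):
    """d(x, A) = min over a in A of d(x, a), for a finite non-empty set A."""
    return min(d(x, a) for a in A)

def hausdorff(A, B, d):
    """Hausdorff distance between finite non-empty sets A and B."""
    a_to_b = max(dist_to_set(a, B, d) for a in A)  # sup over A of d(a, B)
    b_to_a = max(dist_to_set(b, A, d) for b in B)  # sup over B of d(b, A)
    return max(a_to_b, b_to_a)

def d(x, y):
    """Standard metric on the real line."""
    return abs(x - y)

A = [0, 1]
B = [0, 3]
print(max(dist_to_set(a, B, d) for a in A))  # 1: every point of A is within 1 of B
print(max(dist_to_set(b, A, d) for b in B))  # 2: the point 3 is at distance 2 from A
print(hausdorff(A, B, d))                    # 2
```

The two one-sided quantities differ here ($1$ versus $2$), which is why the definition takes their maximum: without it, the resulting function would not be symmetric.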
### When metrics disagree
Two metrics on the same set can induce different topologies, different notions of convergence, and different notions of completeness. On $C[0,1]$, the supremum metric $d_\infty$ and the $L^1$ metric $d_1(f, g) = \int_0^1 |f(t) - g(t)| \, dt$ differ in all three respects. A sequence that converges in $d_\infty$ (uniformly) also converges in $d_1$, since $d_1(f_n, f) \leq d_\infty(f_n, f)$, but the converse fails: the sequence $f_n(t) = nt \cdot \mathbb{1}_{[0, 1/n]}(t) + (2 - nt) \cdot \mathbb{1}_{(1/n, 2/n]}(t)$, defined for $n \geq 2$ so that the support lies in $[0,1]$, is a triangle of height $1$ and base $2/n$; it satisfies $d_1(f_n, 0) = 1/n \to 0$ but $d_\infty(f_n, 0) = 1$ for all such $n$. The space $(C[0,1], d_\infty)$ is complete (as shown above), while $(C[0,1], d_1)$ is not complete (its completion is the Lebesgue space $L^1[0,1]$, which contains equivalence classes of [integrable](/page/Integral) functions that need not be continuous).
The choice of metric is therefore not merely a technical convenience --- it determines the analytic properties of the space and must be guided by the intended application. In approximation theory and differential equations, the supremum metric is natural because it controls pointwise errors. In probability and statistics, $L^p$ metrics and the Wasserstein metric are preferred because they respect the measure-theoretic structure of the problem.
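The disagreement between $d_\infty$ and $d_1$ on the triangle functions $f_n$ can be computed exactly. Because each $f_n$ is piecewise linear, the trapezoid rule on a grid containing the breakpoints $1/n$ and $2/n$ gives the integral with no error, and the maximum is attained at a grid node. The sketch below relies on that observation; the names `triangle` and `sup_and_l1` and the `refine` parameter are illustrative assumptions.

```python
from fractions import Fraction

def triangle(n):
    """f_n: rises from 0 to 1 on [0, 1/n], falls back to 0 on [1/n, 2/n], then stays 0."""
    def f(t):
        if t <= Fraction(1, n):
            return n * t
        if t <= Fraction(2, n):
            return 2 - n * t
        return Fraction(0)
    return f

def sup_and_l1(f, n, refine=4):
    """Exact (d_inf(f, 0), d_1(f, 0)) for a non-negative piecewise-linear f
    on [0, 1] whose breakpoints are multiples of 1/n (requires n >= 2 here)."""
    m = n * refine  # grid step 1/m, so 1/n and 2/n are grid nodes
    vals = [f(Fraction(j, m)) for j in range(m + 1)]
    sup = max(vals)
    # The trapezoid rule is exact on each subinterval where f is linear.
    l1 = sum((vals[j] + vals[j + 1]) / 2 * Fraction(1, m) for j in range(m))
    return sup, l1

for n in (2, 5, 50):
    sup, l1 = sup_and_l1(triangle(n), n)
    print(n, sup, l1)  # sup stays 1 while the L^1 distance is 1/n
```

The supremum distance to the zero function never drops below $1$, while the $L^1$ distance shrinks like $1/n$, so the sequence converges to $0$ in one metric and not in the other.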
## Problems
[problem]
Let $(X, d)$ be a metric space. Prove that for any $x, y, z, w \in X$,
\begin{align*}
|d(x, y) - d(z, w)| \leq d(x, z) + d(y, w).
\end{align*}
*Difficulty: 1*
[/problem]
[solution]
By the triangle inequality applied twice:
\begin{align*}
d(x, y) \leq d(x, z) + d(z, w) + d(w, y),
\end{align*}
so $d(x, y) - d(z, w) \leq d(x, z) + d(y, w)$ (using symmetry $d(w, y) = d(y, w)$). Swapping the pairs $(x, y)$ and $(z, w)$:
\begin{align*}
d(z, w) - d(x, y) \leq d(z, x) + d(w, y) = d(x, z) + d(y, w).
\end{align*}
Since both $d(x, y) - d(z, w)$ and its negative are bounded above by $d(x, z) + d(y, w)$, we conclude $|d(x, y) - d(z, w)| \leq d(x, z) + d(y, w)$.
[/solution]
[problem]
Let $(X, d)$ be a metric space. Define $\tilde{d}: X \times X \to \mathbb{R}$ by $\tilde{d}(x, y) = \min(d(x, y), 1)$. Prove that $\tilde{d}$ is a metric on $X$ and that $d$ and $\tilde{d}$ induce the same topology.
*Difficulty: 2*
[/problem]
[solution]
**$\tilde{d}$ is a metric.** Positivity and symmetry are inherited from $d$; in particular, $\tilde{d}(x, y) = 0$ if and only if $d(x, y) = 0$, that is, if and only if $x = y$. For the triangle inequality, let $x, y, z \in X$. We consider two cases.
Case 1: at least one of $d(x, y)$, $d(y, z)$ is at least $1$. Then the corresponding truncated distance equals $1$, so
\begin{align*}
\tilde{d}(x, z) \leq 1 \leq \tilde{d}(x, y) + \tilde{d}(y, z).
\end{align*}
Case 2: both $d(x, y) < 1$ and $d(y, z) < 1$. Then $\tilde{d}(x, y) = d(x, y)$ and $\tilde{d}(y, z) = d(y, z)$, so
\begin{align*}
\tilde{d}(x, z) \leq d(x, z) \leq d(x, y) + d(y, z) = \tilde{d}(x, y) + \tilde{d}(y, z).
\end{align*}
**Same topology.** For $r < 1$, the open ball $B_r^{\tilde{d}}(x) = \{y : \tilde{d}(x, y) < r\} = \{y : d(x, y) < r\} = B_r^d(x)$. So for every $d$-open set $U$ and $x \in U$, choose $r < 1$ with $B_r^d(x) \subseteq U$; then $B_r^{\tilde{d}}(x) = B_r^d(x) \subseteq U$, so $U$ is $\tilde{d}$-open. Conversely, every $\tilde{d}$-open set is $d$-open by the same argument.
[/solution]
[problem]
Let $(X, d)$ be a compact metric space and let $f: X \to X$ be a function satisfying $d(f(x), f(y)) < d(x, y)$ for all $x \neq y$ (a **strict contraction**). Prove that $f$ has a unique fixed point.
*Difficulty: 3*
[/problem]
[solution]
**Existence.** Define $\varphi: X \to \mathbb{R}$ by $\varphi(x) = d(x, f(x))$. The function $\varphi$ is continuous: by the four-point inequality $|d(x, y) - d(z, w)| \leq d(x, z) + d(y, w)$ (the first problem above), applied to the pairs $(x, f(x))$ and $(y, f(y))$, we get $|\varphi(x) - \varphi(y)| = |d(x, f(x)) - d(y, f(y))| \leq d(x, y) + d(f(x), f(y)) \leq 2d(x, y)$, where the last step uses $d(f(x), f(y)) \leq d(x, y)$. Since $X$ is compact and $\varphi$ is continuous, $\varphi$ attains its infimum at some point $x_0 \in X$. Suppose $\varphi(x_0) > 0$, so $x_0 \neq f(x_0)$. Then
\begin{align*}
\varphi(f(x_0)) = d(f(x_0), f(f(x_0))) < d(x_0, f(x_0)) = \varphi(x_0),
\end{align*}
where the strict inequality uses the hypothesis with $x = x_0$ and $y = f(x_0) \neq x_0$. This contradicts the minimality of $\varphi(x_0)$. Therefore $\varphi(x_0) = 0$, meaning $f(x_0) = x_0$.
**Uniqueness.** If $f(x_0) = x_0$ and $f(x_1) = x_1$ with $x_0 \neq x_1$, then $d(x_0, x_1) = d(f(x_0), f(x_1)) < d(x_0, x_1)$, a contradiction.
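Note that compactness is essential. The function $f: (0, \infty) \to (0, \infty)$ defined by $f(x) = x/2$ satisfies the strict contraction condition but has no fixed point: the only real solution of $x/2 = x$ is $x = 0$, which does not belong to $(0, \infty)$. Completeness alone does not suffice either: on the complete space $[1, \infty)$, the function $f(x) = x + 1/x$ satisfies $|f(x) - f(y)| = |x - y| \left(1 - \frac{1}{xy}\right) < |x - y|$ for $x \neq y$, yet $f(x) > x$ for every $x$, so $f$ has no fixed point.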
[/solution]
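The existence argument can be illustrated on a concrete example. The sketch below assumes the compact space $X = [0, 1]$ with $d(x, y) = |x - y|$ and the map $f(x) = x - x^2/2$, which is chosen here purely for illustration: it satisfies $|f(x) - f(y)| = |x - y| \cdot |1 - (x + y)/2| < |x - y|$ for $x \neq y$, but its Lipschitz constant is not bounded away from $1$. The grid search is a numerical stand-in for "attains its infimum", not a proof.

```python
def f(x):
    """Illustrative strict contraction on [0, 1]; maps [0, 1] into [0, 1/2]."""
    return x - 0.5 * x * x

def phi(x):
    """phi(x) = d(x, f(x)), the displacement of x under f."""
    return abs(x - f(x))

grid = [i / 1000 for i in range(1001)]

# The minimiser of phi over the grid plays the role of x_0 in the proof.
x0 = min(grid, key=phi)

# Key step of the proof: wherever x is not fixed, phi(f(x)) < phi(x),
# so a point with phi > 0 can never be a minimiser.
strictly_smaller = all(phi(f(x)) < phi(x) for x in grid if f(x) != x)

# Sanity check of the hypothesis d(f(x), f(y)) < d(x, y) on a coarser grid.
coarse = grid[::10]
contractive = all(
    abs(f(x) - f(y)) < abs(x - y) for x in coarse for y in coarse if x != y
)

print(x0, phi(x0), strictly_smaller, contractive)
```

Here the minimiser found on the grid is the fixed point $x_0 = 0$, in agreement with the conclusion $\varphi(x_0) = 0$.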
## References
- W. Rudin, *Principles of Mathematical Analysis*, 3rd edition, McGraw-Hill, 1976.
- S. Shirali and H. L. Vasudeva, *Metric Spaces*, Springer, 2006.
- G. B. Folland, *Real Analysis: Modern Techniques and Their Applications*, 2nd edition, Wiley, 1999.
- J. R. Munkres, *Topology*, 2nd edition, Prentice Hall, 2000.