[motivation]
### Beyond Euclidean distance
In a first analysis course, every distance is $|x - y|$ on $\mathbb{R}$ or $\|x - y\|$ on $\mathbb{R}^n$. This is sufficient for studying [sequences](/page/Sequence) of numbers and functions of a real variable, but it becomes limiting as soon as the objects of study are themselves more complex. Consider the space of all continuous functions $f: [0,1] \to \mathbb{R}$. Two functions $f$ and $g$ might agree everywhere except on a tiny interval near $t = 1/2$, where $f$ spikes sharply above $g$. Are they "close"? That depends on how we measure: the supremum distance $\sup_{t \in [0,1]} |f(t) - g(t)|$ says they are far apart (the spike is tall), while the $L^1$ distance $\int_0^1 |f(t) - g(t)| \, dt$ says they are close (the spike is narrow, so the area is small). Different notions of distance on the same underlying set can lead to different convergent sequences, different continuous functions, and different compact subsets. A framework that accommodates all such notions simultaneously is not a luxury but a necessity.
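To make the comparison concrete, take $g = 0$ and let $f_\delta$ be a triangular spike of height $1$ and half-width $\delta$ centred at $t = 1/2$, with $0 < \delta \leq 1/2$. This particular family is an assumption chosen only for illustration; any tall, narrow bump behaves the same way.

$$
f_\delta(t) = \max\!\left(0,\; 1 - \frac{|t - 1/2|}{\delta}\right), \qquad
\sup_{t \in [0,1]} |f_\delta(t) - g(t)| = 1, \qquad
\int_0^1 |f_\delta(t) - g(t)| \, dt = \tfrac{1}{2} \cdot 2\delta \cdot 1 = \delta.
$$

As $\delta \to 0$, the $L^1$ distance tends to $0$ while the supremum distance stays at $1$, so $f_\delta \to g$ with respect to one distance but not the other.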
### What makes a "distance" useful?
Not every assignment of non-negative numbers to pairs of points deserves to be called a distance. Three requirements emerge from examining what makes $|x - y|$ so effective on $\mathbb{R}$.
First, **non-negativity and identity of indiscernibles**: $d(x, y) \geq 0$ always, and $d(x, y) = 0$ if and only if $x = y$. Without this, distinct points could be "distance zero apart," collapsing the ability to distinguish them. Relaxing this condition leads to pseudometrics, which arise naturally in quotient constructions but sacrifice the power to separate points.
Second, **symmetry**: $d(x, y) = d(y, x)$. The distance from London to Paris is the same as from Paris to London. Dropping symmetry leads to quasimetrics, which model situations like travel times on one-way streets, but which require separate theories for "forward convergence" and "backward convergence."
Third, the **triangle inequality**: $d(x, z) \leq d(x, y) + d(y, z)$. This is the axiom with the deepest consequences. It prevents "shortcuts" --- going from $x$ to $z$ via $y$ can never be cheaper than going directly. Every proof in metric space theory that involves approximation --- and that is nearly all of them --- relies on the triangle inequality to control accumulated errors. The $\varepsilon/3$ argument, the passage from pointwise to uniform convergence, the proof that Cauchy sequences are bounded: all depend on chaining triangle inequality estimates.
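One standard instance of such chaining is the $\varepsilon/3$ estimate, sketched here for real-valued functions on a metric space $(X, d)$; the setup and the names $f_n$, $f$, and $\delta$ are assumptions made for illustration. Suppose continuous functions $f_n$ converge uniformly to $f$. Fix $x \in X$ and $\varepsilon > 0$, pick $n$ with $\sup_{t \in X} |f_n(t) - f(t)| < \varepsilon/3$, then pick $\delta > 0$ so that $d(x, y) < \delta$ implies $|f_n(x) - f_n(y)| < \varepsilon/3$. For every such $y$, two applications of the triangle inequality give

$$
|f(x) - f(y)| \leq |f(x) - f_n(x)| + |f_n(x) - f_n(y)| + |f_n(y) - f(y)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon,
$$

so $f$ is continuous at $x$. Each of the three terms is controlled separately, and the triangle inequality is what allows the three bounds to be added.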
### The payoff
Once a set $X$ is equipped with a function $d$ satisfying these three axioms, much of the toolkit of analysis becomes available. Open balls define a topology; convergence is characterised by $d(x_n, x) \to 0$; continuity means $\varepsilon$-$\delta$ with $d$ in place of absolute value; Cauchy sequences and completeness generalise directly. The remarkable fact is that these constructions do not depend on the specific formula for $d$, only on the three axioms. A single proof of the Heine-Cantor theorem (a continuous map on a compact metric space is uniformly continuous), for instance, works simultaneously for real-valued functions on a closed bounded interval, for continuous maps between compact surfaces, and for continuous maps defined on compact subsets of Banach spaces. The metric space framework is the engine that makes this kind of unification possible.
[/motivation]