A ruler handles intervals, but probability and analysis quickly force us to measure sets made by limiting operations. The event that a random experiment succeeds infinitely often, the domain on which a sequence of functions converges, and a curve in the plane that has zero area but positive length all push past finite additivity. The problem is to assign size in a way that survives countable unions, countable intersections, and limiting approximation.
The naive wish is to measure every subset and keep all familiar symmetries of length. That wish is too strong in general, so measure theory takes the workable route: first choose a stable class of measurable sets, then define a countably additive size only on that class. This page is a parent-level map of that framework. Its main goal is to explain the object called a measure and the structural consequences of countable additivity; later sections on integration, product measures, Radon measures, [Hausdorff measure](/page/Hausdorff%20Measure), probability, and dynamics are included only to show how the same axioms reappear in major downstream theories. They record the minimal definitions and orientation theorems needed for the parent concept, while construction details, decomposition theory, and applications belong on child pages.
[example: Approximating an Interval from Inside]
Let $E=[0,1]$ and set $A_n=(0,1-1/n)$ for $n\in\mathbb{N}$. Then $A_n\subset A_{n+1}$ and
\begin{align*}
\bigcup_{n=1}^{\infty}A_n=(0,1).
\end{align*}
The expected lengths $1-1/n$ of the sets $A_n$ increase to the expected length of the union $(0,1)$:
\begin{align*}
\lim_{n\to\infty}\left(1-\frac{1}{n}\right)=1.
\end{align*}
A useful theory of size must force this kind of limiting compatibility, not only finite additivity for finitely many pieces.
[/example]
The example points to the first design requirement. We need a class of sets that is closed under the operations used to form limiting events and exceptional sets. This motivates the measurable structure on the underlying set.
## Definition
Before assigning sizes, we must decide which sets are visible to the theory. That measurable structure is not extra decoration; it is the stage on which every measure lives.
[definition: Measure]
Let $E$ be a set and let $\mathcal{E}$ be a sigma-algebra on $E$. A measure on $(E,\mathcal{E})$ is a function
\begin{align*}
\mu:\mathcal{E}\to[0,\infty]
\end{align*}
such that $\mu(\varnothing)=0$ and, for every sequence $(A_n)_{n=1}^{\infty}$ of pairwise disjoint sets in $\mathcal{E}$,
\begin{align*}
\mu\left(\bigcup_{n=1}^{\infty}A_n\right)=\sum_{n=1}^{\infty}\mu(A_n).
\end{align*}
[/definition]
The elements of $\mathcal{E}$ are called measurable sets. This definition gives the page its central object: a countably additive, nonnegative size assignment on a chosen collection of measurable sets. The rest of the opening section now fills in the ambient language that the definition used: what it means for $\mathcal{E}$ to be a sigma-algebra, and how the set, sigma-algebra, and measure are packaged together.
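As a brief sketch of what the two axioms already force, finite additivity and monotonicity follow by padding a finite list with empty sets. For disjoint $A,B\in\mathcal{E}$, apply countable additivity to the sequence $A,B,\varnothing,\varnothing,\ldots$:
\begin{align*}
\mu(A\cup B)=\mu(A)+\mu(B)+\sum_{n=3}^{\infty}\mu(\varnothing)=\mu(A)+\mu(B).
\end{align*}
If $A,B\in\mathcal{E}$ and $A\subset B$, then $B$ is the disjoint union of $A$ and $B\setminus A$, so
\begin{align*}
\mu(B)=\mu(A)+\mu(B\setminus A)\ge\mu(A).
\end{align*}
The second computation assumes that $B\setminus A$ is measurable, which follows from the closure properties of a sigma-algebra.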
## Measurable Structure
A measure cannot be placed on a collection of sets that collapses under the operations used in limiting arguments. If a set is measurable, its complement must also be measurable so that exceptional sets can be removed; if countably many measurable pieces appear, their union must remain inside the theory. This stability requirement is the role of a measurable space.
[definition: Measurable Space]
A measurable space is a pair $(E,\mathcal{E})$ where $E$ is a set and $\mathcal{E}\subset\mathcal{P}(E)$ is a sigma-algebra on $E$, that is, a family of subsets satisfying the following three conditions:
\begin{align*}
\varnothing\in\mathcal{E}.
\end{align*}
\begin{align*}
A\in\mathcal{E}\implies E\setminus A\in\mathcal{E}.
\end{align*}
\begin{align*}
A_1,A_2,\ldots\in\mathcal{E}\implies\bigcup_{n=1}^{\infty}A_n\in\mathcal{E}.
\end{align*}
[/definition]
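A short derivation, using only the three conditions above, suggests why countable intersections and set differences also stay measurable. For $A_1,A_2,\ldots\in\mathcal{E}$, De Morgan's law gives
\begin{align*}
\bigcap_{n=1}^{\infty}A_n=E\setminus\bigcup_{n=1}^{\infty}(E\setminus A_n)\in\mathcal{E},
\end{align*}
and for $A,B\in\mathcal{E}$,
\begin{align*}
B\setminus A=B\cap(E\setminus A)\in\mathcal{E}.
\end{align*}
Finite unions and intersections are the special cases obtained by padding a finite list with copies of $\varnothing$ or of $E=E\setminus\varnothing$.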
The definition deliberately separates two choices: which sets can be measured, and how much size each measurable set receives. The first choice is encoded by the sigma-algebra; the second is encoded by $\mu$. A measure is usually used together with the measurable space on which it is defined, so the next definition packages the set, the measurable sets, and the size assignment into the ambient object for the rest of the page.
[definition: Measure Space]
A [measure space](/page/Measure%20Space) is a triple $(E,\mathcal{E},\mu)$ where $(E,\mathcal{E})$ is a measurable space and $\mu$ is a measure on $(E,\mathcal{E})$.
[/definition]
Once the ambient triple is fixed, we can ask how large the whole universe is. The next distinctions are not cosmetic: finite total mass powers compactness-style estimates, probability fixes the total mass to one, and sigma-finiteness lets many infinite spaces be treated as countable unions of finite pieces.
## Finiteness and Normalisation
Countable additivity by itself permits spaces whose total size is infinite, and estimates that sum errors over the whole space can then become unusable. The useful bounded-mass case is the one where the measure of the entire universe is finite.
[definition: Finite Measure]
A measure $\mu$ on $(E,\mathcal{E})$ is finite if
\begin{align*}
\mu(E)<\infty.
\end{align*}
[/definition]
A finite measure still has an arbitrary total mass, so its values are sizes rather than normalized chances.
For probability, the underlying question is not only whether the total mass is bounded, but whether all events can be compared on a single chance scale. Fixing the whole space to have mass one turns measurable sets into events whose sizes are normalized probabilities.
[definition: Probability Measure]
A probability measure on a measurable space $(\Omega,\mathcal{F})$ is a measure $\mathbb{P}:\mathcal{F}\to[0,\infty]$ such that
\begin{align*}
\mathbb{P}(\Omega)=1.
\end{align*}
[/definition]
The normalization forces $0\le \mathbb{P}(A)\le 1$ for every event $A\in\mathcal{F}$, but the object is still first a measure in the extended nonnegative sense of the definition of a measure.
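One way to check the bound is a one-line computation. Since $A$ and $\Omega\setminus A$ are disjoint events with union $\Omega$, additivity gives
\begin{align*}
\mathbb{P}(A)+\mathbb{P}(\Omega\setminus A)=\mathbb{P}(\Omega)=1.
\end{align*}
Both terms are nonnegative, so $\mathbb{P}(A)\le 1$, and the same identity yields the complement rule
\begin{align*}
\mathbb{P}(\Omega\setminus A)=1-\mathbb{P}(A).
\end{align*}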
Many analytic measures have infinite total mass, including [Lebesgue measure](/page/Lebesgue%20Measure) on $\mathbb{R}^n$. To retain the local advantages of finite measure, we ask whether the space can be covered by countably many finite-measure pieces.
[definition: Sigma-Finite Measure]
A measure $\mu$ on $(E,\mathcal{E})$ is $\sigma$-finite if there exist sets $E_1,E_2,\ldots\in\mathcal{E}$ such that
\begin{align*}
E=\bigcup_{n=1}^{\infty}E_n
\end{align*}
and $\mu(E_n)<\infty$ for every $n\in\mathbb{N}$.
[/definition]
Sigma-finiteness is the reason many infinite spaces still behave like countable assemblies of finite ones. The standard examples show the distinction between finite, infinite, and sigma-finite size.
[example: Counting Measure and Lebesgue Measure]
On a set $E$, the counting measure $\#$ on $(E,\mathcal{P}(E))$ is the measure
\begin{align*}
\#:\mathcal{P}(E)&\to[0,\infty]
\end{align*}
defined by assigning to $A\subset E$ its number of elements when $A$ is finite and assigning $\infty$ when $A$ is infinite. It is finite exactly when $E$ is finite, and it is $\sigma$-finite exactly when $E$ is countable.
Lebesgue measure $\mathcal{L}^n$ on $\mathbb{R}^n$ is not finite, since $\mathcal{L}^n(\mathbb{R}^n)=\infty$. It is $\sigma$-finite because
\begin{align*}
\mathbb{R}^n=\bigcup_{m=1}^{\infty}[-m,m]^n
\end{align*}
and every cube $[-m,m]^n$ has finite $\mathcal{L}^n$-measure.
[/example]
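As a concrete instance of the countable case, counting measure on $\mathbb{N}$ is $\sigma$-finite but not finite. One possible exhausting sequence is
\begin{align*}
\mathbb{N}=\bigcup_{m=1}^{\infty}\{1,\ldots,m\},\qquad \#(\{1,\ldots,m\})=m<\infty.
\end{align*}
For an uncountable set such as $E=\mathbb{R}$, no such cover exists: a set of finite counting measure is a finite set, and a countable union of finite sets is countable, so it cannot equal $E$.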
Sigma-finiteness gives the first hint that the axioms are not merely bookkeeping. The next step is to extract the limiting rules that countable additivity forces, since those rules are what let finite approximations control infinite constructions.
## Sigma-Additivity and Limits
### Continuity of Measure
The purpose of countable additivity is not merely to add a countable list of disjoint sets. Its deeper role is to make measure continuous under monotone limiting operations. The key question is when the measure of a limiting set can be recovered as the limit of the measures of the approximating sets: increasing approximation works directly, while decreasing approximation needs a finite-measure starting point to rule out infinite mass escaping to infinity.
[quotetheorem:1082]
The finiteness condition in the decreasing case is not cosmetic. The following example shows exactly how decreasing approximation fails when infinite mass remains present at every finite stage.
[example: Decreasing Sets with Infinite Measure]
Let $E=\mathbb{R}$, let $\mathcal{E}$ be the family of Borel subsets of $\mathbb{R}$, and let $\mu=\mathcal{L}^1$. Define $A_n=(n,\infty)$. Then $A_{n+1}\subset A_n$ and
\begin{align*}
\bigcap_{n=1}^{\infty}A_n=\varnothing.
\end{align*}
However, $\mathcal{L}^1(A_n)=\infty$ for every $n$, while $\mathcal{L}^1(\varnothing)=0$. The conclusion of continuity from above would give the wrong answer without the finite-measure assumption.
[/example]
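For contrast, here is a sketch of a decreasing family on the same measure space where the finite-measure hypothesis does hold. The sets $B_n$ are introduced only for this illustration. Let $B_n=(0,1/n)$. Then $B_{n+1}\subset B_n$, $\mathcal{L}^1(B_1)=1<\infty$, and
\begin{align*}
\bigcap_{n=1}^{\infty}B_n=\varnothing,\qquad \lim_{n\to\infty}\mathcal{L}^1(B_n)=\lim_{n\to\infty}\frac{1}{n}=0=\mathcal{L}^1(\varnothing).
\end{align*}
Here the limit of the measures agrees with the measure of the limiting set, as continuity from above predicts.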
Continuity theorems explain how an already defined measure behaves under limits. A separate practical question remains: when a measure is described on a small family of familiar sets, how much of the full measure has already been determined?
### Generating Classes
In practice, a measure is often specified first on intervals, rectangles, or cylinder sets. We then need a criterion saying that agreement on those test sets determines the whole measure. The right generating families are stable under finite intersections.
[definition: $\pi$-System]
A $\pi$-system on a set $E$ is a nonempty family $\mathcal{P}\subset\mathcal{P}(E)$ such that
\begin{align*}
A,B\in\mathcal{P}\implies A\cap B\in\mathcal{P}.
\end{align*}
[/definition]
A $\pi$-system is small enough to be concrete and large enough to generate a sigma-algebra, but generation alone does not automatically justify comparing measures only on the generator.
The obstruction is that two countably additive measures might agree on the visible test sets while differing after countable unions and complements are taken. A uniqueness principle is needed to say that, under the right finiteness hypothesis, the generator really determines the whole measure.
[quotetheorem:506]
This is the structural reason that a probability distribution on $\mathbb{R}$ is determined by interval probabilities, and that product measures are determined by rectangles.
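To illustrate on $\mathbb{R}$, one standard choice of generator is the family of closed half-lines. The notation $\mathcal{I}$ and $F$ below is introduced only for this sketch:
\begin{align*}
\mathcal{I}=\{(-\infty,x]:x\in\mathbb{R}\}.
\end{align*}
It is a $\pi$-system because
\begin{align*}
(-\infty,x]\cap(-\infty,y]=(-\infty,\min\{x,y\}].
\end{align*}
If $\mu$ is a probability measure on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ and $F(x)=\mu((-\infty,x])$, then for $a<b$,
\begin{align*}
\mu((a,b])=\mu((-\infty,b])-\mu((-\infty,a])=F(b)-F(a).
\end{align*}
Since $\mathcal{I}$ generates the Borel sigma-algebra, two probability measures with the same function $F$ agree on a generating $\pi$-system, which is the situation the uniqueness principle above is designed for.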
## Constructing Measures
### From Set Functions to Measures
The definition of a measure says what the finished object must do, but many important measures are not handed to us on a full sigma-algebra. Length is first known on intervals, probability laws are often specified on cylinder events, and geometric measures start from covering costs. The construction problem is to begin with a smaller rule and extend it without losing countable additivity. The smaller class must at least support finite cutting and recombination, since interval and rectangle formulas are first checked by finite decompositions.
[definition: Algebra of Sets]
Let $E$ be a set. An algebra of subsets of $E$ is a family $\mathcal{A}\subset\mathcal{P}(E)$ such that $\varnothing\in\mathcal{A}$, $E\in\mathcal{A}$, and $\mathcal{A}$ is closed under finite unions and complements relative to $E$.
[/definition]
To turn an algebra into the starting point for a genuine measure, we need a rule that already respects the countable decompositions visible inside that algebra. Suppose we know how to assign lengths to finite unions of intervals, or volumes to finite unions of rectangles. If a countable disjoint union of such test sets happens to remain in the same algebra, the provisional size rule must already agree with the countable sum; otherwise any later extension to a sigma-algebra would be forced to contradict the starting data. A premeasure is the compatibility condition that makes extension possible rather than arbitrary.
[definition: Premeasure]
Let $E$ be a set and let $\mathcal{A}\subset\mathcal{P}(E)$ be an algebra of subsets of $E$. A premeasure on $\mathcal{A}$ is a function
\begin{align*}
\mu_0:\mathcal{A}\to[0,\infty]
\end{align*}
such that $\mu_0(\varnothing)=0$ and, whenever $(A_n)_{n=1}^{\infty}$ is a sequence of pairwise disjoint sets in $\mathcal{A}$ with $\bigcup_{n=1}^{\infty}A_n\in\mathcal{A}$,
\begin{align*}
\mu_0\left(\bigcup_{n=1}^{\infty}A_n\right)=\sum_{n=1}^{\infty}\mu_0(A_n).
\end{align*}
[/definition]
A premeasure is already countably additive wherever the algebra can see the union, but the algebra may not contain the limiting sets needed in analysis. The remaining problem is not just to add those missing sets, but to add them without changing the sizes of the original sets or introducing competing answers on the generated sigma-algebra. The [extension theorem](/theorems/59) is the mechanism that turns a concrete finite-stage rule into a genuine measure.
[quotetheorem:9707]
This theorem is the formal bridge from concrete formulas to measures. Lebesgue measure, for instance, starts from rectangle volume and then extends to the Borel sigma-algebra and its completion.
### Outer Measures and Measurable Sets
There is another construction route that measures every subset at first, but only weakly. Covering arguments naturally produce set functions defined on all subsets; countable additivity is then recovered by selecting the subsets that split every ambient set without interaction.
[definition: Outer Measure]
Let $E$ be a set. An outer measure on $E$ is a function
\begin{align*}
\mu^*:\mathcal{P}(E)\to[0,\infty]
\end{align*}
such that $\mu^*(\varnothing)=0$, $A\subset B\subset E$ implies $\mu^*(A)\le\mu^*(B)$, and for every sequence $(A_n)_{n=1}^{\infty}$ of subsets of $E$,
\begin{align*}
\mu^*\left(\bigcup_{n=1}^{\infty}A_n\right)\le\sum_{n=1}^{\infty}\mu^*(A_n).
\end{align*}
[/definition]
An outer measure has the right monotonicity and [countable subadditivity](/theorems/1108), but subadditivity can be strict when a set cuts across the geometry of another set in a pathological way. This creates an obstruction to treating every subset as measurable: cutting a test set into two pieces may lose information about how its size was assembled.
To turn an outer measure into an actual measure, one must identify the sets for which every cut behaves additively. The criterion should test a candidate set against all possible subsets of the ambient space, because measurability is a compatibility condition with the whole outer-measure structure rather than a property of the candidate's size alone.
Subadditivity always gives
\begin{align*}
\mu^*(B)\le \mu^*(B\cap A)+\mu^*(B\setminus A),
\end{align*}
but the reverse inequality need not hold for an arbitrary cutting set $A$. The measurable sets are exactly those cuts for which no test set $B$ gains extra size when it is separated into the part inside $A$ and the part outside $A$.
[definition: Caratheodory-Measurable Set]
Let $\mu^*:\mathcal{P}(E)\to[0,\infty]$ be an outer measure. A set $A\subset E$ is $\mu^*$-Caratheodory-measurable if
\begin{align*}
\mu^*(B)=\mu^*(B\cap A)+\mu^*(B\setminus A)
\end{align*}
for every $B\subset E$.
[/definition]
The key payoff is that measurability is not merely a name: the Caratheodory-measurable sets form a sigma-algebra, and the outer measure becomes a measure after restriction to that sigma-algebra. Hausdorff measure later in the page uses exactly this pattern.
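A small worked check of the criterion: every set of outer measure zero is Caratheodory-measurable. Suppose $\mu^*(A)=0$ and let $B\subset E$ be any test set. Monotonicity gives $\mu^*(B\cap A)\le\mu^*(A)=0$ and $\mu^*(B\setminus A)\le\mu^*(B)$, so
\begin{align*}
\mu^*(B\cap A)+\mu^*(B\setminus A)\le\mu^*(B).
\end{align*}
The reverse inequality holds by subadditivity, hence
\begin{align*}
\mu^*(B)=\mu^*(B\cap A)+\mu^*(B\setminus A).
\end{align*}
Because subsets of a set of outer measure zero again have outer measure zero, the same argument applies to them as well.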
## Measurable Functions and Integration
The next sections move beyond the bare definition only far enough to show why countable additivity was chosen in the first place. Integration, almost-everywhere reasoning, probability laws, products, and geometric size all depend on predictable limiting behaviour. The detailed construction of each theory belongs on its own page; here we keep only the structural pieces needed to see how measure supports them, so the emphasis remains on the common measure-theoretic mechanism rather than on the full development of each application.
A measure assigns size to sets, while integration assigns size-weighted totals to functions. To connect the two, functions must pull measurable sets back to measurable sets. This is the role of measurability.
[definition: Measurable Function]
Let $(E,\mathcal{E})$ and $(F,\mathcal{F})$ be measurable spaces. A function $f:E\to F$ is measurable if
\begin{align*}
f^{-1}(B)\in\mathcal{E}
\end{align*}
for every $B\in\mathcal{F}$.
[/definition]
Before integrating arbitrary functions, the measure only gives immediate numerical data for measurable sets.
The first obstruction in defining an integral is that a general [measurable function](/page/Measurable%20Function) may take infinitely many values, so its total cannot be read directly from finitely many measured regions. Simple functions remove that difficulty by taking only finitely many nonnegative values on measurable pieces.
[definition: Simple Function]
Let $(E,\mathcal{E})$ be a measurable space. Equip $[0,\infty)$ with its Borel sigma-algebra $\mathcal{B}([0,\infty))$. A nonnegative [simple function](/page/Simple%20Function) is a measurable function
\begin{align*}
s:(E,\mathcal{E})\to([0,\infty),\mathcal{B}([0,\infty)))
\end{align*}
of the form
\begin{align*}
s=\sum_{i=1}^{m}a_i\mathbb{1}_{A_i},
\end{align*}
where $m\in\mathbb{N}$, $a_i\in[0,\infty)$, and $A_i\in\mathcal{E}$ for every $i$.
[/definition]
Once simple functions are available, the remaining issue is to assign them a numerical total in a way that depends only on the function, not on an accidental presentation of its level sets. A single simple function can be written in several ways, so the integral must be tied to the regions on which the function has fixed height rather than to arbitrary algebraic notation.
This is the first place where measure turns a function into a number. The needed construction should encode the geometric idea of area under a step function while remaining stable under refinements of the measurable partition.
The next definition gives this construction for nonnegative simple functions. When a simple function is written on disjoint measurable pieces, each constant height is weighted by the measure of the piece where that height is taken; disjointness prevents double-counting and supplies the canonical starting point for the general integral.
[definition: Integral of a Nonnegative Simple Function]
Let $(E,\mathcal{E},\mu)$ be a measure space, and let $\mathcal{S}_+(E,\mathcal{E})$ be the set of nonnegative simple [measurable functions](/page/Measurable%20Functions)
\begin{align*}
s:(E,\mathcal{E})\to([0,\infty),\mathcal{B}([0,\infty))).
\end{align*}
The integral of nonnegative simple functions with respect to $\mu$ is the map
\begin{align*}
\int_E \cdot\,d\mu:\mathcal{S}_+(E,\mathcal{E})&\to[0,\infty]
\end{align*}
defined as follows: if
\begin{align*}
s=\sum_{i=1}^{m}a_i\mathbb{1}_{A_i}
\end{align*}
with pairwise disjoint $A_i\in\mathcal{E}$ and $a_i\in[0,\infty)$, then
\begin{align*}
\int_E s\,d\mu=\sum_{i=1}^{m}a_i\mu(A_i).
\end{align*}
[/definition]
This value is independent of the chosen disjoint representation of $s$; refining two representations by all pairwise intersections gives a common disjoint representation with the same weighted sum. We use the standard convention $0\cdot\infty=0$, so zero-height pieces contribute nothing even when their base has infinite measure.
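A small numerical illustration, on $(\mathbb{R},\mathcal{B}(\mathbb{R}),\mathcal{L}^1)$ with a step function chosen only for this sketch: let
\begin{align*}
s=2\cdot\mathbb{1}_{[0,1)}+5\cdot\mathbb{1}_{[1,3)}.
\end{align*}
The pieces are disjoint, so
\begin{align*}
\int_{\mathbb{R}}s\,d\mathcal{L}^1=2\cdot 1+5\cdot 2=12.
\end{align*}
Refining the first piece gives another disjoint representation, $s=2\cdot\mathbb{1}_{[0,1/2)}+2\cdot\mathbb{1}_{[1/2,1)}+5\cdot\mathbb{1}_{[1,3)}$, with the same weighted sum:
\begin{align*}
2\cdot\tfrac{1}{2}+2\cdot\tfrac{1}{2}+5\cdot 2=12.
\end{align*}
Adding the zero-height piece $0\cdot\mathbb{1}_{\mathbb{R}\setminus[0,3)}$ changes nothing, since $0\cdot\mathcal{L}^1(\mathbb{R}\setminus[0,3))=0\cdot\infty=0$.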
A general nonnegative measurable function may take infinitely many values, so it cannot usually be decomposed into finitely many weighted measurable pieces. The way to keep the integral compatible with the simple case is to look at all simple functions lying below it and take the largest value their integrals force.
[definition: Integral of a Nonnegative Measurable Function]
Let $(E,\mathcal{E},\mu)$ be a measure space. Equip $[0,\infty]$ with its Borel sigma-algebra $\mathcal{B}([0,\infty])$ for the usual order topology on the extended half-line, and let $\mathcal{M}_+(E,\mathcal{E})$ be the set of measurable functions
\begin{align*}
f:(E,\mathcal{E})\to([0,\infty],\mathcal{B}([0,\infty])).
\end{align*}
The integral of nonnegative measurable functions with respect to $\mu$ is the map
\begin{align*}
\int_E \cdot\,d\mu:\mathcal{M}_+(E,\mathcal{E})&\to[0,\infty]
\end{align*}
defined by
\begin{align*}
\int_E f\,d\mu=\sup\left\{\int_E s\,d\mu:0\le s\le f,\ s\text{ is a nonnegative simple function}\right\}
\end{align*}
for every $f\in\mathcal{M}_+(E,\mathcal{E})$.
[/definition]
The lower-approximation definition would be unstable if increasing pointwise limits could lose mass in the limit. The basic convergence question is therefore whether integrating first and then passing to the limit agrees with taking the increasing limit first.
[quotetheorem:509]
In probability, this is a statement about expectations: nonnegative random variables can be approximated from below by simpler random variables without losing the limiting expectation.
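A standard way to build such approximations, sketched here rather than taken from the theorem statement, is dyadic truncation. For $f\in\mathcal{M}_+(E,\mathcal{E})$ and $n\in\mathbb{N}$, define
\begin{align*}
s_n(x)=\min\left\{n,\ 2^{-n}\lfloor 2^n f(x)\rfloor\right\},
\end{align*}
with the convention that $s_n(x)=n$ when $f(x)=\infty$. Each $s_n$ takes finitely many values in $[0,n]$ on sets of the form $f^{-1}([k2^{-n},(k+1)2^{-n}))$ and $f^{-1}([n,\infty])$, so it is a nonnegative simple function, and $0\le s_1\le s_2\le\cdots\le f$. If $f(x)<\infty$ and $n\ge f(x)$, then
\begin{align*}
f(x)-2^{-n}<s_n(x)\le f(x),
\end{align*}
so $s_n(x)\to f(x)$, while $s_n(x)=n\to\infty$ when $f(x)=\infty$. Assuming the quoted theorem is the increasing-limit statement described above, it would then give $\int_E s_n\,d\mu\to\int_E f\,d\mu$.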
[example: Expectation as an Integral]
Let $(\Omega,\mathcal{F},\mathbb{P})$ be a [probability space](/page/Probability%20Space), and let $X:\Omega\to[0,\infty]$ be a nonnegative [random variable](/page/Random%20Variable). Its expectation is
\begin{align*}
\mathbb{E}[X]=\int_{\Omega}X\,d\mathbb{P}.
\end{align*}
If $X$ is the simple random variable
\begin{align*}
X=\sum_{i=1}^{m}a_i\mathbb{1}_{A_i}
\end{align*}
for pairwise disjoint events $A_i\in\mathcal{F}$, then
\begin{align*}
\mathbb{E}[X]=\sum_{i=1}^{m}a_i\mathbb{P}(A_i).
\end{align*}
The usual finite weighted average is therefore the simple-function case of measure-theoretic integration.
[/example]
Integration makes exceptional sets unavoidable: two functions that differ only where the measure is zero should have the same integral, and probabilistic statements often hold only outside a negligible event. We therefore need a precise language for sets that the measure cannot see.
## Null Sets and Almost Everywhere Reasoning
### Negligible Exceptions
Many mathematical statements are stable under changing a function on a set whose measured size is zero, but that phrase needs an exact set-theoretic object. These negligible measurable sets are the exceptions that integration and probability are designed to ignore.
[definition: Null Set]
Let $(E,\mathcal{E},\mu)$ be a measure space. A set $N\in\mathcal{E}$ is a null set if
\begin{align*}
\mu(N)=0.
\end{align*}
[/definition]
After null sets are available, statements about functions often should not fail just because of values on a negligible exceptional set.
The remaining problem is linguistic as well as mathematical: we need a precise way to say that a property may fail, but only on a set the measure ignores. Almost-everywhere terminology packages that exception into one null measurable set.
[definition: Almost Everywhere]
Let $(E,\mathcal{E},\mu)$ be a measure space. A property $P(x)$ holds $\mu$-almost everywhere if there exists a null set $N\in\mathcal{E}$ such that $P(x)$ holds for every $x\in E\setminus N$.
[/definition]
Almost-everywhere arguments often discard one null exceptional set at each step of a countable construction. This is only legitimate if the union of all discarded exceptions remains null, rather than accumulating positive measure.
[quotetheorem:9708]
Null sets create one remaining measurability problem: a subset of a measurable null set may fail to belong to the original sigma-algebra, even though it should still be negligible. A complete measure space rules out this mismatch by making every subset of a null measurable set measurable and null.
[definition: Complete Measure]
A measure $\mu$ on $(E,\mathcal{E})$ is complete if, whenever $N\in\mathcal{E}$ satisfies $\mu(N)=0$ and $A\subset N$, then $A\in\mathcal{E}$ and
\begin{align*}
\mu(A)=0.
\end{align*}
[/definition]
Completeness matters when functions are altered on null sets. It lets almost-everywhere equivalence interact cleanly with measurability.
[example: Countable Sets in the Real Line]
In $(\mathbb{R},\mathcal{L}_{\mathrm{Leb}}(\mathbb{R}),\mathcal{L}^1)$, where $\mathcal{L}_{\mathrm{Leb}}(\mathbb{R})$ denotes the Lebesgue sigma-algebra, each singleton has Lebesgue measure zero. For $a\in\mathbb{R}$ and $\varepsilon>0$,
\begin{align*}
\{a\}\subset\left(a-\frac{\varepsilon}{2},a+\frac{\varepsilon}{2}\right),
\end{align*}
and the containing interval has length $\varepsilon$. The singleton $\{a\}$ is closed, hence Borel, hence Lebesgue measurable. Since the interval is the disjoint union of $\{a\}$ and the measurable set $\left(a-\frac{\varepsilon}{2},a+\frac{\varepsilon}{2}\right)\setminus\{a\}$, additivity together with the nonnegativity of $\mathcal{L}^1$ gives
\begin{align*}
\mathcal{L}^1(\{a\})\le \varepsilon.
\end{align*}
Since this holds for every $\varepsilon>0$, we get $\mathcal{L}^1(\{a\})=0$. A countable union argument then gives $\mathcal{L}^1(\mathbb{Q})=0$, even though $\mathbb{Q}$ is dense in $\mathbb{R}$.
[/example]
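The countable union step can be written out explicitly. Fix an enumeration $\mathbb{Q}=\{q_1,q_2,\ldots\}$ without repetitions, so that the singletons $\{q_k\}$ are pairwise disjoint. Countable additivity then gives
\begin{align*}
\mathcal{L}^1(\mathbb{Q})=\mathcal{L}^1\left(\bigcup_{k=1}^{\infty}\{q_k\}\right)=\sum_{k=1}^{\infty}\mathcal{L}^1(\{q_k\})=\sum_{k=1}^{\infty}0=0.
\end{align*}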
Null sets are especially powerful when a countable process generates a limiting event. Probability packages this situation in the language of events that occur infinitely often.
### Infinitely Often Events
Almost-everywhere reasoning is central in probability because infinite sequences of events produce limiting behavior that is not captured by any single event. If one wants to ask whether events keep happening forever, a finite union or intersection is not enough: the event must remember what happens arbitrarily far out in the sequence.
[definition: Limsup of Events]
Let $(\Omega,\mathcal{F})$ be a measurable space and let $(A_n)_{n=1}^{\infty}$ be a sequence in $\mathcal{F}$. The limsup event is
\begin{align*}
\limsup_{n\to\infty}A_n=\bigcap_{n=1}^{\infty}\bigcup_{m\ge n}A_m.
\end{align*}
[/definition]
This event is often read as "$A_n$ occurs infinitely often"; in probability notation this is abbreviated as $A_n\text{ i.o.}$. The useful question is when this tail event must have probability zero.
The obstruction is that each individual event can be small while infinitely many opportunities remain. What rules out infinitely many occurrences is not termwise smallness alone, but a summable bound on the total amount of probability available in the tails.
[quotetheorem:507]
The lemma is a standard route from summable estimates to almost sure conclusions. It is one of the simplest places where countable additivity becomes a probabilistic tool.
[example: Rare Events Occur Finitely Often]
Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and let $(A_n)_{n=1}^{\infty}$ be events with $\mathbb{P}(A_n)\le 2^{-n}$. Then
\begin{align*}
\sum_{n=1}^{\infty}\mathbb{P}(A_n)\le\sum_{n=1}^{\infty}2^{-n}=1<\infty.
\end{align*}
By the first Borel-Cantelli lemma,
\begin{align*}
\mathbb{P}(A_n\text{ i.o.})=0.
\end{align*}
With probability one, only finitely many of the events occur.
[/example]
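For this particular sequence the tail estimate can be carried out by hand. This is the usual argument, and it is not assumed to match the proof attached to the lemma. For every $n\in\mathbb{N}$, the limsup event is contained in $\bigcup_{m\ge n}A_m$, so monotonicity and countable subadditivity give
\begin{align*}
\mathbb{P}(A_m\text{ i.o.})\le\mathbb{P}\left(\bigcup_{m\ge n}A_m\right)\le\sum_{m=n}^{\infty}\mathbb{P}(A_m)\le\sum_{m=n}^{\infty}2^{-m}=2^{1-n}.
\end{align*}
Letting $n\to\infty$ forces $\mathbb{P}(A_m\text{ i.o.})=0$.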
The preceding probability example is still a statement about one measure on one space. Many constructions, however, require moving measures through maps or comparing two measures on the same measurable structure.
## Comparing and Transporting Measures
### Pushforwards and Laws
A measurable map sends points from one measurable space to another, but a measure cannot be transported by simply applying the map to sets unless images of measurable sets are known to be measurable. The stable operation is instead to measure a target set by pulling it back to the source.
[definition: Pushforward Measure]
Let $(E,\mathcal{E})$ and $(F,\mathcal{F})$ be measurable spaces, let $\mu$ be a measure on $(E,\mathcal{E})$, and let $f:E\to F$ be measurable. The pushforward measure $f_{\#}\mu$ on $(F,\mathcal{F})$ is the function
\begin{align*}
f_{\#}\mu:\mathcal{F}&\to[0,\infty]
\end{align*}
defined by
\begin{align*}
(f_{\#}\mu)(B)=\mu(f^{-1}(B))
\end{align*}
for every $B\in\mathcal{F}$.
[/definition]
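A brief check, sketched from the definition, suggests why $f_{\#}\mu$ is again a measure. First, $(f_{\#}\mu)(\varnothing)=\mu(f^{-1}(\varnothing))=\mu(\varnothing)=0$. Next, if $(B_n)_{n=1}^{\infty}$ are pairwise disjoint sets in $\mathcal{F}$, then the preimages $f^{-1}(B_n)$ are pairwise disjoint sets in $\mathcal{E}$, and preimages commute with unions, so
\begin{align*}
(f_{\#}\mu)\left(\bigcup_{n=1}^{\infty}B_n\right)=\mu\left(\bigcup_{n=1}^{\infty}f^{-1}(B_n)\right)=\sum_{n=1}^{\infty}\mu(f^{-1}(B_n))=\sum_{n=1}^{\infty}(f_{\#}\mu)(B_n).
\end{align*}
Images of sets do not preserve disjointness in general, which is one reason the definition uses preimages.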
In probability, the pushforward of $\mathbb{P}$ by a random variable is its distribution. The next example shows the discrete formula encoded by the definition.
[example: Law of a Discrete Random Variable]
Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and let $X:\Omega\to\mathbb{N}$ be a random variable. The law of $X$ is $\mu_X=X_{\#}\mathbb{P}$ on $(\mathbb{N},\mathcal{P}(\mathbb{N}))$. For $k\in\mathbb{N}$,
\begin{align*}
\mu_X(\{k\})=\mathbb{P}(X=k).
\end{align*}
For $A\subset\mathbb{N}$,
\begin{align*}
\mu_X(A)=\sum_{k\in A}\mathbb{P}(X=k).
\end{align*}
[/example]
Pushforwards explain how a measure changes under a map. The next comparison stays on one measurable space and asks when one measure can be described by weighting another, or when the two measures occupy disjoint measurable regions.
### Densities and Singular Parts
Two measures on the same measurable space may disagree most sharply on which sets are negligible. If a set has zero $\mu$-mass but positive $\nu$-mass, then $\nu$ cannot be obtained by merely reweighting $\mu$, because $\mu$ provides no mass there to redistribute.
[definition: Absolute Continuity of Measures]
Let $\mu$ and $\nu$ be measures on the same measurable space $(E,\mathcal{E})$. The measure $\nu$ is absolutely continuous with respect to $\mu$, written $\nu\ll\mu$, if for every $A\in\mathcal{E}$,
\begin{align*}
\mu(A)=0\implies\nu(A)=0.
\end{align*}
[/definition]
Absolute continuity removes the null-set obstruction to writing $\nu$ as a weighted version of $\mu$, but it does not by itself exhibit the weight. The main question is whether there is a measurable density whose integral over each set exactly reconstructs $\nu$.
[quotetheorem:1247]
The density $g$ in this formulation is a nonnegative measurable function with finite values, and its role is to convert integration with respect to $\mu$ into the measure $\nu$ on every measurable set. The $\sigma$-finiteness assumptions are what make such a density available globally, rather than only on isolated finite pieces of the space.
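The easy direction of this correspondence can be sketched directly from the definition of the integral. Suppose $g$ is a nonnegative measurable function and $\nu(A)=\int_E g\,\mathbb{1}_A\,d\mu$ for every $A\in\mathcal{E}$; this way of writing the integral over $A$ is an assumption about how the theorem is phrased. Let $\mu(A)=0$ and let $s=\sum_{i=1}^{m}a_i\mathbb{1}_{A_i}$ be a nonnegative simple function with pairwise disjoint pieces and $0\le s\le g\,\mathbb{1}_A$. Whenever $a_i>0$, the piece $A_i$ must lie inside $A$, so $\mu(A_i)=0$, and therefore
\begin{align*}
\int_E s\,d\mu=\sum_{i=1}^{m}a_i\mu(A_i)=0.
\end{align*}
Taking the supremum over all such $s$ gives $\nu(A)=0$, so $\nu\ll\mu$. The substantial content of the theorem is the converse: absolute continuity produces a density.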
There is also a different obstruction from absolute continuity: two measures may fail to overlap at all rather than one being obtained by reweighting the other. To name this opposite kind of relationship, one asks whether the space can be split so that each measure gives full mass to a region ignored by the other.
[definition: Mutually Singular Measures]
Let $\mu$ and $\nu$ be measures on the same measurable space $(E,\mathcal{E})$. The measures $\mu$ and $\nu$ are mutually singular, written $\mu\perp\nu$, if there exists $A\in\mathcal{E}$ such that
\begin{align*}
\mu(A)=0
\end{align*}
and
\begin{align*}
\nu(E\setminus A)=0.
\end{align*}
[/definition]
Point masses and Lebesgue measure are the basic model for singularity. The following computation shows that all mass of the point measure is concentrated on a Lebesgue-null set.
[example: A Point Mass Is Singular to Lebesgue Measure]
Let $a\in\mathbb{R}$, and define $\delta_a$ on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ by
\begin{align*}
\delta_a(B)=\mathbb{1}_B(a).
\end{align*}
Set $A=\{a\}$. Then $\mathcal{L}^1(A)=0$ and $\delta_a(\mathbb{R}\setminus A)=0$, so $\delta_a\perp\mathcal{L}^1$.
[/example]
Singularity and absolute continuity compare measures on the same space. Another basic comparison asks whether a transformation changes the measure at all, which leads from static size assignments toward dynamics.
### Invariance
A map may transport a measure back to itself. This is the measure-theoretic language for dynamics that rearrange mass without changing its distribution.
[definition: Measure-Preserving Transformation]
Let $(E,\mathcal{E},\mu)$ be a measure space. A measurable map $T:E\to E$ is measure-preserving if
\begin{align*}
\mu(T^{-1}(A))=\mu(A)
\end{align*}
for every $A\in\mathcal{E}$.
[/definition]
Measure-preserving transformations are the entry point to ergodic theory. They turn the static concept of measure into a tool for studying long-term behaviour.
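As an illustration not taken from the text above, consider the doubling map on $E=[0,1)$ with its Borel sets and Lebesgue measure, defined by $T(x)=2x$ for $x<\tfrac{1}{2}$ and $T(x)=2x-1$ for $x\ge\tfrac{1}{2}$. For $0\le a<b\le 1$,
\begin{align*}
T^{-1}([a,b))=\left[\frac{a}{2},\frac{b}{2}\right)\cup\left[\frac{1+a}{2},\frac{1+b}{2}\right),
\end{align*}
and the two pieces are disjoint, so
\begin{align*}
\mathcal{L}^1(T^{-1}([a,b)))=\frac{b-a}{2}+\frac{b-a}{2}=b-a=\mathcal{L}^1([a,b)).
\end{align*}
This verifies the defining identity on half-open intervals only; passing to all Borel sets would use a uniqueness principle for measures that agree on a generating $\pi$-system. Note that the image $T([0,\tfrac{1}{2}))=[0,1)$ has twice the measure of $[0,\tfrac{1}{2})$, which shows why the condition is stated with preimages.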
## Product Measures and Independence
Pairs of coordinates create a new measurability problem: a subset of $E_1\times E_2$ need not be described by separate conditions on the two coordinates. Rectangles are the basic observable sets coming from the factors, and the product sigma-algebra is the smallest measurable structure that contains those rectangles while remaining closed under complements and countable unions.
[definition: Product Sigma-Algebra]
Let $(E_1,\mathcal{E}_1)$ and $(E_2,\mathcal{E}_2)$ be measurable spaces. The product $\sigma$-algebra $\mathcal{E}_1\otimes\mathcal{E}_2$ on $E_1\times E_2$ is
\begin{align*}
\mathcal{E}_1\otimes\mathcal{E}_2=\sigma\left(\{A_1\times A_2:A_1\in\mathcal{E}_1,\ A_2\in\mathcal{E}_2\}\right).
\end{align*}
[/definition]
The product sigma-algebra says which subsets of $E_1\times E_2$ are measurable, but it does not yet assign their sizes. Under sigma-finiteness, the rectangle rule has a unique extension to the generated sigma-algebra; that uniqueness is what makes the following definition unambiguous. In the rectangle formula, the product of extended-real values uses the convention $0\cdot\infty=0$.
[definition: Product Measure]
Let $(E_1,\mathcal{E}_1,\mu_1)$ and $(E_2,\mathcal{E}_2,\mu_2)$ be $\sigma$-finite measure spaces. The product measure $\mu_1\otimes\mu_2$ is the unique measure
\begin{align*}
\mu_1\otimes\mu_2:\mathcal{E}_1\otimes\mathcal{E}_2\to[0,\infty]
\end{align*}
on $(E_1\times E_2,\mathcal{E}_1\otimes\mathcal{E}_2)$ satisfying
\begin{align*}
(\mu_1\otimes\mu_2)(A_1\times A_2)=\mu_1(A_1)\mu_2(A_2)
\end{align*}
for every $A_1\in\mathcal{E}_1$ and $A_2\in\mathcal{E}_2$.
[/definition]
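To see the rectangle rule and the convention $0\cdot\infty=0$ at work, here is a short computation. It is an added illustration and assumes both factors are $(\mathbb{R},\mathcal{B}(\mathbb{R}),\mathcal{L}^1)$, which is $\sigma$-finite.
[example: Rectangles under the Product of Lebesgue Measures]
Take $\mu_1=\mu_2=\mathcal{L}^1$ on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$. For a bounded rectangle,
\begin{align*}
(\mathcal{L}^1\otimes\mathcal{L}^1)([0,2]\times[1,4])=\mathcal{L}^1([0,2])\,\mathcal{L}^1([1,4])=2\cdot3=6.
\end{align*}
For the vertical line $\{0\}\times\mathbb{R}$,
\begin{align*}
(\mathcal{L}^1\otimes\mathcal{L}^1)(\{0\}\times\mathbb{R})=\mathcal{L}^1(\{0\})\,\mathcal{L}^1(\mathbb{R})=0\cdot\infty=0,
\end{align*}
so the convention assigns zero area to a line, as expected.
[/example]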
After the product measure is defined, a new computational obstruction remains: a function on $E_1\times E_2$ depends on two coordinates at once, while the original measures only integrate one coordinate at a time. The essential principle is that, under the right hypotheses, integration over the product can be reduced to iterated one-coordinate integrations.
[quotetheorem:513]
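The reduction to iterated integrals can be illustrated by a short computation. This sketch assumes that the quoted theorem covers nonnegative measurable integrands (a Tonelli-type statement); the intervals, the integrand, and the choice of Lebesgue measure on each factor are assumptions made for the illustration.
[example: An Iterated Integral on a Rectangle]
Let $E_1=[0,1]$ and $E_2=[0,2]$, each with its Borel $\sigma$-algebra and the restriction of $\mathcal{L}^1$, denoted $\mu_1$ and $\mu_2$. The function $f(x,y)=xy^2$ is continuous and nonnegative on $E_1\times E_2$, hence measurable. Integrating first in $y$,
\begin{align*}
\int_{E_1\times E_2}f\,d(\mu_1\otimes\mu_2)=\int_0^1\left(\int_0^2xy^2\,dy\right)dx=\int_0^1\frac{8}{3}x\,dx=\frac{4}{3}.
\end{align*}
Integrating first in $x$ gives the same value:
\begin{align*}
\int_0^2\left(\int_0^1xy^2\,dx\right)dy=\int_0^2\frac{y^2}{2}\,dy=\frac{4}{3}.
\end{align*}
[/example]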
For probability spaces, product measure is exactly the construction of independent coordinates. Rectangles encode events depending separately on each coordinate.
[example: Independent Coordinate Random Variables]
Let $(E_1,\mathcal{E}_1,\mu_1)$ and $(E_2,\mathcal{E}_2,\mu_2)$ be probability spaces, and set
\begin{align*}
(\Omega,\mathcal{F},\mathbb{P})=(E_1\times E_2,\mathcal{E}_1\otimes\mathcal{E}_2,\mu_1\otimes\mu_2).
\end{align*}
Let $X:\Omega\to E_1$ and $Y:\Omega\to E_2$ be the coordinate maps defined by $X(x,y)=x$ and $Y(x,y)=y$.
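For $A\in\mathcal{E}_1$ and $B\in\mathcal{E}_2$, the event $\{X\in A,\ Y\in B\}$ is the rectangle $A\times B$, so
\begin{align*}
\mathbb{P}(X\in A,\ Y\in B)=(\mu_1\otimes\mu_2)(A\times B)=\mu_1(A)\mu_2(B).
\end{align*}
Taking $B=E_2$ gives $\mathbb{P}(X\in A)=\mu_1(A)$, and taking $A=E_1$ gives $\mathbb{P}(Y\in B)=\mu_2(B)$. Thus $X$ and $Y$ are independent with laws $\mu_1$ and $\mu_2$.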
[/example]
Products already show that extra structure on the underlying set can dictate useful measurable sets. Topology provides another source of structure: open sets become observable sets, compact sets control local finiteness, and continuous functions become tests for measures.
## Borel, Radon, and Geometric Measures
### Topological Measurability
When the underlying set has a topology, open sets are the first sets we know how to observe, but the open sets alone are usually not closed under complements or countable intersections. To use measure theory on a [topological space](/page/Topological%20Space), the observable open sets must therefore be enlarged to the smallest sigma-algebra compatible with them.
[definition: Borel Sigma-Algebra]
Let $(X,\tau)$ be a topological space. The Borel $\sigma$-algebra on $X$ is
\begin{align*}
\mathcal{B}(X)=\sigma(\tau).
\end{align*}
[/definition]
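Many familiar sets that are neither open nor closed still lie in $\sigma(\tau)$. As a small added illustration, assuming $X=\mathbb{R}$ with its usual topology, a half-open interval is a countable intersection of open intervals.
[example: Half-Open Intervals Belong to the Borel Sigma-Algebra]
Let $a<b$ be real numbers. A point $x$ satisfies $x>a-1/n$ for every $n\ge1$ exactly when $x\ge a$, so
\begin{align*}
[a,b)=\bigcap_{n=1}^{\infty}\left(a-\frac{1}{n},\,b\right).
\end{align*}
Each set on the right is open, and a $\sigma$-algebra is closed under countable intersections, so $[a,b)\in\mathcal{B}(\mathbb{R})$. The same argument applied to
\begin{align*}
\{a\}=\bigcap_{n=1}^{\infty}\left(a-\frac{1}{n},\,a+\frac{1}{n}\right)
\end{align*}
shows that every singleton belongs to $\mathcal{B}(\mathbb{R})$.
[/example]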
The elements of $\mathcal{B}(X)$ are called Borel sets. Once topology has supplied the measurable sets, the remaining task is to assign sizes to them without ignoring the topological origin of the sigma-algebra. A measure defined on the Borel sigma-algebra is the standard object that combines topology with size.
[definition: Borel Measure]
Let $(X,\tau)$ be a topological space. A Borel measure on $X$ is a measure
\begin{align*}
\mu:\mathcal{B}(X)\to[0,\infty]
\end{align*}
on $(X,\mathcal{B}(X))$.
[/definition]
For analysis on topological spaces, a Borel measure can still be too wild to interact well with the topology. The difficulty appears when we try to recover the size of a complicated set using only topological approximations: open neighbourhoods should approximate from outside, and compact pieces should approximate from inside. Without that regularity, continuous compactly supported functions may miss mass that the topology cannot localise. Radon measures are introduced to rule out this mismatch and to make measure, compactness, and continuous test functions speak the same language. The word Radon has several nearby conventions in the literature, especially about which regularity properties are built into the definition. On this page, a Radon measure on a locally compact [Hausdorff space](/page/Hausdorff%20Space) means a locally finite Borel measure that is outer regular on every Borel set and inner regular by compact sets on every Borel set. On such a space, local finiteness is equivalent to finiteness on every compact set, which is the form used in the definition below.
[definition: Radon Measure]
Let $X$ be a locally compact Hausdorff space. A Radon measure on $X$ is a Borel measure
\begin{align*}
\mu:\mathcal{B}(X)\to[0,\infty]
\end{align*}
such that $\mu(K)<\infty$ for every compact set $K\subset X$, and for every Borel set $A\subset X$,
\begin{align*}
\mu(A)=\inf\{\mu(U):A\subset U,\ U\subset X\text{ open}\}
\end{align*}
and
\begin{align*}
\mu(A)=\sup\{\mu(K):K\subset A,\ K\subset X\text{ compact}\}.
\end{align*}
[/definition]
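The three conditions can be checked by hand in the simplest case. The following sketch is an added illustration, with the point $a$ and the point mass $\delta_a$ on a general locally compact Hausdorff space taken as assumptions of the example.
[example: A Point Mass Is a Radon Measure]
Let $X$ be a locally compact Hausdorff space, let $a\in X$, and set $\delta_a(B)=\mathbb{1}_B(a)$ for $B\in\mathcal{B}(X)$. Every compact set $K$ satisfies $\delta_a(K)\le1<\infty$. Let $A\subset X$ be a Borel set. If $a\in A$, then every open $U\supset A$ contains $a$, and $K=\{a\}$ is a compact subset of $A$, so
\begin{align*}
\inf\{\delta_a(U):A\subset U,\ U\text{ open}\}=1=\sup\{\delta_a(K):K\subset A,\ K\text{ compact}\}.
\end{align*}
If $a\notin A$, then $U=X\setminus\{a\}$ is open because points are closed in a Hausdorff space, and $A\subset U$ with $\delta_a(U)=0$; every compact $K\subset A$ has $\delta_a(K)=0$. Hence
\begin{align*}
\inf\{\delta_a(U):A\subset U,\ U\text{ open}\}=0=\sup\{\delta_a(K):K\subset A,\ K\text{ compact}\}.
\end{align*}
In both cases the common value is $\delta_a(A)$, so $\delta_a$ satisfies the definition.
[/example]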
Radon measures are the measures that integrate compactly supported continuous test functions in the expected positive way, but the exact representation theorem must be stated with the same regularity convention as the definition. On arbitrary locally compact Hausdorff spaces, books differ over whether the representing measure is required to be inner regular on every Borel set or only on open sets, with additional hypotheses used to upgrade between conventions. The next theorem therefore uses a common topological setting where the strong convention above is guaranteed.
[quotetheorem:2979]
The second-countability assumption is not part of the philosophical bridge between positive functionals and measures; it is there to keep the theorem aligned with this page's strong Radon convention without hiding a regularity upgrade inside the word "Radon." Some references state a version for arbitrary locally compact Hausdorff spaces using a weaker regular Borel convention, then impose additional topological hypotheses when inner regularity on all Borel sets is needed. Downstream Androma pages using Radon measures should keep this convention unless they explicitly announce a narrower or broader one before stating results.
[remark: Radon Convention]
In this chapter, any theorem quoted with the phrase "Radon measure" should be read with the convention in the definition above: locally finite Borel measure, outer regular on every Borel set, and inner regular by compact sets on every Borel set. Results imported from sources using a weaker regular Borel convention must restate their topological hypotheses before being reused.
[/remark]
Radon measures connect topology with size on locally compact spaces. The final geometric direction changes the dimension being measured, so that the same abstract axioms can describe length on curves, area on surfaces, and more singular sets.
### Dimension and Hausdorff Measure
Lebesgue measure gives $n$-dimensional volume in $\mathbb{R}^n$, but geometry also needs length, surface area, and fractional-dimensional size. The next definition measures a set by covering it with small pieces and charging each piece by its diameter to the power $s$. Its normalizing constant uses Euler's gamma function $\Gamma$, which extends the factorial by satisfying $\Gamma(k+1)=k!$ for every integer $k\ge0$.
[definition: Hausdorff Measure]
Let $(X,d)$ be a [metric space](/page/Metric%20Space) and let $s\in[0,\infty)$. Set
\begin{align*}
c_s=\frac{\pi^{s/2}}{2^s\Gamma(s/2+1)}.
\end{align*}
For $s=0$, use the convention that $(\operatorname{diam}U)^0=0$ when $U=\varnothing$ and $(\operatorname{diam}U)^0=1$ when $U\ne\varnothing$.
For $\delta>0$, define
\begin{align*}
\mathcal{H}^{s}_{\delta}:\mathcal{P}(X)&\to[0,\infty]
\end{align*}
by
\begin{align*}
\mathcal{H}^{s}_{\delta}(A)=\inf\left\{c_s\sum_{i=1}^{\infty}(\operatorname{diam}U_i)^s:A\subset\bigcup_{i=1}^{\infty}U_i,\ \operatorname{diam}(U_i)<\delta\right\}
\end{align*}
for every $A\subset X$. Define the Hausdorff outer measure $\mathcal{H}^{s}_{*}:\mathcal{P}(X)\to[0,\infty]$ by
\begin{align*}
\mathcal{H}^{s}_{*}(A)=\lim_{\delta\downarrow0}\mathcal{H}^{s}_{\delta}(A)
\end{align*}
for every $A\subset X$. Let $\mathcal{M}_{\mathcal{H}^{s}_{*}}$ be the sigma-algebra of $\mathcal{H}^{s}_{*}$-Caratheodory-measurable sets, meaning the sets $A\subset X$ such that
\begin{align*}
\mathcal{H}^{s}_{*}(B)=\mathcal{H}^{s}_{*}(B\cap A)+\mathcal{H}^{s}_{*}(B\setminus A)
\end{align*}
for every $B\subset X$. The $s$-dimensional Hausdorff measure is the measure
\begin{align*}
\mathcal{H}^{s}:\mathcal{M}_{\mathcal{H}^{s}_{*}}&\to[0,\infty]
\end{align*}
defined by $\mathcal{H}^{s}(A)=\mathcal{H}^{s}_{*}(A)$ for every $A\in\mathcal{M}_{\mathcal{H}^{s}_{*}}$.
[/definition]
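The role of the constant $c_s$ becomes clearer once it is evaluated in low dimensions. The computation below is an added illustration; it uses the standard values $\Gamma(1)=\Gamma(2)=1$, $\Gamma(3/2)=\sqrt{\pi}/2$, and $\Gamma(5/2)=3\sqrt{\pi}/4$, which are not stated elsewhere on this page.
[example: The Normalizing Constant in Low Dimensions]
Substituting into the formula for $c_s$,
\begin{align*}
c_0&=\frac{1}{\Gamma(1)}=1, & c_1&=\frac{\sqrt{\pi}}{2\,\Gamma(3/2)}=1,\\
c_2&=\frac{\pi}{4\,\Gamma(2)}=\frac{\pi}{4}, & c_3&=\frac{\pi^{3/2}}{8\,\Gamma(5/2)}=\frac{\pi}{6}.
\end{align*}
A covering set of diameter $2r$ is therefore charged
\begin{align*}
c_1(2r)=2r,\qquad c_2(2r)^2=\pi r^2,\qquad c_3(2r)^3=\frac{4}{3}\pi r^3,
\end{align*}
which are the length, area, and volume of a Euclidean ball of radius $r$ in dimensions $1$, $2$, and $3$. This is consistent with a normalization chosen so that $\mathcal{H}^n$ matches $n$-dimensional Lebesgue measure on $\mathbb{R}^n$.
[/example]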
The outer measure is defined on all subsets, while the countably additive measure lives on the Caratheodory-measurable sigma-algebra. For metric spaces, every Borel set is $\mathcal{H}^{s}_{*}$-Caratheodory-measurable, so Hausdorff measure restricts in particular to a Borel measure. This distinction is what lets Hausdorff measure keep the abstract measure axioms while adapting size to geometric dimension. At dimension zero, it recovers counting for finite sets.
[example: Zero-Dimensional Hausdorff Measure Counts Finite Sets]
Let $X$ be a metric space and let $A=\{a_1,\ldots,a_m\}\subset X$ be a finite set of $m$ distinct points. Note that $c_0=1$ because $\Gamma(1)=1$. If $m=0$, the empty cover has total cost $0$, so $\mathcal{H}^0(\varnothing)=0$. If $m=1$, every cover of $A$ contains at least one nonempty covering set and therefore has cost at least $1$, while the singleton cover $\{a_1\}$ has diameter $0<\delta$ and cost exactly $1$, so $\mathcal{H}^0(A)=1$. Suppose $m\ge2$, and set
\begin{align*}
r=\min\{d(a_i,a_j):i\ne j\}.
\end{align*}
Then $r>0$. For $0<\delta\le r$, any set $U$ with $\operatorname{diam}(U)<\delta$ contains at most one point of $A$. With the convention that empty covering sets contribute $0$ to the sum and each nonempty covering set contributes $(\operatorname{diam}U)^0=1$, every $\delta$-cover of $A$ has cost at least $m$. The singleton cover $A\subset\bigcup_{i=1}^m\{a_i\}$ has cost $m$, so $\mathcal{H}^0_\delta(A)=m$ for all sufficiently small $\delta$. Thus $\mathcal{H}^0(A)=m$.
[/example]
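The same covering argument shows how the exponent $s$ selects a dimension. The sketch below is an added illustration, assuming $X=\mathbb{R}$ with its usual metric and using the values $c_1=1$ and $c_2=\pi/4$ obtained from the formula for $c_s$.
[example: A Segment Has Length One and Area Zero]
Let $A=[0,1]\subset\mathbb{R}$, fix $\delta>0$, and choose an integer $N$ with $1/N<\delta$. The intervals $U_i=[(i-1)/N,\,i/N]$ for $i=1,\ldots,N$ cover $A$ and have diameter $1/N$, so
\begin{align*}
\mathcal{H}^1_\delta(A)\le c_1\sum_{i=1}^{N}\frac{1}{N}=1.
\end{align*}
Conversely, each set $U_i$ in any admissible cover lies in a closed interval of length $\operatorname{diam}(U_i)$, so countable subadditivity of Lebesgue measure gives
\begin{align*}
1=\mathcal{L}^1(A)\le\sum_{i=1}^{\infty}\operatorname{diam}(U_i).
\end{align*}
Hence $\mathcal{H}^1_\delta(A)=1$ for every $\delta>0$, and $\mathcal{H}^1(A)=1$. For $s=2$, the same cover costs
\begin{align*}
c_2\sum_{i=1}^{N}\frac{1}{N^2}=\frac{\pi}{4N},
\end{align*}
and $N$ may be taken arbitrarily large for fixed $\delta$, so $\mathcal{H}^2_\delta(A)=0$ for every $\delta>0$ and $\mathcal{H}^2(A)=0$.
[/example]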
The chapter has now returned to its starting theme from several directions: size is useful only when it survives countable constructions. The connected topics below indicate where each direction becomes a full theory rather than a structural preview.
## Beyond and Connected Topics
Measure is the foundation for integration theory. The next layer includes construction from premeasures and outer measures, completion of measure spaces, signed and complex measures, $L^p(E,\mathcal{E},\mu)$ spaces, convergence theorems, and Radon-Nikodym derivatives. This parent page has used these topics only as signposts; each deserves a child treatment where the construction proofs, decomposition theorems, examples, and completion procedures can be developed at full scale without compressing the narrative. These tools turn measure from a theory of set size into a theory of function spaces.
Probability is the normalized branch of the theory. Random variables are measurable maps, distributions are pushforward measures, independence is expressed by product measures, and almost sure statements are almost-everywhere statements for $\mathbb{P}$. The Androma notes [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure) develop this direction from the probability side.
Geometric measure theory changes the meaning of size while keeping countable additivity. Hausdorff measure, Radon measures, rectifiability, area and coarea formulae, and sets of finite perimeter use measures to study rough geometric objects. The notes [Geometric Measure Theory I: Measures and Hausdorff Dimension](/page/Geometric%20Measure%20Theory%20I%3A%20Measures%20and%20Hausdorff%20Dimension), [Geometric Measure Theory II: Area and Coarea Formulas](/page/Geometric%20Measure%20Theory%20II%3A%20Area%20and%20Coarea%20Formulas), and [Geometric Measure Theory III: BV Functions and Sets of Finite Perimeter](/page/Geometric%20Measure%20Theory%20III%3A%20BV%20Functions%20and%20Sets%20of%20Finite%20Perimeter) continue this path.
Dynamical systems use measures to describe statistical behaviour under iteration. Measure-preserving transformations, invariant probability measures, and measure-theoretic entropy study systems where individual orbits may be complicated but distributions remain structured.
Functional analysis views measures as objects acting on functions. The [Riesz representation theorem](/theorems/218) identifies Radon measures with positive linear functionals, [weak convergence](/page/Weak%20Convergence) of measures is tested against continuous functions, and projection-valued measures connect measure theory with spectral theory.
## References
Androma, [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure).
Androma, [Geometric Measure Theory I: Measures and Hausdorff Dimension](/page/Geometric%20Measure%20Theory%20I%3A%20Measures%20and%20Hausdorff%20Dimension).
Androma, [Geometric Measure Theory II: Area and Coarea Formulas](/page/Geometric%20Measure%20Theory%20II%3A%20Area%20and%20Coarea%20Formulas).
Androma, [Geometric Measure Theory III: BV Functions and Sets of Finite Perimeter](/page/Geometric%20Measure%20Theory%20III%3A%20BV%20Functions%20and%20Sets%20of%20Finite%20Perimeter).
Halmos, *Measure Theory* (1950).
Folland, *Real Analysis: Modern Techniques and Their Applications* (1999).
Billingsley, *Probability and Measure* (1995).
Evans and Gariepy, *Measure Theory and Fine Properties of Functions* (1992).