Suppose we want to assign sizes to uncertain events, but we also want those sizes to behave like honest mathematics. If an event is split into disjoint alternatives, the size of the whole should be the sum of the sizes of the alternatives. If an event is certain, its size should be $1$. If the sample space is continuous, such as the interval $[0,1]$, the same language should still describe events like $[0,1/2]$, countable unions of intervals, and limiting events that arise from repeated observations. A probability measure is the device that makes all of this possible.
The first obstruction is that not every subset of a sample space can always be assigned a consistent probability. In finite probability this issue is hidden, because every subset can be listed and counted. On an uncountable space such as $[0,1]$, the attempt to assign a length to every subset while preserving both countable additivity and translation invariance leads to a contradiction. The modern solution is to separate the sample space from the collection of events whose probabilities are being assigned.
[example: Why Events Need a Sigma Algebra]
Let $\Omega=[0,1]$. Start with the subintervals of $[0,1]$ that are open relative to $[0,1]$, such as $(a,b)$, $[0,a)$, and $(b,1]$, and close them under complements in $\Omega$ and countable unions; the resulting collection is the Borel sigma algebra $\mathcal{B}([0,1])$. Because a sigma algebra is also closed under countable intersections, intervals such as $[a,b]$ are Borel: for $0\le a\le b\le 1$,
\begin{align*}
[a,b]=[a,1]\cap[0,b]
\end{align*}
where $[a,1]=\Omega\setminus[0,a)$ and $[0,b]=\Omega\setminus(b,1]$ are complements of relatively open intervals (with $[0,0)$ and $(1,1]$ read as the empty set).
On $\mathcal{B}([0,1])$, the uniform probability measure is Lebesgue length restricted to $[0,1]$, so
\begin{align*}
P([a,b])=b-a
\end{align*}
whenever $0\le a\le b\le 1$; in particular,
\begin{align*}
P([0,1])=1-0=1.
\end{align*}
The restriction to a sigma algebra is not cosmetic. If one tried to assign a probability to every subset while keeping countable additivity and [translation invariance](/theorems/4911) modulo $1$, a Vitali-type obstruction appears. Choose one representative from each equivalence class of $[0,1)$ under $x\sim y$ when $x-y\in\mathbb{Q}$, and call the resulting set $V$. For rational $r\in[0,1)\cap\mathbb{Q}$, let
\begin{align*}
V+r=\{(v+r)\bmod 1:v\in V\}.
\end{align*}
If $(V+r)\cap(V+s)$ contained a point, then for some $v,w\in V$ we would have $(v+r)\bmod 1=(w+s)\bmod 1$, hence $v-w\in\mathbb{Q}$, so $v=w$ by the choice of one representative from each equivalence class, and then $r=s$. Thus the sets $V+r$ are pairwise disjoint. They also cover $[0,1)$, because every $x\in[0,1)$ is rationally equivalent to exactly one representative $v\in V$, so $x=(v+r)\bmod 1$ for some rational $r\in[0,1)$.
If a probability measure $P$ were defined on all subsets of $[0,1)$ and were invariant under these translations, then every set $V+r$ would have the same value $P(V)$. Since the rationals in $[0,1)$ are countable, countable additivity would give
\begin{align*}
1=P([0,1))=\sum_{r\in[0,1)\cap\mathbb{Q}}P(V+r)=\sum_{r\in[0,1)\cap\mathbb{Q}}P(V).
\end{align*}
If $P(V)=0$, the right side is $0$; if $P(V)>0$, the right side diverges to infinity. Both contradict $P([0,1))=1$. The sigma algebra therefore records exactly which events the model is prepared to measure while preserving the probability rules.
[/example]
The example shows the central pattern: probability is not only a number attached to an outcome, but a countably additive set function attached to a chosen class of events. The rest of the page stays close to that measure-theoretic core; random variables, laws, independence, and products are mentioned only as downstream uses of probability measures rather than developed as separate topics.
## Definition
The primary object is a normalized measure on a chosen family of events. The normalization $P(\Omega)=1$ marks the whole sample space as certain, while countable additivity says that disjoint alternatives contribute their probabilities without overlap. This is the point where ordinary intuition about finite counting becomes a definition that also works for intervals, limiting events, and infinite experiments.
[definition: Probability Measure]
Let $\Omega$ be a set and let $\mathcal{F}$ be a sigma algebra on $\Omega$. A probability measure on $(\Omega,\mathcal{F})$ is a function
\begin{align*}
P:\mathcal{F} &\to [0,1]
\end{align*}
such that:
\begin{align*}
P(\Omega)&=1.
\end{align*}
\begin{align*}
P\left(\bigcup_{n=1}^{\infty} A_n\right)&=\sum_{n=1}^{\infty} P(A_n)
\end{align*}
whenever $A_1,A_2,\ldots\in\mathcal{F}$ are pairwise disjoint.
[/definition]
The notation $(\Omega,\mathcal{F})$ deliberately separates outcomes from events. The sample space $\Omega$ holds possible outcomes, but the probability measure is only required to evaluate sets in $\mathcal{F}$. To make this restriction precise, we isolate the event structure that the definition assumes.
## Event Structures
Before probability can be computed, we need a space of events stable under the operations that probability questions use. Complements model negation, countable unions model repeated alternatives, and countable intersections follow from the other two. This is why probability begins with a sigma algebra rather than with a bare collection of subsets.
[definition: Sigma Algebra]
Let $\Omega$ be a set. A sigma algebra on $\Omega$ is a collection $\mathcal{F}\subset \mathcal{P}(\Omega)$ such that:
$\Omega \in \mathcal{F}$.
If $A\in\mathcal{F}$, then $\Omega\setminus A\in\mathcal{F}$.
If $A_1,A_2,\ldots\in\mathcal{F}$, then $\bigcup_{n=1}^{\infty} A_n\in\mathcal{F}$.
[/definition]
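On a finite set, the three closure conditions can be checked mechanically, because countable unions reduce to finite unions. The following is a minimal sketch under that finiteness assumption; the function name `generate_sigma_algebra` and the sample generators are illustrative choices, not part of the definition.

```python
from itertools import combinations

def generate_sigma_algebra(omega, generators):
    """Close `generators` under complement and union inside the finite set `omega`."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)  # snapshot, so we can add to `family` while looping
        for a in current:
            comp = omega - a
            if comp not in family:
                family.add(comp)
                changed = True
        for a, b in combinations(current, 2):
            union = a | b
            if union not in family:
                family.add(union)
                changed = True
    return family

omega = {1, 2, 3, 4}
sigma = generate_sigma_algebra(omega, [{1}, {1, 2}])
# The atoms are {1}, {2}, {3, 4}, so the family has 2**3 = 8 members.
print(sorted(sorted(s) for s in sigma))
```

The set $\{2\}$ appears even though it was not a generator, while $\{3\}$ does not: the generated family cannot separate the outcomes $3$ and $4$.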
A sigma algebra is the bookkeeping device that makes limiting events legitimate, but the sigma algebra alone is not yet the object on which a probability measure is defined. We need a name for the sample space together with its admissible events, because the same underlying set can carry different event structures and therefore support different probability models. That paired object is the measurable space.
[definition: Measurable Space]
A measurable space is a pair $(\Omega,\mathcal{F})$, where $\Omega$ is a set and $\mathcal{F}$ is a sigma algebra on $\Omega$.
[/definition]
The elements of $\mathcal{F}$ are called measurable sets; in probability they are usually called events. With this supporting language in place, the definition of probability measure says exactly that events receive normalized, countably additive sizes.
Once probabilities are assigned only through normalization and countable additivity, even familiar manipulations such as passing to complements or taking limits of increasing events need justification. Without such rules, a calculation with events could silently use properties that have not been proved from the axioms. The following result collects the basic consequences that make probability measures usable in ordinary event calculations: monotonicity, complement formulas, finite inclusion-exclusion, and continuity along increasing or decreasing event sequences.
[quotetheorem:1106]
The last two identities are continuity statements. They explain why probability measures are compatible with limiting events, not only with finite Boolean combinations. That compatibility is what makes measure-theoretic probability suitable for sequences of random variables.
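The finite consequences named above (monotonicity, the complement formula, and inclusion-exclusion for two events) can be checked numerically on a small model. The sketch below assumes a fair six-sided die with the uniform measure; the events `A` and `B` are arbitrary illustrative choices, and a check of this kind is a sanity test rather than a proof.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))

def P(event):
    """Uniform probability on a fair six-sided die (an assumed toy model)."""
    return Fraction(len(event & OMEGA), len(OMEGA))

A = frozenset({1, 2, 3})
B = frozenset({2, 3, 4, 5})

complement_ok = P(OMEGA - A) == 1 - P(A)
monotone_ok = P(A & B) <= P(A)  # A & B is a subset of A
inclusion_exclusion_ok = P(A | B) == P(A) + P(B) - P(A & B)

print(complement_ok, monotone_ok, inclusion_exclusion_ok)
```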
## Finite and Countable Models
Finite probability spaces reveal the definition without analytic complications. The event sigma algebra can be the full power set, and the probability measure is completely determined by the mass assigned to each point. To connect the measure-theoretic definition with the familiar act of assigning probabilities to individual outcomes, we first name the function that records those pointwise masses.
[definition: Probability Mass Function]
Let $\Omega$ be a finite or [countable set](/page/Countable%20Set). A [probability mass function](/page/Probability%20Mass%20Function) on $\Omega$ is a function $p:\Omega\to[0,1]$ such that
\begin{align*}
\sum_{\omega\in\Omega}p(\omega)&=1.
\end{align*}
[/definition]
A probability mass function defines a probability measure on $(\Omega,\mathcal{P}(\Omega))$ by
\begin{align*}
P(A)&=\sum_{\omega\in A}p(\omega)
\end{align*}
for every $A\subset \Omega$. This construction explains why elementary probability can often pretend that probabilities belong to outcomes. In a countable model, events are still the measured objects, but the event probability is recovered by summing point masses.
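The passage from point masses to event probabilities can be sketched directly. The example below assumes a finite sample space and uses exact rational arithmetic; the loaded-die masses and the helper name `measure_from_pmf` are hypothetical.

```python
from fractions import Fraction

def measure_from_pmf(pmf):
    """Return the set function P(A) = sum of pmf[w] over w in A."""
    if any(mass < 0 for mass in pmf.values()) or sum(pmf.values()) != 1:
        raise ValueError("point masses must be nonnegative and sum to 1")

    def P(event):
        return sum((pmf[w] for w in event), Fraction(0))

    return P

# Hypothetical loaded die: faces 1 and 2 are twice as likely as the others.
pmf = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 8),
       4: Fraction(1, 8), 5: Fraction(1, 8), 6: Fraction(1, 8)}
P = measure_from_pmf(pmf)

even, odd = {2, 4, 6}, {1, 3, 5}
print(P(even), P(odd), P(even | odd))
```

Because `even` and `odd` are disjoint and cover the sample space, their probabilities add up to $P(\Omega)=1$.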
[example: A Geometric Distribution as a Probability Measure]
Let $\Omega=\mathbb{N}$ and fix $q\in(0,1)$. Define
\begin{align*}
p(n)&=(1-q)q^{n-1},\qquad n\in\mathbb{N}.
\end{align*}
Since $0<q<1$, the geometric series formula gives
\begin{align*}
\sum_{n=1}^{\infty}p(n)&=\sum_{n=1}^{\infty}(1-q)q^{n-1}.
\end{align*}
Factoring out $1-q$ and reindexing by $m=n-1$ gives
\begin{align*}
\sum_{n=1}^{\infty}(1-q)q^{n-1}&=(1-q)\sum_{m=0}^{\infty}q^m.
\end{align*}
Using $\sum_{m=0}^{\infty}q^m=1/(1-q)$, we get
\begin{align*}
(1-q)\sum_{m=0}^{\infty}q^m&=(1-q)\frac{1}{1-q}=1.
\end{align*}
For every $A\subset\mathbb{N}$, set
\begin{align*}
P(A)&=\sum_{n\in A}(1-q)q^{n-1}.
\end{align*}
Then $P(\mathbb{N})=1$ by the computation above. If $A_1,A_2,\ldots$ are pairwise disjoint subsets of $\mathbb{N}$, then each $n\in\bigcup_{j=1}^{\infty}A_j$ belongs to exactly one $A_j$, so regrouping the nonnegative terms gives
\begin{align*}
P\left(\bigcup_{j=1}^{\infty}A_j\right)&=\sum_{j=1}^{\infty}\sum_{n\in A_j}(1-q)q^{n-1}.
\end{align*}
By the definition of $P(A_j)$, this is
\begin{align*}
P\left(\bigcup_{j=1}^{\infty}A_j\right)&=\sum_{j=1}^{\infty}P(A_j).
\end{align*}
Thus $P$ is a probability measure on $\mathcal{P}(\mathbb{N})$.
For the tail event $\{n\ge k\}$, where $k\in\mathbb{N}$,
\begin{align*}
P(\{n\ge k\})&=\sum_{n=k}^{\infty}(1-q)q^{n-1}.
\end{align*}
Writing $n=k+m$ gives
\begin{align*}
\sum_{n=k}^{\infty}(1-q)q^{n-1}&=\sum_{m=0}^{\infty}(1-q)q^{k+m-1}.
\end{align*}
Factoring out $q^{k-1}$ gives
\begin{align*}
\sum_{m=0}^{\infty}(1-q)q^{k+m-1}&=(1-q)q^{k-1}\sum_{m=0}^{\infty}q^m.
\end{align*}
Using the geometric series formula once more,
\begin{align*}
(1-q)q^{k-1}\sum_{m=0}^{\infty}q^m&=(1-q)q^{k-1}\frac{1}{1-q}=q^{k-1}.
\end{align*}
So the probability of waiting at least until time $k$ is $q^{k-1}$, making the geometric tail explicit.
[/example]
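The tail formula $P(\{n\ge k\})=q^{k-1}$ can be cross-checked with a finite computation, using the complement rule $P(\{n\ge k\})=1-P(\{1,\ldots,k-1\})$. The sketch below assumes the illustrative value $q=1/3$ and uses exact fractions, so the comparison is an equality rather than an approximation.

```python
from fractions import Fraction

def geometric_pmf(n, q):
    """Point mass p(n) = (1 - q) * q**(n - 1) for n = 1, 2, ..."""
    return (1 - q) * q ** (n - 1)

def tail_by_complement(k, q):
    """P({n >= k}) computed as 1 - P({1, ..., k - 1}), a finite sum."""
    return 1 - sum((geometric_pmf(n, q) for n in range(1, k)), Fraction(0))

q = Fraction(1, 3)  # assumed value, chosen only for illustration
tails = {k: tail_by_complement(k, q) for k in range(1, 6)}

for k, value in tails.items():
    print(k, value, q ** (k - 1))
```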
The countable case also raises the question of whether point masses describe every probability measure on a countable sample space. The theorem below gives the exact answer we need: a probability measure on a countable sample space is precisely the sum of its point masses.
[quotetheorem:9348]
This result is the boundary of the discrete theory. On a countable sample space, knowing the probability of each singleton determines the whole measure, so probability can be computed by summing point masses. The next section leaves that setting: on continuous spaces, singleton probabilities may all be zero, and the event structure must instead record intervals, open sets, and their countable combinations.
## Continuous Models
### Borel Events
The next conceptual shift is that a single point may have probability zero while an interval has positive probability. This is not a paradox; it is the reason countable additivity is phrased for countable unions rather than arbitrary unions.
[definition: Borel Probability Measure]
Let $X$ be a [topological space](/page/Topological%20Space) and let $\mathcal{B}(X)$ be the Borel sigma algebra generated by the open subsets of $X$. A Borel probability measure on $X$ is a probability measure on $(X,\mathcal{B}(X))$.
[/definition]
The Borel sigma algebra is the natural event space when observations take values in a topological space. It contains open sets, closed sets, and the countable combinations needed for limiting events, while remaining small enough for standard measures to be well behaved.
[example: Uniform Probability on an Interval]
Let $\Omega=[0,1]$ and let $\mathcal{F}=\mathcal{B}([0,1])$. The uniform probability measure $P$ is Lebesgue length restricted to Borel subsets of $[0,1]$, so for $0\le a\le b\le 1$ the interval $[a,b]$ has length
\begin{align*}
P([a,b])=b-a.
\end{align*}
In particular, if $x\in[0,1]$, then $\{x\}=[x,x]$, and therefore
\begin{align*}
P(\{x\})=P([x,x])=x-x=0.
\end{align*}
For the whole sample space,
\begin{align*}
P([0,1])=1-0=1.
\end{align*}
Thus every individual point has probability $0$, while the full interval has probability $1$. The probability of an interval is not obtained by adding uncountably many point probabilities; it is assigned directly by the measure as length.
[/example]
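The contrast between points and intervals can be illustrated by computing lengths for finitely many disjoint closed intervals. This is a minimal sketch: the helper `uniform_prob` is hypothetical, it handles only finite lists of intervals, and it assumes the caller passes pairwise disjoint intervals.

```python
from fractions import Fraction

def uniform_prob(intervals):
    """Sum the lengths b - a of intervals [a, b] contained in [0, 1].

    The intervals are assumed to be pairwise disjoint; this is not checked.
    """
    total = Fraction(0)
    for a, b in intervals:
        if not (0 <= a <= b <= 1):
            raise ValueError("need 0 <= a <= b <= 1")
        total += Fraction(b) - Fraction(a)
    return total

# A single point is the degenerate interval [x, x].
point = uniform_prob([(Fraction(1, 2), Fraction(1, 2))])
# Finitely many points still have total probability 0.
many_points = uniform_prob([(Fraction(k, 100), Fraction(k, 100)) for k in range(101)])
# Two disjoint intervals contribute the sum of their lengths.
two_pieces = uniform_prob([(0, Fraction(1, 4)), (Fraction(1, 2), Fraction(3, 4))])

print(point, many_points, two_pieces)
```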
### Densities and Point Masses
For many real-valued models, probabilities are computed from a density, so the next definition explains the integral form we need for such models. The density is not itself the probability of a point; it is the integrand whose integral over an event gives the probability of that event.
[definition: Probability Density]
Let $P$ be a probability measure on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$. A probability density for $P$ with respect to [Lebesgue measure](/page/Lebesgue%20Measure) $\mathcal{L}^1$ is a [measurable function](/page/Measurable%20Function) $f:\mathbb{R}\to[0,\infty)$ such that
\begin{align*}
P(A)&=\int_A f(x)\,d\mathcal{L}^1(x)
\end{align*}
for every $A\in\mathcal{B}(\mathbb{R})$.
[/definition]
This definition turns probability calculations into integration, but it does not say that every probability measure has a density. Discrete distributions, singular distributions, and mixtures show that probability measures are more general than densities.
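To see the difference between a density value and a probability, the integral over an interval can be approximated numerically. The sketch below assumes the hypothetical density $f(x)=2x$ on $[0,1]$ (zero elsewhere) and a simple midpoint rule; for this density, the exact value is $P([a,b])=b^2-a^2$ when $0\le a\le b\le 1$.

```python
def density(x):
    """Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def prob_interval(a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of `density` over [a, b]."""
    h = (b - a) / steps
    return sum(density(a + (i + 0.5) * h) for i in range(steps)) * h

print(prob_interval(0.0, 1.0))    # approximately 1
print(prob_interval(0.25, 0.5))   # approximately 0.5**2 - 0.25**2 = 0.1875
print(density(0.5), prob_interval(0.5, 0.5))
```

The last line shows a density value of $1$ at $x=0.5$ next to a probability of $0$ for the single point, since the interval of integration has length $0$.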
### Measures Beyond Densities
The simplest way to see the limitation of densities is to look at a measure concentrated entirely at one point. Such a measure is still countably additive and normalized, but it cannot be recovered by integrating a Lebesgue density over Borel sets.
[example: A Measure With No Density]
Let $P$ be defined on Borel sets $A\subset\mathbb{R}$ by
\begin{align*}
P(A)=\mathbb{1}_A(0).
\end{align*}
Then
\begin{align*}
P(\mathbb{R})=\mathbb{1}_{\mathbb{R}}(0)=1,
\end{align*}
so this is the point mass at $0$. For the singleton $\{0\}$,
\begin{align*}
P(\{0\})=\mathbb{1}_{\{0\}}(0)=1.
\end{align*}
We show that no Lebesgue density can represent this measure. For every $m\in\mathbb{N}$, the containment $\{0\}\subset (-1/m,1/m)$ gives
\begin{align*}
0\le \mathcal{L}^1(\{0\})\le \mathcal{L}^1((-1/m,1/m))=\frac{2}{m}.
\end{align*}
Letting $m\to\infty$ gives $\mathcal{L}^1(\{0\})=0$. Hence, for any nonnegative measurable function $f$,
\begin{align*}
\int_{\{0\}} f(x)\,d\mathcal{L}^1(x)=0,
\end{align*}
because the domain of integration has Lebesgue measure $0$.
If $P$ had a density $f$ with respect to $\mathcal{L}^1$, then the defining formula for a density would give
\begin{align*}
1=P(\{0\})=\int_{\{0\}} f(x)\,d\mathcal{L}^1(x)=0,
\end{align*}
a contradiction. Thus the point mass at $0$ is a probability measure, but it is not described by a Lebesgue density.
[/example]
The ability to handle point masses and densities in one framework is one of the main strengths of the measure-theoretic definition. A model may have both continuous and discrete pieces without changing the basic language of events and measures.
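A mixed model can be sketched by evaluating a point mass and a uniform measure on the same intervals and combining them. The mixture weights below are an assumption made for illustration, and the functions only handle closed intervals $[a,b]$ rather than general Borel sets.

```python
from fractions import Fraction

def dirac_at_zero(a, b):
    """Point mass at 0 on the interval [a, b]: 1 if 0 lies in it, else 0."""
    return Fraction(1) if a <= 0 <= b else Fraction(0)

def uniform_on_unit(a, b):
    """Uniform measure on [0, 1] evaluated on [a, b]: length of the overlap."""
    lo = max(Fraction(a), Fraction(0))
    hi = min(Fraction(b), Fraction(1))
    return max(hi - lo, Fraction(0))

def mixture(a, b, weight=Fraction(1, 2)):
    """Hypothetical mixture: weight * point mass + (1 - weight) * uniform."""
    return weight * dirac_at_zero(a, b) + (1 - weight) * uniform_on_unit(a, b)

print(mixture(0, 0))                             # the atom at 0 carries mass 1/2
print(mixture(Fraction(1, 4), Fraction(1, 2)))   # only the continuous part contributes
print(mixture(-1, 2))                            # total mass 1
```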
## Events Built From Limits
Probability theory often asks about events that are not seen at a single time. A sequence may hit a set infinitely often, converge to a limit, or eventually stay inside a tolerance. These are countable combinations of simpler events, so sigma algebras and countable additivity are designed for them.
[definition: Almost Sure Event]
Let $P$ be a probability measure on $(\Omega,\mathcal{F})$. An event $A\in\mathcal{F}$ holds almost surely if
\begin{align*}
P(A)&=1.
\end{align*}
[/definition]
The phrase almost surely lets probability ignore null events while still making exact mathematical statements. It is stronger than saying an event has high probability, and weaker than saying the event contains every outcome.
[example: Countable Intersections of Almost Sure Events]
Let $A_1,A_2,\ldots\in\mathcal{F}$ satisfy $P(A_n)=1$ for every $n$. For each $n$, set $B_n=\Omega\setminus A_n$. By the [complement rule](/theorems/4970) for probability measures in *[Basic Properties of Probability Measures](/theorems/1106)*,
\begin{align*}
P(B_n)=P(\Omega\setminus A_n)=1-P(A_n)=1-1=0.
\end{align*}
We will also use the following consequence of countable additivity, derived directly inside the example. For arbitrary events $E_1,E_2,\ldots$, define disjoint pieces as follows. First set $C_1=E_1$. For each $n\ge 2$, set
\begin{align*}
C_n=E_n\setminus\bigcup_{k=1}^{n-1}E_k.
\end{align*}
Then the $C_n$ are pairwise disjoint, $C_n\subset E_n$, and $\bigcup_{n=1}^{\infty}C_n=\bigcup_{n=1}^{\infty}E_n$. Countable additivity and monotonicity therefore give
\begin{align*}
P\left(\bigcup_{n=1}^{\infty}E_n\right)=\sum_{n=1}^{\infty}P(C_n)\le \sum_{n=1}^{\infty}P(E_n).
\end{align*}
Because $\mathcal{F}$ is a sigma algebra, each $B_n\in\mathcal{F}$ and $\bigcup_{n=1}^{\infty}B_n\in\mathcal{F}$. By De Morgan's law,
\begin{align*}
\Omega\setminus\bigcap_{n=1}^{\infty}A_n=\bigcup_{n=1}^{\infty}(\Omega\setminus A_n)=\bigcup_{n=1}^{\infty}B_n.
\end{align*}
Applying *[Countable Subadditivity](/theorems/1108)* to the sets $B_n$ gives
\begin{align*}
P\left(\bigcup_{n=1}^{\infty}B_n\right)\le \sum_{n=1}^{\infty}P(B_n)=\sum_{n=1}^{\infty}0=0.
\end{align*}
Since probabilities are nonnegative, this forces
\begin{align*}
P\left(\bigcup_{n=1}^{\infty}B_n\right)=0.
\end{align*}
Using the complement rule again,
\begin{align*}
P\left(\bigcap_{n=1}^{\infty}A_n\right)=1-P\left(\Omega\setminus\bigcap_{n=1}^{\infty}A_n\right)=1-P\left(\bigcup_{n=1}^{\infty}B_n\right)=1-0=1.
\end{align*}
Thus countably many almost sure conditions can be imposed simultaneously: the event on which all of them hold still has probability $1$.
[/example]
The previous example shows how subadditivity turns a countable list of almost sure conditions into one simultaneous almost sure condition. This explains both the strength and the limitation of the result: countable unions of exceptional sets remain controlled, while arbitrary uncountable unions need not be.
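The disjointification step $C_1=E_1$, $C_n=E_n\setminus\bigcup_{k<n}E_k$ can be traced on a finite model. The sketch below assumes a uniform measure on twelve outcomes and three overlapping events chosen for illustration; it checks the equality and the inequality from the subadditivity argument for this one instance.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 13))

def P(event):
    """Uniform probability on twelve outcomes (an assumed toy model)."""
    return Fraction(len(event), len(OMEGA))

def disjointify(events):
    """Return C_1 = E_1 and C_n = E_n minus the union of E_1, ..., E_{n-1}."""
    seen = frozenset()
    pieces = []
    for event in events:
        pieces.append(event - seen)
        seen |= event
    return pieces

events = [frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6}), frozenset({6, 7})]
pieces = disjointify(events)
union = frozenset().union(*events)

print([sorted(c) for c in pieces])
print(P(union), sum(P(c) for c in pieces), sum(P(e) for e in events))
```

Here the pieces are $\{1,2,3,4\}$, $\{5,6\}$, and $\{7\}$, so the union has probability $7/12$, while the sum of the original probabilities is $10/12$.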
Subadditivity is often the bridge from local estimates to global statements. Deeper convergence tools, such as the Borel--Cantelli lemmas, belong to probability theory built on top of this definition; the role of the probability measure here is to supply the countable estimate that those arguments use.
## Standard Constructions Built from Probability Measures
The main constructions of probability theory use probability measures rather than replacing them. A random variable is a measurable map out of a [probability space](/page/Probability%20Space), and its law is the probability measure obtained by transporting $P$ to the value space. Thus the formula
\begin{align*}
P_X(B)&=P(X^{-1}(B))
\end{align*}
for $B\in\mathcal{B}(\mathbb{R})$ is best read as a construction of a new probability measure, not as an additional axiom.
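The transport formula can be made concrete on a finite probability space. The sketch below assumes two fair dice with the uniform measure and takes $X$ to be the sum of the faces; both choices are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed model: two fair dice, each ordered pair of faces has mass 1/36.
OMEGA = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in OMEGA}

def X(w):
    """Random variable: the sum of the two faces."""
    return w[0] + w[1]

def law(B):
    """P_X(B) = P(X^{-1}(B)): total mass of the outcomes that X sends into B."""
    return sum((P[w] for w in OMEGA if X(w) in B), Fraction(0))

print(law({7}))               # six outcomes map to 7
print(law({2, 12}))           # one outcome each
print(law(set(range(2, 13)))) # every possible value of X
```

The function `law` is again normalized and additive over disjoint sets of values, which is what it means for $P_X$ to be a probability measure on the value space.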
Independence is another construction-level idea. For events $A,B\in\mathcal{F}$, independence means
\begin{align*}
P(A\cap B)&=P(A)P(B),
\end{align*}
and product probability measures are the standard way to build spaces where this multiplication rule holds for events coming from separate experiments. The detailed theory of random variables, distributions, independence, and product measures belongs on their dedicated pages; here their purpose is to show how much of probability rests on the single definition of a normalized, countably additive measure.
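A finite product construction shows the multiplication rule in action. The sketch below assumes a hypothetical biased coin and a fair die, builds the product mass function, and checks independence for one event that depends only on the coin and one that depends only on the die.

```python
from fractions import Fraction
from itertools import product

coin = {"H": Fraction(1, 3), "T": Fraction(2, 3)}   # hypothetical biased coin
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Product mass function on pairs (coin result, die face).
product_pmf = {(c, d): coin[c] * die[d] for c, d in product(coin, die)}

def P(event):
    """Probability of an event in the product space, by summing point masses."""
    return sum((product_pmf[w] for w in event), Fraction(0))

A = {w for w in product_pmf if w[0] == "H"}        # depends on the coin only
B = {w for w in product_pmf if w[1] % 2 == 0}      # depends on the die only

print(P(A), P(B), P(A & B), P(A) * P(B))
```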
## Beyond Probability Measure
Probability measures are normalized measures, so they sit inside the broader theory of [Measure](/page/Measure), measurable spaces, and [Measure Space](/page/Measure%20Space). Those pages develop the general language without the total-mass-one normalization, keeping the probability measure visible as the special case where the total mass is $1$.
The next probabilistic layer is the study of [Random Variable](/page/Random%20Variable), distribution, expectation, and convergence. In that direction, the probability measure supplies the integration theory used to define expectation and $L^p$ spaces of random variables.
For course-level development, [Cambridge IA Probability](/page/Cambridge%20IA%20Probability) introduces elementary probabilistic reasoning, while [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure) develops the measure-theoretic framework in which probability measures, random variables, independence, and convergence theorems are studied together.
## References
Androma, [Cambridge IA Probability](/page/Cambridge%20IA%20Probability).
Androma, [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure).
Androma, [Cambridge III Advanced Probability](/page/Cambridge%20III%20Advanced%20Probability).
Androma, [Measure](/page/Measure).
Androma, [Measure Space](/page/Measure%20Space).
Androma, [Geometric Measure Theory I: Measures and Hausdorff Dimension](/page/Geometric%20Measure%20Theory%20I%3A%20Measures%20and%20Hausdorff%20Dimension).
Androma, [Random Variable](/page/Random%20Variable).
Patrick Billingsley, *Probability and Measure* (1995).
David Williams, *Probability with Martingales* (1991).
Olav Kallenberg, *Foundations of Modern Probability* (2002).
Probability Measure
Also known as: Probability Law, Probability Distribution, Probability Measure Axioms, Normalized Measure