An event is the mathematical form of a question about an experiment. A coin is tossed, a card is drawn, a point is sampled, a random path is observed; in every case probability can only assign numbers after the question has been translated into a set of outcomes. The theory of events explains which questions are allowed, how logical operations on questions become set operations, and why countable limits of questions remain inside probability theory.
The main danger is that verbal questions feel more flexible than measurable sets. In a finite model this causes no trouble, because every subset can be assigned a probability. In an uncountable model, such as a uniformly chosen point of $[0,1]$, asking for probabilities of all subsets is incompatible with the usual rules of length and countable additivity. Events are therefore not arbitrary sentences; they are measurable subsets chosen as part of the model.
[example: A Die Roll as a Set of Outcomes]
Let $\Omega=\{1,2,3,4,5,6\}$ with the uniform probability measure, so $\mathbb P(\{k\})=1/6$ for each $k\in\Omega$. The question "is the outcome even?" is represented by the event
\begin{align*}
A=\{\omega\in\Omega:\omega\text{ is even}\}=\{2,4,6\}\subset\Omega.
\end{align*}
The three singleton events $\{2\}$, $\{4\}$, and $\{6\}$ are pairwise disjoint, and
\begin{align*}
A=\{2\}\cup\{4\}\cup\{6\}.
\end{align*}
By finite additivity for disjoint events,
\begin{align*}
\mathbb P(A)
&=\mathbb P(\{2\}\cup\{4\}\cup\{6\})\\
&=\mathbb P(\{2\})+\mathbb P(\{4\})+\mathbb P(\{6\})\\
&=\frac{1}{6}+\frac{1}{6}+\frac{1}{6}\\
&=\frac{3}{6}\\
&=\frac{1}{2}.
\end{align*}
Thus the probabilistic question is answered by first identifying the set of outcomes where the phrase "even outcome" is true, and then measuring that set.
[/example]
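The same two-step pattern, identify the set and then measure it, can be sketched in code. The following is a minimal Python illustration, assuming a finite uniform model stored as a set of outcomes; the helper name `prob` is hypothetical and chosen only for this sketch.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (uniform model).
omega = {1, 2, 3, 4, 5, 6}

def prob(event, sample_space=omega):
    """Uniform probability of a subset of a finite sample space."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    # Finite additivity over singletons, each of mass 1/|sample_space|.
    return sum((Fraction(1, len(sample_space)) for _ in event), Fraction(0))

# Translate the question "is the outcome even?" into a set of outcomes.
even = {w for w in omega if w % 2 == 0}

print(sorted(even))  # [2, 4, 6]
print(prob(even))    # 1/2
```

Exact fractions are used instead of floating-point numbers so that the output matches the hand computation term by term.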
This finite example is deliberately simple, but it already contains the whole pattern. We need a sample space of possible outcomes, a class of subsets that count as observable questions, and a probability rule on that class. Before naming events themselves, we should build the structure in which such questions are allowed to live.
## Event Structures
The event collection cannot be an arbitrary list of favourite questions. If a model can answer a question, it should also answer its negation; if it can answer countably many questions, it should answer whether at least one of them occurs. These requirements lead to the closure rules of a sigma-algebra.
[definition: Sigma-Algebra of Events]
Let $\Omega$ be a set. A sigma-algebra of events on $\Omega$ is a collection $\mathcal F\subset\mathcal P(\Omega)$ such that
\begin{align*}
&\Omega\in\mathcal F,\\
&A\in\mathcal F \implies A^c\in\mathcal F,\\
&A_1,A_2,\dots\in\mathcal F \implies \bigcup_{n=1}^{\infty}A_n\in\mathcal F.
\end{align*}
[/definition]
The [complement rule](/theorems/4970) represents the logical word "not", and the countable union rule represents countable uses of "or". These closure rules also give countable intersections by taking complements. Once this allowable class of questions is fixed, the next ingredient is the numerical rule that measures them.
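On a finite sample space the closure rules can be checked mechanically, because countable unions reduce to finite ones. The sketch below is one possible check, not a canonical implementation; the function name `is_sigma_algebra` and the two test families are assumptions made for illustration.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the sigma-algebra axioms for a family of subsets of a finite set.

    On a finite sample space, countable unions reduce to finite unions,
    so closure under pairwise unions is enough.
    """
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    if any(not a <= omega for a in family):
        return False  # every member must be a subset of omega
    if omega not in family:
        return False  # the whole space must be an event
    if any(omega - a not in family for a in family):
        return False  # closure under complements
    return all(a | b in family for a, b in combinations(family, 2))

omega = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, omega]
bad = [set(), {1}, omega]  # the complement {2, 3, 4} is missing

print(is_sigma_algebra(omega, good))  # True
print(is_sigma_algebra(omega, bad))   # False
```

Closure under intersections is not tested separately, since it follows from the complement and union rules.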
[definition: Probability Space]
A probability space is a triple $(\Omega,\mathcal F,\mathbb P)$ consisting of a sample space $\Omega$, a sigma-algebra $\mathcal F$ on $\Omega$, and a function
\begin{align*}
\mathbb P:\mathcal F&\to[0,1]
\end{align*}
such that $\mathbb P(\Omega)=1$ and, whenever $A_1,A_2,\dots\in\mathcal F$ are pairwise disjoint,
\begin{align*}
\mathbb P\left(\bigcup_{n=1}^{\infty}A_n\right)=\sum_{n=1}^{\infty}\mathbb P(A_n).
\end{align*}
[/definition]
This definition separates three choices that are often blurred: possible outcomes, observable questions, and assigned probabilities. With the ambient probability space in place, the primary definition can now be stated in its compact form.
## Definition
The definition of an event answers the smallest modelling question: once a probability space has been chosen, what kind of object is a probabilistic question? It should be something to which $\mathbb P$ can assign a number. This forces events to be members of the sigma-algebra, not merely subsets named in prose.
[definition: Event]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space. An event is an element $A\in\mathcal F$.
[/definition]
The definition is short because the structure has been pushed into $\mathcal F$. In a countable model, individual outcomes can be useful building blocks for events; when a singleton has been admitted into $\mathcal F$, it is often called an elementary event. Elementary events are decisive in finite probability, but they do not capture the full behaviour of continuous probability. This distinction motivates the first warning example after the definition. It shows why events cannot always be understood by adding the masses of individual outcomes.
[example: Singletons in a Continuous Model]
Let $\Omega=[0,1]$, let $\mathcal F=\mathcal B([0,1])$, where $\mathcal B([0,1])$ denotes the Borel sigma-algebra on $[0,1]$, and let $\mathbb P$ be [Lebesgue measure](/page/Lebesgue%20Measure) restricted to $[0,1]$. For $a\in[0,1]$, the singleton is the degenerate interval
\begin{align*}
\{a\}=[a,a].
\end{align*}
It is closed in $[0,1]$, hence it belongs to $\mathcal B([0,1])$, so it is an event. Since restricted Lebesgue measure agrees with interval length on intervals contained in $[0,1]$,
\begin{align*}
\mathbb P(\{a\})
&=\mathbb P([a,a])\\
&=a-a\\
&=0.
\end{align*}
For $0\le r<s\le 1$, the interval $[r,s]$ is also closed in $[0,1]$, hence is an event, and
\begin{align*}
\mathbb P([r,s])
&=s-r.
\end{align*}
Thus every individual point has probability zero, but an interval with distinct endpoints has positive probability equal to its length.
[/example]
The continuous example forces us to respect the sigma-algebra. It also prepares the algebraic viewpoint: events are sets, so the logic of probability is built from set operations. The next section develops that calculus because almost every probability computation begins by rewriting the target event.
## Algebra of Events
### Intersections and Unions
The same event can be described in many verbal ways, and the most useful description is often obtained by combining simpler events. To compute probabilities, we need reliable translations of "and", "or", and "not". The next definition starts with simultaneous occurrence, since overlap is the quantity that appears in conditional probability and independence.
[definition: Intersection of Events]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and let $A,B\in\mathcal F$. The intersection event of $A$ and $B$ is
\begin{align*}
A\cap B=\{\omega\in\Omega: \omega\in A \text{ and } \omega\in B\}.
\end{align*}
[/definition]
The next operation is needed for questions that are answered yes when at least one alternative occurs. Such questions cannot be handled by intersections alone, because they ask for inclusion in either event. Defining the union prepares the probability formula that corrects for possible overlap.
[definition: Union of Events]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and let $A,B\in\mathcal F$. The union event of $A$ and $B$ is
\begin{align*}
A\cup B=\{\omega\in\Omega: \omega\in A \text{ or } \omega\in B\}.
\end{align*}
[/definition]
A union counts outcomes in either event, so adding $\mathbb P(A)$ and $\mathbb P(B)$ counts the overlap twice. The next theorem gives the correction that should be applied whenever the events are not known to be disjoint. It is the basic addition formula used throughout elementary probability.
[quotetheorem:4969]
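As a quick numerical check, assume the quoted result is the usual two-event addition rule $\mathbb P(A\cup B)=\mathbb P(A)+\mathbb P(B)-\mathbb P(A\cap B)$. The Python sketch below verifies it on a fair die; the events `A` and `B` are hypothetical choices made only for this illustration.

```python
from fractions import Fraction

omega = set(range(1, 7))  # one roll of a fair die

def prob(event):
    """Uniform probability on the die sample space."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # the outcome is even
B = {4, 5, 6}   # the outcome is at least 4

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
naive = prob(A) + prob(B)  # ignores the overlap {4, 6}

print(lhs, rhs)  # 2/3 2/3
print(naive)     # 1
```

The naive sum overshoots by exactly $\mathbb P(A\cap B)=1/3$.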
The subtraction term removes the double count of the intersection. A different simplification becomes available when the event we want is complicated but its failure is simple. That situation motivates the complement, the event corresponding to negating a question inside the same sample space.
### Complements and Disjointness
Negating an event keeps the same sample space but reverses the answer to the question. This operation is needed whenever a failure event is simpler than the event we want to count, and it prepares the partition of $\Omega$ into an event and its complement.
[definition: Complementary Event]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and let $A\in\mathcal F$. The complementary event of $A$ is
\begin{align*}
A^c=\Omega\setminus A.
\end{align*}
[/definition]
The complement turns "at least one success" into "not zero successes" and, by De Morgan's laws, turns the complement of a union into an intersection of complements. Since $A$ and $A^c$ partition the whole sample space, their probabilities must add to one. The next theorem records this rule because it is the standard route through many counting problems.
[quotetheorem:4970]
Complementation is a negation operation; disjointness is an exclusion relation between two events. The distinction matters because exclusion is often confused with independence. We need to name disjointness before explaining why it makes addition simpler but dependence stronger.
[definition: Disjoint Events]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space. Events $A,B\in\mathcal F$ are disjoint if
\begin{align*}
A\cap B=\varnothing.
\end{align*}
[/definition]
Disjoint events have no overlap, so the correction term in the addition formula vanishes. Finite additivity handles a finite case split, and the probability axioms extend this to countably many disjoint alternatives. The next example shows how complementing an event can be faster than listing every successful case. It also illustrates how event algebra chooses the right description before any arithmetic begins.
[example: At Least One Head]
A fair coin is tossed three times, so the sample space is
\begin{align*}
\Omega=\{HHH,HHT,HTH,HTT,THH,THT,TTH,TTT\},
\end{align*}
with each outcome having probability $1/8$. Let $A$ be the event that at least one toss is heads. Its complement is the event that no toss is heads, hence
\begin{align*}
A^c=\{TTT\}.
\end{align*}
Since $A$ and $A^c$ are disjoint and $A\cup A^c=\Omega$, finite additivity gives
\begin{align*}
1
&=\mathbb P(\Omega)\\
&=\mathbb P(A\cup A^c)\\
&=\mathbb P(A)+\mathbb P(A^c).
\end{align*}
Therefore
\begin{align*}
\mathbb P(A)
&=1-\mathbb P(A^c)\\
&=1-\mathbb P(\{TTT\})\\
&=1-\frac{1}{8}\\
&=\frac{8}{8}-\frac{1}{8}\\
&=\frac{7}{8}.
\end{align*}
The complement calculation replaces seven favourable outcomes by the single failure outcome $TTT$.
[/example]
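The complement shortcut can be compared against direct enumeration. This is a small Python sketch, assuming outcomes are encoded as three-letter strings over `H` and `T`; the encoding and variable names are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# All 2^3 equally likely outcomes of three tosses, e.g. "HTH".
omega = {"".join(t) for t in product("HT", repeat=3)}

def prob(event):
    """Uniform probability on the three-toss sample space."""
    return Fraction(len(event), len(omega))

at_least_one_head = {w for w in omega if "H" in w}
no_heads = omega - at_least_one_head  # the complementary event

print(sorted(no_heads))          # ['TTT']
print(prob(at_least_one_head))   # 7/8
print(1 - prob(no_heads))        # 7/8
```

Both routes agree; the complement route only needs the single outcome `TTT`.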
The algebra of events is useful even when events are written explicitly, but most probability problems do not present events that way. They present random variables and ask questions about their values. The next section explains how such questions become events through preimages.
## Events Generated by Random Variables
### Preimages and Thresholds
A random variable is a measurable map, and every measurable question about its value pulls back to an event in the sample space. This mechanism is why notation such as $\{X\le a\}$ is meaningful. The next definition makes the hidden preimage explicit so that the event remains typed as a subset of $\Omega$.
[definition: Event Determined by a Random Variable]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, let $(E,\mathcal E)$ be a measurable space, let $X:(\Omega,\mathcal F)\to(E,\mathcal E)$ be a random variable, and let $B\in\mathcal E$. The event that $X$ lies in $B$ is
\begin{align*}
\{X\in B\}=X^{-1}(B)=\{\omega\in\Omega:X(\omega)\in B\}.
\end{align*}
[/definition]
The notation $\{X\in B\}$ is compact, but it should always be read as a preimage in $\Omega$. For real-valued random variables, the most common sets $B$ are half-lines and intervals. This motivates threshold events, which encode distribution functions and tail probabilities.
[definition: Threshold Event]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, let $X:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ be a real-valued random variable, and let $a\in\mathbb R$. A threshold event is an event of one of the forms
\begin{align*}
\{X\le a\},\qquad \{X<a\},\qquad \{X\ge a\},\qquad \{X>a\}.
\end{align*}
[/definition]
Threshold events are the raw material of distribution functions and tail estimates. Their eventhood is not a separate assumption; it follows from measurability of $X$ and the Borel nature of half-lines. The next theorem records this point because it justifies the most common probability notation involving random variables.
[quotetheorem:4971]
This theorem explains why $\mathbb P(X\le a)$ is a legitimate expression rather than informal shorthand. It also warns us that a non-measurable function would fail exactly here. The following example keeps the sample space finite while showing the preimage calculation explicitly.
[example: Squaring a Die Roll]
Let $\Omega=\{1,2,3,4,5,6\}$ with the uniform probability measure, so $\mathbb P(\{k\})=1/6$ for each $k\in\Omega$. Define
\begin{align*}
X:\Omega&\to\mathbb R\\
\omega&\mapsto \omega^2.
\end{align*}
The threshold event $\{X\le 10\}$ is the preimage of $(-\infty,10]$, so
\begin{align*}
\{X\le 10\}
&=\{\omega\in\Omega:X(\omega)\le 10\}\\
&=\{\omega\in\{1,2,3,4,5,6\}:\omega^2\le 10\}.
\end{align*}
Checking the six possible outcomes,
\begin{align*}
1^2&=1\le 10,\\
2^2&=4\le 10,\\
3^2&=9\le 10,\\
4^2&=16>10,\\
5^2&=25>10,\\
6^2&=36>10.
\end{align*}
Hence
\begin{align*}
\{X\le 10\}=\{1,2,3\}.
\end{align*}
The singleton events $\{1\}$, $\{2\}$, and $\{3\}$ are pairwise disjoint, and
\begin{align*}
\{1,2,3\}=\{1\}\cup\{2\}\cup\{3\}.
\end{align*}
By finite additivity for disjoint events,
\begin{align*}
\mathbb P(X\le 10)
&=\mathbb P(\{X\le 10\})\\
&=\mathbb P(\{1,2,3\})\\
&=\mathbb P(\{1\}\cup\{2\}\cup\{3\})\\
&=\mathbb P(\{1\})+\mathbb P(\{2\})+\mathbb P(\{3\})\\
&=\frac{1}{6}+\frac{1}{6}+\frac{1}{6}\\
&=\frac{3}{6}\\
&=\frac{1}{2}.
\end{align*}
The inequality $X\le 10$ is tested in $\mathbb R$, but the event whose probability is measured is the subset $\{1,2,3\}$ of die outcomes.
[/example]
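The preimage computation in this example translates directly into a set comprehension. The sketch below assumes the same uniform die model; the helper `preimage` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def X(w):
    """The random variable: square of the die outcome."""
    return w ** 2

def preimage(f, condition, sample_space):
    """Return {w in sample_space : condition(f(w)) holds}."""
    return {w for w in sample_space if condition(f(w))}

# The condition is tested on values of X, but the result is a subset of omega.
event = preimage(X, lambda x: x <= 10, omega)

print(sorted(event))                     # [1, 2, 3]
print(Fraction(len(event), len(omega)))  # 1/2
```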
### Information Generated by an Observation
Knowing the value of a random variable does not reveal every event in $\mathcal F$. It reveals exactly those events whose truth can be decided from that value. This motivates the sigma-algebra generated by a random variable, the event structure corresponding to an observation.
[definition: Sigma-Algebra Generated by a Random Variable]
Let $(\Omega,\mathcal F)$ be a measurable space, let $(E,\mathcal E)$ be a measurable space, and let $X:(\Omega,\mathcal F)\to(E,\mathcal E)$ be a measurable map. The sigma-algebra generated by $X$ is
\begin{align*}
\sigma(X)=\{X^{-1}(B):B\in\mathcal E\}.
\end{align*}
[/definition]
The sigma-algebra $\sigma(X)$ records the events observable from $X$. In probability this is the first appearance of events as information rather than only as subsets. The next example shows the distinction between events decided by an observation and events left unresolved by it.
[example: Information from the First Toss]
Let two fair coin tosses be modelled by $\Omega=\{HH,HT,TH,TT\}$ with the uniform probability measure. Let $X$ record the first toss, so
\begin{align*}
X(HH)&=H,& X(HT)&=H,& X(TH)&=T,& X(TT)&=T.
\end{align*}
Taking the possible observed values to be $\{H,T\}$ with its full sigma-algebra, the sigma-algebra generated by $X$ is
\begin{align*}
\sigma(X)=\{X^{-1}(B):B\subset\{H,T\}\}.
\end{align*}
There are four subsets of $\{H,T\}$, and their preimages are
\begin{align*}
X^{-1}(\varnothing)
&=\{\omega\in\Omega:X(\omega)\in\varnothing\}
=\varnothing,\\
X^{-1}(\{H\})
&=\{\omega\in\Omega:X(\omega)=H\}
=\{HH,HT\},\\
X^{-1}(\{T\})
&=\{\omega\in\Omega:X(\omega)=T\}
=\{TH,TT\},\\
X^{-1}(\{H,T\})
&=\{\omega\in\Omega:X(\omega)\in\{H,T\}\}
=\Omega.
\end{align*}
Hence
\begin{align*}
\sigma(X)=\{\varnothing,\Omega,\{HH,HT\},\{TH,TT\}\}.
\end{align*}
The event that the first toss is heads is $\{HH,HT\}$, so it belongs to $\sigma(X)$. The event that the second toss is heads is $\{HH,TH\}$, and
\begin{align*}
\{HH,TH\}\ne\varnothing,\qquad
\{HH,TH\}\ne\Omega,\qquad
\{HH,TH\}\ne\{HH,HT\},\qquad
\{HH,TH\}\ne\{TH,TT\}.
\end{align*}
Therefore $\{HH,TH\}\notin\sigma(X)$. Knowing the first toss decides exactly which of the two blocks $\{HH,HT\}$ and $\{TH,TT\}$ occurred, but it does not decide whether the second toss was heads.
[/example]
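For a map with finitely many values, the generated sigma-algebra can be listed by taking the preimage of every subset of the value set. The Python sketch below does this for the first-toss observation; the function names are assumptions made for this illustration, and the value set is taken to be the image of the map.

```python
from itertools import chain, combinations

omega = ["HH", "HT", "TH", "TT"]

def first_toss(w):
    """The observation X: the result of the first toss."""
    return w[0]

def powerset(values):
    """All subsets of a finite collection, as tuples."""
    values = list(values)
    return chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1)
    )

def generated_sigma_algebra(f, sample_space):
    """Preimages under f of every subset of the finite set of values of f."""
    values = {f(w) for w in sample_space}
    return {
        frozenset(w for w in sample_space if f(w) in B)
        for B in powerset(values)
    }

sigma_X = generated_sigma_algebra(first_toss, omega)

print(len(sigma_X))                        # 4
print(frozenset({"HH", "HT"}) in sigma_X)  # True: first toss is heads
print(frozenset({"HH", "TH"}) in sigma_X)  # False: second toss is heads
```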
Once events represent information, the next question is how probabilities change when an event is known to have occurred. This leads from generated sigma-algebras to conditional probability. The elementary version conditions on a single event of positive probability.
## Conditional Events and Independence
### Conditioning on an Event
Conditioning restricts attention to the part of the sample space where the information event has occurred. The probability of $A$ after learning $B$ should count the overlap $A\cap B$ and then renormalise by the size of $B$. This motivates the formula below and also explains why $\mathbb P(B)>0$ is required.
[definition: Conditional Probability Given an Event]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, and let $A,B\in\mathcal F$ with $\mathbb P(B)>0$. The conditional probability of $A$ given $B$ is
\begin{align*}
\mathbb P(A\mid B)=\frac{\mathbb P(A\cap B)}{\mathbb P(B)}.
\end{align*}
[/definition]
The denominator makes $B$ the new effective universe, but the formula is useful only if it still obeys the probability axioms. Otherwise conditioning would be a convenient ratio rather than a genuine probability model. The result below checks that, once $B$ is fixed and $\mathbb P(B)>0$, the assignment $A\mapsto\mathbb P(A\mid B)$ supports the same complement, union, and disjoint-decomposition rules as ordinary probability.
[quotetheorem:4972]
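A finite sanity check of this behaviour can be written in a few lines. The Python sketch below assumes a fair die and hypothetical events `A` and `B`; it confirms the total-mass, complement, and disjoint-decomposition rules in one instance and is not a proof.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def prob(event):
    """Uniform probability on the die sample space."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(a | b) = P(a & b) / P(b), defined only when P(b) > 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event must have positive probability")
    return prob(a & b) / prob(b)

B = frozenset({4, 5, 6})  # information: the roll is at least 4
A = frozenset({2, 4, 6})  # question: the roll is even

print(cond_prob(A, B))                            # 2/3
print(cond_prob(omega, B))                        # 1
print(cond_prob(A, B) + cond_prob(omega - A, B))  # 1
```

Splitting `A` into its singletons gives conditional probabilities $0$, $1/3$, and $1/3$, which again sum to $2/3$.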
This theorem confirms that conditioning is not an informal change of mood; it is a new probability measure. Once that is known, conditional probability must support countable disjoint case splits just as ordinary probability does. The next theorem records this conditional version of countable additivity.
[quotetheorem:1112]
Conditional countable additivity is the rule that lets us decompose an event into disjoint cases after information $B$ has already been fixed. It also prepares the definition of independence, where conditioning on one event leaves the probability of another unchanged. The overlap formula makes the multiplicative criterion natural.
### Independence and Joint Information
Independence asks whether two pieces of event information interact probabilistically. The definition must use intersections because simultaneous occurrence is where information from two events is tested.
[definition: Independent Events]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space. Events $A,B\in\mathcal F$ are independent if
\begin{align*}
\mathbb P(A\cap B)=\mathbb P(A)\mathbb P(B).
\end{align*}
[/definition]
When $\mathbb P(B)>0$, this condition is equivalent to $\mathbb P(A\mid B)=\mathbb P(A)$. The multiplicative definition also covers zero-probability conditioning events without division. A first consistency check is that independence should not depend on whether the second question is phrased positively or negatively: if learning $B$ gives no information about $A$, then learning $B^c$ should give no information about $A$ either.
[quotetheorem:1115]
Independence is therefore a relation between the information carried by events, not between their positive descriptions alone. This raises the next problem: for more than two events, pairwise checks can miss a constraint that appears only when several events are intersected at once. We need a family-level definition that forces the product formula for every finite subfamily.
[definition: Independent Family of Events]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and let $(A_i)_{i\in I}$ be a family of events. The family is independent if for every finite subset $J\subset I$,
\begin{align*}
\mathbb P\left(\bigcap_{j\in J}A_j\right)=\prod_{j\in J}\mathbb P(A_j).
\end{align*}
[/definition]
The finite-subfamily condition is stronger than pairwise independence. This stronger condition is needed in limit theorems and repeated-trial arguments, where many events must behave as though no hidden constraint links them. The next example shows the failure of pairwise checks.
[example: Pairwise Independent but Not Independent]
Let $\Omega=\{HH,HT,TH,TT\}$ with the uniform probability measure, so each singleton outcome has probability $1/4$. Define
\begin{align*}
A=\{HH,HT\},\qquad B=\{HH,TH\},\qquad C=\{HH,TT\}.
\end{align*}
The events decompose into disjoint singleton unions:
\begin{align*}
A&=\{HH\}\cup\{HT\},\\
B&=\{HH\}\cup\{TH\},\\
C&=\{HH\}\cup\{TT\}.
\end{align*}
Therefore
\begin{align*}
\mathbb P(A)&=\mathbb P(\{HH\})+\mathbb P(\{HT\})
=\frac{1}{4}+\frac{1}{4}
=\frac{1}{2},\\
\mathbb P(B)&=\mathbb P(\{HH\})+\mathbb P(\{TH\})
=\frac{1}{4}+\frac{1}{4}
=\frac{1}{2},\\
\mathbb P(C)&=\mathbb P(\{HH\})+\mathbb P(\{TT\})
=\frac{1}{4}+\frac{1}{4}
=\frac{1}{2}.
\end{align*}
The pairwise intersections are
\begin{align*}
A\cap B
&=\{HH,HT\}\cap\{HH,TH\}
=\{HH\},\\
A\cap C
&=\{HH,HT\}\cap\{HH,TT\}
=\{HH\},\\
B\cap C
&=\{HH,TH\}\cap\{HH,TT\}
=\{HH\}.
\end{align*}
Hence
\begin{align*}
\mathbb P(A\cap B)&=\frac{1}{4}
=\frac{1}{2}\cdot\frac{1}{2}
=\mathbb P(A)\mathbb P(B),\\
\mathbb P(A\cap C)&=\frac{1}{4}
=\frac{1}{2}\cdot\frac{1}{2}
=\mathbb P(A)\mathbb P(C),\\
\mathbb P(B\cap C)&=\frac{1}{4}
=\frac{1}{2}\cdot\frac{1}{2}
=\mathbb P(B)\mathbb P(C).
\end{align*}
Thus the events are pairwise independent.
For the triple intersection,
\begin{align*}
A\cap B\cap C
&=\{HH,HT\}\cap\{HH,TH\}\cap\{HH,TT\}\\
&=\{HH\}.
\end{align*}
Therefore
\begin{align*}
\mathbb P(A\cap B\cap C)
&=\mathbb P(\{HH\})\\
&=\frac{1}{4},
\end{align*}
while
\begin{align*}
\mathbb P(A)\mathbb P(B)\mathbb P(C)
&=\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{2}\\
&=\frac{1}{8}.
\end{align*}
Since $\frac{1}{4}\ne\frac{1}{8}$, the product formula fails for the finite subfamily $\{A,B,C\}$, so $A,B,C$ are not an independent family.
[/example]
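The example can be verified by testing the product formula on every subfamily. This is a minimal Python sketch under the same uniform two-toss model; `product_rule_holds` and `is_independent_family` are hypothetical helper names.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations

omega = frozenset({"HH", "HT", "TH", "TT"})

def prob(event):
    """Uniform probability on the two-toss sample space."""
    return Fraction(len(event), len(omega))

def product_rule_holds(events):
    """Check P(intersection) == product of probabilities for one subfamily."""
    inter = reduce(lambda a, b: a & b, events, omega)
    prod = reduce(lambda p, e: p * prob(e), events, Fraction(1))
    return prob(inter) == prod

def is_independent_family(events):
    """Test the product rule on every subfamily with at least two members."""
    return all(
        product_rule_holds(sub)
        for r in range(2, len(events) + 1)
        for sub in combinations(events, r)
    )

A = frozenset({"HH", "HT"})
B = frozenset({"HH", "TH"})
C = frozenset({"HH", "TT"})

pairwise = all(product_rule_holds(pair) for pair in combinations([A, B, C], 2))
print(pairwise)                          # True
print(is_independent_family([A, B, C]))  # False
```

Subfamilies with zero or one member satisfy the product formula trivially, so the check starts at size two.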
The pairwise checks all succeed here; what fails is independence of the family, because the triple intersection carries a constraint that no single pair reveals. This raises a second problem: disjointness is often confused with independence. Disjoint events of positive probability cannot occur together, so learning that one of them has occurred forces the other to fail. We need the theorem below to make this incompatibility explicit.
[quotetheorem:4973]
Disjoint events of positive probability exclude each other; independent events do not change each other's probabilities. This contrast becomes even more important for sequences, where repeated events can be disjoint, independent, nested, or none of these. The next section studies event sequences and the long-run questions they define.
## Limiting Events
Probability theory often asks not only whether an event occurs, but whether events occur forever, eventually stop, or recur infinitely often. Such statements use countably many unions and intersections. The sigma-algebra axioms are designed to keep these long-run questions inside the event space.
[definition: Event Occurring Infinitely Often]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and let $(A_n)_{n\in\mathbb N}$ be a sequence of events. The event that $A_n$ occurs infinitely often is
\begin{align*}
\{A_n\text{ i.o.}\}=\limsup_{n\to\infty}A_n=\bigcap_{n=1}^{\infty}\bigcup_{m\ge n}A_m.
\end{align*}
[/definition]
This event consists of exactly the outcomes that belong to $A_m$ for arbitrarily large indices $m$, that is, for infinitely many $m$. A different long-run event asks whether all sufficiently late events occur. That condition is stronger, and it is expressed by reversing the order of the countable union and intersection.
[definition: Event Occurring Eventually Always]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and let $(A_n)_{n\in\mathbb N}$ be a sequence of events. The event that $A_n$ occurs eventually always is
\begin{align*}
\liminf_{n\to\infty}A_n=\bigcup_{n=1}^{\infty}\bigcap_{m\ge n}A_m.
\end{align*}
[/definition]
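The two limiting events can be compared on a toy sequence. The Python sketch below uses a hypothetical sequence `A(n)` on the three-point space $\{a,b,c\}$ and replaces the infinite unions and intersections by finite truncations. That truncation is an assumption that happens to be exact here, because the sequence is periodic with period $2$ after index $5$; it is not a general method.

```python
def A(n):
    """Hypothetical sequence of events in the sample space {"a", "b", "c"}."""
    event = {"a"}          # occurs for every n
    if n % 2 == 0:
        event.add("b")     # occurs for every even n
    if n <= 5:
        event.add("c")     # occurs only for the first five indices
    return event

def tail_union(n, horizon):
    """Union of A(m) for n <= m <= horizon."""
    return set().union(*(A(m) for m in range(n, horizon + 1)))

def tail_intersection(n, horizon):
    """Intersection of A(m) for n <= m <= horizon."""
    return set.intersection(*(A(m) for m in range(n, horizon + 1)))

# Truncation levels: exact here because A is 2-periodic after n = 5.
START, HORIZON = 10, 20

limsup = set.intersection(*(tail_union(n, HORIZON) for n in range(1, START + 1)))
liminf = set().union(*(tail_intersection(n, HORIZON) for n in range(1, START + 1)))

print(sorted(limsup))  # ['a', 'b']
print(sorted(liminf))  # ['a']
```

The outcome `b` occurs infinitely often but not eventually always, and `c` occurs only finitely often, so it belongs to neither limiting event.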
The liminf event consists of exactly the outcomes for which all but finitely many $A_n$ occur. Since both constructions use countable operations, they should remain events whenever the $A_n$ are events. The next theorem states this closure property explicitly because it legitimises almost sure convergence and recurrence statements.
[quotetheorem:4879]
The theorem shows why sigma-algebras use countable closure rather than only finite closure. Once limiting events are available, we can ask when infinitely many occurrences have probability zero or one. The Borel-Cantelli lemmas are the standard answers.
[quotetheorem:507]
The first lemma says that events with summable probabilities cannot keep occurring indefinitely except on a null event. The reverse conclusion needs an additional no-conspiracy condition. Independence supplies that condition, so the second lemma belongs naturally after the first.
[quotetheorem:508]
Together the two lemmas draw a sharp line for many independent rare-event models. Summable probabilities produce only finitely many occurrences almost surely; divergent total probability produces infinitely many occurrences almost surely under independence. The next example makes the threshold visible.
[example: Rare Events with Harmonic Probabilities]
Let $(A_n)_{n\in\mathbb N}$ be independent events with $\mathbb P(A_n)=1/n$. To check the hypothesis of the *Second Borel-Cantelli Lemma*, group the harmonic series in dyadic blocks:
\begin{align*}
\sum_{n=1}^{\infty}\frac{1}{n}
&=1+\frac{1}{2}+\left(\frac{1}{3}+\frac{1}{4}\right)+\left(\frac{1}{5}+\cdots+\frac{1}{8}\right)+\cdots.
\end{align*}
For each $k\ge 1$, the block from $2^k+1$ to $2^{k+1}$ has $2^k$ terms, and each term is at least $1/2^{k+1}$, so
\begin{align*}
\sum_{n=2^k+1}^{2^{k+1}}\frac{1}{n}
&\ge 2^k\cdot \frac{1}{2^{k+1}}\\
&=\frac{1}{2}.
\end{align*}
Thus the partial sums contain arbitrarily many disjoint blocks each contributing at least $1/2$, so
\begin{align*}
\sum_{n=1}^{\infty}\mathbb P(A_n)
&=\sum_{n=1}^{\infty}\frac{1}{n}\\
&=\infty.
\end{align*}
Since the events are independent, the *Second Borel-Cantelli Lemma* gives
\begin{align*}
\mathbb P(A_n\text{ i.o.})=1.
\end{align*}
If instead $\mathbb P(A_n)=1/n^2$, then for every $n\ge 2$,
\begin{align*}
\frac{1}{n^2}
&\le \frac{1}{n(n-1)}\\
&=\frac{1}{n-1}-\frac{1}{n}.
\end{align*}
Therefore, for $N\ge 2$,
\begin{align*}
\sum_{n=1}^{N}\frac{1}{n^2}
&=1+\sum_{n=2}^{N}\frac{1}{n^2}\\
&\le 1+\sum_{n=2}^{N}\left(\frac{1}{n-1}-\frac{1}{n}\right)\\
&=1+\left(1-\frac{1}{2}\right)+\left(\frac{1}{2}-\frac{1}{3}\right)+\cdots+\left(\frac{1}{N-1}-\frac{1}{N}\right)\\
&=1+1-\frac{1}{N}\\
&<2.
\end{align*}
The partial sums are increasing and bounded above by $2$, hence
\begin{align*}
\sum_{n=1}^{\infty}\mathbb P(A_n)
&=\sum_{n=1}^{\infty}\frac{1}{n^2}\\
&<\infty.
\end{align*}
By the *First Borel-Cantelli Lemma*,
\begin{align*}
\mathbb P(A_n\text{ i.o.})=0.
\end{align*}
The harmonic probabilities are large enough in aggregate to force infinitely many occurrences under independence, while the inverse-square probabilities are summable and therefore allow only finitely many occurrences almost surely.
[/example]
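The two series bounds used in this example can be checked with exact arithmetic for small cases. The Python sketch below is only a numerical spot check, not a proof; `partial_sum` is a hypothetical helper. It tests the dyadic-block consequence $\sum_{n\le 2^k}1/n\ge 1+k/2$ and the telescoping bound $\sum_{n\le N}1/n^2\le 2-1/N$.

```python
from fractions import Fraction

def partial_sum(term, n_terms):
    """Exact partial sum term(1) + ... + term(n_terms)."""
    return sum((term(n) for n in range(1, n_terms + 1)), Fraction(0))

def harmonic(n):
    return Fraction(1, n)

def inverse_square(n):
    return Fraction(1, n * n)

# Harmonic case: 1 + 1/2 plus (k - 1) dyadic blocks, each at least 1/2.
for k in range(1, 8):
    assert partial_sum(harmonic, 2 ** k) >= 1 + Fraction(k, 2)

# Inverse-square case: the telescoping bound from the example.
for N in (2, 10, 100, 1000):
    assert partial_sum(inverse_square, N) <= 2 - Fraction(1, N)

print(float(partial_sum(harmonic, 2 ** 7)))
print(float(partial_sum(inverse_square, 1000)))
```

The first printed value keeps growing as the exponent increases, while the second stays below $2$ for every cutoff.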
Long-run events often have probability zero or one, and this leads to the language of almost sure statements. To use that language responsibly, we need to distinguish events that are impossible from events that are merely negligible. The next section focuses on probability-zero exceptions.
## Null Events and Almost Sure Statements
Continuous probability forces us to accept events that contain outcomes but have probability zero. A point chosen uniformly from $[0,1]$ can equal a specified number as a set-theoretic possibility, yet that singleton has probability zero. We need a name for these negligible exceptions because they are not impossible as sets, but probability theory treats them as ignorable for almost-sure statements.
[definition: Null Event]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space. A null event is an event $N\in\mathcal F$ such that
\begin{align*}
\mathbb P(N)=0.
\end{align*}
[/definition]
Null events are the exceptions ignored by almost sure statements. If the failure set of a property is null, probability theory treats the property as holding with full probability. This leads to the next definition, which packages the convention directly in event language: an event is almost sure when its complement is a null event.
[definition: Almost Sure Event]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space. An event $A\in\mathcal F$ occurs almost surely if
\begin{align*}
\mathbb P(A)=1.
\end{align*}
[/definition]
Almost sure reasoning often combines countably many full-probability statements. This is valid only if the union of the corresponding failure events is still negligible. The rule below supplies exactly that protection: countably many null exceptions may be collected into one null exception.
[quotetheorem:1108]
The countability restriction is essential. Uncountable unions of null events can have positive probability, so almost sure arguments must track how many exceptional sets are being combined. The next example shows the failure of an uncountable version.
[example: Uncountable Union of Null Events]
Let $\Omega=[0,1]$, let $\mathcal F=\mathcal B([0,1])$, and let $\mathbb P$ be Lebesgue measure restricted to $[0,1]$. For each $x\in[0,1]$, the singleton is the degenerate closed interval
\begin{align*}
\{x\}=[x,x],
\end{align*}
so $\{x\}\in\mathcal B([0,1])$. Since restricted Lebesgue measure agrees with interval length on intervals contained in $[0,1]$,
\begin{align*}
\mathbb P(\{x\})
&=\mathbb P([x,x])\\
&=x-x\\
&=0.
\end{align*}
Thus every set $\{x\}$ with $x\in[0,1]$ is a null event.
Now compare the union of all these null events. First, if $y\in[0,1]$, then $y\in\{y\}$, and since $y\in[0,1]$ the singleton $\{y\}$ is one of the sets in the union. Hence
\begin{align*}
[0,1]\subset \bigcup_{x\in[0,1]}\{x\}.
\end{align*}
Conversely, if
\begin{align*}
y\in \bigcup_{x\in[0,1]}\{x\},
\end{align*}
then there is some $x\in[0,1]$ such that $y\in\{x\}$, so $y=x$, and therefore $y\in[0,1]$. Hence
\begin{align*}
\bigcup_{x\in[0,1]}\{x\}\subset[0,1].
\end{align*}
The two inclusions give
\begin{align*}
[0,1]=\bigcup_{x\in[0,1]}\{x\}.
\end{align*}
Therefore
\begin{align*}
\mathbb P\left(\bigcup_{x\in[0,1]}\{x\}\right)
&=\mathbb P([0,1])\\
&=1-0\\
&=1.
\end{align*}
So an uncountable union of null events can have probability $1$, which is why countable additivity cannot be strengthened to uncountable additivity.
[/example]
A final technical issue remains: a subset of a null event need not belong to the original sigma-algebra, in which case it has no probability at all. Since the containing event has already been declared negligible, it is natural to require all of its subsets to be events too, necessarily of probability zero. This motivates completeness of the probability space.
[definition: Complete Probability Space]
A probability space $(\Omega,\mathcal F,\mathbb P)$ is complete if whenever $N\in\mathcal F$ satisfies $\mathbb P(N)=0$ and $A\subset N$, then $A\in\mathcal F$.
[/definition]
Completeness makes changes on null sets harmless for measurability. This matters when random variables are modified on exceptional sets or when path properties are proved outside a null event. The last structural question is where the event sigma-algebra comes from in standard real-valued models.
## Borel Events and Standard Models
In real-valued probability, the observable questions are usually generated from open sets or intervals. Instead of listing every event, we specify a simple class of sets and close it under the sigma-algebra operations. This construction is needed because the starting family is rarely already closed under complements and countable unions, while probability measures require a full sigma-algebra as their domain.
[definition: Generated Sigma-Algebra]
Let $\Omega$ be a set and let $\mathcal A\subset\mathcal P(\Omega)$. The sigma-algebra generated by $\mathcal A$ is
\begin{align*}
\sigma(\mathcal A)=\bigcap\{\mathcal G:\mathcal G\text{ is a sigma-algebra on }\Omega\text{ and }\mathcal A\subset\mathcal G\}.
\end{align*}
[/definition]
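On a finite set the intersection in the definition can be reached from below: start with the generators and keep adding complements and unions until nothing new appears. The resulting family is a sigma-algebra containing $\mathcal A$ and is contained in every other such sigma-algebra, so it equals $\sigma(\mathcal A)$. The following is a minimal sketch of that closure procedure; the function name `generated_sigma_algebra` and the sample generators are illustrative choices, and the method only works for finite $\Omega$.

```python
def generated_sigma_algebra(omega, generators):
    """Sigma-algebra generated by `generators` on the finite set `omega`.

    Computed as the closure under complement and pairwise union; on a finite
    set this agrees with closure under countable unions.
    """
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new_sets = {omega - a for a in family}               # complements
        new_sets |= {a | b for a in family for b in family}  # pairwise unions
        if new_sets <= family:                               # nothing new: closed
            return family
        family |= new_sets


omega = {1, 2, 3, 4}
sigma = generated_sigma_algebra(omega, [{1}, {1, 2}])

print(len(sigma))                 # number of events in the generated sigma-algebra
print(frozenset({2}) in sigma)    # {2} = {1, 2} minus {1} is generated
print(frozenset({3}) in sigma)    # 3 and 4 are never separated by the generators
```

With these generators the atoms are $\{1\}$, $\{2\}$, and $\{3,4\}$, so the generated family has $2^3=8$ members and contains $\{2\}$ but not $\{3\}$.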
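The intersection is taken over a nonempty family, because $\mathcal P(\Omega)$ is a sigma-algebra containing $\mathcal A$, and an intersection of sigma-algebras is again a sigma-algebra. Hence $\sigma(\mathcal A)$ is the smallest sigma-algebra containing $\mathcal A$. Generated sigma-algebras let us build large event spaces from small observable families. This raises the modeling question of which small family should generate the event space in a topological setting. The standard answer uses open sets, leading to the Borel events used for real-valued and Euclidean random variables.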
[definition: Borel Event]
Let $E$ be a [topological space](/page/Topological%20Space) and let $\mathcal B(E)$ be the sigma-algebra generated by the open subsets of $E$. A Borel event in $E$ is an element of $\mathcal B(E)$.
[/definition]
For $E=\mathbb R$, Borel events include intervals and all sets obtained from intervals by repeatedly taking countable unions, countable intersections, and complements. This is already a very large class, but it is not the same as the full power set of $\mathbb R$. There are subsets of $[0,1]$ that are not Borel, and in the usual Lebesgue model there are even subsets that are not Lebesgue measurable; the event sigma-algebra is therefore part of the model rather than a harmless technical decoration. The remaining question is how small a family of subsets of the real line suffices to generate all Borel events. Since threshold events use half-lines, the theorem below explains why distribution functions encode the probabilities of a rich event class.
[quotetheorem:1080]
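One step behind statements of this kind can be written out explicitly: threshold events already determine the probabilities of bounded intervals. In the sketch below, the real-valued random variable $X$ and its distribution function $F_X(x)=\mathbb P(X\le x)$ are notation introduced here for illustration, and $a<b$ are real numbers. Since $(-\infty,a]\subset(-\infty,b]$ and $(a,b]=(-\infty,b]\setminus(-\infty,a]$,
\begin{align*}
\mathbb P(a<X\le b)
&=\mathbb P(X\le b)-\mathbb P(X\le a)\\
&=F_X(b)-F_X(a).
\end{align*}
Open intervals follow by a countable limit. The sets $(a,b-1/n]$ increase to $(a,b)$ as $n\to\infty$, so for $n$ large enough that $b-1/n>a$, continuity from below gives
\begin{align*}
\mathbb P(a<X<b)
&=\lim_{n\to\infty}\mathbb P\left(a<X\le b-\tfrac{1}{n}\right)\\
&=\lim_{n\to\infty}F_X\left(b-\tfrac{1}{n}\right)-F_X(a).
\end{align*}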
The theorem is a bridge between topology and probability. It says that knowing probabilities of threshold events is enough to determine the law of a real-valued random variable. The next finite example shows the opposite phenomenon: if the sigma-algebra is deliberately small, some ordinary-looking questions are not events.
[example: A Subset That Is Not an Event]
Let $\Omega=\{1,2,3,4\}$ and let
\begin{align*}
\mathcal F=\{\varnothing,\Omega,\{1,2\},\{3,4\}\}.
\end{align*}
In this model, events are exactly the elements of $\mathcal F$. Since
\begin{align*}
\{1,2\}\in\{\varnothing,\Omega,\{1,2\},\{3,4\}\},
\end{align*}
the set $\{1,2\}$ is an event. But
\begin{align*}
\{1\}&\ne\varnothing,\\
\{1\}&\ne\Omega=\{1,2,3,4\},\\
\{1\}&\ne\{1,2\},\\
\{1\}&\ne\{3,4\},
\end{align*}
so
\begin{align*}
\{1\}\notin\mathcal F.
\end{align*}
Thus $\{1\}$ is a subset of $\Omega$ but not an event in this model. A probability measure $\mathbb P:\mathcal F\to[0,1]$ may assign a value to $\mathbb P(\{1,2\})$, but the expression $\mathbb P(\{1\})$ is not defined because $\{1\}$ is not in the domain $\mathcal F$. This model can answer whether the outcome lies in the block $\{1,2\}$, but it cannot answer whether the outcome is exactly $1$.
[/example]
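The same model can be mirrored in code by storing a probability measure as a lookup table whose keys are exactly the events. The sketch below is illustrative only: the values $3/10$ and $7/10$ are hypothetical (the example fixes no particular measure), and the helpers `is_sigma_algebra` and `probability` are names chosen here.

```python
from fractions import Fraction

omega = frozenset({1, 2, 3, 4})
F = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), omega}


def is_sigma_algebra(omega, family):
    """Finite check: contains omega, closed under complement and pairwise union."""
    if omega not in family:
        return False
    if any(omega - a not in family for a in family):
        return False
    return all(a | b in family for a in family for b in family)


# Hypothetical probabilities; the domain of P is F and nothing else.
P = {
    frozenset(): Fraction(0),
    frozenset({1, 2}): Fraction(3, 10),
    frozenset({3, 4}): Fraction(7, 10),
    omega: Fraction(1),
}


def probability(event):
    """Return P(event), refusing any subset that is not an event."""
    event = frozenset(event)
    if event not in P:
        raise ValueError(f"{set(event)} is not an event in this model")
    return P[event]


print(is_sigma_algebra(omega, F))
print(probability({1, 2}))
try:
    probability({1})
except ValueError as err:
    print("undefined:", err)
```

The lookup for $\{1\}$ fails for the same reason the expression $\mathbb P(\{1\})$ is undefined: the set is a subset of $\Omega$ but not an element of the domain $\mathcal F$.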
This example returns to the central lesson of the chapter. Eventhood is not a property of a subset alone; it is a property relative to a chosen sigma-algebra. The connected topics below all develop this same idea in richer settings.
## Beyond and Connected Topics
Events are the entry point to [Cambridge IA Probability](/page/Cambridge%20IA%20Probability), where finite and countable sample spaces make the translation between verbal questions and subsets especially concrete. Counting, conditioning, and independence all begin by identifying the right event.
The measure-theoretic development continues in [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure). There probability becomes a measure on a sigma-algebra, and the distinction between Borel sets, null sets, completions, and non-measurable subsets becomes part of the core language.
In [Cambridge III Advanced Probability](/page/Cambridge%20III%20Advanced%20Probability), events increasingly represent information. Filtrations, tail sigma-algebras, stopping events, and almost sure convergence all depend on understanding which events belong to which sigma-algebra.
In [Cambridge III Stochastic Calculus and Applications](/page/Cambridge%20III%20Stochastic%20Calculus%20and%20Applications), a random time $\tau$ is studied through events such as $\{\tau\le t\}$; $\tau$ is a stopping time when each of these events belongs to the sigma-algebra of information available at time $t$. Path properties of [Brownian motion](/page/Brownian%20Motion) are expressed as almost sure events on function spaces. The same basic definition of event supports much richer continuous-time questions.
Events also connect outward to statistics, ergodic theory, and integration. In each case the central object remains a measurable set, but the interpretation changes: data, time averages, observability, or exceptional behaviour.
## References
Androma, [Cambridge IA Probability](/page/Cambridge%20IA%20Probability).
Androma, [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure).
Androma, [Cambridge III Advanced Probability](/page/Cambridge%20III%20Advanced%20Probability).
Androma, [Cambridge III Stochastic Calculus and Applications](/page/Cambridge%20III%20Stochastic%20Calculus%20and%20Applications).
Billingsley, *Probability and Measure* (1995).
Kallenberg, *Foundations of Modern Probability* (2002).
Williams, *Probability with Martingales* (1991).