A filtration is what lets probability remember time. A [probability space](/page/Probability%20Space) by itself can tell us which events are measurable, but it does not tell us when information becomes available. If a trader decides at noon using tomorrow's closing price, or if a stopping rule uses a future coin toss, the resulting random variables may still be measurable with respect to the full sigma-algebra, yet the model has lost the distinction between prediction and hindsight.
The central problem is therefore not only to describe randomness, but to organize information as it is revealed. A filtration is an increasing family of sigma-algebras, one for each time, and the inclusion relation records that information can be gained but not forgotten. Once this structure is in place, the words adapted, stopping time, martingale, predictable, optional, and progressive become precise.
[example: A Betting Rule That Sees Tomorrow]
Let $(X_n)_{n \in \mathbb N}$ be independent random variables with
\begin{align*}
\mathbb P(X_n=1)=\mathbb P(X_n=-1)=\frac{1}{2},
\end{align*}
and define the random walk by
\begin{align*}
S_n=\sum_{k=1}^n X_k.
\end{align*}
For every outcome $\omega$, the value $X_n(\omega)$ is either $1$ or $-1$. Hence either
\begin{align*}
X_n(\omega)^2=1^2=1
\end{align*}
or
\begin{align*}
X_n(\omega)^2=(-1)^2=1.
\end{align*}
Thus $X_n^2=1$ as a [random variable](/page/Random%20Variable).
If the gambler must choose the stake $H_n$ before seeing the $n$-th toss, then $H_n$ should depend only on $X_1,\dots,X_{n-1}$. The gain after $N$ rounds is
\begin{align*}
G_N=\sum_{n=1}^N H_nX_n.
\end{align*}
Now suppose the gambler is allowed to choose
\begin{align*}
H_n=X_n,
\end{align*}
that is, to look at the $n$-th toss before placing the stake on it. Substituting this rule into the gain gives
\begin{align*}
G_N=\sum_{n=1}^N X_nX_n.
\end{align*}
For each $n$, multiplication of a number by itself gives
\begin{align*}
X_nX_n=X_n^2.
\end{align*}
Therefore
\begin{align*}
G_N=\sum_{n=1}^N X_n^2.
\end{align*}
Since $X_n^2=1$ for every $n$,
\begin{align*}
G_N=\sum_{n=1}^N 1.
\end{align*}
The sum contains exactly the $N$ terms indexed by $1,2,\dots,N$, so
\begin{align*}
G_N=N.
\end{align*}
Thus every outcome of the experiment gives the same positive gain $N$. The coin tosses themselves are still fair; the guaranteed profit appears because the rule $H_n=X_n$ uses information from the same time step it claims to bet on. Filtrations prevent this kind of hindsight by recording which random variables are available before each decision.
[/example]
The example shows why measurability alone is too coarse. Each $H_n$ is a measurable random variable on the full sample space, but the full sigma-algebra contains future information. The theory needs a second layer of measurability, indexed by time, to say what is known before each decision.
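The contrast between hindsight and past-only betting can be checked by brute force on a short horizon. The following is a minimal sketch, assuming $N=4$ and a made-up past-only rule (`past_only_stakes` is purely illustrative and not part of the text); it enumerates all $2^N$ equally likely paths.

```python
from itertools import product

def gain(stakes, tosses):
    """Total gain sum_n H_n * X_n for one outcome."""
    return sum(h * x for h, x in zip(stakes, tosses))

def hindsight_stakes(tosses):
    """H_n = X_n: the stake looks at the toss it bets on."""
    return list(tosses)

def past_only_stakes(tosses):
    """Hypothetical rule using only X_1..X_{n-1}: stake 1 after a loss, 2 otherwise."""
    stakes = []
    for n in range(len(tosses)):
        previous = tosses[n - 1] if n > 0 else 1
        stakes.append(1 if previous == -1 else 2)
    return stakes

N = 4
outcomes = list(product([1, -1], repeat=N))  # all 2**N equally likely paths

hindsight_gains = [gain(hindsight_stakes(w), w) for w in outcomes]
past_only_gains = [gain(past_only_stakes(w), w) for w in outcomes]

print(set(hindsight_gains))                   # the hindsight rule earns N on every path
print(sum(past_only_gains) / len(outcomes))   # average gain of the past-only rule
```

The hindsight rule produces the same gain on every path, while the gains of the past-only rule average out to zero over the uniform law on paths.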
## Basic Measurable Information
Before defining filtrations, we need the static object whose time-dependent version will carry information. In probability, a sigma-algebra is not merely a technical domain for measures: it is a catalogue of yes-or-no questions whose answers are observable. If an event belongs to the sigma-algebra, then the model allows us to decide whether it occurred.
[definition: Sigma-Algebra]
Let $\Omega$ be a set. A sigma-algebra on $\Omega$ is a collection $\mathcal F \subset \mathcal P(\Omega)$ such that $\Omega \in \mathcal F$, such that $A \in \mathcal F$ implies $A^c \in \mathcal F$, and such that $A_1,A_2,\dots \in \mathcal F$ implies $\bigcup_{n=1}^{\infty} A_n \in \mathcal F$.
[/definition]
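On a finite set the three axioms can be checked mechanically. The sketch below assumes a finite $\Omega$, where closure under pairwise unions already gives closure under countable unions; the function name `is_sigma_algebra` and the sample families are illustrative choices, not notation from the text.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the sigma-algebra axioms for a family of subsets of a finite set."""
    omega = frozenset(omega)
    sets = {frozenset(a) for a in family}
    if omega not in sets:                       # Omega must belong to the family
        return False
    if any(omega - a not in sets for a in sets):  # closed under complements
        return False
    # Finite case: pairwise unions suffice for countable unions.
    return all(a | b in sets for a, b in combinations(sets, 2))

omega = {1, 2, 3, 4}
coarse = [set(), omega, {1, 2}, {3, 4}]   # generated by the partition {1,2} | {3,4}
broken = [set(), omega, {1}]              # the complement {2, 3, 4} is missing

print(is_sigma_algebra(omega, coarse))
print(is_sigma_algebra(omega, broken))
```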
The complement and countable union axioms mean that observable questions are stable under negation and countable logical alternatives. But information without probabilities cannot distinguish a rare event from a typical one, and conditional prediction later needs averages over observable events. The ambient object must therefore attach a [probability measure](/page/Probability%20Measure) to the catalogue of events.
[definition: Probability Space]
A probability space is a triple $(\Omega, \mathcal F, \mathbb P)$ where $\Omega$ is a set, $\mathcal F$ is a sigma-algebra on $\Omega$, and $\mathbb P: \mathcal F \to [0,1]$ is a measure satisfying $\mathbb P(\Omega)=1$.
[/definition]
The sigma-algebra $\mathcal F$ describes all events available to the completed model. The missing ingredient is temporal access: at noon, not all events in $\mathcal F$ should be known. To distinguish prediction from hindsight, we need a time-indexed family of observable event catalogues that grows as more information is revealed.
## Definition
The betting example shows that the same probability space can support different notions of what is currently visible. A filtration isolates that extra structure: it assigns to each time the events that can be decided by then, and it requires these event catalogues to grow as observations accumulate.
[definition: Filtration]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space and let $T$ be a totally ordered set. A filtration on $(\Omega, \mathcal F, \mathbb P)$ indexed by $T$ is a family $(\mathcal F_t)_{t \in T}$ of sub-sigma-algebras of $\mathcal F$ such that
\begin{align*}
s \le t \implies \mathcal F_s \subset \mathcal F_t.
\end{align*}
[/definition]
The index set $T$ may be $\mathbb N$ for discrete time, $[0,\infty)$ for continuous time, or a finite ordered set for a finite experiment. Once the information flow is part of the model, later definitions need to refer simultaneously to the probability law and to the available information at each time. Naming the combined structure keeps those hypotheses explicit.
## Filtered Models and Natural Information
Once a filtration has been chosen, the probability space and the information flow should be treated as a single modelling object. This packaging matters because adaptedness, stopping times, martingales, and conditional expectations all depend on both the probability law and the selected filtration.
[definition: Filtered Probability Space]
A filtered probability space is a quadruple $(\Omega, \mathcal F, (\mathcal F_t)_{t \in T}, \mathbb P)$ where $(\Omega, \mathcal F, \mathbb P)$ is a probability space and $(\mathcal F_t)_{t \in T}$ is a filtration on it.
[/definition]
In most applications, the filtration is not chosen independently. It is generated by observations of a process, and the natural question is: what is the smallest information flow that makes the observations up to time $t$ visible at time $t$?
[definition: Natural Filtration]
Let $(X_t)_{t \in T}$ be a stochastic process on $(\Omega, \mathcal F, \mathbb P)$ with values in a measurable space $(E, \mathcal E)$. The natural filtration of $(X_t)_{t \in T}$ is the filtration $(\mathcal F_t^X)_{t \in T}$ defined by
\begin{align*}
\mathcal F_t^X = \sigma(X_s : s \le t).
\end{align*}
[/definition]
The notation $\sigma(X_s : s \le t)$ means the smallest sigma-algebra making all maps $X_s$ with $s \le t$ measurable. It contains the events that can be described from the observed path up to time $t$, modulo the limitations of the measurable structure.
## Information Generated by Observations
### Generated Sigma-Algebras
The first structural question is how random variables create information. If $X$ is observed, then every event of the form $\{X \in A\}$ becomes knowable. The sigma-algebra generated by $X$ packages all such events and no additional ones.
[definition: Sigma-Algebra Generated by a Random Variable]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space, let $(E, \mathcal E)$ be a measurable space, and let $X: \Omega \to E$ be measurable. The sigma-algebra generated by $X$ is the sub-sigma-algebra of $\mathcal F$ given by
\begin{align*}
\sigma(X) = \{X^{-1}(A) : A \in \mathcal E\}.
\end{align*}
[/definition]
For a real-valued random variable, $\sigma(X)$ records everything that can be inferred from the numerical value of $X$. If two outcomes give the same value of $X$, then no event in $\sigma(X)$ can separate them.
### Finite Observation Example
[example: Two Tosses and Growing Information]
Let $\Omega=\{HH,HT,TH,TT\}$ with the uniform probability measure. Define $X_1(\omega)$ to be the first toss and $X_2(\omega)$ to be the second toss, so $X_1$ and $X_2$ take values in $\{H,T\}$. The first coordinate map has the two fibers
\begin{align*}
X_1^{-1}(\{H\})=\{HH,HT\}.
\end{align*}
Also,
\begin{align*}
X_1^{-1}(\{T\})=\{TH,TT\}.
\end{align*}
The sigma-algebra on $\{H,T\}$ is $\{\varnothing,\{H\},\{T\},\{H,T\}\}$, so the inverse images under $X_1$ are
\begin{align*}
X_1^{-1}(\varnothing)=\varnothing.
\end{align*}
\begin{align*}
X_1^{-1}(\{H,T\})=\Omega.
\end{align*}
\begin{align*}
X_1^{-1}(\{H\})=\{HH,HT\}.
\end{align*}
\begin{align*}
X_1^{-1}(\{T\})=\{TH,TT\}.
\end{align*}
Therefore the information generated by the first toss is
\begin{align*}
\mathcal F_1=\sigma(X_1)=\{\varnothing,\Omega,\{HH,HT\},\{TH,TT\}\}.
\end{align*}
At time $2$, the observed pair $(X_1,X_2)$ has four possible values. Its fibers are
\begin{align*}
\{(X_1,X_2)=(H,H)\}=\{HH\}.
\end{align*}
\begin{align*}
\{(X_1,X_2)=(H,T)\}=\{HT\}.
\end{align*}
\begin{align*}
\{(X_1,X_2)=(T,H)\}=\{TH\}.
\end{align*}
\begin{align*}
\{(X_1,X_2)=(T,T)\}=\{TT\}.
\end{align*}
Thus $\mathcal F_2=\sigma(X_1,X_2)$ contains every singleton subset of $\Omega$. Since $\Omega$ is finite and a sigma-algebra is closed under finite unions, for any subset $A\subset\Omega$ we can write
\begin{align*}
A=\bigcup_{\omega\in A}\{\omega\},
\end{align*}
and this union belongs to $\mathcal F_2$. Hence every subset of $\Omega$ belongs to $\mathcal F_2$, so
\begin{align*}
\mathcal F_2=\sigma(X_1,X_2)=\mathcal P(\Omega).
\end{align*}
At time $1$, the event $\{HH,HT\}$ is known because it is exactly $\{X_1=H\}$. The event $\{HH,TH\}$ is not in $\mathcal F_1$ because the only elements of $\mathcal F_1$ are $\varnothing$, $\Omega$, $\{HH,HT\}$, and $\{TH,TT\}$, and $\{HH,TH\}$ is none of these four sets. Equivalently, it separates outcomes with the same first toss: it contains $HH$ but not $HT$, even though both have first toss $H$, and it contains $TH$ but not $TT$, even though both have first toss $T$. At time $2$, every subset of $\Omega$ is knowable because observing both tosses identifies the exact outcome.
[/example]
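The two sigma-algebras in this example can be recomputed directly. The sketch below relies on the finite setting, where the sigma-algebra generated by an observation consists of all unions of its fibers; the helper name `generated_sigma_algebra` is an illustrative choice.

```python
from itertools import combinations

def generated_sigma_algebra(omega, observation):
    """All unions of fibers of `observation` on a finite sample space."""
    fibers = {}
    for w in omega:
        fibers.setdefault(observation(w), set()).add(w)
    blocks = [frozenset(b) for b in fibers.values()]
    events = set()
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            events.add(frozenset().union(*chosen))
    return events

omega = ["HH", "HT", "TH", "TT"]
F1 = generated_sigma_algebra(omega, lambda w: w[0])          # first toss only
F2 = generated_sigma_algebra(omega, lambda w: (w[0], w[1]))  # both tosses

print(len(F1), len(F2))                 # sizes of the two event catalogues
print(F1 <= F2)                         # inclusion F_1 subset F_2
print(frozenset({"HH", "TH"}) in F1)    # is {second toss = H} known at time 1?
```

The inclusion check is the filtration property for this two-step experiment, and the last line tests the event $\{HH,TH\}$ discussed in the example.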
This finite example makes the partial-information interpretation literal: $\mathcal F_1$ partitions the sample space into the two blocks determined by the first toss, while $\mathcal F_2$ separates all four outcomes. The next issue is universality: if some other filtration also observes the same process, it must contain the natural one.
### Minimality of Natural Information
The construction of $\mathcal F_n^X$ was described as the information generated by the observations $X_1,\dots,X_n$, but this should not depend on the particular way we write down the sample space. The right test is a comparison with any other filtration $(\mathcal G_n)$ in which the same observations are already measurable at each time. If observing the process under $(\mathcal G_n)$ really makes sense, then every event determined by the first $n$ observations should also be available in $\mathcal G_n$. The formal statement below makes this minimality property precise.
[quotetheorem:9906]
[citeproof:9906]
The theorem says that the natural filtration is the least information flow compatible with observing the process. Any larger filtration represents additional information, such as market data, an independent signal, or forbidden future knowledge.
### Enlarged Information Flows
[example: Enlarging the Filtration Changes the Story]
Let $(X_n)_{n \in \mathbb N}$ be independent random variables with
\begin{align*}
\mathbb P(X_n=1)=\mathbb P(X_n=-1)=\frac{1}{2},
\end{align*}
and define the random walk by
\begin{align*}
S_n=\sum_{k=1}^n X_k.
\end{align*}
Fix $N\ge 2$. Since $S_1=X_1$, the natural information at time $1$ is
\begin{align*}
\mathcal F_1^S=\sigma(S_1)=\sigma(X_1).
\end{align*}
For each $k\ge 2$, the random variable $X_k$ is independent of $X_1$, hence independent of $\sigma(X_1)=\mathcal F_1^S$. Its mean is
\begin{align*}
\mathbb E[X_k]=1\cdot \frac{1}{2}+(-1)\cdot \frac{1}{2}.
\end{align*}
The two terms cancel:
\begin{align*}
1\cdot \frac{1}{2}+(-1)\cdot \frac{1}{2}=\frac{1}{2}-\frac{1}{2}=0.
\end{align*}
Therefore
\begin{align*}
\mathbb E[X_k]=0.
\end{align*}
We compute the time-$1$ conditional forecast of $S_N$ under the natural filtration. By the definition of $S_N$,
\begin{align*}
S_N=X_1+\sum_{k=2}^N X_k.
\end{align*}
Since $S_1=X_1$, this is
\begin{align*}
S_N=S_1+\sum_{k=2}^N X_k.
\end{align*}
Taking conditional expectation with respect to $\mathcal F_1^S$ gives
\begin{align*}
\mathbb E[S_N\mid \mathcal F_1^S]=\mathbb E\left[S_1+\sum_{k=2}^N X_k\mid \mathcal F_1^S\right].
\end{align*}
By linearity of conditional expectation,
\begin{align*}
\mathbb E\left[S_1+\sum_{k=2}^N X_k\mid \mathcal F_1^S\right]=\mathbb E[S_1\mid \mathcal F_1^S]+\sum_{k=2}^N \mathbb E[X_k\mid \mathcal F_1^S].
\end{align*}
Because $S_1$ is $\mathcal F_1^S$-measurable,
\begin{align*}
\mathbb E[S_1\mid \mathcal F_1^S]=S_1.
\end{align*}
Because $X_k$ is independent of $\mathcal F_1^S$ and $\mathbb E[X_k]=0$ for each $k\ge 2$,
\begin{align*}
\mathbb E[X_k\mid \mathcal F_1^S]=\mathbb E[X_k]=0.
\end{align*}
Substituting these identities,
\begin{align*}
\mathbb E[S_N\mid \mathcal F_1^S]=S_1+\sum_{k=2}^N 0.
\end{align*}
The sum of zeros is $0$, so
\begin{align*}
\mathbb E[S_N\mid \mathcal F_1^S]=S_1.
\end{align*}
Now enlarge the filtration by declaring the terminal value visible from the start:
\begin{align*}
\mathcal G_n=\sigma(S_1,\dots,S_n,S_N), \qquad 1\le n\le N.
\end{align*}
If $m\le n$, then each generator $S_1,\dots,S_m,S_N$ of $\mathcal G_m$ is among the generators $S_1,\dots,S_n,S_N$ of $\mathcal G_n$. Hence
\begin{align*}
\mathcal G_m\subset \mathcal G_n.
\end{align*}
Thus $(\mathcal G_n)_{1\le n\le N}$ is a filtration. At time $1$,
\begin{align*}
\mathcal G_1=\sigma(S_1,S_N).
\end{align*}
Since $S_N$ is one of the generators of $\mathcal G_1$, it is $\mathcal G_1$-measurable. Therefore
\begin{align*}
\mathbb E[S_N\mid \mathcal G_1]=S_N.
\end{align*}
The two time-$1$ forecasts are
\begin{align*}
\mathbb E[S_N\mid \mathcal F_1^S]=S_1
\end{align*}
and
\begin{align*}
\mathbb E[S_N\mid \mathcal G_1]=S_N.
\end{align*}
Their difference is
\begin{align*}
S_N-S_1=\left(\sum_{k=1}^N X_k\right)-X_1.
\end{align*}
Cancelling the $X_1$ term gives
\begin{align*}
S_N-S_1=\sum_{k=2}^N X_k.
\end{align*}
Thus the two forecasts differ exactly on the event where the future increment $\sum_{k=2}^N X_k$ is nonzero. The enlarged filtration models an observer who receives the terminal signal $S_N$ at time $1$, so the same random walk has different conditional predictions under the two information flows.
[/example]
The point is not that larger filtrations are illegitimate. They are essential when the observer genuinely has more information. The mathematical question is always which information flow corresponds to the modelling situation.
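The two time-$1$ forecasts from the example can be reproduced numerically. The following sketch assumes $N=3$ and the uniform law on the $2^N$ paths; conditioning is done by averaging over outcomes that share the same observed information, which is valid on a finite space. The helper `conditional_expectation` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product

def conditional_expectation(outcomes, target, information):
    """E[target | information] on a finite uniform space, as a dict outcome -> value.

    Outcomes with equal `information` form one atom; the forecast on an atom
    is the average of `target` over that atom.
    """
    atoms = {}
    for w in outcomes:
        atoms.setdefault(information(w), []).append(w)
    forecast = {}
    for atom in atoms.values():
        mean = Fraction(sum(target(w) for w in atom), len(atom))
        for w in atom:
            forecast[w] = mean
    return forecast

N = 3
outcomes = list(product([1, -1], repeat=N))
S_1 = lambda w: w[0]
S_N = lambda w: sum(w)

natural = conditional_expectation(outcomes, S_N, lambda w: S_1(w))
enlarged = conditional_expectation(outcomes, S_N, lambda w: (S_1(w), S_N(w)))

print(all(natural[w] == S_1(w) for w in outcomes))   # forecast under the natural filtration
print(all(enlarged[w] == S_N(w) for w in outcomes))  # forecast when S_N is visible at time 1
```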
## Adapted Processes
Once information is indexed by time, the next question is whether a process can be observed as time passes. A process whose value at time $t$ depends on future randomness should not count as a legitimate state variable at time $t$.
[definition: Adapted Process]
Let $(\Omega, \mathcal F, (\mathcal F_t)_{t \in T}, \mathbb P)$ be a filtered probability space, and let $(E, \mathcal E)$ be a measurable space. A stochastic process $(X_t)_{t \in T}$ with coordinate maps $X_t:\Omega\to E$ is adapted to $(\mathcal F_t)_{t \in T}$ if $X_t$ is $\mathcal F_t$-measurable for every $t \in T$.
[/definition]
Adaptedness is the mathematical form of non-anticipation for state variables. It says that the present value is visible at the present time. It does not say how the value was generated, nor does it impose independence or continuity.
[example: Future Maximum Is Not Adapted]
Let $(X_n)_{n=1}^N$ be real-valued random variables and let $\mathcal F_n=\sigma(X_1,\dots,X_n)$. Define
\begin{align*}
M_n=\max_{1\le k\le n}X_k,\qquad Y_n=\max_{n\le k\le N}X_k.
\end{align*}
For each fixed $n$, the map
\begin{align*}
f_n:\mathbb R^n\to\mathbb R,\qquad f_n(x_1,\dots,x_n)=\max\{x_1,\dots,x_n\}
\end{align*}
is Borel measurable. Since the vector $(X_1,\dots,X_n)$ is $\sigma(X_1,\dots,X_n)$-measurable, the composition
\begin{align*}
M_n=f_n(X_1,\dots,X_n)
\end{align*}
is $\sigma(X_1,\dots,X_n)$-measurable. Because $\mathcal F_n=\sigma(X_1,\dots,X_n)$, the random variable $M_n$ is $\mathcal F_n$-measurable for every $n$, so the process $(M_n)_{1\le n\le N}$ is adapted.
The future maximum need not be adapted. Take $N=2$, let $\Omega=\{a,b\}$, define
\begin{align*}
X_1(a)=X_1(b)=0,
\end{align*}
and define
\begin{align*}
X_2(a)=1,\qquad X_2(b)=-1.
\end{align*}
Since $X_1$ is constant, its only nonempty fiber is
\begin{align*}
X_1^{-1}(\{0\})=\Omega.
\end{align*}
Therefore the sigma-algebra generated by $X_1$ is
\begin{align*}
\mathcal F_1=\sigma(X_1)=\{\varnothing,\Omega\}.
\end{align*}
For $n=1$, the future maximum is
\begin{align*}
Y_1=\max\{X_1,X_2\}.
\end{align*}
Evaluating at $a$ gives
\begin{align*}
Y_1(a)=\max\{X_1(a),X_2(a)\}=\max\{0,1\}=1.
\end{align*}
Evaluating at $b$ gives
\begin{align*}
Y_1(b)=\max\{X_1(b),X_2(b)\}=\max\{0,-1\}=0.
\end{align*}
Hence
\begin{align*}
\{Y_1=1\}=\{a\}.
\end{align*}
But the only sets in $\mathcal F_1$ are $\varnothing$ and $\Omega=\{a,b\}$, and $\{a\}$ is neither of these. Thus $\{Y_1=1\}\notin\mathcal F_1$, so $Y_1$ is not $\mathcal F_1$-measurable.
The running maximum is observable from the past because it is built from $X_1,\dots,X_n$, while the future maximum can reveal information from times that have not yet appeared.
[/example]
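The two-point counterexample can be verified in a few lines. The sketch below uses the finite-space fact that a variable is measurable with respect to the sigma-algebra generated by some observations exactly when it is constant on outcomes with identical observations; the helper names are illustrative.

```python
def is_measurable(outcomes, variable, information):
    """Finite-space test: `variable` must be constant wherever `information` agrees."""
    seen = {}
    for w in outcomes:
        key = information(w)
        if key in seen and seen[key] != variable(w):
            return False
        seen[key] = variable(w)
    return True

# Two-point example: X_1 is constant, X_2 separates a and b.
X = {"a": (0, 1), "b": (0, -1)}
outcomes = list(X)

def observed(n):
    return lambda w: X[w][:n]                 # (X_1, ..., X_n)

def running_max(n):
    return lambda w: max(X[w][:n])            # M_n = max of X_1..X_n

def future_max(n):
    return lambda w: max(X[w][n - 1:])        # Y_n = max of X_n..X_N with N = 2

adapted_running = all(is_measurable(outcomes, running_max(n), observed(n)) for n in (1, 2))
adapted_future = all(is_measurable(outcomes, future_max(n), observed(n)) for n in (1, 2))

print(adapted_running, adapted_future)
```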
Many discrete-time stochastic integrals are built from decisions made just before the next random increment. Adaptedness is almost enough for this, but a stake placed before the $n$-th toss should be measurable with respect to time $n-1$, not time $n$.
[definition: Predictable Process in Discrete Time]
Let $(\Omega, \mathcal F, (\mathcal F_n)_{n \in \mathbb N_0}, \mathbb P)$ be a filtered probability space. A real-valued process $(H_n)_{n \ge 1}$ with coordinate maps $H_n:\Omega\to\mathbb R$ is predictable in discrete time if $H_n$ is $\mathcal F_{n-1}$-measurable for every $n\ge 1$.
[/definition]
Predictability encodes the timing of decisions. The reason this notion matters is that fair-game results should survive betting strategies based on past information. Without the condition $H_n\in\mathcal F_{n-1}$, a betting strategy could choose its stake after seeing the very increment it is meant to bet on, so the martingale cancellation behind fair games would no longer match the modelling situation. The theorem below isolates the case where each stake is chosen from the past and asks whether such a strategy can create expected gain.
[quotetheorem:3542]
[citeproof:3542]
This result is the formal version of the fair-game principle: a strategy based only on past information cannot manufacture positive expected gain from a martingale. The boundedness assumption is a convenient integrability condition; more general versions replace it with hypotheses ensuring that the stochastic sums are integrable.
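The fair-game principle can be illustrated for the simple symmetric random walk by exhaustive enumeration. The sketch below is not a proof of the quoted theorem; it assumes a horizon $N=5$ and a made-up bounded predictable rule `stake`, and it checks that, given any observed past, the two equally likely continuations of the gain average to the current gain.

```python
from itertools import product

def stake(past):
    """Hypothetical bounded predictable rule: bet 3 when the walk is below zero, else 1."""
    return 3 if sum(past) < 0 else 1

def gain(prefix):
    """G_n = sum_{k<=n} H_k X_k with H_k = stake(X_1, ..., X_{k-1})."""
    return sum(stake(prefix[:k]) * prefix[k] for k in range(len(prefix)))

N = 5
fair = True
for n in range(1, N + 1):
    for past in product([1, -1], repeat=n - 1):
        up, down = gain(past + (1,)), gain(past + (-1,))
        # Given the past, the next toss is +1 or -1 with probability 1/2 each,
        # so E[G_n | past] = (up + down) / 2 should equal G_{n-1}.
        if up + down != 2 * gain(past):
            fair = False

print(fair)
```

Replacing `stake(prefix[:k])` by a rule that reads `prefix[k]` breaks the check, which is the hindsight effect that predictability rules out.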
## Stopping Times
A filtration also lets us define random times at which a decision is made. The key issue is whether the decision to stop by time $t$ can be made from information available by time $t$. This excludes rules that choose a time after inspecting the future path.
[definition: Stopping Time]
Let $(\Omega, \mathcal F, (\mathcal F_t)_{t \in T}, \mathbb P)$ be a filtered probability space with $T \subset [0,\infty)$. A random variable $\tau: \Omega \to [0,\infty]$ is a stopping time with respect to $(\mathcal F_t)_{t \in T}$ if
\begin{align*}
\{\tau \le t\} \in \mathcal F_t
\end{align*}
for every $t \in T$.
[/definition]
The event $\{\tau \le t\}$ asks whether the rule has stopped by time $t$. If that event is known at time $t$, then the rule does not need future information to decide whether stopping has occurred.
[example: First Hitting Time]
Let $(X_n)_{n \in \mathbb N}$ be a real-valued process adapted to $(\mathcal F_n)_{n \in \mathbb N}$, and fix $a \in \mathbb R$. Define
\begin{align*}
\tau = \inf\{m \in \mathbb N : X_m \ge a\},
\end{align*}
with $\inf\varnothing=\infty$. We show that $\tau$ is a stopping time by proving that $\{\tau\le n\}\in\mathcal F_n$ for every $n\in\mathbb N$.
Fix $n\in\mathbb N$. If $\omega\in\{\tau\le n\}$, then $\tau(\omega)\le n$. By the definition of $\tau$, this means that some $k\in\{1,\dots,n\}$ satisfies $X_k(\omega)\ge a$. Hence
\begin{align*}
\omega\in\bigcup_{k=1}^n\{X_k\ge a\}.
\end{align*}
Conversely, if $\omega\in\bigcup_{k=1}^n\{X_k\ge a\}$, then for some $k\in\{1,\dots,n\}$ we have $X_k(\omega)\ge a$. Therefore the set $\{m\in\mathbb N:X_m(\omega)\ge a\}$ contains such a $k$, so its infimum is at most $k$, and hence
\begin{align*}
\tau(\omega)\le k\le n.
\end{align*}
Thus $\omega\in\{\tau\le n\}$. The two inclusions give
\begin{align*}
\{\tau\le n\}=\bigcup_{k=1}^n\{X_k\ge a\}.
\end{align*}
For each $k\le n$, adaptedness gives that $X_k$ is $\mathcal F_k$-measurable. Since $[a,\infty)$ is a Borel subset of $\mathbb R$,
\begin{align*}
\{X_k\ge a\}=X_k^{-1}([a,\infty))\in\mathcal F_k.
\end{align*}
Because $(\mathcal F_n)_{n\in\mathbb N}$ is a filtration and $k\le n$, we have
\begin{align*}
\mathcal F_k\subset\mathcal F_n.
\end{align*}
Therefore
\begin{align*}
\{X_k\ge a\}\in\mathcal F_n
\end{align*}
for every $k=1,\dots,n$. Since $\mathcal F_n$ is a sigma-algebra, it is closed under finite unions, so
\begin{align*}
\bigcup_{k=1}^n\{X_k\ge a\}\in\mathcal F_n.
\end{align*}
Using the event identity above, this gives
\begin{align*}
\{\tau\le n\}\in\mathcal F_n.
\end{align*}
Since this holds for every $n\in\mathbb N$, $\tau$ is a stopping time.
The rule stops at the first observed crossing of the level $a$, and whether it has stopped by time $n$ is determined entirely by the observed values $X_1,\dots,X_n$.
[/example]
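The stopping-time property of a first hitting time can be tested on a finite horizon. In the sketch below the process is a simple random walk observed up to time $N=4$ with level $a=2$ (both values are assumptions for illustration), and `decided_by_prefix` checks that the event $\{\tau\le n\}$ never differs between two paths that agree up to time $n$.

```python
from itertools import product

def first_hitting_time(path, a):
    """Smallest n (1-based) with path value >= a, or infinity if never reached."""
    for n, value in enumerate(path, start=1):
        if value >= a:
            return n
    return float("inf")

def decided_by_prefix(paths, tau, n):
    """True if the event {tau <= n} is determined by the first n observations."""
    verdict = {}
    for path in paths:
        stopped = tau(path) <= n
        if verdict.setdefault(path[:n], stopped) != stopped:
            return False
    return True

N, a = 4, 2
walks = [tuple(sum(steps[:k + 1]) for k in range(N))
         for steps in product([1, -1], repeat=N)]
tau = lambda path: first_hitting_time(path, a)

is_stopping_time = all(decided_by_prefix(walks, tau, n) for n in range(1, N + 1))
print(is_stopping_time)
```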
Not every natural-looking random time is a stopping time. The difference between first and last is decisive: first hitting can be recognized when it happens, while last hitting requires knowing that the event will never occur again.
[example: Last Visit Before a Horizon]
Let $(X_n)_{n=1}^N$ be a real-valued process with natural filtration $\mathcal F_n=\sigma(X_1,\dots,X_n)$, and fix $a\in\mathbb R$. Define
\begin{align*}
\rho=\sup\{m\in\{1,\dots,N\}:X_m\ge a\},
\end{align*}
with $\sup\varnothing=0$. For a fixed $n<N$, we identify the event $\{\rho\le n\}$.
First suppose $\omega\in\{\rho\le n\}$. If there were some $k\in\{n+1,\dots,N\}$ with $X_k(\omega)\ge a$, then $k$ would belong to the set
\begin{align*}
\{m\in\{1,\dots,N\}:X_m(\omega)\ge a\}.
\end{align*}
Since $k>n$, the supremum of that set would be at least $k$, hence greater than $n$, contradicting $\rho(\omega)\le n$. Therefore $X_k(\omega)<a$ for every $k=n+1,\dots,N$, so
\begin{align*}
\omega\in\bigcap_{k=n+1}^N\{X_k<a\}.
\end{align*}
Conversely, suppose
\begin{align*}
\omega\in\bigcap_{k=n+1}^N\{X_k<a\}.
\end{align*}
Then $X_k(\omega)<a$ for every $k=n+1,\dots,N$. Hence any index $m\in\{1,\dots,N\}$ satisfying $X_m(\omega)\ge a$ must satisfy $m\le n$. If there is at least one such index, its supremum is at most $n$; if there is none, then by convention $\rho(\omega)=0\le n$. Thus $\rho(\omega)\le n$, and so
\begin{align*}
\omega\in\{\rho\le n\}.
\end{align*}
The two inclusions prove
\begin{align*}
\{\rho\le n\}=\bigcap_{k=n+1}^N \{X_k<a\}.
\end{align*}
This event need not belong to $\mathcal F_n$. Take $N=2$, let $\Omega=\{\omega_1,\omega_2\}$, set $a=1$, and define
\begin{align*}
X_1(\omega_1)=X_1(\omega_2)=0.
\end{align*}
Define the second observation by
\begin{align*}
X_2(\omega_1)=2
\end{align*}
and
\begin{align*}
X_2(\omega_2)=0.
\end{align*}
Because $X_1$ is constant, its inverse images of Borel sets are only $\varnothing$ and $\Omega$: if $0\in B$, then $X_1^{-1}(B)=\Omega$, while if $0\notin B$, then $X_1^{-1}(B)=\varnothing$. Therefore
\begin{align*}
\mathcal F_1=\sigma(X_1)=\{\varnothing,\Omega\}.
\end{align*}
For $\omega_1$, we have
\begin{align*}
X_1(\omega_1)=0<1
\end{align*}
and
\begin{align*}
X_2(\omega_1)=2\ge 1.
\end{align*}
Thus the set of crossing times is $\{2\}$, so
\begin{align*}
\rho(\omega_1)=\sup\{2\}=2.
\end{align*}
For $\omega_2$, we have
\begin{align*}
X_1(\omega_2)=0<1
\end{align*}
and
\begin{align*}
X_2(\omega_2)=0<1.
\end{align*}
Thus the set of crossing times is empty, so
\begin{align*}
\rho(\omega_2)=0.
\end{align*}
Hence
\begin{align*}
\{\rho\le 1\}=\{\omega\in\Omega:\rho(\omega)\le 1\}=\{\omega_2\}.
\end{align*}
But $\{\omega_2\}$ is neither $\varnothing$ nor $\Omega=\{\omega_1,\omega_2\}$, so
\begin{align*}
\{\rho\le 1\}\notin\mathcal F_1.
\end{align*}
Therefore $\rho$ is not a stopping time with respect to the natural filtration.
The last visit time fails because deciding whether the last crossing has already occurred may require seeing that no later crossing will occur.
[/example]
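The two-outcome counterexample is small enough to check by hand, and the following sketch simply replays it. The dictionary keys `omega_1` and `omega_2` are labels for the two outcomes of the example.

```python
def last_visit(path, a):
    """Largest n (1-based) with path value >= a, or 0 if the level is never reached."""
    hits = [n for n, value in enumerate(path, start=1) if value >= a]
    return max(hits) if hits else 0

# X_1 = 0 on both outcomes, X_2 = 2 or 0; the level is a = 1.
paths = {"omega_1": (0, 2), "omega_2": (0, 0)}
a = 1
rho = {name: last_visit(path, a) for name, path in paths.items()}

# At time 1 both outcomes show the same observation X_1 = 0 ...
same_information = paths["omega_1"][:1] == paths["omega_2"][:1]
# ... yet they disagree on whether rho <= 1 has occurred.
disagree = (rho["omega_1"] <= 1) != (rho["omega_2"] <= 1)

print(rho, same_information, disagree)
```

Identical information together with different answers to "has the rule stopped by time $1$?" is exactly the failure of $\{\rho\le 1\}\in\mathcal F_1$.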
Stopping times allow random horizons while keeping the non-anticipation principle. To compare a process before and after such a random time, we need a construction that freezes the path when the rule stops.
[definition: Stopped Process]
Let $(E,\mathcal E)$ be a measurable space, let $T \subset [0,\infty)$, let $(X_t)_{t \in T}$ be a stochastic process with $X_t: \Omega \to E$, and let $\tau: \Omega \to [0,\infty]$ be a random time such that $\min\{t,\tau(\omega)\} \in T$ for every $t \in T$ and $\omega \in \Omega$. The stopped process is the stochastic process
\begin{align*}
X^\tau_t:\Omega\to E, \qquad X^\tau_t(\omega)=X_{\min\{t,\tau(\omega)\}}(\omega).
\end{align*}
[/definition]
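In discrete time the stopped process is a one-line transformation of a path. The sketch below assumes the index set $T=\{0,1,\dots,5\}$ and a sample path chosen for illustration; `stopped_path` is a hypothetical helper name.

```python
def stopped_path(path, tau):
    """Return (X_{min(t, tau)})_t for a path indexed by t = 0, 1, ..., len(path) - 1."""
    return [path[min(t, tau)] for t in range(len(path))]

path = [0, 1, 0, -1, -2, -1]

print(stopped_path(path, 3))              # frozen at the time-3 value from t = 3 onward
print(stopped_path(path, float("inf")))   # tau = infinity: the path is unchanged
```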
After stopping, the process is frozen at its value at the stopping time. The next question is whether this operation preserves observability: if the original process is visible over time and the stopping rule is legitimate, the stopped process should remain visible.
[quotetheorem:9907]
[citeproof:9907]
The theorem says that stopping an observable process at a legitimate random time produces another observable process. This is a structural fact, not a martingale statement; no independence or expectation hypothesis is involved.
## Martingales and Conditional Information
The deepest use of filtrations is in [conditional expectation](/page/Conditional%20Expectation). A [martingale](/page/Martingale) is not simply a process with constant mean. It is a process whose future conditional expectation, given current information, equals its present value.
[definition: Conditional Expectation Given a Sigma-Algebra]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space, let $X:\Omega\to\mathbb R$ be an integrable real-valued random variable, and let $\mathcal G \subset \mathcal F$ be a sub-sigma-algebra. A conditional expectation of $X$ given $\mathcal G$ is an integrable $\mathcal G$-measurable random variable $Y:\Omega\to\mathbb R$ such that
\begin{align*}
\int_A Y\,d\mathbb P = \int_A X\,d\mathbb P
\end{align*}
for every $A \in \mathcal G$.
[/definition]
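On a finite probability space the defining identity can be verified event by event. The sketch below assumes two fair coin tosses, takes $X$ to be the number of heads, and takes $\mathcal G$ to be generated by the first toss; the candidate $Y$ is the probability-weighted average of $X$ on each atom of $\mathcal G$.

```python
from fractions import Fraction
from itertools import combinations

omega = ["HH", "HT", "TH", "TT"]
prob = {w: Fraction(1, 4) for w in omega}
X = {w: w.count("H") for w in omega}          # number of heads

# G is generated by the first toss; its atoms are the two blocks below.
atoms = [["HH", "HT"], ["TH", "TT"]]

# Candidate Y: on each atom, the probability-weighted average of X.
Y = {}
for atom in atoms:
    mass = sum(prob[w] for w in atom)
    average = sum(X[w] * prob[w] for w in atom) / mass
    for w in atom:
        Y[w] = average

def integral(f, event):
    """Integral of f over a finite event with respect to prob."""
    return sum(f[w] * prob[w] for w in event)

# Every event of G is a union of atoms; check the defining identity on each.
events = [sum(chosen, []) for r in range(len(atoms) + 1)
          for chosen in combinations(atoms, r)]
identity_holds = all(integral(Y, A) == integral(X, A) for A in events)

print(Y["HH"], Y["TT"], identity_holds)
```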
The random variable $Y$ is the best $\mathcal G$-measurable summary of $X$ in the sense that it gives the same averages over all events distinguishable by $\mathcal G$. To formalize a fair evolving process, we need a condition saying that the present value is exactly the conditional forecast of every later value, using only the information currently available.
[definition: Martingale]
Let $(\Omega, \mathcal F, (\mathcal F_t)_{t \in T}, \mathbb P)$ be a filtered probability space. A real-valued process $(M_t)_{t \in T}$ with coordinate maps $M_t:\Omega\to\mathbb R$ is a martingale with respect to $(\mathcal F_t)_{t \in T}$ if $M_t$ is $\mathcal F_t$-measurable for every $t \in T$, if $\mathbb E[|M_t|] < \infty$ for every $t \in T$, and if $s \le t$ implies $\mathbb E[M_t \mid \mathcal F_s] = M_s$ almost surely.
[/definition]
The three clauses correspond to observability, integrability, and fairness relative to the filtration. Changing the filtration changes the first and third clauses, which is why filtrations are part of the definition rather than background decoration.
[example: Random Walk Martingale]
Let $(X_n)_{n \in \mathbb N}$ be independent random variables satisfying
\begin{align*}
\mathbb P(X_n = 1) = \mathbb P(X_n = -1) = \frac{1}{2},
\end{align*}
and define
\begin{align*}
S_n = \sum_{k=1}^n X_k, \qquad \mathcal F_n = \sigma(X_1,\dots,X_n).
\end{align*}
We show that $(S_n)_{n \in \mathbb N}$ is a martingale with respect to $(\mathcal F_n)_{n \in \mathbb N}$ by checking adaptedness, integrability, and the conditional expectation identity.
For each $n$, the map
\begin{align*}
f_n:\mathbb R^n\to\mathbb R,\qquad f_n(x_1,\dots,x_n)=x_1+\cdots+x_n
\end{align*}
is Borel measurable, and
\begin{align*}
S_n=f_n(X_1,\dots,X_n).
\end{align*}
Since $\mathcal F_n=\sigma(X_1,\dots,X_n)$, the vector $(X_1,\dots,X_n)$ is $\mathcal F_n$-measurable, so $S_n$ is $\mathcal F_n$-measurable. Thus the process is adapted.
For every outcome $\omega$, the value $X_k(\omega)$ is either $1$ or $-1$, so
\begin{align*}
|X_k(\omega)|=1.
\end{align*}
Hence $|X_k|=1$ as a random variable. By the triangle inequality,
\begin{align*}
|S_n|=\left|\sum_{k=1}^n X_k\right|\le \sum_{k=1}^n |X_k|.
\end{align*}
Substituting $|X_k|=1$ for each $k$ gives
\begin{align*}
\sum_{k=1}^n |X_k|=\sum_{k=1}^n 1=n.
\end{align*}
Therefore
\begin{align*}
|S_n|\le n.
\end{align*}
Taking expectations and using monotonicity of expectation,
\begin{align*}
\mathbb E[|S_n|]\le \mathbb E[n]=n<\infty.
\end{align*}
Thus every $S_n$ is integrable.
It remains to verify the martingale identity. For each $j$,
\begin{align*}
\mathbb E[X_j]=1\cdot \mathbb P(X_j=1)+(-1)\cdot \mathbb P(X_j=-1).
\end{align*}
Using the given probabilities,
\begin{align*}
\mathbb E[X_j]=1\cdot \frac{1}{2}+(-1)\cdot \frac{1}{2}.
\end{align*}
The two terms are
\begin{align*}
1\cdot \frac{1}{2}=\frac{1}{2}
\end{align*}
and
\begin{align*}
(-1)\cdot \frac{1}{2}=-\frac{1}{2}.
\end{align*}
Thus
\begin{align*}
\mathbb E[X_j]=\frac{1}{2}-\frac{1}{2}=0.
\end{align*}
Now fix $m<n$. From the definition of $S_n$,
\begin{align*}
S_n=\sum_{k=1}^n X_k.
\end{align*}
Split the sum at $m$:
\begin{align*}
S_n=\sum_{k=1}^m X_k+\sum_{j=m+1}^n X_j.
\end{align*}
Since $S_m=\sum_{k=1}^m X_k$, this becomes
\begin{align*}
S_n=S_m+\sum_{j=m+1}^n X_j.
\end{align*}
Taking conditional expectation with respect to $\mathcal F_m$ gives
\begin{align*}
\mathbb E[S_n\mid \mathcal F_m]=\mathbb E\left[S_m+\sum_{j=m+1}^n X_j\mid \mathcal F_m\right].
\end{align*}
By linearity of conditional expectation,
\begin{align*}
\mathbb E\left[S_m+\sum_{j=m+1}^n X_j\mid \mathcal F_m\right]=\mathbb E[S_m\mid \mathcal F_m]+\sum_{j=m+1}^n \mathbb E[X_j\mid \mathcal F_m].
\end{align*}
Because $S_m$ is $\mathcal F_m$-measurable,
\begin{align*}
\mathbb E[S_m\mid \mathcal F_m]=S_m.
\end{align*}
For $j>m$, the variable $X_j$ is independent of $X_1,\dots,X_m$, hence independent of $\mathcal F_m=\sigma(X_1,\dots,X_m)$. Therefore its conditional expectation given $\mathcal F_m$ is its ordinary expectation:
\begin{align*}
\mathbb E[X_j\mid \mathcal F_m]=\mathbb E[X_j].
\end{align*}
Using $\mathbb E[X_j]=0$,
\begin{align*}
\mathbb E[X_j\mid \mathcal F_m]=0.
\end{align*}
Substituting the known part and the centered increments,
\begin{align*}
\mathbb E[S_n\mid \mathcal F_m]=S_m+\sum_{j=m+1}^n 0.
\end{align*}
The sum has $n-m$ terms, each equal to $0$, so
\begin{align*}
\sum_{j=m+1}^n 0=0.
\end{align*}
Hence
\begin{align*}
\mathbb E[S_n\mid \mathcal F_m]=S_m.
\end{align*}
When $m=n$, the random variable $S_n$ is $\mathcal F_n$-measurable, so
\begin{align*}
\mathbb E[S_n\mid \mathcal F_n]=S_n.
\end{align*}
Thus for all $m\le n$,
\begin{align*}
\mathbb E[S_n\mid \mathcal F_m]=S_m.
\end{align*}
The simple symmetric random walk is therefore a martingale in its natural filtration: the past sum is visible, and every future increment has conditional mean zero given the past.
[/example]
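The identity $\mathbb E[S_n\mid\mathcal F_m]=S_m$ can also be checked by brute force for small $n$. The following is a minimal sketch, not part of the original argument: it assumes that conditioning on $\mathcal F_m$ amounts to averaging over all paths sharing the same first $m$ steps, which is valid here because every path of $n$ fair steps has the same probability $2^{-n}$. The function name `conditional_mean_of_sum` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction
from itertools import product


def conditional_mean_of_sum(n, m):
    """Return {prefix: E[S_n | X_1..X_m = prefix]} for the fair +/-1 walk."""
    totals = {}
    counts = {}
    # Every path of n fair +/-1 steps has the same probability 2**-n,
    # so a conditional expectation is a plain average over matching paths.
    for path in product((1, -1), repeat=n):
        prefix = path[:m]
        totals[prefix] = totals.get(prefix, 0) + sum(path)
        counts[prefix] = counts.get(prefix, 0) + 1
    return {p: Fraction(totals[p], counts[p]) for p in totals}


if __name__ == "__main__":
    n, m = 6, 3
    for prefix, value in sorted(conditional_mean_of_sum(n, m).items()):
        # sum(prefix) is S_m on the atom of F_m determined by the prefix.
        print(prefix, "S_m =", sum(prefix), "E[S_n | F_m] =", value)
```

Each printed line compares $S_m$ on one atom of $\mathcal F_m$ with the averaged value of $S_n$ on that atom; exact rational arithmetic avoids any rounding in the comparison.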
This computation is the prototype for martingale reasoning: separate the known part from the new increment, then condition on the present sigma-algebra. Once martingales are tied to filtrations, the next question is whether fair games remain fair when stopped at legitimate random times.
[quotetheorem:1153]
[citeproof:1153]
The bounded horizon is not cosmetic. Without conditions controlling the stopping time or the martingale, a strategy can wait for a rare favourable event and distort expectations. Filtrations prevent looking ahead, but they do not by themselves prevent unbounded waiting.
[example: Why Boundedness Matters]
Let $(S_n)_{n \in \mathbb N_0}$ be the simple symmetric random walk with $S_0=0$, so $S_n=X_1+\cdots+X_n$ where the increments are independent and satisfy $\mathbb P(X_j=1)=\mathbb P(X_j=-1)=1/2$. Let its natural filtration be $\mathcal F_n=\sigma(S_0,\dots,S_n)$, and define
\begin{align*}
\tau = \inf\{n \in \mathbb N : S_n = 1\}.
\end{align*}
For each $n\in\mathbb N$, the event $\{\tau\le n\}$ occurs exactly when the walk reaches $1$ at one of the times $1,\dots,n$. Thus
\begin{align*}
\{\tau\le n\}=\bigcup_{k=1}^n\{S_k=1\}.
\end{align*}
For $k\le n$, the random variable $S_k$ is $\mathcal F_k$-measurable by definition of the natural filtration, and $\mathcal F_k\subset\mathcal F_n$ because $(\mathcal F_n)$ is increasing. Therefore $\{S_k=1\}\in\mathcal F_n$ for every $k=1,\dots,n$. Since $\mathcal F_n$ is closed under finite unions,
\begin{align*}
\bigcup_{k=1}^n\{S_k=1\}\in\mathcal F_n.
\end{align*}
Hence $\{\tau\le n\}\in\mathcal F_n$ for every $n$, so $\tau$ is a stopping time.
We now prove that $\tau<\infty$ almost surely. Fix $m\ge 1$, define
\begin{align*}
\sigma_m=\inf\{n\in\mathbb N:S_n=-m\},
\end{align*}
and consider the walk until it first reaches either $1$ or $-m$. For $i\in\{-m,-m+1,\dots,1\}$, let $q_i$ be the probability, starting from $i$, of hitting $1$ before hitting $-m$. The boundary values are
\begin{align*}
q_{-m}=0
\end{align*}
and
\begin{align*}
q_1=1.
\end{align*}
If $-m<i<1$, then after one step the walk is at $i+1$ with probability $1/2$ and at $i-1$ with probability $1/2$, so conditioning on the first step gives
\begin{align*}
q_i=\frac{1}{2}q_{i+1}+\frac{1}{2}q_{i-1}.
\end{align*}
Multiplying by $2$ gives
\begin{align*}
2q_i=q_{i+1}+q_{i-1}.
\end{align*}
Subtracting $q_i+q_{i-1}$ from both sides gives
\begin{align*}
q_i-q_{i-1}=q_{i+1}-q_i.
\end{align*}
Thus the successive differences are constant. If this common difference is $a$, then moving from $0$ to $i$ in unit steps gives
\begin{align*}
q_i=q_0+ai.
\end{align*}
Writing $b=q_0$, we have $q_i=ai+b$ for constants $a,b$.
The boundary value at $-m$ gives
\begin{align*}
q_{-m}=a(-m)+b=-am+b=0.
\end{align*}
The boundary value at $1$ gives
\begin{align*}
q_1=a+b=1.
\end{align*}
From $-am+b=0$, we get
\begin{align*}
b=am.
\end{align*}
Substituting this into $a+b=1$ gives
\begin{align*}
a+am=1.
\end{align*}
Factoring $a$ gives
\begin{align*}
a(m+1)=1.
\end{align*}
Therefore
\begin{align*}
a=\frac{1}{m+1}.
\end{align*}
Since $b=am$,
\begin{align*}
b=\frac{m}{m+1}.
\end{align*}
Starting from $0$, this yields
\begin{align*}
\mathbb P(\tau<\sigma_m)=q_0=b=\frac{m}{m+1}.
\end{align*}
Because $\{\tau<\sigma_m\}\subset\{\tau<\infty\}$,
\begin{align*}
\mathbb P(\tau<\infty)\ge \mathbb P(\tau<\sigma_m).
\end{align*}
Substituting the value just computed,
\begin{align*}
\mathbb P(\tau<\infty)\ge \frac{m}{m+1}.
\end{align*}
This holds for every $m\ge 1$. Since
\begin{align*}
\frac{m}{m+1}=1-\frac{1}{m+1},
\end{align*}
and $\frac{1}{m+1}\to 0$, we have $\frac{m}{m+1}\to 1$. Also $\mathbb P(\tau<\infty)\le 1$, so
\begin{align*}
\mathbb P(\tau<\infty)=1.
\end{align*}
On the event $\{\tau<\infty\}$, the definition of $\tau$ gives
\begin{align*}
S_\tau=1.
\end{align*}
Since this event has probability $1$, $S_\tau$ equals the constant random variable $1$ almost surely, and therefore
\begin{align*}
\mathbb E[S_\tau]=1.
\end{align*}
Also $S_0=0$, so
\begin{align*}
\mathbb E[S_0]=0.
\end{align*}
Thus
\begin{align*}
\mathbb E[S_\tau]=1\ne 0=\mathbb E[S_0].
\end{align*}
This does not contradict optional stopping for bounded stopping times, because $\tau$ is not bounded. For every $N\ge 1$, consider the event that the first $N$ increments are all equal to $-1$:
\begin{align*}
A_N=\{X_1=-1,\dots,X_N=-1\}.
\end{align*}
By independence of the increments,
\begin{align*}
\mathbb P(A_N)=\prod_{j=1}^N \mathbb P(X_j=-1).
\end{align*}
Since each factor is $1/2$,
\begin{align*}
\mathbb P(A_N)=\prod_{j=1}^N \frac{1}{2}.
\end{align*}
The product has $N$ identical factors, so
\begin{align*}
\mathbb P(A_N)=2^{-N}.
\end{align*}
This probability is positive. On $A_N$, for every $1\le k\le N$,
\begin{align*}
S_k=X_1+\cdots+X_k=(-1)+\cdots+(-1)=-k.
\end{align*}
Hence $S_k\ne 1$ for all $1\le k\le N$, so
\begin{align*}
\tau>N
\end{align*}
on $A_N$. Since this happens with positive probability for every $N$, no deterministic finite bound can hold for $\tau$. The failure comes from unbounded waiting, not from looking ahead.
[/example]
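The contrast between the bounded and unbounded cases can be made concrete by computing the exact law of the walk stopped at $\tau\wedge N$. The sketch below is an illustration under stated assumptions rather than part of the proof: the helper names `stopped_walk_law` and `mean` are hypothetical, and the computation simply propagates probability mass step by step, freezing any mass that reaches $1$.

```python
from fractions import Fraction


def stopped_walk_law(horizon):
    """Exact law of S_{min(tau, horizon)}, where tau is the first hit of +1."""
    half = Fraction(1, 2)
    alive = {0: Fraction(1)}  # positions of paths that have not yet hit +1
    stopped = Fraction(0)     # probability mass already frozen at +1
    for _ in range(horizon):
        nxt = {}
        for pos, prob in alive.items():
            for step in (1, -1):
                new = pos + step
                if new == 1:
                    stopped += prob * half  # the path stops here for good
                else:
                    nxt[new] = nxt.get(new, Fraction(0)) + prob * half
        alive = nxt
    law = dict(alive)
    law[1] = law.get(1, Fraction(0)) + stopped
    return law


def mean(law):
    """Expectation of a finite law given as {value: probability}."""
    return sum(pos * prob for pos, prob in law.items())


if __name__ == "__main__":
    for horizon in (1, 5, 20, 100):
        law = stopped_walk_law(horizon)
        # law[1] equals P(tau <= horizon); the mean is E[S_{tau ^ horizon}].
        print(horizon, float(law[1]), mean(law))
```

For each fixed horizon the mean of the stopped walk stays at $0$, in line with optional stopping for the bounded stopping time $\tau\wedge N$, while the mass at $1$ grows with the horizon. The balance is kept by small-probability paths that sit far below $0$.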
The lesson is that filtrations handle information, while integrability and boundedness handle size. Both are needed for reliable stopping theorems.
## Continuous Time and Regularity of Information
In continuous time, the index set is usually $[0,\infty)$, and subtle questions arise because no time has an immediate predecessor or successor. Events may become visible exactly at a time, just before it, or just after it. Regularity assumptions on filtrations control these boundary effects.
[definition: Right-Continuous Filtration]
Let $(\mathcal F_t)_{t \ge 0}$ be a filtration on $(\Omega, \mathcal F, \mathbb P)$. It is right-continuous if
\begin{align*}
\mathcal F_t = \bigcap_{s>t} \mathcal F_s
\end{align*}
for every $t \ge 0$.
[/definition]
Right-continuity says that no new information appears immediately after time $t$ without already being present at time $t$. Another boundary issue comes from null sets: probability often identifies events that differ only on null sets, so we need the filtration to contain all negligible refinements that the ambient completed model treats as invisible.
[definition: Complete Filtration]
Let $(\Omega, \mathcal F, \mathbb P)$ be a complete probability space. A filtration $(\mathcal F_t)_{t \in T}$ is complete if, whenever $A \in \mathcal F$ satisfies $\mathbb P(A)=0$ and $B \subset A$, one has $B \in \mathcal F_t$ for every $t \in T$.
[/definition]
Completeness ensures that events differing only on null sets do not fall outside the information structure. Since continuous-time stochastic processes are often compared up to indistinguishability or almost-sure equality, the standard framework needs both null-set closure and right-continuity built into a single reusable hypothesis.
[definition: Usual Conditions]
A filtered probability space $(\Omega, \mathcal F, (\mathcal F_t)_{t \ge 0}, \mathbb P)$ satisfies the usual conditions if $(\Omega,\mathcal F,\mathbb P)$ is complete, every subset of a $\mathbb P$-null event in $\mathcal F$ belongs to $\mathcal F_0$, and $(\mathcal F_t)_{t \ge 0}$ is right-continuous.
[/definition]
These conditions make the filtration stable under the equivalence relations used in probability. They also align the abstract filtration with the pathwise intuition that information available at time $t$ includes events that can be resolved by looking arbitrarily soon after $t$.
[example: Brownian Filtration]
Let $(W_t)_{t \ge 0}$ be a standard [Brownian motion](/page/Brownian%20Motion) on a probability space, and pass to the completed ambient space $(\Omega,\overline{\mathcal F},\overline{\mathbb P})$. The natural Brownian information at time $t$ is the sigma-algebra generated by all path coordinates observed up to time $t$:
\begin{align*}
\mathcal F_t^W=\sigma(W_s:0\le s\le t).
\end{align*}
If $s\le t$, then every index $r$ satisfying $0\le r\le s$ also satisfies $0\le r\le t$. Hence every coordinate map used to generate $\mathcal F_s^W$ is also one of the coordinate maps used to generate $\mathcal F_t^W$. Since $\mathcal F_t^W$ is a sigma-algebra containing all the generators of $\mathcal F_s^W$, it contains the sigma-algebra generated by them:
\begin{align*}
\mathcal F_s^W\subset \mathcal F_t^W.
\end{align*}
Thus $(\mathcal F_t^W)_{t\ge0}$ is a filtration.
Let $\mathcal N$ be the collection of all subsets of $\overline{\mathbb P}$-null events in $\overline{\mathcal F}$. For each $u\ge0$, define
\begin{align*}
\mathcal F_u^W\vee\mathcal N=\sigma(\mathcal F_u^W\cup\mathcal N).
\end{align*}
Because $\mathcal F_u^W\subset\overline{\mathcal F}$ and every element of $\mathcal N$ belongs to $\overline{\mathcal F}$ by completeness of the ambient space, the generated sigma-algebra is a sub-sigma-algebra of $\overline{\mathcal F}$:
\begin{align*}
\mathcal F_u^W\vee\mathcal N\subset\overline{\mathcal F}.
\end{align*}
Define the augmented Brownian filtration by
\begin{align*}
\mathcal G_t=\bigcap_{u>t}(\mathcal F_u^W\vee\mathcal N).
\end{align*}
Thus an event $A$ belongs to $\mathcal G_t$ exactly when, for every $u>t$, one has
\begin{align*}
A\in\mathcal F_u^W\vee\mathcal N.
\end{align*}
We first check that $(\mathcal G_t)_{t\ge0}$ is increasing. Let $s\le t$ and let $A\in\mathcal G_s$. By the definition of $\mathcal G_s$, for every $u>s$,
\begin{align*}
A\in\mathcal F_u^W\vee\mathcal N.
\end{align*}
Now take any $v>t$. Since $s\le t<v$, we have $v>s$, so the previous statement with $u=v$ gives
\begin{align*}
A\in\mathcal F_v^W\vee\mathcal N.
\end{align*}
This holds for every $v>t$, hence
\begin{align*}
A\in\bigcap_{v>t}(\mathcal F_v^W\vee\mathcal N)=\mathcal G_t.
\end{align*}
Therefore
\begin{align*}
\mathcal G_s\subset\mathcal G_t.
\end{align*}
The filtration contains all null subsets at every time. Indeed, if $B\in\mathcal N$, then $B\in\mathcal F_u^W\cup\mathcal N$ for every $u>t$. Since a generated sigma-algebra contains the collection that generates it,
\begin{align*}
B\in\sigma(\mathcal F_u^W\cup\mathcal N)=\mathcal F_u^W\vee\mathcal N
\end{align*}
for every $u>t$. Therefore
\begin{align*}
B\in\bigcap_{u>t}(\mathcal F_u^W\vee\mathcal N)=\mathcal G_t.
\end{align*}
So every subset of a $\overline{\mathbb P}$-null event belongs to every $\mathcal G_t$.
Finally we verify right-continuity. Since $(\mathcal G_t)$ is increasing, if $A\in\mathcal G_t$ and $r>t$, then
\begin{align*}
A\in\mathcal G_r.
\end{align*}
Hence
\begin{align*}
\mathcal G_t\subset\bigcap_{r>t}\mathcal G_r.
\end{align*}
For the reverse inclusion, suppose $A\in\bigcap_{r>t}\mathcal G_r$. Fix $u>t$. Choose a number $r$ with
\begin{align*}
t<r<u.
\end{align*}
Since $A\in\mathcal G_r$, the definition of $\mathcal G_r$ gives
\begin{align*}
A\in\mathcal F_q^W\vee\mathcal N
\end{align*}
for every $q>r$. Taking $q=u$, which is allowed because $u>r$, gives
\begin{align*}
A\in\mathcal F_u^W\vee\mathcal N.
\end{align*}
This holds for every $u>t$, so
\begin{align*}
A\in\bigcap_{u>t}(\mathcal F_u^W\vee\mathcal N)=\mathcal G_t.
\end{align*}
Thus
\begin{align*}
\bigcap_{r>t}\mathcal G_r\subset\mathcal G_t.
\end{align*}
Combining the two inclusions,
\begin{align*}
\mathcal G_t=\bigcap_{r>t}\mathcal G_r.
\end{align*}
The augmented filtration therefore records the Brownian path information available up to arbitrarily small times after $t$, includes all null-set refinements, and is right-continuous. This is the standard information structure used for Brownian martingales and Itô integrals.
[/example]
The passage from natural to augmented filtration is a modelling convention with mathematical force. It completes the null sets and enforces right-continuity, while preserving the intended Brownian observation process as the source of information.
## Independence and New Information
A filtration separates old information from new information. Independence statements then express that future noise carries no hidden dependence on the past. This is the mechanism behind random walks, Brownian motion, Markov processes, and martingale increments.
[definition: Independence of a Random Variable and a Sigma-Algebra]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space, let $X: \Omega \to E$ be a random variable with values in a measurable space $(E,\mathcal E)$, and let $\mathcal G \subset \mathcal F$ be a sub-sigma-algebra. The random variable $X$ is independent of $\mathcal G$ if $\sigma(X)$ and $\mathcal G$ are independent sigma-algebras.
[/definition]
This definition is stronger than saying that $X$ is uncorrelated with a few chosen random variables. It says that every event determined by $X$ is independent of every event already contained in $\mathcal G$.
[definition: Independent Increments Relative to a Filtration]
Let $(X_t)_{t \ge 0}$ be a real-valued stochastic process with coordinate maps $X_t:\Omega\to\mathbb R$, adapted to a filtration $(\mathcal F_t)_{t \ge 0}$. The process has independent increments relative to $(\mathcal F_t)_{t \ge 0}$ if, for every $0 \le s \le t$, the increment $X_t-X_s$ is independent of $\mathcal F_s$.
[/definition]
The phrase relative to a filtration is important. A process may have independent increments relative to its natural filtration but fail to have them after the filtration is enlarged by future information. The next result explains why this definition is useful: once the new increments are independent of the past and have mean zero, conditional forecasts collapse to the present value.
[quotetheorem:9908]
[citeproof:9908]
The theorem captures the common route from noise models to martingales. Adaptedness says the process is observable, independence says future increments are new, and centering says the new information has no conditional drift.
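The dependence on the chosen filtration can be seen in a small discrete-time analogue. This is a toy sketch under assumptions that are not in the text above: it uses three fair $\pm1$ steps instead of a continuous-time process, compares the natural information $\sigma(X_1)$ with an enlargement that also reveals the future value $S_3$, and uses hypothetical helper names (`conditional_mean`, `increment`, `natural`, `enlarged`).

```python
from fractions import Fraction
from itertools import product


def conditional_mean(value, info, n=3):
    """E[value(path) | info(path)] on the uniform space of n fair +/-1 steps."""
    totals, counts = {}, {}
    for path in product((1, -1), repeat=n):
        key = info(path)  # paths with equal keys lie in the same atom
        totals[key] = totals.get(key, 0) + value(path)
        counts[key] = counts.get(key, 0) + 1
    return {k: Fraction(totals[k], counts[k]) for k in totals}


def increment(path):
    return path[1]  # X_2 = S_2 - S_1


def natural(path):
    return path[:1]  # information in F_1 = sigma(X_1)


def enlarged(path):
    return (path[0], sum(path))  # F_1 enlarged by the future value S_3


if __name__ == "__main__":
    print("natural :", conditional_mean(increment, natural))
    print("enlarged:", conditional_mean(increment, enlarged))
```

Relative to the natural information the conditional mean of the increment is $0$ on every atom. After the enlargement it is no longer constant, so the increment cannot be independent of the enlarged sigma-algebra, and the martingale conclusion is lost for that filtration.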
[example: Poisson Compensation]
Let $(N_t)_{t \ge 0}$ be a Poisson process of rate $\lambda>0$, and let its natural filtration be $\mathcal F_t=\sigma(N_u:0\le u\le t)$. For $0\le s\le t$, the increment $N_t-N_s$ is independent of $\mathcal F_s$ and is Poisson distributed with parameter $\lambda(t-s)$, so its mean is
\begin{align*}
\mathbb E[N_t-N_s]=\lambda(t-s).
\end{align*}
We first compute the conditional forecast of $N_t$ from time $s$. Since
\begin{align*}
N_t=N_s+(N_t-N_s),
\end{align*}
we have
\begin{align*}
\mathbb E[N_t\mid\mathcal F_s]=\mathbb E[N_s+(N_t-N_s)\mid\mathcal F_s].
\end{align*}
By linearity of conditional expectation,
\begin{align*}
\mathbb E[N_s+(N_t-N_s)\mid\mathcal F_s]=\mathbb E[N_s\mid\mathcal F_s]+\mathbb E[N_t-N_s\mid\mathcal F_s].
\end{align*}
Because $N_s$ is $\mathcal F_s$-measurable,
\begin{align*}
\mathbb E[N_s\mid\mathcal F_s]=N_s.
\end{align*}
Because $N_t-N_s$ is independent of $\mathcal F_s$,
\begin{align*}
\mathbb E[N_t-N_s\mid\mathcal F_s]=\mathbb E[N_t-N_s].
\end{align*}
Using the Poisson mean computed above,
\begin{align*}
\mathbb E[N_t-N_s\mid\mathcal F_s]=\lambda(t-s).
\end{align*}
Substituting the two terms gives
\begin{align*}
\mathbb E[N_t\mid\mathcal F_s]=N_s+\lambda(t-s).
\end{align*}
If $t>s$, then $t-s>0$, and since $\lambda>0$,
\begin{align*}
\lambda(t-s)>0.
\end{align*}
Thus
\begin{align*}
N_s+\lambda(t-s)\ne N_s.
\end{align*}
So $(N_t)_{t\ge0}$ is not a martingale with respect to its natural filtration.
Now define the compensated process
\begin{align*}
M_t=N_t-\lambda t.
\end{align*}
For each $t$, $N_t$ is $\mathcal F_t$-measurable by the definition of $\mathcal F_t$, and $\lambda t$ is deterministic, so $M_t$ is $\mathcal F_t$-measurable.
The process is integrable. By the triangle inequality,
\begin{align*}
|M_t|=|N_t-\lambda t|\le |N_t|+|\lambda t|.
\end{align*}
Because $N_t\ge0$, $|N_t|=N_t$, and because $\lambda>0$ and $t\ge0$, $|\lambda t|=\lambda t$. Hence
\begin{align*}
|M_t|\le N_t+\lambda t.
\end{align*}
Taking expectations and using monotonicity and linearity,
\begin{align*}
\mathbb E[|M_t|]\le \mathbb E[N_t]+\mathbb E[\lambda t].
\end{align*}
The Poisson mean gives $\mathbb E[N_t]=\lambda t$, and the constant $\lambda t$ has expectation $\lambda t$, so
\begin{align*}
\mathbb E[|M_t|]\le \lambda t+\lambda t.
\end{align*}
Therefore
\begin{align*}
\mathbb E[|M_t|]\le 2\lambda t<\infty.
\end{align*}
It remains to check the martingale identity. For $0\le s\le t$,
\begin{align*}
M_t=N_t-\lambda t.
\end{align*}
Using $N_t=N_s+(N_t-N_s)$ gives
\begin{align*}
M_t=N_s+(N_t-N_s)-\lambda t.
\end{align*}
Since
\begin{align*}
\lambda t=\lambda s+\lambda(t-s),
\end{align*}
we get
\begin{align*}
M_t=N_s+(N_t-N_s)-\lambda s-\lambda(t-s).
\end{align*}
Grouping $N_s-\lambda s=M_s$ yields
\begin{align*}
M_t=M_s+(N_t-N_s)-\lambda(t-s).
\end{align*}
Taking conditional expectation with respect to $\mathcal F_s$,
\begin{align*}
\mathbb E[M_t\mid\mathcal F_s]=\mathbb E[M_s+(N_t-N_s)-\lambda(t-s)\mid\mathcal F_s].
\end{align*}
By linearity of conditional expectation,
\begin{align*}
\mathbb E[M_s+(N_t-N_s)-\lambda(t-s)\mid\mathcal F_s]=\mathbb E[M_s\mid\mathcal F_s]+\mathbb E[N_t-N_s\mid\mathcal F_s]-\mathbb E[\lambda(t-s)\mid\mathcal F_s].
\end{align*}
Because $M_s$ is $\mathcal F_s$-measurable,
\begin{align*}
\mathbb E[M_s\mid\mathcal F_s]=M_s.
\end{align*}
Because $\lambda(t-s)$ is deterministic,
\begin{align*}
\mathbb E[\lambda(t-s)\mid\mathcal F_s]=\lambda(t-s).
\end{align*}
As above, independence of $N_t-N_s$ from $\mathcal F_s$ and the Poisson mean formula give
\begin{align*}
\mathbb E[N_t-N_s\mid\mathcal F_s]=\lambda(t-s).
\end{align*}
Substituting these three identities,
\begin{align*}
\mathbb E[M_t\mid\mathcal F_s]=M_s+\lambda(t-s)-\lambda(t-s).
\end{align*}
The last two terms cancel:
\begin{align*}
\lambda(t-s)-\lambda(t-s)=0.
\end{align*}
Therefore
\begin{align*}
\mathbb E[M_t\mid\mathcal F_s]=M_s.
\end{align*}
The deterministic drift $\lambda t$ has been subtracted exactly, leaving a process whose future conditional expectation equals its present value.
[/example]
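A quick simulation can illustrate the difference between the raw and compensated increments. The sketch below rests on assumptions that are not stated in the example: it builds Poisson paths from independent exponential gaps of rate $\lambda$ (a standard construction, used here without proof), it fixes arbitrary illustrative values $\lambda=2$, $s=1$, $t=2.5$, and the function names `poisson_counts` and `drift_check` are hypothetical. It is a Monte Carlo estimate, so the printed values are only approximate.

```python
import random


def poisson_counts(lam, s, t, rng):
    """Return (N_s, N_t) for one path built from exponential gaps of rate lam."""
    clock, n_s, n_t = 0.0, 0, 0
    while True:
        clock += rng.expovariate(lam)  # waiting time until the next jump
        if clock > t:
            return n_s, n_t
        n_t += 1
        if clock <= s:
            n_s += 1


def drift_check(lam=2.0, s=1.0, t=2.5, paths=20000, seed=0):
    """Estimate E[N_t - N_s], E[M_t - M_s] and E[(M_t - M_s) M_s]."""
    rng = random.Random(seed)
    raw = comp = cross = 0.0
    for _ in range(paths):
        n_s, n_t = poisson_counts(lam, s, t, rng)
        m_s = n_s - lam * s            # compensated value at time s
        d = (n_t - lam * t) - m_s      # compensated increment M_t - M_s
        raw += n_t - n_s
        comp += d
        cross += d * m_s
    return raw / paths, comp / paths, cross / paths


if __name__ == "__main__":
    raw, comp, cross = drift_check()
    print("mean of N_t - N_s      :", raw)
    print("mean of M_t - M_s      :", comp)
    print("mean of (M_t - M_s)M_s :", cross)
```

The first estimate should sit near $\lambda(t-s)$, reflecting the drift of $N$, while the other two should be close to $0$: the compensated increment has mean zero and is uncorrelated with the past value $M_s$, as the martingale identity requires.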
This example shows the role of the filtration in distinguishing drift from surprise. The deterministic term $\lambda t$ is the predictable trend, while $M_t$ is the fluctuation that remains after the trend is removed and has conditional mean zero given the past.
## Beyond and Connected Topics
Filtrations become more delicate in continuous-time martingale theory; the course notes [Cambridge III Advanced Probability](/page/Cambridge%20III%20Advanced%20Probability) use them to organize predictable and optional sigma-algebras, which formalize which random objects can be used as integrands or observed at random times. They also connect directly to stochastic-process theory, where the natural filtration records the information used in the Markov property.
The same language is foundational in [stochastic calculus](/page/Stochastic%20Calculus). There, the usual conditions are imposed before defining many standard constructions, and adapted or predictable processes specify which integrands are allowed to depend on the past without seeing future noise; the course notes [Cambridge III Stochastic Calculus and Applications](/page/Cambridge%20III%20Stochastic%20Calculus%20and%20Applications) continue this viewpoint in a setting where filtrations control admissible stochastic integration.
## References
Androma, [Cambridge III Advanced Probability](/page/Cambridge%20III%20Advanced%20Probability).
Androma, [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure).
Androma, [Cambridge III Stochastic Calculus and Applications](/page/Cambridge%20III%20Stochastic%20Calculus%20and%20Applications).
Androma, [Cambridge IA Probability](/page/Cambridge%20IA%20Probability).
[Conditional Expectation](/page/Conditional%20Expectation).
[Martingale](/page/Martingale).
David Williams, *Probability with Martingales* (1991).
Daniel Revuz and Marc Yor, *Continuous Martingales and Brownian Motion* (1999).
Ioannis Karatzas and Steven Shreve, *Brownian Motion and Stochastic Calculus* (1991).
Rick Durrett, *Probability: Theory and Examples* (2019).