A gambler watches a fortune rise and fall after each play of a fair game. The expected gain from the next play is zero, but this local fairness does not say what happens after many plays, after a rule for stopping, or after passing to a limit. If the gambler quits the first time the fortune reaches a large target, the stopping rule depends on the history of the game. If the fortune is observed continuously rather than at integer times, the same question becomes even sharper: how can a process fluctuate at every scale and still have no predictable drift?
Martingales are the language for this kind of fairness under evolving information. They do not say that a random process is constant. They say that, after all information currently available has been used, the best conditional prediction of the future value is the present value. This single idea controls random walks, likelihood ratios, stochastic integrals, Brownian motion, branching processes, and many limiting arguments in probability.
The point is not merely philosophical. Many naive arguments about fair games fail because they ignore the information contained in the stopping rule.
[example: Doubling Strategy Failure]
**SETUP.** Let $X_1, X_2, \dots$ be i.i.d. with $\mathbb P(X_i = 1) = \mathbb P(X_i = -1) = 1/2$. Define
\begin{align*}
S_n &= \sum_{i=1}^n X_i, \qquad S_0 = 0, \qquad \mathcal F_n = \sigma(X_1, \dots, X_n),
\end{align*}
and the stopping time
\begin{align*}
\tau &= \inf\{n \ge 0 : S_n = 1\}.
\end{align*}
The capital $S_n$ represents the gambler's fortune after $n$ plays of a fair $\pm 1$ game. The rule $\tau$ says: quit the first time the fortune reaches 1.
**CLAIM.** The process $(S_n)_{n \ge 0}$ is a martingale with $\mathbb E[S_0] = 0$, yet $\mathbb E[S_\tau] = 1$. The apparent contradiction does not arise from any bias in the game. It arises because $\tau$ is unbounded, so the [Optional Stopping Theorem](/theorems/1153) does not apply.
**DERIVATION.**
**Step 1: $(S_n)_{n \ge 0}$ is a martingale.** For $m \ge n \ge 0$, write $S_m = S_n + \sum_{i=n+1}^m X_i$ and condition on $\mathcal F_n$:
\begin{align*}
\mathbb E[S_m \mid \mathcal F_n]
&= \mathbb E\!\left[S_n + \sum_{i=n+1}^m X_i \,\middle|\, \mathcal F_n\right] \\
&= S_n + \sum_{i=n+1}^m \mathbb E[X_i \mid \mathcal F_n] \\
&= S_n + \sum_{i=n+1}^m \mathbb E[X_i] \\
&= S_n + \sum_{i=n+1}^m 0 \;=\; S_n.
\end{align*}
The second equality uses linearity and the fact that $S_n$ is $\mathcal F_n$-measurable. The third uses that $X_i$ is independent of $\mathcal F_n = \sigma(X_1,\dots,X_n)$ for every $i > n$, so $\mathbb E[X_i \mid \mathcal F_n] = \mathbb E[X_i]$. The fourth uses $\mathbb E[X_i] = \tfrac{1}{2}(1) + \tfrac{1}{2}(-1) = 0$.
**Step 2: $\tau$ is a stopping time.** For every $n \ge 0$,
\begin{align*}
\{\tau \le n\} &= \{S_0 = 1\} \cup \{S_1 = 1\} \cup \cdots \cup \{S_n = 1\}.
\end{align*}
Each $\{S_k = 1\}$ is $\mathcal F_k$-measurable, and $\mathcal F_k \subseteq \mathcal F_n$ for $k \le n$, so the union belongs to $\mathcal F_n$.
**Step 3: $\mathbb P(\tau < \infty) = 1$.** Fix an integer $N \ge 1$ and define the auxiliary stopping time $\tau_N = \inf\{n \ge 0 : S_n = 1 \text{ or } S_n = -N\}$. Until $\tau_N$ the walk is confined to $\{-N, -N{+}1, \dots, 1\}$, a finite set, so $\tau_N < \infty$ a.s. Set $Y_n = S_n + N$; then $Y_0 = N$, $Y$ moves $\pm 1$ at each step, and $\tau_N$ is the first time $Y$ hits $N{+}1$ (corresponding to $S = 1$) or $0$ (corresponding to $S = -N$). Applying [Gambler's Ruin Probability](/theorems/1125) with starting position $N$ and absorbing barriers at $0$ and $N{+}1$:
\begin{align*}
\mathbb P(S_{\tau_N} = 1) &= \frac{N}{N+1}.
\end{align*}
Since $\{S_{\tau_N} = 1\} \subseteq \{\tau \le \tau_N\} \subseteq \{\tau < \infty\}$, we have $\mathbb P(\tau < \infty) \ge N/(N+1)$ for every $N \ge 1$. Letting $N \to \infty$:
\begin{align*}
\mathbb P(\tau < \infty) &= 1.
\end{align*}
**Step 4: $S_\tau = 1$ almost surely.** By definition of $\tau$ as the first time the walk visits 1, $S_\tau = 1$ on $\{\tau < \infty\}$. Since $\mathbb P(\tau < \infty) = 1$, we conclude $S_\tau = 1$ a.s.
**Step 5: $\mathbb E[S_\tau] = 1 \ne 0 = \mathbb E[S_0]$.** From Step 4:
\begin{align*}
\mathbb E[S_\tau] &= 1 \cdot \mathbb P(\tau < \infty) = 1.
\end{align*}
Since $S_0 = 0$ deterministically, $\mathbb E[S_0] = 0$.
**Step 6: The [Optional Stopping Theorem](/theorems/1153) does not apply because $\tau$ is not bounded.** The [Optional Stopping Theorem](/theorems/1153) requires that the stopping times be bounded above by a deterministic constant. For every $n \ge 1$, the event $\{X_1 = -1, X_2 = -1, \dots, X_n = -1\}$ has probability $2^{-n} > 0$. On this event $S_k = -k \le 0 < 1$ for $k = 1, \dots, n$, and $S_0 = 0 \ne 1$, so $\tau > n$. Since $\mathbb P(\tau > n) \ge 2^{-n} > 0$ for every $n$, the stopping time $\tau$ is not bounded by any deterministic constant.
**Step 7: The stopped family is not uniformly integrable, and the escaping mass is explicit.** For each fixed $n \ge 0$, the time $n \wedge \tau$ is bounded by $n$, so by the [Optional Stopping Theorem](/theorems/1153):
\begin{align*}
\mathbb E[S_{n \wedge \tau}] &= \mathbb E[S_0] = 0.
\end{align*}
Decompose according to whether $\tau \le n$ or not, using $S_{n \wedge \tau} = S_\tau \mathbf 1_{\{\tau \le n\}} + S_n \mathbf 1_{\{\tau > n\}}$:
\begin{align*}
0 &= \mathbb E[S_\tau \mathbf 1_{\{\tau \le n\}}] + \mathbb E[S_n \mathbf 1_{\{\tau > n\}}] \\
&= \mathbb P(\tau \le n) + \mathbb E[S_n \mathbf 1_{\{\tau > n\}}],
\end{align*}
where the second line uses $S_\tau = 1$ a.s. Rearranging:
\begin{align*}
\mathbb E[S_n \mathbf 1_{\{\tau > n\}}] &= -\mathbb P(\tau \le n).
\end{align*}
As $n \to \infty$, $\mathbb P(\tau \le n) \to 1$, so $\mathbb E[S_n \mathbf 1_{\{\tau > n\}}] \to -1$. On the event $\{\tau > n\}$ the walk has never reached 1; since it starts at $0$ and moves in steps of $\pm 1$, it cannot exceed $0$ without first visiting $1$, so $S_n \le 0$ on $\{\tau > n\}$. Thus $S_n \mathbf 1_{\{\tau > n\}} \le 0$, and the magnitude of its expectation grows to $1$: the paths that have not yet hit 1 by time $n$ are rare (probability $\mathbb P(\tau > n) \to 0$) but carry $S_n$ to deeply negative values. Meanwhile, $S_{n \wedge \tau} \to S_\tau = 1$ almost surely. If the family $\{S_{n \wedge \tau}\}_{n \ge 0}$ were uniformly integrable, this almost sure convergence would upgrade to convergence in $L^1$ by [Uniform Integrability and $L^1$ Convergence](/theorems/1162), and in particular $\mathbb E[S_{n \wedge \tau}] \to \mathbb E[S_\tau] = 1$. Since $\mathbb E[S_{n \wedge \tau}] = 0$ for every $n$, the expectations do not converge to $1$, so the convergence fails in $L^1$ and $\{S_{n \wedge \tau}\}_{n \ge 0}$ is not uniformly integrable.
**CONCLUSION.** The game is fair at every step: $\mathbb E[X_i] = 0$ and $(S_n)$ is a martingale. The strategy stop-at-first-profit guarantees $S_\tau = 1$ almost surely. The apparent paradox $\mathbb E[S_\tau] \ne \mathbb E[S_0]$ is not a failure of fairness but a failure of the hypothesis of the [Optional Stopping Theorem](/theorems/1153): $\tau$ has no deterministic upper bound, and the stopped family $\{S_{n \wedge \tau}\}$ is not uniformly integrable. The "missing" unit of expected wealth sits in the large negative values of $S_n$ on the rare paths that have wandered deeply into deficit and not yet recovered. Their expected contribution $\mathbb E[S_n \mathbf 1_{\{\tau > n\}}] = -\mathbb P(\tau \le n) \to -1$ exactly offsets the expected gain on paths that have already stopped. Optional stopping requires hypotheses that prevent this mass from escaping through the tails of $\tau$.
[/example]
This example gives the basic tension of the subject. A martingale has no drift when viewed one step at a time, but limiting and stopping operations can smuggle in large rare losses. Martingale theory is the collection of conditions under which local fairness survives these operations.
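The bookkeeping in Step 7 of the example above can be checked exactly for small $n$. The following is a minimal sketch, not part of the original argument: it propagates the law of the walk killed at level $1$ with exact rational arithmetic. The function name `stopped_walk_law` and the chosen horizons are illustrative assumptions.

```python
from fractions import Fraction

def stopped_walk_law(n):
    """Return (p_stopped, alive) after n steps of the fair +/-1 walk from 0.

    p_stopped is P(tau <= n) for tau = first visit to level 1;
    alive maps each position s <= 0 to P(S_n = s, tau > n).
    """
    half = Fraction(1, 2)
    p_stopped = Fraction(0)
    alive = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for s, p in alive.items():
            # A down-step keeps the path below level 1, so it keeps walking.
            nxt[s - 1] = nxt.get(s - 1, Fraction(0)) + p * half
            if s + 1 == 1:
                # An up-step from 0 reaches level 1: the path stops there.
                p_stopped += p * half
            else:
                nxt[s + 1] = nxt.get(s + 1, Fraction(0)) + p * half
        alive = nxt
    return p_stopped, alive

for n in (1, 5, 20, 60):
    p_stopped, alive = stopped_walk_law(n)
    escaping = sum(s * p for s, p in alive.items())   # E[S_n 1_{tau > n}]
    total = p_stopped * 1 + escaping                  # E[S_{n ^ tau}]
    print(n, float(p_stopped), float(escaping), total)
```

For every horizon the last column is exactly $0$, while the escaping term equals $-\mathbb P(\tau \le n)$ and moves toward $-1$ as $n$ grows.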
## Definition
Before defining a martingale, we need a way to record what information is available at each time. A process cannot be fair or unfair in the abstract; it is fair relative to what the observer already knows. The same random variable can be unpredictable with respect to a coarse information set and completely predictable with respect to a larger one.
[definition: Filtration]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space and let $T$ be a linearly ordered time set. A filtration on $(\Omega, \mathcal F, \mathbb P)$ indexed by $T$ is a family $(\mathcal F_t)_{t \in T}$ of sub-$\sigma$-algebras of $\mathcal F$ such that
\begin{align*}
\mathcal F_s &\subseteq \mathcal F_t
\end{align*}
whenever $s, t \in T$ and $s \le t$.
[/definition]
The inclusion $\mathcal F_s \subseteq \mathcal F_t$ says that information accumulates over time. Equality is allowed: sometimes no new information arrives between two times. To use a filtration in a definition of fairness, we also need to specify which processes are actually observable at the times they are indexed. This leads to the measurability condition called adaptedness.
[definition: Adapted Process]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space with filtration $(\mathcal F_t)_{t \in T}$. A stochastic process $(X_t)_{t \in T}$ with values in a measurable space $(E, \mathcal E)$ is adapted to $(\mathcal F_t)_{t \in T}$ if $X_t$ is $\mathcal F_t$-measurable for every $t \in T$.
[/definition]
Adaptedness rules out future-looking processes. For instance, if $(W_t)_{t \ge 0}$ is a Brownian motion, the process $X_t = W_1$ for $0 \le t < 1$ is not adapted to the natural filtration of $W$, because $X_t$ reveals at time $t < 1$ the value of $W$ at time $1$.
A remaining ambiguity is where the information flow comes from. If no external observations are being modeled, we should not give the process extra information beyond its own past values; otherwise conditional fairness could depend on signals not actually contained in the process. The natural filtration is the minimal filtration that makes the process adapted, so it provides the default information flow generated by the process itself.
[definition: Natural Filtration]
Let $(X_t)_{t \in T}$ be a stochastic process on $(\Omega, \mathcal F, \mathbb P)$ with values in a measurable space $(E, \mathcal E)$. The natural filtration of $(X_t)_{t \in T}$ is the filtration $(\mathcal F_t^X)_{t \in T}$ defined by
\begin{align*}
\mathcal F_t^X &= \sigma(X_s : s \in T,\ s \le t).
\end{align*}
[/definition]
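On a finite sample space these definitions become concrete: a $\sigma$-algebra generated by the first $n$ observations is a partition into atoms, and a function is measurable exactly when it is constant on each atom. The sketch below illustrates this for three coin flips. The names `atoms`, `is_measurable`, and `S` are hypothetical helpers introduced only for this illustration.

```python
from itertools import product

N = 3
omega = list(product((-1, 1), repeat=N))   # sample space of N coin flips

def atoms(n):
    """Atoms of F_n = sigma(X_1..X_n): outcomes grouped by their first n flips."""
    groups = {}
    for w in omega:
        groups.setdefault(w[:n], []).append(w)
    return list(groups.values())

def is_measurable(f, n):
    """On a finite space, f is F_n-measurable iff it is constant on each atom of F_n."""
    return all(len({f(w) for w in atom}) == 1 for atom in atoms(n))

def S(k):
    """The partial sum S_k as a function on the sample space."""
    return lambda w: sum(w[:k])

print(is_measurable(S(2), 2))   # True: S_2 depends only on the first two flips
print(is_measurable(S(3), 2))   # False: S_3 needs the third flip
```

The second check is the discrete analogue of a future-looking process: $S_3$ is not observable from the information in $\mathcal F_2$.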
The definition of martingale also requires integrability. Conditional expectations are only finite-valued objects in the usual sense when the random variables being conditioned have finite first moment.
[definition: Integrable Process]
Let $(X_t)_{t \in T}$ be a real-valued stochastic process on $(\Omega, \mathcal F, \mathbb P)$. The process $(X_t)_{t \in T}$ is integrable if
\begin{align*}
\mathbb E[|X_t|] &< \infty
\end{align*}
for every $t \in T$.
[/definition]
Now fairness can be stated precisely. The future value may be random, but after conditioning on the current information, its expected value is exactly the current value.
[definition: Martingale]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space with filtration $(\mathcal F_t)_{t \in T}$, where $T$ is a linearly ordered time set. A real-valued stochastic process $(X_t)_{t \in T}$ is a martingale with respect to $(\mathcal F_t)_{t \in T}$ if:
1. $(X_t)_{t \in T}$ is adapted to $(\mathcal F_t)_{t \in T}$;
2. $(X_t)_{t \in T}$ is integrable;
3. for all $s, t \in T$ with $s \le t$,
\begin{align*}
\mathbb E[X_t \mid \mathcal F_s] &= X_s
\end{align*}
almost surely.
[/definition]
The definition is deliberately relative to a filtration. Saying that $(X_t)$ is a martingale without naming the filtration usually means the filtration is clear from context, often the natural filtration or a larger filtration satisfying the same conditional expectation identity.
Many arguments need one-sided versions of fairness rather than exact equality. Processes with nonnegative conditional drift model accumulated advantage, and the upward inequality is isolated as the submartingale condition.
[definition: Submartingale]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space with filtration $(\mathcal F_t)_{t \in T}$. A real-valued stochastic process $(X_t)_{t \in T}$ is a submartingale with respect to $(\mathcal F_t)_{t \in T}$ if:
1. $(X_t)_{t \in T}$ is adapted to $(\mathcal F_t)_{t \in T}$;
2. $(X_t)_{t \in T}$ is integrable;
3. for all $s, t \in T$ with $s \le t$,
\begin{align*}
\mathbb E[X_t \mid \mathcal F_s] &\ge X_s
\end{align*}
almost surely.
[/definition]
The inequality points upward: given all current information, a submartingale is expected not to decrease. Many martingale arguments also need the opposite comparison, especially when tracking reserves, error bounds, or quantities that should not increase on average. Rather than translating every such estimate by replacing a process with its negative, we introduce the downward-drift version as its own object.
[definition: Supermartingale]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space with filtration $(\mathcal F_t)_{t \in T}$. A real-valued stochastic process $(X_t)_{t \in T}$ is a supermartingale with respect to $(\mathcal F_t)_{t \in T}$ if:
1. $(X_t)_{t \in T}$ is adapted to $(\mathcal F_t)_{t \in T}$;
2. $(X_t)_{t \in T}$ is integrable;
3. for all $s, t \in T$ with $s \le t$,
\begin{align*}
\mathbb E[X_t \mid \mathcal F_s] &\le X_s
\end{align*}
almost surely.
[/definition]
The three definitions are tied together by sign changes. If $(X_t)$ is a submartingale, then $(-X_t)$ is a supermartingale; if both inequalities hold, the process is a martingale. This simple observation becomes useful when proving maximal inequalities, because upper and lower fluctuations can be treated symmetrically.
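The sign-change claim follows in one line from linearity of conditional expectation. If $(X_t)$ is a submartingale and $s \le t$, then
\begin{align*}
\mathbb E[-X_t \mid \mathcal F_s] &= -\mathbb E[X_t \mid \mathcal F_s] \le -X_s
\end{align*}
almost surely, which is the supermartingale inequality for $(-X_t)$. Adaptedness and integrability are unaffected by the sign change.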
## Information and Conditional Fairness
The martingale identity is often easiest to verify when the process is built from independent increments. The conditional expectation then removes all future mean-zero increments and leaves exactly the present value.
[example: Simple Random Walk Martingale]
**SETUP.** Let $X_1, X_2, \dots$ be i.i.d. random variables with $\mathbb P(X_i = 1) = \mathbb P(X_i = -1) = \tfrac{1}{2}$. Define the partial sums and natural filtration by
\begin{align*}
S_0 &= 0, \qquad S_n = \sum_{i=1}^n X_i \quad (n \ge 1), \qquad \mathcal F_n = \sigma(X_1, \dots, X_n),
\end{align*}
with the convention $\mathcal F_0 = \{\varnothing, \Omega\}$. The increments $X_i$ model the outcome of the $i$-th play of a fair $\pm 1$ game, and $S_n$ is the gambler's cumulative fortune after $n$ plays.
**CLAIM.** The process $(S_n)_{n \ge 0}$ is a martingale with respect to $(\mathcal F_n)_{n \ge 0}$.
**DERIVATION.** We verify the three conditions of the martingale definition.
**Step 1: Adaptedness.** For each $n \ge 0$, the sum $S_n = \sum_{i=1}^n X_i$ is a Borel function of $(X_1, \dots, X_n)$. Since $X_1, \dots, X_n$ are each $\mathcal F_n$-measurable by definition of $\mathcal F_n = \sigma(X_1,\dots,X_n)$, their sum is $\mathcal F_n$-measurable. Thus $(S_n)_{n \ge 0}$ is adapted.
**Step 2: Integrability.** Each $|X_i| = 1$ almost surely, so $|S_n| \le \sum_{i=1}^n |X_i| = n$ almost surely by the triangle inequality. Therefore
\begin{align*}
\mathbb E[|S_n|] &\le n < \infty
\end{align*}
for every $n \ge 0$.
**Step 3: Martingale identity.** Fix $m \ge n \ge 0$. Split the sum at index $n$:
\begin{align*}
S_m &= \sum_{i=1}^m X_i = \sum_{i=1}^n X_i + \sum_{i=n+1}^m X_i = S_n + \sum_{i=n+1}^m X_i.
\end{align*}
Now condition on $\mathcal F_n$:
\begin{align*}
\mathbb E[S_m \mid \mathcal F_n]
&= \mathbb E\!\left[S_n + \sum_{i=n+1}^m X_i \,\middle|\, \mathcal F_n\right] \\
&= \mathbb E[S_n \mid \mathcal F_n] + \sum_{i=n+1}^m \mathbb E[X_i \mid \mathcal F_n] \\
&= S_n + \sum_{i=n+1}^m \mathbb E[X_i] \\
&= S_n + \sum_{i=n+1}^m \!\left[\tfrac{1}{2}(1) + \tfrac{1}{2}(-1)\right] \\
&= S_n + \sum_{i=n+1}^m 0 \;=\; S_n.
\end{align*}
The justifications are as follows.
- *Line 2 (linearity and extraction):* By [Basic Properties of Conditional Expectation](/theorems/1148), $\mathbb E[\,\cdot\mid\mathcal F_n]$ is linear and $\mathbb E[S_n \mid \mathcal F_n] = S_n$ because $S_n$ is $\mathcal F_n$-measurable.
- *Line 3 (dropping the condition on future increments):* For each $i > n$, the random variable $X_i$ depends only on the $i$-th trial and is therefore independent of $\mathcal F_n = \sigma(X_1,\dots,X_n)$. By [Conditioning and Independence](/theorems/1152), independence of $X_i$ from $\mathcal F_n$ gives $\mathbb E[X_i \mid \mathcal F_n] = \mathbb E[X_i]$.
- *Line 4 (direct calculation of the unconditional mean):* $\mathbb E[X_i] = \tfrac{1}{2}(1) + \tfrac{1}{2}(-1) = 0$.
**CONCLUSION.** The martingale identity $\mathbb E[S_m \mid \mathcal F_n] = S_n$ holds for all $m \ge n \ge 0$. The key mechanism is that each future increment $X_i$ with $i > n$ is independent of the observed history $\mathcal F_n$ and has mean zero. Independence removes the conditioning, and zero mean annihilates the sum. Neither property alone suffices: independent increments with non-zero mean produce drift, and mean-zero increments that depend on the past can still be predictable, as in the constant-mean example later in this section.
[/example]
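The role of the zero-mean assumption in the example above can be probed by allowing a biased coin. The sketch below is an illustration under stated assumptions: the steps are $+1$ with probability `p` and $-1$ otherwise, independence is built in by using the same one-step law after every history, and `check_martingale` is a hypothetical helper that tests the one-step identity for $M_n = S_n - n \cdot \texttt{drift\_correction}$.

```python
from fractions import Fraction
from itertools import product

def check_martingale(p, drift_correction, N=4):
    """Test E[M_{n+1} | F_n] = M_n for M_n = S_n - n * drift_correction,
    where each step is +1 with probability p and -1 with probability 1 - p."""
    def M(w, n):
        return sum(w[:n]) - n * drift_correction

    for n in range(N):
        for prefix in product((-1, 1), repeat=n):
            # By independence, the next flip has law (p, 1 - p) given any prefix.
            cond = (p * M(prefix + (1,), n + 1)
                    + (1 - p) * M(prefix + (-1,), n + 1))
            if cond != M(prefix, n):
                return False
    return True

print(check_martingale(Fraction(1, 2), 0))               # True: fair coin
print(check_martingale(Fraction(2, 3), 0))               # False: biased coin drifts
print(check_martingale(Fraction(2, 3), Fraction(1, 3)))  # True: drift 2p - 1 removed
```

With $p = 2/3$ the walk fails the identity, but subtracting the mean increment $2p - 1 = 1/3$ per step restores it.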
The fair random walk example shows the martingale identity directly, but many applications do not use the martingale itself. They apply a convex function to it and then exploit the one-sided drift created by Jensen's inequality. The next result records this passage from fairness to submartingale estimates.
[quotetheorem:3538]
Thus $|X_t|$, $X_t^2$, and $e^{\theta X_t}$ often become submartingales once their integrability is known. This is one of the main routes from martingale structure to tail bounds.
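As a concrete instance, take the simple random walk $(S_n)$ from the example above and the convex function $x \mapsto x^2$. Since $S_n$ is bounded and $\mathcal F_n$-measurable, $X_{n+1}$ is independent of $\mathcal F_n$ with mean zero, and $X_{n+1}^2 = 1$,
\begin{align*}
\mathbb E[S_{n+1}^2 \mid \mathcal F_n]
&= \mathbb E[S_n^2 + 2 S_n X_{n+1} + X_{n+1}^2 \mid \mathcal F_n] \\
&= S_n^2 + 2 S_n\, \mathbb E[X_{n+1}] + \mathbb E[X_{n+1}^2] \\
&= S_n^2 + 1 \;\ge\; S_n^2.
\end{align*}
So $(S_n^2)$ is a submartingale whose conditional drift is exactly $1$ per step.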
There is another basic source of martingales that starts from a terminal random variable rather than from increments. If $Y$ is an integrable payoff, the problem is to turn partial information at time $t$ into a time-consistent forecast of the same final quantity. The obstruction is that forecasts at different times must agree under further conditioning; otherwise the forecast process would revise itself predictably. Conditional expectation, together with the tower property, gives exactly the martingale structure needed for these successive forecasts.
[quotetheorem:3539]
In financial language, $X_t$ is the time-$t$ price of a payoff $Y$ under a risk-neutral probability measure, with prices expressed in discounted units. In measure-theoretic language, it is simply the tower property of conditional expectation written dynamically.
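A small exhaustive check makes the forecast construction tangible. The sketch below assumes the forecast process is $X_n = \mathbb E[Y \mid \mathcal F_n]$ on a space of four fair coin flips. The payoff (the running maximum of the walk) and the helper `cond_exp` are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from itertools import product

N = 4
omega = list(product((-1, 1), repeat=N))      # equally likely outcomes

def cond_exp(f, n):
    """Return w -> E[f | F_n](w): average f over outcomes sharing w's first n flips."""
    sums, counts = {}, {}
    for w in omega:
        key = w[:n]
        sums[key] = sums.get(key, Fraction(0)) + f(w)
        counts[key] = counts.get(key, 0) + 1
    return lambda w: sums[w[:n]] / counts[w[:n]]

def payoff(w):
    """Hypothetical terminal payoff Y: the running maximum of the walk (at least 0)."""
    best, s = 0, 0
    for x in w:
        s += x
        best = max(best, s)
    return Fraction(best)

forecast = [cond_exp(payoff, n) for n in range(N + 1)]   # X_n = E[Y | F_n]

# Martingale identity E[X_m | F_n] = X_n for all n <= m, checked on every outcome.
ok = all(
    cond_exp(forecast[m], n)(w) == forecast[n](w)
    for n in range(N + 1) for m in range(n, N + 1) for w in omega
)
print(ok)
```

The check succeeds for any payoff on this space, because averaging a finer average over a coarser atom is the same as averaging the payoff directly.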
Not every process with constant mean is a martingale. The missing ingredient is conditional fairness, not just equality of unconditional expectations.
[example: Constant Mean Without Martingale Property]
**SETUP.** Let $Z$ be a real-valued random variable on a probability space $(\Omega, \mathcal F, \mathbb P)$ satisfying
\begin{align*}
\mathbb E[|Z|] < \infty, \qquad \mathbb E[Z] = 0, \qquad \mathbb P(Z \ne 0) > 0.
\end{align*}
Define a three-time process by
\begin{align*}
X_0 &= 0, \qquad X_1 = Z, \qquad X_2 = -Z,
\end{align*}
and a filtration by
\begin{align*}
\mathcal F_0 &= \{\varnothing, \Omega\}, \qquad \mathcal F_1 = \mathcal F_2 = \sigma(Z).
\end{align*}
The filtration is non-decreasing because $\mathcal F_0 \subset \mathcal F_1 = \mathcal F_2$.
**CLAIM.** The process $(X_n)_{n=0,1,2}$ satisfies conditions (1) and (2) of the martingale definition—it is adapted and integrable—and all three unconditional means are zero. Nevertheless, condition (3) fails at times $1$ and $2$: the martingale identity $\mathbb E[X_2 \mid \mathcal F_1] = X_1$ does not hold. Constant unconditional mean is therefore strictly weaker than the martingale property.
**DERIVATION.**
**Step 1: Adaptedness.** We check that $X_n$ is $\mathcal F_n$-measurable for each $n$.
- $n = 0$: $X_0 = 0$ is the constant zero function, which is measurable with respect to every $\sigma$-algebra, in particular $\mathcal F_0$.
- $n = 1$: $X_1 = Z$ is $\sigma(Z)$-measurable by definition of the generated $\sigma$-algebra. Since $\mathcal F_1 = \sigma(Z)$, the random variable $X_1$ is $\mathcal F_1$-measurable.
- $n = 2$: $X_2 = -Z$. The map $z \mapsto -z$ is a Borel function on $\mathbb R$, so $-Z$ is $\sigma(Z)$-measurable. Since $\mathcal F_2 = \sigma(Z)$, the random variable $X_2$ is $\mathcal F_2$-measurable.
Thus $(X_n)$ is adapted to $(\mathcal F_n)$.
**Step 2: Integrability.** We verify $\mathbb E[|X_n|] < \infty$ for each $n$.
- $\mathbb E[|X_0|] = \mathbb E[0] = 0 < \infty$.
- $\mathbb E[|X_1|] = \mathbb E[|Z|] < \infty$ by hypothesis.
- $\mathbb E[|X_2|] = \mathbb E[|-Z|] = \mathbb E[|Z|] < \infty$.
**Step 3: Constant unconditional means.** Compute each expectation directly.
\begin{align*}
\mathbb E[X_0] &= 0, \\
\mathbb E[X_1] &= \mathbb E[Z] = 0 \quad \text{(by hypothesis)}, \\
\mathbb E[X_2] &= \mathbb E[-Z] = -\mathbb E[Z] = 0,
\end{align*}
where the last line uses linearity of expectation. All three means equal zero.
**Step 4: Computing $\mathbb E[X_2 \mid \mathcal F_1]$.** Recall $X_2 = -Z$ and $\mathcal F_1 = \sigma(Z)$. Since $X_2 = -Z$ is $\sigma(Z)$-measurable (established in Step 1) and $\mathcal F_1 = \sigma(Z)$, the random variable $X_2$ is $\mathcal F_1$-measurable. By [Basic Properties of Conditional Expectation](/theorems/1148), a random variable that is already measurable with respect to the conditioning $\sigma$-algebra equals its own conditional expectation:
\begin{align*}
\mathbb E[X_2 \mid \mathcal F_1] &= X_2 = -Z.
\end{align*}
**Step 5: The martingale identity fails.** The martingale condition at times $1$ and $2$ would require
\begin{align*}
\mathbb E[X_2 \mid \mathcal F_1] &= X_1,
\end{align*}
that is, $-Z = Z$ almost surely. This forces $2Z = 0$ almost surely, hence $Z = 0$ almost surely, which contradicts $\mathbb P(Z \ne 0) > 0$. More precisely, on the event $\{Z \ne 0\}$ (which has positive probability by hypothesis),
\begin{align*}
\mathbb E[X_2 \mid \mathcal F_1](\omega) - X_1(\omega) &= -Z(\omega) - Z(\omega) = -2Z(\omega) \ne 0.
\end{align*}
Therefore $\mathbb P\!\left(\mathbb E[X_2 \mid \mathcal F_1] \ne X_1\right) \ge \mathbb P(Z \ne 0) > 0$, so the identity fails on a set of positive measure.
**CONCLUSION.** The process $(X_n)_{n=0,1,2}$ is adapted, integrable, and has identically zero unconditional means, yet it fails the martingale property. The failure is concrete: at time $1$, the observer knows $Z = X_1$ exactly, because $\mathcal F_1 = \sigma(Z)$. Knowing $X_1 = Z$, they can perfectly predict $X_2 = -Z$ — and this prediction differs from $X_1$ whenever $Z \ne 0$. The martingale identity demands that the best conditional prediction of $X_2$ given $\mathcal F_1$ equal the current value $X_1$; instead, the conditional prediction is $-X_1$, the mirror image. Equality of means is a global, unconditional statement; martingales require the stronger local, conditional fairness $\mathbb E[X_t \mid \mathcal F_s] = X_s$ for all pairs $s \le t$.
[/example]
This example prevents a common mistake: martingales are not processes whose expectations happen to be constant. The condition is local in information and must hold after conditioning on the past.
## Stopping and Sampling
Stopping rules are where martingales become delicate. A deterministic time is harmless, but a random time chosen from the observed history can bias the sample unless the stopping operation is controlled.
[definition: Stopping Time]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space with filtration $(\mathcal F_t)_{t \in T}$, where $T \subset [0,\infty)$. A random variable $\tau: \Omega \to T \cup \{\infty\}$ is a stopping time with respect to $(\mathcal F_t)_{t \in T}$ if
\begin{align*}
\{\tau \le t\} &\in \mathcal F_t
\end{align*}
for every $t \in T$.
[/definition]
The event $\{\tau \le t\}$ must be decidable using information available by time $t$. The first time a random walk hits a level is a stopping time; the last time before time $1$ that Brownian motion is positive is not, because deciding it requires knowledge of the future.
[illustration:martingale-stopping-time-path]
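The decidability requirement can be tested mechanically on a finite coin-flip space. The sketch below is a discrete analogue of the two examples just mentioned: a first hitting time and a last-visit time. The horizon `N` and the helper names are assumptions for illustration, and `None` stands for $\tau = \infty$.

```python
from itertools import product

N = 4
omega = list(product((-1, 1), repeat=N))

def walk(w):
    """Positions S_0, ..., S_N along outcome w."""
    path = [0]
    for x in w:
        path.append(path[-1] + x)
    return path

def first_hit_of_one(w):
    """First n with S_n = 1, or None if level 1 is not reached by time N."""
    return next((n for n, s in enumerate(walk(w)) if s == 1), None)

def last_zero(w):
    """Last n <= N with S_n = 0 (always defined, since S_0 = 0)."""
    return max(n for n, s in enumerate(walk(w)) if s == 0)

def is_stopping_time(tau):
    """Check that {tau <= n} is decided by the first n flips, for every n <= N."""
    for n in range(N + 1):
        decided = {}
        for w in omega:
            t = tau(w)
            event = t is not None and t <= n
            # Two outcomes sharing the first n flips must agree on the event.
            if decided.setdefault(w[:n], event) != event:
                return False
    return True

print(is_stopping_time(first_hit_of_one))  # True
print(is_stopping_time(last_zero))         # False
```

The last-visit time fails already at $n = 0$: whether the walk ever returns to $0$ cannot be read off before any flip has been seen.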
Once a stopping time is available, we need a process that agrees with the original path until the decision time and then holds its terminal value fixed. This frozen path is the object to which ordinary deterministic-time martingale tools can be applied.
[definition: Stopped Process]
Let $(X_t)_{t \in T}$ be a stochastic process and let $\tau: \Omega \to T \cup \{\infty\}$ be a stopping time. The stopped process associated with $X$ and $\tau$ is the process $(X_{t \wedge \tau})_{t \in T}$, where
\begin{align*}
t \wedge \tau &= \min\{t,\tau\}.
\end{align*}
[/definition]
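For a path recorded at integer times, the stopped process is a one-line transformation. The following sketch uses a hypothetical helper `stopped_path` and a made-up sample path, and treats `None` as $\tau = \infty$.

```python
def stopped_path(path, tau):
    """Return the list (X_{t ^ tau})_t for a discrete path; tau=None means tau = infinity."""
    if tau is None:
        return list(path)
    return [path[min(t, tau)] for t in range(len(path))]

path = [0, -1, 0, 1, 2, 1]          # illustrative sample path
tau = path.index(1)                 # first time the path equals 1, here 3
print(stopped_path(path, tau))      # [0, -1, 0, 1, 1, 1]
```

Before time $\tau$ the stopped path agrees with the original, and from $\tau$ onward it is frozen at $X_\tau$.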
After defining the stopped process, the central question is whether stopping can create an advantage in a fair game. For bounded stopping times the answer is no: the finite horizon prevents strategies from waiting indefinitely for a favorable fluctuation. This gives the basic optional sampling principle.
[quotetheorem:1153]
The boundedness assumption is not decorative. It prevents the strategy from waiting indefinitely for a favorable event while hiding the cost in rare long runs.
[example: Why Boundedness Matters]
**SETUP.** Let $X_1, X_2, \dots$ be i.i.d. random variables with $\mathbb P(X_i = 1) = \mathbb P(X_i = -1) = \tfrac{1}{2}$. Define
\begin{align*}
S_0 &= 0, \qquad S_n = \sum_{i=1}^n X_i \quad (n \ge 1), \qquad \mathcal F_n = \sigma(X_1, \dots, X_n),
\end{align*}
and the random time
\begin{align*}
\tau &= \inf\{n \ge 0 : S_n = 1\}.
\end{align*}
Here $S_n$ is the gambler's cumulative fortune after $n$ plays of a fair $\pm 1$ game, and $\tau$ is the first time that fortune reaches $1$.
**CLAIM.** The process $(S_n)_{n \ge 0}$ is a martingale with $\mathbb E[S_0] = 0$. The time $\tau$ is a stopping time satisfying $\mathbb P(\tau < \infty) = 1$, yet $\tau$ is not bounded by any deterministic constant and $\mathbb E[S_\tau] = 1 \ne 0 = \mathbb E[S_0]$. This shows that [Bounded Optional Sampling](/theorems/1153) cannot be extended to arbitrary almost-surely-finite stopping times without additional hypotheses.
**DERIVATION.**
**Step 1: $(S_n)_{n \ge 0}$ is a martingale with $\mathbb E[S_0] = 0$.** For $m \ge n \ge 0$, write $S_m = S_n + \sum_{i=n+1}^{m} X_i$ and condition on $\mathcal F_n$:
\begin{align*}
\mathbb E[S_m \mid \mathcal F_n]
&= S_n + \sum_{i=n+1}^{m} \mathbb E[X_i \mid \mathcal F_n]
= S_n + \sum_{i=n+1}^{m} \mathbb E[X_i]
= S_n + \sum_{i=n+1}^{m} \!\Bigl[\tfrac{1}{2}(1) + \tfrac{1}{2}(-1)\Bigr]
= S_n.
\end{align*}
The first equality uses linearity and the $\mathcal F_n$-measurability of $S_n$; the second uses the independence of $X_i$ from $\mathcal F_n = \sigma(X_1,\dots,X_n)$ for $i > n$, which gives $\mathbb E[X_i \mid \mathcal F_n] = \mathbb E[X_i]$; the third evaluates $\mathbb E[X_i] = \tfrac{1}{2}(1) + \tfrac{1}{2}(-1) = 0$ directly. Since $S_0 = 0$ deterministically, $\mathbb E[S_0] = 0$.
**Step 2: $\tau$ is a stopping time.** For each $n \ge 0$,
\begin{align*}
\{\tau \le n\} &= \{S_0 = 1\} \cup \{S_1 = 1\} \cup \cdots \cup \{S_n = 1\}.
\end{align*}
Each event $\{S_k = 1\}$ is $\mathcal F_k$-measurable (since $S_k$ is a Borel function of $X_1,\dots,X_k$), and $\mathcal F_k \subseteq \mathcal F_n$ for $k \le n$, so every term in the union belongs to $\mathcal F_n$. Hence the union $\{\tau \le n\}$ is $\mathcal F_n$-measurable for every $n$, making $\tau$ a stopping time.
**Step 3: $\tau$ is not bounded by any deterministic constant.** Fix an arbitrary integer $n \ge 1$. On the event $\{X_1 = -1, X_2 = -1, \dots, X_n = -1\}$, the partial sums are $S_k = -k$ for $k = 1,\dots,n$ and $S_0 = 0$, so $S_k < 1$ for every $k = 0,1,\dots,n$. On this event, $\tau > n$. Since the $X_i$ are i.i.d.,
\begin{align*}
\mathbb P(\tau > n) \;\ge\; \mathbb P(X_1 = -1, \dots, X_n = -1) \;=\; \prod_{i=1}^{n} \mathbb P(X_i = -1) \;=\; \left(\tfrac{1}{2}\right)^n \;=\; 2^{-n} \;>\; 0.
\end{align*}
Because $\mathbb P(\tau > n) > 0$ for every $n \ge 1$, there is no deterministic constant $N$ with $\tau \le N$ almost surely.
**Step 4: $\mathbb P(\tau < \infty) = 1$.** For each integer $N \ge 1$, let $\tau_N = \inf\{n \ge 0 : S_n = 1 \text{ or } S_n = -N\}$. The walk is confined to the finite set $\{-N, -N+1, \dots, 1\}$ until time $\tau_N$, so $\tau_N < \infty$ almost surely. Setting $Y_n = S_n + N$ turns the walk into one starting at $N$ with absorbing barriers at $0$ (corresponding to $S = -N$) and $N+1$ (corresponding to $S = 1$). Applying [Gambler's Ruin Probability](/theorems/1125) with starting position $N$ and barriers at $0$ and $N+1$:
\begin{align*}
\mathbb P(S_{\tau_N} = 1) &= \frac{N}{N+1}.
\end{align*}
Whenever $S_{\tau_N} = 1$ the walk has reached level $1$ by time $\tau_N$, so $\tau \le \tau_N < \infty$ on this event. Thus
\begin{align*}
\mathbb P(\tau < \infty) \;\ge\; \mathbb P(S_{\tau_N} = 1) \;=\; \frac{N}{N+1}
\end{align*}
for every $N \ge 1$. Taking $N \to \infty$ gives $\mathbb P(\tau < \infty) \ge 1$, hence $\mathbb P(\tau < \infty) = 1$.
**Step 5: $S_\tau = 1$ almost surely.** By definition of $\tau$, the event $\{\tau < \infty\}$ is exactly the event that the walk ever reaches level $1$. On $\{\tau < \infty\}$ the process hits $1$ for the first time at step $\tau$, so $S_\tau = 1$. Since $\mathbb P(\tau < \infty) = 1$ from Step 4, we have $S_\tau = 1$ almost surely.
**Step 6: $\mathbb E[S_\tau] = 1 \ne 0 = \mathbb E[S_0]$.** From Step 5,
\begin{align*}
\mathbb E[S_\tau] &= 1 \cdot \mathbb P(S_\tau = 1) = 1 \cdot 1 = 1.
\end{align*}
Combined with $\mathbb E[S_0] = 0$ from Step 1, we obtain
\begin{align*}
\mathbb E[S_\tau] - \mathbb E[S_0] &= 1 - 0 = 1 \ne 0.
\end{align*}
If [Bounded Optional Sampling](/theorems/1153) applied to $\tau$, it would give $\mathbb E[S_\tau] = \mathbb E[S_0]$. It does not apply, because $\tau$ is not bounded (Step 3).
**CONCLUSION.** Every ingredient of the Bounded Optional Sampling hypothesis holds except one: $\tau$ has no deterministic upper bound, and this single failure is enough to invalidate the conclusion. The game is fair at every step, since $(S_n)$ is a martingale, and the walk reaches level $1$ with certainty, but the cost is hidden. The paths that have not yet hit $1$ by time $n$ all sit at or below $0$; as $n \to \infty$ they become vanishingly rare, yet some of them carry arbitrarily large negative values, and their expected contribution cancels the expected gain on paths that have already stopped. Bounding $\tau$ by a deterministic constant forecloses this escape route and restores the identity $\mathbb E[S_\tau] = \mathbb E[S_0]$.
[/example]
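The ruin probability $N/(N+1)$ used in Step 4 of the example above can be recomputed from the mean-value property of the fair walk. The sketch below is one possible check, not the cited theorem's proof: it solves $h(x) = \tfrac12\bigl(h(x-1) + h(x+1)\bigr)$ with $h(0) = 0$ and $h(\text{top}) = 1$ by running the recursion from a trial slope and rescaling, which relies on linearity and on uniqueness of the solution to this boundary value problem. The name `hit_top_first` is hypothetical.

```python
from fractions import Fraction

def hit_top_first(start, top):
    """P(fair +/-1 walk from `start` reaches `top` before 0), for 0 <= start <= top.

    Runs h(x+1) = 2 h(x) - h(x-1) from h(0) = 0 with trial slope h(1) = 1,
    then rescales so that h(top) = 1.
    """
    h = [Fraction(0), Fraction(1)]
    for x in range(1, top):
        h.append(2 * h[x] - h[x - 1])     # rearranged mean-value property
    return h[start] / h[top]

for N in (1, 2, 10, 100):
    # The shifted walk Y = S + N starts at N with barriers at 0 and N + 1.
    print(N, hit_top_first(N, N + 1))
```

Each printed value agrees with $N/(N+1)$, so the lower bound on $\mathbb P(\tau < \infty)$ tends to $1$ as the lower barrier recedes.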
The failed unbounded stopping example shows that ordinary integrability at each fixed time is not enough; rare extreme values can still carry the missing expectation. To extend optional sampling, we need a condition that controls all large tails uniformly across a family of random variables. That condition is uniform integrability.
[definition: Uniform Integrability]
Let $\mathcal C$ be a family of integrable real-valued random variables on $(\Omega, \mathcal F, \mathbb P)$. The family $\mathcal C$ is uniformly integrable if
\begin{align*}
\lim_{K \to \infty} \sup_{X \in \mathcal C} \mathbb E[|X| \mathbb{1}_{\{|X| > K\}}] &= 0.
\end{align*}
[/definition]
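To see the definition fail concretely, one can evaluate the tail expectation for the stopped walk $S_{n \wedge \tau}$ from the example above, where $\tau$ is the first visit to level $1$. The sketch below is a numerical illustration in floating point, not a proof. The function `tail_mass` and the chosen values of $n$ and $K$ are assumptions.

```python
def tail_mass(n, K):
    """E[|S_{n ^ tau}| ; |S_{n ^ tau}| > K] for tau = first visit of the fair walk to 1.

    For K >= 1 only paths still walking (tau > n) with S_n < -K contribute.
    """
    alive = {0: 1.0}                       # position s -> P(S_k = s, tau > k)
    for _ in range(n):
        nxt = {}
        for s, p in alive.items():
            nxt[s - 1] = nxt.get(s - 1, 0.0) + 0.5 * p
            if s + 1 < 1:                  # an up-step to level 1 stops the path
                nxt[s + 1] = nxt.get(s + 1, 0.0) + 0.5 * p
        alive = nxt
    return sum(-s * p for s, p in alive.items() if -s > K)

for K in (1, 5, 10):
    print(K, [round(tail_mass(n, K), 3) for n in (10, 100, 1000)])
```

For each fixed $K$ the tail expectation increases toward $1$ as $n$ grows, so the supremum over $n$ does not tend to $0$ as $K \to \infty$.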
Uniform integrability is a quantitative version of "no mass escapes to infinity." It is stronger than boundedness in $L^1$, and it is exactly the condition that makes convergence in probability compatible with convergence of expectations in many martingale arguments.
This matters because optional sampling beyond bounded times is usually proved by first stopping at bounded approximations and then passing to a limit. At that point, ordinary convergence in probability is not enough: expectations can lose mass in rare large outcomes, exactly as in the failed stopping example. The next issue is therefore a recognition problem: we need a criterion telling us when convergence of random variables is strong enough to carry expectations through the limiting step.
[quotetheorem:1164]
In applications, the uniform integrability hypothesis is often verified by boundedness, domination, or $L^p$ bounds with $p > 1$. The theorem explains why stopping arguments in martingale theory so often come paired with moment estimates.
The previous theorem explains how to pass expectations to limits, but localization also needs stability before the limit is taken. If we stop a martingale at a stopping time, the resulting frozen process should still have no predictable drift. This stability result is what makes stopped approximations legitimate martingales.
[quotetheorem:3540]
This result is the mechanism behind localization: instead of proving a global estimate at once, stop the process before it becomes too large, prove a bounded statement, and then let the stopping thresholds tend to infinity.
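The stability of stopped processes can be checked exactly on the doubling example. The following is a minimal sketch, assuming the fair $\pm 1$ walk and the hitting time $\tau$ of level $1$ from that example; the helper name `stopped_walk_law` is hypothetical. It computes the exact law of $S_{\tau \wedge n}$ with rational arithmetic and reports its mean together with $\mathbb P(\tau \le n)$.

```python
from fractions import Fraction

def stopped_walk_law(n, target=1):
    """Exact law of S_{min(tau, n)} for the fair +/-1 walk started at 0,
    where tau is the first time the walk hits `target`."""
    half = Fraction(1, 2)
    law = {0: Fraction(1)}
    for _ in range(n):
        new_law = {}
        for pos, prob in law.items():
            if pos == target:
                # Already stopped: the path stays frozen at the target.
                new_law[pos] = new_law.get(pos, Fraction(0)) + prob
            else:
                for step in (1, -1):
                    nxt = pos + step
                    new_law[nxt] = new_law.get(nxt, Fraction(0)) + prob * half
        law = new_law
    return law

for n in (1, 5, 25, 125):
    law = stopped_walk_law(n)
    mean = sum(pos * prob for pos, prob in law.items())
    hit = law.get(1, Fraction(0))
    print(n, mean, float(hit))  # mean of the stopped walk, P(tau <= n)
```

For every bounded horizon the mean stays at zero even as the hitting probability climbs toward one, which is the behaviour the stopped-process statement predicts; only the passage $n \to \infty$ loses the expectation.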
## Decomposition and Predictable Drift
Martingales describe processes with zero predictable drift. General adapted processes can often be split into a martingale part and a predictable drift part. This turns martingales into a coordinate system for stochastic dynamics.
In discrete time, "predictable" means known just before the next increment is taken. The definition isolates processes whose value at time $n$ is already determined by information at time $n-1$.
[definition: Predictable Process in Discrete Time]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space with filtration $(\mathcal F_n)_{n \ge 0}$. A real-valued process $(A_n)_{n \ge 0}$ is predictable if $A_0$ is $\mathcal F_0$-measurable and $A_n$ is $\mathcal F_{n-1}$-measurable for every $n \ge 1$.
[/definition]
A submartingale has nonnegative conditional drift, but the definition alone does not say where that drift is stored. In discrete time, predictable processes provide the right bookkeeping device: the amount already determined before each step can be recorded separately from the new random fluctuation. The decomposition question is whether every integrable submartingale admits such a canonical split into predictable accumulated drift plus a martingale part.
[quotetheorem:3541]
The theorem says that the excess growth in a submartingale is not mysterious: it is precisely an accumulated predictable compensator. This point of view becomes essential in continuous time, where the compensator of a counting process or increasing process carries analytic information.
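As a worked instance, assume the decomposition is taken in its usual form $V_n = V_0 + M_n + A_n$ with $A_0 = 0$ and $A_n = \sum_{k=1}^n \mathbb E[V_k - V_{k-1} \mid \mathcal F_{k-1}]$ (the quoted statement may be phrased differently), and take $V_n = S_n^2$ for the fair $\pm 1$ walk $S_n = \sum_{i=1}^n X_i$ with its natural filtration. Since $X_k^2 = 1$ and $X_k$ is independent of $\mathcal F_{k-1}$ with mean zero,
\begin{align*}
V_k - V_{k-1} &= 2 S_{k-1} X_k + X_k^2 = 2 S_{k-1} X_k + 1, \\
\mathbb E[V_k - V_{k-1} \mid \mathcal F_{k-1}] &= 2 S_{k-1}\, \mathbb E[X_k] + 1 = 1, \\
A_n &= n, \qquad M_n = S_n^2 - n.
\end{align*}
The compensator is the deterministic clock $A_n = n$, and the martingale part is $S_n^2 - n$.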
Martingale transforms are another way of separating predictable choices from random shocks. They formalize betting strategies whose stake is chosen before the next increment is observed.
[definition: Martingale Transform]
Let $(M_n)_{n \ge 0}$ be a discrete-time martingale with respect to $(\mathcal F_n)_{n \ge 0}$, and let $(H_n)_{n \ge 1}$ be a predictable real-valued process. The martingale transform of $M$ by $H$ is the process $(Y_n)_{n \ge 0}$ defined by
\begin{align*}
Y_0 &= 0, \\
Y_n &= \sum_{i=1}^n H_i(M_i - M_{i-1}), \qquad n \ge 1.
\end{align*}
[/definition]
A betting strategy should not create profit from a fair game merely by rescaling each fair increment. The real issue is timing: the stake may depend on the past, but it must be chosen before the next increment is observed. Boundedness supplies the integrability needed for conditional expectations, while predictability prevents the strategy from using future information. Under these two restrictions, the transformed gains should remain fair.
[quotetheorem:3542]
The predictability of $H_i$ is the crucial condition. Choosing $H_i$ after seeing $M_i - M_{i-1}$ would allow the transform to extract drift from a fair game.
[example: Betting One Unit on Positive Drift Is Not Predictable]
**SETUP.** Let $X_1, X_2, \dots$ be i.i.d. with $\mathbb P(X_i = 1) = \mathbb P(X_i = -1) = \tfrac{1}{2}$. Define the partial sums and natural filtration by
\begin{align*}
S_0 &= 0, \qquad S_n = \sum_{i=1}^n X_i \quad (n \ge 1), \qquad \mathcal F_n = \sigma(X_1, \dots, X_n),
\end{align*}
with $\mathcal F_0 = \{\varnothing, \Omega\}$. Note that $S_n - S_{n-1} = X_n$ for every $n \ge 1$. A candidate betting strategy is the process $(H_n)_{n \ge 1}$ defined by
\begin{align*}
H_n &= \mathbb{1}_{\{S_n - S_{n-1} = 1\}} = \mathbb{1}_{\{X_n = 1\}}.
\end{align*}
The rule is: at time $n$, bet one unit if the $n$-th increment was a $+1$.
**CLAIM.** The strategy $(H_n)_{n \ge 1}$ produces a per-step expected gain of $\tfrac{1}{2} > 0$, yet $(H_n)$ is not predictable: $H_n$ is not $\mathcal F_{n-1}$-measurable. Since predictability is a necessary hypothesis of *Bounded Martingale Transform*, the accumulated gain $Y_n = \sum_{i=1}^n H_i X_i$ is not a martingale transform, and the apparent profit does not contradict the fairness of $(S_n)$.
**DERIVATION.**
**Step 1: Computing the gain at each step.** The gain from the strategy at time $n$ is $H_n X_n$. We evaluate it on each part of the sample space.
- On the event $\{X_n = 1\}$: $H_n = \mathbb{1}_{\{X_n=1\}} = 1$ and $X_n = 1$, so $H_n X_n = 1 \cdot 1 = 1$.
- On the event $\{X_n = -1\}$: $H_n = \mathbb{1}_{\{X_n=1\}} = 0$ and $X_n = -1$, so $H_n X_n = 0 \cdot (-1) = 0$.
Therefore $H_n X_n = \mathbb{1}_{\{X_n = 1\}}$ almost surely for every $n \ge 1$.
**Step 2: Computing the expected gain per step.** Since $H_n X_n = \mathbb{1}_{\{X_n = 1\}}$,
\begin{align*}
\mathbb E[H_n X_n]
&= \mathbb E\bigl[\mathbb{1}_{\{X_n = 1\}}\bigr]
= \mathbb P(X_n = 1)
= \tfrac{1}{2}.
\end{align*}
**Step 3: $H_n$ is not $\mathcal F_{n-1}$-measurable.** By *Predictable Process in Discrete Time*, a process is predictable if and only if $H_n$ is $\mathcal F_{n-1}$-measurable for every $n \ge 1$. We show this fails.
The filtration $\mathcal F_{n-1} = \sigma(X_1, \dots, X_{n-1})$ encodes information from only the first $n-1$ increments. The increment $X_n$ is independent of $\mathcal F_{n-1}$ by the i.i.d. assumption. By [Basic Properties of Conditional Expectation](/theorems/1148), an integrable $\mathcal F_{n-1}$-measurable random variable $Y$ satisfies
\begin{align*}
\mathbb E[Y \mid \mathcal F_{n-1}] &= Y \quad \text{a.s.}
\end{align*}
Now suppose, for contradiction, that $H_n = \mathbb{1}_{\{X_n = 1\}}$ were $\mathcal F_{n-1}$-measurable. Then by [Basic Properties of Conditional Expectation](/theorems/1148),
\begin{align*}
H_n &= \mathbb E[H_n \mid \mathcal F_{n-1}] = \mathbb E\bigl[\mathbb{1}_{\{X_n = 1\}} \mid \mathcal F_{n-1}\bigr].
\end{align*}
Since $X_n$ is independent of $\mathcal F_{n-1}$, [Conditioning and Independence](/theorems/1152) gives
\begin{align*}
\mathbb E\bigl[\mathbb{1}_{\{X_n = 1\}} \mid \mathcal F_{n-1}\bigr]
&= \mathbb E\bigl[\mathbb{1}_{\{X_n = 1\}}\bigr]
= \mathbb P(X_n = 1)
= \tfrac{1}{2}.
\end{align*}
So the assumption that $H_n$ is $\mathcal F_{n-1}$-measurable forces $H_n = \tfrac{1}{2}$ almost surely. But $H_n = \mathbb{1}_{\{X_n = 1\}}$ is $\{0,1\}$-valued, and $\mathbb P(H_n = 0) = \mathbb P(X_n = -1) = \tfrac{1}{2} > 0$, so $H_n \ne \tfrac{1}{2}$ on a set of positive probability. This is a contradiction. Therefore $H_n$ is not $\mathcal F_{n-1}$-measurable, and $(H_n)$ is not predictable.
**Step 4: The accumulated gain process is not a martingale.** Define $Y_0 = 0$ and
\begin{align*}
Y_n &= \sum_{i=1}^n H_i X_i = \sum_{i=1}^n \mathbb{1}_{\{X_i = 1\}}, \qquad n \ge 1.
\end{align*}
Since $H_i X_i = \mathbb{1}_{\{X_i = 1\}}$ from Step 1, and the $X_i$ are i.i.d., linearity of expectation gives
\begin{align*}
\mathbb E[Y_n]
&= \sum_{i=1}^n \mathbb E\bigl[\mathbb{1}_{\{X_i = 1\}}\bigr]
= \sum_{i=1}^n \tfrac{1}{2}
= \frac{n}{2}.
\end{align*}
In particular $\mathbb E[Y_n] = n/2 \to \infty$ as $n \to \infty$. A martingale starting at $Y_0 = 0$ has constant expectation $\mathbb E[Y_n] = 0$ for all $n$; since $n/2 \ne 0$ for $n \ge 1$, the process $(Y_n)$ is not a martingale. This is consistent with *Bounded Martingale Transform* failing to apply: that theorem requires predictability of the integrand, which $(H_n)$ does not satisfy.
**CONCLUSION.** The strategy "bet on a $+1$ increment after seeing it" is not a legitimate betting strategy: it peeks at the outcome before placing the wager. The $\mathcal F_{n-1}$-measurability requirement in the definition of a predictable process formalises the constraint that bets must be placed using only information available strictly before the increment is revealed. Because $X_n$ is nondegenerate and independent of $\mathcal F_{n-1}$, the indicator $\mathbb{1}_{\{X_n = 1\}}$ is not $\mathcal F_{n-1}$-measurable, and no amount of algebraic manipulation can turn it into one. The positive expected gain $\mathbb E[H_n X_n] = \tfrac{1}{2}$ is real, but it is not extracted from the martingale $(S_n)$ by a fair strategy; it is extracted by clairvoyance. The filtration is not bookkeeping: it is the precise mathematical object that separates admissible decisions from inadmissible ones.
[/example]
The example shows that admissibility is a measurability condition: the filtration encodes which decisions can be made before the randomness is revealed.
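The contrast between admissible and inadmissible stakes can be verified by brute force. The sketch below enumerates all $2^n$ equally likely paths of the fair $\pm 1$ game and computes $\mathbb E[Y_n]$ exactly for two stake rules. The function names `expected_gain`, `doubling`, and `clairvoyant` are hypothetical, and the doubling rule is just one convenient predictable strategy that is bounded over a fixed horizon.

```python
from fractions import Fraction
from itertools import product

def expected_gain(stake, n):
    """Exact E[Y_n] for Y_n = sum_i H_i X_i over all 2**n equally likely paths.

    `stake(i, xs)` returns H_i given the full path xs; a predictable stake
    may only look at xs[:i-1], the increments before step i.
    """
    total = Fraction(0)
    for xs in product((1, -1), repeat=n):
        gain = sum(stake(i, xs) * xs[i - 1] for i in range(1, n + 1))
        total += Fraction(gain, 2 ** n)
    return total

def doubling(i, xs):
    # Predictable: double the stake after each loss, stop betting after a win.
    past = xs[: i - 1]
    return 0 if 1 in past else 2 ** (i - 1)

def clairvoyant(i, xs):
    # Not predictable: reads the increment X_i it is about to bet on.
    return 1 if xs[i - 1] == 1 else 0

print(expected_gain(doubling, 8))     # 0
print(expected_gain(clairvoyant, 8))  # 4
```

The predictable doubling rule wins one unit on most paths and loses $2^8 - 1$ units on the single all-loss path, so its expectation is exactly zero. The clairvoyant rule earns $n/2$, matching the computation in the example.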
## Maximal Inequalities and Convergence
The most useful martingale estimates are not pointwise. They control the largest fluctuation of a process over a time interval. This is where submartingales produced by convexity enter the theory.
[quotetheorem:1158]
This is the martingale analogue of Markov's inequality, but with the maximum over time controlled by the final expectation. It is the starting point for convergence theorems because it turns terminal bounds into pathwise control.
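A small exact check makes the comparison with the terminal expectation concrete. The sketch assumes the inequality in the common form $\lambda\, \mathbb P(\max_{k \le n} X_k \ge \lambda) \le \mathbb E[X_n]$ for a nonnegative submartingale (the quoted statement may be phrased differently) and applies it to $X_k = |S_k|$ for the fair $\pm 1$ walk. The helper name `maximal_check` is hypothetical.

```python
from fractions import Fraction
from itertools import accumulate, product

def maximal_check(n, lam):
    """Return (lam * P(max_k |S_k| >= lam), E|S_n|) for the fair +/-1 walk."""
    weight = Fraction(1, 2 ** n)
    tail = Fraction(0)
    terminal = Fraction(0)
    for xs in product((1, -1), repeat=n):
        path = list(accumulate(xs))  # S_1, ..., S_n
        if max(abs(s) for s in path) >= lam:
            tail += weight
        terminal += weight * abs(path[-1])
    return lam * tail, terminal

for lam in (1, 2, 3, 4):
    lhs, rhs = maximal_check(10, lam)
    print(lam, float(lhs), float(rhs), lhs <= rhs)
```

The right-hand side does not depend on $\lambda$, so a single terminal moment bounds the whole family of tail probabilities of the running maximum.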
The preceding maximal inequality controls tail probabilities, but many convergence and compactness arguments need norm bounds rather than one threshold at a time. For $L^p$ martingales, the question is whether the entire running maximum has a finite $L^p$ norm controlled by the terminal value. Such an estimate would make pathwise fluctuation control compatible with the standard Banach-space estimates used in analysis.
[quotetheorem:1159]
The constant depends only on $p$, not on the time horizon. This independence is what allows estimates to pass from finite intervals to infinite sequences.
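The horizon-independence can be observed numerically for $p = 2$. The sketch below assumes the $L^p$ bound has the usual constant $(p/(p-1))^p$, which equals $4$ when $p = 2$ (the quoted statement may normalize it differently), and computes the exact ratio $\mathbb E[\max_{k \le n} S_k^2] / \mathbb E[S_n^2]$ for the fair $\pm 1$ walk. The helper name `l2_maximal_ratio` is hypothetical.

```python
from fractions import Fraction
from itertools import accumulate, product

def l2_maximal_ratio(n):
    """Exact E[(max_k |S_k|)^2] / E[S_n^2] for the fair +/-1 walk, k = 1..n."""
    weight = Fraction(1, 2 ** n)
    max_sq = Fraction(0)
    end_sq = Fraction(0)
    for xs in product((1, -1), repeat=n):
        path = list(accumulate(xs))  # S_1, ..., S_n
        max_sq += weight * max(abs(s) for s in path) ** 2
        end_sq += weight * path[-1] ** 2
    return max_sq / end_sq

for n in (2, 6, 10, 14):
    print(n, float(l2_maximal_ratio(n)))
```

Under that assumption every printed ratio lies between $1$ and $4$, whatever the horizon $n$.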
Maximal bounds are useful because they are a route to convergence. A martingale may oscillate forever, so the next question is how to rule out infinitely many substantial upcrossings while still allowing ordinary random fluctuation. One-sided integrability bounds provide the compactness needed to force an almost sure limiting random variable.
[quotetheorem:1157]
Almost sure convergence alone does not say that the limiting random variable still represents the martingale by conditional expectation. The next question is when the limiting random variable can be used to reconstruct every earlier value of the martingale. For that stronger representation, the family of martingale values must not lose mass in rare large events as time tends to infinity. Uniform integrability is the condition that upgrades convergence into the statement that the whole martingale is the evolving conditional forecast of its limit.
[quotetheorem:1163]
The first part gives almost sure convergence under a one-sided bound. The uniformly integrable martingale case is stronger: the martingale is exactly the sequence of conditional forecasts of its limiting value.
Branching processes give a natural source of martingales because the expected population at the next generation is proportional to the current population. The proportionality can be divided out, leaving a fair process.
[definition: Galton-Watson Branching Process]
Let $(\xi_{n,i})_{n,i \ge 1}$ be i.i.d. random variables taking values in $\{0,1,2,\dots\}$. A Galton-Watson branching process with offspring distribution equal to the law of $\xi_{1,1}$ is the stochastic process $(Z_n)_{n \ge 0}$ defined by $Z_0 \in \{0,1,2,\dots\}$ and
\begin{align*}
Z_{n+1} &= \sum_{i=1}^{Z_n} \xi_{n+1,i}, \qquad n \ge 0.
\end{align*}
[/definition]
The martingale normalization uses the mean number of offspring. If the mean is $m$, then the expected population size after $n$ generations is $m^n$ times the initial population.
[example: Branching Process Martingale]
**SETUP.** Let $(\xi_{n,i})_{n \ge 1,\, i \ge 1}$ be i.i.d. non-negative integer-valued random variables with common mean
\begin{align*}
m &= \mathbb E[\xi_{1,1}] \in (0,\infty).
\end{align*}
Let $(Z_n)_{n \ge 0}$ be the Galton-Watson branching process with $Z_0 = 1$ and the recursion
\begin{align*}
Z_{n+1} &= \sum_{i=1}^{Z_n} \xi_{n+1,i}, \qquad n \ge 0,
\end{align*}
where an empty sum (when $Z_n = 0$) equals $0$. Let $\mathcal F_n = \sigma(Z_0, Z_1, \dots, Z_n)$ be the natural filtration of the population process. Define the normalized process
\begin{align*}
W_n &= \frac{Z_n}{m^n}, \qquad n \ge 0.
\end{align*}
Since $Z_0 = 1$, we have $W_0 = 1$ deterministically.
**CLAIM.** The process $(W_n)_{n \ge 0}$ is a nonnegative martingale with respect to $(\mathcal F_n)_{n \ge 0}$, with constant mean $\mathbb E[W_n] = 1$ for every $n \ge 0$. The [Almost Sure Martingale Convergence Theorem](/theorems/1157) therefore guarantees that $W_n$ converges almost surely to a finite nonnegative random variable $W_\infty$.
**DERIVATION.**
We verify the three conditions of the martingale definition.
**Step 1: Adaptedness.** For each $n \ge 0$, the random variable $Z_n$ is $\mathcal F_n$-measurable by definition of $\mathcal F_n = \sigma(Z_0, \dots, Z_n)$. Since $m > 0$ is a deterministic constant, $W_n = Z_n/m^n$ is a Borel function of $Z_n$ and hence $\mathcal F_n$-measurable. Thus $(W_n)_{n \ge 0}$ is adapted to $(\mathcal F_n)_{n \ge 0}$.
**Step 2: The fundamental conditional expectation.** We establish
\begin{align*}
\mathbb E[Z_{n+1} \mid \mathcal F_n] &= m Z_n \qquad \text{almost surely.}
\end{align*}
Fix an integer $k \ge 0$ and work on the event $\{Z_n = k\} \in \mathcal F_n$. The branching recursion gives $Z_{n+1} = \sum_{i=1}^{k} \xi_{n+1,i}$ on this event (an empty sum equalling $0$ when $k = 0$). By construction of the Galton-Watson process, the generation-$(n+1)$ offspring variables $\xi_{n+1,1}, \dots, \xi_{n+1,k}$ are i.i.d. and independent of $\mathcal F_n$. Applying linearity of conditional expectation to the fixed finite sum of $k$ terms ([Basic Properties of Conditional Expectation](/theorems/1148)), then dropping the conditioning on each $\xi_{n+1,i}$ by independence ([Conditioning and Independence](/theorems/1152)):
\begin{align*}
\mathbb E[Z_{n+1} \mid \mathcal F_n]\big|_{\{Z_n = k\}}
&= \mathbb E\!\left[\sum_{i=1}^{k} \xi_{n+1,i} \,\middle|\, \mathcal F_n\right]\bigg|_{\{Z_n = k\}} \\
&= \sum_{i=1}^{k} \mathbb E[\xi_{n+1,i} \mid \mathcal F_n]\bigg|_{\{Z_n = k\}} \\
&= \sum_{i=1}^{k} \mathbb E[\xi_{n+1,i}] \\
&= \sum_{i=1}^{k} m \;=\; km.
\end{align*}
Since $\{Z_n = k\}$ for $k = 0, 1, 2, \dots$ partition $\Omega$ into $\mathcal F_n$-measurable sets, and $\mathbb E[Z_{n+1}\mid\mathcal F_n] = km = mZ_n$ on each atom $\{Z_n = k\}$, we conclude
\begin{align*}
\mathbb E[Z_{n+1} \mid \mathcal F_n] &= m Z_n \qquad \text{almost surely.}
\end{align*}
**Step 3: Integrability.** We show $\mathbb E[Z_n] = m^n$ by induction on $n$.
*Base case* $n = 0$: $\mathbb E[Z_0] = 1 = m^0$.
*Inductive step*: Assume $\mathbb E[Z_n] = m^n$. By the [Tower Property of Conditional Expectation](/theorems/1150) and Step 2:
\begin{align*}
\mathbb E[Z_{n+1}]
&= \mathbb E\!\left[\mathbb E[Z_{n+1} \mid \mathcal F_n]\right]
= \mathbb E[m Z_n]
= m\,\mathbb E[Z_n]
= m \cdot m^n
= m^{n+1}.
\end{align*}
Therefore
\begin{align*}
\mathbb E[W_n] &= \frac{\mathbb E[Z_n]}{m^n} = \frac{m^n}{m^n} = 1 < \infty
\end{align*}
for every $n \ge 0$.
**Step 4: Martingale identity.** Using Step 2 and the fact that $1/m^{n+1}$ is a deterministic constant (so it pulls out of the conditional expectation by linearity, [Basic Properties of Conditional Expectation](/theorems/1148)):
\begin{align*}
\mathbb E[W_{n+1} \mid \mathcal F_n]
&= \mathbb E\!\left[\frac{Z_{n+1}}{m^{n+1}} \,\middle|\, \mathcal F_n\right]
= \frac{1}{m^{n+1}}\,\mathbb E[Z_{n+1} \mid \mathcal F_n]
= \frac{1}{m^{n+1}} \cdot m Z_n
= \frac{Z_n}{m^n}
= W_n.
\end{align*}
**Step 5: Nonnegativity.** Each $\xi_{n,i} \ge 0$ a.s., so $Z_n \ge 0$ a.s. for every $n$ by induction: $Z_0 = 1 \ge 0$, and if $Z_n \ge 0$ then $Z_{n+1} = \sum_{i=1}^{Z_n} \xi_{n+1,i} \ge 0$ as a sum of nonnegative terms. Since $m > 0$, we have $W_n = Z_n/m^n \ge 0$ a.s.
**Step 6: Almost sure convergence.** Since $W_n \ge 0$ a.s., we have $W_n^+ = W_n$, so
\begin{align*}
\sup_{n \ge 0} \mathbb E[W_n^+] &= \sup_{n \ge 0} \mathbb E[W_n] = \sup_{n \ge 0} 1 = 1 < \infty.
\end{align*}
The [Almost Sure Martingale Convergence Theorem](/theorems/1157) then gives an integrable random variable $W_\infty \ge 0$ such that $W_n \to W_\infty$ almost surely.
**CONCLUSION.** Each generation's offspring are i.i.d. with mean $m$ and, crucially, independent of the accumulated history $\mathcal F_n$. This independence lets conditional expectations reduce to unconditional ones, giving $\mathbb E[Z_{n+1}\mid\mathcal F_n] = mZ_n$ — precisely the multiplicative factor cancelled by the $m^n$ in the denominator of $W_n$. The result is a martingale, and its nonnegativity provides the uniform $L^1$ bound $\sup_n \mathbb E[W_n^+] = 1$ required by the Almost Sure Martingale Convergence Theorem.
The further question, whether $W_\infty > 0$ on the survival event $\{Z_n > 0 \text{ for all } n\}$, lies beyond the martingale conditions alone. It is only of interest in the supercritical case $m > 1$: when $m \le 1$ and the offspring law is not the point mass at $1$, the population dies out almost surely and $W_\infty = 0$. For $m > 1$, a theorem of Kesten and Stigum asserts that $\mathbb P(W_\infty > 0 \mid \text{survival}) = 1$ if and only if $\mathbb E[\xi_{1,1}\log^+\xi_{1,1}] < \infty$. When this moment condition fails, $W_\infty = 0$ almost surely even on survival: the normalized population collapses, reflecting a failure of uniform integrability in the family $\{W_n : n \ge 0\}$ that prevents $L^1$ convergence despite almost sure convergence.
[/example]
The example illustrates a recurring pattern: normalize a growing random object by its expected growth, obtain a martingale, and then use martingale convergence to extract a limiting random variable.
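The normalization can be checked without simulation by propagating the exact law of $Z_n$. The sketch below uses a hypothetical offspring law on $\{0, 1, 2\}$ with probabilities $1/4, 1/4, 1/2$, chosen only for illustration. The helper names `convolve`, `next_generation`, and `mean` are likewise assumptions of this sketch.

```python
from fractions import Fraction

def convolve(p, q):
    """Law of the sum of two independent integer variables with laws p and q."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def next_generation(law, offspring):
    """Given law[k] = P(Z_n = k), return the law of Z_{n+1}."""
    size = (len(law) - 1) * (len(offspring) - 1) + 1
    out = [Fraction(0)] * size
    k_fold = [Fraction(1)]  # law of an empty sum: point mass at 0
    for prob_k in law:
        for j, q in enumerate(k_fold):
            out[j] += prob_k * q
        k_fold = convolve(k_fold, offspring)  # law of a sum of one more offspring
    return out

def mean(law):
    return sum(k * p for k, p in enumerate(law))

# Hypothetical offspring law on {0, 1, 2}; its mean is m = 5/4.
offspring = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
m = mean(offspring)

law = [Fraction(0), Fraction(1)]  # Z_0 = 1
for n in range(1, 7):
    law = next_generation(law, offspring)
    print(n, mean(law) / m ** n, float(law[0]))  # E[W_n] and P(Z_n = 0)
```

The first printed column stays at $1$ in every generation, while the extinction probability in the second column grows. Constant mean is therefore compatible with a limit $W_\infty$ that vanishes on part of the sample space.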
## Continuous-Time Martingales
Continuous time introduces two new issues. First, there are uncountably many times, so path regularity matters. Second, stochastic calculus studies processes that can fluctuate infinitely often on every interval, so martingale structure must be compatible with continuous paths and quadratic variation.
To avoid ambiguity at time boundaries, continuous-time probability usually assumes right-continuity of the filtration.
[definition: Right-Continuous Filtration]
Let $(\mathcal F_t)_{t \ge 0}$ be a filtration on $(\Omega, \mathcal F, \mathbb P)$. The filtration is right-continuous if
\begin{align*}
\mathcal F_t &= \bigcap_{s > t} \mathcal F_s
\end{align*}
for every $t \ge 0$.
[/definition]
Right-continuity says that no new information appears immediately after time $t$ without being visible at time $t$. It is part of the usual hypotheses in stochastic calculus, often paired with completeness of the probability space.
Continuous time needs a model whose noise has no predictable drift but still accumulates random fluctuation over every interval. The process must have independent future increments relative to the filtration, continuous sample paths, and variance that scales with elapsed time. Standard Brownian motion packages these requirements into the basic source of continuous-time martingales.
[definition: Standard Brownian Motion]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space with filtration $(\mathcal F_t)_{t \ge 0}$. A real-valued process $(W_t)_{t \ge 0}$ is a standard Brownian motion with respect to $(\mathcal F_t)_{t \ge 0}$ if:
1. $W_0 = 0$ almost surely;
2. $(W_t)_{t \ge 0}$ has continuous paths almost surely;
3. $(W_t)_{t \ge 0}$ is adapted to $(\mathcal F_t)_{t \ge 0}$;
4. for all $0 \le s \le t$, the increment $W_t - W_s$ is independent of $\mathcal F_s$ and satisfies
\begin{align*}
W_t - W_s &\sim \mathcal N(0,t-s).
\end{align*}
[/definition]
The mean-zero independent increments in the definition are exactly what the martingale identity needs, but the continuous-time statement still has to check compatibility with the whole filtration at every pair of times. The point is that conditioning on $\mathcal F_s$ should remove only the future increment and leave the already observed value $W_s$ unchanged. This turns Brownian motion from a path model into a martingale relative to its information flow.
[quotetheorem:1183]
Brownian motion is not the only martingale built from Brownian motion. Subtracting the predictable variance from its square removes the drift created by quadratic fluctuations.
[example: Square of Brownian Motion]
**SETUP.** Let $(W_t)_{t \ge 0}$ be a standard Brownian motion on a probability space $(\Omega, \mathcal F, \mathbb P)$. Let $(\mathcal F_t)_{t \ge 0}$ denote its natural filtration, so $\mathcal F_t = \sigma(W_s : 0 \le s \le t)$. By definition of standard Brownian motion, $W_0 = 0$ almost surely, paths are continuous almost surely, $(W_t)$ is adapted to $(\mathcal F_t)$, and for every $0 \le s \le t$ the increment $W_t - W_s$ is independent of $\mathcal F_s$ and satisfies $W_t - W_s \sim \mathcal{N}(0, t - s)$.
We study two processes built from $W$:
\begin{align*}
X_t &= W_t^2, \qquad t \ge 0, \\
M_t &= W_t^2 - t, \qquad t \ge 0.
\end{align*}
**CLAIM.** The process $(X_t)_{t \ge 0} = (W_t^2)_{t \ge 0}$ is a submartingale with respect to $(\mathcal F_t)_{t \ge 0}$, but not a martingale. Removing the accumulated quadratic drift produces the martingale $(M_t)_{t \ge 0} = (W_t^2 - t)_{t \ge 0}$. Specifically, for all $0 \le s \le t$:
\begin{align*}
\mathbb E[W_t^2 \mid \mathcal F_s] &= W_s^2 + (t - s), \\
\mathbb E[M_t \mid \mathcal F_s] &= M_s.
\end{align*}
**DERIVATION.**
**Step 1: Adaptedness.** Since $W_t$ is $\mathcal F_t$-measurable by construction of $\mathcal F_t$ and the map $x \mapsto x^2$ is Borel, $W_t^2$ is $\mathcal F_t$-measurable. Thus $(W_t^2)_{t \ge 0}$ is adapted to $(\mathcal F_t)_{t \ge 0}$. Since $t \ge 0$ is a deterministic constant, $M_t = W_t^2 - t$ is also $\mathcal F_t$-measurable.
**Step 2: Integrability.** Because $W_t \sim \mathcal{N}(0, t)$, its second moment equals its variance:
\begin{align*}
\mathbb E[W_t^2] &= \mathrm{Var}(W_t) + (\mathbb E[W_t])^2 = t + 0 = t < \infty.
\end{align*}
So $\mathbb E[|X_t|] = \mathbb E[W_t^2] = t < \infty$ for every $t \ge 0$, giving integrability of $(X_t)$. For $(M_t)$, the triangle inequality gives $\mathbb E[|M_t|] \le \mathbb E[W_t^2] + t = t + t = 2t < \infty$.
**Step 3: Expanding $W_t^2$ around the past value.** Fix $0 \le s \le t$. Since $W_t = W_s + (W_t - W_s)$, squaring gives:
\begin{align*}
W_t^2 &= \bigl(W_s + (W_t - W_s)\bigr)^2 \\
&= W_s^2 + 2W_s(W_t - W_s) + (W_t - W_s)^2.
\end{align*}
**Step 4: Computing each conditional expectation.** Apply $\mathbb E[\,\cdot \mid \mathcal F_s]$ to both sides of Step 3, using linearity ([Basic Properties of Conditional Expectation](/theorems/1148)):
\begin{align*}
\mathbb E[W_t^2 \mid \mathcal F_s]
&= \mathbb E[W_s^2 \mid \mathcal F_s]
+ \mathbb E\bigl[2W_s(W_t - W_s) \mid \mathcal F_s\bigr]
+ \mathbb E\bigl[(W_t - W_s)^2 \mid \mathcal F_s\bigr].
\end{align*}
We evaluate each term.
*Term 1.* $W_s^2$ is $\mathcal F_s$-measurable (established in Step 1 with $t$ replaced by $s$). By [Basic Properties of Conditional Expectation](/theorems/1148), a random variable that is already $\mathcal F_s$-measurable equals its own conditional expectation:
\begin{align*}
\mathbb E[W_s^2 \mid \mathcal F_s] &= W_s^2.
\end{align*}
*Term 2.* $W_s$ is $\mathcal F_s$-measurable. By [Basic Properties of Conditional Expectation](/theorems/1148), measurable factors extract from conditional expectations. The increment $W_t - W_s$ is independent of $\mathcal F_s$ by the definition of standard Brownian motion, so [Conditioning and Independence](/theorems/1152) gives $\mathbb E[W_t - W_s \mid \mathcal F_s] = \mathbb E[W_t - W_s]$. Since $W_t - W_s \sim \mathcal{N}(0, t - s)$, its mean is $\mathbb E[W_t - W_s] = 0$. Combining:
\begin{align*}
\mathbb E\bigl[2W_s(W_t - W_s) \mid \mathcal F_s\bigr]
&= 2W_s\,\mathbb E[W_t - W_s \mid \mathcal F_s]
= 2W_s \cdot 0
= 0.
\end{align*}
*Term 3.* The function $u \mapsto u^2$ is Borel, so $(W_t - W_s)^2$ is a Borel function of the increment $W_t - W_s$, which is independent of $\mathcal F_s$. Applying [Conditioning and Independence](/theorems/1152) again:
\begin{align*}
\mathbb E\bigl[(W_t - W_s)^2 \mid \mathcal F_s\bigr]
&= \mathbb E\bigl[(W_t - W_s)^2\bigr].
\end{align*}
Since $W_t - W_s \sim \mathcal{N}(0, t - s)$, the second moment of a centered normal random variable equals its variance:
\begin{align*}
\mathbb E\bigl[(W_t - W_s)^2\bigr]
&= \mathrm{Var}(W_t - W_s) + \bigl(\mathbb E[W_t - W_s]\bigr)^2
= (t - s) + 0^2
= t - s.
\end{align*}
**Step 5: Assembling the result.** Summing the three terms:
\begin{align*}
\mathbb E[W_t^2 \mid \mathcal F_s]
&= W_s^2 + 0 + (t - s)
= W_s^2 + (t - s).
\end{align*}
**Step 6: $(W_t^2)_{t \ge 0}$ is a submartingale but not a martingale.** For $t \ge s$, we have $t - s \ge 0$, so:
\begin{align*}
\mathbb E[W_t^2 \mid \mathcal F_s] &= W_s^2 + (t - s) \ge W_s^2.
\end{align*}
The inequality $\mathbb E[X_t \mid \mathcal F_s] \ge X_s$ holds almost surely, so $(W_t^2)$ is a submartingale. When $t > s$, the gap $\mathbb E[W_t^2 \mid \mathcal F_s] - W_s^2 = t - s$ is a strictly positive deterministic constant, so the inequality is strict almost surely. Therefore the martingale identity $\mathbb E[W_t^2 \mid \mathcal F_s] = W_s^2$ fails for every pair $s < t$, and $(W_t^2)$ is not a martingale.
**Step 7: $(M_t)_{t \ge 0} = (W_t^2 - t)_{t \ge 0}$ is a martingale.** For $0 \le s \le t$, apply linearity and the fact that the deterministic quantity $t$ is its own conditional expectation ([Basic Properties of Conditional Expectation](/theorems/1148)):
\begin{align*}
\mathbb E[M_t \mid \mathcal F_s]
&= \mathbb E[W_t^2 - t \mid \mathcal F_s] \\
&= \mathbb E[W_t^2 \mid \mathcal F_s] - t \\
&= \bigl(W_s^2 + (t - s)\bigr) - t \\
&= W_s^2 - s \\
&= M_s.
\end{align*}
The third line substitutes the result of Step 5. The fourth line expands and cancels: $W_s^2 + t - s - t = W_s^2 - s$. Together with adaptedness and integrability from Steps 1–2, this confirms that $(M_t)$ is a martingale.
**CONCLUSION.** Squaring Brownian motion introduces a bias: each future increment $W_t - W_s$ has mean zero, but its square $(W_t - W_s)^2$ has mean $t - s$, the elapsed time. This quadratic mean accumulates deterministically regardless of the path, pushing $\mathbb E[W_t^2 \mid \mathcal F_s]$ above the current value $W_s^2$ by exactly $t - s$. The compensated process $M_t = W_t^2 - t$ subtracts this accumulated drift, restoring the martingale identity. The term $t$ is precisely the quadratic variation $[W]_t = t$ of Brownian motion: in the language of the Doob-Meyer decomposition, the continuous-time counterpart of the Doob decomposition, the predictable increasing process $A_t = t$ is the compensator of the submartingale $(W_t^2)$, and stripping it away leaves the martingale part $(M_t)$.
[/example]
The preceding example identifies a drift correction for $W_t^2$, but the correction is not ordinary variation of the Brownian path. Brownian paths are too rough for first-order accumulated displacement to capture their stochastic size. What survives under fine partitions is the accumulated square of small increments, and this second-order accumulation is the quantity that stochastic calculus must measure.
[definition: Quadratic Variation]
Let $(X_t)_{t \ge 0}$ be a real-valued stochastic process. A process $([X]_t)_{t \ge 0}$ is the quadratic variation of $X$ along a refining family of partitions if, for every $t \ge 0$, the sums
\begin{align*}
\sum_{i=0}^{k_n-1} \left(X_{t_{i+1}^{(n)} \wedge t} - X_{t_i^{(n)} \wedge t}\right)^2
\end{align*}
converge in probability to $[X]_t$ as the mesh of the partition $\{0 = t_0^{(n)} < \cdots < t_{k_n}^{(n)}\}$ tends to $0$.
[/definition]
For Brownian motion, the definition should recover the deterministic clock suggested by the compensated martingale $W_t^2-t$. This is the first place where Brownian motion visibly differs from a smooth path: its first-order increments average to zero, but their squared increments accumulate at a nonzero deterministic rate. The key question is whether the partition sums really converge to this deterministic clock rather than to a random or partition-dependent limit.
[quotetheorem:3543]
The theorem identifies the deterministic second-order clock hidden inside Brownian paths. It is not a pathwise bounded-variation statement: individual Brownian paths oscillate too violently for ordinary variation to work, while the squared increments stabilize in probability. This is why Itô's formula later contains a second-derivative correction term. The correction is not an artificial convention; it is the analytic trace of the nonzero quadratic variation $[W]_t=t$.
[illustration:brownian-quadratic-variation-sums]
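The two kinds of accumulation can be compared in a short simulation. This is a sketch under stated assumptions: Brownian increments are sampled on a uniform grid as independent $\mathcal N(0, t/n)$ variables, each grid size draws a fresh path rather than refining a single one, and the helper name `variation_sums` is hypothetical.

```python
import math
import random

def variation_sums(n_steps, t=1.0, seed=0):
    """Simulate Brownian increments on a uniform grid of [0, t] and return
    (sum of |increments|, sum of squared increments)."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n_steps)  # each increment is N(0, t / n_steps)
    first_order = 0.0
    second_order = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sd)
        first_order += abs(dw)
        second_order += dw * dw
    return first_order, second_order

for n in (100, 10_000, 100_000):
    first, second = variation_sums(n)
    print(n, round(first, 2), round(second, 4))
```

As the grid is refined, the sum of absolute increments grows roughly like $\sqrt{2nt/\pi}$, while the sum of squared increments settles near $t$. This matches the distinction drawn above between ordinary variation and quadratic variation.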
Quadratic variation gives the second-order clock for Brownian motion, but stochastic calculus also needs a way to handle processes before global integrability estimates are available. Many natural processes behave like martingales up to the first time they become too large, and only later require separate bounds. This motivates a localized version of the martingale property, where fairness is tested after stopping at an increasing sequence of safer times.
[definition: Local Martingale]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space with filtration $(\mathcal F_t)_{t \ge 0}$. A real-valued adapted process $(X_t)_{t \ge 0}$ is a local martingale if there exists an increasing sequence of stopping times $(\tau_n)_{n \ge 1}$ such that $\tau_n \to \infty$ almost surely and, for each $n \ge 1$, the stopped process $(X_{t \wedge \tau_n})_{t \ge 0}$ is a martingale.
[/definition]
Every martingale is a local martingale, but the converse requires integrability control. This distinction is one reason stochastic calculus often proves local statements first and then adds moment estimates when expectations are needed.
[quotetheorem:2077]
The theorem is a warning: local fairness plus nonnegativity can still lose mass in expectation. In financial mathematics this phenomenon is related to strict local martingales and asset price bubbles; in analysis it reflects mass escaping through localization limits.
## Stochastic Integrals and Itô Calculus
The stochastic integral is built to preserve the martingale principle. If the integrand is chosen using only past information and the integrator has martingale increments, the resulting accumulated gain should again have no predictable drift.
The continuous-time analogue of a predictable betting strategy is a predictable process. The full definition uses a $\sigma$-algebra on space-time that records information available immediately before each time.
[definition: Predictable Sigma Algebra]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space with filtration $(\mathcal F_t)_{t \ge 0}$. The predictable $\sigma$-algebra on $\Omega \times [0,\infty)$ is the $\sigma$-algebra generated by all left-continuous adapted real-valued processes on $\Omega \times [0,\infty)$.
[/definition]
With this object in place, predictability becomes the admissibility condition for continuous-time integrands. The value of the integrand at time $t$ must be determined without seeing the fresh noise arriving at that same instant. The measurability condition below turns that no-anticipation principle into a precise test for whether a process may be used as an Itô integrand.
[definition: Predictable Process in Continuous Time]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space with filtration $(\mathcal F_t)_{t \ge 0}$ and predictable $\sigma$-algebra $\mathcal P$ on $\Omega \times [0,\infty)$. A real-valued process $(H_t)_{t \ge 0}$ is predictable if the map
\begin{align*}
H: \Omega \times [0,\infty) &\to \mathbb R \\
(\omega,t) &\mapsto H_t(\omega)
\end{align*}
is $\mathcal P$-measurable.
[/definition]
Predictable processes are the allowable strategies, but we still need to define what it means to accumulate gains against Brownian motion. The roughness of Brownian paths prevents an ordinary pathwise Riemann-Stieltjes integral, so the construction begins in $L^2$ where approximation by simple predictable processes can be controlled. The resulting object is the Itô integral with respect to Brownian motion.
[definition: Itô Integral with Respect to Brownian Motion]
Let $(W_t)_{t \ge 0}$ be a standard Brownian motion with respect to a filtration $(\mathcal F_t)_{t \ge 0}$. For a predictable process $(H_t)_{0 \le t \le T}$ satisfying
\begin{align*}
\mathbb E\left[\int_0^T H_t^2\, dt\right] &< \infty,
\end{align*}
the Itô integral of $H$ with respect to $W$ on $[0,T]$ is the square-integrable random variable
\begin{align*}
\int_0^T H_t\, dW_t.
\end{align*}
[/definition]
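The approximation behind this definition can be sketched as follows. The partition $0 = t_0 < t_1 < \dots < t_n = T$, the coefficients $\xi_k$, and the sequence $H^{(n)}$ are illustrative symbols, not notation fixed elsewhere on this page. For a simple predictable process with bounded $\mathcal F_{t_k}$-measurable coefficients $\xi_k$, the integral is a finite sum of stakes times increments:
\begin{align*}
H_t &= \sum_{k=0}^{n-1} \xi_k\, \mathbf 1_{(t_k, t_{k+1}]}(t), &
\int_0^T H_t\, dW_t &= \sum_{k=0}^{n-1} \xi_k \left(W_{t_{k+1}} - W_{t_k}\right).
\end{align*}
For a general $H$ satisfying the integrability condition, one would choose simple predictable processes $H^{(n)}$ with $\mathbb E\bigl[\int_0^T (H_t - H^{(n)}_t)^2\, dt\bigr] \to 0$ and set
\begin{align*}
\int_0^T H_t\, dW_t &= \lim_{n \to \infty} \int_0^T H^{(n)}_t\, dW_t \qquad \text{in } L^2(\mathbb P).
\end{align*}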
The construction is made by approximation with simple predictable processes, so it needs a norm identity that proves different approximating sequences give the same limit. The essential obstruction is that Brownian paths are too rough for ordinary variation estimates; the square-integrable theory is controlled instead by quadratic variation.
[quotetheorem:3544]
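A minimal numerical sketch, assuming the quoted theorem is the usual Itô isometry $\mathbb E\bigl[(\int_0^T H_t\, dW_t)^2\bigr] = \mathbb E\bigl[\int_0^T H_t^2\, dt\bigr]$. It takes $H_t = W_t$, for which the right-hand side is $\int_0^T t\, dt = T^2/2$, and approximates the integral by left-endpoint sums on a uniform grid. The function names, grid size, and path count are illustrative choices.

```python
import math
import random


def ito_left_sum(rng, T, n_steps):
    """Left-endpoint sum  sum_k W_{t_k} (W_{t_{k+1}} - W_{t_k})  on one simulated path."""
    dt = T / n_steps
    sd = math.sqrt(dt)
    w = 0.0        # current value W_{t_k}
    total = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sd)   # Brownian increment, independent of the past
        total += w * dw           # the stake W_{t_k} is known before the increment
        w += dw
    return total


def second_moment_estimate(T=1.0, n_steps=100, n_paths=4000, seed=0):
    """Monte Carlo estimate of E[(int_0^T W dW)^2] from discretized paths."""
    rng = random.Random(seed)
    samples = [ito_left_sum(rng, T, n_steps) for _ in range(n_paths)]
    return sum(x * x for x in samples) / n_paths


if __name__ == "__main__":
    T = 1.0
    print("Monte Carlo estimate of E[(int W dW)^2]:", second_moment_estimate(T))
    print("Isometry right-hand side T^2 / 2:       ", T * T / 2)
```

The estimate carries both Monte Carlo error and a small discretization bias: the left-endpoint sum has second moment $(1 - 1/n)\,T^2/2$ on a grid of $n$ steps. Agreement is therefore only approximate.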
The isometry controls the size of the terminal integral, but martingale theory also requires a dynamic statement at intermediate times. If the integrand is predictable, then each future Brownian increment is multiplied only by information already known before that increment occurs. This no-anticipation structure should preserve conditional fairness for the whole integral process.
[quotetheorem:3545]
This theorem is the continuous-time version of the bounded martingale transform. The integrand $H_s$ is the stake, and $dW_s$ is the infinitesimal martingale increment.
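The role of no-anticipation can be illustrated numerically. The sketch below compares two discrete betting sums against simulated Brownian increments: one uses the stake $W_{t_k}$, known before the increment, and the other uses the anticipating stake $W_{t_{k+1}}$. Assuming the quoted theorem is the standard statement that the Itô integral process is a martingale, the first sum should have mean near $0$. The second has mean $\sum_k \mathbb E[(W_{t_{k+1}} - W_{t_k})^2] = T$, a drift created purely by peeking at the increment. Function names and parameters are illustrative.

```python
import math
import random


def left_and_right_sums(rng, T, n_steps):
    """Return (non-anticipating sum, anticipating sum) along one simulated path."""
    dt = T / n_steps
    sd = math.sqrt(dt)
    w = 0.0
    left = 0.0
    right = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sd)
        left += w * dw            # stake W_{t_k}: chosen before the increment
        right += (w + dw) * dw    # stake W_{t_{k+1}}: uses the increment itself
        w += dw
    return left, right


def mean_gains(T=1.0, n_steps=100, n_paths=4000, seed=1):
    """Monte Carlo means of the two accumulated gains."""
    rng = random.Random(seed)
    sum_left = 0.0
    sum_right = 0.0
    for _ in range(n_paths):
        left, right = left_and_right_sums(rng, T, n_steps)
        sum_left += left
        sum_right += right
    return sum_left / n_paths, sum_right / n_paths


if __name__ == "__main__":
    fair, biased = mean_gains()
    print("mean gain with predictable stake:  ", fair)    # expected near 0
    print("mean gain with anticipating stake: ", biased)  # expected near T = 1
```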
Once stochastic integrals are available, the next obstruction is that ordinary chain rules are no longer correct for Brownian-driven processes. Quadratic variation contributes a second-order finite-variation term, so smooth transformations must be expanded with both first and second derivatives. Itô's formula gives the rule that separates the martingale part from the finite-variation correction.
[quotetheorem:3546]
The second derivative term is the analytic trace of quadratic variation. If this finite-variation term vanishes or is compensated, the remaining process is a martingale.
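As a brief worked instance, assume the quoted formula has the standard form $f(W_t) = f(W_0) + \int_0^t f'(W_s)\, dW_s + \tfrac12 \int_0^t f''(W_s)\, ds$ for $f \in C^2(\mathbb R)$, which is the form used in the example that follows. Taking $f(x) = x^2$, with $f'(x) = 2x$ and $f''(x) = 2$, gives
\begin{align*}
W_t^2 &= 2\int_0^t W_s\, dW_s + t.
\end{align*}
The finite-variation term is exactly $t$, the quadratic variation of $W$ on $[0,t]$. Subtracting it leaves
\begin{align*}
W_t^2 - t &= 2\int_0^t W_s\, dW_s,
\end{align*}
which is a martingale on each $[0,T]$ because $\mathbb E\bigl[\int_0^T 4 W_s^2\, ds\bigr] = 4\int_0^T s\, ds = 2T^2 < \infty$.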
[example: Exponential Martingale]
**SETUP.** Let $(W_t)_{t \ge 0}$ be a standard Brownian motion on a probability space $(\Omega, \mathcal F, \mathbb P)$ with natural filtration $(\mathcal F_t)_{t \ge 0}$, where $\mathcal F_t = \sigma(W_s : 0 \le s \le t)$. By definition, $W_0 = 0$ almost surely, paths are continuous almost surely, and for every $0 \le s \le t$ the increment $W_t - W_s \sim \mathcal N(0, t - s)$ is independent of $\mathcal F_s$. Fix $\theta \in \mathbb R$ and define
\begin{align*}
M_t &= \exp\!\left(\theta W_t - \frac{\theta^2}{2}\,t\right), \qquad t \ge 0.
\end{align*}
Since $W_0 = 0$ almost surely, $M_0 = e^0 = 1$ almost surely.
**CLAIM.** The process $(M_t)_{t \ge 0}$ is a strictly positive local martingale satisfying the stochastic differential equation
\begin{align*}
dM_t &= \theta M_t\, dW_t.
\end{align*}
On each finite interval $[0, T]$ it is a genuine square-integrable martingale, and
\begin{align*}
\mathbb E[M_t] &= 1
\end{align*}
for every $0 \le t \le T$.
**DERIVATION.**
**Step 1: Positivity and adaptedness.** The exponential function is everywhere strictly positive, so $M_t = e^{(\cdots)} > 0$ for all $\omega \in \Omega$ and all $t \ge 0$. Since $W_t$ is $\mathcal F_t$-measurable and $t$ is a deterministic constant, the map $\omega \mapsto \exp(\theta W_t(\omega) - \tfrac{\theta^2}{2} t)$ is a Borel function of $W_t(\omega)$. By [Basic Properties of Conditional Expectation](/theorems/1148), Borel functions of $\mathcal F_t$-measurable random variables are $\mathcal F_t$-measurable, so $M_t$ is $\mathcal F_t$-measurable for every $t \ge 0$.
**Step 2: Apply the Itô formula to $e^{\theta W_t}$.** Set $f(x) = e^{\theta x}$. Then $f \in C^2(\mathbb R)$ with
\begin{align*}
f'(x) &= \theta e^{\theta x}, \qquad f''(x) = \theta^2 e^{\theta x}.
\end{align*}
By *Itô Formula for Brownian Motion*, for every $t \ge 0$:
\begin{align*}
e^{\theta W_t}
&= e^{\theta W_0} + \int_0^t \theta e^{\theta W_s}\, dW_s + \frac{1}{2}\int_0^t \theta^2 e^{\theta W_s}\, ds.
\end{align*}
Since $W_0 = 0$ almost surely, $e^{\theta W_0} = 1$. In differential notation this reads
\begin{align*}
d\!\left(e^{\theta W_t}\right) &= \theta e^{\theta W_t}\, dW_t + \frac{\theta^2}{2}\, e^{\theta W_t}\, dt. \tag{$*$}
\end{align*}
**Step 3: Introduce the time-compensating factor and compute $dM_t$.** Write $M_t = e^{\theta W_t} \cdot g(t)$ where $g(t) = e^{-\theta^2 t/2}$. The function $g$ is deterministic and $C^1$ with $g'(t) = -\tfrac{\theta^2}{2}\, g(t)$. A deterministic $C^1$ function is continuous and of finite variation on compact intervals, so its covariation with any continuous semimartingale $X$ vanishes: $[X, g]_t = 0$. The product rule for a semimartingale multiplied by a deterministic $C^1$ function therefore gives
\begin{align*}
dM_t &= g(t)\, d\!\left(e^{\theta W_t}\right) + e^{\theta W_t}\, g'(t)\, dt.
\end{align*}
Substituting $(*)$ and $g'(t) = -\tfrac{\theta^2}{2}\, g(t)$:
\begin{align*}
dM_t
&= e^{-\theta^2 t/2}\!\left[\theta e^{\theta W_t}\, dW_t + \frac{\theta^2}{2}\, e^{\theta W_t}\, dt\right]
+ e^{\theta W_t}\!\left[-\frac{\theta^2}{2}\, e^{-\theta^2 t/2}\right] dt \\
&= \theta\, e^{\theta W_t - \theta^2 t/2}\, dW_t
+ \frac{\theta^2}{2}\, e^{\theta W_t - \theta^2 t/2}\, dt
- \frac{\theta^2}{2}\, e^{\theta W_t - \theta^2 t/2}\, dt \\
&= \theta M_t\, dW_t + \left(\frac{\theta^2}{2} - \frac{\theta^2}{2}\right) M_t\, dt \\
&= \theta M_t\, dW_t.
\end{align*}
The two $dt$ contributions cancel exactly. In integral form, with $M_0 = 1$:
\begin{align*}
M_t &= 1 + \int_0^t \theta M_s\, dW_s.
\end{align*}
**Step 4: $(M_t)_{t \ge 0}$ is a local martingale.** The representation $M_t = 1 + \int_0^t \theta M_s\, dW_s$ writes $M_t$ as a stochastic integral against Brownian motion. Define stopping times $\tau_n = \inf\{t \ge 0 : M_t \ge n\}$ for $n \ge 1$. Stopping the integral at $\tau_n$ gives $M_{t \wedge \tau_n} = 1 + \int_0^t \theta M_s \mathbf 1_{\{s \le \tau_n\}}\, dW_s$. By path continuity $M_s \le n$ for $s \le \tau_n$, so the integrand is bounded in absolute value by $|\theta| n$ and $\mathbb E[\int_0^T \theta^2 M_s^2 \mathbf 1_{\{s \le \tau_n\}}\, ds] \le \theta^2 n^2 T < \infty$ for every $T, n < \infty$. By *Brownian Stochastic Integral Martingale*, the stopped process $(M_{t \wedge \tau_n})_{t \ge 0}$ is a square-integrable martingale on each $[0,T]$ for each $n$. Since the paths of $M$ are almost surely continuous, they are bounded on every compact time interval, so the nondecreasing stopping times $\tau_n$ increase to infinity almost surely. Therefore $(M_t)_{t \ge 0}$ is a local martingale. Being a positive local martingale, it is also a supermartingale by [Non-negative Local Martingale is a Supermartingale](/theorems/2077).
**Step 5: $(M_t)_{0 \le t \le T}$ is a genuine martingale on every finite interval.** The integrand $\theta M_s$ is continuous and adapted, hence predictable. To apply *Brownian Stochastic Integral Martingale* without stopping, it remains to verify
\begin{align*}
\mathbb E\!\left[\int_0^T \theta^2 M_s^2\, ds\right] &< \infty.
\end{align*}
We compute $\mathbb E[M_s^2]$. Since $M_s^2 = e^{2\theta W_s - \theta^2 s}$,
\begin{align*}
\mathbb E[M_s^2] &= e^{-\theta^2 s}\, \mathbb E\!\left[e^{2\theta W_s}\right].
\end{align*}
The density of $W_s \sim \mathcal N(0, s)$ is $\frac{1}{\sqrt{2\pi s}}\, e^{-x^2/(2s)}$. We evaluate $\mathbb E[e^{2\theta W_s}]$ by completing the square:
\begin{align*}
2\theta x - \frac{x^2}{2s}
&= -\frac{1}{2s}\!\left(x^2 - 4\theta s\, x\right)
= -\frac{(x - 2\theta s)^2}{2s} + 2\theta^2 s.
\end{align*}
Therefore
\begin{align*}
\mathbb E\!\left[e^{2\theta W_s}\right]
&= \frac{1}{\sqrt{2\pi s}} \int_{-\infty}^{\infty} \exp\!\left(-\frac{(x - 2\theta s)^2}{2s} + 2\theta^2 s\right) dx \\
&= e^{2\theta^2 s} \cdot \frac{1}{\sqrt{2\pi s}} \int_{-\infty}^{\infty} e^{-(x - 2\theta s)^2/(2s)}\, dx \\
&= e^{2\theta^2 s} \cdot 1 \;=\; e^{2\theta^2 s},
\end{align*}
where the last integral equals $\sqrt{2\pi s}$ because the integrand is the unnormalized density of $\mathcal N(2\theta s,\, s)$. Substituting back:
\begin{align*}
\mathbb E[M_s^2] &= e^{-\theta^2 s} \cdot e^{2\theta^2 s} = e^{\theta^2 s}.
\end{align*}
By Fubini's theorem (the integrand is nonnegative):
\begin{align*}
\mathbb E\!\left[\int_0^T \theta^2 M_s^2\, ds\right]
&= \theta^2 \int_0^T \mathbb E[M_s^2]\, ds
= \theta^2 \int_0^T e^{\theta^2 s}\, ds.
\end{align*}
If $\theta = 0$ then $M_t = 1$ identically, and the claim is trivial. For $\theta \ne 0$, evaluating the integral:
\begin{align*}
\theta^2 \int_0^T e^{\theta^2 s}\, ds
&= \theta^2 \cdot \frac{e^{\theta^2 s}}{\theta^2}\bigg|_{s=0}^{s=T}
= e^{\theta^2 T} - 1
< \infty.
\end{align*}
The integrability condition holds for every finite $T$. By *Brownian Stochastic Integral Martingale*, the process $(M_t)_{0 \le t \le T} = \bigl(1 + \int_0^t \theta M_s\, dW_s\bigr)_{0 \le t \le T}$ is a square-integrable martingale with respect to $(\mathcal F_t)_{0 \le t \le T}$.
**Step 6: $\mathbb E[M_t] = 1$.** Since $(M_t)_{0 \le t \le T}$ is a martingale and $M_0 = 1$, taking unconditional expectations of the martingale identity at times $0$ and $t$ gives
\begin{align*}
\mathbb E[M_t] &= \mathbb E[M_0] = 1.
\end{align*}
This may also be verified directly without invoking the martingale property. Completing the square in $\theta x - x^2/(2t)$:
\begin{align*}
\theta x - \frac{x^2}{2t}
&= -\frac{(x - \theta t)^2}{2t} + \frac{\theta^2 t}{2},
\end{align*}
so
\begin{align*}
\mathbb E\!\left[e^{\theta W_t}\right]
&= \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{\infty} e^{\theta x}\, e^{-x^2/(2t)}\, dx
= e^{\theta^2 t/2} \cdot \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{\infty} e^{-(x - \theta t)^2/(2t)}\, dx
= e^{\theta^2 t/2},
\end{align*}
where the final integral equals $\sqrt{2\pi t}$ by normalization of the $\mathcal N(\theta t, t)$ density. Therefore
\begin{align*}
\mathbb E[M_t]
&= e^{-\theta^2 t/2} \cdot \mathbb E\!\left[e^{\theta W_t}\right]
= e^{-\theta^2 t/2} \cdot e^{\theta^2 t/2}
= 1.
\end{align*}
**CONCLUSION.** The exponent $\theta W_t - \tfrac{\theta^2}{2}t$ has two parts: a diffusion term $\theta W_t$ that fluctuates, and a deterministic drift term $-\tfrac{\theta^2}{2}t$ chosen to cancel the Itô correction. Specifically, applying the Itô formula to $e^{\theta W_t}$ produces the second-order term $+\tfrac{\theta^2}{2} e^{\theta W_t}\, dt$, which becomes $+\tfrac{\theta^2}{2} M_t\, dt$ after multiplication by the compensator $e^{-\theta^2 t/2}$. Differentiating the compensator by the ordinary chain rule produces the equal and opposite contribution $-\tfrac{\theta^2}{2} M_t\, dt$. The two $dt$ terms cancel identically, leaving $dM_t = \theta M_t\, dW_t$, a stochastic integral with no drift, which is the hallmark of a local martingale. The coefficient $\tfrac{\theta^2}{2}$ in the exponent is thus not arbitrary: it is precisely the relative Itô correction $\tfrac{1}{2} f''(x)/f(x) = \tfrac{\theta^2}{2}$ for $f(x) = e^{\theta x}$. The direct verification $\mathbb E[M_t] = e^{-\theta^2 t/2} \cdot e^{\theta^2 t/2} = 1$ confirms the cancellation from the moment side. Because the martingale identity $\mathbb E[M_t \mid \mathcal F_s] = M_s$ involves only two finite times $s \le t$, the finite-interval result for every $T$ makes $(M_t)_{t \ge 0}$ a true martingale on $[0, \infty)$. What fails on the infinite horizon, for $\theta \ne 0$, is boundedness in $L^2$, since $\mathbb E[M_t^2] = e^{\theta^2 t} \to \infty$, and also uniform integrability: $M_t \to 0$ almost surely as $t \to \infty$ while $\mathbb E[M_t] = 1$ for every $t$. This is why the square-integrability statement is made only on finite intervals $[0, T]$.
[/example]
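The two moment identities from the example, $\mathbb E[M_t] = 1$ and $\mathbb E[M_t^2] = e^{\theta^2 t}$, can be checked with a short Monte Carlo sketch. It samples $W_t \sim \mathcal N(0, t)$ directly, so no path discretization is involved. The function name and the default parameters are illustrative choices.

```python
import math
import random


def exponential_martingale_moments(theta=0.5, t=1.0, n_paths=20000, seed=2):
    """Monte Carlo estimates of E[M_t] and E[M_t^2] for M_t = exp(theta*W_t - theta^2*t/2)."""
    rng = random.Random(seed)
    sd = math.sqrt(t)
    first = 0.0
    second = 0.0
    for _ in range(n_paths):
        w_t = rng.gauss(0.0, sd)                            # W_t ~ N(0, t)
        m_t = math.exp(theta * w_t - 0.5 * theta ** 2 * t)  # value of M_t on this sample
        first += m_t
        second += m_t * m_t
    return first / n_paths, second / n_paths


if __name__ == "__main__":
    theta, t = 0.5, 1.0
    m1, m2 = exponential_martingale_moments(theta, t)
    print("estimate of E[M_t]:  ", m1, " (exact value 1)")
    print("estimate of E[M_t^2]:", m2, " (exact value", math.exp(theta ** 2 * t), ")")
```

For large $\theta^2 t$ the distribution of $M_t$ is heavily skewed, and far more samples would be needed for the sample mean to settle near $1$. The defaults keep $\theta^2 t$ small for that reason.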
Exponential martingales are the engine behind Gaussian tail estimates, Girsanov's theorem, and change of measure. They show how martingales connect conditional fairness to transformations of probability measures.
## Change of Measure and Likelihood Ratios
When probability measures change, martingales often appear as likelihood ratios. This is the probabilistic version of tracking how densities evolve as more information is revealed.
The finite-horizon object is the [Radon-Nikodym derivative](/page/Radon-Nikodym%20Theorem) restricted to the information available at time $t$.
[definition: Density Process]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space with filtration $(\mathcal F_t)_{0 \le t \le T}$, and let $\mathbb Q$ be a probability measure on $(\Omega, \mathcal F)$ such that $\mathbb Q \ll \mathbb P$. The density process of $\mathbb Q$ with respect to $\mathbb P$ is the process $(Z_t)_{0 \le t \le T}$ defined by
\begin{align*}
Z_t &= \mathbb E\left[\frac{d\mathbb Q}{d\mathbb P} \,\middle|\, \mathcal F_t\right].
\end{align*}
[/definition]
Changing measure should not be treated as a one-time algebraic substitution, because the available information grows with time. The task is to show that the likelihood ratio seen through each $\mathcal F_t$ evolves fairly under the original measure. That fairness is exactly the martingale property of the density process.
[quotetheorem:3547]
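A discrete-time analogue makes the statement concrete. The sketch below assumes the quoted theorem says that the density process is a $\mathbb P$-martingale. It takes $n$ coin tosses that are fair under $\mathbb P$ and land heads with probability $q$ under $\mathbb Q$, forms the likelihood ratio $Z_k$ on the first $k$ tosses, and verifies by exhaustive enumeration that $\mathbb E_{\mathbb P}[Z_n] = 1$ and $\mathbb E_{\mathbb P}[Z_{k+1} \mid \mathcal F_k] = Z_k$. The coin-toss model and all names are illustrative.

```python
import itertools
import math


def density(path, q, p=0.5):
    """Likelihood ratio dQ/dP restricted to the tosses in `path` (1 = heads, 0 = tails)."""
    z = 1.0
    for x in path:
        z *= (q / p) if x == 1 else ((1 - q) / (1 - p))
    return z


def check_density_martingale(n=6, q=0.7, p=0.5):
    """Return (E_P[Z_n], largest violation of E_P[Z_{k+1} | F_k] = Z_k over all prefixes)."""

    def prob_p(path):
        # Probability of the toss sequence under P
        return math.prod(p if x == 1 else 1 - p for x in path)

    mean_z = sum(prob_p(path) * density(path, q, p)
                 for path in itertools.product((0, 1), repeat=n))

    worst = 0.0
    for k in range(n):
        for prefix in itertools.product((0, 1), repeat=k):
            # Average Z_{k+1} over the next toss, using P-probabilities
            cond = (p * density(prefix + (1,), q, p)
                    + (1 - p) * density(prefix + (0,), q, p))
            worst = max(worst, abs(cond - density(prefix, q, p)))
    return mean_z, worst


if __name__ == "__main__":
    mean_z, worst = check_density_martingale()
    print("E_P[Z_n] =", mean_z)
    print("largest martingale violation =", worst)
```

The conditional average reduces to $Z_k \cdot \bigl(p \cdot \tfrac{q}{p} + (1-p) \cdot \tfrac{1-q}{1-p}\bigr) = Z_k$, so any reported violation reflects floating-point rounding only.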
Under a change of measure, martingales transform in structured ways. The most famous instance is Girsanov's theorem, where exponential martingales shift the drift of Brownian motion. In the Brownian case, the guiding statement is: if the likelihood ratio is built from a suitable exponential martingale, then a Brownian motion with drift under the old measure becomes a Brownian motion without that drift under the new measure.
This principle is not just a change of variables. It says that a drift can be absorbed into the probability measure, and the absorber is a martingale density. The full theorem uses stochastic exponential and covariation notation, which belongs to a more advanced semimartingale course; for this page, the important point is the role of martingales as likelihood-ratio processes. Much of stochastic calculus can be read as the study of which transformations preserve martingale structure.
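A one-time-marginal check of this principle, offered as a sketch rather than the full theorem. It assumes the measure $\mathbb Q$ on $\mathcal F_T$ is defined by $d\mathbb Q / d\mathbb P = M_T = \exp(\theta W_T - \tfrac{\theta^2}{2} T)$; this choice of density and the symbols $B_t$ and $u$ are illustrative. Since the density process is $Z_t = \mathbb E[M_T \mid \mathcal F_t] = M_t$, for $0 < t \le T$ and $u \in \mathbb R$,
\begin{align*}
\mathbb E_{\mathbb Q}\!\left[e^{u (W_t - \theta t)}\right]
&= \mathbb E_{\mathbb P}\!\left[M_t\, e^{u (W_t - \theta t)}\right]
= e^{-u\theta t - \theta^2 t/2}\, \mathbb E_{\mathbb P}\!\left[e^{(\theta + u) W_t}\right] \\
&= e^{-u\theta t - \theta^2 t/2 + (\theta + u)^2 t/2}
= e^{u^2 t/2}.
\end{align*}
This is the moment generating function of $\mathcal N(0, t)$. The process $B_t = W_t - \theta t$, which has drift $-\theta$ under $\mathbb P$, therefore has centered Gaussian marginals with variance $t$ under $\mathbb Q$. The full theorem asserts more, namely that $(B_t)_{0 \le t \le T}$ is a Brownian motion under $\mathbb Q$.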
## References
David Williams, *Probability with Martingales* (1991).
Daniel Revuz and Marc Yor, *Continuous Martingales and Brownian Motion* (1999).
Ioannis Karatzas and Steven Shreve, *Brownian Motion and Stochastic Calculus* (1991).
Rick Durrett, *Probability: Theory and Examples* (2019).
David Applebaum, *Lévy Processes and Stochastic Calculus* (2009).