Stochastic calculus extends classical analysis to random processes, providing the mathematical framework for modeling continuous-time phenomena in finance, physics, and biology. Unlike ordinary calculus, which deals with deterministic functions, stochastic calculus handles processes whose evolution is driven by randomness. This course develops the theory and techniques needed to integrate with respect to random paths, differentiate stochastic processes, and solve equations whose solutions are inherently random. The fundamental challenge — and the core innovation — is that random paths lack the regularity of classical curves, requiring entirely new definitions of integral and derivative.
The course begins by establishing the measure-theoretic foundations through the Lebesgue–Stieltjes integral, which generalizes classical integration to integrators of bounded variation and serves as the stepping stone to stochastic integration. Semimartingales are then introduced as the natural class of integrators: processes that decompose into a finite-variation component and a local martingale component. The stochastic integral is constructed next, first for simple integrands and then for general adapted processes, with the central result being Itô's formula, the chain rule for stochastic processes. Finally, stochastic differential equations are treated as integral equations, with existence and uniqueness theorems and techniques for solving important classes of equations that appear throughout applications.
Each chapter builds on the previous: measure theory enables precise definitions of variation and integrability; semimartingales organize processes by their decomposition properties; the stochastic integral operates on this structure to produce a usable calculus; and SDEs are then solved by appealing to all prior machinery.
# Introduction
This course begins where ordinary differential equations leave off: the world is noisy, and the deterministic framework of classical analysis is insufficient to model systems subject to random perturbations. Stochastic calculus provides the rigorous machinery to handle such systems, and this introductory chapter motivates the entire enterprise — explaining why white noise is not a function, how Brownian motion enters naturally as the integral of white noise, and how the Wiener integral gives a first taste of what the Itô integral will achieve in full generality.
[motivation]
**From ODEs to SDEs.** Classical analysis revolves around ordinary differential equations of the form
\begin{align*}
\dot{x}(t) = F(x(t)).
\end{align*}
Many physical systems, however, are subject to random perturbations: a particle in a fluid, stock prices, neural firing rates. A natural way to model such a system is to add a random forcing term,
\begin{align*}
\dot{x}(t) = F(x(t)) + \eta(t),
\end{align*}
where $\eta$ is a random function representing noise. The question is: what properties should $\eta$ have?
If we are modeling physical noise — thermal fluctuations, quantum randomness, measurement error — we expect the noise at widely separated times to be essentially independent. Noise at time $t$ carries no memory of what happened at time $s$ when $|t - s|$ is large. The idealization of this observation is to demand that $\eta(t)$ and $\eta(s)$ are independent for every $t \neq s$. Such a process is called **white noise**.
**Why White Noise Is Not a Function.** The independence requirement places a severe constraint on $\eta$. If $\eta(t)$ and $\eta(s)$ were independent for every $t \neq s$ and $\eta$ were a measurable function, then $\eta$ would have to be almost everywhere constant — a contradiction. More precisely, white noise turns out to exist only as a Schwartz distribution, not as a genuine function. This is the first fundamental obstacle the course must overcome.
**The Integral Formulation and Brownian Motion.** To understand the simplest case, set $F = 0$. The equation reduces to $\dot{x} = \eta$, or in integral form,
\begin{align*}
x(t) = x(0) + \int_0^t \eta(s)\, ds.
\end{align*}
For this integral to make sense, $\eta$ should at least be a signed measure. But white noise is not even that, so the integral cannot be interpreted classically.
We proceed nonetheless by examining what properties such an $x$ would have to satisfy. For any partition $0 = t_0 < t_1 < \cdots < t_n$, the increments
\begin{align*}
x(t_i) - x(t_{i-1}) = \int_{t_{i-1}}^{t_i} \eta(s)\, ds
\end{align*}
should be independent (since the noise values on disjoint time intervals are independent), and their variance should scale as $|t_i - t_{i-1}|$. These are precisely the defining properties of Brownian motion increments. Thus, the "integral of white noise" should be Brownian motion.
**Why Work in Continuous Time?** One might ask: why not simply discretize time and work with finite sums? The answer parallels the relationship between Riemann sums and the Lebesgue integral. The Lebesgue integral requires significant foundational work to construct, but once built, it is a vastly more powerful tool: integrating $1/x^3$ requires no special tricks, whereas summing $\sum_{n=1}^\infty 1/n^3$ in closed form is a much harder problem. Similarly, stochastic calculus in continuous time, once constructed, yields explicit computations and structural results — Itô's formula, the Lévy characterization, Girsanov's theorem — that have no easy discrete analogues.
Furthermore, many important continuous-time processes are naturally described as solutions to stochastic differential equations, just as trigonometric functions and Bessel functions are characterized as solutions to ordinary differential equations. The SDE viewpoint gives both a computational handle and structural insight.
**The Two Integrals: Itô and Stratonovich.** There are two principal approaches to defining stochastic integrals: the **Itô integral** and the **Stratonovich integral**. This course focuses on the Itô integral. One key advantage is that the Itô integral of an adapted process with respect to a martingale is again a martingale (under suitable integrability conditions, and a local martingale in general), a property that makes it the natural tool for probabilistic analysis. The Stratonovich integral, while more natural from the perspective of differential geometry because it obeys the classical chain rule, is less convenient for the probabilistic techniques that dominate this course.
[/motivation]
## Gaussian Spaces and Isometric Embeddings
The first obstacle in building a stochastic integral is that we need to land in the right space. An integral against Brownian motion should produce a random variable, and that random variable should carry Gaussian statistics — but how can we guarantee this algebraically, without ad hoc calculations? The answer is to build the integral as an isometric embedding into a Gaussian subspace of $L^2(\Omega)$. We begin by making this target space precise.
[definition:GaussianSpace]
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. A subspace $S \subseteq L^2(\Omega, \mathcal{F}, \mathbb{P})$ is called a **Gaussian space** if $S$ is a closed linear subspace and every $X \in S$ is a centered Gaussian random variable (i.e., $X \sim N(0, \sigma^2)$ for some $\sigma^2 \geq 0$).
[/definition]
The point of the definition is that Gaussian spaces are stable under $L^2$ limits: if $(X_n)$ is a sequence of centered Gaussians in $S$ converging in $L^2$ to $X$, then $X$ is also centered Gaussian. Indeed, $\sigma_n^2 = \mathbb{E}[X_n^2] \to \mathbb{E}[X^2] =: \sigma^2$, so the characteristic functions $\mathbb{E}[e^{iuX_n}] = e^{-u^2\sigma_n^2/2}$ converge pointwise to $e^{-u^2\sigma^2/2}$; since $L^2$ convergence implies convergence in distribution, this limit is the characteristic function of $X$, and hence $X \sim N(0, \sigma^2)$. Closed linear subspaces of $L^2$ that consist entirely of centered Gaussians are therefore the natural arena in which to build stochastic integrals via Hilbert space methods.
The crucial existence result shows that every separable Hilbert space can be "realized" as a Gaussian space, with the Hilbert space inner product corresponding exactly to the covariance structure of the associated random variables.
[quotetheorem:2067]
[citeproof:2067]
This theorem is the foundation on which everything else rests, so it is worth pausing to understand exactly what it says — and what it does not say.
First, why separability? The proof constructs the isometry by mapping each element of a countable orthonormal basis to an independent standard Gaussian, then extending by linearity and $L^2$ closure. If $H$ were not separable, no countable basis would exist and this construction would break down. Non-separable Hilbert spaces do admit Gaussian isometries (by a transfinite argument), but the proof is substantially harder and we never encounter non-separable spaces in this course.
Second, uniqueness: the isometry $I$ is not unique. Different choices of orthonormal basis yield different assignments $e_i \mapsto X_i$, and different probability spaces are possible as well. What is unique — up to isomorphism of Gaussian spaces — is the covariance structure: any two Gaussian isometries $I, I': H \to L^2(\Omega)$ that agree on inner products are related by a unitary transformation on the target Gaussian space. For our purposes, we fix one such isometry once and for all.
Third, why an isometry rather than merely a bounded linear map? The isometry condition $\mathbb{E}[I(f)I(g)] = (f,g)_H$ is exactly the Itô isometry in embryonic form. It ensures that $L^2$ convergence of integrands implies $L^2$ convergence of integrals — the property that allows us to extend from step functions to all of $L^2$ by density. A bounded map that is not an isometry would distort inner products and destroy this extension.
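The construction described above can be checked numerically in finite dimensions. The following is a minimal Python sketch, not part of the course's argument: it assumes $H = \mathbb{R}^3$ with its standard orthonormal basis, sets $I(f) = \sum_i f_i X_i$ for i.i.d. standard Gaussians $X_i$, and estimates $\mathbb{E}[I(f)I(g)]$ by Monte Carlo. The function names, sample size, and seed are illustrative choices.

```python
import random

def gaussian_isometry_estimate(f, g, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[I(f) I(g)] for I(f) = sum_i f_i X_i.

    f and g are coordinate vectors with respect to a (hypothetical)
    orthonormal basis e_1, ..., e_d; X_1, ..., X_d are i.i.d. N(0, 1),
    one Gaussian per basis vector.
    """
    rng = random.Random(seed)
    d = len(f)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]   # one draw of (X_1, ..., X_d)
        i_f = sum(fi * xi for fi, xi in zip(f, x))    # I(f)
        i_g = sum(gi * xi for gi, xi in zip(g, x))    # I(g)
        total += i_f * i_g
    return total / n_samples

def inner(f, g):
    """Inner product (f, g)_H in coordinates."""
    return sum(fi * gi for fi, gi in zip(f, g))

f = [1.0, 2.0, 0.0]
g = [0.0, 1.0, 1.0]
print(gaussian_isometry_estimate(f, g), inner(f, g))
```

Up to Monte Carlo error, the estimate should be close to $(f,g)_H = 2$; taking $g = f$ gives an estimate of $\|f\|_H^2 = 5$.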
Taking $H = L^2(\mathbb{R}_+)$ — the $L^2$ space of square-integrable functions on $[0, \infty)$ — we obtain an isometry that sends each $L^2$ function to a centered Gaussian random variable while preserving the inner product exactly. This isometry is precisely what we will call Gaussian white noise.
## Gaussian White Noise
Given that Brownian motion is supposed to be the "integral of white noise", one naturally asks: what is white noise itself? The difficulty is that white noise cannot be a function — as the motivation block explained, any measurable function with independent values at every pair of distinct times must be essentially constant. What white noise can be is a linear functional on $L^2$, and the Gaussian space framework tells us exactly what kind.
[definition:GaussianWhiteNoise]
A **Gaussian white noise** on $\mathbb{R}_+$ is an isometry
\begin{align*}
WN: L^2(\mathbb{R}_+) \to S
\end{align*}
from $L^2(\mathbb{R}_+)$ into some Gaussian space $S \subseteq L^2(\Omega, \mathcal{F}, \mathbb{P})$. For a Borel set $A \subseteq \mathbb{R}_+$, we write $WN(A) := WN(\mathbb{1}_A)$, interpreting $WN$ as a set function via the indicator function $\mathbb{1}_A \in L^2(\mathbb{R}_+)$ (provided $|A| < \infty$).
[/definition]
The existence of Gaussian white noise follows immediately from the theorem above: take $H = L^2(\mathbb{R}_+)$, which is separable (with, e.g., the Haar functions as a basis), and apply the existence theorem. The resulting isometry $I: L^2(\mathbb{R}_+) \to S$ is Gaussian white noise.
What makes white noise look like a random measure — even though it is not — is the following collection of properties:
[quotetheorem:2068]
[citeproof:2068]
These properties make $WN$ look exactly like a random signed measure: it assigns to each finite-measure set a Gaussian random variable, it is countably additive in $L^2$ and almost surely, and disjoint sets give independent increments. But $WN$ is not a random measure in the usual sense. The almost sure convergence in property (3) holds on a set of full measure, but that set depends on the particular partition $(A_i)$. For $WN$ to be a genuine measure, we would need a single set of full probability on which $WN$ is countably additive for all partitions simultaneously — and this fails. This is a decisive distinction between white noise and a random measure.
## From White Noise to Brownian Motion
Despite the failure of white noise to be a measure, we can still extract a stochastic process from it. Define
\begin{align*}
B_t := WN([0,t])
\end{align*}
for $t \geq 0$. This is well-defined since $[0,t]$ has finite Lebesgue measure.
[quotetheorem:2069]
[citeproof:2069]
The process $(B_t)$ constructed above has the correct finite-dimensional distributions for Brownian motion, but the sample paths $t \mapsto B_t(\omega)$ are not yet known to be continuous. By choosing a good orthonormal basis of $L^2(\mathbb{R}_+)$ (one adapted to the structure of the Haar wavelets, for instance), one can arrange that the sample paths are continuous almost surely. The systematic construction of Brownian motion with continuous paths, along with the verification of all required properties, belongs to the prerequisite material and is not repeated in the numbered chapters, which begin with the Lebesgue–Stieltjes integral.
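The role of the orthonormal basis can be illustrated with a small computation. The sketch below is an assumption-laden illustration rather than the course's construction: it works on $[0,1]$ instead of $\mathbb{R}_+$, and it swaps the Haar system for the cosine basis $e_0 = 1$, $e_k(s) = \sqrt{2}\cos(k\pi s)$, for which the coefficients $\langle \mathbb{1}_{[0,t]}, e_k \rangle$ have a closed form. It checks only that the truncated series $\sum_k X_k \langle \mathbb{1}_{[0,t]}, e_k\rangle$ has covariance close to $\min(s,t)$; it says nothing about path continuity.

```python
import math

def indicator_coefficient(k, t):
    """Inner product <1_[0,t], e_k> in L^2[0,1] for the cosine basis
    e_0 = 1, e_k(s) = sqrt(2) cos(k pi s), k >= 1."""
    if k == 0:
        return t
    return math.sqrt(2.0) * math.sin(k * math.pi * t) / (k * math.pi)

def truncated_covariance(s, t, n_terms=2000):
    """Covariance of the truncated series sum_k X_k <1_[0,.], e_k> at times
    s and t, with X_k i.i.d. N(0, 1): the sum of products of coefficients."""
    return sum(indicator_coefficient(k, s) * indicator_coefficient(k, t)
               for k in range(n_terms + 1))

print(truncated_covariance(0.3, 0.7))  # close to min(0.3, 0.7)
print(truncated_covariance(0.5, 0.5))  # close to 0.5
```

By Parseval's identity the full series gives exactly $\langle \mathbb{1}_{[0,s]}, \mathbb{1}_{[0,t]} \rangle = \min(s,t)$; the truncation error here is at most $\sum_{k > 2000} 2/(k\pi)^2$, roughly $10^{-4}$.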
## The Wiener Integral: A First Stochastic Integral
We now have white noise and Brownian motion, but no stochastic integral. The most naive approach — integrate $f$ against the Brownian path $t \mapsto B_t(\omega)$ using classical analysis — fails immediately. Brownian paths are nowhere differentiable and have infinite variation; classical Riemann–Stieltjes theory requires the integrator to have bounded variation. We need a different approach entirely. The **Wiener integral** provides the resolution for deterministic integrands, leveraging the $L^2$ isometry of white noise rather than pathwise estimates.
For a step function $f \in L^2(\mathbb{R}_+)$ of the form
\begin{align*}
f = \sum_{i=1}^n f_i \mathbb{1}_{[s_i, t_i]}
\end{align*}
with $s_i < t_i$, the natural definition of the stochastic integral is
\begin{align*}
\int_0^\infty f(s)\, dB_s := WN(f) = \sum_{i=1}^n f_i\, WN([s_i, t_i]) = \sum_{i=1}^n f_i (B_{t_i} - B_{s_i}).
\end{align*}
To make the obstruction precise, recall why $\int f\, dB$ cannot be defined as a classical Riemann–Stieltjes integral, path by path: Brownian paths have **infinite variation** on every compact interval. More precisely, for almost every $\omega \in \Omega$, the function $t \mapsto B_t(\omega)$ satisfies
\begin{align*}
\sup_{\text{partitions}} \sum_i |B_{t_i}(\omega) - B_{t_{i-1}}(\omega)| = +\infty
\end{align*}
over every interval $[0,T]$, where the supremum is over all partitions $0 = t_0 < t_1 < \cdots < t_n = T$. The Riemann–Stieltjes integral $\int f\, dg$ is well-defined when $g$ has bounded variation; Brownian motion, whose paths have infinite total variation on every interval, fails this condition catastrophically. The classical theory cannot be applied here, and no clever approximation scheme rescues it: the pathwise Riemann–Stieltjes integral against $B_t$ simply does not exist for generic $f \in L^2$.
The Wiener integral sidesteps the pathwise problem entirely. Rather than integrating path by path against $B_t(\omega)$, it integrates in $L^2(\Omega)$ using the isometric structure of white noise. This is consistent with the Riemann–Stieltjes heuristic — partition time, multiply integrand by increments — but the limit is taken in $L^2(\Omega)$, not pointwise.
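The growth of the variation sums can be observed on a simulated path. The following Python sketch is illustrative only: it assumes a discretised Brownian path on $[0,1]$ built from independent Gaussian increments (the grid size $2^{14}$ and the seed are arbitrary), and it evaluates $\sum_i |B_{t_i} - B_{t_{i-1}}|$ and $\sum_i (B_{t_i} - B_{t_{i-1}})^2$ along nested dyadic sub-partitions of that one path.

```python
import math
import random

def brownian_path(n_steps, horizon=1.0, seed=1):
    """Simulated Brownian values on a uniform grid with n_steps intervals."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

def variation_sums(path, stride):
    """Sums of |increments| and of squared increments along every
    stride-th grid point of the path."""
    pts = path[::stride]
    incs = [b - a for a, b in zip(pts, pts[1:])]
    return sum(abs(d) for d in incs), sum(d * d for d in incs)

path = brownian_path(2 ** 14)
for level in (4, 8, 11, 14):
    stride = 2 ** (14 - level)
    first, second = variation_sums(path, stride)
    print(f"2^{level} intervals: sum|dB| = {first:.2f}, sum dB^2 = {second:.3f}")
```

Along refinements the first sum keeps growing, roughly like the square root of the number of intervals, while the sum of squared increments stabilises near $T = 1$. The latter is the quadratic variation developed in Chapter 2.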
[example:WienerIntegralIndicator]
Take $f = \mathbb{1}_{[a,b]}$ for $0 \leq a < b$. Then
\begin{align*}
\int_0^\infty \mathbb{1}_{[a,b]}(s)\, dB_s = WN([a,b]) = B_b - B_a.
\end{align*}
This is a centered Gaussian with variance $b - a$. More generally, for $f = \sum_{i=1}^n f_i \mathbb{1}_{[s_i,t_i]}$ whose intervals overlap at most at endpoints, the integral $\int f\, dB$ is a centered Gaussian with variance
\begin{align*}
\mathbb{E}\left[\left(\int_0^\infty f(s)\, dB_s\right)^2\right] = \sum_{i=1}^n f_i^2 (t_i - s_i) = \int_0^\infty f(s)^2\, ds = \|f\|_{L^2}^2,
\end{align*}
where we used the independence of $B_{t_i} - B_{s_i}$ and $B_{t_j} - B_{s_j}$ for $i \neq j$ (which holds because the intervals do not overlap), together with $\mathbb{E}[(B_{t_i}-B_{s_i})^2] = t_i - s_i$. This identity $\mathbb{E}\left[\left(\int f\, dB\right)^2\right] = \|f\|_{L^2}^2$ is the **Itô isometry** in its simplest form.
[/example]
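The variance computation in this example can be reproduced exactly, without simulation. The sketch below is a minimal Python illustration under one stated assumption: it uses the standard Brownian covariance $\mathrm{Cov}(B_t - B_s, B_v - B_u) = |[s,t] \cap [u,v]|$ and expands the variance of $\sum_i f_i (B_{t_i} - B_{s_i})$ bilinearly. The helper names are hypothetical.

```python
def overlap(interval_a, interval_b):
    """Lebesgue measure of the intersection of two intervals [s, t]."""
    (s1, t1), (s2, t2) = interval_a, interval_b
    return max(0.0, min(t1, t2) - max(s1, s2))

def wiener_variance(coeffs, intervals):
    """Variance of sum_i f_i (B_{t_i} - B_{s_i}), expanded bilinearly using
    Cov(B_t - B_s, B_v - B_u) = length of the overlap of [s, t] and [u, v]."""
    return sum(fi * fj * overlap(ii, ij)
               for fi, ii in zip(coeffs, intervals)
               for fj, ij in zip(coeffs, intervals))

def sum_of_squares_formula(coeffs, intervals):
    """sum_i f_i^2 (t_i - s_i); equals ||f||^2 only for non-overlapping intervals."""
    return sum(fi * fi * (t - s) for fi, (s, t) in zip(coeffs, intervals))

coeffs = [2.0, -1.0, 0.5]
intervals = [(0.0, 0.5), (0.5, 2.0), (3.0, 4.0)]
print(wiener_variance(coeffs, intervals))         # 3.75
print(sum_of_squares_formula(coeffs, intervals))  # 3.75

# Overlapping intervals: the cross terms no longer vanish.
print(wiener_variance([1.0, 1.0], [(0.0, 1.0), (0.5, 1.5)]))         # 3.0
print(sum_of_squares_formula([1.0, 1.0], [(0.0, 1.0), (0.5, 1.5)]))  # 2.0
```

The second pair of outputs shows why the formula $\sum_i f_i^2 (t_i - s_i)$ needs non-overlapping intervals: with overlap, the true variance is still $\|f\|_{L^2}^2 = 3$, but it is no longer given by the sum of squares.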
Since step functions are dense in $L^2(\mathbb{R}_+)$, the Itô isometry allows us to extend the Wiener integral by continuity to all of $L^2(\mathbb{R}_+)$: for a general $f \in L^2(\mathbb{R}_+)$, approximate by step functions $f_n \to f$ in $L^2$, define $\int f\, dB = \lim_{n \to \infty} \int f_n\, dB$ (with $L^2(\mathbb{P})$ convergence guaranteed by the isometry), and verify that the limit is independent of the choice of approximation. The result is a linear isometry
\begin{align*}
f \mapsto \int_0^\infty f(s)\, dB_s, \quad L^2(\mathbb{R}_+) \to L^2(\Omega, \mathcal{F}, \mathbb{P}),
\end{align*}
which is precisely $WN$ itself. The Wiener integral is therefore nothing more than $WN$ written in integral notation.
[example:WienerIntegralLinear]
We compute $\int_0^t s\, dB_s$ — the Wiener integral of the linear function $f(s) = s\, \mathbb{1}_{[0,t]}(s)$. By the Itô isometry, the variance is
\begin{align*}
\mathbb{E}\left[\left(\int_0^t s\, dB_s\right)^2\right] = \int_0^t s^2\, ds = \frac{t^3}{3},
\end{align*}
so $\int_0^t s\, dB_s \sim N(0, t^3/3)$. To identify the integral more explicitly, we use a discrete approximation. Take the partition $s_i = it/n$ for $i = 0, \ldots, n$, so that
\begin{align*}
\sum_{i=1}^n s_{i-1}(B_{s_i} - B_{s_{i-1}}) \xrightarrow{L^2} \int_0^t s\, dB_s.
\end{align*}
A summation-by-parts (Abel summation) on $\sum_{i=1}^n s_{i-1}(B_{s_i} - B_{s_{i-1}})$ gives
\begin{align*}
\sum_{i=1}^n s_{i-1}(B_{s_i} - B_{s_{i-1}}) = s_n B_{s_n} - \sum_{i=1}^n B_{s_i}(s_i - s_{i-1}) = t B_t - \frac{t}{n}\sum_{i=1}^n B_{s_i},
\end{align*}
and passing to the limit (the Riemann sum $\frac{t}{n}\sum B_{s_i} \to \int_0^t B_s\, ds$ in $L^2$) yields the identity $\int_0^t s\, dB_s = tB_t - \int_0^t B_s\, ds$. One checks directly that the right-hand side has variance $t^3 - 2t \cdot \frac{t^2}{2} + \frac{t^3}{3} = \frac{t^3}{3}$, consistent with the isometry. This integration-by-parts identity is a special case of the stochastic integration by parts formula, which is a consequence of Itô's formula.
[/example]
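Both steps of this example can be checked numerically. The Python sketch below is a hedged illustration on one simulated path (the seed and grid size are arbitrary). It compares the two sides of the summation-by-parts identity in the form $\sum_{i} s_{i-1}(B_{s_i} - B_{s_{i-1}}) = s_n B_{s_n} - \sum_{i} B_{s_i}(s_i - s_{i-1})$, which holds for any sequence of values with $s_0 = 0$, and it evaluates the variance $\sum_i s_{i-1}^2 (s_i - s_{i-1})$ of the left-endpoint sum.

```python
import math
import random

def check_summation_by_parts(n, t=1.0, seed=2):
    """Return both sides of the discrete summation-by-parts identity
    on one simulated Brownian path sampled at s_i = i t / n."""
    rng = random.Random(seed)
    s = [i * t / n for i in range(n + 1)]
    b = [0.0]
    for _ in range(n):
        b.append(b[-1] + rng.gauss(0.0, math.sqrt(t / n)))
    lhs = sum(s[i - 1] * (b[i] - b[i - 1]) for i in range(1, n + 1))
    rhs = s[n] * b[n] - sum(b[i] * (s[i] - s[i - 1]) for i in range(1, n + 1))
    return lhs, rhs

def riemann_variance(n, t=1.0):
    """Variance of the left-endpoint sum: sum_i s_{i-1}^2 (s_i - s_{i-1})."""
    return sum(((i - 1) * t / n) ** 2 * (t / n) for i in range(1, n + 1))

lhs, rhs = check_summation_by_parts(1000)
print(lhs, rhs)                 # agree up to floating-point rounding
print(riemann_variance(1000))   # approaches t^3 / 3 as n grows
```

The first check is purely algebraic, so it holds path by path. The second shows the discrete variances converging to $t^3/3$, in line with the Itô isometry.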
[example:WienerIntegralAdaptedFailure]
The Wiener integral handles only deterministic integrands. To see concretely why this restriction matters, suppose we attempt to integrate $f_s(\omega) = B_s(\omega)$ — the Brownian motion path itself — against $dB_s$. The function $s \mapsto B_s(\omega)$ is random: its value depends on $\omega$. The Wiener integral $WN(f)$ is defined only when $f \in L^2(\mathbb{R}_+)$ is a fixed, deterministic function; applying $WN$ to a random $f$ is not meaningful because $WN$ is a linear map on the deterministic space $L^2(\mathbb{R}_+)$, not on the space of random functions.
This is exactly the gap that blocks us from writing down SDEs. In $dX_t = \sigma(X_t)\, dB_t$, the coefficient $\sigma(X_t)$ depends on the current state $X_t$, which is random. No Wiener integral handles this. The Itô integral, constructed in Chapter 3, resolves this by replacing the deterministic $L^2(\mathbb{R}_+)$ domain with the space of square-integrable adapted processes: processes $f_s(\omega)$ that depend on $\omega$ but only through the Brownian path up to time $s$. Adaptedness is the key constraint — it ensures the integrand is "non-anticipating", and it is precisely this condition that makes the Itô integral a martingale.
[/example]
The Wiener integral handles integrands $f \in L^2(\mathbb{R}_+)$ that are **deterministic** — they depend on time but not on the randomness $\omega \in \Omega$. The Itô integral, constructed in Chapter 3, handles **random integrands** $f_t(\omega)$ that are adapted to the filtration generated by $(B_t)$. This is the setting required for stochastic differential equations, where the drift and diffusion coefficients depend on the current state of the system, which is itself random. The extension from deterministic to adapted integrands requires the full machinery of martingale theory and quadratic variation, which is developed in Chapter 2.
## Course Outline
The rest of the course is organized as follows. Chapter 1 develops the Lebesgue–Stieltjes integral, the deterministic integration theory underpinning the finite-variation component of semimartingales. Chapter 2 introduces the central objects: finite-variation processes, local martingales, square-integrable martingales, quadratic variation, covariation, and semimartingales. Chapter 3 constructs the Itô stochastic integral in full generality — first for simple processes (analogous to the step function construction above), then via the Itô isometry for square-integrable martingale integrators, then extended to local martingales and semimartingales. The central result is **Itô's formula**, the stochastic analogue of the chain rule, which corrects the classical formula by a quadratic variation term. Chapter 4 applies the theory to stochastic differential equations: existence and uniqueness of strong and weak solutions, the Yamada–Watanabe theorem, the strong Markov property, and the connection between SDEs and second-order partial differential equations via the Feynman–Kac formula and the Dirichlet problem.
Throughout, the two central tools are **martingale theory** (the Lévy characterization, Girsanov's theorem, the martingale representation theorem) and **Itô's formula** (which connects stochastic calculus to classical analysis and PDE). The goal is to develop facility with these tools and to see them applied in a range of settings: from the conformal invariance of planar Brownian motion to the probabilistic representation of solutions to elliptic PDEs.
**Key prerequisite.** This course assumes fluency in measure-theoretic probability at the level of Part III Advanced Probability, including $L^2$ martingale theory, optional stopping, the martingale convergence theorem, and the basic properties of Brownian motion. These are used freely from Chapter 1 onward.
With measure-theoretic probability and martingale theory in hand, we now turn to integration theory itself. The Lebesgue–Stieltjes integral extends classical Riemann–Stieltjes integration to functions of bounded variation, creating the pathwise integration framework that will ground our treatment of finite-variation processes.
# 1. The Lebesgue–Stieltjes Integral
This chapter develops the deterministic integration theory that underlies stochastic calculus. The central problem is to give rigorous meaning to expressions of the form $\int_0^t h(s)\, da(s)$ when $a$ is not differentiable — a situation that arises throughout the theory of stochastic processes, where path regularity is far weaker than classical calculus demands. The key insight is that integration against $a$ should be understood as integration against an associated signed measure, and the correct class of integrators is the càdlàg functions of bounded variation.
[motivation]
**Why classical calculus is not enough.** In classical calculus, if $h, a : [0,1] \to \mathbb{R}$ are $C^1$, one defines
\begin{align*}
\int_0^1 h(x)\, da(x) = \int_0^1 h(x)\, a'(x)\, dx.
\end{align*}
This reduces integration against $a$ to ordinary Lebesgue integration against $a'$, which is perfectly rigorous. The problem is that in stochastic calculus, the processes we encounter — Brownian motion paths, sample paths of martingales — are nowhere differentiable. Writing $da = a'(x)\, dx$ is simply not an option.
A more fundamental approach is needed: rather than differentiating $a$ to produce a density, we ask whether $a$ itself encodes a measure. If $a$ were a cumulative distribution function, it would determine a probability measure on $[0,1]$ via the assignment $\mu((s,t]) = a(t) - a(s)$. The Lebesgue-Stieltjes construction generalises this idea to functions that can decrease as well as increase, by working with signed measures.
There is one immediate obstacle: if $a$ can decrease, then the "measure" $\mu((s,t]) = a(t) - a(s)$ can be negative, so it is not a measure in the classical sense. This forces us to work with signed measures, which are differences of two positive measures.
[/motivation]
## Signed Measures and the Hahn Decomposition
To integrate against a function that can both increase and decrease, we need a notion of signed measure. The key decomposition theorem asserts that any signed measure is a difference of two positive measures supported on disjoint sets.
[definition: Signed Measure]
A **signed measure** on $[0,T]$ (equipped with the Borel $\sigma$-algebra $\mathcal{B}([0,T])$) is a difference $\mu = \mu^+ - \mu^-$ of two finite positive Borel measures on $[0,T]$ with disjoint supports. The decomposition $\mu = \mu^+ - \mu^-$ is called the **Hahn decomposition** of $\mu$.
[/definition]
The disjointness requirement in the definition is a normalisation convention: any difference $\mu_1 - \mu_2$ of finite positive measures can be expressed in Hahn form. The following theorem makes this precise.
[quotetheorem:2070]
[citeproof:2070]
The Hahn form $\mu = \mu^+ - \mu^-$ is more than a formal device: because $\mu^+$ and $\mu^-$ have disjoint supports, the decomposition is unique, and each part can be studied independently using the classical theory of positive measures. Without disjointness, one could redistribute mass freely between $\mu^+$ and $\mu^-$ (for instance, adding a common positive measure to both) without changing $\mu$, so the decomposition would carry no information. This uniqueness is what allows us to define the total variation $|\mu| = \mu^+ + \mu^-$ unambiguously in the next definition.
Once we have a signed measure $\mu$, integration of a Borel-measurable function $h$ against $\mu$ is defined by linearity: $\int h\, d\mu = \int h\, d\mu^+ - \int h\, d\mu^-$, with the usual integrability conditions. To control the size of integrals, we need the total variation.
[definition: Total Variation of a Signed Measure]
The **total variation** of a signed measure $\mu = \mu^+ - \mu^-$ is the positive measure $|\mu| = \mu^+ + \mu^-$.
[/definition]
The total variation measure $|\mu|$ captures the full "mass" of $\mu$, regardless of sign. A function $h$ is integrable against $\mu$ precisely when it is integrable against $|\mu|$, i.e., when $h \in L^1([0,T], |\mu|)$.
## Bounded Variation and the Lebesgue-Stieltjes Measure
We now determine which functions $a : [0,T] \to \mathbb{R}$ give rise to a signed measure. The guiding intuition is that the Riemann sums
\begin{align*}
\sum_{i=1}^{n} h(t_{i-1})\bigl(a(t_i) - a(t_{i-1})\bigr)
\end{align*}
should converge as the mesh of the partition tends to zero. For this, the increments $a(t_i) - a(t_{i-1})$ must not grow uncontrollably — even for the simple integrand $h = 1$, the sum $\sum |a(t_i) - a(t_{i-1})|$ must remain bounded. This is the condition of bounded variation.
[definition: Total Variation of a Function]
Let $a : [0,T] \to \mathbb{R}$. The **total variation** of $a$ is
\begin{align*}
V_a(T) = |a(0)| + \sup\left\{ \sum_{i=1}^n |a(t_i) - a(t_{i-1})| : 0 = t_0 < t_1 < \cdots < t_n = T,\, n \geq 1 \right\}.
\end{align*}
We say $a$ has **bounded variation** on $[0,T]$, written $a \in BV[0,T]$, if $V_a(T) < \infty$.
[/definition]
The term $|a(0)|$ is included so that the total variation counts the "jump from zero" at the origin — in applications, one often thinks of $a$ as defined on all of $\mathbb{R}$ with $a(t) = 0$ for $t < 0$, and the convention ensures consistent accounting.
[remark: Bounded Variation and Monotone Decomposition]
A function has bounded variation if and only if it can be written as the difference of two bounded non-decreasing functions. Indeed, writing $V_a(t)$ for the total variation of $a$ over $[0,t]$ (defined as above with $t$ in place of $T$), the functions $a^+ = \frac{1}{2}(V_a + a)$ and $a^- = \frac{1}{2}(V_a - a)$ are both non-decreasing, since $|a(t) - a(s)| \leq V_a(t) - V_a(s)$ for $s \leq t$, and $a = a^+ - a^-$. This decomposition mirrors the Hahn decomposition for signed measures.
[/remark]
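The decomposition in this remark is easy to carry out on sampled values. The following Python sketch is a discrete illustration only: it treats a list of numbers as the values of a hypothetical function $a$ on a grid, computes the running variation along that grid (which can underestimate the true variation of a function between grid points), and forms $a^\pm = \frac{1}{2}(V_a \pm a)$.

```python
def running_variation(values):
    """V_a at each grid point: |a_0| plus the accumulated |increments|."""
    total = abs(values[0])
    out = [total]
    for prev, cur in zip(values, values[1:]):
        total += abs(cur - prev)
        out.append(total)
    return out

def monotone_parts(values):
    """Return (a_plus, a_minus) with a = a_plus - a_minus, both non-decreasing."""
    var = running_variation(values)
    a_plus = [0.5 * (v + a) for v, a in zip(var, values)]
    a_minus = [0.5 * (v - a) for v, a in zip(var, values)]
    return a_plus, a_minus

def is_non_decreasing(seq):
    return all(x <= y for x, y in zip(seq, seq[1:]))

a = [1.0, 3.0, 2.0, 2.5, -1.0]      # samples of a hypothetical BV function
a_plus, a_minus = monotone_parts(a)
print(a_plus)    # [1.0, 3.0, 3.0, 3.5, 3.5]
print(a_minus)   # [0.0, 0.0, 1.0, 1.0, 4.5]
```

Each increment of $a^+$ is $\frac{1}{2}(|\Delta a| + \Delta a) \geq 0$ and each increment of $a^-$ is $\frac{1}{2}(|\Delta a| - \Delta a) \geq 0$, which is the discrete form of the monotonicity claim.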
Not every bounded variation function determines a signed measure without additional regularity — one needs right-continuity and the existence of left limits to ensure that the "cumulative measure" is well-defined without ambiguity at individual points.
[definition: Càdlàg Function]
A function $a : [0,T] \to \mathbb{R}$ is **càdlàg** (from the French *continu à droite, limites à gauche*) if:
- $a$ is right-continuous: $\lim_{s \downarrow t} a(s) = a(t)$ for all $t \in [0,T)$, and
- $a$ has left-limits: $a(t^-) := \lim_{s \uparrow t} a(s)$ exists in $\mathbb{R}$ for all $t \in (0,T]$.
[/definition]
Right-continuity is the canonical convention in probability theory. It ensures that stopping times and random times interact well with $\sigma$-algebras, and it corresponds to the convention that $\mu((s,t]) = a(t) - a(s)$ (a half-open interval open on the left, closed on the right).
The central structural theorem of this chapter establishes a perfect correspondence between signed measures and càdlàg BV functions.
[quotetheorem:2071]
[citeproof:2071]
[example: A Càdlàg Step Function]
Let $a : [0,1] \to \mathbb{R}$ be defined by
\begin{align*}
a(t) = \begin{cases} 1 & \text{if } t < \tfrac{1}{2}, \\ 0 & \text{if } t \geq \tfrac{1}{2}. \end{cases}
\end{align*}
This function is right-continuous (at $t = \frac{1}{2}$, $a(\frac{1}{2}) = 0 = \lim_{s \downarrow 1/2} a(s)$) and has a left-limit at $\frac{1}{2}$ equal to $1$, so $a$ is càdlàg. Its total variation is $V_a(1) = |a(0)| + |a(\tfrac{1}{2}) - a(\tfrac{1}{2}^-)| = 1 + 1 = 2$ (the function starts at $1$, then jumps down by $1$, and stays there).
The signed measure associated to $a$ is $\mu = \delta_0 - \delta_{1/2}$, where $\delta_x$ denotes the Dirac mass at $x$. Indeed, $a(t) = \mu([0,t]) = \mathbb{1}_{t \geq 0} - \mathbb{1}_{t \geq 1/2}$ matches the formula above. The total variation measure is $|\mu| = \delta_0 + \delta_{1/2}$, and $|\mu|([0,1]) = 2 = V_a(1)$.
[/example]
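For purely atomic measures such as the one in this example, the correspondence between $\mu$ and $a$ can be written out directly. The Python sketch below is a minimal illustration: representing a signed measure as a dictionary `{point: mass}` is a choice made here for convenience, and it covers only measures with finitely many atoms.

```python
def cumulative(atoms, t):
    """a(t) = mu([0, t]) for a purely atomic signed measure {point: mass}."""
    return sum(mass for point, mass in atoms.items() if 0.0 <= point <= t)

def total_variation_mass(atoms):
    """|mu|([0, T]): the sum of |mass| over all atoms."""
    return sum(abs(mass) for mass in atoms.values())

mu = {0.0: 1.0, 0.5: -1.0}          # mu = delta_0 - delta_{1/2}
print([cumulative(mu, t) for t in (0.0, 0.25, 0.5, 1.0)])  # [1.0, 1.0, 0.0, 0.0]
print(total_variation_mass(mu))                             # 2.0
```

The values of `cumulative` reproduce the step function $a$ of the example, including the right-continuous value $a(\frac{1}{2}) = 0$, and the total mass of $|\mu|$ agrees with $V_a(1) = 2$.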
## The Lebesgue-Stieltjes Integral
With the bijection established, the definition of the integral is immediate: integration against $a$ simply means integration against the associated signed measure.
[definition: Lebesgue-Stieltjes Integral]
Let $a : [0,T] \to \mathbb{R}$ be càdlàg with $a \in BV[0,T]$, and let $\mu$ be the associated signed measure. For $h \in L^1([0,T], |\mu|)$ and $0 \leq s \leq t \leq T$, the **Lebesgue-Stieltjes integral** of $h$ against $a$ over $(s,t]$ is
\begin{align*}
\int_s^t h(r)\, da(r) = \int_{(s,t]} h(r)\, \mu(dr).
\end{align*}
The integral against the total variation measure is
\begin{align*}
\int_s^t h(r)\, |da(r)| = \int_{(s,t]} h(r)\, |\mu|(dr).
\end{align*}
We also write $(h \cdot a)(t) = \int_0^t h(r)\, da(r)$ for the running integral process.
[/definition]
The choice of the half-open interval $(s,t]$ is deliberate: with the right-continuity convention, the mass of $a$ at a point $r$ (the jump $\Delta a(r) = a(r) - a(r^-)$) is attributed to the integral $\int_s^t \cdot\, da$ whenever $s < r \leq t$.
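The effect of the half-open convention is easiest to see when $a$ is a pure jump function. The following Python sketch is an illustration under that assumption: the associated measure is stored as a hypothetical dictionary `{point: mass}` of finitely many atoms, and the integral over $(s,t]$ is the sum of $h(r)$ times the mass at $r$ over atoms with $s < r \leq t$.

```python
def ls_integral(h, atoms, s, t):
    """Integral of h against a purely atomic signed measure over (s, t]:
    atoms is {point: mass}; only points r with s < r <= t contribute."""
    return sum(h(r) * mass for r, mass in atoms.items() if s < r <= t)

def identity(r):
    return r

mu = {0.5: 1.0}                      # a = 1_{[1/2, 1]}: a unit jump at 1/2

print(ls_integral(identity, mu, 0.0, 1.0))   # 0.5
print(ls_integral(identity, mu, 0.0, 0.5))   # 0.5, jump at the right endpoint counts
print(ls_integral(identity, mu, 0.5, 1.0))   # 0, jump at the left endpoint is excluded
```

With this convention the integrals over $(0, \frac{1}{2}]$ and $(\frac{1}{2}, 1]$ add up to the integral over $(0,1]$, with the jump counted exactly once.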
[remark: Extension to Infinite Horizon]
To work on $[0,\infty)$, one says that a càdlàg function $a : [0,\infty) \to \mathbb{R}$ has **finite variation** if $a|_{[0,T]} \in BV[0,T]$ for every $T > 0$. The Lebesgue-Stieltjes integral $\int_0^t h(r)\, da(r)$ is then defined for each $t < \infty$ by restricting to $[0,t]$.
[/remark]
The basic continuity estimate for the integral is the following bound, analogous to the triangle inequality for ordinary integrals.
[quotetheorem:2108]
This theorem says that integrating a bounded measurable function against a BV function produces another BV function — the class of finite-variation functions is stable under Lebesgue-Stieltjes integration. In stochastic calculus, this becomes the statement that integration against a finite-variation process remains a finite-variation process.
## Riemann Sum Approximation
Although the Lebesgue-Stieltjes integral is defined via measure theory, it can be computed as a limit of Riemann sums when the integrand is left-continuous. This is the key connection between the abstract definition and the practical approximations used throughout stochastic calculus.
For a sequence of partitions $\pi_m = \{0 = t_0^{(m)} < t_1^{(m)} < \cdots < t_{n_m}^{(m)} = t\}$ of $[0,t]$, write $|\pi_m| = \max_i |t_i^{(m)} - t_{i-1}^{(m)}|$ for the mesh.
[quotetheorem:2072]
[citeproof:2072]
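The left-continuity hypothesis on $h$ is essential. It ensures that evaluating $h$ at the left endpoint $t_{i-1}^{(m)}$ of each interval correctly approximates the integrand over that interval as the mesh shrinks; if $h$ jumped at the same point as $a$, left-endpoint sums would pick up $h(r^-)$ rather than $h(r)$. In stochastic calculus, left-endpoint evaluation is also what keeps Riemann sums non-anticipating, and left-continuous adapted processes are the prototypical previsible integrands, so the hypothesis holds in the settings where Riemann-sum approximations are used.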
[example: Computing a Lebesgue-Stieltjes Integral via Riemann Sums]
Let $a(t) = \mathbb{1}_{[1/2, 1]}(t)$ for $t \in [0,1]$ (a step up at $t = 1/2$) and let $h(t) = t$. We compute $\int_0^1 t\, da(t)$.
The signed measure associated to $a$ is $\mu = \delta_{1/2}$ (a unit mass at $\frac{1}{2}$), since $a(t) = \mu([0,t]) = \mathbb{1}_{t \geq 1/2}$. By definition of the Lebesgue-Stieltjes integral,
\begin{align*}
\int_0^1 t\, da(t) = \int_{(0,1]} t\, \delta_{1/2}(dt) = \frac{1}{2}.
\end{align*}
We can verify this via Riemann sums. Take the uniform partition $t_i^{(m)} = i/m$ for $0 \leq i \leq m$. For each $m$, exactly one half-open subinterval $(t_{i-1}^{(m)}, t_i^{(m)}]$ contains the point $\frac{1}{2}$, and $a$ increases by $1$ across it. The corresponding Riemann sum term is $h(t_{i-1}^{(m)}) \cdot 1 = t_{i-1}^{(m)}$, and since $\frac{1}{2} - \frac{1}{m} \leq t_{i-1}^{(m)} < \frac{1}{2}$, we have $t_{i-1}^{(m)} \to \frac{1}{2}$ as $m \to \infty$. All other terms contribute $0$ (since $a$ takes the same value at both endpoints of those intervals). Thus the Riemann sums converge to $\frac{1}{2}$, confirming the measure-theoretic computation.
[/example]
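The Riemann-sum computation in the example above can be reproduced numerically. The following Python sketch is purely illustrative: the helper name `left_riemann_sum` and the particular values of `m` are choices made here, not part of the course text. It evaluates the left-endpoint sums $\sum_i h(t_{i-1})\,(a(t_i) - a(t_{i-1}))$ on the uniform partition and shows them approaching $\frac{1}{2}$ from below, with error at most $1/m$.

```python
def a(t):
    """Step integrator a(t) = 1_{[1/2, 1]}(t): jumps by 1 at t = 1/2."""
    return 1.0 if t >= 0.5 else 0.0


def h(t):
    """Integrand h(t) = t (continuous, hence left-continuous)."""
    return t


def left_riemann_sum(h, a, m):
    """Left-endpoint Riemann-Stieltjes sum over the uniform partition t_i = i/m of [0, 1]."""
    total = 0.0
    for i in range(1, m + 1):
        t_prev, t_curr = (i - 1) / m, i / m
        # h is evaluated at the LEFT endpoint of (t_prev, t_curr]
        total += h(t_prev) * (a(t_curr) - a(t_prev))
    return total


sums = {m: left_riemann_sum(h, a, m) for m in (2, 3, 10, 101, 1000)}
for m, s in sums.items():
    print(m, s)
```

For even $m$ the point $\frac{1}{2}$ is itself a partition point, and the jump is attributed to the interval whose right endpoint it is, in line with the $(s,t]$ convention.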
## Integration by Parts
A fundamental identity in classical calculus is the integration by parts formula. For the Lebesgue-Stieltjes integral, an analogous formula holds for any two BV functions. Unlike the classical case, however, there is a correction term that accounts for simultaneous jumps.
[quotetheorem:2073]
[citeproof:2073]
The correction term $\sum_{s < r \leq t} \Delta a(r)\, \Delta b(r)$ is identically zero whenever $a$ or $b$ is continuous. In that case, the formula reduces to the familiar
\begin{align*}
\int_s^t a(r)\, db(r) + \int_s^t b(r)\, da(r) = a(t)b(t) - a(s)b(s),
\end{align*}
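since the left limits can then be replaced by the values themselves: if $a$ is continuous then $a(r^-) = a(r)$, and replacing $b(r^-)$ by $b(r)$ alters the integrand only at the countably many jump times of $b$, which carry no $da$-mass (the case of continuous $b$ is symmetric). This is the situation that arises in pathwise integration against continuous finite-variation processes.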
[example: Integration by Parts with a Jump Process]
Let $a(t) = \mathbb{1}_{[1/2, 1]}(t)$ and $b(t) = \mathbb{1}_{[1/2, 1]}(t)$ on $[0,1]$, so both functions have a single jump of size $1$ at $t = \frac{1}{2}$.
The product $a(t)b(t) = \mathbb{1}_{[1/2,1]}(t)$, so $a(1)b(1) - a(0)b(0) = 1 - 0 = 1$.
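The integrals $\int_0^1 a(r^-)\, db(r)$ and $\int_0^1 b(r^-)\, da(r)$: the measure $db$ is a unit point mass at $r = \frac{1}{2}$, so $\int_0^1 a(r^-)\, db(r) = a((1/2)^-) = 0$, because $a$ vanishes on $[0, \frac{1}{2})$. By symmetry, $\int_0^1 b(r^-)\, da(r) = b((1/2)^-) = 0$. Note that $a(r^-) = 1$ for $r > \frac{1}{2}$, but $db$ places no mass there.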
The covariation sum is $\Delta a(1/2)\, \Delta b(1/2) = 1 \cdot 1 = 1$.
The integration by parts formula gives $1 = 0 + 0 + 1$, which holds. The entire contribution to the product rule comes from the simultaneous jump — a phenomenon with no analogue in classical calculus, but which becomes central in the Itô formula for stochastic processes.
[/example]
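For pure-jump step functions every term in the integration by parts identity is a finite sum, so the identity can be checked directly. The sketch below is a minimal illustration in Python, assuming the identity in the form used in the example above, $a(T)b(T) - a(0)b(0) = \int_0^T a(r^-)\,db(r) + \int_0^T b(r^-)\,da(r) + \sum_{0 < r \leq T} \Delta a(r)\,\Delta b(r)$. The representation of a step function as a start value plus a dictionary of jumps, and all function names, are assumptions made for this illustration.

```python
def value(start, jumps, t):
    """Value at time t of the cadlag step function with jumps {time: size}."""
    return start + sum(size for r, size in jumps.items() if r <= t)


def left_limit(start, jumps, t):
    """Left limit at time t: only jumps strictly before t are counted."""
    return start + sum(size for r, size in jumps.items() if r < t)


def integral_left(f_start, f_jumps, g_jumps, T):
    """Integral over (0, T] of f(r-) dg(r) for pure-jump g: a sum over the jumps of g."""
    return sum(left_limit(f_start, f_jumps, r) * dg
               for r, dg in g_jumps.items() if 0 < r <= T)


def check_parts(a0, a_jumps, b0, b_jumps, T):
    """Return (left side, right side) of the integration by parts identity on (0, T]."""
    lhs = value(a0, a_jumps, T) * value(b0, b_jumps, T) - a0 * b0
    correction = sum(da * b_jumps[r] for r, da in a_jumps.items()
                     if r in b_jumps and 0 < r <= T)
    rhs = (integral_left(a0, a_jumps, b_jumps, T)
           + integral_left(b0, b_jumps, a_jumps, T)
           + correction)
    return lhs, rhs


# The example above: both functions jump by 1 at t = 1/2.
print(check_parts(0.0, {0.5: 1.0}, 0.0, {0.5: 1.0}, 1.0))

# A second case with one shared jump time (1/2) and two unshared ones.
print(check_parts(1.0, {0.25: 2.0, 0.5: -1.0}, 3.0, {0.5: 4.0, 0.75: 0.5}, 1.0))
```

In the second case the two integrals contribute $13$ and $3$ and the shared jump contributes $-4$, matching $a(1)b(1) - a(0)b(0) = 15 - 3 = 12$. Dropping the correction term, or using `value` in place of `left_limit`, breaks the balance.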
[explanation: The Role of Left-Limits in the Integration by Parts Formula]
The appearance of $a(r^-)$ rather than $a(r)$ in the integrand of $\int a^-\, db$ is not cosmetic — it is forced by the convention that the Lebesgue-Stieltjes integral over $(s,t]$ attributes the mass $\Delta b(r) = b(r) - b(r^-)$ to the endpoint $r$. If $a$ were evaluated at $r$ itself (its right-continuous value), the product rule would not balance: both $a$ and $b$ are jumping simultaneously at $r$, and the "value of $a$ during the jump of $b$" should be the left-limit, i.e., the value just before the jump.
This convention propagates into Itô's formula for semimartingales, where the deterministic integration by parts formula serves as the template. The correction term for simultaneous jumps becomes the quadratic covariation $[X,Y]$ in the stochastic setting — a key object in Chapter 2.
[/explanation]
## Pathwise Integration Against Finite-Variation Processes
The Lebesgue-Stieltjes theory applies pathwise to stochastic processes. If $A = (A_t)_{t \geq 0}$ is a stochastic process whose sample paths are càdlàg and of finite variation, then for each fixed $\omega \in \Omega$, the path $t \mapsto A_t(\omega)$ belongs to the class of functions studied in this chapter, and the integral $\int_0^t H_s(\omega)\, dA_s(\omega)$ is defined as a Lebesgue-Stieltjes integral for each $\omega$.
[definition: Finite Variation Process]
A stochastic process $A = (A_t)_{t \geq 0}$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ has **finite variation** if for $\mathbb{P}$-almost every $\omega \in \Omega$, the sample path $t \mapsto A_t(\omega)$ is càdlàg and $A_\cdot(\omega)|_{[0,T]} \in BV[0,T]$ for every $T > 0$.
[/definition]
The total variation process of $A$ is $V_A(t) = V_{A_\cdot(\omega)}(t)$, defined pathwise. Finite-variation processes are the simplest class of integrators: integration against them requires no probabilistic structure beyond measurability of the integrand — the Lebesgue-Stieltjes integral is defined $\omega$ by $\omega$.
This is in sharp contrast with Brownian motion, whose sample paths have infinite variation on every compact interval, so the classical Lebesgue-Stieltjes theory breaks down entirely. Constructing integrals against Brownian motion requires the Itô integral, developed in Chapter 3.
[remark: Finite-Variation Processes in the Semimartingale Decomposition]
In the theory of semimartingales (Chapter 2), every semimartingale $X$ admits a decomposition $X = M + A$ where $M$ is a local martingale and $A$ is a finite-variation process. The stochastic integral $\int H\, dX$ splits accordingly into an Itô integral against $M$ (requiring the $L^2$ theory) and a Lebesgue-Stieltjes integral against $A$ (handled by the present chapter). The finite-variation component is the "drift" of $X$.
[/remark]
Finite-variation processes integrate easily via Lebesgue–Stieltjes theory, but Brownian motion — with its infinite variation — requires deeper structure. Chapter 2 develops the full machinery of local martingales, quadratic variation, and covariation needed to decompose and integrate against paths that oscillate wildly.
# 2. Semi-martingales
Chapter 2 is about semimartingales, the class of stochastic processes that will serve as integrators for the stochastic integral developed in Chapter 3. The central challenge is that Brownian motion — the canonical integrator in applications — has paths of infinite variation, so the classical Lebesgue–Stieltjes theory from Chapter 1 cannot directly apply. The resolution is to decompose a general process into two parts, each of which can be handled separately: a finite-variation component and a local martingale component.
[motivation]
**Why semimartingales?** The goal of this course is to construct a stochastic integral $\int_0^t H_s \, dX_s$ for processes $X$ arising in applications, most importantly Brownian motion. The cleanest framework for integration is the Lebesgue–Stieltjes theory of Chapter 1: if $A$ has finite variation, then for each fixed $\omega$ the path $s \mapsto A_s(\omega)$ induces a signed measure on $[0,t]$, and we can integrate any locally bounded measurable function against it.
**The obstruction.** Brownian motion has infinite variation on every interval, almost surely. This is not a quirk of a bad choice of integrator — it is fundamental. Any integrator that models a diffusion process (continuous paths, independent increments, variance scaling linearly in time) will have infinite variation. The Lebesgue–Stieltjes approach fails completely for such processes.
**The decomposition strategy.** The way forward is to identify the largest class of processes $X$ for which a stochastic integral can be defined with good properties. The answer is the class of *semimartingales*: processes that decompose as
\begin{align*}
X_t = X_0 + A_t + M_t,
\end{align*}
where $A$ is a finite-variation process (handled by Chapter 1 methods) and $M$ is a local martingale (requiring new theory). Most of Chapter 2 is devoted to developing the local martingale theory — quadratic variation, covariation, and the $L^2$ theory — before putting the pieces together to define semimartingales in Section 2.6.
**What Section 2.1 covers.** The present section establishes the finite-variation component of this decomposition. We extend the notion of finite variation from Chapter 1 to stochastic processes, define the stochastic integral $H \cdot A$ against a finite-variation process $A$, and identify the correct measurability condition on the integrand $H$ — the *previsibility* condition — that ensures the integrated process remains adapted. The key point to carry away: integration against finite-variation processes is easy precisely because pathwise it reduces to ordinary measure-theoretic integration.
[/motivation]
## Finite Variation Processes
Throughout the chapter, we fix a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})$ satisfying the usual conditions.
We first recall the setting. A process $X : \Omega \times [0, \infty) \to \mathbb{R}$ is a *càdlàg adapted process* if, for all $\omega \in \Omega$, the path $t \mapsto X_t(\omega)$ is right-continuous with left limits, and for each $t \ge 0$, the random variable $X_t = X(\cdot, t)$ is $\mathcal{F}_t$-measurable. We write $X \in \mathcal{G}$ to indicate that a random variable $X$ is measurable with respect to the $\sigma$-algebra $\mathcal{G}$.
The definition of total variation for a function on $[0,t]$ (covered in Chapter 1) extends immediately to the pathwise setting.
[definition: Finite Variation Process]
Let $A$ be a càdlàg adapted process on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})$. We call $A$ a **finite variation process** if for every $\omega \in \Omega$, the path $A(\omega, \cdot) : [0,\infty) \to \mathbb{R}$ has finite variation on every compact interval $[0,t]$.
The **total variation process** $V$ of $A$ is defined pathwise by
\begin{align*}
V_t(\omega) = \int_0^t |dA_s(\omega)|, \quad t \ge 0,
\end{align*}
where the integral is the total variation of $A(\omega, \cdot)$ on $[0,t]$ in the sense of Chapter 1.
[/definition]
The total variation process inherits the key properties of the underlying process:
[quotetheorem:2074]
[citeproof:2074]
Adaptedness of $V_t$ matters because it means the accumulated variation up to time $t$ is observable at time $t$ — an information constraint that ensures any integral built from $V$ remains non-anticipating. This adaptedness property, together with the càdlàg paths of $V$, is what allows the stochastic integral against $A$ (defined in the next subsection) to remain adapted as well.
### Stochastic Integration Against Finite Variation Processes
Since each path of a finite variation process $A$ defines a signed measure $dA(\omega, \cdot)$ on $[0,\infty)$, we can define a stochastic integral pathwise.
[definition: Stochastic Integral Against Finite Variation Process]
Let $A$ be a finite variation process and $H : \Omega \times [0,\infty) \to \mathbb{R}$ a process satisfying
\begin{align*}
\int_0^t |H_s(\omega)| \, |dA_s(\omega)| < \infty \quad \text{for all } (\omega, t) \in \Omega \times [0, \infty).
\end{align*}
The **stochastic integral** $(H \cdot A)$ is the process defined by
\begin{align*}
(H \cdot A)_t(\omega) = \int_0^t H_s(\omega) \, dA_s(\omega), \quad t \ge 0,
\end{align*}
where the integral on the right is the Lebesgue–Stieltjes integral of $H(\omega, \cdot)$ against $A(\omega, \cdot)$.
[/definition]
This defines a well-posed process pathwise, but we need $H \cdot A$ to be adapted — that is, $(H \cdot A)_t$ must be $\mathcal{F}_t$-measurable for every $t$. This requires a measurability condition on $H$ stronger than mere adaptedness. The correct notion is *previsibility*.
[definition: Previsible Process]
The **previsible $\sigma$-algebra** $\mathcal{P}$ on $\Omega \times [0, \infty)$ is the $\sigma$-algebra generated by sets of the form $E \times (s, t]$, where $0 \le s < t$ and $E \in \mathcal{F}_s$. The generating collection
\begin{align*}
\Pi = \bigl\{ E \times (s, t] : 0 \le s < t, \, E \in \mathcal{F}_s \bigr\}
\end{align*}
is a $\pi$-system.
A process $H : \Omega \times [0, \infty) \to \mathbb{R}$ is **previsible** if it is measurable with respect to $\mathcal{P}$.
[/definition]
The intuition behind previsibility is that a previsible event at time $t$ is already determined an instant before $t$ — whenever it occurs, one knows it a small but positive time in advance. This is exactly the right condition to ensure integration does not anticipate the future.
Two tractable classes of previsible processes are:
[definition: Simple Process]
A process $H : \Omega \times [0, \infty) \to \mathbb{R}$ is a **simple process**, written $H \in \mathcal{E}$, if it has the form
\begin{align*}
H(\omega, t) = \sum_{i=1}^{n} H_{i-1}(\omega) \, \mathbb{1}_{(t_{i-1}, t_i]}(t)
\end{align*}
for some $0 = t_0 < t_1 < \cdots < t_n$, where each $H_{i-1}$ is an $\mathcal{F}_{t_{i-1}}$-measurable random variable.
[/definition]
Simple processes are previsible by definition: each term $H_{i-1}(\omega) \cdot \mathbb{1}_{(t_{i-1},t_i]}(t)$ is measurable with respect to $\mathcal{P}$ because $\{H_{i-1} \in B\} \times (t_{i-1}, t_i] \in \mathcal{P}$ for every Borel set $B$. Limits of simple processes are also previsible.
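For a simple integrand the pathwise integral against a finite variation process is a finite sum: since the $dA$-measure of $(u, v]$ is $A_v - A_u$, the definition gives $(H \cdot A)_t = \sum_{i} H_{i-1}\,(A_{t_i \wedge t} - A_{t_{i-1} \wedge t})$. The Python sketch below evaluates this sum along a single fixed path. The function name `simple_integral`, the particular path `A`, and the chosen levels are hypothetical and serve only to illustrate the formula.

```python
def simple_integral(levels, grid, A, t):
    """(H . A)_t for H = sum_i levels[i-1] * 1_{(grid[i-1], grid[i]]}, along one path A."""
    total = 0.0
    for i in range(1, len(grid)):
        # Truncate the interval (grid[i-1], grid[i]] at time t
        lo, hi = min(grid[i - 1], t), min(grid[i], t)
        total += levels[i - 1] * (A(hi) - A(lo))
    return total


def A(s):
    """One fixed path: continuous growth s^2 plus a unit jump at s = 1/2."""
    return s * s + (1.0 if s >= 0.5 else 0.0)


grid = [0.0, 0.5, 1.0]      # 0 = t_0 < t_1 < t_2
levels = [2.0, -1.0]        # H = 2 on (0, 1/2], H = -1 on (1/2, 1]

print(simple_integral(levels, grid, A, 0.25))
print(simple_integral(levels, grid, A, 0.5))
print(simple_integral(levels, grid, A, 1.0))
```

The jump of $A$ at $\frac{1}{2}$ lies in $(0, \frac{1}{2}]$, so it is weighted by the level $2$ that was fixed at time $0$, not by the level $-1$ that takes effect just after the jump. This is the non-anticipating behaviour that the left-open, right-closed intervals encode.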
A second class comes from càdlàg processes:
[quotetheorem:2075]
[citeproof:2075]
[example: Brownian Motion Is Previsible; Poisson Process Is Not]
**Brownian motion** $W_t$ is continuous and adapted, hence previsible by the theorem above.
**A Poisson process** $N_t$ (a right-continuous step process with jump size $1$) is adapted but *not* previsible. The reason is instructive: at a jump time $T$, the value $N_T = N_{T-} + 1$ is not in $\mathcal{F}_{T-} = \sigma(\mathcal{F}_s : s < T)$, because the jump itself is not anticipated by the left-side $\sigma$-algebras. Precisely, $N_t \notin \mathcal{F}_{t-}$ whenever $t$ is a jump time, so $N$ cannot be measurable with respect to $\mathcal{P}$.
[/example]
With previsibility established, we can state the main result of this section:
[quotetheorem:2076]
[citeproof:2076]
One might ask why adaptedness of $H$ is not sufficient to ensure $H \cdot A$ is adapted. The issue is subtle: adaptedness means $H_t$ is $\mathcal{F}_t$-measurable for each fixed $t$, but the integral $(H \cdot A)_t = \int_0^t H_s \, dA_s$ involves the values of $H$ over the entire interval $[0,t]$. For the integral to be $\mathcal{F}_t$-measurable, we need joint measurability of $(\omega, s) \mapsto H_s(\omega)$ in a way compatible with the filtration at time $t$, and this is precisely what $\mathcal{P}$-measurability provides. The previsible $\sigma$-algebra is the natural $\sigma$-algebra on $\Omega \times [0,\infty)$ for which simple processes (which are manifestly "non-anticipating") are measurable, and the class of previsible processes is closed under pointwise limits, giving access to all reasonable integrands.
### The Significance and Limitations of Finite Variation
The theory developed in this section is satisfying: if $A$ is a finite variation process, then — provided the integrability condition holds — integration against $dA$ inherits all the familiar properties of Lebesgue–Stieltjes integration. Dominated convergence, linearity, and change of variables all work pathwise, without any stochastic subtlety.
The limitation is equally clear. The motivating example of a "random integrator" in stochastic differential equations is Brownian motion $W_t$, and Brownian motion has infinite variation on every interval $[0,t]$ almost surely. This can be seen from the quadratic variation computation that $\sum_{i} (W_{t_i} - W_{t_{i-1}})^2 \to t$ as the mesh goes to zero: if $W$ had finite variation $V_t$, this sum would be bounded by $\max_i |W_{t_i} - W_{t_{i-1}}| \cdot V_t \to 0$, a contradiction.
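The contrast between the two sums in this argument is easy to observe in simulation. The sketch below is illustrative only, using Python's standard `random` module with a fixed seed; the helper name `variations` and the grid sizes are choices made here. It samples the increments of $W$ on a uniform grid of $[0,1]$ and reports the sum of squared increments alongside the sum of absolute increments. Each grid size uses an independently simulated path.

```python
import math
import random

rng = random.Random(0)   # fixed seed so the run is reproducible
t = 1.0


def variations(n):
    """Sample the n increments of W over a uniform grid of [0, t].

    Returns (sum of squared increments, sum of absolute increments).
    """
    sd = math.sqrt(t / n)
    sum_sq, sum_abs = 0.0, 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, sd)
        sum_sq += dw * dw
        sum_abs += abs(dw)
    return sum_sq, sum_abs


results = {n: variations(n) for n in (10**2, 10**3, 10**4, 10**5)}
for n, (sum_sq, sum_abs) in results.items():
    print(f"n={n:>7}  sum of squares={sum_sq:.4f}  sum of |increments|={sum_abs:.1f}")
```

The sum of squares settles near $t = 1$, while the sum of absolute increments grows roughly like $\sqrt{2nt/\pi}$, the expected value for $n$ independent Gaussian increments of variance $t/n$. A discretised path always has finite variation, so the simulation cannot exhibit infinite variation directly; the unbounded growth as the grid is refined is the numerical signature of it.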
The remaining sections of this chapter develop the theory needed to handle martingale integrators. The key insight is that while martingales vary wildly, the martingale property forces large cancellations between upward and downward movements. These cancellations replace finite variation as the mechanism that makes a well-defined integral possible.
## Local Martingale
Throughout this section we work on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, \mathbb{P})$ satisfying the **usual conditions**:
[definition: Usual Conditions]
A filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, \mathbb{P})$ satisfies the **usual conditions** if:
1. $\mathcal{F}_0$ contains all $\mathbb{P}$-null sets (completeness), and
2. The filtration $(\mathcal{F}_t)_{t \geq 0}$ is right-continuous, meaning $\mathcal{F}_t = \mathcal{F}_{t+} := \bigcap_{s > t} \mathcal{F}_s$ for all $t \geq 0$.
[/definition]
The completeness condition ensures that subsets of null events are measurable, which prevents pathological measurability failures. Right-continuity of the filtration is a technical convenience that allows us to work freely with stopping times: it ensures that $\{\tau \leq t\} \in \mathcal{F}_t$ for every $t$ whenever $\{\tau < t\} \in \mathcal{F}_t$ for every $t$, without needing to pass to a larger filtration.
### The Optional Stopping Theorem
Before introducing local martingales, we recall the central structural result about martingales that we will localize. The following theorem characterizes martingales through their behaviour at stopping times.
[quotetheorem:2109]
This theorem is the workhorse result from discrete-time martingale theory, carried over to the continuous setting. The equivalence between the four conditions reflects the fundamental principle that a martingale has no systematic drift — the expected value at any stopping time equals the initial expected value.
### Motivation for Localisation
In practice, most results about stochastic integrals are first established for bounded martingales, then extended to square-integrable martingales using Hilbert space structure. To handle a general martingale $M$, one introduces the stopping times $T_n = \inf\{t > 0 : |M_t| \geq n\}$ and works with the stopped processes $M^{T_n}$, which are martingales for each $n$ (and are bounded by $n$ when $M$ is continuous with $|M_0| \leq n$), before passing to the limit $n \to \infty$.
If one is already doing this, it is natural to weaken the martingale condition and only require that the stopped processes $M^{T_n}$ be martingales, without insisting that $M$ itself is a martingale. This weakening is not merely a convenience — martingales are not always closed under the operations we care about (such as composition with smooth functions in Itô's formula), but local martingales will be. This motivates the following central definition.
[definition: Local Martingale]
A càdlàg adapted process $X$ is a **local martingale** if there exists a sequence of stopping times $(T_n)_{n \geq 1}$ such that:
- $T_n \to \infty$ almost surely as $n \to \infty$, and
- the stopped process $X^{T_n} = (X_{T_n \wedge t})_{t \geq 0}$ is a martingale for every $n$.
The sequence $(T_n)$ is called a **reducing sequence** for $X$.
[/definition]
Every martingale is a local martingale: by the optional stopping theorem, we may take $T_n = n$, since $X^n = (X_{n \wedge t})_{t \geq 0}$ is a martingale for each $n$. The interesting examples are local martingales that fail to be genuine martingales.
[example: Inverse Distance to 3d Brownian Motion]
Let $(B_t)_{t \geq 0}$ be a standard Brownian motion in $\mathbb{R}^3$, and define the process
\begin{align*}
X_t := \frac{1}{|B_t|}, \quad t \geq 1.
\end{align*}
Then $X = (X_t)_{t \geq 1}$ is a local martingale but not a martingale.
To see that $X$ is not a martingale, observe that $X_t \geq 0$ and
\begin{align*}
\sup_{t \geq 1} \mathbb{E}[X_t^2] < \infty, \qquad \mathbb{E}[X_t] \to 0 \text{ as } t \to \infty.
\end{align*}
Since $X_t \geq 0$ but $\mathbb{E}[X_t] \to 0$, the process cannot be a martingale — if it were, $\mathbb{E}[X_t]$ would be constant.
To see that $X$ is a local martingale, recall that for any $f \in C^2_b(\mathbb{R}^3)$, the process
\begin{align*}
M^f_t := f(B_t) - f(B_1) - \frac{1}{2} \int_1^t \Delta f(B_s) \, ds
\end{align*}
is a martingale. The function $f(x) = 1/|x|$ satisfies $\Delta f(x) = 0$ for all $x \neq 0$ (it is harmonic away from the origin). If $f$ had no singularity at $0$, this would immediately give that $X$ is a martingale. The idea is to truncate away from the singularity.
Define
\begin{align*}
T_n := \inf\left\{t \geq 1 : |B_t| < \frac{1}{n}\right\},
\end{align*}
and choose $f_n \in C^2_b(\mathbb{R}^3)$ such that $f_n(x) = 1/|x|$ for $|x| \geq 1/n$. Then for $t \geq 1$,
\begin{align*}
X_{t \wedge T_n} - X_{T_n \wedge 1} = M^{f_n}_{t \wedge T_n},
\end{align*}
which is a martingale (being a stopped martingale). So $X^{T_n}$ is a martingale for each $n$.
It remains to verify that $T_n \to \infty$ almost surely. The sequence $(T_n)$ is non-decreasing, and on the event that its limit $T_\infty = \lim_n T_n$ is finite, continuity of paths together with $|B_{T_n}| \leq 1/n$ forces $B_{T_\infty} = 0$ with $T_\infty \geq 1$. But Brownian motion in $\mathbb{R}^3$ almost surely never visits the origin at a positive time (single points are polar in dimension $d \geq 2$), so $T_\infty = \infty$ almost surely. Note that the convergence $|B_t| \to \infty$ in probability, which follows from $\mathbb{E}[X_t] \to 0$, would not suffice on its own: what matters is that the path avoids the singularity of $f$.
[/example]
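The decay of $\mathbb{E}[X_t]$ that rules out the martingale property can be checked by Monte Carlo. The sketch below is an illustration in Python under stated assumptions: it samples $B_t \sim N(0, t I_3)$ directly rather than simulating paths, and it compares against the closed form $\mathbb{E}[1/|B_t|] = \sqrt{2/(\pi t)}$, a standard Gaussian computation that is not stated in the text and is used here only as a reference value. The function name and sample size are arbitrary choices.

```python
import math
import random

rng = random.Random(1)     # fixed seed for reproducibility
n_samples = 100_000


def mean_inverse_distance(t):
    """Monte Carlo estimate of E[1/|B_t|], sampling B_t ~ N(0, t I_3) directly."""
    sd = math.sqrt(t)
    total = 0.0
    for _ in range(n_samples):
        x, y, z = rng.gauss(0.0, sd), rng.gauss(0.0, sd), rng.gauss(0.0, sd)
        total += 1.0 / math.sqrt(x * x + y * y + z * z)
    return total / n_samples


estimates = {t: mean_inverse_distance(t) for t in (1.0, 4.0, 16.0, 64.0)}
for t, estimate in estimates.items():
    closed_form = math.sqrt(2.0 / (math.pi * t))
    print(f"t={t:>5}  estimate={estimate:.4f}  closed form={closed_form:.4f}")
```

The estimates halve each time $t$ is multiplied by four, so $t \mapsto \mathbb{E}[X_t]$ is strictly decreasing rather than constant. The second moment is deliberately not estimated here: $1/|B_t|^2$ has infinite variance, so a sample average of it converges too slowly to be informative.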
### Non-negative Local Martingales and Supermartingales
The example above hints at a general phenomenon: non-negative local martingales tend to behave like supermartingales. This is because Fatou's lemma creates a downward bias when passing limits through conditional expectations.
[quotetheorem:2077]
[citeproof:2077]
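For a non-negative local martingale with reducing sequence $(T_n)$, the Fatou inequality is strict at time $t$ exactly when the stopped values $(X_{t \wedge T_n})_{n \geq 1}$ fail to be uniformly integrable, since otherwise they would converge to $X_t$ in $L^1$ and the expectation would be preserved. In the example above, $X_t = 1/|B_t|$ is a non-negative local martingale with $\mathbb{E}[X_t] \to 0 < \mathbb{E}[X_1]$, confirming it is a strict supermartingale.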
### When Local Martingales are Genuine Martingales
The gap between martingales and local martingales is exactly uniform integrability. The following pair of results makes this precise.
We first recall two foundational results from measure-theoretic probability.
[quotetheorem:2110]
[quotetheorem:2111]
These two results together provide the main tool for promoting local martingale convergence to $L^1$ convergence, which is what allows us to characterize when a local martingale is a true martingale.
[quotetheorem:2078]
[citeproof:2078]
An immediate but important consequence is the following:
[quotetheorem:2079]
[citeproof:2079]
### Canonical Reducing Sequences for Continuous Local Martingales
The definition of a local martingale does not prescribe the form of the reducing sequence. In particular, the stopped processes $X^{T_n}$ need not be bounded, which would be a convenient property. For continuous local martingales, we can always find a canonical reducing sequence consisting of first-exit times from bounded intervals.
[quotetheorem:2080]
[citeproof:2080]
This result is practically important: it means that when working with a continuous local martingale $X$ started at zero, we may always assume without loss of generality that the stopped processes are uniformly bounded — a hypothesis that greatly simplifies arguments.
### Continuous Finite Variation Local Martingales are Trivial
The final theorem of this section is deceptively simple to state but has profound consequences. It is the rigidity result that makes the stochastic integral genuinely necessary, and that enforces uniqueness in the Doob–Meyer decomposition.
[quotetheorem:2081]
[citeproof:2081]
This theorem has two major consequences that shape the rest of the course.
**Itô integration is unavoidable.** Since Brownian motion $W_t$ is a continuous local martingale with $W_0 = 0$ that is not identically zero, this theorem tells us that $W_t$ cannot have finite variation. Therefore, the pathwise Lebesgue–Stieltjes integral $\int_0^t H_s \, dW_s$ cannot be defined for general integrands — the theory developed in Section 2.1 does not apply to Brownian motion. A genuinely new integration theory, the Itô integral, is needed.
**Uniqueness of semimartingale decompositions.** We will later want to define the stochastic integral with respect to processes that decompose as a sum of a continuous local martingale $M$ and a finite variation process $A$. This theorem tells us such a decomposition is unique: if $M + A = M' + A'$ with $M, M'$ continuous local martingales (both starting at zero) and $A, A'$ finite variation processes, then $M - M' = A' - A$ is simultaneously a continuous local martingale and a finite variation process, hence identically zero. So $M = M'$ and $A = A'$. This uniqueness is what makes the stochastic integral for semimartingales well-defined, as we will see in Section 2.6.
## Square Integrable Martingales
The goal of this section is to set up a Hilbert space framework for martingales that will underpin the construction of the Itô integral in Chapter 3. The rough strategy for building the stochastic integral with respect to a martingale $M$ is: define it on simple processes via finite Riemann sums, establish $L^2$ bounds showing this map is an isometry, and then extend by density and completeness. For this to work, the space of martingales must itself be a Hilbert space. The spaces $\mathcal{M}^2$ and $\mathcal{M}^2_c$ defined below are exactly this structure.
### The Spaces $\mathcal{M}^2$ and $\mathcal{M}^2_c$
We begin by specifying precisely which martingales are square integrable and how to equip their collection with a Hilbert space structure.
[definition: Square Integrable Martingale Space]
Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, \mathbb{P})$ be a filtered probability space satisfying the usual conditions. Define
\begin{align*}
\mathcal{M}^2 &= \left\{ X : \Omega \times [0, \infty) \to \mathbb{R} : X \text{ is a càdlàg martingale with } \sup_{t \geq 0} \mathbb{E}[X_t^2] < \infty \right\},
\end{align*}
and the subspace of **continuous** square integrable martingales
\begin{align*}
\mathcal{M}^2_c &= \left\{ X \in \mathcal{M}^2 : X(\omega, \cdot) \text{ is continuous for every } \omega \in \Omega \right\}.
\end{align*}
Define an inner product on $\mathcal{M}^2$ by
\begin{align*}
(X, Y)_{\mathcal{M}^2} &= \mathbb{E}[X_\infty Y_\infty],
\end{align*}
which induces the norm $\|X\|_{\mathcal{M}^2} = \mathbb{E}[X_\infty^2]^{1/2}$.
[/definition]
A word on why the norm is finite and well-defined. For any $X \in \mathcal{M}^2$, the process $(X_t^2)_{t \geq 0}$ is a submartingale by Jensen's inequality (since $x \mapsto x^2$ is convex). Therefore $t \mapsto \mathbb{E}[X_t^2]$ is non-decreasing, and by assumption it is bounded. The martingale convergence theorem then applies: since $X$ is an $L^2$-bounded martingale, there exists a limit $X_\infty \in L^2(\Omega)$ such that $X_t \to X_\infty$ almost surely and in $L^2$. Moreover,
\begin{align*}
\mathbb{E}[X_\infty^2] = \sup_{t \geq 0} \mathbb{E}[X_t^2],
\end{align*}
so the norm $\|X\|_{\mathcal{M}^2}$ captures all the mass of the path at once.
Doob's maximal inequality (stated below) gives $\mathbb{E}[\sup_{t \geq 0} X_t^2] \leq 4\mathbb{E}[X_\infty^2]$. In particular, $\|X\|_{\mathcal{M}^2} = 0$ implies $\sup_{t \geq 0} |X_t| = 0$ almost surely, so $X = 0$ as a process. This confirms that the norm is non-degenerate.
### Doob's $L^2$ Maximal Inequality
The key analytic tool that makes $\mathcal{M}^2$ a Hilbert space is Doob's inequality, which controls the supremum of the path from the terminal value alone. It asserts that for any square-integrable martingale, controlling the terminal $L^2$ norm is enough to control the entire path.
[quotetheorem:2112]
This is the case $p = 2$ of Doob's $L^p$ inequality, whose constant $(p/(p-1))^p$ equals $4$ at $p = 2$. The proof uses the weak $(1,1)$ maximal inequality together with a stopping-time argument. The inequality is the fundamental reason why the norm $\mathbb{E}[X_\infty^2]^{1/2}$, which only sees the limit, captures the full path behaviour.
### $\mathcal{M}^2$ is a Hilbert Space
With the inner product defined above and Doob's inequality in hand, we can verify that $\mathcal{M}^2$ is complete.
[quotetheorem:2082]
[citeproof:2082]
[example: Brownian Motion in M2]
Let $W_t$ be a standard Brownian motion on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, \mathbb{P})$. The stopped process $W_t^T = W_{t \wedge T}$ for any fixed $T < \infty$ is a continuous $L^2$-bounded martingale: $\mathbb{E}[(W^T_t)^2] = t \wedge T \leq T$. Therefore $W^T \in \mathcal{M}^2_c$. The terminal value is $W^T_\infty = W_T$ and $\|W^T\|_{\mathcal{M}^2}^2 = T$. The process $W$ itself is not in $\mathcal{M}^2$, since $\mathbb{E}[W_t^2] = t \to \infty$, illustrating the necessity of the bound $\sup_{t \geq 0} \mathbb{E}[X_t^2] < \infty$.
[/example]
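Both the norm computation in this example and Doob's bound $\mathbb{E}[\sup_t X_t^2] \leq 4\,\mathbb{E}[X_\infty^2]$ can be observed in a small simulation of $W^T$. The following Python sketch is illustrative only: the grid size, number of paths, and seed are arbitrary assumptions, and the supremum is taken over the grid, which slightly underestimates the supremum over continuous time.

```python
import math
import random

rng = random.Random(2)      # fixed seed for reproducibility
T = 1.0                     # we simulate W^T, i.e. W on [0, T]
n_steps = 200               # grid size (a discretisation assumption)
n_paths = 2000

terminal_sq = 0.0           # accumulates W_T^2 over paths
running_sup_sq = 0.0        # accumulates the grid maximum of W_t^2 over paths
step_sd = math.sqrt(T / n_steps)

for _ in range(n_paths):
    w, sup_sq = 0.0, 0.0
    for _ in range(n_steps):
        w += rng.gauss(0.0, step_sd)
        sup_sq = max(sup_sq, w * w)
    terminal_sq += w * w
    running_sup_sq += sup_sq

mean_terminal_sq = terminal_sq / n_paths
mean_sup_sq = running_sup_sq / n_paths
print(f"E[W_T^2]       ~ {mean_terminal_sq:.3f}   (squared norm of W^T; theory: T = {T})")
print(f"E[sup_t W_t^2] ~ {mean_sup_sq:.3f}   (Doob bound: 4T = {4 * T})")
```

The first estimate should sit near $T$, and the second should lie between $T$ and $4T$: the running supremum is genuinely larger than the terminal value on average, but by no more than the factor permitted by Doob's inequality.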
[example: Bounded Stopping Time Martingales]
More generally, any continuous martingale $M$ that is bounded in $L^2$ lies in $\mathcal{M}^2_c$ by definition. For a continuous local martingale, $L^2$-boundedness alone is not enough: the process $1/|B_t|$ from Section 2.2 is bounded in $L^2$ but is not a martingale. What does hold is that for $M$ a continuous local martingale with $M_0 = 0$, the stopped process $M^{\tau_n}$ with $\tau_n = \inf\{t : |M_t| > n\}$ is a bounded martingale and hence lies in $\mathcal{M}^2_c$ for each $n$. This localisation idea, reducing a local martingale to pieces that lie in $\mathcal{M}^2_c$, is the same strategy used in Section 2.2 for local martingales, and it will reappear in Chapter 3 when we extend the stochastic integral.
[/example]
### Connection to the Angle Bracket and What Comes Next
The Hilbert space structure of $\mathcal{M}^2$ is not merely an abstract nicety: it is the scaffolding on which the Doob–Meyer decomposition of $M^2$ is built. In Section 2.4, we will associate to each $M \in \mathcal{M}^2_c$ a unique continuous, increasing, adapted process $\langle M \rangle$ (the **quadratic variation** or **angle bracket** of $M$) such that $M_t^2 - \langle M \rangle_t$ is a martingale. Existence is obtained as a limit of discrete approximations, and the completeness of $\mathcal{M}^2_c$ together with the $L^2$ bounds proved here is what makes that limit available. Uniqueness follows from the fact that a continuous finite variation local martingale started at zero vanishes identically.
The inner product structure also explains the role of $\mathcal{M}^2_c$ in integration theory. In Chapter 3, the stochastic integral $\int_0^t H_s\, dM_s$ will be defined first for simple integrands $H$ and then extended by density; the Itô isometry $\|\int H\, dM\|_{\mathcal{M}^2}^2 = \mathbb{E}[\int H_s^2\, d\langle M \rangle_s]$ makes this extension well-defined, and the completeness of $\mathcal{M}^2_c$ proved above guarantees the limit lies in the right space.
## Quadratic Variation
The preceding sections developed two key tools: the machinery of local martingales and the Hilbert space $\mathcal{M}^2_c$ of square-integrable continuous martingales. We are now ready to construct the central object of this chapter. A guiding observation from classical analysis is that, for smooth functions, second-order terms are negligible compared to first-order ones. For Brownian motion — and continuous local martingales generally — this is spectacularly false. These processes oscillate so wildly that second-order variations accumulate to something finite and non-trivial. The quadratic variation captures exactly this accumulation, and it will be the key ingredient in Itô's formula in Chapter 3.
### Convergence in the u.c.p. Sense
Before constructing the quadratic variation, we need a mode of convergence adapted to stochastic processes on compact time intervals.
[definition: Convergence Uniformly on Compacts in Probability]
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space carrying a filtration $(\mathcal{F}_t)_{t \ge 0}$. For a sequence of adapted processes $(X^n_t)_{n \ge 1}$ and an adapted process $(X_t)_{t \ge 0}$, we say $X^n \to X$ **uniformly on compact sets in probability** (u.c.p.) if for every $t > 0$ and every $\varepsilon > 0$,
\begin{align*}
\mathbb{P}\!\left(\sup_{s \in [0,t]} |X^n_s - X_s| > \varepsilon\right) \to 0 \quad \text{as } n \to \infty.
\end{align*}
[/definition]
This mode of convergence is weaker than $L^2$ convergence of the running supremum, but stronger than convergence in probability at each fixed time. It is well-suited to continuous processes because it respects the sample-path structure: if $X^n \to X$ u.c.p., then along a subsequence the convergence is almost surely uniform on every compact time interval, so a u.c.p. limit of continuous processes has almost surely continuous paths.
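The definition can be probed numerically. The following is a minimal Monte Carlo sketch, not part of the theory: it assumes a Brownian path simulated on a fine grid of $[0,1]$ and takes $X^n$ to be that path frozen at the dyadic times $k2^{-n}$. The function name `ucp_tail_probability`, the threshold `eps`, and all sample sizes are illustrative choices.

```python
import numpy as np

def ucp_tail_probability(level, eps=0.5, n_paths=500, fine_log2=11, seed=0):
    """Monte Carlo estimate of P(sup_{s<=1} |X^n_s - W_s| > eps), where X^n
    freezes a simulated Brownian path W at the dyadic times k 2^{-level}."""
    rng = np.random.default_rng(seed)
    n_fine = 2 ** fine_log2
    dt = 1.0 / n_fine
    # Brownian paths on the fine grid, all starting at 0.
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_fine))
    W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1)
    # For each fine grid index, the index of the last dyadic point at or before it.
    stride = n_fine // 2 ** level
    frozen_idx = (np.arange(n_fine + 1) // stride) * stride
    sup_gap = np.max(np.abs(W[:, frozen_idx] - W), axis=1)
    return float(np.mean(sup_gap > eps))

# Estimated tail probabilities for increasingly fine dyadic levels.
probs = [ucp_tail_probability(level) for level in (2, 4, 6, 8)]
print(probs)
```

The estimated probabilities should shrink as the level grows, which is what u.c.p. convergence of $X^n$ to $W$ on $[0,1]$ predicts. The supremum is taken over grid points only, so it slightly underestimates the true supremum.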
### The Angle Bracket Process
We can now state the main theorem of this section. For a continuous local martingale $M$ with $M_0 = 0$, we want to find a process $\langle M \rangle$ that measures the accumulated quadratic oscillation of $M$. The key insight is that $M^2_t$ is not a martingale (since martingales have constant mean, but $\mathbb{E}[M^2_t]$ grows), and subtracting an appropriate increasing process corrects this defect.
[quotetheorem:2113]
[definition: Quadratic Variation]
The process $\langle M \rangle$ in the theorem above is called the **quadratic variation** of $M$.
[/definition]
The Riemann-sum formula $\langle M \rangle^{(n)}_t \to \langle M \rangle_t$ is the more intuitively accessible characterisation: it says $\langle M \rangle_t$ accumulates the squared increments of $M$ over finer and finer dyadic partitions of $[0, t]$. The martingale property of $M^2_t - \langle M \rangle_t$ is the analytically powerful characterisation — it provides uniqueness and extends cleanly to the stochastic integral in Chapter 3. Both characterisations are essential.
[example: Quadratic Variation of Brownian Motion]
Let $W$ be a standard Brownian motion. We claim $\langle W \rangle_t = t$.
The process $W^2_t - t$ is a martingale: for $s \le t$,
\begin{align*}
\mathbb{E}[W^2_t - t \mid \mathcal{F}_s] &= \mathbb{E}[W^2_t \mid \mathcal{F}_s] - t.
\end{align*}
Writing $W_t = W_s + (W_t - W_s)$ and using independence of $W_t - W_s$ from $\mathcal{F}_s$ and $\mathbb{E}[W_t - W_s] = 0$,
\begin{align*}
\mathbb{E}[W^2_t \mid \mathcal{F}_s] &= W^2_s + \mathbb{E}[(W_t - W_s)^2] = W^2_s + (t - s).
\end{align*}
Therefore $\mathbb{E}[W^2_t - t \mid \mathcal{F}_s] = W^2_s - s$, confirming the martingale property. Since $A_t = t$ is continuous, adapted, increasing with $A_0 = 0$, uniqueness in the theorem gives $\langle W \rangle_t = t$.
This result is sometimes written informally as $(dW_t)^2 = dt$, which encapsulates the rough, second-order nature of Brownian paths.
[/example]
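The claim $\langle W \rangle_t = t$ can be illustrated with the Riemann-sum characterisation. The sketch below assumes a single Brownian path simulated on a grid of mesh $2^{-16}$ over $[0,1]$ and sums squared increments along coarser dyadic partitions. The helper `dyadic_quadratic_variation` and the seed are hypothetical choices made for illustration.

```python
import numpy as np

def dyadic_quadratic_variation(path, level, fine_log2):
    """Sum of squared increments of `path` over the dyadic partition of [0, 1]
    with mesh 2^{-level}; `path` is sampled on a grid of mesh 2^{-fine_log2}."""
    stride = 2 ** (fine_log2 - level)
    coarse = path[::stride]  # path values at the dyadic points of the given level
    return float(np.sum(np.diff(coarse) ** 2))

rng = np.random.default_rng(1)
fine_log2 = 16
n_fine = 2 ** fine_log2
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n_fine), n_fine))])

qv_by_level = {level: dyadic_quadratic_variation(W, level, fine_log2)
               for level in (4, 8, 12, 16)}
for level, qv in qv_by_level.items():
    print(f"level {level:2d}: sum of squared increments = {qv:.4f}")
```

At coarse levels the sum fluctuates noticeably from path to path; at fine levels it should settle near $t = 1$, in line with the informal rule $(dW_t)^2 = dt$.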
### Proof of the Quadratic Variation Theorem
The proof proceeds in three stages: uniqueness (which is immediate from results already established), existence for bounded martingales (the main Hilbert space argument), and extension to all continuous local martingales via stopping times.
[proof]
**Uniqueness.** Suppose $A$ and $\tilde{A}$ both satisfy the conditions of the theorem. Then
\begin{align*}
A_t - \tilde{A}_t = (M^2_t - \tilde{A}_t) - (M^2_t - A_t)
\end{align*}
is a continuous adapted process that is a difference of two continuous local martingales, hence itself a continuous local martingale starting at 0. Moreover, both $A$ and $\tilde{A}$ are increasing processes, so $A - \tilde{A}$ has finite variation. By the result of Section 2.2 — that a continuous local martingale of finite variation is identically zero — we conclude $A = \tilde{A}$ almost surely. This gives uniqueness.
**Existence for bounded martingales.** Suppose $|M(\omega, t)| \le C$ for all $(\omega, t)$. Then $M \in \mathcal{M}^2_c$. Fix $T > 0$ and define the auxiliary processes
\begin{align*}
X^n_t = \sum_{i=1}^{\lfloor 2^n T \rfloor} M_{(i-1)2^{-n}}\bigl(M_{i2^{-n} \wedge t} - M_{(i-1)2^{-n} \wedge t}\bigr).
\end{align*}
These are constructed so that $\langle M \rangle^{(n)}_{k2^{-n}} = M^2_{k2^{-n}} - 2X^n_{k2^{-n}}$ on dyadic grid points, reducing the analysis of $\langle M \rangle^{(n)}$ to that of $X^n$.
Each $X^n$ is a martingale (by the martingale property of $M$ and measurability of the coefficients $M_{(i-1)2^{-n}}$). To show $(X^n)$ is Cauchy in $\mathcal{M}^2_c$, compute for $n \ge m$:
\begin{align*}
X^n_\infty - X^m_\infty = \sum_{i=1}^{\lfloor 2^n T \rfloor} (M_{(i-1)2^{-n}} - M_{\lfloor (i-1)2^{m-n}\rfloor 2^{-m}})(M_{i \cdot 2^{-n}} - M_{(i-1)2^{-n}}).
\end{align*}
Taking the expected square and using the orthogonality of martingale increments, one obtains
\begin{align*}
\mathbb{E}[(X^n_\infty - X^m_\infty)^2] \le \mathbb{E}\!\left[\sup_{|s-t| \le 2^{-m}} |M_t - M_s|^4\right]^{1/2} \mathbb{E}\!\left[(\langle M \rangle^{(n)}_T)^2\right]^{1/2}
\end{align*}
by the Cauchy–Schwarz inequality. The expectation inside the second factor is bounded by $12C^2 \cdot 4C^2 = 48C^4$, uniformly in $n$ (via an expansion of the squared sum using orthogonal increments and the uniform bound $|M| \le C$). The first factor tends to zero as $m \to \infty$ by the almost sure uniform continuity of the paths of $M$ on $[0, T]$ and dominated convergence (using the bound $|M_t - M_s|^4 \le 16C^4$). So $(X^n)$ is Cauchy in $\mathcal{M}^2_c$ and converges to some $X \in \mathcal{M}^2_c$.
By Doob's $L^2$ inequality, convergence in $\mathcal{M}^2_c$ gives $L^2$ convergence of $\sup_t |X^n_t - X_t|$, from which we extract a subsequence along which $\sup_t |X^n_t - X_t| \to 0$ almost surely. Set $A^{(T)}_t = M^2_t - 2X_t$ on the good set and $0$ on the null complement. The process $A^{(T)}$ is continuous and adapted, $M^2_{t \wedge T} - A^{(T)}_{t \wedge T} = 2X_{t \wedge T}$ is a martingale, and $A^{(T)}$ is increasing (since $\langle M \rangle^{(n)}$ is increasing on dyadic grids and the convergence is uniform). Patching over $T \in \mathbb{N}$ using the uniqueness argument yields a global process $\langle M \rangle$.
**Extension to all continuous local martingales.** For a general continuous local martingale $M$ with $M_0 = 0$, let $T_n = \inf\{t \ge 0 : |M_t| \ge n\}$. The stopped processes $M^{T_n}$ are bounded continuous martingales, and the preceding step gives processes $A^n = \langle M^{T_n} \rangle$. Since $(M^{T_{n+1}})^{T_n} = M^{T_n}$, uniqueness shows that $A^{n+1}_{t \wedge T_n}$ and $A^n_t$ are indistinguishable, so the processes patch consistently to give a global $\langle M \rangle$ with $\langle M \rangle_{t \wedge T_n} = A^n_t$ for all $n$. This $\langle M \rangle$ is increasing (since each $A^n$ is), and $M^2_t - \langle M \rangle_t$ is a continuous local martingale since $M^2_{t \wedge T_n} - \langle M \rangle_{t \wedge T_n}$ is a true martingale for each $n$.
**u.c.p. convergence of $\langle M \rangle^{(n)}$.** For bounded $M$, the u.c.p. convergence follows from the $L^2$ convergence of $\sup_t |X^n_t - X_t|$ and the uniform continuity of $M$ and $X$ on $[0,T]$. For general $M$, one writes
\begin{align*}
\mathbb{P}\!\left(\sup_{t \le T} |\langle M \rangle^{(n)}_t - \langle M \rangle_t| > \varepsilon\right) \le \mathbb{P}(T_k < T) + \mathbb{P}\!\left(\sup_{t \le T} |\langle M^{T_k} \rangle^{(n)}_t - \langle M^{T_k} \rangle_t| > \varepsilon\right),
\end{align*}
picks $k$ large to make the first term small (since $T_k \to \infty$ a.s.), and then $n$ large to make the second term small by the bounded case.
[/proof]
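The algebraic identity $\langle M \rangle^{(n)}_{k2^{-n}} = M^2_{k2^{-n}} - 2X^n_{k2^{-n}}$ used in the existence step is a telescoping sum and holds path by path. The sketch below checks it on grid values only. As a stand-in for the values of $M$ on a dyadic grid it assumes a scaled symmetric random walk with $M_0 = 0$, which is an illustrative choice; the identity itself does not depend on the martingale property.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical grid values M_{k 2^{-n}}, k = 0, ..., 256, with M_0 = 0.
M = np.concatenate([[0.0], np.cumsum(rng.choice([-1.0, 1.0], size=256))]) / 16.0

dM = np.diff(M)
# <M>^{(n)} at grid points: running sum of squared increments.
qv = np.concatenate([[0.0], np.cumsum(dM ** 2)])
# X^n at grid points: running sum of (left-endpoint value) x (increment).
X = np.concatenate([[0.0], np.cumsum(M[:-1] * dM)])

max_error = float(np.max(np.abs(qv - (M ** 2 - 2.0 * X))))
print(max_error)
```

The printed discrepancy should be zero up to floating-point rounding, reflecting $(b - a)^2 = b^2 - a^2 - 2a(b - a)$ summed over consecutive grid points.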
### Properties of the Quadratic Variation
With the construction in hand, several consequences follow immediately from the characterisation of $\langle M \rangle$. The first records how quadratic variation interacts with stopping, which is the key tool for extending results from bounded martingales to general continuous local martingales.
[quotetheorem:2083]
[citeproof:2083]
The second consequence identifies when the quadratic variation vanishes.
[quotetheorem:2084]
[citeproof:2084]
The argument above uses crucially that $M^2_t \ge 0$: a non-negative continuous local martingale is a supermartingale, and this is what allows one to pass from the local martingale property of $M^2 - \langle M \rangle$ to a statement about $\mathbb{E}[M^2_t]$. The conclusion is consistent with Section 2.2: no non-constant continuous local martingale can have zero quadratic variation, just as none can have finite variation.
### Quadratic Variation for Square-Integrable Martingales
For martingales in the Hilbert space $\mathcal{M}^2_c$ from Section 2.3, the quadratic variation has a cleaner $L^1$ theory.
[quotetheorem:2085]
[citeproof:2085]
The identity $\|M - M_0\|_{\mathcal{M}^2}^2 = \mathbb{E}[\langle M \rangle_\infty]$ is the cornerstone of stochastic integration theory. It says that the squared $\mathcal{M}^2$ norm of $M - M_0$ equals the $L^1$ norm of its terminal quadratic variation. When we define the stochastic integral $\int_0^\infty H_s\, dM_s$ for a previsible process $H$ against a martingale $M \in \mathcal{M}^2_c$ in Chapter 3, the Itô isometry will take exactly this form:
\begin{align*}
\left\|\int_0^\infty H_s\, dM_s\right\|_{\mathcal{M}^2}^2 = \mathbb{E}\!\left[\int_0^\infty H^2_s\, d\langle M \rangle_s\right].
\end{align*}
This is a stochastic analogue of the classical $L^2$ isometry for the Lebesgue integral, and it explains why $\langle M \rangle$ must be constructed before the stochastic integral can be defined.
## Covariation
The quadratic variation $\langle M \rangle$ built in Section 2.4 measures how much a single continuous local martingale $M$ fluctuates. When two martingales $M$ and $N$ interact — as they will when we integrate one against the other in Itô's formula — we need a joint quantity that captures their correlated fluctuations. This joint quantity is the **covariation** $\langle M, N \rangle$, and it arises naturally from the algebraic structure already present in $\mathcal{M}^2_c$.
Recall that $\mathcal{M}^2_c$ is a Hilbert space with inner product $\langle M, N \rangle_{\mathcal{M}^2_c} = \mathbb{E}[M_\infty N_\infty]$, and that $\|M - M_0\|_{\mathcal{M}^2}^2 = \mathbb{E}[\langle M \rangle_\infty]$ by Section 2.4. The bracket $\langle \cdot \rangle$ is thus the quadratic object associated to the norm; the covariation should be the corresponding bilinear object. The polarization identity from Hilbert space theory tells us exactly how to define it.
[definition: Covariation]
Let $M$ and $N$ be continuous local martingales. The **covariation** (also called the **bracket**) of $M$ and $N$ is the process $\langle M, N \rangle : \Omega \times [0, \infty) \to \mathbb{R}$ defined by the polarization identity:
\begin{align*}
\langle M, N \rangle_t = \frac{1}{4}\bigl(\langle M + N \rangle_t - \langle M - N \rangle_t\bigr).
\end{align*}
[/definition]
Since $M + N$ and $M - N$ are again continuous local martingales, both $\langle M + N \rangle$ and $\langle M - N \rangle$ exist by the results of Section 2.4. The difference is therefore well-defined, and $\langle M, N \rangle$ is a continuous process. When $M = N$, the identity recovers $\langle M, M \rangle = \langle M \rangle$.
If $M, N \in \mathcal{M}^2_c$, then evaluating at $t = \infty$ gives
\begin{align*}
\langle M - M_0, N - N_0 \rangle_{\mathcal{M}^2_c} = \mathbb{E}[\langle M, N \rangle_\infty],
\end{align*}
confirming that the covariation is the stochastic analogue of the inner product on $\mathcal{M}^2_c$.
### Properties of the Covariation
The following proposition collects the key properties of $\langle M, N \rangle$, paralleling those established for $\langle M \rangle$ in Section 2.4. Most importantly, it identifies $\langle M, N \rangle$ as the unique compensator turning the product $MN$ into a local martingale.
[quotetheorem:2086]
[citeproof:2086]
Part (i) is the most structurally important characterization. It says that $\langle M, N \rangle$ is the unique compensator that turns the product $MN$ into a local martingale. This is the stochastic analogue of the product rule, and it will be the key identity in Itô's formula. Part (iii) gives a concrete meaning to $\langle M, N \rangle$: it is the limit of the sum of correlated increments of $M$ and $N$ over successive dyadic partitions.
### An Example: Independent Brownian Motions
[example: Independent Brownian Motions Have Zero Covariation]
Let $B$ and $B'$ be two independent standard Brownian motions on the same filtered probability space, with $B_0 = B'_0 = 0$.
We claim $\langle B, B' \rangle_t = 0$ for all $t \geq 0$, a.s.
Set $X_{\pm} = \frac{1}{\sqrt{2}}(B \pm B')$. Since $B$ and $B'$ are independent standard Brownian motions, each $X_{\pm}$ is again a standard Brownian motion: the increments $X_{\pm}(t) - X_{\pm}(s) \sim N(0, t - s)$ are independent for disjoint intervals, and the sample paths are continuous. In particular, $\langle X_+ \rangle_t = t$ and $\langle X_- \rangle_t = t$.
Expanding using bilinearity and the polarization identity,
\begin{align*}
\langle X_+, X_+ \rangle_t &= \frac{1}{2}\langle B + B', B + B' \rangle_t = \frac{1}{2}\bigl(\langle B \rangle_t + 2\langle B, B' \rangle_t + \langle B' \rangle_t\bigr) = t + \langle B, B' \rangle_t,
\end{align*}
and similarly $\langle X_-, X_- \rangle_t = t - \langle B, B' \rangle_t$.
Since $\langle X_+ \rangle_t = t$, the first identity gives $\langle B, B' \rangle_t = 0$.
[/example]
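A numerical illustration of the example, under stated assumptions: two independent Brownian increment sequences on a grid of mesh $2^{-16}$ over $[0,1]$, with the covariation approximated by the sum of products of increments. The correlated process $N = \rho B + \sqrt{1 - \rho^2}\, B'$ with $\rho = 0.6$ is an extra illustrative case not taken from the text; by bilinearity its covariation with $B$ is $\rho t$. Function names are hypothetical.

```python
import numpy as np

def discrete_covariation(dM, dN):
    """Sum of products of increments: the partition approximation to <M, N>_t."""
    return float(np.sum(dM * dN))

def discrete_polarization(dM, dN):
    """The same quantity computed through the polarization identity."""
    return 0.25 * float(np.sum((dM + dN) ** 2) - np.sum((dM - dN) ** 2))

rng = np.random.default_rng(3)
n = 2 ** 16
dB = rng.normal(0.0, np.sqrt(1.0 / n), n)        # increments of B on [0, 1]
dB_prime = rng.normal(0.0, np.sqrt(1.0 / n), n)  # independent increments of B'

rho = 0.6  # illustrative correlation parameter (an assumption, not from the text)
dN = rho * dB + np.sqrt(1.0 - rho ** 2) * dB_prime

cov_independent = discrete_covariation(dB, dB_prime)
cov_correlated = discrete_covariation(dB, dN)
print(cov_independent, cov_correlated)
print(discrete_polarization(dB, dN) - cov_correlated)
```

The first value should be close to $0$ and the second close to $\rho$, while the last line shows that the polarization formula and the direct sum of products agree up to rounding.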
This example is fundamental: it says that the covariation is the stochastic analogue of covariance. Independent Brownian motions have zero covariation, just as independent square-integrable random variables have zero covariance. In Chapter 3, Lévy's characterization of Brownian motion will provide a converse: continuous local martingales $M^1, \dots, M^d$ starting at $0$ with $\langle M^i \rangle_t = t$ and $\langle M^i, M^j \rangle = 0$ for $i \ne j$ are independent standard Brownian motions.
### The Kunita–Watanabe Inequality
The covariation $\langle M, N \rangle$ is a continuous process of finite variation, so each path defines a signed measure $d\langle M, N \rangle$ on $[0, \infty)$ via the Lebesgue–Stieltjes construction, with total variation measure $|d\langle M, N \rangle|$. A natural question is how integrals against $|d\langle M, N \rangle|$ compare to integrals against $d\langle M \rangle$ and $d\langle N \rangle$ separately. The answer is a Cauchy–Schwarz inequality at the level of these random measures.
[quotetheorem:2087]
[citeproof:2087]
The principal application of the Kunita–Watanabe inequality in Chapter 3 is to establish that the stochastic integral $\int H \, dM$ is well-defined whenever $H$ is previsible and $\int_0^\infty H_s^2 \, d\langle M \rangle_s < \infty$ a.s. The inequality also shows that $\langle M, N \rangle$ is absolutely continuous with respect to both $\langle M \rangle$ and $\langle N \rangle$ as measures on $[0, \infty)$, a fact that will be used in the proof of Itô's formula.
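A discrete analogue may help fix ideas. The sketch below assumes the inequality has the usual form $\int |H_s||K_s|\,|d\langle M, N \rangle_s| \le (\int H_s^2\, d\langle M \rangle_s)^{1/2} (\int K_s^2\, d\langle N \rangle_s)^{1/2}$, and replaces each bracket by sums of increment products on a fixed grid, where the statement reduces to the ordinary Cauchy–Schwarz inequality. The arrays `H`, `K`, `dM`, `dN` are illustrative random data, not objects from the text.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4096
dM = rng.normal(0.0, np.sqrt(1.0 / n), n)              # increments of a path M
dN = 0.5 * dM + rng.normal(0.0, np.sqrt(1.0 / n), n)   # increments of a correlated path N
H = rng.uniform(-2.0, 2.0, n)  # arbitrary integrand values on the grid
K = rng.uniform(-2.0, 2.0, n)

# Discrete stand-in for the integral of |H||K| against |d<M, N>|.
lhs = float(np.sum(np.abs(H * K * dM * dN)))
# Discrete stand-ins for the integrals of H^2 d<M> and K^2 d<N>.
rhs = float(np.sqrt(np.sum(H ** 2 * dM ** 2)) * np.sqrt(np.sum(K ** 2 * dN ** 2)))
print(lhs, rhs)
```

On a finite grid the inequality `lhs <= rhs` always holds; the content of the theorem is that it survives the passage to the limit for the random measures themselves.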
## Semimartingale
The theory developed in Sections 2.1–2.5 has produced two classes of processes that behave very differently under integration: finite variation processes (Section 2.1), which are handled by the ordinary Lebesgue–Stieltjes integral, and continuous local martingales (Sections 2.2–2.5), which require the stochastic integral. A semimartingale is a process that is, in a precise sense, a sum of one from each class. This is the natural class on which Itô's stochastic integral can be defined.
[definition: Continuous Semimartingale]
A continuous adapted process $X = (X_t)_{t \geq 0}$ on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, \mathbb{P})$ is a **continuous semimartingale** if it admits a decomposition
\begin{align*}
X_t = X_0 + M_t + A_t,
\end{align*}
where:
- $X_0 \in \mathcal{F}_0$ (the initial value is $\mathcal{F}_0$-measurable),
- $M = (M_t)_{t \geq 0}$ is a continuous local martingale with $M_0 = 0$,
- $A = (A_t)_{t \geq 0}$ is a continuous finite variation process with $A_0 = 0$.
This decomposition is unique up to indistinguishability.
[/definition]
The uniqueness of the decomposition is not automatic and deserves a brief explanation. If $X = X_0 + M + A = X_0 + M' + A'$ are two such decompositions, then $M - M' = A' - A$. The left side is a continuous local martingale and the right side is a continuous finite variation process. Any continuous local martingale that is simultaneously of finite variation is constant (indistinguishable from $0$, since both processes start at $0$). This was established in Section 2.2 and is the key reason the decomposition is unique.
Requiring $M_0 = 0$ and $A_0 = 0$ is a normalization convention. The full initial value is carried by $X_0 \in \mathcal{F}_0$, which need not be deterministic. This convention ensures the decomposition is unique — without it, one could shift mass between $M_0$ and $A_0$ freely.
### Quadratic Variation of a Semimartingale
The quadratic variation of a semimartingale is inherited entirely from its martingale part. This is because finite variation processes do not contribute to quadratic variation — they are too "smooth" to accumulate fluctuations along dyadic partitions.
[definition: Quadratic Variation and Covariation of Semimartingales]
Let $X = X_0 + M + A$ and $Y = Y_0 + N + B$ be continuous semimartingales. Define:
\begin{align*}
\langle X \rangle_t &= \langle M \rangle_t, \\
\langle X, Y \rangle_t &= \langle M, N \rangle_t.
\end{align*}
[/definition]
These definitions are consistent precisely because $A$ and $B$ are finite variation processes: they contribute nothing to the Riemann-sum approximations used in the u.c.p. convergence of Section 2.4 and property (iii) of the covariation theorem.
To see this concretely, note that for any finite variation process $A$ with continuous paths, the sum $\sum_i (A_{i \cdot 2^{-n}} - A_{(i-1) \cdot 2^{-n}})^2$ converges to $0$ as $n \to \infty$. This is because the increments of $A$ are bounded by $\sup_i |A_{i \cdot 2^{-n}} - A_{(i-1) \cdot 2^{-n}}| \to 0$ by uniform continuity, while the sum of their absolute values is bounded by the total variation of $A$. Consequently, the cross terms involving $A$ vanish in the limit as well.
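The bound behind this argument is $\sum_i (\Delta_i A)^2 \le \sup_i |\Delta_i A| \cdot \sum_i |\Delta_i A|$. The sketch below checks it for one illustrative path, $A_t = \sin(2\pi t)$ on $[0,1]$, which is smooth and has total variation $4$. The choice of path and the helper names are assumptions made for the example.

```python
import numpy as np

def squared_increment_sum(f, level):
    """Sum of squared increments of f over the dyadic partition of [0, 1] with mesh 2^{-level}."""
    grid = np.linspace(0.0, 1.0, 2 ** level + 1)
    return float(np.sum(np.diff(f(grid)) ** 2))

def increment_bound(f, level):
    """(largest absolute increment) x (sum of absolute increments) on the same partition."""
    grid = np.linspace(0.0, 1.0, 2 ** level + 1)
    inc = np.abs(np.diff(f(grid)))
    return float(inc.max() * inc.sum())

def A(t):
    # Illustrative continuous finite variation path (not from the text).
    return np.sin(2.0 * np.pi * t)

levels = (2, 6, 10, 14)
sums = [squared_increment_sum(A, level) for level in levels]
bounds = [increment_bound(A, level) for level in levels]
for level, s, b in zip(levels, sums, bounds):
    print(f"level {level:2d}: sum of squares = {s:.6f}, bound = {b:.6f}")
```

The sums of squares should decrease towards $0$ as the partition is refined, always staying below the bound, in contrast with the Brownian case where they stabilise near $t$.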
[quotetheorem:2088]
[citeproof:2088]
The semimartingale class is the correct domain of integration for stochastic calculus. Chapter 3 will define the Itô integral $\int_0^t H_s \, dX_s$ for a semimartingale $X$ and suitable previsible $H$ by splitting the integral into a stochastic part $\int H \, dM$ (requiring the Hilbert space machinery of $\mathcal{M}^2_c$ and the Itô isometry) and a Lebesgue–Stieltjes part $\int H \, dA$ (handled by classical analysis).
The covariation $\langle X, Y \rangle$ then enters Itô's formula as the correction term that accounts for the second-order behaviour of the stochastic integral — the term that distinguishes stochastic calculus from ordinary calculus. Specifically, for smooth $f$ and semimartingales $X, Y$, the Itô correction involves $\langle X, Y \rangle$ through the second derivatives of $f$. The Riemann-sum characterization of $\langle X, Y \rangle$ is what makes this concrete: it shows that the covariation is the "second-order part" of the product $X_t Y_t$, the limit of the discrete quadratic cross-variation along dyadic partitions.
The semimartingale decomposition $X = X_0 + M + A$ separates the finite variation and martingale parts, with quadratic variation measuring the roughness of the martingale component. Chapter 3 harnesses this decomposition to define the Itô integral, prove Itô's formula, and derive fundamental results like Lévy's characterization and Girsanov's theorem.
# The Stochastic Integral
Chapter 3 constructs the stochastic integral $\int_0^t H_s\, dM_s$ for continuous square-integrable martingales $M \in \mathcal{M}^2_c$. The strategy mirrors the construction of the Lebesgue integral: define the integral on a tractable dense subclass, establish an isometry, and extend by continuity. Chapter 2 produced the quadratic variation process $\langle M \rangle$ and the space $\mathcal{M}^2_c$; these are the raw materials on which the construction rests.
[motivation]
**The construction strategy.** A naive attempt to define $\int_0^t H_s\, dM_s$ by Riemann sums fails because a typical continuous martingale has infinite total variation on every interval, so the sum does not converge absolutely. The key insight is that $M$ has finite *quadratic* variation, and this quadratic variation controls the size of the integral in an $L^2$ sense rather than a pathwise sense.
**Step 1: Simple processes.** We first define the integral for the class $\mathcal{E}$ of simple (elementary) processes — those that are piecewise constant in time with $\mathcal{F}_{t_i}$-measurable values. For these, the integral is an explicit finite sum and poses no convergence issues. The key result here is the **Itô isometry on $\mathcal{E}$**: the $\mathcal{M}^2$ norm of $H \cdot M$ equals the $L^2(\langle M \rangle)$ norm of $H$. This identity relates two entirely different norms and is the engine of the extension.
**Step 2: The space $L^2(M)$ and density.** We define $L^2(M)$ to be the Hilbert space of previsible processes square-integrable against the measure $d\mathbb{P}\, d\langle M \rangle$. The isometry established on $\mathcal{E}$ means the integration map $H \mapsto H \cdot M$ is an isometry $\mathcal{E} \to \mathcal{M}^2_c$ with respect to the $L^2(M)$ and $\mathcal{M}^2$ norms, respectively. Since $\mathcal{E}$ is dense in $L^2(M)$, the isometry extends uniquely to all of $L^2(M)$.
**Step 3: Extension by density.** For a general $H \in L^2(M)$, one picks simple approximations $H^n \to H$ in $L^2(M)$ and defines $H \cdot M := \lim_{n \to \infty} H^n \cdot M$ in $\mathcal{M}^2_c$. The isometry guarantees this limit is independent of the choice of approximating sequence.
[/motivation]
## Simple Processes
The starting point is identifying a class of integrands for which the stochastic integral can be defined combinatorially. The role is played by processes that are piecewise constant in time, analogous to step functions in Lebesgue integration.
[definition: Simple Process]
The space of **simple processes** $\mathcal{E}$ consists of all functions $H : \Omega \times [0, \infty) \to \mathbb{R}$ that can be written in the form
\begin{align*}
H_t(\omega) = \sum_{i=1}^{n} H_{i-1}(\omega)\, \mathbb{1}_{(t_{i-1}, t_i]}(t)
\end{align*}
for some finite partition $0 \leq t_0 < t_1 < \cdots < t_n$ and bounded random variables $H_{i-1} \in \mathcal{F}_{t_{i-1}}$.
[/definition]
The measurability condition $H_{i-1} \in \mathcal{F}_{t_{i-1}}$ is essential: it says the integrand's value on the interval $(t_{i-1}, t_i]$ is determined by information available at time $t_{i-1}$, before the interval begins. This is the discrete analogue of previsibility — the integrand cannot look into the future.
[definition: Stochastic Integral for Simple Processes]
For $M \in \mathcal{M}^2_c$ and $H \in \mathcal{E}$ written as above, the **stochastic integral** $(H \cdot M)_t$ is defined by
\begin{align*}
\int_0^t H_s\, dM_s = (H \cdot M)_t = \sum_{i=1}^{n} H_{i-1}(M_{t_i \wedge t} - M_{t_{i-1} \wedge t}).
\end{align*}
[/definition]
This is precisely the Itô–Riemann sum: on each interval $(t_{i-1}, t_i]$, we weight the increment $M_{t_i} - M_{t_{i-1}}$ by the value $H_{i-1}$ chosen at the left endpoint. The same formula, applied path by path to a finite variation integrator in place of $M$, is the Lebesgue–Stieltjes integral of the step function $H$, so the definition is consistent in form with the classical theory developed in Chapter 1.
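For a single realised path, the defining sum is easy to evaluate. The following sketch assumes the partition, the realised values of $H_{i-1}$, and the path $s \mapsto M_s$ are given as plain numbers and a callable. The function name `simple_integral` and the sample data are hypothetical, and the deterministic path $s^2$ is used only to make the arithmetic checkable by hand.

```python
import numpy as np

def simple_integral(times, values, M, t):
    """(H . M)_t for H = sum_i values[i-1] * 1_{(times[i-1], times[i]]}.

    times  : increasing sequence t_0 < t_1 < ... < t_n
    values : sequence of length n; values[i-1] is the realised H_{i-1}
    M      : callable returning realised path values M_s (vectorised over s)
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    upper = np.minimum(times[1:], t)   # t_i ∧ t
    lower = np.minimum(times[:-1], t)  # t_{i-1} ∧ t
    return float(np.sum(values * (M(upper) - M(lower))))

def path(s):
    # Illustrative deterministic path, standing in for one realisation of M.
    return np.asarray(s, dtype=float) ** 2

times = [0.0, 1.0, 2.0, 3.0]
values = [2.0, -1.0, 0.5]

print(simple_integral(times, values, path, 2.5))  # 2*1 - 1*3 + 0.5*2.25 = 0.125
```

Intervals that lie entirely after $t$ contribute nothing because both endpoints are truncated to $t$, and the interval containing $t$ contributes only the increment up to $t$. This is what makes $t \mapsto (H \cdot M)_t$ continuous whenever $M$ is.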
The first substantial result shows that $H \cdot M$ inherits the structure of $M$: it is again a continuous square-integrable martingale, and the $\mathcal{M}^2$ norm of $H \cdot M$ is computed entirely in terms of $H$ and the quadratic variation $\langle M \rangle$.
[quotetheorem:2089]
[citeproof:2089]
The identity $\langle H \cdot M \rangle_t = \int_0^t H_s^2\, d\langle M \rangle_s$ is worth pausing on: the quadratic variation of the stochastic integral is the ordinary Lebesgue–Stieltjes integral of $H^2$ against $d\langle M \rangle$. This is a precise, pathwise statement — not just a norm identity — and it governs how the integral interacts with brackets throughout the rest of the theory.
A companion result shows that the stochastic integral commutes with the bracket operation. It will be used repeatedly once we extend the integral to general processes.
[quotetheorem:2090]
[citeproof:2090]
## Itô Isometry
The isometry established on $\mathcal{E}$ suggests the right target space for extending the integral: the Hilbert space $L^2(M)$ of previsible processes square-integrable against the measure $d\mathbb{P}\, d\langle M \rangle$ on $\Omega \times [0, \infty)$.
[definition: The Space $L^2(M)$]
Let $M \in \mathcal{M}^2_c$. The space $L^2(M)$ consists of (equivalence classes of) previsible processes $H : \Omega \times [0, \infty) \to \mathbb{R}$ satisfying
\begin{align*}
\|H\|_{L^2(M)}^2 = \mathbb{E}\!\left[\int_0^\infty H_s^2\, d\langle M \rangle_s\right] < \infty.
\end{align*}
The inner product is $(H, K)_{L^2(M)} = \mathbb{E}[\int_0^\infty H_s K_s\, d\langle M \rangle_s]$.
[/definition]
Concretely, $L^2(M) = L^2(\Omega \times [0, \infty), \mathcal{P}, d\mathbb{P}\, d\langle M \rangle)$, where $\mathcal{P}$ is the previsible $\sigma$-algebra. Being an $L^2$ space over a measure space, it is a Hilbert space. The Itô isometry on $\mathcal{E}$ can now be restated cleanly: the map $\mathcal{E} \to \mathcal{M}^2_c$, $H \mapsto H \cdot M$, is an isometry for the $L^2(M)$ and $\mathcal{M}^2$ norms. To extend it to all of $L^2(M)$, we need $\mathcal{E}$ to be dense.
[quotetheorem:2091]
[citeproof:2091]
With density in hand, the extension is a consequence of abstract Hilbert space theory: any linear isometry from a dense subspace of a normed space into a complete normed space extends uniquely to a linear isometry on the whole space.
[quotetheorem:2092]
[citeproof:2092]
[definition: Stochastic Integral]
For $H \in L^2(M)$ and $M \in \mathcal{M}^2_c$, the process $H \cdot M$ given by the Itô isometry is called the **stochastic integral of $H$ with respect to $M$**, written
\begin{align*}
(H \cdot M)_t = \int_0^t H_s\, dM_s.
\end{align*}
[/definition]
The Itô isometry has several important structural consequences. The first concerns products of stochastic integrals.
[quotetheorem:2093]
[citeproof:2093]
Taking expectations at a fixed time $t$ gives the covariance formula: for $H \in L^2(M)$ and $K \in L^2(N)$,
\begin{align*}
\mathbb{E}\!\left[\int_0^t H_s\, dM_s\right] = 0, \qquad \mathbb{E}\!\left[\int_0^t H_s\, dM_s \int_0^t K_s\, dN_s\right] = \mathbb{E}\!\left[\int_0^t H_s K_s\, d\langle M, N \rangle_s\right].
\end{align*}
The first identity holds because $H \cdot M$ is a martingale starting at zero; the second is the generalized Itô isometry. These identities are indispensable: they reduce computations of second moments and covariances of stochastic integrals to expectations of ordinary Lebesgue–Stieltjes integrals against $d\langle M, N \rangle$.
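Both identities can be checked by simulation in the simplest case $M = N = W$, $H = K = W$ on $[0, 1]$. The sketch below is a Monte Carlo estimate under these assumptions, with left-endpoint sums on a uniform grid standing in for the stochastic integral. Sample sizes and the seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, n_steps = 10000, 200
dt = 1.0 / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)
# Values of W at the left endpoint of each grid interval (W_0 = 0).
W_left = np.concatenate([np.zeros((n_paths, 1)), W[:, :-1]], axis=1)

# Left-endpoint sums approximating int_0^1 W_s dW_s, one per path.
integral = np.sum(W_left * dW, axis=1)
# Riemann sums approximating int_0^1 W_s^2 ds, one per path.
bracket_integral = np.sum(W_left ** 2, axis=1) * dt

mean_integral = float(np.mean(integral))
second_moment = float(np.mean(integral ** 2))
expected_bracket = float(np.mean(bracket_integral))
print(mean_integral, second_moment, expected_bracket)
```

The sample mean should be near $0$, and the sample second moment should be near the sample mean of $\int_0^1 W_s^2\, ds$, both close to $1/2$. Agreement is only up to Monte Carlo and discretisation error.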
A further corollary addresses composition of stochastic integrals: integrating $K$ against $H \cdot M$ is the same as integrating $KH$ against $M$.
[quotetheorem:2094]
[citeproof:2094]
[remark: Martingale Preservation]
The most important feature of the Itô integral is that it maps $L^2(M)$ into $\mathcal{M}^2_c$: integrating a previsible process against a continuous square-integrable martingale produces another continuous square-integrable martingale. After Itô's formula is established in Section 3.5, this will become a systematic tool for recognizing martingales. For example, that $B^2_t - t$ is a martingale, where $B_t$ is a standard Brownian motion, will follow by writing it as the stochastic integral $2\int_0^t B_s\, dB_s$ and applying this preservation property on each finite time horizon.
[/remark]
## Extension to Local Martingales
The Itô isometry of Section 3.2 gives a well-defined stochastic integral $H \cdot M$ whenever $M \in \mathcal{M}^2_c$ and $H \in L^2(M)$, the space of previsible processes square-integrable against $\langle M \rangle$. This is already a substantial achievement, but it is too restrictive for the applications we have in mind. When we later prove Itô's formula, the processes that appear naturally on the right-hand side are continuous local martingales — not necessarily square-integrable ones. We therefore need to extend the integral to this broader class, using a localisation argument that reduces everything to the already-solved case.
### The Space $L^2_{\mathrm{loc}}(M)$
The key observation is that the square-integrability condition $\mathbb{E}\!\left[\int_0^\infty H_s^2\, d\langle M \rangle_s\right] < \infty$ is far too strong for our purposes. What we really need is a condition that holds locally — that is, after stopping at a suitable sequence of times.
[definition: Space L2loc(M)]
Let $M$ be a continuous local martingale. Define $L^2_{\mathrm{loc}}(M)$ to be the space of previsible processes $H$ such that
\begin{align*}
\int_0^t H_s^2\, d\langle M \rangle_s < \infty \quad \text{a.s.}
\end{align*}
for every finite $t > 0$.
[/definition]
The condition is now almost-sure finiteness for each fixed $t$, not integrability in expectation. This is a much weaker requirement: for instance, any adapted continuous process $H$ belongs to $L^2_{\mathrm{loc}}(M)$ for every continuous local martingale $M$, since the integrand $H_s^2$ is bounded on $[0,t]$ by the supremum $\sup_{s \leq t} H_s^2$, which is finite a.s. by continuity.
[remark: Comparison with $L^2(M)$]
The space $L^2(M)$ used in Section 3.2 consists of previsible $H$ satisfying $\mathbb{E}\!\left[\int_0^\infty H_s^2\, d\langle M \rangle_s\right] < \infty$. This implies $\int_0^t H_s^2\, d\langle M \rangle_s < \infty$ a.s. for all $t$, so $L^2(M) \subset L^2_{\mathrm{loc}}(M)$. The converse fails: the condition in $L^2_{\mathrm{loc}}(M)$ involves no expectation and no global bound over all time.
[/remark]
### Existence and Characterisation of the Integral
The extension proceeds by constructing a sequence of stopping times that simultaneously tame the local martingale $M$ (making it square-integrable) and the integrand $H$ (making it square-integrable against the stopped bracket). On each stopped interval the integral is already defined by Section 3.2; the key is to check that the local definitions are compatible and fit together into a global process.
[quotetheorem:2114]
Property (i) is the bracket characterisation, which extends the Itô isometry identity $\langle H \cdot M, H \cdot M \rangle = H^2 \cdot \langle M \rangle$ to the local setting. This formula replaces the global isometry identity and is what makes the extension unique. Properties (ii) and (iii) express the correct behaviour under stopping and composition of integrals.
[citeproof:2114]
### Why Localisation Works
The power of the localisation argument is that it converts a potentially difficult global construction into a sequence of simpler problems, each of which falls within the scope of the $\mathcal{M}^2_c$ theory. The stopping times $S_n$ are chosen to control both the integrand and the integrator simultaneously: the condition $\int_0^t (1 + H_s^2)\, d\langle M \rangle_s \leq n$ ensures that $\langle M \rangle$ is bounded by $n$ (keeping $M^{S_n}$ in $\mathcal{M}^2_c$) and that $H$ is in $L^2(M^{S_n})$.
[example: Integral of a Brownian Functional]
Let $W$ be a standard Brownian motion and set $H_t = W_t$. Then $M = W$ is a continuous local martingale with $\langle W \rangle_t = t$, and
\begin{align*}
\int_0^t H_s^2\, d\langle W \rangle_s = \int_0^t W_s^2\, ds.
\end{align*}
For any fixed $t$, the integral $\int_0^t W_s^2\, ds$ is finite a.s. since $s \mapsto W_s^2$ is continuous and hence bounded on $[0,t]$ with probability one. So $H = W \in L^2_{\mathrm{loc}}(W)$. However, $\mathbb{E}\!\left[\int_0^T W_s^2\, ds\right] = \int_0^T s\, ds = T^2/2$, which is finite for each $T$ but diverges as $T \to \infty$, so $W \notin L^2(W)$. The localisation sequence is $S_n = \inf\{t \geq 0 : \int_0^t (1 + W_s^2)\, ds \geq n\}$, and on each stopped interval $[0, S_n]$ we have $\int_0^{S_n} W_s^2\, ds \leq n$, so the Itô isometry applies to give $W \cdot W^{S_n}$. Assembling these gives $\int_0^t W_s\, dW_s$, which Itô's formula (Section 3.5) will identify as $\frac{1}{2}(W_t^2 - t)$.
[/example]
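The example can be explored on a simulated path. The sketch below assumes a uniform grid on $[0,1]$ and uses left-endpoint sums as a stand-in for $\int_0^1 W_s\, dW_s$. It compares them with $\frac{1}{2}(W_1^2 - 1)$ and with the exact discrete identity $\sum_i W_{t_{i-1}} \Delta_i W = \frac{1}{2}\bigl(W_1^2 - \sum_i (\Delta_i W)^2\bigr)$, and then locates a grid approximation of the stopping time $S_1$. Variable names and the seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n_steps = 2 ** 14
t_final = 1.0
dt = t_final / n_steps

dW = rng.normal(0.0, np.sqrt(dt), n_steps)
W = np.concatenate([[0.0], np.cumsum(dW)])

# Left-endpoint sum approximating int_0^1 W_s dW_s.
ito_sum = float(np.sum(W[:-1] * dW))
# Value predicted by Ito's formula at t = 1.
closed_form = 0.5 * (W[-1] ** 2 - t_final)
# Exact discrete counterpart: the sum of squared increments replaces t.
discrete_identity = 0.5 * (W[-1] ** 2 - float(np.sum(dW ** 2)))

# Grid approximation of S_1 = inf{t : int_0^t (1 + W_s^2) ds >= 1}.
running = np.cumsum((1.0 + W[:-1] ** 2) * dt)
reached = running >= 1.0
S_1 = (int(np.argmax(reached)) + 1) * dt if reached.any() else float("inf")

print(ito_sum, closed_form, discrete_identity, S_1)
```

The gap between `ito_sum` and `closed_form` is exactly half the difference between the sum of squared increments and $1$, so it shrinks as the grid is refined. Since the integrand $1 + W_s^2$ is at least $1$, the approximate $S_1$ cannot exceed $1$.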
## Extension to Semimartingales
With the stochastic integral now defined for continuous local martingale integrators, we turn to the final extension: integrating against a continuous semimartingale. Recall from Chapter 2 that a continuous semimartingale is a process of the form $X = X_0 + M + A$, where $X_0 \in \mathcal{F}_0$, $M$ is a continuous local martingale, and $A$ is a continuous finite variation process (equivalently, a process that can be written as the difference of two continuous non-decreasing processes). The integral against $X$ decomposes into an Itô integral against $M$ and a Lebesgue–Stieltjes integral against $A$; the question is what class of integrands can be handled.
### Locally Bounded Previsible Processes
The condition $H \in L^2_{\mathrm{loc}}(M)$ is the right integrability requirement for the martingale component. For the finite variation component, the natural condition is local boundedness of the integrand.
[definition: Locally Bounded Previsible Process]
A previsible process $H$ is **locally bounded** if for every $t \geq 0$,
\begin{align*}
\sup_{s \leq t} |H_s| < \infty \quad \text{a.s.}
\end{align*}
[/definition]
Two facts about locally bounded processes make them the right class for integrating against semimartingales:
(i) Every adapted continuous process is locally bounded, since a continuous function on $[0,t]$ attains its supremum.
(ii) If $H$ is locally bounded and $A$ is a finite variation process, then for all $t \geq 0$,
\begin{align*}
\int_0^t |H_s|\, |dA_s| < \infty \quad \text{a.s.}
\end{align*}
Indeed, if $|H_s| \leq C_t$ a.s. on $[0,t]$, then the integral is bounded by $C_t \cdot \|A\|_{[0,t]}$, where $\|A\|_{[0,t]}$ denotes the total variation of $A$ on $[0,t]$, which is finite a.s. since $A$ is a finite variation process.
[remark: Locally Bounded Implies L2loc(M)]
If $H$ is locally bounded and $M$ is a continuous local martingale, then $H \in L^2_{\mathrm{loc}}(M)$. Indeed, $\int_0^t H_s^2\, d\langle M \rangle_s \leq (\sup_{s \leq t} H_s^2) \cdot \langle M \rangle_t$. Both factors are a.s. finite for each $t$: the first by local boundedness, the second because $\langle M \rangle_t$ is an increasing adapted process that is finite a.s. at each time. So locally bounded integrands can be integrated against any component of a semimartingale.
[/remark]
### The Stochastic Integral against a Semimartingale
The definition of the semimartingale integral is pleasantly direct: split $X$ into its martingale and finite variation parts, integrate each separately, and add.
[definition: Stochastic Integral against a Semimartingale]
Let $X = X_0 + M + A$ be a continuous semimartingale, and let $H$ be a locally bounded previsible process. The **stochastic integral** $H \cdot X$ is the continuous semimartingale defined by
\begin{align*}
H \cdot X = H \cdot M + H \cdot A,
\end{align*}
where $H \cdot M$ is the local martingale integral from Section 3.3 and $H \cdot A$ is the pathwise Lebesgue–Stieltjes integral. We write
\begin{align*}
(H \cdot X)_t = \int_0^t H_s\, dX_s.
\end{align*}
[/definition]
For general (possibly discontinuous) semimartingales, the decomposition into a local martingale and a finite variation part is not unique: a martingale with finite variation paths, such as a compensated Poisson process, can be moved between the two parts. For continuous semimartingales, with the normalisation $M_0 = A_0 = 0$, this cannot happen. If $X_0 + M + A$ and $X_0 + M' + A'$ are two such decompositions, then $M - M' = A' - A$ is a continuous local martingale with finite variation paths starting from zero, and such a process must be identically zero. So the decomposition is unique among decompositions with $M$ a continuous local martingale, and the stochastic integral is well-defined.
### Properties of the Semimartingale Integral
The semimartingale integral inherits all the structural properties we expect from an integral.
[quotetheorem:2095]
[citeproof:2095]
These properties confirm that the semimartingale integral behaves as a genuine integral: bilinearity allows linear combinations to be integrated termwise, associativity means that integrating a product $HK$ against $X$ is the same as first integrating $K$ against $X$ and then integrating $H$ against the result, and the stopping property ensures the integral is consistent with localization. Together they enable the chain rule of stochastic calculus — Itô's formula — to be derived cleanly in the next section.
### Stochastic Dominated Convergence
A key tool for computations with semimartingale integrals is the stochastic dominated convergence theorem, which provides conditions under which convergence of integrands implies convergence of integrals. This extends the classical dominated convergence theorem to the stochastic setting, with convergence in probability replacing almost sure convergence.
[quotetheorem:2096]
[citeproof:2096]
### Riemann Approximation of the Integral
An important consequence of the dominated convergence theorem is that the semimartingale integral can be approximated by Riemann sums, just as in the classical theory.
[quotetheorem:2072]
[citeproof:2072]
The Riemann sum approximation is important both conceptually (it confirms that the Itô integral is the natural left-endpoint Riemann–Stieltjes limit along the semimartingale) and practically (it is the starting point for deriving Itô's formula in Section 3.5 via Taylor expansion of the Riemann sums). Notice the crucial role of left endpoints: had we used right endpoints or midpoints, we would in general obtain different limits. The Stratonovich integral is the midpoint limit; when the integrand $H$ is itself a continuous semimartingale, it differs from the Itô integral $\int_0^t H_s\, dX_s$ by the correction term $\frac{1}{2}\langle H, X \rangle_t$.
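The dependence on the evaluation point can be seen numerically. The following is a minimal sketch, assuming a uniform grid on $[0, 1]$ and Gaussian increments drawn with the standard library; the helper names `brownian_increments` and `riemann_sums` are illustrative only. It compares left- and right-endpoint sums for the integrand $H = W$ along one simulated path.

```python
import math
import random

def brownian_increments(n_steps, horizon, rng):
    """Independent N(0, dt) increments of a Brownian path on [0, horizon]."""
    dt = horizon / n_steps
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]

def riemann_sums(increments):
    """Left- and right-endpoint sums of W dW along one discretised path."""
    w = 0.0
    left = right = 0.0
    for dw in increments:
        left += w * dw            # integrand evaluated at t_i (Ito convention)
        right += (w + dw) * dw    # integrand evaluated at t_{i+1}
        w += dw
    return left, right, w

rng = random.Random(0)            # fixed seed, for reproducibility only
T, n = 1.0, 20_000
dW = brownian_increments(n, T, rng)
left, right, W_T = riemann_sums(dW)
realised_qv = sum(dw * dw for dw in dW)   # discrete quadratic variation

print("left sum      :", left)
print("(W_T^2 - T)/2 :", 0.5 * (W_T**2 - T))
print("right - left  :", right - left)
print("realised QV   :", realised_qv)
```

On any partition the two sums differ by exactly the realised quadratic variation $\sum_i (\Delta W_i)^2$, which is close to $T$ on a fine grid, so the gap between the conventions does not shrink as the mesh is refined.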
[example: Bilinear Form and the Integration by Parts Formula]
Let $X$ and $Y$ be continuous semimartingales with $X = X_0 + M^X + A^X$ and $Y = Y_0 + M^Y + A^Y$. Then $XY$ is also a semimartingale. Applying Itô's formula (which we prove in Section 3.5) to $f(x,y) = xy$ yields the **integration by parts formula**:
\begin{align*}
X_t Y_t = X_0 Y_0 + \int_0^t X_s\, dY_s + \int_0^t Y_s\, dX_s + \langle M^X, M^Y \rangle_t.
\end{align*}
The term $\langle M^X, M^Y \rangle_t$ is the covariation of the martingale parts; it is absent in classical integration by parts and represents the genuinely stochastic correction. When $X$ or $Y$ is a finite variation process, the covariation term vanishes and the formula reduces to the classical Lebesgue–Stieltjes integration by parts. This formula, combined with the properties established in this section, gives a powerful calculus for manipulating stochastic integrals.
[/example]
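The integration by parts formula has an exact discrete counterpart, since $X_{i+1}Y_{i+1} - X_iY_i = X_i\,\Delta Y_i + Y_i\,\Delta X_i + \Delta X_i\,\Delta Y_i$ on every subinterval. The sketch below checks this on a simulated pair; the particular choice $X = W^1$ and $Y = 1 + t + \tfrac{1}{2}W^1 + W^2$ is an assumption made purely for illustration, with martingale covariation $\tfrac{1}{2}t$.

```python
import math
import random

rng = random.Random(1)            # fixed seed, for reproducibility only
n, T = 20_000, 1.0
dt = T / n

# Discretised semimartingales: X = W1, Y = 1 + t + 0.5*W1 + W2 (illustrative choice).
X, Y = [0.0], [1.0]
for _ in range(n):
    dw1 = rng.gauss(0.0, math.sqrt(dt))
    dw2 = rng.gauss(0.0, math.sqrt(dt))
    X.append(X[-1] + dw1)
    Y.append(Y[-1] + dt + 0.5 * dw1 + dw2)

# Left-endpoint sums approximating the two stochastic integrals.
int_X_dY = sum(X[i] * (Y[i + 1] - Y[i]) for i in range(n))
int_Y_dX = sum(Y[i] * (X[i + 1] - X[i]) for i in range(n))
# Sum of products of increments, approximating the covariation.
covariation = sum((X[i + 1] - X[i]) * (Y[i + 1] - Y[i]) for i in range(n))

lhs = X[-1] * Y[-1] - X[0] * Y[0]
rhs = int_X_dY + int_Y_dX + covariation
print("X_T Y_T - X_0 Y_0:", lhs)
print("sum of the terms :", rhs)
print("covariation      :", covariation, "(theory: 0.5 * T)")
```

Dropping the `covariation` term leaves a discrepancy of about $\tfrac{1}{2}T$ that persists however fine the grid is.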
## Itô's Formula
With the stochastic integral now defined for semimartingales, the most fundamental question in stochastic calculus becomes: if $X_t$ is a semimartingale and $f$ is a smooth function, what is $f(X_t)$? In ordinary calculus, the chain rule answers this immediately: $df(X_t) = f'(X_t)\, dX_t$. In stochastic calculus, this is wrong. The error is not a technicality — it reflects a deep structural fact about processes with non-trivial quadratic variation. The correction term, involving the second derivative of $f$ and the quadratic variation of $X$, is the entire content of stochastic calculus, and Itô's formula makes it precise.
We begin with a preliminary result that is itself a special case of Itô's formula and whose proof introduces the key idea.
### Integration by Parts for Semimartingales
[quotetheorem:2098]
[citeproof:2098]
[remark: The Itô Correction is Unavoidable]
In classical calculus, the product rule $d(xy) = x\, dy + y\, dx$ holds exactly because the cross term $(dx)(dy)$ is second order and vanishes. For semimartingales, however, the covariation $\langle X, Y \rangle_t$ is in general non-zero — for instance, when $X = Y = W$ is a standard Brownian motion, $\langle W, W \rangle_t = t$, which is neither zero nor negligible. Any formula for $d(X_t Y_t)$ that omits this term is simply incorrect.
[/remark]
[remark: Martingale Structure]
If $X$ and $Y$ are both (local) martingales, then the stochastic integrals $\int_0^t X_s\, dY_s$ and $\int_0^t Y_s\, dX_s$ are again (local) martingales. But the covariation $\langle X, Y \rangle_t$ is a process of finite variation — it is not a martingale in general. This forces the product $X_t Y_t$ out of the class of martingales, which is why the natural class for stochastic calculus is semimartingales, not martingales alone.
[/remark]
### The Itô Formula: Statement and Proof
[quotetheorem:2099]
[citeproof:2099]
[explanation: Why the Second-Order Term is the Entire Content of Stochastic Calculus]
The correction term $\frac{1}{2}\sum_{i,j} \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s)\, d\langle X^i, X^j \rangle_s$ deserves a careful conceptual explanation, because understanding it is the same as understanding stochastic calculus.
In ordinary calculus, the Taylor expansion of $f$ around a point $x_0$ is
\begin{align*}
f(x_0 + h) = f(x_0) + f'(x_0) h + \frac{1}{2} f''(x_0) h^2 + O(h^3).
\end{align*}
When $h = \Delta x$ is the increment of a smooth path over a small interval $[s, s + \Delta t]$, the increment $\Delta x = O(\Delta t)$, so $(\Delta x)^2 = O((\Delta t)^2)$, which is second order and vanishes in the limit. The chain rule $df = f'(x)\, dx$ is all that survives.
For Brownian motion, the story is fundamentally different. Brownian paths are Hölder continuous of exponent $1/2 - \varepsilon$ but not of exponent $1/2$, which means increments satisfy $|\Delta W| \approx \sqrt{\Delta t}$. Squaring: $(\Delta W)^2 \approx \Delta t$. This is first order in time, not second order. So the second-order term in the Taylor expansion does not vanish — it contributes at the same order as the first-order term.
To be precise: when one sums the Taylor expansion over a partition of $[0, t]$,
\begin{align*}
f(X_t) - f(X_0) &\approx \sum_i f'(X_{t_i}) \Delta X_i + \frac{1}{2} \sum_i f''(X_{t_i}) (\Delta X_i)^2 + \text{higher order}.
\end{align*}
The first sum converges to the stochastic integral $\int_0^t f'(X_s)\, dX_s$ by definition. The second sum is
\begin{align*}
\frac{1}{2} \sum_i f''(X_{t_i}) (\Delta X_i)^2,
\end{align*}
which — because $(\Delta X_i)^2 \approx \Delta\langle X \rangle_i$ — converges to $\frac{1}{2}\int_0^t f''(X_s)\, d\langle X \rangle_s$. This is precisely the Itô correction. The third-order and higher terms do genuinely vanish.
The slogan is: **in stochastic calculus, $(dX)^2 = d\langle X \rangle$ and $(dX)^k = 0$ for $k \ge 3$**. Itô's formula is the rigorous version of this heuristic. All the novelty of stochastic calculus compared to ordinary calculus is encoded in the single fact that quadratic variation is non-trivial.
[/explanation]
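The contrast between the two regimes is easy to observe. The sketch below, which assumes the smooth path $x(t) = \sin t$ and a simulated Brownian path on $[0, 1]$ as illustrative inputs, computes $\sum_i (\Delta x_i)^2$ on successively finer uniform grids.

```python
import math
import random

def sum_of_squared_increments(path):
    """Discrete quadratic variation of a sampled path."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))

def smooth_path(n):
    """x(t) = sin(t) sampled on a uniform grid of [0, 1]."""
    return [math.sin(i / n) for i in range(n + 1)]

def brownian_path(n, rng):
    """Brownian path on [0, 1] built from independent N(0, 1/n) increments."""
    dt = 1.0 / n
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

rng = random.Random(2)            # fixed seed, for reproducibility only
for n in (100, 1_000, 10_000):
    smooth = sum_of_squared_increments(smooth_path(n))
    rough = sum_of_squared_increments(brownian_path(n, rng))
    print(n, smooth, rough)
```

For the smooth path the sum decays like $1/n$, while for the Brownian path it stays close to $t = 1$.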
### Differential Form and the Multiplication Table
The formula is often written in differential form as
\begin{align*}
df(X_t) = \sum_{i=1}^p \frac{\partial f}{\partial x_i}(X_t)\, dX^i_t + \frac{1}{2}\sum_{i,j=1}^p \frac{\partial^2 f}{\partial x_i \partial x_j}(X_t)\, d\langle X^i, X^j \rangle_t.
\end{align*}
This is a notational shorthand for the integral identity, but it is extraordinarily useful in calculations. One works formally with the **multiplication table**:
\begin{align*}
dt \cdot dt &= 0, & dt \cdot dW_t &= 0, & dW_t \cdot dW_t &= dt,
\end{align*}
where $W_t$ is a standard Brownian motion. More generally, for independent Brownian motions $W^i$ and $W^j$, one has $dW^i_t \cdot dW^j_t = \mathbb{1}_{\{i=j\}} dt$. One also writes $dX_t\, dY_t = d\langle X, Y \rangle_t$.
[remark: Justification of the Multiplication Table]
The rules in the multiplication table are not axioms — they are theorems. Each encodes a convergence statement: for instance, $dW_t \cdot dW_t = dt$ means that $\sum_i (\Delta W_i)^2 \to t$ as the partition mesh tends to zero, which is the quadratic variation of Brownian motion. Writing $dX_t\, dY_t = d\langle X, Y \rangle_t$ is a compact expression of the polarization identity and the definition of the covariation process from Sections 2.4–2.5. The table is a mnemonic, but every step in a computation using it should be understood as a limit statement.
[/remark]
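As a quick illustration of how the table is used (a worked sketch, with $f(x) = x^4$ chosen only as an example), expand to second order along a standard Brownian motion $W$ and apply $dW_t \cdot dW_t = dt$:
\begin{align*}
d(W_t^4) = 4W_t^3\, dW_t + \tfrac{1}{2} \cdot 12\, W_t^2\, (dW_t)^2 = 4W_t^3\, dW_t + 6W_t^2\, dt.
\end{align*}
In integral form, $W_t^4 = 4\int_0^t W_s^3\, dW_s + 6\int_0^t W_s^2\, ds$. The stochastic integral is a true martingale here because $\mathbb{E}\int_0^t W_s^6\, ds < \infty$, so taking expectations gives $\mathbb{E}[W_t^4] = 6\int_0^t s\, ds = 3t^2$, in agreement with the fourth moment of an $N(0, t)$ random variable.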
### Key Applications
[example: Brownian Motion Squared]
Let $W_t$ be a standard Brownian motion with $W_0 = 0$, and take $f(x) = x^2$. Then $f'(x) = 2x$ and $f''(x) = 2$. Since $\langle W, W \rangle_t = t$, Itô's formula gives
\begin{align*}
W_t^2 = W_0^2 + \int_0^t 2W_s\, dW_s + \frac{1}{2}\int_0^t 2\, d\langle W \rangle_s = 0 + 2\int_0^t W_s\, dW_s + t.
\end{align*}
Rearranging:
\begin{align*}
W_t^2 - t = 2\int_0^t W_s\, dW_s.
\end{align*}
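The right side is a stochastic integral against a local martingale, hence is itself a continuous local martingale. Since $\mathbb{E}\!\left[\int_0^t W_s^2\, ds\right] = t^2/2 < \infty$, the integrand lies in $L^2(W)$ on $[0,t]$ and the integral is in fact a true martingale. This recovers the classical fact that $W_t^2 - t$ is a martingale (one of the properties appearing in Lévy's characterization of Brownian motion). The classical proof requires a direct computation with conditional expectations; Itô's formula makes it a one-line calculation.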
In the differential notation, the same calculation reads: $d(W_t^2) = 2W_t\, dW_t + dW_t\, dW_t = 2W_t\, dW_t + dt$, which is the same identity.
[/example]
[example: The Itô Formula for Functions of Brownian Motion]
Let $f \in C^2(\mathbb{R})$ and $W_t$ a standard Brownian motion. The one-dimensional Itô formula specializes to
\begin{align*}
f(W_t) = f(W_0) + \int_0^t f'(W_s)\, dW_s + \frac{1}{2}\int_0^t f''(W_s)\, ds.
\end{align*}
In particular, $f(W_t)$ is a local martingale if and only if the finite variation term $\frac{1}{2}\int_0^t f''(W_s)\, ds$ vanishes identically. Since $f''$ is continuous and the Brownian path almost surely visits every point of $\mathbb{R}$, this forces $f'' \equiv 0$, that is, $f$ is affine. To obtain a local martingale from a non-affine $f$, one must use a time-dependent function $f(t, x)$, which leads to the Feynman–Kac formula and the connection between stochastic calculus and PDEs.
[/example]
[example: Space-Time Brownian Motion and the Heat Operator]
Let $B = (B^1, \ldots, B^d)$ be a $d$-dimensional Brownian motion. Consider the $\mathbb{R}^{d+1}$-valued semimartingale $X_t = (t, B^1_t, \ldots, B^d_t)$, so $X^0_t = t$ (the time component). The quadratic covariations are $\langle X^0, X^0 \rangle_t = 0$ (since $t$ has finite variation), $\langle X^0, X^i \rangle_t = 0$, and $\langle X^i, X^j \rangle_t = \mathbb{1}_{\{i=j\}} t$. For $f \in C^2(\mathbb{R}^{d+1})$, Itô's formula gives
\begin{align*}
f(t, B_t) - f(0, B_0) = \int_0^t \frac{\partial f}{\partial s}(s, B_s)\, ds + \sum_{i=1}^d \int_0^t \frac{\partial f}{\partial x_i}(s, B_s)\, dB^i_s + \frac{1}{2}\int_0^t \Delta_x f(s, B_s)\, ds,
\end{align*}
where $\Delta_x = \sum_{i=1}^d \partial^2/\partial x_i^2$ is the spatial Laplacian. Rearranging:
\begin{align*}
f(t, B_t) - f(0, B_0) - \int_0^t \left(\frac{\partial}{\partial s} + \frac{1}{2}\Delta_x\right) f(s, B_s)\, ds = \sum_{i=1}^d \int_0^t \frac{\partial f}{\partial x_i}(s, B_s)\, dB^i_s.
\end{align*}
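The left side therefore equals a sum of stochastic integrals against Brownian motion, and so is a continuous local martingale. In particular, if $f$ solves the backward heat equation $\partial_t f + \frac{1}{2}\Delta_x f = 0$, then $f(t, B_t)$ is a local martingale; for a time-independent $f$ this says that $f(B_t)$ is a local martingale whenever $f$ is harmonic. This is the starting point for the probabilistic proof of the mean value property of harmonic functions and the basis of the Feynman–Kac formula.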
[/example]
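A concrete one-dimensional instance may help fix the sign conventions. Take $d = 1$, a Brownian motion started at $B_0 = 0$, and the polynomial $f(t, x) = x^3 - 3tx$ (chosen here purely for illustration). Then
\begin{align*}
\frac{\partial f}{\partial t} = -3x, \qquad \frac{1}{2}\frac{\partial^2 f}{\partial x^2} = 3x, \qquad \left(\frac{\partial}{\partial t} + \frac{1}{2}\frac{\partial^2}{\partial x^2}\right) f = 0,
\end{align*}
so the $ds$-integral in the rearranged identity of the example above vanishes and
\begin{align*}
B_t^3 - 3tB_t = \int_0^t (3B_s^2 - 3s)\, dB_s
\end{align*}
is a continuous local martingale.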
[example: Geometric Brownian Motion]
Let $W_t$ be a standard Brownian motion and fix constants $\mu \in \mathbb{R}$, $\sigma > 0$. Define the process $S_t = S_0 \exp(\sigma W_t + \mu t)$ for $S_0 > 0$. This is the **geometric Brownian motion**, the fundamental model for asset prices in mathematical finance.
To find the SDE satisfied by $S_t$, take $f(x) = S_0 e^x$ and $X_t = \sigma W_t + \mu t$. Since $X_t$ is a semimartingale with $dX_t = \mu\, dt + \sigma\, dW_t$ and $d\langle X \rangle_t = \sigma^2\, dt$, Itô's formula gives
\begin{align*}
dS_t = f'(X_t)\, dX_t + \frac{1}{2} f''(X_t)\, d\langle X \rangle_t = S_t\, dX_t + \frac{1}{2} S_t\, \sigma^2\, dt.
\end{align*}
Substituting $dX_t = \mu\, dt + \sigma\, dW_t$:
\begin{align*}
dS_t = S_t(\mu\, dt + \sigma\, dW_t) + \frac{1}{2}\sigma^2 S_t\, dt = S_t\left(\mu + \frac{\sigma^2}{2}\right) dt + \sigma S_t\, dW_t.
\end{align*}
So $S_t$ satisfies the stochastic differential equation $dS_t = \left(\mu + \frac{\sigma^2}{2}\right) S_t\, dt + \sigma S_t\, dW_t$. The drift coefficient $\mu + \sigma^2/2$ differs from $\mu$: the extra $\sigma^2/2$ is precisely the Itô correction, and it is the reason one must distinguish in financial models between the drift $\mu$ of the log-price $\log S_t$ and the instantaneous drift $\mu + \sigma^2/2$ of the price $S_t$ itself.
[/example]
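The role of the correction can be checked by simulation. The sketch below uses an Euler–Maruyama discretisation (a standard scheme that is not developed in these notes and is assumed here only as a numerical device) to integrate the SDE with and without the extra $\sigma^2/2$, and compares both endpoints with the closed form $S_0 \exp(\sigma W_T + \mu T)$ driven by the same noise. The parameter values and the helper name `euler_endpoint` are illustrative.

```python
import math
import random

def euler_endpoint(s0, drift, sigma, increments, dt):
    """Euler-Maruyama endpoint for dS = drift * S dt + sigma * S dW."""
    s = s0
    for dw in increments:
        s += drift * s * dt + sigma * s * dw
    return s

rng = random.Random(3)            # fixed seed, for reproducibility only
s0, mu, sigma, T, n = 1.0, 0.05, 0.5, 1.0, 20_000
dt = T / n
dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]

# Closed-form value S_0 * exp(sigma * W_T + mu * T) on the same path.
exact = s0 * math.exp(sigma * sum(dW) + mu * T)
# Drift mu + sigma^2/2, as derived from Ito's formula.
with_correction = euler_endpoint(s0, mu + 0.5 * sigma**2, sigma, dW, dt)
# Drift mu alone, as a naive chain rule would suggest.
without_correction = euler_endpoint(s0, mu, sigma, dW, dt)

print("exact              :", exact)
print("with correction    :", with_correction)
print("without correction :", without_correction)
print("ratio without/exact:", without_correction / exact)
```

With the corrected drift the scheme tracks the closed form, while the uncorrected drift produces a value that is off by a factor close to $e^{-\sigma^2 T / 2}$.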
[example: The Exponential Martingale]
Let $M_t$ be a continuous local martingale with $M_0 = 0$. Define the **stochastic exponential** (or Doléans-Dade exponential) by
\begin{align*}
\mathcal{E}(M)_t = \exp\!\left( M_t - \frac{1}{2}\langle M \rangle_t \right).
\end{align*}
Applying Itô's formula to $f(x, y) = e^{x - y/2}$ along the semimartingale $(M_t, \langle M \rangle_t)$: since $\langle M \rangle$ has finite variation, its covariation with $M$ is $\langle M, \langle M \rangle \rangle = 0$, and so
\begin{align*}
d\mathcal{E}(M)_t &= \mathcal{E}(M)_t\, dM_t - \frac{1}{2}\mathcal{E}(M)_t\, d\langle M \rangle_t + \frac{1}{2}\mathcal{E}(M)_t\, d\langle M \rangle_t = \mathcal{E}(M)_t\, dM_t.
\end{align*}
That is, $\mathcal{E}(M)$ satisfies $d\mathcal{E}(M)_t = \mathcal{E}(M)_t\, dM_t$, which means $\mathcal{E}(M)$ is a continuous local martingale. (It is a true martingale under additional integrability conditions, such as the Novikov condition $\mathbb{E}[e^{\langle M \rangle_T / 2}] < \infty$.)
This computation shows that $\mathcal{E}(M)$ is the natural "exponential" in the stochastic world: while $e^{M_t}$ is not a local martingale (the Itô correction $\frac{1}{2}\int e^{M_s}\, d\langle M \rangle_s$ spoils it), the corrected version $\mathcal{E}(M)_t = e^{M_t - \langle M \rangle_t / 2}$ is. The stochastic exponential plays a central role in Girsanov's theorem (Section 3.7) and in the construction of equivalent probability measures.
[/example]
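A Monte Carlo check makes the difference between $e^{M_t}$ and $\mathcal{E}(M)_t$ visible. The sketch below assumes the special case $M_t = \theta W_t$ with constant $\theta$ (so that $\langle M \rangle_T = \theta^2 T$ and $\mathcal{E}(M)$ is a true martingale); the sample size and parameters are illustrative.

```python
import math
import random

rng = random.Random(4)            # fixed seed, for reproducibility only
theta, T, n_samples = 1.0, 1.0, 20_000

# M_t = theta * W_t, so <M>_T = theta^2 * T and only W_T ~ N(0, T) is needed.
W_T = [rng.gauss(0.0, math.sqrt(T)) for _ in range(n_samples)]

# Sample mean of exp(M_T): theory gives exp(theta^2 * T / 2), not 1.
mean_plain = sum(math.exp(theta * w) for w in W_T) / n_samples
# Sample mean of the stochastic exponential: theory gives exactly 1.
mean_corrected = sum(
    math.exp(theta * w - 0.5 * theta**2 * T) for w in W_T
) / n_samples

print("E[exp(M_T)]        ~", mean_plain, "(theory:", math.exp(0.5 * theta**2 * T), ")")
print("E[stoch. exp. at T] ~", mean_corrected, "(theory: 1)")
```

The constant expectation of the corrected process is consistent with the martingale property; the uncorrected exponential drifts upward at rate $\theta^2/2$.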
[example: Brownian Motion in Polar Coordinates]
Let $(B^1_t, B^2_t)$ be a two-dimensional Brownian motion started away from the origin: $B_0 = (r_0, 0)$ with $r_0 > 0$. Define the radial process $R_t = |B_t| = \sqrt{(B^1_t)^2 + (B^2_t)^2}$.
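Apply Itô's formula to $f(x_1, x_2) = \sqrt{x_1^2 + x_2^2}$. This function is $C^2$ only away from the origin, but planar Brownian motion started at $B_0 \neq 0$ almost surely never hits the origin, so the formula applies after a standard localisation argument. Computing derivatives: $\frac{\partial f}{\partial x_i} = x_i/f$ and $\frac{\partial^2 f}{\partial x_i^2} = f^{-1} - x_i^2 f^{-3}$, so $\sum_i \frac{\partial^2 f}{\partial x_i^2} = 2f^{-1} - f^{-1} = f^{-1}$. Since $d\langle B^i, B^j \rangle_t = \mathbb{1}_{\{i=j\}} dt$, Itô's formula gives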
\begin{align*}
dR_t = \frac{B^1_t}{R_t}\, dB^1_t + \frac{B^2_t}{R_t}\, dB^2_t + \frac{1}{2} \cdot \frac{1}{R_t}\, dt.
\end{align*}
The stochastic part $\tilde{W}_t = \int_0^t \frac{B^1_s}{R_s}\, dB^1_s + \int_0^t \frac{B^2_s}{R_s}\, dB^2_s$ is a continuous local martingale with quadratic variation $d\langle \tilde{W} \rangle_t = \frac{(B^1_t)^2 + (B^2_t)^2}{R_t^2}\, dt = dt$, so by Lévy's characterization (Section 3.6), $\tilde{W}_t$ is itself a standard Brownian motion. Therefore $R_t$ satisfies the SDE
\begin{align*}
dR_t = d\tilde{W}_t + \frac{1}{2R_t}\, dt,
\end{align*}
which is the SDE for the Bessel process of dimension $2$. The drift term $1/(2R_t)$ is the Itô correction arising from the curvature of the map $(x_1, x_2) \mapsto \sqrt{x_1^2 + x_2^2}$: the non-linearity of the function creates a drift term even though the original Brownian motion has no drift.
[/example]
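The derivative computation behind the drift term can be sanity-checked with finite differences. This is a small sketch, assuming central second differences with an illustrative step size; `laplacian_fd` is a hypothetical helper, not a library function.

```python
import math

def radius(x1, x2):
    """f(x1, x2) = sqrt(x1^2 + x2^2)."""
    return math.hypot(x1, x2)

def laplacian_fd(f, x1, x2, h=1e-3):
    """Central second differences in each coordinate, summed."""
    d11 = (f(x1 + h, x2) - 2.0 * f(x1, x2) + f(x1 - h, x2)) / h**2
    d22 = (f(x1, x2 + h) - 2.0 * f(x1, x2) + f(x1, x2 - h)) / h**2
    return d11 + d22

points = [(1.0, 0.0), (0.3, -0.4), (2.0, 1.5)]
for point in points:
    r = radius(*point)
    # The two printed numbers should agree: Laplacian of |x| versus 1/|x|.
    print(point, laplacian_fd(radius, *point), 1.0 / r)
```

At each test point, all chosen away from the origin, the numerical Laplacian agrees with $1/R$, which is twice the drift coefficient $1/(2R)$ appearing in the SDE.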
### Time-Change Representation
The Itô formula for polar coordinates illustrates a general principle: continuous local martingales are, up to a time change, Brownian motions. This is made precise by the following fundamental theorem.
[quotetheorem:2115]
The proof uses Lévy's characterization of Brownian motion (Section 3.6): one checks that $W_s = M_{\tau_s}$ is a continuous local martingale with $\langle W \rangle_s = s$, which characterizes Brownian motion. The key computation is that $\langle W \rangle_s = \langle M \rangle_{\tau_s} = s$, which follows from the definition of $\tau_s$ as the inverse of $\langle M \rangle$.
[explanation: The Meaning of Dubins–Schwarz]
The Dubins–Schwarz theorem has a striking consequence: up to a random time change, every continuous local martingale is a Brownian motion. In other words, Brownian motion is the universal continuous local martingale.
This means that all the stochastic integrals $\int_0^t H_s\, dW_s$ one writes in stochastic differential equations — which appear to depend on the specific integrand $H$ and the specific Brownian motion $W$ — are, after a time change, just evaluations of a single Brownian motion at random times. The randomness in the process comes from two sources: the speed (the quadratic variation $\langle M \rangle_t$, which in concrete situations depends on the integrand $H_t$) and the underlying Brownian path.
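The theorem also shows the fundamental role of the quadratic variation. The law of a continuous local martingale $M$ with $M_0 = 0$ is determined by the joint law of its Dubins–Schwarz Brownian motion and its clock $\langle M \rangle$. When $\langle M \rangle$ is deterministic, the clock alone determines the law: $M$ is then a Gaussian process with independent increments and variance function $\langle M \rangle_t$. In general the law of $\langle M \rangle$ by itself is not enough, because the clock and the Brownian motion need not be independent. The extreme case is Lévy's characterization: Brownian motion is the unique continuous local martingale started at zero with quadratic variation $t$.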
The condition $\langle M \rangle_\infty = \infty$ is essential. If $\langle M \rangle_\infty < \infty$, then the martingale converges (by the martingale convergence theorem applied to $M_{\tau_s}$), and the time change only runs up to a finite limit. In applications such as the martingale representation theorem and the Girsanov theorem, the Dubins–Schwarz perspective provides important structural understanding even when one does not apply the time change explicitly.
[/explanation]
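The simplest explicit time change is a constant rescaling. As an illustrative sketch (the symbols $\sigma$ and $B$ below are introduced only for this example), let $M_t = \sigma W_t$ for a standard Brownian motion $W$ and a constant $\sigma > 0$. Then
\begin{align*}
\langle M \rangle_t = \sigma^2 t, \qquad \tau_s = \inf\{t \geq 0 : \langle M \rangle_t > s\} = s/\sigma^2, \qquad B_s = M_{\tau_s} = \sigma W_{s/\sigma^2}.
\end{align*}
By Brownian scaling, $B$ is a standard Brownian motion, and $M_t = B_{\sigma^2 t} = B_{\langle M \rangle_t}$: the martingale $M$ is the Brownian motion $B$ run at the constant speed $\sigma^2$.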
[remark: Scope of Itô's Formula]
The formula as stated requires $f \in C^2(\mathbb{R}^p)$. This is essentially optimal for a formula involving only $\int f''\, d\langle X \rangle$. One can extend to $f$ that are merely twice weakly differentiable (with some integrability), but the proof requires more refined arguments. A famous extension is the Tanaka formula for $f(x) = |x|$, which is not $C^2$ at the origin; this introduces the **local time** of Brownian motion, a process that measures the amount of time the path spends near each level. Local time arises in some treatments as a further extension of the theory; it lies outside the scope of this course.
[/remark]
## The Lévy Characterization
Itô's formula is not merely a computational tool — it is the engine behind several of the deepest structural results in stochastic calculus. The first of these is Lévy's characterization theorem, which gives a beautiful answer to the following question: if you hand someone a continuous local martingale and tell them its quadratic variation is linear in $t$, must it be a Brownian motion? The answer is yes, and the proof goes through Itô's formula applied to the complex exponential.
[quotetheorem:2100]
[citeproof:2100]
[remark: Why the quadratic variation condition characterizes Brownian motion]
The condition $\langle X^i, X^j \rangle_t = \delta_{ij} t$ can be understood as encoding two things simultaneously: the absence of any drift (since $X$ is already assumed to be a local martingale) and the correct "amount of randomness" (the quadratic variation tracks the accumulated variance of the process). Brownian motion has both, and Lévy's theorem says these are the only two conditions needed. In practice, this is useful because if one constructs a continuous local martingale via a stochastic integral, computing the quadratic variation is straightforward via the rule $\langle H \cdot M \rangle_t = \int_0^t H_s^2 \, d\langle M \rangle_s$, which can reduce to a recognizable integral.
[/remark]
### The Dubins–Schwarz Theorem
If a continuous local martingale does not have $\langle M \rangle_t = t$ but its quadratic variation grows to infinity, one can still reduce to the Brownian motion case by a random time change. This is the content of the Dubins–Schwarz theorem.
[quotetheorem:2116]
The idea is transparent: by running time according to the quadratic variation clock, we reindex $M$ so that the variance accumulates at the constant rate required by Lévy's theorem. After the time change, the quadratic variation of the new process is exactly linear, so Lévy's theorem certifies it as Brownian motion.
[citeproof:2116]
[remark: One-dimensional restriction]
The Dubins–Schwarz theorem is intrinsically one-dimensional. In two dimensions, representing a planar martingale as a time-changed Brownian motion would require separate time changes for each coordinate, and these time changes need not agree. The analogous result in the plane takes a different form, discussed in the application below.
[/remark]
### The Martingale Representation Theorem
A consequence of Lévy's characterization is the martingale representation theorem, which characterizes all martingales in the Brownian filtration as stochastic integrals.
[quotetheorem:2117]
The theorem says that in the Brownian filtration, every local martingale is, up to an additive constant, a stochastic integral against Brownian motion: there is no room for "extra randomness" beyond what $W$ provides. The proof uses the fact that the Brownian filtration is generated by stochastic integrals of deterministic functions (the Wiener chaos decomposition), and Lévy's characterization enters to verify that any orthogonal local martingale must be zero.
The hypothesis that $(\mathcal{F}_t)$ is the natural filtration of $W$ is indispensable. In a larger filtration — one that contains information from an independent source of randomness, such as an independent Poisson process — there exist local martingales that are orthogonal to all stochastic integrals against $W$ and therefore cannot be represented in the above form. The representation theorem characterizes completeness of the Brownian filtration as a martingale space: it is the filtration for which Brownian motion generates all the martingales.
### Conformal Invariance of Planar Brownian Motion
A beautiful application of Lévy's characterization is the conformal invariance of two-dimensional Brownian motion.
[quotetheorem:2101]
[citeproof:2101]
<!-- illustration-needed: conformal invariance of planar Brownian motion — show a Brownian path in a domain U mapped by a holomorphic f to a time-changed Brownian path in f(U), illustrating that the shape of the path is preserved up to time change -->
[example: Image of Brownian motion under a holomorphic map]
Let $W_t = W^1_t + i W^2_t$ be a planar Brownian motion starting at $1 \in \mathbb{C}$, and let $f(z) = z^2$. Then $f(W_t) = (W^1_t)^2 - (W^2_t)^2 + 2i W^1_t W^2_t$. By Itô's formula, setting $U = \operatorname{Re}(f(W))$ and $V = \operatorname{Im}(f(W))$:
\begin{align*}
dU_t &= 2W^1_t \, dW^1_t - 2W^2_t \, dW^2_t, \\
dV_t &= 2W^1_t \, dW^2_t + 2W^2_t \, dW^1_t.
\end{align*}
Both $U$ and $V$ are continuous local martingales. Their quadratic variation processes satisfy $\langle U \rangle_t = \langle V \rangle_t = 4\int_0^t |W_s|^2 \, ds$ and $\langle U, V \rangle_t = 0$, confirming that the cross-variation vanishes (as the Cauchy–Riemann equations for $f(z) = z^2$ predict). Setting $\sigma(t) = 4\int_0^t |W_s|^2 \, ds$, which is continuous and strictly increasing, Lévy's theorem applies to the recentred process $(U_{\sigma^{-1}(t)} - 1, V_{\sigma^{-1}(t)})$ and shows that $(U_{\sigma^{-1}(t)}, V_{\sigma^{-1}(t)})$ is a two-dimensional Brownian motion started at $f(1) = 1$.
[/example]
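The quadratic variation identities in this example can be observed on a simulated path. The following sketch assumes a uniform grid on $[0, 1]$ and compares the realised variations of $U$ and $V$ with a left-point Riemann sum for the clock $4\int_0^t |W_s|^2\, ds$; all variable names are illustrative.

```python
import math
import random

rng = random.Random(5)            # fixed seed, for reproducibility only
n, T = 20_000, 1.0
dt = T / n

w1, w2 = 1.0, 0.0                 # planar Brownian motion started at 1 + 0i
qv_U = qv_V = cov_UV = clock = 0.0
for _ in range(n):
    u, v = w1 * w1 - w2 * w2, 2.0 * w1 * w2     # U = Re(W^2), V = Im(W^2)
    clock += 4.0 * (w1 * w1 + w2 * w2) * dt     # left-point sum for 4 * int |W|^2 ds
    w1 += rng.gauss(0.0, math.sqrt(dt))
    w2 += rng.gauss(0.0, math.sqrt(dt))
    du = (w1 * w1 - w2 * w2) - u                # increment of U over the step
    dv = 2.0 * w1 * w2 - v                      # increment of V over the step
    qv_U += du * du
    qv_V += dv * dv
    cov_UV += du * dv

print("realised <U>   :", qv_U)
print("realised <V>   :", qv_V)
print("clock          :", clock)
print("realised <U,V> :", cov_UV)
```

Both realised variations track the clock, and the realised covariation is small relative to it, in line with $\langle U \rangle = \langle V \rangle$ and $\langle U, V \rangle = 0$.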
---
## Girsanov's Theorem
Changing probability measures is one of the most powerful techniques in stochastic analysis. The basic question is: if $X$ is a semimartingale under $\mathbb{P}$, what does it look like under a new measure $\mathbb{Q} \sim \mathbb{P}$? Girsanov's theorem gives the precise answer: the semimartingale structure is preserved, but the local martingale component acquires a drift correction determined by the Radon–Nikodym derivative.
The finite-dimensional preview of this phenomenon is instructive. If $X \sim N(0, I)$ under $\mathbb{P}$, then shifting to $X + a \sim N(a, I)$ amounts to replacing $\mathbb{P}$ by a measure $\mathbb{Q}$ with Radon–Nikodym derivative
\begin{align*}
\frac{d\mathbb{Q}}{d\mathbb{P}} = \exp\!\left(a \cdot X - \tfrac{1}{2}|a|^2\right).
\end{align*}
Under $\mathbb{Q}$, the centered variable $X - a$ has the same law as $X$ under $\mathbb{P}$. Girsanov's theorem is the continuous-time analogue: the density is replaced by a stochastic exponential, and the drift shift is replaced by a covariation correction.
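The claim can be checked directly on densities. Assuming $X$ takes values in $\mathbb{R}^n$ (the dimension $n$ is introduced only for this computation), the law of $X$ under $\mathbb{Q}$ has density
\begin{align*}
(2\pi)^{-n/2} e^{-\frac{1}{2}|x|^2} \cdot e^{a \cdot x - \frac{1}{2}|a|^2} = (2\pi)^{-n/2} e^{-\frac{1}{2}|x - a|^2},
\end{align*}
which is the $N(a, I)$ density; completing the square is the whole computation.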
### The Stochastic Exponential
[definition: Stochastic Exponential]
Let $M$ be a continuous local martingale with $M_0 = 0$. The **stochastic exponential** (or **Doléans–Dade exponential**) of $M$ is the process
\begin{align*}
\mathcal{E}(M)_t = e^{M_t - \frac{1}{2}\langle M \rangle_t}.
\end{align*}
[/definition]
The correction term $-\frac{1}{2}\langle M \rangle_t$ is the analogue of the normalizing constant $-\frac{1}{2}|a|^2$ in the Gaussian example. Without it, $e^{M_t}$ would not be a martingale. With it, Itô's formula reveals the clean structure:
[quotetheorem:2102]
[citeproof:2102]
[remark: Novikov's Condition]
The hypothesis that $\langle M \rangle_\infty \leq b < \infty$ is sufficient but far from necessary for $\mathcal{E}(M)$ to be a uniformly integrable martingale. The standard refinement is **Novikov's condition**:
\begin{align*}
\mathbb{E}\!\left[e^{\frac{1}{2}\langle M \rangle_\infty}\right] < \infty.
\end{align*}
Under Novikov's condition, $\mathcal{E}(M)$ is a uniformly integrable martingale. On a finite horizon $[0,T]$, with $\langle M \rangle_T$ in place of $\langle M \rangle_\infty$, this covers many SDEs, for instance whenever $\langle M \rangle_t \leq Ct$ for a deterministic constant $C$.
[/remark]
### The Theorem
[quotetheorem:2103]
[citeproof:2103]
[explanation: The meaning of Girsanov's theorem]
Girsanov's theorem says that changing to an equivalent measure $\mathbb{Q}$ preserves the local martingale character of $X$ but introduces a drift correction $\langle X, M \rangle$. In the most common setting, $X = W$ is a Brownian motion under $\mathbb{P}$ and $M = \int_0^\cdot \theta_s \, dW_s$ for some process $\theta$. Then $\langle W, M \rangle_t = \int_0^t \theta_s \, ds$, and the conclusion is that
\begin{align*}
\tilde{W}_t = W_t - \int_0^t \theta_s \, ds
\end{align*}
is a Brownian motion under $\mathbb{Q}$. Equivalently, $W_t = \tilde{W}_t + \int_0^t \theta_s \, ds$ — the original Brownian motion, viewed from the new measure, has acquired a drift $\theta$.
This has a compelling interpretation: by changing the probability measure (that is, by re-weighting outcomes), one can remove a drift $\theta$ from a drifted Brownian motion and turn it into a pure Brownian motion, provided $\theta$ is integrable enough for $\mathcal{E}(M)$ to be a true martingale (Novikov's condition suffices). This is the precise sense in which equivalent measures correspond to drift changes.
[/explanation]
### Application to Stochastic Differential Equations
Girsanov's theorem is indispensable for the study of SDEs. Consider the stochastic differential equation under $\mathbb{P}$:
\begin{align*}
dX_t = b(t, X_t) \, dt + \sigma(t, X_t) \, dW_t.
\end{align*}
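Suppose one already knows that, under some measure $\mathbb{Q}$, the process $X$ is a weak solution to the simpler driftless equation $dX_t = \sigma(t, X_t) \, d\tilde{W}_t$, where $\tilde{W}$ is a $\mathbb{Q}$-Brownian motion. If $\sigma$ is invertible and $\mathcal{E}(M)$ is a true martingale for $M_t = \int_0^t \sigma^{-1}(s, X_s) b(s, X_s) \, d\tilde{W}_s$, then Girsanov's theorem applied to the density $d\mathbb{P}/d\mathbb{Q} = \mathcal{E}(M)$ shows that $W_t = \tilde{W}_t - \int_0^t \sigma^{-1}(s, X_s) b(s, X_s)\, ds$ is a $\mathbb{P}$-Brownian motion. Substituting $d\tilde{W}_t = dW_t + \sigma^{-1}(t, X_t) b(t, X_t)\, dt$ into the driftless equation then shows that $X$ solves the original equation under $\mathbb{P}$.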
[example: Brownian motion with constant drift]
Let $W$ be a standard Brownian motion under $\mathbb{P}$, and let $\mu \in \mathbb{R}$ be a constant. Define
\begin{align*}
M_t = \mu W_t, \qquad \mathcal{E}(M)_t = e^{\mu W_t - \frac{1}{2}\mu^2 t}.
\end{align*}
Since $\langle M \rangle_t = \mu^2 t$ is deterministic, Novikov's condition gives $\mathbb{E}[e^{\frac{1}{2}\mu^2 T}] = e^{\frac{1}{2}\mu^2 T} < \infty$ for any finite $T$. On the finite horizon $[0,T]$, define $\mathbb{Q}$ by $d\mathbb{Q}/d\mathbb{P}|_{\mathcal{F}_T} = e^{\mu W_T - \frac{1}{2}\mu^2 T}$. Then by Girsanov's theorem, $\tilde{W}_t = W_t - \mu t$ is a $\mathbb{Q}$-Brownian motion. Equivalently, $W_t = \tilde{W}_t + \mu t$ has, under $\mathbb{Q}$, the law of a Brownian motion with drift $\mu$. This confirms that the Radon–Nikodym density for a Brownian motion with drift $\mu$ relative to a standard Brownian motion on $[0,T]$ is $e^{\mu W_T - \frac{1}{2}\mu^2 T}$, a fact already seen in the finite-dimensional Gaussian computation at the start of this section.
[/example]
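The change of measure in this example can be imitated by re-weighting samples. The sketch below draws $W_T$ under $\mathbb{P}$, attaches the weights $Z_T = e^{\mu W_T - \frac{1}{2}\mu^2 T}$, and estimates $\mathbb{Q}$-expectations as weighted $\mathbb{P}$-averages; the sample size and the value of $\mu$ are illustrative assumptions.

```python
import math
import random

rng = random.Random(6)            # fixed seed, for reproducibility only
mu, T, n_samples = 0.5, 1.0, 20_000

# Samples of W_T under P, and the Radon-Nikodym weights dQ/dP on F_T.
W_T = [rng.gauss(0.0, math.sqrt(T)) for _ in range(n_samples)]
Z_T = [math.exp(mu * w - 0.5 * mu**2 * T) for w in W_T]

# E_P[Z_T] should be 1, so that Q is a probability measure.
total_mass = sum(Z_T) / n_samples
# E_Q[W_T] = E_P[Z_T * W_T] should be mu * T.
mean_W_under_Q = sum(z * w for z, w in zip(Z_T, W_T)) / n_samples
# E_Q[(W_T - mu*T)^2] should be T, the variance of a Q-Brownian motion at time T.
var_shifted_under_Q = sum(
    z * (w - mu * T) ** 2 for z, w in zip(Z_T, W_T)
) / n_samples

print("E_P[Z_T]            ~", total_mass)
print("E_Q[W_T]            ~", mean_W_under_Q, "(theory:", mu * T, ")")
print("E_Q[(W_T - mu T)^2] ~", var_shifted_under_Q, "(theory:", T, ")")
```

The weighted averages are consistent with $W_T - \mu T$ having mean zero and variance $T$ under $\mathbb{Q}$, as Girsanov's theorem predicts; this checks only the time-$T$ marginal, not the full path law.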
[example: Stochastic exponential as a change of drift in an SDE]
Consider the SDE
\begin{align*}
dX_t = \theta(t) \, dt + dW_t, \qquad X_0 = 0,
\end{align*}
where $\theta : [0, \infty) \to \mathbb{R}$ is a deterministic function satisfying $\int_0^T \theta(t)^2 \, dt < \infty$ for all $T > 0$. Under $\mathbb{P}$, the solution is $X_t = W_t + \int_0^t \theta(s) \, ds$. Define $M_t = -\int_0^t \theta(s) \, dW_s$, so $\langle M \rangle_t = \int_0^t \theta(s)^2 \, ds$. The stochastic exponential is
\begin{align*}
\mathcal{E}(M)_t = \exp\!\left(-\int_0^t \theta(s) \, dW_s - \tfrac{1}{2}\int_0^t \theta(s)^2 \, ds\right).
\end{align*}
Define $\mathbb{Q}$ on $\mathcal{F}_T$ by $d\mathbb{Q}/d\mathbb{P} = \mathcal{E}(M)_T$. By Girsanov's theorem, $\tilde{W}_t = W_t - \langle W, M \rangle_t = W_t + \int_0^t \theta(s) \, ds = X_t$ is a $\mathbb{Q}$-Brownian motion. That is, the solution $X$ to the drifted SDE, viewed under $\mathbb{Q}$, is a standard Brownian motion. This makes precise the idea that the drift $\theta(t)$ can be removed by passing to the measure $\mathbb{Q}$, and confirms that the law of $X$ under $\mathbb{P}$ is absolutely continuous with respect to the Wiener measure, with an explicit Radon–Nikodym density.
[/example]
Girsanov's theorem reveals that probability measure changes shift the drift of semimartingales in a precise, controlled way. Chapter 4 applies these tools to stochastic differential equations themselves: solving for existence and uniqueness, finding explicit solutions, and connecting stochastic dynamics to PDE theory.
# 4. Stochastic Differential Equations
This chapter arrives at the original motivation for the entire course: making sense of differential equations driven by random noise. Having built the Lebesgue–Stieltjes integral, the theory of semimartingales, and the Itô stochastic integral in Chapters 1–3, we now have the tools to define and study stochastic differential equations (SDEs) rigorously. The central questions are when solutions exist, when they are unique, and how they connect to classical PDEs from analysis.
[motivation]
**The central problem.** Ordinary differential equations model deterministic dynamics. In many applications — physics, biology, finance — systems are subject to persistent random perturbations that cannot be dismissed as small. The natural model is an equation of the form
\begin{align*}
\dot{x}(t) = F(x(t)) + \eta(t),
\end{align*}
where $\eta(t)$ is Gaussian white noise: a random process whose values at different times are independent. White noise is not a function but a Schwartz distribution, so the equation cannot be interpreted classically. The insight developed in the Introduction is that white noise, integrated against time, produces Brownian motion: we should read $\eta(t) \, dt$ as $dW_t$, and the Itô integral constructed in Chapter 3 gives rigorous meaning to integration against $dW_t$. This reinterprets the equation as
\begin{align*}
dX_t = F(X_t) \, dt + dW_t,
\end{align*}
or in integral form $X_t = X_0 + \int_0^t F(X_s) \, ds + W_t$, where the final term $W_t = \int_0^t dW_s$ is the simplest instance of the Itô integral developed in Chapter 3.
**Two questions.** Given drift $b$ and diffusion $\sigma$, when does the SDE $dX_t = b(t, X_t) \, dt + \sigma(t, X_t) \, dW_t$ admit a solution, and when is that solution unique? The situation is subtler than for ODEs because the probability space itself is part of the data: a *weak solution* allows the Brownian motion to be chosen as part of the solution, while a *strong solution* must be adapted to a pre-specified Brownian motion. Correspondingly, *uniqueness in law* says all solutions share the same distribution, while *pathwise uniqueness* says two solutions on the same space with the same Brownian motion are indistinguishable.
**Why SDEs connect to PDEs.** The Itô formula shows that if $X$ solves an SDE, then for any smooth $f$, the process $f(t, X_t) - \int_0^t (\partial_s + L)f(s, X_s) \, ds$ is a local martingale, where $L$ is a second-order differential operator built from the coefficients $b$ and $\sigma$. This converts martingale arguments into PDE statements: solutions to elliptic and parabolic equations can be represented as expectations of functionals of $X$. The Feynman–Kac formula is the deepest instance of this principle.
[/motivation]
## Existence and Uniqueness of Solutions
To make sense of the discussion above, we first give a precise definition of what it means to solve an SDE.
[definition: Stochastic Differential Equation]
Let $d, m \in \mathbb{N}$, and let $b : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^{d \times m}$ be locally bounded and measurable. A **solution** to the stochastic differential equation $\mathcal{E}(\sigma, b)$ given by
\begin{align*}
dX_t = b(t, X_t) \, dt + \sigma(t, X_t) \, dW_t
\end{align*}
consists of:
1. a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, \mathbb{P})$ satisfying the usual conditions;
2. an $m$-dimensional $(\mathcal{F}_t)$-Brownian motion $W$ with $W_0 = 0$ (that is, $W$ is $(\mathcal{F}_t)$-adapted and $W_t - W_s$ is independent of $\mathcal{F}_s$ for all $s \leq t$);
3. an $(\mathcal{F}_t)$-adapted continuous process $X$ with values in $\mathbb{R}^d$ satisfying
\begin{align*}
X_t = X_0 + \int_0^t \sigma(s, X_s) \, dW_s + \int_0^t b(s, X_s) \, ds
\end{align*}
for all $t \geq 0$.
If $X_0 = x \in \mathbb{R}^d$, we call $X$ a **(weak) solution** to $\mathcal{E}_x(\sigma, b)$. The solution is called a **strong solution** if $X$ is adapted to the canonical filtration of $W$, i.e., $\mathcal{F}_t = \sigma(W_s : s \leq t)$ (augmented in the usual way).
[/definition]
The distinction between weak and strong solutions is fundamental. In a weak solution, the probability space and Brownian motion are part of the unknowns; we seek a triple $(\Omega, W, X)$ satisfying the integral equation. In a strong solution, we fix $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ and $W$ in advance and demand that $X$ be measurable with respect to the filtration of $W$ alone. Strong solutions are more demanding but also more useful: they define a deterministic map from the driving noise to the solution path.
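To make the idea of a deterministic map from noise to path concrete, here is a minimal sketch of the Euler–Maruyama discretisation in the scalar case. This scheme is a standard numerical approximation and is not part of the course; the function name `euler_maruyama` and the example coefficients are illustrative assumptions. The point is only that the path is computed from the Brownian increments alone, so the same increments always produce the same path.

```python
import math
import random

def euler_maruyama(b, sigma, x0, dW, dt):
    """Map noise increments dW to an approximate path of dX = b dt + sigma dW (scalar case)."""
    path = [x0]
    t = 0.0
    for inc in dW:
        x = path[-1]
        path.append(x + b(t, x) * dt + sigma(t, x) * inc)
        t += dt
    return path

# Hypothetical example: dX = -X dt + dW on [0, 1] with 1000 steps.
rng = random.Random(0)
n, T = 1000, 1.0
dt = T / n
dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
path_a = euler_maruyama(lambda t, x: -x, lambda t, x: 1.0, 0.0, dW, dt)
path_b = euler_maruyama(lambda t, x: -x, lambda t, x: 1.0, 0.0, dW, dt)
print(path_a == path_b)  # True: the same noise produces the same path
```

A weak solution offers no such map: the noise is constructed together with the solution rather than supplied as an input.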
Uniqueness splits similarly:
[definition: Uniqueness of Solutions]
For the SDE $\mathcal{E}(\sigma, b)$, we say:
- **Uniqueness in law** holds if for every $x \in \mathbb{R}^d$, all solutions to $\mathcal{E}_x(\sigma, b)$ have the same distribution (as processes on $C(\mathbb{R}_+; \mathbb{R}^d)$).
- **Pathwise uniqueness** holds if whenever $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ and $W$ are fixed and $X, X'$ are two solutions with $X_0 = X'_0$, then $X_t = X'_t$ for all $t \geq 0$ almost surely.
[/definition]
These two notions are genuinely distinct. The Tanaka equation shows that weak existence and uniqueness in law can hold while pathwise uniqueness fails.
[example: Tanaka Equation]
Consider the SDE
\begin{align*}
dX_t = \operatorname{sgn}(X_t) \, dW_t, \quad X_0 = 0,
\end{align*}
where $\operatorname{sgn}(x) = +1$ for $x > 0$ and $\operatorname{sgn}(x) = -1$ for $x \leq 0$.
**Weak existence.** Let $X$ be a one-dimensional Brownian motion with $X_0 = 0$, and define
\begin{align*}
W_t = \int_0^t \operatorname{sgn}(X_s) \, dX_s.
\end{align*}
This is well-defined since $\operatorname{sgn}(X_s)$ is bounded and previsible: it is a Borel function of the continuous adapted process $X$. Then
\begin{align*}
\int_0^t \operatorname{sgn}(X_s) \, dW_s = \int_0^t \operatorname{sgn}(X_s)^2 \, dX_s = X_t - X_0 = X_t,
\end{align*}
using that $\operatorname{sgn}(x)^2 = 1$. So $X$ solves the SDE. We need to verify $W$ is a Brownian motion. Since $W$ is a continuous local martingale, Lévy's characterisation reduces this to checking its quadratic variation: $\langle W \rangle_t = \int_0^t \operatorname{sgn}(X_s)^2 \, ds = t$. So $W$ is indeed a Brownian motion, and weak existence holds.
**Uniqueness in law.** The same computation shows that any solution $X$ is itself a Brownian motion (since $\langle X \rangle_t = t$ and $X$ is a continuous local martingale), so all solutions have the same distribution: that of a standard Brownian motion.
**Pathwise uniqueness fails.** Suppose $X$ is a solution and let $X' = -X$. Then $X'$ satisfies
\begin{align*}
-X_t = -\int_0^t \operatorname{sgn}(X_s) \, dW_s = \int_0^t \operatorname{sgn}(-X_s) \, dW_s + 2\int_0^t \mathbb{1}_{X_s = 0} \, dW_s.
\end{align*}
The second term vanishes: it is a continuous local martingale with quadratic variation $\int_0^t \mathbb{1}_{X_s = 0} \, ds = 0$ (since $X$ is a Brownian motion, it spends zero Lebesgue-measure time at zero). So $-X$ is also a solution to the same SDE with the same Brownian motion $W$. Since $X$ and $-X$ are not indistinguishable (both are Brownian motions, but $X_t \neq -X_t$ almost surely for each $t > 0$, because $\mathbb{P}(X_t = 0) = 0$), pathwise uniqueness fails.
[/example]
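The construction in the Tanaka example has a simple discrete analogue, sketched below under the assumption that stochastic integrals are replaced by left-endpoint sums on a uniform grid; the names `sgn` and `tanaka_demo` are illustrative. Starting from a Gaussian random walk $X$, we set $\Delta W_k = \operatorname{sgn}(X_k)\,\Delta X_k$ and compare the sums $\sum_k \operatorname{sgn}(X_k)\,\Delta W_k$ and $\sum_k \operatorname{sgn}(-X_k)\,\Delta W_k$ with $X_T$ and $-X_T$.

```python
import math
import random

def sgn(x):
    """Sign convention of the example: +1 for x > 0, -1 for x <= 0."""
    return 1.0 if x > 0 else -1.0

def tanaka_demo(n_steps=10_000, T=1.0, seed=0):
    rng = random.Random(seed)
    dt = T / n_steps
    dX = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]
    X = [0.0]
    for inc in dX:
        X.append(X[-1] + inc)
    # Discrete version of W_t = int_0^t sgn(X_s) dX_s
    dW = [sgn(X[k]) * dX[k] for k in range(n_steps)]
    # Left-endpoint sums for int sgn(X) dW and int sgn(-X) dW
    i_plus = sum(sgn(X[k]) * dW[k] for k in range(n_steps))
    i_minus = sum(sgn(-X[k]) * dW[k] for k in range(n_steps))
    return X[-1], i_plus, i_minus, dX[0]

x_T, i_plus, i_minus, dx0 = tanaka_demo()
print(x_T, i_plus)    # the first sum reproduces X_T
print(-x_T, i_minus)  # the second reproduces -X_T up to the correction 2 * dx0
```

In this discrete picture the only grid time at which $X_k = 0$ is $k = 0$, so the second sum differs from $-X_T$ by exactly $2\,\Delta X_0$. This plays the role of the term $2\int_0^t \mathbb{1}_{X_s = 0}\,dW_s$ and is of order $\sqrt{\Delta t}$, so it disappears as the grid is refined.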
In the other direction, pathwise uniqueness is the stronger condition:
[quotetheorem:2118]
The theorem will not be proved here, as the course proceeds directly to the main existence and uniqueness theorem under Lipschitz conditions. The two hypotheses are genuinely independent and both necessary: weak existence provides the raw material (a solution on some space) that pathwise uniqueness then upgrades to a strong solution, while without weak existence the theorem would have nothing to work with. The Tanaka equation shows that weak existence alone does not suffice — pathwise uniqueness is the ingredient that forces the solution to be a deterministic function of the driving noise.
The key condition guaranteeing both existence and uniqueness is the following global Lipschitz bound on the coefficients.
[definition: Lipschitz Coefficients]
The coefficients $b : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^{d \times m}$ are **Lipschitz in $x$** if there exists a constant $K > 0$ such that for all $t \geq 0$ and $x, y \in \mathbb{R}^d$,
\begin{align*}
|b(t, x) - b(t, y)| &\leq K|x - y|, \\
|\sigma(t, x) - \sigma(t, y)| &\leq K|x - y|.
\end{align*}
[/definition]
The Lipschitz condition is the direct analogue of the Picard–Lindelöf condition for ODEs. Under it, both pathwise uniqueness and strong existence hold.
[quotetheorem:2104]
[citeproof:2104]
The estimate in the uniqueness proof deserves emphasis. The core structure is: take the difference of two solution equations, apply the Itô isometry to trade the stochastic integral for a Lebesgue integral, invoke Lipschitz to bound the integrand, and apply Grönwall. This four-step pattern recurs throughout SDE theory.
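For orientation, here is a sketch of how these steps combine in the scalar case $d = m = 1$ on a horizon $[0, T]$. We assume the expectations below are finite (in general this is arranged by localisation), and write $h(t) = \mathbb{E}|X_t - X'_t|^2$ for two solutions $X, X'$ driven by the same $W$ with $X_0 = X'_0$. The constants are not optimised, and the cited proof may organise the argument differently.
\begin{align*}
h(t) &= \mathbb{E}\left| \int_0^t \bigl(b(s, X_s) - b(s, X'_s)\bigr) \, ds + \int_0^t \bigl(\sigma(s, X_s) - \sigma(s, X'_s)\bigr) \, dW_s \right|^2 \\
&\leq 2t \, \mathbb{E}\int_0^t |b(s, X_s) - b(s, X'_s)|^2 \, ds + 2 \, \mathbb{E}\int_0^t |\sigma(s, X_s) - \sigma(s, X'_s)|^2 \, ds \\
&\leq 2(T + 1)K^2 \int_0^t h(s) \, ds.
\end{align*}
The first inequality uses $(\alpha + \beta)^2 \leq 2\alpha^2 + 2\beta^2$, the Cauchy–Schwarz inequality for the drift term, and the Itô isometry for the stochastic integral; the second uses the Lipschitz bounds. Since $h(0) = 0$, Grönwall's lemma forces $h \equiv 0$, so $X_t = X'_t$ almost surely for each $t$, and continuity of paths upgrades this to indistinguishability.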
[remark: Grönwall's Lemma]
The key ODE comparison principle used in the proof is: if $h : [0, T] \to \mathbb{R}_+$ is bounded and measurable and satisfies $h(t) \leq a + c \int_0^t h(s) \, ds$ for some constants $a \geq 0$ and $c > 0$, then $h(t) \leq a e^{ct}$ for all $t \in [0, T]$. In the uniqueness argument, $a = 0$ since $X_0 = X'_0$, so $h \equiv 0$ follows immediately.
[/remark]
## Examples of Stochastic Differential Equations
With existence and uniqueness in hand, we turn to explicit solutions. In each case, we apply Itô's formula to reduce the SDE to something simpler — exactly as one uses integrating factors or substitutions to solve ODEs. The three examples below trace a natural progression: the Ornstein–Uhlenbeck process has additive noise and linear coefficients; geometric Brownian motion introduces multiplicative noise where $\sigma$ depends on $X_t$; and the Bessel process features a coefficient singular at zero, placing it at the edge of the Lipschitz theory.
### The Ornstein–Uhlenbeck Process
Mean-reversion is a fundamental feature of many physical and financial systems: quantities that wander away from an equilibrium are pulled back toward it. The Ornstein–Uhlenbeck process is the canonical SDE model of this phenomenon, combining a linear restoring drift with additive Gaussian noise.
[example: Ornstein–Uhlenbeck Process]
Let $\lambda > 0$. The **Ornstein–Uhlenbeck process** is the solution to
\begin{align*}
dX_t = -\lambda X_t \, dt + dW_t.
\end{align*}
Existence and uniqueness follow from the Lipschitz theorem since $b(x) = -\lambda x$ is globally Lipschitz and $\sigma \equiv 1$ is constant. To find the explicit solution, apply Itô's formula to $Y_t = e^{\lambda t} X_t$:
\begin{align*}
dY_t = \lambda e^{\lambda t} X_t \, dt + e^{\lambda t} \, dX_t = \lambda e^{\lambda t} X_t \, dt + e^{\lambda t}(-\lambda X_t \, dt + dW_t) = e^{\lambda t} \, dW_t.
\end{align*}
Integrating and multiplying by $e^{-\lambda t}$ gives
\begin{align*}
X_t = e^{-\lambda t} X_0 + \int_0^t e^{-\lambda(t-s)} \, dW_s.
\end{align*}
The integrand $s \mapsto e^{-\lambda(t-s)}$ is deterministic, so this is a Wiener integral — a Gaussian random variable for each fixed $t$.
**Distribution.** If $X_0 = x$ is fixed, then $X_t$ is Gaussian with
\begin{align*}
\mathbb{E}[X_t] &= e^{-\lambda t} x, \\
\operatorname{Cov}(X_t, X_s) &= \frac{1}{2\lambda}\bigl(e^{-\lambda|t-s|} - e^{-\lambda(t+s)}\bigr).
\end{align*}
To see the covariance formula, assume $s \leq t$ and apply the Itô isometry:
\begin{align*}
\mathbb{E}\bigl[(X_t - \mathbb{E}[X_t])(X_s - \mathbb{E}[X_s])\bigr] = \mathbb{E}\left[\int_0^t e^{-\lambda(t-u)} \, dW_u \int_0^s e^{-\lambda(s-u)} \, dW_u\right] = e^{-\lambda(t+s)} \int_0^s e^{2\lambda u} \, du,
\end{align*}
and computing the integral gives the formula above. In particular, $X_t \sim N(e^{-\lambda t} x, (1 - e^{-2\lambda t})/(2\lambda))$, and as $t \to \infty$,
\begin{align*}
X_t \xrightarrow{d} N\!\left(0, \frac{1}{2\lambda}\right).
\end{align*}
**Stationary distribution.** If instead $X_0 \sim N(0, 1/(2\lambda))$ is independent of $W$, then $X_t \sim N(0, 1/(2\lambda))$ for every $t$, and $\operatorname{Cov}(X_t, X_s) = \frac{1}{2\lambda} e^{-\lambda|t-s|}$ depends only on the time difference $|t-s|$. The law $N(0, 1/(2\lambda))$ is the unique stationary distribution of the process: it is a Gaussian measure on $\mathbb{R}$ preserved by the dynamics.
The Ornstein–Uhlenbeck process is the prototypical mean-reverting process. The drift $-\lambda X_t \, dt$ pulls $X_t$ toward zero at rate $\lambda$, while $dW_t$ provides perpetual random fluctuations. The balance between these two effects produces the Gaussian stationary distribution.
[/example]
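The explicit solution makes the Ornstein–Uhlenbeck process easy to sample without discretisation error: conditionally on $X_t = x$, the value $X_{t+h}$ is Gaussian with mean $e^{-\lambda h} x$ and variance $(1 - e^{-2\lambda h})/(2\lambda)$. The following sketch uses this transition to check the mean and variance formulas by simulation; the helper names `ou_exact_step` and `ou_cov` and all parameter values are illustrative assumptions.

```python
import math
import random

def ou_exact_step(x, h, lam, rng):
    """Sample X_{t+h} given X_t = x using the explicit solution of dX = -lam*X dt + dW."""
    mean = math.exp(-lam * h) * x
    var = (1.0 - math.exp(-2.0 * lam * h)) / (2.0 * lam)
    return mean + math.sqrt(var) * rng.gauss(0.0, 1.0)

def ou_cov(t, s, lam):
    """Covariance of X_t and X_s for a fixed starting point, from the formula in the example."""
    return (math.exp(-lam * abs(t - s)) - math.exp(-lam * (t + s))) / (2.0 * lam)

lam, x0, t_end, n_steps, n_paths = 1.0, 2.0, 1.0, 10, 20_000
rng = random.Random(0)
h = t_end / n_steps
samples = []
for _ in range(n_paths):
    x = x0
    for _ in range(n_steps):
        x = ou_exact_step(x, h, lam, rng)
    samples.append(x)

mean = sum(samples) / n_paths
var = sum((v - mean) ** 2 for v in samples) / (n_paths - 1)
print(mean, math.exp(-lam * t_end) * x0)  # sample mean vs e^{-lam t} x
print(var, ou_cov(t_end, t_end, lam))     # sample variance vs (1 - e^{-2 lam t}) / (2 lam)
```

Composing ten exact steps of length $h$ gives the same law as a single step of length $t$, which is a small numerical reflection of the Markov property of the process.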
### Geometric Brownian Motion
Many quantities — asset prices, population sizes, biological concentrations — must remain strictly positive. Multiplicative noise, where the diffusion coefficient is proportional to the current value, naturally preserves positivity and leads to lognormal distributions. Geometric Brownian motion is the simplest SDE with this structure.
[example: Geometric Brownian Motion]
Fix $\sigma > 0$ and $r \in \mathbb{R}$. **Geometric Brownian motion** is the solution to
\begin{align*}
dX_t = r X_t \, dt + \sigma X_t \, dW_t.
\end{align*}
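Again the Lipschitz conditions are satisfied (both coefficients are linear in $x$, hence globally Lipschitz). To solve explicitly, suppose $X_0 > 0$ and apply Itô's formula to $\log X_t$, working for the moment as if $X_t > 0$ for all $t$. The formula obtained below is strictly positive, and one checks directly by Itô's formula that it solves the SDE, so by uniqueness it is the solution and the positivity assumption is justified a posteriori. Since $f(x) = \log x$ satisfies $f'(x) = 1/x$ and $f''(x) = -1/x^2$, Itô's formula gives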
\begin{align*}
d(\log X_t) = \frac{1}{X_t} \, dX_t - \frac{1}{2X_t^2} \, d\langle X \rangle_t = \frac{dX_t}{X_t} - \frac{\sigma^2}{2} \, dt = \left(r - \frac{\sigma^2}{2}\right) dt + \sigma \, dW_t.
\end{align*}
Integrating and exponentiating:
\begin{align*}
X_t = X_0 \exp\!\left(\sigma W_t + \left(r - \frac{\sigma^2}{2}\right)t\right).
\end{align*}
The term $-\sigma^2/2$ is the Itô correction arising from the second-order term in Itô's formula. It ensures that $\mathbb{E}[X_t] = X_0 e^{rt}$, since $\mathbb{E}[e^{\sigma W_t}] = e^{\sigma^2 t/2}$. Without the correction, the exponent would be $\sigma W_t + rt$, and the expectation would be $X_0 e^{rt + \sigma^2 t/2}$, misrepresenting the growth rate.
Geometric Brownian motion models asset prices in the Black–Scholes framework: $r$ is the drift (expected return) and $\sigma$ is the volatility. The logarithm $\log X_t$ is a Brownian motion with drift, so $X_t$ is lognormally distributed.
[/example]
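The effect of the Itô correction on the mean can be seen in a short simulation. The sketch below samples $X_t$ directly from the explicit formula, once with the exponent $\sigma W_t + (r - \sigma^2/2)t$ and once with the uncorrected exponent $\sigma W_t + rt$; the function name `gbm_terminal` and the parameter values are illustrative assumptions.

```python
import math
import random

def gbm_terminal(x0, r, sigma, t, rng, ito_correction=True):
    """Sample x0 * exp(sigma*W_t + drift*t), with drift r - sigma^2/2 or (uncorrected) r."""
    w = rng.gauss(0.0, math.sqrt(t))
    drift = r - 0.5 * sigma * sigma if ito_correction else r
    return x0 * math.exp(sigma * w + drift * t)

x0, r, sigma, t, n = 1.0, 0.05, 0.2, 1.0, 200_000
rng = random.Random(0)
with_corr = sum(gbm_terminal(x0, r, sigma, t, rng) for _ in range(n)) / n
rng = random.Random(0)  # reuse the same Gaussian samples for the comparison
without_corr = sum(
    gbm_terminal(x0, r, sigma, t, rng, ito_correction=False) for _ in range(n)
) / n

print(with_corr, x0 * math.exp(r * t))                            # close to X_0 e^{rt}
print(without_corr, x0 * math.exp((r + 0.5 * sigma ** 2) * t))    # close to X_0 e^{rt + sigma^2 t / 2}
```

Because both runs reuse the same Gaussian samples, the two sample means differ by the factor $e^{\sigma^2 t/2}$ up to rounding, which isolates the contribution of the correction term.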
### The Bessel Process
When a $d$-dimensional Brownian motion is projected onto its radial coordinate, the resulting one-dimensional process retains information about the ambient dimension through a singular drift. This radial process — the Bessel process — illustrates how geometric structure of the state space manifests in the coefficients of an SDE.
[example: Bessel Process]
Let $W = (W^1, \ldots, W^d)$ be a $d$-dimensional Brownian motion started from a point $W_0 \neq 0$, and define $X_t = |W_t|$ (the Euclidean norm). We compute the SDE satisfied by $X_t$ for $t < \tau_0 = \inf\{t \geq 0 : X_t = 0\}$.
Apply Itô's formula to $f(x) = |x|$ on $\mathbb{R}^d \setminus \{0\}$. The partial derivatives are $\partial_i |x| = x_i / |x|$ and $\partial_i^2 |x| = (|x|^2 - x_i^2)/|x|^3$. Summing over $i$:
\begin{align*}
\Delta |x| = \sum_{i=1}^d \frac{|x|^2 - x_i^2}{|x|^3} = \frac{d|x|^2 - |x|^2}{|x|^3} = \frac{d-1}{|x|}.
\end{align*}
By Itô's formula,
\begin{align*}
dX_t = \sum_{i=1}^d \frac{W^i_t}{X_t} \, dW^i_t + \frac{d-1}{2X_t} \, dt.
\end{align*}
Define $\tilde{W}_t = \int_0^t \sum_i (W^i_s / X_s) \, dW^i_s$. This is a continuous local martingale with $\langle \tilde{W} \rangle_t = \int_0^t \sum_i (W^i_s)^2/X_s^2 \, ds = t$, so by Lévy's characterisation, $\tilde{W}$ is a one-dimensional Brownian motion. Thus $X_t$ satisfies
\begin{align*}
dX_t = \frac{d-1}{2X_t} \, dt + d\tilde{W}_t
\end{align*}
for $t < \tau_0$. This is the **Bessel process** of dimension $d$. The singular drift $\frac{d-1}{2X_t}$ pushes $X_t$ away from zero when $d \geq 2$, reflecting the fact that $d$-dimensional Brownian motion is transient for $d \geq 3$ and recurrent but does not hit zero for $d = 2$.
[/example]
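The identity $\Delta |x| = (d-1)/|x|$, which produces the drift of the Bessel process, is easy to confirm numerically. The following sketch compares a central finite-difference Laplacian of the Euclidean norm with $(d-1)/|x|$ at a sample point; the helper names `norm` and `laplacian_fd`, the step size, and the test point are illustrative assumptions.

```python
import math

def norm(x):
    """Euclidean norm of a point given as a list of coordinates."""
    return math.sqrt(sum(c * c for c in x))

def laplacian_fd(f, x, h=1e-3):
    """Central finite-difference approximation of the Laplacian of f at x."""
    fx = f(x)
    total = 0.0
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        total += (f(up) - 2.0 * fx + f(down)) / (h * h)
    return total

x = [1.0, 2.0, 2.0]  # a point with |x| = 3 in dimension d = 3
d = len(x)
print(laplacian_fd(norm, x), (d - 1) / norm(x))  # both approximately 2/3
```

Half of this quantity, evaluated along the path, is the drift $\frac{d-1}{2X_t}$ appearing in the SDE for $X_t = |W_t|$.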
## Representations of Solutions to PDEs
One of the deepest applications of SDEs is the probabilistic representation of solutions to second-order PDEs. The connection arises from Itô's formula: if $X$ solves an SDE, then for any smooth $f$, the process $f(t, X_t) - \int_0^t (\partial_s + L)f(s, X_s) \, ds$ is a local martingale, where $L$ is the generator of $X$.
[definition: Generator of an SDE]
Let $X$ be a solution to $\mathcal{E}(\sigma, b)$ where $b : \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma : \mathbb{R}^d \to \mathbb{R}^{d \times m}$ are time-independent. The **generator** of $X$ is the second-order differential operator $L : C^2(\mathbb{R}^d) \to C(\mathbb{R}^d)$ defined by
\begin{align*}
L = \frac{1}{2} \sum_{i,j=1}^d a_{ij} \partial_i \partial_j + \sum_{i=1}^d b_i \partial_i,
\end{align*}
where $a = \sigma \sigma^\top \in \mathbb{R}^{d \times d}$ is the diffusion matrix.
[/definition]
The diffusion matrix $a = \sigma \sigma^\top$ encodes the covariation structure of the noise: $d\langle X^i, X^j \rangle_t = a_{ij}(X_t) \, dt$. When $b = 0$ and $\sigma = \sqrt{2} I$, we have $a = 2I$ and $L = \Delta$, recovering the standard Laplacian. This is the generator of Brownian motion run at speed $\sqrt{2}$, that is, of the process $\sqrt{2}\, W_t$; standard Brownian motion itself has generator $\frac{1}{2}\Delta$.
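As a quick illustration, the definition can be applied to the two one-dimensional examples from earlier in this chapter. For the Ornstein–Uhlenbeck process, $b(x) = -\lambda x$ and $\sigma \equiv 1$, so $a = 1$; for geometric Brownian motion, $b(x) = rx$ and $\sigma(x) = \sigma x$, so $a(x) = \sigma^2 x^2$. The subscripts below are labels introduced only for this illustration.
\begin{align*}
L_{\mathrm{OU}} f(x) &= \tfrac{1}{2} f''(x) - \lambda x f'(x), \\
L_{\mathrm{GBM}} f(x) &= \tfrac{1}{2} \sigma^2 x^2 f''(x) + r x f'(x).
\end{align*}
For instance, $f(x) = x$ gives $L_{\mathrm{GBM}} f = r f$, consistent with the growth $\mathbb{E}[X_t] = X_0 e^{rt}$ found for geometric Brownian motion.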
The central proposition connecting $L$ to martingales is a direct corollary of Itô's formula:
[quotetheorem:2105]
[citeproof:2105]
The Martingale Problem encodes the SDE as a condition on certain test processes rather than as a pathwise integral equation. This reformulation, developed systematically by Stroock and Varadhan, is especially powerful when the coefficients are merely measurable and Itô's formula cannot be applied directly: one can define a "solution" to the SDE as any probability measure on path space under which all the martingales $M^f$ are genuine martingales. Existence and uniqueness in this weaker sense then become questions about probability measures, amenable to compactness and tightness arguments that bypass pathwise constructions entirely.
### The Dirichlet–Poisson Problem
Let $U \subseteq \mathbb{R}^d$ be non-empty, bounded, and open. Given $f \in C_b(U)$ and $g \in C_b(\partial U)$, the **Dirichlet–Poisson problem** for $L$ is to find $u \in C^2(U) \cap C(\bar{U})$ satisfying
\begin{align*}
-Lu(x) &= f(x) \quad \text{for } x \in U, \\
u(x) &= g(x) \quad \text{for } x \in \partial U.
\end{align*}
When $f = 0$, this is the **Dirichlet problem**; when $g = 0$, this is the **Poisson problem**.
The operator $L$ must be uniformly elliptic to guarantee that this problem is well-posed.
[definition: Uniform Ellipticity]
A matrix-valued function $a : \bar{U} \to \mathbb{R}^{d \times d}$ is **uniformly elliptic** on $\bar{U}$ if there exists a constant $c > 0$ such that for all $\xi \in \mathbb{R}^d$ and $x \in \bar{U}$,
\begin{align*}
\xi^\top a(x) \xi \geq c|\xi|^2.
\end{align*}
When $a$ is symmetric (as it is when $a = \sigma \sigma^\top$), this is equivalent to requiring that the smallest eigenvalue of $a(x)$ is bounded below by $c > 0$ uniformly over $x \in \bar{U}$.
[/definition]
The following existence result for the Dirichlet–Poisson problem is stated here without proof; its verification requires elliptic PDE theory — specifically, uniform Schauder estimates for second-order operators — that lies outside the stochastic analysis developed in this course.
[quotetheorem:2119]
The SDE representation theorem then identifies what any solution must look like.
[quotetheorem:2106]
[citeproof:2106]
The representation $u(x) = \mathbb{E}_x[g(X_{T_U}) + \int_0^{T_U} f(X_s) \, ds]$ is the foundation of Monte Carlo methods for elliptic PDEs. Rather than discretising the PDE on a spatial grid, one simulates many trajectories of $X$ starting from $x$, runs each until it exits $U$, and averages the boundary and source contributions. For exactly simulated paths the estimator is unbiased by the theorem, and its standard error decays like $N^{-1/2}$ in the number $N$ of trajectories (equivalently, its variance decays like $1/N$); in practice, discretising the SDE in time introduces an additional bias. This approach is particularly attractive in high dimensions, where grid-based methods suffer from the curse of dimensionality.
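A minimal sketch of such an estimator in one dimension is given below, under simplifying assumptions that are ours rather than the course's: $U = (-1, 1)$, $X$ is a standard Brownian motion (so $L = \frac{1}{2}\frac{d^2}{dx^2}$), and the Brownian path is replaced by a simple random walk on the grid $\{k/n\}$ with time step $1/n^2$. The function name `mc_dirichlet_poisson` and its parameters are hypothetical. For a general SDE one would need a time-discretisation scheme in place of the random walk.

```python
import random

def mc_dirichlet_poisson(k0, n, f, g, n_paths, seed=0):
    """Estimate u(x) = E_x[g(X_T) + int_0^T f(X_s) ds] on U = (-1, 1), with x = k0 / n.

    X is approximated by a simple random walk on the grid {k/n}, taking steps of
    size 1/n at time increments 1/n^2 (a random-walk stand-in for Brownian motion).
    """
    rng = random.Random(seed)
    h = 1.0 / n
    dt = h * h
    total = 0.0
    for _ in range(n_paths):
        k = k0
        integral = 0.0
        while abs(k) < n:                  # still inside U
            integral += f(k * h) * dt      # accumulate the source term
            k += 1 if rng.random() < 0.5 else -1
        total += g(k * h) + integral       # boundary term at the exit point
    return total / n_paths

# Poisson problem -(1/2) u'' = 1 on (-1, 1) with u(-1) = u(1) = 0, solved by u(x) = 1 - x^2.
estimate = mc_dirichlet_poisson(k0=0, n=10, f=lambda x: 1.0, g=lambda x: 0.0, n_paths=4000)
print(estimate)  # should be close to u(0) = 1, up to Monte Carlo error
```

With $f = 1$ and $g = 0$ the estimator is just the mean exit time, and for this particular test problem the random walk's expected exit time coincides with $1 - x^2$ at grid points, so the only error is statistical.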
### The Feynman–Kac Formula
The Dirichlet–Poisson representation extends to parabolic equations. Recall that for the heat equation $\partial_t u = \frac{1}{2}\Delta u$ with initial data $u(0, \cdot) = f$, the solution is $u(t, x) = \mathbb{E}_x[f(W_t)]$ where $W$ is standard Brownian motion, whose generator is $\frac{1}{2}\Delta$. For the general operator $L$, this becomes:
[quotetheorem:2107]
[citeproof:2107]
The representation $u(t, x) = \mathbb{E}_x[f(X_t)]$ has a striking consequence: it implies that the solution process $X$ is a **continuous Markov process**. Indeed, for $s \leq t$ the conditional expectation satisfies $\mathbb{E}_x[f(X_t) \mid \mathcal{F}_s] = u(t - s, X_s)$, so it depends on $\mathcal{F}_s$ only through $X_s$, which is exactly the Markov property.
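As a sanity check of this representation, take $X$ to be the Ornstein–Uhlenbeck process from earlier in this chapter and $f(x) = x^2$. Its Gaussian law gives $u(t, x) = \mathbb{E}_x[X_t^2] = e^{-2\lambda t} x^2 + (1 - e^{-2\lambda t})/(2\lambda)$ in closed form. Assuming, as the surrounding text indicates, that the relevant PDE is $\partial_t u = Lu$ with $u(0, \cdot) = f$, this $u$ should satisfy it with $L = \frac{1}{2}\partial_x^2 - \lambda x \partial_x$. The sketch below checks this by finite differences; the names `LAM`, `u`, and `pde_residual` and the evaluation point are illustrative assumptions.

```python
import math

LAM = 1.0  # hypothetical mean-reversion rate for the OU process dX = -LAM*X dt + dW

def u(t, x):
    """u(t, x) = E_x[X_t^2] for the OU process, computed from its Gaussian law."""
    decay = math.exp(-2.0 * LAM * t)
    return decay * x * x + (1.0 - decay) / (2.0 * LAM)

def pde_residual(t, x, h=1e-4):
    """Finite-difference residual of du/dt - (1/2 u_xx - LAM * x * u_x)."""
    u_t = (u(t + h, x) - u(t - h, x)) / (2.0 * h)
    u_x = (u(t, x + h) - u(t, x - h)) / (2.0 * h)
    u_xx = (u(t, x + h) - 2.0 * u(t, x) + u(t, x - h)) / (h * h)
    return u_t - (0.5 * u_xx - LAM * x * u_x)

print(pde_residual(0.7, 1.3))  # close to 0, up to finite-difference error
print(u(0.0, 1.3), 1.3 ** 2)   # initial condition u(0, x) = f(x) = x^2
```

No simulation is needed here because the law of $X_t$ is known explicitly; for a general SDE the expectation would have to be estimated by sampling.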
The deepest result of this section adds a potential term $V$ to the PDE, which introduces a multiplicative exponential weight.
[quotetheorem:2120]
The Feynman–Kac formula is a profound bridge between probability and analysis. When $L = \Delta$ is the Laplacian (the generator of Brownian motion run at speed $\sqrt{2}$) and $V$ is a potential, the equation $\partial_t u = \Delta u + Vu$ is the Schrödinger equation (in imaginary time), and the formula represents its solutions as expectations weighted by a Feynman path integral — the origin of the name. The exponential factor $\exp(\int_0^t V(X_s) \, ds)$ penalises (or rewards) paths that pass through regions where $V$ is large and negative (or positive), giving $V$ its interpretation as a potential energy.
[remark: Scope of Feynman–Kac]
The version stated here assumes $V$ is bounded. The formula extends to unbounded potentials under suitable integrability conditions on $V$ and $X$, and it generalises further to time-dependent coefficients. In quantum mechanics, $V$ is typically a confining potential well, and the Feynman–Kac formula provides a rigorous basis for Feynman's path-integral formulation of quantum mechanics — a representation of the heat kernel (and, by analytic continuation, the Schrödinger propagator) as an integral over Brownian paths.
[/remark]
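The exponential weight can be explored numerically in a case with a known answer. The sketch below takes $X = W$ a standard Brownian motion started at $0$ (so $L = \frac{1}{2}\frac{d^2}{dx^2}$), $f \equiv 1$, and $V(x) = -x^2/2$, for which the classical Cameron–Martin formula gives $\mathbb{E}_0\bigl[\exp\bigl(-\tfrac{1}{2}\int_0^t W_s^2 \, ds\bigr)\bigr] = (\cosh t)^{-1/2}$. This potential is unbounded below, so it falls under the extension mentioned in the remark rather than the bounded case. The function name `feynman_kac_mc`, the left-endpoint Riemann sum for the time integral, and the sample sizes are illustrative assumptions.

```python
import math
import random

def feynman_kac_mc(x, t, f, V, n_steps=100, n_paths=5000, seed=0):
    """Estimate E_x[f(W_t) * exp(int_0^t V(W_s) ds)] for a standard Brownian motion W."""
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        w = x
        integral = 0.0
        for _ in range(n_steps):
            integral += V(w) * dt      # left-endpoint Riemann sum for int V(W_s) ds
            w += rng.gauss(0.0, sd)    # Brownian increment over one time step
        total += f(w) * math.exp(integral)
    return total / n_paths

# Test case with a known answer: f = 1, V(x) = -x^2 / 2, x = 0, t = 1.
estimate = feynman_kac_mc(0.0, 1.0, lambda w: 1.0, lambda w: -0.5 * w * w)
print(estimate, 1.0 / math.sqrt(math.cosh(1.0)))  # the two values should be close
```

Paths that wander far from the origin accumulate a large negative integral and are strongly down-weighted, which is the penalisation by the potential described above.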
## References
This course is based on lectures by R. Bauerschmidt (Part III, Lent 2018). Standard references for stochastic calculus include:
- I. Karatzas and S.E. Shreve, *Brownian Motion and Stochastic Calculus*, Springer, 1991.
- D. Revuz and M. Yor, *Continuous Martingales and Brownian Motion*, Springer, 1999.
- J.-F. Le Gall, *Brownian Motion, Martingales, and Stochastic Calculus*, Springer, 2016.
Cambridge III Stochastic Calculus and Applications
Created by admin on 4/24/2026 | Last updated on 6/1/2026