In 1827, the botanist Robert Brown peered through a microscope at pollen grains suspended in water and observed something deeply unsettling: each grain moved in a ceaseless, erratic, unpredictable jitter that no force he could identify explained. The motion never ceased, never settled into equilibrium, and — most puzzling — seemed to have no coherent direction at any moment. Einstein's 1905 explanation revealed the mechanism: the grain is bombarded from all sides by enormous numbers of water molecules moving according to thermal fluctuations, and no two successive collisions are correlated. The cumulative effect of this molecular chaos is a continuous path that wiggles at every scale.
The mathematical challenge this poses is severe. If we wish to describe the position $W_t$ of the grain at time $t$, we want a function of time that is continuous, because the grain cannot teleport, and everywhere irregular, because the molecular bombardments are uncorrelated across arbitrarily short intervals. These two requirements are in violent tension with classical calculus. The continuous functions one pictures from analysis, say a polynomial or a trigonometric function, are smooth on most of their domain. But Brownian paths are nowhere differentiable: at every single moment in time, the grain has no well-defined velocity. The velocity would require a limit $(W_{t+h} - W_t)/h$ as $h \to 0$, but since $W_{t+h} - W_t \sim \mathcal{N}(0, h)$, this ratio has variance $1/h$, which diverges as $h \to 0$. The increment has typical size $\sqrt{h}$, which shrinks more slowly than $h$ itself, so the difference quotient has typical size $1/\sqrt{h}$ and blows up.
What we face is a fundamental inadequacy of classical analysis when confronted with genuine randomness at every scale. The project of Brownian motion as a mathematical theory is to build a rigorous foundation for these objects: to prove they exist, to understand how regular their paths are, to develop a notion of integration with respect to them (since differentiation fails), and to connect them back to classical PDE theory in surprising ways.
[example: Random Walk Approximation]
Before the formal construction, a discrete model builds the intuition. Fix a time horizon $T > 0$ and divide it into $n$ equal steps of size $\Delta t = T/n$. At each step, toss a fair coin: heads moves the particle up by $\sqrt{\Delta t}$, tails moves it down by $\sqrt{\Delta t}$. Let $X_k$ be the position after $k$ steps, so
\begin{align*}
X_k = \sqrt{\Delta t} \sum_{j=1}^{k} \varepsilon_j,
\end{align*}
where $\varepsilon_1, \varepsilon_2, \ldots$ are i.i.d. with $\mathbb{P}(\varepsilon_j = +1) = \mathbb{P}(\varepsilon_j = -1) = 1/2$. Define $W^{(n)}$ by linearly interpolating the points $(k\Delta t, X_k)$. By the Central Limit Theorem, for any fixed $t$, $W^{(n)}_t \xrightarrow{d} \mathcal{N}(0, t)$ as $n \to \infty$. The scaling $\sqrt{\Delta t}$ is chosen so that the variance of $X_{\lfloor nt/T \rfloor}$, namely $\lfloor nt/T \rfloor \, \Delta t$, is approximately $t$ regardless of $n$, and increments over disjoint intervals are independent by construction. This suggests that the limit $n \to \infty$ should produce a continuous process with independent Gaussian increments, which are precisely the defining properties of Brownian motion. Donsker's invariance principle makes this convergence rigorous: the random functions $W^{(n)}$ converge in distribution, in the space of continuous functions $C([0,T])$ equipped with the supremum norm, to a process satisfying the four axioms stated in the definition below. The discrete random walk is the combinatorial skeleton; Brownian motion is the continuous path it approaches.
[/example]
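The scaling claim in this example can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `scaled_walk_endpoint` and `endpoint_variance` are illustrative choices, not part of the text above. It estimates the variance of the endpoint $X_n$ for several values of $n$, which should stay close to $T$ whatever $n$ is.

```python
import math
import random


def scaled_walk_endpoint(n, T, rng):
    """Return X_n for the coin-flip walk with n steps of size sqrt(T/n)."""
    step = math.sqrt(T / n)
    return step * sum(rng.choice((-1, 1)) for _ in range(n))


def endpoint_variance(n, T, trials, seed=0):
    """Empirical variance of the walk's endpoint over independent trials."""
    rng = random.Random(seed)
    samples = [scaled_walk_endpoint(n, T, rng) for _ in range(trials)]
    mean = sum(samples) / trials
    return sum((x - mean) ** 2 for x in samples) / (trials - 1)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Each estimate should be close to T = 2.0, independent of n.
        print(n, round(endpoint_variance(n, T=2.0, trials=500), 3))
```

The exact variance of $X_n$ is $n \cdot \Delta t = T$, so any deviation in the printed values is Monte Carlo error rather than a scaling effect.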
## Definition
The random walk example tells us what properties the limit should have. The formal definition extracts these properties into axioms.
A *stochastic process* is a family of random variables $(W_t)_{t \ge 0}$ on a single probability space $(\Omega, \mathcal{F}, \mathbb{P})$; each *sample path* $t \mapsto W_t(\omega)$ is a function of time for a fixed outcome $\omega \in \Omega$. The question of what sample paths look like — continuous? differentiable? — is a separate issue from the joint distribution of the random variables. The four defining axioms of Brownian motion capture the physical requirements cleanly.
[definition: Standard Brownian Motion]
A **standard Brownian motion** (or **Wiener process**) is a stochastic process $(W_t)_{t \ge 0}$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ satisfying:
1. **Initial condition:** $W_0 = 0$ almost surely.
2. **Independent increments:** For any $0 \le s < t$, the increment $W_t - W_s$ is independent of $\sigma(W_r : r \le s)$.
3. **Gaussian increments:** For any $0 \le s < t$, $W_t - W_s \sim \mathcal{N}(0, t - s)$.
4. **Continuous paths:** Almost surely, the map $t \mapsto W_t(\omega)$ is continuous.
[/definition]
The four axioms each play a distinct role. Axiom 1 pins down the starting point. Axiom 2 captures the memoryless character of molecular bombardment: what the grain does after time $s$ is independent of its entire history up to $s$. Axiom 3 fixes the scale: the increment over a time interval of length $h$ has variance exactly $h$, so longer intervals produce larger fluctuations in proportion to $\sqrt{h}$. Axiom 4 is what distinguishes a path process from a mere collection of random variables — it says that $W$ traces a genuine continuous curve, never teleporting between values.
From these axioms, the basic distributional properties follow immediately. For any $0 \le s \le t$:
\begin{align*}
\mathbb{E}[W_t] &= 0, \\
\mathbb{E}[W_s W_t] &= \min(s, t), \\
\mathbb{E}[|W_t - W_s|^2] &= t - s.
\end{align*}
The covariance formula $\mathbb{E}[W_s W_t] = \min(s,t)$ deserves attention. Assuming $s \le t$, write $W_t = W_s + (W_t - W_s)$, where the two terms are independent by Axiom 2, and $W_t - W_s$ has mean zero by Axiom 3. Then $\mathbb{E}[W_s W_t] = \mathbb{E}[W_s^2] + \mathbb{E}[W_s]\mathbb{E}[W_t - W_s] = s + 0 = s = \min(s,t)$.
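The covariance identity can also be sanity-checked by simulation. The sketch below is an illustration under the assumption that a pair $(W_s, W_t)$ is sampled directly from Axioms 2 and 3, as $W_s$ plus an independent Gaussian increment; the helper names are hypothetical.

```python
import math
import random


def sample_pair(s, t, rng):
    """Sample (W_s, W_t) for 0 <= s <= t from independent Gaussian increments."""
    w_s = rng.gauss(0.0, math.sqrt(s))
    w_t = w_s + rng.gauss(0.0, math.sqrt(t - s))
    return w_s, w_t


def empirical_covariance(s, t, trials, seed=0):
    """Monte Carlo estimate of E[W_s W_t]; both means are zero."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        w_s, w_t = sample_pair(s, t, rng)
        total += w_s * w_t
    return total / trials
```

For example, `empirical_covariance(0.5, 2.0, 50000)` should return a value near $\min(0.5, 2) = 0.5$, up to sampling error.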
[remark: Existence Is Non-Trivial]
The definition presupposes that such a process exists, but this must be proved: there is no a priori guarantee that a probability measure on path space satisfying all four axioms simultaneously can be constructed. The standard route uses Kolmogorov's extension theorem to build a process with the prescribed finite-dimensional distributions, followed by a regularity theorem to pass from that process to a modification with genuinely continuous paths.
[/remark]
## Construction and Path Regularity
Writing down the four axioms is one thing. Proving that a process satisfying all four simultaneously exists requires genuine work. The strategy has two steps: first construct a process with the correct finite-dimensional distributions using Kolmogorov's extension theorem, then upgrade to continuous paths using a general regularity criterion.
The key tool for the second step detects when a family of random variables indexed by a parameter can be jointly modified to have Hölder-continuous trajectories.
[quotetheorem:1170]
For Brownian motion, the Gaussian increment axiom gives all moments explicitly. The $2k$-th moment of a centred Gaussian with variance $\sigma^2$ equals $(2k-1)!! \, \sigma^{2k}$, where $(2k-1)!! = 1 \cdot 3 \cdot 5 \cdots (2k-1)$. Since $W_t - W_s \sim \mathcal{N}(0, t-s)$, taking $\sigma^2 = t - s$ and $k = 2$ gives:
\begin{align*}
\mathbb{E}[|W_t - W_s|^4] = 3(t - s)^2.
\end{align*}
The quoted criterion applies with $p=4$ and $\varepsilon=1$, since the fourth moment is bounded by a constant times $|t-s|^{1+\varepsilon}$. It therefore gives Hölder continuity for every exponent $\alpha<\varepsilon/p=1/4$. A more careful argument using higher moments gives Brownian paths that are $\alpha$-Hölder continuous for every $\alpha<1/2$. The exponent $1/2$ itself is not achieved: BM paths fail to be $1/2$-Hölder continuous on any interval.
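The role of higher moments can be made concrete with a small exact computation. The sketch below assumes the criterion in the form used in the paragraph above (a bound $C|t-s|^{1+\varepsilon}$ on the $p$-th moment gives Hölder exponents $\alpha < \varepsilon/p$) and applies it with $p = 2k$, where the moment formula gives $\varepsilon = k - 1$. The function names are illustrative.

```python
from fractions import Fraction


def double_factorial_odd(k):
    """Return (2k-1)!! = 1 * 3 * 5 * ... * (2k-1), with value 1 for k = 0."""
    result = 1
    for j in range(1, 2 * k, 2):
        result *= j
    return result


def increment_moment(k, h):
    """E|W_{s+h} - W_s|^(2k) = (2k-1)!! * h^k for a Brownian increment."""
    return double_factorial_odd(k) * h ** k


def holder_bound(k):
    """Exponent bound eps/p from the criterion with p = 2k and eps = k - 1."""
    return Fraction(k - 1, 2 * k)


if __name__ == "__main__":
    for k in (2, 3, 5, 50):
        print(k, double_factorial_odd(k), holder_bound(k))
```

The bound $(k-1)/(2k) = 1/2 - 1/(2k)$ equals $1/4$ for $k = 2$ and increases towards $1/2$ as $k$ grows, which is how the full range $\alpha < 1/2$ is reached.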
The range $\alpha < 1/2$ is essentially the optimal regularity statement, and it has an immediate consequence: a Lipschitz path would be $1/2$-Hölder on every bounded interval, so Brownian paths are not Lipschitz on any interval, and in particular not continuously differentiable on any interval. But the failure of differentiability is far more dramatic than mere failure of Lipschitz continuity. The following theorem shows that Brownian paths are differentiable at no point whatsoever.
[quotetheorem:3548]
The theorem is the precise version of the variance blow-up intuition from the opening paragraphs. Brownian increments over a short interval of length $h$ have typical size $\sqrt{h}$, so dividing by $h$ amplifies the fluctuation to scale $1/\sqrt{h}$. As the interval shrinks, the would-be velocity becomes more unstable rather than more settled. Thus Brownian motion is continuous enough to trace a path, but too irregular to carry an instantaneous velocity at any time.
The path regularity story closes by asking what kind of accumulated variation survives when ordinary arc length is infinite. To make that question precise, we need a second-order replacement for total variation: a limit formed from squared increments rather than absolute increments.
The *total variation* of a continuously differentiable path $f$ on $[0,T]$ equals $\int_0^T |f'(t)| \, d\mathcal{L}^1(t)$. Brownian paths have infinite total variation on every interval: the oscillations are too violent to accumulate a finite arc length. But a coarser notion does behave precisely: instead of summing the absolute increments $|X_{t_{i+1}} - X_{t_i}|$ of a path $X$ over a partition, sum their squares. The *quadratic variation* turns out to equal exactly $T$ for BM on $[0,T]$, neither zero (as for smooth functions) nor infinite (as total variation is), and this precise non-zero value is what forces the Itô correction term in stochastic calculus.
To use this second-order accumulation as an invariant rather than an informal calculation, we first need a definition that works for an arbitrary process and for refining partitions whose mesh tends to zero. The key issue is that the value should not depend on a favored grid: a stochastic path is observed through finer and finer time partitions, and the squared-increment sums must settle to the same limit as the largest time step disappears. The following definition packages that requirement by naming the partition sums, specifying the mesh condition, and choosing an $L^2(\mathbb{P})$ mode of convergence strong enough to control the random error in those sums.
[definition: Quadratic Variation of a Process]
Let $(X_t)_{t \in [0,T]}$ be a stochastic process. Given a partition $\Pi = \{0 = t_0 < t_1 < \cdots < t_n = T\}$ with mesh $|\Pi| = \max_i(t_{i+1} - t_i)$, define the quadratic variation sum along $\Pi$:
\begin{align*}
V^2(X; \Pi) = \sum_{i=0}^{n-1} (X_{t_{i+1}} - X_{t_i})^2.
\end{align*}
The **quadratic variation** of $X$ on $[0,T]$ is $[X, X]_T = \lim_{|\Pi| \to 0} V^2(X; \Pi)$ when this limit exists in $L^2(\mathbb{P})$.
[/definition]
For a smooth function $f$, the quadratic variation is zero: each increment is $O(|\Pi|)$, so the sum of squares is at most $\max_i |f(t_{i+1}) - f(t_i)|$ times the total variation of $f$, which is $O(|\Pi|) \to 0$. Brownian motion sits at the opposite extreme.
The next question is whether Brownian motion really has this second-order accumulation in a canonical refining scheme. The theorem below proves the statement along dyadic partitions, which are the standard nested partitions used to see the phenomenon cleanly. More general partition-invariance statements require additional hypotheses on the partitions, but the dyadic result already identifies the quantity that drives Itô calculus.
[quotetheorem:3543]
The result is often written in differential shorthand as $dW_t \cdot dW_t = dt$. This notation records the conceptual lesson rather than a literal product of differentials: along the canonical dyadic refinement, Brownian motion accumulates one unit of quadratic variation per unit of elapsed time.
[example: Quadratic Variation Versus Total Variation]
To see how drastically Brownian paths differ from smooth curves, consider $f(t) = \sin(t)$ on $[0, 2\pi]$. Its total variation is $4$ (one full oscillation: up from $0$ to $1$, down to $-1$, and back up to $0$). For the quadratic variation along the uniform partition with $n$ subintervals of length $2\pi/n$:
\begin{align*}
\sum_{i=0}^{n-1} \left(\sin\!\left(\frac{2\pi(i+1)}{n}\right) - \sin\!\left(\frac{2\pi i}{n}\right)\right)^2 \lesssim n \cdot \left(\frac{2\pi}{n}\right)^2 = \frac{4\pi^2}{n} \to 0.
\end{align*}
For a Brownian path on $[0, 2\pi]$, the same sum converges to $2\pi$ in $L^2$. The smooth function is "thin" in second-order variation; the Brownian path packs $2\pi$ worth of quadratic variation into the same interval. This distinction is what forces the Itô correction term in stochastic calculus.
[/example]
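The contrast in this example can be reproduced numerically. The sketch below is an illustration only: it samples a Brownian path on the uniform partition from independent Gaussian increments (an assumption consistent with Axioms 2 and 3), and the helper names are hypothetical. It reports both the sum of absolute increments and the sum of squared increments for each path.

```python
import math
import random


def variation_sums(values):
    """Return (sum of |increments|, sum of squared increments) of a sampled path."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return sum(abs(d) for d in diffs), sum(d * d for d in diffs)


def sine_path(n, T=2 * math.pi):
    """Sample sin on the uniform partition of [0, T] with n subintervals."""
    return [math.sin(T * i / n) for i in range(n + 1)]


def brownian_path(n, T=2 * math.pi, seed=0):
    """Sample a Brownian path on the same partition from Gaussian increments."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n)
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path


if __name__ == "__main__":
    for n in (2 ** 8, 2 ** 12, 2 ** 16):
        tv_sin, qv_sin = variation_sums(sine_path(n))
        tv_bm, qv_bm = variation_sums(brownian_path(n))
        print(n, round(tv_sin, 4), round(qv_sin, 6), round(tv_bm, 1), round(qv_bm, 3))
```

As $n$ grows, the sine column for absolute increments should stay near $4$ while its squared-increment column shrinks towards $0$. For the simulated Brownian path the absolute-increment sum should keep growing (roughly like $\sqrt{n}$) while the squared-increment sum settles near $2\pi \approx 6.283$.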
## The Markov Property and Stopping Times
Brownian motion has a remarkable memoryless structure: what the process does after time $s$ depends on the present value $W_s$ but not on the path $(W_r)_{r \le s}$ prior to $s$. This is the Markov property. It has a sharper form — the strong Markov property — where the deterministic time $s$ is replaced by a random time $\tau$ that depends on the path itself. To state these properties with precision, we need the language of filtrations, which organise the information available to an observer watching the path unfold in time.
[definition: Natural Filtration of Brownian Motion]
Let $(W_t)_{t \ge 0}$ be a standard Brownian motion. The **natural filtration** is the family of $\sigma$-algebras
\begin{align*}
\mathcal{F}_t^W = \sigma(W_s : 0 \le s \le t), \quad t \ge 0.
\end{align*}
The **usual augmentation** $(\mathcal{F}_t)_{t \ge 0}$ is obtained by completing the natural filtration with all $\mathbb{P}$-null sets and then taking a right-continuous version, often written $\mathcal{F}_t=\bigcap_{u>t}\mathcal{F}_u^W$ after completion.
[/definition]
The filtration $(\mathcal{F}_t)_{t \ge 0}$ represents the information accumulated by observing the Brownian path up to time $t$, with the usual technical corrections built in. Completion handles null events, while right-continuity is imposed separately; both are part of the standard hypotheses under which the strong Markov property behaves cleanly.
Once the information flow is fixed, the first structural question is whether Brownian motion really has no memory relative to that information. The Markov property makes this precise by comparing the future after a deterministic time with a fresh Brownian motion started from the observed present value.
[quotetheorem:1175]
The Markov property says that BM restarts from scratch at any deterministic time $s$. But many natural events — "the first time BM hits level $a$," "the first time BM exits an interval" — are defined by random times. To apply the restart property at these times, we must first characterise which random times are admissible. The key requirement is that we can determine whether the random time has already elapsed by examining the path up to any fixed moment $t$.
[definition: Stopping Time]
A random variable $\tau: \Omega \to [0, \infty]$ is a **stopping time** with respect to the filtration $(\mathcal{F}_t)_{t \ge 0}$ if
\begin{align*}
\{\tau \le t\} \in \mathcal{F}_t \quad \text{for all } t \ge 0.
\end{align*}
[/definition]
The condition $\{\tau \le t\} \in \mathcal{F}_t$ means: by observing the path up to time $t$, we can determine whether $\tau$ has already occurred. First hitting times of closed sets are stopping times — we can check whether $W$ has hit the target by time $t$. By contrast, $\sigma = \sup\{t \le 1 : W_t = 0\}$ is not a stopping time: knowing the path only up to time $t < 1$ does not reveal when the last zero before time $1$ will occur.
The reason stopping times matter is that many Brownian calculations begin at a random hitting or exit time, not at a clock time chosen in advance. To make those calculations legitimate, we need a theorem saying that Brownian motion can be restarted at such observable random times with the same law and independence properties it has at deterministic times.
The theorem is usually stated with the right-continuous filtration $\mathcal{F}_t^+ := \bigcap_{u>t}\mathcal{F}_u$. For a stopping time $T$, the notation $\mathcal{F}_T^+$ means the information observable up to the random time $T$ in this right-continuous filtration. This is the precise sigma-algebra from which the post-$T$ Brownian motion must be independent.
[quotetheorem:1180]
The strong Markov property is not automatic for all Markov processes, and its statement depends on the right-continuous filtration, which is why passing from the natural filtration to the usual augmentation $(\mathcal{F}_t)$ is not merely cosmetic.
The next issue is what information can be extracted from Brownian motion at arbitrarily small positive times. Events determined by every initial interval might appear subtle, but the restart symmetry forces a sharp dichotomy: such infinitesimal information cannot carry nontrivial probabilities.
[quotetheorem:1178]
The germ $\sigma$-algebra captures everything that can be deduced from the behaviour of $W$ on arbitrarily small initial intervals. The 0-1 law says this algebra is trivial: every event determined by the infinitesimal initial behaviour of BM is either impossible or certain. A striking consequence is that $\mathbb{P}(W_t > 0 \text{ for all small } t > 0) = 0$ — BM immediately oscillates around zero, crossing it infinitely often near $t = 0$, with probability one.
## Martingales and Brownian Motion
A martingale models a fair game: the expected future value, given the present, equals the present value. Brownian motion generates a rich family of martingales, and these martingales are the central tool for computing expectations of functionals of BM paths.
Three fundamental martingales arise from polynomial and exponential functions of $W_t$. Each encodes a different moment of the Gaussian distribution of BM increments, and each has a direct application.
[quotetheorem:1183]
The polynomial martingales control the first two moments, but many Brownian computations require a whole family of test martingales indexed by a parameter. Exponential martingales provide that family: they package Gaussian moment-generating functions into adapted processes and become the basic tool for change of measure arguments.
[quotetheorem:1184]
That $W_t$ is a martingale reflects the symmetry of BM: no drift in any direction. That $W_t^2 - t$ is a martingale says that although $\mathbb{E}[W_t^2] = t$ grows linearly in time, the growth is exactly accounted for by the $t$ correction. The exponential martingale $\exp(\sigma W_t - \sigma^2 t/2)$ — sometimes called the Doléans-Dade or Girsanov exponential — is fundamental in the theory of measure changes and stochastic differential equations.
[example: Verifying the Quadratic Martingale]
We verify that $M_t = W_t^2 - t$ is a martingale. For $s < t$, compute $\mathbb{E}[M_t \mid \mathcal{F}_s]$ directly. Write $W_t = W_s + (W_t - W_s)$ and expand:
\begin{align*}
W_t^2 = W_s^2 + 2W_s(W_t - W_s) + (W_t - W_s)^2.
\end{align*}
Taking conditional expectation given $\mathcal{F}_s$ and using independence of $W_t - W_s$ from $\mathcal{F}_s$ (with $\mathbb{E}[W_t - W_s] = 0$ and $\mathbb{E}[(W_t - W_s)^2] = t - s$):
\begin{align*}
\mathbb{E}[W_t^2 \mid \mathcal{F}_s] = W_s^2 + 2W_s \cdot 0 + (t - s) = W_s^2 + (t - s).
\end{align*}
Therefore:
\begin{align*}
\mathbb{E}[M_t \mid \mathcal{F}_s] = W_s^2 + (t - s) - t = W_s^2 - s = M_s. \qquad \checkmark
\end{align*}
[/example]
Martingale theory provides powerful tools for computing expected values at random times. The Optional Stopping Theorem states that the martingale property is preserved at stopping times, under appropriate integrability conditions.
The theorem is needed because stopping a martingale at an arbitrary random time can destroy integrability or introduce bias. Optional stopping identifies the hypotheses under which the fair-game intuition remains valid, which is exactly what is required for exit-time calculations with Brownian motion.
[quotetheorem:2109]
Applied simultaneously to the two martingales $W_t$ and $W_t^2 - t$, the Optional Stopping Theorem can compute both the exit distribution and the expected exit time for any bounded interval.
[example: First Passage Time via Optional Stopping]
Let $a, b > 0$ and define $\tau = \inf\{t \ge 0 : W_t \in \{-a, b\}\}$, the first exit time from the interval $(-a, b)$. The interval has length $a + b$, and each increment $W_{k+1} - W_k \sim \mathcal{N}(0,1)$ has positive probability of exceeding $a + b$ in absolute value, which forces an exit during $[k, k+1]$ whatever the position at time $k$; independence of increments over disjoint intervals then gives $\tau < \infty$ almost surely by a geometric-trials argument. Apply the Optional Stopping Theorem to $M_t = W_t$:
\begin{align*}
\mathbb{E}[W_\tau] = \mathbb{E}[W_0] = 0.
\end{align*}
Since $W_\tau \in \{-a, b\}$, letting $p = \mathbb{P}(W_\tau = b)$:
\begin{align*}
pb + (1 - p)(-a) = 0 \implies p = \frac{a}{a + b}.
\end{align*}
Now apply optional stopping to $N_t = W_t^2 - t$, giving $\mathbb{E}[W_\tau^2 - \tau] = 0$, so
\begin{align*}
\mathbb{E}[\tau] = \mathbb{E}[W_\tau^2] = p \cdot b^2 + (1-p) \cdot a^2 = \frac{a b^2 + b a^2}{a + b} = ab.
\end{align*}
The expected exit time from $(-a, b)$ is $ab$. When $a = b$, this gives $\mathbb{E}[\tau] = a^2$: a symmetric interval of half-width $a$ has expected exit time $a^2$, consistent with the $\sqrt{t}$ scaling of Brownian motion.
[/example]
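A discrete analogue gives a quick numerical check of both formulas. The sketch below replaces Brownian motion by a simple symmetric random walk on the integers, which is a swapped-in stand-in and not the continuous process itself. For integer $a$ and $b$, the same optional stopping argument applied to the walk $S_n$ and to $S_n^2 - n$ gives exit probability $a/(a+b)$ and expected exit time $ab$. The function names are illustrative.

```python
import random


def exit_walk(a, b, rng):
    """Run a simple symmetric walk from 0 until it hits -a or b.

    Returns (exit position, number of steps taken).
    """
    position, steps = 0, 0
    while -a < position < b:
        position += rng.choice((-1, 1))
        steps += 1
    return position, steps


def estimate_exit(a, b, trials, seed=0):
    """Estimate P(exit at b) and E[exit time] by Monte Carlo."""
    rng = random.Random(seed)
    hits_b, total_steps = 0, 0
    for _ in range(trials):
        position, steps = exit_walk(a, b, rng)
        hits_b += position == b
        total_steps += steps
    return hits_b / trials, total_steps / trials
```

With $a = 2$ and $b = 3$, `estimate_exit(2, 3, 20000)` should return values near $2/5$ and $6$, up to sampling error.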
## Itô's Formula and Stochastic Integration
Classical calculus rests on two pillars: differentiation and integration. For Brownian paths, differentiation fails completely. Integration, however, can be salvaged — but only by abandoning the classical Riemann-Stieltjes theory and building something new. The obstruction is precisely the infinite total variation of BM paths: the Stieltjes integral $\int_0^t f(s) \, dg(s)$ requires $g$ to have bounded variation, which Brownian paths do not.
The Itô stochastic integral is defined as a limit of Riemann sums using left-endpoint evaluation:
\begin{align*}
\int_0^t f(W_s) \, dW_s = \lim_{|\Pi| \to 0} \sum_{i} f(W_{t_i})(W_{t_{i+1}} - W_{t_i}) \quad \text{in } L^2(\mathbb{P}).
\end{align*}
The choice of left endpoints — rather than right endpoints or midpoints — is not arbitrary. It is precisely what makes the resulting integral a martingale: $\mathbb{E}\!\left[\int_0^t f(W_s) \, dW_s\right] = 0$ for all adapted integrands $f$ satisfying $\int_0^t \mathbb{E}[f(W_s)^2] \, d\mathcal{L}^1(s) < \infty$. This is the Itô convention, and the resulting calculus has a built-in symmetry that makes expectations tractable.
The nonzero quadratic variation $[W,W]_t = t$ is the source of the key difference from classical calculus. For a smooth function $f$, expanding via Taylor's theorem:
\begin{align*}
f(W_{t+h}) - f(W_t) \approx f'(W_t)(W_{t+h} - W_t) + \tfrac{1}{2} f''(W_t)(W_{t+h} - W_t)^2.
\end{align*}
In classical calculus the quadratic term is negligible: for a smooth path $x$, the squared increment $(x_{t+h} - x_t)^2$ is $O(h^2)$. For Brownian motion, $(W_{t+h} - W_t)^2 \approx h$, so the quadratic term contributes at first order and cannot be discarded. Summing over a partition and taking the limit yields Itô's formula.
[quotetheorem:3546]
The extra term $\frac{1}{2} \int_0^t f''(W_s) \, d\mathcal{L}^1(s)$ is the **Itô correction**. It arises from the nonzero quadratic variation, encoded in the heuristic $dW_t \cdot dW_t = dt$. For a classical smooth path with zero quadratic variation, there is no correction and the formula reduces to the ordinary fundamental theorem of calculus. The Itô correction is not a flaw in stochastic integration; it is a precise and necessary feature of how Brownian paths accumulate quadratic variation.
[example: Itô's Formula for the Square]
Apply Itô's formula with $f(x) = x^2$, giving $f'(x) = 2x$ and $f''(x) = 2$:
\begin{align*}
W_t^2 = 0 + \int_0^t 2W_s \, dW_s + \frac{1}{2} \int_0^t 2 \, d\mathcal{L}^1(s) = 2\int_0^t W_s \, dW_s + t.
\end{align*}
Rearranging: $\int_0^t W_s \, dW_s = \frac{1}{2}(W_t^2 - t)$. Compare with the classical formula $\int_0^t x \, dx = \frac{1}{2}t^2$: the stochastic answer $\frac{1}{2}(W_t^2 - t)$ contains the Itô correction $-t/2$. The correction is not accidental — it makes $\int_0^t W_s \, dW_s$ a martingale (expectation zero), while the naïve guess $W_t^2/2$ has expectation $t/2 \ne 0$.
[/example]
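The effect of the endpoint choice can be seen on a single simulated path. The sketch below is a minimal illustration on a uniform grid, with Gaussian increments standing in for the Brownian path, and the name `ito_sums` is hypothetical. It accumulates the left-endpoint Riemann sum used in the Itô integral together with the right-endpoint sum for comparison.

```python
import math
import random


def ito_sums(n, T=1.0, seed=0):
    """Return (left-endpoint sum, right-endpoint sum, W_T) on a uniform grid."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n)
    w, left, right = 0.0, 0.0, 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, sd)
        left += w * dw           # integrand evaluated before the increment
        right += (w + dw) * dw   # integrand evaluated after the increment
        w += dw
    return left, right, w
```

Algebraically, the left sum equals $\frac{1}{2}(W_T^2 - \sum_i (\Delta W_i)^2)$ and the right sum equals $\frac{1}{2}(W_T^2 + \sum_i (\Delta W_i)^2)$. On a fine grid the left sum should therefore be close to $\frac{1}{2}(W_T^2 - T)$, and the two sums should differ by approximately $T$, the quadratic variation.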
When $f$ depends on both time and space, the formula gains a time-derivative term. This version, the space-time Itô formula, is the one most directly connected to PDE theory, because the $d\mathcal{L}^1(s)$ integral involves the operator $\partial_t + \frac{1}{2}\partial_{xx}$: the backward heat operator, whose spatial part $\frac{1}{2}\partial_{xx}$ is the generator of Brownian motion.
The time-dependent formula is needed to answer a new question: if we observe a smooth function along the random curve $t \mapsto W_t$, which part of its change is random martingale fluctuation and which part is deterministic drift? The theorem below gives that decomposition and identifies the drift as the heat operator applied to the function.
[quotetheorem:3549]
If $f$ satisfies the backward heat equation $\partial_t f + \frac{1}{2}\partial_{xx} f = 0$, the two $d\mathcal{L}^1(s)$ integrands cancel exactly, and $f(t, W_t)$ reduces to a pure stochastic integral — hence a martingale. This is the key mechanism connecting PDE theory to probabilistic representations: solutions to equations involving the generator of BM become martingales along Brownian paths, and their expectations encode formulas for the solutions themselves.
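As a quick check of this mechanism, consider two illustrative choices that are not taken from the text above: the polynomial $f(t,x) = x^3 - 3tx$ and the exponential $g(t,x) = \exp(\sigma x - \sigma^2 t/2)$. Assuming the space-time formula has its standard form, with drift integrand $\partial_t f + \frac{1}{2}\partial_{xx} f$, both drifts vanish:
\begin{align*}
\partial_t f + \tfrac{1}{2}\partial_{xx} f &= -3x + \tfrac{1}{2} \cdot 6x = 0, \\
\partial_t g + \tfrac{1}{2}\partial_{xx} g &= -\tfrac{\sigma^2}{2}\, g + \tfrac{\sigma^2}{2}\, g = 0.
\end{align*}
So $W_t^3 - 3tW_t$ reduces to the stochastic integral $\int_0^t (3W_s^2 - 3s) \, dW_s$, which is consistent with $\mathbb{E}[W_t^3] = 0$ and $\mathbb{E}[W_t] = 0$, and the second computation recovers the exponential martingale $\exp(\sigma W_t - \sigma^2 t/2)$ from the Martingales section.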
## The Reflection Principle and the Running Maximum
The path symmetry of Brownian motion about any level it has already visited gives a powerful tool for computing distributions of path extrema. The idea is the following: once BM reaches level $a > 0$, the strong Markov property implies that the future behaviour is symmetric about $a$. Therefore, paths that reach $a$ and finish above $a$ at time $t$ are matched in number with paths that reach $a$ and then finish below $a$ at time $t$ — a reflection gives a bijection between them.
[quotetheorem:1181]
The reflection principle computes the distribution of the running maximum $M_t = \sup_{0 \le s \le t} W_s$. Since $\{M_t \ge a\} = \{\tau_a \le t\}$, the theorem gives:
\begin{align*}
\mathbb{P}(M_t \ge a) = 2\mathbb{P}(W_t \ge a) = 2\!\left(1 - \Phi\!\left(\frac{a}{\sqrt{t}}\right)\right),
\end{align*}
where $\Phi$ is the standard normal CDF. The density of $M_t$ is therefore:
\begin{align*}
\mathbb{P}(M_t \in da) = \sqrt{\frac{2}{\pi t}} \exp\!\left(-\frac{a^2}{2t}\right) \, da, \quad a > 0.
\end{align*}
The reflection principle simultaneously yields the distribution of the first passage time $\tau_a$. Since $\{\tau_a \le t\} = \{M_t \ge a\}$, differentiating $\mathbb{P}(\tau_a \le t)$ in $t$ gives the density:
\begin{align*}
f_{\tau_a}(t) = \frac{a}{\sqrt{2\pi t^3}} \exp\!\left(-\frac{a^2}{2t}\right), \quad t > 0.
\end{align*}
This is the density of a Lévy distribution, the one-sided stable law of index $1/2$ (the driftless limit of the inverse Gaussian family). Since $f_{\tau_a}(t)$ decays like $t^{-3/2}$ as $t \to \infty$, the integral $\int_0^\infty t \cdot f_{\tau_a}(t) \, d\mathcal{L}^1(t)$ diverges: $\mathbb{E}[\tau_a] = \infty$. Brownian motion reaches every level with probability one (it is point-recurrent in one dimension), but the time it takes has such heavy tails that its expectation is infinite.
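The two formulas above can be cross-checked numerically: integrating the first-passage density over $(0, t)$ should reproduce $\mathbb{P}(M_t \ge a)$. The sketch below uses only the standard library, with $\Phi$ computed from `math.erf`; the function names and the midpoint rule are illustrative choices.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_max_at_least(a, t):
    """P(M_t >= a) = 2 * (1 - Phi(a / sqrt(t))) for a > 0, t > 0."""
    return 2.0 * (1.0 - normal_cdf(a / math.sqrt(t)))


def first_passage_density(a, t):
    """Density of tau_a at time t > 0."""
    return a / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-a * a / (2.0 * t))


def integrate_density(a, t, steps=20000):
    """Midpoint-rule approximation of the integral of the density over (0, t)."""
    h = t / steps
    return h * sum(first_passage_density(a, (i + 0.5) * h) for i in range(steps))
```

For instance, `integrate_density(1.0, 2.0)` and `prob_max_at_least(1.0, 2.0)` should agree to several decimal places. Increasing `t` shows the probability creeping towards $1$ only slowly, which reflects the heavy tail of $\tau_a$.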
[example: Joint Distribution of the Maximum and Terminal Value]
The reflection principle gives the joint distribution of $(W_t, M_t)$. For $x \le a$ with $a > 0$:
\begin{align*}
\mathbb{P}(W_t \le x,\, M_t \ge a) = \mathbb{P}(W_t \ge 2a - x).
\end{align*}
The argument invokes the strong Markov property at $\tau_a$: after time $\tau_a$, reflect the path about $a$ to obtain another BM. Under the reflected path, the event $\{W_t \le x\}$ (for the original path) corresponds to $\{W_t \ge 2a - x\}$ (for the reflected path). Differentiating the reflected-tail identity with respect to $x$ gives, for $y\le a$,
\begin{align*}
\mathbb{P}(W_t \in dy,\, M_t \ge a)
= \frac{1}{\sqrt{2\pi t}}\exp\!\left(-\frac{(2a-y)^2}{2t}\right)\,dy.
\end{align*}
The extra derivative in the first-passage density appears only when differentiating in the time variable $t$, not when differentiating this terminal-value identity in $y$.
[/example]
## Brownian Motion and the Heat Equation
The deepest structural result in the theory connects Brownian motion to classical PDE theory: BM and the heat equation are two faces of the same mathematical object. The connection runs in both directions. The probability density of $W_t$ satisfies the heat equation. Conversely, solutions to the heat equation with suitable initial data can be represented as expectations over Brownian paths. This duality has profound consequences for both probability theory and the analysis of PDE.
The **transition density** of BM started at $x \in \mathbb{R}$ is the probability density of $W_t$ given $W_0 = x$:
\begin{align*}
p(t, x, y) = \frac{1}{\sqrt{2\pi t}} \exp\!\left(-\frac{(y - x)^2}{2t}\right), \quad t > 0, \ x, y \in \mathbb{R}.
\end{align*}
This is the Gaussian kernel — and that it satisfies the heat equation is no accident. It is the first concrete instance of the BM-PDE duality.
[quotetheorem:3550]
The converse direction is equally striking: every bounded continuous initial datum gives a solution to the heat equation representable as an expectation over Brownian paths. The analytic question is how to evolve initial data forward in time; Brownian motion answers it by averaging the initial profile over the random displacement at time $t$. This makes the heat semigroup visible as a probabilistic smoothing operator.
[quotetheorem:3551]
The theorem explains why Brownian averaging is exactly the heat-flow operation. Random displacement spreads the initial data by a Gaussian kernel, while the martingale structure prevents any extra drift term from appearing. Thus the heat equation is not merely analogous to Brownian motion; it is the deterministic shadow of Brownian averaging.
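The averaging operation can be tried on a concrete initial profile. The sketch below assumes the heat equation is normalised as $\partial_t u = \frac{1}{2}\partial_{xx} u$, matching the transition density above, and approximates $u(t,x) = \mathbb{E}[g(x + W_t)]$ by integrating $g$ against the Gaussian kernel. The test profile $g = \cos$, for which $u(t,x) = e^{-t/2}\cos x$, and the function names are illustrative choices.

```python
import math


def transition_density(t, x, y):
    """Gaussian kernel p(t, x, y) of Brownian motion started at x."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def heat_solution(g, t, x, width=10.0, steps=4000):
    """Approximate E[g(x + W_t)] by a midpoint rule over x +/- width * sqrt(t)."""
    half = width * math.sqrt(t)
    h = 2.0 * half / steps
    total = 0.0
    for i in range(steps):
        y = x - half + (i + 0.5) * h
        total += g(y) * transition_density(t, x, y)
    return total * h
```

Comparing `heat_solution(math.cos, 0.7, 0.3)` with `math.exp(-0.35) * math.cos(0.3)` shows the Gaussian average damping the cosine profile exactly as the heat flow does.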
The BM-PDE duality reaches its most powerful form as the Feynman-Kac formula, which handles elliptic equations with potential terms. The new difficulty is that the equation now contains a zeroth-order term that weights paths according to how long they spend in different regions. Feynman-Kac accounts for this by inserting an exponential path weight before taking the boundary expectation.
[quotetheorem:3552]
[remark: Boundary Regularity in Feynman-Kac]
The conclusion that $u \in C(\bar{U})$ — meaning the boundary data is attained continuously — requires every boundary point of $U$ to be *regular* in the sense of potential theory: informally, Brownian motion started near $x \in \partial U$ must exit $U$ near $x$ immediately rather than spending time inside. Regularity is guaranteed by the exterior cone condition, which holds for all $C^1$ boundaries and all convex domains. At an irregular boundary point (such as the tip of an inward cusp), the probabilistic formula still produces a solution in $C^2(U)$, but the boundary data need not be attained continuously.
[/remark]
In the special case $q \equiv 0$, the potential term disappears, and the formula reduces to a clean statement about harmonic functions: the solution to Laplace's equation is simply the expected boundary value along Brownian exit paths. This deserves to be isolated because it turns the Dirichlet problem into a hitting-distribution problem. Instead of solving a PDE directly, one can understand the value at an interior point by asking where Brownian motion first exits the domain.
[quotetheorem:3553]
This is a probabilistic interpretation of the mean value property of harmonic functions: the value of $u$ at an interior point $x$ is the average of $f$ over the boundary, weighted by the **harmonic measure** — the exit distribution of BM started at $x$. The harmonic measure depends on both the geometry of $U$ and the starting point $x$; it equals the uniform measure on $\partial U$ when $U$ is a ball and $x$ is its centre, but not in general.
[example: Dirichlet Problem on an Interval]
Consider the Dirichlet problem on $U = (0, 1) \subset \mathbb{R}$: find $u \in C^2((0,1)) \cap C([0,1])$ with $-\frac{1}{2}u'' = 0$ on $(0,1)$ and boundary conditions $u(0) = 0$, $u(1) = 1$. The unique solution is $u(x) = x$.
Kakutani's formula gives $u(x) = \mathbb{E}^x[u(W_{\tau_U})]$ where $W_{\tau_U} \in \{0, 1\}$. Since $u(0) = 0$ and $u(1) = 1$:
\begin{align*}
u(x) = 0 \cdot \mathbb{P}^x(W_{\tau_U} = 0) + 1 \cdot \mathbb{P}^x(W_{\tau_U} = 1) = \mathbb{P}^x(W_{\tau_U} = 1).
\end{align*}
From the optional stopping computation in the Martingales section with $a = x$ and $b = 1 - x$, $\mathbb{P}^x(W_{\tau_U} = 1) = x$. So Kakutani's formula gives $u(x) = x$, matching the analytic solution exactly. The probabilistic and analytic approaches yield identical answers, confirming the Feynman-Kac duality in this one-dimensional setting.
[/example]
The Feynman-Kac connection has consequences that reach well beyond computing explicit solutions. The maximum principle for elliptic operators follows from the observation that BM must exit every bounded domain, so expectations over exit values inherit the extremal properties of $f$. The spectrum of the Laplacian on a domain encodes the distribution of BM exit times through the eigenvalue expansion of the heat kernel. Monte Carlo methods for solving Laplace's equation reduce to simulating BM paths until they hit the boundary — a conceptually simple algorithm with a deep probabilistic foundation.
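The Monte Carlo idea can be sketched in a few lines. As a stand-in for Brownian motion, the code below uses a simple random walk on the grid $\{0, \ldots, m\}^2$, which is a discretisation assumption and not the continuous process, and it averages the boundary data over the exit points. The function names and the test boundary data are hypothetical.

```python
import random

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def exit_point(i, j, m, rng):
    """Walk on the grid {0..m}^2 from (i, j) until a boundary node is reached."""
    while 0 < i < m and 0 < j < m:
        di, dj = rng.choice(STEPS)
        i, j = i + di, j + dj
    return i, j


def dirichlet_estimate(boundary_value, i, j, m, trials, seed=0):
    """Average the boundary data over the exit points of independent walks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += boundary_value(*exit_point(i, j, m, rng))
    return total / trials
```

A convenient test case is the boundary data $f(i, j) = i/m$, the restriction of the harmonic function $x$. The first coordinate of the walk is a bounded martingale, so the exact answer at the node $(i, j)$ is $i/m$, and `dirichlet_estimate(lambda i, j: i / 10, 3, 5, 10, 4000)` should return a value near $0.3$.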
[illustration:brownian-exit-harmonic-measure]