A single [random variable](/page/Random%20Variable) describes one uncertain observation. Many probabilistic questions are not about one observation, but about an evolving record: the position of a particle after each collision, the wealth of a gambler after each bet, the number of calls arriving before time $t$, or the price of an asset as information accumulates. In all these cases, the object of interest is not just the marginal law at one time. The order of the observations, their dependence, and the information revealed along the way are part of the mathematics.
The first warning is that the distribution at a fixed time can hide the event we care about. A gambler playing a fair game might be back at net winnings $0$ after ten plays, yet the path could have passed through ruin along the way. A diffusion might have the same one-time Gaussian law as another process, while having different correlations and different hitting behaviour. A stochastic process is the language that keeps the whole random evolution visible.
[example: A Fair Random Walk Remembers Its Path]
Let $Y_1,Y_2,\dots$ be i.i.d. random variables with $\mathbb P(Y_i=1)=\mathbb P(Y_i=-1)=1/2$, and define $S_0=0$ and $S_n=\sum_{i=1}^n Y_i$. At time $2$,
\begin{align*}
S_2=Y_1+Y_2.
\end{align*}
The four possible pairs $(Y_1,Y_2)$ are $(1,1)$, $(1,-1)$, $(-1,1)$, and $(-1,-1)$. By independence, each has probability $(1/2)(1/2)=1/4$. Therefore $S_2$ takes the values $2,0,-2$, and the value $0$ occurs exactly for the two pairs $(1,-1)$ and $(-1,1)$, so
\begin{align*}
\mathbb P(S_2=0)=\mathbb P((Y_1,Y_2)=(1,-1))+\mathbb P((Y_1,Y_2)=(-1,1))=\frac{1}{4}+\frac{1}{4}=\frac{1}{2}.
\end{align*}
The terminal event $S_2=1$ is impossible, because $Y_1+Y_2$ is always one of $-2,0,2$; hence $\mathbb P(S_2=1)=0$. By contrast, the event that the walk has reached level $1$ by time $2$ is
\begin{align*}
\left\{\max\{S_0,S_1,S_2\}\ge1\right\}.
\end{align*}
Since $S_0=0$ and $S_1=Y_1$, if $Y_1=1$ then the maximum is already at least $1$. If $Y_1=-1$, then $S_1=-1$ and $S_2=-1+Y_2$ is either $0$ or $-2$, so the maximum among $S_0,S_1,S_2$ is $0$ and the walk has not hit $1$. Thus
\begin{align*}
\left\{\max\{S_0,S_1,S_2\}\ge1\right\}=\{Y_1=1\},
\end{align*}
and therefore
\begin{align*}
\mathbb P\left(\max\{S_0,S_1,S_2\}\ge1\right)=\mathbb P(Y_1=1)=\frac{1}{2}.
\end{align*}
For example, the path with $Y_1=-1$ and $Y_2=1$ has $S_0=0$, $S_1=-1$, and $S_2=0$, so it ends at the same terminal value as the path $(1,-1)$ but never reaches $1$. This is why the random evolution must be treated as the whole family $(S_n)_{n\ge0}$, not just through the law of one terminal value.
[/example]
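The enumeration in this example is small enough to check mechanically. The following Python sketch lists every step sequence of the two-step walk and computes both the terminal probabilities and the running-maximum probability. The helper names `paths` and `prob` are illustrative and are not notation used elsewhere on this page.

```python
from fractions import Fraction
from itertools import product

def paths(n):
    """Yield every walk (S_0, ..., S_n) of the simple symmetric random walk."""
    for steps in product((1, -1), repeat=n):
        walk = [0]
        for y in steps:
            walk.append(walk[-1] + y)
        yield tuple(walk)

def prob(event, n):
    """Exact probability of a path event; each of the 2**n paths has equal weight."""
    hits = sum(1 for walk in paths(n) if event(walk))
    return Fraction(hits, 2 ** n)

p_end_zero = prob(lambda w: w[-1] == 0, 2)    # P(S_2 = 0)
p_end_one = prob(lambda w: w[-1] == 1, 2)     # P(S_2 = 1)
p_hit_one = prob(lambda w: max(w) >= 1, 2)    # P(max(S_0, S_1, S_2) >= 1)
print(p_end_zero, p_end_one, p_hit_one)
```

The event passed to `prob` is a function of the whole path, which is why the same helper can evaluate both a terminal event and a hitting event.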
The random walk already contains most of the conceptual ingredients of the subject. There is an index set, a state space, a [probability space](/page/Probability%20Space), a family of random variables, and a growing body of information. Changing any one of these changes the theory: discrete time behaves differently from continuous time, countable state spaces behave differently from path spaces, and processes observed with a filtration support questions that cannot even be stated for an isolated random variable.
## Definition
The definition packages the minimal data needed to talk about random evolution. The index set records the parameter along which the process evolves; it is often time, but the same language also handles random fields indexed by space, stochastic sequences indexed by $\mathbb N$, and families indexed by arbitrary sets.
[definition: Stochastic Process]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space, let $(E, \mathcal E)$ be a measurable space, and let $T$ be a non-empty index set. A stochastic process with index set $T$ and state space $(E,\mathcal E)$ is a family of random variables
\begin{align*}
X_t : (\Omega,\mathcal F) \to (E,\mathcal E), \qquad t \in T.
\end{align*}
It is denoted by $(X_t)_{t \in T}$.
[/definition]
When $T=\mathbb N$ or $T=\{0,1,2,\dots\}$, the process is a discrete-time process. When $T=[0,\infty)$ or an interval in $\mathbb R$, it is a continuous-time process. The word continuous in continuous-time refers to the index set, not to the sample paths. The formal definition is deliberately lean; the next section separates the choices hidden inside it: time, state, and path behaviour.
## Time, State, and Sample Paths
The simplest classification asks what kind of time and what kind of state space are being used. This distinction is not cosmetic. A [Markov chain](/page/Markov%20Chain) on a [countable set](/page/Countable%20Set) is governed by transition probabilities between states; [Brownian motion](/page/Brownian%20Motion) in continuous time is governed by Gaussian increments and path regularity; a random field indexed by $\mathbb R^n$ is shaped by spatial dependence rather than temporal order.
### Discrete Time
A discrete-time process is a random sequence. Because the index set is countable, many events involving all times can be built from countable unions and intersections. This makes measurability less fragile than in continuous time.
[example: Bernoulli Process as Repeated Observation]
Let $X_1,X_2,\dots$ be i.i.d. random variables with $X_n\sim\operatorname{Ber}(p)$ for some $p\in[0,1]$, so each $X_n$ takes values in $\{0,1\}$ with
\begin{align*}
\mathbb P(X_n=1)=p \quad \text{and} \quad \mathbb P(X_n=0)=1-p.
\end{align*}
The process $(X_n)_{n\in\mathbb N}$ records the outcome of each trial separately: $X_n=1$ means success at time $n$, and $X_n=0$ means failure at time $n$.
Define the accumulated process by
\begin{align*}
S_n=\sum_{i=1}^n X_i.
\end{align*}
For example,
\begin{align*}
S_1=X_1.
\end{align*}
Also,
\begin{align*}
S_2=X_1+X_2.
\end{align*}
Since each summand is either $0$ or $1$, the value of $S_2$ counts how many of the first two trials were successes. Explicitly, if $(X_1,X_2)=(0,0)$ then
\begin{align*}
S_2=0+0=0.
\end{align*}
If $(X_1,X_2)=(1,0)$ then
\begin{align*}
S_2=1+0=1.
\end{align*}
If $(X_1,X_2)=(0,1)$ then
\begin{align*}
S_2=0+1=1.
\end{align*}
If $(X_1,X_2)=(1,1)$ then
\begin{align*}
S_2=1+1=2.
\end{align*}
Thus $S_n$ counts the number of successes among the first $n$ observations. The original Bernoulli process keeps the individual success-failure record, while the partial-sum process keeps the accumulated count up to each time.
[/example]
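As a minimal sketch of the two records kept in this example, the Python code below builds the partial sums from a success-failure record and computes the exact law of $S_n$ by enumerating all $2^n$ records. The function names and the value $p=1/3$ are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product

def partial_sums(xs):
    """Return (S_1, ..., S_n) for a 0/1 record (X_1, ..., X_n)."""
    total, sums = 0, []
    for x in xs:
        total += x
        sums.append(total)
    return sums

def law_of_count(n, p):
    """Exact law of S_n for n >= 1, by enumerating all 2**n success-failure records."""
    law = {k: Fraction(0) for k in range(n + 1)}
    for xs in product((0, 1), repeat=n):
        weight = Fraction(1)
        for x in xs:
            weight *= p if x == 1 else 1 - p   # independence: multiply trial probabilities
        law[partial_sums(xs)[-1]] += weight
    return law

p = Fraction(1, 3)          # illustrative success probability
law = law_of_count(2, p)
print(partial_sums((1, 0, 1, 1)))
print(law)
```

The record `(1, 0, 1, 1)` and its partial sums carry different information: the sums can be recovered from the record, and here the record can also be recovered from successive differences of the sums.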
### Continuous Time
Continuous time introduces a new tension: there are uncountably many random variables $X_t$. Pointwise measurability of each $X_t$ does not automatically guarantee good measurability of the map $(t,\omega)\mapsto X_t(\omega)$, and path regularity becomes a theorem or an assumption rather than an afterthought. To integrate a process over time or treat it as a random function of two variables, we need a measurability condition on the joint map.
[definition: Jointly Measurable Process]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, let $T$ be a measurable space with sigma-algebra $\mathcal T$, and let $(E,\mathcal E)$ be a measurable space. A stochastic process $(X_t)_{t\in T}$ on $(\Omega,\mathcal F,\mathbb P)$ is jointly measurable if the map $X:T\times\Omega\to E$ defined by
\begin{align*}
X(t,\omega)=X_t(\omega)
\end{align*}
is $(\mathcal T\otimes\mathcal F,\mathcal E)$-measurable.
[/definition]
Joint measurability is the entry ticket for applying Fubini-type arguments to random evolutions. It is stronger than asking each $X_t$ to be measurable as a random variable, and the distinction matters in continuous-time theory.
When $E$ carries a topology, one may ask for paths that are continuous, right-continuous, or have left limits. These conditions let us treat sample paths with analytic tools, so they must be named separately from the underlying probabilistic definition. There are two levels of path regularity: a property may hold for every $\omega$, or it may hold outside a single null set. Most continuous-time probability uses the almost-sure version, while the stricter version is useful when a process has already been modified on a null set.
[definition: Continuous Sample Paths]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, let $T\subset\mathbb R$, let $(E,d)$ be a [metric space](/page/Metric%20Space) with Borel sigma-algebra $\mathcal B(E)$, and let $(X_t)_{t\in T}$ be an $E$-valued stochastic process on $(\Omega,\mathcal F,\mathbb P)$. The process has continuous sample paths if for every $\omega\in\Omega$, the function $t\mapsto X_t(\omega)$ from $T$ to $E$ is continuous.
[/definition]
This condition is often too strong if demanded for every $\omega$, because stochastic processes are usually identified up to null sets. In standard process theory, the phrase "has continuous sample paths" often means the following almost-sure condition; whenever the distinction matters, the page will say which version is intended.
[definition: Almost Surely Continuous Sample Paths]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, let $T\subset\mathbb R$, let $(E,d)$ be a metric space with Borel sigma-algebra $\mathcal B(E)$, and let $(X_t)_{t\in T}$ be an $E$-valued stochastic process on $(\Omega,\mathcal F,\mathbb P)$. The process has almost surely continuous sample paths if there exists an event $\Omega_0\in\mathcal F$ with $\mathbb P(\Omega_0)=1$ such that for every $\omega\in\Omega_0$, the function $t\mapsto X_t(\omega)$ from $T$ to $E$ is continuous.
[/definition]
The next sections explain how null sets, finite observations, and whole paths interact.
## Dependence and Finite-Dimensional Distributions
### Finite Snapshots
A process is not determined by its one-time marginal laws. Dependence across times is encoded by the joint distributions of finite collections. These finite snapshots are the part of a process that can be specified directly before a path-space construction is available.
[definition: Finite-Dimensional Distribution]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, let $(E,\mathcal E)$ be a measurable space, and let $(X_t)_{t\in T}$ be a stochastic process with coordinate maps $X_t:(\Omega,\mathcal F)\to(E,\mathcal E)$. For $n\in\mathbb N$ and $t_1,\dots,t_n\in T$, the finite-dimensional distribution of $X$ at $(t_1,\dots,t_n)$ is the [probability measure](/page/Probability%20Measure) on $(E^n,\mathcal E^{\otimes n})$ given by
\begin{align*}
\mu_{t_1,\dots,t_n}(A)=\mathbb P\left((X_{t_1},\dots,X_{t_n})\in A\right)
\end{align*}
for every $A\in\mathcal E^{\otimes n}$.
[/definition]
Finite-dimensional distributions record all joint laws visible at finitely many times. They determine many calculations, but they do not by themselves describe path regularity. The following example motivates why joint laws, rather than separate marginal laws, are the first serious invariant of a process.
[example: Marginals Do Not Determine Dependence]
Let $Z$ be a real-valued random variable with $Z\sim\mathcal N(0,1)$, and define $X_1=Z$ and $X_2=Z$. Let $Y_1,Y_2$ be independent random variables with $Y_1\sim\mathcal N(0,1)$ and $Y_2\sim\mathcal N(0,1)$. Since $X_1=Z$ and $X_2=Z$, both $X_1$ and $X_2$ have distribution $\mathcal N(0,1)$; by assumption, $Y_1$ and $Y_2$ also have distribution $\mathcal N(0,1)$. Thus all four one-time marginal distributions agree.
For the first process, equality holds pointwise:
\begin{align*}
X_1(\omega)=Z(\omega)=X_2(\omega)
\end{align*}
for every $\omega\in\Omega$, so
\begin{align*}
\mathbb P(X_1=X_2)=\mathbb P(\Omega)=1.
\end{align*}
For the second process, set $D=Y_1-Y_2$. Since $Y_1$ and $Y_2$ are independent standard normal random variables, the density of $D$ is the convolution
\begin{align*}
f_D(u)=\int_{\mathbb R}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\frac{1}{\sqrt{2\pi}}e^{-(x-u)^2/2}\,dx.
\end{align*}
Hence $D$ has a density with respect to [Lebesgue measure](/page/Lebesgue%20Measure), and the singleton $\{0\}$ has Lebesgue measure $0$, so
\begin{align*}
\mathbb P(Y_1=Y_2)=\mathbb P(D=0)=\int_{\{0\}} f_D(u)\,du=0.
\end{align*}
Thus the processes have identical one-time marginal laws, but their two-time joint laws are different: one has perfect dependence, while the other has no positive probability of equality at the two times.
[/example]
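The contrast between the two pairs can also be seen numerically. The Python sketch below is a simulation, not a proof: it draws samples of $(X_1,X_2)$ and $(Y_1,Y_2)$ with a fixed seed and compares their empirical correlations. The sample size, the seed, and the helper `sample_corr` are assumptions chosen for illustration.

```python
import random

def sample_corr(pairs):
    """Empirical correlation of a list of (u, v) pairs."""
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    cov = sum((u - mean_u) * (v - mean_v) for u, v in pairs) / n
    var_u = sum((u - mean_u) ** 2 for u, _ in pairs) / n
    var_v = sum((v - mean_v) ** 2 for _, v in pairs) / n
    return cov / (var_u * var_v) ** 0.5

rng = random.Random(0)      # fixed seed so the run is reproducible
n = 20_000

# First process: X_1 = X_2 = Z, a single standard normal draw used twice.
x_pairs = [(z, z) for z in (rng.gauss(0.0, 1.0) for _ in range(n))]
# Second process: Y_1 and Y_2 are independent standard normal draws.
y_pairs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]

corr_x = sample_corr(x_pairs)
corr_y = sample_corr(y_pairs)
frac_equal_x = sum(u == v for u, v in x_pairs) / n
print(corr_x, corr_y, frac_equal_x)
```

A histogram of any single coordinate would look the same for all four variables; only a statistic of the pair, such as the correlation, separates the two processes.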
The example shows that finite-dimensional laws are the data we should specify when building a process abstractly. There is a small bookkeeping issue: the definition above uses ordered tuples $(t_1,\dots,t_n)$, while the finite-dimensional projections of a path-space product $\prod_{t\in T} E$ are naturally indexed by finite subsets of $T$. To pass between these descriptions, the ordered laws must be compatible with relabelling of coordinates as well as with forgetting coordinates. Equivalently, after choosing a measure $\mu_F$ on $E^{F}$ for each finite coordinate set $F\subset T$, all coordinate projections must give the same lower-dimensional laws. The next theorem explains when such finite-dimensional specifications are coherent enough to come from an actual process.
[quotetheorem:9959]
The theorem constructs a process with the requested finite-dimensional distributions, but it does not promise continuous paths, cadlag paths, or adaptedness to a useful filtration. Those extra structures are supplied by later regularity theorems or by more explicit constructions.
### Versions of a Process
Finite-dimensional distributions cannot detect changes made to a process on fixed-time null sets. To make pathwise statements, we need language for processes that agree at each time and for processes that agree as entire paths.
[definition: Modification]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, let $(E,\mathcal E)$ be a standard Borel space, and let $(X_t)_{t\in T}$ and $(Y_t)_{t\in T}$ be $E$-valued stochastic processes on $(\Omega,\mathcal F,\mathbb P)$. The process $Y$ is a modification of $X$ if
\begin{align*}
\mathbb P(X_t=Y_t)=1
\end{align*}
for every $t\in T$.
[/definition]
A modification agrees with the original process at every fixed time outside a null set that may depend on the time. This is enough for finite-dimensional distributions, but it may not preserve statements involving all times simultaneously. To express equality as entire paths, we need a stronger notion using one exceptional null set for the whole index set.
[definition: Indistinguishable Processes]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, let $(E,\mathcal E)$ be a standard Borel space, and let $(X_t)_{t\in T}$ and $(Y_t)_{t\in T}$ be $E$-valued stochastic processes on $(\Omega,\mathcal F,\mathbb P)$. The processes $X$ and $Y$ are indistinguishable if there exists an event $\Omega_0\in\mathcal F$ with $\mathbb P(\Omega_0)=1$ such that
\begin{align*}
X_t(\omega)=Y_t(\omega) \qquad \text{for every } t\in T \text{ and every } \omega\in\Omega_0.
\end{align*}
[/definition]
Indistinguishability is the pathwise version of equality. For countable $T$, modifications are indistinguishable, because a [countable union of null sets](/theorems/9708) is null. For uncountable $T$, that argument no longer works.
[example: Modification Need Not Mean Indistinguishable]
Let $\Omega=[0,1]$, let $\mathcal F=\mathcal B([0,1])$, and let $\mathbb P=\mathcal L^1|_{[0,1]}$. For each $t\in[0,1]$, define
\begin{align*}
X_t(\omega)=0, \qquad Y_t(\omega)=\mathbb 1_{\{t\}}(\omega)
\end{align*}
for $\omega\in[0,1]$. The singleton $\{t\}$ is Borel, so $Y_t$ is measurable, and $X_t$ is measurable because it is constant.
Fix $t\in[0,1]$. Since $\mathcal L^1(\{t\})=0$, we have
\begin{align*}
\mathbb P(Y_t=1)=\mathbb P(\{t\})=0.
\end{align*}
Therefore
\begin{align*}
\mathbb P(Y_t=0)=1-\mathbb P(Y_t=1)=1-0=1.
\end{align*}
Also $X_t=0$ for every $\omega$, so
\begin{align*}
\{X_t=Y_t\}=\{Y_t=0\}.
\end{align*}
Hence
\begin{align*}
\mathbb P(X_t=Y_t)=\mathbb P(Y_t=0)=1.
\end{align*}
Because this holds for every fixed $t\in[0,1]$, the process $Y$ is a modification of $X$.
However, pathwise agreement fails at every sample point. If $\omega\in[0,1]$ and we choose $t=\omega$, then
\begin{align*}
X_t(\omega)=0.
\end{align*}
At the same time, $t=\omega$ implies $\omega\in\{t\}$, so
\begin{align*}
Y_t(\omega)=\mathbb 1_{\{t\}}(\omega)=1.
\end{align*}
Thus $X_t(\omega)\ne Y_t(\omega)$ for this choice of $t$. Consequently,
\begin{align*}
\{\omega\in[0,1]: X_t(\omega)=Y_t(\omega)\text{ for every }t\in[0,1]\}=\varnothing.
\end{align*}
This event has probability $0$, not $1$, so $X$ and $Y$ are not indistinguishable. The example shows that in uncountable time, equality at each fixed time can still fail to give equality of whole sample paths.
[/example]
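The two viewpoints in this example can be mimicked in a few lines of Python. This is only a sketch: floating-point draws from `random.random()` stand in for uniform points of $[0,1]$ and only approximate Lebesgue measure, and the names `X`, `Y`, and `t_fixed` are chosen here for illustration.

```python
import random

def X(t, omega):
    """The zero process: X_t(omega) = 0 for every t and omega."""
    return 0

def Y(t, omega):
    """Indicator of the singleton {t}: Y_t(omega) = 1 exactly when omega == t."""
    return 1 if omega == t else 0

rng = random.Random(1)
omegas = [rng.random() for _ in range(1000)]   # stand-ins for uniform sample points

# Pathwise view: for each sampled omega, the time t = omega witnesses a disagreement.
disagree_on_diagonal = all(X(w, w) != Y(w, w) for w in omegas)

# Fixed-time view: at one fixed t, count sampled omegas where the processes differ.
t_fixed = 0.5
mismatches_at_fixed_t = sum(X(t_fixed, w) != Y(t_fixed, w) for w in omegas)
print(disagree_on_diagonal, mismatches_at_fixed_t)
```

The disagreement time in the first check is chosen after seeing $\omega$, whereas the second check fixes the time before sampling; that order of quantifiers is the difference between indistinguishability and modification.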
This example is a permanent warning in continuous time. Whenever a statement quantifies over all $t$, equality at each fixed time may be too weak.
## Filtrations, Adaptedness, and Information
### Growing Information
A stochastic process is often observed progressively. At time $t$, the observer should know the past but not the future. A filtration formalises this growing information and makes conditional prediction meaningful.
[definition: Filtration]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and let $T\subset\mathbb R$ be an ordered index set. A filtration is a family $(\mathcal F_t)_{t\in T}$ of sub-sigma-algebras of $\mathcal F$ such that
\begin{align*}
\mathcal F_s \subset \mathcal F_t \qquad \text{whenever } s\le t.
\end{align*}
[/definition]
A filtration may contain outside information: signals, enlarged observations, or even future data inserted by hand. When the process itself is meant to be the only observed object, we need the minimal filtration generated by its past values; otherwise adaptedness and stopping-time statements can accidentally depend on information the model was not supposed to reveal.
[definition: Natural Filtration]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, let $(E,\mathcal E)$ be a measurable space, and let $(X_t)_{t\in T}$ be a stochastic process with coordinate maps $X_t:(\Omega,\mathcal F)\to(E,\mathcal E)$, where $T\subset\mathbb R$. The natural filtration of $X$ is the family $(\mathcal F_t^X)_{t\in T}$ defined by
\begin{align*}
\mathcal F_t^X=\sigma(X_s:s\in T,\ s\le t).
\end{align*}
[/definition]
Working with the natural filtration is useful when the process is the only source of information. Enlarging the filtration can change stopping times, martingales, and independence of increments, so the filtration must be stated whenever those notions appear. Once a filtration is fixed, the next question is whether the process can actually be observed without looking ahead.
[definition: Adapted Process]
Let $(\mathcal F_t)_{t\in T}$ be a filtration on $(\Omega,\mathcal F,\mathbb P)$. A stochastic process $(X_t)_{t\in T}$ with state space $(E,\mathcal E)$ and coordinate maps
\begin{align*}
X_t:(\Omega,\mathcal F)\to(E,\mathcal E), \qquad t\in T,
\end{align*}
is adapted to $(\mathcal F_t)_{t\in T}$ if $X_t$ is $\mathcal F_t$-measurable for every $t\in T$.
[/definition]
Adaptedness says that the value at time $t$ is observable using the information available at time $t$. It rules out processes that look into the future while pretending to be present-time quantities.
[example: A Non-Adapted Look-Ahead Process]
Let $Y_1,Y_2,\dots$ be i.i.d. real-valued random variables, let $\mathcal F_n=\sigma(Y_1,\dots,Y_n)$, and define $X_n=Y_{n+1}$. To make the failure of adaptedness visible, suppose the common law is not degenerate, so there is a Borel set $B\subset\mathbb R$ with $0<\mathbb P(Y_1\in B)<1$.
Fix $n$. Since $Y_{n+1}$ is independent of $Y_1,\dots,Y_n$, the event $\{Y_{n+1}\in B\}$ is independent of $\mathcal F_n$. If $X_n$ were $\mathcal F_n$-measurable, then $\{X_n\in B\}\in\mathcal F_n$. But $X_n=Y_{n+1}$, so
\begin{align*}
\{X_n\in B\}=\{Y_{n+1}\in B\}.
\end{align*}
Thus this event would be independent of itself, giving
\begin{align*}
\mathbb P(X_n\in B)=\mathbb P\left(\{X_n\in B\}\cap\{X_n\in B\}\right)=\mathbb P(X_n\in B)^2.
\end{align*}
The equation $q=q^2$ has only the solutions $q=0$ and $q=1$, but
\begin{align*}
\mathbb P(X_n\in B)=\mathbb P(Y_{n+1}\in B)=\mathbb P(Y_1\in B)\in(0,1).
\end{align*}
This contradiction shows that $X_n$ is not $\mathcal F_n$-measurable, so the process is not adapted to $(\mathcal F_n)_{n\in\mathbb N}$.
If instead we enlarge the filtration to
\begin{align*}
\mathcal G_n=\sigma(Y_1,\dots,Y_{n+1}),
\end{align*}
then $X_n=Y_{n+1}$ is $\mathcal G_n$-measurable for every $n$ by the definition of the generated sigma-algebra. The same formula $X_n=Y_{n+1}$ is therefore a look-ahead process relative to $(\mathcal F_n)$, but an adapted process relative to $(\mathcal G_n)$.
[/example]
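On a finite sample space, measurability with respect to $\sigma(Y_1,\dots,Y_k)$ reduces to a concrete test: the function must take a single value on each set of outcomes sharing the same first $k$ coordinates. The Python sketch below applies that test to the look-ahead process, under the simplifying assumptions that the $Y_i$ take values $\pm1$ and that only four steps are recorded. The helper `is_measurable` is hypothetical.

```python
from itertools import product

def is_measurable(f, k, outcomes):
    """Check whether f depends only on the first k coordinates of each outcome.

    On a finite sample space, f is sigma(Y_1, ..., Y_k)-measurable exactly when
    it is constant on every set of outcomes sharing the same (Y_1, ..., Y_k).
    """
    seen = {}
    for omega in outcomes:
        prefix = omega[:k]
        value = f(omega)
        if seen.setdefault(prefix, value) != value:
            return False
    return True

# Sample space: all records (Y_1, ..., Y_4) of +1/-1 values.
outcomes = list(product((1, -1), repeat=4))

n = 2
X_n = lambda omega: omega[n]    # omega[n] is Y_{n+1} because tuples are 0-indexed

adapted_to_F = is_measurable(X_n, n, outcomes)        # F_n = sigma(Y_1, ..., Y_n)
adapted_to_G = is_measurable(X_n, n + 1, outcomes)    # G_n = sigma(Y_1, ..., Y_{n+1})
print(adapted_to_F, adapted_to_G)
```

The same function passes or fails the test depending only on how many coordinates the filtration reveals, which mirrors the change from $(\mathcal F_n)$ to $(\mathcal G_n)$.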
### Random Times
Random times are central because many natural questions ask when an event first occurs: first ruin, first passage above a barrier, first arrival after a deadline. A valid random time must be decidable from present information, which motivates the stopping-time condition.
[definition: Stopping Time]
Let $(\mathcal F_t)_{t\in T}$ be a filtration, where $T\subset[0,\infty)$. Equip $T\cup\{\infty\}$ with the sigma-algebra generated by the sets $\{u\in T\cup\{\infty\}:u\le t\}$ for $t\in T$. A measurable map $\tau:\Omega\to T\cup\{\infty\}$ is a [stopping time](/page/Stopping%20Time) with respect to $(\mathcal F_t)_{t\in T}$ if
\begin{align*}
\{\tau\le t\}\in\mathcal F_t
\end{align*}
for every $t\in T$.
[/definition]
The condition says that by time $t$ we can decide whether the time has already occurred. In discrete time, the equivalent condition $\{\tau=t\}\in\mathcal F_t$ is often convenient, but the definition using $\{\tau\le t\}$ is the standard form that behaves well in continuous time.
[remark: Continuous-Time Conventions for Stopping Times]
For a general ordered subset $T\subset[0,\infty)$, some advanced texts use variants involving $\{\tau<t\}$, or impose right-continuity conditions on the filtration to make these formulations agree in the expected way. This page uses the standard $\{\tau\le t\}$ convention as the basic definition; a more refined treatment of continuous-time filtrations should state the exact regularity assumptions on $(\mathcal F_t)_{t\in T}$.
[/remark]
The first hitting time of a discrete process shows why the definition is natural: whether the event has already happened can be checked from the observations made so far.
[example: First Hitting Time of a Random Walk]
Let $(S_n)_{n\ge0}$ be the simple symmetric random walk and let $(\mathcal F_n^S)_{n\ge0}$ be its natural filtration, so
\begin{align*}
\mathcal F_n^S=\sigma(S_0,S_1,\dots,S_n).
\end{align*}
For $a\in\mathbb Z$, define
\begin{align*}
\tau_a=\inf\{m\ge0:S_m=a\},
\end{align*}
with the convention that $\inf\varnothing=\infty$. We show that $\tau_a$ is a stopping time by proving that $\{\tau_a\le n\}\in\mathcal F_n^S$ for every $n\ge0$.
Fix $n\ge0$. The event $\{\tau_a\le n\}$ means that the walk has hit $a$ at one of the times $0,1,\dots,n$. Therefore
\begin{align*}
\{\tau_a\le n\}=\bigcup_{k=0}^n\{S_k=a\}.
\end{align*}
For each $k\le n$, the random variable $S_k$ is one of the generators of $\mathcal F_n^S=\sigma(S_0,\dots,S_n)$, so
\begin{align*}
\{S_k=a\}=S_k^{-1}(\{a\})\in\mathcal F_n^S.
\end{align*}
Since a sigma-algebra is closed under finite unions,
\begin{align*}
\bigcup_{k=0}^n\{S_k=a\}\in\mathcal F_n^S.
\end{align*}
Hence
\begin{align*}
\{\tau_a\le n\}\in\mathcal F_n^S.
\end{align*}
This holds for every $n\ge0$, so $\tau_a$ is a stopping time with respect to the natural filtration. The point is that deciding whether the level $a$ has already been hit by time $n$ requires only the observed values $S_0,S_1,\dots,S_n$, not any future value of the walk.
[/example]
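The argument can be checked by brute force on short paths. The Python sketch below computes the hitting time on every walk of a fixed length and confirms that the event $\{\tau_a\le n\}$ evaluated on the full path agrees with the same event evaluated from $S_0,\dots,S_n$ alone. The values `a = 1`, `n = 3`, and `horizon = 6` are illustrative, and on a finite path `math.inf` only means the level was not hit within the horizon.

```python
import math
from itertools import product

def hitting_time(walk, a):
    """First index m with walk[m] == a, or math.inf if the level is not hit."""
    for m, s in enumerate(walk):
        if s == a:
            return m
    return math.inf

def walk_from_steps(steps):
    """Build (S_0, ..., S_n) from +1/-1 steps, starting at S_0 = 0."""
    walk = [0]
    for y in steps:
        walk.append(walk[-1] + y)
    return walk

a, n, horizon = 1, 3, 6
decided_by_prefix = True
for steps in product((1, -1), repeat=horizon):
    walk = walk_from_steps(steps)
    # {tau_a <= n} evaluated on the full path ...
    full = hitting_time(walk, a) <= n
    # ... and evaluated using only the observed values S_0, ..., S_n.
    prefix = any(s == a for s in walk[: n + 1])
    decided_by_prefix = decided_by_prefix and (full == prefix)
print(decided_by_prefix)
```

A time such as the last visit to $a$ before the horizon would fail this kind of check, because deciding it requires values of the walk after time $n$.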
## Martingales and Markov Processes
### Fair Prediction
Two organizing principles dominate stochastic process theory. Martingales formalise fair prediction under a filtration. Markov processes formalise the idea that the future depends on the present state rather than the full past. They answer different questions and often interact.
A martingale is the model of a quantity whose current value is the best conditional prediction of its future value. This is not the same as independence: a martingale may have highly dependent increments, but its conditional drift vanishes.
[definition: Martingale]
Let $(\mathcal F_t)_{t\in T}$ be a filtration on $(\Omega,\mathcal F,\mathbb P)$, where $T\subset[0,\infty)$. A process $(M_t)_{t\in T}$ whose coordinate maps are
\begin{align*}
M_t:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R)), \qquad t\in T,
\end{align*}
is a martingale with respect to $(\mathcal F_t)_{t\in T}$ if it is adapted, $\mathbb E[|M_t|]<\infty$ for every $t\in T$, and
\begin{align*}
\mathbb E[M_t\mid\mathcal F_s]=M_s
\end{align*}
whenever $s\le t$.
[/definition]
Martingales turn [conditional expectation](/page/Conditional%20Expectation) into a dynamic conservation law. They are the backbone of optional stopping, stochastic integration, change of measure, and many concentration inequalities.
[example: Centered Random Walk as a Martingale]
Let $Y_1,Y_2,\dots$ be i.i.d. integrable random variables with $\mathbb E[Y_i]=0$, and define $M_0=0$ and
\begin{align*}
M_n=\sum_{i=1}^n Y_i
\end{align*}
for $n\ge1$. Let $\mathcal F_n=\sigma(Y_1,\dots,Y_n)$. Since $M_n$ is a finite sum of $\mathcal F_n$-measurable random variables, $M_n$ is $\mathcal F_n$-measurable, so the process is adapted. Also, by the triangle inequality and integrability of the increments,
\begin{align*}
\mathbb E[|M_n|]\le \sum_{i=1}^n \mathbb E[|Y_i|]<\infty.
\end{align*}
Now fix integers $0\le n\le m$. If $m=n$, then $\mathbb E[M_m\mid\mathcal F_n]=\mathbb E[M_n\mid\mathcal F_n]=M_n$ because $M_n$ is $\mathcal F_n$-measurable. If $m>n$, then
\begin{align*}
M_m=M_n+\sum_{i=n+1}^m Y_i.
\end{align*}
By linearity of conditional expectation,
\begin{align*}
\mathbb E[M_m\mid\mathcal F_n]=\mathbb E[M_n\mid\mathcal F_n]+\sum_{i=n+1}^m \mathbb E[Y_i\mid\mathcal F_n].
\end{align*}
The first term is $M_n$, since $M_n$ is $\mathcal F_n$-measurable. For each $i>n$, the random variable $Y_i$ is independent of $\mathcal F_n=\sigma(Y_1,\dots,Y_n)$, so
\begin{align*}
\mathbb E[Y_i\mid\mathcal F_n]=\mathbb E[Y_i]=0.
\end{align*}
Therefore
\begin{align*}
\mathbb E[M_m\mid\mathcal F_n]=M_n+\sum_{i=n+1}^m 0=M_n.
\end{align*}
Thus $(M_n)_{n\ge0}$ is a martingale with respect to $(\mathcal F_n)_{n\ge0}$. The vanishing mean of each future increment becomes a conditional statement: given the information available at time $n$, the best prediction of $M_m$ is the current value $M_n$.
[/example]
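For fair $\pm1$ steps, the conditional expectation in this example is a finite average and can be verified exactly. The Python sketch below fixes the first $n$ steps, averages $M_m$ over all equally likely continuations, and compares the result with $M_n$. The choice of $\pm1$ steps, the values `n = 2` and `m = 5`, and the function name are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def conditional_expectation(prefix, m):
    """E[M_m | Y_1, ..., Y_n = prefix] for i.i.d. fair +1/-1 steps, by enumeration."""
    n = len(prefix)
    m_n = sum(prefix)
    # All continuations (Y_{n+1}, ..., Y_m) are equally likely by independence.
    futures = list(product((1, -1), repeat=m - n))
    total = sum(Fraction(m_n + sum(future)) for future in futures)
    return total / len(futures)

n, m = 2, 5
martingale_holds = all(
    conditional_expectation(prefix, m) == sum(prefix)
    for prefix in product((1, -1), repeat=n)
)
print(martingale_holds)
```

Averaging over continuations is valid here because the future steps are independent of the observed prefix; for dependent increments the weights would have to come from the conditional law instead.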
Optional stopping is one of the first reasons martingales are useful. It explains when a fair process remains fair after stopping according to available information.
[quotetheorem:1153]
The boundedness hypothesis protects the statement from gambling paradoxes caused by unbounded waiting. The theorem is not saying every betting strategy has finite expected payoff; it says that stopping within a fixed horizon preserves the martingale expectation.
[example: Why Boundedness Matters]
Let $(S_n)_{n\ge0}$ be the simple symmetric random walk with $S_0=0$, and define
\begin{align*}
\tau_1=\inf\{n\ge0:S_n=1\}.
\end{align*}
On the event $\{\tau_1<\infty\}$, the definition of $\tau_1$ gives $S_{\tau_1}=1$. By the standard recurrence theorem for the one-dimensional simple symmetric random walk, $\mathbb P(\tau_1<\infty)=1$. Hence
\begin{align*}
S_{\tau_1}=1
\end{align*}
almost surely, and therefore
\begin{align*}
\mathbb E[S_{\tau_1}]=1\cdot \mathbb P(S_{\tau_1}=1)=1\cdot 1=1.
\end{align*}
Meanwhile $S_0=0$ deterministically, so
\begin{align*}
\mathbb E[S_0]=0.
\end{align*}
The stopping time $\tau_1$ is not bounded. Indeed, for every $N\in\mathbb N$, the event
\begin{align*}
\{Y_1=-1,\dots,Y_N=-1\}
\end{align*}
has probability $(1/2)^N>0$, and on this event the path satisfies $S_k=-k<1$ for $0\le k\le N$. Thus $\tau_1>N$ on this event, so
\begin{align*}
\mathbb P(\tau_1>N)\ge \mathbb P(Y_1=-1,\dots,Y_N=-1)=\left(\frac12\right)^N>0.
\end{align*}
Therefore no deterministic $N$ can satisfy $\tau_1\le N$ almost surely. The equality $\mathbb E[S_{\tau_1}]=1\ne0=\mathbb E[S_0]$ is exactly the failure that the boundedness hypothesis in optional stopping is designed to exclude.
[/example]
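The contrast with a bounded stopping time can be made concrete. The Python sketch below replaces $\tau_1$ by the truncated time $\min(\tau_1,N)$, which is bounded by $N$, and computes by exact enumeration both the expected stopped value and the probability that level $1$ has not been reached by time $N$. The function names and the tested horizons are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def stopped_value(steps, level=1):
    """Value of the walk at min(tau_level, N), where N = len(steps)."""
    s = 0
    for y in steps:
        if s == level:      # the level was already hit: stop moving
            break
        s += y
    return s

def stopped_summary(N):
    """Return (E[S at min(tau_1, N)], P(tau_1 > N)) by enumerating all 2**N paths."""
    total, not_hit = Fraction(0), 0
    for steps in product((1, -1), repeat=N):
        value = stopped_value(steps)
        total += value
        not_hit += value != 1   # a stopped value other than 1 means tau_1 > N
    return total / 2 ** N, Fraction(not_hit, 2 ** N)

for N in (1, 3, 6, 10):
    mean, tail = stopped_summary(N)
    print(N, mean, tail)
```

For each tested horizon the stopped mean stays at $0$ while the tail probability remains positive: the rare paths that have not yet reached $1$ sit far enough below zero to balance the paths that stopped at $1$.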
### State-Based Prediction
Markov processes focus on state-based prediction. Instead of retaining the whole history, the process carries enough information in its present state to determine the conditional law of the future. To state this without hiding measure-theoretic assumptions, we first isolate the object that records a measurable family of probability laws.
[definition: Transition Kernel]
Let $(E,\mathcal E)$ be a measurable space. A transition kernel from $E$ to $E$ is a function $K:E\times\mathcal E\to[0,1]$ such that for each $x\in E$, $A\mapsto K(x,A)$ is a probability measure on $(E,\mathcal E)$, and for each $A\in\mathcal E$, $x\mapsto K(x,A)$ is $\mathcal E$-measurable.
[/definition]
A transition kernel is the stochastic analogue of an evolution operator. On a countable state space it is the same data as a transition matrix; on a general state space it supplies the measurability needed to integrate conditional laws. The remaining question is how to express memorylessness without pretending that the past is irrelevant as an event. The right statement is conditional: once the present state is known, the conditional law of a later observation is supplied by a kernel depending only on that state.
[definition: Markov Process]
Let $T\subset[0,\infty)$, let $(E,\mathcal E)$ be a standard Borel space, and let $(X_t)_{t\in T}$ be an $E$-valued stochastic process adapted to a filtration $(\mathcal F_t)_{t\in T}$. The process is Markov with respect to $(\mathcal F_t)_{t\in T}$ if for every $s,t\in T$ with $s\le t$ there exists a transition kernel $P_{s,t}:E\times\mathcal E\to[0,1]$ such that for every $A\in\mathcal E$,
\begin{align*}
\mathbb P(X_t\in A\mid\mathcal F_s)=P_{s,t}(X_s,A)
\end{align*}
almost surely.
[/definition]
The Markov property is a statement about conditional laws. It does not say that the future is independent of the past; it says that after conditioning on the present state, the earlier past adds no further predictive information. The definition above uses the one-time transition formulation: it controls the conditional law of $X_t$ given $\mathcal F_s$.
[remark: One-Time and Pathwise Markov Properties]
The one-time formulation is enough for many calculations with transition kernels and finite-dimensional distributions, but it should not be silently identified with every standard process-level Markov property. A stronger pathwise Markov property asks for the conditional law of the whole future process $(X_u)_{u\ge s}$ given $\mathcal F_s$, usually as a probability law on a specified path space. Passing from one-time conditional laws to that pathwise statement can require additional regularity, measurable conditional probabilities, and compatibility of the chosen path space, so this page uses the one-time version unless a later result explicitly states a path-space form.
[/remark]
In discrete time, a time-homogeneous transition kernel gives the same one-step law at every time. This is a substantial simplification: instead of specifying a separate joint law on $E^{n+1}$ for every time horizon, we specify the initial law and one measurable rule for moving from the current state to the next state. The point of the next theorem is that these local transition rules are not merely marginal information. Together with the Markov property, they determine the full joint law of the chain at times $0,1,\dots,n$.
[quotetheorem:9960]
This formula explains why Markov chains are tractable: a long joint law factors into an initial law and repeated one-step transitions. In finite state spaces, the same statement becomes matrix multiplication.
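A minimal sketch of this factorisation on a finite state space, assuming the joint law takes the usual product form $\mu(x_0)K(x_0,x_1)\cdots K(x_{n-1},x_n)$: the Python code below computes path probabilities from an initial law and a transition matrix, then checks that the time-$n$ marginal of the joint law agrees with $n$ vector-matrix multiplications. The numerical values and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def path_probability(path, initial, K):
    """mu(x_0) * K(x_0, x_1) * ... * K(x_{n-1}, x_n) for a time-homogeneous chain."""
    prob = initial[path[0]]
    for x, y in zip(path, path[1:]):
        prob *= K[x][y]
    return prob

def step(law, K):
    """One vector-matrix multiplication: the law at the next time."""
    states = range(len(law))
    return [sum(law[x] * K[x][y] for x in states) for y in states]

# Illustrative numbers only; they do not come from the text.
initial = [Fraction(1, 4), Fraction(3, 4)]
K = [[Fraction(2, 3), Fraction(1, 3)],
     [Fraction(1, 2), Fraction(1, 2)]]

n = 3
all_paths = list(product(range(2), repeat=n + 1))
total_mass = sum(path_probability(p, initial, K) for p in all_paths)

# Marginal at time n from the joint law, versus n repeated matrix multiplications.
marginal = [sum(path_probability(p, initial, K) for p in all_paths if p[-1] == y)
            for y in range(2)]
law = initial
for _ in range(n):
    law = step(law, K)
print(total_mass, marginal == law)
```

The enumeration visits $2^{n+1}$ paths, while the matrix route needs only $n$ small multiplications, which is one practical reading of the tractability claim.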
[example: Two-State Markov Chain]
Let $E=\{0,1\}$ and let
\begin{align*}
K(0,\{1\})=a, \qquad K(0,\{0\})=1-a, \qquad K(1,\{0\})=b, \qquad K(1,\{1\})=1-b,
\end{align*}
where $a,b\in[0,1]$. Suppose $\mathbb P(X_0=0)=p$, so
\begin{align*}
\mathbb P(X_0=1)=1-\mathbb P(X_0=0)=1-p.
\end{align*}
We compute the probability of being in state $1$ at time $1$ by splitting according to the two possible values of $X_0$. Since $E=\{0,1\}$, the events $\{X_0=0\}$ and $\{X_0=1\}$ are disjoint and their union is the whole sample space, so
\begin{align*}
\{X_1=1\}=\bigl(\{X_0=0\}\cap\{X_1=1\}\bigr)\cup\bigl(\{X_0=1\}\cap\{X_1=1\}\bigr).
\end{align*}
Therefore, by finite additivity,
\begin{align*}
\mathbb P(X_1=1)=\mathbb P(X_0=0, X_1=1)+\mathbb P(X_0=1, X_1=1).
\end{align*}
For the first term, conditioning on $X_0=0$ gives
\begin{align*}
\mathbb P(X_0=0, X_1=1)=\mathbb P(X_0=0)\mathbb P(X_1=1\mid X_0=0)=pK(0,\{1\})=pa.
\end{align*}
For the second term, conditioning on $X_0=1$ gives
\begin{align*}
\mathbb P(X_0=1, X_1=1)=\mathbb P(X_0=1)\mathbb P(X_1=1\mid X_0=1)=(1-p)K(1,\{1\})=(1-p)(1-b).
\end{align*}
Combining the two contributions,
\begin{align*}
\mathbb P(X_1=1)=pa+(1-p)(1-b).
\end{align*}
Thus the one-step law is obtained by weighting each possible present state by its current probability and then applying the transition probability from that state.
[/example]
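In matrix form, the calculation above is a row vector multiplied by a transition matrix, and the probability of a whole path is the initial mass times a product of one-step transition probabilities. The following is a minimal Python sketch of both operations. The numerical values of $a$, $b$, $p$ and the helper names `transition_matrix`, `step_law`, and `path_probability` are illustrative assumptions, not part of the text above.

```python
from fractions import Fraction

def transition_matrix(a, b):
    """Rows index the current state, columns the next state."""
    return [[1 - a, a],
            [b, 1 - b]]

def step_law(mu, K):
    """One-step law: the row vector mu multiplied by the matrix K."""
    n = len(K)
    return [sum(mu[i] * K[i][j] for i in range(n)) for j in range(n)]

def path_probability(mu, K, path):
    """P(X_0=x_0,...,X_n=x_n) = mu(x_0) K(x_0,x_1) ... K(x_{n-1},x_n)."""
    prob = mu[path[0]]
    for x, y in zip(path, path[1:]):
        prob *= K[x][y]
    return prob

# Assumed illustrative parameters (exact rationals avoid rounding error).
a, b, p = Fraction(1, 3), Fraction(1, 4), Fraction(1, 2)
K = transition_matrix(a, b)
mu0 = [p, 1 - p]          # law of X_0
mu1 = step_law(mu0, K)    # law of X_1; mu1[1] equals p*a + (1-p)*(1-b)
```

Applying `step_law` repeatedly gives the law at later times, which is the sense in which the factorization becomes matrix multiplication on a finite state space.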
## Continuous-Time Models
### Counting Processes
Continuous-time stochastic processes model systems whose clock is not restricted to integer steps. The price of this flexibility is additional structure: path regularity, measurability in time, and consistency of increments become central rather than peripheral.
The Poisson process is the basic model for random arrivals. It counts how many events have occurred by time $t$, and its defining assumptions express stationary independent increments with paths that increase by single jumps.
[definition: Poisson Process]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and let $\lambda>0$. A Poisson process of rate $\lambda$ is a stochastic process $(N_t)_{t\ge0}$ whose coordinate maps are
\begin{align*}
N_t:(\Omega,\mathcal F)\to(\{0,1,2,\dots\},2^{\{0,1,2,\dots\}}), \qquad t\ge0,
\end{align*}
and which satisfies the following conditions: (i) there exists an event $\Omega_0\in\mathcal F$ with $\mathbb P(\Omega_0)=1$ such that for every $\omega\in\Omega_0$ the path $t\mapsto N_t(\omega)$ is nondecreasing, right-continuous, has left limits at every $t>0$, and has jumps, when they occur, of size $1$; (ii) $N_0=0$ almost surely; (iii) for every finite collection of pairwise disjoint intervals $(s_j,t_j]$ with $0\le s_j\le t_j$, the increments $N_{t_j}-N_{s_j}$ are independent; and (iv)
\begin{align*}
N_t-N_s\sim\operatorname{Poi}(\lambda(t-s))
\end{align*}
for all $0\le s\le t$.
[/definition]
The Poisson process is the continuous-time analogue of counting repeated rare events. It is integer-valued and increases only by jumps, so although its time index is continuous, its sample paths are not: each path is a right-continuous step function.
[example: Arrival Counts in Disjoint Intervals]
Let $(N_t)_{t\ge0}$ be a Poisson process of rate $\lambda$, and fix $0\le s<t<u$. The intervals $(s,t]$ and $(t,u]$ are disjoint, so the increments $N_t-N_s$ and $N_u-N_t$ are independent by the definition of a Poisson process. Their laws are
\begin{align*}
N_t-N_s\sim\operatorname{Poi}(\lambda(t-s))
\end{align*}
and
\begin{align*}
N_u-N_t\sim\operatorname{Poi}(\lambda(u-t)).
\end{align*}
For a Poisson random variable $Z\sim\operatorname{Poi}(\theta)$, the mass function is
\begin{align*}
\mathbb P(Z=k)=e^{-\theta}\frac{\theta^k}{k!},\qquad k=0,1,2,\dots.
\end{align*}
Therefore
\begin{align*}
\mathbb P(N_t-N_s=0)=e^{-\lambda(t-s)}\frac{(\lambda(t-s))^0}{0!}=e^{-\lambda(t-s)}.
\end{align*}
Also,
\begin{align*}
\mathbb P(N_u-N_t=1)=e^{-\lambda(u-t)}\frac{(\lambda(u-t))^1}{1!}=\lambda(u-t)e^{-\lambda(u-t)}.
\end{align*}
Using independence of the two increments,
\begin{align*}
\mathbb P(N_t-N_s=0,\,N_u-N_t=1)=\mathbb P(N_t-N_s=0)\mathbb P(N_u-N_t=1).
\end{align*}
Substituting the two one-increment probabilities gives
\begin{align*}
\mathbb P(N_t-N_s=0,\,N_u-N_t=1)=e^{-\lambda(t-s)}\lambda(u-t)e^{-\lambda(u-t)}.
\end{align*}
The calculation shows that disjoint time intervals contribute separate Poisson counts, and their joint probabilities factor because the increments are independent.
[/example]
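The factorization in this example can be checked numerically. The sketch below evaluates the Poisson mass function directly and compares the product of the two increment probabilities with the closed form derived above. The values of $\lambda$, $s$, $t$, $u$ and the function names are assumptions chosen only for illustration.

```python
import math

def poisson_pmf(k, theta):
    """P(Z = k) for Z ~ Poi(theta)."""
    return math.exp(-theta) * theta**k / math.factorial(k)

def joint_increment_probability(lam, s, t, u, j, k):
    """P(N_t - N_s = j, N_u - N_t = k) for 0 <= s < t < u.

    Uses independence of increments over the disjoint intervals (s,t] and (t,u].
    """
    return poisson_pmf(j, lam * (t - s)) * poisson_pmf(k, lam * (u - t))

# Assumed illustrative values.
lam, s, t, u = 2.0, 0.5, 1.0, 2.5
prob = joint_increment_probability(lam, s, t, u, 0, 1)

# Closed form from the example: e^{-lam(t-s)} * lam(u-t) * e^{-lam(u-t)}.
closed_form = math.exp(-lam * (t - s)) * lam * (u - t) * math.exp(-lam * (u - t))
```

The same helper handles any pair of counts $(j,k)$, not only the pair $(0,1)$ treated in the example.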
### Brownian Motion
Brownian motion is the basic model of noise with continuous paths. It is the scaling limit of random walks, the driving signal for Itô calculus, and the canonical Gaussian process with independent increments. These roles motivate isolating the exact assumptions: start at zero, continuous paths, independent increments, and Gaussian increment laws.
[definition: Brownian Motion]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space. A process $(W_t)_{t\ge0}$ whose coordinate maps are
\begin{align*}
W_t:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R)), \qquad t\ge0,
\end{align*}
is a standard Brownian motion if $W_0=0$ almost surely, it has almost surely continuous sample paths, for every finite chain $0\le t_0<t_1<\cdots<t_n$ the random variables $W_{t_1}-W_{t_0},\dots,W_{t_n}-W_{t_{n-1}}$ are independent, and
\begin{align*}
W_t-W_s\sim\mathcal N(0,t-s)
\end{align*}
for all $0\le s\le t$.
[/definition]
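Brownian motion has almost surely continuous paths, yet almost surely its paths are nowhere differentiable. The increment $W_t-W_s$ has standard deviation $\sqrt{t-s}$, so the difference quotient $(W_t-W_s)/(t-s)$ has standard deviation $(t-s)^{-1/2}$, which blows up as $t\downarrow s$. This square-root scaling is the first sign that [stochastic calculus](/page/Stochastic%20Calculus) cannot be ordinary calculus with random inputs.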
[example: Brownian Scaling]
Let $(W_t)_{t\ge0}$ be a standard Brownian motion and fix $c>0$. Define
\begin{align*}
Y_t=\frac{1}{\sqrt c}W_{ct}, \qquad t\ge0.
\end{align*}
First,
\begin{align*}
Y_0=\frac{1}{\sqrt c}W_{c\cdot 0}=\frac{1}{\sqrt c}W_0.
\end{align*}
Since $W_0=0$ almost surely, $Y_0=0$ almost surely.
Let $\Omega_0$ be an event of probability $1$ on which every path $u\mapsto W_u(\omega)$ is continuous. For $\omega\in\Omega_0$, the map
\begin{align*}
t\mapsto Y_t(\omega)=\frac{1}{\sqrt c}W_{ct}(\omega)
\end{align*}
is continuous because it is the composition of the continuous maps $t\mapsto ct$, $u\mapsto W_u(\omega)$, and $x\mapsto x/\sqrt c$. Hence $Y$ has almost surely continuous sample paths.
Now fix $0\le s\le t$. Then
\begin{align*}
Y_t-Y_s=\frac{1}{\sqrt c}W_{ct}-\frac{1}{\sqrt c}W_{cs}=\frac{1}{\sqrt c}(W_{ct}-W_{cs}).
\end{align*}
Because $0\le cs\le ct$, the Brownian increment satisfies
\begin{align*}
W_{ct}-W_{cs}\sim\mathcal N(0,ct-cs)=\mathcal N(0,c(t-s)).
\end{align*}
By the scaling rule for centered normal variables, multiplying by $1/\sqrt c$ changes the variance by the factor $(1/\sqrt c)^2=1/c$, so
\begin{align*}
Y_t-Y_s\sim\mathcal N\left(0,\frac{1}{c}c(t-s)\right)=\mathcal N(0,t-s).
\end{align*}
Finally, if $0\le t_0<t_1<\cdots<t_n$, then
\begin{align*}
Y_{t_j}-Y_{t_{j-1}}=\frac{1}{\sqrt c}(W_{ct_j}-W_{ct_{j-1}})
\end{align*}
for each $j=1,\dots,n$. Since $c>0$, the rescaled times satisfy $0\le ct_0<ct_1<\cdots<ct_n$, so they again form a finite chain and the Brownian increments $W_{ct_j}-W_{ct_{j-1}}$ are independent. Multiplying each increment by the deterministic constant $1/\sqrt c$ preserves independence. Therefore the increments of $Y$ are independent, and $Y$ is again a standard Brownian motion.
[/example]
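The variance computation in this example can be checked empirically. The sketch below samples the increment $Y_t-Y_s$ by drawing $W_{ct}-W_{cs}\sim\mathcal N(0,c(t-s))$ and dividing by $\sqrt c$, then compares the sample variance with $t-s$. The values of $c$, $s$, $t$, the sample size, the seed, and the function names are assumptions made for illustration; a Monte Carlo check of this kind supports the calculation but does not replace it.

```python
import math
import random

def brownian_increment(rng, dt):
    """Sample W_{s+dt} - W_s, which has law N(0, dt)."""
    return rng.gauss(0.0, math.sqrt(dt))

def sample_scaled_increment(rng, c, s, t):
    """Sample Y_t - Y_s where Y_t = W_{ct} / sqrt(c)."""
    return brownian_increment(rng, c * t - c * s) / math.sqrt(c)

def sample_variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(0)          # fixed seed for reproducibility
c, s, t = 4.0, 0.25, 1.0        # assumed illustrative values
samples = [sample_scaled_increment(rng, c, s, t) for _ in range(20000)]
mean_estimate = sum(samples) / len(samples)
var_estimate = sample_variance(samples)   # should be close to t - s = 0.75
```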
The existence of Brownian motion with almost surely continuous paths is not merely a formal consequence of assigning Gaussian finite-dimensional distributions. It requires a regularity argument that upgrades the raw process to a continuous modification.
[quotetheorem:1170]
This theorem is the regularity bridge between moment estimates and path properties. It is a typical result of process theory: a statement about random variables at pairs of times produces a statement about almost every whole path.
## Convergence and Regularity of Processes
### Finite-Dimensional Convergence
Sequences of stochastic processes arise from approximation, scaling limits, numerical schemes, and statistical asymptotics. Since a process contains many random variables at once, convergence must specify both the state space and the topology on paths.
For finite-dimensional questions, convergence of all finite snapshots is the weakest natural notion. It sees joint distributions at finitely many times but ignores tightness and path oscillation.
[definition: Convergence of Finite-Dimensional Distributions]
Let $(E,d)$ be a metric space with Borel sigma-algebra $\mathcal B(E)$. Let $(X^n_t)_{t\in T}$ be $E$-valued stochastic processes, and let $(X_t)_{t\in T}$ be another $E$-valued stochastic process. The processes $X^n$ converge to $X$ in finite-dimensional distributions if for every $k\in\mathbb N$ and every $t_1,\dots,t_k\in T$,
\begin{align*}
(X^n_{t_1},\dots,X^n_{t_k})\xrightarrow{d}(X_{t_1},\dots,X_{t_k}).
\end{align*}
[/definition]
This convergence is often the first step in a limit theorem, but it is not enough to control paths. A sequence can converge at each finite set of times while developing spikes between those times.
[example: Finite Snapshots Can Miss Moving Spikes]
Let $X^n_t=\mathbb 1_{\{1/n\}}(t)$ for $t\in[0,1]$, viewed as deterministic processes, and let $X_t=0$ for every $t\in[0,1]$. We show that the finite-dimensional distributions of $X^n$ converge to those of $X$, even though the paths do not converge uniformly.
Fix $t\in[0,1]$. If $t=0$, then $1/n\ne0$ for every $n$, so
\begin{align*}
X^n_0=\mathbb 1_{\{1/n\}}(0)=0
\end{align*}
for every $n$. If $t>0$, then the equation $1/n=t$ is equivalent to $n=1/t$. There is at most one integer $n$ with $n=1/t$, so for all sufficiently large $n$ we have $1/n\ne t$, and therefore
\begin{align*}
X^n_t=\mathbb 1_{\{1/n\}}(t)=0.
\end{align*}
Thus $X^n_t\to0=X_t$ for every fixed $t\in[0,1]$.
Now fix $k\in\mathbb N$ and times $t_1,\dots,t_k\in[0,1]$. For each $j$, the sequence $X^n_{t_j}$ is eventually equal to $0$. Since there are only finitely many times, there is an $N$ such that for every $n\ge N$ and every $j=1,\dots,k$,
\begin{align*}
X^n_{t_j}=0.
\end{align*}
Hence for $n\ge N$,
\begin{align*}
(X^n_{t_1},\dots,X^n_{t_k})=(0,\dots,0)=(X_{t_1},\dots,X_{t_k}).
\end{align*}
Because these vectors are deterministic and eventually identical, their distributions converge to the distribution of $(X_{t_1},\dots,X_{t_k})$. Therefore $X^n$ converges to the zero process in finite-dimensional distributions.
However, for every $n$, the spike occurs at the time $t=1/n\in[0,1]$, and
\begin{align*}
|X^n_{1/n}|=|\mathbb 1_{\{1/n\}}(1/n)|=1.
\end{align*}
Also $X^n_t$ only takes the values $0$ and $1$, so $|X^n_t|\le1$ for every $t\in[0,1]$. Combining the lower bound from $t=1/n$ with the upper bound for all $t$ gives
\begin{align*}
\sup_{t\in[0,1]}|X^n_t|=1.
\end{align*}
Thus the finite snapshots miss the moving spike: every fixed finite set of observation times eventually sees only zeros, while the uniform size of the path remains equal to $1$ for every $n$.
[/example]
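The contrast between finite snapshots and the supremum can be made concrete with exact rational arithmetic. In the sketch below, the chosen observation times and the helper names `spike` and `snapshot` are illustrative assumptions.

```python
from fractions import Fraction

def spike(n, t):
    """X^n_t = 1 if t == 1/n, else 0 (exact comparison of rationals)."""
    return 1 if t == Fraction(1, n) else 0

def snapshot(n, times):
    """The vector (X^n_{t_1}, ..., X^n_{t_k})."""
    return tuple(spike(n, t) for t in times)

# An assumed finite set of observation times in [0, 1].
times = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)]

# Once n exceeds 1/t for every positive chosen time, the snapshot is all zeros.
snapshots = {n: snapshot(n, times) for n in range(1, 8)}

# The value at the moving time t = 1/n is always 1, so the supremum never shrinks.
sup_values = [spike(n, Fraction(1, n)) for n in range(1, 8)]
```

For these times the snapshot is nonzero only for $n\in\{1,2,3\}$, while every entry of `sup_values` equals $1$.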
### Tightness and Functional Limits
To prove convergence as random functions, one typically combines finite-dimensional convergence with tightness in a path space. Tightness prevents probability mass from escaping through wild oscillations, so it is the compactness condition behind functional limit theorems.
[definition: Tightness of Processes]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and let $(S,d)$ be a metric space with Borel sigma-algebra $\mathcal B(S)$. A family of random variables $Z_i:(\Omega,\mathcal F)\to(S,\mathcal B(S))$, indexed by $i\in I$, is tight if for every $\varepsilon>0$ there exists a compact set $K\subset S$ such that
\begin{align*}
\mathbb P(Z_i\in K)\ge1-\varepsilon
\end{align*}
for every $i\in I$.
[/definition]
When processes are regarded as random variables taking values in $C([0,T])$ or in a cadlag path space, this definition applies directly. Compact sets in those spaces encode uniform boundedness and control of oscillation. This compactness principle motivates the functional [central limit theorem](/theorems/521), where convergence of entire paths replaces convergence of one terminal sum.
[quotetheorem:1189]
Donsker's theorem upgrades the [central limit theorem](/theorems/1848) from one terminal sum to an entire random path. The theorem says that after diffusive scaling, the whole trajectory of partial sums behaves like Brownian motion.
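A path functional such as the running maximum shows what convergence of the whole trajectory adds. The sketch below simulates a simple random walk with $\pm1$ steps, rescales it diffusively as $S_k/\sqrt n$ for $k\le n$, and estimates the probability that the rescaled path reaches level $1$ by time $1$. The step distribution, the values of `n` and `num_paths`, the seed, and the function name are assumptions. The comparison value $\mathbb P(\max_{t\le1}W_t\ge1)=2\,\mathbb P(W_1\ge1)$ is the standard reflection-principle identity for Brownian motion, which is not proved on this page.

```python
import math
import random

def scaled_walk_max(rng, n):
    """Return max over 0 <= k <= n of S_k / sqrt(n) for a +/-1 random walk."""
    s = 0
    running_max = 0               # S_0 = 0 is included in the maximum
    for _ in range(n):
        s += 1 if rng.random() < 0.5 else -1
        if s > running_max:
            running_max = s
    return running_max / math.sqrt(n)

rng = random.Random(1)            # fixed seed for reproducibility
n, num_paths = 400, 2000          # assumed illustrative sizes
hits = sum(scaled_walk_max(rng, n) >= 1.0 for _ in range(num_paths))
empirical = hits / num_paths

# Brownian value: 2 * P(W_1 >= 1) = erfc(1 / sqrt(2)).
brownian_value = math.erfc(1 / math.sqrt(2))
```

The terminal central limit theorem alone says nothing about this event, because it depends on the whole path up to time $1$ rather than on $S_n$ only.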
Regularity assumptions also appear in the classification of path spaces. Continuous paths are natural for Brownian motion, while jump processes require a space that allows discontinuities but still retains right-continuity and left limits.
[definition: Cadlag Path]
Let $T=[0,\infty)$ and let $(E,d)$ be a metric space. A function $x:T\to E$ is cadlag if it is right-continuous at every $t\ge0$ and has a left limit at every $t>0$.
[/definition]
Cadlag paths are the standard setting for many jump processes and semimartingales. They are flexible enough to include jumps and structured enough to support stopping-time arguments and stochastic integration.
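A counting path with finitely many jumps is the simplest cadlag function, and the difference between its value and its left limit at a jump time is exactly the right-continuity convention. The sketch below is a minimal illustration; the jump times and the helper names `make_counting_path` and `left_limit` are assumptions.

```python
from bisect import bisect_left, bisect_right

def make_counting_path(jump_times):
    """Return the cadlag path t -> number of jump times <= t."""
    times = sorted(jump_times)

    def path(t):
        # bisect_right counts entries <= t, so the jump at t is already included.
        return bisect_right(times, t)

    return path

def left_limit(jump_times, t):
    """Left limit at t > 0: number of jump times strictly less than t."""
    return bisect_left(sorted(jump_times), t)

jumps = [1.0, 2.5]                     # assumed illustrative jump times
x = make_counting_path(jumps)
value_at_jump = x(1.0)                 # value at the jump time (jump included)
limit_before_jump = left_limit(jumps, 1.0)
```

Here the value at the jump time exceeds the left limit by the jump size, while at any time that is not a jump time the two agree.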
## Beyond and Connected Topics
Stochastic processes are the common language behind several branches of probability. [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure) supplies the measure-theoretic base: random variables, distributions, expectation, conditional expectation, and convergence modes. Without that foundation, filtrations and finite-dimensional distributions are difficult to handle responsibly.
Martingale theory is the next structural layer. It studies fair processes, stopping, convergence, inequalities, and decompositions. This direction leads naturally to [Cambridge III Advanced Probability](/page/Cambridge%20III%20Advanced%20Probability), where stochastic processes are treated with measure-theoretic precision and used to prove limit theorems.
Continuous-time stochastic analysis begins when Brownian motion is used as an integrator. The Itô integral and stochastic differential equations require adaptedness, filtrations, Brownian quadratic variation, and stopping times. These topics are developed in [Cambridge III Stochastic Calculus and Applications](/page/Cambridge%20III%20Stochastic%20Calculus%20and%20Applications).
Statistics uses stochastic processes as asymptotic objects and as data-generating models. Empirical processes, time series, Markov chain Monte Carlo, and diffusion models all depend on viewing data as a dependent random family rather than as isolated observations. [Cambridge II Principles of Statistics](/page/Cambridge%20II%20Principles%20of%20Statistics) is a natural companion for the statistical side.
Another direction is Markov process theory: transition semigroups, generators, invariant measures, recurrence, and ergodicity. In continuous time this connects probability to PDE through heat semigroups and diffusion generators.
A final direction is path-space probability. Instead of studying $X_t$ separately for each $t$, one studies the law of the entire random function $t\mapsto X_t$. This is the viewpoint behind [weak convergence](/page/Weak%20Convergence) in $C([0,T])$, Skorokhod spaces, Gaussian processes, and modern stochastic analysis. Three natural expansion points are Gaussian processes beyond Brownian motion, transition semigroups and infinitesimal generators for Markov processes, and Skorokhod path spaces for cadlag processes; each requires a dedicated page because the topology and measurability choices become part of the mathematics.
## References
Androma, [Cambridge III Stochastic Calculus and Applications](/page/Cambridge%20III%20Stochastic%20Calculus%20and%20Applications).
Androma, [Cambridge III Advanced Probability](/page/Cambridge%20III%20Advanced%20Probability).
Androma, [Cambridge II Principles of Statistics](/page/Cambridge%20II%20Principles%20of%20Statistics).
Androma, [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure).
David Williams, *Probability with Martingales* (1991).
Rick Durrett, *Probability: Theory and Examples* (2019).
Olav Kallenberg, *Foundations of Modern Probability* (2021).
Ioannis Karatzas and Steven Shreve, *Brownian Motion and Stochastic Calculus* (1991).