A sequence of independent random variables is often too forgetful for systems that persist. If it rains today, tomorrow's weather is affected by that fact; if a queue is long now, the next customer sees the backlog; if a gambler has current fortune $i$, the next fortune is determined by $i$ and by the next stake, not by the whole list of earlier fortunes. Markov chains are the first systematic way to retain memory without carrying the whole past: the present state is the summary of the past that matters for predicting the next step.
The point is not that history is irrelevant. The point is that we choose the state so that history has already been compressed into it. This makes Markov chains both probabilistic objects and modelling decisions. A poor choice of state can destroy the Markov property; a good choice turns an apparently complicated evolving system into repeated multiplication by transition probabilities.
Before the formal definition, it helps to see why independent trials are too small a language. A weather model with independent days can match the marginal proportion of rainy days, but it cannot represent long wet spells or long dry spells except through accidental clustering.
[example: Rain Persistence Versus Independent Weather]
Let $S=\{0,1\}$, where $0$ means dry and $1$ means rainy. In the independent model, $\mathbb P(X_n=1)=1/2$ for every $n$, so independence gives
\begin{align*}
\mathbb P(X_0=1,X_1=1,X_2=1)=\mathbb P(X_0=1)\mathbb P(X_1=1)\mathbb P(X_2=1).
\end{align*}
Substituting the three one-day probabilities,
\begin{align*}
\mathbb P(X_0=1,X_1=1,X_2=1)=\frac{1}{2}\cdot \frac{1}{2}\cdot \frac{1}{2}=\frac{1}{8}.
\end{align*}
Now suppose instead that the next day repeats the present weather with probability $0.8$ and switches with probability $0.2$, and suppose $\mathbb P(X_0=1)=1/2$. The event of three rainy days in a row is the path $1\to 1\to 1$, so the path probability factors as the initial probability times the two conditional repeat probabilities:
\begin{align*}
\mathbb P(X_0=1,X_1=1,X_2=1)=\mathbb P(X_0=1)\mathbb P(X_1=1\mid X_0=1)\mathbb P(X_2=1\mid X_1=1).
\end{align*}
Using the stated transition rule,
\begin{align*}
\mathbb P(X_0=1,X_1=1,X_2=1)=\frac{1}{2}\cdot 0.8\cdot 0.8=\frac{1}{2}\cdot 0.64=0.32.
\end{align*}
Both models have one-day rainy probability $1/2$, but the Markov model assigns much larger probability to a rainy spell because the current state carries persistence information.
[/example]
The example shows the central move. Instead of assigning probabilities separately to every possible history, we assign probabilities for the next state conditional on the present state. The formal definition must say what the state space is, what the process is, and what it means for the present to screen off the past.
## Definition
A Markov chain is a stochastic process whose next-step conditional law depends on the observed past only through the present state; the state is the compressed record of everything in the history that remains relevant for prediction.
This condition is useful only because it sits between two less flexible extremes. Independent sequences forget too much, while arbitrary stochastic processes remember too much to compute with. The Markov condition keeps exactly the information encoded in the present state, so the modelling burden is to choose states rich enough that the next-step law no longer needs the earlier path.
[definition: Markov Chain]
Let $(\Omega, \mathcal F, \mathbb P)$ be a [probability space](/page/Probability%20Space) and let $S$ be a [countable set](/page/Countable%20Set). A Markov chain on $S$ is a sequence $(X_n)_{n \ge 0}$ of random variables $X_n : \Omega \to S$ such that, for every $n \ge 0$ and every $i_0, \ldots, i_{n+1} \in S$ with
\begin{align*}
\mathbb P(X_0 = i_0, \ldots, X_n = i_n) > 0,
\end{align*}
one has
\begin{align*}
\mathbb P(X_{n+1} = i_{n+1} \mid X_0 = i_0, \ldots, X_n = i_n) = \mathbb P(X_{n+1} = i_{n+1} \mid X_n = i_n).
\end{align*}
[/definition]
This definition gives the central screening-off condition. To use it effectively, we still need the supporting vocabulary that says what kind of state space is being used and what kind of stochastic process is being constrained.
## State Spaces and Processes
A Markov chain is only useful once the possible values of the process have been chosen carefully. To turn the informal phrase "the chain is at state $i$" into formulas such as $\mathbb P(X_n=i)$ and sums over possible next states, the elementary theory restricts attention to finite or countably infinite state spaces. Countability is the classical setting in which individual states have point probabilities, so questions about movement can be reduced to sums over states and rows of transition probabilities.
[definition: Countable State Space]
A countable state space is a finite or countably infinite set $S$ whose elements are called states.
[/definition]
The state space alone is static. To speak about evolution, we need a sequence of random observations indexed by time, with each observation taking values in the state space just chosen.
[definition: Discrete-Time Stochastic Process]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space and let $(S, \mathcal S)$ be a measurable space. A discrete-time stochastic process with state space $S$ is a sequence $(X_n)_{n \ge 0}$ of random variables such that, for each $n \ge 0$,
\begin{align*}
X_n : (\Omega, \mathcal F) \to (S, \mathcal S).
\end{align*}
[/definition]
The next definition is needed because a general stochastic process can use its whole history when choosing the next state. We need a condition that says the current state is a sufficient summary for one-step prediction, while still allowing the current state to depend strongly on the past.
[definition: Markov Property]
Let $(X_n)_{n \ge 0}$ be a discrete-time stochastic process with countable state space $S$. The process has the Markov property if, for every $n \ge 0$ and every $i_0, \ldots, i_{n+1} \in S$ with
\begin{align*}
\mathbb P(X_0 = i_0, \ldots, X_n = i_n) > 0,
\end{align*}
one has
\begin{align*}
\mathbb P(X_{n+1} = i_{n+1} \mid X_0 = i_0, \ldots, X_n = i_n) = \mathbb P(X_{n+1} = i_{n+1} \mid X_n = i_n).
\end{align*}
[/definition]
This condition is not absolute independence of the next state from the past. It is conditional independence after the present state is known. That distinction matters, because the present state may itself contain a great deal of accumulated historical information.
With the parent notion now in place, many calculations need the rule for moving from $i$ to $j$ to be the same at every time. This extra assumption turns the chain into an iterated system with a single transition rule. The rule must be specified for every state, including states that the chosen initial distribution may not visit at time $0$, because those states can still appear later or under a different starting law.
[definition: Time-Homogeneous Markov Chain]
A Markov chain $(X_n)_{n \ge 0}$ on a countable state space $S$ is time-homogeneous if there exists a map
\begin{align*}
P:S \times S \to [0,1],
\end{align*}
written $P(i,j)=p_{ij}$, such that
\begin{align*}
\sum_{j \in S} p_{ij}=1
\end{align*}
for all $i \in S$, and for all $n \ge 0$ and all $i,j \in S$ with $\mathbb P(X_n=i)>0$,
\begin{align*}
\mathbb P(X_{n+1}=j \mid X_n=i)=p_{ij}.
\end{align*}
[/definition]
Time-homogeneity leaves one basic computational question: if the chain follows a specified finite path, how is the probability of that path determined from the starting law and the one-step transition rule? The Markov property removes any dependence on earlier history once the current state is known, so the path probability decomposes into one initial factor and one transition factor for each step. Concretely, if the initial law is $\mu$ and the path is $i_0,i_1,\ldots,i_n$, then
\begin{align*}
\mathbb P(X_0=i_0,X_1=i_1,\ldots,X_n=i_n)=\mu(i_0)p_{i_0i_1}p_{i_1i_2}\cdots p_{i_{n-1}i_n}.
\end{align*}
This product rule explains why Markov chains are tractable: summing such local products over suitable paths gives hitting probabilities, return probabilities, and long-time distributions.
## Transition Structure
The one-step probabilities are the local rules of the chain. In a finite state space they form a matrix; in a countable state space they form an infinite stochastic array. In either case every row must be a probability distribution.
### Matrices and Kernels
A table of transition probabilities is meaningful only when each row describes a possible next-state distribution. This row condition is the probabilistic constraint behind the linear-algebraic object.
[definition: Stochastic Matrix]
Let $S$ be a countable set. A stochastic matrix on $S$ is a map
\begin{align*}
P:S \times S \to [0,1],
\end{align*}
written $P(i,j)=p_{ij}$, such that
\begin{align*}
\sum_{j \in S} p_{ij} = 1
\end{align*}
for all $i \in S$.
[/definition]
A stochastic matrix by itself is only a collection of candidate rows. To use it for a particular Markov chain, those row entries must agree with the chain's conditional one-step laws at every time. This identification is what turns the abstract array into the transition data that can be iterated, multiplied, and used in hitting calculations.
[definition: Transition Matrix]
Let $(X_n)_{n \ge 0}$ be a time-homogeneous Markov chain on a countable state space $S$. A transition matrix for the chain is a map
\begin{align*}
P:S \times S \to [0,1],
\end{align*}
written $P(i,j)=p_{ij}$, such that
\begin{align*}
\sum_{j \in S} p_{ij}=1
\end{align*}
for every $i \in S$, and such that for all $n \ge 0$ and all $i,j \in S$ with $\mathbb P(X_n=i)>0$,
\begin{align*}
p_{ij} = \mathbb P(X_{n+1} = j \mid X_n = i).
\end{align*}
[/definition]
The same data may be viewed as a map $P:S \times S \to [0,1]$. If a state is never visited at time $n$ under the chosen initial law, the process does not determine that row from the displayed [conditional probability](/page/Conditional%20Probability); the standard convention is to choose any stochastic row compatible with the intended transition rule, while the rows of visited states are fixed by the chain.
The matrix viewpoint is best for finite computations. The kernel viewpoint is better when one wants to emphasize that each state $i$ determines a probability measure on possible next states, which is the route toward more general state spaces.
Before naming the transition rule formally, we fix one piece of measure notation. The symbol $\mathcal B([0,1])$ denotes the Borel $\sigma$-algebra on $[0,1]$, the collection of subsets generated by open intervals in $[0,1]$.
The next useful question is how to describe the same transition rule without referring to individual target states one at a time. Instead of asking only for the probability of moving from $i$ to a single state $j$, we want a rule that assigns a probability to every set of possible next states. This is the role of a transition kernel: it packages the one-step dynamics as a family of probability measures, one measure for each starting state. On a countable state space this adds no new information beyond the transition matrix, but it makes clear which measurability requirement must be retained when the state space is later no longer finite or countable.
[definition: Transition Kernel on a Countable State Space]
Let $S$ be a countable state space, equipped with the discrete sigma-algebra $2^S$. A transition kernel on $S$ is a map
\begin{align*}
K: S \times 2^S \to [0,1]
\end{align*}
such that for each $i \in S$, the set function $A \mapsto K(i,A)$ is a probability measure on $2^S$, and for each $A \subset S$, the map
\begin{align*}
i \mapsto K(i,A)
\end{align*}
is measurable from $(S,2^S)$ to $([0,1],\mathcal B([0,1]))$.
[/definition]
For a countable state space, the matrix and kernel descriptions encode the same data through
\begin{align*}
K(i,A)=\sum_{j \in A}p_{ij}.
\end{align*}
The measurability condition in the starting state $i$ is automatic here, because every subset of the countable state space is measurable. It is included because the same kernel definition survives unchanged when the state space is no longer countable. The kernel language is therefore a specialization and extension of the parent Markov chain idea, rather than the parent definition itself.
### Multi-Step Motion
A one-step rule is local. To answer questions about later times, we need probabilities for travelling from one state to another in several steps. These are not new parameters; they are generated by the one-step rule.
[definition: $n$-Step Transition Probability]
Let $(X_n)_{n \ge 0}$ be a time-homogeneous Markov chain on a countable state space $S$ with transition matrix $P$. Let $\delta_i$ denote the point mass at $i$, and let $\mathbb P_i$ denote the law of a chain with the same time-homogeneous transition matrix $P$ and initial distribution $\delta_i$, so that $X_0=i$ almost surely. Expectations under $\mathbb P_i$ are written $\mathbb E_i$. For $n \ge 0$ and $i,j \in S$, the $n$-step transition probability is
\begin{align*}
p_{ij}^{(n)} = \mathbb P_i(X_n = j).
\end{align*}
[/definition]
The natural question is how these probabilities combine. To go from $i$ to $j$ in $m+n$ steps, the chain must be somewhere after $m$ steps, and then move from that intermediate state to $j$ in the remaining $n$ steps.
[quotetheorem:2207]
This result is the bridge between probability and matrix powers. Its content is stronger than a bookkeeping identity: it says that all multi-step probabilities are forced by the one-step transition rule and the Markov property. The intermediate sum over $k$ is also where countability matters, since the chain can pass through any state after the first block of steps. In matrix language, the theorem justifies reading $p_{ij}^{(n)}$ as the $(i,j)$ entry of $P^n$, so questions about long-time behaviour become questions about powers of a stochastic matrix. The next example makes this concrete in the smallest nontrivial state space.
[example: Two-State Transition Rule]
Let $S=\{0,1\}$, and write the one-step probabilities as
\begin{align*}
p_{00}=1-a,\quad p_{01}=a,\quad p_{10}=b,\quad p_{11}=1-b.
\end{align*}
The two rows are probability distributions because
\begin{align*}
p_{00}+p_{01}=(1-a)+a=1
\end{align*}
and
\begin{align*}
p_{10}+p_{11}=b+(1-b)=1.
\end{align*}
If $a=b=0$, then
\begin{align*}
p_{00}=1-0=1,\quad p_{11}=1-0=1,
\end{align*}
so state $0$ and state $1$ are both absorbing. If $a=b=1$, then
\begin{align*}
p_{01}=1,\quad p_{10}=1,\quad p_{00}=1-1=0,\quad p_{11}=1-1=0,
\end{align*}
so a chain starting from $0$ follows $0,1,0,1,\ldots$, while a chain starting from $1$ follows $1,0,1,0,\ldots$. If $0<a,b<1$, then
\begin{align*}
p_{01}^{(1)}=a>0
\end{align*}
and
\begin{align*}
p_{10}^{(1)}=b>0,
\end{align*}
so the two states communicate. In the same case,
\begin{align*}
p_{00}=1-a>0,\quad p_{11}=1-b>0,
\end{align*}
so each state also has a positive probability of persisting for one more step. Thus this two-state family already displays trapping, deterministic cycling, and random persistence.
[/example]
The two-state example is small enough to compute by hand, but it already shows why structural hypotheses are needed. A chain may have a transition matrix and still fail to converge because it is trapped or periodic.
## Hitting and Stopping
Many important events are not about the state at a fixed time. They ask whether the chain ever reaches a set, when it returns, or what happens after a random time. These questions require stopping times and restart principles.
### First Visits
A target set is often more important than a target time. Hitting times record the first time a chain enters a set, making path events into random variables.
[definition: Hitting Time]
Let $(X_n)_{n \ge 0}$ be a Markov chain on a countable state space $S$, and let $A \subset S$. The hitting time of $A$ is the extended nonnegative-integer-valued [random variable](/page/Random%20Variable) $T_A:\Omega \to \{0,1,2,\ldots\} \cup \{\infty\}$ defined by
\begin{align*}
T_A = \inf\{n \ge 0 : X_n \in A\},
\end{align*}
with the convention that $\inf \varnothing = \infty$. For a singleton $\{i\}$, write
\begin{align*}
T_i=T_{\{i\}}.
\end{align*}
[/definition]
When the starting state already lies in $A$, the hitting time is $0$. To study whether a chain comes back after leaving time $0$, we need the first return time.
[definition: First Return Time]
Let $(X_n)_{n \ge 0}$ be a Markov chain on a countable state space $S$, and let $i \in S$. The first return time to $i$ is the extended positive-integer-valued random variable $T_i^+:\Omega \to \mathbb N \cup \{\infty\}$ defined by
\begin{align*}
T_i^+ = \inf\{n \ge 1 : X_n = i\}.
\end{align*}
The convention is $\inf \varnothing = \infty$.
[/definition]
The main use of these times is first-step analysis: condition on the first move, then solve equations for the desired hitting probabilities. The next example shows the method in its most classical finite form.
[example: Gambler's Ruin]
Let $N \in \mathbb N$ and let $S=\{0,1,\ldots,N\}$. From each interior state $i$ with $1 \le i \le N-1$, the chain moves to $i+1$ with probability $p$ and to $i-1$ with probability $q=1-p$, where $0<p<1$; states $0$ and $N$ are absorbing. We compute
\begin{align*}
h_i=\mathbb P_i(T_N<T_0).
\end{align*}
The boundary values are
\begin{align*}
h_0=\mathbb P_0(T_N<T_0)=0
\end{align*}
and
\begin{align*}
h_N=\mathbb P_N(T_N<T_0)=1,
\end{align*}
because starting at $0$ has already hit $0$, while starting at $N$ has already hit $N$.
For $1 \le i \le N-1$, conditioning on the first step gives
\begin{align*}
h_i=\mathbb P_i(T_N<T_0 \mid X_1=i+1)\mathbb P_i(X_1=i+1)+\mathbb P_i(T_N<T_0 \mid X_1=i-1)\mathbb P_i(X_1=i-1).
\end{align*}
After the first step, the future depends on the new state, so
\begin{align*}
\mathbb P_i(T_N<T_0 \mid X_1=i+1)=h_{i+1}
\end{align*}
and
\begin{align*}
\mathbb P_i(T_N<T_0 \mid X_1=i-1)=h_{i-1}.
\end{align*}
Using $\mathbb P_i(X_1=i+1)=p$ and $\mathbb P_i(X_1=i-1)=q$, we get
\begin{align*}
h_i=p h_{i+1}+q h_{i-1}.
\end{align*}
Assume first that $p \ne q$. Rearranging the recurrence,
\begin{align*}
p h_{i+1}-p h_i=q h_i-q h_{i-1}.
\end{align*}
Thus
\begin{align*}
p(h_{i+1}-h_i)=q(h_i-h_{i-1}).
\end{align*}
Let $d_i=h_i-h_{i-1}$ for $1 \le i \le N$. Then
\begin{align*}
d_{i+1}=\frac{q}{p}d_i.
\end{align*}
Writing $\rho=q/p$, repeated substitution gives
\begin{align*}
d_i=\rho^{i-1}d_1.
\end{align*}
Since $h_0=0$,
\begin{align*}
h_i=d_1+d_2+\cdots+d_i.
\end{align*}
Substituting $d_r=\rho^{r-1}d_1$,
\begin{align*}
h_i=d_1(1+\rho+\rho^2+\cdots+\rho^{i-1}).
\end{align*}
For $\rho \ne 1$, the finite geometric sum is
\begin{align*}
1+\rho+\rho^2+\cdots+\rho^{i-1}=\frac{1-\rho^i}{1-\rho}.
\end{align*}
Therefore
\begin{align*}
h_i=d_1\frac{1-\rho^i}{1-\rho}.
\end{align*}
Using $h_N=1$ gives
\begin{align*}
1=d_1\frac{1-\rho^N}{1-\rho}.
\end{align*}
Hence
\begin{align*}
d_1=\frac{1-\rho}{1-\rho^N}.
\end{align*}
Substituting this value of $d_1$ back into the formula for $h_i$,
\begin{align*}
h_i=\frac{1-\rho}{1-\rho^N}\cdot \frac{1-\rho^i}{1-\rho}.
\end{align*}
Cancelling the common factor $1-\rho$,
\begin{align*}
h_i=\frac{1-\rho^i}{1-\rho^N}.
\end{align*}
Since $\rho=q/p$, this is
\begin{align*}
h_i=\frac{1-(q/p)^i}{1-(q/p)^N}.
\end{align*}
If $p=q=1/2$, then the recurrence becomes
\begin{align*}
h_i=\frac{1}{2}h_{i+1}+\frac{1}{2}h_{i-1}.
\end{align*}
Multiplying by $2$,
\begin{align*}
2h_i=h_{i+1}+h_{i-1}.
\end{align*}
Rearranging,
\begin{align*}
h_{i+1}-h_i=h_i-h_{i-1}.
\end{align*}
Thus all successive differences $d_i=h_i-h_{i-1}$ are equal to one constant $d_1$. Since $h_0=0$,
\begin{align*}
h_i=i d_1.
\end{align*}
Using $h_N=1$ gives
\begin{align*}
1=N d_1.
\end{align*}
Therefore
\begin{align*}
d_1=\frac{1}{N}.
\end{align*}
So, in the symmetric case,
\begin{align*}
h_i=\frac{i}{N}.
\end{align*}
Thus the probability of reaching $N$ before $0$ is found by solving a finite boundary value problem for the hitting probabilities.
[/example]
### Restarting at Random Times
First-step analysis is only the simplest restart argument. More generally, after a random time that can be recognized from the observed past, the chain should restart from its current state. The filtration supplies the formal language for what the observed past means.
[definition: Natural Filtration of a Markov Chain]
Let $(X_n)_{n \ge 0}$ be a Markov chain on a probability space $(\Omega, \mathcal F, \mathbb P)$. Its natural filtration is the sequence $(\mathcal F_n^X)_{n \ge 0}$ defined by
\begin{align*}
\mathcal F_n^X = \sigma(X_0, X_1, \ldots, X_n).
\end{align*}
[/definition]
The natural filtration records the information available before a restart argument is made. Not every random time is legitimate for such an argument: deciding whether the time has occurred by time $n$ must use only the information visible by time $n$, not knowledge of the future path. This need leads to the following definition.
[definition: Stopping Time]
Let $(\mathcal F_n)_{n \ge 0}$ be a filtration on a probability space $(\Omega,\mathcal F,\mathbb P)$. A stopping time with respect to $(\mathcal F_n)_{n \ge 0}$ is a random variable $\tau:\Omega \to \{0,1,2,\ldots\}\cup\{\infty\}$ such that, for every $n \ge 0$,
\begin{align*}
\{\tau \le n\} \in \mathcal F_n.
\end{align*}
[/definition]
A hitting time is a stopping time with respect to the natural filtration, because by time $n$ the observed path tells whether the target has already been hit. But the stopping-time condition only says when the clock is allowed to ring; it does not yet identify what information has been revealed at that random time. To formulate a restart theorem, we need a sigma-algebra that contains exactly the events whose truth can be decided from the path observed up to $\tau$.
[definition: Stopped Sigma-Algebra]
Let $(\mathcal F_n)_{n \ge 0}$ be a filtration on a probability space $(\Omega,\mathcal F,\mathbb P)$, and let $\tau$ be a stopping time with respect to this filtration. The stopped sigma-algebra at $\tau$ is
\begin{align*}
\mathcal F_\tau = \{A \in \mathcal F : A \cap \{\tau \le n\} \in \mathcal F_n \text{ for every } n \ge 0\}.
\end{align*}
[/definition]
The stopped sigma-algebra is the formal version of the observed past at the random time. The remaining difficulty is that a hitting time depends on the path itself, so restarting the chain there is not covered by the ordinary Markov property at deterministic times. For discrete-time Markov chains, the needed restart principle is the [strong Markov property](/theorems/2208): if $\tau$ is a stopping time and $X_\tau=i$, then conditional on the information observed up to $\tau$, the shifted process
\begin{align*}
(X_{\tau+m})_{m \ge 0}
\end{align*}
has the same transition rule as the original chain started from $i$. In particular, after a hitting time, the pre-$\tau$ path matters for the future only through the state reached at $\tau$.
[remark: Restarting at a Hitting Time]
This is why hitting-time arguments can cut a sample path at the first visit to a set, discard the earlier path except for the state where the visit occurred, and continue with the same transition probabilities. The stopping-time hypothesis is essential: the restart time must be determined from the past and present observations, not by looking ahead into the future of the path.
[/remark]
## Classes, Recurrence, and Period
A transition matrix is also a directed graph. There is an arrow from $i$ to $j$ when $p_{ij}>0$. The graph determines which states can be reached, which parts of the chain are closed, and which states belong to the same long-term component.
### Communication Classes
A single transition may not reveal the whole reachability structure, because a state can be reached through intermediate states. Accessibility records reachability in some finite number of steps.
[definition: Accessibility]
Let $(X_n)_{n \ge 0}$ be a time-homogeneous Markov chain on a countable state space $S$. A state $j$ is accessible from a state $i$, written $i \to j$, if there exists $n \ge 0$ such that
\begin{align*}
p_{ij}^{(n)} > 0.
\end{align*}
[/definition]
Accessibility can point in only one direction. To identify pieces of the state space that mutually reach each other, we need a symmetric relation built from accessibility in both directions.
[definition: Communication]
Let $(X_n)_{n \ge 0}$ be a time-homogeneous Markov chain on a countable state space $S$. States $i,j \in S$ communicate, written $i \leftrightarrow j$, if $i \to j$ and $j \to i$.
[/definition]
Communication divides the state space into communicating classes. A chain with only one such class has no graph-theoretic decomposition into separate communicating regions, which motivates the standard connectedness condition.
[definition: Irreducible Markov Chain]
A time-homogeneous Markov chain on a countable state space $S$ is irreducible if every pair of states communicates.
[/definition]
To understand long-term confinement, connectedness is not enough: a communicating class may still leak probability to states outside it. We need a condition that identifies classes which, once entered, cannot be exited.
[definition: Closed Communicating Class]
Let $(X_n)_{n \ge 0}$ be a time-homogeneous Markov chain on a countable state space $S$. A communicating class $C \subset S$ is closed if for every $i \in C$ and every $j \notin C$,
\begin{align*}
p_{ij}=0.
\end{align*}
[/definition]
Closed classes act as self-contained chains. In a finite chain, the path enters some closed communicating class with probability one and then never leaves it, although the class reached may depend on the starting state. The following structural result records the finite-state picture behind that sentence.
[quotetheorem:9528]
This decomposition is the graph-theoretic skeleton of the finite theory. Once the chain reaches a closed class, the later analysis can be carried out inside that class; before that time, transient states can be visited only a finite number of times almost surely.
### Return Behaviour
Reachability asks whether a path exists with positive probability. Recurrence asks whether return happens almost surely when the chain starts from a state.
[definition: Recurrent and Transient States]
Let $(X_n)_{n \ge 0}$ be a time-homogeneous Markov chain on a countable state space $S$ with transition matrix $P$. Let $\mathbb P_i$ denote the law of a chain with transition matrix $P$ and initial distribution $\delta_i$, so that $X_0=i$ almost surely. The state $i \in S$ is recurrent if
\begin{align*}
\mathbb P_i(T_i^+ < \infty)=1.
\end{align*}
The state $i$ is transient if
\begin{align*}
\mathbb P_i(T_i^+ < \infty)<1.
\end{align*}
[/definition]
Recurrence says the chain returns, but it does not say how long one expects to wait. On a finite irreducible chain, recurrent states return in finite mean time. Countable chains have a sharper distinction: the return probability can be one while the expected return time is infinite. This is the gap between recurrence as certainty and recurrence as a stable long-run frequency, and it motivates the following definition.
[definition: Positive Recurrent and Null Recurrent States]
Let $(X_n)_{n \ge 0}$ be a time-homogeneous Markov chain on a countable state space $S$, and let $i \in S$ be recurrent. The state $i$ is positive recurrent if
\begin{align*}
\mathbb E_i[T_i^+] < \infty.
\end{align*}
The state $i$ is null recurrent if
\begin{align*}
\mathbb E_i[T_i^+] = \infty.
\end{align*}
[/definition]
Positive recurrence is the form of recurrence compatible with stationary mass in the countable-state setting. Null recurrence is more delicate: the chain comes back almost surely, but the returns are so spread out that no finite stationary distribution can assign the state a stable positive frequency in the irreducible case.
The definition uses a return probability, but return probabilities are not always the easiest quantities to compute. We need a criterion that replaces the event of eventual return by the expected total number of visits, which can be read from the powers of the transition matrix.
[quotetheorem:2211]
Let $N_i$ denote the total number of visits to $i$,
\begin{align*}
N_i := \sum_{n=0}^{\infty} \mathbf 1_{\{X_n=i\}}.
\end{align*}
Since
\begin{align*}
\mathbb E_i[N_i]=\sum_{n=0}^{\infty}p_{ii}^{(n)},
\end{align*}
the theorem converts recurrence into a summability question about the diagonal entries of the powers of $P$.
The visit-count criterion is useful, but it still appears to ask for a separate calculation at every state. That would make the classification of a large chain feel local and fragmented: perhaps $i$ has infinitely many expected returns while a nearby state $j$ behaves differently. Communication rules out this kind of mismatch. If the chain can move from $i$ to $j$ and later from $j$ back to $i$, visits to one state can be converted into visits to the other by attaching fixed access paths. The next theorem is what lets recurrence and transience belong to communicating classes rather than to isolated states.
[quotetheorem:2221]
This result lets us speak of recurrent and transient communicating classes, not merely recurrent and transient states. The long-term classification of a chain is therefore a classification of its graph components together with their return behaviour.
Even when return is guaranteed, returns may occur only on a fixed arithmetic schedule. Periodicity records this obstruction to convergence at every time.
[definition: Period of a State]
Let $(X_n)_{n \ge 0}$ be a time-homogeneous Markov chain on a countable state space $S$. The period of a state $i \in S$ is
\begin{align*}
d(i)=\gcd R_i, \qquad R_i=\{n \ge 1 : p_{ii}^{(n)} > 0\},
\end{align*}
provided $R_i$ is nonempty. If $R_i=\varnothing$, the period of $i$ is undefined. The state $i$ is aperiodic if $R_i$ is nonempty and $d(i)=1$.
[/definition]
In an irreducible chain all states have the same period. Aperiodicity is therefore a property of the whole chain in that setting.
[example: Period Two Random Walk]
Let $S=\mathbb Z$ and consider the simple random walk with $p_{i,i+1}=1/2$ and $p_{i,i-1}=1/2$ for every $i \in \mathbb Z$. Starting from $0$, write each step as an increment $\xi_r \in \{-1,1\}$, so
\begin{align*}
X_n=\xi_1+\xi_2+\cdots+\xi_n.
\end{align*}
Since $-1 \equiv 1 \pmod 2$, every increment changes the parity, and therefore
\begin{align*}
X_n \equiv n \pmod 2.
\end{align*}
If $X_n=0$, then $0 \equiv n \pmod 2$, so $n$ must be even. Conversely, for $n=2m$ with $m \ge 1$, the path that takes $m$ steps to the right and then $m$ steps to the left returns to $0$, and its probability is
\begin{align*}
\left(\frac{1}{2}\right)^{2m}>0.
\end{align*}
Hence
\begin{align*}
R_0=\{n \ge 1:p_{00}^{(n)}>0\}=\{2,4,6,\ldots\}.
\end{align*}
Thus
\begin{align*}
d(0)=\gcd R_0=\gcd\{2,4,6,\ldots\}=2.
\end{align*}
So the chain has period $2$ at $0$, and by the same parity argument a walk started from any $i$ can return to $i$ only after an even number of steps. This periodicity is separate from recurrence: the one-dimensional simple symmetric random walk is recurrent, but its possible return times still follow the parity of the starting point.
[/example]
Periodicity explains why a chain may fail to approach equilibrium at fixed times even when it keeps returning. Removing this clock is one of the hypotheses in the finite convergence theorem.
## Stationarity and Long-Term Behaviour
The most important long-term question is whether the distribution of $X_n$ settles into a stable distribution. A candidate stable distribution should be unchanged by one transition of the chain.
### Equilibrium Distributions
If a distribution is already at equilibrium, applying the transition rule should reproduce the same distribution. This gives a linear equation for the unknown row vector of masses.
[definition: Stationary Distribution]
Let $P=(p_{ij})_{i,j \in S}$ be the transition matrix of a time-homogeneous Markov chain on a countable state space $S$. A probability distribution $\pi:S \to [0,1]$, written $\pi_i=\pi(i)$, is stationary for $P$ if
\begin{align*}
\sum_{i \in S}\pi_i=1
\end{align*}
and, for every $j \in S$,
\begin{align*}
\pi_j = \sum_{i \in S}\pi_i p_{ij}.
\end{align*}
[/definition]
In row-vector notation this condition is $\pi P=\pi$. If $X_0 \sim \pi$, then $X_n \sim \pi$ for every $n \ge 0$.
For finite irreducible chains, recurrence and stationarity fit together with little visible tension. In a countable chain, recurrence alone is not enough: a null recurrent chain keeps returning, but its visits are too sparse to define a probability equilibrium. Positive recurrence is the extra return-time condition that makes stationary mass possible, and it leads to the following theorem.
[quotetheorem:9529]
This theorem explains why the countable-state theory cannot stop at recurrent versus transient. Positive recurrence is the bridge from return behaviour to equilibrium; null recurrence is the reason an almost-sure return statement may still fail to produce a stationary probability distribution.
Solving all stationary equations at once can obscure the probabilistic structure: the equation $\pi P=\pi$ only checks that total mass entering each state matches its mass, after summing over all sources. A more local condition asks for balance edge by edge, so that the probability flow from $i$ to $j$ is exactly matched by the reverse flow from $j$ to $i$. This pairwise symmetry is the reversibility condition.
[definition: Reversible Markov Chain]
Let $P=(p_{ij})_{i,j \in S}$ be the transition matrix of a time-homogeneous Markov chain on a countable state space $S$, and let $\pi:S \to [0,1]$ be a probability distribution on $S$, written $\pi_i=\pi(i)$. The chain is reversible with respect to $\pi$ if for all $i,j \in S$,
\begin{align*}
\pi_i p_{ij}=\pi_j p_{ji}.
\end{align*}
[/definition]
Reversibility says that, at equilibrium, the flow along each edge is balanced by flow along the reversed edge. The point of imposing such a strong pairwise condition is that it should force the weaker global stationarity equations automatically: when the incoming mass to a state is rewritten using the reversed balanced flows, it should add back up to that state's own mass.
The useful criterion is therefore not just a definition, but a way to certify stationarity without solving the full system of linear equations. Once detailed balance is checked for every pair of states, the next result records exactly why those local identities are sufficient to produce a stationary distribution.
[quotetheorem:7218]
Detailed balance is central in random walks on graphs and in Markov chain Monte Carlo. It lets one design transition probabilities by choosing symmetric flows around a desired target distribution.
[example: Stationary Distribution of a Two-State Chain]
For the two-state chain with $p_{01}=a$ and $p_{10}=b$, where $a,b \in (0,1)$, the remaining one-step probabilities are
\begin{align*}
p_{00}=1-a
\end{align*}
and
\begin{align*}
p_{11}=1-b.
\end{align*}
A stationary distribution $\pi=(\pi_0,\pi_1)$ must satisfy $\pi_0+\pi_1=1$ and, for state $0$,
\begin{align*}
\pi_0=\pi_0p_{00}+\pi_1p_{10}.
\end{align*}
Substituting the entries of the transition matrix gives
\begin{align*}
\pi_0=\pi_0(1-a)+\pi_1b.
\end{align*}
Expanding the right-hand side,
\begin{align*}
\pi_0=\pi_0-\pi_0a+\pi_1b.
\end{align*}
Subtracting $\pi_0$ from both sides gives
\begin{align*}
0=-\pi_0a+\pi_1b.
\end{align*}
Adding $\pi_0a$ to both sides,
\begin{align*}
\pi_0a=\pi_1b.
\end{align*}
Since $\pi_0+\pi_1=1$, we have
\begin{align*}
\pi_1=1-\pi_0.
\end{align*}
Substituting this into $\pi_0a=\pi_1b$ gives
\begin{align*}
\pi_0a=(1-\pi_0)b.
\end{align*}
Expanding,
\begin{align*}
\pi_0a=b-\pi_0b.
\end{align*}
Adding $\pi_0b$ to both sides,
\begin{align*}
\pi_0a+\pi_0b=b.
\end{align*}
Factoring out $\pi_0$,
\begin{align*}
\pi_0(a+b)=b.
\end{align*}
Because $a+b>0$, division by $a+b$ gives
\begin{align*}
\pi_0=\frac{b}{a+b}.
\end{align*}
Then
\begin{align*}
\pi_1=1-\frac{b}{a+b}.
\end{align*}
Writing $1=(a+b)/(a+b)$,
\begin{align*}
\pi_1=\frac{a+b}{a+b}-\frac{b}{a+b}.
\end{align*}
Combining the fractions,
\begin{align*}
\pi_1=\frac{a}{a+b}.
\end{align*}
Thus
\begin{align*}
\pi=\left(\frac{b}{a+b},\frac{a}{a+b}\right).
\end{align*}
The state that is harder to leave receives more stationary mass: increasing $b$, the probability of leaving state $1$, increases $\pi_0$, while increasing $a$, the probability of leaving state $0$, increases $\pi_1$.
[/example]
### Convergence to Equilibrium
A stationary distribution need not attract all starting distributions. Deterministic alternation on two states has stationary distribution $(1/2,1/2)$, but a chain started at $0$ alternates forever. Irreducibility and aperiodicity are the finite-state hypotheses that remove the common structural obstructions.
[quotetheorem:2225]
This theorem is the positive recurrent aperiodic equilibrium picture. Under the right structural hypotheses, the initial state is forgotten in distribution and every row of $P^n$ approaches the same limiting distribution. Each hypothesis rules out a different obstruction: irreducibility prevents the limiting distribution from depending on a closed class chosen at the start, positive recurrence ensures that stationary mass is normalizable and repeatedly visited, and aperiodicity prevents deterministic cycling such as the two-state alternation mentioned above. Without these assumptions, stationary distributions may exist without attracting the chain, or different starting states may converge to different limiting laws. The theorem is therefore the main justification for using a stationary distribution as a long-run prediction rather than merely as an algebraic solution to $\pi P=\pi$.
## Model Families
Markov chains are usually specified by writing down a transition rule. Different modelling choices produce families with recognizable structure: absorbing chains for terminal states, birth-death chains for one-dimensional movement, and reflecting chains for constrained motion.
### Absorption
A terminal state is a state the process cannot leave. Such states are useful for modelling ruin, completion, failure, fixation, and other endpoint events.
[definition: Absorbing State]
Let $(X_n)_{n \ge 0}$ be a time-homogeneous Markov chain on a countable state space $S$. A state $i \in S$ is absorbing if
\begin{align*}
p_{ii}=1.
\end{align*}
[/definition]
Absorbing states turn long-term questions into hitting questions. The chain does not mix across all states; it eventually becomes trapped if absorption occurs.
[example: Absorbing Chain with Two Exits]
Let $S=\{0,1,2\}$, with $0$ and $2$ absorbing. From state $1$, the chain moves to $0$ with probability $r$, stays at $1$ with probability $s$, and moves to $2$ with probability $t$, where $r,s,t \ge 0$ and $r+s+t=1$. We compute the probability of eventual absorption at $2$ before absorption at $0$, starting from state $1$.
Assume $r+t>0$. For $k \ge 0$, the event that the chain holds at state $1$ for exactly $k$ steps and then exits to $2$ is
\begin{align*}
X_1=1,\ldots,X_k=1,X_{k+1}=2,
\end{align*}
with the convention that for $k=0$ this means $X_1=2$. By the path product rule for a time-homogeneous Markov chain, this event has probability
\begin{align*}
s^k t.
\end{align*}
The events for different values of $k$ are disjoint, because the first exit time from state $1$ has only one value. Therefore
\begin{align*}
\mathbb P_1(T_2<T_0)=\sum_{k=0}^{\infty}s^k t.
\end{align*}
Since $r+t>0$ and $r+s+t=1$, we have
\begin{align*}
s=1-(r+t)<1.
\end{align*}
Also $s \ge 0$, so the geometric series converges. Its partial sums satisfy
\begin{align*}
\sum_{k=0}^{m}s^k=\frac{1-s^{m+1}}{1-s}.
\end{align*}
Letting $m \to \infty$ gives $s^{m+1}\to 0$, hence
\begin{align*}
\sum_{k=0}^{\infty}s^k=\frac{1}{1-s}.
\end{align*}
Substituting this into the hitting probability,
\begin{align*}
\mathbb P_1(T_2<T_0)=t\cdot \frac{1}{1-s}.
\end{align*}
Using $r+s+t=1$, subtract $s$ from both sides to get
\begin{align*}
1-s=r+t.
\end{align*}
Therefore
\begin{align*}
\mathbb P_1(T_2<T_0)=\frac{t}{r+t}.
\end{align*}
Thus the holding probability $s$ changes the waiting time before absorption, but the eventual choice between exits $0$ and $2$ depends only on the relative exit probabilities $r$ and $t$.
[/example]
### Birth and Death
Many systems move by increments of size at most one: population size changes by one birth or one death, a queue grows or shrinks by one customer, and a random walk moves to neighbouring sites. This motivates a named one-dimensional class.
[definition: Birth-Death Chain]
A birth-death chain is a time-homogeneous Markov chain on a subset $S \subset \mathbb Z$ whose transition probabilities satisfy
\begin{align*}
p_{ij}=0 \quad \text{whenever } |i-j|>1.
\end{align*}
[/definition]
The name comes from population dynamics: a birth increases the state by one, a death decreases it by one, and holding leaves it unchanged.
[example: Reflecting Random Walk on a Finite Interval]
Let $S=\{0,1,\ldots,N\}$, with $N \ge 1$. For $1 \le i \le N-1$, suppose the chain moves to $i+1$ with probability $p$, to $i-1$ with probability $q$, and stays at $i$ with probability $1-p-q$, where $p,q>0$ and $p+q<1$. At the endpoints set
\begin{align*}
p_{0,1}=p,\quad p_{0,0}=1-p,\quad p_{N,N-1}=q,\quad p_{N,N}=1-q.
\end{align*}
The interior rows sum to
\begin{align*}
q+(1-p-q)+p=1.
\end{align*}
The endpoint rows sum to
\begin{align*}
(1-p)+p=1
\end{align*}
and
\begin{align*}
q+(1-q)=1.
\end{align*}
Thus the displayed rule is a stochastic transition rule on the finite interval.
The chain cannot leave $S$: at $0$ the only possible moves are to $0$ and $1$, at $N$ the only possible moves are to $N-1$ and $N$, and from an interior state the only possible moves are to neighbouring states or to itself. If $i<j$, the path
\begin{align*}
i,i+1,i+2,\ldots,j
\end{align*}
has probability
\begin{align*}
p^{j-i}>0.
\end{align*}
If $i>j$, the path
\begin{align*}
i,i-1,i-2,\ldots,j
\end{align*}
has probability
\begin{align*}
q^{i-j}>0.
\end{align*}
Therefore every state is accessible from every other state, so the chain is irreducible. Also
\begin{align*}
p_{00}=1-p>0
\end{align*}
because $p<1$, so state $0$ can return to itself in one step and has period $1$. Since all states communicate with $0$, the irreducible chain is aperiodic.
We now compute a stationary distribution. The neighbouring balance equations should satisfy
\begin{align*}
\pi_i p=\pi_{i+1}q
\end{align*}
for $0 \le i \le N-1$. Since $q>0$, this gives
\begin{align*}
\pi_{i+1}=\frac{p}{q}\pi_i.
\end{align*}
Writing $r=p/q$, repeated substitution gives
\begin{align*}
\pi_i=r^i\pi_0
\end{align*}
for every $0 \le i \le N$.
If $p=q$, then $r=1$, so $\pi_i=\pi_0$ for every $i$. The normalization condition gives
\begin{align*}
1=\sum_{i=0}^{N}\pi_i=\sum_{i=0}^{N}\pi_0=(N+1)\pi_0.
\end{align*}
Hence
\begin{align*}
\pi_0=\frac{1}{N+1}.
\end{align*}
Therefore
\begin{align*}
\pi_i=\frac{1}{N+1}
\end{align*}
for every $0 \le i \le N$, so the stationary distribution is uniform.
If $p \ne q$, then $r \ne 1$. The normalization condition gives
\begin{align*}
1=\sum_{i=0}^{N}\pi_i=\sum_{i=0}^{N}r^i\pi_0=\pi_0\sum_{i=0}^{N}r^i.
\end{align*}
The finite geometric sum is
\begin{align*}
\sum_{i=0}^{N}r^i=\frac{1-r^{N+1}}{1-r}.
\end{align*}
Thus
\begin{align*}
1=\pi_0\frac{1-r^{N+1}}{1-r}.
\end{align*}
Solving for $\pi_0$ gives
\begin{align*}
\pi_0=\frac{1-r}{1-r^{N+1}}.
\end{align*}
Therefore
\begin{align*}
\pi_i=\frac{1-r}{1-r^{N+1}}r^i.
\end{align*}
Substituting $r=p/q$,
\begin{align*}
\pi_i=\frac{1-p/q}{1-(p/q)^{N+1}}\left(\frac{p}{q}\right)^i.
\end{align*}
So the reflecting boundary changes the gambler's ruin chain from an absorbing model into a finite irreducible aperiodic chain with an equilibrium distribution: uniform when $p=q$, and geometric in the ratio $p/q$ when $p \ne q$.
[/example]
The reflecting walk contrasts with gambler's ruin. Both are birth-death chains on a finite interval, but the boundary rule changes the long-term behaviour from absorption to equilibrium.
## Beyond and Connected Topics
Markov chains sit at the intersection of elementary probability, linear algebra, and measure-theoretic stochastic processes. The finite and countable theory belongs naturally after [Cambridge IA Probability](/page/Cambridge%20IA%20Probability), where conditional probability and expectation are developed, and before more advanced measure-theoretic treatments in [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure).
The direct course-level continuation is [Cambridge IB Markov Chains](/page/Cambridge%20IB%20Markov%20Chains). That page develops classification, recurrence, invariant distributions, and convergence as a coherent undergraduate course, with many finite-state computations and standard examples.
A second direction is martingale theory. Many hitting-time arguments for Markov chains can be recast using martingales and optional stopping, especially for random walks and gambler's ruin. This connects Markov chains to broader stochastic process methods.
A third direction is Markov chain Monte Carlo. There the goal is reversed: instead of starting with a physical transition mechanism and finding its stationary distribution, one starts with a target distribution and designs a reversible chain having that target as equilibrium.
A fourth direction is coding and cryptography. Random walks on finite groups and graphs appear in mixing, random generation, and probabilistic algorithms; this gives a bridge toward [Cambridge II Coding and Cryptography](/page/Cambridge%20II%20Coding%20and%20Cryptography), where randomness is used as a computational resource.
A fifth direction is continuous time. Continuous-time Markov chains replace transition probabilities by rates and exponential holding times. The embedded jump chain is discrete-time, so the ideas of communication, recurrence, absorption, and stationarity remain visible.
## References
Androma, [Cambridge IB Markov Chains](/page/Cambridge%20IB%20Markov%20Chains).
Androma, [Cambridge IA Probability](/page/Cambridge%20IA%20Probability).
Androma, [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure).
Androma, [Cambridge II Coding and Cryptography](/page/Cambridge%20II%20Coding%20and%20Cryptography).
Norris, *Markov Chains* (1997).
Durrett, *Probability: Theory and Examples* (2019).
Levin, Peres, and Wilmer, *Markov Chains and Mixing Times* (2017).
Markov Chain
Also known as: ["Markov Chains","Discrete-Time Markov Chain","Time-Homogeneous Markov Chain","Transition Matrix"]