[motivation]
Probability theory, as studied in a first course, is largely a theory of independence. We toss a coin repeatedly, roll dice in sequence, draw from an urn with replacement — in each case, the outcome of one trial tells us nothing about the next. This independence assumption is mathematically convenient and allows us to prove powerful results: the law of large numbers, the central limit theorem, and so on. But it is also, frankly, an idealization.
In the real world, consecutive observations are almost never independent. The weather tomorrow is not independent of the weather today. The queue length at a service counter at 3pm is not independent of the queue length at 2pm. The price of a stock next week is not independent of its price this week. If we wish to model such phenomena mathematically, we need a theory of *dependent* random variables.
The difficulty is that "dependence" is an enormously broad concept. Given two random variables $X$ and $Y$, knowing that they are dependent tells us very little about their joint structure. Their relationship could be simple or extraordinarily complex, monotone or oscillatory, local or non-local. Without further assumptions, very little can be said in general, and essentially no theory can be built.
This parallels the situation in analysis. Given a function $f: \mathbb{R} \to \mathbb{R}$, knowing merely that $f$ exists tells us nothing useful. But if we assume $f$ is continuous, or differentiable, or convex, a rich theory opens up. The structural assumption is what makes the mathematics tractable.
The course takes the following approach: restrict attention to a particularly tractable — yet still general enough to be interesting — form of dependence. We study sequences of random variables $(X_0, X_1, X_2, \ldots)$ in which the conditional distribution of the next state $X_{n+1}$, given the entire history $X_0, X_1, \ldots, X_n$, depends only on the current state $X_n$. In symbols:
\begin{align*}
\mathbb{P}(X_{n+1} = j \mid X_0 = i_0, \ldots, X_{n-1} = i_{n-1}, X_n = i) = \mathbb{P}(X_{n+1} = j \mid X_n = i).
\end{align*}
This is the *Markov property*, and a sequence satisfying it is called a *Markov chain*. The intuition is vivid: the future of the process is determined entirely by its present state. The past — however the chain arrived at its current position — is irrelevant once the current state is known. This is sometimes called the "memorylessness" of the process, though the more precise term is the Markov property.
The Markov property is a structural constraint, and structural constraints enable proofs. As we will see throughout this course, just this single assumption — that the future depends only on the present — is enough to derive a rich and complete theory:
- We can compute the probability of reaching any state from any other state.
- We can determine which states a chain will eventually return to, and with what probability.
- We can identify when a chain settles into a long-run equilibrium distribution, independent of where it started.
- We can reverse time and ask whether the chain looks statistically the same run backwards.
None of these would be tractable for general dependent sequences. The Markov property is precisely the right level of constraint: strong enough to admit a complete theory, weak enough to model a vast range of real phenomena.
Before the formal definitions begin, it is worth noting that Markov chains are not exotic objects. A simple random walk on $\mathbb{Z}$ — where at each step the walker moves right or left with equal probability, independently of everything except the current position — is a Markov chain. The branching process — where each individual independently produces a random number of offspring — is a Markov chain (with the current population size as the state). Coin-tossing and die-rolling are degenerate Markov chains where the transition probabilities do not depend on the current state at all (since consecutive outcomes are independent).
What this course provides is the general framework that encompasses all these examples, and the tools to analyse them systematically.
This course studies discrete-time Markov chains on countable state spaces. The journey proceeds in four main phases. First, we define the basic objects and establish how the transition matrix $P = (p_{ij})$ encodes the entire dynamics of a homogeneous chain. Second, we classify states and chains according to their long-run behaviour: which states communicate, which are recurrent (the chain will surely return), and which are transient (there is positive probability of never returning). Simple random walks in dimensions one, two, and three provide concrete illustrations of this dichotomy. Third, we study invariant distributions and convergence to equilibrium — the deep theorem that an irreducible, positive recurrent, aperiodic chain forgets its initial state and converges to a unique stationary distribution. Finally, we examine time-reversal and detailed balance, which characterise chains that look the same run forwards and backwards.
Throughout, the emphasis is on understanding the probabilistic structure rather than treating Markov chains as a branch of linear algebra — though the language of matrices will appear naturally and usefully at several points.
[/motivation]