This course develops the modern theory of entropy in ergodic theory and shows how information-theoretic ideas organize the study of dynamical systems. It begins with entropy for partitions and the basic language of information, then moves to Kolmogorov-Sinai entropy as an isomorphism invariant that measures dynamical complexity. From there, the course turns to generators, entropy computation, and the [Shannon-McMillan-Breiman theorem](/theorems/6766), which explains how entropy governs the exponential rate at which the measure of typical orbit-segment names decays.
The later chapters broaden the scope from measure-theoretic dynamics to symbolic, topological, and statistical viewpoints. Bernoulli shifts and isomorphism problems illustrate the classification power of entropy, while Markov shifts and symbolic dynamics provide concrete models for more general systems. Topological entropy, the variational principle, and thermodynamic formalism connect entropy with pressure, equilibrium states, and statistical mechanics. The final chapters explore how entropy interacts with mixing and decay of correlations, with number-theoretic dynamical systems, and with phase transitions in statistical mechanics, showing how a single invariant links rigorous dynamics, probability, and mathematical physics.
# Introduction
This opening chapter fixes the scope and language of the course. Ergodic Theory I treated qualitative long-term behaviour: invariant sets, recurrence, ergodicity, weak mixing, and strong mixing. The second course asks for quantitative invariants, especially entropy, which measure how much information is produced by a dynamical system per unit time.
The guiding contrast is between systems that look chaotic because they separate nearby orbits and systems that are chaotic in the measure-theoretic sense because observations reveal genuinely new information. Entropy connects measurable dynamics, symbolic dynamics, topological dynamics, statistical mechanics, and number-theoretic examples. The aim of the course is to learn how entropy is defined, computed, compared under factors and codings, and used as a classification tool.
## The Central Questions of the Course
A measure-preserving system may be studied by repeatedly observing which atom of a finite partition contains the orbit point. The first question is how much information this observation process produces over a long time interval. A second question is whether that number is intrinsic to the system or depends on the chosen observation scheme.
[explanation: Entropy As Information Growth]
Let $(X, \mathcal B, \mu, T)$ be a probability-preserving system and let $\mathcal P$ be a finite measurable partition of $X$. The observation of $x, T x, \dots, T^{n-1}x$ through $\mathcal P$ records the atom of the joined partition
\begin{align*}
\mathcal P_0^{n-1} = \mathcal P \vee T^{-1}\mathcal P \vee \cdots \vee T^{-(n-1)}\mathcal P
\end{align*}
that contains $x$. The entropy of this joined partition measures the information needed to describe the length-$n$ name of a typical point. The entropy rate asks for the asymptotic average information per observation.
[/explanation]
This viewpoint turns dynamics into a source-coding problem: a partition gives a finite alphabet, orbit segments give words, and the measure gives the frequencies of those words. To make this rate useful, we need to know that the finite-time entropies have a well-defined long-time average.
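The source-coding picture can be tried numerically. The sketch below is only an illustration under stated assumptions: it uses a two-state stationary Markov measure on sequences (the transition matrix `P` and stationary vector `pi` are hypothetical choices, not taken from the course) and takes $\mathcal P$ to be the partition by the symbol at time $0$, so that atoms of $\mathcal P_0^{n-1}$ correspond to words of length $n$. It computes $\frac1n H_\mu(\mathcal P_0^{n-1})$ by brute-force enumeration and compares it with the closed-form entropy rate of the chain.

```python
import itertools
import math

def word_probability(word, pi, P):
    """Measure of the cylinder set defined by `word` under the Markov measure."""
    prob = pi[word[0]]
    for a, b in zip(word, word[1:]):
        prob *= P[a][b]
    return prob

def block_entropy(n, pi, P):
    """Shannon entropy (natural log) of the joined partition into length-n words."""
    total = 0.0
    for word in itertools.product(range(len(pi)), repeat=n):
        p = word_probability(word, pi, P)
        if p > 0:
            total -= p * math.log(p)
    return total

def entropy_rate(pi, P):
    """Closed-form entropy rate of a stationary Markov chain."""
    k = len(pi)
    return -sum(
        pi[i] * P[i][j] * math.log(P[i][j])
        for i in range(k)
        for j in range(k)
        if P[i][j] > 0
    )

# Hypothetical example chain; pi is stationary because pi[0]*0.1 == pi[1]*0.5.
P = [[0.9, 0.1], [0.5, 0.5]]
pi = [5 / 6, 1 / 6]

# Average information per observation for n = 1, ..., 10.
rates = [block_entropy(n, pi, P) / n for n in range(1, 11)]

for n, r in enumerate(rates, start=1):
    print(f"n={n:2d}  H_n/n = {r:.6f}")
print(f"entropy rate  = {entropy_rate(pi, P):.6f}")
```

For this particular measure the chain rule gives $H_\mu(\mathcal P_0^{n-1}) = H(\pi) + (n-1)h$, so the averages decrease toward the rate $h$. That is a feature of the Markov example, not a statement about the proof for general systems.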
[quotetheorem:6722]
[citeproof:6722]
The theorem explains why entropy is an asymptotic invariant rather than a finite-time statistic. The finiteness of the partition is essential here: on the countable atomic probability space $X=\{2,3,4,\dots\}$ with
\begin{align*}
\mu(\{n\}) = \frac{c}{n(\log n)^2},
\end{align*}
where $c>0$ normalises the total mass, the partition into singletons has infinite Shannon entropy. For the identity map on this space, the first normalised entropy value is already infinite, so no finite numerical rate is obtained from that countable observation. Measure preservation is also doing real work, because it lets the entropy of $T^{-m}\mathcal P_0^{n-1}$ agree with the entropy of $\mathcal P_0^{n-1}$ in the subadditivity argument. Without preservation this comparison can fail even for finite partitions: on $X=\{0,1\}$ with $\mu(\{0\})=\mu(\{1\})=1/2$, the map $T(0)=T(1)=0$ is not measure preserving, and for the partition into singletons the pullback $T^{-1}\mathcal P$ is the single-atom partition, so its entropy is $0$ rather than $H_\mu(\mathcal P)=\log 2$. The result does not say that the finite-time values are stable or monotone in $n$; it says only that their average information per step has a limiting rate. Chapter 2 removes the dependence on a particular finite observation by taking the supremum over all finite measurable partitions in the definition of Kolmogorov-Sinai entropy.
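The two-point counterexample can be checked mechanically. The following is a minimal sketch, with helper names (`preimage`, `pullback`, `partition_entropy`) invented for illustration. It encodes the space $\{0,1\}$ with equal weights, the collapsing map $T(0)=T(1)=0$, and, for contrast, the swap map, which is an added example not in the text and which does preserve $\mu$.

```python
import math

def preimage(T, points, subset):
    """Set of points whose image under T lies in `subset`."""
    return frozenset(x for x in points if T[x] in subset)

def pullback(T, points, partition):
    """Partition T^{-1}P, with empty preimages discarded."""
    atoms = (preimage(T, points, atom) for atom in partition)
    return [a for a in atoms if a]

def partition_entropy(mu, partition):
    """Shannon entropy (natural log) of a partition of a finite space."""
    total = 0.0
    for atom in partition:
        mass = sum(mu[x] for x in atom)
        if mass > 0:
            total -= mass * math.log(mass)
    return total

def preserves(T, points, mu):
    """Check mu(T^{-1}{x}) == mu({x}) for every point.

    On a finite space with all subsets measurable this suffices by additivity.
    """
    return all(
        math.isclose(sum(mu[y] for y in preimage(T, points, {x})), mu[x])
        for x in points
    )

points = [0, 1]
mu = {0: 0.5, 1: 0.5}
singletons = [frozenset({0}), frozenset({1})]

collapse = {0: 0, 1: 0}  # the map from the text: not measure preserving
swap = {0: 1, 1: 0}      # added comparison: measure preserving

h_original = partition_entropy(mu, singletons)
h_collapse = partition_entropy(mu, pullback(collapse, points, singletons))
h_swap = partition_entropy(mu, pullback(swap, points, singletons))

print(preserves(collapse, points, mu), h_collapse)
print(preserves(swap, points, mu), h_swap, h_original)
```

Under the collapsing map the pullback partition has a single atom of full mass, which is why its entropy drops to $0$. Under the swap the pullback is again the partition into singletons, so the entropy is unchanged.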
## Background Assumed from Ergodic Theory I
Before entropy can be compared across examples, we need to know what counts as the same long-term statistical experiment and what counts as a coarser observation of it. The course assumes that measure-preserving transformations have already been introduced as the measurable analogue of time evolution, so this section recalls only the basic objects to which entropy attaches a numerical invariant, and the morphisms under which those invariants are compared.
[definition: Probability-Preserving System]
A probability-preserving system is a quadruple $(X, \mathcal B, \mu, T)$ where $(X, \mathcal B, \mu)$ is a probability space and $T:X\to X$ is a measurable map such that
\begin{align*}
\mu(T^{-1}A)=\mu(A)
\end{align*}
for every $A\in \mathcal B$.
[/definition]
The map $T$ may be invertible or non-invertible, and both cases occur throughout the course. Invertible systems include shifts and rotations; non-invertible systems include expanding maps such as the doubling map on the circle.
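For the doubling map $T(x)=2x \bmod 1$ on $[0,1)$ with Lebesgue measure, the defining identity $\mu(T^{-1}A)=\mu(A)$ can be checked by hand on half-open intervals, which generate the Borel $\sigma$-algebra. The sketch below does this with exact rational arithmetic; the function names are hypothetical and the chosen interval is arbitrary. It also records why the definition uses preimages: for this non-invertible map, the forward image of $[0,\tfrac12)$ is all of $[0,1)$, so forward images do not have the same measure.

```python
from fractions import Fraction

def doubling_preimage(a, b):
    """Preimage of [a, b) in [0, 1) under T(x) = 2x mod 1.

    Each point has two preimages, x/2 and (x+1)/2, so the preimage is a
    union of two disjoint half-open intervals.
    """
    return [(a / 2, b / 2), ((a + 1) / 2, (b + 1) / 2)]

def total_length(intervals):
    """Lebesgue measure of a disjoint union of intervals."""
    return sum(right - left for left, right in intervals)

# Arbitrary test interval [1/3, 3/4).
a, b = Fraction(1, 3), Fraction(3, 4)
pre = doubling_preimage(a, b)
print(pre, total_length(pre), b - a)

# Forward image of [0, 1/2) is [0, 1): length 1, not 1/2.
forward_image_length = 2 * (Fraction(1, 2) - 0)
print(forward_image_length)
```

Because the arithmetic is exact, the comparison of `total_length(pre)` with `b - a` is an equality of fractions rather than a floating-point approximation.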