This course develops the modern theory of deterministic chaos and ergodic behavior in dynamical systems. It asks how simple nonlinear rules can produce complicated long-term behavior, and how that behavior can still be described using geometry, symbolic coding, and statistical laws. The central objects are recurrent orbits, invariant sets, hyperbolic dynamics, and invariant measures, with an emphasis on understanding when chaos is robust, when it can be classified, and how it can be measured.
The chapters begin with recurrence and the first signs of chaotic motion, then introduce symbolic dynamics as a way to encode orbits by sequences. From there the course builds the geometric theory of horseshoes, Smale dynamics, hyperbolic sets, stable manifolds, and homoclinic intersections, leading to shadowing and structural stability. Later chapters turn to quantitative complexity through topological entropy, then to invariant measures, ergodicity, Lyapunov exponents, SRB measures, and physical measures. The final chapters bring these strands together through Markov partitions and thermodynamic formalism, showing how geometry, coding, and statistics form a unified picture of chaotic dynamics.
# Introduction
This introductory chapter fixes the language and expectations for the course. The main problem is to understand how deterministic rules can produce behaviour that is geometrically organized, topologically complicated, and statistically regular. We will move between maps, flows, invariant sets, symbolic codings, entropy, Lyapunov exponents, and invariant measures, so the first task is to identify the common objects that all later chapters will refine.
The course assumes a first encounter with dynamical systems: fixed points, invariant manifolds, elementary stability, and examples from ordinary differential equations. Here the emphasis shifts from local phase portraits to long-term behaviour on invariant sets. A recurring theme is that chaos is not a synonym for disorder; it is the presence of rigid structures that make complicated orbit patterns analyzable.
## The Central Question of Chaotic Dynamics
What does it mean to understand a system after individual orbit prediction has become unreliable? For a map $f:X \to X$ or a flow $(\varphi_t)_{t \in \mathbb R}$ on a phase space $X$, the raw data are orbits, but the objects we can often control are invariant sets, recurrence properties, rates of separation, and invariant probability measures. The course develops these objects as complementary ways of describing the same dynamics.
[definition: Discrete-Time Dynamical System]
A discrete-time dynamical system is a pair $(X,f)$ where $X$ is a phase space and $f:X \to X$ is a map. The forward orbit of $x \in X$ is the sequence $(f^n(x))_{n \ge 0}$, where $f^0=\operatorname{id}_X$ and $f^{n+1}=f \circ f^n$.
[/definition]
This definition records the basic mechanism for maps: the system evolves by repeated composition. Many systems in applications evolve continuously in time rather than by a fixed update rule, so we also need notation for families of maps whose parameter is time itself.
[definition: Continuous-Time Dynamical System]
A continuous-time dynamical system is a family $(\varphi_t)_{t \in \mathbb R}$ of maps $\varphi_t:X \to X$ satisfying $\varphi_0=\operatorname{id}_X$ and $\varphi_{t+s}=\varphi_t \circ \varphi_s$ for all $s,t \in \mathbb R$. The orbit of $x \in X$ is the set $\{\varphi_t(x):t \in \mathbb R\}$.
[/definition]
The same data can be packaged as a single action $\varphi:\mathbb R \times X \to X$, written $\varphi(t,x)=\varphi_t(x)$; the group law $\varphi_{t+s}=\varphi_t \circ \varphi_s$ then becomes a joint condition on the two-variable map. Flows enter because many systems arise from differential equations, while maps arise both directly and as time-one maps or return maps. The bridge between them lets us transfer ideas such as recurrence, invariant measures, and entropy across continuous and discrete time.
[example: Time-One Map of a Flow]
Let $(\varphi_t)_{t \in \mathbb R}$ be the flow of an autonomous ordinary differential equation $\dot{x}=F(x)$ on a smooth manifold $M$, so $\varphi_0=\operatorname{id}_M$ and $\varphi_{t+s}=\varphi_t \circ \varphi_s$. Define $f=\varphi_1:M \to M$. Then the first few iterates are
\begin{align*}
f^0=\operatorname{id}_M=\varphi_0.
\end{align*}
Also,
\begin{align*}
f^1=f=\varphi_1.
\end{align*}
For the second iterate, the flow law gives
\begin{align*}
f^2=f\circ f=\varphi_1\circ \varphi_1=\varphi_{1+1}=\varphi_2.
\end{align*}
If $f^n=\varphi_n$, then
\begin{align*}
f^{n+1}=f\circ f^n=\varphi_1\circ \varphi_n=\varphi_{1+n}=\varphi_{n+1}.
\end{align*}
Thus, by induction, $f^n=\varphi_n$ for every $n\ge 0$.
For each $x\in M$, the forward orbit of the discrete-time system $(M,f)$ is therefore
\begin{align*}
\mathcal O_f^+(x)=\{f^n(x):n\ge 0\}=\{\varphi_n(x):n\ge 0\}.
\end{align*}
This is exactly the set of points obtained by sampling the continuous trajectory $t\mapsto \varphi_t(x)$ at integer times. The time-one map preserves the integer-time skeleton of the flow, while behaviour on intervals such as $0<t<1$ is invisible unless we keep the full family $(\varphi_t)_{t\in\mathbb R}$.
[/example]
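A minimal numerical sketch of the identity $f^n=\varphi_n$, assuming for illustration the linear equation $\dot x = Ax$ on $\mathbb R$ with the hypothetical constant $A=-0.3$, whose flow is $\varphi_t(x)=e^{At}x$. The helper names `phi`, `time_one`, and `iterate` are illustrative and not part of the notes.

```python
import math

# Assumed example: the linear ODE x' = A*x on the real line, whose flow
# is phi_t(x) = exp(A*t) * x. A is an illustrative constant.
A = -0.3


def phi(t: float, x: float) -> float:
    """Flow map phi_t(x) for x' = A*x."""
    return math.exp(A * t) * x


def time_one(x: float) -> float:
    """Time-one map f = phi_1."""
    return phi(1.0, x)


def iterate(f, n: int, x: float) -> float:
    """Return f^n(x) by repeated composition, with f^0 = identity."""
    for _ in range(n):
        x = f(x)
    return x


x0 = 2.0
for n in range(5):
    # f^n(x0) and phi_n(x0) should agree up to floating-point rounding.
    print(n, iterate(time_one, n, x0), phi(n, x0))

# The flow law phi_{t+s} = phi_t o phi_s also holds at non-integer times,
# which the time-one map alone never samples.
print(phi(0.4, phi(0.7, 1.5)), phi(1.1, 1.5))
```

Agreement is only up to floating-point rounding; the last line checks the flow law at times strictly between integers, which is exactly the information the time-one map discards.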
The example shows that different time descriptions can point to the same long-term motion. To compare such descriptions, we need to isolate the subsets that keep the dynamics inside themselves and therefore carry their own internal orbit structure.
[definition: Invariant Set]
Let $(X,f)$ be a discrete-time dynamical system. A subset $A \subset X$ is forward invariant if $f(A) \subset A$, invariant if $f(A)=A$, and completely invariant if $f^{-1}(A)=A$.
[/definition]
For non-invertible maps, these distinctions matter. Forward invariance is enough for future evolution to remain in $A$, while stronger versions keep track of preimages and are often needed when coding all bi-infinite orbit histories.
[example: Invariance for the Doubling Map]
View $S^1$ as $\mathbb R/\mathbb Z$ and let $D:S^1\to S^1$ be the doubling map $D([x])=[2x]$. The whole circle is invariant because for every $[y]\in S^1$, the point $[y/2]\in S^1$ satisfies
\begin{align*}
D([y/2])=[2(y/2)]=[y].
\end{align*}
Thus $D(S^1)=S^1$.
If $x$ is periodic with period $p$, so $D^p(x)=x$, its periodic orbit is
\begin{align*}
P=\{x,D(x),D^2(x),\ldots,D^{p-1}(x)\}.
\end{align*}
For $0\le k\le p-2$, applying $D$ to $D^k(x)$ gives $D^{k+1}(x)\in P$, and for the last point,
\begin{align*}
D(D^{p-1}(x))=D^p(x)=x\in P.
\end{align*}
Hence $D(P)\subset P$. Since every element of $P$ is also the image under $D$ of the preceding element in the same cycle, $D(P)=P$.
The symbolic picture comes from binary expansions. If
\begin{align*}
x=0.b_1b_2b_3\cdots=\sum_{j\ge 1} b_j2^{-j},
\end{align*}
with $b_j\in\{0,1\}$, then
\begin{align*}
2x=b_1+\sum_{j\ge 1}b_{j+1}2^{-j}.
\end{align*}
Reducing modulo $1$ removes the integer part $b_1$, so
\begin{align*}
D(x)=0.b_2b_3b_4\cdots.
\end{align*}
Thus $D$ acts on binary expansions by shifting the sequence one step to the left.
This explains the invariance issue for forbidden words. If a set is defined by forbidding a word everywhere in the binary sequence, then shifting left cannot create a new occurrence of that word, because every block in $b_2b_3b_4\cdots$ was already a block in $b_1b_2b_3\cdots$. That set is forward invariant. If the condition only forbids a word at a fixed initial position, forward invariance can fail: $x=3/8$ has binary expansion $0.011000\cdots$, which does not begin with $11$, but $D(3/8)=3/4$ has expansion $0.11000\cdots$, which does begin with $11$. The doubling map therefore turns a geometric question about subsets of the circle into a combinatorial question about which binary sequence conditions are preserved by the left shift.
[/example]
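The left-shift description can be checked with exact rational arithmetic. The sketch below, with hypothetical helper names `doubling` and `binary_digits`, reads off each binary digit as the integer part of $2x$, exactly as in the computation of $2x$ above, and reproduces the initial-position counterexample with $x=3/8$.

```python
import math
from fractions import Fraction


def doubling(x: Fraction) -> Fraction:
    """D(x) = 2x mod 1 on representatives x in [0, 1)."""
    return (2 * x) % 1


def binary_digits(x: Fraction, n: int) -> str:
    """First n binary digits b_1 ... b_n of x in [0, 1).

    Each digit is the integer part of 2x, as in 2x = b_1 + 0.b_2b_3...
    For dyadic rationals this picks the expansion ending in zeros.
    """
    digits = []
    for _ in range(n):
        x = 2 * x
        b = math.floor(x)  # next digit b_j, either 0 or 1
        digits.append(str(b))
        x -= b
    return "".join(digits)


x = Fraction(3, 8)  # binary 0.011000...
print(binary_digits(x, 6))            # 011000: does not begin with 11
print(binary_digits(doubling(x), 6))  # 110000: begins with 11
```

Because `binary_digits` itself works by repeated doubling, the digits of $D(x)$ are the digits of $x$ with the first one dropped, which is the left shift in exact arithmetic.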
## Three Viewpoints on Chaos
How can a single system be described by topology, geometry, and probability at the same time? The same orbit structure may be examined through open sets, through derivatives and stretching rates, or through the distribution of time spent in different regions. The course deliberately keeps these viewpoints in contact.
[definition: Topological Dynamical System]
A topological dynamical system is a pair $(X,f)$ where $X$ is a [topological space](/page/Topological%20Space) and $f:X \to X$ is continuous.
[/definition]
Continuity makes qualitative orbit behaviour visible through open sets. This topological language is not enough for questions about stretching rates, stable directions, or tangent-vector growth, because open sets do not record how nearby orbits separate infinitesimally. To ask whether a map expands one tangent direction while contracting another, the phase space must carry smooth coordinates and the time evolution must respect that smooth structure.
[definition: Smooth Dynamical System]
A smooth dynamical system is a pair $(M,f)$ where $M$ is a smooth manifold and $f:M \to M$ is a smooth map, or a smooth flow $(\varphi_t)_{t \in \mathbb R}$ on $M$.
[/definition]
Smoothness adds differential information: hyperbolicity, stable and unstable directions, and Lyapunov exponents all depend on how tangent vectors are transported along orbits. Beyond this geometric viewpoint, we also need a framework that records what happens for typical points rather than every point.
[definition: Measure-Preserving Dynamical System]
A measure-preserving dynamical system is a quadruple $(X,\mathcal F,\mu,f)$ where $(X,\mathcal F,\mu)$ is a [measure space](/page/Measure%20Space), $f:X \to X$ is measurable, and $\mu(f^{-1}(A))=\mu(A)$ for every $A \in \mathcal F$.
[/definition]
The measure-theoretic viewpoint replaces pointwise prediction by statistical questions. Instead of trying to predict a single orbit forever, we ask whether time averages converge, whether invariant measures exist, and whether almost every orbit has a typical frequency of visits.
[example: Irrational Rotation and the Three Viewpoints]
Let $S^1=\mathbb R/\mathbb Z$ and write $R_\alpha([x])=[x+\alpha]$, where $\alpha\notin\mathbb Q$. Iterating gives
\begin{align*}
R_\alpha^n([x])=[x+n\alpha]
\end{align*}
because the case $n=0$ is $R_\alpha^0([x])=[x]$, and if $R_\alpha^n([x])=[x+n\alpha]$, then
\begin{align*}
R_\alpha^{n+1}([x])=R_\alpha([x+n\alpha])=[x+n\alpha+\alpha]=[x+(n+1)\alpha].
\end{align*}
Topologically, the orbit is dense. Fix $\varepsilon>0$ and choose $N$ with $1/N<\varepsilon$. Among the $N+1$ points $[0],[\alpha],\ldots,[N\alpha]$, two lie in the same interval $[j/N,(j+1)/N)$ modulo $1$, so for some $1\le q\le N$ there is an integer $p$ with
\begin{align*}
|q\alpha-p|<1/N<\varepsilon.
\end{align*}
Since $\alpha$ is irrational, $q\alpha-p\ne 0$. Thus some positive iterate of $R_\alpha$ moves every point by a nonzero circle distance smaller than $\varepsilon$. Repeating this small step produces orbit points within $\varepsilon$ of every point of the circle, so $\{R_\alpha^n([x]):n\ge 0\}$ is dense in $S^1$.
Smoothly, in the coordinate $x\in\mathbb R/\mathbb Z$ the map is $x\mapsto x+\alpha$, so its derivative is
\begin{align*}
D(R_\alpha)_x=1.
\end{align*}
Therefore
\begin{align*}
D(R_\alpha^n)_x=1^n=1,
\end{align*}
so tangent vectors are neither expanded nor contracted exponentially.
Measure-theoretically, [Lebesgue measure](/page/Lebesgue%20Measure) $m$ is invariant because $R_\alpha^{-1}(A)=A-\alpha$ and translations preserve Lebesgue measure:
\begin{align*}
m(R_\alpha^{-1}(A))=m(A-\alpha)=m(A).
\end{align*}
Moreover, by *[Weyl equidistribution for irrational rotations](/theorems/3443)*, for every continuous $g:S^1\to\mathbb R$ and every $x\in S^1$,
\begin{align*}
\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}g(R_\alpha^n(x))=\int_{S^1}g\,dm.
\end{align*}
Thus irrational rotation has dense recurrent motion and stable time averages, but no sensitive dependence: distances between points are preserved by every iterate.
[/example]
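The smooth and measure-theoretic claims can be probed numerically. This sketch assumes the rotation number $\alpha=(\sqrt5-1)/2$ and the observable $g(x)=\cos^2(2\pi x)$, whose integral against $m$ is $1/2$. A floating-point $\alpha$ is itself a dyadic rational, so the code only imitates an irrational rotation over the finite horizon used. It compares Birkhoff averages from several starting points with $\int g\,dm$ and checks that $R_\alpha$ preserves circle distance.

```python
import math

# Assumed rotation number: the golden mean minus one. A float is itself a
# dyadic rational, so this only imitates an irrational rotation over the
# finite horizon used below.
ALPHA = (math.sqrt(5) - 1) / 2


def rotate(x: float, alpha: float = ALPHA) -> float:
    """R_alpha(x) = x + alpha mod 1."""
    return (x + alpha) % 1.0


def circle_dist(x: float, y: float) -> float:
    """Distance on R/Z: the shorter of the two arcs between x and y."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)


def birkhoff_average(g, x: float, n: int) -> float:
    """(1/n) * sum_{k=0}^{n-1} g(R_alpha^k(x))."""
    total = 0.0
    for _ in range(n):
        total += g(x)
        x = rotate(x)
    return total / n


def g(x: float) -> float:
    """Observable with integral 1/2 against Lebesgue measure on the circle."""
    return math.cos(2 * math.pi * x) ** 2


for x0 in (0.0, 0.3, 0.77):
    print(x0, birkhoff_average(g, x0, 100_000))  # each close to 0.5

# Isometry: rotation neither stretches nor shrinks circle distances.
print(circle_dist(0.9, 0.1), circle_dist(rotate(0.9), rotate(0.1)))
```

The averages agree with $\int g\,dm$ for every starting point tried, while distances are preserved exactly up to rounding, matching the claim of stable time averages without sensitive dependence.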
The irrational rotation example also shows why recurrence must be separated from expansion. We next need a theorem that supplies recurrence-like behaviour without assuming expansion, mixing, or a special invariant measure. Compactness gives the first such mechanism: an infinite forward orbit in a compact metric phase space cannot avoid accumulating somewhere.
[quotetheorem:7733]
[citeproof:7733]
This theorem does not yet say that the orbit returns close to its initial condition, nor that the accumulation set has rich dynamics. It gives the first compactness mechanism behind recurrence, which later becomes Birkhoff recurrence and Poincaré recurrence in stronger settings. Compactness is the load-bearing hypothesis here: for the map $f:\mathbb R \to \mathbb R$ given by $f(x)=x+1$, the forward orbit of $0$ is $\{0,1,2,\dots\}$ and has no accumulation point in $\mathbb R$.
## The Course Roadmap
Which pieces of structure will be introduced, and why are they ordered this way? The course begins with recurrence and topological chaos because these are the least technical ways to state that orbit behaviour is globally complicated. It then introduces symbolic dynamics as a combinatorial model in which many chaotic features can be computed.
[explanation: From Geometry to Symbols]
Hyperbolic dynamics often produces a stretching-and-folding mechanism: nearby points separate along unstable directions, return through folding, and generate many distinguishable orbit segments. Symbolic dynamics turns this mechanism into sequences over a finite alphabet. The shift map is simple to define, but it already exhibits dense periodic points, transitivity, mixing, entropy, and invariant measures.
The power of symbolic models is that they can serve both as examples and as codings of more geometric systems. A horseshoe, for instance, contains an invariant [Cantor set](/page/Cantor%20Set) on which the dynamics is conjugate to a full shift or a subshift of finite type. This lets us prove chaos in a smooth system by constructing a symbolic subsystem.
[/explanation]
After symbolic dynamics, the course turns to entropy and Lyapunov exponents. These quantify two different aspects of complexity: entropy counts distinguishable orbit segments at large times, while Lyapunov exponents measure exponential rates of tangent-vector growth. The first principle is combinatorial and topological, so we state it before the differential growth principle.
[definition: Informal Entropy Principle]
Entropy is the exponential growth rate of orbit complexity measured at finite resolution and long time.
[/definition]
This is not yet the formal definition of topological or measure-theoretic entropy. It is a guiding principle: later chapters make precise what is being counted, what resolution means, and how the growth rate depends on topology or measure. The complementary question is how individual tangent vectors grow along smooth orbits.
[definition: Informal Lyapunov Principle]
Lyapunov exponents are asymptotic exponential growth rates for derivatives along orbits.
[/definition]
The Lyapunov principle is smooth rather than purely topological. It connects sensitive dependence to differential expansion, and in the presence of invariant measures it becomes a statement about almost-everywhere asymptotic behaviour.
[example: Logistic Map at Parameter Four]
Let $L:[0,1]\to[0,1]$ be the logistic map at parameter four, $L(x)=4x(1-x)$. The folding is visible from the values at the endpoints and the critical point. Since
\begin{align*}
L(0)=4\cdot 0\cdot 1=0,\qquad L(1)=4\cdot 1\cdot 0=0,\qquad L(1/2)=4(1/2)(1/2)=1,
\end{align*}
the two halves $[0,1/2]$ and $[1/2,1]$ are both mapped onto $[0,1]$. The derivative is
\begin{align*}
L'(x)=4-8x=4(1-2x),
\end{align*}
so $L'(1/2)=0$, while $|L'(x)|>1$ exactly when $4|1-2x|>1$, equivalently $x<3/8$ or $x>5/8$. Thus the map expands on most of the interval but folds at $x=1/2$.
The relation with the doubling map becomes explicit under the substitution $h(\theta)=\sin^2(\pi\theta)$ from $\mathbb R/\mathbb Z$ to $[0,1]$. For $D(\theta)=2\theta \bmod 1$,
\begin{align*}
L(h(\theta))=4\sin^2(\pi\theta)(1-\sin^2(\pi\theta)).
\end{align*}
Using $1-\sin^2 u=\cos^2 u$ gives
\begin{align*}
L(h(\theta))=4\sin^2(\pi\theta)\cos^2(\pi\theta).
\end{align*}
Using $\sin(2u)=2\sin u\cos u$ gives
\begin{align*}
L(h(\theta))=\sin^2(2\pi\theta)=h(2\theta)=h(D(\theta)).
\end{align*}
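Therefore $L\circ h=h\circ D$: the map $h$ is a semiconjugacy from the doubling map to the logistic map at parameter four. It is onto and two-to-one away from $\theta=0$ and $\theta=1/2$, because $h(\theta)=h(1-\theta)$, so $L$ is a factor of $D$ rather than a conjugate copy of it.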
This also gives the absolutely continuous invariant measure. If $y=h(\theta)=\sin^2(\pi\theta)$ with $0<y<1$, the two inverse branches are
\begin{align*}
\theta_1(y)=\frac{1}{\pi}\arcsin(\sqrt y),\qquad \theta_2(y)=1-\frac{1}{\pi}\arcsin(\sqrt y).
\end{align*}
For the first branch,
\begin{align*}
\frac{d\theta_1}{dy}=\frac{1}{\pi}\cdot \frac{1}{\sqrt{1-y}}\cdot \frac{1}{2\sqrt y}=\frac{1}{2\pi\sqrt{y(1-y)}}.
\end{align*}
The second branch has the same absolute derivative, so pushing Lebesgue measure on $\mathbb R/\mathbb Z$ forward by $h$ gives density
\begin{align*}
\rho(y)=\frac{1}{\pi\sqrt{y(1-y)}}.
\end{align*}
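Since Lebesgue measure $m$ on $\mathbb R/\mathbb Z$ is $D$-invariant and $L\circ h=h\circ D$, every Borel set $B\subset[0,1]$ satisfies
\begin{align*}
m(h^{-1}(L^{-1}(B)))=m(D^{-1}(h^{-1}(B)))=m(h^{-1}(B)),
\end{align*}
so the pushforward $\rho(y)\,dy$ is $L$-invariant. Thus the same change of variables that turns doubling into $L$ transfers the binary symbolic coding to $L$ and supplies the invariant probability measure $\rho(y)\,dy$ on $[0,1]$.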
[/example]
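The semiconjugacy and the invariance of $\rho(y)\,dy$ can be tested directly. The sketch below uses the distribution function $F(y)=\frac{2}{\pi}\arcsin\sqrt y$ of $\rho$ (so $F'=\rho$) and checks that the $\rho$-mass of $L^{-1}([0,b])=[0,x_-]\cup[x_+,1]$, with $x_\pm=(1\pm\sqrt{1-b})/2$, equals $F(b)$. Function names are illustrative, and the checks are floating-point experiments, not a proof.

```python
import math


def L(x: float) -> float:
    """Logistic map at parameter four."""
    return 4.0 * x * (1.0 - x)


def h(theta: float) -> float:
    """Change of variables h(theta) = sin^2(pi*theta) from R/Z to [0, 1]."""
    return math.sin(math.pi * theta) ** 2


def D(theta: float) -> float:
    """Doubling map on R/Z."""
    return (2.0 * theta) % 1.0


def F(y: float) -> float:
    """Distribution function of rho(y) = 1/(pi*sqrt(y(1-y))):
    F(y) = (2/pi) * arcsin(sqrt(y)), so F(0) = 0 and F(1) = 1."""
    return (2.0 / math.pi) * math.asin(math.sqrt(y))


def preimage_mass(b: float) -> float:
    """rho-measure of L^{-1}([0, b]) = [0, x_minus] U [x_plus, 1],
    where x_minus, x_plus = (1 -/+ sqrt(1 - b)) / 2 solve L(x) = b."""
    root = math.sqrt(1.0 - b)
    x_minus, x_plus = (1.0 - root) / 2.0, (1.0 + root) / 2.0
    return F(x_minus) + (1.0 - F(x_plus))


# Semiconjugacy L o h = h o D, checked on a grid of angles.
print(max(abs(L(h(k / 1000)) - h(D(k / 1000))) for k in range(1000)))

# Invariance: rho(L^{-1}[0, b]) should equal rho([0, b]) = F(b).
for b in (0.1, 0.5, 0.9):
    print(b, preimage_mass(b), F(b))
```

The identity also holds exactly: writing $b=\sin^2\phi$ gives $x_-=\sin^2(\phi/2)$, and the symmetry $F(1-y)=1-F(y)$ makes the preimage mass $2F(x_-)=2\phi/\pi=F(b)$.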
The logistic map example illustrates why geometric complexity alone is not the end of the story: we still want to know what long orbit averages do for typical initial conditions. This leads to the ergodic problem, where invariant measures convert deterministic iteration into statistical statements.
[explanation: Time Averages as the Basic Ergodic Question]
Let $(X,\mathcal F,\mu,f)$ be a probability-preserving dynamical system and let $g \in L^1(X,\mathcal F,\mu)$. The central ergodic question is whether the averages
\begin{align*}
\frac{1}{N}\sum_{n=0}^{N-1} g(f^n(x))
\end{align*}
converge for $\mu$-a.e. $x$, and whether the limit is related to the space average $\int_X g\,d\mu$.
[/explanation]
Later, Birkhoff's Ergodic Theorem will answer the convergence part, and ergodicity, which forces every $f$-invariant integrable function to be constant $\mu$-a.e., will identify the limit with the space average $\int_X g\,d\mu$. This is the point at which chaos begins to look statistically stable rather than merely complicated.
## Conventions and Standing Assumptions
What conventions will prevent technical ambiguity later in the notes? Since the course moves between several categories, every theorem will specify whether $X$ is a [metric space](/page/Metric%20Space), a compact metric space, a smooth manifold, a measurable space, or a probability space. The same symbol may represent different structure in different chapters, so hypotheses carry real content.
[definition: Orbit Notation]
For a map $f:X \to X$, write $f^n$ for the $n$-fold iterate when $n \ge 1$, set $f^0=\operatorname{id}_X$, and write $\mathcal O^+(x)=\{f^n(x):n \ge 0\}$ for the forward orbit of $x$.
[/definition]
This notation is used throughout the discrete-time chapters. When $f$ is invertible, two-sided orbits use $n \in \mathbb Z$; when $f$ is not invertible, backward orbits may fail to exist or to be unique. Continuous-time systems require a separate notation because their time parameter is not restricted to integers.
[definition: Flow Notation]
For a flow $(\varphi_t)_{t \in \mathbb R}$ on $X$, with associated action $\varphi:\mathbb R \times X \to X$, write $\mathcal O(x)=\{\varphi(t,x):t \in \mathbb R\}$ for the full orbit of $x$ and $\mathcal O^+(x)=\{\varphi(t,x):t \ge 0\}$ for the forward orbit.
[/definition]
Flow notation will appear when discussing suspension flows, recurrence for continuous time, and Lyapunov exponents for differential equations. The course will often prove a statement for maps first and then explain the corresponding flow version.
[remark: Metrics and Coordinates]
When $X$ is a metric space, open balls are written $B(x_0,r)$. When $M$ is a smooth manifold, derivatives of maps are written as linear maps $Df_x$ or $dF_p$ according to context, while Jacobian matrices are written $Jf_x$ in coordinates.
[/remark]
The notation separates coordinate-free statements from coordinate calculations. This distinction becomes important when defining stable directions, invariant splittings, and Lyapunov exponents.
[remark: Probability Language]
For probability-preserving systems, the probability space is written $(X,\mathcal F,\mu)$ or $(X,\mathcal F,\mathbb P)$ according to context. Expectations are written $\mathbb E[\cdot]$, and statements holding outside a null set are said to hold $\mu$-a.e. or a.s.
[/remark]
These conventions make it possible to compare topological statements, which often concern every point or every [open set](/page/Open%20Set), with measure-theoretic statements, which usually concern almost every point.
## What Counts as Understanding a Chaotic System
What should a reader be able to do by the end of the course? The answer is not a single definition of chaos, but a collection of techniques that reveal structure at different resolutions. A system is understood when its recurrent sets, symbolic models, invariant measures, entropy, and growth rates fit into a coherent picture.
[explanation: The Working Philosophy]
Topological chaos asks whether orbits can move between regions in complicated ways. Hyperbolic dynamics explains how stretching and folding create this behaviour robustly. Symbolic dynamics converts orbit itineraries into sequences, making combinatorial arguments possible. Ergodic theory studies what typical long orbits do relative to invariant measures.
These perspectives do not always agree. An irrational rotation is topologically recurrent and measure-theoretically rigid but has zero entropy and no sensitive dependence. A full shift has abundant periodic points, positive entropy, and strong mixing. A smooth nonuniformly hyperbolic map may show expansion only for almost every point with respect to a chosen invariant measure. The course is organized around learning which hypotheses make these descriptions align.
[/explanation]
The later chapters will repeatedly return to two guiding questions. First, what geometric mechanism creates many distinguishable orbit segments? Second, what statistical law governs the long-time behaviour of typical orbits? The purpose of the introduction is to keep these questions visible before the technical machinery begins.
The next chapter begins by fixing the language needed to answer both, starting from recurrence and the basic notion of a dynamical system.
# 1. From Recurrence to Chaos
This opening chapter sets up the common language used throughout the course: a dynamical system is a rule for moving points in a phase space, and the central questions concern what can happen after many iterations or after a long time. We begin with recurrence, where orbits return near where they started, and then move toward topological chaos, where recurrence coexists with global mixing and sensitive dependence on initial conditions. The same vocabulary also appears outside dynamics proper: Markov chains ask for recurrence of states, numerical schemes iterate discretised evolution rules, and statistical mechanics studies long-time averages under measure-preserving transformations. The examples in this chapter are deliberately concrete: rotations, expanding circle maps, the logistic map, and suspension flows will reappear later as models for symbolic dynamics, entropy, and invariant measures.
## Orbits and Invariant Sets
How should we record the long-term behaviour of a point when the system may be discrete, continuous, invertible, or non-invertible? The first task is to separate the space of possible states from the rule that evolves those states, and then to define the orbit as the object whose geometry we study.
[definition: Discrete Dynamical System]
A discrete dynamical system is a pair $(X, f)$ where $X$ is a phase space and $f: X \to X$ is a map. For $n \in \mathbb N$, the $n$-th iterate is defined by
\begin{align*}
f^n &= \underbrace{f \circ \cdots \circ f}_{n\text{ factors}}, &
f^0 &= \operatorname{id}_X.
\end{align*}
[/definition]
The definition gives the evolution rule, but the raw list of iterates is not yet organised as a geometric object. To compare periodic motion, dense motion, and convergence, we need a named set attached to each initial condition. That set is the orbit, and the distinction between forward and full orbits records whether the dynamics can be reversed.
[definition: Forward Orbit]
Let $(X, f)$ be a discrete dynamical system and let $x \in X$. The forward orbit of $x$ is
\begin{align*}
\mathcal O^+(x) = \{f^n(x) : n \ge 0\}.
\end{align*}
If $f$ is invertible, the full orbit of $x$ is
\begin{align*}
\mathcal O(x) = \{f^n(x) : n \in \mathbb Z\}.
\end{align*}
[/definition]
The forward orbit is the basic experimental trace of a discrete system. In a non-invertible system, two different points may merge under iteration, so the past is not part of the data unless an inverse branch has been chosen.
[example: Irrational Rotation Orbits]
Let $\mathbb T=\mathbb R/\mathbb Z$ and fix $\alpha\in\mathbb R\setminus\mathbb Q$. For $R_\alpha(x)=x+\alpha \pmod 1$, induction from the definition of iterate gives
\begin{align*}
R_\alpha^n(x)=x+n\alpha \pmod 1.
\end{align*}
Hence the forward orbit is $\mathcal O^+(x)=\{x+n\alpha \pmod 1:n\ge 0\}$.
This orbit never repeats. Indeed, if $R_\alpha^n(x)=R_\alpha^m(x)$ with $n>m$, then
\begin{align*}
x+n\alpha \equiv x+m\alpha \pmod 1.
\end{align*}
Subtracting $x+m\alpha$ from both sides gives
\begin{align*}
(n-m)\alpha\in\mathbb Z.
\end{align*}
Since $n-m\ne 0$, this implies $\alpha\in\mathbb Q$, contradicting the choice of $\alpha$.
Now let $I\subset\mathbb T$ be any non-empty arc, and choose $N$ so large that $1/N<|I|$. Divide $\mathbb T$ into $N$ arcs of length $1/N$. Among the $N+1$ points
\begin{align*}
0,\alpha,2\alpha,\ldots,N\alpha \pmod 1
\end{align*}
two lie in the same subarc, so for some $0\le i<j\le N$ the integer $q=j-i$ satisfies
\begin{align*}
\|q\alpha\|_{\mathbb T}<1/N,
\end{align*}
where $\|t\|_{\mathbb T}$ is the distance from $t$ to the nearest integer. Write $\beta=q\alpha \pmod 1$. Then either $0<\beta<1/N$ or $0<1-\beta<1/N$. In the first case, the points $0,\beta,2\beta,\ldots$ move around the circle with gaps smaller than $1/N$ before each wraparound; in the second case, the same statement holds in the opposite circular order because the step $1-\beta$ is smaller than $1/N$. Thus some multiple $m\beta$ lies in the translated arc $I-x$. Since
\begin{align*}
x+m\beta \equiv x+mq\alpha \equiv R_\alpha^{mq}(x)\pmod 1,
\end{align*}
the forward orbit of $x$ meets $I$. Therefore irrational rotations have non-periodic orbits that nevertheless visit every arc of the circle.
[/example]
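The pigeonhole step and the arc-hitting conclusion are both constructive, so they can be sketched in code. Here `pigeonhole_q` returns some $q\le N$ with $\|q\alpha\|_{\mathbb T}<1/N$ by placing the points $j\alpha \bmod 1$ into $N$ boxes, and `first_hit` finds the first iterate landing in a given arc. The rotation number $\sqrt2-1$ and the arc $(0.123,0.124)$ are illustrative choices, and floating-point arithmetic makes this an experiment rather than a proof.

```python
import math


def dist_to_int(t: float) -> float:
    """||t||_T, the distance from t to the nearest integer."""
    return abs(t - round(t))


def pigeonhole_q(alpha: float, n_boxes: int) -> int:
    """Return q in 1..n_boxes with ||q*alpha|| < 1/n_boxes.

    Follows the text: two of the n_boxes + 1 points j*alpha mod 1,
    0 <= j <= n_boxes, fall into the same subarc [k/N, (k+1)/N).
    """
    first_visit = {}
    for j in range(n_boxes + 1):
        # Clamp guards against rounding pushing a point into a box N.
        box = min(math.floor((j * alpha % 1.0) * n_boxes), n_boxes - 1)
        if box in first_visit:
            return j - first_visit[box]
        first_visit[box] = j
    raise AssertionError("unreachable by the pigeonhole principle")


def first_hit(x: float, alpha: float, a: float, b: float,
              max_iter: int = 10**6) -> int:
    """Smallest n >= 0 with R_alpha^n(x) in the open arc (a, b), 0 <= a < b <= 1."""
    y = x % 1.0
    for n in range(max_iter):
        if a < y < b:
            return n
        y = (y + alpha) % 1.0
    raise ValueError("no hit within max_iter iterates")


alpha = math.sqrt(2) - 1  # assumed (floating-point) rotation number
q = pigeonhole_q(alpha, 100)
print(q, dist_to_int(q * alpha))
print(first_hit(0.5, alpha, 0.123, 0.124))
```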
Rotations show that an orbit may be recurrent without being periodic. To discuss parts of phase space that trap orbits, we need a notion of invariance.
[definition: Invariant Set]
Let $(X, f)$ be a discrete dynamical system. A subset $A \subset X$ is forward invariant if $f(A) \subset A$, invariant if $f(A) = A$, and completely invariant if $f^{-1}(A) = A$.
[/definition]
Forward invariance is the right condition for trapping regions: after the orbit enters $A$, it remains there. Equality $f(A)=A$ is stronger for non-invertible maps because every point of $A$ must have at least one preimage in $A$.
[example: Invariant Sets for the Doubling Map]
Let $D:\mathbb T\to\mathbb T$ be $D(x)=2x \pmod 1$. The singleton $\{0\}$ is invariant because
\begin{align*}
D(0)=2\cdot 0 \pmod 1=0.
\end{align*}
Thus $D(\{0\})=\{0\}$.
Let
\begin{align*}
A=\left\{\frac{p}{2^k}\pmod 1 : p\in\mathbb Z,\ k\in\mathbb N\right\}
\end{align*}
be the set of dyadic rational classes. If $x=\frac{p}{2^k}\pmod 1$ with $k\ge 1$, then
\begin{align*}
D(x)=2\cdot \frac{p}{2^k}\pmod 1=\frac{p}{2^{k-1}}\pmod 1.
\end{align*}
The right-hand side is again dyadic. If $k=0$, then $x=p\pmod 1=0$, so $D(x)=0$. Hence $D(A)\subset A$, so $A$ is forward invariant. Moreover, repeated doubling gives
\begin{align*}
D^k\left(\frac{p}{2^k}\pmod 1\right)=p\pmod 1=0,
\end{align*}
so every dyadic rational class eventually lands on the fixed point $0$.
The whole circle is invariant because every $y\in\mathbb T$ has a preimage under $D$: for example, if $x=\frac{y}{2}\pmod 1$, then
\begin{align*}
D(x)=2\cdot \frac{y}{2}\pmod 1=y\pmod 1.
\end{align*}
Thus $D(\mathbb T)=\mathbb T$. By contrast, an arc such as $I=(0,\frac{1}{10})\subset\mathbb T$ is not forward invariant, since $\frac{3}{40}\in I$ but
\begin{align*}
D\left(\frac{3}{40}\right)=\frac{3}{20}\pmod 1
\end{align*}
and $\frac{3}{20}>\frac{1}{10}$, so $D(\frac{3}{40})\notin I$. This illustrates the difference between a trapping set and a set whose image expands outside itself.
[/example]
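The dyadic computations can be replayed with exact rationals. In this sketch (hypothetical helper names), `steps_to_zero` counts the doublings needed to reach the fixed point $0$, refusing non-dyadic inputs whose orbits never get there, and the last lines reproduce the failure of forward invariance for $I=(0,\frac{1}{10})$.

```python
from fractions import Fraction


def D(x: Fraction) -> Fraction:
    """Doubling map D(x) = 2x mod 1, computed exactly on rationals."""
    return (2 * x) % 1


def steps_to_zero(x: Fraction) -> int:
    """Number of doublings taking a dyadic rational class to the fixed point 0."""
    den = x.denominator
    if den & (den - 1):  # denominator is not a power of 2
        raise ValueError("x is not a dyadic rational, so the orbit never reaches 0")
    n = 0
    while x != 0:
        x = D(x)
        n += 1
    return n


print(steps_to_zero(Fraction(5, 32)))  # 5
x = Fraction(3, 40)
print(0 < x < Fraction(1, 10), D(x), 0 < D(x) < Fraction(1, 10))  # True 3/20 False
```

For $p$ odd, the class of $p/2^k$ needs exactly $k$ doublings, consistent with $D^k(p/2^k)=0$ above.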
Continuous-time dynamics require the same concepts but with time indexed by $\mathbb R$ or $[0,\infty)$. The compatibility condition says that flowing for time $s$ and then time $t$ agrees with flowing for time $s+t$.
[definition: Flow]
A flow on a phase space $X$ is a map $\varphi: \mathbb R \times X \to X$ such that, writing $\varphi_t(x)=\varphi(t,x)$, we have $\varphi_0 = \operatorname{id}_X$ and $\varphi_{t+s} = \varphi_t \circ \varphi_s$ for all $s,t \in \mathbb R$.
[/definition]
For flows generated by autonomous ordinary differential equations, $\varphi_t(x_0)$ is the solution at time $t$ starting at $x_0$. A semiflow uses $t \ge 0$ instead of all real times, and this is common when solutions need not exist backward in time.
[example: Suspension Flow over a Circle Map]
Let $f:\mathbb T\to\mathbb T$ be continuous. The unit-roof suspension space is the quotient
\begin{align*}
X_f=(\mathbb T\times[0,1])/((x,1)\sim(f(x),0)).
\end{align*}
Write $[x,s]$ for the equivalence class of $(x,s)$. For $t\ge 0$, choose
\begin{align*}
k=\lfloor s+t\rfloor
\end{align*}
and
\begin{align*}
r=s+t-k.
\end{align*}
Then $0\le r<1$, and the forward suspension motion is
\begin{align*}
\Phi_t([x,s])=[f^k(x),r].
\end{align*}
Thus the point moves upward from height $s$ to height $s+t$; each time the height crosses an integer, the quotient relation identifies the top of one fiber with the bottom over the image under $f$.
For the cross-section $\Sigma=\{[x,0]:x\in\mathbb T\}$, the first unit of time gives
\begin{align*}
\Phi_1([x,0])=[f^{\lfloor 1\rfloor}(x),1-\lfloor 1\rfloor]=[f(x),0].
\end{align*}
Inductively, for every $n\in\mathbb N$,
\begin{align*}
\Phi_n([x,0])=[f^n(x),0].
\end{align*}
Indeed, if $\Phi_n([x,0])=[f^n(x),0]$, then
\begin{align*}
\Phi_{n+1}([x,0])=\Phi_1([f^n(x),0])=[f(f^n(x)),0]=[f^{n+1}(x),0].
\end{align*}
So the return map of the suspension motion to $\Sigma$ is exactly $f$. In particular, if $f^{n_j}(x)\to x$ for some sequence $n_j\to\infty$, then
\begin{align*}
\Phi_{n_j}([x,0])=[f^{n_j}(x),0]\to [x,0].
\end{align*}
Thus discrete recurrence for $f$ becomes return behaviour for the continuous-time suspension, with the cross-section $\Sigma$ recording the original map.
[/example]
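The formula for $\Phi_t$ translates directly into code. This sketch assumes, for illustration, the doubling map as the base map $f$ and represents $[x,s]$ by the pair $(x,s)$ with $0\le s<1$. Exact `Fraction` arithmetic makes the checks of $\Phi_n([x,0])=[f^n(x),0]$ and of the flow law $\Phi_t\circ\Phi_u=\Phi_{t+u}$ free of rounding.

```python
import math
from fractions import Fraction


def suspension_flow(f, t, x, s):
    """Forward unit-roof suspension flow Phi_t([x, s]) = [f^k(x), r],
    where k = floor(s + t) and r = s + t - k, for t >= 0 and 0 <= s < 1."""
    if t < 0:
        raise ValueError("only forward times t >= 0 are modelled")
    k = math.floor(s + t)
    r = s + t - k
    for _ in range(k):  # each crossing of an integer height applies f once
        x = f(x)
    return x, r


def doubling(x):
    """Illustrative base map: the doubling map on R/Z, exact on Fractions."""
    return (2 * x) % 1


x = Fraction(1, 7)
# 1/7 -> 2/7 -> 4/7 -> 1/7, so the suspension orbit through [1/7, 0] closes at time 3.
print(suspension_flow(doubling, 3, x, 0))

# Flow law: Phi_t o Phi_u = Phi_{t+u}, exact because Fractions avoid rounding.
s, u, t = Fraction(1, 3), Fraction(5, 4), Fraction(7, 6)
print(suspension_flow(doubling, t, *suspension_flow(doubling, u, Fraction(3, 11), s))
      == suspension_flow(doubling, t + u, Fraction(3, 11), s))
```

The periodic point $1/7$ of $f$ gives a closed orbit of the suspension, the continuous-time counterpart of exact return.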
Suspensions give the bridge between iterated maps and flows. Many topological properties are studied first for maps because the return map to a cross-section often contains the essential recurrence.
## Limit Sets and Recurrence
When an orbit does not converge to a single point, what should count as its limiting behaviour? The correct object is not the set of all visited points, but the set of points that are approached along arbitrarily late times.
[definition: Omega-Limit Set]
Let $(X,d)$ be a metric space and let $f: X \to X$. The omega-limit set of $x \in X$ is
\begin{align*}
\omega(x) = \{y \in X : \text{there exist } n_k \to \infty \text{ with } f^{n_k}(x) \to y\}.
\end{align*}
For a flow $\varphi_t$, the omega-limit set is
\begin{align*}
\omega(x) = \{y \in X : \text{there exist } t_k \to \infty \text{ with } \varphi_{t_k}(x) \to y\}.
\end{align*}
[/definition]
The omega-limit set ignores finite initial behaviour, so it is stable under replacing $x$ by $f^m(x)$. Before using it as the long-term state space of an orbit, we need to know that compact dynamics actually produce such limiting points and that the set behaves well under the map. The next result supplies those structural facts.
[quotetheorem:7734]
[citeproof:7734]
This theorem explains why compact systems cannot lose all limiting behaviour at infinity: compactness prevents the tail sequence from escaping, and continuity is exactly what lets limits of tail iterates remain compatible with the dynamics. Without compactness the conclusion can fail; for instance, the translation $f(x)=x+1$ on $\mathbb R$ has $\omega(x)=\varnothing$ for every $x$. Without continuity, a [limit point](/page/Limit%20Point) of the orbit need not be sent to a limit point of the shifted orbit, so forward invariance can break.
The theorem also has a sharp limitation: $\omega(x)$ may be a complicated compact invariant set, not a single attracting state, and it does not imply that $x$ itself returns near where it started. It turns recurrence into a membership question, namely whether the original point lies in the compact invariant set attached to its own tail. The next definition isolates precisely that case.
[definition: Recurrent Point]
Let $(X,d)$ be a metric space and let $f: X \to X$. A point $x \in X$ is recurrent if $x \in \omega(x)$.
[/definition]
Equivalently, a recurrent point has a sequence $n_k \to \infty$ such that $f^{n_k}(x) \to x$. Recurrence is weaker than periodicity: a periodic point returns exactly after a fixed number of iterates, while a recurrent point only returns arbitrarily close along a sequence of times.
[example: Periodic and Recurrent Points]
For the doubling map $D(x)=2x \pmod 1$, take a rational class $x=\frac{p}{q}\pmod 1$ with $q$ odd. Since $\gcd(2,q)=1$, multiplication by $2$ is invertible modulo $q$. Among the finitely many residues
\begin{align*}
1,2,2^2,\ldots,2^q \pmod q
\end{align*}
two are equal, so for some $0\le i<j\le q$,
\begin{align*}
2^i\equiv 2^j \pmod q.
\end{align*}
Multiplying by the inverse of $2^i$ modulo $q$ gives
\begin{align*}
2^{j-i}\equiv 1 \pmod q.
\end{align*}
Therefore
\begin{align*}
D^{j-i}\left(\frac{p}{q}\right)=\frac{2^{j-i}p}{q}\pmod 1=\frac{p}{q}\pmod 1,
\end{align*}
so $x$ is periodic. More generally, every periodic point is recurrent: if $f^r(x)=x$ for some $r\ge 1$, then for $n_k=kr$ we have
\begin{align*}
f^{n_k}(x)=f^{kr}(x)=x
\end{align*}
for every $k$, and $n_k\to\infty$.
For an irrational rotation $R_\alpha(x)=x+\alpha\pmod 1$, no point is periodic. Indeed, if $R_\alpha^n(x)=x$ for some $n\ge 1$, then
\begin{align*}
x+n\alpha\equiv x \pmod 1,
\end{align*}
so $n\alpha\in\mathbb Z$, hence $\alpha\in\mathbb Q$, a contradiction. However, every point is recurrent. For each $N\ge 1$, the $N+1$ points
\begin{align*}
0,\alpha,2\alpha,\ldots,N\alpha \pmod 1
\end{align*}
lie in $N$ arcs of length $1/N$, so two of them differ by some $q_N\alpha$ with $1\le q_N\le N$ and
\begin{align*}
\|q_N\alpha\|_{\mathbb T}<\frac{1}{N}.
\end{align*}
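Letting $N\to\infty$ gives return times with $\|q_N\alpha\|_{\mathbb T}\to 0$. These times are unbounded, because a single $q\ge 1$ with $\|q\alpha\|_{\mathbb T}<1/N$ for infinitely many $N$ would force $q\alpha\in\mathbb Z$, contradicting the irrationality of $\alpha$. Hence, along a subsequence with $q_N\to\infty$,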
\begin{align*}
R_\alpha^{q_N}(x)=x+q_N\alpha \pmod 1 \to x.
\end{align*}
Thus the doubling map gives exact periodic return for these rational classes, while irrational rotations give approximate return without exact periodicity.
[/example]
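Both halves of the example reduce to short computations. In the sketch below (function names are hypothetical), the period of $p/q$ under $D$ is computed as the least $r\ge 1$ with $2^rp\equiv p\pmod q$, which equals the multiplicative order of $2$ modulo the denominator of $p/q$ in lowest terms. The rotation helper searches for a $q\le N$ with $\|q\alpha\|_{\mathbb T}<1/N$, as the pigeonhole argument guarantees. A floating-point $\alpha$ is only a rational approximation of an irrational number, so this illustrates finite-range behaviour, not the limit.

```python
from fractions import Fraction
from math import gcd, sqrt


def doubling_period(p, q):
    """Least r >= 1 with D^r(p/q) = p/q (mod 1), for odd q.

    Writing p/q in lowest terms with denominator q', the condition is
    2^r = 1 (mod q'), so r is the multiplicative order of 2 modulo q'.
    """
    if q % 2 == 0:
        raise ValueError("q must be odd")
    q_red = q // gcd(p, q)
    target = 1 % q_red          # equals 0 when q_red == 1
    r, power = 1, 2 % q_red
    while power != target:      # terminates because gcd(2, q_red) = 1
        power = (2 * power) % q_red
        r += 1
    return r


def close_return_time(alpha, N):
    """Smallest 1 <= q <= N with ||q*alpha|| < 1/N (exists by pigeonhole)."""
    for q in range(1, N + 1):
        frac = (q * alpha) % 1.0
        if min(frac, 1.0 - frac) < 1.0 / N:
            return q
    raise RuntimeError("no q found; floating-point error suspected")


# Usage: 1/7 has period 3 since 2^3 = 8 = 1 (mod 7).
r = doubling_period(1, 7)
assert (2**r * Fraction(1, 7)) % 1 == Fraction(1, 7)

alpha = sqrt(2) - 1
returns = [close_return_time(alpha, N) for N in (10, 100, 1000)]
```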
The examples show recurrence in familiar systems, but the [compactness theorem](/theorems/2748) suggests a broader existence principle. Even if a particular starting point does not return near itself, the closed region generated by its future orbit has compact limiting structure. The next theorem says that, in a compact space, every non-empty closed trapping region contains at least one point with genuine recurrent behaviour.
[quotetheorem:3428]
The result does more than find a recurrent point: it identifies a compact invariant piece with no smaller closed trapping subsystem. Each hypothesis is doing visible work. Compactness is what makes nested intersections non-empty; on the non-compact system $f(x)=x+1$ on $\mathbb R$, the closed forward invariant set $\mathbb R$ contains no recurrent point. Closedness keeps limiting points inside the region under study, while forward invariance is needed because a set that is not trapping can lose its orbit before recurrence has any chance to occur.
The theorem should not be read as saying that every point in a closed forward invariant set is recurrent, or that recurrence occurs on a large open set. A compact system may have attracting basins whose typical points converge toward a smaller recurrent set without being recurrent themselves. To reuse this irreducible compact piece, we need a name for closed forward invariant sets with no proper closed trapping subsystem. This motivates the following definition of a minimal set, which will let us describe rotations and later symbolic systems in a common language.
[definition: Minimal Set]
Let $X$ be a compact metric space and let $f: X \to X$ be continuous. A non-empty closed forward invariant set $M \subset X$ is minimal if it contains no proper non-empty closed forward invariant subset.
[/definition]
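Minimal sets are the places where recurrence is spread over the whole set: if $M$ is minimal and $x\in M$, then $\overline{\mathcal O^+(x)}$ and $\omega(x)$ are non-empty closed forward invariant subsets of $M$, so both equal $M$ and in particular $x$ is recurrent. A single periodic orbit is the simplest minimal set; irrational rotations give minimal sets with no periodic points at all.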
[example: Minimality of Irrational Rotations]
Let $\alpha \in \mathbb R \setminus \mathbb Q$ and let $R_\alpha(x)=x+\alpha \pmod 1$ on $\mathbb T$. We show that every forward orbit is dense, and then minimality follows from closed forward invariance.
Fix $x\in\mathbb T$ and let $I\subset\mathbb T$ be a non-empty arc. Choose $N\ge 1$ such that $1/N<|I|$. Among the $N+1$ points
\begin{align*}
0,\alpha,2\alpha,\ldots,N\alpha \pmod 1
\end{align*}
two lie in the same one of the $N$ half-open arcs of length $1/N$. Thus there are integers $0\le i<j\le N$ such that, with $q=j-i$, we have
\begin{align*}
\|q\alpha\|_{\mathbb T}<\frac{1}{N}.
\end{align*}
Since $\alpha$ is irrational and $q\ge 1$, $q\alpha\notin\mathbb Z$, so $q\alpha \pmod 1$ is not $0$.
Write $\beta=q\alpha \pmod 1$ with $0<\beta<1$. If $\beta<1/N$, set $\gamma=\beta$; if $\beta>1-1/N$, set $\gamma=1-\beta$. In either case,
\begin{align*}
0<\gamma<\frac{1}{N}<|I|.
\end{align*}
The points $0,\gamma,2\gamma,\ldots \pmod 1$ cut the circle into successive gaps of length at most $\gamma$, so every arc of length larger than $\gamma$ contains one of them.
If $\gamma=\beta$, apply this to the translated arc $I-x$: there is $m\ge 0$ with $m\beta\in I-x$ modulo $1$. Since $\beta\equiv q\alpha \pmod 1$,
\begin{align*}
x+m\beta \equiv x+mq\alpha \equiv R_\alpha^{mq}(x)\pmod 1.
\end{align*}
Hence $R_\alpha^{mq}(x)\in I$. If $\gamma=1-\beta$, apply it instead to the reflected arc $x-I$, which has the same length: there is $m\ge 0$ with $m\gamma\in x-I$, equivalently $-m\gamma\in I-x$. Because $-\gamma\equiv \beta \pmod 1$, this says $m\beta\in I-x$ modulo $1$, and the same calculation gives
\begin{align*}
x+m\beta \equiv x+mq\alpha \equiv R_\alpha^{mq}(x)\pmod 1,
\end{align*}
so again $R_\alpha^{mq}(x)\in I$.
Thus the forward orbit of $x$ meets every non-empty arc $I$, so it is dense in $\mathbb T$.
Now let $M\subset\mathbb T$ be non-empty, closed, and forward invariant under $R_\alpha$. Choose $x\in M$. Forward invariance gives $R_\alpha^n(x)\in M$ for every $n\ge 0$, so
\begin{align*}
\overline{\mathcal O^+(x)}\subset M.
\end{align*}
The density just proved gives $\overline{\mathcal O^+(x)}=\mathbb T$, hence $M=\mathbb T$. Therefore the only non-empty closed forward invariant subset is the whole circle, so the irrational rotation is minimal.
[/example]
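The density argument can be watched numerically: for a fixed starting point, search for the first iterate of $R_\alpha$ that lands in a prescribed arc. This is a hedged sketch; the helper name is hypothetical, and a floating-point $\alpha$ is rational, so the code only illustrates what the proof guarantees for a genuinely irrational rotation.

```python
from math import sqrt


def first_hit(alpha, x, left, width, max_steps=10**6):
    """Smallest n >= 0 with R_alpha^n(x) in the open arc (left, left + width) mod 1.

    Returns None if no hit occurs within max_steps iterates.
    """
    for n in range(max_steps):
        y = (x + n * alpha) % 1.0
        offset = (y - left) % 1.0   # position of y measured from the arc's left end
        if 0.0 < offset < width:
            return n
    return None


alpha = sqrt(2) - 1
# Hitting times for ten arcs of width 0.001 spread around the circle.
times = [first_hit(alpha, 0.3, c / 10, 0.001) for c in range(10)]
```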
Minimality is a topological notion, and the rotation example shows how it can be verified by density of every orbit. The course also needs a recurrence principle suited to probabilistic and statistical questions, where conclusions are allowed to fail on a set of measure zero. To state that principle, the dynamics must preserve a finite measure.
[definition: Measure-Preserving Transformation]
Let $(X,\mathcal F,\mu)$ be a measure space. A measurable map $f:X\to X$ is measure-preserving if
\begin{align*}
\mu(f^{-1}(A)) = \mu(A)
\end{align*}
for every $A \in \mathcal F$.
[/definition]
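The inverse image appears because $f$ may not be invertible. When $\mu$ is a probability measure, the definition says that if $x$ is chosen at random with distribution $\mu$, then $f(x)$ has the same distribution, since $\mu(\{x: f(x)\in A\})=\mu(A)$ for every $A\in\mathcal F$.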
[quotetheorem:3425]
Poincaré recurrence says nothing about convergence of orbits, but it is much stronger than the mere existence of a recurrent point: it is a statement about almost every point of a set of positive measure. The finiteness assumption is essential: translation $f(x)=x+1$ on $(\mathbb R,\mathcal B(\mathbb R),\mathcal L^1)$ preserves Lebesgue measure, but the set $A=[0,1)$ satisfies $f^n(A)\cap A=\varnothing$ for every $n\ge 1$, so no point of $A$ ever returns to $A$. Measure preservation is equally essential: a map that pushes mass into a strict attracting subset can make a positive-measure set disappear from its own future, so recurrence need not hold.
The theorem also does not bound the gaps between return times, does not say that $f^n(x)$ converges, and does not assert that every point returns. It is an almost-everywhere statement for a finite invariant measure, and it is the first indication that invariant measures impose statistical constraints on orbits.
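A small experiment contrasts the two situations discussed above. The sketch computes first return times to $A=[0,0.1)$ for an irrational rotation, which preserves the finite Lebesgue measure on $\mathbb T$, and for the translation $x\mapsto x+1$ on $\mathbb R$ with $A=[0,1)$, where the invariant measure is infinite. The function name and the finite search horizon are assumptions of the illustration; a finite simulation cannot prove an almost-everywhere statement.

```python
from math import sqrt


def first_return(f, in_A, x, max_steps):
    """Smallest n >= 1 with f^n(x) in A, or None if none occurs within max_steps."""
    for n in range(1, max_steps + 1):
        x = f(x)
        if in_A(x):
            return n
    return None


alpha = sqrt(2) - 1
rotation = lambda x: (x + alpha) % 1.0
in_A_circle = lambda x: 0.0 <= x < 0.1

translation = lambda x: x + 1.0
in_A_line = lambda x: 0.0 <= x < 1.0

# Finite invariant measure: sample points of A all come back.
circle_returns = [first_return(rotation, in_A_circle, 0.005 + 0.01 * j, 10**4) for j in range(10)]
# Infinite invariant measure: a point of [0, 1) is carried off to the right forever.
line_return = first_return(translation, in_A_line, 0.5, 10**4)
```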
## Topological Transitivity and Mixing
A system may have recurrent points but still split into independent regions. The next question asks when the dynamics can move information from any open part of the phase space to any other open part.
[definition: Topological Transitivity]
Let $(X,d)$ be a metric space and let $f:X\to X$ be continuous. The map $f$ is topologically transitive if for every pair of non-empty open sets $U,V \subset X$, there exists $n \ge 0$ such that
\begin{align*}
f^n(U) \cap V \ne \varnothing.
\end{align*}
[/definition]
Transitivity is a topological irreducibility condition. It says that no open region is dynamically isolated from another open region. The open-set formulation is useful for proofs, but in examples we often recognise transitivity by finding a single orbit that visits every open set. This motivates proving that the open-set definition is equivalent to the dense-orbit formulation on compact metric spaces.
[quotetheorem:7735]
[citeproof:7735]
The theorem turns a global open-set condition into the existence of a single dense orbit, but the hypotheses rule out several pathologies. The no-isolated-points condition is needed because an isolated point is itself an open set: on $X=\{0\}\cup\{1/n:n\ge 1\}$ with $f(1/n)=1/(n+1)$ and $f(0)=0$, the forward orbit of $1$ is dense, yet the open set $\{1/2\}$ never reaches the open set $\{1\}$. On a finite discrete space with a cyclic permutation, the open-set condition holds and every orbit is dense, but this is a degenerate form of transitivity: density only means visiting finitely many isolated points. Compact metric spaces supply the countable basis and the [Baire category theorem](/theorems/630) used to build one point meeting all basis elements; outside a Baire or second-countable setting, the open dense hitting sets used in the proof may have empty intersection or fail to reduce to a countable list.
The dense-orbit formulation also has a limitation: it gives one orbit that samples every open set, not uniform information about all orbits and not any estimate on hitting times. Mixing, introduced next, strengthens transitivity by requiring visits from $U$ to $V$ at all sufficiently late times rather than at a single time.
[example: Transitivity of the Doubling Map]
Let $U,V\subset\mathbb T$ be non-empty open arcs. Choose a smaller open arc $J\subset U$ and write it in a lift as $J=(a,b)\pmod 1$ with length $\ell=b-a>0$. For $n\ge 0$, induction from $D(x)=2x\pmod 1$ gives
\begin{align*}
D^n(x)=2^n x\pmod 1.
\end{align*}
Choose $n$ so large that $2^n\ell>1$. We show that $D^n(J)=\mathbb T$. Let $y\in\mathbb T$ and choose a representative $y\in[0,1)$. Since the interval $(2^n a,2^n b)$ has length
\begin{align*}
2^n b-2^n a=2^n(b-a)=2^n\ell>1,
\end{align*}
there is an integer $k$ with
\begin{align*}
y+k\in(2^n a,2^n b).
\end{align*}
Set $x=(y+k)/2^n$. Then $x\in(a,b)$, so $x\pmod 1\in J$, and
\begin{align*}
D^n(x)=2^n x\pmod 1=y+k\pmod 1=y\pmod 1.
\end{align*}
Thus $D^n(J)=\mathbb T$, and therefore
\begin{align*}
D^n(U)\cap V\supset D^n(J)\cap V=V\ne\varnothing.
\end{align*}
Since every non-empty open subset of $\mathbb T$ contains an open arc, the doubling map is topologically transitive.
In binary notation, away from the two possible expansions of dyadic endpoints, $x=0.\varepsilon_1\varepsilon_2\varepsilon_3\cdots$ satisfies
\begin{align*}
D(x)=2x\pmod 1=0.\varepsilon_2\varepsilon_3\varepsilon_4\cdots.
\end{align*}
Thus each iterate shifts the binary sequence left, which is why an interval determined by finitely many initial binary digits eventually becomes large enough to meet any prescribed open arc.
[/example]
Transitivity asks for at least one hitting time, so it still allows long gaps between visits from $U$ to $V$. Chaotic expanding examples usually have a stronger property: after waiting long enough, every later time is a hitting time. This persistent version is topological mixing.
[definition: Topological Mixing]
Let $(X,d)$ be a metric space and let $f:X\to X$ be continuous. The map $f$ is topologically mixing if for every pair of non-empty open sets $U,V \subset X$, there exists $N \ge 0$ such that
\begin{align*}
f^n(U) \cap V \ne \varnothing
\end{align*}
for all $n \ge N$.
[/definition]
Mixing implies transitivity, but the reverse implication fails. Irrational rotation is transitive on the circle, but its rigid geometry prevents eventual persistent overlap of all open arcs.
[example: Irrational Rotation Is Not Mixing]
Let $R_\alpha(x)=x+\alpha \pmod 1$ with $\alpha\notin\mathbb Q$. We show that $R_\alpha$ is not topologically mixing by choosing two arcs whose intersections fail at arbitrarily large times. Let
\begin{align*}
U=(0,\tfrac{1}{10})\subset\mathbb T
\end{align*}
and
\begin{align*}
V=(\tfrac{1}{2},\tfrac{3}{5})\subset\mathbb T.
\end{align*}
For every $n\ge 0$,
\begin{align*}
R_\alpha^n(U)=U+n\alpha \pmod 1,
\end{align*}
because $R_\alpha^n(x)=x+n\alpha \pmod 1$ by induction from the definition of iterate.
It remains to find arbitrarily large $n$ for which this translate misses $V$. Fix $N\ge 1$. Among the $N+1$ points
\begin{align*}
0,\alpha,2\alpha,\ldots,N\alpha \pmod 1
\end{align*}
two lie in the same one of the $N$ arcs of length $1/N$, so for some $1\le q\le N$,
\begin{align*}
\|q\alpha\|_{\mathbb T}<\frac{1}{N}.
\end{align*}
For $N\ge 10$ this gives some $q=q_N$ with
\begin{align*}
\|q_N\alpha\|_{\mathbb T}<\frac{1}{N}\le\frac{1}{10}.
\end{align*}
The values $q_N$ cannot remain bounded: otherwise a single integer $q\ge 1$ would occur for infinitely many $N$, forcing $\|q\alpha\|_{\mathbb T}<1/N$ for arbitrarily large $N$, hence $q\alpha\in\mathbb Z$, which is impossible for irrational $\alpha$. So there are arbitrarily large integers $q$ with $\|q\alpha\|_{\mathbb T}<\frac{1}{10}$.
For such a $q$, write $q\alpha\pmod 1=\theta$. If $0\le \theta<1/10$, then
\begin{align*}
R_\alpha^q(U)=(\theta,\theta+\tfrac{1}{10})\subset (0,\tfrac{1}{5}),
\end{align*}
so $R_\alpha^q(U)\cap V=\varnothing$. If $9/10<\theta<1$, then
\begin{align*}
R_\alpha^q(U)=(\theta,1)\cup[0,\theta-\tfrac{9}{10})\subset (\tfrac{9}{10},1)\cup[0,\tfrac{1}{10}),
\end{align*}
so again $R_\alpha^q(U)\cap V=\varnothing$. Since these missing times $q$ are arbitrarily large, there is no $M$ such that $R_\alpha^n(U)\cap V\ne\varnothing$ for all $n\ge M$. Thus irrational rotation is transitive but not topologically mixing.
[/example]
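The non-mixing argument can be checked on a finite range of times. The sketch below tests whether two open arcs on $\mathbb T$ meet, using the fact that arcs $(0,a)$ and $(d,d+b)$ with $a,b<1$ and $0\le d<1$ intersect exactly when $d<a$ or $d>1-b$. It then lists the times $n$ at which $R_\alpha^n(U)$ misses $V$ for the arcs of the example. Names are hypothetical and $\alpha$ is a floating-point stand-in for an irrational number.

```python
from math import sqrt


def arcs_meet(a_left, a_len, b_left, b_len):
    """Do the open arcs (a_left, a_left + a_len) and (b_left, b_left + b_len) meet mod 1?

    Both lengths are assumed to lie in (0, 1).
    """
    d = (b_left - a_left) % 1.0  # start of the second arc, measured from the first
    return d < a_len or d > 1.0 - b_len


def missing_times(alpha, U, V, n_max):
    """Times 0 <= n <= n_max with R_alpha^n(U) disjoint from V; arcs are (left, length)."""
    return [n for n in range(n_max + 1)
            if not arcs_meet((U[0] + n * alpha) % 1.0, U[1], V[0], V[1])]


alpha = sqrt(2) - 1
U, V = (0.0, 0.1), (0.5, 0.1)   # the arcs (0, 1/10) and (1/2, 3/5)
misses = missing_times(alpha, U, V, 5000)
```

Transitivity shows up as the presence of hitting times, and the failure of mixing as missing times that never stop occurring.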
Expanding maps behave differently because they stretch small sets until they cover the space. For rotations, the image of a small arc always has the same length, so the dynamics can move the arc around without creating persistent overlap. For the doubling map, each iterate doubles the lifted length of an arc; once that length exceeds $1$, the image wraps all the way around and covers the circle, and every later image does too. This stretching mechanism is the topological form of the loss of initial binary digits.
[example: Mixing for the Doubling Map]
Let $D:\mathbb T\to\mathbb T$ be $D(x)=2x\pmod 1$, and let $U,V\subset\mathbb T$ be non-empty open arcs. Choose an open arc $J\subset U$ and write a lift of it as $J=(a,b)\pmod 1$ with length $\ell=b-a>0$. By induction,
\begin{align*}
D^n(x)=2^n x\pmod 1.
\end{align*}
Indeed, the case $n=0$ is $D^0(x)=x=2^0x\pmod 1$, and if $D^n(x)=2^n x\pmod 1$, then
\begin{align*}
D^{n+1}(x)=D(D^n(x))=2(2^n x)\pmod 1=2^{n+1}x\pmod 1.
\end{align*}
Choose $N$ such that $2^N\ell>1$. Then for every $n\ge N$ we also have $2^n\ell>1$. Fix such an $n$ and take any $y\in\mathbb T$, represented by a number $y\in[0,1)$. The interval $(2^n a,2^n b)$ has length
\begin{align*}
2^n b-2^n a=2^n(b-a)=2^n\ell>1.
\end{align*}
Since every open interval of length greater than $1$ contains some point of the form $y+k$ with $k\in\mathbb Z$, choose $k$ such that
\begin{align*}
y+k\in(2^n a,2^n b).
\end{align*}
Set
\begin{align*}
x=\frac{y+k}{2^n}.
\end{align*}
Then $x\in(a,b)$, so $x\pmod 1\in J\subset U$, and
\begin{align*}
D^n(x)=2^n x\pmod 1=y+k\pmod 1=y\pmod 1.
\end{align*}
Thus $D^n(J)=\mathbb T$ for every $n\ge N$. Therefore
\begin{align*}
D^n(U)\cap V\supset D^n(J)\cap V=\mathbb T\cap V=V\ne\varnothing.
\end{align*}
So for every pair of non-empty open arcs $U,V$, all sufficiently large iterates of $U$ meet $V$. Since every non-empty open subset of $\mathbb T$ contains an open arc, this is exactly topological mixing for the doubling map. The calculation shows the mechanism: each iterate doubles the lifted length until the image wraps around the whole circle.
[/example]
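The covering step in the mixing argument is constructive, so it can be run with exact rational arithmetic. The sketch below computes the first $N$ with $2^N\ell>1$. Then, for any $n\ge N$ and any target $y\in[0,1)$, it builds $x=(y+k)/2^n\in(a,b)$ with $D^n(x)=y$ by taking the smallest integer $k$ with $y+k>2^na$. The function names are hypothetical, and the lift $(a,b)$ is assumed to have length less than $1$.

```python
from fractions import Fraction
from math import floor


def doubling_iterate(x, n):
    """D^n(x) = 2^n x mod 1."""
    return (2**n * x) % 1


def covering_time(a, b):
    """Smallest N >= 0 with 2^N (b - a) > 1 for the lifted arc (a, b)."""
    N = 0
    while 2**N * (b - a) <= 1:
        N += 1
    return N


def preimage_in_arc(a, b, y, n):
    """Return x in (a, b) with D^n(x) = y, following the argument in the text."""
    lo, hi = 2**n * a, 2**n * b
    k = floor(lo - y) + 1           # smallest integer with y + k > lo
    if not y + k < hi:              # can fail only when 2^n (b - a) <= 1
        raise ValueError("the image of (a, b) does not yet cover the circle")
    return (y + k) / 2**n


a, b = Fraction(3, 10), Fraction(31, 100)   # an arc of length 1/100
N = covering_time(a, b)
x = preimage_in_arc(a, b, Fraction(5, 9), N + 3)
```

Because $y+k\in(2^na,2^na+1]$ and $2^nb-2^na>1$, the chosen $k$ works for every $n\ge N$, which is the "every later time" feature of mixing.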
## Sensitive Dependence and Devaney Chaos
Transitivity and mixing describe how open sets move, but chaos also concerns the instability of individual predictions. The guiding question is whether arbitrarily close initial conditions can eventually separate by a definite amount.
[definition: Sensitive Dependence on Initial Conditions]
Let $(X,d)$ be a metric space and let $f:X\to X$ be continuous. The map $f$ has sensitive dependence on initial conditions if there exists $\delta>0$ such that for every $x \in X$ and every $\varepsilon>0$, there exist $y \in X$ and $n\ge 0$ with
\begin{align*}
d(x,y)<\varepsilon, \qquad d(f^n(x), f^n(y))>\delta.
\end{align*}
[/definition]
The constant $\delta$ is a uniform scale of eventual separation. Sensitivity does not say that nearby points separate forever; it says that arbitrary precision in the initial condition cannot prevent some later macroscopic discrepancy.
[example: Sensitivity of the Doubling Map]
For $D(x)=2x \pmod 1$ on $\mathbb T=\mathbb R/\mathbb Z$, we show sensitivity with separation constant $\delta=\frac14$. Fix $x\in\mathbb T$ and $\varepsilon>0$. Choose $k\ge 0$ so large that
\begin{align*}
2^{-k-1}<\varepsilon.
\end{align*}
Set
\begin{align*}
y=x+2^{-k-1}\pmod 1.
\end{align*}
Then the circle distance satisfies
\begin{align*}
d_{\mathbb T}(x,y)\le 2^{-k-1}<\varepsilon.
\end{align*}
By induction from $D(x)=2x\pmod 1$, we have
\begin{align*}
D^k(z)=2^kz\pmod 1.
\end{align*}
Therefore
\begin{align*}
D^k(y)=2^k(x+2^{-k-1})\pmod 1=2^kx+\frac12\pmod 1.
\end{align*}
Also
\begin{align*}
D^k(x)=2^kx\pmod 1.
\end{align*}
On the circle, a point and its translate by $\frac12$ have distance $\frac12$, so
\begin{align*}
d_{\mathbb T}(D^k(x),D^k(y))=\frac12>\frac14.
\end{align*}
Thus every neighbourhood of every $x$ contains a point whose orbit separates from the orbit of $x$ by more than $\delta=\frac14$ at some iterate, which is sensitive dependence.
In binary notation, the same mechanism is visible: changing only the $(k+1)$-st binary digit of $x$ moves it by exactly $2^{-k-1}$, and $k$ applications of the doubling map shift that digit into first position, where it produces a discrepancy of $\frac12$.
[/example]
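The sensitivity construction is explicit enough to execute. Following the example, the sketch below picks the least $k\ge 0$ with $2^{-k-1}<\varepsilon$, sets $y=x+2^{-k-1}\pmod 1$, and checks that $D^k(x)$ and $D^k(y)$ are exactly $\tfrac12$ apart on the circle. Helper names are hypothetical; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def circle_distance(u, v):
    """Distance on R/Z between u and v."""
    d = (u - v) % 1
    return min(d, 1 - d)


def doubling_iterate(x, n):
    """D^n(x) = 2^n x mod 1."""
    return (2**n * x) % 1


def separating_partner(x, eps):
    """Return (y, k) with d(x, y) < eps and d(D^k x, D^k y) = 1/2."""
    k = 0
    while Fraction(1, 2**(k + 1)) >= eps:
        k += 1
    y = (x + Fraction(1, 2**(k + 1))) % 1
    return y, k


x = Fraction(2, 7)
y, k = separating_partner(x, Fraction(1, 1000))
```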
The doubling map combines sensitivity with transitivity and abundant periodic points, and this combination is stronger than sensitivity alone. A map can be sensitive for local reasons while still decomposing into unrelated pieces. This motivates Devaney's definition, which packages together global indecomposability, dense exact recurrence, and prediction instability.
[definition: Devaney Chaos]
Let $(X,d)$ be a metric space and let $f:X\to X$ be continuous. The map $f$ is chaotic in the sense of Devaney if $f$ is topologically transitive, the set $\{x \in X : f^n(x)=x \text{ for some } n\ge 1\}$ is dense in $X$, and $f$ has sensitive dependence on initial conditions.
[/definition]
The definition lists sensitivity as a separate condition because it captures the predictive meaning of chaos. In compact spaces without isolated points, however, the first two conditions already force sensitivity. The next theorem explains why dense periodic structure and transitivity leave no room for stable prediction.
[quotetheorem:7736]
[citeproof:7736]
This result explains why Devaney's third condition is sometimes treated as a consequence rather than an independent axiom, but the hypotheses are not cosmetic. Transitivity alone is not enough: an irrational rotation is transitive but has no sensitive dependence because it is an isometry. Dense periodic points alone are not enough either: the identity map on $[0,1]$ fixes every point, so periodic points are dense, yet nearby orbits never separate because the map is far from transitive. Sensitivity still captures only eventual separation at some time; it does not imply sustained exponential divergence, positive entropy, or statistical randomness. Later symbolic and entropy viewpoints strengthen this qualitative instability into computable orbit complexity.
[example: Logistic Map at Parameter Four]
Let $L:[0,1]\to[0,1]$ be $L(x)=4x(1-x)$, and define
\begin{align*}
g(u)=\sin^2\left(\frac{\pi u}{2}\right).
\end{align*}
The map $g$ is a homeomorphism from $[0,1]$ to $[0,1]$ because $\frac{\pi u}{2}$ increases from $0$ to $\frac{\pi}{2}$ and $\sin^2$ is strictly increasing on that interval. Let $T(u)=1-|1-2u|$ be the tent map. If $0\le u\le \frac12$, then $T(u)=2u$, and
\begin{align*}
g(T(u))=\sin^2(\pi u).
\end{align*}
Also,
\begin{align*}
L(g(u))=4\sin^2\left(\frac{\pi u}{2}\right)\left(1-\sin^2\left(\frac{\pi u}{2}\right)\right).
\end{align*}
Using $1-\sin^2\theta=\cos^2\theta$ gives
\begin{align*}
L(g(u))=4\sin^2\left(\frac{\pi u}{2}\right)\cos^2\left(\frac{\pi u}{2}\right)=\sin^2(\pi u).
\end{align*}
Thus $L(g(u))=g(T(u))$ on $[0,\frac12]$. If $\frac12\le u\le 1$, then $T(u)=2-2u$, so
\begin{align*}
g(T(u))=\sin^2(\pi(1-u))=\sin^2(\pi u),
\end{align*}
and the same calculation gives $L(g(u))=g(T(u))$. Hence
\begin{align*}
L\circ g=g\circ T.
\end{align*}
So the logistic map at parameter four is conjugate to the tent map.
It is also semiconjugate to the doubling map on $\mathbb T$ by
\begin{align*}
h(t)=\sin^2(\pi t).
\end{align*}
Indeed,
\begin{align*}
L(h(t))=4\sin^2(\pi t)(1-\sin^2(\pi t)).
\end{align*}
Again using $1-\sin^2\theta=\cos^2\theta$,
\begin{align*}
L(h(t))=4\sin^2(\pi t)\cos^2(\pi t)=\sin^2(2\pi t)=h(2t\pmod 1).
\end{align*}
Thus
\begin{align*}
L\circ h=h\circ D,
\end{align*}
where $D(t)=2t\pmod 1$. The map $h$ is not one-to-one because $h(t)=h(1-t)$: it folds the circle onto $[0,1]$ by identifying $t$ with $1-t$, and this fold is the passage from the doubling map to the interval model.
The tent map has dense periodic points: for each $n$, $T^n$ maps every dyadic interval $[k2^{-n},(k+1)2^{-n}]$ affinely onto $[0,1]$, so by the intermediate value theorem each such interval contains a fixed point of $T^n$, and every subinterval of $[0,1]$ contains a dyadic interval once $n$ is large. Since $g$ is a homeomorphism and $L\circ g=g\circ T$, the images under $g$ of those periodic points are periodic for $L$, so periodic points of $L$ are dense in $[0,1]$. The same conjugacy transfers transitivity and, because $g$ and $g^{-1}$ are uniformly continuous on the compact interval, sensitive dependence from the tent map to $L$: the stretching-and-folding map $T$ separates nearby points by shifting binary itineraries, and $g$ carries that orbit structure onto the logistic map. Thus $L(x)=4x(1-x)$ is a concrete interval map with explicit formulas and chaotic orbit structure.
[/example]
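The two conjugacy identities can be spot-checked numerically on a grid. This is a floating-point sanity check, not a proof: it evaluates $L\circ g-g\circ T$ and $L\circ h-h\circ D$ at evenly spaced points and reports the largest discrepancy, which should be at the level of rounding error. The function names are illustrative.

```python
import math


def logistic(x):
    return 4.0 * x * (1.0 - x)


def tent(u):
    return 1.0 - abs(1.0 - 2.0 * u)


def doubling(t):
    return (2.0 * t) % 1.0


def g(u):
    """Conjugacy from the tent map to L."""
    return math.sin(math.pi * u / 2.0) ** 2


def h(t):
    """Semiconjugacy from the doubling map to L."""
    return math.sin(math.pi * t) ** 2


def max_conjugacy_errors(samples=1001):
    """Largest |L(g(u)) - g(T(u))| and |L(h(t)) - h(D(t))| over a grid on [0, 1]."""
    grid = [i / (samples - 1) for i in range(samples)]
    err_g = max(abs(logistic(g(u)) - g(tent(u))) for u in grid)
    err_h = max(abs(logistic(h(t)) - h(doubling(t))) for t in grid)
    return err_g, err_h


err_g, err_h = max_conjugacy_errors()
```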
The logistic map shows that chaos is not confined to abstract symbolic models. It appears in a smooth one-dimensional family after stretching and folding become strong enough.
[remark: Recurrence versus Chaos]
Recurrence alone is not chaos: irrational rotations are recurrent and transitive but not sensitive, since they preserve distances. Chaos begins when recurrence is combined with orbit separation and a mechanism that moves small regions through the phase space. Later chapters identify symbolic dynamics, hyperbolicity, entropy, and invariant measures as four ways to make that mechanism precise.
[/remark]
Recurrence shows how orbits can return near where they started, but by itself it does not yet explain the proliferation of orbit segments. The symbolic viewpoint introduced next converts this qualitative behaviour into sequences and transition rules, giving a first precise model for chaos.
# 2. Symbolic Dynamics
These notes develop the symbolic part of a course on chaos and ergodic theory. The guiding aim is to turn qualitative features of complicated dynamical systems into precise statements about sequences, transition graphs, invariant sets, and orbit counts. We assume the basic language of metric spaces, continuity, compactness, iterates, invariant sets, recurrence, transitivity, and mixing from Chapters 0 and 1. This chapter introduces symbolic dynamics: it develops the full shift first, then restricts it by finite transition rules, and finally explains how symbolic models arise from interval maps such as the doubling map.
Symbolic dynamics replaces a complicated orbit by the sequence of symbols recording which region of phase space it visits. The point is not merely notational: once a system is encoded by sequences, recurrence, periodicity, mixing, and entropy become questions about words. This makes symbolic dynamics the combinatorial language in which the horseshoes of Chapter 3, the entropy of Chapter 7, and the Markov partitions of Chapter 11 will be expressed.
## Coding Orbits by Infinite Words
Suppose a map repeatedly moves a point through a finite list of labelled regions. The first question is what kind of space should contain the resulting infinite record, and what dynamical map should advance the record by one time step.
[definition: Full One-Sided Shift]
Let $A$ be a finite set, called the alphabet. The full one-sided shift over $A$ is the space
\begin{align*}
\Sigma_A^+ := A^{\mathbb Z_{\ge 0}} = \{x = (x_0,x_1,x_2,\dots) : x_i \in A \text{ for all } i \ge 0\},
\end{align*}
equipped with the shift map $\sigma: \Sigma_A^+ \to \Sigma_A^+$ defined by
\begin{align*}
(\sigma x)_i = x_{i+1}.
\end{align*}
[/definition]
The one-sided shift records the future of a non-invertible system, so it matches maps where the past is genuinely lost. Many chaotic systems in this course are invertible on their invariant sets, or arise as restrictions of diffeomorphisms, and then past symbols carry dynamical information rather than redundancy. This motivates a two-sided sequence space where the shift has an inverse.
[definition: Full Two-Sided Shift]
Let $A$ be a finite alphabet. The full two-sided shift over $A$ is the space
\begin{align*}
\Sigma_A := A^{\mathbb Z} = \{x = (x_i)_{i \in \mathbb Z} : x_i \in A\},
\end{align*}
equipped with the shift homeomorphism $\sigma: \Sigma_A \to \Sigma_A$ defined by
\begin{align*}
(\sigma x)_i = x_{i+1}.
\end{align*}
[/definition]
The two-sided shift describes a complete symbolic orbit, but in practice we observe only finitely many coordinates. To turn finite observations into topology, we need sets described by prescribing a finite word in a finite coordinate window. These sets are the symbolic analogue of small neighbourhoods in a phase space.
[definition: Cylinder Set]
Let $A$ be a finite alphabet and let $I \subset \mathbb Z$ be a finite interval. For a word $(a_i)_{i \in I} \in A^I$, the corresponding cylinder set in $\Sigma_A$ is
\begin{align*}
[a_i : i \in I] := \{x \in \Sigma_A : x_i = a_i \text{ for all } i \in I\}.
\end{align*}
For the one-sided shift, the same definition is used with $I \subset \mathbb Z_{\ge 0}$.
[/definition]
Cylinder sets formalise finite-time measurements: knowing a point lies in a cylinder means knowing finitely many symbols of its itinerary. A topology that made infinitely many coordinate conditions open would be too fine for dynamics, while a topology that ignored finite coordinate conditions would fail to represent observations. Since every finite observation should define an open condition, and every open set should be a union of such finite observations, the next result identifies the topology generated by cylinders. It also verifies that the shift is a genuine continuous dynamical system.
[quotetheorem:7737]
[citeproof:7737]
The finiteness of the alphabet is essential here: for an infinite discrete alphabet, the product is not compact, because the cylinders $[a]$ with $a\in A$ form an infinite cover by disjoint open sets with no finite subcover, so the same cylinder topology would not give the compact phase spaces used later in recurrence arguments. The theorem does not say that every shift-invariant subset is itself a subshift of finite type; it only supplies the ambient topological space in which such subsets live. This compact topology is what makes later notions such as dense periodic points, mixing, and entropy topological rather than purely combinatorial.
The topology says two sequences are close when they agree for a long block around the origin. The following metric is the standard way to measure that agreement.
[definition: Shift Metric]
Let $A$ be a finite alphabet and fix $0 < \theta < 1$. The shift metric on $\Sigma_A$ is the map
\begin{align*}
d_\theta: \Sigma_A \times \Sigma_A \to \mathbb R
\end{align*}
defined as follows. Set $d_\theta(x,x)=0$. If $x \ne y$, let $N(x,y)$ be the largest integer $N\ge -1$ such that $x_i=y_i$ for all $|i|\le N$, where the condition is vacuous for $N=-1$, and set
\begin{align*}
d_\theta(x,y) := \theta^{N(x,y)+1}.
\end{align*}
For $\Sigma_A^+$, the shift metric is the map $d_\theta:(\Sigma_A^+)\times(\Sigma_A^+)\to\mathbb R$ defined by the same formula, replacing the condition $|i| \le N$ by $0 \le i \le N$.
[/definition]
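This metric makes the shift expanding in the direction of time: applying $\sigma$ shortens agreement in future symbols. For instance, if $x,y\in\Sigma_A^+$ satisfy $x_0=y_0$ and $x\ne y$, then $d_\theta(\sigma x,\sigma y)=\theta^{-1}d_\theta(x,y)$. That behaviour is the symbolic analogue of stretching in smooth chaotic systems.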
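The metric and its expansion under the shift can be illustrated with one-sided sequences stored as Python functions from indices to symbols. Since only finitely many coordinates can be inspected, the sketch below (a hypothetical helper with an assumed search horizon) treats two sequences that agree on coordinates $0,\dots,\text{horizon}$ as equal. Otherwise it returns $\theta^i$, where $i$ is the first disagreement, which is $\theta^{N(x,y)+1}$. It then checks that, for sequences with $x_0=y_0$, shifting multiplies the distance by $\theta^{-1}$.

```python
def shift_metric(x, y, theta, horizon=64):
    """One-sided d_theta(x, y) for sequences given as functions i -> symbol.

    If the first disagreement is at index i, then x and y agree on 0..i-1,
    so N(x, y) = i - 1 and d_theta = theta ** i. Agreement on the whole
    horizon is reported as distance 0 (a finite-precision approximation).
    """
    for i in range(horizon + 1):
        if x(i) != y(i):
            return theta ** i
    return 0.0


def shift(x):
    """(sigma x)_i = x_{i+1}."""
    return lambda i: x(i + 1)


zeros = lambda i: 0
bump = lambda i: 1 if i == 5 else 0       # differs from zeros only at index 5
theta = 0.5
```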
[example: Full One-Sided Two-Shift]
Take $A=\{0,1\}$. A point $x\in\Sigma_A^+$ has the form $x=(x_0,x_1,x_2,\dots)$ with each $x_i\in\{0,1\}$, and the shift deletes the first symbol because $(\sigma x)_i=x_{i+1}$ for every $i\ge 0$.
The cylinder $[1,0,1]$ is
\begin{align*}
[1,0,1]=\{x\in\Sigma_A^+ : x_0=1,\ x_1=0,\ x_2=1\}.
\end{align*}
It is open by the definition of the cylinder topology. Its complement is also open, since a binary sequence fails to begin with $101$ exactly when its first disagreement with $101$ occurs at one of the first three coordinates:
\begin{align*}
\Sigma_A^+\setminus[1,0,1]=[0]\cup[1,1]\cup[1,0,0].
\end{align*}
Thus $[1,0,1]$ is clopen. If $x,y\in[1,0,1]$, then $x_i=y_i$ for $0\le i\le 2$. Hence either $x=y$ and $d_\theta(x,y)=0$, or $N(x,y)\ge 2$, so
\begin{align*}
d_\theta(x,y)=\theta^{N(x,y)+1}\le \theta^3.
\end{align*}
Therefore the diameter of $[1,0,1]$ is at most $\theta^3$.
Periodic points are made by repeating a finite word. For example, let $p=010010010\dots$. Then $p_{i+3}=p_i$ for every $i\ge 0$, so
\begin{align*}
(\sigma^3p)_i=p_{i+3}=p_i.
\end{align*}
Thus $\sigma^3p=p$, meaning $p$ is a period-$3$ point of the full one-sided two-shift.
[/example]
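Repeating a word is the only way to produce a point fixed by $\sigma^n$: the condition $\sigma^np=p$ says $p_{i+n}=p_i$ for all $i$, so $p$ is determined by the word $p_0\cdots p_{n-1}$, and the full shift on $|A|$ symbols has exactly $|A|^n$ such points. The sketch below builds them from words and checks periodicity. Helper names are hypothetical, and infinite sequences are modelled as functions of the index compared on a finite window.

```python
from itertools import product


def periodic_point(word):
    """The one-sided sequence word word word ..., as a function of the index."""
    n = len(word)
    return lambda i: word[i % n]


def shift_power(x, k):
    """(sigma^k x)_i = x_{i+k}."""
    return lambda i: x(i + k)


def agree(x, y, window=60):
    """Compare two sequences on coordinates 0..window-1."""
    return all(x(i) == y(i) for i in range(window))


def fixed_points_of_shift_power(alphabet, n):
    """All p with sigma^n p = p, one for each word of length n."""
    points = [periodic_point(w) for w in product(alphabet, repeat=n)]
    assert all(agree(shift_power(p, n), p) for p in points)
    return points


p = periodic_point((0, 1, 0))     # the point 010010010... from the example
```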
The full two-shift gives the cleanest model of arbitrary binary branching, but a dynamical system usually supplies the binary string rather than starting with one. To connect symbols back to orbits, we need a construction that reads the region visited at each iterate. This is the role of the itinerary map.
[definition: Itinerary Map]
Let $f:X\to X$ be a map on a set $X$, and let $\mathcal P=\{P_a : a \in A\}$ be a finite partition of $X$ indexed by an alphabet $A$. The itinerary map associated to $(f,\mathcal P)$ is the map $\iota:X\to A^{\mathbb Z_{\ge 0}}$ defined by
\begin{align*}
\iota(x)_n = a_n,
\end{align*}
where $a_n$ is the unique symbol in $A$ satisfying $f^n(x)\in P_{a_n}$.
[/definition]
The itinerary map turns the original dynamics into symbolic dynamics when membership in partition elements is unambiguous. The main compatibility question is whether coding first and then shifting the symbol sequence gives the same result as moving the point once under the original map and then coding. Boundary ambiguity is the obstruction, so under a genuine partition the coding should preserve the direction of time even if it loses information about the original point.
[quotetheorem:7738]
[citeproof:7738]
The partition hypothesis is necessary: if a point lies in two labelled sets, its symbol may not be uniquely defined, and the formula can fail to describe a genuine function. The theorem does not say that $\iota$ is injective or surjective; different points can share an itinerary, and some symbolic sequences may never occur. Its role is to preserve time evolution, which is the minimum structure needed before asking whether a symbolic model captures recurrence, periodicity, or entropy of the original system.
This semiconjugacy is the bridge from geometric dynamics to sequences. The next example is the basic model behind binary expansions and already displays the issue of non-unique coding.
[example: Coding The Doubling Map]
Let $T:[0,1)\to[0,1)$ be $T(x)=2x \pmod 1$, and partition $[0,1)$ into $P_0=[0,1/2)$ and $P_1=[1/2,1)$. Write a non-ambiguous binary expansion as
\begin{align*}
x=0.x_0x_1x_2\dots=\sum_{k=0}^{\infty} x_k2^{-(k+1)}, \qquad x_k\in\{0,1\}.
\end{align*}
Multiplying by $2$ gives
\begin{align*}
2x=\sum_{k=0}^{\infty}x_k2^{-k}=x_0+\sum_{k=1}^{\infty}x_k2^{-k}.
\end{align*}
Since $x_0\in\{0,1\}$, reducing modulo $1$ removes exactly the integer part $x_0$, so
\begin{align*}
T(x)=2x \pmod 1=\sum_{k=1}^{\infty}x_k2^{-k}=0.x_1x_2x_3\dots .
\end{align*}
Repeating the same computation $n$ times gives
\begin{align*}
T^n(x)=0.x_nx_{n+1}x_{n+2}\dots .
\end{align*}
This point lies in $P_0$ exactly when $x_n=0$ and lies in $P_1$ exactly when $x_n=1$, so the itinerary satisfies
\begin{align*}
\iota(x)_n=x_n.
\end{align*}
Therefore $\iota(x)=(x_0,x_1,x_2,\dots)$, and for every $n\ge 0$,
\begin{align*}
(\iota(Tx))_n=x_{n+1}=(\sigma(\iota(x)))_n.
\end{align*}
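Thus $\iota\circ T=\sigma\circ\iota$ on points with non-ambiguous binary expansions. Dyadic rationals are the exceptional points: for example, $1/2=0.1000\dots=0.0111\dots$, so the binary expansion is not unique there. The half-open partition acts as the needed convention: since $1/2\in P_1$ and $T(1/2)=0\in P_0$, the itinerary of $1/2$ is $1000\dots$, so the coding always selects the expansion ending in zeros.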
[/example]
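The itinerary in this example can be computed exactly with rational arithmetic. This is a minimal sketch that assumes inputs are `Fraction`s in $[0,1)$; the helper names are hypothetical. It reads the partition element at each iterate, which is how the itinerary map is defined, and never consults a binary expansion directly.

```python
from fractions import Fraction

def doubling(x):
    """T(x) = 2x mod 1 on [0, 1), computed exactly."""
    y = 2 * x
    return y - 1 if y >= 1 else y

def itinerary(x, n):
    """First n symbols of the itinerary for P_0 = [0, 1/2) and P_1 = [1/2, 1)."""
    symbols = []
    for _ in range(n):
        symbols.append(0 if x < Fraction(1, 2) else 1)
        x = doubling(x)
    return symbols
```

For instance, `itinerary(Fraction(1, 2), 5)` gives `[1, 0, 0, 0, 0]`, the expansion of $1/2$ ending in zeros. Comparing `itinerary(doubling(x), n)` with `itinerary(x, n + 1)[1:]` checks $\iota\circ T=\sigma\circ\iota$ on sample points.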
## Finite Transition Rules and Markov Shifts
The full shift allows every word. Many symbolic systems arising from geometry have local constraints: after seeing one symbol, only certain next symbols are allowed. For example, a system may allow $0$ after $1$ but forbid $1$ after $1$, so the full binary shift would include itineraries that cannot occur geometrically. The problem is to describe these constraints in a way that still leaves the dynamics tractable.
[definition: Transition Matrix]
Let $A=\{1,\dots,m\}$ be a finite alphabet. A transition matrix is a matrix $M \in \{0,1\}^{m\times m}$, where $M_{ij}=1$ means that the symbol $j$ is allowed to follow the symbol $i$.
[/definition]
A transition matrix specifies all allowed adjacent pairs, but the dynamical object is not the matrix alone. The next definition is needed to package the finite transition rule as a shift-invariant sequence space. It restricts the full shift to exactly those sequences whose neighbouring symbols are compatible with the matrix.
[definition: Subshift Of Finite Type]
Let $M \in \{0,1\}^{m\times m}$. The one-sided subshift of finite type associated to $M$ is
\begin{align*}
\Sigma_M^+ := \{x \in \{1,\dots,m\}^{\mathbb Z_{\ge 0}} : M_{x_i x_{i+1}}=1 \text{ for all } i\ge 0\}.
\end{align*}
The two-sided subshift of finite type associated to $M$ is
\begin{align*}
\Sigma_M := \{x \in \{1,\dots,m\}^{\mathbb Z} : M_{x_i x_{i+1}}=1 \text{ for all } i\in \mathbb Z\}.
\end{align*}
[/definition]
The definition keeps only sequences obeying the local rule encoded by $M$. Because the next allowed symbols depend only on the present symbol, the system resembles a Markov chain without probabilities. This motivates the standard name used when the emphasis is topological rather than stochastic.
[definition: Topological Markov Chain]
A topological Markov chain is a dynamical system $(\Sigma_M,\sigma)$ or $(\Sigma_M^+,\sigma)$ associated to a finite transition matrix $M \in \{0,1\}^{m\times m}$, where the shift maps are the restrictions $\sigma:\Sigma_M\to\Sigma_M$ and $\sigma:\Sigma_M^+\to\Sigma_M^+$.
[/definition]
A topological Markov chain reduces admissibility to graph paths. To decide whether a word can connect symbol $i$ to symbol $j$, we need to count paths of a prescribed length in the directed graph of $M$. Matrix multiplication is built to perform exactly this count.
[quotetheorem:2062]
The assumption that admissibility is governed by adjacent pairs is essential: for a constraint such as requiring an even number of zeros between successive ones, powers of a single symbol-transition matrix do not by themselves encode the needed memory. The theorem counts paths with specified initial and terminal symbols, not distinct periodic orbits or shift-orbits, so later orbit counts require additional identifications. Its main use is that qualitative properties of the shift can now be read from algebraic properties of $M^n$.
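Assume the quoted theorem is the standard path-counting statement: $(M^n)_{ij}$ is the number of admissible words $x_0x_1\dots x_n$ with $x_0=i$ and $x_n=j$. On that assumption, the statement can be sanity-checked by brute force. The sketch uses 0-based symbols and plain integer lists. It is exponential in $n$ and is only meant for small cases.

```python
from itertools import product

GOLDEN = [[1, 1], [1, 0]]  # symbols 0, 1; the transition 1 -> 1 is forbidden

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_pow(M, n):
    """M^n by repeated multiplication (n >= 0)."""
    m = len(M)
    result = [[int(i == j) for j in range(m)] for i in range(m)]
    for _ in range(n):
        result = mat_mul(result, M)
    return result

def count_words(M, n, i, j):
    """Brute-force count of admissible words x_0 ... x_n with x_0 = i and x_n = j (n >= 1)."""
    count = 0
    for middle in product(range(len(M)), repeat=n - 1):
        word = (i, *middle, j)
        if all(M[a][b] == 1 for a, b in zip(word, word[1:])):
            count += 1
    return count
```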
This theorem explains why powers of $M$ govern the large-scale dynamics of $\Sigma_M$. A two-cycle with transitions $0\to1$ and $1\to0$ shows the next obstruction: every symbol reaches every other eventually, but only at times of the correct parity. To distinguish mere reachability from eventual reachability at every large time, we need two matrix notions. They separate transitivity from mixing by detecting possible periodic obstructions in the transition graph.
[definition: Irreducible And Primitive Matrix]
Let $M$ be a nonnegative $m\times m$ matrix. The matrix $M$ is irreducible if for every $i,j\in\{1,\dots,m\}$ there exists $n\ge 1$ such that $(M^n)_{ij}>0$. The matrix $M$ is primitive if there exists $N\ge 1$ such that $(M^n)_{ij}>0$ for every $i,j\in\{1,\dots,m\}$ and every $n\ge N$.
[/definition]
Irreducibility corresponds to being able to move from any symbol to any other eventually, while primitivity says this can be done at all sufficiently large times. Since topological mixing asks for all large iterates of one open set to meet another, primitivity is the right matrix condition. The next theorem makes this correspondence exact for subshifts of finite type.
[quotetheorem:7739]
[citeproof:7739]
The active-symbol hypothesis cannot be omitted: adding an unused zero row and column to a primitive matrix leaves the actual nonempty shift on the old symbols unchanged, but the enlarged matrix is not primitive. The theorem also does not classify arbitrary subshifts, since systems with longer memory may be mixing without being presented by the given one-step matrix. What it gives is a practical bridge from topological mixing to a finite computation with matrix powers, which is why primitive transition matrices become the central class for orbit counting.
The classification gives a practical test for mixing. The following examples show how a single forbidden word changes the system.
[example: Golden Mean Shift]
Let the symbols be ordered as $0,1$, and let $M_{ij}=1$ exactly when the transition $i\to j$ is allowed. Thus
\begin{align*}
M_{00}=1,\quad M_{01}=1,\quad M_{10}=1,\quad M_{11}=0.
\end{align*}
The condition $M_{x_i x_{i+1}}=1$ therefore permits the adjacent pairs $00$, $01$, and $10$, and excludes exactly the adjacent pair $11$. Hence $\Sigma_M$ is precisely the set of binary sequences with no occurrence of $11$.
We compute the entries of $M^2$ from $(M^2)_{ij}=M_{i0}M_{0j}+M_{i1}M_{1j}$:
\begin{align*}
(M^2)_{00}=M_{00}M_{00}+M_{01}M_{10}=1\cdot 1+1\cdot 1=2.
\end{align*}
\begin{align*}
(M^2)_{01}=M_{00}M_{01}+M_{01}M_{11}=1\cdot 1+1\cdot 0=1.
\end{align*}
\begin{align*}
(M^2)_{10}=M_{10}M_{00}+M_{11}M_{10}=1\cdot 1+0\cdot 1=1.
\end{align*}
\begin{align*}
(M^2)_{11}=M_{10}M_{01}+M_{11}M_{11}=1\cdot 1+0\cdot 0=1.
\end{align*}
Thus every entry of $M^2$ is positive. If every entry of $M^n$ is positive for some $n\ge 2$, then
\begin{align*}
(M^{n+1})_{i0}=(M^n)_{i0}M_{00}+(M^n)_{i1}M_{10}=(M^n)_{i0}+(M^n)_{i1}>0
\end{align*}
and
\begin{align*}
(M^{n+1})_{i1}=(M^n)_{i0}M_{01}+(M^n)_{i1}M_{11}=(M^n)_{i0}>0.
\end{align*}
By induction, every entry of $M^n$ is positive for all $n\ge 2$, so $M$ is primitive. Since both one-symbol cylinders are nonempty, *Mixing Classification For Subshifts Of Finite Type* implies that the golden mean shift is topologically mixing.
[/example]
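The primitivity computation in this example can be automated. This is a minimal sketch that relies on Wielandt's bound: a primitive $m\times m$ matrix satisfies $M^k>0$ for $k=(m-1)^2+1$. It also uses the fact that once some power of a nonnegative matrix is positive, every later power is too, because a positive power rules out zero columns. Irreducibility is tested by accumulating $M,\dots,M^m$. This suffices because a shortest path between two symbols has length at most $m$. The function names are hypothetical.

```python
def bool_mul(A, B):
    """Boolean matrix product: entry is 1 when some path i -> k -> j exists."""
    n = len(A)
    return [[int(any(A[i][k] and B[k][j] for k in range(n))) for j in range(n)] for i in range(n)]

def is_primitive(M):
    """Check whether M^k has only positive entries for k = (m-1)^2 + 1 (Wielandt's bound)."""
    m = len(M)
    k = (m - 1) ** 2 + 1
    P = [[int(bool(x)) for x in row] for row in M]
    for _ in range(k - 1):
        P = bool_mul(P, M)
    return all(all(row) for row in P)

def is_irreducible(M):
    """Check that every entry of M + M^2 + ... + M^m is positive."""
    m = len(M)
    P = [[int(bool(x)) for x in row] for row in M]
    reach = [row[:] for row in P]
    for _ in range(m - 1):
        P = bool_mul(P, M)
        reach = [[int(r or q) for r, q in zip(rr, qr)] for rr, qr in zip(reach, P)]
    return all(all(row) for row in reach)
```

The directed two-cycle `[[0, 1], [1, 0]]` passes the irreducibility test but fails the primitivity test, which matches the parity obstruction described above.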
The golden mean shift shows how finite transition matrices turn a local forbidden pattern into graph dynamics. Not every natural symbolic system has this bounded-memory form. This matters because a one-step transition matrix detects only constraints that can be checked from adjacent symbols, while some symbolic systems require remembering information over an unbounded gap.
[example: Even Shift]
The even shift is the set of binary sequences in which every block of zeros between two successive ones has even length. For example, $1001$ is allowed because the two ones enclose two zeros, and $11$ is allowed because they enclose zero zeros. By contrast, $101$ and $10001$ are not allowed, because the two ones enclose one and three zeros, respectively.
This constraint is not finite-type in the sense of being detected by forbidding words up to some fixed length. Suppose a finite forbidden list existed, and let $L$ be the maximum length of a word in that list. Consider the periodic sequence
\begin{align*}
x=\dots 1\,0^{2L}\,1\,0^{2L}\,1\,0^{2L}\,1\dots .
\end{align*}
Every block of zeros between successive ones in $x$ has length $2L$, so $x$ belongs to the even shift. Now consider
\begin{align*}
y=\dots 1\,0^{2L+1}\,1\,0^{2L+1}\,1\,0^{2L+1}\,1\dots .
\end{align*}
The sequence $y$ does not belong to the even shift, since each block of zeros between successive ones has odd length $2L+1$.
However, every word of length at most $L$ appearing in $y$ also appears in $x$. Successive ones in $y$ are separated by $2L+1$ zeros, so such a word contains at most one $1$. It is therefore either $0^k$ with $k\le L$, or $0^a\,1\,0^b$ with $a+b\le L-1$, and each of these occurs inside the block $0^{2L}\,1\,0^{2L}$ of $x$. No forbidden word of length at most $L$ can appear in $y$, because the same word would then appear in the allowed sequence $x$. Hence $y$ avoids the entire forbidden list, contradicting the assumption that the finite forbidden list detects exactly the even shift.
The system is nevertheless recognized by a finite labelled graph. Use one state $E$ for “an even number of zeros since the last one” and one state $O$ for “an odd number of zeros since the last one.” Put a labelled edge $E \xrightarrow{1} E$, a labelled edge $E \xrightarrow{0} O$, and a labelled edge $O \xrightarrow{0} E$, with no edge labelled $1$ leaving $O$. Reading labels along paths gives exactly the binary sequences in which a $1$ can occur only when the current zero-count parity is even, so the even shift is sofic but not a subshift of finite type.
[/example]
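The labelled graph can be run as a recognizer for finite words. This is a minimal sketch, assuming that the allowed finite words of the even shift are exactly the labels of finite paths in the graph. That assumption is reasonable here because every state has both incoming and outgoing edges. The names `EDGES`, `is_allowed`, and `is_allowed_direct` are hypothetical. The recognizer tracks the set of states reachable after each symbol, which is a subset construction.

```python
from itertools import product

# Labelled graph presenting the even shift: EDGES[state][label] = next state.
# E = even number of zeros since the last 1, O = odd number.
EDGES = {"E": {"1": "E", "0": "O"}, "O": {"0": "E"}}

def is_allowed(word):
    """True if some path in the labelled graph, starting at any state, carries `word`."""
    current = set(EDGES)
    for symbol in word:
        current = {EDGES[s][symbol] for s in current if symbol in EDGES[s]}
    return bool(current)

def is_allowed_direct(word):
    """Direct rule: every zero block between two successive 1s has even length."""
    ones = [i for i, c in enumerate(word) if c == "1"]
    return all((b - a - 1) % 2 == 0 for a, b in zip(ones, ones[1:]))

def presentations_agree(max_len):
    """Compare the graph recognizer with the direct rule on all words up to max_len."""
    return all(
        is_allowed("".join(w)) == is_allowed_direct("".join(w))
        for n in range(max_len + 1)
        for w in product("01", repeat=n)
    )
```

The parity is remembered by the graph state rather than by the last symbol. This is why one-step transition rules on the symbols $0,1$ alone cannot express the constraint.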
## Perron-Frobenius Theory and Periodic Points
Once a subshift is represented by a nonnegative matrix, orbit counting becomes spectral theory. The main question is how the leading eigenvalue controls the growth of words and periodic orbits. The primitivity hypothesis rules out two basic failures: reducible matrices can split into unrelated components, while irreducible but periodic matrices can have several eigenvalues of maximal modulus, as in a directed two-cycle. Perron-Frobenius theory isolates the case where one positive eigenvalue dominates all asymptotic counts.
[quotetheorem:6787]
Primitivity is essential: the matrix of a directed two-cycle has eigenvalues $1$ and $-1$, so no single eigenvalue is strictly larger in modulus than all the others. The theorem does not provide exact word counts at finite time; it gives the dominant asymptotic term and the limiting rank-one structure. For symbolic dynamics, the Perron eigenvalue is the exponential growth rate of admissible words, and its logarithm will later reappear as the topological entropy of the mixing shift of finite type.
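Power iteration is a minimal way to estimate the Perron eigenvalue numerically. The sketch below assumes $M$ is primitive and starts from the all-ones vector; the function name is hypothetical. Primitivity is what makes the iteration converge. For a periodic matrix such as the directed two-cycle, the iterates need not converge for a general starting vector.

```python
def perron_eigenvalue(M, iterations=200):
    """Estimate the Perron eigenvalue of a primitive nonnegative matrix by power iteration.

    v is repeatedly replaced by Mv rescaled so that its largest entry is 1;
    for a primitive matrix v converges to the Perron eigenvector, so the
    rescaling factor max(Mv) converges to the Perron eigenvalue."""
    m = len(M)
    v = [1.0] * m  # positive starting vector
    lam = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(m)) for i in range(m)]
        lam = max(w)
        v = [x / lam for x in w]
    return lam
```

For the golden mean matrix `[[1, 1], [1, 0]]`, the estimate approaches $(1+\sqrt5)/2$. For the full two-shift matrix `[[1, 1], [1, 1]]`, it returns $2$.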
[example: Growth Rate In The Golden Mean Shift]
For the golden mean transition matrix, with symbols ordered as $0,1$, the entries are $M_{00}=1$, $M_{01}=1$, $M_{10}=1$, and $M_{11}=0$. Thus
\begin{align*}
\det(tI-M)=(t-M_{00})(t-M_{11})-M_{01}M_{10}.
\end{align*}
Substituting the four entries gives
\begin{align*}
\det(tI-M)=(t-1)t-1\cdot 1=t^2-t-1.
\end{align*}
The roots of $t^2-t-1$ are obtained from the [quadratic formula](/theorems/1301):
\begin{align*}
t=\frac{1\pm\sqrt{(-1)^2-4(1)(-1)}}{2}=\frac{1\pm\sqrt5}{2}.
\end{align*}
Therefore the positive root, and hence the Perron eigenvalue, is
\begin{align*}
\lambda=\frac{1+\sqrt5}{2}.
\end{align*}
Let $c_n$ be the number of admissible binary words of length $n$. Split this count as $c_n=a_n+b_n$, where $a_n$ counts admissible length-$n$ words ending in $0$ and $b_n$ counts admissible length-$n$ words ending in $1$. A word ending in $0$ may be followed by $0$ or $1$, while a word ending in $1$ may be followed only by $0$, because $11$ is forbidden. Hence
\begin{align*}
a_{n+1}=a_n+b_n=c_n.
\end{align*}
Also,
\begin{align*}
b_{n+1}=a_n.
\end{align*}
Adding these two recurrences gives
\begin{align*}
c_{n+1}=a_{n+1}+b_{n+1}=c_n+a_n.
\end{align*}
Using $a_n=c_{n-1}$, which is the first recurrence with index $n-1$ and holds for $n\ge 2$, we obtain
\begin{align*}
c_{n+1}=c_n+c_{n-1}.
\end{align*}
The initial values are $c_1=2$ and $c_2=3$, so $c_n=F_{n+2}$ for the Fibonacci sequence with $F_0=0$ and $F_1=1$. If $\mu=(1-\sqrt5)/2$, Binet's formula gives
\begin{align*}
c_n=F_{n+2}=\frac{\lambda^{n+2}-\mu^{n+2}}{\sqrt5}.
\end{align*}
Since $|\mu|<1$, the term $\mu^{n+2}/\sqrt5$ tends to $0$, while $\lambda^{n+2}=\lambda^2\lambda^n$. Therefore
\begin{align*}
\frac{c_n}{\lambda^n}\to \frac{\lambda^2}{\sqrt5}.
\end{align*}
Thus the number of length-$n$ admissible words grows like the positive constant $\lambda^2/\sqrt5$ times $\lambda^n$, exactly matching the Fibonacci recursion forced by forbidding consecutive ones.
[/example]
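The counts in this example can be confirmed by brute force for small $n$, and the limit can be checked numerically. This is a minimal sketch with hypothetical helper names. It counts binary words avoiding $11$ directly and compares them with $F_{n+2}$.

```python
from itertools import product
from math import sqrt

def count_golden_words(n):
    """Brute-force count of binary words of length n with no occurrence of 11."""
    return sum(
        1
        for w in product((0, 1), repeat=n)
        if not any(a == 1 and b == 1 for a, b in zip(w, w[1:]))
    )

def fibonacci(k):
    """F_k with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

LAM = (1 + sqrt(5)) / 2

def normalized_count(n):
    """c_n / lambda^n, using c_n = F_{n+2}; it should approach lambda^2 / sqrt(5)."""
    return fibonacci(n + 2) / LAM ** n
```

The limiting constant $\lambda^2/\sqrt5$ is approximately $1.1708$.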
This is also a point of contact with graph theory and information theory. The matrix $M$ is the adjacency matrix of a directed graph, admissible words are directed paths, and $\log \lambda$ measures the exponential information rate of the allowed symbolic language.
Perron-Frobenius describes asymptotic counts, but chaos also asks where concrete recurrent orbits are located. Counting many words does not by itself guarantee that a prescribed finite observation occurs along a periodic orbit; the word might fail to close back to its starting state. For a mixing subshift, the obstruction should disappear because admissible blocks can be bridged back to themselves, placing periodic behaviour inside every finite observational window.
[quotetheorem:7740]
[citeproof:7740]
Primitivity is stronger than needed for density alone, since irreducibility already lets an admissible word be closed by some return path. It is used here because the same hypothesis also supplies mixing and bridges of all sufficiently large lengths; a directed two-cycle shows the distinction, as cylinders can be closed periodically but only with the correct parity of return time. The theorem does not say that periodic points exhaust the space or that their periods are uniformly bounded; it says every finite admissible observation can be matched by a periodic orbit. This is the topological input behind using symbolic models for horseshoes, where periodic orbits in the model correspond to recurrent geometric orbits in the original system.
This result is one reason symbolic models are useful in chaos: periodic behaviour is not isolated from aperiodic behaviour, but distributed throughout the phase space.
[example: Periodic Approximation In The Golden Mean Shift]
In the golden mean shift, admissible binary sequences are those with no adjacent block $11$. Consider the cylinder of points whose coordinates $0,1,2,3$ are $0,1,0,0$:
\begin{align*}
C=\{x\in\Sigma_M:x_0=0,\ x_1=1,\ x_2=0,\ x_3=0\}.
\end{align*}
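Let $p$ be the bi-infinite sequence obtained by repeating the word $0100$ in both directions, aligned so that $p_0p_1p_2p_3=0100$:
\begin{align*}
p=\dots 0100\,0100\,.\,0100\,0100\dots ,
\end{align*}
where the dot separates $p_{-1}$ from $p_0$.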
\end{align*}
The adjacent pairs inside one copy of $0100$ are $01$, $10$, and $00$, and the adjacent pair across the boundary between two copies is again $00$. Thus every adjacent pair in $p$ is one of $01$, $10$, or $00$, so the forbidden pair $11$ never occurs and $p\in\Sigma_M$.
The first four coordinates of $p$ are
\begin{align*}
p_0=0,\quad p_1=1,\quad p_2=0,\quad p_3=0.
\end{align*}
Hence $p\in C$. Also, since the word $0100$ repeats, for every $i\in\mathbb Z$ we have
\begin{align*}
p_{i+4}=p_i.
\end{align*}
Therefore
\begin{align*}
(\sigma^4p)_i=p_{i+4}=p_i.
\end{align*}
So $\sigma^4p=p$, and $p$ is a periodic point lying in the prescribed cylinder. It approximates every point in the golden mean shift with the same four visible coordinates because membership in this cylinder records precisely that finite observation.
[/example]
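The construction in this example closes an admissible word by a return path, and it can be sketched directly. The helpers below are hypothetical. `bridge` uses breadth-first search to find a shortest list of intermediate symbols leading from the last symbol of the word back to its first symbol. `periodic_word` returns one period of the resulting periodic point. For an irreducible matrix, such a bridge always exists.

```python
from collections import deque

GOLDEN = [[1, 1], [1, 0]]  # symbols 0, 1; the pair 11 is forbidden

def bridge(M, start, target):
    """Shortest list b with start -> b[0] -> ... -> b[-1] -> target admissible
    (a path of length >= 1); returns None when no such path exists."""
    if M[start][target]:
        return []
    parent = {}
    queue = deque()
    for s in range(len(M)):
        if M[start][s] and s not in parent:
            parent[s] = None
            queue.append(s)
    while queue:
        s = queue.popleft()
        if M[s][target]:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in range(len(M)):
            if M[s][t] and t not in parent:
                parent[t] = s
                queue.append(t)
    return None

def periodic_word(M, word):
    """One period of a periodic point lying in the cylinder of the nonempty admissible `word`."""
    if any(M[a][b] == 0 for a, b in zip(word, word[1:])):
        raise ValueError("word is not admissible")
    b = bridge(M, word[-1], word[0])
    if b is None:
        raise ValueError("no return path; M is not irreducible")
    return list(word) + b

def is_admissible_cycle(M, w):
    """Check every adjacent pair of w, including the wrap-around pair."""
    return all(M[a][b] for a, b in zip(w, w[1:] + w[:1]))
```

For the word $0100$, no bridge is needed because $00$ is allowed, so the period is the word itself. For the word $1$, the bridge is $0$, giving the period-$2$ point $1010\dots$.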
The chapter has built the symbolic toolkit needed for Chapter 3. Full shifts model unconstrained branching, subshifts of finite type model finite transition rules, and primitive transition matrices give mixing with dense periodic points. Horseshoes will realise these symbolic systems inside smooth dynamics as invariant Cantor sets produced by stretching and folding.
Symbolic dynamics turns complicated orbit structure into combinatorics, but the course also needs geometric systems where those symbols arise naturally. Horseshoes and Smale dynamics provide that bridge by realising shifts inside smooth maps through stretching and folding.
# 3. Horseshoes and Smale Dynamics
These notes develop the part of the course where abstract symbolic dynamics becomes a concrete tool for smooth dynamical systems. Chapter 2 introduced full shifts and subshifts of finite type as topological dynamical systems in their own right; this chapter explains how those shifts are embedded inside planar maps through the geometric mechanism of stretching, folding, and repeated return. The main prerequisites are the symbolic coding of the full shift, basic topology of compact invariant sets, and the stable/unstable directions near a hyperbolic saddle. The central question is: when a map stretches a rectangle across itself in several strips and folds the image back, which orbits never escape, and how can we describe them?
## Stretching and Folding in a Rectangle
The starting point is a local picture rather than a global phase portrait. We look for a compact region $R$ whose image crosses $R$ in separated strips, while the inverse image also crosses $R$ in separated strips. Points that remain in $R$ under all forward and backward iterates are forced to make a discrete choice at each time, and those choices become the symbols in a shift space.
[definition: Horseshoe Rectangle]
Let $R \subset \mathbb R^2$ be a closed topological rectangle with two chosen vertical sides and two chosen horizontal sides. A homeomorphism $F: U \to F(U)$, defined on an open set $U \supset R$, has a two-strip horseshoe structure on $R$ if $F(R) \cap R$ is the union of two disjoint closed vertical strips $V_0,V_1 \subset R$, and $F^{-1}(R) \cap R$ is the union of two disjoint closed horizontal strips $H_0,H_1 \subset R$, with $F(H_i)=V_i$ for $i=0,1$.
[/definition]
The strip structure gives the local geometry, but we still need to identify the set on which the dynamics is actually defined for all times. Most points of $R$ eventually leave the rectangle, so the next object isolates the nonescaping core whose orbits can carry complete symbolic names.
[example: Planar Horseshoe Map]
Let $R=[0,1]^2$, and suppose the model map has two branches $H_0,H_1\subset R$ with $F(H_i)=V_i$, where $H_i$ are horizontal strips and $V_i$ are vertical strips. A point belongs to the invariant set precisely when every iterate stays in the rectangle, so
\begin{align*}
\Lambda=\{x\in R:F^n(x)\in R \text{ for every } n\in\mathbb Z\}.
\end{align*}
For $x\in\Lambda$, define its symbol at time $n$ by $s_n=i$ when $F^n(x)\in H_i$. Thus the orbit of $x$ determines a bi-infinite binary sequence $(s_n)_{n\in\mathbb Z}$.
Conversely, fix a bi-infinite sequence $s=(s_n)_{n\in\mathbb Z}$. For $N\ge 0$, consider the finite cylinder
\begin{align*}
C_N(s)=\{x\in R:F^n(x)\in H_{s_n}\text{ for every }-N\le n\le N\}.
\end{align*}
If $M>N$, then every condition defining $C_N(s)$ also appears in the definition of $C_M(s)$, so
\begin{align*}
C_M(s)\subset C_N(s).
\end{align*}
Each $C_N(s)$ is a compact rectangle. The future conditions $F^n(x)\in H_{s_n}$ for $0\le n\le N$ cut out a closed horizontal strip, and the past conditions for $-N\le n\le -1$ cut out a closed vertical strip. $C_N(s)$ is their intersection inside the compact rectangle $R$. In the affine model, one forward step contracts the horizontal direction by a factor $\mu<1$ and expands the vertical direction by a factor $\lambda>1$; this is how each horizontal strip $H_i$ is stretched across $R$ as the vertical strip $V_i$. Pulling back one future strip condition through a branch therefore divides vertical height by $\lambda$, so after $N$ future strip conditions the vertical height is at most
\begin{align*}
\lambda^{-N}.
\end{align*}
Similarly, each past strip condition pushes a vertical strip forward through a branch and multiplies its horizontal width by $\mu$, so after $N$ past strip conditions the horizontal width is at most
\begin{align*}
\mu^N.
\end{align*}
Since $\lambda^{-N}\to 0$ and $\mu^N\to 0$, the nested rectangles $C_N(s)$ have both horizontal width and vertical height tending to $0$. Their intersection therefore contains exactly one point:
\begin{align*}
\bigcap_{N\ge 0}C_N(s)=\{x_s\}.
\end{align*}
This point satisfies $F^n(x_s)\in H_{s_n}$ for every $n\in\mathbb Z$, so $x_s\in\Lambda$ and its itinerary is exactly $s$.
Thus the nonescaping set is not the whole folded image inside $R$; it is the intersection left after all forward and backward strip cuts. The two Cantor directions come from the repeated binary choices in future and past time, and their transverse intersection selects one point for each bi-infinite binary code.
[/example]
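In an explicit affine model, the cylinder rectangles $C_N(s)$ can be computed exactly. This is a sketch under stated assumptions. We choose hypothetical parameters $h=1/3$ (so $\lambda=1/h=3$) and $\mu=1/4$. We also take orientation-preserving branches $F_0(u,v)=(\mu u,\ v/h)$ on $H_0=[0,1]\times[0,h]$ and $F_1(u,v)=(\mu u+1-\mu,\ (v-(1-h))/h)$ on $H_1=[0,1]\times[1-h,1]$. A folded horseshoe would reverse one branch; that fold is omitted here. Future symbols fix the vertical interval and past symbols fix the horizontal interval.

```python
from fractions import Fraction

# Hypothetical parameters for an affine two-strip model on R = [0,1]^2.
H = Fraction(1, 3)    # height h of H_0 = [0,1]x[0,h] and H_1 = [0,1]x[1-h,1]; lambda = 1/h
MU = Fraction(1, 4)   # horizontal contraction factor, mu < 1/2
A = {0: Fraction(0), 1: 1 - MU}     # left edges of V_0 = [0,mu]x[0,1], V_1 = [1-mu,1]x[0,1]
V_LOW = {0: Fraction(0), 1: 1 - H}  # lower edges of H_0 and H_1

def cylinder(past, future):
    """Return the rectangle C_N(s) as ((u_lo, u_hi), (v_lo, v_hi)).

    past   = [s_{-N}, ..., s_{-1}]  (oldest symbol first)
    future = [s_0, ..., s_N]        (must be nonempty)
    """
    # Future conditions: start from the last strip and pull back through the
    # inverse of the vertical branch map v -> (v - V_LOW[s]) / H.
    v_lo, v_hi = V_LOW[future[-1]], V_LOW[future[-1]] + H
    for s in reversed(future[:-1]):
        v_lo, v_hi = H * v_lo + V_LOW[s], H * v_hi + V_LOW[s]
    # Past conditions: push the full width [0, 1] forward through u -> MU*u + A[s].
    u_lo, u_hi = Fraction(0), Fraction(1)
    for s in past:
        u_lo, u_hi = MU * u_lo + A[s], MU * u_hi + A[s]
    return (u_lo, u_hi), (v_lo, v_hi)
```

With $N$ past symbols and $N+1$ future symbols, the width is $\mu^N$ and the height is $h^{N+1}=\lambda^{-(N+1)}$, which is within the bound $\lambda^{-N}$. Lengthening both words produces nested rectangles.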
The example shows that the visible folded strip is not itself the chaotic set; the chaotic set is the smaller set surviving every forward and backward cut. This motivates the definition of the maximal invariant subset of the rectangle, which is the object later assigned symbolic itineraries.
[definition: Invariant Horseshoe Set]
For a map $F: U \to F(U)$ with a two-strip horseshoe structure on $R$, the invariant horseshoe set is
\begin{align*}
\Lambda = \{x\in R : F^n(x) \text{ is defined and lies in } R \text{ for every } n\in\mathbb Z\}.
\end{align*}
[/definition]
Equivalently, when the local inverse branches are understood, this set is written as $\Lambda=\bigcap_{n\in\mathbb Z}F^n(R)$. The displayed set-theoretic definition is the precise meaning in the local setting: only points with a full orbit defined inside the chosen neighbourhood are included.
The set $\Lambda$ is the part of the rectangle that never escapes, neither in forward time nor in backward time. Since each iterate removes open gaps in one direction, the natural question is whether the surviving set has the Cantor geometry suggested by the planar model.
[quotetheorem:7741]
[citeproof:7741]
This theorem identifies the geometric shape of the invariant set and also explains why the hyperbolic estimates are not cosmetic. Without uniform contraction, the nested stable strips need not shrink to Cantor fibres; without uniform expansion, distinct forward itineraries may fail to separate. If a strip merely touches another strip along a tangency, the survivor set can contain arcs or boundary identifications rather than a clean totally disconnected product structure. The conclusion is therefore local to the invariant set and depends on the stated Markov and cone assumptions; it does not describe the whole rectangle $R$. The next question is dynamical: how does $F$ act on this Cantor-like set, and why should its action match the shift map studied earlier?
## Symbolic Coding of the Horseshoe
To turn geometry into sequences, we assign a symbol at each time according to which strip contains the orbit. The possible obstruction is ambiguity on strip boundaries, so the cleanest statement assumes the strips form a Markov partition whose overlaps are controlled by their edges.
[definition: Itinerary Map]
Let $F$ have a two-strip horseshoe structure on $R$ with invariant set $\Lambda$. The itinerary map is the map
\begin{align*}
\pi: \Lambda \to \{0,1\}^{\mathbb Z}
\end{align*}
defined by $\pi(x)=(s_n)_{n \in \mathbb Z}$, where $s_n=i$ if $F^n(x) \in H_i$.
[/definition]
The itinerary map records the complete past and future of a point relative to the two branches. The remaining issue is whether this recording loses information or misses possible sequences; resolving that issue is exactly the symbolic coding theorem.
[quotetheorem:7742]
[citeproof:7742]
This result explains why the symbolic model is not just an analogy, and the theorem is deliberately stronger than a mere coding map. The branch diffeomorphism and cone hypotheses ensure that each cylinder rectangle shrinks to one point, so two different points cannot carry the same bi-infinite itinerary. The boundary condition prevents the same orbit from being named in two incompatible ways. If a boundary orbit is shared, or if a tangency prevents cylinder diameters from shrinking, the same construction may give only a semiconjugacy or a symbolic factor, not a conjugacy. Thus the theorem describes the dynamics on $\Lambda$ under clean Markov hyperbolicity assumptions; it does not assert that nearby points outside $\Lambda$ follow the shift.
[remark: Conjugacy Versus Semiconjugacy]
A conjugacy is a homeomorphism $h:X\to Y$ satisfying $h\circ f=g\circ h$, so it preserves the full topological dynamics. A semiconjugacy is a continuous surjection $h:X\to Y$ satisfying the same intertwining relation, but it may identify different points of $X$. In nonideal horseshoes, boundary identifications or tangencies can produce only a semiconjugacy onto a symbolic shift.
[/remark]
This distinction matters when a return map is obtained from a physical model or a flow. The symbolic dynamics may still be present as a factor even when the return map fails to give a one-to-one symbolic code.
[example: Boundary Ambiguity]
Let $R_0$ and $R_1$ be two Markov rectangles with a common boundary arc $B=R_0\cap R_1$, and suppose $F^k(x)\in B$ for some integer $k$. If the itinerary is defined by closed membership, then $F^k(x)\in R_0$ and $F^k(x)\in R_1$, so the $k$th symbol can be either $0$ or $1$. Keeping all other symbols fixed, this gives two possible symbolic names:
\begin{align*}
s_n=t_n \text{ for } n\ne k,\qquad s_k=0,\qquad t_k=1.
\end{align*}
The two sequences are distinct because their $k$th entries differ, but they describe the same orbit point $x$.
One can force a single-valued coding by choosing a convention, for example assigning $B$ to $R_0$ and replacing the rectangles by $R_0'=R_0$ and $R_1'=R_1\setminus B$. Then every point has at most one symbol at time $k$, so the itinerary map is well-defined. The price is continuity. Points $y$ near $x$ with $F^k(y)\in R_1\setminus B$ receive the symbol $1$ at time $k$, while $x$ receives $0$, so the coding jumps across $B$. Thus the ambiguity is not caused by a missing orbit, but by the shared boundary itself. The map from symbolic names to points fails to be injective exactly for points whose forward or backward orbit lands on a shared Markov boundary.
[/example]
The boundary example shows how coding can fail to be injective in degenerate cases, while the uniformly hyperbolic horseshoe avoids this problem. With genuine conjugacy available, periodic symbolic sequences should now produce actual periodic orbits of the smooth map.
[quotetheorem:7743]
[citeproof:7743]
The result uses the full force of conjugacy. Injectivity is what turns a periodic symbolic name into a unique periodic point rather than a whole fibre of points with the same code, and surjectivity is what ensures that every symbolic periodic word is realised by the smooth map. Under only a semiconjugacy, a periodic sequence must have at least one invariant fibre, but the fibre may contain several points or even continua, and periodic symbolic data alone need not classify periodic orbits. The theorem also does not say that there are only countably many points in $\Lambda$; the invariant set is uncountable. Periodic points are countable but dense, while nonperiodic symbolic sequences give the rest of the Cantor set. This is the first dynamical payoff of the strip criterion, and it motivates the Smale theorem below: once the geometry gives a genuine conjugacy, the full list of symbolic periodic orbits becomes a family of actual periodic orbits for the original map.
## The Smale Horseshoe Theorem
The strip construction is useful because it gives a criterion for chaos from geometry. Instead of solving every orbit, we check that some iterate stretches a rectangle across itself in the right way.
[quotetheorem:7744]
[citeproof:7744]
The theorem is the bridge from smooth dynamics to symbolic dynamics, and each geometric hypothesis rules out a specific false positive. If the image crosses $R$ in only one strip, the full two-shift has no source of two-symbol branching. If the crossing strips overlap without a Markov separation, itineraries may be ambiguous. If expansion is absent, cylinder sets may not shrink in the unstable direction; if contraction is absent, points with the same future can fail to form stable Cantor fibres. The diffeomorphic branch assumption prevents folding inside a single branch from creating critical identifications not represented by the shift. Once these conditions are verified, the map contains a subsystem with sensitive dependence, dense periodic points, and positive topological entropy inherited from the full shift.
[example: A Linear-Affine Horseshoe Branch Model]
Take two horizontal strips
\begin{align*}
H_0=[0,1]\times[0,h]
\end{align*}
and
\begin{align*}
H_1=[0,1]\times[1-h,1],
\end{align*}
where $0<h<1/2$. Use coordinates $(u,v)$, with $u$ horizontal and $v$ vertical, and define an affine branch on each $H_i$ by
\begin{align*}
F_i(u,v)=(\mu u+a_i,\lambda v+b_i),
\end{align*}
where $\lambda=1/h>2$, $0<\mu<1/2$, and the constants are $a_0=0$, $a_1=1-\mu$, $b_0=0$, $b_1=1-\lambda$. Then $F_i(H_i)=V_i$ with $V_0=[0,\mu]\times[0,1]$ and $V_1=[1-\mu,1]\times[0,1]$, two disjoint vertical rectangles in $[0,1]^2$.
For two points $(u_1,v_1),(u_2,v_2)\in H_i$ with the same horizontal coordinate, the vertical separation before applying $F_i$ is $|v_1-v_2|$. Their images have vertical-coordinate difference
\begin{align*}
|(\lambda v_1+b_i)-(\lambda v_2+b_i)|=\lambda |v_1-v_2|.
\end{align*}
Thus the branch stretches the vertical direction by the factor $\lambda$. For two points with the same vertical coordinate, the horizontal separation before applying $F_i$ is $|u_1-u_2|$, and their images have horizontal-coordinate difference
\begin{align*}
|(\mu u_1+a_i)-(\mu u_2+a_i)|=\mu |u_1-u_2|.
\end{align*}
Thus the branch contracts the horizontal direction by the factor $\mu$.
Along an orbit segment that remains in the two affine branches for $n$ steps, the expanding factor is the product of the $n$ identical one-step factors:
\begin{align*}
\lambda\cdot \lambda\cdots \lambda=\lambda^n.
\end{align*}
The contracting factor is similarly
\begin{align*}
\mu\cdot \mu\cdots \mu=\mu^n.
\end{align*}
Because $\lambda>1$, the unstable size grows like $\lambda^n$; because $0<\mu<1$, the stable size shrinks like $\mu^n$.
After smoothing near the gaps and extending the map outside the strips without changing these branch estimates on the invariant core, the two images form separated vertical crossings and the two preimage strips form separated horizontal crossings. Hence the local model satisfies the strip-crossing and cone hypotheses of the *[Smale Horseshoe Theorem](/theorems/7744)*. Its invariant set is coded by the full two-shift, with one expanding direction and one contracting direction on the affine branches.
[/example]
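Periodic symbolic sequences can be turned into actual periodic points in an affine model, illustrating the periodic-orbit correspondence for a genuine conjugacy. This is a sketch under stated assumptions. We use hypothetical parameters $h=1/3$ and $\mu=1/4$ and the orientation-preserving branches $F_i(u,v)=(\mu u+a_i,\ (v-v_i)/h)$, where $a_0=0$, $a_1=1-\mu$ and $v_0=0$, $v_1=1-h$ are the lower edges of $H_0$, $H_1$. Composing the branches along one period gives coordinatewise affine maps $y\mapsto ay+b$. Their unique fixed point is the periodic point with the prescribed itinerary.

```python
from fractions import Fraction

# Hypothetical affine model: F_i(u, v) = (MU*u + A[i], (v - V_LOW[i]) / H) on H_i.
H = Fraction(1, 3)
MU = Fraction(1, 4)
A = {0: Fraction(0), 1: 1 - MU}
V_LOW = {0: Fraction(0), 1: 1 - H}

def step(point, s):
    """Apply the branch F_s to a point of H_s."""
    u, v = point
    return MU * u + A[s], (v - V_LOW[s]) / H

def in_strip(point, s):
    """Check membership in H_s = [0,1] x [V_LOW[s], V_LOW[s] + H]."""
    u, v = point
    return 0 <= u <= 1 and V_LOW[s] <= v <= V_LOW[s] + H

def periodic_point(word):
    """Fixed point of the composed branches along one period of `word`."""
    au, bu = Fraction(1), Fraction(0)  # u -> au*u + bu
    av, bv = Fraction(1), Fraction(0)  # v -> av*v + bv
    for s in word:
        au, bu = MU * au, MU * bu + A[s]
        av, bv = av / H, (bv - V_LOW[s]) / H
    # au = MU^p < 1 and av = H^(-p) > 1, so both fixed-point equations are solvable.
    return bu / (1 - au), bv / (1 - av)

def follows_itinerary(word):
    """Check that the orbit visits H_{word[n]} at time n and returns after one period."""
    x = periodic_point(word)
    y = x
    for s in word:
        if not in_strip(y, s):
            return False
        y = step(y, s)
    return y == x
```

For example, the word $01$ gives the period-$2$ point $(4/5,1/4)$. Exact rational arithmetic makes the return check an equality rather than a tolerance test.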
The theorem also explains why horseshoes are robust under small perturbations. The crossing, expansion, and contraction conditions are open conditions in the $C^1$ topology, so a nearby diffeomorphism still contains a conjugate symbolic subsystem, possibly on a nearby invariant Cantor set.
[remark: Hyperbolicity of the Horseshoe]
On the invariant set $\Lambda$, tangent vectors split into stable and unstable cone directions. Vectors in the stable cone contract under forward iteration, while vectors in the unstable cone contract under backward iteration. This is the prototype of a uniformly hyperbolic invariant set.
[/remark]
The local horseshoe picture often appears inside a larger phase space through a return map. A continuous-time flow can contain a horseshoe when a suitable Poincare section has a return map satisfying the strip-crossing hypotheses.
## Homoclinic Intersections and Return Maps
Horseshoes are not restricted to hand-built maps. In smooth systems, they often arise near a transverse homoclinic point, where the stable and unstable manifolds of a saddle fixed point intersect away from the saddle.
[definition: Homoclinic Point]
Let $F:M\to M$ be a diffeomorphism of a smooth surface and let $p\in M$ be a hyperbolic saddle fixed point. A point $q\ne p$ is a homoclinic point for $p$ if
\begin{align*}
q \in W^s(p) \cap W^u(p).
\end{align*}
[/definition]
A homoclinic point means that an orbit leaves the saddle along its unstable manifold and returns to it along its stable manifold. To decide whether this return creates a robust crossing rather than a fragile tangency, we need the stronger transverse version.
[definition: Transverse Homoclinic Point]
With $F$, $M$, $p$, and $q$ as above, the homoclinic point $q$ is transverse if
\begin{align*}
T_qW^s(p)+T_qW^u(p)=T_qM.
\end{align*}
[/definition]
The transversality condition turns a single return into a persistent crossing structure. The problem is that one homoclinic return is only a single orbit unless nearby strips are forced to cross repeatedly. Transversality supplies the geometric leverage: stable and unstable arcs cut across one another in a way that survives iteration and can create the rectangular strip dynamics of a horseshoe after a suitable return time.
[quotetheorem:7745]
[proofunderconstruction:7745]
This result is often called the Birkhoff-Smale mechanism. The word transverse is essential: if the stable and unstable manifolds merely touch, the return can have a quadratic homoclinic tangency instead of a crossing. In a one-parameter unfolding, such a tangency may disappear, persist as a single tangent contact, or split into transverse intersections only on selected parameter ranges, so the symbolic subsystem is not forced by the local geometry at the tangency parameter. Transversality gives a persistent crossing under small $C^1$ perturbations and lets the inclination mechanism convert one homoclinic return into many strip crossings. The conclusion is still local and invariant-set based; it does not imply that the entire surface is chaotic, nor that every orbit near the saddle is trapped in the horseshoe.
[example: Transverse Homoclinic Point for a Saddle]
Consider an area-preserving diffeomorphism of the plane with a hyperbolic saddle fixed point $p$, and suppose a branch of $W^u(p)$ meets a branch of $W^s(p)$ at a point $q\ne p$ with transverse tangent directions. Since $q\in W^s(p)$, the forward iterates satisfy $F^n(q)\to p$ as $n\to\infty$; since $q\in W^u(p)$, the backward iterates satisfy $F^n(q)\to p$ as $n\to-\infty$. Thus the orbit of $q$ leaves a neighbourhood of $p$ along the unstable manifold and later returns along the stable manifold.
The transverse crossing is the geometric input in the *[Horseshoe from a Transverse Homoclinic Point](/theorems/7745)*: for some integer $N>0$, the return map $F^N$ has a horseshoe on a compact invariant set near the homoclinic tangle. Concretely, one can choose a rectangle whose sides lie along short stable and unstable arcs; under the return map, two subrectangles cross it in separated strips, so the strip-crossing hypotheses of the *Smale Horseshoe Theorem* hold for $F^N$. The resulting invariant set is coded by a full shift on finitely many symbols.
If a symbolic sequence has period $m$ under the shift, the coding gives a point $x$ with $F^{Nm}(x)=x$. Hence periodic symbolic words give periodic orbits of the original map, and by choosing the return rectangle closer to the homoclinic orbit, these periodic orbits accumulate near the original homoclinic tangle.
[/example]
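On the symbolic side, the periodic orbits can be counted directly. A sequence fixed by $\sigma^m$ in the full shift on $k$ symbols is determined by one block of length $m$, so there are exactly $k^m$ of them; through a conjugacy, each gives a point with $F^{Nm}(x)=x$, and least periods are preserved. The sketch below takes $k=2$ for illustration; the function names are hypothetical.

```python
from itertools import product


def periodic_blocks(k, m):
    """Period blocks of all sequences in the full k-shift fixed by sigma**m."""
    return list(product(range(k), repeat=m))


def shift(block):
    """Left shift of the periodic sequence generated by block."""
    return block[1:] + block[:1]


def least_period(block):
    """Smallest p >= 1 with sigma**p fixing the periodic sequence generated by block."""
    w = block
    for p in range(1, len(block) + 1):
        w = shift(w)
        if w == block:
            return p
    return len(block)  # not reached: sigma**len(block) always fixes the sequence


m = 6
fixed = periodic_blocks(2, m)
primitive = [w for w in fixed if least_period(w) == m]
# points fixed by sigma**m, points of least period m, and distinct orbits of length m
print(len(fixed), len(primitive), len(primitive) // m)
```

Blocks of least period exactly $m$ group into orbits of $m$ points each, so dividing by $m$ counts distinct periodic orbits of $F^N$ on the invariant set.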
The planar picture extends to forced systems through Poincare maps. A periodically forced differential equation has a natural return map after one forcing period, and a horseshoe for that map gives chaotic dynamics for the original flow sampled once per period.
[example: Forced Damped Pendulum Return Map]
Consider the periodically forced damped pendulum
\begin{align*}
\ddot{\theta}+\gamma\dot{\theta}+\sin\theta=A\cos(\omega t),
\end{align*}
where $\gamma>0$, $A>0$, and $\omega>0$. Put $v=\dot{\theta}$. Then the equation is the first-order periodic system on the cylinder $S^1\times\mathbb R$:
\begin{align*}
\dot{\theta}=v,\qquad \dot v=-\gamma v-\sin\theta+A\cos(\omega t).
\end{align*}
The forcing period is $T=2\pi/\omega$, since
\begin{align*}
\cos(\omega(t+T))=\cos(\omega t+\omega\cdot 2\pi/\omega)=\cos(\omega t+2\pi)=\cos(\omega t).
\end{align*}
Thus the time-$T$ Poincare map sends an initial state to its state one forcing period later:
\begin{align*}
P(\theta_0,v_0)=(\theta(T),v(T)),
\end{align*}
where $(\theta(t),v(t))$ solves the displayed first-order system with $(\theta(0),v(0))=(\theta_0,v_0)$.
Assume that, for a chosen parameter regime, this Poincare map has a saddle fixed point $p$ and a transverse homoclinic point $q\ne p$, so
\begin{align*}
q\in W^s(p)\cap W^u(p)
\end{align*}
and
\begin{align*}
T_qW^s(p)+T_qW^u(p)=T_q(S^1\times\mathbb R).
\end{align*}
By the *Horseshoe from a Transverse Homoclinic Point*, some iterate $P^N$ has a horseshoe on a compact invariant set near the homoclinic tangle. By the *Smale Horseshoe Theorem*, the return dynamics on that invariant set contains a subsystem conjugate to a full shift on finitely many symbols.
A symbolic itinerary therefore records which strip the pendulum state visits after each block of $N$ forcing periods. Different symbol sequences encode different long-term patterns of rotations and oscillations; periodic symbolic sequences give periodic motions of the original forced pendulum sampled by the Poincare map.
[/example]
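The time-$T$ map $P$ can be approximated numerically by integrating the first-order system over one forcing period. The sketch below uses a classical fourth-order Runge-Kutta scheme with a fixed step count; the step count and all parameter values are assumptions, and the code does not check the homoclinic hypothesis, which the example assumes rather than derives.

```python
import math


def pendulum_field(t, state, gamma, amp, omega):
    """Right-hand side of theta' = v, v' = -gamma*v - sin(theta) + amp*cos(omega*t)."""
    theta, v = state
    return (v, -gamma * v - math.sin(theta) + amp * math.cos(omega * t))


def rk4_step(t, state, h, *params):
    """One classical fourth-order Runge-Kutta step of size h."""
    def add(s, k, c):
        return (s[0] + c * k[0], s[1] + c * k[1])
    k1 = pendulum_field(t, state, *params)
    k2 = pendulum_field(t + h / 2, add(state, k1, h / 2), *params)
    k3 = pendulum_field(t + h / 2, add(state, k2, h / 2), *params)
    k4 = pendulum_field(t + h, add(state, k3, h), *params)
    theta = state[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    v = state[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return (theta, v)


def poincare_map(theta0, v0, gamma, amp, omega, steps=400):
    """Approximate P(theta0, v0) = (theta(T), v(T)) with T = 2*pi/omega."""
    T = 2 * math.pi / omega
    h = T / steps
    state, t = (theta0, v0), 0.0
    for _ in range(steps):
        state = rk4_step(t, state, h, gamma, amp, omega)
        t += h
    theta, v = state
    return (math.remainder(theta, 2 * math.pi), v)  # reduce theta to the circle


print(poincare_map(0.3, 0.1, 0.2, 1.2, 2.0 / 3.0))
```

Because the vector field is $T$-periodic in $t$, restarting the clock at $t=0$ for each application is legitimate, so iterating `poincare_map` produces an orbit of $P$.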
## Dynamical Consequences of Horseshoes
The final local problem is to identify which chaotic properties survive when we pass from the symbolic model back to the smooth map. The answer depends on the strength of the coding: a conjugacy transports properties of the full shift to the invariant set itself, while a semiconjugacy may preserve only factor-level information. Under the Smale hypotheses, the geometric construction gives the subsystem and the shift model supplies the dynamical conclusions.
[quotetheorem:7746]
[citeproof:7746]
This theorem packages the reason horseshoes are a standard model for chaos, but its hypotheses matter because the proof transports properties through conjugacy. Without a genuine conjugacy, transitivity and dense periodic points may survive only in a factor, while the original invariant set can have extra fibres or boundary identifications. Without the full two-strip Markov structure, the entropy may drop to that of a proper subshift, and forbidden transitions can destroy transitivity. The conclusion also concerns $F|_\Lambda$, not the ambient map on all of $U$; attractors, elliptic islands, or escaping regions outside $\Lambda$ are compatible with the theorem. On the horseshoe itself, however, the dynamics is deterministic and uniformly hyperbolic while containing all binary symbolic histories.
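The entropy drop can be seen by counting words, using the standard fact (developed in the chapter on topological entropy) that the entropy of a subshift is the exponential growth rate of the number of allowed words of length $n$. The sketch compares the full two-shift with a hypothetical subshift forbidding the block `11`, chosen only to illustrate a forbidden transition.

```python
import math
from itertools import product


def count_words(n, forbidden=()):
    """Number of binary words of length n that contain none of the forbidden blocks."""
    total = 0
    for letters in product("01", repeat=n):
        word = "".join(letters)
        if not any(block in word for block in forbidden):
            total += 1
    return total


for n in (4, 8, 12):
    full = count_words(n)                       # full two-shift: 2**n words
    golden = count_words(n, forbidden=("11",))  # hypothetical subshift forbidding "11"
    print(n, math.log(full) / n, math.log(golden) / n)

print(math.log(2), math.log((1 + math.sqrt(5)) / 2))  # the two limiting growth rates
```

The restricted counts grow like powers of the golden ratio, so the growth rate stays strictly below $\log 2$.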
[remark: The Horseshoe as a Subsystem]
The existence of a horseshoe does not mean that every point in the ambient phase space behaves chaotically. It means that the system contains a compact invariant subset on which the dynamics is conjugate or semiconjugate to symbolic dynamics. Outside that subset, the same map may have attracting periodic orbits, wandering regions, or regular invariant curves.
[/remark]
The chapter therefore completes the passage from the abstract shift spaces of Chapter 2 to smooth dynamics: horseshoes are the geometric devices that implant shifts inside maps. They are, however, only the first manifestation of a broader geometric mechanism. Chapter 4 abstracts their stretching and contracting directions into hyperbolic sets and stable manifolds, which later support a systematic theory of orbit structure, and Chapter 5 returns to transverse homoclinic intersections as a systematic source of horseshoes.
# 4. Hyperbolic Sets and Stable Manifolds
Hyperbolicity is the geometric mechanism behind the horseshoes of the previous chapter. Instead of studying a single fixed point or an explicitly drawn rectangle, we now isolate invariant sets on which every tangent vector is forced into either an exponentially contracting direction or an exponentially expanding direction. This chapter explains how that splitting produces stable and unstable manifolds, local product structure, and topological models of the dynamics near the hyperbolic set.
The guiding question is: when does nonlinear dynamics near an invariant set behave, at small scales, like its linearization? For hyperbolic fixed points the answer is given by the [Stable Manifold Theorem](/theorems/2778) and the Hartman-Grobman Theorem. For compact hyperbolic sets the same ideas persist, but the stable and unstable directions vary over the set and the resulting invariant manifolds form families rather than single submanifolds.
## Uniform Hyperbolicity on Invariant Sets
The first obstruction to using linearization on an invariant set is that there may be no uniform separation between contracting and expanding behaviour. A point may look hyperbolic for a few iterates and then spend a long time near a neutral direction. Uniform hyperbolicity rules this out by imposing exponential estimates that hold at every point of the invariant set and for every iterate.
[definition: Hyperbolic Set for a Diffeomorphism]
Let $M$ be a smooth Riemannian manifold, let $f: M \to M$ be a $C^1$ diffeomorphism, and let $\Lambda \subset M$ be a compact $f$-invariant set. The set $\Lambda$ is a hyperbolic set for $f$ if there exist continuous subbundles $E^s, E^u \subset TM|_\Lambda$, constants $C \ge 1$ and $0 < \lambda < 1$, such that for every $x \in \Lambda$,
\begin{align*}
T_xM = E^s_x \oplus E^u_x.
\end{align*}
The splitting is invariant:
\begin{align*}
Df_x(E^s_x) = E^s_{f(x)}, \qquad Df_x(E^u_x) = E^u_{f(x)}.
\end{align*}
For all $n \ge 0$,
\begin{align*}
|Df_x^n v| \le C\lambda^n |v| \quad \text{for all } v \in E^s_x,
\end{align*}
and
\begin{align*}
|Df_x^{-n} w| \le C\lambda^n |w| \quad \text{for all } w \in E^u_x.
\end{align*}
[/definition]
The stable bundle records directions that contract in forward time, while the unstable bundle records directions that contract in backward time. The constants $C$ and $\lambda$ are part of the definition because the same exponential rate must work over all of $\Lambda$.
[example: Hyperbolic Linear Automorphism]
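Let $A\in SL(2,\mathbb Z)$ be hyperbolic, with real eigenvalues $\lambda_s,\lambda_u$ satisfying $|\lambda_s|<1<|\lambda_u|$ (for instance $A=\begin{pmatrix}2&1\\1&1\end{pmatrix}$, whose eigenvalues are $(3\pm\sqrt5)/2$), and let $L^s,L^u\subset \mathbb R^2$ be the eigenspaces of $A$ for $\lambda_s$ and $\lambda_u$. Since $A\in SL(2,\mathbb Z)$, both $A$ and $A^{-1}$ preserve $\mathbb Z^2$, so $f_A([x])=[Ax]$ is a diffeomorphism of $\mathbb T^2=\mathbb R^2/\mathbb Z^2$. Identifying every tangent space of $\mathbb T^2$ with $\mathbb R^2$, set $E^s_{[x]}=L^s$ and $E^u_{[x]}=L^u$ for every $[x]\in\mathbb T^2$. Because the eigenvalues are distinct, $L^s\cap L^u=\{0\}$ and therefore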
\begin{align*}T_{[x]}\mathbb T^2=L^s\oplus L^u=E^s_{[x]}\oplus E^u_{[x]}.\end{align*}
The derivative of $f_A$ is the same [linear map](/page/Linear%20Map) $A$ at every point, so
\begin{align*}Df_A(E^s_{[x]})=A L^s=L^s=E^s_{f_A([x])}.\end{align*}
Similarly,
\begin{align*}Df_A(E^u_{[x]})=A L^u=L^u=E^u_{f_A([x])}.\end{align*}
With the flat Euclidean metric, if $v\in E^s_{[x]}=L^s$, then $A^n v=\lambda_s^n v$, hence
\begin{align*}|Df_A^n v|=|A^n v|=|\lambda_s|^n |v|.\end{align*}
If $w\in E^u_{[x]}=L^u$, then $A^{-n}w=\lambda_u^{-n}w$, hence
\begin{align*}|Df_A^{-n} w|=|A^{-n}w|=|\lambda_u|^{-n}|w|.\end{align*}
Choose
\begin{align*}\lambda=\max\{|\lambda_s|,|\lambda_u|^{-1}\}.\end{align*}
Then $0<\lambda<1$, and the two estimates above give the hyperbolicity inequalities with $C=1$ for the flat metric. For any other Riemannian metric on the compact torus, there are constants $m,M>0$ such that
\begin{align*}m|\xi|_{\mathrm{flat}}\le |\xi|_g\le M|\xi|_{\mathrm{flat}}\end{align*}
for every tangent vector $\xi$. Therefore, for $v\in E^s_{[x]}$,
\begin{align*}|Df_A^n v|_g\le M|Df_A^n v|_{\mathrm{flat}}\le M\lambda^n |v|_{\mathrm{flat}}\le \frac{M}{m}\lambda^n |v|_g.\end{align*}
The same calculation with $Df_A^{-n}$ on $E^u$ gives the unstable estimate. Thus the whole torus is a hyperbolic set for $f_A$, with stable and unstable bundles given by the constant eigendirections of $A$.
[/example]
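The hyperbolicity constants can be computed for a concrete matrix. This sketch assumes the illustrative choice $A=\begin{pmatrix}2&1\\1&1\end{pmatrix}$ (any hyperbolic matrix in $SL(2,\mathbb Z)$ with nonzero upper-right entry would do) and checks, in the flat metric, that $|A^n v|=|\lambda_s|^n|v|$ on $L^s$ and that $\lambda=\max\{|\lambda_s|,|\lambda_u|^{-1}\}<1$ with $C=1$.

```python
import math

A = ((2, 1), (1, 1))   # illustrative hyperbolic matrix in SL(2, Z)


def apply(M, v):
    """Matrix-vector product for a 2x2 matrix stored as nested tuples."""
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])


def norm(v):
    """Flat Euclidean length."""
    return math.hypot(v[0], v[1])


tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # equals 1 in SL(2, Z)
disc = math.sqrt(tr * tr - 4 * det)           # real because |trace| > 2
lam_u, lam_s = (tr + disc) / 2, (tr - disc) / 2


def eigvec(mu):
    """Solve the first row (a - mu) x + b y = 0 of (A - mu I) v = 0 (needs b != 0)."""
    return (A[0][1], mu - A[0][0])


e_s, e_u = eigvec(lam_s), eigvec(lam_u)
rate = max(abs(lam_s), 1 / abs(lam_u))        # the constant lambda, with C = 1


def growth(v, n):
    """Ratio |A^n v| / |v| in the flat metric."""
    w = v
    for _ in range(n):
        w = apply(A, w)
    return norm(w) / norm(v)


print(rate, growth(e_s, 8), abs(lam_s) ** 8)
```

Since $\det A=1$, here $|\lambda_s|=|\lambda_u|^{-1}$, so the stable and unstable rates coincide. Rounding pushes $e_s$ slightly towards $L^u$, which is why the check stops at a moderate $n$.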
This example motivates a separate name for the case where hyperbolicity is not confined to a small invariant subset. When the entire compact phase space carries stable and unstable directions, hyperbolicity becomes a global structure rather than a local feature of a horseshoe or fixed point.
[definition: Anosov Diffeomorphism]
Let $M$ be a compact smooth Riemannian manifold. A $C^1$ diffeomorphism $f: M \to M$ is an Anosov diffeomorphism if $M$ is a hyperbolic set for $f$.
[/definition]
Anosov systems are globally hyperbolic; horseshoes are locally hyperbolic. The definitions are the same, but the geometry differs because an Anosov diffeomorphism supplies stable and unstable directions at every point of the manifold.
[remark: Choice of Riemannian Metric]
The definition of a hyperbolic set does not depend on the particular Riemannian metric on a compact manifold. Any two Riemannian metrics on a compact set give equivalent norms, so the constants $C$ and $\lambda$ may change but the existence of uniform exponential contraction and expansion does not.
[/remark]
Continuous time raises one further obstruction: a flow always has a neutral direction along the orbit itself. A flow cannot contract or expand the vector tangent to its own trajectory in the same way that it contracts stable directions, so the discrete-time splitting must be modified. The right definition separates this unavoidable flow direction from the stable and unstable bundles and imposes exponential estimates only on the transverse directions.
[definition: Hyperbolic Set for a Flow]
Let $M$ be a smooth Riemannian manifold, let $\varphi_t: M \to M$ be a $C^1$ flow, and let $\Lambda \subset M$ be a compact invariant set with no equilibrium points. The set $\Lambda$ is a hyperbolic set for $\varphi_t$ if there exist continuous subbundles $E^s, E^u \subset TM|_\Lambda$, constants $C \ge 1$ and $\lambda > 0$, such that for every $x \in \Lambda$,
\begin{align*}
T_xM = E^s_x \oplus \mathbb R X(x) \oplus E^u_x,
\end{align*}
where $X$ is the generating vector field. The stable and unstable bundles are invariant under the flow derivative:
\begin{align*}
D\varphi_t(E^s_x) = E^s_{\varphi_t(x)}, \qquad D\varphi_t(E^u_x) = E^u_{\varphi_t(x)}
\end{align*}
for all $t \in \mathbb R$. For all $t \ge 0$,
\begin{align*}
|D\varphi_t(v)| \le C e^{-\lambda t}|v| \quad \text{for all } v \in E^s_x,
\end{align*}
and
\begin{align*}
|D\varphi_{-t}(w)| \le C e^{-\lambda t}|w| \quad \text{for all } w \in E^u_x.
\end{align*}
[/definition]
The centre direction $\mathbb R X(x)$ is not a defect of the definition. It expresses the fact that moving a small amount along the same orbit remains a small displacement along that orbit, rather than being forced into exponential contraction or expansion.
[example: Geodesic Flow in Negative Curvature]
Let $S$ be a closed Riemannian surface with strictly negative curvature $K$, and let $\varphi_t$ denote the geodesic flow on the unit tangent bundle $T^1S$. Since $S$ is compact, there are constants $0<a\le b$ such that
\begin{align*}-b^2\le K(q)\le -a^2<0\end{align*}
for every $q\in S$. For $v\in T^1S$, let $\gamma_v(t)$ be the unit-speed geodesic with $\dot\gamma_v(0)=v$. A tangent vector to $T^1S$ transverse to the flow direction is represented by an orthogonal Jacobi field $J$ along $\gamma_v$. In dimension two, writing $J(t)=j(t)e(t)$ for a parallel unit normal field $e(t)$ gives the scalar Jacobi equation
\begin{align*}j''(t)+K(\gamma_v(t))j(t)=0.\end{align*}
The stable initial data are the Jacobi fields whose size decays as $t\to+\infty$. Equivalently, the stable line at $v$ is the graph
\begin{align*}E^s_v=\{(j(0)e(0),r_s(v)j(0)e(0)):j(0)\in\mathbb R\},\end{align*}
where $r_s(v)$ is the stable Riccati solution $r=j'/j$. The Riccati equation is
\begin{align*}r'(t)+r(t)^2+K(\gamma_v(t))=0.\end{align*}
By the standard Jacobi comparison theorem for negatively curved surfaces, the stable solution satisfies
\begin{align*}-b\le r_s(\varphi_t v)\le -a\end{align*}
for all $t\ge 0$. Hence, for a stable Jacobi field,
\begin{align*}j(t)=j(0)\exp\left(\int_0^t r_s(\varphi_\tau v)\,d\tau\right).\end{align*}
Since $r_s(\varphi_\tau v)\le -a$ for every $\tau\ge 0$,
\begin{align*}|j(t)|\le e^{-at}|j(0)|.\end{align*}
Also $j'(t)=r_s(\varphi_t v)j(t)$ and $|r_s|\le b$, so
\begin{align*}|j'(t)|\le b e^{-at}|j(0)|.\end{align*}
Thus the derivative of the geodesic flow contracts vectors in $E^s_v$ exponentially in forward time, with constants depending only on $a,b$ and the Sasaki metric.
The unstable line $E^u_v$ is defined in the same way using Jacobi fields that decay as $t\to-\infty$, so the same calculation applied to the reversed geodesic gives exponential contraction of $D\varphi_{-t}$ on $E^u_v$. The remaining one-dimensional direction is $\mathbb R X(v)$, the velocity direction of the geodesic flow itself. Therefore
\begin{align*}T_vT^1S=E^s_v\oplus \mathbb R X(v)\oplus E^u_v,\end{align*}
and the whole unit tangent bundle $T^1S$ is a hyperbolic set for $\varphi_t$: negative curvature supplies the stable and unstable Jacobi-field directions, while the flow direction is the unavoidable neutral direction along the orbit.
[/example]
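The comparison bound $-b\le r_s\le -a$ can be illustrated numerically. In reversed time $s=T-t$ the Riccati equation reads $dr/ds=r^2+K$, and the band $[-b,-a]$ is invariant: at $r=-b$ the right-hand side is $b^2+K\ge 0$, and at $r=-a$ it is $a^2+K\le 0$. Solutions inside the band also converge to each other in $s$, so integrating backward from a distant time approximates the stable solution. The curvature profile `K(t)` below is hypothetical, with values in $[-2,-1]$, so $a=1$ and $b=\sqrt2$.

```python
import math

A_LOW, B_HIGH = 1.0, math.sqrt(2.0)      # a and b with -b^2 <= K <= -a^2


def K(t):
    """Hypothetical curvature along the geodesic, with values in [-2, -1]."""
    return -1.5 - 0.5 * math.sin(t)


def stable_riccati(t_max, horizon=40.0, steps=8000):
    """Approximate (t, r_s(t)) on [0, t_max] by integrating backward from t_max + horizon."""
    T = t_max + horizon
    h = T / steps
    t, r = T, -B_HIGH                    # any start in [-b, -a] works

    def g(t, r):
        return r * r + K(t)              # dr/ds with s = T - t

    pts = [(t, r)]
    for _ in range(steps):               # RK4 in s, i.e. stepping t downward
        k1 = g(t, r)
        k2 = g(t - h / 2, r + h / 2 * k1)
        k3 = g(t - h / 2, r + h / 2 * k2)
        k4 = g(t - h, r + h * k3)
        r += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t -= h
        pts.append((t, r))
    pts.reverse()
    return [(tt, rr) for tt, rr in pts if tt <= t_max + 1e-12]


def log_stable_jacobi(pts):
    """log(j(t)/j(0)) = integral_0^t r_s, by the trapezoid rule on the grid."""
    out = [0.0]
    for (t0, r0), (t1, r1) in zip(pts, pts[1:]):
        out.append(out[-1] + 0.5 * (r0 + r1) * (t1 - t0))
    return out


pts = stable_riccati(10.0)
logs = log_stable_jacobi(pts)
print(min(r for _, r in pts), max(r for _, r in pts), logs[-1])
```

The final value of `logs` is $\log|j(10)/j(0)|$, which the bound $r_s\le -a$ keeps below $-10a$.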
The geodesic-flow example shows why hyperbolicity belongs to geometry as much as to linear algebra. Negative curvature separates geodesics exponentially, and this geometric divergence becomes the unstable direction in the tangent dynamics.
## Stable and Unstable Sets
Once hyperbolic directions have been identified, the next question is whether they integrate to actual nonlinear sets of points that converge together under iteration. Stable and unstable sets provide the topological version of this idea before any smooth-manifold structure is asserted.
[definition: Stable and Unstable Sets]
Let $f: M \to M$ be a homeomorphism of a metric space $(M,d)$, and let $x \in M$. The stable set and unstable set of $x$ are
\begin{align*}
W^s(x) = \{y \in M : d(f^n(y), f^n(x)) \to 0 \text{ as } n \to \infty\},
\end{align*}
and
\begin{align*}
W^u(x) = \{y \in M : d(f^{-n}(y), f^{-n}(x)) \to 0 \text{ as } n \to \infty\}.
\end{align*}
[/definition]
These sets are defined for any homeomorphism, but hyperbolicity is what turns them into smooth objects with tangent spaces $E^s_x$ and $E^u_x$. To prove such a smooth statement, we first need versions that only track orbits while they remain inside a controlled neighbourhood.
[definition: Local Stable and Unstable Sets]
Let $f: M \to M$ be a homeomorphism of a metric space $(M,d)$, let $x \in M$, and let $\varepsilon > 0$. The local stable and unstable sets of size $\varepsilon$ are
\begin{align*}
W^s_\varepsilon(x) = \{y \in M : d(f^n(y), f^n(x)) \le \varepsilon \text{ for all } n \ge 0\},
\end{align*}
and
\begin{align*}
W^u_\varepsilon(x) = \{y \in M : d(f^{-n}(y), f^{-n}(x)) \le \varepsilon \text{ for all } n \ge 0\}.
\end{align*}
[/definition]
The local sets are easier to control because the orbit comparison remains inside a coordinate neighbourhood where nonlinear errors can be estimated. Global stable and unstable sets are then recovered by iterating local pieces forward or backward.
[example: Stable Sets for a Linear Saddle]
Consider $A:\mathbb R^2\to\mathbb R^2$ given by $A(x,y)=(2x,y/3)$, with fixed point $0=(0,0)$. We compute the forward and backward iterates in order to identify $W^s(0)$ and $W^u(0)$.
For $n=1$,
\begin{align*}A(x,y)=(2x,3^{-1}y).\end{align*}
If $A^n(x,y)=(2^n x,3^{-n}y)$, then
\begin{align*}A^{n+1}(x,y)=A(2^n x,3^{-n}y)=(2\cdot 2^n x,3^{-1}\cdot 3^{-n}y)=(2^{n+1}x,3^{-(n+1)}y).\end{align*}
Thus, by induction,
\begin{align*}A^n(x,y)=(2^n x,3^{-n}y)\end{align*}
for every $n\ge 0$.
The distance from $A^n(x,y)$ to $0$ in the Euclidean metric is
\begin{align*}|A^n(x,y)|=\sqrt{(2^n x)^2+(3^{-n}y)^2}.\end{align*}
If $x=0$, then
\begin{align*}|A^n(0,y)|=\sqrt{0^2+(3^{-n}y)^2}=3^{-n}|y|\to 0.\end{align*}
If $x\ne 0$, then
\begin{align*}|A^n(x,y)|=\sqrt{(2^n x)^2+(3^{-n}y)^2}\ge |2^n x|=2^n|x|,\end{align*}
and $2^n|x|\to\infty$, so the forward iterates do not converge to $0$. Therefore
\begin{align*}W^s(0)=\{(0,y):y\in\mathbb R\},\end{align*}
the vertical axis.
Since $A^{-1}(x,y)=(x/2,3y)$, the same induction gives
\begin{align*}A^{-n}(x,y)=(2^{-n}x,3^n y).\end{align*}
If $y=0$, then
\begin{align*}|A^{-n}(x,0)|=\sqrt{(2^{-n}x)^2+0^2}=2^{-n}|x|\to 0.\end{align*}
If $y\ne 0$, then
\begin{align*}|A^{-n}(x,y)|=\sqrt{(2^{-n}x)^2+(3^n y)^2}\ge |3^n y|=3^n|y|,\end{align*}
and $3^n|y|\to\infty$, so the backward iterates do not converge to $0$. Hence
\begin{align*}W^u(0)=\{(x,0):x\in\mathbb R\},\end{align*}
the horizontal axis. Thus the saddle has one invariant line that contracts in forward time and one invariant line that contracts in backward time.
[/example]
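The closed forms $A^n(x,y)=(2^nx,3^{-n}y)$ and $A^{-n}(x,y)=(2^{-n}x,3^ny)$ can be confirmed in exact rational arithmetic, which avoids any floating-point doubt. The function names below are illustrative.

```python
from fractions import Fraction


def A(p):
    """The linear saddle A(x, y) = (2x, y/3)."""
    x, y = p
    return (2 * x, y / 3)


def A_inv(p):
    """Its inverse A^{-1}(x, y) = (x/2, 3y)."""
    x, y = p
    return (x / 2, 3 * y)


def iterate(g, p, n):
    """Apply g to p n times."""
    for _ in range(n):
        p = g(p)
    return p


x, y = Fraction(5), Fraction(7)
for n in range(8):
    # exact rational arithmetic confirms the closed forms from the induction
    assert iterate(A, (x, y), n) == (2**n * x, y / 3**n)
    assert iterate(A_inv, (x, y), n) == (x / 2**n, 3**n * y)

print(iterate(A, (Fraction(0), y), 20))      # vertical axis: forward iterates tend to 0
print(iterate(A_inv, (x, Fraction(0)), 20))  # horizontal axis: backward iterates tend to 0
```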
The linear saddle is the picture that the stable manifold theorem preserves under nonlinear perturbation. The theorem does not merely state that nearby points converge; it states that the collection of convergent points forms a smooth embedded disk tangent to the stable eigenspace.
## Stable Manifolds for Hyperbolic Fixed Points
The local theory starts with a single hyperbolic fixed point because the splitting is then an ordinary splitting of one tangent space. The central problem is to pass from the linear decomposition of $Df_p$ to nonlinear invariant manifolds for $f$ near $p$.
[definition: Hyperbolic Fixed Point]
Let $M$ be a smooth manifold, let $f: M \to M$ be a $C^1$ diffeomorphism, and let $p \in M$ satisfy $f(p)=p$. The fixed point $p$ is hyperbolic if no eigenvalue of the derivative $Df_p:T_pM\to T_pM$ has modulus $1$.
[/definition]
The sums of the generalized eigenspaces of $Df_p$ for eigenvalues inside and outside the unit circle give the stable and unstable tangent directions $E^s_p$ and $E^u_p$. The linearization only describes first-order motion at $p$; it does not automatically produce actual nonlinear sets of points that converge to or escape from $p$. Hyperbolicity removes centre directions, so that $T_pM=E^s_p\oplus E^u_p$, and the remaining question is whether the linear stable and unstable spaces integrate to invariant disks for the nonlinear map.
[quotetheorem:7747]
[citeproof:7747]
This theorem is robust because the invariant manifolds are fixed points of a contraction on a space of graphs (the graph transform), and that contraction comes from the gap between the contraction along $E^s_p$ and the expansion along $E^u_p$. The hyperbolicity hypothesis is essential. For instance, the one-dimensional map $f(x)=x+x^2$ has $Df_0=1$: points on the positive side of $0$ drift away, while points on the negative side approach $0$ only at the non-exponential rate $|f^n(x)|\sim 1/n$, so there is no stable disk governed by a contracting eigenspace. The theorem is also local: it produces embedded disks near $p$ and exponential convergence while iterates remain in the controlled neighbourhood, but it does not say that the global stable set is a single embedded copy of Euclidean space without further hypotheses. The same local mechanism, contraction along one family of directions and expansion along a complementary one, drives the estimates behind horseshoes.
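The non-hyperbolic example $f(x)=x+x^2$ can be iterated directly. The sketch shows that from $x_0=-0.1$ the orbit satisfies $n|x_n|\to 1$, so the one-step ratio $x_{n+1}/x_n=1+x_n$ tends to $1$ rather than staying below a fixed constant $c<1$, while orbits starting on the positive side leave a fixed neighbourhood. The helper names and the escape bound are illustrative.

```python
def f(x):
    """The non-hyperbolic map f(x) = x + x^2, with f'(0) = 1."""
    return x + x * x


def orbit_tail(x0, n):
    """Return f^n(x0)."""
    x = x0
    for _ in range(n):
        x = f(x)
    return x


def escape_time(x0, bound=1.0, max_steps=10_000):
    """Number of steps before |x| reaches bound (max_steps if it never does)."""
    x, k = x0, 0
    while abs(x) < bound and k < max_steps:
        x, k = f(x), k + 1
    return k


n = 10_000
x_n = orbit_tail(-0.1, n)
print(n * abs(x_n))                       # slowly approaches 1: |x_n| ~ 1/n
print(f(x_n) / x_n)                       # one-step ratio 1 + x_n, close to 1
print(escape_time(0.01), escape_time(-0.01))
```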
[example: Nonlinear Saddle]
Let $f(x,y)=(2x+x^2,y/3+xy)$, so $f(0,0)=(0,0)$. For an increment $(u,v)$, differentiating the two coordinate functions gives
\begin{align*}Df_{(x,y)}(u,v)=((2+2x)u,yu+(1/3+x)v).\end{align*}
At the fixed point this becomes
\begin{align*}Df_0(u,v)=(2u,v/3).\end{align*}
Thus the horizontal axis is the eigenspace for the eigenvalue $2$, because
\begin{align*}Df_0(u,0)=(2u,0)=2(u,0),\end{align*}
and the vertical axis is the eigenspace for the eigenvalue $1/3$, because
\begin{align*}Df_0(0,v)=(0,v/3)=\frac13(0,v).\end{align*}
Since $|2|>1$ and $|1/3|<1$, neither eigenvalue has modulus $1$, so $0$ is a hyperbolic fixed point.
By the *Stable Manifold Theorem for Hyperbolic Fixed Points*, there is a local stable curve tangent to
\begin{align*}E^s_0=\{(0,y):y\in\mathbb R\}\end{align*}
and a local unstable curve tangent to
\begin{align*}E^u_0=\{(x,0):x\in\mathbb R\}.\end{align*}
In this particular example the coordinate axes are already invariant. On the vertical axis,
\begin{align*}f(0,y)=(0,y/3+0\cdot y)=(0,y/3),\end{align*}
so induction gives
\begin{align*}f^n(0,y)=(0,3^{-n}y)\end{align*}
for every $n\ge 0$, and hence
\begin{align*}|f^n(0,y)|=3^{-n}|y|\to 0.\end{align*}
On the horizontal axis,
\begin{align*}f(x,0)=(2x+x^2,0/3+x\cdot 0)=(2x+x^2,0),\end{align*}
so points on that axis remain on it and the one-dimensional dynamics has derivative $2$ at the origin. The example shows that the invariant-manifold theorem recovers the stable and unstable tangent directions from the linearization, while the nonlinear terms determine the actual local dynamics on and near those invariant curves.
[/example]
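The computations in the nonlinear saddle example can be checked numerically: the analytic derivative against central differences, the invariant vertical axis, and backward iteration along the horizontal axis. On that axis the map is $x\mapsto 2x+x^2$, whose inverse branch through $0$ is $x\mapsto -1+\sqrt{1+x}$, written below as $x/(1+\sqrt{1+x})$ to avoid cancellation. The function names and the step size `h` are illustrative assumptions.

```python
import math


def f(x, y):
    """The nonlinear saddle f(x, y) = (2x + x^2, y/3 + xy)."""
    return (2 * x + x * x, y / 3 + x * y)


def Df(x, y):
    """Jacobian rows from the text: Df(u, v) = ((2 + 2x) u, y u + (1/3 + x) v)."""
    return ((2 + 2 * x, 0.0), (y, 1 / 3 + x))


def numeric_Df(x, y, h=1e-6):
    """Central-difference Jacobian, used only as a consistency check on Df."""
    cols = []
    for dx, dy in ((h, 0.0), (0.0, h)):
        fp, fm = f(x + dx, y + dy), f(x - dx, y - dy)
        cols.append(((fp[0] - fm[0]) / (2 * h), (fp[1] - fm[1]) / (2 * h)))
    return ((cols[0][0], cols[1][0]), (cols[0][1], cols[1][1]))


def forward(point, n):
    """n forward iterates of f."""
    for _ in range(n):
        point = f(*point)
    return point


def inverse_on_axis(x):
    """Inverse of x -> 2x + x^2 on the branch through 0 (valid for x > -1)."""
    return x / (1 + math.sqrt(1 + x))


x = 0.5
for _ in range(40):
    x = inverse_on_axis(x)   # backward iterates along the horizontal (unstable) axis
print(Df(0.0, 0.0), forward((0.0, 1.0), 5), x)
```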
The stable manifold theorem gives smooth invariant geometry, but it does not by itself describe the whole orbit structure near the fixed point. The next theorem says that, topologically, the nonlinear map near a hyperbolic fixed point has the same orbit pattern as its derivative.
[quotetheorem:2777]
[proofunderconstruction:2777]
Hartman-Grobman complements the stable manifold theorem. Stable manifolds give differentiable invariant submanifolds, while Hartman-Grobman gives a topological classification of nearby orbits. Hyperbolicity is again indispensable: if $Df_p$ has an eigenvalue on the unit circle, nonlinear terms can change the orbit structure near $p$, as in $f(x)=x+x^2$ near $0$, whose derivative is the identity but whose nearby orbits are not topologically organised like the identity map. The conclusion is deliberately topological rather than smooth; it preserves the qualitative orbit pattern but not distances, derivatives, expansion rates, or smooth invariant foliations. This limitation is exactly why the next step separates the smooth invariant-manifold theory from the topological conjugacy theory and then asks how both extend from one fixed point to a whole compact invariant set.
## Stable Manifolds for Compact Hyperbolic Sets
For a horseshoe or a solenoid attractor, there is no single fixed point around which to build a graph. The problem is to construct stable and unstable manifolds at every point of a compact hyperbolic set in a way that varies coherently with the base point.
[quotetheorem:7748]
[proofunderconstruction:7748]
The new ingredient is uniformity over $\Lambda$. Compactness is used to replace pointwise estimates by constants that work simultaneously for every base point. Without compactness or an explicitly uniform hypothesis, the contraction rates may deteriorate along a sequence of base points, so graph transforms that work at each individual point need not have a common size or common exponential rate. Without hyperbolicity itself, centre directions may appear and the local stable set can fail to be a smooth disk tangent to a well-defined stable bundle. The theorem is still local in the transverse directions: it gives coherent local disks and exponential estimates, but it does not assert that the union of all global stable sets forms a single embedded submanifold of $M$.
[example: Solenoid Attractor]
Model the solid torus as $S^1\times D^2$, with coordinates $(\theta,z)$, where $\theta\in \mathbb R/\mathbb Z$ and $z\in D^2\subset \mathbb R^2$. A standard Smale-Williams type embedding has the form
\begin{align*}F(\theta,z)=(2\theta \bmod 1,\; a z+\psi(\theta))\end{align*}
with $0<a<1$ and with $\psi(\theta)$ chosen so that the image is a thinner solid torus winding twice around the core. The maximal invariant set is
\begin{align*}\Lambda=\bigcap_{n\ge 0}F^n(S^1\times D^2),\end{align*}
so a point belongs to $\Lambda$ exactly when it remains in every nested forward image of the solid torus.
Differentiating, for a tangent vector $(\delta\theta,\delta z)$,
\begin{align*}DF_{(\theta,z)}(\delta\theta,\delta z)=(2\,\delta\theta,\;\psi'(\theta)\,\delta\theta+a\,\delta z).\end{align*}
The angular coordinate expands because, for two nearby angles lifted to $\mathbb R$,
\begin{align*}|(2\theta_1)-(2\theta_2)|=2|\theta_1-\theta_2|,\end{align*}
so the angular component of every tangent vector is multiplied by $2$. Transverse vectors $(0,\delta z)$ are mapped to transverse vectors, and in each transverse disk the linear part of the second coordinate is multiplication by $a$:
\begin{align*}|a\,\delta z|=a|\delta z|.\end{align*}
After $n$ iterates the angular component has size $2^n|\delta\theta|$, while transverse vectors have size $a^n|\delta z|$:
\begin{align*}|2^n\delta\theta|=2^n|\delta\theta|,\qquad |DF^n(0,\delta z)|=a^n|\delta z|.\end{align*}
Since $0<a<1$, the transverse directions contract exponentially in forward time. The term $\psi'(\theta)\,\delta\theta$ tilts the expanding direction, so the unstable direction is an invariant line lying in a cone around the angular direction. Along it the angular component is halved at each backward step along inverse branches, and since $2^{-n}\to 0$, this line contracts exponentially in backward time.
Therefore the stable directions are the two-dimensional transverse disk directions, and the unstable direction is a one-dimensional line transverse to the disks inside the solenoid. Local stable manifolds are small transverse disks: points in the same such disk have the same angle, hence the same displacement $\psi(\theta)$ at each step, so their transverse separation is multiplied by $a^n$ and their forward orbits converge. Local unstable manifolds follow the expanding core direction and record the backward itinerary of the point. The solenoid is therefore a compact hyperbolic attractor with two contracting stable directions and one expanding unstable direction.
[/example]
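The two rates can be observed on a concrete instance. This sketch assumes the hypothetical choices $a=0.2$ and $\psi(\theta)=\tfrac12 e^{2\pi i\theta}$, with the disk coordinate $z$ stored as a complex number. For $a<1/2$ this map is injective and sends the solid torus strictly inside itself. Two points with the same angle separate by exactly $a^n$, and two nearby angles separate by $2^n$ while that separation stays small.

```python
import cmath
import math

A_CONTRACT = 0.2                       # hypothetical contraction a, with 0 < a < 1/2


def psi(theta):
    """Hypothetical displacement of radius 1/2, winding once as theta goes around."""
    return 0.5 * cmath.exp(2j * math.pi * theta)


def F(theta, z):
    """Smale-Williams type map on S^1 x D^2; theta in [0, 1), z complex with |z| <= 1."""
    return ((2 * theta) % 1.0, A_CONTRACT * z + psi(theta))


def iterate(theta, z, n):
    """Return F^n(theta, z)."""
    for _ in range(n):
        theta, z = F(theta, z)
    return theta, z


def circle_dist(t1, t2):
    """Distance on the circle R/Z."""
    d = abs(t1 - t2) % 1.0
    return min(d, 1.0 - d)


n = 10
t1, z1 = iterate(0.3, 0.1 + 0.2j, n)    # same angle, different transverse positions
t2, z2 = iterate(0.3, -0.3 + 0.05j, n)
print(abs(z1 - z2) / abs((0.1 + 0.2j) - (-0.3 + 0.05j)), A_CONTRACT ** n)

s1, _ = iterate(0.3, 0j, n)              # nearby angles, same transverse position
s2, _ = iterate(0.3 + 1e-6, 0j, n)
print(circle_dist(s1, s2) / circle_dist(0.3, 0.3 + 1e-6), 2 ** n)
```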
The solenoid illustrates why stable manifolds for compact hyperbolic sets are local families rather than one global manifold. Local stable disks fill a neighbourhood of the attractor, while the unstable manifolds remain inside the folded invariant set.
## Local Product Structure
Hyperbolic sets have two transverse directions at every point, so a natural question is whether nearby points can be combined by taking the stable coordinate of one and the unstable coordinate of another. Local product structure formalises this idea and supplies the geometric basis for Markov partitions and symbolic codings.
[definition: Bracket Map]
Let $\Lambda$ be a compact hyperbolic set for a diffeomorphism $f: M \to M$. A bracket map on $\Lambda$ is a partially defined map $(x,y) \mapsto [x,y]$ for sufficiently close $x,y\in\Lambda$, where $[x,y]$ is the unique intersection point
\begin{align*}
[x,y] \in W^u_\varepsilon(x) \cap W^s_\varepsilon(y) \cap \Lambda.
\end{align*}
[/definition]
The notation $[x,y]$ means: keep the past of $x$ and the future of $y$. The obstruction is that stable and unstable plaques need not meet uniquely at large scales, and even a nearby ambient intersection might leave the invariant set under consideration. To use bracket notation as a genuine coordinate operation, hyperbolicity must force a unique nearby intersection and local maximality must keep that intersection inside $\Lambda$.
[quotetheorem:7749]
[citeproof:7749]
Local product structure is the geometric replacement for rectangular coordinates. Stable leaves are one family of coordinate slices, unstable leaves are the other, and the bracket operation identifies the point determined by choosing one slice from each family. The word local is essential: stable and unstable manifolds can wrap around and meet several times at larger scales, so uniqueness is only guaranteed inside a sufficiently small neighbourhood where transversality controls the intersection. The local maximality hypothesis is also doing real work. It ensures that the intersection point whose orbit shadows the past of $x$ and the future of $y$ remains in the isolated invariant set $\Lambda$; without local maximality, the same ambient intersection can lie near $\Lambda$ while not belonging to $\Lambda$ itself.
[example: Product Structure in the Toral Automorphism]
With $A$, $L^s$, and $L^u$ as in the *Hyperbolic Linear Automorphism* example, choose nonzero vectors $e_s\in L^s$ and $e_u\in L^u$. Since $A$ is hyperbolic, $L^s\ne L^u$, so every vector in $\mathbb R^2$ has a unique decomposition in the basis $(e_s,e_u)$. Lift the nearby points $x,y\in\mathbb T^2$ to nearby points $X,Y\in\mathbb R^2$, and write their difference as
\begin{align*}Y-X=\alpha e_s+\beta e_u.\end{align*}
The lifted local unstable line through $X$ is
\begin{align*}X+\mathbb R e_u=\{X+t e_u:t\in\mathbb R\},\end{align*}
and the lifted local stable line through $Y$ is
\begin{align*}Y+\mathbb R e_s=\{Y+r e_s:r\in\mathbb R\}.\end{align*}
To find their intersection, solve
\begin{align*}X+t e_u=Y+r e_s.\end{align*}
Substituting $Y=X+\alpha e_s+\beta e_u$ gives
\begin{align*}X+t e_u=X+\alpha e_s+\beta e_u+r e_s.\end{align*}
Cancelling $X$ gives
\begin{align*}t e_u=(\alpha+r)e_s+\beta e_u.\end{align*}
Moving all terms to one side gives
\begin{align*}(\alpha+r)e_s+(\beta-t)e_u=0.\end{align*}
Since $e_s,e_u$ are linearly independent, both coefficients vanish:
\begin{align*}\alpha+r=0,\qquad \beta-t=0.\end{align*}
Thus
\begin{align*}r=-\alpha,\qquad t=\beta.\end{align*}
The unique lifted intersection point is therefore
\begin{align*}Z=X+\beta e_u=Y-\alpha e_s.\end{align*}
Projecting $Z$ to the torus gives the bracket point:
\begin{align*}[x,y]=[Z]\in\mathbb T^2.\end{align*}
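For sufficiently close $x$ and $y$, the chosen lifts lie in one evenly covered coordinate neighbourhood, so this lifted intersection projects to the unique local intersection of the unstable line through $x$ and the stable line through $y$. In the basis $(e_s,e_u)$, the formula $Z=X+\beta e_u$ shows that $Z$ has the same $e_s$-coordinate as $X$, and the formula $Z=Y-\alpha e_s$ shows that it has the same $e_u$-coordinate as $Y$. The bracket operation therefore selects the unstable line of $x$ and the stable line of $y$, which is exactly keeping the past of $x$ and the future of $y$.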
[/example]
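The bracket computation above reduces to solving one $2\times 2$ linear system. The following is a minimal NumPy sketch. It assumes the cat-map matrix $A=\begin{pmatrix}2&1\\1&1\end{pmatrix}$ as a concrete hyperbolic example, because the text does not fix $A$. The function names are illustrative only. The sketch lifts $y$ to the integer translate closest to $x$, solves $Y-X=\alpha e_s+\beta e_u$, and checks that $Z=X+\beta e_u$ lies on both lifted lines.

```python
import numpy as np

# Hypothetical concrete choice: Arnold's cat map, a hyperbolic toral automorphism.
A = np.array([[2.0, 1.0], [1.0, 1.0]])


def stable_unstable_basis(A):
    """Return (e_s, e_u): eigenvectors for the eigenvalues of modulus < 1 and > 1."""
    vals, vecs = np.linalg.eig(A)
    order = np.argsort(np.abs(vals))          # contracting eigenvalue first
    e_s = np.real(vecs[:, order[0]])
    e_u = np.real(vecs[:, order[1]])
    return e_s, e_u


def bracket(x, y, e_s, e_u):
    """Local bracket [x, y] on R^2 / Z^2 for nearby x and y.

    Returns the bracket point mod 1, the lifts X, Y, Z, and the
    coefficients (alpha, beta) with Y - X = alpha e_s + beta e_u.
    """
    X = np.asarray(x, dtype=float)
    # Lift y to the integer translate closest to X (valid only when x, y are close).
    Y = X + ((np.asarray(y, dtype=float) - X + 0.5) % 1.0 - 0.5)
    alpha, beta = np.linalg.solve(np.column_stack([e_s, e_u]), Y - X)
    Z = X + beta * e_u                        # equals Y - alpha * e_s
    return Z % 1.0, X, Y, Z, alpha, beta


def cross(a, b):
    """2D cross product; zero exactly when a and b are parallel."""
    return a[0] * b[1] - a[1] * b[0]


e_s, e_u = stable_unstable_basis(A)
z, X, Y, Z, alpha, beta = bracket((0.30, 0.40), (0.32, 0.37), e_s, e_u)
print(z)                    # bracket point on the torus
print(cross(Z - X, e_u))    # ~0: Z lies on the unstable line through X
print(cross(Z - Y, e_s))    # ~0: Z lies on the stable line through Y
```

Lifting to the nearest translate is the numerical counterpart of working in one evenly covered neighbourhood. For distant points, the projected lines meet many times and this choice of lift is no longer canonical.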
This example also explains why product structure is local. On the torus the projected stable and unstable lines may wrap around and meet many times globally, but in a sufficiently small neighbourhood their transverse intersection is unique.
## Structural Consequences and Models
The final question in this chapter is what hyperbolicity buys us beyond local invariant manifolds. The answer is that hyperbolic dynamics is organised enough to be modelled symbolically, perturbed robustly, and analysed statistically in later chapters.
[quotetheorem:7750]
[citeproof:7750]
This persistence theorem is the reason hyperbolic systems form a stable class of chaotic examples. Their orbit structure survives small $C^1$ perturbations because stable and unstable directions cannot disappear without losing the exponential estimates. Both hypotheses are needed. If hyperbolicity is dropped, a neutral fixed point can split, disappear, or change stability under an arbitrarily small perturbation; if local maximality is dropped, there may be no isolated continuation $\Lambda_g$ to track. The conjugacy is also confined to the continued invariant sets, not to whole neighbourhoods of them, and it is topological rather than differentiable. Thus persistence protects the symbolic orbit structure used for Markov partitions and entropy, while leaving metric quantities and smooth geometry to require separate estimates.
[example: Hyperbolicity Behind a Horseshoe]
For a Smale horseshoe with Markov rectangles $R_0$ and $R_1$, the invariant set is
\begin{align*}\Lambda=\bigcap_{n\in\mathbb Z} f^{-n}(R_0\cup R_1).\end{align*}
Thus $z\in\Lambda$ exactly when every iterate $f^n(z)$ lies in one of the two rectangles. Its itinerary is the bi-infinite sequence $(s_n)_{n\in\mathbb Z}$ defined by
\begin{align*}f^n(z)\in R_{s_n},\qquad s_n\in\{0,1\}.\end{align*}
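In the usual horseshoe coordinates, on each rectangle the map contracts one direction by a factor $0<a<1$ and expands the transverse direction by a factor $b>1$. Write $d_s$ and $d_u$ for separations measured along the contracting and expanding directions, and let $D$ bound the side lengths of $R_0$ and $R_1$. If two points have the same future symbols, then $f^n x$ and $f^n y$ lie in the same rectangle for every $n\ge 0$, so their expanding separation stays bounded:
\begin{align*}b^n d_u(x,y)\le d_u(f^n x,f^n y)\le D\qquad(n\ge 0).\end{align*}
Since $b^n\to\infty$, this forces $d_u(x,y)=0$, so $x$ and $y$ lie on one local stable segment and
\begin{align*}d_s(f^n x,f^n y)\le a^n d_s(x,y)\to 0.\end{align*}
Thus points with the same future itinerary lie on the same local stable set. If two points have the same past symbols, the same argument applied to $f^{-1}$, which expands the contracting direction by at least $a^{-1}$, gives
\begin{align*}a^{-n} d_s(x,y)\le d_s(f^{-n}x,f^{-n}y)\le D\qquad(n\ge 0),\end{align*}
hence $d_s(x,y)=0$ and
\begin{align*}d_u(f^{-n}x,f^{-n}y)\le b^{-n}d_u(x,y)\to 0.\end{align*}
Thus points with the same past itinerary lie on the same local unstable set.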
For nearby $x,y\in\Lambda$, the bracket point $[x,y]$ is the unique point in the local unstable set of $x$ and the local stable set of $y$. Therefore its itinerary is
\begin{align*}s_n([x,y])=s_n(x)\quad\text{for }n<0,\end{align*}
and
\begin{align*}s_n([x,y])=s_n(y)\quad\text{for }n\ge 0.\end{align*}
So the bracket operation literally combines the past itinerary of $x$ with the future itinerary of $y$, which is why the horseshoe dynamics is modeled by the full shift on two symbols.
[/example]
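The itinerary bookkeeping can be checked on an explicit model. The sketch below uses a hypothetical affine horseshoe with $a=1/3$ and $b=3$. Its rectangles are $R_0=[0,1]\times[0,1/3]$ and $R_1=[0,1]\times[2/3,1]$, with horizontal contraction and vertical expansion. Exact rational arithmetic avoids rounding.

In this model, the local unstable set of $x$ is the vertical segment through $x$, and the local stable set of $y$ is the horizontal segment through $y$. So $[x,y]$ has the first coordinate of $x$ and the second coordinate of $y$.

```python
from fractions import Fraction as F

THIRD, TWO_THIRDS = F(1, 3), F(2, 3)


def f(z):
    """Hypothetical affine horseshoe: contract u by 1/3, expand v by 3 on R_0 and R_1."""
    u, v = z
    if v <= THIRD:                      # z in R_0
        return (u / 3, 3 * v)
    if v >= TWO_THIRDS:                 # z in R_1
        return (u / 3 + TWO_THIRDS, 3 * v - 2)
    raise ValueError("point has left R_0 and R_1")


def f_inv(z):
    """Inverse of f, defined on f(R_0) and f(R_1)."""
    u, v = z
    if u <= THIRD:                      # image of R_0
        return (3 * u, v / 3)
    if u >= TWO_THIRDS:                 # image of R_1
        return (3 * u - 2, (v + 2) / 3)
    raise ValueError("point is not in f(R_0) or f(R_1)")


def symbol(z):
    """0 if z is in R_0, 1 if z is in R_1."""
    return 0 if z[1] <= THIRD else 1


def itinerary(z, n):
    """Return (past, future) = ([s_{-n}, ..., s_{-1}], [s_0, ..., s_{n-1}])."""
    future, w = [], z
    for _ in range(n):
        future.append(symbol(w))
        w = f(w)
    past, w = [], z
    for _ in range(n):
        w = f_inv(w)
        past.append(symbol(w))
    return past[::-1], future


def bracket(x, y):
    """Keep the vertical (unstable) leaf of x and the horizontal (stable) leaf of y."""
    return (x[0], y[1])


# 1/4 and 3/4 have ternary digits only 0 and 2, so these points lie in the invariant set.
x = (F(1, 4), F(1, 4))
y = (F(3, 4), F(3, 4))
print(itinerary(x, 6))
print(itinerary(y, 6))
print(itinerary(bracket(x, y), 6))   # past of x, future of y
```

The future symbols depend only on the second coordinate and the past symbols only on the first. This splitting is the product structure that makes every pair (past, future) realisable, which is the full shift.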
The examples in this chapter should be read as three levels of the same theory. Hyperbolic toral automorphisms give global linear models, geodesic flows give smooth geometric models with a neutral flow direction, and solenoids give attractors where expansion and contraction coexist on a fractal invariant set. The next chapters use these stable and unstable structures to build Markov partitions, define entropy, and connect deterministic stretching with statistical behaviour.
Hyperbolic sets and stable manifolds explain how local expansion and contraction organise dynamics, yet the most striking global complexity comes from how invariant manifolds intersect. The next chapter studies homoclinic intersections and the [lambda lemma](/theorems/7751), which turn those local structures into a source of new horseshoes.
# 5. Homoclinic Intersections and the Lambda Lemma
These notes study how local hyperbolicity in a smooth dynamical system produces global orbit complexity. The course has already developed hyperbolic fixed and periodic points, stable and unstable manifolds, and the horseshoe as a model of symbolic dynamics. This chapter connects those ingredients: it explains why horseshoes are not artificial examples, but arise naturally when invariant manifolds of a hyperbolic orbit return and cross.
Homoclinic intersections are the geometric mechanism by which the stable and unstable directions of a hyperbolic orbit fold back into each other. Building on Chapter 3's horseshoe construction and Chapter 4's stable and unstable manifolds, this chapter uses transversality and symbolic itineraries to explain how such intersections generate horseshoes.
The guiding question is local-to-global. If a map has a hyperbolic periodic point, its stable and unstable manifolds are local objects; if these manifolds meet again away from the periodic orbit, the map remembers the local hyperbolicity after a long global excursion. The Lambda Lemma makes this precise, and the [Smale-Birkhoff Homoclinic Theorem](/theorems/7752) converts the geometry into symbolic dynamics.
## Transverse Homoclinic Points
What does it mean for an orbit to leave a hyperbolic periodic point and return to it in both forward and backward time? The answer is encoded by the intersection of the stable and unstable manifolds. Such an intersection is much stronger when it is transverse, because transversality survives perturbation and supplies the crossing geometry needed for a horseshoe.
Let $M$ be a smooth surface and let $f:M \to M$ be a $C^r$ diffeomorphism, $r \ge 1$. Suppose $p$ is a hyperbolic fixed point; for a hyperbolic periodic point, the same definitions apply to the first return map $f^k$ where $k$ is the period.
[definition: Stable And Unstable Manifolds Of A Hyperbolic Fixed Point]
The stable and unstable manifolds of $p$ are the sets
\begin{align*}
W^s(p) &= \{x \in M : f^n(x) \to p \text{ as } n \to \infty\},
\end{align*}
and
\begin{align*}
W^u(p) &= \{x \in M : f^{-n}(x) \to p \text{ as } n \to \infty\}.
\end{align*}
Their local branches near $p$ are denoted $W^s_{\mathrm{loc}}(p)$ and $W^u_{\mathrm{loc}}(p)$.
[/definition]
The stable manifold contains points that converge to $p$ under forward iteration, while the unstable manifold contains points whose backward iterates converge to $p$. The next problem is to name the points where these two global manifolds meet again away from $p$, because such a return records an orbit asymptotic to the same invariant object in both time directions.
[definition: Homoclinic Point]
A point $q \in M$ is a homoclinic point for $p$ if
\begin{align*}
q \in W^s(p) \cap W^u(p), \qquad q \ne p.
\end{align*}
The full orbit $\{f^n(q): n \in \mathbb Z\}$ is called a homoclinic orbit.
[/definition]
A homoclinic point is not just a single crossing: every iterate $f^n(q)$ is again homoclinic. Thus a single return creates an infinite chain of returns accumulating on $p$ in both time directions. The next issue is whether the stable and unstable manifolds only touch at the return or cross with independent tangent directions.
[definition: Transverse Homoclinic Point]
A homoclinic point $q \in W^s(p) \cap W^u(p)$ is transverse if
\begin{align*}
T_q W^s(p) + T_q W^u(p) = T_q M.
\end{align*}
[/definition]
For a surface diffeomorphism, transversality says that the two one-dimensional tangent lines at $q$ are distinct. This is the condition that makes the intersection dynamically robust. If two curves cross transversely, small $C^1$ perturbations still have a nearby intersection. In dynamics, this robustness matters because nearby iterates of unstable arcs will keep crossing stable strips, producing infinitely many folded returns.
[example: Linear Saddle With No Homoclinic Return]
For $A(x,y)=(2x,y/2)$, the origin is fixed because
\begin{align*}
A(0,0)=(0,0).
\end{align*}
The derivative is the same linear map at every point: it multiplies the $x$-direction by $2$ and the $y$-direction by $1/2$. Since $|2|>1$ and $|1/2|<1$, the origin is a hyperbolic saddle. Iterating gives
\begin{align*}
A^n(x,y)=(2^n x,2^{-n}y).
\end{align*}
Thus $A^n(x,y)\to(0,0)$ as $n\to\infty$ exactly when $x=0$, because $2^n x\to 0$ forces $x=0$, while $2^{-n}y\to 0$ for every $y\in\mathbb R$. Hence
\begin{align*}
W^s(0)=\{(0,y):y\in\mathbb R\}.
\end{align*}
Similarly,
\begin{align*}
A^{-n}(x,y)=(2^{-n}x,2^n y),
\end{align*}
so $A^{-n}(x,y)\to(0,0)$ exactly when $y=0$. Therefore
\begin{align*}
W^u(0)=\{(x,0):x\in\mathbb R\}.
\end{align*}
Their intersection is
\begin{align*}
W^s(0)\cap W^u(0)=\{(0,0)\},
\end{align*}
because a point in both sets must have first coordinate $0$ and second coordinate $0$. Since a homoclinic point must be different from the fixed point, this linear saddle has no homoclinic point. The example shows that local expansion and contraction alone do not create a homoclinic return; some global folding of the invariant manifolds is also needed.
[/example]
The previous example shows why the global shape of invariant manifolds is essential. A transverse homoclinic point occurs only when the unstable manifold returns across the stable manifold after leaving a neighbourhood of $p$. This is the first appearance of the stretching-and-folding geometry used in the construction of horseshoes.
[remark: Tangencies Versus Crossings]
A homoclinic tangency is an intersection $q \in W^s(p) \cap W^u(p)$ where the tangent spaces fail to span $T_qM$. Tangencies are important in bifurcation theory, but the present chapter focuses on transverse intersections because they lead directly to uniformly hyperbolic horseshoes and symbolic coding.
[/remark]
This distinction between tangency and crossing is the geometric fork in the theory. Tangencies may unfold into complicated parameter-dependent phenomena, while transverse intersections already contain a stable symbolic core. The next step is to understand how a single transverse crossing propagates under iteration.
## The Lambda Lemma
Why should an unstable curve that crosses the stable manifold eventually look like the unstable manifold itself? The Lambda Lemma answers this question. It says that transverse crossing of the stable manifold forces forward iterates of the curve to accumulate on the local unstable manifold in the $C^1$ sense.
The statement is local near a hyperbolic fixed point, but its consequences are global because an arc can enter the local neighbourhood after travelling through the phase space. For simplicity we state the two-dimensional version used in the construction of horseshoes.
[quotetheorem:7751]
[citeproof:7751]
The lemma says that $W^u_{\mathrm{loc}}(p)$ is not isolated as a curve: many other curves that cross $W^s_{\mathrm{loc}}(p)$ are pulled into its shape by forward iteration. Each hypothesis prevents a different failure mode. The transversality hypothesis is essential. In the linear saddle $f(x,y)=(x/2,2y)$, the $x$-axis is stable and the $y$-axis is unstable. The curve
\begin{align*}
\gamma=\{(x,0): |x|<\varepsilon\}
\end{align*}
is tangent to, and in fact contained in, $W^s_{\mathrm{loc}}(0)$. Its forward iterates are
\begin{align*}
f^n(\gamma)=\{(2^{-n}x,0): |x|<\varepsilon\},
\end{align*}
so every iterate remains on the stable axis and shrinks toward the origin. No subarc of $f^n(\gamma)$ can approximate a non-degenerate compact interval in the unstable axis in the $C^1$ topology. Hyperbolicity is what supplies simultaneous contraction and expansion; near a neutral fixed point, iterates may shear or rotate without forcing convergence to an unstable direction.
The assumptions that $f$ is a diffeomorphism and that the phase space is a surface also matter in this version. Invertibility gives a genuine unstable manifold defined by backward convergence; for a non-invertible map, different inverse branches can merge, and the unstable set need not be a single embedded curve to which a graph-transform argument applies. The surface assumption keeps the local stable and unstable manifolds one-dimensional and makes transversality a crossing of curves. In higher dimensions the same idea has valid generalisations, but the statement must track dimensions and submanifolds rather than arcs. The $C^r$ regularity ensures that the invariant manifolds and the iterated curve have enough differentiability for $C^1$ convergence; for merely continuous dynamics there is no tangent direction to compare.
The lemma also has a local scope. It does not say that the whole curve $f^n(\gamma)$ converges to $W^u_{\mathrm{loc}}(p)$, nor that the dynamics outside the chosen neighbourhood is controlled. It asserts that suitable subarcs, after sufficiently many iterates, approximate compact pieces of the local unstable manifold in the $C^1$ topology. The name comes from the Greek letter $\lambda$, meant to evoke a curve crossing the stable direction and then being stretched along the unstable direction.
[example: Graph Transform Near A Saddle]
Take the local saddle
\begin{align*}
f(x,y)=(\lambda x,\mu y)
\end{align*}
with $0<|\lambda|<1<|\mu|$, so the $x$-axis is stable and the $y$-axis is unstable. Let $\gamma$ be a curve crossing the stable axis at the origin, written near the origin as
\begin{align*}
\gamma(x)=(x,\varphi(x)), \qquad \varphi(x)=a x+R(x),
\end{align*}
where $a=\varphi'(0)\ne 0$ expresses transversality to the stable axis, and $R$ is $C^2$ with $R(0)=R'(0)=0$, so that $R(x)=O(x^2)$ and $R'(x)=O(x)\to 0$ as $x\to 0$.
The $n$th iterate of a point on this curve is obtained by applying $f$ repeatedly:
\begin{align*}
f^n(x,\varphi(x))=(\lambda^n x,\mu^n\varphi(x)).
\end{align*}
Substituting $\varphi(x)=a x+R(x)$ gives
\begin{align*}
f^n(x,\varphi(x))=(\lambda^n x,\mu^n a x+\mu^n R(x)).
\end{align*}
The tangent vector to $f^n(\gamma)$ at this point is the derivative with respect to $x$:
\begin{align*}
\frac{d}{dx}f^n(x,\varphi(x))=(\lambda^n,\mu^n\varphi'(x)).
\end{align*}
Since $\varphi'(x)=a+R'(x)$, the slope of this tangent vector is
\begin{align*}
\frac{\mu^n\varphi'(x)}{\lambda^n}=\left(\frac{\mu}{\lambda}\right)^n(a+R'(x)).
\end{align*}
For $x$ sufficiently close to $0$, the factor $a+R'(x)$ stays nonzero, while
\begin{align*}
\left|\frac{\mu}{\lambda}\right|^n\to \infty
\end{align*}
because $|\mu|>1$ and $0<|\lambda|<1$. Hence the tangent slopes of the iterated curves become unbounded in magnitude, which means their tangent directions approach the vertical $y$-axis, the unstable direction.
At the same time, horizontal variation is multiplied by $|\lambda|^n$ and vertical variation is multiplied by $|\mu|^n$. Thus the iterates contract stable displacement and expand unstable displacement, and after taking the subarc that remains in a fixed local box, the curve becomes closer in tangent direction to the local unstable manifold. This is the linear graph-transform mechanism behind the $C^1$ convergence in the Lambda Lemma.
[/example]
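The slope formula can be checked numerically. The short sketch below assumes illustrative values $\lambda=1/2$, $\mu=2$, and $\varphi(x)=x+x^2$, so that $a=1$ and $R(x)=x^2$. It iterates the tangent vector $(1,\varphi'(x))$ under $Df=\mathrm{diag}(\lambda,\mu)$ and compares the slope with the closed form $(\mu/\lambda)^n(a+R'(x))$. It also reports the angle to the vertical unstable axis.

```python
import math

LAM, MU = 0.5, 2.0          # illustrative saddle multipliers, 0 < |LAM| < 1 < |MU|


def phi_prime(x):
    """Derivative of the hypothetical curve phi(x) = x + x**2, so a = 1 and R'(x) = 2x."""
    return 1.0 + 2.0 * x


def iterated_tangent(x, n):
    """Apply Df = diag(LAM, MU) n times to the tangent vector (1, phi'(x))."""
    tx, ty = 1.0, phi_prime(x)
    for _ in range(n):
        tx, ty = LAM * tx, MU * ty
    return tx, ty


def angle_to_vertical(tx, ty):
    """Angle in radians between the tangent vector and the unstable y-axis."""
    return math.atan2(abs(tx), abs(ty))


x = 0.05
for n in range(0, 11, 2):
    tx, ty = iterated_tangent(x, n)
    closed_form = (MU / LAM) ** n * phi_prime(x)
    print(n, ty / tx, closed_form, angle_to_vertical(tx, ty))
```

The angle shrinks roughly by the factor $|\lambda/\mu|=1/4$ per step, which is the linear version of the $C^1$ convergence in the Lambda Lemma.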
The local graph-transform picture is the analytic engine of the chapter. It turns one transverse crossing into a sequence of long arcs following the unstable manifold. When the original crossing comes from a homoclinic point, these long arcs must eventually return near the crossing again, creating repeated strips of intersection.
[remark: Forward And Backward Versions]
Applying the Lambda Lemma to $f^{-1}$ gives the dual statement: a curve crossing $W^u_{\mathrm{loc}}(p)$ transversely has backward iterates accumulating on $W^s_{\mathrm{loc}}(p)$. The two versions are often used together near a transverse homoclinic orbit.
[/remark]
This backward form matters because a horseshoe requires two transverse directions at once. Forward iteration stretches along unstable leaves, while backward iteration explains the stable slicing. Together they produce the rectangle-with-strips geometry of symbolic dynamics.
## Smale-Birkhoff Homoclinic Theorem
How does a transverse homoclinic point force symbolic dynamics rather than merely complicated-looking curves? The answer is that a sufficiently high iterate of the map contains a compact invariant set on which the dynamics is conjugate, or at least semi-conjugate in common formulations, to a full shift on finitely many symbols. In the two-strip case, the relevant symbolic system is the full two-shift.
Before stating the theorem, recall what the horseshoe conclusion means. A compact invariant set $\Lambda$ carries symbolic dynamics if each point of $\Lambda$ is assigned an itinerary recording which strip it visits under iteration, and every admissible itinerary is realised by at least one orbit.
[quotetheorem:7752]
[proofunderconstruction:7752]
The theorem is the bridge from geometry to symbolic dynamics. Each main hypothesis has a specific role. Hyperbolicity supplies stable and unstable directions with uniform contraction and expansion; without it, a recurrent orbit near a neutral fixed point can fail to generate persistent strips. Transversality rules out the degenerate case of a homoclinic tangency, where a small perturbation may destroy the intersection or unfold it in a parameter-dependent way rather than giving a uniformly hyperbolic horseshoe. Invertibility is also part of the geometry: stable and unstable manifolds are both used, so the standard proof is not a theorem about arbitrary non-invertible maps.
The conclusion is also deliberately local. The invariant set $\Lambda$ is usually small, is a Cantor set (in particular totally disconnected), and lies near the homoclinic orbit after passing to a return iterate. The theorem does not say that most nearby initial conditions have symbolic dynamics, nor that the whole neighbourhood is chaotic. It says that within the neighbourhood there is a compact hyperbolic subsystem with the orbit complexity of a shift: many periodic points, sensitive dependence on that invariant set, and positive topological entropy for the ambient map.
The Smale-Birkhoff theorem gives a horseshoe in structural terms, but the course next needs the concrete dynamical consequences that will feed into entropy and recurrence. Passing from "there are Markov strips" to "there are all binary itineraries" is the step that turns the geometric construction into infinitely many periodic orbits and a positive entropy lower bound. The following consequence records that translation explicitly, while keeping track of the return iterate used to see the horseshoe.
[quotetheorem:7753]
[citeproof:7753]
This result explains why homoclinic intersections are treated as a main route to chaos. The original system may be smooth and deterministic, yet the invariant set near the homoclinic orbit contains the combinatorics of arbitrary binary sequences.
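The counting behind "many periodic points" is elementary for the full two-shift $\sigma$. A point fixed by $\sigma^n$ is determined by a binary word of length $n$, so there are $2^n$ of them, and $\frac1n\log 2^n=\log 2$.

For a homeomorphism of a compact metric space, $h_{\mathrm{top}}(f^m)=m\,h_{\mathrm{top}}(f)$. So a two-symbol horseshoe for $f^m$ gives $h_{\mathrm{top}}(f)\ge(\log 2)/m$. The precise form of the bound in the consequence above may be stated differently.

The sketch below uses hypothetical function names. It counts points of least period $n$ by brute force and checks them against the Möbius inversion formula $\sum_{d\mid n}\mu(d)\,2^{n/d}$, where $\mu$ here denotes the Möbius function.

```python
import math
from itertools import product


def least_period(word):
    """Least period of the bi-infinite sequence obtained by repeating `word`."""
    n = len(word)
    for d in range(1, n + 1):
        if n % d == 0 and word == word[:d] * (n // d):
            return d
    return n


def count_least_period_brute(n):
    """Number of points of the full two-shift with least period exactly n."""
    return sum(1 for w in product((0, 1), repeat=n) if least_period(w) == n)


def mobius(k):
    """Möbius function by trial division."""
    result, p = 1, 2
    while p * p <= k:
        if k % p == 0:
            k //= p
            if k % p == 0:
                return 0
            result = -result
        p += 1
    if k > 1:
        result = -result
    return result


def count_least_period_formula(n):
    """Möbius inversion of #Fix(sigma^n) = 2^n = sum over d | n of #(least period d)."""
    return sum(mobius(d) * 2 ** (n // d) for d in range(1, n + 1) if n % d == 0)


def entropy_lower_bound(m):
    """log 2 / m: the bound from a two-symbol horseshoe for the return iterate f^m."""
    return math.log(2) / m


for n in range(1, 9):
    pts = count_least_period_formula(n)
    print(n, 2 ** n, pts, pts // n)   # fixed points of sigma^n, least-period points, orbits
```

A periodic point of least period $n$ for the horseshoe map $f^m$ is periodic for $f$ with period dividing $mn$. So these counts bound the periodic orbits of $f$ from below, without identifying their exact periods.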
The hypotheses cannot be weakened without changing the conclusion. If the homoclinic intersection is tangential rather than transverse, the nested rectangles used to code binary sequences may collapse, and the resulting dynamics can depend on a delicate unfolding rather than on a robust horseshoe. If the periodic point is not hyperbolic, there may be no stable and unstable manifolds with the contraction estimates needed to make distinct itineraries determine distinct orbits. The theorem also gives a subsystem, not a statistical description of typical nearby trajectories; points outside the maximal invariant set in the strips may enter the neighbourhood only briefly and then escape.
[example: Standard Map Near A Hyperbolic Periodic Point]
For the standard map $S_K:S^1\times \mathbb R\to S^1\times \mathbb R$ given by
\begin{align*}
y_{n+1}=y_n+K\sin x_n,
\end{align*}
and
\begin{align*}
x_{n+1}=x_n+y_{n+1}\pmod {2\pi},
\end{align*}
we can check explicitly that $(0,0)$ is a hyperbolic fixed point when $K>0$. Since $\sin 0=0$, the $y$-update gives
\begin{align*}
y_1=0+K\sin 0=0,
\end{align*}
and then the $x$-update gives
\begin{align*}
x_1=0+y_1=0\pmod {2\pi}.
\end{align*}
Thus $S_K(0,0)=(0,0)$.
Writing the map in the order $(x,y)$, we have
\begin{align*}
S_K(x,y)=(x+y+K\sin x,\ y+K\sin x).
\end{align*}
The partial derivatives are $\partial_x(x+y+K\sin x)=1+K\cos x$, $\partial_y(x+y+K\sin x)=1$, $\partial_x(y+K\sin x)=K\cos x$, and $\partial_y(y+K\sin x)=1$. Hence the derivative at $(0,0)$ is the linear map
\begin{align*}
(u,v)\mapsto ((1+K)u+v,\ Ku+v).
\end{align*}
An eigenvalue $t$ therefore satisfies
\begin{align*}
((1+K)-t)(1-t)-K=0.
\end{align*}
Expanding the left-hand side gives
\begin{align*}
((1+K)-t)(1-t)-K=(1+K)-(2+K)t+t^2-K.
\end{align*}
Combining the constant terms gives the characteristic equation
\begin{align*}
t^2-(2+K)t+1=0.
\end{align*}
Its discriminant is
\begin{align*}
(2+K)^2-4=K^2+4K=K(K+4).
\end{align*}
For $K>0$, this discriminant is positive, so the two eigenvalues are real and distinct. Their product is the constant term $1$, and their sum is $2+K>2$; therefore both are positive, one is larger than $1$, and the other is its reciprocal, hence lies between $0$ and $1$. Thus $(0,0)$ is a hyperbolic saddle fixed point.
The stable and unstable manifolds of this saddle are curves in the cylinder. If a branch of $W^u(0,0)$ wraps around the cylinder and meets a branch of $W^s(0,0)$ at a point $q\ne (0,0)$, then $q$ is a homoclinic point. If the tangent lines $T_qW^u(0,0)$ and $T_qW^s(0,0)$ are distinct, the homoclinic point is transverse, so the *Smale-Birkhoff Homoclinic Theorem* applies: some return iterate of the standard map contains a horseshoe near the homoclinic orbit. The alternating lobes cut out by the stable and unstable curves are the geometric pieces that become symbols in the resulting itinerary.
[/example]
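The eigenvalue computation can be cross-checked numerically. The following minimal sketch assumes NumPy and compares `numpy.linalg.eigvals` of the derivative at $(0,0)$ with the closed-form roots $\frac{(2+K)\pm\sqrt{K(K+4)}}{2}$ of $t^2-(2+K)t+1=0$. The sample values of $K$ are arbitrary.

```python
import math
import numpy as np


def standard_map(x, y, K):
    """One step of S_K in the order (x, y): update y first, then x mod 2*pi."""
    y_new = y + K * math.sin(x)
    x_new = (x + y_new) % (2 * math.pi)
    return x_new, y_new


def jacobian_at_origin(K):
    """Derivative of S_K at (0, 0), assembled from the partial derivatives in the text."""
    return np.array([[1.0 + K, 1.0], [K, 1.0]])


def saddle_eigenvalues(K):
    """Roots of t^2 - (2 + K) t + 1 = 0, smaller root first."""
    disc = math.sqrt(K * (K + 4))
    return ((2 + K - disc) / 2, (2 + K + disc) / 2)


for K in (0.1, 0.5, 1.0, 2.5):
    numeric = sorted(np.linalg.eigvals(jacobian_at_origin(K)).real)
    closed = saddle_eigenvalues(K)
    print(K, numeric, closed, closed[0] * closed[1])   # product is the determinant, 1
```

The product of the two roots equals $\det DS_K(0,0)=(1+K)-K=1$, reflecting area preservation. This is why one eigenvalue is the reciprocal of the other.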
This example is prototypical for area-preserving maps: invariant manifolds cannot simply terminate, and their recurrent intersections create a web of lobes. The theorem does not require a numerical picture, but the picture helps identify where the hypotheses should be checked.
## Homoclinic Geometry In Poincare Maps
How do homoclinic intersections arise in flows, where trajectories are continuous curves rather than iterates of a map? The standard method is to pass to a Poincare return map. A periodic orbit of the flow becomes a fixed point of the return map, and the stable and unstable manifolds of the periodic orbit meet the section in the stable and unstable manifolds of that fixed point.
[definition: Poincare Map Near A Periodic Orbit]
Let $\Phi:\mathbb R \times M \to M$ be a $C^r$ flow on a smooth manifold $M$, write $\Phi_t(x)=\Phi(t,x)$, and let $\Gamma$ be a hyperbolic periodic orbit. A Poincare section $\Sigma$ is a codimension-one submanifold transverse to the flow at a point of $\Gamma$. The Poincare map $P:U\subset \Sigma \to \Sigma$ is defined on an open set $U$ of points whose positive orbits return first to $\Sigma$, and sends each $x\in U$ to that first return point.
[/definition]
The Poincare map converts questions about nearby recurrent flow lines into questions about iterates. If the stable and unstable manifolds of $\Gamma$ intersect transversely in the flow, their traces on $\Sigma$ give transverse homoclinic points for $P$. This sets up the next theorem, which transfers the discrete homoclinic horseshoe theorem back to continuous time.
[quotetheorem:7754]
[citeproof:7754]
The flow case adds return times but no new symbolic obstruction once a valid return map has been constructed. The existence of the Poincare section and the domain of the first-return map are essential: if nearby orbits fail to return to the section, there is no discrete system to which the Smale-Birkhoff theorem can be applied. For example, an orbit may leave a small flow box and never hit the chosen section again, or it may hit it only after passing through a region where the first-return time is not defined continuously. In either case there is no local diffeomorphism on a common domain that can carry Markov strips through repeated returns.
Hyperbolicity of the periodic orbit is equally structural. Without it, the fixed point of the Poincare map may have a unit-modulus multiplier, so there may be no stable and unstable curves in the section and no uniform contraction or expansion with which to build a horseshoe. A centre-type periodic orbit can have nearby recurrent trajectories without producing transverse stable and unstable manifolds. Transversality in the section is also essential, because a tangential return of the flow gives a tangential homoclinic intersection for $P$, not the crossing geometry needed for Markov strips.
There is one further limitation in continuous time. The symbolic coding records successive returns to the section, while the return-time function determines how long the flow spends between symbols. If return times are unbounded or singular near the set under consideration, the suspension may have subtler metric or statistical behaviour even though the return map contains a horseshoe. This is why many physical examples are analysed through their Poincare maps first and only then interpreted back in the flow.
[example: Duffing Equation Poincare Map]
A standard periodically forced Duffing oscillator can be written, for example, as
\begin{align*}
\ddot x+\delta \dot x-x+x^3=\Gamma \cos(\omega t).
\end{align*}
Set $u=x$ and $v=\dot x$. Then the second-order equation becomes the first-order non-autonomous system
\begin{align*}
\dot u=v,\qquad \dot v=-\delta v+u-u^3+\Gamma \cos(\omega t).
\end{align*}
The forcing has period $T=2\pi/\omega$, because
\begin{align*}
\cos(\omega(t+T))=\cos(\omega t+\omega T)=\cos(\omega t+2\pi)=\cos(\omega t).
\end{align*}
Let $\Phi_t$ denote the flow map of this time-dependent system on the phase plane, starting at forcing phase $t=0$, and define the time-$T$ map
\begin{align*}
P(u_0,v_0)=\Phi_T(u_0,v_0).
\end{align*}
Existence, uniqueness, and smooth dependence on initial conditions imply that $P$ is a diffeomorphism onto its image, with inverse obtained by integrating the same equation backward for time $T$.
A $T$-periodic response $z_*(t)=(u_*(t),v_*(t))$ satisfies $z_*(T)=z_*(0)$, so if $z_0=z_*(0)$, then
\begin{align*}
P(z_0)=\Phi_T(z_0)=z_*(T)=z_*(0)=z_0.
\end{align*}
Thus the periodic response becomes a fixed point of the Poincare map. If this fixed point is hyperbolic, its stable and unstable manifolds are curves in the phase plane. Suppose these curves meet at a point $q\ne z_0$ and that their tangent lines at $q$ are distinct:
\begin{align*}
T_qW^s(z_0)+T_qW^u(z_0)=T_q\mathbb R^2.
\end{align*}
Then $q$ is a transverse homoclinic point for the time-$T$ map.
By the *Smale-Birkhoff Homoclinic Theorem*, some iterate $P^m$ contains a horseshoe near the homoclinic orbit. In particular, there is an invariant set on which $P^m$ realizes all binary itineraries. A binary sequence with period $r$ gives a point $z$ satisfying
\begin{align*}
P^{mr}(z)=z.
\end{align*}
Since $P$ is the time-$T$ map, this means
\begin{align*}
\Phi_{mrT}(z)=P^{mr}(z)=z.
\end{align*}
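Therefore the solution of the forced Duffing oscillator starting at $z$ at forcing phase $0$ is periodic with period $mrT$. More precisely, if $k$ is the least period of $z$ under $P$, then $k$ divides $mr$ and the response has period $kT$, an integer multiple of the forcing period $T$. Periodic binary sequences of arbitrarily large symbolic period give infinitely many such periodic responses.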
[/example]
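The time-$T$ map can be approximated numerically. Below is a minimal sketch using a hand-written classical fourth-order Runge-Kutta integrator. The parameter values $\delta=0.25$, $\Gamma=0.3$, $\omega=1$ and the step count are illustrative assumptions, not taken from the text.

The sketch checks two facts used above. First, $P$ is computed by integrating over one forcing period. Second, backward integration inverts $P$ up to discretisation error. Locating a hyperbolic fixed point of $P$ and its invariant manifolds would require further numerical work not shown here.

```python
import math


def duffing_rhs(t, u, v, delta, gamma, omega):
    """Vector field u' = v, v' = -delta v + u - u^3 + gamma cos(omega t)."""
    return v, -delta * v + u - u ** 3 + gamma * math.cos(omega * t)


def rk4_flow(u, v, t0, t1, params, steps=2000):
    """Classical RK4 from time t0 to t1 (integrates backward when t1 < t0)."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = duffing_rhs(t, u, v, *params)
        k2 = duffing_rhs(t + h / 2, u + h / 2 * k1[0], v + h / 2 * k1[1], *params)
        k3 = duffing_rhs(t + h / 2, u + h / 2 * k2[0], v + h / 2 * k2[1], *params)
        k4 = duffing_rhs(t + h, u + h * k3[0], v + h * k3[1], *params)
        u += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return u, v


def poincare_map(z, params):
    """Time-T map P started at forcing phase t = 0, with T = 2*pi/omega."""
    T = 2 * math.pi / params[2]
    return rk4_flow(z[0], z[1], 0.0, T, params)


def poincare_inverse(z, params):
    """Approximate P^{-1}: integrate the same equation backward from t = T to t = 0."""
    T = 2 * math.pi / params[2]
    return rk4_flow(z[0], z[1], T, 0.0, params)


params = (0.25, 0.3, 1.0)        # illustrative (delta, Gamma, omega)
z = (0.5, -0.2)
w = poincare_map(z, params)
print(w, poincare_inverse(w, params))   # the second point should be close to z
```

With $\Gamma=0$, the equilibrium $(1,0)$ of the unforced system is a fixed point of $P$, which gives a quick consistency check of the integrator.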
The Duffing example illustrates the common route from a differential equation to a chaotic return map. The forcing period supplies the section, and transverse splitting of invariant manifolds supplies the hypothesis needed for the horseshoe theorem.
[example: Periodically Forced Pendulum]
For the unforced pendulum in coordinates $(\theta,v)$, with $v=\dot\theta$, take
\begin{align*}
\dot\theta=v,\qquad \dot v=-\sin\theta.
\end{align*}
Its energy is
\begin{align*}
H(\theta,v)=\frac{1}{2}v^2+1-\cos\theta.
\end{align*}
Along any solution,
\begin{align*}
\frac{d}{dt}H(\theta(t),v(t))=v\dot v+\sin\theta\,\dot\theta.
\end{align*}
Substituting $\dot\theta=v$ and $\dot v=-\sin\theta$ gives
\begin{align*}
\frac{d}{dt}H(\theta(t),v(t))=v(-\sin\theta)+\sin\theta\,v=0.
\end{align*}
Thus the level sets of $H$ are invariant. At the saddle $(\pi,0)$ on the cylinder, the energy is
\begin{align*}
H(\pi,0)=\frac{1}{2}\cdot 0^2+1-\cos\pi=0+1-(-1)=2.
\end{align*}
The separatrix is therefore the level $H=2$, so
\begin{align*}
\frac{1}{2}v^2+1-\cos\theta=2.
\end{align*}
Subtracting $1-\cos\theta$ from both sides gives
\begin{align*}
\frac{1}{2}v^2=1+\cos\theta.
\end{align*}
Multiplying by $2$ gives
\begin{align*}
v^2=2(1+\cos\theta).
\end{align*}
Using $1+\cos\theta=2\cos^2(\theta/2)$ gives
\begin{align*}
v^2=4\cos^2(\theta/2).
\end{align*}
Hence the two separatrix branches are
\begin{align*}
v=\pm 2\cos(\theta/2),
\end{align*}
with endpoints at the saddle point on the cylinder.
Now add a small $T$-periodic forcing and pass to the time-$T$ Poincare map $P$ on a section of fixed forcing phase. Suppose the saddle persists as a hyperbolic fixed point $z_0$ of $P$, and suppose the split stable and unstable manifolds meet at a point $q\ne z_0$ with
\begin{align*}
T_qW^s(z_0)+T_qW^u(z_0)=T_q\Sigma.
\end{align*}
Then $q$ is a transverse homoclinic point for $P$. By the *Smale-Birkhoff Homoclinic Theorem*, some iterate $P^m$ contains a horseshoe near the homoclinic orbit. The two main regions separated by the former separatrix correspond to oscillatory motion inside the loop and rotational motion outside it, so the symbolic itinerary of the horseshoe records switching between oscillation-type and rotation-type passages.
[/example]
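The energy computation can be checked pointwise. The short sketch below, with illustrative helper names, evaluates $H$ along both branches $v=\pm 2\cos(\theta/2)$ for $\theta\in[-\pi,\pi]$. It also checks that $\nabla H$ is orthogonal to the vector field $(v,-\sin\theta)$, which is the computation $\frac{d}{dt}H=0$ above.

```python
import math


def energy(theta, v):
    """H(theta, v) = v^2 / 2 + 1 - cos(theta)."""
    return 0.5 * v * v + 1.0 - math.cos(theta)


def dH_dt(theta, v):
    """grad H . (theta', v') with theta' = v and v' = -sin(theta)."""
    return math.sin(theta) * v + v * (-math.sin(theta))


def separatrix_v(theta, branch=+1):
    """Upper (+1) or lower (-1) separatrix branch v = +-2 cos(theta / 2), |theta| <= pi."""
    return branch * 2.0 * math.cos(theta / 2.0)


grid = [-math.pi + k * (2 * math.pi / 100) for k in range(101)]
worst = max(abs(energy(th, separatrix_v(th, b)) - 2.0) for th in grid for b in (+1, -1))
print(worst)   # only rounding-level deviation from the saddle energy H = 2
```

Both branches reach $v=0$ at $\theta=\pm\pi$, which is the single saddle point on the cylinder.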
This example is a geometric template for separatrix chaos. A degenerate homoclinic loop in an integrable system can become a transverse homoclinic tangle when the perturbation splits the separatrix transversely, and the Lambda Lemma then propagates the first crossing into a full symbolic structure.
## Consequences And Perspective
What should be remembered from this chapter as the course moves toward entropy and invariant measures? The main point is that a transverse homoclinic orbit is not a marginal complication near a periodic orbit. It is a certificate for a hyperbolic invariant set with symbolic dynamics.
[explanation: From One Crossing To Many Orbits]
A single transverse homoclinic point gives infinitely many homoclinic points because all iterates of the point remain in $W^s(p)\cap W^u(p)$. The Lambda Lemma strengthens this observation by showing that pieces of unstable manifold return in shapes that approximate the local unstable manifold itself. Repeated crossings create nested strips, and nested strips encode bi-infinite symbol sequences.
This mechanism is local in the final stage but global in its origin. The orbit must leave a neighbourhood of $p$ and return, so the theorem detects the global folding of phase space. Once the return is transverse, hyperbolicity supplies enough control to turn the geometry into a shift system.
[/explanation]
The next chapters develop quantitative and measure-theoretic ways of detecting this orbit complexity. Topological entropy will measure the exponential growth of distinguishable orbit segments, while invariant measures will describe statistical behaviour supported on hyperbolic sets like the horseshoes constructed here.
Homoclinic intersections produce rich orbit complexity, but to use that complexity effectively we need to know which approximate orbits are genuine and which features persist under perturbation. The next chapter develops shadowing and structural stability to answer exactly that question.
# 6. Shadowing and Structural Stability
This chapter asks how much of hyperbolic dynamics survives when the data we see are imperfect. Earlier chapters treated exact orbits, symbolic codings, horseshoes, and hyperbolic sets; here we study sequences that only approximately follow the dynamics and ask when they are close to true orbits. The main answer is that hyperbolicity gives shadowing, expansivity, and robust topological structure, while numerical simulation must be interpreted through these stability results rather than as exact orbit computation.
## Approximate Orbits and Shadowing
A numerical orbit, an experimental trajectory, or a symbolic itinerary produced by rounding usually does not satisfy $x_{n+1}=f(x_n)$ exactly. The first question is therefore: when does a sequence with small step-by-step errors still represent a genuine dynamical behaviour of the system?
[definition: Pseudo-Orbit]
Let $(X,d)$ be a metric space and let $f:X\to X$ be a continuous map. For $\varepsilon>0$, an $\varepsilon$-pseudo-orbit for $f$ is a sequence $(x_n)_{n\ge 0}$ in $X$ such that
\begin{align*}
d(f(x_n),x_{n+1})<\varepsilon
\end{align*}
for every $n\ge 0$. If $f$ is a homeomorphism, a two-sided $\varepsilon$-pseudo-orbit is a sequence $(x_n)_{n\in\mathbb Z}$ satisfying the same condition for every $n\in\mathbb Z$.
[/definition]
The error condition is local in time: each step is almost correct, but errors may accumulate. This raises the next problem: we need a way to say that an approximate sequence is not merely close step by step, but is uniformly followed by one actual orbit.
[definition: Shadowing]
Let $(X,d)$ be a metric space and let $f:X\to X$ be continuous. A point $y\in X$ $\delta$-shadows a sequence $(x_n)_{n\ge 0}$ if
\begin{align*}
d(f^n(y),x_n)<\delta
\end{align*}
for every $n\ge 0$. The map $f$ has the shadowing property on an invariant set $\Lambda\subset X$ if for every $\delta>0$ there exists $\varepsilon>0$ such that every $\varepsilon$-pseudo-orbit in $\Lambda$ is $\delta$-shadowed by some point $y\in X$.
[/definition]
The shadowing point is not required by definition to lie in $\Lambda$, though for locally maximal hyperbolic sets it can be chosen in $\Lambda$. Before proving such a result in smooth dynamics, it is useful to see the same orbit-pasting problem in a symbolic system where coordinates record the whole history.
[example: Shadowing in the Full Shift]
Let $\Sigma_2=\{0,1\}^{\mathbb Z}$ and use the convention $(\sigma x)_i=x_{i+1}$. Fix an integer $M\ge 1$ and suppose $(x_n)_{n\in\mathbb Z}$ is an $\varepsilon$-pseudo-orbit with $\varepsilon<2^{-M}$. Since
\begin{align*}
d(\sigma x_n,x_{n+1})<2^{-M},
\end{align*}
the definition of the metric implies that $(\sigma x_n)_i=(x_{n+1})_i$ for every $|i|<M$. Because $(\sigma x_n)_i=(x_n)_{i+1}$, this gives
\begin{align*}
(x_{n+1})_i=(x_n)_{i+1}
\end{align*}
whenever $|i|<M$.
Define $y\in\Sigma_2$ by $y_n=(x_n)_0$. We show that $\sigma^n(y)$ agrees with $x_n$ on the central block $|i|<M$. If $0\le i<M$, the identity $(x_{m+1})_j=(x_m)_{j+1}$ with $m=n+i-1$ and $j=0$ gives
\begin{align*}
(x_{n+i})_0=(x_{n+i-1})_1.
\end{align*}
Applying the same identity with $m=n+i-2$ and $j=1$ gives
\begin{align*}
(x_{n+i-1})_1=(x_{n+i-2})_2.
\end{align*}
Continuing for $i$ steps, all intermediate indices are $0,1,\ldots,i-1$, hence have absolute value $<M$, so
\begin{align*}
(x_{n+i})_0=(x_n)_i.
\end{align*}
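If $-M<i<0$, write $i=-k$ with $1\le k<M$. For $1\le r\le k$, the identity with $m=n-r$ and $j=r-k-1$ gives
\begin{align*}
(x_{n-r+1})_{r-k-1}=(x_{n-r})_{r-k},
\end{align*}
which is valid because $|r-k-1|\le k<M$. For $r=1,\ldots,k$ these equalities read $(x_n)_{-k}=(x_{n-1})_{-k+1}$, then $(x_{n-1})_{-k+1}=(x_{n-2})_{-k+2}$, and so on, ending with $(x_{n-k+1})_{-1}=(x_{n-k})_0$. Chaining them yields
\begin{align*}
(x_n)_{-k}=(x_{n-k})_0.
\end{align*}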
Thus for every $|i|<M$,
\begin{align*}
(\sigma^n y)_i=y_{n+i}=(x_{n+i})_0=(x_n)_i.
\end{align*}
Therefore $\sigma^n(y)$ and $x_n$ agree on all coordinates $|i|<M$, so
\begin{align*}
d(\sigma^n(y),x_n)\le 2^{-M}
\end{align*}
for every $n\in\mathbb Z$. The pasted sequence $y$ is a genuine full-shift orbit whose iterates uniformly shadow the pseudo-orbit at the scale determined by $M$.
[/example]
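The pasting rule $y_n=(x_n)_0$ can be tested on a computer if points of $\Sigma_2$ are truncated to a finite coordinate window. The Python sketch below is illustrative only. The window size `W`, the random generator `pseudo_orbit`, and the treatment of times outside the finite pseudo-orbit (symbols are read off $x_0$ and $x_{T-1}$) are assumptions made here and are not part of the example. The code checks the property the argument actually uses: $\sigma x_n$ and $x_{n+1}$ agree on the block $|i|<M$, and afterwards $\sigma^n y$ and $x_n$ agree on the same block.

```python
import random


def pseudo_orbit(T, M, W, seed=0):
    """Hypothetical test data on the window -W <= i <= W (requires W >= M).

    x_{n+1} copies (sigma x_n)_i = (x_n)_{i+1} on |i| < M and is random on
    M <= |i| <= W, so every step is correct on the central block |i| < M."""
    rng = random.Random(seed)
    xs = [{i: rng.randint(0, 1) for i in range(-W, W + 1)}]
    for _ in range(T - 1):
        prev = xs[-1]
        xs.append({i: prev[i + 1] if abs(i) < M else rng.randint(0, 1)
                   for i in range(-W, W + 1)})
    return xs


def paste(xs):
    """y_k = (x_k)_0 for 0 <= k < T; outside that range the symbols are read
    off x_0 and x_{T-1} (a finite stand-in for the bi-infinite case)."""
    T, W = len(xs), max(xs[0])
    y = {k: xs[k][0] for k in range(T)}
    y.update({k: xs[0][k] for k in range(-W, 0)})
    y.update({k: xs[T - 1][k - T + 1] for k in range(T, T + W)})
    return y


def agrees_on_block(y, xs, M):
    """Check (sigma^n y)_i = y_{n+i} == (x_n)_i for every n and every |i| < M."""
    return all(y[n + i] == x[i] for n, x in enumerate(xs)
               for i in range(-M + 1, M))


xs = pseudo_orbit(T=40, M=4, W=6)
print(agrees_on_block(paste(xs), xs, M=4))  # True
```

The outcome does not depend on the random symbols placed outside the central block. This matches the example, where only agreement on $|i|<M$ enters the chaining argument.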
This example contains the main mechanism in a form without differentiability: local consistency of finite orbit segments lets us paste them into a single bi-infinite orbit. For smooth systems the question becomes geometric: what structure replaces symbolic coordinates and lets future and past errors be corrected independently? The needed structure is a uniform splitting into stable directions, which absorb future errors, and unstable directions, which absorb past errors.
[definition: Hyperbolic Set]
Let $M$ be a compact smooth manifold, let $f:M\to M$ be a $C^1$ diffeomorphism, and let $\Lambda\subset M$ be a compact $f$-invariant set. The set $\Lambda$ is hyperbolic for $f$ if there is a continuous splitting
\begin{align*}
T_xM=E_x^s\oplus E_x^u
\end{align*}
for each $x\in\Lambda$, invariant in the sense that $Df_x(E_x^s)=E_{f(x)}^s$ and $Df_x(E_x^u)=E_{f(x)}^u$, together with constants $C\ge 1$ and $0<\lambda<1$ such that for every $n\ge 0$,
\begin{align*}
|Df_x^n v|\le C\lambda^n |v| \quad \text{for }v\in E_x^s
\end{align*}
and
\begin{align*}
|Df_x^{-n} w|\le C\lambda^n |w| \quad \text{for }w\in E_x^u.
\end{align*}
[/definition]
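For a linear map, the two estimates can be checked directly on the eigenlines. The sketch below assumes a real $2\times 2$ matrix with one eigenvalue inside the unit circle and one outside. It uses the Euclidean norm and takes $C=1$ and $\lambda=\max(|\mu_s|,|\mu_u|^{-1})$. The test input is the matrix of $A(a,b)=(2a+b,a+b)$ from the cat map examples later in this chapter, and the helper names are hypothetical. The loop stops at $n=10$ because rounding errors in the computed stable vector are themselves amplified by the unstable eigenvalue. This is the same effect discussed for numerical pseudo-orbits of the cat map.

```python
import math


def splitting(A):
    """Eigen-data of a real 2x2 matrix ((a, b), (c, d)) with b != 0 and real
    eigenvalues |mu_s| < 1 < |mu_u| (assumed here, not checked)."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    root = math.sqrt(tr * tr - 4 * det)
    mus = sorted([(tr - root) / 2, (tr + root) / 2], key=abs)
    vecs = [(b, mu - a) for mu in mus]   # (A - mu I)(b, mu - a) = 0 when b != 0
    return mus[0], mus[1], vecs[0], vecs[1]


def apply(A, v):
    (a, b), (c, d) = A
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])


def norm(v):
    return math.hypot(*v)


def check_estimates(A, C, lam, n_max=10, rtol=1e-6):
    """Test |A^n v| <= C lam^n |v| on E^s and |A^{-n} w| <= C lam^n |w| on E^u
    for 0 <= n <= n_max, allowing a small relative tolerance for rounding."""
    (a, b), (c, d) = A
    det = a * d - b * c
    A_inv = ((d / det, -b / det), (-c / det, a / det))
    _, _, v, w = splitting(A)
    fwd, bwd = v, w
    for n in range(n_max + 1):
        bound = C * lam ** n * (1 + rtol)
        if norm(fwd) > bound * norm(v) or norm(bwd) > bound * norm(w):
            return False
        fwd, bwd = apply(A, fwd), apply(A_inv, bwd)
    return True


A = ((2.0, 1.0), (1.0, 1.0))          # matrix of the cat map examples
mu_s, mu_u, _, _ = splitting(A)
lam = max(abs(mu_s), 1 / abs(mu_u))   # both equal (3 - sqrt(5)) / 2 here
print(round(lam, 6), check_estimates(A, C=1.0, lam=lam))  # 0.381966 True
```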
Hyperbolicity is designed to answer the shadowing problem because it gives estimates in opposite time directions. The next theorem is the central result of the section: every sufficiently accurate pseudo-orbit inside a compact hyperbolic set is not an artefact, but is uniformly traced by a genuine orbit.
[quotetheorem:7755]
[proofunderconstruction:7755]
This result shows why hyperbolicity is the structural hypothesis rather than a convenient technical assumption. The stable and unstable estimates give a bounded inverse to the linearized error equation, and the nonlinear terms are controlled by taking the pseudo-orbit error small. Without such a splitting, stepwise errors need not be correctable in a bounded way: an irrational rotation of the circle is an isometry, and a pseudo-orbit with a tiny systematic drift can move away from every genuine rotation orbit by an amount growing linearly in time before returning modulo $1$. Compactness is also part of the uniform statement, because the same constants $C$, $\lambda$, coordinate sizes, and shadowing thresholds must work along the whole set rather than at a single orbit.
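The rotation counterexample can be made concrete. For $R(x)=x+\alpha$, the pseudo-orbit $x_n=n(\alpha+\varepsilon)\bmod 1$ has every step error equal to $\varepsilon$. For every starting point $y$, however, $d(R^n(y),x_n)=\|y-n\varepsilon\|$, and once $n\varepsilon$ has swept the circle the largest of these distances is at least $1/2-\varepsilon/2$. The Python sketch below illustrates this. Its names and parameters are hypothetical, and because it scans only a finite grid of starting points it illustrates the claim rather than proving it.

```python
import math


def circle_dist(a, b):
    """Distance on the circle R/Z."""
    t = (a - b) % 1.0
    return min(t, 1.0 - t)


def drifting_pseudo_orbit(alpha, eps, steps):
    """x_n = n(alpha + eps) mod 1: each step of R(x) = x + alpha is off by eps."""
    return [(n * (alpha + eps)) % 1.0 for n in range(steps)]


def best_shadowing_error(alpha, xs, grid=500):
    """Minimum over grid starting points y of max_n d(R^n(y), x_n)."""
    return min(
        max(circle_dist(k / grid + n * alpha, x) for n, x in enumerate(xs))
        for k in range(grid))


alpha, eps = math.sqrt(2) - 1, 0.01
xs = drifting_pseudo_orbit(alpha, eps, steps=100)
step_errors = [circle_dist(x + alpha, x_next) for x, x_next in zip(xs, xs[1:])]
print(max(step_errors), best_shadowing_error(alpha, xs))  # roughly 0.01 and 0.5
```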
The local maximality clause should be read carefully. Shadowing on a hyperbolic set gives an orbit near the pseudo-orbit, but without an isolating neighbourhood that orbit may lie near $\Lambda$ without belonging to $\Lambda$. Thus shadowing does not say that every nearby approximate computation has the intended invariant set as its state space, nor does it preserve labels such as a chosen initial condition; it only produces a true orbit with the same coarse itinerary. The next section adds expansivity precisely to control when such a nearby orbit is forced to be unique.
[remark: Shadowing Is Not Uniqueness]
Shadowing asserts existence of a true orbit near a pseudo-orbit. It does not by itself say that the shadowing orbit is unique. Uniqueness requires a separation principle such as expansivity, introduced next.
[/remark]
## Expansivity and the Uniqueness of Orbits
If two exact orbits remain close forever, should they be regarded as the same orbit? In chaotic systems nearby initial points may separate rapidly, so a uniform failure to separate in both time directions is a strong constraint.
[definition: Expansive Map]
Let $(X,d)$ be a metric space and let $f:X\to X$ be a homeomorphism. The map $f$ is expansive on an invariant set $\Lambda\subset X$ if there exists $c>0$ such that whenever $x,y\in\Lambda$ satisfy
\begin{align*}
d(f^n(x),f^n(y))<c
\end{align*}
for every $n\in\mathbb Z$, then $x=y$. Such a number $c$ is called an expansivity constant.
[/definition]
Expansivity should be read as a two-sided property. Forward closeness alone only places $y$ in the local stable set of $x$, while backward closeness only places it in the local unstable set. The question is whether hyperbolicity forces the two constraints together so strongly that a point close in all time must coincide with the reference point.
[quotetheorem:7756]
[citeproof:7756]
Shadowing plus expansivity gives a powerful interpretation of approximate data. Shadowing says approximate orbits come from genuine orbits, while expansivity says that sufficiently accurate two-sided data identify the genuine orbit uniquely. The two-sided condition cannot be weakened to forward time: for a hyperbolic fixed point, any different point on its local stable manifold remains close under forward iteration and even converges to it, so forward closeness records stable membership rather than equality. Hyperbolicity is also essential. The identity map and an isometric rotation have distinct points whose orbits remain uniformly close for all positive and negative times, so neither is expansive on an infinite compact invariant set, because such a set always contains distinct points at arbitrarily small distance.
Expansivity should also not be confused with orbit production. By itself it does not give shadowing, dense periodic points, mixing, or a symbolic coding; it is a separation statement for exact orbits already known to exist. Its main role here is therefore complementary: after shadowing supplies a candidate true orbit, expansivity says that sufficiently fine two-sided data cannot be shadowed by two different candidates inside the hyperbolic set.
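The two-sided nature of expansivity is visible in the linear saddle $(a,b)\mapsto(2a,b/2)$, chosen here as a local model of a hyperbolic fixed point. The sketch below searches $n\in[-n_{\max},n_{\max}]$ for the first time at which a displacement $v$ reaches size $c$ in the sup norm. A displacement along the stable axis never escapes in forward time, but it does escape backward, so for this planar model every $c>0$ works as an expansivity constant. On a compact space such as the torus, wrap-around forces $c$ to be small, which this planar sketch ignores. The function name is hypothetical.

```python
def escape_time(v, c, n_max=200):
    """Saddle A(a, b) = (2a, b/2). Return the n with smallest |n| (forward time
    tried first) such that |A^n v| >= c in the sup norm, or None if no
    n in [-n_max, n_max] works."""
    a, b = v
    for k in range(n_max + 1):
        for n in (k, -k):
            if max(abs(a) * 2.0 ** n, abs(b) * 2.0 ** (-n)) >= c:
                return n
    return None


c = 0.1
print(escape_time((1e-6, 0.0), c))  # 17: unstable displacement leaves forward
print(escape_time((0.0, 1e-6), c))  # -17: stable displacement leaves backward
print(escape_time((0.0, 0.0), c))   # None: only the point itself stays close
```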
[example: Numerical Pseudo-Orbits for the Cat Map]
Let $f:\mathbb T^2\to\mathbb T^2$ be induced by $A(a,b)=(2a+b,a+b)$ modulo $\mathbb Z^2$. Its characteristic polynomial is
\begin{align*}
\det(A-\lambda I)=(2-\lambda)(1-\lambda)-1
\end{align*}
so
\begin{align*}
(2-\lambda)(1-\lambda)-1=2-3\lambda+\lambda^2-1=\lambda^2-3\lambda+1.
\end{align*}
Thus the eigenvalues are
\begin{align*}
\lambda_\pm=\frac{3\pm\sqrt{9-4}}{2}=\frac{3\pm\sqrt5}{2}.
\end{align*}
Since $\lambda_+>1$ and $0<\lambda_-<1$, no eigenvalue lies on the unit circle, so this toral automorphism is hyperbolic.
Suppose the computed points satisfy
\begin{align*}
d(x_{n+1},f(x_n))<\varepsilon
\end{align*}
at each computed step. Then $(x_n)$ is an $\varepsilon$-pseudo-orbit by the definition of pseudo-orbit. If $\varepsilon$ is below the threshold supplied by the *[Shadowing Lemma for Hyperbolic Sets](/theorems/7755)* for a chosen accuracy $\delta$, there is a true orbit $f^n(y)$ such that
\begin{align*}
d(f^n(y),x_n)<\delta
\end{align*}
for every time $n$ in the range where the pseudo-orbit hypothesis is valid. The computed trajectory is therefore reliable as an orbit of the dynamical system up to shadowing accuracy, but not necessarily as the orbit of the originally typed initial condition.
The instability of recovering the initial condition is visible from the expanding eigenvalue. If two lifted initial points differ by $t v_+$ in the unstable eigendirection, then after $n$ linear iterates their lifted difference is
\begin{align*}
A^n(t v_+)=tA^n v_+=t\lambda_+^n v_+.
\end{align*}
Thus an initial uncertainty of size $|t|$ in the unstable direction becomes size $|t|\lambda_+^n|v_+|$ before quotienting modulo $\mathbb Z^2$. Expansivity says that sufficiently accurate two-sided exact orbit data identify at most one orbit, but finite rounded forward data can be shadowed by a true orbit whose initial point is not the intended one.
[/example]
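The gap between a faithful pseudo-orbit and the orbit of the typed initial condition can be observed directly. The sketch below iterates the cat map in floating point, starting from the doubles nearest to $(1/3,1/7)$. It measures each step error against $f$ applied exactly, using Python's `fractions.Fraction`, and compares the computed points with the exact rational orbit of $(1/3,1/7)$. The function names and the choice of the sup-norm metric on $\mathbb T^2$ are assumptions made here.

```python
from fractions import Fraction


def cat(p):
    """One step of f(a, b) = (2a + b, a + b) mod 1; exact on Fractions,
    rounded on floats."""
    a, b = p
    return ((2 * a + b) % 1, (a + b) % 1)


def torus_dist(p, q):
    """Sup-norm distance on R^2 / Z^2."""
    ds = [abs(u - v) % 1 for u, v in zip(p, q)]
    return float(max(min(d, 1 - d) for d in ds))


def compare(a0, b0, steps=60):
    """Return (largest one-step error of the float orbit,
    largest distance between the float orbit and the exact orbit of (a0, b0))."""
    exact = (Fraction(a0), Fraction(b0))
    x = (float(exact[0]), float(exact[1]))
    step_err = drift = 0.0
    for _ in range(steps):
        x_next = cat(x)
        # f applied exactly to the current float point (Fraction(float) is exact)
        f_x = cat((Fraction(x[0]), Fraction(x[1])))
        step_err = max(step_err, torus_dist(x_next, f_x))
        exact, x = cat(exact), x_next
        drift = max(drift, torus_dist(x, exact))
    return step_err, drift


step_err, drift = compare(Fraction(1, 3), Fraction(1, 7))
print(f"{step_err:.1e}  {drift:.2f}")
```

Expect step errors at the level of double-precision rounding, so the computed sequence is an $\varepsilon$-pseudo-orbit with very small $\varepsilon$. The drift, by contrast, should be of order one, in line with growth by the factor $\lambda_+\approx 2.618$ per step.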
The cat map example is the standard warning about numerical chaos. A long simulation may be a faithful orbit of the system without being the orbit of the initial condition typed into the computer.
## Specification and Orbit Pasting
Shadowing deals with a single approximate orbit. A stronger question asks whether several genuine orbit pieces, chosen independently, can be pasted together into one genuine orbit with short transition times.
[definition: Specification Property]
Let $(X,d)$ be a compact metric space and let $f:X\to X$ be continuous. The map $f$ has the specification property if for every $\delta>0$ there exists an integer $N\ge 1$ such that for any finite collection of orbit segments $\{(x_j,a_j,b_j):1\le j\le k\}$ with $a_j\le b_j$ for every $j$ and $a_{j+1}-b_j\ge N$ for $1\le j<k$, there exists $y\in X$ such that
\begin{align*}
d(f^n(y),f^{n-a_j}(x_j))<\delta
\end{align*}
for every $j$ and every $a_j\le n\le b_j$.
[/definition]
The transition gap $N$ is independent of the number and length of orbit pieces. Thus specification is a uniform orbit-pasting principle, stronger than topological transitivity and closely tied to mixing in hyperbolic dynamics.
[example: Specification in a Mixing Shift of Finite Type]
Let $\Sigma_A$ be a mixing subshift of finite type with alphabet $\{1,\ldots,r\}$ and transition matrix $A$. Since $\Sigma_A$ is mixing, the matrix $A$ is primitive: there is an integer $N\ge 1$ such that for every $L\ge N$ and every pair of symbols $p,q$, the entry $(A^L)_{pq}$ is positive. By the path-counting interpretation of powers of an adjacency matrix, $(A^L)_{pq}>0$ means that there are symbols $c_1,\ldots,c_{L-1}$ such that
\begin{align*}
A_{p c_1}A_{c_1 c_2}\cdots A_{c_{L-2}c_{L-1}}A_{c_{L-1}q}=1.
\end{align*}
Thus any symbol $p$ can be connected to any symbol $q$ by an admissible word of transition length $L$ whenever $L\ge N$.
Now take finitely many admissible orbit words placed on intervals $[a_j,b_j]$, with $a_{j+1}-b_j\ge N$. Let $p_j$ be the final symbol of the word on $[a_j,b_j]$, and let $q_{j+1}$ be the initial symbol of the word on $[a_{j+1},b_{j+1}]$. For the gap length
\begin{align*}
L_j=a_{j+1}-b_j,
\end{align*}
we have $L_j\ge N$, so $(A^{L_j})_{p_j q_{j+1}}>0$. Hence there are symbols $c_{j,1},\ldots,c_{j,L_j-1}$ with
\begin{align*}
A_{p_j c_{j,1}}A_{c_{j,1}c_{j,2}}\cdots A_{c_{j,L_j-2}c_{j,L_j-1}}A_{c_{j,L_j-1}q_{j+1}}=1.
\end{align*}
Placing these symbols in the empty coordinates between $b_j$ and $a_{j+1}$ makes every adjacent transition across the gap admissible.
After inserting all connecting words, we have one finite admissible block containing every prescribed word in its assigned coordinates. To extend it to a bi-infinite point $y\in\Sigma_A$, choose an admissible connecting word from the last symbol of the block back to its first symbol and repeat the resulting cycle periodically; mixing gives such a connector by the same positivity argument. The sequence $y$ agrees exactly with each prescribed word on its assigned interval. To obtain the metric estimate in the definition for a given $\delta$, choose $M$ with $2^{-M}<\delta$ and copy each $x_j$ on the enlarged window $[a_j-M,b_j+M]$ instead of $[a_j,b_j]$. Then $\sigma^n(y)$ and $\sigma^{n-a_j}(x_j)$ agree on all coordinates $|i|\le M$ for $a_j\le n\le b_j$, so their distance is less than $\delta$. The price is a transition gap of $N+2M$, which still depends only on $\delta$. This is specification in symbolic form: separated admissible pieces can be pasted into one genuine orbit with a uniform transition bound.
[/example]
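The connecting-word argument is constructive. The Python sketch below uses hypothetical helper names and writes the symbols as $0,\ldots,r-1$ instead of $1,\ldots,r$. It finds the smallest $L$ with $A^L>0$; for a primitive matrix, every higher power is then positive too. It builds connectors by backward reachability and pastes prescribed words at prescribed positions. It produces only the finite block; the periodic extension described above is not implemented.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mixing_gap(A, L_max=100):
    """Smallest L with all entries of A^L positive, or None if L > L_max."""
    P = A
    for L in range(1, L_max + 1):
        if all(x > 0 for row in P for x in row):
            return L
        P = matmul(P, A)
    return None


def connector(A, p, q, L):
    """Symbols c_1..c_{L-1} making p c_1 ... c_{L-1} q admissible, or None."""
    r = len(A)
    reach = [{q}]                     # reach[k] = symbols that reach q in k steps
    for _ in range(L):
        reach.append({s for s in range(r) if any(A[s][t] for t in reach[-1])})
    if p not in reach[L]:
        return None
    word, cur = [], p
    for k in range(L - 1, 0, -1):
        cur = next(t for t in sorted(reach[k]) if A[cur][t])
        word.append(cur)
    return word


def paste(A, pieces):
    """pieces = [(a_j, word_j), ...] in increasing order of a_j, with gaps of
    at least mixing_gap(A). Returns (a_1, block) with word_j starting at a_j."""
    start, block = pieces[0][0], list(pieces[0][1])
    for a, w in pieces[1:]:
        b = start + len(block) - 1    # coordinate of the current last symbol
        block += connector(A, block[-1], w[0], a - b) + list(w)
    return start, block


def admissible(A, block):
    return all(A[s][t] for s, t in zip(block, block[1:]))


A = [[1, 1], [1, 0]]                  # golden mean shift: "1 1" is forbidden
pieces = [(0, [1, 0, 1]), (5, [1]), (8, [0, 1, 0, 1])]
start, block = paste(A, pieces)
print(mixing_gap(A), block, admissible(A, block))
# 2 [1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1] True
```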
This example links specification back to the symbolic dynamics developed earlier in the course. In hyperbolic systems, Markov partitions transfer symbolic orbit-pasting to geometric orbit-pasting on basic sets.
[remark: Specification and Periodic Orbits]
When a system has specification, orbit segments can often be shadowed by periodic points by adding a final transition back to the start. This is one reason periodic orbit sums appear throughout thermodynamic formalism and entropy theory. The exact periodic version of specification requires a closing condition in the definition or an additional argument from shadowing.
[/remark]
## Structural Stability of Hyperbolic Systems
The next problem is not about errors in individual orbits, but about errors in the system itself. If $g$ is a small $C^1$ perturbation of $f$, do the qualitative dynamics of $g$ match those of $f$?
[definition: Topological Conjugacy]
Let $X$ and $Y$ be topological spaces, and let $f:X\to X$ and $g:Y\to Y$ be continuous maps. A topological conjugacy from $f$ to $g$ is a homeomorphism $h:X\to Y$ such that
\begin{align*}
h\circ f=g\circ h.
\end{align*}
[/definition]
A conjugacy identifies entire dynamical systems, not just individual orbits. It preserves orbit structure, the periods of periodic points, recurrence, transitivity, and topological entropy. To turn this into a stability property, we need to specify which nearby systems must be conjugate to the original one.
[definition: Structural Stability]
Let $M$ be a compact smooth manifold and let $f:M\to M$ be a $C^1$ diffeomorphism. The diffeomorphism $f$ is $C^1$ structurally stable if there is a $C^1$ neighbourhood $\mathcal U$ of $f$ such that for every $g\in\mathcal U$, there exists a homeomorphism $h:M\to M$ satisfying
\begin{align*}
h\circ f=g\circ h.
\end{align*}
[/definition]
Structural stability is a topological conclusion from a smooth hypothesis. The conjugating map is generally not differentiable, because Lyapunov exponents and smooth expansion rates may change under perturbation. The next issue is to identify a dynamical hypothesis strong enough to force this stability for the whole phase space.
[definition: Anosov Diffeomorphism]
Let $M$ be a compact smooth manifold and let $f:M\to M$ be a $C^1$ diffeomorphism. The map $f$ is an Anosov diffeomorphism if the whole manifold $M$ is a hyperbolic set for $f$.
[/definition]
For Anosov systems there is no non-hyperbolic region where perturbations can create unrelated behaviour. This global hyperbolicity puts every orbit under the reach of shadowing and every pair of persistently close orbits under the reach of expansivity. The structural stability theorem is the formal expression of that mechanism.
[quotetheorem:7757]
[citeproof:7757]
This result also explains why shadowing and expansivity were introduced together. Shadowing constructs the candidate conjugacy, while expansivity prevents two nearby candidates from competing. The Anosov hypothesis cannot be replaced by arbitrary recurrence or transitivity: a diffeomorphism with a non-hyperbolic fixed point may lose, split, or create nearby fixed points after an arbitrarily small perturbation, so there is no reason for the perturbed phase portrait to be conjugate to the original one. Compactness is used to make the perturbation size, shadowing scale, and expansivity scale uniform over all of $M$; on noncompact spaces, perturbations can be small on bounded regions while changing dynamics far away.
The topology on perturbations matters as well. $C^0$ closeness only compares point images, while hyperbolicity and invariant cone estimates depend on derivatives, so small $C^0$ perturbations can destroy expansion and contraction rates. The conclusion is topological rather than smooth: the conjugacy is generally only a homeomorphism, because smooth data such as Lyapunov exponents, eigenvalues at periodic points, and differentiable invariant foliations can vary under $C^1$ perturbation even when the orbit structure is conjugate.
[example: Toral Automorphism Under Small Perturbation]
Let $f:\mathbb T^2\to\mathbb T^2$ be induced by the linear map
\begin{align*}
A(a,b)=(2a+b,a+b).
\end{align*}
Because $A$ has integer coefficients, $A(\mathbb Z^2)\subset \mathbb Z^2$, so it descends to a continuous map on $\mathbb R^2/\mathbb Z^2=\mathbb T^2$. Its determinant is
\begin{align*}
\det A=2\cdot 1-1\cdot 1=1.
\end{align*}
The inverse linear map $A^{-1}(a,b)=(a-b,-a+2b)$ also has integer coefficients, so $f$ is a diffeomorphism of the torus.
We compute the eigenvalues from the characteristic polynomial:
\begin{align*}
\det(A-\lambda I)=(2-\lambda)(1-\lambda)-1.
\end{align*}
Expanding the product gives
\begin{align*}
(2-\lambda)(1-\lambda)-1=2-2\lambda-\lambda+\lambda^2-1=\lambda^2-3\lambda+1.
\end{align*}
Thus the eigenvalues are
\begin{align*}
\lambda_\pm=\frac{3\pm\sqrt{(-3)^2-4\cdot 1\cdot 1}}{2}=\frac{3\pm\sqrt5}{2}.
\end{align*}
Now $\lambda_+=(3+\sqrt5)/2>1$, while $0<\lambda_-=(3-\sqrt5)/2<1$ because $1<\sqrt5<3$. Hence neither eigenvalue has modulus $1$, so this toral automorphism is hyperbolic; since the whole manifold $\mathbb T^2$ is the hyperbolic set, $f$ is Anosov.
If $g$ is a $C^1$ diffeomorphism sufficiently close to $f$, the *[Anosov Structural Stability Theorem](/theorems/7757)* gives a homeomorphism $h:\mathbb T^2\to\mathbb T^2$ satisfying
\begin{align*}
h\circ f=g\circ h.
\end{align*}
Applying this identity to $x\in\mathbb T^2$ gives
\begin{align*}
h(f(x))=g(h(x)).
\end{align*}
Applying the same identity to $f(x)$ gives
\begin{align*}
h(f^2(x))=g(h(f(x)))=g(g(h(x)))=g^2(h(x)).
\end{align*}
Repeating this induction gives
\begin{align*}
h(f^n(x))=g^n(h(x))
\end{align*}
for every $n\ge 0$. Since $f$ and $g$ are diffeomorphisms and $h\circ f=g\circ h$, composing with inverse maps gives the same orbit correspondence for negative $n$. Thus $h$ sends each full orbit of the cat map to a full orbit of the perturbed map with the same time ordering. The perturbation may bend the stable and unstable directions and make the derivative depend on the point, but it does not change the topological orbit structure.
[/example]
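Conjugacy gives one concrete, checkable consequence: $h$ maps the fixed points of $f^n$ bijectively onto those of $g^n$. Every $g$ covered by the theorem therefore has exactly as many points of period dividing $n$ as the cat map. For this toral automorphism the count is $|\det(A^n-I)|=\operatorname{tr}(A^n)-2$, since $\det A=1$ gives $\det(A^n-I)=2-\operatorname{tr}(A^n)$. The sketch below, with hypothetical function names, compares that formula with a brute-force count of solutions of $A^nx\equiv x\pmod{\mathbb Z^2}$ on the grid $D^{-1}\mathbb Z^2$. This grid contains all solutions because $(A^n-I)^{-1}$ has denominator dividing $D=|\det(A^n-I)|$.

```python
def mat_pow(M, n):
    """n-th power of a 2x2 integer matrix by repeated multiplication."""
    R = [[1, 0], [0, 1]]
    for _ in range(n):
        R = [[R[0][0] * M[0][0] + R[0][1] * M[1][0],
              R[0][0] * M[0][1] + R[0][1] * M[1][1]],
             [R[1][0] * M[0][0] + R[1][1] * M[1][0],
              R[1][0] * M[0][1] + R[1][1] * M[1][1]]]
    return R


def fix_count_formula(A, n):
    """|det(A^n - I)|, the number of fixed points of f^n on the torus."""
    B = mat_pow(A, n)
    return abs((B[0][0] - 1) * (B[1][1] - 1) - B[0][1] * B[1][0])


def fix_count_brute(A, n):
    """Count grid points (i/D, j/D) in [0,1)^2 with A^n x = x mod Z^2."""
    B = mat_pow(A, n)
    D = fix_count_formula(A, n)
    count = 0
    for i in range(D):
        for j in range(D):
            # (A^n - I)(i, j) must be divisible by D in both coordinates
            if ((B[0][0] - 1) * i + B[0][1] * j) % D == 0 and \
               (B[1][0] * i + (B[1][1] - 1) * j) % D == 0:
                count += 1
    return count


A = [[2, 1], [1, 1]]
print([fix_count_formula(A, n) for n in range(1, 6)])  # [1, 5, 16, 45, 121]
```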
The example shows the sense in which hyperbolic chaos is robust. Small modelling errors may bend invariant directions and move periodic points, but they do not destroy the qualitative topological dynamics.
## Axiom A and Spectral Decomposition
Anosov systems are globally hyperbolic, but many important systems have hyperbolicity only on the recurrent part of phase space. The final question of the chapter is how structural organisation survives in this broader setting.
[definition: Nonwandering Set]
Let $X$ be a topological space and let $f:X\to X$ be continuous. A point $x\in X$ is nonwandering if for every neighbourhood $U$ of $x$ there exists $n\ge 1$ such that
\begin{align*}
f^n(U)\cap U\ne\varnothing.
\end{align*}
The nonwandering set of $f$ is denoted $\Omega(f)$.
[/definition]
The nonwandering set contains the part of the phase space that can return arbitrarily close to itself. For dissipative systems it may be a much smaller set than the whole manifold. The next definition asks for hyperbolicity precisely on this recurrent core, together with enough periodic orbits to make it accessible by finite data.
[definition: Axiom A Diffeomorphism]
Let $M$ be a compact smooth manifold and let $f:M\to M$ be a $C^1$ diffeomorphism. The map $f$ satisfies Axiom A if $\Omega(f)$ is a hyperbolic set and the periodic points of $f$ are dense in $\Omega(f)$.
[/definition]
Axiom A isolates the hyperbolic recurrent core, but that core can have several dynamically separate pieces. To state the decomposition theorem, we need a name for a single irreducible hyperbolic component: one that is locally maximal, transitive, and rich in periodic points.
[definition: Basic Set]
Let $f:M\to M$ be a $C^1$ diffeomorphism. A compact invariant set $\Lambda\subset M$ is a basic set if $\Lambda$ is hyperbolic, locally maximal, contains a dense orbit, and contains a dense set of periodic points.
[/definition]
Basic sets are the building blocks of Axiom A dynamics. Each behaves like one irreducible symbolic component, often modelled by a transitive subshift of finite type. This creates the final structural problem: given only the Axiom A hypotheses, do all recurrent behaviours split into finitely many basic pieces, or can infinitely many unrelated recurrent components remain? The spectral decomposition theorem answers that question by giving a finite canonical decomposition of $\Omega(f)$.
[quotetheorem:7758]
The course treats this result as a structural theorem whose proof uses Markov partitions, local product structure, and the density of periodic points to split the hyperbolic recurrent set into finitely many transitive components. The Axiom A assumptions are doing separate jobs. Hyperbolicity gives stable and unstable plaques and rules out neutral recurrent behaviour; for instance, an irrational circle rotation has every point nonwandering but has no periodic points and no finite decomposition into hyperbolic basic pieces. Density of periodic points prevents the recurrent set from containing aperiodic hyperbolic fragments invisible to periodic orbit data, which would break the symbolic and zeta-function methods used later in the course.
The theorem is also a decomposition theorem, not a full stability theorem. It organises the recurrent dynamics into finitely many basic pieces, but it does not by itself control how orbits move from one piece to another through stable and unstable manifolds. That missing ordering information is exactly what the no-cycles condition supplies in the structural stability theory for Axiom A diffeomorphisms, so spectral decomposition is the bridge from global Anosov stability to the more flexible hyperbolic recurrent setting.
[remark: Structural Stability Beyond Anosov]
For Axiom A diffeomorphisms, full structural stability also requires a no-cycles condition controlling how stable and unstable manifolds of different basic sets interact. Without such a condition, the hyperbolic pieces persist but their connecting geometry can change under perturbation. The Anosov theorem is the special case where the whole manifold is one hyperbolic recurrent object.
[/remark]
The chapter leaves us with a practical interpretation of chaos. Hyperbolic systems are sensitive to initial conditions, yet their qualitative dynamics are stable under small errors in both orbits and equations. This is why symbolic models, periodic orbit data, and numerical pseudo-orbits can describe real hyperbolic dynamics without requiring exact pointwise computation.
Shadowing and structural stability show that hyperbolic chaos can be detected from approximate data and survives small perturbations. With that robustness in hand, the course now measures complexity more directly through topological entropy.
# 7. Topological Entropy
Topological entropy assigns a number to a dynamical system by measuring how many distinguishable orbit segments it produces as time grows. Chapters 1 through 6 used recurrence, symbolic codings, horseshoes, hyperbolic sets, homoclinic intersections, and shadowing to describe chaotic behaviour qualitatively; entropy turns the same phenomena into an exponential growth rate. In this chapter we build entropy first for compact metric maps, then compare the metric definitions with the open-cover definition, compute it for shifts and standard examples, and finally state the [variational principle](/theorems/7763) as the bridge to invariant measures.
## Measuring Orbit Complexity
How can two points be regarded as dynamically different if they start close together but separate later? The metric definition of entropy answers this by comparing finite orbit segments rather than individual points. Throughout this section let $(X,d)$ be a compact metric space and let $f:X\to X$ be continuous.
For a fixed observation time $n$, the ordinary metric is replaced by the largest distance seen during the first $n$ iterates.
[definition: Bowen Metric]
For $n\in\mathbb N$, the Bowen metric is the map $d_n:X\times X\to\mathbb R$ defined by
\begin{align*}
d_n(x,y)=\max_{0\le j<n} d(f^j(x),f^j(y)).
\end{align*}
[/definition]
This metric records whether two points have remained close for the whole observed orbit segment. The first way to count complexity is to ask for a large set of orbit segments that can be told apart at scale $\varepsilon$. This gives a lower-bound style count, because every point chosen represents a genuinely different observed behaviour.
[definition: Separated Set]
Let $n\in\mathbb N$ and $\varepsilon>0$. A set $E\subset X$ is $(n,\varepsilon)$-separated if for all distinct $x,y\in E$,
\begin{align*}
d_n(x,y)>\varepsilon.
\end{align*}
The maximal cardinality of an $(n,\varepsilon)$-separated set is denoted by $s_n(f,\varepsilon)\in\mathbb N$.
[/definition]
Separated sets are finite because compactness makes $(X,d_n)$ totally bounded. The complementary counting problem is to cover all possible orbit segments by a smaller list of representatives. This motivates spanning sets, which give an upper-bound style count for the same orbit complexity.
[definition: Spanning Set]
Let $n\in\mathbb N$ and $\varepsilon>0$. A set $F\subset X$ is $(n,\varepsilon)$-spanning if for every $x\in X$ there exists $y\in F$ such that
\begin{align*}
d_n(x,y)<\varepsilon.
\end{align*}
The minimal cardinality of an $(n,\varepsilon)$-spanning set is denoted by $r_n(f,\varepsilon)\in\mathbb N$.
[/definition]
Separated and spanning sets look like different measurements: one packs orbit segments apart, the other covers all orbit segments by models. Entropy should not depend on which side of this packing-covering comparison is used. The next theorem supplies the needed equivalence at the level of exponential growth.
[quotetheorem:6803]
The equivalence means that the exponential growth rate is intrinsic to orbit separation, not to the chosen packing or covering convention. Compactness is used in a serious way: it makes the relevant separated and spanning numbers finite at each fixed scale, so the logarithmic growth rates are meaningful. Without compactness the counts can stop being finite: for the identity map on $\mathbb R$ with the usual metric, every infinite arithmetic progression with spacing greater than $\varepsilon$ is $(n,\varepsilon)$-separated for all $n$. Continuity ensures that the iterates used in $d_n$ interact well with the topology; the statement is not a general assertion about arbitrary self-maps on arbitrary metric spaces. The theorem also does not say that $r_n(f,\varepsilon)$ and $s_n(f,\varepsilon)$ are equal at finite time, only that their exponential growth rates agree after the scale is sent to zero. A single numerical invariant is now available, and this motivates the following definition of topological entropy for maps.
[definition: Topological Entropy Of A Map]
For a compact metric space $X$, topological entropy is the functional $h_{\mathrm{top}}:C(X,X)\to[0,\infty]$ defined by
\begin{align*}
h_{\mathrm{top}}(f)=\lim_{\varepsilon\downarrow0}\limsup_{n\to\infty}\frac{1}{n}\log s_n(f,\varepsilon).
\end{align*}
[/definition]
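Each $s_n(f,\varepsilon)$ is finite, but the growth rate can still be infinite for a continuous map of a compact metric space. For example, the shift on $[0,1]^{\mathbb Z}$ contains a copy of the full $k$-shift for every $k$, so its entropy is infinite. For Lipschitz maps of compact manifolds, by contrast, the entropy is always finite. A meaningful topological invariant must survive changes of coordinates, so the next result checks that conjugate systems have the same entropy.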
[quotetheorem:6805]
Conjugacy invariance separates entropy from geometric artefacts of a particular metric. The homeomorphism hypothesis matters: a non-invertible factor map can collapse many orbit segments, and entropy can drop when passing to a factor. For example, the constant map from the full two-shift onto a one-point system intertwines the dynamics, but the entropies are $\log 2$ and $0$. Compactness enters through [uniform continuity](/page/Uniform%20Continuity) of the conjugacy and its inverse, which is what allows scale comparisons to survive for all orbit lengths. The theorem also does not classify systems with equal entropy; many non-conjugate systems have the same entropy. The simplest systems to test are those with no stretching at all, where the Bowen metrics never create new separated orbit segments. This motivates the zero-entropy computation for isometries.
[example: Isometry Has Zero Entropy]
Let $X$ be a compact metric space and let $f:X\to X$ be an isometry. Since $f$ preserves distances, induction gives
\begin{align*}
d(f^j(x),f^j(y))=d(x,y)\quad\text{for every }j\ge0.
\end{align*}
Therefore the Bowen metric does not change with $n$:
\begin{align*}
d_n(x,y)=\max_{0\le j<n}d(f^j(x),f^j(y))=\max_{0\le j<n}d(x,y)=d(x,y).
\end{align*}
Fix $\varepsilon>0$, and let $M(\varepsilon)$ be the maximal cardinality of an $\varepsilon$-separated subset of $(X,d)$. This number is finite because compact metric spaces are totally bounded. Since $d_n=d$, an $(n,\varepsilon)$-separated set for $f$ is exactly an $\varepsilon$-separated set in $X$, so
\begin{align*}
s_n(f,\varepsilon)\le M(\varepsilon)\quad\text{for every }n.
\end{align*}
Taking logarithms and dividing by $n$ gives
\begin{align*}
0\le \frac{1}{n}\log s_n(f,\varepsilon)\le \frac{1}{n}\log M(\varepsilon).
\end{align*}
As $n\to\infty$, the right-hand side tends to $0$, hence
\begin{align*}
\limsup_{n\to\infty}\frac{1}{n}\log s_n(f,\varepsilon)=0.
\end{align*}
Letting $\varepsilon\downarrow0$ in the definition of entropy therefore gives
\begin{align*}
h_{\mathrm{top}}(f)=0.
\end{align*}
Thus an isometry may have recurrent or complicated-looking motion, such as rotations of compact groups, but it does not create exponentially many distinguishable orbit segments.
[/example]
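A quick numerical sanity check of this example is sketched below. It assumes the rotation number $(\sqrt5-1)/2$, the scale $\varepsilon=0.1$, and a uniform grid of $256$ points, all of which are illustrative choices. A greedy search for pairwise $(n,\varepsilon)$-separated grid points finds the same number at every $n$, because the Bowen metric of a rotation coincides with $d$. Greedy selection only gives a lower bound for $s_n(f,\varepsilon)$, not its exact value.

```python
import math


def circle_dist(x, y):
    """Metric on R/Z: d(x, y) = min over integers m of |x - y - m|."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)


def bowen_dist(f, x, y, n):
    """Bowen distance d_n(x, y) = max over 0 <= j < n of d(f^j x, f^j y)."""
    best = 0.0
    for _ in range(n):
        best = max(best, circle_dist(x, y))
        x, y = f(x), f(y)
    return best


def greedy_separated(f, n, eps, grid_size):
    """Greedily collect grid points k/grid_size that are pairwise (n, eps)-separated.

    The result is a lower bound for s_n(f, eps), not its exact value.
    """
    chosen = []
    for k in range(grid_size):
        x = k / grid_size
        if all(bowen_dist(f, x, y, n) > eps for y in chosen):
            chosen.append(x)
    return len(chosen)


ALPHA = (math.sqrt(5) - 1) / 2  # illustrative irrational rotation number


def rotation(x):
    """The isometry R_alpha(x) = x + alpha mod 1."""
    return (x + ALPHA) % 1.0


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, greedy_separated(rotation, n, eps=0.1, grid_size=256))
```

With these parameters the greedy count stays at $9$ for every $n$. This matches the bound $s_n(f,\varepsilon)\le M(\varepsilon)$, which does not depend on $n$.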
## The Open-Cover Definition
The metric definition is efficient for calculations, but entropy was originally formulated without choosing a metric. The question is how to count complexity using only the topology of the phase space. Open covers replace metric balls, and iterated refinements record which cover elements an orbit visits.
[definition: Open-Cover Entropy]
Let $X$ be compact and let $f:X\to X$ be continuous. For a finite open cover $\mathcal U$ of $X$, define
\begin{align*}
\mathcal U_0^{n-1}=\mathcal U\vee f^{-1}\mathcal U\vee\cdots\vee f^{-(n-1)}\mathcal U,
\end{align*}
where $\vee$ denotes common refinement. If $N(\mathcal V)$ is the smallest cardinality of a subcover of a finite open cover $\mathcal V$, set
\begin{align*}
h(f,\mathcal U)=\lim_{n\to\infty}\frac{1}{n}\log N(\mathcal U_0^{n-1})\in[0,\infty].
\end{align*}
The open-cover topological entropy is
\begin{align*}
h_{\mathrm{cov}}(f)=\sup_{\mathcal U} h(f,\mathcal U)\in[0,\infty],
\end{align*}
where the supremum is over finite open covers of $X$.
[/definition]
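The limit exists because $a_n=\log N(\mathcal U_0^{n-1})$ is subadditive in $n$. Indeed, $N(\mathcal V\vee\mathcal W)\le N(\mathcal V)N(\mathcal W)$ and $N(f^{-m}\mathcal V)\le N(\mathcal V)$, while $\mathcal U_0^{m+n-1}=\mathcal U_0^{m-1}\vee f^{-m}\mathcal U_0^{n-1}$. Hence $a_{m+n}\le a_m+a_n$, and Fekete's lemma gives $\lim_n a_n/n=\inf_n a_n/n$. In particular $h(f,\mathcal U)\le\log N(\mathcal U)$ is finite for each finite cover. Since this cover construction uses only topology, it should match Bowen entropy on compact metric spaces if the metric definition is genuinely topological. The next theorem proves that the two formalisms count the same orbit names at exponential scale.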
[quotetheorem:7759]
[citeproof:7759]
The theorem justifies writing $h_{\mathrm{top}}$ without indicating which definition is being used. Compact metrizability is essential in the comparison because the proof uses Lebesgue numbers and finite subcovers to pass between metric balls and open covers. Outside compact metric spaces, open-cover entropy and metric separated-set entropy require additional conventions and may disagree. The metric version can even depend on which of several metrics inducing the same topology is used. A concrete warning is the map $g(x)=2x$ on $(0,\infty)$. With the usual metric, separated sets inside a fixed compact interval grow like $2^n$, and the Bowen entropy computed over compact subsets is $\log2$. The metric $|\log x-\log y|$, however, induces the same topology and turns $g$ into translation by $\log 2$, an isometry, whose Bowen entropy is $0$. Compactness is what rules out such metric-dependent pathologies in the theorem. The result therefore explains why entropy is a topological invariant in the compact metric setting, even though the Bowen definition starts from a metric.
## Entropy Of Flows
For a continuous-time system, orbit segments have real length rather than integer length. The main issue is to normalize the growth rate per unit time and to ensure that the answer agrees with time-one maps.
[definition: Topological Entropy Of A Flow]
Let $(\varphi_t)_{t\in\mathbb R}$ be a continuous $\mathbb R$-action on a compact metric space $X$. For $T>0$, define $d_T:X\times X\to\mathbb R$ by
\begin{align*}
d_T(x,y)=\sup_{0\le t\le T} d(\varphi_t(x),\varphi_t(y)).
\end{align*}
The topological entropy of the flow is
\begin{align*}
h_{\mathrm{top}}(\varphi)=\lim_{\varepsilon\downarrow0}\limsup_{T\to\infty}\frac{1}{T}\log s_T(\varphi,\varepsilon),
\end{align*}
where $s_T(\varphi,\varepsilon)$ is the maximal cardinality of a $d_T$-separated set at scale $\varepsilon$.
[/definition]
This definition is compatible with sampling the flow at a fixed time. The normalization issue is that $n$ iterates of the time-$a$ map observe the continuous motion for time $an$. If entropy measures information per unit time, then sampling with time step $a$ should multiply the entropy by $a$ rather than create a new invariant unrelated to the flow.
[quotetheorem:7760]
[citeproof:7760]
The scaling formula gives a quick way to compute entropy for suspension flows once the base map is known. The restriction $a>0$ avoids reversing time or collapsing the flow to the identity at time $0$. Negative times are handled by $\varphi_{-a}=\varphi_a^{-1}$ together with the fact that a homeomorphism of a compact metric space has the same entropy as its inverse, so $h_{\mathrm{top}}(\varphi_a)=|a|\,h_{\mathrm{top}}(\varphi)$ for every $a\ne0$. Compactness and continuity are again used to compare continuous observation with sampled observation uniformly over the whole space. The hypotheses are not decorative. For example, let $X=\{0,1\}^{\mathbb Z}$ and let $\sigma:X\to X$ be the full shift. Declare a family of maps by $\psi_m=\operatorname{id}_X$ at integer times and $\psi_{m+1/2}=\sigma^m$ at half-integer times, ignoring all other times. Then the integer-time samples generate no complexity, while the half-integer samples $\psi_{1/2},\psi_{3/2},\psi_{5/2},\dots$ run through $\operatorname{id},\sigma,\sigma^2,\dots$ and see the full complexity of the shift. This family is not a continuous $\mathbb R$-action, and it shows why the group law and continuous dependence on time are part of the theorem rather than bookkeeping. The formula also shows that entropy measures information per unit time, so lengthening the roof in a suspension slows the rate. This motivates the standard suspension example.
[example: Suspension Flow Over A Shift]
Let $\sigma:X\to X$ be a subshift, and form the constant-roof suspension
\begin{align*}
X^\tau=(X\times[0,\tau])/\bigl((x,\tau)\sim(\sigma x,0)\bigr)
\end{align*}
with flow $\varphi_t$. At time $\tau$, every point returns once to the same height over the shifted base point:
\begin{align*}
\varphi_\tau([x,s])=[\sigma x,s].
\end{align*}
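Thus the time-$\tau$ map moves the symbolic coordinate by $\sigma$ and leaves the height coordinate unchanged. More precisely, the quotient map $X\times[0,\tau]\to X^\tau$ is a continuous surjection intertwining $\sigma\times\operatorname{id}_{[0,\tau]}$ with $\varphi_\tau$. Since factors cannot increase entropy, the product-with-identity formula gives $h_{\mathrm{top}}(\varphi_\tau)\le h_{\mathrm{top}}(\sigma\times\operatorname{id})=h_{\mathrm{top}}(\sigma)$. Conversely, the base $\{[x,0]:x\in X\}$ is a closed $\varphi_\tau$-invariant copy of $(X,\sigma)$, so $h_{\mathrm{top}}(\varphi_\tau)\ge h_{\mathrm{top}}(\sigma)$. Combining the two bounds,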
\begin{align*}
h_{\mathrm{top}}(\varphi_\tau)=h_{\mathrm{top}}(\sigma).
\end{align*}
By *Entropy Of Time Maps* with $a=\tau$,
\begin{align*}
h_{\mathrm{top}}(\varphi_\tau)=\tau h_{\mathrm{top}}(\varphi).
\end{align*}
Combining the two displayed equalities gives
\begin{align*}
h_{\mathrm{top}}(\sigma)=\tau h_{\mathrm{top}}(\varphi).
\end{align*}
Since $\tau>0$, division by $\tau$ yields
\begin{align*}
h_{\mathrm{top}}(\varphi)=\frac{1}{\tau}h_{\mathrm{top}}(\sigma).
\end{align*}
So the suspension does not change which symbolic names occur; it spreads one shift step over $\tau$ units of flow time, reducing the entropy rate by the factor $\tau$.
[/example]
## Subshifts And Spectral Radius
Symbolic dynamics is the model case because orbit segments are finite words. The problem becomes a counting problem: how many admissible words of length $n$ does the system allow?
[definition: Language Of A Subshift]
Let $X\subset A^{\mathbb Z}$ be a subshift over a finite alphabet $A$. The language $\mathcal L_n(X)$ is the set of words of length $n$ that occur in some point of $X$.
[/definition]
The language records exactly the finite data seen by orbit segments under the shift. To turn this into entropy, we must compare word counts with separated and spanning orbit segments in the [product topology](/page/Product%20Topology). The next theorem performs that comparison and reduces symbolic entropy to word growth.
[quotetheorem:6809]
This formula turns entropy into a combinatorial invariant for symbolic systems. The finite alphabet hypothesis is important because it makes $A^{\mathbb Z}$ compact and keeps the number of words of each length finite. The two-sided convention changes notation but not the usual entropy value; one-sided subshifts over the same language have the same word-growth rate. The theorem does not extend unchanged to countable alphabets, where compactness may fail and word counts may be infinite. It also does not describe finer dynamical structure: two subshifts can have the same word-growth rate while having different periodic point counts, mixing properties, or decompositions into invariant components. The full shift is the calibration case because no words are forbidden, so the word count is exactly a power of the alphabet size. That makes it the reference model for positive entropy.
[example: Full Shift On k Symbols]
For the full shift $A^{\mathbb Z}$ with $|A|=k$, there are no transition restrictions: a word of length $n$ is any function from $\{0,\dots,n-1\}$ to $A$. For the first coordinate there are $k$ choices, for the second coordinate there are again $k$ choices, and continuing independently through the $n$ coordinates gives
\begin{align*}
|\mathcal L_n(A^{\mathbb Z})|=\underbrace{k\cdot k\cdots k}_{n\text{ factors}}=k^n.
\end{align*}
By *Entropy Of A Subshift*,
\begin{align*}
h_{\mathrm{top}}(\sigma)=\lim_{n\to\infty}\frac{1}{n}\log |\mathcal L_n(A^{\mathbb Z})|.
\end{align*}
Substituting $|\mathcal L_n(A^{\mathbb Z})|=k^n$ gives
\begin{align*}
h_{\mathrm{top}}(\sigma)=\lim_{n\to\infty}\frac{1}{n}\log(k^n).
\end{align*}
Using $\log(k^n)=n\log k$,
\begin{align*}
h_{\mathrm{top}}(\sigma)=\lim_{n\to\infty}\log k=\log k.
\end{align*}
Thus the full shift is the benchmark example: each iterate reveals one new symbol chosen from $k$ independent possibilities.
[/example]
The next symbolic class introduces local transition rules while retaining finite combinatorics. Instead of counting all words directly, we encode which symbols may follow which other symbols in a matrix. This motivates subshifts of finite type, whose entropy can be read from linear algebra.
[definition: Subshift Of Finite Type From A Matrix]
Let $A$ be a $k\times k$ matrix with entries in $\{0,1\}$. The associated two-sided subshift of finite type is
\begin{align*}
\Sigma_A=\{x\in\{1,\dots,k\}^{\mathbb Z}: A_{x_jx_{j+1}}=1\text{ for all }j\in\mathbb Z\}.
\end{align*}
[/definition]
This definition turns admissible symbolic orbits into paths in a finite directed graph. The remaining problem is to determine the exponential growth rate of those paths, and matrix powers are designed to count them. This motivates the spectral-radius entropy formula.
[quotetheorem:7761]
[citeproof:7761]
The spectral formula makes constrained symbolic examples as computable as full shifts. The directed-cycle assumption is stronger than merely requiring $A$ to have a nonzero entry: for instance, the matrix with a single transition $1\to2$ and no return has no bi-infinite admissible path, so its two-sided shift is empty. Reducibility is allowed: different recurrent components may contribute different path growth rates, and the spectral radius selects the component with maximal exponential growth. Transient symbols can affect short words but not the limiting exponential rate. The formula computes entropy, not mixing or irreducibility; those require additional graph conditions. The golden mean shift is the standard test: a single forbidden word changes the exponential growth rate from powers of $2$ to Fibonacci growth. This illustrates how local rules reduce entropy.
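The claim that the spectral radius selects the fastest-growing component can be checked by direct path counting. The sketch below uses a hypothetical $3\times3$ matrix with two recurrent components: the loop at symbol $0$, with growth rate $1$, and a full $2$-shift on $\{1,2\}$, with growth rate $2$. The eigenvalues of this matrix are $1$, $2$, and $0$, so $\rho(A)=2$. In this example every finite path extends to a bi-infinite one, so the path count equals $|\mathcal L_n(\Sigma_A)|$. For matrices with transient symbols the two counts can differ, which is the effect of transient symbols mentioned above.

```python
def mat_mult(A, B):
    """Product of two square integer matrices stored as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def path_count(A, n):
    """Number of length-n symbol paths allowed by the 0-1 matrix A.

    A length-n path uses n - 1 transitions, so the count is the sum of the
    entries of A^(n-1) (n >= 1).
    """
    size = len(A)
    power = [[int(i == j) for j in range(size)] for i in range(size)]  # A^0
    for _ in range(n - 1):
        power = mat_mult(power, A)
    return sum(sum(row) for row in power)


# Hypothetical reducible example: symbol 0 loops to itself and may pass to 1;
# on {1, 2} every transition is allowed, giving a copy of the full 2-shift.
A = [[1, 1, 0],
     [0, 1, 1],
     [0, 1, 1]]

if __name__ == "__main__":
    for n in range(1, 9):
        c_n, c_next = path_count(A, n), path_count(A, n + 1)
        print(n, c_n, c_next / c_n)  # growth ratio equals rho(A) = 2
```

Here the count is exactly $3\cdot2^{n-1}$, so every printed ratio is $2$: the loop at $0$ contributes only lower-order terms.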
[example: Golden Mean Shift]
The golden mean shift forbids the word $11$, so with alphabet $\{0,1\}$ its transition matrix is determined by
\begin{align*}
A_{00}=1,\quad A_{01}=1,\quad A_{10}=1,\quad A_{11}=0.
\end{align*}
For this $2\times2$ matrix, the characteristic polynomial is
\begin{align*}
\det(A-\lambda I)=(1-\lambda)(-\lambda)-1\cdot 1.
\end{align*}
Expanding the product gives
\begin{align*}
(1-\lambda)(-\lambda)=-\lambda+\lambda^2.
\end{align*}
Hence
\begin{align*}
\det(A-\lambda I)=\lambda^2-\lambda-1.
\end{align*}
The eigenvalues are therefore the roots of $\lambda^2-\lambda-1=0$, namely
\begin{align*}
\lambda=\frac{1+\sqrt{5}}{2}\quad\text{or}\quad \lambda=\frac{1-\sqrt{5}}{2}.
\end{align*}
Since
\begin{align*}
\left|\frac{1-\sqrt{5}}{2}\right|=\frac{\sqrt{5}-1}{2}<\frac{1+\sqrt{5}}{2},
\end{align*}
the spectral radius is
\begin{align*}
\rho(A)=\frac{1+\sqrt{5}}{2}=\phi.
\end{align*}
By *Entropy Of A Subshift Of Finite Type*,
\begin{align*}
h_{\mathrm{top}}(\sigma|_{\Sigma_A})=\log\rho(A)=\log\phi.
\end{align*}
Finally, $\phi<2$, and $\log$ is increasing, so
\begin{align*}
\log\phi<\log2.
\end{align*}
Thus forbidding the block $11$ lowers the exponential word-growth rate from the full binary value $\log2$ to $\log\phi$.
[/example]
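The two ingredients of this example can be cross-checked numerically. The sketch below (function names are illustrative) counts binary words avoiding $11$ by brute force and compares them with the sum of the entries of $A^{n-1}$. It also shows that ratios of successive counts approach $\phi$. Every word avoiding $11$ extends to a bi-infinite point by padding with $0$s on both sides, so here the brute-force count is exactly $|\mathcal L_n(\Sigma_A)|$.

```python
import math
from itertools import product


def count_avoiding(n, forbidden="11"):
    """Brute-force count of binary words of length n that avoid `forbidden`."""
    return sum(1 for bits in product("01", repeat=n)
               if forbidden not in "".join(bits))


def count_by_matrix(n):
    """Sum of the entries of A^(n-1) for A = [[1, 1], [1, 0]]."""
    a, b, c, d = 1, 0, 0, 1  # identity matrix [[a, b], [c, d]]
    for _ in range(n - 1):
        # Right-multiply by A: [[a, b], [c, d]] @ [[1, 1], [1, 0]]
        a, b, c, d = a + b, a, c + d, c
    return a + b + c + d


PHI = (1 + math.sqrt(5)) / 2

if __name__ == "__main__":
    for n in range(1, 11):
        print(n, count_avoiding(n), count_by_matrix(n))
    # Successive ratios approach the spectral radius phi.
    print(count_by_matrix(31) / count_by_matrix(30), PHI)
```

The counts $2,3,5,8,13,\dots$ are Fibonacci numbers, which is the Fibonacci growth mentioned before the example.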
## Interval Maps, Expanding Maps, And Horseshoes
For maps of intervals or manifolds, entropy detects how many monotone pieces, inverse branches, or symbolic strips survive under iteration. The central question is when geometric stretching creates symbolic dynamics inside the original system.
[example: Doubling Map Has Entropy Log Two]
Identify $S^1$ with $\mathbb R/\mathbb Z$ and use the circle metric $d(x,y)=\min_{m\in\mathbb Z}|x-y-m|$. Let $f(x)=2x\pmod 1$. For $0\le j<n$, multiplication by $2^j$ is $2^j$-Lipschitz for this metric, so if $d(x,y)<\varepsilon 2^{-(n-1)}$, then
\begin{align*}
d(f^j(x),f^j(y))\le 2^j d(x,y)<2^j\varepsilon 2^{-(n-1)}\le \varepsilon.
\end{align*}
Thus two points in the same half-open interval of length $\varepsilon 2^{-(n-1)}$ cannot be $(n,\varepsilon)$-separated, so each such interval contains at most one point of an $(n,\varepsilon)$-separated set. Covering the circle by at most $\lceil 2^{n-1}/\varepsilon\rceil+1$ such intervals gives
\begin{align*}
s_n(f,\varepsilon)\le \left\lceil \frac{2^{n-1}}{\varepsilon}\right\rceil+1.
\end{align*}
Therefore
\begin{align*}
\limsup_{n\to\infty}\frac{1}{n}\log s_n(f,\varepsilon)\le \log 2.
\end{align*}
For the reverse inequality, fix $0<\varepsilon<1/4$ and consider the dyadic grid
\begin{align*}
E_n=\left\{\frac{k}{2^n}\pmod 1:0\le k<2^n\right\}.
\end{align*}
If $k\ne \ell$, let $q$ be the representative of $k-\ell$ modulo $2^n$ with $0<q<2^n$. If $q=2^r u$ with $u$ odd and $0\le r\le n-2$, then for $j=n-r-2$,
\begin{align*}
f^j\left(\frac{k}{2^n}\right)-f^j\left(\frac{\ell}{2^n}\right)\equiv \frac{2^j q}{2^n}=\frac{u}{4}\pmod 1.
\end{align*}
Since $u$ is odd, $\frac{u}{4}\pmod1$ is either $\frac14$ or $\frac34$, so the circle distance is $1/4$. If instead $q\equiv 2^{n-1}\pmod{2^n}$, the original points already have distance $1/2$. Hence every two distinct points of $E_n$ have Bowen distance at least $1/4>\varepsilon$, so $E_n$ is $(n,\varepsilon)$-separated and
\begin{align*}
s_n(f,\varepsilon)\ge |E_n|=2^n.
\end{align*}
It follows that
\begin{align*}
\limsup_{n\to\infty}\frac{1}{n}\log s_n(f,\varepsilon)\ge \lim_{n\to\infty}\frac{1}{n}\log(2^n)=\log2.
\end{align*}
Combining the upper and lower bounds gives the scale-$\varepsilon$ growth rate $\log2$ for every $0<\varepsilon<1/4$, and hence
\begin{align*}
h_{\mathrm{top}}(f)=\log2.
\end{align*}
Thus each iterate of the doubling map creates one new binary choice, and the exponential orbit-complexity rate is exactly $\log2$.
[/example]
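The lower-bound step can be checked with exact integer arithmetic, since $f^j(k/2^n)=(2^jk\bmod 2^n)/2^n$. The helper names below are illustrative. The check confirms that distinct points of $E_n$ are at Bowen distance at least $1/4$. In fact it returns $1/2$, because the choice $j=n-r-1$ (instead of $j=n-r-2$) already gives circle distance $1/2$.

```python
def bowen_dist_dyadic(k, l, n):
    """Bowen distance d_n(k/2^n, l/2^n) for the doubling map x -> 2x mod 1.

    Uses exact integer arithmetic in units of 2^-n:
    f^j(k/2^n) - f^j(l/2^n) = 2^j (k - l) / 2^n  (mod 1).
    """
    modulus = 2 ** n
    best = 0
    for j in range(n):
        r = (2 ** j * (k - l)) % modulus
        best = max(best, min(r, modulus - r))  # circle distance in units of 2^-n
    return best / modulus


def min_grid_separation(n):
    """Smallest Bowen distance d_n between distinct points of E_n."""
    size = 2 ** n
    return min(bowen_dist_dyadic(k, l, n)
               for k in range(size) for l in range(k + 1, size))


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, min_grid_separation(n))
```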
The doubling map shows entropy arising from uniform expansion. Horseshoes show the same mechanism in a geometric form, but now the symbolic choices come from strips that survive repeated stretching and folding. To state the entropy consequence precisely, we isolate the topological feature that matters: a compact invariant set carrying all symbolic itineraries.
[definition: Topological Horseshoe]
Let $f:X\to X$ be continuous. A compact invariant set $\Lambda\subset X$ is a topological horseshoe with $k$ branches if there is a continuous surjection $\pi:\Lambda\to\{1,\dots,k\}^{\mathbb N}$ such that
\begin{align*}
\pi\circ f=\sigma^+\circ\pi,
\end{align*}
where $\sigma^+$ is the one-sided full shift.
[/definition]
This definition packages the geometric horseshoe into a factor map onto a full shift while keeping the time direction honest. For a non-invertible map, future itineraries are the natural data, so the one-sided shift is the right symbolic factor. If $f|_\Lambda:\Lambda\to\Lambda$ is a homeomorphism, many courses use the two-sided version with $\{1,\dots,k\}^{\mathbb Z}$ and the two-sided full shift $\sigma$ instead. Since factors cannot have more entropy than the systems that map onto them, the existence of such a set forces exponential orbit complexity in the original dynamics. The next theorem records the resulting lower bound.
[quotetheorem:7762]
[citeproof:7762]
The theorem turns the geometric picture of a horseshoe into a numerical certificate of chaos. The factor condition is weaker than conjugacy: several points of $\Lambda$ may represent the same symbolic itinerary, but surjectivity onto the full shift is enough to force all symbolic behaviours to occur. The condition $k\ge2$ matters because a one-branch full shift has entropy $0$ and gives no positive lower bound. The estimate need not be sharp, since the ambient system may contain additional orbit complexity outside the horseshoe or even inside fibres of the coding map. In applications, the work is usually to construct the invariant set and verify the symbolic coding. Once that is done, the entropy bound follows from the full-shift computation.
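A concrete example explains the one-sided convention. The doubling map $x\mapsto 2x\pmod 1$ on $S^1$ is not invertible, and its natural coding is by forward binary itineraries. Binary expansion $\pi(\omega)=\sum_{j\ge0}\omega_j2^{-(j+1)}$ is a continuous surjection $\{0,1\}^{\mathbb N}\to S^1$ with $\pi(\sigma^+\omega)=2\pi(\omega)\pmod 1$. So the circle is a factor of the one-sided shift, not the other way round: the connected space $S^1$ admits no continuous surjection onto a Cantor set. This obstruction appears as the usual endpoint ambiguity, since dyadic rationals have two binary expansions. Asking for a two-sided symbolic description directly on the circle would demand past symbols that the map does not determine uniquely, because every point has two preimages. The inverse-limit space restores those pasts, and on that natural extension the two-sided symbolic description becomes appropriate.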
[example: Positive Entropy From A Two-Branch Horseshoe]
Let $f$ be the planar diffeomorphism of the standard Smale horseshoe construction, which stretches a rectangle $R$ and folds it back across itself in two strips. Let $\Lambda$ be the compact invariant Cantor set of points whose full orbits stay in $R$. Since $f|_\Lambda$ is a homeomorphism, we use the two-sided version of the horseshoe definition. Label the two surviving strips by $0$ and $1$, and define the itinerary map $\pi:\Lambda\to\{0,1\}^{\mathbb Z}$ by declaring $\pi(x)_j=i$ exactly when $f^j(x)$ lies in the strip labelled $i$. Since applying $f$ shifts every visit one time step forward, for every $x\in\Lambda$ and every $j\in\mathbb Z$,
\begin{align*}
(\pi(fx))_j=\pi(x)_{j+1}.
\end{align*}
Thus $\pi\circ f=\sigma\circ\pi$, where $\sigma$ is the full two-shift.
In the standard hyperbolic horseshoe construction, every bi-infinite binary sequence specifies a nested family of horizontal and vertical strips. Contraction in one direction together with expansion in the other makes the intersection of this family a single point of $\Lambda$. Hence $\pi$ is onto. It is in fact also one-to-one, so $\pi$ is a conjugacy, although only surjectivity is needed here. Therefore the horseshoe system factors onto the full two-shift, so by *Horseshoe Gives Positive Entropy*,
\begin{align*}
h_{\mathrm{top}}(f|_\Lambda)\ge h_{\mathrm{top}}(\sigma).
\end{align*}
For the full shift on two symbols, the number of length-$n$ words is $2^n$, so
\begin{align*}
h_{\mathrm{top}}(\sigma)=\lim_{n\to\infty}\frac{1}{n}\log(2^n).
\end{align*}
Since $\log(2^n)=n\log2$, this becomes
\begin{align*}
h_{\mathrm{top}}(\sigma)=\lim_{n\to\infty}\log2=\log2.
\end{align*}
Combining this value with the inequality above gives
\begin{align*}
h_{\mathrm{top}}(f|_\Lambda)\ge\log2.
\end{align*}
Thus the two branches force at least one binary choice per iterate, producing positive entropy on the horseshoe.
[/example]
## The Variational Principle
Topological entropy counts orbit complexity without choosing an invariant measure. The measure-theoretic theory asks how much information is produced on average along typical orbits for a chosen invariant probability measure. The bridge between the two viewpoints is the variational principle.
[definition: Measure-Theoretic Entropy]
Let $(X,\mathcal F,\mu)$ be a probability space and let $f:X\to X$ be a measurable map satisfying $\mu(f^{-1}A)=\mu(A)$ for every $A\in\mathcal F$. For a finite measurable partition $\mathcal P$ of $X$, define
\begin{align*}
H_\mu(\mathcal P)=-\sum_{P\in\mathcal P}\mu(P)\log\mu(P).
\end{align*}
Here $0\log 0$ is interpreted as $0$. Then set
\begin{align*}
h_\mu(f,\mathcal P)=\lim_{n\to\infty}\frac{1}{n}H_\mu(\mathcal P\vee f^{-1}\mathcal P\vee\cdots\vee f^{-(n-1)}\mathcal P),
\end{align*}
where the limit exists because these entropies form a subadditive sequence in $n$.
The measure-theoretic entropy of $f$ with respect to $\mu$ is
\begin{align*}
h_\mu(f)=\sup_{\mathcal P} h_\mu(f,\mathcal P)\in[0,\infty],
\end{align*}
where the supremum is over finite measurable partitions of $X$.
[/definition]
This definition will be developed in the measure-theoretic part of the course. For now it is needed only to state the variational principle, which identifies topological entropy as the maximal information production available among invariant measures.
[quotetheorem:7763]
This statement depends on the compact metric phase space and continuity of $f$, which ensure that orbit complexity can be compared with invariant Borel probability measures. Outside this setting the formula can fail or require a different version; for instance, non-compact spaces may lose tightness of measures built from long orbit segments, and discontinuous maps need not interact well with open covers. The theorem also does not assert that a measure attaining the supremum always exists. Its role here is to identify topological entropy as the largest measure-theoretic information rate available in the system.
[remark: Meaning Of The Variational Principle]
The variational principle says that topological entropy is not merely a worst-case count of orbit segments. It is the largest possible average information rate seen by an invariant probability measure. Measures attaining the supremum, when they exist, are called measures of maximal entropy.
[/remark]
This result closes the topological part of the course and prepares the transition to ergodic theory. Symbolic systems will remain the main testing ground: for the full $k$-shift, the measure assigning equal probability $1/k$ to each symbol has entropy $\log k$, matching the topological entropy computed above.
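For the full shift, the time-zero partition $\mathcal P=\{[a]:a\in A\}$ has refinements $\mathcal P\vee\sigma^{-1}\mathcal P\vee\cdots\vee\sigma^{-(n-1)}\mathcal P$ consisting of the length-$n$ cylinders. The quantity $h_\mu(\sigma,\mathcal P)$ can therefore be computed by summing over words. The sketch below does this for a Bernoulli measure, where equal weights give the uniform measure just mentioned. It computes only $h_\mu(\sigma,\mathcal P)$ for this one partition. Identifying that number with $h_\mu(\sigma)$ uses the fact that $\mathcal P$ is generating, which is assumed here rather than proved.

```python
import math
from itertools import product


def block_entropy(probs, n):
    """H_mu of the partition into length-n cylinders for a Bernoulli measure.

    `probs` lists the symbol probabilities; the cylinder [a_0 ... a_{n-1}] has
    mass prod(probs[a_j]).  Zero-mass cylinders contribute 0 (0 log 0 = 0).
    """
    total = 0.0
    for word in product(range(len(probs)), repeat=n):
        mass = math.prod(probs[a] for a in word)
        if mass > 0:
            total -= mass * math.log(mass)
    return total


if __name__ == "__main__":
    k = 3
    uniform = [1 / k] * k
    for n in range(1, 6):
        # (1/n) H_mu stays at log k for the uniform Bernoulli measure.
        print(n, block_entropy(uniform, n) / n, math.log(k))
```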
Topological entropy quantifies how many orbit segments a system can distinguish as time grows, converting the earlier geometric constructions into a single numerical invariant. The next chapter changes perspective again, moving from orbit counts to invariant measures and the statistical behaviour they encode.
# 8. Invariant Measures and Ergodicity
The earlier chapters developed chaos through topology, symbolic coding, entropy, and Lyapunov growth. This chapter changes viewpoint: instead of asking what every orbit does pointwise, we ask which probability distributions are preserved by the dynamics and what they say about long-run statistics. The central question is whether time averages along a typical orbit agree with space averages against an invariant measure.
Invariant measures connect Chapter 1 recurrence, Chapter 2 symbolic dynamics, and Chapter 7 entropy. They make it possible to speak about typical points even when individual orbits are complicated, and they provide the setting in which ergodicity means statistical indecomposability. The chapter also records the standard hierarchy from ergodicity to mixing and states the decomposition theorem explaining why ergodic systems are the basic building blocks of measure-preserving dynamics.
## Invariant Measures and Statistical Observables
What should replace a fixed point when a chaotic system has no preferred single long-term state? For a map $T:X\to X$, an invariant probability measure is a distribution of mass that is unchanged by applying $T$. This is the measure-theoretic analogue of an invariant set, but it remembers how much mass is assigned to each observable region.
[definition: Invariant Probability Measure]
Let $(X,\mathcal B)$ be a measurable space and let $T:X\to X$ be measurable. A probability measure $\mu$ on $(X,\mathcal B)$ is $T$-invariant if
\begin{align*}
\mu(T^{-1}A)=\mu(A)
\end{align*}
for every $A\in\mathcal B$.
[/definition]
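The preimage appears because $T$ need not be invertible. Equivalently, $\mu$ is $T$-invariant if and only if
\begin{align*}
\int_X f\circ T\,d\mu=\int_X f\,d\mu
\end{align*}
for every $f\in L^1(X,\mathcal B,\mu)$. Taking $f=\mathbf 1_A$ recovers the set condition, and the general case follows from simple functions by monotone convergence.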
This identity is often the most useful form, since dynamics is tested by observables rather than by all measurable sets at once.
[definition: Measure-Preserving System]
A measure-preserving system is a quadruple $(X,\mathcal B,\mu,T)$ where $(X,\mathcal B,\mu)$ is a probability space, $T:X\to X$ is measurable, and $\mu$ is $T$-invariant.
[/definition]
Once the system is measure-preserving, the composition operator $U_Tf=f\circ T$ preserves $L^p$ norms for $1\le p\le\infty$, since $\int_X|f\circ T|^p\,d\mu=\int_X|f|^p\,d\mu$. This turns orbit statistics into operator theory, which is why the mean ergodic theorem, a statement about averages of $U_T$ on $L^2$, later appears as natural context.
[example: Haar Measure for Irrational Rotation]
Let $\mathbb T=\mathbb R/\mathbb Z$ and let $R_\alpha(x)=x+\alpha \pmod 1$. Write $m$ for Haar measure, normalized so that an arc of length $\ell$ has measure $\ell$. We show that $m(R_\alpha^{-1}A)=m(A)$ for every Borel set $A\subseteq\mathbb T$.
First check this on half-open arcs. If $I=[a,b)\subset \mathbb T$ with no wrap-around, then
\begin{align*}
R_\alpha^{-1}I=[a-\alpha,b-\alpha)\pmod 1.
\end{align*}
Translation changes both endpoints by the same amount, so the length is unchanged:
\begin{align*}
m(R_\alpha^{-1}I)=b-a=m(I).
\end{align*}
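Take $0<\alpha<1$ and $0\le a<b\le1$. If $a-\alpha<0<b-\alpha$, the translated interval crosses $0$. Choosing representatives in $[0,1)$, it splits as the disjoint union $[0,b-\alpha)\cup[a-\alpha+1,1)$, whose total length is
\begin{align*}
(b-\alpha)+\bigl(1-(a-\alpha+1)\bigr)=b-a.
\end{align*}
If instead $b-\alpha\le0$, adding $1$ to both endpoints gives the arc $[a-\alpha+1,b-\alpha+1)$, which has no wrap-around and length $b-a$. Thus every arc has the same Haar measure as its preimage under $R_\alpha$. An arc that itself wraps around $0$ is a disjoint union of two such arcs, so the same conclusion holds for it.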
Now let
\begin{align*}
\mathcal C=\{A\in\mathcal B(\mathbb T):m(R_\alpha^{-1}A)=m(A)\}.
\end{align*}
The class $\mathcal C$ contains all arcs by the computation above. It is closed under complements because
\begin{align*}
m(R_\alpha^{-1}(\mathbb T\setminus A))=m(\mathbb T\setminus R_\alpha^{-1}A)=1-m(R_\alpha^{-1}A)=1-m(A)=m(\mathbb T\setminus A),
\end{align*}
and it is closed under countable disjoint unions by countable additivity of $m$ and the identity $R_\alpha^{-1}(\bigcup_n A_n)=\bigcup_n R_\alpha^{-1}A_n$. Thus $\mathcal C$ is a Dynkin system ($\lambda$-system). The half-open arcs $[a,b)$ with $0\le a<b\le1$, together with $\varnothing$, form a class closed under finite intersections that generates the Borel $\sigma$-algebra on $\mathbb T$. Dynkin's $\pi$-$\lambda$ theorem therefore gives $\mathcal C=\mathcal B(\mathbb T)$, so $m$ is $R_\alpha$-invariant. Irrationality is not needed for invariance; it is what makes the rotation have dense orbits, while the invariant statistical distribution remains the uniform Haar measure.
[/example]
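The wrap-around bookkeeping is easy to get wrong, so here is a small exact check using rational arithmetic. The function names are illustrative. Only rational $\alpha$ can be represented exactly, but the length computation does not depend on irrationality.

```python
from fractions import Fraction


def preimage_pieces(a, b, alpha):
    """Pieces in [0, 1] of R_alpha^{-1}[a, b) = [a - alpha, b - alpha) mod 1.

    Assumes 0 <= a < b <= 1 and 0 <= alpha < 1, all given as Fractions.
    """
    lo, hi = a - alpha, b - alpha
    if lo >= 0:                 # no wrap-around
        return [(lo, hi)]
    if hi <= 0:                 # shift the whole arc by +1
        return [(lo + 1, hi + 1)]
    # The arc crosses 0: [0, b - alpha) together with [a - alpha + 1, 1).
    return [(Fraction(0), hi), (lo + 1, Fraction(1))]


def total_length(pieces):
    """Sum of the lengths of disjoint half-open pieces."""
    return sum(hi - lo for lo, hi in pieces)


if __name__ == "__main__":
    a, b, alpha = Fraction(1, 10), Fraction(3, 10), Fraction(1, 5)
    pieces = preimage_pieces(a, b, alpha)
    print(pieces, total_length(pieces), b - a)  # wrapped case, length 1/5
```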
The same principle applies in symbolic dynamics, where the invariant measure is often built directly from coordinate probabilities. This makes shift spaces the main test case for the definitions.
[example: Bernoulli Measure on the Full Shift]
Let $\Sigma_2=\{0,1\}^{\mathbb N}$ with left shift $\sigma(x)_j=x_{j+1}$, and fix $p\in(0,1)$. Put $p_1=p$ and $p_0=1-p$. For a cylinder
\begin{align*}[a_0\cdots a_{n-1}]=\{x\in\Sigma_2:x_0=a_0,\dots,x_{n-1}=a_{n-1}\},\end{align*}
define
\begin{align*}\mu_p([a_0\cdots a_{n-1}])=\prod_{j=0}^{n-1}p_{a_j}.\end{align*}
We verify invariance first on cylinders. A point $x$ lies in $\sigma^{-1}[a_0\cdots a_{n-1}]$ exactly when $\sigma(x)_j=a_j$ for $0\leq j\leq n-1$, which means $x_{j+1}=a_j$ for $0\leq j\leq n-1$. The coordinate $x_0$ is free, so
\begin{align*}\sigma^{-1}[a_0\cdots a_{n-1}]=[0a_0\cdots a_{n-1}]\cup[1a_0\cdots a_{n-1}],\end{align*}
and the two cylinders on the right are disjoint. Therefore, by finite additivity on disjoint cylinder sets,
\begin{align*}\mu_p(\sigma^{-1}[a_0\cdots a_{n-1}])=\mu_p([0a_0\cdots a_{n-1}])+\mu_p([1a_0\cdots a_{n-1}]).\end{align*}
Using the defining cylinder formula on each term gives
\begin{align*}\mu_p([0a_0\cdots a_{n-1}])=p_0\prod_{j=0}^{n-1}p_{a_j}.\end{align*}
Similarly,
\begin{align*}\mu_p([1a_0\cdots a_{n-1}])=p_1\prod_{j=0}^{n-1}p_{a_j}.\end{align*}
Substituting these two values,
\begin{align*}\mu_p(\sigma^{-1}[a_0\cdots a_{n-1}])=(p_0+p_1)\prod_{j=0}^{n-1}p_{a_j}.\end{align*}
Since $p_0+p_1=(1-p)+p=1$,
\begin{align*}\mu_p(\sigma^{-1}[a_0\cdots a_{n-1}])=\prod_{j=0}^{n-1}p_{a_j}=\mu_p([a_0\cdots a_{n-1}]).\end{align*}
The cylinders, together with $\varnothing$, form a class closed under finite intersections that generates the product Borel $\sigma$-algebra on $\Sigma_2$. Both $A\mapsto\mu_p(\sigma^{-1}A)$ and $A\mapsto\mu_p(A)$ are probability measures, and two probability measures that agree on such a generating class agree on all Borel sets by Dynkin's $\pi$-$\lambda$ theorem. Thus $\mu_p$ is $\sigma$-invariant, giving a symbolic model whose coordinates are independent and whose symbol $1$ has frequency parameter $p$.
[/example]
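The cylinder computation above can be replayed exactly with rational arithmetic. The sketch below (names are illustrative, and $p$ is restricted to rationals so that equality can be tested exactly) checks $\mu_p(\sigma^{-1}[w])=\mu_p([w])$ for every short word $w$. It also checks that the length-$n$ cylinder masses sum to $1$.

```python
from fractions import Fraction
from itertools import product


def cylinder_measure(word, p):
    """mu_p([a_0 ... a_{n-1}]) = product of p_{a_j}, with p_1 = p, p_0 = 1 - p."""
    weight = {0: 1 - p, 1: p}
    mass = Fraction(1)
    for a in word:
        mass *= weight[a]
    return mass


def preimage_measure(word, p):
    """mu_p(sigma^{-1}[word]), using sigma^{-1}[w] = [0w] disjoint union [1w]."""
    return cylinder_measure((0,) + word, p) + cylinder_measure((1,) + word, p)


if __name__ == "__main__":
    p = Fraction(1, 3)
    for word in product((0, 1), repeat=3):
        print(word, cylinder_measure(word, p), preimage_measure(word, p))
```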
## Existence of Invariant Measures
When a topological system is given first, invariant measures are not part of the input. The first structural problem is therefore existence: can a continuous map on a compact phase space always support at least one invariant probability measure? The answer is yes, and the proof uses averaged empirical distributions along a single orbit.
[quotetheorem:3423]
Compactness is the mechanism that prevents the empirical measures from losing mass at infinity; on a noncompact space, averages of point masses may drift away and have no probability-measure limit. Continuity is used so that $f\circ T$ is still a continuous [test function](/page/Test%20Function) and weak* convergence can pass through the invariance identity. The theorem is therefore an existence result, not a classification result: it does not say that the invariant measure is unique, ergodic, absolutely continuous, or related to most initial points. The next examples show precisely this range of possibilities, from atomic measures on periodic orbits to measures with a density for expanding maps.
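The key estimate in the averaging proof of this existence theorem (the argument usually attributed to Krylov and Bogolyubov) is that empirical measures along one orbit are almost invariant. If $\nu_n=\frac1n\sum_{j<n}\delta_{T^jx}$, then $\int g\circ T\,d\nu_n-\int g\,d\nu_n$ telescopes to $(g(T^nx)-g(x))/n$, which is at most $2\|g\|_\infty/n$ in absolute value. The sketch below checks this bound numerically. The logistic map $x\mapsto4x(1-x)$ on $[0,1]$, the test function $\cos$, and the starting point are illustrative choices only.

```python
import math


def empirical_defect(T, g, x, n):
    """Return |nu_n(g o T) - nu_n(g)| for nu_n = (1/n) * sum_{j<n} delta_{T^j x}.

    The two averages telescope to (g(T^n x) - g(x)) / n, so the defect is at
    most 2 * max|g| / n, up to floating-point rounding.
    """
    orbit = [x]
    for _ in range(n):
        orbit.append(T(orbit[-1]))
    avg_g = sum(g(y) for y in orbit[:n]) / n          # integral of g
    avg_g_T = sum(g(y) for y in orbit[1:n + 1]) / n   # integral of g o T
    return abs(avg_g_T - avg_g)


def logistic(x):
    """Illustrative continuous map of the compact interval [0, 1]."""
    return 4.0 * x * (1.0 - x)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, empirical_defect(logistic, math.cos, 0.1234, n), 2 / n)
```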
Uniqueness, by contrast, typically fails: many systems carry many invariant measures, supported on different pieces of the phase space or reflecting different statistical regimes.
[example: Periodic Orbit Measures]
Suppose $p\in X$ has period $q$, so $T^q p=p$ and the points $p,Tp,\dots,T^{q-1}p$ form one periodic orbit. Define
\begin{align*}
\mu_p=\frac{1}{q}\sum_{j=0}^{q-1}\delta_{T^j p}.
\end{align*}
We verify $T$-invariance by checking the defining identity on an arbitrary measurable set $A\subseteq X$. For a point mass, $\delta_y(T^{-1}A)=1$ exactly when $y\in T^{-1}A$, which is equivalent to $Ty\in A$; hence
\begin{align*}
\delta_y(T^{-1}A)=\delta_{Ty}(A).
\end{align*}
Applying this with $y=T^j p$ gives
\begin{align*}
\delta_{T^j p}(T^{-1}A)=\delta_{T^{j+1}p}(A).
\end{align*}
Therefore
\begin{align*}
\mu_p(T^{-1}A)=\frac{1}{q}\sum_{j=0}^{q-1}\delta_{T^{j+1}p}(A).
\end{align*}
Because $T^q p=p$, the list $T^1p,T^2p,\dots,T^q p$ is the same list of orbit points as $p,Tp,\dots,T^{q-1}p$, only cyclically reordered. A finite sum is unchanged by reordering, so
\begin{align*}
\frac{1}{q}\sum_{j=0}^{q-1}\delta_{T^{j+1}p}(A)=\frac{1}{q}\sum_{j=0}^{q-1}\delta_{T^j p}(A).
\end{align*}
The right-hand side is $\mu_p(A)$, so
\begin{align*}
\mu_p(T^{-1}A)=\mu_p(A).
\end{align*}
Thus $\mu_p$ is $T$-invariant. The invariant measure assigns all its mass to the finite orbit of $p$, showing that invariant statistics can live on a small periodic subset even when the surrounding phase space is much larger.
[/example]
For expanding maps, invariant measures can also be absolutely continuous with respect to Lebesgue measure. The next example is the standard model behind many later transfer-operator arguments.
[example: Invariant Density for the Doubling Map]
Let $D:\mathbb T\to\mathbb T$ be $D(x)=2x\pmod 1$, and let $m$ be normalized Haar measure. We verify $D$-invariance first on a half-open arc $I=[a,b)\subset[0,1)$, where $0\leq a<b\leq 1$. A point $x\in[0,1)$ belongs to $D^{-1}I$ exactly when $2x\pmod 1\in[a,b)$. This happens in two cases:
\begin{align*}
2x\in[a,b)
\end{align*}
or
\begin{align*}
2x\in[1+a,1+b).
\end{align*}
Dividing the first interval by $2$ gives $x\in[a/2,b/2)$, and dividing the second interval by $2$ gives $x\in[(1+a)/2,(1+b)/2)$. Hence
\begin{align*}
D^{-1}I=[a/2,b/2)\cup[(1+a)/2,(1+b)/2).
\end{align*}
The two intervals are disjoint, and their lengths are
\begin{align*}
b/2-a/2=(b-a)/2
\end{align*}
and
\begin{align*}
(1+b)/2-(1+a)/2=(b-a)/2.
\end{align*}
Therefore
\begin{align*}
m(D^{-1}I)=(b-a)/2+(b-a)/2=b-a=m(I).
\end{align*}
Now set
\begin{align*}
\mathcal C=\{A\in\mathcal B(\mathbb T):m(D^{-1}A)=m(A)\}.
\end{align*}
The computation above puts every half-open arc in $\mathcal C$. If $A\in\mathcal C$, then $D^{-1}(\mathbb T\setminus A)=\mathbb T\setminus D^{-1}A$, so
\begin{align*}
m(D^{-1}(\mathbb T\setminus A))=1-m(D^{-1}A)=1-m(A)=m(\mathbb T\setminus A).
\end{align*}
Thus $\mathcal C$ is closed under complements. If $(A_n)$ are pairwise disjoint sets in $\mathcal C$, then the sets $D^{-1}A_n$ are pairwise disjoint and $D^{-1}(\bigcup_n A_n)=\bigcup_nD^{-1}A_n$, so countable additivity gives
\begin{align*}
m(D^{-1}(\bigcup_n A_n))=\sum_n m(D^{-1}A_n)=\sum_n m(A_n)=m(\bigcup_n A_n).
\end{align*}
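Thus $\mathcal C$ contains $\mathbb T=[0,1)$ and is closed under complements and countable disjoint unions, so it is a Dynkin system ($\lambda$-system). The half-open arcs $[a,b)$, together with the empty set, form a $\pi$-system, because the intersection of two such arcs is again a half-open arc or empty, and they generate $\mathcal B(\mathbb T)$. By Dynkin's $\pi$-$\lambda$ theorem, $\mathcal C=\mathcal B(\mathbb T)$, that is, $m(D^{-1}A)=m(A)$ for every Borel set $A\subseteq\mathbb T$, so $m$ is $D$-invariant. Haar measure is therefore itself an absolutely continuous invariant measure, with constant density $1$. For a general piecewise expanding map, the density $h$ of an absolutely continuous invariant measure is characterized instead by the fixed-point equation $\mathcal Lh=h$ for the transfer (Perron-Frobenius) operator $\mathcal L$.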
[/example]
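The interval computation above can be checked with exact rational arithmetic. The Python sketch below forms $D^{-1}[a,b)$ as the two arcs found above and compares lengths. It also implements the transfer operator of the doubling map with respect to Lebesgue measure, $(\mathcal Lh)(x)=\tfrac12\bigl(h(x/2)+h((x+1)/2)\bigr)$, with one term per inverse branch. This operator fixes the constant density $1$ but not, for example, $h(x)=2x$. The helper names are hypothetical.

```python
from fractions import Fraction

def doubling_preimage(a: Fraction, b: Fraction) -> list[tuple[Fraction, Fraction]]:
    """D^{-1}[a, b) for D(x) = 2x mod 1: the disjoint half-open intervals
    [a/2, b/2) and [(1+a)/2, (1+b)/2), returned as (left, right) pairs."""
    return [(a / 2, b / 2), ((1 + a) / 2, (1 + b) / 2)]

def total_length(intervals) -> Fraction:
    """Lebesgue measure of a finite union of disjoint intervals."""
    return sum((right - left for left, right in intervals), Fraction(0))

def transfer(h):
    """Transfer operator of the doubling map w.r.t. Lebesgue measure:
    (L h)(x) = (h(x/2) + h((x+1)/2)) / 2, one term per inverse branch,
    each weighted by 1/|D'| = 1/2."""
    return lambda x: (h(x / 2) + h((x + 1) / 2)) / 2

arcs = [(Fraction(0), Fraction(1)), (Fraction(1, 3), Fraction(3, 4)), (Fraction(1, 5), Fraction(2, 7))]
print(all(total_length(doubling_preimage(a, b)) == b - a for a, b in arcs))  # True
constant = transfer(lambda x: Fraction(1))
tilted = transfer(lambda x: 2 * x)
print(constant(Fraction(3, 10)), tilted(Fraction(3, 10)))  # 1 4/5
```

The second line shows that $\mathcal L$ leaves the constant density unchanged, while it sends the density $2x$ (value $3/5$ at $x=3/10$) to a different function (value $4/5$ there).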
## Ergodicity and Time Averages
An invariant measure may still split into independent statistical components. The next question is whether there are nontrivial invariant events, since such events separate the system into pieces that never exchange mass under the dynamics. Ergodicity is the condition that no invariant event has measure strictly between $0$ and $1$.
[definition: Ergodic Measure]
Let $(X,\mathcal B,\mu,T)$ be a measure-preserving system. The measure $\mu$ is ergodic for $T$ if every $A\in\mathcal B$ satisfying $T^{-1}A=A$ has $\mu(A)\in\{0,1\}$.
[/definition]
The definition says that the only invariant yes-or-no questions have deterministic answers almost everywhere. Without this condition, a system may look statistically stable only because it is a mixture of several independent behaviours; for example, a convex combination of two fixed-point measures is invariant but still remembers which fixed point was chosen. In practice, the functional version is often used: if $f\in L^1(X,\mathcal B,\mu)$ and $f\circ T=f$ $\mu$-a.e., then $f$ is constant $\mu$-a.e.
[example: Irrational Rotation Is Ergodic for Haar Measure]
Let $R_\alpha(x)=x+\alpha\pmod 1$ with $\alpha$ irrational, and let $m$ be normalized Haar measure on $\mathbb T$. We show that every $R_\alpha$-invariant Borel set has measure $0$ or $1$. Let $A\subseteq\mathbb T$ satisfy $R_\alpha^{-1}A=A$, and write
\begin{align*}
\widehat{\mathbb{1}_A}(n)=\int_{\mathbb T}\mathbb{1}_A(x)e^{-2\pi i n x}\,dm(x).
\end{align*}
Since $R_\alpha^{-1}A=A$, we have $\mathbb{1}_A(R_\alpha x)=\mathbb{1}_A(x)$ for $m$-a.e. $x$. Therefore
\begin{align*}
\widehat{\mathbb{1}_A}(n)=\int_{\mathbb T}\mathbb{1}_A(R_\alpha x)e^{-2\pi i n x}\,dm(x).
\end{align*}
Using translation-invariance of Haar measure and the substitution $y=x+\alpha$, the last integral becomes
\begin{align*}
\int_{\mathbb T}\mathbb{1}_A(y)e^{-2\pi i n (y-\alpha)}\,dm(y)=e^{2\pi i n\alpha}\int_{\mathbb T}\mathbb{1}_A(y)e^{-2\pi i n y}\,dm(y).
\end{align*}
Thus
\begin{align*}
\widehat{\mathbb{1}_A}(n)=e^{2\pi i n\alpha}\widehat{\mathbb{1}_A}(n).
\end{align*}
Equivalently,
\begin{align*}
(1-e^{2\pi i n\alpha})\widehat{\mathbb{1}_A}(n)=0.
\end{align*}
If $n\neq 0$, then $n\alpha\notin\mathbb Z$ because $\alpha$ is irrational, so $e^{2\pi i n\alpha}\neq 1$. Hence
\begin{align*}
\widehat{\mathbb{1}_A}(n)=0
\end{align*}
for every $n\neq 0$.
The only possibly nonzero Fourier coefficient is
\begin{align*}
\widehat{\mathbb{1}_A}(0)=\int_{\mathbb T}\mathbb{1}_A\,dm=m(A).
\end{align*}
By the uniqueness of [Fourier series](/page/Fourier%20Series) in $L^2(\mathbb T)$, $\mathbb{1}_A=m(A)$ $m$-a.e. Since $\mathbb{1}_A$ only takes the values $0$ and $1$, the constant $m(A)$ must be either $0$ or $1$. Therefore Haar measure is ergodic for the irrational rotation.
[/example]
Ergodicity earns its central role because it turns long orbit averages into space averages. To formulate this comparison precisely, we need notation for the finite averages accumulated along the first $N$ iterates of a point.
[definition: Birkhoff Average]
Let $(X,\mathcal B,\mu,T)$ be a measure-preserving system and let $f\in L^1(X,\mathcal B,\mu)$. For each $N\in\mathbb N$, the $N$th Birkhoff average is the measurable function $A_N f:X\to\mathbb R$ defined by
\begin{align*}
A_N f(x)=\frac{1}{N}\sum_{k=0}^{N-1} f(T^k x).
\end{align*}
[/definition]
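Birkhoff averages are straightforward to compute along a single orbit. The Python sketch below evaluates $A_Nf(x)$ for a user-supplied map and observable. The rotation number $\alpha=(\sqrt5-1)/2$, the observable $f=\mathbb{1}_{[0,1/3)}$, the starting point $x=0.1$, and the function names are illustrative assumptions. Floating-point iteration only approximates the true orbit, so the output is a numerical illustration, not a proof.

```python
import math
from typing import Callable

def birkhoff_average(T: Callable[[float], float], f: Callable[[float], float],
                     x: float, N: int) -> float:
    """A_N f(x) = (1/N) * sum_{k=0}^{N-1} f(T^k x), accumulated along one orbit."""
    total = 0.0
    for _ in range(N):
        total += f(x)
        x = T(x)
    return total / N

ALPHA = (math.sqrt(5) - 1) / 2  # illustrative irrational rotation number

def rotate(x: float) -> float:
    """R_alpha(x) = x + alpha mod 1."""
    return (x + ALPHA) % 1.0

def indicator_first_third(x: float) -> float:
    """f = indicator of [0, 1/3), whose space average is 1/3."""
    return 1.0 if x < 1 / 3 else 0.0

average = birkhoff_average(rotate, indicator_first_third, 0.1, 100_000)
print(average)  # close to the space average 1/3
```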
The limit, when it exists, is a time average along the orbit of $x$. The comparison object is the space average $\int_X f\,d\mu$. The next theorem shows that the limit exists $\mu$-almost everywhere and that, in the ergodic case, it agrees with the space average.
[quotetheorem:518]
The hypotheses in Birkhoff's theorem each carry statistical content. Measure-preservation keeps all iterates of $f$ in the same $L^1$ scale, so the averages are not biased by systematic loss or creation of mass. Integrability is the minimal assumption ensuring that the space average is finite and that exceptionally large values can be controlled by truncation. Ergodicity is used only at the final step. Without it, the pointwise limit still exists, but it equals the conditional expectation of $f$ given the $\sigma$-algebra of invariant sets. That is the average of $f$ over the invariant component containing $x$, which need not be the global number $\int_X f\,d\mu$. The theorem also does not give [uniform convergence](/page/Uniform%20Convergence) in $x$, a rate of convergence, or decorrelation estimates; those require stronger mixing or spectral hypotheses.
Birkhoff's theorem is a pointwise statement: it controls the averages along almost every individual orbit, not merely an averaged error in norm. The Hilbert-space version is older and cleaner, and it explains why averaging an isometry should project onto invariant vectors.
[quotetheorem:3448]
In dynamics, this is applied to $H=L^2(X,\mathcal B,\mu)$ and $Uf=f\circ T$. Measure-preservation makes $U$ an isometry of $H$, and $U$ is unitary when $T$ is invertible. The result is included here as context: the course uses it to compare mean convergence with Birkhoff's almost-everywhere convergence, rather than as the main proof route.
[example: Frequency of Symbols in a Bernoulli Shift]
Let $(\Sigma_2,\mathcal B,\mu_p,\sigma)$ be the Bernoulli shift, and let $f(x)=\mathbb{1}_{\{x_0=1\}}$. For each $k\geq 0$, the zeroth coordinate of $\sigma^k x$ is $x_k$, so
\begin{align*}
f(\sigma^k x)=\mathbb{1}_{\{(\sigma^k x)_0=1\}}=\mathbb{1}_{\{x_k=1\}}.
\end{align*}
Therefore the $N$th Birkhoff average of $f$ is exactly the empirical frequency of $1$s among the first $N$ symbols:
\begin{align*}
A_Nf(x)=\frac{1}{N}\sum_{k=0}^{N-1}f(\sigma^k x)=\frac{1}{N}\sum_{k=0}^{N-1}\mathbb{1}_{\{x_k=1\}}.
\end{align*}
The space average is the $\mu_p$-measure of the one-symbol cylinder $[1]=\{x\in\Sigma_2:x_0=1\}$:
\begin{align*}
\int_{\Sigma_2} f\,d\mu_p=\int_{\Sigma_2}\mathbb{1}_{[1]}\,d\mu_p=\mu_p([1]).
\end{align*}
By the defining cylinder formula for the Bernoulli measure,
\begin{align*}
\mu_p([1])=p_1=p.
\end{align*}
Since the Bernoulli shift is ergodic, the *[Birkhoff Ergodic Theorem](/theorems/518)* gives
\begin{align*}
\frac{1}{N}\sum_{k=0}^{N-1}\mathbb{1}_{\{x_k=1\}}\to p
\end{align*}
for $\mu_p$-a.e. sequence $x$. Thus, for almost every Bernoulli sequence, the long-run frequency of the symbol $1$ equals its assigned probability $p$.
[/example]
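The Bernoulli example can be simulated by sampling a finite block of a $\mu_p$-typical sequence. In the sketch below, independent symbols are drawn with Python's `random.Random`. The sample size, seed, and function name are arbitrary choices, and a finite sample only approximates the almost-sure limit.

```python
import random

def empirical_frequency_of_ones(p: float, n_symbols: int, seed: int = 0) -> float:
    """Draw x_0, ..., x_{N-1} i.i.d. with P(x_k = 1) = p and return the Birkhoff
    average A_N f(x) of f = indicator of [1], i.e. the frequency of 1s."""
    rng = random.Random(seed)
    ones = sum(1 for _ in range(n_symbols) if rng.random() < p)
    return ones / n_symbols

frequencies = {p: empirical_frequency_of_ones(p, 200_000) for p in (0.2, 0.5, 0.9)}
for p, freq in frequencies.items():
    print(f"p = {p}: frequency of 1s = {freq:.4f}")
```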
## Ergodic Decomposition
If an invariant measure is not ergodic, the previous theorem still gives orbit averages, but the limiting value may depend on which invariant component contains the starting point. The structural question is whether every invariant measure can be assembled from ergodic measures. The [ergodic decomposition theorem](/theorems/3453) says that this is the correct picture under the standard hypotheses used in the course.
[quotetheorem:3453]
This theorem is stated as a structural result rather than proved in this course. Its proof uses measurable selection and Choquet-type ideas: the compact convex set of invariant probability measures has ergodic measures as its extreme points, and an invariant measure decomposes into extreme components.
[remark: Interpretation of Ergodic Components]
The measure $\Pi$ is a distribution over statistical behaviours. Sampling first an ergodic measure $\nu$ according to $\Pi$ and then sampling a point according to $\nu$ reproduces the original measure $\mu$. In this sense, non-ergodic invariant measures encode uncertainty about which ergodic regime the system is in.
[/remark]
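The two-stage sampling in the remark can be simulated. As a hypothetical illustration, let $\Pi$ put mass $1/2$ on each of two Bernoulli measures with $\mathbb P(x_k=1)=\theta$, for $\theta=0.2$ and $\theta=0.8$. The mixture $\mu$ is shift-invariant but not ergodic, and $\mu([1])=0.5$. In the sketch below, each sampled sequence should have a frequency of $1$s close to its own component's $\theta$ and never close to the space average $0.5$. The Birkhoff limit therefore records which ergodic regime was drawn.

```python
import random

def sample_component_frequency(weights, thetas, n_symbols, rng):
    """Two-stage sampling: pick an ergodic component (a Bernoulli measure with
    P(x_k = 1) = theta) according to `weights`, then sample n_symbols from it
    and return (theta, frequency of 1s)."""
    theta = rng.choices(thetas, weights=weights, k=1)[0]
    ones = sum(1 for _ in range(n_symbols) if rng.random() < theta)
    return theta, ones / n_symbols

rng = random.Random(1)
weights, thetas = [0.5, 0.5], [0.2, 0.8]  # hypothetical decomposition measure Pi
space_average = sum(w * t for w, t in zip(weights, thetas))  # mu([1]) = 0.5
samples = [sample_component_frequency(weights, thetas, 20_000, rng) for _ in range(20)]
for theta, freq in samples:
    print(f"component theta={theta}: frequency of 1s = {freq:.3f}")
print("space average mu([1]) =", space_average)
```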
The theorem also clarifies why ergodicity is not an exotic condition. It is the analogue of studying indecomposable components before forming mixtures.
[example: Mixture of Two Fixed Points]
Let $T:[0,1]\to[0,1]$ satisfy $T(0)=0$ and $T(1)=1$, and fix $a\in[0,1]$. Define
\begin{align*}
\mu=a\delta_0+(1-a)\delta_1.
\end{align*}
We first verify invariance. For any Borel set $A\subseteq[0,1]$, a point mass satisfies $\delta_y(T^{-1}A)=1$ exactly when $y\in T^{-1}A$, which is equivalent to $T(y)\in A$. Thus
\begin{align*}
\delta_y(T^{-1}A)=\delta_{T(y)}(A).
\end{align*}
Using $T(0)=0$ gives
\begin{align*}
\delta_0(T^{-1}A)=\delta_{T(0)}(A)=\delta_0(A).
\end{align*}
Using $T(1)=1$ gives
\begin{align*}
\delta_1(T^{-1}A)=\delta_{T(1)}(A)=\delta_1(A).
\end{align*}
Therefore
\begin{align*}
\mu(T^{-1}A)=a\delta_0(T^{-1}A)+(1-a)\delta_1(T^{-1}A)=a\delta_0(A)+(1-a)\delta_1(A)=\mu(A).
\end{align*}
So $\mu$ is $T$-invariant.
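The fixed-point measures $\delta_0$ and $\delta_1$ are ergodic: if $B$ is invariant, then $\delta_0(B)$ is either $0$ or $1$ according as $0\notin B$ or $0\in B$, and the same argument applies to $\delta_1$. When $0<a<1$, the two fixed points lie in different invariant sets. The singleton $\{0\}$ need not itself satisfy $T^{-1}\{0\}=\{0\}$, since other points may be mapped onto $0$, so we use the set of points that eventually land on $0$:
\begin{align*}
B=\{x\in[0,1]:T^n x=0\text{ for some }n\geq 0\}.
\end{align*}
Assuming, as usual, that $T$ is Borel measurable, $B$ is a Borel set. It is invariant. If $x\in T^{-1}B$, then $T^n(Tx)=0$ for some $n\geq 0$, so $x\in B$. Conversely, if $T^nx=0$, then $T^n(Tx)=T(T^nx)=T(0)=0$, so $x\in T^{-1}B$. Since $0\in B$ and $T^n(1)=1\neq 0$ for every $n$, we have $1\notin B$, and therefore
\begin{align*}
\mu(B)=a\delta_0(B)+(1-a)\delta_1(B)=a\cdot 1+(1-a)\cdot 0=a,
\end{align*}
which lies strictly between $0$ and $1$. Hence the measure is not ergodic for $0<a<1$. Thus $\mu$ is ergodic exactly at the endpoints $a=0$ and $a=1$, and for intermediate $a$ it is the convex combination of the two ergodic fixed-point measures.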
[/example]
## Markov Measures and Subshifts of Finite Type
The symbolic systems from the earlier chapter often carry invariant measures with memory. The problem is to assign probabilities to allowed words in a way that respects both the transition rules and the shift. Markov measures do this by combining a stationary distribution with a transition matrix.
[definition: Markov Measure on a Subshift of Finite Type]
Let $A$ be a $\{0,1\}$ transition matrix on the alphabet $\{1,\dots,r\}$, let $\Sigma_A$ be the associated one-sided subshift of finite type, let $P=(P_{ij})$ be a stochastic matrix with $P_{ij}=0$ whenever $A_{ij}=0$, and let $\pi$ be a probability vector satisfying $\pi P=\pi$. The Markov measure $\mu_{\pi,P}$ on cylinders is given by
\begin{align*}
\mu_{\pi,P}([i_0i_1\cdots i_n])=\pi_{i_0}P_{i_0i_1}P_{i_1i_2}\cdots P_{i_{n-1}i_n}
\end{align*}
for every admissible word $i_0i_1\cdots i_n$.
[/definition]
Stationarity of $\pi$ is exactly the condition that the cylinder probabilities are unchanged after shifting. Thus Markov measures are the natural invariant probabilities for topological Markov chains, in the same way Bernoulli measures are natural for full shifts.
[example: Golden Mean Markov Measure]
Let $\Sigma_A\subseteq\{0,1\}^{\mathbb N}$ be the golden mean shift, so the allowed transitions are $00$, $01$, and $10$, while $11$ is forbidden. Choose $q\in(0,1)$ and define the transition probabilities by
\begin{align*}
P_{00}=1-q,\quad P_{01}=q,\quad P_{10}=1,\quad P_{11}=0.
\end{align*}
Write $\pi=(\pi_0,\pi_1)$. The stationarity condition $\pi P=\pi$ says that
\begin{align*}
\pi_0(1-q)+\pi_1=\pi_0.
\end{align*}
It also says that
\begin{align*}
\pi_0q=\pi_1.
\end{align*}
Together with normalization,
\begin{align*}
\pi_0+\pi_1=1.
\end{align*}
Substituting $\pi_1=q\pi_0$ into the normalization equation gives
\begin{align*}
\pi_0+q\pi_0=(1+q)\pi_0=1.
\end{align*}
Hence
\begin{align*}
\pi_0=\frac{1}{1+q}.
\end{align*}
Then
\begin{align*}
\pi_1=q\pi_0=\frac{q}{1+q}.
\end{align*}
The remaining stationarity equation is satisfied because
\begin{align*}
\pi_0(1-q)+\pi_1=\frac{1-q}{1+q}+\frac{q}{1+q}=\frac{1}{1+q}=\pi_0.
\end{align*}
For an admissible cylinder $[i_0i_1\cdots i_n]$, the Markov formula gives
\begin{align*}
\mu_{\pi,P}([i_0i_1\cdots i_n])=\pi_{i_0}P_{i_0i_1}P_{i_1i_2}\cdots P_{i_{n-1}i_n}.
\end{align*}
Because $\pi_0>0$, $\pi_1>0$, $P_{00}=1-q>0$, $P_{01}=q>0$, and $P_{10}=1>0$, every cylinder whose word has no consecutive $1$s has positive measure. If a word contains $11$, then one factor in the product is $P_{11}=0$, so the cylinder has measure $0$. Thus the measure gives positive mass exactly to the cylinders allowed by the golden mean rule.
Now verify shift-invariance on cylinders. For an admissible word $i_0i_1\cdots i_n$, the preimage $\sigma^{-1}[i_0i_1\cdots i_n]$ is the disjoint union of the cylinders $[s i_0i_1\cdots i_n]$ over those symbols $s\in\{0,1\}$ for which the transition $s\to i_0$ is allowed. Therefore finite additivity and the cylinder formula give
\begin{align*}
\mu_{\pi,P}(\sigma^{-1}[i_0i_1\cdots i_n])=\sum_{s:P_{s i_0}>0}\pi_sP_{s i_0}P_{i_0i_1}\cdots P_{i_{n-1}i_n}.
\end{align*}
Terms with $P_{s i_0}=0$ contribute $0$, so this is the same as
\begin{align*}
\mu_{\pi,P}(\sigma^{-1}[i_0i_1\cdots i_n])=\left(\sum_{s\in\{0,1\}}\pi_sP_{s i_0}\right)P_{i_0i_1}\cdots P_{i_{n-1}i_n}.
\end{align*}
By stationarity, $\sum_{s\in\{0,1\}}\pi_sP_{s i_0}=\pi_{i_0}$, and hence
\begin{align*}
\mu_{\pi,P}(\sigma^{-1}[i_0i_1\cdots i_n])=\pi_{i_0}P_{i_0i_1}\cdots P_{i_{n-1}i_n}.
\end{align*}
The right-hand side is exactly
\begin{align*}
\mu_{\pi,P}([i_0i_1\cdots i_n]).
\end{align*}
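The cylinders, together with the empty set, form a $\pi$-system generating the Borel $\sigma$-algebra of $\Sigma_A$, since two cylinders intersect in a cylinder or in the empty set. The class of Borel sets $E\subseteq\Sigma_A$ with $\mu_{\pi,P}(\sigma^{-1}E)=\mu_{\pi,P}(E)$ contains $\Sigma_A$ and is closed under complements and countable disjoint unions. Dynkin's $\pi$-$\lambda$ theorem therefore extends the equality to all Borel sets. Thus $\mu_{\pi,P}$ is shift-invariant. The parameter $q$ is the transition probability from $0$ to $1$, while the forbidden transition $1\to 1$ remains impossible.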
[/example]
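The stationarity and shift-invariance computations in the golden mean example can be checked numerically on all short cylinders. The sketch below assumes NumPy and the illustrative value $q=0.3$, and encodes the symbols as $0$ and $1$. The names `cylinder_measure` and `preimage_measure` are hypothetical helpers that implement the cylinder formula and the sum over one-symbol extensions $[s\,i_0\cdots i_n]$.

```python
import itertools
import numpy as np

def golden_mean_markov(q: float):
    """Transition matrix P and stationary vector pi of the golden mean Markov measure."""
    P = np.array([[1 - q, q],
                  [1.0, 0.0]])
    pi = np.array([1 / (1 + q), q / (1 + q)])
    return P, pi

def cylinder_measure(word, P, pi) -> float:
    """mu([i_0 ... i_n]) = pi_{i_0} P_{i_0 i_1} ... P_{i_{n-1} i_n}."""
    value = pi[word[0]]
    for i, j in zip(word, word[1:]):
        value *= P[i, j]
    return value

def preimage_measure(word, P, pi) -> float:
    """mu(sigma^{-1}[word]) = sum over s of mu([s word]); forbidden s contribute 0."""
    return sum(cylinder_measure((s,) + tuple(word), P, pi) for s in range(len(pi)))

q = 0.3
P, pi = golden_mean_markov(q)
assert np.allclose(pi @ P, pi)          # stationarity pi P = pi
assert np.allclose(P.sum(axis=1), 1.0)  # P is stochastic
for n in range(1, 6):
    for word in itertools.product((0, 1), repeat=n):
        assert np.isclose(preimage_measure(word, P, pi), cylinder_measure(word, P, pi))
print("shift-invariant on all cylinders of length <= 5")
```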
Markov measures may be ergodic or mixing depending on the transition structure. If $P$ is irreducible, the Markov measure $\mu_{\pi,P}$ is ergodic. If $P$ is also aperiodic, as for irreducible and aperiodic stochastic matrices compatible with a mixing subshift, the associated Markov measure is mixing for the shift.
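For the golden mean Markov measure, the correlation of one-symbol cylinders is $\mu_{\pi,P}([i]\cap\sigma^{-n}[j])=\pi_i(P^n)_{ij}$. Convergence of $P^n$ to the matrix whose rows all equal $\pi$ therefore shows decay of correlations for these particular sets. The sketch below uses NumPy and the illustrative value $q=0.3$. It checks only one-symbol cylinders and is not a proof of mixing. Here $P$ has eigenvalues $1$ and $-q$, so the gap should shrink roughly like $q^n$.

```python
import numpy as np

q = 0.3
P = np.array([[1 - q, q], [1.0, 0.0]])   # golden mean transition matrix
pi = np.array([1 / (1 + q), q / (1 + q)])

def cylinder_correlation(i: int, j: int, n: int) -> float:
    """mu([i] ∩ sigma^{-n}[j]) = pi_i (P^n)_{ij} for the Markov measure."""
    return pi[i] * np.linalg.matrix_power(P, n)[i, j]

def max_gap(n: int) -> float:
    """Largest deviation from independence over all pairs of one-symbol cylinders."""
    return max(abs(cylinder_correlation(i, j, n) - pi[i] * pi[j])
               for i in range(2) for j in range(2))

for n in (1, 2, 5, 10, 20, 40):
    print(n, max_gap(n))
```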
## Mixing Properties
Ergodicity says that invariant sets have only zero or full measure, but it does not quantify how images of sets spread through the space. Mixing conditions ask whether events separated by a long time become asymptotically independent. They form a hierarchy of progressively stronger independence properties.
[definition: Strong Mixing]
Let $(X,\mathcal B,\mu,T)$ be a measure-preserving system. The system is strongly mixing if
\begin{align*}
\mu(A\cap T^{-n}B)\to \mu(A)\mu(B)
\end{align*}
as $n\to\infty$ for all $A,B\in\mathcal B$.
[/definition]
Strong mixing says that knowing whether $x\in A$ becomes asymptotically irrelevant to whether $T^n x\in B$. Requiring this convergence at every sufficiently large time is often too rigid: correlations may have exceptional spikes caused by arithmetic or spectral structure even though those spikes occur with negligible average weight. This motivates an averaged condition that keeps the idea of losing correlations while allowing such sparse exceptional times.
[definition: Weak Mixing]
Let $(X,\mathcal B,\mu,T)$ be a measure-preserving system. The system is weakly mixing if
\begin{align*}
\frac{1}{N}\sum_{n=0}^{N-1}\left|\mu(A\cap T^{-n}B)-\mu(A)\mu(B)\right|\to 0
\end{align*}
for all $A,B\in\mathcal B$.
[/definition]
Weak mixing permits exceptional times with noticeable correlation, provided their average contribution vanishes. To place these notions in the hierarchy, we need to verify that strong asymptotic independence rules out nontrivial invariant sets.
[quotetheorem:3436]
Strong mixing is much stronger than the argument needs: the proof only uses asymptotic independence for the single pair $(A,A)$ when $A$ is invariant. The converse fails in important examples. [Irrational rotations are ergodic](/theorems/3429), but they are not mixing, and not even weakly mixing, because rotations preserve Fourier frequencies instead of separating them. Weak mixing lies between these behaviours. It rules out persistent eigenfunction correlations while still allowing the correlations to deviate from $\mu(A)\mu(B)$ along a sparse set of times. Thus the hierarchy records genuine strengthening, not merely different language for the same statistical property:
\begin{align*}
\text{strong mixing}\implies \text{weak mixing}\implies \text{ergodicity}.
\end{align*}
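The difference between a rotation and an expanding map can be seen on the single set $A=[0,1/2)$. For the rotation, $R_\alpha^{-n}A=A+t$ with $t=-n\alpha \bmod 1$, and $m(A\cap(A+t))=|t-1/2|$. This oscillates between $0$ and $1/2$. By equidistribution of $n\alpha$, the Cesàro average of $|m(A\cap R_\alpha^{-n}A)-1/4|$ should approach $\int_0^1\bigl||t-1/2|-1/4\bigr|\,dt=1/8$ rather than $0$, consistent with the rotation not being weakly mixing. For the doubling map, $D^{-n}A$ is a union of $2^n$ short intervals, and the overlap equals $1/4=m(A)^2$ exactly for every $n\geq 1$. The Python sketch below computes both, using the illustrative choice $\alpha=\sqrt2-1$ and hypothetical function names.

```python
import math
from fractions import Fraction

def rotation_correlation(alpha: float, n: int) -> float:
    """m(A ∩ R_alpha^{-n}A) for A = [0, 1/2): R_alpha^{-n}A = A + t with
    t = -n*alpha mod 1, and |[0,1/2) ∩ ([0,1/2) + t)| = |t - 1/2| for t in [0, 1)."""
    t = (-n * alpha) % 1.0
    return abs(t - 0.5)

def doubling_correlation(n: int) -> Fraction:
    """m(A ∩ D^{-n}A) for A = [0, 1/2), computed exactly:
    D^{-n}A is the union of [k/2^n, (k + 1/2)/2^n) for k = 0, ..., 2^n - 1."""
    half, width = Fraction(1, 2), Fraction(1, 2**n)
    total = Fraction(0)
    for k in range(2**n):
        lo, hi = k * width, (k + half) * width
        total += max(Fraction(0), min(hi, half) - lo)  # length of [lo, hi) ∩ [0, 1/2)
    return total

alpha = math.sqrt(2) - 1
N = 10_000
cesaro = sum(abs(rotation_correlation(alpha, n) - 0.25) for n in range(N)) / N
print([str(doubling_correlation(n)) for n in range(1, 8)])  # each entry is '1/4'
print(round(cesaro, 4))  # stays near 1/8 instead of tending to 0
```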
[example: Doubling Map Is Mixing for Haar Measure]
For the doubling map $D(x)=2x\pmod 1$ on $\mathbb T$, we show that Haar measure $m$ is strongly mixing, meaning that
\begin{align*}
m(A\cap D^{-n}B)\to m(A)m(B)
\end{align*}
for all Borel sets $A,B\subseteq\mathbb T$.
Write $e_k(x)=e^{2\pi i kx}$. For every $n\geq 0$,
\begin{align*}
e_k(D^n x)=e^{2\pi i k(2^n x)}=e_{2^n k}(x).
\end{align*}
If $j,k\in\mathbb Z$, then
\begin{align*}
\int_{\mathbb T} e_j(x)e_k(D^n x)\,dm(x)=\int_{\mathbb T}e^{2\pi i(j+2^n k)x}\,dm(x).
\end{align*}
The last integral is $1$ when $j+2^n k=0$ and $0$ otherwise, because nonzero Fourier modes have Haar integral $0$.
Now let
\begin{align*}
P(x)=\sum_{|j|\leq J}a_j e_j(x)
\end{align*}
and
\begin{align*}
Q(x)=\sum_{|k|\leq K}b_k e_k(x)
\end{align*}
be trigonometric polynomials. Then
\begin{align*}
\int_{\mathbb T}P(x)Q(D^n x)\,dm(x)=\sum_{|j|\leq J}\sum_{|k|\leq K}a_jb_k\int_{\mathbb T}e_{j+2^n k}(x)\,dm(x).
\end{align*}
Choose $n$ so large that $2^n>J$. If $k\neq 0$, then $|2^n k|\geq 2^n>J$, so no integer $j$ with $|j|\leq J$ can satisfy $j+2^n k=0$. Hence only the term $j=0,k=0$ remains, and
\begin{align*}
\int_{\mathbb T}P(x)Q(D^n x)\,dm(x)=a_0b_0.
\end{align*}
Since $a_0=\int_{\mathbb T}P\,dm$ and $b_0=\int_{\mathbb T}Q\,dm$, this gives
\begin{align*}
\int_{\mathbb T}P(x)Q(D^n x)\,dm(x)=\left(\int_{\mathbb T}P\,dm\right)\left(\int_{\mathbb T}Q\,dm\right)
\end{align*}
for all sufficiently large $n$.
Take Borel sets $A,B\subseteq\mathbb T$, and put $f=\mathbb{1}_A$ and $g=\mathbb{1}_B$. By [density of trigonometric polynomials](/theorems/1219) in $L^2(\mathbb T,m)$, choose $P,Q$ with
\begin{align*}
\|f-P\|_2<\varepsilon
\end{align*}
and
\begin{align*}
\|g-Q\|_2<\varepsilon.
\end{align*}
Using Cauchy-Schwarz and $D$-invariance of $m$,
\begin{align*}
\left|\int f(x)g(D^n x)\,dm-\int P(x)Q(D^n x)\,dm\right|\leq \varepsilon\|g\|_2+\|P\|_2\varepsilon.
\end{align*}
Also,
\begin{align*}
\left|\int f\,dm\int g\,dm-\int P\,dm\int Q\,dm\right|\leq \varepsilon\|g\|_2+\|P\|_2\varepsilon.
\end{align*}
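We have $m(A\cap D^{-n}B)=\int f(x)g(D^nx)\,dm$ and $m(A)m(B)=\int f\,dm\int g\,dm$. For all sufficiently large $n$, the polynomial correlation equals the product of the polynomial integrals, so the two displayed estimates give
\begin{align*}
\left|m(A\cap D^{-n}B)-m(A)m(B)\right|\leq 2\varepsilon\bigl(\|g\|_2+\|P\|_2\bigr)\leq 2\varepsilon(2+\varepsilon)
\end{align*}
for all sufficiently large $n$, because $\|g\|_2\leq 1$ and $\|P\|_2\leq\|f\|_2+\varepsilon\leq 1+\varepsilon$. Since $\varepsilon>0$ was arbitrary,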
\begin{align*}
m(A\cap D^{-n}B)\to m(A)m(B).
\end{align*}
Thus the doubling map is strongly mixing for Haar measure: expansion doubles Fourier frequencies, and fixed observables lose correlation because their nonzero frequencies eventually cannot match.
[/example]
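The frequency-matching step can be checked exactly for concrete trigonometric polynomials. The integrand $P(x)Q(D^nx)$ is a trigonometric polynomial of degree at most $J+2^nK$. Averaging it over $M>J+2^nK$ equally spaced points therefore gives its integral exactly, up to rounding. The coefficients below are hypothetical. With $J=4$, the correlation must equal $a_0b_0=0.125$ once $2^n>4$. For $n=0$ and $n=1$, the matched pairs with $j+2^nk=0$ contribute extra terms, giving $0.275$ and $0.375$.

```python
import numpy as np

def trig_poly(coeffs):
    """Return x -> sum_j c_j e^{2 pi i j x} for a finite coefficient dict {j: c_j}."""
    def evaluate(x):
        return sum(c * np.exp(2j * np.pi * j * x) for j, c in coeffs.items())
    return evaluate

def polynomial_correlation(p_coeffs, q_coeffs, n: int) -> complex:
    """Integral of P(x) Q(D^n x) over the circle, computed exactly as an average
    over M = J + 2^n K + 1 equally spaced points (M exceeds the integrand's degree)."""
    J = max(abs(j) for j in p_coeffs)
    K = max(abs(k) for k in q_coeffs)
    M = J + 2**n * K + 1
    x = np.arange(M) / M
    P, Q = trig_poly(p_coeffs), trig_poly(q_coeffs)
    return complex(np.mean(P(x) * Q((2**n * x) % 1.0)))

p_coeffs = {0: 0.5, 1: 0.3, 2: 0.1, -4: 0.2}  # a_j, so J = 4
q_coeffs = {0: 0.25, 2: 1.0, -1: 0.5}         # b_k, so K = 2
for n in range(6):
    print(n, round(polynomial_correlation(p_coeffs, q_coeffs, n).real, 6))
```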
## Summary of the Measure-Theoretic Viewpoint
Invariant measures provide the probability spaces on which chaotic dynamics has statistical meaning. Krylov-Bogolyubov guarantees existence on compact topological systems, while examples from rotations, shifts, Markov chains, periodic orbits, and expanding maps show that invariant measures can be uniform, atomic, symbolic, or absolutely continuous.
Ergodicity is the assertion that an invariant measure has no nontrivial invariant component. Birkhoff's theorem makes this condition operational: for an ergodic system, time averages of integrable observables agree almost everywhere with their space averages. Ergodic decomposition then explains non-ergodic measures as mixtures of ergodic statistical regimes, and mixing properties strengthen ergodicity by imposing asymptotic independence over time.
Invariant measures and ergodicity explain how time averages relate to space averages for typical orbits, giving the statistical meaning behind chaotic motion. The next chapter refines this by asking how fast tangent vectors grow and how that growth controls the geometry of smooth ergodic systems.
# 9. Lyapunov Exponents and Smooth Ergodic Theory
This chapter turns the qualitative picture of chaos into quantitative rates of stretching. Earlier chapters described recurrence, symbolic coding, hyperbolicity, entropy, and invariant measures; here the central question is how a typical tangent vector grows under repeated differentiation of the dynamics. Lyapunov exponents measure these exponential growth rates, and smooth ergodic theory studies how those rates control instability, entropy, and dimension.
## Measuring Exponential Growth Along Orbits
The first problem is to assign a numerical stretching rate to an orbit without assuming that the derivative is constant. For a smooth map $f:M\to M$, the derivative $df_x:T_xM\to T_{f(x)}M$ changes with $x$, so the derivative along an orbit is a product of different linear maps. The correct object is therefore not a single matrix but a linear cocycle over the dynamics.
[definition: Derivative Cocycle]
Let $M$ be a smooth Riemannian manifold and let $f:M\to M$ be a $C^1$ map. The derivative cocycle over $f$ is the family of linear maps
\begin{align*}
df_x^n &: T_xM \to T_{f^n(x)}M,\qquad n\geq 0,
\end{align*}
defined by $df_x^0=\operatorname{id}_{T_xM}$ and
\begin{align*}
df_x^n = df_{f^{n-1}(x)}\circ \cdots \circ df_{f(x)}\circ df_x
\end{align*}
for $n\geq 1$.
[/definition]
The derivative cocycle identity
\begin{align*}
df_x^{n+m}=df_{f^n(x)}^m\circ df_x^n
\end{align*}
is the algebraic feature that replaces the usual law $A^{n+m}=A^mA^n$ for powers of one matrix. Since many arguments use only this multiplicative identity and not the manifold itself, we isolate the abstract cocycle notion next. This lets the same language cover derivative growth, random matrix products, and products driven by symbolic itineraries.
[definition: Linear Cocycle]
Let $(X,\mathcal B,\mu)$ be a probability space, let $T:X\to X$ be a measure-preserving map, and let $A:X\to GL(d,\mathbb R)$ be measurable. The linear cocycle generated by $A$ is the family of maps $A^n:X\to GL(d,\mathbb R)$ given by
\begin{align*}
A^n(x) = A(T^{n-1}x)\cdots A(Tx)A(x),\qquad n\geq 1,
\end{align*}
with $A^0(x)=I_d$.
[/definition]
Equivalently, each value $A^n(x)$ acts as a linear map $\mathbb R^d\to\mathbb R^d$ on tangent or coordinate vectors.
This abstract form includes derivative cocycles written in local bundle coordinates, random matrix products, and matrix products driven by symbolic dynamics. The next question is how to extract growth rates from $A^n(x)v$.
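The products $A^n(x)$ are formed by multiplying each newest factor on the left. The sketch below uses NumPy with a hypothetical generator $A(x)$, a rotation by angle $2\pi x$ composed with $\operatorname{diag}(2,1/2)$, over the irrational rotation $T(x)=x+\alpha \bmod 1$ with the illustrative choice $\alpha=\sqrt2-1$. It checks the cocycle identity $A^{n+m}(x)=A^m(T^nx)A^n(x)$ numerically.

```python
import math
import numpy as np

ALPHA = math.sqrt(2) - 1  # illustrative base map: irrational rotation

def T(x: float) -> float:
    return (x + ALPHA) % 1.0

def A(x: float) -> np.ndarray:
    """Hypothetical generator: rotation by 2*pi*x composed with diag(2, 1/2); det = 1."""
    c, s = math.cos(2 * math.pi * x), math.sin(2 * math.pi * x)
    return np.array([[c, -s], [s, c]]) @ np.diag([2.0, 0.5])

def iterate(x: float, n: int) -> float:
    """T^n x."""
    for _ in range(n):
        x = T(x)
    return x

def cocycle(x: float, n: int) -> np.ndarray:
    """A^n(x) = A(T^{n-1}x) ... A(Tx) A(x), with A^0(x) = I."""
    product = np.eye(2)
    for _ in range(n):
        product = A(x) @ product  # newest factor multiplies on the left
        x = T(x)
    return product

x0, n, m = 0.3, 3, 4
lhs = cocycle(x0, n + m)
rhs = cocycle(iterate(x0, n), m) @ cocycle(x0, n)
print(np.allclose(lhs, rhs))  # True
```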
[definition: Lyapunov Exponent of a Vector]
Let $(X,\mathcal B,\mu)$ be a probability space, let $T:X\to X$ be a measure-preserving map, and let $A:X\to GL(d,\mathbb R)$ be a measurable generator of the linear cocycle $A^n(x):\mathbb R^d\to\mathbb R^d$. For $x\in X$ and $v\in \mathbb R^d\setminus\{0\}$, the Lyapunov exponent of $v$ at $x$ is the limit
\begin{align*}
\lambda(x,v)=\lim_{n\to\infty}\frac{1}{n}\log |A^n(x)v|,
\end{align*}
when the limit exists.
[/definition]
A single orbit may contain directions with different rates. The main theorem in this chapter says that, under an integrability hypothesis, almost every orbit admits a finite list of such rates and a corresponding splitting into subspaces.
[example: Constant Diagonal Cocycle]
Let $T:X\to X$ be any measure-preserving map and let $A(x)=\operatorname{diag}(2,1/3)$ for every $x\in X$. Since every factor in the cocycle product is the same diagonal matrix, multiplication of diagonal matrices gives
\begin{align*}
A^n(x)=A(T^{n-1}x)\cdots A(Tx)A(x)=\operatorname{diag}(2,1/3)^n=\operatorname{diag}(2^n,3^{-n}).
\end{align*}
For a vector on the first coordinate axis, $v=(a,0)$ with $a\neq 0$, we have
\begin{align*}
A^n(x)v=(2^n a,0).
\end{align*}
Thus
\begin{align*}
\frac{1}{n}\log |A^n(x)v|=\frac{1}{n}\log(2^n|a|)=\log 2+\frac{1}{n}\log |a|.
\end{align*}
Letting $n\to\infty$ gives the exponent $\log 2$.
For a vector on the second coordinate axis, $v=(0,b)$ with $b\neq 0$, we have
\begin{align*}
A^n(x)v=(0,3^{-n}b).
\end{align*}
Therefore
\begin{align*}
\frac{1}{n}\log |A^n(x)v|=\frac{1}{n}\log(3^{-n}|b|)=-\log 3+\frac{1}{n}\log |b|.
\end{align*}
Letting $n\to\infty$ gives the exponent $-\log 3$.
Finally, take $v=(a,b)$ with $a\neq 0$ and $b\neq 0$. Then
\begin{align*}
A^n(x)v=(2^n a,3^{-n}b).
\end{align*}
Using the Euclidean norm,
\begin{align*}
|A^n(x)v|=\sqrt{4^n a^2+3^{-2n}b^2}.
\end{align*}
Factoring out $4^n a^2$ inside the square root gives
\begin{align*}
|A^n(x)v|=2^n|a|\sqrt{1+\frac{3^{-2n}b^2}{4^n a^2}}.
\end{align*}
Since $3^{-2n}/4^n=1/36^n$, this is
\begin{align*}
|A^n(x)v|=2^n|a|\sqrt{1+\frac{b^2}{a^2 36^n}}.
\end{align*}
Taking logarithms,
\begin{align*}
\frac{1}{n}\log |A^n(x)v|=\log 2+\frac{1}{n}\log |a|+\frac{1}{2n}\log\left(1+\frac{b^2}{a^2 36^n}\right).
\end{align*}
The term $\frac{1}{n}\log |a|$ tends to $0$, and the last logarithmic term tends to $0$ because $\frac{b^2}{a^2 36^n}\to 0$. Hence every vector with both coordinates nonzero has exponent $\log 2$. The expanding first coordinate determines the asymptotic rate unless that coordinate is exactly absent.
[/example]
This example shows why the spectrum is not merely attached to an orbit: it is attached to an orbit together with invariant directions. The general theorem constructs those directions measurably rather than continuously.
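The limits in the constant diagonal example can be checked numerically. The sketch below evaluates $\frac1n\log|A^n(x)v|$ for $A=\operatorname{diag}(2,1/3)$ in log-space, so that $2^n$ and $3^{-n}$ never overflow or underflow. The function name is hypothetical. It also shows a finite-time effect that is invisible in the limit: for $v=(10^{-6},1)$ the value at $n=5$ is close to $-\log 3$, and only for large $n$ does the expanding coordinate take over.

```python
import math

def finite_exponent_diag(a: float, b: float, n: int) -> float:
    """(1/n) log |A^n v| for A = diag(2, 1/3) and v = (a, b) != 0, in log-space."""
    logs = []
    if a != 0:
        logs.append(n * math.log(2) + math.log(abs(a)))   # log |2^n a|
    if b != 0:
        logs.append(-n * math.log(3) + math.log(abs(b)))  # log |3^{-n} b|
    if not logs:
        raise ValueError("v must be nonzero")
    # log sqrt(u^2 + w^2) computed as a stable log-sum-exp of 2 log u and 2 log w
    top = max(logs)
    norm_log = top + 0.5 * math.log(sum(math.exp(2 * (lg - top)) for lg in logs))
    return norm_log / n

for v in [(1.0, 0.0), (0.0, 1.0), (1.0, 5.0), (1e-6, 1.0)]:
    print(v, [round(finite_exponent_diag(*v, n), 4) for n in (5, 50, 1000)])
```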
## Oseledets Splittings
The central existence problem is whether finite-time products settle into stable exponential rates for almost every initial condition. Birkhoff's ergodic theorem treats sums of scalar observables; Lyapunov exponents require a multiplicative version for matrices. The subadditive structure appears in quantities like $\log \|A^n(x)\|$, since operator norms satisfy submultiplicative estimates.
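The subadditive structure can be written out directly. Write $a_n(x)=\log\|A^n(x)\|$; this notation is introduced only for this display. The cocycle identity $A^{n+m}(x)=A^m(T^nx)A^n(x)$ and submultiplicativity of the operator norm give
\begin{align*}
\|A^{n+m}(x)\|&\leq\|A^m(T^nx)\|\,\|A^n(x)\|,\\
a_{n+m}(x)&\leq a_m(T^nx)+a_n(x).
\end{align*}
This is the inequality to which Kingman's subadditive ergodic theorem applies. When $\int_X\log^+\|A(x)\|\,d\mu<\infty$, that theorem gives almost-everywhere convergence of $a_n(x)/n$ to a limit in $[-\infty,\infty)$.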
[quotetheorem:7764]
[proofunderconstruction:7764]
Each hypothesis in the theorem protects a specific part of the conclusion. Without the forward integrability condition, occasional matrices with enormous norm can make $\frac{1}{n}\log\|A^n(x)\|$ fail to have an integrable asymptotic average; for instance, a shift cocycle with heavy-tailed diagonal entries can have logarithmic growth dominated by rare symbols rather than by a finite Birkhoff average. Without invertibility of the base and the fibre maps, the forward filtration still has meaning, but a two-sided equivariant direct-sum splitting can collapse because a singular matrix may send a whole direction to $0$. Without ergodicity, the exponents need not be constants: a measure split between two invariant components can record one spectrum on one component and a different spectrum on the other.
For a non-ergodic invariant measure, the same conclusion holds with exponents and multiplicities depending measurably on $x$. In the ergodic case the rates are almost surely constant, but what data should be recorded as the invariant of the cocycle? We need both the distinct rates and the number of independent directions belonging to each rate, because later entropy and dimension estimates count expanding directions with multiplicity.
[definition: Lyapunov Spectrum]
In the setting of Oseledets' theorem, the Lyapunov spectrum is the list
\begin{align*}
\lambda_1>\lambda_2>\cdots>\lambda_k
\end{align*}
together with the multiplicities $m_i=\dim E_i(x)$.
[/definition]
The multiplicities matter because entropy and dimension formulas count how many independent directions expand at each rate. When $A=df$ comes from a $C^1$ diffeomorphism, the derivative maps are fibrewise invertible and the two-sided theorem gives tangent directions determined by the long-time geometry of the orbit. For a noninvertible map or a derivative with singular points, the appropriate multiplicative ergodic theorem is usually a forward version: it may produce a filtration by growth rates rather than a two-sided invariant direct-sum splitting. This distinction is important in smooth dynamics, because endomorphisms and interval maps still have Lyapunov exponents even when the derivative cocycle does not fit the $GL(d,\mathbb R)$ theorem above.
[definition: Oseledets Regular Point]
Let $f:M\to M$ be a $C^1$ diffeomorphism preserving a probability measure $\mu$. A point $x\in M$ is Oseledets regular if the derivative cocycle $df_x^n:T_xM\to T_{f^n(x)}M$ admits Lyapunov exponents and a two-sided Oseledets splitting for the invertible derivative cocycle.
[/definition]
Oseledets regularity is an almost-everywhere property, not a topological property. A dense orbit may pass near many different geometric features, but the theorem asserts that the exponential tangent behaviour still has a definite asymptotic structure for almost every point with respect to the chosen invariant measure. For noninvertible maps, authors often use the same phrase for points where the forward derivative cocycle has the corresponding Lyapunov filtration; in these notes, "Oseledets splitting" refers to the invertible diffeomorphism case unless a forward formulation is explicitly named.
[example: Diagonal Toral Endomorphism]
Consider the map $f:\mathbb T^2\to\mathbb T^2$ induced by $B=\operatorname{diag}(2,3)$, so $f([x_1,x_2])=[2x_1,3x_2]$ modulo $\mathbb Z^2$. Since $f$ is linear in the covering coordinates on $\mathbb R^2$, its derivative at every point is the same linear map $B$. Therefore the forward derivative cocycle is
\begin{align*}
df_x^n=B^n=\operatorname{diag}(2^n,3^n)
\end{align*}
for every $x\in\mathbb T^2$ and every $n\geq 1$.
For $e_1=(1,0)$, applying the diagonal matrix gives
\begin{align*}
df_x^n e_1=(2^n,0).
\end{align*}
With the Euclidean norm, $|df_x^n e_1|=2^n$, so
\begin{align*}
\frac{1}{n}\log |df_x^n e_1|=\frac{1}{n}\log(2^n)=\log 2.
\end{align*}
For $e_2=(0,1)$, the same calculation gives
\begin{align*}
df_x^n e_2=(0,3^n).
\end{align*}
Hence $|df_x^n e_2|=3^n$, and
\begin{align*}
\frac{1}{n}\log |df_x^n e_2|=\frac{1}{n}\log(3^n)=\log 3.
\end{align*}
Thus the coordinate directions carry the forward Lyapunov exponents $\log 2$ and $\log 3$ with respect to Haar measure, and in fact at every point.
The map is not an invertible toral automorphism: its covering degree is
\begin{align*}
|\det B|=|2\cdot 3|=6,
\end{align*}
so a typical point has $6$ preimages rather than one. This is why the example belongs to the forward cocycle setting rather than the two-sided diffeomorphism statement; it also illustrates the constant diagonal rule that invariant coordinate directions grow at rates equal to the logarithms of the corresponding eigenvalue moduli.
[/example]
The diagonal case has no folding between directions. In more typical systems, finite-time behaviour can fluctuate before the asymptotic rates are visible.
## Finite-Time Exponents and Numerical Instability
The asymptotic definition of Lyapunov exponent hides a practical issue: computations and experiments only see finite orbit segments. The finite-time exponent records the growth rate observed up to time $n$, and its variation across $x$ and $n$ is part of the system's instability.
[definition: Finite-Time Lyapunov Exponent]
Let $(X,\mathcal B,\mu)$ be a probability space, let $T:X\to X$ be a measure-preserving map, and let $A:X\to GL(d,\mathbb R)$ be a measurable generator of the linear cocycle $A^n(x):\mathbb R^d\to\mathbb R^d$. For $x\in X$, $v\in\mathbb R^d\setminus\{0\}$, and $n\geq 1$, the finite-time Lyapunov exponent in direction $v$ is
\begin{align*}
\lambda_n(x,v)=\frac{1}{n}\log\frac{|A^n(x)v|}{|v|}.
\end{align*}
[/definition]
Finite-time exponents are local-in-time diagnostics. They can be large on short intervals even when the asymptotic exponent is smaller, and they can reveal transient stretching in systems whose invariant measure is supported on complicated regions of phase space.
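All finite-time exponents can be estimated at once, without forming $A^n(x)$ explicitly, by a standard QR re-orthonormalisation scheme. The sketch below uses `numpy.linalg.qr` and accumulates $\log|R_{ii}|$ along the orbit. Since $A^n(x)=Q_nR_n\cdots R_1$ with upper-triangular $R_k$, these sums are logarithms of the diagonal of a triangular factor of $A^n(x)$. The resulting QR exponents are not the same quantity as the direction-wise $\lambda_n(x,v)$ defined above, but for large $n$ they typically approximate the Lyapunov spectrum. The constant hyperbolic matrix $\begin{pmatrix}2&1\\1&1\end{pmatrix}$, whose exponents are $\pm\log\frac{3+\sqrt5}{2}$, and the function name are illustrative assumptions.

```python
import math
import numpy as np

def finite_time_spectrum(A, T, x, n: int, d: int) -> np.ndarray:
    """QR-based finite-time exponents of A^n(x), reported in decreasing order.
    A: x -> (d x d) matrix, T: base map, x: starting point."""
    Q = np.eye(d)
    log_r = np.zeros(d)
    for _ in range(n):
        Q, R = np.linalg.qr(A(x) @ Q)          # re-orthonormalise the propagated frame
        log_r += np.log(np.abs(np.diag(R)))    # growth recorded by the triangular factor
        x = T(x)
    return np.sort(log_r / n)[::-1]

HYPERBOLIC = np.array([[2.0, 1.0], [1.0, 1.0]])
spectrum = finite_time_spectrum(lambda x: HYPERBOLIC, lambda x: x, 0.0, 200, 2)
print(spectrum, math.log((3 + math.sqrt(5)) / 2))
```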
[example: Linear Cocycle Over the Full Shift]
Let $\Sigma=\{0,1\}^{\mathbb Z}$ with the shift map $\sigma$, and define
\begin{align*}
A(x)=\operatorname{diag}(2,1/2)\quad\text{if }x_0=0
\end{align*}
and
\begin{align*}
A(x)=\operatorname{diag}(3,1/3)\quad\text{if }x_0=1.
\end{align*}
For $n\geq 1$, let
\begin{align*}
N_0(n,x)=\#\{0\leq j\leq n-1:x_j=0\},\qquad N_1(n,x)=\#\{0\leq j\leq n-1:x_j=1\}.
\end{align*}
Because $A(\sigma^j x)$ depends only on the symbol $x_j$ and all cocycle matrices are diagonal, the product $A^n(x)=A(\sigma^{n-1}x)\cdots A(\sigma x)A(x)$ is diagonal, and its first diagonal entry is
\begin{align*}
\prod_{j=0}^{n-1}\bigl(2\mathbf 1_{\{x_j=0\}}+3\mathbf 1_{\{x_j=1\}}\bigr)=2^{N_0(n,x)}3^{N_1(n,x)}.
\end{align*}
The second diagonal entry is
\begin{align*}
\prod_{j=0}^{n-1}\bigl(2^{-1}\mathbf 1_{\{x_j=0\}}+3^{-1}\mathbf 1_{\{x_j=1\}}\bigr)=2^{-N_0(n,x)}3^{-N_1(n,x)}.
\end{align*}
Thus
\begin{align*}
A^n(x)=\operatorname{diag}\bigl(2^{N_0(n,x)}3^{N_1(n,x)},2^{-N_0(n,x)}3^{-N_1(n,x)}\bigr).
\end{align*}
For $e_1=(1,0)$, this gives
\begin{align*}
A^n(x)e_1=\bigl(2^{N_0(n,x)}3^{N_1(n,x)},0\bigr).
\end{align*}
Hence, using the Euclidean norm,
\begin{align*}
\frac{1}{n}\log |A^n(x)e_1|=\frac{1}{n}\log\bigl(2^{N_0(n,x)}3^{N_1(n,x)}\bigr).
\end{align*}
Expanding the logarithm,
\begin{align*}
\frac{1}{n}\log |A^n(x)e_1|=\frac{N_0(n,x)}{n}\log 2+\frac{N_1(n,x)}{n}\log 3.
\end{align*}
For the Bernoulli measure with $\mathbb P(x_0=0)=p$, which is shift-invariant and ergodic, *Birkhoff's ergodic theorem* applied to the indicator functions of the cylinder events $\{x_0=0\}$ and $\{x_0=1\}$ gives
\begin{align*}
\lim_{n\to\infty}\frac{N_0(n,x)}{n}=p
\end{align*}
for almost every $x$, and therefore, since $N_1(n,x)=n-N_0(n,x)$,
\begin{align*}
\lim_{n\to\infty}\frac{N_1(n,x)}{n}=1-p.
\end{align*}
Substituting these limits gives the first-coordinate Lyapunov exponent
\begin{align*}
\lambda(e_1)=p\log 2+(1-p)\log 3.
\end{align*}
For $e_2=(0,1)$,
\begin{align*}
A^n(x)e_2=\bigl(0,2^{-N_0(n,x)}3^{-N_1(n,x)}\bigr).
\end{align*}
Thus
\begin{align*}
\frac{1}{n}\log |A^n(x)e_2|=\frac{1}{n}\log\bigl(2^{-N_0(n,x)}3^{-N_1(n,x)}\bigr).
\end{align*}
Expanding again,
\begin{align*}
\frac{1}{n}\log |A^n(x)e_2|=-\frac{N_0(n,x)}{n}\log 2-\frac{N_1(n,x)}{n}\log 3.
\end{align*}
Using the same almost-sure frequency limits,
\begin{align*}
\lambda(e_2)=-p\log 2-(1-p)\log 3.
\end{align*}
The base dynamics supplies the symbol frequencies, while the diagonal fibre maps convert those frequencies into additive logarithmic growth rates.
[/example]
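A short simulation makes both the almost-sure limit and the finite-time fluctuations visible. Because the fibre matrices are diagonal, it suffices to count symbols and add logarithms. The sketch below draws the first $n$ coordinates of a Bernoulli-distributed point with Python's `random` module (the function name and seeds are illustrative assumptions) and compares $\lambda_n(x,e_1)$ with $p\log 2+(1-p)\log 3$.

```python
import math
import random

def bernoulli_cocycle_exponents(p, n, seed=0):
    """Finite-time exponents (lambda_n(x, e1), lambda_n(x, e2)) for the diagonal cocycle
    A(x) = diag(2, 1/2) if x_0 = 0 and diag(3, 1/3) if x_0 = 1.

    The symbols x_0, ..., x_{n-1} are drawn i.i.d. with P(x_j = 0) = p, which is their
    joint law under the Bernoulli measure. Logarithms are summed, so the huge
    products 2^{N_0} 3^{N_1} are never formed explicitly.
    """
    rng = random.Random(seed)
    n0 = sum(1 for _ in range(n) if rng.random() < p)   # N_0(n, x)
    n1 = n - n0                                          # N_1(n, x)
    lam1 = (n0 * math.log(2) + n1 * math.log(3)) / n
    return lam1, -lam1

p = 0.3
limit = p * math.log(2) + (1 - p) * math.log(3)
for n in (10, 100, 10_000, 100_000):
    print(n, bernoulli_cocycle_exponents(p, n)[0], limit)

# Spread of finite-time exponents over 200 sample points at the short time n = 20:
short = [bernoulli_cocycle_exponents(p, 20, seed=s)[0] for s in range(200)]
print(min(short), max(short))  # always inside [log 2, log 3]
```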
This symbolic example separates randomness in the base from linear growth in the fibres. It also gives a model for nonuniform hyperbolicity: the expanding and contracting directions are fixed here, but the rate along an orbit depends on the symbolic itinerary.
[remark: Dependence on the Invariant Measure]
The same smooth map can have different Lyapunov spectra for different invariant measures. A measure supported on a repelling periodic orbit records the derivative growth along that orbit, while an absolutely continuous invariant measure records typical growth for Lebesgue-distributed initial data when such a measure exists.
[/remark]
This dependence on the measure is the bridge to ergodic theory. Entropy, recurrence statistics, and Lyapunov exponents are all invariants of an invariant measure, and the strongest smooth results relate them under hyperbolicity assumptions.
## Lyapunov Exponents for One-Dimensional Expanding Maps
For an interval or circle map, there is only one tangent direction at each regular point. The derivative cocycle becomes multiplication by derivatives along the orbit, so Lyapunov exponents reduce to Birkhoff averages of logarithmic slopes.
[quotetheorem:7765]
[citeproof:7765]
This formula explains why positive Lyapunov exponents are often interpreted as sensitive dependence: nearby points separate exponentially fast along typical orbits, at least before folding and global geometry bring them back together. Ergodicity is what turns the time average into a single number independent of the starting point; for example, for a $C^1$ interval map with one attracting fixed point $p$ satisfying $|f'(p)|<1$ and one repelling fixed point $q$ satisfying $|f'(q)|>1$, the invariant measure $\frac12\delta_p+\frac12\delta_q$ gives different pointwise exponents $\log |f'(p)|$ and $\log |f'(q)|$ on its two ergodic components. The assumption $\log |f'|\in L^1(\mu)$ excludes infinite logarithmic spikes and zeros of excessive weight: the map $f(x)=x^2$ with $\mu=\delta_0$ has $\log |f'(0)|=-\infty$, so the displayed integral is not a finite Birkhoff average. Avoiding the exceptional set is also essential: for a piecewise linear tent map, an orbit landing on the turning point has no ordinary derivative at that iterate, and the chain-rule product in the theorem no longer has the stated $C^1$ form.
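The role of ergodic components can be seen numerically on a concrete map. The sketch below uses the hypothetical diffeomorphism $f(x)=x+\frac{1}{8\pi}\sin(2\pi x)$ of $[0,1]$, chosen only because it has a repelling fixed point at $0$ with $f'(0)=1.25$ and an attracting fixed point at $1/2$ with $f'(1/2)=0.75$. The measure $\frac12\delta_0+\frac12\delta_{1/2}$ is invariant but not ergodic, and Birkhoff averages of $\log|f'|$ along orbits recover the two different component exponents.

```python
import math

EPS = 0.25  # hypothetical amplitude; since EPS < 1, f' > 0 and f is increasing with f(0)=0, f(1)=1

def f(x):
    return x + EPS / (2 * math.pi) * math.sin(2 * math.pi * x)

def df(x):
    return 1 + EPS * math.cos(2 * math.pi * x)

def birkhoff_log_derivative(x, n):
    """(1/n) * sum_{k<n} log|f'(f^k(x))|, the finite-time exponent of the orbit of x."""
    total = 0.0
    for _ in range(n):
        total += math.log(abs(df(x)))
        x = f(x)
    return total / n

print(birkhoff_log_derivative(0.0, 5000), math.log(1.25))  # orbit stays at the repelling point 0
print(birkhoff_log_derivative(0.1, 5000), math.log(0.75))  # orbit converges to the attracting point 1/2
```

Every orbit starting in $(0,1)$ converges to $1/2$, so Lebesgue-typical initial data see only the exponent $\log 0.75$.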
[example: Logistic Map at Parameter Four]
Let $f:[0,1]\to[0,1]$ be $f(x)=4x(1-x)$. For the absolutely continuous invariant probability measure
\begin{align*}
d\mu(x)=\frac{dx}{\pi\sqrt{x(1-x)}},
\end{align*}
the one-dimensional Lyapunov exponent is the $\mu$-average of $\log |f'|$, so we compute the integral explicitly. Since $f'(x)=4-8x$, the exponent is
\begin{align*}
\int_0^1 \log |4-8x|\,\frac{dx}{\pi\sqrt{x(1-x)}}.
\end{align*}
Use the substitution $x=\sin^2\theta$ with $0<\theta<\pi/2$. Then
\begin{align*}
dx=2\sin\theta\cos\theta\,d\theta.
\end{align*}
Also,
\begin{align*}
\sqrt{x(1-x)}=\sqrt{\sin^2\theta\cos^2\theta}=\sin\theta\cos\theta,
\end{align*}
because $\sin\theta$ and $\cos\theta$ are positive on $(0,\pi/2)$. Therefore
\begin{align*}
\frac{dx}{\pi\sqrt{x(1-x)}}=\frac{2}{\pi}\,d\theta.
\end{align*}
The derivative term becomes
\begin{align*}
4-8x=4-8\sin^2\theta=4(1-2\sin^2\theta)=4\cos(2\theta).
\end{align*}
Thus the integral equals
\begin{align*}
\frac{2}{\pi}\int_0^{\pi/2}\log |4\cos(2\theta)|\,d\theta.
\end{align*}
Since $\log |4\cos(2\theta)|=\log 4+\log|\cos(2\theta)|$ away from the single zero $\theta=\pi/4$, this is
\begin{align*}
\log 4+\frac{2}{\pi}\int_0^{\pi/2}\log|\cos(2\theta)|\,d\theta.
\end{align*}
Now put $u=2\theta$. Then $d\theta=du/2$, and the logarithmic term is
\begin{align*}
\frac{2}{\pi}\int_0^{\pi/2}\log|\cos(2\theta)|\,d\theta=\frac{1}{\pi}\int_0^\pi \log|\cos u|\,du.
\end{align*}
By symmetry about $\pi/2$,
\begin{align*}
\int_0^\pi \log|\cos u|\,du=2\int_0^{\pi/2}\log(\cos u)\,du.
\end{align*}
Let
\begin{align*}
I=\int_0^{\pi/2}\log(\sin u)\,du.
\end{align*}
The substitution $u\mapsto \pi/2-u$ gives
\begin{align*}
\int_0^{\pi/2}\log(\cos u)\,du=I.
\end{align*}
Also,
\begin{align*}
2I=\int_0^{\pi/2}\log(\sin u)\,du+\int_0^{\pi/2}\log(\cos u)\,du.
\end{align*}
Combining the logarithms gives
\begin{align*}
2I=\int_0^{\pi/2}\log(\sin u\cos u)\,du.
\end{align*}
Since $\sin u\cos u=\frac12\sin(2u)$,
\begin{align*}
2I=\int_0^{\pi/2}\log(\sin(2u))\,du-\frac{\pi}{2}\log 2.
\end{align*}
With $w=2u$,
\begin{align*}
\int_0^{\pi/2}\log(\sin(2u))\,du=\frac12\int_0^\pi \log(\sin w)\,dw.
\end{align*}
The symmetry $\sin(\pi-w)=\sin w$ gives
\begin{align*}
\frac12\int_0^\pi \log(\sin w)\,dw=\int_0^{\pi/2}\log(\sin w)\,dw=I.
\end{align*}
Hence
\begin{align*}
2I=I-\frac{\pi}{2}\log 2,
\end{align*}
so
\begin{align*}
I=-\frac{\pi}{2}\log 2.
\end{align*}
Therefore
\begin{align*}
\frac{1}{\pi}\int_0^\pi \log|\cos u|\,du=-\log 2.
\end{align*}
Substituting this into the exponent integral gives
\begin{align*}
\int_0^1 \log |4-8x|\,\frac{dx}{\pi\sqrt{x(1-x)}}=\log 4-\log 2=\log 2.
\end{align*}
Thus the Lyapunov exponent of the logistic map at parameter $4$ is $\log 2$ for this absolutely continuous invariant measure; the nonconstant derivative averages to the same exponential rate as the full tent-map expansion.
[/example]
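The value $\log 2$ can also be confirmed by quadrature. The sketch below applies the midpoint rule to $\frac{2}{\pi}\int_0^{\pi/2}\log|4\cos(2\theta)|\,d\theta$ from the computation above. With an even number of cells, no midpoint falls on the logarithmic singularity at $\theta=\pi/4$, and the integrable singularity only costs an error of the order of the cell width. The default cell count is an arbitrary illustrative choice.

```python
import math

def logistic_exponent_quadrature(n_cells=200_000):
    """Midpoint rule for (2/pi) * int_0^{pi/2} log|4 cos(2 theta)| d theta."""
    if n_cells % 2:
        raise ValueError("use an even number of cells so no midpoint hits theta = pi/4")
    h = (math.pi / 2) / n_cells
    total = 0.0
    for k in range(n_cells):
        theta = (k + 0.5) * h
        total += math.log(abs(4.0 * math.cos(2.0 * theta)))
    return (2.0 / math.pi) * total * h

print(logistic_exponent_quadrature(), math.log(2))
```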
The conjugacy $x=\sin^2(\pi y/2)$ between the full tent map $y\mapsto 1-|1-2y|$ and the logistic map explains why the answer matches the tent map, while the integral explains why the derivative exponent is legitimate for the logistic invariant measure. This example is a useful warning: geometric nonlinearity of the graph is not the same thing as asymptotic tangent expansion, and non-smooth coordinate changes must be handled through the invariant measure or an explicit derivative computation.
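Computing the same exponent as an orbit average needs care, because floating-point iteration of $4x(1-x)$ does not follow a typical orbit for long. A sketch under that caveat: generate the orbit through the identity $f(\sin^2(\pi\theta))=\sin^2(2\pi\theta)$, storing $\theta$ as an exact dyadic fraction $k/2^{N}$ with many random bits so that angle doubling is exact integer arithmetic. The bit budget, seed, and function name are assumptions of this illustration.

```python
import math
import random

def logistic_orbit_exponent(n_steps=5_000, seed=1):
    """Birkhoff average of log|f'(x_j)| = log|4 - 8 x_j| along an orbit of f(x) = 4x(1 - x).

    The orbit is x_j = sin^2(pi theta_j) with theta_{j+1} = 2 theta_j mod 1, which is an
    orbit of f because f(sin^2(pi t)) = sin^2(2 pi t). theta_0 is a random dyadic fraction
    k / 2**n_bits, so each doubling is an exact integer shift.
    """
    n_bits = n_steps + 64                      # at least 64 random bits remain after n_steps doublings
    k = random.Random(seed).getrandbits(n_bits)
    mask = (1 << n_bits) - 1
    total = 0.0
    for _ in range(n_steps):
        theta = (k >> (n_bits - 53)) / 2.0 ** 53   # leading 53 bits of theta_j as a float
        x = math.sin(math.pi * theta) ** 2
        total += math.log(abs(4.0 - 8.0 * x))
        k = (k << 1) & mask                        # theta_{j+1} = 2 theta_j mod 1
    return total / n_steps

print(logistic_orbit_exponent(), math.log(2))
```

Convergence here is unusually fast. Differentiating $f\circ h=h\circ D$ with $h(\theta)=\sin^2(\pi\theta)$ and $D(\theta)=2\theta\bmod 1$ gives $f'(h(\theta))\,h'(\theta)=2h'(2\theta)$, so along the orbit $\log|f'|$ equals $\log 2$ plus a telescoping term, and the Birkhoff sum differs from $n\log 2$ only by boundary contributions.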
## Exponents, Instability, and Entropy
Entropy measures the exponential growth of distinguishable orbit segments, while positive Lyapunov exponents measure exponential growth of tangent vectors. Smooth ergodic theory asks how much orbit complexity can be produced by differentiable expansion. The guiding principle is that entropy cannot exceed the total positive tangent expansion available to the system.
[quotetheorem:7766]
[proofunderconstruction:7766]
Ruelle's inequality is one-sided: tangent expansion provides an upper bound for information production. Each hypothesis rules out a concrete failure. Compactness is a convenient way to force integrability of the derivative cocycle; on $M=\mathbb R$ the map $f(x)=x^2$ has $|f'(x)|=2|x|$, and measures with heavy tails can make $\int \log^+|f'|\,d\mu$ infinite, so the right-hand side is not a finite entropy bound. The $C^1$ diffeomorphism assumption gives an invertible derivative cocycle with controlled local linearisation. A piecewise expanding interval map such as the tent map has a corner where the derivative is undefined, while a smooth noninvertible map such as $x\mapsto 2x \pmod 1$ has invertible derivatives but is not itself invertible, so its orbits have no well-defined past and the two-sided Oseledets splitting used in this formulation is not the right object. Invariance of $\mu$ is also essential: for the doubling map on the circle, a Dirac mass at a nonperiodic point is not invariant, and its pushforwards sample different orbit locations, so there is no stationary measure-theoretic entropy to compare with the orbitwise exponent. Oseledets regularity supplies the measurable expanding directions and their multiplicities; for a nonintegrable matrix cocycle over a Bernoulli shift with diagonal entries $\exp(Y(x_0))$ where $\mathbb E|Y|=\infty$, the averages defining exponents can fail to be finite, so the phrase "sum of positive exponents" has no stable almost-everywhere meaning. Equality requires that the invariant measure distribute mass smoothly enough along unstable directions so that available expansion is fully converted into entropy, and singular measures on hyperbolic sets often have strictly smaller entropy than the available unstable expansion.
[quotetheorem:7767]
Pesin's formula identifies the case where Ruelle's upper bound is sharp: all positive tangent expansion is visible as metric entropy. The $C^{1+\alpha}$ hypothesis is used because Pesin theory needs more than continuous differentiability to control distortion along local stable and unstable manifolds; for merely $C^1$ systems, unstable Jacobians can fluctuate too wildly for the same absolute-continuity argument. Hyperbolicity excludes zero exponents, which would create centre directions where neither expansion nor contraction controls the local geometry. Absolute continuity of conditional measures on unstable manifolds is the measure-theoretic condition that prevents expansion from occurring in directions where the measure has too little mass; without it, a singular invariant measure can have positive unstable exponents but entropy strictly below their sum. The theorem does not say that every hyperbolic invariant measure satisfies equality, and it does not compute the exponents; it explains which smooth measures convert unstable expansion into entropy. This prepares the final dimension discussion, where failure of equality is interpreted through the dimension of the conditional measures along unstable manifolds.
[example: Entropy of a Hyperbolic Toral Automorphism]
Let $A\in SL(2,\mathbb Z)$ be hyperbolic with eigenvalues $\rho$ and $\rho^{-1}$, where $\rho>1$, and let $f_A:\mathbb T^2\to\mathbb T^2$ be the induced toral automorphism. Since $f_A$ is induced by a linear map, its derivative is constant: $d(f_A)_x=A$ for every $x\in\mathbb T^2$. Hence the derivative cocycle satisfies
\begin{align*}
d(f_A)^n_x=A^n
\end{align*}
for every $n\geq 1$.
Let $v_u$ be a nonzero eigenvector for $\rho$. Then $Av_u=\rho v_u$, so induction gives
\begin{align*}
A^n v_u=\rho^n v_u
\end{align*}
and therefore
\begin{align*}
\frac{1}{n}\log |A^n v_u|=\frac{1}{n}\log(\rho^n|v_u|)=\log\rho+\frac{1}{n}\log|v_u|.
\end{align*}
Letting $n\to\infty$ gives the unstable Lyapunov exponent $\log\rho$. Similarly, if $v_s$ is a nonzero eigenvector for $\rho^{-1}$, then
\begin{align*}
A^n v_s=\rho^{-n}v_s
\end{align*}
so
\begin{align*}
\frac{1}{n}\log |A^n v_s|=\frac{1}{n}\log(\rho^{-n}|v_s|)=-\log\rho+\frac{1}{n}\log|v_s|.
\end{align*}
Letting $n\to\infty$ gives the stable Lyapunov exponent $-\log\rho$.
Haar measure is smooth and invariant under $f_A$, and the unstable direction has multiplicity $1$. By *[Pesin Entropy Formula](/theorems/7767)*,
\begin{align*}
h_{\mathrm{Haar}}(f_A)=1\cdot \log\rho=\log\rho.
\end{align*}
The topological entropy of a hyperbolic toral automorphism is the logarithm of the spectral radius of its defining matrix; here the spectral radius is
\begin{align*}
\max\{\rho,\rho^{-1}\}=\rho.
\end{align*}
Thus
\begin{align*}
h_{\mathrm{top}}(f_A)=\log\rho=h_{\mathrm{Haar}}(f_A).
\end{align*}
In this linear example, the entropy is exactly the exponential expansion rate in the one-dimensional unstable eigendirection.
[/example]
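The topological entropy can also be read off from periodic points. For a hyperbolic toral automorphism, the number of fixed points of $f_A^n$ is $|\det(A^n-I)|$, a standard fact assumed here rather than proved in this section. The sketch computes this count for the matrix with rows $(2,1)$ and $(1,1)$ using exact integer matrix powers and compares its exponential growth rate with $\log\rho$.

```python
import math

def mat_mul(a, b):
    """Product of 2x2 integer matrices given as lists of rows (exact integer arithmetic)."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)] for i in range(2)]

def fixed_point_count(a, n):
    """|det(A^n - I)|, the number of fixed points of f_A^n for hyperbolic A in SL(2, Z)."""
    p = [[1, 0], [0, 1]]
    for _ in range(n):
        p = mat_mul(p, a)
    return abs((p[0][0] - 1) * (p[1][1] - 1) - p[0][1] * p[1][0])

A = [[2, 1], [1, 1]]
rho = (3 + math.sqrt(5)) / 2
for n in (1, 2, 5, 10, 40):
    count = fixed_point_count(A, n)
    print(n, count, math.log(count) / n, math.log(rho))
```

Since $\det(A^n-I)=(\rho^n-1)(\rho^{-n}-1)$, the counts are $\rho^n+\rho^{-n}-2$, so $\frac1n\log|\det(A^n-I)|\to\log\rho$, the same number as the Haar entropy above.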
This example connects the smooth and symbolic parts of the course. Markov partitions code hyperbolic toral automorphisms by subshifts of finite type, while Lyapunov exponents explain why the symbolic entropy is governed by the unstable eigenvalue.
## Dimension and Nonuniform Hyperbolicity
The final problem is to relate stretching rates to the geometry of invariant measures. In dissipative systems, an invariant measure may live on a set with non-integer dimension: expansion creates complexity while contraction folds mass into thinner regions.
[definition: Hyperbolic Invariant Measure]
Let $f:M\to M$ be a $C^1$ diffeomorphism preserving a probability measure $\mu$. The measure $\mu$ is hyperbolic if all Lyapunov exponents are nonzero for $\mu$-a.e. point.
[/definition]
Hyperbolicity for a measure is weaker than uniform hyperbolicity on a set. The Oseledets splitting only needs to exist almost everywhere, and the angle between stable and unstable spaces may vary in a nonuniform way.
The signs of the Lyapunov exponents divide the tangent directions into contracting, expanding, and neutral behaviour. Naming those pieces prepares the local geometric language used in Pesin theory.
[definition: Stable and Unstable Oseledets Spaces]
At an Oseledets regular point $x$, define
\begin{align*}
E^s(x)=\bigoplus_{\lambda_i<0}E_i(x),\qquad E^u(x)=\bigoplus_{\lambda_i>0}E_i(x),\qquad E^0(x)=\bigoplus_{\lambda_i=0}E_i(x).
\end{align*}
[/definition]
These spaces describe first-order asymptotic geometry. In the $C^{1+\alpha}$ setting, Pesin theory upgrades them to local stable and unstable manifolds through almost every regular point under suitable hypotheses.
[remark: Dimension Heuristics]
For many hyperbolic measures, formulas of Kaplan--Yorke, Ledrappier--Young, and related theories express dimensions through entropy and Lyapunov exponents. The common theme is that entropy measures how much mass branches along expanding directions, while negative exponents measure how rapidly mass is compressed transversely. This course uses these ideas as interpretation rather than as a full dimension theory.
[/remark]
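One common form of the Kaplan-Yorke (Lyapunov) dimension orders the exponents $\lambda_1\ge\cdots\ge\lambda_d$, takes the largest $k$ with $\lambda_1+\cdots+\lambda_k\ge 0$, and sets $D_{KY}=k+(\lambda_1+\cdots+\lambda_k)/|\lambda_{k+1}|$. For general attractors this is a heuristic or conjectured value of the dimension rather than a theorem, in line with the remark above. A minimal sketch:

```python
import math

def kaplan_yorke_dimension(exponents):
    """Kaplan-Yorke (Lyapunov) dimension of a Lyapunov spectrum."""
    lams = sorted(exponents, reverse=True)
    partial = 0.0
    for k, lam in enumerate(lams):
        if partial + lam < 0:
            # the first k exponents have a nonnegative sum; interpolate with the next one
            return k + partial / abs(lam)
        partial += lam
    return float(len(lams))  # every partial sum is nonnegative

print(kaplan_yorke_dimension([math.log(2), math.log(0.25), math.log(0.25)]))  # 1 + log 2 / log 4
rho = (3 + math.sqrt(5)) / 2
print(kaplan_yorke_dimension([math.log(rho), -math.log(rho)]))              # 2.0
```

For a spectrum $(\log 2,\log c,\log c)$ with $c<1/2$, as for a solenoid with transverse contraction $c$, it returns $1+\log 2/|\log c|$; for the conservative spectrum $(\log\rho,-\log\rho)$ of a hyperbolic toral automorphism it returns the full dimension $2$.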
This chapter completes the bridge from topological chaos to smooth statistical chaos. Lyapunov exponents quantify tangent growth, Oseledets' theorem supplies almost-everywhere invariant directions, Ruelle's inequality bounds entropy by expansion, and Pesin's formula identifies the cases where smooth unstable geometry turns expansion exactly into entropy.
Lyapunov exponents and smooth ergodic theory turn chaos into quantitative rates of stretching and contraction. Those rates then point toward the measures that ordinary observations actually see, which is the focus of the next chapter on SRB and physical measures.
# 10. SRB Measures and Physical Measures
This chapter turns the geometric picture of hyperbolic chaos into a statistical one. In earlier chapters invariant measures described the long-time behaviour of points once an initial distribution had already been chosen. The new problem is to identify invariant measures that are visible from ordinary volume: if a point is chosen at random from a region of phase space, what time averages should we expect to observe?
The answer is not usually an invariant smooth volume. Dissipative systems may collapse open sets onto thin attractors, while preserving enough unstable geometry to produce reproducible statistics. Physical measures and SRB measures are the measures designed to capture this mixture of attraction, instability, and statistical regularity. The chapter assumes the earlier material on invariant measures, ergodicity and Birkhoff time averages, hyperbolic sets, stable and unstable manifolds, and absolute continuity of foliations.
## Physical Measures and Basins
The basic observational question is the following: if we follow a single orbit for a long time and average what we see, does the result converge to integration against a fixed invariant measure? This is stronger than merely asking for invariant measures to exist, because it asks whether the measure describes a positive-volume set of initial conditions.
Throughout this section $M$ is a compact smooth manifold and $m$ is a smooth reference volume measure on $M$. The basic objects are the empirical measures of an orbit, which record the fraction of time that its first $n$ iterates spend in each region of phase space.
[definition: Empirical Distribution]
Let $M$ be a compact metric space and let $f:M \to M$ be continuous. For $x \in M$ and $n \in \mathbb N$, the $n$th empirical distribution of $x$ is the probability measure
\begin{align*}
\mu_{n,x} := \frac{1}{n}\sum_{k=0}^{n-1} \delta_{f^k(x)}.
\end{align*}
[/definition]
The empirical distribution definition translates orbit data into measures, and convergence of these measures means convergence of all continuous time averages. To decide whether a limiting measure is actually seen from a given initial condition, we next collect all points whose empirical distributions converge to that measure.
[definition: Basin of a Measure]
Let $M$ be a compact metric space, let $f:M \to M$ be continuous, and let $\mu$ be a Borel probability measure on $M$. The basin of $\mu$ is
\begin{align*}
B(\mu) := \{x \in M : \mu_{n,x} \rightharpoonup \mu\}.
\end{align*}
[/definition]
The basin definition identifies the initial conditions that observe a given invariant measure. This is a pointwise statistical notion: two invariant measures may both exist, but only one of them may be selected by the empirical distributions of typical initial conditions. The next issue is size. A basin containing only a periodic orbit, or only a thin invariant curve of zero ambient volume, is not experimentally detectable by sampling initial data from the phase space, so we require positive smooth volume.
[definition: Physical Measure]
Let $M$ be a compact smooth manifold, let $m$ be a smooth reference volume measure on $M$, and let $f:M \to M$ be continuous. An $f$-invariant Borel probability measure $\mu$ is a physical measure if
\begin{align*}
m(B(\mu)) > 0.
\end{align*}
[/definition]
The positivity condition is the operational content of the definition: a physical measure can be detected by sampling initial data from a set of positive volume and computing long orbit averages. The measure itself may be singular with respect to $m$, especially when it lives on an attractor of lower ambient volume.
[example: Attracting Fixed Point]
Let $f:[0,1]\to[0,1]$ be $f(x)=x/2$. Since $f(0)=0/2=0$, the point $0$ is fixed. For each $x\in[0,1]$, the iterates satisfy $f^0(x)=x=2^0x$, and if $f^k(x)=2^{-k}x$, then
\begin{align*}
f^{k+1}(x)=f(f^k(x))=\frac{2^{-k}x}{2}=2^{-(k+1)}x.
\end{align*}
Thus $f^k(x)=2^{-k}x$ for every $k\ge 0$, and $2^{-k}x\to 0$ because $0\le 2^{-k}x\le 2^{-k}$.
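Now let $\varphi:[0,1]\to\mathbb R$ be continuous. Since $[0,1]$ is compact, $\varphi$ is bounded; choose $C\ge 0$ with $|\varphi(t)|\le C$ for all $t\in[0,1]$. Fix $\varepsilon>0$. By continuity of $\varphi$ at $0$, choose $\delta>0$ such that $|\varphi(t)-\varphi(0)|<\varepsilon$ whenever $0\le t<\delta$, and then choose $K$ with $2^{-K}<\delta$. Since $0\le 2^{-k}x\le 2^{-k}\le 2^{-K}<\delta$ for $k\ge K$, we get $|\varphi(2^{-k}x)-\varphi(0)|<\varepsilon$ for all $k\ge K$ and all $x\in[0,1]$. Then for $n>K$,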
\begin{align*}
\left|\frac{1}{n}\sum_{k=0}^{n-1}\varphi(f^k(x))-\varphi(0)\right|\le \frac{1}{n}\sum_{k=0}^{K-1}|\varphi(2^{-k}x)-\varphi(0)|+\frac{1}{n}\sum_{k=K}^{n-1}|\varphi(2^{-k}x)-\varphi(0)|.
\end{align*}
For the first sum, each term is at most $2C$, so it is bounded by $2CK/n$. For the second sum, each term is at most $\varepsilon$, so it is bounded by $\varepsilon$. Hence
\begin{align*}
\limsup_{n\to\infty}\left|\frac{1}{n}\sum_{k=0}^{n-1}\varphi(f^k(x))-\varphi(0)\right|\le \varepsilon.
\end{align*}
Since $\varepsilon>0$ was arbitrary,
\begin{align*}
\frac{1}{n}\sum_{k=0}^{n-1}\varphi(f^k(x))\to \varphi(0)=\int\varphi\,d\delta_0.
\end{align*}
Therefore $\mu_{n,x}\rightharpoonup\delta_0$ for every $x\in[0,1]$, so $B(\delta_0)=[0,1]$. Since Lebesgue measure satisfies $\mathcal L^1([0,1])=1>0$, $\delta_0$ is a physical measure. This example shows that physical measures need not be spread out: attraction alone can produce physically observable statistics.
[/example]
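The convergence in the example can be watched numerically. The sketch below computes $\int\varphi\,d\mu_{n,x}$ for $f(x)=x/2$ with the illustrative test function $\varphi=\cos$ and initial point $x=0.9$. Because the deviations $|\varphi(2^{-k}x)-\varphi(0)|$ are summable here, the averages approach $\varphi(0)=1$ at rate roughly $1/n$.

```python
import math

def empirical_average(f, phi, x, n):
    """Integral of phi against mu_{n,x} = (1/n) * sum_{k<n} delta_{f^k(x)}."""
    total = 0.0
    for _ in range(n):
        total += phi(x)
        x = f(x)
    return total / n

def halve(t):
    """The contraction f(x) = x / 2 from the example."""
    return t / 2

for n in (10, 100, 1000, 10000):
    print(n, empirical_average(halve, math.cos, 0.9, n))   # tends to cos(0) = 1
```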
The attracting fixed point example is useful because it separates physicality from chaos: a measure can be physical for the purely stable reason that nearby points converge to it. This shows that positive-volume basin alone does not encode sensitive dependence or mixing. In a chaotic attractor the basin is still large, but the measure should also describe expansion along unstable directions rather than collapse to a single orbit. The next example keeps the positive-volume statistical conclusion while replacing simple attraction by chaotic folding and expansion.
[example: Logistic Map at Parameter Four]
For $f:[0,1]\to[0,1]$, $f(x)=4x(1-x)$, define $D:\mathbb R/\mathbb Z\to\mathbb R/\mathbb Z$ by $D(\theta)=2\theta \pmod 1$ and $h(\theta)=\sin^2(\pi\theta)$. Then
\begin{align*}
f(h(\theta))=4\sin^2(\pi\theta)(1-\sin^2(\pi\theta))=4\sin^2(\pi\theta)\cos^2(\pi\theta)=\sin^2(2\pi\theta)=h(D\theta).
\end{align*}
Thus $h$ semiconjugates the angle-doubling map to the logistic map.
Let $m$ be Lebesgue measure on $\mathbb R/\mathbb Z$, and set $\mu=h_*m$. For any continuous $\varphi:[0,1]\to\mathbb R$,
\begin{align*}
\int \varphi\,d\mu=\int_0^1 \varphi(\sin^2(\pi\theta))\,d\theta.
\end{align*}
On $0\le \theta\le 1/2$, put $x=\sin^2(\pi\theta)$; then
\begin{align*}
dx=2\pi\sin(\pi\theta)\cos(\pi\theta)\,d\theta=2\pi\sqrt{x(1-x)}\,d\theta.
\end{align*}
On $1/2\le \theta\le 1$, the same substitution has the same absolute Jacobian and runs from $x=1$ back to $x=0$. Therefore the two branches give
\begin{align*}
\int_0^1 \varphi(\sin^2(\pi\theta))\,d\theta=\frac{1}{\pi}\int_0^1 \frac{\varphi(x)}{\sqrt{x(1-x)}}\,d\mathcal L^1(x).
\end{align*}
Hence
\begin{align*}
d\mu(x)=\frac{1}{\pi\sqrt{x(1-x)}}\,d\mathcal L^1(x).
\end{align*}
The measure $\mu$ is invariant because $D$ preserves Lebesgue measure on $\mathbb R/\mathbb Z$ and $f\circ h=h\circ D$:
\begin{align*}
\int \varphi\circ f\,d\mu=\int_0^1 \varphi(f(h(\theta)))\,d\theta=\int_0^1 \varphi(h(D\theta))\,d\theta=\int_0^1 \varphi(h(\theta))\,d\theta=\int \varphi\,d\mu.
\end{align*}
Since $D$ is ergodic for Lebesgue measure, the *Birkhoff Ergodic Theorem* gives, for $m$-a.e. $\theta$,
\begin{align*}
\frac{1}{n}\sum_{k=0}^{n-1}\varphi(h(D^k\theta))\to \int_0^1\varphi(h(t))\,dt=\int\varphi\,d\mu.
\end{align*}
Using $h(D^k\theta)=f^k(h(\theta))$, this becomes
\begin{align*}
\frac{1}{n}\sum_{k=0}^{n-1}\varphi(f^k(h(\theta)))\to \int\varphi\,d\mu.
\end{align*}
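Applying this to a countable dense family of continuous functions $\varphi$ shows that $\mu_{n,h(\theta)}\rightharpoonup\mu$ for $m$-a.e. $\theta$. If $E=[0,1]\setminus B(\mu)$, then $h^{-1}(E)$ lies in the exceptional $m$-null set of angles, so $\mu(E)=m(h^{-1}(E))=0$. The density $1/(\pi\sqrt{x(1-x)})$ is positive for every $x\in(0,1)$, so $\mu$ and $\mathcal L^1$ have the same null sets, and hence $\mathcal L^1(E)=0$. Therefore the basin of $\mu$ has full Lebesgue measure in $[0,1]$, and $\mu$ is physical.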
This example is chaotic and physical, but not uniformly hyperbolic: $f'(x)=4-8x$, so $f'(1/2)=0$, and no uniform expansion estimate can hold on the whole interval.
[/example]
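The arcsine density can be compared with the empirical distribution of a single orbit. Integrating the density gives $\mu([0,a])=\frac{2}{\pi}\arcsin\sqrt a$. The sketch generates a logistic orbit through the semiconjugacy $h(\theta)=\sin^2(\pi\theta)$, storing the angle as an exact dyadic fraction so that doubling is exact (an implementation choice made here, not part of the text), and records the fraction of iterates in $[0,a]$. Function names, seed, and orbit length are illustrative.

```python
import math
import random

def arcsine_cdf(a):
    """mu([0, a]) for the density 1 / (pi sqrt(x (1 - x)))."""
    return (2 / math.pi) * math.asin(math.sqrt(a))

def logistic_empirical_cdf(a, n_steps=20_000, seed=2):
    """Fraction of the first n_steps logistic iterates x_j = sin^2(pi theta_j) lying in [0, a]."""
    n_bits = n_steps + 64                         # keep 64 random bits after n_steps doublings
    k = random.Random(seed).getrandbits(n_bits)   # theta_0 = k / 2**n_bits
    mask = (1 << n_bits) - 1
    hits = 0
    for _ in range(n_steps):
        theta = (k >> (n_bits - 53)) / 2.0 ** 53  # leading 53 bits of theta_j as a float
        if math.sin(math.pi * theta) ** 2 <= a:
            hits += 1
        k = (k << 1) & mask                       # theta_{j+1} = 2 theta_j mod 1, exactly
    return hits / n_steps

for a in (0.1, 0.25, 0.5, 0.9):
    print(a, logistic_empirical_cdf(a), arcsine_cdf(a))
```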
Physical measures are defined by their basins, but the definition does not by itself give a way to construct them. Hyperbolicity supplies the missing mechanism: unstable expansion spreads volume along unstable leaves, while stable contraction makes nearby points have the same future averages.
## SRB Measures for Uniformly Hyperbolic Dynamics
For uniformly hyperbolic systems, ordinary volume is distorted in a controlled way under iteration. The central question becomes: which invariant measures have conditional distributions along unstable directions that look like smooth volume on those unstable directions? Such measures are the Sinai-Ruelle-Bowen measures.
Let $f:M\to M$ be a $C^{1+\alpha}$ diffeomorphism of a compact Riemannian manifold. A compact invariant set $\Lambda\subset M$ is hyperbolic if there is a continuous invariant splitting
\begin{align*}
T_\Lambda M = E^s \oplus E^u
\end{align*}
with uniform contraction on $E^s$ and uniform expansion on $E^u$. Local unstable manifolds $W^u_{\mathrm{loc}}(x)$ carry their own Riemannian volume measures, denoted $m^u_x$.
The next definition isolates the smoothness property in the expanding directions. Since an SRB measure may be singular in the full ambient manifold, absolute continuity is asked only after disintegrating the measure along local unstable manifolds.
[definition: SRB Measure]
Let $f:M\to M$ be a $C^{1+\alpha}$ diffeomorphism and let $\Lambda \subset M$ be a compact hyperbolic invariant set. An $f$-invariant Borel probability measure $\mu$ supported on $\Lambda$ is an SRB measure if there exists a measurable partition subordinate to local unstable manifolds such that the conditional measures of $\mu$ on the partition elements are absolutely continuous with respect to the corresponding Riemannian volume measures on local unstable manifolds.
[/definition]
This definition says that the measure is smooth in the directions where the dynamics creates volume. It does not claim smoothness across stable directions; in dissipative attractors the transverse structure may be fractal.
[remark: Regularity Threshold]
The assumption $C^{1+\alpha}$ is part of the standard SRB theory because it gives Hölder control of the derivative and absolute continuity of the stable and unstable holonomies. For merely $C^1$ diffeomorphisms, the same geometric picture may fail because distortion along unstable manifolds need not be controlled well enough.
[/remark]
The regularity threshold in the remark motivates the following theorem: under $C^{1+\alpha}$ uniform hyperbolicity, the unstable conditional measures appearing in the SRB definition are genuinely absolutely continuous and their densities are controlled by unstable Jacobians. This theorem is needed before SRB measures can be used as statistical objects rather than only as formal disintegrations.
[quotetheorem:7768]
[citeproof:7768]
This theorem is not just the definition of an SRB measure restated: it identifies the Radon-Nikodym densities and shows how expansion rates determine them. The $C^{1+\alpha}$ and uniform hyperbolicity hypotheses are doing real work. Without Hölder distortion control the infinite product can fail to converge, and without a genuine unstable splitting there is no unstable Jacobian formula to write down. A useful failure model is a $C^1$ Anosov diffeomorphism with non-absolutely-continuous invariant foliations: the topological hyperbolic picture remains, but the distortion and holonomy estimates used in the product formula are no longer available in the form needed for SRB density control. Another standard warning is the Hénon family near tangencies, where unstable directions may be defined only nonuniformly and recurrence near critical regions must be controlled separately. The theorem also does not imply smoothness in stable directions or absolute continuity with respect to ambient Riemannian volume.
[example: Linear Cat Map]
Let $A$ be the integer matrix with entries $A_{11}=2$, $A_{12}=1$, $A_{21}=1$, and $A_{22}=1$. Its determinant is
\begin{align*}
\det A=(2)(1)-(1)(1)=1.
\end{align*}
Thus $A\in SL(2,\mathbb Z)$, so it induces a toral automorphism $f_A:\mathbb T^2\to\mathbb T^2$. The characteristic polynomial is
\begin{align*}
\det(A-\lambda I)=(2-\lambda)(1-\lambda)-1.
\end{align*}
Expanding the product gives
\begin{align*}
(2-\lambda)(1-\lambda)-1=2-2\lambda-\lambda+\lambda^2-1=\lambda^2-3\lambda+1.
\end{align*}
Hence the eigenvalues are
\begin{align*}
\lambda_s=\frac{3-\sqrt{5}}{2}
\end{align*}
and
\begin{align*}
\lambda_u=\frac{3+\sqrt{5}}{2}.
\end{align*}
Since $1<\sqrt{5}<3$, we have $0<3-\sqrt{5}<2$, so $0<\lambda_s<1$. Also $\sqrt{5}>1$, so $3+\sqrt{5}>4$, and therefore $\lambda_u>2>1$. The two real eigendirections give a constant splitting of $T\mathbb T^2$ into a uniformly contracted direction and a uniformly expanded direction, so $f_A$ is Anosov.
Lebesgue measure $m$ on $\mathbb T^2$ is invariant because the linear change of variables has Jacobian $|\det A|=1$. Equivalently, for every continuous $\varphi:\mathbb T^2\to\mathbb R$,
\begin{align*}
\int_{\mathbb T^2}\varphi(f_A z)\,dm(z)=\int_{\mathbb T^2}\varphi(w)\,dm(w).
\end{align*}
The unstable foliation is the projection to $\mathbb T^2$ of the affine lines in $\mathbb R^2$ parallel to the unstable eigenspace of $A$. On a small foliated rectangle, two-dimensional Lebesgue measure decomposes as length measure along unstable line segments times transverse stable measure. Therefore the conditional measures of $m$ on unstable plaques are ordinary one-dimensional Lebesgue measures, so $m$ satisfies the SRB condition.
The cat map is ergodic for Lebesgue measure. Indeed, if $k\in\mathbb Z^2\setminus\{0\}$ and $(A^\top)^n k=k$ for some $n\ge 1$, then $1$ would be an eigenvalue of $(A^\top)^n$. But the eigenvalues of $(A^\top)^n$ are $\lambda_s^n$ and $\lambda_u^n$, and these are not equal to $1$ because $0<\lambda_s<1<\lambda_u$. Thus the induced action on nonzero Fourier modes has no finite orbit, which is the standard Fourier criterion for ergodicity of a toral automorphism. By the *Birkhoff Ergodic Theorem*, for every continuous observable $\varphi$ and for $m$-a.e. $z\in\mathbb T^2$,
\begin{align*}
\frac{1}{n}\sum_{j=0}^{n-1}\varphi(f_A^j z)\to \int_{\mathbb T^2}\varphi\,dm.
\end{align*}
Thus the empirical distributions of $m$-almost every point converge to $m$. In this conservative hyperbolic example, the SRB measure is not singular on an attractor; it is the ambient Lebesgue measure itself.
[/example]
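The two ingredients of the ergodicity argument, the spectrum of $A$ and the absence of periodic nonzero Fourier modes under $A^\top$, can be checked mechanically in small cases. A finite search is only a sanity check (the proof is the eigenvalue argument above); the search radius and period bound are arbitrary illustrative parameters, and the rotation matrix is included only as a contrasting non-ergodic case.

```python
import math
from itertools import product

A = [[2, 1], [1, 1]]

def transpose_apply(a, k):
    """Apply the transpose of the 2x2 integer matrix a to the frequency vector k."""
    return (a[0][0] * k[0] + a[1][0] * k[1], a[0][1] * k[0] + a[1][1] * k[1])

def has_short_periodic_mode(a, radius=20, max_period=12):
    """Search for k != 0 with |k_i| <= radius and (a^T)^n k = k for some 1 <= n <= max_period."""
    for k in product(range(-radius, radius + 1), repeat=2):
        if k == (0, 0):
            continue
        v = k
        for _ in range(max_period):
            v = transpose_apply(a, v)
            if v == k:
                return True
    return False

trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = math.sqrt(trace * trace - 4 * det)
lam_s, lam_u = (trace - disc) / 2, (trace + disc) / 2   # roots of lambda^2 - 3 lambda + 1
print(det, lam_s, lam_u)
print(has_short_periodic_mode(A))                  # False for the cat map
print(has_short_periodic_mode([[0, -1], [1, 0]]))  # True: rotation by 90 degrees has period 4
```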
The cat map is conservative, so the SRB measure is ambient volume. Dissipative systems show the more distinctive case: the physical measure can be singular in the full manifold while still being smooth along unstable leaves.
[example: Solenoid Attractor]
Let $T=S^1\times \mathbb D^2$ be a solid torus, and consider a Smale-Williams solenoid map $F:T\to \operatorname{int}(T)$ whose angular coordinate is doubled and whose transverse disk coordinates are contracted by a fixed factor $c$ with $0<c<1/2$. This bound is needed for the two round image disks of radius $c$ that land in each meridian disk to be disjoint, and in particular it gives $c<1/\sqrt 2$. In local coordinates $(\theta,u,v)$ this means that the angular derivative has size $2$, while each transverse derivative has size $c$. Thus vectors tangent to the core direction are expanded by $2>1$, and vectors tangent to the disk fibers are contracted by $c<1$ at every iterate.
The maximal invariant set is
\begin{align*}
\Lambda=\bigcap_{n\ge 0}F^n(T).
\end{align*}
Because $F(T)\subset \operatorname{int}(T)$, every point of $T$ remains in the trapping region under forward iteration and is pulled toward $\Lambda$. The volume of the $n$th image is controlled by the product of the one expanding factor and the two transverse contracting factors:
\begin{align*}
\operatorname{Vol}(F^n(T))=(2c^2)^n\operatorname{Vol}(T).
\end{align*}
Since $c<1/\sqrt2$, we have $c^2<1/2$, hence $2c^2<1$, so
\begin{align*}
(2c^2)^n\operatorname{Vol}(T)\to 0.
\end{align*}
As $\Lambda\subset F^n(T)$ for every $n$, it follows that
\begin{align*}
\operatorname{Vol}(\Lambda)\le \operatorname{Vol}(F^n(T))=(2c^2)^n\operatorname{Vol}(T)
\end{align*}
for every $n$, and therefore $\operatorname{Vol}(\Lambda)=0$.
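The volume factor $2c^2$ can be seen directly on one explicit model. The formula below, $F(\theta,u,v)=(2\theta \bmod 1,\ cu+\tfrac12\cos 2\pi\theta,\ cv+\tfrac12\sin 2\pi\theta)$, is an assumption (one common way to write a solenoid map), not necessarily the construction intended in the text; for this particular formula the image is an embedding only when $c<1/2$, which is stronger than the bound $c<1/\sqrt 2$ used in the volume estimate. The Jacobian matrix is lower triangular, so its determinant is $2\cdot c\cdot c$ at every point.

```python
import math

import numpy as np


def solenoid_map(theta, u, v, c):
    """Hypothetical explicit solenoid model on S^1 x D^2, with S^1 = [0, 1).

    The angle is doubled and the disk coordinates are scaled by c, then
    translated by 1/2 in the direction 2*pi*theta.
    """
    return (
        (2.0 * theta) % 1.0,
        c * u + 0.5 * math.cos(2.0 * math.pi * theta),
        c * v + 0.5 * math.sin(2.0 * math.pi * theta),
    )


def solenoid_jacobian(theta, c):
    """Derivative of solenoid_map: rows (theta', u', v'), columns (theta, u, v)."""
    s = math.sin(2.0 * math.pi * theta)
    co = math.cos(2.0 * math.pi * theta)
    return np.array([
        [2.0, 0.0, 0.0],
        [-math.pi * s, c, 0.0],
        [math.pi * co, 0.0, c],
    ])


def iterates_until_volume_below(c, tol):
    """Smallest n with (2 c^2)^n < tol, i.e. Vol(F^n(T)) < tol * Vol(T)."""
    rate = 2.0 * c * c
    if rate >= 1.0:
        raise ValueError("need 2 c^2 < 1 for the volume to shrink")
    n, ratio = 0, 1.0
    while ratio >= tol:
        ratio *= rate
        n += 1
    return n


c = 0.25  # assumed contraction factor, small enough for this model to embed
for theta in (0.0, 0.1, 0.37, 0.8):
    print(np.linalg.det(solenoid_jacobian(theta, c)))  # 2 c^2 = 0.125 up to rounding
print(iterates_until_volume_below(c, 1e-6))  # 7
```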
Along the unstable coordinate, $F$ is modeled by the doubling map $\theta\mapsto 2\theta \pmod 1$. The natural invariant measure on the unstable coordinate is the measure of maximal entropy for this doubling map, which assigns weight $2^{-n}$ to each cylinder of length $n$ (equivalently, Lebesgue measure on the circle). The SRB measure on $\Lambda$ projects to this measure on the angular coordinate, while the contraction in the disk fibers concentrates it on the limiting solenoid set. Its conditional measures on local unstable curves are absolutely continuous with respect to arclength on those curves. Its basin contains Lebesgue-almost every point of the trapping solid torus $T$ (not every point: a point on the stable disk of a periodic orbit, for instance, has empirical distributions converging to that periodic orbit), so the SRB measure is physical, even though it is singular with respect to three-dimensional volume.
[/example]
The solenoid makes the geometry of an attractor visible: expansion gives randomness along the core, contraction traps a positive-volume set, and the invariant measure lives on the limiting fractal set. The general theory packages these features in the notion of a hyperbolic attractor.
## Existence and Uniqueness in the Uniformly Hyperbolic Case
The next problem is existence. Once we know which conditional structure we want, we need a theorem ensuring that such a measure actually exists and is unique under natural dynamical irreducibility assumptions.
For an Anosov diffeomorphism the whole manifold is hyperbolic. Transitivity rules out decomposing the manifold into several disjoint large invariant pieces with different statistical behaviours. This hypothesis is not cosmetic: if a uniformly hyperbolic system splits into two disjoint invariant hyperbolic components, each component can carry its own SRB measure, and uniqueness on the whole phase space fails. Similarly, if the Anosov hypothesis is removed, there may be neutral directions or critical behaviour for which the Markov-partition and bounded-distortion constructions no longer apply.
[quotetheorem:7769]
[proofunderconstruction:7769]
This theorem is the cleanest expression of the SRB philosophy: a transitive uniformly hyperbolic system has a single statistically observable measure. The proof also shows why thermodynamic formalism appears naturally, because the unstable Jacobian is the potential that compensates for expansion; in statistical mechanics language, the SRB measure is a Gibbs state for this geometric potential. The theorem does not say that every invariant measure is physical, nor does it say that time averages converge for every point. Without transitivity one expects several basic pieces and potentially several SRB measures; without $C^{1+\alpha}$ regularity the stable and unstable holonomies need not be absolutely continuous; without uniform hyperbolicity the symbolic Gibbs construction can break down.
[remark: Conservative and Dissipative Cases]
If the Anosov diffeomorphism preserves smooth volume, as in the cat map, then the SRB measure is that volume measure. If the map is not volume-preserving, the SRB measure is still smooth along unstable leaves but may be singular with respect to ambient volume. The basin statement is the feature that remains physically meaningful in both cases.
[/remark]
The conservative and dissipative cases in the remark both fit the whole-manifold Anosov theorem, but attractors that occupy only part of the manifold require a different formal setup. This motivates the following definition, which records the hyperbolicity, trapping, and transitivity hypotheses needed for the attractor version of SRB theory.
[definition: Axiom A Attractor]
Let $f:M\to M$ be a $C^1$ diffeomorphism. A compact invariant set $\Lambda\subset M$ is an Axiom A attractor if $\Lambda$ is a hyperbolic set, periodic points are dense in $\Lambda$, there is an open attracting neighbourhood $U\subset M$ with $\overline{f(U)}\subset U$ such that
\begin{align*}
\Lambda = \bigcap_{n\ge 0} f^n(U),
\end{align*}
and $f|_\Lambda$ is topologically transitive.
[/definition]
The Axiom A attractor definition motivates the following theorem: when an attractor has hyperbolicity, a trapping basin, and transitivity, the Anosov SRB conclusion persists on the attractor rather than on the whole manifold. Each hypothesis prevents a specific failure mode. Without attraction, a hyperbolic repeller can have natural invariant measures but no positive-volume basin. Without transitivity, an attractor may split into several basic pieces with different statistical behaviours. Without hyperbolicity, stable holonomy and distortion estimates may fail. This theorem identifies both the unique SRB measure and the positive-volume set of initial conditions that observe it.
[quotetheorem:7770]
[proofunderconstruction:7770]
The [Bowen-Ruelle theorem](/theorems/7770) is the main bridge from geometric hyperbolicity to experimentally observable statistics. It also explains why attractors with zero ambient volume can nevertheless dominate observed dynamics: stable contraction brings ordinary volume into the attractor, while unstable expansion supplies the Gibbs-type statistics on the attractor itself. The theorem does not cover nontransitive attractors, where several SRB measures may coexist, and it does not make a nonattracting hyperbolic set physical. Outside Axiom A, for instance near homoclinic tangencies or critical regions, the loss of uniform hyperbolicity means existence of an SRB measure becomes a separate and often delicate theorem.
[remark: Applying the SRB Test]
In practice the uniform theory is used in a fixed order. First identify a candidate topological basin by finding a trapping region or showing that a positive-volume set of initial conditions approaches the invariant set. Next verify that the invariant set carries a hyperbolic splitting $E^s\oplus E^u$, and identify the local unstable leaves, which are tangent to $E^u$. Then compute the unstable Jacobian $J^u f(x)=|\det(Df_x|_{E^u_x})|$, because the Gibbs potential $-\log J^u f$ is what determines the conditional densities. Finally check the SRB condition itself: after disintegration along unstable plaques, the conditional measures should be absolutely continuous with respect to the leaf volumes and should satisfy the Gibbs-type density comparison from the unstable density formula.
[/remark]
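For a linear hyperbolic map the third step of this checklist is a finite computation, because $Df_x=A$ at every point. The sketch below is a hypothetical helper that assumes real eigenvalues; it restricts $A$ to the span of the eigenvectors with $|\lambda|>1$ and takes the absolute determinant. For the cat map the result is the constant $\lambda_u$, so the geometric potential $-\log J^u f$ is constant.

```python
import numpy as np


def unstable_jacobian_linear(A):
    """|det(A restricted to E^u)| for a hyperbolic linear map with real spectrum.

    For a linear map the derivative is A at every point, so J^u f is constant.
    """
    A = np.asarray(A, dtype=float)
    eigvals, eigvecs = np.linalg.eig(A)
    if np.iscomplexobj(eigvals):
        raise ValueError("this sketch assumes real eigenvalues")
    E_u = eigvecs[:, np.abs(eigvals) > 1.0]  # columns spanning E^u
    Q, _ = np.linalg.qr(E_u)                 # orthonormal basis of E^u
    restricted = Q.T @ A @ Q                 # matrix of A|E^u in that basis
    return abs(np.linalg.det(restricted))


J_u = unstable_jacobian_linear([[2, 1], [1, 1]])
print(J_u, (3 + np.sqrt(5)) / 2)  # both about 2.618
print(-np.log(J_u))               # the constant geometric potential -log J^u
```

Because $A$ maps $E^u$ into itself, $AQ=QM$ for some square matrix $M$, and $Q^\top AQ=M$ is exactly the restriction written in the orthonormal basis.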
The solenoid is the clean model for this checklist because the trapping region, unstable expansion, and transverse contraction are all visible in the construction. It shows how an attractor can be physically observed even when its invariant measure is singular in the surrounding phase space.
[example: Solenoid as a Bowen-Ruelle Attractor]
Let $F:T\to \operatorname{int}(T)$ be the Smale-Williams solenoid map on the solid torus $T=S^1\times \mathbb D^2$, with angular expansion by $2$ and transverse contraction by a factor $c$ satisfying $0<c<1/\sqrt 2$. Its maximal invariant set is
\begin{align*}
\Lambda=\bigcap_{n\ge 0}F^n(T).
\end{align*}
The inclusion $F(T)\subset \operatorname{int}(T)$ makes $T$ a trapping neighbourhood for $\Lambda$. Along the angular direction, tangent vectors are multiplied in norm by $2>1$; along the two disk directions, tangent vectors are multiplied in norm by $c<1$. Thus $\Lambda$ has a uniformly expanded one-dimensional unstable direction and uniformly contracted stable directions. The standard solenoid construction is topologically transitive because $F|_\Lambda$ is topologically conjugate to the inverse limit of the doubling map $\theta\mapsto 2\theta\pmod 1$, whose points record a full backward itinerary of angles, and this inverse limit inherits transitivity (indeed mixing) from the doubling map. Hence $\Lambda$ is a transitive uniformly hyperbolic attractor.
By the *Bowen-Ruelle Theorem*, there is a unique SRB measure $\mu_{\mathrm{SRB}}$ supported on $\Lambda$, and its basin contains a full smooth-volume subset of the topological basin of attraction of $\Lambda$. Since every point of the trapping solid torus $T$ remains in $T$ under forward iteration and is attracted to $\Lambda$, this topological basin contains $T$; therefore almost every point of $T$ observes $\mu_{\mathrm{SRB}}$ through its empirical distributions.
The same example shows why physical and ambient-smooth need not coincide. The volume contraction of one iterate is the product of the angular factor $2$ and the two transverse factors $c$ and $c$, so
\begin{align*}
\operatorname{Vol}(F^n(T))=(2c^2)^n\operatorname{Vol}(T).
\end{align*}
Because $c<1/\sqrt 2$, we have $c^2<1/2$, hence $2c^2<1$, and so $(2c^2)^n\operatorname{Vol}(T)\to 0$. Since $\Lambda\subset F^n(T)$ for every $n$,
\begin{align*}
\operatorname{Vol}(\Lambda)\le (2c^2)^n\operatorname{Vol}(T)
\end{align*}
for every $n$, and therefore $\operatorname{Vol}(\Lambda)=0$. Thus $\mu_{\mathrm{SRB}}$ is singular with respect to three-dimensional volume, while its conditional measures on local unstable curves are absolutely continuous with respect to arclength by the SRB conclusion.
[/example]
The examples so far are uniformly hyperbolic. Many important chaotic attractors are not, and the SRB idea remains influential precisely because it gives a target property even when the uniform theory no longer applies directly.
## Nonuniform Motivation and Hénon-Like Attractors
The final question is what survives outside the uniformly hyperbolic world. In applications, critical sets, tangencies, and varying expansion rates are common, so Markov partitions and bounded distortion may not be available in the same uniform form. SRB measures still provide the benchmark for a physically meaningful chaotic attractor.
Hénon-like maps are the guiding example. They are dissipative diffeomorphisms of the plane with stretching, folding, and contraction, and for certain parameter ranges they have strange attractors with nonuniform hyperbolic behaviour.
[example: Hénon-Like Attractor]
Consider the Hénon map $H_{a,b}:\mathbb R^2\to\mathbb R^2$ defined by
\begin{align*}
H_{a,b}(x,y)=(1-a x^2+y,bx).
\end{align*}
Assume $b\ne 0$, with $|b|$ small and $a$ near $2$ in a parameter regime where a strange attractor exists. The derivative is
\begin{align*}
DH_{a,b}(x,y)=\begin{pmatrix}-2ax & 1\\ b & 0\end{pmatrix},
\end{align*}
so its determinant is
\begin{align*}
\det DH_{a,b}(x,y)=(-2ax)(0)-(1)(b)=-b.
\end{align*}
Thus $H_{a,b}$ is locally invertible everywhere when $b\ne 0$, and two-dimensional area is multiplied by $|b|$ at each iterate. Since $|b|$ is small, the map is strongly dissipative.
The inverse can be written explicitly. If $(X,Y)=H_{a,b}(x,y)$, then $Y=bx$, so
\begin{align*}
x=\frac{Y}{b}.
\end{align*}
Substituting this into $X=1-a x^2+y$ gives
\begin{align*}
y=X-1+a\left(\frac{Y}{b}\right)^2.
\end{align*}
Hence
\begin{align*}
H_{a,b}^{-1}(X,Y)=\left(\frac{Y}{b},X-1+a\frac{Y^2}{b^2}\right).
\end{align*}
So the Hénon map is a diffeomorphism for $b\ne 0$, but its small Jacobian determinant shows that it contracts ambient area.
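The explicit inverse and the determinant identity can be checked numerically. In this sketch the values $a=1.9$, $b=0.05$ are illustrative assumptions only (near $2$ and small, respectively); they are not claimed to lie in the strange-attractor parameter set.

```python
import numpy as np


def henon(x, y, a, b):
    """H_{a,b}(x, y) = (1 - a x^2 + y, b x)."""
    return 1.0 - a * x * x + y, b * x


def henon_inverse(X, Y, a, b):
    """H_{a,b}^{-1}(X, Y) = (Y/b, X - 1 + a Y^2/b^2); requires b != 0."""
    if b == 0:
        raise ValueError("H_{a,b} is not invertible when b = 0")
    x = Y / b
    return x, X - 1.0 + a * x * x


def henon_jacobian(x, y, a, b):
    """DH_{a,b}(x, y); it does not depend on y."""
    return np.array([[-2.0 * a * x, 1.0], [b, 0.0]])


a, b = 1.9, 0.05  # illustrative values only
rng = np.random.default_rng(0)
for x, y in rng.uniform(-1.0, 1.0, size=(5, 2)):
    X, Y = henon(x, y, a, b)
    x_back, y_back = henon_inverse(X, Y, a, b)
    # round-trip error (tiny) and the Jacobian determinant (-b up to rounding)
    print(abs(x_back - x) + abs(y_back - y), np.linalg.det(henon_jacobian(x, y, a, b)))
```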
The source of nonuniformity is visible in the first coordinate. The expanding derivative inherited from the quadratic term is $-2ax$, whose absolute value is
\begin{align*}
|-2ax|=2|a||x|.
\end{align*}
Near the line $x=0$ this quantity is small, while away from $x=0$ it can be large. Thus an orbit may experience long stretches of expansion, interrupted by returns near the critical region inherited from the one-dimensional quadratic map $x\mapsto 1-a x^2$. In the strange-attractor parameter regime, the expected SRB measure is physical, has one positive Lyapunov exponent, and has conditional measures absolutely continuous along unstable manifolds where these manifolds are defined. The example therefore marks the point where uniform Bowen-Ruelle theory is no longer enough: existence of the SRB measure depends on controlling how often typical orbits return to the nearly critical region and how much distortion accumulates during those returns.
[/example]
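The alternation of expansion and near-critical returns can be probed numerically with the standard QR (Benettin-type) method, which pushes an orthonormal frame along an orbit and re-orthonormalises it at each step. This sketch uses the classical parameters $a=1.4$, $b=0.3$ as an assumption for illustration; they are not in the small-$|b|$ regime discussed above, and a positive numerical exponent is evidence, not a proof, of an SRB measure. The function name and defaults are hypothetical. Since each step multiplies area by $|b|$, the two estimates must sum to $\log|b|$.

```python
import numpy as np


def henon_lyapunov(a, b, n_steps=50_000, n_transient=1_000, x0=0.0, y0=0.0):
    """Estimate both Lyapunov exponents of H_{a,b} along one orbit (QR method)."""
    x, y = x0, y0
    for _ in range(n_transient):  # let the orbit settle near the attractor
        x, y = 1.0 - a * x * x + y, b * x
    Q = np.eye(2)
    log_sums = np.zeros(2)
    for _ in range(n_steps):
        J = np.array([[-2.0 * a * x, 1.0], [b, 0.0]])  # DH at the current point
        Q, R = np.linalg.qr(J @ Q)                     # re-orthonormalise the frame
        log_sums += np.log(np.abs(np.diag(R)))         # one-step growth rates
        x, y = 1.0 - a * x * x + y, b * x
    return log_sums / n_steps


lam1, lam2 = henon_lyapunov(1.4, 0.3)
print(lam1, lam2)                  # lam1 > 0 > lam2
print(lam1 + lam2, np.log(0.3))   # equal up to rounding
```

For these classical parameters the first estimate is commonly reported at roughly $0.42$; the individual values fluctuate with orbit length, while the sum is fixed by the determinant.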
The Hénon example shows why the definitions in this chapter were separated from the theorems. Physical measures and SRB measures can be defined beyond uniform hyperbolicity, but their existence may require delicate estimates that are absent from the uniformly hyperbolic setting.
[remark: SRB Versus Physical]
For uniformly hyperbolic attractors, the SRB measure given by Bowen-Ruelle is physical. In broader settings the two terms emphasize different information: physical measure refers to the positive-volume basin, while SRB measure refers to absolute continuity along unstable directions. Many central theorems in smooth ergodic theory prove that these properties coincide under suitable hyperbolicity and regularity assumptions.
[/remark]
The conceptual lesson is that chaotic statistics are not merely invariant measures chosen after the fact. The unstable geometry selects measures with smooth conditionals along expanding directions, and the stable geometry makes their basins visible to ordinary volume. SRB measures are therefore the meeting point of hyperbolic geometry, ergodic averages, and physical observation.
The next chapter packages the same hyperbolic structure into Markov partitions and thermodynamic formalism, where symbolic coding and variational principles become the main tools.
# 11. Markov Partitions and Thermodynamic Formalism
This chapter turns the geometric picture of hyperbolicity into a finite combinatorial model. Chapter 2 introduced symbolic dynamics as an independent source of examples, and Chapters 3 through 5 used it to code horseshoes; now symbolic systems become coordinates for large classes of smooth chaotic systems. The main prerequisites are hyperbolic sets, stable and unstable manifolds, subshifts of finite type, topological entropy, measure-theoretic entropy, and Perron-Frobenius theory for non-negative matrices. The second theme is thermodynamic formalism: entropy measures orbit complexity, while pressure adds a weight coming from a potential and selects invariant measures with prescribed statistical behaviour.
## Rectangles and Stable-For-Unstable Product Structure
The problem is to cut a hyperbolic invariant set into pieces that are small enough to see stable and unstable directions, but large enough that iterates move whole pieces according to a finite rule. Ordinary topological rectangles are not adapted to hyperbolic dynamics, because their sides need to follow local stable and unstable manifolds. The correct objects are rectangles in the dynamical sense: sets with a local product structure.
[definition: Local Stable And Unstable Sets]
Let $f:M\to M$ be a diffeomorphism of a compact smooth manifold, and let $\Lambda\subset M$ be a hyperbolic invariant set. For sufficiently small $\varepsilon>0$, the local stable-set assignment $W^s_\varepsilon:\Lambda\to\mathcal P(M)$ and the local unstable-set assignment $W^u_\varepsilon:\Lambda\to\mathcal P(M)$ are defined by
\begin{align*}
W^s_\varepsilon(x)=\{y\in M: d(f^n(x),f^n(y))\leq \varepsilon \text{ for all } n\geq 0\}.
\end{align*}
\begin{align*}
W^u_\varepsilon(x)=\{y\in M: d(f^{-n}(x),f^{-n}(y))\leq \varepsilon \text{ for all } n\geq 0\}.
\end{align*}
[/definition]
These local sets are short pieces of the stable and unstable manifolds. For nearby points $x,y$ of a hyperbolic set, $W^s_\varepsilon(x)$ and $W^u_\varepsilon(y)$ meet in exactly one point, and when $\Lambda$ has local product structure (for instance, when $\Lambda$ is locally maximal) this point lies in $\Lambda$. The next piece of notation records this point, obtained by taking the stable coordinate from one point and the unstable coordinate from the other.
[definition: Bracket Map]
Let $\Lambda$ be a hyperbolic invariant set and fix a scale at which local product structure is defined. Let $U\subset \Lambda\times\Lambda$ be the neighbourhood of the diagonal consisting of pairs for which the local stable and unstable plaques meet in a unique point of $\Lambda$. The bracket map is the map
\begin{align*}
[\cdot,\cdot]:U\to\Lambda
\end{align*}
defined by
\begin{align*}
[x,y]=W^s_\varepsilon(x)\cap W^u_\varepsilon(y).
\end{align*}
[/definition]
The bracket map gives a coordinate operation, not merely a point of intersection. To build a symbolic partition, we need sets that are closed under this operation, because a symbolic rectangle should contain every combination of stable and unstable coordinates coming from points already inside it.
[definition: Rectangle]
A subset $R\subset\Lambda$ is a rectangle if $[x,y]\in R$ for all sufficiently close $x,y\in R$ for which the bracket is defined.
[/definition]
Rectangles should be thought of as curved products of stable plaques and unstable plaques. Their boundaries are divided into stable and unstable parts, and the Markov condition requires images of unstable sides to line up with unstable sides and preimages of stable sides to line up with stable sides.
[example: Rectangles For The Cat Map]
Let $A$ be given by $A(x_1,x_2)=(2x_1+x_2,x_1+x_2)$. Since $\det A=2\cdot 1-1\cdot 1=1$, the integer matrix $A$ induces an automorphism $f:\mathbb T^2\to\mathbb T^2$. Its characteristic polynomial is
\begin{align*}
\det(A-\lambda I)=(2-\lambda)(1-\lambda)-1=2-3\lambda+\lambda^2-1=\lambda^2-3\lambda+1.
\end{align*}
Therefore the two eigenvalues are
\begin{align*}
\lambda_u=\frac{3+\sqrt 5}{2}>1,\qquad \lambda_s=\frac{3-\sqrt 5}{2}<1.
\end{align*}
For a vector $v=(1,m)$, the equation $Av=\lambda v$ means
\begin{align*}
(2+m,1+m)=(\lambda,\lambda m).
\end{align*}
The first coordinate gives $m=\lambda-2$, so the unstable and stable directions may be represented by
\begin{align*}
e_u=\left(1,\frac{\sqrt 5-1}{2}\right),\qquad e_s=\left(1,-\frac{\sqrt 5+1}{2}\right).
\end{align*}
Choose a small parallelogram
\begin{align*}
R=\{p+a e_s+b e_u: |a|\leq \alpha,\ |b|\leq \beta\}\subset \mathbb T^2
\end{align*}
small enough that these coordinates do not wrap around the torus. If $x=p+a_x e_s+b_x e_u$ and $y=p+a_y e_s+b_y e_u$ lie in $R$, then the local stable segment through $x$ inside $R$ is
\begin{align*}
W^s_{\mathrm{loc}}(x)\cap R=\{p+a e_s+b_x e_u: |a|\leq \alpha\}.
\end{align*}
The local unstable segment through $y$ inside $R$ is
\begin{align*}
W^u_{\mathrm{loc}}(y)\cap R=\{p+a_y e_s+b e_u: |b|\leq \beta\}.
\end{align*}
These two displayed sets meet at exactly one point, namely
\begin{align*}
[x,y]=p+a_y e_s+b_x e_u.
\end{align*}
Because $|a_y|\leq \alpha$ and $|b_x|\leq \beta$, this point belongs to $R$, so $R$ is closed under the bracket operation and is a dynamical rectangle.
Finally, since $Ae_s=\lambda_s e_s$ and $Ae_u=\lambda_u e_u$, we have
\begin{align*}
A(ae_s+be_u)=a\lambda_s e_s+b\lambda_u e_u.
\end{align*}
Thus $0<\lambda_s<1<\lambda_u$ means that $f$ contracts stable sides and stretches unstable sides. This product-coordinate behaviour is the local mechanism that lets finitely many such parallelograms be arranged into a Markov partition for the cat map.
[/example]
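The product coordinates and the bracket formula $[x,y]=p+a_ye_s+b_xe_u$ can be checked numerically in the lifted plane $\mathbb R^2$. The sketch assumes a hypothetical base point $p=(0.5,0.5)$ and coordinates small enough that nothing wraps around the torus, as in the text; the helper names are illustrative.

```python
import numpy as np

SQRT5 = np.sqrt(5.0)
A = np.array([[2.0, 1.0], [1.0, 1.0]])
LAM_U, LAM_S = (3 + SQRT5) / 2, (3 - SQRT5) / 2
E_U = np.array([1.0, (SQRT5 - 1) / 2])   # unstable direction (1, lambda_u - 2)
E_S = np.array([1.0, -(SQRT5 + 1) / 2])  # stable direction (1, lambda_s - 2)
BASIS = np.column_stack([E_S, E_U])      # z - p = BASIS @ (a, b)


def product_coords(z, p):
    """Coordinates (a, b) with z = p + a e_s + b e_u, in the lifted plane."""
    return np.linalg.solve(BASIS, np.asarray(z, dtype=float) - p)


def bracket(x, y, p):
    """Point on the stable segment through x and the unstable segment through y."""
    a_x, b_x = product_coords(x, p)
    a_y, b_y = product_coords(y, p)
    return p + a_y * E_S + b_x * E_U


def cross(u, v):
    """2D cross product; zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


p = np.array([0.5, 0.5])  # hypothetical base point
x = p + 0.02 * E_S - 0.01 * E_U
y = p - 0.03 * E_S + 0.015 * E_U
z = bracket(x, y, p)
print(cross(z - x, E_S), cross(z - y, E_U))  # both ~0
print(product_coords(A @ z, A @ p))          # (lambda_s * -0.03, lambda_u * -0.01)
```

The last line illustrates the displayed identity $A(ae_s+be_u)=a\lambda_se_s+b\lambda_ue_u$: applying $A$ rescales the stable coordinate by $\lambda_s$ and the unstable coordinate by $\lambda_u$.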
This example shows why a finite partition should encode the dynamics by recording which rectangle an orbit visits at each time. The extra Markov alignment condition is what makes the resulting itinerary space a subshift of finite type rather than an arbitrary subshift.
## Markov Partitions and Transition Matrices
The next question is how to impose a finite transition rule on the rectangles. A partition by rectangles becomes Markov when transitions from one rectangle to another depend only on the present rectangle, not on the whole past itinerary. This is the bridge from smooth hyperbolic dynamics to topological Markov chains.
[definition: Markov Partition]
Let $f:M\to M$ be a diffeomorphism and let $\Lambda\subset M$ be a compact hyperbolic invariant set. A finite collection $\mathcal R=\{R_1,\dots,R_k\}$ of rectangles is a Markov partition for $f|_\Lambda$ if the interiors of the $R_i$ are pairwise disjoint in $\Lambda$, their union is $\Lambda$, and whenever $x\in \operatorname{int}(R_i)$ with $f(x)\in\operatorname{int}(R_j)$, the image of the local unstable plaque of $x$ inside $R_i$ covers the local unstable plaque of $f(x)$ inside $R_j$, while the preimage of the local stable plaque of $f(x)$ inside $R_j$ covers the local stable plaque of $x$ inside $R_i$.
[/definition]
The definition says that future choices propagate along unstable plaques and past choices propagate along stable plaques. Once this alignment holds, the remaining data are finite: for each ordered pair of rectangles, we only need to know whether the first can be followed by the second.
[definition: Transition Matrix Of A Markov Partition]
Let $\mathcal R=\{R_1,\dots,R_k\}$ be a Markov partition for $f|_\Lambda$. Its transition matrix is the $k\times k$ matrix $A=(A_{ij})$ with entries
\begin{align*}
A_{ij}=1 \quad \text{iff} \quad \operatorname{int}(R_i)\cap f^{-1}(\operatorname{int}(R_j))\neq\varnothing,
\end{align*}
and $A_{ij}=0$ otherwise.
[/definition]
The matrix $A$ defines a subshift of finite type $\Sigma_A$. The coding problem is not just to record which rectangles are visited, but to recover a point whose entire orbit realizes a prescribed compatible itinerary. Without the Markov alignment of stable and unstable sides, pairwise allowed transitions may fail to assemble into a global orbit, and boundary points may have more than one name. The Markov property is designed to remove the first obstruction while controlling the second by allowing finite-to-one coding.
[quotetheorem:7771]
[citeproof:7771]
The coding theorem converts geometric questions into matrix questions, but the Markov hypothesis is doing essential work. If a finite rectangle cover records visits without stable-unstable alignment, the set of possible future rectangles may depend on a longer past history, so the itinerary space need not be described by a single transition matrix. Boundary points also explain why the coding is finite-to-one rather than globally one-to-one: an orbit that lands on a shared side may receive several symbolic names. The theorem therefore gives a semiconjugacy from a subshift of finite type, not a topological conjugacy on all of $\Lambda$, and the next issue is whether partitions with this strong alignment actually exist.
[quotetheorem:7772]
This theorem is used as a structural input in the course. Compactness is needed to pass from local product boxes to a finite cover, and uniform Anosov hyperbolicity is needed so that the same stable-unstable estimates work across the whole manifold. For nonuniformly hyperbolic maps, or for maps with tangencies and critical behaviour, a finite Markov partition may not exist; one often needs countable symbolic models or inducing schemes instead. The theorem also does not construct a canonical partition, since different choices of small rectangles may give different transition matrices. In these notes we use the Sinai-Bowen-Ratner construction as the bridge that transfers statements from shifts of finite type to Anosov diffeomorphisms.
[example: A Markov Partition For The Cat Map]
For the cat map $f:\mathbb T^2\to\mathbb T^2$ induced by
\begin{align*}
A(x_1,x_2)=(2x_1+x_2,x_1+x_2),
\end{align*}
take a Markov partition $\mathcal R=\{R_1,\dots,R_k\}$ for $f$ consisting of parallelograms in the square model of $\mathbb T^2$, with opposite sides of the square identified and with each side of each $R_i$ parallel to one of the stable or unstable eigendirections; such partitions, with rectangles of arbitrarily small diameter, exist by the existence theorem for Anosov diffeomorphisms above. If a point in one rectangle is written in local product coordinates as
\begin{align*}
z=p+a e_s+b e_u,
\end{align*}
then, using $Ae_s=\lambda_s e_s$ and $Ae_u=\lambda_u e_u$, its image satisfies
\begin{align*}
Az=Ap+a\lambda_s e_s+b\lambda_u e_u.
\end{align*}
Thus the stable coordinate $a$ is multiplied by $\lambda_s$, with $0<\lambda_s<1$, while the unstable coordinate $b$ is multiplied by $\lambda_u$, with $\lambda_u>1$. So each image $f(R_i)$ is a longer and thinner parallelogram strip whose unstable direction has been stretched across some of the rectangles $R_j$.
The transition matrix $T=(T_{ij})$ is defined by
\begin{align*}
T_{ij}=1 \quad \text{iff} \quad \operatorname{int}(R_i)\cap f^{-1}(\operatorname{int}(R_j))\neq\varnothing.
\end{align*}
Equivalently, $T_{ij}=1$ exactly when
\begin{align*}
f(\operatorname{int}(R_i))\cap \operatorname{int}(R_j)\neq\varnothing.
\end{align*}
A symbolic itinerary $(i_n)_{n\in\mathbb Z}$ of a point $z$ records the concrete condition
\begin{align*}
f^n(z)\in R_{i_n}\quad \text{for every } n\in\mathbb Z.
\end{align*}
When no iterate of $z$ lies on a rectangle boundary, exactly one rectangle contains each $f^n(z)$ in its interior, so the itinerary is unique, and consecutive symbols satisfy $T_{i_ni_{n+1}}=1$ by the definition of $T$. If some iterate lies on a shared stable or unstable side, then that iterate may belong to more than one rectangle, and the same orbit can receive more than one symbolic name. Thus the Markov partition converts the cat map into a subshift of finite type, with non-uniqueness occurring precisely at the partition boundaries.
[/example]
The transition matrix also gives computable dynamical invariants. If $A$ is irreducible or primitive, Perron-Frobenius theory controls orbit growth, entropy, and natural invariant measures for the symbolic model.
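These matrix computations are easy to check on a small example. The sketch uses the golden-mean matrix $\begin{pmatrix}1&1\\1&0\end{pmatrix}$ as a hypothetical transition matrix (it is not derived from the cat map partition). It compares a brute-force count of admissible words with the sum of the entries of $A^{n-1}$, counts period-$n$ points by $\operatorname{tr}(A^n)$, and computes the entropy $\log\rho(A)$.

```python
import itertools

import numpy as np


def count_words_bruteforce(A, n):
    """Number of admissible words i_0 ... i_{n-1} with A[i_j][i_{j+1}] = 1."""
    k = len(A)
    return sum(
        1
        for w in itertools.product(range(k), repeat=n)
        if all(A[w[j]][w[j + 1]] for j in range(n - 1))
    )


def count_words_matrix(A, n):
    """The same count as the sum of the entries of A^(n-1)."""
    return int(np.linalg.matrix_power(np.array(A, dtype=np.int64), n - 1).sum())


def count_periodic_points(A, n):
    """Number of x in Sigma_A with sigma^n x = x, namely trace(A^n)."""
    return int(np.trace(np.linalg.matrix_power(np.array(A, dtype=np.int64), n)))


def topological_entropy(A):
    """log of the Perron-Frobenius eigenvalue of A."""
    return float(np.log(np.max(np.abs(np.linalg.eigvals(np.array(A, dtype=float))))))


GOLDEN_MEAN = [[1, 1], [1, 0]]  # forbids the word "11"
print([count_words_bruteforce(GOLDEN_MEAN, n) for n in range(1, 8)])  # [2, 3, 5, 8, 13, 21, 34]
print([count_periodic_points(GOLDEN_MEAN, n) for n in range(1, 8)])   # [1, 3, 4, 7, 11, 18, 29]
print(topological_entropy(GOLDEN_MEAN), np.log((1 + np.sqrt(5)) / 2))
```

The brute-force count is exponential in $n$ and serves only to confirm the matrix formula on short words.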
## Pressure and the Variational Principle
Entropy counts orbit names without weights. The next problem is to count orbits while rewarding or penalising them according to an observable $\varphi:X\to\mathbb R$, called a potential. Pressure is the resulting weighted growth rate, and it is designed so that entropy appears as the special case $\varphi=0$.
[definition: Topological Pressure For A Subshift Of Finite Type]
Let $\Sigma_A$ be a one-sided or two-sided subshift of finite type, let $\sigma:\Sigma_A\to\Sigma_A$ be the shift, and let $\varphi:\Sigma_A\to\mathbb R$ be continuous. The pressure functional is the map
\begin{align*}
P:C(\Sigma_A)\to\mathbb R.
\end{align*}
The topological pressure of $\varphi$ is
\begin{align*}
P(\varphi)=\lim_{n\to\infty}\frac{1}{n}\log\sum_{w\in\mathcal L_n(A)}\exp\left(\sup_{x\in[w]}\sum_{j=0}^{n-1}\varphi(\sigma^j x)\right),
\end{align*}
where $\mathcal L_n(A)$ is the set of admissible words of length $n$ and $[w]$ is the corresponding cylinder.
[/definition]
The supremum over a cylinder compensates for variation of the potential inside the cylinder. For Hölder potentials, this variation is uniformly controlled, and periodic-orbit or transfer-operator formulas give the same pressure.
[example: Pressure Of A Locally Constant Potential]
Let $\varphi$ depend only on the present symbol, so that $\varphi(x)=\varphi_{x_0}$ on the cylinder $[x_0]$. Define $B_{ij}=A_{ij}e^{\varphi_i}$. For an admissible word $w=i_0i_1\cdots i_{n-1}$ and any $x\in[w]$, the orbit segment has symbols $i_0,\dots,i_{n-1}$, hence
\begin{align*}
\sum_{j=0}^{n-1}\varphi(\sigma^j x)=\varphi_{i_0}+\varphi_{i_1}+\cdots+\varphi_{i_{n-1}}.
\end{align*}
Since this value is constant on $[w]$, the supremum in the pressure partition sum is the same number.
Now compare this partition sum with powers of $B$. For an admissible word $i_0\cdots i_{n-1}$,
\begin{align*}
B_{i_0i_1}B_{i_1i_2}\cdots B_{i_{n-2}i_{n-1}}=A_{i_0i_1}\cdots A_{i_{n-2}i_{n-1}}e^{\varphi_{i_0}+\cdots+\varphi_{i_{n-2}}}.
\end{align*}
Because the word is admissible, each factor $A_{i_j i_{j+1}}$ equals $1$, so
\begin{align*}
B_{i_0i_1}B_{i_1i_2}\cdots B_{i_{n-2}i_{n-1}}=e^{\varphi_{i_0}+\cdots+\varphi_{i_{n-2}}}.
\end{align*}
Therefore
\begin{align*}
e^{\varphi_{i_0}+\cdots+\varphi_{i_{n-1}}}=e^{\varphi_{i_{n-1}}}B_{i_0i_1}B_{i_1i_2}\cdots B_{i_{n-2}i_{n-1}}.
\end{align*}
If $m=\min_i\varphi_i$ and $M=\max_i\varphi_i$, then
\begin{align*}
e^m B_{i_0i_1}\cdots B_{i_{n-2}i_{n-1}}\leq e^{\varphi_{i_0}+\cdots+\varphi_{i_{n-1}}}\leq e^M B_{i_0i_1}\cdots B_{i_{n-2}i_{n-1}}.
\end{align*}
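Summing over all admissible words of length $n$, writing $Z_n(\varphi)$ for the partition sum in the definition of $P(\varphi)$, and noting that non-admissible words contribute $0$ to $\sum_{a,b}(B^{n-1})_{ab}$ because $B_{ij}=0$ whenever $A_{ij}=0$, gives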
\begin{align*}
e^m\sum_{a,b}(B^{n-1})_{ab}\leq Z_n(\varphi)\leq e^M\sum_{a,b}(B^{n-1})_{ab}.
\end{align*}
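Taking logarithms, dividing by $n$, and letting $n\to\infty$, the constants $m/n$ and $M/n$ vanish. Since $B^{n-1}$ has non-negative entries, $\sum_{a,b}(B^{n-1})_{ab}$ is the entrywise $\ell^1$ norm of $B^{n-1}$, so Gelfand's spectral radius formula (or the *Perron-Frobenius theorem*, when $B$ is irreducible) gives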
\begin{align*}
\lim_{n\to\infty}\frac{1}{n}\log\sum_{a,b}(B^{n-1})_{ab}=\log\rho(B).
\end{align*}
Hence
\begin{align*}
P(\varphi)=\log\rho(B).
\end{align*}
Thus a locally constant potential replaces the unweighted transition matrix $A$ by the weighted transition matrix $B$, and pressure is the exponential growth rate of the total weight of admissible paths.
[/example]
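The sandwich estimate and the limit can be checked numerically. This sketch assumes a hypothetical golden-mean matrix and potential values $\varphi=(0.3,-0.5)$. It enumerates $Z_n(\varphi)$ directly for short words, verifies $e^m\sum_{a,b}(B^{n-1})_{ab}\le Z_n(\varphi)\le e^M\sum_{a,b}(B^{n-1})_{ab}$, and compares $\frac1n\log\sum_{a,b}(B^{n-1})_{ab}$ at a large $n$ with $\log\rho(B)$.

```python
import itertools
import math

import numpy as np


def partition_sum(A, phi, n):
    """Z_n(phi) by enumeration, for phi depending only on the present symbol.

    The Birkhoff sum is constant on each cylinder, so the sup in the
    definition is just phi_{i_0} + ... + phi_{i_{n-1}}.
    """
    k = len(A)
    total = 0.0
    for w in itertools.product(range(k), repeat=n):
        if all(A[w[j]][w[j + 1]] for j in range(n - 1)):
            total += math.exp(sum(phi[i] for i in w))
    return total


def weighted_matrix(A, phi):
    """B_ij = A_ij * exp(phi_i): row i is scaled by exp(phi_i)."""
    return np.array(A, dtype=float) * np.exp(np.array(phi, dtype=float))[:, None]


def pressure_locally_constant(A, phi):
    """P(phi) = log rho(B)."""
    return float(np.log(np.max(np.abs(np.linalg.eigvals(weighted_matrix(A, phi))))))


A = [[1, 1], [1, 0]]  # hypothetical transition matrix
phi = [0.3, -0.5]     # hypothetical potential values phi_0, phi_1
B = weighted_matrix(A, phi)
m, M = min(phi), max(phi)
for n in range(1, 11):
    S = np.linalg.matrix_power(B, n - 1).sum()
    Z = partition_sum(A, phi, n)
    assert math.exp(m) * S <= Z * (1 + 1e-12) and Z <= math.exp(M) * S * (1 + 1e-12)

n = 200
print(math.log(np.linalg.matrix_power(B, n - 1).sum()) / n, pressure_locally_constant(A, phi))
```

The two printed numbers agree only up to an $O(1/n)$ error, which is the rate at which the boundary constants disappear.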
Pressure has a measure-theoretic characterisation, but first we need the name for a measure that attains the pressure value. Such a measure balances two competing effects: it spreads mass across many distinguishable orbits to gain entropy, and it concentrates mass where the potential has high average value.
[definition: Equilibrium State]
Let $T:X\to X$ be a continuous map of a compact metric space and let $\varphi:X\to\mathbb R$ be continuous. A $T$-invariant Borel probability measure $\mu$ is an equilibrium state for $\varphi$ if
\begin{align*}
P(\varphi)=h_\mu(T)+\int_X \varphi\,d\mu.
\end{align*}
[/definition]
When $\varphi=0$, an equilibrium state is a measure of maximal entropy. The definition raises the next problem: pressure was introduced through weighted orbit growth, while equilibrium states are defined through invariant measures, so we need a theorem identifying these two viewpoints. For a general compact system, pressure is defined by replacing cylinder partition sums with weighted separated-set or spanning-set partition sums; this agrees with the cylinder definition for subshifts of finite type. That identification is the variational principle.
[quotetheorem:6816]
[citeproof:6816]
The theorem explains the word pressure: it is the supremum of the free energy $h_\mu(T)+\int\varphi\,d\mu$. Compactness of $X$ and continuity of $T$ keep the orbit-complexity definitions of pressure compatible with invariant probability measures, while continuity of $\varphi$ ensures that the potential changes controllably on small orbit scales. A concrete warning is the full shift on the countable alphabet $\mathbb N$: the space is not compact in the product topology with a discrete alphabet, and the zero potential has infinitely many one-step choices, so the finite-alphabet compactness argument no longer produces a measure of maximal entropy. In countable Markov shifts and intermittent interval maps such as the Manneville-Pomeau family, existence of equilibrium states depends on recurrence, tightness, and regularity conditions beyond the compact variational principle. Thus the variational principle identifies pressure with a measure-theoretic optimisation problem, but it does not by itself guarantee uniqueness of an optimiser; the next section obtains uniqueness in the symbolic mixing case using Perron-Frobenius theory.
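The inequality half of the variational principle can be illustrated on a symbolic example. Every stationary Markov measure supported on $\Sigma_A$ is shift-invariant, so its free energy $h_\mu+\int\varphi\,d\mu$ cannot exceed $P(\varphi)=\log\rho(B)$ for a locally constant potential. The sketch samples random Markov measures for a hypothetical irreducible $3\times 3$ matrix and potential; the entropy of a Markov measure with stationary vector $\pi$ and transition matrix $P$ is $-\sum_{i,j}\pi_iP_{ij}\log P_{ij}$.

```python
import numpy as np


def pressure(A, phi):
    """log rho(B) with B_ij = A_ij e^{phi_i} (locally constant potential)."""
    B = np.array(A, dtype=float) * np.exp(np.asarray(phi, dtype=float))[:, None]
    return float(np.log(np.max(np.abs(np.linalg.eigvals(B)))))


def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 by least squares."""
    k = P.shape[0]
    M = np.vstack([P.T - np.eye(k), np.ones((1, k))])
    rhs = np.zeros(k + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return pi


def markov_free_energy(P, phi):
    """h_mu + integral of phi for the stationary Markov measure with matrix P."""
    pi = stationary_distribution(P)
    logs = np.log(P, out=np.zeros_like(P), where=P > 0)  # 0 log 0 = 0
    entropy = -(pi[:, None] * P * logs).sum()
    return float(entropy + pi @ np.asarray(phi, dtype=float))


def random_markov_matrix(A, rng):
    """Random stochastic matrix supported exactly on the allowed transitions of A."""
    W = np.array(A, dtype=float) * rng.uniform(0.05, 1.0, size=np.shape(A))
    return W / W.sum(axis=1, keepdims=True)


A = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]  # hypothetical irreducible matrix
phi = [0.4, -0.2, 0.1]                 # hypothetical potential values
rng = np.random.default_rng(1)
best = max(markov_free_energy(random_markov_matrix(A, rng), phi) for _ in range(2000))
print(best, pressure(A, phi))  # best <= pressure; random sampling need not reach it
```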
## Measures Of Maximal Entropy and Parry Measures
The next problem is to identify the invariant measure that gives the largest possible entropy. For a mixing subshift of finite type, Perron-Frobenius theory supplies a canonical answer: the Parry measure. This measure is the symbolic prototype for Bowen's measures of maximal entropy on hyperbolic systems.
[definition: Measure Of Maximal Entropy]
Let $T:X\to X$ be a continuous map of a compact metric space. A $T$-invariant Borel probability measure $\mu$ is a measure of maximal entropy if
\begin{align*}
h_\mu(T)=h_{\mathrm{top}}(T).
\end{align*}
[/definition]
The definition is an optimisation problem over invariant measures. On a mixing topological Markov chain, the optimiser can be written explicitly from the Perron-Frobenius eigenvectors of the transition matrix.
[definition: Parry Measure]
Let $A$ be an irreducible $k\times k$ zero-one matrix with Perron-Frobenius eigenvalue $\lambda>0$. Let $r,l\in\mathbb R^k_+$ be right and left eigenvectors satisfying
\begin{align*}
Ar&=\lambda r, & l^\top A&=\lambda l^\top, & \sum_i l_i r_i&=1.
\end{align*}
The Parry measure $\mu_P$ on $\Sigma_A$ is the Markov measure with stationary distribution $\pi_i=l_i r_i$ and transition probabilities
\begin{align*}
P_{ij}=\frac{A_{ij}r_j}{\lambda r_i}.
\end{align*}
[/definition]
These formulas are arranged so that $\pi_i P_{ij}=l_iA_{ij}r_j/\lambda$ and the stationarity equations follow from the eigenvector identities. The next question is whether this invariant Markov measure is merely natural or actually optimal for entropy. The theorem below answers that question and gives uniqueness.
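The Parry construction can be carried out numerically. The sketch below computes the Perron data with `numpy.linalg.eig`, builds $\pi_i=l_ir_i$ and $P_{ij}=A_{ij}r_j/(\lambda r_i)$, and checks that $P$ is stochastic, $\pi$ is stationary, and the Markov entropy equals $\log\lambda$. The golden-mean matrix and the helper names are hypothetical choices for illustration. For an irreducible non-negative matrix the Perron root is the unique eigenvalue of largest real part, which is how it is selected here.

```python
import numpy as np


def perron_data(M):
    """Perron root and positive right/left eigenvectors of an irreducible matrix."""
    vals, right = np.linalg.eig(M)
    i = np.argmax(vals.real)  # the Perron root has the largest real part
    vals_t, left = np.linalg.eig(M.T)
    j = np.argmax(vals_t.real)
    r = np.abs(right[:, i].real)
    l = np.abs(left[:, j].real)
    return vals[i].real, r, l / (l @ r)  # normalise so sum_i l_i r_i = 1


def parry_measure(A):
    """Perron root, stationary vector pi, and transition matrix P of the Parry measure."""
    A = np.array(A, dtype=float)
    lam, r, l = perron_data(A)
    pi = l * r
    P = A * r[None, :] / (lam * r[:, None])  # P_ij = A_ij r_j / (lam r_i)
    return lam, pi, P


def markov_entropy(pi, P):
    """-sum_ij pi_i P_ij log P_ij, with the convention 0 log 0 = 0."""
    logs = np.log(P, out=np.zeros_like(P), where=P > 0)
    return float(-(pi[:, None] * P * logs).sum())


lam, pi, P = parry_measure([[1, 1], [1, 0]])
print(P.sum(axis=1), pi @ P - pi)           # rows sum to 1; pi is stationary
print(markov_entropy(pi, P), np.log(lam))   # entropy equals log(lambda)
```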
[quotetheorem:6793]
Irreducibility is essential here because it gives one transitive symbolic component and a positive Perron-Frobenius eigenvector controlling admissible words. If $A$ is primitive, the Parry measure is also mixing. If $A$ is irreducible but periodic, the entropy maximiser is still governed by the same Perron-Frobenius data, but mixing statements must be read along the period decomposition. If $A$ is reducible, different maximal components can carry distinct measures of maximal entropy, so uniqueness may fail. The same mechanism works for locally constant potentials after replacing the zero-one matrix by the weighted nonnegative matrix $B_{ij}=A_{ij}e^{\varphi_i}$, which has the same zero pattern as $A$. The equilibrium state is again Markov, but its transition probabilities are tilted toward symbols with larger potential weight.
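The distinction between irreducible, primitive, and reducible transition matrices can be tested mechanically. The sketch below uses two standard criteria for a nonnegative $k\times k$ matrix: $A$ is irreducible when $(I+A)^{k-1}$ is entrywise positive, and primitive when $A^{(k-1)^2+1}$ is entrywise positive (Wielandt's bound). The helper names are hypothetical, and only the zero pattern of each power is tracked, to avoid integer growth.

```python
import numpy as np

def power_is_positive(M, power):
    """True when every entry of M**power is positive, tracking only the 0/1 pattern."""
    pattern = (np.asarray(M) > 0).astype(np.int64)
    result = np.eye(len(pattern), dtype=np.int64)
    for _ in range(power):
        result = ((result @ pattern) > 0).astype(np.int64)  # keep entries in {0, 1}
    return bool(result.all())

def is_irreducible(A):
    """Irreducible iff (I + A)^(k-1) has all entries positive."""
    k = len(A)
    return power_is_positive(np.eye(k) + np.asarray(A), k - 1)

def is_primitive(A):
    """Primitive iff A^((k-1)^2 + 1) has all entries positive (Wielandt's bound)."""
    k = len(A)
    return power_is_positive(A, (k - 1) ** 2 + 1)

examples = [("golden mean", [[1, 1], [1, 0]]),
            ("period two", [[0, 1], [1, 0]]),
            ("reducible", [[1, 1], [0, 1]])]
for name, A in examples:
    print(name, is_irreducible(A), is_primitive(A))
```

The period-two matrix is irreducible but not primitive: its powers alternate between the identity and the swap, which is exactly the periodic case in which mixing statements must be read along the period decomposition.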
[example: Equilibrium State For A Locally Constant Potential]
[claim]For the locally constant potential $\varphi(x)=\varphi_{x_0}$, the Markov measure defined from the Perron-Frobenius data of $B_{ij}=A_{ij}e^{\varphi_i}$ is an equilibrium state and has pressure value $\log\rho(B)$.[/claim]
[proof]Let $\rho=\rho(B)$, and choose positive left and right eigenvectors $l,r$ satisfying
\begin{align*}
Br=\rho r,\qquad l^\top B=\rho l^\top,\qquad \sum_i l_i r_i=1.
\end{align*}
Define
\begin{align*}
\pi_i=l_i r_i,\qquad P_{ij}=\frac{B_{ij}r_j}{\rho r_i}.
\end{align*}
First $P$ is stochastic, because for each $i$,
\begin{align*}
\sum_j P_{ij}=\sum_j \frac{B_{ij}r_j}{\rho r_i}=\frac{(Br)_i}{\rho r_i}=\frac{\rho r_i}{\rho r_i}=1.
\end{align*}
The distribution $\pi$ is stationary for $P$, since
\begin{align*}
\sum_i \pi_iP_{ij}=\sum_i l_i r_i\frac{B_{ij}r_j}{\rho r_i}=\frac{r_j}{\rho}\sum_i l_iB_{ij}.
\end{align*}
Using $l^\top B=\rho l^\top$, the last sum is $\rho l_j$, so
\begin{align*}
\sum_i \pi_iP_{ij}=\frac{r_j}{\rho}\rho l_j=l_jr_j=\pi_j.
\end{align*}
Thus $\pi$ and $P$ define a shift-invariant Markov measure $\mu_B$.
For an allowed transition $i\to j$, we have $A_{ij}=1$, hence
\begin{align*}
B_{ij}=e^{\varphi_i}.
\end{align*}
Therefore
\begin{align*}
\log P_{ij}=\log B_{ij}+\log r_j-\log\rho-\log r_i=\varphi_i+\log r_j-\log\rho-\log r_i.
\end{align*}
The Markov entropy formula gives
\begin{align*}
h_{\mu_B}(\sigma)=-\sum_{i,j}\pi_iP_{ij}\log P_{ij}.
\end{align*}
Substituting the displayed formula for $\log P_{ij}$ gives
\begin{align*}
h_{\mu_B}(\sigma)=-\sum_{i,j}\pi_iP_{ij}\varphi_i-\sum_{i,j}\pi_iP_{ij}\log r_j+\sum_{i,j}\pi_iP_{ij}\log\rho+\sum_{i,j}\pi_iP_{ij}\log r_i.
\end{align*}
Since $\sum_jP_{ij}=1$,
\begin{align*}
\int \varphi\,d\mu_B=\sum_i\pi_i\varphi_i=\sum_{i,j}\pi_iP_{ij}\varphi_i.
\end{align*}
Since $\pi P=\pi$,
\begin{align*}
\sum_{i,j}\pi_iP_{ij}\log r_j=\sum_j\pi_j\log r_j.
\end{align*}
Again using $\sum_jP_{ij}=1$,
\begin{align*}
\sum_{i,j}\pi_iP_{ij}\log r_i=\sum_i\pi_i\log r_i.
\end{align*}
The two $r$-terms are the same sum with a dummy index renamed, so they cancel. The first term is $-\int\varphi\,d\mu_B$, and since $\sum_{i,j}\pi_iP_{ij}=1$ the $\log\rho$ term equals $\log\rho$. Therefore
\begin{align*}
h_{\mu_B}(\sigma)+\int\varphi\,d\mu_B=\log\rho.
\end{align*}
From the pressure computation for locally constant potentials, $P(\varphi)=\log\rho(B)$, and by the *[Variational Principle For Pressure](/theorems/6816)* this equality means that $\mu_B$ attains the supremum of $h_\mu(\sigma)+\int\varphi\,d\mu$.[/proof]
Thus the zero-one transition matrix is replaced by the weighted matrix $B$, and the equilibrium state is the Markov measure whose transitions are tilted by the weights $e^{\varphi_i}$.
[/example]
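The identity $h_{\mu_B}(\sigma)+\int\varphi\,d\mu_B=\log\rho(B)$ can be checked numerically. In the sketch below, which uses hypothetical helper names, we take the full two-shift with $\varphi_0=0$ and $\varphi_1=\log 2$. Then $B=\begin{pmatrix}1&1\\2&2\end{pmatrix}$ has $\rho(B)=3$, and the equilibrium state is the Bernoulli measure with weights $(1/3,2/3)\propto e^{\varphi_i}$. For comparison, the fair Bernoulli measure has strictly smaller free energy, as the variational principle predicts.

```python
import numpy as np

def perron_data(M):
    """Perron root rho and positive eigenvectors l, r normalised by sum_i l_i r_i = 1."""
    vals, vecs = np.linalg.eig(M)
    k = np.argmax(vals.real)
    rho = vals[k].real
    r = np.abs(vecs[:, k].real)
    vals_t, vecs_t = np.linalg.eig(M.T)
    l = np.abs(vecs_t[:, np.argmax(vals_t.real)].real)
    return rho, l / (l @ r), r

def equilibrium_markov(A, phi):
    """Markov measure (pi, P) built from B_ij = A_ij exp(phi_i), as in the example."""
    A = np.asarray(A, dtype=float)
    B = A * np.exp(np.asarray(phi, dtype=float))[:, None]
    rho, l, r = perron_data(B)
    pi = l * r
    P = B * r[None, :] / (rho * r[:, None])
    return rho, pi, P

def free_energy(pi, P, phi):
    """h_mu(sigma) + int phi dmu for the Markov measure (pi, P), with 0 log 0 = 0."""
    mask = P > 0
    h = -np.sum((pi[:, None] * P)[mask] * np.log(P[mask]))
    return h + pi @ np.asarray(phi, dtype=float)

A = [[1, 1], [1, 1]]
phi = [0.0, np.log(2.0)]
rho, pi, P = equilibrium_markov(A, phi)
print(rho, free_energy(pi, P, phi), np.log(rho))
fair = free_energy(np.array([0.5, 0.5]), np.full((2, 2), 0.5), phi)
print(fair < np.log(rho))  # True: the fair measure is not the equilibrium state here
```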
For smooth hyperbolic systems, Markov partitions push these symbolic measures forward to invariant measures on the original phase space. Boundary identifications introduce finite-to-one coding issues, so the final question is whether the symbolic uniqueness theorem survives after passing through the coding map.
[quotetheorem:7773]
[citeproof:7773]
Bowen's theorem is the conceptual endpoint of the chapter, but its hypotheses mark the boundary of the argument. Mixing excludes a cyclic decomposition into several components; for a nonmixing basic set one first decomposes into cyclic pieces, and for a general Axiom A system there may be several basic components competing for maximal entropy. The basic-set assumption supplies uniform hyperbolicity and local product structure, while the finite-to-one coding still leaves boundary identifications that prevent a global conjugacy. Markov partitions therefore reduce the measure-selection problem to finite symbolic data precisely in this uniformly hyperbolic setting, and thermodynamic formalism turns that reduction into a variational problem governed by entropy, pressure, and Perron-Frobenius theory.
Markov partitions and thermodynamic formalism reduce uniformly hyperbolic dynamics to finite symbolic data and optimisation principles. The final chapter uses that reduction to synthesise the course, showing how geometry, coding, and statistics fit together into a coherent theory of chaos and ergodic behaviour.
# 12. Synthesis: Geometry, Coding, and Statistics
This final chapter synthesises the main tools of the course on chaos and ergodic theory. The prerequisites are the earlier chapters on hyperbolic dynamics, symbolic dynamics, invariant measures, entropy, Lyapunov exponents, and ergodic theorems. Its goal is to show how these ideas are combined in practice: geometry supplies mechanisms such as transverse homoclinic intersections, coding turns them into symbolic models, and statistics explains which features are seen by typical observations. The main warning is that chaos is not a single theorem: topological chaos concerns orbit structure in phase space, metric chaos concerns invariant measures and almost-everywhere behavior, and statistical chaos concerns observable averages, fluctuations, and numerical estimates.
## From Transverse Homoclinic Intersections to Entropy
The central geometric question is: when does a local crossing of invariant manifolds force global orbit complexity? In low-dimensional pictures, a transverse homoclinic point looks like a single fold-and-return event. The remarkable fact is that, under iteration, this local crossing reproduces itself on smaller scales and creates a symbolic subsystem.
[definition: Transverse Homoclinic Point]
Let $f: M \to M$ be a $C^r$ diffeomorphism of a smooth manifold $M$, and let $p$ be a hyperbolic periodic point of $f$. A point $q$ not lying on the orbit of $p$ is a transverse homoclinic point for $p$ if
a) $q \in W^s(p) \cap W^u(p)$, and
b) $T_qW^s(p) + T_qW^u(p) = T_qM$.
[/definition]
This definition isolates the place where geometry becomes combinatorics. The stable and unstable manifolds do not merely touch; they cross with enough independence that nearby strips are stretched, folded, and returned across each other. This raises the next question: which theorem turns such a crossing into a symbolic subsystem with many prescribed itineraries?
[quotetheorem:7752]
[proofunderconstruction:7752]
The theorem does not say that the whole phase space is symbolic. It produces a compact invariant subsystem whose dynamics already contains exponential orbit complexity. The hypotheses are not cosmetic: if the intersection is only tangential, the strips may fold without giving two persistent transverse crossings, and a neutral periodic point does not supply the stable and unstable directions needed for Markov rectangles. This raises the next measurement problem: how do we assign a number to the growth rate of distinguishable orbit segments produced by such a subsystem?
[definition: Topological Entropy of a Compact Map]
Let $(X,d)$ be a compact metric space and let $f: X \to X$ be continuous. For $n \in \mathbb N$ define
\begin{align*}
d_n(x,y) = \max_{0 \le k < n} d(f^k(x), f^k(y)).
\end{align*}
For $\varepsilon > 0$, let $s_n(\varepsilon)$ be the maximal cardinality of a subset $E \subset X$ such that $d_n(x,y) \ge \varepsilon$ whenever $x,y \in E$ and $x \ne y$. The topological entropy of $f$ is
\begin{align*}
h_{\mathrm{top}}(f) = \lim_{\varepsilon \downarrow 0} \limsup_{n \to \infty} \frac{1}{n}\log s_n(\varepsilon).
\end{align*}
[/definition]
The separated-set definition matches the geometric picture: two initial points are counted as distinguishable if their orbit segments separate before time $n$. A horseshoe supplies many distinguishable orbit segments by assigning binary symbols to repeated passages through the Markov rectangles. The next question is how this symbolic supply forces a positive lower bound for entropy.
[quotetheorem:7774]
[citeproof:7774]
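For subshifts of finite type, the separated-set count can be replaced by a word count. With a symbolic metric such as $d(x,y)=2^{-\min\{|k|:x_k\ne y_k\}}$ (our choice; the text does not fix one), maximal $(n,\varepsilon)$-separated sets correspond to admissible words, up to boundary effects that do not change the exponential growth rate. The sketch below counts admissible words exactly with Python integers and compares $\frac1n\log(\#\text{words})$ with the logarithm of the Perron root. The golden mean shift is used only as an illustration.

```python
import math

def count_words(A, n):
    """Number of admissible words of length n in Sigma_A, as an exact integer."""
    k = len(A)
    counts = [1] * k                       # words of length 1 ending in each symbol
    for _ in range(n - 1):
        # words of length m+1 ending in j = sum over i of (words ending in i) * A[i][j]
        counts = [sum(counts[i] * A[i][j] for i in range(k)) for j in range(k)]
    return sum(counts)

golden = [[1, 1], [1, 0]]                  # forbids the word 11
for n in (10, 100, 1000):
    print(n, math.log(count_words(golden, n)) / n)
print("log of Perron root:", math.log((1 + math.sqrt(5)) / 2))
```

The counts are Fibonacci numbers, and `math.log` accepts the resulting large integers directly, so no floating-point overflow occurs.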
Positive entropy is not only a count of orbit segments. In symbolic systems it is accompanied by a supply of periodic codes, and the horseshoe construction transports those codes back to periodic orbits in the original phase space. The compactness and factor hypotheses are essential here: without compactness the separated-set entropy above is not the definition being used, and if the factor map goes in the opposite direction then entropy of the original system need not dominate the full shift. This raises the next structural question: how many periodic orbits are forced once the coding contains a full shift?
[quotetheorem:7775]
[citeproof:7775]
The periodic-orbit statement explains why symbolic codes are more than labels: they enumerate real recurrent motions. The primitive-word qualification matters: the repeated word $0101$ has symbolic period $4$ but least period $2$, so counting all period-$n$ words would overcount least-period orbits. To make the chain concrete, we now ask how a single transverse crossing becomes binary codes in a planar picture.
[example: Homoclinic Crossing to Binary Codes]
Let $f$ be a planar diffeomorphism with a hyperbolic saddle $p$ and a transverse homoclinic point $q\in W^s(p)\cap W^u(p)$. By the *Smale-Birkhoff Homoclinic Theorem*, there is an integer $N\ge 1$ and a small rectangle $R$ bounded by stable and unstable arcs such that the return map $F=f^N$ carries two disjoint vertical subrectangles $R_0,R_1\subset R$ across $R$ as horizontal strips.
For each bi-infinite binary sequence $a=(a_k)_{k\in\mathbb Z}\in\{0,1\}^{\mathbb Z}$, consider the set of points whose $k$th return lies in the prescribed strip:
\begin{align*}
K(a)=\bigcap_{k\in\mathbb Z} F^{-k}(R_{a_k}).
\end{align*}
The Markov-rectangle construction gives expansion across the unstable direction and contraction along the stable direction. Hence, for each finite word $a_{-m},\dots,a_m$, the set
\begin{align*}
K_m(a)=\bigcap_{k=-m}^{m}F^{-k}(R_{a_k})
\end{align*}
is a nonempty rectangle-like strip, and these sets are nested:
\begin{align*}
K_{m+1}(a)\subseteq K_m(a).
\end{align*}
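The forward conditions $F^k(x)\in R_{a_k}$ with $k\ge 0$ confine $x$ to strips whose width in the unstable direction shrinks geometrically, because $F$ expands that direction. The backward conditions confine $x$ to strips whose width in the stable direction shrinks geometrically, because $F^{-1}$ expands the stable direction. Hence the diameters of $K_m(a)$ tend to $0$. Compactness makes the nested intersection nonempty, and the vanishing diameters make it a single point, so $K(a)=\{x(a)\}$.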
Define
\begin{align*}
\Lambda=\{x(a):a\in\{0,1\}^{\mathbb Z}\}.
\end{align*}
If $x(a)\in\Lambda$, then $F(x(a))$ visits $R_{a_{k+1}}$ at return time $k$, so its itinerary is the shifted sequence $\sigma(a)$:
\begin{align*}
F(x(a))=x(\sigma a).
\end{align*}
Thus the geometry of one transverse homoclinic crossing produces a Cantor set $\Lambda$, invariant under the return map $F=f^N$, whose points are coded by binary itineraries. The symbols $0$ and $1$ are not artificial labels: they record which of the two returned strips the orbit visits at each return.
[/example]
This example is the template behind the first implication chain of the chapter:
\begin{align*}
\text{transverse homoclinic point} \implies \text{horseshoe} \implies h_{\mathrm{top}}(f)>0 \implies \text{abundant periodic orbits}.
\end{align*}
The arrows should be read with their hypotheses attached. A different system may have positive entropy for reasons not visibly organised by a single horseshoe, and abundant periodic points alone do not force topological entropy to be positive.
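How abundant are the periodic orbits forced by a full-shift factor? For the full two-shift this can be counted exactly. Points fixed by $\sigma^n$ correspond to the $2^n$ words of length $n$. Möbius inversion removes repeated words such as $0101$, leaving the points of least period $n$, and dividing by $n$ counts orbits. The sketch below implements this standard counting argument. The helper names are our own.

```python
def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0                   # a squared prime factor
            result = -result
        p += 1
    return -result if n > 1 else result

def fixed_points(n, alphabet=2):
    """#Fix(sigma^n) on the full shift: every word of length n, repeated forever."""
    return alphabet ** n

def least_period_orbits(n, alphabet=2):
    """Number of periodic orbits of least period n (primitive words up to rotation)."""
    points = sum(mobius(d) * fixed_points(n // d, alphabet)
                 for d in range(1, n + 1) if n % d == 0)
    return points // n

for n in range(1, 9):
    print(n, fixed_points(n), least_period_orbits(n))
```

For $n=4$ there are $16$ fixed points of $\sigma^4$ but only $3$ orbits of least period $4$, represented by $0001$, $0011$, and $0111$.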
## From Hyperbolic Geometry to Invariant Measures
The next question is how the geometric picture seen by stable and unstable manifolds becomes a measure-theoretic statement about typical orbits. Topological entropy counts distinguishable orbit segments without choosing a probability distribution. Metric entropy asks how much information per iterate is produced when the initial point is sampled according to an invariant measure.
[definition: Invariant Probability Measure]
Let $(X,\mathcal B)$ be a measurable space and let $f: X \to X$ be measurable. A probability measure $\mu$ on $(X,\mathcal B)$ is $f$-invariant if
\begin{align*}
\mu(f^{-1}(A)) = \mu(A)
\end{align*}
for every $A \in \mathcal B$.
[/definition]
Invariant measures make long-time statistics stationary. Once such a measure is fixed, the system can be studied through partitions, time averages, correlations, and entropy relative to that measure. The next question is how to measure the information generated by the system when the initial point is distributed according to this invariant measure.
[definition: Metric Entropy]
Let $(X,\mathcal B,\mu)$ be a probability space and let $f: X \to X$ be a measurable map satisfying $\mu(f^{-1}(A))=\mu(A)$ for every $A\in\mathcal B$. For a finite measurable partition $\mathcal P$, set
\begin{align*}
H_\mu(\mathcal P) = -\sum_{P \in \mathcal P} \mu(P)\log \mu(P),
\end{align*}
with the convention $0\log 0=0$. The entropy of $f$ with respect to $\mathcal P$ is
\begin{align*}
h_\mu(f,\mathcal P)=\lim_{n\to\infty}\frac{1}{n}H_\mu\left(\bigvee_{k=0}^{n-1} f^{-k}\mathcal P\right),
\end{align*}
and the metric entropy of $f$ is
\begin{align*}
h_\mu(f)=\sup_{\mathcal P} h_\mu(f,\mathcal P),
\end{align*}
where the supremum is taken over all finite measurable partitions.
[/definition]
Metric entropy depends on the measure, while topological entropy does not. This creates a comparison question: is the topological number related to the largest metric entropy available among invariant measures?
[quotetheorem:7763]
This result belongs to the entropy theory developed earlier in the course, so here we use it as a synthesis tool. It says that topological orbit complexity can be detected by invariant measures through a supremum, although a particular invariant measure need not describe Lebesgue-typical initial conditions in a smooth phase space. The statement also does not promise that a chosen measure is maximizing, and in broader noncompact or discontinuous settings an entropy supremum may fail to be attained without additional tightness, compactness, or upper semicontinuity hypotheses. Compactness is part of the bridge: for the translation $x\mapsto x+1$ on $\mathbb R$, orbit segments escape every compact set and there is no invariant Borel probability measure analogous to the measures in $\mathcal M_f(X)$. Continuity is also part of the bridge, since a discontinuous interval map can create artificial separated names by jumping across infinitely many small partition pieces near an accumulation point, while the compact continuous argument using open covers, separated sets, and weak limits of empirical measures no longer applies. The Borel probability condition cannot be replaced by an arbitrary finite or escaping distribution: for instance, on $\mathbb Z$ with the shift $n\mapsto n+1$, every initial probability distribution drifts and no stationary probability measure records the long-time orbit complexity.
[example: Bernoulli Measure on the Full Shift]
On the full two-shift $\Sigma_2=\{0,1\}^{\mathbb Z}$, let $\mu_p$ be the product measure for which each coordinate equals $1$ with probability $p$ and equals $0$ with probability $1-p$. If $C$ is a cylinder condition on coordinates $i_1,\dots,i_r$, then shifting $C$ only changes which coordinate indices are named, not how many zeros and ones are required. Since the coordinates are independent with the same one-symbol distribution at every index, $\mu_p(\sigma^{-1}C)=\mu_p(C)$ for every cylinder $C$, and cylinders generate the product Borel $\sigma$-algebra, so $\mu_p$ is shift-invariant.
Let $\mathcal P=\{P_0,P_1\}$, where $P_j=\{x\in\Sigma_2:x_0=j\}$. The join $\bigvee_{k=0}^{n-1}\sigma^{-k}\mathcal P$ records the word $x_0x_1\dots x_{n-1}$. If a word $w$ has $r$ symbols equal to $1$ and $n-r$ symbols equal to $0$, then independence gives
\begin{align*}
\mu_p([w])=p^r(1-p)^{n-r}.
\end{align*}
Therefore its contribution to partition entropy is
\begin{align*}
-\mu_p([w])\log\mu_p([w])=-p^r(1-p)^{n-r}\bigl(r\log p+(n-r)\log(1-p)\bigr).
\end{align*}
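To sum over all words of length $n$, group them by the number $r$ of ones. These groups have total masses $\binom nr p^r(1-p)^{n-r}$, a binomial distribution with mean $np$. The sum of the displayed contributions is therefore $-\bigl(np\log p+n(1-p)\log(1-p)\bigr)$. In other words, each coordinate contributes the same one-symbol entropy: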
\begin{align*}
H_{\mu_p}\left(\bigvee_{k=0}^{n-1}\sigma^{-k}\mathcal P\right)=n\bigl(-p\log p-(1-p)\log(1-p)\bigr).
\end{align*}
Since the coordinate partition is generating for the full shift, the metric entropy is
\begin{align*}
h_{\mu_p}(\sigma)= -p\log p-(1-p)\log(1-p).
\end{align*}
For $0<p<1$, set $\phi(p)=-p\log p-(1-p)\log(1-p)$. Then
\begin{align*}
\phi'(p)=\log(1-p)-\log p.
\end{align*}
Thus $\phi'(p)=0$ exactly when $1-p=p$, namely $p=1/2$. Also
\begin{align*}
\phi''(p)=-\frac{1}{1-p}-\frac{1}{p}<0.
\end{align*}
So this critical point is the unique maximum, and its value is
\begin{align*}
\phi(1/2)=-\frac12\log\frac12-\frac12\log\frac12=\log 2.
\end{align*}
Thus the fair Bernoulli measure realizes entropy $\log 2$, matching the topological entropy of the full two-shift, while biased Bernoulli measures have smaller metric entropy.
[/example]
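The computation can be confirmed by brute force. The sketch below sums $-\mu_p([w])\log\mu_p([w])$ over all $2^n$ words, compares the result with $n\phi(p)$, and checks on a grid that $p=1/2$ maximises $\phi$. The function names are hypothetical.

```python
import itertools
import math

def block_entropy(p, n):
    """H_{mu_p} of the join of sigma^{-k}P for k < n, summed over all 2^n words."""
    H = 0.0
    for word in itertools.product((0, 1), repeat=n):
        r = sum(word)                              # number of ones in the word
        mass = p ** r * (1 - p) ** (n - r)
        if mass > 0:
            H -= mass * math.log(mass)
    return H

def bernoulli_entropy(p):
    """Closed form phi(p) = -p log p - (1-p) log(1-p) for 0 < p < 1."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

for p in (0.1, 0.3, 0.5):
    print(p, block_entropy(p, 10) / 10, bernoulli_entropy(p))

grid = [i / 100 for i in range(1, 100)]
print(max(grid, key=bernoulli_entropy))            # 0.5
```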
The example shows that symbolic dynamics can carry many invariant measures, each with its own statistics. For smooth systems, the next question is how metric information is connected to infinitesimal stretching in tangent directions.
[definition: Lyapunov Exponent Along a Vector]
Let $f: M \to M$ be a $C^1$ diffeomorphism of a Riemannian manifold and let $x \in M$. For $v \in T_xM\setminus\{0\}$, when the limit exists, the Lyapunov exponent of $v$ at $x$ is
\begin{align*}
\lambda(x,v)=\lim_{n\to\infty}\frac{1}{n}\log |Df_x^n(v)|.
\end{align*}
[/definition]
Nonzero Lyapunov exponents are infinitesimal data rather than invariant manifolds by themselves. The question is whether exponential tangent splitting can be integrated into actual local stable and unstable sets for almost every point. Pesin theory gives this conversion under the regularity and measurability hypotheses needed in the nonuniform setting.
[quotetheorem:7776]
The principle uses the Oseledets theorem, graph transforms with nonuniform constants, and measurable control of the sizes of Pesin charts. It is the smooth-measure analogue of uniform hyperbolicity: exponential tangent splitting is converted into actual local orbit geometry. The $C^{1+\alpha}$ regularity is essential in the usual theorem; for merely $C^1$ systems the distortion estimates needed to build nonuniform stable manifolds can fail.
[remark: Closing Lemma Perspective]
Closing lemmas express the idea that recurrent orbit segments with hyperbolic control can often be shadowed by periodic orbits. In this course we use them as perspective rather than as a main technical tool: they explain why, in hyperbolic systems, invariant measures supported on periodic orbits are often dense among all invariant measures, so that periodic data approximates the statistical information. In nonuniform settings, the precise hypotheses matter, since recurrence alone is not enough to guarantee a nearby periodic orbit with comparable exponents.
[/remark]
## Comparing Topological Chaos, Metric Chaos, and Statistical Laws
The comparison problem is this: two systems may have the same topological entropy but different typical statistics, or the same invariant measure entropy but different topological orbit structure. We therefore separate three levels of description and then record the bridges between them.
[definition: Topological Chaos Data]
For a continuous map $f: X \to X$ on a compact metric space, topological chaos data consists of properties invariant under topological conjugacy, including topological transitivity, topological mixing, density of periodic points, existence of horseshoes, symbolic factors, and $h_{\mathrm{top}}(f)$.
[/definition]
This data uses all points in the phase space, including exceptional invariant sets. It is the right language for questions about possible orbit patterns, but not by itself for questions about what a typical sampled initial condition sees. The next question is what information remains meaningful after choosing a probability measure and ignoring sets of measure zero.
[definition: Metric Chaos Data]
Let $(X,\mathcal B,\mu)$ be a probability space and let $f: X \to X$ be a measurable map satisfying $\mu(f^{-1}(A))=\mu(A)$ for every $A\in\mathcal B$. The metric chaos data of the system $(X,\mathcal B,\mu,f)$ consists of properties invariant under measure-theoretic isomorphism, including ergodicity, mixing, Bernoulli structure, metric entropy, and almost-sure time-average behavior. For smooth systems this data is supplemented by the Lyapunov exponents of $\mu$, when defined; these depend on the differentiable structure and are preserved by $C^1$ conjugacies rather than by arbitrary measure-theoretic isomorphisms.
[/definition]
Metric data depends on the measure. The same map can carry many invariant measures, so changing the measure may change entropy, ergodicity, and the distribution of observed orbit segments. The next question is what a concrete measurement along one long orbit is expected to reveal.
[definition: Observable Statistical Law]
Let $(X,\mathcal B,\mu)$ be a probability space, let $f: X \to X$ be a measurable map satisfying $\mu(f^{-1}(A))=\mu(A)$ for every $A\in\mathcal B$, and let $\varphi \in L^1(X,\mu)$. An observable statistical law is a limiting statement about the sequence
\begin{align*}
\varphi(x),\ \varphi(f(x)),\ \varphi(f^2(x)),\dots
\end{align*}
for $\mu$-a.e. $x$, such as convergence of time averages, decay of correlations, a [central limit theorem](/theorems/521), or a large deviations estimate.
[/definition]
This third level is what numerical experiments and physical measurements most directly access. It is also the most sensitive to the choice of observable and the physically relevant invariant measure. The first statistical question is whether a single long orbit recovers the integral of an observable.
[quotetheorem:518]
Birkhoff's theorem is a weaker conclusion than mixing or a [central limit theorem](/theorems/1848), but it is the base statistical law. It says that for an ergodic invariant measure, individual long orbits reveal space averages. Each hypothesis has a concrete role. If $X=\{0,1\}$, $\mu(\{0\})=\mu(\{1\})=1/2$, $f(0)=f(1)=0$, and $\varphi=\mathbf{1}_{\{1\}}$, then $f$ does not preserve $\mu$, since $\mu(f^{-1}\{1\})=\mu(\varnothing)=0\ne\mu(\{1\})$. Every orbit sits at $0$ from the first step on, so every time average of $\varphi$ tends to $0$, while $\int_X\varphi\,d\mu=1/2$. Integrability is needed because the averages may not define finite random variables with controlled positive and negative parts; on the identity map of $(0,1)$ with Lebesgue measure, $\varphi(x)=1/x$ gives infinite space average and no finite version of the displayed conclusion. Ergodicity is needed for the limiting function to be constant: for the identity map of $(0,1)$ and $\varphi(x)=x$, the time average is $x$, not $\int_0^1 x\,d\mathcal L^1(x)=1/2$. The next example separates this measure-dependent statement from topological complexity.
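The role of ergodicity can be seen numerically. The sketch below compares Birkhoff averages for two Lebesgue-preserving maps of the circle or interval. The first is the rotation $x\mapsto x+\alpha \pmod 1$ with irrational $\alpha$ (our choice of example; Lebesgue measure is ergodic for it). The second is the non-ergodic identity map from the discussion above. Floating-point rotation introduces only tiny drift over $10^5$ steps, so this is an illustration, not a proof.

```python
import math

def birkhoff_average(f, phi, x, n):
    """Time average (1/n) * sum_{k<n} phi(f^k(x))."""
    total = 0.0
    for _ in range(n):
        total += phi(x)
        x = f(x)
    return total / n

ALPHA = (math.sqrt(5) - 1) / 2          # irrational rotation number (illustrative choice)

def rotation(x):
    """Rotation x -> x + ALPHA mod 1; Lebesgue measure is ergodic for it."""
    return (x + ALPHA) % 1.0

def identity(x):
    """Preserves Lebesgue measure on (0,1) but is not ergodic."""
    return x

def indicator_left_half(x):
    return 1.0 if x < 0.5 else 0.0

print(birkhoff_average(rotation, indicator_left_half, 0.1, 100_000))  # close to 1/2
print(birkhoff_average(identity, lambda x: x, 0.1, 100_000))          # stays at 0.1
```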
[example: Same Map, Different Measures]
Let $\Sigma_2=\{0,1\}^{\mathbb Z}$ and let $\sigma:\Sigma_2\to\Sigma_2$ be the shift. Write $z=\ldots000\ldots$, and let $\delta_z$ be the point mass at $z$. Since $\sigma(z)=z$, for every Borel set $A\subseteq\Sigma_2$ we have
\begin{align*}
\delta_z(\sigma^{-1}A)=1 \text{ exactly when } z\in\sigma^{-1}A.
\end{align*}
The condition $z\in\sigma^{-1}A$ means $\sigma(z)\in A$, and since $\sigma(z)=z$, this is equivalent to $z\in A$. Hence
\begin{align*}
\delta_z(\sigma^{-1}A)=\delta_z(A),
\end{align*}
so $\delta_z$ is invariant.
For any finite measurable partition $\mathcal P$, exactly one atom of $\mathcal P$, the one containing $z$, has $\delta_z$-measure $1$, and every other atom is null. The same is true for every joined partition $\bigvee_{k=0}^{n-1}\sigma^{-k}\mathcal P$, because the measure is still concentrated at the single point $z$. Therefore
\begin{align*}
H_{\delta_z}\left(\bigvee_{k=0}^{n-1}\sigma^{-k}\mathcal P\right)=-(1)\log(1)=0.
\end{align*}
Thus $h_{\delta_z}(\sigma,\mathcal P)=0$ for every finite partition $\mathcal P$, and so
\begin{align*}
h_{\delta_z}(\sigma)=0.
\end{align*}
Now let $\mu_{1/2}$ be the fair Bernoulli product measure. For the coordinate partition $\mathcal P=\{P_0,P_1\}$, where $P_j=\{x:x_0=j\}$, the join $\bigvee_{k=0}^{n-1}\sigma^{-k}\mathcal P$ consists of the $2^n$ cylinders determined by words $w=w_0\cdots w_{n-1}$. Each coordinate has probability $1/2$, so every length-$n$ cylinder has measure
\begin{align*}
\mu_{1/2}([w])=\left(\frac12\right)^n=2^{-n}.
\end{align*}
Hence
\begin{align*}
H_{\mu_{1/2}}\left(\bigvee_{k=0}^{n-1}\sigma^{-k}\mathcal P\right)=-\sum_{w\in\{0,1\}^n}2^{-n}\log(2^{-n}).
\end{align*}
There are $2^n$ summands, so
\begin{align*}
H_{\mu_{1/2}}\left(\bigvee_{k=0}^{n-1}\sigma^{-k}\mathcal P\right)=-2^n2^{-n}(-n\log2)=n\log2.
\end{align*}
Since the coordinate partition is generating for the full shift,
\begin{align*}
h_{\mu_{1/2}}(\sigma)=\lim_{n\to\infty}\frac{1}{n}n\log2=\log2.
\end{align*}
The same product structure gives mixing. If $A$ and $B$ are cylinder sets, then for all sufficiently large $n$, the coordinates defining $A$ and the coordinates defining $\sigma^{-n}B$ are disjoint. Independence gives
\begin{align*}
\mu_{1/2}(A\cap\sigma^{-n}B)=\mu_{1/2}(A)\mu_{1/2}(B).
\end{align*}
Cylinder sets generate the Borel $\sigma$-algebra, so the equality on cylinders extends to mixing for Borel sets by approximation.
The underlying topological system has not changed. The full shift has $2^n$ admissible words of length $n$, and these words give $2^n$ distinguishable orbit segments, so
\begin{align*}
h_{\mathrm{top}}(\sigma)=\lim_{n\to\infty}\frac{1}{n}\log(2^n)=\log2.
\end{align*}
Thus the same map has topological entropy $\log2$, but the invariant measure $\delta_z$ sees metric entropy $0$ while the fair Bernoulli measure sees metric entropy $\log2$ and mixing. This separates topological complexity from the statistics of a chosen measure.
[/example]
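What a single long orbit reveals can be estimated from data. The sketch below applies a plug-in estimator (our choice, not a method from the text) to the empirical frequencies of length-$k$ words along a sequence. For the orbit of $z=\ldots000\ldots$ the estimate is $0$. For a fair-coin sequence generated by Python's `random` module, which stands in for a $\mu_{1/2}$-typical point, the estimate is close to $\log 2$. The seed and sample sizes are arbitrary.

```python
import math
import random
from collections import Counter

def empirical_block_entropy(seq, k):
    """Plug-in estimate of H(length-k words)/k from overlapping windows of seq."""
    counts = Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    H = -sum(c / total * math.log(c / total) for c in counts.values())
    return H / k

rng = random.Random(1)
fair = [rng.randint(0, 1) for _ in range(200_000)]   # stands in for a mu_{1/2}-typical orbit
fixed = [0] * 200_000                                # the orbit of z = ...000...

print(empirical_block_entropy(fair, 8), math.log(2))
print(empirical_block_entropy(fixed, 8))
```

Plug-in estimates are biased downward when $k$ is large compared with the sample length, so $k$ is kept small here.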
The practical comparison can be summarised as three questions, each answered by a different kind of tool. Horseshoes and symbolic factors answer whether complicated orbit patterns exist. Invariant measures and metric entropy answer how much information is generated for a chosen distribution of initial conditions. Ergodic theorems, correlation estimates, and limit theorems answer what long observations of functions along an orbit will produce.
## Complete Analysis Pipeline: A Hyperbolic Toral Automorphism
A useful synthesis should be executable on an example. The standard test case is a hyperbolic toral automorphism, because the geometry, coding, entropy, exponents, and invariant measure can all be computed in the same system.
[definition: Hyperbolic Toral Automorphism]
Let $A \in SL(2,\mathbb Z)$ have no eigenvalue on the unit circle. The associated hyperbolic toral automorphism is the map
\begin{align*}
f_A: \mathbb T^2 \to \mathbb T^2, \qquad f_A(x)=Ax \pmod{\mathbb Z^2},
\end{align*}
where $\mathbb T^2=\mathbb R^2/\mathbb Z^2$.
[/definition]
The quotient turns a linear map on $\mathbb R^2$ into a smooth map on a compact phase space. Hyperbolicity of $A$ gives invariant expanding and contracting eigendirections, which project to stable and unstable foliations on the torus. The next question is how this abstract definition produces computable exponents and entropy in a concrete matrix.
[example: Arnold Cat Map Pipeline]
Take $A=\begin{pmatrix}2&1\\1&1\end{pmatrix}\in SL(2,\mathbb Z)$, acting on $\mathbb T^2$ by
\begin{align*}
f_A(x,y)=(2x+y,x+y)\pmod{\mathbb Z^2}.
\end{align*}
The determinant is $2\cdot 1-1\cdot 1=1$, so $A$ preserves area on $\mathbb R^2$ and the induced map preserves Lebesgue measure on $\mathbb T^2$. Its characteristic polynomial is
\begin{align*}
\det(A-tI)=(2-t)(1-t)-1.
\end{align*}
Expanding the product gives
\begin{align*}
(2-t)(1-t)-1=2-3t+t^2-1=t^2-3t+1.
\end{align*}
Thus the eigenvalues solve $t^2-3t+1=0$, so the quadratic formula gives
\begin{align*}
t=\frac{3\pm\sqrt{9-4}}{2}=\frac{3\pm\sqrt5}{2}.
\end{align*}
Hence
\begin{align*}
\lambda_u=\frac{3+\sqrt5}{2}, \qquad \lambda_s=\frac{3-\sqrt5}{2}.
\end{align*}
Since $\lambda_u\lambda_s=1$ and $\lambda_u>1$, we have $0<\lambda_s<1$ and $\lambda_s=\lambda_u^{-1}$.
Because $Df_A=A$ at every point, if $v_u$ is an unstable eigenvector then $A^n v_u=\lambda_u^n v_u$. Therefore
\begin{align*}
\lim_{n\to\infty}\frac1n\log\frac{|A^n v_u|}{|v_u|}=\lim_{n\to\infty}\frac1n\log(\lambda_u^n)=\log\lambda_u.
\end{align*}
Similarly, for a stable eigenvector $v_s$, $A^n v_s=\lambda_s^n v_s$, so
\begin{align*}
\lim_{n\to\infty}\frac1n\log\frac{|A^n v_s|}{|v_s|}=\log\lambda_s=\log(\lambda_u^{-1})=-\log\lambda_u.
\end{align*}
Thus the Lyapunov exponents with respect to Lebesgue measure are $\log\lambda_u$ and $-\log\lambda_u$.
By *Entropy of a Hyperbolic Toral Automorphism*, the topological entropy is the sum of $\log|\lambda|$ over eigenvalues outside the unit circle. Here the only such eigenvalue is $\lambda_u$, so
\begin{align*}
h_{\mathrm{top}}(f_A)=\log\lambda_u.
\end{align*}
The map $f_A$ is smooth and preserves Lebesgue measure, which on the flat torus is the Riemannian area, so *Pesin's entropy formula* gives metric entropy equal to the sum of the positive Lyapunov exponents:
\begin{align*}
h_{\mathrm{Leb}}(f_A)=\log\lambda_u.
\end{align*}
For this cat map, the unstable expansion rate, the Lebesgue metric entropy, and the topological entropy all equal $\log\lambda_u$.
[/example]
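Each number in the pipeline can be reproduced in a few lines. The sketch below computes the eigenvalues of the cat map matrix and evaluates the entropy formula. It also estimates the top Lyapunov exponent by pushing a random unit vector through $Df_A=A$. The vector is renormalised after every step, because $\lambda_u^{1000}$ would overflow double precision. The function name `top_lyapunov` and the seed are illustrative choices.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 1.0]])   # cat map matrix; Df_A = A at every point
eigvals = np.linalg.eigvals(A)
lam_u = max(abs(eigvals))
# Entropy formula: sum of log|lambda| over eigenvalues outside the unit circle.
h_top = sum(np.log(abs(ev)) for ev in eigvals if abs(ev) > 1)

def top_lyapunov(A, n=1000, seed=0):
    """Estimate (1/n) log|A^n v| for a random unit vector v, renormalising each step."""
    v = np.random.default_rng(seed).normal(size=2)
    v /= np.linalg.norm(v)
    log_growth = 0.0
    for _ in range(n):
        v = A @ v
        norm = np.linalg.norm(v)
        log_growth += np.log(norm)
        v /= norm
    return log_growth / n

print(np.log(lam_u), h_top, top_lyapunov(A))
```

All three values are approximately $0.9624$. Since $\det A=1$, the two exponents sum to $0$, which matches $\lambda_s=\lambda_u^{-1}$.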
The equality in the example is a special case of a general [entropy formula for hyperbolic toral automorphisms](/theorems/7777). The next question is why the unstable eigenvalues, and only those eigenvalues, determine the exponential growth rate of distinguishable orbit segments.
[quotetheorem:7777]
[citeproof:7777]
The entropy computation explains the growth rate, and its hypotheses rule out several nearby but different phenomena. If hyperbolicity fails, as for a toral rotation or a matrix with an eigenvalue on the unit circle, orbit segments can drift along neutral directions without exponential unstable-volume growth, so the sum over unstable eigenvalues no longer describes a hyperbolic splitting. The compact torus structure matters because quotienting by $\mathbb Z^d$ turns linear expansion into recurrent dynamics; the same expanding linear map on $\mathbb R^d$ sends typical points away to infinity rather than producing a compact entropy computation by Bowen balls. The integer unimodular condition is what makes $A$ descend to an invertible torus map: a non-integer matrix does not preserve the lattice $\mathbb Z^d$, and an integer matrix with determinant different from $\pm1$ gives an endomorphism with additional covering behavior rather than the automorphism treated here. Numerical and symbolic work also needs orbit tracing, so the next question is why approximate orbits of a linear hyperbolic toral automorphism remain close to genuine orbits.
[quotetheorem:7778]
[citeproof:7778]
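The correction series behind shadowing can be sketched concretely for the cat map, working in the lift to $\mathbb R^2$. Suppose $x_{n+1}=Ax_n+e_n$ with $|e_n|\le\varepsilon$. If corrections satisfy $z_{n+1}=Az_n-e_n$, then $y_n=x_n+z_n$ is a true orbit. Stable components of the errors are accumulated forward from the past, where $A$ contracts them. Unstable components are accumulated backward from the future, where $A^{-1}$ contracts them. On a finite window, truncating both series makes the recursion hold exactly. For the cat map it also gives $|z_n|\le\varepsilon\bigl(\tfrac{1}{1-\lambda_s}+\tfrac{1}{\lambda_u-1}\bigr)=\sqrt5\,\varepsilon$. This is a sketch under the stated assumptions: the matrix is $2\times2$ and symmetric, so `np.linalg.eigh` supplies an orthonormal eigenbasis, and the window length and error size are arbitrary choices.

```python
import numpy as np

def shadowing_correction(errors, A):
    """Corrections z_0..z_N with z_{n+1} = A z_n - e_n for 0 <= n < N.

    Assumes A is a symmetric hyperbolic 2x2 matrix (true for the cat map), so
    np.linalg.eigh returns orthonormal eigenvectors with lam_s < 1 < lam_u.
    """
    lam, V = np.linalg.eigh(A)            # eigenvalues in ascending order
    lam_s, lam_u = lam
    coeffs = errors @ V                   # row n: (stable, unstable) components of e_n
    N = len(errors)
    s = np.zeros(N + 1)                   # stable coefficient, s_0 = 0
    u = np.zeros(N + 1)                   # unstable coefficient, u_N = 0
    for n in range(N):                    # stable part: accumulate past errors forward
        s[n + 1] = lam_s * s[n] - coeffs[n, 0]
    for n in range(N - 1, -1, -1):        # unstable part: accumulate future errors backward
        u[n] = (u[n + 1] + coeffs[n, 1]) / lam_u
    return np.outer(s, V[:, 0]) + np.outer(u, V[:, 1])

A = np.array([[2.0, 1.0], [1.0, 1.0]])
eps = 1e-3
rng = np.random.default_rng(0)
errors = rng.uniform(-eps, eps, size=(200, 2)) / np.sqrt(2)   # each |e_n| <= eps
z = shadowing_correction(errors, A)
residual = z[1:] - z[:-1] @ A.T + errors                      # vanishes up to round-off
print(np.abs(residual).max(), np.linalg.norm(z, axis=1).max(), np.sqrt(5) * eps)
```

The corrections depend only on the errors $e_n$, not on the pseudo-orbit itself, which reflects the linearity of $f_A$ in the lift.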
Shadowing justifies reading approximate coded paths as traces of true dynamics in this uniformly hyperbolic model. Hyperbolicity is the mechanism behind the correction series in the proof: without exponential contraction and expansion, errors can accumulate in a neutral direction, as they do for an irrational rotation with small repeated roundoff errors. The bi-infinite pseudo-orbit hypothesis is also structural, because the construction corrects stable errors using past data and unstable errors using future data: stable error components are pushed forward, where they contract, and unstable ones are pulled back, where they contract. A one-sided pseudo-orbit needs a different statement and may only be shadowed with an initial condition chosen under extra assumptions. The theorem is not a numerical stability theorem for arbitrary discretisations, nor does it give uniqueness of the shadowing orbit unless an expansivity scale is imposed. The next question is how to build the codes themselves from the stable and unstable geometry of the torus.
[example: Symbolic Coding of the Cat Map]
Let $\mathcal R=\{R_1,\dots,R_m\}$ be a Markov partition for the cat map $f_A:\mathbb T^2\to\mathbb T^2$, with stable sides contained in stable leaves and unstable sides contained in unstable leaves. Define the transition matrix $T$ by
\begin{align*}
T_{ij}=1 \text{ exactly when } \operatorname{int}(R_i)\cap f_A^{-1}(\operatorname{int}(R_j))\ne\varnothing.
\end{align*}
For a point $x$ not lying on any forward or backward image of a rectangle boundary, there is a unique itinerary $a(x)=(a_k)_{k\in\mathbb Z}$ determined by
\begin{align*}
f_A^k(x)\in \operatorname{int}(R_{a_k}).
\end{align*}
Since $f_A^{k+1}(x)\in \operatorname{int}(R_{a_{k+1}})$ and $f_A^k(x)\in \operatorname{int}(R_{a_k})$, we have
\begin{align*}
f_A^k(x)\in \operatorname{int}(R_{a_k})\cap f_A^{-1}(\operatorname{int}(R_{a_{k+1}})).
\end{align*}
Therefore $T_{a_k a_{k+1}}=1$ for every $k$, so every orbit produces an admissible sequence in the subshift of finite type
\begin{align*}
\Sigma_T=\{a\in\{1,\dots,m\}^{\mathbb Z}:T_{a_k a_{k+1}}=1\text{ for every }k\in\mathbb Z\}.
\end{align*}
Conversely, if $a\in\Sigma_T$, consider the finite itinerary set
\begin{align*}
K_n(a)=\bigcap_{k=-n}^{n} f_A^{-k}(R_{a_k}).
\end{align*}
The Markov property says that admissibility of the adjacent transitions makes each $K_n(a)$ nonempty. The sets are nested because adding the two conditions at times $-n-1$ and $n+1$ can only shrink the intersection:
\begin{align*}
K_{n+1}(a)\subseteq K_n(a).
\end{align*}
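The future conditions $f_A^k(x)\in R_{a_k}$ with $k>0$ pull rectangles back by $f_A^{-k}$, which contracts unstable widths, while the past conditions with $k<0$ push rectangles forward by $f_A^{|k|}$, which contracts stable widths. Hence the diameter of $K_n(a)$ tends to $0$, and compactness of $\mathbb T^2$ gives exactly one point in $\bigcap_{n\ge 1}K_n(a)$. Distinct admissible sequences can determine the same point only when its orbit meets the rectangle boundaries, where several rectangle names are available; this identification is finite-to-one.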
Let $B=\bigcup_{k\in\mathbb Z}f_A^{-k}(\partial\mathcal R)$, where $\partial\mathcal R=\bigcup_i\partial R_i$. This boundary is a finite union of stable and unstable segments, hence has Lebesgue area $0$. Since $f_A$ preserves Lebesgue area, every $f_A^{-k}(\partial\mathcal R)$ also has area $0$, and the countable union $B$ has area $0$. Thus on the full-measure set $\mathbb T^2\setminus B$, the coding map is one-to-one and satisfies
\begin{align*}
a(f_A(x))=\sigma(a(x)).
\end{align*}
The symbolic model is therefore a subshift of finite type, with only boundary orbits causing finite-to-one identifications.
Lebesgue measure becomes a Markov measure under this coding. If
\begin{align*}
\pi_i=\operatorname{Leb}(R_i)
\end{align*}
and
\begin{align*}
p_{ij}=\frac{\operatorname{Leb}(R_i\cap f_A^{-1}(R_j))}{\operatorname{Leb}(R_i)}
\end{align*}
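(every rectangle has nonempty interior, so $\operatorname{Leb}(R_i)>0$), then, since distinct rectangles overlap only in their null boundaries and $f_A$ preserves Lebesgue measure,
\begin{align*}
\sum_{i=1}^m \pi_i p_{ij}=\sum_{i=1}^m\operatorname{Leb}(R_i\cap f_A^{-1}(R_j))=\operatorname{Leb}(f_A^{-1}(R_j))=\operatorname{Leb}(R_j)=\pi_j,
\end{align*}
which is the stationarity relation for the weights.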
For an admissible cylinder $[i_0i_1\dots i_n]$, its coded Lebesgue measure, namely the area of $\bigcap_{k=0}^{n}f_A^{-k}(R_{i_k})$, is
\begin{align*}
\pi_{i_0}p_{i_0i_1}p_{i_1i_2}\cdots p_{i_{n-1}i_n}.
\end{align*}
So the cat map can be studied by finite symbolic transitions: the rectangles encode which region an orbit visits, the transition matrix records which visits are possible, and Lebesgue-typical toral orbits correspond to typical paths for the associated Markov measure.
[/example]
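The cylinder formula only needs the weights $\pi_i$ and the stochastic matrix $(p_{ij})$. Computing the actual Lebesgue weights requires the rectangle geometry, so this minimal sketch instead builds them from $T$ alone by the Parry construction, $\pi_i=u_iv_i$ and $p_{ij}=T_{ij}v_j/(\lambda v_i)$, from the left and right Perron vectors $u,v$ of $T$. For the cat map these agree with the Lebesgue weights: Lebesgue measure has entropy $\log\lambda_u=h_{\mathrm{top}}(f_A)$, and an irreducible aperiodic subshift of finite type has a unique measure of maximal entropy. Symbols are indexed from $0$, and the golden-mean matrix used in the usage run is illustrative only, not the transition matrix of an actual cat-map partition.

```python
import itertools
import math
import numpy as np

def parry_measure(T):
    """Perron root lam, stationary weights pi and stochastic matrix P built from a 0-1 matrix T."""
    T = np.asarray(T, dtype=float)
    evals, right = np.linalg.eig(T)
    k = int(np.argmax(evals.real))
    lam = float(evals[k].real)
    v = np.abs(right[:, k].real)                               # right Perron vector, T v = lam v
    evals_t, left = np.linalg.eig(T.T)
    u = np.abs(left[:, int(np.argmax(evals_t.real))].real)     # left Perron vector
    u = u / (u @ v)                                            # normalise so that sum_i u_i v_i = 1
    pi = u * v
    P = T * v[None, :] / (lam * v[:, None])                    # p_ij = T_ij v_j / (lam v_i)
    return lam, pi, P

def is_admissible(word, T):
    """Every adjacent pair of symbols must be an allowed transition."""
    return all(T[a][b] == 1 for a, b in zip(word, word[1:]))

def cylinder_measure(word, pi, P):
    """pi_{i0} p_{i0 i1} ... p_{i(n-1) in}; automatically zero on inadmissible words."""
    m = pi[word[0]]
    for a, b in zip(word, word[1:]):
        m *= P[a, b]
    return float(m)

def markov_entropy(pi, P):
    """Entropy -sum_i pi_i sum_j p_ij log p_ij of the stationary Markov measure."""
    n = len(pi)
    return -sum(pi[i] * P[i, j] * math.log(P[i, j])
                for i in range(n) for j in range(n) if P[i, j] > 0)
```

The usage run checks stationarity, that cylinder measures of all length-4 words sum to $1$, that a forbidden word gets measure $0$, and that the entropy of the Parry measure equals $\log\lambda$, the topological entropy of $\Sigma_T$.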
The pipeline for this example is therefore complete: identify hyperbolic splitting, build stable and unstable rectangles, code by a subshift of finite type, compute entropy from expansion or the transition matrix, compute Lyapunov exponents from eigenvalues, and use ergodic theorems to interpret time averages.
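For the cat map the quantitative steps of the pipeline reduce to arithmetic. This minimal sketch assumes $A=\begin{pmatrix}2&1\\1&1\end{pmatrix}$. It reads the Lyapunov exponents off as $\log|\text{eigenvalue}|$ (the derivative is $A$ at every point), takes the entropy as the sum of the positive ones, and cross-checks that value against the growth rate of periodic points, using the standard count $\#\operatorname{Fix}(f_A^n)=|\det(A^n-I)|$ for hyperbolic toral automorphisms. The function names are illustrative.

```python
import math
import numpy as np

A = [[2, 1], [1, 1]]   # the cat map

def lyapunov_exponents(A):
    """For a linear toral automorphism the exponents are log|eigenvalue| at every point."""
    return sorted(math.log(abs(ev)) for ev in np.linalg.eigvals(np.array(A, dtype=float)))

def entropy_from_expansion(A):
    """Sum of the positive exponents, i.e. of log|lambda| over expanding eigenvalues."""
    return sum(e for e in lyapunov_exponents(A) if e > 0)

def fixed_point_count(A, n):
    """#Fix(f_A^n) = |det(A^n - I)| for a 2x2 integer A, computed exactly with Python ints."""
    M = [[1, 0], [0, 1]]
    for _ in range(n):
        M = [[sum(M[i][k] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return abs((M[0][0] - 1) * (M[1][1] - 1) - M[0][1] * M[1][0])
```

The first few counts are $1,5,16,45$, and $\frac1n\log\#\operatorname{Fix}(f_A^n)$ already matches $\log\frac{3+\sqrt5}{2}\approx0.9624$ to many digits at $n=30$.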
## Return Maps and Numerical Diagnostics
The final synthesis problem is methodological: in applications, the map is often not given as a symbolic model or a linear toral automorphism. We may instead have a flow, a periodically forced differential equation, or a numerical time series. The course tools still give a structured analysis pipeline, but each step has an uncertainty that must be stated.
[definition: Poincare Return Map]
Let $\Phi_t: M \to M$ be a flow and let $\Sigma \subset M$ be a section transverse to the flow. For $x\in\Sigma$, define the first return time
\begin{align*}
\tau(x)=\inf\{t>0:\Phi_t(x)\in\Sigma\},
\end{align*}
with $\tau(x)=\infty$ when the orbit of $x$ never returns to $\Sigma$. The domain of the return map is
\begin{align*}
\operatorname{Dom}(P)=\{x\in\Sigma:\ 0<\tau(x)<\infty\text{ and }\Phi_{\tau(x)}(x)\in\Sigma\},
\end{align*}
the set of points whose first positive return time exists and is attained. The Poincare return map is the partially defined map
\begin{align*}
P:\operatorname{Dom}(P)\to\Sigma, \qquad P(x)=\Phi_{\tau(x)}(x).
\end{align*}
[/definition]
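A return map can be computed numerically by integrating until the orbit crosses the section again. This minimal sketch assumes a test flow whose return map is known in closed form: the damped linear oscillator $\dot x=y$, $\dot y=-x-2\zeta y$ with section $\Sigma=\{y=0,\ x>0\}$. Successive maxima of $x$ are $2\pi/\omega_d$ apart with $\omega_d=\sqrt{1-\zeta^2}$, so $\tau=2\pi/\omega_d$ and $P(x)=e^{-2\pi\zeta/\omega_d}x$. The fixed-step RK4 integrator, the step size, and the linear interpolation of the crossing are illustrative choices, not a validated event-detection method.

```python
import math

def rk4_step(f, state, h):
    """One classical Runge-Kutta step for an autonomous vector field f."""
    k1 = f(state)
    k2 = f([s + 0.5 * h * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * h * k for s, k in zip(state, k2)])
    k4 = f([s + h * k for s, k in zip(state, k3)])
    return [s + h / 6 * (a + 2 * b + 2 * c + d) for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def damped_oscillator(zeta):
    """x' = y, y' = -x - 2 zeta y; autonomous, so Phi_t is a genuine flow."""
    return lambda s: [s[1], -s[0] - 2.0 * zeta * s[1]]

def first_return(f, x0, h=1e-3, t_max=100.0):
    """First return of (x0, 0) to Sigma = {y = 0, x > 0}; returns (tau, P(x0)) or None."""
    state, t = [x0, 0.0], 0.0
    while t < t_max:
        new = rk4_step(f, state, h)
        if state[1] > 0.0 >= new[1] and new[0] > 0.0:     # crossed Sigma from y > 0 to y <= 0
            frac = state[1] / (state[1] - new[1])          # linear interpolation inside the step
            return t + frac * h, state[0] + frac * (new[0] - state[0])
        state, t = new, t + h
    return None                                            # numerically, x0 is not in Dom(P)
```

With $\zeta=0.1$ and $x_0=1$, the computed return time and return point agree with $2\pi/\omega_d$ and $e^{-2\pi\zeta/\omega_d}$ to within the interpolation error.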
Return maps reduce a flow to a discrete-time system without discarding the recurrent geometry near the section. For a periodically forced oscillator, one adds the forcing phase as an extra variable and takes the section to be the time slice where the phase is zero; the return time is then exactly one forcing period, and the return map is the stroboscopic map. The next question is how such a return map enters the horseshoe and entropy pipeline.
[example: Periodically Forced Oscillator Return Map]
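Consider a periodically forced oscillator with state $x\in\mathbb R^2$, and let $\Phi_t$ be its time-$t$ solution map from initial time $0$. If the forcing has period $T>0$, the stroboscopic return map is
\begin{align*}
P(x)=\Phi_T(x).
\end{align*}
Thus one iterate of $P$ is one forcing period. Because the vector field is $T$-periodic in time, the solution map from time $kT$ to time $(k+1)T$ equals $\Phi_T$, and induction gives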
\begin{align*}
P^n(x)=\Phi_{nT}(x).
\end{align*}
Suppose a numerical picture suggests that a band $B$ in the section is stretched, folded, and returned across a compact rectangle $R$. To turn this picture into a horseshoe statement, one must replace the picture by estimates. For example, choose cone fields $C^u_z$ and $C^s_z$ on $R$ and verify that, for every $z\in R$ where the return is defined,
\begin{align*}
DP_z(C^u_z)\subseteq C^u_{P(z)}.
\end{align*}
One also needs a uniform expansion estimate such as
\begin{align*}
|DP_zv|\ge \lambda |v|
\end{align*}
for all $v\in C^u_z$ and some $\lambda>1$, together with backward invariance of the stable cones, $(DP_z)^{-1}(C^s_{P(z)})\subseteq C^s_z$, and the stable estimate
\begin{align*}
|DP_zv|\le \lambda^{-1}|v|
\end{align*}
for all $v\in C^s_z$. If two disjoint subrectangles $R_0,R_1\subset R$ are carried across $R$ with stable sides mapped into stable sides and unstable sides crossing from one side of $R$ to the other, then one can assign a binary itinerary by declaring that the $k$th return lies in $R_0$ or $R_1$.
For a prescribed finite word $a_0,\dots,a_{n-1}\in\{0,1\}$, the corresponding finite itinerary set is
\begin{align*}
K(a_0,\dots,a_{n-1})=\bigcap_{k=0}^{n-1}P^{-k}(R_{a_k}).
\end{align*}
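Each additional condition $P^k(x)\in R_{a_k}$ cuts the previous set down to a thinner strip: pulling a strip back by $P^{-1}$ contracts unstable cone vectors by at least $\lambda^{-1}$, so the width of $K(a_0,\dots,a_{n-1})$ along unstable curves is at most a constant multiple of $\lambda^{-(n-1)}$. Forward conditions alone leave the stable extent uncontrolled. Imposing also the past conditions $P^{-k}(x)\in R_{a_{-k}}$ for a bi-infinite word $(a_k)_{k\in\mathbb Z}$ shrinks the stable width at the same rate, by forward contraction in the stable cones. Hence bi-infinite itinerary sets are single points, and the geometric verification produces a compact invariant set $\Lambda=\bigcap_{k\in\mathbb Z}P^{-k}(R_0\cup R_1)$ coded by $\{0,1\}^{\mathbb Z}$.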
The point is that the return map of a physical oscillator enters the same horseshoe-to-entropy pipeline only after the compact trapping region, transverse crossing geometry, and cone estimates have been proved; the stretched-and-folded plot is evidence for where to look, not the proof itself.
[/example]
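The identity $P^n=\Phi_{nT}$ can be checked numerically for a concrete forced oscillator. This minimal sketch assumes the forced Duffing equation $\ddot x+0.25\,\dot x-x+x^3=0.3\cos t$ (parameters chosen only for illustration, so $T=2\pi$) and a fixed-step classical RK4 integrator. It verifies that two applications of the stroboscopic map agree with integrating directly over $2T$; it says nothing about a horseshoe, which still needs the trapping, crossing, and cone estimates.

```python
import math

def duffing_field(t, s, damping=0.25, gamma=0.3, omega=1.0):
    """Forced Duffing oscillator x'' + damping x' - x + x^3 = gamma cos(omega t), first-order form."""
    x, y = s
    return (y, -damping * y + x - x ** 3 + gamma * math.cos(omega * t))

def solve(s0, t0, t1, n_steps):
    """Classical RK4 approximation of s(t1) for the solution with s(t0) = s0."""
    h = (t1 - t0) / n_steps
    s = tuple(s0)
    for i in range(n_steps):
        t = t0 + i * h
        k1 = duffing_field(t, s)
        k2 = duffing_field(t + h / 2, tuple(a + h / 2 * b for a, b in zip(s, k1)))
        k3 = duffing_field(t + h / 2, tuple(a + h / 2 * b for a, b in zip(s, k2)))
        k4 = duffing_field(t + h, tuple(a + h * b for a, b in zip(s, k3)))
        s = tuple(a + h / 6 * (p + 2 * q + 2 * r + w)
                  for a, p, q, r, w in zip(s, k1, k2, k3, k4))
    return s

T = 2 * math.pi   # forcing period for omega = 1

def strobe(s, n_steps=2000):
    """Stroboscopic map P = Phi_T: integrate one forcing period starting at time 0."""
    return solve(s, 0.0, T, n_steps)
```

Because the forcing is $2\pi$-periodic, restarting the clock at $0$ for the second application is legitimate; the two computations differ only by rounding in the time argument.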
Numerical evidence often begins with entropy estimates and Lyapunov exponent estimates. These are useful diagnostics, but they are not substitutes for the hypotheses of theorems unless accompanied by error bounds or validated numerics. The next question is how to define the finite-time quantity that numerical Lyapunov computations approximate.
[definition: Finite-Time Lyapunov Estimate]
Let $f: U \subseteq \mathbb R^d \to \mathbb R^d$ be $C^1$, let $x\in U$ with $f^k(x)\in U$ for $0\le k<n$, and let $v\in\mathbb R^d\setminus\{0\}$. The finite-time Lyapunov estimate over $n$ iterates is
\begin{align*}
\lambda_n(x,v)=\frac{1}{n}\log\frac{|Df_x^n(v)|}{|v|}.
\end{align*}
[/definition]
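The estimate becomes meaningful only after discussing convergence, dependence on $x$ and $v$, and numerical stability of the derivative computation. For an invariant measure $\mu$, the Oseledets theorem gives convergence of $\lambda_n(x,v)$ as $n\to\infty$ for $\mu$-almost every $x$, and for almost every $v$ the limit is the largest exponent. Numerically, the raw products $Df_x^n v$ overflow or underflow, and the columns of $Df_x^n$ all align with the most expanding direction, so simulations renormalise at every step and use QR methods to estimate the whole Lyapunov spectrum. The next example explains how such estimates are compared with entropy computations from sampled data.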
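The QR method renormalises a whole frame instead of a single vector: each step factors $Df_{x_k}Q_k=Q_{k+1}R_k$, and the averages of $\log|(R_k)_{ii}|$ estimate the exponents, generalising the scalar stretch factors $\alpha_k$. This minimal sketch assumes the Hénon map $(x,y)\mapsto(1-ax^2+y,\,bx)$ with the classical parameters $a=1.4$, $b=0.3$ as the test system, a transient of $1000$ iterates, and the illustrative function names below. Since $\det Df=-b$ everywhere, the two exponents must sum to $\log b$, which the sketch can check exactly; the top exponent should come out near the commonly reported value of about $0.42$.

```python
import numpy as np

A_PAR, B_PAR = 1.4, 0.3   # classical Henon parameters, chosen for illustration

def henon(p):
    return np.array([1.0 - A_PAR * p[0] ** 2 + p[1], B_PAR * p[0]])

def henon_jacobian(p):
    return np.array([[-2.0 * A_PAR * p[0], 1.0], [B_PAR, 0.0]])

def lyapunov_spectrum_qr(p0, n, n_transient=1000):
    """Average of log|diag R| from repeated QR factorisations of Df_{x_k} Q_k."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_transient):                       # move onto the attractor first
        p = henon(p)
    Q = np.eye(2)
    log_sums = np.zeros(2)
    for _ in range(n):
        Q, R = np.linalg.qr(henon_jacobian(p) @ Q)     # renormalise: Q orthonormal, R upper triangular
        log_sums += np.log(np.abs(np.diag(R)))         # stretching recorded on the diagonal of R
        p = henon(p)
    return log_sums / n
```

The design choice is that all growth is transferred to $R_k$, so the frame $Q_k$ never overflows and its columns stay orthogonal instead of collapsing onto the most expanding direction.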
[example: Estimating Entropy and Exponents from Data]
Suppose the sampled orbit segment is $x_0=x,x_1=f(x),\dots,x_{N-1}=f^{N-1}(x)$, and suppose a derivative computation is available along the same segment. Choose a unit tangent vector $v_0$. Define
\begin{align*}
\alpha_k=|Df_{x_k}v_k|
\end{align*}
and, when $\alpha_k\ne0$, renormalize by
\begin{align*}
v_{k+1}=\frac{Df_{x_k}v_k}{\alpha_k}.
\end{align*}
Then $Df_{x_k}v_k=\alpha_k v_{k+1}$ with $|v_{k+1}|=1$. Iterating this identity gives
\begin{align*}
Df_{x_{n-1}}\cdots Df_{x_0}v_0=\alpha_0\alpha_1\cdots\alpha_{n-1}v_n.
\end{align*}
Since $|v_n|=1$ and, by the chain rule, $Df_x^n=Df_{x_{n-1}}\cdots Df_{x_0}$, the finite-time Lyapunov estimate is
\begin{align*}
\lambda_n(x,v_0)=\frac{1}{n}\log |Df_x^n v_0|=\frac{1}{n}\log(\alpha_0\alpha_1\cdots\alpha_{n-1}).
\end{align*}
Using $\log(ab)=\log a+\log b$ repeatedly, this becomes
\begin{align*}
\lambda_n(x,v_0)=\frac{1}{n}\sum_{k=0}^{n-1}\log\alpha_k.
\end{align*}
Thus the exponent estimate is an average of the observed logarithmic stretch factors along the orbit.
For a partition-based entropy estimate, let $\mathcal P=\{C_1,\dots,C_m\}$ and record the cell name $a_k=i$ when $x_k\in C_i$. For a word $w=w_0\cdots w_{r-1}$ of length $r$, define its empirical frequency by
\begin{align*}
\widehat p_N(w)=\frac{1}{N-r+1}\#\{0\le k\le N-r:a_k=w_0,\dots,a_{k+r-1}=w_{r-1}\}.
\end{align*}
The empirical block entropy is
\begin{align*}
\widehat H_N(r)=-\sum_{w\in\{1,\dots,m\}^r}\widehat p_N(w)\log \widehat p_N(w),
\end{align*}
with $0\log0=0$. The corresponding entropy-per-step estimate is
\begin{align*}
\widehat h_N(r)=\frac{1}{r}\widehat H_N(r).
\end{align*}
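If the orbit is typical for an ergodic invariant measure $\mu$ and $\mathcal P$ is a generating partition, then *Birkhoff Time Averages* identifies, for each fixed $r$, the limit of $\widehat p_N(w)$ as $N\to\infty$ with the measure $\mu\big(\bigcap_{k=0}^{r-1}f^{-k}(C_{w_k})\big)$ of the corresponding cylinder set. Letting $N\to\infty$ first and then $r\to\infty$, the estimates $\widehat h_N(r)$ converge to the metric entropy $h_\mu(f)$, because the entropy of a generating partition equals $h_\mu(f)$.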
In the smooth hyperbolic setting, if the sampled invariant measure satisfies the hypotheses of *Pesin's entropy formula*, then metric entropy is the sum of the positive Lyapunov exponents, counted with multiplicity. The numerical comparison is therefore: the block-frequency calculation estimates $h_\mu(f)$, while the derivative calculation estimates the positive Lyapunov exponents. Agreement between the two is evidence that the intended hyperbolic statistical picture is plausible, but it does not itself prove hyperbolicity, existence of a generating partition, or the absolute-continuity/SRB hypotheses needed for the entropy formula.
[/example]
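Both estimators can be run side by side on a map where the answers are known. This minimal sketch assumes the logistic map $f(x)=4x(1-x)$ with the partition $\{[0,\tfrac12),[\tfrac12,1]\}$, which is generating for this map; for its absolutely continuous invariant measure the Lyapunov exponent and the metric entropy both equal $\log 2$. In one dimension the renormalised vector is just $\pm1$, so $\alpha_k=|f'(x_k)|$. The seed, orbit length, and block length are illustrative choices, and the floating-point orbit is itself only a pseudo-orbit of $f$.

```python
import math
from collections import Counter

def logistic_orbit(x0, n):
    """Floating-point orbit of f(x) = 4x(1-x)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs

def lyapunov_estimate(xs):
    """(1/n) sum log alpha_k with alpha_k = |f'(x_k)| = |4 - 8 x_k|."""
    return sum(math.log(abs(4.0 - 8.0 * x)) for x in xs) / len(xs)

def block_entropy_rate(symbols, r):
    """hat h_N(r) = hat H_N(r) / r from empirical length-r word frequencies (unseen words contribute 0)."""
    n_windows = len(symbols) - r + 1
    counts = Counter(tuple(symbols[k:k + r]) for k in range(n_windows))
    H = -sum(c / n_windows * math.log(c / n_windows) for c in counts.values())
    return H / r
```

Agreement of the two numbers here is consistent with the entropy formula for this measure, but, as the example stresses, it is evidence rather than a proof of the hypotheses.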
The example leaves a final synthesis question. When the invariant measure is compatible with smooth hyperbolic geometry, how exactly should metric entropy relate to the positive Lyapunov exponents? The guiding answer is Pesin's entropy formula.
[quotetheorem:7779]
The full equality requires the technical framework of nonuniform hyperbolicity and absolute continuity along unstable laminations, so this chapter records it as a guiding principle with its hypotheses visible. The inequality alone is the general statement; without the absolute-continuity or SRB-type assumptions, a hyperbolic invariant measure can have entropy strictly smaller than the sum of its positive Lyapunov exponents. For example, the point mass at the fixed point $0$ of the cat map is invariant and has entropy $0$, while its positive exponent is $\log\frac{3+\sqrt5}{2}$. This explains why entropy estimates and Lyapunov exponent estimates often agree in hyperbolic numerical experiments, but also why agreement is a theorem-level conclusion rather than a consequence of hyperbolicity alone.
[remark: What the Pipeline Proves]
A complete analysis should state which layer has been established. A verified horseshoe for a return map proves positive topological entropy, at least $\log 2$ for a two-symbol horseshoe, together with symbolic orbit complexity. An ergodic physical measure proves time-average laws for continuous observables from a set of initial conditions of positive Lebesgue measure, namely its basin. A validated Lyapunov spectrum plus the hypotheses of Pesin theory connects infinitesimal stretching to entropy and local invariant manifolds.
[/remark]
The course ends with this organising picture. Geometry supplies mechanisms such as transverse homoclinic intersections and hyperbolic splittings. Coding turns those mechanisms into shifts, transition matrices, and entropy computations. Statistics interprets invariant measures, time averages, and Lyapunov exponents, explaining which features are visible to typical observations rather than only possible somewhere in phase space.
## Beyond And Connections
The next natural direction is to separate which parts of the theory are topological, which are measure-theoretic, and which genuinely depend on smoothness. [Ergodic Theory I: Foundations](/page/Ergodic%20Theory%20I%3A%20Foundations) gives the invariant-measure language behind recurrence, ergodicity, and time averages. [Ergodic Theory II: Entropy and Advanced Topics](/page/Ergodic%20Theory%20II%3A%20Entropy%20and%20Advanced%20Topics) continues the entropy side, especially the passage from partitions and symbolic codings to measure-theoretic entropy.
Smooth hyperbolic dynamics also has two important extensions. One is nonuniform hyperbolicity, where Oseledets splittings and Pesin theory replace uniform cone estimates; theorems about SRB measures and entropy formulas live most naturally there. The other is thermodynamic formalism, where Markov partitions and transfer operators turn orbit growth into pressure, equilibrium states, and zeta functions.
Several neighbouring Androma courses supply useful comparison points. [Cambridge II Dynamical Systems](/page/Cambridge%20II%20Dynamical%20Systems) gives a broader entry point for examples and qualitative dynamics, while [Partial Differential Equations III: Parabolic and Hyperbolic Evolution Equations](/page/Partial%20Differential%20Equations%20III%3A%20Parabolic%20and%20Hyperbolic%20Evolution%20Equations) shows how hyperbolic propagation appears in an analytic setting. A useful way to continue is to ask, for each system under study, which invariant objects are topological, which are probabilistic, and which are stable under perturbation.
Dynamical Systems II: Chaos and Ergodic Theory
Created by admin on 6/19/2026 | Last updated on 6/19/2026
Prerequisites
No prerequisites required for this page.