A measure tells us how large a set is, but dynamics asks a sharper question: what happens to size when a system evolves? If $T: E \to E$ moves every point of a [measure space](/page/Measure%20Space) $(E, \mathcal E, \mu)$ to its next position, then the image of a set can be distorted, folded, or spread out. The central question is whether the statistical size of every observable event is unchanged by this motion. A measure-preserving transformation is the answer: it is a measurable self-map whose inverse images have the same measure as the original sets.
The inverse image is the right object because events are tested by asking where they came from. If $A \in \mathcal E$ is an event after one step, then $T^{-1}(A)$ is the event that the starting point lands in $A$ after one step. Preserving measure means that the probability, mass, or volume of seeing $A$ after applying the system is the same as seeing $A$ before applying it. This turns a deterministic map into a statistical symmetry.
[example: A Rotation That Preserves Arc Length]
Let $E=[0,1)$, $\mathcal E=\mathcal B([0,1))$, and $\mu=\mathcal L^1|_{[0,1)}$. Fix $\alpha\in[0,1)$ and define
\begin{align*}
T(x)=x+\alpha \pmod 1.
\end{align*}
We compute the preimage of a half-open interval $A=[a,b)\subseteq[0,1)$, where $0\leq a<b\leq 1$. If $a\geq \alpha$, then
\begin{align*}
T^{-1}(A)=[a-\alpha,b-\alpha)
\end{align*}
because $x+\alpha\in[a,b)$ exactly when $x\in[a-\alpha,b-\alpha)$. Hence
\begin{align*}
\mu(T^{-1}(A))=(b-\alpha)-(a-\alpha)=b-a=\mu(A).
\end{align*}
If $b\leq \alpha$, then $x+\alpha\pmod 1\in[a,b)$ exactly when $x+\alpha-1\in[a,b)$, so
\begin{align*}
T^{-1}(A)=[1+a-\alpha,1+b-\alpha).
\end{align*}
Therefore
\begin{align*}
\mu(T^{-1}(A))=(1+b-\alpha)-(1+a-\alpha)=b-a=\mu(A).
\end{align*}
It remains to handle the case $a<\alpha<b$, where the interval crosses the cut point after shifting backward. Then $x+\alpha\pmod 1\in[a,b)$ holds either when $x+\alpha\in[\alpha,b)$ or when $x+\alpha-1\in[a,\alpha)$, so
\begin{align*}
T^{-1}(A)=[0,b-\alpha)\cup[1+a-\alpha,1).
\end{align*}
The two intervals are disjoint, and their total length is
\begin{align*}
(b-\alpha)+\bigl(1-(1+a-\alpha)\bigr)=(b-\alpha)+(\alpha-a)=b-a.
\end{align*}
Thus $\mu(T^{-1}(A))=\mu(A)$ for every half-open interval $[a,b)$ with $0\leq a<b\leq 1$. The half-open intervals, together with the empty set, form a $\pi$-system that generates $\mathcal B([0,1))$, and both $\mu$ and $A\mapsto\mu(T^{-1}(A))$ are measures of total mass $1$. The equality therefore extends to all Borel sets by the standard uniqueness principle for finite measures that agree on a generating $\pi$-system. The rotation can move every point, but it leaves the Lebesgue distribution of points unchanged.
[/example]
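The three cases of the rotation example can be checked mechanically. The following is a minimal sketch, not a proof: the helper names `rotation_preimage`, `in_intervals`, and `total_length`, the angle $\alpha=3/10$, and the three test intervals are all illustrative assumptions. It uses exact rational arithmetic to confirm that each preimage formula has length $b-a$ and agrees with a pointwise membership test on a grid.

```python
from fractions import Fraction

def T(x, alpha):
    """Rotation T(x) = x + alpha mod 1 on [0, 1)."""
    return (x + alpha) % 1

def rotation_preimage(a, b, alpha):
    """T^{-1}([a, b)) as a list of disjoint half-open intervals (lo, hi)."""
    if a >= alpha:                       # interval lies at or above the cut point
        return [(a - alpha, b - alpha)]
    if b <= alpha:                       # interval lies below the cut point
        return [(1 + a - alpha, 1 + b - alpha)]
    # a < alpha < b: the preimage wraps around the endpoint of [0, 1)
    return [(Fraction(0), b - alpha), (1 + a - alpha, Fraction(1))]

def in_intervals(x, intervals):
    return any(lo <= x < hi for lo, hi in intervals)

def total_length(intervals):
    return sum(hi - lo for lo, hi in intervals)

alpha = Fraction(3, 10)                  # illustrative choice of angle
cases = [(Fraction(1, 2), Fraction(9, 10)),   # a >= alpha
         (Fraction(1, 10), Fraction(1, 5)),   # b <= alpha
         (Fraction(1, 10), Fraction(3, 5))]   # a < alpha < b
grid = [Fraction(k, 1000) for k in range(1000)]

for a, b in cases:
    pre = rotation_preimage(a, b, alpha)
    same_length = total_length(pre) == b - a
    # x lies in the claimed preimage exactly when T(x) lies in [a, b)
    same_points = all(in_intervals(x, pre) == (a <= T(x, alpha) < b) for x in grid)
    print(a, b, same_length, same_points)
```

A grid check of this kind only samples finitely many points, so it supports the case analysis rather than replacing it.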
The rotation example is deliberately simple, but it already contains the main lesson. A transformation may preserve measure without fixing sets pointwise. It may also fail to be invertible as a function while still preserving measure through preimages. The theory is therefore about measurable dynamics rather than geometric rigidity: the important question is what happens to measured events under time evolution.
## Definition
### Preservation by Preimages
The definition builds on the notion of a measure. The ambient structure is already a measure space: a set equipped with a $\sigma$-algebra of measurable events and a countably additive measure. The new ingredient is a measurable self-map that acts on that measured world without changing the measure of any event when pulled back.
[definition: Measure-Preserving Transformation]
Let $(E, \mathcal E, \mu)$ be a measure space. A measurable map $T: E \to E$ is a measure-preserving transformation if
\begin{align*}
\mu(T^{-1}(A)) = \mu(A)
\end{align*}
for every $A \in \mathcal E$.
[/definition]
The definition uses preimages, not images. Images of measurable sets need not be measurable under an arbitrary measurable map, and even when they are measurable, image size is not the right invariant for non-injective maps. Preimages always respect the $\sigma$-algebra: if $A$ is measurable and $T$ is measurable, then $T^{-1}(A)$ is measurable.
Many applications use total mass $1$, because the measure describes a probability distribution on possible states. It is useful to name this common case separately: the same equation is now read as stationarity of probabilities under time evolution.
[definition: Probability-Preserving Transformation]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space. A measurable map $T: \Omega \to \Omega$ is a probability-preserving transformation if
\begin{align*}
\mathbb P(T^{-1}(A)) = \mathbb P(A)
\end{align*}
for every $A \in \mathcal F$.
[/definition]
This is the same definition with the letter $\mathbb P$ emphasizing probability. In much of ergodic theory, by convention, the abbreviation m.p.t. refers to a probability-preserving transformation unless a more general measure space is explicitly stated.
### Pushforwards and Observable Tests
The next task is to express the same preservation law as an equality between measures. That reformulation is useful because dynamics often starts with a map and asks which measures remain fixed by it, and because later examples will move between set-level preservation and distribution-level invariance.
[definition: Pushforward Measure]
Let $(E, \mathcal E)$ and $(F, \mathcal F)$ be measurable spaces, let $\mu$ be a measure on $(E, \mathcal E)$, and let $T: E \to F$ be measurable. The pushforward measure $T_\#\mu$ on $(F, \mathcal F)$ is defined by
\begin{align*}
(T_\#\mu)(B) = \mu(T^{-1}(B))
\end{align*}
for every $B \in \mathcal F$.
[/definition]
Once pushforwards are available, there are two apparently different ways to say that no mass changes under the dynamics: every event has the same measure as its preimage, or the whole measure is unchanged after being pushed forward by the map. These are not separate requirements but the same condition, so later arguments can switch between event-level tests and measure-level fixed points without adding any new hypothesis.
The following result records that equivalence, turning the setwise preservation condition into a pushforward fixed-point condition.
[quotetheorem:8380]
This characterisation is often the most compact way to remember the definition. It also makes measure preservation look like a fixed-point condition: the measure $\mu$ is fixed by the operation of pushing measures forward along $T$.
A tempting but wrong alternative would be to ask for $\mu(T(A))=\mu(A)$ for measurable sets $A$. That condition is unreliable: images may fail to be measurable, and non-injective maps can preserve preimage-measure while collapsing many points onto the same image event. The next example makes the failure visible and prepares the observable viewpoint: if images are the wrong primitive, then events and functions pulled back along $T$ become the natural tests.
[example: Why Images Are the Wrong Test]
Let $E=[0,1)$, let $\mathcal E=\mathcal B([0,1))$, let $\mu=\mathcal L^1|_{[0,1)}$, and define
\begin{align*}
T(x)=2x \pmod 1.
\end{align*}
We first check that image-measure is the wrong condition. If $A=[0,1/2)$ and $x\in A$, then $0\leq 2x<1$, so $T(x)=2x$. Thus every $y\in[0,1)$ is hit by taking $x=y/2\in[0,1/2)$, and hence $T(A)=[0,1)$. Therefore
\begin{align*}
\mu(T(A))=\mu([0,1))=1
\end{align*}
while
\begin{align*}
\mu(A)=\mu([0,1/2))=\frac{1}{2}.
\end{align*}
So $\mu(T(A))\neq \mu(A)$.
Now let $B=[a,b)\subseteq[0,1)$, where $0\leq a<b\leq 1$. On the first half of $E$, if $x\in[0,1/2)$, then $T(x)=2x$, so
\begin{align*}
T(x)\in B \quad \text{if and only if} \quad a\leq 2x<b
\end{align*}
which is equivalent to
\begin{align*}
\frac{a}{2}\leq x<\frac{b}{2}.
\end{align*}
On the second half, if $x\in[1/2,1)$, then $T(x)=2x-1$, so
\begin{align*}
T(x)\in B \quad \text{if and only if} \quad a\leq 2x-1<b
\end{align*}
which is equivalent to
\begin{align*}
\frac{1+a}{2}\leq x<\frac{1+b}{2}.
\end{align*}
Hence
\begin{align*}
T^{-1}(B)=\left[\frac{a}{2},\frac{b}{2}\right)\cup\left[\frac{1+a}{2},\frac{1+b}{2}\right).
\end{align*}
These two intervals are disjoint because $b/2\leq 1/2\leq (1+a)/2$. Their lengths are
\begin{align*}
\frac{b}{2}-\frac{a}{2}=\frac{b-a}{2}
\end{align*}
and
\begin{align*}
\frac{1+b}{2}-\frac{1+a}{2}=\frac{b-a}{2}.
\end{align*}
Therefore
\begin{align*}
\mu(T^{-1}(B))=\frac{b-a}{2}+\frac{b-a}{2}=b-a=\mu(B).
\end{align*}
Since half-open intervals generate $\mathcal B([0,1))$, the equality extends to all Borel sets by the *uniqueness principle for measures generated by intervals*. Thus the doubling map is measure-preserving even though it does not preserve image-measure for the set $[0,1/2)$.
[/example]
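The contrast between images and preimages for the doubling map can also be seen in a small computation. This is a hedged sketch with exact rational arithmetic: the names `doubling` and `doubling_preimage`, the interval $[1/5,7/10)$, and the sample size are illustrative assumptions.

```python
from fractions import Fraction

def doubling(x):
    """T(x) = 2x mod 1 on [0, 1)."""
    return (2 * x) % 1

def doubling_preimage(a, b):
    """T^{-1}([a, b)) as two disjoint half-open intervals, one in each half of [0, 1)."""
    return [(a / 2, b / 2), ((1 + a) / 2, (1 + b) / 2)]

# Preimage test: the two branches each contribute half of the length b - a.
a, b = Fraction(1, 5), Fraction(7, 10)
pre = doubling_preimage(a, b)
pre_length = sum(hi - lo for lo, hi in pre)
print(pre_length == b - a)

# Image test: sample points x = k/(2N) of A = [0, 1/2) and record where they land.
N = 1000
image_points = {doubling(Fraction(k, 2 * N)) for k in range(N)}
print(min(image_points), max(image_points))   # the samples spread across [0, 1)
```

The sampled image reaches well beyond $1/2$, consistent with $T([0,1/2))=[0,1)$, while the preimage of the test interval keeps its length.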
This example is the first real warning in the subject. Measure preservation is not a statement about tracking individual pieces of material without overlap. It is a statement about the distribution seen by observations after pulling events back through time.
## Invariance of Integrals
Once events are preserved, observables should also have unchanged averages. This is the bridge from set-theoretic dynamics to probability and analysis: instead of asking whether an event $A$ has the same measure after one step, we ask whether a measurable function has the same integral after composition with $T$.
[quotetheorem:8381]
The theorem says that a measure-preserving transformation is invisible to averages. A single point may move dramatically, but every integrable statistic has the same expected value before and after applying the map. In practice, however, checking all integrable functions is usually as hard as checking all measurable sets, and checking every measurable event can be impossible in applications where the measure is described only through moments, densities, or continuous test functions.
This creates a useful certification problem: when does invariance on a manageable class of observables force invariance of the whole probability measure? If the chosen observables determine probability measures, then equality of their integrals against $\mu$ and against $T_\#\mu$ is enough to recover $T_\#\mu=\mu$. The next theorem packages that test so later arguments can verify preservation through observables rather than through every event.
[quotetheorem:8382]
This is the form used in many applications: a proposed invariant distribution is tested against observables rather than against every event. Markov chains, smooth dynamics, and statistical mechanics all use this same idea with different classes of observables.
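Invariance of integrals can be illustrated numerically for the doubling map $T(x)=2x \pmod 1$ with Lebesgue measure on $[0,1)$. The sketch below is only a sanity check under stated assumptions: the midpoint rule, the grid size, the helper name `midpoint_integral`, and the two observables are illustrative choices, and floating-point quadrature is approximate.

```python
import math

def doubling(x):
    """T(x) = 2x mod 1 on [0, 1), in floating point."""
    return (2.0 * x) % 1.0

def midpoint_integral(g, n=10_000):
    """Midpoint-rule approximation of the integral of g over [0, 1).

    n is even, so the discontinuity of g(T(x)) at x = 1/2 falls on a cell boundary.
    """
    return sum(g((k + 0.5) / n) for k in range(n)) / n

observables = {
    "x^2": lambda x: x * x,
    "cos(2*pi*x)": lambda x: math.cos(2 * math.pi * x),
}

for name, f in observables.items():
    before = midpoint_integral(f)                          # integral of f
    after = midpoint_integral(lambda x: f(doubling(x)))    # integral of f composed with T
    print(name, before, after, abs(before - after) < 1e-6)
```

Agreement on a handful of observables does not certify invariance by itself; it becomes a certificate only when the tested class determines the measure, as in the theorem above.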
## Iteration and Dynamical Systems
A transformation becomes dynamics only when it is repeated. The preservation law must therefore survive iteration: if one step preserves the measure, then two steps, three steps, and every finite time step should preserve it as well.
[definition: Iterates of a Transformation]
Let $E$ be a set and let $T:E\to E$ be a map. The iterates of $T$ are the maps $T_k:E\to E$ defined by $T_0=\operatorname{id}_E$ and
\begin{align*}
T_{k+1}=T\circ T_k
\end{align*}
for every $k\in\mathbb N\cup\{0\}$.
[/definition]
The notation $T_k$ records time without using superscripts for enumeration. In many books this is written $T^k$, but here the subscript convention avoids overloading superscripts except for powers and derivative order. Once iterates are named, the transformation should no longer be treated as an isolated function: its meaning depends on the measurable events it acts on and on the measure it leaves fixed.
This motivates packaging the state space, $\sigma$-algebra, measure, and time-evolution map into a single object. Without that package, the same formula for $T$ could describe different measured dynamics after changing $\mu$ or $\mathcal E$. The following definition names the object whose finite-time behaviour, invariant sets, and orbit averages will be studied.
[definition: Measure-Preserving System]
A measure-preserving system is a quadruple $(E,\mathcal E,\mu,T)$ where $(E,\mathcal E,\mu)$ is a measure space and $T:E\to E$ is a measure-preserving transformation.
[/definition]
When $\mu(E)=1$, this is often called a probability-preserving system. The system viewpoint is important because changing the measure can change the dynamics being studied, even when the underlying point map is the same. The first structural question for such a system is whether the preservation law is genuinely stable under all finite time steps.
[quotetheorem:8383]
This theorem lets us talk about long-time behaviour without rechecking the definition at every time. If $A$ is an event, then $T_k^{-1}(A)$ is the event that the system lies in $A$ after $k$ steps, and it always has the same measure as $A$. The finite cyclic example below shows this repeated preservation in a setting where time evolution is just a permutation of atoms.
[example: A Finite Cyclic System]
Let $E=\{0,1,2,3\}$, let $\mathcal E=\mathcal P(E)$, and let $\mu(A)=|A|/4$. Define $T:E\to E$ by
\begin{align*}
T(i)=i+1\pmod 4.
\end{align*}
For any $A\subset E$, an element $i\in E$ belongs to $T^{-1}(A)$ exactly when $i+1\pmod 4$ belongs to $A$. Equivalently,
\begin{align*}
T^{-1}(A)=\{j-1\pmod 4:j\in A\}.
\end{align*}
The map $j\mapsto j-1\pmod 4$ is a bijection from $A$ onto $T^{-1}(A)$, so
\begin{align*}
|T^{-1}(A)|=|A|.
\end{align*}
Therefore
\begin{align*}
\mu(T^{-1}(A))=\frac{|T^{-1}(A)|}{4}=\frac{|A|}{4}=\mu(A).
\end{align*}
Thus $T$ is measure-preserving.
The same calculation applies at every finite time. For $k\in\mathbb N\cup\{0\}$, the $k$-th iterate is
\begin{align*}
T_k(i)=i+k\pmod 4.
\end{align*}
Hence
\begin{align*}
T_k^{-1}(A)=\{j-k\pmod 4:j\in A\}.
\end{align*}
The map $j\mapsto j-k\pmod 4$ is a bijection from $A$ onto $T_k^{-1}(A)$, so
\begin{align*}
\mu(T_k^{-1}(A))=\frac{|T_k^{-1}(A)|}{4}=\frac{|A|}{4}=\mu(A).
\end{align*}
The dynamics merely rotates the four atoms, so counting measure normalized by $4$ is unchanged at every time step.
[/example]
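Because the cyclic system has only sixteen events, the preservation law can be verified exhaustively. The following sketch assumes hypothetical helper names (`T_k`, `mu`, `preimage`) and checks every subset against the iterates $T_0,\dots,T_8$.

```python
from fractions import Fraction
from itertools import combinations

E = range(4)

def T_k(i, k):
    """k-th iterate of T(i) = i + 1 mod 4."""
    return (i + k) % 4

def mu(A):
    """Uniform probability measure mu(A) = |A| / 4."""
    return Fraction(len(A), 4)

def preimage(A, k):
    """T_k^{-1}(A): the points that land in A after k steps."""
    return frozenset(i for i in E if T_k(i, k) in A)

# All 2^4 = 16 subsets of E.
subsets = [frozenset(c) for r in range(5) for c in combinations(E, r)]

preserved = all(mu(preimage(A, k)) == mu(A) for A in subsets for k in range(9))
print(len(subsets), preserved)   # prints: 16 True
```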
Finite examples remove analytic distractions. They show that measure preservation is already a statement about counting and symmetry: a deterministic permutation preserves the uniform measure because it rearranges the atoms without changing how many lie in any event.
## Null Sets and Isomorphism
In measure theory, sets of measure zero are invisible to integration and probability. Dynamical systems therefore identify maps that agree outside a null set, and they treat invertibility up to null sets as the correct substitute for pointwise bijectivity.
[definition: Almost Everywhere Equality of Transformations]
Let $(E,\mathcal E,\mu)$ be a measure space, and let $S,T:E\to E$ be measurable maps. The maps $S$ and $T$ are equal $\mu$-a.e. if
\begin{align*}
\mu(\{x\in E:S(x)\neq T(x)\})=0.
\end{align*}
[/definition]
This definition is necessary because changing a transformation on a null set should not change the induced dynamics seen by measurable events or integrable observables. To make that principle usable, we need to know that null exceptional sets remain null after pulling them back through a measure-preserving transformation.
[quotetheorem:8384]
Null sets are therefore backward-invariant in measure. This is the technical reason that identities valid a.e. can be composed with a measure-preserving transformation without becoming false on a set of positive measure. It also explains why pointwise bijectivity is too rigid for measured dynamics: changing a map on a null set should not create a genuinely different system.
The right replacement asks for inverse maps only after all null exceptional sets have been discarded. This distinction matters because many models are built from representatives of a.e. equivalence classes; demanding exact inverse identities would make the theory depend on arbitrary choices on invisible sets. The following definition isolates the invertibility notion that survives measure-theoretic observation.
[definition: Invertibility Modulo Null Sets]
Let $(E,\mathcal E,\mu)$ and $(F,\mathcal F,\nu)$ be measure spaces, and let $\Phi:E\to F$ be measurable. The map $\Phi$ is invertible modulo null sets if there exists a measurable map $\Psi:F\to E$ such that
\begin{align*}
\Psi\circ \Phi = \operatorname{id}_E \quad \mu\text{-a.e.}
\end{align*}
and
\begin{align*}
\Phi\circ \Psi = \operatorname{id}_F \quad \nu\text{-a.e.}
\end{align*}
[/definition]
This is the right invertibility notion for ergodic theory. It keeps exactly the information that survives measure-theoretic observation and discards distinctions confined to null sets. To compare two dynamical systems, invertibility alone is not enough: the comparison must also preserve measure and match the two time evolutions.
[definition: Measure-Preserving Isomorphism]
Let $(E,\mathcal E,\mu,T)$ and $(F,\mathcal F,\nu,S)$ be measure-preserving systems. A measure-preserving isomorphism from the first system to the second is a measurable map $\Phi:E\to F$ such that $\Phi_\#\mu=\nu$, $\Phi$ is invertible modulo null sets, and
\begin{align*}
\Phi\circ T = S\circ \Phi \quad \mu\text{-a.e.}
\end{align*}
[/definition]
The commutation relation says that $\Phi$ converts the time evolution of one system into the time evolution of the other. Two isomorphic systems may look different pointwise, but they have the same measurable dynamics.
## Invariant Sets and Ergodicity
Measure preservation says the whole distribution is stationary, but it does not say the dynamics mixes the space. A system may split into two invariant regions that never communicate. Ergodicity is the condition that rules out such measurable decompositions.
[definition: Invariant Set]
Let $(E,\mathcal E,\mu,T)$ be a measure-preserving system. A measurable set $A\in\mathcal E$ is invariant if
\begin{align*}
\mu(T^{-1}(A)\triangle A)=0.
\end{align*}
[/definition]
The symmetric difference appears because equality of sets is too strict in measure theory. An invariant set is an event whose truth value is unchanged after applying the dynamics, except possibly on a null set. For invertible transformations this matches the usual orbit picture modulo null sets; for non-invertible transformations it is better read as equality between the event $A$ and its one-step pullback $T^{-1}(A)$ up to null sets. Either way, such a set is a measurable obstruction to treating the system as one indecomposable piece.
If a positive-probability invariant event has a positive-probability complement, the system has split into two measurable worlds that never communicate. Measure preservation alone permits this split, so it is not enough for indecomposable dynamics. The obstruction must be ruled out directly: every event whose truth value is unchanged by the dynamics should be either negligible or almost certain.
[definition: Ergodic Transformation]
Let $(E,\mathcal E,\mathbb P,T)$ be a probability-preserving system. The transformation $T$ is ergodic if every invariant set $A\in\mathcal E$ satisfies
\begin{align*}
\mathbb P(A)=0 \quad \text{or} \quad \mathbb P(A)=1.
\end{align*}
[/definition]
Ergodicity says that the only time-invariant events are negligible events and almost certain events. It is a mathematical version of indecomposability: the probability space cannot be split into two positive-probability invariant subsystems.
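On a finite set with the uniform probability measure the only null set is empty, so invariance up to null sets is exact invariance and the definition can be tested by enumeration. The sketch below is an illustration under that assumption; the four-point space, the helper names, and the three maps (the identity, the one-step rotation $i\mapsto i+1 \pmod 4$, and its second iterate) are chosen only for demonstration.

```python
from itertools import combinations

E = range(4)
subsets = [frozenset(c) for r in range(5) for c in combinations(E, r)]

def invariant_sets(T):
    """All A with T^{-1}(A) = A (exact invariance; no nonempty null sets here)."""
    return [A for A in subsets if frozenset(i for i in E if T(i) in A) == A]

def is_ergodic(T):
    """Ergodic for the uniform measure: every invariant set has measure 0 or 1."""
    return all(len(A) in (0, 4) for A in invariant_sets(T))

identity = lambda i: i
step_one = lambda i: (i + 1) % 4     # a single 4-cycle
step_two = lambda i: (i + 2) % 4     # two 2-cycles: {0, 2} and {1, 3}

for name, T in [("identity", identity), ("step_one", step_one), ("step_two", step_two)]:
    print(name, len(invariant_sets(T)), is_ergodic(T))
```

All three maps are permutations and so preserve the uniform measure, but only the single cycle has no invariant set of intermediate measure.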
[example: The Identity Map Is Not Ergodic]
Let $(E,\mathcal E,\mathbb P)$ be a probability space, choose an event $A\in\mathcal E$ with $0<\mathbb P(A)<1$, and let $T=\operatorname{id}_E$. We compute the pullback of $A$ under $T$. For $x\in E$,
\begin{align*}
x\in T^{-1}(A)\quad \text{if and only if}\quad T(x)\in A\quad \text{if and only if}\quad x\in A.
\end{align*}
Hence $T^{-1}(A)=A$, and therefore
\begin{align*}
T^{-1}(A)\triangle A=A\triangle A=\varnothing.
\end{align*}
Since $\mathbb P(\varnothing)=0$, the event $A$ is invariant. But $0<\mathbb P(A)<1$, so $A$ has neither probability $0$ nor probability $1$. Thus the identity map preserves the probability measure but is not ergodic, because it leaves every event fixed rather than forcing invariant events to be trivial.
[/example]
This failure is useful because it separates measure preservation from ergodicity. The identity map preserves every probability measure, but it has too many invariant events to describe a genuinely indecomposable motion. Sets, however, are only one way to observe a system: in analysis and probability, invariance more often appears through functions whose values stay unchanged along the dynamics.
That functional viewpoint should be equivalent to the set-based definition. Indicator functions turn invariant sets into invariant observables, while level sets of an invariant observable recover invariant events. The next theorem records this equivalence, making ergodicity usable in spaces where observables are easier to handle than arbitrary events.
[quotetheorem:8385]
This characterization changes the form of the ergodicity test without changing its content. An invariant measurable function is an observable whose value does not change under the time evolution, so a nonconstant one would separate the space into measurably distinguishable regions that the dynamics never mixes. Ergodicity rules out exactly that obstruction: every time-invariant observable must be constant outside a null set.
The formulation is useful because invariant functions often arise naturally from computations, limits, or conserved quantities. For example, if an orbit statistic satisfies $f\circ T=f$, then in an ergodic probability-preserving system it cannot carry nontrivial information about the starting point. The limitation is just as important: without ergodicity, invariant functions may be nonconstant, and their level sets reveal the invariant pieces into which the system decomposes. This observable viewpoint will be the bridge from invariant sets to long-time averages, where one studies functions rather than individual events.
## Recurrence and Long-Time Averages
### Returns in Finite Measure
The most striking consequence of preserving finite measure is recurrence. If an event has positive probability, then almost every point in that event must return to it infinitely often. Without finite total measure, this statement can fail because mass can drift away forever.
[definition: Recurrent Point for a Set]
Let $(E,\mathcal E)$ be a measurable space, let $T:E\to E$ be measurable, and let $A\in\mathcal E$. A point $x\in A$ is recurrent to $A$ if there exist infinitely many $k\in\mathbb N$ such that $T_k(x)\in A$.
[/definition]
Recurrence is a pointwise statement, but the obstruction is global. If a positive-measure set contained a positive-measure piece of points that never came back, the successive preimages of that piece would be pairwise disjoint sets of the same positive measure, which is too much mass for a finite space. Measure preservation forbids that kind of dissipation, so finite invariant measure forces return behavior rather than merely allowing it.
[quotetheorem:3425]
The finite-measure hypothesis is essential. If the space has infinite measure, a transformation can preserve measure while carrying each point away from any bounded region forever.
[example: Recurrence Can Fail in Infinite Measure]
Let $E=\mathbb Z$, let $\mathcal E=\mathcal P(\mathbb Z)$, and let $\mu$ be counting measure. Define $T:\mathbb Z\to\mathbb Z$ by
\begin{align*}
T(n)=n+1.
\end{align*}
For any subset $B\subseteq\mathbb Z$, an integer $n$ belongs to $T^{-1}(B)$ exactly when $n+1\in B$. Therefore
\begin{align*}
T^{-1}(B)=\{m-1:m\in B\}.
\end{align*}
The map $m\mapsto m-1$ is a bijection from $B$ onto $T^{-1}(B)$, with inverse $n\mapsto n+1$. Hence counting measure gives
\begin{align*}
\mu(T^{-1}(B))=|T^{-1}(B)|=|B|=\mu(B).
\end{align*}
Thus $T$ is measure-preserving, even though $\mu(E)=|\mathbb Z|=\infty$.
Now take $A=\{0\}$. The iterates satisfy $T_0(0)=0$, and if $T_k(0)=k$, then
\begin{align*}
T_{k+1}(0)=T(T_k(0))=T(k)=k+1.
\end{align*}
By induction, $T_k(0)=k$ for every $k\in\mathbb N\cup\{0\}$. Therefore $T_k(0)\in A$ holds only when $k=0$. The point $0$ never returns to $A$ at a positive time, even though $\mu(A)=1>0$, so the recurrence conclusion can fail when the total measure is infinite.
[/example]
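The difference between the infinite shift and a finite rotation shows up directly in their return times. This is a small sketch with a hypothetical helper `return_times`; a finite horizon can only illustrate the behavior and cannot prove that a point never returns.

```python
def return_times(T, x, A, horizon):
    """Positive times k <= horizon with T_k(x) in A."""
    times, y = [], x
    for k in range(1, horizon + 1):
        y = T(y)                 # y is now T_k(x)
        if y in A:
            times.append(k)
    return times

shift = lambda n: n + 1          # on Z with counting measure (infinite total mass)
cycle = lambda i: (i + 1) % 4    # on {0, 1, 2, 3} with uniform measure (finite total mass)

print(return_times(shift, 0, {0}, 20))   # prints: []
print(return_times(cycle, 0, {0}, 20))   # prints: [4, 8, 12, 16, 20]
```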
### Orbit Averages
Recurrence tells us that returns happen, but it does not quantify long-time statistical averages. For that, the central object is the average of an observable along an orbit: instead of only asking whether a point comes back, we ask what a long run of observations records.
[definition: Birkhoff Average]
Let $(E,\mathcal E)$ be a measurable space, let $T:E\to E$ be measurable, and let $f:E\to\mathbb C$ be measurable. For each $n\in\mathbb N$, the Birkhoff average of $f$ is the measurable function $A_n f:E\to\mathbb C$ defined by
\begin{align*}
(A_n f)(x)=\frac{1}{n}\sum_{k=0}^{n-1} f(T_k(x)).
\end{align*}
[/definition]
Birkhoff averages are the formal version of measuring a system over a long experiment. A priori, two starting points could spend their time in different invariant regions and produce different limiting averages. Ergodicity removes that obstruction: once there are no nontrivial invariant pieces, a typical long orbit has no smaller measurable component on which to base a different statistical average.
[quotetheorem:7385]
The theorem is the conceptual payoff, but its hypotheses matter. The observable must be integrable, and the conclusion is almost-everywhere rather than pointwise for every starting state; exceptional initial points may still behave irregularly. Without ergodicity, the limiting average need not be the global integral, because different invariant components can carry different statistics. Thus the result is not saying that every orbit is uniformly distributed, nor that every measurable observation has a classical pointwise limit without an integrability assumption; it says that the obstruction to a universal space average is precisely invariant information.
Stationary processes give another language for the same Birkhoff averages. Given a measure-preserving transformation $T$ on a probability space and an observable $f$, the sequence $X_k=f\circ T_k$ is stationary, and its empirical mean is exactly $A_n f$. Ergodicity says that the invariant information in this process is trivial, so the long-run average along one typical orbit collapses to the space average $\int f\,d\mu$. For example, in an irrational rotation of the circle, the time average of a continuous observation agrees for almost every starting point with its integral over the circle; for a system decomposed into two invariant pieces of positive measure, the corresponding averages can remember which piece the orbit started in. This is why orbit averages are stronger than recurrence: recurrence says that typical points return, while Birkhoff's theorem identifies the statistical value accumulated between those returns.
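Birkhoff averages for a circle rotation are easy to compute numerically. The sketch below assumes the rotation angle $\alpha=\sqrt 2-1$ (irrational, up to floating-point rounding), a hypothetical helper `birkhoff_average`, and two illustrative observables whose integrals over $[0,1)$ are $1/2$ and $0$.

```python
import math

ALPHA = math.sqrt(2) - 1     # illustrative irrational angle (a float approximation)

def birkhoff_average(f, x, n, alpha=ALPHA):
    """(A_n f)(x) = (1/n) * sum_{k=0}^{n-1} f(T_k(x)) for T(x) = x + alpha mod 1.

    T_k(x) = x + k*alpha mod 1 is evaluated directly, which avoids
    accumulating rounding error from n successive additions.
    """
    return sum(f((x + k * alpha) % 1.0) for k in range(n)) / n

indicator = lambda x: 1.0 if x < 0.5 else 0.0      # integral over [0, 1) is 1/2
cosine = lambda x: math.cos(2 * math.pi * x)       # integral over [0, 1) is 0

for x0 in (0.0, 0.123, 0.9):
    print(x0,
          birkhoff_average(indicator, x0, 100_000),
          birkhoff_average(cosine, x0, 100_000))
```

For each starting point the two averages should come out close to $1/2$ and $0$ respectively, independent of `x0`, which is the behavior the ergodic theorem describes for this system.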
## Sources of Measure-Preserving Transformations
Measure-preserving maps arise in several different ways. Some come from algebraic symmetries, some from smooth volume preservation, and some from stationary probability distributions. The common feature is always the same: the chosen measure is fixed by the dynamics.
[definition: Invariant Measure]
Let $(E,\mathcal E)$ be a measurable space and let $T:E\to E$ be measurable. A measure $\mu$ on $(E,\mathcal E)$ is invariant under $T$ if
\begin{align*}
T_\#\mu=\mu.
\end{align*}
[/definition]
This reverses the emphasis of the original definition. Instead of starting with a measured space and asking whether $T$ preserves the given measure, we start with $T$ and ask which measures make it measure-preserving.
[example: A Stationary Distribution on a Finite Set]
Let $E=\{0,1\}$ and let $T:E\to E$ be the constant map
\begin{align*}
T(x)=0.
\end{align*}
Take $\mu=\delta_0$, so $\mu(A)=1$ if $0\in A$ and $\mu(A)=0$ if $0\notin A$. We verify preservation by checking an arbitrary event $A\subseteq E$. If $0\in A$, then $T(x)=0\in A$ for both $x=0$ and $x=1$, so
\begin{align*}
T^{-1}(A)=E.
\end{align*}
Hence
\begin{align*}
\mu(T^{-1}(A))=\mu(E)=1=\mu(A).
\end{align*}
If $0\notin A$, then $T(x)=0\notin A$ for both $x=0$ and $x=1$, so
\begin{align*}
T^{-1}(A)=\varnothing.
\end{align*}
Hence
\begin{align*}
\mu(T^{-1}(A))=\mu(\varnothing)=0=\mu(A).
\end{align*}
Thus $\mu(T^{-1}(A))=\mu(A)$ for every $A\in\mathcal P(E)$, so $T$ is measure-preserving on $(E,\mathcal P(E),\delta_0)$.
Now let $\nu$ be the uniform probability measure on $E$, so $\nu(A)=|A|/2$. For the event $\{0\}$, both points are sent to $0$, and therefore
\begin{align*}
T^{-1}(\{0\})=\{0,1\}=E.
\end{align*}
Thus
\begin{align*}
\nu(T^{-1}(\{0\}))=\nu(E)=\frac{|E|}{2}=\frac{2}{2}=1.
\end{align*}
But
\begin{align*}
\nu(\{0\})=\frac{|\{0\}|}{2}=\frac{1}{2}.
\end{align*}
Therefore $\nu(T^{-1}(\{0\}))\neq \nu(\{0\})$, so the same point map preserves $\delta_0$ but does not preserve the uniform probability measure.
[/example]
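On a finite set a measure is determined by its point masses, so the condition $T_\#\mu=\mu$ can be checked by direct computation. The sketch below represents a measure as a dictionary from points to masses; the helper name `pushforward` and this representation are assumptions made for illustration.

```python
from fractions import Fraction

def pushforward(mu, T):
    """(T_# mu)({y}) = mu(T^{-1}({y})), for a measure given by its point masses."""
    out = {y: Fraction(0) for y in mu}
    for x, mass in mu.items():
        out[T(x)] += mass        # the mass at x is carried to T(x)
    return out

T = lambda x: 0                  # the constant map on E = {0, 1}
delta_0 = {0: Fraction(1), 1: Fraction(0)}
uniform = {0: Fraction(1, 2), 1: Fraction(1, 2)}

print(pushforward(delta_0, T) == delta_0)   # prints: True
print(pushforward(uniform, T) == uniform)   # prints: False
```

The pushforward of the uniform measure is $\delta_0$, which matches the computation $\nu(T^{-1}(\{0\}))=1$ in the example.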
This example is small but important. Measure preservation is not a property of the function alone; it is a property of the function together with the measure. For smooth transformations, it is natural to ask for a check that comes from geometry rather than from testing every measurable set.
The geometric check is local volume distortion. A differentiable map expands or contracts infinitesimal volumes according to the determinant of its Jacobian matrix, and the change-of-variables theorem turns that infinitesimal statement into a statement about [Lebesgue measure](/page/Lebesgue%20Measure) of sets. The next theorem gives the standard criterion that turns a smooth volume calculation into measure preservation.
[quotetheorem:8386]
Smooth volume preservation is therefore a special case of the same abstract definition. The determinant condition says infinitesimal volume is preserved everywhere, while measure preservation says every measurable event has the same volume after pullback.
[example: A Shear Preserves Area]
Let $E=\mathbb R^2$, let $\mu=\mathcal L^2$, and define $T:\mathbb R^2\to\mathbb R^2$ by
\begin{align*}
T(x_1,x_2)=(x_1+x_2,x_2).
\end{align*}
The inverse map is
\begin{align*}
T^{-1}(y_1,y_2)=(y_1-y_2,y_2),
\end{align*}
because
\begin{align*}
T(y_1-y_2,y_2)=((y_1-y_2)+y_2,y_2)=(y_1,y_2).
\end{align*}
Both $T$ and $T^{-1}$ have polynomial coordinate functions, so $T$ is a $C^1$ diffeomorphism from $\mathbb R^2$ onto $\mathbb R^2$.
For $x=(x_1,x_2)$ and $h=(h_1,h_2)$,
\begin{align*}
T(x+h)-T(x)=((x_1+h_1)+(x_2+h_2),x_2+h_2)-(x_1+x_2,x_2)=(h_1+h_2,h_2).
\end{align*}
Thus the Jacobian matrix has first row $(1,1)$ and second row $(0,1)$. Its determinant is
\begin{align*}
\det JT_x=(1)(1)-(1)(0)=1.
\end{align*}
Therefore
\begin{align*}
|\det JT_x|=|1|=1
\end{align*}
for every $x\in\mathbb R^2$. By the *[Jacobian Criterion for Volume Preservation](/theorems/8386)*, $T$ preserves Lebesgue measure $\mathcal L^2$ on $\mathbb R^2$. The map shears rectangles into parallelograms, but the determinant calculation shows that their Lebesgue area is unchanged.
[/example]
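The area claim for the shear can be checked on a concrete rectangle with the shoelace formula. This is a minimal sketch: the rectangle, the helper names, and the use of polygon vertices are illustrative assumptions, and the computation relies on the fact that a linear map sends a polygon to the polygon spanned by the mapped vertices.

```python
def shear(p):
    """T(x1, x2) = (x1 + x2, x2)."""
    x1, x2 = p
    return (x1 + x2, x2)

def inverse_shear(p):
    """T^{-1}(y1, y2) = (y1 - y2, y2)."""
    y1, y2 = p
    return (y1 - y2, y2)

def polygon_area(vertices):
    """Shoelace formula; vertices are listed in order around the polygon."""
    n = len(vertices)
    twice_area = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                     - vertices[(i + 1) % n][0] * vertices[i][1]
                     for i in range(n))
    return abs(twice_area) / 2

rectangle = [(0, 0), (3, 0), (3, 2), (0, 2)]
image = [shear(p) for p in rectangle]              # T(rectangle), a parallelogram
preimage = [inverse_shear(p) for p in rectangle]   # T^{-1}(rectangle), the set in the definition

print(polygon_area(rectangle), polygon_area(image), polygon_area(preimage))
# prints: 6.0 6.0 6.0
```

Because the shear is invertible, both the image and the preimage of the rectangle are parallelograms of the same area, which is not the case for non-injective maps such as the doubling map.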
This geometric picture complements the doubling map. A shear is invertible and volume-preserving in the classical sense; the doubling map is non-injective but still measure-preserving on $[0,1)$. The abstract definition includes both.
## Beyond and Connected Topics
Measure-preserving transformations are the entry point to ergodic theory, where one studies invariant sets, orbit averages, mixing, entropy, and classification up to measure-preserving isomorphism. Ergodicity is the first indecomposability condition; mixing and entropy ask finer questions about how information spreads under iteration.
They also connect directly to probability and [measure](/page/Cambridge%20IB%20Probability%20and%20Measure) through [stationary processes](/page/Time%20Series%20Analysis). If $(X_k)_{k\in\mathbb N}$ is a stationary stochastic process, the shift on the path space preserves the law of the whole process. In that setting, Birkhoff's theorem becomes a rigorous law of long-run empirical averages.
In smooth dynamics, measure preservation often comes from geometry: symplectic maps preserve phase-space volume, geodesic flows preserve Liouville measure, and divergence-free flows preserve volume under suitable hypotheses. These examples translate geometric conservation laws into invariant measures.
Finally, invariant measures are central in dynamics even when a map is not initially given with a preferred measure. Krylov-Bogolyubov type arguments, Markov-chain stationary distributions, and physical measures all ask which probability measures remain fixed under time evolution.
## References
Androma, [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure).
Androma, [Geometric Measure Theory II: Area and Coarea Formulas](/page/Geometric%20Measure%20Theory%20II%3A%20Area%20and%20Coarea%20Formulas).
Androma, [Geometric Measure Theory III: BV Functions and Sets of Finite Perimeter](/page/Geometric%20Measure%20Theory%20III%3A%20BV%20Functions%20and%20Sets%20of%20Finite%20Perimeter).
Androma, [Time Series Analysis](/page/Time%20Series%20Analysis).
Peter Walters, *An Introduction to Ergodic Theory* (1982).
Karl Petersen, *Ergodic Theory* (1983).
Manfred Einsiedler and Thomas Ward, *Ergodic Theory with a View Towards Number Theory* (2011).
Patrick Billingsley, *Probability and Measure* (1995).
Measure-Preserving Transformation
Also known as: Measure preserving map, Measure-preserving map, Measure preserving transformation, Measure-preserving system, Invariant measure transformation, Probability-preserving transformation, Volume-preserving transformation