Bernoulli Shifts Are Mixing of All Orders (Theorem # 3438)
Theorem
Let $(\Omega,\mathcal{A},p)$ be a probability space, and consider the two-sided product space
\begin{align*}
X := \Omega^{\mathbb{Z}}, \qquad \mathcal{B} := \mathcal{A}^{\otimes \mathbb{Z}}, \qquad \mu := p^{\otimes \mathbb{Z}}.
\end{align*}
For each $m\in\mathbb{Z}$, define the coordinate projection
\begin{align*}
\pi_m: X &\to \Omega \\
x &\mapsto x_m,
\end{align*}
and the [Bernoulli shift](/theorems/???)
\begin{align*}
\sigma: X &\to X \\
x &\mapsto (x_{m+1})_{m\in\mathbb{Z}}.
\end{align*}
Then the measure-preserving system $(X,\mathcal{B},\mu,\sigma)$ is [mixing of all orders](/theorems/???): for every integer $k\geq 2$, all events $A_0,\ldots,A_{k-1}\in\mathcal{B}$, and every sequence of integer tuples
\begin{align*}
0=n_0^{(r)} < n_1^{(r)} < \cdots < n_{k-1}^{(r)}
\end{align*}
such that
\begin{align*}
\min_{1\leq j\leq k-1}\bigl(n_j^{(r)}-n_{j-1}^{(r)}\bigr)\to \infty
\end{align*}
as $r\to\infty$, one has
\begin{align*}
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j^{(r)}}A_j\right)
\to
\prod_{j=0}^{k-1}\mu(A_j).
\end{align*}
Discussion
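Taking $k=2$ shows in particular that Bernoulli shifts are strongly mixing. They are the canonical examples of systems with the strongest mixing behaviour, and their entropy is governed by the base distribution: when $\Omega$ is finite with its discrete $\sigma$-algebra, the entropy of $\sigma$ equals the Shannon entropy $-\sum_{\omega\in\Omega}p(\{\omega\})\log p(\{\omega\})$ of $p$.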
Proof
[proofplan]
We first prove the assertion for cylinder events, that is, events depending on only finitely many coordinates. Once the time shifts are sufficiently separated, the corresponding coordinate windows are disjoint, and the independence of coordinates under the product measure then gives exact factorisation. We then pass from cylinder events to arbitrary measurable events by approximating each event in measure by a cylinder event and bounding the resulting intersection and product errors uniformly in the time shifts. Since the approximation error can be made arbitrarily small, the required mixing limit follows.
[/proofplan]
[step:Fix notation, the shift sign convention, and the mixing target]
The base probability space is $(\Omega,\mathcal{A},p)$, and the [Bernoulli shift](/theorems/???) acts on
\begin{align*}
X := \Omega^{\mathbb{Z}}, \qquad \mathcal{B} := \mathcal{A}^{\otimes \mathbb{Z}}, \qquad \mu := p^{\otimes \mathbb{Z}}.
\end{align*}
For each $m\in\mathbb{Z}$, the coordinate projection is
\begin{align*}
\pi_m: X &\to \Omega \\
x &\mapsto x_m,
\end{align*}
and we fix the sign convention
\begin{align*}
\sigma(x)_m = x_{m+1} \qquad \text{for every } m\in\mathbb{Z}.
\end{align*}
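Iterating this convention gives $\sigma^n(x)_m = x_{m+n}$ for every $n\in\mathbb{Z}$ and every $m\in\mathbb{Z}$ (for negative $n$ this uses that $\sigma$ is a bijection with $\sigma^{-1}(x)_m=x_{m-1}$). Set preimages are taken in the usual way: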
\begin{align*}
\sigma^{-n}A = \{x\in X : \sigma^n(x)\in A\}
\end{align*}
for every $A\in\mathcal{B}$ and every $n\in\mathbb{Z}$.
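The sign convention decides where the coordinate window of a shifted event sits, so it may help to see it executed. The following Python sketch is illustrative only: it models a point of $X$ as a function from integers to integer labels, and the names `shift_power`, `in_shift_preimage`, `x` and `coordinate_zero_is_two` are hypothetical helpers, not part of the source.

```python
from typing import Callable

# Illustrative model (an assumption, not part of the proof): a point x of X = Omega^Z is a
# function from Z to Omega, and an event A is a predicate on such functions.
Point = Callable[[int], int]


def shift_power(x: Point, n: int) -> Point:
    """Return sigma^n(x) under the convention sigma(x)_m = x_{m+1}, so sigma^n(x)_m = x_{m+n}."""
    return lambda m: x(m + n)


def in_shift_preimage(x: Point, n: int, event: Callable[[Point], bool]) -> bool:
    """Test whether x lies in sigma^{-n}A = {x : sigma^n(x) in A}, with A given by `event`."""
    return event(shift_power(x, n))


def x(m: int) -> int:
    """Hypothetical point with coordinates x_m = m mod 5 (labels stand in for points of Omega)."""
    return m % 5


def coordinate_zero_is_two(y: Point) -> bool:
    """The cylinder A = {y : y_0 = 2}, whose coordinate window is W = {0}."""
    return y(0) == 2


# sigma^{-n}A = {x : x_n = 2}, so the window {0} of A moves to n + {0}.
print([n for n in range(-6, 7) if in_shift_preimage(x, n, coordinate_zero_is_two)])  # [-3, 2]
```

The printed indices are exactly those $n$ with $x_n=2$, which is what $\sigma^n(x)_m=x_{m+n}$ predicts for $\sigma^{-n}A$ with $A=\{y: y_0=2\}$.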
We must prove that for every integer $k\geq 2$, all events $A_0,\ldots,A_{k-1}\in\mathcal{B}$, and every sequence of integer tuples
\begin{align*}
0=n_0^{(r)} < n_1^{(r)} < \cdots < n_{k-1}^{(r)}
\end{align*}
such that
\begin{align*}
\min_{1\leq j\leq k-1}\bigl(n_j^{(r)}-n_{j-1}^{(r)}\bigr)\to \infty
\end{align*}
as $r\to\infty$, one has
\begin{align*}
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j^{(r)}}A_j\right)
\to
\prod_{j=0}^{k-1}\mu(A_j).
\end{align*}
This is precisely [mixing of all orders](/theorems/???).
[guided]
We spell out the objects because the proof uses the product structure directly, and we pin down the sign convention because the location of the coordinate window of a shifted event depends on it. The base probability space is $(\Omega,\mathcal{A},p)$, and the [Bernoulli shift](/theorems/???) acts on
\begin{align*}
X := \Omega^{\mathbb{Z}}, \qquad \mathcal{B} := \mathcal{A}^{\otimes \mathbb{Z}}, \qquad \mu := p^{\otimes \mathbb{Z}}.
\end{align*}
The coordinate projection at $m\in\mathbb{Z}$ is the measurable map
\begin{align*}
\pi_m: X &\to \Omega \\
x &\mapsto x_m.
\end{align*}
We fix the convention
\begin{align*}
\sigma(x)_m = x_{m+1},
\end{align*}
so that the $m$-th coordinate after applying $\sigma$ is the old $(m+1)$-st coordinate. Iterating gives $\sigma^n(x)_m = x_{m+n}$ for every $n\in\mathbb{Z}$. Set preimages are defined in the usual way:
\begin{align*}
\sigma^{-n}A := \{x\in X : \sigma^n(x)\in A\}, \qquad A\in\mathcal{B},\ n\in\mathbb{Z}.
\end{align*}
The [mixing-of-all-orders](/theorems/???) statement asks for asymptotic independence of finitely many shifted events. Thus, fixing $k\geq 2$ and events $A_0,\ldots,A_{k-1}\in\mathcal{B}$, we must prove that whenever the ordered time shifts
\begin{align*}
0=n_0^{(r)} < n_1^{(r)} < \cdots < n_{k-1}^{(r)}
\end{align*}
separate so that
\begin{align*}
\min_{1\leq j\leq k-1}\bigl(n_j^{(r)}-n_{j-1}^{(r)}\bigr)\to \infty,
\end{align*}
then
\begin{align*}
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j^{(r)}}A_j\right)
\to
\prod_{j=0}^{k-1}\mu(A_j).
\end{align*}
The proof first establishes the identity $\mu\bigl(\bigcap_{j}\sigma^{-n_j}C_j\bigr)=\prod_{j}\mu(C_j)$ exactly for cylinder events $C_j$ once their shifted coordinate windows are disjoint, and then uses approximation in measure to remove the cylinder assumption.
[/guided]
[/step]
[step:Record shift-invariance of the Bernoulli product measure on cylinders and extend it to $\mathcal{B}$]
We show that
\begin{align*}
\mu(\sigma^{-n}A)=\mu(A) \qquad \text{for every } A\in\mathcal{B},\ n\in\mathbb{Z}.
\end{align*}
A cylinder event is an event $C\in\mathcal{B}$ for which there exist a finite set $W\subset\mathbb{Z}$ and a measurable set $E\in\mathcal{A}^{\otimes W}$ such that
\begin{align*}
C = \pi_W^{-1}(E),
\end{align*}
where
\begin{align*}
\pi_W: X &\to \Omega^W \\
x &\mapsto (x_m)_{m\in W}.
\end{align*}
For such a cylinder, the identity $\sigma^n(x)_m = x_{m+n}$ gives
\begin{align*}
\sigma^{-n}C = \pi_{n+W}^{-1}(E),
\end{align*}
where we identify $E\in\mathcal{A}^{\otimes W}$ with the same product set in $\mathcal{A}^{\otimes(n+W)}$ via the index translation $m\mapsto m+n$. By the defining finite-coordinate description of the product measure,
\begin{align*}
\mu(\pi_{n+W}^{-1}(E))
=
p^{\otimes(n+W)}(E)
=
p^{\otimes W}(E)
=
\mu(\pi_W^{-1}(E)),
\end{align*}
since $p^{\otimes(n+W)}$ and $p^{\otimes W}$ are both the $|W|$-fold product of $p$ with itself and only differ in their indexing. Hence $\mu(\sigma^{-n}C)=\mu(C)$ for every cylinder $C$.
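For a finite base space the identity $\mu(\sigma^{-n}C)=\mu(C)$ can be checked by exact enumeration. The sketch below assumes $\Omega=\{0,1\}$ with $p(\{1\})=1/3$ and a hypothetical cylinder with $W=\{0,1,2\}$ and $E=\{\text{exactly two coordinates equal } 1\}$; `measure_on_window` is our own name for the finite-coordinate formula $\mu(\pi_V^{-1}(F))=p^{\otimes V}(F)$.

```python
from fractions import Fraction
from itertools import product

# Illustrative assumption: Omega = {0, 1} with p({1}) = 1/3 (any finite Omega would do).
P = {0: Fraction(2, 3), 1: Fraction(1, 3)}


def measure_on_window(window, predicate):
    """mu of an event depending only on the coordinates in the finite set `window`.

    Sums prod_v p({y_v}) over the configurations y on `window` (a dict coordinate -> value)
    that satisfy `predicate`, i.e. the finite-coordinate description of mu = p^{⊗Z}.
    """
    coords = sorted(window)
    total = Fraction(0)
    for values in product(P, repeat=len(coords)):
        if predicate(dict(zip(coords, values))):
            weight = Fraction(1)
            for v in values:
                weight *= P[v]
            total += weight
    return total


# Hypothetical cylinder C = pi_W^{-1}(E): W = {0, 1, 2}, E = {exactly two coordinates equal 1}.
W = (0, 1, 2)


def in_E(values):
    return sum(values) == 2


def mu_cylinder():
    return measure_on_window(W, lambda y: in_E([y[m] for m in W]))


def mu_shifted_cylinder(n):
    """mu(sigma^{-n}C), where sigma^{-n}C = {x : (x_{m+n})_{m in W} in E}.

    The enumeration runs over W ∪ (n + W), so coordinates outside n + W are summed out.
    """
    window = set(W) | {m + n for m in W}
    return measure_on_window(window, lambda y: in_E([y[m + n] for m in W]))


print(mu_cylinder())  # 2/9
print(all(mu_shifted_cylinder(n) == mu_cylinder() for n in range(-5, 6)))  # True
```

Enumerating over $W\cup(n+W)$ rather than $n+W$ alone makes the check slightly less trivial: the extra coordinates are summed out with total weight $1$.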
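The collection of cylinder events is an algebra (in particular a $\pi$-system) generating $\mathcal{B}$. Consider
\begin{align*}
\mathcal{D}:=\{A\in\mathcal{B} : \mu(\sigma^{-n}A)=\mu(A) \text{ for every } n\in\mathbb{Z}\}.
\end{align*}
It contains $X$ because $\sigma^{-n}X=X$; it is closed under complementation because $\sigma^{-n}(X\setminus A)=X\setminus\sigma^{-n}A$ and $\mu$ is a probability measure; and it is closed under countable disjoint unions because preimages preserve disjoint unions and $\mu$ is countably additive. Thus $\mathcal{D}$ is a $\lambda$-system containing the $\pi$-system of cylinders, and Dynkin's $\pi$-$\lambda$ theorem (the $\pi$-$\lambda$ form of the [monotone class theorem](/theorems/???)) gives $\mathcal{D}=\mathcal{B}$.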
[/step]
[step:Factor shifted cylinder events once their coordinate windows are disjoint]
Fix cylinder events $C_0,\ldots,C_{k-1}\in\mathcal{B}$. For each $j\in\{0,\ldots,k-1\}$, choose a finite coordinate window $W_j\subset\mathbb{Z}$ and a set $E_j\in\mathcal{A}^{\otimes W_j}$ such that $C_j=\pi_{W_j}^{-1}(E_j)$. Define
\begin{align*}
R := 1 + \max\{|a-b| : a\in W_i,\ b\in W_j,\ 0\leq i,j\leq k-1\},
\end{align*}
with the convention that $R:=1$ if all $W_j$ are empty. If integers $0=n_0<n_1<\cdots<n_{k-1}$ satisfy
\begin{align*}
\min_{1\leq j\leq k-1}(n_j-n_{j-1}) \geq R,
\end{align*}
then the shifted windows
\begin{align*}
n_0+W_0,\ n_1+W_1,\ \ldots,\ n_{k-1}+W_{k-1}
\end{align*}
are pairwise disjoint.
Indeed, if $i<j$, then
\begin{align*}
n_j-n_i
=
\sum_{\ell=i+1}^{j}(n_\ell-n_{\ell-1})
\geq R.
\end{align*}
For $a\in W_i$ and $b\in W_j$, the definition of $R$ gives $a-b\leq|a-b|<R\leq n_j-n_i$, hence
\begin{align*}
n_i+a < n_j+b.
\end{align*}
Thus $(n_i+W_i)\cap(n_j+W_j)=\varnothing$.
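The separation scale $R$ and the disjointness argument translate directly into code. In this Python sketch (function names and example windows are hypothetical), the maximum of $|a-b|$ over all pairs of window points is computed as the spread of the union of the windows, which is the same quantity because $i$ and $j$ range over all indices.

```python
from itertools import combinations


def separation_scale(windows):
    """R := 1 + max{|a - b| : a in W_i, b in W_j}, with R := 1 if every window is empty.

    Since i and j range over all indices, the maximum equals
    (largest point) - (smallest point) of the union of the windows.
    """
    points = [a for w in windows for a in w]
    if not points:
        return 1
    return 1 + max(points) - min(points)


def shifts_from_gaps(gaps):
    """Build 0 = n_0 < n_1 < ... < n_{k-1} from the consecutive gaps n_j - n_{j-1}."""
    shifts = [0]
    for gap in gaps:
        shifts.append(shifts[-1] + gap)
    return shifts


def shifted_windows_disjoint(windows, shifts):
    """True iff the translates n_0 + W_0, ..., n_{k-1} + W_{k-1} are pairwise disjoint."""
    translates = [{n + a for a in w} for n, w in zip(shifts, windows)]
    return all(s.isdisjoint(t) for s, t in combinations(translates, 2))


# Hypothetical windows W_0 = {0, 3}, W_1 = {0}, for which R = 4.
windows = [{0, 3}, {0}]
R = separation_scale(windows)
print(R,
      shifted_windows_disjoint(windows, shifts_from_gaps([R - 1])),
      shifted_windows_disjoint(windows, shifts_from_gaps([R])))  # 4 False True
```

For these windows $R$ is sharp: with gap $R-1=3$ the points $0+3$ and $3+0$ collide. For other windows smaller gaps may already give disjointness; the proof only needs $R$ to be sufficient.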
By the previous step, $\sigma^{-n_j}C_j=\pi_{n_j+W_j}^{-1}(E_j)$, so $\sigma^{-n_j}C_j$ depends only on the coordinates indexed by $n_j+W_j$. Therefore the events
\begin{align*}
\sigma^{-n_0}C_0,\ \sigma^{-n_1}C_1,\ \ldots,\ \sigma^{-n_{k-1}}C_{k-1}
\end{align*}
depend on pairwise disjoint finite coordinate sets. By the defining finite-coordinate independence of the product measure $\mu=p^{\otimes\mathbb{Z}}$,
\begin{align*}
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}C_j\right)
=
\prod_{j=0}^{k-1}\mu(\sigma^{-n_j}C_j).
\end{align*}
By shift-invariance, $\mu(\sigma^{-n_j}C_j)=\mu(C_j)$ for every $j$, so
\begin{align*}
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}C_j\right)
=
\prod_{j=0}^{k-1}\mu(C_j).
\end{align*}
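The exact factorisation, and its failure when windows overlap, can be seen by enumeration. This sketch assumes $\Omega=\{0,1\}$ with $p(\{1\})=1/3$ and three hypothetical cylinders with windows $\{0,1\}$, $\{0\}$, $\{0,2\}$, for which the separation scale is $R=3$; `mu_shifted_intersection` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Illustrative assumption: Omega = {0, 1} with p({1}) = 1/3.
P = {0: Fraction(2, 3), 1: Fraction(1, 3)}

# Hypothetical cylinders C_j = pi_{W_j}^{-1}(E_j), stored as (W_j, test for E_j on (x_w)_{w in W_j}).
CYLINDERS = [
    ((0, 1), lambda t: t[0] == t[1]),      # C_0 = {x_0 = x_1},          W_0 = {0, 1}
    ((0,), lambda t: t[0] == 1),           # C_1 = {x_0 = 1},            W_1 = {0}
    ((0, 2), lambda t: t[0] + t[1] >= 1),  # C_2 = {x_0 = 1 or x_2 = 1}, W_2 = {0, 2}
]


def mu_shifted_intersection(cylinders, shifts):
    """mu(⋂_j sigma^{-n_j} C_j) by exact enumeration over the union of the windows n_j + W_j.

    Under sigma(x)_m = x_{m+1}, sigma^{-n}C_j = {x : (x_{n+w})_{w in W_j} in E_j}.
    """
    coords = sorted({n + w for n, (window, _) in zip(shifts, cylinders) for w in window})
    total = Fraction(0)
    for values in product(P, repeat=len(coords)):
        y = dict(zip(coords, values))
        if all(in_E(tuple(y[n + w] for w in window))
               for n, (window, in_E) in zip(shifts, cylinders)):
            total += prod(P[v] for v in values)
    return total


marginals = [mu_shifted_intersection([c], [0]) for c in CYLINDERS]
target = prod(marginals)                               # prod_j mu(C_j)
print(target)                                          # 25/243
print(mu_shifted_intersection(CYLINDERS, [0, 3, 6]))   # gaps equal R = 3, windows disjoint: 25/243
print(mu_shifted_intersection(CYLINDERS, [0, 1, 2]))   # windows overlap at coordinate 1: 5/81
```

With gaps at least $R$ the intersection probability equals the product exactly, while the overlapping shifts $(0,1,2)$ give $5/81=15/243\neq 25/243$.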
[guided]
Cylinder events are the events whose membership is determined by finitely many coordinates. Fix cylinder events $C_0,\ldots,C_{k-1}$. For each $j\in\{0,\ldots,k-1\}$, choose a finite coordinate window $W_j\subset\mathbb{Z}$ and a measurable set $E_j\in\mathcal{A}^{\otimes W_j}$ such that $C_j=\pi_{W_j}^{-1}(E_j)$. We need a single separation scale large enough to prevent all translated windows from overlapping. Define
\begin{align*}
R := 1 + \max\{|a-b| : a\in W_i,\ b\in W_j,\ 0\leq i,j\leq k-1\},
\end{align*}
with the convention $R:=1$ if every $W_j$ is empty. This $R$ is finite because the sets $W_j$ are finite.
Assume now that integers $0=n_0<n_1<\cdots<n_{k-1}$ satisfy
\begin{align*}
\min_{1\leq j\leq k-1}(n_j-n_{j-1}) \geq R.
\end{align*}
We verify that the shifted windows
\begin{align*}
n_0+W_0,\ n_1+W_1,\ \ldots,\ n_{k-1}+W_{k-1}
\end{align*}
are pairwise disjoint. Fix $i<j$. The adjacent gaps add, so
\begin{align*}
n_j-n_i
=
\sum_{\ell=i+1}^{j}(n_\ell-n_{\ell-1})
\geq R.
\end{align*}
If $a\in W_i$ and $b\in W_j$, then the definition of $R$ gives $a-b<R$. Combining this with $n_j-n_i\geq R$ yields
\begin{align*}
n_i+a < n_j+b.
\end{align*}
Thus no integer can lie in both $n_i+W_i$ and $n_j+W_j$, so the two shifted windows are disjoint. Since $i<j$ was arbitrary, all shifted windows are pairwise disjoint.
The point of this disjointness is that product measure makes disjoint coordinate blocks independent. Under our fixed convention $\sigma(x)_m=x_{m+1}$, the previous step identified $\sigma^{-n_j}C_j$ with the cylinder $\pi_{n_j+W_j}^{-1}(E_j)$, which depends only on the shifted coordinate window $n_j+W_j$. Therefore the finite-coordinate independence built into $\mu=p^{\otimes\mathbb{Z}}$ gives
\begin{align*}
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}C_j\right)
=
\prod_{j=0}^{k-1}\mu(\sigma^{-n_j}C_j).
\end{align*}
Applying the shift-invariance from the previous step, $\mu(\sigma^{-n_j}C_j)=\mu(C_j)$ for each $j$. Hence, once the cylinder windows are separated,
\begin{align*}
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}C_j\right)
=
\prod_{j=0}^{k-1}\mu(C_j).
\end{align*}
This is stronger than a limiting statement: for cylinders the equality is exact after sufficiently large separation.
[/guided]
[/step]
[step:Approximate arbitrary events by cylinder events]
Let $A_0,\ldots,A_{k-1}\in\mathcal{B}$ and let $\varepsilon>0$. The finite-coordinate cylinder events form an algebra that generates $\mathcal{B}$, and $\mu$ is a finite measure. By the approximation lemma for finite measures (a standard consequence of the [monotone class theorem](/theorems/???)), every event in $\mathcal{B}$ can be approximated in $\mu$-measure by cylinder events; hence there exist cylinder events $C_0,\ldots,C_{k-1}\in\mathcal{B}$ such that
\begin{align*}
\mu(A_j\triangle C_j) < \frac{\varepsilon}{3k}
\end{align*}
for every $j\in\{0,\ldots,k-1\}$.
For any integers $0=n_0<n_1<\cdots<n_{k-1}$, define
\begin{align*}
I_A(n_0,\ldots,n_{k-1})
&:=
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}A_j\right),\\
I_C(n_0,\ldots,n_{k-1})
&:=
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}C_j\right).
\end{align*}
By the inclusion
\begin{align*}
\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}A_j\right)
\triangle
\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}C_j\right)
\subseteq
\bigcup_{j=0}^{k-1}\sigma^{-n_j}(A_j\triangle C_j),
\end{align*}
the estimate $|\mu(P)-\mu(Q)|\leq\mu(P\triangle Q)$ for $P,Q\in\mathcal{B}$, monotonicity, the union bound, and shift-invariance of $\mu$ give
\begin{align*}
\left|I_A(n_0,\ldots,n_{k-1})-I_C(n_0,\ldots,n_{k-1})\right|
&\leq
\mu\left(
\bigcup_{j=0}^{k-1}\sigma^{-n_j}(A_j\triangle C_j)
\right)\\
&\leq
\sum_{j=0}^{k-1}\mu(\sigma^{-n_j}(A_j\triangle C_j))\\
&=
\sum_{j=0}^{k-1}\mu(A_j\triangle C_j)\\
&<
\frac{\varepsilon}{3}.
\end{align*}
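The inclusion for the intersections is purely set-theoretic, so it can be tested on finite sets standing in for $\sigma^{-n_j}A_j$ and $\sigma^{-n_j}C_j$. The following randomised Python sketch uses the uniform probability on a finite set (an illustrative assumption, with a hypothetical helper `check_intersection_estimate`) and checks both the inclusion and the resulting bound $|I_A-I_C|\leq\sum_j\mu(A_j\triangle C_j)$.

```python
import random
from fractions import Fraction
from functools import reduce


def check_intersection_estimate(size, k, rng):
    """Randomised check on the uniform probability over {0, ..., size - 1}.

    A[j] and C[j] stand in for sigma^{-n_j} A_j and sigma^{-n_j} C_j: the inclusion
    (⋂ A) Δ (⋂ C) ⊆ ⋃ (A[j] Δ C[j]) is purely set-theoretic, so no shift is modelled.
    """
    def mu(s):
        return Fraction(len(s), size)

    universe = range(size)
    A = [{u for u in universe if rng.random() < 0.7} for _ in range(k)]
    # Each C[j] flips a few random points of A[j], mimicking a close cylinder approximation.
    C = [a ^ {u for u in universe if rng.random() < 0.05} for a in A]

    inter_A = reduce(set.intersection, A)
    inter_C = reduce(set.intersection, C)
    disagreements = set().union(*(a ^ c for a, c in zip(A, C)))

    inclusion_holds = (inter_A ^ inter_C) <= disagreements
    bound_holds = abs(mu(inter_A) - mu(inter_C)) <= sum(mu(a ^ c) for a, c in zip(A, C))
    return inclusion_holds and bound_holds


rng = random.Random(0)
print(all(check_intersection_estimate(60, k, rng) for k in range(2, 6) for _ in range(100)))  # True
```

Shift-invariance is what lets the proof replace $\mu(\sigma^{-n_j}(A_j\triangle C_j))$ by $\mu(A_j\triangle C_j)$; the random sets here play the role of the already-shifted events.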
[guided]
We now pass from finite-coordinate events to arbitrary measurable events. This is possible because cylinder events form an algebra generating the product $\sigma$-algebra $\mathcal{B}=\mathcal{A}^{\otimes\mathbb{Z}}$, and $\mu$ is a finite measure because it is a probability measure. The approximation lemma for finite measures, a standard consequence of the [monotone class theorem](/theorems/???), therefore shows that every $A\in\mathcal{B}$ can be approximated in $\mu$-measure by cylinder events: for every $A\in\mathcal{B}$ and every $\delta>0$, there exists a cylinder event $C$ with $\mu(A\triangle C)<\delta$.
Fix $A_0,\ldots,A_{k-1}\in\mathcal{B}$ and $\varepsilon>0$. Applying this approximation separately to each $A_j$ with $\delta=\varepsilon/(3k)$, choose cylinder events $C_0,\ldots,C_{k-1}$ such that
\begin{align*}
\mu(A_j\triangle C_j) < \frac{\varepsilon}{3k}
\end{align*}
for every $j\in\{0,\ldots,k-1\}$. The symmetric difference $A_j\triangle C_j$ is the set on which the two events disagree.
For integers $0=n_0<n_1<\cdots<n_{k-1}$, introduce the two intersection probabilities
\begin{align*}
I_A(n_0,\ldots,n_{k-1})
&:=
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}A_j\right),\\
I_C(n_0,\ldots,n_{k-1})
&:=
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}C_j\right).
\end{align*}
If the two intersections differ at some point $x$, then $x$ belongs to one of the intersections but not the other, and this forces at least one index $j$ for which $x$ lies in the symmetric difference $\sigma^{-n_j}A_j\triangle\sigma^{-n_j}C_j=\sigma^{-n_j}(A_j\triangle C_j)$. Therefore the symmetric difference of the two intersections is contained in the union of the shifted symmetric differences:
\begin{align*}
\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}A_j\right)
\triangle
\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}C_j\right)
\subseteq
\bigcup_{j=0}^{k-1}\sigma^{-n_j}(A_j\triangle C_j).
\end{align*}
Since $|\mu(P)-\mu(Q)|\leq\mu(P\triangle Q)$ for all $P,Q\in\mathcal{B}$, taking $\mu$-measures, using monotonicity, and applying the union bound gives
\begin{align*}
\left|I_A(n_0,\ldots,n_{k-1})-I_C(n_0,\ldots,n_{k-1})\right|
&\leq
\mu\left(
\bigcup_{j=0}^{k-1}\sigma^{-n_j}(A_j\triangle C_j)
\right)\\
&\leq
\sum_{j=0}^{k-1}\mu(\sigma^{-n_j}(A_j\triangle C_j)).
\end{align*}
By the shift-invariance of $\mu$ established earlier, each term satisfies
\begin{align*}
\mu(\sigma^{-n_j}(A_j\triangle C_j))=\mu(A_j\triangle C_j).
\end{align*}
Thus
\begin{align*}
\left|I_A(n_0,\ldots,n_{k-1})-I_C(n_0,\ldots,n_{k-1})\right|
&\leq
\sum_{j=0}^{k-1}\mu(A_j\triangle C_j)\\
&<
\frac{\varepsilon}{3}.
\end{align*}
This estimate is uniform in the time shifts; it does not require the shifts to be separated.
[/guided]
[/step]
[step:Control the product of measures under cylinder approximation]
For numbers $a_0,\ldots,a_{k-1},b_0,\ldots,b_{k-1}\in[0,1]$, one has the telescoping identity
\begin{align*}
\prod_{j=0}^{k-1}a_j-\prod_{j=0}^{k-1}b_j
=
\sum_{\ell=0}^{k-1}
\left(\prod_{j=0}^{\ell-1} b_j\right)
(a_\ell-b_\ell)
\left(\prod_{j=\ell+1}^{k-1} a_j\right).
\end{align*}
Taking absolute values and using $0\leq a_j,b_j\leq 1$ gives
\begin{align*}
\left|\prod_{j=0}^{k-1}a_j-\prod_{j=0}^{k-1}b_j\right|
\leq
\sum_{\ell=0}^{k-1}|a_\ell-b_\ell|.
\end{align*}
Apply this with $a_j=\mu(A_j)$ and $b_j=\mu(C_j)$. Since
\begin{align*}
|\mu(A_j)-\mu(C_j)|\leq \mu(A_j\triangle C_j),
\end{align*}
we obtain
\begin{align*}
\left|
\prod_{j=0}^{k-1}\mu(A_j)
-
\prod_{j=0}^{k-1}\mu(C_j)
\right|
\leq
\sum_{j=0}^{k-1}\mu(A_j\triangle C_j)
<
\frac{\varepsilon}{3}.
\end{align*}
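A quick exact check of the telescoping identity and the product bound, using rational numbers in $[0,1]$ as hypothetical stand-ins for $\mu(A_j)$ and $\mu(C_j)$ (the helper names are illustrative):

```python
import random
from fractions import Fraction
from math import prod


def telescoping_terms(a, b):
    """Summands (b_0 ... b_{l-1}) (a_l - b_l) (a_{l+1} ... a_{k-1}) for l = 0, ..., k-1."""
    return [prod(b[:l]) * (a[l] - b[l]) * prod(a[l + 1:]) for l in range(len(a))]


def check_product_estimate(a, b):
    """Check the identity and the bound |prod a - prod b| <= sum |a_l - b_l| for entries in [0, 1]."""
    identity_holds = sum(telescoping_terms(a, b)) == prod(a) - prod(b)
    bound_holds = abs(prod(a) - prod(b)) <= sum(abs(x - y) for x, y in zip(a, b))
    return identity_holds and bound_holds


# Worked instance with k = 3 (hypothetical values).
a = [Fraction(1, 2), Fraction(1, 3), Fraction(1)]
b = [Fraction(1, 2), Fraction(1, 4), Fraction(9, 10)]
print(sum(telescoping_terms(a, b)), prod(a) - prod(b))  # 13/240 13/240
print(check_product_estimate(a, b))                     # True
```

Here the bound is far from tight ($13/240$ against $\sum_\ell|a_\ell-b_\ell|=44/240$), which is harmless: the proof only needs the error to be at most $\varepsilon/3$.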
[guided]
We also need to bound the error between the products $\prod_j \mu(A_j)$ and $\prod_j \mu(C_j)$. Since each factor lies in $[0,1]$, a single-factor change of magnitude $\eta$ can change the full product by at most $\eta$. We make this telescoping argument explicit.
For numbers $a_0,\ldots,a_{k-1},b_0,\ldots,b_{k-1}\in[0,1]$, write the difference of products as a telescoping sum where each term replaces exactly one $b_\ell$ by $a_\ell$:
\begin{align*}
\prod_{j=0}^{k-1}a_j-\prod_{j=0}^{k-1}b_j
=
\sum_{\ell=0}^{k-1}
\left(\prod_{j=0}^{\ell-1} b_j\right)
(a_\ell-b_\ell)
\left(\prod_{j=\ell+1}^{k-1} a_j\right).
\end{align*}
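To verify the identity, set $T_\ell:=\left(\prod_{j=0}^{\ell-1}b_j\right)\left(\prod_{j=\ell}^{k-1}a_j\right)$ for $0\leq\ell\leq k$. The $\ell$-th summand equals $T_\ell-T_{\ell+1}$, so the sum telescopes to $T_0-T_k=\prod_{j}a_j-\prod_{j}b_j$. Each product of $b_j$'s or $a_j$'s in $[0,1]$ has absolute value at most $1$, so taking absolute values and applying the triangle inequality gives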
\begin{align*}
\left|\prod_{j=0}^{k-1}a_j-\prod_{j=0}^{k-1}b_j\right|
\leq
\sum_{\ell=0}^{k-1}|a_\ell-b_\ell|.
\end{align*}
We now apply this with $a_j=\mu(A_j)$ and $b_j=\mu(C_j)$, both of which lie in $[0,1]$ because $\mu$ is a probability measure. The basic inequality
\begin{align*}
|\mu(A_j)-\mu(C_j)|\leq \mu(A_j\triangle C_j)
\end{align*}
holds because $A_j\subseteq C_j\cup(A_j\triangle C_j)$ and $C_j\subseteq A_j\cup(A_j\triangle C_j)$, so $\mu(A_j)\leq\mu(C_j)+\mu(A_j\triangle C_j)$ and symmetrically. Combining the telescoping bound with the cylinder approximation from the previous step,
\begin{align*}
\left|
\prod_{j=0}^{k-1}\mu(A_j)
-
\prod_{j=0}^{k-1}\mu(C_j)
\right|
\leq
\sum_{j=0}^{k-1}\mu(A_j\triangle C_j)
<
k\cdot\frac{\varepsilon}{3k}
=
\frac{\varepsilon}{3}.
\end{align*}
Like the intersection estimate, this bound does not involve the time shifts $n_0,\ldots,n_{k-1}$ at all.
[/guided]
[/step]
[step:Combine exact cylinder factorisation with approximation to obtain the limit]
Applying the cylinder-factorisation step to the cylinders $C_0,\ldots,C_{k-1}$ chosen above, there exists $R\in\mathbb{N}$ such that whenever
\begin{align*}
0=n_0<n_1<\cdots<n_{k-1}
\end{align*}
and
\begin{align*}
\min_{1\leq j\leq k-1}(n_j-n_{j-1})\geq R,
\end{align*}
one has
\begin{align*}
I_C(n_0,\ldots,n_{k-1})
=
\prod_{j=0}^{k-1}\mu(C_j).
\end{align*}
For such shifts, the triangle inequality, the intersection approximation estimate, and the product approximation estimate give
\begin{align*}
\left|
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}A_j\right)
-
\prod_{j=0}^{k-1}\mu(A_j)
\right|
&\leq
\left|I_A(n_0,\ldots,n_{k-1})-I_C(n_0,\ldots,n_{k-1})\right|\\
&\quad+
\left|
\prod_{j=0}^{k-1}\mu(C_j)
-
\prod_{j=0}^{k-1}\mu(A_j)
\right|\\
&<
\frac{2\varepsilon}{3}
<
\varepsilon.
\end{align*}
Therefore, for every $\varepsilon>0$, all sufficiently separated shifts satisfy
\begin{align*}
\left|
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}A_j\right)
-
\prod_{j=0}^{k-1}\mu(A_j)
\right|
<\varepsilon.
\end{align*}
Equivalently,
\begin{align*}
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j}A_j\right)
\to
\prod_{j=0}^{k-1}\mu(A_j)
\end{align*}
as $\min_{1\leq j\leq k-1}(n_j-n_{j-1})\to\infty$. Since $k\geq 2$ and $A_0,\ldots,A_{k-1}\in\mathcal{B}$ were arbitrary, the Bernoulli shift is mixing of all orders.
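As an illustration beyond cylinders (our own example, not from the source), take $\Omega=\{0,1\}$, $p(\{1\})=q=1/3$, and $A=\{x: \text{the first } m\geq 0 \text{ with } x_m=1 \text{ exists and is even}\}$, which is not determined by any finite set of coordinates. A short conditioning argument, on whether the first $1$ occurs before coordinate $n$ and using that coordinates $n,n+1,\ldots$ are independent of coordinates $0,\ldots,n-1$, gives $\mu(A)=1/(2-q)$ and
$\mu(A\cap\sigma^{-n}A)=\mu(A)^2+(1-q)^n\mu(A)(1-\mu(A))$ for even $n$, and $\mu(A)^2-(1-q)^{n+1}\mu(A)^2$ for odd $n$.
So the case $k=2$ holds here with an exponential rate. The Python sketch cross-checks this formula against exact computations for the cylinder $A_N\subseteq A$ (first $1$ among $x_0,\ldots,x_{N-1}$ at an even index), using the intersection estimate $|\cdot|\leq 2\mu(A\triangle A_N)\leq 2(1-q)^N$ from the approximation step. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Illustrative assumptions (not from the source): Omega = {0, 1} and p({1}) = q = 1/3.
q = Fraction(1, 3)

# A = {x : the first m >= 0 with x_m = 1 exists and is even}. Since
# P(first such m = t) = (1 - q)^t q, summing over even t gives mu(A) = 1 / (2 - q).
mu_A = 1 / (2 - q)


def closed_form(n):
    """mu(A ∩ sigma^{-n} A) for n >= 0, from conditioning on the first 1 at or after coordinate 0."""
    if n % 2 == 0:
        return mu_A**2 + (1 - q) ** n * mu_A * (1 - mu_A)
    return mu_A**2 - (1 - q) ** (n + 1) * mu_A**2


def first_one_is_even(bits):
    """True iff some bit equals 1 and the first such bit sits at an even offset."""
    for i, bit in enumerate(bits):
        if bit == 1:
            return i % 2 == 0
    return False


def cylinder_value(n, N):
    """Exact mu(A_N ∩ sigma^{-n} A_N) for A_N = {x : first_one_is_even(x_0, ..., x_{N-1})}.

    A_N ⊆ A and mu(A Δ A_N) <= (1 - q)^N; the enumeration covers coordinates 0, ..., n + N - 1.
    """
    total = Fraction(0)
    for y in product((0, 1), repeat=n + N):
        # y[:N] are the coordinates read by A_N; y[n:n + N] those read by sigma^{-n} A_N.
        if first_one_is_even(y[:N]) and first_one_is_even(y[n:n + N]):
            ones = sum(y)
            total += q**ones * (1 - q) ** (n + N - ones)
    return total


print(closed_form(1), closed_form(2), mu_A**2)  # 1/5 7/15 9/25
```

The deviation $|\mu(A\cap\sigma^{-n}A)-\mu(A)^2|$ is at most $(1-q)^n$, so the pair correlation converges to $\mu(A)^2$ without any cylinder structure in $A$ itself.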
[guided]
We now combine the three ingredients: the exact cylinder factorisation, the intersection-approximation error, and the product-approximation error. Recall that the cylinders $C_0,\ldots,C_{k-1}$ were chosen so that $\mu(A_j\triangle C_j)<\varepsilon/(3k)$ for every $j$.
Apply the cylinder-factorisation step to these cylinders. There exists $R\in\mathbb{N}$, depending on the cylinder windows $W_0,\ldots,W_{k-1}$ and hence on $\varepsilon$, such that whenever
\begin{align*}
0=n_0<n_1<\cdots<n_{k-1}\quad\text{and}\quad\min_{1\leq j\leq k-1}(n_j-n_{j-1})\geq R,
\end{align*}
the exact identity
\begin{align*}
I_C(n_0,\ldots,n_{k-1})
=
\prod_{j=0}^{k-1}\mu(C_j)
\end{align*}
holds. For such shifts, the triangle inequality bounds the target error by two pieces that we have already estimated:
\begin{align*}
\left|
I_A(n_0,\ldots,n_{k-1})
-
\prod_{j=0}^{k-1}\mu(A_j)
\right|
&\leq
\left|I_A(n_0,\ldots,n_{k-1})-I_C(n_0,\ldots,n_{k-1})\right|\\
&\quad+
\left|
I_C(n_0,\ldots,n_{k-1})-\prod_{j=0}^{k-1}\mu(C_j)
\right|\\
&\quad+
\left|
\prod_{j=0}^{k-1}\mu(C_j)
-
\prod_{j=0}^{k-1}\mu(A_j)
\right|.
\end{align*}
The middle term vanishes by the cylinder factorisation. The first and third terms are each bounded by $\varepsilon/3$ by the previous two steps. Hence
\begin{align*}
\left|
I_A(n_0,\ldots,n_{k-1})
-
\prod_{j=0}^{k-1}\mu(A_j)
\right|
<
\frac{\varepsilon}{3}+0+\frac{\varepsilon}{3}
=
\frac{2\varepsilon}{3}
<
\varepsilon.
\end{align*}
It remains to express this as a limit statement. Fix any sequence of tuples
\begin{align*}
0=n_0^{(r)} < n_1^{(r)} < \cdots < n_{k-1}^{(r)}
\end{align*}
with $\min_{1\leq j\leq k-1}(n_j^{(r)}-n_{j-1}^{(r)})\to\infty$ as $r\to\infty$. For our fixed $\varepsilon$, the constant $R=R(\varepsilon)$ above is finite, so the separation $\min_{1\leq j\leq k-1}(n_j^{(r)}-n_{j-1}^{(r)})$ is eventually at least $R(\varepsilon)$. From that point on,
\begin{align*}
\left|
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j^{(r)}}A_j\right)
-
\prod_{j=0}^{k-1}\mu(A_j)
\right|
<\varepsilon.
\end{align*}
Since $\varepsilon>0$ was arbitrary, this proves
\begin{align*}
\mu\left(\bigcap_{j=0}^{k-1}\sigma^{-n_j^{(r)}}A_j\right)
\to
\prod_{j=0}^{k-1}\mu(A_j)
\end{align*}
as $r\to\infty$. Since $k\geq 2$ and $A_0,\ldots,A_{k-1}\in\mathcal{B}$ were arbitrary, the Bernoulli shift $(X,\mathcal{B},\mu,\sigma)$ is mixing of all orders.
[/guided]
[/step]