[proofplan]We realize the Bernoulli shift on the product probability space $(\Omega, \mathcal{B}, \mu)$ and identify the finite-coordinate cylinder algebra $\mathcal{A}$ of events depending on only finitely many coordinates. We verify that $\mathcal{A}$ is an algebra generating $\mathcal{B}$ and that $\sigma$ preserves $\mu$. For events in $\mathcal{A}$ whose coordinate windows are separated, the product structure of $\mu$ gives exact independence, so the strong mixing identity for elements of $\mathcal{A}$ holds with equality for all sufficiently large $N$. We extend to arbitrary measurable sets via the Carathéodory approximation theorem and a symmetric-difference perturbation bound. Strong mixing then implies weak mixing and ergodicity directly from the definitions.[/proofplan]
[step:Realize the Bernoulli shift and identify its cylinder algebra]
The sequence space $(\Omega, \mathcal{B}, \mu)$ and the shift $\sigma: \Omega \to \Omega$, $(\sigma x)_n = x_{n+1}$, are as in the theorem statement. The hypothesis $\sum_{i=0}^{k-1} p_i = 1$ ensures $\nu$ is a probability measure on $(S, 2^S)$, and the product construction then yields a probability measure $\mu$ on $(\Omega, \mathcal{B})$.
For every finite $W \subset \mathbb{Z}$ and every $A \subseteq S^W$, define the finite-coordinate [cylinder set](/page/Cylinder%20Set)
\begin{align*}
C_W(A) := \{x \in \Omega : (x_n)_{n \in W} \in A\}.
\end{align*}
With the convention that $S^{\varnothing}$ is a one-point set, $C_{\varnothing}(\varnothing) = \varnothing$ and $C_{\varnothing}(S^{\varnothing}) = \Omega$. Let
\begin{align*}
\mathcal{A} := \{C_W(A) : W \subset \mathbb{Z} \text{ finite}, \, A \subseteq S^W\}.
\end{align*}
We verify three properties of $\mathcal{A}$.
(i) *Closure under complement.* For every $C_W(A) \in \mathcal{A}$, $\Omega \setminus C_W(A) = C_W(S^W \setminus A) \in \mathcal{A}$.
(ii) *Closure under finite unions.* Given $C_{W_1}(A_1)$ and $C_{W_2}(A_2)$, let $W := W_1 \cup W_2$ and define $\tilde A_i := \{b \in S^W : b|_{W_i} \in A_i\}$ for $i \in \{1, 2\}$. Then $C_{W_i}(A_i) = C_W(\tilde A_i)$, and
\begin{align*}
C_{W_1}(A_1) \cup C_{W_2}(A_2) = C_W(\tilde A_1 \cup \tilde A_2) \in \mathcal{A}.
\end{align*}
Since $\Omega = C_{\varnothing}(S^{\varnothing}) \in \mathcal{A}$, properties (i) and (ii) show that $\mathcal{A}$ is an algebra.
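(iii) *Generation.* The product $\sigma$-algebra $\mathcal{B}$ is by definition the smallest $\sigma$-algebra making each coordinate projection $\pi_n: \Omega \to S$, $\pi_n(x) = x_n$, measurable. Each event $\{\pi_n \in B\}$ with $B \subseteq S$ equals $C_{\{n\}}(B) \in \mathcal{A}$, so $\mathcal{B} \subseteq \sigma(\mathcal{A})$. Conversely, because $W$ and $S$ are finite, every $C_W(A) = \bigcup_{a \in A} \bigcap_{n \in W} \{\pi_n = a_n\}$ is a finite union of finite intersections of such events, so $\mathcal{A} \subseteq \mathcal{B}$. Hence $\sigma(\mathcal{A}) = \mathcal{B}$.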
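As a concrete check of (i) and (ii), the following worked instance assumes $k = 2$ and the labelling $S = \{0, 1\}$; this labelling is an illustrative assumption, not part of the argument. Take $W_1 = \{0\}$, $A_1 = \{0\}$ and $W_2 = \{2\}$, $A_2 = \{1\}$, so that $W = \{0, 2\}$ and elements of $S^W$ are written as pairs $(b_0, b_2)$.
\begin{align*}
\tilde A_1 &= \{(0,0), (0,1)\}, \qquad \tilde A_2 = \{(0,1), (1,1)\}, \\
C_{W_1}(A_1) \cup C_{W_2}(A_2) &= C_W\bigl(\{(0,0), (0,1), (1,1)\}\bigr), \\
\Omega \setminus \bigl(C_{W_1}(A_1) \cup C_{W_2}(A_2)\bigr) &= C_W\bigl(\{(1,0)\}\bigr) = \{x \in \Omega : x_0 = 1, \ x_2 = 0\}.
\end{align*}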
*The shift $\sigma$ preserves $\mu$.* For any cylinder $C_W(A) \in \mathcal{A}$, the identity $(\sigma x)_n = x_{n+1}$ shows that $\sigma x \in C_W(A)$ if and only if $(x_{n+1})_{n \in W} \in A$, so
\begin{align*}
\sigma^{-1} C_W(A) = C_{W+1}(A'),
\end{align*}
where $W + 1 := \{n + 1 : n \in W\}$ and $A' \subseteq S^{W+1}$ is the relabelling of $A$ along the bijection $n \mapsto n + 1$. The defining property of the product measure $\mu = \bigotimes_{n \in \mathbb{Z}} \nu$ is that for every finite $F \subset \mathbb{Z}$ the pushforward of $\mu$ under the projection $\pi_F: x \mapsto (x_n)_{n \in F}$ equals the finite product $\nu^F := \bigotimes_{n \in F} \nu$. Because every coordinate marginal is the same measure $\nu$, the product $\nu^F$ depends on $F$ only up to relabelling of coordinates, so $\nu^{W+1}(A') = \nu^W(A)$. Hence
\begin{align*}
\mu(\sigma^{-1}C_W(A)) = \nu^{W+1}(A') = \nu^W(A) = \mu(C_W(A)).
\end{align*}
So $\mu \circ \sigma^{-1}$ and $\mu$ agree on $\mathcal{A}$. By the [Uniqueness of Extension Theorem](/theorems/???) (two probability measures agreeing on a generating algebra agree on the generated $\sigma$-algebra), $\mu \circ \sigma^{-1} = \mu$ on $\mathcal{B}$.
[/step]
[step:Prove exact independence for sufficiently separated cylinder events]
Let $E_1 = C_{W_1}(A_1)$ and $E_2 = C_{W_2}(A_2)$ be arbitrary elements of $\mathcal{A}$. For $N \in \mathbb{N}$ set $W_1 + N := \{n + N : n \in W_1\}$. From $(\sigma^N x)_n = x_{n+N}$,
\begin{align*}
\sigma^{-N} E_1 = C_{W_1 + N}(A_1'),
\end{align*}
where $A_1' \subseteq S^{W_1 + N}$ is the relabelling of $A_1$ along $n \mapsto n + N$.
If either $W_1 = \varnothing$ or $W_2 = \varnothing$, then the corresponding $E_i$ equals $\varnothing$ or $\Omega$, and the identity
\begin{align*}
\mu(\sigma^{-N}E_1 \cap E_2) = \mu(E_1)\mu(E_2)
\end{align*}
holds for every $N \in \mathbb{N}$ by direct inspection. Assume henceforth that $W_1$ and $W_2$ are both nonempty.
Set
\begin{align*}
N_0 := \max W_2 - \min W_1 + 1 \in \mathbb{Z}.
\end{align*}
For every integer $N \geq N_0$, every element $n + N$ of $W_1 + N$ satisfies
\begin{align*}
n + N \geq \min W_1 + N_0 = \max W_2 + 1 > \max W_2,
\end{align*}
hence $(W_1 + N) \cap W_2 = \varnothing$.
Set $F := (W_1 + N) \cup W_2$. By the defining property of the product measure, the pushforward $(\pi_F)_*\mu$ equals $\nu^{W_1 + N} \otimes \nu^{W_2}$ on $S^{W_1 + N} \times S^{W_2}$ because $W_1 + N$ and $W_2$ are disjoint. Therefore
\begin{align*}
\mu(\sigma^{-N}E_1 \cap E_2)
&= \mu\bigl(C_{W_1 + N}(A_1') \cap C_{W_2}(A_2)\bigr) \\
&= (\nu^{W_1 + N} \otimes \nu^{W_2})(A_1' \times A_2) \\
&= \nu^{W_1 + N}(A_1') \, \nu^{W_2}(A_2) \\
&= \mu(\sigma^{-N}E_1) \, \mu(E_2).
\end{align*}
By Step 1, $\sigma$ preserves $\mu$, so $\mu(\sigma^{-N}E_1) = \mu(E_1)$. Hence, for every integer $N \geq N_0$,
\begin{align*}
\mu(\sigma^{-N}E_1 \cap E_2) = \mu(E_1)\mu(E_2).
\end{align*}
[guided]
The required independence is built into the construction of the product probability measure; we make the dependence on coordinate windows explicit.
Write $E_1 = C_{W_1}(A_1)$ and $E_2 = C_{W_2}(A_2)$ with $W_1, W_2 \subset \mathbb{Z}$ finite and $A_i \subseteq S^{W_i}$. If $W_1 = \varnothing$, then $E_1 \in \{\varnothing, \Omega\}$ and the identity to be proved is immediate for every $N$; similarly for $W_2$. We therefore assume $W_1, W_2 \neq \varnothing$, which is the case the definition of $N_0$ requires.
First track $\sigma^{-N}E_1$. Since $(\sigma^N x)_n = x_{n+N}$, the event $\sigma^N x \in E_1$ is the condition $(x_{n+N})_{n \in W_1} \in A_1$. Relabelling indices $m = n + N$ converts this to $(x_m)_{m \in W_1 + N} \in A_1'$, where $A_1'$ is the image of $A_1$ under the relabelling induced by the bijection $n \mapsto n + N$. Thus $\sigma^{-N}E_1 = C_{W_1 + N}(A_1')$, an event depending only on coordinates in $W_1 + N$.
The choice $N_0 = \max W_2 - \min W_1 + 1$ separates the windows. For $N \geq N_0$, every element of $W_1 + N$ is at least $\min W_1 + N \geq \min W_1 + N_0 = \max W_2 + 1$, so $(W_1 + N) \cap W_2 = \varnothing$.
Now we use the defining property of the product measure. For any finite disjoint $F_1, F_2 \subset \mathbb{Z}$, the joint pushforward of $\mu$ under $\pi_{F_1 \cup F_2}$ is the product $\nu^{F_1} \otimes \nu^{F_2}$. Applying this with $F_1 = W_1 + N$ and $F_2 = W_2$,
\begin{align*}
\mu(\sigma^{-N}E_1 \cap E_2) = (\nu^{W_1 + N} \otimes \nu^{W_2})(A_1' \times A_2) = \nu^{W_1 + N}(A_1') \, \nu^{W_2}(A_2).
\end{align*}
The first factor equals $\mu(\sigma^{-N}E_1)$ and the second equals $\mu(E_2)$. Since $\sigma$ preserves $\mu$ by Step 1, $\mu(\sigma^{-N}E_1) = \mu(E_1)$. So for every $N \geq N_0$,
\begin{align*}
\mu(\sigma^{-N}E_1 \cap E_2) = \mu(E_1)\mu(E_2).
\end{align*}
This is stronger than convergence: on the cylinder algebra $\mathcal{A}$, the mixing identity holds with equality for all sufficiently large $N$, not merely in the limit. The role of the next step is to transport this eventual equality to an asymptotic statement on all of $\mathcal{B}$.
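To see the threshold $N_0$ at work, here is a small worked instance under the illustrative assumptions $k = 2$, $S = \{0, 1\}$, $\nu(\{i\}) = p_i$ and $0 < p_0 < 1$ (none of which the proof needs). Take $E_1 = \{x : x_0 = 0\}$ and $E_2 = \{x : x_0 = 0, \ x_1 = 1\}$, so $W_1 = \{0\}$, $W_2 = \{0, 1\}$, $N_0 = 1 - 0 + 1 = 2$, $\sigma^{-N}E_1 = \{x : x_N = 0\}$, and $\mu(E_1)\mu(E_2) = p_0^2 p_1$.
\begin{align*}
N = 0: &\quad \mu(\sigma^{-N}E_1 \cap E_2) = \mu(E_2) = p_0 p_1 \neq p_0^2 p_1, \\
N = 1: &\quad \mu(\sigma^{-N}E_1 \cap E_2) = \mu(\varnothing) = 0 \neq p_0^2 p_1, \\
N \geq 2: &\quad \mu(\sigma^{-N}E_1 \cap E_2) = p_0 \cdot p_0 p_1 = \mu(E_1)\mu(E_2).
\end{align*}
In this instance the identity fails for both values of $N$ below $N_0$, so the threshold cannot be lowered here; in general $N_0$ is only a sufficient bound.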
[/guided]
[/step]
[step:Approximate arbitrary measurable sets by elements of the cylinder algebra]
Let $E, F \in \mathcal{B}$ and $\varepsilon > 0$. By Step 1, $\mathcal{A}$ is an algebra with $\sigma(\mathcal{A}) = \mathcal{B}$, and $\mu$ is a probability measure on $(\Omega, \mathcal{B})$. By the [Carathéodory Approximation Theorem](/theorems/???) (for every $G \in \sigma(\mathcal{A})$ and every $\delta > 0$ there exists $G' \in \mathcal{A}$ with $\mu(G \triangle G') < \delta$), there exist $E', F' \in \mathcal{A}$ with
\begin{align*}
\mu(E \triangle E') < \varepsilon, \qquad \mu(F \triangle F') < \varepsilon,
\end{align*}
where $E \triangle E' := (E \setminus E') \cup (E' \setminus E)$ denotes the [symmetric difference](/page/Symmetric%20Difference).
For every $N \in \mathbb{N}$ define the mixing error
\begin{align*}
D_N(E, F) := \left|\mu(\sigma^{-N}E \cap F) - \mu(E)\mu(F)\right|.
\end{align*}
*Intersection perturbation.* Because preimages commute with set operations and $\sigma$ preserves $\mu$,
\begin{align*}
\mu\bigl((\sigma^{-N}E) \triangle (\sigma^{-N}E')\bigr) = \mu(\sigma^{-N}(E \triangle E')) = \mu(E \triangle E') < \varepsilon.
\end{align*}
For any measurable $G, G', H, H'$ the inclusion
\begin{align*}
(G \cap H) \triangle (G' \cap H') \subseteq (G \triangle G') \cup (H \triangle H')
\end{align*}
holds (if $\omega \in G \cap H$ but $\omega \notin G' \cap H'$, then $\omega \notin G'$ or $\omega \notin H'$; the symmetric case is identical). Combining the elementary bound $|\mu(X) - \mu(Y)| \leq \mu(X \triangle Y)$ with subadditivity of $\mu$ gives
\begin{align*}
\left|\mu(G \cap H) - \mu(G' \cap H')\right| \leq \mu(G \triangle G') + \mu(H \triangle H').
\end{align*}
Applying this with $G = \sigma^{-N}E$, $G' = \sigma^{-N}E'$, $H = F$, $H' = F'$,
\begin{align*}
\left|\mu(\sigma^{-N}E \cap F) - \mu(\sigma^{-N}E' \cap F')\right| < 2\varepsilon.
\end{align*}
*Product perturbation.* The bound $|\mu(G) - \mu(G')| \leq \mu(G \triangle G')$ follows from $G \setminus G', G' \setminus G \subseteq G \triangle G'$. Using $\mu(F), \mu(E') \leq 1$,
\begin{align*}
|\mu(E)\mu(F) - \mu(E')\mu(F')|
&\leq |\mu(E) - \mu(E')|\mu(F) + \mu(E')|\mu(F) - \mu(F')| \\
&\leq \mu(E \triangle E') + \mu(F \triangle F') < 2\varepsilon.
\end{align*}
Combining the two perturbation bounds by the triangle inequality,
\begin{align*}
D_N(E, F) \leq D_N(E', F') + 4\varepsilon.
\end{align*}
By Step 2 applied to $E', F' \in \mathcal{A}$, there exists $N_0 \in \mathbb{N}$ such that $D_N(E', F') = 0$ for every $N \geq N_0$. Therefore
\begin{align*}
\limsup_{N \to \infty} D_N(E, F) \leq 4\varepsilon.
\end{align*}
Since $\varepsilon > 0$ was arbitrary and $D_N(E, F) \geq 0$,
\begin{align*}
\lim_{N \to \infty} \left|\mu(\sigma^{-N}E \cap F) - \mu(E)\mu(F)\right| = 0.
\end{align*}
[guided]
The cylinder algebra $\mathcal{A}$ is the natural class for which mixing is checkable by hand: its elements depend on finitely many coordinates, and disjoint coordinate sets give exact independence under the product measure. We must transfer this property to all of $\mathcal{B}$.
The vehicle is the [Carathéodory Approximation Theorem](/theorems/???). Its hypotheses, all of which we have verified, are: (i) $\mathcal{A}$ is an algebra (Step 1, items (i), (ii)); (ii) $\sigma(\mathcal{A}) = \mathcal{B}$ (Step 1, item (iii)); (iii) $\mu$ is finite, in fact a probability measure. The conclusion is that for every $G \in \mathcal{B}$ and every $\delta > 0$, there exists $G' \in \mathcal{A}$ with $\mu(G \triangle G') < \delta$. Pick such $E'$ for $E$ and $F'$ for $F$ with $\delta = \varepsilon$.
Define the mixing error $D_N(E, F)$ as above. The strategy is to bound $|D_N(E, F) - D_N(E', F')|$ by $O(\varepsilon)$ uniformly in $N$, then kill $D_N(E', F')$ in the limit via Step 2.
*Intersection term.* Because $\sigma$ is measure-preserving and preimages commute with set operations, the symmetric difference is preserved in measure:
\begin{align*}
\mu\bigl((\sigma^{-N}E) \triangle (\sigma^{-N}E')\bigr) = \mu(\sigma^{-N}(E \triangle E')) = \mu(E \triangle E') < \varepsilon.
\end{align*}
A general bound on intersections is
\begin{align*}
(G \cap H) \triangle (G' \cap H') \subseteq (G \triangle G') \cup (H \triangle H'),
\end{align*}
verified pointwise: if $\omega$ lies in $G \cap H$ but not in $G' \cap H'$, then $\omega \in G$ and $\omega \in H$, and either $\omega \notin G'$ (so $\omega \in G \triangle G'$) or $\omega \notin H'$ (so $\omega \in H \triangle H'$); the case $\omega \in (G' \cap H') \setminus (G \cap H)$ is symmetric. Taking $G = \sigma^{-N}E$, $G' = \sigma^{-N}E'$, $H = F$, $H' = F'$ and using subadditivity of $\mu$ together with $|\mu(X) - \mu(Y)| \leq \mu(X \triangle Y)$,
\begin{align*}
\left|\mu(\sigma^{-N}E \cap F) - \mu(\sigma^{-N}E' \cap F')\right| < 2\varepsilon.
\end{align*}
*Product term.* Note that $|\mu(G) - \mu(G')| \leq \mu(G \triangle G')$ because $G \setminus G'$ and $G' \setminus G$ are subsets of $G \triangle G'$. Since all measures lie in $[0,1]$,
\begin{align*}
|\mu(E)\mu(F) - \mu(E')\mu(F')|
&\leq |\mu(E) - \mu(E')|\mu(F) + \mu(E')|\mu(F) - \mu(F')| \\
&\leq \mu(E \triangle E') + \mu(F \triangle F') < 2\varepsilon.
\end{align*}
Combining via the triangle inequality,
\begin{align*}
D_N(E, F) \leq D_N(E', F') + 4\varepsilon.
\end{align*}
Step 2 gives $D_N(E', F') = 0$ for all sufficiently large $N$, so $\limsup_N D_N(E, F) \leq 4\varepsilon$. Since $\varepsilon$ was arbitrary and $D_N(E, F)$ is nonnegative, $\limsup_N D_N(E, F) = 0$, so the limit exists and equals $0$. The constant $4$ is harmless because it does not depend on $N$.
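For a sense of what the approximation step looks like on a set outside $\mathcal{A}$, the following sketch assumes $k = 2$, $S = \{0, 1\}$, $\nu(\{i\}) = p_i$ and $0 < p_0 < 1$ (illustrative assumptions only). Let $E$ be the event that the first index $n \geq 0$ with $x_n = 1$ is even, and let $E'_m \in \mathcal{A}$ be the event that this first index is even and smaller than $2m$, which depends only on the coordinates $0, \dots, 2m - 1$. Then $E'_m \subseteq E$ and
\begin{align*}
\mu(E) &= \sum_{j=0}^{\infty} p_0^{2j} p_1 = \frac{p_1}{1 - p_0^2} = \frac{1}{1 + p_0}, \\
\mu(E \triangle E'_m) &= \mu(E \setminus E'_m) = \sum_{j=m}^{\infty} p_0^{2j} p_1 = \frac{p_0^{2m}}{1 + p_0} \leq p_0^{2m}.
\end{align*}
So any $m$ with $p_0^{2m} < \varepsilon$ gives an admissible $E' = E'_m$; the theorem guarantees that such approximants exist for every measurable set, without producing them explicitly.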
[/guided]
[/step]
[step:Conclude strong mixing, weak mixing, and ergodicity]
By Step 3, for every $E, F \in \mathcal{B}$,
\begin{align*}
\lim_{N \to \infty} \mu(\sigma^{-N}E \cap F) = \mu(E)\mu(F).
\end{align*}
This is the definition of [strong mixing](/page/Strong%20Mixing) for the measure-preserving system $(\Omega, \mathcal{B}, \mu, \sigma)$. Hence $B(p_0, \dots, p_{k-1})$ is strongly mixing.
For every $E, F \in \mathcal{B}$ set $a_N := \left|\mu(\sigma^{-N}E \cap F) - \mu(E)\mu(F)\right|$. By Step 3, $a_N \to 0$. The [Cesàro mean](/page/Cesaro%20Mean) of a convergent nonnegative real sequence converges to the same limit, so
\begin{align*}
\frac{1}{M}\sum_{N=0}^{M-1} \left|\mu(\sigma^{-N}E \cap F) - \mu(E)\mu(F)\right| \to 0 \qquad (M \to \infty).
\end{align*}
This is the definition of [weak mixing](/page/Weak%20Mixing) of $\sigma$.
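As a sanity check on the Cesàro step, consider the illustrative case $k = 2$, $S = \{0, 1\}$, $\nu(\{i\}) = p_i$ (an assumed labelling), with $E = \{x : x_0 = 0\}$ and $F = \{x : x_0 = 0, \ x_1 = 1\}$. Then $\sigma^{-N}E = \{x : x_N = 0\}$, $\mu(E)\mu(F) = p_0^2 p_1$, and with $a_N$ as above:
\begin{align*}
a_0 &= |p_0 p_1 - p_0^2 p_1| = p_0 p_1^2, \qquad a_1 = |0 - p_0^2 p_1| = p_0^2 p_1, \qquad a_N = 0 \quad (N \geq 2), \\
\frac{1}{M}\sum_{N=0}^{M-1} a_N &= \frac{p_0 p_1 (p_1 + p_0)}{M} = \frac{p_0 p_1}{M} \to 0 \qquad (M \geq 2, \ M \to \infty).
\end{align*}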
Finally, let $E \in \mathcal{B}$ be $\sigma$-invariant, i.e. $\sigma^{-1}E = E$. Then $\sigma^{-N}E = E$ for every $N \in \mathbb{N}$, so strong mixing applied to the pair $(E, E)$ gives
\begin{align*}
\mu(E) = \mu(E \cap E) = \lim_{N \to \infty} \mu(\sigma^{-N}E \cap E) = \mu(E)^2.
\end{align*}
The equation $\mu(E) = \mu(E)^2$ in $[0, 1]$ forces $\mu(E) \in \{0, 1\}$. Every $\sigma$-invariant measurable set therefore has measure $0$ or $1$, which is the definition of [ergodicity](/page/Ergodicity).
This proves that every Bernoulli shift $B(p_0, \dots, p_{k-1})$ with $k \geq 1$ and probability vector $(p_0, \dots, p_{k-1})$ is strongly mixing, and in particular weakly mixing and ergodic.
[/step]