[proofplan]
Fix the probability-preserving system $(X, \mathcal{B}, \mu, T)$ from the theorem statement. We prove the two implications separately. For strong mixing implies weak mixing: strong mixing gives ordinary convergence $\mu(T^{-n}A \cap B) \to \mu(A)\mu(B)$ for every $A, B \in \mathcal{B}$, so the absolute deviations form a non-negative null sequence. The Cesàro averages of such a sequence also tend to zero, and the vanishing of these averaged deviations is the definition of weak mixing. For weak mixing implies ergodicity: we apply the weak-mixing condition to an invariant set $A$ and its complement $A^c$. Invariance gives $T^{-n}A = A$, hence $T^{-n}A \cap A^c = \varnothing$, for every $n \in \mathbb{N} \cup \{0\}$, so each Cesàro average equals the constant $\mu(A)\mu(A^c)$, which must therefore vanish; because $\mu$ is a probability measure, this forces $\mu(A) \in \{0, 1\}$.
[/proofplan]
[step:Pass from strong mixing to weak mixing via Cesàro averaging]
Assume that $T$ is strongly mixing. Let $A, B \in \mathcal{B}$ be arbitrary measurable sets. Define the deviation sequence
\begin{align*}
a: \mathbb{N} \cup \{0\} &\to [0, \infty), \\
n &\mapsto |\mu(T^{-n}A \cap B) - \mu(A)\mu(B)|.
\end{align*}
By the definition of [strong mixing](/page/Strong%20Mixing) of $T$, $\mu(T^{-n}A \cap B) \to \mu(A)\mu(B)$ as $n \to \infty$, which is the statement $a(n) \to 0$.
We show that the Cesàro averages of $a$ converge to $0$. Let $\varepsilon > 0$. Since $a(n) \to 0$, there exists $N \in \mathbb{N}$ such that $a(n) < \varepsilon$ for every $n \geq N$. Define the finite constant
\begin{align*}
C_N := \sum_{n=0}^{N-1} a(n) \in [0, \infty),
\end{align*}
which is finite as a sum of $N$ terms with each $a(n) \in [0, 1]$ (both $\mu(T^{-n}A \cap B)$ and $\mu(A)\mu(B)$ lie in $[0, 1]$, as $\mu$ is a probability measure), so in fact $C_N \leq N$. For every integer $M > N$,
\begin{align*}
0 \leq \frac{1}{M} \sum_{n=0}^{M-1} a(n)
&= \frac{1}{M} \sum_{n=0}^{N-1} a(n) + \frac{1}{M} \sum_{n=N}^{M-1} a(n) \\
&\leq \frac{C_N}{M} + \frac{M - N}{M} \varepsilon \\
&\leq \frac{C_N}{M} + \varepsilon.
\end{align*}
Since $C_N$ is fixed, $C_N / M \to 0$ as $M \to \infty$. Taking $\limsup_{M \to \infty}$ on both sides,
\begin{align*}
0 \leq \limsup_{M \to \infty} \frac{1}{M} \sum_{n=0}^{M-1} a(n) \leq \varepsilon.
\end{align*}
As $\varepsilon > 0$ was arbitrary, the $\limsup$ is $0$; the averages are non-negative, so the $\liminf$ is also $0$, and the limit exists:
\begin{align*}
\lim_{M \to \infty} \frac{1}{M} \sum_{n=0}^{M-1} |\mu(T^{-n}A \cap B) - \mu(A)\mu(B)| = 0.
\end{align*}
Since $A, B \in \mathcal{B}$ were arbitrary, this is the definition of [weak mixing](/page/Weak%20Mixing) of $T$. Therefore $T$ is weakly mixing.
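The splitting bound used above can be checked numerically. The following is a minimal sketch, not part of the proof: the sequence `a(n) = 1/(n+1)` is a hypothetical stand-in for the deviation sequence (any non-negative null sequence bounded by $1$ would do), and the helper names `cesaro_average` and `threshold` are illustrative.

```python
from fractions import Fraction


def a(n):
    # Hypothetical stand-in for the deviation sequence:
    # non-negative, bounded by 1, and tending to 0.
    return Fraction(1, n + 1)


def cesaro_average(seq, M):
    # (1/M) * sum_{n=0}^{M-1} seq(n), computed exactly.
    return sum(seq(n) for n in range(M)) / M


def threshold(eps):
    # Smallest N >= 1 with a(N) < eps. Because this particular a is
    # decreasing, a(n) < eps then holds for every n >= N.
    N = 1
    while a(N) >= eps:
        N += 1
    return N


eps = Fraction(1, 10)
N = threshold(eps)
C_N = sum(a(n) for n in range(N))  # the fixed constant from the proof

for M in (N + 1, 10 * N, 100 * N):
    avg = cesaro_average(a, M)
    bound = C_N / M + eps
    # The inequality 0 <= average <= C_N / M + eps from the proof.
    assert 0 <= avg <= bound
    print(M, float(avg), float(bound))
```

As $M$ grows, the printed bound approaches $\varepsilon$ while the average keeps decreasing, which is the content of the $\limsup$ estimate.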
[guided]
We are given strong mixing of $T$: for every pair $A, B \in \mathcal{B}$, the correlation $\mu(T^{-n}A \cap B)$ converges to the product $\mu(A)\mu(B)$ as $n \to \infty$. Weak mixing requires the same correlations to converge to $\mu(A)\mu(B)$ in the weaker Cesàro sense: the time-averaged absolute deviation must vanish. The strategy is purely analytic: take the strong-mixing convergence and Cesàro-average it. No hypothesis on $T$ beyond strong mixing is used; even measure-preservation enters only through the definition of strong mixing itself.
Fix $A, B \in \mathcal{B}$ and define the deviation sequence
\begin{align*}
a: \mathbb{N} \cup \{0\} &\to [0, \infty), \\
n &\mapsto |\mu(T^{-n}A \cap B) - \mu(A)\mu(B)|.
\end{align*}
Strong mixing of $T$ states precisely that $a(n) \to 0$. We want to conclude that the Cesàro averages $\frac{1}{M} \sum_{n=0}^{M-1} a(n)$ also tend to $0$. This is the classical Cesàro-mean fact applied to a non-negative null sequence, which we prove directly so that the argument is self-contained.
Fix $\varepsilon > 0$. By the definition of $a(n) \to 0$, there exists an integer $N \geq 1$ such that
\begin{align*}
a(n) < \varepsilon \quad \text{for every } n \geq N.
\end{align*}
The first $N$ terms are controlled by a single constant,
\begin{align*}
C_N := \sum_{n=0}^{N-1} a(n).
\end{align*}
Each $a(n) \leq 1$, since $\mu(T^{-n}A \cap B) \in [0, 1]$ and $\mu(A)\mu(B) \in [0, 1]$ (the probability hypothesis $\mu(X) = 1$ is used here to bound the measures by $1$), so $C_N \leq N < \infty$.
For $M > N$, split the sum at index $N$:
\begin{align*}
\frac{1}{M} \sum_{n=0}^{M-1} a(n)
&= \frac{1}{M} \sum_{n=0}^{N-1} a(n) + \frac{1}{M} \sum_{n=N}^{M-1} a(n) \\
&\leq \frac{C_N}{M} + \frac{1}{M}(M - N)\varepsilon \\
&\leq \frac{C_N}{M} + \varepsilon.
\end{align*}
The first term $C_N / M$ tends to $0$ as $M \to \infty$ because $C_N$ is a fixed finite constant. Therefore
\begin{align*}
\limsup_{M \to \infty} \frac{1}{M} \sum_{n=0}^{M-1} a(n) \leq \varepsilon.
\end{align*}
The averages are non-negative, so their $\liminf$ is at least $0$. Since $\varepsilon > 0$ was arbitrary, the $\limsup$ is at most $0$, so the limit exists and equals $0$:
\begin{align*}
\lim_{M \to \infty} \frac{1}{M} \sum_{n=0}^{M-1} |\mu(T^{-n}A \cap B) - \mu(A)\mu(B)| = 0.
\end{align*}
This is the defining property of [weak mixing](/page/Weak%20Mixing). Since $A, B \in \mathcal{B}$ were arbitrary, $T$ is weakly mixing.
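For a concrete deviation sequence, one can take an example that is not part of the statement above: the doubling map $T(x) = 2x \bmod 1$ on $[0, 1)$ with Lebesgue measure, a standard strongly mixing system. The sketch below assumes $A$ and $B$ are half-open intervals with rational endpoints and computes $\mu(T^{-n}A \cap B)$ exactly, using the fact that $T^{-n}[a, b)$ is the union of the $2^n$ intervals $[(k+a)/2^n, (k+b)/2^n)$. The function names are illustrative.

```python
from fractions import Fraction


def overlap(lo1, hi1, lo2, hi2):
    # Length of the intersection of [lo1, hi1) and [lo2, hi2).
    return max(Fraction(0), min(hi1, hi2) - max(lo1, lo2))


def correlation(n, A, B):
    # mu(T^{-n}A ∩ B) for the doubling map and Lebesgue measure.
    # T^{-n}[a, b) consists of 2^n scaled copies of [a, b).
    a_lo, a_hi = A
    b_lo, b_hi = B
    scale = 2 ** n
    return sum(
        overlap((k + a_lo) / scale, (k + a_hi) / scale, b_lo, b_hi)
        for k in range(scale)
    )


def deviation(n, A, B):
    # a(n) = |mu(T^{-n}A ∩ B) - mu(A) mu(B)| for intervals A and B.
    mu_A = A[1] - A[0]
    mu_B = B[1] - B[0]
    return abs(correlation(n, A, B) - mu_A * mu_B)


A = (Fraction(0), Fraction(1, 2))
B = (Fraction(0), Fraction(1, 3))

devs = [deviation(n, A, B) for n in range(12)]
for M in (1, 4, 12):
    # Cesàro average of the first M deviations.
    print(M, sum(devs[:M]) / M)
```

For this choice of intervals the deviations shrink geometrically, so both $a(n)$ and its Cesàro averages visibly tend to $0$.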
[/guided]
[/step]
[step:Pass from weak mixing to ergodicity via an invariant set and its complement]
Assume that $T$ is weakly mixing. We show that every $T$-invariant measurable set has measure $0$ or $1$.
Let $A \in \mathcal{B}$ be $T$-invariant, meaning $T^{-1}A = A$. Define the complement
\begin{align*}
A^c := X \setminus A \in \mathcal{B}.
\end{align*}
We first prove by induction on $n$ that $T^{-n}A = A$ for every $n \in \mathbb{N} \cup \{0\}$. For the base case $n = 0$, we have $T^{-0}A = (\operatorname{id}_X)^{-1}A = A$ by the convention $T^0 = \operatorname{id}_X$; this case is needed because the Cesàro sums defining weak mixing begin at $n = 0$. For the inductive step, assume $T^{-n}A = A$ for some $n \geq 0$. Then
\begin{align*}
T^{-(n+1)}A = T^{-1}(T^{-n}A) = T^{-1}A = A,
\end{align*}
using the inductive hypothesis in the second equality and the invariance $T^{-1}A = A$ in the third. By induction, $T^{-n}A = A$ for every $n \in \mathbb{N} \cup \{0\}$.
Consequently, for every $n \in \mathbb{N} \cup \{0\}$,
\begin{align*}
T^{-n}A \cap A^c = A \cap A^c = \varnothing,
\end{align*}
so
\begin{align*}
\mu(T^{-n}A \cap A^c) = \mu(\varnothing) = 0.
\end{align*}
Applying the definition of [weak mixing](/page/Weak%20Mixing) of $T$ to the ordered pair $(A, A^c) \in \mathcal{B} \times \mathcal{B}$,
\begin{align*}
0
&= \lim_{M \to \infty} \frac{1}{M} \sum_{n=0}^{M-1} |\mu(T^{-n}A \cap A^c) - \mu(A)\mu(A^c)| \\
&= \lim_{M \to \infty} \frac{1}{M} \sum_{n=0}^{M-1} |0 - \mu(A)\mu(A^c)| \\
&= \mu(A)\mu(A^c),
\end{align*}
where the last equality holds because $|0 - \mu(A)\mu(A^c)| = \mu(A)\mu(A^c)$ (the product is non-negative), so the summand is a constant independent of $n$ and each Cesàro average equals that constant.
Since $(X, \mathcal{B}, \mu)$ is a probability space, $\mu(X) = 1$. From the disjoint decomposition $X = A \sqcup A^c$ and finite additivity of $\mu$,
\begin{align*}
\mu(A^c) = \mu(X) - \mu(A) = 1 - \mu(A).
\end{align*}
Substituting,
\begin{align*}
\mu(A)(1 - \mu(A)) = 0.
\end{align*}
Since $0 \leq \mu(A) \leq 1$, the only solutions are $\mu(A) = 0$ or $\mu(A) = 1$.
Every $T$-invariant measurable set has $\mu$-measure $0$ or $1$, which is the definition of [ergodicity](/page/Ergodic%20Transformation) for the measure-preserving transformation $T$. Therefore $T$ is ergodic.
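The contrapositive can be seen on a toy system. The sketch below is an illustration under stated assumptions, not part of the proof: $X = \{0, \dots, 5\}$ with the uniform probability measure, and $T$ a hypothetical permutation made of two disjoint 3-cycles. The set $A = \{0, 1, 2\}$ is invariant with $\mu(A) = 1/2$, so the Cesàro average for the pair $(A, A^c)$ stays at $\mu(A)\mu(A^c) = 1/4$ and weak mixing fails.

```python
from fractions import Fraction

X = range(6)
# Hypothetical permutation with two disjoint 3-cycles; it preserves
# the uniform measure on X.
T = {0: 1, 1: 2, 2: 0, 3: 4, 4: 5, 5: 3}


def mu(S):
    # Uniform probability measure on X.
    return Fraction(len(S), len(X))


def preimage(S):
    # T^{-1}S = {x in X : T(x) in S}.
    return {x for x in X if T[x] in S}


def iterated_preimage(S, n):
    # T^{-n}S, with T^{-0}S = S.
    for _ in range(n):
        S = preimage(S)
    return set(S)


A = {0, 1, 2}
A_c = set(X) - A


def cesaro_deviation(M):
    # (1/M) * sum_{n<M} |mu(T^{-n}A ∩ A^c) - mu(A) mu(A^c)|.
    total = sum(
        abs(mu(iterated_preimage(A, n) & A_c) - mu(A) * mu(A_c))
        for n in range(M)
    )
    return total / M


for M in (1, 5, 50):
    print(M, cesaro_deviation(M))
```

Every average equals the same constant, matching the computation in the proof, and that constant is nonzero exactly because $\mu(A) \notin \{0, 1\}$.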
[guided]
We are given that $T$ is weakly mixing on the probability space $(X, \mathcal{B}, \mu)$. Ergodicity is the statement that every $T$-invariant measurable set has $\mu$-measure $0$ or $1$. Fix an invariant set $A \in \mathcal{B}$; the goal is to show $\mu(A) \in \{0, 1\}$.
The key idea is to apply weak mixing to the pair $(A, A^c)$ rather than $(A, A)$. Invariance of $A$ gives $T^{-n}A = A$ for every $n$, and $A$ and $A^c$ are disjoint, so the intersection $T^{-n}A \cap A^c$ is empty for every $n$. The weak-mixing Cesàro average, which compares the correlations $\mu(T^{-n}A \cap A^c)$ to the product $\mu(A)\mu(A^c)$, then reduces to the constant $\mu(A)\mu(A^c)$, and weak mixing forces this constant to vanish. The probability hypothesis $\mu(X) = 1$ is what converts the vanishing product into the dichotomy $\mu(A) \in \{0,1\}$.
Define $A^c := X \setminus A \in \mathcal{B}$. The proof has three logical pieces: (i) iterated preimages of an invariant set are still that set; (ii) intersection with $A^c$ is therefore empty at every iterate; (iii) the weak-mixing average then equates $\mu(A)\mu(A^c)$ to $0$.
*(i) Iterated invariance.* We show by induction on $n$ that $T^{-n}A = A$ for every $n \in \mathbb{N} \cup \{0\}$. The base case $n = 0$ is the identity $T^{-0}A = A$, which uses the convention $T^0 = \operatorname{id}_X$. We include $n = 0$ explicitly because the Cesàro sums defining weak mixing begin at $n = 0$, so we will need $T^{-0}A \cap A^c = \varnothing$. For the inductive step, suppose $T^{-n}A = A$. Then
\begin{align*}
T^{-(n+1)}A = T^{-1}(T^{-n}A) = T^{-1}A = A,
\end{align*}
where the second equality is the inductive hypothesis and the third is the assumed invariance $T^{-1}A = A$. Induction gives $T^{-n}A = A$ for every $n \geq 0$.
*(ii) Empty intersections.* For every $n \in \mathbb{N} \cup \{0\}$,
\begin{align*}
T^{-n}A \cap A^c = A \cap A^c = \varnothing,
\end{align*}
so $\mu(T^{-n}A \cap A^c) = 0$.
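Parts (i) and (ii) can be checked on a small finite system. The sketch below is illustrative only and assumes a hypothetical permutation $T$ of $\{0, \dots, 5\}$ with two 3-cycles. It confirms that the invariant set $A = \{0, 1, 2\}$ satisfies $T^{-n}A = A$ and $T^{-n}A \cap A^c = \varnothing$ for each tested $n$, including $n = 0$. For contrast, a non-invariant set $B$ fails both properties, which shows that invariance, and not just disjointness from the complement, is what drives the argument.

```python
X = set(range(6))
# Hypothetical permutation with cycles (0 1 2) and (3 4 5).
T = {0: 1, 1: 2, 2: 0, 3: 4, 4: 5, 5: 3}


def preimage(S):
    # T^{-1}S = {x in X : T(x) in S}.
    return {x for x in X if T[x] in S}


def iterated_preimage(S, n):
    # T^{-n}S, with the convention T^{-0}S = S.
    result = set(S)
    for _ in range(n):
        result = preimage(result)
    return result


A = {0, 1, 2}  # invariant: T^{-1}A = A
B = {0, 3}     # not invariant

for n in range(6):
    pre_A = iterated_preimage(A, n)
    pre_B = iterated_preimage(B, n)
    # For the invariant set, the intersection with the complement is
    # empty at every iterate; for B it can be non-empty.
    print(n, pre_A == A, pre_A & (X - A), pre_B & (X - B))
```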
*(iii) Apply weak mixing to $(A, A^c)$.* By the definition of weak mixing of $T$ applied to the ordered pair $(A, A^c) \in \mathcal{B} \times \mathcal{B}$,
\begin{align*}
\lim_{M \to \infty} \frac{1}{M} \sum_{n=0}^{M-1} |\mu(T^{-n}A \cap A^c) - \mu(A)\mu(A^c)| = 0.
\end{align*}
Substituting $\mu(T^{-n}A \cap A^c) = 0$ from part (ii), the summand is the constant $|\mu(A)\mu(A^c)| = \mu(A)\mu(A^c)$ (the product is non-negative), and the Cesàro average of a constant equals that constant:
\begin{align*}
\lim_{M \to \infty} \frac{1}{M} \sum_{n=0}^{M-1} \mu(A)\mu(A^c) = \mu(A)\mu(A^c).
\end{align*}
Combining the two displays, $\mu(A)\mu(A^c) = 0$.
Since $(X, \mathcal{B}, \mu)$ is a probability space, $\mu(X) = 1$. The disjoint decomposition $X = A \sqcup A^c$ and finite additivity give
\begin{align*}
\mu(A^c) = 1 - \mu(A).
\end{align*}
Therefore
\begin{align*}
\mu(A)(1 - \mu(A)) = 0.
\end{align*}
A product of real numbers vanishes only when at least one factor vanishes, so $\mu(A) = 0$ or $\mu(A) = 1$.
Every $T$-invariant measurable set has measure $0$ or $1$. By the definition of [ergodicity](/page/Ergodic%20Transformation), $T$ is ergodic. This completes the second implication and hence the mixing hierarchy.
[/guided]
[/step]