[guided]We are given that $T$ is weakly mixing on the probability space $(X, \mathcal{B}, \mu)$. Ergodicity is the statement that every $T$-invariant measurable set has $\mu$-measure $0$ or $1$. Fix an invariant set $A \in \mathcal{B}$; the goal is to show $\mu(A) \in \{0, 1\}$.
The key idea is to apply weak mixing to the pair $(A, A^c)$ rather than $(A, A)$. Since $A$ is invariant, $T^{-n}A = A$ for every $n$, and since $A$ and $A^c$ are disjoint, the intersection $T^{-n}A \cap A^c$ is empty for every $n$. Every summand of the weak-mixing Cesàro average, which compares the correlations $\mu(T^{-n}A \cap A^c)$ to the product $\mu(A)\mu(A^c)$, then equals the constant $\mu(A)\mu(A^c)$, and weak mixing forces this constant to vanish. The probability hypothesis $\mu(X) = 1$ is what converts the vanishing product into the dichotomy $\mu(A) \in \{0,1\}$.
Define $A^c := X \setminus A \in \mathcal{B}$. The proof has three logical pieces: (i) iterated preimages of an invariant set are still that set; (ii) intersection with $A^c$ is therefore empty at every iterate; (iii) the weak-mixing average then equates $\mu(A)\mu(A^c)$ to $0$.
*(i) Iterated invariance.* We show by induction on $n$ that $T^{-n}A = A$ for every $n \in \mathbb{N} \cup \{0\}$. The base case $n = 0$ is the identity $T^{-0}A = A$, which uses the convention $T^0 = \operatorname{id}_X$. We include $n = 0$ explicitly because the Cesàro sums defining weak mixing begin at $n = 0$, so we will need $T^{-0}A \cap A^c = \varnothing$. For the inductive step, suppose $T^{-n}A = A$. Then
\begin{align*}
T^{-(n+1)}A = T^{-1}(T^{-n}A) = T^{-1}A = A,
\end{align*}
where the second equality is the inductive hypothesis and the third is the assumed invariance $T^{-1}A = A$. Induction gives $T^{-n}A = A$ for every $n \geq 0$.
*(ii) Empty intersections.* For every $n \in \mathbb{N} \cup \{0\}$,
\begin{align*}
T^{-n}A \cap A^c = A \cap A^c = \varnothing,
\end{align*}
so $\mu(T^{-n}A \cap A^c) = 0$.
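Parts (i) and (ii) can be checked by hand on a toy system. The sketch below is an illustration only, not part of the proof. It assumes a hypothetical four-point space $X = \{0, 1, 2, 3\}$ and a hypothetical map $T$ that swaps $0$ with $1$ and $2$ with $3$, so that $A = \{0, 1\}$ is invariant. The names `T`, `preimage`, and `iterated_preimage` are introduced here for illustration.

```python
# Hypothetical toy system (an assumption, not from the text):
# X = {0, 1, 2, 3}, T swaps 0 <-> 1 and 2 <-> 3, so A = {0, 1} has T^{-1}A = A.
X = frozenset({0, 1, 2, 3})
T = {0: 1, 1: 0, 2: 3, 3: 2}


def preimage(S):
    """Return T^{-1}S = {x in X : T(x) in S}."""
    return frozenset(x for x in X if T[x] in S)


def iterated_preimage(S, n):
    """Return T^{-n}S by taking the preimage n times (T^{-0}S = S)."""
    for _ in range(n):
        S = preimage(S)
    return S


A = frozenset({0, 1})
A_c = X - A

# For n = 0, ..., 5 record whether (i) T^{-n}A == A and
# (ii) T^{-n}A intersected with A^c is empty.
checks = [
    (iterated_preimage(A, n) == A, len(iterated_preimage(A, n) & A_c) == 0)
    for n in range(6)
]
print(checks)
```

Both checks hold at every iterate, including $n = 0$, which mirrors the induction in (i) and the disjointness in (ii).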
*(iii) Apply weak mixing to $(A, A^c)$.* By the definition of weak mixing of $T$ applied to the ordered pair $(A, A^c) \in \mathcal{B} \times \mathcal{B}$,
\begin{align*}
\lim_{M \to \infty} \frac{1}{M} \sum_{n=0}^{M-1} |\mu(T^{-n}A \cap A^c) - \mu(A)\mu(A^c)| = 0.
\end{align*}
Substituting $\mu(T^{-n}A \cap A^c) = 0$ from part (ii), the summand is the constant $|\mu(A)\mu(A^c)| = \mu(A)\mu(A^c)$ (the product is non-negative), and the Cesàro average of a constant equals that constant:
\begin{align*}
\lim_{M \to \infty} \frac{1}{M} \sum_{n=0}^{M-1} \mu(A)\mu(A^c) = \mu(A)\mu(A^c).
\end{align*}
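By part (ii) the two displays compute the same limit: the first gives the value $0$ and the second gives $\mu(A)\mu(A^c)$. Hence $\mu(A)\mu(A^c) = 0$.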
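The contrapositive of this step can be seen numerically: if an invariant set has measure strictly between $0$ and $1$, the Cesàro average is stuck at the nonzero constant $\mu(A)\mu(A^c)$, so $T$ cannot be weakly mixing. The sketch below assumes a hypothetical four-point space with the uniform probability measure and a hypothetical permutation $T$ that swaps $0$ with $1$ and $2$ with $3$. The helper names `mu`, `preimage`, and `cesaro_average` are illustrative, not taken from the text.

```python
from fractions import Fraction

# Hypothetical toy system (an assumption, not from the text):
# uniform probability measure on X = {0, 1, 2, 3}; T swaps 0 <-> 1 and 2 <-> 3.
X = frozenset({0, 1, 2, 3})
T = {0: 1, 1: 0, 2: 3, 3: 2}


def mu(S):
    """Uniform probability measure of a subset S of X, as an exact fraction."""
    return Fraction(len(S), len(X))


def preimage(S):
    """Return T^{-1}S = {x in X : T(x) in S}."""
    return frozenset(x for x in X if T[x] in S)


def cesaro_average(A, B, M):
    """Return (1/M) * sum_{n=0}^{M-1} |mu(T^{-n}A & B) - mu(A) * mu(B)|."""
    total = Fraction(0)
    S = A  # S holds T^{-n}A, starting from T^{-0}A = A
    for _ in range(M):
        total += abs(mu(S & B) - mu(A) * mu(B))
        S = preimage(S)
    return total / M


A = frozenset({0, 1})  # invariant set with mu(A) = 1/2
A_c = X - A

averages = [cesaro_average(A, A_c, M) for M in (1, 10, 100)]
print(averages)
```

Each average equals $\mu(A)\mu(A^c) = 1/4$ regardless of $M$, so the limit is $1/4 \neq 0$. This toy $T$ is therefore not weakly mixing, which is consistent with the proof: it has an invariant set of measure $1/2$.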
Since $(X, \mathcal{B}, \mu)$ is a probability space, $\mu(X) = 1$. The disjoint decomposition $X = A \sqcup A^c$ and finite additivity give
\begin{align*}
\mu(A^c) = 1 - \mu(A).
\end{align*}
Therefore
\begin{align*}
\mu(A)(1 - \mu(A)) = 0.
\end{align*}
A product of real numbers vanishes only when at least one factor vanishes, so $\mu(A) = 0$ or $1 - \mu(A) = 0$; that is, $\mu(A) = 0$ or $\mu(A) = 1$.
Every $T$-invariant measurable set has measure $0$ or $1$. By the definition of [ergodicity](/page/Ergodic%20Transformation), $T$ is ergodic. This completes the second implication and hence the mixing hierarchy.[/guided]