[proofplan]
We prove both directions. For the ergodic-to-extreme direction, a convex decomposition of $\mu$ forces each component measure to be absolutely continuous with respect to $\mu$; the Radon-Nikodym density of one component is bounded and, by invariance, is fixed by composition with $T$ in $L^2(\mu)$. Ergodicity then forces this density to be constant, and the total mass condition forces the constant to be $1$. Conversely, if $\mu$ is not ergodic, a proper invariant set lets us condition $\mu$ on the set and on its complement, producing two distinct invariant probability measures whose convex combination is $\mu$.
[/proofplan]
[step:Record the invariant-set formulation used throughout]
Let $(X,\mathcal B)$ denote the measurable space and let
\begin{align*}
T:X&\to X
\end{align*}
be the measurable transformation underlying $\mathcal M_T$. For a probability measure $\nu:\mathcal B\to[0,1]$, the condition $\nu\in\mathcal M_T$ means
\begin{align*}
\nu(T^{-1}B)=\nu(B)\quad\text{for every }B\in\mathcal B.
\end{align*}
We use the measure-theoretic formulation of ergodicity: $\mu\in\mathcal M_T$ is ergodic if every $A\in\mathcal B$ satisfying
\begin{align*}
\mu(T^{-1}A\triangle A)=0
\end{align*}
has $\mu(A)\in\{0,1\}$, where $\triangle$ denotes symmetric difference.
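As an illustration, consider the following toy system, which is an assumption made only for this example and is not part of the theorem: $X=\{0,1\}$ with $\mathcal B$ the power set and $T$ the identity map. Write $\delta_x$ for the point mass at $x$. Then $T^{-1}B=B$ for every $B\in\mathcal B$, so every probability measure is invariant and
\begin{align*}
\mathcal M_T=\{p\delta_0+(1-p)\delta_1:p\in[0,1]\}.
\end{align*}
Every $A\in\mathcal B$ satisfies $\mu(T^{-1}A\triangle A)=\mu(\varnothing)=0$, so $\mu=p\delta_0+(1-p)\delta_1$ is ergodic exactly when
\begin{align*}
\mu(\{0\})=p\in\{0,1\},
\end{align*}
that is, when $\mu\in\{\delta_0,\delta_1\}$. These are the two endpoints of the segment $\mathcal M_T$, in line with the equivalence being proved.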
[/step]
[step:Prove that dominated invariant densities are constant under ergodicity]
Assume that $\mu$ is ergodic. Let $C\in(0,\infty)$ and let $\nu:\mathcal B\to[0,1]$ be a probability measure such that $\nu\in\mathcal M_T$ and
\begin{align*}
\nu(B)\leq C\mu(B)\quad\text{for every }B\in\mathcal B.
\end{align*}
Then $\nu\ll\mu$. By the [Radon-Nikodym Theorem](/theorems/1247), there is a $\mathcal B$-measurable function
\begin{align*}
h:X&\to[0,\infty)\\
x&\mapsto h(x)
\end{align*}
such that
\begin{align*}
\nu(B)=\int_B h(x)\,d\mu(x)\quad\text{for every }B\in\mathcal B.
\end{align*}
For $\varepsilon\in(0,\infty)$, define $H_\varepsilon:=\{x\in X:h(x)>C+\varepsilon\}$. The domination of $\nu$ by $C\mu$ gives
\begin{align*}
(C+\varepsilon)\mu(H_\varepsilon)
\leq \int_{H_\varepsilon}h(x)\,d\mu(x)
=\nu(H_\varepsilon)
\leq C\mu(H_\varepsilon),
\end{align*}
so $\varepsilon\mu(H_\varepsilon)\leq 0$ and hence $\mu(H_\varepsilon)=0$. Since $\{x\in X:h(x)>C\}=\bigcup_{\varepsilon\in\mathbb Q\cap(0,\infty)}H_\varepsilon$ is a countable union of $\mu$-null sets, $h\leq C$ $\mu$-almost everywhere. After redefining $h$ on a $\mu$-null set, we may assume $0\leq h\leq C$ everywhere.
For every bounded $\mathcal B$-measurable function $\varphi:X\to\mathbb R$, invariance of $\nu$ gives
\begin{align*}
\int_X \varphi(x)h(x)\,d\mu(x)
=\int_X \varphi(x)\,d\nu(x)
=\int_X \varphi(T(x))\,d\nu(x)
=\int_X \varphi(T(x))h(x)\,d\mu(x),
\end{align*}
where the first and third identities use $d\nu=h\,d\mu$, and the middle identity holds for indicators $\varphi=\mathbb 1_B$ by the defining invariance of $\nu$ and extends to bounded measurable functions by the monotone class theorem. Taking $\varphi=h$, which is bounded, gives
\begin{align*}
\int_X h(x)^2\,d\mu(x)=\int_X h(T(x))h(x)\,d\mu(x).
\end{align*}
Since $\mu\in\mathcal M_T$ and $h^2:X\to[0,\infty)$ is bounded and measurable,
\begin{align*}
\int_X h(T(x))^2\,d\mu(x)=\int_X h(x)^2\,d\mu(x).
\end{align*}
Therefore
\begin{align*}
\int_X\bigl(h(T(x))-h(x)\bigr)^2\,d\mu(x)
&=\int_X h(T(x))^2\,d\mu(x)-2\int_X h(T(x))h(x)\,d\mu(x)+\int_X h(x)^2\,d\mu(x)\\
&=\int_X h(x)^2\,d\mu(x)-2\int_X h(x)^2\,d\mu(x)+\int_X h(x)^2\,d\mu(x)\\
&=0.
\end{align*}
Thus $h\circ T=h$ $\mu$-almost everywhere.
[claim:Invariant measurable functions are constant under an ergodic measure]
Let $g:X\to\mathbb R$ be a bounded $\mathcal B$-measurable function satisfying $g\circ T=g$ $\mu$-almost everywhere. Then there is a constant $c\in\mathbb R$ such that $g=c$ $\mu$-almost everywhere.
[/claim]
[proof]
For each $q\in\mathbb Q$, define
\begin{align*}
E_q:=\{x\in X:g(x)>q\}.
\end{align*}
Since $g\circ T=g$ $\mu$-almost everywhere, the sets $T^{-1}E_q$ and $E_q$ differ only on a $\mu$-null set. Hence
\begin{align*}
\mu(T^{-1}E_q\triangle E_q)=0.
\end{align*}
By ergodicity, $\mu(E_q)\in\{0,1\}$ for every $q\in\mathbb Q$.
Choose $M\in(0,\infty)$ such that $|g(x)|\leq M$ for every $x\in X$. Define
\begin{align*}
c:=\inf\{q\in\mathbb Q:\mu(E_q)=0\}.
\end{align*}
This number is finite because $\mu(E_q)=0$ for every rational $q\geq M$ and $\mu(E_q)=1$ for every rational $q<-M$. If $q\in\mathbb Q$ and $q<c$, then $\mu(E_q)\neq 0$ by the definition of $c$, so $\mu(E_q)=1$ by the dichotomy above. If $q\in\mathbb Q$ and $q>c$, then there is $r\in\mathbb Q$ with $r<q$ and $\mu(E_r)=0$, so $E_q\subseteq E_r$ and $\mu(E_q)=0$.
Now
\begin{align*}
\{x\in X:g(x)>c\}=\bigcup_{\substack{q\in\mathbb Q\\ q>c}}E_q
\end{align*}
has $\mu$-measure $0$, and
\begin{align*}
\{x\in X:g(x)<c\}=\bigcup_{\substack{q\in\mathbb Q\\ q<c}}(X\setminus E_q)
\end{align*}
also has $\mu$-measure $0$. Therefore $\mu(\{x\in X:g(x)\neq c\})=0$.
[/proof]
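As a quick check of the definition of $c$, suppose for illustration that $g=3$ $\mu$-almost everywhere (the value $3$ is a hypothetical choice). Then
\begin{align*}
\mu(E_q)=\begin{cases}1,&q<3,\\0,&q\geq 3,\end{cases}
\qquad\text{so}\qquad
c=\inf\{q\in\mathbb Q:q\geq 3\}=3,
\end{align*}
and the construction recovers the almost-everywhere value of $g$.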
Applying the claim to $g=h$ gives $h=c$ $\mu$-almost everywhere for some $c\in\mathbb R$. Since both $\nu$ and $\mu$ are probability measures,
\begin{align*}
1=\nu(X)=\int_X h(x)\,d\mu(x)=\int_X c\,d\mu(x)=c\mu(X)=c.
\end{align*}
Thus $h=1$ $\mu$-almost everywhere, and consequently $\nu=\mu$.
[guided]
We prove a lemma that will be applied to one component of a convex decomposition. Assume $\mu$ is ergodic, let $C\in(0,\infty)$, and let $\nu:\mathcal B\to[0,1]$ be a probability measure satisfying $\nu\in\mathcal M_T$ and
\begin{align*}
\nu(B)\leq C\mu(B)\quad\text{for every }B\in\mathcal B.
\end{align*}
The domination implies absolute continuity: if $\mu(B)=0$, then $\nu(B)\leq C\mu(B)=0$, so $\nu(B)=0$. Hence $\nu\ll\mu$.
By the [Radon-Nikodym Theorem](/theorems/1247), there is a $\mathcal B$-measurable function
\begin{align*}
h:X&\to[0,\infty)\\
x&\mapsto h(x)
\end{align*}
such that
\begin{align*}
\nu(B)=\int_B h(x)\,d\mu(x)\quad\text{for every }B\in\mathcal B.
\end{align*}
We also need $h$ to be bounded, because later we will use $h$ itself as a test function. For $\varepsilon\in(0,\infty)$, define
\begin{align*}
H_\varepsilon:=\{x\in X:h(x)>C+\varepsilon\}.
\end{align*}
Then
\begin{align*}
(C+\varepsilon)\mu(H_\varepsilon)
\leq \int_{H_\varepsilon}h(x)\,d\mu(x)
=\nu(H_\varepsilon)
\leq C\mu(H_\varepsilon).
\end{align*}
Subtracting $C\mu(H_\varepsilon)$ from both ends gives $\varepsilon\mu(H_\varepsilon)\leq 0$, so $\mu(H_\varepsilon)=0$. Since $\{x\in X:h(x)>C\}$ is the countable union of the sets $H_\varepsilon$ over rational $\varepsilon>0$, it follows that $h\leq C$ $\mu$-almost everywhere. Redefining $h$ on a $\mu$-null set does not change the measure represented by $h\,d\mu$, so we may assume $0\leq h\leq C$ everywhere.
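The bound on $h$ uses only the domination hypothesis, not invariance or ergodicity. A small example, with all specific values chosen here purely for illustration: let $X=\{0,1\}$ with the power set as $\mathcal B$, let $\mu$ be the uniform measure, and let $\nu(\{0\})=3/4$, $\nu(\{1\})=1/4$. Then $\nu(B)\leq\tfrac32\mu(B)$ for every $B\subseteq X$, and the density is
\begin{align*}
h(0)=\frac{\nu(\{0\})}{\mu(\{0\})}=\frac{3/4}{1/2}=\frac32,
\qquad
h(1)=\frac{\nu(\{1\})}{\mu(\{1\})}=\frac{1/4}{1/2}=\frac12,
\end{align*}
so $0\leq h\leq C$ with $C=3/2$, as the argument predicts.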
The key point is to convert invariance of the measure $\nu$ into invariance of its density $h$. Let $\varphi:X\to\mathbb R$ be bounded and $\mathcal B$-measurable. Since $\nu\in\mathcal M_T$, the identity
\begin{align*}
\int_X \varphi(x)\,d\nu(x)=\int_X \varphi(T(x))\,d\nu(x)
\end{align*}
holds first for indicator functions $\varphi=\mathbb 1_B$ and then for bounded measurable $\varphi$ by the monotone class theorem. Substituting $d\nu=h\,d\mu$ gives
\begin{align*}
\int_X \varphi(x)h(x)\,d\mu(x)=\int_X \varphi(T(x))h(x)\,d\mu(x).
\end{align*}
Now choose $\varphi=h$. This is allowed because $h$ is bounded and measurable. We get
\begin{align*}
\int_X h(x)^2\,d\mu(x)=\int_X h(T(x))h(x)\,d\mu(x).
\end{align*}
Since $\mu\in\mathcal M_T$, applying invariance of $\mu$ to the bounded measurable function $h^2:X\to[0,\infty)$ gives
\begin{align*}
\int_X h(T(x))^2\,d\mu(x)=\int_X h(x)^2\,d\mu(x).
\end{align*}
Combining these two identities,
\begin{align*}
\int_X\bigl(h(T(x))-h(x)\bigr)^2\,d\mu(x)
&=\int_X h(T(x))^2\,d\mu(x)-2\int_X h(T(x))h(x)\,d\mu(x)+\int_X h(x)^2\,d\mu(x)\\
&=\int_X h(x)^2\,d\mu(x)-2\int_X h(x)^2\,d\mu(x)+\int_X h(x)^2\,d\mu(x)\\
&=0.
\end{align*}
A nonnegative function with integral $0$ is $0$ $\mu$-almost everywhere, so $h\circ T=h$ $\mu$-almost everywhere.
It remains to explain why an invariant measurable function is constant under ergodicity. Let $g:X\to\mathbb R$ be a bounded $\mathcal B$-measurable function satisfying $g\circ T=g$ $\mu$-almost everywhere. For each rational number $q\in\mathbb Q$, define
\begin{align*}
E_q:=\{x\in X:g(x)>q\}.
\end{align*}
Because $g(T(x))=g(x)$ outside a $\mu$-null set, membership in $E_q$ and membership in $T^{-1}E_q$ agree outside that null set. Thus
\begin{align*}
\mu(T^{-1}E_q\triangle E_q)=0.
\end{align*}
Ergodicity gives $\mu(E_q)\in\{0,1\}$ for every rational $q$.
Choose $M\in(0,\infty)$ such that $|g(x)|\leq M$ for every $x\in X$, and define
\begin{align*}
c:=\inf\{q\in\mathbb Q:\mu(E_q)=0\}.
\end{align*}
The set in the infimum is nonempty because $E_q=\varnothing$ for rational $q\geq M$, and it is bounded below because $E_q=X$ for rational $q<-M$. If $q$ is rational and $q<c$, then $q$ does not belong to the set in the infimum, so $\mu(E_q)\neq 0$ and ergodicity gives $\mu(E_q)=1$. If $q$ is rational and $q>c$, the definition of the infimum gives some rational $r<q$ with $\mu(E_r)=0$; since $E_q\subseteq E_r$, we get $\mu(E_q)=0$.
Therefore
\begin{align*}
\{x\in X:g(x)>c\}=\bigcup_{\substack{q\in\mathbb Q\\q>c}}E_q
\end{align*}
has measure $0$, and
\begin{align*}
\{x\in X:g(x)<c\}=\bigcup_{\substack{q\in\mathbb Q\\q<c}}(X\setminus E_q)
\end{align*}
has measure $0$. Hence $g=c$ $\mu$-almost everywhere. Applying this to $g=h$ gives $h=c$ $\mu$-almost everywhere. Since $\nu(X)=1$ and $\mu(X)=1$,
\begin{align*}
1=\nu(X)=\int_X h(x)\,d\mu(x)=\int_X c\,d\mu(x)=c.
\end{align*}
Thus $h=1$ $\mu$-almost everywhere, so $\nu=\mu$.
[/guided]
[/step]
[step:Use the constant-density result to rule out proper decompositions of an ergodic measure]
Assume $\mu$ is ergodic and suppose
\begin{align*}
\mu=t\mu_1+(1-t)\mu_2
\end{align*}
for some $t\in(0,1)$ and some $\mu_1,\mu_2\in\mathcal M_T$. For every $B\in\mathcal B$,
\begin{align*}
t\mu_1(B)\leq t\mu_1(B)+(1-t)\mu_2(B)=\mu(B),
\end{align*}
so
\begin{align*}
\mu_1(B)\leq \frac{1}{t}\mu(B).
\end{align*}
Applying the previous step with $\nu=\mu_1$ and $C=1/t$ gives $\mu_1=\mu$. Then, for every $B\in\mathcal B$,
\begin{align*}
(1-t)\mu_2(B)=\mu(B)-t\mu_1(B)=\mu(B)-t\mu(B)=(1-t)\mu(B).
\end{align*}
Since $1-t>0$, $\mu_2(B)=\mu(B)$ for every $B\in\mathcal B$, so $\mu_2=\mu$. Hence every convex decomposition of $\mu$ inside $\mathcal M_T$ is trivial, that is, $\mu_1=\mu_2=\mu$, and $\mu$ is an extreme point of $\mathcal M_T$.
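To see the constants in a concrete case, take the hypothetical weight $t=1/4$, so that $\mu=\tfrac14\mu_1+\tfrac34\mu_2$. The domination bound reads
\begin{align*}
\mu_1(B)\leq 4\mu(B)\quad\text{for every }B\in\mathcal B,
\end{align*}
so the previous step applies with $C=4$ and gives $\mu_1=\mu$. Substituting back,
\begin{align*}
\tfrac34\mu_2(B)=\mu(B)-\tfrac14\mu(B)=\tfrac34\mu(B),
\end{align*}
hence $\mu_2=\mu$ as well.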
[/step]
[step:Condition a nonergodic measure on a proper invariant set]
Assume now that $\mu$ is not ergodic. Then there exists $A\in\mathcal B$ such that
\begin{align*}
0<\mu(A)<1
\quad\text{and}\quad
\mu(T^{-1}A\triangle A)=0.
\end{align*}
Define
\begin{align*}
a&:=\mu(A),\\
A^c&:=X\setminus A,\\
b&:=\mu(A^c)=1-a.
\end{align*}
Then $a,b\in(0,1)$. Define set functions
\begin{align*}
\mu_A:\mathcal B&\to[0,1]\\
B&\mapsto \frac{\mu(B\cap A)}{a}
\end{align*}
and
\begin{align*}
\mu_{A^c}:\mathcal B&\to[0,1]\\
B&\mapsto \frac{\mu(B\cap A^c)}{b}.
\end{align*}
Both are probability measures, because intersection with a fixed measurable set preserves countable disjoint unions, and $\mu_A(X)=\mu_{A^c}(X)=1$.
For every $B\in\mathcal B$, the symmetric difference of $T^{-1}B\cap A$ and $T^{-1}B\cap T^{-1}A$ is contained in $A\triangle T^{-1}A$, which has $\mu$-measure $0$. Hence
\begin{align*}
\mu_A(T^{-1}B)
&=\frac{\mu(T^{-1}B\cap A)}{a}\\
&=\frac{\mu(T^{-1}B\cap T^{-1}A)}{a}\\
&=\frac{\mu(T^{-1}(B\cap A))}{a}\\
&=\frac{\mu(B\cap A)}{a}\\
&=\mu_A(B),
\end{align*}
where the fourth equality uses $\mu\in\mathcal M_T$. Also $T^{-1}A^c\triangle A^c=T^{-1}A\triangle A$, so the same computation gives
\begin{align*}
\mu_{A^c}(T^{-1}B)
&=\frac{\mu(T^{-1}B\cap A^c)}{b}\\
&=\frac{\mu(T^{-1}B\cap T^{-1}A^c)}{b}\\
&=\frac{\mu(T^{-1}(B\cap A^c))}{b}\\
&=\frac{\mu(B\cap A^c)}{b}\\
&=\mu_{A^c}(B).
\end{align*}
Therefore $\mu_A,\mu_{A^c}\in\mathcal M_T$.
[guided]
Because $\mu$ is not ergodic, there is a measurable set $A\in\mathcal B$ that is invariant up to a $\mu$-null set and has genuinely intermediate measure:
\begin{align*}
0<\mu(A)<1
\quad\text{and}\quad
\mu(T^{-1}A\triangle A)=0.
\end{align*}
Define
\begin{align*}
a&:=\mu(A),\\
A^c&:=X\setminus A,\\
b&:=\mu(A^c)=1-a.
\end{align*}
The inequalities above imply $a,b\in(0,1)$, so division by $a$ and by $b$ is valid.
We condition $\mu$ on $A$ and on $A^c$. Define
\begin{align*}
\mu_A:\mathcal B&\to[0,1]\\
B&\mapsto \frac{\mu(B\cap A)}{a}
\end{align*}
and
\begin{align*}
\mu_{A^c}:\mathcal B&\to[0,1]\\
B&\mapsto \frac{\mu(B\cap A^c)}{b}.
\end{align*}
These are probability measures: countable additivity follows from countable additivity of $\mu$ because intersections with $A$ and $A^c$ preserve disjoint unions, and
\begin{align*}
\mu_A(X)=\frac{\mu(A)}{a}=1,
\qquad
\mu_{A^c}(X)=\frac{\mu(A^c)}{b}=1.
\end{align*}
We now verify that these conditional measures are $T$-invariant. Fix $B\in\mathcal B$. Since $\mu(T^{-1}A\triangle A)=0$, replacing $A$ by $T^{-1}A$ inside an intersection does not change the $\mu$-measure. Thus
\begin{align*}
\mu(T^{-1}B\cap A)=\mu(T^{-1}B\cap T^{-1}A).
\end{align*}
Using this replacement,
\begin{align*}
\mu_A(T^{-1}B)
&=\frac{\mu(T^{-1}B\cap A)}{a}\\
&=\frac{\mu(T^{-1}B\cap T^{-1}A)}{a}\\
&=\frac{\mu(T^{-1}(B\cap A))}{a}\\
&=\frac{\mu(B\cap A)}{a}\\
&=\mu_A(B).
\end{align*}
The fourth equality is exactly the $T$-invariance of $\mu$, applied to the measurable set $B\cap A$.
The complement is invariant up to the same null set because
\begin{align*}
T^{-1}A^c\triangle A^c=T^{-1}A\triangle A.
\end{align*}
Therefore, for the same fixed $B\in\mathcal B$,
\begin{align*}
\mu_{A^c}(T^{-1}B)
&=\frac{\mu(T^{-1}B\cap A^c)}{b}\\
&=\frac{\mu(T^{-1}B\cap T^{-1}A^c)}{b}\\
&=\frac{\mu(T^{-1}(B\cap A^c))}{b}\\
&=\frac{\mu(B\cap A^c)}{b}\\
&=\mu_{A^c}(B).
\end{align*}
Thus $\mu_A,\mu_{A^c}\in\mathcal M_T$.
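A finite example may help fix the construction; the system below is an assumption introduced only for illustration. Let $X=\{0,1,2,3\}$ with the power set as $\mathcal B$, let $T$ swap $0\leftrightarrow 1$ and $2\leftrightarrow 3$, and let $\mu$ be the uniform measure, which is $T$-invariant because $T$ is a bijection. The set $A=\{0,1\}$ satisfies $T^{-1}A=A$ and $a=\mu(A)=1/2$, so $\mu$ is not ergodic, and
\begin{align*}
\mu_A(B)=2\mu(B\cap\{0,1\}),
\qquad
\mu_{A^c}(B)=2\mu(B\cap\{2,3\}).
\end{align*}
For $B=\{0,2\}$ we have $T^{-1}B=\{1,3\}$, and
\begin{align*}
\mu_A(T^{-1}B)=2\mu(\{1\})=\tfrac12=2\mu(\{0\})=\mu_A(B),
\end{align*}
in agreement with the invariance computation above.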
[/guided]
[/step]
[step:Assemble the conditional measures into a proper convex decomposition]
For every $B\in\mathcal B$,
\begin{align*}
a\mu_A(B)+b\mu_{A^c}(B)
&=a\frac{\mu(B\cap A)}{a}+b\frac{\mu(B\cap A^c)}{b}\\
&=\mu(B\cap A)+\mu(B\cap A^c)\\
&=\mu(B).
\end{align*}
Thus
\begin{align*}
\mu=a\mu_A+b\mu_{A^c}.
\end{align*}
Since $a,b\in(0,1)$ and $a+b=1$, this is a convex decomposition inside $\mathcal M_T$. It is proper because
\begin{align*}
\mu_A(A)=1
\quad\text{and}\quad
\mu_{A^c}(A)=0,
\end{align*}
so $\mu_A\neq\mu_{A^c}$. Therefore $\mu$ is not an extreme point of $\mathcal M_T$.
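For a concrete instance of the decomposition, consider this illustrative system (an assumption made for the example only): $X=\{0,1,2\}$ with the power set as $\mathcal B$, $T$ fixing $0$ and swapping $1\leftrightarrow 2$, and $\mu$ the uniform measure. With $A=\{0\}$ we get $T^{-1}A=A$, $a=1/3$, $b=2/3$, $\mu_A=\delta_0$ (the point mass at $0$), and $\mu_{A^c}$ uniform on $\{1,2\}$. Evaluating on $B=\{0,1\}$,
\begin{align*}
a\mu_A(B)+b\mu_{A^c}(B)=\tfrac13\cdot 1+\tfrac23\cdot\tfrac12=\tfrac23=\mu(B),
\end{align*}
while $\mu_A(A)=1\neq 0=\mu_{A^c}(A)$, so the two components are distinct.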
We have shown that ergodicity implies extremality and that nonergodicity implies nonextremality. Hence $\mu\in\mathcal M_T$ is ergodic if and only if $\mu$ is an extreme point of $\mathcal M_T$.
[/step]