[proofplan]
The hypothesis $|\mathcal{M}_T| = 1$ collapses the convex invariant-measure simplex $\mathcal{M}_T$ to the singleton $\{\mu\}$, so $\mu$ is automatically the unique extreme point of $\mathcal{M}_T$. The extremality characterization of ergodic measures then gives $\mathcal{E}_T = \operatorname{Ext}(\mathcal{M}_T) = \{\mu\}$, which proves (i). For (ii), any Borel probability measure on the one-point space $\mathcal{E}_T = \{\mu\}$ is determined by its total mass and must therefore be the Dirac mass at $\mu$; applied to the Choquet representing measure $\tau_\mu$, this yields $\tau_\mu = \delta_\mu$.
[/proofplan]
[step:Turn unique ergodicity into a singleton invariant-measure simplex]
By the hypothesis of [unique ergodicity](/page/Unique%20Ergodicity), the set $\mathcal{M}_T$ of $T$-[invariant](/page/Invariant%20Measure) Borel probability measures on $X$ satisfies
\begin{align*}
\mathcal{M}_T = \{\mu\}.
\end{align*}
The set $\mathcal{M}_T$ is convex. Indeed, let $\nu_0, \nu_1 \in \mathcal{M}_T$ and $a \in [0,1]$, and define
\begin{align*}
\nu_a: \mathcal{B}(X) &\to [0,1] \\
A &\mapsto a \nu_0(A) + (1-a) \nu_1(A).
\end{align*}
Then $\nu_a$ is a Borel probability measure on $X$ as a convex combination of two Borel probability measures. Since $T: X \to X$ is continuous, $T$ is $\mathcal{B}(X)$-measurable, so $T^{-1}(A) \in \mathcal{B}(X)$ for every $A \in \mathcal{B}(X)$. For every $A \in \mathcal{B}(X)$,
\begin{align*}
\nu_a(T^{-1}(A))
&= a \nu_0(T^{-1}(A)) + (1-a) \nu_1(T^{-1}(A)) \\
&= a \nu_0(A) + (1-a) \nu_1(A) \\
&= \nu_a(A),
\end{align*}
where the second equality uses the $T$-invariance of $\nu_0$ and $\nu_1$. Thus $\nu_a \in \mathcal{M}_T$.
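The claim that $\nu_a$ is a Borel probability measure can also be checked directly. The following is a sketch of that verification, where $(A_n)_{n \ge 1}$ is a sequence of pairwise disjoint sets in $\mathcal{B}(X)$ introduced here only for illustration:
\begin{align*}
\nu_a(X) &= a \nu_0(X) + (1-a) \nu_1(X) = a + (1-a) = 1, \\
\nu_a\Bigl(\bigcup_{n \ge 1} A_n\Bigr) &= a \sum_{n \ge 1} \nu_0(A_n) + (1-a) \sum_{n \ge 1} \nu_1(A_n) = \sum_{n \ge 1} \nu_a(A_n).
\end{align*}
Nonnegativity of $\nu_a$ holds because $a \geq 0$ and $1-a \geq 0$, and the series may be recombined termwise because all of their terms are nonnegative.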
[guided]
We first translate unique ergodicity into the exact object used in the ergodic decomposition theorem. The hypothesis $|\mathcal{M}_T| = 1$ says that there is exactly one $T$-invariant Borel probability measure on $X$, namely $\mu$. Therefore
\begin{align*}
\mathcal{M}_T = \{\mu\}.
\end{align*}
We also record why $\mathcal{M}_T$ is a convex set, since the extremality characterization of ergodicity used below is a statement about the convex geometry of $\mathcal{M}_T$. Take $\nu_0, \nu_1 \in \mathcal{M}_T$ and $a \in [0,1]$. Define
\begin{align*}
\nu_a: \mathcal{B}(X) &\to [0,1] \\
A &\mapsto a \nu_0(A) + (1-a) \nu_1(A).
\end{align*}
This is a Borel probability measure because it is a convex combination of two Borel probability measures. Since $T: X \to X$ is continuous, $T$ is Borel measurable; hence $T^{-1}(A) \in \mathcal{B}(X)$ whenever $A \in \mathcal{B}(X)$. Using the $T$-invariance of $\nu_0$ and $\nu_1$, we compute
\begin{align*}
\nu_a(T^{-1}(A))
&= a \nu_0(T^{-1}(A)) + (1-a) \nu_1(T^{-1}(A)) \\
&= a \nu_0(A) + (1-a) \nu_1(A) \\
&= \nu_a(A).
\end{align*}
Thus $\nu_a$ is also $T$-invariant, so $\nu_a \in \mathcal{M}_T$. In particular, the singleton $\{\mu\}$ is trivially convex, and it is this convex structure of $\mathcal{M}_T$ that the extremality characterization of ergodicity refers to.
[/guided]
[/step]
[step:Identify $\mu$ as the unique extreme point of $\mathcal{M}_T$]
Let $\operatorname{Ext}(\mathcal{M}_T)$ denote the set of extreme points of the convex set $\mathcal{M}_T$. Suppose $t \in (0,1)$ and $\nu_0, \nu_1 \in \mathcal{M}_T$ satisfy
\begin{align*}
\mu = t \nu_0 + (1-t) \nu_1.
\end{align*}
Since $\mathcal{M}_T = \{\mu\}$ by the previous step, $\nu_0 = \mu$ and $\nu_1 = \mu$. Hence $\mu$ is an extreme point of $\mathcal{M}_T$. Conversely, every extreme point of $\mathcal{M}_T$ is an element of $\mathcal{M}_T = \{\mu\}$, so
\begin{align*}
\operatorname{Ext}(\mathcal{M}_T) = \{\mu\}.
\end{align*}
[guided]
We now extract the convex-geometric consequence of $\mathcal{M}_T = \{\mu\}$. Recall that an extreme point of a convex set $C$ is an element $c \in C$ that cannot be written as a non-trivial convex combination of two distinct elements of $C$; concretely, if $c = t c_0 + (1-t) c_1$ with $t \in (0,1)$ and $c_0, c_1 \in C$, then $c_0 = c_1 = c$.
Let $\operatorname{Ext}(\mathcal{M}_T)$ denote the set of extreme points of $\mathcal{M}_T$. To check that $\mu$ is extreme, suppose $t \in (0,1)$ and $\nu_0, \nu_1 \in \mathcal{M}_T$ satisfy
\begin{align*}
\mu = t \nu_0 + (1-t) \nu_1.
\end{align*}
Because $\mathcal{M}_T = \{\mu\}$ from the previous step, the only possible values for $\nu_0$ and $\nu_1$ are $\mu$, forcing $\nu_0 = \mu$ and $\nu_1 = \mu$. The defining condition for extremality is satisfied, so $\mu \in \operatorname{Ext}(\mathcal{M}_T)$. Conversely, every extreme point of $\mathcal{M}_T$ is in particular an element of $\mathcal{M}_T$; since $\mathcal{M}_T = \{\mu\}$, there is no other candidate. Therefore
\begin{align*}
\operatorname{Ext}(\mathcal{M}_T) = \{\mu\}.
\end{align*}
This is the input required by the extremality characterization of ergodic measures invoked in the next step.
[/guided]
[/step]
[step:Apply the extremality characterization of ergodicity to conclude $\mu$ is ergodic]
Let $\mathcal{E}_T \subseteq \mathcal{M}_T$ denote the set of [ergodic](/page/Ergodicity) $T$-invariant Borel probability measures on $X$. Since $X$ is compact metrizable and $T: X \to X$ is continuous, the hypotheses of the theorem [ergodic measures are exactly the extreme invariant measures](/theorems/???) are satisfied. That theorem gives
\begin{align*}
\mathcal{E}_T = \operatorname{Ext}(\mathcal{M}_T).
\end{align*}
Combining with the previous step,
\begin{align*}
\mathcal{E}_T = \{\mu\}.
\end{align*}
In particular, $\mu \in \mathcal{E}_T$, i.e., $\mu$ is ergodic. This establishes conclusion (i).
[guided]
The previous step was purely convex geometry. To convert it into ergodic theory, we use the standard characterization of ergodic measures as the extreme points of the invariant-measure simplex.
Let $\mathcal{E}_T \subseteq \mathcal{M}_T$ denote the set of ergodic $T$-invariant Borel probability measures on $X$. The theorem [ergodic measures are exactly the extreme invariant measures](/theorems/???) applies because its two hypotheses are satisfied here: (a) $X$ is compact metrizable, and (b) $T: X \to X$ is continuous. Both are hypotheses of the theorem under proof. Its conclusion is
\begin{align*}
\mathcal{E}_T = \operatorname{Ext}(\mathcal{M}_T).
\end{align*}
From the previous step, the right-hand side is $\{\mu\}$. Hence
\begin{align*}
\mathcal{E}_T = \{\mu\}.
\end{align*}
In particular $\mu \in \mathcal{E}_T$, which is exactly the statement that $\mu$ is ergodic. This establishes conclusion (i) of the theorem.
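For orientation, a standard example of a uniquely ergodic system is an irrational rotation of the circle. It is not part of the argument above and is included only as an illustration. The parameter $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ and the symbol $\lambda$ for Lebesgue measure on $\mathbb{R}/\mathbb{Z}$ are notation assumed for this example alone:
\begin{align*}
T: \mathbb{R}/\mathbb{Z} &\to \mathbb{R}/\mathbb{Z} \\
x &\mapsto x + \alpha \bmod 1.
\end{align*}
For this map $\mathcal{M}_T = \{\lambda\}$, so conclusion (i) recovers the familiar fact that $\lambda$ is ergodic for the rotation.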
[/guided]
[/step]
[step:Collapse the Choquet representing measure to the Dirac mass at $\mu$]
By the [Choquet representation theorem for invariant measures](/theorems/???), which applies because $X$ is compact metrizable and $T: X \to X$ is continuous, there exists a Borel probability measure
\begin{align*}
\tau_\mu: \mathcal{B}(\mathcal{E}_T) &\to [0,1]
\end{align*}
such that, for every $A \in \mathcal{B}(X)$,
\begin{align*}
\mu(A) = \int_{\mathcal{E}_T} \nu(A) \, d\tau_\mu(\nu).
\end{align*}
By the previous step, $\mathcal{E}_T = \{\mu\}$. Hence $\mathcal{E}_T$ is a one-point measurable space and its Borel $\sigma$-algebra is $\mathcal{B}(\mathcal{E}_T) = \{\varnothing, \{\mu\}\}$. Define the [Dirac measure](/page/Dirac%20Measure) at $\mu$ by
\begin{align*}
\delta_\mu: \mathcal{B}(\mathcal{E}_T) &\to [0,1] \\
B &\mapsto \begin{cases} 1, & \mu \in B, \\ 0, & \mu \notin B. \end{cases}
\end{align*}
Because $\tau_\mu$ is a probability measure on $\mathcal{E}_T = \{\mu\}$, the probability-measure axioms force
\begin{align*}
\tau_\mu(\varnothing) = 0 = \delta_\mu(\varnothing)
\qquad \text{and} \qquad
\tau_\mu(\{\mu\}) = 1 = \delta_\mu(\{\mu\}).
\end{align*}
These two equalities exhaust $\mathcal{B}(\mathcal{E}_T)$, so $\tau_\mu(B) = \delta_\mu(B)$ for every $B \in \mathcal{B}(\mathcal{E}_T)$. Therefore
\begin{align*}
\tau_\mu = \delta_\mu,
\end{align*}
which establishes conclusion (ii). Substituting back into the Choquet representation formula gives the consistency check
\begin{align*}
\mu(A) = \int_{\mathcal{E}_T} \nu(A) \, d\delta_\mu(\nu) = \mu(A) \qquad \text{for every } A \in \mathcal{B}(X),
\end{align*}
where the second equality follows from the identity $\int_{\mathcal{E}_T} f \, d\delta_\mu = f(\mu)$, valid for every bounded measurable function $f$ on $\mathcal{E}_T$, applied to the evaluation map $\nu \mapsto \nu(A)$.
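The second equality can be unpacked explicitly. As a sketch, write $f_A$ for the evaluation map $\nu \mapsto \nu(A)$ on $\mathcal{E}_T$; the name $f_A$ is introduced here only for illustration. On the one-point space $\mathcal{E}_T = \{\mu\}$, the function $f_A$ equals the constant $\mu(A)$ times the indicator of $\{\mu\}$, so
\begin{align*}
\int_{\mathcal{E}_T} f_A \, d\delta_\mu
= \mu(A) \, \delta_\mu(\{\mu\})
= \mu(A) \cdot 1
= \mu(A).
\end{align*}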
[guided]
The role of the Choquet representation theorem in this step is purely to supply a representing measure $\tau_\mu$ on $\mathcal{E}_T$. Once $\mathcal{E}_T$ has been identified as a one-point space in the previous step, the probability-measure axioms alone determine $\tau_\mu$ uniquely, and it must coincide with the Dirac mass at $\mu$.
In detail: by the [Choquet representation theorem for invariant measures](/theorems/???), whose hypotheses (compact metrizable $X$, continuous $T$) hold by assumption, we obtain a Borel probability measure
\begin{align*}
\tau_\mu: \mathcal{B}(\mathcal{E}_T) &\to [0,1]
\end{align*}
representing $\mu$ in the sense that, for every $A \in \mathcal{B}(X)$,
\begin{align*}
\mu(A) = \int_{\mathcal{E}_T} \nu(A) \, d\tau_\mu(\nu).
\end{align*}
The previous step established $\mathcal{E}_T = \{\mu\}$. A one-point measurable space has only two Borel subsets, namely $\varnothing$ and $\{\mu\}$, and any probability measure on such a space is forced by the axioms to assign mass $0$ to the empty set and mass $1$ to the full space. Define the Dirac probability measure at $\mu$ by
\begin{align*}
\delta_\mu: \mathcal{B}(\mathcal{E}_T) &\to [0,1] \\
B &\mapsto \begin{cases} 1, & \mu \in B, \\ 0, & \mu \notin B. \end{cases}
\end{align*}
Then $\delta_\mu(\varnothing) = 0$ and $\delta_\mu(\{\mu\}) = 1$, matching the values forced on $\tau_\mu$. Therefore
\begin{align*}
\tau_\mu(\varnothing) = 0 = \delta_\mu(\varnothing)
\qquad \text{and} \qquad
\tau_\mu(\{\mu\}) = 1 = \delta_\mu(\{\mu\}),
\end{align*}
so $\tau_\mu(B) = \delta_\mu(B)$ on every $B \in \mathcal{B}(\mathcal{E}_T)$. Hence
\begin{align*}
\tau_\mu = \delta_\mu.
\end{align*}
For internal consistency, substitute $\tau_\mu = \delta_\mu$ back into the representation formula: for any $A \in \mathcal{B}(X)$,
\begin{align*}
\int_{\mathcal{E}_T} \nu(A) \, d\delta_\mu(\nu) = \mu(A),
\end{align*}
which follows from the identity $\int_{\mathcal{E}_T} f \, d\delta_\mu = f(\mu)$ applied to the bounded measurable function $f: \nu \mapsto \nu(A)$ at the point $\nu = \mu$. So the ergodic decomposition of $\mu$ places all of its mass on the single ergodic measure $\mu$ itself, which establishes conclusion (ii).
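To see what the collapse to a Dirac mass rules out, it may help to compare with a system that is not uniquely ergodic. The following sketch is an illustration only; the two-point space $Y = \{y_0, y_1\}$ with the discrete topology, the identity map $S = \operatorname{id}_Y$, and the measure $\eta$ are hypothetical objects not used elsewhere in the proof. Every Borel probability measure on $Y$ is $S$-invariant, the ergodic ones are the two point masses $\delta_{y_0}$ and $\delta_{y_1}$, and
\begin{align*}
\eta = \tfrac{1}{2} \delta_{y_0} + \tfrac{1}{2} \delta_{y_1}
\qquad \text{has} \qquad
\tau_\eta = \tfrac{1}{2} \delta_{\delta_{y_0}} + \tfrac{1}{2} \delta_{\delta_{y_1}},
\end{align*}
which is not a Dirac mass. Under unique ergodicity this splitting cannot occur, because $\mathcal{E}_T$ contains only one point on which $\tau_\mu$ can place mass.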
[/guided]
[/step]