[proofplan]
We first reduce equidistribution to checking the convergence identity on a countable uniformly dense family in $C(X)$. The [Birkhoff ergodic theorem](/theorems/518) gives a full-measure set on which the desired identity holds for every function in that countable family. Finally, uniform approximation transfers the identity from the dense family to an arbitrary continuous [test function](/page/Test%20Function).
[/proofplan]
[step:Choose a countable uniformly dense family of continuous test functions]
Let $d: X \times X \to [0,\infty)$ denote the metric on $X$. For $a \in X$ and $r > 0$, let $B(a,r) := \{x \in X : d(x,a) < r\}$ denote the open ball in the [metric space](/page/Metric%20Space) $(X,d)$. For a subset $E \subset X$, define the distance-to-set function $\operatorname{dist}(\cdot,E): X \to [0,\infty]$, with the convention $\inf \emptyset = \infty$, by
\begin{align*}
\operatorname{dist}(x,E) := \inf_{y \in E} d(x,y).
\end{align*}
Let $C(X)$ denote the real [vector space](/page/Vector%20Space) of continuous functions $f: X \to \mathbb{R}$ equipped with the uniform norm
\begin{align*}
\|f\|_{\infty} := \sup_{x \in X} |f(x)|.
\end{align*}
Since $(X,d)$ is a compact metric space, $C(X)$ is separable in the uniform norm, as verified at the end of this step. Fix a countable [dense subset](/page/Dense%20Subset)
\begin{align*}
\mathcal{D} := \{f_j : X \to \mathbb{R} \mid j \in \mathbb{N}\} \subset C(X).
\end{align*}
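To justify the separability assertion, for each $m \in \mathbb{N}$ choose a finite set $F_m = \{a_{m,1},\dots,a_{m,k_m}\} \subset X$ such that the open balls $B(a_{m,i},1/(4m))$ cover $X$; such a set exists because $X$ is compact. For each $m$ and $i$, define the [continuous function](/page/Continuous%20Function) $\psi_{m,i}: X \to [0,1]$ by
\begin{align*}
\psi_{m,i}(x) := \min\bigl\{1,\ \operatorname{dist}\bigl(x, X \setminus B(a_{m,i},1/(2m))\bigr)\bigr\}.
\end{align*}
The truncation at $1$ keeps $\psi_{m,i}$ finite when $B(a_{m,i},1/(2m)) = X$, in which case the distance to the empty set is $\infty$. In every case $\psi_{m,i}$ is continuous, strictly positive on $B(a_{m,i},1/(2m))$, and zero outside it.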
The function $\Psi_m: X \to (0,\infty)$ given by
\begin{align*}
\Psi_m(x) := \sum_{i=1}^{k_m} \psi_{m,i}(x)
\end{align*}
is strictly positive: every $x \in X$ lies in some smaller ball $B(a_{m,i},1/(4m))$, and for that index the triangle inequality gives $\psi_{m,i}(x) \geq 1/(4m) > 0$. Hence the functions $\varphi_{m,i}: X \to [0,1]$ defined by
\begin{align*}
\varphi_{m,i}(x) := \frac{\psi_{m,i}(x)}{\Psi_m(x)}
\end{align*}
are continuous and satisfy
\begin{align*}
\sum_{i=1}^{k_m} \varphi_{m,i}(x) = 1
\end{align*}
for every $x \in X$. The collection of all rational linear combinations
\begin{align*}
\sum_{i=1}^{k_m} q_i \varphi_{m,i}
\end{align*}
with $m \in \mathbb{N}$ and $q_i \in \mathbb{Q}$ is countable, since it is a countable union over $m$ of images of the countable sets $\mathbb{Q}^{k_m}$.
Let $f: X \to \mathbb{R}$ be continuous and let $\varepsilon > 0$. Since $X$ is compact, $f$ is uniformly continuous. Choose $m \in \mathbb{N}$ so large that $d(x,y) < 1/m$ implies $|f(x)-f(y)| < \varepsilon/2$. Choose rational numbers $q_i \in \mathbb{Q}$ satisfying $|q_i - f(a_{m,i})| < \varepsilon/2$ for $1 \leq i \leq k_m$, and define $g: X \to \mathbb{R}$ by
\begin{align*}
g(x) := \sum_{i=1}^{k_m} q_i\varphi_{m,i}(x).
\end{align*}
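If $\varphi_{m,i}(x) \neq 0$, then $x \in B(a_{m,i},1/(2m))$, so $d(x,a_{m,i}) < 1/(2m) < 1/m$ and therefore $|f(x)-f(a_{m,i})| < \varepsilon/2$. Combined with the choice of $q_i$, this gives $|f(x)-q_i| \leq |f(x)-f(a_{m,i})| + |f(a_{m,i})-q_i| < \varepsilon$ for every such $i$. Using the partition-of-unity identity to write $f(x) = \sum_{i=1}^{k_m} \varphi_{m,i}(x) f(x)$, we obtain
\begin{align*}
|f(x)-g(x)| = \left|\sum_{i=1}^{k_m} \varphi_{m,i}(x)\bigl(f(x)-q_i\bigr)\right| \leq \sum_{i=1}^{k_m} \varphi_{m,i}(x)|f(x)-q_i| < \varepsilon
\end{align*}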
for every $x \in X$. Thus the countable family of these rational linear combinations is dense in $C(X)$, and we may enumerate a dense subset as $\mathcal{D} = \{f_j\}_{j \in \mathbb{N}}$.
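To make the construction concrete, here is a sketch in one simple case. The space $X = [0,1]$ with the usual metric, the level $m = 1$, and the centers chosen below are assumptions made purely for illustration and are not part of the proof. Take $a_{1,1} = 1/6$, $a_{1,2} = 1/2$, $a_{1,3} = 5/6$. The balls of radius $1/4$ about these points are $[0,5/12)$, $(1/4,3/4)$, $(7/12,1]$, which cover $[0,1]$, and the balls of radius $1/2$ are $[0,2/3)$, $(0,1)$, $(1/3,1]$. Computing the distances to the complements gives
\begin{align*}
\psi_{1,1}(x) &= \max\{0,\ 2/3 - x\}, \\
\psi_{1,2}(x) &= \min\{x,\ 1-x\}, \\
\psi_{1,3}(x) &= \max\{0,\ x - 1/3\},
\end{align*}
and summing these on each subinterval yields
\begin{align*}
\Psi_1(x) = \begin{cases} 2/3, & 0 \leq x \leq 1/3, \\ 1/3 + x, & 1/3 \leq x \leq 1/2, \\ 4/3 - x, & 1/2 \leq x \leq 2/3, \\ 2/3, & 2/3 \leq x \leq 1. \end{cases}
\end{align*}
In this example $\Psi_1 \geq 2/3 > 0$ on all of $[0,1]$, so the quotients $\varphi_{1,i} = \psi_{1,i}/\Psi_1$ are well defined, continuous, and sum to $1$, as the general argument predicts.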
[/step]
[step:Apply Birkhoff's theorem to every function in the dense family]
For each $j \in \mathbb{N}$, the function $f_j: X \to \mathbb{R}$ is continuous on compact $X$, hence bounded and Borel measurable. Since $\mu$ is a probability measure, $f_j \in L^1(X,\mathcal{B}(X),\mu)$.
By the Birkhoff Ergodic Theorem, applied to the measure-preserving system $(X,\mathcal{B}(X),\mu,T)$ and the integrable function $f_j$, there exists a measurable set $A_j \in \mathcal{B}(X)$ with $\mu(A_j)=1$ such that for every $x \in A_j$,
\begin{align*}
\lim_{N \to \infty} \frac{1}{N}\sum_{n=0}^{N-1} f_j(T^n x) = \int_X f_j(y)\,d\mu(y).
\end{align*}
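Here ergodicity enters as follows: the Birkhoff theorem yields convergence of the averages to the [conditional expectation](/page/Conditional%20Expectation) of $f_j$ given the $\sigma$-algebra of $T$-invariant sets, and ergodicity identifies this conditional expectation with the constant function equal to $\int_X f_j(y)\,d\mu(y)$.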
Define the measurable set $A \in \mathcal{B}(X)$ by
\begin{align*}
A := \bigcap_{j=1}^{\infty} A_j.
\end{align*}
Since $A$ is a countable intersection of full-measure sets,
\begin{align*}
\mu(A)=1.
\end{align*}
For every $x \in A$ and every $j \in \mathbb{N}$, the Birkhoff average identity above holds for $f_j$.
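The measure computation behind $\mu(A)=1$ can be spelled out using only countable subadditivity of $\mu$ and the sets $A_j$ defined above:
\begin{align*}
\mu(X \setminus A) = \mu\Bigl(\bigcup_{j=1}^{\infty} (X \setminus A_j)\Bigr) \leq \sum_{j=1}^{\infty} \mu(X \setminus A_j) = 0.
\end{align*}
This is the point at which countability of $\mathcal{D}$ is essential: an intersection of uncountably many full-measure sets need not have full measure, or even be measurable, so Birkhoff's theorem cannot be applied directly to all of $C(X)$ at once.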
[/step]
[step:Extend the convergence identity from the dense family to every continuous function]
Fix $x \in A$ and let $f: X \to \mathbb{R}$ be continuous. Let $\varepsilon > 0$. Since $\mathcal{D}$ is dense in $C(X)$ with respect to $\|\cdot\|_{\infty}$, choose $j \in \mathbb{N}$ such that
\begin{align*}
\|f-f_j\|_{\infty} < \varepsilon.
\end{align*}
For each $N \in \mathbb{N}$, define the time averages $M_N(f,x) \in \mathbb{R}$ and $M_N(f_j,x) \in \mathbb{R}$ by
\begin{align*}
M_N(f,x) := \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)
\end{align*}
and
\begin{align*}
M_N(f_j,x) := \frac{1}{N}\sum_{n=0}^{N-1} f_j(T^n x).
\end{align*}
Then the triangle inequality gives
\begin{align*}
\left|M_N(f,x)-\int_X f(y)\,d\mu(y)\right| \leq \left|M_N(f,x)-M_N(f_j,x)\right| + \left|M_N(f_j,x)-\int_X f_j(y)\,d\mu(y)\right| + \left|\int_X f_j(y)\,d\mu(y)-\int_X f(y)\,d\mu(y)\right|.
\end{align*}
The first term is bounded by $\varepsilon$ because
\begin{align*}
\left|M_N(f,x)-M_N(f_j,x)\right| \leq \frac{1}{N}\sum_{n=0}^{N-1} |f(T^n x)-f_j(T^n x)| \leq \|f-f_j\|_{\infty} < \varepsilon.
\end{align*}
The third term is bounded by $\varepsilon$ because $\mu(X)=1$ and
\begin{align*}
\left|\int_X f_j(y)\,d\mu(y)-\int_X f(y)\,d\mu(y)\right| \leq \int_X |f_j(y)-f(y)|\,d\mu(y) \leq \|f-f_j\|_{\infty}\mu(X) < \varepsilon.
\end{align*}
Since $x \in A$, the middle term tends to $0$ as $N \to \infty$. Therefore
\begin{align*}
\limsup_{N \to \infty}\left|M_N(f,x)-\int_X f(y)\,d\mu(y)\right| \leq 2\varepsilon.
\end{align*}
Because $\varepsilon > 0$ was arbitrary,
\begin{align*}
\lim_{N \to \infty} M_N(f,x) = \int_X f(y)\,d\mu(y).
\end{align*}
[guided]
We now explain why convergence on the countable dense family is enough. Fix a point $x \in A$ and a continuous function $f: X \to \mathbb{R}$. The set $A$ was chosen so that the Birkhoff average identity is already known for every function $f_j$ in the dense family $\mathcal{D}$. The goal is to compare $f$ with one such $f_j$ uniformly on all of $X$, because uniform control automatically controls both the time averages along the orbit and the space averages against $\mu$.
Let $\varepsilon > 0$. Since $\mathcal{D}$ is dense in $C(X)$ for the uniform norm, there exists $j \in \mathbb{N}$ such that
\begin{align*}
\|f-f_j\|_{\infty} < \varepsilon.
\end{align*}
For each $N \in \mathbb{N}$, define the time averages
\begin{align*}
M_N(f,x) := \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)
\end{align*}
and
\begin{align*}
M_N(f_j,x) := \frac{1}{N}\sum_{n=0}^{N-1} f_j(T^n x).
\end{align*}
We compare the desired quantity with the known one by adding and subtracting $M_N(f_j,x)$ and $\int_X f_j(y)\,d\mu(y)$. The triangle inequality gives
\begin{align*}
\left|M_N(f,x)-\int_X f(y)\,d\mu(y)\right| \leq \left|M_N(f,x)-M_N(f_j,x)\right| + \left|M_N(f_j,x)-\int_X f_j(y)\,d\mu(y)\right| + \left|\int_X f_j(y)\,d\mu(y)-\int_X f(y)\,d\mu(y)\right|.
\end{align*}
The first term is small because every orbit point $T^n x$ lies in $X$, and the uniform norm controls the difference at every point of $X$:
\begin{align*}
\left|M_N(f,x)-M_N(f_j,x)\right| \leq \frac{1}{N}\sum_{n=0}^{N-1}|f(T^n x)-f_j(T^n x)| \leq \|f-f_j\|_{\infty} < \varepsilon.
\end{align*}
The third term is small for the same uniform reason, now integrated against the probability measure $\mu$:
\begin{align*}
\left|\int_X f_j(y)\,d\mu(y)-\int_X f(y)\,d\mu(y)\right| \leq \int_X |f_j(y)-f(y)|\,d\mu(y) \leq \|f-f_j\|_{\infty}\mu(X) < \varepsilon.
\end{align*}
The middle term is the one for which we use the definition of $A$: since $x \in A \subset A_j$, the Birkhoff average identity for $f_j$ holds at $x$, so
\begin{align*}
\lim_{N \to \infty}\left|M_N(f_j,x)-\int_X f_j(y)\,d\mu(y)\right| = 0.
\end{align*}
Taking the limit superior in the triangle inequality therefore gives
\begin{align*}
\limsup_{N \to \infty}\left|M_N(f,x)-\int_X f(y)\,d\mu(y)\right| \leq 2\varepsilon.
\end{align*}
Since $\varepsilon > 0$ was arbitrary, the limit superior is $0$, and hence
\begin{align*}
\lim_{N \to \infty} M_N(f,x) = \int_X f(y)\,d\mu(y).
\end{align*}
This proves the equidistribution identity for the arbitrary continuous test function $f$.
[/guided]
[/step]
[step:Conclude equidistribution on a full-measure set]
We have constructed a measurable set $A \subset X$ with $\mu(A)=1$ such that for every $x \in A$ and every continuous function $f: X \to \mathbb{R}$,
\begin{align*}
\lim_{N \to \infty} \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = \int_X f(y)\,d\mu(y).
\end{align*}
This is precisely the statement that the orbit $(T^n x)_{n \geq 0}$ is equidistributed with respect to $\mu$. Hence, for $\mu$-almost every $x \in X$, the orbit of $x$ is equidistributed with respect to $\mu$.
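A standard example shows why the conclusion is only an almost-everywhere statement. The system below is not part of the statement being proved and is assumed here purely for illustration: take $X = \mathbb{R}/\mathbb{Z}$, let $\mu$ be Lebesgue measure, and let $T$ be the doubling map $Tx = 2x \bmod 1$, which is ergodic with respect to $\mu$. The point $x = 0$ is fixed by $T$, so for the test function $f(x) = \cos(2\pi x)$,
\begin{align*}
\frac{1}{N}\sum_{n=0}^{N-1} f(T^n 0) = f(0) = 1 \neq 0 = \int_X \cos(2\pi y)\,d\mu(y)
\end{align*}
for every $N \in \mathbb{N}$. In this system the point $0$ therefore cannot belong to $A$: the exceptional set $X \setminus A$ has measure zero but need not be empty.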
[/step]