[guided]Fix an ergodic invariant Borel probability measure $\nu\in\mathcal M_T(X)$ and let $\eta>0$. The ergodicity assumption is essential in this first part: Birkhoff's ergodic theorem then says that the Birkhoff averages of the continuous function $\phi:X\to\mathbb R$ converge $\nu$-almost everywhere to the scalar integral $\int_X\phi\,d\nu$, not merely to a conditional expectation. Since $X$ is compact and $\phi$ is continuous, $\phi\in L^1(X,\mathcal B(X),\nu)$, so the theorem applies. Hence
\begin{align*}
\lim_{n\to\infty}\frac{1}{n}S_n\phi(x)=\int_X\phi(y)\,d\nu(y)
\end{align*}
for $\nu$-almost every $x\in X$.
Birkhoff's theorem gives pointwise convergence, so we still need to extract a single time threshold that works for every point in a positive-measure set. For each $m\in\mathbb N$, define the Borel tail set
\begin{align*}
B_m:=\left\{x\in X: \frac{1}{n}S_n\phi(x)\ge \int_X\phi(y)\,d\nu(y)-\eta \text{ for every } n\ge m\right\}.
\end{align*}
The almost-everywhere convergence says that almost every $x$ belongs to $B_m$ for all sufficiently large $m$, so $\nu(\bigcup_{m=1}^{\infty}B_m)=1$. The sets $B_m$ are increasing in $m$, so continuity from below for the probability measure $\nu$ gives an integer $N_1\in\mathbb N$ such that $B:=B_{N_1}$ satisfies $\nu(B)>1/2$. By the definition of $B$, every $x\in B$ and every $n\ge N_1$ satisfy
\begin{align*}
\frac{1}{n}S_n\phi(x)\ge \int_X\phi(y)\,d\nu(y)-\eta.
\end{align*}
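The choice of $N_1$ can be unpacked as follows; this is only a sketch of the standard continuity-from-below step and uses no notation beyond the sets $B_m$ defined above. Since $B_1\subset B_2\subset\cdots$,
\begin{align*}
\lim_{m\to\infty}\nu(B_m)=\nu\left(\bigcup_{m=1}^{\infty}B_m\right)=1,
\end{align*}
so some index $N_1$ satisfies $\nu(B_{N_1})>1/2$. Any other threshold in $(0,1)$ could be used in place of $1/2$, since the later steps only need $\nu(B)$ to be positive.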
We now need many points in $B$ that are actually separated in the Bowen metric. A measurable partition with small atoms would not be enough, because different atoms can have points arbitrarily close to one another. Instead we use Katok's separated-set entropy formula in its compact-metric ergodic form. Its hypotheses are exactly the following: $X$ is compact metric, $T:X\to X$ is continuous, $\nu$ is an ergodic $T$-invariant Borel probability measure, and $B\in\mathcal B(X)$ has positive $\nu$-measure. These have already been verified: compactness and continuity are hypotheses of the theorem, $\nu$ was chosen ergodic in $\mathcal M_T(X)$, and the tail-set argument gave $\nu(B)>1/2$. If $h_\nu(T)<\infty$, then for every sufficiently small $\varepsilon>0$ and all sufficiently large $n$, the formula provides an $(n,\varepsilon)$-separated set $E_n\subset B$ such that
\begin{align*}
|E_n|\ge \exp(n(h_\nu(T)-\eta)).
\end{align*}
If $h_\nu(T)=\infty$, the same conclusion holds with $h_\nu(T)-\eta$ replaced by any finite number $R>0$, and letting $R\to\infty$ after the pressure estimate gives the desired infinite lower bound.
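As a hedged sketch of where $E_n$ comes from in the finite-entropy case, write $d_n$ for the Bowen metric and let $N_\nu(n,\varepsilon,\delta)$ denote the least number of $d_n$-balls of radius $\varepsilon$ whose union has $\nu$-measure at least $1-\delta$. This notation is an assumption introduced here for illustration and is not fixed elsewhere in the text. Katok's formula states that, for ergodic $\nu$ and any $\delta\in(0,1)$,
\begin{align*}
h_\nu(T)=\lim_{\varepsilon\to0}\liminf_{n\to\infty}\frac{1}{n}\log N_\nu(n,\varepsilon,\delta).
\end{align*}
If $E_n$ is a maximal $(n,\varepsilon)$-separated subset of $B$, then every point of $B$ lies within $d_n$-distance $\varepsilon$ of $E_n$, so the $d_n$-balls of radius $2\varepsilon$ centered at the points of $E_n$ cover $B$. Because $\nu(B)>1/2$, this gives
\begin{align*}
|E_n|\ge N_\nu(n,2\varepsilon,1/2),
\end{align*}
and the right-hand side is at least $\exp(n(h_\nu(T)-\eta))$ once $\varepsilon$ is small enough and $n$ is large enough.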
For every $x\in E_n$, the Birkhoff lower bound gives
\begin{align*}
\exp(S_n\phi(x))\ge \exp\left(n\left(\int_X\phi(y)\,d\nu(y)-\eta\right)\right).
\end{align*}
Therefore
\begin{align*}
Z_n(\varepsilon)\ge \sum_{x\in E_n}\exp(S_n\phi(x))\ge \exp\left(n\left(h_\nu(T)+\int_X\phi(y)\,d\nu(y)-2\eta\right)\right).
\end{align*}
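The second inequality combines the cardinality bound with the Birkhoff lower bound. Spelled out for $n\ge N_1$ large enough that the cardinality estimate holds, and assuming (as is standard, though not restated here) that $Z_n(\varepsilon)$ is the supremum of such sums over all $(n,\varepsilon)$-separated subsets of $X$:
\begin{align*}
\sum_{x\in E_n}\exp(S_n\phi(x))
&\ge |E_n|\,\min_{x\in E_n}\exp(S_n\phi(x))\\
&\ge \exp\left(n(h_\nu(T)-\eta)\right)\exp\left(n\left(\int_X\phi(y)\,d\nu(y)-\eta\right)\right)\\
&=\exp\left(n\left(h_\nu(T)+\int_X\phi(y)\,d\nu(y)-2\eta\right)\right).
\end{align*}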
Taking logarithms, dividing by $n$, passing to the limsup, letting $\varepsilon\downarrow0$, and then letting $\eta\downarrow0$ gives
\begin{align*}
P(T,\phi)\ge h_\nu(T)+\int_X\phi(y)\,d\nu(y).
\end{align*}
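The chain of limits can be written out as follows, assuming the usual definition $P(T,\phi)=\lim_{\varepsilon\to0}\limsup_{n\to\infty}\frac{1}{n}\log Z_n(\varepsilon)$, which is not restated in this part of the text. For every sufficiently small $\varepsilon>0$,
\begin{align*}
\limsup_{n\to\infty}\frac{1}{n}\log Z_n(\varepsilon)\ge h_\nu(T)+\int_X\phi(y)\,d\nu(y)-2\eta.
\end{align*}
The right-hand side does not depend on $\varepsilon$, so letting $\varepsilon\downarrow0$ gives
\begin{align*}
P(T,\phi)\ge h_\nu(T)+\int_X\phi(y)\,d\nu(y)-2\eta
\end{align*}
for every $\eta>0$, and the displayed inequality follows by letting $\eta\downarrow0$.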
To pass from ergodic measures to an arbitrary invariant measure $\mu\in\mathcal M_T(X)$, use the ergodic decomposition. Let $\mathcal E_T(X)$ denote the Borel subset of $\mathcal M_T(X)$ consisting of ergodic invariant Borel probability measures, with $\mathcal M_T(X)$ equipped with the weak$^*$ topology. The ergodic decomposition theorem gives a Borel probability measure $\tau$ on $\mathcal E_T(X)$ such that for every $f\in C(X)$,
\begin{align*}
\int_X f(x)\,d\mu(x)=\int_{\mathcal E_T(X)}\left(\int_X f(x)\,d\nu(x)\right)d\tau(\nu).
\end{align*}
The entropy-affinity theorem for this decomposition, together with the displayed barycentric identity applied to $f=\phi$, shows that the sum of the entropy and potential terms for $\mu$ is the $\tau$-average of the corresponding sums over the ergodic components:
\begin{align*}
h_\mu(T)+\int_X\phi(y)\,d\mu(y)=\int_{\mathcal E_T(X)}\left(h_\nu(T)+\int_X\phi(y)\,d\nu(y)\right)d\tau(\nu).
\end{align*}
By the ergodic case, the integrand is bounded by $P(T,\phi)$ at every $\nu\in\mathcal E_T(X)$. Since $\tau$ is a probability measure, the integral is also bounded by $P(T,\phi)$. Hence
\begin{align*}
P(T,\phi)\ge h_\mu(T)+\int_X\phi(y)\,d\mu(y).
\end{align*}
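In symbols, this last step is a one-line computation, sketched here with the notation already in use. Since $\tau$ is a probability measure on $\mathcal E_T(X)$,
\begin{align*}
h_\mu(T)+\int_X\phi(y)\,d\mu(y)
=\int_{\mathcal E_T(X)}\left(h_\nu(T)+\int_X\phi(y)\,d\nu(y)\right)d\tau(\nu)
\le\int_{\mathcal E_T(X)}P(T,\phi)\,d\tau(\nu)
=P(T,\phi).
\end{align*}
If $P(T,\phi)=\infty$, the bound holds trivially.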
Taking the supremum over invariant measures proves the lower bound.[/guided]