[proofplan]Fix $N$ and replace the finite maximal averages by finite partial sums, since a positive average is equivalent to a positive partial sum. Define the maximal partial sum $F_N=\max_{0\le k\le N}S_k$, so that $E_N=\{F_N>0\}$ and $F_N$ vanishes outside $E_N$. Hopf's telescoping observation gives the pointwise inequality $f\ge F_N-F_N\circ T$ on $E_N$. Integrating this inequality and using measure preservation to control the shifted term yields the desired nonnegativity.[/proofplan]
[step:Define the finite sums and identify the positivity set]
Fix an integer $N\ge 1$. By the hypotheses of the theorem, $f:X\to\mathbb{R}$ is a fixed $\mathcal{B}$-measurable representative of the [$L^1$](/page/???) class.
Define
\begin{align*}
S_0:X&\to \mathbb{R}\\
x&\mapsto 0,
\end{align*}
and, for each integer $k$ with $1\le k\le N$,
\begin{align*}
S_k:X&\to \mathbb{R}\\
x&\mapsto \sum_{n=0}^{k-1} f(T^n x).
\end{align*}
Define the maximal partial-sum function
\begin{align*}
F_N:X&\to [0,\infty)\\
x&\mapsto \max\{S_0(x),S_1(x),\ldots,S_N(x)\}.
\end{align*}
Since each iterate $T^n$ is $\mathcal{B}$-measurable (as a finite composition of the $\mathcal{B}$-measurable map $T$) and $f$ is $\mathcal{B}$-measurable, the composition $f\circ T^n$ is $\mathcal{B}$-measurable. By the [Algebra of Measurable Functions](/theorems/???), finite sums and finite maxima of measurable functions are measurable; hence each $S_k$ and $F_N$ is $\mathcal{B}$-measurable. The range of $F_N$ lies in $[0,\infty)$ because $S_0\equiv 0$ is one of the terms in the maximum.
With these sums, the [finite maximal average](/page/???) introduced in the theorem statement satisfies, for each $x\in X$,
\begin{align*}
M_N f(x)=\max_{1\le k\le N}\frac{S_k(x)}{k}.
\end{align*}
Because every integer $k$ with $1\le k\le N$ is positive,
\begin{align*}
M_N f(x)>0
\quad\Longleftrightarrow\quad
\text{there exists }k\in\{1,\ldots,N\}\text{ such that }S_k(x)>0
\quad\Longleftrightarrow\quad
F_N(x)>0,
\end{align*}
the last equivalence using $S_0(x)=0$ and $F_N\ge S_0=0$ pointwise. Therefore
\begin{align*}
E_N=\{x\in X:F_N(x)>0\}.
\end{align*}
Consequently $F_N(x)=0$ for every $x\in X\setminus E_N$, and
\begin{align*}
F_N=F_N\,\mathbb{1}_{E_N},
\end{align*}
where $\mathbb{1}_{E_N}:X\to\{0,1\}$ denotes the indicator of $E_N$.
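As a sanity check on these definitions, consider a toy system that is purely an illustrative assumption and not part of the theorem: $X=\{0,1,2\}$ with $\mathcal{B}$ the power set, $\mu(\{x\})=1/3$ for each $x$, $T(x)=x+1 \bmod 3$, $N=2$, and $(f(0),f(1),f(2))=(2,-10,1)$. Listing values in the order $x=0,1,2$,
\begin{align*}
S_1&=(2,\,-10,\,1),\\
S_2&=(2-10,\,-10+1,\,1+2)=(-8,\,-9,\,3),\\
F_2&=\max\{0,S_1,S_2\}=(2,\,0,\,3),\\
M_2 f&=\max\{S_1,\tfrac{1}{2}S_2\}=(2,\,-\tfrac{9}{2},\,\tfrac{3}{2}).
\end{align*}
In this example $\{M_2 f>0\}$ and $\{F_2>0\}$ are the same set $\{0,2\}$, and $F_2$ vanishes at the remaining point $x=1$, as the general argument predicts.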
[/step]
[step:Record the integral invariance supplied by measure preservation]
[claim:Pulling back by $T$ preserves integrals]
Let $h:X\to[0,\infty]$ be a $\mathcal{B}$-measurable function. Then
\begin{align*}
\int_X h(Tx)\,d\mu(x)=\int_X h(x)\,d\mu(x).
\end{align*}
If $h\in L^1(X,\mu)$ is real-valued, the same equality holds for $h$.
[/claim]
[proof]
For a set $A\in\mathcal{B}$, let $\mathbb{1}_A:X\to\{0,1\}$ denote its indicator. Since $T$ is measure preserving,
\begin{align*}
\int_X \mathbb{1}_A(Tx)\,d\mu(x)
=\mu(T^{-1}(A))
=\mu(A)
=\int_X \mathbb{1}_A(x)\,d\mu(x).
\end{align*}
Let $\varphi:X\to[0,\infty)$ be a nonnegative simple function of the form
\begin{align*}
\varphi=\sum_{j=1}^r \alpha_j\,\mathbb{1}_{A_j},
\end{align*}
where $r\ge 1$ is an integer, $\alpha_j\in[0,\infty)$, and $A_j\in\mathcal{B}$ for each $j$. By linearity of the Lebesgue integral,
\begin{align*}
\int_X \varphi(Tx)\,d\mu(x)
&=\sum_{j=1}^r \alpha_j\int_X \mathbb{1}_{A_j}(Tx)\,d\mu(x)\\
&=\sum_{j=1}^r \alpha_j\int_X \mathbb{1}_{A_j}(x)\,d\mu(x)\\
&=\int_X \varphi(x)\,d\mu(x).
\end{align*}
Now let $h:X\to[0,\infty]$ be measurable. Choose an increasing sequence $(\varphi_m)_{m=1}^{\infty}$ of nonnegative simple measurable functions $\varphi_m:X\to[0,\infty)$ with $\varphi_m(x)\nearrow h(x)$ for every $x\in X$. Then $\varphi_m(Tx)\nearrow h(Tx)$ for every $x\in X$. Applying the [Monotone Convergence Theorem](/theorems/???) to each of the two sequences (first and third equalities), and the simple-function case to each $\varphi_m$ (second equality), gives
\begin{align*}
\int_X h(Tx)\,d\mu(x)
&=\lim_{m\to\infty}\int_X \varphi_m(Tx)\,d\mu(x)\\
&=\lim_{m\to\infty}\int_X \varphi_m(x)\,d\mu(x)\\
&=\int_X h(x)\,d\mu(x).
\end{align*}
If $h\in L^1(X,\mu)$ is real-valued, define $h^+:X\to[0,\infty)$ and $h^-:X\to[0,\infty)$ by $h^+=\max\{h,0\}$ and $h^-=\max\{-h,0\}$. Both $h^+$ and $h^-$ are nonnegative and measurable, both have finite integral since $|h|=h^++h^-$ is integrable, and $h=h^+-h^-$. Applying the nonnegative case to $h^+$ and $h^-$ separately shows that $h^+\circ T$ and $h^-\circ T$ have the same finite integrals as $h^+$ and $h^-$. In particular $h\circ T\in L^1(X,\mu)$, and subtracting gives the asserted equality for $h$.
[/proof]
We verify by induction on $n$ that, for every integer $n\ge 0$, the iterate $T^n:X\to X$ is $\mathcal{B}$-measurable and
\begin{align*}
\int_X |f(T^n x)|\,d\mu(x)=\int_X |f(x)|\,d\mu(x).
\end{align*}
For $n=0$, $T^0=\operatorname{id}_X$ is measurable and the equality is immediate. Suppose the statement holds for some integer $n\ge 0$. Then $T^{n+1}=T\circ T^n$ is the composition of two $\mathcal{B}$-measurable maps and is therefore $\mathcal{B}$-measurable, so $|f\circ T^{n+1}|=|f|\circ T^{n+1}:X\to[0,\infty)$ is $\mathcal{B}$-measurable. Applying the nonnegative case of the claim to $h=|f|\circ T^n$, which is $\mathcal{B}$-measurable because $T^n$ is, gives
\begin{align*}
\int_X |f(T^{n+1} x)|\,d\mu(x)
=\int_X |f(T^n(Tx))|\,d\mu(x)
=\int_X |f(T^n x)|\,d\mu(x),
\end{align*}
and the induction hypothesis closes the step.
In particular, for each integer $n$ with $0\le n\le N-1$,
\begin{align*}
\int_X |f(T^n x)|\,d\mu(x)=\int_X |f(x)|\,d\mu(x)<\infty,
\end{align*}
so each function $x\mapsto f(T^n x)$ lies in $L^1(X,\mu)$, and therefore each $S_k$ lies in $L^1(X,\mu)$. The pointwise bound
\begin{align*}
0\le F_N(x)\le \sum_{n=0}^{N-1}|f(T^n x)|
\end{align*}
shows $F_N\in L^1(X,\mu)$. Applying the real-valued case of the claim to $h=F_N$ gives
\begin{align*}
\int_X F_N(Tx)\,d\mu(x)=\int_X F_N(x)\,d\mu(x)<\infty.
\end{align*}
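The invariance can be checked by hand on a toy system, which is an illustrative assumption and not part of the theorem: $X=\{0,1,2\}$ with $\mu(\{x\})=1/3$, $T(x)=x+1 \bmod 3$, $N=2$, and $(f(0),f(1),f(2))=(2,-10,1)$, for which the definitions above give $(F_2(0),F_2(1),F_2(2))=(2,0,3)$. Composing with $T$ cyclically permutes the values, so, listing values in the order $x=0,1,2$,
\begin{align*}
F_2\circ T&=(F_2(1),\,F_2(2),\,F_2(0))=(0,\,3,\,2),\\
\int_X F_2(Tx)\,d\mu(x)&=\tfrac{1}{3}(0+3+2)=\tfrac{5}{3}=\tfrac{1}{3}(2+0+3)=\int_X F_2(x)\,d\mu(x),\\
\int_X |f(Tx)|\,d\mu(x)&=\tfrac{1}{3}(10+1+2)=\tfrac{13}{3}=\int_X |f(x)|\,d\mu(x).
\end{align*}
The pointwise bound $0\le F_2\le |f|+|f\circ T|$ reads $(2,0,3)\le(12,11,3)$ in this example, with equality at $x=2$.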
[/step]
[step:Compare the positive maximum with the shifted maximum]
Let $x\in E_N$. Since $F_N(x)>0$ and $S_0(x)=0$,
\begin{align*}
F_N(x)=\max_{1\le k\le N}S_k(x).
\end{align*}
For every integer $k$ with $1\le k\le N$,
\begin{align*}
S_k(x)
&=\sum_{n=0}^{k-1}f(T^n x)\\
&=f(x)+\sum_{n=1}^{k-1}f(T^n x)\\
&=f(x)+\sum_{m=0}^{k-2}f(T^m(Tx))\\
&=f(x)+S_{k-1}(Tx),
\end{align*}
using the reindexing $m=n-1$ and the identity $T^n=T^{n-1}\circ T$ for $n\ge 1$; when $k=1$ the sums over $n$ and $m$ are empty and $S_0(Tx)=0$. Since $k-1\in\{0,\ldots,N-1\}$, we have $S_{k-1}(Tx)\le F_N(Tx)$. Hence
\begin{align*}
S_k(x)\le f(x)+F_N(Tx)
\end{align*}
for every $k\in\{1,\ldots,N\}$. Taking the maximum over $k$ gives
\begin{align*}
F_N(x)\le f(x)+F_N(Tx),
\end{align*}
and therefore
\begin{align*}
f(x)\ge F_N(x)-F_N(Tx)
\end{align*}
for every $x\in E_N$.
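The restriction to $E_N$ is not cosmetic. As a hedged illustration, take the toy system (an assumption for illustration only, not part of the theorem) $X=\{0,1,2\}$ with $T(x)=x+1 \bmod 3$, $N=2$, and $(f(0),f(1),f(2))=(2,-10,1)$, for which $(F_2(0),F_2(1),F_2(2))=(2,0,3)$ and $E_2=\{0,2\}$. Then
\begin{align*}
x=0:&\quad F_2(0)-F_2(T0)=2-0=2\le 2=f(0),\\
x=2:&\quad F_2(2)-F_2(T2)=3-2=1\le 1=f(2),\\
x=1:&\quad F_2(1)-F_2(T1)=0-3=-3>-10=f(1).
\end{align*}
The inequality holds at both points of $E_2$ and fails at the point $x=1\notin E_2$, so in general it cannot be asserted on all of $X$.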
[guided]
We work on $E_N$ because, on this set, the maximum defining $F_N$ is attained by one of the partial sums $S_1,\ldots,S_N$ rather than by the artificial term $S_0=0$. Let $x\in E_N$. Since $F_N(x)>0$ and $S_0(x)=0$, the index $k=0$ does not attain the maximum, so
\begin{align*}
F_N(x)=\max_{1\le k\le N}S_k(x).
\end{align*}
Now we compare each partial sum starting at $x$ with a partial sum starting at $Tx$. For each integer $k$ satisfying $1\le k\le N$,
\begin{align*}
S_k(x)
&=\sum_{n=0}^{k-1}f(T^n x)\\
&=f(x)+\sum_{n=1}^{k-1}f(T^n x)\\
&=f(x)+\sum_{m=0}^{k-2}f(T^m(Tx))\\
&=f(x)+S_{k-1}(Tx).
\end{align*}
The change of index is $m=n-1$, so when $n$ runs from $1$ to $k-1$, the index $m$ runs from $0$ to $k-2$; and $T^n x=T^{n-1}(Tx)$ for $n\ge 1$. Since $k-1$ lies in $\{0,\ldots,N-1\}$, the term $S_{k-1}(Tx)$ is one of the partial sums included in the definition of $F_N(Tx)$. Thus
\begin{align*}
S_{k-1}(Tx)\le F_N(Tx),
\end{align*}
and hence
\begin{align*}
S_k(x)\le f(x)+F_N(Tx).
\end{align*}
This estimate holds for every $k\in\{1,\ldots,N\}$. Taking the maximum over those $k$ gives
\begin{align*}
F_N(x)\le f(x)+F_N(Tx).
\end{align*}
Rearranging yields the Hopf pointwise inequality
\begin{align*}
f(x)\ge F_N(x)-F_N(Tx)
\end{align*}
for every $x\in E_N$.
[/guided]
[/step]
[step:Integrate the pointwise inequality and use that the maximum vanishes off $E_N$]
By the previous step and the [Linearity and Monotonicity of the Lebesgue Integral](/theorems/???),
\begin{align*}
\int_{E_N} f(x)\,d\mu(x)
&\ge \int_{E_N}F_N(x)\,d\mu(x)-\int_{E_N}F_N(Tx)\,d\mu(x).
\end{align*}
All three integrals are finite: $f\in L^1(X,\mu)$ by hypothesis and $F_N,F_N\circ T\in L^1(X,\mu)$ by the pullback-invariance step. Since $F_N\circ T\ge 0$ on $X$ and $E_N\subseteq X$, monotonicity of the integral gives the domain enlargement
\begin{align*}
\int_{E_N}F_N(Tx)\,d\mu(x)
\le \int_X F_N(Tx)\,d\mu(x).
\end{align*}
By the pullback-invariance claim, applied to the real-valued integrable function $F_N$,
\begin{align*}
\int_X F_N(Tx)\,d\mu(x)=\int_X F_N(x)\,d\mu(x).
\end{align*}
Since $F_N=F_N\,\mathbb{1}_{E_N}$ vanishes outside $E_N$,
\begin{align*}
\int_X F_N(x)\,d\mu(x)=\int_{E_N}F_N(x)\,d\mu(x).
\end{align*}
Combining these three estimates,
\begin{align*}
\int_{E_N} f(x)\,d\mu(x)
&\ge \int_{E_N}F_N(x)\,d\mu(x)-\int_{E_N}F_N(Tx)\,d\mu(x)\\
&\ge \int_{E_N}F_N(x)\,d\mu(x)-\int_X F_N(Tx)\,d\mu(x)\\
&= \int_{E_N}F_N(x)\,d\mu(x)-\int_X F_N(x)\,d\mu(x)\\
&=0.
\end{align*}
This proves $\int_{E_N} f\,d\mu\ge 0$ for the fixed integer $N\ge 1$. Since $N$ was arbitrary, the inequality holds for every integer $N\ge 1$.
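The chain of estimates can be traced numerically on a toy system, which is an illustrative assumption and not part of the theorem: $X=\{0,1,2\}$ with $\mu(\{x\})=1/3$, $T(x)=x+1 \bmod 3$, $N=2$, and $(f(0),f(1),f(2))=(2,-10,1)$. For this choice $(F_2(0),F_2(1),F_2(2))=(2,0,3)$, $(F_2(T0),F_2(T1),F_2(T2))=(0,3,2)$, and $E_2=\{0,2\}$, so
\begin{align*}
\int_{E_2} f\,d\mu&=\tfrac{1}{3}(2+1)=1,
&\int_{E_2} F_2\,d\mu&=\tfrac{1}{3}(2+3)=\tfrac{5}{3},\\
\int_{E_2} F_2\circ T\,d\mu&=\tfrac{1}{3}(0+2)=\tfrac{2}{3},
&\int_X F_2\circ T\,d\mu&=\tfrac{1}{3}(0+3+2)=\tfrac{5}{3}.
\end{align*}
The chain then reads
\begin{align*}
1\;\ge\;\tfrac{5}{3}-\tfrac{2}{3}=1\;\ge\;\tfrac{5}{3}-\tfrac{5}{3}=0.
\end{align*}
Note that $\int_{E_2}F_2\circ T\,d\mu=\tfrac{2}{3}$ differs from $\int_{E_2}F_2\,d\mu=\tfrac{5}{3}$: invariance under $T$ is available only after enlarging the domain to all of $X$, which is why the domain-enlargement step is needed.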
[guided]
We now integrate the pointwise inequality from the previous step. All integrals below are finite because $f\in L^1(X,\mu)$ by hypothesis and $F_N,F_N\circ T\in L^1(X,\mu)$ by the pullback-invariance step. Since
\begin{align*}
f(x)\ge F_N(x)-F_N(Tx)
\end{align*}
for every $x\in E_N$, monotonicity and linearity of the Lebesgue integral give
\begin{align*}
\int_{E_N} f(x)\,d\mu(x)
&\ge \int_{E_N}F_N(x)\,d\mu(x)-\int_{E_N}F_N(Tx)\,d\mu(x).
\end{align*}
The remaining task is to show that the shifted term cannot exceed the unshifted maximal-sum integral. The subtle point is that measure preservation gives exact invariance after integrating over all of $X$, not after integrating over an arbitrary subset such as $E_N$. Since $F_N\circ T\ge 0$ on $X$ and $E_N\subseteq X$, enlarging the domain of integration from $E_N$ to $X$ cannot decrease the integral of this nonnegative function:
\begin{align*}
\int_{E_N}F_N(Tx)\,d\mu(x)
\le \int_X F_N(Tx)\,d\mu(x).
\end{align*}
By the pullback-invariance claim applied to the real-valued integrable function $F_N:X\to[0,\infty)$,
\begin{align*}
\int_X F_N(Tx)\,d\mu(x)=\int_X F_N(x)\,d\mu(x).
\end{align*}
Finally, $F_N$ vanishes outside $E_N$ because $E_N=\{F_N>0\}$ and $F_N\ge 0$ pointwise, so $F_N=F_N\,\mathbb{1}_{E_N}$ and
\begin{align*}
\int_X F_N(x)\,d\mu(x)=\int_{E_N}F_N(x)\,d\mu(x).
\end{align*}
Chaining these estimates with the integrated Hopf inequality yields
\begin{align*}
\int_{E_N} f(x)\,d\mu(x)
&\ge \int_{E_N}F_N(x)\,d\mu(x)-\int_{E_N}F_N(Tx)\,d\mu(x)\\
&\ge \int_{E_N}F_N(x)\,d\mu(x)-\int_X F_N(Tx)\,d\mu(x)\\
&= \int_{E_N}F_N(x)\,d\mu(x)-\int_X F_N(x)\,d\mu(x)\\
&=0.
\end{align*}
Thus
\begin{align*}
\int_{E_N} f(x)\,d\mu(x)\ge 0.
\end{align*}
Since the integer $N\ge 1$ was arbitrary, the inequality holds for every finite maximal function $M_N f$, which is the statement of the theorem.
[/guided]
[/step]