[proofplan]
We restrict to the full-measure set of points whose forward orbits avoid the finite exceptional set and for which Birkhoff's ergodic theorem applies to the integrable observable $\log |f'|$. On such an orbit, the ordinary chain rule applied repeatedly gives a product formula for the derivative of the $n$-th iterate. Taking logarithms converts this product into a Birkhoff average. Ergodicity then identifies the almost-everywhere limit of this average with the space average of $\log |f'|$.
[/proofplan]
[step:Choose the full-measure set where both the orbit and the Birkhoff average are controlled]
Let $\mathbb{N}_0 := \{0,1,2,\dots\}$. Define the orbit-regular set
\begin{align*}
X_{\mathrm{reg}} := \{x \in I : f^j(x) \notin E \text{ for every } j \in \mathbb{N}_0\}.
\end{align*}
By hypothesis, $\mu(X_{\mathrm{reg}})=1$. Since $E \subset I \setminus X_{\mathrm{reg}}$, this also gives $\mu(E)=0$.
Define the observable $\varphi:I\to\mathbb{R}$ by $\varphi(t)=\log |f'(t)|$ on $I \setminus E$, with the prescribed arbitrary extension on $E$. The assumption $\log |f'| \in L^1(I,\mu)$ means precisely that $\varphi \in L^1(I,\mu)$. Since $f$ is Borel measurable and $\mu$ is $f$-invariant and ergodic, $f$ is a probability-preserving transformation of the Borel probability space $(I,\mu)$. Hence Birkhoff's ergodic theorem for measure-preserving transformations applies to $\varphi$ and gives a measurable set $X_{\mathrm{Bir}} \subset I$ with $\mu(X_{\mathrm{Bir}})=1$ such that, for every $x \in X_{\mathrm{Bir}}$,
\begin{align*}
\lim_{n \to \infty} \frac{1}{n}\sum_{j=0}^{n-1}\varphi(f^j(x)) = \int_I \varphi(t)\, d\mu(t).
\end{align*}
Here we use the [Birkhoff Ergodic Theorem](/theorems/518): if a probability-preserving transformation is ergodic and $\varphi \in L^1$, then the time averages of $\varphi$ converge almost everywhere to the space average of $\varphi$.
Define the null set of non-finite values of the observable by
\begin{align*}
N_{\varphi} := \{t \in I : \varphi(t) \notin \mathbb{R}\}.
\end{align*}
Because $\varphi \in L^1(I,\mu)$, we have $\mu(N_{\varphi})=0$. For each $j \in \mathbb{N}_0$, $f$-invariance gives
\begin{align*}
\mu(f^{-j}(N_{\varphi})) = \mu(N_{\varphi}) = 0.
\end{align*}
Set
\begin{align*}
X_{\mathrm{fin}} := I \setminus \bigcup_{j=0}^{\infty} f^{-j}(N_{\varphi}).
\end{align*}
By [countable subadditivity](/theorems/1108), $\mu(X_{\mathrm{fin}})=1$, and for every $x \in X_{\mathrm{fin}}$ and every $j \in \mathbb{N}_0$, the value $\varphi(f^j(x))$ is finite.
Set
\begin{align*}
X := X_{\mathrm{reg}} \cap X_{\mathrm{Bir}} \cap X_{\mathrm{fin}}.
\end{align*}
Because $X_{\mathrm{reg}}$, $X_{\mathrm{Bir}}$, and $X_{\mathrm{fin}}$ each have full $\mu$-measure, countable additivity gives $\mu(X)=1$.
[guided]
We need one set on which every ingredient in the argument is valid. There are two separate requirements. First, the forward orbit must avoid the finite exceptional set $E$, because the derivative of $f$ is only assumed to be available away from $E$. This gives the set
\begin{align*}
X_{\mathrm{reg}} := \{x \in I : f^j(x) \notin E \text{ for every } j \in \mathbb{N}_0\},
\end{align*}
where $\mathbb{N}_0 := \{0,1,2,\dots\}$. The theorem assumes $\mu(X_{\mathrm{reg}})=1$. Since $E \subset I \setminus X_{\mathrm{reg}}$, the same assumption also implies $\mu(E)=0$.
Second, we need Birkhoff's ergodic theorem to apply to the logarithmic observable. Define $\varphi:I\to\mathbb{R}$ by $\varphi(t)=\log |f'(t)|$ on $I \setminus E$, extending it arbitrarily on $E$. This extension does not affect the argument on $X_{\mathrm{reg}}$, because no point of such an orbit ever lies in $E$. The integrability assumption says $\varphi \in L^1(I,\mu)$.
Now Birkhoff's ergodic theorem for measure-preserving transformations applies because $f$ is Borel measurable, $\mu$ is an $f$-invariant probability measure, and $\varphi$ is integrable. The ergodicity hypothesis strengthens the conclusion: the limiting time average is the constant space average, not merely a [conditional expectation](/page/Conditional%20Expectation). Thus there is a set $X_{\mathrm{Bir}} \subset I$ with $\mu(X_{\mathrm{Bir}})=1$ such that, for every $x \in X_{\mathrm{Bir}}$,
\begin{align*}
\lim_{n \to \infty} \frac{1}{n}\sum_{j=0}^{n-1}\varphi(f^j(x)) = \int_I \varphi(t)\, d\mu(t).
\end{align*}
Here we use the Birkhoff Ergodic Theorem: for an ergodic probability-preserving transformation and an $L^1$ observable, the time averages converge almost everywhere to the integral of the observable.
There is one more point to control before taking logarithms of derivative products: the values of $\varphi$ must be finite along the whole forward orbit. Define
\begin{align*}
N_{\varphi} := \{t \in I : \varphi(t) \notin \mathbb{R}\}.
\end{align*}
Since $\varphi \in L^1(I,\mu)$, the function is finite $\mu$-almost everywhere, so $\mu(N_{\varphi})=0$. For each $j \in \mathbb{N}_0$, invariance of $\mu$ under $f$ gives
\begin{align*}
\mu(f^{-j}(N_{\varphi})) = \mu(N_{\varphi}) = 0.
\end{align*}
Therefore the countable union $\bigcup_{j=0}^{\infty} f^{-j}(N_{\varphi})$ is null. Define
\begin{align*}
X_{\mathrm{fin}} := I \setminus \bigcup_{j=0}^{\infty} f^{-j}(N_{\varphi}).
\end{align*}
Then $\mu(X_{\mathrm{fin}})=1$, and for every $x \in X_{\mathrm{fin}}$ every value $\varphi(f^j(x))$ is a finite real number.
Finally define
\begin{align*}
X := X_{\mathrm{reg}} \cap X_{\mathrm{Bir}} \cap X_{\mathrm{fin}}.
\end{align*}
Since the intersection of finitely many full-measure sets again has full measure, $\mu(X)=1$. This is the set on which the chain rule computation, the logarithmic conversion, and the ergodic limit can be combined.
[/guided]
[/step]
[step:Apply the chain rule along every regular forward orbit]
Fix $x \in X$. First observe that the derivative determined by a local representative is independent of the representative. If $F,G$ are two $C^1$ representatives near $y\in I\setminus E$, then $H:=F-G$ is $C^1$ and vanishes on a subset of $I$ accumulating at $y$, because $I$ is a nondegenerate interval. Hence the difference quotients for $H$ along that subset are zero, and the ordinary derivative $H'(y)$ must be zero. Thus $F'(y)=G'(y)$, including at endpoints of $I$.
Next we record the local composition fact needed for the ordinary chain rule. Fix $n \in \mathbb{N}$. For each $k=0,\dots,n-1$, the point $f^k(x)$ lies in $I\setminus E$, so choose a local $C^1$ representative $F_k:U_k\to\mathbb{R}$ of $f$ near $f^k(x)$. By continuity of these finitely many maps, we may shrink open intervals $V_k\subset U_k$ around $f^k(x)$, for $k=0,\dots,n-1$, and choose an open interval $V_n$ around $f^n(x)$ so that
\begin{align*}
F_k(V_k)\subset V_{k+1}
\end{align*}
for every $k=0,\dots,n-1$. Then
\begin{align*}
G_n:=F_{n-1}\circ\cdots\circ F_0
\end{align*}
is a $C^1$ map on $V_0$. After possibly shrinking $V_0$ further along this finite construction, every $z\in V_0\cap I$ has its first $n$ iterates in the corresponding $V_k$, and the equality $G_n(z)=f^n(z)$ follows by induction from $F_k=f$ on $V_k\cap I$. Thus $G_n$ is a local $C^1$ representative of the finite iterate at $x$, and the derivative $(f^n)'(x)$ exists.
We claim that, for every $n \in \mathbb{N}$, this derivative satisfies
\begin{align*}
(f^n)'(x) = \prod_{j=0}^{n-1} f'(f^j(x)).
\end{align*}
For $n=1$, this is the identity $(f^1)'(x)=f'(x)$, and $x \notin E$ because $x \in X_{\mathrm{reg}}$. The repaired hypothesis gives a local $C^1$ representative of $f$ near $x$, and the uniqueness argument above makes the ordinary derivative well-defined.
Suppose the formula holds for some $n \in \mathbb{N}$. The local composition fact applied to $n$ gives a local $C^1$ representative of $f^n$ near $x$, and the local representative hypothesis gives a local $C^1$ representative of $f$ near $f^n(x)$. Thus the ordinary chain rule applies to the local representatives of the composition $f^{n+1}=f \circ f^n$ at $x$, and it gives
\begin{align*}
(f^{n+1})'(x) = f'(f^n(x))(f^n)'(x).
\end{align*}
Substituting the induction hypothesis yields
\begin{align*}
(f^{n+1})'(x) = f'(f^n(x))\prod_{j=0}^{n-1} f'(f^j(x)) = \prod_{j=0}^{n} f'(f^j(x)).
\end{align*}
By induction, the product formula holds for every $n \in \mathbb{N}$.
[guided]
We now justify the derivative formula carefully, because the exceptional set hypothesis alone would not handle endpoints of $I$ or endpoints of components of $I \setminus E$. The repaired statement says more: whenever $y \in I \setminus E$, there is an open interval $U_y \subset \mathbb{R}$ containing $y$ and a $C^1$ map $F_y: U_y \to \mathbb{R}$ that agrees with $f$ on $U_y \cap I$. This gives an ordinary derivative $f'(y)$ by differentiating the local representative $F_y$ at $y$.
This derivative is independent of the chosen representative. If $F$ and $G$ are two choices, then $H=F-G$ is $C^1$ and is zero on $I$ near $y$. Since $I$ is nondegenerate, there are points of $I$ different from $y$ approaching $y$ from within $I$. The corresponding difference quotients of $H$ are zero, so the ordinary derivative $H'(y)$ is zero. Therefore $F'(y)=G'(y)$. The same argument covers endpoints, because a one-sided accumulating subset still determines the derivative of the ambient $C^1$ representative.
For any fixed finite time $n$, these local representatives can be composed along the finite orbit segment of $x$. For $k=0,\dots,n-1$, choose a $C^1$ representative $F_k$ near $f^k(x)$. After shrinking neighborhoods, continuity gives
\begin{align*}
F_k(V_k)\subset V_{k+1}
\end{align*}
along the orbit segment. The composition $F_{n-1}\circ\cdots\circ F_0$ is therefore a $C^1$ local representative of $f^n$ at $x$. Thus the ordinary chain rule is being applied to genuine local $C^1$ maps, not merely to pointwise derivatives.
Fix $x \in X$. Since $X \subset X_{\mathrm{reg}}$, every point $f^j(x)$ lies in $I \setminus E$. For $n=1$, the formula reads
\begin{align*}
(f^1)'(x)=f'(x),
\end{align*}
which is the definition of the derivative of the first iterate.
Assume now that, for some $n \in \mathbb{N}$, the derivative $(f^n)'(x)$ exists and satisfies
\begin{align*}
(f^n)'(x) = \prod_{j=0}^{n-1} f'(f^j(x)).
\end{align*}
Because the finite-iterate representative above gives a local $C^1$ representative of $f^n$ at $x$, and because $f^n(x) \notin E$ gives a local $C^1$ representative of $f$ at $f^n(x)$, the ordinary chain rule is applicable to the composition $f^{n+1}=f \circ f^n$ at $x$, and it gives
\begin{align*}
(f^{n+1})'(x) = f'(f^n(x))(f^n)'(x).
\end{align*}
Substituting the induction hypothesis into the right-hand side gives
\begin{align*}
(f^{n+1})'(x) = f'(f^n(x))\prod_{j=0}^{n-1} f'(f^j(x)) = \prod_{j=0}^{n} f'(f^j(x)).
\end{align*}
This proves the induction step. Hence, by induction on $n$, $(f^n)'(x)$ exists and satisfies the stated product formula for every $n \in \mathbb{N}$.
[/guided]
[/step]
[step:Convert the derivative product into a Birkhoff sum]
For $x \in X$ and $n \in \mathbb{N}$, the product formula gives
\begin{align*}
|(f^n)'(x)| = \prod_{j=0}^{n-1}|f'(f^j(x))|.
\end{align*}
Since $x \in X \subset X_{\mathrm{fin}}$, every term $\varphi(f^j(x))$ is finite for every $j \in \mathbb{N}_0$. Taking logarithms of the product therefore gives
\begin{align*}
\log |(f^n)'(x)| = \sum_{j=0}^{n-1}\log |f'(f^j(x))| = \sum_{j=0}^{n-1}\varphi(f^j(x)).
\end{align*}
Dividing by $n$ gives
\begin{align*}
\frac{1}{n}\log |(f^n)'(x)| = \frac{1}{n}\sum_{j=0}^{n-1}\varphi(f^j(x)).
\end{align*}
[/step]
[step:Pass to the limit using ergodicity]
Let $x \in X$. Since $x \in X_{\mathrm{Bir}}$, the Birkhoff average of $\varphi$ along the orbit of $x$ converges to the space average:
\begin{align*}
\lim_{n \to \infty}\frac{1}{n}\sum_{j=0}^{n-1}\varphi(f^j(x)) = \int_I \varphi(t)\, d\mu(t).
\end{align*}
Using the logarithmic identity from the previous step, we obtain
\begin{align*}
\lim_{n \to \infty}\frac{1}{n}\log |(f^n)'(x)| = \int_I \varphi(t)\, d\mu(t).
\end{align*}
Finally, by the definition of $\varphi$ on $I \setminus E$ and the already proved fact that $\mu(E)=0$, changing $\varphi$ on $E$ does not change its integral. Hence
\begin{align*}
\int_I \varphi(t)\, d\mu(t) = \int_I \log |f'(t)|\, d\mu(t).
\end{align*}
Thus, for every $x$ in the full-measure set $X$, the Lyapunov exponent exists and equals the stated integral. This proves the theorem.
[/step]