Lebesgue Differentiation Theorem (Theorem # 74)
Theorem
Let $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ be a locally [integrable](/page/Integral) [function](/page/Function). Then, for almost every $x \in \mathbb{R}^n$ (with respect to the Lebesgue measure $\mathcal{L}^n$), the following equality holds:
\begin{align*}
\lim_{r \to 0^+} \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |f(y) - f(x)| \, d\mathcal{L}^n(y) = 0
\end{align*}
where $B(x,r)$ denotes the open ball centered at $x$ with radius $r$. Points $x$ satisfying this property are called Lebesgue points of $f$.
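As a brief worked consequence (a standard remark, offered here as an illustration and not as part of the statement): at a Lebesgue point $x$, the ball averages of $f$ themselves converge to $f(x)$. The constant $f(x)$ can be moved inside the average, and the absolute value of an integral is at most the integral of the absolute value:
\begin{align*}
\left| \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} f(y) \, d\mathcal{L}^n(y) - f(x) \right|
&= \left| \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} \bigl(f(y) - f(x)\bigr) \, d\mathcal{L}^n(y) \right| \\
&\le \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |f(y) - f(x)| \, d\mathcal{L}^n(y).
\end{align*}
The right-hand side tends to $0$ as $r \to 0^+$ at every Lebesgue point, so the averages of $f$ over $B(x,r)$ recover $f(x)$ for almost every $x$.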
Analysis
Measure Theory
Discussion
For any locally integrable function $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, almost every point $x \in \mathbb{R}^n$ is a Lebesgue point: the average of $|f(y) - f(x)|$ over a ball $B(x,r)$ tends to $0$ as $r \to 0^+$. After reducing to $f \in L^1(\mathbb{R}^n)$, the proof approximates $f$ in $L^1$ by a continuous, compactly supported function, controls the contribution of the approximation error via the weak-type $(1,1)$ bound for the Hardy-Littlewood maximal operator and Chebyshev's inequality, and then lets the approximation error tend to zero.
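To see that "almost every" cannot in general be improved to "every", here is a small worked example; the function $H$ is chosen purely for illustration and does not appear elsewhere on this page. Take $n = 1$ and $H := \mathbb{1}_{[0,\infty)}$, so that $H(0) = 1$ and $B(0,r) = (-r,r)$ has measure $2r$. The integrand $|H(y) - H(0)|$ equals $1$ on $(-r,0)$ and $0$ on $[0,r)$, hence:
\begin{align*}
\frac{1}{2r} \int_{-r}^{r} |H(y) - H(0)| \, dy = \frac{1}{2r} \cdot r = \frac{1}{2} \quad \text{for every } r > 0.
\end{align*}
So $0$ is not a Lebesgue point of $H$. Redefining $H(0) := c$ does not help, since the average then equals $\tfrac{1}{2}\bigl(|c| + |1 - c|\bigr) \ge \tfrac{1}{2}$. Every $x \ne 0$ is a Lebesgue point, because $H$ is constant on $B(x,r)$ once $r < |x|$. The exceptional set $\{0\}$ is null, as the theorem predicts.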
Proof
[proofplan]
We prove that almost every point in $\mathbb{R}^n$ is a Lebesgue point of $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$. First, we reduce to $f \in L^1(\mathbb{R}^n)$ by a localisation argument. Then we approximate $f$ by a continuous compactly supported function $g$ and decompose the oscillation $T_r(f)(x)$ via the triangle inequality into a term involving $g$ (which vanishes as $r \to 0^+$ by continuity) and two terms involving the error $h = f - g$: an average bounded by the Hardy-Littlewood maximal function $M(h)(x)$, and the pointwise value $|h(x)|$. The weak-type $(1,1)$ bound for the maximal operator and Chebyshev's inequality show that the "bad set", where the limiting oscillation exceeds a threshold $\delta$, has measure less than $2(C+1)\varepsilon/\delta$, where $C = C(n)$ is the constant in the maximal inequality. Since $\varepsilon$ is arbitrary, the bad set is null for each $\delta$, and a countable union over $\delta = 1/k$ completes the proof.
[/proofplan]
[step:Reduce to $f \in L^1(\mathbb{R}^n)$ by localisation]
It suffices to prove the theorem for $f \in L^1(\mathbb{R}^n)$. For general $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, define $f_k := f \cdot \mathbb{1}_{B(0,k)}$ for each $k \in \mathbb{N}$. Each $f_k \in L^1(\mathbb{R}^n)$, and for $x \in B(0, k)$ and $r < \operatorname{dist}(x, \partial B(0,k))$, we have $f_k = f$ on $B(x,r)$, so $x$ is a Lebesgue point of $f$ if and only if it is a Lebesgue point of $f_k$. If the theorem holds for each $f_k$, the set of non-Lebesgue points of $f$ inside $B(0,k)$ is a null set for every $k$, and the full set of non-Lebesgue points is $\bigcup_{k=1}^\infty (\text{non-Lebesgue points of } f \text{ in } B(0,k))$, a countable union of null sets, hence null.
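A concrete instance of the localisation, with values chosen only for illustration: take $n = 1$, $f(y) = y$ (locally integrable but not in $L^1(\mathbb{R})$), $k = 3$ and $x = 5/2$. Then $\operatorname{dist}(x, \partial B(0,3)) = 1/2$, and for $r < 1/2$ the ball $B(x,r)$ lies inside $B(0,3)$, where $f_3 = f$. The two averages therefore agree and can be computed directly:
\begin{align*}
\frac{1}{2r} \int_{x-r}^{x+r} |f_3(y) - f_3(x)| \, dy = \frac{1}{2r} \int_{x-r}^{x+r} |y - x| \, dy = \frac{1}{2r} \cdot r^2 = \frac{r}{2}.
\end{align*}
This tends to $0$ as $r \to 0^+$, so $x = 5/2$ is a Lebesgue point of both $f_3$ and $f$.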
We assume $f \in L^1(\mathbb{R}^n)$ for the remainder.
[/step]
[step:Define the oscillation functional and decompose via a continuous approximation]
For $x \in \mathbb{R}^n$ and $r > 0$, define the average oscillation:
\begin{align*}
T_r(f)(x) := \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |f(y) - f(x)| \, d\mathcal{L}^n(y).
\end{align*}
We must show $\limsup_{r \to 0^+} T_r(f)(x) = 0$ for $\mathcal{L}^n$-a.e. $x$.
Fix $\varepsilon > 0$. Since $C_c(\mathbb{R}^n)$ is dense in $L^1(\mathbb{R}^n)$, there exists $g \in C_c(\mathbb{R}^n)$ with $\|f - g\|_{L^1} < \varepsilon$. Write $h := f - g$, so $f = g + h$ and $\|h\|_{L^1} < \varepsilon$.
By the triangle inequality:
\begin{align*}
T_r(f)(x) &\le \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |g(y) - g(x)| \, d\mathcal{L}^n(y) \\
&\quad + \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |h(y)| \, d\mathcal{L}^n(y) + |h(x)|.
\end{align*}
The second term on the right is bounded above by the Hardy-Littlewood maximal function $M(h)(x)$, defined by:
\begin{align*}
M(h)(x) := \sup_{r > 0} \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |h(y)| \, d\mathcal{L}^n(y).
\end{align*}
Therefore:
\begin{align*}
T_r(f)(x) \le \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |g(y) - g(x)| \, d\mathcal{L}^n(y) + M(h)(x) + |h(x)|.
\end{align*}
[guided]
The strategy is to split $f$ into a "nice" part $g$ (continuous, compactly supported) and a "small" error $h = f - g$. For the nice part, the oscillation vanishes as $r \to 0$ by continuity. For the error, we bound the oscillation using the maximal function, which we can control in measure via the weak-type inequality.
Fix $\varepsilon > 0$. Density of $C_c(\mathbb{R}^n)$ in $L^1(\mathbb{R}^n)$ provides $g \in C_c(\mathbb{R}^n)$ with $\|f - g\|_{L^1} < \varepsilon$. Set $h = f - g$.
We decompose the oscillation using $f(y) - f(x) = (g(y) - g(x)) + (h(y) - h(x))$ and the triangle inequality:
\begin{align*}
|f(y) - f(x)| \le |g(y) - g(x)| + |h(y)| + |h(x)|.
\end{align*}
Averaging over $B(x,r)$:
\begin{align*}
T_r(f)(x) \le \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |g(y) - g(x)| \, d\mathcal{L}^n(y) + \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |h(y)| \, d\mathcal{L}^n(y) + |h(x)|.
\end{align*}
The second term is at most $M(h)(x)$ by definition of the Hardy-Littlewood maximal function. The third term is $|h(x)|$ itself: it is constant in $y$, so its average over $B(x,r)$ equals $|h(x)|$.
[/guided]
[/step]
[step:Take $\limsup$ as $r \to 0$ and eliminate the continuous part]
Since $g$ is continuous, for every $x \in \mathbb{R}^n$:
\begin{align*}
\lim_{r \to 0^+} \frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |g(y) - g(x)| \, d\mathcal{L}^n(y) = 0.
\end{align*}
Taking $\limsup_{r \to 0^+}$ in the decomposition inequality:
\begin{align*}
\Omega(f)(x) := \limsup_{r \to 0^+} T_r(f)(x) \le M(h)(x) + |h(x)|.
\end{align*}
[guided]
Why does the continuous term vanish? Fix $x$ and $\varepsilon' > 0$. By continuity of $g$, there exists $\delta > 0$ such that $|g(y) - g(x)| < \varepsilon'$ whenever $|y - x| < \delta$. For $r < \delta$, every $y \in B(x,r)$ satisfies $|y - x| < r < \delta$, so:
\begin{align*}
\frac{1}{\mathcal{L}^n(B(x,r))} \int_{B(x,r)} |g(y) - g(x)| \, d\mathcal{L}^n(y) < \varepsilon'.
\end{align*}
Since $\varepsilon' > 0$ was arbitrary, the limit is $0$. The remaining terms $M(h)(x) + |h(x)|$ do not depend on $r$, so they pass through the $\limsup$ unchanged, which gives $\Omega(f)(x) \le M(h)(x) + |h(x)|$.
[/guided]
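For a concrete feel of how the continuous term vanishes, consider the tent function $g(y) := \max(0, 1 - |y|)$ on $\mathbb{R}$. This is an illustrative element of $C_c(\mathbb{R})$, not the particular $g$ produced by the density argument. At $x = 0$ we have $g(0) = 1$ and $|g(y) - g(0)| = |y|$ for $|y| \le 1$, so for $0 < r \le 1$:
\begin{align*}
\frac{1}{2r} \int_{-r}^{r} |g(y) - g(0)| \, dy = \frac{1}{2r} \int_{-r}^{r} |y| \, dy = \frac{1}{2r} \cdot r^2 = \frac{r}{2}.
\end{align*}
The average tends to $0$ linearly in $r$. More generally, if $g$ happens to be Lipschitz with constant $L$, then $|g(y) - g(x)| \le L|y - x| < Lr$ on $B(x,r)$, so the average is at most $Lr$.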
[/step]
[step:Bound the measure of the singular set using the weak-type $(1,1)$ inequality]
For $\delta > 0$, define the singular set $E_\delta := \{x \in \mathbb{R}^n : \Omega(f)(x) > \delta\}$. From the bound $\Omega(f)(x) \le M(h)(x) + |h(x)|$:
\begin{align*}
E_\delta \subseteq \{x : M(h)(x) > \delta/2\} \cup \{x : |h(x)| > \delta/2\}.
\end{align*}
The Hardy-Littlewood maximal inequality (weak-type $(1,1)$) provides a dimensional constant $C = C(n) > 0$ with:
\begin{align*}
\mathcal{L}^n(\{M(h) > \delta/2\}) \le \frac{2C}{\delta} \|h\|_{L^1}.
\end{align*}
Chebyshev's inequality (Markov's inequality) gives:
\begin{align*}
\mathcal{L}^n(\{|h| > \delta/2\}) \le \frac{2}{\delta} \|h\|_{L^1}.
\end{align*}
Combining:
\begin{align*}
\mathcal{L}^n(E_\delta) \le \frac{2(C + 1)}{\delta} \|h\|_{L^1} < \frac{2(C + 1)}{\delta} \varepsilon.
\end{align*}
[guided]
We need to show the set where $\Omega(f) > \delta$ has measure zero. The bound $\Omega(f)(x) \le M(h)(x) + |h(x)|$ means that if $\Omega(f)(x) > \delta$, then at least one of $M(h)(x)$ or $|h(x)|$ exceeds $\delta/2$. Therefore:
\begin{align*}
E_\delta \subseteq \{M(h) > \delta/2\} \cup \{|h| > \delta/2\}.
\end{align*}
For the first set, we use the weak-type $(1,1)$ bound for the Hardy-Littlewood maximal operator. This is a fundamental result in harmonic analysis: there exists a constant $C > 0$ depending only on the dimension $n$ such that for any $u \in L^1(\mathbb{R}^n)$ and $\lambda > 0$, $\mathcal{L}^n(\{M(u) > \lambda\}) \le \frac{C}{\lambda} \|u\|_{L^1}$. Applying this with $u = h$ and $\lambda = \delta/2$:
\begin{align*}
\mathcal{L}^n(\{M(h) > \delta/2\}) \le \frac{2C}{\delta} \|h\|_{L^1}.
\end{align*}
For the second set, Chebyshev's inequality gives $\mathcal{L}^n(\{|h| > \delta/2\}) \le \frac{2}{\delta} \|h\|_{L^1}$.
Adding both estimates and using $\|h\|_{L^1} = \|f - g\|_{L^1} < \varepsilon$:
\begin{align*}
\mathcal{L}^n(E_\delta) \le \frac{2C}{\delta} \|h\|_{L^1} + \frac{2}{\delta} \|h\|_{L^1} = \frac{2(C+1)}{\delta} \|h\|_{L^1} < \frac{2(C+1)}{\delta} \varepsilon.
\end{align*}
[/guided]
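The weak-type bound can be checked by hand in a simple case. The following sketch takes $n = 1$ and the illustrative function $u := \mathbb{1}_{[0,1]}$ (not the $h$ of the proof), for which $\|u\|_{L^1} = 1$. Optimising the radius in the definition of $M$ gives:
\begin{align*}
M(u)(x) =
\begin{cases}
1 & \text{if } 0 < x < 1, \\
\dfrac{1}{2x} & \text{if } x \ge 1, \\
\dfrac{1}{2(1 - x)} & \text{if } x \le 0.
\end{cases}
\end{align*}
For $0 < \lambda < 1/2$ the superlevel set is an interval, and its measure can be compared with the weak-type bound directly:
\begin{align*}
\{M(u) > \lambda\} = \left(1 - \frac{1}{2\lambda}, \, \frac{1}{2\lambda}\right), \qquad \mathcal{L}^1(\{M(u) > \lambda\}) = \frac{1}{\lambda} - 1 \le \frac{1}{\lambda} \|u\|_{L^1}.
\end{align*}
For this particular $u$ the inequality holds with constant $1$; the dimensional constant $C(n)$ valid for all of $L^1(\mathbb{R}^n)$ may be larger. Note also that $M(u)(x)$ decays only like $1/(2|x|)$ and is therefore not integrable, which is why the proof relies on a weak-type estimate and not on an $L^1$ bound for $M(h)$.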
[/step]
[step:Send $\varepsilon \to 0$ and take a countable union to conclude]
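Neither $E_\delta$ nor $C$ depends on $\varepsilon$ or on the choice of $g$, and the bound $\mathcal{L}^n(E_\delta) < \frac{2(C+1)}{\delta} \varepsilon$ holds for every $\varepsilon > 0$. Hence $\mathcal{L}^n(E_\delta) = 0$ for each $\delta > 0$. (Strictly speaking, the argument shows that $E_\delta$ is contained in measurable sets of arbitrarily small measure; by completeness of the Lebesgue measure, $E_\delta$ is then measurable and null.)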
The set of points where $\Omega(f)(x) > 0$ is:
\begin{align*}
\{x : \Omega(f)(x) > 0\} = \bigcup_{k=1}^{\infty} E_{1/k}.
\end{align*}
As a countable union of null sets, this set has $\mathcal{L}^n$-measure zero. Therefore $\limsup_{r \to 0^+} T_r(f)(x) = 0$ for $\mathcal{L}^n$-a.e. $x$, which means $\lim_{r \to 0^+} T_r(f)(x) = 0$ a.e. (since $T_r(f)(x) \ge 0$, the $\limsup$ being zero forces the limit to exist and equal zero).
[guided]
The final step ties the argument together. The bound $\mathcal{L}^n(E_\delta) < \frac{2(C+1)}{\delta} \varepsilon$ holds for every $\varepsilon > 0$ (we can always find a better continuous approximation $g$). Since the left-hand side $\mathcal{L}^n(E_\delta)$ does not depend on $\varepsilon$, we must have $\mathcal{L}^n(E_\delta) = 0$.
This holds for every $\delta > 0$. The set where $\Omega(f)(x) > 0$ -- the set of non-Lebesgue points -- equals $\bigcup_{k=1}^{\infty} E_{1/k}$: a point $x$ with $\Omega(f)(x) > 0$ satisfies $\Omega(f)(x) > 1/k$ for some $k$, so $x \in E_{1/k}$. Conversely, every $x \in E_{1/k}$ has $\Omega(f)(x) > 1/k > 0$.
Since each $E_{1/k}$ has measure zero and the union is countable:
\begin{align*}
\mathcal{L}^n(\{x : \Omega(f)(x) > 0\}) \le \sum_{k=1}^{\infty} \mathcal{L}^n(E_{1/k}) = 0.
\end{align*}
Therefore $\Omega(f)(x) = 0$ for $\mathcal{L}^n$-a.e. $x$. Since $T_r(f)(x) \ge 0$ for all $r$ and $\limsup_{r \to 0^+} T_r(f)(x) = 0$, the limit exists and equals $0$ at almost every point.
[/guided]
[/step]