The [Lebesgue integral](/page/Lebesgue%20Integral) is built on a deceptively simple idea: to integrate a [function](/page/Function) $f$, partition its *range* rather than its domain, and measure the size of each preimage set $\{x : f(x) \in I\}$ for each interval $I \subset \mathbb{R}$. This strategy succeeds precisely when those preimage sets are *measurable* — when we can assign them a well-defined size. A function whose preimage sets are all measurable is called a **measurable function**, and this page develops the theory of such functions from the ground up.
Why should we expect any difficulty? For continuous functions $f: \mathbb{R}^n \to \mathbb{R}$, the preimage of every open set is open, hence Borel, hence Lebesgue measurable. The trouble emerges as soon as we move beyond continuity. Physical models regularly produce functions with jump discontinuities (shock waves), functions defined only up to a set of measure zero (equivalence classes in $L^p$), and functions constructed as pointwise limits of sequences (solutions obtained by approximation). We need a class of functions broad enough to accommodate all of these, yet structured enough that the integration machinery still works.
[example: A Non-Measurable Preimage Destroys Integration]
Suppose we attempt to integrate a function $f: [0,1] \to \mathbb{R}$ for which the preimage $f^{-1}((0, \infty))$ is a non-measurable subset of $[0,1]$. Then the "positive part" $f^+ = \max(f, 0)$ satisfies $\{x : f^+(x) > 0\} = f^{-1}((0, \infty))$, which has no well-defined Lebesgue measure. The Lebesgue integral of $f^+$ requires computing
\begin{align*}
\int_0^1 f^+(x) \, d\mathcal{L}^1 = \sup\left\{ \int_0^1 s(x) \, d\mathcal{L}^1 : 0 \le s \le f^+, \; s \text{ simple} \right\},
\end{align*}
but even the simplest approximation $s = c \cdot \mathbb{1}_{f^{-1}((0,\infty))}$ for some constant $c > 0$ requires $f^{-1}((0, \infty))$ to be measurable. Without measurability of preimage sets, the entire integration theory collapses at the first step.
Such pathology is not hypothetical. The Vitali set construction (using the Axiom of Choice to select one representative in $[0,1]$ from each coset of $\mathbb{Q}$ in $\mathbb{R}$, that is, from each element of $\mathbb{R}/\mathbb{Q}$) produces a set $V \subset [0,1]$ that is not Lebesgue measurable. The indicator function $\mathbb{1}_V: [0,1] \to \{0,1\}$ has $\mathbb{1}_V^{-1}(\{1\}) = V$, so $\mathbb{1}_V$ is not Lebesgue measurable and cannot be integrated.
[/example]
The concept of a measurable function is therefore not a technicality to be checked and forgotten; it is the *precise boundary* separating functions that can be integrated from those that cannot. The theory that follows addresses three fundamental questions:
1. **Characterisation:** What conditions on $f$ guarantee measurability, and how can we check them in practice?
2. **Stability:** Which operations on measurable functions preserve measurability — sums, products, compositions, pointwise limits?
3. **Approximation:** Can every measurable function be approximated by structurally simple functions, and if so, how?
## Definition
The central definition extracts the essential property from the discussion above: a function is measurable when every preimage of a "detectable" set is itself "detectable."
[definition: Measurable Function]
Let $(X, \mathcal{F})$ and $(Y, \mathcal{G})$ be measurable spaces (sets equipped with $\sigma$-algebras). A function $f: X \to Y$ is **$(\mathcal{F}, \mathcal{G})$-measurable** if the preimage of every $\mathcal{G}$-measurable set belongs to $\mathcal{F}$:
\begin{align*}
f^{-1}(B) \in \mathcal{F} \quad \text{for every } B \in \mathcal{G}.
\end{align*}
When $Y = \mathbb{R}$ with $\mathcal{G} = \mathcal{B}(\mathbb{R})$ the Borel $\sigma$-algebra (or $Y = \overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}$ with $\mathcal{G} = \mathcal{B}(\overline{\mathbb{R}})$), we say $f$ is **$\mathcal{F}$-measurable**. When $X = \mathbb{R}^n$, $\mathcal{F} = \mathcal{L}(\mathbb{R}^n)$ is the Lebesgue $\sigma$-algebra, and $\mathcal{G} = \mathcal{B}(\mathbb{R})$, we say $f$ is **Lebesgue measurable**.
[/definition]
Several points deserve immediate attention.
First, the definition is modelled on [continuity](/page/Continuity): a function between topological spaces is continuous if and only if the preimage of every open set is open. Replacing "open" with "measurable" yields the definition above. This analogy is not merely cosmetic — it explains why measurable functions share many closure properties with continuous functions (stability under composition, algebraic operations, limits) while relaxing the topological constraints.
Second, the codomain $\sigma$-algebra $\mathcal{G}$ matters. The same function $f: \mathbb{R} \to \mathbb{R}$ may be $(\mathcal{L}(\mathbb{R}), \mathcal{B}(\mathbb{R}))$-measurable but fail to be $(\mathcal{L}(\mathbb{R}), \mathcal{L}(\mathbb{R}))$-measurable. In practice, the codomain almost always carries the Borel $\sigma$-algebra, and the phrase "measurable function" without further qualification means $\mathcal{F}$-measurable with $\mathcal{G} = \mathcal{B}(\mathbb{R})$.
Third, the extended real line $\overline{\mathbb{R}} = [-\infty, +\infty]$ is equipped with the $\sigma$-algebra generated by sets of the form $(a, +\infty]$ for $a \in \mathbb{R}$, together with $\{-\infty\}$ and $\{+\infty\}$. Functions taking values $\pm \infty$ (such as $f(x) = 1/|x|$, extended by $+\infty$ at $x = 0$, or suprema of infinite families) are accommodated naturally in this framework.
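On finite sets the preimage condition can be checked exhaustively, which makes the definition concrete. The following Python sketch does this for a hypothetical four-point space; the names `preimage`, `is_measurable`, and the toy $\sigma$-algebras `F` and `G` are illustrative assumptions, not part of the theory above.

```python
def preimage(f, domain, target_set):
    """Return f^{-1}(target_set) as a frozenset of points of the domain."""
    return frozenset(x for x in domain if f(x) in target_set)

def is_measurable(f, domain, sigma_f, sigma_g):
    """Check that f^{-1}(B) lies in sigma_f for every B in sigma_g (finite case only)."""
    return all(preimage(f, domain, b) in sigma_f for b in sigma_g)

# Hypothetical toy spaces: X = {1, 2, 3, 4} and Y = {0, 1}.
X = frozenset({1, 2, 3, 4})
# A coarse sigma-algebra on X that cannot separate 1 from 2, or 3 from 4.
F = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), X}
# The full power set of Y.
G = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}

f = lambda x: 0 if x <= 2 else 1   # constant on the atoms {1,2} and {3,4}
g = lambda x: x % 2                # splits the atoms

f_ok = is_measurable(f, X, F, G)
g_ok = is_measurable(g, X, F, G)
print(f_ok, g_ok)
```

Here `f` is constant on the atoms $\{1,2\}$ and $\{3,4\}$ of `F` and passes the check, while `g` splits those atoms, so $g^{-1}(\{1\}) = \{1, 3\}$ does not belong to `F` and the check fails.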
[example: Continuous Functions Are Borel Measurable]
Let $f: \mathbb{R}^n \to \mathbb{R}$ be continuous. We verify that $f$ is $(\mathcal{B}(\mathbb{R}^n), \mathcal{B}(\mathbb{R}))$-measurable — that is, Borel measurable.
The Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$ is generated by the open sets of $\mathbb{R}$. Since $f$ is continuous, $f^{-1}(G)$ is open in $\mathbb{R}^n$ for every open set $G \subset \mathbb{R}$. Every open set in $\mathbb{R}^n$ is a Borel set, so $f^{-1}(G) \in \mathcal{B}(\mathbb{R}^n)$.
It remains to verify that the collection $\{B \in \mathcal{B}(\mathbb{R}) : f^{-1}(B) \in \mathcal{B}(\mathbb{R}^n)\}$ is a $\sigma$-algebra containing the open sets. This follows from the set-theoretic identities:
\begin{align*}
f^{-1}(B^c) &= (f^{-1}(B))^c, \\
f^{-1}\!\left(\bigcup_{k=1}^\infty B_k\right) &= \bigcup_{k=1}^\infty f^{-1}(B_k).
\end{align*}
Since this collection is a $\sigma$-algebra containing all open sets, it contains $\mathcal{B}(\mathbb{R})$. Therefore $f^{-1}(B) \in \mathcal{B}(\mathbb{R}^n)$ for every $B \in \mathcal{B}(\mathbb{R})$.
Note that continuous functions are Borel measurable, and since $\mathcal{B}(\mathbb{R}^n) \subset \mathcal{L}(\mathbb{R}^n)$ (every Borel set is Lebesgue measurable), continuous functions are also Lebesgue measurable. The converse fails: the [Dirichlet function](/page/Function) $\mathbb{1}_{\mathbb{Q}}: \mathbb{R} \to \{0, 1\}$ is Borel measurable (since $\mathbb{Q}$ is a Borel set) but nowhere continuous.
[/example]
## Equivalent Characterisations for Real-Valued Functions
Checking the full preimage condition ($f^{-1}(B) \in \mathcal{F}$ for every Borel set $B$) directly is impractical. The Borel $\sigma$-algebra contains an enormous family of sets: it has cardinality $2^{\aleph_0}$, the same as $\mathbb{R}$ itself (though still smaller than the power set of $\mathbb{R}$, which has cardinality $2^{2^{\aleph_0}}$). A natural question is: can we reduce the verification to a much smaller generating family?
The answer is yes, and the reduction is dramatic. The Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$ is generated by the half-lines $(a, \infty)$ as $a$ ranges over $\mathbb{R}$. Since $\sigma$-algebras are closed under complements, countable unions, and countable intersections, controlling preimages of half-lines is sufficient to control preimages of all Borel sets.
[quotetheorem:525]
The equivalence of (ii)-(v) is immediate from the identities
\begin{align*}
\{f \ge a\} &= \bigcap_{k=1}^\infty \{f > a - 1/k\}, \\
\{f < a\} &= \{f \ge a\}^c, \\
\{f \le a\} &= \{f > a\}^c.
\end{align*}
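The identities above pass from strict superlevel sets to the other three families. To return from the non-strict sets to the strict ones, and so close the cycle of implications, one can use the companion identity below (a short sketch, valid for functions with values in $\overline{\mathbb{R}}$):
\begin{align*}
\{f > a\} = \bigcup_{k=1}^\infty \{f \ge a + 1/k\}.
\end{align*}
Indeed, $f(x) > a$ holds exactly when $f(x) \ge a + 1/k$ for some integer $k \ge 1$, and the right-hand side is a countable union of sets in $\mathcal{F}$.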
The passage from (ii) to (i) requires showing that the collection $\{B \in \mathcal{B}(\overline{\mathbb{R}}) : f^{-1}(B) \in \mathcal{F}\}$ is a $\sigma$-algebra. Since it contains all sets $(a, \infty]$ and these generate $\mathcal{B}(\overline{\mathbb{R}})$, this $\sigma$-algebra contains $\mathcal{B}(\overline{\mathbb{R}})$.
In practice, condition (ii) is the most commonly used: to verify that $f$ is measurable, check that the *superlevel sets* $\{f > a\}$ are measurable for every threshold $a \in \mathbb{R}$.
[remark: Restricting to Rational Thresholds]
Condition (ii) can be further reduced: it suffices to check $\{f > q\} \in \mathcal{F}$ for every $q \in \mathbb{Q}$. This is because $\{f > a\} = \bigcup_{k=1}^\infty \{f > q_k\}$ for any sequence of rationals $q_k \searrow a$, and $\mathcal{F}$ is closed under countable unions. This reduction is useful when verifying measurability of explicitly constructed functions.
[/remark]
Criterion (ii) is especially easy to apply to functions that take only finitely many values, since the superlevel set $\{f > a\}$ takes only finitely many forms as $a$ varies: it equals the full space, the empty set, or a finite union of preimage sets $f^{-1}(\{c\})$. The following example illustrates this directly.
[example: The Dirichlet Function Is Borel Measurable]
Define the Dirichlet function
\begin{align*}
f: \mathbb{R} &\to \mathbb{R} \\
x &\mapsto \mathbb{1}_{\mathbb{Q}}(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q}, \\ 0 & \text{if } x \notin \mathbb{Q}. \end{cases}
\end{align*}
We verify measurability using the criterion (ii). The superlevel set $\{f > a\}$ takes only three possible forms:
\begin{align*}
\{f > a\} = \begin{cases}
\mathbb{R} & \text{if } a < 0, \\
\mathbb{Q} & \text{if } 0 \le a < 1, \\
\varnothing & \text{if } a \ge 1.
\end{cases}
\end{align*}
The sets $\mathbb{R}$, $\varnothing$, and $\mathbb{Q} = \bigcup_{q \in \mathbb{Q}} \{q\}$ (a countable union of singletons, each of which is closed hence Borel) are all Borel sets. Therefore $f$ is Borel measurable.
This function is the standard example of a measurable function that is nowhere continuous: at every point $x_0 \in \mathbb{R}$, every neighbourhood contains both rationals and irrationals, so $f$ oscillates between $0$ and $1$ in every interval. The measurability criterion sees no difficulty here, because it asks only about the *global* structure of level sets, not about local oscillation.
[/example]
## Algebraic and Lattice Operations
A central practical requirement is that measurability be preserved under the operations of analysis: addition, multiplication, taking suprema and infima, and passing to pointwise limits. Without these closure properties, we would need to re-verify measurability after every computation — an intolerable burden.
The situation is analogous to the stability of continuous functions under algebraic operations, but with a crucial advantage: measurable functions are also closed under *pointwise limits*, a property that continuous functions conspicuously lack. This is the single most important structural difference between the two classes, and it is the reason that measure theory (rather than topology) provides the natural framework for integration.
[quotetheorem:1018]
The key to verifying these is to express superlevel sets of the compound function in terms of superlevel sets of $f$ and $g$. For instance, the identity
\begin{align*}
\{f + g > a\} = \bigcup_{q \in \mathbb{Q}} \big(\{f > q\} \cap \{g > a - q\}\big)
\end{align*}
expresses the superlevel set of $f + g$ as a countable union of intersections of measurable sets, hence measurable. The rational numbers appear because we need a countable dense set to "scan" all possible ways of splitting the threshold $a$ between $f$ and $g$. An uncountable union would not be guaranteed to remain in the $\sigma$-algebra.
For products, one uses the polarisation identity $fg = \frac{1}{4}[(f+g)^2 - (f-g)^2]$ after establishing that $h \mapsto h^2$ preserves measurability (which follows from $\{h^2 > a\} = \{h > \sqrt{a}\} \cup \{h < -\sqrt{a}\}$ for $a \ge 0$, while $\{h^2 > a\}$ is the whole space for $a < 0$).
The lattice operations $\max$ and $\min$ follow from the identities
\begin{align*}
\{\max(f,g) > a\} &= \{f > a\} \cup \{g > a\}, \\
\{\min(f,g) > a\} &= \{f > a\} \cap \{g > a\},
\end{align*}
which are finite unions and intersections of measurable sets.
[explanation: Why Sums Require the Rational Trick]
A tempting but incorrect approach to proving measurability of $f + g$ is to write $\{f + g > a\} = \{f > a - g\}$ and argue that "the right-hand side involves $f$ and $g$, both measurable, so the set is measurable." The error is subtle: the set $\{f > a - g\}$ involves a *variable* threshold $a - g(x)$ that depends on $x$, and the condition "$f(x) > h(x)$" cannot be deduced from measurability of $f$ and $h$ without further argument.
The rational union trick resolves this by replacing the variable threshold with a countable family of fixed thresholds. Precisely: $f(x) + g(x) > a$ if and only if there exists $q \in \mathbb{Q}$ with $f(x) > q$ and $g(x) > a - q$. The "if" direction is clear (add the two inequalities). For "only if": if $f(x) + g(x) > a$, then $f(x) > a - g(x)$, and by density of $\mathbb{Q}$ in $\mathbb{R}$, there exists a rational $q$ with $a - g(x) < q < f(x)$, which gives $f(x) > q$ and $g(x) > a - q$.
This technique — reducing a condition involving two functions to a countable family of conditions involving each function separately — recurs throughout measure theory.
[/explanation]
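The density step in the argument above is constructive, and a small Python sketch can make it tangible. Given exact values $f(x)$, $g(x)$ and a threshold $a$ with $f(x) + g(x) > a$, it produces a dyadic rational $q$ with $a - g(x) < q < f(x)$. The function name `rational_witness` and the sample values are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from math import floor

def rational_witness(fx, gx, a):
    """Given exact values with fx + gx > a, return a rational q with
    a - gx < q < fx, so that fx > q and gx > a - q."""
    lo, hi = a - gx, fx
    if not lo < hi:
        raise ValueError("need f(x) + g(x) > a")
    # Choose a dyadic grid finer than the gap hi - lo.
    n = 0
    while Fraction(1, 2**n) >= hi - lo:
        n += 1
    # First grid point strictly above lo; it is below hi because the
    # grid spacing 2**(-n) is smaller than the gap.
    return Fraction(floor(lo * 2**n) + 1, 2**n)

# Hypothetical sample values with f(x) + g(x) = 10/21 > 2/5.
fx, gx, a = Fraction(1, 3), Fraction(1, 7), Fraction(2, 5)
q = rational_witness(fx, gx, a)
print(q, fx > q, gx > a - q)
```

The point $x$ then lies in $\{f > q\} \cap \{g > a - q\}$ for this particular $q$, which is one of the countably many sets in the union.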
The most powerful closure property of measurable functions concerns pointwise limits. This is where measurability decisively separates from continuity.
[quotetheorem:1024]
The argument for (i) rests on the identity
\begin{align*}
\left\{\sup_{k \ge 1} f_k > a\right\} = \bigcup_{k=1}^\infty \{f_k > a\},
\end{align*}
which is a countable union of measurable sets. For the infimum, $\{\inf_{k \ge 1} f_k < a\} = \bigcup_{k=1}^\infty \{f_k < a\}$. Parts (ii) and (iii) follow from (i) via the expressions
\begin{align*}
\limsup_{k \to \infty} f_k &= \inf_{m \ge 1} \sup_{k \ge m} f_k, \\
\liminf_{k \to \infty} f_k &= \sup_{m \ge 1} \inf_{k \ge m} f_k.
\end{align*}
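Unwinding these expressions gives an explicit description of the superlevel sets of the limit superior in terms of the sets $\{f_k > b\}$. The following identity is a worked sketch of that computation, with the auxiliary index $n$ introduced here only to convert a strict inequality for an infimum into countably many conditions:
\begin{align*}
\left\{\limsup_{k \to \infty} f_k > a\right\} = \bigcup_{n=1}^\infty \bigcap_{m=1}^\infty \bigcup_{k=m}^\infty \{f_k > a + 1/n\}.
\end{align*}
Indeed, $\inf_{m} \sup_{k \ge m} f_k(x) > a$ holds exactly when there is some $n$ with $\sup_{k \ge m} f_k(x) > a + 1/n$ for every $m$, and $\{\sup_{k \ge m} f_k > b\} = \bigcup_{k \ge m} \{f_k > b\}$. Only countable unions and intersections of measurable sets appear on the right-hand side.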
The contrast with continuous functions is stark: the pointwise limit of continuous functions need not be continuous (consider $f_k(x) = x^k$ on $[0,1]$, converging to a discontinuous limit), but the pointwise limit of measurable functions is always measurable. This closure under limits is what makes measurable functions the natural domain for the Lebesgue integral: approximation schemes (truncation, regularisation, monotone convergence) produce sequences whose limits remain in the class.
[example: A Measurable Function That Is Not Borel Measurable]
The distinction between Lebesgue measurability and Borel measurability is real, not just a technicality. We construct a Lebesgue measurable function that is not Borel measurable.
Let $\varphi: [0,1] \to [0,1]$ be the Cantor-Lebesgue function (devil's staircase), which is continuous, non-decreasing, maps $[0,1]$ onto $[0,1]$, and is constant on each interval complementary to the Cantor set $C$. Define $\psi: [0,1] \to [0,2]$ by $\psi(x) = \varphi(x) + x$. Then $\psi$ is a continuous, strictly increasing bijection, hence a homeomorphism onto its image.
The Cantor set $C$ has $\mathcal{L}^1(C) = 0$, yet $\mathcal{L}^1(\psi(C)) = 1$. Indeed, $\varphi$ is constant on each interval complementary to $C$, so $\psi$ acts there as a translation and maps the complement of $C$ (which has measure $1$) onto a set of measure $1$; since $\mathcal{L}^1(\psi([0,1])) = \mathcal{L}^1([0,2]) = 2$, the remaining set $\psi(C)$ has measure $1$. Every set of positive Lebesgue measure contains a non-measurable subset (a consequence of the axiom of choice), so there exists a non-measurable set $V \subset \psi(C)$.
Now set $A = \psi^{-1}(V) \subset C$. Since $A \subset C$ and $\mathcal{L}^1(C) = 0$, the set $A$ is Lebesgue measurable (every subset of a null set is Lebesgue measurable). However, $A$ is not a Borel set: if $A$ were Borel, then $V = \psi(A)$ would be the continuous image of a Borel set, hence an analytic set, hence Lebesgue measurable — contradicting the choice of $V$.
The indicator $f = \mathbb{1}_A: [0,1] \to \{0,1\}$ is Lebesgue measurable but not Borel measurable. This example demonstrates that the Lebesgue $\sigma$-algebra is strictly larger than the Borel $\sigma$-algebra: it is the completion of the Borel $\sigma$-algebra, obtained by adjoining all subsets of Borel null sets.
[/example]
## Composition and Image Measurability
A natural operation is to compose measurable functions: if $f: X \to Y$ is $(\mathcal{F}, \mathcal{G})$-measurable and $g: Y \to Z$ is $(\mathcal{G}, \mathcal{H})$-measurable, is $g \circ f$ measurable? For continuous functions between topological spaces, the composition of continuous functions is continuous — does the same hold for measurable functions?
The answer is yes, but with an important caveat about which $\sigma$-algebras are involved. The composition rule works cleanly when the intermediate $\sigma$-algebra $\mathcal{G}$ is the *same* in both the codomain of $f$ and the domain of $g$. It can fail when different $\sigma$-algebras are used.
[quotetheorem:1019]
The verification is a one-line computation with preimages: for any $C \in \mathcal{H}$,
\begin{align*}
(g \circ f)^{-1}(C) = f^{-1}(g^{-1}(C)).
\end{align*}
Since $g$ is $(\mathcal{G}, \mathcal{H})$-measurable, $g^{-1}(C) \in \mathcal{G}$. Since $f$ is $(\mathcal{F}, \mathcal{G})$-measurable, $f^{-1}(g^{-1}(C)) \in \mathcal{F}$.
[explanation: The Composition Pitfall with Lebesgue Measurability]
A subtle failure occurs when the $\sigma$-algebras are mismatched. Let $f: \mathbb{R} \to \mathbb{R}$ be Lebesgue measurable (i.e., $(\mathcal{L}(\mathbb{R}), \mathcal{B}(\mathbb{R}))$-measurable) and let $g: \mathbb{R} \to \mathbb{R}$ be Lebesgue measurable. Is $g \circ f$ Lebesgue measurable?
Not necessarily. The composition theorem requires that $f$ be $(\mathcal{L}(\mathbb{R}), \mathcal{G})$-measurable and $g$ be $(\mathcal{G}, \mathcal{B}(\mathbb{R}))$-measurable for the *same* $\sigma$-algebra $\mathcal{G}$ in the middle. If $f$ is $(\mathcal{L}, \mathcal{B})$-measurable and $g$ is $(\mathcal{B}, \mathcal{B})$-measurable (i.e., Borel measurable), then $g \circ f$ is $(\mathcal{L}, \mathcal{B})$-measurable — the composition works. But if $g$ is only $(\mathcal{L}, \mathcal{B})$-measurable, the middle $\sigma$-algebras are $\mathcal{B}$ (codomain of $f$) and $\mathcal{L}$ (domain of $g$), and these do not match: $\mathcal{B} \subsetneq \mathcal{L}$.
The concrete failure: the example from the previous section gives a set $A$ that is Lebesgue measurable but not Borel measurable, with a continuous (hence Borel measurable) bijection $\psi$ such that $\psi^{-1}(V) = A$. Taking $g = \mathbb{1}_A$ (Lebesgue measurable) and $f = \psi^{-1}$ (continuous, hence Borel measurable), the composition $g \circ f = \mathbb{1}_A \circ \psi^{-1} = \mathbb{1}_{\psi(A)} = \mathbb{1}_V$ is not Lebesgue measurable (since $V$ is not Lebesgue measurable).
The practical rule: **a composition $g \circ f$ is safe when the outer function $g$ is Borel measurable; when $g$ is merely Lebesgue measurable, the composition may fail to be measurable, even for continuous $f$.** This is one reason why the default convention in integration theory is to take the codomain $\sigma$-algebra to be Borel.
[/explanation]
## Approximation by Simple Functions
In the construction of the [Lebesgue integral](/page/Lebesgue%20Integral), we cannot integrate a general measurable function directly. Instead, we integrate *simple functions* — finite linear combinations of indicator functions — and then define the integral of a general function as a limit. The success of this program depends on two things: (1) simple functions must be easy to integrate (they are — each reduces to a finite sum of "measure times value" terms), and (2) every measurable function must be approximable by simple functions in a controlled way.
This section addresses the second requirement. The key difficulty is the nature of the approximation: we need the approximating sequence to converge *monotonically from below* for non-negative functions, because the monotone convergence theorem (which is the engine that passes integrals through limits) requires this monotonicity.
[definition: Simple Function]
Let $(X, \mathcal{F})$ be a measurable space. A function $s: X \to \mathbb{R}$ is a **simple function** if it takes only finitely many values. That is, the range $s(X) = \{c_1, c_2, \ldots, c_N\}$ is a finite set.
Every simple function admits a unique **canonical representation**:
\begin{align*}
s = \sum_{j=1}^N c_j \, \mathbb{1}_{A_j},
\end{align*}
where $c_1, c_2, \ldots, c_N$ are the distinct values of $s$ and $A_j = s^{-1}(\{c_j\}) = \{x \in X : s(x) = c_j\}$. The sets $A_1, \ldots, A_N$ partition $X$.
The simple function $s$ is $\mathcal{F}$-measurable if and only if each set $A_j$ belongs to $\mathcal{F}$.
[/definition]
The measurability condition on $s$ is equivalent to requiring that $s$ is measurable as a function in the sense of our general definition, since $s^{-1}(B) = \bigcup_{j: c_j \in B} A_j$ is a finite union of sets in $\mathcal{F}$.
The following theorem is fundamental to the entire theory of integration. It guarantees that every non-negative measurable function is the *monotone* limit of simple functions: not just a limit in some abstract sense, but one that is non-decreasing at every point.
[quotetheorem:1020]
The explicit construction is important because it is used repeatedly in integration theory. For each integer $k \ge 1$, partition the range $[0, k)$ into $k \cdot 2^k$ equal subintervals of length $2^{-k}$, and collect all values at or above $k$ into a single level:
\begin{align*}
s_k(x) = \begin{cases}
\displaystyle \frac{j-1}{2^k} & \text{if } \frac{j-1}{2^k} \le f(x) < \frac{j}{2^k}, \quad j = 1, 2, \ldots, k \cdot 2^k, \\[6pt]
k & \text{if } f(x) \ge k.
\end{cases}
\end{align*}
In the canonical representation,
\begin{align*}
s_k = \sum_{j=1}^{k \cdot 2^k} \frac{j-1}{2^k} \, \mathbb{1}_{A_{k,j}} + k \cdot \mathbb{1}_{B_k},
\end{align*}
where
\begin{align*}
A_{k,j} &= f^{-1}\!\left(\left[\tfrac{j-1}{2^k}, \tfrac{j}{2^k}\right)\right), \qquad B_k = f^{-1}([k, \infty]).
\end{align*}
These sets are measurable because $f$ is measurable.
Monotonicity $s_k \le s_{k+1}$ follows because the partition at step $k+1$ refines the partition at step $k$: each interval $[\frac{j-1}{2^k}, \frac{j}{2^k})$ is split into $[\frac{2j-2}{2^{k+1}}, \frac{2j-1}{2^{k+1}})$ and $[\frac{2j-1}{2^{k+1}}, \frac{2j}{2^{k+1}})$, and $s_{k+1}$ takes values at least as large as $s_k$ on each piece.
For a general (not necessarily non-negative) measurable function $f: X \to \overline{\mathbb{R}}$, we decompose $f = f^+ - f^-$ and approximate each part separately.
[example: Explicit Simple Approximation of $f(x) = x^2$]
Consider $f: [0, 2] \to [0, 4]$ given by $f(x) = x^2$, which is continuous hence measurable. We compute the first few terms of the standard approximating sequence.
**For $k = 1$:** The partition divides $[0, 1)$ into two subintervals: $[0, 1/2)$ and $[1/2, 1)$. The overflow set is $\{f \ge 1\} = \{x^2 \ge 1\} = [1, 2]$.
\begin{align*}
s_1(x) = \begin{cases}
0 & \text{if } x \in [0, 1/\sqrt{2}), \\
1/2 & \text{if } x \in [1/\sqrt{2}, 1), \\
1 & \text{if } x \in [1, 2].
\end{cases}
\end{align*}
The maximum approximation error is $\max_{x \in [0,2]} |f(x) - s_1(x)|$. On $[1, 2]$, $f(x) - s_1(x) = x^2 - 1$ which reaches $3$ at $x = 2$.
**For $k = 2$:** The partition divides $[0, 2)$ into $8$ subintervals of length $1/4$: $[0, 1/4)$, $[1/4, 1/2)$, $\ldots$, $[7/4, 2)$. The overflow set is $\{f \ge 2\} = \{x^2 \ge 2\} = [\sqrt{2}, 2]$. For example:
\begin{align*}
s_2(x) &= 0 \quad \text{on } [0, 1/2), & &\text{since } x^2 < 1/4, \\
s_2(x) &= 1/4 \quad \text{on } [1/2, \sqrt{2}/2), & &\text{since } 1/4 \le x^2 < 1/2, \\
s_2(x) &= 2 \quad \text{on } [\sqrt{2}, 2], & &\text{since } x^2 \ge 2.
\end{align*}
The maximum error on the overflow set is now $f(2) - s_2(2) = 4 - 2 = 2$, which is smaller than before. On the non-overflow part, the error is at most $1/4$.
As $k$ increases, the partition of the range becomes finer (error less than $2^{-k}$ on the non-overflow region) and the overflow threshold increases (catching larger values), so $s_k \nearrow f$ pointwise. On $[0, 2]$ (where $f$ is bounded by $4$), the overflow set for $k = 4$ is the single point $\{2\}$, where $s_4(2) = 4 = f(2)$, and for $k \ge 5$ it is empty. Hence for every $k \ge 4$ the convergence is uniform with error less than $2^{-k}$.
[/example]
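The dyadic construction is easy to evaluate numerically. The Python sketch below computes $s_k$ pointwise from the value $f(x)$ and checks, on a sample grid in $[0,2]$, the monotonicity $s_k \le s_{k+1} \le f$ and the sampled errors for $f(x) = x^2$. The helper name `s` and the choice of a 1001-point grid are assumptions made for illustration; a grid check is evidence, not a proof.

```python
from math import floor

def s(k, fx):
    """Value of the k-th dyadic simple approximation at a point where f = fx >= 0."""
    if fx >= k:
        return float(k)                  # overflow level
    return floor(fx * 2**k) / 2**k       # left endpoint of the dyadic interval

f = lambda x: x * x
grid = [2 * i / 1000 for i in range(1001)]   # sample points in [0, 2]

# Monotonicity s_k <= s_{k+1} <= f at the sample points.
monotone = all(
    s(k, f(x)) <= s(k + 1, f(x)) <= f(x)
    for k in range(1, 8)
    for x in grid
)

# Largest sampled error f - s_k for each k.
errors = {k: max(f(x) - s(k, f(x)) for x in grid) for k in range(1, 7)}
print(monotone, errors[1], errors[2], errors[4] < 2**-4)
```

For $k = 1$ and $k = 2$ the largest sampled error is attained at $x = 2$ on the overflow set (values $3$ and $2$, as computed in the example), while from $k = 4$ on the sampled error stays below $2^{-k}$.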
## Convergence of Measurable Functions
In analysis, we frequently construct solutions or establish properties by taking limits of sequences of functions. A sequence of measurable functions may converge in several different senses (pointwise, almost everywhere, in measure, or in $L^p$ norm), and these notions are distinct. Understanding the relationships between them is essential for applying the convergence theorems of integration theory (monotone convergence, dominated convergence, Fatou's lemma).
The central tension is this: pointwise convergence is a natural and intuitive notion, but it is *too rigid* for many purposes, and it does not interact well with integration. Convergence in measure is more flexible (and, on finite measure spaces, weaker than a.e. convergence), but it does not control the behavior at individual points. The interplay between these notions, and the theorems that relate them, is the subject of this section.
### Almost Everywhere Equivalence
Before discussing convergence, we must address a fundamental ambiguity: in measure theory, a function is typically determined only up to a set of measure zero. Two functions that differ on a null set produce the same integral and the same $L^p$ norm. This leads to the convention of identifying functions that agree almost everywhere.
[definition: Almost Everywhere Agreement]
Let $(X, \mathcal{F}, \mu)$ be a measure space. Two measurable functions $f, g: X \to \overline{\mathbb{R}}$ are **equal $\mu$-almost everywhere** (written $f = g$ $\mu$-a.e.) if the set where they disagree has measure zero:
\begin{align*}
\mu(\{x \in X : f(x) \neq g(x)\}) = 0.
\end{align*}
[/definition]
A property that holds on a set whose complement has measure zero is said to hold **$\mu$-almost everywhere** (abbreviated $\mu$-a.e., or simply a.e. when the measure is understood).
An important question: if $f$ is measurable and $g = f$ $\mu$-a.e., is $g$ necessarily measurable? The answer depends on the $\sigma$-algebra. If $\mathcal{F}$ is *complete* with respect to $\mu$ (meaning every subset of a null set belongs to $\mathcal{F}$), then yes: any function equal a.e. to a measurable function is measurable. The Lebesgue $\sigma$-algebra $\mathcal{L}(\mathbb{R}^n)$ is complete with respect to Lebesgue measure, so this holds in the Lebesgue setting. The Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^n)$ is *not* complete, and a function equal a.e. to a Borel function need not be Borel measurable (as demonstrated by the Cantor set example above, where $\mathbb{1}_A = 0$ a.e.).
### Convergence in Measure
Pointwise convergence and convergence almost everywhere ask that $f_k(x) \to f(x)$ for every $x$ (or for almost every $x$). Convergence in measure relaxes this to a weaker, more quantitative condition: the set where $f_k$ differs from $f$ by more than $\varepsilon$ must shrink in measure.
[definition: Convergence in Measure]
Let $(X, \mathcal{F}, \mu)$ be a measure space. A sequence of measurable functions $f_k: X \to \mathbb{R}$ **converges in measure** to a measurable function $f: X \to \mathbb{R}$ (written $f_k \xrightarrow{\mu} f$) if for every $\varepsilon > 0$,
\begin{align*}
\lim_{k \to \infty} \mu\!\left(\{x \in X : |f_k(x) - f(x)| > \varepsilon\}\right) = 0.
\end{align*}
[/definition]
Convergence in measure does not require convergence at any particular point: it is a "collective" condition on the size of the set of bad points. On finite measure spaces this makes it weaker than a.e. convergence, but on general measure spaces neither notion implies the other, so the relationship is more subtle than a simple hierarchy.
[example: Convergence in Measure Without Pointwise Convergence]
The "typewriter sequence" on $[0,1]$ with Lebesgue measure demonstrates that convergence in measure does not imply convergence at any point.
Define a sequence of functions by cycling indicator functions of successively shorter subintervals through $[0,1]$. Formally, for each $k \ge 1$, write $k = 2^m + j$ with $0 \le j < 2^m$ (unique), and set
\begin{align*}
f_k = \mathbb{1}_{[j/2^m, \, (j+1)/2^m]}.
\end{align*}
The first few terms are:
\begin{align*}
f_1 &= \mathbb{1}_{[0,1]}, \quad f_2 = \mathbb{1}_{[0, 1/2]}, \quad f_3 = \mathbb{1}_{[1/2, 1]}, \\
f_4 &= \mathbb{1}_{[0, 1/4]}, \quad f_5 = \mathbb{1}_{[1/4, 1/2]}, \quad f_6 = \mathbb{1}_{[1/2, 3/4]}, \quad f_7 = \mathbb{1}_{[3/4, 1]}, \quad \ldots
\end{align*}
**Convergence in measure:** For any $\varepsilon \in (0, 1)$, the set $\{|f_k| > \varepsilon\} = [j/2^m, (j+1)/2^m]$ has measure $2^{-m}$, which tends to $0$ as $k \to \infty$ (since $m \to \infty$); for $\varepsilon \ge 1$ the set is empty. So $f_k \xrightarrow{\mu} 0$.
**Failure of pointwise convergence:** Fix any $x_0 \in [0, 1]$. For each $m$, there is at least one index $j$ with $x_0 \in [j/2^m, (j+1)/2^m]$ (and at most two, since the closed intervals overlap only at dyadic endpoints), so $f_k(x_0) = 1$ for $k = 2^m + j$. Meanwhile, for $m \ge 2$ there are at least two indices $j$ whose interval does not contain $x_0$, and for these $f_k(x_0) = 0$. Thus $f_k(x_0)$ takes each of the values $0$ and $1$ infinitely often, and $\lim_{k \to \infty} f_k(x_0)$ does not exist for any $x_0$.
[/example]
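The typewriter sequence can be generated directly from the decomposition $k = 2^m + j$. The Python sketch below does so and checks both phenomena at a sample point: the interval lengths shrink, yet within every dyadic block the sequence takes both values. The function names and the sample point `x0 = 0.3` are illustrative choices.

```python
def typewriter_interval(k):
    """Return (left, right) of the interval whose indicator is f_k, for k >= 1."""
    m = k.bit_length() - 1      # largest m with 2**m <= k
    j = k - 2**m                # 0 <= j < 2**m
    return j / 2**m, (j + 1) / 2**m

def f(k, x):
    """Evaluate the k-th typewriter function at x."""
    left, right = typewriter_interval(k)
    return 1 if left <= x <= right else 0

x0 = 0.3

# Measure of {f_k != 0} is the interval length 2**(-m), which shrinks with k.
lengths = [
    typewriter_interval(k)[1] - typewriter_interval(k)[0]
    for k in (1, 2, 4, 8, 1024)
]

# In every block 2**m <= k < 2**(m+1) with m >= 1, f_k(x0) takes both values.
both_values = all(
    {f(k, x0) for k in range(2**m, 2**(m + 1))} == {0, 1}
    for m in range(1, 12)
)
print(lengths, both_values)
```

Since both values occur in every block, the sequence $f_k(x_0)$ cannot converge, even though the measure of the set where $f_k \neq 0$ tends to zero.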
The typewriter sequence shows that convergence in measure is strictly weaker than a.e. convergence: the full sequence need not converge at any point. However, convergence in measure still retains enough structure to extract a well-behaved subsequence — this is the content of the Riesz subsequence principle.
[quotetheorem:1021]
The theorem says that while the full sequence may fail to converge pointwise (as the typewriter sequence demonstrates), one can always pass to a subsequence that does converge almost everywhere. The subsequence is constructed by selecting indices along which the exceptional sets have summable measure: for each $\ell \ge 1$, choose $k_\ell$, strictly increasing in $\ell$, so that $\mu(\{|f_{k_\ell} - f| > 2^{-\ell}\}) < 2^{-\ell}$. Since $\sum_\ell 2^{-\ell} < \infty$, the Borel-Cantelli lemma then implies that $f_{k_\ell} \to f$ a.e.
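The Borel-Cantelli step can be written out in one line. Writing $E_\ell = \{|f_{k_\ell} - f| > 2^{-\ell}\}$ (a notation introduced here for convenience), the choice $\mu(E_\ell) < 2^{-\ell}$ gives, for every $m \ge 1$,
\begin{align*}
\mu\!\left(\bigcap_{n=1}^\infty \bigcup_{\ell \ge n} E_\ell\right) \le \mu\!\left(\bigcup_{\ell \ge m} E_\ell\right) \le \sum_{\ell \ge m} 2^{-\ell} = 2^{1-m}.
\end{align*}
Letting $m \to \infty$ shows that the set of points lying in infinitely many $E_\ell$ is null. For every $x$ outside this set there is an $m$ with $|f_{k_\ell}(x) - f(x)| \le 2^{-\ell}$ for all $\ell \ge m$, hence $f_{k_\ell}(x) \to f(x)$.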
Note what the theorem does *not* say: it does not assert that the full sequence converges a.e., nor does it identify *which* subsequence works (the choice depends on the specific sequence). In applications — particularly in the proof of the Riesz-Fischer theorem for $L^p$ completeness — the Riesz subsequence principle is the key tool that links convergence in $L^p$ norm (which implies convergence in measure) to a.e. convergence of a subsequence.
The converse direction requires an additional hypothesis: finite measure.
[quotetheorem:1022]
The finite measure hypothesis is essential. For a counterexample on an infinite measure space, consider $(X, \mathcal{F}, \mu) = (\mathbb{R}, \mathcal{L}(\mathbb{R}), \mathcal{L}^1)$ and the sequence $f_k = \mathbb{1}_{[k, k+1]}$. Then $f_k(x) \to 0$ for every $x \in \mathbb{R}$ (eventually $x \notin [k, k+1]$), so the convergence is pointwise everywhere, which is stronger than a.e. convergence. Yet for any $\varepsilon \in (0,1)$, the set $\{|f_k| > \varepsilon\} = [k, k+1]$ has measure $1$ for every $k$, so $f_k$ does *not* converge to $0$ in measure. The support of $f_k$ "escapes to infinity" without shrinking in size. Finite measure prevents this: the sets $E_m = \bigcup_{k \ge m} \{|f_k - f| > \varepsilon\}$ decrease to the "tail set" $\limsup_k \{|f_k - f| > \varepsilon\}$, which is null when $f_k \to f$ a.e., and continuity of $\mu$ from above (valid because $\mu(E_1) \le \mu(X) < \infty$) gives $\mu(\{|f_m - f| > \varepsilon\}) \le \mu(E_m) \to 0$.
These two results together give the precise relationship: on finite measure spaces, a.e. convergence and convergence in measure are comparable (a.e. convergence is strictly stronger), while on general measure spaces, convergence in measure only guarantees a.e. convergence of a subsequence.
### Egoroff's Theorem: From Almost Everywhere to Almost Uniform
Pointwise convergence a.e. is difficult to work with directly because it provides no control over the *rate* of convergence — different points may converge at wildly different speeds. In analysis, we often need uniform convergence (or something close to it) to exchange limits with integrals or other operations.
Egoroff's theorem provides the bridge: on a set of finite measure, almost everywhere convergence can be "upgraded" to *uniform convergence* at the cost of removing a set of arbitrarily small measure. This is the precise sense in which a.e. convergence is "almost" uniform.
[quotetheorem:14]
The hypothesis $\mu(A) < \infty$ cannot be dropped. On $(\mathbb{R}, \mathcal{L}(\mathbb{R}), \mathcal{L}^1)$, consider $f_k = \mathbb{1}_{[k, k+1]}$. Then $f_k(x) \to 0$ for every $x \in \mathbb{R}$ (as soon as $k > x$ we have $x \notin [k, k+1]$), so $f_k \to 0$ pointwise everywhere. However, let $B$ be any measurable set with $\mathcal{L}^1(\mathbb{R} \setminus B) < 1$. Since $[k, k+1]$ has measure $1$, the intersection $B \cap [k, k+1]$ has positive measure for every $k$; otherwise $\mathbb{R} \setminus B$ would contain almost all of $[k, k+1]$ and so have measure at least $1$. On $B \cap [k, k+1]$, $f_k = 1$, so $\|f_k\|_{L^\infty(B)} = 1$ for every $k$, and uniform convergence on $B$ fails. This "escape to infinity" phenomenon is possible only because $\mu(\mathbb{R}) = \infty$.
Egoroff's theorem is a key ingredient in the proof of Lusin's theorem (below) and in establishing the relationship between different modes of convergence.
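To see the positive content of Egoroff's theorem on a finite measure space, consider the standard textbook example $f_k(x) = x^k$ on $[0,1]$ (an assumption of this sketch; it does not appear in the text above). The sequence converges to $0$ for every $x < 1$, but not uniformly on $[0,1)$; after removing the interval $(1 - \delta, 1]$ of measure $\delta$, the convergence becomes uniform. The function names below are illustrative.

```python
def f(k: int, x: float) -> float:
    """The k-th function of the sequence f_k(x) = x**k on [0, 1]."""
    return x ** k


def sup_on_good_set(k: int, delta: float) -> float:
    """Supremum of |f_k - 0| over the good set [0, 1 - delta].

    x**k is increasing on [0, 1], so the supremum is attained at the
    right endpoint 1 - delta.
    """
    return (1.0 - delta) ** k


delta = 0.01  # measure of the removed set (1 - delta, 1]

# Near x = 1 the values stay close to 1, so convergence on [0, 1) is not uniform.
near_one = [f(k, 1.0 - 1e-9) for k in (10, 100, 1000)]

# On [0, 1 - delta] the uniform error tends to 0 as k grows.
uniform_errors = [sup_on_good_set(k, delta) for k in (10, 100, 1000, 2000)]
```

The removed set can be made as small as desired by shrinking `delta`, at the cost of slower uniform convergence on what remains.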
### Lusin's Theorem: Measurable Functions Are Nearly Continuous
Lusin's theorem is a remarkable structural result: every measurable function agrees with a continuous function outside a set of arbitrarily small measure. In particular, the restriction of the measurable function to the complement of that small set is continuous.
This should be surprising. Measurable functions include objects like the Dirichlet function $\mathbb{1}_{\mathbb{Q}}$, which is discontinuous at every point. Lusin's theorem says that even for such functions, we can remove a small set and make the restriction continuous — the discontinuities, while dense, are measure-theoretically negligible.
[quotetheorem:1023]
Lusin's theorem does *not* say that $f$, viewed as a function on all of $A$, is continuous at the points of the set $\{x : \bar{f}(x) = f(x)\}$. It says that the *restriction* of $f$ to $A \setminus E$ (where $E$ is the exceptional set) is continuous in the subspace topology, because there it coincides with $\bar{f}$. The theorem moreover provides a *globally* continuous function $\bar{f}$ that matches $f$ outside a small set. This is a genuine strengthening: a function that is continuous on a subset need not be the restriction of any globally continuous function. In the standard proofs the good set can be taken compact, so that a continuous extension exists by the Tietze extension theorem.
The Dirichlet function $\mathbb{1}_{\mathbb{Q}}$ illustrates the distinction. It is discontinuous at every point of $\mathbb{R}$, so no choice of exceptional set $E$ makes it continuous *at the points* of $\mathbb{R} \setminus E$. Nevertheless, its restriction to $\mathbb{R} \setminus \mathbb{Q}$ is identically $0$, hence continuous, and the globally continuous function $\bar{f} \equiv 0$ agrees with it outside the null set $\mathbb{Q}$.
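For a function with a single jump, the continuous approximant promised by Lusin's theorem can be written down by hand. The sketch below is an illustration under stated assumptions, not a proof: it takes $f = \mathbb{1}_{[1/2, 1]}$ on $[0,1]$, uses a hypothetical linear ramp `f_bar` of width `eps`, and estimates the measure of $\{f \neq \bar{f}\}$ by counting grid midpoints with exact rational arithmetic.

```python
from fractions import Fraction


def f(x: Fraction) -> int:
    """Step function 1_[1/2, 1] on [0, 1]; discontinuous at x = 1/2."""
    return 1 if x >= Fraction(1, 2) else 0


def f_bar(x: Fraction, eps: Fraction) -> Fraction:
    """Continuous ramp: 0 up to 1/2 - eps, linear on [1/2 - eps, 1/2], then 1."""
    t = (x - (Fraction(1, 2) - eps)) / eps
    return max(Fraction(0), min(t, Fraction(1)))


def measure_of_disagreement(eps: Fraction, n: int = 1000) -> Fraction:
    """Approximate the measure of {f != f_bar} using n grid midpoints in [0, 1]."""
    midpoints = (Fraction(2 * i + 1, 2 * n) for i in range(n))
    mismatches = sum(1 for x in midpoints if f(x) != f_bar(x, eps))
    return Fraction(mismatches, n)


disagreement_coarse = measure_of_disagreement(Fraction(1, 10))
disagreement_fine = measure_of_disagreement(Fraction(1, 100))
```

The two functions differ only on the open interval $(1/2 - \varepsilon, 1/2)$, so the measure of the disagreement set is $\varepsilon$ and can be made arbitrarily small.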
## Measurability and the Lebesgue Integral
The [Lebesgue integral](/page/Lebesgue%20Integral) is constructed in three stages: first for non-negative simple functions, then for general non-negative measurable functions via the monotone approximation theorem, and finally for general measurable functions via the decomposition $f = f^+ - f^-$. At each stage, measurability is the gate: a function can pass to the next stage of the construction only if it is measurable.
[definition: Lebesgue Integral of a Simple Function]
Let $(X, \mathcal{F}, \mu)$ be a measure space and let $s: X \to [0, \infty)$ be a non-negative simple measurable function with canonical representation $s = \sum_{j=1}^N c_j \, \mathbb{1}_{A_j}$. The **Lebesgue integral** of $s$ with respect to $\mu$ is defined by
\begin{align*}
\int_X s \, d\mu = \sum_{j=1}^N c_j \, \mu(A_j),
\end{align*}
with the convention $0 \cdot \infty = 0$.
[/definition]
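The definition translates directly into a finite sum. The following sketch assumes that a simple function is given as a list of pairs $(c_j, \mu(A_j))$ with the sets $A_j$ disjoint; the function name `integrate_simple` and this input format are illustrative choices. The convention $0 \cdot \infty = 0$ has to be implemented explicitly, because floating-point arithmetic evaluates `0.0 * math.inf` to `nan`.

```python
import math


def integrate_simple(terms: list[tuple[float, float]]) -> float:
    """Integral of a non-negative simple function sum_j c_j * 1_{A_j}.

    `terms` holds pairs (c_j, mu(A_j)) with c_j in [0, inf) and
    mu(A_j) in [0, inf]. Terms with c_j == 0 or mu(A_j) == 0 are
    skipped, which implements the convention 0 * inf = 0.
    """
    total = 0.0
    for value, measure in terms:
        if value == 0 or measure == 0:
            continue
        total += value * measure
    return total


# s = 0 on a set of infinite measure, 2 on a set of measure 1/2, 3 on a set of measure 1.
finite_integral = integrate_simple([(0.0, math.inf), (2.0, 0.5), (3.0, 1.0)])

# A positive value on a set of infinite measure gives an infinite integral.
infinite_integral = integrate_simple([(1.0, math.inf)])
```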
For a general non-negative measurable function $f: X \to [0, \infty]$, the integral is defined as the supremum over all simple functions below $f$:
\begin{align*}
\int_X f \, d\mu = \sup\left\{ \int_X s \, d\mu : s \text{ simple measurable}, \; 0 \le s \le f \right\}.
\end{align*}
The monotone approximation theorem guarantees that this supremum is achieved as a limit: if $s_k \nearrow f$ is the standard approximating sequence, then $\int s_k \, d\mu \nearrow \int f \, d\mu$ by the [monotone convergence theorem](/theorems/509).
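As a worked instance of this monotone limit, take $f(x) = x$ on $[0,1]$ with Lebesgue measure, and assume the approximating sequence is the usual dyadic one, $s_k = \min(k, \lfloor 2^k f \rfloor / 2^k)$ (the text does not spell the sequence out here, so this choice is an assumption). Each $s_k$ takes the value $j/2^k$ on an interval of length $2^{-k}$, and the integrals can be computed exactly.

```python
from fractions import Fraction


def integral_of_dyadic_approximation(k: int) -> Fraction:
    """Exact integral over [0, 1] of s_k = floor(2**k * x) / 2**k.

    For k >= 1 the cap min(k, .) is inactive because f <= 1. The level
    set {s_k = j / 2**k} is [j / 2**k, (j + 1) / 2**k), of length
    1 / 2**k, for j = 0, ..., 2**k - 1; the single point x = 1 has
    measure zero and is ignored.
    """
    scale = 2 ** k
    total = Fraction(0)
    for j in range(scale):
        value = Fraction(j, scale)
        measure = Fraction(1, scale)
        total += value * measure
    return total


integrals = [integral_of_dyadic_approximation(k) for k in range(1, 8)]
```

The integrals equal $\tfrac12 - 2^{-(k+1)}$, so they increase to $\int_0^1 x \, dx = \tfrac12$, as the monotone convergence theorem predicts.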
A general measurable function $f: X \to \overline{\mathbb{R}}$ is **integrable** (or in $L^1(X, \mu)$) if $\int_X |f| \, d\mu < \infty$, in which case
\begin{align*}
\int_X f \, d\mu = \int_X f^+ \, d\mu - \int_X f^- \, d\mu.
\end{align*}
If $f$ is not measurable, then at least one of $f^+$ and $f^-$ is not measurable (otherwise $f = f^+ - f^-$ would be measurable), and the approximation by simple measurable functions breaks down for that part: the integral is undefined.
[explanation: What Happens If We Try to Integrate a Non-Measurable Function]
Suppose $f: [0,1] \to \{0, 1\}$ is the indicator of a non-measurable set $V \subset [0,1]$. The function $f$ is non-negative, so we attempt to define $\int f \, d\mathcal{L}^1$ as the supremum over simple measurable functions $s \le f$.
The difficulty is that the "natural" simple function $s = \mathbb{1}_V$ is *not measurable*, so it does not belong to the family over which we take the supremum. The measurable simple functions with $0 \le s \le f$ satisfy $s(x) \le 1$ everywhere and $s(x) = 0$ outside $V$. Because $V$ is not measurable, we cannot use $\mathbb{1}_V$ itself. We are restricted to measurable subsets of $V$: the simple functions $s = c \cdot \mathbb{1}_A$ with $0 \le c \le 1$, $A \subset V$ and $A \in \mathcal{L}(\mathbb{R})$. Such sets $A$ exist (for instance $A = \varnothing$), but the supremum of their measures is the *inner measure* $\mu_*(V)$, not the measure $\mu(V)$ (which does not exist).
The supremum $\sup \int s \, d\mathcal{L}^1$ therefore gives $\mu_*(V)$, while the analogous infimum construction (approximation from above) gives $\mu^*(V)$. For a non-measurable set, $\mu_*(V) < \mu^*(V)$, and the integral is not well-defined.
This is the exact analogue of the Riemann integral's failure for functions with "too many" discontinuities. The Lebesgue integral's great advantage is that the class of measurable functions is closed under limits, so the class of integrable functions is much larger — but the barrier of measurability remains.
[/explanation]
## Standard Techniques for Working with Measurable Functions
This section collects the most frequently used methods for establishing measurability, constructing measurable functions, and manipulating them in proofs. These techniques reappear throughout integration theory, probability, and PDE.
### Verifying Measurability via Generators
The measurability criterion (checking superlevel sets $\{f > a\}$) is the most common technique, but it is a special case of a more general principle.
Suppose the $\sigma$-algebra $\mathcal{G}$ on $Y$ is generated by a collection $\mathcal{C} \subset \mathcal{G}$, written $\mathcal{G} = \sigma(\mathcal{C})$. To verify that $f: X \to Y$ is $(\mathcal{F}, \mathcal{G})$-measurable, it suffices to check $f^{-1}(C) \in \mathcal{F}$ for every $C \in \mathcal{C}$. This is because the collection $\{B \in \mathcal{G} : f^{-1}(B) \in \mathcal{F}\}$ is a $\sigma$-algebra (by the preimage identities for complements and countable unions), and if it contains $\mathcal{C}$, it contains $\sigma(\mathcal{C}) = \mathcal{G}$.
This principle reduces measurability checks to the *generators* of the target $\sigma$-algebra:
- **$Y = \mathbb{R}$, $\mathcal{G} = \mathcal{B}(\mathbb{R})$:** Generated by open intervals, or by half-lines $(a, \infty)$. Check superlevel sets.
- **$Y = \mathbb{R}^m$, $\mathcal{G} = \mathcal{B}(\mathbb{R}^m)$:** Generated by open rectangles $\prod_{i=1}^m (a_i, b_i)$. A function $f = (f_1, \ldots, f_m): X \to \mathbb{R}^m$ is measurable if and only if each component $f_i: X \to \mathbb{R}$ is measurable.
- **$Y$ is a metric space, $\mathcal{G} = \mathcal{B}(Y)$:** Generated by open sets and, when $Y$ is separable, also by open balls $B(y, r)$. In the separable case it suffices to check $f^{-1}(B(y,r)) \in \mathcal{F}$ for each $y \in Y$ and $r > 0$.
[example: Measurability of Vector-Valued Functions]
Let $f: \mathbb{R}^n \to \mathbb{R}^m$ be given by $f(x) = (f_1(x), \ldots, f_m(x))$. We show that $f$ is $(\mathcal{L}(\mathbb{R}^n), \mathcal{B}(\mathbb{R}^m))$-measurable if and only if each component $f_i: \mathbb{R}^n \to \mathbb{R}$ is Lebesgue measurable.
For the forward direction: if $f$ is measurable and $\pi_i: \mathbb{R}^m \to \mathbb{R}$ is the projection onto the $i$-th coordinate, then $f_i = \pi_i \circ f$. Since $\pi_i$ is continuous (hence Borel measurable), the composition is $(\mathcal{L}(\mathbb{R}^n), \mathcal{B}(\mathbb{R}))$-measurable.
For the reverse direction: if each $f_i$ is measurable, then for any open rectangle $R = \prod_{i=1}^m (a_i, b_i)$,
\begin{align*}
f^{-1}(R) = \bigcap_{i=1}^m f_i^{-1}((a_i, b_i)) = \bigcap_{i=1}^m \{a_i < f_i < b_i\}.
\end{align*}
Each set $\{a_i < f_i < b_i\} = \{f_i > a_i\} \cap \{f_i < b_i\}$ is measurable, so $f^{-1}(R)$ is a finite intersection of measurable sets, hence measurable. Since open rectangles generate $\mathcal{B}(\mathbb{R}^m)$, $f$ is measurable.
[/example]
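On a finite set the generator principle can be verified by brute force. The sketch below uses a hypothetical six-point space whose $\sigma$-algebra is generated by a partition, and a three-point target whose $\sigma$-algebra is generated by two singletons; all names (`sigma_algebra`, `preimage`, `measurable_on`, `good`, `bad`) are illustrative. For finite families, closure under complements and pairwise unions is enough to obtain the generated $\sigma$-algebra.

```python
from itertools import combinations


def sigma_algebra(universe, generators):
    """Smallest family containing the generators that is closed under
    complement and union (sufficient on a finite universe)."""
    universe = frozenset(universe)
    family = {frozenset(), universe} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            if universe - a not in family:
                family.add(universe - a)
                changed = True
        for a, b in combinations(current, 2):
            if a | b not in family:
                family.add(a | b)
                changed = True
    return family


def preimage(func, domain, target_set):
    """The set {x in domain : func(x) in target_set}."""
    return frozenset(x for x in domain if func(x) in target_set)


X = range(6)
F = sigma_algebra(X, [{0, 1}, {2, 3}, {4, 5}])  # generated by a partition
Y = range(3)
C = [{0}, {1}]                                   # generators of the target
G = sigma_algebra(Y, C)                          # here: all subsets of Y


def measurable_on(func, sets):
    """Check that the preimage of every set in `sets` belongs to F."""
    return all(preimage(func, X, s) in F for s in sets)


def good(x):
    return x // 2   # constant on each partition block


def bad(x):
    return x % 3    # splits the partition blocks
```

Checking the two generators in `C` gives the same verdict as checking all eight sets in `G`: `good` passes both checks and `bad` fails both.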
### Building Measurable Functions from Continuous Ones
A rich source of measurable functions is the class of Borel measurable functions, which in turn is built up from continuous functions using the operations of analysis.
The Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$ is the smallest $\sigma$-algebra containing all open sets. Correspondingly, the class of Borel measurable functions on $\mathbb{R}^n$ is the smallest class that contains all continuous functions and is closed under pointwise limits of sequences. In fact, this class can be constructed by transfinite induction:
- **Baire class 0:** continuous functions.
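- **Baire class 1:** pointwise limits of sequences of continuous functions. This includes every derivative and the indicator of any closed interval, but not $\mathbb{1}_{\mathbb{Q}}$: a Baire class 1 function on $\mathbb{R}$ must have points of continuity, and $\mathbb{1}_{\mathbb{Q}}$ has none.
- **Baire class $\alpha$:** pointwise limits of sequences from $\bigcup_{\beta < \alpha}$ (Baire class $\beta$). For instance, $\mathbb{1}_{\mathbb{Q}}$ is of Baire class 2, via the double limit $\mathbb{1}_{\mathbb{Q}}(x) = \lim_{k \to \infty} \lim_{m \to \infty} \cos(k! \pi x)^{2m}$.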
The class of all Borel measurable functions is $\bigcup_{\alpha < \omega_1}$ (Baire class $\alpha$), where $\omega_1$ is the first uncountable ordinal. This characterisation is theoretically elegant but rarely used in practice — the generator technique above is more efficient.
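The iterated limit of $\cos(k! \pi x)^{2m}$ mentioned in the Baire hierarchy can be evaluated exactly at rational points. The sketch below relies on the elementary fact that $\cos(\pi t)^2 = 1$ exactly when $t$ is an integer and is strictly less than $1$ otherwise, so the inner limit over $m$ is $1$ or $0$ accordingly. The function name `inner_limit` is illustrative, and only rational inputs are handled, since irrational numbers cannot be represented exactly.

```python
from fractions import Fraction
from math import factorial


def inner_limit(k: int, x: Fraction) -> int:
    """Value of lim_{m -> inf} cos(k! * pi * x)**(2m) for rational x.

    The limit is 1 if k! * x is an integer (then cos(...)**2 == 1),
    and 0 otherwise (then cos(...)**2 < 1 and its powers tend to 0).
    """
    t = factorial(k) * x
    return 1 if t.denominator == 1 else 0


# For x = 2/7 the inner limit is 0 until k! becomes divisible by 7.
x = Fraction(2, 7)
inner_values = [inner_limit(k, x) for k in range(1, 10)]
```

For a rational $x = p/q$ the inner limit equals $1$ as soon as $k \ge q$, so the outer limit is $1$. For irrational $x$ the product $k! \, x$ is never an integer, so every inner limit, and hence the outer limit, is $0$.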
In practice, the most common route to measurability is the following chain of reasoning:
1. Continuous functions are Borel measurable (by the definition of the Borel $\sigma$-algebra).
2. Sums, products, and quotients (where the denominator does not vanish) of measurable functions are measurable (algebraic stability).
3. Pointwise limits of sequences of measurable functions are measurable (closure under limits).
4. Consequently, every function obtained from continuous functions by repeatedly applying these operations is measurable.
This covers the vast majority of functions encountered in analysis and probability. The standard constructions of non-measurable functions all rely on the Axiom of Choice.
### Truncation and Localisation
Many arguments in integration theory require working with bounded functions or functions of finite support. The standard tools are *truncation* and *localisation*, both of which preserve measurability.
**Truncation.** For $f: X \to \overline{\mathbb{R}}$ measurable and $M > 0$, define the truncation
\begin{align*}
f_M: X &\to [-M, M] \\
x &\mapsto \max(-M, \min(f(x), M)) = \begin{cases} M & \text{if } f(x) > M, \\ f(x) & \text{if } |f(x)| \le M, \\ -M & \text{if } f(x) < -M. \end{cases}
\end{align*}
Since $f_M = \min(\max(f, -M), M)$ and $\max$, $\min$ preserve measurability, $f_M$ is measurable. Moreover $f_M \to f$ pointwise as $M \to \infty$, and $|f_M| \le |f|$, so when $f$ is integrable the [dominated convergence theorem](/theorems/4) applies with dominating function $|f|$ and gives $\int_X f_M \, d\mu \to \int_X f \, d\mu$.
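Pointwise, truncation is a clamp to the interval $[-M, M]$. The following minimal sketch (the name `truncate` is illustrative) applies the formula to a few extended-real values, including $\pm\infty$, and records the properties used above.

```python
import math


def truncate(value: float, M: float) -> float:
    """Truncation at level M > 0: clamp `value` to the interval [-M, M]."""
    return max(-M, min(value, M))


samples = [-math.inf, -7.25, -0.5, 0.0, 1.5, 7.25, math.inf]

# Truncation at level 3: infinite and large values are clamped to +-3.
truncated = [truncate(v, 3.0) for v in samples]

# At a fixed finite value, f_M stabilises at that value once M exceeds it.
stabilising = [truncate(7.25, M) for M in (1.0, 10.0, 100.0)]
```

On these samples the two orderings $\max(-M, \min(f, M))$ and $\min(\max(f, -M), M)$ agree, and $|f_M| \le |f|$ holds at every point.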
**Localisation.** For a measurable set $A \in \mathcal{F}$, the localised function $f \cdot \mathbb{1}_A$ is measurable (product of measurable functions, with the convention $0 \cdot \infty = 0$ where $f$ takes infinite values). This is used to isolate the behaviour of $f$ on $A$ while setting it to zero elsewhere.
### The Role of Completeness
A recurring subtlety in measure theory is the distinction between complete and incomplete $\sigma$-algebras. The Lebesgue $\sigma$-algebra $\mathcal{L}(\mathbb{R}^n)$ is the completion of the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^n)$ with respect to Lebesgue measure: it includes all subsets of $\mathcal{L}^n$-null sets.
Completeness matters in two situations:
1. **Modification on null sets.** If $f$ is measurable and $g = f$ $\mu$-a.e., we want $g$ to be measurable. This holds for every such pair if and only if $(X, \mathcal{F}, \mu)$ is complete. In the Lebesgue setting, modifying a measurable function on a null set always produces a measurable function. In the Borel setting, it may not: the indicator of a non-Borel subset of a Borel null set equals $0$ a.e. but is not Borel measurable.
2. **Functions defined a.e.** A function $f$ defined only $\mu$-a.e. (for instance, a Radon-Nikodym derivative, or the pointwise limit of a sequence that converges only a.e.) can be extended to all of $X$ by setting $f = 0$ (or any fixed value) on the exceptional set where it is undefined. This extension is measurable provided the exceptional set belongs to $\mathcal{F}$. Completeness guarantees this automatically, since every subset of a null set is then measurable; without completeness, the exceptional set must be checked case by case.
For this reason, the Lebesgue $\sigma$-algebra is the standard choice for integration theory, while the Borel $\sigma$-algebra is preferred in probability theory (where completeness is handled differently, via the construction of the probability space).
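The effect of completeness on null-set modifications can be seen on a three-point toy space. The example below is a hypothetical construction for illustration only: the space `X`, the $\sigma$-algebra `F`, the measure `mu`, and the helper names are not taken from the text. The function `g` differs from the zero function only on a subset of a null set, yet it is measurable only after the $\sigma$-algebra is completed.

```python
from itertools import combinations

X = frozenset("abc")
# An incomplete sigma-algebra: {b, c} is a null set, but {b} is not in F.
F = {frozenset(), frozenset("a"), frozenset("bc"), X}
mu = {frozenset(): 0, frozenset("a"): 1, frozenset("bc"): 0, X: 1}


def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = list(s)
    return (frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r))


def completion(sigma_alg, measure):
    """Completion: all sets A | N with A in the sigma-algebra and
    N a subset of some measure-zero set of the sigma-algebra."""
    null_subsets = {n for z in sigma_alg if measure[z] == 0 for n in subsets(z)}
    return {a | n for a in sigma_alg for n in null_subsets}


def is_measurable(func, sigma_alg):
    """A finitely-valued function is measurable iff each level set is."""
    values = {func(x) for x in X}
    return all(frozenset(x for x in X if func(x) == v) in sigma_alg
               for v in values)


def f(x):
    return 0                       # the zero function, measurable for F


def g(x):
    return 1 if x == "b" else 0    # equals f outside the null set {b, c}


F_completed = completion(F, mu)
```

Here `f` is measurable for `F`, while `g` is not, because its level set $\{b\}$ is missing from `F`; in the completion, which contains all eight subsets of `X`, both functions are measurable.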
## References
- H.L. Royden, P.M. Fitzpatrick, *Real Analysis* (4th edition, 2010).
- G.B. Folland, *Real Analysis: Modern Techniques and Their Applications* (2nd edition, 1999).
- L.C. Evans, R.F. Gariepy, *Measure Theory and Fine Properties of Functions* (Revised edition, 2015).
- W. Rudin, *Real and Complex Analysis* (3rd edition, 1987).
- E.M. Stein, R. Shakarchi, *Real Analysis: Measure Theory, Integration, and Hilbert Spaces* (2005).