In analysis, we frequently encounter sequences of [measurable functions](/page/Measurable%20Functions) $(f_n)$ converging to a limit $f$, and need to pass to the limit in integrals, compositions, or other operations. The most natural notion, pointwise convergence, turns out to be poorly behaved: pointwise a.e. convergence is in general not induced by any metric on function spaces, and two sequences that agree outside a set of measure zero can behave completely differently on that set. The stronger notion of [convergence in $L^p$](/page/L%5Ep%20Spaces) fixes some of these deficiencies, but demands too much: it requires the *size* of the differences $|f_n - f|$ to be globally controlled, which excludes many natural approximation procedures.
Between these two extremes lies **convergence in measure**, which asks only that the set where $f_n$ and $f$ differ appreciably becomes small, without demanding anything about *how large* the difference is on that set, and without insisting that the convergence happen at every point. This mode of convergence is weaker than $L^p$ convergence and, on finite measure spaces, weaker than pointwise a.e. convergence as well, yet it is strong enough to recover pointwise a.e. convergence along subsequences. It arises naturally in probability theory (where it is called *convergence in probability*), in the construction of the [Lebesgue integral](/page/Lebesgue%20Integral), and in the study of approximation schemes for PDE.
The central surprise of the theory is that convergence in measure, despite being weaker than $L^p$ convergence, retains just enough structure to guarantee the existence of pointwise-convergent subsequences (the Riesz Subsequence Principle). This subsequence extraction is the bridge that connects measure-theoretic convergence to the pointwise world, and it is the reason convergence in measure appears as a hypothesis in results like the Vitali Convergence Theorem.
[example: Typewriter Sequence]
The **typewriter sequence** demonstrates that convergence in measure does not imply pointwise convergence at any point. Consider the measure space $([0,1), \mathcal{L}^1)$, and define a sequence of indicator functions as follows. Enumerate the dyadic intervals: for each $n \in \mathbb{N}$, write $n = 2^k + j$ where $0 \le j < 2^k$, and set
\begin{align*}
f_n := \mathbb{1}_{[j/2^k,\, (j+1)/2^k)}.
\end{align*}
The intervals have length $2^{-k}$, so for any $\varepsilon \in (0, 1)$,
\begin{align*}
\mathcal{L}^1\bigl(\{x \in [0,1) : |f_n(x)| > \varepsilon\}\bigr) = 2^{-k} \to 0 \quad \text{as } n \to \infty.
\end{align*}
Hence $f_n \to 0$ in measure. However, for every $x \in [0,1)$, each generation $k$ contains exactly one interval covering $x$, so $f_n(x) = 1$ for infinitely many $n$ and $f_n(x) = 0$ for infinitely many $n$. Thus $\limsup_{n \to \infty} f_n(x) = 1$ and $\liminf_{n \to \infty} f_n(x) = 0$ for all $x$, and the sequence does not converge pointwise anywhere.
This example shows that convergence in measure permits the "bad set" $\{|f_n - f| > \varepsilon\}$ to wander through the domain, as long as its measure shrinks. Pointwise convergence, by contrast, requires the bad set to eventually avoid each individual point.
[/example]
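The indexing in the typewriter example can be made concrete with a short Python sketch. It is illustrative only: the helper names `typewriter_interval`, `f`, and `bad_set_measure` are assumptions introduced here, and exact rational arithmetic is used so that interval endpoints are not rounded.

```python
from fractions import Fraction

def typewriter_interval(n: int) -> tuple[Fraction, Fraction]:
    """Return (a, b) with f_n = indicator of [a, b), where n = 2**k + j and 0 <= j < 2**k."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    k = n.bit_length() - 1            # largest k with 2**k <= n
    j = n - (1 << k)
    return Fraction(j, 1 << k), Fraction(j + 1, 1 << k)

def f(n: int, x: Fraction) -> int:
    """Value of the n-th typewriter function at x."""
    a, b = typewriter_interval(n)
    return 1 if a <= x < b else 0

def bad_set_measure(n: int) -> Fraction:
    """Lebesgue measure of {f_n > eps} for any 0 < eps < 1: the interval length 2**-k."""
    a, b = typewriter_interval(n)
    return b - a

x = Fraction(1, 3)
hits = [n for n in range(1, 64) if f(n, x) == 1]
print(hits)                                           # one index per generation k = 0, ..., 5
print([bad_set_measure(n) for n in (1, 2, 4, 8, 16, 32)])
```

The list `hits` contains exactly one index from each dyadic generation, which is the "sweeping" behavior described in the example: the point $x = 1/3$ is covered once per pass, while the measure of the covering interval halves from one pass to the next.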
## Definition
The definition of convergence in measure captures a single requirement: for each tolerance level $\varepsilon > 0$, the set where the approximation fails must eventually have small measure. No condition is placed on the size of $|f_n - f|$ on this exceptional set, and no condition is placed on the behavior at individual points.
[definition: Convergence in Measure]
Let $(X, \mathcal{A}, \mu)$ be a measure space, and let $f, f_1, f_2, \ldots : X \to \mathbb{R}$ be $\mathcal{A}$-measurable functions that are finite $\mu$-a.e. The sequence $(f_n)$ **converges in measure** to $f$, written $f_n \xrightarrow{\mu} f$, if for every $\varepsilon > 0$,
\begin{align*}
\lim_{n \to \infty} \mu\bigl(\{x \in X : |f_n(x) - f(x)| > \varepsilon\}\bigr) = 0.
\end{align*}
When $\mu$ is a probability measure $\mathbb{P}$, this is called **convergence in probability**, written $f_n \xrightarrow{\mathbb{P}} f$.
[/definition]
Several features of this definition deserve comment.
**The limit is unique $\mu$-a.e.** If $f_n \xrightarrow{\mu} f$ and $f_n \xrightarrow{\mu} g$, then $f = g$ $\mu$-a.e. This follows from the inclusion $\{|f - g| > \varepsilon\} \subset \{|f_n - f| > \varepsilon/2\} \cup \{|f_n - g| > \varepsilon/2\}$ and subadditivity of $\mu$.
**Finite measure is not assumed.** The definition makes sense on $\sigma$-finite or even arbitrary measure spaces. However, several key results — notably Egorov's Theorem and the implication "pointwise a.e. $\Rightarrow$ in measure" — require $\mu(X) < \infty$. On infinite measure spaces, the relationship between convergence in measure and pointwise convergence changes dramatically, as we explore in the section on the hierarchy of modes.
**The role of $\varepsilon$.** The quantifier structure is: for *every* $\varepsilon > 0$, the measures converge to zero. It is not sufficient to check a single $\varepsilon$; the measures must tend to zero for each tolerance level separately, although no uniformity in $\varepsilon$ is required. Because $\{|f_n - f| > \varepsilon_1\} \subset \{|f_n - f| > \varepsilon_2\}$ whenever $\varepsilon_1 > \varepsilon_2$, it suffices to verify the condition along any sequence $\varepsilon_k \downarrow 0$.
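For the uniqueness statement above, the argument can be written out in two lines. This is a sketch of the standard reasoning; the auxiliary index $k$ is introduced here for illustration. The left-hand side of the first line does not depend on $n$, so it must vanish for every $\varepsilon > 0$, and the second line then follows by countable subadditivity:
\begin{align*}
\mu\bigl(\{|f - g| > \varepsilon\}\bigr) &\le \mu\bigl(\{|f_n - f| > \varepsilon/2\}\bigr) + \mu\bigl(\{|f_n - g| > \varepsilon/2\}\bigr) \xrightarrow[n \to \infty]{} 0, \\
\mu\bigl(\{f \ne g\}\bigr) &= \mu\Bigl(\bigcup_{k \ge 1} \{|f - g| > 1/k\}\Bigr) \le \sum_{k \ge 1} \mu\bigl(\{|f - g| > 1/k\}\bigr) = 0.
\end{align*}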
### Cauchy in Measure
Just as Cauchy sequences in $\mathbb{R}$ converge without requiring advance knowledge of the limit, there is an analogous notion for convergence in measure.
[definition: Cauchy in Measure]
A sequence $(f_n)$ of $\mathcal{A}$-measurable functions is **Cauchy in measure** if for every $\varepsilon > 0$,
\begin{align*}
\lim_{m, n \to \infty} \mu\bigl(\{x \in X : |f_m(x) - f_n(x)| > \varepsilon\}\bigr) = 0.
\end{align*}
[/definition]
The space of measurable functions on a finite measure space, equipped with convergence in measure, is a complete topological vector space: every Cauchy-in-measure sequence converges in measure. This completeness is a consequence of the Riesz Subsequence Principle, which we establish below.
## The Hierarchy of Convergence Modes
The central difficulty in working with sequences of measurable functions is that there are many natural notions of convergence — pointwise, pointwise a.e., uniform, $L^p$, in measure — and none of them implies all the others. Understanding which implications hold, which fail, and under what additional hypotheses a weak mode can be promoted to a stronger one is essential for applying limit theorems correctly.
### From $L^p$ to Measure: What Is Lost
Convergence in $L^p$ is the most quantitative of the common modes. It controls both the *size* of the deviation $|f_n - f|$ and the *extent* of the set where $f_n$ deviates from $f$. Convergence in measure retains only the latter control.
[quotetheorem:1075]
The mechanism is Chebyshev's inequality: for any $\varepsilon > 0$,
\begin{align*}
\mu\bigl(\{|f_n - f| > \varepsilon\}\bigr) \le \frac{1}{\varepsilon^p} \int_X |f_n - f|^p \, d\mu = \frac{\|f_n - f\|_{L^p}^p}{\varepsilon^p} \to 0.
\end{align*}
The converse fails: convergence in measure does not imply $L^p$ convergence, even on finite measure spaces. The obstruction is that convergence in measure controls the *measure* of the exceptional set but not the *height* of the function on that set.
[example: Convergence in Measure Without $L^p$ Convergence]
On $([0,1), \mathcal{L}^1)$, define $g_n := n \cdot \mathbb{1}_{[0, 1/n)}$ for each $n \in \mathbb{N}$. Then for any $\varepsilon > 0$, once $n > \varepsilon$ we have
\begin{align*}
\mathcal{L}^1\bigl(\{|g_n| > \varepsilon\}\bigr) = \mathcal{L}^1\bigl([0, 1/n)\bigr) = \frac{1}{n} \to 0,
\end{align*}
so $g_n \to 0$ in measure. However,
\begin{align*}
\|g_n\|_{L^1} = n \cdot \frac{1}{n} = 1 \quad \text{for all } n,
\end{align*}
so $g_n \not\to 0$ in $L^1$. For $L^p$ with $p > 1$, the failure is even more dramatic: $\|g_n\|_{L^p}^p = n^p / n = n^{p-1} \to \infty$.
The functions $g_n$ form a sequence of increasingly tall, increasingly narrow spikes. Convergence in measure detects that the spike is narrow; $L^p$ convergence additionally requires that the spike is short enough for the integral to vanish.
[/example]
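The two computations in the tall-spike example can be checked with exact arithmetic. The sketch below is illustrative; the function names are assumptions introduced here, and the formulas are simply the closed forms from the example (height $n$, width $1/n$).

```python
from fractions import Fraction

def spike_bad_set_measure(n: int, eps: Fraction) -> Fraction:
    """Measure of {g_n > eps}, where g_n = n on [0, 1/n) and 0 elsewhere."""
    return Fraction(1, n) if n > eps else Fraction(0)

def spike_lp_norm_p(n: int, p: int) -> Fraction:
    """p-th power of the L^p norm of g_n: height**p times width, i.e. n**(p-1)."""
    return Fraction(n) ** p * Fraction(1, n)

eps = Fraction(1, 2)
for n in (1, 10, 100, 1000):
    # columns: n, measure of the bad set, L^1 norm, squared L^2 norm
    print(n, spike_bad_set_measure(n, eps), spike_lp_norm_p(n, 1), spike_lp_norm_p(n, 2))
```

The second column shrinks like $1/n$ while the third stays equal to $1$ and the fourth grows like $n$, matching the claim that the spike becomes narrow without becoming small in $L^p$.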
### From Pointwise a.e. to Measure: The Role of Finite Measure
On a finite measure space, pointwise a.e. convergence is stronger than convergence in measure. This is a consequence of Egorov's Theorem, which provides a powerful intermediary: pointwise convergence can be upgraded to *almost-uniform* convergence, which in turn implies convergence in measure.
[quotetheorem:896]
Egorov's Theorem says that on a finite measure space, pointwise a.e. convergence is "almost" uniform convergence — the uniformity can fail only on a set of arbitrarily small measure. The finite measure hypothesis is indispensable: on $(\mathbb{R}, \mathcal{L}^1)$, the sequence $f_n := \mathbb{1}_{[n, n+1]}$ converges pointwise to zero everywhere, but the convergence is not uniform on any set whose complement has finite measure, because $\sup_{x \in \mathbb{R} \setminus A} |f_n(x)| = 1$ for all $n$ large enough that $[n, n+1]$ is not contained in $A$.
From Egorov's Theorem, the implication "pointwise a.e. $\Rightarrow$ in measure" on finite measure spaces is immediate: given $\varepsilon, \delta > 0$, choose $A$ with $\mu(A) < \delta$ and $N$ such that $|f_n - f| \le \varepsilon$ on $X \setminus A$ for all $n \ge N$. Then $\{|f_n - f| > \varepsilon\} \subset A$, so $\mu(\{|f_n - f| > \varepsilon\}) < \delta$.
**On infinite measure spaces, pointwise a.e. convergence does NOT imply convergence in measure.** The translating indicator $f_n = \mathbb{1}_{[n, n+1]}$ converges pointwise to $0$ everywhere, yet $\mathcal{L}^1(\{f_n > \varepsilon\}) = 1$ for all $n$ and all $\varepsilon \in (0, 1)$. The "bad set" does not shrink; it merely drifts to infinity. Convergence in measure requires the bad set to have small measure, not just to avoid each fixed point eventually.
### From Measure to Pointwise: The Riesz Subsequence Principle
The typewriter sequence (Example above) shows that convergence in measure does not imply pointwise convergence. However, convergence in measure always guarantees the existence of a subsequence that converges pointwise a.e. This is the **Riesz Subsequence Principle**, one of the most frequently used tools in real analysis and probability.
[quotetheorem:1021]
The principle works by choosing the subsequence so that the convergence is fast enough to apply the Borel--Cantelli lemma. Specifically, one selects $n_k$ so that $\mu(\{|f_{n_k} - f| > 2^{-k}\}) < 2^{-k}$. Then the sets $E_k := \{|f_{n_k} - f| > 2^{-k}\}$ satisfy $\sum_k \mu(E_k) < \infty$, so $\mu(\limsup_k E_k) = 0$ by Borel--Cantelli. For every $x \notin \limsup_k E_k$, there exists $K$ such that $|f_{n_k}(x) - f(x)| \le 2^{-k}$ for all $k \ge K$, giving pointwise convergence.
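The selection rule in this argument can be run on the typewriter sequence. The Python sketch below is a hedged illustration: `typewriter_bad_measure` hard-codes the closed form $2^{-k}$ for that specific sequence, and `riesz_indices` together with its `search_limit` parameter are hypothetical helpers introduced here, not part of any library.

```python
from fractions import Fraction
from typing import Callable

def typewriter_bad_measure(n: int, eps: Fraction) -> Fraction:
    """Measure of {f_n > eps} for the typewriter sequence: 2**-k if eps < 1, else 0."""
    k = n.bit_length() - 1
    return Fraction(1, 1 << k) if eps < 1 else Fraction(0)

def riesz_indices(bad_measure: Callable[[int, Fraction], Fraction], count: int,
                  search_limit: int = 10**6) -> list[int]:
    """Pick n_1 < n_2 < ... with bad_measure(n_k, 2**-k) < 2**-k."""
    indices, n = [], 1
    for k in range(1, count + 1):
        tol = Fraction(1, 1 << k)
        while bad_measure(n, tol) >= tol:
            n += 1
            if n > search_limit:
                raise RuntimeError("no admissible index found within search_limit")
        indices.append(n)
        n += 1                        # keep the subsequence strictly increasing
    return indices

idx = riesz_indices(typewriter_bad_measure, 6)
total = sum(typewriter_bad_measure(n, Fraction(1, 1 << k)) for k, n in enumerate(idx, start=1))
print(idx, total)
```

For the typewriter sequence the selected indices are powers of two, so the subsequence consists of the indicators of $[0, 2^{-m})$, which converge to $0$ at every $x > 0$. The partial sums of $\mu(E_k)$ stay below $1$, which is the summability that Borel--Cantelli needs.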
The Riesz Subsequence Principle has a partial converse that characterizes convergence in measure entirely through subsequences.
[quotetheorem:1076]
This characterization is the measure-theoretic analogue of the fact that a sequence in a topological space converges to $x$ if and only if every subsequence has a further subsequence converging to $x$. It provides a powerful technique: to prove convergence in measure, it suffices to show that every subsequence has a pointwise-a.e.-convergent sub-subsequence with the correct limit. This "subsequence of subsequence" argument appears throughout probability theory and the theory of [Sobolev spaces](/page/Sobolev%20Spaces).
### Summary of Implications
The following diagram summarizes the relationships on a **finite** measure space $(X, \mathcal{A}, \mu)$ with $\mu(X) < \infty$:
\begin{align*}
&\text{Uniform} \implies \text{Pointwise} \implies \text{Pointwise a.e.} \implies \text{In measure} \\
&\text{$L^p$} \implies \text{In measure} \\
&\text{In measure} \implies \text{Pointwise a.e. (subsequence only)}
\end{align*}
On $\sigma$-finite spaces with infinite total measure, the implication "pointwise a.e. $\Rightarrow$ in measure" fails. No arrow from convergence in measure back to $L^p$ exists without additional hypotheses (see the Vitali Convergence Theorem below).
## Promoting Convergence in Measure to $L^p$ Convergence
The gap between convergence in measure and $L^p$ convergence, demonstrated by the tall-spike example above, arises from two distinct pathological behaviors. First, the functions may develop *concentration*: their mass piles up on a shrinking set (as with $g_n = n \cdot \mathbb{1}_{[0,1/n)}$). Second, on infinite measure spaces, mass may *escape to infinity*, either by drifting away (as with the translating indicators $\mathbb{1}_{[n, n+1]}$ on $\mathbb{R}$) or by spreading thinly over ever larger sets (as with $\tfrac{1}{n} \mathbb{1}_{[0,n]}$, which does converge to $0$ in measure). The Vitali Convergence Theorem identifies the precise conditions that rule out both pathologies.
### Uniform Integrability
The first pathology — concentration — is excluded by requiring the integrals of $(f_n)$ to be uniformly controlled. The correct condition is **uniform integrability**, which demands that the tails of the integrals $\int_{\{|f_n| > t\}} |f_n| \, d\mu$ are uniformly small.
[definition: Uniform Integrability]
Let $(X, \mathcal{A}, \mu)$ be a measure space. A family $\mathcal{F} \subset L^1(X, \mu)$ is **uniformly integrable** if
\begin{align*}
\lim_{t \to \infty} \sup_{f \in \mathcal{F}} \int_{\{|f| > t\}} |f| \, d\mu = 0.
\end{align*}
[/definition]
Uniform integrability prevents the mass of the functions from concentrating on sets of small measure. A single function $f \in L^1$ always satisfies this condition (by dominated convergence applied to $|f| \cdot \mathbb{1}_{\{|f| > t\}}$), but a sequence may fail: the tall-spike sequence $g_n = n \cdot \mathbb{1}_{[0,1/n)}$ has $\int_{\{g_n > t\}} g_n \, d\mathcal{L}^1 = 1$ for $t < n$, so the supremum over $n$ does not vanish.
A useful equivalent characterization on finite measure spaces: $\mathcal{F}$ is uniformly integrable if and only if $\sup_{f \in \mathcal{F}} \|f\|_{L^1} < \infty$ and for every $\delta > 0$ there exists $\eta > 0$ such that
\begin{align*}
\mu(A) < \eta \implies \sup_{f \in \mathcal{F}} \int_A |f| \, d\mu < \delta.
\end{align*}
This reformulation makes the connection to absolute continuity of the integral explicit: uniform integrability says that the integrals of all functions in $\mathcal{F}$ are uniformly absolutely continuous with respect to $\mu$.
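The tail integrals in the definition can be compared for the tall-spike family and for a bounded family. The sketch below is illustrative and rests on two assumptions: the comparison family $u_n = \mathbb{1}_{[0,1/n)}$ is a hypothetical example introduced here, and the supremum over all $n$ is replaced by a maximum over a finite range of indices.

```python
from fractions import Fraction

def spike_tail(n: int, t: Fraction) -> Fraction:
    """Integral of g_n over {g_n > t} for g_n = n * 1_[0,1/n): 1 if n > t, else 0."""
    return Fraction(1) if n > t else Fraction(0)

def bounded_tail(n: int, t: Fraction) -> Fraction:
    """Same tail integral for the hypothetical family u_n = 1_[0,1/n), bounded by 1."""
    return Fraction(1, n) if t < 1 else Fraction(0)

def sup_tail(tail, t: Fraction, indices) -> Fraction:
    """Finite stand-in for the supremum over the family (assumption: indices is finite)."""
    return max(tail(n, t) for n in indices)

indices = range(1, 2001)
for t in (1, 10, 100, 1000):
    print(t, sup_tail(spike_tail, Fraction(t), indices), sup_tail(bounded_tail, Fraction(t), indices))
```

For the spikes the maximum stays at $1$ for every threshold $t$ below the largest index, so it cannot tend to $0$; for the bounded family it is already $0$ once $t \ge 1$.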
### Tightness
The second pathology — escape of mass to infinity — is relevant only on spaces of infinite measure and is controlled by **tightness**.
[definition: Tightness]
Let $(X, \mathcal{A}, \mu)$ be a measure space. A family $\mathcal{F} \subset L^1(X, \mu)$ is **tight** if for every $\delta > 0$ there exists a measurable set $K \in \mathcal{A}$ with $\mu(K) < \infty$ such that
\begin{align*}
\sup_{f \in \mathcal{F}} \int_{X \setminus K} |f| \, d\mu < \delta.
\end{align*}
[/definition]
On a finite measure space, every family in $L^1$ is automatically tight (take $K = X$). On $(\mathbb{R}, \mathcal{L}^1)$, the translating indicators $\mathbb{1}_{[n,n+1]}$ are not tight: for any set $K$ with $\mathcal{L}^1(K) < \infty$, we have $\int_{\mathbb{R} \setminus K} \mathbb{1}_{[n,n+1]} \, d\mathcal{L}^1 = 1 - \mathcal{L}^1(K \cap [n, n+1]) \to 1$ as $n \to \infty$, because the sets $K \cap [n, n+1]$ are essentially disjoint pieces of a set of finite measure.
### The Vitali Convergence Theorem
With both obstructions identified — concentration and escape — the Vitali Convergence Theorem characterizes exactly when convergence in measure can be promoted to $L^1$ convergence.
[quotetheorem:950]
On a finite measure space, condition (3) is automatic, and the theorem reduces to: $f_n \to f$ in $L^1$ if and only if $f_n \xrightarrow{\mu} f$ and $(f_n)$ is uniformly integrable. This is the form most commonly encountered.
Each of the three conditions is necessary, and dropping any one of them produces a counterexample:
- **Without convergence in measure:** Take $f_n = (-1)^n \mathbb{1}_{[0,1]}$ on $([0,1], \mathcal{L}^1)$. The sequence is uniformly integrable and tight, but $f_n$ does not converge in measure (and indeed does not converge in $L^1$).
- **Without uniform integrability:** The tall-spike sequence $g_n = n \cdot \mathbb{1}_{[0,1/n)}$ converges to $0$ in measure and is tight, but $\|g_n\|_{L^1} = 1 \not\to 0$. The mass concentrates without bound.
- **Without tightness:** On $(\mathbb{R}, \mathcal{L}^1)$, define $h_n := (1/n) \cdot \mathbb{1}_{[0,n]}$. Since $\|h_n\|_{L^\infty} = 1/n \to 0$, for any $\varepsilon > 0$ we have $\{h_n > \varepsilon\} = \varnothing$ once $n > 1/\varepsilon$, so $h_n \to 0$ in measure. The sequence is uniformly integrable: every $h_n$ satisfies $|h_n| \le 1$, so $\int_{\{|h_n| > t\}} |h_n| \, d\mathcal{L}^1 = 0$ for $t \ge 1$. Yet $\|h_n\|_{L^1} = 1$ for all $n$. Tightness is what fails: for any set $K$ with $\mathcal{L}^1(K) < \infty$, the support $[0, n]$ eventually extends far beyond $K$, so $\int_{\mathbb{R} \setminus K} h_n \, d\mathcal{L}^1 \to 1$. The mass does not concentrate — it spreads over an unbounded region.
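The quantities in the last counterexample can be tabulated exactly. This Python sketch assumes the specific choice $K = [0, R]$ (one convenient set of finite measure, not the general $K$ of the argument), and the function names are illustrative.

```python
from fractions import Fraction

def spread_l1_norm(n: int) -> Fraction:
    """L^1 norm of h_n = (1/n) * 1_[0,n]: height 1/n times width n."""
    return Fraction(1, n) * n

def spread_mass_outside(n: int, R: int) -> Fraction:
    """Integral of h_n over the complement of K = [0, R]."""
    return Fraction(max(0, n - R), n)

def spread_bad_set_measure(n: int, eps: Fraction) -> int:
    """Measure of {h_n > eps}: the whole support [0, n] if 1/n > eps, else empty."""
    return n if Fraction(1, n) > eps else 0

R, eps = 100, Fraction(1, 50)
for n in (10, 100, 1000, 10**6):
    # columns: n, measure of the bad set, L^1 norm, mass outside K
    print(n, spread_bad_set_measure(n, eps), spread_l1_norm(n), spread_mass_outside(n, R))
```

Once $n \ge 1/\varepsilon$ the bad set is empty, yet the $L^1$ norm stays at $1$ and the fraction of mass outside $[0, R]$ approaches $1$.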
The Vitali Convergence Theorem generalizes the Dominated Convergence Theorem: if $|f_n| \le g$ for some $g \in L^1$, then $(f_n)$ is uniformly integrable (since $\int_{\{|f_n| > t\}} |f_n| \, d\mu \le \int_{\{g > t\}} g \, d\mu \to 0$ as $t \to \infty$). The Dominated Convergence Theorem is therefore the special case where uniform integrability is guaranteed by a single dominating function.
## A Metrization of Convergence in Measure
A natural question is whether convergence in measure can be described by a metric — that is, whether it is a metrizable notion of convergence. On a finite measure space, the answer is yes, and the resulting metric space is complete. This is useful because it places convergence in measure within the framework of metric space topology, making tools like the Baire Category Theorem available.
[definition: Ky Fan Metric]
Let $(X, \mathcal{A}, \mu)$ be a measure space with $\mu(X) < \infty$. For $\mathcal{A}$-measurable functions $f, g : X \to \mathbb{R}$ that are finite $\mu$-a.e., define
\begin{align*}
d_\mu(f, g) := \inf\bigl\{\varepsilon > 0 : \mu\bigl(\{|f - g| > \varepsilon\}\bigr) \le \varepsilon\bigr\}.
\end{align*}
(Functions are identified when they agree $\mu$-a.e.)
[/definition]
This is called the **Ky Fan metric** (or, in some references, the **Ky Fan distance**). It metrizes convergence in measure: $d_\mu(f_n, f) \to 0$ if and only if $f_n \xrightarrow{\mu} f$. The verification requires checking two directions. If $d_\mu(f_n, f) \to 0$, fix $\varepsilon, \eta > 0$ and set $\varepsilon' := \min(\varepsilon, \eta)$. Eventually $d_\mu(f_n, f) < \varepsilon'$, which means $\mu(\{|f_n - f| > \varepsilon'\}) \le \varepsilon'$, and therefore $\mu(\{|f_n - f| > \varepsilon\}) \le \mu(\{|f_n - f| > \varepsilon'\}) \le \eta$; this gives convergence in measure. Conversely, if $f_n \xrightarrow{\mu} f$, then for any $\varepsilon > 0$, eventually $\mu(\{|f_n - f| > \varepsilon\}) \le \varepsilon$, so $d_\mu(f_n, f) \le \varepsilon$.
An alternative metric that is sometimes more convenient for computations is
\begin{align*}
\rho(f, g) := \int_X \frac{|f - g|}{1 + |f - g|} \, d\mu.
\end{align*}
When $\mu(X) < \infty$, this also metrizes convergence in measure, and the resulting topology is the same. However, $\rho$ has the advantage of being defined by an integral, which makes it easier to verify convergence using dominated convergence or monotone convergence arguments.
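For simple functions both metrics can be evaluated exactly. The following Python sketch is illustrative only: the representation of a simple function as a list of (value, measure) pairs and the helper names `tail_measure`, `ky_fan`, `rho`, and `spike` are assumptions made here, not notation from the text.

```python
from fractions import Fraction

# A simple function h = f - g, given as (value, measure) pairs on disjoint pieces.
Simple = list[tuple[Fraction, Fraction]]

def tail_measure(h: Simple, eps: Fraction) -> Fraction:
    """mu({|h| > eps})."""
    return sum((m for v, m in h if abs(v) > eps), Fraction(0))

def ky_fan(h: Simple) -> Fraction:
    """inf{eps > 0 : mu({|h| > eps}) <= eps} for a simple function h."""
    levels = sorted({Fraction(0)} | {abs(v) for v, m in h if m > 0})
    for i, a in enumerate(levels):
        t = tail_measure(h, a)        # the tail is constant on [a, next level)
        candidate = max(a, t)         # smallest eps >= a with t <= eps
        if i + 1 == len(levels) or candidate < levels[i + 1]:
            return candidate
    raise AssertionError("unreachable")

def rho(h: Simple) -> Fraction:
    """Integral of |h| / (1 + |h|)."""
    return sum((abs(v) / (1 + abs(v)) * m for v, m in h), Fraction(0))

def spike(n: int) -> Simple:
    """The tall spike n * 1_[0,1/n) on [0,1)."""
    return [(Fraction(n), Fraction(1, n)), (Fraction(0), 1 - Fraction(1, n))]

for n in (2, 4, 8):
    print(n, ky_fan(spike(n)), rho(spike(n)))
```

For the tall spikes both distances to $0$ tend to zero (they equal $1/n$ and $1/(n+1)$ respectively), consistent with convergence in measure even though the $L^1$ norms stay at $1$.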
[remark: Infinite Measure]
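On a $\sigma$-finite space with $\mu(X) = \infty$, convergence in measure is still a well-defined notion, but the two metrics above behave differently. The Ky Fan formula can return $+\infty$ (the set over which the infimum is taken may be empty), so $d_\mu$ is only an extended metric; the truncation $\min(1, d_\mu)$ is a genuine metric and still metrizes convergence in measure, by the same two-direction argument. The integral $\rho$, by contrast, can diverge and no longer characterizes convergence in measure: on $(\mathbb{R}, \mathcal{L}^1)$, the functions $u_n := \tfrac{1}{n} \mathbb{1}_{[0, n^2]}$ converge to $0$ in measure, yet $\rho(u_n, 0) = n^2 \cdot \tfrac{1/n}{1 + 1/n} = \tfrac{n^2}{n+1} \to \infty$. In probability theory this issue does not arise, since the underlying measure is always finite.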
[/remark]
## Techniques for Working with Convergence in Measure
### Subsequence Extraction and Diagonal Arguments
The most common technique for proving results about convergence in measure is the **subsequence-plus-pointwise** strategy. The pattern is:
1. Extract a pointwise-a.e.-convergent subsequence using the Riesz Subsequence Principle.
2. Apply a known result for pointwise convergence (Fatou's Lemma, Dominated Convergence, monotonicity arguments).
3. Conclude the desired property for the original sequence using the Subsequence Characterization: when the property is convergence to a fixed limit (of numbers, or of functions in measure) and every subsequence has a further subsequence converging to that limit, the full sequence converges to it.
This three-step pattern resolves many problems that initially seem to require direct $\varepsilon$-$\delta$ arguments with measures of sets.
[example: Convergence in Measure Preserves Integrals Under Domination]
Suppose $f_n \xrightarrow{\mu} f$ on a finite measure space $(X, \mathcal{A}, \mu)$, and $|f_n| \le g$ for some $g \in L^1(X, \mu)$. We claim $\int_X f_n \, d\mu \to \int_X f \, d\mu$.
**Step 1: Reduce to pointwise.** Let $(f_{n_j})$ be any subsequence of $(f_n)$. By the Riesz Subsequence Principle, there exists a further subsequence $(f_{n_{j_k}})$ with $f_{n_{j_k}} \to f$ pointwise $\mu$-a.e.
**Step 2: Apply Dominated Convergence.** Since $|f_{n_{j_k}}| \le g \in L^1$ and $f_{n_{j_k}} \to f$ pointwise $\mu$-a.e., the Dominated Convergence Theorem gives $\int_X f_{n_{j_k}} \, d\mu \to \int_X f \, d\mu$.
**Step 3: Conclude for the full sequence.** We have shown that every subsequence of the real sequence $\bigl(\int_X f_n \, d\mu\bigr)$ has a further subsequence converging to $\int_X f \, d\mu$. Therefore $\int_X f_n \, d\mu \to \int_X f \, d\mu$.
This argument does not require proving convergence in $L^1$ first; it works directly at the level of integrals.
[/example]
### Chebyshev's Inequality as a Bridge
The most direct tool for converting $L^p$ information into measure-theoretic information is **Chebyshev's inequality** (also called Markov's inequality): for $f \in L^p(X, \mu)$ with $p \ge 1$,
\begin{align*}
\mu\bigl(\{|f| > t\}\bigr) \le \frac{\|f\|_{L^p}^p}{t^p}, \quad t > 0.
\end{align*}
This inequality is the mechanism behind the implication "$L^p$ convergence $\Rightarrow$ convergence in measure." More subtly, it provides *quantitative* control: if $\|f_n - f\|_{L^p} = O(n^{-\alpha})$ for some $\alpha > 0$, then
\begin{align*}
\mu\bigl(\{|f_n - f| > n^{-\beta}\}\bigr) \le \frac{\|f_n - f\|_{L^p}^p}{n^{-\beta p}} = O\bigl(n^{\beta p - \alpha p}\bigr),
\end{align*}
which is summable when $(\alpha - \beta) p > 1$, that is, when $\beta < \alpha - 1/p$. If $\alpha > 1/p$, such a $\beta > 0$ exists, and Borel--Cantelli then gives $|f_n - f| \le n^{-\beta}$ for all large $n$ at $\mu$-a.e. point. This is pointwise a.e. convergence along the *full* sequence (not just a subsequence), with an explicit rate. This technique is used extensively in probability theory to promote convergence in probability to almost sure convergence when the convergence rate is sufficiently fast.
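As a worked instance of the exponent count, take $p = 2$, $\alpha = 1$, and $\beta = 1/4$, so that $\beta < \alpha - 1/p = 1/2$. The constant $C$ below is a hypothetical bound with $\|f_n - f\|_{L^2} \le C/n$, introduced only for illustration:
\begin{align*}
\mu\bigl(\{|f_n - f| > n^{-1/4}\}\bigr) \le \frac{C^2 n^{-2}}{n^{-1/2}} = C^2 n^{-3/2}, \qquad \sum_{n \ge 1} C^2 n^{-3/2} < \infty.
\end{align*}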
### Truncation and Approximation
When working with convergence in measure, it is often useful to reduce to bounded functions. If $f_n \xrightarrow{\mu} f$, then the truncations $T_M f_n := \max(-M, \min(f_n, M))$ also converge in measure to $T_M f$ for each fixed $M > 0$. This follows from the inclusion
\begin{align*}
\{|T_M f_n - T_M f| > \varepsilon\} \subset \{|f_n - f| > \varepsilon\},
\end{align*}
which holds because the truncation map $t \mapsto \max(-M, \min(t, M))$ is $1$-Lipschitz on $\mathbb{R}$, so $|T_M f_n - T_M f| \le |f_n - f|$ pointwise. Truncation reduces many problems to the bounded case; on a finite measure space, a uniformly bounded family is automatically uniformly integrable, and the theory simplifies.
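The $1$-Lipschitz property of the truncation map can be spot-checked numerically. This is a sanity check on random samples, not a proof; the function name `truncate`, the seed, and the sampling range are arbitrary choices made for illustration.

```python
import random

def truncate(t: float, M: float) -> float:
    """T_M t = max(-M, min(t, M)): clamp t to the interval [-M, M]."""
    return max(-M, min(t, M))

random.seed(0)
M = 2.0
pairs = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(10_000)]

# Largest observed value of |T_M s - T_M t| - |s - t|; it should never be positive.
worst = max(abs(truncate(s, M) - truncate(t, M)) - abs(s - t) for s, t in pairs)
print(worst <= 1e-12)
```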
[example: Composition with Continuous Functions]
Suppose $f_n \xrightarrow{\mu} f$ and $\varphi : \mathbb{R} \to \mathbb{R}$ is continuous. Does $\varphi \circ f_n \xrightarrow{\mu} \varphi \circ f$?
**On finite measure spaces, yes.** Fix $\varepsilon > 0$. For any $M > 0$, the restriction of $\varphi$ to $[-M, M]$ is uniformly continuous, so there exists $\delta = \delta(\varepsilon, M) > 0$ such that $|s - t| \le \delta$ and $|s|, |t| \le M$ imply $|\varphi(s) - \varphi(t)| \le \varepsilon$. Then
\begin{align*}
\{|\varphi \circ f_n - \varphi \circ f| > \varepsilon\} \subset \{|f_n - f| > \delta\} \cup \{|f_n| > M\} \cup \{|f| > M\}.
\end{align*}
Fix $M$ (and hence $\delta$). The first set has measure tending to $0$ as $n \to \infty$ by convergence in measure. The third set has measure $\mu(\{|f| > M\})$, independent of $n$, which tends to $0$ as $M \to \infty$ (since $f$ is finite a.e. and $\mu(X) < \infty$). For the second set, note that $\{|f_n| > M\} \subset \{|f_n - f| > M/2\} \cup \{|f| > M/2\}$: the first part has measure tending to $0$ as $n \to \infty$, and the second has measure $\mu(\{|f| > M/2\})$, also small for large $M$.
Choosing $M$ large enough that $\mu(\{|f| > M/2\}) < \varepsilon$ (so also $\mu(\{|f| > M\}) < \varepsilon$), and then $N$ large enough that $\mu(\{|f_n - f| > \delta\}) < \varepsilon$ and $\mu(\{|f_n - f| > M/2\}) < \varepsilon$ for $n \ge N$, we obtain $\mu(\{|\varphi \circ f_n - \varphi \circ f| > \varepsilon\}) < 4\varepsilon$ for $n \ge N$. Applying this with $\varepsilon$ replaced by $\min(\varepsilon, \eta/4)$ bounds the measure by any prescribed $\eta > 0$.
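**On infinite measure spaces, the result can fail** without additional hypotheses. On $(\mathbb{R}, \mathcal{L}^1)$, take $f(x) = x$, $f_n(x) = x + 1/n$, and $\varphi(t) = t^2$. Then $|f_n - f| = 1/n$ everywhere, so $f_n \xrightarrow{\mu} f$ (indeed uniformly). But $\varphi \circ f_n - \varphi \circ f = 2x/n + 1/n^2$, and the set $\{x : |2x/n + 1/n^2| > \varepsilon\}$ has infinite measure for every $n$ and every $\varepsilon > 0$, so $\varphi \circ f_n$ does not converge to $\varphi \circ f$ in measure. The argument above breaks down because $\mu(\{|f| > M\}) = \infty$ for every $M$. The conclusion does survive on arbitrary measure spaces when $\varphi$ is uniformly continuous, since then a single $\delta$ works without any truncation.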
[/example]
## The Probabilistic Perspective
In probability theory, where the underlying measure space $(\Omega, \mathcal{F}, \mathbb{P})$ has total mass $\mathbb{P}(\Omega) = 1$, convergence in measure becomes **convergence in probability**: a sequence of random variables $X_n$ converges in probability to $X$ if for every $\varepsilon > 0$,
\begin{align*}
\lim_{n \to \infty} \mathbb{P}\bigl(|X_n - X| > \varepsilon\bigr) = 0.
\end{align*}
Because probability spaces are always finite measure spaces, several simplifications occur automatically: pointwise a.e. convergence (almost sure convergence, in probabilistic language) implies convergence in probability, tightness is automatic, and the Ky Fan metric metrizes the topology.
The Riesz Subsequence Principle becomes the statement that convergence in probability implies the existence of an almost surely convergent subsequence. This is one of the workhorses of probability theory, used to establish:
- The relationship between the Weak and Strong Laws of Large Numbers: convergence of $\bar{X}_n$ in probability (Weak Law) does not guarantee almost sure convergence (Strong Law), but it does guarantee an a.s.-convergent subsequence.
- Stability results: if $X_n \xrightarrow{\mathbb{P}} X$ and $Y_n \xrightarrow{\mathbb{P}} Y$, then $X_n + Y_n \xrightarrow{\mathbb{P}} X + Y$ and $X_n Y_n \xrightarrow{\mathbb{P}} XY$ (provided the products are well-defined). These follow from the subsequence characterization and the corresponding results for a.s. convergence.
- Slutsky's Theorem: if $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{\mathbb{P}} c$ for a constant $c$, then $X_n + Y_n \xrightarrow{d} X + c$ and $X_n Y_n \xrightarrow{d} cX$.
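The Weak Law item above can be illustrated with an exact computation for Bernoulli trials. The sketch below is a hedged example under stated assumptions: the trials are i.i.d. Bernoulli($p$), the parameters $p = 1/2$ and $\varepsilon = 1/10$ are arbitrary, and the function names are introduced here for illustration. It compares the exact deviation probability with the bound from Chebyshev's inequality.

```python
from fractions import Fraction
from math import comb

def deviation_probability(n: int, p: Fraction, eps: Fraction) -> Fraction:
    """Exact P(|S_n / n - p| > eps) for S_n ~ Binomial(n, p)."""
    total = Fraction(0)
    for s in range(n + 1):
        if abs(Fraction(s, n) - p) > eps:
            total += comb(n, s) * p**s * (1 - p) ** (n - s)
    return total

def chebyshev_bound(n: int, p: Fraction, eps: Fraction) -> Fraction:
    """Var(S_n / n) / eps**2 = p (1 - p) / (n eps**2)."""
    return p * (1 - p) / (n * eps**2)

p, eps = Fraction(1, 2), Fraction(1, 10)
for n in (10, 100, 400):
    exact = deviation_probability(n, p, eps)
    print(n, float(exact), float(chebyshev_bound(n, p, eps)))
```

The exact probabilities decrease toward $0$ as $n$ grows, which is convergence in probability of the sample mean; the Chebyshev bound decays only like $1/n$ and is typically far from sharp.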
The Vitali Convergence Theorem, in the probabilistic setting, states that $X_n \to X$ in $L^1(\mathbb{P})$ if and only if $X_n \xrightarrow{\mathbb{P}} X$ and $(X_n)$ is uniformly integrable. This is the basis for the theory of uniformly integrable martingales: a martingale $(M_n)$ converges in $L^1$ if and only if it is uniformly integrable, and the $L^1$ convergence is equivalent to convergence in probability plus uniform integrability (the a.e. convergence is guaranteed separately by the Martingale Convergence Theorem).
[remark: Convergence in Probability vs. Convergence in Distribution]
Convergence in probability is strictly stronger than **convergence in distribution** (weak convergence of laws): if $X_n \xrightarrow{\mathbb{P}} X$, then $X_n \xrightarrow{d} X$, but the converse fails. For instance, if $X$ has a symmetric distribution and $X_n = -X$ for all $n$, then $X_n \xrightarrow{d} X$ (since $X$ and $-X$ have the same law) but $X_n \xrightarrow{\mathbb{P}} X$ only when $X = 0$ a.s. The one exception is convergence to a constant: if $X_n \xrightarrow{d} c$ for a deterministic $c \in \mathbb{R}$, then $X_n \xrightarrow{\mathbb{P}} c$. This fact is used implicitly in many applications of Slutsky's Theorem.
[/remark]
## Almost Uniform Convergence
Egorov's Theorem involves a mode of convergence that mediates between pointwise a.e. convergence and convergence in measure: almost uniform convergence always implies both, and on finite measure spaces it is equivalent to pointwise a.e. convergence.
[definition: Almost Uniform Convergence]
Let $(X, \mathcal{A}, \mu)$ be a measure space. A sequence $(f_n)$ of $\mathcal{A}$-measurable functions **converges almost uniformly** to $f$ if for every $\delta > 0$ there exists $A \in \mathcal{A}$ with $\mu(A) < \delta$ such that $f_n \to f$ uniformly on $X \setminus A$.
[/definition]
Egorov's Theorem states that on finite measure spaces, pointwise a.e. convergence implies almost uniform convergence. Almost uniform convergence, in turn, always implies convergence in measure (on any measure space, not just finite ones): given $\varepsilon, \delta > 0$, choose $A$ with $\mu(A) < \delta$ and $N$ with $|f_n - f| \le \varepsilon$ on $X \setminus A$ for $n \ge N$. Then $\{|f_n - f| > \varepsilon\} \subset A$, so $\mu(\{|f_n - f| > \varepsilon\}) < \delta$.
The converse (convergence in measure implies almost uniform convergence) is **false**, even on finite measure spaces. The typewriter sequence $f_n = \mathbb{1}_{[j/2^k, (j+1)/2^k)}$ converges to $0$ in measure but not almost uniformly: for any set $A$ with $\mathcal{L}^1(A) < 1$, the complement $[0,1) \setminus A$ is nonempty, and every point $x$ in it satisfies $f_n(x) = 1$ for infinitely many $n$, so $\sup_{x \in [0,1) \setminus A} |f_n(x)| = 1$ for infinitely many $n$.
Thus, on finite measure spaces, the chain of implications is:
\begin{align*}
\text{Pointwise a.e.} \implies \text{Almost uniform} \implies \text{In measure} \implies \text{Pointwise a.e. (subsequence)}.
\end{align*}
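The first implication reverses: almost uniform convergence implies pointwise a.e. convergence on any measure space, so on finite measure spaces the two modes coincide. The second does not reverse, as the typewriter sequence shows, and the third yields pointwise a.e. convergence only along a subsequence.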