[proofplan]
From convergence in measure, we extract a subsequence whose deviation sets have summable measures. Specifically, we choose indices $k_1 < k_2 < \cdots$ such that $\mu(\{|f_{k_\ell} - f| > 2^{-\ell}\}) < 2^{-\ell}$ for every $\ell$. A Borel-Cantelli-type argument (countable subadditivity applied to the tail unions that contain the limsup) then shows that the set of points belonging to infinitely many of the "bad" sets $\{|f_{k_\ell} - f| > 2^{-\ell}\}$ has measure zero. On the complement, which has full measure, $|f_{k_\ell}(x) - f(x)| \le 2^{-\ell}$ for all sufficiently large $\ell$, giving pointwise convergence.
[/proofplan]
[step:Extract a subsequence with geometrically decaying measure of deviation]
By the definition of convergence in measure, for every $\varepsilon > 0$ and every $\delta > 0$, there exists $N \in \mathbb{N}$ such that $\mu(\{x \in X : |f_k(x) - f(x)| > \varepsilon\}) < \delta$ for all $k \ge N$.
We apply this definition iteratively with $\varepsilon = 2^{-\ell}$ and $\delta = 2^{-\ell}$ for $\ell = 1, 2, 3, \ldots$ to select a strictly increasing sequence of indices $k_1 < k_2 < k_3 < \cdots$ as follows.
Base case ($\ell = 1$): by convergence in measure with $\varepsilon = \delta = 2^{-1}$, there exists $N_1$ such that $\mu(\{|f_k - f| > 2^{-1}\}) < 2^{-1}$ for all $k \ge N_1$. Choose $k_1 := N_1$.
Inductive step: given $k_\ell$, convergence in measure with $\varepsilon = \delta = 2^{-(\ell+1)}$ yields $N_{\ell+1}$ such that $\mu(\{|f_k - f| > 2^{-(\ell+1)}\}) < 2^{-(\ell+1)}$ for all $k \ge N_{\ell+1}$. Choose $k_{\ell+1} := \max(k_\ell + 1, N_{\ell+1})$, so that $k_{\ell+1} > k_\ell$ and $k_{\ell+1} \ge N_{\ell+1}$.
This produces a subsequence $(f_{k_\ell})_{\ell \ge 1}$ satisfying
\begin{align*}
\mu\!\left(\left\{x \in X : |f_{k_\ell}(x) - f(x)| > \frac{1}{2^\ell}\right\}\right) < \frac{1}{2^\ell} \quad \text{for every } \ell \ge 1.
\end{align*}
[guided]
Why do we choose $\varepsilon = \delta = 2^{-\ell}$? The goal is to make the measures of the "bad" sets summable while the thresholds shrink to zero. Any choice of thresholds $\varepsilon_\ell \to 0$ and measure bounds $\delta_\ell$ with $\sum_\ell \delta_\ell < \infty$ would work, for instance $\varepsilon_\ell = \delta_\ell = \ell^{-2}$, but the geometric choice $2^{-\ell}$ is standard and produces the cleanest estimates. The summability $\sum_{\ell=1}^\infty 2^{-\ell} = 1 < \infty$ is what drives the Borel-Cantelli argument in the next step.
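To illustrate the alternative choice, here is a sketch of the tail estimate that would replace the geometric one if we took $\varepsilon_\ell = \delta_\ell = \ell^{-2}$ (the subscripted symbols $\varepsilon_\ell, \delta_\ell$ are notation used only in this remark). For $m \ge 2$, comparison with a telescoping series gives
\begin{align*}
\sum_{\ell=m}^{\infty} \frac{1}{\ell^2} < \sum_{\ell=m}^{\infty} \frac{1}{\ell(\ell-1)} = \sum_{\ell=m}^{\infty} \left(\frac{1}{\ell-1} - \frac{1}{\ell}\right) = \frac{1}{m-1},
\end{align*}
which still tends to $0$ as $m \to \infty$, only more slowly than the geometric tail $2^{-(m-1)}$.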
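Note that the definition of convergence in measure only asserts that suitable indices exist; we are making a choice at each stage, and any larger admissible threshold $N_{\ell+1}$ would yield a different subsequence with the same property. This is a pure existence argument (there is no uniqueness of the subsequence).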
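As a hypothetical illustration of the selection (the sequence below is an assumption made for this remark, not part of the theorem), take $X = [0,1]$ with Lebesgue measure, $f = 0$ and $f_k = \mathbb{1}_{[0, 1/k]}$. For $\ell \ge 1$ we have $2^{-\ell} < 1$, so
\begin{align*}
\mu(\{|f_k - f| > 2^{-\ell}\}) = \mu([0, 1/k]) = \frac{1}{k} < \frac{1}{2^\ell} \iff k > 2^\ell,
\end{align*}
and the smallest admissible threshold is $N_\ell = 2^\ell + 1$. Starting from $k_1 = N_1$, the rule $k_{\ell+1} := \max(k_\ell + 1, N_{\ell+1})$ then gives
\begin{align*}
k_1 = 3, \quad k_2 = \max(4, 5) = 5, \quad k_3 = \max(6, 9) = 9, \quad \ldots, \quad k_\ell = 2^\ell + 1.
\end{align*}
Larger thresholds, say $N_\ell = 2^{\ell+1}$, would serve equally well, which is the non-uniqueness noted above.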
[/guided]
[/step]
[step:Define the exceptional set and bound its measure via summability]
Define the "bad" sets
\begin{align*}
E_\ell := \left\{x \in X : |f_{k_\ell}(x) - f(x)| > \frac{1}{2^\ell}\right\}, \quad \ell = 1, 2, 3, \ldots,
\end{align*}
and let
\begin{align*}
E := \limsup_{\ell \to \infty} E_\ell = \bigcap_{m=1}^{\infty} \bigcup_{\ell = m}^{\infty} E_\ell.
\end{align*}
The set $E$ consists of those $x \in X$ that belong to infinitely many of the $E_\ell$.
For each $m \ge 1$, the set $\bigcup_{\ell=m}^{\infty} E_\ell$ contains $E$. By countable subadditivity of $\mu$ and the bound $\mu(E_\ell) < 2^{-\ell}$ from the previous step,
\begin{align*}
\mu\!\left(\bigcup_{\ell=m}^{\infty} E_\ell\right) \le \sum_{\ell=m}^{\infty} \mu(E_\ell) < \sum_{\ell=m}^{\infty} \frac{1}{2^\ell} = \frac{1}{2^{m-1}}.
\end{align*}
Since $E \subset \bigcup_{\ell=m}^{\infty} E_\ell$ for every $m$, monotonicity of $\mu$ gives
\begin{align*}
\mu(E) \le \frac{1}{2^{m-1}} \quad \text{for every } m \ge 1.
\end{align*}
Taking $m \to \infty$ yields $\mu(E) = 0$.
[guided]
This is the core of the argument and follows the same logic as the first [Borel-Cantelli Lemma](/theorems/507), though we do not need independence (which is a hypothesis of the second Borel-Cantelli lemma, not the first). The first Borel-Cantelli lemma states: if $(E_\ell)$ is any sequence of measurable sets with $\sum \mu(E_\ell) < \infty$, then $\mu(\limsup E_\ell) = 0$. Here we have $\sum_{\ell=1}^\infty \mu(E_\ell) < \sum_{\ell=1}^\infty 2^{-\ell} = 1 < \infty$, so the conclusion applies.
We gave the direct proof above (bounding $\mu(\bigcup_{\ell \ge m} E_\ell)$ by the tail of the series) to keep the argument self-contained, but the reader should recognise this as precisely the Borel-Cantelli mechanism.
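For completeness, the value of the geometric tail used in the estimate can be checked by factoring out its first term; this is a routine verification, spelled out here only as a sketch:
\begin{align*}
\sum_{\ell=m}^{\infty} \frac{1}{2^\ell} = \frac{1}{2^m} \sum_{j=0}^{\infty} \frac{1}{2^j} = \frac{1}{2^m} \cdot 2 = \frac{1}{2^{m-1}}.
\end{align*}
For instance, $m = 1$ gives only the trivial bound $\mu(E) \le 1$, while $m = 11$ already gives $\mu(E) \le 2^{-10} < 10^{-3}$.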
[/guided]
[/step]
[step:Conclude pointwise convergence on the full-measure complement of $E$]
Let $x \in X \setminus E$. Since $x \notin E = \limsup_{\ell \to \infty} E_\ell$, the point $x$ belongs to only finitely many of the sets $E_\ell$. That is, there exists $L = L(x) \in \mathbb{N}$ such that $x \notin E_\ell$ for all $\ell \ge L$. By the definition of $E_\ell$, this means
\begin{align*}
|f_{k_\ell}(x) - f(x)| \le \frac{1}{2^\ell} \quad \text{for all } \ell \ge L.
\end{align*}
Since $2^{-\ell} \to 0$ as $\ell \to \infty$, the squeeze theorem gives $f_{k_\ell}(x) \to f(x)$.
We have shown: there exists a set $E$ with $\mu(E) = 0$ such that $f_{k_\ell}(x) \to f(x)$ for every $x \in X \setminus E$. This is exactly the statement that $f_{k_\ell} \to f$ $\mu$-a.e.
[guided]
It is worth noting what fails if we try to upgrade from subsequential a.e. convergence to full-sequence a.e. convergence. Convergence in measure does *not* imply a.e. convergence of the original sequence. The classical counterexample is the "typewriter sequence" of indicator functions on $[0,1]$ with Lebesgue measure: define $f_k = \mathbb{1}_{[j/2^m, (j+1)/2^m]}$ where $k = 2^m + j$, $m \ge 0$, $0 \le j < 2^m$. This sequence converges to $0$ in measure (the support of $f_k$ has Lebesgue measure $2^{-m} \to 0$), but for every $x \in [0,1]$ we have $f_k(x) = 1$ for infinitely many $k$ (at least once per level $m$) and $f_k(x) = 0$ for infinitely many $k$, so $(f_k(x))$ converges at no point. The Riesz Subsequence Principle guarantees only that some subsequence converges a.e. Indeed, taking the first interval of each dyadic level gives $f_{2^m} = \mathbb{1}_{[0, 2^{-m}]}$, which converges to $0$ at every $x \ne 0$ and hence a.e.
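The behaviour of the typewriter sequence can be checked numerically. The following is a minimal Python sketch, assuming the indexing $k = 2^m + j$ used above; the helper names `typewriter_interval`, `f` and `hits` are hypothetical and introduced only for this illustration. It uses exact rational arithmetic to list the indices $k$ at which $f_k(x) = 1$ for the sample point $x = 1/3$, first along the full sequence and then along the subsequence $f_{2^m}$.

```python
from fractions import Fraction


def typewriter_interval(k):
    """Return (a, b) such that f_k is the indicator of [a, b], with k = 2**m + j."""
    m = k.bit_length() - 1  # largest m with 2**m <= k
    j = k - 2**m            # position within level m, 0 <= j < 2**m
    return Fraction(j, 2**m), Fraction(j + 1, 2**m)


def f(k, x):
    """Evaluate the k-th typewriter function at x."""
    a, b = typewriter_interval(k)
    return 1 if a <= x <= b else 0


def hits(x, k_values):
    """List the indices k in k_values with f_k(x) = 1."""
    return [k for k in k_values if f(k, x) == 1]


x = Fraction(1, 3)  # sample point (an arbitrary choice for this sketch)
full = hits(x, range(1, 2**10))                # all k in dyadic levels m = 0..9
sub = hits(x, [2**m for m in range(10)])       # subsequence f_{2^m}, m = 0..9

print(len(full))  # 10: one hit in each of the ten levels
print(sub)        # [1, 2]: no hits once 2**-m < 1/3
```

Over the first ten dyadic levels the full sequence takes the value $1$ at $x = 1/3$ once per level, whereas the subsequence does so only at $k = 1, 2$, consistent with the discussion above. A finite check of this kind illustrates the claim but does not prove it.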
[/guided]
[/step]