[proofplan]
We first prove the assertion for bounded [measurable functions](/page/Measurable%20Functions) by estimating the second moment of the centered empirical average and using the elementary [Markov inequality](/theorems/514) for its square. For a general integrable function, we truncate $g$ to a bounded function $g_M$ and decompose the empirical error into the bounded truncated error plus two tail errors. The bounded part vanishes as $n\to\infty$, while integrability of $g$ makes the population tail small, and the empirical tail is controlled in probability by its expectation.
[/proofplan]
[step:Prove the convergence for bounded measurable functions]
Let $h:\mathcal X\to\mathbb R$ be a bounded $\mathcal A$-measurable function. Define
\begin{align*}
Ph:=\int_{\mathcal X} h(x)\,dP(x).
\end{align*}
For each $n\in\mathbb N$, define the real-valued [random variable](/page/Random%20Variable) $P_nh:\Omega\to\mathbb R$ by
\begin{align*}
P_nh(\omega):=\frac{1}{n}\sum_{i=1}^n h(X_i(\omega)).
\end{align*}
Let $K:=\sup_{x\in\mathcal X}|h(x)|<\infty$. For any integrable real-valued random variable $Y:\Omega\to\mathbb R$, write
\begin{align*}
\mathbb E[Y]:=\int_\Omega Y(\omega)\,d\mathbb P(\omega).
\end{align*}
For each $i\in\mathbb N$, define the centered real-valued random variable $Z_i:\Omega\to\mathbb R$ by
\begin{align*}
Z_i(\omega):=h(X_i(\omega))-Ph.
\end{align*}
Since $X_1,X_2,\dots$ are independent and identically distributed with common distribution $P$, the random variables $Z_1,Z_2,\dots$ are independent and identically distributed. The change-of-law identity for $X_i$ gives
\begin{align*}
\mathbb E[Z_i]=\int_\Omega \bigl(h(X_i(\omega))-Ph\bigr)\,d\mathbb P(\omega)=\int_{\mathcal X} h(x)\,dP(x)-Ph=0.
\end{align*}
Moreover $|Z_i|\le 2K$, so $\mathbb E[Z_i^2]\le 4K^2$. For $i<j$, independence gives $\mathbb E[Z_iZ_j]=\mathbb E[Z_i]\mathbb E[Z_j]=0$. Expanding the square and using these mixed-term identities,
\begin{align*}
\mathbb E\left[\left(P_nh-Ph\right)^2\right]=\mathbb E\left[\left(\frac{1}{n}\sum_{i=1}^n Z_i\right)^2\right]=\frac{1}{n^2}\sum_{i=1}^n\mathbb E[Z_i^2]+\frac{2}{n^2}\sum_{1\le i<j\le n}\mathbb E[Z_iZ_j]=\frac{1}{n^2}\sum_{i=1}^n\mathbb E[Z_i^2]\le \frac{4K^2}{n}.
\end{align*}
For every $\varepsilon>0$, the nonnegative random variable $(P_nh-Ph)^2$ satisfies
\begin{align*}
\varepsilon^2\,\mathbb 1_{\{|P_nh-Ph|>\varepsilon\}}\le (P_nh-Ph)^2.
\end{align*}
Integrating with respect to $\mathbb P$ gives the Markov estimate
\begin{align*}
\mathbb P(|P_nh-Ph|>\varepsilon)\le \frac{1}{\varepsilon^2}\mathbb E\left[\left(P_nh-Ph\right)^2\right]\le \frac{4K^2}{n\varepsilon^2}.
\end{align*}
Letting $n\to\infty$ proves $P_nh\xrightarrow{\mathbb P}Ph$ for bounded measurable $h$.
[guided]
The bounded case is the variance estimate that drives the whole proof. Let $h:\mathcal X\to\mathbb R$ be bounded and $\mathcal A$-measurable, and define
\begin{align*}
Ph:=\int_{\mathcal X} h(x)\,dP(x), \qquad P_nh(\omega):=\frac{1}{n}\sum_{i=1}^n h(X_i(\omega)).
\end{align*}
If $K:=\sup_{x\in\mathcal X}|h(x)|$, then $K<\infty$. For each $i\in\mathbb N$, set $Z_i(\omega):=h(X_i(\omega))-Ph$. Since the $X_i$ are independent and all have law $P$, the $Z_i$ are independent and identically distributed. The change-of-law identity gives
\begin{align*}
\mathbb E[Z_i]=\int_\Omega h(X_i(\omega))\,d\mathbb P(\omega)-Ph=\int_{\mathcal X}h(x)\,dP(x)-Ph=0.
\end{align*}
Also $|Z_i|\le 2K$, so $\mathbb E[Z_i^2]\le 4K^2$. If $i<j$, independence gives $\mathbb E[Z_iZ_j]=\mathbb E[Z_i]\mathbb E[Z_j]=0$. By the definition of $Z_i$, we have $P_nh-Ph=\frac{1}{n}\sum_{i=1}^n Z_i$, and therefore
\begin{align*}
\mathbb E\left[\left(P_nh-Ph\right)^2\right]=\mathbb E\left[\left(\frac{1}{n}\sum_{i=1}^n Z_i\right)^2\right].
\end{align*}
Expanding the square and using the vanishing of the mixed terms gives
\begin{align*}
\mathbb E\left[\left(P_nh-Ph\right)^2\right]=\frac{1}{n^2}\sum_{i=1}^n\mathbb E[Z_i^2]+\frac{2}{n^2}\sum_{1\le i<j\le n}\mathbb E[Z_iZ_j].
\end{align*}
Since $\mathbb E[Z_iZ_j]=0$ for $i<j$ and $\mathbb E[Z_i^2]\le 4K^2$, we obtain
\begin{align*}
\mathbb E\left[\left(P_nh-Ph\right)^2\right]\le \frac{4K^2}{n}.
\end{align*}
For every $\varepsilon>0$, the nonnegative random variable $(P_nh-Ph)^2$ satisfies
\begin{align*}
\varepsilon^2\,\mathbb 1_{\{|P_nh-Ph|>\varepsilon\}}\le (P_nh-Ph)^2.
\end{align*}
After integration with respect to $\mathbb P$, this yields
\begin{align*}
\mathbb P(|P_nh-Ph|>\varepsilon)\le \frac{1}{\varepsilon^2}\mathbb E\left[\left(P_nh-Ph\right)^2\right]\le \frac{4K^2}{n\varepsilon^2}.
\end{align*}
The right-hand side tends to $0$ as $n\to\infty$, which proves $P_nh\xrightarrow{\mathbb P}Ph$ for bounded measurable $h$.
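As a sanity check on the rate, consider the illustrative case (an assumption made only for this example, not part of the theorem) where $h=\mathbb 1_A$ for some $A\in\mathcal A$, so that $K\le 1$ and $Ph=P(A)$. The estimate then reads
\begin{align*}
\mathbb P(|P_n\mathbb 1_A-P(A)|>\varepsilon)\le \frac{4}{n\varepsilon^2}.
\end{align*}
For instance, a tolerance $\varepsilon=\tfrac{1}{10}$ and a failure probability of at most $\tfrac{1}{20}$ are guaranteed as soon as
\begin{align*}
\frac{4}{n\cdot 10^{-2}}\le \frac{1}{20},\qquad\text{that is,}\qquad n\ge 8000.
\end{align*}
This is only what the crude bound $\mathbb E[Z_i^2]\le 4K^2$ guarantees; a sharper variance bound would give a smaller sample size.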
[/guided]
[/step]
[step:Truncate the integrable function and identify the tail]
For each $M>0$, define the truncated function $g_M:\mathcal X\to\mathbb R$ by
\begin{align*}
g_M(x):=g(x)\,\mathbb 1_{\{|g|\le M\}}(x).
\end{align*}
Define the tail function $r_M:\mathcal X\to\mathbb R$ by
\begin{align*}
r_M(x):=g(x)-g_M(x).
\end{align*}
Then $g_M$ is bounded and measurable, with $|g_M(x)|\le M$ for every $x\in\mathcal X$, and
\begin{align*}
|r_M(x)|=|g(x)|\,\mathbb 1_{\{|g|>M\}}(x).
\end{align*}
Since $g$ is $P$-integrable, that is,
\begin{align*}
\int_{\mathcal X}|g(x)|\,dP(x)<\infty,
\end{align*}
and since $|g|\,\mathbb 1_{\{|g|>M\}}\downarrow 0$ pointwise as $M\to\infty$, the [Dominated Convergence Theorem](/theorems/4) applied with dominating function $|g|$ gives
\begin{align*}
P|r_M|:=\int_{\mathcal X}|r_M(x)|\,dP(x)=\int_{\mathcal X}|g(x)|\,\mathbb 1_{\{|g|>M\}}(x)\,dP(x)\to 0.
\end{align*}
[guided]
The goal is to replace the possibly unbounded function $g$ by a bounded approximation, because the bounded case has already been proved. For each $M>0$, define the function $g_M:\mathcal X\to\mathbb R$ by
\begin{align*}
g_M(x):=g(x)\,\mathbb 1_{\{|g|\le M\}}(x).
\end{align*}
This function is measurable because $g$ is measurable and $\{|g|\le M\}\in\mathcal A$. It is bounded because $|g_M(x)|\le M$ for every $x\in\mathcal X$.
The part removed by the truncation is the function $r_M:\mathcal X\to\mathbb R$ defined by
\begin{align*}
r_M(x):=g(x)-g_M(x).
\end{align*}
By the definition of $g_M$, this tail satisfies
\begin{align*}
|r_M(x)|=|g(x)|\,\mathbb 1_{\{|g|>M\}}(x)
\end{align*}
for every $x\in\mathcal X$. This identity is the key point: the truncation error is not arbitrary; it is exactly the part of $|g|$ lying above level $M$.
Because $g$ is integrable,
\begin{align*}
\int_{\mathcal X}|g(x)|\,dP(x)<\infty.
\end{align*}
As $M\to\infty$, the functions $|g|\,\mathbb 1_{\{|g|>M\}}$ decrease pointwise to $0$: since $g$ is real-valued, for each fixed $x\in\mathcal X$ the indicator vanishes as soon as $M\ge |g(x)|$. They are also dominated by the integrable function $|g|$. The [Dominated Convergence Theorem](/theorems/4) therefore applies, and the tail integrals vanish:
\begin{align*}
P|r_M|:=\int_{\mathcal X}|r_M(x)|\,dP(x)=\int_{\mathcal X}|g(x)|\,\mathbb 1_{\{|g|>M\}}(x)\,dP(x)\to 0.
\end{align*}
This is precisely where the hypothesis $g\in L^1(\mathcal X,\mathcal A,P)$ is used.
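To see the tail integral in a concrete case, here is a hypothetical example (the space, the law, and the function below are assumptions chosen for illustration and are not part of the theorem). Take $\mathcal X=[1,\infty)$ with its Borel $\sigma$-algebra, let $P$ have Lebesgue density $2x^{-3}$ on $[1,\infty)$, and let $g(x)=x$. Then $g$ is integrable but not square-integrable:
\begin{align*}
Pg=\int_1^\infty x\cdot 2x^{-3}\,dx=2, \qquad \int_1^\infty x^2\cdot 2x^{-3}\,dx=\infty.
\end{align*}
The second-moment argument of the bounded case therefore cannot be applied to $g$ directly. For $M\ge 1$, however, $\{|g|>M\}=(M,\infty)$ and
\begin{align*}
P|r_M|=\int_M^\infty x\cdot 2x^{-3}\,dx=\frac{2}{M}\to 0,
\end{align*}
in agreement with the general statement.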
[/guided]
[/step]
[step:Control the empirical tail by its expectation]
For each $M>0$ and $n\in\mathbb N$, define the nonnegative random variable $P_n|r_M|:\Omega\to[0,\infty)$ by
\begin{align*}
P_n|r_M|(\omega):=\frac{1}{n}\sum_{i=1}^n |r_M(X_i(\omega))|.
\end{align*}
Since each $X_i$ has distribution $P$, the change-of-law identity gives
\begin{align*}
\mathbb E[P_n|r_M|]=\frac{1}{n}\sum_{i=1}^n \int_\Omega |r_M(X_i(\omega))|\,d\mathbb P(\omega)=\frac{1}{n}\sum_{i=1}^n \int_{\mathcal X}|r_M(x)|\,dP(x)=P|r_M|.
\end{align*}
For every $a>0$, the nonnegative random variable $P_n|r_M|$ satisfies
\begin{align*}
a\,\mathbb 1_{\{P_n|r_M|>a\}}\le P_n|r_M|.
\end{align*}
Integrating with respect to $\mathbb P$ gives the Markov estimate
\begin{align*}
\mathbb P(P_n|r_M|>a)\le \frac{P|r_M|}{a}.
\end{align*}
[guided]
We need a probability bound for the empirical tail $P_n|r_M|$. For fixed $M>0$ and $n\in\mathbb N$, define
\begin{align*}
P_n|r_M|(\omega):=\frac{1}{n}\sum_{i=1}^n |r_M(X_i(\omega))|.
\end{align*}
This random variable is nonnegative because each summand is nonnegative. By linearity of the expectation,
\begin{align*}
\mathbb E[P_n|r_M|]=\frac{1}{n}\sum_{i=1}^n\int_\Omega |r_M(X_i(\omega))|\,d\mathbb P(\omega).
\end{align*}
Since every $X_i$ has distribution $P$, the change-of-law identity applied to each summand yields
\begin{align*}
\mathbb E[P_n|r_M|]=\frac{1}{n}\sum_{i=1}^n\int_{\mathcal X}|r_M(x)|\,dP(x)=P|r_M|.
\end{align*}
For $a>0$, the pointwise inequality
\begin{align*}
a\,\mathbb 1_{\{P_n|r_M|>a\}}\le P_n|r_M|
\end{align*}
holds because on the event $\{P_n|r_M|>a\}$ the right-hand side is larger than $a$, and outside that event the left-hand side is $0$. Integrating with respect to $\mathbb P$ gives
\begin{align*}
a\,\mathbb P(P_n|r_M|>a)\le \mathbb E[P_n|r_M|]=P|r_M|,
\end{align*}
and division by $a>0$ yields
\begin{align*}
\mathbb P(P_n|r_M|>a)\le \frac{P|r_M|}{a}.
\end{align*}
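Note that the right-hand side does not involve $n$. As a sketch of how this is used later (the tolerance $\delta$ is a symbol introduced here only for illustration): if $\delta\in(0,1)$ and $M$ is chosen so that $P|r_M|\le a\delta$, then
\begin{align*}
\sup_{n\in\mathbb N}\mathbb P(P_n|r_M|>a)\le \frac{P|r_M|}{a}\le \delta.
\end{align*}
The truncation level can thus be fixed before $n$ is sent to infinity.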
[/guided]
[/step]
[step:Combine the truncated convergence with the tail estimates]
Fix $\varepsilon>0$. For every $M>0$ and $n\in\mathbb N$, the triangle inequality gives
\begin{align*}
|P_ng-Pg|\le |P_ng_M-Pg_M|+|P_nr_M-Pr_M|\le |P_ng_M-Pg_M|+P_n|r_M|+P|r_M|.
\end{align*}
Choose $M>0$ such that
\begin{align*}
P|r_M|<\frac{\varepsilon}{3}.
\end{align*}
Then
\begin{align*}
\{|P_ng-Pg|>\varepsilon\}
\subset
\left\{|P_ng_M-Pg_M|>\frac{\varepsilon}{3}\right\}
\cup
\left\{P_n|r_M|>\frac{\varepsilon}{3}\right\}.
\end{align*}
Taking probabilities and applying the union bound,
\begin{align*}
\mathbb P(|P_ng-Pg|>\varepsilon)\le \mathbb P\left(|P_ng_M-Pg_M|>\frac{\varepsilon}{3}\right)+\mathbb P\left(P_n|r_M|>\frac{\varepsilon}{3}\right)\le \mathbb P\left(|P_ng_M-Pg_M|>\frac{\varepsilon}{3}\right)+\frac{3P|r_M|}{\varepsilon}.
\end{align*}
The function $g_M$ is bounded and measurable, so the bounded case proves
\begin{align*}
\mathbb P\left(|P_ng_M-Pg_M|>\frac{\varepsilon}{3}\right)\to 0
\end{align*}
as $n\to\infty$. Hence
\begin{align*}
\limsup_{n\to\infty}\mathbb P(|P_ng-Pg|>\varepsilon)
\le \frac{3P|r_M|}{\varepsilon}.
\end{align*}
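This bound holds for every $M>0$ with $P|r_M|<\varepsilon/3$, hence for all sufficiently large $M$, while the left-hand side does not depend on $M$. Letting $M\to\infty$ and using $P|r_M|\to 0$, we obtain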
\begin{align*}
\limsup_{n\to\infty}\mathbb P(|P_ng-Pg|>\varepsilon)=0.
\end{align*}
Because $\varepsilon>0$ was arbitrary, this proves
\begin{align*}
P_ng\xrightarrow{\mathbb P}Pg.
\end{align*}
[guided]
Fix $\varepsilon>0$. The theorem statement defines $Pg=\int_{\mathcal X}g(x)\,dP(x)$ and $P_ng(\omega)=n^{-1}\sum_{i=1}^n g(X_i(\omega))$, so the notation is valid for the integrable function $g$. For each $M>0$, write $g=g_M+r_M$, where $g_M$ is bounded and $r_M$ is the tail. Then the triangle inequality gives
\begin{align*}
|P_ng-Pg|\le |P_ng_M-Pg_M|+|P_nr_M-Pr_M|.
\end{align*}
Using $|P_nr_M-Pr_M|\le |P_nr_M|+|Pr_M|$ together with $|P_nr_M|\le P_n|r_M|$ and $|Pr_M|\le P|r_M|$, we get
\begin{align*}
|P_ng-Pg|\le |P_ng_M-Pg_M|+P_n|r_M|+P|r_M|.
\end{align*}
Choose $M>0$ so that $P|r_M|<\varepsilon/3$, which is possible because the truncation step proved $P|r_M|\to 0$. If $|P_ng-Pg|>\varepsilon$, then at least one of the two inequalities
\begin{align*}
|P_ng_M-Pg_M|>\frac{\varepsilon}{3}, \qquad P_n|r_M|>\frac{\varepsilon}{3}
\end{align*}
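must hold, since otherwise the bound on $|P_ng-Pg|$ above would give $|P_ng-Pg|<\varepsilon/3+\varepsilon/3+\varepsilon/3=\varepsilon$. Hence the union bound gives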
\begin{align*}
\mathbb P(|P_ng-Pg|>\varepsilon)
\le \mathbb P\left(|P_ng_M-Pg_M|>\frac{\varepsilon}{3}\right)+\mathbb P\left(P_n|r_M|>\frac{\varepsilon}{3}\right).
\end{align*}
The first term tends to $0$ as $n\to\infty$ by the bounded case applied to the bounded measurable function $g_M$. The empirical-tail estimate with $a=\varepsilon/3$ gives
\begin{align*}
\mathbb P\left(P_n|r_M|>\frac{\varepsilon}{3}\right)\le \frac{3P|r_M|}{\varepsilon}.
\end{align*}
Therefore
\begin{align*}
\limsup_{n\to\infty}\mathbb P(|P_ng-Pg|>\varepsilon)\le \frac{3P|r_M|}{\varepsilon}.
\end{align*}
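The left-hand side does not depend on $M$, and the bound holds for every $M$ with $P|r_M|<\varepsilon/3$, hence for all sufficiently large $M$. Letting $M\to\infty$ and using $P|r_M|\to 0$, the right-hand side tends to $0$, so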
\begin{align*}
\limsup_{n\to\infty}\mathbb P(|P_ng-Pg|>\varepsilon)=0.
\end{align*}
Because this holds for every $\varepsilon>0$, it is exactly the definition of $P_ng\xrightarrow{\mathbb P}Pg$.
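Tracking the constants gives a non-asymptotic form of the argument. The following is a sketch obtained by inserting the bounded-case estimate with $h=g_M$, $K\le M$, and tolerance $\varepsilon/3$; it is valid for any $M>0$ with $P|r_M|<\varepsilon/3$:
\begin{align*}
\mathbb P(|P_ng-Pg|>\varepsilon)\le \frac{4M^2}{n(\varepsilon/3)^2}+\frac{3P|r_M|}{\varepsilon}=\frac{36M^2}{n\varepsilon^2}+\frac{3P|r_M|}{\varepsilon}.
\end{align*}
The first term is small only when $n$ is large relative to $M^2$, while the second is small only when $M$ is large. This is why $M$ is fixed first and $n$ is sent to infinity afterwards.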
[/guided]
[/step]