Weak Law of Large Numbers for an Integrable Function

Weak Law of Large Numbers for an Integrable Function (Theorem # 6294)

Theorem



Proof

[proofplan] We first prove the assertion for bounded [measurable functions](/page/Measurable%20Functions) by estimating the second moment of the centered empirical average and using the elementary [Markov inequality](/theorems/514) for its square. For a general integrable function, we truncate $g$ to a bounded function $g_M$ and decompose the empirical error into the bounded truncated error plus two tail errors. The bounded part vanishes as $n\to\infty$, while integrability of $g$ makes the population tail small, and the empirical tail is controlled in probability by its expectation. [/proofplan] [step:Prove the convergence for bounded measurable functions] Let $h:\mathcal X\to\mathbb R$ be a bounded $\mathcal A$-measurable function. Define \begin{align*} Ph:=\int_{\mathcal X} h(x)\,dP(x). \end{align*} For each $n\in\mathbb N$, define the real-valued [random variable](/page/Random%20Variable) $P_nh:\Omega\to\mathbb R$ by \begin{align*} P_nh(\omega):=\frac{1}{n}\sum_{i=1}^n h(X_i(\omega)). \end{align*} Let $K:=\sup_{x\in\mathcal X}|h(x)|<\infty$. For any integrable real-valued random variable $Y:\Omega\to\mathbb R$, write \begin{align*} \mathbb E[Y]:=\int_\Omega Y(\omega)\,d\mathbb P(\omega). \end{align*} For each $i\in\mathbb N$, define the centered real-valued random variable $Z_i:\Omega\to\mathbb R$ by \begin{align*} Z_i(\omega):=h(X_i(\omega))-Ph. \end{align*} Since $X_1,X_2,\dots$ are independent and identically distributed with common distribution $P$, the random variables $Z_1,Z_2,\dots$ are independent and identically distributed. The change-of-law identity for $X_i$ gives \begin{align*} \mathbb E[Z_i]=\int_\Omega \bigl(h(X_i(\omega))-Ph\bigr)\,d\mathbb P(\omega)=\int_{\mathcal X} h(x)\,dP(x)-Ph=0. \end{align*} Moreover $|Z_i|\le 2K$, so $\mathbb E[Z_i^2]\le 4K^2$. For $i<j$, independence gives $\mathbb E[Z_iZ_j]=\mathbb E[Z_i]\mathbb E[Z_j]=0$. 
Expanding the square and using these mixed-term identities, \begin{align*} \mathbb E\left[\left(P_nh-Ph\right)^2\right]=\mathbb E\left[\left(\frac{1}{n}\sum_{i=1}^n Z_i\right)^2\right]=\frac{1}{n^2}\sum_{i=1}^n\mathbb E[Z_i^2]+\frac{2}{n^2}\sum_{1\le i<j\le n}\mathbb E[Z_iZ_j]=\frac{1}{n^2}\sum_{i=1}^n\mathbb E[Z_i^2]\le \frac{4K^2}{n}. \end{align*} For every $\varepsilon>0$, the nonnegative random variable $(P_nh-Ph)^2$ satisfies \begin{align*} \varepsilon^2\,\mathbb 1_{\{|P_nh-Ph|>\varepsilon\}}\le (P_nh-Ph)^2. \end{align*} Integrating with respect to $\mathbb P$ gives the Markov estimate \begin{align*} \mathbb P(|P_nh-Ph|>\varepsilon)\le \frac{1}{\varepsilon^2}\mathbb E\left[\left(P_nh-Ph\right)^2\right]\le \frac{4K^2}{n\varepsilon^2}. \end{align*} Letting $n\to\infty$ proves $P_nh\xrightarrow{\mathbb P}Ph$ for bounded measurable $h$. [guided] The bounded case is the variance estimate that drives the whole proof. Let $h:\mathcal X\to\mathbb R$ be bounded and $\mathcal A$-measurable, and define \begin{align*} Ph:=\int_{\mathcal X} h(x)\,dP(x), \qquad P_nh(\omega):=\frac{1}{n}\sum_{i=1}^n h(X_i(\omega)). \end{align*} If $K:=\sup_{x\in\mathcal X}|h(x)|$, then $K<\infty$. For each $i\in\mathbb N$, set $Z_i(\omega):=h(X_i(\omega))-Ph$. Since the $X_i$ are independent and all have law $P$, the $Z_i$ are independent and identically distributed. The change-of-law identity gives \begin{align*} \mathbb E[Z_i]=\int_\Omega h(X_i(\omega))\,d\mathbb P(\omega)-Ph=\int_{\mathcal X}h(x)\,dP(x)-Ph=0. \end{align*} Also $|Z_i|\le 2K$, so $\mathbb E[Z_i^2]\le 4K^2$. If $i<j$, independence gives $\mathbb E[Z_iZ_j]=\mathbb E[Z_i]\mathbb E[Z_j]=0$. Therefore \begin{align*} \mathbb E\left[\left(P_nh-Ph\right)^2\right]=\mathbb E\left[\left(\frac{1}{n}\sum_{i=1}^n Z_i\right)^2\right]. 
\end{align*} Expanding the square and using linearity of expectation gives \begin{align*} \mathbb E\left[\left(P_nh-Ph\right)^2\right]=\frac{1}{n^2}\sum_{i=1}^n\mathbb E[Z_i^2]+\frac{2}{n^2}\sum_{1\le i<j\le n}\mathbb E[Z_iZ_j]. \end{align*} Since $\mathbb E[Z_iZ_j]=0$ for $i<j$, the mixed sum vanishes, and since $\mathbb E[Z_i^2]\le 4K^2$, the diagonal sum is at most $4K^2n$. We obtain \begin{align*} \mathbb E\left[\left(P_nh-Ph\right)^2\right]\le \frac{4K^2}{n}. \end{align*} For every $\varepsilon>0$, the nonnegative random variable $(P_nh-Ph)^2$ satisfies \begin{align*} \varepsilon^2\,\mathbb 1_{\{|P_nh-Ph|>\varepsilon\}}\le (P_nh-Ph)^2. \end{align*} After integration with respect to $\mathbb P$, this yields \begin{align*} \mathbb P(|P_nh-Ph|>\varepsilon)\le \frac{1}{\varepsilon^2}\mathbb E\left[\left(P_nh-Ph\right)^2\right]\le \frac{4K^2}{n\varepsilon^2}. \end{align*} The right-hand side tends to $0$ as $n\to\infty$, which proves $P_nh\xrightarrow{\mathbb P}Ph$ for bounded measurable $h$. [/guided] [/step] [step:Truncate the integrable function and identify the tail] For each $M>0$, define the truncated function $g_M:\mathcal X\to\mathbb R$ by \begin{align*} g_M(x):=g(x)\,\mathbb 1_{\{|g|\le M\}}(x). \end{align*} Define the tail function $r_M:\mathcal X\to\mathbb R$ by \begin{align*} r_M(x):=g(x)-g_M(x). \end{align*} Then $g_M$ is bounded and measurable, with $|g_M(x)|\le M$ for every $x\in\mathcal X$, and \begin{align*} |r_M(x)|=|g(x)|\,\mathbb 1_{\{|g|>M\}}(x). \end{align*} Since $g$ is integrable, that is, \begin{align*} \int_{\mathcal X}|g(x)|\,dP(x)<\infty, \end{align*} and since $|g|\,\mathbb 1_{\{|g|>M\}}\downarrow 0$ pointwise as $M\to\infty$, the [Dominated Convergence Theorem](/theorems/4) applied with dominating function $|g|$ gives \begin{align*} P|r_M|:=\int_{\mathcal X}|r_M(x)|\,dP(x)=\int_{\mathcal X}|g(x)|\,\mathbb 1_{\{|g|>M\}}(x)\,dP(x)\to 0. 
\end{align*} [guided] The goal is to replace the possibly unbounded function $g$ by a bounded approximation, because the bounded case has already been proved. For each $M>0$, define the function $g_M:\mathcal X\to\mathbb R$ by \begin{align*} g_M(x):=g(x)\,\mathbb 1_{\{|g|\le M\}}(x). \end{align*} This function is measurable because $g$ is measurable and $\{|g|\le M\}\in\mathcal A$. It is bounded because $|g_M(x)|\le M$ for every $x\in\mathcal X$. The part removed by the truncation is the function $r_M:\mathcal X\to\mathbb R$ defined by \begin{align*} r_M(x):=g(x)-g_M(x). \end{align*} By the definition of $g_M$, this tail satisfies \begin{align*} |r_M(x)|=|g(x)|\,\mathbb 1_{\{|g|>M\}}(x) \end{align*} for every $x\in\mathcal X$. This identity is the key point: the truncation error is not arbitrary; it is exactly the part of $|g|$ lying above level $M$. Because $g$ is integrable, \begin{align*} \int_{\mathcal X}|g(x)|\,dP(x)<\infty. \end{align*} As $M\to\infty$, the functions $|g|\,\mathbb 1_{\{|g|>M\}}$ decrease pointwise to $0$ and are dominated by the integrable function $|g|$. The hypotheses of the [Dominated Convergence Theorem](/theorems/4) are therefore satisfied: the functions converge pointwise to $0$ and the dominating function has finite $P$-integral. Hence the tail integrals vanish: \begin{align*} P|r_M|:=\int_{\mathcal X}|r_M(x)|\,dP(x)=\int_{\mathcal X}|g(x)|\,\mathbb 1_{\{|g|>M\}}(x)\,dP(x)\to 0. \end{align*} This is precisely where the hypothesis $g\in L^1(\mathcal X,\mathcal A,P)$ is used. [/guided] [/step] [step:Control the empirical tail by its expectation] For each $M>0$ and $n\in\mathbb N$, define the nonnegative random variable $P_n|r_M|:\Omega\to[0,\infty)$ by \begin{align*} P_n|r_M|(\omega):=\frac{1}{n}\sum_{i=1}^n |r_M(X_i(\omega))|. 
\end{align*} Since each $X_i$ has distribution $P$, linearity of expectation and the change-of-law identity give \begin{align*} \mathbb E[P_n|r_M|]=\frac{1}{n}\sum_{i=1}^n \int_\Omega |r_M(X_i(\omega))|\,d\mathbb P(\omega)=\frac{1}{n}\sum_{i=1}^n \int_{\mathcal X}|r_M(x)|\,dP(x)=P|r_M|. \end{align*} For every $a>0$, the nonnegative random variable $P_n|r_M|$ satisfies \begin{align*} a\,\mathbb 1_{\{P_n|r_M|>a\}}\le P_n|r_M|. \end{align*} Integrating with respect to $\mathbb P$ gives the Markov estimate \begin{align*} \mathbb P(P_n|r_M|>a)\le \frac{P|r_M|}{a}. \end{align*} [guided] We need a probability bound for the empirical tail $P_n|r_M|$. For fixed $M>0$ and $n\in\mathbb N$, define \begin{align*} P_n|r_M|(\omega):=\frac{1}{n}\sum_{i=1}^n |r_M(X_i(\omega))|. \end{align*} This random variable is nonnegative because each summand is nonnegative. By linearity of expectation, \begin{align*} \mathbb E[P_n|r_M|]=\frac{1}{n}\sum_{i=1}^n\int_\Omega |r_M(X_i(\omega))|\,d\mathbb P(\omega). \end{align*} Since every $X_i$ has distribution $P$, the change-of-law identity applied to each summand yields \begin{align*} \mathbb E[P_n|r_M|]=\frac{1}{n}\sum_{i=1}^n\int_{\mathcal X}|r_M(x)|\,dP(x)=P|r_M|. \end{align*} For $a>0$, the pointwise inequality \begin{align*} a\,\mathbb 1_{\{P_n|r_M|>a\}}\le P_n|r_M| \end{align*} holds because on the event $\{P_n|r_M|>a\}$ the left-hand side equals $a$ while the right-hand side exceeds $a$, and outside that event the left-hand side is $0$ while the right-hand side is nonnegative. Integrating with respect to $\mathbb P$ gives \begin{align*} a\,\mathbb P(P_n|r_M|>a)\le \mathbb E[P_n|r_M|]=P|r_M|, \end{align*} and division by $a>0$ yields \begin{align*} \mathbb P(P_n|r_M|>a)\le \frac{P|r_M|}{a}. \end{align*} [/guided] [/step] [step:Combine the truncated convergence with the tail estimates] Fix $\varepsilon>0$. For every $M>0$ and $n\in\mathbb N$, the decomposition $g=g_M+r_M$ and the triangle inequality give \begin{align*} |P_ng-Pg|\le |P_ng_M-Pg_M|+|P_nr_M-Pr_M|\le |P_ng_M-Pg_M|+P_n|r_M|+P|r_M|. 
\end{align*} Since $P|r_M|\to 0$ as $M\to\infty$, we may choose $M>0$ such that \begin{align*} P|r_M|<\frac{\varepsilon}{3}. \end{align*} Then \begin{align*} \{|P_ng-Pg|>\varepsilon\} \subset \left\{|P_ng_M-Pg_M|>\frac{\varepsilon}{3}\right\} \cup \left\{P_n|r_M|>\frac{\varepsilon}{3}\right\}. \end{align*} Taking probabilities and applying the union bound, \begin{align*} \mathbb P(|P_ng-Pg|>\varepsilon)\le \mathbb P\left(|P_ng_M-Pg_M|>\frac{\varepsilon}{3}\right)+\mathbb P\left(P_n|r_M|>\frac{\varepsilon}{3}\right)\le \mathbb P\left(|P_ng_M-Pg_M|>\frac{\varepsilon}{3}\right)+\frac{3P|r_M|}{\varepsilon}. \end{align*} The function $g_M$ is bounded and measurable, so the bounded case proves \begin{align*} \mathbb P\left(|P_ng_M-Pg_M|>\frac{\varepsilon}{3}\right)\to 0 \end{align*} as $n\to\infty$. Hence \begin{align*} \limsup_{n\to\infty}\mathbb P(|P_ng-Pg|>\varepsilon) \le \frac{3P|r_M|}{\varepsilon}. \end{align*} The left-hand side does not depend on $M$, and the bound holds for every $M$ with $P|r_M|<\varepsilon/3$, hence for all sufficiently large $M$. Letting $M\to\infty$ and using $P|r_M|\to 0$, we obtain \begin{align*} \limsup_{n\to\infty}\mathbb P(|P_ng-Pg|>\varepsilon)=0. \end{align*} Because $\varepsilon>0$ was arbitrary, this proves \begin{align*} P_ng\xrightarrow{\mathbb P}Pg. \end{align*} [guided] Fix $\varepsilon>0$. The theorem statement defines $Pg=\int_{\mathcal X}g(x)\,dP(x)$ and $P_ng(\omega)=n^{-1}\sum_{i=1}^n g(X_i(\omega))$, so the notation is valid for the integrable function $g$. For each $M>0$, write $g=g_M+r_M$, where $g_M$ is bounded and $r_M$ is the tail. Then the triangle inequality gives \begin{align*} |P_ng-Pg|\le |P_ng_M-Pg_M|+|P_nr_M-Pr_M|. \end{align*} Using $|P_nr_M|\le P_n|r_M|$ and $|Pr_M|\le P|r_M|$, we get \begin{align*} |P_ng-Pg|\le |P_ng_M-Pg_M|+P_n|r_M|+P|r_M|. \end{align*} Choose $M>0$ so that $P|r_M|<\varepsilon/3$, which is possible because the truncation step proved $P|r_M|\to 0$. If $|P_ng-Pg|>\varepsilon$, then, because the third term is smaller than $\varepsilon/3$, at least one of the two inequalities \begin{align*} |P_ng_M-Pg_M|>\frac{\varepsilon}{3}, \qquad P_n|r_M|>\frac{\varepsilon}{3} \end{align*} must hold. 
Hence the union bound gives \begin{align*} \mathbb P(|P_ng-Pg|>\varepsilon) \le \mathbb P\left(|P_ng_M-Pg_M|>\frac{\varepsilon}{3}\right)+\mathbb P\left(P_n|r_M|>\frac{\varepsilon}{3}\right). \end{align*} The first term tends to $0$ as $n\to\infty$ by the bounded case applied to the bounded measurable function $g_M$. The empirical-tail estimate with $a=\varepsilon/3$ gives \begin{align*} \mathbb P\left(P_n|r_M|>\frac{\varepsilon}{3}\right)\le \frac{3P|r_M|}{\varepsilon}. \end{align*} Therefore \begin{align*} \limsup_{n\to\infty}\mathbb P(|P_ng-Pg|>\varepsilon)\le \frac{3P|r_M|}{\varepsilon}. \end{align*} The left-hand side does not depend on $M$, and the bound holds for every $M$ with $P|r_M|<\varepsilon/3$, hence for all sufficiently large $M$. Finally let $M\to\infty$. Since $P|r_M|\to 0$, the right-hand side tends to $0$, so \begin{align*} \limsup_{n\to\infty}\mathbb P(|P_ng-Pg|>\varepsilon)=0. \end{align*} Because this holds for every $\varepsilon>0$, it is exactly the definition of $P_ng\xrightarrow{\mathbb P}Pg$. [/guided] [/step]
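The two ingredients of the proof, a vanishing population tail $P|r_M|$ and a deviation probability that shrinks with $n$, can be checked numerically. The following is a minimal sketch, not part of the proof, under assumptions chosen purely for illustration: $X_i$ is taken to be Pareto-distributed on $[1,\infty)$ with shape $\alpha=1.5$ and $g(x)=x$, so that $g$ is integrable with $Pg=3$ but has infinite second moment, and the bounded-case variance estimate does not apply to $g$ directly. The function names are hypothetical, and the closed form $P|r_M|=\frac{\alpha}{\alpha-1}M^{1-\alpha}$ holds for $M\ge 1$ under this assumed distribution only.

```python
import random

ALPHA = 1.5  # assumed Pareto shape: finite mean, infinite variance


def population_mean(alpha=ALPHA):
    """Pg = E[X] for X ~ Pareto(alpha) on [1, inf), with g(x) = x."""
    return alpha / (alpha - 1.0)


def tail_mean(M, alpha=ALPHA):
    """P|r_M| = E[X * 1{X > M}], in closed form (valid for M >= 1)."""
    return alpha / (alpha - 1.0) * M ** (1.0 - alpha)


def empirical_mean(n, rng, alpha=ALPHA):
    """P_n g: the average of n i.i.d. Pareto(alpha) draws."""
    return sum(rng.paretovariate(alpha) for _ in range(n)) / n


def deviation_frequency(n, eps, trials, rng, alpha=ALPHA):
    """Monte Carlo estimate of P(|P_n g - Pg| > eps) over independent trials."""
    target = population_mean(alpha)
    hits = sum(
        abs(empirical_mean(n, rng, alpha) - target) > eps for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is reproducible
    # The population tail shrinks as the truncation level M grows.
    for M in (1.0, 10.0, 100.0, 1e4):
        print(f"M={M:>8}: P|r_M| = {tail_mean(M):.4f}")
    # The estimated deviation probability shrinks as n grows.
    for n in (10, 100, 1000):
        print(f"n={n:>5}: freq = {deviation_frequency(n, 0.5, 200, rng):.3f}")
```

Because the assumed distribution is heavy-tailed, the decay in $n$ is slow. This is consistent with the proof, which yields convergence in probability without any rate for a general integrable $g$.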
