Law of the Unconscious Statistician — Statement & Proof

Theorem

Proof

[proofplan] The identity $\mathbb{E}[g(X)] = \int_{\mathbb{R}} g(x) \, d\mu_X(x)$ is proved by the standard four-step approximation scheme of abstract integration theory. In Step 1 we verify the identity for indicator functions $g = \mathbb{1}_B$ directly from the definition $\mu_X = \mathbb{P} \circ X^{-1}$. In Step 2 we extend by linearity of expectation to non-negative simple functions. In Step 3 we pass to arbitrary non-negative Borel-measurable $g$ by approximating monotonically with simple functions and applying the Monotone Convergence Theorem simultaneously on $(\Omega, \mathcal{F}, \mathbb{P})$ and on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu_X)$. In Step 4 we handle integrable functions of arbitrary sign via the positive-negative decomposition $g = g^+ - g^-$. The density formula (iii) then follows from the [Radon-Nikodym Theorem](/page/Radon-Nikodym%20Theorem) applied to the pair $\mu_X \ll \mathcal{L}^1$. [/proofplan] [step:Establish the identity for indicator functions from the definition of the pushforward] Let $B \in \mathcal{B}(\mathbb{R})$ and set $g = \mathbb{1}_B$. Since $X: (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is measurable, $X^{-1}(B) \in \mathcal{F}$. For every $\omega \in \Omega$, \begin{align*} \mathbb{1}_B(X(\omega)) = 1 \iff X(\omega) \in B \iff \omega \in X^{-1}(B), \end{align*} so $\mathbb{1}_B \circ X = \mathbb{1}_{X^{-1}(B)}$ pointwise on $\Omega$. Therefore: \begin{align*} \mathbb{E}[\mathbb{1}_B(X)] = \mathbb{E}[\mathbb{1}_{X^{-1}(B)}] = \mathbb{P}(X^{-1}(B)) = \mu_X(B) = \int_{\mathbb{R}} \mathbb{1}_B(x) \, d\mu_X(x). \end{align*} [guided] The strategy for the full proof is to build up from the simplest possible test functions — indicators of measurable sets — where the identity reduces directly to the definition of $\mu_X$. Fix $B \in \mathcal{B}(\mathbb{R})$ and set $g = \mathbb{1}_B$. 
Since $X: (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is $\mathcal{F}/\mathcal{B}(\mathbb{R})$-measurable and $B \in \mathcal{B}(\mathbb{R})$, the preimage $X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\}$ belongs to $\mathcal{F}$. Hence $\mathbb{1}_{X^{-1}(B)}$ is a valid $\mathcal{F}$-measurable [random variable](/page/Random%20Variable). Pointwise on $\Omega$: $\mathbb{1}_B(X(\omega)) = 1$ if and only if $X(\omega) \in B$, which holds if and only if $\omega \in X^{-1}(B)$. Thus $\mathbb{1}_B \circ X = \mathbb{1}_{X^{-1}(B)}$ as functions on $\Omega$. Now compute: \begin{align*} \mathbb{E}[\mathbb{1}_B(X)] = \mathbb{E}[\mathbb{1}_{X^{-1}(B)}] = \mathbb{P}(X^{-1}(B)). \end{align*} By the definition of the pushforward (image) measure $\mu_X = \mathbb{P} \circ X^{-1}$: \begin{align*} \mathbb{P}(X^{-1}(B)) = (\mathbb{P} \circ X^{-1})(B) = \mu_X(B). \end{align*} By the definition of the [Lebesgue integral](/page/Lebesgue%20Integral) of a non-negative [simple function](/page/Simple%20Function) against $\mu_X$: \begin{align*} \mu_X(B) = \int_{\mathbb{R}} \mathbb{1}_B(x) \, d\mu_X(x). \end{align*} Chaining these three equalities gives $\mathbb{E}[\mathbb{1}_B(X)] = \int_{\mathbb{R}} \mathbb{1}_B(x) \, d\mu_X(x)$, completing Step 1. [/guided] [/step] [step:Extend to non-negative simple functions by linearity of expectation] Let $g: \mathbb{R} \to [0, \infty)$ be a non-negative $\mathcal{B}(\mathbb{R})$-measurable simple function in canonical form: \begin{align*} g = \sum_{k=1}^{N} a_k \, \mathbb{1}_{B_k}, \end{align*} where $N \in \mathbb{N}$, $a_k \geq 0$, and $B_1, \dots, B_N \in \mathcal{B}(\mathbb{R})$ are pairwise disjoint. Then $g \circ X = \sum_{k=1}^{N} a_k \, \mathbb{1}_{B_k}(X)$. Each term $a_k \mathbb{1}_{B_k}(X)$ is a bounded $\mathcal{F}$-measurable random variable and hence integrable. 
By linearity of expectation and Step 1: \begin{align*} \mathbb{E}[g(X)] = \sum_{k=1}^{N} a_k \, \mathbb{E}[\mathbb{1}_{B_k}(X)] = \sum_{k=1}^{N} a_k \, \mu_X(B_k) = \int_{\mathbb{R}} g(x) \, d\mu_X(x), \end{align*} where the last equality is the definition of $\int_{\mathbb{R}} g \, d\mu_X$ for a non-negative simple function. [guided] A non-negative simple function is a finite non-negative linear combination of indicator functions, so the identity of Step 1 propagates by linearity. The canonical representation (disjoint sets, non-negative coefficients) is no loss of generality: any simple function can be written in this form by taking its level sets as the $B_k$. Since $g \circ X = \sum_{k=1}^N a_k \mathbb{1}_{B_k}(X)$ and each $a_k \geq 0$ is a non-negative finite constant, each summand $a_k \mathbb{1}_{B_k}(X)$ is bounded and $\mathcal{F}$-measurable. A bounded measurable function on a probability space is automatically integrable (its expectation exists and is finite). Linearity of expectation for finite sums of integrable random variables gives: \begin{align*} \mathbb{E}[g(X)] = \sum_{k=1}^{N} a_k \, \mathbb{E}[\mathbb{1}_{B_k}(X)]. \end{align*} By Step 1, $\mathbb{E}[\mathbb{1}_{B_k}(X)] = \mu_X(B_k)$ for each $k = 1, \dots, N$. By the standard definition of the abstract Lebesgue integral of a non-negative simple function against $\mu_X$: \begin{align*} \sum_{k=1}^{N} a_k \, \mu_X(B_k) = \int_{\mathbb{R}} g(x) \, d\mu_X(x). \end{align*} Combining: $\mathbb{E}[g(X)] = \int_{\mathbb{R}} g(x) \, d\mu_X(x)$. [/guided] [/step] [step:Lift to all non-negative Borel-measurable functions via monotone approximation and the Monotone Convergence Theorem] Let $g: \mathbb{R} \to [0, \infty)$ be Borel-measurable. By [Monotone Approximation by Simple Functions](/theorems/1020), there exists a sequence $(g_n)_{n=1}^{\infty}$ of non-negative $\mathcal{B}(\mathbb{R})$-measurable simple functions satisfying $0 \leq g_n \nearrow g$ pointwise on $\mathbb{R}$. 
Since $g_n \nearrow g$ on $\mathbb{R}$, the compositions satisfy $0 \leq g_n(X(\omega)) \nearrow g(X(\omega))$ for every $\omega \in \Omega$. The sequence $(g_n \circ X)_{n \geq 1}$ consists of non-negative $\mathcal{F}$-measurable random variables that increase monotonically to $g \circ X$. Applying the [Monotone Convergence Theorem](/theorems/509) on $(\Omega, \mathcal{F}, \mathbb{P})$: \begin{align*} \mathbb{E}[g(X)] = \lim_{n \to \infty} \mathbb{E}[g_n(X)]. \end{align*} Applying the [Monotone Convergence Theorem](/theorems/509) on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu_X)$ to the non-negative monotone non-decreasing sequence $(g_n)$ converging pointwise to $g$: \begin{align*} \int_{\mathbb{R}} g(x) \, d\mu_X(x) = \lim_{n \to \infty} \int_{\mathbb{R}} g_n(x) \, d\mu_X(x). \end{align*} By Step 2, $\mathbb{E}[g_n(X)] = \int_{\mathbb{R}} g_n(x) \, d\mu_X(x)$ for every $n \geq 1$. Therefore: \begin{align*} \mathbb{E}[g(X)] = \lim_{n \to \infty} \mathbb{E}[g_n(X)] = \lim_{n \to \infty} \int_{\mathbb{R}} g_n(x) \, d\mu_X(x) = \int_{\mathbb{R}} g(x) \, d\mu_X(x). \end{align*} Both sides may equal $+\infty$. This establishes part (i) of the theorem. [guided] The passage from simple functions to arbitrary non-negative measurable $g$ proceeds by approximating $g$ from below by simple functions that increase to it, then passing to the limit using the Monotone Convergence Theorem on each of the two measure spaces in play. **Constructing the approximating sequence.** By [Monotone Approximation by Simple Functions](/theorems/1020), for every non-negative Borel-measurable $g: \mathbb{R} \to [0, \infty)$ there exist non-negative $\mathcal{B}(\mathbb{R})$-measurable simple functions $g_n: \mathbb{R} \to [0, \infty)$ with \begin{align*} 0 \leq g_1(x) \leq g_2(x) \leq \dots \leq g(x) \quad \text{for all } x \in \mathbb{R}, \qquad \lim_{n \to \infty} g_n(x) = g(x) \quad \text{for all } x \in \mathbb{R}. 
\end{align*} (The standard construction truncates $g$ at height $n$ and rounds down to the nearest multiple of $2^{-n}$.) **Lifting to $\Omega$.** For each fixed $\omega \in \Omega$, the monotone convergence $g_n(x) \nearrow g(x)$ applies at $x = X(\omega)$, giving $g_n(X(\omega)) \nearrow g(X(\omega))$. Hence the sequence of random variables $(g_n \circ X)_{n \geq 1}$ on $(\Omega, \mathcal{F}, \mathbb{P})$ is: (a) $\mathcal{F}$-measurable (compositions of measurable maps); (b) non-negative; (c) monotone non-decreasing; (d) converging pointwise to $g \circ X$. **Applying MCT on $(\Omega, \mathcal{F}, \mathbb{P})$.** The [Monotone Convergence Theorem](/theorems/509) requires non-negative measurable integrands forming a monotone non-decreasing sequence, which is exactly what (a)–(c) provide; (d) identifies the limit as $g \circ X$. Therefore: \begin{align*} \mathbb{E}[g(X)] = \mathbb{E}\!\left[\lim_{n \to \infty} g_n(X)\right] = \lim_{n \to \infty} \mathbb{E}[g_n(X)]. \end{align*} **Applying MCT on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu_X)$.** The same conditions hold for $(g_n)$ on $\mathbb{R}$ with respect to $\mu_X$: the $g_n$ are Borel-measurable, non-negative, and non-decreasing, with pointwise limit $g$. Therefore: \begin{align*} \int_{\mathbb{R}} g(x) \, d\mu_X(x) = \lim_{n \to \infty} \int_{\mathbb{R}} g_n(x) \, d\mu_X(x). \end{align*} **Combining.** Step 2 gives $\mathbb{E}[g_n(X)] = \int_{\mathbb{R}} g_n(x) \, d\mu_X(x)$ for each $n \geq 1$. Hence the two limits coincide: \begin{align*} \mathbb{E}[g(X)] = \lim_{n \to \infty} \mathbb{E}[g_n(X)] = \lim_{n \to \infty} \int_{\mathbb{R}} g_n(x) \, d\mu_X(x) = \int_{\mathbb{R}} g(x) \, d\mu_X(x). \end{align*} Both sides are equal in $[0, +\infty]$. Applying this to $g$ replaced by $|g|$ will later confirm the equivalence of integrability on the two spaces. [/guided] [/step] [step:Handle integrable functions of arbitrary sign via positive-negative decomposition] Let $g: \mathbb{R} \to \mathbb{R}$ be Borel-measurable with $g \circ X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$. 
Define: \begin{align*} g^+ &:= \max\{g, 0\}: (\mathbb{R}, \mathcal{B}(\mathbb{R})) \to [0, \infty), \\ g^- &:= \max\{-g, 0\}: (\mathbb{R}, \mathcal{B}(\mathbb{R})) \to [0, \infty), \end{align*} so that $g = g^+ - g^-$ and $|g| = g^+ + g^-$ pointwise. Both $g^+$ and $g^-$ are non-negative and Borel-measurable as pointwise maxima of Borel-[measurable functions](/page/Measurable%20Functions). Since $g \circ X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$, we have $\mathbb{E}[|g(X)|] < \infty$. Because $g^+(X) \leq |g(X)|$ and $g^-(X) \leq |g(X)|$ pointwise on $\Omega$, it follows that $\mathbb{E}[g^+(X)] < \infty$ and $\mathbb{E}[g^-(X)] < \infty$. Applying Step 3 to the non-negative functions $g^+$ and $g^-$: \begin{align*} \mathbb{E}[g^+(X)] = \int_{\mathbb{R}} g^+(x) \, d\mu_X(x) < \infty, \qquad \mathbb{E}[g^-(X)] = \int_{\mathbb{R}} g^-(x) \, d\mu_X(x) < \infty. \end{align*} Both integrals are finite, so their difference is well-defined. Using $g(X) = g^+(X) - g^-(X)$ and linearity of expectation: \begin{align*} \mathbb{E}[g(X)] &= \mathbb{E}[g^+(X)] - \mathbb{E}[g^-(X)] \\ &= \int_{\mathbb{R}} g^+(x) \, d\mu_X(x) - \int_{\mathbb{R}} g^-(x) \, d\mu_X(x) = \int_{\mathbb{R}} g(x) \, d\mu_X(x). \end{align*} Applying Step 3 to $|g| = g^+ + g^-$ gives $\int_{\mathbb{R}} |g| \, d\mu_X = \mathbb{E}[|g(X)|] < \infty$, so $g \in L^1(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu_X)$. Conversely, if $g \in L^1(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu_X)$, the same identity for $|g|$ gives $\mathbb{E}[|g(X)|] < \infty$. This establishes the equivalence of integrability asserted in part (ii) and completes the proof of the main identity. [guided] Steps 1–3 established the identity for non-negative $g$. A general integrable function takes both positive and negative values, so we cannot apply Step 3 directly. The remedy is the standard decomposition $g = g^+ - g^-$, which separates $g$ into two non-negative pieces. 
**Why we need finiteness of both parts.** Define $g^+ = \max\{g, 0\}$ and $g^- = \max\{-g, 0\}$. Both are non-negative and Borel-measurable. From $g = g^+ - g^-$ and linearity of expectation we want to write $\mathbb{E}[g(X)] = \mathbb{E}[g^+(X)] - \mathbb{E}[g^-(X)]$. This subtraction is only valid when both terms are finite; otherwise we would be forming $+\infty - (+\infty)$, which is undefined. The hypothesis $g \circ X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ guarantees $\mathbb{E}[|g(X)|] < \infty$. Since $0 \leq g^+(X) \leq |g(X)|$ and $0 \leq g^-(X) \leq |g(X)|$ pointwise, we get $\mathbb{E}[g^+(X)] \leq \mathbb{E}[|g(X)|] < \infty$ and $\mathbb{E}[g^-(X)] \leq \mathbb{E}[|g(X)|] < \infty$. **Applying Step 3 to each part.** Each of $g^+$ and $g^-$ is non-negative and Borel-measurable. Step 3 applies to each: \begin{align*} \mathbb{E}[g^+(X)] = \int_{\mathbb{R}} g^+(x) \, d\mu_X(x) < \infty, \qquad \mathbb{E}[g^-(X)] = \int_{\mathbb{R}} g^-(x) \, d\mu_X(x) < \infty. \end{align*} The finiteness established above for the left-hand sides transfers to the right-hand sides, confirming that $g^+$ and $g^-$ are $\mu_X$-integrable. **Combining.** Since both integrals on the right are finite, the difference is well-defined: \begin{align*} \mathbb{E}[g(X)] = \mathbb{E}[g^+(X)] - \mathbb{E}[g^-(X)] = \int_{\mathbb{R}} g^+(x) \, d\mu_X(x) - \int_{\mathbb{R}} g^-(x) \, d\mu_X(x) = \int_{\mathbb{R}} g(x) \, d\mu_X(x). \end{align*} **The equivalence of integrability.** Applying Step 3 to $|g| = g^+ + g^-$ gives $\int_{\mathbb{R}} |g| \, d\mu_X = \mathbb{E}[|g(X)|]$. Therefore $g \circ X \in L^1(\mathbb{P})$ (i.e., $\mathbb{E}[|g(X)|] < \infty$) if and only if $g \in L^1(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu_X)$ (i.e., $\int_{\mathbb{R}} |g| \, d\mu_X < \infty$). This confirms the equivalence stated in part (ii) of the theorem. 
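**Illustration (an assumed example, not part of the original proof).** To see that the integrability hypothesis is doing real work, suppose for the sake of illustration that $X$ takes the value $(-2)^k$ with probability $2^{-k}$ for $k = 1, 2, \dots$, and let $g(x) = x$. Here $\mu_X$ is a discrete measure with point masses $\mu_X(\{(-2)^k\}) = 2^{-k}$, so Step 3 applied to $g^+$ and $g^-$ separately gives \begin{align*} \mathbb{E}[g^+(X)] &= \int_{\mathbb{R}} g^+ \, d\mu_X = \sum_{k \text{ even}} 2^k \cdot 2^{-k} = +\infty, \\ \mathbb{E}[g^-(X)] &= \int_{\mathbb{R}} g^- \, d\mu_X = \sum_{k \text{ odd}} 2^k \cdot 2^{-k} = +\infty. \end{align*} Both parts are infinite, so $\mathbb{E}[|g(X)|] = \int_{\mathbb{R}} |g| \, d\mu_X = +\infty$ and the difference $\mathbb{E}[g^+(X)] - \mathbb{E}[g^-(X)]$ is the undefined form $+\infty - (+\infty)$. Neither side of the identity is defined for this $g$, which is consistent with the equivalence in part (ii): integrability fails on both spaces at once.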
[/guided] [/step] [step:Derive the density formula when $\mu_X \ll \mathcal{L}^1$ from the Radon-Nikodym Theorem] Suppose $\mu_X \ll \mathcal{L}^1$. The [Radon-Nikodym Theorem](/theorems/1247), applied to the $\sigma$-finite measure spaces $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu_X)$ and $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mathcal{L}^1)$ (both $\sigma$-finite since $\mathbb{R} = \bigcup_{n=1}^\infty [-n, n]$ and $\mathcal{L}^1([-n,n]) = 2n < \infty$, $\mu_X([-n,n]) \leq 1 < \infty$), guarantees the existence of a unique (up to $\mathcal{L}^1$-a.e. equality) non-negative $\mathcal{B}(\mathbb{R})$-measurable function $f_X \in L^1(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mathcal{L}^1)$ such that \begin{align*} \mu_X(B) = \int_B f_X(x) \, d\mathcal{L}^1(x) \qquad \text{for all } B \in \mathcal{B}(\mathbb{R}). \end{align*} In particular $\int_{\mathbb{R}} f_X \, d\mathcal{L}^1 = \mu_X(\mathbb{R}) = 1$. It remains to show that $\int_{\mathbb{R}} g \, d\mu_X = \int_{\mathbb{R}} g \, f_X \, d\mathcal{L}^1$ for $g \in L^1(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu_X)$. For $B \in \mathcal{B}(\mathbb{R})$, the Radon-Nikodym condition gives $\int_{\mathbb{R}} \mathbb{1}_B \, d\mu_X = \mu_X(B) = \int_{\mathbb{R}} \mathbb{1}_B \, f_X \, d\mathcal{L}^1$. Extending to simple functions by linearity, then to non-negative measurable functions by the [Monotone Convergence Theorem](/theorems/509) applied on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mathcal{L}^1)$, and finally to integrable functions by positive-negative decomposition — the identical three-step extension of Steps 2–4 — yields: \begin{align*} \int_{\mathbb{R}} g(x) \, d\mu_X(x) = \int_{\mathbb{R}} g(x) \, f_X(x) \, d\mathcal{L}^1(x). \end{align*} Combining with the main identity from Step 4: \begin{align*} \mathbb{E}[g(X)] = \int_{\mathbb{R}} g(x) \, d\mu_X(x) = \int_{\mathbb{R}} g(x) \, f_X(x) \, d\mathcal{L}^1(x). \end{align*} This completes the proof of all three parts of the theorem. 
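As a quick sanity check of the density formula (an assumed example, not taken from the source), suppose $\mu_X$ has density $f_X(x) = e^{-x} \, \mathbb{1}_{[0, \infty)}(x)$ and take $g(x) = x^2$. Since $g \geq 0$, Step 3 and the extension above apply without any integrability check, and two integrations by parts give \begin{align*} \mathbb{E}[X^2] = \int_{\mathbb{R}} x^2 \, f_X(x) \, d\mathcal{L}^1(x) = \int_0^\infty x^2 e^{-x} \, dx = 2. \end{align*} The value is finite, so in this example $g \in L^1(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu_X)$, and the expectation was computed without ever determining the distribution of $X^2$ itself.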
[guided] The density formula converts the integral against the abstract distribution $\mu_X$ into an ordinary integral against Lebesgue measure $\mathcal{L}^1$. The bridge is the [Radon-Nikodym Theorem](/theorems/2640). **Verifying the hypotheses of the Radon-Nikodym Theorem.** The [Radon-Nikodym Theorem](/theorems/1247) requires: (a) both measures are $\sigma$-finite, and (b) $\mu_X \ll \mathcal{L}^1$. For (a): $\mathbb{R} = \bigcup_{n=1}^\infty [-n, n]$, and $\mathcal{L}^1([-n,n]) = 2n < \infty$, $\mu_X([-n,n]) \leq \mu_X(\mathbb{R}) = 1 < \infty$ for each $n$. For (b): this is exactly the hypothesis of part (iii). The theorem therefore guarantees existence and $\mathcal{L}^1$-a.e. uniqueness of $f_X \geq 0$ with $\mu_X(B) = \int_B f_X \, d\mathcal{L}^1$ for all $B \in \mathcal{B}(\mathbb{R})$. Note: $\int_{\mathbb{R}} f_X \, d\mathcal{L}^1 = \mu_X(\mathbb{R}) = 1$, confirming that $f_X$ is a probability density. **Extending the Radon-Nikodym identity to integrable $g$.** The defining relation of $f_X$ is an identity of measures: \begin{align*} \int_{\mathbb{R}} \mathbb{1}_B(x) \, d\mu_X(x) = \mu_X(B) = \int_{\mathbb{R}} \mathbb{1}_B(x) \, f_X(x) \, d\mathcal{L}^1(x) \quad \text{for all } B \in \mathcal{B}(\mathbb{R}). \end{align*} We now promote this from indicator functions to all $g \in L^1(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu_X)$ by repeating the same three steps as the main proof (Steps 2–4), but now working with the pair of measures $(\mu_X, \mathcal{L}^1)$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and the function $x \mapsto g(x) f_X(x)$ in place of $g \circ X$: - *Simple functions*: by linearity, $\int_{\mathbb{R}} \sum_k a_k \mathbb{1}_{B_k} \, d\mu_X = \sum_k a_k \mu_X(B_k) = \sum_k a_k \int_{B_k} f_X \, d\mathcal{L}^1 = \int_{\mathbb{R}} \sum_k a_k \mathbb{1}_{B_k} \cdot f_X \, d\mathcal{L}^1$. 
- *Non-negative measurable*: approximate by simple functions $h_n \nearrow h$ and apply the [Monotone Convergence Theorem](/theorems/509) to $(h_n f_X)$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mathcal{L}^1)$ (noting $h_n f_X \nearrow h f_X$ pointwise) and to $(h_n)$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu_X)$. - *Integrable*: decompose $g = g^+ - g^-$ and use finiteness on both sides. The outcome is $\int_{\mathbb{R}} g \, d\mu_X = \int_{\mathbb{R}} g f_X \, d\mathcal{L}^1$ for all $g \in L^1(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu_X)$. **Conclusion.** Combining the main identity (Step 4) with the conversion formula just established: \begin{align*} \mathbb{E}[g(X)] = \int_{\mathbb{R}} g(x) \, d\mu_X(x) = \int_{\mathbb{R}} g(x) \, f_X(x) \, d\mathcal{L}^1(x). \end{align*} This completes the proof of all three parts of the theorem. [/guided] [/step]
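The identity of Steps 1 and 2 can be checked by hand on a finite probability space, where every function is simple and both integrals are finite sums. The following is a minimal sketch under assumed, illustrative choices that do not come from the source: the sample space `omega` (two fair dice), the random variable `X` (their sum), and the test function `g` are all hypothetical. It computes $\mathbb{E}[g(X)]$ once as a sum over $\Omega$ against $\mathbb{P}$ and once as a sum over the values of $X$ against the pushforward $\mu_X = \mathbb{P} \circ X^{-1}$, using exact rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite probability space: two fair dice, uniform measure P.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
prob = {w: Fraction(1, 36) for w in omega}


def X(w):
    """Illustrative random variable: the sum of the two dice."""
    return w[0] + w[1]


def g(x):
    """Illustrative test function g(x) = (x - 7)^2."""
    return (x - 7) ** 2


# Left-hand side: E[g(X)] as an integral over Omega against P.
lhs = sum(g(X(w)) * prob[w] for w in omega)

# Pushforward measure mu_X({x}) = P(X^{-1}({x})), built from its definition.
mu_X = defaultdict(Fraction)
for w in omega:
    mu_X[X(w)] += prob[w]

# Right-hand side: integral of g over the values of X against mu_X.
rhs = sum(g(x) * mass for x, mass in mu_X.items())

print(lhs, rhs)
```

The right-hand sum never touches `omega` once `mu_X` is known, which is the practical content of the theorem: the distribution of $X$ alone determines $\mathbb{E}[g(X)]$.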

Law of the Unconscious Statistician (Theorem # 3536)
