Properties of Expectation — Statement & Proof

Properties of Expectation (Theorem # 1117)

Theorem

Proof

[proofplan] Each property is proved by expanding the expectation as a sum over the range of the discrete random variable and manipulating the resulting series. Part (v) uses the completion-of-the-square identity to show that the minimum of $\mathbb{E}[(X - c)^2]$ over $c \in \mathbb{R}$ is attained at $c = \mathbb{E}[X]$. [/proofplan]

[step:Prove non-negativity: $X \ge 0$ implies $\mathbb{E}[X] \ge 0$] If $X \ge 0$, then $X(\omega) \ge 0$ for all $\omega$, so every value $x$ in the range of $X$ satisfies $x \ge 0$. Since $\mathbb{P}(X = x) \ge 0$ for all $x$, \begin{align*} \mathbb{E}[X] = \sum_x x \, \mathbb{P}(X = x) \ge 0, \end{align*} because each term in the sum is the product of a non-negative number $x$ and a non-negative probability. [/step]

[step:Prove that $X \ge 0$ and $\mathbb{E}[X] = 0$ imply $\mathbb{P}(X = 0) = 1$] Suppose $X \ge 0$ and $\mathbb{E}[X] = 0$. Then $\sum_x x \, \mathbb{P}(X = x) = 0$, where every term satisfies $x \, \mathbb{P}(X = x) \ge 0$ (since $x \ge 0$ and $\mathbb{P}(X = x) \ge 0$). A sum of non-negative terms equals zero if and only if every term is zero. For $x > 0$, the equation $x \, \mathbb{P}(X = x) = 0$ forces $\mathbb{P}(X = x) = 0$ (since $x \ne 0$). Because $X \ge 0$, every non-zero value of $X$ is positive, and therefore \begin{align*} \mathbb{P}(X \ne 0) = \sum_{x \ne 0} \mathbb{P}(X = x) = 0, \end{align*} which gives $\mathbb{P}(X = 0) = 1$. [/step]

[step:Prove linearity: $\mathbb{E}[a + bX] = a + b\,\mathbb{E}[X]$] Let $Y = a + bX$. If $b = 0$, then $Y = a$ is constant and $\mathbb{E}[Y] = a = a + b\,\mathbb{E}[X]$. If $b \ne 0$, the map $x \mapsto a + bx$ is injective, so the values of $Y$ are $\{a + bx : x \in \operatorname{Range}(X)\}$ and $\mathbb{P}(Y = a + bx) = \mathbb{P}(X = x)$. Substituting $y = a + bx$ into the definition of expectation: \begin{align*} \mathbb{E}[a + bX] &= \sum_x (a + bx)\,\mathbb{P}(X = x) \\ &= a \sum_x \mathbb{P}(X = x) + b \sum_x x \, \mathbb{P}(X = x) \\ &= a \cdot 1 + b \, \mathbb{E}[X], \end{align*} where we used $\sum_x \mathbb{P}(X = x) = 1$ (the total probability over the range of $X$). [/step]

[step:Prove additivity: $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$] Expanding the expectation of $X + Y$ as a sum over the joint distribution of $(X, Y)$: \begin{align*} \mathbb{E}[X + Y] &= \sum_{x}\sum_{y} (x + y)\,\mathbb{P}(X = x, Y = y) \\ &= \sum_{x}\sum_{y} x\,\mathbb{P}(X = x, Y = y) + \sum_{x}\sum_{y} y\,\mathbb{P}(X = x, Y = y). \end{align*} In the first double sum, $x$ does not depend on $y$, so we may factor it out of the inner sum: \begin{align*} \sum_{x} x \sum_{y} \mathbb{P}(X = x, Y = y) = \sum_x x \, \mathbb{P}(X = x) = \mathbb{E}[X], \end{align*} where we used the marginal probability $\sum_y \mathbb{P}(X = x, Y = y) = \mathbb{P}(X = x)$. By the same argument with the roles of $x$ and $y$ exchanged, the second double sum equals $\mathbb{E}[Y]$.

[guided] This proof does not require independence: it uses only the marginalisation identity $\sum_y \mathbb{P}(X = x, Y = y) = \mathbb{P}(X = x)$, which holds for any joint distribution. The key step is the double sum over the joint distribution. We write \begin{align*} \mathbb{E}[X + Y] = \sum_x \sum_y (x+y)\,\mathbb{P}(X = x, Y = y). \end{align*} We split $x + y$ into two terms and handle each separately. For the $x$-term: since $x$ is constant in the inner sum over $y$, we pull it out and sum $\mathbb{P}(X = x, Y = y)$ over $y$, which gives the marginal $\mathbb{P}(X = x)$ by the [law of total probability](/theorems/1113). The same reasoning applies symmetrically to the $y$-term. [/guided] [/step]

[step:Show $\mathbb{E}[X]$ minimises $\mathbb{E}[(X - c)^2]$ over $c \in \mathbb{R}$] Let $\mu = \mathbb{E}[X]$. For any $c \in \mathbb{R}$, add and subtract $\mu$: \begin{align*} (X - c)^2 = ((X - \mu) + (\mu - c))^2 = (X - \mu)^2 + 2(X - \mu)(\mu - c) + (\mu - c)^2. \end{align*} Taking expectations and using linearity and additivity (the constants $2(\mu - c)$ and $(\mu - c)^2$ come out of the expectation): \begin{align*} \mathbb{E}[(X - c)^2] &= \mathbb{E}[(X - \mu)^2] + 2(\mu - c)\,\mathbb{E}[X - \mu] + (\mu - c)^2. \end{align*} By linearity, $\mathbb{E}[X - \mu] = \mathbb{E}[X] - \mu = 0$, so the middle term vanishes. This gives \begin{align*} \mathbb{E}[(X - c)^2] = \mathbb{E}[(X - \mu)^2] + (\mu - c)^2. \end{align*} Since $(\mu - c)^2 \ge 0$ with equality if and only if $c = \mu$, the minimum of $\mathbb{E}[(X - c)^2]$ over $c \in \mathbb{R}$ is $\mathbb{E}[(X - \mu)^2]$, attained uniquely at $c = \mu = \mathbb{E}[X]$.

[guided] Why does the "add and subtract $\mu$" trick work? The idea is to decompose the squared error $(X - c)^2$ into a term that depends on $c$ and a term that does not. Writing $X - c = (X - \mu) + (\mu - c)$ and expanding the square produces a cross-term $2(X - \mu)(\mu - c)$. The factor $(\mu - c)$ is a constant and can be pulled out of the expectation, leaving $\mathbb{E}[X - \mu]$. By definition of $\mu = \mathbb{E}[X]$, this expectation is zero, which is precisely the property that makes $\mu$ special. Once the cross-term vanishes, we are left with $\mathbb{E}[(X - \mu)^2] + (\mu - c)^2$, and the second term is a non-negative quantity that vanishes only when $c = \mu$. [/guided] [/step]
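
The identities proved above can be sanity-checked on a small example. The following is a minimal sketch, not part of the proof: the joint pmf `joint`, the helpers `expect` and `mse`, and the constants `a`, `b` are all hypothetical choices made for illustration. The pmf is deliberately chosen so that $X$ and $Y$ are dependent, and exact rational arithmetic is used so that the equalities hold exactly rather than up to rounding.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y), chosen for illustration only.
# X and Y are dependent: P(X=0, Y=1) = 1/4, but P(X=0) * P(Y=1) = 1/8.
joint = {
    (0, 1): F(1, 4),
    (1, 1): F(1, 4),
    (1, 3): F(1, 8),
    (2, 3): F(3, 8),
}
assert sum(joint.values()) == 1  # probabilities sum to one


def expect(g):
    """E[g(X, Y)], computed as a finite sum over the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


ex = expect(lambda x, y: x)  # E[X]
ey = expect(lambda x, y: y)  # E[Y]

# Linearity: E[a + bX] = a + b E[X], including the degenerate case b = 0.
for a, b in [(3, -2), (5, 0)]:
    assert expect(lambda x, y: a + b * x) == a + b * ex

# Additivity: E[X + Y] = E[X] + E[Y], with no independence assumption.
assert expect(lambda x, y: x + y) == ex + ey


def mse(c):
    """Mean squared error E[(X - c)^2] for a constant guess c."""
    return expect(lambda x, y: (x - c) ** 2)


# Decomposition: E[(X - c)^2] = E[(X - mu)^2] + (mu - c)^2 with mu = E[X],
# so mse(c) is strictly larger than mse(mu) whenever c != mu.
for c in [F(0), F(1), F(5, 2)]:
    assert mse(c) == mse(ex) + (ex - c) ** 2
    assert mse(c) > mse(ex)

print("E[X] =", ex, " E[Y] =", ey, " min MSE =", mse(ex))
```

A finite check like this only confirms the identities for one distribution; the general statements still rest on the series manipulations in the proof.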

Prerequisites

Theorems

Law of Total Probability

Definitions & Concepts

random variable (Definition)

expectation (Definition)
