Pointwise Mean and Variance of the Empirical Distribution Function

Pointwise Mean and Variance of the Empirical Distribution Function (Theorem # 6293)

Proof

[proofplan] Fix a point $x\in\mathbb{R}$ and rewrite the empirical distribution function at $x$ as the average of indicator random variables. Each indicator records whether $X_i$ falls in the Borel set $(-\infty,x]$, hence is a Bernoulli [random variable](/page/Random%20Variable) with success probability $F(x)$. Linearity of expectation gives the mean. Independence makes the cross-covariances vanish, so the [variance](/page/Variance) of the sum is the sum of the Bernoulli variances, and dividing the sum by $n$ scales its variance by $1/n^2$. [/proofplan]
[step:Convert the empirical distribution function at a fixed point into Bernoulli variables] Fix $x\in\mathbb{R}$. Define the Borel measurable map $h_x:\mathbb{R}\to\{0,1\}$ by $h_x(t)=\mathbb{1}_{(-\infty,x]}(t)$ for every $t\in\mathbb{R}$. For each $i\in\{1,\dots,n\}$, define the random variable $Y_i:\Omega\to\{0,1\}$ by \begin{align*} Y_i=h_x\circ X_i=\mathbb{1}_{\{X_i\le x\}}. \end{align*} Since $(-\infty,x]\in\mathcal{B}(\mathbb{R})$, each $Y_i$ is $\mathcal{F}$-measurable. Also \begin{align*} \mathbb{P}(Y_i=1)=\mathbb{P}(X_i\le x)=F(x), \end{align*} because the $X_i$ have common distribution function $F$. Thus $Y_1,\dots,Y_n$ are identically distributed Bernoulli random variables with success probability $F(x)$. Since $Y_i=h_x\circ X_i$ and $X_1,\dots,X_n$ are independent, and measurable functions of independent random variables are independent, the random variables $Y_1,\dots,Y_n$ are independent. Finally, \begin{align*} F_n(x)=\frac{1}{n}\sum_{i=1}^n Y_i. \end{align*}
[guided] Fix $x\in\mathbb{R}$. Once $x$ is fixed, $F_n(x)$ is no longer a function of $x$; it is a single random variable on $\Omega$. To isolate its elementary structure, define the map $h_x:\mathbb{R}\to\{0,1\}$ by $h_x(t)=\mathbb{1}_{(-\infty,x]}(t)$ for every $t\in\mathbb{R}$. The set $(-\infty,x]$ is Borel, so $h_x$ is measurable as a map from $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ to $\{0,1\}$ equipped with the discrete $\sigma$-algebra. 
For each $i\in\{1,\dots,n\}$, define the random variable $Y_i:\Omega\to\{0,1\}$ by $Y_i(\omega)=h_x(X_i(\omega))$ for every $\omega\in\Omega$. Equivalently, \begin{align*} Y_i=\mathbb{1}_{\{X_i\le x\}}. \end{align*} Because $X_i$ is measurable and $h_x$ is measurable, the composition $Y_i=h_x\circ X_i$ is a random variable. Now compute its success probability. The event $\{Y_i=1\}$ is exactly the event $\{X_i\le x\}$, so \begin{align*} \mathbb{P}(Y_i=1)=\mathbb{P}(X_i\le x)=F(x). \end{align*} Thus each $Y_i$ is a Bernoulli random variable with success probability $F(x)$. Independence of $X_1,\dots,X_n$ is preserved when the same measurable map $h_x$ is applied to each variable separately, so $Y_1,\dots,Y_n$ are independent. With this notation, the empirical distribution function at the fixed point $x$ becomes \begin{align*} F_n(x)=\frac{1}{n}\sum_{i=1}^n \mathbb{1}_{\{X_i\le x\}}=\frac{1}{n}\sum_{i=1}^n Y_i. \end{align*} This reduction is the whole probabilistic content of the theorem: after fixing $x$, the problem concerns only the average of i.i.d. Bernoulli random variables. [/guided] [/step]
[step:Compute the pointwise expectation by linearity] Since each $Y_i$ is bounded, all expectations below are finite. For every $i\in\{1,\dots,n\}$, \begin{align*} \mathbb{E}[Y_i]=1\cdot\mathbb{P}(Y_i=1)+0\cdot\mathbb{P}(Y_i=0)=F(x). \end{align*} Substituting the representation $F_n(x)=\frac{1}{n}\sum_{i=1}^n Y_i$, we first have \begin{align*} \mathbb{E}[F_n(x)]=\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^n Y_i\right]. \end{align*} Linearity of expectation for the finite sum, together with the scalar factor $1/n$, gives \begin{align*} \mathbb{E}[F_n(x)]=\frac{1}{n}\sum_{i=1}^n \mathbb{E}[Y_i]. \end{align*} Since $\mathbb{E}[Y_i]=F(x)$ for every $i\in\{1,\dots,n\}$, \begin{align*} \mathbb{E}[F_n(x)]=\frac{1}{n}\sum_{i=1}^n F(x)=F(x). \end{align*} [/step]
[step:Compute the Bernoulli variance at the fixed point] For every $i\in\{1,\dots,n\}$, the identity $Y_i^2=Y_i$ holds pointwise because $Y_i$ takes only the values $0$ and $1$. 
The [variance](/page/Variance) of $Y_i$ satisfies \begin{align*} \operatorname{Var}(Y_i)=\mathbb{E}[Y_i^2]-(\mathbb{E}[Y_i])^2. \end{align*} Using $Y_i^2=Y_i$ and $\mathbb{E}[Y_i]=F(x)$, this becomes \begin{align*} \operatorname{Var}(Y_i)=\mathbb{E}[Y_i]-(F(x))^2=F(x)-(F(x))^2=F(x)(1-F(x)). \end{align*} [/step]
[step:Use independence to compute the variance of the average] For square-integrable real-valued random variables $A:\Omega\to\mathbb{R}$ and $B:\Omega\to\mathbb{R}$, define their covariance by \begin{align*} \operatorname{Cov}(A,B)=\mathbb{E}[AB]-\mathbb{E}[A]\mathbb{E}[B]. \end{align*} This definition applies to the variables $Y_i$ because they are bounded, and $\operatorname{Cov}(A,A)=\operatorname{Var}(A)$. For distinct indices $i,j\in\{1,\dots,n\}$, independence of $Y_i$ and $Y_j$ gives \begin{align*} \mathbb{E}[Y_iY_j]=\mathbb{E}[Y_i]\mathbb{E}[Y_j], \end{align*} so \begin{align*} \operatorname{Cov}(Y_i,Y_j) =\mathbb{E}[Y_iY_j]-\mathbb{E}[Y_i]\mathbb{E}[Y_j] =0. \end{align*} We now derive the finite-sum [variance](/page/Variance) identity from the bilinearity of covariance. Since the $Y_i$ are bounded, all second moments are finite, and \begin{align*} \operatorname{Var}\left(\sum_{i=1}^n Y_i\right)=\operatorname{Cov}\left(\sum_{i=1}^n Y_i,\sum_{j=1}^n Y_j\right). \end{align*} Bilinearity of covariance for finite sums gives \begin{align*} \operatorname{Var}\left(\sum_{i=1}^n Y_i\right)=\sum_{i=1}^n\sum_{j=1}^n \operatorname{Cov}(Y_i,Y_j). \end{align*} Splitting the diagonal terms from the off-diagonal terms and using $\operatorname{Cov}(Y_i,Y_i)=\operatorname{Var}(Y_i)$ together with $\operatorname{Cov}(Y_i,Y_j)=0$ for $i\ne j$ yields \begin{align*} \operatorname{Var}\left(\sum_{i=1}^n Y_i\right)=\sum_{i=1}^n \operatorname{Var}(Y_i). \end{align*} Therefore, using the scaling rule $\operatorname{Var}(cZ)=c^2\operatorname{Var}(Z)$ with $c=1/n$ and $\operatorname{Var}(Y_i)=F(x)(1-F(x))$, \begin{align*} \operatorname{Var}(F_n(x))=\operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^n Y_i\right)=\frac{1}{n^2}\operatorname{Var}\left(\sum_{i=1}^n Y_i\right)=\frac{1}{n^2}\sum_{i=1}^n F(x)(1-F(x))=\frac{F(x)(1-F(x))}{n}. \end{align*} Since $x\in\mathbb{R}$ was arbitrary, both formulas hold for every $x\in\mathbb{R}$. [/step]
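
The two formulas can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the proof. It assumes, purely for illustration, that the $X_i$ are i.i.d. Uniform$(0,1)$, so that $F(x)=x$ for $0\le x\le 1$. The helper names `ecdf_at` and `simulate_ecdf_moments`, the point $x=0.3$, the sample size $n=50$, and the number of replications are all hypothetical choices.

```python
import random
import statistics


def ecdf_at(sample, x):
    """Return F_n(x): the fraction of sample points that are <= x."""
    return sum(1 for value in sample if value <= x) / len(sample)


def simulate_ecdf_moments(x, n, replications, seed=0):
    """Estimate the mean and variance of F_n(x) by Monte Carlo.

    Assumption for illustration: X_1, ..., X_n are i.i.d. Uniform(0, 1),
    so F(x) = x for 0 <= x <= 1.
    """
    rng = random.Random(seed)
    # Each replication draws a fresh sample of size n and evaluates F_n at x.
    values = [
        ecdf_at([rng.random() for _ in range(n)], x)
        for _ in range(replications)
    ]
    return statistics.fmean(values), statistics.pvariance(values)


x, n = 0.3, 50
mean_hat, var_hat = simulate_ecdf_moments(x, n, replications=20_000)
F_x = x  # distribution function of Uniform(0, 1) evaluated at x
print(f"mean of F_n(x): {mean_hat:.4f}   theory F(x):           {F_x:.4f}")
print(f"var  of F_n(x): {var_hat:.6f}   theory F(x)(1-F(x))/n: {F_x * (1 - F_x) / n:.6f}")
```

With these choices the theoretical values are $F(x)=0.3$ and $F(x)(1-F(x))/n=0.0042$; the simulated estimates should be close to them, with agreement improving as the number of replications grows.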
