To define a distribution, it is tempting to list the probabilities of the values a [random variable](/page/Random%20Variable) can take. That works for a die, a binomial count, or any random variable supported on a [countable set](/page/Countable%20Set). It fails as soon as the random variable is continuous, because then every individual value has probability zero. It also fails for mixed random variables, such as a waiting time that is sometimes exactly zero and otherwise continuously distributed. The cumulative distribution function solves this by asking a question that makes sense in every case: how much probability lies at or to the left of a point?
[example: Why Point Probabilities Do Not Describe a Continuous Law]
Let $X\sim\operatorname{Unif}(0,1)$, so probabilities inside $(0,1)$ are computed by length. For any $x\in\mathbb R$, the singleton $\{x\}$ has length $0$, hence
\begin{align*}
\mathbb P(X=x)=0.
\end{align*}
Thus the point-probability list is identically zero, even though the random variable is not degenerate.
The interval probabilities still determine where the mass lies. Write $\mathcal L^1$ for one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure), that is, ordinary length on the real line. If $0\le a\le b\le1$, then
\begin{align*}
\mathbb P(a<X\le b)=\mathcal L^1((a,b])=b-a.
\end{align*}
The CDF of $X$ is
\begin{align*}
F_X(t)=t\qquad\text{for }0\le t\le1,
\end{align*}
so
\begin{align*}
F_X(b)-F_X(a)=b-a.
\end{align*}
Therefore
\begin{align*}
\mathbb P(a<X\le b)=F_X(b)-F_X(a).
\end{align*}
The CDF records the interval probabilities that point probabilities miss, so left-sided events can describe a continuous distribution even when every individual point has probability zero.
[/example]
This example is the basic reason cumulative distribution functions are central in probability and statistics. They give a common language for discrete, continuous, singular, and mixed laws. They also translate questions about random variables into questions about monotone functions on the real line, which is often the most tractable form of a one-dimensional distribution.
## Definition
The event $\{X \le x\}$ is measurable whenever $X$ is a real-valued random variable, so its probability is defined for every real threshold $x$. Varying the threshold lets us scan the whole law from left to right.
[definition: Cumulative Distribution Function]
Let $(\Omega, \mathcal F, \mathbb P)$ be a [probability space](/page/Probability%20Space), and let $X: (\Omega,\mathcal F) \to (\mathbb R, \mathcal B(\mathbb R))$ be a real-valued random variable, where $\mathcal B(\mathbb R)$ denotes the Borel $\sigma$-algebra on $\mathbb R$. The cumulative distribution function of $X$ is the function
\begin{align*}
F_X:\mathbb R \to [0,1]
\end{align*}
given by
\begin{align*}
F_X(x)=\mathbb P(X\le x).
\end{align*}
[/definition]
This definition deliberately does not assume that $X$ has a density or takes countably many values. The CDF is attached to the law $\mu_X=\mathbb P\circ X^{-1}$, the pushforward distribution of $\mathbb P$ under $X$, rather than to a particular formula for $X$. In concrete terms, $\mu_X(A)=\mathbb P(X\in A)$ for each Borel set $A\in\mathcal B(\mathbb R)$.
Sometimes two different random variables have the same law, or a model is specified directly as a probability measure on the real line. To avoid tying the concept to a particular sample space, we also need the measure-level version of the same object.
[definition: Distribution Function of a Probability Measure]
Let $\mu$ be a probability measure on $(\mathbb R,\mathcal B(\mathbb R))$. The distribution function of $\mu$ is the function
\begin{align*}
F_\mu:\mathbb R \to [0,1]
\end{align*}
given by
\begin{align*}
F_\mu(x)=\mu((-\infty,x]).
\end{align*}
[/definition]
If $\mu=\mu_X$, then $F_\mu=F_X$. This small shift of viewpoint matters because many statistical models specify the distribution first and construct a random variable later.
A first calculation should include both jumps and flat parts, because those are the visual signals of atoms and intervals with no mass.
[example: CDF of a Bernoulli Random Variable]
Let $X\sim\operatorname{Ber}(p)$ with $p\in[0,1]$, so the only possible values are $0$ and $1$, with $\mathbb P(X=0)=1-p$ and $\mathbb P(X=1)=p$. We compute $F_X(x)=\mathbb P(X\le x)$ by considering where the threshold $x$ lies relative to $0$ and $1$.
If $x<0$, then neither possible value of $X$ is at most $x$, so $\{X\le x\}=\varnothing$ and
\begin{align*}
F_X(x)=\mathbb P(\varnothing)=0.
\end{align*}
If $0\le x<1$, then $X\le x$ happens exactly when $X=0$, because $0\le x$ and $1>x$. Hence
\begin{align*}
F_X(x)=\mathbb P(X=0)=1-p.
\end{align*}
If $x\ge1$, then both possible values satisfy $X\le x$, so $\{X\le x\}=\Omega$ and
\begin{align*}
F_X(x)=\mathbb P(\Omega)=1.
\end{align*}
Therefore $F_X(x)=0$ for $x<0$, $F_X(x)=1-p$ for $0\le x<1$, and $F_X(x)=1$ for $x\ge1$.
At $0$, the left limit is $F_X(0-)=0$ and the value is $F_X(0)=1-p$, so the jump size is
\begin{align*}
F_X(0)-F_X(0-)=1-p-0=1-p.
\end{align*}
At $1$, the left limit is $F_X(1-)=1-p$ and the value is $F_X(1)=1$, so the jump size is
\begin{align*}
F_X(1)-F_X(1-)=1-(1-p)=p.
\end{align*}
On the intervals $(-\infty,0)$, $(0,1)$, and $(1,\infty)$ the CDF is constant, reflecting that a Bernoulli random variable puts probability mass only at the two points $0$ and $1$.
[/example]
The next example shows the opposite behaviour: no jumps, but a nonzero rate of accumulation across an interval.
[example: CDF of a Uniform Random Variable]
Let $X\sim\operatorname{Unif}(a,b)$ with $a<b$, so probabilities inside $[a,b]$ are normalized lengths: for $a\le u\le v\le b$,
\begin{align*}
\mathbb P(u<X\le v)=\frac{v-u}{b-a}.
\end{align*}
We compute $F_X(x)=\mathbb P(X\le x)$ by locating the threshold $x$ relative to the support interval $[a,b]$.
If $x<a$, then $X\le x$ is impossible because $X$ only takes values in $[a,b]$, so
\begin{align*}
F_X(x)=\mathbb P(X\le x)=0.
\end{align*}
If $a\le x\le b$, then $\{X\le x\}$ is the event that $X$ lies in $[a,x]$, and the endpoint $a$ has probability $0$ for a uniform law on an interval. Hence
\begin{align*}
F_X(x)=\mathbb P(a<X\le x)=\frac{x-a}{b-a}.
\end{align*}
If $x>b$, then $X\le x$ is certain, so
\begin{align*}
F_X(x)=\mathbb P(X\le x)=1.
\end{align*}
Thus
\begin{align*}
F_X(x)=0\text{ for }x<a,\quad F_X(x)=\frac{x-a}{b-a}\text{ for }a\le x\le b,\quad F_X(x)=1\text{ for }x>b.
\end{align*}
For $a\le s<t\le b$, both $s$ and $t$ fall in the middle case, so
\begin{align*}
F_X(t)-F_X(s)=\frac{t-a}{b-a}-\frac{s-a}{b-a}.
\end{align*}
Putting the two fractions over the same denominator gives
\begin{align*}
F_X(t)-F_X(s)=\frac{(t-a)-(s-a)}{b-a}.
\end{align*}
Expanding the numerator gives
\begin{align*}
(t-a)-(s-a)=t-a-s+a=t-s.
\end{align*}
Therefore
\begin{align*}
F_X(t)-F_X(s)=\frac{t-s}{b-a}.
\end{align*}
The same normalized-length rule gives
\begin{align*}
\mathbb P(s<X\le t)=\frac{t-s}{b-a}.
\end{align*}
Hence
\begin{align*}
\mathbb P(s<X\le t)=F_X(t)-F_X(s).
\end{align*}
For a uniform law, CDF increments over subintervals exactly record the relative lengths of those subintervals.
[/example]
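The two piecewise formulas derived so far can be checked numerically. The following is a minimal sketch in Python; the function names `bernoulli_cdf` and `uniform_cdf` and the chosen parameter values are illustrative assumptions, not notation used elsewhere on this page.

```python
def bernoulli_cdf(x: float, p: float) -> float:
    """CDF of Ber(p): 0 below 0, 1 - p on [0, 1), and 1 from 1 onward."""
    if x < 0:
        return 0.0
    if x < 1:
        return 1.0 - p
    return 1.0


def uniform_cdf(x: float, a: float, b: float) -> float:
    """CDF of Unif(a, b), assuming a < b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


# The increment over (s, t] inside [a, b] should equal (t - s) / (b - a).
a, b, s, t = 2.0, 6.0, 3.0, 5.0
increment = uniform_cdf(t, a, b) - uniform_cdf(s, a, b)
print(increment, (t - s) / (b - a))
```

The Bernoulli function is flat between its two jumps, while the uniform function has no jumps and a constant slope on $[a,b]$, which is the contrast the two examples are meant to show.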
## Shape of a Distribution Function
### Monotonicity and Right-Continuity
A CDF is not an arbitrary function from $\mathbb R$ to $[0,1]$. Since the events $\{X\le x\}$ grow as $x$ increases, the function must be increasing. Since probabilities are continuous under monotone limits of events, the function must have specific one-sided continuity and limiting behaviour. These analytic properties are the bridge between probability and real analysis.
[quotetheorem:9459]
Right-continuity is not a decorative condition. It fixes the value at a jump to include the mass at the endpoint, matching the convention $\mathbb P(X\le x)$ rather than $\mathbb P(X<x)$.
The previous theorem gives necessary conditions. For CDFs to be a usable language for probability laws, we also need to know that these conditions are sufficient: whenever an increasing right-[continuous function](/page/Continuous%20Function) has the correct endpoint limits, it is not pretending to be a distribution function; it really comes from a unique probability measure.
[quotetheorem:4986]
This theorem says that the graph of a CDF is not merely a summary of a law; it is the law. The uniqueness part is what makes CDF convergence and quantile transformations so powerful.
### Jumps and Atoms
To read atoms from a CDF, the value at $x$ must be compared with the mass already accumulated before reaching $x$. That requires a precise notation for approaching $x$ from the left.
[definition: Left Limit of a Distribution Function]
Let $F: \mathbb R\to[0,1]$ be increasing. The left-limit operation associated to $F$ is the function
\begin{align*}
F_-:\mathbb R\to[0,1]
\end{align*}
defined by
\begin{align*}
F_-(x)=\lim_{y\uparrow x} F(y).
\end{align*}
We write $F(x-)$ for $F_-(x)$.
[/definition]
The left limit measures the probability strictly below $x$ when $F$ is a CDF. The question is how much probability is added exactly at $x$, rather than before $x$. That added mass is the vertical jump of the CDF:
\begin{align*}
\mathbb P(X=x)=F_X(x)-F_X(x-).
\end{align*}
This jump formula lets us read point masses directly from the graph, so the following example uses it as a computational tool rather than as a regularity statement. Geometrically, a step in the graph is visible probability mass, not a defect in regularity.
[example: Reading Atoms from a Step Function]
Suppose $F(x)=0$ for $x<-1$, $F(x)=1/4$ for $-1\le x<0$, $F(x)=3/4$ for $0\le x<2$, and $F(x)=1$ for $x\ge2$. If $F$ is the CDF of $X$, then the mass at a point is the jump of the CDF there:
\begin{align*}
\mathbb P(X=x)=F(x)-F(x-).
\end{align*}
At $-1$, values just to the left of $-1$ fall in the region $x<-1$, so $F(-1-)=0$, while $F(-1)=1/4$. Hence
\begin{align*}
\mathbb P(X=-1)=F(-1)-F(-1-)=\frac14-0=\frac14.
\end{align*}
At $0$, values just to the left of $0$ fall in the region $-1\le x<0$, so $F(0-)=1/4$, while $F(0)=3/4$. Therefore
\begin{align*}
\mathbb P(X=0)=F(0)-F(0-)=\frac34-\frac14=\frac24=\frac12.
\end{align*}
At $2$, values just to the left of $2$ fall in the region $0\le x<2$, so $F(2-)=3/4$, while $F(2)=1$. Thus
\begin{align*}
\mathbb P(X=2)=F(2)-F(2-)=1-\frac34=\frac14.
\end{align*}
On each open interval $(-\infty,-1)$, $(-1,0)$, $(0,2)$, and $(2,\infty)$, the function $F$ is constant, so for any $a<b$ inside one of these intervals,
\begin{align*}
\mathbb P(a<X\le b)=F(b)-F(a)=0.
\end{align*}
The flat pieces therefore mean that no mass lies in those open intervals; all the mass is concentrated at the jump points $-1$, $0$, and $2$.
[/example]
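The jump formula used in this example can be sketched in code. The snippet below is an illustration only: `left_limit` approximates $F(x-)$ by evaluating slightly to the left of $x$, which is exact for this piecewise constant $F$ but only an approximation for a general CDF. The helper names are assumptions made for the sketch.

```python
def F(x: float) -> float:
    """Step CDF from the example, with jumps at -1, 0 and 2."""
    if x < -1:
        return 0.0
    if x < 0:
        return 0.25
    if x < 2:
        return 0.75
    return 1.0


def left_limit(cdf, x: float, eps: float = 1e-9) -> float:
    """Numerical stand-in for F(x-): evaluate just to the left of x.

    Exact here because F is piecewise constant and eps is far smaller
    than the gaps between jump points.
    """
    return cdf(x - eps)


def atom(cdf, x: float) -> float:
    """Point mass P(X = x) = F(x) - F(x-)."""
    return cdf(x) - left_limit(cdf, x)


# The point 1.0 sits on a flat piece, so it should carry no mass.
masses = {x: atom(F, x) for x in (-1.0, 0.0, 2.0, 1.0)}
print(masses)
```

The three jump points return the masses computed by hand, and they sum to $1$, so no probability is left for the flat pieces.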
## Probabilities of Intervals
### Endpoint Conventions
The CDF is useful because it lets us compute probabilities of intervals without returning to the underlying sample space. This is the main operational reason it appears so early in probability courses.
For [real numbers](/page/Real%20Numbers) $a<b$, the CDF gives the basic interval probabilities by subtracting endpoint values, with a left limit replacing the CDF value whenever the left endpoint is included or the right endpoint is excluded. The half-open interval with open left endpoint satisfies
\begin{align*}
\mathbb P(a<X\le b)=F_X(b)-F_X(a).
\end{align*}
If the left endpoint is included, replace $F_X(a)$ by the left limit at $a$:
\begin{align*}
\mathbb P(a\le X\le b)=F_X(b)-F_X(a-).
\end{align*}
If the right endpoint is excluded, replace $F_X(b)$ by the left limit at $b$:
\begin{align*}
\mathbb P(a<X<b)=F_X(b-)-F_X(a).
\end{align*}
Combining both endpoint changes gives
\begin{align*}
\mathbb P(a\le X<b)=F_X(b-)-F_X(a-).
\end{align*}
Here $F_X(t-)=\lim_{y\uparrow t}F_X(y)$ is the left limit. The open and closed endpoints matter only when the distribution has atoms at the endpoints. For continuous distributions the four formulas collapse to the same value.
Limit theorems and interval calculations often need to separate points where no mass is sitting from points where a jump occurs. This motivates naming the points where the CDF has no jump.
[definition: Continuity Point of a Distribution Function]
Let $F: \mathbb R\to[0,1]$ be a distribution function. A point $x\in\mathbb R$ is a continuity point of $F$ if
\begin{align*}
\lim_{y\to x}F(y)=F(x).
\end{align*}
[/definition]
At a continuity point the law has no atom. This is why many limit theorems are stated only at continuity points of the limiting CDF.
[example: Endpoint Conventions Matter Only at Atoms]
Let $X\sim\operatorname{Ber}(p)$, so $X$ takes only the values $0$ and $1$, with $\mathbb P(X=0)=1-p$ and $\mathbb P(X=1)=p$. The event $0<X\le 1$ occurs exactly when $X=1$, because the value $0$ is excluded by the strict lower endpoint and the value $1$ is included by the upper endpoint. Hence
\begin{align*}
\{0<X\le 1\}=\{X=1\}.
\end{align*}
Therefore
\begin{align*}
\mathbb P(0<X\le 1)=\mathbb P(X=1)=p.
\end{align*}
For the closed interval, both possible values of $X$ are included:
\begin{align*}
\{0\le X\le 1\}=\{X=0\}\cup\{X=1\}.
\end{align*}
The two events $\{X=0\}$ and $\{X=1\}$ are disjoint, so finite additivity gives
\begin{align*}
\mathbb P(0\le X\le 1)=\mathbb P(X=0)+\mathbb P(X=1).
\end{align*}
Substituting the Bernoulli probabilities gives
\begin{align*}
\mathbb P(0\le X\le 1)=(1-p)+p=1.
\end{align*}
The difference between the two endpoint conventions is therefore
\begin{align*}
\mathbb P(0\le X\le 1)-\mathbb P(0<X\le 1)=1-p.
\end{align*}
This is exactly the mass at the excluded endpoint $0$. In CDF notation, $F_X(0)=\mathbb P(X\le 0)=\mathbb P(X=0)=1-p$, while $F_X(0-)=0$, so
\begin{align*}
F_X(0)-F_X(0-)=(1-p)-0=1-p.
\end{align*}
Thus open and closed endpoint formulas differ here only because the Bernoulli law has an atom at $0$.
[/example]
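The four endpoint formulas can be compared side by side for a law with atoms. The sketch below assumes a Bernoulli law with the illustrative value $p=1/3$ and uses exact rational arithmetic; the names `cdf`, `cdf_left`, and `interval_probs` are hypothetical helpers.

```python
from fractions import Fraction

# Illustrative assumption: Ber(p) with p = 1/3, stored as a table of atoms.
p = Fraction(1, 3)
pmf = {0: 1 - p, 1: p}


def cdf(x):
    """F(x) = P(X <= x)."""
    return sum((m for s, m in pmf.items() if s <= x), Fraction(0))


def cdf_left(x):
    """F(x-) = P(X < x)."""
    return sum((m for s, m in pmf.items() if s < x), Fraction(0))


def interval_probs(a, b):
    """The four interval probabilities, keyed by endpoint convention."""
    return {
        "(a, b]": cdf(b) - cdf(a),
        "[a, b]": cdf(b) - cdf_left(a),
        "(a, b)": cdf_left(b) - cdf(a),
        "[a, b)": cdf_left(b) - cdf_left(a),
    }


probs = interval_probs(0, 1)
for convention, value in probs.items():
    print(convention, value)
```

With $a=0$ and $b=1$ both endpoints are atoms, so all four values differ. Choosing endpoints that are not atoms, for example $a=1/4$ and $b=3/4$, makes the four values coincide.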
### Determining the Whole Law
Interval probabilities are already a large amount of information, but a probability law assigns mass to every Borel set. The next result explains why knowing the CDF is enough: the left half-lines $(-\infty,x]$ are closed under finite intersection and generate the Borel $\sigma$-algebra, so two probability measures that agree on all left half-lines agree on all Borel events.
[quotetheorem:9460]
This is the formal justification for treating the CDF as the distribution. In statistics, estimating the CDF means estimating all probabilities of Borel events, not just a convenient graph.
## Discrete, Continuous, and Mixed Laws
### Discrete Accumulation
The same CDF can represent very different kinds of accumulation. Sometimes mass is concentrated in jumps. Sometimes mass is spread with a density. Sometimes both happen in the same distribution. The CDF is valuable because it does not force the modeller to choose a category too early.
A discrete law is often introduced through probabilities at individual points. To connect that description with the cumulative viewpoint, we first isolate the function that records the mass at each support point.
[definition: Probability Mass Function]
Let $X$ be a real-valued random variable whose range is contained in a countable set $S\subseteq\mathbb R$. The probability mass function of $X$ is the function $p_X: S \to [0,1]$ defined by
\begin{align*}
p_X(s)=\mathbb P(X=s).
\end{align*}
[/definition]
Once the point masses are known, the CDF is obtained by accumulating every mass at or below the threshold. This turns a local list of probabilities into a global distribution function.
[quotetheorem:9461]
This formula says that a discrete CDF is a running total of point masses. If the support is finite, the sum has only finitely many terms. If the support is countably infinite, the same expression is understood as the countable sum over all support points $s\le x$; because the terms are nonnegative, this sum is well defined and does not depend on the order of summation. The formula is useful because it converts a table of atomic probabilities into the CDF without needing intervals or integrals. The next case asks for the analogous local object when mass is not concentrated at points but spread continuously along the real line.
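The running-total description of a discrete CDF can be sketched for a finite support. The fair six-sided die used here is an illustrative assumption, as are the variable names.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

# Illustrative assumption: PMF of a fair six-sided die, support in increasing order.
support = [1, 2, 3, 4, 5, 6]
masses = [Fraction(1, 6)] * 6

# running_totals[k] is the total mass of the first k + 1 support points.
running_totals = list(accumulate(masses))


def cdf(x):
    """F(x) = sum of p(s) over support points s <= x."""
    k = bisect_right(support, x)  # number of support points <= x
    return running_totals[k - 1] if k > 0 else Fraction(0)


for x in (0, 1, 3.5, 6, 10):
    print(x, cdf(x))
```

Precomputing the running totals once means each evaluation needs only a binary search over the sorted support.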
### Densities and Absolute Continuity
For a continuously spread law, point masses do not describe the distribution. The more useful local object is a density, whose integral over a set gives the probability of landing in that set. As before, $\mathcal L^1$ denotes one-dimensional Lebesgue measure on the real line, i.e. length.
[definition: Probability Density Function]
Let $X$ be a real-valued random variable with law $\mu_X$. A probability density function for $X$ is a [measurable function](/page/Measurable%20Function) $f_X: \mathbb R\to[0,\infty)$ such that
\begin{align*}
\mathbb P(X\in A)=\int_A f_X(x)\,d\mathcal L^1(x)
\end{align*}
for every $A\in\mathcal B(\mathbb R)$.
[/definition]
A density becomes practically useful only when it can recover the cumulative probabilities that define the distribution. The point is that knowing infinitesimal mass alone is not yet a distributional description: to compare with the CDF framework, one must know how the density accumulates over every left ray $(-\infty,x]$. The formal statement records exactly when integrating the local object $f_X$ recovers the global object $F_X$.
[quotetheorem:9462]
This relation should not be read backward without care. Many CDFs have no density at all, and many combine a part that accumulates through a density with atoms.
[example: A Mixed Distribution]
Let $B\sim\operatorname{Ber}(p)$ and let $Y\sim\operatorname{Exp}(\lambda)$, with $p\in(0,1)$ and $\lambda>0$. Define $X$ by setting $X=0$ on the event $\{B=1\}$ and $X=Y$ on the event $\{B=0\}$, with $B$ and $Y$ independent.
If $x<0$, then $X\le x$ is impossible: on $\{B=1\}$ we have $X=0>x$, and on $\{B=0\}$ we have $X=Y\ge0>x$ for an exponential random variable. Hence
\begin{align*}
F_X(x)=\mathbb P(X\le x)=0.
\end{align*}
Now let $x\ge0$. On $\{B=1\}$, we have $X=0\le x$. On $\{B=0\}$, the condition $X\le x$ is the condition $Y\le x$. Therefore
\begin{align*}
\{X\le x\}=\{B=1\}\cup(\{B=0\}\cap\{Y\le x\}).
\end{align*}
The two events on the right are disjoint, so finite additivity gives
\begin{align*}
\mathbb P(X\le x)=\mathbb P(B=1)+\mathbb P(B=0,\ Y\le x).
\end{align*}
Using independence of $B$ and $Y$,
\begin{align*}
\mathbb P(B=0,\ Y\le x)=\mathbb P(B=0)\mathbb P(Y\le x).
\end{align*}
Since $\mathbb P(B=1)=p$, $\mathbb P(B=0)=1-p$, and $Y\sim\operatorname{Exp}(\lambda)$ has $\mathbb P(Y\le x)=1-e^{-\lambda x}$ for $x\ge0$, we get
\begin{align*}
F_X(x)=p+(1-p)(1-e^{-\lambda x}).
\end{align*}
At $0$, this formula gives
\begin{align*}
F_X(0)=p+(1-p)(1-e^0)=p+(1-p)(1-1)=p.
\end{align*}
For every $y<0$, $F_X(y)=0$, so
\begin{align*}
F_X(0-)=0.
\end{align*}
Thus the jump at $0$ has size
\begin{align*}
F_X(0)-F_X(0-)=p-0=p.
\end{align*}
For $x>0$, the term $(1-p)(1-e^{-\lambda x})$ is the continuously accumulating exponential part, while the jump $p$ records the atom at $0$.
[/example]
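The mixed CDF can be compared with a simulation of the construction in the example. This is a sketch under stated assumptions: the parameter values, the seed, and the names `mixed_cdf` and `sample_mixed` are chosen for illustration, and the agreement is only approximate because it relies on a finite random sample.

```python
import math
import random


def mixed_cdf(x: float, p: float, lam: float) -> float:
    """F_X(x) = 0 for x < 0 and p + (1 - p)(1 - exp(-lam * x)) for x >= 0."""
    if x < 0:
        return 0.0
    return p + (1.0 - p) * (1.0 - math.exp(-lam * x))


def sample_mixed(p: float, lam: float, rng: random.Random) -> float:
    """X = 0 with probability p, otherwise an independent Exp(lam) draw."""
    if rng.random() < p:
        return 0.0
    return rng.expovariate(lam)  # the argument of expovariate is the rate


rng = random.Random(0)
p, lam, x0, n = 0.3, 2.0, 0.5, 100_000
draws = [sample_mixed(p, lam, rng) for _ in range(n)]

frac_zero = sum(d == 0.0 for d in draws) / n   # estimates the atom at 0
frac_below = sum(d <= x0 for d in draws) / n   # estimates F_X(x0)
print(frac_zero, p)
print(frac_below, mixed_cdf(x0, p, lam))
```

The fraction of exact zeros estimates the jump at $0$, while the fraction of draws at or below `x0` estimates the full CDF value, atom included.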
### Singular Behaviour
A continuous CDF need not come from a density. To describe this missing third possibility, we name distributions whose mass is carried by a Lebesgue null set even though no point receives positive mass.
[definition: Singular Distribution Function]
A map $F:\mathbb R\to[0,1]$ is a singular distribution function if $F$ is a continuous distribution function and there exists a Borel set $A\subsetneq\mathbb R$ with $\mathcal L^1(A)=0$ such that the associated probability measure $\mu$ satisfies
\begin{align*}
\mu(A)=1.
\end{align*}
[/definition]
Singular distribution functions are rare in elementary modelling but important in measure theory. They show that the dichotomy "discrete or density" is incomplete.
[example: The Cantor Distribution]
Let $C_0=[0,1]$. For $n\ge1$, obtain $C_n$ from $C_{n-1}$ by removing the open middle third of each remaining closed interval, and set
\begin{align*}
C=\bigcap_{n=0}^{\infty}C_n.
\end{align*}
At stage $n$, there are $2^n$ remaining intervals, each of length $3^{-n}$, so
\begin{align*}
\mathcal L^1(C_n)=2^n3^{-n}=\left(\frac23\right)^n.
\end{align*}
Since $C\subseteq C_n$ for every $n$, monotonicity of Lebesgue measure gives
\begin{align*}
0\le \mathcal L^1(C)\le \left(\frac23\right)^n.
\end{align*}
Letting $n\to\infty$ yields
\begin{align*}
\mathcal L^1(C)=0.
\end{align*}
The Cantor distribution function $F$ is defined by assigning to a point of $C$ with ternary expansion $x=0.a_1a_2a_3\cdots_3$, where each $a_i\in\{0,2\}$, the binary value
\begin{align*}
F(x)=0.\frac{a_1}{2}\frac{a_2}{2}\frac{a_3}{2}\cdots_2.
\end{align*}
On each open interval removed during the construction of $C$, the function is extended constantly from the common endpoint values. This gives an increasing continuous distribution function with
\begin{align*}
F(0)=0
\end{align*}
and
\begin{align*}
F(1)=1.
\end{align*}
Let $\mu$ be the probability measure whose distribution function is $F$. If $(u,v)$ is one of the removed middle-third intervals, then $F$ is constant on $(u,v)$ and has no jump at $u$ or $v$ because $F$ is continuous. Therefore the mass of that interval is
\begin{align*}
\mu((u,v))=F(v-)-F(u)=F(v)-F(u)=0.
\end{align*}
The complement $[0,1]\setminus C$ is the countable union of all removed open intervals, so countable additivity gives
\begin{align*}
\mu([0,1]\setminus C)=0.
\end{align*}
Since $F(0)=0$ and $F(1)=1$, all the mass lies in $[0,1]$, hence
\begin{align*}
\mu(C)=\mu([0,1])-\mu([0,1]\setminus C)=1-0=1.
\end{align*}
Thus the Cantor distribution is continuous and has no point masses, but all of its probability mass is carried by the Lebesgue-null set $C$.
[/example]
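The ternary-to-binary rule defining the Cantor distribution function can be turned into a short digit-by-digit computation. The sketch below is one possible way to evaluate $F$, not a canonical implementation: it truncates after a fixed number of ternary digits (the hypothetical parameter `digits`), so values at points with infinitely many nonzero ternary digits are approximate.

```python
from fractions import Fraction


def cantor_cdf(x, digits: int = 60):
    """Evaluate the Cantor distribution function at x using ternary digits.

    Digits equal to 2 contribute a binary digit 1. The first digit equal
    to 1 means x lies in the closure of a removed middle third, where F
    is constant, so the computation stops there.
    """
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(digits):
        x *= 3
        digit = int(x)  # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2
    return value


for point in (Fraction(1, 9), Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(7, 9)):
    print(point, cantor_cdf(point))
```

The values at $1/3$, $1/2$, and $2/3$ agree, which reflects that $F$ is constant across the first removed interval.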
The Cantor example is the warning that the graph of a CDF can carry more information than either jumps or a density reveal. Once we accept the CDF as the primary object, it is natural to ask how to move in the reverse direction: from a probability level back to a point on the real line.
## Quantiles and Simulation
### Generalized Inverses
The CDF moves from values of a random variable to probabilities. In statistics and simulation we often need the reverse direction: given a probability level, find the threshold at which that much mass has accumulated. Jumps and flat pieces make ordinary inverse functions inadequate, so the inverse must be defined with an infimum.
[definition: Quantile Function]
Let $F$ be a distribution function. The [quantile function](/page/Quantile%20Function), also called the generalized inverse of $F$, is the function
\begin{align*}
F^{-1}:(0,1)\to\mathbb R
\end{align*}
defined by
\begin{align*}
F^{-1}(p)=\inf\{x\in\mathbb R:F(x)\ge p\}.
\end{align*}
[/definition]
Some authors extend this convention to the endpoints by setting $F^{-1}(0)=\inf\{x\in\mathbb R:F(x)>0\}$ and $F^{-1}(1)=\inf\{x\in\mathbb R:F(x)=1\}$ when these quantities are finite, or by allowing the values $-\infty$ and $\infty$ in the extended real line. This page keeps the quantile function on $(0,1)$, where inverse-transform sampling and ordinary percentiles avoid endpoint ambiguity.
The purpose of this generalized inverse is not only to report percentiles. It gives a way to construct random variables with a prescribed law from uniform randomness, which is the basic mechanism behind inverse-transform simulation.
[quotetheorem:1139]
This theorem is one of the simplest bridges between probability theory and simulation. It also explains why quantiles are more than descriptive statistics: they parameterize the distribution by probability mass.
[example: Simulating an Exponential Random Variable]
Let $F(x)=0$ for $x<0$ and $F(x)=1-e^{-\lambda x}$ for $x\ge0$, where $\lambda>0$. To compute the quantile function, fix $p\in(0,1)$ and solve $F(x)=p$. Since $p>0$ and $F(x)=0$ for $x<0$, the solution must satisfy $x\ge0$, so
\begin{align*}
p=1-e^{-\lambda x}.
\end{align*}
Subtracting $1$ from both sides gives
\begin{align*}
p-1=-e^{-\lambda x}.
\end{align*}
Multiplying by $-1$ gives
\begin{align*}
1-p=e^{-\lambda x}.
\end{align*}
Because $0<p<1$, we have $0<1-p<1$, so taking logarithms is valid:
\begin{align*}
\log(1-p)=\log(e^{-\lambda x}).
\end{align*}
Using $\log(e^y)=y$ gives
\begin{align*}
\log(1-p)=-\lambda x.
\end{align*}
Dividing by $-\lambda$ gives
\begin{align*}
x=-\frac{1}{\lambda}\log(1-p).
\end{align*}
Thus
\begin{align*}
F^{-1}(p)=-\frac{1}{\lambda}\log(1-p), \qquad 0<p<1.
\end{align*}
Now let $U\sim\operatorname{Unif}(0,1)$ and define
\begin{align*}
X=F^{-1}(U)=-\frac{1}{\lambda}\log(1-U).
\end{align*}
By *[Inverse Transform Sampling](/theorems/1139)*, $X$ has distribution function $F$. Therefore $X$ has the $\operatorname{Exp}(\lambda)$ distribution, because its CDF is $0$ on $(-\infty,0)$ and $1-e^{-\lambda x}$ on $[0,\infty)$.
[/example]
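The inverse-transform recipe from this example can be sketched with the standard library. The seed, sample size, rate, and helper names below are illustrative assumptions; the comparison with the exact CDF is approximate because it uses a finite sample.

```python
import math
import random


def exp_cdf(x: float, lam: float) -> float:
    """CDF of Exp(lam): 0 for x < 0 and 1 - exp(-lam * x) for x >= 0."""
    return 0.0 if x < 0 else 1.0 - math.exp(-lam * x)


def exp_quantile(p: float, lam: float) -> float:
    """F^{-1}(p) = -log(1 - p) / lam for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -math.log(1.0 - p) / lam


def sample_exp(lam: float, rng: random.Random) -> float:
    """Draw U ~ Unif(0, 1) and return F^{-1}(U)."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, which lies outside (0, 1)
        u = rng.random()
    return exp_quantile(u, lam)


rng = random.Random(1)
lam, n = 2.0, 100_000
samples = [sample_exp(lam, rng) for _ in range(n)]

sample_mean = sum(samples) / n
frac_below = sum(x <= 0.5 for x in samples) / n
print(sample_mean, 1.0 / lam)
print(frac_below, exp_cdf(0.5, lam))
```

Both printed pairs should be close: the sample mean approximates $1/\lambda$, and the fraction of samples at or below $0.5$ approximates $F(0.5)$.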
### Medians and Probability Levels
Quantiles organize statistical summaries by probability level rather than by algebraic moments. The most familiar example is the median, which identifies a threshold with at least half the mass on each side.
[definition: Median]
Let $X$ be a real-valued random variable with CDF $F_X$. A median of $X$ is a real number $m\in\mathbb R$ such that
\begin{align*}
F_X(m)\ge \frac12
\end{align*}
and
\begin{align*}
\mathbb P(X\ge m)\ge \frac12.
\end{align*}
[/definition]
Medians need not be unique: when $F_X$ is flat at level $1/2$ on an interval, every point of the closure of that interval is a median. The quantile convention selects the smallest median, $F_X^{-1}(1/2)$.
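For a finitely supported law, the generalized inverse and the median conditions can be checked directly. The sketch below assumes a fair six-sided die as the illustrative distribution; `quantile` and `is_median` are hypothetical helper names.

```python
from fractions import Fraction

# Illustrative assumption: a fair die, whose CDF equals 1/2 on [3, 4).
pmf = {s: Fraction(1, 6) for s in (1, 2, 3, 4, 5, 6)}
HALF = Fraction(1, 2)


def quantile(p):
    """F^{-1}(p) = inf{x : F(x) >= p} for a finitely supported law, 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    total = Fraction(0)
    for s in sorted(pmf):
        total += pmf[s]
        if total >= p:
            return s  # first support point where the running total reaches p
    raise ValueError("the masses do not sum to at least p")


def is_median(m) -> bool:
    """Check F(m) >= 1/2 and P(X >= m) >= 1/2."""
    at_or_below = sum(mass for s, mass in pmf.items() if s <= m)
    at_or_above = sum(mass for s, mass in pmf.items() if s >= m)
    return at_or_below >= HALF and at_or_above >= HALF


print(quantile(HALF))
print([m for m in (2.9, 3, 3.5, 4, 4.5) if is_median(m)])
```

For the die, the points $3$, $3.5$, and $4$ all satisfy the median conditions, and the generalized inverse returns the smallest of them.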
## Empirical Distribution Functions
### From Data to a Step Function
In statistics the true CDF is usually unknown. A sample gives observed values, and the most direct estimator of the CDF counts the proportion of observations at or below each threshold. This estimator is a random step function, and its convergence properties justify many nonparametric methods.
[definition: Empirical Distribution Function]
Let $X_1,\dots,X_n$ be real-valued random variables. The empirical distribution function is the random function
\begin{align*}
F_n:\mathbb R \to [0,1]
\end{align*}
defined by
\begin{align*}
F_n(x)=\frac{1}{n}\sum_{i=1}^n \mathbb{1}_{\{X_i\le x\}}.
\end{align*}
[/definition]
The definition makes sense for any finite list of real-valued random variables. When the variables are i.i.d. with common law, $F_n$ becomes an estimator of their common CDF; without that common-law assumption it is still a random step function attached to the observed sample.
The empirical CDF places mass $1/n$ at each observed data point, counting multiplicity. To express this as an actual probability measure, we package the sample as a weighted sum of point masses.
[definition: Empirical Measure]
Let $X_1,\dots,X_n$ be real-valued random variables on a probability space $(\Omega,\mathcal F,\mathbb P)$. The empirical measure is the random map
\begin{align*}
\mu_n:\Omega \to \mathcal P(\mathbb R)
\end{align*}
defined by
\begin{align*}
\mu_n(\omega)=\frac{1}{n}\sum_{i=1}^n \delta_{X_i(\omega)}
\end{align*}
where $\mathcal P(\mathbb R)$ denotes the set of probability measures on $(\mathbb R,\mathcal B(\mathbb R))$ and $\delta_x$ denotes the point mass at $x$.
[/definition]
The empirical distribution function is the distribution function of $\mu_n$. This converts data into a probability measure and lets statistical estimation be phrased as convergence of random measures.
[example: Empirical CDF from a Small Sample]
For the observed sample $2,1,2,4$, the empirical CDF is
\begin{align*}
F_4(x)=\frac14\left(\mathbb 1_{\{2\le x\}}+\mathbb 1_{\{1\le x\}}+\mathbb 1_{\{2\le x\}}+\mathbb 1_{\{4\le x\}}\right).
\end{align*}
We compute it by counting how many of the four observations are at most the threshold $x$.
If $x<1$, then none of $1,2,2,4$ is at most $x$, so each indicator is $0$ and
\begin{align*}
F_4(x)=\frac14(0+0+0+0)=0.
\end{align*}
If $1\le x<2$, then only the observation $1$ is at most $x$, so
\begin{align*}
F_4(x)=\frac14(0+1+0+0)=\frac14.
\end{align*}
If $2\le x<4$, then the observations $1,2,2$ are at most $x$, while $4$ is not, so
\begin{align*}
F_4(x)=\frac14(1+1+1+0)=\frac34.
\end{align*}
If $x\ge4$, then all four observations are at most $x$, so
\begin{align*}
F_4(x)=\frac14(1+1+1+1)=1.
\end{align*}
Thus $F_4(x)=0$ for $x<1$, $F_4(x)=1/4$ for $1\le x<2$, $F_4(x)=3/4$ for $2\le x<4$, and $F_4(x)=1$ for $x\ge4$. At $2$, the left limit is $F_4(2-)=1/4$ and the value is $F_4(2)=3/4$, so the jump size is
\begin{align*}
F_4(2)-F_4(2-)=\frac34-\frac14=\frac24=\frac12.
\end{align*}
This jump has size $2/4$ because exactly two of the four observations are equal to $2$, and the empirical CDF assigns mass $1/4$ to each observed data point, counting multiplicity.
[/example]
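The counting rule in this example can be written as a small function. The sketch below builds the empirical CDF of the sample $2,1,2,4$ with exact fractions; `make_ecdf` is a hypothetical helper name.

```python
from bisect import bisect_right
from fractions import Fraction


def make_ecdf(sample):
    """Return the empirical CDF x -> (number of observations <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("the sample must be non-empty")

    def ecdf(x):
        # bisect_right counts the observations <= x, ties included
        return Fraction(bisect_right(ordered, x), n)

    return ecdf


F4 = make_ecdf([2, 1, 2, 4])
for x in (0.5, 1, 1.5, 2, 3.9, 4):
    print(x, F4(x))
```

Sorting once and counting with a binary search handles repeated observations automatically, which is why the value at $2$ rises by $2/4$ in a single step.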
### Uniform Convergence
Pointwise convergence of the empirical CDF follows from the law of large numbers at each fixed threshold. For statistical procedures that inspect all thresholds at once, pointwise convergence is not enough; we need the whole graph to converge uniformly.
[quotetheorem:2004]
The theorem says that the whole graph of the empirical CDF eventually tracks the true graph uniformly, not just at fixed points. This is why the empirical CDF is a canonical nonparametric estimator.
To understand confidence bands and goodness-of-fit tests, convergence alone is not enough. We also need the scale and shape of the random fluctuation around the limiting CDF. A clean way to put the fluctuation on the probability scale is available when $F$ is continuous and strictly increasing: then the probability integral transform sends the sample to $U_i=F(X_i)\sim\operatorname{Unif}(0,1)$. If $F$ has flat pieces or jumps, the same idea needs a generalized or randomized transform, so the theorem below is stated in the regular case where the uniform empirical process is exactly the right object.
[quotetheorem:6303]
The theorem applies to the uniform empirical process, so it is a statement about the centered fluctuations of the random step function $t\mapsto F_n(t)$ on the whole interval $[0,1]$, not merely at one fixed threshold. The continuity and strict monotonicity assumptions in the preceding reduction are what let a general sample be transported to the uniform scale without losing the exact empirical-process form. Under this regularity, the limiting object is a Brownian bridge, reflecting the constraint that the empirical CDF agrees with the true CDF at both ends of $[0,1]$, so the fluctuation is pinned to zero there.
For this page, the important message is not the technical topology of the convergence but the scale and shape of the fluctuation: empirical CDF errors are typically of order $n^{-1/2}$, and after that rescaling their whole graph has a nondegenerate random limit. This is the source of Kolmogorov-Smirnov style confidence bands and goodness-of-fit tests. When $F$ has jumps or flat pieces, the same simple uniform-process statement no longer applies directly; one must handle ties, atoms, or generalized transforms separately.
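The uniform distance between an empirical CDF and a continuous CDF can be computed exactly from the order statistics, since the supremum is attained at or just before a data point. The sketch below assumes a $\operatorname{Unif}(0,1)$ sample generated with a fixed seed; the function names and sample sizes are illustrative choices.

```python
import math
import random


def sup_distance(sample, cdf) -> float:
    """sup over x of |F_n(x) - F(x)|, computed at the order statistics.

    At the i-th smallest observation x, the empirical CDF jumps from
    (i - 1) / n to i / n, so both gaps to F(x) must be checked.
    """
    ordered = sorted(sample)
    n = len(ordered)
    worst = 0.0
    for i, x in enumerate(ordered, start=1):
        fx = cdf(x)
        worst = max(worst, i / n - fx, fx - (i - 1) / n)
    return worst


def unif01_cdf(x: float) -> float:
    """CDF of Unif(0, 1)."""
    return min(max(x, 0.0), 1.0)


rng = random.Random(2)
distances = {}
for n in (100, 10_000):
    sample = [rng.random() for _ in range(n)]
    distances[n] = sup_distance(sample, unif01_cdf)
    print(n, distances[n], math.sqrt(n) * distances[n])
```

The raw distance shrinks as $n$ grows, while the rescaled value $\sqrt n$ times the distance stays of order one, in line with the $n^{-1/2}$ scale described above.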
## Convergence in Distribution
### CDFs as Limit Objects
CDFs are the standard language for weak limits of real-valued random variables. The right question is not whether $F_{X_n}(x)\to F_X(x)$ at every $x$, because jumps of the limiting CDF create unavoidable endpoint ambiguity. The correct formulation asks for convergence at continuity points of the limit.
[definition: Convergence in Distribution via CDFs]
Let $X_1,X_2,\dots$ and $X$ be real-valued random variables with CDFs $F_{X_n}$ and $F_X$. The sequence $(X_n)_{n\in\mathbb N}$ converges in distribution to $X$, written $X_n\xrightarrow{d}X$, if
\begin{align*}
\lim_{n\to\infty}F_{X_n}(x)=F_X(x)
\end{align*}
for every continuity point $x$ of $F_X$.
[/definition]
The restriction to continuity points is forced by examples where mass approaches a point from one side. At the jump, the limiting value includes the atom while the approximating probabilities may not.
[example: Why Continuity Points Are Required]
Let $X_n=1/n$ almost surely and let $X=0$ almost surely. For each fixed $n$, the CDF of $X_n$ is determined by whether the threshold has reached the single possible value $1/n$:
\begin{align*}
F_{X_n}(x)=0\text{ if }x<1/n,\quad F_{X_n}(x)=1\text{ if }x\ge 1/n.
\end{align*}
Similarly, since $X=0$ almost surely,
\begin{align*}
F_X(x)=0\text{ if }x<0,\quad F_X(x)=1\text{ if }x\ge 0.
\end{align*}
The only discontinuity of $F_X$ is at $0$. If $x<0$, then $x<0<1/n$ for every $n$, so
\begin{align*}
F_{X_n}(x)=0=F_X(x)
\end{align*}
for every $n$. If $x>0$, choose $N$ with $N\ge 1/x$. Then for every $n\ge N$,
\begin{align*}
\frac1n\le \frac1N\le x,
\end{align*}
so
\begin{align*}
F_{X_n}(x)=1=F_X(x).
\end{align*}
Thus $F_{X_n}(x)\to F_X(x)$ at every continuity point $x$ of $F_X$, and therefore $X_n\xrightarrow{d}X$.
At the discontinuity point $0$, the approximating CDFs do not converge to the limiting CDF value:
\begin{align*}
F_{X_n}(0)=0
\end{align*}
for every $n$, because $0<1/n$, while
\begin{align*}
F_X(0)=1
\end{align*}
because $X=0$ almost surely. This is exactly why convergence in distribution is stated only at continuity points of the limiting CDF.
[/example]
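The example can be checked numerically. This is a small sketch in plain Python; `cdf_point_mass` and `cdf_values` are hypothetical helper names introduced only for illustration. It evaluates $F_{X_n}$ at a negative threshold, at the jump point $0$, and at a small positive threshold.

```python
def cdf_point_mass(c):
    """CDF of a random variable that equals c almost surely."""
    return lambda x: 1.0 if x >= c else 0.0

# Limiting CDF: X = 0 almost surely.
F_limit = cdf_point_mass(0.0)

def cdf_values(x, ns):
    """Values F_{X_n}(x) for X_n = 1/n almost surely, n in ns."""
    return [cdf_point_mass(1.0 / n)(x) for n in ns]

ns = [1, 10, 100, 1000]
for x in (-0.5, 0.0, 0.05):
    print(x, cdf_values(x, ns), "limit CDF:", F_limit(x))
```

At $x=-0.5$ and $x=0.05$ the listed values eventually agree with the limiting CDF, while at $x=0$ every $F_{X_n}(0)$ is $0$ although $F_X(0)=1$.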
### Weak Convergence of Measures
The definition above is about random variables, but the deeper object is the law. Two different random variables can have the same distribution, so a convergence notion phrased through representatives should depend only on the induced probability measures. The issue is whether convergence of the one-dimensional left-ray probabilities captures the usual [weak convergence](/page/Weak%20Convergence) of those laws; the formal criterion below gives precisely that bridge.
[quotetheorem:1171]
This theorem is the measure-level version of the definition above, and it explains why CDF convergence is the right real-line test for weak convergence of laws. The sets $(-\infty,x]$ are closed under finite intersections and generate the Borel $\sigma$-algebra, so their probabilities contain enough information to identify a law; however, convergence cannot be demanded at every endpoint, because an atom of the limiting law can create a jump in the limiting CDF. The continuity-set condition isolates exactly the endpoints where no mass is sitting on the boundary, so interval probabilities behave continuously under weak convergence.
On the real line, this is what justifies the usual CDF criterion for convergence in distribution: checking $F_{X_n}(x)\to F_X(x)$ at continuity points of $F_X$ is equivalent to weak convergence of the induced laws. The theorem is needed here because it turns the computational definition using left rays into an invariant statement about probability measures, independent of the particular random variables chosen to represent those laws.
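A concrete check may help. Weak convergence is commonly phrased through expectations of bounded continuous test functions, and the sketch below compares that formulation with the CDF criterion on a hypothetical example that does not appear earlier on this page: $X_n$ uniform on the grid $\{1/n,2/n,\dots,n/n\}$, with limit $\operatorname{Unif}(0,1)$. The function names and the test function $\cos$ are illustrative choices.

```python
import math

def cdf_discrete_uniform(n):
    """CDF of X_n, uniform on the grid {1/n, 2/n, ..., n/n}."""
    def F(x):
        if x < 0:
            return 0.0
        if x >= 1:
            return 1.0
        return math.floor(n * x) / n
    return F

def expectation_discrete_uniform(f, n):
    """E[f(X_n)] for X_n uniform on {1/n, ..., n/n}."""
    return sum(f(k / n) for k in range(1, n + 1)) / n

x = 0.3
target_cdf = x               # CDF of Unif(0,1) at x
target_mean = math.sin(1.0)  # integral of cos over [0, 1]

for n in (10, 100, 1000):
    cdf_gap = abs(cdf_discrete_uniform(n)(x) - target_cdf)
    mean_gap = abs(expectation_discrete_uniform(math.cos, n) - target_mean)
    print(n, cdf_gap, mean_gap)
```

Both gaps should shrink as $n$ grows. Since the limiting CDF is continuous, every $x$ is a continuity point here, so no endpoint needs to be excluded in this particular example.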
## Beyond and Connected Topics
CDFs are the entry point to [probability distributions](/page/Distribution) on the real line, but they are only one representation of a law. Probability mass functions are efficient for countable laws, probability density functions are efficient for absolutely continuous laws, and characteristic functions are often better for sums and limit theorems because the characteristic function of a sum of independent random variables is the product of the individual characteristic functions.
In statistics, the empirical CDF leads directly to nonparametric inference. Confidence bands, Kolmogorov-Smirnov statistics, quantile estimation, and bootstrap methods all start from the same step-function estimator introduced above. The natural course-level continuation is [Cambridge II Principles of Statistics](/page/Cambridge%20II%20Principles%20of%20Statistics), where distribution functions become objects to estimate and compare.
In measure-theoretic probability, the characterization theorem for CDFs is a special case of constructing measures from values on generating classes. This connects naturally to [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure), where random variables, laws, Borel sets, and convergence of measures are treated systematically.
For stochastic processes, finite-dimensional distributions are often described by joint CDFs. [Brownian motion](/page/Brownian%20Motion), martingales, stopping times, and stochastic differential equations require more structure than one-dimensional CDFs, but their marginal distributions are still read through CDFs and quantiles. The next advanced direction is [Cambridge III Stochastic Calculus and Applications](/page/Cambridge%20III%20Stochastic%20Calculus%20and%20Applications).
## References
Androma, [Cambridge IA Probability](/page/Cambridge%20IA%20Probability).
Androma, [Cambridge II Principles of Statistics](/page/Cambridge%20II%20Principles%20of%20Statistics).
Androma, [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure).
Androma, [Cambridge III Stochastic Calculus and Applications](/page/Cambridge%20III%20Stochastic%20Calculus%20and%20Applications).
Androma, [Distribution](/page/Distribution).
Billingsley, *Probability and Measure* (1995).
Durrett, *Probability: Theory and Examples* (2019).
van der Vaart, *Asymptotic Statistics* (1998).