Many probability problems first appear as yes-or-no questions. Did the first card have a certain suit? Did a graph contain a triangle? Did a customer return? Did a random walk hit a boundary before time $n$? The event itself is not a number, so it cannot be added, averaged, centered, or inserted into a variance computation. The indicator random variable is the small device that removes this obstruction: it replaces occurrence by $1$ and non-occurrence by $0$.
This replacement changes the shape of a problem. Instead of asking for the probability of a complicated count directly, we can write the count as a sum of indicators and use linearity of expectation. Instead of treating dependence between events as informal overlap, we can compute covariance. Instead of separating set language from random variable language, we can move between events and functions on the [probability space](/page/Probability%20Space).
[example: Counting Heads by Events]
Let $(\theta_i)_{1 \le i \le n}$ be the outcomes of $n$ coin tosses, where each $\theta_i$ is either $H$ or $T$. For each $i$, let $A_i=\{\theta_i=H\}$. Define
\begin{align*} N=\sum_{i=1}^n \mathbb{1}_{A_i}. \end{align*}
For a fixed outcome $\omega$, the $i$th term is
\begin{align*} \mathbb{1}_{A_i}(\omega)=1 \text{ if } \theta_i(\omega)=H,\quad \mathbb{1}_{A_i}(\omega)=0 \text{ if } \theta_i(\omega)=T. \end{align*}
Hence $N(\omega)$ is the sum of one $1$ for each head and one $0$ for each tail, so $N(\omega)$ is exactly the number of heads in the outcome $\omega$.
If the coin has heads probability $p$, then $\mathbb{P}(A_i)=p$ for every $i$. Since $\mathbb{1}_{A_i}$ takes only the values $0$ and $1$, its expectation is
\begin{align*} \mathbb{E}[\mathbb{1}_{A_i}]=1\cdot \mathbb{P}(A_i)+0\cdot \mathbb{P}(A_i^c). \end{align*}
Thus
\begin{align*} \mathbb{E}[\mathbb{1}_{A_i}]=\mathbb{P}(A_i)=p. \end{align*}
Using finite linearity of expectation,
\begin{align*} \mathbb{E}[N]=\mathbb{E}\left[\sum_{i=1}^n \mathbb{1}_{A_i}\right]=\sum_{i=1}^n \mathbb{E}[\mathbb{1}_{A_i}]. \end{align*}
Substituting $\mathbb{E}[\mathbb{1}_{A_i}]=p$ into the finite sum gives
\begin{align*} \mathbb{E}[N]=\sum_{i=1}^n p=np. \end{align*}
This expectation computation uses only the individual probabilities of the events $A_i$; independence is needed for the binomial distribution of $N$, while variance depends on the pairwise intersection probabilities $\mathbb{P}(A_i\cap A_j)$.
[/example]
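The identity $\mathbb{E}[N]=np$ can be checked by brute force on a small case. The following is a minimal sketch, not part of the original example: the function name `expected_heads` and the choice of exact `Fraction` arithmetic are illustrative assumptions. It assigns probabilities to outcomes assuming independent tosses, although the expectation itself only depends on the individual probabilities $\mathbb{P}(A_i)=p$.

```python
from fractions import Fraction
from itertools import product


def expected_heads(n, p):
    """Exact E[N] for n tosses with heads probability p, by enumerating outcomes."""
    total = Fraction(0)
    for outcome in product("HT", repeat=n):
        # Probability of this outcome (assumes independent tosses).
        prob = Fraction(1)
        for toss in outcome:
            prob *= p if toss == "H" else 1 - p
        # N(omega) is the sum of the indicators 1_{A_i}(omega).
        n_heads = sum(1 if toss == "H" else 0 for toss in outcome)
        total += prob * n_heads
    return total


print(expected_heads(5, Fraction(1, 3)))  # prints 5/3, which is n * p
```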
The example shows the main principle of the chapter. Indicators are the bridge between Boolean structure and linear structure. Events combine by unions and intersections; random variables combine by sums and products. Indicator variables translate between the two languages without losing information about the event.
## Definition
The parent concept is a [random variable](/page/Random%20Variable): a measurable map from a probability space into a measurable state space. An event $A \in \mathcal F$ already has measurable structure, because membership in $A$ is a measurable yes-or-no property of $\omega \in \Omega$. To use expectation, variance, and distributional language, we need to package that membership test as a random variable.
[definition: Indicator Random Variable]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space and let $A \in \mathcal F$ be an event. The indicator random variable of $A$ is the function $\mathbb{1}_A: \Omega \to \{0,1\}$ defined by $\mathbb{1}_A(\omega)=1$ when $\omega \in A$ and $\mathbb{1}_A(\omega)=0$ when $\omega \notin A$.
[/definition]
The codomain $\{0,1\}$ is usually viewed as a measurable subspace of $\mathbb R$ with the power set $\sigma$-algebra, so $\mathbb{1}_A$ is a real-valued random variable. The measurability condition is exactly the condition $A \in \mathcal F$, since
\begin{align*} \mathbb{1}_A^{-1}(\{1\}) = A \quad \text{and} \quad \mathbb{1}_A^{-1}(\{0\}) = A^c. \end{align*}
The next question is what distribution this two-valued random variable has. That matters because it lets us translate event probabilities into the standard language of named random-variable laws.
[quotetheorem:9475]
The Bernoulli law tells us what distribution an indicator has, but in calculations we often encounter a $0$-$1$ random variable first and only later need to identify the event it marks. The next result supplies that reverse translation, which is what makes the event-random-variable dictionary exact rather than one-directional.
[quotetheorem:10138]
The identification lets us prove facts about events by proving facts about $0$-$1$ random variables, and it lets us interpret any $0$-$1$ random variable as an event in disguise. This two-way translation is the reason indicators are used throughout probability rather than only in elementary counting.
## Events as Numbers
### Indicators on Measure Spaces
The indicator notation is not restricted to probability. In measure theory, the same construction turns a measurable set into a [measurable function](/page/Measurable%20Function). This broader version is needed because expectation is an integral, and indicators are the first functions whose integrals are prescribed by the measure.
[definition: Indicator Function]
Let $(E,\mathcal E)$ be a measurable space and let $B \in \mathcal E$. The indicator function of $B$ is the function $\mathbb{1}_B: E \to \{0,1\}$ defined by $\mathbb{1}_B(x)=1$ when $x \in B$ and $\mathbb{1}_B(x)=0$ when $x \notin B$.
[/definition]
On a probability space, the indicator function of an event is an indicator random variable. Once a measure $\mu$ is present, the natural operation is integration. The next result explains why the integral of an indicator must recover the measure of the set it marks.
[quotetheorem:10139]
This theorem converts probabilities into expectations, and expectations behave linearly even when the underlying events are dependent. To extend this idea beyond a single event, we next isolate the finite-valued random variables obtained by combining finitely many indicators.
[definition: Simple Random Variable]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space. A real-valued random variable $X: \Omega \to \mathbb R$ is a simple random variable if it has finite range.
[/definition]
The definition identifies the finite-valued objects, but it does not yet show how indicators generate them. The decomposition theorem below gives the exact representation, using one indicator for each level event of the random variable.
[quotetheorem:10140]
The formula says that indicators do not merely count events. They assemble finite-valued random variables. General nonnegative random variables are then built by limits of simple random variables, which is one path from elementary probability into measure-theoretic probability.
[example: A Three-Valued Random Variable]
Let $X$ be the payoff of a game with possible values $-1$, $0$, and $3$. For each outcome $\omega$, exactly one of the events $\{X=-1\}$, $\{X=0\}$, and $\{X=3\}$ occurs. If $X(\omega)=-1$, then
\begin{align*} -\mathbb{1}_{\{X=-1\}}(\omega)+0\mathbb{1}_{\{X=0\}}(\omega)+3\mathbb{1}_{\{X=3\}}(\omega)=-1\cdot 1+0\cdot 0+3\cdot 0=-1. \end{align*}
If $X(\omega)=0$, then
\begin{align*} -\mathbb{1}_{\{X=-1\}}(\omega)+0\mathbb{1}_{\{X=0\}}(\omega)+3\mathbb{1}_{\{X=3\}}(\omega)=-1\cdot 0+0\cdot 1+3\cdot 0=0. \end{align*}
If $X(\omega)=3$, then
\begin{align*} -\mathbb{1}_{\{X=-1\}}(\omega)+0\mathbb{1}_{\{X=0\}}(\omega)+3\mathbb{1}_{\{X=3\}}(\omega)=-1\cdot 0+0\cdot 0+3\cdot 1=3. \end{align*}
Thus, pointwise,
\begin{align*} X=-\mathbb{1}_{\{X=-1\}}+0\mathbb{1}_{\{X=0\}}+3\mathbb{1}_{\{X=3\}}. \end{align*}
Since $0\mathbb{1}_{\{X=0\}}(\omega)=0$ for every $\omega$, this is the same as
\begin{align*} X=-\mathbb{1}_{\{X=-1\}}+3\mathbb{1}_{\{X=3\}}. \end{align*}
Now use finite linearity of expectation and the identity $\mathbb{E}[\mathbb{1}_A]=\mathbb{P}(A)$ for an indicator:
\begin{align*} \mathbb{E}[X]=\mathbb{E}[-\mathbb{1}_{\{X=-1\}}+0\mathbb{1}_{\{X=0\}}+3\mathbb{1}_{\{X=3\}}]. \end{align*}
By finite linearity,
\begin{align*} \mathbb{E}[X]=-\mathbb{E}[\mathbb{1}_{\{X=-1\}}]+0\mathbb{E}[\mathbb{1}_{\{X=0\}}]+3\mathbb{E}[\mathbb{1}_{\{X=3\}}]. \end{align*}
Substituting the corresponding event probabilities gives
\begin{align*} \mathbb{E}[X]=-\mathbb{P}(X=-1)+0\mathbb{P}(X=0)+3\mathbb{P}(X=3). \end{align*}
Since the middle term is $0$,
\begin{align*} \mathbb{E}[X]=-\mathbb{P}(X=-1)+3\mathbb{P}(X=3). \end{align*}
The payoff is therefore determined, for purposes of expectation, by the probabilities of the two nonzero payoff events.
[/example]
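The decomposition into level-event indicators can be tested on a concrete finite probability space. The sketch below is illustrative only: the three outcomes `"a"`, `"b"`, `"c"` and their probabilities are hypothetical choices, not data from the example. It checks the pointwise identity and compares the expectation computed directly with the one computed from the two nonzero payoff events.

```python
from fractions import Fraction

# Hypothetical finite probability space: outcome -> probability.
prob = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
# Hypothetical payoff X with possible values -1, 0, 3.
X = {"a": -1, "b": 0, "c": 3}


def level_event(value):
    """The event {X = value} as a set of outcomes."""
    return {w for w in prob if X[w] == value}


def indicator(event):
    """Indicator function of an event."""
    return lambda w: 1 if w in event else 0


def P(event):
    return sum(prob[w] for w in event)


values = sorted(set(X.values()))
ind = {x: indicator(level_event(x)) for x in values}

# Pointwise check: X(w) equals the sum of x * 1_{X=x}(w) over the values x.
decomposition_holds = all(
    X[w] == sum(x * ind[x](w) for x in values) for w in prob
)

expectation_direct = sum(X[w] * prob[w] for w in prob)
expectation_levels = -P(level_event(-1)) + 3 * P(level_event(3))

print(decomposition_holds, expectation_direct, expectation_levels)
```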
### Boolean Algebra in Arithmetic Form
After indicators translate events into numbers, set operations should have arithmetic counterparts. These identities are useful because they let intersections, unions, complements, and differences be manipulated inside expectations.
[quotetheorem:10141]
These formulas express Boolean logic in arithmetic. Multiplication corresponds to intersection, subtraction from $1$ corresponds to complement, and inclusion-exclusion appears as the formula for a union.
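The correspondences described above can be verified exhaustively on a small set. The following sketch assumes the standard forms of the identities (product for intersection, $1-\mathbb{1}_A$ for complement, $\mathbb{1}_A+\mathbb{1}_B-\mathbb{1}_A\mathbb{1}_B$ for union, $\mathbb{1}_A-\mathbb{1}_A\mathbb{1}_B$ for difference); the quoted theorem may state them in a slightly different form. The four-point universe and the helper names are illustrative.

```python
from itertools import combinations

omega = set(range(4))  # small illustrative universe


def indicator(event):
    return lambda w: 1 if w in event else 0


def subsets(universe):
    """All subsets of a finite set."""
    items = sorted(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)


def identities_hold(A, B):
    """Check the indicator identities pointwise for events A and B."""
    one_A, one_B = indicator(A), indicator(B)
    for w in omega:
        a, b = one_A(w), one_B(w)
        if indicator(A & B)(w) != a * b:          # intersection
            return False
        if indicator(omega - A)(w) != 1 - a:      # complement
            return False
        if indicator(A | B)(w) != a + b - a * b:  # union
            return False
        if indicator(A - B)(w) != a - a * b:      # difference
            return False
    return True


all_ok = all(identities_hold(A, B) for A in subsets(omega) for B in subsets(omega))
print(all_ok)
```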
## Counting by Indicators
### Expected Counts
Many random quantities are counts: number of successes, number of fixed points, number of occupied boxes, number of isolated vertices, number of collisions, number of records. The direct distribution of a count may be hard. The count itself is often easy to write as a sum of indicators, so we first name that construction.
[definition: Counting Random Variable]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, and let $A_1,\ldots,A_n \in \mathcal F$ be events. The counting random variable associated to $A_1,\ldots,A_n$ is the real-valued random variable $N: \Omega \to \{0,\ldots,n\}$ defined by
\begin{align*}
N(\omega)=\sum_{i=1}^n \mathbb{1}_{A_i}(\omega).
\end{align*}
[/definition]
The value of $N$ is the number of events among $A_1,\ldots,A_n$ that occur. The events may overlap, and the definition still makes sense. Directly finding the distribution of $N$ can require understanding all intersections among the events, but its expectation asks for less information. Since $N$ is a finite sum of indicator variables, the key question is whether expectation can be pushed through that finite sum.
[quotetheorem:1117]
For a counting variable, finite linearity still needs one more translation: each summand is an indicator, and an indicator's expectation is the probability of its event. Combining those two facts turns the abstract linearity rule into a usable formula for expected counts.
[quotetheorem:3534]
No independence appears in the hypotheses. This absence is not a missing assumption; it is the point. Expectation adds even when random variables are strongly dependent.
[example: Fixed Points of a Random Permutation]
Let $\pi$ be uniformly distributed over the [symmetric group](/page/Symmetric%20Group) $S_n$, with $n\ge 1$. For each $i\in\{1,\ldots,n\}$, let $A_i=\{\pi(i)=i\}$. The fixed-point count is
\begin{align*} F=\sum_{i=1}^n \mathbb{1}_{A_i}. \end{align*}
Indeed, for a permutation $\pi$, the term $\mathbb{1}_{A_i}$ equals $1$ exactly when $i$ is fixed and equals $0$ otherwise, so the sum contains one $1$ for each fixed point and one $0$ for each non-fixed point.
For a fixed $i$, the number of permutations in $S_n$ with $\pi(i)=i$ is $(n-1)!$, because the remaining $n-1$ elements may be permuted arbitrarily. Since $|S_n|=n!$, uniformity gives
\begin{align*} \mathbb{P}(A_i)=\frac{(n-1)!}{n!}. \end{align*}
Using $n!=n(n-1)!$, this becomes
\begin{align*} \mathbb{P}(A_i)=\frac{(n-1)!}{n(n-1)!}=\frac{1}{n}. \end{align*}
By finite linearity of expectation and the identity $\mathbb{E}[\mathbb{1}_{A_i}]=\mathbb{P}(A_i)$ for an indicator,
\begin{align*} \mathbb{E}[F]=\mathbb{E}\left[\sum_{i=1}^n \mathbb{1}_{A_i}\right]=\sum_{i=1}^n \mathbb{E}[\mathbb{1}_{A_i}]. \end{align*}
Substituting $\mathbb{E}[\mathbb{1}_{A_i}]=1/n$ for every $i$ gives
\begin{align*} \mathbb{E}[F]=\sum_{i=1}^n \frac{1}{n}=n\cdot\frac{1}{n}=1. \end{align*}
Thus a uniformly random permutation has expected fixed-point count $1$, even though the events $A_i$ are not independent; only their individual probabilities enter this expectation calculation.
[/example]
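For small $n$ the expected fixed-point count can be confirmed by listing all of $S_n$. This is a minimal sketch with an illustrative function name; it uses $0$-based labels $0,\ldots,n-1$ instead of $1,\ldots,n$, which does not affect the count.

```python
from fractions import Fraction
from itertools import permutations


def expected_fixed_points(n):
    """Exact E[F] for a uniform permutation of {0, ..., n-1}, by enumeration."""
    perms = list(permutations(range(n)))
    # F(pi) is the sum over i of the indicator of {pi(i) = i}.
    total = sum(sum(1 for i in range(n) if perm[i] == i) for perm in perms)
    return Fraction(total, len(perms))


for n in range(1, 7):
    print(n, expected_fixed_points(n))  # the second column is always 1
```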
### Independent Trials
Indicators also reveal the binomial distribution as a count of independent identical events. The distinction between the sum representation and the named distribution matters: the sum representation always exists, while the binomial law requires independence and equal probabilities.
[quotetheorem:10142]
The theorem explains why a binomial random variable is a count of successes in independent repeated trials. If the probabilities differ, the sum is still useful, but the distribution is no longer generally binomial.
[example: Unequal Success Probabilities]
Suppose three components operate independently, and component $i$ works with probability $p_i$ for $i\in\{1,2,3\}$. Let $A_i$ be the event that component $i$ works. The count of working components is
\begin{align*} N=\mathbb{1}_{A_1}+\mathbb{1}_{A_2}+\mathbb{1}_{A_3}. \end{align*}
For each $i$, the indicator satisfies
\begin{align*} \mathbb{E}[\mathbb{1}_{A_i}]=\mathbb{P}(A_i)=p_i. \end{align*}
Using finite linearity of expectation,
\begin{align*} \mathbb{E}[N]=\mathbb{E}[\mathbb{1}_{A_1}]+\mathbb{E}[\mathbb{1}_{A_2}]+\mathbb{E}[\mathbb{1}_{A_3}]. \end{align*}
Substituting the three indicator expectations gives
\begin{align*} \mathbb{E}[N]=p_1+p_2+p_3. \end{align*}
The event $\{N=2\}$ occurs exactly when components $1$ and $2$ work while component $3$ fails, or components $1$ and $3$ work while component $2$ fails, or components $2$ and $3$ work while component $1$ fails. Thus
\begin{align*} \{N=2\}=(A_1\cap A_2\cap A_3^c)\cup(A_1\cap A_2^c\cap A_3)\cup(A_1^c\cap A_2\cap A_3). \end{align*}
These three events are disjoint, so finite additivity gives
\begin{align*} \mathbb{P}(N=2)=\mathbb{P}(A_1\cap A_2\cap A_3^c)+\mathbb{P}(A_1\cap A_2^c\cap A_3)+\mathbb{P}(A_1^c\cap A_2\cap A_3). \end{align*}
By independence of the component events,
\begin{align*} \mathbb{P}(A_1\cap A_2\cap A_3^c)=\mathbb{P}(A_1)\mathbb{P}(A_2)\mathbb{P}(A_3^c)=p_1p_2(1-p_3). \end{align*}
Similarly,
\begin{align*} \mathbb{P}(A_1\cap A_2^c\cap A_3)=p_1(1-p_2)p_3. \end{align*}
And
\begin{align*} \mathbb{P}(A_1^c\cap A_2\cap A_3)=(1-p_1)p_2p_3. \end{align*}
Therefore
\begin{align*} \mathbb{P}(N=2)=p_1p_2(1-p_3)+p_1(1-p_2)p_3+(1-p_1)p_2p_3. \end{align*}
The count is still a sum of independent indicators, but because the success probabilities may differ, its distribution is Poisson-binomial rather than binomial unless $p_1=p_2=p_3$.
[/example]
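The formula for $\mathbb{P}(N=2)$ can be compared with a direct enumeration of the eight work/fail patterns. The sketch below assumes independent components, as in the example; the numerical values $p_1=1/2$, $p_2=1/3$, $p_3=1/4$ and the function name are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product


def count_distribution(probs):
    """Distribution of N = number of working components, assuming independence."""
    n = len(probs)
    dist = [Fraction(0)] * (n + 1)
    for works in product([0, 1], repeat=n):
        weight = Fraction(1)
        for w, p in zip(works, probs):
            weight *= p if w else 1 - p
        dist[sum(works)] += weight  # sum(works) is the value of N
    return dist


p1, p2, p3 = Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)
dist = count_distribution([p1, p2, p3])
formula = p1 * p2 * (1 - p3) + p1 * (1 - p2) * p3 + (1 - p1) * p2 * p3
mean = sum(k * dist[k] for k in range(4))

print(dist[2], formula, mean)  # 1/4 1/4 13/12
```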
The same method is common in graph theory, where indicators turn substructure counts into sums over possible substructures. This is the first step in many probabilistic method arguments.
[example: Triangles in an Erdős-Rényi Graph]
Let $G \sim G(n,p)$ on vertex set $\{1,\ldots,n\}$, and for each three-element set $S\subset \{1,\ldots,n\}$ let $A_S$ be the event that the three vertices in $S$ span a triangle. Define
\begin{align*} T=\sum_{S\subset \{1,\ldots,n\},\ |S|=3}\mathbb{1}_{A_S}. \end{align*}
For a fixed graph outcome $g$, the summand $\mathbb{1}_{A_S}(g)$ equals $1$ exactly when $S$ spans a triangle in $g$, and equals $0$ otherwise. Thus the sum contains one $1$ for each triangle in $g$ and one $0$ for each non-triangle triple, so $T(g)$ is exactly the number of triangles in $g$.
Fix $S=\{a,b,c\}$. Let $E_{ab}$, $E_{ac}$, and $E_{bc}$ be the events that the corresponding three edges are present. Then
\begin{align*} A_S=E_{ab}\cap E_{ac}\cap E_{bc}. \end{align*}
By independence of the edge events in $G(n,p)$,
\begin{align*} \mathbb{P}(A_S)=\mathbb{P}(E_{ab})\mathbb{P}(E_{ac})\mathbb{P}(E_{bc})=p\cdot p\cdot p=p^3. \end{align*}
For each such $S$,
\begin{align*} \mathbb{E}[\mathbb{1}_{A_S}]=1\cdot\mathbb{P}(A_S)+0\cdot\mathbb{P}(A_S^c)=\mathbb{P}(A_S)=p^3. \end{align*}
Using finite linearity of expectation,
\begin{align*} \mathbb{E}[T]=\mathbb{E}\left[\sum_{S\subset \{1,\ldots,n\},\ |S|=3}\mathbb{1}_{A_S}\right]=\sum_{S\subset \{1,\ldots,n\},\ |S|=3}\mathbb{E}[\mathbb{1}_{A_S}]. \end{align*}
There are $\binom{n}{3}$ three-element subsets of $\{1,\ldots,n\}$, and each contributes $p^3$, so
\begin{align*} \mathbb{E}[T]=\sum_{S\subset \{1,\ldots,n\},\ |S|=3}p^3=\binom{n}{3}p^3. \end{align*}
The triangle events are not independent in general, not even pairwise. For instance, when $n\ge 4$, the triples $\{1,2,3\}$ and $\{1,2,4\}$ share the edge $\{1,2\}$, so both triangle events occur exactly when the five edges $\{1,2\}$, $\{1,3\}$, $\{2,3\}$, $\{1,4\}$, and $\{2,4\}$ are present. Hence their intersection has probability $p^5$, while the product of their individual probabilities is $p^3p^3=p^6$, which differs from $p^5$ when $0<p<1$. The expectation calculation above does not require independence among triangle events; it only uses the individual probabilities $\mathbb{P}(A_S)=p^3$.
[/example]
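Both the expected triangle count and the overlap probability $p^5$ can be checked exactly for very small $n$ by enumerating every graph. The following is a sketch under the $G(n,p)$ model with illustrative names; vertices are labelled $0,\ldots,n-1$, so the triples $\{0,1,2\}$ and $\{0,1,3\}$ play the role of $\{1,2,3\}$ and $\{1,2,4\}$. The enumeration has $2^{\binom{n}{2}}$ terms, so it is only practical for $n\le 5$ or so.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def spans_triangle(graph, triple):
    """True when all three edges inside the (sorted) triple are present."""
    return all(pair in graph for pair in combinations(triple, 2))


def triangle_stats(n, p):
    """Return (E[T], P(both {0,1,2} and {0,1,3} span triangles)) for n >= 4."""
    edges = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))
    expected = Fraction(0)
    both = Fraction(0)
    for present in product([False, True], repeat=len(edges)):
        graph = {e for e, on in zip(edges, present) if on}
        weight = Fraction(1)
        for on in present:
            weight *= p if on else 1 - p  # independent edges
        expected += weight * sum(1 for S in triples if spans_triangle(graph, S))
        if spans_triangle(graph, (0, 1, 2)) and spans_triangle(graph, (0, 1, 3)):
            both += weight
    return expected, both


p = Fraction(1, 3)
expected, both = triangle_stats(4, p)
print(expected == comb(4, 3) * p**3, both == p**5)
```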
## Moments and Dependence
### Variance of a Single Indicator
Expectation only sees the first-order probability of an event. To measure fluctuation, we need second moments. Indicators have a special simplification because a $0$-$1$ value is unchanged when squared, so their variance has a closed form in terms of the event probability.
[quotetheorem:10143]
The identity $X^2=X$ is special to $0$-$1$ random variables. For two event indicators, the corresponding second-moment question is whether the events occur together more or less often than their individual probabilities predict.
[quotetheorem:10144]
Positive covariance means the two events occur together more often than independence would predict. Negative covariance means they avoid each other relative to independence. To use this in sums of many indicators, we need a condition that removes all pairwise covariance terms.
[definition: Pairwise Independent Events]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space. Events $A_1,\ldots,A_n \in \mathcal F$ are pairwise independent if for all distinct $i,j \in \{1,\ldots,n\}$, $\mathbb{P}(A_i\cap A_j)=\mathbb{P}(A_i)\mathbb{P}(A_j)$.
[/definition]
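Pairwise independence is weaker than mutual independence, but it is enough to make distinct indicator variables uncorrelated. In general, the variance of a count is the sum of the individual indicator variances plus the pairwise covariance terms $\mathbb{P}(A_i\cap A_j)-\mathbb{P}(A_i)\mathbb{P}(A_j)$ over distinct $i,j$, and these terms vanish under pairwise independence.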
[quotetheorem:10145]
The formula separates individual randomness from dependence. In applications, estimating the double sum is often the main work.
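As an illustration of this separation, the sketch below computes the variance of the fixed-point count of a uniform random permutation in two ways: directly from the second moment, and as the sum of $\mathbb{P}(A_i)(1-\mathbb{P}(A_i))$ plus the double sum of $\mathbb{P}(A_i\cap A_j)-\mathbb{P}(A_i)\mathbb{P}(A_j)$ over distinct $i,j$. That form of the variance formula is the standard one and is assumed here; the choice $n=4$ and all helper names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

n = 4
outcomes = list(permutations(range(n)))
weight = Fraction(1, len(outcomes))  # uniform law on S_n


def prob(event):
    """Probability of the set of permutations satisfying `event`."""
    return sum(weight for w in outcomes if event(w))


def A(i):
    """The event {pi(i) = i}, as a predicate on permutations."""
    return lambda perm: perm[i] == i


def count(perm):
    """Number of fixed points: the sum of the indicators of the A(i)."""
    return sum(1 for i in range(n) if perm[i] == i)


# Direct computation: Var(F) = E[F^2] - E[F]^2.
mean = sum(weight * count(w) for w in outcomes)
second = sum(weight * count(w) ** 2 for w in outcomes)
var_direct = second - mean**2

# Decomposition: individual variances plus pairwise covariance terms.
diagonal = sum(prob(A(i)) * (1 - prob(A(i))) for i in range(n))
off_diagonal = sum(
    prob(lambda w, i=i, j=j: w[i] == i and w[j] == j) - prob(A(i)) * prob(A(j))
    for i in range(n)
    for j in range(n)
    if i != j
)
var_formula = diagonal + off_diagonal

print(var_direct, var_formula)  # 1 1
```

Here the individual variances sum to $3/4$ and the covariance terms contribute the remaining $1/4$, so the dependence between the events is visible in the double sum.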
[example: Two Overlapping Events]
Roll a fair six-sided die with sample space $\{1,2,3,4,5,6\}$. Let $A$ be the event that the outcome is even, and let $B$ be the event that the outcome is at least $4$. Then
\begin{align*} A=\{2,4,6\},\quad B=\{4,5,6\},\quad A\cap B=\{4,6\}. \end{align*}
Since the die is fair, each outcome has probability $1/6$, so finite additivity gives
\begin{align*} \mathbb{P}(A)=\frac{3}{6},\quad \mathbb{P}(B)=\frac{3}{6},\quad \mathbb{P}(A\cap B)=\frac{2}{6}. \end{align*}
By *Covariance of Indicators*,
\begin{align*} \operatorname{Cov}(\mathbb{1}_A,\mathbb{1}_B)=\mathbb{P}(A\cap B)-\mathbb{P}(A)\mathbb{P}(B). \end{align*}
Substituting the three probabilities gives
\begin{align*} \operatorname{Cov}(\mathbb{1}_A,\mathbb{1}_B)=\frac{2}{6}-\frac{3}{6}\cdot\frac{3}{6}. \end{align*}
The product term is
\begin{align*} \frac{3}{6}\cdot\frac{3}{6}=\frac{9}{36}. \end{align*}
Also,
\begin{align*} \frac{2}{6}=\frac{12}{36}. \end{align*}
Therefore
\begin{align*} \operatorname{Cov}(\mathbb{1}_A,\mathbb{1}_B)=\frac{12}{36}-\frac{9}{36}=\frac{3}{36}=\frac{1}{12}. \end{align*}
The covariance is positive because the overlap outcomes $4$ and $6$ make both indicators equal to $1$, so the events occur together more often than their separate probabilities would predict under independence.
[/example]
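The same covariance can be computed mechanically from the sample space. This is a minimal sketch with illustrative helper names, using exact fractions so that the result matches the hand computation.

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]
weight = Fraction(1, 6)  # fair die

A = {w for w in omega if w % 2 == 0}  # even outcome
B = {w for w in omega if w >= 4}      # outcome at least 4


def one_A(w):
    return 1 if w in A else 0


def one_B(w):
    return 1 if w in B else 0


def expect(f):
    """Expectation of a function on the sample space."""
    return sum(weight * f(w) for w in omega)


# Cov(1_A, 1_B) = E[1_A 1_B] - E[1_A] E[1_B] = P(A and B) - P(A) P(B).
cov = expect(lambda w: one_A(w) * one_B(w)) - expect(one_A) * expect(one_B)
print(cov)  # 1/12
```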
### Independence Revisited
Independence can be stated for events or for random variables. Since indicators carry exactly the information of their events, the two forms of independence should agree. This equivalence is useful when moving between Bernoulli variables and event families.
[quotetheorem:10146]
This theorem is often used silently: a statement about independent Bernoulli variables is also a statement about independent events, and conversely.
## Conditional Information
### Conditional Probabilities as Random Variables
Indicators also express [conditional probability](/page/Conditional%20Probability). If $\mathcal G \subset \mathcal F$ is the information currently available, then the best $\mathcal G$-measurable prediction of whether $A$ occurs, in the mean-square sense, is the [conditional expectation](/page/Conditional%20Expectation) of $\mathbb{1}_A$. This motivates the measure-theoretic definition of conditional probability.
[definition: Conditional Probability Given a Sigma-Algebra]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, let $A \in \mathcal F$, and let $\mathcal G \subset \mathcal F$ be a sub-$\sigma$-algebra. The conditional probability of $A$ given $\mathcal G$ is a $\mathcal G$-measurable random variable $\mathbb{P}(A\mid \mathcal G): \Omega \to [0,1]$, defined up to almost-sure equality, such that
\begin{align*}
\mathbb{P}(A\mid \mathcal G)=\mathbb{E}[\mathbb{1}_A\mid \mathcal G].
\end{align*}
[/definition]
This definition treats conditional probability as a random variable, not as a single number. The next characterization states the averaging property that makes it the right conditional version of $\mathbb{P}(A)$.
[quotetheorem:972]
The formula says that conditional probability gives the right probability after restricting to any event whose occurrence is already known at the information level $\mathcal G$.
[example: Conditional Probability on a Finite Partition]
Let $B_1,\ldots,B_m$ be disjoint events with union $\Omega$ and $\mathbb{P}(B_j)>0$ for each $j$, and let $\mathcal G=\sigma(B_1,\ldots,B_m)$. For an event $A\in\mathcal F$, define
\begin{align*} Y=\sum_{j=1}^m \frac{\mathbb{P}(A\cap B_j)}{\mathbb{P}(B_j)}\mathbb{1}_{B_j}. \end{align*}
If $\omega\in B_k$, then $\mathbb{1}_{B_k}(\omega)=1$ and $\mathbb{1}_{B_j}(\omega)=0$ for every $j\ne k$, because the events $B_1,\ldots,B_m$ are disjoint. Hence
\begin{align*} Y(\omega)=\frac{\mathbb{P}(A\cap B_k)}{\mathbb{P}(B_k)}. \end{align*}
Thus $Y$ is constant on each atom $B_k$, so $Y$ is $\mathcal G$-measurable.
Now let $G\in\mathcal G$. Since $\mathcal G$ is generated by the finite partition $B_1,\ldots,B_m$, there is a set of indices $J\subset\{1,\ldots,m\}$ such that
\begin{align*} G=\bigcup_{j\in J} B_j. \end{align*}
Using the displayed value of $Y$ on each $B_j$ and finite additivity of the integral over disjoint sets,
\begin{align*} \int_G Y\,d\mathbb P=\sum_{j\in J}\frac{\mathbb{P}(A\cap B_j)}{\mathbb{P}(B_j)}\mathbb{P}(B_j). \end{align*}
Since $\mathbb{P}(B_j)>0$, each factor $\mathbb{P}(B_j)$ cancels:
\begin{align*} \int_G Y\,d\mathbb P=\sum_{j\in J}\mathbb{P}(A\cap B_j). \end{align*}
The events $A\cap B_j$ for $j\in J$ are disjoint, and their union is $A\cap G$, so finite additivity gives
\begin{align*} \int_G Y\,d\mathbb P=\mathbb{P}(A\cap G). \end{align*}
Therefore
\begin{align*} \mathbb{P}(A\mid\mathcal G)=\sum_{j=1}^m \frac{\mathbb{P}(A\cap B_j)}{\mathbb{P}(B_j)}\mathbb{1}_{B_j}. \end{align*}
On the atom $B_j$, this conditional probability equals the ordinary conditional probability
\begin{align*} \mathbb{P}(A\mid B_j)=\frac{\mathbb{P}(A\cap B_j)}{\mathbb{P}(B_j)}. \end{align*}
[/example]
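On a finite sample space the partition formula and the averaging property $\int_G Y\,d\mathbb P=\mathbb{P}(A\cap G)$ can be checked for every $G\in\mathcal G$. The sketch below is illustrative: the six-point uniform space, the partition, and the event `A` are hypothetical choices, and the helper names are not from the text.

```python
from fractions import Fraction
from itertools import combinations

omega = list(range(6))
prob = {w: Fraction(1, 6) for w in omega}  # hypothetical uniform measure
partition = [{0, 1}, {2, 3, 4}, {5}]       # hypothetical blocks B_1, B_2, B_3
A = {1, 2, 3}                              # hypothetical event


def P(event):
    return sum(prob[w] for w in event)


def cond_prob(w):
    """Value of P(A | G) at w: P(A and B_j) / P(B_j) on the block containing w."""
    for block in partition:
        if w in block:
            return P(A & block) / P(block)
    raise ValueError("outcome not covered by the partition")


def averaging_property_holds():
    """Check that the integral of Y over G equals P(A and G) for every G in sigma(partition)."""
    for r in range(len(partition) + 1):
        for blocks in combinations(partition, r):
            G = set().union(*blocks)  # every G in the sigma-algebra is a union of blocks
            integral = sum(cond_prob(w) * prob[w] for w in G)
            if integral != P(A & G):
                return False
    return True


print([cond_prob(w) for w in omega], averaging_property_holds())
```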
### Indicator Processes
Events may depend on time, a threshold, or another index. Instead of repeatedly naming separate indicators, it is useful to regard the whole indexed family as a [stochastic process](/page/Stochastic%20Process), especially when time averages or filtrations enter the problem.
[definition: Indicator Process]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and let $(A_t)_{t\in T}$ be a family of events indexed by a set $T$. The indicator process associated to $(A_t)_{t\in T}$ is the stochastic process $(X_t)_{t\in T}$ such that, for each $t\in T$, the coordinate map $X_t: \Omega \to \{0,1\}$ is defined by $X_t=\mathbb{1}_{A_t}$.
[/definition]
For example, if $A_t$ is the event that a queue is nonempty at time $t$, then $X_t$ records whether the system is busy. Time averages of $X_t$ become proportions of time spent in the event.
[example: Occupation Time as an Integral of Indicators]
Let $(X_t)_{t\ge 0}$ be a real-valued stochastic process whose sample paths $t\mapsto X_t(\omega)$ are Borel-measurable, fix a Borel set $B\subset \mathbb R$, and fix a time horizon $T>0$. For a fixed outcome $\omega$, define
\begin{align*} f_\omega(t)=\mathbb{1}_{\{X_t(\omega)\in B\}},\qquad 0\le t\le T. \end{align*}
Since $t\mapsto X_t(\omega)$ is Borel-measurable and $B$ is Borel, the set
\begin{align*} E_\omega=\{t\in[0,T]:X_t(\omega)\in B\} \end{align*}
is a Borel subset of $[0,T]$. Therefore $f_\omega=\mathbb{1}_{E_\omega}$ is Lebesgue-measurable.
The occupation time of $B$ up to time $T$ is
\begin{align*} O_T(B)(\omega)=\int_0^T \mathbb{1}_{\{X_t(\omega)\in B\}}\,dt. \end{align*}
Because the integrand is the indicator of $E_\omega$, its integral is the [Lebesgue measure](/page/Lebesgue%20Measure) of $E_\omega$:
\begin{align*} O_T(B)(\omega)=\lambda(E_\omega)=\lambda(\{t\in[0,T]:X_t(\omega)\in B\}). \end{align*}
Thus $O_T(B)(\omega)$ records exactly the length of time during $[0,T]$ for which the sample path lies in $B$. If $X_t$ is the position of a particle and $B$ is a region of interest, then $O_T(B)$ is the time spent in that region over the interval $[0,T]$.
[/example]
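For a single sample path, the occupation time can be approximated numerically by integrating the indicator on a grid. The following is a rough sketch, not an exact computation: it uses a midpoint Riemann sum, a hypothetical deterministic path $t\mapsto\sin t$ standing in for $t\mapsto X_t(\omega)$, and the set $B=[0,\infty)$. On $[0,2\pi]$ the path lies in $B$ exactly for $t\in[0,\pi]$ and at $t=2\pi$, so the occupation time is $\pi$.

```python
import math


def occupation_time(path, in_B, T, steps=100_000):
    """Midpoint Riemann approximation of the integral of 1{path(t) in B} over [0, T]."""
    dt = T / steps
    # Each grid cell contributes dt when the path at the cell midpoint lies in B.
    return sum(dt for k in range(steps) if in_B(path((k + 0.5) * dt)))


approx = occupation_time(math.sin, lambda x: x >= 0, 2 * math.pi)
print(approx)  # close to pi
```

The approximation is reasonable here because the path crosses the boundary of $B$ only finitely often; for rougher paths a grid-based estimate may need far more care.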
## Limits and Rare Events
### Event Limits Through Indicators
Sequences of indicators encode sequences of events. This is valuable because event occurrence may stabilize, recur infinitely often, or become rare. To talk about these behaviours precisely, we first name the two standard set-theoretic limit events.
[definition: Eventual and Infinitely Often Events]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, and let $(A_n)_{n\ge 1}$ be a sequence of events. The event that $A_n$ occurs infinitely often is $\{A_n \text{ i.o.}\}=\limsup_n A_n=\bigcap_{n=1}^{\infty}\bigcup_{m\ge n}A_m$. The event that $A_n$ occurs for all sufficiently large $n$ is $\liminf_n A_n=\bigcup_{n=1}^{\infty}\bigcap_{m\ge n}A_m$.
[/definition]
The indicators $\mathbb{1}_{A_n}$ converge pointwise exactly when the membership of $\omega$ in $A_n$ eventually settles. If the upper and lower event limits differ, some outcomes keep switching between occurrence and non-occurrence instead of having a stable yes-or-no answer. To recognize exactly when a sequence of indicator variables has a pointwise limit, we need a criterion stated in terms of the limsup and liminf events.
[quotetheorem:10147]
Pointwise convergence is often too strong for probability applications. The probabilistic version asks only that the disagreement set between $A_n$ and $A$ have probability tending to zero.
[quotetheorem:10148]
The symmetric difference measures the probability that the two yes-or-no answers disagree. For indicators, convergence in probability is exactly disagreement probability tending to zero.
[example: Shrinking Intervals]
Let $U\sim \operatorname{Unif}(0,1)$, and set $A_n=\{U\le 1/n\}$. We show that $\mathbb{1}_{A_n}\to 0$ almost surely and that the expectations also tend to $0$.
Fix an outcome $\omega$ with $U(\omega)>0$. By the [Archimedean property](/theorems/737), choose an integer $N$ such that $N>1/U(\omega)$. If $n\ge N$, then $n>1/U(\omega)$, so
\begin{align*} \frac{1}{n}<U(\omega). \end{align*}
Thus $\omega\notin A_n$ for every $n\ge N$, and hence
\begin{align*} \mathbb{1}_{A_n}(\omega)=0 \end{align*}
for every $n\ge N$. Therefore $\mathbb{1}_{A_n}(\omega)\to 0$ for every $\omega$ with $U(\omega)>0$. Since $U$ has the uniform distribution on $(0,1)$, the exceptional event satisfies
\begin{align*} \mathbb{P}(U=0)=0. \end{align*}
So $\mathbb{1}_{A_n}\to 0$ almost surely.
For the expectation, $\mathbb{1}_{A_n}$ takes value $1$ on $A_n$ and value $0$ on $A_n^c$, so
\begin{align*} \mathbb{E}[\mathbb{1}_{A_n}]=1\cdot \mathbb{P}(A_n)+0\cdot \mathbb{P}(A_n^c). \end{align*}
Hence
\begin{align*} \mathbb{E}[\mathbb{1}_{A_n}]=\mathbb{P}(A_n). \end{align*}
Because $A_n=\{U\le 1/n\}$ and $1/n\in(0,1]$, the uniform distribution gives
\begin{align*} \mathbb{P}(A_n)=\mathbb{P}(U\le 1/n)=\frac{1}{n}. \end{align*}
Therefore
\begin{align*} \mathbb{E}[\mathbb{1}_{A_n}]=\frac{1}{n}. \end{align*}
Since $1/n\to 0$, the rare-event probabilities and the corresponding indicator expectations both vanish.
[/example]
### Rare-Event Counts
Rare-event counts are sums of indicators where each event has small probability but many opportunities to occur. This setting needs a theorem that explains when the aggregate count has a Poisson limit rather than collapsing to $0$.
[quotetheorem:10149]
This result says that many nearly impossible independent events can produce a non-degenerate count. Indicators isolate the microscopic events; the Poisson distribution describes their aggregate.
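A numerical comparison makes the approximation concrete in the simplest case of $n$ independent events with common probability $\lambda/n$, where the count is binomial. The sketch below is an illustration under that assumption, with an illustrative function name; it computes the total variation distance between the $\operatorname{Bin}(n,\lambda/n)$ and $\operatorname{Poisson}(\lambda)$ laws in floating point and shows it shrinking as $n$ grows.

```python
import math


def tv_binomial_poisson(n, lam):
    """Total variation distance between Bin(n, lam/n) and Poisson(lam)."""
    p = lam / n
    b = (1 - p) ** n    # binomial mass at k = 0
    q = math.exp(-lam)  # Poisson mass at k = 0
    l1 = 0.0
    poisson_mass = 0.0
    for k in range(n + 1):
        l1 += abs(b - q)
        poisson_mass += q
        # Recurrences for the next masses; avoids large factorials.
        b *= (n - k) / (k + 1) * p / (1 - p)
        q *= lam / (k + 1)
    # The binomial puts no mass above n, so the remaining Poisson tail counts fully.
    l1 += max(0.0, 1.0 - poisson_mass)
    return 0.5 * l1


for n in (10, 100, 1000):
    print(n, tv_binomial_poisson(n, 2.0))
```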
[example: Birthday Collisions as Indicator Events]
Suppose $m$ people independently choose birthdays uniformly from $365$ days. For each pair $\{i,j\}$ with $1\le i<j\le m$, let $A_{ij}$ be the event that persons $i$ and $j$ have the same birthday. Define
\begin{align*} C=\sum_{1\le i<j\le m}\mathbb{1}_{A_{ij}}. \end{align*}
For a fixed birthday assignment $\omega$, the summand $\mathbb{1}_{A_{ij}}(\omega)$ equals $1$ exactly when persons $i$ and $j$ match, and equals $0$ otherwise. Therefore $C(\omega)$ is the number of matching pairs of people in that assignment.
Fix a pair $\{i,j\}$. Person $i$ has some birthday, and person $j$ matches it with probability $1/365$, because person $j$ chooses uniformly from the $365$ possible days independently of person $i$. Thus
\begin{align*} \mathbb{P}(A_{ij})=\frac{1}{365}. \end{align*}
Since $\mathbb{1}_{A_{ij}}$ takes only the values $0$ and $1$,
\begin{align*} \mathbb{E}[\mathbb{1}_{A_{ij}}]=1\cdot \mathbb{P}(A_{ij})+0\cdot \mathbb{P}(A_{ij}^c). \end{align*}
Substituting $\mathbb{P}(A_{ij})=1/365$ gives
\begin{align*} \mathbb{E}[\mathbb{1}_{A_{ij}}]=\frac{1}{365}. \end{align*}
Using finite linearity of expectation,
\begin{align*} \mathbb{E}[C]=\mathbb{E}\left[\sum_{1\le i<j\le m}\mathbb{1}_{A_{ij}}\right]=\sum_{1\le i<j\le m}\mathbb{E}[\mathbb{1}_{A_{ij}}]. \end{align*}
There is one pair for each two-element subset of the $m$ people, so the number of summands is $\binom{m}{2}$. Each summand has expectation $1/365$, hence
\begin{align*} \mathbb{E}[C]=\sum_{1\le i<j\le m}\frac{1}{365}=\binom{m}{2}\frac{1}{365}=\frac{\binom{m}{2}}{365}. \end{align*}
The event of at least one shared birthday is
\begin{align*} \{C\ge 1\}. \end{align*}
Thus the indicators make the expected number of matching pairs easy to compute, while the probability of at least one shared birthday asks for the probability that this count is positive.
[/example]
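The expected number of matching pairs is easy to evaluate and, for a tiny calendar, to confirm by enumeration. This is a minimal sketch with illustrative function names; the `days` parameter is an assumption added so that the brute-force check stays small, and the value $m=23$ is only a sample input.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def expected_matching_pairs(m, days=365):
    """E[C] = C(m, 2) / days, by linearity of expectation."""
    return Fraction(comb(m, 2), days)


def expected_matching_pairs_by_enumeration(m, days):
    """Average number of matching pairs over all days**m equally likely assignments."""
    total = 0
    for assignment in product(range(days), repeat=m):
        total += sum(
            1 for i, j in combinations(range(m), 2) if assignment[i] == assignment[j]
        )
    return Fraction(total, days**m)


print(expected_matching_pairs(23))                   # 253/365
print(expected_matching_pairs_by_enumeration(4, 5))  # 6/5
```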
## Beyond and Connected Topics
Indicator random variables sit immediately below the general theory of [random variables](/page/Random%20Variable). Apart from the trivial cases $A=\emptyset$ and $A=\Omega$, where $\mathbb{1}_A$ is constant, they are the simplest nonconstant random variables, and they are the first examples where measurability matters: the event $A$ must belong to $\mathcal F$ for $\mathbb{1}_A$ to be a random variable.
In measure theory, indicators are the atoms from which simple functions and Lebesgue integrals are built. The identity $\int \mathbb{1}_B\,d\mu=\mu(B)$ is the starting point for extending integration from sets to nonnegative [measurable functions](/page/Measurable%20Functions) and then to integrable signed functions.
In elementary probability, indicators are the standard method for expected counts. This includes fixed points of permutations, coupon collection, occupancy problems, records, and subgraph counts in random graphs. The connection with [Cambridge II Graph Theory](/page/Cambridge%20II%20Graph%20Theory) appears through probabilistic counting arguments on random graphs.
In advanced probability, indicator processes and conditional expectations of indicators lead toward martingales, stopping times, occupation times, and point processes. The expression $\mathbb{P}(A\mid\mathcal G)=\mathbb{E}[\mathbb{1}_A\mid\mathcal G]$ is a gateway from elementary conditional probability to the measure-theoretic formulation used in stochastic processes.
For rare events, sums of indicators are the natural input for Poisson approximation, the [second moment method](/theorems/2057), concentration inequalities, and Borel-Cantelli arguments. In each case, the hard problem is not defining the indicators, but understanding how much dependence remains among them.
These directions are extensions of the same basic device rather than replacements for the central message: indicators turn events into random variables so that integration, sums, and limits can act on them.
## References
Androma, [Cambridge IA Probability](/page/Cambridge%20IA%20Probability).
Androma, [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure).
Androma, [Cambridge III Advanced Probability](/page/Cambridge%20III%20Advanced%20Probability).
Androma, [Cambridge II Graph Theory](/page/Cambridge%20II%20Graph%20Theory).
Androma, [Random Variable](/page/Random%20Variable).
William Feller, *An Introduction to Probability Theory and Its Applications, Volume I* (1968).
Patrick Billingsley, *Probability and Measure* (1995).
Rick Durrett, *Probability: Theory and Examples* (2019).
Geoffrey Grimmett and David Stirzaker, *Probability and Random Processes* (2020).
Indicator Random Variable
Also known as: Indicator variable, Indicator function, Event indicator, Bernoulli indicator, Dummy variable, Indicator random variable