Exchangeability Identification Formula — Statement & Proof

Exchangeability Identification Formula (Theorem # 9656)

Proof

[proofplan] Fix the treatment value $x$ and compare three laws: the marginal law of $Y_x$, the conditional law of $Y_x$ among units with $X=x$, and the conditional law of the observed outcome $Y$ among those units. Marginal exchangeability equates the first two laws. Consistency equates the last two because $Y=Y_x$ on the event $\{X=x\}$. The expectation identity then follows from equality of laws by applying the law identity to bounded truncations and passing to the limit using integrability. [/proofplan]

[step:Use exchangeability to condition on the observed treatment group] Fix $x\in\{0,1\}$ such that $\mathbb P(X=x)>0$, and fix a Borel set $A\in\mathcal B(\mathbb R)$. Define the event
\begin{align*} E_A:=\{\omega\in\Omega:Y_x(\omega)\in A\}. \end{align*}
Since $Y_x$ is $\mathcal F$-measurable and $A$ is Borel, $E_A\in\mathcal F$. Marginal exchangeability says that $Y_x$ is independent of $X$, hence $E_A$ is independent of the event $\{X=x\}$. Therefore
\begin{align*} \mathbb P(E_A\cap\{X=x\})=\mathbb P(E_A)\mathbb P(X=x). \end{align*}
Since $\mathbb P(X=x)>0$, division by $\mathbb P(X=x)$ gives
\begin{align*} \mathbb P(Y_x\in A\mid X=x)=\mathbb P(Y_x\in A). \end{align*}

[guided] We first isolate exactly what marginal exchangeability gives. Fix $x\in\{0,1\}$ with $\mathbb P(X=x)>0$, and let $A\in\mathcal B(\mathbb R)$. Define
\begin{align*} E_A:=\{\omega\in\Omega:Y_x(\omega)\in A\}. \end{align*}
This is an event because $Y_x:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ is measurable and $A$ is Borel.

Marginal exchangeability means that the potential outcome $Y_x$ is independent of the treatment $X$. Equivalently, every event determined by $Y_x$ is independent of every event determined by $X$. The event $E_A$ is determined by $Y_x$, and the event $\{X=x\}$ is determined by $X$, so
\begin{align*} \mathbb P(E_A\cap\{X=x\})=\mathbb P(E_A)\mathbb P(X=x). \end{align*}
The positivity hypothesis $\mathbb P(X=x)>0$ is exactly what permits conditioning on the group with observed treatment value $x$. Dividing by $\mathbb P(X=x)$ yields
\begin{align*} \mathbb P(Y_x\in A\mid X=x)=\frac{\mathbb P(E_A\cap\{X=x\})}{\mathbb P(X=x)}=\mathbb P(E_A)=\mathbb P(Y_x\in A). \end{align*}
Thus exchangeability equates the marginal law of the potential outcome with its conditional law inside the observed treatment group. [/guided] [/step]

[step:Use consistency to replace the potential outcome by the observed outcome] Define the event
\begin{align*} F_A:=\{\omega\in\Omega:Y(\omega)\in A\}. \end{align*}
By consistency, $Y=Y_x$ on $\{X=x\}$. Hence
\begin{align*} E_A\cap\{X=x\}=F_A\cap\{X=x\}. \end{align*}
Dividing the probabilities of these equal events by $\mathbb P(X=x)$ gives
\begin{align*} \mathbb P(Y_x\in A\mid X=x)=\mathbb P(Y\in A\mid X=x). \end{align*}
Combining this identity with the previous step gives
\begin{align*} \mathbb P(Y_x\in A)=\mathbb P(Y\in A\mid X=x). \end{align*}

[guided] We now use consistency to connect the potential outcome to the observed outcome. Define the observed-outcome event
\begin{align*} F_A:=\{\omega\in\Omega:Y(\omega)\in A\}. \end{align*}
This is an event because $Y:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ is measurable and $A\in\mathcal B(\mathbb R)$.

Consistency says that, on the event $\{X=x\}$, the observed outcome is exactly the potential outcome under treatment value $x$. Therefore, for every $\omega$ with $X(\omega)=x$, we have $Y_x(\omega)\in A$ if and only if $Y(\omega)\in A$. Hence
\begin{align*} E_A\cap\{X=x\}=F_A\cap\{X=x\}. \end{align*}
Because $\mathbb P(X=x)>0$, conditional probabilities given $X=x$ are obtained by dividing by $\mathbb P(X=x)$. Thus
\begin{align*} \mathbb P(Y_x\in A\mid X=x)=\frac{\mathbb P(E_A\cap\{X=x\})}{\mathbb P(X=x)}=\frac{\mathbb P(F_A\cap\{X=x\})}{\mathbb P(X=x)}=\mathbb P(Y\in A\mid X=x). \end{align*}
The preceding step proved $\mathbb P(Y_x\in A\mid X=x)=\mathbb P(Y_x\in A)$. Combining the two equalities gives
\begin{align*} \mathbb P(Y_x\in A)=\mathbb P(Y\in A\mid X=x). \end{align*}
[/guided] [/step]

[step:Pass from equality of laws to equality of expectations by truncation] Assume $Y_x\in L^1(\Omega,\mathcal F,\mathbb P)$. Define the [conditional probability measure](/theorems/4972)
\begin{align*} \mathbb P_x^Y(B):=\mathbb P(B\mid X=x)=\frac{\mathbb P(B\cap\{X=x\})}{\mathbb P(X=x)} \end{align*}
for every $B\in\mathcal F$. Here and below, $\mathbb E[\,\cdot\mid X=x]$ denotes expectation with respect to $\mathbb P_x^Y$. The law identity already proved says that, under $\mathbb P$, the [random variable](/page/Random%20Variable) $Y_x$ has the same distribution as $Y$ under $\mathbb P_x^Y$. Consequently, if $g:\mathbb R\to\mathbb R$ is bounded and Borel, then
\begin{align*} \mathbb E[g(Y_x)]=\mathbb E[g(Y)\mid X=x]. \end{align*}
Indeed, this follows first for indicator functions $g=\mathbb 1_B$ with $B\in\mathcal B(\mathbb R)$ from the equality of laws, then for non-negative simple Borel functions by linearity. If $g$ is bounded and non-negative, choose finite-valued Borel simple functions $s_n:\mathbb R\to[0,\infty)$ with $\|s_n-g\|_\infty\to0$; on each side, the expectation of $g$ differs from the expectation of $s_n$ by at most $\|s_n-g\|_\infty$, so the identity for $s_n$ passes to $g$ as $n\to\infty$. Applying this to the positive and negative parts gives the identity for every bounded Borel $g:\mathbb R\to\mathbb R$.

For each $n\in\mathbb N$, define the bounded Borel truncation map
\begin{align*} \tau_n:\mathbb R&\to\mathbb R \end{align*}
by
\begin{align*} \tau_n(t):=\max\{-n,\min\{t,n\}\}. \end{align*}
Equality of distributions gives
\begin{align*} \mathbb E[\tau_n(Y_x)]=\mathbb E[\tau_n(Y)\mid X=x]. \end{align*}
Also define
\begin{align*} \sigma_n:\mathbb R&\to[0,\infty) \end{align*}
by
\begin{align*} \sigma_n(t):=\min\{|t|,n\}. \end{align*}
Again by equality of distributions,
\begin{align*} \mathbb E[\sigma_n(Y)\mid X=x]=\mathbb E[\sigma_n(Y_x)]. \end{align*}
Since $\sigma_n(Y_x)\uparrow |Y_x|$, the [monotone convergence theorem](/theorems/509) gives
\begin{align*} \lim_{n\to\infty}\mathbb E[\sigma_n(Y_x)]=\mathbb E[|Y_x|]<\infty. \end{align*}
Since also $\sigma_n(Y)\uparrow |Y|$, the monotone convergence theorem under $\mathbb P_x^Y$ gives
\begin{align*} \mathbb E[|Y|\mid X=x]=\lim_{n\to\infty}\mathbb E[\sigma_n(Y)\mid X=x]=\lim_{n\to\infty}\mathbb E[\sigma_n(Y_x)]=\mathbb E[|Y_x|]<\infty. \end{align*}
Therefore $Y$ is integrable under the [conditional probability](/page/Conditional%20Probability) measure $\mathbb P_x^Y$.

Since $|\tau_n(Y_x)|\le |Y_x|$ and $\tau_n(Y_x)\to Y_x$ pointwise, the [dominated convergence theorem](/theorems/4) gives
\begin{align*} \lim_{n\to\infty}\mathbb E[\tau_n(Y_x)]=\mathbb E[Y_x]. \end{align*}
Since $|\tau_n(Y)|\le |Y|$ and $\tau_n(Y)\to Y$ $\mathbb P_x^Y$-a.s., the [dominated convergence theorem](/theorems/7529) under $\mathbb P_x^Y$ gives
\begin{align*} \lim_{n\to\infty}\mathbb E[\tau_n(Y)\mid X=x]=\mathbb E[Y\mid X=x]. \end{align*}
Passing to the limit in
\begin{align*} \mathbb E[\tau_n(Y_x)]=\mathbb E[\tau_n(Y)\mid X=x] \end{align*}
therefore yields
\begin{align*} \mathbb E[Y_x]=\mathbb E[Y\mid X=x]. \end{align*}
This proves the distributional and expectation identification formulas.

[guided] We have already proved equality of the two laws:
\begin{align*} \mathbb P(Y_x\in A)=\mathbb P(Y\in A\mid X=x) \end{align*}
for every $A\in\mathcal B(\mathbb R)$. To turn this law identity into an expectation identity, define the conditional probability measure $\mathbb P_x^Y$ on $(\Omega,\mathcal F)$ by
\begin{align*} \mathbb P_x^Y(B):=\mathbb P(B\mid X=x)=\frac{\mathbb P(B\cap\{X=x\})}{\mathbb P(X=x)} \end{align*}
for $B\in\mathcal F$. The positivity assumption $\mathbb P(X=x)>0$ makes this definition valid, and $\mathbb E[\,\cdot\mid X=x]$ denotes expectation with respect to $\mathbb P_x^Y$. Under $\mathbb P$, the random variable $Y_x$ has the same distribution as $Y$ under $\mathbb P_x^Y$. Therefore, for every bounded Borel map $g:\mathbb R\to\mathbb R$,
\begin{align*} \mathbb E[g(Y_x)]=\mathbb E[g(Y)\mid X=x]. \end{align*}
The justification is the standard approximation by simple functions: the identity holds for indicators of Borel sets by equality of laws, hence for non-negative simple Borel functions by linearity, then for bounded non-negative Borel functions by uniform approximation with finite-valued Borel simple functions, and finally for bounded signed Borel functions by applying the result to positive and negative parts.

For $n\in\mathbb N$, define the bounded Borel truncation map $\tau_n:\mathbb R\to\mathbb R$ by
\begin{align*} \tau_n(t):=\max\{-n,\min\{t,n\}\}. \end{align*}
Applying the bounded-Borel identity with $g=\tau_n$ gives
\begin{align*} \mathbb E[\tau_n(Y_x)]=\mathbb E[\tau_n(Y)\mid X=x]. \end{align*}
To verify conditional integrability of $Y$, define $\sigma_n:\mathbb R\to[0,\infty)$ by
\begin{align*} \sigma_n(t):=\min\{|t|,n\}. \end{align*}
Again the bounded-Borel identity gives
\begin{align*} \mathbb E[\sigma_n(Y)\mid X=x]=\mathbb E[\sigma_n(Y_x)]. \end{align*}
Since $\sigma_n(Y_x)\uparrow |Y_x|$ pointwise and $Y_x\in L^1(\Omega,\mathcal F,\mathbb P)$, the monotone convergence theorem gives
\begin{align*} \lim_{n\to\infty}\mathbb E[\sigma_n(Y_x)]=\mathbb E[|Y_x|]<\infty. \end{align*}
Likewise $\sigma_n(Y)\uparrow |Y|$ pointwise, so the monotone convergence theorem under $\mathbb P_x^Y$ gives
\begin{align*} \mathbb E[|Y|\mid X=x]=\lim_{n\to\infty}\mathbb E[\sigma_n(Y)\mid X=x]=\lim_{n\to\infty}\mathbb E[\sigma_n(Y_x)]=\mathbb E[|Y_x|]<\infty. \end{align*}
It follows that $Y$ is integrable under $\mathbb P_x^Y$, which is exactly conditional integrability on $\{X=x\}$.

Now $|\tau_n(Y_x)|\le |Y_x|$ and $\tau_n(Y_x)\to Y_x$ pointwise, so the dominated convergence theorem gives
\begin{align*} \lim_{n\to\infty}\mathbb E[\tau_n(Y_x)]=\mathbb E[Y_x]. \end{align*}
Similarly, $|\tau_n(Y)|\le |Y|$ and $\tau_n(Y)\to Y$ $\mathbb P_x^Y$-almost surely, while $Y$ is integrable under $\mathbb P_x^Y$. Applying the dominated convergence theorem under $\mathbb P_x^Y$ gives
\begin{align*} \lim_{n\to\infty}\mathbb E[\tau_n(Y)\mid X=x]=\mathbb E[Y\mid X=x]. \end{align*}
Passing to the limit in the truncation identity therefore yields
\begin{align*} \mathbb E[Y_x]=\mathbb E[Y\mid X=x]. \end{align*}
[/guided] [/step]
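
The identity $\mathbb E[Y_x]=\mathbb E[Y\mid X=x]$ can be checked numerically. The following is a minimal simulation sketch, not part of the proof. The data-generating model (Gaussian potential outcomes, a treatment effect of 2, a treatment probability of 0.3) and the name `simulate` are illustrative assumptions. With `confounded=False` the treatment is drawn independently of the potential outcomes, so marginal exchangeability holds; with `confounded=True` the treatment depends on the potential outcome, and the identity is expected to fail.

```python
import random


def simulate(n=200_000, seed=0, confounded=False):
    """Return (mean of Y_1 over all units, mean of observed Y among X = 1).

    Hypothetical model, chosen only for illustration:
    Y_0 = U and Y_1 = U + 2 with U standard normal.
    """
    rng = random.Random(seed)
    sum_y1 = 0.0      # accumulates the potential outcome Y_1 over all units
    sum_obs = 0.0     # accumulates the observed Y over units with X = 1
    n_treated = 0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        y0, y1 = u, u + 2.0
        if confounded:
            # Treatment depends on U, so Y_1 is NOT independent of X.
            x = 1 if u > 0 else 0
        else:
            # Treatment is independent of (Y_0, Y_1): marginal exchangeability.
            x = 1 if rng.random() < 0.3 else 0
        # Consistency: the observed outcome is the potential outcome
        # under the treatment actually received.
        y = y1 if x == 1 else y0
        sum_y1 += y1
        if x == 1:
            sum_obs += y
            n_treated += 1
    return sum_y1 / n, sum_obs / n_treated


if __name__ == "__main__":
    print("exchangeable:", simulate())
    print("confounded:  ", simulate(confounded=True))
```

In the exchangeable run the two returned means should agree up to Monte Carlo error. In the confounded run the conditional mean of $Y$ given $X=1$ is shifted upward by roughly $\mathbb E[U\mid U>0]$, which illustrates why the independence hypothesis cannot be dropped.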
