[proofplan]
The proof is a direct identification calculation. Since $A$ and $\varepsilon$ are integrable and $Y=\alpha+\beta A+\varepsilon$, the outcome $Y$ is integrable. Since the events $\{Z=0\}$ and $\{Z=1\}$ both have positive probability, the conditional means of $Y$, $A$, and $\varepsilon$ given these events are well-defined and can be computed by linearity of expectation. Subtracting the two conditional mean equations cancels the intercept, and the equality of conditional error means cancels the error contribution. The remaining reduced-form difference is $\beta$ times the first-stage difference. Because the first stage is nonzero, we can divide by it, which identifies $\beta$ as the Wald ratio.
[/proofplan]
[step:Verify that the conditional means in the Wald ratio are well-defined]
Because $A$ and $\varepsilon$ are integrable and $\alpha,\beta\in\mathbb R$, the [random variable](/page/Random%20Variable) $Y=\alpha+\beta A+\varepsilon$ is integrable. Indeed, by the triangle inequality and monotonicity of expectation,
\begin{align*} \mathbb E[|Y|]\le |\alpha|+|\beta|\mathbb E[|A|]+\mathbb E[|\varepsilon|]<\infty. \end{align*}
Since $\mathbb P(Z=0)>0$ and $\mathbb P(Z=1)>0$, take any $z\in\{0,1\}$ and any $X\in\{Y,A,\varepsilon\}$. The conditional expectation of $X$ given the positive-probability event $\{Z=z\}$ is
\begin{align*} \mathbb E[X\mid Z=z]=\frac{\mathbb E[X\mathbf 1_{\{Z=z\}}]}{\mathbb P(Z=z)}. \end{align*}
This is a finite real number, because $|X\mathbf 1_{\{Z=z\}}|\le|X|$ and $X$ is integrable.
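On a finite probability space, the elementary definition $\mathbb E[X\mid Z=z]=\mathbb E[X\mathbf 1_{\{Z=z\}}]/\mathbb P(Z=z)$ reduces to a weighted average over the outcomes in $\{Z=z\}$. The Python sketch below computes it exactly with rational arithmetic. The helper `cond_mean` and the four-point sample space are hypothetical illustrations and do not come from the source. The helper refuses to divide when $\mathbb P(Z=z)=0$, which is exactly the case that the positivity hypotheses rule out.

```python
from fractions import Fraction

# Hypothetical four-point sample space (illustration only).
# Each row is (P({omega}), Z(omega), X(omega)); the probabilities sum to 1.
OMEGA = [
    (Fraction(1, 4), 0, Fraction(2)),
    (Fraction(1, 4), 0, Fraction(-1)),
    (Fraction(1, 3), 1, Fraction(5)),
    (Fraction(1, 6), 1, Fraction(0)),
]


def cond_mean(outcomes, z):
    """Return E[X | Z = z] = E[X 1{Z = z}] / P(Z = z); undefined if P(Z = z) = 0."""
    p_event = sum(p for p, z_val, _ in outcomes if z_val == z)
    if p_event == 0:
        raise ValueError(f"P(Z = {z}) = 0, so E[X | Z = {z}] is undefined")
    return sum(p * x for p, z_val, x in outcomes if z_val == z) / p_event


for z in (0, 1):
    print(f"E[X | Z = {z}] = {cond_mean(OMEGA, z)}")
```

In this example $\mathbb P(Z=0)=\mathbb P(Z=1)=1/2$. Calling `cond_mean(OMEGA, 2)` raises `ValueError`, because the event $\{Z=2\}$ has probability zero.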
[/step]
[step:Compute the conditional outcome mean at each instrument level]
Fix $z\in\{0,1\}$ and let $E_z\in\mathcal F$ denote the event $E_z:=\{\omega\in\Omega:Z(\omega)=z\}$. Since $\mathbb P(E_z)>0$, conditioning on $Z=z$ means taking expectations under the probability measure $\mathbb P(\,\cdot\,\mid E_z)$. Substituting the structural equation gives
\begin{align*} \mathbb E[Y\mid Z=z]=\mathbb E[\alpha+\beta A+\varepsilon\mid Z=z]. \end{align*}
Because $A$ and $\varepsilon$ are integrable, linearity of expectation under $\mathbb P(\,\cdot\,\mid E_z)$ yields
\begin{align*} \mathbb E[Y\mid Z=z]=\alpha+\beta\mathbb E[A\mid Z=z]+\mathbb E[\varepsilon\mid Z=z]. \end{align*}
[guided]
We first translate the structural equation into an equation for conditional means. Fix $z\in\{0,1\}$, and define the event
\begin{align*} E_z:=\{\omega\in\Omega:Z(\omega)=z\}. \end{align*}
The hypotheses $\mathbb P(Z=0)>0$ and $\mathbb P(Z=1)>0$ imply $\mathbb P(E_z)>0$, so [conditional expectation](/page/Conditional%20Expectation) given $Z=z$ is ordinary expectation with respect to the [conditional probability measure](/theorems/4972) $\mathbb P(\,\cdot\,\mid E_z)$.
The structural equation is
\begin{align*} Y=\alpha+\beta A+\varepsilon. \end{align*}
All three terms on the right are integrable: constants are integrable on a [probability space](/page/Probability%20Space), $A$ and $\varepsilon$ are integrable by hypothesis, and multiplication by the finite constant $\beta$ preserves integrability. Integrability under $\mathbb P$ carries over to the [conditional probability](/page/Conditional%20Probability) measure $\mathbb P(\,\cdot\,\mid E_z)$, since $\mathbb E[|X|\mid Z=z]\le\mathbb E[|X|]/\mathbb P(E_z)$ for every integrable $X$. Hence linearity of expectation applies under $\mathbb P(\,\cdot\,\mid E_z)$. Substituting the structural equation gives
\begin{align*} \mathbb E[Y\mid Z=z]=\mathbb E[\alpha+\beta A+\varepsilon\mid Z=z]. \end{align*}
Linearity gives
\begin{align*} \mathbb E[Y\mid Z=z]=\mathbb E[\alpha\mid Z=z]+\mathbb E[\beta A\mid Z=z]+\mathbb E[\varepsilon\mid Z=z]. \end{align*}
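Since $\mathbb P(\,\cdot\,\mid E_z)$ is a probability measure, the expectation of the constant $\alpha$ is $\alpha$ itself:
\begin{align*} \mathbb E[\alpha\mid Z=z]=\alpha. \end{align*}
Since $\beta$ is a finite constant, homogeneity of expectation gives
\begin{align*} \mathbb E[\beta A\mid Z=z]=\beta\mathbb E[A\mid Z=z]. \end{align*}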
Combining these identities yields
\begin{align*} \mathbb E[Y\mid Z=z]=\alpha+\beta\mathbb E[A\mid Z=z]+\mathbb E[\varepsilon\mid Z=z]. \end{align*}
This is the key reduction. The causal slope $\beta$ now appears as the coefficient of the conditional treatment mean $\mathbb E[A\mid Z=z]$, in an equation whose only unobservable ingredient is the conditional error mean $\mathbb E[\varepsilon\mid Z=z]$.
[/guided]
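The identity can be checked exactly on a small example. The Python sketch below assumes the hypothetical values $\alpha=3$ and $\beta=-2$, together with a made-up joint law of $(Z,A,\varepsilon)$ on four outcomes. The helpers `cond_mean` and `both_sides` are illustrative names only. The conditional error means deliberately differ across $z$. This shows that the identity relies only on linearity and does not need the equality of conditional error means.

```python
from fractions import Fraction as F

# Hypothetical structural parameters (illustration only).
ALPHA, BETA = F(3), F(-2)
# Hypothetical joint law: each row is (probability, z, a, eps).
# E[eps | Z = z] differs across z on purpose; only linearity is being checked.
OMEGA = [
    (F(1, 5), 0, F(0), F(1)),
    (F(1, 5), 0, F(1), F(-3)),
    (F(1, 10), 1, F(1), F(2)),
    (F(1, 2), 1, F(0), F(7)),
]


def cond_mean(f, z):
    """E[f(A, eps) | Z = z] on the finite space OMEGA; requires P(Z = z) > 0."""
    p_event = sum(p for p, z_val, _, _ in OMEGA if z_val == z)
    if p_event == 0:
        raise ValueError(f"P(Z = {z}) = 0")
    return sum(p * f(a, e) for p, z_val, a, e in OMEGA if z_val == z) / p_event


def both_sides(z):
    """Return (E[Y | Z=z], alpha + beta*E[A | Z=z] + E[eps | Z=z]) with Y = alpha + beta*A + eps."""
    lhs = cond_mean(lambda a, e: ALPHA + BETA * a + e, z)
    rhs = ALPHA + BETA * cond_mean(lambda a, e: a, z) + cond_mean(lambda a, e: e, z)
    return lhs, rhs


for z in (0, 1):
    lhs, rhs = both_sides(z)
    print(f"z={z}: E[Y|Z=z] = {lhs}, alpha + beta*E[A|Z=z] + E[eps|Z=z] = {rhs}")
```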
[/step]
[step:Subtract the two conditional mean equations and cancel the error term]
Applying the preceding identity with $z=1$ and $z=0$ gives
\begin{align*} \mathbb E[Y\mid Z=1]=\alpha+\beta\mathbb E[A\mid Z=1]+\mathbb E[\varepsilon\mid Z=1]. \end{align*}
Also,
\begin{align*} \mathbb E[Y\mid Z=0]=\alpha+\beta\mathbb E[A\mid Z=0]+\mathbb E[\varepsilon\mid Z=0]. \end{align*}
Subtracting the second equality from the first cancels the intercept $\alpha$ and gives
\begin{align*} \mathbb E[Y\mid Z=1]-\mathbb E[Y\mid Z=0]=\beta\bigl(\mathbb E[A\mid Z=1]-\mathbb E[A\mid Z=0]\bigr)+\bigl(\mathbb E[\varepsilon\mid Z=1]-\mathbb E[\varepsilon\mid Z=0]\bigr). \end{align*}
By the assumed equality of conditional error means, $\mathbb E[\varepsilon\mid Z=1]=\mathbb E[\varepsilon\mid Z=0]$. The error contrast in the last parentheses therefore vanishes. Hence
\begin{align*} \mathbb E[Y\mid Z=1]-\mathbb E[Y\mid Z=0]=\beta\bigl(\mathbb E[A\mid Z=1]-\mathbb E[A\mid Z=0]\bigr). \end{align*}
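Both cancellations can be seen in a concrete case. The Python sketch below uses a hypothetical four-outcome law, which is an illustrative assumption and not from the source. In this law, $\varepsilon$ is larger when $A=1$, so treatment is confounded, yet $\mathbb E[\varepsilon\mid Z=0]=\mathbb E[\varepsilon\mid Z=1]=2$. The helpers `cond_mean` and `contrasts` are hypothetical names. For $\beta=3$ and several intercepts, the sketch computes three contrasts exactly: the reduced-form contrast, the first-stage contrast, and the error contrast. The reduced form equals $\beta$ times the first stage in every case. The intercept never changes it, and the error means need to be equal but not zero.

```python
from fractions import Fraction as F

# Hypothetical joint law of (Z, A, eps); each row is (probability, z, a, eps).
# eps is larger when A = 1 (confounding), yet E[eps | Z = 0] = E[eps | Z = 1] = 2.
OMEGA = [
    (F(1, 4), 0, 0, F(0)),
    (F(1, 4), 0, 1, F(4)),
    (F(1, 8), 1, 0, F(-1)),
    (F(3, 8), 1, 1, F(3)),
]


def cond_mean(f, z):
    """E[f(A, eps) | Z = z] on OMEGA; both z = 0 and z = 1 have probability 1/2 here."""
    p_event = sum(p for p, z_val, _, _ in OMEGA if z_val == z)
    return sum(p * f(a, e) for p, z_val, a, e in OMEGA if z_val == z) / p_event


def contrasts(alpha, beta):
    """Return (reduced form, first stage, error contrast) for Y = alpha + beta*A + eps."""
    def outcome(a, e):
        return alpha + beta * a + e

    reduced_form = cond_mean(outcome, 1) - cond_mean(outcome, 0)
    first_stage = cond_mean(lambda a, e: a, 1) - cond_mean(lambda a, e: a, 0)
    error_contrast = cond_mean(lambda a, e: e, 1) - cond_mean(lambda a, e: e, 0)
    return reduced_form, first_stage, error_contrast


beta = F(3)
for alpha in (F(0), F(1), F(-7)):
    rf, fs, ec = contrasts(alpha, beta)
    print(f"alpha={alpha}: reduced form={rf}, beta*first stage={beta * fs}, error contrast={ec}")
```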
[/step]
[step:Divide by the nonzero first stage to identify the Wald estimand]
Define the first-stage mean difference $\Delta_A\in\mathbb R$ by
\begin{align*} \Delta_A:=\mathbb E[A\mid Z=1]-\mathbb E[A\mid Z=0]. \end{align*}
The relevance hypothesis says $\Delta_A\ne 0$. The preceding step gives
\begin{align*} \mathbb E[Y\mid Z=1]-\mathbb E[Y\mid Z=0]=\beta\Delta_A. \end{align*}
Dividing both sides by $\Delta_A$ yields
\begin{align*} \frac{\mathbb E[Y\mid Z=1]-\mathbb E[Y\mid Z=0]}{\mathbb E[A\mid Z=1]-\mathbb E[A\mid Z=0]}=\beta. \end{align*}
By the definition of $\beta_{\mathrm{Wald}}$, this is precisely
\begin{align*} \beta_{\mathrm{Wald}}=\beta. \end{align*}
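The result is a population identity. As an informal check, and not part of the proof, the Python sketch below simulates a hypothetical data-generating process with the following features:

- An unobserved confounder $U$ drives both $A$ and $\varepsilon$.
- The instrument $Z$ is drawn independently of $U$ and of the extra noise, so $\mathbb E[\varepsilon\mid Z=1]=\mathbb E[\varepsilon\mid Z=0]=0$.
- $Z$ shifts the take-up of $A$, so the first stage is nonzero.

The coefficients ($0.8$, $0.5$, $\alpha=1$, $\beta=2$), the sample size, and the helper names are illustrative assumptions. The sample Wald ratio replaces each conditional mean by a subsample average and should land close to $\beta$ in a large sample. By contrast, the least-squares slope of $Y$ on $A$ is pulled away from $\beta$ by the correlation between $A$ and $\varepsilon$.

```python
import random


def simulate_iv(n, beta, alpha=1.0, seed=0):
    """Simulate a hypothetical binary-instrument model Y = alpha + beta*A + eps.

    U is an unobserved confounder that drives both A and eps, so A is endogenous.
    Z is drawn independently of U and of the extra noise, so E[eps | Z = z] = 0 for z in {0, 1}.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        u = rng.gauss(0.0, 1.0)
        a = 1.0 if 0.8 * z + u > 0.5 else 0.0  # first stage: Z raises take-up of A
        eps = u + rng.gauss(0.0, 1.0)           # error correlated with A through U
        data.append((z, a, alpha + beta * a + eps))
    return data


def mean(xs):
    return sum(xs) / len(xs)


def wald_estimate(data):
    """Sample analogue of (E[Y|Z=1] - E[Y|Z=0]) / (E[A|Z=1] - E[A|Z=0])."""
    group1 = [(a, y) for z, a, y in data if z == 1]
    group0 = [(a, y) for z, a, y in data if z == 0]
    reduced_form = mean([y for _, y in group1]) - mean([y for _, y in group0])
    first_stage = mean([a for a, _ in group1]) - mean([a for a, _ in group0])
    return reduced_form / first_stage


def ols_slope(data):
    """Least-squares slope of Y on A, which ignores the instrument."""
    a_bar = mean([a for _, a, _ in data])
    y_bar = mean([y for _, _, y in data])
    cov = sum((a - a_bar) * (y - y_bar) for _, a, y in data)
    var = sum((a - a_bar) ** 2 for _, a, _ in data)
    return cov / var


data = simulate_iv(n=200_000, beta=2.0)
print("sample Wald ratio:", wald_estimate(data))
print("OLS slope of Y on A:", ols_slope(data))
```

The sketch uses only the standard library so that it runs anywhere. With real data, the same two subsample contrasts would be computed from the observed $(Z,A,Y)$.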
[/step]