[proofplan]
We first analyze one proposal-acceptance pair $(Y,U)$ and compute the joint probability that the proposal lies in a measurable set $B$ and is accepted. The domination hypothesis implies $q a=p/M$ $\mu$-almost everywhere, including on the set where $q=0$, so the joint accepted law is exactly $M^{-1}p\,d\mu$. Taking $B=E$ gives the single-trial success probability $1/M$, and conditioning identifies the law of one accepted proposal. Finally, independence of the repeated pairs makes the acceptance indicators independent Bernoulli random variables, which gives the geometric law of $T$, almost sure finiteness, and the law of $Y_T$.
[/proofplan]
[step:Compute the accepted law for one proposal]
Let $Y:(\Omega,\mathcal F)\to(E,\mathcal E)$ be an $E$-valued [random variable](/page/Random%20Variable) with density $q$ with respect to $\mu$, and let $U:(\Omega,\mathcal F)\to((0,1),\mathcal B((0,1)))$ be independent of $Y$ with distribution $\operatorname{Unif}(0,1)$. Define the event $A\in\mathcal F$ by
\begin{align*}
A=\{U\le a(Y)\}.
\end{align*}
First observe that $p=0$ $\mu$-almost everywhere on the measurable set $\{x\in E:q(x)=0\}$. Indeed, on the full-measure set where $p\le Mq$, the condition $q(x)=0$ implies $0\le p(x)\le0$. Hence, for $\mu$-almost every $x\in E$,
\begin{align*}
q(x)a(x)=\frac{p(x)}{M}.
\end{align*}
Let $N\in\mathcal E$ be a $\mu$-null set outside which $p\le Mq$. Then $0\le a(x)\le1$ for every $x\in E\setminus N$ with $q(x)>0$, while $a(x)=0$ on $\{q=0\}$. Since
\begin{align*}
\int_N q(x)\,d\mu(x)=0,
\end{align*}
the exceptional set has zero mass under the proposal measure $q\,d\mu$.
Let $B\in\mathcal E$. Let $\mathcal L^1$ denote one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on $(0,1)$, and let $\nu$ denote the proposal probability measure on $(E,\mathcal E)$ defined by
\begin{align*}
\nu(C)=\int_C q(y)\,d\mu(y)
\end{align*}
for $C\in\mathcal E$. Since $Y$ has density $q$ and $U$ is independent of $Y$, the joint law of $(Y,U)$ on $E\times(0,1)$ is $\nu\otimes\mathcal L^1$. Equivalently, this joint law has density $(y,u)\mapsto q(y)$ with respect to $\mu\otimes\mathcal L^1$. Integrating the indicator of the measurable set $\{(y,u)\in B\times(0,1):u\le a(y)\}$ against this product law gives
\begin{align*}
\mathbb P(Y\in B,A)=\int_B\int_0^1 \mathbb 1_{\{u\le a(y)\}}q(y)\,d\mathcal L^1(u)\,d\mu(y).
\end{align*}
For every $y\in E\setminus N$, the inner integral equals $a(y)$ because $0\le a(y)\le1$. Since $N$ has zero mass under $q\,d\mu$, changing the value of the inner integral on $N$ does not change the outer integral. Thus
\begin{align*}
\mathbb P(Y\in B,A)=\int_B q(y)a(y)\,d\mu(y).
\end{align*}
Using $q a=p/M$ $\mu$-almost everywhere gives
\begin{align*}
\mathbb P(Y\in B,A)=\frac{1}{M}\int_B p(y)\,d\mu(y).
\end{align*}
[guided]
We isolate one proposal because rejection sampling repeats identical independent trials. Let $Y:(\Omega,\mathcal F)\to(E,\mathcal E)$ have density $q$ with respect to $\mu$, let $U:(\Omega,\mathcal F)\to((0,1),\mathcal B((0,1)))$ be independent of $Y$ and uniformly distributed on $(0,1)$, and define
\begin{align*}
A=\{U\le a(Y)\}.
\end{align*}
The first point is the almost-everywhere bookkeeping at the set where $q$ vanishes. The domination assumption says $p(x)\le Mq(x)$ for $\mu$-almost every $x\in E$. Choose a measurable set $N\in\mathcal E$ with $\mu(N)=0$ such that $p(x)\le Mq(x)$ for every $x\in E\setminus N$. On the part of $E\setminus N$ where $q(x)=0$, this gives $0\le p(x)\le0$, hence $p(x)=0$. Therefore the convention $a(x)=0$ on $\{q=0\}$ is compatible with the identity
\begin{align*}
q(x)a(x)=\frac{p(x)}{M}
\end{align*}
for $\mu$-almost every $x\in E$. The same domination gives $0\le a(x)\le1$ for every $x\in E\setminus N$: when $q(x)>0$ this follows by dividing $p(x)\le Mq(x)$ by $Mq(x)$, and when $q(x)=0$ it follows from the definition $a(x)=0$. Since
\begin{align*}
\int_N q(x)\,d\mu(x)=0,
\end{align*}
the exceptional set has zero mass under the proposal measure $q\,d\mu$ and is never seen by $Y$ with positive probability.
Now fix a measurable set $B\in\mathcal E$. Let $\mathcal L^1$ denote one-dimensional Lebesgue measure on $(0,1)$, and define the proposal probability measure $\nu$ on $(E,\mathcal E)$ by
\begin{align*}
\nu(C)=\int_C q(y)\,d\mu(y)
\end{align*}
for $C\in\mathcal E$. Independence of $Y$ and $U$ means that the joint law of $(Y,U)$ is $\nu\otimes\mathcal L^1$. Since $\nu$ has density $q$ with respect to $\mu$, the same joint law has density $(y,u)\mapsto q(y)$ with respect to $\mu\otimes\mathcal L^1$. Therefore the event $\{Y\in B,U\le a(Y)\}$ has probability obtained by integrating its indicator against this product law:
\begin{align*}
\mathbb P(Y\in B,A)=\int_B\int_0^1 \mathbb 1_{\{u\le a(y)\}}q(y)\,d\mathcal L^1(u)\,d\mu(y).
\end{align*}
For each $y\in E\setminus N$, the one-dimensional Lebesgue measure of $\{u\in(0,1):u\le a(y)\}$ is $a(y)$, because $0\le a(y)\le1$. Hence the inner integral becomes $a(y)$ outside $N$. The contribution of $N$ to the outer integral is zero because $\int_N q\,d\mu=0$, and so
\begin{align*}
\mathbb P(Y\in B,A)=\int_B q(y)a(y)\,d\mu(y).
\end{align*}
Substituting the almost-everywhere identity $q a=p/M$ yields
\begin{align*}
\mathbb P(Y\in B,A)=\frac{1}{M}\int_B p(y)\,d\mu(y).
\end{align*}
This is the central calculation: among accepted proposals, the unnormalised law is exactly the target density scaled by the constant $1/M$.
[/guided]
[/step]
[step:Identify the one-step acceptance probability and conditional accepted distribution]
Taking $B=E$ in the preceding formula and using that $p$ is a probability density gives
\begin{align*}
\mathbb P(A)=\frac{1}{M}\int_E p(y)\,d\mu(y)=\frac{1}{M}.
\end{align*}
Since $M<\infty$, this probability is positive. Therefore, for every $B\in\mathcal E$,
\begin{align*}
\mathbb P(Y\in B\mid A)=\frac{\mathbb P(Y\in B,A)}{\mathbb P(A)}=\int_B p(y)\,d\mu(y).
\end{align*}
Thus the conditional law of a proposal given acceptance has density $p$ with respect to $\mu$.
[/step]
[step:Turn repeated acceptances into independent Bernoulli trials]
For each $n\in\mathbb N$, define the acceptance event $A_n\in\mathcal F$ and its indicator $I_n:\Omega\to\{0,1\}$ by
\begin{align*}
A_n=\{U_n\le a(Y_n)\}
\end{align*}
and
\begin{align*}
I_n=\mathbb 1_{A_n}.
\end{align*}
The pairs $(Y_n,U_n)$ are independent because the sequence $(Y_n)_{n\ge1}$ is independent, the sequence $(U_n)_{n\ge1}$ is independent, and the two families are independent of each other. Since $A_n$ is determined by the pair $(Y_n,U_n)$, the events $(A_n)_{n\ge1}$ are independent. Applying the one-step computation to each pair gives
\begin{align*}
\mathbb P(A_n)=\frac{1}{M}
\end{align*}
for every $n\in\mathbb N$. Hence $(I_n)_{n\ge1}$ are independent Bernoulli random variables with success probability $1/M$.
[/step]
[step:Derive the geometric law and almost sure finiteness of the stopping time]
For $n\in\mathbb N$, the event $\{T=n\}$ is
\begin{align*}
\{T=n\}=A_1^c\cap\cdots\cap A_{n-1}^c\cap A_n.
\end{align*}
Using independence of the events $(A_k)_{k\ge1}$ and the identity $\mathbb P(A_k)=1/M$, we obtain
\begin{align*}
\mathbb P(T=n)=\left(1-\frac{1}{M}\right)^{n-1}\frac{1}{M}.
\end{align*}
Also
\begin{align*}
\mathbb P(T>n)=\mathbb P(A_1^c\cap\cdots\cap A_n^c)=\left(1-\frac{1}{M}\right)^n.
\end{align*}
Because $M<\infty$, we have $1/M>0$, so
\begin{align*}
\lim_{n\to\infty}\left(1-\frac{1}{M}\right)^n=0.
\end{align*}
The events $\{T>n\}$ decrease to $\{T=\infty\}$. By continuity from above for probability measures, therefore
\begin{align*}
\mathbb P(T=\infty)=\lim_{n\to\infty}\mathbb P(T>n)=0.
\end{align*}
Thus $T<\infty$ almost surely and $T$ has the geometric distribution on $\{1,2,\dots\}$ with success probability $1/M$.
[/step]
[step:Sum over the accepted trial to obtain the density of $Y_T$]
Let $B\in\mathcal E$. Since $T<\infty$ almost surely, the events $\{T=n\}$ for $n\in\mathbb N$ partition a full-probability subset of $\Omega$. Hence
\begin{align*}
\mathbb P(Y_T\in B)=\sum_{n=1}^{\infty}\mathbb P(Y_n\in B,T=n).
\end{align*}
For each $n$, the event $\{T=n\}$ is $A_1^c\cap\cdots\cap A_{n-1}^c\cap A_n$. The event $A_1^c\cap\cdots\cap A_{n-1}^c$ is determined by the pairs $(Y_k,U_k)$ with $1\le k<n$, while the event $\{Y_n\in B\}\cap A_n$ is determined by $(Y_n,U_n)$. By independence of the pairs,
\begin{align*}
\mathbb P(Y_n\in B,T=n)=\left(1-\frac{1}{M}\right)^{n-1}\mathbb P(Y_n\in B,A_n).
\end{align*}
The one-step accepted-law computation gives
\begin{align*}
\mathbb P(Y_n\in B,A_n)=\frac{1}{M}\int_B p(y)\,d\mu(y).
\end{align*}
Therefore
\begin{align*}
\mathbb P(Y_T\in B)=\sum_{n=1}^{\infty}\left(1-\frac{1}{M}\right)^{n-1}\frac{1}{M}\int_B p(y)\,d\mu(y).
\end{align*}
The geometric series has sum $1$ because $1/M>0$, so
\begin{align*}
\mathbb P(Y_T\in B)=\int_B p(y)\,d\mu(y).
\end{align*}
Since this holds for every $B\in\mathcal E$, the accepted output $Y_T$ has density $p$ with respect to $\mu$. This completes the proof.
[/step]