[proofplan]
We first isolate each stratum by writing its sample mean as a [random variable](/page/Random%20Variable) $Y_k := \bar{g}_k$. Inside a fixed stratum, the centered variables $g(X_{k,j}) - \mathbb{E}[g(X_{k,1})]$ are independent, square-integrable, and have mean zero, so expanding the square of their average gives $\operatorname{Var}(Y_k) = \sigma_k^2/n_k$. The random variables $Y_1,\dots,Y_K$ are independent because each is a measurable function of an independent block of samples. Finally, we expand the variance of the weighted sum defining $\hat{I}_{\mathrm{strat}}$, use the vanishing of cross terms, and factor out $\mathcal{L}^d(D)^2$.
[/proofplan]
[step:Define the stratum means as square-integrable random variables]
For each $k \in \{1,\dots,K\}$, define the real-valued random variable
\begin{align*}
Y_k : \Omega \to \mathbb{R}, \qquad Y_k(\omega) := \frac{1}{n_k}\sum_{j=1}^{n_k} g(X_{k,j}(\omega)).
\end{align*}
Thus $Y_k = \bar{g}_k$. The measurability of $Y_k$ follows because each composition $g \circ X_{k,j} : (\Omega,\mathcal{F}) \to (\mathbb{R},\mathcal{B}(\mathbb{R}))$ is measurable and finite sums of measurable real-valued functions are measurable. Since each $g(X_{k,j})$ belongs to $L^2(\Omega,\mathcal{F},\mathbb{P})$ and $n_k$ is finite, the finite linear combination $Y_k$ also belongs to $L^2(\Omega,\mathcal{F},\mathbb{P})$. Hence $\operatorname{Var}(Y_k)$ is finite for every $k$.
For each $k$, define the mean
\begin{align*}
\mu_k := \mathbb{E}[g(X_{k,1})].
\end{align*}
This expectation is finite because $g(X_{k,1}) \in L^2(\Omega,\mathcal{F},\mathbb{P})$, and therefore $g(X_{k,1}) \in L^1(\Omega,\mathcal{F},\mathbb{P})$.
[/step]
[step:Compute the variance of each stratum sample mean by expanding centered variables]
Fix $k \in \{1,\dots,K\}$. For each $j \in \{1,\dots,n_k\}$, define the centered real-valued random variable
\begin{align*}
Z_{k,j} : \Omega \to \mathbb{R}, \qquad Z_{k,j}(\omega) := g(X_{k,j}(\omega)) - \mu_k
\end{align*}
The variables $Z_{k,1},\dots,Z_{k,n_k}$ are independent because they are [measurable functions](/page/Measurable%20Functions) of the independent variables $X_{k,1},\dots,X_{k,n_k}$; this is the standard [preservation of independence](/theorems/1116) under measurable maps. They satisfy $\mathbb{E}[Z_{k,j}] = 0$ and
\begin{align*}
\mathbb{E}[Z_{k,j}^2] = \operatorname{Var}(g(X_{k,j})) = \sigma_k^2
\end{align*}
where the last equality uses that $X_{k,j}$ and $X_{k,1}$ have the same distribution on $D_k$.
Since
\begin{align*}
Y_k - \mathbb{E}[Y_k] = \frac{1}{n_k}\sum_{j=1}^{n_k} Z_{k,j}
\end{align*}
we compute
\begin{align*}
\operatorname{Var}(Y_k) = \mathbb{E}\left[\left(\frac{1}{n_k}\sum_{j=1}^{n_k} Z_{k,j}\right)^2\right]
\end{align*}
Expanding the finite square gives
\begin{align*}
\operatorname{Var}(Y_k) = \frac{1}{n_k^2}\sum_{j=1}^{n_k}\sum_{\ell=1}^{n_k}\mathbb{E}[Z_{k,j}Z_{k,\ell}]
\end{align*}
If $j \neq \ell$, independence and zero mean give
\begin{align*}
\mathbb{E}[Z_{k,j}Z_{k,\ell}] = \mathbb{E}[Z_{k,j}]\mathbb{E}[Z_{k,\ell}] = 0
\end{align*}
If $j = \ell$, then $\mathbb{E}[Z_{k,j}^2] = \sigma_k^2$. Therefore exactly $n_k$ diagonal terms remain, and
\begin{align*}
\operatorname{Var}(Y_k) = \frac{1}{n_k^2} n_k \sigma_k^2 = \frac{\sigma_k^2}{n_k}
\end{align*}
[guided]
Fix one stratum index $k \in \{1,\dots,K\}$. The goal in this step is to compute the variance of the average of the samples inside this single stratum. Define, for each $j \in \{1,\dots,n_k\}$, the centered random variable
\begin{align*}
Z_{k,j} : \Omega \to \mathbb{R}, \qquad Z_{k,j}(\omega) := g(X_{k,j}(\omega)) - \mu_k
\end{align*}
The centering is useful because variance is the second moment of the centered variable. Since $X_{k,1},\dots,X_{k,n_k}$ are independent and each $Z_{k,j}$ is obtained from $X_{k,j}$ by applying the measurable map $g$ and subtracting the constant $\mu_k$, the variables $Z_{k,1},\dots,Z_{k,n_k}$ are independent. Also,
\begin{align*}
\mathbb{E}[Z_{k,j}] = \mathbb{E}[g(X_{k,j})] - \mu_k = 0
\end{align*}
The equality $\mathbb{E}[g(X_{k,j})] = \mu_k$ holds because all variables $X_{k,j}$ have the same uniform distribution on $D_k$. Similarly, the second centered moment is the common stratum variance:
\begin{align*}
\mathbb{E}[Z_{k,j}^2] = \operatorname{Var}(g(X_{k,j})) = \operatorname{Var}(g(X_{k,1})) = \sigma_k^2
\end{align*}
Now write the centered sample mean in terms of these centered variables:
\begin{align*}
Y_k - \mathbb{E}[Y_k] = \frac{1}{n_k}\sum_{j=1}^{n_k} Z_{k,j}
\end{align*}
By the definition of variance,
\begin{align*}
\operatorname{Var}(Y_k) = \mathbb{E}\left[\left(Y_k - \mathbb{E}[Y_k]\right)^2\right] = \mathbb{E}\left[\left(\frac{1}{n_k}\sum_{j=1}^{n_k} Z_{k,j}\right)^2\right]
\end{align*}
Because the sum is finite, we may expand the square algebraically before taking expectation:
\begin{align*}
\operatorname{Var}(Y_k) = \frac{1}{n_k^2}\sum_{j=1}^{n_k}\sum_{\ell=1}^{n_k}\mathbb{E}[Z_{k,j}Z_{k,\ell}]
\end{align*}
There are two kinds of terms. For off-diagonal terms with $j \neq \ell$, independence gives factorisation of expectation, and the zero means make the term vanish:
\begin{align*}
\mathbb{E}[Z_{k,j}Z_{k,\ell}] = \mathbb{E}[Z_{k,j}]\mathbb{E}[Z_{k,\ell}] = 0
\end{align*}
For diagonal terms with $j = \ell$, the term is exactly the second centered moment:
\begin{align*}
\mathbb{E}[Z_{k,j}Z_{k,j}] = \mathbb{E}[Z_{k,j}^2] = \sigma_k^2
\end{align*}
Thus the double sum has $n_k$ nonzero diagonal contributions, each equal to $\sigma_k^2$. Hence
\begin{align*}
\operatorname{Var}(Y_k) = \frac{1}{n_k^2} n_k \sigma_k^2 = \frac{\sigma_k^2}{n_k}
\end{align*}
This is precisely the variance-reduction factor for averaging $n_k$ independent samples in the $k\text{-th}$ stratum.
[/guided]
[/step]
[step:Show the stratum sample means are independent]
For each $k \in \{1,\dots,K\}$, define the sub-$\sigma$-algebra
\begin{align*}
\mathcal{G}_k := \sigma(X_{k,1},\dots,X_{k,n_k}) \subset \mathcal{F}.
\end{align*}
The hypothesis that all samples are independent implies that the sub-$\sigma$-algebras $\mathcal{G}_1,\dots,\mathcal{G}_K$ are independent. Indeed, for events $A_k \in \mathcal{G}_k$, the defining identity $\mathbb{P}(A_1 \cap \cdots \cap A_K) = \prod_{k=1}^K \mathbb{P}(A_k)$ first holds on cylinder events determined by finitely many coordinates in each block by independence of the full family, and the finite monotone-class theorem extends it to the generated $\sigma$-algebras. Since $Y_k$ is $\mathcal{G}_k$-measurable for each $k$, the random variables $Y_1,\dots,Y_K$ are independent.
[/step]
[step:Expand the variance of the weighted stratified estimator]
Define the real-valued random variable
\begin{align*}
S : \Omega \to \mathbb{R}, \qquad S(\omega) := \sum_{k=1}^K p_k Y_k(\omega).
\end{align*}
Then
\begin{align*}
\hat{I}_{\mathrm{strat}} = \mathcal{L}^d(D) S.
\end{align*}
Since $\mathcal{L}^d(D)$ is a deterministic finite constant, the variance homogeneity identity $\operatorname{Var}(cV)=c^2\operatorname{Var}(V)$ for $c \in \mathbb{R}$ and $V \in L^2(\Omega,\mathcal{F},\mathbb{P})$ gives
\begin{align*}
\operatorname{Var}(\hat{I}_{\mathrm{strat}}) = \mathcal{L}^d(D)^2 \operatorname{Var}(S).
\end{align*}
For each $k$, define the centered weighted random variable
\begin{align*}
W_k : \Omega \to \mathbb{R}, \qquad W_k(\omega) := p_k\bigl(Y_k(\omega) - \mathbb{E}[Y_k]\bigr).
\end{align*}
The variables $W_1,\dots,W_K$ are independent, square-integrable, and have mean zero, because the variables $Y_1,\dots,Y_K$ have these properties after centering and deterministic scaling. Moreover,
\begin{align*}
S - \mathbb{E}[S] = \sum_{k=1}^K W_k.
\end{align*}
By the definition of variance for a square-integrable real-valued random variable, applied to $S \in L^2(\Omega,\mathcal{F},\mathbb{P})$,
\begin{align*}
\operatorname{Var}(S) = \mathbb{E}\left[\left(\sum_{k=1}^K W_k\right)^2\right].
\end{align*}
Expanding the finite square,
\begin{align*}
\operatorname{Var}(S) = \sum_{k=1}^K\sum_{\ell=1}^K \mathbb{E}[W_k W_\ell].
\end{align*}
For $k \neq \ell$, independence and zero mean imply
\begin{align*}
\mathbb{E}[W_k W_\ell] = \mathbb{E}[W_k]\mathbb{E}[W_\ell] = 0.
\end{align*}
For $k = \ell$,
\begin{align*}
\mathbb{E}[W_k^2] = p_k^2 \operatorname{Var}(Y_k).
\end{align*}
Consequently, this proves the finite additivity of variance for the independent centered square-integrable summands $W_1,\dots,W_K$ in the present setting, namely
\begin{align*}
\operatorname{Var}(S) = \sum_{k=1}^K p_k^2 \operatorname{Var}(Y_k).
\end{align*}
Substituting $\operatorname{Var}(Y_k) = \sigma_k^2/n_k$ from the previous computation yields
\begin{align*}
\operatorname{Var}(S) = \sum_{k=1}^K \frac{p_k^2\sigma_k^2}{n_k}.
\end{align*}
Multiplying by $\mathcal{L}^d(D)^2$ gives
\begin{align*}
\operatorname{Var}(\hat{I}_{\mathrm{strat}}) = \mathcal{L}^d(D)^2 \sum_{k=1}^K \frac{p_k^2\sigma_k^2}{n_k}.
\end{align*}
This is the desired variance formula.
[/step]