[proofplan]
We set $Z := \mathbb E[Y \mid \mathcal G]$ and use the defining property of [conditional expectation](/page/Conditional%20Expectation). First, testing against the event $\Omega$ gives equality of expectations. Next, a truncation argument combined with the Cauchy-Schwarz inequality shows that $Z$ is square-integrable, with $\mathbb E[Z^2]\leq\mathbb E[Y^2]$. We then prove the $L^2$ orthogonality identity $\mathbb E[(Y-Z)W]=0$ for every square-integrable $\mathcal G$-measurable [random variable](/page/Random%20Variable) $W$, first for bounded $W$ and then for general $W$ by truncation. Applying this with $W := Z-\mathbb E[Y]$ yields the Pythagorean variance decomposition, from which the variance inequality follows by nonnegativity of a square.
[/proofplan]
[step:Use the defining property of conditional expectation to preserve the mean]
Let
\begin{align*}
Z: (\Omega,\mathcal G) \to (\mathbb R,\mathcal B(\mathbb R))
\end{align*}
denote a version of the conditional expectation $\mathbb E[Y \mid \mathcal G]$. By definition, $Z$ is $\mathcal G$-measurable, $Z \in L^1(\Omega,\mathcal G,\mathbb P|_{\mathcal G})$, and for every event $A \in \mathcal G$,
\begin{align*}
\mathbb E[\mathbb 1_A Z] = \mathbb E[\mathbb 1_A Y].
\end{align*}
Since $\Omega \in \mathcal G$, taking $A := \Omega$ gives
\begin{align*}
\mathbb E[Z] = \mathbb E[Y].
\end{align*}
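To see the defining property concretely, here is a minimal Python sketch under a stated assumption that is not part of the proof: $\Omega$ is finite, every outcome has positive weight, and $\mathcal G$ is generated by a finite partition. In that setting a version of $\mathbb E[Y\mid\mathcal G]$ is the $\mathbb P$-weighted average of $Y$ over each block. The data and the helper names `expect`, `cond_exp`, and `events` are hypothetical, chosen only for illustration; exact `Fraction` arithmetic keeps every equality exact.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite example (not from the proof): Omega = {0, ..., 5}
# with weights p, a random variable Y, and G generated by a partition.
p = [Fraction(k, 12) for k in (1, 2, 3, 1, 4, 1)]
y = [Fraction(v) for v in (3, -1, 4, 1, -5, 9)]
blocks = [[0, 1], [2, 3, 4], [5]]


def expect(x, p):
    """E[X] for a random variable given as a list of values."""
    return sum(pi * xi for pi, xi in zip(p, x))


def cond_exp(y, p, blocks):
    """E[Y | G] when G is generated by `blocks`: the p-weighted block average."""
    z = [Fraction(0)] * len(y)
    for block in blocks:
        mass = sum(p[i] for i in block)  # assumed > 0 for every block
        avg = sum(p[i] * y[i] for i in block) / mass
        for i in block:
            z[i] = avg
    return z


def events(blocks):
    """Every event of G is a union of blocks (including the empty set and Omega)."""
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            yield {i for block in chosen for i in block}


z = cond_exp(y, p, blocks)

# Defining property: E[1_A Z] = E[1_A Y] for every A in G.
for A in events(blocks):
    ind = [1 if i in A else 0 for i in range(len(y))]
    assert expect([a * zi for a, zi in zip(ind, z)], p) == expect(
        [a * yi for a, yi in zip(ind, y)], p
    )

# Taking A = Omega gives E[Z] = E[Y].
print(expect(z, p), expect(y, p))  # prints: 1/4 1/4
```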
[/step]
[step:Show that the conditional expectation is square-integrable]
For each $m \in \mathbb N$, define the truncation map
\begin{align*}
T_m: \mathbb R \to \mathbb R, \qquad t \mapsto \max\{-m,\min\{t,m\}\}.
\end{align*}
Define the bounded $\mathcal G$-measurable random variable
\begin{align*}
Z_m: (\Omega,\mathcal G) \to (\mathbb R,\mathcal B(\mathbb R)), \qquad \omega \mapsto T_m(Z(\omega)).
\end{align*}
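Since $Z_m$ is bounded and $\mathcal G$-measurable, it can serve as a test function. The defining property of conditional expectation extends from indicators to simple $\mathcal G$-measurable random variables by linearity, and then to bounded $\mathcal G$-measurable random variables by approximating them pointwise with simple functions bounded by the same constant and applying dominated convergence (the products are dominated by a constant multiple of $|Y|$ or $|Z|$, both integrable). Hence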
\begin{align*}
\mathbb E[Z_m Z] = \mathbb E[Z_m Y].
\end{align*}
The product $Z_m Z$ is nonnegative because $Z_m$ has the same sign as $Z$, and it is integrable because $0 \leq Z_m Z \leq m|Z|$ with $Z \in L^1$. Explicitly,
\begin{align*}
Z_m Z = |Z|\min\{|Z|,m\}.
\end{align*}
By the Cauchy-Schwarz inequality applied to the square-integrable random variables $Z_m$ (which is bounded) and $Y$,
\begin{align*}
\mathbb E[Z_m Z] = \mathbb E[Z_m Y] \leq \mathbb E[Z_m^2]^{1/2}\mathbb E[Y^2]^{1/2}.
\end{align*}
Since $|Z_m| \leq |Z|$ and $Z_m$ has the same sign as $Z$,
\begin{align*}
Z_m^2 \leq Z_m Z.
\end{align*}
Therefore
\begin{align*}
\mathbb E[Z_m Z] \leq \mathbb E[Z_m Z]^{1/2}\mathbb E[Y^2]^{1/2}.
\end{align*}
If $\mathbb E[Z_m Z] = 0$, the bound below is immediate for that $m$; otherwise, since $\mathbb E[Z_m Z]$ is finite and positive, dividing by $\mathbb E[Z_m Z]^{1/2}$ and squaring gives
\begin{align*}
\mathbb E[Z_m Z] \leq \mathbb E[Y^2].
\end{align*}
As $m \to \infty$, the nonnegative random variables $Z_m Z = |Z|\min\{|Z|,m\}$ increase pointwise to $Z^2$. By the [monotone convergence theorem](/theorems/509),
\begin{align*}
\mathbb E[Z^2] = \lim_{m \to \infty}\mathbb E[Z_m Z] \leq \mathbb E[Y^2] < \infty.
\end{align*}
Thus $Z \in L^2(\Omega,\mathcal G,\mathbb P|_{\mathcal G})$.
[guided]
We need $Z$ to be square-integrable before it is legitimate to discuss $\operatorname{Var}(Z)$ as a finite quantity. Conditional expectation is initially defined only as an $L^1$ object, so this step proves the extra $L^2$ bound from the assumption $Y \in L^2$.
For each $m \in \mathbb N$, define the truncation map
\begin{align*}
T_m: \mathbb R \to \mathbb R, \qquad t \mapsto \max\{-m,\min\{t,m\}\}.
\end{align*}
Now define
\begin{align*}
Z_m: (\Omega,\mathcal G) \to (\mathbb R,\mathcal B(\mathbb R)), \qquad \omega \mapsto T_m(Z(\omega)).
\end{align*}
The random variable $Z_m$ is bounded and $\mathcal G$-measurable. The defining property of conditional expectation says that integration against $Z$ and against $Y$ agree on indicators of sets in $\mathcal G$. By linearity this holds for simple $\mathcal G$-[measurable functions](/page/Measurable%20Functions). A bounded $\mathcal G$-measurable function is a pointwise limit of simple $\mathcal G$-measurable functions bounded by the same constant, so dominated convergence (with dominating functions a constant multiple of $|Z|$ and of $|Y|$) extends the identity to bounded $\mathcal G$-measurable functions. Applying this extension with the [test function](/page/Test%20Function) $Z_m$ gives
\begin{align*}
\mathbb E[Z_m Z] = \mathbb E[Z_m Y].
\end{align*}
The reason for choosing $Z_m$ is that it has the same sign as $Z$, so the product $Z_mZ$ is nonnegative. More precisely,
\begin{align*}
Z_m Z = |Z|\min\{|Z|,m\}.
\end{align*}
This expression increases pointwise to $Z^2$ as $m \to \infty$. We now bound its expectation uniformly in $m$. By the [Cauchy-Schwarz inequality](/theorems/432) applied to the real-valued square-integrable random variables $Z_m$ and $Y$,
\begin{align*}
\mathbb E[Z_m Z] = \mathbb E[Z_m Y] \leq \mathbb E[Z_m^2]^{1/2}\mathbb E[Y^2]^{1/2}.
\end{align*}
Because $Z_m$ is a truncation of $Z$ with the same sign, we have $Z_m^2 \leq Z_mZ$. Hence
\begin{align*}
\mathbb E[Z_m Z] \leq \mathbb E[Z_m Z]^{1/2}\mathbb E[Y^2]^{1/2}.
\end{align*}
If $\mathbb E[Z_mZ]=0$, this already gives $\mathbb E[Z_mZ]\leq \mathbb E[Y^2]$. If $\mathbb E[Z_mZ]>0$ (it is finite, since $0 \leq Z_mZ \leq m|Z|$), divide by $\mathbb E[Z_mZ]^{1/2}$ and square to obtain the same conclusion:
\begin{align*}
\mathbb E[Z_m Z] \leq \mathbb E[Y^2].
\end{align*}
Finally, since $Z_mZ$ increases pointwise to $Z^2$, the monotone convergence theorem gives
\begin{align*}
\mathbb E[Z^2] = \lim_{m \to \infty}\mathbb E[Z_m Z] \leq \mathbb E[Y^2] < \infty.
\end{align*}
Therefore $Z \in L^2(\Omega,\mathcal G,\mathbb P|_{\mathcal G})$.
[/guided]
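The truncation argument can be traced numerically in a hypothetical finite setting (the data and the helpers `expect`, `cond_exp`, and `truncate` are illustrative assumptions, with $\mathcal G$ generated by a partition). For each $m$, the sketch checks the pointwise identities $Z_mZ=|Z|\min\{|Z|,m\}$ and $Z_m^2\leq Z_mZ$, the test-function identity $\mathbb E[Z_mZ]=\mathbb E[Z_mY]$, and the uniform bound $\mathbb E[Z_mZ]\leq\mathbb E[Y^2]$. On a finite space $Z$ is bounded, so the truncations eventually coincide with $Z$; the sketch therefore illustrates the inequalities rather than the monotone-convergence step itself.

```python
from fractions import Fraction

# Hypothetical toy setup: weights p, values of Y, and a partition generating G.
p = [Fraction(k, 12) for k in (1, 2, 3, 1, 4, 1)]
y = [Fraction(v) for v in (3, -1, 4, 1, -5, 9)]
blocks = [[0, 1], [2, 3, 4], [5]]


def expect(x, p):
    return sum(pi * xi for pi, xi in zip(p, x))


def cond_exp(y, p, blocks):
    # p-weighted block averages give a version of E[Y | G]
    z = [Fraction(0)] * len(y)
    for block in blocks:
        avg = sum(p[i] * y[i] for i in block) / sum(p[i] for i in block)
        for i in block:
            z[i] = avg
    return z


def truncate(t, m):
    """The truncation map T_m(t) = max(-m, min(t, m))."""
    return max(-m, min(t, m))


z = cond_exp(y, p, blocks)
ey2 = expect([v * v for v in y], p)

previous = Fraction(0)
for m in range(1, 12):
    zm = [truncate(v, m) for v in z]
    prod = [a * b for a, b in zip(zm, z)]
    for a, b, v in zip(zm, prod, z):
        assert b == abs(v) * min(abs(v), m)  # Z_m Z = |Z| min(|Z|, m)
        assert a * a <= b  # Z_m^2 <= Z_m Z
    e = expect(prod, p)
    assert e == expect([a * b for a, b in zip(zm, y)], p)  # test function Z_m
    assert previous <= e <= ey2  # nondecreasing in m, bounded by E[Y^2]
    previous = e

ez2 = expect([v * v for v in z], p)
print(ez2, ey2)  # prints: 2099/288 241/12
```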
[/step]
[step:Prove the orthogonality of the residual to square-integrable $\mathcal G$-measurable variables]
Let
\begin{align*}
W: (\Omega,\mathcal G) \to (\mathbb R,\mathcal B(\mathbb R))
\end{align*}
be any random variable in $L^2(\Omega,\mathcal G,\mathbb P|_{\mathcal G})$. For each $m \in \mathbb N$, define
\begin{align*}
W_m: (\Omega,\mathcal G) \to (\mathbb R,\mathcal B(\mathbb R)), \qquad \omega \mapsto T_m(W(\omega)),
\end{align*}
where $T_m$ is the truncation map from the previous step. Since $W_m$ is bounded and $\mathcal G$-measurable, the defining property of conditional expectation, extended to bounded $\mathcal G$-measurable test functions as in the previous step, gives
\begin{align*}
\mathbb E[W_mY] = \mathbb E[W_mZ].
\end{align*}
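Since $W_m$ has the same sign as $W$ and $|W_m| \leq |W|$, we have $|W_m-W| \leq |W|$, hence $(W_m-W)^2 \leq W^2$ with $W^2$ integrable. As $W_m \to W$ pointwise, dominated convergence gives $\mathbb E[(W_m-W)^2] \to 0$. By Cauchy-Schwarz,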
\begin{align*}
\mathbb E[|W_mY-WY|] \leq \mathbb E[(W_m-W)^2]^{1/2}\mathbb E[Y^2]^{1/2} \to 0.
\end{align*}
Similarly, since $Z \in L^2$,
\begin{align*}
\mathbb E[|W_mZ-WZ|] \leq \mathbb E[(W_m-W)^2]^{1/2}\mathbb E[Z^2]^{1/2} \to 0.
\end{align*}
Passing to the limit in $\mathbb E[W_mY] = \mathbb E[W_mZ]$ yields
\begin{align*}
\mathbb E[WY] = \mathbb E[WZ].
\end{align*}
Equivalently,
\begin{align*}
\mathbb E[W(Y-Z)] = 0.
\end{align*}
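In a hypothetical finite setting where $\mathcal G$ is generated by a partition, a random variable is $\mathcal G$-measurable exactly when it is constant on each block. The following sketch (illustrative data; the helper names are assumptions) draws random $\mathcal G$-measurable $W$ and checks $\mathbb E[W(Y-Z)]=0$ exactly. It also shows that the residual $Y-Z$, which is not $\mathcal G$-measurable here, is not orthogonal to itself, so the measurability hypothesis on $W$ cannot simply be dropped.

```python
import random
from fractions import Fraction

# Hypothetical toy setup: weights p, values of Y, and a partition generating G.
p = [Fraction(k, 12) for k in (1, 2, 3, 1, 4, 1)]
y = [Fraction(v) for v in (3, -1, 4, 1, -5, 9)]
blocks = [[0, 1], [2, 3, 4], [5]]


def expect(x, p):
    return sum(pi * xi for pi, xi in zip(p, x))


def cond_exp(y, p, blocks):
    # p-weighted block averages give a version of E[Y | G]
    z = [Fraction(0)] * len(y)
    for block in blocks:
        avg = sum(p[i] * y[i] for i in block) / sum(p[i] for i in block)
        for i in block:
            z[i] = avg
    return z


z = cond_exp(y, p, blocks)
residual = [yi - zi for yi, zi in zip(y, z)]

rng = random.Random(0)
for _ in range(200):
    # A random G-measurable W: one rational value per block.
    w = [Fraction(0)] * len(y)
    for block in blocks:
        value = Fraction(rng.randint(-10, 10), rng.randint(1, 5))
        for i in block:
            w[i] = value
    assert expect([wi * ri for wi, ri in zip(w, residual)], p) == 0

# The residual is not constant on the block {0, 1}, and E[(Y - Z)^2] > 0.
print(expect([ri * ri for ri in residual], p))  # prints: 3685/288
```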
[/step]
[step:Apply orthogonality to decompose the variance]
Define the constant
\begin{align*}
\mu := \mathbb E[Y].
\end{align*}
By the first step, $\mu = \mathbb E[Z]$. Define the centered random variables
\begin{align*}
Y_0: (\Omega,\mathcal F) \to (\mathbb R,\mathcal B(\mathbb R)), \qquad \omega \mapsto Y(\omega)-\mu
\end{align*}
and
\begin{align*}
Z_0: (\Omega,\mathcal G) \to (\mathbb R,\mathcal B(\mathbb R)), \qquad \omega \mapsto Z(\omega)-\mu.
\end{align*}
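Since $Z \in L^2(\Omega,\mathcal G,\mathbb P|_{\mathcal G})$ by the second step and constants are square-integrable on a probability space, $Z_0$ is a $\mathcal G$-measurable element of $L^2(\Omega,\mathcal G,\mathbb P|_{\mathcal G})$. The orthogonality result with $W := Z_0$ therefore gives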
\begin{align*}
\mathbb E[Z_0(Y-Z)] = 0.
\end{align*}
Now $Y_0 = (Z-\mu)+(Y-Z)=Z_0+(Y-Z)$, and both summands lie in $L^2$ because $Y,Z \in L^2$, so every expectation below is finite. Expanding the square and using the preceding orthogonality relation,
\begin{align*}
\operatorname{Var}(Y) &= \mathbb E[Y_0^2] \\
&= \mathbb E[Z_0^2] + 2\mathbb E[Z_0(Y-Z)] + \mathbb E[(Y-Z)^2] \\
&= \mathbb E[Z_0^2] + \mathbb E[(Y-Z)^2].
\end{align*}
Since $\mu=\mathbb E[Z]$,
\begin{align*}
\mathbb E[Z_0^2] = \operatorname{Var}(Z).
\end{align*}
Therefore
\begin{align*}
\operatorname{Var}(Y) = \operatorname{Var}(Z) + \mathbb E[(Y-Z)^2].
\end{align*}
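The decomposition can be checked term by term on a hypothetical finite example (the data and helper names are illustrative assumptions, with $\mathcal G$ generated by a partition). The sketch computes $Y_0$, $Z_0$, and the residual $Y-Z$, confirms that the cross term $\mathbb E[Z_0(Y-Z)]$ vanishes, and verifies $\operatorname{Var}(Y)=\operatorname{Var}(Z)+\mathbb E[(Y-Z)^2]$ in exact rational arithmetic.

```python
from fractions import Fraction

# Hypothetical toy setup: weights p, values of Y, and a partition generating G.
p = [Fraction(k, 12) for k in (1, 2, 3, 1, 4, 1)]
y = [Fraction(v) for v in (3, -1, 4, 1, -5, 9)]
blocks = [[0, 1], [2, 3, 4], [5]]


def expect(x, p):
    return sum(pi * xi for pi, xi in zip(p, x))


def cond_exp(y, p, blocks):
    # p-weighted block averages give a version of E[Y | G]
    z = [Fraction(0)] * len(y)
    for block in blocks:
        avg = sum(p[i] * y[i] for i in block) / sum(p[i] for i in block)
        for i in block:
            z[i] = avg
    return z


z = cond_exp(y, p, blocks)
mu = expect(y, p)  # mu = E[Y] = E[Z]
y0 = [v - mu for v in y]  # Y_0 = Y - mu
z0 = [v - mu for v in z]  # Z_0 = Z - mu
r = [a - b for a, b in zip(y, z)]  # residual Y - Z

var_y = expect([v * v for v in y0], p)
var_z = expect([v * v for v in z0], p)
cross = expect([a * b for a, b in zip(z0, r)], p)
resid = expect([v * v for v in r], p)

assert cross == 0  # orthogonality with W = Z_0
assert var_y == var_z + 2 * cross + resid  # expanded square
print(var_y, var_z, resid)  # prints: 961/48 2081/288 3685/288
```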
[/step]
[step:Conclude that conditioning cannot increase variance]
The random variable $(Y-Z)^2$ is nonnegative, so its expectation is nonnegative:
\begin{align*}
\mathbb E[(Y-Z)^2] \geq 0.
\end{align*}
Using the variance decomposition from the previous step,
\begin{align*}
\operatorname{Var}(Y) = \operatorname{Var}(Z) + \mathbb E[(Y-Z)^2] \geq \operatorname{Var}(Z).
\end{align*}
Since $Z=\mathbb E[Y\mid\mathcal G]$, this is exactly
\begin{align*}
\operatorname{Var}(\mathbb E[Y\mid\mathcal G]) \leq \operatorname{Var}(Y).
\end{align*}
Together with $\mathbb E[\mathbb E[Y\mid\mathcal G]]=\mathbb E[Y]$, this proves the theorem.
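As a randomized sanity check of the final inequality, the sketch below generates many random finite probability spaces, random $Y$, and random partitions generating $\mathcal G$ (the generator `random_partition` and all data are hypothetical), and checks that the variance gap equals $\mathbb E[(Y-Z)^2]\geq 0$. It also covers the two extremes: the trivial $\sigma$-algebra, where $Z$ is the constant $\mathbb E[Y]$, and the full power set, where $Z=Y$ and the inequality holds with equality.

```python
import random
from fractions import Fraction


def expect(x, p):
    return sum(pi * xi for pi, xi in zip(p, x))


def variance(x, p):
    mu = expect(x, p)
    return expect([(v - mu) ** 2 for v in x], p)


def cond_exp(y, p, blocks):
    # p-weighted block averages give a version of E[Y | G]
    z = [Fraction(0)] * len(y)
    for block in blocks:
        avg = sum(p[i] * y[i] for i in block) / sum(p[i] for i in block)
        for i in block:
            z[i] = avg
    return z


def random_partition(n, rng):
    """Hypothetical helper: a random partition of {0, ..., n-1} into blocks."""
    groups = {}
    for i in range(n):
        groups.setdefault(rng.randint(0, n - 1), []).append(i)
    return list(groups.values())


rng = random.Random(1)
n = 8
for _ in range(300):
    weights = [rng.randint(1, 9) for _ in range(n)]  # every outcome has mass > 0
    p = [Fraction(w, sum(weights)) for w in weights]
    y = [Fraction(rng.randint(-20, 20)) for _ in range(n)]
    z = cond_exp(y, p, random_partition(n, rng))
    assert expect(z, p) == expect(y, p)
    gap = variance(y, p) - variance(z, p)
    assert gap == expect([(a - b) ** 2 for a, b in zip(y, z)], p) >= 0

# Extremes: trivial G gives a constant Z; the power set gives Z = Y.
trivial = cond_exp(y, p, [list(range(n))])
finest = cond_exp(y, p, [[i] for i in range(n)])
assert variance(trivial, p) == 0
assert finest == y and variance(finest, p) == variance(y, p)
```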
[/step]