[proofplan]
Let $X := \mathbb E[Y \mid \mathcal G]$. We first prove that $X$ is square-integrable by testing the defining identity for conditional expectation against bounded truncations of $X$. We then show that the bounded-test orthogonality property is exactly the conditional expectation property and prove uniqueness by testing the difference of two candidates against its own truncations. Finally, we extend orthogonality from bounded $\mathcal G$-measurable test variables to all variables in $L^2(\Omega,\mathcal G,\mathbb P)$ and use the resulting Pythagorean identity to prove the unique mean-square minimisation property.
[/proofplan]
[step:Show that the conditional expectation is square-integrable]
Because $Y \in L^2(\Omega,\mathcal F,\mathbb P)$ and $\mathbb P(\Omega)=1$, the random variable $Y$ belongs to $L^1(\Omega,\mathcal F,\mathbb P)$. Let
\begin{align*}
X: (\Omega,\mathcal G) &\to (\mathbb R,\mathcal B(\mathbb R))
\end{align*}
denote $X := \mathbb E[Y \mid \mathcal G]$.
For each $n \in \mathbb N$, define the bounded $\mathcal G$-measurable random variable
\begin{align*}
Z_n: \Omega &\to \mathbb R \\
\omega &\mapsto X(\omega)\mathbb 1_{\{|X|\le n\}}(\omega).
\end{align*}
The defining property of conditional expectation gives
\begin{align*}
\mathbb E[X Z_n] = \mathbb E[Y Z_n].
\end{align*}
Since $X Z_n = X^2\mathbb 1_{\{|X|\le n\}}$, we have
\begin{align*}
\mathbb E[X^2\mathbb 1_{\{|X|\le n\}}]
= \mathbb E[Y X\mathbb 1_{\{|X|\le n\}}].
\end{align*}
Set
\begin{align*}
a_n := \mathbb E[X^2\mathbb 1_{\{|X|\le n\}}] \in [0,\infty).
\end{align*}
Using $2|ab|\le a^2+b^2$ with
\begin{align*}
a = \varepsilon |Y|, \qquad b = \varepsilon^{-1}|X|\mathbb 1_{\{|X|\le n\}},
\end{align*}
for any $\varepsilon>0$, we obtain
\begin{align*}
a_n
&\le \mathbb E[|Y|\,|X|\,\mathbb 1_{\{|X|\le n\}}] \\
&\le \frac{\varepsilon^2}{2}\mathbb E[Y^2]
+ \frac{1}{2\varepsilon^2}\mathbb E[X^2\mathbb 1_{\{|X|\le n\}}] \\
&= \frac{\varepsilon^2}{2}\mathbb E[Y^2] + \frac{1}{2\varepsilon^2}a_n.
\end{align*}
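Taking $\varepsilon=1$ gives $a_n\le\frac{1}{2}\mathbb E[Y^2]+\frac{1}{2}a_n$. Since $a_n$ is finite, subtracting $\frac{1}{2}a_n$ from both sides and multiplying by $2$ gives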
\begin{align*}
a_n \le \mathbb E[Y^2].
\end{align*}
The sequence $(X^2\mathbb 1_{\{|X|\le n\}})_{n\in\mathbb N}$ is non-negative and increases pointwise to $X^2$, so the monotone convergence theorem gives
\begin{align*}
\mathbb E[X^2] = \sup_{n\in\mathbb N} a_n \le \mathbb E[Y^2] < \infty.
\end{align*}
Thus $X \in L^2(\Omega,\mathcal G,\mathbb P)$.
[guided]
The conditional expectation is initially defined for integrable random variables, and $Y$ is integrable because $Y \in L^2$ on a probability space:
\begin{align*}
|Y| \le \frac{1}{2}(Y^2+1),
\end{align*}
so $\mathbb E[|Y|]<\infty$. Let
\begin{align*}
X: (\Omega,\mathcal G) &\to (\mathbb R,\mathcal B(\mathbb R))
\end{align*}
be $X := \mathbb E[Y\mid\mathcal G]$.
The point is to prove $X \in L^2$ without assuming it. We cannot test the defining identity with $X$ directly, because we do not yet know that $X$ is square-integrable. Instead, we test against bounded truncations of $X$. For each $n\in\mathbb N$, define
\begin{align*}
Z_n: \Omega &\to \mathbb R \\
\omega &\mapsto X(\omega)\mathbb 1_{\{|X|\le n\}}(\omega).
\end{align*}
Since $X$ is $\mathcal G$-measurable, the set $\{|X|\le n\}$ belongs to $\mathcal G$, and $Z_n$ is $\mathcal G$-measurable. It is bounded by $n$, so it is an admissible test variable in the defining property of conditional expectation. Therefore
\begin{align*}
\mathbb E[XZ_n] = \mathbb E[YZ_n].
\end{align*}
Substituting the definition of $Z_n$ gives
\begin{align*}
\mathbb E[X^2\mathbb 1_{\{|X|\le n\}}]
= \mathbb E[Y X\mathbb 1_{\{|X|\le n\}}].
\end{align*}
Define
\begin{align*}
a_n := \mathbb E[X^2\mathbb 1_{\{|X|\le n\}}].
\end{align*}
We now bound $a_n$ uniformly in $n$. The elementary inequality $2|ab|\le a^2+b^2$, applied with
\begin{align*}
a = |Y|, \qquad b = |X|\mathbb 1_{\{|X|\le n\}},
\end{align*}
gives
\begin{align*}
|Y|\,|X|\,\mathbb 1_{\{|X|\le n\}}
\le \frac{1}{2}Y^2 + \frac{1}{2}X^2\mathbb 1_{\{|X|\le n\}}.
\end{align*}
Taking expectations yields
\begin{align*}
a_n
&\le \mathbb E[|Y|\,|X|\,\mathbb 1_{\{|X|\le n\}}] \\
&\le \frac{1}{2}\mathbb E[Y^2] + \frac{1}{2}a_n.
\end{align*}
Since $a_n\le n^2<\infty$, we may subtract $\frac{1}{2}a_n$ from both sides and multiply by $2$. Hence
\begin{align*}
a_n \le \mathbb E[Y^2].
\end{align*}
The functions $X^2\mathbb 1_{\{|X|\le n\}}$ are non-negative and increase pointwise to $X^2$ as $n\to\infty$. By the monotone convergence theorem,
\begin{align*}
\mathbb E[X^2]
= \sup_{n\in\mathbb N}\mathbb E[X^2\mathbb 1_{\{|X|\le n\}}]
\le \mathbb E[Y^2] < \infty.
\end{align*}
This proves $X\in L^2(\Omega,\mathcal G,\mathbb P)$.
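As a sanity check that is not needed for the proof, consider the special case, assumed here only for illustration, in which $\mathcal G$ is generated by a finite partition $A_1,\dots,A_k$ of $\Omega$ with $\mathbb P(A_i)>0$ for every $i$. In that case a version of $X$ is explicit, and the bound $\mathbb E[X^2]\le\mathbb E[Y^2]$ can be checked directly from the Cauchy--Schwarz inequality $(\mathbb E[Y\mathbb 1_{A_i}])^2\le\mathbb E[Y^2\mathbb 1_{A_i}]\,\mathbb P(A_i)$:
\begin{align*}
X &= \sum_{i=1}^k \frac{\mathbb E[Y\mathbb 1_{A_i}]}{\mathbb P(A_i)}\mathbb 1_{A_i}, \\
\mathbb E[X^2] &= \sum_{i=1}^k \frac{(\mathbb E[Y\mathbb 1_{A_i}])^2}{\mathbb P(A_i)}
\le \sum_{i=1}^k \mathbb E[Y^2\mathbb 1_{A_i}] = \mathbb E[Y^2].
\end{align*}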
[/guided]
[/step]
[step:Identify the conditional expectation by bounded orthogonality]
Let $Z:\Omega\to\mathbb R$ be a bounded $\mathcal G$-measurable random variable. Since $Y\in L^1$, $X\in L^1$, and $Z$ is bounded, both $\mathbb E[YZ]$ and $\mathbb E[XZ]$ are finite.
The defining property of conditional expectation gives
\begin{align*}
\mathbb E[XZ] = \mathbb E[YZ].
\end{align*}
Therefore
\begin{align*}
\mathbb E[(Y-X)Z] = 0.
\end{align*}
Conversely, suppose $U:\Omega\to\mathbb R$ is a $\mathcal G$-measurable random variable in $L^2(\Omega,\mathcal F,\mathbb P)$ such that
\begin{align*}
\mathbb E[(Y-U)Z] = 0
\end{align*}
for every bounded $\mathcal G$-measurable real-valued random variable $Z$. Taking $Z=\mathbb 1_A$ for an arbitrary set $A\in\mathcal G$ gives
\begin{align*}
\mathbb E[Y\mathbb 1_A] = \mathbb E[U\mathbb 1_A].
\end{align*}
Since $U$ is $\mathcal G$-measurable and integrable, this is precisely the defining identity for $U=\mathbb E[Y\mid\mathcal G]$. Thus the bounded orthogonality condition characterises the conditional expectation.
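For orientation, here is the simplest instance, under the illustrative assumption that $\mathcal G=\{\emptyset,\Omega\}$ is trivial. Every $\mathcal G$-measurable random variable is then constant, so a candidate is $U\equiv u$ and a bounded test variable is $Z\equiv c$ with $u,c\in\mathbb R$. The orthogonality condition reduces to
\begin{align*}
0=\mathbb E[(Y-u)c] = c\,(\mathbb E[Y]-u) \qquad\text{for every } c\in\mathbb R,
\end{align*}
which holds exactly when $u=\mathbb E[Y]$. This agrees with $\mathbb E[Y\mid\{\emptyset,\Omega\}]=\mathbb E[Y]$.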
[/step]
[step:Prove uniqueness by testing the difference against its truncations]
Let $U_1,U_2\in L^2(\Omega,\mathcal F,\mathbb P)$ be $\mathcal G$-measurable random variables satisfying
\begin{align*}
\mathbb E[(Y-U_i)Z]=0
\end{align*}
for every bounded $\mathcal G$-measurable real-valued random variable $Z$ and for $i\in\{1,2\}$. Define
\begin{align*}
W:\Omega&\to\mathbb R\\
\omega&\mapsto U_1(\omega)-U_2(\omega).
\end{align*}
Then $W$ is $\mathcal G$-measurable and belongs to $L^2(\Omega,\mathcal F,\mathbb P)$. Subtracting the two orthogonality identities gives
\begin{align*}
\mathbb E[WZ]=0
\end{align*}
for every bounded $\mathcal G$-measurable real-valued random variable $Z$.
For each $n\in\mathbb N$, define
\begin{align*}
T_n:\Omega&\to\mathbb R\\
\omega&\mapsto W(\omega)\mathbb 1_{\{|W|\le n\}}(\omega).
\end{align*}
Then $T_n$ is bounded and $\mathcal G$-measurable. Hence
\begin{align*}
0=\mathbb E[WT_n]=\mathbb E[W^2\mathbb 1_{\{|W|\le n\}}].
\end{align*}
Since $W^2\mathbb 1_{\{|W|\le n\}}$ is non-negative and increases pointwise to $W^2$, the monotone convergence theorem gives
\begin{align*}
\mathbb E[W^2]=0.
\end{align*}
Therefore $W=0$ $\mathbb P$-a.s., so $U_1=U_2$ in $L^2(\Omega,\mathcal F,\mathbb P)$. This proves uniqueness.
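The passage from $\mathbb E[W^2]=0$ to $W=0$ almost surely can be spelled out as follows. This is the standard argument via Markov's inequality, given here as a sketch. For each integer $k\ge 1$,
\begin{align*}
\mathbb P(|W|>1/k) = \mathbb P(W^2>1/k^2) \le k^2\,\mathbb E[W^2] = 0,
\end{align*}
and therefore
\begin{align*}
\mathbb P(W\neq 0)
= \mathbb P\Bigl(\bigcup_{k\ge 1}\{|W|>1/k\}\Bigr)
\le \sum_{k\ge 1}\mathbb P(|W|>1/k) = 0.
\end{align*}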
[/step]
[step:Extend orthogonality from bounded tests to all square-integrable forecasts]
Let $V\in L^2(\Omega,\mathcal G,\mathbb P)$. Define
\begin{align*}
V_n:\Omega&\to\mathbb R\\
\omega&\mapsto V(\omega)\mathbb 1_{\{|V|\le n\}}(\omega)
\end{align*}
for each $n\in\mathbb N$. Each $V_n$ is bounded and $\mathcal G$-measurable, so
\begin{align*}
\mathbb E[(Y-X)V_n]=0.
\end{align*}
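Moreover, $Y-X\in L^2(\Omega,\mathcal F,\mathbb P)$ by the first step, and the product $(Y-X)(V-V_n)$ is integrable because
\begin{align*}
|(Y-X)(V-V_n)|
\le \frac{1}{2}(Y-X)^2 + \frac{1}{2}(V-V_n)^2.
\end{align*}
The sequence $(V_n)_{n\in\mathbb N}$ converges to $V$ in $L^2(\Omega,\mathcal G,\mathbb P)$ by dominated convergence, since $(V-V_n)^2=V^2\mathbb 1_{\{|V|>n\}}\le V^2$ tends to $0$ pointwise. The Cauchy--Schwarz inequality therefore gives
\begin{align*}
|\mathbb E[(Y-X)(V-V_n)]|
\le \sqrt{\mathbb E[(Y-X)^2]}\,\sqrt{\mathbb E[(V-V_n)^2]}\to 0.
\end{align*}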
It follows that
\begin{align*}
\mathbb E[(Y-X)V]=0.
\end{align*}
[guided]
We already know orthogonality against bounded $\mathcal G$-measurable test variables. For the minimisation argument, the test variable will be $V-X$, where $V$ is an arbitrary square-integrable $\mathcal G$-measurable forecast. This variable need not be bounded, so we extend the identity by truncation.
Let $V\in L^2(\Omega,\mathcal G,\mathbb P)$. For each $n\in\mathbb N$, define
\begin{align*}
V_n:\Omega&\to\mathbb R\\
\omega&\mapsto V(\omega)\mathbb 1_{\{|V|\le n\}}(\omega).
\end{align*}
Because $V$ is $\mathcal G$-measurable, the set $\{|V|\le n\}$ lies in $\mathcal G$, and hence $V_n$ is $\mathcal G$-measurable. It is bounded by $n$, so the already proved bounded orthogonality gives
\begin{align*}
\mathbb E[(Y-X)V_n]=0.
\end{align*}
We now pass to the limit. The difference $V-V_n$ converges to $0$ in $L^2$ because
\begin{align*}
(V-V_n)^2 = V^2\mathbb 1_{\{|V|>n\}},
\end{align*}
and these non-negative functions decrease pointwise to $0$ while being dominated by the integrable function $V^2$, so dominated convergence gives $\mathbb E[(V-V_n)^2]\to 0$. Also $Y-X\in L^2$ because both $Y$ and $X$ are in $L^2$. The elementary inequality
\begin{align*}
2|(Y-X)(V-V_n)|\le (Y-X)^2+(V-V_n)^2
\end{align*}
shows that the product $(Y-X)(V-V_n)$ is integrable, and the Cauchy--Schwarz inequality gives
\begin{align*}
|\mathbb E[(Y-X)(V-V_n)]|
\le \sqrt{\mathbb E[(Y-X)^2]}\,\sqrt{\mathbb E[(V-V_n)^2]}\to 0.
\end{align*}
Therefore
\begin{align*}
\mathbb E[(Y-X)V]
=
\lim_{n\to\infty}\mathbb E[(Y-X)V_n]
=
0.
\end{align*}
Thus the orthogonality identity holds for every $V\in L^2(\Omega,\mathcal G,\mathbb P)$, not only for bounded test variables.
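The $L^2$ product estimate used in the limit is the Cauchy--Schwarz inequality, and it follows from the same elementary inequality $2|ab|\le a^2+b^2$. As a sketch, write $S:=Y-X$ and $T:=V-V_n$ (names introduced here only for illustration) and assume $\mathbb E[S^2]>0$ and $\mathbb E[T^2]>0$; otherwise $ST=0$ almost surely and the estimate is trivial. Applying the elementary inequality with $a=|S|/\sqrt{\mathbb E[S^2]}$ and $b=|T|/\sqrt{\mathbb E[T^2]}$ and taking expectations gives
\begin{align*}
\frac{2\,\mathbb E[|ST|]}{\sqrt{\mathbb E[S^2]}\,\sqrt{\mathbb E[T^2]}}
\le \frac{\mathbb E[S^2]}{\mathbb E[S^2]}+\frac{\mathbb E[T^2]}{\mathbb E[T^2]} = 2,
\end{align*}
so that
\begin{align*}
|\mathbb E[ST]| \le \mathbb E[|ST|] \le \sqrt{\mathbb E[S^2]}\,\sqrt{\mathbb E[T^2]}.
\end{align*}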
[/guided]
[/step]
[step:Use the Pythagorean identity to prove the mean-square minimisation property]
Let $V\in L^2(\Omega,\mathcal G,\mathbb P)$. Define
\begin{align*}
W:\Omega&\to\mathbb R\\
\omega&\mapsto V(\omega)-X(\omega).
\end{align*}
Then $W\in L^2(\Omega,\mathcal G,\mathbb P)$, because $X\in L^2(\Omega,\mathcal G,\mathbb P)$ by the first step. By the extended orthogonality just proved,
\begin{align*}
\mathbb E[(Y-X)W]=0.
\end{align*}
Since $Y-V=(Y-X)-W$, expanding the square gives
\begin{align*}
\mathbb E[(Y-V)^2]
&= \mathbb E[((Y-X)-W)^2] \\
&= \mathbb E[(Y-X)^2] - 2\mathbb E[(Y-X)W] + \mathbb E[W^2] \\
&= \mathbb E[(Y-X)^2] + \mathbb E[(V-X)^2].
\end{align*}
Since $\mathbb E[(V-X)^2]\ge 0$, the mean-square error $J$ satisfies
\begin{align*}
J(V) \ge J(X).
\end{align*}
Equality holds if and only if
\begin{align*}
\mathbb E[(V-X)^2]=0,
\end{align*}
which is equivalent to $V=X$ $\mathbb P$-a.s. Therefore $X=\mathbb E[Y\mid\mathcal G]$ is the unique minimiser of $J$ over $L^2(\Omega,\mathcal G,\mathbb P)$.
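As an illustration, under the assumption (made only for this example) that $\mathcal G=\{\emptyset,\Omega\}$ is trivial, we have $X=\mathbb E[Y]$ and every $V\in L^2(\Omega,\mathcal G,\mathbb P)$ is a constant $v\in\mathbb R$. The Pythagorean identity then reads
\begin{align*}
\mathbb E[(Y-v)^2]
= \mathbb E[(Y-\mathbb E[Y])^2] + (v-\mathbb E[Y])^2
= \operatorname{Var}(Y) + (v-\mathbb E[Y])^2,
\end{align*}
so the constant forecast with the smallest mean-square error is $v=\mathbb E[Y]$, and the minimal error is $\operatorname{Var}(Y)$.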
[/step]