[proofplan]
Write the prediction error as the sum of the unpredictable residual $Y-m_X(X)$ and the discrepancy $m_X(X)-g(X)$ between the proposed predictor and the conditional mean. Expanding the square gives the desired risk decomposition once the cross term is shown to have expectation zero. That orthogonality follows from the defining property of conditional expectation because $m_X(X)-g(X)$ is a square-integrable $\sigma(X)$-measurable random variable. The non-negative discrepancy term then identifies the minimizers exactly.
[/proofplan]
[step:Define the residual and the predictor discrepancy]
Fix an admissible Borel measurable function
\begin{align*}
g:\mathbb R^p \to \mathbb R
\end{align*}
with $g(X)\in L^2(\Omega,\mathcal F,\mathbb P)$. Define the residual random variable
\begin{align*}
U:\Omega &\to \mathbb R \\
\omega &\mapsto Y(\omega)-m_X(X(\omega))
\end{align*}
and the discrepancy random variable
\begin{align*}
V:\Omega &\to \mathbb R \\
\omega &\mapsto m_X(X(\omega))-g(X(\omega)).
\end{align*}
Since $Y\in L^2(\Omega,\mathcal F,\mathbb P)$, the conditional Jensen inequality gives $m_X(X)=\mathbb E[Y\mid\sigma(X)]\in L^2(\Omega,\mathcal F,\mathbb P)$. Together with $g(X)\in L^2(\Omega,\mathcal F,\mathbb P)$ and the fact that $L^2(\Omega,\mathcal F,\mathbb P)$ is a vector space, this shows that both $U$ and $V$ belong to $L^2(\Omega,\mathcal F,\mathbb P)$. Also, $V$ is $\sigma(X)$-measurable because both $m_X(X)$ and $g(X)$ are $\sigma(X)$-measurable.
[/step]
[step:Show the residual is orthogonal to every square-integrable function of $X$]
We prove that
\begin{align*}
\mathbb E[UV]=0.
\end{align*}
For each $k\in\mathbb N$, define the bounded truncation
\begin{align*}
V_k:\Omega &\to \mathbb R \\
\omega &\mapsto \max\{-k,\min\{V(\omega),k\}\}.
\end{align*}
Each $V_k$ is bounded and $\sigma(X)$-measurable. Since $m_X(X)=\mathbb E[Y\mid\sigma(X)]$ $\mathbb P$-almost surely, the defining property of conditional expectation gives
\begin{align*}
\mathbb E[YV_k]=\mathbb E[m_X(X)V_k].
\end{align*}
Subtracting the right-hand side from the left-hand side yields
\begin{align*}
\mathbb E[UV_k]=\mathbb E[(Y-m_X(X))V_k]=0.
\end{align*}
Because $V_k\to V$ pointwise and $|V_k-V|^2\leq 4|V|^2$ with $V\in L^2(\Omega,\mathcal F,\mathbb P)$, the dominated convergence theorem gives $V_k\to V$ in $L^2(\Omega,\mathcal F,\mathbb P)$. Applying the [Cauchy-Schwarz inequality](/theorems/432) to the product $U(V_k-V)$ gives
\begin{align*}
|\mathbb E[U(V_k-V)]|
\leq
\mathbb E[U^2]^{1/2}\mathbb E[(V_k-V)^2]^{1/2}
\to 0.
\end{align*}
Therefore
\begin{align*}
\mathbb E[UV]
=
\lim_{k\to\infty}\mathbb E[UV_k]
=
0.
\end{align*}
[guided]
The only delicate point is that the defining property of conditional expectation is immediate for bounded $\sigma(X)$-measurable test variables, while $V=m_X(X)-g(X)$ is only known to be square-integrable. We therefore approximate $V$ by bounded $\sigma(X)$-measurable truncations.
For each $k\in\mathbb N$, define
\begin{align*}
V_k:\Omega &\to \mathbb R \\
\omega &\mapsto \max\{-k,\min\{V(\omega),k\}\}.
\end{align*}
Since $V$ is $\sigma(X)$-measurable and the truncation map $t\mapsto \max\{-k,\min\{t,k\}\}$ is Borel measurable, $V_k$ is $\sigma(X)$-measurable. It is also bounded by $k$. Thus the defining property of $m_X(X)=\mathbb E[Y\mid\sigma(X)]$ applies to $V_k$ and gives
\begin{align*}
\mathbb E[YV_k]=\mathbb E[m_X(X)V_k].
\end{align*}
Equivalently,
\begin{align*}
\mathbb E[(Y-m_X(X))V_k]=0.
\end{align*}
In the notation $U=Y-m_X(X)$, this is
\begin{align*}
\mathbb E[UV_k]=0.
\end{align*}
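It remains to pass from $V_k$ to $V$. For every $\omega\in\Omega$ we have $V_k(\omega)=V(\omega)$ as soon as $k\geq |V(\omega)|$, so $V_k(\omega)\to V(\omega)$. Moreover $|V_k(\omega)|\leq |V(\omega)|$, so the triangle inequality gives $|V_k(\omega)-V(\omega)|\leq 2|V(\omega)|$ and hence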
\begin{align*}
|V_k(\omega)-V(\omega)|^2\leq 4|V(\omega)|^2.
\end{align*}
The function $4|V|^2$ is integrable because $V\in L^2(\Omega,\mathcal F,\mathbb P)$, so the dominated convergence theorem gives $V_k\to V$ in $L^2(\Omega,\mathcal F,\mathbb P)$. Since $U\in L^2(\Omega,\mathcal F,\mathbb P)$, the Cauchy-Schwarz inequality gives
\begin{align*}
|\mathbb E[U(V_k-V)]|
\leq
\mathbb E[U^2]^{1/2}\mathbb E[(V_k-V)^2]^{1/2}
\to 0.
\end{align*}
Thus
\begin{align*}
\mathbb E[UV]
=
\lim_{k\to\infty}\mathbb E[UV_k]
=
0.
\end{align*}
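This is the orthogonality statement. Since the argument used only that $V$ is square-integrable and $\sigma(X)$-measurable, it shows that the residual $Y-\mathbb E[Y\mid\sigma(X)]$ has zero inner product with every square-integrable function of $X$.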
[/guided]
[/step]
[step:Expand the squared error and remove the cross term]
Using
\begin{align*}
Y-g(X)=\bigl(Y-m_X(X)\bigr)+\bigl(m_X(X)-g(X)\bigr)=U+V,
\end{align*}
we expand the square:
\begin{align*}
\mathbb E[(Y-g(X))^2]
&=
\mathbb E[(U+V)^2] \\
&=
\mathbb E[U^2]+2\mathbb E[UV]+\mathbb E[V^2].
\end{align*}
By the orthogonality established above, $\mathbb E[UV]=0$. Hence
\begin{align*}
\mathbb E[(Y-g(X))^2]
=
\mathbb E[(Y-m_X(X))^2]
+
\mathbb E[(m_X(X)-g(X))^2].
\end{align*}
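As an illustration only, consider a hypothetical model that is not part of the statement: $p=1$, $X$ is standard normal, and $Y=X+\varepsilon$, where $\varepsilon$ is independent of $X$ with $\mathbb E[\varepsilon]=0$ and $\mathbb E[\varepsilon^2]=\sigma^2<\infty$. Under these assumptions $m_X(x)=x$, and for an affine predictor $g(x)=ax+b$ with $a,b\in\mathbb R$ the two pieces are $Y-m_X(X)=\varepsilon$ and $m_X(X)-g(X)=(1-a)X-b$. The decomposition then reads
\begin{align*}
\mathbb E[(Y-g(X))^2]
&=
\mathbb E[\varepsilon^2]+\mathbb E[((1-a)X-b)^2] \\
&=
\sigma^2+(1-a)^2+b^2.
\end{align*}
This agrees with expanding $((1-a)X-b+\varepsilon)^2$ directly, since $\mathbb E[X]=0$, $\mathbb E[X^2]=1$, $\mathbb E[\varepsilon]=0$, and $\mathbb E[X\varepsilon]=0$ in this model.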
[/step]
[step:Identify the minimum and the equality case]
The second term in the decomposition is non-negative:
\begin{align*}
\mathbb E[(m_X(X)-g(X))^2]\geq 0.
\end{align*}
Therefore, for every admissible $g$,
\begin{align*}
\mathbb E[(Y-g(X))^2]
\geq
\mathbb E[(Y-m_X(X))^2].
\end{align*}
Taking $g=m_X$, which is admissible because $m_X(X)\in L^2(\Omega,\mathcal F,\mathbb P)$, gives equality, so the minimum risk is
\begin{align*}
\mathbb E[(Y-m_X(X))^2]
=
\mathbb E[(Y-\mathbb E[Y\mid\sigma(X)])^2].
\end{align*}
Moreover, equality holds for an admissible $g$ exactly when
\begin{align*}
\mathbb E[(m_X(X)-g(X))^2]=0,
\end{align*}
which, because $(m_X(X)-g(X))^2$ is non-negative, is equivalent to $m_X(X)=g(X)$ $\mathbb P$-almost surely. This proves both the minimizing property and the stated equality case.
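To see why the equality case is stated only almost surely, consider a hypothetical example that is not part of the statement: $p=1$, $X$ is standard normal, and $m_X(x)=x$ (for instance when $Y=X+\varepsilon$ with $\varepsilon$ square-integrable, centered, and independent of $X$). Define
\begin{align*}
g(x)=
\begin{cases}
x, & x\neq 0, \\
1, & x=0.
\end{cases}
\end{align*}
Then $g\neq m_X$ as functions on $\mathbb R$, but $\mathbb P(X=0)=0$, so $g(X)=m_X(X)$ $\mathbb P$-almost surely and
\begin{align*}
\mathbb E[(m_X(X)-g(X))^2]=0.
\end{align*}
Under these assumptions $g$ attains the minimum risk as well, so the minimizer is determined only up to sets of $\mathbb P_X$-measure zero, where $\mathbb P_X$ denotes the distribution of $X$.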
[/step]