[proofplan]
We prove Property (iv), the orthogonal decomposition that identifies $\mathbb{E}[Z \mid W]$ as the best least-squares predictor. The strategy is to add and subtract $\mathbb{E}[Z \mid W]$ inside $(Z - g(W))^2$, expand the square into three terms, and show that the cross term vanishes. The cross term is eliminated by conditioning on $W$, applying the "taking out what is known" property (iii) to extract $(\mathbb{E}[Z \mid W] - g(W))$ as a $\sigma(W)$-measurable factor, and using the identity $\mathbb{E}[Z - \mathbb{E}[Z \mid W] \mid W] = 0$, which follows from linearity and the $\sigma(W)$-measurability of $\mathbb{E}[Z \mid W]$.
[/proofplan]
[step:Add and subtract $\mathbb{E}[Z \mid W]$ and expand the square]
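Write $m(W) := \mathbb{E}[Z \mid W]$ for brevity; by the Doob--Dynkin lemma, $m$ can be taken to be a Borel function $m : \mathbb{R}^d \to \mathbb{R}$. Decompose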
\begin{align*}
Z - g(W) = \bigl(Z - m(W)\bigr) + \bigl(m(W) - g(W)\bigr).
\end{align*}
Expanding the square and taking expectations:
\begin{align*}
\mathbb{E}[(Z - g(W))^2] &= \mathbb{E}\bigl[\bigl(Z - m(W)\bigr)^2\bigr] + \mathbb{E}\bigl[\bigl(m(W) - g(W)\bigr)^2\bigr] \\
&\quad + 2\,\mathbb{E}\bigl[\bigl(Z - m(W)\bigr)\bigl(m(W) - g(W)\bigr)\bigr].
\end{align*}
The expansion uses the algebraic identity $(a + b)^2 = a^2 + 2ab + b^2$ applied with $a = Z - m(W)$ and $b = m(W) - g(W)$, together with linearity of expectation.
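For a concrete check, the following minimal Python sketch evaluates each term of the expansion exactly on a hypothetical toy distribution: $W$ takes the values $0, 1, 2$ (so $d = 1$) and the competitor is $g(w) = 2w$. These choices, and the helper names `PMF`, `expectation` and `cond_mean`, are illustrative assumptions rather than part of the proof. Rational arithmetic from the standard `fractions` module makes the identity hold exactly instead of up to rounding.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (W, Z), listed as (w, z, probability); masses sum to 1.
PMF = [
    (0, 1, F(1, 6)), (0, 3, F(1, 6)),
    (1, 0, F(1, 4)), (1, 4, F(1, 12)),
    (2, 2, F(1, 6)), (2, 5, F(1, 6)),
]


def expectation(pmf, f):
    """E[f(W, Z)] under a finite joint pmf."""
    return sum(p * f(w, z) for w, z, p in pmf)


def cond_mean(pmf):
    """m(w) = E[Z | W = w] for each atom w of positive mass."""
    mass, first = {}, {}
    for w, z, p in pmf:
        mass[w] = mass.get(w, 0) + p
        first[w] = first.get(w, 0) + p * z
    return {w: first[w] / mass[w] for w in mass}


def g(w):
    return 2 * w  # hypothetical competing predictor


m = cond_mean(PMF)  # {0: 2, 1: 1, 2: 7/2}

lhs = expectation(PMF, lambda w, z: (z - g(w)) ** 2)
a_sq = expectation(PMF, lambda w, z: (z - m[w]) ** 2)
b_sq = expectation(PMF, lambda w, z: (m[w] - g(w)) ** 2)
cross = expectation(PMF, lambda w, z: (z - m[w]) * (m[w] - g(w)))

# (a + b)^2 = a^2 + 2ab + b^2 holds pointwise, so it survives taking expectations.
assert lhs == a_sq + b_sq + 2 * cross
print(lhs, a_sq, b_sq, cross)  # 23/6 25/12 7/4 0
```

In this instance the cross term already evaluates to $0$; the next step shows that this is the case for every square-integrable $g$.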
[/step]
[step:Show the cross term vanishes by conditioning on $W$]
It remains to show
\begin{align*}
\mathbb{E}\bigl[(Z - m(W))(m(W) - g(W))\bigr] = 0.
\end{align*}
Apply the tower property (ii) with the $\sigma$-algebra $\sigma(W)$:
\begin{align*}
\mathbb{E}\bigl[(Z - m(W))(m(W) - g(W))\bigr] = \mathbb{E}\Bigl[\mathbb{E}\bigl[(Z - m(W))(m(W) - g(W)) \mid W\bigr]\Bigr].
\end{align*}
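The tower property applies because $(Z - m(W))(m(W) - g(W))$ is integrable: by the Cauchy--Schwarz inequality, $\mathbb{E}[|(Z - m(W))(m(W) - g(W))|] \leq \|Z - m(W)\|_{L^2} \cdot \|m(W) - g(W)\|_{L^2}$, and both norms are finite. Indeed, $\mathbb{E}[Z^2] < \infty$ and $\mathbb{E}[(g(W))^2] < \infty$, while the conditional Jensen inequality gives $\mathbb{E}[m(W)^2] \leq \mathbb{E}[Z^2] < \infty$; hence $Z$, $m(W)$ and $g(W)$ all lie in $L^2$, and so do their differences.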
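The integrability argument can be sanity-checked numerically. This sketch uses a hypothetical finite distribution of $(W, Z)$ and the hypothetical competitor $g(w) = 2w$ (both illustrative assumptions, as are the helper names) and confirms the Cauchy--Schwarz bound on $\mathbb{E}[|XY|]$ with $X = Z - m(W)$, $Y = m(W) - g(W)$, together with the conditional Jensen bound $\mathbb{E}[m(W)^2] \leq \mathbb{E}[Z^2]$ that guarantees $m(W) \in L^2$.

```python
import math
from fractions import Fraction as F

# Hypothetical joint pmf of (W, Z), listed as (w, z, probability).
PMF = [
    (0, 1, F(1, 6)), (0, 3, F(1, 6)),
    (1, 0, F(1, 4)), (1, 4, F(1, 12)),
    (2, 2, F(1, 6)), (2, 5, F(1, 6)),
]


def expectation(pmf, f):
    """E[f(W, Z)] under a finite joint pmf."""
    return sum(p * f(w, z) for w, z, p in pmf)


def cond_mean(pmf):
    """m(w) = E[Z | W = w] for each atom w of positive mass."""
    mass, first = {}, {}
    for w, z, p in pmf:
        mass[w] = mass.get(w, 0) + p
        first[w] = first.get(w, 0) + p * z
    return {w: first[w] / mass[w] for w in mass}


m = cond_mean(PMF)


def g(w):
    return 2 * w  # hypothetical square-integrable competitor


def x(w, z):
    return z - m[w]  # prediction error Z - m(W)


def y(w, z):
    return m[w] - g(w)  # sigma(W)-measurable factor m(W) - g(W)


# Cauchy-Schwarz: E|XY| <= ||X||_2 * ||Y||_2, so XY is integrable.
abs_cross = expectation(PMF, lambda w, z: abs(x(w, z) * y(w, z)))
cs_bound = math.sqrt(expectation(PMF, lambda w, z: x(w, z) ** 2)) * math.sqrt(
    expectation(PMF, lambda w, z: y(w, z) ** 2)
)
assert abs_cross <= cs_bound

# Conditional Jensen: E[m(W)^2] <= E[Z^2], so m(W) is in L^2 whenever Z is.
second_m = expectation(PMF, lambda w, z: m[w] ** 2)
second_z = expectation(PMF, lambda w, z: z ** 2)
assert second_m <= second_z
print(abs_cross, cs_bound, second_m, second_z)
```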
Inside the inner conditional expectation, $m(W) - g(W)$ is $\sigma(W)$-measurable. Since both $m(W) - g(W)$ and $Z - m(W)$ are square-integrable, Property (iii) ("taking out what is known") gives
\begin{align*}
\mathbb{E}\bigl[(Z - m(W))(m(W) - g(W)) \mid W\bigr] = (m(W) - g(W)) \cdot \mathbb{E}[Z - m(W) \mid W].
\end{align*}
Now evaluate $\mathbb{E}[Z - m(W) \mid W]$. By linearity of conditional expectation and the fact that $m(W) = \mathbb{E}[Z \mid W]$ is $\sigma(W)$-measurable:
\begin{align*}
\mathbb{E}[Z - m(W) \mid W] = \mathbb{E}[Z \mid W] - \mathbb{E}[m(W) \mid W] = m(W) - m(W) = 0.
\end{align*}
Therefore the inner conditional expectation is $(m(W) - g(W)) \cdot 0 = 0$, and consequently
\begin{align*}
\mathbb{E}\bigl[(Z - m(W))(m(W) - g(W))\bigr] = \mathbb{E}[0] = 0.
\end{align*}
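A small exact computation illustrates the two facts used in this step. The distribution, the helper names and the competitors $g$ below are hypothetical, chosen only for illustration: the sketch checks that the error $Z - m(W)$ has conditional mean zero on each value of $W$, and that consequently the cross term is zero whatever $g$ is.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (W, Z), listed as (w, z, probability).
PMF = [
    (0, 1, F(1, 6)), (0, 3, F(1, 6)),
    (1, 0, F(1, 4)), (1, 4, F(1, 12)),
    (2, 2, F(1, 6)), (2, 5, F(1, 6)),
]


def expectation(pmf, f):
    """E[f(W, Z)] under a finite joint pmf."""
    return sum(p * f(w, z) for w, z, p in pmf)


def cond_mean(pmf, f=lambda w, z: z):
    """w -> E[f(W, Z) | W = w] for each atom w of positive mass."""
    mass, first = {}, {}
    for w, z, p in pmf:
        mass[w] = mass.get(w, 0) + p
        first[w] = first.get(w, 0) + p * f(w, z)
    return {w: first[w] / mass[w] for w in mass}


m = cond_mean(PMF)  # m(w) = E[Z | W = w]

# The prediction error has conditional mean zero on every atom of W.
error_means = cond_mean(PMF, lambda w, z: z - m[w])

# Hence the cross term vanishes for any competitor g (hypothetical examples).
candidates = {
    "2w": lambda w: 2 * w,
    "w^2 - 1": lambda w: w ** 2 - 1,
    "constant 10": lambda w: 10,
}
cross_terms = {
    name: expectation(PMF, lambda w, z: (z - m[w]) * (m[w] - g(w)))
    for name, g in candidates.items()
}
print(error_means)  # {0: Fraction(0, 1), 1: Fraction(0, 1), 2: Fraction(0, 1)}
print(cross_terms)  # every value is Fraction(0, 1)
```

The computation mirrors the proof: the cross term splits over the atoms of $W$ as $\sum_w (m(w) - g(w))\,\mathbb{E}[(Z - m(W))\mathbf{1}\{W = w\}]$, and each inner factor is zero.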
[guided]
The core of this step is showing that the "prediction error" $Z - \mathbb{E}[Z \mid W]$ is orthogonal (in the $L^2$ sense) to every square-integrable $\sigma(W)$-measurable random variable.
We use the tower property to move from a global expectation to a conditional one:
\begin{align*}
\mathbb{E}\bigl[(Z - m(W))(m(W) - g(W))\bigr] = \mathbb{E}\Bigl[\mathbb{E}\bigl[(Z - m(W))(m(W) - g(W)) \mid W\bigr]\Bigr].
\end{align*}
Why does this help? Because conditioning on $W$ makes $m(W) - g(W)$ a known quantity (it is $\sigma(W)$-measurable), so we can pull it outside the conditional expectation using Property (iii):
\begin{align*}
\mathbb{E}\bigl[(Z - m(W))(m(W) - g(W)) \mid W\bigr] = (m(W) - g(W)) \cdot \mathbb{E}[Z - m(W) \mid W].
\end{align*}
Property (iii) applies here because both factors are in $L^2$: $\mathbb{E}[Z^2] < \infty$ and, by the conditional Jensen inequality, $\mathbb{E}[m(W)^2] \leq \mathbb{E}[Z^2] < \infty$, so $\mathbb{E}[(Z - m(W))^2] < \infty$; likewise $\mathbb{E}[(m(W) - g(W))^2] < \infty$ because both $m(W)$ and $g(W)$ are in $L^2$.
Now the decisive computation: $\mathbb{E}[Z - m(W) \mid W] = \mathbb{E}[Z \mid W] - m(W) = 0$. This uses the linearity of conditional expectation and the fact that $m(W) = \mathbb{E}[Z \mid W]$ is $\sigma(W)$-measurable, so $\mathbb{E}[m(W) \mid W] = m(W)$. In other words, $Z - \mathbb{E}[Z \mid W]$ has conditional mean zero given $W$; this is exactly what makes $\mathbb{E}[Z \mid W]$ the orthogonal projection of $Z$ onto the closed subspace of square-integrable $\sigma(W)$-measurable random variables in $L^2$.
Since the inner conditional expectation is zero, the outer expectation is zero, and the cross term vanishes.
[/guided]
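The projection reading can be checked numerically. On a finite probability space, the $\sigma(W)$-measurable random variables are exactly the linear combinations of the indicators $\mathbf{1}\{W = k\}$, so a probability-weighted least-squares fit of $Z$ on those indicators should recover $m$, and its residual should be orthogonal to any function of $W$. The sketch assumes NumPy and a hypothetical three-point distribution for $W$; the array names and the test function `h` are illustrative.

```python
import numpy as np

# Hypothetical joint pmf of (W, Z): row i has W = w[i], Z = z[i], probability p[i].
w = np.array([0, 0, 1, 1, 2, 2])
z = np.array([1.0, 3.0, 0.0, 4.0, 2.0, 5.0])
p = np.array([1 / 6, 1 / 6, 1 / 4, 1 / 12, 1 / 6, 1 / 6])

# Indicator design matrix: column k is 1{W = k}. Its column span is the space of
# sigma(W)-measurable random variables on this finite probability space.
X = (w[:, None] == np.arange(3)[None, :]).astype(float)

# Minimise sum_i p_i (z_i - (X beta)_i)^2 by rescaling each row by sqrt(p_i).
sw = np.sqrt(p)
beta, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
residual = z - X @ beta  # realisations of Z - m(W)

# Orthogonality: E[(Z - m(W)) h(W)] = 0 for an arbitrary (hypothetical) h.
h = np.array([5.0, -2.0, 0.7])[w]
inner = np.sum(p * residual * h)
print(beta)   # approximately [2, 1, 3.5], i.e. m(0), m(1), m(2)
print(inner)  # zero up to floating-point rounding
```

The fitted coefficients are the group-wise weighted means of $Z$, which is the finite-space form of $\mathbb{E}[Z \mid W]$; the vanishing of `inner` for an arbitrary `h` is the orthogonality statement above.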
[/step]
[step:Conclude the orthogonal decomposition and uniqueness of the minimiser]
Combining the expansion with the vanishing cross term:
\begin{align*}
\mathbb{E}[(Z - g(W))^2] = \mathbb{E}[(Z - \mathbb{E}[Z \mid W])^2] + \mathbb{E}[(\mathbb{E}[Z \mid W] - g(W))^2].
\end{align*}
The first term on the right-hand side is the irreducible error: it does not depend on the choice of $g$. The second term satisfies $\mathbb{E}[(\mathbb{E}[Z \mid W] - g(W))^2] \geq 0$, with equality if and only if $g(W) = \mathbb{E}[Z \mid W]$ almost surely. Therefore $m$ minimises $g \mapsto \mathbb{E}[(Z - g(W))^2]$ over all measurable $g : \mathbb{R}^d \to \mathbb{R}$ with $\mathbb{E}[(g(W))^2] < \infty$, and the minimiser is unique up to null sets: any other minimiser $g$ satisfies $g(W) = m(W)$ almost surely, that is, $g = m$ almost everywhere with respect to the distribution of $W$.
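The decomposition and the minimisation claim can both be checked by brute force on a hypothetical toy distribution with $W \in \{0, 1, 2\}$ (the distribution, the value grid and the helper names are illustrative assumptions). Because $W$ takes only three values, a predictor $g$ enters only through the triple $(g(0), g(1), g(2))$, so the sketch scans a grid of triples, confirms that the risk always equals the irreducible term plus the gap, and locates the minimiser.

```python
import itertools
from fractions import Fraction as F

# Hypothetical joint pmf of (W, Z), listed as (w, z, probability).
PMF = [
    (0, 1, F(1, 6)), (0, 3, F(1, 6)),
    (1, 0, F(1, 4)), (1, 4, F(1, 12)),
    (2, 2, F(1, 6)), (2, 5, F(1, 6)),
]


def expectation(pmf, f):
    """E[f(W, Z)] under a finite joint pmf."""
    return sum(p * f(w, z) for w, z, p in pmf)


def cond_mean(pmf):
    """m(w) = E[Z | W = w] for each atom w of positive mass."""
    mass, first = {}, {}
    for w, z, p in pmf:
        mass[w] = mass.get(w, 0) + p
        first[w] = first.get(w, 0) + p * z
    return {w: first[w] / mass[w] for w in mass}


m = cond_mean(PMF)
irreducible = expectation(PMF, lambda w, z: (z - m[w]) ** 2)  # first term: no g involved


def risk(values):
    """E[(Z - g(W))^2] for the predictor with g(w) = values[w], w = 0, 1, 2."""
    return expectation(PMF, lambda w, z: (z - values[w]) ** 2)


def gap(values):
    """Second term E[(m(W) - g(W))^2], which is always >= 0."""
    return expectation(PMF, lambda w, z: (m[w] - values[w]) ** 2)


# Only g(0), g(1), g(2) matter, so search a grid of value triples.
grid = [F(k, 2) for k in range(9)]  # 0, 1/2, ..., 4
triples = list(itertools.product(grid, repeat=3))
assert all(risk(v) == irreducible + gap(v) for v in triples)  # the decomposition

best = min(triples, key=risk)
print(best, risk(best))
```

On this grid the minimiser is $(2, 1, 7/2)$, which is exactly $(m(0), m(1), m(2))$, and its risk $25/12$ equals the irreducible term. Values of $g$ away from $\{0, 1, 2\}$ never affect the risk, which is why uniqueness holds only almost everywhere with respect to the distribution of $W$.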
[/step]