[proofplan]
We prove both directions directly from the Hilbert-space [inner product](/page/Inner%20Product). If $\hat X$ is a minimizer, then every line $\hat X+tZ$ inside $\mathcal H$ has minimal squared error at $t=0$, and expanding this quadratic in $t$ forces the linear coefficient to vanish. Conversely, if the residual $X-\hat X$ is orthogonal to every vector in $\mathcal H$, then expanding $X-Z=(X-\hat X)+(\hat X-Z)$ gives a Pythagorean identity, which shows that no $Z \in \mathcal H$ can have smaller error.
[/proofplan]
[step:Expand the error along every affine line in $\mathcal H$]
Assume first that $\hat X$ minimizes $\mathbb E[|X-Z|^2]$ over $Z \in \mathcal H$. The probability space $(\Omega,\mathcal F,\mathbb P)$ is the one fixed in the theorem statement, and all random vectors are regarded as elements of $L^2(\Omega;\mathbb R^n)$. Define the residual random vector $R \in L^2(\Omega;\mathbb R^n)$ by $R := X-\hat X$.
Fix $Z \in \mathcal H$. Since $\mathcal H$ is a linear subspace and $\hat X,Z \in \mathcal H$, we have $\hat X+tZ \in \mathcal H$ for every $t \in \mathbb R$. Define the function $\phi:\mathbb R \to \mathbb R$ by
\begin{align*}
\phi(t) := \mathbb E[|X-(\hat X+tZ)|^2].
\end{align*}
Because $R,Z \in L^2(\Omega;\mathbb R^n)$, the quantities $\mathbb E[|R|^2]$ and $\mathbb E[|Z|^2]$ are finite by square-integrability, and $\mathbb E[R^\top Z]$ is finite by the [Cauchy-Schwarz inequality](/theorems/432) in $L^2(\Omega;\mathbb R^n)$. Expanding the Euclidean square pointwise and taking expectations gives
\begin{align*}
\phi(t)=\mathbb E[|R-tZ|^2]=\mathbb E[|R|^2]-2t\,\mathbb E[R^\top Z]+t^2\mathbb E[|Z|^2].
\end{align*}
Since $\hat X$ is a minimizer, $\phi(0)\leq \phi(t)$ for every $t \in \mathbb R$. Hence
\begin{align*}
0\leq \phi(t)-\phi(0)=-2t\,\mathbb E[R^\top Z]+t^2\mathbb E[|Z|^2]
\end{align*}
for every $t \in \mathbb R$.
[guided]
Assume that $\hat X$ minimizes the least-squares error over $\mathcal H$. The probability space $(\Omega,\mathcal F,\mathbb P)$ is the one fixed in the theorem statement, and the notation $L^2(\Omega;\mathbb R^n)$ means square-integrable $\mathbb R^n$-valued random vectors modulo almost-sure equality. The residual is the part of $X$ not captured by $\hat X$, so define $R \in L^2(\Omega;\mathbb R^n)$ by
\begin{align*}
R := X-\hat X.
\end{align*}
To test whether $R$ is orthogonal to a direction $Z \in \mathcal H$, we move from $\hat X$ along the line determined by $Z$. Fix $Z \in \mathcal H$. Since $\mathcal H$ is a linear subspace, every perturbation $\hat X+tZ$ also belongs to $\mathcal H$ for $t \in \mathbb R$. Define $\phi:\mathbb R \to \mathbb R$ by
\begin{align*}
\phi(t) := \mathbb E[|X-(\hat X+tZ)|^2].
\end{align*}
This function records the least-squares error along the admissible line through $\hat X$.
We may expand this expression because $R,Z \in L^2(\Omega;\mathbb R^n)$. In particular, $\mathbb E[R^\top Z]$ is finite by the Cauchy-Schwarz inequality in the [Hilbert space](/page/Hilbert%20Space) $L^2(\Omega;\mathbb R^n)$. Since $X-(\hat X+tZ)=R-tZ$, the Euclidean identity $|a-b|^2=|a|^2-2a^\top b+|b|^2$ gives pointwise
\begin{align*}
|R-tZ|^2=|R|^2-2tR^\top Z+t^2|Z|^2.
\end{align*}
Taking expectations yields
\begin{align*}
\phi(t)=\mathbb E[|R|^2]-2t\,\mathbb E[R^\top Z]+t^2\mathbb E[|Z|^2].
\end{align*}
The minimality of $\hat X$ says that $t=0$ minimizes this quadratic function, because $\hat X+tZ$ is admissible for every real $t$. Therefore
\begin{align*}
0\leq \phi(t)-\phi(0)=-2t\,\mathbb E[R^\top Z]+t^2\mathbb E[|Z|^2]
\end{align*}
for every $t \in \mathbb R$.
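As a cross-check on this inequality, the following sketch reads it through elementary calculus. It uses only the expansion of $\phi$ above; the derivative notation $\phi'$ is introduced here for illustration and is not part of the original argument. Since $\phi$ is a polynomial of degree at most two in $t$, it is differentiable, with
\begin{align*}
\phi'(t)=-2\,\mathbb E[R^\top Z]+2t\,\mathbb E[|Z|^2], \qquad \phi'(0)=-2\,\mathbb E[R^\top Z].
\end{align*}
A differentiable function on $\mathbb R$ with a minimum at $t=0$ satisfies $\phi'(0)=0$, which is another way to see that the linear coefficient must vanish. The next step reaches the same conclusion directly from the inequality, without derivatives.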
[/guided]
[/step]
[step:Force the residual to be orthogonal to every direction in $\mathcal H$]
Keep $Z \in \mathcal H$ fixed and set
\begin{align*}
b := \mathbb E[R^\top Z], \qquad a := \mathbb E[|Z|^2].
\end{align*}
The inequality from the previous step is
\begin{align*}
0\leq -2tb+t^2a
\end{align*}
for every $t \in \mathbb R$. If $b>0$, then any $t$ with $0<t<2b/(a+1)$ gives $-2tb+t^2a=t(-2b+ta)<0$, because $ta\leq t(a+1)<2b$. If $b<0$, then any $t<0$ with $|t|<2|b|/(a+1)$ gives $-2tb+t^2a=|t|(2b+|t|a)<0$, because $|t|a\leq |t|(a+1)<-2b$. Both alternatives contradict the inequality. Thus $b=0$, meaning
\begin{align*}
\mathbb E[(X-\hat X)^\top Z]=0.
\end{align*}
Since $Z \in \mathcal H$ was arbitrary, the residual $X-\hat X$ is orthogonal to every element of $\mathcal H$.
[guided]
We continue from the variational inequality obtained in the previous step. Fix $Z \in \mathcal H$ and define the [real numbers](/page/Real%20Numbers) $b \in \mathbb R$ and $a \in [0,\infty)$ by
\begin{align*}
b := \mathbb E[R^\top Z], \qquad a := \mathbb E[|Z|^2].
\end{align*}
The finiteness of $b$ follows from the Cauchy-Schwarz inequality in $L^2(\Omega;\mathbb R^n)$, and $a$ is finite because $Z$ is square-integrable. The inequality obtained at the end of the previous step reads
\begin{align*}
0\leq -2tb+t^2a
\end{align*}
for every $t \in \mathbb R$.
Why must the linear coefficient vanish? If $b>0$, choose a real number $t>0$ satisfying $0<t<2b/(a+1)$. Then $a+1>0$, and
\begin{align*}
-2tb+t^2a=t(-2b+ta)\leq t(-2b+t(a+1))<0,
\end{align*}
contradicting the inequality. If $b<0$, choose $t<0$ with $0<|t|<2|b|/(a+1)$. Writing $t=-|t|$, we get
\begin{align*}
-2tb+t^2a=|t|(2b+|t|a)\leq |t|(2b+|t|(a+1))<0,
\end{align*}
again contradicting the inequality. Both nonzero alternatives are impossible, so $b=0$.
Substituting the definition of $b$ gives
\begin{align*}
\mathbb E[(X-\hat X)^\top Z]=\mathbb E[R^\top Z]=0.
\end{align*}
Because $Z \in \mathcal H$ was arbitrary, this proves that $X-\hat X$ is orthogonal to every admissible direction in $\mathcal H$.
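The same conclusion can be cross-checked by completing the square. This is a sketch under the extra assumption $a>0$, and it is not needed for the argument above. For $a>0$,
\begin{align*}
-2tb+t^2a=a\Bigl(t-\frac{b}{a}\Bigr)^2-\frac{b^2}{a},
\end{align*}
so the choice $t=b/a$ produces the value $-b^2/a$, which is nonnegative only if $b=0$. In the remaining case $a=0$, the Cauchy-Schwarz inequality gives $|b|\leq \sqrt{\mathbb E[|R|^2]}\,\sqrt{a}=0$, so $b=0$ there as well.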
[/guided]
[/step]
[step:Use orthogonality to obtain the Pythagorean identity]
Conversely, assume that
\begin{align*}
\mathbb E[(X-\hat X)^\top Y]=0
\end{align*}
for every $Y \in \mathcal H$. Let $Z \in \mathcal H$ be arbitrary, and define $W:=\hat X-Z$. Since $\mathcal H$ is a linear subspace and $\hat X,Z \in \mathcal H$, we have $W \in \mathcal H$. Hence the assumed orthogonality gives
\begin{align*}
\mathbb E[(X-\hat X)^\top W]=0.
\end{align*}
Using $X-Z=(X-\hat X)+(\hat X-Z)=R+W$, where $R:=X-\hat X$, we expand:
\begin{align*}
\mathbb E[|X-Z|^2]=\mathbb E[|R+W|^2]=\mathbb E[|R|^2]+2\mathbb E[R^\top W]+\mathbb E[|W|^2].
\end{align*}
The middle term is zero by orthogonality, so
\begin{align*}
\mathbb E[|X-Z|^2]=\mathbb E[|X-\hat X|^2]+\mathbb E[|\hat X-Z|^2].
\end{align*}
Since $\mathbb E[|\hat X-Z|^2]\geq 0$, it follows that
\begin{align*}
\mathbb E[|X-Z|^2]\geq \mathbb E[|X-\hat X|^2].
\end{align*}
[guided]
Now assume the orthogonality condition:
\begin{align*}
\mathbb E[(X-\hat X)^\top Y]=0
\end{align*}
for every $Y \in \mathcal H$. We want to prove that $\hat X$ minimizes the squared error.
Take an arbitrary competitor $Z \in \mathcal H$. The useful comparison vector is not $Z$ itself, but the displacement from $Z$ to $\hat X$. Define $W \in L^2(\Omega;\mathbb R^n)$ by
\begin{align*}
W := \hat X-Z.
\end{align*}
Because $\mathcal H$ is a linear subspace and both $\hat X$ and $Z$ lie in $\mathcal H$, the difference $W$ also lies in $\mathcal H$. Therefore the orthogonality hypothesis applies to $W$, giving
\begin{align*}
\mathbb E[(X-\hat X)^\top W]=0.
\end{align*}
Define $R \in L^2(\Omega;\mathbb R^n)$ by $R:=X-\hat X$. Then
\begin{align*}
X-Z=(X-\hat X)+(\hat X-Z)=R+W.
\end{align*}
Expanding the Euclidean square pointwise gives
\begin{align*}
|X-Z|^2=|R+W|^2=|R|^2+2R^\top W+|W|^2.
\end{align*}
Taking expectations, and using the orthogonality relation just obtained for $W$ to drop the middle term, we obtain
\begin{align*}
\mathbb E[|X-Z|^2]=\mathbb E[|R|^2]+2\mathbb E[R^\top W]+\mathbb E[|W|^2]=\mathbb E[|R|^2]+\mathbb E[|W|^2].
\end{align*}
Substituting back $R=X-\hat X$ and $W=\hat X-Z$ gives the Pythagorean identity
\begin{align*}
\mathbb E[|X-Z|^2]=\mathbb E[|X-\hat X|^2]+\mathbb E[|\hat X-Z|^2].
\end{align*}
The second term is nonnegative because it is the expectation of a nonnegative [random variable](/page/Random%20Variable). Hence
\begin{align*}
\mathbb E[|X-Z|^2]\geq \mathbb E[|X-\hat X|^2].
\end{align*}
This proves that the arbitrary competitor $Z$ cannot improve on $\hat X$.
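To make the identity concrete, here is a minimal worked instance under assumptions that are not part of the theorem: take $n=1$ and let $\mathcal H$ be the subspace of almost surely constant random variables, so that $Z=z$ and $\hat X=c$ for real numbers $z$ and $c$ (hypothetical symbols used only in this example). The orthogonality condition $\mathbb E[(X-c)y]=0$ for every constant $y$ forces $c=\mathbb E[X]$, and the Pythagorean identity becomes
\begin{align*}
\mathbb E[(X-z)^2]=\mathbb E[(X-\mathbb E[X])^2]+(\mathbb E[X]-z)^2.
\end{align*}
The first term on the right is the variance of $X$ and does not depend on $z$, so the error is smallest exactly when $z=\mathbb E[X]$.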
[/guided]
[/step]
[step:Conclude the equivalence]
The previous step shows that every $Z \in \mathcal H$ satisfies
\begin{align*}
\mathbb E[|X-Z|^2]\geq \mathbb E[|X-\hat X|^2],
\end{align*}
so $\hat X$ minimizes the least-squares error over $\mathcal H$. Together with the first two steps, this proves both implications and completes the proof of the equivalence.
[guided]
The previous step proved the key inequality for an arbitrary competitor $Z \in \mathcal H$:
\begin{align*}
\mathbb E[|X-Z|^2]\geq \mathbb E[|X-\hat X|^2].
\end{align*}
This is exactly the statement that $\hat X$ minimizes the least-squares error over $\mathcal H$. The first two steps proved the reverse implication, namely that minimality forces the orthogonality condition. Therefore the two conditions are equivalent.
[/guided]
[/step]