[proofplan]
We expand the quadratic risk $Q(a,b)$ using only finite first and second moments, so the first-order conditions can be computed algebraically. The two resulting normal equations are then solved explicitly: the intercept equation expresses $a$ in terms of $b$, and the slope equation reduces to $\operatorname{Cov}(X,Y)-b\operatorname{Var}(X)=0$. Finally, after inserting the resulting coefficients, the residual is orthogonal to both $1$ and $X$, so every other risk differs by the non-negative quantity $\mathbb E[(c+dX)^2]$; the assumption $\operatorname{Var}(X)>0$ makes equality possible only for $c=d=0$.
[/proofplan]
[step:Expand the quadratic risk into finite moments]
Since $X,Y\in L^2(\Omega,\mathcal F,\mathbb P)$, the random variables $X$, $Y$, $X^2$, $Y^2$, and $XY$ are integrable. The integrability of $XY$ follows from the elementary inequality
\begin{align*}
|XY|\leq \frac{1}{2}X^2+\frac{1}{2}Y^2.
\end{align*}
Thus, for every $(a,b)\in\mathbb R^2$, expanding the square gives
\begin{align*}
Q(a,b)
&=\mathbb E[(Y-a-bX)^2]\\
&=\mathbb E[Y^2]-2a\mathbb E[Y]-2b\mathbb E[XY]+a^2+2ab\mathbb E[X]+b^2\mathbb E[X^2].
\end{align*}
Therefore $Q$ is a real quadratic polynomial in $(a,b)$ with finite coefficients.
[guided]
The purpose of this step is to avoid any hidden differentiating-under-the-expectation issue. We first show that every coefficient in the expanded expression is finite.
Because $X,Y\in L^2(\Omega,\mathcal F,\mathbb P)$, the second moments $\mathbb E[X^2]$ and $\mathbb E[Y^2]$ are finite. The mixed term $XY$ is integrable because, pointwise on $\Omega$,
\begin{align*}
|XY|\leq \frac{1}{2}X^2+\frac{1}{2}Y^2,
\end{align*}
and the right-hand side has finite expectation. Since a probability measure has total mass $1$, square-integrability also implies $\mathbb E[|X|]<\infty$ and $\mathbb E[|Y|]<\infty$.
Now expand the square before taking expectations:
\begin{align*}
(Y-a-bX)^2
=Y^2-2aY-2bXY+a^2+2abX+b^2X^2.
\end{align*}
Taking expectations term by term is justified by the integrability just verified, and gives
\begin{align*}
Q(a,b)
&=\mathbb E[Y^2]-2a\mathbb E[Y]-2b\mathbb E[XY]+a^2+2ab\mathbb E[X]+b^2\mathbb E[X^2].
\end{align*}
Thus $Q:\mathbb R^2\to\mathbb R$ is an ordinary quadratic polynomial in the two real variables $a$ and $b$.
[/guided]
[/step]
[step:Derive the population normal equations from the quadratic polynomial]
For fixed $(a,b)\in\mathbb R^2$, the partial derivatives of the polynomial expression for $Q$ are
\begin{align*}
\frac{\partial Q}{\partial a}(a,b)
&=-2\mathbb E[Y]+2a+2b\mathbb E[X]\\
&=-2\mathbb E[Y-a-bX],
\end{align*}
and
\begin{align*}
\frac{\partial Q}{\partial b}(a,b)
&=-2\mathbb E[XY]+2a\mathbb E[X]+2b\mathbb E[X^2]\\
&=-2\mathbb E[X(Y-a-bX)].
\end{align*}
Hence every unconstrained minimizer of $Q$ must satisfy
\begin{align*}
\mathbb E[Y-a-bX]&=0,&
\mathbb E[X(Y-a-bX)]&=0.
\end{align*}
[/step]
[step:Solve the normal equations explicitly]
Assume $(a,b)\in\mathbb R^2$ satisfies the normal equations. From the first equation,
\begin{align*}
0=\mathbb E[Y-a-bX]=\mathbb E[Y]-a-b\mathbb E[X],
\end{align*}
so
\begin{align*}
a=\mathbb E[Y]-b\mathbb E[X].
\end{align*}
Substitute this value of $a$ into the second equation:
\begin{align*}
0
&=\mathbb E[X(Y-a-bX)]\\
&=\mathbb E[XY]-a\mathbb E[X]-b\mathbb E[X^2]\\
&=\mathbb E[XY]-(\mathbb E[Y]-b\mathbb E[X])\mathbb E[X]-b\mathbb E[X^2]\\
&=\mathbb E[XY]-\mathbb E[X]\mathbb E[Y]-b\bigl(\mathbb E[X^2]-(\mathbb E[X])^2\bigr)\\
&=\operatorname{Cov}(X,Y)-b\operatorname{Var}(X).
\end{align*}
Since $\operatorname{Var}(X)>0$, this gives
\begin{align*}
b=\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}.
\end{align*}
Consequently
\begin{align*}
a=\mathbb E[Y]-\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}\mathbb E[X].
\end{align*}
Thus the normal equations have exactly one solution, namely $(\alpha,\beta)$.
[guided]
We now solve the two equations as a linear system in the unknown [real numbers](/page/Real%20Numbers) $a$ and $b$. The first normal equation is
\begin{align*}
0=\mathbb E[Y-a-bX].
\end{align*}
Linearity of expectation gives
\begin{align*}
0=\mathbb E[Y]-a-b\mathbb E[X],
\end{align*}
so the intercept must be
\begin{align*}
a=\mathbb E[Y]-b\mathbb E[X].
\end{align*}
The second normal equation is
\begin{align*}
0=\mathbb E[X(Y-a-bX)].
\end{align*}
Expanding the product and using linearity of expectation gives
\begin{align*}
0=\mathbb E[XY]-a\mathbb E[X]-b\mathbb E[X^2].
\end{align*}
Substitute the expression $a=\mathbb E[Y]-b\mathbb E[X]$ obtained from the first equation:
\begin{align*}
0
&=\mathbb E[XY]-(\mathbb E[Y]-b\mathbb E[X])\mathbb E[X]-b\mathbb E[X^2]\\
&=\mathbb E[XY]-\mathbb E[X]\mathbb E[Y]+b(\mathbb E[X])^2-b\mathbb E[X^2]\\
&=\mathbb E[XY]-\mathbb E[X]\mathbb E[Y]-b\bigl(\mathbb E[X^2]-(\mathbb E[X])^2\bigr).
\end{align*}
The identity
\begin{align*}
\operatorname{Cov}(X,Y)=\mathbb E[XY]-\mathbb E[X]\mathbb E[Y]
\end{align*}
follows by expanding $\mathbb E[(X-\mathbb E[X])(Y-\mathbb E[Y])]$, and
\begin{align*}
\operatorname{Var}(X)=\mathbb E[X^2]-(\mathbb E[X])^2.
\end{align*}
Therefore the second normal equation becomes
\begin{align*}
0=\operatorname{Cov}(X,Y)-b\operatorname{Var}(X).
\end{align*}
Because $\operatorname{Var}(X)>0$, division by $\operatorname{Var}(X)$ is valid, and hence
\begin{align*}
b=\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}.
\end{align*}
Substituting this value of $b$ into the intercept formula gives
\begin{align*}
a=\mathbb E[Y]-\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}\mathbb E[X].
\end{align*}
So the normal equations have the unique solution $(a,b)=(\alpha,\beta)$.
[/guided]
[/step]
[step:Show the normal-equation solution is the unique minimizer]
Define the residual random variable
\begin{align*}
R:\Omega&\to\mathbb R,\\
\omega&\mapsto Y(\omega)-\alpha-\beta X(\omega).
\end{align*}
By the previous step, $(\alpha,\beta)$ satisfies the normal equations, so
\begin{align*}
\mathbb E[R]&=0,&
\mathbb E[XR]&=0.
\end{align*}
Let $(a,b)\in\mathbb R^2$ be arbitrary, and define $c:=a-\alpha$ and $d:=b-\beta$. Then
\begin{align*}
Y-a-bX=R-c-dX.
\end{align*}
Expanding the square and using the two orthogonality relations gives
\begin{align*}
Q(a,b)
&=\mathbb E[(R-c-dX)^2]\\
&=\mathbb E[R^2]-2c\mathbb E[R]-2d\mathbb E[XR]+\mathbb E[(c+dX)^2]\\
&=Q(\alpha,\beta)+\mathbb E[(c+dX)^2]\\
&\geq Q(\alpha,\beta).
\end{align*}
Thus $(\alpha,\beta)$ is a minimizer.
If equality holds, then $\mathbb E[(c+dX)^2]=0$, so $c+dX=0$ $\mathbb P$-a.s. If $d\neq 0$, then $X=-c/d$ $\mathbb P$-a.s., which implies $\operatorname{Var}(X)=0$, contradicting the hypothesis. Hence $d=0$, and then $c=0$. Therefore $a=\alpha$ and $b=\beta$, so the minimizer is unique.
[/step]
[step:Identify the minimizer with the normal equations]
We have shown that every minimizer satisfies the normal equations, that the normal equations have the unique solution $(\alpha,\beta)$, and that $(\alpha,\beta)$ is the unique minimizer of $Q$. Therefore, for every $(a,b)\in\mathbb R^2$,
\begin{align*}
(a,b)=(\alpha,\beta)
\end{align*}
if and only if
\begin{align*}
\mathbb E[Y-a-bX]&=0,&
\mathbb E[X(Y-a-bX)]&=0.
\end{align*}
This proves both the stated formulas and the claimed equivalence with the population normal equations.
[/step]