[proofplan]
We first check that all expectations appearing in the quadratic risk are finite. Then we expand the risk as a quadratic polynomial in $\beta$, identify the linear coefficient $m=\mathbb E[\phi(X)Y]$, and use $Q\beta^*=m$ to complete the square. The only structural point is that invertibility of the Gram matrix $Q=\mathbb E[\phi(X)\phi(X)^\top]$ forces the quadratic form $h^\top Qh$ to be strictly positive for every nonzero $h$, so the completed square has a unique minimizer. Finally, rewriting $Q\beta^*=m$ gives the population normal equations.
[/proofplan]
[step:Verify the risk and moment vector are well defined]
For each $0\le j\le p$, the random variable $\phi_j(X)Y$ is integrable because
\begin{align*}
|\phi_j(X)Y|\leq \frac{1}{2}\phi_j(X)^2+\frac{1}{2}Y^2
\end{align*}
and both terms on the right have finite expectation. Hence the vector
\begin{align*}
m:=\mathbb E[\phi(X)Y]\in\mathbb R^{p+1}
\end{align*}
is well defined.
For each $\beta=(\beta_0,\dots,\beta_p)^\top\in\mathbb R^{p+1}$, the random variable $\beta^\top\phi(X)$ is square-integrable. Indeed,
\begin{align*}
(\beta^\top\phi(X))^2
&=\left(\sum_{j=0}^{p}\beta_j\phi_j(X)\right)^2\\
&\leq (p+1)\sum_{j=0}^{p}\beta_j^2\phi_j(X)^2,
\end{align*}
where the inequality is the finite-dimensional [Cauchy-Schwarz inequality](/theorems/432). Taking expectations, and using that each $\phi_j(X)^2$ has finite expectation, gives $\mathbb E[(\beta^\top\phi(X))^2]<\infty$. Since $Y^2$ also has finite expectation, $Y-\beta^\top\phi(X)$ is square-integrable, and $R(\beta)$ is finite for every $\beta\in\mathbb R^{p+1}$.
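As an illustration only, consider the case $p=1$ (chosen here for concreteness; it is not part of the statement). Writing $a=\beta_0\phi_0(X)$ and $b=\beta_1\phi_1(X)$, the bound above reduces to the elementary inequality $2ab\le a^2+b^2$:
\begin{align*}
(a+b)^2
&=a^2+2ab+b^2\\
&\leq 2a^2+2b^2\\
&=(1+1)\left(\beta_0^2\phi_0(X)^2+\beta_1^2\phi_1(X)^2\right).
\end{align*}
The factor $p+1=2$ is simply the number of summands.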
[/step]
[step:Expand the population risk as a quadratic form]For every $\beta\in\mathbb R^{p+1}$, expanding the square and using linearity of expectation gives
\begin{align*}
R(\beta)
&=\mathbb E[(Y-\beta^\top\phi(X))^2]\\
&=\mathbb E[Y^2]-2\mathbb E[\beta^\top\phi(X)Y]+\mathbb E[\beta^\top\phi(X)\phi(X)^\top\beta]\\
&=\mathbb E[Y^2]-2\beta^\top m+\beta^\top Q\beta.
\end{align*}
The second equality is justified by the integrability verified above, and the last equality uses the definitions of $m$ and $Q$.[/step]
[guided]The point of this step is to convert the least-squares problem into a finite-dimensional quadratic minimization problem. Fix $\beta=(\beta_0,\dots,\beta_p)^\top\in\mathbb R^{p+1}$. The previous step showed that $Y^2$, $(\beta^\top\phi(X))^2$, and the product $\beta^\top\phi(X)Y$ are all integrable, so we may expand the square inside the expectation and use linearity:
\begin{align*}
R(\beta)
&=\mathbb E[(Y-\beta^\top\phi(X))^2]\\
&=\mathbb E[Y^2]-2\mathbb E[\beta^\top\phi(X)Y]+\mathbb E[(\beta^\top\phi(X))^2].
\end{align*}
Now $\beta$ is deterministic, so it can be pulled through the expectation. The middle term becomes
\begin{align*}
\mathbb E[\beta^\top\phi(X)Y]
=\beta^\top\mathbb E[\phi(X)Y]
=\beta^\top m.
\end{align*}
For the quadratic term, use $(\beta^\top\phi(X))^2=\beta^\top\phi(X)\phi(X)^\top\beta$, again with deterministic $\beta$, to obtain
\begin{align*}
\mathbb E[(\beta^\top\phi(X))^2]
=\beta^\top\mathbb E[\phi(X)\phi(X)^\top]\beta
=\beta^\top Q\beta.
\end{align*}
Combining these identities gives
\begin{align*}
R(\beta)=\mathbb E[Y^2]-2\beta^\top m+\beta^\top Q\beta.
\end{align*}[/guided]
[step:Show that the invertible Gram matrix gives a strictly positive quadratic form]We claim that
\begin{align*}
h^\top Qh>0\qquad\text{for every }h\in\mathbb R^{p+1}\setminus\{0\}.
\end{align*}
Let $h\in\mathbb R^{p+1}$. Then
\begin{align*}
h^\top Qh
&=h^\top\mathbb E[\phi(X)\phi(X)^\top]h\\
&=\mathbb E[(h^\top\phi(X))^2]\geq 0.
\end{align*}
If $h^\top Qh=0$, then $\mathbb E[(h^\top\phi(X))^2]=0$, so $h^\top\phi(X)=0$ $\mathbb P$-a.s. For each $0\le j\le p$, the random variable $\phi_j(X)(h^\top\phi(X))$ is integrable by Cauchy-Schwarz, and hence
\begin{align*}
(Qh)_j
&=\mathbb E[\phi_j(X)(h^\top\phi(X))]\\
&=0.
\end{align*}
Thus $Qh=0$. Since $Q$ is invertible, $h=0$. Therefore no nonzero $h$ can satisfy $h^\top Qh=0$, proving strict positivity.[/step]
[guided]We need strict positivity of $h^\top Qh$ because uniqueness of the minimizer will come from a square-completion term. The matrix $Q$ is a Gram matrix of the random feature vector $\phi(X)$, so for any $h\in\mathbb R^{p+1}$,
\begin{align*}
h^\top Qh
&=h^\top\mathbb E[\phi(X)\phi(X)^\top]h\\
&=\mathbb E[h^\top\phi(X)\phi(X)^\top h]\\
&=\mathbb E[(h^\top\phi(X))^2]\geq 0.
\end{align*}
Thus $Q$ is positive semidefinite.
It remains to rule out a nonzero direction $h$ with zero quadratic form. Suppose $h^\top Qh=0$. Then
\begin{align*}
\mathbb E[(h^\top\phi(X))^2]=0.
\end{align*}
A nonnegative random variable with expectation zero is zero almost surely, so $h^\top\phi(X)=0$ $\mathbb P$-a.s. For each coordinate $0\le j\le p$, the random variable $\phi_j(X)(h^\top\phi(X))$ is integrable because both $\phi_j(X)$ and $h^\top\phi(X)$ are square-integrable. Therefore
\begin{align*}
(Qh)_j
&=\mathbb E[\phi_j(X)(\phi(X)^\top h)]\\
&=\mathbb E[\phi_j(X)(h^\top\phi(X))]\\
&=0.
\end{align*}
Since this holds for every coordinate $j$, we have $Qh=0$. The hypothesis says that $Q$ is invertible, so its kernel is $\{0\}$, and hence $h=0$. Therefore every nonzero $h$ satisfies $h^\top Qh>0$.[/guided]
[step:Complete the square to identify the unique minimizer]
Define
\begin{align*}
\beta^*:=Q^{-1}m\in\mathbb R^{p+1}.
\end{align*}
Then $Q\beta^*=m$. For any $\beta\in\mathbb R^{p+1}$, set
\begin{align*}
h:=\beta-\beta^*\in\mathbb R^{p+1}.
\end{align*}
Using the quadratic expansion of $R$ obtained earlier, the identity $m=Q\beta^*$, and the symmetry of $Q$, we compute
\begin{align*}
R(\beta)-R(\beta^*)
&=\left(\mathbb E[Y^2]-2\beta^\top m+\beta^\top Q\beta\right)
-\left(\mathbb E[Y^2]-2\beta^{*\top}m+\beta^{*\top}Q\beta^*\right)\\
&=-2(\beta-\beta^*)^\top m+\beta^\top Q\beta-\beta^{*\top}Q\beta^*\\
&=-2h^\top Q\beta^*+(\beta^*+h)^\top Q(\beta^*+h)-\beta^{*\top}Q\beta^*\\
&=h^\top Qh.
\end{align*}
By strict positivity of the quadratic form, $h^\top Qh\geq 0$, with equality if and only if $h=0$. Hence $R(\beta)\geq R(\beta^*)$ for every $\beta\in\mathbb R^{p+1}$, and equality holds if and only if $\beta=\beta^*$. Thus $\beta^*$ is the unique minimizer of $R$.
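For a concrete instance, here is a sketch under the assumption (not made in the statement) that $p=1$ and $\phi(x)=(1,x)^\top$, so that the model is simple linear regression with an intercept. In that case
\begin{align*}
Q&=\begin{pmatrix}1&\mathbb E[X]\\ \mathbb E[X]&\mathbb E[X^2]\end{pmatrix},
&
m&=\begin{pmatrix}\mathbb E[Y]\\ \mathbb E[XY]\end{pmatrix},
&
\det Q&=\mathbb E[X^2]-\mathbb E[X]^2=\operatorname{Var}(X),
\end{align*}
so $Q$ is invertible exactly when $\operatorname{Var}(X)>0$, and then
\begin{align*}
\beta^*
&=\frac{1}{\operatorname{Var}(X)}
\begin{pmatrix}\mathbb E[X^2]&-\mathbb E[X]\\ -\mathbb E[X]&1\end{pmatrix}
\begin{pmatrix}\mathbb E[Y]\\ \mathbb E[XY]\end{pmatrix}\\
&=\begin{pmatrix}\mathbb E[Y]-\beta_1^*\,\mathbb E[X]\\ \operatorname{Cov}(X,Y)/\operatorname{Var}(X)\end{pmatrix}.
\end{align*}
For $h=(h_0,h_1)^\top$, the excess risk in this special case is
\begin{align*}
R(\beta^*+h)-R(\beta^*)
&=h_0^2+2h_0h_1\mathbb E[X]+h_1^2\mathbb E[X^2]\\
&=\mathbb E[(h_0+h_1X)^2],
\end{align*}
which vanishes only for $h=0$ when $\operatorname{Var}(X)>0$.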
[/step]
[step:Rewrite the minimizing equation as the population normal equations]
Since $Q\beta^*=m$, we have
\begin{align*}
0
&=m-Q\beta^*\\
&=\mathbb E[\phi(X)Y]-\mathbb E[\phi(X)\phi(X)^\top]\beta^*\\
&=\mathbb E[\phi(X)Y]-\mathbb E[\phi(X)(\phi(X)^\top\beta^*)]\\
&=\mathbb E[\phi(X)(Y-\beta^{*\top}\phi(X))].
\end{align*}
This proves that the unique minimizer satisfies the population normal equations.
Conversely, if $\beta\in\mathbb R^{p+1}$ satisfies
\begin{align*}
\mathbb E[\phi(X)(Y-\beta^\top\phi(X))]=0,
\end{align*}
then, by linearity of expectation and the definitions of $m$ and $Q$,
\begin{align*}
m-Q\beta=0.
\end{align*}
Since $Q$ is invertible, $\beta=Q^{-1}m=\beta^*$. Therefore the population normal equations characterize the unique population least-squares minimizer.
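To see what the normal equations say in a familiar setting, assume for illustration only that $p=1$ and $\phi(x)=(1,x)^\top$ (this choice of feature map is not part of the statement). Componentwise, the equations then read
\begin{align*}
\mathbb E[Y-\beta_0^*-\beta_1^*X]&=0,\\
\mathbb E[X(Y-\beta_0^*-\beta_1^*X)]&=0.
\end{align*}
Writing $\varepsilon:=Y-\beta_0^*-\beta_1^*X$ for the population residual, the first equation says $\mathbb E[\varepsilon]=0$, and combining the two gives
\begin{align*}
\operatorname{Cov}(X,\varepsilon)
=\mathbb E[X\varepsilon]-\mathbb E[X]\,\mathbb E[\varepsilon]
=0.
\end{align*}
In this special case the residual is centered and uncorrelated with the regressor.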
[/step]