[proofplan]
We regard the closed linear span of finitely many observations as a finite-dimensional Hilbert space inside $L^2(\Omega,\mathcal{F},\mathbb{P})$, with inner product given by covariance because the process has mean zero. The innovations are obtained from the observations by a triangular orthogonalization procedure, so $E_1,\dots,E_n$ form an orthogonal basis of the same subspace as $X_1,\dots,X_n$. Expanding the orthogonal projection of $X_{n+1}$ in this basis gives the coefficient formula. The covariance recursion follows by inserting the earlier predictor formula into $E_k=X_k-P_{k-1}X_k$, and the variance identity follows from the Pythagorean decomposition of $X_{n+1}$ into its projection and innovation.
[/proofplan]
[step:Place the process in the finite-dimensional Hilbert-space setting]
Define the bilinear map
\begin{align*}
(\cdot,\cdot)_{L^2}: L^2(\Omega,\mathcal{F},\mathbb{P})\times L^2(\Omega,\mathcal{F},\mathbb{P}) &\to \mathbb{R}, \\
(Y,Z) &\mapsto \mathbb{E}[YZ].
\end{align*}
Since each $X_t$ has mean zero, this inner product agrees with covariance on pairs of observations:
\begin{align*}
(X_r,X_s)_{L^2}=\operatorname{Cov}(X_r,X_s)=\kappa(r,s).
\end{align*}
Define $\mathcal{H}_0:=\{0\}$, and for each $n\geq 1$ define the finite-dimensional subspace
\begin{align*}
\mathcal{H}_n:=\operatorname{span}\{X_1,\dots,X_n\}\subset L^2(\Omega,\mathcal{F},\mathbb{P}).
\end{align*}
By the finite-dimensional [orthogonal projection theorem](/page/Orthogonal%20Projection), for every $n\geq 0$ the orthogonal projection
\begin{align*}
P_n:L^2(\Omega,\mathcal{F},\mathbb{P})&\to \mathcal{H}_n
\end{align*}
exists; in particular, $P_0$ is the zero projection onto $\mathcal{H}_0$.
For each $n\geq 1$, define the covariance matrix $\Gamma_n\in\mathbb{R}^{n\times n}$ by
\begin{align*}
(\Gamma_n)_{rs}:=\kappa(r,s),\qquad 1\leq r,s\leq n.
\end{align*}
The nonsingularity of $\Gamma_n$ implies that $X_1,\dots,X_n$ are linearly independent in $L^2$. Indeed, if real numbers $a_1,\dots,a_n$ satisfy $\sum_{r=1}^n a_rX_r=0$ in $L^2$, then
\begin{align*}
0
&=
\mathbb{E}\left[\left(\sum_{r=1}^n a_rX_r\right)^2\right] \\
&=
\sum_{r=1}^n\sum_{s=1}^n a_ra_s\kappa(r,s).
\end{align*}
Thus $a^\top\Gamma_na=0$, where $a=(a_1,\dots,a_n)^\top$. Since $\Gamma_n$ is a positive definite covariance matrix, $a=0$. Hence $\dim \mathcal{H}_n=n$.
[/step]
[step:Show that the innovations form an orthogonal basis of the past]
Fix $n\geq 1$. For each $t\in\{1,\dots,n\}$, the innovation
\begin{align*}
E_t=X_t-P_{t-1}X_t
\end{align*}
belongs to $\mathcal{H}_t$, because $X_t\in\mathcal{H}_t$ and $P_{t-1}X_t\in\mathcal{H}_{t-1}\subset\mathcal{H}_t$. Also, by the defining property of orthogonal projection,
\begin{align*}
(E_t,Y)_{L^2}=0\qquad\text{for every }Y\in\mathcal{H}_{t-1}.
\end{align*}
In particular, if $1\leq s<t\leq n$, then $E_s\in\mathcal{H}_s\subset\mathcal{H}_{t-1}$, so
\begin{align*}
(E_t,E_s)_{L^2}=0.
\end{align*}
Thus $E_1,\dots,E_n$ are pairwise orthogonal.
The relation $E_t=X_t-P_{t-1}X_t$ expresses $E_t$ as $X_t$ plus an element of $\mathcal{H}_{t-1}$. Thus, with respect to the ordered lists $(X_1,\dots,X_n)$ and $(E_1,\dots,E_n)$, the change-of-basis matrix is lower triangular with all diagonal entries equal to $1$. Its determinant is therefore $1$, so the matrix is invertible. Hence
\begin{align*}
\operatorname{span}\{E_1,\dots,E_n\}
=
\operatorname{span}\{X_1,\dots,X_n\}
=
\mathcal{H}_n.
\end{align*}
Since $X_1,\dots,X_n$ are linearly independent, no $E_t$ is zero in $L^2$. Consequently
\begin{align*}
v_t=(E_t,E_t)_{L^2}=\mathbb{E}[E_t^2]>0,\qquad 1\leq t\leq n.
\end{align*}
[guided]
The innovations are the result of orthogonalizing the observations in chronological order. For each $t$, the variable $P_{t-1}X_t$ is the best linear prediction of $X_t$ from the earlier variables $X_1,\dots,X_{t-1}$, so the residual
\begin{align*}
E_t=X_t-P_{t-1}X_t
\end{align*}
is orthogonal to every element of the past space $\mathcal{H}_{t-1}$.
Let us verify the two required facts carefully. First, $E_t\in\mathcal{H}_t$, because $X_t\in\mathcal{H}_t$ and $P_{t-1}X_t\in\mathcal{H}_{t-1}\subset\mathcal{H}_t$. Second, if $s<t$, then $E_s\in\mathcal{H}_s\subset\mathcal{H}_{t-1}$, while $E_t$ is orthogonal to all of $\mathcal{H}_{t-1}$. Hence
\begin{align*}
(E_t,E_s)_{L^2}=0.
\end{align*}
Thus the innovations are mutually orthogonal.
It remains to check that they span the same space as the observations. The identity
\begin{align*}
E_t=X_t-P_{t-1}X_t
\end{align*}
writes $E_t$ as $X_t$ plus a linear combination of $X_1,\dots,X_{t-1}$. Therefore the transformation from $X_1,\dots,X_n$ to $E_1,\dots,E_n$ is triangular and has diagonal coefficient $1$ at every stage. Such a triangular transformation is invertible, so
\begin{align*}
\operatorname{span}\{E_1,\dots,E_n\}
=
\operatorname{span}\{X_1,\dots,X_n\}.
\end{align*}
Here $\Gamma_n\in\mathbb{R}^{n\times n}$ denotes the covariance matrix with entries $(\Gamma_n)_{rs}=\kappa(r,s)$ for $1\leq r,s\leq n$. Because $\Gamma_n$ is nonsingular, the variables $X_1,\dots,X_n$ are linearly independent in $L^2$, and hence the same is true of $E_1,\dots,E_n$. Since each $E_t$ is nonzero and orthogonal to the others,
\begin{align*}
v_t=\mathbb{E}[E_t^2]=(E_t,E_t)_{L^2}>0.
\end{align*}
[/guided]
[/step]
[step:Expand the one-step predictor in the orthogonal innovation basis]
Fix $n\geq 1$. Since $E_1,\dots,E_n$ form an orthogonal basis of $\mathcal{H}_n$, there exist unique real numbers $c_{n,1},\dots,c_{n,n}$ such that
\begin{align*}
P_nX_{n+1}=\sum_{k=1}^{n}c_{n,k}E_k.
\end{align*}
Taking the $L^2$ inner product with $E_k$ and using orthogonality gives
\begin{align*}
(P_nX_{n+1},E_k)_{L^2}=c_{n,k}(E_k,E_k)_{L^2}=c_{n,k}v_k.
\end{align*}
Because $E_k\in\mathcal{H}_n$ and $X_{n+1}-P_nX_{n+1}$ is orthogonal to $\mathcal{H}_n$,
\begin{align*}
(P_nX_{n+1},E_k)_{L^2}=(X_{n+1},E_k)_{L^2}.
\end{align*}
The innovation $E_k$ has mean zero, since $P_{k-1}X_k\in\mathcal{H}_{k-1}$ is a linear combination of the mean-zero variables $X_1,\dots,X_{k-1}$. Therefore
\begin{align*}
(X_{n+1},E_k)_{L^2}
=
\operatorname{Cov}(X_{n+1},E_k).
\end{align*}
Since $v_k>0$,
\begin{align*}
c_{n,k}=\frac{\operatorname{Cov}(X_{n+1},E_k)}{v_k}.
\end{align*}
Now put $k=n+1-j$. Then
\begin{align*}
P_nX_{n+1}
&=
\sum_{j=1}^{n}\theta_{n,j}E_{n+1-j}, \\
\theta_{n,j}
&=
\frac{\operatorname{Cov}(X_{n+1},E_{n+1-j})}{v_{n+1-j}}.
\end{align*}
[/step]
[step:Derive the covariance recursion from the previous predictor]
Fix integers $n\geq 1$ and $k$ with $1\leq k\leq n$. If $k=1$, then $P_0$ is the zero projection onto $\mathcal{H}_0=\{0\}$, so $E_1=X_1-P_0X_1=X_1$. Hence
\begin{align*}
\operatorname{Cov}(X_{n+1},E_1)=\operatorname{Cov}(X_{n+1},X_1)=\kappa(n+1,1),
\end{align*}
which is the stated formula with an empty sum.
Assume now that $2\leq k\leq n$. Applying the predictor representation already proved with $n=k-1$ gives
\begin{align*}
P_{k-1}X_k=\sum_{r=1}^{k-1}\theta_{k-1,r}E_{k-r}.
\end{align*}
Using $E_k=X_k-P_{k-1}X_k$ and bilinearity of covariance,
\begin{align*}
\operatorname{Cov}(X_{n+1},E_k)
&=
\operatorname{Cov}\left(X_{n+1},X_k-\sum_{r=1}^{k-1}\theta_{k-1,r}E_{k-r}\right)\\
&=
\operatorname{Cov}(X_{n+1},X_k)
-
\sum_{r=1}^{k-1}\theta_{k-1,r}\operatorname{Cov}(X_{n+1},E_{k-r})\\
&=
\kappa(n+1,k)
-
\sum_{r=1}^{k-1}\theta_{k-1,r}\operatorname{Cov}(X_{n+1},E_{k-r}).
\end{align*}
This is the desired recursion.
[/step]
[step:Compute the innovation variance by orthogonal decomposition]
Fix $n\geq 1$. The decomposition
\begin{align*}
X_{n+1}=P_nX_{n+1}+E_{n+1}
\end{align*}
is orthogonal, because $P_nX_{n+1}\in\mathcal{H}_n$ and $E_{n+1}=X_{n+1}-P_nX_{n+1}$ is orthogonal to $\mathcal{H}_n$. The [Pythagorean identity](/page/Pythagorean%20Theorem) in the Hilbert space $L^2(\Omega,\mathcal{F},\mathbb{P})$ therefore gives
\begin{align*}
\mathbb{E}[X_{n+1}^2]
=
\mathbb{E}\bigl[(P_nX_{n+1})^2\bigr]+\mathbb{E}[E_{n+1}^2].
\end{align*}
Using the expansion
\begin{align*}
P_nX_{n+1}=\sum_{j=1}^{n}\theta_{n,j}E_{n+1-j}
\end{align*}
and the pairwise orthogonality of $E_1,\dots,E_n$, we get
\begin{align*}
\mathbb{E}\bigl[(P_nX_{n+1})^2\bigr]
&=
\sum_{j=1}^{n}\theta_{n,j}^2\mathbb{E}[E_{n+1-j}^2] \\
&=
\sum_{j=1}^{n}\theta_{n,j}^2v_{n+1-j}.
\end{align*}
Since $\mathbb{E}[X_{n+1}^2]=\kappa(n+1,n+1)$ and $v_{n+1}=\mathbb{E}[E_{n+1}^2]$, rearranging gives
\begin{align*}
v_{n+1}
=
\kappa(n+1,n+1)
-
\sum_{j=1}^{n}\theta_{n,j}^2v_{n+1-j}.
\end{align*}
This completes the proof of the innovations algorithm.
[/step]