[proofplan]
We centre the process and use the autoregressive equation to exhibit an explicit linear predictor of $Y_t$ that uses only the first $p$ lags. For any lag order $k>p$, the residual of this predictor is exactly the innovation $Z_t$, which by causality is orthogonal to every variable in the span of $Y_{t-1},\dots,Y_{t-k}$. Hence the predictor is the $L^2$ projection onto that span. Nonsingularity of the lag covariance matrix makes the projection coefficients on the $k$ lag variables unique, so the coefficient of the last lag $Y_{t-k}$, which is the partial autocorrelation at lag $k$, must be zero.
[/proofplan]
[step:Introduce the finite lag space and the candidate predictor using only the first $p$ lags]
Fix $k \in \mathbb{N}$ with $k > p$ and fix $t \in \mathbb{Z}$. Define the finite-dimensional subspace $V_k \subset L^2(\Omega,\mathcal{F},\mathbb{P})$ by
\begin{align*}
V_k := \operatorname{span}\{Y_{t-1},Y_{t-2},\dots,Y_{t-k}\}.
\end{align*}
Define the candidate predictor $P_k \in V_k$ by
\begin{align*}
P_k := \sum_{i=1}^{p}\phi_i Y_{t-i}.
\end{align*}
This lies in $V_k$ because $p<k$, so each lag $Y_{t-i}$ with $1 \leq i \leq p$ is one of the generators of $V_k$. By the centred autoregressive equation $Y_t=\sum_{i=1}^{p}\phi_iY_{t-i}+Z_t$, the residual of this predictor is
\begin{align*}
Y_t - P_k = Z_t.
\end{align*}
[/step]
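As a concrete instance, take (purely for illustration) $p=1$ with coefficient $\phi_1$ and $k=2$. Then $V_2=\operatorname{span}\{Y_{t-1},Y_{t-2}\}$, the candidate predictor places weight zero on the extra lag, and its residual is the innovation:
\begin{align*}
P_2 &= \phi_1Y_{t-1} + 0\cdot Y_{t-2},\\
Y_t - P_2 &= (\phi_1Y_{t-1} + Z_t) - \phi_1Y_{t-1} = Z_t.
\end{align*}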
[step:Show that the innovation residual is orthogonal to the entire lag space]
For every $j \in \{1,\dots,k\}$, the time index $t-j$ satisfies $t-j<t$. The causal innovation orthogonality hypothesis therefore gives
\begin{align*}
\mathbb{E}[Z_tY_{t-j}] = 0.
\end{align*}
Since $Y_t-P_k=Z_t$, it follows that
\begin{align*}
\mathbb{E}[(Y_t-P_k)Y_{t-j}] = 0
\end{align*}
for every $j \in \{1,\dots,k\}$. Thus $Y_t-P_k$ is orthogonal in $L^2(\Omega,\mathcal{F},\mathbb{P})$ to each generator of $V_k$, and hence, by linearity of expectation, to every element of $V_k$.
[/step]
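The orthogonality in this step can be checked numerically in a concrete case. The Python sketch below assumes an illustrative causal AR(2) with $\phi=(0.5,-0.3)$ and unit innovation variance (values chosen here, not taken from the text), and approximates the autocovariances $\gamma(h)=\mathbb{E}[Y_tY_{t-h}]$ by truncating the causal MA($\infty$) representation. The helper names `ma_weights`, `autocov` and `residual_cov` are hypothetical. It evaluates $\mathbb{E}[(Y_t-P_k)Y_{t-j}]=\gamma(j)-\sum_{i=1}^{p}\phi_i\gamma(j-i)$ for $j=1,\dots,k$, using $\gamma(-h)=\gamma(h)$.

```python
# Sketch: check E[(Y_t - P_k) Y_{t-j}] = 0 for j = 1..k in an illustrative AR(2).
# Assumptions: phi = (0.5, -0.3), unit innovation variance, and autocovariances
# approximated by truncating the causal MA(infinity) representation.

def ma_weights(phi, n):
    """First n causal MA(infinity) weights: psi_0 = 1, psi_m = sum_i phi_i * psi_{m-i}."""
    psi = [1.0]
    for m in range(1, n):
        psi.append(sum(phi[i - 1] * psi[m - i] for i in range(1, len(phi) + 1) if m >= i))
    return psi

def autocov(phi, max_lag, sigma2=1.0, n_terms=2000):
    """gamma(h) = sigma2 * sum_m psi_m * psi_{m+h} for h = 0..max_lag (truncated sum)."""
    psi = ma_weights(phi, n_terms + max_lag)
    return [sigma2 * sum(psi[m] * psi[m + h] for m in range(n_terms))
            for h in range(max_lag + 1)]

def residual_cov(phi, gamma, j):
    """E[(Y_t - sum_i phi_i Y_{t-i}) Y_{t-j}] = gamma(j) - sum_i phi_i * gamma(|j - i|)."""
    return gamma[j] - sum(phi[i - 1] * gamma[abs(j - i)] for i in range(1, len(phi) + 1))

phi = [0.5, -0.3]   # hypothetical AR(2) coefficients, so p = 2
k = 5               # any lag order k > p
gamma = autocov(phi, max_lag=k)
covs = [residual_cov(phi, gamma, j) for j in range(1, k + 1)]
print(max(abs(c) for c in covs))  # a tiny number at rounding-error level
```

Because the autocovariances are truncated floating-point sums, the printed maximum is expected to be of the order of rounding error rather than exactly zero.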
[guided]We need to verify that $P_k$ is not merely a plausible autoregressive predictor, but actually satisfies the defining orthogonality condition for the centred $L^2$ projection onto the whole lag space $V_k$. The residual after subtracting $P_k$ is
\begin{align*}
Y_t - P_k = Z_t.
\end{align*}
Now take an arbitrary generator $Y_{t-j}$ of $V_k$, where $j \in \{1,\dots,k\}$. Since $j \geq 1$, the variable $Y_{t-j}$ is a past centred value relative to time $t$. The causal AR hypothesis says precisely that the innovation $Z_t$ is uncorrelated with every centred past variable. Therefore
\begin{align*}
\mathbb{E}[Z_tY_{t-j}] = 0.
\end{align*}
Substituting $Z_t=Y_t-P_k$ gives
\begin{align*}
\mathbb{E}[(Y_t-P_k)Y_{t-j}] = 0.
\end{align*}
This holds for every generator $Y_{t-1},\dots,Y_{t-k}$. If $W \in V_k$, then there are scalars $c_1,\dots,c_k \in \mathbb{R}$ such that
\begin{align*}
W = \sum_{j=1}^{k} c_jY_{t-j}.
\end{align*}
Using linearity of expectation and the orthogonality just proved,
\begin{align*}
\mathbb{E}[(Y_t-P_k)W] = \sum_{j=1}^{k} c_j\mathbb{E}[(Y_t-P_k)Y_{t-j}] = 0.
\end{align*}
Hence the residual $Y_t-P_k$ is orthogonal to all of $V_k$. This is exactly the normal equation condition characterising the centred $L^2$ projection onto $V_k$.[/guided]
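For instance, assuming (only for illustration) $p=1$ and $k=2$, so that $P_2=\phi_1Y_{t-1}$ and $Y_t-P_2=Z_t$, an arbitrary element $W=c_1Y_{t-1}+c_2Y_{t-2}$ of $V_2$ satisfies
\begin{align*}
\mathbb{E}[(Y_t-P_2)W] = c_1\,\mathbb{E}[Z_tY_{t-1}] + c_2\,\mathbb{E}[Z_tY_{t-2}] = c_1\cdot 0 + c_2\cdot 0 = 0.
\end{align*}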
[step:Use nonsingularity to identify the unique projection coefficients]
The space $V_k$ is finite-dimensional, hence closed in the [Hilbert space](/page/Hilbert%20Space) $L^2(\Omega,\mathcal{F},\mathbb{P})$. Since $P_k \in V_k$ and, by the preceding step, $Y_t-P_k$ is orthogonal to $V_k$, the Hilbert-space [projection theorem](/theorems/1985) identifies $P_k$ as the centred $L^2$ projection of $Y_t$ onto $V_k$:
\begin{align*}
\operatorname{Proj}_{V_k}Y_t = P_k.
\end{align*}
Writing $P_k$ in terms of the $k$ lag generators $Y_{t-1},\dots,Y_{t-k}$ gives the coefficient vector $b_k=(b_{k,1},\dots,b_{k,k}) \in \mathbb{R}^k$ defined by
$b_{k,j}:=\phi_j$ for $1 \leq j \leq p$ and $b_{k,j}:=0$ for $p<j\leq k$, so that $P_k=\sum_{j=1}^{k} b_{k,j}Y_{t-j}$.
If another coefficient vector $c=(c_1,\dots,c_k)\in\mathbb{R}^k$ represents the same projection, then the difference $d=c-b_k \in \mathbb{R}^k$ satisfies
\begin{align*}
\sum_{j=1}^{k} d_jY_{t-j}=0
\end{align*}
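in $L^2(\Omega,\mathcal{F},\mathbb{P})$. Taking the expectation of its square gives
\begin{align*}
0 = \mathbb{E}\left[\left(\sum_{j=1}^{k} d_jY_{t-j}\right)^2\right] = \sum_{i=1}^{k}\sum_{j=1}^{k} d_id_j\,\mathbb{E}[Y_{t-i}Y_{t-j}] = d^\top \Gamma_k d,
\end{align*}
where $\Gamma_k=(\mathbb{E}[Y_{t-i}Y_{t-j}])_{i,j=1}^{k}$ is the covariance matrix of the centred lag vector $(Y_{t-1},\dots,Y_{t-k})$. A covariance matrix is positive semidefinite, and a nonsingular positive semidefinite matrix is positive definite; hence, by the nonsingularity hypothesis, $\Gamma_k$ is positive definite. Therefore $d^\top\Gamma_k d=0$ forces $d=0$, so the coefficient vector is unique and equals $b_k$.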
[/step]
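Numerically, the uniqueness argument amounts to solving the normal equations $\Gamma_k b=(\gamma(1),\dots,\gamma(k))^\top$, where for a stationary centred process $\Gamma_k$ has entries $\gamma(|i-j|)$. The sketch below assumes an illustrative AR(2) with $\phi=(0.5,-0.3)$, unit innovation variance and autocovariances from a truncated MA($\infty$) representation. It checks positive definiteness of $\Gamma_k$ with a Cholesky factorisation and solves the system by Gaussian elimination with partial pivoting. The helpers `autocov`, `is_positive_definite` and `solve` are hypothetical names written for this example, not a library API.

```python
# Sketch: the normal equations Gamma_k b = (gamma(1), ..., gamma(k)) for an
# illustrative AR(2) should have the unique solution (phi_1, phi_2, 0, ..., 0).
# Assumptions: phi = (0.5, -0.3), unit innovation variance, truncated MA(infinity) autocovariances.

def autocov(phi, max_lag, sigma2=1.0, n_terms=2000):
    """gamma(h) for h = 0..max_lag from truncated causal MA(infinity) weights psi."""
    psi = [1.0]
    for m in range(1, n_terms + max_lag):
        psi.append(sum(phi[i - 1] * psi[m - i] for i in range(1, len(phi) + 1) if m >= i))
    return [sigma2 * sum(psi[m] * psi[m + h] for m in range(n_terms))
            for h in range(max_lag + 1)]

def is_positive_definite(a):
    """Cholesky factorisation succeeds with positive pivots iff a is positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = s ** 0.5
            else:
                low[i][j] = s / low[j][j]
    return True

def solve(a, rhs):
    """Gaussian elimination with partial pivoting for a nonsingular square system."""
    n = len(a)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

phi = [0.5, -0.3]   # hypothetical AR(2) coefficients, p = 2
k = 5               # lag order k > p
gamma = autocov(phi, max_lag=k)
Gamma_k = [[gamma[abs(i - j)] for j in range(k)] for i in range(k)]
b_k = solve(Gamma_k, gamma[1:k + 1])
print(is_positive_definite(Gamma_k))  # True
print(b_k)                            # close to [0.5, -0.3, 0.0, 0.0, 0.0] up to rounding
```

Positive definiteness is what guarantees the elimination never divides by zero here, mirroring the role of the nonsingularity hypothesis in the proof.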
[step:Read off the coefficient of the $k$th lag]
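Since $k>p$, the definition of $b_k$ gives $b_{k,k}=0$. The partial autocorrelation $a_{k,k}$ is the coefficient of $Y_{t-k}$ in the representation of $\operatorname{Proj}_{V_k}Y_t$ in the lags $Y_{t-1},\dots,Y_{t-k}$; since this representation is unique and equals $b_k$, we have $a_{k,k}=b_{k,k}$. Therefore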
\begin{align*}
a_{k,k}=0.
\end{align*}
Because $k>p$ was arbitrary, the partial autocorrelation is zero at every lag $k>p$.
[/step]
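The conclusion can also be observed numerically. This sketch computes $a_{1,1},\dots,a_{K,K}$ with the Durbin-Levinson recursion, a standard recursive way to obtain the last projection coefficient at each order; it is swapped in here for convenience and is not the method used in the proof. It assumes an illustrative AR(2) with $\phi=(0.5,-0.3)$ and unit innovation variance, with autocovariances from a truncated MA($\infty$) representation; `autocov` and `pacf_durbin_levinson` are hypothetical helper names.

```python
# Sketch: partial autocorrelations a_{k,k} via the Durbin-Levinson recursion
# (an alternative to solving each normal system directly; not the proof's method).
# Assumptions: illustrative AR(2) with phi = (0.5, -0.3), unit innovation variance,
# autocovariances from a truncated MA(infinity) representation.

def autocov(phi, max_lag, sigma2=1.0, n_terms=2000):
    """gamma(h) for h = 0..max_lag from truncated causal MA(infinity) weights psi."""
    psi = [1.0]
    for m in range(1, n_terms + max_lag):
        psi.append(sum(phi[i - 1] * psi[m - i] for i in range(1, len(phi) + 1) if m >= i))
    return [sigma2 * sum(psi[m] * psi[m + h] for m in range(n_terms))
            for h in range(max_lag + 1)]

def pacf_durbin_levinson(gamma, max_lag):
    """Return [a_{1,1}, ..., a_{K,K}] from autocovariances gamma(0..K)."""
    coeffs = []        # a_{n-1,1}, ..., a_{n-1,n-1} from the previous order
    v = gamma[0]       # one-step prediction error variance at the previous order
    pacf = []
    for n in range(1, max_lag + 1):
        a_nn = (gamma[n] - sum(coeffs[j - 1] * gamma[n - j] for j in range(1, n))) / v
        # a_{n,j} = a_{n-1,j} - a_{n,n} * a_{n-1,n-j} for j = 1..n-1, then append a_{n,n}
        coeffs = [coeffs[j - 1] - a_nn * coeffs[n - j - 1] for j in range(1, n)] + [a_nn]
        v *= 1.0 - a_nn ** 2
        pacf.append(a_nn)
    return pacf

phi = [0.5, -0.3]   # hypothetical AR(2) coefficients, p = 2
K = 6
pacf = pacf_durbin_levinson(autocov(phi, max_lag=K), K)
for k, a in enumerate(pacf, start=1):
    print(k, a)
```

For this AR(2), $a_{1,1}=\phi_1/(1-\phi_2)=0.5/1.3$ and $a_{2,2}=\phi_2=-0.3$, while the printed $a_{k,k}$ for $k\geq 3$ should agree with zero up to rounding.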