[proofplan]
The proof is obtained by multiplying the autoregressive recursion by a lagged value of the process and taking expectations. For positive lags, the current innovation $Z_t$ is uncorrelated with the past variable $X_{t-h}$, so only the autoregressive terms remain. Stationarity converts each covariance $\operatorname{Cov}(X_{t-k},X_{t-h})$ into the autocovariance value $\gamma(h-k)$. For lag zero, the same computation leaves the current innovation contribution $\operatorname{Cov}(Z_t,X_t)=\sigma^2$, because causality gives $X_t=Z_t$ plus a component depending only on past innovations.
[/proofplan]
[step:Record the mean-zero covariance identities used in the computation]
Since $(Z_t)_{t \in \mathbb{Z}}$ is white noise, $\mathbb{E}[Z_t]=0$ for every $t \in \mathbb{Z}$. Taking expectations in
\begin{align*}
X_t=\sum_{k=1}^{p}\phi_k X_{t-k}+Z_t
\end{align*}
and using second-order stationarity gives a constant mean $m:=\mathbb{E}[X_t]$ satisfying
\begin{align*}
m=\sum_{k=1}^{p}\phi_k m.
\end{align*}
For the covariance computations below, we use centered variables. Define $Y_t:=X_t-m$ for $t \in \mathbb{Z}$. Then $(Y_t)_{t \in \mathbb{Z}}$ is second-order stationary, has mean zero, has the same autocovariance function as $(X_t)$, and satisfies
\begin{align*}
Y_t=\sum_{k=1}^{p}\phi_k Y_{t-k}+Z_t.
\end{align*}
Therefore it is enough to prove the displayed identities under the normalization $\mathbb{E}[X_t]=0$ for every $t \in \mathbb{Z}$. Under this normalization,
\begin{align*}
\gamma(a-b)=\operatorname{Cov}(X_a,X_b)=\mathbb{E}[X_aX_b]
\end{align*}
for all $a,b \in \mathbb{Z}$.
[/step]
[step:Multiply by a past value to obtain the positive-lag equations]
Fix an integer $h \geq 1$. Multiplying the autoregressive equation at time $t$ by $X_{t-h}$ and taking expectations gives
\begin{align*}
\mathbb{E}[X_tX_{t-h}]
&=\mathbb{E}\left[\left(\sum_{k=1}^{p}\phi_k X_{t-k}+Z_t\right)X_{t-h}\right] \\
&=\sum_{k=1}^{p}\phi_k \mathbb{E}[X_{t-k}X_{t-h}]
+\mathbb{E}[Z_tX_{t-h}].
\end{align*}
The final expectation vanishes because $h \geq 1$ and causality gives that $Z_t$ is uncorrelated with $X_{t-h}$:
\begin{align*}
\mathbb{E}[Z_tX_{t-h}]
=\operatorname{Cov}(Z_t,X_{t-h})
=0.
\end{align*}
By second-order stationarity, for each $k \in \{1,\dots,p\}$,
\begin{align*}
\mathbb{E}[X_{t-k}X_{t-h}]
=\operatorname{Cov}(X_{t-k},X_{t-h})
=\gamma((t-k)-(t-h))
=\gamma(h-k).
\end{align*}
Also,
\begin{align*}
\mathbb{E}[X_tX_{t-h}]
=\operatorname{Cov}(X_t,X_{t-h})
=\gamma(h).
\end{align*}
Substituting these identities yields
\begin{align*}
\gamma(h)=\sum_{k=1}^{p}\phi_k\gamma(h-k).
\end{align*}
Since $h \geq 1$ was arbitrary, the positive-lag Yule-Walker equations hold for every integer $h \geq 1$.
[/step]
[step:Compute the current innovation contribution at lag zero]
By causality, there are real coefficients $(\psi_j)_{j \geq 0}$ with $\psi_0=1$ such that
\begin{align*}
X_t=\sum_{j=0}^{\infty}\psi_j Z_{t-j}
\end{align*}
with convergence in $L^2$. Define the past-innovation component $W_t$ by the $L^2$-convergent series
\begin{align*}
W_t=\sum_{j=1}^{\infty}\psi_j Z_{t-j}.
\end{align*}
Then $X_t=Z_t+W_t$. Since white noise has uncorrelated values at distinct times, $Z_t$ is uncorrelated with every $Z_{t-j}$ for $j \geq 1$. Passing to the $L^2$ limit in the covariance pairing gives
\begin{align*}
\operatorname{Cov}(Z_t,W_t)=0.
\end{align*}
Therefore
\begin{align*}
\operatorname{Cov}(Z_t,X_t)
&=\operatorname{Cov}(Z_t,Z_t+W_t) \\
&=\operatorname{Var}(Z_t)+\operatorname{Cov}(Z_t,W_t) \\
&=\sigma^2.
\end{align*}
Because both variables have mean zero, this is equivalently
\begin{align*}
\mathbb{E}[Z_tX_t]=\sigma^2.
\end{align*}
[guided]
The only point at lag zero that differs from the positive-lag calculation is that $X_t$ contains the current innovation $Z_t$ itself. Causality makes this precise: there are real coefficients $(\psi_j)_{j \geq 0}$, with $\psi_0=1$, such that
\begin{align*}
X_t=\sum_{j=0}^{\infty}\psi_j Z_{t-j}
\end{align*}
in $L^2$. We separate the current innovation from the strictly past innovations by defining
\begin{align*}
W_t=\sum_{j=1}^{\infty}\psi_j Z_{t-j}.
\end{align*}
Then $X_t=Z_t+W_t$. Since $(Z_t)_{t \in \mathbb{Z}}$ is white noise, $Z_t$ is uncorrelated with $Z_{t-j}$ for each $j \geq 1$. The covariance map is continuous under $L^2$ convergence, so $Z_t$ is uncorrelated with the $L^2$ limit $W_t$:
\begin{align*}
\operatorname{Cov}(Z_t,W_t)=0.
\end{align*}
Thus the only contribution to $\operatorname{Cov}(Z_t,X_t)$ comes from the coefficient-one copy of $Z_t$ inside $X_t$:
\begin{align*}
\operatorname{Cov}(Z_t,X_t)
&=\operatorname{Cov}(Z_t,Z_t+W_t) \\
&=\operatorname{Cov}(Z_t,Z_t)+\operatorname{Cov}(Z_t,W_t) \\
&=\operatorname{Var}(Z_t)+0 \\
&=\sigma^2.
\end{align*}
Since the variables are centered, this also says $\mathbb{E}[Z_tX_t]=\sigma^2$.
[/guided]
[/step]
[step:Multiply by the present value to obtain the lag-zero equation]
Multiplying the autoregressive equation at time $t$ by $X_t$ and taking expectations gives
\begin{align*}
\mathbb{E}[X_t^2]
&=\mathbb{E}\left[\left(\sum_{k=1}^{p}\phi_kX_{t-k}+Z_t\right)X_t\right] \\
&=\sum_{k=1}^{p}\phi_k\mathbb{E}[X_{t-k}X_t]+\mathbb{E}[Z_tX_t].
\end{align*}
By second-order stationarity,
\begin{align*}
\mathbb{E}[X_t^2]=\gamma(0),
\end{align*}
and, for each $k \in \{1,\dots,p\}$,
\begin{align*}
\mathbb{E}[X_{t-k}X_t]
=\operatorname{Cov}(X_{t-k},X_t)
=\gamma(k).
\end{align*}
The previous step gives $\mathbb{E}[Z_tX_t]=\sigma^2$. Substituting these identities into the expectation equation yields
\begin{align*}
\gamma(0)=\sum_{k=1}^{p}\phi_k\gamma(k)+\sigma^2.
\end{align*}
Together with the positive-lag identities, this proves the Yule-Walker equations.
[/step]