Differencing Removes a Simple Unit Root (Theorem # 3636)
Theorem
Let $\phi,\psi:\mathbb{C}\to\mathbb{C}$ be real-coefficient polynomials satisfying
\begin{align*}
\phi(z)=(1-z)\psi(z)
\end{align*}
for every $z\in\mathbb{C}$. Let $(Z_t)_{t\in\mathbb{Z}}$ be zero-mean white noise with finite variance, and suppose all zeros of $\psi$ lie outside the closed unit disk. If a process $(X_t)_{t\in\mathbb{Z}}$ satisfies
\begin{align*}
\phi(B)X_t=Z_t,
\end{align*}
and $Y_t=\Delta X_t$, where $B$ is the backshift operator and $\Delta=1-B$, then $(Y_t)_{t\in\mathbb{Z}}$ satisfies
\begin{align*}
\psi(B)Y_t=Z_t.
\end{align*}
Moreover, within the class of causal absolutely summable linear processes driven by $(Z_t)_{t\in\mathbb{Z}}$, the reduced equation has exactly one solution, and this solution is a causal weakly stationary ARMA process.
Proof
[proofplan]
The proof has two parts. First, we use only algebra of the backshift operator: because polynomials in $B$ commute, the factorization $\phi(z) = (1-z)\psi(z)$ gives $\phi(B)X_t = \psi(B)\Delta X_t$, so the differenced process satisfies the reduced autoregressive equation. Second, the root condition on $\psi$ implies that $1/\psi$ has a power-series expansion with absolutely summable coefficients, producing a causal linear-process solution driven by the white noise. The absolute summability of those coefficients gives weak stationarity, the finite convolution identity verifies the reduced equation in $L^2$, and coefficient comparison proves uniqueness in the explicitly stated class of causal absolutely summable linear-process solutions.
[/proofplan]
[step:Use the polynomial factorization to remove the unit-root factor]
For every real polynomial $q(z) = \sum_{k=0}^{m} q_k z^k$, define the backshift polynomial operator
\begin{align*}
q(B): \mathbb{R}^{\mathbb{Z}} &\to \mathbb{R}^{\mathbb{Z}} \\
(W_t)_{t\in\mathbb{Z}} &\mapsto \left(\sum_{k=0}^{m} q_k W_{t-k}\right)_{t\in\mathbb{Z}}.
\end{align*}
Write $\psi(z)=\sum_{k=0}^{p}\psi_k z^k$, where $p\in\mathbb{N}\cup\{0\}$ and $\psi_k\in\mathbb{R}$ for $0\leq k\leq p$. Since $B^iB^j = B^{i+j} = B^jB^i$ for all non-negative integers $i,j$, substituting $B$ for $z$ turns products of polynomials into compositions of operators, and polynomial operators in $B$ commute. Therefore
\begin{align*}
\phi(B)
= (1-B)\psi(B)
= \psi(B)(1-B)
= \psi(B)\Delta.
\end{align*}
Applying this identity to $(X_t)_{t \in \mathbb{Z}}$ gives, for every $t \in \mathbb{Z}$,
\begin{align*}
Z_t
= \phi(B)X_t
= \psi(B)\Delta X_t
= \psi(B)Y_t.
\end{align*}
Thus the differenced process $(Y_t)_{t \in \mathbb{Z}}$ satisfies the reduced autoregressive equation
\begin{align*}
\psi(B)Y_t = Z_t.
\end{align*}
[guided]
We first isolate the purely algebraic point. If $q(z) = \sum_{k=0}^{m} q_k z^k$ is a polynomial, then $q(B)$ is the map
\begin{align*}
q(B): \mathbb{R}^{\mathbb{Z}} &\to \mathbb{R}^{\mathbb{Z}} \\
(W_t)_{t\in\mathbb{Z}} &\mapsto \left(\sum_{k=0}^{m} q_k W_{t-k}\right)_{t\in\mathbb{Z}}.
\end{align*}
Also write $\psi(z)=\sum_{k=0}^{p}\psi_k z^k$, with $p\in\mathbb{N}\cup\{0\}$ and $\psi_k\in\mathbb{R}$. The important fact is that the powers of $B$ commute:
\begin{align*}
B^iB^jW_t = W_{t-i-j} = B^jB^iW_t.
\end{align*}
Hence any two polynomials in $B$ commute.
Now use the assumed factorization
\begin{align*}
\phi(z) = (1-z)\psi(z).
\end{align*}
Substituting $B$ for $z$ gives
\begin{align*}
\phi(B)
= (1-B)\psi(B)
= \psi(B)(1-B)
= \psi(B)\Delta,
\end{align*}
where $\Delta = 1-B$. Since $Y_t = \Delta X_t$, the original equation becomes
\begin{align*}
Z_t
= \phi(B)X_t
= \psi(B)\Delta X_t
= \psi(B)Y_t.
\end{align*}
So the unit-root factor $1-B$ has been absorbed by differencing, and the remaining dynamics are governed by $\psi(B)$.
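As a concrete illustration, take the hypothetical polynomial $\psi(z)=1-\tfrac12 z$. It is not part of the theorem and is chosen only because its single zero $z=2$ lies outside the closed unit disk. Then
\begin{align*}
\phi(z)=(1-z)\left(1-\tfrac12 z\right)=1-\tfrac32 z+\tfrac12 z^2,
\end{align*}
so the original equation reads $X_t-\tfrac32X_{t-1}+\tfrac12X_{t-2}=Z_t$. With $Y_t=X_t-X_{t-1}$, a direct expansion confirms the operator identity in this case:
\begin{align*}
Y_t-\tfrac12Y_{t-1}
&=(X_t-X_{t-1})-\tfrac12(X_{t-1}-X_{t-2}) \\
&=X_t-\tfrac32X_{t-1}+\tfrac12X_{t-2} \\
&=Z_t.
\end{align*}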
[/guided]
[/step]
[step:Invert the stable autoregressive polynomial by an absolutely summable power series]
Because every zero of $\psi$ lies outside the closed unit disk, there exists a number $r > 1$ such that $\psi(z) \neq 0$ whenever $|z| < r$. Therefore the reciprocal map
\begin{align*}
R_\psi: \{z \in \mathbb{C}: |z|<r\} &\to \mathbb{C} \\
z &\mapsto \frac{1}{\psi(z)}
\end{align*}
is holomorphic and has a Taylor expansion
\begin{align*}
\frac{1}{\psi(z)} = \sum_{j=0}^{\infty} \pi_j z^j
\end{align*}
converging absolutely for every $|z| < r$. Choose $\rho$ with $1 < \rho < r$. By [Cauchy's coefficient estimate](/page/Cauchy%20Coefficient%20Estimate) applied on the circle $|z|=\rho$, there is a finite constant
\begin{align*}
M_\rho := \sup_{|z|=\rho} \left|\frac{1}{\psi(z)}\right| < \infty
\end{align*}
such that
\begin{align*}
|\pi_j| \leq M_\rho \rho^{-j}
\end{align*}
for every $j \geq 0$. Hence
\begin{align*}
\sum_{j=0}^{\infty} |\pi_j|
\leq M_\rho \sum_{j=0}^{\infty} \rho^{-j}
= M_\rho \frac{\rho}{\rho-1}
< \infty.
\end{align*}
[guided]
The root condition is exactly what makes the reduced autoregressive equation stable. Since all zeros of $\psi$ lie outside the closed unit disk, the polynomial $\psi$ has no zero on some larger open disk around the origin. More precisely, there exists $r > 1$ such that
\begin{align*}
\psi(z) \neq 0
\end{align*}
whenever $|z| < r$.
Therefore the reciprocal map
\begin{align*}
R_\psi: \{z \in \mathbb{C}: |z|<r\} &\to \mathbb{C} \\
z &\mapsto \frac{1}{\psi(z)}
\end{align*}
is holomorphic. Its Taylor series at $0$ has the form
\begin{align*}
\frac{1}{\psi(z)} = \sum_{j=0}^{\infty} \pi_j z^j,
\end{align*}
and this series converges throughout the disk of radius $r$.
We need more than convergence at each point: we need absolute summability of the coefficient sequence $(\pi_j)_{j=0}^{\infty}$. Choose a radius $\rho$ satisfying $1 < \rho < r$. Since the circle $\{z \in \mathbb{C}: |z|=\rho\}$ is compact and $1/\psi$ is continuous there, the constant
\begin{align*}
M_\rho := \sup_{|z|=\rho} \left|\frac{1}{\psi(z)}\right|
\end{align*}
is finite. [Cauchy's coefficient estimate](/page/Cauchy%20Coefficient%20Estimate) gives
\begin{align*}
|\pi_j| \leq M_\rho \rho^{-j}
\end{align*}
for every $j \geq 0$. Since $\rho > 1$, the geometric series is summable, and therefore
\begin{align*}
\sum_{j=0}^{\infty} |\pi_j|
\leq M_\rho \sum_{j=0}^{\infty} \rho^{-j}
= M_\rho \frac{\rho}{\rho-1}
< \infty.
\end{align*}
This absolute summability is the analytic form of the autoregressive stationarity condition.
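To see the estimate at work, consider the illustrative choice $\psi(z)=1-\tfrac12 z$ (an assumed example, not taken from the theorem). Its only zero is $z=2$, so one may take $r=2$, and the geometric series gives
\begin{align*}
\frac{1}{\psi(z)}=\sum_{j=0}^{\infty}2^{-j}z^j,
\qquad \pi_j=2^{-j}.
\end{align*}
Choosing $\rho=\tfrac32$, the reverse triangle inequality gives $|1-\tfrac12 z|\geq 1-\tfrac34=\tfrac14$ on $|z|=\rho$, with equality at $z=\tfrac32$, so $M_\rho=4$. The coefficient estimate and the resulting bound then read
\begin{align*}
2^{-j}\leq 4\left(\tfrac23\right)^{j},
\qquad
\sum_{j=0}^{\infty}|\pi_j|=2\leq M_\rho\frac{\rho}{\rho-1}=12.
\end{align*}
The bound is far from sharp here, but only its finiteness is needed.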
[/guided]
[/step]
[step:Construct the causal stationary solution of the reduced equation]
Let $(\Omega,\mathcal{F},\mathbb{P})$ denote the probability space on which the white noise process $(Z_t)_{t \in \mathbb{Z}}$ is defined, and let $L^2(\Omega,\mathcal{F},\mathbb{P})$ denote the Hilbert space of square-integrable real-valued random variables modulo equality $\mathbb{P}$-a.s., with norm $\|U\|_{L^2}:=(\mathbb{E}[|U|^2])^{1/2}$. Write $\sigma^2:=\operatorname{Var}(Z_0)$; by the zero-mean white-noise hypothesis, $\mathbb{E}[Z_t]=0$ and $\operatorname{Cov}(Z_s,Z_u)=0$ for $s\neq u$. In this proof, a causal absolutely summable linear process means a process $(W_t)_{t\in\mathbb{Z}}$ for which there exists a sequence $(c_j)_{j=0}^{\infty}\in\ell^1$ such that
\begin{align*}
W_t=\sum_{j=0}^{\infty}c_jZ_{t-j}
\end{align*}
in $L^2(\Omega,\mathcal{F},\mathbb{P})$ for every $t\in\mathbb{Z}$. Define the process $(\widetilde{Y}_t)_{t \in \mathbb{Z}}$ by
\begin{align*}
\widetilde{Y}_t := \sum_{j=0}^{\infty} \pi_j Z_{t-j}.
\end{align*}
Since $(\pi_j)_{j=0}^{\infty} \in \ell^1 \subset \ell^2$ and $\operatorname{Var}(Z_t)=\sigma^2<\infty$, the partial sums form a Cauchy sequence in $L^2(\Omega,\mathcal{F},\mathbb{P})$, because for $N>M$,
\begin{align*}
\mathbb{E}\left[\left|\sum_{j=M+1}^{N}\pi_j Z_{t-j}\right|^2\right]
= \sigma^2 \sum_{j=M+1}^{N}|\pi_j|^2
\leq \sigma^2 \left(\sum_{j=M+1}^{N}|\pi_j|\right)^2.
\end{align*}
Thus the series converges in $L^2(\Omega,\mathcal{F},\mathbb{P})$ for each $t \in \mathbb{Z}$. The process is causal because $\widetilde{Y}_t$ is a function of $Z_t,Z_{t-1},Z_{t-2},\dots$ only.
The expectation functional is continuous on $L^2(\Omega,\mathcal{F},\mathbb{P})$: by the [Cauchy-Schwarz inequality](/page/Cauchy-Schwarz%20Inequality),
\begin{align*}
|\mathbb{E}[U]| \leq \mathbb{E}[|U|] \leq \|U\|_{L^2}
\end{align*}
for every $U\in L^2(\Omega,\mathcal{F},\mathbb{P})$. Therefore the mean may be computed by passing from finite partial sums to the $L^2$ limit:
\begin{align*}
\mathbb{E}[\widetilde{Y}_t]
= \sum_{j=0}^{\infty} \pi_j \mathbb{E}[Z_{t-j}]
= 0.
\end{align*}
For $h \in \mathbb{Z}$, using the white-noise covariance relation
\begin{align*}
\operatorname{Cov}(Z_s,Z_u)
=
\begin{cases}
\sigma^2, & s=u,\\
0, & s \neq u,
\end{cases}
\end{align*}
we obtain the covariance by first computing finite partial sums and then passing to the $L^2$ limit. This passage is valid because if $U_N\to U$ and $V_N\to V$ in $L^2(\Omega,\mathcal{F},\mathbb{P})$, then the [Cauchy-Schwarz inequality](/page/Cauchy-Schwarz%20Inequality) gives
\begin{align*}
|\operatorname{Cov}(U_N,V_N)-\operatorname{Cov}(U,V)|
\leq \|U_N-U\|_{L^2}\|V_N\|_{L^2}+\|U\|_{L^2}\|V_N-V\|_{L^2},
\end{align*}
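which follows from the decomposition $\operatorname{Cov}(U_N,V_N)-\operatorname{Cov}(U,V)=\operatorname{Cov}(U_N-U,V_N)+\operatorname{Cov}(U,V_N-V)$; the right-hand side tends to $0$ because the convergent sequence $(V_N)$ is bounded in $L^2$. Hence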
\begin{align*}
\operatorname{Cov}(\widetilde{Y}_{t+h},\widetilde{Y}_t)
&=
\operatorname{Cov}\left(\sum_{i=0}^{\infty}\pi_i Z_{t+h-i}, \sum_{j=0}^{\infty}\pi_j Z_{t-j}\right) \\
&=
\sigma^2 \sum_{\substack{i,j \geq 0 \\ t+h-i=t-j}} \pi_i\pi_j \\
&=
\sigma^2 \sum_{\substack{i,j \geq 0 \\ j=i-h}} \pi_i\pi_j.
\end{align*}
This depends only on $h$, not on $t$. Hence $(\widetilde{Y}_t)_{t \in \mathbb{Z}}$ is weakly stationary.
[guided]
We now build the stationary causal solution explicitly. Let $(\Omega,\mathcal{F},\mathbb{P})$ be the probability space on which the white noise process $(Z_t)_{t \in \mathbb{Z}}$ is defined. The notation $L^2(\Omega,\mathcal{F},\mathbb{P})$ means the Hilbert space of square-integrable real-valued random variables modulo equality $\mathbb{P}$-a.s., equipped with the norm $\|U\|_{L^2}:=(\mathbb{E}[|U|^2])^{1/2}$. Write $\sigma^2:=\operatorname{Var}(Z_0)$. The zero-mean white-noise hypothesis says that $\mathbb{E}[Z_t]=0$ for every $t\in\mathbb{Z}$ and that $\operatorname{Cov}(Z_s,Z_u)=0$ whenever $s\neq u$.
Here causal has a precise meaning: a process $(W_t)_{t\in\mathbb{Z}}$ is a causal absolutely summable linear process if there is a sequence $(c_j)_{j=0}^{\infty}\in\ell^1$ such that
\begin{align*}
W_t=\sum_{j=0}^{\infty}c_jZ_{t-j}
\end{align*}
in $L^2(\Omega,\mathcal{F},\mathbb{P})$ for every $t\in\mathbb{Z}$. Define
\begin{align*}
\widetilde{Y}_t := \sum_{j=0}^{\infty} \pi_j Z_{t-j}.
\end{align*}
This is causal in the stated sense because the coefficient sequence $(\pi_j)_{j=0}^{\infty}$ is in $\ell^1$ and only present and past shocks appear: the terms are $Z_t,Z_{t-1},Z_{t-2},\dots$.
The coefficient sequence is absolutely summable, so in particular it is square summable. To verify convergence, consider the difference of two partial sums with $N>M$. The white-noise covariance relation gives
\begin{align*}
\mathbb{E}\left[\left|\sum_{j=M+1}^{N}\pi_j Z_{t-j}\right|^2\right]
= \sigma^2 \sum_{j=M+1}^{N}|\pi_j|^2
\leq \sigma^2 \left(\sum_{j=M+1}^{N}|\pi_j|\right)^2.
\end{align*}
The right-hand side tends to $0$ as $M,N \to \infty$ because $(\pi_j)_{j=0}^{\infty}\in\ell^1$. Hence the series converges in $L^2(\Omega,\mathcal{F},\mathbb{P})$ for each fixed $t$. Its mean is computed by first using finite partial sums and then passing to the $L^2$ limit. This passage is valid because expectation is continuous on $L^2(\Omega,\mathcal{F},\mathbb{P})$: by the [Cauchy-Schwarz inequality](/page/Cauchy-Schwarz%20Inequality),
\begin{align*}
|\mathbb{E}[U]| \leq \mathbb{E}[|U|] \leq \|U\|_{L^2}
\end{align*}
for every $U\in L^2(\Omega,\mathcal{F},\mathbb{P})$. Therefore linearity of expectation for the finite sums and $\mathbb{E}[Z_t]=0$ give
\begin{align*}
\mathbb{E}[\widetilde{Y}_t]
= \sum_{j=0}^{\infty} \pi_j \mathbb{E}[Z_{t-j}]
= 0.
\end{align*}
To verify weak stationarity, we compute the covariance at lag $h \in \mathbb{Z}$. The white-noise covariance rule is
\begin{align*}
\operatorname{Cov}(Z_s,Z_u)
=
\begin{cases}
\sigma^2, & s=u,\\
0, & s \neq u.
\end{cases}
\end{align*}
We first compute the covariance for finite partial sums and then let the truncation level tend to infinity. This is legitimate because covariance is continuous with respect to $L^2$ convergence: if $U_N\to U$ and $V_N\to V$ in $L^2(\Omega,\mathcal{F},\mathbb{P})$, then expanding the covariance difference and applying the [Cauchy-Schwarz inequality](/page/Cauchy-Schwarz%20Inequality) gives
\begin{align*}
|\operatorname{Cov}(U_N,V_N)-\operatorname{Cov}(U,V)|
\leq \|U_N-U\|_{L^2}\|V_N\|_{L^2}+\|U\|_{L^2}\|V_N-V\|_{L^2},
\end{align*}
which tends to $0$ when the approximating sequences are $L^2$ convergent and hence bounded in $L^2$. Therefore
\begin{align*}
\operatorname{Cov}(\widetilde{Y}_{t+h},\widetilde{Y}_t)
&=
\operatorname{Cov}\left(\sum_{i=0}^{\infty}\pi_i Z_{t+h-i}, \sum_{j=0}^{\infty}\pi_j Z_{t-j}\right) \\
&=
\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\pi_i\pi_j
\operatorname{Cov}(Z_{t+h-i},Z_{t-j}) \\
&=
\sigma^2 \sum_{\substack{i,j \geq 0 \\ t+h-i=t-j}} \pi_i\pi_j \\
&=
\sigma^2 \sum_{\substack{i,j \geq 0 \\ j=i-h}} \pi_i\pi_j.
\end{align*}
The final expression depends on the lag $h$ but not on the time index $t$. Thus the mean is constant and the autocovariance depends only on the lag, which is precisely weak stationarity.
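For a worked instance, suppose the coefficients are $\pi_j=2^{-j}$, which is what the hypothetical choice $\psi(z)=1-\tfrac12 z$ would produce (this choice is an assumption made only for illustration). The constraint $j=i-h$ pairs $\pi_{j+h}$ with $\pi_j$ when $h\geq0$ and $\pi_i$ with $\pi_{i+|h|}$ when $h<0$, so in both cases
\begin{align*}
\operatorname{Cov}(\widetilde{Y}_{t+h},\widetilde{Y}_t)
=\sigma^2\sum_{j=0}^{\infty}2^{-(j+|h|)}2^{-j}
=\sigma^2\,2^{-|h|}\sum_{j=0}^{\infty}4^{-j}
=\frac{4\sigma^2}{3}\,2^{-|h|}.
\end{align*}
The time index $t$ has disappeared, and the autocovariance decays geometrically in the lag, as expected for a first-order autoregression.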
[/guided]
[/step]
[step:Verify that the causal solution satisfies the reduced autoregressive equation]
Using the notation $\psi(z)=\sum_{k=0}^{p}\psi_k z^k$ fixed above, the identity
\begin{align*}
\psi(z)\sum_{j=0}^{\infty}\pi_j z^j = 1
\end{align*}
holds for $|z|<r$. Since the power series converges absolutely for $|z|<r$, multiplying it term by term by the finite polynomial $\psi$ is justified, and comparison of coefficients gives
\begin{align*}
\sum_{k=0}^{\min\{p,m\}}\psi_k\pi_{m-k}
=
\begin{cases}
1, & m=0,\\
0, & m\geq 1,
\end{cases}
\end{align*}
where $\pi_j:=0$ for $j<0$. For $N\in\mathbb{N}$, define the truncated process $\widetilde{Y}_{t,N}:=\sum_{j=0}^{N}\pi_j Z_{t-j}$. Because $\widetilde{Y}_{t,N}\to\widetilde{Y}_t$ in $L^2(\Omega,\mathcal{F},\mathbb{P})$ for every $t$ and $\psi(B)$ is the finite linear combination $\sum_{k=0}^{p}\psi_kB^k$, we have
\begin{align*}
\psi(B)\widetilde{Y}_{t,N}\to \psi(B)\widetilde{Y}_t
\end{align*}
in $L^2(\Omega,\mathcal{F},\mathbb{P})$. For each finite $N$, rearranging the finite sums gives
\begin{align*}
\psi(B)\widetilde{Y}_{t,N}
&= \sum_{m=0}^{N+p}\left(\sum_{k=0}^{\min\{p,m\}}\psi_k\pi_{m-k}\right)Z_{t-m},
\end{align*}
where $\pi_j:=0$ for $j<0$ and for $j>N$ in this finite truncation. The coefficient identity gives the main term $Z_t$ and leaves only the boundary tail
\begin{align*}
R_{t,N}:=\sum_{m=N+1}^{N+p}\left(\sum_{k=m-N}^{p}\psi_k\pi_{m-k}\right)Z_{t-m}.
\end{align*}
Thus
\begin{align*}
\psi(B)\widetilde{Y}_{t,N}=Z_t+R_{t,N}.
\end{align*}
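Using orthogonality of the white-noise variables, then the elementary bound $\sum_m |x_m|^2\leq\left(\sum_m |x_m|\right)^2$ together with the triangle inequality, and finally the substitution $j=m-k$, we have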
\begin{align*}
\mathbb{E}[|R_{t,N}|^2]
&=\sigma^2\sum_{m=N+1}^{N+p}\left|\sum_{k=m-N}^{p}\psi_k\pi_{m-k}\right|^2 \\
&\leq \sigma^2\left(\sum_{m=N+1}^{N+p}\sum_{k=m-N}^{p}|\psi_k|\,|\pi_{m-k}|\right)^2 \\
&\leq \sigma^2\left(\sum_{k=0}^{p}|\psi_k|\right)^2\left(\sum_{j=N+1-p}^{N}|\pi_j|\right)^2.
\end{align*}
The last expression tends to $0$ as $N\to\infty$ because $(\pi_j)_{j=0}^{\infty}\in\ell^1$. Since also $\psi(B)\widetilde{Y}_{t,N}\to\psi(B)\widetilde{Y}_t$ in $L^2(\Omega,\mathcal{F},\mathbb{P})$, it follows that
\begin{align*}
\psi(B)\widetilde{Y}_t=Z_t
\end{align*}
in $L^2(\Omega,\mathcal{F},\mathbb{P})$ for every $t\in\mathbb{Z}$.
Now let $(V_t)_{t\in\mathbb{Z}}$ be any causal absolutely summable linear-process solution of the reduced equation in the sense defined above. Thus there is a coefficient sequence $(a_j)_{j=0}^{\infty}\in\ell^1$ such that
\begin{align*}
V_t=\sum_{j=0}^{\infty}a_j Z_{t-j}
\end{align*}
in $L^2(\Omega,\mathcal{F},\mathbb{P})$ and $\psi(B)V_t=Z_t$. If $\sigma^2=0$, then $Z_t=0$ in $L^2(\Omega,\mathcal{F},\mathbb{P})$ for every $t$, so every causal linear process driven by $(Z_t)$ is the zero process in $L^2$; hence $V_t=\widetilde{Y}_t=0$ for every $t$.
Assume now that $\sigma^2>0$. Define the absolutely summable sequence $(c_m)_{m=0}^{\infty}$ by
\begin{align*}
c_m:=\sum_{k=0}^{\min\{p,m\}}\psi_k a_{m-k},
\end{align*}
where $a_j:=0$ for $j<0$. Since $\psi$ is a finite polynomial and $(a_j)_{j=0}^{\infty}\in\ell^1$, applying $\psi(B)$ to $V$ gives
\begin{align*}
\psi(B)V_t=\sum_{m=0}^{\infty}c_m Z_{t-m}
\end{align*}
in $L^2(\Omega,\mathcal{F},\mathbb{P})$. Because $\psi(B)V_t=Z_t$, the difference sequence $(d_m)_{m=0}^{\infty}$ defined by $d_0:=c_0-1$ and $d_m:=c_m$ for $m\geq1$ satisfies
\begin{align*}
\sum_{m=0}^{\infty}d_m Z_{t-m}=0
\end{align*}
in $L^2(\Omega,\mathcal{F},\mathbb{P})$. For a fixed $\ell\geq0$, take covariance with $Z_{t-\ell}$. The covariance is continuous under the $L^2$ limit and the white-noise covariance relation gives
\begin{align*}
0=\operatorname{Cov}\left(\sum_{m=0}^{\infty}d_m Z_{t-m},Z_{t-\ell}\right)=\sigma^2 d_\ell.
\end{align*}
Since $\sigma^2>0$, $d_\ell=0$ for every $\ell\geq0$. Therefore
\begin{align*}
\sum_{m=0}^{\infty}\left(\sum_{k=0}^{\min\{p,m\}}\psi_k a_{m-k}\right)z^m=1
\end{align*}
for $|z|<1$. Since $\psi(z)\neq 0$ for $|z|<1$, this implies
\begin{align*}
\sum_{m=0}^{\infty}a_m z^m=\frac{1}{\psi(z)}=\sum_{m=0}^{\infty}\pi_m z^m
\end{align*}
for $|z|<1$. Uniqueness of Taylor coefficients gives $a_m=\pi_m$ for every $m\geq0$, so $V_t=\widetilde{Y}_t$ in $L^2(\Omega,\mathcal{F},\mathbb{P})$ for every $t\in\mathbb{Z}$.
Thus the reduced equation has a unique causal absolutely summable linear-process solution, namely
\begin{align*}
\widetilde{Y}_t = \sum_{j=0}^{\infty} \pi_j Z_{t-j}.
\end{align*}
The preceding covariance computation shows that this solution is weakly stationary. The first step shows separately that the differenced process $Y_t=\Delta X_t$ satisfies $\psi(B)Y_t=Z_t$; whenever $Y$ belongs to the stated causal absolutely summable linear-process class, it therefore coincides with this unique stationary reduced ARMA process.
[guided]
The power series was chosen so that it is the reciprocal of $\psi$. Using the notation $\psi(z)=\sum_{k=0}^{p}\psi_k z^k$ fixed above, the reciprocal identity is
\begin{align*}
\psi(z)\sum_{j=0}^{\infty}\pi_j z^j = 1
\end{align*}
for $|z|<r$. Because $\psi$ is a finite polynomial and $(\pi_j)_{j=0}^{\infty}\in\ell^1$, multiplying the finite polynomial into the absolutely convergent series is legitimate. Comparing Taylor coefficients gives
\begin{align*}
\sum_{k=0}^{\min\{p,m\}}\psi_k\pi_{m-k}
=
\begin{cases}
1, & m=0,\\
0, & m\geq 1,
\end{cases}
\end{align*}
where $\pi_j:=0$ for $j<0$.
We now translate this coefficient identity back into the stochastic equation using finite truncations. For $N\in\mathbb{N}$, set
\begin{align*}
\widetilde{Y}_{t,N}:=\sum_{j=0}^{N}\pi_jZ_{t-j}.
\end{align*}
The construction above gives $\widetilde{Y}_{t,N}\to\widetilde{Y}_t$ in $L^2(\Omega,\mathcal{F},\mathbb{P})$ for every $t$. Since $\psi(B)=\sum_{k=0}^{p}\psi_kB^k$ is a finite sum of shifts, applying $\psi(B)$ preserves $L^2$ convergence, so
\begin{align*}
\psi(B)\widetilde{Y}_{t,N}\to\psi(B)\widetilde{Y}_t
\end{align*}
in $L^2(\Omega,\mathcal{F},\mathbb{P})$. For each fixed $N$, all sums are finite, and therefore
\begin{align*}
\psi(B)\widetilde{Y}_{t,N}
&= \sum_{m=0}^{N+p}\left(\sum_{k=0}^{\min\{p,m\}}\psi_k\pi_{m-k}\right)Z_{t-m},
\end{align*}
where $\pi_j:=0$ for $j<0$ and for $j>N$ in this finite truncation. The coefficient identity kills all coefficients except the coefficient of $Z_t$, but the finite truncation creates boundary terms near the cutoff. More precisely,
\begin{align*}
\psi(B)\widetilde{Y}_{t,N}=Z_t+R_{t,N},
\end{align*}
where
\begin{align*}
R_{t,N}:=\sum_{m=N+1}^{N+p}\left(\sum_{k=m-N}^{p}\psi_k\pi_{m-k}\right)Z_{t-m}.
\end{align*}
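The tail $R_{t,N}$ vanishes in $L^2$. Indeed, orthogonality of the white-noise variables, the elementary bound $\sum_m |x_m|^2\leq\left(\sum_m |x_m|\right)^2$ together with the triangle inequality, and the substitution $j=m-k$ give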
\begin{align*}
\mathbb{E}[|R_{t,N}|^2]
&=\sigma^2\sum_{m=N+1}^{N+p}\left|\sum_{k=m-N}^{p}\psi_k\pi_{m-k}\right|^2 \\
&\leq \sigma^2\left(\sum_{m=N+1}^{N+p}\sum_{k=m-N}^{p}|\psi_k|\,|\pi_{m-k}|\right)^2 \\
&\leq \sigma^2\left(\sum_{k=0}^{p}|\psi_k|\right)^2\left(\sum_{j=N+1-p}^{N}|\pi_j|\right)^2.
\end{align*}
Because $(\pi_j)_{j=0}^{\infty}\in\ell^1$, the final tail tends to $0$ as $N\to\infty$. Since $\psi(B)\widetilde{Y}_{t,N}\to\psi(B)\widetilde{Y}_t$ in $L^2(\Omega,\mathcal{F},\mathbb{P})$, we conclude
\begin{align*}
\psi(B)\widetilde{Y}_t=Z_t.
\end{align*}
This is the precise reason the infinite filter may be convolved with the finite autoregressive polynomial: the convolution is first performed at the finite level, the boundary tail is estimated, and only then the $L^2$ limit is taken.
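The boundary term can be written out explicitly in a simple case. Assume, purely for illustration, that $\psi(z)=1-\tfrac12 z$, so that $p=1$, $\psi_0=1$, $\psi_1=-\tfrac12$, and $\pi_j=2^{-j}$. Then the truncated convolution telescopes:
\begin{align*}
\psi(B)\widetilde{Y}_{t,N}
&=\sum_{j=0}^{N}2^{-j}Z_{t-j}-\tfrac12\sum_{j=0}^{N}2^{-j}Z_{t-1-j} \\
&=Z_t+\sum_{j=1}^{N}\left(2^{-j}-2^{-j}\right)Z_{t-j}-2^{-(N+1)}Z_{t-N-1} \\
&=Z_t-2^{-(N+1)}Z_{t-N-1}.
\end{align*}
Hence $R_{t,N}=\psi_1\pi_N Z_{t-N-1}=-2^{-(N+1)}Z_{t-N-1}$, in agreement with the general formula for $m=N+1$ and $k=1$, and
\begin{align*}
\mathbb{E}[|R_{t,N}|^2]=\sigma^2 4^{-(N+1)}
\leq \sigma^2\left(1+\tfrac12\right)^2\left(2^{-N}\right)^2,
\end{align*}
which tends to $0$ as $N\to\infty$.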
It remains to explain uniqueness in the stated causal absolutely summable linear-process class without appealing to an external criterion. Let $(V_t)_{t\in\mathbb{Z}}$ be another such solution. Then, by definition of this solution class, there is a sequence $(a_j)_{j=0}^{\infty}\in\ell^1$ such that
\begin{align*}
V_t=\sum_{j=0}^{\infty}a_j Z_{t-j}
\end{align*}
in $L^2(\Omega,\mathcal{F},\mathbb{P})$ and $\psi(B)V_t=Z_t$. There is one degenerate case to separate. If $\sigma^2=0$, then $Z_t=0$ in $L^2(\Omega,\mathcal{F},\mathbb{P})$ for every $t$, so every causal linear process driven by $(Z_t)$ is the zero process in $L^2$. In that case $V_t=\widetilde{Y}_t=0$ for every $t$.
Assume now that $\sigma^2>0$. Define
\begin{align*}
c_m:=\sum_{k=0}^{\min\{p,m\}}\psi_k a_{m-k},
\end{align*}
with $a_j:=0$ for $j<0$. The sequence $(c_m)_{m=0}^{\infty}$ is absolutely summable because it is the convolution of the finite sequence $(\psi_0,\dots,\psi_p)$ with the $\ell^1$ sequence $(a_j)_{j=0}^{\infty}$. Hence applying $\psi(B)$ to $V$ gives
\begin{align*}
\psi(B)V_t=\sum_{m=0}^{\infty}c_mZ_{t-m}
\end{align*}
in $L^2(\Omega,\mathcal{F},\mathbb{P})$. Since $\psi(B)V_t=Z_t$, define $d_0:=c_0-1$ and $d_m:=c_m$ for $m\geq1$; then
\begin{align*}
\sum_{m=0}^{\infty}d_mZ_{t-m}=0
\end{align*}
in $L^2(\Omega,\mathcal{F},\mathbb{P})$. To identify the coefficients, fix $\ell\geq0$ and take covariance with $Z_{t-\ell}$. Continuity of covariance under $L^2$ convergence justifies passing through the infinite sum, and the white-noise covariance relation yields
\begin{align*}
0=\operatorname{Cov}\left(\sum_{m=0}^{\infty}d_m Z_{t-m},Z_{t-\ell}\right)=\sigma^2 d_\ell.
\end{align*}
Because $\sigma^2>0$, this gives $d_\ell=0$. Since $\ell$ was arbitrary, $c_0=1$ and $c_m=0$ for every $m\geq1$. Therefore
\begin{align*}
\sum_{m=0}^{\infty}\left(\sum_{k=0}^{\min\{p,m\}}\psi_k a_{m-k}\right)z^m=1
\end{align*}
for $|z|<1$. Since the root condition gives $\psi(z)\neq0$ on $|z|<1$, division by $\psi(z)$ is valid there, and hence
\begin{align*}
\sum_{m=0}^{\infty}a_m z^m=\frac{1}{\psi(z)}=\sum_{m=0}^{\infty}\pi_m z^m
\end{align*}
for $|z|<1$. Uniqueness of Taylor coefficients now forces $a_m=\pi_m$ for every $m\geq0$. Thus any causal absolutely summable linear-process solution agrees with $\widetilde{Y}$ in $L^2(\Omega,\mathcal{F},\mathbb{P})$ at every time.
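The coefficient conditions $c_0=1$ and $c_m=0$ for $m\geq1$ can also be solved by hand in a small case. Taking the hypothetical polynomial $\psi(z)=1-\tfrac12 z$ (an assumption for illustration only), they become
\begin{align*}
a_0=1,
\qquad
a_m-\tfrac12a_{m-1}=0 \quad (m\geq1),
\end{align*}
and induction on $m$ gives
\begin{align*}
a_m=2^{-m}=\pi_m
\end{align*}
for every $m\geq0$. In this example the recursion determines the coefficients one at a time, which is the same uniqueness statement that the Taylor-coefficient argument delivers in general.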
We have therefore verified the reduced ARMA description: the equation $\psi(B)Y_t=Z_t$ has a unique causal absolutely summable linear-process solution, namely
\begin{align*}
\widetilde{Y}_t = \sum_{j=0}^{\infty} \pi_j Z_{t-j}.
\end{align*}
The covariance computation above shows that this solution is weakly stationary. The first step proved separately that the algebraic differencing operation sends $\phi(B)X_t=Z_t$ to $\psi(B)\Delta X_t=Z_t$. Thus differencing cancels the simple unit-root factor $1-B$, and, in the stated causal linear-process class, the stationary reduced dynamics are governed by $\psi(B)$.
[/guided]
[/step]