[proofplan]
The root condition gives a disc of radius larger than $1$ on which $\theta/\phi$ is holomorphic, hence its Taylor coefficients decay geometrically and are absolutely summable. These coefficients define a causal linear process by $L^2$ convergence, and stationarity follows because the covariance is obtained as the $L^2$ limit of finite covariance sums depending only on time differences. The identity $\phi(z)\psi(z)=\theta(z)$ is first translated into coefficient identities and then into the ARMA equation by showing that all truncation boundary coefficients vanish in $\ell^2$. Uniqueness follows within the class of absolutely summable causal linear processes by subtracting two such solutions: the coefficient sequence of the difference has a generating function annihilated by $\phi$, and analyticity at $0$ forces that generating function to vanish.
[/proofplan]
[step:Choose a zero-free disc and obtain absolutely summable coefficients]
Let $\rho_\phi$ denote the distance from $0$ to the nearest zero of $\phi$, with the convention $\rho_\phi=\infty$ if $\phi$ has no complex zero. Since $\phi(z)\ne 0$ for $|z|\le 1$, we have $\rho_\phi>1$. Choose a real number $r$ such that $1<r<\rho_\phi$. Then $\phi$ has no zero on the closed disc $\{z\in\mathbb{C}: |z|\le r\}$.
Define
\begin{align*}
\psi:\{z\in\mathbb{C}: |z|<r\} &\to \mathbb{C}\\
z &\mapsto \frac{\theta(z)}{\phi(z)}.
\end{align*}
The denominator is nonzero on this disc, so $\psi$ is holomorphic there. Therefore $\psi$ has a Taylor expansion about $0$,
\begin{align*}
\psi(z)=\sum_{j=0}^{\infty}\psi_j z^j \qquad (|z|<r),
\end{align*}
whose radius of convergence is at least $r>1$.
Choose $s$ with $1<s<r$. Since the closed circle $\{z\in\mathbb{C}: |z|=s\}$ is compact and $\psi$ is continuous on it, define
\begin{align*}
M_s=\sup_{|z|=s}|\psi(z)|<\infty.
\end{align*}
The [Cauchy coefficient estimate](/page/Cauchy%20Estimates), applied to the holomorphic function $\psi$ on the disc $\{z\in\mathbb{C}: |z|<r\}$ and the circle of radius $s$, gives
\begin{align*}
|\psi_j|\le M_s s^{-j}\qquad \text{for every }j\ge 0.
\end{align*}
Hence
\begin{align*}
\sum_{j=0}^{\infty}|\psi_j|
\le
M_s\sum_{j=0}^{\infty}s^{-j}
=
M_s\frac{s}{s-1}
<\infty.
\end{align*}
Thus $(\psi_j)_{j\ge0}$ is absolutely summable.
[/step]
[step:Construct the causal linear process in $L^2$]
For each $n\in\mathbb{N}\cup\{0\}$ and $t\in\mathbb{Z}$, define the finite partial sum
\begin{align*}
X_{t,n}=\sum_{j=0}^{n}\psi_j Z_{t-j}.
\end{align*}
If $m>n$, orthogonality of the white-noise sequence gives
\begin{align*}
\mathbb{E}\left[\left|X_{t,m}-X_{t,n}\right|^2\right]
&=
\mathbb{E}\left[\left|\sum_{j=n+1}^{m}\psi_j Z_{t-j}\right|^2\right] \\
&=
\sigma^2\sum_{j=n+1}^{m}|\psi_j|^2.
\end{align*}
Since $\sum_{j=0}^{\infty}|\psi_j|<\infty$, also $\sum_{j=0}^{\infty}|\psi_j|^2<\infty$. Therefore $(X_{t,n})_{n\ge0}$ is Cauchy in $L^2$. By [Completeness of $L^2$](/page/L2%20Space), there exists an $L^2$ random variable $X_t$ such that
\begin{align*}
X_t=\lim_{n\to\infty}X_{t,n}
=
\sum_{j=0}^{\infty}\psi_j Z_{t-j}
\qquad \text{in }L^2.
\end{align*}
The random variable $X_t$ belongs to the closed linear span in $L^2$ of $\{Z_t,Z_{t-1},Z_{t-2},\dots\}$, so the process is causal.
For $h\in\mathbb{Z}$, compute first with the finite sums. Orthogonality gives
\begin{align*}
\mathbb{E}[X_{t+h,n}\overline{X_{t,n}}]
&=
\sigma^2\sum_{\substack{0\le i,j\le n\\ t+h-i=t-j}}\psi_i\overline{\psi_j}.
\end{align*}
Since $X_{t+h,n}\to X_{t+h}$ and $X_{t,n}\to X_t$ in $L^2$, the Cauchy-Schwarz inequality implies
\begin{align*}
\mathbb{E}[X_{t+h,n}\overline{X_{t,n}}]
\to
\mathbb{E}[X_{t+h}\overline{X_t}].
\end{align*}
The finite sums converge absolutely to
\begin{align*}
\mathbb{E}[X_{t+h}\overline{X_t}]
&=
\sigma^2\sum_{\substack{i,j\ge0\\ t+h-i=t-j}}\psi_i\overline{\psi_j} \\
&=
\sigma^2\sum_{\substack{i,j\ge0\\ j=i-h}}\psi_i\overline{\psi_j},
\end{align*}
because $(\psi_j)$ is absolutely summable. The right-hand side depends on $h$ and not on $t$. Also $\mathbb{E}[X_t]=0$ for every $t$. Hence $(X_t)_{t\in\mathbb{Z}}$ is weakly stationary.
[guided]
We first build the candidate process as a limit of finite moving averages, because finite moving averages are unambiguous random variables. For $n\ge0$ and $t\in\mathbb{Z}$, set
\begin{align*}
X_{t,n}=\sum_{j=0}^{n}\psi_j Z_{t-j}.
\end{align*}
To prove that these partial sums converge in $L^2$, take $m>n$. The difference is
\begin{align*}
X_{t,m}-X_{t,n}
=
\sum_{j=n+1}^{m}\psi_j Z_{t-j}.
\end{align*}
When we square and take expectation, all cross terms vanish because distinct white-noise variables are orthogonal:
\begin{align*}
\mathbb{E}\left[\left|X_{t,m}-X_{t,n}\right|^2\right]
=
\sigma^2\sum_{j=n+1}^{m}|\psi_j|^2.
\end{align*}
The coefficient sequence is absolutely summable, hence square summable, so the right-hand side tends to $0$ as $m,n\to\infty$. Thus the partial sums are Cauchy in $L^2$. By [Completeness of $L^2$](/page/L2%20Space), this Cauchy sequence has a limit in $L^2$, and we denote it by
\begin{align*}
X_t=\sum_{j=0}^{\infty}\psi_j Z_{t-j}.
\end{align*}
This representation is causal because $X_t$ is built only from present and past innovations $Z_t,Z_{t-1},Z_{t-2},\dots$. To check stationarity, compute the covariance through finite approximations and then pass to the limit. For each $n$, orthogonality gives
\begin{align*}
\mathbb{E}[X_{t+h,n}\overline{X_{t,n}}]
=
\sigma^2\sum_{\substack{0\le i,j\le n\\ t+h-i=t-j}}\psi_i\overline{\psi_j}.
\end{align*}
Because $X_{t+h,n}\to X_{t+h}$ and $X_{t,n}\to X_t$ in $L^2$, the Cauchy-Schwarz inequality justifies passing the covariance to the limit:
\begin{align*}
\mathbb{E}[X_{t+h,n}\overline{X_{t,n}}]
\to
\mathbb{E}[X_{t+h}\overline{X_t}].
\end{align*}
The finite coefficient sums converge absolutely since $(\psi_j)$ is absolutely summable. Therefore
\begin{align*}
\mathbb{E}[X_{t+h}\overline{X_t}]
=
\sigma^2\sum_{\substack{i,j\ge0\\ t+h-i=t-j}}\psi_i\overline{\psi_j}.
\end{align*}
The condition $t+h-i=t-j$ is equivalent to $j=i-h$, so the resulting sum depends only on $h$. Therefore $(X_t)$ is weakly stationary.
[/guided]
[/step]
[step:Translate the power-series identity into the ARMA equation]
The theorem statement defines the autoregressive order $p\in\mathbb{N}\cup\{0\}$, the moving-average order $q\in\mathbb{N}\cup\{0\}$, the coefficients $\phi_1,\dots,\phi_p\in\mathbb{C}$, and the coefficients $\theta_0,…,\theta_q\in\mathbb{C}$ by
\begin{align*}
\phi(z)=1-\sum_{k=1}^{p}\phi_kz^k,
\qquad
\theta(z)=\sum_{m=0}^{q}\theta_mz^m.
\end{align*}
For indices $m>q$, define $\theta_m=0$. The identity $\phi(z)\psi(z)=\theta(z)$ holds for $|z|<r$. Comparing Taylor coefficients gives, for every $m\ge0$,
\begin{align*}
\psi_m-\sum_{k=1}^{\min(p,m)}\phi_k\psi_{m-k}=\theta_m.
\end{align*}
For $N\ge0$, define
\begin{align*}
S_{t,N}=\sum_{m=0}^{N}\psi_m Z_{t-m}.
\end{align*}
Since $S_{t,N}\to X_t$ in $L^2$, each finite lag also satisfies $S_{t-k,N}\to X_{t-k}$ in $L^2$. Therefore
\begin{align*}
S_{t,N}-\sum_{k=1}^{p}\phi_k S_{t-k,N}
\to
X_t-\sum_{k=1}^{p}\phi_k X_{t-k}
\qquad \text{in }L^2.
\end{align*}
Define $\psi_j=0$ for $j<0$ and for $j>N$. The coefficient of $Z_{t-m}$ in the finite filtered truncation is
\begin{align*}
b_{m,N}=\psi_m-\sum_{k=1}^{p}\phi_k\psi_{m-k}
\end{align*}
with this truncated convention. For fixed $m\le N$, the coefficient identity gives $b_{m,N}=\theta_m$ except for the finitely many boundary indices $N+1\le m\le N+p$, where
\begin{align*}
b_{m,N}=-\sum_{k=m-N}^{p}\phi_k\psi_{m-k}.
\end{align*}
Thus
\begin{align*}
\sum_{m=0}^{\infty}|b_{m,N}-\theta_m|^2
\le
\sum_{m=N+1}^{N+p}\left(\sum_{k=m-N}^{p}|\phi_k|\,|\psi_{m-k}|\right)^2
\to 0
\end{align*}
as $N\to\infty$, because only finitely many shifts occur and $\psi_j\to0$. Orthogonality of the white-noise sequence converts this $\ell^2$ convergence of coefficients into $L^2$ convergence of the corresponding random variables. Consequently,
\begin{align*}
X_t-\sum_{k=1}^{p}\phi_kX_{t-k}
=
\sum_{m=0}^{q}\theta_m Z_{t-m}
\qquad \text{in }L^2.
\end{align*}
That is,
\begin{align*}
\phi(B)X_t=\theta(B)Z_t.
\end{align*}
[guided]
The analytic identity
\begin{align*}
\phi(z)\psi(z)=\theta(z)
\end{align*}
is the algebraic heart of the construction. The theorem statement fixes the polynomial conventions: $p$ is the autoregressive order, $q$ is the moving-average order,
\begin{align*}
\phi(z)=1-\sum_{k=1}^{p}\phi_kz^k,
\qquad
\theta(z)=\sum_{m=0}^{q}\theta_mz^m,
\end{align*}
with coefficients $\phi_k,\theta_m\in\mathbb{C}$. For $m>q$, set $\theta_m=0$. Since
\begin{align*}
\psi(z)=\sum_{j=0}^{\infty}\psi_jz^j,
\end{align*}
multiplying the two series and comparing the coefficient of $z^m$ gives
\begin{align*}
\psi_m-\sum_{k=1}^{\min(p,m)}\phi_k\psi_{m-k}=\theta_m.
\end{align*}
This coefficient identity says exactly that applying the autoregressive filter $\phi(B)$ to the infinite moving average with coefficients $(\psi_j)$ should leave the finite moving average with coefficients $(\theta_j)$.
To justify this at the process level, define the finite truncation
\begin{align*}
S_{t,N}=\sum_{m=0}^{N}\psi_m Z_{t-m}.
\end{align*}
We already proved that $S_{t,N}\to X_t$ in $L^2$. Since a finite linear combination of $L^2$-convergent sequences is again $L^2$-convergent, we have
\begin{align*}
S_{t,N}-\sum_{k=1}^{p}\phi_k S_{t-k,N}
\to
X_t-\sum_{k=1}^{p}\phi_k X_{t-k}
\qquad \text{in }L^2.
\end{align*}
Now we identify the coefficient error instead of hiding it in notation. Define $\psi_j=0$ for $j<0$ and for $j>N$ while working with the finite truncation. Then the coefficient of $Z_{t-m}$ in
\begin{align*}
S_{t,N}-\sum_{k=1}^{p}\phi_k S_{t-k,N}
\end{align*}
is
\begin{align*}
b_{m,N}=\psi_m-\sum_{k=1}^{p}\phi_k\psi_{m-k}
\end{align*}
under this truncated convention. For $m\le N$, the only possible discrepancy from the infinite coefficient identity occurs when a shifted index crosses the cutoff. There is no discrepancy except near the boundary, and for $N+1\le m\le N+p$ the coefficient is
\begin{align*}
b_{m,N}=-\sum_{k=m-N}^{p}\phi_k\psi_{m-k}.
\end{align*}
Therefore
\begin{align*}
\sum_{m=0}^{\infty}|b_{m,N}-\theta_m|^2
\le
\sum_{m=N+1}^{N+p}\left(\sum_{k=m-N}^{p}|\phi_k|\,|\psi_{m-k}|\right)^2
\to 0.
\end{align*}
The convergence holds because the number of autoregressive coefficients is finite and $\psi_j\to0$, which follows from absolute summability of $(\psi_j)$. Orthogonality of the white noise converts this $\ell^2$ convergence of coefficients into $L^2$ convergence of the corresponding random variables. Therefore the limit is
\begin{align*}
\sum_{m=0}^{q}\theta_m Z_{t-m}.
\end{align*}
Thus
\begin{align*}
X_t-\sum_{k=1}^{p}\phi_kX_{t-k}
=
\sum_{m=0}^{q}\theta_m Z_{t-m},
\end{align*}
which is precisely $\phi(B)X_t=\theta(B)Z_t$.
[/guided]
[/step]
[step:Show that no other absolutely summable causal linear process can satisfy the same equation]
Let $(\widetilde{X}_t)_{t\in\mathbb{Z}}$ be another causal stationary solution in the class stated in the theorem, so it is a causal linear process of the form
\begin{align*}
\widetilde{X}_t=\sum_{j=0}^{\infty}\widetilde{\psi}_j Z_{t-j},
\qquad
\sum_{j=0}^{\infty}|\widetilde{\psi}_j|<\infty.
\end{align*}
Define the coefficient differences
\begin{align*}
a_j=\widetilde{\psi}_j-\psi_j \qquad (j\ge0),
\end{align*}
and define the difference process
\begin{align*}
Y_t=\widetilde{X}_t-X_t=\sum_{j=0}^{\infty}a_j Z_{t-j}.
\end{align*}
Then $\sum_{j=0}^{\infty}|a_j|<\infty$ and
\begin{align*}
\phi(B)Y_t=0.
\end{align*}
Define the generating function
\begin{align*}
A:\{z\in\mathbb{C}: |z|<1\}&\to\mathbb{C}\\
z&\mapsto \sum_{j=0}^{\infty}a_jz^j.
\end{align*}
Absolute summability implies that $A$ is holomorphic on the open unit disc and continuous on the closed unit disc.
Define $a_j=0$ for $j<0$, and define the filtered coefficient sequence $(c_m)_{m\ge0}$ by
\begin{align*}
c_m=a_m-\sum_{k=1}^{p}\phi_k a_{m-k}.
\end{align*}
Since $(a_j)$ is absolutely summable and the autoregressive order $p$ is finite, $(c_m)$ is square summable. The identity $\phi(B)Y_t=0$ means
\begin{align*}
\sum_{m=0}^{\infty}c_m Z_{t-m}=0
\qquad \text{in }L^2.
\end{align*}
Using orthogonality of the white-noise sequence and the hypothesis $\sigma^2>0$, we obtain
\begin{align*}
0
=
\mathbb{E}\left[\left|\sum_{m=0}^{\infty}c_m Z_{t-m}\right|^2\right]
=
\sigma^2\sum_{m=0}^{\infty}|c_m|^2,
\end{align*}
so $c_m=0$ for every $m\ge0$. These equations are exactly the Taylor coefficient equations for
\begin{align*}
\phi(z)A(z)=0
\qquad (|z|<1).
\end{align*}
Since $\phi(z)\ne0$ for every $|z|<1$, it follows that $A(z)=0$ for every $|z|<1$. Hence all Taylor coefficients of $A$ vanish, so $a_j=0$ for every $j\ge0$. Therefore $\widetilde{\psi}_j=\psi_j$ for every $j\ge0$, and $\widetilde{X}_t=X_t$ in $L^2$ for every $t\in\mathbb{Z}$.
[guided]
To prove uniqueness in the stated class, subtract two possible absolutely summable causal linear solutions. Suppose
\begin{align*}
\widetilde{X}_t=\sum_{j=0}^{\infty}\widetilde{\psi}_j Z_{t-j},
\qquad
\sum_{j=0}^{\infty}|\widetilde{\psi}_j|<\infty,
\end{align*}
is another causal stationary solution of the same ARMA equation in that class. Define
\begin{align*}
a_j=\widetilde{\psi}_j-\psi_j
\qquad (j\ge0),
\end{align*}
and
\begin{align*}
Y_t=\widetilde{X}_t-X_t=\sum_{j=0}^{\infty}a_jZ_{t-j}.
\end{align*}
Because both coefficient sequences are absolutely summable, $(a_j)$ is absolutely summable. Since both processes solve the same equation, subtracting the two equations gives
\begin{align*}
\phi(B)Y_t=0.
\end{align*}
Now encode the coefficients of the causal linear process in a generating function:
\begin{align*}
A:\{z\in\mathbb{C}: |z|<1\}&\to\mathbb{C}\\
z&\mapsto \sum_{j=0}^{\infty}a_jz^j.
\end{align*}
Absolute summability ensures that this series converges absolutely for $|z|\le1$, and in particular $A$ is holomorphic on $|z|<1$.
We must justify carefully what information is contained in $\phi(B)Y_t=0$. Define $a_j=0$ for $j<0$, and define
\begin{align*}
c_m=a_m-\sum_{k=1}^{p}\phi_k a_{m-k}
\qquad (m\ge0).
\end{align*}
Because $(a_j)$ is absolutely summable and only finitely many autoregressive coefficients $\phi_k$ occur, the sequence $(c_m)$ is square summable. Applying the filter $\phi(B)$ to $Y_t$ therefore gives
\begin{align*}
\phi(B)Y_t
=
\sum_{m=0}^{\infty}c_m Z_{t-m}
\qquad \text{in }L^2.
\end{align*}
Since $\phi(B)Y_t=0$ in $L^2$, orthogonality of the white-noise sequence gives
\begin{align*}
0
=
\mathbb{E}\left[\left|\sum_{m=0}^{\infty}c_m Z_{t-m}\right|^2\right]
=
\sigma^2\sum_{m=0}^{\infty}|c_m|^2.
\end{align*}
Here the assumption $\sigma^2>0$ is essential: it lets us conclude from the vanishing variance that $\sum_{m=0}^{\infty}|c_m|^2=0$, and hence $c_m=0$ for every $m\ge0$. These equations say exactly that every Taylor coefficient of $\phi(z)A(z)$ is zero. Therefore
\begin{align*}
\phi(z)A(z)=0
\qquad (|z|<1).
\end{align*}
The root condition gives $\phi(z)\ne0$ throughout the unit disc. Dividing by $\phi(z)$ gives
\begin{align*}
A(z)=0
\qquad (|z|<1).
\end{align*}
A holomorphic function with zero Taylor expansion on a disc has all coefficients equal to zero, so $a_j=0$ for every $j\ge0$. Thus the two causal coefficient sequences agree term by term, and the two processes agree in $L^2$ at every time.
[/guided]
[/step]
[step:Identify the solution with the stated transfer function]
The construction produced a causal stationary solution
\begin{align*}
X_t=\sum_{j=0}^{\infty}\psi_j Z_{t-j}=\psi(B)Z_t,
\end{align*}
where
\begin{align*}
\psi(z)=\frac{\theta(z)}{\phi(z)}=\sum_{j=0}^{\infty}\psi_jz^j.
\end{align*}
The first step showed that this power series has radius of convergence strictly larger than $1$ and that $(\psi_j)_{j\ge0}$ is absolutely summable. The uniqueness step showed that no other absolutely summable causal linear process can solve the same ARMA equation. This proves existence and uniqueness of the causal stationary ARMA solution in the class stated in the theorem.
[/step]