[proofplan]
The root condition implies that $\theta/\phi$ is analytic on a disk of radius strictly larger than $1$. Its Taylor coefficients therefore decay geometrically by Cauchy's estimates, hence are absolutely summable. We use these coefficients to define a causal linear process, verify weak stationarity directly from the white-noise covariance structure, and then check the ARMA equation by multiplying the absolutely summable filter by the finite lag polynomial. Finally, any other absolutely summable causal linear solution has coefficients satisfying the same convolution identities, and the nonzero variance of the innovations forces equality coefficient by coefficient.
[/proofplan]
[step:Choose an analytic expansion of $\theta/\phi$ with absolutely summable coefficients]
Let $R_\phi$ denote the set of zeros of $\phi$ in $\mathbb{C}$. Since $\phi(z)\ne 0$ for all $|z|\le 1$ and $R_\phi$ is finite, there exists $r>1$ such that $\phi(z)\ne 0$ for every $z\in\mathbb{C}$ with $|z|<r$. Define
\begin{align*}
f: \{z\in\mathbb{C}: |z|<r\} \to \mathbb{C}
\end{align*}
by
\begin{align*}
f(z)=\frac{\theta(z)}{\phi(z)}.
\end{align*}
Then $f$ is analytic on $\{z\in\mathbb{C}: |z|<r\}$.
Choose a number $\rho$ with $1<\rho<r$, and define
\begin{align*}
M_\rho=\sup\{|f(z)|: |z|=\rho\}.
\end{align*}
The set $\{z\in\mathbb{C}: |z|=\rho\}$ is compact and $f$ is continuous, so $M_\rho<\infty$. Let $(\psi_j)_{j\ge 0}$ be the Taylor coefficients of $f$ at $0$, so
\begin{align*}
f(z)=\sum_{j=0}^{\infty}\psi_j z^j
\end{align*}
for $|z|<r$. By Cauchy's coefficient estimate applied on the circle of radius $\rho$,
\begin{align*}
|\psi_j|\le M_\rho \rho^{-j}
\end{align*}
for every $j\ge 0$. Therefore
\begin{align*}
\sum_{j=0}^{\infty}|\psi_j|\le M_\rho \sum_{j=0}^{\infty}\rho^{-j}=\frac{M_\rho \rho}{\rho-1}<\infty.
\end{align*}
[guided]
The important point is that the zeros of $\phi$ lie strictly outside the closed unit disk. Since $\phi$ is a nonzero polynomial, it has only finitely many complex zeros. Because none of them has modulus at most $1$, we may choose $r>1$ so that no zero of $\phi$ lies in the open disk $\{z\in\mathbb{C}: |z|<r\}$. Hence the quotient
\begin{align*}
f: \{z\in\mathbb{C}: |z|<r\} \to \mathbb{C}
\end{align*}
defined by
\begin{align*}
f(z)=\frac{\theta(z)}{\phi(z)}
\end{align*}
is analytic on that disk.
Why do we need a radius larger than $1$ rather than just analyticity on the unit disk? We need absolute summability of the Taylor coefficients, and that follows from geometric decay. Choose $\rho$ with $1<\rho<r$, and define
\begin{align*}
M_\rho=\sup\{|f(z)|: |z|=\rho\}.
\end{align*}
The circle $|z|=\rho$ is compact and $f$ is continuous, so this supremum is finite. Let $(\psi_j)_{j\ge 0}$ be the Taylor coefficients of $f$ at $0$. Cauchy's coefficient estimate on the circle $|z|=\rho$ gives
\begin{align*}
|\psi_j|\le M_\rho \rho^{-j}
\end{align*}
for every $j\ge 0$. Since $\rho>1$, the geometric series with ratio $\rho^{-1}$ converges, and therefore
\begin{align*}
\sum_{j=0}^{\infty}|\psi_j|\le M_\rho \sum_{j=0}^{\infty}\rho^{-j}=\frac{M_\rho \rho}{\rho-1}<\infty.
\end{align*}
This is the analytic reason the root condition gives a stable causal filter.
[/guided]
[/step]
[step:Define the causal process and verify weak stationarity]
For each $t\in\mathbb{Z}$, define the [random variable](/page/Random%20Variable)
\begin{align*}
X_t=\sum_{j=0}^{\infty}\psi_j Z_{t-j}.
\end{align*}
Let $L^2(\Omega,\mathcal F,\mathbb P)$ denote the [Hilbert space](/page/Hilbert%20Space) of real-valued square-integrable random variables, modulo equality almost surely, with norm $\|W\|_{L^2}=(\mathbb{E}[|W|^2])^{1/2}$. Let $\ell^1(\mathbb{N}_0)$ denote the space of absolutely summable real sequences indexed by $\mathbb{N}_0=\{0,1,2,\dots\}$, and let $\ell^2(\mathbb{N}_0)$ denote the space of square-summable real sequences indexed by $\mathbb{N}_0$. Since $\sum_{j=0}^{\infty}|\psi_j|<\infty$, the series converges in $L^2(\Omega,\mathcal F,\mathbb P)$. Indeed, for $m>n$,
\begin{align*}
\mathbb{E}\left[\left|\sum_{j=n+1}^{m}\psi_j Z_{t-j}\right|^2\right]=\sigma^2\sum_{j=n+1}^{m}|\psi_j|^2,
\end{align*}
using the orthogonality of distinct white-noise innovations. Since $\ell^1(\mathbb{N}_0)\subset \ell^2(\mathbb{N}_0)$, the right-hand side tends to $0$ as $m,n\to\infty$.
The mean is
\begin{align*}
\mathbb{E}[X_t]=\sum_{j=0}^{\infty}\psi_j\mathbb{E}[Z_{t-j}]=0.
\end{align*}
For $h\in\mathbb{Z}$, define the covariance function $\gamma:\mathbb{Z}\to\mathbb{R}$ by
\begin{align*}
\gamma(h)=\operatorname{Cov}(X_{t+h},X_t).
\end{align*}
The $L^2$ convergence of the defining series and orthogonality of white noise give
\begin{align*}
\gamma(h)=\sigma^2\sum_{\substack{j, k\ge 0, t+h-j=t-k}}\psi_j\psi_k.
\end{align*}
Equivalently,
\begin{align*}
\gamma(h)=\sigma^2\sum_{j=\max\{0,h\}}^{\infty}\psi_j\psi_{j-h}
\end{align*}
for $h\ge 0$, with the analogous expression for $h<0$. This depends only on $h$, not on $t$. Hence $(X_t)_{t\in\mathbb{Z}}$ is weakly stationary and causal.
[/step]
[step:Verify the ARMA equation by multiplying the causal filter]
Extend the coefficient sequence by setting $\psi_j=0$ for $j<0$. Define polynomial coefficient sequences $(a_i)_{i\ge 0}$ and $(b_k)_{k\ge 0}$ by requiring
\begin{align*}
\phi(z)=\sum_{i=0}^{p}a_i z^i
\end{align*}
and
\begin{align*}
\theta(z)=\sum_{k=0}^{q}b_k z^k,
\end{align*}
and then setting $a_i=0$ for $i>p$ and $b_k=0$ for $k>q$. Since $\phi(z)f(z)=\theta(z)$ for $|z|<r$, the Taylor coefficients satisfy
\begin{align*}
\sum_{i=0}^{p}a_i\psi_{n-i}=b_n
\end{align*}
for every $n\ge 0$.
For each $t\in\mathbb{Z}$, the finite order of $\phi(B)$ and the absolute summability of $(\psi_j)$ justify regrouping the series:
\begin{align*}
\phi(B)X_t=\sum_{i=0}^{p}a_i X_{t-i}.
\end{align*}
Substituting the definition of $X_{t-i}$ gives
\begin{align*}
\phi(B)X_t=\sum_{i=0}^{p}a_i\sum_{j=0}^{\infty}\psi_j Z_{t-i-j}.
\end{align*}
Setting $n=i+j$ and collecting the coefficient of $Z_{t-n}$ gives
\begin{align*}
\phi(B)X_t=\sum_{n=0}^{\infty}\left(\sum_{i=0}^{p}a_i\psi_{n-i}\right)Z_{t-n}.
\end{align*}
Using the coefficient identity above,
\begin{align*}
\phi(B)X_t=\sum_{n=0}^{q}b_n Z_{t-n}=\theta(B)Z_t.
\end{align*}
Thus $(X_t)_{t\in\mathbb{Z}}$ is a causal linear weakly stationary solution of the ARMA equation.
[/step]
[step:Compare coefficients to prove uniqueness among causal linear solutions]
Let $(Y_t)_{t\in\mathbb{Z}}$ be another causal linear weakly stationary solution driven by the same white noise, with
\begin{align*}
Y_t=\sum_{j=0}^{\infty}\eta_j Z_{t-j}
\end{align*}
and
\begin{align*}
\sum_{j=0}^{\infty}|\eta_j|<\infty.
\end{align*}
Extend $\eta_j=0$ for $j<0$. Applying the same finite regrouping argument to $\phi(B)Y_t=\theta(B)Z_t$ gives
\begin{align*}
\sum_{n=0}^{\infty}\left(\sum_{i=0}^{p}a_i\eta_{n-i}-b_n\right)Z_{t-n}=0
\end{align*}
in $L^2(\Omega,\mathcal F,\mathbb P)$.
For fixed $m\ge 0$, define
\begin{align*}
c_n=\sum_{i=0}^{p}a_i\eta_{n-i}-b_n,\qquad n\ge 0.
\end{align*}
Taking covariance with $Z_{t-m}$ and using $\mathbb{E}[Z_{t-n}Z_{t-m}]=0$ for $n\ne m$ and $\mathbb{E}[Z_{t-m}^2]=\sigma^2$ gives
\begin{align*}
0=\mathbb{E}\left[\left(\sum_{n=0}^{\infty}c_n Z_{t-n}\right)Z_{t-m}\right]=c_m\sigma^2.
\end{align*}
Since $\sigma^2>0$, $c_m=0$. Therefore
\begin{align*}
\sum_{i=0}^{p}a_i\eta_{n-i}=b_n
\end{align*}
for every $n\ge 0$.
Define
\begin{align*}
g: \{z\in\mathbb{C}: |z|\le 1\} \to \mathbb{C}
\end{align*}
by the absolutely convergent [power series](/page/Power%20Series)
\begin{align*}
g(z)=\sum_{j=0}^{\infty}\eta_j z^j.
\end{align*}
The coefficient identities say that $\phi(z)g(z)=\theta(z)$ for $|z|<1$. Since $\phi(z)\ne 0$ on $|z|\le 1$, it follows that
\begin{align*}
g(z)=\frac{\theta(z)}{\phi(z)}
\end{align*}
for $|z|<1$. But $\theta/\phi$ has Taylor coefficients $(\psi_j)_{j\ge 0}$ at $0$, while $g$ has Taylor coefficients $(\eta_j)_{j\ge 0}$. Uniqueness of Taylor coefficients gives $\eta_j=\psi_j$ for every $j\ge 0$. Hence $Y_t=X_t$ in $L^2$ for every $t\in\mathbb{Z}$, proving uniqueness in the causal linear class.
[/step]