[proofplan]
Let $Q$ be the normalised polynomial whose zeros are precisely the poles of $f$, with their orders. Then $g=Qf$ is holomorphic in the disk. The Pade condition says that the first $n$ Taylor coefficients of $q_{m,n}f$ after degree $m$ vanish. We use these $n$ equations to control the $n$ principal-part coefficients of $q_{m,n}f$ at the poles. A finite-dimensional matrix estimate shows that those principal-part coefficients tend to zero for every bounded subsequence of denominators. Passing to limits then forces every denominator limit to contain the full pole divisor, so the limit is $Q$. A scaling argument proves the denominators are bounded, hence all denominators converge to $Q$. Finally the Pade error is a Taylor tail of the holomorphic functions $q_{m,n}Qf$, divided by $Qq_{m,n}$ on compact sets avoiding the poles.
[/proofplan]
[step:Remove the poles and record the Pade coefficient equations]
If $n=0$, then $f$ is holomorphic in $\{z\in\mathbb{C}:|z|<R\}$, $q_{m,0}=1$, and $p_{m,0}$ is the Taylor polynomial of $f$ of degree at most $m$. For every compact $K$ in the disk, Cauchy's coefficient estimate on a circle between $K$ and the boundary gives uniform convergence of the Taylor polynomials to $f$ on $K$. Hence assume $n\geq 1$.
Let $\alpha_1,\ldots,\alpha_s$ be the distinct poles of $f$, and let $\nu_j$ be the order of the pole at $\alpha_j$. Thus $\nu_1+\cdots+\nu_s=n$. Since $f$ is holomorphic at $0$, each $\alpha_j$ is nonzero. Define
\begin{align*}
Q(z)=\prod_{j=1}^{s}\left(1-\frac{z}{\alpha_j}\right)^{\nu_j}.
\end{align*}
Then $Q(0)=1$, and $g=Qf$ extends holomorphically to the whole disk.
For regular entries write $p_m=p_{m,n}$ and $q_m=q_{m,n}$. The Pade condition is
\begin{align*}
q_m(z)f(z)-p_m(z)=O(z^{m+n+1})\quad\text{as }z\to0.
\end{align*}
Because $\deg p_m\leq m$, the coefficients of $p_m$ in degrees $m+1,\ldots,m+n$ are zero. Therefore
\begin{align*}
[z^k](q_mf)=0\quad\text{for }k=m+1,\ldots,m+n.
\end{align*}
These are $n$ homogeneous equations in the coefficients of $q_m$.
[/step]
[step:Estimate the principal parts from the vanished Taylor coefficients]
For any polynomial $q$ of degree at most $n$, let $A_{j,\ell}(q)$ be the coefficient of $(z-\alpha_j)^{-\ell}$ in the principal part of $qf$ at $\alpha_j$. Thus
\begin{align*}
q(z)f(z)=h_q(z)+\sum_{j=1}^{s}\sum_{\ell=1}^{\nu_j}A_{j,\ell}(q)(z-\alpha_j)^{-\ell},
\end{align*}
where $h_q$ is holomorphic in the disk. The map $q\mapsto A_{j,\ell}(q)$ is linear for every pair $(j,\ell)$.
Fix $\rho$ with $\max_j|\alpha_j|<\rho<R$. If a family of polynomials $q$ of degree at most $n$ is coefficientwise bounded, then the functions $h_q$ are uniformly bounded on $|z|\leq\rho$. Indeed the principal-part coefficients are bounded by linearity, the principal parts are bounded on $|z|=\rho$, and $qf$ is bounded on $|z|=\rho$. Hence Cauchy's coefficient estimate gives a constant $C_\rho>0$ such that
\begin{align*}
|[z^k]h_q|\leq C_\rho\rho^{-k}
\end{align*}
for all $k\geq0$ and all such $q$.
Now let $(q_m)$ be a coefficientwise bounded sequence of denominators satisfying the displayed Pade equations. Put the numbers $A_{j,\ell}(q_m)$ into a vector $a_m$ of length $n$, and put the numbers $-[z^k]h_{q_m}$, for $k=m+1,\ldots,m+n$, into a vector $b_m$. Since
\begin{align*}
[z^k](z-\alpha_j)^{-\ell}=(-1)^\ell\binom{k+\ell-1}{\ell-1}\alpha_j^{-k-\ell},
\end{align*}
the vanished Taylor coefficients give a linear system
\begin{align*}
M_ma_m=b_m.
\end{align*}
The row $i$, corresponding to $k=m+i$, and the column $(j,r)$, corresponding to $\ell=r+1$, has entry
\begin{align*}
(-1)^{r+1}\binom{m+i+r}{r}\alpha_j^{-m-i-r-1}.
\end{align*}
We need a uniform inverse estimate for $M_m$. Put $\beta_j=\alpha_j^{-1}$. After extracting the nonzero factors $(-1)^{r+1}\alpha_j^{-r-1}$ from columns, the relevant determinant has columns
\begin{align*}
i\mapsto \binom{m+i+r}{r}\beta_j^{m+i}.
\end{align*}
Factor $\beta_j^m$ from every column in the block corresponding to $\alpha_j$. The remaining determinant is a polynomial in $m$, because each $\binom{m+i+r}{r}$ is a polynomial in $m$ of degree $r$. Its highest nonzero coefficient is obtained, in each block $j$, by taking the distinct powers $m^0,m^1,\ldots,m^{\nu_j-1}$ from the columns with $r=0,\ldots,\nu_j-1$. Up to a nonzero scalar, that coefficient is the determinant of the confluent matrix whose columns are the functions
\begin{align*}
i\mapsto i^t\beta_j^i,\qquad 0\leq t\leq\nu_j-1.
\end{align*}
This confluent determinant is nonzero. Indeed, if a linear combination of these columns vanished for $i=1,\ldots,n$, then the sequence
\begin{align*}
u_i=\sum_{j=1}^{s}\sum_{t=0}^{\nu_j-1}c_{j,t}i^t\beta_j^i
\end{align*}
would have its first $n$ terms equal to zero. Its generating function has denominator $\prod_j(1-\beta_j z)^{\nu_j}$ and numerator of degree at most $n-1$. The first $n$ Taylor coefficients of the generating function are zero, so the numerator has a zero of order at least $n$ at $0$. Since its degree is at most $n-1$, the numerator is zero. The partial fraction expansion at the distinct poles $z=\beta_j^{-1}$ then gives every $c_{j,t}=0$.
Thus the determinant of $M_m$ is nonzero for all sufficiently large $m$, and its absolute value is bounded below by a positive constant times
\begin{align*}
m^{-N_1}\prod_{j=1}^{s}|\alpha_j|^{-m\nu_j}
\end{align*}
for some integer $N_1\geq0$. The cofactors of $M_m$ are bounded above by a constant times a polynomial in $m$ multiplied by the corresponding products of $|\alpha_j|^{-m}$. Dividing cofactors by the determinant gives constants $C_0>0$ and $N_0\geq0$ such that
\begin{align*}
\|M_m^{-1}\|\leq C_0m^{N_0}\left(\max_j|\alpha_j|\right)^m
\end{align*}
for all sufficiently large $m$. Choose $\gamma$ with $\max_j|\alpha_j|<\gamma<\rho$. Absorbing the polynomial factor into the larger exponential gives
\begin{align*}
\|M_m^{-1}\|\leq C_0\gamma^m
\end{align*}
for all sufficiently large $m$, after increasing $C_0$ if necessary. Since the Cauchy estimate gives $\|b_m\|\leq C_1\rho^{-m}$, we get
\begin{align*}
\|a_m\|\leq C_0C_1\left(\frac{\gamma}{\rho}\right)^m.
\end{align*}
Because $\gamma<\rho$, each coefficient $A_{j,\ell}(q_m)$ tends to $0$.
[guided]
We now prove the same estimate with the bookkeeping made explicit. The point is that the Pade equations give exactly as many scalar equations as there are principal-part coefficients. For a bounded family of polynomials $q$, write
\begin{align*}
q(z)f(z)=h_q(z)+\sum_{j=1}^{s}\sum_{\ell=1}^{\nu_j}A_{j,\ell}(q)(z-\alpha_j)^{-\ell}.
\end{align*}
The coefficients $A_{j,\ell}(q)$ depend linearly on $q$, so they are uniformly bounded when $q$ is coefficientwise bounded. On the circle $|z|=\rho$, the functions $(z-\alpha_j)^{-\ell}$ are bounded because $|\alpha_j|<\rho$, and $qf$ is bounded because $f$ is holomorphic on that circle and $q$ ranges over a bounded finite-dimensional set. Therefore $h_q$ is uniformly bounded on $|z|=\rho$, and Cauchy's estimate gives
\begin{align*}
|[z^k]h_q|\leq C_\rho\rho^{-k}.
\end{align*}
For the sequence $q_m$, collect the numbers $A_{j,\ell}(q_m)$ into a vector $a_m$, and collect the numbers $-[z^k]h_{q_m}$, with $k=m+1,\ldots,m+n$, into a vector $b_m$. The identity
\begin{align*}
[z^k](z-\alpha_j)^{-\ell}=(-1)^\ell\binom{k+\ell-1}{\ell-1}\alpha_j^{-k-\ell}
\end{align*}
turns the equations $[z^k](q_mf)=0$ into
\begin{align*}
M_ma_m=b_m.
\end{align*}
For $k=m+i$ and $\ell=r+1$, the matrix entry is
\begin{align*}
(-1)^{r+1}\binom{m+i+r}{r}\alpha_j^{-m-i-r-1}.
\end{align*}
The repeated-pole case requires the confluent part of the matrix, not just the leading term in each column. Put $\beta_j=\alpha_j^{-1}$. After removing fixed nonzero column constants, the entries have the form
\begin{align*}
\binom{m+i+r}{r}\beta_j^{m+i}.
\end{align*}
Factor $\beta_j^m$ from the columns in block $j$. The determinant that remains is a polynomial in $m$. Its first nonzero top-degree coefficient is the confluent determinant with columns
\begin{align*}
i\mapsto i^t\beta_j^i,\qquad 0\leq t\leq\nu_j-1.
\end{align*}
This determinant is nonzero: a linear relation among those columns would make the first $n$ terms of a sequence $\sum_{j,t}c_{j,t}i^t\beta_j^i$ vanish. The generating function of that sequence has denominator $\prod_j(1-\beta_j z)^{\nu_j}$ and numerator of degree at most $n-1$. Having the first $n$ Taylor coefficients zero forces that numerator to be zero, and then uniqueness of partial fractions forces every $c_{j,t}$ to vanish. Hence the confluent determinant is nonzero.
Consequently $M_m$ is invertible for all sufficiently large $m$. The determinant lower bound and the cofactor upper bounds together imply
\begin{align*}
\|M_m^{-1}\|\leq C_0m^{N_0}\left(\max_j|\alpha_j|\right)^m
\end{align*}
for suitable constants $C_0>0$ and $N_0\geq0$. Choosing $\gamma$ with $\max_j|\alpha_j|<\gamma<\rho$, the polynomial factor is absorbed by $\gamma^m$, so
\begin{align*}
\|M_m^{-1}\|\leq C_0\gamma^m.
\end{align*}
Together with $\|b_m\|\leq C_1\rho^{-m}$, this yields
\begin{align*}
\|a_m\|\leq C_0C_1\left(\frac{\gamma}{\rho}\right)^m.
\end{align*}
The ratio $\gamma/\rho$ is less than $1$, so $a_m\to0$. Equivalently, every principal-part coefficient $A_{j,\ell}(q_m)$ tends to $0$.
[/guided]
[/step]
[step:Identify every bounded subsequential denominator limit]
Let $q_{m_k}\to q$ coefficientwise along a bounded subsequence. The linearity of $q\mapsto A_{j,\ell}(q)$ and the previous step give
\begin{align*}
A_{j,\ell}(q)=0
\end{align*}
for every $j$ and $\ell$. Thus $qf$ has no principal part at any pole $\alpha_j$.
It remains to translate this into divisibility by $Q$. Fix $j$. Since $f$ has a pole of order $\nu_j$ at $\alpha_j$, its Laurent expansion has the form
\begin{align*}
f(z)=\sum_{t=1}^{\nu_j}b_{j,t}(z-\alpha_j)^{-t}+\phi_j(z),
\end{align*}
where $\phi_j$ is holomorphic near $\alpha_j$ and $b_{j,\nu_j}\neq0$. Write $q(z)=(z-\alpha_j)^d u(z)$ near $\alpha_j$, with $u(\alpha_j)\neq0$ unless $q=0$. If $d<\nu_j$, then the product $qf$ has a nonzero term
\begin{align*}
u(\alpha_j)b_{j,\nu_j}(z-\alpha_j)^{d-\nu_j}
\end{align*}
of negative degree, and no term from $\phi_j$ can cancel it because $q\phi_j$ is holomorphic. Therefore $qf$ can have no principal part at $\alpha_j$ only if $d\geq\nu_j$. Hence $(z-\alpha_j)^{\nu_j}$ divides $q$ for every $j$, so $Q$ divides $q$.
Both $Q$ and $q$ have degree at most $n$, and each denominator satisfies $q_m(0)=1$, so every bounded subsequential limit satisfies $q(0)=1$. Since $Q(0)=1$, divisibility by $Q$ forces $q=Q$.
[/step]
[step:Prove that the denominators are bounded]
Suppose the sequence $(q_m)$ is not coefficientwise bounded. Write
\begin{align*}
q_m(z)=\sum_{a=0}^{n}c_{m,a}z^a
\end{align*}
and set $L_m=\max_{0\leq a\leq n}|c_{m,a}|$. Along an unbounded subsequence, $L_m\to\infty$. Define
\begin{align*}
\widetilde q_m=L_m^{-1}q_m.
\end{align*}
Then $(\widetilde q_m)$ is coefficientwise bounded, has degree at most $n$, and has at least one coefficient of modulus $1$. Passing to a further subsequence, $\widetilde q_m\to\widetilde q$ coefficientwise, where $\widetilde q$ is nonzero.
The equations $[z^k](q_mf)=0$ are homogeneous in $q_m$, so $\widetilde q_m$ satisfies the same equations. By the previous step, every bounded subsequential limit of $\widetilde q_m$ is divisible by $Q$, so $\widetilde q=cQ$ for some constant $c$. But
\begin{align*}
\widetilde q(0)=\lim_m L_m^{-1}q_m(0)=\lim_m L_m^{-1}=0.
\end{align*}
Since $Q(0)=1$, this gives $c=0$, contradicting that $\widetilde q$ is nonzero. Thus $(q_m)$ is coefficientwise bounded.
Now every subsequence of $(q_m)$ has a further subsequence converging coefficientwise to $Q$. In a finite-dimensional coefficient space this implies the whole sequence $q_m$ converges coefficientwise to $Q$.
[/step]
[step:Deduce compact-uniform convergence of the rational approximants]
Let $K$ be a compact subset of the disk avoiding the poles. Since $Q$ has no zero on $K$,
\begin{align*}
\delta=\min_{z\in K}|Q(z)|>0.
\end{align*}
Coefficientwise convergence implies uniform convergence on $K$, so for sufficiently large $m$,
\begin{align*}
|q_m(z)|\geq\delta/2
\end{align*}
for all $z\in K$.
Choose $\sigma$ and $\tau$ with $K\subset\{z\in\mathbb{C}:|z|\leq\sigma\}$ and $\sigma<\tau<R$. Since $g=Qf$ is holomorphic and the coefficients of $q_m$ are bounded, there is $C_\tau>0$ such that
\begin{align*}
|q_m(z)g(z)|\leq C_\tau
\end{align*}
for all $|z|\leq\tau$ and all sufficiently large $m$.
The function
\begin{align*}
H_m(z)=q_m(z)g(z)-p_m(z)Q(z)
\end{align*}
has a zero of order at least $m+n+1$ at $0$, and $p_mQ$ has degree at most $m+n$. Therefore $H_m$ is the Taylor tail of the holomorphic function $q_mg$ beginning at degree $m+n+1$. If $q_mg(z)=\sum_{k=0}^{\infty}a_{m,k}z^k$, then Cauchy's estimate gives
\begin{align*}
|a_{m,k}|\leq C_\tau\tau^{-k}.
\end{align*}
For $z\in K$ this yields
\begin{align*}
|H_m(z)|\leq\frac{C_\tau}{1-\sigma/\tau}\left(\frac{\sigma}{\tau}\right)^{m+n+1}.
\end{align*}
Finally, because $g=Qf$,
\begin{align*}
f(z)-r_{m,n}(z)=\frac{H_m(z)}{Q(z)q_m(z)}.
\end{align*}
On $K$ the denominator has modulus at least $\delta^2/2$, so
\begin{align*}
|f(z)-r_{m,n}(z)|\leq\frac{2C_\tau}{\delta^2(1-\sigma/\tau)}\left(\frac{\sigma}{\tau}\right)^{m+n+1}.
\end{align*}
Since $\sigma/\tau<1$, the right-hand side tends to zero uniformly on $K$. This proves $r_{m,n}\to f$ uniformly on every compact subset avoiding the poles.
[/step]