[proofplan]
Set $\delta_n:=\hat\theta_n-\theta_0$ and first use the approximate estimating equation together with stochastic equicontinuity to express $\Psi(\hat\theta_n)$ as $-\Psi_n(\theta_0)$ up to an $o_{\mathbb P}(n^{-1/2})$ error. This shows that $\sqrt n\,\Psi(\hat\theta_n)$ is bounded in probability. Nonsingularity of the derivative $A$ and differentiability of $\Psi$ then convert this bound into the root-$n$ tightness of $\delta_n$. With that rate available, the differentiability remainder becomes $o_{\mathbb P}(n^{-1/2})$, so the estimating equation linearizes to $\sqrt n\,\delta_n=-A^{-1}\sqrt n\,\Psi_n(\theta_0)+o_{\mathbb P}(1)$, and the conclusion follows from Slutsky's theorem and the [continuous mapping theorem](/theorems/1847).
[/proofplan]
[step:Introduce the estimation error and rewrite the estimating equation]
Define the random vector
\begin{align*}
\delta_n:\Omega\to\mathbb R^p
\end{align*}
by
\begin{align*}
\delta_n(\omega):=\hat\theta_n(\omega)-\theta_0.
\end{align*}
Since $\hat\theta_n\xrightarrow{\mathbb P}\theta_0$, we have $\delta_n\xrightarrow{\mathbb P}0$.
For each $n\in\mathbb N$, define the random vector
\begin{align*}
R_n:\Omega\to\mathbb R^p
\end{align*}
by
\begin{align*}
R_n:=(\Psi_n-\Psi)(\hat\theta_n)-(\Psi_n-\Psi)(\theta_0).
\end{align*}
The stochastic equicontinuity assumption says precisely that
\begin{align*}
\sqrt n\,|R_n|=o_{\mathbb P}(1).
\end{align*}
Using $\Psi(\theta_0)=0$, we compute
\begin{align*}
\Psi(\hat\theta_n)+\Psi_n(\theta_0)=\Psi_n(\hat\theta_n)-R_n.
\end{align*}
Multiplying by $\sqrt n$ gives
\begin{align*}
\sqrt n\,\left|\Psi(\hat\theta_n)+\Psi_n(\theta_0)\right|\le \sqrt n\,|\Psi_n(\hat\theta_n)|+\sqrt n\,|R_n|.
\end{align*}
The residual assumption gives $\sqrt n\,|\Psi_n(\hat\theta_n)|=o_{\mathbb P}(1)$, and the stochastic equicontinuity assumption gives $\sqrt n\,|R_n|=o_{\mathbb P}(1)$. Hence
\begin{align*}
\Psi(\hat\theta_n)=-\Psi_n(\theta_0)+o_{\mathbb P}(n^{-1/2}).
\end{align*}
[/step]
[step:Deduce that $\Psi(\hat\theta_n)$ is of order $n^{-1/2}$]
Because
\begin{align*}
\sqrt n\,\Psi_n(\theta_0)\xrightarrow{d}Z,
\end{align*}
the sequence $(\sqrt n\,\Psi_n(\theta_0))_{n\ge 1}$ is tight in $\mathbb R^p$. In the notation used below, $Y_n=O_{\mathbb P}(a_n)$ for positive deterministic numbers $a_n$ means that $Y_n/a_n$ is bounded in probability. Therefore
\begin{align*}
\Psi_n(\theta_0)=O_{\mathbb P}(n^{-1/2}).
\end{align*}
Combining this with the conclusion of the previous step yields
\begin{align*}
\Psi(\hat\theta_n)=O_{\mathbb P}(n^{-1/2}).
\end{align*}
[/step]
[step:Use nonsingularity of $A$ to obtain the root-$n$ rate]
Since $\Psi$ is differentiable at $\theta_0$ with [Jacobian matrix](/page/Jacobian%20Matrix) $A$, there exists a deterministic remainder map
\begin{align*}
r:U\to\mathbb R^p
\end{align*}
defined on a neighbourhood $U\subset\mathbb R^p$ of $0$ such that $\theta_0+h\in\Theta$ for every $h\in U$ and
\begin{align*}
\Psi(\theta_0+h)=Ah+r(h)
\end{align*}
with
\begin{align*}
\lim_{h\to 0}\frac{|r(h)|}{|h|}=0.
\end{align*}
Because $A$ is nonsingular, $A^{-1}$ exists. Define
\begin{align*}
c:=\frac{1}{2\|A^{-1}\|_{\mathrm{op}}}>0.
\end{align*}
For $\rho>0$, let $B(0,\rho):=\{h\in\mathbb R^p:|h|<\rho\}$ denote the open Euclidean ball in $\mathbb R^p$ with centre $0$ and radius $\rho$. Choose $\rho>0$ such that $B(0,\rho)\subset U$ and
\begin{align*}
|r(h)|\le c|h|
\end{align*}
for every $h\in B(0,\rho)$. For such $h$, the lower bound
\begin{align*}
|Ah|\ge \frac{1}{\|A^{-1}\|_{\mathrm{op}}}|h|
\end{align*}
gives
\begin{align*}
|\Psi(\theta_0+h)|\ge |Ah|-|r(h)|\ge c|h|.
\end{align*}
On the event $\{|\delta_n|<\rho\}$, this gives
\begin{align*}
|\delta_n|\le c^{-1}|\Psi(\hat\theta_n)|.
\end{align*}
Since $\delta_n\xrightarrow{\mathbb P}0$, the events $\{|\delta_n|<\rho\}$ have probability tending to $1$. Since $\Psi(\hat\theta_n)=O_{\mathbb P}(n^{-1/2})$, it follows that
\begin{align*}
\sqrt n\,\delta_n=O_{\mathbb P}(1).
\end{align*}
[guided]
The subtle point is that the theorem does not assume the rate $\sqrt n(\hat\theta_n-\theta_0)=O_{\mathbb P}(1)$; we must derive it. We start from differentiability. There is a neighbourhood $U\subset\mathbb R^p$ of $0$ and a deterministic map
\begin{align*}
r:U\to\mathbb R^p
\end{align*}
such that $\theta_0+h\in\Theta$ for $h\in U$ and
\begin{align*}
\Psi(\theta_0+h)=Ah+r(h),
\end{align*}
where the differentiability remainder satisfies
\begin{align*}
\lim_{h\to 0}\frac{|r(h)|}{|h|}=0.
\end{align*}
The nonsingularity of $A$ supplies a local inverse lower bound. Since $A^{-1}$ exists, for every $h\in\mathbb R^p$ we have $h=A^{-1}Ah$, so the [operator norm](/page/Operator%20Norm) inequality gives
\begin{align*}
|h|\le \|A^{-1}\|_{\mathrm{op}}|Ah|.
\end{align*}
Equivalently,
\begin{align*}
|Ah|\ge \frac{1}{\|A^{-1}\|_{\mathrm{op}}}|h|.
\end{align*}
Define
\begin{align*}
c:=\frac{1}{2\|A^{-1}\|_{\mathrm{op}}}>0.
\end{align*}
For $\rho>0$, let $B(0,\rho):=\{h\in\mathbb R^p:|h|<\rho\}$ denote the open Euclidean ball in $\mathbb R^p$ with centre $0$ and radius $\rho$. Because $|r(h)|/|h|\to 0$, we may choose $\rho>0$ such that $B(0,\rho)\subset U$ and
\begin{align*}
|r(h)|\le c|h|
\end{align*}
for every $h\in B(0,\rho)$. Therefore, whenever $h\in B(0,\rho)$,
\begin{align*}
|\Psi(\theta_0+h)|=|Ah+r(h)|.
\end{align*}
The [reverse triangle inequality](/theorems/2300) gives
\begin{align*}
|\Psi(\theta_0+h)|\ge |Ah|-|r(h)|.
\end{align*}
Using the lower bound for $|Ah|$ and the chosen bound on $r(h)$, we obtain
\begin{align*}
|\Psi(\theta_0+h)|\ge \frac{1}{\|A^{-1}\|_{\mathrm{op}}}|h|-c|h|=c|h|.
\end{align*}
Now put $h=\delta_n=\hat\theta_n-\theta_0$. Since $\delta_n\xrightarrow{\mathbb P}0$, the event $\{|\delta_n|<\rho\}$ has probability tending to $1$. On that event,
\begin{align*}
|\delta_n|\le c^{-1}|\Psi(\hat\theta_n)|.
\end{align*}
The previous step proved
\begin{align*}
\Psi(\hat\theta_n)=O_{\mathbb P}(n^{-1/2}).
\end{align*}
Multiplying the displayed inequality by $\sqrt n$ on the high-probability event $\{|\delta_n|<\rho\}$ gives
\begin{align*}
\sqrt n\,|\delta_n|\le c^{-1}\sqrt n\,|\Psi(\hat\theta_n)|.
\end{align*}
The right-hand side is $O_{\mathbb P}(1)$, and the exceptional event has probability tending to $0$. Hence
\begin{align*}
\sqrt n\,\delta_n=O_{\mathbb P}(1).
\end{align*}
This is the rate needed before we are allowed to multiply the differentiability remainder by $\sqrt n$.
[/guided]
[/step]
[step:Linearize $\Psi(\hat\theta_n)$ at $\theta_0$ with a negligible remainder]
Using the same remainder map $r:U\to\mathbb R^p$ as above, define an extended remainder map $\tilde r:\mathbb R^p\to\mathbb R^p$ by $\tilde r(h):=r(h)$ for $h\in U$ and $\tilde r(h):=0$ for $h\notin U$. On the event $\{\delta_n\in U\}$ we have
\begin{align*}
\Psi(\hat\theta_n)=A\delta_n+\tilde r(\delta_n).
\end{align*}
Since $U$ is a neighbourhood of $0$ and $\delta_n\xrightarrow{\mathbb P}0$, the event $\{\delta_n\in U\}$ has probability tending to $1$. Since $\sqrt n\,\delta_n=O_{\mathbb P}(1)$, the differentiability condition implies
\begin{align*}
\sqrt n\,\tilde r(\delta_n)=o_{\mathbb P}(1).
\end{align*}
Indeed, define $q:\mathbb R^p\to[0,\infty)$ by $q(0):=0$,
\begin{align*}
q(h):=\frac{|r(h)|}{|h|}
\end{align*}
for $h\in U\setminus\{0\}$, and $q(h):=0$ for $h\notin U$. Then $q$ is continuous at $0$ in the sequential sense needed here, because $q(h)\to 0$ as $h\to 0$ with $h\in U$ and every sequence converging to $0$ is eventually in $U$. Hence $q(\delta_n)\xrightarrow{\mathbb P}0$, while $\sqrt n\,|\delta_n|=O_{\mathbb P}(1)$. Therefore
\begin{align*}
\sqrt n\,|\tilde r(\delta_n)|=q(\delta_n)\sqrt n\,|\delta_n|=o_{\mathbb P}(1).
\end{align*}
Therefore, since $\{\delta_n\in U\}$ has probability tending to $1$,
\begin{align*}
\sqrt n\,\Psi(\hat\theta_n)=A\sqrt n\,\delta_n+o_{\mathbb P}(1).
\end{align*}
[/step]
[step:Rearrange the linearized equation and pass to the limit]
From the first step,
\begin{align*}
\sqrt n\,\Psi(\hat\theta_n)=-\sqrt n\,\Psi_n(\theta_0)+o_{\mathbb P}(1).
\end{align*}
From the linearization step,
\begin{align*}
\sqrt n\,\Psi(\hat\theta_n)=A\sqrt n\,\delta_n+o_{\mathbb P}(1).
\end{align*}
Combining these two identities gives
\begin{align*}
A\sqrt n\,\delta_n=-\sqrt n\,\Psi_n(\theta_0)+o_{\mathbb P}(1).
\end{align*}
Multiplying by the fixed matrix $A^{-1}$ yields
\begin{align*}
\sqrt n\,\delta_n=-A^{-1}\sqrt n\,\Psi_n(\theta_0)+o_{\mathbb P}(1).
\end{align*}
By the continuous mapping theorem for the continuous [linear map](/page/Linear%20Map) $x\mapsto -A^{-1}x$ from $\mathbb R^p$ to $\mathbb R^p$,
\begin{align*}
-A^{-1}\sqrt n\,\Psi_n(\theta_0)\xrightarrow{d}-A^{-1}Z.
\end{align*}
By Slutsky's theorem applied to the preceding convergence and the $o_{\mathbb P}(1)$ remainder, we conclude that
\begin{align*}
\sqrt n(\hat\theta_n-\theta_0)=\sqrt n\,\delta_n\xrightarrow{d}-A^{-1}Z.
\end{align*}
This is the desired asymptotic normality statement.
[/step]