[proofplan]
We prove convergence in probability by showing that the probability of leaving an arbitrary neighbourhood of $\theta_0$ tends to zero. The deterministic separation of $M$ outside that neighbourhood gives a positive margin. Compact containment localizes $\hat\theta_n$ to a compact set with arbitrarily high probability, and [uniform convergence](/page/Uniform%20Convergence) on that compact set transfers the deterministic margin to the random criteria $M_n$. Approximate optimality then shows that the event that $\hat\theta_n$ lies in the compact set but outside the neighbourhood has probability tending to zero.
[/proofplan]
[step:Fix an open neighbourhood and extract a deterministic separation margin]Let $U\subset\Theta$ be an open neighbourhood of $\theta_0$. Define
\begin{align*}
s_U:=\sup_{\theta\in\Theta\setminus U}M(\theta)
\end{align*}
as an extended real number. By hypothesis,
\begin{align*}
s_U<M(\theta_0).
\end{align*}
Hence there exists a number $\eta_U>0$ such that
\begin{align*}
s_U\le M(\theta_0)-3\eta_U.
\end{align*}
If $s_U\in\mathbb R$, one may take $\eta_U=(M(\theta_0)-s_U)/3$, or any smaller positive number; if $s_U=-\infty$, any positive $\eta_U$ works.[/step]
[guided]We fix an open neighbourhood $U$ of the proposed [limit point](/page/Limit%20Point) $\theta_0$ because convergence in probability to $\theta_0$ will follow once we can prove that $\mathbb P(\hat\theta_n\notin U)$ becomes small for every such $U$. Since $U$ is open in the metric topology, $U\in\mathcal B(\Theta)$, so the event $\{\hat\theta_n\notin U\}$ is measurable. The separation assumption says that $M$ is uniformly lower at every point outside $U$ than at $\theta_0$.
Define the extended real number
\begin{align*}
s_U:=\sup_{\theta\in\Theta\setminus U}M(\theta).
\end{align*}
The hypothesis gives
\begin{align*}
s_U<M(\theta_0).
\end{align*}
This strict inequality is the deterministic margin that drives the proof. We choose a positive number $\eta_U>0$ satisfying
\begin{align*}
s_U\le M(\theta_0)-3\eta_U.
\end{align*}
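For a concrete feel for the margin, consider a purely illustrative case that is not part of the hypotheses: $\Theta=\mathbb R$, $M(\theta)=-\theta^2$, $\theta_0=0$, and $U=(-r,r)$ for some $r>0$. Under these assumed choices,
\begin{align*}
s_U=\sup_{|\theta|\ge r}(-\theta^2)=-r^2<0=M(\theta_0),\qquad \eta_U:=\frac{r^2}{3}\ \text{ gives }\ M(\theta_0)-3\eta_U=-r^2=s_U.
\end{align*}
Any smaller positive $\eta_U$ satisfies the required inequality as well.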
The factor $3$ is only a buffer: later two uniform approximation errors will consume $2\eta_U$, leaving one positive $\eta_U$ to contradict approximate optimality. If $s_U$ is finite, such an $\eta_U$ exists because $M(\theta_0)-s_U>0$. If $s_U=-\infty$, the displayed inequality holds for every positive $\eta_U$.[/guided]
[step:Localize the estimator to a compact set with high probability]Let $\varepsilon>0$. By compact containment in probability, choose a compact set $K_\varepsilon\subset\Theta$ such that
\begin{align*}
\liminf_{n\to\infty}\mathbb P(\hat\theta_n\in K_\varepsilon)\ge 1-\varepsilon.
\end{align*}
Define the compact set
\begin{align*}
C_\varepsilon:=K_\varepsilon\cup\{\theta_0\}.
\end{align*}
The singleton $\{\theta_0\}$ is compact in the [metric space](/page/Metric%20Space) $\Theta$, and a finite union of compact sets is compact, so $C_\varepsilon$ is compact. The compact-uniform convergence hypothesis therefore applies to $C_\varepsilon$: the maps $R_{n,\varepsilon}:\Omega\to\mathbb R$ defined by
\begin{align*}
R_{n,\varepsilon}(\omega):=\sup_{\theta\in C_\varepsilon}|M_n(\omega,\theta)-M(\theta)|
\end{align*}
are measurable real-valued random variables and satisfy
\begin{align*}
R_{n,\varepsilon}\xrightarrow{\mathbb P}0.
\end{align*}[/step]
[guided]We next use compact containment to restrict attention to a compact subset of the parameter space with probability at least $1-\varepsilon$ in the limit. Let $\varepsilon>0$. By compact containment in probability, there exists a compact set $K_\varepsilon\subset\Theta$ such that
\begin{align*}
\liminf_{n\to\infty}\mathbb P(\hat\theta_n\in K_\varepsilon)\ge 1-\varepsilon.
\end{align*}
Uniform convergence of $M_n$ to $M$ is only assumed on compact sets, so we must include both the random estimator and the deterministic comparison point $\theta_0$ in one compact set. Define
\begin{align*}
C_\varepsilon:=K_\varepsilon\cup\{\theta_0\}.
\end{align*}
The singleton $\{\theta_0\}$ is compact in the metric space $\Theta$, and a finite union of compact sets is compact. Hence $C_\varepsilon$ is compact.
For each $n\in\mathbb N$, define the [random variable](/page/Random%20Variable) $R_{n,\varepsilon}:\Omega\to\mathbb R$ by
\begin{align*}
R_{n,\varepsilon}(\omega):=\sup_{\theta\in C_\varepsilon}|M_n(\omega,\theta)-M(\theta)|.
\end{align*}
The compact-uniform convergence hypothesis applies to the compact set $C_\varepsilon$, so each $R_{n,\varepsilon}$ is measurable and real-valued, and
\begin{align*}
R_{n,\varepsilon}\xrightarrow{\mathbb P}0.
\end{align*}
This is the localization step: all later comparisons between $M_n$ and $M$ will occur on $C_\varepsilon$, where the assumed uniform convergence is available.[/guided]
[step:Show that leaving the neighbourhood inside the compact set forces a positive optimality gap]For $\omega\in\Omega$, suppose that $\hat\theta_n(\omega)\in K_\varepsilon\setminus U$ and that $R_{n,\varepsilon}(\omega)\le\eta_U$. Since $\theta_0\in C_\varepsilon$ and $\hat\theta_n(\omega)\in C_\varepsilon$, the definition of $R_{n,\varepsilon}$ gives
\begin{align*}
M_n(\omega,\theta_0)\ge M(\theta_0)-\eta_U
\end{align*}
and
\begin{align*}
M_n(\omega,\hat\theta_n(\omega))\le M(\hat\theta_n(\omega))+\eta_U.
\end{align*}
Because $\hat\theta_n(\omega)\notin U$, the definition of $s_U$ gives
\begin{align*}
M(\hat\theta_n(\omega))\le s_U\le M(\theta_0)-3\eta_U.
\end{align*}
Combining these estimates,
\begin{align*}
M_n(\omega,\theta_0)-M_n(\omega,\hat\theta_n(\omega))\ge \eta_U.
\end{align*}
Since $\sup_{\theta\in\Theta}M_n(\omega,\theta)\ge M_n(\omega,\theta_0)$, it follows that
\begin{align*}
\Delta_n(\omega)\ge \eta_U.
\end{align*}
Thus, for every $n\in\mathbb N$,
\begin{align*}
\{\hat\theta_n\in K_\varepsilon\setminus U\}\subseteq \{R_{n,\varepsilon}>\eta_U\}\cup\{\Delta_n\ge\eta_U\}.
\end{align*}[/step]
[guided]We prove the key event inclusion by fixing an outcome $\omega\in\Omega$. Suppose that $\hat\theta_n(\omega)\in K_\varepsilon\setminus U$ and $R_{n,\varepsilon}(\omega)\le\eta_U$. Since $\theta_0\in C_\varepsilon$ and $\hat\theta_n(\omega)\in K_\varepsilon\subset C_\varepsilon$, the definition of $R_{n,\varepsilon}$ gives the two uniform approximation bounds
\begin{align*}
M_n(\omega,\theta_0)\ge M(\theta_0)-\eta_U
\end{align*}
and
\begin{align*}
M_n(\omega,\hat\theta_n(\omega))\le M(\hat\theta_n(\omega))+\eta_U.
\end{align*}
The assumption $\hat\theta_n(\omega)\notin U$ allows us to use the deterministic separation margin. By the definition of $s_U$,
\begin{align*}
M(\hat\theta_n(\omega))\le s_U\le M(\theta_0)-3\eta_U.
\end{align*}
Combining the preceding three inequalities yields
\begin{align*}
M_n(\omega,\theta_0)-M_n(\omega,\hat\theta_n(\omega))\ge \eta_U.
\end{align*}
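Spelled out, the combination is the following chain, in which the first line uses the two uniform approximation bounds and the second uses the separation margin:
\begin{align*}
M_n(\omega,\theta_0)-M_n(\omega,\hat\theta_n(\omega))
&\ge \bigl(M(\theta_0)-\eta_U\bigr)-\bigl(M(\hat\theta_n(\omega))+\eta_U\bigr)\\
&\ge M(\theta_0)-\eta_U-\bigl(M(\theta_0)-3\eta_U\bigr)-\eta_U\\
&=\eta_U.
\end{align*}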
Because the supremum of $M_n(\omega,\cdot)$ over $\Theta$ is at least its value at $\theta_0$, the optimality gap satisfies
\begin{align*}
\Delta_n(\omega)\ge \eta_U.
\end{align*}
Therefore, whenever $\hat\theta_n(\omega)\in K_\varepsilon\setminus U$, either the uniform error is larger than $\eta_U$ or the optimality gap is at least $\eta_U$. Hence, for every $n\in\mathbb N$,
\begin{align*}
\{\hat\theta_n\in K_\varepsilon\setminus U\}\subseteq \{R_{n,\varepsilon}>\eta_U\}\cup\{\Delta_n\ge\eta_U\}.
\end{align*}[/guided]
[step:Use uniform convergence and approximate optimality to make the bad compact event vanish]Taking probabilities in the event inclusion from the previous step gives
\begin{align*}
\mathbb P(\hat\theta_n\in K_\varepsilon\setminus U)\le \mathbb P(R_{n,\varepsilon}>\eta_U)+\mathbb P(\Delta_n\ge\eta_U).
\end{align*}
Since $R_{n,\varepsilon}\xrightarrow{\mathbb P}0$ and $\Delta_n\xrightarrow{\mathbb P}0$, both terms on the right-hand side converge to $0$. Hence
\begin{align*}
\lim_{n\to\infty}\mathbb P(\hat\theta_n\in K_\varepsilon\setminus U)=0.
\end{align*}[/step]
[guided]The event inclusion from the previous step turns the bad compact event into two events whose probabilities are controlled by the hypotheses. Taking probabilities and using subadditivity of probability gives
\begin{align*}
\mathbb P(\hat\theta_n\in K_\varepsilon\setminus U)\le \mathbb P(R_{n,\varepsilon}>\eta_U)+\mathbb P(\Delta_n\ge\eta_U).
\end{align*}
The first term tends to $0$ because $R_{n,\varepsilon}\xrightarrow{\mathbb P}0$ and $\eta_U>0$ is fixed. The second term tends to $0$ because $\Delta_n\xrightarrow{\mathbb P}0$ and $\eta_U>0$ is fixed. Therefore
\begin{align*}
\lim_{n\to\infty}\mathbb P(\hat\theta_n\in K_\varepsilon\setminus U)=0.
\end{align*}
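One detail depends on convention. Assuming convergence in probability is defined with strict inequalities, that is, $\mathbb P(|X_n|>\delta)\to 0$ for every $\delta>0$ (an assumption about the definition in use, which this section does not restate), the non-strict event is covered by halving the threshold, while the first term needs no adjustment because $R_{n,\varepsilon}\ge 0$:
\begin{align*}
\mathbb P(\Delta_n\ge\eta_U)\le \mathbb P\bigl(|\Delta_n|>\eta_U/2\bigr)\xrightarrow[n\to\infty]{}0,\qquad
\mathbb P(R_{n,\varepsilon}>\eta_U)=\mathbb P\bigl(|R_{n,\varepsilon}|>\eta_U\bigr)\xrightarrow[n\to\infty]{}0.
\end{align*}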
This proves that, after localization to $K_\varepsilon$, the estimator cannot remain outside $U$ with non-vanishing probability.[/guided]
[step:Let the compact containment error vanish and conclude convergence in probability]For every $n\in\mathbb N$,
\begin{align*}
\{\hat\theta_n\notin U\}\subseteq \{\hat\theta_n\notin K_\varepsilon\}\cup\{\hat\theta_n\in K_\varepsilon\setminus U\}.
\end{align*}
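Taking probabilities and upper limits, and using that $\mathbb P(\hat\theta_n\in K_\varepsilon\setminus U)\to 0$ by the previous step, we obtain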
\begin{align*}
\limsup_{n\to\infty}\mathbb P(\hat\theta_n\notin U)\le \limsup_{n\to\infty}\mathbb P(\hat\theta_n\notin K_\varepsilon).
\end{align*}
By the choice of $K_\varepsilon$,
\begin{align*}
\limsup_{n\to\infty}\mathbb P(\hat\theta_n\notin K_\varepsilon)\le \varepsilon.
\end{align*}
Thus
\begin{align*}
\limsup_{n\to\infty}\mathbb P(\hat\theta_n\notin U)\le \varepsilon.
\end{align*}
Since $\varepsilon>0$ was arbitrary,
\begin{align*}
\lim_{n\to\infty}\mathbb P(\hat\theta_n\notin U)=0.
\end{align*}
For $r>0$, define the open ball
\begin{align*}
B(\theta_0,r):=\{\theta\in\Theta:d(\theta,\theta_0)<r\}.
\end{align*}
Applying the preceding conclusion to the neighbourhood $U=B(\theta_0,r)$ gives
\begin{align*}
\mathbb P(d(\hat\theta_n,\theta_0)\ge r)\le \mathbb P(\hat\theta_n\notin B(\theta_0,r))\to 0.
\end{align*}
Since $r>0$ was arbitrary, $\hat\theta_n\xrightarrow{\mathbb P}\theta_0$.[/step]
[guided]It remains to remove the compact localization error. For every $n\in\mathbb N$,
\begin{align*}
\{\hat\theta_n\notin U\}\subseteq \{\hat\theta_n\notin K_\varepsilon\}\cup\{\hat\theta_n\in K_\varepsilon\setminus U\}.
\end{align*}
Taking probabilities, then taking upper limits, and using the preceding step gives
\begin{align*}
\limsup_{n\to\infty}\mathbb P(\hat\theta_n\notin U)\le \limsup_{n\to\infty}\mathbb P(\hat\theta_n\notin K_\varepsilon).
\end{align*}
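In detail, the upper limit of a sum is at most the sum of the upper limits, and the second term vanishes by the preceding step:
\begin{align*}
\limsup_{n\to\infty}\mathbb P(\hat\theta_n\notin U)
&\le \limsup_{n\to\infty}\bigl[\mathbb P(\hat\theta_n\notin K_\varepsilon)+\mathbb P(\hat\theta_n\in K_\varepsilon\setminus U)\bigr]\\
&\le \limsup_{n\to\infty}\mathbb P(\hat\theta_n\notin K_\varepsilon)+\limsup_{n\to\infty}\mathbb P(\hat\theta_n\in K_\varepsilon\setminus U)\\
&=\limsup_{n\to\infty}\mathbb P(\hat\theta_n\notin K_\varepsilon)+0.
\end{align*}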
By the choice of $K_\varepsilon$,
\begin{align*}
\liminf_{n\to\infty}\mathbb P(\hat\theta_n\in K_\varepsilon)\ge 1-\varepsilon.
\end{align*}
Equivalently,
\begin{align*}
\limsup_{n\to\infty}\mathbb P(\hat\theta_n\notin K_\varepsilon)\le \varepsilon.
\end{align*}
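The equivalence is the complement rule combined with the identity $\limsup_n(1-a_n)=1-\liminf_n a_n$ for bounded real sequences:
\begin{align*}
\limsup_{n\to\infty}\mathbb P(\hat\theta_n\notin K_\varepsilon)
=\limsup_{n\to\infty}\bigl(1-\mathbb P(\hat\theta_n\in K_\varepsilon)\bigr)
=1-\liminf_{n\to\infty}\mathbb P(\hat\theta_n\in K_\varepsilon)
\le 1-(1-\varepsilon)=\varepsilon.
\end{align*}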
Thus
\begin{align*}
\limsup_{n\to\infty}\mathbb P(\hat\theta_n\notin U)\le \varepsilon.
\end{align*}
The left-hand side does not depend on $\varepsilon$, and probabilities are nonnegative, so letting $\varepsilon\downarrow 0$ we obtain
\begin{align*}
\lim_{n\to\infty}\mathbb P(\hat\theta_n\notin U)=0.
\end{align*}
Now let $r>0$ and define the open ball
\begin{align*}
B(\theta_0,r):=\{\theta\in\Theta:d(\theta,\theta_0)<r\}.
\end{align*}
This is a neighbourhood of $\theta_0$, so the preceding conclusion applied to $U=B(\theta_0,r)$ gives
\begin{align*}
\mathbb P(d(\hat\theta_n,\theta_0)\ge r)\le \mathbb P(\hat\theta_n\notin B(\theta_0,r))\to 0.
\end{align*}
Since $r>0$ was arbitrary, this is exactly the definition of $\hat\theta_n\xrightarrow{\mathbb P}\theta_0$ in the metric space $(\Theta,d)$.[/guided]