[proofplan]
The posterior probability of class $k$ at an observation $x$ is proportional to the prior probability $\pi_k$ multiplied by the class-conditional Gaussian density at $x$, with a proportionality factor that does not depend on $k$. Taking logarithms preserves the maximizers because the logarithm is strictly increasing. The Gaussian log-density consists of a class-independent constant, a determinant term, and a Mahalanobis quadratic term; adding $\log \pi_k$ and removing the class-independent constant leaves exactly $q_k(x)$.
[/proofplan]
[step:Write the posterior probabilities in terms of the class-conditional densities]
Let $\mathcal{L}^p$ denote Lebesgue measure on $\mathbb{R}^p$.
For each $k \in \{1,\dots,g\}$, define the class-conditional density
\begin{align*}
f_k : \mathbb{R}^p &\to (0,\infty) \\
x &\mapsto (2\pi)^{-p/2}(\det \Sigma_k)^{-1/2}
\exp\left(-\frac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1}(x-\mu_k)\right).
\end{align*}
This formula is valid because $\Sigma_k$ is symmetric positive definite, so $\det \Sigma_k > 0$ and $\Sigma_k^{-1}$ exists.
Define the marginal density of $X$ by
\begin{align*}
m : \mathbb{R}^p &\to (0,\infty) \\
x &\mapsto \sum_{j=1}^g \pi_j f_j(x).
\end{align*}
Since $\pi_j > 0$ and $f_j(x) > 0$ for every $j$ and every $x \in \mathbb{R}^p$, we have $m(x) > 0$ for every $x \in \mathbb{R}^p$. Since each $f_j$ is a probability density with respect to $\mathcal{L}^p$ and $\sum_{j=1}^g \pi_j = 1$, the mixture density satisfies
\begin{align*}
\int_{\mathbb{R}^p} m(x)\,d\mathcal{L}^p(x)
&= \sum_{j=1}^g \pi_j \int_{\mathbb{R}^p} f_j(x)\,d\mathcal{L}^p(x)
= \sum_{j=1}^g \pi_j
= 1.
\end{align*}
Thus $m$ is a probability density, and by the law of total probability it is the density of the marginal law of $X$.
For each $k$, choose the regular conditional probability version
\begin{align*}
\eta_k : \mathbb{R}^p &\to [0,1] \\
x &\mapsto \mathbb{P}(Y=k \mid X=x)
= \frac{\pi_k f_k(x)}{m(x)}.
\end{align*}
This is a posterior version defined for every $x \in \mathbb{R}^p$; changing it on an $X$-null set would yield another version of the same regular conditional distribution.
[guided]
Let $\mathcal{L}^p$ denote Lebesgue measure on $\mathbb{R}^p$. This is the measure with respect to which the Gaussian densities below are integrated.
For each class $k$, the conditional distribution of $X$ given $Y=k$ is multivariate normal with mean $\mu_k$ and covariance matrix $\Sigma_k$. Because $\Sigma_k$ is symmetric positive definite, the determinant is positive and the inverse exists, so the Gaussian density is the well-defined map
\begin{align*}
f_k : \mathbb{R}^p &\to (0,\infty) \\
x &\mapsto (2\pi)^{-p/2}(\det \Sigma_k)^{-1/2}
\exp\left(-\frac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1}(x-\mu_k)\right).
\end{align*}
The unconditional density of $X$ is obtained by mixing the class-conditional densities with weights given by the priors. Thus we define
\begin{align*}
m : \mathbb{R}^p &\to (0,\infty) \\
x &\mapsto \sum_{j=1}^g \pi_j f_j(x).
\end{align*}
Each summand $\pi_j f_j(x)$ is positive because $\pi_j > 0$ and $f_j(x) > 0$. Therefore $m(x) > 0$ for every $x \in \mathbb{R}^p$, so division by $m(x)$ is legitimate. The prior normalization is used here: since each $f_j$ integrates to $1$ with respect to $\mathcal{L}^p$ and $\sum_{j=1}^g \pi_j = 1$,
\begin{align*}
\int_{\mathbb{R}^p} m(x)\,d\mathcal{L}^p(x)
&= \sum_{j=1}^g \pi_j \int_{\mathbb{R}^p} f_j(x)\,d\mathcal{L}^p(x)
= 1.
\end{align*}
Thus $m$ is the marginal density of $X$.
There is a technical point in the notation $\mathbb{P}(Y=k \mid X=x)$: when $X$ has a density, the event $\{X=x\}$ has probability zero, so the elementary definition of conditional probability does not apply. The expression therefore denotes a chosen version of the regular conditional probability. [Bayes' formula](/theorems/1114) for densities gives the posterior version
\begin{align*}
\eta_k : \mathbb{R}^p &\to [0,1] \\
x &\mapsto \mathbb{P}(Y=k \mid X=x)
= \frac{\pi_k f_k(x)}{m(x)}.
\end{align*}
Every value $\eta_k(x)$ lies in $(0,1]$, and the value $1$ occurs only in the edge case $g=1$, where the unique posterior probability is $1$; the codomain $[0,1]$ must therefore include the endpoint $1$. The denominator $m(x)$ is the same for every class, so the only class-dependent quantity in the posterior comparison is $\pi_k f_k(x)$.
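The definitions of $f_k$, $m$, and $\eta_k$ can be checked numerically. The following is a minimal sketch, assuming a one-dimensional model ($p=1$, so $\Sigma_k$ is a scalar variance) with two classes; the parameter values and the function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical one-dimensional example (p = 1): Sigma_k is the scalar variance.
priors = [0.6, 0.4]        # pi_k, positive and summing to 1
means = [0.0, 2.0]         # mu_k
variances = [1.0, 0.25]    # Sigma_k > 0

def gaussian_density(x, mean, var):
    """Density f_k(x) of N(mean, var) with respect to Lebesgue measure on R."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def marginal_density(x):
    """Mixture density m(x) = sum_j pi_j f_j(x)."""
    return sum(p * gaussian_density(x, mu, v)
               for p, mu, v in zip(priors, means, variances))

def posteriors(x):
    """Posterior version eta_k(x) = pi_k f_k(x) / m(x), one entry per class."""
    m = marginal_density(x)
    return [p * gaussian_density(x, mu, v) / m
            for p, mu, v in zip(priors, means, variances)]

# The posteriors at a fixed observation are positive and sum to 1.
eta = posteriors(1.5)
print(eta, sum(eta))

# Crude Riemann sum of m over [-10, 10]: the result should be close to 1.
step = 0.001
total = sum(marginal_density(-10.0 + i * step) for i in range(20001)) * step
print(total)
```

In floating-point arithmetic the densities underflow to zero for observations far from every mean, so this direct formula is only a sanity check near the bulk of the distribution; it is not a robust way to evaluate posteriors.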
[/guided]
[/step]
[step:Reduce posterior maximization to log-density maximization]
Fix $x \in \mathbb{R}^p$. Since $m(x) > 0$ and does not depend on $k$,
\begin{align*}
\operatorname*{arg\,max}_{1 \le k \le g} \eta_k(x)
=
\operatorname*{arg\,max}_{1 \le k \le g} \pi_k f_k(x).
\end{align*}
Since $\pi_k f_k(x) > 0$ for every $k$ and the logarithm is strictly increasing on $(0,\infty)$,
\begin{align*}
\operatorname*{arg\,max}_{1 \le k \le g} \pi_k f_k(x)
=
\operatorname*{arg\,max}_{1 \le k \le g} \log(\pi_k f_k(x)).
\end{align*}
Expanding the logarithm gives
\begin{align*}
\log(\pi_k f_k(x))
&= \log \pi_k - \frac{p}{2}\log(2\pi)
- \frac{1}{2}\log \det \Sigma_k
- \frac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1}(x-\mu_k).
\end{align*}
[guided]
Fix an observation $x \in \mathbb{R}^p$. The posterior probability of class $k$ is
\begin{align*}
\eta_k(x) = \frac{\pi_k f_k(x)}{m(x)}.
\end{align*}
The denominator $m(x)$ is positive and is the same for all classes $k$. Therefore multiplying every posterior probability by the same positive number $m(x)$ does not change which classes attain the maximum:
\begin{align*}
\operatorname*{arg\,max}_{1 \le k \le g} \eta_k(x)
=
\operatorname*{arg\,max}_{1 \le k \le g} \pi_k f_k(x).
\end{align*}
Next, every number $\pi_k f_k(x)$ is positive. The logarithm is strictly increasing on $(0,\infty)$, so applying $\log$ to each positive score also preserves the set of maximizers:
\begin{align*}
\operatorname*{arg\,max}_{1 \le k \le g} \pi_k f_k(x)
=
\operatorname*{arg\,max}_{1 \le k \le g} \log(\pi_k f_k(x)).
\end{align*}
Now substitute the explicit Gaussian density:
\begin{align*}
\log(\pi_k f_k(x))
&= \log \pi_k
+ \log\left((2\pi)^{-p/2}\right)
+ \log\left((\det \Sigma_k)^{-1/2}\right)
+ \log\left(
\exp\left(-\frac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1}(x-\mu_k)\right)
\right) \\
&= \log \pi_k - \frac{p}{2}\log(2\pi)
- \frac{1}{2}\log \det \Sigma_k
- \frac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1}(x-\mu_k).
\end{align*}
This separates the log posterior numerator into the prior contribution, the class-independent normalizing constant, the determinant contribution, and the quadratic Mahalanobis contribution, in the order displayed.
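The four-term expansion can be compared against a direct evaluation of $\log(\pi_k f_k(x))$. The sketch below assumes $p=2$ and uses the closed-form determinant and inverse of a $2\times 2$ matrix; the helper names and the numerical values are hypothetical.

```python
import math

P = 2  # dimension assumed throughout this sketch

def det2(S):
    """Determinant of a 2x2 matrix given as nested lists."""
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]

def mahalanobis_sq(x, mu, S):
    """(x - mu)^T S^{-1} (x - mu) for a 2x2 symmetric positive definite S."""
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    # Inverse of a 2x2 matrix is its adjugate divided by its determinant.
    return (S[1][1] * d0 * d0
            - (S[0][1] + S[1][0]) * d0 * d1
            + S[0][0] * d1 * d1) / det2(S)

def log_numerator(x, prior, mu, S):
    """log(pi_k f_k(x)) assembled from the four terms of the expansion."""
    return (math.log(prior)
            - 0.5 * P * math.log(2.0 * math.pi)
            - 0.5 * math.log(det2(S))
            - 0.5 * mahalanobis_sq(x, mu, S))

def direct_numerator(x, prior, mu, S):
    """pi_k f_k(x) computed directly from the Gaussian density formula."""
    norm = (2.0 * math.pi) ** (-P / 2) * det2(S) ** (-0.5)
    return prior * norm * math.exp(-0.5 * mahalanobis_sq(x, mu, S))

# Illustrative values only; S is symmetric with positive determinant and diagonal.
x, prior, mu = [1.0, 0.5], 0.3, [0.0, 1.0]
S = [[2.0, 0.3], [0.3, 1.0]]
print(log_numerator(x, prior, mu, S), math.log(direct_numerator(x, prior, mu, S)))
```

The two printed numbers should agree up to rounding error, which is the content of the expansion at a single point.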
[/guided]
[/step]
[step:Remove the class-independent constant and identify a pointwise Bayes rule]
The term $-\frac{p}{2}\log(2\pi)$ is independent of $k$. Therefore subtracting it from every class score does not change the set of maximizers. Hence
\begin{align*}
\operatorname*{arg\,max}_{1 \le k \le g} \log(\pi_k f_k(x))
&=
\operatorname*{arg\,max}_{1 \le k \le g}
\left[
\log \pi_k
- \frac{1}{2}\log \det \Sigma_k
- \frac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1}(x-\mu_k)
\right] \\
&=
\operatorname*{arg\,max}_{1 \le k \le g} q_k(x).
\end{align*}
For $0$-$1$ loss, the conditional risk of assigning class $a \in \{1,\dots,g\}$ at the fixed observation $x$ is
\begin{align*}
R_x(a) := \mathbb{P}(Y \neq a \mid X=x) = 1 - \eta_a(x).
\end{align*}
Thus a pointwise conditional-risk minimizer is exactly a maximizer of $\eta_a(x)$, and the preceding identities show that one may choose such a minimizer in
\begin{align*}
\operatorname*{arg\,max}_{1 \le k \le g} q_k(x).
\end{align*}
Define the smallest-index tie-breaking classifier
\begin{align*}
\delta : \mathbb{R}^p &\to \{1,\dots,g\} \\
x &\mapsto \min \operatorname*{arg\,max}_{1 \le k \le g} q_k(x).
\end{align*}
This map is measurable: for each $a \in \{1,\dots,g\}$, the set $\{x \in \mathbb{R}^p : \delta(x)=a\}$ is the finite intersection of the sets $\{x : q_a(x) \ge q_j(x)\}$ over $j \in \{1,\dots,g\}$ and the sets $\{x : q_i(x) < q_a(x)\}$ over $i < a$, and these sets are Borel because each $q_k$ is continuous. Thus $\delta$ is a measurable version of the [Bayes classifier](/theorems/1941). A global Bayes classifier is determined only up to changes on $X$-null sets, so the conclusion is the existence of this measurable pointwise maximizing version, not that every representative must agree at every individual $x$.
[guided]
The logarithmic score contains one term that does not depend on the class:
\begin{align*}
-\frac{p}{2}\log(2\pi).
\end{align*}
If the same real number is added to every candidate score, the set of maximizers is unchanged. Removing this class-independent term gives
\begin{align*}
\operatorname*{arg\,max}_{1 \le k \le g} \log(\pi_k f_k(x))
&=
\operatorname*{arg\,max}_{1 \le k \le g}
\left[
\log \pi_k
- \frac{1}{2}\log \det \Sigma_k
- \frac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1}(x-\mu_k)
\right] \\
&=
\operatorname*{arg\,max}_{1 \le k \le g} q_k(x).
\end{align*}
Now connect the score maximization to Bayes risk. For $0$-$1$ loss, if we decide class $a \in \{1,\dots,g\}$ after observing $x$, the conditional probability of making an error is
\begin{align*}
R_x(a) := \mathbb{P}(Y \neq a \mid X=x) = 1 - \mathbb{P}(Y=a \mid X=x) = 1 - \eta_a(x).
\end{align*}
Therefore minimizing $R_x(a)$ over $a$ is the same as maximizing $\eta_a(x)$ over $a$. The earlier reductions showed that the maximizers of $\eta_a(x)$ are precisely the maximizers of $q_a(x)$, so a pointwise Bayes decision at $x$ may be chosen from
\begin{align*}
\operatorname*{arg\,max}_{1 \le k \le g} q_k(x).
\end{align*}
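The equivalence between minimizing $R_x(a)$ and maximizing $q_a(x)$ can be illustrated numerically. The sketch below assumes a hypothetical one-dimensional model with three classes. It uses the fact that the common factor $(2\pi)^{-p/2}$ cancels in $\pi_a f_a(x)/m(x)$, so that $\eta_a(x) = e^{q_a(x)} / \sum_j e^{q_j(x)}$; the function names are not from the source.

```python
import math

# Hypothetical one-dimensional model (p = 1); the numbers are illustrative only.
priors = [0.5, 0.3, 0.2]
means = [-1.0, 0.5, 3.0]
variances = [1.0, 0.5, 2.0]
classes = range(len(priors))

def q(k, x):
    """Score q_k(x) = log pi_k - (1/2) log var_k - (x - mu_k)^2 / (2 var_k)."""
    return (math.log(priors[k])
            - 0.5 * math.log(variances[k])
            - 0.5 * (x - means[k]) ** 2 / variances[k])

def conditional_risk(a, x):
    """R_x(a) = 1 - eta_a(x), with eta_a(x) = exp(q_a(x)) / sum_j exp(q_j(x))."""
    weights = [math.exp(q(k, x)) for k in classes]
    return 1.0 - weights[a] / sum(weights)

# At each observation, the risk minimizer and the score maximizer coincide.
for x in [-2.0, 0.0, 1.0, 4.0]:
    best_by_risk = min(classes, key=lambda a: conditional_risk(a, x))
    best_by_score = max(classes, key=lambda k: q(k, x))
    print(x, best_by_risk, best_by_score)
```

Class indices here start at $0$ rather than $1$, which is a convention of the code only.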
To turn the pointwise maximizing statement into a classifier, we must choose ties measurably. Define
\begin{align*}
\delta : \mathbb{R}^p &\to \{1,\dots,g\} \\
x &\mapsto \min \operatorname*{arg\,max}_{1 \le k \le g} q_k(x).
\end{align*}
This is a measurable rule. Indeed, for a fixed class $a$, the set where $\delta(x)=a$ is described by requiring $q_a(x) \ge q_j(x)$ for every $j$ and requiring $q_i(x) < q_a(x)$ for every $i<a$. These are finitely many Borel conditions because all the score functions $q_k$ are continuous functions of $x$. Therefore $\delta$ is a measurable classifier that chooses a maximizing class at every $x \in \mathbb{R}^p$.
This pointwise statement should still be read with the regular conditional probability convention from the first step. Since posterior probabilities are only determined up to $X$-null sets, a global Bayes classifier is also only determined up to such null-set modifications. Thus the theorem proves that there exists a measurable Bayes classifier version that uses the displayed maximizing rule for every $x \in \mathbb{R}^p$.
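The smallest-index tie-breaking rule $\delta$ can be sketched in a few lines. The example assumes $p=1$ and two hypothetical classes that are mirror images about $0$, so that the two scores tie exactly at $x=0$; the names `quadratic_score` and `delta` are introduced here for illustration.

```python
import math

def quadratic_score(x, prior, mean, var):
    """One-dimensional q_k(x); the scalar var plays the role of Sigma_k."""
    return math.log(prior) - 0.5 * math.log(var) - 0.5 * (x - mean) ** 2 / var

def delta(x, priors, means, variances):
    """Smallest-index maximizer of q_k(x), returned as a label in 1..g."""
    scores = [quadratic_score(x, p, m, v)
              for p, m, v in zip(priors, means, variances)]
    best = max(scores)
    # list.index returns the first position attaining the maximum,
    # which is the minimum of the arg max set.
    return scores.index(best) + 1

# Mirror-image classes: equal priors and variances, means -1 and +1.
priors, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
print(delta(0.0, priors, means, variances))    # exact tie, smallest index is chosen
print(delta(0.7, priors, means, variances))    # closer to the second mean
print(delta(-0.7, priors, means, variances))   # closer to the first mean
```

Exact ties are fragile in floating-point arithmetic, so this mirrors the mathematical rule only when the scores are computed without rounding differences, as in the symmetric example above.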
[/guided]
[/step]