[proofplan]
The Bayes rule under zero-one loss compares the posterior scores $\pi_k f_k(x)$, where $f_k$ is the conditional Gaussian density of $X$ given $Y=k$; strict preference for class $1$ is the strict inequality $\pi_1 f_1(x)>\pi_2 f_2(x)$, while equality is a tie to be resolved by a separate tie-breaking convention. Because both classes have the same covariance matrix, the normalising constants and determinant factors in $f_1$ and $f_2$ cancel after taking logarithms. Expanding the two quadratic forms cancels the common $x^\top\Sigma^{-1}x$ term and leaves exactly the stated linear inequality in $x$. When the priors are equal, the prior-odds term vanishes and the equality set is the locus of equal Mahalanobis squared distance from the two means; this is a hyperplane if the means are distinct and all of $\mathbb R^p$ if the means coincide.
[/proofplan]
[step:Write the Bayes comparison using the two Gaussian densities]
Since $\Sigma$ is symmetric positive definite, the inverse matrix $A:=\Sigma^{-1}\in\mathbb R^{p\times p}$ exists and is symmetric positive definite. For each $k\in\{1,2\}$, define the conditional density
\begin{align*}
f_k:\mathbb R^p&\to(0,\infty)\\
x&\mapsto (2\pi)^{-p/2}(\det\Sigma)^{-1/2}
\exp\left(-\frac{1}{2}(x-\mu_k)^\top A(x-\mu_k)\right).
\end{align*}
Under zero-one loss, class $1$ has strictly smaller conditional risk than class $2$ at $x$ exactly when its posterior score is strictly larger, namely when
\begin{align*}
\pi_1 f_1(x)>\pi_2 f_2(x).
\end{align*}
On the equality set $\pi_1 f_1(x)=\pi_2 f_2(x)$, the two actions have the same conditional risk, so a single-valued [Bayes classifier](/theorems/1941) may break the tie arbitrarily.
Since $\pi_1,\pi_2>0$ and $f_1(x),f_2(x)>0$, taking logarithms preserves the strict inequality. Thus the comparison is equivalent to
\begin{align*}
\log\pi_1+\log f_1(x)>\log\pi_2+\log f_2(x).
\end{align*}
[guided]
The Bayes classifier under zero-one loss chooses the class with the larger posterior probability. For two classes with prior probabilities $\pi_k$ and conditional densities $f_k$, this posterior comparison is equivalent to comparing the unnormalised posterior scores $\pi_k f_k(x)$, because the common marginal density of $X$ at $x$ is positive and cancels from both sides.
Since $\Sigma$ is symmetric positive definite, its inverse
\begin{align*}
A:=\Sigma^{-1}
\end{align*}
exists and is also symmetric positive definite. For each class $k\in\{1,2\}$, the conditional Gaussian density is the map
\begin{align*}
f_k:\mathbb R^p&\to(0,\infty)\\
x&\mapsto (2\pi)^{-p/2}(\det\Sigma)^{-1/2}
\exp\left(-\frac{1}{2}(x-\mu_k)^\top A(x-\mu_k)\right).
\end{align*}
Therefore class $1$ is strictly preferred to class $2$ exactly when
\begin{align*}
\pi_1 f_1(x)>\pi_2 f_2(x).
\end{align*}
If equality holds instead, the two classes have equal posterior score and hence equal conditional risk under zero-one loss; a Bayes classifier is then not uniquely determined by the risk comparison and may use any tie-breaking convention. All quantities in the strict inequality are positive: $\pi_1,\pi_2>0$ by hypothesis, and Gaussian densities are strictly positive on $\mathbb R^p$. Since the logarithm is strictly increasing on $(0,\infty)$, the same comparison is equivalent to
\begin{align*}
\log\pi_1+\log f_1(x)>\log\pi_2+\log f_2(x).
\end{align*}
[/guided]
[/step]
[step:Cancel the common Gaussian normalising terms]
For each $k\in\{1,2\}$, the logarithm of $f_k(x)$ is
\begin{align*}
\log f_k(x)
=
-\frac{p}{2}\log(2\pi)-\frac{1}{2}\log\det\Sigma
-\frac{1}{2}(x-\mu_k)^\top A(x-\mu_k).
\end{align*}
Substituting this expression into the logarithmic Bayes comparison, the terms
\begin{align*}
-\frac{p}{2}\log(2\pi)-\frac{1}{2}\log\det\Sigma
\end{align*}
appear on both sides and cancel. Hence class $1$ is strictly preferred exactly when
\begin{align*}
\log\pi_1-\frac{1}{2}(x-\mu_1)^\top A(x-\mu_1)
>
\log\pi_2-\frac{1}{2}(x-\mu_2)^\top A(x-\mu_2).
\end{align*}
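As an illustration only (the following numbers are chosen here and are not part of the statement), take $p=2$, $\mu_1=(1,0)^\top$, $\mu_2=(-1,0)^\top$, $\pi_1=2/3$, $\pi_2=1/3$, and
\begin{align*}
\Sigma=\begin{pmatrix}2&1\\1&1\end{pmatrix},
\qquad
A=\Sigma^{-1}=\begin{pmatrix}1&-1\\-1&2\end{pmatrix}.
\end{align*}
At the point $x=\mu_1$ the two quadratic forms are $0$ and $(2,0)A(2,0)^\top=4$, so the reduced comparison reads
\begin{align*}
\log\tfrac{2}{3}-0>\log\tfrac{1}{3}-2,
\end{align*}
which holds, so class $1$ is strictly preferred at $x=\mu_1$. The common constant, which here equals $-\log(2\pi)$ because $p=2$ and $\det\Sigma=1$, never needs to be evaluated.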
[/step]
[step:Expand the quadratic forms and isolate the linear term in $x$]
Because $A$ is symmetric, for each $k\in\{1,2\}$ we have
\begin{align*}
(x-\mu_k)^\top A(x-\mu_k)
=
x^\top Ax-2\mu_k^\top Ax+\mu_k^\top A\mu_k.
\end{align*}
Substituting these expansions gives
\begin{align*}
\log\pi_1-\frac{1}{2}x^\top Ax+\mu_1^\top Ax-\frac{1}{2}\mu_1^\top A\mu_1
>
\log\pi_2-\frac{1}{2}x^\top Ax+\mu_2^\top Ax-\frac{1}{2}\mu_2^\top A\mu_2.
\end{align*}
The common term $-\frac{1}{2}x^\top Ax$ cancels. Moving all remaining terms involving $x$ to the left and all constant terms to the right yields
\begin{align*}
(\mu_1-\mu_2)^\top Ax
>
\frac{1}{2}\left(\mu_1^\top A\mu_1-\mu_2^\top A\mu_2\right)-\log\frac{\pi_1}{\pi_2}.
\end{align*}
Since $A=\Sigma^{-1}$, this is exactly
\begin{align*}
(\mu_1-\mu_2)^\top\Sigma^{-1}x
>
\frac{1}{2}\left(\mu_1^\top\Sigma^{-1}\mu_1-\mu_2^\top\Sigma^{-1}\mu_2\right)-\log\frac{\pi_1}{\pi_2}.
\end{align*}
[guided]
The only algebraic point that needs care is the expansion of the quadratic form. Since $A$ is symmetric, the two mixed terms agree:
\begin{align*}
x^\top A\mu_k=\mu_k^\top A^\top x=\mu_k^\top Ax.
\end{align*}
Thus, for each $k\in\{1,2\}$,
\begin{align*}
(x-\mu_k)^\top A(x-\mu_k)
=
x^\top Ax-2\mu_k^\top Ax+\mu_k^\top A\mu_k.
\end{align*}
Substituting this into the logarithmic Bayes comparison gives
\begin{align*}
\log\pi_1-\frac{1}{2}x^\top Ax+\mu_1^\top Ax-\frac{1}{2}\mu_1^\top A\mu_1
>
\log\pi_2-\frac{1}{2}x^\top Ax+\mu_2^\top Ax-\frac{1}{2}\mu_2^\top A\mu_2.
\end{align*}
The term $-\frac{1}{2}x^\top Ax$ is present on both sides because both classes have the same covariance matrix. This is the exact algebraic reason the Bayes rule becomes linear in $x$ rather than quadratic.
Cancelling the common term and rearranging gives
\begin{align*}
\mu_1^\top Ax-\mu_2^\top Ax
>
\frac{1}{2}\mu_1^\top A\mu_1-\frac{1}{2}\mu_2^\top A\mu_2-\log\pi_1+\log\pi_2.
\end{align*}
Combining the left-hand side and rewriting the logarithms as a prior-odds term,
\begin{align*}
(\mu_1-\mu_2)^\top Ax
>
\frac{1}{2}\left(\mu_1^\top A\mu_1-\mu_2^\top A\mu_2\right)-\log\frac{\pi_1}{\pi_2}.
\end{align*}
Finally $A=\Sigma^{-1}$ by definition, so this is precisely the stated decision inequality.
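A small worked instance may help fix the signs; the numbers below are assumptions made for illustration and do not come from the statement. Take $p=2$, $\mu_1=(1,0)^\top$, $\mu_2=(-1,0)^\top$, $\pi_1=2/3$, $\pi_2=1/3$, and
\begin{align*}
\Sigma=\begin{pmatrix}2&1\\1&1\end{pmatrix},
\qquad
A=\Sigma^{-1}=\begin{pmatrix}1&-1\\-1&2\end{pmatrix}.
\end{align*}
Then $(\mu_1-\mu_2)^\top A=(2,0)A=(2,-2)$, while $\mu_1^\top A\mu_1=\mu_2^\top A\mu_2=1$ and $\log(\pi_1/\pi_2)=\log 2$. The decision inequality becomes
\begin{align*}
2x_1-2x_2>-\log 2,
\qquad\text{equivalently}\qquad
x_1-x_2>-\tfrac{1}{2}\log 2.
\end{align*}
For example, $x=(0,0)^\top$ satisfies it. This agrees with the direct comparison: both quadratic forms $(x-\mu_k)^\top A(x-\mu_k)$ equal $1$ at the origin, so $f_1(0)=f_2(0)$ and the larger prior $\pi_1$ decides.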
[/guided]
[/step]
[step:Identify the equal-prior boundary as the Mahalanobis bisector]
Assume $\pi_1=\pi_2$. Then $\log(\pi_1/\pi_2)=0$, and the decision boundary is the set of $x\in\mathbb R^p$ satisfying
\begin{align*}
(\mu_1-\mu_2)^\top Ax
=
\frac{1}{2}\left(\mu_1^\top A\mu_1-\mu_2^\top A\mu_2\right).
\end{align*}
For $v\in\mathbb R^p$, define the squared Mahalanobis norm $\|v\|_A^2:=v^\top Av$. Then
\begin{align*}
\|x-\mu_1\|_A^2=\|x-\mu_2\|_A^2
\end{align*}
is equivalent, after expanding both sides, to
\begin{align*}
x^\top Ax-2\mu_1^\top Ax+\mu_1^\top A\mu_1
=
x^\top Ax-2\mu_2^\top Ax+\mu_2^\top A\mu_2,
\end{align*}
which is equivalent to
\begin{align*}
(\mu_1-\mu_2)^\top Ax
=
\frac{1}{2}\left(\mu_1^\top A\mu_1-\mu_2^\top A\mu_2\right).
\end{align*}
Thus the equal-prior equality set is exactly the locus of points with equal squared Mahalanobis distance from $\mu_1$ and $\mu_2$. If $\mu_1\ne\mu_2$, then $A(\mu_1-\mu_2)\ne 0$ because $A$ is positive definite and hence invertible, so the locus is defined by a single affine equation in $x$ with nonzero normal vector and is therefore a hyperplane. If $\mu_1=\mu_2$, the equality above reduces to $0=0$, the two class-conditional densities coincide, and the equality set is all of $\mathbb R^p$ rather than a hyperplane. This completes the proof.
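To illustrate the equal-prior case with numbers chosen here for illustration (they are not part of the statement), let $p=2$, $\pi_1=\pi_2=1/2$, $\mu_1=(1,0)^\top$, $\mu_2=(-1,0)^\top$, and
\begin{align*}
\Sigma=\begin{pmatrix}2&1\\1&1\end{pmatrix},
\qquad
\Sigma^{-1}=\begin{pmatrix}1&-1\\-1&2\end{pmatrix}.
\end{align*}
Since $(\mu_1-\mu_2)^\top\Sigma^{-1}=(2,-2)$ and $\mu_1^\top\Sigma^{-1}\mu_1=\mu_2^\top\Sigma^{-1}\mu_2=1$, the boundary equation is $2x_1-2x_2=0$, that is, the line $x_1=x_2$. For a point $x=(t,t)^\top$ on this line,
\begin{align*}
(x-\mu_1)^\top\Sigma^{-1}(x-\mu_1)
&=(t-1)^2-2(t-1)t+2t^2=t^2+1,\\
(x-\mu_2)^\top\Sigma^{-1}(x-\mu_2)
&=(t+1)^2-2(t+1)t+2t^2=t^2+1,
\end{align*}
so the two squared Mahalanobis distances agree, as expected. In this example the boundary differs from the Euclidean perpendicular bisector of the two means, which would be the line $x_1=0$.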
[/step]