[proofplan]
We derive the Bayes rule by comparing the two Gaussian class-conditional densities under equal priors. Because the covariance matrices are equal, the quadratic terms cancel and the decision rule reduces to checking the sign of the one-dimensional centered Fisher score $(\mu_1-\mu_2)^\top\Sigma^{-1}(X-m)$, where $m=(\mu_1+\mu_2)/2$ is the midpoint of the means. Under class $1$ and class $2$ respectively, this score is a univariate normal random variable with mean $\Delta^2/2$ and $-\Delta^2/2$, and with variance $\Delta^2$ in both cases, so each conditional misclassification probability equals $\Phi(-\Delta/2)$. Averaging the two conditional errors with the equal prior weights gives the same value for the total Bayes error. The degenerate case $\Delta=0$ is treated separately.
[/proofplan]
[step:Handle the case where the two Gaussian laws coincide]
Assume first that $\Delta=0$. Since $\Sigma$ is positive definite, the quadratic form $v \mapsto v^\top \Sigma^{-1}v$ is positive definite, so $\Delta=0$ implies $\mu_1=\mu_2$. Hence the two conditional laws of $X$ are identical.
Denote the common conditional law of $X$ by $\nu$, and let $\delta:\mathbb{R}^p\to\{1,2\}$ be any measurable deterministic classifier. The error probability of $\delta$ is
\begin{align*}
\mathbb{P}(\delta(X)\neq Y)
&=
\frac{1}{2}\mathbb{P}(\delta(X)=2\mid Y=1)
+
\frac{1}{2}\mathbb{P}(\delta(X)=1\mid Y=2)\\
&=
\frac{1}{2}\nu(\{x\in\mathbb{R}^p:\delta(x)=2\})
+
\frac{1}{2}\nu(\{x\in\mathbb{R}^p:\delta(x)=1\})\\
&=
\frac{1}{2}.
\end{align*}
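The last equality holds because the sets $\{\delta=1\}$ and $\{\delta=2\}$ partition $\mathbb{R}^p$, so their $\nu$-measures sum to $1$. Every classifier therefore has error $1/2$, and in particular the Bayes error is $1/2$. Since $\Phi(0)=1/2$, this equals $\Phi(-\Delta/2)$ when $\Delta=0$.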
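The degenerate case can be illustrated numerically. The following is a minimal sketch, assuming $p=1$ with both classes distributed as $\mathcal{N}(0,1)$; the three classifiers, the seed, and the sample size are hypothetical choices made only for illustration.

```python
import random

rng = random.Random(0)
n = 100_000

# Three arbitrary (hypothetical) classifiers on the real line.
classifiers = {
    "always class 1": lambda x: 1,
    "sign of x": lambda x: 1 if x >= 0 else 2,
    "threshold at 1": lambda x: 1 if x >= 1 else 2,
}

errors = {name: 0 for name in classifiers}
for _ in range(n):
    y = rng.choice((1, 2))    # equal priors
    x = rng.gauss(0.0, 1.0)   # same law for X under both classes
    for name, delta in classifiers.items():
        if delta(x) != y:
            errors[name] += 1

error_rates = {name: count / n for name, count in errors.items()}
for name, rate in error_rates.items():
    print(f"{name}: empirical error {rate:.4f}")
```

Up to Monte Carlo noise, each empirical error rate should be close to $1/2$, regardless of how the classifier splits the real line.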
[/step]
[step:Compute the equal-prior Bayes decision rule when the means differ]
Assume now that $\Delta>0$. Define the mean difference vector $d\in\mathbb{R}^p$, the Fisher direction $a\in\mathbb{R}^p$, and the midpoint $m\in\mathbb{R}^p$ by
\begin{align*}
d := \mu_1-\mu_2,
\qquad
a := \Sigma^{-1}d,
\qquad
m := \frac{\mu_1+\mu_2}{2}.
\end{align*}
For $k\in\{1,2\}$, let $f_k:\mathbb{R}^p\to(0,\infty)$ be the density of $\mathcal{N}_p(\mu_k,\Sigma)$ with respect to $\mathcal{L}^p$:
\begin{align*}
f_k(x)
:=
\frac{1}{(2\pi)^{p/2}(\det\Sigma)^{1/2}}
\exp\left(
-\frac{1}{2}(x-\mu_k)^\top\Sigma^{-1}(x-\mu_k)
\right).
\end{align*}
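With equal priors, the [Bayes classifier](/theorems/1941) chooses class $1$ exactly where $f_1(x)\ge f_2(x)$ and class $2$ otherwise. Since the normalizing constants of $f_1$ and $f_2$ coincide, taking logarithms and multiplying by $-2$ (which reverses the inequality) shows that this condition is equivalent to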
\begin{align*}
(x-\mu_1)^\top\Sigma^{-1}(x-\mu_1)
\le
(x-\mu_2)^\top\Sigma^{-1}(x-\mu_2).
\end{align*}
Expanding both quadratic forms and cancelling the common term $x^\top\Sigma^{-1}x$ gives
\begin{align*}
-2\mu_1^\top\Sigma^{-1}x+\mu_1^\top\Sigma^{-1}\mu_1
\le
-2\mu_2^\top\Sigma^{-1}x+\mu_2^\top\Sigma^{-1}\mu_2.
\end{align*}
Rearranging yields
\begin{align*}
d^\top\Sigma^{-1}x
\ge
\frac{1}{2}\left(\mu_1^\top\Sigma^{-1}\mu_1-\mu_2^\top\Sigma^{-1}\mu_2\right).
\end{align*}
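Since $\Sigma^{-1}$ is symmetric, the cross terms $\mu_1^\top\Sigma^{-1}\mu_2$ and $\mu_2^\top\Sigma^{-1}\mu_1$ cancel when $d^\top\Sigma^{-1}m$ is expanded, so the right-hand side equals $d^\top\Sigma^{-1}m$. Because $d^\top\Sigma^{-1}=a^\top$, the condition reads $a^\top(x-m)\ge 0$. Therefore the Bayes classifier $\delta_*:\mathbb{R}^p\to\{1,2\}$ is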
\begin{align*}
\delta_*(x)
=
\begin{cases}
1, & a^\top(x-m)\ge 0,\\
2, & a^\top(x-m)<0.
\end{cases}
\end{align*}
[guided]
The equal-prior Bayes rule compares posterior probabilities. Because the priors are equal, comparing posterior probabilities is the same as comparing the class-conditional densities $f_1$ and $f_2$.
For each $k\in\{1,2\}$, the conditional density of $X$ given $Y=k$ is the map $f_k:\mathbb{R}^p\to(0,\infty)$ defined by
\begin{align*}
f_k(x)
:=
\frac{1}{(2\pi)^{p/2}(\det\Sigma)^{1/2}}
\exp\left(
-\frac{1}{2}(x-\mu_k)^\top\Sigma^{-1}(x-\mu_k)
\right).
\end{align*}
The normalizing constants are identical because the covariance matrix is the same in both classes. Therefore $f_1(x)\ge f_2(x)$ holds exactly when
\begin{align*}
(x-\mu_1)^\top\Sigma^{-1}(x-\mu_1)
\le
(x-\mu_2)^\top\Sigma^{-1}(x-\mu_2).
\end{align*}
The key cancellation is the common quadratic term in $x$. Expanding gives
\begin{align*}
x^\top\Sigma^{-1}x
-2\mu_1^\top\Sigma^{-1}x
+\mu_1^\top\Sigma^{-1}\mu_1
\le
x^\top\Sigma^{-1}x
-2\mu_2^\top\Sigma^{-1}x
+\mu_2^\top\Sigma^{-1}\mu_2.
\end{align*}
After cancelling $x^\top\Sigma^{-1}x$ and rearranging, we obtain
\begin{align*}
(\mu_1-\mu_2)^\top\Sigma^{-1}x
\ge
\frac{1}{2}\left(\mu_1^\top\Sigma^{-1}\mu_1-\mu_2^\top\Sigma^{-1}\mu_2\right).
\end{align*}
Now define
\begin{align*}
d := \mu_1-\mu_2,
\qquad
a := \Sigma^{-1}d,
\qquad
m := \frac{\mu_1+\mu_2}{2}.
\end{align*}
Since $\Sigma^{-1}$ is symmetric,
\begin{align*}
d^\top\Sigma^{-1}m
&=
\frac{1}{2}(\mu_1-\mu_2)^\top\Sigma^{-1}(\mu_1+\mu_2)\\
&=
\frac{1}{2}\left(\mu_1^\top\Sigma^{-1}\mu_1-\mu_2^\top\Sigma^{-1}\mu_2\right).
\end{align*}
Thus the decision rule is the hyperplane rule
\begin{align*}
\delta_*(x)
=
\begin{cases}
1, & a^\top(x-m)\ge 0,\\
2, & a^\top(x-m)<0.
\end{cases}
\end{align*}
This is the Fisher linear discriminant rule: project $x$ onto the direction $a=\Sigma^{-1}(\mu_1-\mu_2)$ and compare with the projected midpoint.
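The equivalence between the density comparison and the hyperplane rule can be checked numerically. The following is a minimal sketch, assuming $p=2$; the values of `mu1`, `mu2`, and `Sigma` and the helper names are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical 2-D example: mu1, mu2 and Sigma are illustrative choices only.
mu1 = (1.0, 0.5)
mu2 = (-0.5, 0.0)
Sigma = ((2.0, 0.6), (0.6, 1.0))  # symmetric positive definite

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return ((s / det, -q / det), (-r / det, p / det))

def matvec(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

Sigma_inv = inv2(Sigma)
d = (mu1[0] - mu2[0], mu1[1] - mu2[1])            # mean difference
a = matvec(Sigma_inv, d)                          # Fisher direction
m = ((mu1[0] + mu2[0]) / 2, (mu1[1] + mu2[1]) / 2)  # midpoint

def quad(x, mu):
    """Quadratic form (x - mu)^T Sigma^{-1} (x - mu)."""
    r = (x[0] - mu[0], x[1] - mu[1])
    return dot(r, matvec(Sigma_inv, r))

def score(x):
    """Centered Fisher score a^T (x - m)."""
    return dot(a, (x[0] - m[0], x[1] - m[1]))

def rule_density(x):
    # f_1(x) >= f_2(x) iff the class-1 quadratic form is the smaller one.
    return 1 if quad(x, mu1) <= quad(x, mu2) else 2

def rule_hyperplane(x):
    return 1 if score(x) >= 0 else 2

rng = random.Random(0)
points = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(10_000)]
agree = sum(rule_density(x) == rule_hyperplane(x) for x in points)
print(agree, "of", len(points), "points classified identically")
```

The two rules can differ only through floating-point rounding at points lying essentially on the separating hyperplane, since the difference of the two quadratic forms equals $2a^\top(x-m)$.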
[/guided]
[/step]
[step:Compute the class one misclassification probability from the projected score]
Define the score map $S:\mathbb{R}^p\to\mathbb{R}$ by
\begin{align*}
S(x) := a^\top(x-m).
\end{align*}
Conditionally on $Y=1$, we have $X\sim\mathcal{N}_p(\mu_1,\Sigma)$, so the random variable $S(X):\Omega\to\mathbb{R}$, being an affine function of a Gaussian vector, is normally distributed with mean
\begin{align*}
a^\top(\mu_1-m)
&=
a^\top\left(\frac{\mu_1-\mu_2}{2}\right)
=
\frac{1}{2}d^\top\Sigma^{-1}d
=
\frac{\Delta^2}{2}
\end{align*}
and variance
\begin{align*}
a^\top\Sigma a
&=
d^\top\Sigma^{-1}\Sigma\Sigma^{-1}d
=
d^\top\Sigma^{-1}d
=
\Delta^2.
\end{align*}
Therefore, under $Y=1$,
\begin{align*}
\frac{S(X)-\Delta^2/2}{\Delta}\sim \mathcal{N}(0,1).
\end{align*}
The classifier assigns class $2$ exactly when $S(X)<0$, so
\begin{align*}
\mathbb{P}(\delta_*(X)=2\mid Y=1)
&=
\mathbb{P}(S(X)<0\mid Y=1)\\
&=
\mathbb{P}\left(\frac{S(X)-\Delta^2/2}{\Delta}<-\frac{\Delta}{2}\,\middle|\,Y=1\right)\\
&=
\Phi\left(-\frac{\Delta}{2}\right).
\end{align*}
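The class one error can be compared with a simulation. The following is a minimal Monte Carlo sketch, assuming $p=2$; the parameter values, the seed, and the sample size are hypothetical, and $\Phi$ is evaluated through `math.erf`.

```python
import math
import random

# Illustrative parameters (assumptions, not taken from the text).
mu1 = (1.0, 0.5)
mu2 = (-0.5, 0.0)
Sigma = ((2.0, 0.6), (0.6, 1.0))

# Inverse of the 2x2 covariance via the adjugate formula.
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
Sigma_inv = ((Sigma[1][1] / det, -Sigma[0][1] / det),
             (-Sigma[1][0] / det, Sigma[0][0] / det))

d = (mu1[0] - mu2[0], mu1[1] - mu2[1])
a = (Sigma_inv[0][0] * d[0] + Sigma_inv[0][1] * d[1],
     Sigma_inv[1][0] * d[0] + Sigma_inv[1][1] * d[1])
m = ((mu1[0] + mu2[0]) / 2, (mu1[1] + mu2[1]) / 2)
delta = math.sqrt(d[0] * a[0] + d[1] * a[1])  # Delta = sqrt(d^T Sigma^{-1} d)

# Cholesky factor L of Sigma (Sigma = L L^T), used to sample N(mu1, Sigma).
l11 = math.sqrt(Sigma[0][0])
l21 = Sigma[1][0] / l11
l22 = math.sqrt(Sigma[1][1] - l21 ** 2)

def std_normal_cdf(t):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

rng = random.Random(1)
n = 100_000
misclassified = 0
for _ in range(n):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = (mu1[0] + l11 * z1, mu1[1] + l21 * z1 + l22 * z2)  # X given Y = 1
    s = a[0] * (x[0] - m[0]) + a[1] * (x[1] - m[1])        # S(X)
    if s < 0:                                              # assigned to class 2
        misclassified += 1

empirical = misclassified / n
theoretical = std_normal_cdf(-delta / 2.0)
print(f"empirical: {empirical:.4f}, Phi(-Delta/2): {theoretical:.4f}")
```

The two printed values should agree up to Monte Carlo error, which is of order $n^{-1/2}$.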
[/step]
[step:Compute the class two misclassification probability by the same score]
Conditionally on $Y=2$, the same score $S(X)$ is normally distributed with mean
\begin{align*}
a^\top(\mu_2-m)
&=
a^\top\left(\frac{\mu_2-\mu_1}{2}\right)
=
-\frac{1}{2}d^\top\Sigma^{-1}d
=
-\frac{\Delta^2}{2}
\end{align*}
and variance
\begin{align*}
a^\top\Sigma a=\Delta^2.
\end{align*}
The classifier assigns class $1$ exactly when $S(X)\ge 0$. Hence
\begin{align*}
\mathbb{P}(\delta_*(X)=1\mid Y=2)
&=
\mathbb{P}(S(X)\ge 0\mid Y=2)\\
&=
\mathbb{P}\left(\frac{S(X)+\Delta^2/2}{\Delta}\ge \frac{\Delta}{2}\,\middle|\,Y=2\right)\\
&=
1-\Phi\left(\frac{\Delta}{2}\right)\\
&=
\Phi\left(-\frac{\Delta}{2}\right),
\end{align*}
where the final equality uses the symmetry identity $\Phi(-t)=1-\Phi(t)$ for the standard normal distribution.
[/step]
[step:Average the two conditional errors using the equal priors]
The Bayes error rate is the error probability of $\delta_*$. Since $\mathbb{P}(Y=1)=\mathbb{P}(Y=2)=1/2$, the [law of total probability](/theorems/1113) gives
\begin{align*}
\mathbb{P}(\delta_*(X)\neq Y)
&=
\frac{1}{2}\mathbb{P}(\delta_*(X)=2\mid Y=1)
+
\frac{1}{2}\mathbb{P}(\delta_*(X)=1\mid Y=2)\\
&=
\frac{1}{2}\Phi\left(-\frac{\Delta}{2}\right)
+
\frac{1}{2}\Phi\left(-\frac{\Delta}{2}\right)\\
&=
\Phi\left(-\frac{\Delta}{2}\right).
\end{align*}
Together with the already handled case $\Delta=0$, this proves that the Bayes error rate is $\Phi(-\Delta/2)$ for all $\mu_1,\mu_2\in\mathbb{R}^p$.
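The closed form is straightforward to evaluate. The following is a minimal sketch in pure Python; the function names `solve`, `std_normal_cdf`, and `bayes_error` are hypothetical, and the linear system $\Sigma a=d$ is solved by plain Gaussian elimination, assuming $\Sigma$ is symmetric positive definite.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def std_normal_cdf(t):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def bayes_error(mu1, mu2, Sigma):
    """Phi(-Delta/2) with Delta^2 = (mu1 - mu2)^T Sigma^{-1} (mu1 - mu2)."""
    d = [u - v for u, v in zip(mu1, mu2)]
    a = solve(Sigma, d)  # Fisher direction a = Sigma^{-1} d
    delta_sq = sum(di * ai for di, ai in zip(d, a))
    delta = math.sqrt(max(delta_sq, 0.0))  # guard against tiny negative rounding
    return std_normal_cdf(-delta / 2.0)

identity = [[1.0, 0.0], [0.0, 1.0]]
print(bayes_error([1.0, 0.0], [-1.0, 0.0], identity))  # Delta = 2
print(bayes_error([0.3, 0.7], [0.3, 0.7], identity))   # Delta = 0
```

In the first call $\Delta=2$, so the value is $\Phi(-1)$; in the second call the means coincide and the function returns $\Phi(0)=1/2$, matching the degenerate case.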
[/step]