[proofplan]
Project the multivariate observations onto each fixed contrast direction $a_j$, reducing the problem to scalar normal samples. For every nonzero contrast, the usual centered and studentized sample mean has a Student $t$ distribution with $n-1$ degrees of freedom, so the chosen Bonferroni quantile gives marginal failure probability $\alpha/k$. Zero contrast vectors are harmless because their target and interval both collapse to $0$. Finally, the union bound converts the $k$ marginal failure bounds into the stated simultaneous coverage bound.
[/proofplan]
[step:Project each fixed contrast to a scalar normal sample]
For each $j \in \{1,\dots,k\}$ and $i \in \{1,\dots,n\}$, define the real-valued random variable
\begin{align*}
Y_{ij}: (\Omega,\mathcal{F}) &\to (\mathbb{R},\mathcal{B}(\mathbb{R})) \\
\omega &\mapsto a_j^\top X_i(\omega).
\end{align*}
Since $X_1,\dots,X_n$ are independent and identically distributed as $\mathcal{N}_p(\mu,\Sigma)$ and $a_j$ is deterministic, the random variables $Y_{1j},\dots,Y_{nj}$ are independent and identically distributed with univariate normal distribution
\begin{align*}
Y_{ij} \sim \mathcal{N}(a_j^\top \mu,\, a_j^\top \Sigma a_j).
\end{align*}
Define the scalar sample mean and scalar sample variance for the $j$th projected sample by
\begin{align*}
\bar Y_j := \frac{1}{n}\sum_{i=1}^n Y_{ij},
\qquad
V_j := \frac{1}{n-1}\sum_{i=1}^n (Y_{ij}-\bar Y_j)^2 .
\end{align*}
By the definitions of $\bar X$ and $S$,
\begin{align*}
\bar Y_j
=
a_j^\top \bar X
\end{align*}
and
\begin{align*}
V_j
&=
\frac{1}{n-1}\sum_{i=1}^n
\left(a_j^\top X_i-a_j^\top \bar X\right)^2 \\
&=
\frac{1}{n-1}\sum_{i=1}^n
a_j^\top (X_i-\bar X)(X_i-\bar X)^\top a_j \\
&=
a_j^\top S a_j .
\end{align*}
[guided]
The goal is to prove a statement about $a_j^\top \bar X$ and $a_j^\top S a_j$. For a fixed $j$, these are exactly the ordinary sample mean and sample variance of the scalar observations obtained by applying the fixed linear functional $x \mapsto a_j^\top x$ to each vector observation.
For each $j \in \{1,\dots,k\}$ and $i \in \{1,\dots,n\}$, define
\begin{align*}
Y_{ij}: (\Omega,\mathcal{F}) &\to (\mathbb{R},\mathcal{B}(\mathbb{R})) \\
\omega &\mapsto a_j^\top X_i(\omega).
\end{align*}
The map $x \mapsto a_j^\top x$ is deterministic and linear. Therefore applying it to independent random vectors preserves independence, and since the $X_i$ have the same distribution, the $Y_{ij}$ also have the same distribution for fixed $j$. A linear image of a multivariate normal random vector is univariate normal, with mean and variance computed by the usual linear transformation formulas:
\begin{align*}
Y_{ij} \sim \mathcal{N}(a_j^\top \mu,\, a_j^\top \Sigma a_j).
\end{align*}
Now define the scalar sample mean and scalar sample variance by
\begin{align*}
\bar Y_j := \frac{1}{n}\sum_{i=1}^n Y_{ij},
\qquad
V_j := \frac{1}{n-1}\sum_{i=1}^n (Y_{ij}-\bar Y_j)^2 .
\end{align*}
The sample mean is
\begin{align*}
\bar Y_j
=
\frac{1}{n}\sum_{i=1}^n a_j^\top X_i
=
a_j^\top \left(\frac{1}{n}\sum_{i=1}^n X_i\right)
=
a_j^\top \bar X .
\end{align*}
For the variance, substitute $Y_{ij}=a_j^\top X_i$ and $\bar Y_j=a_j^\top \bar X$:
\begin{align*}
V_j
&=
\frac{1}{n-1}\sum_{i=1}^n
\left(a_j^\top X_i-a_j^\top \bar X\right)^2 \\
&=
\frac{1}{n-1}\sum_{i=1}^n
\left(a_j^\top (X_i-\bar X)\right)^2 \\
&=
\frac{1}{n-1}\sum_{i=1}^n
a_j^\top (X_i-\bar X)(X_i-\bar X)^\top a_j \\
&=
a_j^\top
\left[
\frac{1}{n-1}\sum_{i=1}^n
(X_i-\bar X)(X_i-\bar X)^\top
\right]
a_j \\
&=
a_j^\top S a_j .
\end{align*}
Thus the displayed interval in the theorem is precisely the usual scalar $t$ interval for the projected observations $Y_{1j},\dots,Y_{nj}$.
[/guided]
[/step]
[step:Obtain the marginal coverage probability for every contrast]
Define the Bonferroni critical value
\begin{align*}
c := t_{n-1;1-\alpha/(2k)} .
\end{align*}
For each $j \in \{1,\dots,k\}$, define the coverage event
\begin{align*}
A_j
:=
\left\{
\omega \in \Omega :
|a_j^\top \bar X(\omega)-a_j^\top \mu|
\leq
c\sqrt{\frac{a_j^\top S(\omega)a_j}{n}}
\right\}.
\end{align*}
If $a_j=0$, then $a_j^\top \bar X=0$, $a_j^\top \mu=0$, and $a_j^\top S a_j=0$, so $A_j=\Omega$ and hence
\begin{align*}
\mathbb{P}(A_j^c)=0 \leq \frac{\alpha}{k}.
\end{align*}
Now suppose $a_j \neq 0$. Since $\Sigma$ is positive definite,
\begin{align*}
\sigma_j^2 := a_j^\top \Sigma a_j > 0.
\end{align*}
By the standard univariate normal theory fact that the studentized mean of an independent normal sample has a Student $t$ distribution with $n-1$ degrees of freedom (citing a result not yet in the wiki: Student $t$ statistic for a normal sample), the statistic
\begin{align*}
T_j
:=
\frac{\bar Y_j-a_j^\top \mu}{\sqrt{V_j/n}}
\end{align*}
has the Student $t$ distribution with $n-1$ degrees of freedom. Using $\bar Y_j=a_j^\top \bar X$ and $V_j=a_j^\top S a_j$, the event $A_j$ is exactly $\{|T_j|\leq c\}$. Therefore, by the definition of $c$ as the two-sided $1-\alpha/k$ Student $t$ critical value,
\begin{align*}
\mathbb{P}(A_j^c)
=
\mathbb{P}(|T_j|>c)
=
\frac{\alpha}{k}.
\end{align*}
Thus for every $j \in \{1,\dots,k\}$,
\begin{align*}
\mathbb{P}(A_j^c) \leq \frac{\alpha}{k}.
\end{align*}
[guided]
We now prove the marginal confidence statement for one contrast at a time. Define
\begin{align*}
c := t_{n-1;1-\alpha/(2k)} .
\end{align*}
For each $j \in \{1,\dots,k\}$, define the event
\begin{align*}
A_j
:=
\left\{
\omega \in \Omega :
|a_j^\top \bar X(\omega)-a_j^\top \mu|
\leq
c\sqrt{\frac{a_j^\top S(\omega)a_j}{n}}
\right\}.
\end{align*}
This is the event that the $j$th displayed interval contains its target value $a_j^\top \mu$.
There is a small degeneracy to handle first. If $a_j=0$, then the contrast is identically zero:
\begin{align*}
a_j^\top \bar X=0,
\qquad
a_j^\top \mu=0,
\qquad
a_j^\top S a_j=0.
\end{align*}
Hence the interval is $[0,0]$, which contains $0$ for every outcome $\omega \in \Omega$. Thus $A_j=\Omega$, and
\begin{align*}
\mathbb{P}(A_j^c)=0 \leq \frac{\alpha}{k}.
\end{align*}
Assume now that $a_j \neq 0$. Since $\Sigma$ is positive definite, the projected variance is strictly positive:
\begin{align*}
\sigma_j^2 := a_j^\top \Sigma a_j > 0.
\end{align*}
From the previous step, $Y_{1j},\dots,Y_{nj}$ are independent and identically distributed as $\mathcal{N}(a_j^\top \mu,\sigma_j^2)$. Therefore the usual studentized mean statistic
\begin{align*}
T_j
:=
\frac{\bar Y_j-a_j^\top \mu}{\sqrt{V_j/n}}
\end{align*}
has the Student $t$ distribution with $n-1$ degrees of freedom by the standard univariate normal theory result for the Student statistic (citing a result not yet in the wiki: Student $t$ statistic for a normal sample). The hypotheses of that result are satisfied here: the observations are independent, normally distributed, have common finite mean $a_j^\top \mu$, have common positive variance $\sigma_j^2$, and $n\geq 2$.
Because $\bar Y_j=a_j^\top \bar X$ and $V_j=a_j^\top S a_j$, the event $A_j$ can be rewritten as
\begin{align*}
A_j
=
\{|T_j|\leq c\}.
\end{align*}
The Student $t$ distribution is symmetric about $0$, and $c=t_{n-1;1-\alpha/(2k)}$ is chosen so that the upper tail probability is $\alpha/(2k)$. Hence the two tails together have probability $\alpha/k$:
\begin{align*}
\mathbb{P}(A_j^c)
=
\mathbb{P}(|T_j|>c)
=
\frac{\alpha}{2k}+\frac{\alpha}{2k}
=
\frac{\alpha}{k}.
\end{align*}
Combining this with the zero-contrast case gives, for every $j \in \{1,\dots,k\}$,
\begin{align*}
\mathbb{P}(A_j^c) \leq \frac{\alpha}{k}.
\end{align*}
[/guided]
[/step]
[step:Apply the union bound to pass from marginal to simultaneous coverage]
The event that all intervals cover their corresponding targets is
\begin{align*}
\bigcap_{j=1}^k A_j.
\end{align*}
Its complement is the event that at least one interval fails:
\begin{align*}
\left(\bigcap_{j=1}^k A_j\right)^c
=
\bigcup_{j=1}^k A_j^c.
\end{align*}
By finite subadditivity of probability,
\begin{align*}
\mathbb{P}\left(\bigcup_{j=1}^k A_j^c\right)
\leq
\sum_{j=1}^k \mathbb{P}(A_j^c)
\leq
\sum_{j=1}^k \frac{\alpha}{k}
=
\alpha .
\end{align*}
Therefore
\begin{align*}
\mathbb{P}\left(\bigcap_{j=1}^k A_j\right)
=
1-
\mathbb{P}\left(\left(\bigcap_{j=1}^k A_j\right)^c\right)
\geq
1-\alpha .
\end{align*}
This is exactly the asserted simultaneous coverage probability.
[guided]
The marginal estimates do not require the events $A_1,\dots,A_k$ to be independent. This is precisely why the Bonferroni argument is useful: it only uses finite subadditivity of probability.
The event of simultaneous success is
\begin{align*}
\bigcap_{j=1}^k A_j,
\end{align*}
because this says that every one of the $k$ intervals contains its corresponding target. Its complement is the event that at least one interval fails. By De Morgan's law,
\begin{align*}
\left(\bigcap_{j=1}^k A_j\right)^c
=
\bigcup_{j=1}^k A_j^c.
\end{align*}
Now apply finite subadditivity of probability to the events $A_1^c,\dots,A_k^c$:
\begin{align*}
\mathbb{P}\left(\bigcup_{j=1}^k A_j^c\right)
\leq
\sum_{j=1}^k \mathbb{P}(A_j^c).
\end{align*}
From the previous step, each failure probability is at most $\alpha/k$, so
\begin{align*}
\mathbb{P}\left(\bigcup_{j=1}^k A_j^c\right)
\leq
\sum_{j=1}^k \frac{\alpha}{k}
=
\alpha .
\end{align*}
Taking complements gives
\begin{align*}
\mathbb{P}\left(\bigcap_{j=1}^k A_j\right)
=
1-
\mathbb{P}\left(\left(\bigcap_{j=1}^k A_j\right)^c\right)
\geq
1-\alpha .
\end{align*}
Since $A_j$ was defined exactly as the event that the $j$th displayed interval contains $a_j^\top \mu$, this proves the claimed simultaneous coverage statement.
[/guided]
[/step]