[proofplan]
We first compute the exact distribution of a fixed estimated contrast. The normal model gives a scalar normal numerator, while the pooled covariance estimator supplies an independent chi-square denominator in the same response direction. This yields a Studentised statistic with $N-g$ degrees of freedom for each fixed pair $(a,c)$. Finally, simultaneous intervals follow by controlling the maximum of these Studentised statistics; the Bonferroni case is obtained by applying the union bound to the two tails of each fixed contrast.
[/proofplan]
[step:Compute the mean and variance of a fixed estimated contrast]
Fix a pair $(a,c)\in\mathcal{F}$. Define the scalar random variables
\begin{align*}
Y_{ij}: \Omega &\to \mathbb{R} \\
\omega &\mapsto a^\top X_{ij}(\omega).
\end{align*}
Since $X_{ij}\sim\mathcal{N}_p(\mu_i,\Sigma)$, each $Y_{ij}$ is normally distributed with
\begin{align*}
Y_{ij}\sim \mathcal{N}(a^\top\mu_i, a^\top\Sigma a).
\end{align*}
Because $a\neq 0$ and $\Sigma$ is positive definite, $a^\top\Sigma a>0$.
The sample mean of the scalar observations in group $i$ is
\begin{align*}
\bar Y_i := \frac{1}{n_i}\sum_{j=1}^{n_i}Y_{ij}
= a^\top \bar X_i.
\end{align*}
Hence
\begin{align*}
\hat\theta(a,c)
= \sum_{i=1}^g c_i \bar Y_i.
\end{align*}
Independence of the groups and the variance formula for independent linear combinations give
\begin{align*}
\mathbb{E}[\hat\theta(a,c)]
&= \sum_{i=1}^g c_i a^\top \mu_i
= \theta(a,c),\\
\operatorname{Var}(\hat\theta(a,c))
&= \sum_{i=1}^g c_i^2 \operatorname{Var}(\bar Y_i)
= a^\top\Sigma a \sum_{i=1}^g \frac{c_i^2}{n_i}.
\end{align*}
Therefore
\begin{align*}
\frac{\hat\theta(a,c)-\theta(a,c)}
{\left(a^\top\Sigma a\sum_{i=1}^g c_i^2/n_i\right)^{1/2}}
\sim \mathcal{N}(0,1).
\end{align*}
[guided]
Fix a response direction $a\in\mathbb{R}^p$ and a group contrast $c\in\mathbb{R}^g$. The multivariate contrast becomes an ordinary scalar contrast after projecting every observation onto $a$. Define
\begin{align*}
Y_{ij}: \Omega &\to \mathbb{R} \\
\omega &\mapsto a^\top X_{ij}(\omega).
\end{align*}
A linear image of a multivariate normal vector is normal, so
\begin{align*}
Y_{ij}\sim \mathcal{N}(a^\top\mu_i, a^\top\Sigma a).
\end{align*}
The variance is positive because $a\neq 0$ and $\Sigma$ is positive definite.
Now define the scalar group mean
\begin{align*}
\bar Y_i := \frac{1}{n_i}\sum_{j=1}^{n_i}Y_{ij}.
\end{align*}
By the definition of $Y_{ij}$,
\begin{align*}
\bar Y_i = a^\top \bar X_i.
\end{align*}
Thus the estimated multivariate contrast is exactly the scalar contrast
\begin{align*}
\hat\theta(a,c)
= a^\top\sum_{i=1}^g c_i\bar X_i
= \sum_{i=1}^g c_i\bar Y_i.
\end{align*}
Since the groups are independent, the random variables $\bar Y_1,\dots,\bar Y_g$ are independent. Therefore
\begin{align*}
\mathbb{E}[\hat\theta(a,c)]
&= \sum_{i=1}^g c_i \mathbb{E}[\bar Y_i]
= \sum_{i=1}^g c_i a^\top\mu_i
= \theta(a,c),\\
\operatorname{Var}(\hat\theta(a,c))
&= \sum_{i=1}^g c_i^2 \operatorname{Var}(\bar Y_i)
= \sum_{i=1}^g c_i^2 \frac{a^\top\Sigma a}{n_i}
= a^\top\Sigma a \sum_{i=1}^g \frac{c_i^2}{n_i}.
\end{align*}
Because $\hat\theta(a,c)$ is a linear combination of independent normal random variables, it is normal. Hence the centred and standardised contrast satisfies
\begin{align*}
\frac{\hat\theta(a,c)-\theta(a,c)}
{\left(a^\top\Sigma a\sum_{i=1}^g c_i^2/n_i\right)^{1/2}}
\sim \mathcal{N}(0,1).
\end{align*}
[/guided]
[/step]
[step:Studentise the fixed contrast using the pooled covariance estimator]
For the fixed $a$, define the pooled scalar within-groups sum of squares
\begin{align*}
Q_a := \sum_{i=1}^g \sum_{j=1}^{n_i}(Y_{ij}-\bar Y_i)^2.
\end{align*}
Since $Y_{ij}=a^\top X_{ij}$ and $\bar Y_i=a^\top\bar X_i$,
\begin{align*}
Q_a
= a^\top W a.
\end{align*}
For independent normal samples with common variance $a^\top\Sigma a$, the statistic
\begin{align*}
\frac{Q_a}{a^\top\Sigma a}
\end{align*}
has chi-square distribution with $N-g$ degrees of freedom and is independent of the group sample means $\bar Y_1,\dots,\bar Y_g$. Since $a^\top S_p a = Q_a/(N-g)$, the fixed-pair Studentised statistic
\begin{align*}
T(a,c)
:= \frac{\hat\theta(a,c)-\theta(a,c)}
{\left(a^\top S_p a\sum_{i=1}^g c_i^2/n_i\right)^{1/2}}
\end{align*}
has Student $t$ distribution with $N-g$ degrees of freedom.
[guided]
The variance $a^\top\Sigma a$ is unknown, so we replace it by the corresponding pooled sample variance in the same projected direction. Define
\begin{align*}
Q_a := \sum_{i=1}^g \sum_{j=1}^{n_i}(Y_{ij}-\bar Y_i)^2.
\end{align*}
Using $Y_{ij}=a^\top X_{ij}$ and $\bar Y_i=a^\top\bar X_i$, we obtain
\begin{align*}
Q_a
&= \sum_{i=1}^g \sum_{j=1}^{n_i}
a^\top(X_{ij}-\bar X_i)(X_{ij}-\bar X_i)^\top a\\
&= a^\top\left(\sum_{i=1}^g \sum_{j=1}^{n_i}
(X_{ij}-\bar X_i)(X_{ij}-\bar X_i)^\top\right)a\\
&= a^\top W a.
\end{align*}
Thus $a^\top S_p a=Q_a/(N-g)$.
For each group, the projected observations $Y_{ij}$ form an independent normal sample with common variance $a^\top\Sigma a$. The usual normal-sample decomposition says that the within-group sum of squares divided by the common variance is chi-square with $n_i-1$ degrees of freedom and is independent of the corresponding group mean. Summing over the independent groups gives
\begin{align*}
\frac{Q_a}{a^\top\Sigma a}\sim \chi^2_{N-g},
\end{align*}
and this chi-square variable is independent of $\bar Y_1,\dots,\bar Y_g$, hence independent of $\hat\theta(a,c)$.
Combining this independent chi-square denominator with the standard normal numerator from the previous step gives
\begin{align*}
T(a,c)
:= \frac{\hat\theta(a,c)-\theta(a,c)}
{\left(a^\top S_p a\sum_{i=1}^g c_i^2/n_i\right)^{1/2}}
\sim t_{N-g}.
\end{align*}
[/guided]
[/step]
[step:Convert a bound on the maximum statistic into simultaneous confidence intervals]
By the defining property of $K_\alpha(\mathcal{F})$,
\begin{align*}
\mathbb{P}\left(
\sup_{(a,c)\in\mathcal{F}} |T(a,c)| \leq K_\alpha(\mathcal{F})
\right)\geq 1-\alpha.
\end{align*}
On this event, for every $(a,c)\in\mathcal{F}$,
\begin{align*}
|\hat\theta(a,c)-\theta(a,c)|
\leq K_\alpha(\mathcal{F})
\left(a^\top S_p a\sum_{i=1}^g \frac{c_i^2}{n_i}\right)^{1/2}.
\end{align*}
Equivalently,
\begin{align*}
\theta(a,c)
\in
\left[
\hat\theta(a,c)-K_\alpha(\mathcal{F})s(a,c),
\hat\theta(a,c)+K_\alpha(\mathcal{F})s(a,c)
\right]
\end{align*}
for every $(a,c)\in\mathcal{F}$ simultaneously. Hence the displayed intervals have simultaneous coverage at least $1-\alpha$.
[/step]
[step:Derive the Bonferroni critical constant for a finite family]
Assume that $\mathcal{F}=\{(a_k,c_k):1\leq k\leq m\}$ is finite, and define
\begin{align*}
T_k := T(a_k,c_k), \qquad 1\leq k\leq m.
\end{align*}
Each $T_k$ has Student $t$ distribution with $\nu:=N-g$ degrees of freedom. Let
\begin{align*}
K := t_{\nu,\,1-\alpha/(2m)}.
\end{align*}
Then
\begin{align*}
\mathbb{P}(|T_k|>K)
= \frac{\alpha}{m}
\end{align*}
for each $k$. By the union bound,
\begin{align*}
\mathbb{P}\left(\max_{1\leq k\leq m}|T_k|>K\right)
&\leq \sum_{k=1}^m \mathbb{P}(|T_k|>K)\\
&= \alpha.
\end{align*}
Therefore
\begin{align*}
\mathbb{P}\left(\max_{1\leq k\leq m}|T_k|\leq K\right)
\geq 1-\alpha.
\end{align*}
Thus the Bonferroni constant $t_{N-g,\,1-\alpha/(2m)}$ satisfies the required simultaneous coverage condition.
[guided]
For a finite list of contrasts, no independence between the different statistics $T_1,\dots,T_m$ is needed. Define
\begin{align*}
T_k := T(a_k,c_k), \qquad 1\leq k\leq m.
\end{align*}
The previous step showed that each individual statistic has the same marginal distribution:
\begin{align*}
T_k \sim t_\nu,
\qquad \nu:=N-g.
\end{align*}
Choose
\begin{align*}
K := t_{\nu,\,1-\alpha/(2m)}.
\end{align*}
By the symmetry of the Student $t_\nu$ distribution,
\begin{align*}
\mathbb{P}(|T_k|>K)
=
2\mathbb{P}(T_k>K)
=
\frac{\alpha}{m}.
\end{align*}
The event that at least one interval fails is the event that at least one absolute statistic exceeds $K$. This event is contained in the union of the individual tail events, so the union bound gives
\begin{align*}
\mathbb{P}\left(\max_{1\leq k\leq m}|T_k|>K\right)
&=
\mathbb{P}\left(\bigcup_{k=1}^m\{|T_k|>K\}\right)\\
&\leq \sum_{k=1}^m \mathbb{P}(|T_k|>K)\\
&= \sum_{k=1}^m \frac{\alpha}{m}
= \alpha.
\end{align*}
Taking complements yields
\begin{align*}
\mathbb{P}\left(\max_{1\leq k\leq m}|T_k|\leq K\right)
\geq 1-\alpha.
\end{align*}
Therefore the Bonferroni critical value satisfies the defining condition for simultaneous confidence intervals.
[/guided]
[/step]