[proofplan]
We prove that a linear transformation of a multivariate normal vector is again multivariate normal, with transformed mean and covariance. The strategy rests on the definitional characterisation of multivariate normality: a random vector is multivariate normal iff every linear combination of its components is (univariate) normal. Given this, the proof of the general linear transformation law reduces to applying the definition to $AX$ and using the identity $s^\top(AX) = (A^\tops)^\topX$. The mean and covariance transformations are direct consequences of the linearity of expectation and the bilinearity of covariance. The orthogonal special case follows instantly by specialising $\Sigma = \sigma^2 I$ and $A A^\top = I$.
[/proofplan]
[step:Verify that every linear combination of $AX$ is univariate normal]
Let $X: \Omega \to \mathbb{R}^n$ be multivariate normal, $X \sim N_n(\mu, \Sigma)$, and let $A \in \mathbb{R}^{m \times n}$ be a deterministic matrix. Consider the transformed vector
\begin{align*}
Y: \Omega &\to \mathbb{R}^m \\
\omega &\mapsto AX(\omega).
\end{align*}
To show that $Y$ is multivariate normal, we must show that for every $s \in \mathbb{R}^m$, the random variable $s^\top Y$ is univariate normal. Fix such an $s$. Using the transpose identity $(AX)^\top s = X^\top A^\top s$, or equivalently $s^\top A X = (A^\top s)^\top X$, we can write
\begin{align*}
s^\top Y = s^\top A X = (A^\top s)^\top X = t^\top X,
\end{align*}
where $t := A^\top s \in \mathbb{R}^n$.
By the definition of multivariate normality applied to $X$, for every $t \in \mathbb{R}^n$ the linear combination $t^\top X$ is univariate normal. In particular, this holds for our chosen $t = A^\top s$. Since $s \in \mathbb{R}^m$ was arbitrary, every linear combination of the components of $Y$ is univariate normal, so $Y$ is multivariate normal by definition.
[guided]
The definition of multivariate normality we use is: a random vector $X: \Omega \to \mathbb{R}^n$ is multivariate normal iff for every $t \in \mathbb{R}^n$, the scalar random variable $t^\top X = \sum_{i=1}^n t_i X_i$ is (univariate) normal, where a constant is treated as a degenerate normal with zero variance. This is the "all linear combinations are normal" characterisation.
We want to check this property for $Y = AX$. Pick any $s \in \mathbb{R}^m$ and consider
\begin{align*}
s^\top Y = s^\top (A X).
\end{align*}
The key algebraic observation is that we can "move $A$ over" to $s$ using the transpose: $s^\top A = (A^\top s)^\top$. Setting $t := A^\top s \in \mathbb{R}^n$ (note $A^\top$ has shape $n \times m$, so this has the right dimension), we get
\begin{align*}
s^\top Y = (A^\top s)^\top X = t^\top X.
\end{align*}
But $t^\top X$ is a linear combination of the components of $X$ — precisely the object the definition of multivariate normality of $X$ controls. Since $X$ is multivariate normal, $t^\top X$ is univariate normal. The choice of $s$ was arbitrary, so every linear combination of the components of $Y$ is univariate normal, which is exactly the condition for $Y$ to be multivariate normal.
Why does this work? It is a one-line algebraic trick: a linear combination of a linear transform is itself a linear combination. The structure $A^\top: \mathbb{R}^m \to \mathbb{R}^n$ converts the "test vector" $s$ in the target space to a "test vector" $t$ in the source space, where the normality hypothesis already lives.
[/guided]
[/step]
[step:Compute the mean of $AX$ by linearity of expectation]
By the linearity of expectation applied componentwise,
\begin{align*}
\mathbb{E}[AX] = A\,\mathbb{E}[X] = A\mu.
\end{align*}
Explicitly, the $i$-th component is $\mathbb{E}\!\left[\sum_{j=1}^n A_{ij} X_j\right] = \sum_{j=1}^n A_{ij} \mathbb{E}[X_j] = \sum_{j=1}^n A_{ij} \mu_j = (A\mu)_i$.
[/step]
[step:Compute the covariance of $AX$ via the bilinearity of covariance]
The covariance matrix of $Y = AX$ is defined as
\begin{align*}
\operatorname{Cov}(Y) = \mathbb{E}\!\left[(Y - \mathbb{E}[Y])(Y - \mathbb{E}[Y])^\top\right].
\end{align*}
Substituting $Y - \mathbb{E}[Y] = AX - A\mu = A(X - \mu)$,
\begin{align*}
\operatorname{Cov}(AX) &= \mathbb{E}\!\left[A(X - \mu)(X - \mu)^\top A^\top\right] \\
&= A\,\mathbb{E}\!\left[(X - \mu)(X - \mu)^\top\right] A^\top \\
&= A \Sigma A^\top,
\end{align*}
where the second line uses linearity of expectation (the matrices $A$ and $A^\top$ are deterministic and factor outside the expectation), and the third uses the definition $\Sigma = \operatorname{Cov}(X)$.
Combining this with the previous step, $Y = AX \sim N_m(A\mu, A\Sigma A^\top)$.
[guided]
The covariance matrix of a random vector $Z: \Omega \to \mathbb{R}^m$ with mean $\nu = \mathbb{E}[Z]$ is defined entry-wise by
\begin{align*}
\operatorname{Cov}(Z)_{ij} = \mathbb{E}[(Z_i - \nu_i)(Z_j - \nu_j)],
\end{align*}
or compactly as the matrix $\operatorname{Cov}(Z) = \mathbb{E}[(Z - \nu)(Z - \nu)^\top]$, where the outer product $(Z - \nu)(Z - \nu)^\top$ is an $m \times m$ random matrix and the expectation is taken entry-wise.
We apply this to $Y = AX$ with mean $\mathbb{E}[Y] = A\mu$ (established in the previous step). The centred vector is
\begin{align*}
Y - A\mu = AX - A\mu = A(X - \mu),
\end{align*}
so its outer product is
\begin{align*}
(Y - A\mu)(Y - A\mu)^\top = A(X - \mu)\bigl[A(X - \mu)\bigr]^\top = A(X - \mu)(X - \mu)^\top A^\top.
\end{align*}
Taking expectations entry-wise, and using that the deterministic matrices $A, A^\top$ factor through the expectation (since $\mathbb{E}$ is linear and each entry of $A(\cdot)A^\top$ is a fixed linear combination of entries of the bracketed random matrix),
\begin{align*}
\operatorname{Cov}(Y) = A \,\mathbb{E}\!\left[(X - \mu)(X - \mu)^\top\right] A^\top = A \Sigma A^\top.
\end{align*}
This is the **sandwich formula** for transformed covariances: wrap the original covariance $\Sigma$ in $A$ on the left and $A^\top$ on the right. It is the matrix analogue of the scalar identity $\operatorname{Var}(aX) = a^2 \operatorname{Var}(X)$, with the two factors of $a$ generalising to $A$ on one side and $A^\top$ on the other.
Together with the identification of the mean as $A\mu$ and the multivariate normality established in the first step, the full distributional statement is $AX \sim N_m(A\mu, A\Sigma A^\top)$. A multivariate normal distribution is determined by its mean and covariance, so nothing more is needed.
[/guided]
[/step]
[step:Specialise to orthogonal $A$ with $\Sigma = \sigma^2 I_n$]
Now suppose $m = n$, $A \in \mathbb{R}^{n \times n}$ is orthogonal (i.e. $A A^\top = A^\top A = I_n$), and $\Sigma = \sigma^2 I_n$ for some $\sigma > 0$. By the general case proved above,
\begin{align*}
AX \sim N_n(A\mu,\; A(\sigma^2 I_n)A^\top).
\end{align*}
The covariance simplifies as
\begin{align*}
A(\sigma^2 I_n)A^\top = \sigma^2 A I_n A^\top = \sigma^2 A A^\top = \sigma^2 I_n,
\end{align*}
using the orthogonality $A A^\top = I_n$. Therefore $AX \sim N_n(A\mu, \sigma^2 I_n)$: the covariance structure is unchanged by an orthogonal transformation when $\Sigma$ is a scalar multiple of the identity.
In particular, if $X \sim N_n(\mathbf{0}, \sigma^2 I_n)$ and $A$ is orthogonal, then $AX \sim N_n(\mathbf{0}, \sigma^2 I_n)$ — so the components of $AX$ are i.i.d. $N(0, \sigma^2)$, exactly as the components of $X$ were. This is the distributional symmetry that underpins sampling-theoretic arguments such as the independence of $\bar{X}$ and $S^2$ for normal samples.
[/step]