[proofplan]
We work on the sample space $(\mathbb{R}^p)^n$ and write the joint density of the sample with respect to Lebesgue measure. The only algebraic point is to decompose the centered quadratic form into a within-sample covariance term and a sample-mean term. This shows that the likelihood ratio between any parameter value and a fixed reference parameter is measurable with respect to $\sigma(\bar{X},S)$. A conditional-expectation argument then proves sufficiency directly from the definition.
[/proofplan]
[step:Write the joint density on the product sample space]
Let
\begin{align*}
\Theta := \{(\mu,\Sigma) : \mu \in \mathbb{R}^p,\ \Sigma \in \mathbb{R}^{p \times p}\text{ is symmetric positive definite}\}
\end{align*}
denote the parameter space. Let
\begin{align*}
\Omega_0 := (\mathbb{R}^p)^n
\end{align*}
be the sample space, equipped with its Borel $\sigma$-algebra $\mathcal{B}(\Omega_0)$. For $\theta=(\mu,\Sigma)\in\Theta$, let $P_\theta$ denote the joint law of $(X_1,\dots,X_n)$ on $\Omega_0$.
Let $\mathcal{L}^{np}$ denote Lebesgue measure on $\Omega_0$. For a point
$x=(x_1,\dots,x_n)\in\Omega_0$, define the sample mean map
\begin{align*}
\bar{x}: \Omega_0 &\to \mathbb{R}^p \\
(x_1,\dots,x_n) &\mapsto \frac{1}{n}\sum_{i=1}^{n}x_i
\end{align*}
and the sample covariance map
\begin{align*}
s: \Omega_0 &\to \mathbb{R}^{p\times p} \\
(x_1,\dots,x_n) &\mapsto \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^\top .
\end{align*}
Then $P_\theta$ has density
\begin{align*}
f_\theta:\Omega_0 &\to [0,\infty) \\
x &\mapsto (2\pi)^{-np/2}(\det \Sigma)^{-n/2}
\exp\left(
-\frac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^\top\Sigma^{-1}(x_i-\mu)
\right)
\end{align*}
with respect to $\mathcal{L}^{np}$.
[guided]
We first place the statistical problem on a fixed measurable space. The observations form a point
$x=(x_1,\dots,x_n)$ in $\Omega_0=(\mathbb{R}^p)^n$, and all parameter values use the same dominating measure, namely Lebesgue measure $\mathcal{L}^{np}$ on this product space.
For each parameter $\theta=(\mu,\Sigma)$, where $\Sigma$ is symmetric positive definite, the inverse matrix $\Sigma^{-1}$ exists and $\det\Sigma>0$. Therefore the joint density of independent $\mathcal{N}_p(\mu,\Sigma)$ observations is the product of the $n$ multivariate normal densities:
\begin{align*}
f_\theta(x)
=
(2\pi)^{-np/2}(\det \Sigma)^{-n/2}
\exp\left(
-\frac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^\top\Sigma^{-1}(x_i-\mu)
\right).
\end{align*}
The statistic in the theorem is the measurable map
\begin{align*}
T:\Omega_0 &\to \mathbb{R}^p\times \mathbb{R}^{p\times p}\\
x &\mapsto (\bar{x},s(x)).
\end{align*}
Thus proving sufficiency means proving that the conditional law of the full sample given $T$ can be chosen independently of $\theta$.
[/guided]
[/step]
[step:Decompose the normal quadratic form through the sample mean and covariance]
For each $x=(x_1,\dots,x_n)\in\Omega_0$, we have
\begin{align*}
\sum_{i=1}^{n}(x_i-\mu)(x_i-\mu)^\top
=
\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^\top
+
n(\bar{x}-\mu)(\bar{x}-\mu)^\top .
\end{align*}
Indeed,
\begin{align*}
x_i-\mu=(x_i-\bar{x})+(\bar{x}-\mu),
\end{align*}
so expanding the outer product gives
\begin{align*}
\sum_{i=1}^{n}(x_i-\mu)(x_i-\mu)^\top
&=
\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^\top \\
&\quad+
\sum_{i=1}^{n}(x_i-\bar{x})(\bar{x}-\mu)^\top \\
&\quad+
\sum_{i=1}^{n}(\bar{x}-\mu)(x_i-\bar{x})^\top \\
&\quad+
\sum_{i=1}^{n}(\bar{x}-\mu)(\bar{x}-\mu)^\top .
\end{align*}
The two cross terms vanish because
\begin{align*}
\sum_{i=1}^{n}(x_i-\bar{x})=\sum_{i=1}^{n}x_i-n\bar{x}=0.
\end{align*}
Since
\begin{align*}
\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^\top=(n-1)s(x),
\end{align*}
we obtain
\begin{align*}
\sum_{i=1}^{n}(x_i-\mu)(x_i-\mu)^\top
=
(n-1)s(x)+n(\bar{x}-\mu)(\bar{x}-\mu)^\top .
\end{align*}
[guided]
The exponent in the normal likelihood contains a sum of quadratic forms. To expose the statistic $(\bar{x},s(x))$, we rewrite every centered observation around the sample mean:
\begin{align*}
x_i-\mu=(x_i-\bar{x})+(\bar{x}-\mu).
\end{align*}
Expanding the outer product and summing over $i$ gives
\begin{align*}
\sum_{i=1}^{n}(x_i-\mu)(x_i-\mu)^\top
&=
\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^\top \\
&\quad+
\left(\sum_{i=1}^{n}(x_i-\bar{x})\right)(\bar{x}-\mu)^\top \\
&\quad+
(\bar{x}-\mu)\left(\sum_{i=1}^{n}(x_i-\bar{x})\right)^\top \\
&\quad+
n(\bar{x}-\mu)(\bar{x}-\mu)^\top .
\end{align*}
The defining property of the sample mean is
\begin{align*}
\sum_{i=1}^{n}(x_i-\bar{x})=0.
\end{align*}
Therefore both cross terms vanish. The within-sample term is exactly $(n-1)s(x)$ by the definition of the sample covariance map. Hence
\begin{align*}
\sum_{i=1}^{n}(x_i-\mu)(x_i-\mu)^\top
=
(n-1)s(x)+n(\bar{x}-\mu)(\bar{x}-\mu)^\top .
\end{align*}
This is the algebraic reason no additional feature of the data enters the likelihood.
[/guided]
[/step]
[step:Factor the density through the statistic $(\bar{x},s(x))$]
For every symmetric positive definite matrix $\Sigma$, the identity
\begin{align*}
v^\top\Sigma^{-1}v=\operatorname{tr}\left(\Sigma^{-1}vv^\top\right)
\end{align*}
holds for every $v\in\mathbb{R}^p$. Applying this identity with $v=x_i-\mu$ and summing over $i$ gives
\begin{align*}
\sum_{i=1}^{n}(x_i-\mu)^\top\Sigma^{-1}(x_i-\mu)
=
\operatorname{tr}\left(
\Sigma^{-1}
\sum_{i=1}^{n}(x_i-\mu)(x_i-\mu)^\top
\right).
\end{align*}
Using the decomposition from the previous step,
\begin{align*}
\sum_{i=1}^{n}(x_i-\mu)^\top\Sigma^{-1}(x_i-\mu)
=
\operatorname{tr}\left(
\Sigma^{-1}
\left[(n-1)s(x)+n(\bar{x}-\mu)(\bar{x}-\mu)^\top\right]
\right).
\end{align*}
Define, for each $\theta=(\mu,\Sigma)\in\Theta$, the map
\begin{align*}
g_\theta:\mathbb{R}^p\times\mathbb{R}^{p\times p} &\to [0,\infty)\\
(m,A) &\mapsto
(2\pi)^{-np/2}(\det \Sigma)^{-n/2}
\exp\left(
-\frac{1}{2}\operatorname{tr}\left(
\Sigma^{-1}\left[(n-1)A+n(m-\mu)(m-\mu)^\top\right]
\right)
\right).
\end{align*}
Then
\begin{align*}
f_\theta(x)=g_\theta(\bar{x},s(x))
\end{align*}
for every $x\in\Omega_0$ and every $\theta\in\Theta$.
[guided]
We now translate the matrix decomposition into the scalar exponent of the density. For a vector $v\in\mathbb{R}^p$ and a symmetric positive definite matrix $\Sigma$, the scalar quadratic form can be written as a trace:
\begin{align*}
v^\top\Sigma^{-1}v=\operatorname{tr}(\Sigma^{-1}vv^\top).
\end{align*}
This identity is useful because the previous step decomposed the matrix sum of outer products.
Applying the trace identity to each vector $x_i-\mu$ gives
\begin{align*}
\sum_{i=1}^{n}(x_i-\mu)^\top\Sigma^{-1}(x_i-\mu)
=
\operatorname{tr}\left(
\Sigma^{-1}
\sum_{i=1}^{n}(x_i-\mu)(x_i-\mu)^\top
\right).
\end{align*}
Substituting the decomposition already proved yields
\begin{align*}
\sum_{i=1}^{n}(x_i-\mu)^\top\Sigma^{-1}(x_i-\mu)
=
\operatorname{tr}\left(
\Sigma^{-1}
\left[(n-1)s(x)+n(\bar{x}-\mu)(\bar{x}-\mu)^\top\right]
\right).
\end{align*}
Thus the density depends on the full sample $x$ only through the pair $(\bar{x},s(x))$. Formally, defining
\begin{align*}
g_\theta(m,A)
=
(2\pi)^{-np/2}(\det \Sigma)^{-n/2}
\exp\left(
-\frac{1}{2}\operatorname{tr}\left(
\Sigma^{-1}\left[(n-1)A+n(m-\mu)(m-\mu)^\top\right]
\right)
\right)
\end{align*}
for $(m,A)\in\mathbb{R}^p\times\mathbb{R}^{p\times p}$, we have
\begin{align*}
f_\theta(x)=g_\theta(\bar{x},s(x)).
\end{align*}
So every likelihood ratio between two parameter values is measurable with respect to the $\sigma$-algebra generated by $(\bar{x},s(x))$.
[/guided]
[/step]
[step:Use likelihood-ratio measurability to prove sufficiency]
Let $\theta_0=(0,I_p)\in\Theta$, where $I_p$ is the $p\times p$ identity matrix. Since $f_{\theta_0}(x)>0$ for every $x\in\Omega_0$, define the likelihood-ratio map
\begin{align*}
L_\theta:\Omega_0 &\to (0,\infty)\\
x &\mapsto \frac{f_\theta(x)}{f_{\theta_0}(x)}.
\end{align*}
Because each density factors through $T(x):=(\bar{x},s(x))$, the map $L_\theta$ is $\sigma(T)$-measurable.
Let $A\in\mathcal{B}(\Omega_0)$. Choose a version of the conditional expectation
\begin{align*}
K_A:\Omega_0 &\to [0,1]\\
x &\mapsto \mathbb{E}_{\theta_0}[\mathbb{1}_A\mid\sigma(T)](x).
\end{align*}
We claim that the same $K_A$ is a version of $\mathbb{E}_\theta[\mathbb{1}_A\mid\sigma(T)]$ for every $\theta\in\Theta$.
Indeed, for every $B\in\sigma(T)$, using $dP_\theta=L_\theta\,dP_{\theta_0}$ and the $\sigma(T)$-measurability of $L_\theta$ and $K_A$, we obtain
\begin{align*}
\int_B K_A\,dP_\theta
&=
\int_B K_A L_\theta\,dP_{\theta_0}\\
&=
\int_B L_\theta\,\mathbb{E}_{\theta_0}[\mathbb{1}_A\mid\sigma(T)]\,dP_{\theta_0}\\
&=
\int_B L_\theta\mathbb{1}_A\,dP_{\theta_0}\\
&=
\int_B \mathbb{1}_A\,dP_\theta.
\end{align*}
Thus $K_A$ is $\sigma(T)$-measurable and satisfies the defining property of
$\mathbb{E}_\theta[\mathbb{1}_A\mid\sigma(T)]$, independently of $\theta$. Hence $\sigma(T)$ is sufficient for $\theta$, and therefore $(\bar{X},S)$ is sufficient for $(\mu,\Sigma)$.
[guided]
To finish, we verify sufficiency directly from conditional expectations. Fix the reference parameter
\begin{align*}
\theta_0=(0,I_p).
\end{align*}
The corresponding density $f_{\theta_0}$ is strictly positive everywhere on $\Omega_0$, so for each parameter $\theta\in\Theta$ we can form the likelihood ratio
\begin{align*}
L_\theta(x)=\frac{f_\theta(x)}{f_{\theta_0}(x)}.
\end{align*}
The previous step shows that both $f_\theta$ and $f_{\theta_0}$ are functions of $T(x)=(\bar{x},s(x))$. Hence $L_\theta$ is also a function of $T(x)$, and therefore $L_\theta$ is $\sigma(T)$-measurable.
Now let $A$ be any Borel subset of $\Omega_0$. Under the reference law $P_{\theta_0}$, choose a version
\begin{align*}
K_A(x)=\mathbb{E}_{\theta_0}[\mathbb{1}_A\mid\sigma(T)](x).
\end{align*}
This function is $\sigma(T)$-measurable and takes values in $[0,1]$. We show that the same function works as the conditional probability of $A$ given $T$ under every parameter value.
Let $B\in\sigma(T)$. Since $P_\theta$ has Radon-Nikodym derivative $L_\theta$ with respect to $P_{\theta_0}$, we have
\begin{align*}
\int_B K_A\,dP_\theta
=
\int_B K_A L_\theta\,dP_{\theta_0}.
\end{align*}
Because $L_\theta$ is $\sigma(T)$-measurable, the product $K_A L_\theta$ is an admissible $\sigma(T)$-measurable multiplier in the defining identity for conditional expectation under $P_{\theta_0}$. Therefore
\begin{align*}
\int_B K_A L_\theta\,dP_{\theta_0}
=
\int_B L_\theta\mathbb{1}_A\,dP_{\theta_0}.
\end{align*}
Converting back from $P_{\theta_0}$ to $P_\theta$ gives
\begin{align*}
\int_B L_\theta\mathbb{1}_A\,dP_{\theta_0}
=
\int_B \mathbb{1}_A\,dP_\theta.
\end{align*}
Thus
\begin{align*}
\int_B K_A\,dP_\theta=\int_B \mathbb{1}_A\,dP_\theta
\end{align*}
for every $B\in\sigma(T)$. This is exactly the defining property of
$\mathbb{E}_\theta[\mathbb{1}_A\mid\sigma(T)]$. Since the version $K_A$ was chosen under the fixed reference law and does not depend on $\theta$, the conditional distribution of the full sample given $(\bar{X},S)$ is independent of $(\mu,\Sigma)$. Therefore $(\bar{X},S)$ is sufficient for $(\mu,\Sigma)$.
[/guided]
[/step]