[proofplan]
We prove the two convergences component by component. The scalar [strong law of large numbers](/theorems/1852) gives almost sure convergence of each coordinate average of $X_i$ to the corresponding coordinate of $\mu$. Applying the same law to the integrable scalar variables $(X_i)_j(X_i)_k$ gives convergence of the empirical second moment matrix to $\Sigma+\mu\mu^\top$. Finally, we use the algebraic identity expressing the MLE covariance as empirical second moment minus $\hat{\mu}_n\hat{\mu}_n^\top$, and pass to the limit entrywise.
[/proofplan]
[step:Apply the scalar strong law to each coordinate of the sample mean]
For $i \in \mathbb{N}$ and $1 \le j \le p$, write $(X_i)_j$ for the $j$-th coordinate of $X_i$. Since $X_i \sim \mathcal{N}_p(\mu,\Sigma)$, each coordinate $(X_i)_j$ is an integrable real-valued random variable and
\begin{align*}
\mathbb{E}[(X_1)_j] = \mu_j.
\end{align*}
For each fixed $j$, the sequence $((X_i)_j)_{i \in \mathbb{N}}$ is i.i.d. Therefore, by the scalar [Strong Law of Large Numbers](/theorems/520) (citing a result not yet in the wiki: Strong Law of Large Numbers),
\begin{align*}
\frac{1}{n}\sum_{i=1}^n (X_i)_j \xrightarrow{a.s.} \mu_j.
\end{align*}
Because the index set $\{1,\dots,p\}$ is finite, the intersection of these $p$ almost sure events still has probability $1$. On that event, every coordinate of $\hat{\mu}_n$ converges to the corresponding coordinate of $\mu$, hence
\begin{align*}
\hat{\mu}_n \xrightarrow{a.s.} \mu.
\end{align*}
[guided]
We first reduce the vector convergence to finitely many scalar convergences. For $i \in \mathbb{N}$ and $1 \le j \le p$, let $(X_i)_j$ denote the $j$-th coordinate of the random vector $X_i$. Since $X_i$ has multivariate normal distribution $\mathcal{N}_p(\mu,\Sigma)$, its $j$-th coordinate is a real-valued normal random variable with finite first moment and mean $\mu_j$:
\begin{align*}
\mathbb{E}[(X_1)_j] = \mu_j.
\end{align*}
For a fixed coordinate $j$, the random variables $((X_i)_j)_{i \in \mathbb{N}}$ are independent and identically distributed, because the vectors $(X_i)_{i \in \mathbb{N}}$ are independent and identically distributed and coordinate projection preserves independence and identical distribution. The scalar Strong Law of Large Numbers applies to this integrable i.i.d. sequence, so
\begin{align*}
\frac{1}{n}\sum_{i=1}^n (X_i)_j \xrightarrow{a.s.} \mathbb{E}[(X_1)_j] = \mu_j.
\end{align*}
This argument gives one almost sure event for each coordinate. Since there are only finitely many coordinates, the intersection of these events still has probability $1$. On that intersection, all coordinate averages converge simultaneously:
\begin{align*}
(\hat{\mu}_n)_j = \frac{1}{n}\sum_{i=1}^n (X_i)_j \to \mu_j
\end{align*}
for every $1 \le j \le p$. This is precisely $\hat{\mu}_n \xrightarrow{a.s.} \mu$ in $\mathbb{R}^p$.
[/guided]
[/step]
[step:Apply the scalar strong law to each entry of the empirical second moment matrix]
For $1 \le j,k \le p$, define the real-valued random variables
\begin{align*}
Y_{i,jk} : \Omega &\to \mathbb{R}\\
\omega &\mapsto (X_i(\omega))_j (X_i(\omega))_k .
\end{align*}
The variables $(Y_{i,jk})_{i \in \mathbb{N}}$ are i.i.d. for each fixed pair $(j,k)$. They are integrable because, by Cauchy-Schwarz,
\begin{align*}
\mathbb{E}[|Y_{1,jk}|]
\le \mathbb{E}[(X_1)_j^2]^{1/2}\mathbb{E}[(X_1)_k^2]^{1/2}
< \infty,
\end{align*}
and all second moments of a multivariate normal vector are finite. By the scalar Strong Law of Large Numbers,
\begin{align*}
\frac{1}{n}\sum_{i=1}^n (X_i)_j(X_i)_k
\xrightarrow{a.s.}
\mathbb{E}[(X_1)_j(X_1)_k].
\end{align*}
By the definition of covariance,
\begin{align*}
\Sigma_{jk}
=
\mathbb{E}\bigl[((X_1)_j-\mu_j)((X_1)_k-\mu_k)\bigr]
=
\mathbb{E}[(X_1)_j(X_1)_k]-\mu_j\mu_k,
\end{align*}
so
\begin{align*}
\mathbb{E}[(X_1)_j(X_1)_k] = \Sigma_{jk}+\mu_j\mu_k.
\end{align*}
Since there are finitely many pairs $(j,k)$, these convergences hold simultaneously almost surely for all $1 \le j,k \le p$. Therefore, if
\begin{align*}
M_n := \frac{1}{n}\sum_{i=1}^n X_iX_i^\top,
\end{align*}
then
\begin{align*}
M_n \xrightarrow{a.s.} \Sigma+\mu\mu^\top
\end{align*}
entrywise.
[guided]
The covariance estimator is most easily handled by first controlling the empirical second moment matrix. For $1 \le j,k \le p$, define
\begin{align*}
Y_{i,jk} : \Omega &\to \mathbb{R}\\
\omega &\mapsto (X_i(\omega))_j (X_i(\omega))_k .
\end{align*}
This is the scalar random variable that gives the $(j,k)$-entry of $X_iX_i^\top$.
For each fixed pair $(j,k)$, the variables $(Y_{i,jk})_{i \in \mathbb{N}}$ are independent and identically distributed because each $Y_{i,jk}$ is obtained from $X_i$ by the same measurable function $x \mapsto x_jx_k$. To apply the scalar Strong Law of Large Numbers, we must check integrability. By Cauchy-Schwarz,
\begin{align*}
\mathbb{E}[|Y_{1,jk}|]
=
\mathbb{E}[|(X_1)_j(X_1)_k|]
\le
\mathbb{E}[(X_1)_j^2]^{1/2}
\mathbb{E}[(X_1)_k^2]^{1/2}.
\end{align*}
A multivariate normal random vector has finite second moments in every coordinate, so the right-hand side is finite. Thus the strong law applies and gives
\begin{align*}
\frac{1}{n}\sum_{i=1}^n (X_i)_j(X_i)_k
\xrightarrow{a.s.}
\mathbb{E}[(X_1)_j(X_1)_k].
\end{align*}
We now identify the limit. By definition, the covariance matrix $\Sigma$ satisfies
\begin{align*}
\Sigma_{jk}
=
\mathbb{E}\bigl[((X_1)_j-\mu_j)((X_1)_k-\mu_k)\bigr].
\end{align*}
Expanding the product and using $\mathbb{E}[(X_1)_j]=\mu_j$ and $\mathbb{E}[(X_1)_k]=\mu_k$, we obtain
\begin{align*}
\Sigma_{jk}
&=
\mathbb{E}[(X_1)_j(X_1)_k]
-\mu_j\mathbb{E}[(X_1)_k]
-\mu_k\mathbb{E}[(X_1)_j]
+\mu_j\mu_k\\
&=
\mathbb{E}[(X_1)_j(X_1)_k]-\mu_j\mu_k.
\end{align*}
Hence
\begin{align*}
\mathbb{E}[(X_1)_j(X_1)_k] = \Sigma_{jk}+\mu_j\mu_k.
\end{align*}
There are only $p^2$ pairs $(j,k)$. Taking the intersection of the corresponding almost sure events, we obtain simultaneous convergence of all entries. Therefore the empirical second moment matrix
\begin{align*}
M_n := \frac{1}{n}\sum_{i=1}^n X_iX_i^\top
\end{align*}
satisfies
\begin{align*}
M_n \xrightarrow{a.s.} \Sigma+\mu\mu^\top
\end{align*}
entrywise.
[/guided]
[/step]
[step:Rewrite the covariance estimator using empirical second moments]
For each $n \in \mathbb{N}$,
\begin{align*}
\hat{\Sigma}_n
&=
\frac{1}{n}\sum_{i=1}^n (X_i-\hat{\mu}_n)(X_i-\hat{\mu}_n)^\top\\
&=
\frac{1}{n}\sum_{i=1}^n X_iX_i^\top
-\frac{1}{n}\sum_{i=1}^n X_i\hat{\mu}_n^\top
-\frac{1}{n}\sum_{i=1}^n \hat{\mu}_nX_i^\top
+\frac{1}{n}\sum_{i=1}^n \hat{\mu}_n\hat{\mu}_n^\top.
\end{align*}
Since $\hat{\mu}_n$ does not depend on the summation index $i$ and $\frac{1}{n}\sum_{i=1}^n X_i=\hat{\mu}_n$, the middle and final terms become
\begin{align*}
\frac{1}{n}\sum_{i=1}^n X_i\hat{\mu}_n^\top &= \hat{\mu}_n\hat{\mu}_n^\top,\\
\frac{1}{n}\sum_{i=1}^n \hat{\mu}_nX_i^\top &= \hat{\mu}_n\hat{\mu}_n^\top,\\
\frac{1}{n}\sum_{i=1}^n \hat{\mu}_n\hat{\mu}_n^\top &= \hat{\mu}_n\hat{\mu}_n^\top.
\end{align*}
Therefore
\begin{align*}
\hat{\Sigma}_n = M_n-\hat{\mu}_n\hat{\mu}_n^\top.
\end{align*}
[/step]
[step:Pass to the limit in the covariance identity]
Work on the probability-one event on which both
\begin{align*}
\hat{\mu}_n &\to \mu, &
M_n &\to \Sigma+\mu\mu^\top
\end{align*}
hold entrywise. For each $1 \le j,k \le p$, the $(j,k)$-entry of $\hat{\mu}_n\hat{\mu}_n^\top$ is $(\hat{\mu}_n)_j(\hat{\mu}_n)_k$. Since products of convergent real sequences converge to the product of their limits,
\begin{align*}
(\hat{\mu}_n)_j(\hat{\mu}_n)_k \to \mu_j\mu_k.
\end{align*}
Thus
\begin{align*}
\hat{\mu}_n\hat{\mu}_n^\top \to \mu\mu^\top
\end{align*}
entrywise. Using the identity from the previous step,
\begin{align*}
\hat{\Sigma}_n
=
M_n-\hat{\mu}_n\hat{\mu}_n^\top
\to
(\Sigma+\mu\mu^\top)-\mu\mu^\top
=
\Sigma
\end{align*}
entrywise. Hence
\begin{align*}
\hat{\Sigma}_n \xrightarrow{a.s.} \Sigma.
\end{align*}
Together with $\hat{\mu}_n \xrightarrow{a.s.} \mu$, this proves the claimed strong consistency of both maximum likelihood estimators.
[/step]