[proofplan]
The proof starts from the exact finite-sample normal equation for OLS on the event where the sample second-moment matrix is invertible. The sample second-moment matrix converges in probability to $Q$, so its inverse converges in probability to $Q^{-1}$ and the invertibility event has probability tending to $1$. The score term $n^{-1/2}\sum_{i=1}^n X_i u_i$ converges in distribution to a centered Gaussian vector with covariance $\Omega$ by the [multivariate central limit theorem](/theorems/1854). Slutsky's theorem then gives the covariance transformation by $Q^{-1}$ on both sides.
[/proofplan]
[step:Derive the exact OLS expansion on the invertibility event]
For each $n \in \mathbb{N}$, define the sample second-moment matrix $S_n: \Omega \to \mathbb{R}^{p \times p}$ and the sample score vector $T_n: \Omega \to \mathbb{R}^p$ by
\begin{align*}
S_n := \frac{1}{n}\sum_{i=1}^n X_i X_i^\top,
\qquad
T_n := \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i u_i.
\end{align*}
On $A_n$, the matrix $\sum_{i=1}^n X_iX_i^\top = nS_n$ is invertible. Using $Y_i = X_i^\top \beta_0 + u_i$, we compute
\begin{align*}
\hat{\beta}_n
&=
\left(\sum_{i=1}^n X_i X_i^\top\right)^{-1}
\left(\sum_{i=1}^n X_i(X_i^\top \beta_0 + u_i)\right) \\
&=
\left(\sum_{i=1}^n X_i X_i^\top\right)^{-1}
\left(\sum_{i=1}^n X_i X_i^\top\right)\beta_0
+
\left(\sum_{i=1}^n X_i X_i^\top\right)^{-1}
\left(\sum_{i=1}^n X_i u_i\right) \\
&=
\beta_0
+
S_n^{-1}\left(\frac{1}{n}\sum_{i=1}^n X_i u_i\right).
\end{align*}
Multiplying by $\sqrt{n}$ gives, on $A_n$,
\begin{align*}
\sqrt{n}(\hat{\beta}_n - \beta_0) = S_n^{-1}T_n.
\end{align*}
[guided]
The goal of this step is to isolate the two asymptotic objects: the design matrix average and the score average. Define
\begin{align*}
S_n := \frac{1}{n}\sum_{i=1}^n X_i X_i^\top,
\qquad
T_n := \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i u_i.
\end{align*}
Here $S_n$ is an $\mathbb{R}^{p \times p}$-valued random matrix and $T_n$ is an $\mathbb{R}^p$-valued random vector. On the event $A_n$, the matrix $\sum_{i=1}^n X_iX_i^\top$ is invertible, so the closed-form OLS expression is valid.
Substituting the model equation $Y_i = X_i^\top \beta_0 + u_i$ into the OLS formula gives
\begin{align*}
\hat{\beta}_n
&=
\left(\sum_{i=1}^n X_i X_i^\top\right)^{-1}
\left(\sum_{i=1}^n X_i(X_i^\top \beta_0 + u_i)\right) \\
&=
\left(\sum_{i=1}^n X_i X_i^\top\right)^{-1}
\left(\sum_{i=1}^n X_i X_i^\top\right)\beta_0
+
\left(\sum_{i=1}^n X_i X_i^\top\right)^{-1}
\left(\sum_{i=1}^n X_i u_i\right).
\end{align*}
The first term equals $\beta_0$ because the matrix is invertible on $A_n$. Therefore
\begin{align*}
\hat{\beta}_n - \beta_0
=
\left(\sum_{i=1}^n X_i X_i^\top\right)^{-1}
\left(\sum_{i=1}^n X_i u_i\right).
\end{align*}
Since $\sum_{i=1}^n X_iX_i^\top = nS_n$, its inverse is $n^{-1}S_n^{-1}$. Hence
\begin{align*}
\sqrt{n}(\hat{\beta}_n - \beta_0)
=
S_n^{-1}
\left(\frac{1}{\sqrt{n}}\sum_{i=1}^n X_i u_i\right)
=
S_n^{-1}T_n.
\end{align*}
This identity is exact on $A_n$; no limiting approximation has been used yet.
[/guided]
[/step]
[step:Show the sample second-moment matrix converges to $Q$ and is eventually invertible with high probability]
For $1 \le j,k \le p$, the scalar random variables $X_{ij}X_{ik}$ are i.i.d. and integrable because
\begin{align*}
|X_{ij}X_{ik}| \le |X_i|^2,
\qquad
\mathbb{E}[|X_i|^2] = \operatorname{tr}(Q) < \infty.
\end{align*}
By the [weak law of large numbers](/theorems/1127) for each coordinate entry (citing a result not yet in the wiki: [Weak Law of Large Numbers](/theorems/1851)),
\begin{align*}
(S_n)_{jk}
=
\frac{1}{n}\sum_{i=1}^n X_{ij}X_{ik}
\xrightarrow{\mathbb{P}}
Q_{jk}.
\end{align*}
Since $p$ is fixed, entrywise convergence implies convergence in probability in any matrix norm:
\begin{align*}
S_n \xrightarrow{\mathbb{P}} Q.
\end{align*}
Because $Q$ is positive definite, it is invertible. The determinant map $\det: \mathbb{R}^{p \times p} \to \mathbb{R}$ and the inverse map on $GL(p,\mathbb{R})$ are continuous. Therefore,
\begin{align*}
\mathbb{P}(A_n) \to 1,
\qquad
S_n^{-1}\mathbb{1}_{A_n} \xrightarrow{\mathbb{P}} Q^{-1}.
\end{align*}
Equivalently, after redefining $S_n^{-1}$ arbitrarily on $A_n^c$, we have
\begin{align*}
S_n^{-1} \xrightarrow{\mathbb{P}} Q^{-1}.
\end{align*}
[guided]
We next control the random matrix multiplying the score term. For $1 \le j,k \le p$, let $X_{ij}$ denote the $j$-th component of $X_i$. The $(j,k)$ entry of $S_n$ is
\begin{align*}
(S_n)_{jk} = \frac{1}{n}\sum_{i=1}^n X_{ij}X_{ik}.
\end{align*}
These summands are i.i.d. because the random vectors $X_i$ are i.i.d. They are also integrable: by the [Cauchy-Schwarz inequality](/theorems/432) in $\mathbb{R}^p$,
\begin{align*}
|X_{ij}X_{ik}| \le |X_i|^2.
\end{align*}
Moreover
\begin{align*}
\mathbb{E}[|X_i|^2]
=
\sum_{j=1}^p \mathbb{E}[X_{ij}^2]
=
\operatorname{tr}(Q)
<
\infty,
\end{align*}
because $Q=\mathbb{E}[X_iX_i^\top]$ is a finite positive definite matrix.
The weak law of large numbers applies to each scalar sequence $(X_{ij}X_{ik})_{i=1}^\infty$ (citing a result not yet in the wiki: Weak Law of Large Numbers). Thus
\begin{align*}
(S_n)_{jk}
\xrightarrow{\mathbb{P}}
\mathbb{E}[X_{ij}X_{ik}]
=
Q_{jk}.
\end{align*}
Since $p$ is fixed, there are only finitely many entries. Entrywise convergence therefore implies convergence of the whole matrix in probability:
\begin{align*}
S_n \xrightarrow{\mathbb{P}} Q.
\end{align*}
Because $Q$ is positive definite, all its eigenvalues are positive, so $Q$ is invertible and $\det Q \neq 0$. The determinant function is continuous on $\mathbb{R}^{p \times p}$, so
\begin{align*}
\det(S_n) \xrightarrow{\mathbb{P}} \det(Q).
\end{align*}
Since $\det(Q) \neq 0$, this implies
\begin{align*}
\mathbb{P}(\det(S_n) \neq 0) \to 1.
\end{align*}
But $\det(S_n) \neq 0$ is exactly the event $A_n$, because multiplying by $n$ does not affect invertibility. Hence $\mathbb{P}(A_n) \to 1$.
Finally, the inverse map $M \mapsto M^{-1}$ is continuous on the [open set](/page/Open%20Set) $GL(p,\mathbb{R})$ of invertible matrices. Therefore the [continuous mapping theorem](/theorems/1847) gives
\begin{align*}
S_n^{-1} \xrightarrow{\mathbb{P}} Q^{-1},
\end{align*}
with any arbitrary definition on $A_n^c$, since $\mathbb{P}(A_n^c)\to 0$.
[/guided]
[/step]
[step:Apply the multivariate central limit theorem to the score term]
Define the score summand $Z_i: \Omega \to \mathbb{R}^p$ by
\begin{align*}
Z_i := X_i u_i.
\end{align*}
The random vectors $(Z_i)_{i=1}^{\infty}$ are i.i.d. Since $\mathbb{E}[X_i u_i]=0$, we have $\mathbb{E}[Z_i]=0$. Also,
\begin{align*}
\mathbb{E}[|Z_i|^2]
=
\mathbb{E}[u_i^2|X_i|^2]
<
\infty,
\end{align*}
so $Z_i$ has finite second moment. Its covariance matrix is
\begin{align*}
\operatorname{Var}(Z_i)
=
\mathbb{E}[Z_iZ_i^\top]
=
\mathbb{E}[u_i^2X_iX_i^\top]
=
\Omega.
\end{align*}
By the multivariate [central limit theorem](/theorems/1848) (citing a result not yet in the wiki: Multivariate [Central Limit Theorem](/theorems/521)),
\begin{align*}
T_n
=
\frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i
\xrightarrow{d}
\mathcal{N}(0,\Omega).
\end{align*}
[guided]
The score term is
\begin{align*}
T_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i u_i.
\end{align*}
To put this in the standard central limit theorem form, define
\begin{align*}
Z_i := X_i u_i.
\end{align*}
Then $Z_i: \Omega \to \mathbb{R}^p$ is a random vector, and
\begin{align*}
T_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i.
\end{align*}
We verify the hypotheses of the multivariate central limit theorem. First, the $Z_i$ are i.i.d. because each $Z_i$ is a measurable function of the i.i.d. pair $(X_i,u_i)$. Second, the mean is zero:
\begin{align*}
\mathbb{E}[Z_i] = \mathbb{E}[X_i u_i] = 0.
\end{align*}
Third, $Z_i$ has finite second moment because
\begin{align*}
|Z_i|^2 = |X_i u_i|^2 = u_i^2 |X_i|^2
\end{align*}
and the theorem assumes
\begin{align*}
\mathbb{E}[u_i^2|X_i|^2] < \infty.
\end{align*}
The covariance matrix is therefore well-defined and equals
\begin{align*}
\operatorname{Var}(Z_i)
&=
\mathbb{E}[(Z_i-\mathbb{E}[Z_i])(Z_i-\mathbb{E}[Z_i])^\top] \\
&=
\mathbb{E}[Z_iZ_i^\top] \\
&=
\mathbb{E}[(X_i u_i)(X_i u_i)^\top] \\
&=
\mathbb{E}[u_i^2X_iX_i^\top]
=
\Omega.
\end{align*}
The multivariate central limit theorem now applies (citing a result not yet in the wiki: Multivariate Central Limit Theorem), giving
\begin{align*}
T_n
=
\frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i
\xrightarrow{d}
\mathcal{N}(0,\Omega).
\end{align*}
[/guided]
[/step]
[step:Combine the matrix and score limits by Slutsky's theorem]
From the preceding steps,
\begin{align*}
S_n^{-1} \xrightarrow{\mathbb{P}} Q^{-1},
\qquad
T_n \xrightarrow{d} \mathcal{N}(0,\Omega).
\end{align*}
By Slutsky's theorem for products of random matrices and random vectors (citing a result not yet in the wiki: Slutsky's Theorem),
\begin{align*}
S_n^{-1}T_n
\xrightarrow{d}
Q^{-1}Z,
\end{align*}
where $Z \sim \mathcal{N}(0,\Omega)$. Since linear transformations of Gaussian random vectors are Gaussian,
\begin{align*}
Q^{-1}Z \sim \mathcal{N}(0,Q^{-1}\Omega(Q^{-1})^\top).
\end{align*}
The matrix $Q$ is symmetric because $Q=\mathbb{E}[X_iX_i^\top]$, hence $Q^{-1}$ is symmetric. Therefore
\begin{align*}
Q^{-1}\Omega(Q^{-1})^\top = Q^{-1}\Omega Q^{-1}.
\end{align*}
Using the exact expansion on $A_n$ and $\mathbb{P}(A_n)\to 1$, changing the estimator arbitrarily on $A_n^c$ does not affect the limiting distribution. Hence
\begin{align*}
\sqrt{n}(\hat{\beta}_n-\beta_0)
\xrightarrow{d}
\mathcal{N}(0,Q^{-1}\Omega Q^{-1}).
\end{align*}
This proves the theorem.
[guided]
We now combine the two limits. The previous steps established
\begin{align*}
S_n^{-1} \xrightarrow{\mathbb{P}} Q^{-1}
\end{align*}
and
\begin{align*}
T_n \xrightarrow{d} \mathcal{N}(0,\Omega).
\end{align*}
The first convergence is in probability to a non-random matrix, while the second convergence is in distribution to a Gaussian random vector. Slutsky's theorem applies exactly to this situation (citing a result not yet in the wiki: Slutsky's Theorem): multiplying a sequence converging in distribution by a sequence converging in probability to a constant matrix preserves the distributional limit with that constant matrix applied. Therefore
\begin{align*}
S_n^{-1}T_n
\xrightarrow{d}
Q^{-1}Z,
\end{align*}
where $Z$ is a random vector with distribution $\mathcal{N}(0,\Omega)$.
It remains to identify the covariance matrix of $Q^{-1}Z$. A linear transformation of a Gaussian random vector is Gaussian. Its mean is
\begin{align*}
\mathbb{E}[Q^{-1}Z] = Q^{-1}\mathbb{E}[Z] = 0,
\end{align*}
and its covariance matrix is
\begin{align*}
\operatorname{Var}(Q^{-1}Z)
&=
\mathbb{E}[(Q^{-1}Z)(Q^{-1}Z)^\top] \\
&=
Q^{-1}\mathbb{E}[ZZ^\top](Q^{-1})^\top \\
&=
Q^{-1}\Omega(Q^{-1})^\top.
\end{align*}
Since $Q=\mathbb{E}[X_iX_i^\top]$ is symmetric, its inverse $Q^{-1}$ is also symmetric, so
\begin{align*}
Q^{-1}\Omega(Q^{-1})^\top
=
Q^{-1}\Omega Q^{-1}.
\end{align*}
Finally, the identity
\begin{align*}
\sqrt{n}(\hat{\beta}_n-\beta_0)=S_n^{-1}T_n
\end{align*}
was derived on $A_n$, and we proved $\mathbb{P}(A_n)\to 1$. Since modifying a sequence of random vectors on events whose probabilities tend to zero does not change its limiting distribution, the arbitrary extension of $\hat{\beta}_n$ on $A_n^c$ is asymptotically irrelevant. Thus
\begin{align*}
\sqrt{n}(\hat{\beta}_n-\beta_0)
\xrightarrow{d}
\mathcal{N}(0,Q^{-1}\Omega Q^{-1}),
\end{align*}
which is precisely the claimed asymptotic normality of random design OLS.
[/guided]
[/step]