[proofplan]
We first separate the residual covariance matrix into the infeasible covariance matrix using the true errors and an error term caused by replacing $u_i$ with $\hat{u}_i$. The infeasible matrix converges to $\Omega$ by the [weak law of large numbers](/theorems/1127), while the residual replacement term is negligible because $\hat{\beta}_n \xrightarrow{\mathbb{P}} \beta_0$ and the fourth-moment assumptions control all fixed-dimensional coordinate averages. We then combine this convergence with the convergence of $(n^{-1}X^\top X)^{-1}$ to $Q^{-1}$. The HC1 result follows because its degrees-of-freedom multiplier converges to $1$ when $p$ is fixed.
[/proofplan]
custom_env
admin
[step:Express the residuals through the estimation error]Define the estimation error vector $\Delta_n \in \mathbb{R}^p$ by
\begin{align*}
\Delta_n := \hat{\beta}_n - \beta_0.
\end{align*}
For each $i \in \{1,\dots,n\}$, using $Y_i = X_i^\top \beta_0 + u_i$ gives
\begin{align*}
\hat{u}_i
= Y_i - X_i^\top \hat{\beta}_n
= u_i - X_i^\top \Delta_n.
\end{align*}
Hence
\begin{align*}
\hat{u}_i^2 - u_i^2
= -2u_i X_i^\top \Delta_n + (X_i^\top \Delta_n)^2.
\end{align*}
Therefore
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2 X_iX_i^\top
=
\frac{1}{n}\sum_{i=1}^{n}u_i^2X_iX_i^\top
+
R_n,
\end{align*}
where the residual replacement matrix $R_n \in \mathbb{R}^{p \times p}$ is
\begin{align*}
R_n
:=
\frac{1}{n}\sum_{i=1}^{n}
(\hat{u}_i^2-u_i^2)X_iX_i^\top.
\end{align*}[/step]
custom_env
admin
[guided]The first task is to isolate exactly where the feasible estimator differs from the infeasible one. Define
\begin{align*}
\Delta_n := \hat{\beta}_n - \beta_0.
\end{align*}
This is a random vector in $\mathbb{R}^p$. Since the model is $Y_i = X_i^\top \beta_0 + u_i$, the residual computed with $\hat{\beta}_n$ satisfies
\begin{align*}
\hat{u}_i
= Y_i - X_i^\top \hat{\beta}_n
= X_i^\top \beta_0 + u_i - X_i^\top \hat{\beta}_n
= u_i - X_i^\top(\hat{\beta}_n-\beta_0)
= u_i - X_i^\top\Delta_n.
\end{align*}
Squaring gives
\begin{align*}
\hat{u}_i^2
= u_i^2 - 2u_iX_i^\top\Delta_n + (X_i^\top\Delta_n)^2,
\end{align*}
so
\begin{align*}
\hat{u}_i^2-u_i^2
= -2u_iX_i^\top\Delta_n + (X_i^\top\Delta_n)^2.
\end{align*}
Thus the feasible middle matrix decomposes as
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2 X_iX_i^\top
=
\frac{1}{n}\sum_{i=1}^{n}u_i^2X_iX_i^\top
+
\frac{1}{n}\sum_{i=1}^{n}(\hat{u}_i^2-u_i^2)X_iX_i^\top.
\end{align*}
We call the second term $R_n$. The rest of the proof shows that the first term converges to $\Omega$ and $R_n$ converges to the zero matrix in probability.[/guided]
custom_env
admin
[step:Show the infeasible covariance matrix converges to $\Omega$]For $j,k \in \{1,\dots,p\}$, define the real-valued random variable
\begin{align*}
Z_{i,jk} := u_i^2 X_{ij}X_{ik}.
\end{align*}
By Cauchy-Schwarz,
\begin{align*}
\mathbb{E}[|Z_{i,jk}|]
\leq
\left(\mathbb{E}[u_i^4]\right)^{1/2}
\left(\mathbb{E}[X_{ij}^2X_{ik}^2]\right)^{1/2}.
\end{align*}
Since $|X_{ij}| \leq |X_i|$ and $|X_{ik}| \leq |X_i|$,
\begin{align*}
X_{ij}^2X_{ik}^2 \leq |X_i|^4,
\end{align*}
and therefore $\mathbb{E}[|Z_{i,jk}|] < \infty$. By the [weak law of large numbers](/theorems/1851) for i.i.d. integrable random variables (citing a result not yet in the wiki: weak law of large numbers),
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n} Z_{i,jk}
\xrightarrow{\mathbb{P}}
\mathbb{E}[Z_{i,jk}]
=
\Omega_{jk}.
\end{align*}
Because $p$ is fixed, entrywise convergence over the finitely many pairs $(j,k)$ implies
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n}u_i^2X_iX_i^\top
\xrightarrow{\mathbb{P}}
\Omega.
\end{align*}[/step]
custom_env
admin
[guided]We prove convergence entry by entry. Fix indices $j,k \in \{1,\dots,p\}$ and define
\begin{align*}
Z_{i,jk} := u_i^2 X_{ij}X_{ik}.
\end{align*}
This is the $(j,k)$ entry of the matrix $u_i^2X_iX_i^\top$. To apply the weak law of large numbers, we need integrability of $Z_{i,jk}$. The fourth-moment assumptions give it. Indeed, by Cauchy-Schwarz applied to $|u_i|^2$ and $|X_{ij}X_{ik}|$,
\begin{align*}
\mathbb{E}[|Z_{i,jk}|]
=
\mathbb{E}[u_i^2|X_{ij}X_{ik}|]
\leq
\left(\mathbb{E}[u_i^4]\right)^{1/2}
\left(\mathbb{E}[X_{ij}^2X_{ik}^2]\right)^{1/2}.
\end{align*}
Since each coordinate is bounded in absolute value by the Euclidean norm,
\begin{align*}
X_{ij}^2X_{ik}^2 \leq |X_i|^4.
\end{align*}
Thus $\mathbb{E}[X_{ij}^2X_{ik}^2] < \infty$, and hence $\mathbb{E}[|Z_{i,jk}|] < \infty$.
The random variables $Z_{1,jk}, Z_{2,jk}, \dots$ are i.i.d. because the observations are i.i.d. Therefore the weak law of large numbers for i.i.d. integrable random variables gives
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n} Z_{i,jk}
\xrightarrow{\mathbb{P}}
\mathbb{E}[Z_{i,jk}]
=
\mathbb{E}[u_i^2X_{ij}X_{ik}]
=
\Omega_{jk}.
\end{align*}
There are only $p^2$ entries, and $p$ is fixed. A finite union bound therefore upgrades entrywise convergence to convergence of the whole matrix:
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n}u_i^2X_iX_i^\top
\xrightarrow{\mathbb{P}}
\Omega.
\end{align*}[/guided]
custom_env
admin
[step:Bound the residual replacement matrix by fixed-dimensional moment averages]For $j,k \in \{1,\dots,p\}$, the $(j,k)$ entry of $R_n$ satisfies
\begin{align*}
|(R_n)_{jk}|
&\leq
2\sum_{\ell=1}^{p}|\Delta_{n,\ell}|
\left(\frac{1}{n}\sum_{i=1}^{n}|u_iX_{i\ell}X_{ij}X_{ik}|\right)
\\
&\quad+
\sum_{\ell=1}^{p}\sum_{m=1}^{p}
|\Delta_{n,\ell}\Delta_{n,m}|
\left(\frac{1}{n}\sum_{i=1}^{n}|X_{i\ell}X_{im}X_{ij}X_{ik}|\right).
\end{align*}
For each fixed quadruple of indices, Hölder's inequality with four exponents all equal to $4$ gives
\begin{align*}
\mathbb{E}[|u_iX_{i\ell}X_{ij}X_{ik}|]
\leq
\left(\mathbb{E}[u_i^4]\right)^{1/4}
\left(\mathbb{E}[X_{i\ell}^4]\right)^{1/4}
\left(\mathbb{E}[X_{ij}^4]\right)^{1/4}
\left(\mathbb{E}[X_{ik}^4]\right)^{1/4}
<\infty,
\end{align*}
and similarly
\begin{align*}
\mathbb{E}[|X_{i\ell}X_{im}X_{ij}X_{ik}|]
\leq
\prod_{r \in \{\ell,m,j,k\}}
\left(\mathbb{E}[X_{ir}^4]\right)^{1/4}
<\infty.
\end{align*}
The weak law of large numbers therefore implies that every average appearing in the displayed bound is $O_{\mathbb{P}}(1)$. Since $p$ is fixed and $\Delta_n \xrightarrow{\mathbb{P}} 0$, it follows that $(R_n)_{jk} \xrightarrow{\mathbb{P}} 0$ for every $j,k$. Hence
\begin{align*}
R_n \xrightarrow{\mathbb{P}} 0.
\end{align*}[/step]
custom_env
admin
[guided]We now control the cost of replacing $u_i$ by $\hat{u}_i$. Fix $j,k \in \{1,\dots,p\}$. From the residual expansion,
\begin{align*}
(R_n)_{jk}
=
\frac{1}{n}\sum_{i=1}^{n}
\left[-2u_iX_i^\top\Delta_n+(X_i^\top\Delta_n)^2\right]X_{ij}X_{ik}.
\end{align*}
Write the inner product in coordinates:
\begin{align*}
X_i^\top\Delta_n = \sum_{\ell=1}^{p}X_{i\ell}\Delta_{n,\ell}.
\end{align*}
Taking absolute values and using the triangle inequality gives
\begin{align*}
|(R_n)_{jk}|
&\leq
2\sum_{\ell=1}^{p}|\Delta_{n,\ell}|
\left(\frac{1}{n}\sum_{i=1}^{n}|u_iX_{i\ell}X_{ij}X_{ik}|\right)
\\
&\quad+
\sum_{\ell=1}^{p}\sum_{m=1}^{p}
|\Delta_{n,\ell}\Delta_{n,m}|
\left(\frac{1}{n}\sum_{i=1}^{n}|X_{i\ell}X_{im}X_{ij}X_{ik}|\right).
\end{align*}
The important point is that every sample average in parentheses has a finite expectation. For the terms containing $u_i$, Hölder's inequality with exponents $4,4,4,4$ gives
\begin{align*}
\mathbb{E}[|u_iX_{i\ell}X_{ij}X_{ik}|]
\leq
\left(\mathbb{E}[u_i^4]\right)^{1/4}
\left(\mathbb{E}[X_{i\ell}^4]\right)^{1/4}
\left(\mathbb{E}[X_{ij}^4]\right)^{1/4}
\left(\mathbb{E}[X_{ik}^4]\right)^{1/4}.
\end{align*}
Each coordinate fourth moment is finite because $|X_{ir}| \leq |X_i|$ for every coordinate $r$, and $\mathbb{E}[|X_i|^4] < \infty$. For the terms without $u_i$, the same form of Hölder's inequality gives
\begin{align*}
\mathbb{E}[|X_{i\ell}X_{im}X_{ij}X_{ik}|]
\leq
\left(\mathbb{E}[X_{i\ell}^4]\right)^{1/4}
\left(\mathbb{E}[X_{im}^4]\right)^{1/4}
\left(\mathbb{E}[X_{ij}^4]\right)^{1/4}
\left(\mathbb{E}[X_{ik}^4]\right)^{1/4}
<\infty.
\end{align*}
Thus every average in the bound is $O_{\mathbb{P}}(1)$ by the weak law of large numbers.
Now use consistency. Since the asymptotic normality assumptions imply $\hat{\beta}_n \xrightarrow{\mathbb{P}}\beta_0$, we have
\begin{align*}
\Delta_n \xrightarrow{\mathbb{P}} 0.
\end{align*}
Because $p$ is fixed, the finite sums over $\ell$ and $m$ do not grow with $n$. Hence the first group of terms is a finite sum of $o_{\mathbb{P}}(1)O_{\mathbb{P}}(1)$ terms, and the second group is a finite sum of $o_{\mathbb{P}}(1)^2O_{\mathbb{P}}(1)$ terms. Therefore
\begin{align*}
(R_n)_{jk} \xrightarrow{\mathbb{P}} 0.
\end{align*}
Since this holds for all finitely many entries, the whole matrix satisfies
\begin{align*}
R_n \xrightarrow{\mathbb{P}} 0.
\end{align*}[/guided]
custom_env
admin
[step:Conclude consistency of the residual covariance matrix]
Combining the decomposition from the first step with the two convergence results gives
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2X_iX_i^\top
=
\frac{1}{n}\sum_{i=1}^{n}u_i^2X_iX_i^\top
+
R_n
\xrightarrow{\mathbb{P}}
\Omega + 0
=
\Omega.
\end{align*}
This proves the first asserted convergence.
[/step]
custom_env
admin
[step:Invert the empirical second-moment matrix]Define the empirical second-moment matrix $Q_n \in \mathbb{R}^{p \times p}$ by
\begin{align*}
Q_n := \frac{1}{n}X^\top X
=
\frac{1}{n}\sum_{i=1}^{n}X_iX_i^\top.
\end{align*}
For each $j,k$, the random variable $X_{ij}X_{ik}$ is integrable because
\begin{align*}
\mathbb{E}[|X_{ij}X_{ik}|]
\leq
\left(\mathbb{E}[X_{ij}^2]\right)^{1/2}
\left(\mathbb{E}[X_{ik}^2]\right)^{1/2}
<\infty.
\end{align*}
Thus the weak law of large numbers gives
\begin{align*}
Q_n \xrightarrow{\mathbb{P}} Q.
\end{align*}
Since $\lambda_{\min}(Q)>0$, the matrix $Q$ is invertible. Let $\|\cdot\|_{\mathrm{op}}$ denote the operator norm on $\mathbb{R}^{p \times p}$. On the event
\begin{align*}
\|Q_n-Q\|_{\mathrm{op}}
<
\frac{1}{\|Q^{-1}\|_{\mathrm{op}}},
\end{align*}
the Neumann series argument gives invertibility of $Q_n$ and
\begin{align*}
Q_n^{-1}-Q^{-1}
=
Q_n^{-1}(Q-Q_n)Q^{-1}.
\end{align*}
Equivalently, continuity of inversion at the invertible matrix $Q$ yields
\begin{align*}
Q_n^{-1} \xrightarrow{\mathbb{P}} Q^{-1}.
\end{align*}[/step]
custom_env
admin
[guided]We need the two outer factors in the sandwich estimator. Define
\begin{align*}
Q_n := \frac{1}{n}X^\top X
=
\frac{1}{n}\sum_{i=1}^{n}X_iX_i^\top.
\end{align*}
For fixed $j,k$, the $(j,k)$ entry is the sample average of $X_{ij}X_{ik}$. This variable is integrable because Cauchy-Schwarz gives
\begin{align*}
\mathbb{E}[|X_{ij}X_{ik}|]
\leq
\left(\mathbb{E}[X_{ij}^2]\right)^{1/2}
\left(\mathbb{E}[X_{ik}^2]\right)^{1/2},
\end{align*}
and the second moments are finite since $\mathbb{E}[|X_i|^4] < \infty$. Therefore, by the weak law of large numbers,
\begin{align*}
Q_n \xrightarrow{\mathbb{P}} Q.
\end{align*}
It remains to justify inversion. The assumption $\lambda_{\min}(Q)>0$ says that $Q$ is positive definite, hence invertible. Matrix inversion is continuous at invertible matrices. A direct verification is as follows. If
\begin{align*}
\|Q_n-Q\|_{\mathrm{op}}
<
\frac{1}{\|Q^{-1}\|_{\mathrm{op}}},
\end{align*}
then
\begin{align*}
Q_n
=
Q\left[I+Q^{-1}(Q_n-Q)\right],
\end{align*}
and the matrix $I+Q^{-1}(Q_n-Q)$ is invertible by the Neumann series because its perturbation has operator norm less than $1$. Thus $Q_n$ is invertible on an event whose probability tends to $1$. On that event,
\begin{align*}
Q_n^{-1}-Q^{-1}
=
Q_n^{-1}(Q-Q_n)Q^{-1}.
\end{align*}
Since $Q_n \to Q$ in probability and inversion is continuous at $Q$, this yields
\begin{align*}
Q_n^{-1} \xrightarrow{\mathbb{P}} Q^{-1}.
\end{align*}[/guided]
custom_env
admin
[step:Multiply the convergent sandwich factors]
By definition of HC0,
\begin{align*}
n\widehat{\operatorname{Var}}_{\mathrm{HC0}}(\hat{\beta}_n)
&=
n(X^\top X)^{-1}
\left(\sum_{i=1}^{n}\hat{u}_i^2X_iX_i^\top\right)
(X^\top X)^{-1}
\\
&=
Q_n^{-1}
\left(\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2X_iX_i^\top\right)
Q_n^{-1}.
\end{align*}
The matrix product map
\begin{align*}
(A,B,C) \mapsto ABC
\end{align*}
from $\mathbb{R}^{p \times p}\times\mathbb{R}^{p \times p}\times\mathbb{R}^{p \times p}$ to $\mathbb{R}^{p \times p}$ is continuous. Combining
\begin{align*}
Q_n^{-1}\xrightarrow{\mathbb{P}}Q^{-1},
\qquad
\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2X_iX_i^\top\xrightarrow{\mathbb{P}}\Omega,
\end{align*}
gives
\begin{align*}
n\widehat{\operatorname{Var}}_{\mathrm{HC0}}(\hat{\beta}_n)
\xrightarrow{\mathbb{P}}
Q^{-1}\Omega Q^{-1}.
\end{align*}
[/step]
custom_env
admin
[step:Apply the fixed-dimensional HC1 multiplier]
For HC1,
\begin{align*}
n\widehat{\operatorname{Var}}_{\mathrm{HC1}}(\hat{\beta}_n)
=
\frac{n}{n-p}
n\widehat{\operatorname{Var}}_{\mathrm{HC0}}(\hat{\beta}_n).
\end{align*}
Since $p$ is fixed,
\begin{align*}
\frac{n}{n-p}
=
\frac{1}{1-p/n}
\to 1.
\end{align*}
Multiplying the HC0 convergence by this scalar sequence gives
\begin{align*}
n\widehat{\operatorname{Var}}_{\mathrm{HC1}}(\hat{\beta}_n)
\xrightarrow{\mathbb{P}}
Q^{-1}\Omega Q^{-1}.
\end{align*}
This completes the proof.
[/step]