Asymptotic Normality of Random Design Ordinary Least Squares

Asymptotic Normality of Random Design Ordinary Least Squares (Theorem # 4448)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] The proof starts from the exact finite-sample normal equation for OLS on the event where the sample second-moment matrix is invertible. The sample second-moment matrix converges in probability to $Q$, so its inverse converges in probability to $Q^{-1}$ and the invertibility event has probability tending to $1$. The score term $n^{-1/2}\sum_{i=1}^n X_i u_i$ converges in distribution to a centered Gaussian vector with covariance $\Omega$ by the [multivariate central limit theorem](/theorems/1854). Slutsky's theorem then gives the covariance transformation by $Q^{-1}$ on both sides. [/proofplan] [step:Derive the exact OLS expansion on the invertibility event] For each $n \in \mathbb{N}$, define the sample second-moment matrix $S_n: \Omega \to \mathbb{R}^{p \times p}$ and the sample score vector $T_n: \Omega \to \mathbb{R}^p$ by \begin{align*} S_n := \frac{1}{n}\sum_{i=1}^n X_i X_i^\top, \qquad T_n := \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i u_i. \end{align*} On $A_n$, the matrix $\sum_{i=1}^n X_iX_i^\top = nS_n$ is invertible. Using $Y_i = X_i^\top \beta_0 + u_i$, we compute \begin{align*} \hat{\beta}_n &= \left(\sum_{i=1}^n X_i X_i^\top\right)^{-1} \left(\sum_{i=1}^n X_i(X_i^\top \beta_0 + u_i)\right) \\ &= \left(\sum_{i=1}^n X_i X_i^\top\right)^{-1} \left(\sum_{i=1}^n X_i X_i^\top\right)\beta_0 + \left(\sum_{i=1}^n X_i X_i^\top\right)^{-1} \left(\sum_{i=1}^n X_i u_i\right) \\ &= \beta_0 + S_n^{-1}\left(\frac{1}{n}\sum_{i=1}^n X_i u_i\right). \end{align*} Multiplying by $\sqrt{n}$ gives, on $A_n$, \begin{align*} \sqrt{n}(\hat{\beta}_n - \beta_0) = S_n^{-1}T_n. \end{align*} [guided] The goal of this step is to isolate the two asymptotic objects: the design matrix average and the score average. Define \begin{align*} S_n := \frac{1}{n}\sum_{i=1}^n X_i X_i^\top, \qquad T_n := \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i u_i. \end{align*} Here $S_n$ is an $\mathbb{R}^{p \times p}$-valued random matrix and $T_n$ is an $\mathbb{R}^p$-valued random vector. On the event $A_n$, the matrix $\sum_{i=1}^n X_iX_i^\top$ is invertible, so the closed-form OLS expression is valid. Substituting the model equation $Y_i = X_i^\top \beta_0 + u_i$ into the OLS formula gives \begin{align*} \hat{\beta}_n &= \left(\sum_{i=1}^n X_i X_i^\top\right)^{-1} \left(\sum_{i=1}^n X_i(X_i^\top \beta_0 + u_i)\right) \\ &= \left(\sum_{i=1}^n X_i X_i^\top\right)^{-1} \left(\sum_{i=1}^n X_i X_i^\top\right)\beta_0 + \left(\sum_{i=1}^n X_i X_i^\top\right)^{-1} \left(\sum_{i=1}^n X_i u_i\right). \end{align*} The first term equals $\beta_0$ because the matrix is invertible on $A_n$. Therefore \begin{align*} \hat{\beta}_n - \beta_0 = \left(\sum_{i=1}^n X_i X_i^\top\right)^{-1} \left(\sum_{i=1}^n X_i u_i\right). \end{align*} Since $\sum_{i=1}^n X_iX_i^\top = nS_n$, its inverse is $n^{-1}S_n^{-1}$. Hence \begin{align*} \sqrt{n}(\hat{\beta}_n - \beta_0) = S_n^{-1} \left(\frac{1}{\sqrt{n}}\sum_{i=1}^n X_i u_i\right) = S_n^{-1}T_n. \end{align*} This identity is exact on $A_n$; no limiting approximation has been used yet. [/guided] [/step] [step:Show the sample second-moment matrix converges to $Q$ and is eventually invertible with high probability] For $1 \le j,k \le p$, the scalar random variables $X_{ij}X_{ik}$ are i.i.d. and integrable because \begin{align*} |X_{ij}X_{ik}| \le |X_i|^2, \qquad \mathbb{E}[|X_i|^2] = \operatorname{tr}(Q) < \infty. \end{align*} By the [weak law of large numbers](/theorems/1127) for each coordinate entry (citing a result not yet in the wiki: [Weak Law of Large Numbers](/theorems/1851)), \begin{align*} (S_n)_{jk} = \frac{1}{n}\sum_{i=1}^n X_{ij}X_{ik} \xrightarrow{\mathbb{P}} Q_{jk}. \end{align*} Since $p$ is fixed, entrywise convergence implies convergence in probability in any matrix norm: \begin{align*} S_n \xrightarrow{\mathbb{P}} Q. \end{align*} Because $Q$ is positive definite, it is invertible. The determinant map $\det: \mathbb{R}^{p \times p} \to \mathbb{R}$ and the inverse map on $GL(p,\mathbb{R})$ are continuous. Therefore, \begin{align*} \mathbb{P}(A_n) \to 1, \qquad S_n^{-1}\mathbb{1}_{A_n} \xrightarrow{\mathbb{P}} Q^{-1}. \end{align*} Equivalently, after redefining $S_n^{-1}$ arbitrarily on $A_n^c$, we have \begin{align*} S_n^{-1} \xrightarrow{\mathbb{P}} Q^{-1}. \end{align*} [guided] We next control the random matrix multiplying the score term. For $1 \le j,k \le p$, let $X_{ij}$ denote the $j$-th component of $X_i$. The $(j,k)$ entry of $S_n$ is \begin{align*} (S_n)_{jk} = \frac{1}{n}\sum_{i=1}^n X_{ij}X_{ik}. \end{align*} These summands are i.i.d. because the random vectors $X_i$ are i.i.d. They are also integrable: by the [Cauchy-Schwarz inequality](/theorems/432) in $\mathbb{R}^p$, \begin{align*} |X_{ij}X_{ik}| \le |X_i|^2. \end{align*} Moreover \begin{align*} \mathbb{E}[|X_i|^2] = \sum_{j=1}^p \mathbb{E}[X_{ij}^2] = \operatorname{tr}(Q) < \infty, \end{align*} because $Q=\mathbb{E}[X_iX_i^\top]$ is a finite positive definite matrix. The weak law of large numbers applies to each scalar sequence $(X_{ij}X_{ik})_{i=1}^\infty$ (citing a result not yet in the wiki: Weak Law of Large Numbers). Thus \begin{align*} (S_n)_{jk} \xrightarrow{\mathbb{P}} \mathbb{E}[X_{ij}X_{ik}] = Q_{jk}. \end{align*} Since $p$ is fixed, there are only finitely many entries. Entrywise convergence therefore implies convergence of the whole matrix in probability: \begin{align*} S_n \xrightarrow{\mathbb{P}} Q. \end{align*} Because $Q$ is positive definite, all its eigenvalues are positive, so $Q$ is invertible and $\det Q \neq 0$. The determinant function is continuous on $\mathbb{R}^{p \times p}$, so \begin{align*} \det(S_n) \xrightarrow{\mathbb{P}} \det(Q). \end{align*} Since $\det(Q) \neq 0$, this implies \begin{align*} \mathbb{P}(\det(S_n) \neq 0) \to 1. \end{align*} But $\det(S_n) \neq 0$ is exactly the event $A_n$, because multiplying by $n$ does not affect invertibility. Hence $\mathbb{P}(A_n) \to 1$. Finally, the inverse map $M \mapsto M^{-1}$ is continuous on the [open set](/page/Open%20Set) $GL(p,\mathbb{R})$ of invertible matrices. Therefore the [continuous mapping theorem](/theorems/1847) gives \begin{align*} S_n^{-1} \xrightarrow{\mathbb{P}} Q^{-1}, \end{align*} with any arbitrary definition on $A_n^c$, since $\mathbb{P}(A_n^c)\to 0$. [/guided] [/step] [step:Apply the multivariate central limit theorem to the score term] Define the score summand $Z_i: \Omega \to \mathbb{R}^p$ by \begin{align*} Z_i := X_i u_i. \end{align*} The random vectors $(Z_i)_{i=1}^{\infty}$ are i.i.d. Since $\mathbb{E}[X_i u_i]=0$, we have $\mathbb{E}[Z_i]=0$. Also, \begin{align*} \mathbb{E}[|Z_i|^2] = \mathbb{E}[u_i^2|X_i|^2] < \infty, \end{align*} so $Z_i$ has finite second moment. Its covariance matrix is \begin{align*} \operatorname{Var}(Z_i) = \mathbb{E}[Z_iZ_i^\top] = \mathbb{E}[u_i^2X_iX_i^\top] = \Omega. \end{align*} By the multivariate [central limit theorem](/theorems/1848) (citing a result not yet in the wiki: Multivariate [Central Limit Theorem](/theorems/521)), \begin{align*} T_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i \xrightarrow{d} \mathcal{N}(0,\Omega). \end{align*} [guided] The score term is \begin{align*} T_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i u_i. \end{align*} To put this in the standard central limit theorem form, define \begin{align*} Z_i := X_i u_i. \end{align*} Then $Z_i: \Omega \to \mathbb{R}^p$ is a random vector, and \begin{align*} T_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i. \end{align*} We verify the hypotheses of the multivariate central limit theorem. First, the $Z_i$ are i.i.d. because each $Z_i$ is a measurable function of the i.i.d. pair $(X_i,u_i)$. Second, the mean is zero: \begin{align*} \mathbb{E}[Z_i] = \mathbb{E}[X_i u_i] = 0. \end{align*} Third, $Z_i$ has finite second moment because \begin{align*} |Z_i|^2 = |X_i u_i|^2 = u_i^2 |X_i|^2 \end{align*} and the theorem assumes \begin{align*} \mathbb{E}[u_i^2|X_i|^2] < \infty. \end{align*} The covariance matrix is therefore well-defined and equals \begin{align*} \operatorname{Var}(Z_i) &= \mathbb{E}[(Z_i-\mathbb{E}[Z_i])(Z_i-\mathbb{E}[Z_i])^\top] \\ &= \mathbb{E}[Z_iZ_i^\top] \\ &= \mathbb{E}[(X_i u_i)(X_i u_i)^\top] \\ &= \mathbb{E}[u_i^2X_iX_i^\top] = \Omega. \end{align*} The multivariate central limit theorem now applies (citing a result not yet in the wiki: Multivariate Central Limit Theorem), giving \begin{align*} T_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i \xrightarrow{d} \mathcal{N}(0,\Omega). \end{align*} [/guided] [/step] [step:Combine the matrix and score limits by Slutsky's theorem] From the preceding steps, \begin{align*} S_n^{-1} \xrightarrow{\mathbb{P}} Q^{-1}, \qquad T_n \xrightarrow{d} \mathcal{N}(0,\Omega). \end{align*} By Slutsky's theorem for products of random matrices and random vectors (citing a result not yet in the wiki: Slutsky's Theorem), \begin{align*} S_n^{-1}T_n \xrightarrow{d} Q^{-1}Z, \end{align*} where $Z \sim \mathcal{N}(0,\Omega)$. Since linear transformations of Gaussian random vectors are Gaussian, \begin{align*} Q^{-1}Z \sim \mathcal{N}(0,Q^{-1}\Omega(Q^{-1})^\top). \end{align*} The matrix $Q$ is symmetric because $Q=\mathbb{E}[X_iX_i^\top]$, hence $Q^{-1}$ is symmetric. Therefore \begin{align*} Q^{-1}\Omega(Q^{-1})^\top = Q^{-1}\Omega Q^{-1}. \end{align*} Using the exact expansion on $A_n$ and $\mathbb{P}(A_n)\to 1$, changing the estimator arbitrarily on $A_n^c$ does not affect the limiting distribution. Hence \begin{align*} \sqrt{n}(\hat{\beta}_n-\beta_0) \xrightarrow{d} \mathcal{N}(0,Q^{-1}\Omega Q^{-1}). \end{align*} This proves the theorem. [guided] We now combine the two limits. The previous steps established \begin{align*} S_n^{-1} \xrightarrow{\mathbb{P}} Q^{-1} \end{align*} and \begin{align*} T_n \xrightarrow{d} \mathcal{N}(0,\Omega). \end{align*} The first convergence is in probability to a non-random matrix, while the second convergence is in distribution to a Gaussian random vector. Slutsky's theorem applies exactly to this situation (citing a result not yet in the wiki: Slutsky's Theorem): multiplying a sequence converging in distribution by a sequence converging in probability to a constant matrix preserves the distributional limit with that constant matrix applied. Therefore \begin{align*} S_n^{-1}T_n \xrightarrow{d} Q^{-1}Z, \end{align*} where $Z$ is a random vector with distribution $\mathcal{N}(0,\Omega)$. It remains to identify the covariance matrix of $Q^{-1}Z$. A linear transformation of a Gaussian random vector is Gaussian. Its mean is \begin{align*} \mathbb{E}[Q^{-1}Z] = Q^{-1}\mathbb{E}[Z] = 0, \end{align*} and its covariance matrix is \begin{align*} \operatorname{Var}(Q^{-1}Z) &= \mathbb{E}[(Q^{-1}Z)(Q^{-1}Z)^\top] \\ &= Q^{-1}\mathbb{E}[ZZ^\top](Q^{-1})^\top \\ &= Q^{-1}\Omega(Q^{-1})^\top. \end{align*} Since $Q=\mathbb{E}[X_iX_i^\top]$ is symmetric, its inverse $Q^{-1}$ is also symmetric, so \begin{align*} Q^{-1}\Omega(Q^{-1})^\top = Q^{-1}\Omega Q^{-1}. \end{align*} Finally, the identity \begin{align*} \sqrt{n}(\hat{\beta}_n-\beta_0)=S_n^{-1}T_n \end{align*} was derived on $A_n$, and we proved $\mathbb{P}(A_n)\to 1$. Since modifying a sequence of random vectors on events whose probabilities tend to zero does not change its limiting distribution, the arbitrary extension of $\hat{\beta}_n$ on $A_n^c$ is asymptotically irrelevant. Thus \begin{align*} \sqrt{n}(\hat{\beta}_n-\beta_0) \xrightarrow{d} \mathcal{N}(0,Q^{-1}\Omega Q^{-1}), \end{align*} which is precisely the claimed asymptotic normality of random design OLS. [/guided] [/step]

Prerequisites (0/7 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Definitions & Concepts

Explore Further

Distribution Definition Determinant Definition Central Limit Theorem Theorem #521 Weak Law of Large Numbers Theorem #1127 Central Limit Theorem Theorem #1848 Weak Law of Large Numbers Theorem #1851 Multivariate Central Limit Theorem Theorem #1854 Tail Integral Formula Probability Theory Basic Implications Between Modes of Convergence Probability Theory Sherman–Morrison Leave-One-Out Formula Probability & Statistics Skorokhod Embedding Theorem Brownian Motion Limiting Events Are Measurable Probability & Statistics Conditional Expectation as the $L^2$ Risk Minimizer Probability & Statistics Extinction Probability Probability Theory PGF of a Sum Probability Theory Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.