Cook's Distance Formula for Internally Studentized Residuals

Cook's Distance Formula for Internally Studentized Residuals (Theorem # 4458)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We first verify that the leave-one-out least-squares matrix is invertible under the hypothesis $1-h_{ii}>0$. Then we derive the leave-one-out coefficient difference directly from the normal equations, avoiding any external matrix-inverse formula. Finally, we compute the squared change in fitted values and rewrite the resulting expression using the internally studentized residual. [/proofplan] [step:Show that deleting the $i$-th observation still gives an invertible normal matrix] Set \begin{align*} A := X^\top X \in \mathbb{R}^{p \times p}. \end{align*} Since $X$ has full column rank, $A$ is symmetric positive definite and therefore invertible. Let $S \in \mathbb{R}^{p \times p}$ denote the unique symmetric positive definite matrix satisfying $S^2=A$, and define $A^{1/2}:=S$ and $A^{-1/2}:=S^{-1}$. Define \begin{align*} A_{(i)} := X_{(i)}^\top X_{(i)} = A - x_i x_i^\top \in \mathbb{R}^{p \times p}. \end{align*} We prove that $A_{(i)}$ is positive definite. Let $v \in \mathbb{R}^p$ with $v \ne 0$. Since $A$ is positive definite, $v^\top A v > 0$. Applying the [Cauchy-Schwarz inequality](/theorems/432) to the Euclidean inner product after writing \begin{align*} x_i^\top v = \left(A^{-1/2}x_i\right)^\top \left(A^{1/2}v\right), \end{align*} gives \begin{align*} (x_i^\top v)^2 \leq \left(x_i^\top A^{-1}x_i\right)\left(v^\top A v\right) = h_{ii}\, v^\top A v. \end{align*} Therefore \begin{align*} v^\top A_{(i)}v = v^\top A v - (x_i^\top v)^2 \geq (1-h_{ii})v^\top A v > 0. \end{align*} Thus $A_{(i)}$ is positive definite and invertible, so $\hat{\beta}_{(i)}$ is well-defined. [guided] Set \begin{align*} A := X^\top X \in \mathbb{R}^{p \times p}. \end{align*} The full column rank assumption on $X$ means that $Xv \ne 0$ for every nonzero $v \in \mathbb{R}^p$. Hence \begin{align*} v^\top A v = v^\top X^\top Xv = |Xv|^2 > 0 \end{align*} for every $v \ne 0$, so $A$ is symmetric positive definite. Let $S \in \mathbb{R}^{p \times p}$ denote the unique symmetric positive definite matrix satisfying $S^2=A$. We define $A^{1/2}:=S$ and $A^{-1/2}:=S^{-1}$; the inverse $S^{-1}$ exists because $S$ is positive definite. After deleting the $i$-th observation, the new normal matrix is \begin{align*} A_{(i)} := X_{(i)}^\top X_{(i)} = A - x_i x_i^\top. \end{align*} To define $\hat{\beta}_{(i)}$, we must know that $A_{(i)}$ is invertible. Let $v \in \mathbb{R}^p$ with $v \ne 0$. The only term that could destroy positive definiteness is $(x_i^\top v)^2$. We control it using Cauchy-Schwarz in a form adapted to $A$: \begin{align*} x_i^\top v = \left(A^{-1/2}x_i\right)^\top \left(A^{1/2}v\right). \end{align*} Thus \begin{align*} (x_i^\top v)^2 \leq \left|A^{-1/2}x_i\right|^2\left|A^{1/2}v\right|^2 = \left(x_i^\top A^{-1}x_i\right)\left(v^\top A v\right) = h_{ii}\,v^\top A v. \end{align*} Consequently \begin{align*} v^\top A_{(i)}v = v^\top A v - (x_i^\top v)^2 \geq (1-h_{ii})v^\top A v. \end{align*} The hypothesis $1-h_{ii}>0$ and the positive definiteness of $A$ give \begin{align*} (1-h_{ii})v^\top A v > 0. \end{align*} Therefore $A_{(i)}$ is positive definite, hence invertible, and the leave-one-out coefficient vector $\hat{\beta}_{(i)}$ is well-defined. [/guided] [/step] [step:Derive the leave-one-out coefficient difference from the normal equations] Let $y_i \in \mathbb{R}$ denote the $i$-th component of $y$. The full-data normal equation is \begin{align*} A\hat{\beta} = X^\top y. \end{align*} The leave-one-out normal equation is \begin{align*} (A-x_i x_i^\top)\hat{\beta}_{(i)} = X^\top y - x_i y_i. \end{align*} Since $e_i = y_i - x_i^\top \hat{\beta}$, we compute \begin{align*} (A-x_i x_i^\top)\hat{\beta} &= A\hat{\beta}-x_i x_i^\top \hat{\beta} \\ &= X^\top y - x_i x_i^\top \hat{\beta} \\ &= X^\top y - x_i y_i + x_i(y_i-x_i^\top \hat{\beta}) \\ &= X^\top y - x_i y_i + x_i e_i. \end{align*} Subtracting the leave-one-out normal equation gives \begin{align*} (A-x_i x_i^\top)(\hat{\beta}-\hat{\beta}_{(i)}) = x_i e_i. \end{align*} We claim that \begin{align*} \hat{\beta}-\hat{\beta}_{(i)} = \frac{e_i}{1-h_{ii}}A^{-1}x_i. \end{align*} Indeed, \begin{align*} (A-x_i x_i^\top)\left(\frac{e_i}{1-h_{ii}}A^{-1}x_i\right) &= \frac{e_i}{1-h_{ii}}\left(x_i-x_i x_i^\top A^{-1}x_i\right) \\ &= \frac{e_i}{1-h_{ii}}(1-h_{ii})x_i \\ &= x_i e_i. \end{align*} Since $A-x_i x_i^\top=A_{(i)}$ is invertible, the solution is unique, proving the claimed formula. [guided] The coefficient difference is determined by comparing two normal equations. The full-data coefficient vector satisfies \begin{align*} A\hat{\beta} = X^\top y, \end{align*} where $A=X^\top X$. The leave-one-out coefficient vector satisfies \begin{align*} (A-x_i x_i^\top)\hat{\beta}_{(i)} = X^\top y - x_i y_i, \end{align*} because deleting the $i$-th row removes exactly the rank-one matrix $x_i x_i^\top$ from $X^\top X$ and exactly the vector $x_i y_i$ from $X^\top y$. Now evaluate the leave-one-out normal matrix on the full-data coefficient vector: \begin{align*} (A-x_i x_i^\top)\hat{\beta} &= A\hat{\beta}-x_i x_i^\top \hat{\beta} \\ &= X^\top y - x_i x_i^\top \hat{\beta}. \end{align*} The scalar residual at the deleted observation is \begin{align*} e_i = y_i - x_i^\top \hat{\beta}. \end{align*} Thus \begin{align*} -x_i x_i^\top \hat{\beta} = -x_i y_i + x_i(y_i-x_i^\top \hat{\beta}) = -x_i y_i + x_i e_i. \end{align*} Substituting this into the previous equation gives \begin{align*} (A-x_i x_i^\top)\hat{\beta} = X^\top y - x_i y_i + x_i e_i. \end{align*} Subtracting the leave-one-out normal equation yields \begin{align*} (A-x_i x_i^\top)(\hat{\beta}-\hat{\beta}_{(i)}) = x_i e_i. \end{align*} We now solve this linear equation explicitly. Consider the vector \begin{align*} w := \frac{e_i}{1-h_{ii}}A^{-1}x_i \in \mathbb{R}^p. \end{align*} Using $h_{ii}=x_i^\top A^{-1}x_i$, we compute \begin{align*} (A-x_i x_i^\top)w &= (A-x_i x_i^\top)\left(\frac{e_i}{1-h_{ii}}A^{-1}x_i\right) \\ &= \frac{e_i}{1-h_{ii}}\left(AA^{-1}x_i-x_i x_i^\top A^{-1}x_i\right) \\ &= \frac{e_i}{1-h_{ii}}\left(x_i-h_{ii}x_i\right) \\ &= x_i e_i. \end{align*} Since the previous step proved that $A-x_i x_i^\top$ is invertible, this solution is unique. Therefore \begin{align*} \hat{\beta}-\hat{\beta}_{(i)} = \frac{e_i}{1-h_{ii}}A^{-1}x_i. \end{align*} [/guided] [/step] [step:Compute the squared change in fitted values] Using the coefficient difference from the previous step, \begin{align*} X\hat{\beta}-X\hat{\beta}_{(i)} = X(\hat{\beta}-\hat{\beta}_{(i)}) = \frac{e_i}{1-h_{ii}}XA^{-1}x_i. \end{align*} Hence \begin{align*} |X\hat{\beta}-X\hat{\beta}_{(i)}|^2 &= \frac{e_i^2}{(1-h_{ii})^2} \left(XA^{-1}x_i\right)^\top\left(XA^{-1}x_i\right) \\ &= \frac{e_i^2}{(1-h_{ii})^2} x_i^\top A^{-1}X^\top X A^{-1}x_i \\ &= \frac{e_i^2}{(1-h_{ii})^2} x_i^\top A^{-1} A A^{-1}x_i \\ &= \frac{e_i^2}{(1-h_{ii})^2} x_i^\top A^{-1}x_i \\ &= \frac{e_i^2 h_{ii}}{(1-h_{ii})^2}. \end{align*} Dividing by $p s^2$ gives \begin{align*} D_i = \frac{|X\hat{\beta}-X\hat{\beta}_{(i)}|^2}{p s^2} = \frac{e_i^2}{p s^2}\frac{h_{ii}}{(1-h_{ii})^2}. \end{align*} [/step] [step:Rewrite the formula using the internally studentized residual] By definition of the internally studentized residual, \begin{align*} r_i = \frac{e_i}{s\sqrt{1-h_{ii}}}. \end{align*} Since $s^2>0$ and $1-h_{ii}>0$, this is well-defined and gives \begin{align*} r_i^2 = \frac{e_i^2}{s^2(1-h_{ii})}. \end{align*} Equivalently, \begin{align*} \frac{e_i^2}{s^2}=r_i^2(1-h_{ii}). \end{align*} Substituting this identity into the first Cook's distance formula gives \begin{align*} D_i = \frac{e_i^2}{p s^2}\frac{h_{ii}}{(1-h_{ii})^2} = \frac{r_i^2(1-h_{ii})}{p}\frac{h_{ii}}{(1-h_{ii})^2} = \frac{r_i^2}{p}\frac{h_{ii}}{1-h_{ii}}. \end{align*} This proves both stated identities. [/step]

Explore Further

Variance Inflation Factor Formula Probability & Statistics Interval Probabilities from the Distribution Function Probability Theory Limiting Events Are Measurable Probability & Statistics Skorokhod Embedding Theorem Brownian Motion Joint Distribution of Brownian Motion and its Maximum Brownian Motion Existence and Uniqueness of Conditional Expectation Conditional Expectation Bias-Variance Decomposition for Squared Loss Probability & Statistics Countable Subadditivity Probability Theory Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.