[proofplan]
We first verify that the leave-one-out least-squares matrix is invertible under the hypothesis $1-h_{ii}>0$. Then we derive the leave-one-out coefficient difference directly from the normal equations, avoiding any external matrix-inverse formula. Finally, we compute the squared change in fitted values and rewrite the resulting expression using the internally studentized residual.
[/proofplan]
[step:Show that deleting the $i$-th observation still gives an invertible normal matrix]
Set
\begin{align*}
A := X^\top X \in \mathbb{R}^{p \times p}.
\end{align*}
Since $X$ has full column rank, $A$ is symmetric positive definite and therefore invertible. Let $S \in \mathbb{R}^{p \times p}$ denote the unique symmetric positive definite matrix satisfying $S^2=A$, and define $A^{1/2}:=S$ and $A^{-1/2}:=S^{-1}$. Define
\begin{align*}
A_{(i)} := X_{(i)}^\top X_{(i)} = A - x_i x_i^\top \in \mathbb{R}^{p \times p}.
\end{align*}
We prove that $A_{(i)}$ is positive definite. Let $v \in \mathbb{R}^p$ with $v \ne 0$. Since $A$ is positive definite, $v^\top A v > 0$. Applying the [Cauchy-Schwarz inequality](/theorems/432) to the Euclidean inner product after writing
\begin{align*}
x_i^\top v = \left(A^{-1/2}x_i\right)^\top \left(A^{1/2}v\right),
\end{align*}
gives
\begin{align*}
(x_i^\top v)^2
\leq \left(x_i^\top A^{-1}x_i\right)\left(v^\top A v\right)
= h_{ii}\, v^\top A v.
\end{align*}
Therefore
\begin{align*}
v^\top A_{(i)}v
= v^\top A v - (x_i^\top v)^2
\geq (1-h_{ii})v^\top A v
> 0.
\end{align*}
Thus $A_{(i)}$ is positive definite and invertible, so $\hat{\beta}_{(i)}$ is well-defined.
[guided]
Set
\begin{align*}
A := X^\top X \in \mathbb{R}^{p \times p}.
\end{align*}
The full column rank assumption on $X$ means that $Xv \ne 0$ for every nonzero $v \in \mathbb{R}^p$. Hence
\begin{align*}
v^\top A v = v^\top X^\top Xv = |Xv|^2 > 0
\end{align*}
for every $v \ne 0$, so $A$ is symmetric positive definite. Let $S \in \mathbb{R}^{p \times p}$ denote the unique symmetric positive definite matrix satisfying $S^2=A$. We define $A^{1/2}:=S$ and $A^{-1/2}:=S^{-1}$; the inverse $S^{-1}$ exists because $S$ is positive definite.
After deleting the $i$-th observation, the new normal matrix is
\begin{align*}
A_{(i)} := X_{(i)}^\top X_{(i)} = A - x_i x_i^\top.
\end{align*}
To define $\hat{\beta}_{(i)}$, we must know that $A_{(i)}$ is invertible. Let $v \in \mathbb{R}^p$ with $v \ne 0$. The only term that could destroy positive definiteness is $(x_i^\top v)^2$. We control it using Cauchy-Schwarz in a form adapted to $A$:
\begin{align*}
x_i^\top v = \left(A^{-1/2}x_i\right)^\top \left(A^{1/2}v\right).
\end{align*}
Thus
\begin{align*}
(x_i^\top v)^2
\leq \left|A^{-1/2}x_i\right|^2\left|A^{1/2}v\right|^2
= \left(x_i^\top A^{-1}x_i\right)\left(v^\top A v\right)
= h_{ii}\,v^\top A v.
\end{align*}
Consequently
\begin{align*}
v^\top A_{(i)}v
= v^\top A v - (x_i^\top v)^2
\geq (1-h_{ii})v^\top A v.
\end{align*}
The hypothesis $1-h_{ii}>0$ and the positive definiteness of $A$ give
\begin{align*}
(1-h_{ii})v^\top A v > 0.
\end{align*}
Therefore $A_{(i)}$ is positive definite, hence invertible, and the leave-one-out coefficient vector $\hat{\beta}_{(i)}$ is well-defined.
[/guided]
[/step]
[step:Derive the leave-one-out coefficient difference from the normal equations]
Let $y_i \in \mathbb{R}$ denote the $i$-th component of $y$. The full-data normal equation is
\begin{align*}
A\hat{\beta} = X^\top y.
\end{align*}
The leave-one-out normal equation is
\begin{align*}
(A-x_i x_i^\top)\hat{\beta}_{(i)} = X^\top y - x_i y_i.
\end{align*}
Since $e_i = y_i - x_i^\top \hat{\beta}$, we compute
\begin{align*}
(A-x_i x_i^\top)\hat{\beta}
&= A\hat{\beta}-x_i x_i^\top \hat{\beta} \\
&= X^\top y - x_i x_i^\top \hat{\beta} \\
&= X^\top y - x_i y_i + x_i(y_i-x_i^\top \hat{\beta}) \\
&= X^\top y - x_i y_i + x_i e_i.
\end{align*}
Subtracting the leave-one-out normal equation gives
\begin{align*}
(A-x_i x_i^\top)(\hat{\beta}-\hat{\beta}_{(i)}) = x_i e_i.
\end{align*}
We claim that
\begin{align*}
\hat{\beta}-\hat{\beta}_{(i)}
= \frac{e_i}{1-h_{ii}}A^{-1}x_i.
\end{align*}
Indeed,
\begin{align*}
(A-x_i x_i^\top)\left(\frac{e_i}{1-h_{ii}}A^{-1}x_i\right)
&= \frac{e_i}{1-h_{ii}}\left(x_i-x_i x_i^\top A^{-1}x_i\right) \\
&= \frac{e_i}{1-h_{ii}}(1-h_{ii})x_i \\
&= x_i e_i.
\end{align*}
Since $A-x_i x_i^\top=A_{(i)}$ is invertible, the solution is unique, proving the claimed formula.
[guided]
The coefficient difference is determined by comparing two normal equations. The full-data coefficient vector satisfies
\begin{align*}
A\hat{\beta} = X^\top y,
\end{align*}
where $A=X^\top X$. The leave-one-out coefficient vector satisfies
\begin{align*}
(A-x_i x_i^\top)\hat{\beta}_{(i)} = X^\top y - x_i y_i,
\end{align*}
because deleting the $i$-th row removes exactly the rank-one matrix $x_i x_i^\top$ from $X^\top X$ and exactly the vector $x_i y_i$ from $X^\top y$.
Now evaluate the leave-one-out normal matrix on the full-data coefficient vector:
\begin{align*}
(A-x_i x_i^\top)\hat{\beta}
&= A\hat{\beta}-x_i x_i^\top \hat{\beta} \\
&= X^\top y - x_i x_i^\top \hat{\beta}.
\end{align*}
The scalar residual at the deleted observation is
\begin{align*}
e_i = y_i - x_i^\top \hat{\beta}.
\end{align*}
Thus
\begin{align*}
-x_i x_i^\top \hat{\beta}
= -x_i y_i + x_i(y_i-x_i^\top \hat{\beta})
= -x_i y_i + x_i e_i.
\end{align*}
Substituting this into the previous equation gives
\begin{align*}
(A-x_i x_i^\top)\hat{\beta}
= X^\top y - x_i y_i + x_i e_i.
\end{align*}
Subtracting the leave-one-out normal equation yields
\begin{align*}
(A-x_i x_i^\top)(\hat{\beta}-\hat{\beta}_{(i)}) = x_i e_i.
\end{align*}
We now solve this linear equation explicitly. Consider the vector
\begin{align*}
w := \frac{e_i}{1-h_{ii}}A^{-1}x_i \in \mathbb{R}^p.
\end{align*}
Using $h_{ii}=x_i^\top A^{-1}x_i$, we compute
\begin{align*}
(A-x_i x_i^\top)w
&= (A-x_i x_i^\top)\left(\frac{e_i}{1-h_{ii}}A^{-1}x_i\right) \\
&= \frac{e_i}{1-h_{ii}}\left(AA^{-1}x_i-x_i x_i^\top A^{-1}x_i\right) \\
&= \frac{e_i}{1-h_{ii}}\left(x_i-h_{ii}x_i\right) \\
&= x_i e_i.
\end{align*}
Since the previous step proved that $A-x_i x_i^\top$ is invertible, this solution is unique. Therefore
\begin{align*}
\hat{\beta}-\hat{\beta}_{(i)}
= \frac{e_i}{1-h_{ii}}A^{-1}x_i.
\end{align*}
[/guided]
[/step]
[step:Compute the squared change in fitted values]
Using the coefficient difference from the previous step,
\begin{align*}
X\hat{\beta}-X\hat{\beta}_{(i)}
= X(\hat{\beta}-\hat{\beta}_{(i)})
= \frac{e_i}{1-h_{ii}}XA^{-1}x_i.
\end{align*}
Hence
\begin{align*}
|X\hat{\beta}-X\hat{\beta}_{(i)}|^2
&= \frac{e_i^2}{(1-h_{ii})^2}
\left(XA^{-1}x_i\right)^\top\left(XA^{-1}x_i\right) \\
&= \frac{e_i^2}{(1-h_{ii})^2}
x_i^\top A^{-1}X^\top X A^{-1}x_i \\
&= \frac{e_i^2}{(1-h_{ii})^2}
x_i^\top A^{-1} A A^{-1}x_i \\
&= \frac{e_i^2}{(1-h_{ii})^2}
x_i^\top A^{-1}x_i \\
&= \frac{e_i^2 h_{ii}}{(1-h_{ii})^2}.
\end{align*}
Dividing by $p s^2$ gives
\begin{align*}
D_i
= \frac{|X\hat{\beta}-X\hat{\beta}_{(i)}|^2}{p s^2}
= \frac{e_i^2}{p s^2}\frac{h_{ii}}{(1-h_{ii})^2}.
\end{align*}
[/step]
[step:Rewrite the formula using the internally studentized residual]
By definition of the internally studentized residual,
\begin{align*}
r_i = \frac{e_i}{s\sqrt{1-h_{ii}}}.
\end{align*}
Since $s^2>0$ and $1-h_{ii}>0$, this is well-defined and gives
\begin{align*}
r_i^2 = \frac{e_i^2}{s^2(1-h_{ii})}.
\end{align*}
Equivalently,
\begin{align*}
\frac{e_i^2}{s^2}=r_i^2(1-h_{ii}).
\end{align*}
Substituting this identity into the first Cook's distance formula gives
\begin{align*}
D_i
= \frac{e_i^2}{p s^2}\frac{h_{ii}}{(1-h_{ii})^2}
= \frac{r_i^2(1-h_{ii})}{p}\frac{h_{ii}}{(1-h_{ii})^2}
= \frac{r_i^2}{p}\frac{h_{ii}}{1-h_{ii}}.
\end{align*}
This proves both stated identities.
[/step]