[proofplan]
We diagonalize the ridge normal matrix $X^\top X + \rho I_p$ using the right singular vector basis of $X$. In that basis, inversion is scalar division by $d_j^2+\rho$ on the row-space directions and by $\rho$ on the null-space directions. Since $X^\top y$ has no component in the null space of $X$, only the first $r$ singular directions contribute. Multiplying the resulting coefficient expansion by $X$ gives the fitted-value formula with shrinkage factors $d_j^2/(d_j^2+\rho)$.
[/proofplan]
[step:Diagonalize the ridge normal matrix in the right singular vector basis]
Since $X = UDV^\top$ and $U$ is orthogonal, so that $U^\top U = I_n$, we have
\begin{align*}
X^\top X
&= (UDV^\top)^\top(UDV^\top) \\
&= VD^\top U^\top UDV^\top \\
&= VD^\top DV^\top.
\end{align*}
Define the diagonal matrix
\begin{align*}
A := D^\top D + \rho I_p \in \mathbb{R}^{p \times p}.
\end{align*}
Then
\begin{align*}
X^\top X + \rho I_p
=
VAV^\top.
\end{align*}
The diagonal entries of $A$ are $d_j^2+\rho$ for $1 \le j \le r$ and $\rho$ for $r < j \le p$. Because $\rho > 0$, all diagonal entries of $A$ are positive, so $A$ is invertible. Since $V$ is orthogonal,
\begin{align*}
(X^\top X + \rho I_p)^{-1}
=
VA^{-1}V^\top.
\end{align*}
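The diagonalization above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the dimensions, the seed, the value of $\rho$, and the random rank-$r$ matrix are illustrative choices, not part of the statement.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r, rho = 6, 4, 2, 0.5  # illustrative sizes and penalty (assumptions)

# Rank-r design matrix built as a product of two random factors.
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, p))

# Full SVD: U is n x n, Vt is p x p, d holds min(n, p) singular values.
U, d, Vt = np.linalg.svd(X, full_matrices=True)
V = Vt.T

# Diagonal of A = D^T D + rho I_p: d_j^2 + rho, which is numerically rho for j > r.
a = np.zeros(p)
a[: d.size] = d**2
a += rho

ridge_matrix = X.T @ X + rho * np.eye(p)

# X^T X + rho I_p = V A V^T, and its inverse is V A^{-1} V^T.
diag_ok = np.allclose(ridge_matrix, V @ np.diag(a) @ V.T)
inv_ok = np.allclose(np.linalg.inv(ridge_matrix), V @ np.diag(1.0 / a) @ V.T)
print(diag_ok, inv_ok)
```

Because $X$ is rank deficient here, $X^\top X$ alone is singular; the check of the inverse only works because $\rho > 0$ makes every diagonal entry of $A$ positive.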
[/step]
[step:Expand $X^\top y$ in the right singular vector basis]
Write $U^\top y = \sum_{j=1}^n (u_j^\top y)\tilde{e}_j$, where $\tilde{e}_j \in \mathbb{R}^n$ is the $j$-th standard basis vector. The diagonal structure of $D$ gives $D^\top \tilde{e}_j = d_j e_j$ for $1 \le j \le r$, where $e_j \in \mathbb{R}^p$ is the $j$-th standard basis vector, and $D^\top \tilde{e}_j = 0$ for $r < j \le n$, since the $j$-th row of $D$ is zero. Hence
\begin{align*}
X^\top y
&= VD^\top U^\top y \\
&= V\left(\sum_{j=1}^r d_j(u_j^\top y)e_j\right) \\
&= \sum_{j=1}^r d_j(u_j^\top y)v_j.
\end{align*}
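As a sanity check on this expansion, the sketch below compares $X^\top y$ with the sum over the first $r$ singular directions and confirms that its coordinates along $v_{r+1}, \dots, v_p$ vanish. It assumes NumPy; the sizes, seed, and random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, r = 6, 4, 2  # illustrative sizes (assumptions)

X = rng.standard_normal((n, r)) @ rng.standard_normal((r, p))  # rank r
y = rng.standard_normal(n)

# Columns of U are u_j; rows of Vt are v_j^T.
U, d, Vt = np.linalg.svd(X, full_matrices=True)

# sum_{j=1}^r d_j (u_j^T y) v_j
expansion = sum(d[j] * (U[:, j] @ y) * Vt[j] for j in range(r))
expansion_ok = np.allclose(X.T @ y, expansion)

# Coordinates of X^T y along the null-space directions v_{r+1}, ..., v_p.
null_coords = Vt[r:] @ (X.T @ y)
null_ok = np.allclose(null_coords, 0.0)
print(expansion_ok, null_ok)
```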
[/step]
[step:Apply the inverse diagonal multiplier to obtain the coefficient formula]
Using the previous two steps,
\begin{align*}
\hat{\beta}^{\mathrm{ridge}}(\rho)
&= (X^\top X + \rho I_p)^{-1}X^\top y \\
&= VA^{-1}V^\top \left(\sum_{j=1}^r d_j(u_j^\top y)v_j\right).
\end{align*}
Since $V^\top v_j = e_j$ and $A^{-1}e_j = (d_j^2+\rho)^{-1}e_j$ for $1 \le j \le r$, we get
\begin{align*}
\hat{\beta}^{\mathrm{ridge}}(\rho)
&= \sum_{j=1}^r d_j(u_j^\top y)V A^{-1}e_j \\
&= \sum_{j=1}^r \frac{d_j}{d_j^2+\rho}(u_j^\top y)Ve_j \\
&= \sum_{j=1}^r \frac{d_j}{d_j^2+\rho}(u_j^\top y)v_j.
\end{align*}
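The coefficient formula can be compared against a direct solve of the ridge normal equations. This is a minimal sketch under assumed sizes, seed, and penalty value, using NumPy; it is a numerical illustration and not a substitute for the argument above.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, r, rho = 6, 4, 2, 0.5  # illustrative sizes and penalty (assumptions)

X = rng.standard_normal((n, r)) @ rng.standard_normal((r, p))  # rank r
y = rng.standard_normal(n)

U, d, Vt = np.linalg.svd(X, full_matrices=True)

# Direct solution of (X^T X + rho I_p) beta = X^T y.
beta_direct = np.linalg.solve(X.T @ X + rho * np.eye(p), X.T @ y)

# sum_{j=1}^r d_j / (d_j^2 + rho) * (u_j^T y) v_j
beta_svd = sum(
    d[j] / (d[j] ** 2 + rho) * (U[:, j] @ y) * Vt[j] for j in range(r)
)

coef_ok = np.allclose(beta_direct, beta_svd)
print(coef_ok)
```

Only the first $r$ singular directions enter `beta_svd`, so the ridge coefficient vector has no component in the null space of $X$.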
[/step]
[step:Multiply by $X$ to obtain the fitted-value formula]
For $1 \le j \le r$, the [singular value decomposition](/theorems/3071) gives
\begin{align*}
Xv_j = UDV^\top v_j = UDe_j = d_j u_j.
\end{align*}
Therefore
\begin{align*}
X\hat{\beta}^{\mathrm{ridge}}(\rho)
&=
X\left(\sum_{j=1}^r \frac{d_j}{d_j^2+\rho}(u_j^\top y)v_j\right) \\
&=
\sum_{j=1}^r \frac{d_j}{d_j^2+\rho}(u_j^\top y)Xv_j \\
&=
\sum_{j=1}^r \frac{d_j^2}{d_j^2+\rho}(u_j^\top y)u_j.
\end{align*}
This is the stated fitted-value expansion, and the proof is complete.
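The fitted-value expansion can also be illustrated numerically in vectorized form. The sketch below assumes NumPy and uses hypothetical sizes, seed, and penalty; `shrink` is an illustrative name for the factors $d_j^2/(d_j^2+\rho)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, r, rho = 6, 4, 2, 0.5  # illustrative sizes and penalty (assumptions)

X = rng.standard_normal((n, r)) @ rng.standard_normal((r, p))  # rank r
y = rng.standard_normal(n)

U, d, Vt = np.linalg.svd(X, full_matrices=True)
U_r = U[:, :r]  # left singular vectors u_1, ..., u_r

# Shrinkage factors d_j^2 / (d_j^2 + rho) for the r positive singular values.
shrink = d[:r] ** 2 / (d[:r] ** 2 + rho)

# sum_{j=1}^r shrink_j (u_j^T y) u_j, written as a matrix product.
fitted_svd = U_r @ (shrink * (U_r.T @ y))

# Fitted values from the directly computed ridge coefficients.
beta = np.linalg.solve(X.T @ X + rho * np.eye(p), X.T @ y)
fitted_direct = X @ beta

fitted_ok = np.allclose(fitted_direct, fitted_svd)
in_unit_interval = bool(np.all((shrink > 0) & (shrink < 1)))
print(fitted_ok, in_unit_interval)
```

Each factor lies strictly between 0 and 1 when $d_j > 0$ and $\rho > 0$, so every coordinate $u_j^\top y$ is shrunk toward zero, most strongly along directions with small $d_j$.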
[/step]