[proofplan]
We first rewrite the Lasso objective using the identity $\frac{1}{n}X^\top X=I_p$. This converts the quadratic loss into a constant plus a squared Euclidean distance from $\beta$ to $z$, so the minimisation separates into $p$ independent one-dimensional convex problems. We then solve a generic scalar problem exactly by considering whether the scalar datum lies above $\lambda$, inside $[-\lambda,\lambda]$, or below $-\lambda$. The strict convexity of the separated scalar objectives gives uniqueness of each coordinate and therefore uniqueness of the full minimiser.
[/proofplan]
[step:Show that the orthonormal design assumption forces $p \leq n$]
The hypothesis $X^\top X/n=I_p$ is equivalent to $X^\top X = n I_p$. Hence the matrix $X^\top X$ is invertible, so the $p$ columns of $X$ are linearly independent vectors in $\mathbb{R}^n$. A linearly independent family in $\mathbb{R}^n$ has at most $n$ elements, so $p \leq n$.
[/step]
[step:Rewrite the Lasso objective as a separable sum]
Define the objective function $Q: \mathbb{R}^p \to \mathbb{R}$ by
\begin{align*}
Q(\beta) := \frac{1}{2n}|Y-X\beta|^2+\lambda\sum_{j=1}^p |\beta_j|.
\end{align*}
For $\beta \in \mathbb{R}^p$, expand the squared norm using the Euclidean [inner product](/page/Inner%20Product). The expansion is
\begin{align*}
\frac{1}{2n}|Y-X\beta|^2 = \frac{1}{2n}|Y|^2 - \frac{1}{n}\beta^\top X^\top Y + \frac{1}{2n}\beta^\top X^\top X\beta.
\end{align*}
Using $z = X^\top Y/n$ and $X^\top X/n = I_p$, this becomes
\begin{align*}
\frac{1}{2n}|Y-X\beta|^2 = \frac{1}{2n}|Y|^2-\beta^\top z+\frac{1}{2}|\beta|^2.
\end{align*}
Completing the square gives
\begin{align*}
\frac{1}{2n}|Y-X\beta|^2
=
\frac{1}{2n}|Y|^2
-\frac{1}{2}|z|^2
+\frac{1}{2}|\beta-z|^2.
\end{align*}
Define the constant $C \in \mathbb{R}$ by
\begin{align*}
C := \frac{1}{2n}|Y|^2-\frac{1}{2}|z|^2.
\end{align*}
Then
\begin{align*}
Q(\beta) = C+\sum_{j=1}^p \left(\frac{1}{2}(\beta_j-z_j)^2+\lambda|\beta_j|\right).
\end{align*}
Thus minimising $Q$ over $\mathbb{R}^p$ is equivalent to minimising each coordinate function independently.
[guided]
The reason the orthonormal design hypothesis is powerful is that it removes all interactions between different coordinates of $\beta$. Define the function $Q: \mathbb{R}^p \to \mathbb{R}$ by
\begin{align*}
Q(\beta) := \frac{1}{2n}|Y-X\beta|^2+\lambda\sum_{j=1}^p |\beta_j|.
\end{align*}
We expand the quadratic term. For every $\beta \in \mathbb{R}^p$,
\begin{align*}
\frac{1}{2n}|Y-X\beta|^2
=
\frac{1}{2n}(Y-X\beta)^\top(Y-X\beta).
\end{align*}
Expanding the Euclidean inner product gives
\begin{align*}
\frac{1}{2n}|Y-X\beta|^2
=
\frac{1}{2n}|Y|^2
-\frac{1}{n}\beta^\top X^\top Y
+\frac{1}{2n}\beta^\top X^\top X\beta.
\end{align*}
Now use the definitions
The definition of $z$ gives
\begin{align*}
z = \frac{1}{n}X^\top Y.
\end{align*}
The orthonormal design hypothesis gives
\begin{align*}
\frac{1}{n}X^\top X = I_p.
\end{align*}
These give
\begin{align*}
\frac{1}{2n}|Y-X\beta|^2
=
\frac{1}{2n}|Y|^2-\beta^\top z+\frac{1}{2}|\beta|^2.
\end{align*}
Completing the square in $\beta$ gives
\begin{align*}
\frac{1}{2}|\beta|^2-\beta^\top z
=
\frac{1}{2}|\beta-z|^2-\frac{1}{2}|z|^2.
\end{align*}
Therefore, if
\begin{align*}
C := \frac{1}{2n}|Y|^2-\frac{1}{2}|z|^2,
\end{align*}
then
\begin{align*}
Q(\beta)
=
C+\frac{1}{2}|\beta-z|^2+\lambda\sum_{j=1}^p |\beta_j|.
\end{align*}
Since $|\beta-z|^2 = \sum_{j=1}^p (\beta_j-z_j)^2$, this is
\begin{align*}
Q(\beta)
=
C+\sum_{j=1}^p
\left(
\frac{1}{2}(\beta_j-z_j)^2+\lambda|\beta_j|
\right).
\end{align*}
The constant $C$ does not affect the minimiser. What remains is a sum of terms where the $j$th term depends only on $\beta_j$. Hence the full $p$-dimensional minimisation problem reduces exactly to $p$ independent one-dimensional minimisation problems.
[/guided]
[/step]
[step:Solve the one-dimensional soft-thresholding problem]
Fix $t \in \mathbb{R}$ and define $\phi_t: \mathbb{R} \to \mathbb{R}$ by
\begin{align*}
\phi_t(u) := \frac{1}{2}(u-t)^2+\lambda|u|.
\end{align*}
We claim that $\phi_t$ has the unique minimiser $S_\lambda(t)$.
If $u>0$, then
\begin{align*}
\phi_t(u)=\frac{1}{2}(u-t)^2+\lambda u,
\qquad
\phi_t'(u)=u-t+\lambda.
\end{align*}
The critical point in $(0,\infty)$ is $u=t-\lambda$, which lies in $(0,\infty)$ exactly when $t>\lambda$.
If $u<0$, then
\begin{align*}
\phi_t(u)=\frac{1}{2}(u-t)^2-\lambda u,
\qquad
\phi_t'(u)=u-t-\lambda.
\end{align*}
The critical point in $(-\infty,0)$ is $u=t+\lambda$, which lies in $(-\infty,0)$ exactly when $t<-\lambda$.
It remains to handle $u=0$. For $u>0$,
\begin{align*}
\phi_t'(u)=u-t+\lambda,
\end{align*}
so the right derivative at $0$ is $\lambda-t$. For $u<0$,
\begin{align*}
\phi_t'(u)=u-t-\lambda,
\end{align*}
so the left derivative at $0$ is $-\lambda-t$.
We verify the minimisation condition at $0$ directly. If $u>0$, then
\begin{align*}
\phi_t(u)-\phi_t(0)
=
\frac{1}{2}u^2+u(\lambda-t).
\end{align*}
If $u<0$, then
\begin{align*}
\phi_t(u)-\phi_t(0)
=
\frac{1}{2}u^2-u(\lambda+t).
\end{align*}
Therefore $\phi_t(u) \geq \phi_t(0)$ for every $u \in \mathbb{R}$ exactly when $\lambda-t \geq 0$ and $\lambda+t \geq 0$, equivalently when
\begin{align*}
-\lambda \leq t \leq \lambda.
\end{align*}
Thus $0$ minimises $\phi_t$ exactly when $|t|\leq \lambda$.
Combining the three cases, the only candidate selected by the derivative computations is $t-\lambda$ when $t>\lambda$, is $0$ when $|t|\leq \lambda$, and is $t+\lambda$ when $t<-\lambda$. This is exactly the soft-thresholding value
\begin{align*}
S_\lambda(t)=\operatorname{sgn}(t)(|t|-\lambda)_+.
\end{align*}
It remains only to justify that these candidates are global and unique minimisers; we do this directly rather than citing a convex-analysis theorem. Since $u \mapsto \frac{1}{2}(u-t)^2$ is strictly convex and coercive on $\mathbb{R}$, and since $u \mapsto \lambda|u|$ is convex, the function $\phi_t$ is strictly convex and coercive. The one-sided derivative computations show that, in the case $t>\lambda$, the derivative is negative on $(-\infty,t-\lambda)$ and positive on $(t-\lambda,\infty)$, so $t-\lambda$ is the global minimiser. In the case $t<-\lambda$, the derivative is negative on $(-\infty,t+\lambda)$ and positive on $(t+\lambda,\infty)$, so $t+\lambda$ is the global minimiser. In the case $|t|\leq \lambda$, the direct difference estimates above show $\phi_t(u)\geq \phi_t(0)$ for every $u\in\mathbb{R}$. Strict convexity then makes the global minimiser unique in all three cases. Hence
\begin{align*}
\operatorname*{arg\,min}_{u \in \mathbb{R}} \phi_t(u)=S_\lambda(t).
\end{align*}
[/step]
[step:Assemble the coordinatewise minimisers and prove uniqueness]
For each $j \in \{1,\dots,p\}$, define $\phi_{z_j}: \mathbb{R} \to \mathbb{R}$ by
\begin{align*}
\phi_{z_j}(u) := \frac{1}{2}(u-z_j)^2+\lambda|u|.
\end{align*}
From the previous step, the unique minimiser of $\phi_{z_j}$ is $S_\lambda(z_j)$. Therefore the unique minimiser of the separated sum $\sum_{j=1}^p \phi_{z_j}(\beta_j)$ is the vector $\hat{\beta} \in \mathbb{R}^p$ defined coordinatewise by
\begin{align*}
\hat{\beta}_j := S_\lambda(z_j),
\qquad j=1,\dots,p.
\end{align*}
Adding the constant $C$ does not change the minimiser, so the same vector uniquely minimises $Q$. This is exactly the penalised Lasso solution, and the stated formula follows.
[/step]