[proofplan]
We regard $\mathbb{R}^{n \times d}$ as Euclidean space with the Frobenius norm, so the Gaussian matrix $G$ is a standard Gaussian vector in dimension $nd$. The singular value maps $A \mapsto s_1(A)$ and $A \mapsto s_d(A)$ are $1$-Lipschitz with respect to the Frobenius norm, by the singular-value perturbation inequality. Gordon's Gaussian comparison theorem gives the expectation bounds $\mathbb{E}[s_1(G)] \leq \sqrt n+\sqrt d$ and $\mathbb{E}[s_d(G)] \geq \sqrt n-\sqrt d$. Gaussian concentration for Lipschitz functions then converts these mean bounds into the two stated tail estimates.
[/proofplan]
[step:Put the Gaussian matrix in Euclidean form]
Let $\mathbb{R}^{n \times d}$ be equipped with the Frobenius [inner product](/page/Inner%20Product) and Frobenius norm
\begin{align*}
(A,B)_F &:= \operatorname{tr}(A^\top B), &
\|A\|_F &:= \left(\sum_{i=1}^n \sum_{j=1}^d A_{ij}^2\right)^{1/2}.
\end{align*}
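As a quick consistency check, the Frobenius norm is the norm induced by the Frobenius inner product. Expanding the trace entrywise gives
\begin{align*}
(A,A)_F = \operatorname{tr}(A^\top A) = \sum_{j=1}^d (A^\top A)_{jj} = \sum_{j=1}^d \sum_{i=1}^n A_{ij}^2 = \|A\|_F^2.
\end{align*}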
For a vector $x=(x_1,\dots,x_m) \in \mathbb{R}^m$, where $m \in \{d,n\}$, let
\begin{align*}
|x| := \left(\sum_{i=1}^m x_i^2\right)^{1/2}
\end{align*}
denote its Euclidean norm. Under the identification $\mathbb{R}^{n \times d} \cong \mathbb{R}^{nd}$ given by listing the matrix entries, the random matrix $G$ is a standard Gaussian vector in the Euclidean space $(\mathbb{R}^{n \times d},\|\cdot\|_F)$, because its coordinates are independent $\mathcal{N}(0,1)$ random variables.
Define the largest singular value map $\sigma_{\max}: \mathbb{R}^{n \times d} \to [0,\infty)$ by
\begin{align*}
\sigma_{\max}(A) := s_1(A).
\end{align*}
Define the smallest singular value map $\sigma_{\min}: \mathbb{R}^{n \times d} \to [0,\infty)$ by
\begin{align*}
\sigma_{\min}(A) := s_d(A).
\end{align*}
These are the two real-valued functions to which Gaussian concentration will be applied.
[/step]
[step:Verify that the extreme singular value maps are $1$-Lipschitz]
For any two matrices $A,B \in \mathbb{R}^{n \times d}$, the [Weyl-Mirsky singular value perturbation inequality](https://doi.org/10.1007/978-1-4612-0653-8) states that
\begin{align*}
|s_k(A)-s_k(B)| \leq \|A-B\|_{\mathrm{op}}
\end{align*}
for every $k \in \{1,\dots,d\}$, where $\|\cdot\|_{\mathrm{op}}$ is the operator norm for linear maps from $\mathbb{R}^d$ to $\mathbb{R}^n$. Since the operator norm is bounded by the Frobenius norm,
\begin{align*}
\|A-B\|_{\mathrm{op}} \leq \|A-B\|_F,
\end{align*}
we obtain
\begin{align*}
|\sigma_{\max}(A)-\sigma_{\max}(B)| \leq \|A-B\|_F.
\end{align*}
The same perturbation estimate with $k=d$ gives
\begin{align*}
|\sigma_{\min}(A)-\sigma_{\min}(B)| \leq \|A-B\|_F.
\end{align*}
Thus both $\sigma_{\max}$ and $\sigma_{\min}$ are $1$-Lipschitz functions on the Euclidean space $(\mathbb{R}^{n \times d},\|\cdot\|_F)$.
[/step]
[guided]The concentration theorem we will use applies to Lipschitz functions of a standard Gaussian vector. Therefore the first technical point is to check that changing the matrix a little in Frobenius norm can change either extreme singular value by at most the same amount.
Let $A,B \in \mathbb{R}^{n \times d}$. The [Weyl-Mirsky singular value perturbation inequality](https://doi.org/10.1007/978-1-4612-0653-8) gives, for each index $k \in \{1,\dots,d\}$,
\begin{align*}
|s_k(A)-s_k(B)| \leq \|A-B\|_{\mathrm{op}},
\end{align*}
where $\|A-B\|_{\mathrm{op}}$ is the operator norm of the [linear map](/page/Linear%20Map) $A-B:\mathbb{R}^d \to \mathbb{R}^n$. This is the singular-value analogue of Weyl's eigenvalue perturbation bound for self-adjoint matrices.
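The Gaussian vector lives naturally in the Frobenius geometry, not the operator-norm geometry, so we compare the two norms. For every matrix $M \in \mathbb{R}^{n \times d}$ and every $x \in \mathbb{R}^d$ with $|x|=1$, the Cauchy-Schwarz inequality applied to each row of $M$ gives
\begin{align*}
|Mx|^2
= \sum_{i=1}^n \left(\sum_{j=1}^d M_{ij}x_j\right)^2
\leq \sum_{i=1}^n \left(\sum_{j=1}^d M_{ij}^2\right)|x|^2
= \|M\|_F^2.
\end{align*}
Taking the supremum over all such $x$ yields
\begin{align*}
\|M\|_{\mathrm{op}}
= \sup_{x \in \mathbb{R}^d,\ |x|=1} |Mx|
\leq \left(\sum_{i=1}^n \sum_{j=1}^d M_{ij}^2\right)^{1/2}
= \|M\|_F.
\end{align*}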
Applying this with $M=A-B$ gives
\begin{align*}
|s_k(A)-s_k(B)| \leq \|A-B\|_F.
\end{align*}
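As a sanity check on a small example (the matrices below are an illustrative choice with $n=d=2$, not taken from the argument itself), consider
\begin{align*}
A = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
B = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \qquad
A-B = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\end{align*}
Then $s_1(A)=3$, $s_2(A)=1$, $s_1(B)=s_2(B)=2$, $\|A-B\|_{\mathrm{op}}=1$ and $\|A-B\|_F=\sqrt 2$, so for $k \in \{1,2\}$,
\begin{align*}
|s_k(A)-s_k(B)| = 1 \leq 1 = \|A-B\|_{\mathrm{op}} \leq \sqrt 2 = \|A-B\|_F.
\end{align*}
In this example the perturbation inequality holds with equality, while the comparison of norms is strict. We now return to arbitrary $A$ and $B$.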
Taking $k=1$ proves that $A \mapsto s_1(A)$ is $1$-Lipschitz, and taking $k=d$ proves that $A \mapsto s_d(A)$ is $1$-Lipschitz. This is the exact regularity needed for Gaussian concentration.[/guided]
[step:Insert Gordon's comparison bounds for the two expectations]
Let $S^{m-1}:=\{x \in \mathbb{R}^m: |x|=1\}$ denote the Euclidean unit sphere in $\mathbb{R}^m$. We use Gordon's rectangular Gaussian comparison inequality in the following form: if $H \in \mathbb{R}^{n \times d}$ has independent centered Gaussian entries of variance $1$, then
\begin{align*}
\mathbb{E}\left[\sup_{u \in S^{d-1}} |Hu|\right] &\leq \sqrt n+\sqrt d, &
\mathbb{E}\left[\inf_{u \in S^{d-1}} |Hu|\right] &\geq \sqrt n-\sqrt d.
\end{align*}
This is the rectangular form of [Gordon's Gaussian comparison inequality](https://doi.org/10.1007/BF02786070). Its hypotheses are met for $H=G$ because the entries of $G$ are independent centered Gaussian random variables with variance $1$. By the variational characterisations
\begin{align*}
s_1(G) &= \sup_{u \in S^{d-1}} |Gu|, &
s_d(G) &= \inf_{u \in S^{d-1}} |Gu|,
\end{align*}
we obtain
\begin{align*}
\mathbb{E}[s_1(G)] &\leq \sqrt n+\sqrt d, &
\mathbb{E}[s_d(G)] &\geq \sqrt n-\sqrt d.
\end{align*}
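For a sense of scale, take the illustrative dimensions $n=100$ and $d=25$ (chosen here only to make the arithmetic clean). Then $\sqrt n=10$ and $\sqrt d=5$, so the two bounds read
\begin{align*}
\mathbb{E}[s_1(G)] &\leq 10+5 = 15, &
\mathbb{E}[s_d(G)] &\geq 10-5 = 5.
\end{align*}
The lower bound carries information only when $n>d$: if $n \leq d$, then $\sqrt n-\sqrt d \leq 0$ and the bound follows trivially from $s_d(G) \geq 0$.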
[/step]
[step:Apply Gaussian concentration to the largest singular value]
The [Gaussian concentration inequality for Lipschitz functions](https://doi.org/10.1007/978-3-642-20212-4) states that if $X$ is a standard Gaussian vector in a finite-dimensional Euclidean space and $f$ is a real-valued $1$-Lipschitz function on that space, then for every $t \geq 0$,
\begin{align*}
\mathbb{P}\left(f(X) \geq \mathbb{E}[f(X)] + t\right) \leq e^{-t^2/2}.
\end{align*}
This applies with $X=G$ and $f=\sigma_{\max}$, by the Euclidean identification above and the $1$-Lipschitz estimate already proved. Therefore, for every $t \geq 0$,
\begin{align*}
\mathbb{P}\left(s_1(G) \geq \mathbb{E}[s_1(G)] + t\right) \leq e^{-t^2/2}.
\end{align*}
Since $\mathbb{E}[s_1(G)] \leq \sqrt n+\sqrt d$, the event
\begin{align*}
\left\{s_1(G) \geq \sqrt n+\sqrt d+t\right\}
\end{align*}
is contained in
\begin{align*}
\left\{s_1(G) \geq \mathbb{E}[s_1(G)] + t\right\}.
\end{align*}
Hence
\begin{align*}
\mathbb{P}\left(s_1(G) \geq \sqrt n+\sqrt d+t\right) \leq e^{-t^2/2}.
\end{align*}
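To read this bound quantitatively, one can solve $e^{-t^2/2}=\delta$ for a target failure probability $\delta \in (0,1]$ (the symbol $\delta$ is introduced here only for illustration). This gives $t=\sqrt{2\log(1/\delta)}$ and hence
\begin{align*}
\mathbb{P}\left(s_1(G) \geq \sqrt n+\sqrt d+\sqrt{2\log(1/\delta)}\right) \leq \delta.
\end{align*}
For instance, $t=3$ corresponds to $\delta=e^{-9/2} \approx 0.011$.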
[/step]
[step:Apply Gaussian concentration to the smallest singular value]
Define $-\sigma_{\min}: \mathbb{R}^{n \times d} \to \mathbb{R}$ by
\begin{align*}
(-\sigma_{\min})(A) := -s_d(A).
\end{align*}
Since $\sigma_{\min}$ is $1$-Lipschitz and negation preserves absolute differences, $-\sigma_{\min}$ is also $1$-Lipschitz. Applying Gaussian concentration to $f=-\sigma_{\min}$ gives, for every $t \geq 0$,
\begin{align*}
\mathbb{P}\left(-s_d(G) \geq \mathbb{E}[-s_d(G)] + t\right) \leq e^{-t^2/2}.
\end{align*}
Equivalently,
\begin{align*}
\mathbb{P}\left(s_d(G) \leq \mathbb{E}[s_d(G)] - t\right) \leq e^{-t^2/2}.
\end{align*}
By Gordon's lower expectation bound,
\begin{align*}
\mathbb{E}[s_d(G)] \geq \sqrt n-\sqrt d.
\end{align*}
Therefore the event
\begin{align*}
\left\{s_d(G) \leq \sqrt n-\sqrt d-t\right\}
\end{align*}
is contained in
\begin{align*}
\left\{s_d(G) \leq \mathbb{E}[s_d(G)]-t\right\}.
\end{align*}
Consequently,
\begin{align*}
\mathbb{P}\left(s_d(G) \leq \sqrt n-\sqrt d-t\right) \leq e^{-t^2/2}.
\end{align*}
Together with the largest-singular-value estimate, this proves both claimed Davidson-Szarek bounds.
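One common way to package the two estimates, stated here as a routine union-bound consequence rather than as part of the claimed result, is the following: for every $t \geq 0$,
\begin{align*}
\mathbb{P}\left(\sqrt n-\sqrt d-t < s_d(G) \leq s_1(G) < \sqrt n+\sqrt d+t\right) \geq 1-2e^{-t^2/2}.
\end{align*}
With the illustrative values $n=100$, $d=25$ and $t=3$, this says that with probability at least $1-2e^{-9/2} \approx 0.978$, every singular value of $G$ lies strictly between $2$ and $18$.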
[/step]