[proofplan]
We reduce the statement to the standard real Wishart soft-edge theorem of Johnstone, which gives Tracy-Widom $\beta = 1$ fluctuations for the largest eigenvalue of a real Gaussian sample covariance matrix with the precise centering and scaling used here. The only substantive verification is that the present matrix model is exactly the real Wishart/Laguerre orthogonal ensemble covered by that theorem and that the normalization is transferred from $X_n^\top X_n$ to $n^{-1}X_n^\top X_n$. The aspect-ratio hypothesis $p_n/n \to c \in (0,\infty)$ ensures that the Johnstone hypotheses apply and that the scale is of order $n^{-2/3}$ for the eigenvalues of $S_n$.
[/proofplan]
[step:Name the largest eigenvalue and the Wishart normalization]
For each $n$, let $p_n \in \mathbb{N}$ be the column dimension of $X_n$, and let
\begin{align*}
W_n := X_n^\top X_n \in \mathbb{R}^{p_n \times p_n}
\end{align*}
denote the unnormalised real Wishart matrix. Let $\Lambda_{1,n}$ denote the largest eigenvalue of $W_n$, and let $\lambda_{1,n}$ denote the largest eigenvalue of $S_n = n^{-1}W_n$. Since scalar multiplication by $n^{-1}$ multiplies every eigenvalue by $n^{-1}$, we have
\begin{align*}
\lambda_{1,n} = n^{-1}\Lambda_{1,n}.
\end{align*}
The entries of $X_n$ are independent real standard Gaussian random variables, so $W_n$ is the real Wishart matrix with identity covariance and parameters $(n,p_n)$. If $p_n > n$, then $W_n = X_n^\top X_n$ has $p_n - n$ additional zero eigenvalues, while $X_nX_n^\top$ and $X_n^\top X_n$ have the same non-zero eigenvalues with the same multiplicities by the singular-value decomposition. Hence the largest eigenvalue is unaffected by passing between $X_n^\top X_n$ and $X_nX_n^\top$.
[/step]
[step:Apply the real Wishart soft-edge theorem]
Define
\begin{align*}
\widetilde{\mu}_n &:= (\sqrt{n - 1/2} + \sqrt{p_n - 1/2})^2, \\
\widetilde{\sigma}_n &:= (\sqrt{n - 1/2} + \sqrt{p_n - 1/2})(1/\sqrt{n - 1/2} + 1/\sqrt{p_n - 1/2})^{1/3}.
\end{align*}
By the [Johnstone Tracy-Widom theorem for real Wishart matrices](https://doi.org/10.1214/aos/1009210544), applied to the real Gaussian matrix $X_n$ with identity covariance and aspect ratio $p_n/n \to c \in (0,\infty)$, the largest eigenvalue $\Lambda_{1,n}$ of $W_n = X_n^\top X_n$ satisfies the asserted soft-edge limit. If the theorem is stated with the smaller matrix dimension first, then in the case $p_n>n$ we apply it to $X_nX_n^\top$ instead; the preceding singular-value argument identifies its largest eigenvalue with $\Lambda_{1,n}$, and the displayed centering and scaling are symmetric in $n$ and $p_n$. Thus
\begin{align*}
\frac{\Lambda_{1,n} - \widetilde{\mu}_n}{\widetilde{\sigma}_n} \xrightarrow{d} \xi,
\end{align*}
where $\xi$ has the Tracy-Widom distribution $F_1$.
[guided]
The external theorem we need is the real Wishart soft-edge limit, often called Johnstone's Tracy-Widom theorem for the largest eigenvalue of a real Wishart matrix. Its hypotheses are: the matrix entries are independent real standard Gaussian random variables, the population covariance is the identity matrix, and the dimensions satisfy $p_n/n \to c$ for some $c \in (0,\infty)$. These are exactly the hypotheses in the theorem statement: $X_n$ has independent $\mathcal N(0,1)$ entries, so $X_n^\top X_n$ is a real identity-covariance Wishart matrix, and the aspect ratio assumption is $p_n/n \to c \in (0,\infty)$.
There is one dimensional convention to check. Some formulations state the real Wishart soft-edge theorem for the non-zero eigenvalues of the sample covariance matrix with the smaller dimension listed first. If $p_n \le n$, this is directly $X_n^\top X_n$. If $p_n > n$, then $X_n^\top X_n$ has $p_n-n$ zero eigenvalues, but the singular-value decomposition gives the same positive eigenvalues for $X_n^\top X_n$ and $X_nX_n^\top$, with the same multiplicities. Therefore the largest eigenvalue is the same in the two matrices. Since the centering $\widetilde{\mu}_n$ and scaling $\widetilde{\sigma}_n$ are symmetric in $n$ and $p_n$, applying Johnstone's theorem to $X_nX_n^\top$ in the case $p_n>n$ gives exactly the same displayed normalization.
The theorem gives the following convergence for the largest eigenvalue $\Lambda_{1,n}$ of the unnormalised Wishart matrix $W_n = X_n^\top X_n$:
\begin{align*}
\frac{\Lambda_{1,n} - \widetilde{\mu}_n}{\widetilde{\sigma}_n} \xrightarrow{d} \xi,
\end{align*}
where $\xi$ has distribution function $F_1$, the Tracy-Widom law with parameter $\beta = 1$. The centering and scaling are
\begin{align*}
\widetilde{\mu}_n &:= (\sqrt{n - 1/2} + \sqrt{p_n - 1/2})^2, \\
\widetilde{\sigma}_n &:= (\sqrt{n - 1/2} + \sqrt{p_n - 1/2})(1/\sqrt{n - 1/2} + 1/\sqrt{p_n - 1/2})^{1/3}.
\end{align*}
This is the only deep input in the proof: the Pfaffian kernel formula, the Laguerre asymptotics at the soft edge, and the Airy kernel identification are precisely the content of the cited soft-edge theorem, not additional unproved steps here.
[/guided]
[/step]
[step:Transfer the convergence from $W_n$ to $S_n$]
Define the normalized centering and scaling constants
\begin{align*}
\mu_n &:= n^{-1}\widetilde{\mu}_n, \\
\sigma_n &:= n^{-1}\widetilde{\sigma}_n.
\end{align*}
Using $\lambda_{1,n} = n^{-1}\Lambda_{1,n}$, we compute
\begin{align*}
\frac{\lambda_{1,n} - \mu_n}{\sigma_n}
&= \frac{n^{-1}\Lambda_{1,n} - n^{-1}\widetilde{\mu}_n}{n^{-1}\widetilde{\sigma}_n} \\
&= \frac{\Lambda_{1,n} - \widetilde{\mu}_n}{\widetilde{\sigma}_n}.
\end{align*}
The right-hand side converges in distribution to $F_1$ by the previous step, so
\begin{align*}
\frac{\lambda_{1,n} - \mu_n}{\sigma_n} \xrightarrow{d} \xi,
\end{align*}
where $\xi$ has distribution function $F_1$.
[guided]
The cited soft-edge theorem is stated for the unnormalised Wishart matrix $W_n=X_n^\top X_n$, while the theorem statement uses the normalized sample covariance matrix $S_n=n^{-1}W_n$. We therefore transfer the normalization explicitly. Define
\begin{align*}
\mu_n &:= n^{-1}\widetilde{\mu}_n, \\
\sigma_n &:= n^{-1}\widetilde{\sigma}_n.
\end{align*}
Since multiplying a matrix by the scalar $n^{-1}$ multiplies each of its eigenvalues by $n^{-1}$, the largest eigenvalues satisfy $\lambda_{1,n}=n^{-1}\Lambda_{1,n}$. Substituting this identity and the definitions of $\mu_n$ and $\sigma_n$ gives
\begin{align*}
\frac{\lambda_{1,n} - \mu_n}{\sigma_n}
&= \frac{n^{-1}\Lambda_{1,n} - n^{-1}\widetilde{\mu}_n}{n^{-1}\widetilde{\sigma}_n} \\
&= \frac{\Lambda_{1,n} - \widetilde{\mu}_n}{\widetilde{\sigma}_n}.
\end{align*}
The right-hand side converges in distribution to $\xi$ by Johnstone's theorem as applied in the previous step. Hence
\begin{align*}
\frac{\lambda_{1,n} - \mu_n}{\sigma_n} \xrightarrow{d} \xi,
\end{align*}
where $\xi$ has distribution function $F_1$.
[/guided]
[/step]
[step:Identify the Marchenko-Pastur edge scale]
The upper Marchenko-Pastur edge for the normalized covariance matrix $S_n = n^{-1}X_n^\top X_n$ is
\begin{align*}
b_c := (1 + \sqrt{c})^2.
\end{align*}
Since $p_n/n \to c$, we have
\begin{align*}
\mu_n
= \frac{(\sqrt{n - 1/2} + \sqrt{p_n - 1/2})^2}{n}
\longrightarrow (1 + \sqrt{c})^2 = b_c.
\end{align*}
Moreover
\begin{align*}
\sigma_n
= \frac{(\sqrt{n - 1/2} + \sqrt{p_n - 1/2})(1/\sqrt{n - 1/2} + 1/\sqrt{p_n - 1/2})^{1/3}}{n},
\end{align*}
and the first factor is of order $n^{1/2}$ while the second factor is of order $n^{-1/6}$. Hence $\sigma_n$ is of order $n^{-2/3}$. This is exactly the centering at the upper Marchenko-Pastur edge and the soft-edge scaling asserted in the theorem.
[/step]