[proofplan]
This theorem is quoted in the notes as a standard random-matrix input rather than derived from earlier course results. The external source is the classical Marchenko-Pastur sample-covariance theorem, in the Bai-Silverstein finite-fourth-moment formulation: for centered variance-one independent entries with uniformly bounded fourth moments and $p_n/n\to c\in(0,\infty)$, the empirical spectral distribution of the matrix
\begin{align*}
n^{-1}Y_n^\top Y_n
\end{align*}
converges weakly in probability to the Marchenko-Pastur probability measure with the displayed density and zero atom. The proof below verifies that the normalized matrices satisfy the theorem's hypotheses, records the quoted theorem precisely, and then transfers the conclusion back to $S_n/\sigma^2$.
[/proofplan]
[step:Normalize the matrix and define the empirical spectral measures]
For each $n \in \mathbb{N}$, write $X_{n,ij}$ for the $(i,j)$ entry of $X_n$ and define the normalized matrix
\begin{align*}
Y_n: \{1,\dots,n\} \times \{1,\dots,p_n\} &\to \mathbb{R}, \\
(i,j) &\mapsto \sigma^{-1} X_{n,ij}.
\end{align*}
Then the entries of $Y_n$ are independent and identically distributed, have mean $0$, variance $1$, and fourth moments bounded uniformly in $n$. All convergence in probability below is with respect to the underlying probability measure $\mathbb{P}$. We write $\mathcal{L}^1$ for one-dimensional Lebesgue measure on $\mathbb{R}$. Define
\begin{align*}
A_n := \frac{1}{n}Y_n^\top Y_n \in \mathbb{R}^{p_n \times p_n}.
\end{align*}
The empirical spectral distribution of $S_n/\sigma^2$ is the probability measure
\begin{align*}
\mu_n := \frac{1}{p_n}\sum_{k=1}^{p_n} \delta_{\lambda_{k,n}},
\end{align*}
where $\lambda_{1,n},\dots,\lambda_{p_n,n}$ are the eigenvalues of $A_n$, counted with algebraic multiplicity, and $\delta_t$ denotes the Dirac probability measure at $t\in\mathbb R$.
[/step]
[step:Invoke the quoted Marchenko-Pastur sample-covariance theorem]
Define the open upper half-plane $\mathbb{C}_+ := \{w \in \mathbb{C}: \operatorname{Im}(w)>0\}$. Let $I_{p_n}\in\mathbb R^{p_n\times p_n}$ denote the identity matrix. Define the Stieltjes transform of $\mu_n$ as the map
\begin{align*}
m_n: \mathbb{C}_+ &\to \mathbb{C}_+, \\
z &\mapsto \int_{\mathbb{R}} \frac{1}{x-z}\,d\mu_n(x)
= \frac{1}{p_n}\operatorname{Tr}\bigl((A_n-zI_{p_n})^{-1}\bigr).
\end{align*}
Since $A_n$ is real symmetric and non-negative definite, every eigenvalue of $A_n$ is real, so $A_n-zI_{p_n}$ is invertible for every $z \in \mathbb{C}_+$.
We use the following quoted theorem of Marchenko-Pastur, in the Bai-Silverstein sample-covariance form. If the entries of $Y_n$ are independent and identically distributed, have mean $0$, variance $1$, uniformly bounded fourth moments, and $p_n/n \to c \in (0,\infty)$, then the empirical spectral distribution of
\begin{align*}
n^{-1}Y_n^\top Y_n
\end{align*}
converges weakly in probability to the probability measure $\mu_c$ whose Stieltjes transform is the unique analytic map $m_c: \mathbb{C}_+ \to \mathbb{C}_+$ satisfying $\operatorname{Im} m_c(z)>0$ and
\begin{align*}
z = -\frac{1}{m_c(z)} + \frac{1}{1+c m_c(z)}.
\end{align*}
The hypotheses of this quoted theorem are exactly the normalized hypotheses verified in the previous step: independence and identical distribution, mean $0$, variance $1$, uniformly bounded fourth moments, and $p_n/n \to c$. Therefore
\begin{align*}
\mu_n \xrightarrow{\mathbb{P}} \mu_c
\end{align*}
weakly. The same quoted theorem gives the Stieltjes-transform equation above; solving its quadratic equation gives
\begin{align*}
m_c(z)=\frac{1-c-z+\sqrt{(z-a_c)(z-b_c)}}{2cz},
\end{align*}
where the square root is the analytic branch on $\mathbb{C}\setminus[a_c,b_c]$ satisfying $\sqrt{(z-a_c)(z-b_c)}\sim z$ as $|z|\to\infty$.
[/step]
[step:Invert the limiting transform to obtain the Marchenko-Pastur density]
Let $x \in (a_c,b_c)$. With the stated branch of the square root, the non-tangential boundary value from $\mathbb{C}_+$ is
\begin{align*}
\sqrt{(x-a_c+i0)(x-b_c+i0)} = i\sqrt{(b_c-x)(x-a_c)}.
\end{align*}
Therefore the boundary imaginary part of $m_c$ is
\begin{align*}
\frac{1}{\pi}\operatorname{Im} m_c(x+i0)
= \frac{1}{2\pi c x}\sqrt{(b_c-x)(x-a_c)}.
\end{align*}
At the endpoints $a_c$ and $b_c$ the square-root term vanishes, so the same formula defines a locally integrable density on $[a_c,b_c]$. Define the absolutely continuous measure $\nu_c$ on $\mathbb{R}$ by
\begin{align*}
\nu_c(E) := \int_E \frac{1}{2\pi c x}\sqrt{(b_c-x)(x-a_c)}\,\mathbb{1}_{[a_c,b_c]}(x)\,d\mathcal{L}^1(x)
\end{align*}
for every Borel set $E \subset \mathbb{R}$. The Stieltjes inversion theorem applies because $m_c$ is analytic on $\mathbb{C}_+$ and has the displayed boundary imaginary part for $\mathcal{L}^1$-almost every $x \in \mathbb{R}$. It identifies the absolutely continuous part of the limiting measure as
\begin{align*}
f_c(x)\,d\mathcal{L}^1(x)
= \frac{1}{2\pi c x}\sqrt{(b_c-x)(x-a_c)}\,\mathbb{1}_{[a_c,b_c]}(x)\,d\mathcal{L}^1(x).
\end{align*}
To check the continuous mass, substitute $x=1+c-2\sqrt c\cos\theta$ on $[a_c,b_c]$. This gives
\begin{align*}
\int_{a_c}^{b_c}
\frac{1}{2\pi c x}\sqrt{(b_c-x)(x-a_c)}\,d\mathcal{L}^1(x)
=
\begin{cases}
1, & 0<c\le 1,\\
1/c, & c>1.
\end{cases}
\end{align*}
Thus the absolutely continuous part already has total mass $1$ when $0<c\le 1$, and it has mass $1/c$ when $c>1$, leaving exactly the possible mass $1-1/c$ at $0$ computed in the next step.
[/step]
[step:Account for the zero eigenvalue atom when $c>1$]
If $c>1$, then eventually $p_n>n$. Since
\begin{align*}
A_n=n^{-1}Y_n^\top Y_n
\end{align*}
has rank at most $n$, it has at least $p_n-n$ zero eigenvalues. Hence
\begin{align*}
\mu_n(\{0\}) \ge \frac{p_n-n}{p_n} = 1-\frac{n}{p_n} \to 1-\frac{1}{c}.
\end{align*}
The limiting Stieltjes transform has exactly this atom and no larger one. Indeed, for a probability measure $\mu$, an atom of mass $\alpha$ at $0$ contributes $-\alpha/z$ to the transform $\int_{\mathbb{R}}(x-z)^{-1}\,d\mu(x)$, so $\alpha=-\lim_{\eta\downarrow 0} i\eta m(i\eta)$ whenever the limit exists. For $c>1$, the chosen square-root branch satisfies
\begin{align*}
\sqrt{(0-a_c)(0-b_c)}=1-c,
\end{align*}
and therefore
\begin{align*}
m_c(z)=\frac{1-c-z+\sqrt{(z-a_c)(z-b_c)}}{2cz}
=-\frac{1-1/c}{z}+r(z),
\end{align*}
where $r$ is bounded on some punctured neighbourhood of $0$ in $\mathbb{C}_+$. Hence
\begin{align*}
-\lim_{\eta\downarrow 0} i\eta\,m_c(i\eta)=1-\frac{1}{c}.
\end{align*}
When $0<c\le 1$, the numerator has no negative simple-pole coefficient at $z=0$, so the atom at $0$ has mass $0$. Thus the limiting Marchenko-Pastur law has an atom of mass $1-1/c$ at $0$ when $c>1$, and no such atom when $0<c\le 1$.
[/step]
[step:Conclude weak convergence in probability]
The quoted Marchenko-Pastur theorem already supplies [weak convergence](/page/Weak%20Convergence) in probability of the empirical spectral distribution of
\begin{align*}
n^{-1}Y_n^\top Y_n
\end{align*}
to $\mu_c$. Equivalently, for every bounded continuous function $\varphi:\mathbb{R}\to\mathbb{R}$,
\begin{align*}
\int_{\mathbb{R}}\varphi\,d\mu_n
-
\int_{\mathbb{R}}\varphi\,d\mu_c
\xrightarrow{\mathbb{P}}0.
\end{align*}
Therefore
\begin{align*}
\mu_n \xrightarrow{\mathbb{P}} \mu_c
\end{align*}
weakly, where $\mu_c$ is the Marchenko-Pastur distribution with density $f_c$ on $[a_c,b_c]$ and, for $c>1$, an atom of mass $1-1/c$ at $0$. Since $\mu_n$ is precisely the empirical spectral distribution of $S_n/\sigma^2$, this proves the theorem.
[/step]