[proofplan]
We evaluate the supremum over the $\ell_2$-ball in closed form using the Cauchy--Schwarz inequality, then apply Jensen's inequality (concavity of $\sqrt{\cdot}$) to pass from $\mathbb{E}\|v\|_2$ to $(\mathbb{E}\|v\|_2^2)^{1/2}$. The cross terms in $\mathbb{E}\|\sum_i \varepsilon_i x_i\|_2^2$ vanish by independence and mean-zero of the Rademacher variables, leaving the sum $\sum_i \|x_i\|_2^2$. The population-level bound follows by taking expectations over the data and using the i.i.d. assumption.
[/proofplan]
[step:Evaluate the supremum over $\|\beta\|_2 \leq \lambda$ in closed form]
The empirical Rademacher complexity is
\begin{align*}
\hat{R}(\mathcal{H}(x_{1:n})) = \mathbb{E}\!\left[\sup_{\beta:\,\|\beta\|_2 \leq \lambda} \frac{1}{n}\sum_{i=1}^n \varepsilon_i x_i^\top \beta\right].
\end{align*}
Since the sum is linear in $\beta$, we may write
\begin{align*}
\sup_{\beta:\,\|\beta\|_2 \leq \lambda} \frac{1}{n}\sum_{i=1}^n \varepsilon_i x_i^\top \beta = \sup_{\beta:\,\|\beta\|_2 \leq \lambda} \frac{1}{n}\beta^\top\!\left(\sum_{i=1}^n \varepsilon_i x_i\right).
\end{align*}
Define $v := \sum_{i=1}^n \varepsilon_i x_i \in \mathbb{R}^p$. The Cauchy--Schwarz inequality in $\mathbb{R}^p$ gives $\beta^\top v \leq \|\beta\|_2 \|v\|_2$ for all $\beta, v \in \mathbb{R}^p$, with equality when $\beta$ is a positive scalar multiple of $v$. Since $\|\beta\|_2 \leq \lambda$, the supremum is attained at $\beta^* = \lambda v / \|v\|_2$ (when $v \neq 0$; the case $v = 0$ gives a supremum of $0$), yielding
\begin{align*}
\sup_{\beta:\,\|\beta\|_2 \leq \lambda} \frac{1}{n}\beta^\top v = \frac{\lambda}{n}\|v\|_2 = \frac{\lambda}{n}\!\left\|\sum_{i=1}^n \varepsilon_i x_i\right\|_2.
\end{align*}
Therefore
\begin{align*}
\hat{R}(\mathcal{H}(x_{1:n})) = \frac{\lambda}{n}\,\mathbb{E}\!\left\|\sum_{i=1}^n \varepsilon_i x_i\right\|_2.
\end{align*}
[guided]
Why can we evaluate the supremum exactly? The constraint set $\{\beta : \|\beta\|_2 \leq \lambda\}$ is the closed $\ell_2$-ball of radius $\lambda$, and the objective $\beta \mapsto \beta^\top v$ is a linear functional. The maximum of a linear functional over an $\ell_2$-ball is a classical duality result: $\sup_{\|\beta\|_2 \leq \lambda} \beta^\top v = \lambda\|v\|_2$. This is the content of the Cauchy--Schwarz inequality: $\beta^\top v \leq \|\beta\|_2\|v\|_2 \leq \lambda\|v\|_2$, with equality achieved by choosing $\beta = \lambda v/\|v\|_2$.
[/guided]
[/step]
[step:Apply Jensen's inequality to pass from $\mathbb{E}\|v\|_2$ to $(\mathbb{E}\|v\|_2^2)^{1/2}$]
The function $t \mapsto \sqrt{t}$ is concave on $[0, \infty)$. By [Jensen's Inequality](/theorems/1977) applied to the random variable $\|v\|_2^2 = \|\sum_{i=1}^n \varepsilon_i x_i\|_2^2$ (which is non-negative) and the concave function $\sqrt{\cdot}$:
\begin{align*}
\mathbb{E}\!\left\|\sum_{i=1}^n \varepsilon_i x_i\right\|_2 = \mathbb{E}\!\left[\sqrt{\left\|\sum_{i=1}^n \varepsilon_i x_i\right\|_2^2}\right] \leq \sqrt{\mathbb{E}\!\left\|\sum_{i=1}^n \varepsilon_i x_i\right\|_2^2}.
\end{align*}
Substituting into the expression from the previous step:
\begin{align*}
\hat{R}(\mathcal{H}(x_{1:n})) \leq \frac{\lambda}{n}\left(\mathbb{E}\!\left\|\sum_{i=1}^n \varepsilon_i x_i\right\|_2^2\right)^{1/2}.
\end{align*}
[guided]
Jensen's inequality for concave functions reverses the direction: $\mathbb{E}[\varphi(Z)] \leq \varphi(\mathbb{E}[Z])$ when $\varphi$ is concave. Here $\varphi(t) = \sqrt{t}$ is concave (since $\varphi''(t) = -1/(4t^{3/2}) < 0$ for $t > 0$), so $\mathbb{E}[\sqrt{Z}] \leq \sqrt{\mathbb{E}[Z]}$ for any non-negative random variable $Z$. We apply this with $Z = \|\sum_i \varepsilon_i x_i\|_2^2$.
[/guided]
[/step]
[step:Compute $\mathbb{E}\|\sum_{i=1}^n \varepsilon_i x_i\|_2^2$ by expanding and eliminating cross terms]
Expand the squared norm:
\begin{align*}
\mathbb{E}\!\left\|\sum_{i=1}^n \varepsilon_i x_i\right\|_2^2 = \mathbb{E}\!\left[\sum_{i=1}^n \sum_{j=1}^n \varepsilon_i \varepsilon_j\, x_i^\top x_j\right] = \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}[\varepsilon_i \varepsilon_j]\, x_i^\top x_j.
\end{align*}
Since $\varepsilon_1, \ldots, \varepsilon_n$ are independent Rademacher variables (each taking values $\pm 1$ with equal probability), their moments satisfy:
\begin{align*}
\mathbb{E}[\varepsilon_i \varepsilon_j] = \begin{cases} \mathbb{E}[\varepsilon_i^2] = 1 & \text{if } i = j, \\ \mathbb{E}[\varepsilon_i]\,\mathbb{E}[\varepsilon_j] = 0 & \text{if } i \neq j, \end{cases}
\end{align*}
where the $i \neq j$ case uses independence and $\mathbb{E}[\varepsilon_i] = 0$. Therefore all cross terms vanish and
\begin{align*}
\mathbb{E}\!\left\|\sum_{i=1}^n \varepsilon_i x_i\right\|_2^2 = \sum_{i=1}^n x_i^\top x_i = \sum_{i=1}^n \|x_i\|_2^2.
\end{align*}
[guided]
This computation is the analogue of the variance of a sum of independent mean-zero random vectors: $\operatorname{Var}(\sum_i Z_i) = \sum_i \operatorname{Var}(Z_i)$ when the $Z_i$ are independent with mean zero. Here each $Z_i = \varepsilon_i x_i$ has $\mathbb{E}[Z_i] = x_i \mathbb{E}[\varepsilon_i] = 0$ and $\mathbb{E}[\|Z_i\|_2^2] = \|x_i\|_2^2 \mathbb{E}[\varepsilon_i^2] = \|x_i\|_2^2$. The cross terms $\mathbb{E}[Z_i^\top Z_j] = x_i^\top x_j \mathbb{E}[\varepsilon_i \varepsilon_j] = 0$ for $i \neq j$ by independence and the mean-zero property.
[/guided]
[/step]
[step:Assemble the empirical bound]
Substituting the result of the previous step:
\begin{align*}
\hat{R}(\mathcal{H}(x_{1:n})) \leq \frac{\lambda}{n}\left(\sum_{i=1}^n \|x_i\|_2^2\right)^{1/2},
\end{align*}
which is the first stated bound.
[/step]
[step:Derive the population-level bound by taking expectations over the data]
Suppose $X_1, \ldots, X_n$ are i.i.d. copies of a random vector $X \in \mathbb{R}^p$. The population Rademacher complexity is $R_n(\mathcal{H}) = \mathbb{E}_{X_{1:n}}[\hat{R}(\mathcal{H}(X_{1:n}))]$. We return to the exact expression from the first step (before applying Jensen) and take the full expectation over both $\varepsilon$ and $X$:
\begin{align*}
R_n(\mathcal{H}) = \frac{\lambda}{n}\,\mathbb{E}_{\varepsilon, X}\!\left\|\sum_{i=1}^n \varepsilon_i X_i\right\|_2.
\end{align*}
Applying Jensen's inequality as before (concavity of $\sqrt{\cdot}$, now over the joint randomness):
\begin{align*}
R_n(\mathcal{H}) \leq \frac{\lambda}{n}\left(\mathbb{E}_{\varepsilon, X}\!\left\|\sum_{i=1}^n \varepsilon_i X_i\right\|_2^2\right)^{1/2}.
\end{align*}
By the same cross-term cancellation (the $\varepsilon_i$ are independent of everything and of each other, with $\mathbb{E}[\varepsilon_i] = 0$):
\begin{align*}
\mathbb{E}_{\varepsilon, X}\!\left\|\sum_{i=1}^n \varepsilon_i X_i\right\|_2^2 = \sum_{i=1}^n \mathbb{E}\!\left[\|X_i\|_2^2\right] = n\,\mathbb{E}\!\left[\|X\|_2^2\right],
\end{align*}
where the last equality uses the identical distribution of $X_1, \ldots, X_n$. Therefore
\begin{align*}
R_n(\mathcal{H}) \leq \frac{\lambda}{n}\left(n\,\mathbb{E}\!\left[\|X\|_2^2\right]\right)^{1/2} = \frac{\lambda\,(\mathbb{E}\|X\|_2^2)^{1/2}}{\sqrt{n}},
\end{align*}
which is the second stated bound. This completes the proof.
[guided]
The population-level bound exhibits the standard $1/\sqrt{n}$ rate: the Rademacher complexity of the $\ell_2$-constrained linear class decreases at rate $n^{-1/2}$. The factor $\lambda(\mathbb{E}\|X\|_2^2)^{1/2}$ reflects the interaction between the complexity of the hypothesis class (controlled by the radius $\lambda$ of the $\ell_2$-ball) and the scale of the data (controlled by $\mathbb{E}\|X\|_2^2$). A larger $\ell_2$ constraint or higher-dimensional data increases the Rademacher complexity, while more data points decrease it -- this is the bias-complexity tradeoff quantified in a single formula.
[/guided]
[/step]