[proofplan]
We use the $\ell_1$--$\ell_\infty$ duality (Holder's inequality) to reduce the supremum over the $\ell_1$-ball to the $\ell_\infty$-norm of the Rademacher average vector. We then bound the expected $\ell_\infty$-norm by the [Expected Maximum of Sub-Gaussians](/theorems/1963) bound, noting that for each coordinate $j$ the Rademacher sum $\frac{1}{n}\sum_{i=1}^n \varepsilon_i x_{ij}$ and its negation are both sub-Gaussian, giving $2p$ sub-Gaussian random variables whose maximum controls the $\ell_\infty$-norm.
[/proofplan]
[step:Apply Holder's inequality to evaluate the supremum over the $\ell_1$-ball]
The empirical Rademacher complexity is
\begin{align*}
\hat{R}(\mathcal{H}(x_{1:n})) = \mathbb{E}\!\left[\sup_{\beta:\,\|\beta\|_1 \leq \lambda} \frac{1}{n}\sum_{i=1}^n \varepsilon_i x_i^\top \beta\right] = \mathbb{E}\!\left[\sup_{\beta:\,\|\beta\|_1 \leq \lambda} \beta^\top\!\left(\frac{1}{n}\sum_{i=1}^n \varepsilon_i x_i\right)\right].
\end{align*}
Define the random vector $v := \frac{1}{n}\sum_{i=1}^n \varepsilon_i x_i \in \mathbb{R}^p$. By Holder's inequality with exponents $1$ and $\infty$ (since $\frac{1}{1} + \frac{1}{\infty} = 1$), for any $\beta, v \in \mathbb{R}^p$:
\begin{align*}
\beta^\top v \leq \|\beta\|_1 \|v\|_\infty.
\end{align*}
The supremum over $\|\beta\|_1 \leq \lambda$ is attained: let $j^* = \arg\max_j |v_j|$, and set $\beta^* = \lambda\, \operatorname{sgn}(v_{j^*})\, e_{j^*}$ (where $e_{j^*}$ is the $j^*$-th standard basis vector). Then $\|\beta^*\|_1 = \lambda$ and $(\beta^*)^\top v = \lambda |v_{j^*}| = \lambda\|v\|_\infty$. Therefore
\begin{align*}
\hat{R}(\mathcal{H}(x_{1:n})) = \lambda\,\mathbb{E}\!\left[\|v\|_\infty\right] = \lambda\,\mathbb{E}\!\left[\max_{j=1,\ldots,p}\left|\frac{1}{n}\sum_{i=1}^n \varepsilon_i x_{ij}\right|\right].
\end{align*}
[guided]
The $\ell_1$--$\ell_\infty$ duality is the analogue of the $\ell_2$ self-duality used in the [Rademacher Complexity of $\ell_2$-Constrained Class](/theorems/1982). For the $\ell_2$-ball, the supremum of a linear functional reduces to the $\ell_2$-norm of the coefficient vector (via Cauchy--Schwarz). For the $\ell_1$-ball, it reduces to the $\ell_\infty$-norm (via Holder). This explains why the $\ell_1$-constrained bound depends on $\max_j$ (coordinate-wise quantities) rather than on $\sum_i \|x_i\|_2^2$ (an aggregate): the $\ell_1$-ball is "spiky", so the worst-case hypothesis concentrates its weight on a single coordinate.
[/guided]
[/step]
[step:Rewrite the expected $\ell_\infty$-norm as the expected maximum of $2p$ sub-Gaussian variables]
Write
\begin{align*}
\max_{j=1,\ldots,p}\left|\frac{1}{n}\sum_{i=1}^n \varepsilon_i x_{ij}\right| = \max_{j=1,\ldots,p}\max\!\left\{\frac{1}{n}\sum_{i=1}^n \varepsilon_i x_{ij},\; -\frac{1}{n}\sum_{i=1}^n \varepsilon_i x_{ij}\right\} = \max_{k=1,\ldots,2p} W_k,
\end{align*}
where, for $j = 1, \ldots, p$, we define $W_j := \frac{1}{n}\sum_{i=1}^n \varepsilon_i x_{ij}$ and $W_{p+j} := -\frac{1}{n}\sum_{i=1}^n \varepsilon_i x_{ij}$.
Each $W_k$ is mean-zero: $\mathbb{E}[W_j] = \frac{1}{n}\sum_{i=1}^n x_{ij}\,\mathbb{E}[\varepsilon_i] = 0$, and similarly $\mathbb{E}[W_{p+j}] = 0$.
For the sub-Gaussian parameter of $W_j$: the summands $\varepsilon_i x_{ij}$ are independent (since the $\varepsilon_i$ are independent and the $x_{ij}$ are fixed constants). Each $\varepsilon_i x_{ij}$ takes values in $[-|x_{ij}|, |x_{ij}|]$, so by [Hoeffding's Lemma](/theorems/1956), $\varepsilon_i x_{ij}$ is sub-Gaussian with parameter $(2|x_{ij}|)/2 = |x_{ij}|$. By [Sub-Gaussian Stability Under Linear Combinations](/theorems/1960) applied with weights $\gamma_i = 1/n$ and parameters $\sigma_i = |x_{ij}|$:
\begin{align*}
W_j \text{ is sub-Gaussian with parameter } \sigma_j := \left(\sum_{i=1}^n \frac{x_{ij}^2}{n^2}\right)^{1/2} = \frac{1}{n}\left(\sum_{i=1}^n x_{ij}^2\right)^{1/2}.
\end{align*}
Since $W_{p+j} = -W_j$ and negation preserves the sub-Gaussian property with the same parameter, $W_{p+j}$ is also sub-Gaussian with parameter $\sigma_j$.
Define
\begin{align*}
\sigma_{\max} := \max_{j=1,\ldots,p} \sigma_j = \max_{j=1,\ldots,p} \frac{1}{n}\left(\sum_{i=1}^n x_{ij}^2\right)^{1/2}.
\end{align*}
Since a sub-Gaussian random variable with parameter $\sigma_j \leq \sigma_{\max}$ is also sub-Gaussian with parameter $\sigma_{\max}$, all $2p$ variables $W_1, \ldots, W_{2p}$ are sub-Gaussian with the common parameter $\sigma_{\max}$.
[/step]
[step:Apply the expected maximum bound and simplify]
By the [Expected Maximum of Sub-Gaussians](/theorems/1963) applied to the $2p$ mean-zero sub-Gaussian random variables $W_1, \ldots, W_{2p}$, each with parameter $\sigma_{\max}$ (independence is not required):
\begin{align*}
\mathbb{E}\!\left[\max_{k=1,\ldots,2p} W_k\right] \leq \sigma_{\max}\sqrt{2\log(2p)}.
\end{align*}
Substituting into the expression from the first step:
\begin{align*}
\hat{R}(\mathcal{H}(x_{1:n})) \leq \lambda \cdot \sigma_{\max} \cdot \sqrt{2\log(2p)} = \lambda \cdot \max_{j=1,\ldots,p}\left(\frac{1}{n}\sum_{i=1}^n x_{ij}^2\right)^{1/2} \cdot \frac{\sqrt{2\log(2p)}}{\sqrt{n}} \cdot \sqrt{n}.
\end{align*}
Expanding $\sigma_{\max}$ explicitly:
\begin{align*}
\hat{R}(\mathcal{H}(x_{1:n})) \leq \lambda \cdot \frac{1}{n}\left(\max_{j=1,\ldots,p}\sum_{i=1}^n x_{ij}^2\right)^{1/2} \cdot \sqrt{2\log(2p)} = \lambda \cdot \max_{j=1,\ldots,p}\left(\frac{1}{n}\sum_{i=1}^n x_{ij}^2\right)^{1/2} \cdot \frac{\sqrt{2\log(2p)}}{\sqrt{n}},
\end{align*}
where in the last step we used $\frac{1}{n}(\sum_i x_{ij}^2)^{1/2} = \frac{1}{\sqrt{n}}(\frac{1}{n}\sum_i x_{ij}^2)^{1/2}$. This is the first stated bound.
When $x_{ij} \in [-C, C]$ for all $i, j$, we have $\frac{1}{n}\sum_{i=1}^n x_{ij}^2 \leq \frac{1}{n} \cdot nC^2 = C^2$ for each $j$, so $\max_j (\frac{1}{n}\sum_i x_{ij}^2)^{1/2} \leq C$. Substituting:
\begin{align*}
\hat{R}(\mathcal{H}(x_{1:n})) \leq \frac{\lambda C\sqrt{2\log(2p)}}{\sqrt{n}},
\end{align*}
which is the second stated bound. This completes the proof.
[guided]
The key difference from the $\ell_2$-constrained bound is the logarithmic dependence on $p$. The $\ell_2$-constrained class has Rademacher complexity $O(\lambda(\mathbb{E}\|X\|_2^2)^{1/2}/\sqrt{n})$, which depends on $p$ only through $\|X\|_2^2$. The $\ell_1$-constrained class pays a $\sqrt{\log p}$ factor (from the maximum of $2p$ sub-Gaussians), but its bound depends on the coordinate-wise second moment $\max_j \frac{1}{n}\sum_i x_{ij}^2$ rather than the full squared norm. When $p$ is large but each coordinate is bounded by $C$, the $\ell_1$ bound scales as $O(\lambda C\sqrt{\log p/n})$, which can be much smaller than the $\ell_2$ bound if $\|X\|_2^2 \approx pC^2$ grows with $p$. This explains the practical effectiveness of $\ell_1$-regularisation (LASSO) in high-dimensional settings: the $\ell_1$ constraint produces a hypothesis class with better statistical properties when $p \gg n$.
[/guided]
[/step]