[proofplan]
We prove the two-sided minimax rate by combining the standard bias-variance upper bound for a boundary-corrected density estimator with an Assouad hypercube lower bound. The upper bound chooses the resolution
\begin{align*}
h_n\asymp n^{-1/(2s+d)},
\end{align*}
so the squared Hölder bias
\begin{align*}
h_n^{2s}
\end{align*}
and stochastic variance
\begin{align*}
(nh_n^d)^{-1}
\end{align*}
balance. The lower bound embeds a finite family of localized density perturbations of the uniform density; the $L^2$ separation is of order
\begin{align*}
h^{2s}
\end{align*}
while adjacent likelihoods have bounded Hellinger distance at the same resolution scale. These two estimates give matching constants in front of
\begin{align*}
n^{-2s/(2s+d)}.
\end{align*}
[/proofplan]
[step:Apply a boundary-corrected estimator to obtain the upper bound]
Use a boundary-corrected compactly supported orthonormal wavelet basis on $[0,1]^d$ with regularity strictly larger than $s$. Let $V_h$ be the span of wavelets up to spatial scale $h$, so $\dim V_h\le C_Dh^{-d}$. Let $P_h$ be the $L^2$ projection onto $V_h$. Define the estimator as the measurable map
\begin{align*}
\hat f_{n,h}:[0,1]^{dn}\to L^2([0,1]^d)
\end{align*}
obtained by replacing each coefficient $\langle f,\varphi\rangle$ of $P_hf$ by the empirical average $n^{-1}\sum_{i=1}^n\varphi(X_i)$ over the finitely many basis functions $\varphi\in V_h$. The boundary-wavelet approximation theorem for Hölder balls gives
\begin{align*}
\|P_hf-f\|_{L^2([0,1]^d)}^2\le C_1h^{2s}
\end{align*}
for every $f\in\mathcal H^s(L;[0,1]^d)$. Write $\mathbb E_f$ and $\operatorname{Var}_f$ for one-sample expectation and variance when $X_1$ has density $f$. Orthogonality and independence give the integrated variance bound
\begin{align*}
\mathbb E_{f,n}\|\hat f_{n,h}-P_hf\|_{L^2([0,1]^d)}^2=\sum_{\varphi\in V_h}\operatorname{Var}_f\left(\frac{1}{n}\sum_{i=1}^n\varphi(X_i)\right)\le \frac{1}{n}\sum_{\varphi\in V_h}\mathbb E_f[\varphi(X_1)^2].
\end{align*}
Since $f\le B$ and the basis is orthonormal in $L^2([0,1]^d)$, the last expression is at most $B\dim(V_h)/n\le C_2/(nh^d)$. Therefore, for every $f\in\mathcal F$,
\begin{align*}
\mathbb E_{f,n}\left[\int_{[0,1]^d}|\hat f_{n,h}(x)-f(x)|^2\,d\mathcal L^d(x)\right]\le C_1 h^{2s}+\frac{C_2}{n h^d}.
\end{align*}
The first term is the squared approximation error of the boundary-corrected projection on the Hölder ball, and the second term is the integrated variance under the product density law. Choose
\begin{align*}
h_n=n^{-1/(2s+d)}.
\end{align*}
If necessary replace $h_n$ by the nearest admissible dyadic resolution; this changes the estimate only by a constant depending on $s$ and $d$. Therefore there is $C=C(s,d,L,b,B)>0$ such that
\begin{align*}
R_n(\mathcal F)\le C n^{-2s/(2s+d)}.
\end{align*}
[guided]
The upper bound is the familiar bias-variance calculation, but the boundary of $[0,1]^d$ matters: an ordinary translation-invariant kernel can lose its approximation order near the boundary. Use a boundary-corrected compactly supported orthonormal wavelet basis with regularity greater than $s$. At resolution $h$, let $V_h$ be the finite-dimensional space spanned by wavelets up to that scale, with $\dim V_h\le C_Dh^{-d}$. Define
\begin{align*}
\hat f_{n,h}:[0,1]^{dn}\to L^2([0,1]^d)
\end{align*}
by estimating each projection coefficient by the sample average $n^{-1}\sum_{i=1}^n\varphi(X_i)$. The boundary-wavelet approximation bound gives $\|P_hf-f\|_2^2\le C_1h^{2s}$ for $f\in\mathcal H^s(L;[0,1]^d)$. In this paragraph $\mathbb E_f$ denotes one-sample expectation under density $f$. For the variance, orthogonality gives a sum over coefficients, and since $f\le B$,
\begin{align*}
\mathbb E_{f,n}\|\hat f_{n,h}-P_hf\|_2^2\le\frac{1}{n}\sum_{\varphi\in V_h}\mathbb E_f[\varphi(X_1)^2]\le\frac{B\dim(V_h)}{n}\le\frac{C_2}{nh^d}.
\end{align*}
Adding squared bias and variance gives constants $C_1=C_1(s,d,L,B)>0$ and $C_2=C_2(s,d,B)>0$ with
\begin{align*}
\mathbb E_{f,n}\left[\int_{[0,1]^d}|\hat f_{n,h}(x)-f(x)|^2\,d\mathcal L^d(x)\right]\le C_1 h^{2s}+\frac{C_2}{n h^d}.
\end{align*}
Here $C_1h^{2s}$ is the squared Hölder approximation error, and $C_2/(nh^d)$ is the integrated variance coming from averaging $n$ independent observations over cells or wavelets of volume comparable to $h^d$.
To minimize the displayed upper bound, balance the two terms:
\begin{align*}
h^{2s}=\frac{1}{n h^d}.
\end{align*}
This is equivalent to $h^{2s+d}=n^{-1}$, so we choose
\begin{align*}
h_n=n^{-1/(2s+d)}.
\end{align*}
Substituting this value gives
\begin{align*}
h_n^{2s}=n^{-2s/(2s+d)}
\end{align*}
and
\begin{align*}
\frac{1}{n h_n^d}=n^{-1}n^{d/(2s+d)}=n^{-2s/(2s+d)}.
\end{align*}
Thus the estimator has risk at most a constant multiple of $n^{-2s/(2s+d)}$, uniformly over $f\in\mathcal F$. Taking the infimum over all estimators can only decrease the risk, hence
\begin{align*}
R_n(\mathcal F)\le C n^{-2s/(2s+d)}.
\end{align*}
[/guided]
[/step]
[step:Embed a localized Assouad hypercube inside the density class]
Choose once and for all a nonzero function
\begin{align*}
\psi:[0,1]^d\to\mathbb R
\end{align*}
in $C_c^\infty((0,1)^d)$ such that
\begin{align*}
\int_{[0,1]^d}\psi(x)\,d\mathcal L^d(x)=0.
\end{align*}
The bump $\psi$ is fixed as part of the proof, depending only on $d$ and the chosen Hölder norm convention, so all constants involving $\psi$ are absorbed into constants depending on $s,d,L,b,B,\rho$.
For $h\in(0,1]$, place $M\asymp h^{-d}$ coordinate-aligned cubes $Q_1,\dots,Q_M\subset[0,1]^d$ of side length comparable to $h$, with pairwise gaps also comparable to $h$. Let $x_j\in Q_j$ denote the coordinatewise minimal vertex of $Q_j$, and define localized bumps $\psi_j:[0,1]^d\to\mathbb R$ by
\begin{align*}
\psi_j(x)=h^s\psi\left(\frac{x-x_j}{h}\right).
\end{align*}
For each sign vector $\theta=(\theta_1,\dots,\theta_M)\in\{-1,1\}^M$, define $f_\theta:[0,1]^d\to\mathbb R$ by
\begin{align*}
f_\theta(x)=1+a\sum_{j=1}^M\theta_j\psi_j(x),
\end{align*}
where $a=a(s,d,L,b,B,\rho)>0$ is chosen sufficiently small. Since the supports of the $\psi_j$ are disjoint, the zero integral of $\psi$ gives
\begin{align*}
\int_{[0,1]^d}f_\theta(x)\,d\mathcal L^d(x)=1.
\end{align*}
Because the supports of the bumps are separated by gaps comparable to $h$, the Hölder seminorm across different supports is controlled by the same scale as the local seminorms. Under the convention $m=\lfloor s\rfloor$ and $\alpha=s-m\in[0,1)$, with the Hölder-Zygmund interpretation when $\alpha=0$, the scaling $\psi_j(x)=h^s\psi((x-x_j)/h)$ gives a uniform bound $\|\sum_j\theta_j\psi_j\|_{\mathcal H^s}\le C_3$ for a constant $C_3=C_3(s,d,\psi)>0$. The uniform bound gives $|a\sum_j\theta_j\psi_j(x)|\le aC_4h^s\le aC_4$ with $C_4=C_4(\psi)>0$. Choose
\begin{align*}
a\le \min\left\{\frac{\rho}{C_3},\frac{1-b}{C_4},\frac{B-1}{C_4}\right\}.
\end{align*}
Then the stated interior-radius hypothesis gives $f_\theta=1+g_\theta\in\mathcal H^s(L;[0,1]^d)$ for $g_\theta=a\sum_j\theta_j\psi_j$, while $0<h\le1$ gives $b\le f_\theta\le B$. Hence $\{f_\theta:\theta\in\{-1,1\}^M\}\subset\mathcal F$.
[/step]
[step:Use Assouad's lemma to prove the lower bound]
Let $\mathbb P_{\theta,n}$ denote the $n$-fold product law generated by the density $f_\theta$. Define the squared Hellinger distance between probability measures $P$ and $Q$ dominated by a measure $\mu$ by
\begin{align*}
H^2(P,Q)=\int \left(\sqrt{\frac{dP}{d\mu}}-\sqrt{\frac{dQ}{d\mu}}\right)^2\,d\mu,
\end{align*}
so $0\le H^2(P,Q)\le 2$. If $\theta$ and $\theta'$ differ only in coordinate $j$, then
\begin{align*}
\int_{[0,1]^d}|f_\theta(x)-f_{\theta'}(x)|^2\,d\mathcal L^d(x)=4a^2h^{2s+d}\int_{[0,1]^d}|\psi(u)|^2\,d\mathcal L^d(u).
\end{align*}
Since all densities in the hypercube are bounded below by $b$, the squared Hellinger distance between adjacent one-sample laws is bounded by a constant multiple of $a^2h^{2s+d}$. For the product laws this gives
\begin{align*}
H^2(\mathbb P_{\theta,n},\mathbb P_{\theta',n})\le C_5 n a^2 h^{2s+d},
\end{align*}
with $C_5=C_5(b,\psi)>0$. Choose
\begin{align*}
h_n=n^{-1/(2s+d)}.
\end{align*}
Then the adjacent Hellinger distances are bounded by a constant strictly below $2$ after reducing $a$ if necessary. We use the following finite-hypercube form of [Assouad's lemma](/theorems/5906). Suppose $\{P_\theta:\theta\in\{-1,1\}^M\}$ is a statistical experiment and $\{g_\theta\}$ is a family in a [metric space](/page/Metric%20Space) with squared loss $\ell(g,\tilde g)=\|g-\tilde g\|_{L^2([0,1]^d)}^2$. If, for each coordinate $j$, changing only $\theta_j$ changes $g_\theta$ on a disjoint coordinate block with squared $L^2$ separation at least $\delta^2$, and if every adjacent pair has Hellinger affinity at least $\kappa>0$, then
\begin{align*}
\inf_{\tilde g}\sup_\theta \mathbb E_\theta[\ell(g_\theta,\tilde g)]\ge \frac{\kappa}{4}M\delta^2.
\end{align*}
Here the infimum is over all measurable estimators in $L^2([0,1]^d)$. The first hypothesis holds by the construction of $\theta\mapsto f_\theta$. If $\theta$ and $\theta'$ differ only in coordinate $j$, then changing the $j$th sign changes the density only on $Q_j$, so the coordinatewise squared $L^2$ separation is
\begin{align*}
\delta^2=4a^2h_n^{2s+d}\int_{[0,1]^d}|\psi(u)|^2\,d\mathcal L^d(u).
\end{align*}
The second hypothesis holds because the preceding Hellinger estimate and the choice of $a$ give $H^2(\mathbb P_{\theta,n},\mathbb P_{\theta',n})\le 2-\eta$ for some $\eta=\eta(s,d,L,b,B,\rho,\psi)>0$ and every adjacent pair. Since Hellinger affinity is $1-H^2/2$ under this normalization, every adjacent affinity is at least $\eta/2$. Assouad's lemma therefore yields a constant $c_1=c_1(s,d,L,b,B,\rho)>0$ such that
\begin{align*}
R_n(\mathcal F)\ge c_1 M h_n^{2s+d}.
\end{align*}
Since $M\asymp h_n^{-d}$, this becomes
\begin{align*}
R_n(\mathcal F)\ge c n^{-2s/(2s+d)}
\end{align*}
for a constant $c=c(s,d,L,b,B,\rho)>0$.
[guided]
We now rebuild the lower-bound construction from the beginning, because Assouad's lemma needs an explicit finite cube of alternatives inside $\mathcal F$. Choose a nonzero bump
\begin{align*}
\psi:[0,1]^d\to\mathbb R
\end{align*}
in $C_c^\infty((0,1)^d)$ such that
\begin{align*}
\int_{[0,1]^d}\psi(x)\,d\mathcal L^d(x)=0.
\end{align*}
For a resolution $h\in(0,1]$, choose $M$ coordinate-aligned cubes $Q_1,\dots,Q_M\subset[0,1]^d$ of side length comparable to $h$ and separated by gaps comparable to $h$. A regular grid inside a fixed interior subcube gives constants $c_M,C_M>0$, depending only on $d$, such that
\begin{align*}
c_Mh^{-d}\le M\le C_Mh^{-d}.
\end{align*}
Let $x_j\in Q_j$ be the coordinatewise minimal vertex of $Q_j$, and define
\begin{align*}
\psi_j:[0,1]^d\to\mathbb R
\end{align*}
by
\begin{align*}
\psi_j(x)=h^s\psi\left(\frac{x-x_j}{h}\right).
\end{align*}
For each sign vector $\theta=(\theta_1,\dots,\theta_M)\in\{-1,1\}^M$, define
\begin{align*}
f_\theta:[0,1]^d\to\mathbb R
\end{align*}
by
\begin{align*}
f_\theta(x)=1+a\sum_{j=1}^M\theta_j\psi_j(x),
\end{align*}
where $a>0$ is a small constant to be fixed.
We verify that these alternatives are densities in $\mathcal F$. The supports of the $\psi_j$ are disjoint, and the change of variables $u=(x-x_j)/h$ gives
\begin{align*}
\int_{[0,1]^d}\psi_j(x)\,d\mathcal L^d(x)=h^{s+d}\int_{[0,1]^d}\psi(u)\,d\mathcal L^d(u)=0.
\end{align*}
Therefore
\begin{align*}
\int_{[0,1]^d}f_\theta(x)\,d\mathcal L^d(x)=1.
\end{align*}
The separated supports ensure that cross-support Hölder quotients are bounded by the same scale as the local quotients. The Hölder scaling, with $m=\lfloor s\rfloor$ and the Hölder-Zygmund convention when $s$ is an integer, gives a constant $C_3=C_3(s,d,\psi)>0$ such that
\begin{align*}
\left\|\sum_{j=1}^M\theta_j\psi_j\right\|_{\mathcal H^s}\le C_3.
\end{align*}
The uniform bound gives a constant $C_4=C_4(\psi)>0$ such that
\begin{align*}
\left|a\sum_{j=1}^M\theta_j\psi_j(x)\right|\le aC_4h^s\le aC_4.
\end{align*}
Choose
\begin{align*}
a\le \min\left\{\frac{\rho}{C_3},\frac{1-b}{C_4},\frac{B-1}{C_4}\right\}.
\end{align*}
Then the perturbation $g_\theta=a\sum_j\theta_j\psi_j$ has $\|g_\theta\|_{\mathcal H^s([0,1]^d)}\le\rho$, so the interior-radius hypothesis gives $f_\theta=1+g_\theta\in\mathcal H^s(L;[0,1]^d)$. The strict inequalities $b<1<B$ make the remaining two bounds positive, so $b\le f_\theta\le B$, and $f_\theta$ integrates to $1$. Therefore $f_\theta\in\mathcal F$ for every $\theta\in\{-1,1\}^M$.
Now compare adjacent vertices. Let $\theta,\theta'\in\{-1,1\}^M$ differ only in coordinate $j$. Then $f_\theta-f_{\theta'}=2a\theta_j\psi_j$ on $Q_j$ and is zero off $Q_j$. Hence
\begin{align*}
\int_{[0,1]^d}|f_\theta(x)-f_{\theta'}(x)|^2\,d\mathcal L^d(x)=4a^2h^{2s+d}\int_{[0,1]^d}|\psi(u)|^2\,d\mathcal L^d(u).
\end{align*}
This quantity is the coordinatewise squared-loss separation
\begin{align*}
\delta^2=4a^2h^{2s+d}\int_{[0,1]^d}|\psi(u)|^2\,d\mathcal L^d(u).
\end{align*}
We also need adjacent statistical experiments to remain close. Let $\mathbb P_{\theta,n}$ denote the $n$-fold product law generated by $f_\theta$. For probability measures $P$ and $Q$ dominated by a measure $\mu$, define
\begin{align*}
H^2(P,Q)=\int \left(\sqrt{\frac{dP}{d\mu}}-\sqrt{\frac{dQ}{d\mu}}\right)^2\,d\mu.
\end{align*}
Since every $f_\theta$ is bounded below by $b$, the pointwise inequality
\begin{align*}
(\sqrt{u}-\sqrt{v})^2\le \frac{(u-v)^2}{b}
\end{align*}
for $u,v\ge b$ gives a one-sample Hellinger bound bounded by a constant multiple of $a^2h^{2s+d}$. [Tensorization of Hellinger affinity](/theorems/5908) for independent samples then gives a constant $C_5=C_5(b,\psi)>0$ such that
\begin{align*}
H^2(\mathbb P_{\theta,n},\mathbb P_{\theta',n})\le C_5na^2h^{2s+d}.
\end{align*}
Choose
\begin{align*}
h_n=n^{-1/(2s+d)}.
\end{align*}
After reducing $a$ if necessary, this gives
\begin{align*}
H^2(\mathbb P_{\theta,n},\mathbb P_{\theta',n})\le 2-\eta
\end{align*}
for some $\eta>0$ and every adjacent pair. Since the Hellinger affinity equals $1-H^2/2$ under this normalization, every adjacent affinity is at least $\eta/2$.
We apply the finite-hypercube form of Assouad's lemma: if the coordinatewise squared separation is at least $\delta^2$ and adjacent affinities are at least $\kappa>0$, then
\begin{align*}
\inf_{\tilde f_n}\sup_\theta\mathbb E_\theta\left[\int_{[0,1]^d}|\tilde f_n(x)-f_\theta(x)|^2\,d\mathcal L^d(x)\right]\ge \frac{\kappa}{4}M\delta^2.
\end{align*}
For completeness, this form follows by assigning to any estimator $\tilde f_n$ a sign estimate for each coordinate according to which of the two adjacent alternatives is closer on the block $Q_j$. On the event that the $j$th sign is misclassified, the disjoint-block $L^2$ loss is at least $\delta^2/4$. The two-point testing lower bound gives misclassification probability at least one half of the Hellinger affinity of the adjacent laws, hence at least $\kappa/2$. Summing the block losses over disjoint $Q_j$ gives the displayed constant after weakening constants.
Here $\kappa=\eta/2$, and the hypercube lies inside $\mathcal F$, so
\begin{align*}
R_n(\mathcal F)\ge c_1M h_n^{2s+d}
\end{align*}
for a constant $c_1=c_1(s,d,L,b,B,\psi)>0$. Finally $M\asymp h_n^{-d}$, so
\begin{align*}
R_n(\mathcal F)\ge c h_n^{2s}=c n^{-2s/(2s+d)}.
\end{align*}
[/guided]
[/step]
[step:Combine the two estimates to obtain the minimax equivalence]
The upper bound gives $R_n(\mathcal F)\le Cn^{-2s/(2s+d)}$, while the Assouad construction gives $R_n(\mathcal F)\ge cn^{-2s/(2s+d)}$. Therefore
\begin{align*}
c n^{-2s/(2s+d)}\le R_n(\mathcal F)\le C n^{-2s/(2s+d)}.
\end{align*}
The constants $c,C>0$ depend only on $s,d,L,b,B,\rho$ and the fixed Hölder norm convention, and not on $n$. This is exactly
\begin{align*}
R_n(\mathcal F)\asymp n^{-2s/(2s+d)}.
\end{align*}
[/step]