[guided]The Fourier-side estimate works because the two frequency regions remember different information about $f$. Low frequencies are controlled by the size of $\hat f$ itself, and the only general pointwise control available is the $L^1$ estimate
\begin{align*}
|\hat f(\xi)| \leq \frac{1}{(2\pi)^{n/2}}\|f\|_{L^1(\mathbb{R}^n)}.
\end{align*}
This estimate is valid because the Fourier transform was defined by an absolutely convergent integral and the triangle inequality applies to that integral.
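As a sketch of this step, assume the unitary convention $\hat f(\xi) = (2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{-ix\cdot\xi}f(x)\,d\mathcal{L}^n(x)$. This normalization is not restated in this section; it is inferred from the constant in the bound above and from the identity $\widehat{\partial_{x_j}f}(\xi)=i\xi_j\hat f(\xi)$ used later. Under that assumption, since $|e^{-ix\cdot\xi}| = 1$, the triangle inequality for integrals gives
\begin{align*}
|\hat f(\xi)| = \frac{1}{(2\pi)^{n/2}}\left|\int_{\mathbb{R}^n} e^{-ix\cdot\xi}f(x)\,d\mathcal{L}^n(x)\right| \leq \frac{1}{(2\pi)^{n/2}}\int_{\mathbb{R}^n} |f(x)|\,d\mathcal{L}^n(x) = \frac{1}{(2\pi)^{n/2}}\|f\|_{L^1(\mathbb{R}^n)}.
\end{align*}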
We now introduce a radius $R > 0$ and decompose $\mathbb{R}^n$ as the disjoint union of $B(0,R)$ and $\mathbb{R}^n \setminus B(0,R)$. Define $\alpha_n := \mathcal{L}^n(B(0,1))$, the Lebesgue measure of the unit ball in $\mathbb{R}^n$. The Plancherel theorem for the Fourier transform on $L^2(\mathbb{R}^n)$ applies because $f \in C_c^\infty(\mathbb{R}^n) \subset L^2(\mathbb{R}^n)$, and it gives
\begin{align*}
\|f\|_{L^2(\mathbb{R}^n)}^2 = \int_{B(0,R)} |\hat f(\xi)|^2\,d\mathcal{L}^n(\xi) + \int_{\mathbb{R}^n \setminus B(0,R)} |\hat f(\xi)|^2\,d\mathcal{L}^n(\xi).
\end{align*}
For the low-frequency part, we use the pointwise bound uniformly on $B(0,R)$:
\begin{align*}
\int_{B(0,R)} |\hat f(\xi)|^2\,d\mathcal{L}^n(\xi) \leq \frac{1}{(2\pi)^n}\|f\|_{L^1(\mathbb{R}^n)}^2\mathcal{L}^n(B(0,R)).
\end{align*}
The scaling of Lebesgue measure under the dilation $\xi = R\eta$ gives $\mathcal{L}^n(B(0,R)) = R^n\mathcal{L}^n(B(0,1)) = \alpha_n R^n$, so
\begin{align*}
\int_{B(0,R)} |\hat f(\xi)|^2\,d\mathcal{L}^n(\xi) \leq \frac{\alpha_n}{(2\pi)^n}R^n\|f\|_{L^1(\mathbb{R}^n)}^2.
\end{align*}
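For orientation, the low-frequency constant can be evaluated in small dimensions. The values $\alpha_1 = 2$, $\alpha_2 = \pi$, and $\alpha_3 = 4\pi/3$ are the standard length, area, and volume of the unit ball; they are quoted here for illustration and are not derived in this section:
\begin{align*}
n = 1:&\quad \frac{\alpha_1}{2\pi} = \frac{2}{2\pi} = \frac{1}{\pi}, \\
n = 2:&\quad \frac{\alpha_2}{(2\pi)^2} = \frac{\pi}{4\pi^2} = \frac{1}{4\pi}, \\
n = 3:&\quad \frac{\alpha_3}{(2\pi)^3} = \frac{4\pi/3}{8\pi^3} = \frac{1}{6\pi^2}.
\end{align*}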
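For the high-frequency part, the point is that large frequencies can be paid for by derivatives. On $\mathbb{R}^n \setminus B(0,R)$, the inequality $|\xi| \geq R$ implies $|\hat f(\xi)|^2 \leq R^{-2}|\xi|^2|\hat f(\xi)|^2$. Integrating over this region and then enlarging the domain of integration to all of $\mathbb{R}^n$, which can only increase the integral because the integrand is nonnegative, gives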
\begin{align*}
\int_{\mathbb{R}^n \setminus B(0,R)} |\hat f(\xi)|^2\,d\mathcal{L}^n(\xi) \leq R^{-2}\int_{\mathbb{R}^n}|\xi|^2|\hat f(\xi)|^2\,d\mathcal{L}^n(\xi).
\end{align*}
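The Fourier derivative identity $\widehat{\partial_{x_j}f}(\xi)=i\xi_j\hat f(\xi)$ applies because $f$ is smooth and compactly supported, and each $\partial_{x_j}f$ again lies in $C_c^\infty(\mathbb{R}^n) \subset L^2(\mathbb{R}^n)$, so Plancherel is available for it. First, expanding $|\xi|^2 = \sum_{j=1}^n |\xi_j|^2$ splits the weighted integral coordinate by coordinate: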
\begin{align*}
\int_{\mathbb{R}^n}|\xi|^2|\hat f(\xi)|^2\,d\mathcal{L}^n(\xi) = \sum_{j=1}^n \int_{\mathbb{R}^n}|\xi_j|^2|\hat f(\xi)|^2\,d\mathcal{L}^n(\xi).
\end{align*}
For each $j \in \{1,\dots,n\}$, Plancherel applied to $\partial_{x_j}f$ gives
\begin{align*}
\int_{\mathbb{R}^n}|\xi_j|^2|\hat f(\xi)|^2\,d\mathcal{L}^n(\xi) = \|\partial_{x_j}f\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
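Spelled out, this step combines Plancherel with the derivative identity. The chain below is a sketch that uses only those two facts together with $|i| = 1$:
\begin{align*}
\|\partial_{x_j}f\|_{L^2(\mathbb{R}^n)}^2 = \int_{\mathbb{R}^n}|\widehat{\partial_{x_j}f}(\xi)|^2\,d\mathcal{L}^n(\xi) = \int_{\mathbb{R}^n}|i\xi_j\hat f(\xi)|^2\,d\mathcal{L}^n(\xi) = \int_{\mathbb{R}^n}|\xi_j|^2|\hat f(\xi)|^2\,d\mathcal{L}^n(\xi).
\end{align*}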
Therefore
\begin{align*}
\int_{\mathbb{R}^n}|\xi|^2|\hat f(\xi)|^2\,d\mathcal{L}^n(\xi) = \sum_{j=1}^n\|\partial_{x_j}f\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}
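By the definition of the $L^2$ norm of the gradient, the last sum is $\|\nabla f\|_{L^2(\mathbb{R}^n)}^2$. Substituting the low-frequency and high-frequency bounds into the Plancherel decomposition of $\|f\|_{L^2(\mathbb{R}^n)}^2$ therefore gives, for every $R > 0$,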
\begin{align*}
\|f\|_{L^2(\mathbb{R}^n)}^2 \leq \frac{\alpha_n}{(2\pi)^n}R^n\|f\|_{L^1(\mathbb{R}^n)}^2 + R^{-2}\|\nabla f\|_{L^2(\mathbb{R}^n)}^2.
\end{align*}[/guided]