[proofplan]
The proof is the standard Chernoff argument. We bound the upper tail by applying Markov's inequality to the non-negative [random variable](/page/Random%20Variable) $\exp(\lambda S)$, factor its expectation using independence, and then use the Rademacher [moment generating function](/page/Moment%20Generating%20Function) estimate. The resulting exponential bound depends on a free parameter $\lambda>0$, which is optimized by choosing $\lambda=u/\sum_{i=1}^n a_i^2$. The two-sided estimate follows by applying the same one-sided bound to $-S$ and using the union bound.
[/proofplan]
custom_env
admin
[step:Introduce the variance proxy and apply Markov's inequality to $\exp(\lambda S)$]Define
\begin{align*}
A=\sum_{i=1}^n a_i^2.
\end{align*}
By hypothesis, $A>0$. Fix $u>0$ and $\lambda>0$. Define the non-negative random variable $Y_\lambda:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ by
\begin{align*}
Y_\lambda(\omega)=\exp(\lambda S(\omega)).
\end{align*}
Since the exponential function is increasing, the event $\{S\ge u\}$ is contained in the event $\{Y_\lambda\ge \exp(\lambda u)\}$. Markov's inequality applied to $Y_\lambda$ gives
\begin{align*}
\mathbb P(S\ge u)\le \exp(-\lambda u)\mathbb E[\exp(\lambda S)].
\end{align*}[/step]
custom_env
admin
[guided]The purpose of introducing $\lambda>0$ is to turn the additive random variable $S$ into an exponential random variable whose expectation factorizes well under independence. Define
\begin{align*}
A=\sum_{i=1}^n a_i^2.
\end{align*}
The assumption that at least one coefficient is nonzero is exactly the statement that $A>0$, so the denominator in the claimed bound is positive.
Fix $u>0$ and $\lambda>0$. Define the real-valued random variable $Y_\lambda:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ by
\begin{align*}
Y_\lambda(\omega)=\exp(\lambda S(\omega)).
\end{align*}
Because $S$ is measurable and the exponential function is Borel measurable, $Y_\lambda$ is a non-negative real-valued random variable. On the event $\{S\ge u\}$, monotonicity of $x\mapsto \exp(\lambda x)$ gives
\begin{align*}
\exp(\lambda S)\ge \exp(\lambda u).
\end{align*}
Therefore
\begin{align*}
\{S\ge u\}\subseteq \{Y_\lambda\ge \exp(\lambda u)\}.
\end{align*}
Markov's inequality for the non-negative random variable $Y_\lambda$ at the positive level $\exp(\lambda u)$ gives
\begin{align*}
\mathbb P(Y_\lambda\ge \exp(\lambda u))\le \frac{\mathbb E[Y_\lambda]}{\exp(\lambda u)}.
\end{align*}
Combining the event inclusion with this inequality yields
\begin{align*}
\mathbb P(S\ge u)\le \exp(-\lambda u)\mathbb E[\exp(\lambda S)].
\end{align*}
This is the [Chernoff bound](/theorems/6038) before estimating the exponential moment.[/guided]
custom_env
admin
[step:Factor and bound the exponential moment using the Rademacher moment generating function]For each $i\in\{1,\ldots,n\}$, define $X_i:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ by
\begin{align*}
X_i(\omega)=\exp(\lambda a_i\varepsilon_i(\omega)).
\end{align*}
The random variables $X_1,\ldots,X_n$ are independent because they are Borel functions of the independent random variables $\varepsilon_1,\ldots,\varepsilon_n$. They are also bounded, since each $\varepsilon_i$ takes values in $\{-1,1\}$ almost surely. Hence independence factorization of expectations gives
\begin{align*}
\mathbb E[\exp(\lambda S)]=\prod_{i=1}^n \mathbb E[\exp(\lambda a_i\varepsilon_i)].
\end{align*}
For each $i\in\{1,\ldots,n\}$, the random variable $\varepsilon_i$ is Rademacher by hypothesis and $\lambda a_i\in\mathbb R$. By [citetheorem:10111] applied with $t=\lambda a_i$,
\begin{align*}
\mathbb E[\exp(\lambda a_i\varepsilon_i)]\le \exp\left(\frac{\lambda^2a_i^2}{2}\right).
\end{align*}
Multiplying these estimates and using the definition of $A$ gives
\begin{align*}
\mathbb E[\exp(\lambda S)]\le \prod_{i=1}^n \exp\left(\frac{\lambda^2a_i^2}{2}\right)=\exp\left(\frac{\lambda^2A}{2}\right).
\end{align*}[/step]
custom_env
admin
[guided]The reason for exponentiating $S$ is that exponentials convert sums into products. For each $i\in\{1,\ldots,n\}$, define $X_i:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ by
\begin{align*}
X_i(\omega)=\exp(\lambda a_i\varepsilon_i(\omega)).
\end{align*}
The map $x\mapsto \exp(\lambda a_i x)$ is Borel measurable, so $X_i$ is a Borel function of $\varepsilon_i$. Since $\varepsilon_1,\ldots,\varepsilon_n$ are independent, the random variables $X_1,\ldots,X_n$ are independent. Since each $\varepsilon_i$ takes only the values $-1$ and $1$ almost surely, each $X_i$ is bounded and hence integrable.
The identity $S=\sum_{i=1}^n a_i\varepsilon_i$ gives
\begin{align*}
\exp(\lambda S)=\prod_{i=1}^n X_i.
\end{align*}
Independence factorization for bounded independent random variables therefore yields
\begin{align*}
\mathbb E[\exp(\lambda S)]=\prod_{i=1}^n \mathbb E[X_i]=\prod_{i=1}^n \mathbb E[\exp(\lambda a_i\varepsilon_i)].
\end{align*}
Now fix $i\in\{1,\ldots,n\}$. The random variable $\varepsilon_i$ is Rademacher by hypothesis, and $\lambda a_i$ is a real number. The Rademacher moment [generating function](/page/Generating%20Function) estimate [citetheorem:10111], applied with $t=\lambda a_i$, gives
\begin{align*}
\mathbb E[\exp(\lambda a_i\varepsilon_i)]\le \exp\left(\frac{\lambda^2a_i^2}{2}\right).
\end{align*}
Multiplying this estimate over $i\in\{1,\ldots,n\}$ gives
\begin{align*}
\mathbb E[\exp(\lambda S)]\le \prod_{i=1}^n \exp\left(\frac{\lambda^2a_i^2}{2}\right)=\exp\left(\frac{\lambda^2}{2}\sum_{i=1}^n a_i^2\right)=\exp\left(\frac{\lambda^2A}{2}\right).
\end{align*}[/guided]
custom_env
admin
[step:Optimize the Chernoff parameter]
Combining the previous two steps, for every $\lambda>0$,
\begin{align*}
\mathbb P(S\ge u)\le \exp\left(-\lambda u+\frac{\lambda^2A}{2}\right).
\end{align*}
Since $u>0$ and $A>0$, the choice
\begin{align*}
\lambda=\frac{u}{A}
\end{align*}
is admissible. Substituting this value gives
\begin{align*}
-\lambda u+\frac{\lambda^2A}{2}=-\frac{u^2}{A}+\frac{u^2}{2A}=-\frac{u^2}{2A}.
\end{align*}
Therefore
\begin{align*}
\mathbb P(S\ge u)\le \exp\left(-\frac{u^2}{2A}\right)=\exp\left(-\frac{u^2}{2\sum_{i=1}^n a_i^2}\right).
\end{align*}
[/step]
custom_env
admin
[step:Apply the one-sided estimate to $-S$ and use the union bound]
Define $T:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ by
\begin{align*}
T(\omega)=-S(\omega)=\sum_{i=1}^n (-a_i)\varepsilon_i(\omega).
\end{align*}
The coefficients $-a_1,\ldots,-a_n$ satisfy
\begin{align*}
\sum_{i=1}^n (-a_i)^2=A.
\end{align*}
Applying the one-sided bound just proved to $T$ gives
\begin{align*}
\mathbb P(-S\ge u)\le \exp\left(-\frac{u^2}{2A}\right).
\end{align*}
Since
\begin{align*}
\{|S|\ge u\}=\{S\ge u\}\cup\{-S\ge u\},
\end{align*}
the union bound yields
\begin{align*}
\mathbb P(|S|\ge u)\le \mathbb P(S\ge u)+\mathbb P(-S\ge u)\le 2\exp\left(-\frac{u^2}{2A}\right).
\end{align*}
Replacing $A$ by $\sum_{i=1}^n a_i^2$ gives the desired two-sided estimate.
[/step]