A fair coin is often introduced as the simplest random experiment, but its usual coding by $0$ and $1$ carries an asymmetry. The value $1$ records success and the value $0$ records failure, so sums count successes rather than cancellations. Many probabilistic arguments need a sign instead: a random quantity that pushes equally often to the right and to the left, has mean zero, and has no size variability beyond its sign. That object is the Rademacher random variable.
The first surprise is how much structure is already present in this two-point distribution. A sequence of independent Rademacher random variables behaves like repeated random choices of signs; its partial sums form the standard simple symmetric random walk; its products model parity; its [moment generating function](/page/Moment%20Generating%20Function) gives sharp sub-Gaussian estimates; and its linear combinations are the basic test case for concentration inequalities. The definition is small, but the uses reach from elementary probability to Banach-space geometry and randomized algorithms.
[example: A Random Sign Cancels Instead of Counts]
Let $(\Omega,\mathcal F,\mathbb P)$ be the two-outcome [probability space](/page/Probability%20Space) with $\Omega=\{H,T\}$, $\mathcal F=2^\Omega$, and
\begin{align*}
\mathbb P(\{H\})=\mathbb P(\{T\})=\frac{1}{2}.
\end{align*}
Define $B:\Omega\to\mathbb R$ by $B(H)=1$ and $B(T)=0$, and define $\varepsilon:\Omega\to\mathbb R$ by $\varepsilon(H)=1$ and $\varepsilon(T)=-1$. We compute the expectations of these two codings of the same fair coin flip.
For the indicator coding $B$, the expectation over the two atoms is
\begin{align*}
\mathbb E[B]=B(H)\mathbb P(\{H\})+B(T)\mathbb P(\{T\}).
\end{align*}
Substituting the values of $B$ and the probabilities gives
\begin{align*}
\mathbb E[B]=1\cdot\frac{1}{2}+0\cdot\frac{1}{2}.
\end{align*}
Hence
\begin{align*}
\mathbb E[B]=\frac{1}{2}.
\end{align*}
For the sign coding $\varepsilon$, the same finite expectation formula gives
\begin{align*}
\mathbb E[\varepsilon]=\varepsilon(H)\mathbb P(\{H\})+\varepsilon(T)\mathbb P(\{T\}).
\end{align*}
Substituting $\varepsilon(H)=1$ and $\varepsilon(T)=-1$ gives
\begin{align*}
\mathbb E[\varepsilon]=1\cdot\frac{1}{2}+(-1)\cdot\frac{1}{2}.
\end{align*}
Therefore
\begin{align*}
\mathbb E[\varepsilon]=\frac{1}{2}-\frac{1}{2}=0.
\end{align*}
The variable $B$ counts heads and has positive mean, while $\varepsilon$ records the same randomness as two opposite signs whose contributions cancel exactly. This centering is why Rademacher variables are the natural coding when the problem is about random walks or fluctuations around a mean.
[/example]
The example shows the guiding question of the page: what can be proved from the assumption that a real-valued [random variable](/page/Random%20Variable) is exactly a fair random sign? The answer begins with its law, then moves through moments, sums, concentration, and the ways Rademacher variables convert deterministic coefficients into random oscillations.
## Definition
### The Fair Sign
The point of the definition is to isolate the fair sign itself, independently of the particular sample space used to generate it. We do not care whether the sign came from a coin, a binary digit, or another experiment; we care only that the induced distribution on $\mathbb R$ puts equal mass at $-1$ and $1$.
[definition: Rademacher Random Variable]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space. A real-valued random variable $\varepsilon: (\Omega, \mathcal F) \to (\mathbb R, \mathcal B(\mathbb R))$ is a Rademacher random variable if
\begin{align*}
\mathbb P(\varepsilon = -1)=\frac{1}{2}, \qquad \mathbb P(\varepsilon = 1)=\frac{1}{2}.
\end{align*}
[/definition]
Equivalently, the law of $\varepsilon$ is the [probability measure](/page/Probability%20Measure)
\begin{align*}
\mu_\varepsilon=\frac{1}{2}\delta_{-1} + \frac{1}{2}\delta_1,
\end{align*}
where $\delta_a$ denotes the Dirac probability measure at $a \in \mathbb R$.
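A finitely supported law such as $\mu_\varepsilon$ can be stored as a table from atoms to probabilities. The Python sketch below is illustrative only. The names `RADEMACHER_LAW`, `BERNOULLI_LAW`, `expectation`, and `sample_rademacher` are hypothetical helpers, not part of any library. It recomputes the two expectations from the opening example exactly with `fractions.Fraction`. It draws signs with `random.Random.choice`, which reproduces the law but not any particular sample space.

```python
from fractions import Fraction
import random

# Law of a Rademacher variable: mu = 1/2 delta_{-1} + 1/2 delta_{1}.
RADEMACHER_LAW = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
# Law of the indicator coding B of the same fair coin: mass 1/2 at 0 and at 1.
BERNOULLI_LAW = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def expectation(law, f=lambda x: x):
    """Exact E[f(X)] for a finitely supported law given as {atom: probability}."""
    return sum(p * f(x) for x, p in law.items())


def sample_rademacher(rng: random.Random) -> int:
    """Draw one fair sign; only the induced law on {-1, 1} is modelled."""
    return rng.choice((-1, 1))


rng = random.Random(0)
draws = [sample_rademacher(rng) for _ in range(20)]
```

Here `expectation(BERNOULLI_LAW)` equals $\frac{1}{2}$ and `expectation(RADEMACHER_LAW)` equals $0$. These match the indicator coding and the sign coding of the opening example.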
### Independent Signs
A single sign is useful, but most applications need many signs at once. The issue is not just that each coordinate should be fair; each new coordinate should bring new randomness, since repeated copies of the same sign do not create genuine fluctuation.
[definition: Rademacher Sequence]
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space. A sequence $(\varepsilon_n)_{n\in\mathbb N}$ is a Rademacher sequence if, for each $n\in\mathbb N$, the coordinate $\varepsilon_n:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ is a Rademacher random variable, and the family $(\varepsilon_n)_{n\in\mathbb N}$ is independent.
[/definition]
Independence is not a decorative assumption. Without it, each coordinate can be a fair sign while the sequence contains no new randomness after the first coordinate.
[example: Fair Marginals Without Independent Signs]
Let $\varepsilon$ be a Rademacher random variable and define $\varepsilon_n=\varepsilon$ for every $n\in\mathbb N$. For each $n$,
\begin{align*}
\mathbb P(\varepsilon_n=1)=\mathbb P(\varepsilon=1)=\frac{1}{2}.
\end{align*}
Also,
\begin{align*}
\mathbb P(\varepsilon_n=-1)=\mathbb P(\varepsilon=-1)=\frac{1}{2}.
\end{align*}
Thus every coordinate $\varepsilon_n$ has the Rademacher law.
The sequence is nevertheless not independent. Since $\varepsilon_1=\varepsilon$ and $\varepsilon_2=\varepsilon$, the event $\{\varepsilon_1=1,\varepsilon_2=1\}$ is the same event as $\{\varepsilon=1\}$. Therefore
\begin{align*}
\mathbb P(\varepsilon_1=1,\varepsilon_2=1)=\mathbb P(\varepsilon=1)=\frac{1}{2}.
\end{align*}
On the other hand,
\begin{align*}
\mathbb P(\varepsilon_1=1)=\mathbb P(\varepsilon=1)=\frac{1}{2}
\end{align*}
and
\begin{align*}
\mathbb P(\varepsilon_2=1)=\mathbb P(\varepsilon=1)=\frac{1}{2}.
\end{align*}
Multiplying these marginal probabilities gives
\begin{align*}
\mathbb P(\varepsilon_1=1)\mathbb P(\varepsilon_2=1)=\frac{1}{2}\cdot\frac{1}{2}=\frac{1}{4}.
\end{align*}
Since $\frac{1}{2}\ne\frac{1}{4}$, the equality required for independence fails for the two events $\{\varepsilon_1=1\}$ and $\{\varepsilon_2=1\}$. The example separates the one-dimensional law of each coordinate from the joint independence needed for a Rademacher sequence.
[/example]
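The distinction between fair marginals and joint independence can be checked mechanically on finite joint laws. In this sketch, the hypothetical helpers `joint_independent`, `joint_copied`, `marginal`, and `factorises` store a joint law of $(\varepsilon_1,\varepsilon_2)$ as a dictionary keyed by sign pairs. They test whether every joint probability equals the product of the marginals.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def joint_independent():
    """Joint law of two independent fair signs: each of the four pairs has mass 1/4."""
    return {(s, t): HALF * HALF for s, t in product((-1, 1), repeat=2)}


def joint_copied():
    """Joint law of (eps, eps): all mass sits on the diagonal pairs (-1,-1) and (1,1)."""
    return {(-1, -1): HALF, (1, 1): HALF}


def marginal(joint, coord):
    """Law of one coordinate, obtained by summing the joint law over the other."""
    out = {}
    for pair, p in joint.items():
        out[pair[coord]] = out.get(pair[coord], 0) + p
    return out


def factorises(joint):
    """Check P(e1=s, e2=t) == P(e1=s) P(e2=t) for every pair of signs."""
    m1, m2 = marginal(joint, 0), marginal(joint, 1)
    return all(joint.get((s, t), 0) == m1.get(s, 0) * m2.get(t, 0)
               for s, t in product((-1, 1), repeat=2))
```

Both constructions have the Rademacher law in each coordinate. Only `joint_independent()` passes `factorises`.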
### Relation With Bernoulli Variables
A Rademacher variable is also a centered and rescaled fair Bernoulli variable: if $B$ is Bernoulli with parameter $\frac{1}{2}$, then $2B-1$ is a Rademacher variable, exactly as in the opening example. This conversion is worth making explicit because many elementary probability models start with a success indicator, while later estimates want a centered sign.
[quotetheorem:10109]
This theorem lets one move between the language of successes and the language of signs. The sign language is especially efficient for moments, because all powers of $\varepsilon$ collapse to either $1$ or $\varepsilon$.
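The theorem is quoted rather than reproduced here. Assuming it is the standard affine correspondence $\varepsilon=2B-1$, equivalently $B=\frac{1+\varepsilon}{2}$, the change of language can be verified on laws. The sketch pushes the fair Bernoulli law forward through each map. The helper `push_forward` is hypothetical.

```python
from fractions import Fraction


def push_forward(law, f):
    """Law of f(X) when X has the finitely supported law `law`."""
    out = {}
    for x, p in law.items():
        out[f(x)] = out.get(f(x), 0) + p
    return out


bernoulli_half = {0: Fraction(1, 2), 1: Fraction(1, 2)}        # success indicator B
sign_law = push_forward(bernoulli_half, lambda b: 2 * b - 1)     # eps = 2B - 1
back = push_forward(sign_law, lambda e: Fraction(1 + e, 2))      # B = (1 + eps) / 2
```

Because $\varepsilon^2=1$ on both atoms, every power of a sign reduces to $1$ or $\varepsilon$. This is the collapse described above.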
## Moments and Transforms
### Parity of Moments
A Rademacher random variable has no tail beyond its two atoms, so its moment computations are exact rather than asymptotic. The useful point is not only that the mean is zero and the variance is one, but that every higher moment is already determined by parity.
[quotetheorem:10110]
The parity rule is the algebraic engine behind many later estimates. Products of independent signs vanish in expectation unless every sign appears with even total exponent, and that cancellation is what makes Rademacher sums tractable.
[example: Product Moments Detect Parity]
Let $\varepsilon_1,\varepsilon_2,\varepsilon_3$ be independent Rademacher random variables, and set
\begin{align*}
Y=\varepsilon_1^3\varepsilon_2^2\varepsilon_3.
\end{align*}
Since powers of different independent variables are still independent, the expectation of the product factors:
\begin{align*}
\mathbb E[Y]=\mathbb E[\varepsilon_1^3]\mathbb E[\varepsilon_2^2]\mathbb E[\varepsilon_3].
\end{align*}
For a Rademacher variable, odd powers have expectation $0$ and even powers have expectation $1$ by *[Moments of a Rademacher Random Variable](/theorems/10110)*, so
\begin{align*}
\mathbb E[\varepsilon_1^3]=0.
\end{align*}
Also,
\begin{align*}
\mathbb E[\varepsilon_2^2]=1
\end{align*}
and
\begin{align*}
\mathbb E[\varepsilon_3]=0.
\end{align*}
Substituting these three values gives
\begin{align*}
\mathbb E[Y]=0\cdot 1\cdot 0=0.
\end{align*}
For the even-exponent monomial, independence gives
\begin{align*}
\mathbb E[\varepsilon_1^2\varepsilon_2^4\varepsilon_3^6]=\mathbb E[\varepsilon_1^2]\mathbb E[\varepsilon_2^4]\mathbb E[\varepsilon_3^6].
\end{align*}
Each exponent is even, so *Moments of a Rademacher Random Variable* gives
\begin{align*}
\mathbb E[\varepsilon_1^2]=1.
\end{align*}
Similarly,
\begin{align*}
\mathbb E[\varepsilon_2^4]=1
\end{align*}
and
\begin{align*}
\mathbb E[\varepsilon_3^6]=1.
\end{align*}
Therefore
\begin{align*}
\mathbb E[\varepsilon_1^2\varepsilon_2^4\varepsilon_3^6]=1\cdot 1\cdot 1=1.
\end{align*}
Independent random signs erase monomials containing an unpaired sign, while monomials in which every sign appears to an even power survive with expectation $1$.
[/example]
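The parity rule can be checked exhaustively for small monomials. The sketch averages a monomial over all $2^m$ equally likely sign vectors, which is exact for independent fair signs. It compares the result with the prediction "1 if every exponent is even, else 0". The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod


def monomial_expectation(exponents):
    """Exact E[eps_1^k_1 ... eps_m^k_m] for independent fair signs,
    by averaging over all 2^m equally likely sign vectors."""
    m = len(exponents)
    total = sum(prod(s ** k for s, k in zip(signs, exponents))
                for signs in product((-1, 1), repeat=m))
    return Fraction(total, 2 ** m)


def parity_rule(exponents):
    """Prediction from the parity rule: 1 if every exponent is even, else 0."""
    return 1 if all(k % 2 == 0 for k in exponents) else 0
```

For instance, `monomial_expectation((3, 2, 1))` is $0$ and `monomial_expectation((2, 4, 6))` is $1$, matching the two monomials of the example.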
### Exponential Transforms
Moments are useful term by term, but tail estimates require a device that sees all moments at once. The [moment generating function](/page/Moment%20Generating%20Function) does exactly this, and for a random sign it is explicit enough to compare with the Gaussian exponential.
[definition: Moment Generating Function of a Rademacher Random Variable]
Let $\varepsilon$ be a Rademacher random variable. The moment generating function of $\varepsilon$ is the function $M_\varepsilon: \mathbb R \to \mathbb R$ defined by
\begin{align*}
M_\varepsilon(t) = \mathbb E[e^{t\varepsilon}].
\end{align*}
[/definition]
The definition gives a general transform, but the Rademacher case is valuable because the transform can be computed exactly and then bounded in a form that tensorizes over independent sums. The next theorem supplies both pieces: the exact expression and the Gaussian-type estimate used later.
[quotetheorem:10111]
For later tail bounds it is helpful to name the inequality separately. The estimate
\begin{align*}
\mathbb E[e^{t\varepsilon}] \le e^{t^2/2}
\end{align*}
for every $t\in\mathbb R$ is called the sub-Gaussian Rademacher bound. It is the exponential control that will be inserted into products when independent signs are summed.
The bound pays off when independent signs are added with deterministic coefficients: the exponential of the sum factorises into a product over the signs, and the single-variable estimate becomes a many-variable tail bound.
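Averaging over the two atoms gives $\mathbb E[e^{t\varepsilon}]=\frac{e^t+e^{-t}}{2}=\cosh t$. The bound follows from comparing power series term by term. We have $\cosh t=\sum_k \frac{t^{2k}}{(2k)!}$ and $e^{t^2/2}=\sum_k \frac{t^{2k}}{2^k k!}$, with $(2k)!\ge 2^k k!$ for every $k$. The sketch below checks both ingredients numerically on a grid. It illustrates the argument rather than proving it, and the function names are hypothetical.

```python
import math


def rademacher_mgf(t: float) -> float:
    """M(t) = E[exp(t*eps)] = (e^t + e^-t)/2, computed from the two atoms."""
    return 0.5 * math.exp(t) + 0.5 * math.exp(-t)


def gaussian_mgf(t: float) -> float:
    """exp(t^2/2), the moment generating function of a standard normal."""
    return math.exp(t * t / 2)


# Coefficient comparison behind the bound: (2k)! >= 2^k k! for every k.
coefficients_dominated = all(
    math.factorial(2 * k) >= 2 ** k * math.factorial(k) for k in range(30)
)

# Numerical check of cosh(t) <= exp(t^2/2) on the grid -8.0, -7.9, ..., 8.0.
grid = [x / 10 for x in range(-80, 81)]
bound_holds = all(rademacher_mgf(t) <= gaussian_mgf(t) for t in grid)
```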
## Random Signed Sums
### Linear Combinations of Signs
The main construction built from Rademacher variables is a random signed sum. Instead of randomising magnitudes, we fix coefficients and randomise their signs. This is a controlled way to test cancellation in finite-dimensional geometry, [empirical process theory](/page/Empirical%20Process%20Theory), and probability inequalities.
[definition: Rademacher Sum]
Let $n\in\mathbb N$, let $a_1,\ldots,a_n\in\mathbb R$, and let $\varepsilon_1,\ldots,\varepsilon_n$ be independent Rademacher random variables on a probability space $(\Omega,\mathcal F,\mathbb P)$. The associated Rademacher sum is the real-valued random variable $S:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ defined by
\begin{align*}
S(\omega)=\sum_{i=1}^n a_i\varepsilon_i(\omega)
\end{align*}
for $\omega\in\Omega$.
[/definition]
After defining the sum, the first quantitative question is its natural scale. Since each sign is centered, cross terms should cancel in expectation; the theorem records the precise form of that cancellation.
[quotetheorem:10112]
The variance computation does not say that all coefficient choices behave the same. It gives the natural scale, but the actual distribution can be discrete in different ways depending on the arithmetic of the coefficients.
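For small $n$ the full law of a Rademacher sum can be listed by enumerating all $2^n$ sign vectors, each with probability $2^{-n}$. The sketch below does this and reads off the variance. It is hypothetical helper code, exact when the coefficients are integers or `Fraction`s and floating-point otherwise. It reproduces the comparison between $S_1=\varepsilon_1+\varepsilon_2$ and $S_2=\sqrt{2}\,\varepsilon_1$ worked out next.

```python
import math
from fractions import Fraction
from itertools import product


def rademacher_sum_law(coeffs):
    """Law of S = sum a_i eps_i by enumerating all 2^n equally likely sign vectors."""
    n = len(coeffs)
    weight = Fraction(1, 2 ** n)
    law = {}
    for signs in product((-1, 1), repeat=n):
        value = sum(a * s for a, s in zip(coeffs, signs))
        law[value] = law.get(value, 0) + weight
    return law


def mean_and_variance(law):
    """Mean and variance of a finitely supported law {atom: probability}."""
    mean = sum(x * p for x, p in law.items())
    second = sum(x * x * p for x, p in law.items())
    return mean, second - mean * mean


law1 = rademacher_sum_law((1, 1))             # S_1 = eps_1 + eps_2: three atoms
law2 = rademacher_sum_law((math.sqrt(2),))    # S_2 = sqrt(2) eps_1: two atoms
```

Both laws have variance $2=\sum_i a_i^2$, but `len(law1)` is $3$ while `len(law2)` is $2$.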
[example: Two Coefficients With the Same Variance Scale]
Let $\varepsilon_1,\varepsilon_2$ be independent Rademacher random variables, and set $S_1=\varepsilon_1+\varepsilon_2$. Since each sign takes only the values $-1$ and $1$, the four possible pairs are $(-1,-1)$, $(-1,1)$, $(1,-1)$, and $(1,1)$. By independence and the Rademacher probabilities,
\begin{align*}
\mathbb P(\varepsilon_1=-1,\varepsilon_2=-1)=\mathbb P(\varepsilon_1=-1)\mathbb P(\varepsilon_2=-1)=\frac{1}{2}\cdot\frac{1}{2}=\frac{1}{4}.
\end{align*}
On this event, $S_1=-1+(-1)=-2$, so
\begin{align*}
\mathbb P(S_1=-2)=\frac{1}{4}.
\end{align*}
The value $0$ occurs exactly when the signs are opposite. Hence
\begin{align*}
\mathbb P(S_1=0)=\mathbb P(\varepsilon_1=-1,\varepsilon_2=1)+\mathbb P(\varepsilon_1=1,\varepsilon_2=-1).
\end{align*}
Using independence on both terms gives
\begin{align*}
\mathbb P(S_1=0)=\frac{1}{2}\cdot\frac{1}{2}+\frac{1}{2}\cdot\frac{1}{2}=\frac{1}{4}+\frac{1}{4}=\frac{1}{2}.
\end{align*}
Similarly, $S_1=2$ exactly on $\{\varepsilon_1=1,\varepsilon_2=1\}$, so
\begin{align*}
\mathbb P(S_1=2)=\frac{1}{2}\cdot\frac{1}{2}=\frac{1}{4}.
\end{align*}
The expectation of $S_1$ is
\begin{align*}
\mathbb E[S_1]=(-2)\cdot\frac{1}{4}+0\cdot\frac{1}{2}+2\cdot\frac{1}{4}=0.
\end{align*}
By the variance formula,
\begin{align*}
\operatorname{Var}(S_1)=\mathbb E[S_1^2]-(\mathbb E[S_1])^2.
\end{align*}
Substituting the three atoms gives
\begin{align*}
\mathbb E[S_1^2]=(-2)^2\cdot\frac{1}{4}+0^2\cdot\frac{1}{2}+2^2\cdot\frac{1}{4}=1+0+1=2.
\end{align*}
Since $\mathbb E[S_1]=0$,
\begin{align*}
\operatorname{Var}(S_1)=2-0^2=2.
\end{align*}
Now set $S_2=\sqrt{2}\,\varepsilon_1$. Since $\varepsilon_1=-1$ with probability $\frac{1}{2}$ and $\varepsilon_1=1$ with probability $\frac{1}{2}$, the variable $S_2$ takes the values $-\sqrt{2}$ and $\sqrt{2}$ with probabilities
\begin{align*}
\mathbb P(S_2=-\sqrt{2})=\frac{1}{2}
\end{align*}
and
\begin{align*}
\mathbb P(S_2=\sqrt{2})=\frac{1}{2}.
\end{align*}
Its expectation is
\begin{align*}
\mathbb E[S_2]=(-\sqrt{2})\cdot\frac{1}{2}+\sqrt{2}\cdot\frac{1}{2}=0.
\end{align*}
Also,
\begin{align*}
\mathbb E[S_2^2]=(-\sqrt{2})^2\cdot\frac{1}{2}+(\sqrt{2})^2\cdot\frac{1}{2}=2\cdot\frac{1}{2}+2\cdot\frac{1}{2}=2.
\end{align*}
Thus
\begin{align*}
\operatorname{Var}(S_2)=\mathbb E[S_2^2]-(\mathbb E[S_2])^2=2-0^2=2.
\end{align*}
Both random signed sums have variance $2$, but $S_1$ has three atoms while $S_2$ has two, so the variance records the quadratic scale without determining the full distribution.
[/example]
### Exponential Tail Bounds
The variance gives the scale, but applications often need a probability estimate for large deviations from zero. The moment generating function estimate is designed for this: it turns independence into a product and then optimizes an exponential Markov bound over the parameter $t$.
[quotetheorem:10113]
The estimate says that the Euclidean length of the coefficient vector controls deviations. It does not claim that the Gaussian tail is the exact probability; rather, it is a robust upper bound that survives without arithmetic information about the coefficients.
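The quoted theorem is not reproduced here. Assuming it takes the usual one-sided form $\mathbb P(S\ge t)\le \exp\!\big(-t^2/(2\sum_i a_i^2)\big)$ for $t\ge 0$, the bound can be compared with exact tail probabilities for a small coefficient vector. The coefficients below are an arbitrary illustrative choice, and the helper names are hypothetical.

```python
import math
from itertools import product


def exact_upper_tail(coeffs, t):
    """P(sum a_i eps_i >= t), computed exactly by enumerating all 2^n sign vectors."""
    n = len(coeffs)
    hits = sum(1 for signs in product((-1, 1), repeat=n)
               if sum(a * s for a, s in zip(coeffs, signs)) >= t)
    return hits / 2 ** n


def subgaussian_tail_bound(coeffs, t):
    """exp(-t^2 / (2 * sum a_i^2)): the assumed Gaussian-type bound, for t >= 0."""
    return math.exp(-t * t / (2 * sum(a * a for a in coeffs)))


coeffs = [1] * 8 + [0.5] * 4   # sum of squares is 9
checks = [(t, exact_upper_tail(coeffs, t), subgaussian_tail_bound(coeffs, t))
          for t in (0, 1, 2, 3, 4, 6, 8)]
```

In every row of `checks` the exact tail stays below the bound. The gap illustrates that the bound is robust rather than exact.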
## Simple Symmetric Random Walk
### From Signs to Paths
When all coefficients are equal to $1$, the Rademacher sum becomes a path on the integer line. This is the simplest model of fluctuation: every step changes position by one, and the direction is chosen by an independent fair sign.
[definition: Simple Symmetric Random Walk]
Let $(\varepsilon_n)_{n\in\mathbb N}$ be a Rademacher sequence. The simple symmetric random walk generated by $(\varepsilon_n)_{n\in\mathbb N}$ is the [stochastic process](/page/Stochastic%20Process) $(S_n)_{n\ge 0}$ of maps $S_n:(\Omega,\mathcal F)\to(\mathbb Z,2^{\mathbb Z})$ defined by $S_0=0$ and
\begin{align*}
S_n(\omega)=\sum_{i=1}^n \varepsilon_i(\omega)
\end{align*}
for $n\in\mathbb N$ and $\omega\in\Omega$.
[/definition]
The random walk reveals a parity constraint that can be missed if the walk is treated as a rough Gaussian object too early. After $n$ steps, the position has the same parity as $n$.
[example: The Distribution After Three Steps]
Let $(S_n)_{n\ge 0}$ be a simple symmetric random walk generated by the independent Rademacher variables $\varepsilon_1,\varepsilon_2,\varepsilon_3,\ldots$. At time $3$,
\begin{align*}
S_3=\varepsilon_1+\varepsilon_2+\varepsilon_3.
\end{align*}
Each $\varepsilon_i$ takes the values $-1$ and $1$, so the eight possible sign triples are
\begin{align*}
(-1,-1,-1),\ (-1,-1,1),\ (-1,1,-1),\ (1,-1,-1),\ (-1,1,1),\ (1,-1,1),\ (1,1,-1),\ (1,1,1).
\end{align*}
By independence, each triple has probability
\begin{align*}
\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{2}=\frac{1}{8}.
\end{align*}
The value $S_3=-3$ occurs exactly for the triple $(-1,-1,-1)$, because
\begin{align*}
-1+(-1)+(-1)=-3.
\end{align*}
Hence
\begin{align*}
\mathbb P(S_3=-3)=\frac{1}{8}.
\end{align*}
The value $S_3=-1$ occurs exactly for the three triples with two $-1$ signs and one $1$ sign:
\begin{align*}
-1+(-1)+1=-1.
\end{align*}
Also,
\begin{align*}
-1+1+(-1)=-1.
\end{align*}
And
\begin{align*}
1+(-1)+(-1)=-1.
\end{align*}
Therefore
\begin{align*}
\mathbb P(S_3=-1)=3\cdot\frac{1}{8}=\frac{3}{8}.
\end{align*}
Similarly, the value $S_3=1$ occurs exactly for the three triples with two $1$ signs and one $-1$ sign:
\begin{align*}
-1+1+1=1.
\end{align*}
Also,
\begin{align*}
1+(-1)+1=1.
\end{align*}
And
\begin{align*}
1+1+(-1)=1.
\end{align*}
Thus
\begin{align*}
\mathbb P(S_3=1)=3\cdot\frac{1}{8}=\frac{3}{8}.
\end{align*}
Finally, $S_3=3$ occurs exactly for $(1,1,1)$, since
\begin{align*}
1+1+1=3.
\end{align*}
So
\begin{align*}
\mathbb P(S_3=3)=\frac{1}{8}.
\end{align*}
There is no mass at $0$ after three steps: among three signs, the number of $1$ signs plus the number of $-1$ signs is $3$, which is odd, so these two counts cannot be equal, and equality of the counts is exactly what would make the sum zero.
[/example]
### Exact Distribution and Averaging
The finite-time distribution matters because it keeps track of the lattice structure of the walk. If the walk makes $u$ upward and $d$ downward steps and ends at position $k$ after $n$ steps, then $u+d=n$ and $u-d=k$, so it must have exactly
\begin{align*}
u=\frac{n+k}{2}
\end{align*}
upward steps. This is possible only when $n+k$ is even and $|k|\le n$, so parity and counting combine into the following formula.
[quotetheorem:10114]
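Assuming the quoted formula is the binomial expression that this counting argument leads to, $\mathbb P(S_n=k)=\binom{n}{(n+k)/2}2^{-n}$ when $n+k$ is even and $|k|\le n$ (and $0$ otherwise), it can be cross-checked against brute-force enumeration for small $n$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def walk_pmf(n, k):
    """P(S_n = k): C(n, (n+k)/2) / 2^n if n+k is even and |k| <= n, else 0."""
    if abs(k) > n or (n + k) % 2 != 0:
        return Fraction(0)
    return Fraction(comb(n, (n + k) // 2), 2 ** n)


def walk_pmf_by_enumeration(n, k):
    """The same probability by brute force over all 2^n sign sequences."""
    hits = sum(1 for steps in product((-1, 1), repeat=n) if sum(steps) == k)
    return Fraction(hits, 2 ** n)
```

At $n=3$, `walk_pmf` gives $\frac18,\frac38,\frac38,\frac18$ at $k=-3,-1,1,3$ and $0$ at $k=0$, matching the three-step example.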
The exact formula also raises a limiting question: do the small parity-constrained imbalances accumulate into a persistent linear bias, or do they disappear after averaging? The quantity $S_n/n$ measures the average signed step, so its almost sure limit records whether the walk has any long-run drift rather than merely finite-time fluctuation.
[quotetheorem:520]
The theorem formalises the idea that the signs cancel in the long run. The random walk still visits large values, but its linear drift vanishes almost surely.
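A single simulated path cannot prove an almost sure statement, but it shows the averaging at work. The sketch records $S_n/n$ at a few checkpoints along one seeded walk. The function name, seed, and checkpoints are arbitrary illustrative choices. Since $S_n/n$ has standard deviation $1/\sqrt n$, the last recorded ratio is typically of order $0.003$.

```python
import random


def average_step_path(n_steps, seed=0):
    """Simulate one simple symmetric random walk and record S_n / n at checkpoints."""
    rng = random.Random(seed)
    position = 0
    checkpoints = {}
    for n in range(1, n_steps + 1):
        position += rng.choice((-1, 1))   # one independent fair sign per step
        if n in (10, 100, 1_000, 10_000, 100_000):
            checkpoints[n] = position / n
    return checkpoints


ratios = average_step_path(100_000)
```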
## Symmetrisation and Randomization
### External Random Signs
Rademacher variables are not only objects to study; they are tools for modifying other random variables. Multiplying by an independent random sign forces symmetry without changing magnitude, and inserting signs into a sum separates size from direction.
[definition: Symmetrisation by a Rademacher Variable]
Let $X:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ be a real-valued random variable on a probability space $(\Omega,\mathcal F,\mathbb P)$, and let $\varepsilon:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ be a Rademacher random variable on the same probability space, independent of $X$. The Rademacher symmetrisation of $X$ is the real-valued random variable $X^{\mathrm{sym}}:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ defined by
\begin{align*}
X^{\mathrm{sym}} = \varepsilon X.
\end{align*}
[/definition]
The definition preserves magnitude by construction, but symmetrisation is useful only if its distribution can be computed from the original law. Independence separates the two possible signs, so probabilities for $\varepsilon X$ should receive equal contributions from the law of $X$ and from its reflected law. The formal identity below is the bookkeeping that makes this averaging usable in later arguments.
[quotetheorem:10115]
The formula shows why the independent sign matters. If the sign is chosen from $X$ itself, the magnitude may be preserved but the desired symmetry can fail.
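Assuming the quoted identity takes the usual form $\mathbb P(\varepsilon X\in A)=\frac12\mathbb P(X\in A)+\frac12\mathbb P(-X\in A)$, it can be compared with a direct computation from a joint law. The hypothetical helper `symmetrised_law` averages a finite law with its reflection. The helper `law_of_product` computes the law of $\varepsilon X$ from any joint law of $(\varepsilon,X)$, including the dependent choice $\varepsilon=X$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def symmetrised_law(law_x):
    """Law of eps*X with eps independent of X: average the law of X with its reflection."""
    out = {}
    for x, p in law_x.items():
        for sign in (-1, 1):
            out[sign * x] = out.get(sign * x, 0) + HALF * p
    return out


def law_of_product(joint):
    """Law of eps*X from an arbitrary joint law {(eps, x): probability}."""
    out = {}
    for (e, x), p in joint.items():
        out[e * x] = out.get(e * x, 0) + p
    return out


law_x = {-1: HALF, 1: HALF}                                            # X Rademacher
independent = {(e, x): HALF * p for e in (-1, 1) for x, p in law_x.items()}
dependent = {(x, x): p for x, p in law_x.items()}                      # the choice eps = X
```

The independent joint law reproduces `symmetrised_law(law_x)`. The dependent one puts all its mass at $1$.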
[example: Dependence Can Destroy Symmetrisation]
Let $X$ be a Rademacher random variable and define $\varepsilon=X$. Then $\varepsilon$ has the same distribution as $X$, so
\begin{align*}
\mathbb P(\varepsilon=1)=\mathbb P(X=1)=\frac{1}{2}.
\end{align*}
Also,
\begin{align*}
\mathbb P(\varepsilon=-1)=\mathbb P(X=-1)=\frac{1}{2}.
\end{align*}
Thus $\varepsilon$ is again a Rademacher random variable.
However, $\varepsilon$ is not independent of $X$. Indeed, the event $\{\varepsilon=1,X=1\}$ is the same event as $\{X=1\}$, because $\varepsilon=X$. Hence
\begin{align*}
\mathbb P(\varepsilon=1,X=1)=\mathbb P(X=1)=\frac{1}{2}.
\end{align*}
But
\begin{align*}
\mathbb P(\varepsilon=1)\mathbb P(X=1)=\frac{1}{2}\cdot\frac{1}{2}=\frac{1}{4}.
\end{align*}
Since $\frac{1}{2}\ne\frac{1}{4}$, the factorisation required for independence fails.
The product is degenerate. On the event $\{X=1\}$,
\begin{align*}
\varepsilon X=X\cdot X=1\cdot 1=1.
\end{align*}
On the event $\{X=-1\}$,
\begin{align*}
\varepsilon X=X\cdot X=(-1)(-1)=1.
\end{align*}
Since these two events have total probability
\begin{align*}
\mathbb P(X=1)+\mathbb P(X=-1)=\frac{1}{2}+\frac{1}{2}=1,
\end{align*}
we have $\varepsilon X=1$ almost surely. Therefore
\begin{align*}
|\varepsilon X|=|1|=1
\end{align*}
almost surely, while
\begin{align*}
|X|=1
\end{align*}
almost surely because $X$ only takes the values $-1$ and $1$. Thus the magnitude is preserved, but the distribution is not symmetric around zero: $\mathbb P(\varepsilon X=1)=1$ and $\mathbb P(\varepsilon X=-1)=0$. Independence is the condition that makes the random sign external rather than already encoded in $X$.
[/example]
### Empirical Rademacher Averages
Symmetrisation also appears in empirical process theory, where random signs are used to compare a random fluctuation with a more manageable signed version. To measure how much a class can fit random noise, we let the class choose the vector that correlates best with the signs.
[definition: Empirical Rademacher Average]
Let $n\in\mathbb N$, and let $\varepsilon_1,\ldots,\varepsilon_n$ be independent Rademacher random variables on a probability space $(\Omega,\mathcal F,\mathbb P)$. Let $\mathcal D_n$ be the collection of all nonempty sets $\mathcal A\subset\mathbb R^n$ such that the random variable
\begin{align*}
\omega \mapsto \sup_{a\in\mathcal A}\frac{1}{n}\sum_{i=1}^n \varepsilon_i(\omega)a_i
\end{align*}
is $\mathcal F$-measurable and integrable. The empirical Rademacher average is the function $\mathfrak R_n:\mathcal D_n\to\mathbb R$ defined by
\begin{align*}
\mathfrak R_n(\mathcal A)=\mathbb E\left[\sup_{a\in\mathcal A}\frac{1}{n}\sum_{i=1}^n \varepsilon_i a_i\right].
\end{align*}
[/definition]
The supremum asks how well the set $\mathcal A$ can align with a random sign pattern. A small Rademacher average means that no vector in the class usually correlates strongly with pure noise.
[example: Rademacher Average of a Singleton]
Let $\mathcal A=\{a\}$ for a fixed vector $a=(a_1,\ldots,a_n)\in\mathbb R^n$. For every outcome $\omega$, the supremum over the singleton contains only one value, so
\begin{align*}
\sup_{b\in\mathcal A}\frac{1}{n}\sum_{i=1}^n \varepsilon_i(\omega)b_i=\frac{1}{n}\sum_{i=1}^n \varepsilon_i(\omega)a_i.
\end{align*}
Thus the defining formula for the empirical Rademacher average gives
\begin{align*}
\mathfrak R_n(\mathcal A)=\mathbb E\left[\frac{1}{n}\sum_{i=1}^n \varepsilon_i a_i\right].
\end{align*}
For each $i$, since $\varepsilon_i$ is Rademacher,
\begin{align*}
\mathbb E[\varepsilon_i]=1\cdot\mathbb P(\varepsilon_i=1)+(-1)\cdot\mathbb P(\varepsilon_i=-1).
\end{align*}
Substituting the two probabilities gives
\begin{align*}
\mathbb E[\varepsilon_i]=1\cdot\frac{1}{2}+(-1)\cdot\frac{1}{2}=\frac{1}{2}-\frac{1}{2}=0.
\end{align*}
Using linearity of expectation for the finite sum,
\begin{align*}
\mathfrak R_n(\mathcal A)=\frac{1}{n}\sum_{i=1}^n a_i\mathbb E[\varepsilon_i].
\end{align*}
Substituting $\mathbb E[\varepsilon_i]=0$ for every $i$ gives
\begin{align*}
\mathfrak R_n(\mathcal A)=\frac{1}{n}\sum_{i=1}^n a_i\cdot 0=0.
\end{align*}
A single fixed vector cannot adapt to the signs, so its average correlation with the signs vanishes.
[/example]
The same definition becomes nonzero when the class contains enough choices to react to the signs. This is the bridge from a two-point distribution to complexity measures in learning theory.
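For a finite set $\mathcal A$ with integer or `Fraction` entries, the supremum is a maximum and $\mathfrak R_n(\mathcal A)$ can be computed exactly by enumerating all $2^n$ sign vectors. The sketch below uses the hypothetical name `empirical_rademacher_average`. It contrasts a singleton, which gives $0$ as in the example, with classes that can react to the signs.

```python
from fractions import Fraction
from itertools import product


def empirical_rademacher_average(vectors):
    """Exact R_n(A) for a finite set A of vectors in R^n: average over all 2^n sign
    vectors of the best correlation max_a (1/n) sum_i eps_i a_i."""
    vectors = [tuple(v) for v in vectors]
    n = len(vectors[0])
    total = Fraction(0)
    for signs in product((-1, 1), repeat=n):
        total += max(Fraction(sum(s * a for s, a in zip(signs, v)), n) for v in vectors)
    return total / 2 ** n
```

The class $\{(1,1),(-1,-1)\}$ gives $\frac12$. The class of all sign vectors in $\{-1,1\}^n$ gives $1$, since it contains a perfect match for every sign pattern.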
## Inequalities and Comparison Principles
### Khintchine-Type Estimates
Rademacher variables provide a clean setting for inequalities because their randomness is bounded, centered, and independent. A first comparison relates moments of random signed sums to the Euclidean norm of the coefficients. For $p=2$ this is the variance identity; for other $p$ it is the Khintchine inequality.
[quotetheorem:10116]
The inequality says that, for random signs, every $L^p$ size of the signed sum is comparable to the $L^2$ coefficient scale. This is not a statement about arbitrary independent variables; it uses the symmetry, boundedness, and exact cancellation of Rademacher signs.
[example: The $p=2$ Case of Khintchine]
Let $S=\sum_{i=1}^n a_i\varepsilon_i$, where $\varepsilon_1,\ldots,\varepsilon_n$ are independent Rademacher random variables. Since $|S|^2=S^2$, we compute the second moment.
\begin{align*}
S^2=\left(\sum_{i=1}^n a_i\varepsilon_i\right)\left(\sum_{j=1}^n a_j\varepsilon_j\right)=\sum_{i=1}^n\sum_{j=1}^n a_i a_j\varepsilon_i\varepsilon_j.
\end{align*}
By linearity of expectation for the finite double sum,
\begin{align*}
\mathbb E[S^2]=\sum_{i=1}^n\sum_{j=1}^n a_i a_j\mathbb E[\varepsilon_i\varepsilon_j].
\end{align*}
Separate the diagonal terms $i=j$ from the off-diagonal terms $i\ne j$:
\begin{align*}
\mathbb E[S^2]=\sum_{i=1}^n a_i^2\mathbb E[\varepsilon_i^2]+\sum_{i\ne j}a_i a_j\mathbb E[\varepsilon_i\varepsilon_j].
\end{align*}
For each $i$, the exponent $2$ is even, so *Moments of a Rademacher Random Variable* gives
\begin{align*}
\mathbb E[\varepsilon_i^2]=1.
\end{align*}
If $i\ne j$, then $\varepsilon_i$ and $\varepsilon_j$ are independent, so $\varepsilon_i\varepsilon_j$ has expectation
\begin{align*}
\mathbb E[\varepsilon_i\varepsilon_j]=\mathbb E[\varepsilon_i]\mathbb E[\varepsilon_j].
\end{align*}
Each exponent $1$ is odd, so *Moments of a Rademacher Random Variable* gives
\begin{align*}
\mathbb E[\varepsilon_i]=0
\end{align*}
and
\begin{align*}
\mathbb E[\varepsilon_j]=0.
\end{align*}
Thus, for $i\ne j$,
\begin{align*}
\mathbb E[\varepsilon_i\varepsilon_j]=0\cdot 0=0.
\end{align*}
Substituting the diagonal and off-diagonal values gives
\begin{align*}
\mathbb E[S^2]=\sum_{i=1}^n a_i^2\cdot 1+\sum_{i\ne j}a_i a_j\cdot 0.
\end{align*}
Therefore
\begin{align*}
\mathbb E[S^2]=\sum_{i=1}^n a_i^2.
\end{align*}
Taking square roots,
\begin{align*}
\left(\mathbb E[|S|^2]\right)^{1/2}=\left(\sum_{i=1}^n a_i^2\right)^{1/2}.
\end{align*}
So in the case $p=2$, the lower and upper Khintchine bounds are both exact, and the Khintchine inequality holds with $A_2=B_2=1$.
[/example]
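The same pairing argument extends beyond $p=2$. In the expansion of $S^4$, only index patterns in which the signs pair up survive, which gives $\mathbb E[S^4]=3\big(\sum_i a_i^2\big)^2-2\sum_i a_i^4$. In particular $(\mathbb E[S^4])^{1/4}\le 3^{1/4}\big(\sum_i a_i^2\big)^{1/2}$. The sketch checks both moment identities by exact enumeration for one arbitrary coefficient vector. The helper name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def signed_sum_moment(coeffs, p):
    """Exact E[|sum a_i eps_i|^p] for an integer p >= 1, by enumerating all 2^n sign vectors."""
    n = len(coeffs)
    total = sum(abs(sum(a * s for a, s in zip(coeffs, signs))) ** p
                for signs in product((-1, 1), repeat=n))
    return Fraction(total, 2 ** n)


coeffs = (3, 1, -2, 5)
l2_squared = sum(a * a for a in coeffs)
# p = 2: the second moment equals the squared Euclidean norm (A_2 = B_2 = 1).
second = signed_sum_moment(coeffs, 2)
# p = 4: only paired indices survive, giving 3 (sum a_i^2)^2 - 2 sum a_i^4.
fourth = signed_sum_moment(coeffs, 4)
fourth_formula = 3 * l2_squared ** 2 - 2 * sum(a ** 4 for a in coeffs)
```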
### Banach-Space Type
A second comparison concerns signs attached to vectors in a [normed space](/page/Normed%20Space). The scalar theory says that random signs measure Euclidean size; the vector theory asks how much cancellation survives after taking a norm.
[definition: Rademacher Type $p$]
Let $(X,\|\cdot\|_X)$ be a [Banach space](/page/Banach%20Space) and let $1\le p\le 2$. The space $X$ has Rademacher type $p$ if there exists a constant $T>0$ such that for every $n\in\mathbb N$, every $x_1,\ldots,x_n\in X$, and independent Rademacher random variables $\varepsilon_1,\ldots,\varepsilon_n$,
\begin{align*}
\mathbb E\left[\left\|\sum_{i=1}^n \varepsilon_i x_i\right\|_X^p\right] \le T^p\sum_{i=1}^n \|x_i\|_X^p.
\end{align*}
[/definition]
The most important test case for type is a [Hilbert space](/page/Hilbert%20Space), where the [inner product](/page/Inner%20Product) can turn the norm squared of a random signed vector sum into a sum of pairwise products. The next theorem records the exact identity that makes Hilbert spaces the model geometry for Rademacher cancellation.
[quotetheorem:10117]
This vector-valued perspective is one reason Rademacher variables occur outside elementary probability. They turn geometric questions about norms into probabilistic questions about random signs.
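Assuming the quoted Hilbert-space identity is $\mathbb E\big[\|\sum_i\varepsilon_i x_i\|^2\big]=\sum_i\|x_i\|^2$, which gives type $2$ with $T=1$, it can be checked in $\mathbb R^d$ with the Euclidean inner product. The check enumerates the signs exactly. The vectors and helper names below are illustrative.

```python
from fractions import Fraction
from itertools import product


def expected_sq_norm_of_signed_sum(vectors):
    """E || sum_i eps_i x_i ||^2 in R^d with the Euclidean norm, by exact enumeration."""
    n, d = len(vectors), len(vectors[0])
    total = 0
    for signs in product((-1, 1), repeat=n):
        combo = [sum(s * x[k] for s, x in zip(signs, vectors)) for k in range(d)]
        total += sum(c * c for c in combo)
    return Fraction(total, 2 ** n)


def sum_sq_norms(vectors):
    """sum_i ||x_i||^2."""
    return sum(sum(c * c for c in x) for x in vectors)


xs = [(1, 2, 0), (0, -1, 3), (4, 4, -2), (2, 0, 1)]
```

The cross terms $\langle x_i,x_j\rangle\,\mathbb E[\varepsilon_i\varepsilon_j]$ vanish for $i\ne j$, so only the squared norms remain.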
## Beyond and Connected Topics
Rademacher random variables sit immediately below several larger theories. The first continuation is the general theory of [random variables](/page/Random%20Variable), where the sample-space-to-state-space viewpoint explains why the definition depends only on the induced law. The second is the study of Bernoulli, binomial, and random-walk distributions in [Cambridge IA Probability](/page/Cambridge%20IA%20Probability), where fair signs are the centered form of repeated coin tossing.
In measure-theoretic probability, Rademacher sequences become a standard example of independent random variables on product spaces. This viewpoint belongs naturally with [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure), where independence, product measures, and almost sure convergence are treated as measure-theoretic constructions rather than finite counting rules.
In advanced probability, random signs are a testing ground for concentration, symmetrisation, martingale methods, and invariance principles. They are close enough to coin flips for exact calculation, but rich enough to model noise in empirical processes and high-dimensional probability. This is one route into [Cambridge III Advanced Probability](/page/Cambridge%20III%20Advanced%20Probability).
In machine learning, empirical Rademacher averages measure how strongly a hypothesis class can correlate with pure random labels. This makes Rademacher variables part of the language of generalisation bounds, especially in the setting of [Cambridge II Mathematics of Machine Learning](/page/Cambridge%20II%20Mathematics%20of%20Machine%20Learning).
## References
Androma, [Cambridge IA Probability](/page/Cambridge%20IA%20Probability).
Androma, [Cambridge IB Probability and Measure](/page/Cambridge%20IB%20Probability%20and%20Measure).
Androma, [Cambridge II Mathematics of Machine Learning](/page/Cambridge%20II%20Mathematics%20of%20Machine%20Learning).
Androma, [Cambridge III Advanced Probability](/page/Cambridge%20III%20Advanced%20Probability).
Patrick Billingsley, *Probability and Measure* (1995).
Michel Ledoux and Michel Talagrand, *Probability in Banach Spaces* (1991).
Roman Vershynin, *High-Dimensional Probability* (2018).
Rademacher Random Variable
Also known as: Rademacher variable, Rademacher distribution, Random sign variable, Symmetric Bernoulli sign, Rademacher sequence