[proofplan]
We establish the large deviation [limit](/page/Limit) $\lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}(S_n \geq na) = \Psi^*(a)$ through matching upper and lower bounds. The upper bound is a direct exponential Chebyshev argument: for each $\lambda \geq 0$, $\mathbb{P}(S_n \geq na) \leq e^{-\lambda na} \mathbb{E}[e^{\lambda S_n}] = e^{-n(\lambda a - \Psi(\lambda))}$, and optimising over $\lambda$ gives $\Psi^*(a)$. The lower bound requires a change-of-measure (exponential tilting) argument: we shift to $X_i - a$, tilt the [distribution](/page/Distribution) to recentre the mean at $0$, and use the [Central Limit Theorem](/theorems/521) under the tilted measure to show the tilted probability of $S_n \geq 0$ converges to $1/2$. The general case reduces to the case of bounded support via truncation and a compactness argument.
[/proofplan]
[step:Upper bound: apply exponential Chebyshev and optimise over $\lambda \geq 0$]
For any $\lambda \geq 0$, the [Markov Inequality](/theorems/514) applied to the non-negative random variable $e^{\lambda S_n}$ at threshold $e^{\lambda na}$ gives
\begin{align*}
\mathbb{P}(S_n \geq na) = \mathbb{P}(e^{\lambda S_n} \geq e^{\lambda na}) \leq e^{-\lambda na} \, \mathbb{E}[e^{\lambda S_n}].
\end{align*}
Since the $X_i$ are i.i.d., $\mathbb{E}[e^{\lambda S_n}] = \mathbb{E}[e^{\lambda X_1}]^n = e^{n\Psi(\lambda)}$ (where $\Psi(\lambda) = \log \mathbb{E}[e^{\lambda X_1}]$ is the cumulant generating [function](/page/Function)). Therefore
\begin{align*}
\mathbb{P}(S_n \geq na) \leq e^{-n(\lambda a - \Psi(\lambda))}.
\end{align*}
Taking logarithms, dividing by $-n$, and optimising over $\lambda \geq 0$:
\begin{align*}
-\frac{1}{n} \log \mathbb{P}(S_n \geq na) \geq \sup_{\lambda \geq 0} (\lambda a - \Psi(\lambda)) = \Psi^*(a).
\end{align*}
Since the right-hand side is independent of $n$, $\liminf_{n \to \infty} \left(-\frac{1}{n} \log \mathbb{P}(S_n \geq na)\right) \geq \Psi^*(a)$.
[guided]
This is the exponential Chebyshev method, the workhorse of large deviation upper bounds. The idea is: to bound the probability that $S_n$ is atypically large ($\geq na$ with $a > \bar{x}$), we exponentiate both sides ($e^{\lambda S_n} \geq e^{\lambda na}$) and apply [Markov's Inequality](/theorems/514). The exponential transform converts the additive event $\{S_n \geq na\}$ into a multiplicative bound, which interacts well with the independence of the $X_i$.
The key identity $\mathbb{E}[e^{\lambda S_n}] = e^{n\Psi(\lambda)}$ comes from independence: $\mathbb{E}[e^{\lambda(X_1 + \cdots + X_n)}] = \prod_{i=1}^n \mathbb{E}[e^{\lambda X_i}] = \mathbb{E}[e^{\lambda X_1}]^n$, and taking logarithms gives $\log \mathbb{E}[e^{\lambda S_n}] = n \Psi(\lambda)$.
The bound $e^{-n(\lambda a - \Psi(\lambda))}$ is valid for every $\lambda \geq 0$, so the sharpest bound is obtained by maximising the exponent $\lambda a - \Psi(\lambda)$; the maximal value is the Legendre transform $\Psi^*(a)$. If $\Psi$ is differentiable and the supremum is attained at an interior point $\lambda^*$, then $\Psi'(\lambda^*) = a$: the optimiser is the value of the tilt parameter that recentres the distribution at $a$. This connection between the optimal Chebyshev bound and exponential tilting is the conceptual backbone of Cramér's theorem.
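As a concrete check of the optimised bound, the following minimal sketch compares the exact binomial tail with $e^{-n\Psi^*(a)}$. The Bernoulli($p$) law, the parameter values, and all function names are illustrative assumptions and are not part of the proof. For this law $\Psi(\lambda) = \log(1 - p + p e^{\lambda})$, and the supremum has the closed form $a \log\frac{a}{p} + (1-a)\log\frac{1-a}{1-p}$ for $p < a < 1$.

```python
import math


def cgf_bernoulli(lam: float, p: float) -> float:
    """Psi(lambda) = log E[exp(lambda * X)] for X ~ Bernoulli(p)."""
    return math.log(1.0 - p + p * math.exp(lam))


def rate_closed_form(a: float, p: float) -> float:
    """Legendre transform Psi*(a) for Bernoulli(p), valid for p < a < 1."""
    return a * math.log(a / p) + (1.0 - a) * math.log((1.0 - a) / (1.0 - p))


def rate_grid(a: float, p: float, lam_max: float = 10.0, steps: int = 100_000) -> float:
    """Approximate sup of lambda*a - Psi(lambda) over a grid on [0, lam_max]."""
    best = 0.0  # value of the exponent at lambda = 0
    for i in range(1, steps + 1):
        lam = i * lam_max / steps
        best = max(best, lam * a - cgf_bernoulli(lam, p))
    return best


def exact_tail(n: int, a: float, p: float) -> float:
    """P(S_n >= n*a) for S_n ~ Binomial(n, p), summed term by term."""
    k_min = math.ceil(n * a)
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(k_min, n + 1)
    )


# Illustrative parameters (assumed): mean p = 0.3, threshold a = 0.5 > p.
p, a, n = 0.3, 0.5, 200
tail = exact_tail(n, a, p)
chernoff = math.exp(-n * rate_closed_form(a, p))

print(f"exact tail     = {tail:.3e}")
print(f"Chernoff bound = {chernoff:.3e}")
print(f"grid sup       = {rate_grid(a, p):.6f}")
print(f"closed form    = {rate_closed_form(a, p):.6f}")
```

The exact tail is smaller than the bound by a factor that is only polynomial in $n$, which is why the two agree on the exponential scale.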
[/guided]
[/step]
[step:Reduce the lower bound to the case $a = 0$, $\bar{x} \leq 0$]
Replacing $X_i$ by $\tilde{X}_i := X_i - a$ shifts the mean to $\tilde{\bar{x}} = \bar{x} - a \leq 0$ (since $a \geq \bar{x}$) and transforms the cumulant generating function:
\begin{align*}
\tilde{\Psi}(\lambda) = \log \mathbb{E}[e^{\lambda(X_1 - a)}] = \Psi(\lambda) - \lambda a,
\end{align*}
so $\tilde{\Psi}^*(0) = \sup_{\lambda \geq 0}(-\tilde{\Psi}(\lambda)) = \sup_{\lambda \geq 0}(\lambda a - \Psi(\lambda)) = \Psi^*(a)$. The event $\{S_n \geq na\}$ becomes $\{\tilde{S}_n \geq 0\}$. It therefore suffices to prove: if $\bar{x} \leq 0$, then
\begin{align*}
\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}(S_n \geq 0) \geq \inf_{\lambda \geq 0} \Psi(\lambda).
\end{align*}
[/step]
[step:Case 1: $\mathbb{P}(X_1 > 0) = 0$]
If $X_1 \leq 0$ a.s., then $S_n \geq 0$ if and only if $S_n = 0$, which requires $X_i = 0$ for all $i$. By independence,
\begin{align*}
\mathbb{P}(S_n \geq 0) = \mathbb{P}(X_1 = 0)^n.
\end{align*}
For the rate function: $\Psi(\lambda) = \log \mathbb{E}[e^{\lambda X_1}]$, and since $X_1 \leq 0$ a.s., $e^{\lambda X_1} \leq 1$ for $\lambda \geq 0$, with equality on $\{X_1 = 0\}$. As $\lambda \uparrow +\infty$ we have $e^{\lambda X_1} \downarrow \mathbb{1}_{\{X_1 = 0\}}$ a.s., so $1 - e^{\lambda X_1} \uparrow \mathbb{1}_{\{X_1 < 0\}}$ is a non-negative increasing family. The [Monotone Convergence Theorem](/theorems/509) applied to this family gives $\lim_{\lambda \to +\infty} \mathbb{E}[e^{\lambda X_1}] = 1 - \mathbb{P}(X_1 < 0) = \mathbb{P}(X_1 = 0)$. Therefore
\begin{align*}
\inf_{\lambda \geq 0} \Psi(\lambda) \leq \lim_{\lambda \to \infty} \Psi(\lambda) = \log \mathbb{P}(X_1 = 0),
\end{align*}
and, for every $n$, $\frac{1}{n} \log \mathbb{P}(S_n \geq 0) = \log \mathbb{P}(X_1 = 0) \geq \inf_{\lambda \geq 0} \Psi(\lambda)$ (with the convention $\log 0 = -\infty$ if $\mathbb{P}(X_1 = 0) = 0$).
[/step]
[step:Case 2: $\mathbb{E}[e^{\lambda X_1}] < \infty$ for all $\lambda$ and $\mathbb{P}(X_1 > 0) > 0$]
Since $\mathbb{E}[e^{\lambda X_1}] < \infty$ for all $\lambda \in \mathbb{R}$, the cumulant generating function $\Psi$ is $C^\infty$ on $\mathbb{R}$ (by differentiation under the [integral](/page/Integral), justified by dominated convergence). Define
\begin{align*}
M(\lambda) := \mathbb{E}[e^{\lambda X_1}], \quad \text{so } \Psi(\lambda) = \log M(\lambda), \quad \Psi'(\lambda) = \frac{M'(\lambda)}{M(\lambda)} = \frac{\mathbb{E}[X_1 e^{\lambda X_1}]}{M(\lambda)}.
\end{align*}
At $\lambda = 0$: $\Psi'(0) = \mathbb{E}[X_1] = \bar{x} \leq 0$.
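As $\lambda \to +\infty$: since $\mathbb{P}(X_1 > 0) > 0$, there exists $\delta > 0$ with $\mathbb{P}(X_1 \geq \delta) > 0$. Then $M(\lambda) \geq e^{\lambda \delta} \, \mathbb{P}(X_1 \geq \delta)$ for $\lambda \geq 0$, so $\Psi(\lambda) \geq \lambda \delta + \log \mathbb{P}(X_1 \geq \delta) \to +\infty$. Because $\Psi(0) = 0$, the function $\Psi$ cannot be non-increasing on $[0, \infty)$, so $\Psi'(\lambda_1) > 0$ for some $\lambda_1 > 0$. (The derivative $\Psi'$ itself need not diverge: if $X_1 \leq b$ a.s., then $\Psi'(\lambda) \leq b$ for all $\lambda$.)
By the [intermediate value theorem](/theorems/629) applied to the continuous function $\Psi'$ on $[0, \lambda_1]$, there exists $\theta \in [0, \lambda_1)$ with $\Psi'(\theta) = 0$; here $\theta > 0$ if $\bar{x} < 0$, while $\theta = 0$ is possible if $\bar{x} = 0$. Since $\Psi$ is convex, $\theta$ is a global minimiser of $\Psi$, so $\Psi(\theta) = \inf_{\lambda \geq 0} \Psi(\lambda)$.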
[guided]
The parameter $\theta$ is the exponential tilt that recentres the mean at $0$. We are looking for a change of measure under which $S_n/n$ concentrates near $0$ (instead of near $\bar{x} \leq 0$), so that $\mathbb{P}_\theta(S_n \geq 0)$ is bounded away from $0$.
The existence of $\theta$ requires two ingredients: (i) $\Psi'(0) = \bar{x} \leq 0$ (the original mean is non-positive), and (ii) $\Psi'(\lambda) > 0$ for some $\lambda > 0$ (possible because $X_1$ has positive mass on $(0, \infty)$, which forces $\Psi(\lambda) \to +\infty$ as $\lambda \to +\infty$, so a sufficiently strong tilt makes the mean positive). The [intermediate value theorem](/theorems/180) bridges the gap.
The condition $\mathbb{P}(X_1 > 0) > 0$ is essential: without it, we are in Case 1.
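To make the choice of $\theta$ tangible, here is a minimal sketch that locates the zero of $\Psi'$ by bisection. The two-point law $\mathbb{P}(X_1 = 1) = 0.3$, $\mathbb{P}(X_1 = -1) = 0.7$ and every name in the code are illustrative assumptions. For this law, solving $\Psi'(\theta) = 0$ by hand gives $\theta = \tfrac{1}{2}\log(7/3)$ and $\Psi(\theta) = \log(2\sqrt{0.21})$, which the code can be compared against.

```python
import math

# Assumed example law: atoms (value, probability); the mean is -0.4 <= 0
# and there is positive mass on (0, infinity), as Case 2 requires.
ATOMS = [(-1.0, 0.7), (1.0, 0.3)]


def mgf(lam: float) -> float:
    """M(lambda) = E[exp(lambda * X_1)]."""
    return sum(w * math.exp(lam * x) for x, w in ATOMS)


def cgf_prime(lam: float) -> float:
    """Psi'(lambda) = E[X_1 exp(lambda X_1)] / M(lambda)."""
    return sum(x * w * math.exp(lam * x) for x, w in ATOMS) / mgf(lam)


def find_tilt(tol: float = 1e-12) -> float:
    """Bisection for a zero of Psi' on [0, hi], where Psi'(0) <= 0 < Psi'(hi)."""
    lo, hi = 0.0, 1.0
    while cgf_prime(hi) <= 0.0:  # grow the bracket until Psi' changes sign
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cgf_prime(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


theta = find_tilt()
print(f"theta      = {theta:.10f}")
print(f"Psi'(theta) = {cgf_prime(theta):.2e}")
print(f"Psi(theta)  = {math.log(mgf(theta)):.10f}")
```

Bisection is used here only because it mirrors the intermediate value argument; any root finder for a continuous monotone function would do.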
[/guided]
[/step]
[step:Tilt the measure by $\theta$ and apply the [Central Limit Theorem](/theorems/521)]
Define the tilted probability measure $\mathbb{P}_\theta$ by the Radon-Nikodym [derivative](/page/Derivative)
\begin{align*}
\frac{d\mathbb{P}_\theta}{d\mathbb{P}} = \frac{e^{\theta S_n}}{M(\theta)^n}.
\end{align*}
Under $\mathbb{P}_\theta$, the $X_i$ are i.i.d. with common distribution having density $e^{\theta x}/M(\theta)$ with respect to the original law of $X_1$. The mean and variance under $\mathbb{P}_\theta$ are
\begin{align*}
\mathbb{E}_\theta[X_1] = \Psi'(\theta) = 0, \quad \operatorname{Var}_\theta(X_1) = \Psi''(\theta) =: \sigma_\theta^2.
\end{align*}
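The variance $\sigma_\theta^2$ is finite and positive: finiteness follows from the hypothesis $M(\lambda) < \infty$ for all $\lambda$ (which guarantees all moments are finite under $\mathbb{P}_\theta$), and positivity holds because $X_1$ is not a.s. constant under $\mathbb{P}_\theta$. Indeed, $\mathbb{P}_\theta$ and $\mathbb{P}$ have the same null sets, so $\mathbb{P}_\theta(X_1 > 0) > 0$, and a random variable with mean $\mathbb{E}_\theta[X_1] = 0$ that is positive with positive probability cannot be constant.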
For any $\varepsilon > 0$, we bound $\mathbb{P}(S_n \geq 0)$ from below by restricting to the event $\{S_n \in [0, \varepsilon n]\}$:
\begin{align*}
\mathbb{P}(S_n \geq 0) &\geq \mathbb{P}(S_n \in [0, \varepsilon n]) \\
&= \mathbb{E}\!\left[\mathbb{1}_{\{S_n \in [0, \varepsilon n]\}}\right] \\
&= \mathbb{E}_\theta\!\left[\mathbb{1}_{\{S_n \in [0, \varepsilon n]\}} \cdot \frac{M(\theta)^n}{e^{\theta S_n}}\right] \\
&\geq M(\theta)^n \cdot e^{-\theta \varepsilon n} \cdot \mathbb{P}_\theta(S_n \in [0, \varepsilon n]),
\end{align*}
where the last inequality uses $e^{\theta S_n} \leq e^{\theta \varepsilon n}$ on $\{S_n \in [0, \varepsilon n]\}$ (since $\theta \geq 0$ and $S_n \leq \varepsilon n$).
By the [Central Limit Theorem](/theorems/521) applied under $\mathbb{P}_\theta$ (the $X_i$ are i.i.d. with mean $0$ and finite variance $\sigma_\theta^2$), $S_n / (\sigma_\theta \sqrt{n}) \xrightarrow{d} \mathcal{N}(0,1)$ under $\mathbb{P}_\theta$. The event $\{S_n \in [0, \varepsilon n]\}$ in the scaled variable is $\{S_n / (\sigma_\theta \sqrt{n}) \in [0, \varepsilon \sqrt{n} / \sigma_\theta]\}$. Since $\varepsilon \sqrt{n} / \sigma_\theta \to +\infty$,
\begin{align*}
\mathbb{P}_\theta(S_n \in [0, \varepsilon n]) \to \mathbb{P}(Z \geq 0) = \frac{1}{2},
\end{align*}
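where $Z \sim \mathcal{N}(0,1)$. In detail: for any fixed $A > 0$ and all $n$ with $\varepsilon \sqrt{n} / \sigma_\theta \geq A$, the probability is at least $\mathbb{P}_\theta(S_n / (\sigma_\theta \sqrt{n}) \in [0, A]) \to \mathbb{P}(0 \leq Z \leq A)$, and it is at most $\mathbb{P}_\theta(S_n \geq 0) \to 1/2$; letting $A \to \infty$ gives the limit.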
[guided]
The exponential tilting (or change of measure) is the central technique in the lower bound. The idea: the event $\{S_n \geq 0\}$ is a large deviation event under $\mathbb{P}$ (since $\mathbb{E}[S_n] = n\bar{x} \leq 0$, the sum $S_n$ must fluctuate above its mean). Under $\mathbb{P}_\theta$, the same event is typical (since $\mathbb{E}_\theta[S_n] = 0$).
The Radon-Nikodym derivative $d\mathbb{P}_\theta/d\mathbb{P} = e^{\theta S_n}/M(\theta)^n$ is a product of i.i.d. factors $e^{\theta X_i}/M(\theta)$, which makes $\mathbb{P}_\theta$ another product measure. This is the key structural feature: exponential tilting preserves independence.
The restriction to $\{S_n \in [0, \varepsilon n]\}$ (instead of $\{S_n \geq 0\}$) ensures an upper bound on $e^{\theta S_n}$, which is needed to convert the change-of-measure identity into a lower bound. On the larger event $\{S_n \geq 0\}$, the factor $e^{-\theta S_n}$ could be arbitrarily small (when $S_n$ is large), preventing a useful bound.
The CLT under $\mathbb{P}_\theta$ is applicable because $\theta$ was chosen to make $\mathbb{E}_\theta[X_1] = 0$, and the variance $\sigma_\theta^2 = \Psi''(\theta)$ is finite (all moments exist since $M(\lambda) < \infty$ for all $\lambda$). The scaling $[0, \varepsilon n] = \sigma_\theta \sqrt{n} \cdot [0, \varepsilon \sqrt{n}/\sigma_\theta]$ shows that the interval grows to $[0, +\infty)$ in the CLT scale, capturing half the Gaussian mass.
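The tilting step can be checked numerically on a small example. The sketch below again assumes the illustrative two-point law $\mathbb{P}(X_1 = 1) = 0.3$, $\mathbb{P}(X_1 = -1) = 0.7$ (not part of the proof), for which $\theta = \tfrac{1}{2}\log(7/3)$ solves $\Psi'(\theta) = 0$ and the tilted law is the symmetric $\pm 1$ walk. It evaluates $\mathbb{P}_\theta(S_n \in [0, \varepsilon n])$ and the resulting lower bound $M(\theta)^n e^{-\theta \varepsilon n} \, \mathbb{P}_\theta(S_n \in [0, \varepsilon n])$, with binomial terms computed in log space to avoid underflow.

```python
import math

P_UP = 0.3  # assumed: P(X_1 = +1); P(X_1 = -1) = 0.7
THETA = 0.5 * math.log((1.0 - P_UP) / P_UP)  # zero of Psi' for this law
M_THETA = P_UP * math.exp(THETA) + (1.0 - P_UP) * math.exp(-THETA)


def tilted_p_up() -> float:
    """P_theta(X_1 = +1): original mass reweighted by exp(theta * x) / M(theta)."""
    return P_UP * math.exp(THETA) / M_THETA


def prob_window(n: int, eps: float, p_up: float) -> float:
    """P(0 <= S_n <= eps * n) for i.i.d. +/-1 steps with P(step = +1) = p_up.

    With k up-steps, S_n = 2k - n; each binomial term is built in log space.
    """
    total = 0.0
    for k in range(n + 1):
        s = 2 * k - n
        if 0 <= s <= eps * n:
            log_term = (
                math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p_up) + (n - k) * math.log(1.0 - p_up)
            )
            total += math.exp(log_term)
    return total


n, eps = 2000, 0.1
tilted_window = prob_window(n, eps, tilted_p_up())
lower_bound = M_THETA**n * math.exp(-THETA * eps * n) * tilted_window
actual = prob_window(n, 1.0, P_UP)  # eps = 1 covers every value S_n >= 0

print(f"P_theta(X_1 = +1)            = {tilted_p_up():.6f}")
print(f"P_theta(S_n in [0, eps*n])   = {tilted_window:.4f}")
print(f"lower bound on P(S_n >= 0)   = {lower_bound:.3e}")
print(f"actual P(S_n >= 0)           = {actual:.3e}")
```

The lower bound is far from tight for fixed $\varepsilon$, because of the factor $e^{-\theta \varepsilon n}$; it only becomes sharp on the exponential scale after $\varepsilon \downarrow 0$.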
[/guided]
[/step]
[step:Extract the exponential rate for Case 2]
Taking logarithms in the bound $\mathbb{P}(S_n \geq 0) \geq M(\theta)^n e^{-\theta \varepsilon n} \mathbb{P}_\theta(S_n \in [0, \varepsilon n])$, dividing by $n$, and taking $\liminf$:
\begin{align*}
\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}(S_n \geq 0) &\geq \log M(\theta) - \theta \varepsilon + \liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}_\theta(S_n \in [0, \varepsilon n]) \\
&= \Psi(\theta) - \theta \varepsilon + 0,
\end{align*}
since $\mathbb{P}_\theta(S_n \in [0, \varepsilon n]) \to 1/2 > 0$ implies $\frac{1}{n} \log \mathbb{P}_\theta(S_n \in [0, \varepsilon n]) \to 0$.
Letting $\varepsilon \downarrow 0$:
\begin{align*}
\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}(S_n \geq 0) \geq \Psi(\theta) \geq \inf_{\lambda \geq 0} \Psi(\lambda).
\end{align*}
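The convergence of the exponential rate can be observed directly in an example. This is a minimal sketch under the assumption of the illustrative two-point law $\mathbb{P}(X_1 = 1) = 0.3$, $\mathbb{P}(X_1 = -1) = 0.7$, for which $\inf_{\lambda \geq 0} \Psi(\lambda) = \log(2\sqrt{0.3 \cdot 0.7})$; the function names are hypothetical. It computes $\frac{1}{n} \log \mathbb{P}(S_n \geq 0)$ exactly, using a log-sum-exp over binomial terms so that large $n$ does not underflow.

```python
import math

P_UP = 0.3  # assumed: P(X_1 = +1); P(X_1 = -1) = 0.7
PSI_MIN = math.log(2.0 * math.sqrt(P_UP * (1.0 - P_UP)))  # inf of Psi for this law


def log_prob_nonneg(n: int) -> float:
    """log P(S_n >= 0) for i.i.d. +/-1 steps, via log-sum-exp of binomial terms."""
    logs = []
    # S_n = 2k - n >= 0 exactly when k >= ceil(n / 2).
    for k in range((n + 1) // 2, n + 1):
        logs.append(
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(P_UP) + (n - k) * math.log(1.0 - P_UP)
        )
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def empirical_rate(n: int) -> float:
    """(1/n) log P(S_n >= 0)."""
    return log_prob_nonneg(n) / n


RATES = {n: empirical_rate(n) for n in (10, 100, 1000, 4000)}
for n, r in RATES.items():
    print(f"n = {n:5d}: (1/n) log P(S_n >= 0) = {r:.5f}   gap = {PSI_MIN - r:.5f}")
print(f"limit inf Psi = {PSI_MIN:.5f}")
```

The finite-$n$ values stay below the limit, as the upper bound requires, and the gap shrinks slowly because the sub-exponential prefactor contributes a correction that vanishes only after division by $n$.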
[/step]
[step:Case 3 (general): reduce to bounded support via truncation and compactness]
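For general $X_1$ with $\mathbb{P}(X_1 > 0) > 0$ but possibly $M(\lambda) = \infty$ for some $\lambda$, we truncate. Let $\mu$ denote the law of $X_1$. For each $K > 0$ large enough that $\mu((0, K]) > 0$, let $\nu_K$ denote the law of $X_1$ conditioned on $|X_1| \leq K$, and define the truncated (unnormalised) cumulant generating function
\begin{align*}
\Psi_K(\lambda) := \log \int_{-K}^{K} e^{\lambda x} \, d\mu(x).
\end{align*}
Since $|x| \leq K$ on the domain of integration, $M_K(\lambda) := \int_{-K}^K e^{\lambda x} \, d\mu(x)$ is finite for all $\lambda$. The normalised measure $\nu_K = \mu(\,\cdot \cap [-K, K]) / \mu([-K, K])$ is a probability measure with cumulant generating function $\Psi_K(\lambda) - \log \mu([-K, K])$, which is finite for all $\lambda$, so Case 2 applies to the $\nu_K$-distributed variables whenever the mean of $\nu_K$ is non-positive. If the mean of $\nu_K$ is positive, the conclusion of Case 2 still holds for $\nu_K$: by the weak law of large numbers the probability that a sum of $n$ i.i.d. $\nu_K$-variables is non-negative tends to $1$, so its normalised logarithm tends to $0$, which is at least the infimum over $\lambda \geq 0$ of a cumulant generating function (the latter vanishes at $\lambda = 0$).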
Let $\mu_n$ and $\nu_{K,n}$ denote the laws of $S_n$ when the summands are i.i.d. with law $\mu$ and $\nu_K$ respectively. The event that all of $X_1, \dots, X_n$ fall in $[-K, K]$ has probability $\mu([-K, K])^n$ under $\mu^{\otimes n}$, and conditionally on this event the $X_i$ are i.i.d. with law $\nu_K$. Discarding the complement of this event gives
\begin{align*}
\mu_n([0, \infty)) \geq \nu_{K,n}([0, \infty)) \cdot \mu([-K, K])^n.
\end{align*}
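Taking logarithms, dividing by $n$, and applying Case 2 to $\nu_K$, whose cumulant generating function is $\Psi_K - \log \mu([-K, K])$:
\begin{align*}
\liminf_{n \to \infty} \frac{1}{n} \log \mu_n([0, \infty)) &\geq \log \mu([-K, K]) + \inf_{\lambda \geq 0} \bigl(\Psi_K(\lambda) - \log \mu([-K, K])\bigr) \\
&= \inf_{\lambda \geq 0} \Psi_K(\lambda).
\end{align*}
As $K \to \infty$, $\Psi_K(\lambda) \uparrow \Psi(\lambda)$ for each $\lambda \geq 0$ (by the [Monotone Convergence Theorem](/theorems/509)). Pointwise monotone convergence does not by itself allow the limit and the infimum to be interchanged; a compactness argument on the level sets of $\Psi_K$ shows that $\inf_{\lambda \geq 0} \Psi_K(\lambda) \uparrow \inf_{\lambda \geq 0} \Psi(\lambda)$. Hence $\liminf_{n \to \infty} \frac{1}{n} \log \mu_n([0, \infty)) \geq \inf_{\lambda \geq 0} \Psi(\lambda)$.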
[guided]
The convergence of infima requires a compactness argument. Define $J_K := \inf_{\lambda \geq 0} \Psi_K(\lambda)$. Since $\Psi_K \leq \Psi_{K'}$ for $K \leq K'$ (the integration domain $[-K,K]$ is enlarged), $J_K$ is non-decreasing in $K$, and $J_K \leq \inf_{\lambda \geq 0} \Psi(\lambda) =: J$ for all $K$.
We need $J_K \to J$. Fix $\delta > 0$ and $K_0$ with $\mu([\delta, K_0]) > 0$, and consider only $K \geq K_0$. For $\lambda \geq 0$ we have $\Psi_K(\lambda) \geq \lambda \delta + \log \mu([\delta, K_0])$, so $J_K \geq \log \mu([\delta, K_0]) > -\infty$ and the limit $J_\infty := \lim_{K \to \infty} J_K \leq J$ is finite. Consider the level [sets](/page/Set) $L_K := \{\lambda \geq 0 : \Psi_K(\lambda) \leq J_\infty\}$. Each $L_K$ is closed (since $\Psi_K$ is continuous), an interval (since $\Psi_K$ is convex), and bounded (since $\Psi_K(\lambda) \geq \lambda \delta + \log \mu([\delta, K_0]) \to +\infty$ as $\lambda \to +\infty$), hence compact. Each $L_K$ is non-empty: the continuous function $\Psi_K$ tends to $+\infty$, so it attains its infimum $J_K \leq J_\infty$ on $[0, \infty)$. The sets $L_K$ are nested: $L_{K'} \subset L_K$ for $K' \geq K$ (since $\Psi_{K'} \geq \Psi_K$). By the finite intersection property, $\bigcap_K L_K \neq \varnothing$, so there exists $\lambda_0 \geq 0$ with $\Psi_K(\lambda_0) \leq J_\infty$ for all $K$. Letting $K \to \infty$: $\Psi(\lambda_0) = \lim_K \Psi_K(\lambda_0) \leq J_\infty$. Therefore $J \leq \Psi(\lambda_0) \leq J_\infty \leq J$, which forces $J_\infty = J$, that is, $J_K \to J$.
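The monotone convergence $J_K \uparrow J$ can be watched in an example where truncation is genuinely needed. The sketch below assumes, purely for illustration, that $X_1 = G - 3$ with $\mathbb{P}(G = k) = 2^{-(k+1)}$ for $k = 0, 1, 2, \dots$, so that $\bar{x} = -2$ and $M(\lambda) = \infty$ for $\lambda \geq \log 2$. A direct calculation for this law gives $J = \log 2 - 3 \log 1.5$, attained at $\lambda = \log 1.5$. The ternary search relies on the convexity of $\Psi_K$, and the search interval $[0, 5]$ is an assumption that holds for this law.

```python
import math

R = 0.5    # assumed geometric ratio: P(G = k) = (1 - R) * R**k
SHIFT = 3  # X_1 = G - SHIFT, so the mean is R / (1 - R) - SHIFT = -2


def psi_K(lam: float, K: int) -> float:
    """Psi_K(lambda) = log of the sum of exp(lam * x) * mu({x}) over atoms |x| <= K."""
    total = 0.0
    for k in range(K + SHIFT + 1):
        x = k - SHIFT
        if abs(x) <= K:
            total += (1.0 - R) * R**k * math.exp(lam * x)
    return math.log(total)


def inf_psi_K(K: int, lo: float = 0.0, hi: float = 5.0, iters: int = 200) -> float:
    """J_K: ternary search for the minimum of the convex function psi_K on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if psi_K(m1, K) <= psi_K(m2, K):
            hi = m2
        else:
            lo = m1
    return psi_K(0.5 * (lo + hi), K)


J = math.log(2.0) - 3.0 * math.log(1.5)  # closed form for the untruncated law
KS = (1, 2, 4, 8, 16, 32)
J_VALUES = [inf_psi_K(K) for K in KS]

for K, j in zip(KS, J_VALUES):
    print(f"K = {K:2d}: J_K = {j:.6f}")
print(f"J            = {J:.6f}")
```

In this example the minimisers stay in a bounded interval as $K$ grows, which is the behaviour the compactness argument guarantees in general.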
[/guided]
[/step]
[step:Combine the upper and lower bounds]
The upper bound (valid for all $n$) gives
\begin{align*}
\liminf_{n \to \infty} \left(-\frac{1}{n} \log \mathbb{P}(S_n \geq na)\right) \geq \Psi^*(a).
\end{align*}
The lower bound (from Cases 1-3 and the reduction to $a = 0$) gives
\begin{align*}
\limsup_{n \to \infty} \left(-\frac{1}{n} \log \mathbb{P}(S_n \geq na)\right) \leq \Psi^*(a).
\end{align*}
Together, $\lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}(S_n \geq na) = \Psi^*(a)$.
[/step]