[proofplan]
We prove a one-sided concentration bound by shifting $X$ by an arbitrary positive parameter $a$ and applying the elementary Markov argument to the non-negative [random variable](/page/Random%20Variable) $(X+a)^2$. The event $\{X\ge r\}$ forces $(X+a)^2$ to be at least $(r+a)^2$, so the probability is bounded by the rational function $(\sigma^2+a^2)/(r+a)^2$ of $a$. We then minimize this rational function over $a>0$, obtaining the optimal choice $a=\sigma^2/r$ when $\sigma^2>0$, while the degenerate case $\sigma^2=0$ is handled separately.
[/proofplan]
[step:Handle the zero-variance case directly]
Assume first that $\sigma^2=0$. Since $\mathbb E[X]=0$, the definition of variance gives $\mathbb E[X^2]=0$. For the fixed number $r>0$, define the event
\begin{align*}
A_r:=\{\omega\in\Omega:X(\omega)\ge r\}\in\mathcal F.
\end{align*}
Pointwise on $\Omega$,
\begin{align*}
r^2\mathbb 1_{A_r}\le X^2.
\end{align*}
Taking expectations and using monotonicity of expectation gives
\begin{align*}
r^2\mathbb P(A_r)=\mathbb E[r^2\mathbb 1_{A_r}]\le \mathbb E[X^2]=0.
\end{align*}
Because $r^2>0$ and $\mathbb P(A_r)\ge 0$, this implies $\mathbb P(X\ge r)=\mathbb P(A_r)=0$. Therefore
\begin{align*}
\mathbb P(X\ge r)=0=\frac{\sigma^2}{\sigma^2+r^2},
\end{align*}
so the desired inequality holds when $\sigma^2=0$.
[/step]
[step:Bound the upper tail after shifting by a positive parameter]
Assume now that $\sigma^2>0$. Fix $a>0$, and define the non-negative real-valued random variable $Y_a:\Omega\to[0,\infty)$ by
\begin{align*}
Y_a(\omega)=(X(\omega)+a)^2
\end{align*}
for every $\omega\in\Omega$.
For the event
\begin{align*}
A_r:=\{\omega\in\Omega:X(\omega)\ge r\},
\end{align*}
we have $X(\omega)+a\ge r+a>0$ for every $\omega\in A_r$, hence
\begin{align*}
(r+a)^2\mathbb 1_{A_r}\le Y_a
\end{align*}
pointwise on $\Omega$. Taking expectations gives
\begin{align*}
(r+a)^2\mathbb P(A_r)
=\mathbb E[(r+a)^2\mathbb 1_{A_r}]
\le \mathbb E[Y_a].
\end{align*}
Since $X\in L^2(\Omega,\mathcal F,\mathbb P)$ and $(X+a)^2\le 2X^2+2a^2$, the random variable $Y_a$ is integrable. Expanding the square and using $\mathbb E[X]=0$ and $\mathbb E[X^2]=\sigma^2$ gives
\begin{align*}
\mathbb E[Y_a]
=\mathbb E[(X+a)^2]
=\mathbb E[X^2]+2a\mathbb E[X]+a^2
=\sigma^2+a^2.
\end{align*}
Dividing by $(r+a)^2>0$, we obtain
\begin{align*}
\mathbb P(X\ge r)\le \frac{\sigma^2+a^2}{(r+a)^2}
\end{align*}
for every $a>0$.
[guided]
The purpose of introducing $a>0$ is to build a non-negative square whose size is forced by the event $X\ge r$. Define $Y_a:\Omega\to[0,\infty)$ by
\begin{align*}
Y_a(\omega)=(X(\omega)+a)^2
\end{align*}
for every $\omega\in\Omega$. This is a valid non-negative random variable because $X$ is measurable and the map $t\mapsto(t+a)^2$ is Borel measurable.
Now define the tail event
\begin{align*}
A_r:=\{\omega\in\Omega:X(\omega)\ge r\}.
\end{align*}
Since $X$ is measurable and $[r,\infty)$ is Borel, $A_r\in\mathcal F$. On this event, $X(\omega)\ge r$, and therefore
\begin{align*}
X(\omega)+a\ge r+a>0.
\end{align*}
Squaring preserves the inequality because both sides are non-negative, so for every $\omega\in A_r$,
\begin{align*}
(X(\omega)+a)^2\ge (r+a)^2.
\end{align*}
For $\omega\notin A_r$, the indicator $\mathbb 1_{A_r}(\omega)$ is $0$, while $Y_a(\omega)\ge 0$. Hence the pointwise inequality on all of $\Omega$ is
\begin{align*}
(r+a)^2\mathbb 1_{A_r}\le Y_a.
\end{align*}
Taking expectations and using monotonicity of expectation gives
\begin{align*}
(r+a)^2\mathbb P(A_r)
=\mathbb E[(r+a)^2\mathbb 1_{A_r}]
\le \mathbb E[Y_a].
\end{align*}
The random variable $Y_a$ is integrable because
\begin{align*}
Y_a=(X+a)^2\le 2X^2+2a^2,
\end{align*}
and $X\in L^2(\Omega,\mathcal F,\mathbb P)$. Expanding the square inside the expectation,
\begin{align*}
\mathbb E[Y_a]
=\mathbb E[(X+a)^2]
=\mathbb E[X^2]+2a\mathbb E[X]+a^2.
\end{align*}
The hypotheses give $\mathbb E[X]=0$ and, since the mean is zero,
\begin{align*}
\operatorname{Var}(X)=\mathbb E[(X-\mathbb E[X])^2]=\mathbb E[X^2]=\sigma^2.
\end{align*}
Therefore
\begin{align*}
\mathbb E[Y_a]=\sigma^2+a^2.
\end{align*}
Because $r+a>0$, division by $(r+a)^2$ is valid, and we conclude that
\begin{align*}
\mathbb P(X\ge r)\le \frac{\sigma^2+a^2}{(r+a)^2}.
\end{align*}
This estimate holds for every $a>0$, so the remaining task is to choose the best shift.
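As an illustrative numerical instance (the values $\sigma^2=1$ and $r=2$ are assumed here purely for concreteness and are not part of the statement), the bound depends noticeably on the shift:
\begin{align*}
a=1:&\quad \frac{1+1}{(2+1)^2}=\frac{2}{9}\approx 0.222,\\
a=\tfrac12:&\quad \frac{1+\tfrac14}{(2+\tfrac12)^2}=\frac{5/4}{25/4}=\frac15=0.2,\\
a=\tfrac14:&\quad \frac{1+\tfrac1{16}}{(2+\tfrac14)^2}=\frac{17/16}{81/16}=\frac{17}{81}\approx 0.210.
\end{align*}
Each of these is a valid upper bound for $\mathbb P(X\ge 2)$, and among these three sample shifts the middle one is smallest.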
[/guided]
[/step]
[step:Minimize the shifted bound over the parameter $a$]
Define the function $\phi:(0,\infty)\to(0,\infty)$ by
\begin{align*}
\phi(a)=\frac{\sigma^2+a^2}{(r+a)^2}
\end{align*}
for every $a\in(0,\infty)$.
The function $\phi$ is differentiable on $(0,\infty)$. By the quotient rule,
\begin{align*}
\phi'(a)=\frac{2a(r+a)^2-2(r+a)(\sigma^2+a^2)}{(r+a)^4}.
\end{align*}
Factoring $2(r+a)$ from the numerator, simplifying $a(r+a)-(\sigma^2+a^2)=ar-\sigma^2$, and using $r+a>0$ gives
\begin{align*}
\phi'(a)=\frac{2(ar-\sigma^2)}{(r+a)^3}.
\end{align*}
Since $r>0$ and $\sigma^2>0$, the unique critical point is
\begin{align*}
a_0:=\frac{\sigma^2}{r}>0.
\end{align*}
Moreover, $\phi'(a)<0$ for $0<a<a_0$ and $\phi'(a)>0$ for $a>a_0$, so $\phi$ attains its minimum on $(0,\infty)$ at $a_0$. Substituting $a_0=\sigma^2/r$ into the bound from the previous step yields
\begin{align*}
\mathbb P(X\ge r)\le \frac{\sigma^2+(\sigma^2/r)^2}{(r+\sigma^2/r)^2}.
\end{align*}
Writing the numerator and the denominator each over the common denominator $r^2$ gives
\begin{align*}
\frac{\sigma^2+(\sigma^2/r)^2}{(r+\sigma^2/r)^2}=\frac{\sigma^2(r^2+\sigma^2)/r^2}{(r^2+\sigma^2)^2/r^2}.
\end{align*}
Cancelling $r^2>0$ and one factor of $r^2+\sigma^2>0$ gives
\begin{align*}
\mathbb P(X\ge r)\le \frac{\sigma^2}{r^2+\sigma^2}.
\end{align*}
This is exactly the claimed inequality.
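As a quick sanity check of the algebra, take the illustrative values $\sigma^2=1$ and $r=2$ (assumed only for this example). The optimal shift is then $a_0=\sigma^2/r=\tfrac12$, and
\begin{align*}
\phi(a_0)=\frac{1+\tfrac14}{(2+\tfrac12)^2}=\frac{5/4}{25/4}=\frac15=\frac{1}{2^2+1}=\frac{\sigma^2}{r^2+\sigma^2},
\end{align*}
in agreement with the closed form.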
[/step]