[proofplan]
We apply Talagrand's convex distance inequality to the median sublevel set $A=\{x\in K:F(x)\le m_F\}$. Convexity makes $A$ a closed convex subset of the product space, and the median condition gives $\mathbb P(X\in A)\ge 1/2$. The main geometric point is that if $F(x)\ge m_F+r$, then the convex distance from $x$ to $A$ is at least $r/L$: the Euclidean projection of $x$ onto $A$ supplies a separating unit vector, and the coordinate intervals having length at most $1$ converts that separation into Talagrand's weighted Hamming distance. Talagrand's inequality then gives the stated Gaussian upper-tail bound, and the case of interval length at most $D$ follows by scaling the coordinates.
[/proofplan]
[step:Define the median sublevel set and Talagrand convex distance]
Let
\begin{align*}
A:=\{x\in K:F(x)\le m_F\}.
\end{align*}
Since $F:K\to\mathbb R$ is convex, the set $A$ is convex. Since $F$ is Lipschitz on $K$, it is continuous on $K$, so $A$ is closed in the compact set $K$. Since $m_F$ is a median of $F(X)$,
\begin{align*}
\mathbb P(X\in A)=\mathbb P(F(X)\le m_F)\ge \frac{1}{2}.
\end{align*}
Let $\mathcal P(K)$ denote the power set of $K$. Define the Talagrand convex distance map used in [Talagrand's Convex Distance Inequality](/theorems/1182) as the map $d_T:K\times\mathcal P(K)\to[0,\infty]$ given by
\begin{align*}
d_T(x,B):=\sup_{\alpha\in[0,\infty)^n,\ |\alpha|=1}\ \inf_{y\in B}\sum_{i=1}^{n}\alpha_i\mathbb 1_{\{x_i\ne y_i\}}.
\end{align*}
Here $B\subset K$ is arbitrary, $|\alpha|$ denotes the Euclidean norm of $\alpha\in\mathbb R^n$, and $\mathbb 1_{\{x_i\ne y_i\}}$ is the indicator of the event that the $i$th coordinates of $x$ and $y$ differ.
[/step]
[step:Show that a large value of $F$ forces large convex distance from the sublevel set]
Since the theorem statement assumes $L>0$, the quotient $r/L$ is well-defined. We prove
\begin{align*}
\{x\in K:F(x)-m_F\ge r\}\subseteq \{x\in K:d_T(x,A)\ge r/L\}.
\end{align*}
Fix $x\in K$ with $F(x)-m_F\ge r$. If $r=0$, then $d_T(x,A)\ge 0=r/L$, so assume $r>0$. Then $x\notin A$.
Because $A$ is a nonempty compact convex subset of $\mathbb R^n$, choose $p\in A$ such that
\begin{align*}
|x-p|=\inf_{y\in A}|x-y|.
\end{align*}
Define $u\in\mathbb R^n$ by
\begin{align*}
u:=\frac{x-p}{|x-p|}.
\end{align*}
For every $y\in A$ and every $t\in[0,1]$, convexity gives $p+t(y-p)\in A$. The function $\varphi_y:[0,1]\to\mathbb R$ defined by $\varphi_y(t):=|x-p-t(y-p)|^2$ has a minimum at $t=0$, so its right derivative at $0$ is nonnegative. Hence $(x-p)\cdot(y-p)\le 0$, and therefore
\begin{align*}
u\cdot(x-y)=\frac{|x-p|^2-(x-p)\cdot(y-p)}{|x-p|}\ge |x-p|
\end{align*}
for every $y\in A$. Since $F$ is $L$-Lipschitz and $F(p)\le m_F$,
\begin{align*}
r\le F(x)-m_F\le F(x)-F(p)\le L|x-p|.
\end{align*}
Hence $|x-p|\ge r/L$.
Define $\alpha\in[0,\infty)^n$ by $\alpha_i:=|u_i|$ for $i\in\{1,\dots,n\}$. Since $|u|=1$, we have $|\alpha|=1$. For every $y\in A$, using that each interval $K_i$ has length at most $1$,
\begin{align*}
\sum_{i=1}^{n}\alpha_i\mathbb 1_{\{x_i\ne y_i\}}
=
\sum_{i=1}^{n}|u_i|\mathbb 1_{\{x_i\ne y_i\}}
\ge
\sum_{i=1}^{n}|u_i||x_i-y_i|
\ge
u\cdot(x-y).
\end{align*}
Therefore, for every $y\in A$,
\begin{align*}
\sum_{i=1}^{n}\alpha_i\mathbb 1_{\{x_i\ne y_i\}}\ge \frac{r}{L}.
\end{align*}
Taking the infimum over $y\in A$ and then the supremum over all admissible weights in the definition of $d_T$ gives
\begin{align*}
d_T(x,A)\ge \frac{r}{L}.
\end{align*}
[guided]
The goal is to turn the analytic statement $F(x)\ge m_F+r$ into a combinatorial-distance statement suitable for Talagrand's inequality. Fix $x\in K$ with $F(x)-m_F\ge r$. If $r=0$, the conclusion $d_T(x,A)\ge 0$ follows from the definition of $d_T$, so assume $r>0$. Then $F(x)>m_F$, while every point of $A$ has $F$-value at most $m_F$, so $x\notin A$.
The set $A$ is nonempty because $\mathbb P(X\in A)\ge 1/2$. It is compact and convex, so there is a point $p\in A$ minimizing Euclidean distance to $x$:
\begin{align*}
|x-p|=\inf_{y\in A}|x-y|.
\end{align*}
Define the unit vector
\begin{align*}
u:=\frac{x-p}{|x-p|}.
\end{align*}
For any $y\in A$ and any $t\in[0,1]$, convexity of $A$ gives $p+t(y-p)\in A$. Since $p$ minimizes the squared distance from $x$ to $A$, define the function $\varphi:[0,1]\to\mathbb R$ by
\begin{align*}
\varphi(t):=|x-p-t(y-p)|^2.
\end{align*}
This function has its minimum at $t=0$. Thus its right derivative at $0$ is nonnegative:
\begin{align*}
0\le \varphi'(0)=-2(x-p)\cdot(y-p).
\end{align*}
Hence
\begin{align*}
(x-p)\cdot(y-p)\le 0.
\end{align*}
Rearranging gives
\begin{align*}
u\cdot(x-y)=\frac{(x-p)\cdot(x-y)}{|x-p|}
=
\frac{|x-p|^2-(x-p)\cdot(y-p)}{|x-p|}
\ge |x-p|.
\end{align*}
Now the Lipschitz hypothesis converts the increase in $F$ into a lower bound on $|x-p|$. Since $p\in A$, we have $F(p)\le m_F$. Therefore
\begin{align*}
r\le F(x)-m_F\le F(x)-F(p)\le L|x-p|.
\end{align*}
Thus
\begin{align*}
|x-p|\ge \frac{r}{L}.
\end{align*}
It remains to connect this Euclidean separation to Talagrand's convex distance. Define the weight vector $\alpha\in[0,\infty)^n$ by
\begin{align*}
\alpha_i:=|u_i|
\end{align*}
for $i\in\{1,\dots,n\}$. Since $u$ is a Euclidean unit vector,
\begin{align*}
|\alpha|^2=\sum_{i=1}^{n}|u_i|^2=|u|^2=1.
\end{align*}
For any $y\in A$, the condition that $K_i$ has length at most $1$ implies $|x_i-y_i|\le 1$ for each coordinate. Hence
\begin{align*}
|x_i-y_i|\le \mathbb 1_{\{x_i\ne y_i\}}
\end{align*}
for every $i\in\{1,\dots,n\}$. Therefore
\begin{align*}
\sum_{i=1}^{n}\alpha_i\mathbb 1_{\{x_i\ne y_i\}}
=
\sum_{i=1}^{n}|u_i|\mathbb 1_{\{x_i\ne y_i\}}
\ge
\sum_{i=1}^{n}|u_i||x_i-y_i|.
\end{align*}
By the triangle inequality in each coordinate,
\begin{align*}
\sum_{i=1}^{n}|u_i||x_i-y_i|
\ge
\sum_{i=1}^{n}u_i(x_i-y_i)
=
u\cdot(x-y).
\end{align*}
Combining this with the projection inequality and the Lipschitz lower bound gives
\begin{align*}
\sum_{i=1}^{n}\alpha_i\mathbb 1_{\{x_i\ne y_i\}}\ge \frac{r}{L}
\end{align*}
for every $y\in A$. Taking the infimum over $y\in A$ gives at least $r/L$ for this particular admissible weight vector $\alpha$, and taking the supremum over all admissible weights can only increase the value. Hence
\begin{align*}
d_T(x,A)\ge \frac{r}{L}.
\end{align*}
[/guided]
[/step]
[step:Apply Talagrand's convex distance inequality to the median sublevel set]
We use [Talagrand's Convex Distance Inequality](/theorems/1182) for product probability spaces in the following form: for every measurable set $B\subset K$ and every $s\ge 0$,
\begin{align*}
\mathbb P(X\in B)\,\mathbb P(d_T(X,B)\ge s)\le \exp\left(-\frac{s^2}{4}\right).
\end{align*}
Here $B$ denotes an arbitrary measurable subset of $K$, $s$ denotes a nonnegative real parameter, and $d_T$ is the convex distance defined above.
Apply this inequality with $B=A$ and $s=r/L$. The set $A$ is Borel measurable because it is closed in $K$, and the coordinates of $X$ are independent by assumption, so the product-space hypothesis is satisfied. Since $\mathbb P(X\in A)\ge 1/2$, Talagrand's inequality gives
\begin{align*}
\mathbb P\left(d_T(X,A)\ge \frac{r}{L}\right)\le 2\exp\left(-\frac{r^2}{4L^2}\right).
\end{align*}
By the inclusion proved in the preceding step,
\begin{align*}
\mathbb P(F(X)-m_F\ge r)\le \mathbb P\left(d_T(X,A)\ge \frac{r}{L}\right).
\end{align*}
For $r=0$, the desired estimate holds because probabilities are at most $1$. For $r>0$, the event $\{F(X)-m_F\ge r\}$ is contained in $\{F(X)>m_F\}$, and the median property gives
\begin{align*}
\mathbb P(F(X)>m_F)\le 1-\mathbb P(F(X)\le m_F)\le \frac{1}{2}.
\end{align*}
Set $a:=r^2/L^2$. If $0<a\le 8\log 2$, then
\begin{align*}
\mathbb P(F(X)-m_F\ge r)\le \frac{1}{2}\le \exp\left(-\frac{a}{8}\right).
\end{align*}
If $a\ge 8\log 2$, then the Talagrand bound yields
\begin{align*}
\mathbb P(F(X)-m_F\ge r)\le 2\exp\left(-\frac{a}{4}\right)\le \exp\left(-\frac{a}{8}\right).
\end{align*}
Thus, with the universal constant $c:=1/8$,
\begin{align*}
\mathbb P(F(X)-m_F\ge r)\le \exp\left(-\frac{c r^2}{L^2}\right).
\end{align*}
[guided]
The geometric work in the previous step produced exactly the input needed for [Talagrand's Convex Distance Inequality](/theorems/1182). We use the following form of that inequality: if $B\subset K$ is a measurable set and $s\ge 0$, then
\begin{align*}
\mathbb P(X\in B)\,\mathbb P(d_T(X,B)\ge s)\le \exp\left(-\frac{s^2}{4}\right).
\end{align*}
The symbols here are auxiliary: $B$ is an arbitrary measurable subset of the product space $K$, and $s$ is a nonnegative real number. The distance $d_T$ is the weighted Hamming convex distance already defined in the proof.
We apply the inequality to the particular set $B=A$ and to the value $s=r/L$. The hypotheses are satisfied: $A$ is measurable because it is closed in $K$, and the law of $X$ is a product probability measure because the coordinates $X_1,\dots,X_n$ are independent. Since $m_F$ is a median, we already know
\begin{align*}
\mathbb P(X\in A)\ge \frac{1}{2}.
\end{align*}
Therefore Talagrand's inequality implies
\begin{align*}
\mathbb P\left(d_T(X,A)\ge \frac{r}{L}\right)\le 2\exp\left(-\frac{r^2}{4L^2}\right).
\end{align*}
Now we use the inclusion from the preceding step. That inclusion says that every point with $F(x)-m_F\ge r$ must have convex distance at least $r/L$ from $A$. Hence
\begin{align*}
\mathbb P(F(X)-m_F\ge r)\le \mathbb P\left(d_T(X,A)\ge \frac{r}{L}\right).
\end{align*}
This gives the correct Gaussian decay for large $r$, but it contains a prefactor $2$. To obtain the theorem's no-prefactor form with a universal constant, we split into small and large deviations.
For $r=0$, the desired bound is immediate because the right-hand side equals $1$ and probabilities are at most $1$. Assume $r>0$ and set
\begin{align*}
a:=\frac{r^2}{L^2}.
\end{align*}
Because $r>0$, the event $\{F(X)-m_F\ge r\}$ is contained in $\{F(X)>m_F\}$. The median condition gives
\begin{align*}
\mathbb P(F(X)>m_F)\le 1-\mathbb P(F(X)\le m_F)\le \frac{1}{2}.
\end{align*}
If $0<a\le 8\log 2$, then
\begin{align*}
\mathbb P(F(X)-m_F\ge r)\le \frac{1}{2}\le \exp\left(-\frac{a}{8}\right).
\end{align*}
If $a\ge 8\log 2$, then the prefactor in the Talagrand estimate is absorbed into the exponential:
\begin{align*}
2\exp\left(-\frac{a}{4}\right)\le \exp\left(-\frac{a}{8}\right).
\end{align*}
Combining the two cases gives
\begin{align*}
\mathbb P(F(X)-m_F\ge r)\le \exp\left(-\frac{r^2}{8L^2}\right).
\end{align*}
Thus the theorem holds with the universal constant $c:=1/8$.
[/guided]
[/step]
[step:Reduce intervals of length at most $D$ to the unit-length case]
Assume now that every $K_i$ has length at most $D$, where $D>0$. The unit-length case has been proved with the universal constant $c=1/8$, so the scaled conclusion will have the same constant after replacing $L$ by $DL$. For each $i\in\{1,\dots,n\}$, write $K_i=[a_i,b_i]$ with $b_i-a_i\le D$, and define
\begin{align*}
\widetilde K_i:=\left\{\frac{x_i-a_i}{D}:x_i\in K_i\right\}\subset\mathbb R.
\end{align*}
Then $\widetilde K_i$ is a compact interval of length at most $1$. Set
\begin{align*}
\widetilde K:=\widetilde K_1\times\cdots\times\widetilde K_n.
\end{align*}
Define the affine map $T:\widetilde K\to K$ by
\begin{align*}
T(z):=(a_1+Dz_1,\dots,a_n+Dz_n).
\end{align*}
Let $(\Omega,\mathcal F,\mathbb P)$ denote the probability space on which $X$ is defined. Define the random vector $\widetilde X:\Omega\to\widetilde K$ by
\begin{align*}
\widetilde X:=T^{-1}(X).
\end{align*}
The coordinates of $\widetilde X$ remain independent because each coordinate is obtained from the corresponding coordinate of $X$ by a measurable one-dimensional affine transformation.
Define the function $\widetilde F:\widetilde K\to\mathbb R$ by
\begin{align*}
\widetilde F(z):=F(Tz).
\end{align*}
The function $\widetilde F$ is convex because it is the composition of the convex function $F$ with an affine map. For $z,w\in\widetilde K$,
\begin{align*}
|\widetilde F(z)-\widetilde F(w)|
=
|F(Tz)-F(Tw)|
\le
L|Tz-Tw|
=
DL|z-w|.
\end{align*}
Thus $\widetilde F$ is $DL$-Lipschitz. Since $\widetilde F(\widetilde X)=F(X)$, the same number $m_F$ is a median of $\widetilde F(\widetilde X)$. Applying the unit-length case to $\widetilde F$ and $\widetilde X$ gives
\begin{align*}
\mathbb P(F(X)-m_F\ge r)=\mathbb P(\widetilde F(\widetilde X)-m_F\ge r)
\end{align*}
\begin{align*}
\mathbb P(\widetilde F(\widetilde X)-m_F\ge r)\le \exp\left(-\frac{c r^2}{D^2L^2}\right).
\end{align*}
This is the claimed replacement of $L$ by $DL$ with the same universal constant $c$ from the unit-length case.
[/step]
[step:Explain why no lower-tail estimate follows for a general convex function]
The proof used convexity of $F$ to make the sublevel set $\{x\in K:F(x)\le m_F\}$ convex. A lower-tail estimate for $F$ would be an upper-tail estimate for $-F$, but the preceding argument applies to $-F$ only when $-F$ is convex on $K$. Therefore no lower-tail estimate is implied for a general convex $F$ by this theorem alone.
[/step]