[proofplan]
We first prove the estimate for smooth viscous approximations of the conservation law. Differentiating the viscous equation gives a parabolic equation for the spatial slope, and comparison with an explicit Riccati supersolution gives
\begin{align*}
\partial_x u_\varepsilon(t,x)\leq \frac{1}{\alpha t}.
\end{align*}
We then pass to the entropy solution by the vanishing-viscosity characterization of scalar convex conservation laws. Finally, we translate the distributional derivative bound into the one-sided difference quotient estimate by subtracting the affine function $x \mapsto x/(\alpha t)$ and using the monotone representative of a distribution whose derivative is non-positive.
[/proofplan]
[step:Prove the one-sided slope bound for smooth viscous solutions]
Fix $\varepsilon > 0$. Let $u_{0,\varepsilon}: \mathbb{R} \to I$ be a smooth bounded approximation of the initial data with bounded spatial derivative $\partial_x u_{0,\varepsilon}\in L^\infty(\mathbb R)$. Let $u_\varepsilon: [0,\infty) \times \mathbb{R} \to I$ be the bounded smooth solution of the viscous conservation law
\begin{align*}
\partial_t u_\varepsilon + \partial_x f(u_\varepsilon) = \varepsilon \partial_{xx} u_\varepsilon.
\end{align*}
Because the theorem assumes that $I=[a,b]$ is compact and $u_{0,\varepsilon}$ takes values in $I$, the constants $a$ and $b$ solve the viscous equation and satisfy $a\leq u_{0,\varepsilon}\leq b$, so they serve respectively as a subsolution and a supersolution. The [Scalar Parabolic Maximum Principle](/theorems/5984), applied on finite cylinders and then extended to the whole line by exhaustion, gives $u_\varepsilon(t,x) \in I$ for all $t>0$ and $x \in \mathbb{R}$.
Define the spatial derivative $w_\varepsilon: [0,\infty) \times \mathbb{R} \to \mathbb{R}$ by
\begin{align*}
w_\varepsilon(t,x) = \partial_x u_\varepsilon(t,x).
\end{align*}
Differentiating the viscous equation with respect to $x$ gives
\begin{align*}
\partial_t w_\varepsilon + f'(u_\varepsilon)\partial_x w_\varepsilon + f''(u_\varepsilon)(w_\varepsilon)^2 = \varepsilon \partial_{xx} w_\varepsilon.
\end{align*}
Define $M_\varepsilon:=\|\partial_x u_{0,\varepsilon}\|_{L^\infty(\mathbb R)}\geq0$. If $M_\varepsilon=0$, set $H_\varepsilon:[0,\infty)\to\mathbb R$ by $H_\varepsilon(t)=0$. If $M_\varepsilon>0$, set $H_\varepsilon:[0,\infty)\to\mathbb R$ by
\begin{align*}
H_\varepsilon(t)=\frac{1}{\alpha t+M_\varepsilon^{-1}}.
\end{align*}
Then $H_\varepsilon(0)=M_\varepsilon\geq w_\varepsilon(0,x)$ for all $x\in\mathbb R$, and in both cases $H_\varepsilon\geq0$ and $H_\varepsilon'(t)+\alpha H_\varepsilon(t)^2=0$.
Let $z_\varepsilon: [0,\infty)\times\mathbb R\to\mathbb R$ be defined by $z_\varepsilon(t,x)=w_\varepsilon(t,x)-H_\varepsilon(t)$. On the set where $z_\varepsilon>0$, we have $w_\varepsilon>H_\varepsilon\geq0$. Using $u_\varepsilon(t,x)\in I$ and $f''\geq\alpha$ on $I$, the equation for $w_\varepsilon$ gives
\begin{align*}
\partial_t z_\varepsilon+f'(u_\varepsilon)\partial_x z_\varepsilon-\varepsilon\partial_{xx}z_\varepsilon \leq -\alpha (w_\varepsilon)^2+\alpha H_\varepsilon^2.
\end{align*}
Since $w_\varepsilon>H_\varepsilon\geq0$ on $\{z_\varepsilon>0\}$, the right-hand side equals $-\alpha(w_\varepsilon-H_\varepsilon)(w_\varepsilon+H_\varepsilon)$ and is non-positive there. Fix $T>0$. On the strip $[0,T]\times\mathbb R$, the function $z_\varepsilon$ is bounded and smooth, and wherever it is positive it is a subsolution of the linear parabolic operator with bounded drift coefficient $f'(u_\varepsilon)$ and diffusion coefficient $\varepsilon>0$, so the weak maximum principle applies to its positive part $(z_\varepsilon)^+$. Moreover $z_\varepsilon(0,x)\leq0$, and hence $(z_\varepsilon)^+(0,x)=0$, for every $x\in\mathbb R$. Applying the global-line form of the [Scalar Parabolic Maximum Principle](/theorems/5984), obtained from the finite-cylinder version by a standard barrier argument in which the spatial radius is sent to infinity before the barrier coefficient is sent to zero, gives $(z_\varepsilon)^+(t,x)=0$ for all $t\in[0,T]$ and $x\in\mathbb R$, hence $z_\varepsilon(t,x)\leq0$. Since $T>0$ was arbitrary,
\begin{align*}
w_\varepsilon(t,x)\leq H_\varepsilon(t)\leq \frac{1}{\alpha t}
\end{align*}
for all $t>0$ and $x\in\mathbb R$. Equivalently,
\begin{align*}
\partial_x u_\varepsilon(t,x)\leq \frac{1}{\alpha t}
\end{align*}
for all $t>0$ and $x\in\mathbb R$.
[guided]
The goal is to control positive slopes. The viscous equation is
\begin{align*}
\partial_t u_\varepsilon + \partial_x f(u_\varepsilon) = \varepsilon \partial_{xx} u_\varepsilon.
\end{align*}
Because $u_\varepsilon$ is smooth, all differentiations below are classical. Define $w_\varepsilon: [0,\infty) \times \mathbb{R} \to \mathbb{R}$ by
\begin{align*}
w_\varepsilon(t,x)=\partial_x u_\varepsilon(t,x).
\end{align*}
Differentiating the equation in $x$ and using the chain rule
\begin{align*}
\partial_x f(u_\varepsilon)=f'(u_\varepsilon)\partial_x u_\varepsilon
\end{align*}
gives
\begin{align*}
\partial_t w_\varepsilon + f'(u_\varepsilon)\partial_x w_\varepsilon + f''(u_\varepsilon)(w_\varepsilon)^2 = \varepsilon \partial_{xx} w_\varepsilon.
\end{align*}
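Spelled out term by term, as a routine check of the step above, the two $x$-derivatives are
\begin{align*}
\partial_x\big(f'(u_\varepsilon)\,\partial_x u_\varepsilon\big) &= f''(u_\varepsilon)(\partial_x u_\varepsilon)^2+f'(u_\varepsilon)\,\partial_{xx}u_\varepsilon = f''(u_\varepsilon)(w_\varepsilon)^2+f'(u_\varepsilon)\,\partial_x w_\varepsilon,\\
\partial_x\big(\varepsilon\,\partial_{xx}u_\varepsilon\big) &= \varepsilon\,\partial_{xx}w_\varepsilon,
\end{align*}
and $\partial_x\partial_t u_\varepsilon=\partial_t w_\varepsilon$ because mixed partial derivatives of the smooth function $u_\varepsilon$ commute.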
The nonlinear term $f''(u_\varepsilon)(w_\varepsilon)^2$ is the useful term. It has a damping sign wherever $w_\varepsilon\neq0$, and the convexity assumption supplies the quantitative lower bound $f''(u_\varepsilon)\geq \alpha$. To obtain a bound that does not depend on the initial size of $w_\varepsilon$, we compare $w_\varepsilon$ with an explicit solution of the Riccati ordinary differential equation forced by the convexity term. Define $M_\varepsilon:=\|\partial_xu_{0,\varepsilon}\|_{L^\infty(\mathbb R)}\geq0$. If $M_\varepsilon=0$, define $H_\varepsilon:[0,\infty)\to\mathbb R$ by $H_\varepsilon(t)=0$. If $M_\varepsilon>0$, define $H_\varepsilon:[0,\infty)\to\mathbb R$ by
\begin{align*}
H_\varepsilon(t)=\frac{1}{\alpha t+M_\varepsilon^{-1}}.
\end{align*}
Then $H_\varepsilon(0)=M_\varepsilon\geq w_\varepsilon(0,x)$ for every $x\in\mathbb R$. In both cases, $H_\varepsilon\geq0$ and $H_\varepsilon'(t)+\alpha H_\varepsilon(t)^2=0$.
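As a quick check of these properties in the case $M_\varepsilon>0$, one can differentiate the explicit formula directly:
\begin{align*}
H_\varepsilon'(t) &= -\frac{\alpha}{(\alpha t+M_\varepsilon^{-1})^2} = -\alpha H_\varepsilon(t)^2,\\
H_\varepsilon(0) &= \frac{1}{M_\varepsilon^{-1}} = M_\varepsilon,\\
H_\varepsilon(t) &= \frac{1}{\alpha t+M_\varepsilon^{-1}} < \frac{1}{\alpha t} \quad \text{for } t>0.
\end{align*}
The last inequality uses only $M_\varepsilon^{-1}>0$, which is why the resulting bound $1/(\alpha t)$ does not depend on the size of the initial slope.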
Set $z_\varepsilon(t,x)=w_\varepsilon(t,x)-H_\varepsilon(t)$. We want to prove that $z_\varepsilon\leq0$. On the set where $z_\varepsilon>0$, one has $w_\varepsilon>H_\varepsilon\geq0$. The [Scalar Parabolic Maximum Principle](/theorems/5984) first gives $u_\varepsilon(t,x)\in I$: the theorem statement assumes that $I=[a,b]$ is compact and that the initial data take values in $I$, so the constants $a$ and $b$ serve respectively as a subsolution and a supersolution. Therefore the hypothesis $f''(r)\geq\alpha$ applies to $r=u_\varepsilon(t,x)$. Subtracting the equation for $H_\varepsilon$ from the equation for $w_\varepsilon$ gives, on $\{z_\varepsilon>0\}$,
\begin{align*}
\partial_t z_\varepsilon+f'(u_\varepsilon)\partial_x z_\varepsilon-\varepsilon\partial_{xx}z_\varepsilon \leq -\alpha (w_\varepsilon)^2+\alpha H_\varepsilon^2.
\end{align*}
Since $w_\varepsilon>H_\varepsilon\geq0$, the right-hand side is non-positive. Thus the positive part of $z_\varepsilon$ cannot be created after time $0$.
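The inequality above can be traced through the following computation, written out as a sketch under the hypotheses already stated. Since $H_\varepsilon$ depends only on $t$ and satisfies $H_\varepsilon'=-\alpha H_\varepsilon^2$,
\begin{align*}
\partial_t z_\varepsilon+f'(u_\varepsilon)\partial_x z_\varepsilon-\varepsilon\partial_{xx}z_\varepsilon
&= \big(\partial_t w_\varepsilon+f'(u_\varepsilon)\partial_x w_\varepsilon-\varepsilon\partial_{xx}w_\varepsilon\big)-H_\varepsilon'(t)\\
&= -f''(u_\varepsilon)(w_\varepsilon)^2+\alpha H_\varepsilon^2\\
&\leq -\alpha (w_\varepsilon)^2+\alpha H_\varepsilon^2
= -\alpha\,(w_\varepsilon-H_\varepsilon)(w_\varepsilon+H_\varepsilon).
\end{align*}
On $\{z_\varepsilon>0\}$ both factors $w_\varepsilon-H_\varepsilon$ and $w_\varepsilon+H_\varepsilon$ are positive, so the last expression is negative there.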
To make the argument on the unbounded spatial domain rigorous, fix $T>0$. On $[0,T]\times\mathbb R$, the coefficient $f'(u_\varepsilon)$ is bounded because $u_\varepsilon$ takes values in the compact interval $I$, and $z_\varepsilon$ is bounded because $w_\varepsilon$ is bounded by parabolic regularity from the bounded smooth initial derivative while $H_\varepsilon$ is bounded on $[0,T]$. The global-line version of the [Scalar Parabolic Maximum Principle](/theorems/5984) applies to the bounded classical function $z_\varepsilon$, which is a subsolution wherever it is positive and has initial data $z_\varepsilon(0,x)\leq0$. Alternatively, one may prove this version directly by fixing auxiliary parameters $\eta>0$ and $R>0$ and applying the finite-cylinder principle to $z_\varepsilon-\eta e^{Kt}(1+x^2)$ on $[0,T]\times[-R,R]$, where the constant $K>\sup|f'(u_\varepsilon)|+2\varepsilon$ makes the barrier a strict supersolution of the linear operator and $R$ is taken large enough that the difference is negative on the lateral boundary $|x|=R$; one then sends $R\to\infty$ and afterwards $\eta\downarrow0$. Hence $z_\varepsilon(t,x)\leq0$ for every $t\in[0,T]$ and $x\in\mathbb R$. As $T>0$ is arbitrary,
\begin{align*}
w_\varepsilon(t,x)\leq H_\varepsilon(t)\leq \frac{1}{\alpha t}
\end{align*}
for every $t>0$ and $x\in\mathbb R$. Therefore
\begin{align*}
\partial_xu_\varepsilon(t,x)\leq \frac{1}{\alpha t}.
\end{align*}
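For readers who want the barrier step in the whole-line maximum principle made explicit, here is one possible sketch. The barrier $\psi$ and the constants $\Lambda$ and $K$ are choices made for this illustration and are not taken from the cited theorem. Write $\mathcal P:=\partial_t+f'(u_\varepsilon)\partial_x-\varepsilon\partial_{xx}$, let $\Lambda:=\sup|f'(u_\varepsilon)|$, fix $K>\Lambda+2\varepsilon$, and set $\psi(t,x):=e^{Kt}(1+x^2)$. Using $|2x|\leq1+x^2$ and $1\leq 1+x^2$,
\begin{align*}
\mathcal P\psi
&= e^{Kt}\big(K(1+x^2)+2x\,f'(u_\varepsilon)-2\varepsilon\big)\\
&\geq e^{Kt}\,(K-\Lambda-2\varepsilon)(1+x^2)>0.
\end{align*}
Fix $\eta>0$. If $z_\varepsilon-\eta\psi$ had a positive maximum over $[0,T]\times[-R,R]$ at a point with $t>0$ and $|x|<R$, then $z_\varepsilon>0$ there, so $\mathcal P(z_\varepsilon-\eta\psi)<0$ at that point, whereas the first- and second-order conditions at such a maximum give $\mathcal P(z_\varepsilon-\eta\psi)\geq0$. At $t=0$ the difference is negative because $z_\varepsilon(0,x)\leq0$, and on $|x|=R$ it is at most $\sup z_\varepsilon-\eta(1+R^2)$, which is negative for $R$ large. Hence $z_\varepsilon\leq\eta\psi$ on every such cylinder, and letting $R\to\infty$ and then $\eta\downarrow0$ gives $z_\varepsilon\leq0$ on $[0,T]\times\mathbb R$.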
[/guided]
[/step]
[step:Pass the viscous estimate to the entropy solution]
Let $u_0:\mathbb R\to I$ denote the initial datum from the Cauchy problem in the theorem statement, interpreted as an element of $L^\infty(\mathbb R)$ taking values in $I$ at $\mathcal L^1$-almost every point. Let $(\varepsilon_k)_{k\in\mathbb{N}}$ be a sequence with $\varepsilon_k>0$ and $\varepsilon_k\to 0$. Choose smooth initial data $u_{0,k}:\mathbb{R}\to I$ with bounded derivative, for instance mollifications of $u_0$, such that $u_{0,k}\to u_0$ in $L^1_{\mathrm{loc}}(\mathbb{R})$ and $\|u_{0,k}\|_{L^\infty(\mathbb{R})}\leq \|u_0\|_{L^\infty(\mathbb{R})}$. Let $u_k:[0,\infty)\times\mathbb R\to I$ be the corresponding smooth viscous solution with viscosity coefficient $\varepsilon_k$, so that the slope bound of the previous step applies to each $u_k$. By the vanishing-viscosity characterization of entropy solutions for scalar conservation laws with convex flux, together with the $L^1$-contraction and uniqueness theorem for Kružkov entropy solutions recorded on the [Entropy Solution](/page/Entropy%20Solution) page, the whole sequence $(u_k)$ converges to $u$ in $L^1_{\mathrm{loc}}((0,\infty)\times\mathbb{R})$, with no need to pass to a subsequence. The hypotheses are satisfied because the theorem statement assumes bounded initial data with values in the compact interval $I$, the flux $f\in C^2(\mathbb R)$ is Lipschitz on $I$, the approximating data are uniformly bounded and converge to $u_0$ in $L^1_{\mathrm{loc}}(\mathbb R)$, and the entropy solution is the unique Kružkov semigroup solution with initial datum $u_0$. Here $L^1_{\mathrm{loc}}((0,\infty)\times\mathbb{R})$ denotes the space of functions integrable with respect to $\mathcal{L}^2$ on every compact subset of $(0,\infty)\times\mathbb{R}$.
By [Fubini's theorem](/theorems/2961) applied on compact time-space rectangles and a diagonal subsequence argument, the convergence in $L^1_{\mathrm{loc}}((0,\infty)\times\mathbb{R})$ implies that, after passing to a subsequence not relabelled, $u_k(t,\cdot)\to u(t,\cdot)$ in $L^1_{\mathrm{loc}}(\mathbb{R})$ for $\mathcal{L}^1$-almost every $t>0$. Fix such a time $t>0$. Let $\varphi:\mathbb R\to\mathbb R$ be a non-negative [test function](/page/Test%20Function) with $\varphi\in C_c^\infty(\mathbb{R})$. [Integration by parts](/theorems/210) with respect to $\mathcal{L}^1$ and the smooth estimate give
\begin{align*}
-\int_{\mathbb{R}} u_k(t,x)\varphi'(x)\,d\mathcal{L}^1(x)
\leq \frac{1}{\alpha t}\int_{\mathbb{R}}\varphi(x)\,d\mathcal{L}^1(x).
\end{align*}
Passing to the limit in the left-hand side using local $L^1$ convergence and the boundedness of $\varphi'$ gives
\begin{align*}
-\int_{\mathbb{R}} u(t,x)\varphi'(x)\,d\mathcal{L}^1(x)
\leq \frac{1}{\alpha t}\int_{\mathbb{R}}\varphi(x)\,d\mathcal{L}^1(x).
\end{align*}
Thus
\begin{align*}
\partial_x u(t,\cdot)\leq \frac{1}{\alpha t}
\end{align*}
in the sense of distributions for $\mathcal{L}^1$-almost every $t>0$. Let $t_0>0$ be arbitrary and choose times $t_j\to t_0$ for which the distributional inequality holds. By the strong $L^1_{\mathrm{loc}}$ continuity of the Kružkov entropy-solution semigroup, whose hypotheses are the same bounded-data and $C^2$ flux assumptions just verified, the entropy solution has a representative continuous as a map $(0,\infty)\to L^1_{\mathrm{loc}}(\mathbb{R})$, so $u(t_j,\cdot)\to u(t_0,\cdot)$ in $L^1_{\mathrm{loc}}(\mathbb{R})$. For every non-negative $\varphi\in C_c^\infty(\mathbb{R})$, passing to the limit in
\begin{align*}
-\int_{\mathbb{R}}u(t_j,x)\varphi'(x)\,d\mathcal{L}^1(x)\leq \frac{1}{\alpha t_j}\int_{\mathbb{R}}\varphi(x)\,d\mathcal{L}^1(x)
\end{align*}
gives the same inequality at $t_0$. Since $t_0>0$ was arbitrary, the distributional bound holds for every $t>0$.
[guided]
We now explain why the smooth estimate survives the limit $\varepsilon\downarrow0$. Let $u_0:\mathbb R\to I$ be the bounded initial datum from the theorem statement. Choose smooth functions $u_{0,k}:\mathbb R\to I$ with $u_{0,k}\to u_0$ in $L^1_{\mathrm{loc}}(\mathbb R)$ and with the same uniform $L^\infty$ bound. The vanishing-viscosity characterization of entropy solutions applies because $f\in C^2(\mathbb R)$, the data are uniformly bounded with values in the compact interval $I$, and the theorem statement specifies that $u$ is the Kružkov entropy solution with initial trace $u_0$. The $L^1$-contraction semigroup uniqueness theorem for Kružkov entropy solutions then identifies every vanishing-viscosity limit with this same $u$, so
\begin{align*}
u_k\to u \quad \text{in } L^1_{\mathrm{loc}}((0,\infty)\times\mathbb R).
\end{align*}
By Fubini's theorem on compact time-space rectangles and a diagonal subsequence argument, for $\mathcal L^1$-almost every $t>0$ this convergence gives $u_k(t,\cdot)\to u(t,\cdot)$ in $L^1_{\mathrm{loc}}(\mathbb R)$. Fix such a time and let $\varphi:\mathbb R\to\mathbb R$ be a non-negative test function with $\varphi\in C_c^\infty(\mathbb R)$. The smooth estimate from the previous step says
\begin{align*}
\partial_xu_k(t,x)\leq \frac{1}{\alpha t}.
\end{align*}
Integrating against $\varphi$ and integrating by parts with respect to $\mathcal L^1$ gives
\begin{align*}
-\int_{\mathbb R}u_k(t,x)\varphi'(x)\,d\mathcal L^1(x)\leq \frac{1}{\alpha t}\int_{\mathbb R}\varphi(x)\,d\mathcal L^1(x).
\end{align*}
Because $\varphi'$ is bounded and compactly supported, local $L^1$ convergence passes the left-hand side to the limit. Hence
\begin{align*}
-\int_{\mathbb R}u(t,x)\varphi'(x)\,d\mathcal L^1(x)\leq \frac{1}{\alpha t}\int_{\mathbb R}\varphi(x)\,d\mathcal L^1(x).
\end{align*}
This is exactly the distributional inequality
\begin{align*}
\partial_xu(t,\cdot)\leq \frac{1}{\alpha t}
\end{align*}
at almost every time. To obtain every $t>0$, fix $t_0>0$ and choose good times $t_j\to t_0$. The strong $L^1_{\mathrm{loc}}$ continuity of the Kružkov semigroup gives $u(t_j,\cdot)\to u(t_0,\cdot)$ in $L^1_{\mathrm{loc}}(\mathbb R)$, and the same test-function limit gives the inequality at $t_0$. Thus the distributional estimate holds for every $t>0$.
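The two elementary estimates behind this step can be written out as follows. For the smooth function $u_k(t,\cdot)$ and a non-negative $\varphi\in C_c^\infty(\mathbb R)$, the boundary terms vanish because $\varphi$ has compact support, so
\begin{align*}
-\int_{\mathbb R}u_k(t,x)\varphi'(x)\,d\mathcal L^1(x)
=\int_{\mathbb R}\partial_xu_k(t,x)\,\varphi(x)\,d\mathcal L^1(x)
\leq \frac{1}{\alpha t}\int_{\mathbb R}\varphi(x)\,d\mathcal L^1(x),
\end{align*}
where the inequality uses $\varphi\geq0$. For the passage to the limit,
\begin{align*}
\left|\int_{\mathbb R}\big(u_k(t,x)-u(t,x)\big)\varphi'(x)\,d\mathcal L^1(x)\right|
\leq \|\varphi'\|_{L^\infty(\mathbb R)}\int_{\operatorname{supp}\varphi}|u_k(t,x)-u(t,x)|\,d\mathcal L^1(x)\to0,
\end{align*}
since $\operatorname{supp}\varphi$ is compact. The same estimate, with $u(t_j,\cdot)$ in place of $u_k(t,\cdot)$ and $u(t_0,\cdot)$ in place of $u(t,\cdot)$, handles the limit $t_j\to t_0$, where in addition $1/(\alpha t_j)\to1/(\alpha t_0)$.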
[/guided]
[/step]
[step:Convert the distributional derivative bound into the one-sided representative estimate]
Fix $t>0$ and define the constant
\begin{align*}
C_t := \frac{1}{\alpha t}.
\end{align*}
The distributional inequality
\begin{align*}
\partial_x u(t,\cdot)\leq C_t
\end{align*}
means that for every non-negative $\varphi\in C_c^\infty(\mathbb{R})$,
\begin{align*}
-\int_{\mathbb{R}}u(t,x)\varphi'(x)\,d\mathcal{L}^1(x)
\leq C_t\int_{\mathbb{R}}\varphi(x)\,d\mathcal{L}^1(x).
\end{align*}
Define $v_t: \mathbb{R} \to \mathbb{R}$ by
\begin{align*}
v_t(x)=u(t,x)-C_t x.
\end{align*}
Then $\partial_x v_t\leq 0$ in the sense of distributions, because $\partial_x(C_t x)=C_t$. A non-positive distribution is a non-positive Radon measure, and by the distributional characterization of monotone functions from the theory of [Distributional Derivatives](/page/Distributional%20Derivative), a locally integrable function whose distributional derivative is a non-positive Radon measure has a non-increasing representative. Hence $v_t$ agrees $\mathcal{L}^1$-almost everywhere with a non-increasing function. Consequently there is an $\mathcal{L}^1$-null set $N_t$ such that, whenever $x<y$ and $x,y\notin N_t$,
\begin{align*}
v_t(y)\leq v_t(x).
\end{align*}
Substituting the definition of $v_t$ gives
\begin{align*}
u(t,y)-C_t y \leq u(t,x)-C_t x.
\end{align*}
Therefore
\begin{align*}
u(t,y)-u(t,x)\leq C_t(y-x)=\frac{y-x}{\alpha t}.
\end{align*}
This is the desired Oleinik one-sided estimate.
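The claim $\partial_x v_t\leq0$ used above can be checked directly against test functions. For every non-negative $\varphi\in C_c^\infty(\mathbb R)$, integrating by parts in the affine term gives $\int_{\mathbb R}x\,\varphi'(x)\,d\mathcal L^1(x)=-\int_{\mathbb R}\varphi(x)\,d\mathcal L^1(x)$, so
\begin{align*}
-\int_{\mathbb R}v_t(x)\varphi'(x)\,d\mathcal L^1(x)
&= -\int_{\mathbb R}u(t,x)\varphi'(x)\,d\mathcal L^1(x)+C_t\int_{\mathbb R}x\,\varphi'(x)\,d\mathcal L^1(x)\\
&= -\int_{\mathbb R}u(t,x)\varphi'(x)\,d\mathcal L^1(x)-C_t\int_{\mathbb R}\varphi(x)\,d\mathcal L^1(x)
\leq 0.
\end{align*}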
[guided]
Fix $t>0$ and set $C_t:=1/(\alpha t)$. The distributional estimate proved above means that every non-negative test function $\varphi:\mathbb R\to\mathbb R$ with $\varphi\in C_c^\infty(\mathbb R)$ satisfies
\begin{align*}
-\int_{\mathbb R}u(t,x)\varphi'(x)\,d\mathcal L^1(x)\leq C_t\int_{\mathbb R}\varphi(x)\,d\mathcal L^1(x).
\end{align*}
Define $v_t:\mathbb R\to\mathbb R$ by $v_t(x)=u(t,x)-C_tx$. Subtracting the affine function removes the constant upper bound on the derivative, so $\partial_xv_t\leq0$ in the sense of distributions. The distributional characterization of monotone functions from the theory of [Distributional Derivatives](/page/Distributional%20Derivative) applies because $v_t\in L^1_{\mathrm{loc}}(\mathbb R)$ and its distributional derivative is a non-positive Radon measure. Therefore $v_t$ has a non-increasing representative.
Consequently there is an $\mathcal L^1$-null exceptional set outside which $v_t(y)\leq v_t(x)$ whenever $x<y$. Substituting $v_t(x)=u(t,x)-C_tx$ gives
\begin{align*}
u(t,y)-C_ty\leq u(t,x)-C_tx.
\end{align*}
Rearranging and replacing $C_t$ by $1/(\alpha t)$ yields
\begin{align*}
u(t,y)-u(t,x)\leq \frac{y-x}{\alpha t}.
\end{align*}
This proves the one-sided representative estimate and completes the proof.
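As an illustration that is not part of the proof, consider the Burgers flux $f(r)=r^2/2$, which is an assumption made only for this example. Then $f''\equiv1$, so $\alpha=1$, and one may take $I=[0,1]$. For the Riemann datum $u_0=0$ on $(-\infty,0)$ and $u_0=1$ on $(0,\infty)$, the entropy solution is the rarefaction wave
\begin{align*}
u(t,x)=
\begin{cases}
0, & x\leq0,\\
x/t, & 0<x<t,\\
1, & x\geq t.
\end{cases}
\end{align*}
For $0\leq x<y\leq t$ this gives
\begin{align*}
u(t,y)-u(t,x)=\frac{y-x}{t}=\frac{y-x}{\alpha t},
\end{align*}
so the one-sided estimate holds with equality inside the rarefaction fan. In this example, at least, the constant $1/(\alpha t)$ cannot be improved.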
[/guided]
[/step]