[proofplan]
We construct the subgradient $r$ in three stages. First, we show that every convex function $f: \mathbb{R}^n \to \mathbb{R}$ possesses one-sided directional derivatives in every direction and that the resulting map $d \mapsto g(d)$ is sublinear (positively homogeneous and subadditive). Second, we apply the [Hahn-Banach Theorem](/theorems/879) to the sublinear functional $g$ to obtain a linear functional $L: \mathbb{R}^n \to \mathbb{R}$ satisfying $L(d) \leq g(d)$ for all $d \in \mathbb{R}^n$. Third, we identify $L$ with the inner product against a vector $r \in \mathbb{R}^n$ and verify that $r$ satisfies the supporting hyperplane inequality.
[/proofplan]
[step:Establish existence and monotonicity of the difference quotient for convex functions]
Fix $x \in \mathbb{R}^n$ and a direction $d \in \mathbb{R}^n$. For $t > 0$, define the difference quotient
\begin{align*}
\varphi: (0, \infty) &\to \mathbb{R} \\
t &\mapsto \frac{f(x + td) - f(x)}{t}.
\end{align*}
We claim that $\varphi$ is nondecreasing. Let $0 < s < t$. Write $x + sd$ as a convex combination of $x$ and $x + td$:
\begin{align*}
x + sd = \frac{t - s}{t} \cdot x + \frac{s}{t} \cdot (x + td).
\end{align*}
Since $0 < s < t$, the coefficient $\lambda := s/t$ satisfies $0 < \lambda < 1$. By convexity of $f$,
\begin{align*}
f(x + sd) \leq (1 - \lambda) f(x) + \lambda f(x + td) = f(x) + \frac{s}{t}\bigl(f(x + td) - f(x)\bigr).
\end{align*}
Subtracting $f(x)$ and dividing by $s > 0$:
\begin{align*}
\frac{f(x + sd) - f(x)}{s} \leq \frac{f(x + td) - f(x)}{t},
\end{align*}
which gives $\varphi(s) \leq \varphi(t)$. Hence $\varphi$ is nondecreasing on $(0, \infty)$.
[guided]
The goal of this step is to understand the behaviour of the one-sided difference quotient $t \mapsto \frac{f(x+td) - f(x)}{t}$ as $t \searrow 0$. For the limit to exist, we need some monotonicity or boundedness property.
Fix $x \in \mathbb{R}^n$ and a direction $d \in \mathbb{R}^n$. Define the difference quotient
\begin{align*}
\varphi: (0, \infty) &\to \mathbb{R} \\
t &\mapsto \frac{f(x + td) - f(x)}{t}.
\end{align*}
Why should $\varphi$ be monotone? The key observation is that the point $x + sd$ (for $0 < s < t$) lies on the line segment joining $x$ and $x + td$, so convexity provides a bound on $f(x + sd)$. Explicitly, we write
\begin{align*}
x + sd = \frac{t - s}{t} \cdot x + \frac{s}{t} \cdot (x + td),
\end{align*}
which is a convex combination because the coefficient $\lambda := s/t$ satisfies $0 < \lambda < 1$. By convexity of $f$,
\begin{align*}
f(x + sd) &\leq (1 - \lambda) f(x) + \lambda f(x + td) \\
&= f(x) + \frac{s}{t}\bigl(f(x + td) - f(x)\bigr).
\end{align*}
Subtracting $f(x)$ from both sides and dividing by $s > 0$ (which preserves the inequality direction since $s > 0$):
\begin{align*}
\frac{f(x + sd) - f(x)}{s} \leq \frac{f(x + td) - f(x)}{t}.
\end{align*}
This is precisely $\varphi(s) \leq \varphi(t)$, so $\varphi$ is nondecreasing on $(0, \infty)$.
This monotonicity is the engine of the entire proof: it simultaneously guarantees that the limit as $t \searrow 0$ exists and provides the subadditivity of the limiting object.
[/guided]
[/step]
[step:Prove that the one-sided directional derivative exists and is finite]
Since $\varphi$ is nondecreasing, the limit
\begin{align*}
g(d) := \lim_{t \to 0^+} \frac{f(x + td) - f(x)}{t} = \inf_{t > 0} \frac{f(x + td) - f(x)}{t}
\end{align*}
exists in $\mathbb{R} \cup \{-\infty\}$. We verify that $g(d) > -\infty$. Write $x$ as the midpoint convex combination
\begin{align*}
x = \frac{1}{2}(x + td) + \frac{1}{2}(x - td)
\end{align*}
for any $t > 0$. By convexity of $f$,
\begin{align*}
f(x) \leq \frac{1}{2} f(x + td) + \frac{1}{2} f(x - td).
\end{align*}
Rearranging:
\begin{align*}
f(x + td) - f(x) \geq f(x) - f(x - td) = -(f(x - td) - f(x)).
\end{align*}
Dividing by $t > 0$:
\begin{align*}
\frac{f(x + td) - f(x)}{t} \geq -\frac{f(x - td) - f(x)}{t} = -\varphi_{-d}(t),
\end{align*}
where $\varphi_{-d}(t) := \frac{f(x - td) - f(x)}{t}$ is the difference quotient in the direction $-d$. Since $\varphi_{-d}$ is also nondecreasing (by the result of the previous step applied to the direction $-d$), for any fixed $t_0 > 0$ and all $0 < t \leq t_0$ we have $\varphi_{-d}(t) \leq \varphi_{-d}(t_0)$. Therefore
\begin{align*}
\varphi(t) = \frac{f(x + td) - f(x)}{t} \geq -\varphi_{-d}(t_0) = -\frac{f(x - t_0 d) - f(x)}{t_0} > -\infty.
\end{align*}
Since $\varphi$ is bounded below by a finite constant and nondecreasing, $g(d) = \inf_{t > 0} \varphi(t)$ is a finite real number.
[guided]
We established in the previous step that $\varphi(t) = \frac{f(x+td) - f(x)}{t}$ is nondecreasing. A nondecreasing function that is bounded below has a finite infimum, so the limit $g(d) = \lim_{t \to 0^+} \varphi(t) = \inf_{t > 0} \varphi(t)$ exists — provided we can rule out $g(d) = -\infty$.
Why might $g(d)$ be $-\infty$? If $\varphi(t) \to -\infty$ as $t \to 0^+$, the infimum would be $-\infty$. But $\varphi$ is nondecreasing, so this would require $\varphi(t) = -\infty$ for all $t > 0$, contradicting the fact that $f$ is real-valued. We give a cleaner quantitative bound using the reverse direction.
Write $x$ as the midpoint
\begin{align*}
x = \frac{1}{2}(x + td) + \frac{1}{2}(x - td).
\end{align*}
Convexity gives $f(x) \leq \frac{1}{2} f(x + td) + \frac{1}{2} f(x - td)$, which rearranges to
\begin{align*}
f(x + td) - f(x) \geq -(f(x - td) - f(x)).
\end{align*}
Dividing by $t > 0$ and using the notation $\varphi_{-d}(t) = \frac{f(x - td) - f(x)}{t}$ for the difference quotient in the direction $-d$:
\begin{align*}
\varphi(t) \geq -\varphi_{-d}(t).
\end{align*}
The function $\varphi_{-d}$ is also nondecreasing (the monotonicity argument applies to every direction, including $-d$). Fixing any $t_0 > 0$, for all $0 < t \leq t_0$ we have $\varphi_{-d}(t) \leq \varphi_{-d}(t_0)$, hence
\begin{align*}
\varphi(t) \geq -\varphi_{-d}(t_0) = -\frac{f(x - t_0 d) - f(x)}{t_0}.
\end{align*}
The right-hand side is a fixed real number (since $f$ maps into $\mathbb{R}$), so $g(d) = \inf_{t > 0} \varphi(t) \geq -\varphi_{-d}(t_0) > -\infty$.
The conclusion: for every direction $d \in \mathbb{R}^n$, the one-sided directional derivative
\begin{align*}
g(d) = \lim_{t \to 0^+} \frac{f(x + td) - f(x)}{t}
\end{align*}
exists as a finite real number.
[/guided]
[/step]
[step:Show that the directional derivative $g$ is sublinear: positively homogeneous and subadditive]
We verify two properties of the map $g: \mathbb{R}^n \to \mathbb{R}$.
**Positive homogeneity.** For $\lambda > 0$ and $d \in \mathbb{R}^n$,
\begin{align*}
g(\lambda d) = \lim_{t \to 0^+} \frac{f(x + t\lambda d) - f(x)}{t} = \lambda \lim_{s \to 0^+} \frac{f(x + sd) - f(x)}{s} = \lambda \, g(d),
\end{align*}
where we substituted $s = t\lambda$ (so $s \to 0^+$ as $t \to 0^+$ since $\lambda > 0$) and used $t = s/\lambda$.
**Subadditivity.** Let $d_1, d_2 \in \mathbb{R}^n$. For $t > 0$, write
\begin{align*}
x + t(d_1 + d_2) = \frac{1}{2}(x + 2td_1) + \frac{1}{2}(x + 2td_2).
\end{align*}
By convexity of $f$,
\begin{align*}
f(x + t(d_1 + d_2)) \leq \frac{1}{2} f(x + 2td_1) + \frac{1}{2} f(x + 2td_2).
\end{align*}
Subtracting $f(x) = \frac{1}{2} f(x) + \frac{1}{2} f(x)$ and dividing by $t > 0$:
\begin{align*}
\frac{f(x + t(d_1 + d_2)) - f(x)}{t} &\leq \frac{1}{2} \cdot \frac{f(x + 2td_1) - f(x)}{t} + \frac{1}{2} \cdot \frac{f(x + 2td_2) - f(x)}{t} \\
&= \frac{f(x + 2td_1) - f(x)}{2t} + \frac{f(x + 2td_2) - f(x)}{2t}.
\end{align*}
Taking $t \to 0^+$ (with $s := 2t \to 0^+$):
\begin{align*}
g(d_1 + d_2) \leq g(d_1) + g(d_2).
\end{align*}
Since $g$ is positively homogeneous and subadditive, $g$ is a **sublinear functional** on $\mathbb{R}^n$.
[guided]
We need to verify that the directional derivative map $g: \mathbb{R}^n \to \mathbb{R}$ is sublinear, meaning it is both positively homogeneous and subadditive. This is the key structural property that allows us to apply the Hahn-Banach Theorem in the next step.
**Positive homogeneity: $g(\lambda d) = \lambda \, g(d)$ for $\lambda > 0$.**
This is a direct consequence of reparametrising the limit. For $\lambda > 0$,
\begin{align*}
g(\lambda d) = \lim_{t \to 0^+} \frac{f(x + t\lambda d) - f(x)}{t}.
\end{align*}
Substituting $s = t\lambda$ (so $t = s/\lambda$ and $s \to 0^+$ as $t \to 0^+$):
\begin{align*}
g(\lambda d) = \lim_{s \to 0^+} \frac{f(x + sd) - f(x)}{s/\lambda} = \lambda \lim_{s \to 0^+} \frac{f(x + sd) - f(x)}{s} = \lambda \, g(d).
\end{align*}
**Subadditivity: $g(d_1 + d_2) \leq g(d_1) + g(d_2)$.**
The idea is to express $x + t(d_1 + d_2)$ as a convex combination that splits the two directions. Observe
\begin{align*}
x + t(d_1 + d_2) = \frac{1}{2}(x + 2td_1) + \frac{1}{2}(x + 2td_2).
\end{align*}
Why this particular combination? Because it is a midpoint, so the convexity inequality has equal weights $1/2$, and the factor $2t$ in each term is chosen so that after dividing by $t$ we recover the difference quotients $\frac{f(x + 2td_i) - f(x)}{2t}$, each of which has the form $\varphi_{d_i}(2t)$ and converges to $g(d_i)$.
By convexity of $f$,
\begin{align*}
f(x + t(d_1 + d_2)) \leq \frac{1}{2} f(x + 2td_1) + \frac{1}{2} f(x + 2td_2).
\end{align*}
Subtracting $f(x)$ from both sides and dividing by $t > 0$:
\begin{align*}
\frac{f(x + t(d_1 + d_2)) - f(x)}{t} &\leq \frac{f(x + 2td_1) - f(x)}{2t} + \frac{f(x + 2td_2) - f(x)}{2t}.
\end{align*}
Taking $t \to 0^+$ on both sides (setting $s = 2t \to 0^+$), the left side converges to $g(d_1 + d_2)$ and each term on the right converges to $g(d_i)$:
\begin{align*}
g(d_1 + d_2) \leq g(d_1) + g(d_2).
\end{align*}
Since $g$ satisfies both positive homogeneity and subadditivity, $g$ is a sublinear functional on the vector space $\mathbb{R}^n$. This is precisely the kind of functional that dominates linear functionals in the [Hahn-Banach Theorem](/theorems/879).
[/guided]
[/step]
[step:Apply the Hahn-Banach Theorem to obtain a linear functional dominated by $g$]
Consider the zero subspace $W = \{0\} \subset \mathbb{R}^n$ and define the linear functional
\begin{align*}
\ell: W &\to \mathbb{R} \\
0 &\mapsto 0.
\end{align*}
Since $g(0) = \lim_{t \to 0^+} \frac{f(x) - f(x)}{t} = 0$, we have $\ell(0) = 0 = g(0)$, so $\ell(w) \leq g(w)$ for all $w \in W$.
The functional $g: \mathbb{R}^n \to \mathbb{R}$ is positively homogeneous and subadditive (established in the previous step). We verify the hypotheses of the [Hahn-Banach Theorem](/theorems/879): $\mathbb{R}^n$ is a real vector space, $W = \{0\}$ is a subspace, $g$ is positively homogeneous and subadditive, $\ell$ is linear on $W$, and $\ell(w) \leq g(w)$ for all $w \in W$ (the single condition $0 \leq 0$ holds). All hypotheses are satisfied.
By the [Hahn-Banach Theorem](/theorems/879) applied with $V = \mathbb{R}^n$, $W = \{0\}$, and $p = g$, there exists a linear extension
\begin{align*}
L: \mathbb{R}^n \to \mathbb{R}
\end{align*}
such that
\begin{align*}
L(d) \leq g(d) \quad \text{for all } d \in \mathbb{R}^n.
\end{align*}
[guided]
We now apply the [Hahn-Banach Theorem](/theorems/879) to promote the sublinear functional $g$ into a linear functional. The [Hahn-Banach Theorem](/theorems/879) requires four inputs:
1. A real vector space $V$.
2. A subspace $W \subset V$.
3. A sublinear functional $p: V \to \mathbb{R}$ (positively homogeneous and subadditive).
4. A linear functional $\ell: W \to \mathbb{R}$ with $\ell(w) \leq p(w)$ for all $w \in W$.
We choose: $V = \mathbb{R}^n$, $W = \{0\}$, $p = g$, and $\ell: \{0\} \to \mathbb{R}$ the zero map.
We verify all hypotheses:
- $\mathbb{R}^n$ is a real vector space.
- $\{0\}$ is a subspace of $\mathbb{R}^n$.
- $g$ is positively homogeneous: $g(\lambda d) = \lambda g(d)$ for all $\lambda > 0$, $d \in \mathbb{R}^n$ (shown in the previous step).
- $g$ is subadditive: $g(d_1 + d_2) \leq g(d_1) + g(d_2)$ for all $d_1, d_2 \in \mathbb{R}^n$ (shown in the previous step).
- $\ell(0) = 0 \leq g(0) = 0$, so the domination condition holds on $W$.
The [Hahn-Banach Theorem](/theorems/879) produces a linear functional $L: \mathbb{R}^n \to \mathbb{R}$ extending $\ell$ (which is vacuous since $W = \{0\}$) and satisfying
\begin{align*}
L(d) \leq g(d) \quad \text{for all } d \in \mathbb{R}^n.
\end{align*}
One might ask: why start from the trivial subspace $\{0\}$ rather than from a more informative subspace? In $\mathbb{R}^n$, the Hahn-Banach extension from $\{0\}$ is sufficient because any linear functional on a finite-dimensional space is automatically continuous, and the extension is guaranteed to exist. Starting from $\{0\}$ keeps the argument clean and avoids the need to construct $\ell$ on a nontrivial subspace.
[/guided]
[/step]
[step:Represent the linear functional $L$ as the inner product with a vector $r \in \mathbb{R}^n$]
Since $L: \mathbb{R}^n \to \mathbb{R}$ is a linear functional on a finite-dimensional space, it is determined by its action on the standard basis. Define $r \in \mathbb{R}^n$ by
\begin{align*}
r_i := L(e_i) \quad \text{for } i = 1, \ldots, n,
\end{align*}
where $e_1, \ldots, e_n$ denotes the standard basis of $\mathbb{R}^n$. By linearity of $L$, for any $d = \sum_{i=1}^n d_i \, e_i \in \mathbb{R}^n$,
\begin{align*}
L(d) = \sum_{i=1}^n d_i \, L(e_i) = \sum_{i=1}^n r_i \, d_i = r \cdot d.
\end{align*}
The domination condition $L(d) \leq g(d)$ therefore reads
\begin{align*}
r \cdot d \leq g(d) = \lim_{t \to 0^+} \frac{f(x + td) - f(x)}{t} \quad \text{for all } d \in \mathbb{R}^n.
\end{align*}
[guided]
Every linear functional on $\mathbb{R}^n$ can be represented as an inner product — this is a consequence of the finite-dimensional Riesz representation (or, more elementarily, of the fact that linear maps from $\mathbb{R}^n$ to $\mathbb{R}$ are determined by their action on a basis).
Define the vector $r = (r_1, \ldots, r_n) \in \mathbb{R}^n$ by $r_i := L(e_i)$, where $e_1, \ldots, e_n$ is the standard basis of $\mathbb{R}^n$. For any $d = \sum_{i=1}^n d_i \, e_i$, linearity of $L$ gives
\begin{align*}
L(d) = L\Bigl(\sum_{i=1}^n d_i \, e_i\Bigr) = \sum_{i=1}^n d_i \, L(e_i) = \sum_{i=1}^n r_i \, d_i = r \cdot d.
\end{align*}
Substituting into the Hahn-Banach conclusion $L(d) \leq g(d)$:
\begin{align*}
r \cdot d \leq g(d) \quad \text{for all } d \in \mathbb{R}^n.
\end{align*}
This is the vector $r$ that will serve as the subgradient. The remaining task is to promote the infinitesimal inequality $r \cdot d \leq g(d)$ into the global inequality $f(y) \geq f(x) + r \cdot (y - x)$.
[/guided]
[/step]
[step:Verify the supporting hyperplane inequality for all $y \in \mathbb{R}^n$]
Fix an arbitrary $y \in \mathbb{R}^n$ and set $d := y - x$. By the monotonicity established in the first step, $\varphi(t) = \frac{f(x + td) - f(x)}{t}$ is nondecreasing on $(0, \infty)$. Evaluating at $t = 1$:
\begin{align*}
g(d) = \inf_{t > 0} \frac{f(x + td) - f(x)}{t} \leq \frac{f(x + d) - f(x)}{1} = f(y) - f(x).
\end{align*}
Combining with the domination inequality $r \cdot d \leq g(d)$:
\begin{align*}
r \cdot (y - x) = r \cdot d \leq g(d) \leq f(y) - f(x).
\end{align*}
Rearranging:
\begin{align*}
f(y) \geq f(x) + r \cdot (y - x).
\end{align*}
Since $y \in \mathbb{R}^n$ was arbitrary, this completes the proof. $\blacksquare$
[guided]
We now close the argument by converting the infinitesimal bound $r \cdot d \leq g(d)$ into the global supporting hyperplane inequality. Fix any $y \in \mathbb{R}^n$ and set $d := y - x$.
From the first step, the difference quotient $\varphi(t) = \frac{f(x + td) - f(x)}{t}$ is nondecreasing. The directional derivative $g(d)$ is defined as the infimum of $\varphi$ over $t > 0$. Evaluating $\varphi$ at the specific value $t = 1$:
\begin{align*}
g(d) = \inf_{t > 0} \varphi(t) \leq \varphi(1) = f(x + d) - f(x) = f(y) - f(x).
\end{align*}
This inequality is the crucial passage from infinitesimal (directional derivative) to global (function values): it uses the fact that the infimum of a nondecreasing function is at most any particular value, and $t = 1$ gives precisely $f(y) - f(x)$.
Chaining the two inequalities:
\begin{align*}
r \cdot (y - x) \leq g(y - x) \leq f(y) - f(x).
\end{align*}
The first inequality is the Hahn-Banach domination, and the second is the monotonicity estimate. Rearranging yields
\begin{align*}
f(y) \geq f(x) + r \cdot (y - x),
\end{align*}
which is the desired supporting hyperplane inequality. Since $y \in \mathbb{R}^n$ was arbitrary and $x \in \mathbb{R}^n$ was fixed at the outset, the proof is complete. $\blacksquare$
[/guided]
[/step]