[proofplan]
The [Mean Value Theorem](/theorems/186) has no direct analogue for vector-valued maps (an intermediate point $c$ with $f(b) - f(a) = Df_c(b - a)$ need not exist), so we reduce to the scalar case by projecting onto an arbitrary unit vector $\xi \in \mathbb{R}^n$. Along the parametrised line segment $\gamma(t) = a + t(b - a)$, the scalar function $g(t) = \langle \xi, f(\gamma(t)) \rangle$ satisfies $|g'(t)| \leq M\|b - a\|$ by the [Chain Rule](/theorems/323) and the [Lipschitz bound on linear maps](/theorems/321). The one-dimensional Mean Value Theorem applied to $g$ gives the bound for each projection, and choosing $\xi$ in the direction of $f(b) - f(a)$ recovers the full norm inequality.
[/proofplan]
[step:Reduce to bounding scalar projections along unit vectors]
Define the parametrisation $\gamma: [0, 1] \to U$ by $\gamma(t) = a + t(b - a)$, so $\gamma(0) = a$, $\gamma(1) = b$, and $\gamma([0,1]) = [a, b] \subseteq U$.
[claim:Reduction to scalar projections]
It suffices to show that for every unit vector $\xi \in \mathbb{R}^n$, $|\langle \xi, f(b) - f(a)\rangle| \leq M\|b - a\|$.
[/claim]
[proof]
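If $f(b) = f(a)$, the desired inequality $\|f(b) - f(a)\| \leq M\|b - a\|$ holds trivially because its left-hand side is zero. Otherwise, choose the unit vector $\xi = (f(b) - f(a))/\|f(b) - f(a)\|$. Then $\langle \xi, f(b) - f(a)\rangle = \|f(b) - f(a)\|$, so the scalar bound for this $\xi$ is exactly the desired vector bound.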
[/proof]
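As an illustration with hypothetical values (not taken from the theorem), suppose $f(b) - f(a) = (3, 4) \in \mathbb{R}^2$. The normalised difference attains the norm, while another unit vector such as $e_1 = (1, 0)$ gives a smaller projection:
\begin{align*}
\xi &= \tfrac{1}{5}(3, 4), & \langle \xi, (3, 4)\rangle &= \tfrac{9 + 16}{5} = 5 = \|(3, 4)\|, \\
e_1 &= (1, 0), & \langle e_1, (3, 4)\rangle &= 3 \leq 5.
\end{align*}
So a bound that holds for every unit-vector projection holds in particular for the projection that equals the norm.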
[/step]
[step:Apply the one-dimensional Mean Value Theorem to the projected function]
Fix a unit vector $\xi \in \mathbb{R}^n$ and define $g: [0, 1] \to \mathbb{R}$ by $g(t) = \langle \xi, f(\gamma(t))\rangle$. By the [Chain Rule](/theorems/323), $g$ is [differentiable](/page/Derivative) on $[0,1]$ with
\begin{align*}
g'(t) = \langle \xi, Df_{\gamma(t)}(b - a)\rangle.
\end{align*}
Apply the Cauchy--Schwarz inequality with $\|\xi\| = 1$, then the [Lipschitz bound for linear maps](/theorems/321) and the hypothesis $\|Df_z\| \leq M$ at $z = \gamma(t)$:
\begin{align*}
|g'(t)| \leq \|\xi\| \cdot \|Df_{\gamma(t)}(b - a)\| \leq \|Df_{\gamma(t)}\| \cdot \|b - a\| \leq M\|b - a\|.
\end{align*}
By the one-dimensional [Mean Value Theorem](/theorems/186) applied to $g$ on $[0,1]$, there exists $c \in (0,1)$ with $g(1) - g(0) = g'(c)$. Therefore:
\begin{align*}
|\langle \xi, f(b) - f(a)\rangle| = |g(1) - g(0)| = |g'(c)| \leq M\|b - a\|.
\end{align*}
Since this holds for every unit vector $\xi$, the claim gives $\|f(b) - f(a)\| \leq M\|b - a\|$.
[guided]
Why do we need the projection trick at all? The [Mean Value Theorem](/theorems/186) for scalar functions produces an exact intermediate point $c$ with $g(1) - g(0) = g'(c)$, but for vector-valued functions no such intermediate point need exist.
The classic counterexample is $f(t) = (\cos t, \sin t)$ on $[0, 2\pi]$:
$f(2\pi) - f(0) = \mathbf{0}$, but $f'(t) = (-\sin t, \cos t)$ has norm $1$ for every $t$, so no $c$ satisfies $f(2\pi) - f(0) = 2\pi f'(c)$.
By projecting onto unit vectors $\xi$, we reduce to the scalar function $g(t) = \langle \xi, f(\gamma(t))\rangle$, for which the one-dimensional Mean Value Theorem applies: there exists $c \in (0,1)$ with $g(1) - g(0) = g'(c)$.
The chain of bounds on $|g'(t)|$ uses two estimates.
First, Cauchy--Schwarz:
$|\langle \xi, Df_{\gamma(t)}(b - a)\rangle| \leq \|\xi\| \cdot \|Df_{\gamma(t)}(b - a)\| = \|Df_{\gamma(t)}(b - a)\|$ (since $\|\xi\| = 1$).
Second, the [Lipschitz bound on linear maps](/theorems/321):
$\|Df_{\gamma(t)}(b - a)\| \leq \|Df_{\gamma(t)}\| \cdot \|b - a\| \leq M\|b - a\|$.
The final step chooses $\xi$ optimally: if $f(b) \neq f(a)$, set $\xi = (f(b) - f(a))/\|f(b) - f(a)\|$.
Then $\langle \xi, f(b) - f(a)\rangle = \|f(b) - f(a)\|$.
The scalar bound $|\langle \xi, f(b) - f(a)\rangle| \leq M\|b - a\|$ then becomes $\|f(b) - f(a)\| \leq M\|b - a\|$, completing the proof.
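As a sanity check, the whole argument can be traced on a concrete map. The choices $f(t) = (\cos t, \sin t)$, $a = 0$, $b = \pi$ below are illustrative assumptions, not part of the theorem. Here $\|Df_t\| = \|(-\sin t, \cos t)\| = 1$, so we may take $M = 1$.
\begin{align*}
\gamma(t) &= \pi t, \qquad f(b) - f(a) = (-1, 0) - (1, 0) = (-2, 0), \qquad \xi = (-1, 0), \\
g(t) &= \langle \xi, f(\gamma(t))\rangle = -\cos(\pi t), \qquad g'(t) = \pi \sin(\pi t), \qquad |g'(t)| \leq \pi = M\|b - a\|, \\
g(1) - g(0) &= 2 = g'(c) \quad \text{with } \sin(\pi c) = 2/\pi, \text{ i.e. } c = \tfrac{1}{\pi}\arcsin(2/\pi) \in (0, 1).
\end{align*}
In this example the conclusion reads $\|f(b) - f(a)\| = 2 \leq \pi = M\|b - a\|$, and the intermediate point $c$ exists for the projected function $g$ even though none need exist for $f$ itself.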
[/guided]
[/step]