[proofplan]
By [Componentwise Differentiability](/theorems/324), it suffices to prove the result for scalar-valued [functions](/page/Function) $f: U \to \mathbb{R}$. We decompose the increment $f(a + h) - f(a)$ by varying one coordinate at a time (telescoping along coordinate axes), apply the one-dimensional [Mean Value Theorem](/theorems/186) to each single-variable increment, and use the [continuity](/page/Continuity) of the partial derivatives at $a$ to show the resulting error vanishes.
[/proofplan]
[step:Reduce to the scalar case and decompose the increment along coordinate axes]
By [Componentwise Differentiability](/theorems/324), it suffices to treat $f: U \to \mathbb{R}$. We present the argument for $m = 2$; the general case follows by the same telescoping with $m$ terms.
Let $a = (a_1, a_2)$ and $h = (h_1, h_2)$ with $a + h \in B_r(a)$. Define $\tau \in \mathcal{L}(\mathbb{R}^2, \mathbb{R})$ by $\tau(h) = D_1 f(a) h_1 + D_2 f(a) h_2$. We must show $f(a + h) - f(a) - \tau(h) = o(|h|)$.
Decompose the increment by adding and subtracting an intermediate evaluation:
\begin{align*}
f(a + h) - f(a) = \bigl[f(a_1 + h_1, a_2 + h_2) - f(a_1, a_2 + h_2)\bigr] + \bigl[f(a_1, a_2 + h_2) - f(a_1, a_2)\bigr].
\end{align*}
[/step]
[step:Apply the one-variable Mean Value Theorem to each bracket]
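For the first bracket: if $h_1 = 0$ the bracket is zero and the identity below holds with $\xi_1 = a_1$. Otherwise, every point $(t, a_2 + h_2)$ with $t$ between $a_1$ and $a_1 + h_1$ satisfies $|(t - a_1, h_2)| \leq |h| < r$ and so lies in $B_r(a)$, where all partial [derivatives](/page/Derivative) exist. Hence the [function](/page/Function) $t \mapsto f(t, a_2 + h_2)$ is differentiable on the closed interval with endpoints $a_1$ and $a_1 + h_1$, with derivative $D_1 f(t, a_2 + h_2)$. By the [Mean Value Theorem](/theorems/186), there exists $\xi_1$ between $a_1$ and $a_1 + h_1$ such that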
\begin{align*}
f(a_1 + h_1, a_2 + h_2) - f(a_1, a_2 + h_2) = D_1 f(\xi_1, a_2 + h_2) \cdot h_1.
\end{align*}
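For the second bracket: if $h_2 = 0$ the bracket is zero and the identity below holds with $\xi_2 = a_2$. Otherwise, $t \mapsto f(a_1, t)$ is differentiable on the closed interval with endpoints $a_2$ and $a_2 + h_2$, since the points $(a_1, t)$ on it satisfy $|t - a_2| \leq |h_2| \leq |h| < r$ and so lie in $B_r(a)$. By the Mean Value Theorem, there exists $\xi_2$ between $a_2$ and $a_2 + h_2$ such that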
\begin{align*}
f(a_1, a_2 + h_2) - f(a_1, a_2) = D_2 f(a_1, \xi_2) \cdot h_2.
\end{align*}
[/step]
[step:Estimate the error using continuity of partial derivatives]
Substituting the two Mean Value Theorem identities into the decomposition of the increment and subtracting $\tau(h) = D_1 f(a) h_1 + D_2 f(a) h_2$:
\begin{align*}
f(a + h) - f(a) - \tau(h) = \bigl[D_1 f(\xi_1, a_2 + h_2) - D_1 f(a)\bigr]h_1 + \bigl[D_2 f(a_1, \xi_2) - D_2 f(a)\bigr]h_2.
\end{align*}
For $h \neq \mathbf{0}$, taking absolute values, using $|h_i| \leq |h|$, and dividing by $|h|$:
\begin{align*}
\frac{|f(a + h) - f(a) - \tau(h)|}{|h|} \leq |D_1 f(\xi_1, a_2 + h_2) - D_1 f(a)| + |D_2 f(a_1, \xi_2) - D_2 f(a)|.
\end{align*}
As $h \to \mathbf{0}$, the point $(\xi_1, a_2 + h_2) \to a$ (since $\xi_1$ lies between $a_1$ and $a_1 + h_1$) and $(a_1, \xi_2) \to a$. By [continuity](/page/Continuity) of $D_1 f$ and $D_2 f$ at $a$, both terms on the right tend to $0$. Therefore $f(a + h) - f(a) - \tau(h) = o(|h|)$, confirming $f$ is [differentiable](/page/Derivative) at $a$ with derivative $\tau$.
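As an illustrative check of the estimate, consider the polynomial $f(x, y) = x^2 y$ at $a = (1, 1)$. This function is a hypothetical example chosen for the sketch and is not part of the theorem. Here $D_1 f(x, y) = 2xy$, $D_2 f(x, y) = x^2$ and $\tau(h) = 2h_1 + h_2$, and the intermediate points can be written down explicitly:
\begin{align*}
f(1 + h_1, 1 + h_2) - f(1, 1 + h_2) &= (2h_1 + h_1^2)(1 + h_2) = D_1 f(\xi_1, 1 + h_2) \cdot h_1 \quad \text{with } \xi_1 = 1 + \tfrac{h_1}{2}, \\
f(1, 1 + h_2) - f(1, 1) &= h_2 = D_2 f(1, \xi_2) \cdot h_2 \quad \text{for any } \xi_2, \\
f(a + h) - f(a) - \tau(h) &= h_1 \bigl(h_1 + 2h_2 + h_1 h_2\bigr).
\end{align*}
Since $|h_1| \leq |h|$, the quotient $|f(a + h) - f(a) - \tau(h)| / |h|$ is at most $|h_1 + 2h_2 + h_1 h_2| \leq 3|h| + |h|^2$, which tends to $0$ as $h \to \mathbf{0}$.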
[guided]
The strategy of decomposing along coordinate axes is the standard telescoping trick. Why does it work? Each bracket isolates a single-variable function to which the [Mean Value Theorem](/theorems/186) applies, producing an intermediate point where the partial derivative is evaluated.
The error arises because these intermediate evaluations are at nearby points $(\xi_1, a_2 + h_2)$ and $(a_1, \xi_2)$ rather than at $a$ itself. The [continuity](/page/Continuity) hypothesis on the partial derivatives is consumed precisely here: it guarantees that
\begin{align*}
|D_1 f(\xi_1, a_2 + h_2) - D_1 f(a)| \to 0 \quad \text{and} \quad |D_2 f(a_1, \xi_2) - D_2 f(a)| \to 0
\end{align*}
as $h \to \mathbf{0}$. Without continuity of the partials, the Mean Value Theorem still applies, but the error need not vanish. Indeed, there are functions whose partial derivatives all exist at a point yet which are not differentiable there.
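A standard example of this failure, included here as an illustration rather than as part of the proof, is
\begin{align*}
f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases}
\end{align*}
Because $f(t, 0) = f(0, t) = 0$ for all $t$, both partial derivatives exist at the origin and equal $0$. However, $f(t, t) = \tfrac{1}{2}$ for every $t \neq 0$, so $f$ is not even continuous at the origin and therefore not differentiable there. This is consistent with the theorem: for $y \neq 0$ one computes $D_1 f(0, y) = 1/y$, which is unbounded near the origin, so $D_1 f$ is not continuous at $(0, 0)$.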
The bound $|h_i| \leq |h|$ absorbs the directional components into the normalising factor, yielding the clean estimate
\begin{align*}
\frac{|f(a + h) - f(a) - \tau(h)|}{|h|} \leq |D_1 f(\xi_1, a_2 + h_2) - D_1 f(a)| + |D_2 f(a_1, \xi_2) - D_2 f(a)| \to 0.
\end{align*}
For general $m$, the same argument uses $m$ intermediate points and $m$ applications of the one-dimensional Mean Value Theorem.
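One way to write out the general case is sketched below. The symbols $p^{(k)}$ and $q^{(k)}$ are notation introduced only for this sketch. For $k = 0, 1, \dots, m$ set $p^{(k)} = (a_1, \dots, a_k, a_{k+1} + h_{k+1}, \dots, a_m + h_m)$, so that $p^{(0)} = a + h$ and $p^{(m)} = a$, and consecutive points differ only in the $k$-th coordinate. Then
\begin{align*}
f(a + h) - f(a) &= \sum_{k=1}^{m} \bigl[f(p^{(k-1)}) - f(p^{(k)})\bigr] = \sum_{k=1}^{m} D_k f(q^{(k)}) \cdot h_k, \\
q^{(k)} &= (a_1, \dots, a_{k-1}, \xi_k, a_{k+1} + h_{k+1}, \dots, a_m + h_m),
\end{align*}
where each $\xi_k$ lies between $a_k$ and $a_k + h_k$. Every $q^{(k)}$ satisfies $|q^{(k)} - a| \leq |h|$, so it lies in $B_r(a)$ and tends to $a$ as $h \to \mathbf{0}$. With $\tau(h) = \sum_{k=1}^{m} D_k f(a) h_k$, the same estimate gives
\begin{align*}
\frac{|f(a + h) - f(a) - \tau(h)|}{|h|} \leq \sum_{k=1}^{m} |D_k f(q^{(k)}) - D_k f(a)| \to 0.
\end{align*}
For $m = 2$ this reduces to the two brackets used above, with $q^{(1)} = (\xi_1, a_2 + h_2)$ and $q^{(2)} = (a_1, \xi_2)$.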
[/guided]
[/step]