Differentiability is the point at which multivariable calculus becomes linear algebra. A continuous map carries nearby points to nearby points, but a differentiable map does something more rigid: near a chosen point, its increments are well approximated by a single [linear map](/page/Linear%20Map). This local linear model is what makes the chain rule possible, turns nonlinear equations into tangent-level questions, and connects [continuity](/page/Continuity), [partial derivatives](/page/Partial%20Derivative), the [Jacobian matrix](/page/Jacobian%20Matrix), and the [inverse function theorem](/theorems/51).
The definition is stronger than having directional or partial derivatives. It asks for one linear approximation that works uniformly for all small displacement vectors. That single map is the total derivative, and it is the object that survives under coordinate changes, composition, optimization, and geometric constructions.
[motivation]
In one-variable calculus, differentiability at $a$ means that the graph has a best affine approximation near $a$:
\begin{align*}
f(a+h) = f(a) + f'(a)h + o(|h|).
\end{align*}
For maps $f: U \subset \mathbb{R}^m \to \mathbb{R}^n$, the number $f'(a)$ must be replaced by a linear map from input displacements to output displacements. The definition of differentiability is designed to isolate exactly this first-order behaviour.
The key point is that the error $f(a+h)-f(a)-L(h)$ must be small compared with $|h|$, not merely small in absolute size. If the error is only known to tend to zero as $h\to 0$, the conclusion is just continuity at $a$. If the error still tends to zero after division by $|h|$, the linear part captures the whole first-order motion of the map.
[/motivation]
## Definition
The central question is whether the increment $f(a+h)-f(a)$ has a linear part that explains it to first order. Since $h$ ranges through vectors in $\mathbb{R}^m$, the candidate first-order model must be a linear map on $\mathbb{R}^m$ with values in $\mathbb{R}^n$.
[definition: Differentiable Map]
Let $U \subset \mathbb{R}^m$ be open, let $a \in U$, and let $f: U \to \mathbb{R}^n$ be a function. Write $U-a=\{x-a : x\in U\}$ for the set of admissible displacements. The map $f$ is differentiable at $a$ if there exist a linear map $L \in \mathcal{L}(\mathbb{R}^m, \mathbb{R}^n)$ and a function $r: U-a \to \mathbb{R}^n$ such that $r(0)=0$, $r$ is continuous at $0$, and for every $h \in \mathbb{R}^m$ with $a+h \in U$,
\begin{align*}
f(a+h) = f(a) + L(h) + |h|r(h).
\end{align*}
[/definition]
A reusable calculus needs a name for the unique first-order model at a point. Naming that linear map allows later statements to refer to the first-order approximation without repeating the full remainder condition.
[definition: Total Derivative]
Let $U \subset \mathbb{R}^m$ be open, let $a \in U$, and let $f: U \to \mathbb{R}^n$ be differentiable at $a$. The total derivative of $f$ at $a$ is the unique linear map $Df_a: \mathbb{R}^m \to \mathbb{R}^n$ such that
\begin{align*}
f(a+h)=f(a)+Df_a(h)+o(|h|)
\end{align*}
as $h \to 0$ with $a+h \in U$.
[/definition]
A single-point condition is not enough for calculus rules that must hold as the base point moves. Since compositions, derivative maps, and coordinate formulas all require differentiability at each point where they are applied, we need a domain-level condition rather than repeated pointwise hypotheses.
[definition: Differentiable Map on an Open Set]
Let $U \subset \mathbb{R}^m$ be open and let $f: U \to \mathbb{R}^n$ be a function. The map $f$ is differentiable on $U$ if $f$ is differentiable at every point $a \in U$.
[/definition]
A theory of regularity also needs to compare derivatives at different points. Treating $a\mapsto Df_a$ as a function is the gateway to continuous differentiability, higher derivatives, and estimates that vary across the domain.
[definition: Derivative Map]
Let $U \subset \mathbb{R}^m$ be open and let $f: U \to \mathbb{R}^n$ be differentiable on $U$. The derivative map of $f$ is the function $Df: U \to \mathcal{L}(\mathbb{R}^m, \mathbb{R}^n)$ defined by $a \mapsto Df_a$.
[/definition]
Concrete calculations often require coordinates. In Euclidean space the standard bases convert each total derivative into a rectangular array of partial derivatives, which gives the computational object used in change-of-variables and linearization formulas.
[definition: Jacobian Matrix]
Let $U \subset \mathbb{R}^m$ be open, let $a \in U$, and let $f=(f_1,\ldots,f_n):U\to\mathbb{R}^n$ be differentiable at $a$. The Jacobian matrix of $f$ at $a$ is the matrix $Jf_a \in \mathbb{R}^{n\times m}$ whose entries are
\begin{align*}
(Jf_a)_{ij}=\partial_{x_j}f_i(a), \qquad 1\le i\le n,\quad 1\le j\le m.
\end{align*}
[/definition]
The Jacobian matrix acts on column vectors by matrix multiplication and represents the total derivative in the standard bases: $Df_a(h)=Jf_a h$. The distinction between the linear map $Df_a$ and the matrix $Jf_a$ matters in geometry and functional analysis, where the derivative is intrinsically a linear map and becomes a matrix only after bases have been chosen.
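As a rough numerical illustration of the identity $Df_a(h)=Jf_a h$, the sketch below approximates Jacobian entries by central differences and compares the predicted increment with the actual one. The helper names `numerical_jacobian` and `matvec`, the map `sample_map`, and the step size `eps` are assumptions chosen for illustration; they do not come from the text above.

```python
def numerical_jacobian(f, a, eps=1e-6):
    """Approximate the n-by-m Jacobian of f at a by central differences."""
    m = len(a)
    n = len(f(a))
    jac = [[0.0] * m for _ in range(n)]
    for j in range(m):
        plus, minus = list(a), list(a)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = f(plus), f(minus)
        for i in range(n):
            # Entry (i, j) approximates the partial derivative of f_i in x_j.
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def matvec(matrix, h):
    """Multiply a matrix (list of rows) by a column vector h."""
    return [sum(row[j] * h[j] for j in range(len(h))) for row in matrix]


def sample_map(x):
    # Hypothetical example map R^2 -> R^2, chosen only for illustration.
    return [x[0] * x[1], x[0] + x[1] ** 2]


a = [1.0, 2.0]
J = numerical_jacobian(sample_map, a)  # exact Jacobian at (1, 2) is [[2, 1], [1, 4]]

h = [1e-3, -2e-3]
predicted = matvec(J, h)  # first-order prediction J h
shifted = sample_map([a[0] + h[0], a[1] + h[1]])
actual = [s - b for s, b in zip(shifted, sample_map(a))]  # true increment

print(J)
print(predicted, actual)
```

The predicted and actual increments should agree up to an error of order $|h|^2$, which is what the remainder condition in the definition allows.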
## Equivalent Characterisations
The remainder term definition is precise, but in estimates it is often more convenient to divide by $|h|$ and express the condition as a limiting statement. This formulation makes the phrase "first-order approximation" literal and provides the standard test used in computations.
[quotetheorem:319]
This limit form is often the practical way to verify differentiability: after proposing a candidate linear map, one estimates the error divided by $|h|$ and checks that it tends to zero. It also clarifies the limitation of first-order calculus: differentiability controls only the leading linear part, while all higher-order behaviour is hidden in the vanishing remainder.
Coordinate partial derivatives are often easier to compute than the total derivative, but by themselves they do not automatically control the full multivariable remainder. The key obstruction is whether the coordinate data vary regularly enough near the point to assemble into one linear approximation.
[quotetheorem:327]
The hypothesis is sufficient, not necessary. A function may be differentiable at a point without having partial derivatives continuous near that point. To compare differentiability with weaker line-based tests, we next isolate the [directional derivative](/page/Directional%20Derivative).
[definition: Directional Derivative]
Let $U \subset \mathbb{R}^m$ be open, let $a \in U$, let $v\in\mathbb{R}^m$, and let $f:U\to\mathbb{R}^n$ be a function. The directional derivative of $f$ at $a$ in the direction $v$ is the limit
\begin{align*}
D_v f(a)=\lim_{t\to 0}\frac{f(a+tv)-f(a)}{t},
\end{align*}
provided this limit exists in $\mathbb{R}^n$.
[/definition]
Directional derivatives test the function only along one-dimensional slices, so their values could a priori be unrelated from one direction to another.
The natural question is what extra structure differentiability imposes on these slice derivatives. If a genuine linear approximation exists at $a$, then every directional test must be obtained by feeding the direction vector into that same linear map.
[quotetheorem:326]
The converse fails in general: even having every directional derivative at a point does not guarantee differentiability there. The failure is that line-by-line information need not assemble into one linear approximation.
## Examples
Linear maps are the model case: their first-order approximation is exact, with no remainder. This is the baseline against which nonlinear examples are measured.
[example: Linear Maps]
Let $A\in\mathbb{R}^{n\times m}$ and define $f:\mathbb{R}^m\to\mathbb{R}^n$ by $f(x)=Ax$. Fix $a\in\mathbb{R}^m$ and write $L:\mathbb{R}^m\to\mathbb{R}^n$ for the linear map $L(h)=Ah$. For every $h\in\mathbb{R}^m$,
\begin{align*}
f(a+h)=A(a+h)=Aa+Ah=f(a)+L(h).
\end{align*}
Hence
\begin{align*}
f(a+h)-f(a)-L(h)=0.
\end{align*}
Taking the remainder function $r(h)=0$ for every $h$ gives $r(0)=0$, $r$ is continuous at $0$, and
\begin{align*}
f(a+h)=f(a)+L(h)+|h|r(h).
\end{align*}
Therefore $f$ is differentiable at the arbitrary point $a$, so it is differentiable on all of $\mathbb{R}^m$, with
\begin{align*}
Df_a(h)=Ah.
\end{align*}
Since this formula does not depend on $a$, the derivative map is constant. In the standard bases, the matrix representing the linear map $h\mapsto Ah$ is exactly $A$, so $Jf_a=A$ for every $a\in\mathbb{R}^m$.
[/example]
A nonlinear map can still be differentiable because its leading change is linear after expanding around the base point. Polynomials and elementary smooth functions provide clean computational examples.
[example: A Polynomial and Trigonometric Map]
Let $f:\mathbb{R}^2\to\mathbb{R}^2$ be defined by $f(x_1,x_2)=(f_1(x_1,x_2),f_2(x_1,x_2))$, where $f_1(x_1,x_2)=x_1^2x_2$ and $f_2(x_1,x_2)=x_1+\sin x_2$. Fix $a=(a_1,a_2)$. The first component has partial derivatives
\begin{align*}
\partial_{x_1}f_1(a)=\lim_{t\to 0}\frac{(a_1+t)^2a_2-a_1^2a_2}{t}=\lim_{t\to 0}\frac{2a_1a_2t+a_2t^2}{t}=\lim_{t\to 0}(2a_1a_2+a_2t)=2a_1a_2.
\end{align*}
Also,
\begin{align*}
\partial_{x_2}f_1(a)=\lim_{t\to 0}\frac{a_1^2(a_2+t)-a_1^2a_2}{t}=\lim_{t\to 0}\frac{a_1^2t}{t}=a_1^2.
\end{align*}
For the second component,
\begin{align*}
\partial_{x_1}f_2(a)=\lim_{t\to 0}\frac{(a_1+t+\sin a_2)-(a_1+\sin a_2)}{t}=\lim_{t\to 0}\frac{t}{t}=1.
\end{align*}
Using the one-variable identity $(\sin)'=\cos$,
\begin{align*}
\partial_{x_2}f_2(a)=\cos a_2.
\end{align*}
The functions $(x_1,x_2)\mapsto 2x_1x_2$, $(x_1,x_2)\mapsto x_1^2$, $(x_1,x_2)\mapsto 1$, and $(x_1,x_2)\mapsto \cos x_2$ are continuous, so the partial derivatives are continuous at every point. By *[Continuous Partial Derivatives Imply Differentiability](/theorems/327)*, $f$ is differentiable at every $a\in\mathbb{R}^2$, and its derivative is represented by these four entries. Therefore, for $h=(h_1,h_2)$,
\begin{align*}
Df_a(h)=(2a_1a_2h_1+a_1^2h_2, h_1+\cos(a_2)h_2).
\end{align*}
This computation shows how the total derivative packages the four first-order coordinate rates into one linear map $\mathbb{R}^2\to\mathbb{R}^2$.
[/example]
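The derivative formula from this example can be checked numerically against the limit form of differentiability. The sketch below evaluates the ratio $|f(a+h)-f(a)-Df_a(h)|/|h|$ along one shrinking direction; the base point, the direction, and the function names are assumptions made for illustration.

```python
import math


def f(x1, x2):
    # The map from the example: (x1^2 * x2, x1 + sin(x2)).
    return (x1 ** 2 * x2, x1 + math.sin(x2))


def total_derivative(a1, a2, h1, h2):
    # Df_a(h) as computed in the example.
    return (2 * a1 * a2 * h1 + a1 ** 2 * h2, h1 + math.cos(a2) * h2)


def remainder_ratio(a, h):
    """Return |f(a+h) - f(a) - Df_a(h)| / |h| in the Euclidean norm."""
    fa = f(a[0], a[1])
    fah = f(a[0] + h[0], a[1] + h[1])
    lin = total_derivative(a[0], a[1], h[0], h[1])
    err = math.hypot(fah[0] - fa[0] - lin[0], fah[1] - fa[1] - lin[1])
    return err / math.hypot(h[0], h[1])


a = (1.0, 0.5)          # illustrative base point
direction = (0.6, -0.8)  # illustrative unit direction
scales = (1e-1, 1e-2, 1e-3)
ratios = [remainder_ratio(a, (s * direction[0], s * direction[1])) for s in scales]

for s, ratio in zip(scales, ratios):
    print(s, ratio)
```

The printed ratios should shrink roughly in proportion to the scale, consistent with a remainder of second order in $|h|$. A single direction is only a sanity check; differentiability requires the ratio to tend to zero for all small $h$.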
The next example explains why the definition cannot be replaced by the mere existence of partial derivatives. The coordinate axes can look well behaved while the map has incompatible behaviour along other curves.
[example: Partial Derivatives Without Differentiability]
Define $f:\mathbb{R}^2\to\mathbb{R}$ by $f(0,0)=0$ and, for $(x_1,x_2)\ne(0,0)$, by
\begin{align*}
f(x_1,x_2)=\frac{x_1x_2}{x_1^2+x_2^2}.
\end{align*}
We compute the two partial derivatives at $(0,0)$. For the first coordinate direction, when $t\ne 0$,
\begin{align*}
f(t,0)=\frac{t\cdot 0}{t^2+0^2}=0.
\end{align*}
Hence
\begin{align*}
\partial_{x_1}f(0,0)=\lim_{t\to 0}\frac{f(t,0)-f(0,0)}{t}=\lim_{t\to 0}\frac{0-0}{t}=0.
\end{align*}
For the second coordinate direction, when $t\ne 0$,
\begin{align*}
f(0,t)=\frac{0\cdot t}{0^2+t^2}=0.
\end{align*}
Thus
\begin{align*}
\partial_{x_2}f(0,0)=\lim_{t\to 0}\frac{f(0,t)-f(0,0)}{t}=\lim_{t\to 0}\frac{0-0}{t}=0.
\end{align*}
Now approach $(0,0)$ along the line $x_1=x_2$. For $t\ne 0$,
\begin{align*}
f(t,t)=\frac{t\cdot t}{t^2+t^2}=\frac{t^2}{2t^2}=\frac{1}{2}.
\end{align*}
The points $(t,t)$ tend to $(0,0)$ as $t\to 0$, but the corresponding function values tend to $1/2$, not to $f(0,0)=0$. Therefore $f$ is not continuous at $(0,0)$. By *[Differentiability Implies Continuity](/theorems/322)*, $f$ cannot be differentiable at $(0,0)$. The example shows that coordinate-axis rates alone can exist and still fail to assemble into a genuine first-order approximation.
[/example]
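The computation in this example can be mirrored numerically. The following sketch, with variable names chosen for illustration, evaluates the difference quotients along both axes and the function values along the diagonal.

```python
def f(x1, x2):
    # The function from the example, with f(0, 0) = 0 by definition.
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 * x2 / (x1 ** 2 + x2 ** 2)


steps = [10.0 ** (-k) for k in range(1, 6)]

# Difference quotients along the two coordinate axes at the origin.
axis1_quotients = [(f(t, 0.0) - f(0.0, 0.0)) / t for t in steps]
axis2_quotients = [(f(0.0, t) - f(0.0, 0.0)) / t for t in steps]

# Function values along the diagonal x1 = x2.
diagonal_values = [f(t, t) for t in steps]

print(axis1_quotients)
print(axis2_quotients)
print(diagonal_values)
```

The axis quotients are zero at every step, while the diagonal values stay near $1/2$ however small the step, which is the discontinuity at the origin.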
Even directional derivatives in every direction can be misleading. The issue is again that differentiability requires one linear map controlling all directions at the same scale.
[example: Directional Derivatives Without Differentiability]
Define $f:\mathbb{R}^2\to\mathbb{R}$ by $f(0,0)=0$ and, for $(x_1,x_2)\ne(0,0)$, by
\begin{align*}
f(x_1,x_2)=\frac{x_1^2x_2}{x_1^4+x_2^2}.
\end{align*}
We compute the directional derivative at $(0,0)$ in an arbitrary direction $v=(v_1,v_2)$. If $v=(0,0)$, then $tv=(0,0)$ for every $t$, so
\begin{align*}
\frac{f(tv)-f(0,0)}{t}=\frac{0-0}{t}=0
\end{align*}
for $t\ne 0$, and therefore $D_vf(0,0)=0$.
Now suppose $v\ne(0,0)$. For $t\ne 0$,
\begin{align*}
f(tv)=f(tv_1,tv_2)=\frac{(tv_1)^2(tv_2)}{(tv_1)^4+(tv_2)^2}.
\end{align*}
Expanding each power gives
\begin{align*}
f(tv)=\frac{t^3v_1^2v_2}{t^4v_1^4+t^2v_2^2}.
\end{align*}
Since $t\ne 0$, factor $t^2$ from the denominator:
\begin{align*}
f(tv)=\frac{tv_1^2v_2}{t^2v_1^4+v_2^2}.
\end{align*}
Thus the directional difference quotient is
\begin{align*}
\frac{f(tv)-f(0,0)}{t}=\frac{v_1^2v_2}{t^2v_1^4+v_2^2}.
\end{align*}
If $v_2\ne 0$, then the denominator tends to $v_2^2$, so
\begin{align*}
D_vf(0,0)=\lim_{t\to 0}\frac{v_1^2v_2}{t^2v_1^4+v_2^2}=\frac{v_1^2v_2}{v_2^2}=\frac{v_1^2}{v_2}.
\end{align*}
If $v_2=0$, then $v_1\ne 0$, and for $t\ne 0$,
\begin{align*}
f(tv_1,0)=\frac{(tv_1)^2\cdot 0}{(tv_1)^4+0^2}=0.
\end{align*}
Hence
\begin{align*}
D_vf(0,0)=\lim_{t\to 0}\frac{0-0}{t}=0.
\end{align*}
So every directional derivative at $(0,0)$ exists.
However, along the parabola $x_2=x_1^2$, for $t\ne 0$ we have
\begin{align*}
f(t,t^2)=\frac{t^2t^2}{t^4+(t^2)^2}.
\end{align*}
Since $(t^2)^2=t^4$, this becomes
\begin{align*}
f(t,t^2)=\frac{t^4}{t^4+t^4}.
\end{align*}
Combining the denominator terms gives
\begin{align*}
f(t,t^2)=\frac{t^4}{2t^4}=\frac{1}{2}.
\end{align*}
The points $(t,t^2)$ tend to $(0,0)$ as $t\to 0$, but the corresponding function values tend to $1/2$, while $f(0,0)=0$. Therefore $f$ is not continuous at $(0,0)$. By *[Differentiability Implies Continuity](/theorems/322)*, $f$ is not differentiable at $(0,0)$, even though all directional derivatives there exist. This shows that line-by-line first-order data need not assemble into one total derivative.
[/example]
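A short numerical sketch can make both halves of this example visible. It compares difference quotients with the formula $v_1^2/v_2$ derived in the example, evaluates the function along the parabola, and also checks that the directional derivative formula is not additive in $v$. The chosen directions and function names are assumptions for illustration.

```python
def f(x1, x2):
    # The function from the example, with f(0, 0) = 0 by definition.
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 ** 2 * x2 / (x1 ** 4 + x2 ** 2)


def difference_quotient(v, t):
    """Difference quotient of f at the origin in direction v with step t."""
    return (f(t * v[0], t * v[1]) - f(0.0, 0.0)) / t


def directional_derivative_formula(v):
    # Closed form from the example: v1^2 / v2 if v2 != 0, else 0.
    return 0.0 if v[1] == 0.0 else v[0] ** 2 / v[1]


steps = [10.0 ** (-k) for k in range(1, 6)]

v = (1.0, 2.0)
quotients = [difference_quotient(v, t) for t in steps]

# Values along the parabola x2 = x1^2.
parabola_values = [f(t, t * t) for t in steps]

# A linear map would satisfy D_{u+w} = D_u + D_w; measure the failure.
u, w = (1.0, 1.0), (1.0, 2.0)
u_plus_w = (u[0] + w[0], u[1] + w[1])
additivity_gap = (directional_derivative_formula(u_plus_w)
                  - directional_derivative_formula(u)
                  - directional_derivative_formula(w))

print(quotients, directional_derivative_formula(v))
print(parabola_values)
print(additivity_gap)
```

The nonzero additivity gap gives a second way to see the failure: if $f$ were differentiable at the origin, the map $v\mapsto D_vf(0,0)$ would have to be linear, and $v\mapsto v_1^2/v_2$ is not.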
Differentiability is not only a pointwise regularity condition; it controls local change quantitatively. The next example shows how the derivative predicts increments.
[example: Linear Approximation]
Let $f:\mathbb{R}^2\to\mathbb{R}$ be defined by $f(x_1,x_2)=e^{x_1}\cos x_2$, and let $a=(0,0)$. Since $f(0,0)=e^0\cos 0=1$, the first partial derivative at $a$ is
\begin{align*}
\partial_{x_1}f(0,0)=\lim_{t\to 0}\frac{e^t\cos 0-1}{t}=\lim_{t\to 0}\frac{e^t-1}{t}=1.
\end{align*}
The second partial derivative at $a$ is
\begin{align*}
\partial_{x_2}f(0,0)=\lim_{t\to 0}\frac{e^0\cos t-1}{t}=\lim_{t\to 0}\frac{\cos t-1}{t}=0.
\end{align*}
Thus, once differentiability at $a$ is established, the Jacobian matrix at $a$ is the single row $Jf_a=(1,0)$.
For all $(x_1,x_2)$, the partial derivative formulas are
\begin{align*}
\partial_{x_1}f(x_1,x_2)=e^{x_1}\cos x_2.
\end{align*}
\begin{align*}
\partial_{x_2}f(x_1,x_2)=-e^{x_1}\sin x_2.
\end{align*}
These functions are defined on all of $\mathbb{R}^2$ and continuous at $(0,0)$, so by *[Continuous Partial Derivatives Imply Differentiability](/theorems/327)*, $f$ is differentiable at $(0,0)$ and
\begin{align*}
Df_a(h_1,h_2)=1\cdot h_1+0\cdot h_2=h_1.
\end{align*}
Therefore the first-order expansion is
\begin{align*}
f(h_1,h_2)=f(0,0)+Df_a(h_1,h_2)+o(|h|)=1+h_1+o(|h|).
\end{align*}
The absence of a linear $h_2$ term is also visible from the one-variable Taylor expansion in the $x_2$ direction:
\begin{align*}
f(0,h_2)-f(0,0)=\cos h_2-1=-\frac{h_2^2}{2}+O(|h_2|^4).
\end{align*}
So motion in the $x_2$ direction changes the function only at second order near $(0,0)$, while the first-order change is entirely the $h_1$ term.
[/example]
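The quality of this linear approximation can be inspected numerically. The sketch below compares $f(h_1,h_2)$ with the model $1+h_1$ along an illustrative direction, and separately rescales the pure $x_2$ increment by $h_2^2$ to expose the second-order coefficient. The direction $(s,2s)$ and all function names are assumptions made for this illustration.

```python
import math


def f(x1, x2):
    # The function from the example.
    return math.exp(x1) * math.cos(x2)


def linear_model(h1, h2):
    # First-order expansion at (0, 0) from the example: 1 + h1.
    return 1.0 + h1


def relative_error(h1, h2):
    """Return |f(h) - (1 + h1)| / |h|."""
    return abs(f(h1, h2) - linear_model(h1, h2)) / math.hypot(h1, h2)


scales = [1e-1, 1e-2, 1e-3]

# Error of the linear model along the direction (1, 2), relative to |h|.
errors = [relative_error(s, 2 * s) for s in scales]

# Pure x2 motion: (1 - cos(h2)) / h2^2 should approach 1/2.
second_order_coefficients = [abs(f(0.0, s) - 1.0) / s ** 2 for s in scales]

print(errors)
print(second_order_coefficients)
```

The relative errors should shrink with the scale, while the rescaled $x_2$ increments settle near $1/2$, matching the term $-h_2^2/2$ in the Taylor expansion above.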
## Properties
Before proving rules for differentiable maps, one needs the most basic regularity consequence. A first-order approximation controls the entire increment $f(a+h)-f(a)$, so it forces the function values themselves to approach $f(a)$.
[quotetheorem:322]
This consequence is necessary: a function cannot have a reliable first-order approximation at $a$ while still jumping or oscillating at zeroth order there. It also explains the earlier counterexamples: directional or partial information may exist without continuity, but true differentiability rules out that pathology because the linear term and the small remainder both vanish as $h\to 0$.
The notation $Df_a$ presupposes that the first-order linear part is uniquely determined by $f$ at $a$. This is not automatic, because differentiability is phrased as the existence of some linear approximation rather than by an explicit formula for it. If two different linear maps could both leave errors smaller than first order, later rules such as the chain rule or the algebra rules would depend on a choice of derivative rather than on the function itself. The next result removes that ambiguity by showing that the little-$o$ condition pins down exactly one linear map.
[quotetheorem:320]
Uniqueness makes $Df_a$ part of the data determined by $f$, not an auxiliary choice made during a proof. Once this is known, formulas involving derivatives can be stated unambiguously and compared across coordinates, examples, and later constructions.
To differentiate a composition, we need a rule relating the local linear approximation of the composite to those of its factors. This rule is the structural reason differentiability works well with coordinate changes, substitutions, and nonlinear systems.
[quotetheorem:323]
The theorem says that first-order parts compose exactly as linear maps compose. In coordinates this becomes multiplication of Jacobian matrices, so the derivative of $g\circ f$ is obtained by multiplying the Jacobian of $g$ at $f(a)$ with the Jacobian of $f$ at $a$. This is why the chain rule drives later multivariable calculus: it lets local linear models pass through changes of variables, parametrizations, and nonlinear systems.
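The coordinate form of the chain rule, $J(g\circ f)_a = Jg_{f(a)}\,Jf_a$, can be sanity-checked numerically. In the sketch below, `f` is the polynomial and trigonometric map from the examples, while the outer map `g`, the base point, and the helper names are hypothetical choices made only for illustration. The product of the hand-computed Jacobians is compared with a central-difference Jacobian of the composite.

```python
import math


def f(x):
    # Inner map from the examples: (x1^2 * x2, x1 + sin(x2)).
    return [x[0] ** 2 * x[1], x[0] + math.sin(x[1])]


def jacobian_f(x):
    return [[2 * x[0] * x[1], x[0] ** 2],
            [1.0, math.cos(x[1])]]


def g(y):
    # Hypothetical outer map, chosen only for illustration.
    return [y[0] * y[1], y[0] - y[1]]


def jacobian_g(y):
    return [[y[1], y[0]],
            [1.0, -1.0]]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def numerical_jacobian(func, a, eps=1e-6):
    """Central-difference approximation of the Jacobian of func at a."""
    n, m = len(func(a)), len(a)
    jac = [[0.0] * m for _ in range(n)]
    for j in range(m):
        plus, minus = list(a), list(a)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = func(plus), func(minus)
        for i in range(n):
            jac[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return jac


a = [1.0, 0.5]
chain_rule_jacobian = matmul(jacobian_g(f(a)), jacobian_f(a))
finite_difference_jacobian = numerical_jacobian(lambda x: g(f(x)), a)

max_gap = max(abs(chain_rule_jacobian[i][j] - finite_difference_jacobian[i][j])
              for i in range(2) for j in range(2))
print(max_gap)
```

Note the order of the factors: the Jacobian of $g$ is evaluated at $f(a)$, not at $a$, and it multiplies on the left.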
Calculus also needs stability under addition, scalar multiplication, and multiplication of scalar-valued functions. These closure rules allow differentiability to be checked by building complicated formulas from simpler differentiable pieces.
[quotetheorem:8866]
These rules are the main way differentiable functions are built in practice. Polynomials, rational expressions on their domains, coordinate formulas, and many vector-valued maps are handled by reducing them to constants, projections, products, sums, and scalar multiples rather than by returning to the definition each time.
For a real-valued differentiable function, the derivative at a point is a linear functional on directions, while geometric intuition often asks for a vector that points in the direction of greatest first-order increase. The Euclidean [inner product](/page/Inner%20Product) is what converts that functional into the gradient vector.
[quotetheorem:7908]
The gradient is therefore not a [second derivative](/page/Second%20Derivative) or an extra structure on top of $Df_a$; it is the vector representative of the same linear functional, made possible by the Euclidean inner product. This representation is central in optimization because the sign and size of $\nabla f(a)\cdot h$ describe first-order increase and decrease in each direction.
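To illustrate how $\nabla f(a)\cdot h$ encodes first-order increase, the sketch below scans unit directions for the function $e^{x_1}\cos x_2$ used in the linear approximation example and locates the direction with the largest first-order rate. The base point, the angular grid, and the variable names are assumptions chosen for illustration.

```python
import math


def gradient(a1, a2):
    # Gradient of exp(x1) * cos(x2), from its partial derivative formulas.
    return (math.exp(a1) * math.cos(a2), -math.exp(a1) * math.sin(a2))


a = (0.3, 0.4)  # illustrative base point
grad = gradient(a[0], a[1])
grad_norm = math.hypot(grad[0], grad[1])

# First-order rate of change grad . u over a grid of unit vectors u.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
rates = [grad[0] * math.cos(t) + grad[1] * math.sin(t) for t in angles]

best_index = max(range(len(angles)), key=lambda k: rates[k])
best_direction = (math.cos(angles[best_index]), math.sin(angles[best_index]))
normalized_gradient = (grad[0] / grad_norm, grad[1] / grad_norm)

print(best_direction, normalized_gradient)
print(rates[best_index], grad_norm)
```

Up to the resolution of the grid, the best direction should coincide with the normalized gradient and the best rate with $|\nabla f(a)|$, as the Cauchy-Schwarz inequality predicts.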
Pointwise differentiability can be too weak for the inverse function theorem, [implicit function theorem](/theorems/52), and many stability estimates. A stronger class asks not only for derivatives at each point but for those derivatives to vary continuously across the domain.
[definition: Continuously Differentiable Map]
Let $U\subset\mathbb{R}^m$ be open and let $f:U\to\mathbb{R}^n$ be differentiable on $U$. The map $f$ is continuously differentiable on $U$, written $f\in C^1(U;\mathbb{R}^n)$, if the derivative map $Df:U\to\mathcal{L}(\mathbb{R}^m,\mathbb{R}^n)$ is continuous.
[/definition]
Continuously differentiable maps are the natural setting for the most familiar theorems of multivariable calculus, including the inverse and implicit function theorems.
## Relationship to Other Concepts
Differentiability refines [continuity](/page/Continuity). Continuity remembers only zeroth-order behaviour: whether $f(x)$ approaches $f(a)$ as $x$ approaches $a$. Differentiability remembers the first-order part of that approach and supplies a linear approximation with a controlled error.
It also organizes [partial derivatives](/page/Partial%20Derivative). Partial derivatives are coordinate probes of the total derivative, while the total derivative is the coordinate-free first-order object. The [Jacobian matrix](/page/Jacobian%20Matrix) is the standard-basis representation of that object.
For scalar-valued functions, differentiability connects to the gradient, tangent hyperplanes, and optimization. At an interior local extremum of a differentiable function, the derivative must vanish, which is the multivariable version of Fermat's theorem.
[quotetheorem:7724]
This condition is only first order; it identifies critical points but does not classify them. Classification requires second derivatives, quadratic approximations, and the Hessian.
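A standard illustration, with the function $q$ chosen here only as an example, is $q(x_1,x_2)=x_1^2-x_2^2$. Its partial derivatives $\partial_{x_1}q=2x_1$ and $\partial_{x_2}q=-2x_2$ are continuous and both vanish at the origin, so $Dq_{(0,0)}=0$. Along the two coordinate axes, however, for $t\ne 0$,
\begin{align*}
q(t,0)=t^2>0=q(0,0), \qquad q(0,t)=-t^2<0=q(0,0).
\end{align*}
So the origin is a critical point that is neither a local maximum nor a local minimum, and the vanishing of the derivative alone cannot detect this.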
Differentiability is also the local language of smooth geometry. On a smooth manifold, a smooth map has a differential between tangent spaces, and in coordinates this differential is exactly the total derivative described here. The Euclidean definition is therefore the model for tangent maps in differential geometry.
Finally, differentiability is the entry point for local nonlinear analysis. The inverse function theorem studies when a differentiable map is locally invertible; the implicit function theorem studies when equations $F(x,y)=0$ define $y$ as a differentiable function of $x$; and differential equations use differentiable vector fields to control flows.
[remark: Derivative Versus Jacobian]
The total derivative $Df_a$ is a linear map. The Jacobian matrix $Jf_a$ is its matrix representation in the standard bases. The rank belongs both to the linear map and to any representing matrix. Determinants, traces, and eigenvalues are defined only when the derivative is an endomorphism, so that $Jf_a$ is square.
[/remark]
[remark: Why the Domain Is Open]
The definition uses all sufficiently small displacements $h$ around $a$. Requiring $U$ to be open ensures that points near $a$ in every direction remain available. Differentiability on non-open sets requires a separate relative or extension-based definition.
[/remark]
## References
[Continuity](/page/Continuity).
[Partial Derivative](/page/Partial%20Derivative).
[Jacobian Matrix](/page/Jacobian%20Matrix).
[Linear Map](/page/Linear%20Map).
[Inverse Function Theorem](/theorems/51).
Michael Spivak, *Calculus on Manifolds* (1965).
Walter Rudin, *Principles of Mathematical Analysis* (1976).
Serge Lang, *Undergraduate Analysis* (1983).
Differentiable Map
Also known as: Differentiable mapping, Differentiable function, Differentiable vector-valued map, Frechet differentiable map, Multivariable differentiability