A function of several variables can fail to have an ordinary one-dimensional derivative for the simple reason that there is no single direction in which to move. At a point $a \in \mathbb{R}^m$, the expression $f(a+h)-f(a)$ depends on the whole vector $h$, and different approaches to $a$ can see different behaviour. Partial derivatives isolate the most basic probes: move parallel to one coordinate axis, freeze all other variables, and measure the resulting one-dimensional rate of change.
This coordinate-axis viewpoint is deliberately modest. It does not try to describe the full [derivative](/page/Derivative) at once. Instead, it asks for the response of the function to each coordinate input separately. That makes partial derivatives computationally accessible and central in calculus, differential equations, optimization, complex analysis, and differential geometry.
The price of this accessibility is that partial derivatives can mislead. A function may have every partial derivative at a point and still fail to be continuous there. Even when all first partial derivatives exist, they need not combine into a linear approximation. The chapter therefore treats partial derivatives as local coordinate tests: powerful and necessary for differentiability, but not by themselves sufficient for it.
[example: Axis Tests Can Miss Two-Dimensional Behaviour]
Let $f:\mathbb{R}^2\to\mathbb{R}$ satisfy $f(0,0)=0$ and, for $(x,y)\ne(0,0)$,
\begin{align*}
f(x,y)=\frac{xy}{x^2+y^2}.
\end{align*}
We compute the two coordinate-axis difference quotients at the origin. For $t\ne0$,
\begin{align*}
f(t,0)=\frac{t\cdot 0}{t^2+0^2}=0.
\end{align*}
Hence
\begin{align*}
\frac{f(t,0)-f(0,0)}{t}=\frac{0-0}{t}=0.
\end{align*}
Taking $t\to0$ gives $\partial_x f(0,0)=0$. Similarly, for $t\ne0$,
\begin{align*}
f(0,t)=\frac{0\cdot t}{0^2+t^2}=0.
\end{align*}
Therefore
\begin{align*}
\frac{f(0,t)-f(0,0)}{t}=\frac{0-0}{t}=0,
\end{align*}
so $\partial_y f(0,0)=0$.
The diagonal approach gives different behaviour. For $x\ne0$,
\begin{align*}
f(x,x)=\frac{x\cdot x}{x^2+x^2}.
\end{align*}
Since $x^2+x^2=2x^2$ and $x^2\ne0$,
\begin{align*}
f(x,x)=\frac{x^2}{2x^2}=\frac{1}{2}.
\end{align*}
Thus along the path $(x,x)\to(0,0)$ the function values stay equal to $1/2$, while $f(0,0)=0$. The coordinate-axis partial derivatives both exist and vanish, but the function is not continuous at the origin.
[/example]
This example gives the main warning for the chapter. Partial derivatives are not a replacement for the total derivative; they are the coordinate pieces from which differentiability may sometimes be reconstructed, provided additional regularity is present.
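The warning can also be checked numerically. The following Python sketch evaluates the axis difference quotients and the diagonal values of the function from the example. The helper name `f` and the sample step sizes are illustrative choices, not part of the original text.

```python
def f(x, y):
    """The function from the example: xy / (x^2 + y^2), with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

steps = [10.0 ** (-k) for k in range(1, 6)]  # illustrative step sizes

# Difference quotients along the two coordinate axes at the origin.
x_quotients = [(f(t, 0.0) - f(0.0, 0.0)) / t for t in steps]
y_quotients = [(f(0.0, t) - f(0.0, 0.0)) / t for t in steps]

# Function values along the diagonal path (t, t) -> (0, 0).
diagonal_values = [f(t, t) for t in steps]

print(x_quotients)      # every axis quotient is zero
print(y_quotients)      # every axis quotient is zero
print(diagonal_values)  # every value equals 1/2 up to rounding
```

The axis quotients carry no trace of the jump that the diagonal values reveal, which is the sense in which axis tests can miss two-dimensional behaviour.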
## Definition
The most direct way to define a partial derivative is to reduce the multivariable function to a one-variable function by holding all but one coordinate fixed. The openness of the domain matters because we need to move slightly in both positive and negative coordinate directions while remaining inside the domain.
[definition: Partial Derivative]
Let $U \subset \mathbb{R}^m$ be open, let $f: U \to \mathbb{R}^n$ be a function, let $a=(a_1,\ldots,a_m) \in U$, and let $i \in \{1,\ldots,m\}$. The $i$-th partial derivative of $f$ at $a$ is
\begin{align*}
\partial_{x_i} f(a) := \lim_{t \to 0} \frac{f(a+t e_i)-f(a)}{t},
\end{align*}
provided this limit exists in $\mathbb{R}^n$, where $e_i$ is the $i$-th standard basis vector of $\mathbb{R}^m$.
[/definition]
The notation emphasizes the coordinate $x_i$ rather than an abstract direction. In the definition-family graph, this page is a child of [Derivative](/page/Derivative): the partial derivative is the [directional derivative](/page/Directional%20Derivative) in the coordinate direction $e_i$, while the total derivative, when it exists, is a [linear map](/page/Linear%20Map) controlling all small vectors $h \in \mathbb{R}^m$ at once.
[remark: Scalar and Vector-Valued Cases]
If $n=1$, then $\partial_{x_i} f(a)$ is a real number. If $f=(f_1,\ldots,f_n):U\to\mathbb{R}^n$, then
\begin{align*}
\partial_{x_i} f(a)=\bigl(\partial_{x_i} f_1(a),\ldots,\partial_{x_i} f_n(a)\bigr)
\end{align*}
whenever all component partial derivatives exist.
[/remark]
For functions on the plane, the notation often records the visible coordinate names. This is the form used in elementary calculus, and it is the same definition with $m=2$.
[example: Partial Derivatives in Two Variables]
Let $U=\mathbb{R}^2$ and let $f:U\to\mathbb{R}$ be given by
\begin{align*}
f(x,y)=x^2y+\sin(xy).
\end{align*}
Fix $(x,y)\in\mathbb{R}^2$. For the $x$-partial derivative, the difference quotient is
\begin{align*}
\frac{f(x+t,y)-f(x,y)}{t}=\frac{(x+t)^2y+\sin((x+t)y)-x^2y-\sin(xy)}{t}.
\end{align*}
Expanding $(x+t)^2=x^2+2xt+t^2$ gives
\begin{align*}
\frac{(x+t)^2y-x^2y}{t}=\frac{(x^2+2xt+t^2)y-x^2y}{t}.
\end{align*}
The numerator is $x^2y+2xty+t^2y-x^2y=2xty+t^2y$, so
\begin{align*}
\frac{(x+t)^2y-x^2y}{t}=2xy+ty.
\end{align*}
For the sine term, write $(x+t)y=xy+ty$. Since $\frac{d}{ds}\sin s=\cos s$ and $s(t)=xy+ty$ has derivative $y$, the one-variable chain rule gives
\begin{align*}
\lim_{t\to0}\frac{\sin((x+t)y)-\sin(xy)}{t}=y\cos(xy).
\end{align*}
Therefore
\begin{align*}
\partial_x f(x,y)=\lim_{t\to0}\left(2xy+ty+\frac{\sin((x+t)y)-\sin(xy)}{t}\right)=2xy+y\cos(xy).
\end{align*}
For the $y$-partial derivative, the difference quotient is
\begin{align*}
\frac{f(x,y+t)-f(x,y)}{t}=\frac{x^2(y+t)+\sin(x(y+t))-x^2y-\sin(xy)}{t}.
\end{align*}
The polynomial part satisfies
\begin{align*}
\frac{x^2(y+t)-x^2y}{t}=\frac{x^2y+x^2t-x^2y}{t}=x^2.
\end{align*}
For the sine term, write $x(y+t)=xy+xt$. Since $\frac{d}{ds}\sin s=\cos s$ and $s(t)=xy+xt$ has derivative $x$, the one-variable chain rule gives
\begin{align*}
\lim_{t\to0}\frac{\sin(x(y+t))-\sin(xy)}{t}=x\cos(xy).
\end{align*}
Thus
\begin{align*}
\partial_y f(x,y)=x^2+x\cos(xy).
\end{align*}
The two formulas are different because the first coordinate probe changes $x$ while keeping $y$ fixed, and the second changes $y$ while keeping $x$ fixed.
[/example]
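The two closed formulas can be compared against a numerical estimate. The sketch below is a minimal illustration, assuming a hypothetical helper `partial` that uses a symmetric difference quotient rather than the one-sided form of the definition; the two agree in the limit whenever the partial derivative exists. The sample point and step size are arbitrary.

```python
import math

def partial(f, a, i, h=1e-6):
    """Symmetric-difference estimate of the i-th partial derivative of f at a.

    f takes a list of floats; i is a zero-based coordinate index.
    The step h is an illustrative choice, not a tuned value.
    """
    forward = list(a)
    backward = list(a)
    forward[i] += h
    backward[i] -= h
    return (f(forward) - f(backward)) / (2.0 * h)

def f(p):
    x, y = p
    return x * x * y + math.sin(x * y)

point = [0.7, -1.3]  # arbitrary sample point
x, y = point

# Closed formulas derived in the example.
exact_dx = 2.0 * x * y + y * math.cos(x * y)
exact_dy = x * x + x * math.cos(x * y)

print(partial(f, point, 0), exact_dx)
print(partial(f, point, 1), exact_dy)
```

Each printed pair should agree to several decimal places. The estimate only varies one coordinate at a time, mirroring the frozen-variable computation above.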
A partial derivative at a single point is useful, but analysis often needs the derivative as a function of the point. That leads to the partial derivative function, which is defined when the pointwise limit exists at every point of the domain.
[definition: Partial Derivative Function]
Let $U \subset \mathbb{R}^m$ be open and let $f:U\to\mathbb{R}^n$ be a function. If $\partial_{x_i}f(a)$ exists for every $a\in U$, the $i$-th partial derivative function is the map $\partial_{x_i}f:U\to\mathbb{R}^n$ given by $a\mapsto \partial_{x_i}f(a)$.
[/definition]
Once partial derivatives are functions, we can ask whether they are continuous, integrable, bounded, or differentiable again. These extra properties are what allow coordinate calculations to support theorems about full differentiability, Taylor expansions, and PDE.
## Coordinate Directions and the Total Derivative
### Jacobian Data
The total derivative asks for a linear map that approximates every small displacement. Partial derivatives ask only what happens on the coordinate axes. The relationship is essential: differentiability forces the partial derivatives to exist, and the partial derivatives are the columns of the Jacobian matrix.
Before connecting axis derivatives to full differentiability, we need a single object that stores every coordinate response. The Jacobian matrix is the bookkeeping device that makes the possible linear approximation visible.
[definition: Jacobian Matrix]
Let $U\subset\mathbb{R}^m$ be open, let $f=(f_1,\ldots,f_n):U\to\mathbb{R}^n$, and let $a\in U$. If each partial derivative $\partial_{x_j}f_i(a)$ exists, the Jacobian matrix of $f$ at $a$ is the matrix $Jf_a\in\mathbb{R}^{n\times m}$ with entries
\begin{align*}
(Jf_a)_{ij}=\partial_{x_j}f_i(a).
\end{align*}
[/definition]
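As a rough illustration of the index convention, the following Python sketch assembles an approximate Jacobian column by column from symmetric difference quotients. The function name `jacobian`, the test map `g`, and the step size are assumptions made for this example only.

```python
import math

def jacobian(f, a, h=1e-6):
    """Finite-difference Jacobian of f at a, as a list of n rows of length m.

    Entry [i][j] approximates the partial derivative of component f_i
    with respect to x_j, matching the index convention (Jf_a)_{ij}.
    """
    m = len(a)
    n = len(f(a))
    rows = [[0.0] * m for _ in range(n)]
    for j in range(m):
        forward = list(a)
        backward = list(a)
        forward[j] += h
        backward[j] -= h
        f_plus = f(forward)
        f_minus = f(backward)
        for i in range(n):
            rows[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return rows

def g(p):
    """Hypothetical test map from R^2 to R^3."""
    x, y = p
    return [x * y, math.sin(x), x + y * y]

J = jacobian(g, [1.0, 2.0])
for row in J:
    print(row)
```

For this test map the exact Jacobian at $(1,2)$ has rows $(2,1)$, $(\cos 1,0)$, and $(1,4)$, so the output has $n=3$ rows and $m=2$ columns, with column $j$ holding the $j$-th partial derivative of the whole map.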
The Jacobian matrix records all coordinate-axis rates in one object. The next question is whether this recorded coordinate data really comes from a total derivative. Differentiability gives the strongest answer: the entries in the Jacobian are not merely separate slopes but the standard-coordinate representation of the same linear map.
[quotetheorem:7904]
This theorem gives a necessary condition for differentiability, not a sufficient one. The partial derivatives at $a$ determine a candidate linear map, namely the one whose matrix is $Jf_a$, but the function may still fail to be approximated by that map away from the coordinate axes.
### Failure of Axis Data Alone
The next example returns to the opening warning in the language of differentiability. Its purpose is to separate two statements that are often confused: existence of the coordinate slopes, and existence of a genuine linear approximation.
[example: All Partial Derivatives Exist but No Total Derivative]
Let $f:\mathbb{R}^2\to\mathbb{R}$ satisfy $f(0,0)=0$ and, for $(x,y)\ne(0,0)$,
\begin{align*}
f(x,y)=\frac{xy}{x^2+y^2}.
\end{align*}
We first compute the two coordinate-axis partial derivatives at the origin from the defining difference quotients. For $t\ne0$,
\begin{align*}
f(t,0)=\frac{t\cdot 0}{t^2+0^2}=0.
\end{align*}
Hence
\begin{align*}
\frac{f(t,0)-f(0,0)}{t}=\frac{0-0}{t}=0.
\end{align*}
Since the constant quotient $0$ has limit $0$ as $t\to0$, we get $\partial_x f(0,0)=0$. Similarly, for $t\ne0$,
\begin{align*}
f(0,t)=\frac{0\cdot t}{0^2+t^2}=0.
\end{align*}
Therefore
\begin{align*}
\frac{f(0,t)-f(0,0)}{t}=\frac{0-0}{t}=0,
\end{align*}
so $\partial_y f(0,0)=0$.
Now examine the diagonal path $y=x$. For $x\ne0$,
\begin{align*}
f(x,x)=\frac{x\cdot x}{x^2+x^2}.
\end{align*}
Because $x\cdot x=x^2$ and $x^2+x^2=2x^2$, this becomes
\begin{align*}
f(x,x)=\frac{x^2}{2x^2}.
\end{align*}
Since $x\ne0$ implies $x^2\ne0$, cancellation gives
\begin{align*}
f(x,x)=\frac{1}{2}.
\end{align*}
Thus along the path $(x,x)\to(0,0)$ the function values stay equal to $1/2$, while $f(0,0)=0$, so $f$ is not continuous at the origin. If $f$ were differentiable at the origin, then by the definition of differentiability there would be a linear map $L:\mathbb{R}^2\to\mathbb{R}$ and a remainder $r(h)$ with
\begin{align*}
f(h)-f(0,0)=L(h)+r(h)
\end{align*}
and $r(h)/\lVert h\rVert\to0$ as $h\to0$. Since $L$ is linear on finite-dimensional Euclidean space, $L(h)\to0$ as $h\to0$, and also $r(h)=\lVert h\rVert\bigl(r(h)/\lVert h\rVert\bigr)\to0$. This would force $f(h)\to f(0,0)$, contradicting the diagonal calculation. Therefore both partial derivatives exist and vanish at the origin, but the total derivative does not exist there.
[/example]
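The failure can also be seen directly in the remainder. Since both partial derivatives vanish at the origin, the only candidate linear map is $L=0$, and the remainder is then $r(h)=f(h)$. The sketch below, with the hypothetical helper `remainder_ratio`, evaluates $|r(h)|/\lVert h\rVert$ along the diagonal for a few illustrative step sizes.

```python
import math

def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def remainder_ratio(hx, hy):
    """|f(h) - f(0,0) - L(h)| / ||h|| for the candidate map L = 0."""
    return abs(f(hx, hy) - f(0.0, 0.0)) / math.hypot(hx, hy)

ratios = [remainder_ratio(t, t) for t in (1e-1, 1e-2, 1e-3, 1e-4)]
print(ratios)  # grows as t shrinks instead of tending to 0
```

Along the diagonal the ratio equals $1/(2\sqrt{2}\,|t|)$, so it is unbounded as $t\to0$, which is incompatible with the requirement $r(h)/\lVert h\rVert\to0$.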
The previous failure shows that pointwise existence of partial derivatives is too weak. A standard repair is to require the partial derivatives to exist near the point and vary continuously. Then the coordinate data do assemble into a full linear approximation.
[quotetheorem:327]
This result is often the bridge between computational calculus and the abstract derivative. The hypotheses ask for more than axis slopes at one point: they require stable coordinate slopes nearby. Stability is what controls movement in directions that are not coordinate axes.
## First-Order Calculus Rules
### Algebraic Rules
Partial derivatives inherit their elementary rules from one-variable differentiation because each calculation freezes all other variables. The important point is to state the rules with domains and codomains in place, so that each formula has a precise meaning.
The first package of rules concerns linear combinations and products. These rules are needed whenever partial derivatives are used to differentiate expressions built from simpler functions, and each formula assumes the relevant partial derivatives exist at the point under discussion.
[quotetheorem:7905]
These formulas explain why partial derivative computations look familiar. The variable being differentiated changes, but the algebra is the algebra of one-variable difference quotients along the line $a+t e_i$.
[example: Product Rule with a Frozen Variable]
Let $f:\mathbb{R}^2\to\mathbb{R}$ be given by
\begin{align*}
f(x,y)=e^x\cos y.
\end{align*}
Fix $(x,y)\in\mathbb{R}^2$. For the $x$-partial derivative, the difference quotient is
\begin{align*}
\frac{f(x+t,y)-f(x,y)}{t}=\frac{e^{x+t}\cos y-e^x\cos y}{t}.
\end{align*}
Using $e^{x+t}=e^x e^t$, this becomes
\begin{align*}
\frac{f(x+t,y)-f(x,y)}{t}=e^x\cos y\frac{e^t-1}{t}.
\end{align*}
Since $\lim_{t\to0}(e^t-1)/t=1$, we obtain
\begin{align*}
\partial_x f(x,y)=e^x\cos y.
\end{align*}
For the $y$-partial derivative, the difference quotient is
\begin{align*}
\frac{f(x,y+t)-f(x,y)}{t}=\frac{e^x\cos(y+t)-e^x\cos y}{t}.
\end{align*}
Factoring out $e^x$ gives
\begin{align*}
\frac{f(x,y+t)-f(x,y)}{t}=e^x\frac{\cos(y+t)-\cos y}{t}.
\end{align*}
Since $\frac{d}{ds}\cos s=-\sin s$, the last quotient tends to $-\sin y$ as $t\to0$. Therefore
\begin{align*}
\partial_y f(x,y)=-e^x\sin y.
\end{align*}
The coordinate being varied determines which one-variable derivative is taken; the other coordinate is held fixed and behaves as a constant factor throughout that calculation.
[/example]
### Chain Rule
Products and sums keep the same input variables, but compositions can change the coordinate system before the outer function is evaluated. A coordinate change in the input can affect every coordinate of an intermediate function, so the partial derivative of a composition must account for all intermediate coordinate directions.
[quotetheorem:7906]
In matrix language, this is the statement that the Jacobian matrix of a composition is a product of Jacobian matrices. The displayed coordinate formula is often more informative when solving PDE or changing variables.
[example: Chain Rule in Polar Coordinates]
Let $F:\mathbb{R}^2\to\mathbb{R}$ be differentiable, and define $u:(0,\infty)\times\mathbb{R}\to\mathbb{R}$ by
\begin{align*}
u(r,\theta)=F(r\cos\theta,r\sin\theta).
\end{align*}
Write $g(r,\theta)=(g_1(r,\theta),g_2(r,\theta))=(r\cos\theta,r\sin\theta)$, so $u=F\circ g$. Set
\begin{align*}
x=g_1(r,\theta)=r\cos\theta,\qquad y=g_2(r,\theta)=r\sin\theta.
\end{align*}
For the $r$-partial derivative, the coordinate derivatives of $g$ are
\begin{align*}
\partial_r g_1(r,\theta)=\partial_r(r\cos\theta)=\cos\theta
\end{align*}
because $\cos\theta$ is fixed while $r$ varies, and
\begin{align*}
\partial_r g_2(r,\theta)=\partial_r(r\sin\theta)=\sin\theta
\end{align*}
because $\sin\theta$ is fixed while $r$ varies. By the *[Chain Rule for Partial Derivatives](/theorems/7906)* applied to $u=F\circ g$,
\begin{align*}
\partial_r u(r,\theta)=(\partial_x F)(g(r,\theta))\,\partial_r g_1(r,\theta)+(\partial_y F)(g(r,\theta))\,\partial_r g_2(r,\theta).
\end{align*}
Substituting $g(r,\theta)=(x,y)$ and the two computed derivatives gives
\begin{align*}
\partial_r u(r,\theta)=(\partial_x F)(x,y)\cos\theta+(\partial_y F)(x,y)\sin\theta.
\end{align*}
For the $\theta$-partial derivative, $r$ is fixed. Since $\frac{d}{d\theta}\cos\theta=-\sin\theta$,
\begin{align*}
\partial_\theta g_1(r,\theta)=\partial_\theta(r\cos\theta)=-r\sin\theta.
\end{align*}
Since $\frac{d}{d\theta}\sin\theta=\cos\theta$,
\begin{align*}
\partial_\theta g_2(r,\theta)=\partial_\theta(r\sin\theta)=r\cos\theta.
\end{align*}
Applying the same chain rule in the $\theta$ direction gives
\begin{align*}
\partial_\theta u(r,\theta)=(\partial_x F)(g(r,\theta))\,\partial_\theta g_1(r,\theta)+(\partial_y F)(g(r,\theta))\,\partial_\theta g_2(r,\theta).
\end{align*}
Substituting $g(r,\theta)=(x,y)$, $\partial_\theta g_1(r,\theta)=-r\sin\theta$, and $\partial_\theta g_2(r,\theta)=r\cos\theta$ yields
\begin{align*}
\partial_\theta u(r,\theta)=-(\partial_x F)(x,y)r\sin\theta+(\partial_y F)(x,y)r\cos\theta.
\end{align*}
The formulas express each polar coordinate derivative of $u$ as a weighted combination of the Cartesian partial derivatives of $F$, with weights given by the partial derivatives of the coordinate map $g$.
[/example]
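The polar formulas can be sanity-checked on a concrete function. The sketch below picks a hypothetical smooth test function `F` with hand-computed partial derivatives `dF_dx` and `dF_dy`, then compares finite-difference estimates of $\partial_r u$ and $\partial_\theta u$ with the chain-rule expressions. All names, the sample point, and the step size are illustrative assumptions.

```python
import math

def F(x, y):
    """Hypothetical smooth test function on the plane."""
    return x * x * y + math.exp(x - y)

def dF_dx(x, y):
    return 2.0 * x * y + math.exp(x - y)

def dF_dy(x, y):
    return x * x - math.exp(x - y)

def u(r, theta):
    return F(r * math.cos(theta), r * math.sin(theta))

r, theta, h = 1.5, 0.8, 1e-6
x, y = r * math.cos(theta), r * math.sin(theta)

# Finite-difference estimates of the polar partial derivatives of u.
du_dr_numeric = (u(r + h, theta) - u(r - h, theta)) / (2.0 * h)
du_dtheta_numeric = (u(r, theta + h) - u(r, theta - h)) / (2.0 * h)

# Chain-rule formulas from the example.
du_dr_formula = dF_dx(x, y) * math.cos(theta) + dF_dy(x, y) * math.sin(theta)
du_dtheta_formula = (-dF_dx(x, y) * r * math.sin(theta)
                     + dF_dy(x, y) * r * math.cos(theta))

print(du_dr_numeric, du_dr_formula)
print(du_dtheta_numeric, du_dtheta_formula)
```

Each printed pair should agree to several decimal places. The check does not prove the chain rule, but it catches sign and factor errors such as a missing $r$ in the $\theta$ derivative.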
## Higher-Order Partial Derivatives
### Iterated Coordinate Derivatives
Once a partial derivative is itself a function, it can be differentiated again. Higher-order partial derivatives measure repeated coordinate responses and are the raw material for Taylor expansions, Hessian matrices, elliptic operators, and many evolution equations.
The notation should distinguish repeated differentiation from enumeration. Superscripts are used for derivative order, while the coordinate labels remain subscripts.
[definition: Higher-Order Partial Derivative]
Let $U\subset\mathbb{R}^m$ be open, let $f:U\to\mathbb{R}^n$, and let $i_1,\ldots,i_k\in\{1,\ldots,m\}$. Let $V\subset U$ be the set of points at which the successive partial derivatives in the order $i_1,\ldots,i_k$ exist. The higher-order partial derivative with respect to $i_1,\ldots,i_k$ is the map
\begin{align*}
\partial_{x_{i_k}}\cdots\partial_{x_{i_1}}f:V\to\mathbb{R}^n.
\end{align*}
[/definition]
The iterated notation records the exact order of differentiation, which is useful for a single calculation but heavy in estimates involving many derivatives at once. PDE and Taylor theory need a compact way to quantify over every derivative of a fixed total order, and multi-index notation provides that compression without changing the underlying coordinate derivatives.
[definition: Multi-Index Partial Derivative]
Let $U\subset\mathbb{R}^m$ be open, let $f:U\to\mathbb{R}^n$, and let $\alpha=(\alpha_1,\ldots,\alpha_m)\in\mathbb{N}_0^m$. Let $V_\alpha\subset U$ be the set of points at which the successive partial derivatives in
\begin{align*}
\partial_{x_1}^{\alpha_1}\cdots\partial_{x_m}^{\alpha_m}f
\end{align*}
exist. The multi-index partial derivative of order $|\alpha|=\alpha_1+\cdots+\alpha_m$ is the map
\begin{align*}
D^\alpha f:V_\alpha\to\mathbb{R}^n
\end{align*}
given by
\begin{align*}
D^\alpha f=\partial_{x_1}^{\alpha_1}\cdots\partial_{x_m}^{\alpha_m}f.
\end{align*}
[/definition]
The notation $D^\alpha f$ is standard in analysis, although the first-order gradient of a scalar function is still written $\nabla f$. Multi-index notation becomes most useful once one understands when the order of differentiation may be exchanged.
[example: A Second Mixed Partial Derivative]
Let $f:\mathbb{R}^2\to\mathbb{R}$ be defined by
\begin{align*}
f(x,y)=x^3y^2+e^{xy}.
\end{align*}
We compute the mixed partial derivative $\partial_y\partial_x f(x,y)$ by first varying $x$ and keeping $y$ fixed. The polynomial term satisfies
\begin{align*}
\partial_x(x^3y^2)=y^2\partial_x(x^3)=3x^2y^2.
\end{align*}
For the exponential term, the one-variable chain rule with inner function $xy$ gives
\begin{align*}
\partial_x(e^{xy})=e^{xy}\partial_x(xy)=y e^{xy}.
\end{align*}
Hence
\begin{align*}
\partial_x f(x,y)=3x^2y^2+y e^{xy}.
\end{align*}
Now differentiate this expression with respect to $y$. For the first term,
\begin{align*}
\partial_y(3x^2y^2)=3x^2\partial_y(y^2)=6x^2y.
\end{align*}
For the second term, apply the product rule to $y e^{xy}$:
\begin{align*}
\partial_y(y e^{xy})=(\partial_y y)e^{xy}+y\partial_y(e^{xy}).
\end{align*}
Since $\partial_y y=1$ and the chain rule gives $\partial_y(e^{xy})=e^{xy}\partial_y(xy)=x e^{xy}$, this becomes
\begin{align*}
\partial_y(y e^{xy})=e^{xy}+xy e^{xy}.
\end{align*}
Therefore
\begin{align*}
\partial_y\partial_x f(x,y)=6x^2y+e^{xy}+xy e^{xy}.
\end{align*}
If the order is reversed, first
\begin{align*}
\partial_y f(x,y)=\partial_y(x^3y^2)+\partial_y(e^{xy})=2x^3y+x e^{xy}.
\end{align*}
Differentiating this with respect to $x$ gives
\begin{align*}
\partial_x(2x^3y)=6x^2y.
\end{align*}
For $x e^{xy}$, the product rule gives
\begin{align*}
\partial_x(x e^{xy})=(\partial_x x)e^{xy}+x\partial_x(e^{xy})=e^{xy}+xy e^{xy}.
\end{align*}
Thus
\begin{align*}
\partial_x\partial_y f(x,y)=6x^2y+e^{xy}+xy e^{xy}.
\end{align*}
In this example the two mixed partial derivatives agree, and the agreement is visible from the two explicit coordinate computations.
[/example]
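The closed formula for the mixed partial derivative can be compared with a numerical estimate. The sketch below uses a four-point central stencil, a standard finite-difference approximation that is symmetric in the two variables and therefore checks the common value rather than the order of differentiation. The helper name `mixed_partial`, the sample point, and the step size are illustrative assumptions.

```python
import math

def f(x, y):
    return x ** 3 * y ** 2 + math.exp(x * y)

def mixed_partial(f, x, y, h=1e-4):
    """Four-point central estimate of the second mixed partial derivative."""
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)

x, y = 0.5, 1.2  # arbitrary sample point

# Closed formula from the example: 6 x^2 y + e^{xy} + x y e^{xy}.
formula = 6.0 * x * x * y + math.exp(x * y) + x * y * math.exp(x * y)

print(mixed_partial(f, x, y), formula)
```

The two printed numbers should agree to roughly five or six decimal places; the step size trades truncation error against floating-point cancellation.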
### Mixed Partials
The equality of mixed partial derivatives is not a formal consequence of the notation. Repeated coordinate differentiation can depend on the order of the coordinate probes unless the function has enough regularity near the point. The next theorem is the standard local condition that permits the order to be exchanged.
[quotetheorem:7907]
This theorem is often called Clairaut's theorem or Schwarz's theorem. Its assumptions matter because mixed partial derivatives can exist and fail to agree when regularity near the point is insufficient.
[example: Mixed Partials Can Depend on Order]
Define $f:\mathbb{R}^2\to\mathbb{R}$ by $f(0,0)=0$ and, for $(x,y)\ne(0,0)$,
\begin{align*}
f(x,y)=\frac{xy(x^2-y^2)}{x^2+y^2}.
\end{align*}
We compute the two mixed partial derivatives at the origin from the defining one-variable difference quotients. First, along the $x$-axis, if $t\ne0$, then
\begin{align*}
f(t,0)=\frac{t\cdot 0\cdot(t^2-0^2)}{t^2+0^2}=0.
\end{align*}
Hence
\begin{align*}
\partial_x f(0,0)=\lim_{t\to0}\frac{f(t,0)-f(0,0)}{t}=\lim_{t\to0}\frac{0-0}{t}=0.
\end{align*}
Now fix $y\ne0$ and compute $\partial_x f(0,y)$. Since $f(0,y)=0$, for $t\ne0$,
\begin{align*}
\frac{f(t,y)-f(0,y)}{t}=\frac{1}{t}\frac{ty(t^2-y^2)}{t^2+y^2}.
\end{align*}
Cancelling the factor $t$ gives
\begin{align*}
\frac{f(t,y)-f(0,y)}{t}=y\frac{t^2-y^2}{t^2+y^2}.
\end{align*}
Letting $t\to0$ with $y$ fixed gives
\begin{align*}
\partial_x f(0,y)=y\frac{0-y^2}{0+y^2}=-y.
\end{align*}
This formula also agrees with $\partial_x f(0,0)=0$ when $y=0$. Therefore
\begin{align*}
\partial_y\partial_x f(0,0)=\lim_{t\to0}\frac{\partial_x f(0,t)-\partial_x f(0,0)}{t}=\lim_{t\to0}\frac{-t-0}{t}=-1.
\end{align*}
For the opposite order, first compute along the $y$-axis. If $t\ne0$, then
\begin{align*}
f(0,t)=\frac{0\cdot t\cdot(0^2-t^2)}{0^2+t^2}=0.
\end{align*}
Thus
\begin{align*}
\partial_y f(0,0)=\lim_{t\to0}\frac{f(0,t)-f(0,0)}{t}=\lim_{t\to0}\frac{0-0}{t}=0.
\end{align*}
Now fix $x\ne0$ and compute $\partial_y f(x,0)$. Since $f(x,0)=0$, for $t\ne0$,
\begin{align*}
\frac{f(x,t)-f(x,0)}{t}=\frac{1}{t}\frac{xt(x^2-t^2)}{x^2+t^2}.
\end{align*}
Cancelling the factor $t$ gives
\begin{align*}
\frac{f(x,t)-f(x,0)}{t}=x\frac{x^2-t^2}{x^2+t^2}.
\end{align*}
Letting $t\to0$ with $x$ fixed gives
\begin{align*}
\partial_y f(x,0)=x\frac{x^2-0}{x^2+0}=x.
\end{align*}
This formula also agrees with $\partial_y f(0,0)=0$ when $x=0$. Therefore
\begin{align*}
\partial_x\partial_y f(0,0)=\lim_{t\to0}\frac{\partial_y f(t,0)-\partial_y f(0,0)}{t}=\lim_{t\to0}\frac{t-0}{t}=1.
\end{align*}
The same function has $\partial_y\partial_x f(0,0)=-1$ and $\partial_x\partial_y f(0,0)=1$, so mixed partial derivatives can depend on the order when the surrounding second-order regularity is missing.
[/example]
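The order dependence survives a numerical experiment. The following sketch estimates the first partial derivatives with a small inner step and then differentiates those estimates at the origin with a larger outer step, once in each order. The helper names `dx` and `dy` and both step sizes are assumptions chosen for illustration.

```python
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def dx(x, y, h=1e-8):
    """Symmetric-difference estimate of the x-partial derivative."""
    return (f(x + h, y) - f(x - h, y)) / (2.0 * h)

def dy(x, y, h=1e-8):
    """Symmetric-difference estimate of the y-partial derivative."""
    return (f(x, y + h) - f(x, y - h)) / (2.0 * h)

t = 1e-3  # outer step, deliberately much larger than the inner step h

# First order: differentiate in x, then in y, at the origin.
dy_dx = (dx(0.0, t) - dx(0.0, -t)) / (2.0 * t)
# Opposite order: differentiate in y, then in x, at the origin.
dx_dy = (dy(t, 0.0) - dy(-t, 0.0)) / (2.0 * t)

print(dy_dx)  # close to -1
print(dx_dy)  # close to  1
```

Keeping the inner step much smaller than the outer step imitates taking the inner limit first, which is what the iterated definition requires. A single symmetric four-point stencil would hide the discrepancy because it treats both variables alike.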
## Gradients, Hessians, and Linearization
For scalar-valued functions, the first partial derivatives assemble into a vector rather than a matrix. When $f$ is differentiable at the point and this vector is nonzero, it points in the direction of greatest first-order increase with respect to the Euclidean inner product. It is one of the main ways partial derivatives become geometry.
The gradient is not defined as a list for its own sake; it is the object that represents the linear functional $h\mapsto Df_a(h)$ by an [inner product](/page/Inner%20Product) whenever $f$ is differentiable.
[definition: Gradient]
Let $U\subset\mathbb{R}^m$ be open and let $f:U\to\mathbb{R}$ be a function whose first partial derivatives exist at $a\in U$. The gradient of $f$ at $a$ is
\begin{align*}
\nabla f(a)=\bigl(\partial_{x_1}f(a),\ldots,\partial_{x_m}f(a)\bigr)\in\mathbb{R}^m.
\end{align*}
[/definition]
The definition gives a vector, while differentiability gives a linear functional on displacement vectors. To use gradients in linear approximation, estimates, and geometry, these two descriptions must agree: dotting a displacement with the gradient should recover the first-order change of the function.
[quotetheorem:7908]
First-order linearization gives a tangent hyperplane, but it cannot distinguish a local minimum from a saddle point at a critical point. To measure curvature of a scalar function, we need to organize the second partial derivatives into a matrix.
[definition: Hessian Matrix]
Let $U\subset\mathbb{R}^m$ be open, let $f:U\to\mathbb{R}$, and let $a\in U$. If all second partial derivatives $\partial_{x_i}\partial_{x_j}f(a)$ exist, the Hessian matrix of $f$ at $a$ is the matrix $Hf_a\in\mathbb{R}^{m\times m}$ with entries
\begin{align*}
(Hf_a)_{ij}=\partial_{x_i}\partial_{x_j}f(a).
\end{align*}
[/definition]
The Hessian is most useful when it is symmetric, and the equality of mixed partial derivatives supplies symmetry under continuous second partial derivatives. The sign of the resulting quadratic form detects local shape near a critical point.
[example: Gradient and Hessian of a Quadratic Function]
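Let $A=(a_{ij})_{1\leq i,j\leq m}$ be a symmetric real $m\times m$ matrix, let $b=(b_1,\ldots,b_m)\in\mathbb{R}^m$, and define $f:\mathbb{R}^m\to\mathbb{R}$ by
\begin{align*}
f(x)=\frac{1}{2}x\cdot Ax+b\cdot x,
\end{align*}
where $x=(x_1,\ldots,x_m)$. Since $A$ is symmetric, $a_{ij}=a_{ji}$ for all $i,j$. Expanding the two dot products gives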
\begin{align*}
f(x)=\frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m a_{ij}x_i x_j+\sum_{i=1}^m b_i x_i.
\end{align*}
Fix $k\in\{1,\ldots,m\}$. Using the one-variable product rule in the coordinate $x_k$,
\begin{align*}
\partial_{x_k}(x_i x_j)=\delta_{ik}x_j+x_i\delta_{jk},
\end{align*}
where $\delta_{ik}=1$ if $i=k$ and $\delta_{ik}=0$ otherwise. Therefore
\begin{align*}
\partial_{x_k}f(x)=\frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m a_{ij}(\delta_{ik}x_j+x_i\delta_{jk})+b_k.
\end{align*}
The terms with $\delta_{ik}$ leave only $i=k$, and the terms with $\delta_{jk}$ leave only $j=k$, so
\begin{align*}
\partial_{x_k}f(x)=\frac{1}{2}\sum_{j=1}^m a_{kj}x_j+\frac{1}{2}\sum_{i=1}^m a_{ik}x_i+b_k.
\end{align*}
Since $a_{ik}=a_{ki}$ and the index $i$ is a dummy index,
\begin{align*}
\frac{1}{2}\sum_{j=1}^m a_{kj}x_j+\frac{1}{2}\sum_{i=1}^m a_{ik}x_i=\sum_{j=1}^m a_{kj}x_j.
\end{align*}
Thus
\begin{align*}
\partial_{x_k}f(x)=\sum_{j=1}^m a_{kj}x_j+b_k=(Ax)_k+b_k.
\end{align*}
Since this holds for every coordinate $k$,
\begin{align*}
\nabla f(x)=Ax+b.
\end{align*}
For the Hessian, differentiate the $k$-th first partial derivative with respect to $x_\ell$:
\begin{align*}
\partial_{x_\ell}\partial_{x_k}f(x)=\partial_{x_\ell}\left(\sum_{j=1}^m a_{kj}x_j+b_k\right)=\sum_{j=1}^m a_{kj}\delta_{j\ell}=a_{k\ell}.
\end{align*}
With the chapter's convention $(Hf_x)_{ij}=\partial_{x_i}\partial_{x_j}f(x)$, this gives
\begin{align*}
(Hf_x)_{ij}=a_{ji}=a_{ij}.
\end{align*}
Hence
\begin{align*}
Hf_x=A.
\end{align*}
The gradient is affine in $x$, while the Hessian is the same matrix $A$ at every point, so the quadratic function has constant second-order behaviour.
[/example]
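The identity $\nabla f(x)=Ax+b$ can be checked on a small instance. The sketch below assumes the quadratic function $f(x)=\tfrac{1}{2}x\cdot Ax+b\cdot x$ that the expansion in the example describes, with a hypothetical symmetric $3\times3$ matrix `A` and vector `b`. The helper names and the step size are illustrative.

```python
def mat_vec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

# Hypothetical symmetric matrix and vector for illustration.
A = [[2.0, -1.0, 0.5],
     [-1.0, 3.0, 0.0],
     [0.5, 0.0, 1.0]]
b = [1.0, -2.0, 0.5]

def f(x):
    return 0.5 * dot(x, mat_vec(A, x)) + dot(b, x)

def numeric_gradient(f, x, h=1e-6):
    """Symmetric-difference estimate of each first partial derivative."""
    grad = []
    for k in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[k] += h
        backward[k] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad

x = [0.3, -0.7, 1.1]
expected = [ax_k + b_k for ax_k, b_k in zip(mat_vec(A, x), b)]  # Ax + b

print(numeric_gradient(f, x))
print(expected)
```

For a quadratic function the symmetric difference quotient has no truncation error, so the two lists differ only by floating-point rounding. Replacing `A` by a non-symmetric matrix would instead produce $\tfrac{1}{2}(A+A^{\mathsf T})x+b$, which is where the symmetry hypothesis enters.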
## Partial Differential Equations
### First-Order Equations
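Partial derivatives become structural objects in PDE because equations can prescribe relationships among coordinate rates of change. The notation $\partial_t u$, $\partial_{x_i}u$, and $\Delta u$ distinguishes time derivatives, first-order spatial derivatives, and second-order spatial variation, which is how transport, diffusion, and forcing terms are told apart in an equation.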
A PDE is not merely an equation containing many symbols. It is an equation for an unknown function on a domain, and the partial derivatives describe the local constraints that the function must satisfy.
[definition: First-Order Partial Differential Equation]
Let $U\subset\mathbb{R}^m$ be open and let $F:U\times\mathbb{R}\times\mathbb{R}^m\to\mathbb{R}$ be a function. A first-order partial differential equation for an unknown function $u:U\to\mathbb{R}$ has the form
\begin{align*}
F(x,u(x),\nabla u(x))=0
\end{align*}
for $x\in U$ at points where the displayed quantities are defined.
[/definition]
This definition shows how first partial derivatives enter equations as independent variables of a constraint. Linear equations are the first class where the geometry can often be read directly from the coefficients.
[example: Transport Equation]
Let $b=(b_1,\ldots,b_m)\in\mathbb{R}^m$ be fixed, and let $u:\mathbb{R}^m\to\mathbb{R}$ be differentiable. Since
\begin{align*}
\nabla u(x)=\bigl(\partial_{x_1}u(x),\ldots,\partial_{x_m}u(x)\bigr),
\end{align*}
the transport equation is the coordinate identity
\begin{align*}
b\cdot\nabla u(x)=\sum_{i=1}^m b_i\,\partial_{x_i}u(x)=0.
\end{align*}
Fix a point $x\in\mathbb{R}^m$ and define the one-variable function
\begin{align*}
\varphi(t)=u(x+tb).
\end{align*}
Writing $x+tb=(x_1+tb_1,\ldots,x_m+tb_m)$, the chain rule gives
\begin{align*}
\varphi'(t)=\sum_{i=1}^m \partial_{x_i}u(x+tb)\,\frac{d}{dt}(x_i+tb_i).
\end{align*}
Since $\frac{d}{dt}(x_i+tb_i)=b_i$, this becomes
\begin{align*}
\varphi'(t)=\sum_{i=1}^m b_i\,\partial_{x_i}u(x+tb)=b\cdot\nabla u(x+tb).
\end{align*}
If $u$ satisfies $b\cdot\nabla u=0$ at every point, then $\varphi'(t)=0$ for every $t\in\mathbb{R}$. Hence $\varphi$ is constant on $\mathbb{R}$, by the one-variable fact that a differentiable function with zero derivative on an interval is constant. Therefore
\begin{align*}
u(x+tb)=u(x)
\end{align*}
for all $t$, so the coordinate partial derivatives combine to say that $u$ is constant along every line parallel to $b$.
[/example]
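A two-dimensional instance makes the conclusion concrete. The sketch below assumes an illustrative direction `b` and a hypothetical solution $u(x,y)=\sin(b_2x-b_1y)$, which depends only on a combination of the variables that does not change along $b$. It estimates $b\cdot\nabla u$ by finite differences and samples $u$ along a line parallel to $b$.

```python
import math

b = (2.0, -1.0)  # illustrative transport direction

def u(x, y):
    """Hypothetical solution: a profile of the combination b2*x - b1*y."""
    return math.sin(b[1] * x - b[0] * y)

def b_dot_grad_u(x, y, h=1e-6):
    """Finite-difference estimate of b . grad u at (x, y)."""
    du_dx = (u(x + h, y) - u(x - h, y)) / (2.0 * h)
    du_dy = (u(x, y + h) - u(x, y - h)) / (2.0 * h)
    return b[0] * du_dx + b[1] * du_dy

x0, y0 = 0.4, 1.7  # arbitrary base point
values_along_line = [
    u(x0 + t * b[0], y0 + t * b[1]) for t in (-2.0, -0.5, 0.0, 1.0, 3.0)
]

print(b_dot_grad_u(x0, y0))  # close to 0
print(values_along_line)     # all entries agree up to rounding
```

Any differentiable one-variable profile could replace the sine here; the argument in the example only uses that the derivative of $t\mapsto u(x+tb)$ vanishes.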
### Second-Order Operators
First-order equations constrain slopes. Many fundamental equations instead constrain curvature or averaged second-order variation. The Laplacian is the prototype because it sums the pure second coordinate derivatives into a single operator.
[definition: Laplacian]
Let $U\subset\mathbb{R}^m$ be open and let $u:U\to\mathbb{R}$ have second partial derivatives at $x\in U$. The Laplacian of $u$ at $x$ is
\begin{align*}
\Delta u(x)=\sum_{i=1}^m \partial_{x_i}\partial_{x_i}u(x).
\end{align*}
[/definition]
To calibrate the Laplacian, we need a [test function](/page/Test%20Function) whose curvature can be read in every coordinate direction. The radial quadratic $u(x)=|x|^2$ is the basic model, and its Laplacian fixes the factor that appears throughout Poisson equations, harmonic functions, and radial barrier arguments.
[quotetheorem:7909]
This computation is simple but important: even a rotationally symmetric expression is evaluated through coordinate partial derivatives. More advanced theory explains why the final operator is invariant under orthogonal changes of coordinates.
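The coordinate-by-coordinate nature of the computation is easy to mimic numerically. The sketch below approximates the Laplacian by summing second central differences, one coordinate at a time, and applies it to $u(x)=|x|^2$ in several dimensions. It assumes the standard value $\Delta|x|^2=2m$; the helper names, sample points, and step size are illustrative.

```python
def laplacian(u, x, h=1e-4):
    """Sum of second central differences in each coordinate direction."""
    total = 0.0
    center = u(x)
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        total += (u(forward) - 2.0 * center + u(backward)) / (h * h)
    return total

def squared_norm(x):
    return sum(x_i * x_i for x_i in x)

for m in (1, 2, 3, 5):
    point = [0.1 * (k + 1) for k in range(m)]  # arbitrary sample point
    print(m, laplacian(squared_norm, point))   # approximately 2 * m
```

Each coordinate contributes a second difference close to $2$, so the sum is close to $2m$ regardless of the sample point, reflecting the constant curvature of the radial quadratic.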
## Coordinates on Manifolds
Partial derivatives are coordinate-dependent, and that dependence becomes visible on manifolds. A smooth manifold does not come with global coordinates, so partial derivatives must be taken inside a named chart.
The local-coordinate version is still indispensable because vector fields, differential forms, metrics, and curvature are all computed by differentiating coordinate component functions.
[definition: Coordinate Partial Derivative on a Manifold]
Let $M$ be a smooth $m$-dimensional manifold, let $(U,\varphi)$ be a chart with coordinates $(x_1,\ldots,x_m)$, let $p\in U$, and let $f:M\to\mathbb{R}$ be a function, so that the coordinate representative $f\circ\varphi^{-1}:\varphi(U)\to\mathbb{R}$ is defined on the open set $\varphi(U)\subset\mathbb{R}^m$. The coordinate partial derivative of $f$ with respect to $x_i$ at $p$ is
\begin{align*}
\partial_{x_i}f(p):=\partial_{y_i}(f\circ\varphi^{-1})(\varphi(p)),
\end{align*}
provided this partial derivative exists, where $(y_1,\ldots,y_m)$ are the standard coordinates on $\varphi(U)\subset\mathbb{R}^m$.
[/definition]
The notation suppresses the chart only after the chart is fixed. Changing coordinates changes the coordinate partial derivative operators by the chain rule, which is why geometric objects are built from transformation laws rather than from bare coordinate derivatives alone.
[example: Coordinate Dependence on the Circle]
Let $A\subset S^1$ be the open arc parametrized by $\theta\in(-\pi/2,\pi/2)$, and let $f:S^1\to\mathbb{R}$ be given on this arc by
\begin{align*}
f(\cos\theta,\sin\theta)=\cos\theta.
\end{align*}
In the coordinate $\theta$, the coordinate representative of $f$ is the one-variable function $F(\theta)=\cos\theta$. Therefore
\begin{align*}
\frac{d}{d\theta}f(\cos\theta,\sin\theta)=F'(\theta)=-\sin\theta.
\end{align*}
Now use the coordinate $s=2\theta$ on the same arc. Then $\theta=s/2$, so the same point of $S^1$ is written as
\begin{align*}
(\cos\theta,\sin\theta)=\left(\cos\frac{s}{2},\sin\frac{s}{2}\right).
\end{align*}
In the coordinate $s$, the coordinate representative of $f$ is
\begin{align*}
G(s)=f\left(\cos\frac{s}{2},\sin\frac{s}{2}\right)=\cos\frac{s}{2}.
\end{align*}
Using the one-variable chain rule with inner function $s/2$, whose derivative is $1/2$, gives
\begin{align*}
G'(s)=-\sin\frac{s}{2}\cdot\frac{1}{2}.
\end{align*}
Since $\theta=s/2$, this is
\begin{align*}
\frac{d}{ds}f\left(\cos\frac{s}{2},\sin\frac{s}{2}\right)=-\frac{1}{2}\sin\theta.
\end{align*}
The function on the circle has not changed, but replacing $\theta$ by the rescaled coordinate $s=2\theta$ changes the coordinate derivative by the factor $1/2$.
[/example]
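The factor $1/2$ can be observed numerically. The sketch below differentiates the two coordinate representatives from the example at the same point of the circle, using a symmetric difference quotient. The helper name `derivative`, the sample angle, and the step size are illustrative assumptions.

```python
import math

def F(theta):
    """Representative of f in the coordinate theta."""
    return math.cos(theta)

def G(s):
    """Representative of f in the coordinate s = 2 * theta."""
    return math.cos(s / 2.0)

def derivative(g, t, h=1e-6):
    """Symmetric-difference estimate of the one-variable derivative."""
    return (g(t + h) - g(t - h)) / (2.0 * h)

theta = 0.6          # sample point inside (-pi/2, pi/2)
s = 2.0 * theta      # same point of the circle in the other coordinate

d_theta = derivative(F, theta)  # approximately -sin(theta)
d_s = derivative(G, s)          # approximately -sin(theta) / 2

print(d_theta, -math.sin(theta))
print(d_s, -0.5 * math.sin(theta))
```

The two representatives take the same value at corresponding coordinates, yet their derivatives differ by the factor $d\theta/ds=1/2$, exactly as the chain rule predicts.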
This is the entry point to differential geometry: partial derivatives are local coordinate tools, while tangent vectors and differentials are the coordinate-independent objects they help represent.
## Beyond and Connected Topics
Partial derivatives sit between elementary one-variable differentiation and the full derivative. The natural next topic is [Derivative](/page/Derivative), where the total derivative $Df_a$ is defined as a linear approximation and partial derivatives appear as its values on the standard basis vectors.
For analysis on metric and topological spaces, the background about open sets and continuity belongs to [Cambridge IB Analysis and Topology](/page/Cambridge%20IB%20Analysis%20and%20Topology). That course-level setting explains why openness, limits, and continuity are part of the definition rather than incidental technicalities.
In complex analysis, the real partial derivatives $\partial_x$ and $\partial_y$ combine into the Wirtinger operators $\partial_z$ and $\partial_{\bar z}$. The Cauchy--Riemann equations in [Cambridge IB Complex Analysis](/page/Cambridge%20IB%20Complex%20Analysis) show that holomorphicity is a strong compatibility condition between two real partial derivatives.
In differential equations, partial derivatives become the language of evolution and constraint. [Cambridge IA Differential Equations](/page/Cambridge%20IA%20Differential%20Equations) begins with ordinary differential equations, while PDE theory asks what changes when several independent variables are present.
On manifolds, coordinate partial derivatives must be interpreted through charts and transformation rules. [Cambridge III Differential Geometry](/page/Cambridge%20III%20Differential%20Geometry) develops the coordinate-independent framework in which these local computations become tangent vectors, vector fields, connections, and curvature.
## References
Androma, [Cambridge IB Analysis and Topology](/page/Cambridge%20IB%20Analysis%20and%20Topology).
Androma, [Cambridge IB Complex Analysis](/page/Cambridge%20IB%20Complex%20Analysis).
Androma, [Cambridge IA Differential Equations](/page/Cambridge%20IA%20Differential%20Equations).
Androma, [Cambridge III Differential Geometry](/page/Cambridge%20III%20Differential%20Geometry).
Apostol, *Calculus, Volume II* (1969).
Spivak, *Calculus on Manifolds* (1965).
Evans, *Partial Differential Equations* (2010).