The [derivative](/page/Derivative) of a function is the first systematic answer to the question: near a point, which linear map best approximates the function? Many questions in analysis need the next layer of information. A linear approximation tells us instantaneous velocity, but it does not tell us whether that velocity is increasing, rotating, flattening, or changing direction. The second derivative measures the change of the derivative itself.
This idea is familiar for functions $f: U \subset \mathbb{R} \to \mathbb{R}$, where $f''(a)$ measures acceleration or concavity. In several variables the same idea survives, but the second derivative is no longer just a number, and a bare matrix of numbers does not say what it acts on. Since $Df_a$ is a [linear map](/page/Linear%20Map), the derivative of $Df$ at $a$ is naturally a linear map whose values are linear maps. After the standard identification, it is a bilinear map in two direction variables.
Second derivatives are the local language behind Taylor's theorem, the Hessian matrix, second-order tests for extrema, [convex functions](/page/Convex%20Function), Newton's method, curvature computations, and second-order differential equations. The concept is therefore a child of the derivative: it differentiates the first-order approximation once more.
## Definition
The first derivative varies with the base point. If $f$ is differentiable near $a$, then nearby points $x$ have derivatives $Df_x$. The second derivative is introduced to measure whether this derivative-valued map has its own linear approximation at $a$.
[definition: Second Derivative]
Let $U \subset \mathbb{R}^m$ be open, let $a \in U$, and let $f: U \to \mathbb{R}^n$ be differentiable on an open neighbourhood $V \subset U$ of $a$. Write $\mathcal{L}(E,F)$ for the space of linear maps from a [vector space](/page/Vector%20Space) $E$ to a vector space $F$. The second derivative of $f$ at $a$ is the derivative at $a$ of the map $Df: V \to \mathcal{L}(\mathbb{R}^m, \mathbb{R}^n)$, $x \mapsto Df_x$. When it exists, it is denoted $D^2f_a$ and satisfies
\begin{align*}
D^2f_a \in \mathcal{L}(\mathbb{R}^m, \mathcal{L}(\mathbb{R}^m, \mathbb{R}^n)).
\end{align*}
[/definition]
The target space in the definition is accurate but cumbersome. In computations, a second derivative should accept two direction vectors and return the second-order response of the function. This motivates the standard bilinear reading of the same object.
[definition: Bilinear Form of the Second Derivative]
Let $U \subset \mathbb{R}^m$ be open, let $a \in U$, and let $f: U \to \mathbb{R}^n$ have a second derivative at $a$. The [bilinear form](/page/Bilinear%20Form) associated to $D^2f_a$ is the map $D^2f_a: \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}^n$ defined by
\begin{align*}
D^2f_a(h,k)=(D^2f_a(h))(k).
\end{align*}
[/definition]
Equivalently, $D^2f_a \in \mathrm{Bil}(\mathbb{R}^m \times \mathbb{R}^m,\mathbb{R}^n)$, where $\mathrm{Bil}(E \times E,F)$ denotes the space of bilinear maps from $E \times E$ to $F$.
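The identification $D^2f_a(h,k)=(D^2f_a(h))(k)$ can be checked numerically. The following is a minimal sketch for scalar-valued $f$, assuming central finite differences are accurate enough for the purpose; the test function `f`, the step size `eps`, and the helper names are illustrative choices and not part of the definition.

```python
import math

def directional_derivative(f, x, k, eps=1e-4):
    """Central-difference estimate of Df_x(k) for a scalar-valued f."""
    xp = [xi + eps * ki for xi, ki in zip(x, k)]
    xm = [xi - eps * ki for xi, ki in zip(x, k)]
    return (f(xp) - f(xm)) / (2 * eps)

def second_derivative(f, a, h, k, eps=1e-4):
    """Estimate D^2 f_a(h, k) = (D^2 f_a(h))(k).

    The map x -> Df_x(k) is differentiated at a in the direction h.
    """
    ap = [ai + eps * hi for ai, hi in zip(a, h)]
    am = [ai - eps * hi for ai, hi in zip(a, h)]
    return (directional_derivative(f, ap, k, eps)
            - directional_derivative(f, am, k, eps)) / (2 * eps)

# Hypothetical test function f(x1, x2) = sin(x1) * x2^2.
def f(x):
    return math.sin(x[0]) * x[1] ** 2

a = [0.3, 1.2]
h, k, k2 = [1.0, 2.0], [-1.0, 0.5], [0.25, 3.0]

# Additivity in the second slot, up to discretisation error:
# D^2 f_a(h, k + k2) should match D^2 f_a(h, k) + D^2 f_a(h, k2).
k_sum = [u + v for u, v in zip(k, k2)]
lhs = second_derivative(f, a, h, k_sum)
rhs = second_derivative(f, a, h, k) + second_derivative(f, a, h, k2)
additivity_gap = abs(lhs - rhs)
print(additivity_gap)
```

The gap is not exactly zero, because a finite difference is only approximately linear in the direction; it shrinks as `eps` decreases until rounding error takes over.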
Once this identification is made, analysts usually use the same symbol $D^2f_a$ for both the operator-valued linear map and the bilinear map. The next definition gives the name for the pointwise existence condition, because many arguments need to distinguish existence at a point from continuous existence on a whole [open set](/page/Open%20Set).
[definition: Twice Differentiable at a Point]
Let $U \subset \mathbb{R}^m$ be open, let $a \in U$, and let $f: U \to \mathbb{R}^n$. The map $f$ is twice differentiable at $a$ if there is an open neighbourhood $V \subset U$ of $a$ such that $f$ is differentiable on $V$ and the map $Df: V \to \mathcal{L}(\mathbb{R}^m, \mathbb{R}^n)$ is differentiable at $a$.
[/definition]
This pointwise notion separates existence of a second derivative from any choice of coordinates or matrices. When the codomain is $\mathbb{R}$, many second-order questions ask how to store the bilinear map in standard coordinates, and that storage device is the Hessian matrix.
[definition: Hessian Matrix]
Let $U \subset \mathbb{R}^m$ be open, let $a \in U$, and let $f: U \to \mathbb{R}$ be twice differentiable at $a$. The Hessian matrix of $f$ at $a$ is the matrix $Hf_a \in \mathbb{R}^{m \times m}$ representing the bilinear map $D^2f_a$ in the standard basis, with entries
\begin{align*}
(Hf_a)_{ij}=D^2f_a(e_i,e_j), \qquad 1 \le i,j \le m.
\end{align*}
[/definition]
With this convention, the second-order term in the Taylor expansion is $\tfrac12 D^2f_a(h,h)$, or in coordinates $\tfrac12 h^\top Hf_a h$. A Hessian entry is obtained by differentiating a coordinate derivative again, so the coordinate-level notion of a second partial derivative needs its own definition.
[definition: Second Partial Derivative]
Let $U \subset \mathbb{R}^m$ be open, let $f: U \to \mathbb{R}^n$, and let $a \in U$. If the partial derivative $\partial_{x_j} f$ is defined on a neighbourhood of $a$ and $\partial_{x_i}(\partial_{x_j}f)(a)$ exists, then the second partial derivative of $f$ at $a$ in the $x_j$ direction followed by the $x_i$ direction is
\begin{align*}
\partial_{x_i}\partial_{x_j} f(a):=\partial_{x_i}(\partial_{x_j}f)(a).
\end{align*}
[/definition]
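The order of operations in $\partial_{x_i}(\partial_{x_j}f)$ can be mirrored directly in code by nesting a differentiation helper. The sketch below is illustrative only: the polynomial `f`, the step `eps`, and the function names are assumptions made for this example, and the partial derivatives are approximated by central differences.

```python
def partial(f, j, eps=1e-4):
    """Return a function approximating the partial derivative of f in x_j."""
    def df(x):
        xp = list(x)
        xm = list(x)
        xp[j] += eps
        xm[j] -= eps
        return (f(xp) - f(xm)) / (2 * eps)
    return df

def second_partial(f, i, j, a):
    """Approximate d_{x_i}(d_{x_j} f)(a): differentiate in x_j first, then in x_i."""
    return partial(partial(f, j), i)(a)

# Hypothetical example: f(x1, x2) = x1^3 * x2 + x2^2.
def f(x):
    return x[0] ** 3 * x[1] + x[1] ** 2

a = [1.0, 2.0]
# Indices are 0-based in code: index 0 is x_1, index 1 is x_2.
d11 = second_partial(f, 0, 0, a)  # analytic value 6 * x1 * x2 = 12
d12 = second_partial(f, 0, 1, a)  # analytic value 3 * x1^2 = 3
d21 = second_partial(f, 1, 0, a)  # analytic value 3 * x1^2 = 3
d22 = second_partial(f, 1, 1, a)  # analytic value 2
print(d11, d12, d21, d22)
```

For this smooth polynomial the two mixed values agree up to discretisation error, which is what one expects under the regularity hypotheses discussed next.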
The order of indices matters in the notation, and isolated second partial derivatives do not by themselves provide a stable second-order calculus. To make coordinate formulas, symmetry, and Taylor remainders behave well on open sets, one imposes continuity of the total second derivative.
[definition: Twice Continuously Differentiable Map]
Let $U \subset \mathbb{R}^m$ be open and let $f: U \to \mathbb{R}^n$. The map $f$ is twice continuously differentiable on $U$, written $f \in C^2(U;\mathbb{R}^n)$, if $f$ is twice differentiable at every point of $U$ and the map $D^2f: U \to \mathcal{L}(\mathbb{R}^m, \mathcal{L}(\mathbb{R}^m, \mathbb{R}^n))$, $x \mapsto D^2f_x$, is continuous.
[/definition]
This continuity assumption is stronger than pointwise existence. It is the hypothesis that makes mixed partial derivatives interchangeable and makes second-order Taylor remainders stable near the point.
## Equivalent Characterisations
The most useful alternative view of the second derivative is through a second-order approximation. The first derivative gives an affine approximation; the second derivative gives the next correction term. This perspective explains why the second derivative is bilinear: a second-order correction is quadratic in the displacement. In the Taylor remainder below, $o(|h|^2)$ means a term whose norm divided by $|h|^2$ tends to $0$ as $h \to 0$; equivalently, for every $\varepsilon>0$ its norm is at most $\varepsilon|h|^2$ once $h$ is small enough.
[quotetheorem:8707]
This statement is the operational meaning of the second derivative: after subtracting the value and the linear part, the remaining leading term is quadratic. Some theorem cards use the notation $f''(a)$ for this second derivative at a point; for a multivariable map this means the same bilinear map as $D^2f_a$, not an ordinary quotient of one-variable functions. Computations usually require a coordinate formula for that bilinear map.
[quotetheorem:331]
The indexing reflects that $\partial_{x_j}f$ comes from the first derivative applied to $e_j$, and then $\partial_{x_i}$ differentiates that component of the derivative map. A major question is when the order of these two differentiations can be exchanged.
[quotetheorem:332]
Applications often probe a multivariable function by restricting it to a line through the point of interest. Along such a line, the two-direction bilinear object becomes an ordinary one-variable second derivative. This motivates the directional interpretation.
[quotetheorem:8708]
This result explains why the diagonal values $D^2f_a(v,v)$ control curvature along straight lines. The examples below show how the abstract definition becomes concrete in computation.
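The restriction to a line can be tested numerically: for $g(t)=f(a+tv)$, the value $g''(0)$ should agree with $D^2f_a(v,v)=v^\top Hf_a v$. The following sketch assumes a hypothetical test function with a hand-computed Hessian and approximates $g''(0)$ by a second central difference; none of the names below come from the quoted result.

```python
import math

def f(x):
    """Hypothetical test function f(x1, x2) = exp(x1) * cos(x2)."""
    return math.exp(x[0]) * math.cos(x[1])

def hessian_f(x):
    """Hand-computed Hessian of the test function above."""
    e, c, s = math.exp(x[0]), math.cos(x[1]), math.sin(x[1])
    return [[e * c, -e * s],
            [-e * s, -e * c]]

def line_second_derivative(f, a, v, eps=1e-4):
    """Second central difference of g(t) = f(a + t v) at t = 0."""
    def g(t):
        return f([ai + t * vi for ai, vi in zip(a, v)])
    return (g(eps) - 2 * g(0.0) + g(-eps)) / eps ** 2

a, v = [0.2, 0.7], [1.0, -2.0]
H = hessian_f(a)
# Diagonal value of the bilinear map: D^2 f_a(v, v) = v^T H v.
quadratic_value = sum(v[i] * H[i][j] * v[j] for i in range(2) for j in range(2))
numeric_value = line_second_derivative(f, a, v)
print(numeric_value, quadratic_value)
```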
## Examples
The one-dimensional case is the anchor for intuition. There, every linear map $\mathbb{R}\to\mathbb{R}$ is multiplication by a number, so the second derivative reduces to the familiar second derivative from calculus.
[example: One Variable Polynomial]
Let $f: \mathbb{R} \to \mathbb{R}$ be given by
\begin{align*}
f(x)=x^4-3x^2+2x.
\end{align*}
Differentiating term by term gives
\begin{align*}
f'(x)=4x^3-6x+2.
\end{align*}
Thus, for $a \in \mathbb{R}$, the linear map $Df_a:\mathbb{R}\to\mathbb{R}$ is multiplication by
\begin{align*}
f'(a)=4a^3-6a+2.
\end{align*}
Differentiating once more gives
\begin{align*}
f''(x)=12x^2-6,
\end{align*}
so $D^2f_a$ is multiplication by $12a^2-6$ after identifying bilinear maps on $\mathbb{R}$ with multiplication by a scalar.
At $a=1$,
\begin{align*}
f(1)=1^4-3\cdot 1^2+2\cdot 1=0,
\end{align*}
and
\begin{align*}
f'(1)=4\cdot 1^3-6\cdot 1+2=0.
\end{align*}
Also,
\begin{align*}
f''(1)=12\cdot 1^2-6=6.
\end{align*}
Expanding $f(1+h)$ gives
\begin{align*}
f(1+h)=(1+h)^4-3(1+h)^2+2(1+h).
\end{align*}
Using $(1+h)^2=1+2h+h^2$ and $(1+h)^4=1+4h+6h^2+4h^3+h^4$, this becomes
\begin{align*}
f(1+h)=1+4h+6h^2+4h^3+h^4-3-6h-3h^2+2+2h.
\end{align*}
Collecting constant, linear, and higher-order terms gives
\begin{align*}
f(1+h)=3h^2+4h^3+h^4.
\end{align*}
Since $4h^3+h^4=h^2(4h+h^2)$ and $4h+h^2\to 0$ as $h\to 0$, we have
\begin{align*}
4h^3+h^4=o(|h|^2).
\end{align*}
Therefore
\begin{align*}
f(1+h)=0+0h+3h^2+o(|h|^2).
\end{align*}
Here $3h^2=\tfrac12 f''(1)h^2$, as the second-order Taylor expansion predicts. Since $f(1)$ and $f'(1)$ both vanish, this quadratic term, which is determined by $D^2f_1$, is the leading term of $f$ near $1$.
[/example]
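The expansion in this example can be confirmed with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction` so that no rounding is involved; the helper name `remainder_ratio` is an illustrative choice. By the computation above, the remainder divided by $h^2$ should equal $4h+h^2$.

```python
from fractions import Fraction

def f(x):
    return x ** 4 - 3 * x ** 2 + 2 * x

def remainder_ratio(h):
    """(f(1+h) - quadratic model) / h^2, where the model is
    f(1) + f'(1) h + (1/2) f''(1) h^2 = 3 h^2 because f(1) = f'(1) = 0, f''(1) = 6."""
    quadratic_model = 3 * h ** 2
    return (f(1 + h) - quadratic_model) / h ** 2

# Exact ratios for h = 1/10, 1/100, 1/1000; each equals 4h + h^2.
ratios = [remainder_ratio(Fraction(1, 10 ** n)) for n in (1, 2, 3)]
for r in ratios:
    print(r, float(r))
```

The ratios shrink linearly with $h$, which is the concrete content of the $o(|h|^2)$ statement for this polynomial.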
The next example shows why the multivariable second derivative should be bilinear rather than merely a list of numbers. The second-order term must accept two directions.
[example: Quadratic Form in Two Variables]
Let $f: \mathbb{R}^2 \to \mathbb{R}$ be defined by
\begin{align*}
f(x_1,x_2)=3x_1^2+2x_1x_2-x_2^2.
\end{align*}
Fix $a=(a_1,a_2)$ and $h=(h_1,h_2)$. To compute the first derivative, expand $f(a+th)$:
\begin{align*}
f(a+th)=3(a_1+th_1)^2+2(a_1+th_1)(a_2+th_2)-(a_2+th_2)^2.
\end{align*}
Expanding each factor gives
\begin{align*}
f(a+th)=3a_1^2+6ta_1h_1+3t^2h_1^2+2a_1a_2+2ta_1h_2+2ta_2h_1+2t^2h_1h_2-a_2^2-2ta_2h_2-t^2h_2^2.
\end{align*}
The coefficient of $t$ is
\begin{align*}
6a_1h_1+2a_1h_2+2a_2h_1-2a_2h_2=(6a_1+2a_2)h_1+(2a_1-2a_2)h_2,
\end{align*}
so
\begin{align*}
Df_a(h)=(6a_1+2a_2)h_1+(2a_1-2a_2)h_2.
\end{align*}
Now let $k=(k_1,k_2)$. Applying the same formula at $a+tk=(a_1+tk_1,a_2+tk_2)$ gives
\begin{align*}
Df_{a+tk}(h)=\bigl(6(a_1+tk_1)+2(a_2+tk_2)\bigr)h_1+\bigl(2(a_1+tk_1)-2(a_2+tk_2)\bigr)h_2.
\end{align*}
Subtracting $Df_a(h)$ leaves
\begin{align*}
Df_{a+tk}(h)-Df_a(h)=t(6k_1+2k_2)h_1+t(2k_1-2k_2)h_2.
\end{align*}
Dividing by $t$ and taking $t \to 0$ gives
\begin{align*}
D^2f_a(k,h)=(6k_1+2k_2)h_1+(2k_1-2k_2)h_2.
\end{align*}
Expanding the products shows that this expression is unchanged when $h$ and $k$ are exchanged, so in the usual order
\begin{align*}
D^2f_a(h,k)=6h_1k_1+2h_1k_2+2h_2k_1-2h_2k_2.
\end{align*}
The formula contains no $a_1$ or $a_2$, so the second derivative is independent of the base point. In the standard basis, its Hessian entries are $(Hf_a)_{11}=6$, $(Hf_a)_{12}=2$, $(Hf_a)_{21}=2$, and $(Hf_a)_{22}=-2$, so the Hessian matrix is the coordinate storage of this bilinear map.
[/example]
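For a polynomial of degree at most two, the second difference $f(a+h+k)-f(a+h)-f(a+k)+f(a)$ equals $D^2f_a(h,k)$ exactly, because the constant and linear parts cancel and only the symmetric bilinear part survives. The sketch below uses this to confirm the formula and the Hessian entries of the example with integer arithmetic; the base point and directions are arbitrary illustrative choices.

```python
def f(x1, x2):
    return 3 * x1 ** 2 + 2 * x1 * x2 - x2 ** 2

def second_difference(a, h, k):
    """f(a+h+k) - f(a+h) - f(a+k) + f(a); exact D^2 f_a(h, k) for a quadratic f."""
    a1, a2 = a
    h1, h2 = h
    k1, k2 = k
    return (f(a1 + h1 + k1, a2 + h2 + k2) - f(a1 + h1, a2 + h2)
            - f(a1 + k1, a2 + k2) + f(a1, a2))

def bilinear_formula(h, k):
    """Closed form derived in the example."""
    h1, h2 = h
    k1, k2 = k
    return 6 * h1 * k1 + 2 * h1 * k2 + 2 * h2 * k1 - 2 * h2 * k2

# Hessian entries D^2 f_a(e_i, e_j) at an arbitrary base point a = (5, -7).
e = [(1, 0), (0, 1)]
hessian = [[second_difference((5, -7), e[i], e[j]) for j in range(2)]
           for i in range(2)]
print(hessian)  # prints [[6, 2], [2, -2]]
```

Changing the base point leaves the output unchanged, in line with the observation that the second derivative of a quadratic form does not depend on $a$.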
Vector-valued functions behave component by component, but the second derivative still has one bilinear input pair and a vector output. This is important in systems of equations and nonlinear maps between Euclidean spaces.
[example: Vector-Valued Second Derivative]
Let $F: \mathbb{R}^2 \to \mathbb{R}^2$ be given by
\begin{align*}
F(x_1,x_2)=(x_1x_2, e^{x_1}+x_2^3).
\end{align*}
Write $F=(F_1,F_2)$, where $F_1(x_1,x_2)=x_1x_2$ and $F_2(x_1,x_2)=e^{x_1}+x_2^3$. For $a=(a_1,a_2)$, the first partial derivatives are
\begin{align*}
\partial_{x_1}F_1(a)=a_2,\quad \partial_{x_2}F_1(a)=a_1,\quad \partial_{x_1}F_2(a)=e^{a_1},\quad \partial_{x_2}F_2(a)=3a_2^2.
\end{align*}
Thus, for $h=(h_1,h_2)$,
\begin{align*}
DF_a(h)=(a_2h_1+a_1h_2, e^{a_1}h_1+3a_2^2h_2).
\end{align*}
Now let $k=(k_1,k_2)$ and compute how $DF_x(h)$ changes when the base point moves from $a$ to $a+tk$. Since $a+tk=(a_1+tk_1,a_2+tk_2)$,
\begin{align*}
DF_{a+tk}(h)=((a_2+tk_2)h_1+(a_1+tk_1)h_2, e^{a_1+tk_1}h_1+3(a_2+tk_2)^2h_2).
\end{align*}
Subtracting $DF_a(h)$ gives
\begin{align*}
DF_{a+tk}(h)-DF_a(h)=(tk_2h_1+tk_1h_2, (e^{a_1+tk_1}-e^{a_1})h_1+3((a_2+tk_2)^2-a_2^2)h_2).
\end{align*}
For the square term,
\begin{align*}
(a_2+tk_2)^2-a_2^2=a_2^2+2ta_2k_2+t^2k_2^2-a_2^2=2ta_2k_2+t^2k_2^2.
\end{align*}
Dividing by $t$ gives
\begin{align*}
\frac{DF_{a+tk}(h)-DF_a(h)}{t}=(k_2h_1+k_1h_2, \frac{e^{a_1+tk_1}-e^{a_1}}{t}h_1+(6a_2k_2+3tk_2^2)h_2).
\end{align*}
Since $e^{a_1+tk_1}=e^{a_1}e^{tk_1}$ and $\lim_{t\to 0}(e^{tk_1}-1)/t=k_1$, we have
\begin{align*}
\lim_{t\to 0}\frac{e^{a_1+tk_1}-e^{a_1}}{t}=e^{a_1}k_1.
\end{align*}
Therefore
\begin{align*}
D^2F_a(k,h)=(k_2h_1+k_1h_2, e^{a_1}k_1h_1+6a_2k_2h_2).
\end{align*}
Renaming the two direction variables gives the same bilinear map in the usual order:
\begin{align*}
D^2F_a(h,k)=(h_1k_2+h_2k_1, e^{a_1}h_1k_1+6a_2h_2k_2).
\end{align*}
The first component records only the mixed interaction between the two coordinates, while the second component records pure $x_1$ and pure $x_2$ second-order contributions, so $D^2F_a$ naturally has vector output in $\mathbb{R}^2$.
[/example]
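The limit computed in this example can be approximated by differencing the closed form of $DF_x(h)$ along the direction $k$. The following sketch assumes a central difference with a hypothetical step `t`; the function names are illustrative, and the formulas for `DF` and `D2F_formula` are copied from the example.

```python
import math

def DF(a, h):
    """First derivative DF_a(h) from the example, returned as a pair."""
    a1, a2 = a
    h1, h2 = h
    return (a2 * h1 + a1 * h2, math.exp(a1) * h1 + 3 * a2 ** 2 * h2)

def D2F_formula(a, h, k):
    """Closed form of D^2 F_a(h, k) from the example."""
    a1, a2 = a
    h1, h2 = h
    k1, k2 = k
    return (h1 * k2 + h2 * k1, math.exp(a1) * h1 * k1 + 6 * a2 * h2 * k2)

def D2F_numeric(a, h, k, t=1e-5):
    """Central difference of x -> DF_x(h) at a in the direction k."""
    ap = (a[0] + t * k[0], a[1] + t * k[1])
    am = (a[0] - t * k[0], a[1] - t * k[1])
    plus, minus = DF(ap, h), DF(am, h)
    return tuple((p - m) / (2 * t) for p, m in zip(plus, minus))

a, h, k = (0.5, -1.0), (1.0, 2.0), (3.0, -1.0)
numeric = D2F_numeric(a, h, k)
exact = D2F_formula(a, h, k)
print(numeric, exact)
```

Both results are pairs, which reflects that $D^2F_a$ takes two directions and returns a vector in $\mathbb{R}^2$.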
Existence of some second partial derivatives is not the same thing as existence of the total second derivative. The next example marks a boundary that prevents coordinate computations from being mistaken for the full concept.
[example: Existing Second Partials Without a Total Second Derivative]
Define $f: \mathbb{R}^2 \to \mathbb{R}$ by $f(0,0)=0$ and, for $(x_1,x_2)\neq(0,0)$,
\begin{align*}
f(x_1,x_2)=\frac{x_1^2x_2^2}{x_1^2+x_2^2}.
\end{align*}
We first compute the relevant partial derivatives at the origin. Since $f(s,0)=0$ and $f(0,s)=0$ for every $s \in \mathbb{R}$, the first partial derivatives at $(0,0)$ are
\begin{align*}
\partial_{x_1}f(0,0)=\lim_{s\to 0}\frac{f(s,0)-f(0,0)}{s}=\lim_{s\to 0}\frac{0-0}{s}=0
\end{align*}
and
\begin{align*}
\partial_{x_2}f(0,0)=\lim_{s\to 0}\frac{f(0,s)-f(0,0)}{s}=\lim_{s\to 0}\frac{0-0}{s}=0.
\end{align*}
For the pure second partial in the $x_1$ direction, note that $\partial_{x_1}f(s,0)=0$ for all $s$: indeed $f(s+r,0)=0$ and $f(s,0)=0$, so the difference quotient is identically $0$. Therefore
\begin{align*}
\partial_{x_1}\partial_{x_1}f(0,0)=\lim_{s\to 0}\frac{\partial_{x_1}f(s,0)-\partial_{x_1}f(0,0)}{s}=\lim_{s\to 0}\frac{0-0}{s}=0.
\end{align*}
Similarly $\partial_{x_2}f(0,s)=0$ for all $s$, so
\begin{align*}
\partial_{x_2}\partial_{x_2}f(0,0)=\lim_{s\to 0}\frac{\partial_{x_2}f(0,s)-\partial_{x_2}f(0,0)}{s}=\lim_{s\to 0}\frac{0-0}{s}=0.
\end{align*}
For the mixed partial $\partial_{x_1}\partial_{x_2}f(0,0)$, compute $\partial_{x_2}f(s,0)$ first. If $s\neq 0$, then
\begin{align*}
\partial_{x_2}f(s,0)=\lim_{r\to 0}\frac{f(s,r)-f(s,0)}{r}=\lim_{r\to 0}\frac{s^2r^2}{s^2+r^2}\cdot \frac{1}{r}.
\end{align*}
The last expression is
\begin{align*}
\frac{s^2r}{s^2+r^2},
\end{align*}
which tends to $0$ as $r\to 0$. Hence $\partial_{x_2}f(s,0)=0$ for $s\neq 0$, and also $\partial_{x_2}f(0,0)=0$. Thus
\begin{align*}
\partial_{x_1}\partial_{x_2}f(0,0)=\lim_{s\to 0}\frac{\partial_{x_2}f(s,0)-\partial_{x_2}f(0,0)}{s}=\lim_{s\to 0}\frac{0-0}{s}=0.
\end{align*}
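Because $f$ is symmetric in $x_1$ and $x_2$, the same computation gives $\partial_{x_2}\partial_{x_1}f(0,0)=0$. If $f$ had a total second derivative at the origin, its Hessian would consist of these four second partials, so $D^2f_0$ would vanish; since $f(0,0)=0$ and both first partials vanish at the origin, the second-order expansion would then force $f(h)=o(|h|^2)$. But along the diagonal $h=(t,t)$,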
\begin{align*}
f(t,t)=\frac{t^2t^2}{t^2+t^2}.
\end{align*}
For $t\neq 0$, this is
\begin{align*}
f(t,t)=\frac{t^4}{2t^2}=\frac{t^2}{2}.
\end{align*}
Also $|(t,t)|^2=t^2+t^2=2t^2$, so
\begin{align*}
\frac{f(t,t)}{|(t,t)|^2}=\frac{t^2/2}{2t^2}=\frac{1}{4}
\end{align*}
for every $t\neq 0$. Therefore $f(t,t)$ is not $o(|(t,t)|^2)$ as $t\to 0$. The coordinate second partials listed above all vanish, but the function still has a nonzero second-order contribution along the diagonal, so those partials do not form a total second derivative at the origin.
[/example]
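The obstruction in this example is easy to observe numerically. The sketch below evaluates $f(x)/|x|^2$ along the diagonal and along the two axes; the helper name `quadratic_ratio` and the sample values of `t` are illustrative choices.

```python
def f(x1, x2):
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return (x1 ** 2 * x2 ** 2) / (x1 ** 2 + x2 ** 2)

def quadratic_ratio(x1, x2):
    """f(x) / |x|^2 for x != 0; this must tend to 0 if f is o(|x|^2)."""
    return f(x1, x2) / (x1 ** 2 + x2 ** 2)

# Columns: t, ratio on the diagonal, ratio on the x1-axis, ratio on the x2-axis.
for t in (1e-1, 1e-3, 1e-6):
    print(t, quadratic_ratio(t, t), quadratic_ratio(t, 0.0), quadratic_ratio(0.0, t))
```

The axis columns are zero, matching the vanishing second partials, while the diagonal column stays near $1/4$ however small $t$ becomes.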
This failure is the main warning for multivariable calculus: coordinate partials are useful evidence, but the total derivative is the invariant object. Regularity hypotheses such as $C^2$ make the coordinate and total viewpoints agree.
## Properties
Second derivatives inherit linearity from the derivative. This matters because most functions encountered in analysis are built by addition and scalar multiplication before more nonlinear operations enter. A precise linearity rule lets second-order calculations be assembled term by term from simpler pieces.
[quotetheorem:8709]
This property lets second-order calculations be built term by term. Nonlinear composition requires a richer formula, because both the outer and inner maps can contribute second-order terms. Without such a rule, second derivatives would be hard to use in changes of variables, parametrisations, and nonlinear systems.
[quotetheorem:8710]
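Assuming the quoted rule is the standard second-order chain rule, its one-variable instance reads $(g\circ f)''(x)=g''(f(x))\,f'(x)^2+g'(f(x))\,f''(x)$, with one term from the curvature of the outer map and one from the curvature of the inner map. The sketch below checks this for a hypothetical pair $f(x)=x^2$, $g(y)=\sin y$ against a second central difference; the functions and step size are assumptions made for illustration.

```python
import math

# Hypothetical choice: inner map f(x) = x^2, outer map g(y) = sin(y).
def composite(x):
    return math.sin(x ** 2)

def chain_rule_value(x):
    """g''(f(x)) * f'(x)^2 + g'(f(x)) * f''(x) for f(x) = x^2 and g = sin."""
    fx, dfx, d2fx = x ** 2, 2 * x, 2.0
    outer_curvature = -math.sin(fx) * dfx ** 2   # g'' term
    inner_curvature = math.cos(fx) * d2fx        # f'' term
    return outer_curvature + inner_curvature

def second_difference(fn, x, eps=1e-4):
    """Second central difference approximating fn''(x)."""
    return (fn(x + eps) - 2 * fn(x) + fn(x - eps)) / eps ** 2

x0 = 0.8
print(second_difference(composite, x0), chain_rule_value(x0))
```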
The chain rule explains how second derivatives behave under composition; local optimization asks what information the second derivative gives by itself. At a critical point of a scalar function, the first-order term vanishes, so the sign of the quadratic term becomes the first available local test for whether nearby values rise or fall. The next theorem isolates the nondegenerate case where that quadratic information is strong enough to force an extremum.
[quotetheorem:8607]
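For a function of two variables, definiteness of the Hessian can be read off from its leading principal minors (Sylvester's criterion). The following sketch classifies a critical point from a symmetric $2\times 2$ Hessian; the function name and the returned labels are illustrative, and the saddle case is the standard complement to the definite cases rather than something taken from the quoted statement.

```python
def classify_critical_point(hessian):
    """Classify a critical point from a symmetric 2x2 Hessian [[p, q], [q, r]]."""
    (p, q), (_, r) = hessian
    det = p * r - q * q
    if det > 0:
        # Definite form: det > 0 forces p != 0, and the sign of p decides the type.
        return "strict local minimum" if p > 0 else "strict local maximum"
    if det < 0:
        # Eigenvalues of opposite sign: the quadratic form is indefinite.
        return "saddle (no local extremum)"
    # det == 0: the form is degenerate and second-order information is not enough.
    return "inconclusive (degenerate Hessian)"

# Hessian of f(x1, x2) = 3 x1^2 + 2 x1 x2 - x2^2, whose only critical point is the origin.
print(classify_critical_point([[6, 2], [2, -2]]))
print(classify_critical_point([[2, 0], [0, 2]]))
print(classify_critical_point([[0, 0], [0, 1]]))
```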
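The test is inconclusive when the quadratic form is semidefinite but not definite; an indefinite form, by contrast, rules out a local extremum, because the function then increases in some directions and decreases in others. A global shape condition, convexity, is also governed by the same quadratic form when the domain is convex.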
[quotetheorem:2545]
This theorem explains why the Hessian appears throughout optimization. Convexity uses all diagonal values $D^2f_x(v,v)$ of the second derivative, while PDE theory often combines only the pure coordinate terms through a trace. That trace operation motivates the Laplacian.
[definition: Laplacian of a Twice Differentiable Function]
Let $U \subset \mathbb{R}^m$ be open and let $f \in C^2(U;\mathbb{R})$. The Laplacian of $f$ is the function $\Delta f: U \to \mathbb{R}$ defined by
\begin{align*}
\Delta f(x)=\sum_{i=1}^m \partial_{x_i}\partial_{x_i}f(x).
\end{align*}
[/definition]
The Laplacian discards mixed terms and keeps the trace of the Hessian. This makes it central in potential theory, heat flow, wave equations, and elliptic PDE.
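A finite-difference version of this definition sums the second central differences along each coordinate axis. The sketch below is a minimal illustration under that assumption; the three test functions and the step `eps` are hypothetical choices.

```python
def laplacian(f, x, eps=1e-3):
    """Sum over i of the second central difference of f in the x_i direction."""
    total = 0.0
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        total += (f(xp) - 2 * f(x) + f(xm)) / eps ** 2
    return total

def saddle(x):
    return x[0] ** 2 - x[1] ** 2      # Hessian diag(2, -2), trace 0

def paraboloid(x):
    return x[0] ** 2 + x[1] ** 2      # Hessian diag(2, 2), trace 4

def mixed(x):
    return x[0] * x[1]                # only mixed second partials are nonzero

point = [0.4, -1.3]
print(laplacian(saddle, point), laplacian(paraboloid, point), laplacian(mixed, point))
```

The function `mixed` has a nonzero Hessian but a vanishing Laplacian, which illustrates that only the pure second partials enter the trace.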
## Relationship to Other Concepts
The second derivative is best understood as part of a hierarchy. The [derivative](/page/Derivative) gives the first-order approximation, the second derivative gives the quadratic correction, and higher derivatives continue by differentiating $x \mapsto D^kf_x$ when that map is defined.
Its coordinate representatives are the [partial derivatives](/page/Partial%20Derivative) of order two. Those representatives are convenient for calculation, but the total second derivative is the coordinate-free Euclidean object. The difference matters whenever partial derivatives exist without enough regularity to produce a valid quadratic approximation.
For scalar-valued maps, the Hessian matrix packages the second derivative into a matrix. This matrix controls local extrema, convexity, and the quadratic term in Taylor's theorem. For vector-valued maps, each component has its own Hessian, while $D^2f_a$ is a single bilinear map with vector output.
In analysis and PDE, second derivatives are the basic ingredients of second-order operators such as the Laplacian and general elliptic operators. In the weak theory, classical second derivatives are replaced or extended by [weak derivatives](/page/Weak%20Derivative), leading to [Sobolev spaces](/page/Sobolev%20Space) such as $W^{2,p}(U)$.
In geometry, second derivatives are the source of curvature phenomena. Even when the final geometric objects are defined intrinsically, their coordinate formulas often begin with second derivatives of coordinate functions, metrics, or parametrisations.
[remark: Notation for One Variable]
For a function $f: U \subset \mathbb{R} \to \mathbb{R}$, the notations $D^2f_a$ and $f''(a)$ describe the same second-order information after identifying linear and bilinear maps on $\mathbb{R}$ with multiplication by [real numbers](/page/Real%20Numbers).
[/remark]
[remark: Matrix Representation]
For $f: U \subset \mathbb{R}^m \to \mathbb{R}$, the Hessian matrix $Hf_a$ depends on the chosen basis, while the bilinear map $D^2f_a$ is the underlying object. In standard Euclidean coordinates the two are related by
\begin{align*}
D^2f_a(h,k)=h^\top Hf_a k.
\end{align*}
[/remark]
## References
[Derivative](/page/Derivative).
[Partial Derivative](/page/Partial%20Derivative).
[Taylor's Theorem](/theorems/827).
Spivak, *Calculus on Manifolds* (1965).
Apostol, *Mathematical Analysis* (1974).
Munkres, *Analysis on Manifolds* (1991).
Second Derivative
Also known as: second derivative, second order derivative, second Frechet derivative, Hessian derivative