# Iterated Directional Derivative

Also known as: Higher directional derivative, Repeated directional derivative, Second directional derivative, Iterated Gateaux derivative, Directional derivative iteration


Iterated directional derivatives measure repeated change along prescribed directions. A first [directional derivative](/page/Directional%20Derivative) asks how a function changes when its input is moved infinitesimally along one vector. An iterated directional derivative asks the next question: after measuring change in direction $v$, how does that new scalar or vector-valued function change in direction $u$? This concept is a local child of the broader [derivative](/page/Derivative): it keeps the directional viewpoint while moving toward second-order information such as the Hessian matrix, Taylor expansions, and second-order tests in multivariable calculus.

The order of the directions matters at the level of definition. Writing $D_uD_v f(a)$ means first form the directional derivative $D_v f$ near $a$, then differentiate that resulting function in direction $u$ at $a$. Under stronger smoothness hypotheses the two orders agree, but without those hypotheses the two iterated derivatives can differ or one of them can fail to exist. This is why the definition records both directions and the neighbourhood on which the first directional derivative is available.

## Definition

A single directional derivative at a point uses only values of $f$ on one line through that point. To form an iterated directional derivative, the first directional derivative $D_vf$ must be defined near the point so that it becomes a genuine nearby function before the second directional derivative is applied. Requiring $f$ to be differentiable on a neighbourhood is the clean standard hypothesis that guarantees this.

[definition: Iterated Directional Derivative]
Let $U \subset \mathbb{R}^m$ be an [open set](/page/Open%20Set), let $a \in U$, let $u, v \in \mathbb{R}^m$, and let $f: U \to \mathbb{R}^n$ be a function. Suppose that $f$ is differentiable on an open neighbourhood $W \subset U$ of $a$, and write $D_vf(x):=Df_x(v)$ for $x \in W$.
The iterated directional derivative of $f$ at $a$, first in direction $v$ and then in direction $u$, is
\begin{align*} D_uD_v f(a) := D_u(D_v f)(a), \end{align*}
provided the directional derivative on the right exists.
[/definition]

The neighbourhood hypothesis is not decorative. It ensures that $x \mapsto D_vf(x)$ is available near $a$, so the second directional derivative is an ordinary directional derivative of that nearby function.

A first computation keeps the notation anchored. Linear functions have no second-order change, so they provide the baseline case against which curvature is measured.

[example: Linear Function]
Let $T: \mathbb{R}^m \to \mathbb{R}^n$ be linear, let $b \in \mathbb{R}^n$, and define $f(x)=T(x)+b$. For $x,v \in \mathbb{R}^m$, the directional derivative in direction $v$ is computed from the defining difference quotient:
\begin{align*} \frac{f(x+tv)-f(x)}{t}=\frac{T(x+tv)+b-(T(x)+b)}{t}. \end{align*}
By linearity of $T$, $T(x+tv)=T(x)+tT(v)$, so
\begin{align*} \frac{T(x+tv)+b-(T(x)+b)}{t}=\frac{tT(v)}{t}=T(v) \end{align*}
for every $t \ne 0$. Taking the limit as $t \to 0$ gives
\begin{align*} D_vf(x)=T(v). \end{align*}
Thus the function $x \mapsto D_vf(x)$ is the constant function with value $T(v)$. For $a,u \in \mathbb{R}^m$, its directional derivative in direction $u$ is therefore
\begin{align*} D_uD_vf(a)=\lim_{s \to 0}\frac{D_vf(a+su)-D_vf(a)}{s}. \end{align*}
Since $D_vf(a+su)=T(v)$ and $D_vf(a)=T(v)$, the quotient is
\begin{align*} \frac{T(v)-T(v)}{s}=0 \end{align*}
for every $s \ne 0$, so
\begin{align*} D_uD_vf(a)=0. \end{align*}
This example shows that affine linear functions have first-order slope but no second-order directional variation.
[/example]

## Related Definitions

Many second-order questions move only along one line: whether a curve bends upward, whether a critical point is stable in a chosen direction, or what the quadratic term of a one-variable slice should be.
This motivates the special case where the two directions coincide.

[definition: Second Directional Derivative]
Let $U \subset \mathbb{R}^m$ be open, let $a \in U$, let $v \in \mathbb{R}^m$, and let $f: U \to \mathbb{R}^n$ be a function. If $f$ is differentiable on an open neighbourhood $W \subset U$ of $a$ and $D_v(D_v f)(a)$ exists, then the second directional derivative of $f$ at $a$ in direction $v$ is
\begin{align*} D_v^2 f(a) := D_vD_v f(a). \end{align*}
[/definition]

This is not a [second derivative](/page/Second%20Derivative) with respect to a coordinate unless $v$ is a coordinate vector. It measures curvature along the affine line $a + tv$, and the length of $v$ affects the scale: replacing $v$ by $\lambda v$ scales $D_v^2 f(a)$ by $\lambda^2$ when the second derivative exists in the bilinear sense.

The definition of an iterated directional derivative allows ordered differentiation even when no full second derivative exists. To connect this ordered operation with the usual second-order calculus, we need a regularity condition saying that the total derivative itself varies differentiably with the base point. That condition is twice Frechet differentiability at a point.

[definition: Twice Frechet Differentiable at a Point]
Let $U \subset \mathbb{R}^m$ be open, let $a \in U$, and let $f: U \to \mathbb{R}^n$ be differentiable on an open neighbourhood $W \subset U$ of $a$. The function $f$ is twice Frechet differentiable at $a$ if the map
\begin{align*} W \to \mathcal{L}(\mathbb{R}^m, \mathbb{R}^n), \qquad x \mapsto Df_x \end{align*}
is differentiable at $a$.
[/definition]

Here $Df_x$ denotes the total derivative at $x$, a [linear map](/page/Linear%20Map), not its matrix representation. In this notation, $\mathcal{L}(\mathbb{R}^m,\mathbb{R}^n)$ means the space of linear maps from $\mathbb{R}^m$ to $\mathbb{R}^n$. For scalar-valued functions, the matrix representation of the second derivative is the Hessian matrix once coordinates are chosen.
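
The scaling remark in this section, that replacing $v$ by $\lambda v$ multiplies $D_v^2f(a)$ by $\lambda^2$, can be checked numerically. The following Python sketch is illustrative only: the test function `f`, the point `a`, the step size `h`, and the helper names `directional_derivative` and `iterated_directional_derivative` are assumptions introduced here, and nested central differences merely approximate the limits in the definitions.

```python
import numpy as np


def directional_derivative(f, x, v, h=1e-4):
    """Central-difference estimate of D_v f(x)."""
    return (f(x + h * v) - f(x - h * v)) / (2.0 * h)


def iterated_directional_derivative(f, a, u, v, h=1e-4):
    """Estimate D_u D_v f(a): first form x -> D_v f(x), then difference it along u."""
    first = lambda x: directional_derivative(f, x, v, h)
    return directional_derivative(first, a, u, h)


# Hypothetical smooth test function on R^2 (an assumption, not taken from the article).
def f(x):
    return np.exp(x[0]) * np.sin(x[1]) + x[0] ** 2 * x[1]


a = np.array([0.3, -0.7])
v = np.array([1.0, 2.0])
lam = 3.0

# Second directional derivative in direction v, and in the rescaled direction lam * v.
second = iterated_directional_derivative(f, a, v, v)
scaled = iterated_directional_derivative(f, a, lam * v, lam * v)

print("D_v^2 f(a)           ~", second)
print("D_{lam v}^2 f(a)     ~", scaled)
print("lam^2 * D_v^2 f(a)   ~", lam ** 2 * second)
```

For a smooth function such as this one, the last two printed values should agree up to finite-difference error. The step size is a compromise: a smaller `h` reduces truncation error but amplifies rounding error in the nested quotient.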
## Equivalent Characterisations

The definition by repeated directional differentiation is economical, but it hides the two-variable nature of second-order change. If $f$ has a genuine second derivative at $a$, then two directions enter through a bilinear map. We write $\mathrm{Bil}(\mathbb{R}^m \times \mathbb{R}^m, \mathbb{R}^n)$ for the space of bilinear maps that take an ordered pair of directions in $\mathbb{R}^m$ and return a vector in $\mathbb{R}^n$. The next formal statement identifies the second derivative with such a bilinear object, which is the structure needed to compare the two orders of directional differentiation.

[quotetheorem:9039]

This theorem is the bridge from the ordered definition to the symmetric bilinear object used in second-order calculus.

For scalar-valued functions, computation usually happens in coordinates. The following formula turns the bilinear second derivative into an expression involving the gradient and the Hessian matrix, making directional notation compatible with standard multivariable calculus.

[quotetheorem:331]

The coordinate expression involves mixed second partial derivatives, so the next natural question is when the two orders of differentiation give the same answer. The standard sufficient condition is continuous second differentiability.

[quotetheorem:332]

This theorem is a directional version of equality of mixed partial derivatives. The conclusion can fail when the relevant second partial derivatives exist but are not continuous near the point.

## Examples

The fastest way to recognise the definition is to compute it for a quadratic polynomial. Quadratics have constant second-order behaviour, so they show the bilinear structure without analytic complications.

[example: Quadratic Function]
Let $A \in \mathbb{R}^{m \times m}$ be symmetric and define $f: \mathbb{R}^m \to \mathbb{R}$ by $f(x)=x^\top A x$.
For $x,v \in \mathbb{R}^m$ and $t \ne 0$, the defining quotient for the directional derivative is
\begin{align*} \frac{f(x+tv)-f(x)}{t}=\frac{(x+tv)^\top A(x+tv)-x^\top Ax}{t}. \end{align*}
Expanding the product gives
\begin{align*} (x+tv)^\top A(x+tv)=x^\top Ax+t v^\top Ax+t x^\top Av+t^2v^\top Av. \end{align*}
Since $x^\top Av$ is a scalar and $A^\top=A$, we have
\begin{align*} x^\top Av=(x^\top Av)^\top=v^\top A^\top x=v^\top Ax. \end{align*}
Therefore
\begin{align*} (x+tv)^\top A(x+tv)-x^\top Ax=2t v^\top Ax+t^2v^\top Av. \end{align*}
Dividing by $t$ gives
\begin{align*} \frac{f(x+tv)-f(x)}{t}=2v^\top Ax+t v^\top Av. \end{align*}
Taking $t \to 0$ yields
\begin{align*} D_vf(x)=2v^\top Ax. \end{align*}
Now fix $a,u \in \mathbb{R}^m$ and differentiate the function $x \mapsto D_vf(x)=2v^\top Ax$ in direction $u$. For $s \ne 0$,
\begin{align*} \frac{D_vf(a+su)-D_vf(a)}{s}=\frac{2v^\top A(a+su)-2v^\top Aa}{s}. \end{align*}
Using linearity of matrix multiplication in the vector argument,
\begin{align*} 2v^\top A(a+su)-2v^\top Aa=2v^\top Aa+2s v^\top Au-2v^\top Aa=2s v^\top Au. \end{align*}
Thus the quotient equals
\begin{align*} \frac{2s v^\top Au}{s}=2v^\top Au. \end{align*}
Taking $s \to 0$ gives
\begin{align*} D_uD_vf(a)=2v^\top Au. \end{align*}
The result is independent of the base point $a$, while its dependence on $u$ and $v$ is bilinear.
[/example]

The next example shows how the same definition recovers familiar mixed partial derivatives when the directions are coordinate vectors. It also clarifies why coordinate formulas are special cases, not replacements for the directional definition.

[example: Coordinate Directions]
Let $U \subset \mathbb{R}^2$ be open, let $f \in C^2(U)$ with $f: U \to \mathbb{R}$, and let $a=(a_1,a_2) \in U$.
With $e_1=(1,0)$ and $e_2=(0,1)$, the first directional derivative in direction $e_2$ is the $x_2$-partial derivative, because for $x=(x_1,x_2) \in U$,
\begin{align*} D_{e_2}f(x)=\lim_{t \to 0}\frac{f(x+t e_2)-f(x)}{t}=\lim_{t \to 0}\frac{f(x_1,x_2+t)-f(x_1,x_2)}{t}=\partial_{x_2}f(x_1,x_2). \end{align*}
Therefore the iterated derivative in direction $e_1$ is
\begin{align*} D_{e_1}D_{e_2}f(a)=\lim_{s \to 0}\frac{\partial_{x_2}f(a_1+s,a_2)-\partial_{x_2}f(a_1,a_2)}{s}=\partial_{x_1}\partial_{x_2}f(a). \end{align*}
Similarly,
\begin{align*} D_{e_2}D_{e_1}f(a)=\partial_{x_2}\partial_{x_1}f(a). \end{align*}
For instance, let $f(x_1,x_2)=x_1^2x_2+\sin(x_1x_2)$. Holding $x_1$ fixed and differentiating with respect to $x_2$ gives
\begin{align*} \partial_{x_2}(x_1^2x_2)=x_1^2. \end{align*}
By the chain rule,
\begin{align*} \partial_{x_2}\sin(x_1x_2)=\cos(x_1x_2)\partial_{x_2}(x_1x_2)=x_1\cos(x_1x_2). \end{align*}
Thus
\begin{align*} \partial_{x_2}f(x_1,x_2)=x_1^2+x_1\cos(x_1x_2). \end{align*}
Now differentiate this expression with respect to $x_1$. The first term gives
\begin{align*} \partial_{x_1}(x_1^2)=2x_1. \end{align*}
For the second term, the product rule and chain rule give
\begin{align*} \partial_{x_1}\bigl(x_1\cos(x_1x_2)\bigr)=\cos(x_1x_2)+x_1\bigl(-\sin(x_1x_2)\bigr)\partial_{x_1}(x_1x_2). \end{align*}
Since $\partial_{x_1}(x_1x_2)=x_2$, this becomes
\begin{align*} \partial_{x_1}\bigl(x_1\cos(x_1x_2)\bigr)=\cos(x_1x_2)-x_1x_2\sin(x_1x_2). \end{align*}
Hence
\begin{align*} \partial_{x_1}\partial_{x_2}f(x_1,x_2)=2x_1+\cos(x_1x_2)-x_1x_2\sin(x_1x_2). \end{align*}
Evaluating at $a=(a_1,a_2)$ gives
\begin{align*} D_{e_1}D_{e_2}f(a)=2a_1+\cos(a_1a_2)-a_1a_2\sin(a_1a_2). \end{align*}
This shows that coordinate directions recover the usual mixed partial derivatives as special cases of iterated directional derivatives.
[/example]

A boundary case is just as informative: when the smoothness hypotheses fail, both iterated directional derivatives can exist at a point and still take different values.
The definition was written with an explicit order for this reason.

[example: Order Can Matter Without Smoothness]
Define $f: \mathbb{R}^2 \to \mathbb{R}$ by
\begin{align*} f(x_1,x_2) = \begin{cases} \dfrac{x_1x_2(x_1^2-x_2^2)}{x_1^2+x_2^2}, & (x_1,x_2) \ne (0,0), \\ 0, & (x_1,x_2) = (0,0). \end{cases} \end{align*}
We compute the two coordinate iterates at $a=(0,0)$, using $e_1=(1,0)$ and $e_2=(0,1)$.

First fix $h \in \mathbb{R}$ and compute $D_{e_2}f(h,0)$. Since $f(h,0)=0$, for $t \ne 0$ we have
\begin{align*} \frac{f(h,t)-f(h,0)}{t}=\frac{f(h,t)}{t}. \end{align*}
If $h \ne 0$, then
\begin{align*} \frac{f(h,t)}{t}=\frac{1}{t}\cdot \frac{ht(h^2-t^2)}{h^2+t^2}=\frac{h(h^2-t^2)}{h^2+t^2}. \end{align*}
Taking $t \to 0$ gives
\begin{align*} D_{e_2}f(h,0)=\frac{h \cdot h^2}{h^2}=h. \end{align*}
If $h=0$, then $f(0,t)=0$ for every $t$, so
\begin{align*} D_{e_2}f(0,0)=\lim_{t \to 0}\frac{0}{t}=0. \end{align*}
Thus, on the $x_1$-axis,
\begin{align*} D_{e_2}f(h,0)=h \end{align*}
for every $h \in \mathbb{R}$. Therefore
\begin{align*} D_{e_1}D_{e_2}f(0,0)=\lim_{s \to 0}\frac{D_{e_2}f(s,0)-D_{e_2}f(0,0)}{s}. \end{align*}
Using $D_{e_2}f(s,0)=s$ and $D_{e_2}f(0,0)=0$, this becomes
\begin{align*} D_{e_1}D_{e_2}f(0,0)=\lim_{s \to 0}\frac{s}{s}=1. \end{align*}

For the reversed order, fix $k \in \mathbb{R}$ and compute $D_{e_1}f(0,k)$. Since $f(0,k)=0$, for $t \ne 0$ we have
\begin{align*} \frac{f(t,k)-f(0,k)}{t}=\frac{f(t,k)}{t}. \end{align*}
If $k \ne 0$, then
\begin{align*} \frac{f(t,k)}{t}=\frac{1}{t}\cdot \frac{tk(t^2-k^2)}{t^2+k^2}=\frac{k(t^2-k^2)}{t^2+k^2}. \end{align*}
Taking $t \to 0$ gives
\begin{align*} D_{e_1}f(0,k)=\frac{k(0-k^2)}{k^2}=-k. \end{align*}
If $k=0$, then $f(t,0)=0$ for every $t$, so
\begin{align*} D_{e_1}f(0,0)=\lim_{t \to 0}\frac{0}{t}=0. \end{align*}
Thus, on the $x_2$-axis,
\begin{align*} D_{e_1}f(0,k)=-k \end{align*}
for every $k \in \mathbb{R}$. Hence
\begin{align*} D_{e_2}D_{e_1}f(0,0)=\lim_{s \to 0}\frac{D_{e_1}f(0,s)-D_{e_1}f(0,0)}{s}.
\end{align*}
Using $D_{e_1}f(0,s)=-s$ and $D_{e_1}f(0,0)=0$, this becomes
\begin{align*} D_{e_2}D_{e_1}f(0,0)=\lim_{s \to 0}\frac{-s}{s}=-1. \end{align*}

The function is still differentiable at the origin. Indeed, for $(x_1,x_2) \ne (0,0)$,
\begin{align*} |f(x_1,x_2)|=\frac{|x_1x_2|\,|x_1^2-x_2^2|}{x_1^2+x_2^2}. \end{align*}
Since $2|x_1x_2| \le x_1^2+x_2^2$ and $|x_1^2-x_2^2| \le x_1^2+x_2^2$, we get
\begin{align*} |f(x_1,x_2)| \le \frac{1}{2}(x_1^2+x_2^2). \end{align*}
Writing $\|(x_1,x_2)\|=(x_1^2+x_2^2)^{1/2}$, this gives
\begin{align*} \frac{|f(x_1,x_2)-0|}{\|(x_1,x_2)\|} \le \frac{1}{2}\|(x_1,x_2)\|. \end{align*}
The right-hand side tends to $0$ as $(x_1,x_2) \to (0,0)$, so $Df_{(0,0)}=0$. Thus differentiability at the point does not force the ordered second directional derivatives to agree; the two orders give $1$ and $-1$ because the second-order behaviour is not controlled by $C^2$ regularity near the origin.
[/example]

The line-restriction viewpoint is useful for pure second derivatives in a single direction. It also prevents a common mistake: $D_v^2f(a)$ is not obtained by differentiating a one-variable function twice unless the first derivative is interpreted with the same scaling by $v$.

[example: Second Derivative Along a Line]
Let $U \subset \mathbb{R}^m$ be open, let $a \in U$, let $v \in \mathbb{R}^m$, and suppose $a+tv \in U$ for all sufficiently small $t$. Let $f \in C^2(U)$ with $f: U \to \mathbb{R}$, and define $g(t)=f(a+tv)$ on a small interval around $0$. For any such $t$ and for $h \ne 0$ small enough,
\begin{align*} \frac{g(t+h)-g(t)}{h}=\frac{f(a+(t+h)v)-f(a+tv)}{h}. \end{align*}
Since $a+(t+h)v=(a+tv)+hv$, this becomes
\begin{align*} \frac{g(t+h)-g(t)}{h}=\frac{f((a+tv)+hv)-f(a+tv)}{h}. \end{align*}
Taking $h \to 0$ gives
\begin{align*} g'(t)=D_vf(a+tv). \end{align*}
Therefore
\begin{align*} g''(0)=\lim_{s \to 0}\frac{g'(s)-g'(0)}{s}.
\end{align*}
Substituting $g'(s)=D_vf(a+sv)$ and $g'(0)=D_vf(a)$ gives
\begin{align*} g''(0)=\lim_{s \to 0}\frac{D_vf(a+sv)-D_vf(a)}{s}. \end{align*}
By the definition of the second directional derivative, the right-hand side is
\begin{align*} D_vD_vf(a)=D_v^2f(a). \end{align*}
Thus
\begin{align*} g''(0)=D_v^2f(a). \end{align*}

For $f(x)=|x|^2$ on $\mathbb{R}^m$, the line restriction is
\begin{align*} g(t)=|a+tv|^2=(a+tv)\cdot(a+tv). \end{align*}
Expanding the dot product gives
\begin{align*} g(t)=a\cdot a+t\,a\cdot v+t\,v\cdot a+t^2v\cdot v. \end{align*}
Since $a\cdot v=v\cdot a$, this is
\begin{align*} g(t)=|a|^2+2t\,a\cdot v+t^2|v|^2. \end{align*}
Differentiating once gives
\begin{align*} g'(t)=2a\cdot v+2t|v|^2. \end{align*}
Differentiating again gives
\begin{align*} g''(t)=2|v|^2. \end{align*}
Hence
\begin{align*} D_v^2f(a)=g''(0)=2|v|^2. \end{align*}
The factor $|v|^2$ records that the second derivative along the line is scaled by the speed of the parametrisation $t \mapsto a+tv$.
[/example]

## Properties

The definition is local: changing the function away from the point and a small neighbourhood around it does not change the iterated directional derivative. This locality is inherited from the ordinary directional derivative.

[quotetheorem:9040]

Locality lets one compute iterated directional derivatives after restricting to a convenient neighbourhood. The next algebraic feature is linearity in the function being differentiated.

[quotetheorem:9041]

Linearity in the function does not by itself control how the two directions enter the expression $D_uD_vf(a)$. For an arbitrary function, the first directional derivative in direction $v$ may fail to vary smoothly enough near $a$, so differentiating it in the direction $u$ need not produce a map that is additive or homogeneous in either direction.
The missing ingredient is genuine second-order differentiability near the point; under that regularity, the iterated directional derivative becomes a bilinear function of $(u,v)$.

[quotetheorem:330]

This theorem is the conceptual reason second-order Taylor formulas contain a quadratic term. The next statement records the one-directional form, where that quadratic term is governed by $D_v^2f(a)$.

[quotetheorem:9042]

This is the form most directly used in second-variation arguments, optimization, and local classification of critical points. In several variables, it is often combined with the Hessian formula to express the quadratic term as a matrix expression.

## Relationship to Other Concepts

Iterated directional derivatives sit between first-order differential calculus and full second-order differential calculus. They refine the [directional derivative](/page/Directional%20Derivative) by applying it twice, and they recover mixed [partial derivatives](/page/Partial%20Derivative) when the directions are coordinate vectors.

They also provide a coordinate-light route to the Hessian matrix. For $f: U \to \mathbb{R}$ of class $C^2$, the Hessian matrix records the bilinear map $(u,v) \mapsto D_uD_vf(a)$ in the standard basis. Thus the Hessian is a representation of second-order directional data, not a separate phenomenon.

The concept is closely tied to [Taylor's theorem](/theorems/827). Along a line $a+tv$, the number $D_v^2f(a)$ determines the quadratic Taylor coefficient $\tfrac{1}{2}D_v^2f(a)$, which controls how far the graph bends away from its tangent approximation. In optimization, the sign of $D_v^2f(a)$ for different directions $v$ helps distinguish minima, maxima, and saddle points.

For weaker function classes, iterated directional derivatives should be compared with the [weak derivative](/page/Weak%20Derivative) and distributional notions of differentiation.
A weak second derivative may exist even when classical iterated directional derivatives do not exist pointwise, while classical iterated directional derivatives may exist at isolated points without giving useful global regularity.

[remark: Order and Regularity]
The notation $D_uD_vf(a)$ encodes an ordered operation. Symmetry is a [regularity theorem](/theorems/2750), usually obtained from continuity of second partial derivatives or from twice Frechet differentiability. Treating the order as irrelevant without checking hypotheses can lead to wrong computations.
[/remark]

[remark: Vector-Valued Functions]
For $f: U \to \mathbb{R}^n$, each $D_uD_vf(a)$ is a vector in $\mathbb{R}^n$. Componentwise computation is valid: if $f=(f_1,\ldots,f_n)$ and all relevant iterated derivatives exist, then
\begin{align*} D_uD_vf(a)=\bigl(D_uD_vf_1(a),\ldots,D_uD_vf_n(a)\bigr). \end{align*}
[/remark]

## References

[Derivative](/page/Derivative), Androma.

[Directional Derivative](/page/Directional%20Derivative), Androma.

[Partial Derivative](/page/Partial%20Derivative), Androma.

Michael Spivak, *Calculus on Manifolds* (1965).

Walter Rudin, *Principles of Mathematical Analysis* (1976).

Serge Lang, *Undergraduate Analysis* (1997).

Created by admin on 6/21/2026 | Last updated on 6/21/2026
